MOSES: A METAHEURISTIC OPTIMIZATION SOFTWARE
MOSES: A METAHEURISTIC OPTIMIZATION SOFTWARE ECOSYSTEM. APPLICATIONS TO THE AUTOMATED ANALYSIS OF SOFTWARE PRODUCT LINES AND SERVICE-BASED APPLICATIONS

JOSÉ ANTONIO PAREJO MAESTRE

PhD dissertation
Supervised by Dr. Antonio Ruiz Cortés and Dr. Sergio Segura Rueda
Universidad de Sevilla
October 2013

First published on 1/10/2013 by José Antonio Parejo Maestre
Copyright © http://www.isa.us.es/members/joseantonio.parejo
This is a copyleft document, but the content is copyrighted.

Support: This PhD dissertation was funded by the Spanish Government under the CICYT projects SETI (TIN-2009-07366) and TAPAS (TIN2012-32273), by the Andalusian Government projects ISABEL (TIC-2533) and THEOS (TIC-5906), and by the European Commission through the European Network of Excellence in Software Services and Systems (S-Cube).

Dr. Antonio Ruiz Cortés, Associate Professor (Profesor Titular) in the Area of Computer Languages and Systems at the Universidad de Sevilla, and Dr. Sergio Segura Rueda, Associate Professor (Profesor Contratado Doctor) in the Area of Computer Languages and Systems at the Universidad de Sevilla, HEREBY CERTIFY that Mr. José Antonio Parejo Maestre, Computer Engineer (Ingeniero en Informática) from the Universidad de Sevilla, has carried out under our supervision the work entitled

MOSES: a Metaheuristic Optimization Software EcoSystem, Applications to the Automated Analysis of Software Product Lines and Service-based Applications

Having reviewed it, we authorize the start of the procedures for its submission as a Doctoral Thesis to the committee that is to evaluate it.

Signed: Dr. Antonio Ruiz Cortés and Dr. Sergio Segura Rueda
at the Universidad de Sevilla, 1/10/2013

I, Mr. José Antonio Parejo Maestre, holder of NIF number 44602941F, DECLARE my authorship of the work presented in this doctoral thesis, entitled:

MOSES: a Metaheuristic Optimization Software EcoSystem, Applications to the Automated Analysis of Software Product Lines and Service-based Applications

In witness whereof, I sign.

Signed: Mr. José Antonio Parejo Maestre
at the Universidad de Sevilla, 1/10/2013

Universidad de Sevilla

The committee in charge of evaluating the dissertation presented by José Antonio Parejo Maestre in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Software Engineering hereby recommends ________ of this dissertation and awards the author the grade ________.

Miguel Toro Bonilla
Full Professor (Catedrático de Universidad)
Universidad de Sevilla

José Cristobal Riquelme Santos
Full Professor (Catedrático de Universidad)
Universidad de Sevilla

José Raul Romero
Associate Professor (Profesor Contratado Doctor)
Universidad de Córdoba

Stefan Wagner
Professor
University of Applied Sciences Upper Austria

Jorge Cardoso
Associate Professor
Universidad de Coimbra, Portugal

For the record, and where necessary, we sign these minutes in ________, ________.

Dedicated to my family, and especially to Ana Rut, because without her love and devotion neither this document nor the happiness of its author would be possible.

ACKNOWLEDGEMENTS

After a long time of hard work, it is finally time to look back and thank all the people who made this thesis possible. With all my gratitude.

First of all, I thank God for giving me each day of life, and the gifts and strength to carry out this thesis. Second, I want to thank my supervisors, Antonio and Sergio, because this thesis is the result of their encouragement, guidance, support and help. Thank you, Antonio, for being open-minded and brave enough to embark on this adventure and to let me work on this topic. Thank you for your useful advice on both work and life, and for your good humor, your support and your leadership. Thank you, Sergio, for your willingness, for your hard work, and for the long hours you have spent side by side with me. Without your contribution this thesis would be much worse. I must thank you both for accepting my limitations without blame, for helping me to overcome them, and for all your dedication during this time.
My fellow researchers of the ISA group also deserve a big thank-you because, in one way or another, they contributed to this thesis. In particular, I am grateful to Pablo Fernández, for being a true friend and a great colleague all these years; to Guti, Manolo and Joaquín, for their companionship and support; to Amador and Pablo Trinidad, for our discussions and for making me laugh even in the hardest times; to Carlos, Jesús and Adela, for making the long hours of work more bearable with their company at lunch and during breaks; to Beatriz, for always keeping her door open for me; and to the rest of the members of the ISA group, for their companionship and great help throughout this time, especially to those who are abroad, Cristina and José María. My gratitude also goes to the technicians of our group: Manuel León, Alejandro Trinidad, Alberto Calleja and Pablo León.

I also want to express my gratitude to other members of the Department of Computer Languages and Systems, especially to Jorge García, for being by my side all this time and for his willingness to work with me on STATService. I am also grateful to Pepe, for his contagious laughter and his visits to our office, and, along with Fernando, Rafael and Miguel, for everything they have taught me. During my education, they helped me discern the beauty hidden in algorithms, in languages, in software and in its engineering; this thesis is also the result of their work. I must also express my gratitude to Barbara Pernici, who warmly welcomed me into her lab during my research stay in Milan, and to Maria-Grazia Fugini, who supported me and worked with me during the stay.

Lastly, thanks to my family: my parents, my sisters, my brothers- and sisters-in-law, my parents-in-law, etc., for always being there. In particular, thank you, Mom and Dad, for teaching me the value of effort but also the importance of rest, for fully trusting in my abilities, and for always supporting me.
Finally, to those who matter most to me: THANK YOU, ANA RUT, because without your sacrifice and devotion, without your love and your support, this thesis would have been impossible. Thank you, Ana, my little one, for being a ray of light in moments of darkness, the flower that smiles at me from your photo in the corner of my desk. Thank you, Antonio and Marcos, for forcing me to disconnect, and for turning the return home into the most rewarding moment of the day.

José Antonio Parejo Maestre
September 2013

ACKNOWLEDGEMENTS (SPANISH VERSION)

After a long time of hard work, it is time to look back and thank all the people who made this thesis possible. With all my gratitude.

First of all, I thank God for giving me each day of life, and the gifts and strength needed to carry out this thesis. Second, I want to thank my supervisors, Antonio and Sergio, because this thesis is the result of their encouragement, guidance, support and help. Thank you, Antonio, for being brave and open-minded enough to embark on this adventure, and for allowing me to work on this topic I am so passionate about. Thank you for your advice, always useful and apt, both for my work and for my life. Thank you above all for your humor and your kindness, for your support and your gentle, inspiring leadership. Thank you, Sergio, for your willingness and for how hard you have worked for so many hours, always side by side with me. Without your contribution this thesis would be much worse. Thank you both for accepting my limitations without reproach, for helping me overcome them, and for all your dedication during this time.

My colleagues in the ISA group also deserve a big thank-you because, in one way or another, they have contributed to making this thesis possible.
I am especially grateful to Pablo Fernández, for being a true friend and a great colleague all these years; to Guti, Manolo and Joaquín, for their companionship and support; to Amador and Pablo Trinidad, for our conversations and for making me laugh even in the hardest moments; to Carlos, Jesús and Adela, for making the long hours of work more bearable with their company at lunches and afternoon breaks; to Beatriz Bernárdez, for always keeping her door open for me; and to the rest of the members of the ISA group, Jose, Octavio, David Ruiz and David Benavides, Ana Belén and Fabricia, but very especially to those who are far away, Cristina and José María. My gratitude also goes to the technicians of the group, for their great work and willingness whenever we have collaborated on the various projects: Manuel León, Alejandro Trinidad, Alberto Calleja and Pablo León.

I also want to express my gratitude to the members of the Department of Computer Languages and Systems for the way they have welcomed and treated me over these years, especially to Jorge García, for being by my side and listening to me all this time, and for his willingness to collaborate with me on STATService. I cannot fail to thank Pepe for so often infecting me with his laughter and for his visits to our office, and, together with Fernando, Rafael and Miguel, for everything they have taught me. During my education, they helped me glimpse the beauty hidden in algorithms, in languages, in software and in its engineering, and so this thesis is also the fruit of their work. I must also express my gratitude to Barbara Pernici, for welcoming me into her group during my stay in Milan, and to Maria-Grazia Fugini, for her support and her collaboration with me during the stay.

Thanks to my family: my parents, my sisters, my brothers- and sisters-in-law, my parents-in-law, and my other brothers and sisters, for always being there.
In particular, thank you, Mom and Dad, for teaching me the value of effort but also the importance of rest, for fully trusting in my abilities, and for always supporting me. Finally, to those who matter most to me: THANK YOU, ANA RUT, because without your sacrifice and devotion, without your love and your support, this thesis would have been impossible. Thank you, Ana, my little one, for being a ray of light in moments of darkness, the flower that smiles at me from your photo in the corner of my desk. Thank you, Antonio and Marcos, for forcing me to disconnect, and for turning the return home into the most rewarding moment of the day.

José Antonio Parejo Maestre
September 2013

ABSTRACT

Most of the problems that we face nowadays can be expressed as optimization problems. An optimization problem is solved by finding, from a set of candidate solutions, the one that best fulfills a set of objectives. Finding the best solution to an optimization problem is hard or even infeasible in most real cases. Heuristic algorithms have been used for decades to guide the search for satisfactory solutions to hard optimization problems at an affordable cost. Metaheuristics are reusable schemes that ease the implementation of heuristic-based algorithms to solve optimization problems. The use of metaheuristics to solve optimization problems is a widely studied topic in computer science. In this context, software engineers have recently realized the benefits of using metaheuristics to solve hard optimization problems, usually referred to as search-based problems. This has led to a “search-based” trend, visible in a number of software engineering conferences and journal special issues on the topic.

However, despite their many benefits, applying metaheuristics requires overcoming numerous obstacles. First, the implementation of efficient metaheuristic programs is a complex and error-prone process that requires knowledgeable developers.
Although some supporting tools have been proposed, they usually automate only isolated tasks of the process. Another key challenge in the application of metaheuristics is experimentation. Since there is no analytical method to choose a suitable metaheuristic program for a given problem, experiments must be performed to compare the candidate techniques and their possible variants. This can lead to hundreds of potential alternatives to be compared, making the design, execution and analysis of experiments complex and time-consuming. Furthermore, experiments are usually performed ad hoc, with generic tools and without clear guidelines, which introduces threats to validity and makes them hard to automate and reproduce.

The goal of this thesis is to reduce the cost of applying metaheuristics to solve optimization problems. To that end, we present a set of tools to support the selection, configuration and evaluation of metaheuristic-based applications. First, we present a comparison framework and a survey of Metaheuristic Optimization Frameworks (MOFs), which support the selection of the right MOF for a given optimization problem. Second, we present an experimental description language (SEDL) and an extension of it (MOEDL) to support the description of experiments and their results in a succinct, self-contained and machine-processable way. Third, we present a set of analysis operations for SEDL documents. Among others, these operations support the automated validation of SEDL experiments, warning users about potential threats to validity and suggesting possible fixes. Fourth, we present a software ecosystem (MOSES) to support the integration of metaheuristic and experimentation tools.
In addition, we present a reference implementation of the ecosystem, including the following tools: i) FOM, a MOF developed by the authors; ii) an Experimental Execution Environment (E3) for the automated analysis, execution and replication of experiments described in SEDL; and iii) a suite of online software tools (STATService) supporting the statistical analysis tests most commonly used in the context of metaheuristics. To validate our work, we used MOSES to solve two relevant search-based problems: quality-driven web service composition, and performance testing in the automated analysis of feature models. As a result, MOSES lessened the implementation effort and the experimentation burden, and produced algorithms that improve the state of the art for both problems.

RESUMEN (SPANISH ABSTRACT)

Many of the situations we face every day can be expressed as optimization problems. An optimization problem is solved by finding, among a set of candidate solutions, the one that best satisfies a set of objectives. Finding the best solution to an optimization problem is difficult or even infeasible in many real cases. Heuristic algorithms have been used for decades to guide the search for satisfactory solutions to hard optimization problems within an affordable execution time. Metaheuristics are reusable algorithm schemes that ease the design of heuristic algorithms for solving hard optimization problems. The use of metaheuristics to solve optimization problems is a widely studied topic. In this context, software engineers have recently realized the benefits of using metaheuristics to solve hard optimization problems, generally known as search-based problems.
This has led to an emerging line of research on search-based problems, visible in software engineering conferences and in journal special issues on the topic. However, despite their many advantages, applying metaheuristics presents numerous obstacles. First, implementing metaheuristics as efficient programs is a complex and error-prone process that requires expert developers. Although some supporting tools have been proposed, they generally automate only isolated tasks of this process. Another key challenge in the application of metaheuristics is experimentation. There is no general theoretical method for choosing a suitable metaheuristic program for a given problem, so experiments must be performed to compare the candidate techniques and their possible variants. This can lead to hundreds of possible alternatives that must be compared, making the design, execution and analysis of experiments complex and time-consuming. Finally, experiments are usually performed with generic tools and without guidelines regarding threats to validity, automation and replicability.

The goal of this thesis is to reduce the cost of applying metaheuristics to solve optimization problems. To that end, we present a set of tools to support the selection, configuration and evaluation of metaheuristic-based solutions. First, we present a comparison framework, which we used to study the characteristics of several metaheuristic optimization frameworks (MOFs). This supports the selection of the appropriate MOF for the optimization problem at hand.
Second, we present an experimental description language (SEDL) and an extension of it (MOEDL) to support the description of experiments and their results in a succinct, self-contained and machine-processable way. Third, we present a set of analysis operations for SEDL documents. Among others, these operations support the automated checking of potential threats to validity, warning SEDL users and suggesting possible fixes. Fourth, we present a software ecosystem (MOSES) to support the integration of metaheuristic and experimentation tools. In addition, we present a reference implementation of the ecosystem, including the following tools: i) FOM, the framework developed by the authors; ii) an Experimental Execution Environment (E3) for the automated analysis, execution and replication of experiments described in SEDL and MOEDL; and iii) a suite of online software tools (STATService) supporting statistical analysis with the tests most commonly used in the context of metaheuristics. To validate this work, MOSES was used to solve two relevant search-based optimization problems in the context of software engineering: maximizing the quality of web service compositions, and performance testing in the analysis of feature models. As a result, MOSES reduced the implementation effort and the experimentation burden, and algorithms that improve the state of the art for both problems were designed.

CONTENTS

I Preface

1 Introduction
  1.1 Research context
    1.1.1 Metaheuristics
    1.1.2 Experimentation
    1.1.3 Tooling support
  1.2 Motivation
  1.3 Thesis goals
  1.4 Proposal solution
    1.4.1 On the implementation of MPS applications
    1.4.2 On the description of MOEs
    1.4.3 On the automated analysis of MOEs
    1.4.4 On the automated conduction and replication of MOEs
    1.4.5 On the development of MPS applications
    1.4.6 Overall contributions
  1.5 Thesis context
  1.6 Structure of this dissertation

II Background Information

2 Optimization Problems and Metaheuristics
  2.1 Optimization
    2.1.1 Introduction
    2.1.2 Why are optimization problems hard?
  2.2 Metaheuristics
    2.2.1 Hybridization
  2.3 Single-solution based metaheuristics
    2.3.1 Hill Climbing
    2.3.2 Simulated annealing
    2.3.3 Tabu search
  2.4 Population methods based metaheuristics
    2.4.1 Evolutionary Algorithms
    2.4.2 Path Relinking
    2.4.3 Particle Swarm Optimization
    2.4.4 Scatter Search
  2.5 Building methods
    2.5.1 GRASP
    2.5.2 Ant Colony Optimization
  2.6 Metaheuristic Optimization Frameworks
    2.6.1 Why are MOFs valuable?
    2.6.2 Drawbacks: All that glitters ain’t gold
  2.7 Summary

3 Experimentation
  3.1 The concept of Experiment
  3.2 Sample experiments
  3.3 Experimental Description
    3.3.1 Objects, subjects and populations
    3.3.2 Variables
    3.3.3 Hypotheses
    3.3.4 Design
  3.4 Experimental Execution
    3.4.1 Exploratory data analysis
    3.4.2 Confirmatory data analysis
  3.5 Experimental Validity
    3.5.1 Internal validity
    3.5.2 External Validity
  3.6 Metaheuristic Optimization Experiments
    3.6.1 Selection and Tailoring experiments
    3.6.2 Tuning Experiments
    3.6.3 Designs for MOEs
    3.6.4 Analyses for MOEs
    3.6.5 Threats to validity in MOEs

III Contributions

4 Motivation
  4.1 Introduction
  4.2 Problems
    4.2.1 On the implementation of MPS applications
    4.2.2 On the description of MOEs
    4.2.3 On the execution of MOEs
    4.2.4 On the analysis of MOEs
    4.2.5 On the replicability of MOEs
  4.3 Overview of our contributions
    4.3.1 On the implementation of MPS applications
    4.3.2 On the description of MOEs
    4.3.3 On the execution of MOEs
    4.3.4 On the analysis of MOEs
    4.3.5 On the replicability of MOEs
    4.3.6 On the development of MPS-based applications
  4.4 Summary

5 Comparative framework for MOFs
  5.1 Introduction
  5.2 Review Method
    5.2.1 Research Questions
    5.2.2 Source material
    5.2.3 Inclusion and Exclusion criteria
    5.2.4 Comparison Criteria
  5.3 Metaheuristic Techniques (C1)
    5.3.1 Characteristics Description
    5.3.2 Assessment and Feature Coverage Analysis
    5.3.3 Comparative analysis
  5.4 Adapting to a problem and its structure (C2)
    5.4.1 Characteristics Description
    5.4.2 Assessment and Feature Coverage Analysis
    5.4.3 Comparative Analysis
  5.5 Advanced characteristics (C3)
    5.5.1 Characteristics Description
    5.5.2 Assessment and Feature Cover Analysis
  5.6 MPS life-cycle Support (C4)
    5.6.1 Characteristics Description
    5.6.2 Comparative Analysis
  5.7 Design, Implementation and Licensing (C5)
    5.7.1 Characteristics description
    5.7.2 Assessment and feature cover analysis
  5.8 Documentation & support (C6)
    5.8.1 Comparative Analysis
  5.9 Discussion and Challenges
    5.9.1 Capabilities Discussion
    5.9.2 Evolution of the market of MOFs
    5.9.3 Potential areas of improvement of current frameworks
  5.10 Summary

6 Scientific Experiments Description Language
  6.1 Introduction
  6.2 Experimental description
    6.2.1 Objects, subjects and population
    6.2.2 Constants and variables
    6.2.3 Hypotheses
    6.2.4 Experimental design
    6.2.5 Analyses specification
  6.3 Experimental execution
  6.4 Automated analysis of SEDL documents
    6.4.1 Information extraction operations
    6.4.2 Operations for validity checking
  6.5 Extension points
    6.5.1 Context
    6.5.2 Hypotheses
    6.5.3 Designs
    6.5.4 Configurations
    6.5.5 Analyses
  6.6 Summary

7 Metaheuristic Optimization Experiments Description Language
  7.1 Introduction
  7.2 MOEDL experimental descriptions
    7.2.1 Preamble
    7.2.2 Problem types and instances
    7.2.3 Metaheuristic techniques
    7.2.4 Termination criterion
    7.2.5 Random number generation algorithm
  7.3 Types of MOEs supported by MOEDL
    7.3.1 Selection and tailoring experiments
    7.3.2 Tuning experiments
  7.4 Transformation from MOEDL to SEDL
    7.4.1 Transformation of common elements
    7.4.2 Transformation of Techniques Comparison Experiments
    7.4.3 Transformation of technique tuning experiments
  7.5 Extension points
  7.6 Summary

8 MOSES: A Meta-heuristic Optimization Software Ecosystem
  8.1 Introduction
  8.2 Features
  8.3 Reference Architecture
    8.3.1 Architectural Style
    8.3.2 Abstract Component View
  8.4 MOSES Reference Implementation (MOSES[RI])
    8.4.1 STATService
  8.5 Using MOSES
    8.5.1 MOSES Studio
  8.6 Summary

IV Validation

9 Validation
  9.1 Introduction
  9.2 QoS-aware composite web services binding
    9.2.1 Selection
    9.2.2 Implementation
    9.2.3 Tailoring
    9.2.4 Tuning
    9.2.5 Analysis
  9.3 Generation of hard FMs
    9.3.1 Selection
    9.3.2 Implementation
    9.3.3 Tailoring
    9.3.4 Tuning
    9.3.5 Analysis
    9.3.6 Experiments on the generation hard FMs
  9.4 Summary

V Final Remarks

10 Conclusions
  10.1 Conclusions
  10.2 Support for Results
  10.3 Discussion, Limitations and Extensions

VI Appendices

A MOFs assessment data
  A.1 Evaluation per Area
  A.2 Global evaluation
B Meta-models and Schemas
  B.1 SEDL Meta-model
  B.2 MOEDL Meta-model
  B.3 XML Schemas of SEDL and MOEDL
C A Metaheuristics Description Syntax in EBNF
D Statistical tests supported
E SEA
F EEE: Experimental Execution Environment
G QoS-aware Binding of Composite Web Services
  G.1 The QoS-aware Composite Web Services Binding Problem
    G.1.1 QoS Model
  G.2 Our Proposal: QoSGasp
  G.3 Previous Proposals
    G.3.1 Genetic Algorithms
    G.3.2 Hybrid TS with SA
  G.4 Experiments performed on the QoSWSCB Problem
    G.4.1 Experiment QoSWSCB-#A1: Tailoring of GRASP
    G.4.2 Experiment #A2: Tuning of GRASP+PR
    G.4.3 Experiment #1: Selection of a technique for QoSWSCB
    G.4.4 Experiment #2: Selection of a technique for QoSWSCB (with a different objective function)
  G.5 Threats to validity
H Generation of Hard Feature Models
  H.1 Introduction
  H.2 Feature Models
  H.3 ETHOM: An evolutionary algorithm for feature models
    H.3.1 Instantiation of the algorithm
  H.4 Experiments on the generation hard feature models
. 313 H.4.1 Experiment #1(b): Maximizing execution time in a SAT Solver . . 315 H.4.2 Experiment #2: Maximizing memory consumption in a BDD solver316 H.4.3 Experiment #3(a): Evaluating the impact of the number of generations on the effectiveness of ETHOM (with JaCoP) . . . . . . . . 317 H.4.4 Experiment #3(b): Evaluating the impact of the number of generations on the effectiveness of ETHOM (with SAT) . . . . . . . . 319 H.4.5 Experiment #4: Evaluating the impact of the Heuristics of JaCoP 319 H.5 Threats to validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320 I J Evidences of Utility and Applicability 321 I.1 Utility of the Comparative Framework for MOFs . . . . . . . . . . . . . . 321 I.2 Utility of MOSES[RI] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 I.2.1 Utility of FOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 I.2.2 Utility of STATService . . . . . . . . . . . . . . . . . . . . . . . . . 323 I.2.3 Utility of EEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 Acronyms 327 Bibliography 328 xviii L IST OF F IGURES 1.1 Metaheuristic problem solving life-cycle . . . . . . . . . . . . . . . . . . . 6 1.2 Experimentation in the context of the MPS life-cycle . . . . . . . . . . . . 7 1.3 Summary of contributions per phase of the MPS life-cycle . . . . . . . . 15 2.1 Objective function landscapes of several optimization problems. . . . . . 23 2.2 Taxonomy of optimization techniques . . . . . . . . . . . . . . . . . . . . 26 2.3 Search paths generated by HC and SA . . . . . . . . . . . . . . . . . . . . 31 2.4 Crossover operators with binary enconding . . . . . . . . . . . . . . . . . 33 2.5 Sample crossover and mutation for car design solutions . . . . . . . . . . 34 2.6 Path generation in PR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.7 Paths between binary encoded solutions of the car design problem. . . . 38 2.8 MOFs conceptual map . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . 41 3.1 Experimental life-cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.2 Conceptual map about experimental description . . . . . . . . . . . . . . 49 3.3 Experimental objects, populations and sample . . . . . . . . . . . . . . . 50 3.4 Taxonomy of experimental variables . . . . . . . . . . . . . . . . . . . . . 51 3.5 Taxonomy of experimental variables according to their levels . . . . . . . 53 3.6 Hypothesis acceptance and rejection areas . . . . . . . . . . . . . . . . . . 63 5.1 Stacked Bar Chart showing MOFs techniques support . . . . . . . . . . . 107 5.2 Adaption to the problem and its structure support . . . . . . . . . . . . . 112 xix LIST OF FIGURES 5.3 Advanced characteristics support . . . . . . . . . . . . . . . . . . . . . . . 117 5.4 General optimization process support . . . . . . . . . . . . . . . . . . . . 121 5.5 Design, implementation & licensing assessment . . . . . . . . . . . . . . 125 5.6 Frameworks size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 5.7 Publications and external authors per MOF . . . . . . . . . . . . . . . . . 129 5.8 Documentation and technical support . . . . . . . . . . . . . . . . . . . . 130 5.9 General scores of MOFS as Kiviat diagrams . . . . . . . . . . . . . . . . . 135 6.1 SEDL structure and its mapping to a sample experiment . . . . . . . . . 139 6.2 Schema of the context information supported by SEDL . . . . . . . . . . 140 6.3 Schema of the context information supported by SEDL . . . . . . . . . . 140 6.4 SEDL document with randomized design . . . . . . . . . . . . . . . . . . 141 6.5 Descriptive hypothesis supported by SEDL . . . . . . . . . . . . . . . . . 142 6.6 Simple randomized design supported by SEDL . . . . . . . . . . . . . . . 142 6.7 Simple experimental execution in SEDL . . . . . . . . . . . . . . . . . . . 145 6.8 Experiment description and relational model of its results . . . . . . . . . 
146 6.9 Sample of command experimental procedure . . . . . . . . . . . . . . . . 147 6.10 Samples of statistical analyses specifications and results . . . . . . . . . . 148 7.1 MOEDL structure and its mapping to a sample MOE . . . . . . . . . . . 159 7.2 Sample MOEDL experiment . . . . . . . . . . . . . . . . . . . . . . . . . . 160 7.3 Example of mapping from MOEDL to SEDL . . . . . . . . . . . . . . . . 161 7.4 Problem instances enumeration supported by MOEDL . . . . . . . . . . 162 7.5 Optimization benchmarks specification supported by MOEDL . . . . . . 163 7.6 Problem instance generator defined in MOEDL . . . . . . . . . . . . . . . 163 7.7 Metaheuristic techniques specification supported by MOEDL . . . . . . 164 xx LIST OF FIGURES 7.8 Global termination criteria and random number generators in MOEDL . 165 7.9 Tailoring experiment in MOEDL . . . . . . . . . . . . . . . . . . . . . . . 167 7.10 Tuning experiment in MOEDL . . . . . . . . . . . . . . . . . . . . . . . . 168 7.11 Transformation from MOEDL to SEDL . . . . . . . . . . . . . . . . . . . . 174 8.1 Template MOSES component . . . . . . . . . . . . . . . . . . . . . . . . . 188 8.2 Components of MOSES 8.3 MOSES[RI] component diagram . . . . . . . . . . . . . . . . . . . . . . . 193 8.4 MOSES[RI] deployment diagram . . . . . . . . . . . . . . . . . . . . . . . 194 8.5 Architecture and users of STATService . . . . . . . . . . . . . . . . . . . . 195 8.6 Decision tree used for test selection . . . . . . . . . . . . . . . . . . . . . . 197 8.7 Snapshots of the STATService web portal . . . . . . . . . . . . . . . . . . 200 8.8 MOSES Studio user interface navigability. . . . . . . . . . . . . . . . . . . 201 9.1 Selection experiment for QoSWSC (Exp 1) . . . . . . . . . . . . . . . . . . 208 9.2 Selection experiment for QoSWSC (Exp 2) . . . . . . . . . . . . . . . . . . 209 9.3 Selection experiment for QoSWSC (Exp 3) . . . . . . . . . . . . . . . . . . 211 9.4 Selection experiment for QoSWSC (Exp 4) . . . . . . 
. . . . . . . . . . . . 212 9.5 Results of STATService for Exp1 (100ms) . . . . . . . . . . . . . . . . . . . 213 9.6 Tailoring of ETHOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 9.7 Tuning of ETHOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 9.8 Analysis report and decision path generated by STATService . . . . . . . 218 9.9 ETHOM - Experiment #1 in SEDL . . . . . . . . . . . . . . . . . . . . . . . 220 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 9.10 ETHOM - Experiment #2 in SEDL . . . . . . . . . . . . . . . . . . . . . . . 221 9.11 ETHOM - Experiment #3 in SEDL . . . . . . . . . . . . . . . . . . . . . . . 222 9.12 ETHOM - Experiment #4 in SEDL . . . . . . . . . . . . . . . . . . . . . . . 223 xxi LIST OF FIGURES 10.1 Publications related to the contributions of this dissertation . . . . . . . . 230 B.1 Meta-model of experiments in SEDL . . . . . . . . . . . . . . . . . . . . . 254 B.2 Meta-model of experiments context in SEDL . . . . . . . . . . . . . . . . 254 B.3 Meta-model of experimental hypotheses in SEDL . . . . . . . . . . . . . 255 B.4 Meta-model of experimental variables in SEDL . . . . . . . . . . . . . . . 255 B.5 Meta-model of design in SEDL . . . . . . . . . . . . . . . . . . . . . . . . 256 B.6 Meta-model of experimental designs in SEDL . . . . . . . . . . . . . . . . 256 B.7 Meta-model of experimental configurations in SEDL . . . . . . . . . . . . 257 B.8 Meta-model of experimental executions in SEDL . . . . . . . . . . . . . . 258 B.9 Meta-model of experimental analyses specifications and results . . . . . 258 B.10 Meta-model of dataset specifications in SEDL . . . . . . . . . . . . . . . . 259 B.11 Meta-model of statistical analyses in SEDL . . . . . . . . . . . . . . . . . 260 B.12 Types of Experiments supported by MOEDL and their structure . . . . . 263 B.13 Termination criteria supported by MOEDL and their structure . . . . . . 264 E.1 Layout and structure of SEA lab-packs . . . . 
. . . . . . . . . . . . . . . . 275 G.1 Goods Ordering Composite Service . . . . . . . . . . . . . . . . . . . . . 280 G.2 Box plot of results in Experiment #1 and problem instance 9 . . . . . . . 296 G.3 Results in Experiment #1 and problem instance 2 . . . . . . . . . . . . . . 296 G.4 Results of each technique in Experiment #2 for problem instance 0 . . . 302 G.5 Results of each technique in Experiment #2 for problem instance 9 . . . 302 H.1 Feature relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 H.2 Cross-tree constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 H.3 Mobile phone feature model (without cross-tree constraints) . . . . . . . 309 xxii LIST OF FIGURES H.4 Encoding of a feature model . . . . . . . . . . . . . . . . . . . . . . . . . . 310 H.5 Example of one-point crossover in our algorithm . . . . . . . . . . . . . . 311 H.6 Examples of infeasible individuals and repairs . . . . . . . . . . . . . . . 312 H.7 Distribution of fitness values for random and evolutionary search . . . . 318 I.1 Support letter from University of Tubingen . . . . . . . . . . . . . . . . . 321 I.2 Support letter from Univeristy of Applied Science of Upper Austria . . . 322 I.3 Expression of interest on FOM from ISOIN . . . . . . . . . . . . . . . . . 323 I.4 Timeline of individual visitors to the STATService web portal . . . . . . 324 I.5 Map of visits to the STATService web portal . . . . . . . . . . . . . . . . . 325 xxiii L IST OF TABLES 1.1 Support per contribution for each phase of the MPS life-cycle . . . . . . 16 1.2 Support per contribution for each phase of the experimental life-cycle . 16 2.1 Pros and Cons of using MOFs . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.1 Two Sample 3x3 latin squares for a technique comparison experiment . . 58 3.2 Statistical procedure decision table. . . . . . . . . . . . . . . . . . . . . . 61 3.3 Specific STH for basic experiments with a single independent variable . 
61 3.4 Specific STH for experiments with multiple independent variables . . . 62 3.5 Regression coefficients and models . . . . . . . . . . . . . . . . . . . . . . 62 5.1 Selected MOFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5.2 Areas of interest and comparison characteristics . . . . . . . . . . . . . . 100 5.3 MOFs Programming languages, platforms and licenses . . . . . . . . . . 126 9.1 Tailoring variants in ETHOM . . . . . . . . . . . . . . . . . . . . . . . . . 215 9.2 Tuning values in ETHOM . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 A.1 Coverage of features in area C1 . . . . . . . . . . . . . . . . . . . . . . . . 238 A.2 Coverage of features in area C2 . . . . . . . . . . . . . . . . . . . . . . . . 243 A.4 Coverage of features in area C4 . . . . . . . . . . . . . . . . . . . . . . . . 245 A.3 Coverage of features in area C3 . . . . . . . . . . . . . . . . . . . . . . . . 247 A.5 Scores for C1 - C4 and C6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 248 xxv LIST OF TABLES A.6 Scores for C5 design, implementation & licensing . . . . . . . . . . . . . 249 A.7 Global scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 D.1 Set of tests and post-hoc analyses supported by SEDL . . . . . . . . . . . 271 G.1 Service providers per Role and their corresponding QoS Guarantees . . 280 G.2 QoS aggregation functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 G.3 Parameters Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 G.4 Means of obj. func. per algorithm and exec. time (Experiment 1) . . . . . 297 G.5 Mean percentage of solutions improving any obtained by other tech. (Exp1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298 G.6 Means of obj. func. values per algorithm and execution time in Exp. 2 . 300 G.7 Mean percentage of solutions improving any obtained by other tech. (Exp2) . . . . . . . . . . . . . . . . . . . 
H.1 Algorithm tailoring, experiment ETHOM #A1
H.2 ETHOM tuning, experiment ETHOM #A1
H.3 Evaluation results on the generation of feature models maximizing execution time in a CSP solver
H.4 Maximum execution times produced by random models and our evolutionary program
H.5 BDD size and computation time of the hardest feature models found

PART I PREFACE

1 INTRODUCTION

Life is an optimisation problem, with tons of variables and constraints... We can only optimise life, never solve it.
Chetan Bhagat, Indian novelist (from The 3 Mistakes of My Life)

This chapter presents an overview of the results presented throughout this dissertation. In Section §1.1 we review our research context. Section §1.2 describes the purpose of this work and motivates the problems addressed in this thesis. Section §1.3 describes our research goals. Section §1.4 describes the approach followed to fulfil those goals. Section §1.5 explains the context in which this work has been performed. Finally, Section §1.6 presents the structure of this dissertation.

1.1 RESEARCH CONTEXT

The progress and prosperity of our species have been largely determined by our ability to optimize the tasks we perform. The step forward in the Neolithic was to optimize food production by cultivating the land and domesticating animals. The industrial revolution greatly optimized production processes. Nowadays, the optimization of information processing and transmission is driving the computer and Internet revolution. Humankind has never stopped looking for (and finding) better solutions to the problems it faces every day.
Solving optimization problems is therefore an important task that appears in virtually every area of human activity [94]. An optimization problem can be defined as finding, from a set of candidate solutions, the one that best fulfils a set of objectives, where in general not all candidate solutions are feasible. In Mathematics and Computer Science, objective functions define which solution is better, expressing what is to be maximized or minimized, and a set of constraints specifies which solutions are feasible. Solutions are usually expressed as assignments of values to the variables on which the objective functions and constraints are defined.

Depending on how often problems are solved, the time available to solve them, and the required quality of the solutions, Rardin and Uzsoy [232] distinguish three kinds of optimization problems. Design problems are solved once (or at least infrequently), and the quality of the solutions is critical. Control problems are solved very frequently, solutions must be provided in near real time, and solution quality is important but not paramount. Planning problems lie between those two extremes.

1.1.1 Metaheuristics

Heuristics are optimization algorithms that exploit details of the problem to improve the solutions obtained [219, 241]. Heuristic methods have proven to be a handy tool for solving hard optimization problems. Heuristics are usually approximate, trading off the quality of the solutions against the execution time required to obtain them. Their problem-specificity makes their design and development a time-consuming task that must be faced anew for each problem. Metaheuristics avoid the need to design an ad-hoc heuristic algorithm from scratch for each problem: they provide a reusable algorithmic scheme that can be tailored to each problem. The advantage of reducing the cost of designing optimization algorithms, together with their good results on NP-hard problems, has boosted the widespread adoption of metaheuristics. Consequently, in recent decades metaheuristics have been applied in a plethora of contexts to solve disparate optimization problems [120], which has also led to a boom of research in this area [94].

Metaheuristics are tailored by completing the steps of their algorithmic scheme that are not fully specified. For those steps, the metaheuristic specifies the intended behaviour only loosely, defining what should be done at a high level of abstraction, but not how. This abstraction mechanism allows the algorithmic scheme to be defined independently of the problem, while the tailored algorithms can still exploit problem-specific knowledge. We call such steps the tailoring points of the metaheuristic.

Solving optimization problems with metaheuristics requires numerous activities to be performed in a coordinated manner. Figure §1.1 shows a possible grouping of these activities into a process with five major stages: Selection, Tailoring, Implementation, Tuning and Execution. We call this process the “Metaheuristic Problem Solving (MPS) life-cycle”. In the Selection stage, the specific metaheuristic to be used is chosen. In the Tailoring stage, the algorithmic scheme of the metaheuristic is completed and adapted to the problem at hand, yielding a fully specified algorithm. In the Implementation stage, this algorithm is implemented as a metaheuristic optimization program. In the Tuning stage, specific values are set for the parameters of the program (e.g. the population size in evolutionary algorithms). The result of this stage, a tuned metaheuristic program that can be invoked on a problem instance and returns a solution (or a set of solutions), is called an MPS application.
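The following minimal sketch illustrates these notions under our own assumptions; it is not the dissertation's tooling. Simulated annealing serves as an example metaheuristic, and its initial-solution, neighbourhood and objective functions play the role of tailoring points. A tiny 0/1 knapsack instance stands in for the problem: the variables are item selections, the capacity is the constraint, and infeasible solutions are handled by a penalty. The parameter values are illustrative and are not the result of a tuning experiment. All names (`simulated_annealing`, `knapsack_mps_application`) are hypothetical.

```python
import math
import random
from typing import Callable, List

Solution = List[int]  # assignment of 0/1 values to the decision variables


def simulated_annealing(
    initial: Callable[[random.Random], Solution],              # tailoring point
    neighbour: Callable[[Solution, random.Random], Solution],  # tailoring point
    objective: Callable[[Solution], float],                    # tailoring point (maximised)
    *,
    iterations: int,             # tuning parameter
    initial_temperature: float,  # tuning parameter
    cooling: float,              # tuning parameter, 0 < cooling < 1
    seed: int,
) -> Solution:
    """Problem-independent algorithmic scheme; the callables are its tailoring points."""
    rng = random.Random(seed)
    current = initial(rng)
    best = current
    temperature = initial_temperature
    for _ in range(iterations):
        candidate = neighbour(current, rng)
        delta = objective(candidate) - objective(current)
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
            if objective(current) > objective(best):
                best = current
        # Geometric cooling, floored so the temperature never reaches zero.
        temperature = max(temperature * cooling, 1e-12)
    return best


def knapsack_mps_application(weights: List[int], values: List[int],
                             capacity: int, seed: int = 0) -> Solution:
    """Hypothetical MPS application: the scheme tailored to 0/1 knapsack, with fixed parameters."""

    def objective(sol: Solution) -> float:
        weight = sum(w for w, x in zip(weights, sol) if x)
        value = sum(v for v, x in zip(values, sol) if x)
        # Penalty: with non-negative values, infeasible solutions score below any feasible one.
        return value if weight <= capacity else -weight

    def flip_one(sol: Solution, rng: random.Random) -> Solution:
        i = rng.randrange(len(sol))
        return sol[:i] + [1 - sol[i]] + sol[i + 1:]

    def empty(rng: random.Random) -> Solution:
        return [0] * len(weights)

    # Parameter values are illustrative, not the outcome of a tuning experiment.
    return simulated_annealing(empty, flip_one, objective,
                               iterations=500, initial_temperature=5.0,
                               cooling=0.99, seed=seed)


if __name__ == "__main__":
    # Execution stage: invoke the MPS application on a problem instance.
    print(knapsack_mps_application([4, 3, 2, 5, 1], [10, 7, 4, 12, 2], capacity=8))
```

Passing the tailoring points as functions keeps the scheme itself problem-independent. Swapping the knapsack callables for those of another problem would reuse `simulated_annealing` unchanged, which is the reuse argument made above for metaheuristics.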
Finally, in the Execution stage, the MPS application is executed to obtain solutions to the problem.

The quality of the solutions provided by an MPS application depends on an appropriate matching between the optimization problem and the solving algorithm; this is a widely accepted consequence of the No Free Lunch (NFL) theorem [300]. In the specific context of metaheuristic optimization, it translates into making appropriate decisions in the Selection, Tailoring and Tuning stages. However, current theory does not provide analytical methods to make those decisions [50], and the recommended procedure is experimentation [22, 50]. We call the experiments intended to support such decisions Metaheuristic Optimization Experiments (MOEs).

Figure 1.1: Metaheuristic problem solving life-cycle

1.1.2 Experimentation

Experimentation refers to a methodical procedure of actions and measurements aimed at empirically verifying or falsifying a hypothesis [116, 154, 248]. In the context of the MPS life-cycle, experimentation is a process of structured inquiry into the alternatives available for the decisions made in selection, tailoring and tuning, together with an analysis of the impact of those alternatives on the performance of the algorithm. Current methodologies require performing several experiments and analyses in a specific sequence to make such decisions [19, 22, 29, 50, 237], which demands considerable effort. To illustrate this point, consider the activities associated with experimentation in the context of the MPS life-cycle. Figure §1.2 groups them into a process with four main activities: Design, Development, Conduction and Analysis. These activities are integrated into the MPS life-cycle as described below.

The Design of the experiment is the first activity to be performed.
The output of this activity is a detailed plan (hereinafter, the experimental protocol) that aims to maximize the information obtained from the experiment. In MPS contexts, it involves deciding which metaheuristic algorithms will be implemented, with which tailorings and parameter values, and how they will be run in order to reach the conclusions sought. Next, the experimental artefacts are developed. In MPS contexts this involves two activities: implementing the metaheuristic algorithms chosen in the previous activity, and implementing the program that executes the experimental protocol defined in the design phase (hereinafter, the experimental program). Once all experimental artefacts are available, the experiment is conducted; in our context, this means executing the experimental program. This execution generates a dataset of results to be analysed. Finally, conclusions are drawn from the results of that analysis. In our context, this involves interpreting the results to choose the metaheuristic algorithm and its parameter setting.

Figure 1.2: Experimentation in the context of the MPS life-cycle

The quality of the MPS life-cycle is determined mostly by the quality of the experiments performed for its decision-making activities. The quality of an experiment is determined primarily by two factors: its degree of validity and its replicability [116]. Researchers traditionally distinguish two types of experimental validity: external and internal [248]. In this dissertation we focus on internal validity, since it is essential to ensure accurate decision making¹. Internal validity is the extent to which we can infer, from the experimental process and data, whether the hypothesis holds.
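To make the four experimental activities concrete, here is a minimal sketch of a toy tuning MOE. It is our own illustration, not the dissertation's tooling. It assumes a (1+1) evolutionary algorithm on the OneMax benchmark and compares two hypothetical mutation rates. The protocol dictionary plays the role of the Design, the seeded runs are the Conduction, and the Analysis first checks that the dataset matches the design before selecting a configuration. Comparing means alone does not establish internal validity; a real MOE would also apply a suitable statistical test. All names and values (`PROTOCOL`, `conduct`, `analyse`, the rates, 20 replications) are hypothetical.

```python
import random
import statistics
from typing import Dict, List


def one_plus_one_ea(n_bits: int, mutation_rate: float, evaluations: int, seed: int) -> int:
    """(1+1) EA on OneMax; returns the number of ones in the final solution (higher is better)."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(evaluations):
        # Standard bit mutation: flip each bit independently with probability mutation_rate.
        child = [1 - b if rng.random() < mutation_rate else b for b in parent]
        if sum(child) >= sum(parent):
            parent = child
    return sum(parent)


# Design: the experimental protocol (alternatives under comparison and replications).
PROTOCOL = {
    "configurations": {"p=1/n": 1 / 50, "p=0.2": 0.2},  # hypothetical tuning alternatives
    "replications": 20,
    "n_bits": 50,
    "evaluations": 200,
}


def conduct(protocol: dict) -> Dict[str, List[int]]:
    """Conduction: run each configuration once per seed; fixed seeds make runs repeatable."""
    return {
        name: [one_plus_one_ea(protocol["n_bits"], rate, protocol["evaluations"], seed)
               for seed in range(protocol["replications"])]
        for name, rate in protocol["configurations"].items()
    }


def analyse(protocol: dict, results: Dict[str, List[int]]) -> str:
    """Analysis: check the dataset against the design, then select the best mean."""
    for name in protocol["configurations"]:
        # One observation per configuration and replication is expected.
        if len(results.get(name, [])) != protocol["replications"]:
            raise ValueError(f"dataset inconsistent with the design for {name}")
    means = {name: statistics.mean(obs) for name, obs in results.items()}
    return max(means, key=means.get)


if __name__ == "__main__":
    data = conduct(PROTOCOL)
    print("selected configuration:", analyse(PROTOCOL, data))
```

Deriving every random choice from an explicit seed per replication means that rerunning `conduct` on the same protocol yields the same dataset. This is a small instance of the exact replication discussed next.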
In turn, an experiment is replicable when its results can be verified and/or clarified by conducting another experiment, either following the same experimental protocol under similar conditions (exact replication) or through a different procedure aimed at verifying similar hypotheses about the same phenomenon (conceptual replication) [126]. The data-intensive nature of current research in computer science and the ever-changing environment of current computing platforms make replicability even harder to achieve. This situation has led to the recent rise of two trends: reproducible research and executable papers. The former emphasizes the need to provide a comprehensive, detailed and unambiguous description of the experiments, together with a copy of their results and artefacts [276]. These elements are usually provided as laboratory packages (henceforth, lab-packs) that contain all the relevant information about the experiment. The goal of providing comprehensive descriptions of experiments is currently being pursued by creating repositories of experimental lab-packs [87, 202, 218, 223]. The executable papers trend focuses on automating the dependent replication of experiments, rather than on enabling their manual replication by other researchers. Some initiatives have recently emerged to support the creation of executable papers, such as “the executable papers grand challenge” [82], which promotes the creation of platforms for authoring and publishing executable papers [114, 127, 200].

¹ The precision of measurements in MOEs depends mainly on the implementation technology and the execution platform, while external validity is mainly related to the generalizability of conclusions.

1.1.3 Tooling support

The effort required to run the MPS life-cycle depends strongly on tooling support.
In the context of MOEs (the Selection, Tailoring and Tuning stages), these tools are mainly statistical analysis packages and design-of-experiments systems. The burden of the Implementation stage can be significantly reduced using specific software tools, which in this dissertation we call Metaheuristic Optimization Frameworks (MOFs). Using MOFs also increases confidence in the correctness of the implementation, since the algorithms and tailoring mechanisms they provide have been validated by other users. A considerable number of MOFs have been proposed in the literature (in Chapter §5 we identify up to thirty-four). In this dissertation we focus on supporting the solving of optimization problems with metaheuristics when experimentation is required and MOFs are used. Additionally, we aim to enable the creation of automatically reproducible, easily replicable experiments with high internal validity.

1.2 MOTIVATION

A key question regarding the MPS life-cycle is when it is worthwhile to apply metaheuristics to solve a problem. To answer this question, we model the worthiness of a technique T for solving an optimization problem P. The value of an optimization technique for a given problem can be measured as the value of the solutions it provides in terms of cost savings, increased profits, better quality of products and services, etc. Thus, our model is based on the net profit generated by the solutions obtained, computed as the value generated by the solution minus the cost of solving the problem:

$$\mathit{Profit}(T, P) = \mathit{Value}(\mathit{Solution}(T, P)) - \mathit{Cost}(T, P)$$

For instance, an optimization problem that most of us solve every week is maximizing the number of items bought at the market that we can store in our fridge.
In this case, the profit of using an optimization technique is usually negative, since the costs of modelling the problem and applying the technique usually exceed the value of the solutions provided. Moreover, the dimensionality of this problem is usually small, so humans can find good solutions cheaply. In contrast, solving a similar packing problem, such as filling transportation containers, is very profitable: the value generated by good solutions is high, and the problem is solved very frequently. Moreover, the standardization of container and item sizes reduces both the complexity of the problem and the cost of solving it.

The Value depends on the optimality of the solutions provided by T, and the Cost comprises the costs of executing each activity of the MPS life-cycle, namely implementation cost, execution cost and decision-making (i.e., experimentation) cost. There are therefore two ways of improving the worthiness of metaheuristics for any problem: reducing the cost of executing the MPS life-cycle, or increasing the value of the solutions they provide. The former involves reducing implementation and experimentation costs, since execution costs are usually small and almost fixed. According to the NFL theorem [300], the latter implies improving the accuracy of the decisions made in the MPS life-cycle, i.e., improving the capability to draw clear and accurate conclusions from MOEs.

One way to improve the performance of a process is to provide tooling support. To reduce the implementation burden, a large number of MOFs have been created (up to thirty-four, cf. Section §5.2.2). However, the support provided by MOFs for the different metaheuristics is uneven, and using them involves overcoming a steep learning curve (cf. Chapter §5). Thus, choosing the appropriate MOF is crucial to ensure that its use actually reduces costs.
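Returning to the profit model introduced at the beginning of this section, a small calculation makes the fridge and container examples concrete. The sketch rests on an assumption of ours that goes beyond the model as stated: implementation and experimentation costs are paid once, while value and execution cost accrue on every run. The function names and all figures are hypothetical.

```python
import math
from typing import Optional


def profit(value_per_run: float, runs: int, implementation_cost: float,
           experimentation_cost: float, execution_cost_per_run: float) -> float:
    """Profit(T, P) = Value - Cost, under an assumed one-off vs. per-run cost split."""
    value = value_per_run * runs
    # One-off costs (implementation, experimentation) plus per-run execution cost.
    cost = implementation_cost + experimentation_cost + execution_cost_per_run * runs
    return value - cost


def break_even_runs(value_per_run: float, implementation_cost: float,
                    experimentation_cost: float,
                    execution_cost_per_run: float) -> Optional[int]:
    """Smallest number of runs with non-negative profit, or None if it is never reached."""
    margin = value_per_run - execution_cost_per_run
    if margin <= 0:
        return None  # each run loses value, so the one-off costs are never recovered
    return math.ceil((implementation_cost + experimentation_cost) / margin)
```

With these hypothetical figures, `profit(1, 52, 100, 50, 0.5)` (a weekly fridge-packing problem over one year) yields -124.0, whereas `profit(500, 10_000, 20_000, 10_000, 1)` (a frequently solved container-loading problem) yields 4,960,000. The latter breaks even after `break_even_runs(500, 20_000, 10_000, 1) == 61` runs. Under this assumption, reducing the one-off implementation and experimentation costs lowers the break-even point, which is the lever this dissertation targets.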
No general reviews or comparative studies of MOFs have been conducted in the literature. Existing works either lack comparative analysis (e.g. [281]) or focus on very specific criteria (such as performance or genetic operators), with a narrow perspective and few MOFs compared (e.g. [41, 71, 107]). Moreover, even though no framework supports a majority of metaheuristics (cf. Section §5.3) and interoperability problems between frameworks have been known for decades [182], there are no proposals to support MOF integration or interface standardization.

Regarding experimentation, to the best of our knowledge no tool supports the whole process depicted in Figure §1.2. Instead, different tools each support one or two activities, and there is a general lack of interoperability and of automated information exchange between them. This situation has three main consequences for the experimentation process:

1. It becomes tedious and time-consuming, since users must carry out the corresponding actions in each tool and exchange information between tools manually. This problem is aggravated by the fact that several replications are needed to reach strong conclusions, and that each experiment may have to be executed several times². Consequently, it is not surprising that the usual strategy for making decisions in the Selection, Tailoring and Tuning stages is the “best-guess strategy” [21].

2. It becomes error-prone and knowledge-demanding. Since no tool has a complete picture of the experimental process, the responsibility for keeping the activities of the process consistent lies with the experimenter. Moreover, users are forced to master a set of complex tools and choose the correct path through a forest of features and options.
This problem is aggravated by the inherent complexity of the subjects involved (design of experiments, statistical analysis, etc.), which require specific training [155].

3. It becomes harder to replicate. Using several independent tools, each with its own configuration, versioning, dependencies and feature changes over time, creates a complex environment that is difficult to replicate [62, 269]. This problem is aggravated by some specific issues of randomized optimization algorithms and of experimental computer science in general [21, 151], by the number of tailoring points and possible variants of metaheuristics, and by the lack of a widely accepted experimental reporting scheme similar to those used in the natural sciences [116].

² In some of our articles [247], the experimentation had to be carried out up to four times because special cases in the encoding and repair mechanisms had to be taken into account. These problems were detected in the analysis stage, which required modifying the implementations and conducting the experimentation again.

1.3 THESIS GOALS

The main goal of this thesis is:

Main Goal
Improving the applicability of metaheuristics for solving optimization problems when experiments must be carried out, by reducing the associated costs.

This abstract goal translates into three specific objectives:

Specific Goals
• Define a comparison and selection mechanism for MOFs
• Ensure the replicability and internal validity of MOEs
• Speed up and automate the execution of the MPS life-cycle

An additional goal of this work is to devise a minimal implementation that supports the conceptual solutions proposed to meet the above goals and enables their validation.

Finally, some of the stated objectives are not specific to the metaheuristic optimization area. The automation of experimentation and the systematization of experimental descriptions are challenging goals for any experimental branch of computer science.
The authors believe that the contributions provided may constitute a suitable starting point for the creation of a general platform for executable experiments and reproducible research. Thus, we try to maximize the scope of our approaches so as to contribute to the creation of such a platform in the future.

1.4 PROPOSED SOLUTION

The main contribution of this thesis is a set of support tools to reduce the cost of using metaheuristics to solve optimization problems. These contributions can be divided into the five groups described below.

1.4.1 On the implementation of MPS applications

In order to ease the implementation of metaheuristic applications we propose the following contribution:

• A Comparison Framework (CF) to reduce the cost of selecting the best MOF to solve a given optimization problem. The framework includes a comprehensive set of features that an ideal MOF should support, metrics for assessing the support of such features, and means to aggregate those assessments into general quantitative scores. Based on this comparison framework, ten Metaheuristic Optimization Frameworks (MOFs) are assessed to provide a picture of the current state of the art. This contribution has been published in the Soft Computing journal [213].

1.4.2 On the description of MOEs

For the description of metaheuristic experiments we propose the following contributions:

• Two languages to reduce the cost of describing, automating and replicating experiments: SEDL and MOEDL. The Scientific Experiments Description Language (SEDL) enables the description of domain-independent experiments in a precise, unambiguous, tool-independent, and machine-processable way. SEDL documents include all the information required to describe the design and execution of experiments, including the definition of variables, hypotheses and analysis tests. SEDL also includes several extension points for the creation of domain-specific languages.
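The idea of aggregating per-feature assessments into a single quantitative score can be illustrated with a minimal sketch. It assumes, purely for illustration, that each feature assessment is a support value in [0, 1] and that assessments are combined by a weighted average; the class, the feature names and the weights below are hypothetical and are not taken from the actual comparison framework described in Chapter §5.

```python
from dataclasses import dataclass


@dataclass
class FeatureAssessment:
    """Assessment of one feature for one MOF (hypothetical representation)."""
    feature: str
    support: float  # degree of support in [0, 1], as measured by the feature's metric
    weight: float   # relative importance of the feature in the comparison


def aggregate_score(assessments: list[FeatureAssessment]) -> float:
    """Weighted average of the per-feature support values (assumed aggregation)."""
    total_weight = sum(a.weight for a in assessments)
    if total_weight == 0:
        raise ValueError("at least one feature must have a positive weight")
    return sum(a.weight * a.support for a in assessments) / total_weight


# Hypothetical assessments of a single MOF on three features.
mof_a = [
    FeatureAssessment("metaheuristic techniques", support=0.8, weight=2.0),
    FeatureAssessment("parallel execution", support=0.5, weight=1.0),
    FeatureAssessment("documentation", support=1.0, weight=1.0),
]
print(round(aggregate_score(mof_a), 3))  # 0.775
```

A weighted average keeps the score in the same [0, 1] range as the individual assessments, which makes scores of different MOFs directly comparable; other aggregation functions would fit the same interface.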
In turn, the Metaheuristic Optimization Experiments Description Language (MOEDL) is an extension of SEDL for the description of MOEs. MOEDL frees the user from most of the implicit details of typical metaheuristic experiments, such as technique comparison or parameter tuning.

1.4.3 On the automated analysis of MOEs

For the automated analysis and validation of experimental descriptions and results, we present the following approaches:

• A set of 15 analysis operations on SEDL documents to reduce the cost of checking the validity and replicability of experiments. These operations automatically check for the existence of validity threats, warning users and suggesting fixes. For instance, we provide an operation that checks whether the size of the data generated by the conduction of the experiment is consistent with its design.

• A statistical analysis tool (STATService) to reduce the cost of validating experimental conclusions by testing hypotheses. STATService is especially designed for inexperienced users with no background in statistical tests. Given an input data set, the tool automatically chooses the most suitable statistical tests and provides the corresponding results. STATService offers several interfaces, including a web interface, which makes it intuitive and easy to use by experimenters from any research discipline. The tool has already been used by 9 laboratories in 5 countries³.

1.4.4 On the automated conduction and replication of MOEs

In order to automate the conduction and replication of MOEs, we present the following contributions:

• A Metaheuristic Optimization Software EcoSystem (MOSES), to reduce the cost of executing the MPS life-cycle. MOSES provides the design of a global architecture for supporting the automation of the experimentation process in the context of metaheuristic optimization.
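To illustrate the kind of check performed by the data-size analysis operation mentioned above, the following sketch assumes a full-factorial design in which every combination of factor levels is run a fixed number of times. The function names and the dictionary-based design representation are hypothetical; they do not reproduce SEDL syntax or the actual operation catalog presented in Chapter §6.

```python
from math import prod


def expected_result_count(factor_levels: dict[str, list], replications: int) -> int:
    """Runs prescribed by a full-factorial design: one per treatment and replication."""
    return prod(len(levels) for levels in factor_levels.values()) * replications


def check_data_size(factor_levels: dict[str, list], replications: int,
                    results: list) -> list[str]:
    """Return a list of warnings; the list is empty when data and design agree."""
    expected = expected_result_count(factor_levels, replications)
    warnings = []
    if len(results) != expected:
        warnings.append(
            f"design prescribes {expected} results but {len(results)} were found; "
            "check for missing or duplicated runs"
        )
    return warnings


# Hypothetical experiment: 3 algorithms x 2 problem instances x 10 replications.
design = {"algorithm": ["GRASP", "SA", "TS"], "instance": ["i1", "i2"]}
results = [0.0] * 58  # two runs are missing
print(check_data_size(design, 10, results))
```

Returning warnings instead of raising an error mirrors the behaviour described for the analysis operations, which warn users rather than rejecting the experiment.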
This architecture is defined in terms of service contracts and of software components that act as providers and consumers of those contracts, binding with other components. The information exchange between those components is based on SEDL, MOEDL and a format for experimental lab-packs proposed by the authors, the Scientific Experiment Archive (SEA).

• A Reference Implementation of MOSES (MOSES[RI]) including an implementation of the key components of the ecosystem, namely FOM, E3 and STATService. The Framework for Metaheuristic Optimization (FOM) is a MOF developed by the authors. Since FOM is the framework that supports the most metaheuristic techniques (cf. Section §5.3), it enables the exploration of alternative metaheuristic approaches and their respective tailorings at a low cost. The Experiment Execution Environment (E3) enables the full automation of metaheuristic experiments described as a MOEDL document plus a SEA lab-pack.

³ This information refers to registered users. The number of anonymous users is much larger (see the web statistics in Appendix §I).

1.4.5 On the development of MPS applications

We evaluated our contributions by developing MPS-based applications for solving two relevant software engineering problems, namely:

• Quality-driven web service composition. In this problem, the goal is to find a set of candidate services that maximize the overall non-functional properties (i.e., quality) of a web service composition. Experiments show that our algorithm, the QoS-aware GRASP+PR algorithm for service-based applications binding (QoS-Gasp), outperforms previous metaheuristic approaches proposed in the literature for real-time binding scenarios. Specifically, QoS-Gasp provided bindings that improve the QoS provided by previous proposals by up to 40%.

• Hard Feature Model Generation. In this problem, we try to create feature models (cf.
Section §H.2) as difficult to analyze as possible for current tools, in order to determine their performance in pessimistic scenarios. The proposed algorithm, called ETHOM, found feature models of realistic size whose analysis takes more than 30 minutes with current tools.

1.4.6 Overall contributions

The set of contributions provided, and the stages of the MPS life-cycle whose costs they help to reduce, are depicted in Figure §1.3. The contributions for the description (Section §1.4.2), automation (Section §1.4.4) and validation (Section §1.4.3) of MOEs contribute to reducing the cost of the stages where experimentation takes place, that is, selection, tailoring and tuning. Our comparison framework as well as FOM (Section §1.4.1) contribute to reducing the cost of implementing metaheuristic applications. Finally, the two metaheuristic algorithms presented (Section §1.4.5) address specific search-based problems and were used to validate the rest of our contributions.

Figure 1.3: Summary of contributions per phase of the MPS life-cycle

Tables §1.1 and §1.2 provide a tabular summary of our contributions, indicating the specific stages of the MPS and MOE life-cycles that they support. As illustrated, most of our contributions are intended to ease the burden of experimentation in the stages of selection, tailoring and tuning. We also remark that we provide support for all the tasks of both life-cycles.

Table 1.1: Support per contribution for each phase of the MPS life-cycle (Impact of our contributions on the phases of the MPS life-cycle). Phases: Selection, Tailoring, Implementation, Tuning, Execution. Contributions: MPS implementation (CF, FOM); MOE description (MOEDL & SEDL, SEA); MOE automation (MOSES, E3); MOE validation (Analysis Operations, STATService).

Table 1.2: Support per contribution for each phase of the experimental life-cycle (Impact of our contributions on the phases of the experimentation life-cycle). Phases: Design, Develop, Conduct, Analyse. Contributions: MOE automation (MOSES & MOSES[RI], E3); MOE description (MOEDL & SEDL, SEA); MOE validation (Analysis Operations, STATService).

1.5 THESIS CONTEXT

This thesis has been developed in the context of the Applied Software Engineering research group (Ingeniería del Software Aplicada, ISA) of the University of Seville. The following research projects and networks made this thesis possible:

SETI: reSearching on intElligent Tools for the Internet of services (TIN2009-07366), a project funded by the Spanish Government (Ministerio de Economía y Competitividad). In the context of this project I was awarded a four-year grant for the development of my PhD thesis. Thus, this project constitutes the main source of funding for this work.

ISABEL: Ingeniería de Sistemas Abiertos Basada en LínEas de productos (TIC-2533). Excellence project funded by the Regional Government of Andalusia. It laid the groundwork for the definition of a comparison framework for MOFs.

S-Cube: the European Network of Excellence in Software Services and Systems, funded by the European Commission. Thanks to our participation in this network, we identified the need for a QoS-aware binding algorithm for service-based applications in real time.

THEOS: Tecnologías Habilitadoras para EcOsistemas Software. Excellence project funded by the Regional Government of Andalusia. It supported the development of MOSES[RI] and of the Evolutionary algoriTHm for Optimized feature Models (ETHOM).
TAPAS: Tecnologías Avanzadas para Procesos como Servicios (TIN2012-32273), a project funded by the Spanish Government (Ministerio de Economía y Competitividad). It supported the development of QoS-Gasp.

1.6 STRUCTURE OF THIS DISSERTATION

This dissertation is organised as follows:

Part I: Preface. It comprises this introductory chapter, in which we introduce our research context, motivate our thesis by presenting the problems addressed, establish our goals and summarize our contributions.

Part II: Background Information. This part provides the reader with information regarding the research context in which our work has been developed. In Chapter §2, we introduce the main concepts of optimization, metaheuristics and experimentation. In Chapter §3 we delve into the concept of experiment, its properties, and the specific types of experiments performed in the MPS life-cycle.

Part III: Our Contribution. This part is the core of our dissertation and is organized in five chapters. In Chapter §4 we define the problems identified regarding the support for the MPS and experimentation life-cycles, and we review the literature on each of them. In Chapter §5 we present our comparison framework for MOFs and a benchmark of current MOFs based on it. Chapter §6 provides a description of SEDL, our general-purpose experimental description language. This chapter also presents a catalog of analysis operations for SEDL documents. Chapter §7 describes a domain-specific language for describing metaheuristic experiments. This chapter also presents a set of transformation rules from MOEDL to SEDL. Chapter §8 describes a software ecosystem architecture for supporting experimentation, named MOSES, and its reference implementation (named Reference Implementation of the Core of MOSES (MOSES[RI])).
Furthermore, this chapter describes our proposal for the statistical analysis required in the context of the MPS life-cycle, named STATService (as part of MOSES[RI]).

Part IV: Validation. This part describes the validation performed on MOSES[RI]. This validation is based on the application of MOSES[RI] to the two search-based problems described in Chapter §9.

Part V: Final Remarks. Chapter §10 concludes this dissertation and highlights some future research directions.

Part VI: Appendices. Several appendices have been attached to this dissertation to provide supplemental material. Appendix §A provides the full data tables of our evaluation of MOFs based on the comparison framework described in Chapter §5. Appendix §B describes the abstract syntax of SEDL and MOEDL using UML meta-models. Appendix §C describes, in EBNF, the syntax for the specification of metaheuristic algorithms supported by MOEDL. Appendix §D provides a table with the set of statistical tests supported by STATService and SEDL. Appendix §E describes our proposal for the structure and packaging format of experimental lab-packs (SEA). Appendix §F briefly describes the experiment execution environment proposed as part of the reference implementation of MOSES, named E3. In Appendix §G we describe our approach for solving the quality-driven web service composition problem, and the results of the experiments performed to compare it with some previous proposals. In Appendix §H we describe our approach for hard feature model generation and the results of the experiments performed on it. Finally, Appendix §I shows some additional evidence of the utility of our contributions. First, it provides support letters from two of the authors of the MOFs evaluated in our survey, stating that it has been useful for the selection of features to be added in upcoming versions of their tools. Next, an expression of interest in FOM from a consultancy company is shown.
Finally, the usage statistics of the web interface of STATService are provided.

PART II
BACKGROUND INFORMATION

2 OPTIMIZATION PROBLEMS AND METAHEURISTICS

You should reach the limits of virtue, before you cross the border of death
Ancient Spartan Saying

In this chapter, we introduce optimization problems and metaheuristics. In Section §2.1 we present the concepts of optimization problem and global optimum, in contrast with local optima. Furthermore, we discuss the underlying reasons why optimization problems are hard to solve, and present some sample optimization problems that are related to our specific application problems. Section §2.2 describes different techniques for solving optimization problems, focusing on metaheuristics. Section §2.6 presents software tools available for solving optimization problems through metaheuristics, focusing on software frameworks. Finally, Section §2.7 summarizes the concepts presented in this chapter.

2.1 OPTIMIZATION

2.1.1 Introduction

Optimization is about choosing the best element from a set $A$ of available alternative solutions. In the simplest case, this means minimizing or maximizing a function $f$ by choosing the values of its parameters. Thus, an optimization problem $P = (A, f)$ amounts to finding an $x^* \in A$ such that, given a function $f : A \to \mathbb{R}$, $\forall x \in A \bullet f(x^*) \geq f(x)$¹. In this formulation $x^*$ denotes the best solution, also named the global optimum. Some optimization problems can have multiple global optima with the same value of $f$ (note the $\geq$ in the formula).

Optimization problems are often expressed with a special notation. For instance, $\min_{x \in \mathbb{R}} (x^2 + 1)$ asks for the minimum value of the objective function $x^2 + 1$, where $x$ ranges over the real numbers $\mathbb{R}$. The single global optimum in this case is $x = 0$. Figure §2.1(a) depicts the objective function $f(x, y) = e^{-(x^2 + y^2)}$.
A maximization problem can be defined as $\max_{x, y \in [-2, 2]} e^{-(x^2 + y^2)}$, where both $x$ and $y$ are real variables that range from $-2$ to $2$. A solution to this problem can be denoted as $(v_x, v_y)$, where $v_x$ and $v_y$ refer to the values of $x$ and $y$ respectively. The optimal solution to this problem is $(0, 0)$, as Figure §2.1(a) shows.

When more than one objective function must be optimized at the same time, the problem becomes multiobjective. In multiobjective problems, each objective function may have different optimal solutions and, consequently, there may be no single solution that is the global optimum for all of the objective functions. For instance, decision making for economic policies is a typical area of application of multiobjective optimization, since several closely related indicators must be controlled and optimized simultaneously: minimizing inflation, unemployment and deficit while maximizing growth and the trade balance.

2.1.2 Why are optimization problems hard?

Solving optimization problems is in general a hard task. Many classical optimization problems are NP-hard, and typical real-life problem instances have huge or even infinite solution spaces. Furthermore, in some cases there is no analytical expression for the objective function, or its evaluation is so time-consuming or resource-intensive that sampling a significant part of the search space is inconceivable.

¹ This formulation defines a maximization problem, but we could define a minimization problem by stating that $\forall x \in A \bullet f(x^*) \leq f(x)$.

Figure 2.1: Objective function landscapes of several optimization problems. (a) $e^{-(x^2 + y^2)}$ has a single global optimum; (b) objective function with a local optimum; (c) the neighborhood of the solution $(0, 0)$ is highlighted; (d) objective function with infinite local optima.
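As a minimal illustration of the formulation $P = (A, f)$, the sketch below approximates the maximization problem over $[-2, 2]^2$ from Section §2.1.1 by evaluating $f$ on every point of a finite grid. The grid step is an assumption chosen for illustration; the returned evaluation count also hints at why exhaustive sampling breaks down, since a grid with $k$ points per axis needs $k^d$ evaluations for $d$ variables.

```python
import math
from itertools import product


def f(x: float, y: float) -> float:
    """Objective function of the example: f(x, y) = exp(-(x^2 + y^2))."""
    return math.exp(-(x * x + y * y))


def grid_search(step: float = 0.1, low: float = -2.0, high: float = 2.0):
    """Evaluate f on every point of a regular grid over [low, high]^2.

    Returns the best grid point, its objective value and the number of evaluations.
    """
    k = int(round((high - low) / step)) + 1       # grid points per axis
    axis = [low + i * step for i in range(k)]
    candidates = list(product(axis, repeat=2))     # k^2 candidate solutions
    best = max(candidates, key=lambda p: f(*p))    # maximization
    return best, f(*best), len(candidates)


best, value, evaluations = grid_search()
print(best, value, evaluations)
```

With two variables the 41 x 41 grid is cheap, but the same resolution over ten variables would already require about $41^{10}$ evaluations.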
As an example, let us consider the design of a car as an optimization problem². The goal is to create a car design that maximizes speed. This is a hard problem, since a car is a highly complex system in which speed depends on a number of parameters, such as the engine type and its components, as well as the shape and body elements. The objective function of this problem would not have an analytical expression; instead, a simulator would be needed to measure the speed that each design can reach. As a consequence, evaluating the objective function would require a number of simulation runs, and would be extremely time-consuming. Furthermore, this problem is likely to have extra constraints, such as keeping the cost of the car under a certain value, which make some designs infeasible.

One additional difficulty of optimization problem solving is that, even for single-objective problems, it is not clear when the global optimum has been found, so stopping the search carries the risk of returning a sub-optimal solution. For instance, consider maximizing the objective function $f(x, y) = 2e^{-(x - 1.7)^2 - (y - 1.7)^2} + e^{-x^2 - y^2}$, shown in Figure §2.1(b). A problem-solving technique that traverses the search space could find the solution $(0, 0)$ and check that all its adjacent solutions are worse than $(0, 0)$. If the technique decides to stop the search and returns $(0, 0)$, it misses the actual global optimum $(1.7, 1.7)$.

A key concept related to the solution space is neighboring. A solution $y \in A$ is said to be a neighbour of another solution $x \in A$ if $y$ is close to $x$ according to a specific criterion. Neighborhoods are defined by a neighboring function $neighborhood : A \to \mathcal{P}(A)$, where $\mathcal{P}(A)$ is the powerset of $A$. Thus, $neighborhood(x)$ provides the set of solutions in $A$ that are neighbors of $x$.
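The trap described above for the function of Figure §2.1(b) can be checked numerically. This sketch assumes a hypothetical neighborhood made of the 8 surrounding points of a grid with step 0.1 (a discrete stand-in, not the Euclidean neighborhood defined for this figure in the text) and tests whether a point is at least as good as all of its neighbors.

```python
import math


def f(x: float, y: float) -> float:
    """Objective of Figure 2.1(b): global optimum near (1.7, 1.7), local one near (0, 0)."""
    return (2 * math.exp(-(x - 1.7) ** 2 - (y - 1.7) ** 2)
            + math.exp(-x ** 2 - y ** 2))


def neighborhood(p: tuple[float, float], step: float = 0.1) -> list[tuple[float, float]]:
    """Hypothetical neighborhood: the 8 surrounding points of a grid with the given step."""
    x, y = p
    return [(x + dx * step, y + dy * step)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def is_local_optimum(p: tuple[float, float], step: float = 0.1) -> bool:
    """True iff no neighbor of p has a strictly better objective value (maximization)."""
    return all(f(*p) >= f(*q) for q in neighborhood(p, step))


print(is_local_optimum((0.0, 0.0)))  # True: a search stopping here looks "converged"
print(f(0.0, 0.0) < f(1.7, 1.7))     # True: yet (1.7, 1.7) is much better
```

Any technique that only inspects neighbors of $(0, 0)$ under this neighborhood has no local evidence that a better region exists.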
An alternative way of defining a neighborhood is by means of a boolean function $isNeighbor : A \times A \to \{true, false\}$ that, given two solutions $x$ and $y$, indicates whether $y$ is a neighbor of $x$. For instance, given the problem of maximizing the function shown in Figure §2.1(b), with $\mathbb{R}^2$ as search space, we can define a neighborhood based on the Euclidean distance as $isNeighbor(p_1, p_2) \equiv \sqrt{(p_1.x - p_2.x)^2 + (p_1.y - p_2.y)^2} < 1$. Figure §2.1(c) shows the neighborhood of a solution using this function.

Given a maximization problem $P = (A, f)$, a neighboring function $n$, and a solution $x \in A$, $x$ is a local optimum iff $x$ is better than the remaining solutions in its neighborhood, i.e. $\forall y \in n(x) \bullet f(x) \geq f(y)$. An additional difficulty is that the number of local optima in the solution space of an optimization problem can be huge (or even infinite). For instance, Figure §2.1(d) shows the search landscape for the problem of maximizing $f(x, y) = 2e^{-(x - 1.7)^2 - (y - 1.7)^2} + e^{-x^2 - y^2} - 0.1\sin(2(x - y)) + 0.1\sin(2(x + y))$, which has an infinite number of local optima when using $A = \mathbb{R}^2$ as solution space.

² A similar example was used to illustrate the working of evolutionary algorithms in [290].

2.2 METAHEURISTICS

The techniques to solve optimization problems may be classified into exact and heuristic (see Figure §2.2, based on [267, 268, 291]). The former provide the global optimum under some convergence conditions, and range from finding the optimum of the objective function analytically to applying algorithms such as Newton's method or Dantzig's simplex. When the convergence conditions are met and solutions are obtained with affordable time and resource consumption, exact techniques are preferred. However, many optimization problems do not meet those constraints, and heuristics emerge as strategies that try to find good solutions within time and resource limits deemed practical.
However, heuristics are approximate, and thus do not guarantee finding the global optimum of the problem. Additionally, heuristic optimization strategies use problem-specific knowledge beyond the definition of the problem itself to find solutions more efficiently [241]. Therefore, heuristics are problem-specific, and must be carefully designed for each problem.

Metaheuristics appear as general algorithmic schemes that can be tailored to solve different optimization problems, and they have generated a great amount of research and industrial activity during the previous decades [205]. There exist a number of definitions of metaheuristic [78, 120, 190, 205], the most widely accepted being that of Glover and Kochenberger [120], who define a metaheuristic as: “An iterative process that guides the operation of one or more subordinated heuristics to efficiently produce quality solutions for an optimization problem”. In this dissertation we refer to subordinate heuristics as tailoring points, since they enable the tailoring of the metaheuristic to each particular problem. For instance, the simplest metaheuristic conceivable for any problem is randomly sampling the problem's solution space, named Random Search (RS) (see Algorithm 1³). This metaheuristic has three tailoring points, namely: the random solution generation procedure, the objective function evaluation, and the termination criterion [120].

³ The efficiency of this algorithm can be improved by storing the evaluation of the objective function for the best solution found, as most MOFs do. In this dissertation we prioritize readability over efficiency.

Figure 2.2: Taxonomy of optimization techniques
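The Random Search scheme of Algorithm 1 can be sketched in Python with its three tailoring points passed in as functions. Two assumptions are made here: the termination criterion is modelled as a predicate on the iteration count, and the best evaluation is cached, as the footnote suggests. The usage example applies it to the maximization problem over $[-2, 2]^2$ from Section §2.1.1.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def random_search(random_solution: Callable[[], S],
                  objective: Callable[[S], float],
                  terminate: Callable[[int], bool]) -> S:
    """Random Search (maximization); the three arguments are its tailoring points."""
    best = random_solution()
    best_value = objective(best)        # cached evaluation of the best solution
    iteration = 0
    while True:                         # repeat ... until, as in Algorithm 1
        current = random_solution()
        value = objective(current)
        if value > best_value:
            best, best_value = current, value
        iteration += 1
        if terminate(iteration):
            return best


rng = random.Random(42)
best = random_search(
    random_solution=lambda: (rng.uniform(-2, 2), rng.uniform(-2, 2)),
    objective=lambda p: math.exp(-(p[0] ** 2 + p[1] ** 2)),
    terminate=lambda it: it >= 5000,
)
print(best)
```

Because every problem-specific decision is isolated in an argument, the same loop can be reused unchanged for any solution representation.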
Algorithm 1 Random Search
  bestSolFound ← random()
  repeat
    currentSolution ← random()
    if f(currentSolution) > f(bestSolFound) then
      bestSolFound ← currentSolution
    end if
  until Termination Criterion is satisfied
  return bestSolFound

According to the way they deal with the solution space, metaheuristics are classified as single-solution based, population-method based and building-method based [267, 268, 291]. This classification is not disjoint; for instance, Ant Colony Optimization (ACO) [72] is both a population-method based and a building-method based metaheuristic, since various ants work together to build various solutions to the problem, iteratively but simultaneously. In the next sections we describe some metaheuristics of each class that are used later in this dissertation.

2.2.1 Hybridization

A large number of publications in the literature do not purely follow the concepts of one single metaheuristic. Instead, they combine various algorithmic ideas, sometimes also with exact techniques. These approaches are commonly referred to as hybrid metaheuristics. This specific line of research is becoming very popular and has been successful in a large number of applications [34]. Metaheuristics can be combined in different ways: using one technique to generate the initial solution(s) of another, using one technique as a subroutine in the iterations of the main loop of another, etc⁴. In the context of this dissertation, a hybrid approach that combines GRASP with Path Relinking (PR) is applied to solve the QoS-aware Web Service Composition Binding (QoSWSCB) problem (see Section §G.2). In our implementation, GRASP is used to initialize the elite set of PR. The elite set is not updated in each iteration of PR; instead, different starting and target solutions are chosen.
2.3 SINGLE-SOLUTION BASED METAHEURISTICS

Single-solution based metaheuristics search for the optimum of an optimization problem by iteratively improving a single solution. The execution of this kind of metaheuristic can be regarded as a trajectory through the search space [120]. When this trajectory is driven by a neighbourhood, the next solution to be explored is always a neighbour of the current solution. In such a case, the metaheuristic is said to be a local search algorithm.

2.3.1 Hill Climbing

One of the simplest local search approaches is the Hill Climbing (HC) algorithm (see Algorithm 2), a.k.a. Steepest Descent (SD) when solving minimization problems. This technique successively moves to the best neighbour solution until reaching a local optimum, i.e., until no neighbour of the current solution improves the objective function. Hill Climbing suffers from a number of drawbacks: (i) it converges toward local optima, (ii) it is very sensitive to the initial solution, and (iii) it traverses the whole neighbourhood of the current solution before choosing the next one, so for large neighbourhoods the search is extremely time-consuming [268]. Many alternatives to hill climbing have been proposed to overcome those problems, such as simulated annealing and tabu search. These approaches are described next⁵.

Algorithm 2 Hill climbing
  nextSolution ← initialSolution
  repeat
    currentSolution ← nextSolution
    for all x ∈ neighborhood(currentSolution) do
      if f(x) > f(nextSolution) then
        nextSolution ← x
      end if
    end for
  until f(nextSolution) ≤ f(currentSolution)
  return currentSolution

⁴ Interested readers can find a taxonomy of hybridization in [267].

2.3.2 Simulated annealing

The application of Simulated Annealing (SA) to solve optimization problems was proposed by Kirkpatrick et al. in [161].
Inspired by the slow cooling process used in metallurgy (annealing), SA is a local search algorithm (see Algorithm 3) that enables the stochastic acceptance of degrading solutions. This acceptance allows escaping from local optima and continuing the search. At each iteration, SA generates a set of neighbours and picks among them the solution whose neighbourhood will be explored in the next iteration. Improving neighbours are always accepted, but degrading solutions (those whose objective function value is worse than that of the current solution) may also be accepted with a probability $P(\Delta E, \tau)$. This probability depends on the amount of degradation of the objective function, $\Delta E = f(currentSolution) - f(nextSolution)$, and on the current temperature $\tau$. The temperature is a time-varying parameter that determines the probability of accepting non-improving solutions. The following constraints are usually imposed on $P(\Delta E, \tau)$:

• $P(\Delta E, \tau) \in [0, 1]$.
• Solutions that do not improve can be accepted as the next solution if the temperature is not zero, i.e., when $\Delta E > 0$ and $\tau > 0$ then $P(\Delta E, \tau) > 0$.
• When the temperature is zero the procedure becomes a Hill Climbing, i.e., $P(\Delta E, 0) = 0$ for degrading solutions ($\Delta E > 0$).

This probability usually follows the Boltzmann distribution $P(\Delta E, \tau) = e^{-\Delta E / \tau}$. The rule that drives the evolution of $\tau$ over time is named the cooling scheme of the simulated annealing algorithm. Several cooling schemes have been proposed in the literature, each with a different cooling speed, namely: linear, logarithmic, and exponential.

⁵ Some authors consider GRASP as an improvement of local search strategies that uses a multi-start strategy. However, given that the construction phase of GRASP can be far more complex than the local search algorithm itself, we consider GRASP a building technique that uses local search as a subroutine; consequently, we describe it in Section §2.5.
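The acceptance rule and the three cooling schemes named above can be sketched as follows. The Boltzmann expression is clipped to [0, 1] so that it satisfies the listed constraints (for improving moves, $e^{-\Delta E / \tau}$ would exceed 1). The specific cooling formulas are one common parameterization each, assumed here for illustration, and are written as functions of the iteration number $k$ rather than in the incremental cooling(τ) form used by Algorithm 3; the exponential scheme is equivalent to repeatedly applying $\tau \leftarrow \alpha\tau$.

```python
import math
import random


def acceptance_probability(delta_e: float, tau: float) -> float:
    """Boltzmann acceptance P(dE, tau) = exp(-dE / tau), clipped to [0, 1].

    For maximization dE = f(current) - f(candidate), so dE > 0 means degradation.
    """
    if delta_e <= 0:
        return 1.0          # improving (or equal) candidates are always accepted
    if tau <= 0:
        return 0.0          # zero temperature: behaves like hill climbing
    return math.exp(-delta_e / tau)


def accept(delta_e: float, tau: float, rng: random.Random) -> bool:
    """Acceptance test used in Algorithm 3: P(dE, tau) > random()."""
    return acceptance_probability(delta_e, tau) > rng.random()


# Assumed cooling schemes; tau0 is the initial temperature, k the iteration number.
def linear_cooling(tau0: float, k: int, rate: float) -> float:
    return max(tau0 - rate * k, 0.0)


def exponential_cooling(tau0: float, k: int, alpha: float) -> float:
    return tau0 * alpha ** k                      # 0 < alpha < 1


def logarithmic_cooling(tau0: float, k: int) -> float:
    return tau0 / (1.0 + math.log(1 + k))         # decreases most slowly for large k


rng = random.Random(1)
for tau in (10.0, 1.0, 0.1):
    rate = sum(accept(1.0, tau, rng) for _ in range(10_000)) / 10_000
    print(f"tau={tau}: accepted {rate:.2f} of degradations with dE=1")
```

The loop makes the role of the temperature visible: the same degradation is accepted most of the time when $\tau$ is high and almost never when $\tau$ approaches zero.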
Algorithm 3 Simulated Annealing
  τ ← initialTemperature
  currentSolution ← initialSolution
  bestSolFound ← initialSolution
  {Main loop}
  repeat
    for all x ∈ pickNeighbors(currentSolution, neighborsPerIteration) do
      if f(x) > f(bestSolFound) then
        bestSolFound ← x
      end if
      if P(f(currentSolution) − f(x), τ) > random() then
        currentSolution ← x
        break
      end if
    end for
    τ ← cooling(τ)
  until Termination Criterion is satisfied
  return bestSolFound

Algorithm 3 shows the pseudo-code of simulated annealing. The cooling scheme is implemented through the cooling subroutine. The exploration of the neighbourhood of the current solution is performed through the pickNeighbors(solution, size) subroutine, which returns a set containing size randomly chosen neighbours of solution. The exploration of the neighbourhood is bounded by a maximum number of neighbours, a parameter of the algorithm named neighborsPerIteration, which is used when invoking pickNeighbors.

Figure §2.3 shows the different paths generated by HC and SA in a small solution space (19 solutions) with a fitness function similar to that shown in Figure §2.1(b), with a local optimum at $s_3$ and a global optimum at $s_{15}$. The neighbourhood definition in this example is based on the position of the solution, the neighbours of a solution $s_i$ being the previous ($s_{i-1}$) and next ($s_{i+1}$) solutions in the one-dimensional search space. Using $s_2$ as the initial solution, HC will get stuck in the local optimum $s_3$.
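The behaviour of HC in the one-dimensional example can be reproduced with a direct transcription of Algorithm 2. The fitness values of $s_1, \ldots, s_{19}$ are not given in the text, so the ones below are hypothetical, chosen only to place a local optimum at $s_3$ and the global optimum at $s_{15}$.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def hill_climbing(initial: S,
                  f: Callable[[S], float],
                  neighborhood: Callable[[S], Iterable[S]]) -> S:
    """Algorithm 2 (maximization): move to the best neighbour until none improves."""
    next_solution = initial
    while True:                                   # repeat ...
        current = next_solution
        for x in neighborhood(current):
            if f(x) > f(next_solution):
                next_solution = x
        if f(next_solution) <= f(current):        # ... until no improvement
            return current


# Hypothetical fitness values for s_1 .. s_19: local optimum at s_3, global at s_15.
FITNESS = [1, 3, 5, 3, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 8, 6, 4, 2]


def fitness(i: int) -> float:
    return FITNESS[i - 1]


def line_neighborhood(i: int) -> list[int]:
    """Neighbours of s_i are s_{i-1} and s_{i+1}, when they exist."""
    return [j for j in (i - 1, i + 1) if 1 <= j <= len(FITNESS)]


print(hill_climbing(2, fitness, line_neighborhood))   # 3: stuck in the local optimum
print(hill_climbing(10, fitness, line_neighborhood))  # 15: reaches the global optimum
```

The result depends only on the basin in which the initial solution lies, which is the sensitivity to the initial solution listed among the drawbacks of HC.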
On the contrary, using the same initial solution, SA has a certain probability of reaching the global optimum $s_{15}$ directly⁶.

2.3.3 Tabu search

The basic ideas of Tabu Search were proposed by Glover in [118]. This technique uses an adaptive memory that guides the search process, avoiding searching in circles through the solution space. This memory scheme is implemented using data structures that store either visited solutions (tabu list), some components of those solutions, or even the frequency of appearance of some components in the solutions visited. If a solution is identified as tabu by the memory structure, the search discards it as the next solution to explore, and the search is driven to a different area of the solution space $A$. In order to avoid discarding promising solutions, an aspiration criterion is implemented. For instance, a usual aspiration criterion is allowing the selection of a tabu solution if it improves the current solution by a given percentage.

Algorithm 4 shows the pseudo-code of Tabu Search. The neighborhood is explored using pickNeighbor, which returns a random neighbor of the solution provided as parameter, and by direct enumeration of the neighborhood (neighbors in Algorithm 4)⁷. Subroutine tabu checks whether a solution is marked as tabu given the current state of the tabu memory.

⁶ It is worth noting that the probabilities shown in Figure §2.3 are not the probabilities of reaching each solution from the previous one in the path, but the acceptance probabilities according to the Boltzmann distribution. For instance, the probability of going from $s_2$ to $s_3$ is not the probability of acceptance of $s_3$ given the current temperature $\tau$ (whose value is 1), since there is a probability that the alternative neighbour $s_1$ is chosen (the order of evaluation of the available neighbours is random).

⁷ Real implementations of Tabu Search (such as those provided by FOM, cf. Section §8.4) usually perform the exploration of the neighborhood through the generation of non-tabu moves.
This modification avoids enumerating all the neighbors of the current solution, which significantly improves performance on large neighborhood structures.

Figure 2.3: Search paths generated by HC and SA

Algorithm 4 Tabu Search
  bestSolFound ← initialSolution
  currentSolution ← initialSolution
  nextSolution ← initialSolution
  InitializeTabuMemory
  {Main Loop}
  repeat
    nextSolution ← pickNeighbor(currentSolution)
    {Neighborhood Exploration}
    for all x ∈ neighbors(currentSolution) do
      if ¬tabu(x) ∨ aspiration criterion is satisfied then
        if f(x) > f(nextSolution) then
          nextSolution ← x
          if f(nextSolution) > f(bestSolFound) then
            bestSolFound ← nextSolution
          end if
        end if
      end if
    end for
    updateTabuMemory
    currentSolution ← nextSolution
  until Termination Criterion is satisfied
  return bestSolFound

2.4 POPULATION METHODS BASED METAHEURISTICS

Population methods simultaneously manage a set of solutions in the search space in order to diversify the search. Two different population methods are used to solve the application problems addressed in this dissertation: evolutionary algorithms and path relinking.

2.4.1 Evolutionary Algorithms

Principles of biological evolution have inspired the development of a set of metaheuristic optimization techniques called Evolutionary Algorithms (EAs). In EAs, a set of candidate solutions is combined and modified iteratively to obtain better solutions. Each solution is referred to as an individual or chromosome, by analogy with the evolution of species in biological genetics, where the DNA of individuals is combined and modified along generations, enhancing species through natural selection. EAs are among the most widely used metaheuristics, having been applied successfully in nearly all scientific and engineering areas by thousands of practitioners [17, Section D]. All EA variants, such as genetic algorithms or evolutionary strategies, are based on the common working scheme shown in Algorithm 5. First, the initial population (i.e., a set of candidate solutions to the problem) is created. This initialization is usually performed by randomly sampling the solution space. The remaining steps of the algorithm are then iterated until the termination criterion is met.

Figure 2.4: Crossover operators with binary encoding

In order to create offspring, individuals need to be encoded, expressing their characteristics in a form that facilitates their manipulation during the rest of the algorithm. In biological genetics, DNA encodes an individual's characteristics on chromosomes that are used in reproduction and whose modifications produce mutants. For instance, classical encoding mechanisms in EAs are binary vectors encoding numerical values in genetic algorithms (binary encoding) [17, Sec. C1.2] and tree structures encoding the abstract syntax tree of programs in genetic programming [169].

In the main loop of the algorithm (see Algorithm 5), chromosomes are selected from the current population in order to create new offspring. In this process, better chromosomes usually have a greater probability of being selected, resembling natural evolution, where stronger individuals have more chances of reproduction. For instance, two classic selection mechanisms are roulette wheel and tournament selection [124]. When using the former, the probability of choosing a chromosome is proportional to its fitness (objective function evaluation), which determines the width of its slice in a hypothetical spinning roulette wheel. This mechanism is often modified by assigning probabilities based on the position of the chromosomes in a fitness-ordered ranking. When using tournament selection, a group of n chromosomes is randomly chosen from the population and a winner is selected according to its fitness. Once parents are chosen, crossover is performed.
It combines individuals and produces new individuals in a way analogous to biological reproduction. Crossover mechanisms depend on the encoding scheme used, but standard mechanisms for widely used encodings are available in the literature [17, Sec. C3.3]. For instance, two classical crossover mechanisms for binary encoding are one-point crossover [139] and uniform crossover [5]. When using the former, a random location in the vector is chosen as the break point, and the portions of the vectors after the break point are exchanged to produce offspring. When using uniform crossover, the value of each vector element is taken from one parent or the other with a certain probability, usually 50%. Figure §2.4⁸ shows an example of application of both crossover mechanisms with binary encoding. Figure §2.5(a) shows an illustrative application of crossover in our car design example: an F1 car and a small family car are combined by crossover, producing a sports car.

Figure 2.5: Sample crossover and mutation for car design solutions

At the mutation step, random changes are applied to the individuals. Changes are performed with a certain probability, where small modifications are more likely than larger ones. This step is crucial to prevent the algorithm from getting stuck prematurely at a locally optimal solution. An example of mutation in our car optimization problem is presented in Figure §2.5(b): the shape of a family car is changed by adding a rear spoiler, while the rest of its design parameters remain intact.

In order to evaluate the fitness of new and modified individuals, decoding is performed. It often happens that the changes performed in the crossover and mutation steps create individuals that are not valid designs or that break a constraint; such an individual is usually referred to as an infeasible individual [17], e.g., a car with three wheels.
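The selection, crossover and mutation operators described above can be sketched in Python for binary encoding. This is a minimal illustration, not the dissertation's code: all function names are hypothetical, roulette wheel selection assumes non-negative fitness values that are not all zero (as required by `random.choices` with weights), one-point crossover assumes vectors of at least two genes, and a per-gene bit flip is assumed as the mutation operator, which makes few flips more likely than many.

```python
import random


def roulette_wheel(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    weights = [fitness(c) for c in population]
    return rng.choices(population, weights=weights, k=1)[0]


def tournament(population, fitness, n, rng):
    """Randomly choose n chromosomes and return the fittest of them."""
    group = rng.sample(population, n)
    return max(group, key=fitness)


def one_point_crossover(p1, p2, rng):
    """Exchange the portions of both parents after a random break point."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def uniform_crossover(p1, p2, rng, prob=0.5):
    """Take each gene from one parent or the other with probability `prob`."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < prob:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2


def bit_flip_mutation(chromosome, rng, rate=0.1):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in chromosome]


rng = random.Random(42)
parents = ([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1])
print(one_point_crossover(*parents, rng))
print(uniform_crossover(*parents, rng))
print(bit_flip_mutation(parents[0], rng))
```

Using all-zeros and all-ones parents makes the effect of each crossover visible: every position of the first child holds the complement of the same position in the second child.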
Once an infeasible individual is detected, it can be either replaced by an extra correct one or repaired, i.e., slightly changed to make it feasible. Finally, individuals are evaluated and the next population is formed, in which individuals with better fitness values are more likely to remain. This process simulates the natural selection of the best adapted individuals, which survive and generate offspring, improving the species.

⁸ Inspired by Fig. 2 of [43].

Algorithm 5 Evolutionary Algorithm
  Initialize Population
  Encode Population
  bestSolFound ← decode(Population[0])
  for all chromosome ∈ Population do
    if f(decode(chromosome)) > f(bestSolFound) then
      bestSolFound ← decode(chromosome)
    end if
  end for
  {Main loop}
  repeat
    Parents ← crossoverSelection(Population) {Selection for crossover}
    Offspring ← crossover(Parents) {Crossover}
    Population ← mutation(Population) {Mutation}
    {Evaluation of the new population and the offspring}
    for all chromosome ∈ (Population ∪ Offspring) do
      if f(decode(chromosome)) > f(bestSolFound) then
        bestSolFound ← decode(chromosome)
      end if
    end for
    {Selection of surviving chromosomes (next population)}
    Population ← survivalSelection(Population ∪ Offspring)
  until Termination Criterion is satisfied
  return bestSolFound

Algorithm 5 shows the pseudo-code of an Evolutionary Algorithm. The crossover and mutation sub-routines implement the operators on the population: they take a set of chromosomes as parameter, return a set of chromosomes, and depend on the specific problem at hand and on the solution encoding used. Selection for crossover and for survival is performed through the crossoverSelection and survivalSelection sub-routines respectively, which are independent of the problem at hand.
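Algorithm 5 can be sketched in Python on a toy problem. Everything problem-specific below is an assumption chosen for illustration: the OneMax fitness (number of ones in a bit vector), identity decoding, binary tournaments for crossoverSelection, one-point crossover, bit-flip mutation applied to the population (following the structure of Algorithm 5), and truncation (keep the fittest) as survivalSelection. The function names are hypothetical.

```python
import random


def onemax(bits):
    """Toy fitness to maximize: number of ones in the bit vector."""
    return sum(bits)


def decode(chromosome):
    """Identity decoding: the bit vector already is the solution."""
    return chromosome


def evolutionary_algorithm(n_bits=20, pop_size=10, generations=50,
                           mutation_rate=0.05, seed=1):
    rng = random.Random(seed)

    def f(chromosome):
        return onemax(decode(chromosome))

    # Initialize (random sampling) and encode the population
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=f)
    for _ in range(generations):
        # Selection for crossover: binary tournaments
        parents = [max(rng.sample(population, 2), key=f)
                   for _ in range(pop_size)]
        # Crossover: one-point crossover on consecutive pairs of parents
        offspring = []
        for p1, p2 in zip(parents[::2], parents[1::2]):
            cut = rng.randint(1, n_bits - 1)
            offspring += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        # Mutation applied to the population, as in Algorithm 5 (bit flip)
        population = [[1 - b if rng.random() < mutation_rate else b for b in c]
                      for c in population]
        # Evaluation of the new population and the offspring
        candidate = max(population + offspring, key=f)
        if f(candidate) > f(best):
            best = candidate
        # Survival selection: keep the pop_size fittest chromosomes
        population = sorted(population + offspring, key=f,
                            reverse=True)[:pop_size]
    return decode(best)


best = evolutionary_algorithm()
print(onemax(best), best)
```

Because every random decision goes through a seeded `random.Random` instance, repeated calls with the same seed return the same result, which is convenient when runs must be reproduced.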
2.4.2 Path Relinking

Path relinking is a search procedure proposed by Glover [119] that generates new solutions by exploring trajectories that connect other solutions in the search space. The set of solutions that are connected is named “the elite set”. The procedure chooses an initiating or starting solution s from the elite set, and generates a path in the search space that leads toward another solution in the elite set, called the guiding or target solution t. Thus, a sequence $(s, x_1, x_2, \ldots, x_m, t)$ of neighboring solutions is generated from s to t, where $x_1 \in n(s)$, $x_2 \in n(x_1)$, $\ldots$, $t \in n(x_m)$. The best solution in the path is returned as the result. It is worth noting that, given a starting and a target solution, multiple neighboring paths between them are possible; as a consequence, a criterion is needed for choosing the neighbor that leads toward the target solution. Figure §2.6(a) depicts the process of path generation from a starting solution s to a target solution t. Figure §2.6(b) shows two different neighboring paths from s to t.

Figure 2.6: Path generation in PR. (a) A sequence of neighboring solutions is generated iteratively until reaching t. (b) Two different paths (red and black) between s and t.

Algorithm 6 shows the pseudo-code of Path Relinking. The pickNeighborTowards sub-routine returns a random neighbor of the first solution provided as parameter that approaches the second solution provided as parameter. The exhaustive exploration of the neighbors of a solution s that approach a target solution t is performed through the sub-routine neighborsTowards.⁹ The criterion for path neighbor selection in this specific algorithm is elitist, i.e., the best neighbor that leads towards t is chosen to create the path.

⁹ This pseudo-code has the disadvantage that, for large neighborhoods, the exhaustive exploration makes the search inefficient. In those situations, the technique is adapted to explore a fixed number of randomly chosen approaching neighbors (using the pickNeighborTowards sub-routine).

Algorithm 6 Path Relinking
  Initialize eliteSet
  bestSolFound ← best solution in eliteSet
  {Main Loop}
  repeat
    s ← pickStartSolution(eliteSet)
    t ← pickTargetSolution(eliteSet)
    currentSolution ← s
    bestSolPath ← s
    pathSize ← 0
    while (distance(currentSolution, t) > 0) ∧ (pathSize < maxPathSize) do
      nextSolution ← pickNeighborTowards(currentSolution, t)
      for all x ∈ neighborsTowards(currentSolution, t) do
        if f(x) > f(nextSolution) then
          nextSolution ← x
        end if
      end for
      if f(nextSolution) > f(bestSolPath) then
        bestSolPath ← nextSolution
      end if
      currentSolution ← nextSolution
      pathSize ← pathSize + 1
    end while
    if f(bestSolPath) > f(bestSolFound) then
      bestSolFound ← bestSolPath
    end if
  until Termination Criterion is satisfied
  return bestSolFound

Figure §2.7 shows a concrete example of two paths between initial and final solutions in the context of the binary-encoded car design. Each node (bit vector) in the path encodes a design that becomes gradually more similar to the guiding solution (the differences are removed step by step).

Figure 2.7: Paths between binary encoded solutions of the car design problem.

2.4.3 Particle Swarm Optimization

This technique is a stochastic algorithm inspired by the behaviour of bird flocking and fish schooling. The algorithm iteratively modifies a population of solutions (named the swarm), whose interactions are expressed as equations. Solutions in the swarm are represented as particles in an n-dimensional space, each with a position and a velocity. The original proposal by Kennedy and Eberhart has been applied successfully to a variety of problems [54, 216]. This technique has also been adapted to support discrete variables, and different equations to rule swarm interaction have been proposed [49, 230, 278, 297].
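Since the swarm equations are not given here, the following Python sketch assumes one widely used formulation, which adds an inertia weight w to the original proposal: each particle's velocity is updated as v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), with r1 and r2 uniform random numbers, and its position as x ← x + v. The parameter values, the search bounds, the clamping of positions, and the toy objective (the negated sphere function, maximized as in the rest of this chapter) are illustrative assumptions.

```python
import random


def neg_sphere(x):
    """Toy objective to maximize; its optimum value 0 is at the origin."""
    return -sum(xi * xi for xi in x)


def pso(f, dim=2, swarm_size=15, iterations=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=3):
    rng = random.Random(seed)
    lo, hi = bounds
    # Each particle has a position and a velocity in a dim-dimensional space
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]           # best position seen by each particle
    gbest = max(pbest, key=f)[:]          # best position seen by the swarm
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            if f(pos[i]) > f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) > f(gbest):
                    gbest = pbest[i][:]
    return gbest


best = pso(neg_sphere)
print(best, neg_sphere(best))
```

The cognitive term (weighted by c1) pulls each particle towards its own best position, while the social term (weighted by c2) pulls it towards the best position found by the swarm; this is how the interactions among particles are expressed in this formulation.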
2.4.4 Scatter Search

Scatter search, initially proposed by Glover, operates on a set of solutions, the reference set, by combining existing solutions to create new ones. In contrast to other evolutionary methods such as genetic algorithms, scatter search is based on systematic designs and methods, where new solutions are created from the linear combination of two solutions of the reference set, using strategies for search diversification and intensification.

2.5 BUILDING METHODS

Building methods work with incomplete solutions, adding elements iteratively until a feasible solution for the problem at hand is obtained. The metaheuristics of this class shown in Figure §2.2 are GRASP and Ant Systems. Next, we describe GRASP in detail, since it is used in the validation of this dissertation.

2.5.1 GRASP

The Greedy Randomized Adaptive Search Procedure (GRASP) [89] is an iterative optimization technique that has been successfully applied to a plethora of real-life applications and research problems [91]. Each GRASP iteration consists of constructing a solution and then applying an improvement procedure, typically a local search method. Iterations are completely independent, without any memory, and could lead to creating the same solution. GRASP is efficient only if the construction step samples different promising regions of the search space; thus, it creates a feasible solution using a randomized greedy algorithm, as shown next. This algorithm iteratively adds elements to a partial solution. At each iteration of the construction phase, a Restricted Candidate List (RCL) is generated, containing a subset of the candidate elements that could be added to the current partial solution. The RCL is the key element of GRASP, determining the greedy and stochastic behaviour of the technique. The greedy behaviour is based on a greedy function $g: E \to \mathbb{R}$, which is used to decide whether a candidate element $e \in E$ is in the RCL or not.
The stochastic behaviour is caused by the random selection from the RCL of the element e to be added to the current partial solution. The most usual criteria used to decide which elements are in the RCL are:

• Cardinality based: the RCL comprises the p best elements (according to the value provided by g for all the candidate elements).
• Value based: the RCL comprises the elements $e_i$ that are better than a given threshold t on g, i.e., $e_i \in \mathit{RCL} \Leftrightarrow g(e_i) \geq t$. This threshold is usually computed from a parameter α and the maximum $g_{\max}$ and minimum $g_{\min}$ values of g for the elements in E, as $t = g_{\max} - \alpha\,(g_{\max} - g_{\min})$. The parameter α controls the balance between greediness and randomness of the algorithm: $\alpha \approx 1$ implies that the construction procedure is almost random, whereas $\alpha \approx 0$ implies that the g-best element is chosen at each iteration.

Once an element is added to the partial solution, the RCL is updated. The construction phase is extremely important for the success of GRASP [222, 235], since it must provide a proper balance between diversification and intensification in the search. The factors that affect this balance are the specific greedy function used and the RCL membership criterion. In the GRASP algorithm (see Algorithm 7), the sub-routine named greedyRandomizedSolution() implements the construction of new solutions. This sub-routine is described in detail in Algorithm 8.

Algorithm 7 GRASP main loop
  bestSolFound ← randomSolution()
  {Main loop}
  repeat
    currentSolution ← greedyRandomizedSolution() {Construction}
    currentSolution ← improve(currentSolution) {Improvement}
    if f(currentSolution) > f(bestSolFound) then
      bestSolFound ← currentSolution
    end if
  until Termination Criterion is satisfied
  return bestSolFound

Algorithm 8 GRASP Construction Phase
  currentSolution ← neutralSolution()
  validElements ← {}
  RCL ← {}
  {Main construction loop}
  repeat
    validElements ← elements(currentSolution)
    RCL ← selectCandidateElements(validElements, g)
    chosenElement ← selectElement(RCL)
    addElement(currentSolution, chosenElement)
  until isComplete(currentSolution)
  return currentSolution

2.5.2 Ant Colony Optimization

This technique, a.k.a. Ant Systems (AS), was originally proposed by Dorigo and Gambardella. It is a probabilistic optimization algorithm inspired by the food foraging behaviour of ants, in which (artificial) ants communicate through a data structure called the “pheromone trace”.

2.6 METAHEURISTIC OPTIMIZATION FRAMEWORKS

Solving problems with metaheuristics can be aided by software tools, ranging from problem specification languages and editors with integrated solvers (such as COMET, OptQuest, or OPL Studio) [132, 274, 279] to statistical analysis packages such as SPSS or R [226]. In particular, when new problems are considered, metaheuristics must be implemented and tested, which implies costs and risks. The object-oriented paradigm has become a successful mechanism to ease the burden of application development and, particularly, of adapting a given metaheuristic to the specific problem to solve. Based on this paradigm, there are some proposals that jointly offer support for the most widespread techniques, platforms and languages. These kinds of approaches are named Metaheuristic Optimization Frameworks (MOFs).

According to Parejo et al.
[213], a MOF can be defined as: “a set of software tools that provide a correct and reusable implementation of a set of metaheuristics, and the basic mechanisms to accelerate the implementation of its partner subordinate heuristics (possibly including solution encodings and technique-specific operators), which are necessary to solve a particular problem instance using techniques provided”. Figure §2.8 depicts a conceptual map showing these elements and their relationships, in which MOFs and their components are shaded.

Figure 2.8: MOFs conceptual map

MOFs not only provide a set of implemented techniques; they also simplify the adaptation of those implementations to the problem. MOFs also provide additional tools that help throughout the whole optimization problem-solving process, such as mechanisms to monitor the optimization processes, supporting tools to determine appropriate values for the parameters of techniques, and tools to identify the reasons that prevent techniques from finding optimal solutions.

2.6.1 Why are MOFs valuable?

The No Free Lunch (NFL) theorem [300] states that there is no strategy or algorithm that generally behaves better than another over the entire set of possible problems. In Ho and Pepyne's words, “Universal optimizers are impossible” [135]. The NFL theorem has been used as an argument against the use of MOFs, since there can be neither a universal optimal solver nor a software implementation of it [281, Chapter 4, pp. 82-83]. Nevertheless, frameworks are not intended to be a universal optimal solver, but tailorable tools that reduce the cost and effort of implementing a solver for a given problem. The NFL theorem implies the need to “match” a problem and the optimization technique used to solve it in order to obtain optimal or near-optimal solutions. Metaheuristics allow such matching by adapting their underlying heuristics.
The purpose of MOFs is to make such adaptation mechanisms more reusable and easier to apply. Furthermore, when trying to solve a new problem without specific knowledge (regarding well-known similar problems and their best-matching techniques), it is even advantageous to use several metaheuristics to increase the chances of a proper matching to the problem. The benefits of using MOFs that implement several metaheuristics are even more obvious in this scenario.

The main advantage of using MOFs is that they provide validated, fully functional and optimized versions of a set of metaheuristic techniques and their variants. They also provide mechanisms that facilitate the proper implementation of the underlying heuristics, depending on the problem, the representation of solutions, etc. As a consequence, we only have to implement those elements directly related to the problem, freeing us, as far as possible, from worrying about the aspects that do not depend on it. In addition, the use of MOFs decreases the risk of bugs in the implementation and therefore the time (and associated cost) invested in testing and debugging. Some MOFs provide additional features to aid in solving the optimization problem, such as optimization process monitoring and results analysis tools, capabilities for the parallel and distributed execution of optimization tasks, supporting mechanisms for determining the parameter values of techniques, graphical reports and user-friendly interfaces.

2.6.2 Drawbacks: All that glitters ain’t gold

MOFs also have some drawbacks. One is their steep learning curve. The user needs to know the set of variation and extension points to use in order to adapt the framework to the problem, and to understand how they are related to the behavior of the software. This means that when we know exactly which technique to apply and we are confident in our implementation skills, using a MOF may be discouraging unless we have expertise in using it.
Another drawback to consider when using MOFs is that the flexibility to adapt the MOF is limited by its design. Consequently, a proper framework design is essential to achieve the most favorable balance between the capabilities provided and its flexibility. This drawback implies that it could be impossible to implement certain variants or modify certain behaviors when using a MOF, which is especially serious in the context of research, where experimentation with different variants and customization capabilities are key features (cf. Sec. §2.6.1). Increased testing and debugging complexity is a further disadvantage, resulting from the inversion of control (i.e., the loss of explicit control over the execution flow of our application) that the use of a framework involves. Finally, the use of MOFs implies increasing the size of the software, creating dependencies on third-party libraries, and increasing the complexity of the application.

Table 2.1: Pros and cons of using MOFs
Advantages | Drawbacks
Reduced implementation burden and ability to apply various techniques and variants with little additional effort | Steep learning curve
Additional tools to help problem solving (monitoring, reporting, parallel and distributed computing) | Advanced knowledge is needed to make adaptations; lack of flexibility to implement variants of metaheuristics
Optimized and validated implementation (except the extensions and adaptations created by users, or the undetected errors potentially present in the MOF) | Induced complexity (when debugging and testing) and additional dependencies
Users with little knowledge can use the framework not only as a software application development environment but also as a methodological aid | The choice of the right MOF may be an issue, since switching from one MOF to another has a high cost, MOFs provide diverse features, and there are no comparative benchmarks in the literature

2.7 SUMMARY

In this chapter, basic concepts about optimization have been presented.
Next, metaheuristics have been defined, showing their relationship with other types of optimization problem-solving techniques. In order to consolidate such concepts, the algorithmic scheme of the metaheuristics used in the applications and validation of this dissertation has been described. Finally, the types of software tools used for solving optimization problems with metaheuristics have been presented, focusing on MOFs, since they are probably the most widely used type of tool.

3 EXPERIMENTATION

“The use of precise, repeatable experiments is the hallmark of a mature scientific or engineering discipline.”
Lewis et al. [173]

This chapter provides the basic concepts on experimentation required to understand the contributions of this dissertation. First, Section §3.1 provides a definition of experiment and describes its life-cycle. Section §3.2 presents two sample experiments used throughout this and subsequent chapters. Next, Section §3.3 introduces the basic concepts related to the description of experiments, including variables, hypotheses and design. Section §3.4 describes the concepts and techniques used during the execution of experiments and the statistical analysis of their results. The concept of experimental validity is presented in Section §3.5. Section §3.6 describes the specific issues of MOEs and the usual types of experiments performed in the context of the MPS life-cycle. Additionally, Sections §3.6.3 and §3.6.4 respectively describe some specific experimental designs and analyses proposed in the literature for MOEs.

3.1 THE CONCEPT OF EXPERIMENT

Experimentation¹ refers to a methodical procedure of actions and observations with the goal of empirically verifying or falsifying a hypothesis [116, 154, 248]. Shadish et al. [248] define an experiment as “A study in which an intervention is deliberately introduced to observe its effect”.
For instance, in the context of a medical experiment with a new antipyretic (a drug used to reduce fever), the intervention would be to treat feverish patients with the drug, and the observation would be to measure the reduction of body temperature over time. Related concepts are natural experiments and correlational studies [248]. Natural experiments are studies where the cause usually cannot be manipulated, i.e., there is no intervention, and the study contrasts a naturally occurring event, such as the temperature of the ocean. Correlational studies (a.k.a. non-experimental or observational studies) simply observe the size and direction of a relationship among variables, without establishing a causal relationship or generalizing the relationship beyond the observed data. Natural experiments and correlational studies are not experiments according to the definition provided by Shadish et al. In this dissertation, the concept of experiment is more general, similar to that used in [116], and includes both natural experiments and correlational studies; namely, an experiment is a process of systematic inquiry and data collection with the aim of confirming or disproving a hypothesis.

Thus, an experiment is not a static object but a process that flows through an ordered sequence of activities, from the formulation of the experimental hypothesis to the drawing of conclusions regarding that hypothesis. Figure §3.1 depicts this process as a life-cycle. The first activity in the experimentation life-cycle is the statement of the experimental hypothesis. According to [116], this activity comprises several steps: the identification of the research problem or question, a survey of the literature to check whether the question has already been answered by others, and the reduction of the research question to a testable hypothesis.
For instance, this reduction would transform the question “is the new antipyretic drug effective?” into a hypothesis such as “the average reduction of body temperature in feverish patients, measured 2 hours after the administration of a dose of 100mg of the drug, is statistically significant”.

¹ There is no absolute consensus about experimental terminology in the research community [116, preface], [248, page 12]. This dissertation uses the terminology and definitions provided by Gliner et al. in [116] for the basic concepts. The terminology defined in [248] is used for threats to internal validity, since it is more appropriate for computational experiments.

Figure 3.1: Experimental life-cycle

The design of the experiment is the next activity. According to [116], it comprises the selection of a sample for the experiment, the identification of the instruments and artefacts required for experimentation, and the creation of a plan for data collection and analysis. The output of this activity is thus a detailed plan, called the experimental protocol, that tries to maximize the information obtained from the experiment. Next, the experimental artefacts are developed. In social sciences, this activity usually involves creating the forms used to collect information, and a database or some other computer support for storing the information. In computational experiments, this activity also involves the implementation of the algorithms. Once all experimental artefacts are available, the experiment is conducted and data is collected. Next, the data is analysed, usually through inferential statistics. Finally, based on the results of this analysis, either conclusions are drawn or the experimental hypothesis and design are modified, leading to a new execution of the life-cycle to reach new conclusions.
3.2 SAMPLE EXPERIMENTS

Throughout this chapter, two basic experimentation scenarios are used to illustrate the basic concepts of experimentation. The first one is taken from the medical area, and the other from the specific area of application of this thesis: optimization problem solving with metaheuristics. This selection is intentional, since our aim is to show that the experimental concepts presented in this chapter are general and potentially applicable to any area of human knowledge.

Experiment #1. The goal of this experiment is to discern whether a new drug can be used as an antipyretic and at which doses. A set of individuals with fever will be used in this experiment to evaluate the effects of the drug. The reduction of the body temperature of patients will be measured, and the reduction induced by the new drug will be compared with the reduction generated by previous alternatives and placebos.

Experiment #2. The goal of this experiment is to compare the effectiveness of several metaheuristic techniques in solving optimization problems in quality-driven web service composition². In particular, the metaheuristics to be compared are Tabu Search (TS), Simulated Annealing (SA), Evolutionary Algorithms (EA), Greedy Randomized Adaptive Search Procedure (GRASP) and GRASP hybridized with Path Relinking (GRASP+PR). For the experiment, each metaheuristic will be implemented in a specific metaheuristic program. Then, each program will be run to find solutions for a set of given optimization problems, and their results will be compared.

3.3 EXPERIMENTAL DESCRIPTION

In this section, we focus on the concepts regarding the initial activities of the experimental life-cycle. These mainly refer to the selection of the experimental objects and subjects, the hypothesis statement and the experimental design. Figure §3.2 depicts the concepts described in this section and their relationships as a conceptual map.
² This problem is described in Section §9.2.

3.3.1 Objects, subjects and populations

The first elements to be defined in any experiment are the experimental objects (a.k.a. experimental units). The experimental objects are the elements of interest that participate in a particular experiment. In the social and biological sciences, these objects are usually people, and are called participants or individuals. For instance, in experiment #1 the experimental objects are sick people with fever. In experiment #2, the experimental objects represent algorithm runs.

Figure 3.2: Conceptual map about experimental description

The target population of an experiment is defined as the set of experimental objects to which we would like to generalize the conclusions obtained from the experiment, i.e., the subject of the hypothesis of the experiment. The target population of experiment #1 could be all adult humans with fever. The target population of experiment #2 could be all the runs of an algorithm on any instance of the optimization problem to solve. Since the target population of an experiment can be huge or even infinite, the researcher usually uses a sample of it in the experiment. The sampling of the target population can be divided into two phases. First, the accessible population (a.k.a. sampling frame) is defined. The accessible population is the set of experimental individuals that could participate in the experiment. For instance, the accessible population of experiment #1 could be the set of patients diagnosed with fever in a specific hospital in the province of Seville. The selected sample is the set of experimental objects, taken from the accessible population, that actually participate in the experiment. This selection can be performed in a number of different ways, from random to convenience sampling.
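The two sampling phases can be illustrated with a short Python sketch for experiment #1. The accessible population below is a hypothetical list of patient identifiers (the IDs and its size of 200 are made up), and the function names are hypothetical: random sampling draws without replacement from the sampling frame, whereas a convenience sample simply takes the most readily available individuals (here, assumed to be the first ones in the list).

```python
import random

# Hypothetical accessible population (sampling frame): patients diagnosed
# with fever in one hospital, identified by made-up IDs.
accessible_population = [f"patient-{i:03d}" for i in range(1, 201)]


def random_sample(frame, size, seed=None):
    """Simple random sampling without replacement from the sampling frame."""
    return random.Random(seed).sample(frame, size)


def convenience_sample(frame, size):
    """Convenience sampling: take the first available individuals."""
    return frame[:size]


selected = random_sample(accessible_population, 20, seed=7)
print(len(selected), selected[:3])
print(convenience_sample(accessible_population, 3))
# ['patient-001', 'patient-002', 'patient-003']
```

Only the random sample gives every patient in the frame the same chance of being selected, which is what supports generalizing from the selected sample to the accessible population.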
For instance, the selected sample of experiment #1 could be 20 individuals with fever chosen at random from the Seville hospital.
Experimental subjects (a.k.a. experimenters) are the people who apply the methods, techniques or treatments to the experimental objects. For instance, in experiment #1, the specific doctors or nurses that administer the drug to the patients are the experimental subjects. In some cases, experimental subjects may influence the results of the experiments and therefore they must be controlled during the design of the experiment, i.e., experimental subjects are treated as a variable.
Figure 3.3: Experimental objects, populations and sample
3.3.2 Variables
An experimental variable is defined as a characteristic of the experimental objects or of the experimental environment that can take different values. For instance, in experiment #2 the specific metaheuristic technique applied for solving a problem in a run is a variable, since it varies between runs. When a relevant characteristic has only one value in the context of an experiment, it is a constant (a.k.a. parameter). For instance, in experiment #2 the termination criterion is the same for all the metaheuristics in the experiment, thus it is a constant.
The roles that a variable can play in an experiment are outcome (a.k.a. output, response or dependent variable) and factor (a.k.a. independent variable or input). This taxonomy is depicted in Figure §3.4 as a UML class hierarchy.
Figure 3.4: Taxonomy of experimental variables
The outcome is the presumed result of the experiment, and its value is used for testing the hypothesis of the experiment. It measures or assesses the effect of the factor(s). For instance, in experiment #2 the value of the objective function for the best solution found by each metaheuristic is the outcome. The outcome of experiment #1 is the difference in body temperature two hours after the administration of the dose.
Factors can be classified into two types: controllable factors and non-controllable factors. A controllable factor (a.k.a. active independent variable) is a variable whose value is applied or given to the experimental objects. The values of this kind of variable are usually controlled or manipulated in some way by the experimenter. A controllable factor in experiment #1 is the dose administered to the patients. A controllable factor in experiment #2 is the specific optimization technique applied in a run. The values of a non-controllable factor are not changed during the study. For instance, the age and the gender of the patients in experiment #1 are non-controllable factors, since they are not modified by the experimenters but differ among patients.
The different values of a factor could make the outcomes of the experiment non-comparable. When this risk exists, the experimental objects must be grouped into blocks according to the value of the factor, which is then used as a so-called blocking variable. This creates homogeneous blocks within which the outcomes of the different treatments are comparable. For instance, in experiment #1, the effects of the drug measured in children and in adults might not be comparable. Thus, the age of the patients could be used as a blocking variable, dividing patients into two blocks: those under 18 and those aged 18 and older.
If a factor is of no particular interest in an experiment, but could be useful in subsequent replications or its impact on the response is unknown, it is called a nuisance variable (a.k.a. extraneous variable). Nuisance variables must be ruled out or controlled in order to ensure the validity of the experiment. According to [116], one way to control the effect of this kind of variable on the conclusions of the experiment is the random assignment of experimental objects to experimental groups (this type of assignment procedure is described below).
For instance, in experiment #1 the sex of the patients is not of interest, but its impact on the effectiveness of antipyretics is unknown. Consequently, the specific drug and dose administered to each patient should be chosen randomly. Randomization ensures that the effect of the sex factor is averaged across the results of all the drugs and doses, reducing the bias introduced in the results.
The levels of a variable are the set of values that it can take in the context of the experiment. For instance, in experiment #1, the levels of the “dose” variable could be 0mg, 100mg, 200mg, etc. In experiment #2, the factor “metaheuristic” has five levels: EA, TS, SA, GRASP and GRASP+PR. One important characteristic of variables is whether their levels are unordered categories or are ordered from low to high³. For instance, the levels of the “metaheuristic” variable in experiment #2 are not ordered, since they are essentially labels. Variables whose levels are not ordered are said to be nominal variables. Conversely, ordered variables have a set of values that vary from low to high within a certain range. Depending on the measurement scale of the levels, ordered variables can be divided into ordinal and real variables. In ordinal variables, the levels are ordered from low to high in a ranking, but the intervals between the various ranks are not necessarily equal. For instance, in an F1 race the second-place car may finish twenty seconds after the winner but only a fraction of a second before the third-place car. Real variables (a.k.a. scalar variables) not only have ordered levels, but the values associated with those levels are also equally spaced. These variables are called rational (ratio) or intervalar (interval) variables depending on whether or not they have a true zero value. It is worth noting that the kind of variable may not be directly related to the output of the mechanism used to measure it, or to the nature and range of its values in the real world.
In this sense, we can have an ordinal variable whose levels are “low” and “high”, but whose values are measured as real values ranging from 0 to 50 (we must then define the two intervals [0, X] and (X, 50] that determine when a real value is low or high). In the same way, we can have two variables that are both rational, but whose levels have integer and floating point values respectively. The taxonomy of variables according to their levels is depicted in Figure §3.5 as a UML class hierarchy.
³ Here we describe the most common sorts of measurement scales, following [88], [162] and [154].
Figure 3.5: Taxonomy of experimental variables according to their levels
The experimental objects that have the same levels for the factors are usually arranged into groups. In this sense, a group denotes a specific set of individuals subject to specific experimental conditions throughout the conduction process. The mechanism used to decide which experimental individuals belong to each group, i.e., which treatment will be applied to them, is called assignment.
3.3.3 Hypotheses
The hypothesis of an experiment is a statement about its variables. This hypothesis can be related to an underlying theory that predicts that the hypothesis holds. The objective of an experiment is to disprove or confirm the hypothesis. According to Popper’s concept of scientific truth, theories whose predictions have not been disproved by experiments, and for which no alternative theory is available, are considered true [48]. Scientific hypotheses are divided into three types [116, p. 38]: difference (or differential), associational and descriptive.
Differential hypotheses
Differential hypotheses state that there is a difference in the value of the outcome between two or more groups, that is, that the factors make a significant difference on the outcome.
For instance, a differential hypothesis for experiment #2 is that the specific metaheuristic used for optimization (controllable factor) has a significant impact on the value of the objective function (outcome). Another typical differential hypothesis, common in medicine, is that a specific drug has an impact on the symptoms of a particular disease, e.g., whether or not the drug actually reduces the fever in experiment #1.
Associational hypotheses
Associational hypotheses state that there is a specific relationship between the levels of two variables. For instance, a typical associational hypothesis is that of a linear dependence between the values of two variables x and y, where y = a ∗ x + b. A possible associational hypothesis for experiment #1 would be that “the decrease in body temperature is proportional to the dose”.
Descriptive hypotheses
Descriptive hypotheses state that the variables have some value, central tendency or variability, and summarize the data obtained. In this sense, descriptive hypotheses usually do not have factors, and their assertions refer only to the current sample; i.e., the target population, the accessible population and the sample are the same. A sample descriptive hypothesis for experiment #1 is that the reduction of body temperature of the specific individuals treated with the antipyretic in the experiment ranges between 0.4 and 2.6 degrees. This kind of hypothesis allows correlational studies and natural observations to be included as experiments; such studies are usually performed to explore new questions, or are the only data available at the moment on a subject.
3.3.4 Design
Experimental design is what differentiates scientific and engineering experiments from a careful natural observation [154, 248]. The main point of experimental design is controlling factors.
Montgomery defines experimental design as “the process of planning the experiment so that appropriate data can be collected and analysed by statistical methods, resulting in valid and objective conclusions (regarding the experimental hypothesis)”. We further refine Montgomery’s definition, based on the description by Hinkelmann and Kempthorne [134], as follows: Experimental design is the specification of (i) the actions to be performed during experimental conduction, regarding the levels of the variables involved, (ii) the specific experimental objects on which they will be performed, and (iii) the arrangement of (i) and (ii) with the aim of minimizing experimental error and systematic bias. Thus, an experimental design should specify:
• Selection of experimental objects. This is about selecting, from the accessible population, the experimental objects that will take part in the experiment. This is usually done by means of the algorithm used to perform the selection, not by enumerating the experimental objects explicitly. This algorithm is called the selection method. For instance, in experiment #1 the selection could be performed by randomly choosing 50 individuals among all the feverish patients in the accessible population.
• Specification of variables and levels. This determines the factor variables that will be modified by the experimental subjects during experimental conduction and the levels that will be set. The modification of the level of a factor variable is usually referred to as a treatment. For instance, in experiment #1 the action associated with treatments is the administration of the drug, and the levels of the corresponding factor “dose” could be 0mg (i.e., a placebo), 100mg and 200mg.
• Specification of treatments and groups. It specifies the experimental objects that will receive each specific treatment.
The set of experimental objects that receive the same sequence of treatments is usually called a group. This specification is also performed by means of the algorithm used to assign experimental objects to groups, not by enumerating the experimental objects explicitly. This algorithm is usually named the assignment method. For instance, in experiment #1, the assignment method used to decide which treatments are applied to each one of the 50 individuals in the sample could be random.
• Treatment and measurement sequence. The specific sequence in which the experimental objects receive the treatments and the outcome variables are measured.
Those details are finally expressed in an experimental protocol. It is worth noting that the experimental design determines the information that will be gathered from the experiment and the capability of the subsequent analyses to disprove or confirm the hypothesis. Thus, the consistency of the experimental design with regard to the experimental hypothesis is a crucial characteristic of any experiment. Experimental design is also intimately related to the internal validity of the experiment, since the specific arrangement of the treatments and measurements, and the methods used for the selection and assignment of experimental objects, can avoid most threats to validity (cf. Section §3.5).
Principles of experimental design
In order to ensure the validity of the analysis, and to increase its capability of providing clear conclusions regarding the hypothesis, experimental designs should fulfil three basic principles [116, 134, 195]:
1. Repetition. This principle establishes that each treatment must be repeated a number of times on different experimental objects. The pursued goal is to reduce the bias introduced by the specific characteristics of every single experimental object in the observations of the outcome variable.
For instance, in experiment #1, the effect of each specific dose and antipyretic drug should be measured on several patients. Determining the right number of repetitions requires some information about the treatments, the experimental objects and the distribution of the outcome. This information is usually known only to experimenters who are experienced in the experimental domain. If such information is not available, one option is to use values accepted in the literature of the domain. For instance, for simple comparison experiments with normally distributed outcomes, a sample size of 30 experimental objects, with a single treatment and measurement per object, is widely accepted as a reasonable minimum [248].
2. Randomization. This principle establishes that the decision on which treatment should be applied to each experimental object must be made according to a random scheme. The pursued goal is to reduce the bias introduced when all the repetitions of a treatment are performed on individuals with similar characteristics. For instance, in experiment #1, if the new drug were administered only to the youngest patients, the results of the study could be biased, since younger patients may be more sensitive to antipyretics.
3. Local control or blocking. The basic idea behind this principle is that when there are factors that make the outcomes of the experiment non-comparable, the selected sample should be partitioned into blocks that are as homogeneous as possible. Within those blocks, observations are comparable. The groups, treatments and observations are then replicated for each block. For instance, in experiment #2 we should have one block per problem instance, since executions of algorithms on different problem instances are not comparable.
Next, some classical experimental designs that comply with these principles are presented.
Complete randomized design
The simplest design for differential hypotheses is the completely randomized design.
Given t treatments and N = t ∗ r homogeneous experimental objects, the N experimental objects are partitioned into t experimental groups of r elements each, each group receiving a different treatment, and the experimental objects are assigned randomly to the experimental groups. The outcome is measured once the treatment has been applied to each experimental object. Thus, the generated dataset contains N values of the outcome in total, and r observations for each specific treatment. This design requires that both the assignment of the experimental objects to groups and the specific treatment applied to each group are random, and that there are no blocking variables.
For instance, let us suppose that a single problem instance is going to be optimized in experiment #2. In those circumstances, a complete randomized design would be appropriate. We could set the number of experimental objects (algorithm runs) to 100, with one group for each level of the factor “metaheuristic” (TS, SA, EA, GRASP and GRASP+PR). Each group would comprise 20 experimental objects (algorithm runs). The specific order of execution of the runs would be chosen randomly.
Randomized complete block design
There are many situations with systematic variation among sets of experimental objects depending on their factors. In such situations, a completely randomized design is not appropriate, since the design should take this variation into account to “eliminate” its effect on the conclusions, making observations comparable. This leads us to the concept of local control or blocking, introduced by Fisher [93]. For instance, in experiment #2 the variable “instance” has a strong impact on the value of the outcome “objectiveFunction”. Consequently, the results obtained by the techniques on different problem instances cannot be compared. Randomized complete block designs are used for experiments with differential hypotheses and a single blocking variable.
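The completely randomized design for the single-instance version of experiment #2 described above (100 runs, 20 per metaheuristic) can be sketched as a run-list generator. This is a minimal Python sketch, not the tooling of this dissertation: since all algorithm runs are homogeneous experimental objects, we assume that randomly assigning runs to groups amounts to shuffling their execution order. The function name and the seed are hypothetical.

```python
import random

METAHEURISTICS = ["TS", "SA", "EA", "GRASP", "GRASP+PR"]


def completely_randomized_design(treatments, r, seed=None):
    """Return the run list of a completely randomized design.

    Each of the t treatments gets r runs, so N = t * r runs in total;
    the order in which the N runs are executed is random.
    """
    rng = random.Random(seed)
    runs = [treatment for treatment in treatments for _ in range(r)]
    rng.shuffle(runs)  # random execution order = random assignment of runs
    return runs


plan = completely_randomized_design(METAHEURISTICS, r=20, seed=1)
print(len(plan), plan.count("TS"))  # 100 20
```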
In a randomized complete block design, the selected sample is divided into b sets of homogeneous experimental objects called blocks, and a complete randomized design is performed on each block. Thus, the generated dataset contains b ∗ N values of the outcome variable in total, and b ∗ r observations for each specific treatment. For instance, in experiment #2 a randomized complete block design could specify a number of objects (algorithm runs) equal to 500. There would be 5 groups, corresponding to the levels of the factor variable “metaheuristic”, and 10 blocks, one per level of the blocking variable “instance”. Within each block, each group (and the corresponding algorithm) would comprise 10 experimental objects (runs), executed in random order. Thus, the generated dataset contains 500 values of the outcome variable “objectiveFunction” in total, 100 values for each specific optimization technique, and 50 values for each specific problem instance.
Latin square
The Latin square design is used for experiments with differential hypotheses and two blocking variables. It is used to compare t different treatments in a matrix with t rows and t columns. The rows and columns actually represent two restrictions on randomization. In this design, a single treatment is used for each combination of levels of the two blocking variables. This means that no treatment is repeated in any row or column of the matrix. For instance, suppose that in experiment #2 the techniques available to find solutions for the optimization problem are TS, SA and EA. The experiment has a single factor, i.e., the technique used to solve the problem. It also has two blocking variables: the specific problem instance (I1, I2 and I3) and the termination criterion used (100, 5000 and 10000). Two Latin squares for experiment #2 are shown in Table §3.1.
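The two block designs above can also be sketched as run-list generators. The Python code below is an illustrative sketch under our own assumptions: `randomized_complete_block_design` performs a completely randomized design inside every block, and `cyclic_latin_square` builds a Latin square by cyclic shifts and then randomly permutes its rows and columns (a simple randomization that does not sample uniformly among all Latin squares). The function names and the labels C1 to C3 standing for the three termination criteria are hypothetical.

```python
import random


def randomized_complete_block_design(treatments, blocks, reps, seed=None):
    """Perform a completely randomized design inside every block.

    Returns (block, treatment) pairs: every block receives every
    treatment `reps` times, in a random order within the block.
    """
    rng = random.Random(seed)
    plan = []
    for block in blocks:
        runs = [(block, t) for t in treatments for _ in range(reps)]
        rng.shuffle(runs)
        plan.extend(runs)
    return plan


def cyclic_latin_square(treatments, seed=None):
    """Build a t x t Latin square (no treatment repeated in any row or
    column) by cyclic shifts, then randomly permute rows and columns."""
    t = len(treatments)
    square = [[treatments[(i + j) % t] for j in range(t)] for i in range(t)]
    rng = random.Random(seed)
    rng.shuffle(square)                  # permuting rows keeps the property
    cols = list(range(t))
    rng.shuffle(cols)                    # and so does permuting columns
    return [[row[c] for c in cols] for row in square]


techniques = ["TS", "SA", "EA"]
instances = ["I1", "I2", "I3"]
criteria = ["C1", "C2", "C3"]           # hypothetical termination criteria

# RCBD: one block per (instance, criterion) pair, no repetition
blocks = [(i, c) for i in instances for c in criteria]
rcbd = randomized_complete_block_design(techniques, blocks, reps=1, seed=7)

# Latin square: a single run per (instance, criterion) cell
square = cyclic_latin_square(techniques, seed=7)
latin = [(instances[r], criteria[c], square[r][c])
         for r in range(3) for c in range(3)]
print(len(rcbd), len(latin))  # 27 9
```

The printed counts reproduce the comparison made after Table §3.1: 27 runs for the complete block design versus 9 for the Latin square when no repetition is performed.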
Table 3.1: Two sample 3x3 Latin squares for a technique comparison experiment

(a) Factor 1 (columns): problem instance; Factor 2 (rows): termination criterion (MaxExecTime)
           | I1 | I2 | I3
100ms      | EA | TS | SA
5000ms     | TS | SA | EA
100000ms   | SA | EA | TS

(b) Rows: problem instance; columns: termination criterion (MaxExecTime)
     | 100ms | 5000ms | 100000ms
I1   | TS    | EA     | SA
I2   | SA    | TS     | EA
I3   | EA    | SA     | TS

It is worth noting that Latin squares are reduced (or incomplete) designs, in the sense that not every treatment is applied under every combination of levels of the blocking variables. Thus, their experimental protocols require fewer treatments and measurements (optimization technique runs in this example) than randomized complete block designs, making experimental conduction faster and cheaper. In our example, the Latin square design requires 9 runs, while the randomized complete block design requires 27 runs (assuming that no repetition is performed, i.e., the group size is 1).
Factorial designs
Factorial designs are used when several factors are present in the experiment. For instance, in experiment #1 there could be two controllable factors: drug and dose. The factor drug could be nominal, with hypothetical levels “never-fever”, “colder-plus” and “bye-fever”. The factor dose could be scalar, with levels 10mg, 50mg, ... In this case, a treatment consists of a combination of levels for drug and dose. The experimental protocol of factorial designs varies the levels of each controllable factor so that all possible combinations of levels are considered. Factorial experiments are widely used in scientific and industrial experimentation because they allow evaluating the effects of each factor and of their combinations (named interactions). A 2^k factorial design is a specific kind of factorial design where k factors with 2 levels each are studied. For instance, let us consider a slight modification of experiment #2. Instead of comparing different techniques, we could compare different variants of EA.
In particular, let us suppose that we compare two alternative crossover operators (uniform and one-point) and two alternative selection strategies (roulette and tournament). In this situation, with two controllable factors, a factorial design is appropriate. Specifically, we have a 2^2 factorial design describing the 4 possible tailorings of the EA. The experimental protocol would have 4 groups of equal size, i.e., one per possible combination of the levels of the factors. The sequence of application of the treatments would be random.
3.4 EXPERIMENTAL EXECUTION
In this section, we focus on the concepts regarding the final activities of the experimental life-cycle, denoted as experimental execution. According to our definition of the experimentation life-cycle, experimental execution comprises two phases: experimental conduction and data analysis.
The description of an experiment is not enough for automating its replication. Even for experiments in computer science where experimental protocols are implemented as programs (such as metaheuristic optimization experiments), more details are required to perform an automated conduction of the experiment. In order to support such automation, a detailed description of the process of experimental conduction, including its inputs and outputs, is required. Additionally, in order to evaluate whether an experiment could be replicated, further details are required (such as ranges for environmental and extraneous variables, the human resources required for replication, material equipment, etc.).
Experimental conduction involves treatment application and data collection. This activity should be performed in an unbiased and objective way.
Data analysis is the process of inspecting and modelling data with the goal of discovering information, drawing conclusions and supporting decision making. Statistics is the basic tool used in the data analysis phase to draw conclusions from the data collected during experimental conduction.
The specific type of statistical procedure to be used for data analysis depends strongly on both the type of hypothesis and the design of the experiment, and the analysis is usually performed in two phases: exploratory and confirmatory analysis. Exploratory data analysis makes it possible to detect mistakes in experimental conduction, to check the assumptions made during experimental design, and to assess the direction and rough size of the relationships among the experimental variables. In turn, confirmatory data analysis makes it possible to confirm or disprove the hypothesis of the experiment by means of statistical inference.
3.4.1 Exploratory data analysis
There is a plethora of disparate techniques used for exploratory data analysis, which can be classified in very different ways. One possible classification divides such techniques into three groups: tabulations of the data, graphs (a.k.a. charts) and descriptive statistics.
Graphs are visual representations of the sample; histograms and bar charts are two of the most widely used types of graph. However, there is a wide range of graph types that are very useful for more specific purposes. For example, the so-called scatter plot is often used in experiments with associational hypotheses, since it allows observing the degree of association between two variables.
Descriptive statistics can be further sub-classified into central tendency measures and variability measures. Central tendency measures such as the mean, the median and the mode describe the way the values of a sample cluster around some value. The mean (a.k.a. arithmetic average) is appropriate when the sample follows a normal distribution in the absence of outliers, and it is defined only for real variables. The median (a.k.a. middle score) is more suitable than the mean when the frequency distribution is markedly skewed or outliers are present, and it is defined for both real and ordinal variables.
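The different sensitivity of the mean and the median to outliers can be checked with Python's standard `statistics` module. The objective function values below are invented for illustration: nine similar runs plus one outlying run.

```python
import statistics

# Hypothetical objective function values of ten runs; the last one is an outlier
values = [98, 99, 99, 100, 100, 100, 101, 101, 102, 160]

mean = statistics.mean(values)          # pulled upwards by the outlier
median = statistics.median(values)      # average of the two middle scores
mean_without_outlier = statistics.mean(values[:-1])

print(float(mean), float(median), float(mean_without_outlier))  # 106.0 100.0 100.0
```

A single outlying run shifts the mean by six units while the median stays at the value of the bulk of the sample.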
Finally, the mode, i.e., the most common value in the sample, is usually the least precise central tendency measure for real and ordinal variables, but it can also be computed for nominal variables.
Variability measures describe the spread or dispersion of a given sample. Thus, in an extreme case where all the scores in a distribution are the same, the variability of the sample is zero. Standard deviation and inter-quartile range (the distance between the 25th and 75th percentiles) are the most commonly used variability measures for real and ordinal variables respectively. For nominal variables, a usual variability measure is the number of different categories present in the data and the percentage of the samples in each category.
3.4.2 Confirmatory data analysis
Overview
Using the appropriate confirmatory data analysis technique is crucial to obtain valid conclusions for a specific experimental design and hypothesis. Table §3.2 shows the relationship among hypothesis types, number of factors, and the appropriate type of statistical technique for confirmatory data analysis. Tables §3.3, §3.4 and §3.5 show the specific statistical techniques to be used under specific circumstances. Those tables are based on the recommendations provided by Gliner et al. in [116], completed in some cases with the recommendations provided in [68], [248], [195] and [134].
It is worth noting that, although Tables §3.3, §3.4 and §3.5 provide a number of statistical techniques and a complex selection framework, the set of cases and techniques covered is still incomplete. For instance, when ANOVA or Friedman tests are applied, additional post-hoc procedures such as Bonferroni, Holm, etc., are usually required to decide whether a specific treatment is better than another (see [69] for a detailed description of the techniques to be used). This complexity is one of the main motivations for our goal of automating this task of the experimental life-cycle.
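As a small illustration of what automating this selection involves, the single-factor case of Table §3.3 can be encoded as a lookup table. This is a hypothetical Python sketch: the outcome-type keys and the function name are our own, a cell of the table that spans two outcome types is simply repeated, and the function covers only the choice of the test, not the post-hoc procedures mentioned above.

```python
# Single-factor part of Table 3.3: (factor levels, blocking) -> outcome type -> test
TABLE_3_3 = {
    ("two", False): {
        "real-normal": "Independent samples t-test",
        "real-not-normal": "Mann-Whitney",
        "ordinal": "Mann-Whitney",
        "nominal": "Chi-square or Fisher exact test",
    },
    ("two", True): {
        "real-normal": "Paired samples t-test",
        "real-not-normal": "Wilcoxon or sign test",
        "ordinal": "Wilcoxon or sign test",
        "nominal": "McNemar",
    },
    ("three-or-more", False): {
        "real-normal": "One-way ANOVA",
        "real-not-normal": "Kruskal-Wallis",
        "ordinal": "Kruskal-Wallis",
        "nominal": "Chi-square",
    },
    ("three-or-more", True): {
        "real-normal": "Repeated measures ANOVA",
        "real-not-normal": "Friedman",
        "ordinal": "Friedman",
        "nominal": "Cochran Q",
    },
}


def select_basic_test(n_levels, blocking, outcome_type):
    """Return the test suggested by Table 3.3 for a single factor."""
    if n_levels < 2:
        raise ValueError("a factor needs at least two levels")
    levels = "two" if n_levels == 2 else "three-or-more"
    return TABLE_3_3[(levels, blocking)][outcome_type]


# Experiment #2: five metaheuristics, blocking by problem instance,
# objective function values assumed not to be normally distributed
print(select_basic_test(5, True, "real-not-normal"))  # Friedman
```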
Table 3.2: Statistical procedure decision table.
Type of hypothesis | Zero factors | One factor | More factors
Differential | - | Basic STH (Table §3.3) | Complex STH (Table §3.4)
Associational | - | Correlation coefficients (Table §3.5) | Complex correlation models (Table §3.5)
Descriptive | Exploratory analysis and basic STH | - | -

Table 3.3: Specific STH for basic experiments with a single independent variable
Experimental design | Real, normal | Real, not normal | Ordinal | Nominal
Two-level factor, no blocking | Independent samples t-test | Mann-Whitney | Mann-Whitney | Chi-square or Fisher exact test
Two-level factor, blocking | Paired samples t-test | Wilcoxon or sign test | Wilcoxon or sign test | McNemar
Three-or-more-level factor, no blocking | One-way ANOVA | Kruskal-Wallis | Kruskal-Wallis | Chi-square
Three-or-more-level factor, blocking⁴ | Repeated measures ANOVA | Friedman | Friedman | Cochran Q

Table 3.4: Specific STH for experiments with multiple independent variables
Experimental design | Real, normal | Real, not normal | Ordinal | Nominal
Two-level factors, no blocking | Factorial ANOVA | - | - | Log linear
Two-level factors, blocking | Factorial ANOVA (rep. measures) | Friedman | Friedman | -
Three-or-more-level factors, no blocking | Factorial ANOVA | - | - | Log linear
Three-or-more-level factors, blocking | Factorial ANOVA (rep. measures) | Friedman | Friedman | -

Table 3.5: Regression coefficients and models
Single factor (by type and distribution of the variables):
  All real and normal: Pearson or bivariate regression
  Mixed: Spearman ρ or Kendall τ
  All ordinal or nominal: Φ or Cramér's V
More than three factors:
  Outcome real and normal: Multiple regression
  Independent variables all real and normal, outcome nominal: Discriminant analysis
  Otherwise: Logistic regression

According to Table §3.2, the recommended technique for confirmatory data analysis depends strongly on the type of experimental hypothesis.
For instance, for associational hypotheses with a single factor, correlation coefficients such as the Pearson product-moment correlation and Spearman’s rho could be used. Providing a comprehensive description of each method listed in Tables §3.3, §3.4 and §3.5 is out of the scope of this dissertation. Excellent descriptions of such procedures, their premises and application constraints, together with numerous real examples, are provided in [116], [195] and [134].
Since the types of MOEs taken into account in this dissertation use differential hypotheses (cf. Sections §3.6 and §7.3), the primary analysis mechanism used in practice is Statistical Testing of Hypotheses⁵ (STH). STH is applied to decide whether there are significant differences between datasets of experimental results. Consequently, it can be used to assess whether a metaheuristic, tailoring or tuning is better than another. STH is described in further detail in the next subsection.
⁵ Confidence intervals are also used, but STH is more popular.
Statistical Testing of Hypotheses
STH works by defining two hypotheses, the null hypothesis H0 and the alternative hypothesis H1. The null hypothesis is a statement of no effect or no difference, whereas the alternative hypothesis represents the presence of an effect or a difference. Thus, if H1 holds, then significant differences exist between groups of individuals, the performance of algorithms, or the effect of a technique or methodology for software development. Both hypotheses are mutually exclusive, i.e., if H0 holds then H1 does not hold, and vice versa.
Figure 3.6: Hypothesis acceptance and rejection areas
The result generated by most statistical tests is a p-value. A p-value is the probability of obtaining, assuming that H0 is true, results at least as extreme as those actually observed in the experiment.
A p-value provides information about whether a hypothesis test is significant or not, and it also indicates something about how significant the result is: the smaller the p-value, the stronger the evidence against H0 [69]. Decision making on the hypotheses using statistical tests is performed by imposing a threshold on the p-value below which we consider that the null hypothesis H0 is false. This threshold is named the significance level and is denoted as α. Figure §3.6 shows a diagram that describes the roles of α and the p-value in the decision making about the hypotheses. The usual process for applying STH is [154, 299]:
1. Map the experimental hypothesis into a pair of statistical hypotheses (H0 and H1). The hypotheses are described in terms of the parameters of the distributions of random variables from which a sample can be obtained by conducting the experiment (the dataset to be analysed). As a consequence, researchers must identify metrics that measure the variables that appear in the experimental hypothesis, and define the mechanism that will be used to instrument them. For instance, in MOEs the performance of a metaheuristic is usually measured by the value of the objective function for the best solution found in a run with a specific termination criterion. Once the random variables are identified and their instrumentation mechanisms are specified, the statistical hypotheses H0 and H1 are stated. The mapping between the experimental and the statistical hypotheses is usually performed in terms of the parameters of the distributions of the random variables. For instance, in experiment #2 the null hypothesis H0 would state that “there is no difference between the means of the distributions of objective function values provided by the different techniques, i.e., that their performance is similar”.
2. Decide which statistical analysis procedure (the specific test of hypothesis) is appropriate.
The main factors affecting this decision are the type of hypothesis, the experimental design, and the nature of the data. This last factor is in turn twofold: the type of the variables, which can be nominal (e.g. X, Y, Z), ordinal (e.g. good, fair, bad) or interval/ratio (e.g. 10.0, 5.0, 2.0), and the statistical properties of the dataset. For instance, in the case of non-Gaussian distributions, non-parametric tests should be used.

3. Select a significance level (α). It is widely accepted that if the p-value is lower than 0.05, there is enough evidence to reject the null hypothesis H0 and assume that H1 holds.

4. Compute the p-value. To compute the p-value, the value of the test statistic T must first be calculated from the dataset. Given the test statistic T and its expected distribution D under H0, the p-value is computed.

5. Decide whether to reject the null hypothesis. The null hypothesis is rejected if and only if the p-value is less than the significance level α; otherwise there is not enough evidence to reject it. For instance, in the example shown in Figure §3.6 the p-value is not below the selected α, thus there is no evidence for rejecting H0.

When STH is used to detect significant differences between two distributions (the null hypothesis being that the distributions are identical), the tests are called simple comparison tests. Conversely, when STH is used to detect significant differences among three or more distributions (the null hypothesis being that all the distributions are identical), they are called multiple comparison tests. The use of such a null hypothesis in multiple comparison tests implies that the alternative hypothesis states that at least one distribution is different from the rest. If the null hypothesis in a multiple comparison test is rejected, then we know that significant differences exist, but not between which of the distributions.
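Locating the differences after a multiple comparison test rejects H0 requires pairwise comparisons whose accumulated error is controlled. As one concrete example, the sketch below implements Holm's step-down procedure, a widely used correction named here for illustration rather than as the procedure prescribed by this dissertation. It sorts the pairwise p-values and compares the i-th smallest with α/(m − i + 1), stopping at the first non-rejection. The function name and the p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns one boolean per hypothesis (in the input order): True when the
    null hypothesis is rejected. The family-wise error rate of all the
    rejections together is kept at or below alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    rejected = [False] * m
    for step, i in enumerate(order):
        # The (step+1)-th smallest p-value is compared with alpha / (m - step).
        if p_values[i] <= alpha / (m - step):
            rejected[i] = True
        else:
            break  # first non-rejection: all remaining hypotheses are kept
    return rejected


# Hypothetical unadjusted p-values for the pairs (A,B), (A,C) and (B,C).
print(holm([0.01, 0.04, 0.03]))  # [True, False, False]
```

Each of the three p-values is below 0.05 on its own, but after the adjustment only the difference between A and B remains significant.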
In order to reach concrete conclusions about which specific distributions differ from the others, an additional type of statistical technique, named post-hoc procedure, must be applied. Post-hoc procedures are a special kind of STH concerned with finding differences between pairs of distributions from the associated multiple comparison test. They control the accumulation of potential errors that arises from chaining a sequence of statistical tests, in order to provide a global significance level for all the comparisons performed. For each multiple comparison test (such as ANOVA or the Friedman test), a specific set of post-hoc procedures has been defined in the literature.

3.5 EXPERIMENTAL VALIDITY

The term validity refers to the approximate truth of a statement or inference. In the context of experimentation, it is usually applied to the conclusions regarding the hypothesis of an experiment. Thus, when an inference about the hypothesis (whether it holds or not) is valid, it means that there is sufficient evidence, both in the data and the experimental process, to support the inference [248]. Validity is thus a property of inferences (or conclusions); it is not a property of designs or hypotheses. For example, using a completely randomized design does not guarantee the validity of an inference about the effectiveness of the antipyretic in experiment #1. There could be many reasons invalidating the inference. For instance, in experiment #1 the nurses could administer an erroneous dose of the drug, leading to wrong conclusions. For the sake of simplicity, and in accordance with most of the literature on the topic, we will define an experiment as valid if it allows valid conclusions to be drawn.

Threats to validity are the specific causes that compromise the validity of a conclusion. In this dissertation the enumeration of threats to validity presented by Shadish et al.
in [248] is used, with slight adaptations to our terminology. In general terms, threats to validity can be divided into two groups: internal and external threats. The former are those that affect the validity of the conclusions of the experiment. The latter are those that put at risk the generalization of the conclusions obtained. In the next subsections we list some of these threats.

3.5.1 Internal validity

The internal validity of an experiment is defined as the extent to which we can infer that the hypothesis holds (or not) from the experimental process and the data gathered. Consequently, the threats to internal validity are those caused by the characteristics of the experimental process and its setting. In the following, we describe some of the most common internal threats reported in the literature [116, 248]. In particular, we focus on those that could be automatically warned about, detected or fixed. For a better understanding, each threat is presented by means of: i) a brief definition, ii) examples of situations where the threat could appear in experiments #1 and #2, iii) possible mechanisms to diagnose (i.e., detect) the threat, and iv) ways of neutralizing the threat. The internal validity threats can be classified into four groups, described in the following subsections.

Threats caused by environmental factors and nuisance variables

IVT-1 Wrong temporal precedence
Definition: The measurements are erroneously taken before applying the treatment.
Example: In experiment #1, a nurse could measure the body temperature of a patient before administering the drug. In experiment #2, an implementation bug could make the program return the initial solution as a result, instead of the best solution found during all iterations.
Diagnosis: This threat is difficult to diagnose in general.
However, in computational experiments, the execution environment can monitor the conduction of the experiment, ensuring that no measurements are taken before the treatments are applied.
Neutralization: This threat is neutralized by fixing the problems that cause a wrong treatment-measurement sequence and repeating the conduction of the experiment. This cannot be done automatically in general.

IVT-2 History and environmental factors
Definition: External events influence the conduction of the experiment, affecting the outcome.
Example: Experiment #1 could coincide with a heat wave, distorting the observed effect of the drug. In experiment #2, an operating system update could run simultaneously with the execution of a program, decreasing the computational resources available for its execution.
Diagnosis: Since the events that bias the outcome are in general neither predictable nor monitored, there is no general mechanism for diagnosing this threat.
Neutralization: According to Shadish et al. and Gliner et al., the best approach to minimize this threat is the use of random assignment and a randomized sequence of treatment application.

IVT-3 Testing effects
Definition: Treatments on an experimental object affect other treatments.
Examples: In experiment #1, the reaction to the drug could be different in patients who had already taken the drug before. In experiment #2, the effect of memory caches and predictive executions in modern processors could lead to an improvement of the results provided by some techniques.
Diagnosis: To the best of our knowledge, there are no methods to diagnose this threat. A tentative approach would be to compute the correlation between the values of the outcome and the position of the corresponding treatments in the sequence of the experimental protocol (globally, per block, and per group). If there is a strong correlation between these values, then the conclusions of the experiment could be threatened.
Neutralization: Again, the recommended approach to minimize this threat is the use of random assignment and a randomized sequence of treatment application [116, 248].

IVT-4 Instrumentation effects
Definition: The way in which a variable is measured has an effect on its value.
Example: In experiment #1, the conclusions would be threatened if the instrument used for measuring body temperature deteriorated during the experiment, providing different measurements. In experiment #2, this threat could be a risk if the solutions generated by the metaheuristic program were dynamic, changing over time.
Diagnosis: This threat can be diagnosed by obtaining several measurements before and after the conduction of the experiment. Another diagnosis strategy for this threat is using multiple instrumentation artefacts during the experimental conduction. For more approaches to diagnose this specific threat to validity, cf. [116, chapter 11].
Neutralization: To the best of our knowledge, the only way to neutralize this threat is to use different artefacts for the measurement. This strategy cannot be automated in general. The use of multiple instrumentation artefacts, randomly assigned to perform the measurements during experimental conduction, can also mitigate the effects of this threat.

Threats caused by the characteristics of groups

IVT-5 Heterogeneity of experimental objects
Definition: Experimental objects are not homogeneous. As a result, the effects of the treatments on the outcomes are confounded with those of the specific levels of the non-controllable factors of the experimental objects. This threat is also referred to as assignment and selection bias [116].
Examples: In experiment #1, if babies and adults are mixed in the groups, the effect of the drug could be confounded due to the different relation between weight and dose; i.e., in babies weighing 10 kilograms a dose of 100 mg can cause a much bigger effect than in adults weighing 100 kilograms.
Diagnosis: This threat arises if the experimental description contains non-controllable factors that are not used in the blocking of the design. This diagnosis should be interpreted as a warning for the experimenters. Another approach would be to measure the correlation between the outcome and the levels of the non-controllable factors not present in the blocking criterion. If there is a strong correlation between these variables, the conclusions drawn are threatened.
Neutralization: The only way to minimize this threat is to conduct the experiment again, introducing the non-controllable factors as blocking criteria.

IVT-6 Attrition
Definition: The measurement of outcomes for some experimental objects fails, is lost, or becomes impossible to obtain during experimental conduction. This threat is also referred to as mortality.
Examples: In experiment #1, some patients could die during the conduction of the experiment, leading to missing measurements of the outcome. It is worth noting that certain levels of attrition are acceptable (but not desirable) in experimental areas such as biology or medicine.
Diagnosis: For classical designs, the expected number of outcome measurements can be computed from the experimental description. Since the measurements of the outcome variable should be provided as part of the lab-pack, it is possible to diagnose this threat by comparing the expected and the actual number of measurements.
Neutralization: The only way to neutralize this threat is to repeat the conduction of the experiment. For MOEs this mechanism can be automated.
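For the blocked designs typical of MOEs, the IVT-6 diagnosis (comparing expected and actual numbers of measurements) reduces to counting outcomes per cell. The sketch below assumes, hypothetically, that each recorded outcome in the lab-pack can be reduced to a (technique, instance) pair; the function name and data are illustrative only.

```python
from collections import Counter


def missing_measurements(measurements, techniques, instances, n_runs):
    """Diagnose attrition in a blocked design.

    measurements: one (technique, instance) pair per recorded outcome.
    Every technique is expected to run n_runs times on every instance,
    i.e. len(techniques) * len(instances) * n_runs outcomes in total.
    Returns {(technique, instance): number_of_missing_outcomes}.
    """
    counts = Counter(measurements)
    missing = {}
    for t in techniques:
        for i in instances:
            gap = n_runs - counts[(t, i)]
            if gap > 0:
                missing[(t, i)] = gap
    return missing


# Hypothetical lab-pack contents: one GA run on instance I2 is missing.
recorded = ([("GA", "I1")] * 3 + [("GA", "I2")] * 2
            + [("TS", "I1")] * 3 + [("TS", "I2")] * 3)
print(missing_measurements(recorded, ["GA", "TS"], ["I1", "I2"], n_runs=3))
# {('GA', 'I2'): 1}
```

Reporting the missing cells rather than a single total also tells an automated environment exactly which runs must be repeated.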
Threats related to the statistical analysis of the data

IVT-7 Small sampling
Definition: Statistical tests fail to recognize differences in the observations as statistically significant when the sample is very small. This threat is also referred to as low statistical power.
Example: In experiment #1, if the number of feverish patients in the hospital is small, for instance 5, the experiment conclusions would be threatened, since specific and rare characteristics of one or two of the patients could lead to wrong conclusions.
Diagnosis: A simple diagnostic mechanism for this threat is to check whether the size of the sample is larger than a minimum. Historically, authors have used 30 experimental objects as such a minimum for simple comparisons.
Neutralization: According to [116, 248] there are several ways to increase statistical power, but the most usual is to increase the sample size. In MOEs, this can usually be automated by increasing the number of runs of the metaheuristic programs.

IVT-8 Violations of the preconditions of statistical tests
Definition: The preconditions of the statistical tests are not met, and the conclusions drawn from them may be erroneous.
Example: In experiment #2, depending on the distribution and characteristics of the data, the recommended statistical test for the experiment according to Table §3.3 would be either a T-test or a Wilcoxon test. If the data are not normal and the T-test is applied, the preconditions of the T-test would be violated and the conclusions could be erroneous.
Diagnosis: Most of the preconditions of statistical tests, such as normality or homoscedasticity of the data, can be evaluated through other statistical tests. Thus, those auxiliary tests can be used to check whether certain preconditions are fulfilled or not.
Neutralization: In order to neutralize this threat, the statistical analysis must be repeated using statistical tests whose preconditions are not violated.
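The IVT-7 and IVT-8 diagnoses can be combined into an automated test selection step, sketched below under stated assumptions. The minimum of 30 runs comes from the text above. Normality is checked with the Jarque-Bera test, a swapped-in choice made only because its asymptotic p-value needs nothing beyond the standard library (the chi-square survival function with 2 degrees of freedom is exp(-x/2)). Shapiro-Wilk is a common alternative, and with about 30 observations the Jarque-Bera approximation is rough. Only normality is checked here, not homoscedasticity or independence. The function names are hypothetical.

```python
import math
from statistics import mean

MIN_SAMPLE_SIZE = 30  # historical rule of thumb cited for IVT-7


def jarque_bera_p(sample):
    """Asymptotic p-value of the Jarque-Bera normality test.

    JB = n/6 * (S**2 + (K - 3)**2 / 4), with S the sample skewness and K the
    sample kurtosis. Under normality JB is approximately chi-square with
    2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    n = len(sample)
    m = mean(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    if m2 == 0:
        return 0.0  # constant sample: treated as non-normal
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return math.exp(-jb / 2)


def choose_simple_comparison_test(a, b, alpha=0.05):
    """Return (test name, warnings) for two samples of outcomes."""
    warnings = []
    if min(len(a), len(b)) < MIN_SAMPLE_SIZE:
        warnings.append(f"IVT-7: fewer than {MIN_SAMPLE_SIZE} runs per group")
    normal = all(jarque_bera_p(s) >= alpha for s in (a, b))
    return ("T-test" if normal else "Wilcoxon"), warnings


print(choose_simple_comparison_test(list(range(30)), list(range(30))))
# ('T-test', [])
print(choose_simple_comparison_test([1] * 29 + [100], list(range(30))))
# ('Wilcoxon', [])
```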
The automation of this mechanism involves selecting and running the statistical test automatically.

IVT-9 Fishing and error rate
Definition: Several comparisons among pairs of observations are performed using simple comparison tests. As a result, the error rate accumulates and the conclusions of the statistical tests are misleading.
Example: In experiment #2 up to five optimization techniques are compared. If we apply a simple comparison test, such as the T-test with α = 0.05, to evaluate the significance of the differences for each pair of techniques, we would perform 10 simple comparisons. The probability of error over the whole set of comparisons is not 0.05, but significantly higher [69]. Thus, a multiple comparison test with post-hoc procedures should be used.
Diagnosis: The automated diagnosis of this threat requires the analysis of the experimental description to determine the number of comparisons to be performed and the type of test specified.
Neutralization: In order to neutralize this threat, the statistical analysis must be repeated using appropriate multiple comparison statistical tests and post-hoc procedures. Again, the automation of this mechanism involves the automated selection and execution of the statistical test.

IVT-10 Restriction of range
Definition: Variables are restricted to a narrow range that does not match the actual domain of the observations.
Example: This threat is equivalent to the out-of-range errors and precision losses due to type casts in programming languages. For instance, in experiment #1 we could use [36, 40] as the valid range for the patient temperature. If we get a measurement of 41, it would be truncated during the analysis of the data, leading to erroneous conclusions.
In experiment #2, if the objective function is real-valued with range [0, 500000], but the levels specified for the experimental variable are in the range [0, 10] (meaning that measured values bigger than 10 are interpreted as a level of 10), almost every observation would probably be associated with level 10. Consequently, the statistical test could miss actual differences in the distributions of observations of the variable.
Diagnosis: This threat can be diagnosed by checking that all the values registered at the end of the experiment are within the ranges defined for each variable. We are not aware of any diagnosis method for the cases in which the values are within the range but that range does not match the actual range of the observations.
Neutralization: The experiment should be repeated using an adequate range for the variables.

IVT-11 Inaccurate effect size estimation
Definition: Some statistical tests overestimate or underestimate the differences in the observations depending on the type of variable. This leads experimenters to draw wrong conclusions about the experimental hypothesis.
Example: In experiment #2, we could use Quade's test for multiple comparison. However, it is well known [69] that this test overestimates the differences of results between techniques when problem instances are very different. See [248, page 52] for some additional examples.
Diagnosis: The diagnosis of this threat depends on the behaviour of the specific statistical test or post-hoc procedure and on the specific experiment. For instance, the use of Quade's test could be admissible if the difference in the behaviour of the techniques across different problem instances were of paramount importance for solving the optimization problem. However, its use is not recommended in general. As a consequence, the diagnosis of this threat should be interpreted as a warning that experimenters should confirm.
Neutralization: In order to neutralize this threat, the statistical analysis must be repeated using appropriate multiple comparison statistical tests and post-hoc procedures.

Threats caused by the characteristics of the experimental conduction

IVT-12 Unreliability of treatment implementation
Definition: The implementation of the treatment on the experimental objects is erroneous.
Example: In experiment #1, some nurses could forget to administer the drug to several patients, or could administer a different dose.
Diagnosis: This threat could be diagnosed by analysing the variance and the outliers of the distributions of measurements per instrumentation artefact, block and group. However, this method is only applicable when groups have a minimum size and the variance of the distribution is small.
Neutralization: If unreliable measurements are detected as outliers, filtering them out is a valid neutralization mechanism. Otherwise, removing the threat could require conducting the experiment again with different instrumentation artefacts.

3.5.2 External Validity

The external validity of an experiment is defined as the extent to which its conclusions can be generalized. Thus, most external validity issues are related to experimental objects, settings, treatments or outcomes that were not studied in the experiment [60, 61, 248].

Threats to external validity

The threats to the external validity of the experiment are those that could affect the way in which the conclusions are generalized from the experimental sample to the target population. Next, we enumerate those proposed by [248] and [116]. Information regarding the diagnosis and neutralization of external validity threats is omitted, since it is out of the scope of this work.

Interaction of experimental objects with factor effects: A conclusion drawn from one sample may not extrapolate to a different sample.
In experiment #1, the reductions of body temperature observed in specific feverish patients from Sevilla could differ from those observed in a different set of patients.

Interaction of treatment implementation and factor effects: The effect observed for a treatment could vary depending on the specific details of its application. In experiment #1, the effect of the drug could be different depending on whether patients swallow it with water or not.

Interaction of factor effects and outcome variable: The effect observed for a treatment could depend on the specific outcome variable measured. In experiment #1, the effect of the drug could be different if we measured fever through the amount of sweat.

Interaction of factor effects and experimental setting: The effect observed for a treatment could depend strongly on specific elements of the experimental setting or the context of the experiment. In experiment #1, the reduction of body temperature could depend on the temperature of the room where the treatments were applied.

3.6 METAHEURISTIC OPTIMIZATION EXPERIMENTS

Metaheuristic optimization experiments have some characteristics that make them even more time-consuming and difficult to set up than other computational algorithmic experiments. First, the stochastic nature of the algorithms requires a high number of runs per group to obtain significant results, leading to long-running experiments. Second, even when using a MOF, a significant development effort is needed. This effort is due to the need to adapt the techniques to the problem at hand (by using the extension points of the MOFs that encode the tailoring points of the implemented metaheuristics) and to implement the experimental procedure. The resulting code frequently contains bugs that are usually detected only during the execution of the experiment or the analysis of its results. As a consequence, experiments are usually run several times until valid results are obtained.
Third, the high number of variants and parameters of the different techniques leads to experimental variables with a high number of levels and to complex designs. Finally, one experiment usually requires carrying out subordinate experiments, magnifying the effect of the previous characteristics. For instance, in technique comparison experiments the performance of the techniques depends strongly on the parameter values used. As a consequence, in order to perform a fair comparison, one subordinate experiment per technique must be performed to find its optimal parameter configuration [21]. Next, we describe the types of MOEs taken into account in this dissertation to support the MPS life-cycle.

3.6.1 Selection and Tailoring experiments

Given an optimization problem $P = (f, A)$ and a set of metaheuristic algorithms $M = \{M_1, \dots, M_n\}$, the goal of this experiment is to determine the techniques that perform better than all the others; i.e., to find the best techniques to solve P. Each metaheuristic algorithm is run with a set of parameter values that is constant throughout the experiment. Each metaheuristic $M_i$ generates a set of solutions $S_i = \{s_{i,1}, \dots, s_{i,n_{runs}}\}$, where $s_{i,j} \in A$ is the solution generated by the j-th run of $M_i$. In maximization problems, this notion of better performance is usually operationalized as the maximum average value on f of the solutions provided by each technique, i.e.,

$$\mathrm{avg}(f, S_i) = \frac{\sum_{j=1}^{|S_i|} f(s_{i,j})}{|S_i|}$$

Thus the goal of the experiment is finding the subset $M^* \subseteq M$ such that

$$\forall (M_i \in M^*, M_k \notin M^*) \bullet \mathrm{avg}(f, S_i) > \mathrm{avg}(f, S_k)$$

The experimental objects in this experiment are runs of metaheuristics in M solving P. There is one single controllable factor (we name it “technique”), whose levels are labels corresponding to the different metaheuristics {‘M1’, ..., ‘Mn’}. The outcome is the value on f of the solution provided by each experimental object (algorithm run).
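The definition of $\mathrm{avg}(f, S_i)$ and of the best-performing techniques can be sketched directly. In this illustration (function names, the OneMax-like objective and the data are hypothetical), the code returns the techniques attaining the maximal average, which is one subset satisfying the condition on $M^*$; it ignores statistical significance, which is what the hypothesis tests described next are for.

```python
from statistics import mean


def avg(f, solutions):
    """avg(f, S_i): mean objective value of the solutions of one metaheuristic."""
    return mean(f(s) for s in solutions)


def best_techniques(f, results):
    """results maps each technique label to its solutions S_i (one per run).

    Returns (labels with the maximal average, all averages) for a
    maximization problem.
    """
    averages = {label: avg(f, sols) for label, sols in results.items()}
    top = max(averages.values())
    return sorted(label for label, a in averages.items() if a == top), averages


# Hypothetical OneMax-like problem: f counts the ones in a bit string.
results = {
    "M1": [[1, 0, 1], [1, 1, 1]],
    "M2": [[0, 0, 1], [1, 0, 0]],
    "M3": [[1, 1, 1], [1, 0, 1]],
}
best, averages = best_techniques(sum, results)
print(best)  # ['M1', 'M3']
```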
The hypothesis of this type of experiment is differential, stating that the specific metaheuristic used to solve the problem has an impact on the value on f of the solutions provided. In statistical terms, the null hypothesis $H_0^{TC}$ of this type of MOE states that “there is no difference in the mean value on f for the populations of solutions generated by the metaheuristics”. The alternative hypothesis $H_1^{TC}$ states that “there is a significant difference in the mean value on f for the populations of solutions generated by the metaheuristics”, i.e., that some techniques perform better than the rest.

Usually, this comparison is performed not for a single problem instance, but for a finite set $I = \{I_1, \dots, I_m\}$ of problem instances. Since the specific problem instance to be solved can have an important impact on the outcome of the experiment, it is usually treated as a blocking variable playing the role of a nuisance factor. The design of this type of MOE is usually a blocking factorial, where each metaheuristic $M_i$ is executed $n_{runs}$ times on each problem instance $I_k$.

The analysis for testing the statistical hypothesis depends on the number of metaheuristics in M. Since one single factor is present, the techniques described in Table §3.3 are used. When we compare a pair of metaheuristics ($|M| = 2$), we have a simple comparison experiment, and the recommended analyses are the T-test, or the Mann-Whitney or Wilcoxon tests if some of the premises of the T-test are violated (typically normality, but also sphericity or homoscedasticity, and independence). If the statistical test rejects the null hypothesis, we conclude that one metaheuristic is better than the other, and use the computed means to determine which one is the best. When comparing three or more metaheuristics, we have a multiple comparison experiment.
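The blocking factorial design described above, in which each $M_i$ is executed $n_{runs}$ times on each $I_k$, can be turned into an execution plan as sketched below. Shuffling the runs reflects the randomized sequence of treatment application recommended against IVT-2 and IVT-3. The function name, technique labels and seed are hypothetical; a fixed seed is used only so that the plan can be reproduced.

```python
import random


def randomized_schedule(techniques, instances, n_runs, seed=None):
    """All executions of a blocked design, in random order.

    Every technique M_i is run n_runs times on every instance I_k; the
    resulting (technique, instance, run) triples are shuffled so that the
    execution order cannot systematically favour any technique.
    """
    runs = [(t, i, r) for t in techniques for i in instances for r in range(n_runs)]
    rng = random.Random(seed)  # fixing the seed makes the order reproducible
    rng.shuffle(runs)
    return runs


schedule = randomized_schedule(["GA", "TS", "SA"], ["I1", "I2"], n_runs=30, seed=7)
print(len(schedule))  # 180
```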
For multiple comparison experiments, the recommended analyses are ANOVA, or Friedman's test if some of the premises of ANOVA are violated (typically normality, sphericity or homoscedasticity, and independence). If the statistical test rejects the null hypothesis, we conclude that some metaheuristics are better than others. In this case, post-hoc tests must be performed to determine between which metaheuristics there are significant differences, and which ones perform better [69].

A slight variant of this kind of MOE arises when several tailorings (algorithms) generated for the same metaheuristic are compared. The hypotheses, experimental objects, design and analyses are similar to those used for selection experiments, hence their joint treatment as technique comparison experiments.

3.6.2 Tuning Experiments

In this type of MOE, a single metaheuristic algorithm $M_x$ having a set of parameters $\{\rho_1, \dots, \rho_n\}$ with specific domains $\{D_1, \dots, D_n\}$ is used. The goal of the experiment is to find an optimal parametrization of $M_x$, i.e., the parameter values $V^* = (v_1^*, \dots, v_n^*)$, with $v_i^* \in D_i$ for $i = 1, \dots, n$, that provide the best performance for solving P with $M_x$. The experimental objects in this experiment are runs of $M_x$ solving P. The outcome is the value on f of the solution provided by each experimental object (algorithm run). There is one controllable factor per parameter, and the levels of such factors depend on the specific domain of each parameter. The comparison is usually performed not for a single problem instance, but for a finite set $I = \{I_1, \dots, I_m\}$ of problem instances. Since the specific problem instance to be solved can have an important impact on the outcome of the experiment, it is usually treated as a blocking variable playing the role of a nuisance factor.
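With one factor per parameter, a full factorial tuning design over discretized domains grows multiplicatively, as the following sketch shows. The parameter names and levels are hypothetical values for a genetic algorithm, and the function name is an illustrative assumption; each configuration is run $n_{runs}$ times on every problem instance, which acts as the blocking variable.

```python
from itertools import product


def blocked_factorial(levels, instances, n_runs):
    """Full factorial design over discretized parameter levels, blocked by instance.

    levels: {parameter name: list of levels}. Returns (configurations, runs),
    where runs holds one (configuration, instance, run index) per execution.
    """
    configurations = [dict(zip(levels, combo)) for combo in product(*levels.values())]
    runs = [(c, i, r) for c in configurations for i in instances for r in range(n_runs)]
    return configurations, runs


# Hypothetical discretized domains for three parameters of a genetic algorithm.
levels = {
    "population_size": [50, 100, 200],
    "crossover_prob": [0.6, 0.8, 0.95],
    "mutation_prob": [0.01, 0.05],
}
configs, runs = blocked_factorial(levels, ["I1", "I2", "I3", "I4"], n_runs=30)
print(len(configs), len(runs))  # 18 2160
```

Adding a single further parameter with three levels would triple both numbers, which illustrates why tuning experiments tend to involve many more factor levels than technique comparison experiments.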
The main difference between both types of experiments is that the number of factors and their levels can be much higher for Technique Tuning Experiments than for Technique Comparison Experiments. Furthermore, the domain of some parameters in Technique Tuning Experiments can have an infinite number of values, leading to its discretization by the experimenter, to the use of complex designs [19, 29, 50, 237], or to carrying out a number of related Technique Tuning Experiments for performing the Tuning stage of the MPS, as proposed in [22]. For the purpose of this dissertation, a simple blocking factorial design is chosen for this kind of experiment. Problem instance is usually treated as a blocking variable, since the specific problem instance may strongly impact the outcome. However, the contributions of this dissertation are extensible and in most cases can be adapted to alternative designs. In the remainder of this subsection, the use of this design is assumed.

The null hypothesis of this type of MOE states that “there is no difference in the mean value on f for the populations of solutions generated by any of the parametrizations chosen”. The alternative hypothesis states that “there is a significant difference in the mean value on f for the populations of solutions generated by the parametrizations”, i.e., some parametrizations perform better than the rest. The analysis for testing the hypotheses is similar to that specified for Technique Comparison Experiments when using a blocking factorial design, but in this case multiple factors are present, leading to the use of the techniques described in Table §3.4.

3.6.3 Designs for MOEs

The experimental designs used for selection and tailoring experiments are usually classical designs.
When the experiment has a single controllable factor, completely randomized designs are used (with blocking when non-controllable factors are present); since the problem instance is usually a blocking factor, the most usual design in this case is the randomized complete block design. Similarly, when the experiment has multiple controllable factors (such as in tailoring experiments with variants in multiple tailoring points), factorial and blocking factorial designs are applied.

Tuning experiments present a much wider diversity of experimental designs. Although factorial and blocking factorial designs are usually applied, several complex designs for this kind of MOE have been proposed in the literature. The most relevant proposals in this area are: response surface designs [37, 237], SPO (Sequential Parameter Optimization) [21], and racing algorithms [19, 29, 181] (interested readers can find a comprehensive survey in [24]). It is worth noting that most of those proposals do not provide an explicit experimental protocol, but an algorithm that decides the experimental protocol at conduction time based on the observations obtained so far.

3.6.4 Analyses for MOEs

MOEs present some specific characteristics that have led to the development of specific criteria for deciding which statistical analysis techniques should be used. In the specific context of computer science, the assumptions enumerated above are less likely to hold than in natural or social sciences such as biology and psychology, where variables are usually normal [116]. In this sense, MOEs require a wide spectrum of tests of hypothesis.

Furthermore, multiple comparison tests have one important drawback: they can only detect significant differences over the whole set of data; i.e., the test detects that there are significant differences among the multiple compared groups, treatments or algorithms, but it does not identify between which ones.
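The Friedman test, recommended for multiple comparison experiments when ANOVA's premises fail, illustrates this drawback: it yields a single omnibus statistic computed from the ranks of the techniques within each problem instance (block). In the sketch below, the function names and data are hypothetical, each cell is assumed to aggregate the runs of one technique on one instance (e.g. their mean), and no tie correction is applied. For k = 3 techniques the chi-square approximation has 2 degrees of freedom, whose survival function is exp(-Q/2); for other k a chi-square routine with k - 1 degrees of freedom would be needed (for example SciPy's `scipy.stats.chi2.sf`).

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upwards; ties get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # mean of ranks start+1..end+1
        start = end + 1
    return ranks


def friedman_statistic(table):
    """table[b][t]: outcome of technique t on problem instance (block) b.

    Uses the classic formula without correction for ties.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for t, r in enumerate(average_ranks(row)):
            rank_sums[t] += r
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical mean objective values of techniques A, B, C on four instances.
table = [[9, 5, 1], [8, 6, 2], [7, 4, 3], [10, 6, 0]]
q = friedman_statistic(table)
p = math.exp(-q / 2)  # chi-square survival function for k - 1 = 2 degrees of freedom
print(q, round(p, 4))  # 8.0 0.0183
```

The p-value indicates that differences exist among A, B and C, but not which pairs differ; that requires a post-hoc procedure.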
One could think of applying simple comparison tests to detect the differences between every pair of variables in the dataset, but this process introduces an important error risk. This error comes from the combination of many pairwise comparisons, which increases the probability of rejecting one or more true null hypotheses (in statistical terminology, this error is denoted as the Family-Wise Error Rate) [111]. In these circumstances the use of post-hoc procedures is recommended. Useful practical considerations for non-parametric multiple comparison tests are provided in [69], including recommendations for the selection of the post-hoc procedures to be applied.

3.6.5 Threats to validity in MOEs

Experimental conduction in MOEs implies the repeated execution of different metaheuristic programs to find solutions for each problem instance. The assignment of the executions to metaheuristic programs and problem instances is usually randomized and automated, thus avoiding human bias. Furthermore, since experimental objects are metaheuristic executions on specific problem instances, no risk of mortality or attrition exists. However, even with such automated experiment conduction and assignment, there is a risk of maturation, testing or memory effects. For instance, the effect of memory caches and predictive executions in modern processors could lead to an improvement of the results provided by the programs if their execution is repeated sequentially. In order to minimize the impact of such effects on the results of the experiment, the order in which metaheuristic programs are run on the different problem instances is randomized. MOEs are also affected by all the threats related to the validity of the analysis described in Section §3.5.1. For instance, fishing and error rate threats are also important, hence the need to use multiple comparison tests and post-hoc procedures.
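How quickly the family-wise error rate grows when every pair of techniques is compared with a simple test can be computed directly. The sketch below assumes that the pairwise tests are independent and that all null hypotheses are true, in which case FWER = 1 − (1 − α)^m for m = k(k − 1)/2 comparisons; without independence, m·α is a general upper bound (Bonferroni). The function name is hypothetical. For the five techniques of the IVT-9 example, this gives 10 comparisons and an error rate of about 0.40 rather than 0.05.

```python
from math import comb


def pairwise_fwer(k, alpha=0.05):
    """Number of pairwise tests among k techniques and their family-wise error rate.

    Assumes the m = k(k-1)/2 tests are independent and all null hypotheses
    are true, so FWER = 1 - (1 - alpha) ** m. In general, m * alpha is an
    upper bound.
    """
    m = comb(k, 2)
    return m, 1 - (1 - alpha) ** m


for k in (2, 5, 10):
    m, fwer = pairwise_fwer(k)
    print(k, m, round(fwer, 3))
# 2 1 0.05
# 5 10 0.401
# 10 45 0.901
```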
Additionally, history threats (also called extraneous environmental events in [116]) can affect MOEs. This threat is caused by events that affect the experimental environment while the experiment is being conducted, impacting its outcomes. For instance, an automated operating system update could be performed on the experimentation computer while the experiment is carried out. In this situation, the measured performance of the techniques could be affected, depending on the specific stopping criteria used in the experiment (e.g., criteria based on execution time). Finally, since metaheuristic algorithms must be implemented as programs, MOEs are threatened by the unreliability of treatment implementation (implementations can have bugs).
PART III CONTRIBUTIONS
4 MOTIVATION
Every problem has in it the seeds of its own solution. If you don’t have any problems, you don’t get any seeds. Norman Vincent Peale, 1898 – 1993, American author and minister
In this chapter, we motivate our contributions. After a brief introduction to the chapter goals in Section §4.1, Section §4.2 describes the problems addressed in this dissertation and the current solutions found in the literature. Section §4.3 outlines the main contributions of this thesis, relating them to the problems that motivated our work. Finally, Section §4.4 closes the chapter with a summary.
4.1 INTRODUCTION
The use of metaheuristics to solve optimization problems is a widely studied topic in computer science. In this context, software engineers have recently realized the benefits of using metaheuristics to solve hard optimization problems, usually referred to as search-based problems. This has led to a “search-based” trend observed in a number of software engineering conferences and special journal issues on the matter [105, 129, 130, 131].
However, despite their many benefits, the application of metaheuristics requires overcoming numerous obstacles. First, the implementation of metaheuristic programs is a complex and error-prone process that requires knowledgeable developers. Although some supporting tools have been proposed, these usually automate only isolated parts of the MPS life-cycle. Furthermore, a key challenge in the application of metaheuristics is experimentation, because there is no analytical method to choose a suitable metaheuristic program for a given problem. Instead, experiments must be performed to compare the candidate techniques and their possible variants. This can lead to hundreds of potential alternatives to be compared, making the design, execution and analysis of experiments complex and time-consuming. Besides this, experiments are usually performed ad hoc, with generic tools and no clear guidelines [21]. This introduces threats to validity and makes the experiments hard to automate and reproduce. The goal of this chapter is to motivate the need for specific tools that reduce the cost of using metaheuristics, especially for those not familiar with them, such as software engineers. To that end, we follow a software engineering approach and present a set of generic and extensible tools to support most of the activities of the MPS and MOE life-cycles. These contributions are the result of an exhaustive analysis of the state of the art as well as our own experience in solving optimization problems in the area of software engineering. We hope that our work may contribute to the progress of metaheuristics in general and search-based software engineering in particular.
4.2 PROBLEMS
The following subsections describe the problems that motivate the contributions presented in this dissertation and the related works found in the literature.
4.2.1 On the implementation of MPS applications
Dozens of different MOFs are available to researchers and practitioners.
Each MOF supports a subset of metaheuristic techniques and phases of the MPS life-cycle, resulting in a rather heterogeneous set of features. This hinders the comparison of MOFs and therefore the selection of the right framework for a given optimization problem. As a result, using a MOF usually involves reviewing extensive documentation (or code) to find out the features it supports in terms of metaheuristic techniques, tailoring mechanisms, formats, parallelization, hybridization, etc. This is a tedious and time-consuming task, especially for inexperienced users, and it increases the cost of using metaheuristics. Existing comparisons of MOFs in the literature are either informal evaluations based on the authors’ criteria or focus on performance [298]. Gagné and Parizeau present a comparison of MOFs, but only of those supporting evolutionary algorithms [107]. In [281], Voß and Woodruff present a constructive discussion of various software libraries, but do not provide a comparative analysis. Finally, some articles, such as [41, 71], present a concrete MOF and include a related work section comparing their tool with others. However, those works offer a narrow perspective, with only a brief comparison of a few MOFs.
4.2.2 On the description of MOEs
One of the main obstacles to automating the execution, validation and replication of experiments is the lack of rigorous and detailed experimental descriptions. Although many papers report an experimental setup, this is usually an informal description in natural language where many details are omitted or vaguely described. Instead, as presented in Chapter §3, experimental descriptions should include information about the experimental objects, subjects and population, as well as the definition of the variables, hypotheses and design protocol. The lack of means to describe experiments also affects their validity. In many cases, researchers and practitioners are not even aware of the different types of threats to the validity of their results. In fact, not all authors describe the threats to the validity of their experiments, and only a few distinguish between internal and external validity. In any case, it is up to the user to detect the threats to their experiments and to implement the necessary measures to neutralize them. This is a manual task that requires a deep knowledge of experimentation, and it is therefore beyond the reach of many users, including software engineers. This slows down the experimentation process and affects its validity, hindering the progress of research disciplines. Current solutions for describing experiments rely on so-called Experiment Description Languages (EDLs). EDLs can be categorized as descriptive or operational. Descriptive languages allow describing experiments while ensuring that a minimum set of sections and element descriptions is provided, but they rely on experimenters to interpret the description and use it for replication. Thus, descriptive languages hinder automation and expose the experimentation process to human errors during conduction, due to mistakes or misinterpretations of experimental descriptions. An example of a generic descriptive language is EDL [253], which provides a basic XML syntax to organize the description of experiments in any scientific area. A large number of domain-specific formats and guidelines for experimental descriptions, which can be regarded as descriptive languages, have been proposed in specific areas. For instance, a scheme for experimental description in software engineering is provided by Juristo and Moreno in [155]. In contrast, operational languages such as SED-ML [167] allow describing experiments in a way that enables their automated replication.
To achieve this, operational languages must allow specifying numerous details, which is the main reason why most operational languages are domain-specific. For instance, SED-ML is intended for simulation experiments and PEBL [196, 250] for the creation of form-based experiments in psychology. To the best of our knowledge, previous proposals of EDLs for MOEs are limited exclusively to the proprietary formats defined by MOFs. Those formats usually specify the problems to be solved and the techniques used to solve them. However, they are not full-fledged experimental descriptions, since they do not specify any hypothesis, experimental design, or analysis procedure. The only format that supports execution results is that of the Optsicom Optimization Suite [108]. Other frameworks, such as JCLEC [277] and HeuristicLab [284], only provide support and templates for the specification of optimization techniques, problems and tasks. There are two elements of MOE descriptions for which previous approaches have been proposed in the literature, namely optimization problem specifications and optimization technique specifications. Regarding the former, several languages have been proposed in the literature [156], such as AMPL [102] or GAMS [70]. Regarding the latter, there are proposals for particular metaheuristics (such as EDL [76] for Evolutionary Algorithms and Localizer [191] for local search), as well as notations for the general description of optimization algorithms [74, 75]. Notably, HeuristicLab provides a graphical notation for this purpose [285]. Again, those formats are not full-fledged experimental description formats, since they focus on describing optimization problems or metaheuristic algorithms. They lack elements for describing hypotheses, experimental designs, analysis procedures or the requirements of the experimental setting for replication.
4.2.3 On the execution of MOEs
As with experimental descriptions, there is a lack of means to describe the information required to execute experiments. This hinders the automation of experiments and, more importantly, their reproduction in comparable settings. This information should include, at a minimum, the configuration required to run the experiment: operating system, libraries, pre/post-processing, etc. A related issue is the lack of interoperable means to distribute the results of experiments, which are usually given in natural language or in ad hoc formats such as Excel spreadsheets. This hinders tackling the threats to validity related to the consistency between the experimental description and the results. For instance, it is currently up to the user to check that the number of algorithm runs matches the expected number indicated in the description (see threat ITV-6 in Chapter §3). This makes undertaking experiments difficult and increases the chances of drawing wrong conclusions. Related work on the execution of MOEs can be classified into scientific workflow systems (SWS), generic experiment execution platforms, and domain-specific metaheuristic execution tools and services. SWS are designed specifically to compose and execute a series of computational or data manipulation steps in engineering or research contexts. The trend of creating SWS originated in bioinformatics, where data processing needs are massive, but it has spread to other areas, culminating in the creation of several generic SWS such as Taverna [203], Kepler [177], LONI Pipeline [236] and Trident [20]. SWS seamlessly integrate tasks to be performed by researchers and supporting tools, but they have four limitations. (i) They are not designed specifically for experimentation, and thus require specifying the experimentation life-cycle itself as part of the workflow. (ii) They do not provide any validation mechanism regarding the threats to the validity of the experiments. (iii) They do not enforce the specification of a minimum set of information about the experiment per se; i.e., checking for the presence of such information must be coded in the scientific workflow itself. (iv) When alternative analyses or complex designs are present, the alternatives and the decision mechanism must be coded in the scientific workflow itself. Tasks associated with issues (iii) and (iv) are very tedious and error-prone for most experiments. Regarding generic experiment execution platforms, we point out the Collage authoring environment [200], which provides a computational environment for the execution of scripts that implement experiments. It supports several implementation languages, such as Python or Bash. Another interesting proposal is SHARE [127], a web portal for the creation and sharing of executable papers through virtual machines. None of these tools provides any validity checking or imposes minimum requirements on the information provided for the experiments. Moreover, they do not use any kind of EDL, but rely on descriptive languages or source code implementations. Finally, a recent trend in this category is experimental data repositories [87, 202, 218, 223]. These do not support the automated replication or execution of experiments, but rather the dissemination of experimental execution results, analyses, and lab-packs. Regarding metaheuristic execution tools, some proposals provide the execution of metaheuristic programs as a service [113, 206], but to the best of our knowledge no specific tools have been developed for experimentation.
4.2.4 On the analysis of MOEs
Once the results of an experiment are obtained, they must be analysed to check whether they support or disprove the experimental hypothesis. As explained in Chapter §3, in the context of metaheuristics this is commonly done using statistical tests. This is a complex task that requires a deep knowledge of the available tests and their application conditions. Furthermore, statistical analysis tools such as SPSS [144] or R [2] have a steep learning curve, especially for those outside the fields of mathematics and statistics. As a consequence of this complexity, more resources are needed to analyse experimental results. Related work in this context can be classified into three categories: software tools and libraries for statistical analysis, on-line software for statistical analysis, and web services for statistical hypothesis testing. Several statistical analysis libraries have been created by different authors, such as JavaNPST [69], the supporting library developed by García et al. [111], the Java Statistical Classes [28], or the Apache Commons Math library [12]. However, the set of tests available in those libraries is usually incomplete. They focus either on parametric or on non-parametric tests, and in most cases they lack post-hoc analysis procedures. Regarding on-line statistical analysis tools, there are also a number of proposals [51, 176, 221, 259, 272, 292, 293]. However, those tools provide neither some of the non-parametric multiple comparison tests, such as Quade’s test or the Aligned Friedman test, nor post-hoc analysis procedures. Nor do they provide integrated wizards or methodological guidance for the selection of the statistical test to be applied.
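Quade’s test, cited above among the non-parametric multiple comparison tests missing from several on-line tools, differs from the Friedman test in one respect. It weights each block (problem instance) by the rank of its sample range, so instances on which the algorithms differ more carry more weight. The Python sketch below computes the statistic following its standard formulation. It is an illustration, not the implementation used by any tool in this dissertation. The p-value computation from the F distribution is left out to keep the sketch dependency-free, and the example data are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ascending ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def quade_statistic(results: Sequence[Sequence[float]]) -> float:
    """Quade test statistic T.

    results[b][t] is the outcome of treatment t on block b. Under the null
    hypothesis, T approximately follows an F distribution with k - 1 and
    (b - 1)(k - 1) degrees of freedom.
    """
    b = len(results)
    k = len(results[0])
    # Blocks with a larger range of outcomes receive a larger weight Q_i.
    block_weights = average_ranks([max(row) - min(row) for row in results])
    s_by_treatment = [0.0] * k
    a2 = 0.0
    for q, row in zip(block_weights, results):
        for t, r in enumerate(average_ranks(row)):
            s = q * (r - (k + 1) / 2)  # weighted, centred within-block rank
            s_by_treatment[t] += s
            a2 += s * s
    b_term = sum(s * s for s in s_by_treatment) / b
    if a2 == b_term:
        # Perfect agreement across blocks: the statistic is unbounded,
        # so the result is conventionally treated as significant.
        return float("inf")
    return (b - 1) * b_term / (a2 - b_term)


# Hypothetical outcomes of three algorithms on three problem instances.
results = [[1.0, 2.0, 3.0], [10.0, 30.0, 20.0], [5.0, 4.0, 10.0]]
print(round(quade_statistic(results), 4))  # 0.8966
```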
Moreover, the approaches of [51] and [292] are not based on web standards, but on Java applets embedded in the pages. As far as we know, the only system providing XML web services for statistical hypothesis testing is [259]. This site provides some operations implementing parametric tests (specifically ANOVA and Student’s t-test), but does not provide non-parametric tests.
4.2.5 On the replicability of MOEs
The possibility of reproducing experiments is key to their validation. Replication is a widely adopted practice in certain areas of science. For instance, every time a relevant finding is published in physics, other labs around the world rapidly reproduce the experiment to confirm or disprove the discovery. To reproduce an experiment, all the information related to its description and execution should be provided. This is usually not the case in the current computer science literature, where the experimental details presented in a typical paper are usually insufficient to implement the same algorithm [80]. As pointed out by Eiben and Jelasity in [80], effectively reproducing and verifying the results found in the literature is almost impossible or, at the very least, an extremely laborious task. Moreover, even when the source code of the experiment is provided, the experiment may not be reproducible. Dependencies on the computational environment, runtimes and specific configurations impose limitations, forcing researchers to perform code reviews and even substantial modifications of the source code in order to replicate the experiment. Furthermore, the source code and comments of research prototypes are not as good and complete as those of production code. This usually makes the process more tedious and error-prone than re-implementing the algorithms and experimental execution procedures from scratch.
The problem of reproducing experiments is aggravated in the field of metaheuristics due to the huge number of technique variants, their stochastic nature, and the lack of a widely accepted scheme for describing experiments and their execution.
4.3 OVERVIEW OF OUR CONTRIBUTIONS
Next, we give an overview of the main contributions of this dissertation, relating them to the problems presented in the previous section.
4.3.1 On the implementation of MPS applications
We present a comparison framework to facilitate the comparison and selection of the best MOF for a given problem. The framework includes 271 features that an ideal MOF should support. A metric has been defined for each feature in order to assess the support provided by current MOFs. Furthermore, means to aggregate such assessments into general quantitative scores in six different areas have been defined. In total, we have studied 34 different MOFs found in the literature. Based on this comparison framework, ten MOFs are assessed to provide a picture of the current state of the art.
4.3.2 On the description of MOEs
We present two languages for the description of experiments: SEDL and MOEDL. SEDL (Scientific Experiments Description Language) is a generic language to describe experiments in a precise, unambiguous and tool-independent way. SEDL documents include all the information that a basic experiment description should provide regardless of the application domain, namely objects, subjects, population, variables, hypothesis, treatments and analysis design. SEDL also defines a set of extension points that allow the creation of Domain Specific Languages (DSLs) for the definition of domain-specific experiments. MOEDL (Metaheuristic Optimization Experiments Description Language) is a DSL based on SEDL for describing MOEs. It avoids the need to provide a full description of the most common elements of typical metaheuristic experiments, such as technique comparison and parameter tuning. Moreover, MOEDL includes extension points that enable adding support for multiple types of problems and techniques, e.g., multi-objective optimization problems. Although MOEDL does not currently support the problem and technique specification languages described in Section §4.2.2, an extension point has been introduced in order to add such support in the future. The benefits of using experimental description languages are manifold. First, they contribute to the automation and replication of experiments by making experimental descriptions complete and self-contained. Second, these languages facilitate the exchange of experimental information among researchers and practitioners. Finally, they can help novice users, who can use them as a guide to the basic information required to perform a correct experiment. In addition to these languages, we present a catalog of 15 analysis operations on SEDL documents. These operations automatically check the most common validity threats associated with experiments, warning users and suggesting possible fixes (see Chapter §3). For instance, these analysis operations can detect inconsistencies between the hypothesis of the experiment and the design specified for its conduction, and warn the user that this design cannot confirm or disprove that hypothesis. Finally, we present MOSES, a software ecosystem and associated reference implementation for the automated processing of SEDL and MOEDL documents. Among other features, MOSES supports the automated analysis of SEDL and MOEDL documents. To the best of our knowledge, this is the first approach proposing a generic, extensible and operational language for describing experiments, together with specific tools supporting their analysis.
4.3.3 On the execution of MOEs
SEDL and MOEDL documents include specific sections for the information required to execute the experiment, as well as for reporting the statistical analysis of the results. By including all the information needed to perform experiments and to compare their outcomes, this contributes to the automation and replication of experiments. Additionally, we propose 15 analysis operations on SEDL documents for checking the threats to validity related to the match between the experimental description and the results. For instance, one of those operations compares the expected number of measurements of the outcome variable, according to the experimental design, with the actual number of measurements present in the results. If these values differ, the conclusions of the experiment are threatened by one of three causes. The first is attrition (ITV-6), i.e., the experiment generated fewer measurements than expected. The second is a problem in the implementation of the treatments (ITV-12). The third is a failure in the measurements, i.e., the experiment generated more measurements than expected. Given a SEDL or MOEDL document describing an experiment, MOSES supports the following: the implementation of the described algorithms through FOM; the automated analysis of the experiment; the automated conduction and replication of MOEs through the Experiment Execution Environment (E3); and the publication of lab-packs in a specific format named Scientific Experiment Archive (SEA).
4.3.4 On the analysis of MOEs
We present STATService, a suite of on-line software tools for performing statistical analysis. Roughly speaking, STATService provides a multi-purpose and multi-user platform for statistical analysis that focuses on reusability, ease of use and learnability. It also provides three different interfaces (web, programmatic, and MS Excel-based) to access its statistical analysis capabilities.
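The consistency check described above compares the number of measurements expected from the design with those actually present in the results. It can be sketched in a few lines of Python. This is not the MOSES implementation. It assumes, hypothetically, a full crossing of treatments and problem instances with a fixed number of runs per combination. Each recorded measurement is represented only by its (treatment, instance) pair, and the function name is illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_measurement_count(treatments: List[str], instances: List[str],
                            runs_per_cell: int,
                            measurements: Iterable[Tuple[str, str]]) -> List[str]:
    """Compare expected and actual measurements of the outcome variable.

    The (hypothetical) design crosses every treatment with every problem
    instance and runs each combination `runs_per_cell` times, so it expects
    len(treatments) * len(instances) * runs_per_cell measurements in total.
    `measurements` holds one (treatment, instance) pair per recorded value.
    Returns human-readable warnings; an empty list means the counts match.
    """
    counts = Counter(measurements)
    expected_cells = {(t, i) for t in treatments for i in instances}
    warnings: List[str] = []
    for t, i in sorted(expected_cells):
        actual = counts.get((t, i), 0)
        if actual < runs_per_cell:
            warnings.append(f"{t} on {i}: {actual} of {runs_per_cell} runs "
                            "(possible attrition, ITV-6)")
        elif actual > runs_per_cell:
            warnings.append(f"{t} on {i}: {actual} of {runs_per_cell} runs "
                            "(possible measurement failure)")
    # Measurements for combinations absent from the design point to a
    # mismatch between the description and the implemented treatments.
    for t, i in sorted(set(counts) - expected_cells):
        warnings.append(f"{t} on {i}: {counts[(t, i)]} unexpected measurement(s)")
    return warnings


measurements = [("GA", "p1")] * 10 + [("SA", "p1")] * 9
for warning in check_measurement_count(["GA", "SA"], ["p1"], 10, measurements):
    print(warning)  # SA on p1: 9 of 10 runs (possible attrition, ITV-6)
```

Checking each combination, rather than only the overall total, also catches compensating errors. For example, one algorithm might miss a run on an instance while another has an extra one, which would leave the total unchanged.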
Furthermore, STATService provides value-added utilities, such as a wizard that helps non-expert users select the most appropriate test for their concrete dataset, and a programmatic interface, in the form of XML web services, for advanced users. STATService combines some traditional libraries with additional tests developed by the authors in order to provide a comprehensive set of statistical tests. Specifically, STATService integrates tests provided by JavaNPST, the supporting library developed by García et al. in [111], and the Apache Commons Math library [12].
4.3.5 On the replicability of MOEs
Each of the approaches presented in the previous sections contributes in one way or another to the replicability of experiments. First, the SEDL and MOEDL languages enable the complete description of experiments in a self-contained and machine-processable way. Second, MOSES enables the automated execution of experiments described in SEDL and MOEDL, as well as the automated validation of their descriptions and results. The very nature of the ecosystem makes it easily extensible, easing the replication of experiments using third-party MOFs or statistical analysis tools.
4.3.6 On the development of MPS-based applications
To evaluate our approaches, we used them to solve two relevant problems in the context of software engineering.
• Quality-driven web service composition. In this problem, the Quality of Service (QoS) provided by each service is used to drive the choice of the specific service provider to invoke in a composition, trying to maximize the global QoS experienced by the users. Experiments show that our algorithm, named QoS-Gasp, outperforms previous metaheuristic approaches proposed in the literature for real-time binding scenarios. Specifically, QoS-Gasp provided bindings that improve the QoS provided by previous proposals by up to 40%.
• Hard Feature Model Generation. In this problem, we searched for feature models (cf.
Section §H.2) that are as difficult as possible for current tools to analyse, in order to determine the tools’ performance in pessimistic scenarios. The proposed algorithm, named ETHOM, successfully identified models producing much longer execution times (up to 30 minutes) and higher memory consumption than those obtained with random models of identical or even larger size.
4.4 SUMMARY
In this chapter, we have presented the main problems that hinder the use of metaheuristics and that motivated this dissertation. These mainly concern the lack of comparison frameworks for MOFs and the lack of support for the description, execution, analysis and replicability of MOEs. We have also outlined the current solutions and summarized our contributions, emphasizing the gaps they fill.
5 COMPARATIVE FRAMEWORK FOR MOFS
If you cannot measure it, you cannot control it. John Grebe, 1900 – 1984, American chemist
This chapter presents a comparative study of Metaheuristic Optimization Frameworks. As comparison criteria, a set of 271 features grouped into 30 characteristics and 6 areas has been selected. These features include the different metaheuristic techniques supported, mechanisms for solution encoding, constraint handling, neighbourhood specification, hybridization, parallel and distributed computation, software engineering best practices, documentation and user interface, etc. A metric has been defined for each feature. The scores obtained by a framework are averaged within each group of features, leading to a final average score for each framework. Out of thirty-four frameworks identified in the literature, ten have been selected using well-defined filtering criteria. The results of the comparison are analyzed with the aim of identifying improvement areas and gaps, both in specific frameworks and in the whole set. Section §5.2 describes the methodology used to create our comparative framework, which is divided into six areas.
Subsequent sections (Sections §5.3 to §5.8) develop each area in detail, defining its characteristics, their importance, their metrics, and the data sources used for their evaluation. Each section provides charts and relevant results on the current support provided by the selected MOFs. In Section §5.9, we discuss the results from a global perspective, showing significant gaps and general tendencies. Finally, in Section §5.10 we summarize and present the main conclusions.
5.1 INTRODUCTION
The main goal of this chapter is to provide a general comparative framework to guide the selection of a particular MOF and to evaluate the current MOFs found in the literature. In doing so, this chapter extends the comparative framework of [107] by including frameworks that incorporate several types of metaheuristic techniques (cf. Sec. §5.3) and by presenting a comparative analysis of a large set of features. Specifically, this chapter advances the state of the art as follows:
1. A general comparative framework for MOFs that can be used to classify, evaluate and compare them.
2. An analysis of the current relevant MOFs in the literature, based on the proposed comparative framework.
3. An evaluation of the current state of the art of MOFs from the research context that can be used: (i) to guide newcomers in the area and (ii) to identify relevant gaps for MOF developers.
It is important to highlight the main value of this study. It lies neither in comparing the rankings of two concrete MOFs in a feature or characteristic, nor in stating which MOF best fulfills the benchmark criteria. Rather, it lies in establishing a general comparison framework that clearly defines the set of desirable features of MOFs, depicting a real “state of the art” of MOFs with improvement directions and gaps in feature support.
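The scoring scheme described at the start of this chapter assigns one metric per feature and averages the scores within each group of features. It can be sketched as a hierarchical mean. The Python sketch below assumes several things: each feature has already been scored numerically; features are grouped into characteristics and characteristics into areas; and each level is the plain mean of the level below. It also accepts optional per-feature weights, which are an assumption of this sketch. All area, characteristic and feature names in the example are hypothetical.

```python
from typing import Dict, Optional, Tuple

# area -> characteristic -> feature -> score of one MOF for that feature
FeatureScores = Dict[str, Dict[str, Dict[str, float]]]


def weighted_mean(values: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Mean of the values, using weights[key] (default 1.0) for each key."""
    weights = weights or {}
    total_weight = sum(weights.get(key, 1.0) for key in values)
    return sum(v * weights.get(key, 1.0) for key, v in values.items()) / total_weight


def aggregate_scores(scores: FeatureScores,
                     feature_weights: Optional[Dict[str, float]] = None
                     ) -> Tuple[Dict[str, float], float]:
    """Average feature scores per characteristic, characteristic scores per
    area, and area scores into a single global score."""
    area_scores: Dict[str, float] = {}
    for area, characteristics in scores.items():
        characteristic_scores = {
            name: weighted_mean(features, feature_weights)
            for name, features in characteristics.items()
        }
        area_scores[area] = weighted_mean(characteristic_scores)
    return area_scores, weighted_mean(area_scores)


scores = {
    "Techniques": {"Simulated annealing": {"f1": 1.0, "f2": 0.0},
                   "Genetic algorithms": {"f3": 1.0}},
    "Hybridization": {"Mechanisms": {"f4": 0.5}},
}
areas, overall = aggregate_scores(scores)
print(areas, overall)  # {'Techniques': 0.75, 'Hybridization': 0.5} 0.625
```

Averaging characteristic means, rather than pooling all features of an area, keeps a characteristic with many features from dominating its area. Whether the original benchmark uses exactly this convention is not stated here.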
This comparison framework has shown its value and generality. Without any modification, it allowed the evaluation of the new versions of the assessed MOFs released during this study (four MOFs released new versions). Moreover, the benchmark can be downloaded as a spreadsheet and tailored to user needs by modifying its weights, which is also crucial for making it more relevant and applicable.
5.2 REVIEW METHOD
The present comparison is based on the software technology evaluation methodology proposed by Brown and Wallnau in [39]. This methodology seeks to identify the value added by a technology by establishing a descriptive model of its features of interest and of their relationship and importance to its usage contexts. In our case, only the first phase is performed, providing a descriptive model that enables the evaluation of technologies and the description of the features of interest. The second phase, which involves conducting experiments with each of the MOFs in specific usage scenarios, is beyond the scope of this dissertation. To establish our descriptive model of the characteristics to be supported by MOFs, and to select the set of MOFs to assess, we followed a systematic and structured method inspired by Kitchenham’s guidelines [163]. First, we stated a set of research questions (see next subsection). Second, we established the information sources used to search for candidate MOFs. Then, we applied filtering criteria to obtain the final set of MOFs to be analyzed. Finally, we composed and grouped the full set of comparison criteria, and used them to assess the MOFs.
5.2.1 Research Questions
In this section, we further refine the research questions regarding MOFs presented in Chapter §4:
• RQ1: What metaheuristics are currently supported by MOFs? This question motivates the following sub-questions:
– Is there a MOF that supports the whole set of techniques?
– What is the most popular technique, i.e., which technique is implemented by most MOFs?
– Is there a “core set of techniques” supported by more than 50% of the assessed MOFs?
• RQ2: What tailoring mechanisms do current MOFs support, and to what extent are those mechanisms supported? This question motivates the following sub-questions:
– Is there a “core set of adaptation mechanisms” (such as solution encoding mechanisms, operators, etc.) supported by more than 50% of the assessed MOFs?
– Which MOF is best suited to be adapted to a specific problem?
• RQ3: What combinations of techniques (hybrid approaches) are supported when using a MOF? This question motivates the following sub-questions:
– Is hybridization a widely supported feature (supported by more than half of the assessed frameworks)?
– What is the most common hybridization mechanism supported by MOFs?
• RQ4: Can current MOFs help to find the best tuning for their supported metaheuristics (for instance, by performing hyper-heuristic search)?
• RQ5: To what extent do current MOFs take advantage of the parallelization capabilities of metaheuristics and of distributed computing?
• RQ6: What additional tools are provided by current MOFs in order to support the MPS life-cycle?
• RQ7: What costs and licensing models apply to current MOFs?
• RQ8: What platforms (operating system, programming languages, etc.) are supported by current MOFs?
• RQ9: Are current MOFs using software engineering best practices in order to improve code quality, maintainability, stability and performance?
After reviewing all this information, we also want to answer some more general questions:
• RQ10: What degree of maturity and popularity do current MOFs have? This question motivates the following sub-questions:
– What problems have been solved with each MOF?
– What documentation and help on its use does each MOF provide?
– Are current MOFs supported by scientific publications?
– What is the user community of each current MOF?
– Which is currently the most popular MOF?
• RQ11: What are the challenges to be faced in the evolution and development of MOFs?

5.2.2 Source material

The information sources used to search for MOFs have primarily been electronic databases, accessed through their online search engines. Specifically, we have searched IEEE Xplore, ACM Digital Library, SpringerLink and Scopus using the following search strings: “Metaheuristic Optimization Framework”, “Heuristic Optimization Framework”, “Metaheuristic Software library”, “Metaheuristic Optimization Library” and “Metaheuristic Optimization Tool”. Based on the results obtained, a list of candidate MOFs was generated. This list was later enlarged using direct web searches (using Google and the search strings described above) and the references found in papers and on the frameworks' web sites. Key references obtained during this phase were [281] by Voß and Woodruff, and the work of Gagné and Parizeau. Moreover, the frameworks' web sites were a key data source, given that their links, articles and related work sections allowed us to establish the full reference set to study. After a detailed analysis of these references, an initial set of main supported features and MOFs was established, and basic information on those tools was gathered. The list of candidate optimization tools contains 34 entries: Comet, ECJ, EvA2, evolvica, Evolutionary::Algorithm, GAPlayground, jaga, JCLEC, JGAP, jMetal, n-genes, Open Beagle, Opt4j, ParadisEO / EO, Pisa, Watchmaker, FOM, Hypercube, HotFrame, Templar, EasyLocal, iOpt, OptQuest, JDEAL, Optimization Algorithm Toolkit, HeuristicLab, MAFRA, Localizer++, GALIB, DREAM, Discropt, MALLBA, MAGMA and UOF.
5.2.3 Inclusion and Exclusion criteria

Some MOFs were discarded to keep the size and complexity of the review at a manageable level. The following filtering criteria were applied:

• The development of a MOF must be active, and its developers must support error fixing. A MOF whose users must debug every error they find by themselves, and which will not receive future improvements or features, is not a valid option. Consequently, this is our first filtering criterion. We consider as abandoned those frameworks without new versions (even minor bug fixes) or published papers in the last five years. This criterion eliminated 8 MOFs: jaga, HotFrame, Templar, MAFRA, DREAM, Discropt and UOF.
• MOFs to be evaluated must be frameworks implemented in general-purpose object-oriented languages (such as Java or C++). They must provide a general design in which user-defined classes are integrated in order to produce an optimization application that solves the problem at hand. Some useful optimization tools do not meet these requirements. They are consequently out of the scope of this chapter, but might be studied in similar comparative research. This criterion eliminated 3 MOFs: Evolutionary::Algorithm, PISA, Comet and OptQuest.
• MOFs must support at least two different optimization techniques, considering multi-objective variants of techniques as different techniques. Otherwise, they are considered specific applications, even if they can adapt to various problems. This criterion eliminated 9 MOFs: evolvica, n-genes, GALib, GAPlayground, Hypercube, JGAP, Open Beagle, jMetal and Watchmaker.
• Frameworks for which neither an executable version nor source code with its documentation could be obtained (after contacting their authors and requesting a valid version) were also eliminated. This criterion eliminated 4 MOFs: iOpt, JDEAL, OptQuest and MAGMA.
Table 5.1 shows the final set of frameworks compared, along with their specific versions and web sites.

| Name | Ver. | Web address |
|---|---|---|
| EasyLocal ([71]) | 2.0 | http://satt.diegm.uniud.it/EasyLocal++/ |
| ECJ ([179]) | 20 | http://cs.gmu.edu/~eclab/projects/ecj/ |
| EO/ParadisEO/MOEO/PEO ([41]) | 1.2 | http://paradiseo.gforge.inria.fr, http://eodev.sourceforge.net/ |
| EvA2 ([170]) | 2 | http://www.ra.cs.uni-tuebingen.de/software/EvA2/ |
| FOM ([209]) | 0.8 | http://www.isa.us.es/fom |
| HeuristicLab ([283]) | 3.3 | http://dev.heuristiclab.com |
| JCLEC (and KEEL) ([277]) | 4.0 | http://JCLEC.sourceforge.net, http://sci2s.ugr.es/keel/ |
| MALLBA ([8]) | 2.0 | http://neo.lcc.uma.es/mallba/easy-mallba/index.html |
| Optimization Algorithm Toolkit ([40]) | 1.4 | http://optalgtoolkit.sourceforge.net |
| Opt4j ([178]) | 2.1 | http://opt4j.sourceforge.net |

Table 5.1: Selected MOFs

Considerable effort went into this work, and the MOFs were chosen using well-defined and consistent filtering criteria. Even so, some metaheuristic optimization libraries of great practical interest have not been included in this study (e.g., JGAP, Hypercube, Watchmaker or Comet).

5.2.4 Comparison Criteria

Evaluating a software tool usually implies understanding and balancing competing concerns regarding the new technology. In this sense, the proposed comparison criteria cover 6 areas of interest. These areas are subdivided into 30 specific characteristics, which in turn are subdivided into 271 features. Table 5.2 shows the areas and the corresponding set of characteristics in this study. It also gives the research question that we intend to answer through the evaluation of each characteristic. Table 5.2 covers a wide range of concerns. At one end are MOF-specific characteristics such as supported metaheuristic techniques or solution encodings (covered in areas C1, C2 and C3). At the other end are general concerns such as usability, documentation and licensing model (covered in areas C4, C5 and C6).
Specifically, these areas are directly related to our research questions:

• Area C1 establishes a set of metaheuristic techniques and variants to be supported by MOFs. The assessment of this area for each framework allows us to answer RQ1 and its sub-questions.
• Area C2 describes the possible ways of tailoring metaheuristics to the problem. Thus, its assessment provides a basic way of answering RQ2, showing the support provided by each framework.
• Area C3 comprises a set of advanced capabilities such as hybridization, hyper-heuristics and parallel and distributed processing (related to RQ3, RQ4 and RQ5).
• Area C4 defines different kinds of additional features that are (or could be) offered by MOFs to support the optimization process (related to RQ6).
• Area C5 shows the platforms and programming languages supported by each framework, along with its licensing model and its use of software engineering best practices (related to RQ7, RQ8 and RQ9).
• Area C6 defines characteristics that address the sub-questions of RQ10.

| Area | Characteristics | Related RQ |
|---|---|---|
| C1 Metaheuristic Techniques | C1.1 Steepest Descent / Hill Climbing; C1.2 Simulated Annealing; C1.3 Tabu Search; C1.4 GRASP; C1.5 Variable Neighborhood Search (VNS); C1.6 Evolutionary Algorithms; C1.7 Particle Swarm Optimization; C1.8 Artificial Immune Systems; C1.9 ACO; C1.10 Scatter Search; C1.11 Multi-objective Metaheuristics | RQ1 |
| C2 Adaption to the Problem and Its Structure | C2.1 Solution Encoding; C2.2 Neighborhood Structure definition; C2.3 Auxiliary Mechanisms supporting population based heuristics (Genetic Operators); C2.4 Solution Selection Mechanisms; C2.5 Fitness Function Specification; C2.6 Constraint Handling | RQ2 |
| C3 Advanced Characteristics | C3.1 Hybridization | RQ3 |
| | C3.2 Hyper-heuristics | RQ4 |
| | C3.3 Parallel & Distributed Computing | RQ5 |
| C4 General Optimization Process Support | C4.1 Termination Conditions; C4.2 Batch execution; C4.3 Experiments Design; C4.4 Statistical Analysis; C4.5 User Interface & Graphical Reports; C4.6 Interoperability | RQ6 |
| C5 Design, Implementation & Licensing | C5.1 Implementation Language | RQ8 |
| | C5.2 Licensing model | RQ7 |
| | C5.3 Platforms availability | RQ8 |
| | C5.4 Usage of Soft. Eng. Best Practices (Test, Design Patterns, UML) | RQ9 |
| | C5.5 Size (classes and packages/modules) | – |
| | C5.6 Numerical Handling | – |
| C6 Documentation & support | C6.1 Sample problems types; C6.2 Articles & papers; C6.3 Documentation; C6.4 Users & Popularity | RQ10 |

Table 5.2: Areas of interest and comparison characteristics

The comparison framework contains characteristics of very different kinds, so properly quantifying the facilities provided by each MOF is a complex issue. Sometimes it is meaningless to use quantitative values for assessing certain characteristics. For example, it makes no sense to associate a quantitative value with the language in which the MOF is implemented. Therefore, for some characteristics we avoid defining metrics and use them as simple attributes of MOFs that might be relevant to users. Other characteristics, such as MOF size, have been left out of the comparative analysis because they do not affect the research questions.
However, the information harvested can be useful for further analysis. In our comparative approach, we have attempted to build a knowledge base about the real capabilities of MOFs that is as objective as possible. To this end, each characteristic has been defined, and a set of features has been identified to evaluate its support (with minor exceptions). Features are defined according to the maximum support that an ideal MOF could provide, not according to current state-of-the-art MOFs. This allows us to identify gaps and answer RQ11. Consequently, some characteristics are not fully supported by any MOF, and for some of them current support is nearly non-existent. Where a subjective criterion is needed, we have adopted the perspective of the research-use context (cf. Section §2.6.1) and the research questions stated.

We work on three levels: areas, characteristics and features. Characteristics are aggregated into areas, and several features are used to evaluate each individual characteristic. For each feature and MOF, a value is measured with one of two methods. First, the features of the characteristics in areas C1 to C4 are evaluated using a binary true/false value, which avoids subjectivity in the value assignment. This information is called feature coverage. It is the basis of a more general evaluation that provides a global quantitative value for each characteristic and area. Second, areas C5 and C6 represent non-functional characteristics corresponding to transversal aspects that cannot be measured objectively. As a consequence, each of their features is scored according to the research-use context. A specific value has been given to each characteristic based on these features, using a weighting that defines the contribution of each feature to the overall support of the characteristic. In the same way, each area is measured as a weighted sum of the evaluations of its corresponding characteristics.
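The two-level aggregation just described (feature coverage into characteristic scores, characteristic scores into area scores) can be sketched in a few lines. This Python sketch is purely illustrative and is not the code behind the published spreadsheet. It assumes that coverage is stored as a dictionary of booleans and weights as a dictionary of floats. The feature names are hypothetical labels, while the weights are those defined for Tabu Search (C1.3) later in this chapter. Characteristic weights default to 1.0, an assumption that mirrors area C1, where each technique contributes at most one unit.

```python
from typing import Mapping, Optional


def characteristic_score(coverage: Mapping[str, bool],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of binary feature coverage for one characteristic.

    coverage: feature name -> True if the MOF supports the feature.
    weights:  feature name -> weight in [0.0, 1.0].
    Features absent from `coverage` are treated as unsupported.
    """
    return sum(w for feature, w in weights.items() if coverage.get(feature, False))


def area_score(characteristic_scores: Mapping[str, float],
               characteristic_weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted sum of characteristic scores for one area.

    Without explicit weights, every characteristic counts with weight 1.0
    (an illustrative default, not a value taken from the study).
    """
    if characteristic_weights is None:
        characteristic_weights = {c: 1.0 for c in characteristic_scores}
    return sum(characteristic_weights[c] * s for c, s in characteristic_scores.items())


# C1.3 Tabu Search weights as given in the text; the keys are hypothetical labels.
TABU_WEIGHTS = {
    "basic_tabu_list": 0.3,
    "recency_memory": 0.2,
    "frequency_memory": 0.3,
    "aspiration_criteria": 0.2,
}

# A hypothetical MOF offering a basic tabu list plus aspiration criteria.
ts = characteristic_score({"basic_tabu_list": True, "aspiration_criteria": True},
                          TABU_WEIGHTS)
print(ts)  # 0.5
# Area score for the same hypothetical MOF, assuming full support of C1.1.
print(area_score({"C1.1": 1.0, "C1.3": ts}))  # 1.5
```

Redefining the weights, as the public spreadsheet allows, only changes the dictionaries; the aggregation itself stays the same.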
The proposed weights range from 0.0 to 1.0, meaning no contribution and full contribution to the support of the characteristic, respectively. Three different types of metrics have been devised:

• Uniform: the weight is distributed evenly among the features of the characteristic. This metric type is usually associated with variants or features that have no clear predominance in terms of popularity or performance.
• Proportional: a basic feature is given a significant weight (usually 0.5) and the remaining weight is distributed evenly among the other features of the characteristic. This metric type is used for characteristics with one clearly more useful feature plus some rare variants or additional features.
• Ad Hoc: weights are assigned to features based on specific author criteria.

It is important to note that we have set the weights from the perspective of a research-use context focused on optimization problem solving. In other specific scenarios, such as teaching or industrial problem solving, the weights could vary to reflect the exact importance of features, characteristics and areas in those contexts. This mechanism allows customized versions of the comparative study and tailored conclusions. This information is published as a public spreadsheet at http://www.isa.us.es/MOFComparison (see footnote 1). In this way, the data can be verified and reused, and the weights can be redefined. Moreover, for areas C1, C2, C3 and C4, tables showing feature coverage per framework and the weights are provided (Tables §A.1, §A.2, §A.3 and §A.4, respectively). In the following sections we describe each area: its characteristics, the corresponding features and weights, and the global scores obtained by each MOF. Tables 9, 10 and 11 in the appendix show these scores in detail.

5.3 METAHEURISTIC TECHNIQUES (C1)

The main feature of any MOF is the set of supported metaheuristics. A characteristic is defined for each metaheuristic, which indicates the support the MOFs provide for it.
Most of these metaheuristics have been described in detail in Section §2.2. Thus, for the sake of brevity, this section only provides the specific features taken into account and their weights.

5.3.1 Characteristics Description

A set of 11 characteristics with 52 features has been defined, comprising most of the major metaheuristics proposed in the literature. These are based either on intelligent search (characteristics C1.1, C1.2, C1.3 and C1.5), on solution building (C1.4, C1.9 and C1.10), or on populations (C1.6, C1.7, C1.8, C1.9 and C1.10). Furthermore, we have evaluated the incorporation of techniques for multi-objective problem solving (C1.11). The metaheuristics and variants taken into account have been chosen following [120] and some technique-specific references such as [3], [17] and [54]. Next, we describe each of these characteristics in detail. The coverage of features by frameworks and their weights are shown in Table §A.1.

1 http://www.isa.us.es/MOFComparison. This document contains comments about feature coverage and explains why some features are assessed as partially supported by some MOFs.

C1.1 Steepest Descent / Hill Climbing: This technique repeatedly moves to the best neighbor solution until it reaches a local optimum. It is commonly used for hybridization (cf. characteristic C3.1). Metric: We have defined two different features: (i) a basic implementation that stops when a local optimum is found, and (ii) a multi-start implementation that restarts from a random initial solution whenever a local optimum is found. A uniform metric is used (each feature weighs 0.5).

C1.2 Simulated Annealing: We have defined a feature associated with the basic implementation of this technique, and features for some of its variants. The cooling scheme variants are: the linear scheme, the exponential scheme as proposed by Kirkpatrick et al.
[161], the logarithmic scheme as defined by Geman and Geman [115], and schemes based on thermodynamics (defined by Nulton and Salamon, and by Andresen and Gordon). Additionally, we have evaluated variants of the criterion for accepting worsening solutions: Metropolis acceptance as proposed by [161] and logistic acceptance [123]. Metric: A proportional metric is used, where the basic implementation has a weight of 0.5, each cooling scheme variant weighs 0.1, and each acceptance criterion variant weighs 0.1.

C1.3 Tabu Search: This technique uses procedures designed to cross the boundaries of local optima by establishing an adaptive memory that guides the search process. Metric: An ad hoc metric is used to assess this characteristic. The weights are as follows:
– the basic implementation of the technique, using a tabu list as memory: 0.3;
– memory for the components of recent solutions: 0.2;
– frequency-based memory for solution components: 0.3;
– the inclusion of aspiration criteria: 0.2.

C1.4 GRASP: A single feature indicating support for this technique is used. It is evaluated as a binary value indicating whether the framework provides some kind of support.

C1.5 Variable Neighbourhood Search (VNS): Several variants of this technique have been proposed in the literature. Based on them, we propose the following features: (i) the original proposal (VNS); (ii) Variable Neighbourhood Descent (VND); (iii) Reduced VNS (RVNS); (iv) Variable Neighbourhood Decomposition Search (VNDS) by [128]; and (v) Skewed VNS by [66]. Metric: A uniform metric is used, with a weight of 0.2 for each feature.

C1.6 Evolutionary Algorithms (EA): Many techniques based on principles of biological evolution are denoted as Evolutionary Algorithms. Specifically, EAs comprise three independently developed approaches: evolution strategies (ES) proposed by [233], evolutionary programming according to [99] and genetic algorithms
as developed by [140]. These techniques present different variants, based on the elements used for adapting to the problem (some of them present in other techniques) and on some additional variation points. In order to create a global and coherent set of comparison criteria, we have identified separate characteristics for those variations.

Notably, the selection of individuals for crossover and survival is independent of the solution encoding. Frameworks can therefore implement different selection criteria and reuse them, since mechanisms for selecting solutions are used in various metaheuristics. We have created a characteristic for evaluating the support for solution selection (C2.4). Crossover and mutation mechanisms, in contrast, depend on the representation scheme used, and the efficiency of a specific mechanism strongly depends on the problem to be solved. Consequently, we have created an associated characteristic in the area of adaptation to the problem (C2.3). Thus, this characteristic (C1.6) only measures the support provided by frameworks for general evolutionary algorithms. It does not take into account solution encoding capabilities, genetic operators or the available selection mechanisms.

Of the many variants of the basic evolutionary algorithm proposed in the literature, we take into account:
(i) the use of variable population sizes (e.g., GAVaPS [13]);
(ii) niching methods (commonly used to solve multi-modal optimization problems);
(iii) individuals that encode more than one solution to the problem (usually diploid) [125];
(iv) co-evolution of multiple populations in competitive and cooperative environments, as described in [17, Chapter on co-evolutionary algorithms];
(v) differential evolution as developed by [225].
Variants (i), (iii) and (iv), as well as some versions of (ii), can be implemented regardless of the problem, the solution encoding or the operators used. Metric: An ad hoc
metric is defined to assess this characteristic. Three features have been identified to evaluate the support of the different evolutionary approaches, each weighing 0.2. Regarding the variants, (i) weighs 0.05, (ii) weighs 0.1, (iii) weighs 0.05, (iv) weighs 0.1 and (v) weighs 0.1. We evaluate variants as binary variables, in terms of the support afforded by frameworks.

C1.7 Particle Swarm Optimization (PSO): In this technique, the topology of the particles' neighbourhood gives rise to a full set of possible variants. This neighbourhood is the set of particles that influence the position of a given particle according to the update equations. In the original PSO, two kinds of topologies were defined: (i) global, in which all particles are neighbours of each other; and (ii) local, in which only a specific number of particles can affect a given particle. A systematic review of neighbourhood topologies is given in [160], and [264] proposes the concept of a “dynamic” neighbourhood topology. Another interesting variant assigns a “lifetime” to solutions in the swarm; once it expires, the solutions are randomized. Metrics: The original proposal for real variables and classic equations is represented by a feature weighing 0.3. Discrete variable support weighs 0.2, equation customization weighs 0.2, and the explicit modelling and support of different neighbourhood topologies weighs 0.2. Finally, lifetime support weighs 0.1.

C1.8 Artificial Immune Systems (AIS): This technique takes the structure and operation of the biological immune systems of mammals as a model and applies it to solving optimization problems.
This technique comprises various proposals:
– Clonal Selection algorithms, originally proposed by [199], and their variants such as CLONALG, developed by [65], and opt-IA;
– Immune Network algorithms;
– Dendritic Cell algorithms.
Metrics: A uniform metric is used to assess this characteristic (each feature weighs 0.25).

C1.9 Ant Colony Optimization (ACO): In this chapter, the following variants are taken into account: the original proposal of Ant System (AS) and Ant Colony System (ACS) as proposed by [73], Ant System using Rankings (ASrank), MAX-MIN Ant System (MMAS) according to [262], and API as developed by [192]. Metrics: An ad hoc metric is defined for this characteristic; the corresponding weights are shown in Table §A.1.

C1.10 Scatter Search (SS): This technique has a single feature, evaluated as a binary value, which indicates whether the framework provides some kind of support for it.

C1.11 Multi-objective Metaheuristics: The technique most commonly used to solve multi-objective optimization problems is EA ([77]). However, we have also taken into account variants of other techniques that have been adapted to multi-objective problems: SA (MOSA as proposed by [273] and PASA as developed by [265]), PSO ([217]) and ACO ([148]). The EA variants taken into account are:
– the original proposal by [122] (PGA);
– MOGA as proposed by [100];
– the Non-dominated Sorting Genetic Algorithm (NSGA and NSGA-II) as developed by [67];
– the Niched Pareto Genetic Algorithm (NPGA) according to [143];
– the Strength Pareto Evolutionary Algorithm (SPEA and SPEA-II) [307, 308];
– the Pareto Envelope-based Selection Algorithms (PESA and PESA-II) [56];
– the Pareto-archived ES (PAES) [165];
– the multi-objective messy GA (MOMGA) [275];
– ARMOGA [242].
Metrics: A uniform metric is used to assess this characteristic.
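To make the C1.2 variants concrete, the following Python sketch implements a basic simulated annealing loop (for minimization) with a pluggable cooling scheme and acceptance criterion. It is an illustration only, not code taken from any of the assessed MOFs. The linear, exponential and logarithmic schedules and the Metropolis rule follow their usual textbook forms. The logistic rule accepts a move with probability 1/(1 + e^(Δ/T)); this is one common formulation and is our assumption about the exact rule in [123]. Parameter names and defaults (t0, alpha, steps, seed) are hypothetical.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


# Cooling schemes: temperature at iteration k, given the initial temperature t0.
def linear_cooling(t0: float, k: int, alpha: float = 0.01) -> float:
    # Constant decrement, clamped to a tiny positive value so that T never reaches 0.
    return max(t0 - alpha * k, 1e-12)


def exponential_cooling(t0: float, k: int, alpha: float = 0.95) -> float:
    # Geometric decrease: T_k = t0 * alpha^k.
    return t0 * alpha ** k


def logarithmic_cooling(t0: float, k: int) -> float:
    # T_k proportional to 1 / ln(k + 2), normalised so that T_0 = t0.
    return t0 * math.log(2) / math.log(k + 2)


# Acceptance criteria: delta = E(candidate) - E(current); lower energy is better.
def metropolis_accept(delta: float, t: float, rng: random.Random) -> bool:
    # Always accept improvements; accept worsening moves with probability exp(-delta/t).
    return delta <= 0 or rng.random() < math.exp(-delta / t)


def logistic_accept(delta: float, t: float, rng: random.Random) -> bool:
    # Accept with probability 1 / (1 + exp(delta/t)); guard against overflow in exp.
    x = delta / t
    p = 0.0 if x > 700 else 1.0 / (1.0 + math.exp(x))
    return rng.random() < p


def simulated_annealing(initial: S,
                        energy: Callable[[S], float],
                        neighbour: Callable[[S, random.Random], S],
                        cooling: Callable[[float, int], float] = exponential_cooling,
                        accept: Callable[[float, float, random.Random], bool] = metropolis_accept,
                        t0: float = 10.0,
                        steps: int = 1000,
                        seed: int = 0) -> Tuple[S, float]:
    rng = random.Random(seed)
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    for k in range(steps):
        t = cooling(t0, k)
        candidate = neighbour(current, rng)
        e_candidate = energy(candidate)
        if accept(e_candidate - e_current, t, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:  # remember the best solution visited so far
                best, e_best = current, e_current
    return best, e_best


# Usage: minimise (x - 3)^2 over the integers, moving one step left or right.
best, e = simulated_annealing(20, lambda x: (x - 3) ** 2,
                              lambda x, rng: x + rng.choice((-1, 1)))
print(best, e)
```

Swapping `cooling` or `accept` (e.g., `cooling=logarithmic_cooling, accept=logistic_accept`) changes the variant without touching the main loop. This is the kind of variation point that the C1.2 features assess.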
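The C1.3 features can likewise be illustrated with a minimal tabu search over bit vectors. This is an illustrative sketch with hypothetical names, not taken from any assessed MOF. A short-term recency memory stores the indices of recently flipped bits, corresponding to a memory of recent solution components. An aspiration criterion overrides the tabu status of any move that yields a new best solution. Frequency-based memory is omitted for brevity.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

Bits = List[int]


def tabu_search_bits(fitness: Callable[[Bits], float],
                     n_bits: int,
                     iterations: int = 200,
                     tenure: int = 5,
                     seed: int = 0) -> Tuple[Bits, float]:
    """Maximise `fitness` over bit vectors using single-bit-flip moves."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    best, best_fit = current[:], fitness(current)
    # Recency memory: the indices of the last `tenure` flipped bits are tabu.
    tabu: Deque[int] = deque(maxlen=tenure)
    for _ in range(iterations):
        candidates = []
        for i in range(n_bits):
            neighbour = current[:]
            neighbour[i] ^= 1
            f = fitness(neighbour)
            # Aspiration criterion: a tabu move is admissible if it beats the best so far.
            if i not in tabu or f > best_fit:
                candidates.append((f, i, neighbour))
        if not candidates:  # every move is tabu and none satisfies aspiration
            break
        # Move to the best admissible neighbour, even if it is worse than `current`.
        f, i, current = max(candidates, key=lambda c: c[0])
        tabu.append(i)
        if f > best_fit:
            best, best_fit = current[:], f
    return best, best_fit


# Usage: OneMax (fitness = number of ones in the vector).
print(tabu_search_bits(sum, n_bits=10))  # ([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 10)
```

Once the optimum is reached, the search keeps moving to the best non-tabu neighbour, even though it is worse. This is how tabu search crosses the boundaries of local optima, while `best` preserves the incumbent solution.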
5.3.2 Assessment and Feature Coverage Analysis

Notably, only four features of this area are supported by at least six of the ten MOFs under study. These features correspond to the metaheuristics SD/HC, SA and EA. This shows a dispersion in the techniques supported by MOFs. Consequently, users have little choice if they want to use techniques outside this set: the choice of MOF is determined by the technique the user wants to apply. Table §A.1 also shows that 39% of the features in this area are not supported by any MOF, so current MOFs have room for improvement here. Moreover, the distribution of those unsupported features shows that technique support in MOFs is aimed at the basic variants. This does not apply to the techniques in the core set, to TS, or to some multi-objective variants: all of their features represent variants supported by more than 30% of MOFs. ParadisEO, EvA2 and FOM support the highest number of features in this area, followed by HeuristicLab and OAT.

5.3.3 Comparative analysis

FOM is the framework that provides the broadest support for optimization techniques, closely followed by ParadisEO, EvA2 and HeuristicLab. It is important to note that supporting more features does not imply supporting more techniques. Some techniques have a number of variants and specific heuristic implementations modeled as features. The weights help to express this fact, since they make each technique sum to a total score of 1 unit once its features are weighted. Figure §5.1 shows a stacked column diagram for the characteristics of area C1. Each color or texture represents a metaheuristic, and each column the support provided by a MOF. The number of techniques supported by each MOF can easily be identified by the number of different colors/textures in its column.
The degree of support for each technique is expressed through the height of its pattern. This height is computed from the weights associated with the technique's features and the feature support information shown. The total height of each column measures the global support for metaheuristics provided by the corresponding MOF. The almost universal support for EA and the lack of support for AIS are remarkable. SS is only supported by EvA2, and GRASP is only supported by FOM. Other metaheuristics with very little support are ACO, TS and VNS. This could be due to the complexity of modeling in the abstract the elements involved in their operation, and of reusing or customizing them: ACO and TS are based on features of solutions, and VNS needs to apply different neighborhood structures. When applying EAs in Java, ECJ, JCLEC and EvA2 appear as highly competitive options, whilst ParadisEO and MALLBA are the MOFs available if the user plans to use C++. In .NET environments, the only option available for applying EAs is HeuristicLab.

Figure 5.1: Stacked bar chart showing the support for techniques provided by MOFs

We can provide an answer to RQ1 and its sub-questions based on the information shown in Table §A.1 and Figure §5.1. The characteristics of area C1 summarize the whole set of metaheuristics currently supported by the assessed frameworks. Most variants of those techniques are unsupported. The most widely supported techniques are EA, SD/HC and SA, which are supported by at least 60% of the assessed frameworks. Finally, there is no universal MOF that supports all the techniques.

5.4 ADAPTING TO A PROBLEM AND ITS STRUCTURE (C2)

As stated in the previous section, MOFs provide implementations of metaheuristic techniques for problem solving. They also provide mechanisms to express problems properly so that these techniques can be applied. MOFs allow their supported metaheuristics to be adapted for better problem solving.
For instance, frameworks can provide appropriate data structures that the techniques can handle. This adaptation works in two directions. Techniques are adapted to problems for efficient problem solving, and problems are adapted to techniques for proper solution handling and implementation of the underlying heuristics. It is basically done in three ways: selecting an appropriate solution representation/encoding, specifying the objective function to optimize, and implementing the set of underlying heuristics required by the metaheuristic used to solve the problem.

5.4.1 Characteristics Description

This area evaluates the capabilities provided by MOFs to support this adaptation. Characteristic C2.1 assesses the capabilities to represent solutions to optimization problems based on the set of data structures provided by frameworks. Characteristics C2.2, C2.3 and C2.4 assess the supported set of underlying heuristics. Characteristic C2.5 assesses the capabilities for declarative objective function specification based on the representations assessed in C2.1. Finally, C2.6 assesses constraint handling capabilities. The features and characteristics described in this section have been structured following these sources:
– [17] and [240] for solution encodings (C2.1);
– [17] for selection and genetic operators (C2.3 and C2.4);
– [3] for neighborhood definition capabilities (C2.2);
– [190] for constraint handling techniques (C2.6).
Next, we describe each of these characteristics in detail:

• C2.1 Solution Encoding: Solution encodings are data structures used to model solutions so that metaheuristic techniques can handle them. In this sense, the more flexible the encodings and the more data structures provided, the less effort users must invest to address problems.
Metric: To evaluate this characteristic, we have taken into account 3 criteria: the data structures provided (vectors, matrices, trees, graphs and maps); the data types and information encoding; and the ability to use combined representations as described by [240]. A proportional metric is used, where this last feature weighs 0.4. The data types taken into account are bits (with usual or Gray encoding), integers, floating point numbers and strings. The remaining weight is divided evenly among the combinations of data type and data structure.
• C2.2 Neighborhood Structure definition: A proper neighborhood structure definition is a key factor for the success of intelligent search-based heuristics. The neighborhood structure strongly depends on the solution representation. Its suitability depends on the problem to be solved and on the technique used to solve it (as stated by [3]). Metric: The assessment is divided into 3 features:
– pre-defined neighborhood structures provided by MOFs: 0.6;
– neighborhood structures of composite representations: 0.3;
– complex neighborhood structures that apply different neighborhood structures randomly or based on some rule: 0.1.
• C2.3 Auxiliary Mechanisms supporting population-based heuristics (Genetic Operators): Genetic operators are the main underlying heuristics in EAs. Their implementation (except for selection operators, evaluated in C2.4) usually depends on the solution representation. Therefore, MOFs must provide the corresponding implementations for their supported representations. Various alternatives for implementing each genetic operator have been proposed in the literature, as described below. We have relied primarily on [17, chapter C3.3] to develop the definition and features of this characteristic. The most common genetic operators are crossover and mutation. Weights have been distributed evenly among all the variants provided for each operator.
Next, we enumerate the crossover operators proposed in the literature for the solution encodings of Table 3.
– Binary and integer vectors: The original crossover operator was proposed by [140] and named “one point crossover” (1PX). Its generalization to n crossover points (NPX) was proposed by [152]. We also consider uniform crossover (UX) [5], punctuated crossover (PNCTX) [243], shuffled crossover (SX) [86], half uniform crossover (HCX) [84] and random respectful crossover (RRX) as proposed by [229].
– Floating point vectors: Operators 1PX, NPX and UX are in principle applicable to floating point vectors. There is also a set of specific crossover operators that MOFs can implement: arithmetic crossover (AX/BLX) [189, p 112], heuristic crossover (HX) [301], simplex crossover (SPLX) [234], geometric crossover (GEOMX) [189], blend crossover (BLX-alpha) [85], crossover operators based on objective function scanning (F-BSX), and diagonal multi-parental crossover (DMPX) as proposed by [81].
– Permutations: Basic crossover operators such as 1PX, NPX and UX generate infeasible individuals when applied to permutation-based representations. It is therefore necessary to design specific operators for such representations: the order crossover operator (OX) [63], partially mapped crossover (PMX) [121], order-2 and position crossover [266], uniform crossover for permutations (UPX) [63, p 80], maximal preservative crossover (MPX) [197, p 331], cycle crossover (CX) [204], and merge crossover (MX) as defined by Blanton and Wainwright [32].
– State machines: Crossover operators for state machines (SMFx) were initially proposed by [97, 99] (pp. 21-23). In this comparative study, we evaluate those operators and the following ones: 1PX using a vectorial representation of the state machine (SM1PX) as defined by [306]; one-to-one state interchange as proposed by [98]; uniform crossover for state machines (SMUX); and the merge operator (SMJO) as defined by [31].
– Trees: Defining proper crossover operators for trees is genuinely difficult, particularly for trees representing programs, since constraints generally have to be imposed on their structure, semantics and associated data types. The most common crossover operators for trees were proposed by [58]. In this comparison, we also considered those defined by [169], and the adaptations proposed by [193].
– Crossover operators for composite representations (CSX): Crossover operators for individuals using composite representations can be built by applying the corresponding operators to each component of the representation.
– Composite crossover operators (CMPX): A composite crossover operator can be obtained by assigning a probability (or decision rule) to the application of each operator from a set of valid crossover operators for the representation used.

Next, we enumerate the mutation operators proposed in the literature for the solution encodings of Table 3.
– Binary and integer vectors: We have taken into account the original mutation operator proposed by [139, pp 109-111].
– Floating point vectors: We have considered the following mutation operators for floating point vectors:
  the mutation operator based on a uniform distribution U(−b, b) (RUm) proposed by [64];
  the normal mutation operator (RNm) developed by [245];
  the mutation operators based on the Cauchy (RCm) and Laplace (RLm) distributions as proposed by [194, 302];
  the proposals for adapting the mutation rate according to [245] and [96].
– Permutations: The mutation operators for permutations covered by this comparison are: 2-opt (P2Optm), 3-opt (P3Optm) and k-opt (PKOptm); the simple interchange of two elements (PSWm); the insertion operator (PIm), which deletes an element from its original position and reinserts it elsewhere; and the “scramble mutation operator” (PSCm) [266].
– State machines: The basic mutation operator for state machines is based on the set of its states and transitions, slightly modifying a state or a transition as proposed by [17, C3.2.4].
– Trees: The mutation operators for trees covered by this comparison are those proposed by [11]: (i) the grow mutation operator (TGm); (ii) the reduction mutation operator (TSHRm); (iii) the swapping mutation operator (TSWm); (iv) the cycle mutation operator (TCm); and (v) the Gaussian mutation operator for numeric nodes (TGNm). The adaptation proposed by [193] is also taken into account.
– Mutation operators for composite representations (CSm): Mutation operators for individuals using composite representations can be created by applying the corresponding operators to each component of the representation.
– Composite mutation operators (CPXm): Composite mutation operators can be built by assigning a probability (or decision rule) to the application of each operator from a set of valid operators for the representation used.
– Mutation operators using dynamic probability (DEm): There is empirical evidence [95] that a dynamic mutation probability that decreases exponentially along the evolution process improves the performance of EAs. In this comparison, we take this feature into account.
Metric: A uniform metric is defined, where the weight is evenly distributed between mutation (0.5) and crossover (0.5) operators. Within each group, the weight is distributed uniformly among the operator variants.
• C2.4 Selection Mechanisms: This characteristic assesses the support for different solution selection criteria. The problem of selecting a subset from a larger set of solutions appears as a specific heuristic component in a number of metaheuristic techniques (SA, TS, EA, ACO, etc.).
By applying OO analysis and design methodologies, and specifically the strategy design pattern 2, the objects encapsulating the solution selection logic are called selectors. The use of different selectors allows controlling the trade-off between exploration and exploitation of the search space. As a consequence, the performance of metaheuristic techniques in finding good solutions is strongly affected by the selection criteria. Usually, selection criteria are based on the fitness of solutions, but there is a wide set of possibilities, ranging from random to elitist selection (both stochastic and deterministic). In this comparison the following criteria are taken into account: (i) the elitist selector (Es), which picks the best solutions, and its variants: the expected value selector (EVs) and the elitist expected value selector (EEVs) as proposed by [152]; (ii) the proportional selector (Ps) as proposed by [139], where the probability P(s) of selecting a solution s is proportional to its fitness, and its variants: the random sampling selector (RSSs), the stochastic tournament selector (STs) [38] and the stochastic universal sampling selector (SUSs) as proposed by [18]; (iii) ranking-based selectors, linear (LRs) and non-linear (NLRs), developed by [294]; (iv) the (µ, λ) and (µ + λ) selection schemes; (v) threshold-based selectors (Ths); (vi) the Boltzmann selector (Bs); (vii) a fully random selector (RNDs); and (viii) a selector that combines two different selectors (COMBs) by dividing the set of elements to select among its components.
2 The strategy pattern is a software design pattern whereby algorithms can be selected at runtime. It is useful in situations where the algorithms used by an application must be swapped dynamically. The strategy pattern is intended to provide a means to define a family of algorithms, encapsulate each one as an object, and make them interchangeable [109].
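The selector concept maps directly onto the strategy pattern. The following Python sketch is illustrative only: the class names (`Selector`, `ElitistSelector`, `ProportionalSelector`, `TournamentSelector`, `RandomSelector`, `CombinedSelector`) are hypothetical and do not reproduce the API of any of the MOFs compared here. It assumes maximization and, for the proportional selector, non-negative fitness values; the tournament shown is a plain deterministic tournament, not the stochastic tournament (STs) cited above.

```python
import random
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence

Fitness = Callable[[object], float]


class Selector(ABC):
    """Strategy interface: choose k solutions from a population (maximization)."""

    @abstractmethod
    def select(self, population: Sequence, fitness: Fitness, k: int,
               rng: random.Random) -> List:
        ...


class ElitistSelector(Selector):
    """Deterministic: return the k solutions with the highest fitness."""

    def select(self, population, fitness, k, rng):
        return sorted(population, key=fitness, reverse=True)[:k]


class ProportionalSelector(Selector):
    """Roulette wheel: P(s) proportional to fitness (fitness must be >= 0)."""

    def select(self, population, fitness, k, rng):
        weights = [fitness(s) for s in population]
        return rng.choices(population, weights=weights, k=k)


class TournamentSelector(Selector):
    """Each pick is the best of `size` solutions sampled without replacement."""

    def __init__(self, size: int = 2):
        self.size = size

    def select(self, population, fitness, k, rng):
        return [max(rng.sample(list(population), self.size), key=fitness)
                for _ in range(k)]


class RandomSelector(Selector):
    """Fully random selection, ignoring fitness."""

    def select(self, population, fitness, k, rng):
        return [rng.choice(population) for _ in range(k)]


class CombinedSelector(Selector):
    """Splits the k picks between two selectors according to `share`."""

    def __init__(self, first: Selector, second: Selector, share: float = 0.5):
        self.first, self.second, self.share = first, second, share

    def select(self, population, fitness, k, rng):
        k_first = round(k * self.share)
        return (self.first.select(population, fitness, k_first, rng)
                + self.second.select(population, fitness, k - k_first, rng))


if __name__ == "__main__":
    rng = random.Random(42)
    population = [1, 5, 3, 9, 7]
    # Swapping the strategy object changes the selection behaviour, not the caller.
    for selector in (ElitistSelector(), TournamentSelector(3),
                     CombinedSelector(ElitistSelector(), RandomSelector())):
        print(type(selector).__name__, selector.select(population, float, 4, rng))
```

Because the calling algorithm only depends on the `Selector` interface, an EA, an ACO or a TS implementation could reuse the same selector objects, which is the kind of flexibility the strategy pattern is meant to provide.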
Metric: A uniform metric is used to assess this characteristic.
Figure 5.2: Adaption to the problem and its structure support
• C2.5 Fitness Function Specification Support: The most problem-dependent element of metaheuristic techniques is the objective function to be optimized. Therefore, even when using MOFs, its evaluation is usually implemented explicitly by users and integrated into the framework through its extension points. However, based on the solution encodings supplied by MOFs, it is possible to provide tools for declarative objective function specification, freeing users from the low-level task of implementing it. A Domain Specific Language (DSL) is a tool of great interest for this purpose. Compared with a classical implementation, a DSL can be a much simpler language than the implementation language, and the objective function can be integrated automatically if the MOF supports it. If the MOF provides suitable DSL tools for objective function specification (such as syntax highlighting and in-line debugging and error information), it could lead to a more declarative paradigm for metaheuristic problem solving, improving the usability of metaheuristics and contributing to a wider application of such techniques. Using DSLs for objective function specification also has drawbacks, such as the need to learn a new language, performance loss, and the inability to express some objective functions with the language constructs. Finally, there are problem types for which automating objective function evaluation is impossible, since it relies on interaction with a human operator to evaluate solutions. To support this kind of problem, MOFs can provide a form in which users directly enter the evaluation of solutions.
Moreover, a partial implementation could be provided, where MOF users customize the data entry form and the solution representation (graphical or textual), designing a user-friendly interface integrated within the framework. Metric: A uniform metric is defined to assess this characteristic, using the features enumerated above: DSL support, DSL tools, and forms for solution evaluation by human operators.
• C2.6 Constraint Handling: Constraint definition support is a feature of great importance for proper problem modeling. There are usually two different ways to handle constraints when solving optimization problems 3: (i) include constraint satisfaction in the objective function definition as penalties; and (ii) create repair mechanisms that are applied to infeasible solutions. There are three alternatives for implementing repair mechanisms in MOFs: (a) provide a global repair mechanism that users implement for the problem at hand, (b) model each constraint explicitly, and (c) provide specific repair mechanisms for each constraint. As in characteristic C2.5, (iii) the use of a DSL can make it easier for users to specify constraints, and some mechanisms, such as penalization (cf. (i)), can then be applied without any implementation effort by users. Metric: An ad hoc metric is defined to assess this characteristic, where the weights are associated to each feature as follows: (i) penalization 0.3, (ii.a) global repair mechanism 0.2, (ii.b) individual constraint modeling 0.2, (ii.c) individual constraint repair mechanisms 0.2, and (iii) DSL support 0.1.
3 Various techniques to adapt metaheuristics to constrained problems have been proposed in the literature (cf. [190] for instance). However, most of these approaches require an ad hoc implementation depending on the problem and the type of constraints to handle; consequently, it is difficult to integrate those proposals into a MOF.
Those ad hoc techniques have been omitted in our comparison.
5.4.2 Assessment and Feature Coverage Analysis
It is remarkable that only 9.57% of the features of this area are supported by at least six of the ten MOFs under study. Moreover, those features are associated with only three characteristics (namely C2.1, C2.3 and C2.4), and are mainly related to EA. Another interesting fact shown in Table §A.2 is that more than 25% of the features in this area are not supported by any framework.
5.4.3 Comparative Analysis
Areas C2 and C3 have the lowest average scores of our benchmark, evidencing that framework developers have put more emphasis on coding algorithms for problem solving than on supporting an easy and efficient adaptation of these algorithms to the problem. Remarkably, there is a lack of support for: (i) the definition of neighborhood structures (except in EasyLocal, ParadisEO and HeuristicLab), (ii) the specification of the objective function, and (iii) constraint handling (the exceptions being FOM, EvA2, ParadisEO and HeuristicLab). Figure §5.2 shows a stacked columns diagram for the characteristics of this area. As in Figure §5.1, colors represent the characteristics of this area and columns their support by the assessed MOFs. Based on the information shown in Table §A.2 and Figure §5.2, we can provide an answer to RQ2: the means of problem adaptation are summarized by the characteristics of area C2; however, the current support of these mechanisms is limited and strongly depends on the MOF and on the metaheuristic used for problem solving. It is important to note that characteristic C2.4 is intimately related to EA support, and consequently MOFs that do not support this technique are not able to support the features of this characteristic. However, those MOFs, such as EasyLocal, are still able to provide support for the rest of the area, and constitute very useful alternatives when applying other techniques.
Thus, users must take this into account when comparing different MOFs.
5.5 ADVANCED CHARACTERISTICS (C3)
In this area we evaluate general and advanced characteristics that are not related to specific metaheuristic techniques. Specifically, the characteristics assessed in this area are: the use of hybrid techniques, the implementation of hyper-heuristics, and distributed and parallel execution. These characteristics are of great interest since they can either drastically improve the results obtained or simplify the application of techniques. They are especially interesting because their implementation involves a high cost and complexity, which prevents their application in many contexts. As MOFs can provide these characteristics pre-implemented, their applicability is significantly broadened.
5.5.1 Characteristics Description
C3.1 Hybridization: There is ample empirical evidence of the success of hybrid techniques for solving optimization problems, as stated by Talbi [267]. Several authors, such as [267] and [238], have described taxonomies of hybrid metaheuristics to discern the ways in which techniques can be combined. In this work we restrict the concept of hybrid metaheuristic to a combination of techniques integrated at a high level (as defined by [231]), where each technique keeps its overall structure except at the point where it invokes the other. Specifically, we have considered four different types of hybridization: (i) batch execution of the same technique (BEMIh), in which the technique is executed several times; (ii) batch execution of different techniques (BEMMh), where various techniques are executed sequentially and the results of one can be used as the initial solution of others; (iii) interleaved execution of a technique as a step in each iteration of another, possibly affecting its internal variables (IMMh); and (iv) combinations of the above types (Ch). Metric: An ad hoc metric is defined to assess this characteristic, with the weights of the features being: (i) BEMIh 0.1, (ii) BEMMh 0.2, (iii) IMMh 0.6 and (iv) Ch 0.1.
C3.2 Hyper-heuristics: A hyper-heuristic is readily defined as a heuristic that selects heuristics. Hyper-heuristics are intended to provide robust and general techniques of broad applicability without requiring extensive knowledge of both the technique and the problem to solve. Hyper-heuristics have received much attention in recent years [47, 57]. A hyper-heuristic searches the space of heuristics for the heuristic that best solves a particular problem. This search space could consist of four different subspaces: (i) the space of optimization techniques, with fixed parameters for each technique; (ii) the space of parameter values for a technique; (iii) the space of underlying heuristics for a technique (e.g. searching on a space of applicable selection, mutation or crossover operators when using an evolutionary algorithm); and (iv) the space of possible solution encodings. Metric: A uniform metric is defined to assess this characteristic (with each search space weighing 0.25).
C3.3 Parallel & Distributed Computation: Many adaptations of metaheuristics have been proposed in the literature to exploit the parallel processing capabilities available in current distributed environments. Given the complexity and cost of implementing these strategies, incorporating them in a MOF significantly improves its applicability and relevance to the resolution of a great number of real problems. Parallel and distributed execution of metaheuristic techniques without intercommunication (IPDM) can be implemented independently of the technique to apply.
The only requirement is the installation of the MOF in each of the computers of the distributed environment, together with a communication and control mechanism to design, plan, launch and control optimization tasks in that environment. A similar variant is one in which techniques can exchange solutions (SSPDM); a parallel EA based on islands with migration (as proposed by [295]) would qualify as an SSPDM technique. Finally, techniques that require changes in the implementation of the metaheuristic are sub-classified by [41] into: Parallel Local Search Metaheuristics, where a single executing instance of the metaheuristic controls the distributed and parallel exploration of its current solution's neighborhood (LSPDNM); and Parallel Population-based Metaheuristics. There are two different approaches to create parallel population-based metaheuristics: (i) parallel and distributed objective function evaluation for the individuals of the population (PDPEDM), where each network node evaluates a different subset of the individuals in the current population; the main difference with SSPDM is that a single instance of the metaheuristic algorithm is executed in the distributed environment; and (ii) parallel evaluation of the objective function itself, where computing the objective function of a single solution involves parallel processing in various nodes (PDESSM). Metric: A uniform metric is defined to assess this characteristic, where the variants taken into account are IPDM, SSPDM, LSPDNM, PDPEDM and PDESSM.
5.5.2 Assessment and Feature Cover Analysis
It is remarkable that only 6.25% of the features of this area are supported by at least six of the ten MOFs under study. Furthermore, 40% of MOFs provide nearly nil support (fewer than 10% of features) in this area. With respect to the features of this criterion, the highest scores correspond to ParadisEO and FOM.
Although both frameworks support the first characteristic, FOM does not support parallel and distributed optimization, whilst ParadisEO does not support hyper-heuristics. Currently, FOM is the only framework that supports hyper-heuristics. Figure §5.3 shows a stacked columns diagram for the characteristics of this area.
Figure 5.3: Advanced characteristics support
Table §A.1 and Figure §5.3 answer RQ3, RQ4 and RQ5: basic hybridization, such as BEMIh and BEMMh, is currently supported by many MOFs, but more advanced hybridization techniques, such as IMMh and Ch, are not. Parallel and distributed computing is currently supported by ParadisEO, ECJ and MALLBA, and to a limited extent by other mainly EA-oriented frameworks such as JCLEC and EvA2.
5.6 MPS LIFE-CYCLE SUPPORT (C4)
One of the strengths of MOFs is their capacity to support the MPS life-cycle. This support allows users without deep knowledge of the area to apply metaheuristic techniques and obtain useful real results. This area evaluates these capacities.
5.6.1 Characteristics Description
Seven characteristics have been established, covering the various stages of the global optimization problem solving process (4.1, 4.2, 4.3, 4.4 and 4.7), and the ability to interact with the user (4.5) and with other systems (4.6). These characteristics are described below:
C4.1 Termination Conditions: Metaheuristics do not provide explicit termination criteria since, in general, it is not possible to evaluate whether they have reached the global optimum. Thus, users have to set criteria based on the specific needs and context of the problem to decide when to stop the execution of the metaheuristic.
MOFs can provide reusable implementations of the usual criteria, among which we find: (i) maximum number of iterations; (ii) maximum execution time; (iii) maximum number of objective function evaluations; (iv) maximum number of iterations or execution time without improvement of the best solution found; (v) reaching a given objective function value; and (vi) logical combinations (using the AND / OR operators) of the above (e.g. ExecTime ≥ 36000 OR ExecTimeWithoutImprovement ≥ 3600). (vii) Furthermore, termination conditions can be established independently of the problem to solve but depending on the technique used, such as a termination criterion based on the diversity of the population when using an EA. Finally, (viii) we evaluate the facilities provided for defining specific criteria by implementing them. In this sense, we have assessed the use of abstract classes or interfaces to evaluate the termination condition, and their use in the implementation of the metaheuristic techniques provided. Metric: A proportional metric is defined, where (viii) weighs 0.3 and the remaining weight is evenly distributed among the other criteria.
C4.2 Batch mode execution: The ability to automatically run a set of optimization tasks, where the user only has to specify the sequence and the number of times to execute each task, is important when performing experiments. Supporting this feature reduces costs by automating one of the most tedious tasks of research and of studies with empirical validation. We have defined four features related to this automation: (i) repeated execution of a task (using the same technique, parameter values and problem instance); (ii) repeated execution of a task with different parameters (defining a range or set of values for the parameters of the technique); (iii) execution of various tasks on the same problem instance; and (iv) execution of various tasks on multiple problem instances.
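The four batch-mode features can be seen as the expansion of a compact experiment description into a flat list of runs. The Python sketch below illustrates this under our own assumptions (the `Task` record and the `build_plan` function are hypothetical, not taken from any assessed MOF): each technique is given a grid of parameter values, and the plan contains one task per technique, parameter combination, problem instance and repetition. The optional `shuffle_seed` argument randomizes the execution order reproducibly.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    technique: str
    instance: str
    params: Tuple[Tuple[str, object], ...]  # sorted (name, value) pairs
    run: int                                # repetition index


def build_plan(grids: Dict[str, Dict[str, Sequence]],
               instances: Sequence[str],
               repetitions: int,
               shuffle_seed: Optional[int] = None) -> List[Task]:
    """Expand techniques x parameter grid x instances x repetitions into tasks."""
    plan: List[Task] = []
    for technique, grid in grids.items():
        names = sorted(grid)
        # itertools.product() with no iterables yields one empty tuple,
        # so a technique without tunable parameters still gets one setting.
        for values in itertools.product(*(grid[n] for n in names)):
            params = tuple(zip(names, values))
            for instance in instances:
                for run in range(repetitions):
                    plan.append(Task(technique, instance, params, run))
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(plan)
    return plan


if __name__ == "__main__":
    plan = build_plan(
        {"SA": {"t0": [10, 100], "alpha": [0.9]},  # (ii) parameter sweep
         "TS": {}},                               # (iii) several techniques
        instances=["tsp-a", "tsp-b"],             # (iv) several instances
        repetitions=3)                            # (i) repeated execution
    print(len(plan))
```

Running the example prints 18: two SA parameter settings and one TS setting, each executed three times on two (hypothetical) instances. Keeping the plan as plain data makes it easy to store it in a file and reload it later.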
Metric: A weight of 0.2 has been given to each of the four features described above. In addition, the ability to randomize the execution sequence of optimization tasks, and the generation and loading of a document or file where tasks are defined (the task execution plan, where the description of the tasks to execute can be user-supplied or generated by the MOF), weighs 0.2.
C4.3 Experimental Design: The appropriate design of experiments is essential to obtain valid conclusions in any study. This characteristic assesses the support provided by MOFs to establish hypotheses, identify dependent and independent variables, and select and define experiments properly using standard designs (factorial, Latin squares, fractional, etc.). This characteristic is assessed independently of the previous characteristic (C4.2) and of the capacity for statistical analysis of results (C4.4). There are two different ways to support this characteristic: (i) provide integration mechanisms with design of experiments systems (such as GOSSET [254]); and (ii) implement the utilities for experimental design in the MOF itself. Alternative (i) implies that the experiment design capabilities are those of the integrated system, which are difficult to assess in the context of this comparison.
We have created a set of features to assess the capabilities of frameworks that follow approach (ii): (a) hypothesis definition support, specifically for common hypotheses, such as the equality of performance of two techniques or the irrelevance of the value of a parameter within a range; (b) experiment modeling, supporting the definition of dependent and independent variables and their nature (nominal, ordinal or scalar); (c) experiment design based on the previous model using common schemes; and finally (d) the capability of executing the experiments automatically, i.e., of generating a proper task execution plan for the experiments designed (C4.2 evaluates the automated execution of such plans). Metric: A proportional metric is defined, where approach (i) weighs 0.2 and the remaining weight is evenly distributed among the features of approach (ii).
C4.4 Statistical Analysis: One of the most common tasks in solving optimization problems (and in any study with an empirical component) is the statistical analysis of experimental data and results. There are two different ways to support this characteristic: (i) to provide integration mechanisms with statistical analysis systems (such as R or SPSS); and (ii) to implement the utilities for statistical analysis in the MOF itself. One of the disadvantages of approach (i) is that the user must import the data into the statistical analysis system, perform the statistical tests, interpret the results, and return to the framework to change parameters or implementations if necessary. On the positive side, this approach frees the MOF from implementing the statistical tests; moreover, statistical analysis systems are usually more complete and powerful than the test implementations integrated in frameworks. In contrast, strategy (ii) allows the framework to automate the tests and the associated data exchange, to show the results integrated in its user interface, and even to react autonomously to the results of tests. A set of features has been created to assess the capabilities of frameworks that follow approach (ii), concerning the support of various parametric and non-parametric tests: (i) Student's t-test; (ii) one-way ANOVA; (iii) two-way ANOVA; (iv) n-way ANOVA; (v) Mann-Whitney U test; (vi) Wilcoxon test; and (vii) Kolmogorov-Smirnov test (or any test to assess whether data are normally distributed). The use of approach (ii) does not necessarily imply that approach (i) cannot be applied. In this sense, the integration with the statistical software can be performed at the test execution level (to free the MOF from the implementation burden), while providing programmatic support or graphical interfaces integrated in the MOF. Metric: A proportional metric is defined, where approach (i) weighs 0.3 and the remaining weight is distributed uniformly among the features of approach (ii).
C4.5 User Interface, Graphical Reports and Charts: The usability of applications strongly depends on the proper design of their Graphical User Interface (GUI). Specifically, an appropriate GUI for MOFs requires taking into account the rest of the characteristics of this comparison area: the ability to select and configure the parameters of the different techniques, the reporting of results, the monitoring of the status of optimization tasks and of the global execution plan, the control of nodes in distributed and parallel computing environments, on-line technical support, and assistance or communication with the user forums and developers of the MOF. Moreover, although GUI design and usability could be assessed, such an evaluation would include a subjective bias.
To avoid this bias, we have defined the following set of features to be evaluated: (i) integrated help and basic usability (menus, shortcut buttons, etc.); (ii) technique specification and parameter configuration support; (iii) problem modeling and data import; (iv) graphical support of advanced features (subdivided into batch mode execution configuration, design of experiments and statistical analysis of results); (v) the use of optimization projects where all the information about problem instances, techniques and results is stored; and (vi) the graphical representation of results through diagrams and figures. Metric: A uniform metric is defined to assess this characteristic (each feature weighs 0.2). If the MOF only shows the evolution of the objective function of the best solution, but no additional metrics are provided (such as population diversity when using EA, or the current solution when using TS or SA), feature (vi) has been evaluated with half of its weight.
C4.6 Interoperability: This characteristic assesses the set of capabilities that frameworks provide to exchange information and interact with other systems. Specifically, the following features are taken into account: (i) results and data export capabilities (considering formats such as CSV or Excel/ODF files); (ii) data import capabilities (using formats such as CSV, Excel/ODF files or the specific formats of standard libraries for each problem type, such as SATLIB or TSPLIB); (iii) the capability of deployment and invocation as a web service (as in [112]); and (iv) the use of XML to store the information associated with optimization projects (selected solution encoding, objective function and problem model, techniques and their parameters, experiment design, results and statistical analysis, etc.), so that other systems can process these data and parameters in a simple way. Metric: A uniform metric is defined to assess this characteristic (each feature weighs 0.25).
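Features (i), (ii) and (iv) of C4.6 rely on standard, tool-neutral formats. As a hedged illustration using only the Python standard library, the sketch below writes per-run results as CSV, reads them back, and serializes a minimal project description as XML. The element and attribute names (`optimizationProject`, `problem`, `technique`, `parameter`) are invented for this example; none of the assessed MOFs is claimed to use this schema.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List


def results_to_csv(rows: List[Dict[str, object]]) -> str:
    """Export one row per run; the column names come from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def results_from_csv(text: str) -> List[Dict[str, str]]:
    """Import previously exported results (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def project_to_xml(problem: str, encoding: str, technique: str,
                   params: Dict[str, object]) -> str:
    """Serialize a (hypothetical) optimization project description as XML."""
    root = ET.Element("optimizationProject")
    ET.SubElement(root, "problem", name=problem, encoding=encoding)
    tech = ET.SubElement(root, "technique", name=technique)
    for key, value in params.items():
        ET.SubElement(tech, "parameter", name=key, value=str(value))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    csv_text = results_to_csv([{"run": 0, "best": 12.5}, {"run": 1, "best": 11.0}])
    print(csv_text, end="")
    print(project_to_xml("TSP", "permutation", "SA", {"t0": 100, "alpha": 0.95}))
```

A real MOF would also need to version such a schema and cover experiment designs and statistical results, which this sketch deliberately omits.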
5.6.2 Comparative Analysis
The low score obtained by ParadisEO in this area is surprising, and highlights it as a potential area of improvement for that framework. OAT, which has a well-designed GUI as well as powerful experiment execution and statistical analysis support, is among the highest-scored frameworks, followed by JCLEC, whose characteristics in this area have been evaluated together with those of its associated project KEEL (focused on data mining and classification applications). Note that this area has, together with areas C2 and C3, the lowest support levels, thus representing significant areas of improvement in the present framework ecosystem. Figure §5.4 shows a stacked columns diagram for the characteristics of this area.
Figure 5.4: General optimization process support
Table §A.4 and Figure §5.4 answer RQ6: the characteristics of area C4 summarize the capabilities provided by current MOFs for supporting research studies and the general problem solving process. These capabilities range from statistical analysis and experiment execution engines to GUIs with wizards and chart generation. These tools, however, are not interoperable, and their quality and support are not homogeneous but dispersed across the set of frameworks. Consequently, those tools are not available for all techniques, programming languages and platforms.
5.7 DESIGN, IMPLEMENTATION AND LICENSING (C5)
Both a suitable licensing model and the ability to run on multiple platforms are essential to the success of any software product. In the case of software frameworks, proper design and effective implementation are also very important, since the applications created with a framework incorporate its design (with the errors and problems it may contain). Moreover, the efficiency of those applications is limited by the efficiency of the framework.
As a consequence, a comparison area has been defined to group this set of characteristics, as described below.
5.7.1 Characteristics description
C5.1 Language: The implementation language can be a key factor for users of MOFs, since the use of a well-known programming language reduces development costs and the likelihood of errors. The frameworks under consideration in this study are implemented in C++, C# and Java.
C5.2 Licensing: Cost is not a characteristic of interest since all the frameworks assessed are free; however, the licensing of MOFs can limit the context and purposes of their use, or force users to provide their clients with the source code of the generated application. From this perspective, the types of license we take into account are: (i) commercial; (ii) free, without MOF source code and without commercial use; (iii) free with MOF source code available only for certain organizations and usages (usually universities and non-profit activities); (iv) MOF source code available under the GPL (GNU General Public License) or similar, which forces the distribution of the source code of derived products under the GPL; and (v) MOF source code available under the LGPL (GNU Lesser General Public License) or similar, which allows its use in commercial applications without restrictions on source code availability. Metric: This characteristic is not evaluated using a set of features; instead, we establish a direct score based on the freedom that each license provides: (i) commercial licensing = 0; (ii) free binaries (no commercial use) = 0.25; (iii) restricted availability of source code = 0.5; (iv) GPL = 0.75; and (v) LGPL = 1.
C5.3 Supported Platforms: The set of platforms taken into account is: Windows, Unix (Linux, Solaris, HP-UX, etc.) and Mac. Metric: A uniform metric is defined, with each platform weighing 1/3; in the case of partial support (only a limited set of features is available on a certain platform) we apply a 50% penalty.
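The direct score of C5.2 and the platform metric of C5.3 reduce to simple arithmetic. The Python sketch below reproduces them under our reading of the text; the dictionary keys (`commercial`, `free_binaries`, etc.) and the support labels (`full`, `partial`, `none`) are hypothetical names introduced only for this example. For instance, a framework fully supported on Windows and Unix but only partially on Mac would obtain 1/3 + 1/3 + 1/6 = 5/6 ≈ 0.83 for C5.3.

```python
from typing import Dict

# C5.2: direct score according to the freedom granted by each license type.
LICENSE_SCORES: Dict[str, float] = {
    "commercial": 0.0,
    "free_binaries": 0.25,       # free binaries, no commercial use
    "restricted_source": 0.5,    # source only for certain organizations/usages
    "gpl": 0.75,
    "lgpl": 1.0,
}

# C5.3: each platform weighs 1/3; partial support is penalized by 50%.
PLATFORMS = ("windows", "unix", "mac")
SUPPORT_FACTOR = {"full": 1.0, "partial": 0.5, "none": 0.0}


def license_score(license_type: str) -> float:
    return LICENSE_SCORES[license_type]


def platform_score(support: Dict[str, str]) -> float:
    """`support` maps a platform to 'full', 'partial' or 'none' (missing = 'none')."""
    weight = 1.0 / len(PLATFORMS)
    return sum(weight * SUPPORT_FACTOR[support.get(p, "none")] for p in PLATFORMS)


if __name__ == "__main__":
    print(license_score("gpl"))                                   # 0.75
    print(round(platform_score({"windows": "full", "unix": "full",
                                "mac": "partial"}), 4))           # 0.8333
```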
C5.4 Software engineering best practices: A proper design that follows software engineering best practices is especially important for MOFs. However, assessing the design of a framework in a quantitative and objective way is a difficult task. As a result, the features only evaluate the basic use of certain tools and processes recognized as best practices, such as: (i) the use of design patterns to promote flexibility in variation points; (ii) the use of automated (unit) tests, evaluated on the source code of the MOFs (for those that do not provide their source code, the evaluation is based on the documentation, if tests exist); (iii) explicit documentation of the MOF variation and extension points; and (iv) the use of reflective capabilities and dependency injection to promote flexibility as described by [104]. The latter feature corresponds to the capability of the framework to dynamically load problem types, objective functions, and other elements associated with customization or extension without having to recompile the framework. With regard to feature (iv), MOFs that perform runtime loading of modules have been assigned half of the weight, while those that use a dependency injection system for the management of modules receive the full weight. Metric: A uniform metric is defined to assess this characteristic.
C5.5 Size: A basic measure of the complexity of a framework is its size, which can be measured by various metrics: number of lines of code, number of classes and packages/modules, number of variation points and possible combinations of components, etc. It would be inappropriate to use the size of frameworks as a quantitative evaluation criterion, since the functionalities supported are not directly related to it, and an increase in size does not necessarily imply greater complexity of use. Therefore, we consider it a qualitative criterion.
As a consequence, we report some of these measures for each framework, but they are not included in the quantitative assessment.
C5.6 Numerical handling: Most metaheuristic techniques are stochastic, requiring the use of a random number generator. This fact has two consequences: (i) choosing a good random number generator is a key point for the proper behavior of the techniques implemented by MOFs; and (ii) in order to support the replicability of experiments, a single seed must be used for all the random number generators used throughout the framework and the customizations/extensions developed by users. The features defined for this characteristic evaluate these two points: (i) evaluates whether a proper random number generator is provided (either a Mersenne Twister implementation or support for customizing the random number generation scheme); and (ii) evaluates the replicability of experiments based on the support of a global seed and the provision of a random number generator using this seed to user-implemented modules. Metric: A uniform metric is defined to assess this characteristic.
5.7.2 Assessment and feature cover analysis
This area seems to be the most homogeneous and best supported, in the sense that most frameworks support almost all the features and to a high degree. Platform support is practically universal, except for HeuristicLab, EasyLocal and some modules of ParadisEO. The general adoption of the UML notation, as well as of open source licensing models, is also remarkable. Figure §5.5 shows a stacked columns diagram for some characteristics of this area. With regard to the size of MOFs, Figure §5.6 shows the framework sizes in terms of number of packages (or modules) and classes (or files, when there is no direct relation between files and classes).
These attributes may be of interest because the size of a framework may be an indirect measure of its complexity and, therefore, of its potential difficulty of use. However, the restrictions imposed by the language should be taken into account; for example, in Java each public class must be in a separate file.

Table §5.3 and Figure §5.5 provide the answers to RQ7, RQ8 and RQ9. MOFs are widely available across platforms, and each technique is available on nearly all platforms. This is due to the use of platform-independent programming languages such as Java and C++ (using standard libraries). However, since no MOF supports all techniques, users must be careful: although other platforms may provide implementations of the missing techniques, the effort needed to switch is considerable and implies giving up other features or variants. All the frameworks evaluated provide GPL or free licenses for teaching or research purposes. Finally, basic software engineering best practices, such as UML diagrams of the MOF architecture and dynamic module loading, are widely supported, but more advanced ones, such as automated tests, the use of dependency injection libraries and explicit documentation of variation points, are not. Notably, some frameworks neither provide a proper random number generator nor support its customization.

Figure 5.5: Design, implementation & licensing assessment
Figure 5.6: Frameworks size

Table 5.3: MOFs programming languages, platforms and licenses
MOF                              Prog. Lang.   Platforms                               License
EasyLocal                        C++           Unix                                    GPL
ECJ                              Java          All                                     Open Source (Academic free license)
ParadisEO                        C++           All (except for Windows if using PEO)   CECILL (ParadisEO) and LGPL (EO)
EvA2                             Java          All                                     LGPL
FOM                              Java          All                                     GPL
HeuristicLab                     C#            Windows                                 GPL
JCLEC (and KEEL)                 Java          All                                     LGPL
MALLBA                           C++           Unix                                    Open Source
Optimization Algorithm Toolkit   Java          All                                     LGPL
Opt4j [178]                      Java          All                                     LGPL

5.8 DOCUMENTATION & SUPPORT (C6)

When selecting a framework for developing any kind of application, documentation, technical support and user community responsiveness are important. These are the factors that can smooth out the learning curve when users have no experience and need to solve problems or errors that arise during use. Consequently, we have considered those factors, together with additional features that measure the maturity of the frameworks, such as the types of problems that MOFs provide as samples and the number of scientific articles published using the framework:

C6.1 Sample problem types: As a measure of the maturity and supportiveness of frameworks, this characteristic assesses the problem types implemented in the MOFs. It also measures to what extent MOFs have been applied to and tested on different kinds of problems. Moreover, the solved problem types can be excellent starting points for users who need to solve problems similar to those provided. The set of problem types considered comprises problem families such as TSP, SAT, QAP, Job Shop Scheduling, Flow Shop Scheduling, knapsack, the iterated prisoner's dilemma, symbolic regression and others. The exact problem types can be consulted in the evaluation data sheet mentioned previously. Metric: A uniform metric is defined, where the weight is distributed evenly among the evaluated problem types. The set comprises fifty-nine different problem types.

C6.2 Articles & papers: Another way to assess the maturity and quality of MOFs is through scientific publications that describe MOFs or report their use.
The assessment of this characteristic relies on the publications found during our literature review and on those listed on the MOF websites. A total of 285 publications were found for the selected MOFs, searching for papers from 2000 to 2010. Metric: An ad hoc metric is defined: the maximum score (1.0) was assigned to the framework with the most publications, namely ECJ with 113, and the scores of the other frameworks were computed with the formula score = (publications of MOF N) / (maximum number of publications per MOF).⁴

C6.3 Documentation: Documentation is the main source of information for the users of a framework and a key element to enable its use. This characteristic is assessed based on the presence (or absence) of the following features: (i) user manual; (ii) technical/development documentation; (iii) “how to” document, where short recipes are provided to perform usual actions; (iv) frequently asked questions section on the framework web site or documentation; and (v) MOF web site. Metric: A uniform metric is defined, where each feature weighs 0.2.

C6.4 Users & Popularity: This characteristic assesses the number of users of each framework. Its evaluation is based on the number of researchers using each framework outside the research group and development team of its creators; we call them “external users”. To evaluate this characteristic, we took the publications using each MOF found during our literature review and those listed on the MOF websites, and removed those where one of the authors is a member of the development team or research group of the MOF creators. Metric: An ad hoc metric is defined: the maximum score (1.0) was assigned to the framework with the most external publications, namely ECJ with 84, and the scores of the other frameworks were computed with the formula score = (external publications of MOF N) / (maximum number of external publications per MOF).
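The ad hoc metrics of C6.2 and C6.4 normalize each count by the maximum count over all MOFs, and the uniform metric of C6.3 gives each of the five documentation features a weight of 0.2. A minimal sketch of both computations follows. The class and method names are hypothetical, and apart from ECJ's counts (113 publications, 84 external) any input values are placeholders rather than data from the study.

```java
import java.util.HashMap;
import java.util.Map;

public class MofScores {

    /** Ad hoc metric of C6.2/C6.4: each MOF's count divided by the maximum count. */
    public static Map<String, Double> normalizeByMax(Map<String, Integer> counts) {
        int max = counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            scores.put(e.getKey(), max == 0 ? 0.0 : (double) e.getValue() / max);
        }
        return scores;
    }

    /** Uniform metric of C6.3: each documentation feature present adds 1/5 = 0.2. */
    public static double documentationScore(boolean userManual, boolean technicalDocs,
                                            boolean howTo, boolean faq, boolean webSite) {
        boolean[] features = {userManual, technicalDocs, howTo, faq, webSite};
        int present = 0;
        for (boolean f : features) {
            if (f) {
                present++;
            }
        }
        return present / (double) features.length;
    }

    public static void main(String[] args) {
        Map<String, Integer> publications = new HashMap<>();
        publications.put("ECJ", 113);   // count reported in the text
        publications.put("MOF-X", 50);  // hypothetical placeholder
        System.out.println(normalizeByMax(publications));
        System.out.println(documentationScore(true, true, false, false, true));
    }
}
```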
The whole set of publications found per framework is available at http://www.isa.us.es/uploads/MOFs/bib/N-external.bib, where N is the name of each MOF; for instance, the ECJ bibliography of external publications is available at http://www.isa.us.es/uploads/MOFs/bib/ECJ-external.bib.

⁴ The whole set of publications found per framework is available at http://www.isa.us.es/uploads/MOFs/bib/N.bib, where N is the name of each MOF; for instance, the ECJ bibliography is available at http://www.isa.us.es/uploads/MOFs/bib/ECJ.bib.

5.8.1 Comparative Analysis

In general, the least supported feature in this area is the implemented problem types. With regard to papers that describe or apply MOFs, and to popularity among external authors, ECJ is the most salient framework, dwarfing the other MOFs in this comparison. Figures §5.7(a) to §5.7(d) illustrate this fact: Figure §5.7(a) shows the number of publications per MOF and year. ECJ appears as the senior framework, having obtained early on a dominant position that it still holds. Figure §5.7(b) shows the total number of publications per MOF. Figure §5.7(c) shows the number of external and internal publications per MOF as a stacked columns chart. Figure §5.7(d) shows the number of external authors per MOF. ECJ is followed by ParadisEO and HeuristicLab in number of publications. ECJ has nearly 75% of external publications and 65% of external authors. The least popular frameworks are FOM and OAT, with almost no external usage and a small number of publications. Note that two frameworks score low on Documentation, namely OAT and EasyLocal. All frameworks have active and supportive communities of users/developers. Figure §5.8 uses a stacked columns diagram to summarize the support of this area’s characteristics.
Figures §5.7 and §5.8, and the information gathered throughout this study, provide an answer to RQ10: currently, a high number of MOFs are available that support a wide set of features. So, when addressing new problems or performing research studies on well-known ones, the use of MOFs is a valid approach. The use of MOFs outside their developers’ research groups could be boosted by improving framework documentation and support. Currently, the most popular framework is ECJ, which has a large community of external users and a wealth of publications year after year. Moreover, there seems to be a correlation between the score in area C3 and MOF popularity, since the frameworks with higher scores in that area are also the most popular. This is not surprising, since that area contains some of the features that add the most value for users. These features, such as distributed and parallel optimization, make MOFs capable of solving extremely complex problems and are difficult to implement from scratch, thus making those frameworks more attractive for users who need them and contributing to their popularity.

Figure 5.7: Publications and external authors per MOF. (a) Publications per MOF and year; (b) Total publications per MOF; (c) Total number of internal and external publications per MOF; (d) Total number of external authors per MOF

Figure 5.8: Documentation and technical support

5.9 DISCUSSION AND CHALLENGES

In this section, we discuss the results obtained in this study. Based on these results, we identify a number of challenges (RQ11) to be addressed in the future. The challenges reflect the authors’ own view of open questions, based on the analysis presented in this chapter. Figure §5.9 shows the global score results for MOFs as Kiviat diagrams, summarizing the results of this study and evaluating MOFs from a research user’s perspective.
In the appendix, Table §A.2 shows the global score obtained by each MOF for each characteristic, as well as the average for each area. To achieve the maximum score in areas C1, C2 and C3, a MOF would have to implement an ample subset of the current state of the art in metaheuristics, so it is not surprising that the scores generally do not reach the maximum possible value. On the contrary, the low average values in areas C4 and C6 are significant and therefore indicate a general direction of improvement for current MOFs.

5.9.1 Capabilities Discussion

On average, the MOF with the best score is ECJ (maximum area in Fig. §5.9), making it a preferred choice for users who can use EAs in Java. However, this MOF scores below average in areas C1 and C5, which are clear areas for improvement and could lead users to evaluate other options (C1 measures the techniques available). The next best scored MOF is ParadisEO, salient in areas C1 and C3, which uses C++ as its implementation language. This MOF, however, scores below average in area C4, which is thus a clear area for improvement. The MOFs that provide the broadest support in terms of variety of metaheuristics (criterion C1) are FOM and ParadisEO. The score obtained by OAT in area C4 is remarkable, well above average, owing to its GUI, experiment execution and statistical analysis tooling. In this same area, the support of JCLEC (and its twin project KEEL) is also above average. However, the best score for the GUI characteristic is obtained by HeuristicLab, whose latest version (3.3) provides a complete, highly configurable and intuitive user interface. Area C5 is where MOFs obtain the best average results. This is not surprising, given that these characteristics are key to the use and success of frameworks, and are clear signs of technical competence and maturity. In this sense, MOFs without good design or implementation simply do not survive.
Finally, the average value of area C6 indicates the need to improve documentation, user guidance and support. Thus we define Challenge 1: Improve documentation, user guidance and support, and GUI tooling.

5.9.2 Evolution of the market of MOFs

The creation of this benchmark has been a time-consuming and demanding task. However, its length has allowed the evaluation of an additional feature of the set of MOFs: its liveliness and evolution speed. During the creation of this benchmark, various frameworks released new major versions with important improvements, namely ECJ, ParadisEO, JCLEC and HeuristicLab; moreover, other frameworks such as EvA2 and Opt4j released minor versions with bug fixes and minor features. This evolution allowed us to test the evaluation framework presented in this study. No modifications were needed to assess the new versions of the MOFs and their features, thus validating the flexibility and completeness of our approach. Moreover, both the previous and the new versions of those frameworks were evaluated, providing a dynamic view of the ecosystem, in contrast with the static one shown in the previous sections. In this sense, we can evaluate the “hot areas”, i.e. those areas where more evolution has taken place, and the evolution speed of the assessed MOFs. The areas with the biggest improvements are C4 and C5, primarily due to the improvements in the GUI and licensing model of HeuristicLab and the new GUI of ECJ. Additionally, C1 and C6 have also improved significantly, but on a smaller scale, since the assessed MOFs provide new techniques and better documentation. The MOF with the biggest improvement in this period was HeuristicLab, which jumped directly from version 1.1 to version 3.3. In this new version, significant improvements were made to the licensing model (it became an open source project under the GPL license), the GUI and the documentation.
The next framework in terms of improvement during the creation of this benchmark was ECJ, to which a multi-objective technique and a GUI were added; in addition, its documentation was significantly improved. Overall, the evolution measured shows that the current MOF ecosystem is vibrant and alive, with new versions and important features added continuously. Both the final evaluation of the current versions and the previous one are available as Google Docs spreadsheets at http://www.isa.us.es/MOFComparison and http://www.isa.us.es/MOFComparison-OLD respectively. They can be downloaded and exported to different formats, such as MS Office or OpenOffice, for customization and tailoring.

5.9.3 Potential areas of improvement of current frameworks

In addition to the points stated above about area C6, based on the comparative study carried out and on the results described above, we enumerate below some gaps and unsupported features that have been identified. The areas where we see the most room for improvement are C2 (Adaptation to the Problem and its Structure), C3 (Advanced Characteristics) and C4 (General Optimization Process Support). Specifically, some features that have room for improvement are:
• Hyper-heuristics support.
• Support for designing and automatically running experiments and for analyzing results.
• User guides together with wizards, project templates and GUIs to aid the optimization process.
• Parallel and distributed computing support.
• Domain-specific languages for formulating objective functions and constraints.
Thus we define Challenge 2: Provide added-value features for optimization, such as hyper-heuristics and parallel and distributed computing capabilities. In particular, with regard to area C5 (Design, Implementation & Licensing), we have identified the following unmet software engineering best practices:
• Absence of unit tests.
Note that one of the discarded EA-oriented optimization libraries (JGAP) is a recognized reference for this practice [186]; however, the assessed MOFs generally do not provide unit tests (except for JCLEC and HeuristicLab).
• Heterogeneity of project building and description mechanisms. It would be desirable for projects, as in ParadisEO, to provide files for compiling the framework using standard mechanisms such as makefiles in C++, or Ant or Maven build files in Java.
• Absence of explicit documentation of variation points. Although all the frameworks evaluated provide extensive technical documentation of their classes and modules, none of them provides a scheme (such as feature models) to describe the variation points of the framework, nor are these points even described explicitly in natural language in the documentation. Moreover, none of the frameworks uses the UML profiles for framework documentation [101].
• Limited dynamic and reflective capabilities for loading problems, heuristics and technique variants. Indeed, only Opt4j uses a dependency injection mechanism (such as Google Guice or Spring).
Finally, regarding area C1 (Metaheuristic techniques), there is always the possibility of enlarging the portfolio of implemented techniques. The current support is uneven, with some techniques (such as EA) practically universally supported and others (such as GRASP, SS, ACO or AIS) rarely implemented. Thus we define Challenge 3: Improve the support of techniques and variants, and Challenge 4: Develop standard benchmarks for MOFs.

5.10 SUMMARY

In this chapter we have assessed the main MOFs based on the state of the art. The motivation of the study lies in the implications of the NFL theorem regarding the desirability and advantages of using such tools, in the complexity and difficulty of learning and mastering any of these frameworks, and in the availability of a good number of MOFs.
From the MOFs assessment carried out, we can draw the following conclusions:
• Frameworks are useful tools that can speed up the development of optimization-based problem-solving projects, reducing their development time and costs. They might also be used by non-expert users, extending the user base and the application scope of metaheuristic techniques.
• There are many MOFs available, which overlap and provide similar capabilities, meaning that a certain duplication of effort has taken place. It would be beneficial if some coordination and standardization of these MOFs were carried out to improve the support given to the user community.
• There are visible gaps in the support of specific key characteristics, as shown in Section §5.9.3.
The contributions presented in this chapter were published in the indexed journal Soft Computing [213].

Figure 5.9: General scores of MOFs as Kiviat diagrams

6 SCIENTIFIC EXPERIMENTS DESCRIPTION LANGUAGE

The limits of my language mean the limits of my world.
Ludwig Wittgenstein, 1889 – 1951, Austrian-British philosopher

Writing cannot express all words, words cannot encompass all ideas.
Confucius

In this chapter we present SEDL, a language for describing scientific experiments. Section §6.1 provides an introduction to the scope and general structure of the language. Section §6.2 illustrates the elements of SEDL experimental descriptions. Section §6.3 shows how SEDL can be used to describe experimental executions. In Section §6.4 we present a catalog of analysis operations for SEDL documents. The extension points of the language are described in Section §6.5. Finally, Section §6.6 summarizes the contributions of this chapter.
6.1 INTRODUCTION

In this chapter we present SEDL (Scientific Experiments Description Language), a domain-independent language to describe experiments in a precise, tool-independent and machine-processable way. Additionally, we present a catalog of analysis operations to support the automated validation and extraction of information in SEDL experiments. Figure §6.1 shows the structure of an experiment in SEDL and a sample experiment. The document is divided into two main sections: experimental description and experimental execution. The former includes details about the objects, subjects, population, constants, variables, hypothesis, design and analysis specification. The latter includes information concerning the configurations used to run the experiments and the results obtained. A formal definition of the abstract syntax of SEDL using UML metamodels is presented in Appendix §B. The appendix also includes a specific concrete XML-based syntax to support serialization in our ecosystem MOSES (see Chapter §8). In this chapter, a human-readable syntax based on plain text is used. In the following sections, the different parts of a SEDL document are described with examples.

6.2 EXPERIMENTAL DESCRIPTION

The description of an experiment includes all the information required to conduct the experiment. In the next subsections we explain how experiments are described in a SEDL document.

6.2.1 Objects, subjects and population

The first section of a SEDL document includes information about the context of the experiment, namely the experimental subjects, population and objects. The accessible population can optionally be described as well. Figure §6.2 shows a SEDL document fragment of experiment #1 presented in Chapter §3. The subjects (experimenters) are Bart and Lisa Simpson. The experimental objects are individual human beings with fever. The population is composed of all the sick people with fever.
The accessible population is the sick people in the Seville Hospital.

Figure 6.1: SEDL structure and its mapping to a sample experiment

EXPERIMENT: New-Antipyretic1 version 1.0 rep: http://moses.us.es/E3
Subjects:
    Bart Simpson
    Lisa Simpson
Objects: 'Individuals with fever'
Population: 'Any feverish person'
AccessiblePopulation: 'Feverish people in the Sevilla Hospital'
...
Figure 6.2: Schema of the context information supported by SEDL

6.2.2 Constants and variables

Constants in SEDL are defined by an identifier and a value. Variables are defined by an identifier, a type and a domain. The default types are integer, real and enumerations, which can in turn be ordered or not. Variable domains can be described by extension (i.e., by enumerating each possible value) or by intension (i.e., by defining a minimum and a maximum value). Variables are also divided into controllable factors (Factors), non-controllable factors (NCFactors), nuisance variables (Nuisances) and outcomes (Outcomes). Figure §6.3 depicts the constants and variables sections of the SEDL document for experiment #2 presented in Chapter §3. The experiment has a constant (TerminationCriterion), a controllable factor (OptTech), a non-controllable factor (Instance) and an outcome (ObjectiveFunction). The levels of the variables can be simple values, such as labels or integers, or a named list of properties (pairs of keys and values). For instance, P0(File: '/tmp/p0.qwsc') means that the level P0 of the variable Instance has a property named File with value '/tmp/p0.qwsc'.

EXPERIMENT: New-Antipyretic1 version 1.0 rep: http://moses.us.es/E3
...
Constants:
    TerminationCriterion: 'MaxTime(10000)' // In milliseconds
Variables:
    Factors:
        OptTech enum EA, GRASP1, ..., TS+SA // Opt. technique
    NCFactors: // Here the problems are ordered to show the syntax of the language
        Instance enum ordered P0(file: 'P1.qoswsc'), ..., P10(...)
    Outcomes:
        ObjectiveFunction float // Best value of the obj. func. found
...
Figure 6.3: Schema of the context information supported by SEDL

6.2.3 Hypotheses

The type of hypothesis of a SEDL experiment is specified with a keyword indicating whether the hypothesis is differential, associational or descriptive (see Section §3.3.3 for details). If the hypothesis is differential, the goal of the experiment is assumed to be to confirm or refute that the values of the controllable factors make a difference in the outcome. Associational hypotheses are intended to confirm or refute that the relationship between the controllable factors and the outcomes follows a specific mathematical relation. Finally, descriptive hypotheses state that the value of the outcome has certain statistical properties. An example of a differential hypothesis is presented in Figure §6.4. The hypothesis implicitly states that the specific optimization technique used has a significant impact on the value of the objective function, i.e., that the techniques perform differently. Figure §6.5 shows a descriptive hypothesis, stating that the average decrease in body temperature for the patients who participated in experiment #1 is 2.83 degrees Celsius. The syntax for describing statistical analyses is defined in Section §6.2.5.

EXPERIMENT: QoS-Gasp1 version 1.0 rep: http://moses.us.es/E3
...
Variables:
    Factors:
        OptTech enum EA, GRASP1, ..., TS+SA // Opt. technique
    NCFactors: // Here the problems are ordered to show the syntax of the language
        Instance enum ordered P0(File: 'P1.qoswsc'), ..., P10(...)
    Outcomes:
        ObjectiveFunction float // Best value of the obj. func. found
Hypothesis: Differential // Means that the combination of factors, in this case the opt. tech.,
                         // makes a difference on the value of the outcome
Figure 6.4: SEDL document with randomized design

Regarding associational hypotheses, the mathematical relationship depends on the type of the outcome variable. The relation can be Linear for scalar outcomes. This means that the value of the outcome variable depends on the values of the factors (both controllable and non-controllable) according to the following equation:

$$\mathit{outcome} = C_0 + C_1 \cdot \mathit{factor}_1 + \dots + C_n \cdot \mathit{factor}_n$$

where $C_0, \dots, C_n \in \mathbb{R}$ are constants. If the outcome variable is not real, the type of relation should be specified. For ordered and enumerated outcomes the relationships are usually Logistic [133], and for binary variables it is Probit [33].

EXPERIMENT: Antipyretics-Desc-Hyp-1 version 1.0 rep: http://moses.us.es/E3
...
Constants:
    dose: 200 // measured in milligrams
Variables:
    Outcomes:
        bodyTemperatureDecrease float // in Celsius degrees measured 2
                                      // hours after the administration
    Nuisances:
        age: integer(18, 60)
        weight: float(40, 120)
Hypothesis: Descriptive Mean(bodyTemperatureDecrease) = 2.83
Figure 6.5: Descriptive hypothesis supported by SEDL

6.2.4 Experimental design

The design of a SEDL experiment includes information about the sampling, assignments, blocks, groups and the experimental protocol. These concepts are fully described in Section §3.3.4. Figure §6.6 shows the design of experiment #2.

EXPERIMENT: QoS-Gasp1 version 1.0 rep: http://moses.us.es/E3
...
Variables:
    Factors:
        OptTech enum EA, GRASP1, ..., RS // Opt. technique
    NCFactors: // Here the problems are ordered to show the syntax of the language
        Instance enum P0(File: 'P1.qoswsc'), ..., P10(...)
    Outcomes:
        ObjectiveFunction float // Best value of the obj. func. found
Hypothesis: Differential
Design:
    Sampling RandomBlock
    Blocking Instance
    Assignment Random
    Groups OptTech size 20
    Protocol Random
    Analyses // Use ANOVA or Friedman
        A1: FactANOVAwRS(Filter(OptTech).Group(Instance))
            Tukey(Filter(OptTech).Group(Instance))
        A2: Friedman(Filter(OptTech).Group(Instance), 0.03)
            Holms(Filter(OptTech).Group(Instance))
Figure 6.6: Simple randomized design supported by SEDL

The supported sampling methods are random, random block and custom. If the sampling method is random, the objects are randomly selected from the population. If the sampling method is random block, the objects are randomly picked from the population and assigned to the blocks until all blocks are complete. The number of blocks depends on the non-controllable factors used for blocking in the assignments. Finally, custom sampling methods are also allowed using the extension points of the language (see Section §6.5). In the example, a random block sampling method is used. Thus, the runs of the metaheuristic programs are divided into 10 blocks, one for each problem instance. The assignment indicates how the objects are grouped, the size of the groups, and which values of the controllable factors (i.e., treatments) are applied to each group. In Figure §6.6, this means that each metaheuristic program will be run 20 times for each block, i.e., problem instance. The protocol establishes the order in which the individual objects receive the treatments. The experimental protocols supported in SEDL are random and custom.
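The design of Figure §6.6 (random-block sampling with one block per problem instance, groups of 20 runs per optimization technique, and a random protocol) fully determines the list of runs to execute. The following is a minimal sketch of how a tool might expand such a design. The `Run` record and the `expandDesign` method are hypothetical; the technique list is abbreviated because the figure elides it with "...", the ten instance labels are placeholders, and the shuffle uses a seeded generator so that the run order itself can be replicated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomizedBlockDesign {

    /** One experimental run: a treatment (technique) applied within a block (instance). */
    public record Run(String instance, String technique, int replicate) { }

    /**
     * Expands a random-block design: within every block, each treatment is
     * repeated groupSize times; since the protocol is random, the run order
     * inside each block is shuffled with a seeded generator.
     */
    public static List<Run> expandDesign(List<String> instances, List<String> techniques,
                                         int groupSize, long seed) {
        Random random = new Random(seed);
        List<Run> plan = new ArrayList<>();
        for (String instance : instances) {
            List<Run> block = new ArrayList<>();
            for (String technique : techniques) {
                for (int r = 1; r <= groupSize; r++) {
                    block.add(new Run(instance, technique, r));
                }
            }
            Collections.shuffle(block, random); // random protocol within the block
            plan.addAll(block);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> instances = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            instances.add("P" + i); // placeholder labels, one block per instance
        }
        // Abbreviated, hypothetical technique list; Figure 6.6 elides the full set.
        List<String> techniques = List.of("EA", "GRASP1", "RS");
        List<Run> plan = expandDesign(instances, techniques, 20, 42L);
        System.out.println(plan.size() + " runs, first: " + plan.get(0));
    }
}
```

Shuffling within each block, rather than across the whole plan, keeps the blocks intact while still randomizing the order of treatments, which is what the random protocol requires.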
In the example shown in Figure §6.6, the protocol establishes that the order in which the metaheuristic programs are run on each problem instance is random. Custom protocols can be defined using the extension points of the language.

6.2.5 Analyses specification

The description of a SEDL experiment concludes with the specification of the statistical analyses to be performed on the outcomes. This specification comprises a list of alternative sets of statistical tests. The list is prioritized: the first test set whose preconditions hold is applied, and the rest are ignored. Each set of statistical tests has an identifier (A1 and A2 in the example) and may include one or more tests that are expected to be applied sequentially. The experiment in Figure §6.6 describes two alternative sets of statistical analyses, one using parametric tests (ANOVA and Tukey’s post-hoc procedure) and the other using non-parametric tests (Friedman’s test and Holm’s post-hoc procedure). SEDL supports the specification of two types of analyses on experimental datasets: descriptive statistics and statistical analyses. Regarding descriptive statistics, SEDL supports the specification of:
• Central tendency measures, including mean, median, mode and confidence intervals.
• Variability measures, including standard deviation, range, inter-quartile range and confidence intervals.
• Rankings, which establish an order relation on the levels of a set of factors based on the value of a descriptive statistic, e.g. the mean.
Regarding statistical analyses, SEDL supports the specification of:
• Null Hypothesis Significance Tests (NHST). The tests and post-hoc procedures supported are described in Appendix §D. In SEDL, an NHST comprises the name of the specific test to be performed, the dataset on which the test should be performed, and the significance level α, e.g.
ANOVA( Filter (OptTech), 0.001) means: “perform an ANOVA multiple comparison test on the datasets per optimization technique with α = 0.001”. • Correlation coefficients. In SEDL the specific methods used to compute correlation are specified by name. The correlation coefficients supported are Spearman [257], Pearson [260], Kendall [158] and Cramer [59]. The datasets on which the analyses should be performed are specified in terms of three different operations: • Filtering: This operation selects a subset of the results dataset based on the values of specific variables, in a very similar way as the WHERE clause in SQL. SEDL supports two types of filters, per variable and per group. For instance, in experiment #2, Filter (OptTech = EA) would generate a single dataset with the measurements of the variables where the metaheuristic EA was used. Another example in the context of experiment #2 is Filter (OptTech), which generates as many datasets as levels has the OptTech variable, i.e., one dataset per metaheuristic. • Projection: A projection defines the set of variable measurements that a dataset will contain. It is similar to the enumeration of columns after the SELECT in SQL queries. For instance in experiment #2, Project(ObjectiveFunction) would generate a single dataset with all the measurements of the ObjectiveFunction. When a single outcome variable is specified in the experiment, a projection by this variable is implicitly assumed. For instance, in experiment #1, the mean of the body temperature decrease per dose can be specified as Mean( Filter (dose)), which is equivalent to Mean( Filter (dose).Project(bodyTemperatureDescrease)) • Grouping: Grouping operations define how the elements in different datasets will be arranged for comparison in statistical tests. Its primary use is the specification of the blocks defined by blocking variables. For instance, in experiment #2, 144 6.3. 
Grouping(Instance) means that the datasets to be compared must contain tuples with the same value for the Instance variable.

6.3 EXPERIMENTAL EXECUTION

As introduced in Section §3.4, an experimental execution comprises two phases: experimental conduction and data analysis. Detailing these phases is especially relevant to automate the execution and replication of computational experiments. Figure §6.7 describes a simple SEDL configuration.

EXPERIMENT: QoS-Gasp1 version 1.0 rep: http://moses.us.es/E3
...
Configuration C1:
  Inputs: File per Instance '${Instance}.qoswsc'
  Outputs: File 'Results-${finishTimestamp}.csv'
  Setting:
    Runtimes: Java 1.6
    Libraries: FOM 0.5
  ...
  Runs:
    E1:
      Result: File '...'
      Analyses:
        A2: p-value: 0.00017 // Thus there are sig. diff. (our hyp. is TRUE)
        // Conclusion: Differential Hypothesis Accepted
Figure 6.7: Simple experimental execution in SEDL

SEDL describes the experimental execution in terms of configurations. A configuration includes all the information required to conduct and replicate the experiment. This includes the URIs of the input files and of the output files to be generated, as well as the experimental settings. The results files of an experiment are assumed to have a relational structure, where each tuple contains the information obtained from one measurement, i.e., the corresponding values of the variables. In particular, the attributes of the relation correspond to the variables of the experiment or to any other relevant information about the object. For instance, in experiment #1, in addition to the values of the variables (dose and difference in body temperature), the result tuples can also contain the social security number (SSN) of the patient. The experimental settings describe the requirements to conduct the experiment within a specific configuration.
This point is clearly domain-dependent and thus it is defined as an extension point of the language. However, a specific extension for computational experiments is provided. This SEDL extension describes experimental settings in terms of operating system, runtimes and libraries. Other types of experimental settings (such as medical equipment or measurement instruments) could be specified through specific DSLs using the corresponding extension point of the language.

Figure 6.8: Experiment description and relational model of its results

Furthermore, configurations may contain a detailed description of the conduction process, named the experimental procedure. This description is aimed specifically at automation. While the experimental design defines which treatments will be applied on which experimental objects and in which order, the experimental procedure details how to apply the treatments. The language required to specify such a description depends strongly on the type of experiment; thus, we have created an extension point for this purpose. In our reference implementation two different DSLs are provided: one supporting the execution of treatments through shell commands, and another one specific to metaheuristic optimization with MOFs. Figure 6.9 shows an example of an experimental procedure specified using the command-line procedure DSL. Note that the values of the variables and constants of the experiment are used to apply the treatment. Specifically, the procedure described in this figure states that the treatment depends on a single variable named “solver” (but can use any constant defined in the experiment). The treatment is applied by invoking a command that runs a Java program named ETHOM, and the procedure specifies as parameters of this Java program the output file to be generated,
a set of constant values, and a property of the specific variable that the treatment takes as input (solver).

EXPERIMENT: SampleCommandExperimentalProcedure version 1.0 //no repository
...
Constants:
  /* Parameters of the Feature Model to be generated */
  NFeatures: 500 // Number of features
  CTC: 20 // Percentage of Cross Tree Constraints
  /* Parameters of ETHOM (it is an Evolutionary Algorithm) */
  CrossoverProb: 0.7 // Crossover rate or probability
  MutationProb: 0.005 // Mutation probability
  PopulationSize: 100 // Population size
Variables:
  Factors:
    // Solver used to evaluate the analysis operation
    solver enum JaCob(name: 'CSP-JaCoB'), Choco(name: 'CSP-Choco')
  Outcomes:
    ObjectiveFunction integer // Best value of the obj. func. found
...
Configurations:
  C1
    Procedure:
      Command as Treatment(solver):
        'java -jar ETHOM Results.csv ${NFeatures} ${CTC} ${solver.name} \
         ${CrossoverProb} ${MutationProb} ${PopulationSize}'
...
Figure 6.9: Sample of command experimental procedure

Configurations may include a set of so-called runs. A run represents a conduction of the experiment using a specific configuration. Runs are essential to make experiments self-contained and to enable the comparison of results in subsequent replications. An experimental run is described in terms of the result dataset and the results of the analyses. In accordance with the previous section, the results can be either descriptive statistics or results of statistical tests. The results of descriptive statistics are presented in terms of their value per object/group, e.g.
in experiment #2, for the mean of the objective function per metaheuristic (specified as Mean(Filter(OptTech))), the results would be (EA): 7.21, (RS): 2.57, etc. Another example in the context of experiment #2 would be computing the confidence interval of the mean per metaheuristic and problem instance, specified as CI(Filter(Instance,Op… The results generated for this analysis would be (P1|EA): 12.53, (P1|RS): 9.12, ..., (P10|EA): 721.3, (P10|RS): 512.32, etc. The results of statistical tests are presented per object/group, indicating the name and value of each statistical measure, namely the p-value, degrees of freedom and threshold. Figure §6.10 shows a set of examples of analyses and their corresponding results.

EXPERIMENT: SampleAnalysesSpecification version 1.0 //no repository
...
Design:
  ...
Analyses:
  A1: Avg(Filter(OptTech).Group(Instance).Proj(ObjFunc))
  A2: StdDev(Filter(OptTech).Group(Instance).Proj(ObjFunc))
  A3: Range(Filter(OptTech).Proj(ObjFunc))
  A4: CI(Filter(OptTech).Group(Instance).Proj(ObjFunc))
  A5: IQR(Filter(OptTech).Group(Instance).Proj(ObjFunc))
  A6: Ranking(Avg(Filter(OptTech).Group(Instance).Proj(ObjFunc)), OptTech)
  A7: Median(Proj(ObjFunc))
  A8: Friedman(Filter(OptTech).Group(Instance).Proj(ObjFunc), 0.05)
      Holms(Filter(OptTech).Group(Instance).Proj(ObjFunc), 0.05)
  A9: Pearson(BodyTempDiff, {Dose}, {'Cx'})
...
Configurations:
  C1
    Runs:
      E1:
        ...
        Analyses:
          A1: Avg(EA|PI0): 6.32, ..., Avg(GRASP6|P0): 5.41, ..., (EA|PI0): 6.32, ..., (GRASP6|PI0): 5.41
          A2: StdDev(EA|PI0): 1.64, ..., StdDev(GRASP6|P0): 0.41, ..., StdDev(EA|PI10): 5.32, ..., StdDev(GRASP6|PI10): 1.1
          A3: Range(EA): 1.7 - 223.62, ..., Range(GRASP): 0.87 - 183.43
          A4: CI(EA|PI0): 6.12 - 7.218, ..., CI(GRASP6|P0): 5.35 - 5.93, ..., CI(EA|PI10): 6.32, ..., CI(GRASP6|PI10): 5.41
          A5: IQR(EA|PI0): 1.82, ..., IQR(GRASP6|PI0): 0.57, ..., IQR(EA|PI10): 6.32, ..., IQR(GRASP6|PI10): 5.41
          A6: Ranking: (GRASP+PR1): 1, (GRASP6): 2, ..., (TS+SA): 5
          A7: Median: 3.176
          A8: Friedman: Pvalue: 0.00017, description: 'Chi-Squared dist.', freedom degrees: 24
              {(EA vs TS+SA) Pvalue: 0.063 Sthreshold: 0.02}, ..., {(EA vs GRASP6) Pvalue: 0.003 Sthreshold: 0.02}
          A9: Pearson(BodyTempDiff, {Dose}): {C: 0.2, Cdose: 0.0015, r: 0.78912}
Figure 6.10: Samples of statistical analyses specifications and results

6.4 AUTOMATED ANALYSIS OF SEDL DOCUMENTS

In this section, we present a catalog of 15 operations for the automated analysis of SEDL documents. In particular, we first present several operations to extract information from SEDL documents. Then, we propose several operations for checking the internal validity of experiments described in SEDL. We remark that the characteristics of SEDL make the implementation of these operations straightforward. To the best of our knowledge, this is the first approach supporting the automated validation of experiments.

6.4.1 Information extraction operations

In this section, we present a set of generic operations for the automated extraction of information from SEDL documents. These can be helpful to make decisions during the experimental process and to report further information about experiments. Also, they can be used as auxiliary operations to check the validity of experiments described in SEDL.
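The dataset expressions of Section §6.2.5, as used in analyses A1 and A6 of Figure §6.10, map naturally onto simple operations over a list of result tuples. The following is a minimal Python sketch under stated assumptions, not the MOSES implementation: results are modelled as a list of dictionaries with made-up values, all function names are hypothetical, the ranking orders levels by their overall mean (a simplification of A6, which ranks on per-instance averages), and the ranking direction (`higher_is_better`) is an assumption since Figure §6.10 does not state it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical result tuples: one dict per measurement, following the
# relational structure that SEDL assumes for results files.
RESULTS = [
    {"OptTech": "EA", "Instance": "P0", "ObjFunc": 6.0},
    {"OptTech": "EA", "Instance": "P0", "ObjFunc": 7.0},
    {"OptTech": "EA", "Instance": "P1", "ObjFunc": 9.0},
    {"OptTech": "RS", "Instance": "P0", "ObjFunc": 2.0},
    {"OptTech": "RS", "Instance": "P0", "ObjFunc": 4.0},
    {"OptTech": "RS", "Instance": "P1", "ObjFunc": 5.0},
]


def filter_eq(rows, var, value):
    """Filter(var = value): a single dataset, like a SQL WHERE clause."""
    return [r for r in rows if r[var] == value]


def filter_per_level(rows, var):
    """Filter(var), also reused for Group(var): one dataset per level of var."""
    datasets = defaultdict(list)
    for r in rows:
        datasets[r[var]].append(r)
    return dict(datasets)


def project(rows, var):
    """Proj(var): keep only the measurements of var, like SELECT var."""
    return [r[var] for r in rows]


def avg_per_level_and_block(rows, factor, block, outcome):
    """Avg(Filter(factor).Group(block).Proj(outcome)) -> {(level, block): mean}."""
    result = {}
    for level, dataset in filter_per_level(rows, factor).items():
        for b, blk in filter_per_level(dataset, block).items():
            result[(level, b)] = mean(project(blk, outcome))
    return result


def ranking(rows, factor, outcome, higher_is_better=True):
    """Rank the levels of factor by the mean of outcome (rank 1 = best)."""
    means = {lvl: mean(project(ds, outcome))
             for lvl, ds in filter_per_level(rows, factor).items()}
    ordered = sorted(means, key=means.get, reverse=higher_is_better)
    return {lvl: pos + 1 for pos, lvl in enumerate(ordered)}


if __name__ == "__main__":
    print(avg_per_level_and_block(RESULTS, "OptTech", "Instance", "ObjFunc"))
    print(ranking(RESULTS, "OptTech", "ObjFunc"))  # {'EA': 1, 'RS': 2}
```

Keeping Filter, Group and Proj as separate composable functions mirrors the way SEDL chains them with dots, so an analysis such as A1 reads as a direct nesting of calls.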
Number of blocks

This operation computes the number of blocks of the experiment as the product of the number of levels of the blocking variables. It takes as input an SEDL experiment E. This operation can be expressed as follows:

$$\#\mathit{blocks}(E) = \begin{cases} \prod_{v \in B} |v.\mathit{levels}| & \text{if } B \neq \emptyset \\ 1 & \text{otherwise} \end{cases} \qquad (6.1)$$

where B is the set of blocking variables of the experiment, i.e., E.design.blockings. For instance, the number of blocks of the experiment described in Figure §6.6 is 10, since it has a single blocking variable with 10 levels. Another example is the experiment Exp1a described in Figure §9.9 (in Chapter §9), which has two blocking variables, NFeatures and CTC, with 4 levels each. For this experiment, #blocks(Exp1a) = 4 ∗ 4 = 16.

Measurements per object

This analysis operation computes the number of measurements performed on each experimental object of a group (this is usually specified by the experimental protocol). For completely randomized designs, blocking factorial designs, and most of the designs described in the literature, its value is 1 [116]. A scenario where the result of this operation would be 2 is experiment #1 if the decrease in body temperature of each patient were measured twice. We denote this operation as #measurementsPerObject.

Measurements per experimental block

This operation computes the total number of measurements that should be generated for each block in a classical design. This operation can be expressed as follows:

$$\#\mathit{measurementsPerBlock}(E) = \sum_{g \in G} g.\mathit{size} \cdot \#\mathit{measurementsPerObject}(E.\mathit{design}.\mathit{protocol}[g])$$

where G is the set of groups defined in the design of the experiment, i.e., E.design.groups, and #measurementsPerObject is the analysis operation defined above. For instance, for the experiment E described in Figure §6.6, #measurementsPerBlock(E) = $\sum_{i=1}^{5} (20 \cdot 1) = 100$, since it has five groups (one per level of factor OptTech) of size 20.

Design cardinality (a.k.a. design size)

This operation computes the expected number of measurements that should be generated according to a specific design. It takes as input an experiment E. It can only compute the result for classical designs (this is a precondition of the operation). The cardinality is computed as the product of the number of blocks (computed through the analysis operation #blocks) and the number of measurements per block specified in the design (computed through the analysis operation #measurementsPerBlock). This operation can be expressed as follows:

$$\#\mathit{design}(E) = \#\mathit{blocks}(E) \cdot \#\mathit{measurementsPerBlock}(E)$$

For instance, for the experiment E described in Figure §6.6, #design(E) = 10 ∗ 100 = 1000, since #blocks(E) = 10 and #measurementsPerBlock(E) = 100.

Sample size (#sample)

This operation computes the number of objects in the sample. It can only compute the result for classical designs (this is a precondition of the operation). The size is computed as the product of the number of blocks and the sum of the sizes of all groups. This operation can be expressed as follows:

$$\#\mathit{sample}(E) = \#\mathit{blocks}(E) \cdot \sum_{g \in G} g.\mathit{size}$$

where G is the set of groups defined in the design of the experiment, i.e., E.design.groups, and #blocks is the analysis operation defined above. Note that when #measurementsPerObject is 1 for all the groups of the experiment, #sample(E) = #design(E).

Results cardinality (a.k.a. results size)

This operation computes the total size of the dataset of a specific execution of a lab-pack. We assume that the structure of the dataset is a relation (in the sense of the relational model). This operation can be expressed as follows in relational algebra:

$$\#\mathit{results}(L, \mathit{ExId}) = \Omega_{\mathit{count}(*)}\big(\mathit{rel}(L.\mathit{executions}[\mathit{ExId}].\mathit{results})\big)$$
where L is the lab-pack that contains both the description of the experiment and the results of its executions, and the expression rel(L.executions[ExId].results) provides the results of the execution identified by ExId as a relation.

6.4.2 Operations for validity checking

In this section, we present a set of analysis operations for checking the internal validity of experiments described in SEDL. Given a SEDL document, the operations diagnose possible internal threats in the experiment. The results of these operations could be used to warn experimenters about potential threats and to suggest possible fixes. The operations can be divided into two groups: those used to validate the description of the experiment (description validation) and those used to validate the results and statistical tests (execution validation). Also, we distinguish two severity levels for the results of the operations: warning and error. Warnings show evidence of possible threats to validity, while errors confirm their existence. Table §6.4.2 summarizes the analysis operations for validity checking. For each operation, the threat that it diagnoses, its name, its type of validation (description or execution) and its severity level are shown. In the following subsections, we define each operation, indicating the threat that it detects and how to neutralize or minimize it.
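Several of the checks below rely on the information extraction operations of Section §6.4.1, which reduce to simple arithmetic over a classical design. The following minimal Python sketch assumes a hypothetical, heavily simplified model of an experiment (only the levels of the blocking variables and the groups with their sizes); the class and function names are illustrative and are not part of SEDL or MOSES.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class Group:
    """A group of the design (hypothetical model, not SEDL's metamodel)."""
    size: int                         # number of experimental objects
    measurements_per_object: int = 1  # #measurementsPerObject, fixed by the protocol


@dataclass
class Experiment:
    """Only the parts of a classical design that the operations need."""
    blocking_levels: List[int] = field(default_factory=list)  # |v.levels| for v in B
    groups: List[Group] = field(default_factory=list)


def n_blocks(e: Experiment) -> int:
    """#blocks(E): product of the levels of the blocking variables, 1 if B is empty."""
    return prod(e.blocking_levels)  # math.prod([]) == 1


def measurements_per_block(e: Experiment) -> int:
    """#measurementsPerBlock(E): sum over groups of g.size * #measurementsPerObject."""
    return sum(g.size * g.measurements_per_object for g in e.groups)


def design_size(e: Experiment) -> int:
    """#design(E) = #blocks(E) * #measurementsPerBlock(E)."""
    return n_blocks(e) * measurements_per_block(e)


def sample_size(e: Experiment) -> int:
    """#sample(E) = #blocks(E) * sum of the group sizes."""
    return n_blocks(e) * sum(g.size for g in e.groups)


def results_size(rows: list) -> int:
    """#results: count(*) over the tuples of one execution's results relation."""
    return len(rows)


if __name__ == "__main__":
    # Figure 6.6: one 10-level blocking variable, five groups of 20 objects.
    fig66 = Experiment(blocking_levels=[10], groups=[Group(20) for _ in range(5)])
    print(n_blocks(fig66), measurements_per_block(fig66),
          design_size(fig66), sample_size(fig66))  # 10 100 1000 1000
```

Relying on `math.prod` of an empty list being 1 covers the "otherwise" branch of Equation 6.1 without a special case.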
Analysis operations for checking the internal validity of SEDL experiments

Threat | Operation                 | Validation  | Severity
-------|---------------------------|-------------|--------------
IVT-2  | Random design             | Description | Warning
IVT-3  | Random conduction         | Description | Warning
IVT-5  | Blocking factors          | Description | Warning
IVT-6  | Missing measurements      | Execution   | Warning/Error
IVT-7  | Small sampling            | Description | Error
IVT-8  | Statistical preconditions | Execution   | Error
IVT-9  | Multiple comparison       | Description | Error
IVT-10 | Out of range              | Execution   | Error
IVT-11 | Recommended test          | Execution   | Warning

Random design

This analysis operation checks whether the sampling, the assignment, and the protocol are randomized. If they are not, the experiment is more susceptible to external events that may influence its conduction and its outcome. This should be reported as a warning about an internal validity threat in the experiment (IVT-2, Section §3.5.1). To minimize this threat, random sampling, a random assignment mechanism and a random protocol should be used.

Random conduction

This analysis operation checks whether the experimental protocol defined in the SEDL document is randomized. If it is not, the results could be strongly biased if certain treatments on objects affect subsequent treatments. This should be interpreted as a warning related to the threat to validity “testing effects” (IVT-3, Section §3.5.1). To minimize this threat, a random protocol should be used.

Blocking factors

This analysis operation checks whether there exists a non-controllable factor that is not used for blocking in the design. In such a case, the conclusions of the experiment could be threatened (IVT-5, Section §3.5.1). Since the materialization of the threat is not fully confirmed, this should be interpreted as a warning. To neutralize the threat, all non-controllable factors should be used for blocking.
This operation can be expressed as follows:

$$\mathit{blockingFactors}(E) \Leftrightarrow \exists\, \mathit{ncf} \in N \bullet \mathit{ncf} \notin E.\mathit{Design}.\mathit{Blocking}$$

where N stands for the set of non-controllable factors of the experiment, i.e., E.Variables.NCFactors.

Missing measurements

This analysis operation checks whether the results of an experimental conduction contain fewer measurements than those specified in the design. If the operation returns true, some measurements of the outcomes failed or were lost (IVT-6, Section §3.5.1). In general, this is considered an error. However, in some cases a small percentage of missing measurements can be acceptable, and therefore this can be interpreted as a warning. To neutralize the threat, the experiment should be repeated. This operation can be defined as follows:

$$\mathit{missingMeasurements}(L, \mathit{ExecId}) \Leftrightarrow \#\mathit{results}(L, \mathit{ExecId}) < \#\mathit{design}(L.\mathit{experiment})$$

where L represents the experimental lab-pack and ExecId the identifier of the execution to check. The analysis operations #results and #design are defined above. For instance, in the first run of experiment #A1 with QoS-Gasp (c.f. Section ??), this operation found missing measurements (fewer results than expected), warning us of a bug in the implementation of the techniques.

Small sampling

This analysis operation returns true if the size of the sample is less than an input parameter threshold. This operation is used to diagnose the validity threat derived from using too small a number of experimental objects (IVT-7, Section §3.5.1). Since the minimum acceptable sample size varies among experimental disciplines and areas¹, it is provided as a parameter to the operation. To neutralize this threat, the experiment should be repeated with a larger sample. This can be expressed as follows:

$$\mathit{smallSample}(E, \mathit{threshold}) \Leftrightarrow \#\mathit{sample}(E) < \mathit{threshold}$$

where #sample is the analysis operation described above.
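The three checks just defined reduce to simple predicates once the factor names and the counts #results, #design and #sample are known. A minimal Python sketch, assuming those values are already available; the `missing_tolerance` parameter is hypothetical and only illustrates the remark that a small share of missing measurements may be reported as a warning rather than an error (the text does not fix a percentage), and the threshold value 30 and the factor name "Machine" used in the usage lines are likewise made up.

```python
from typing import Iterable, List, Tuple


def blocking_factors(nc_factors: Iterable[str], blocking: Iterable[str]) -> bool:
    """IVT-5: true if some non-controllable factor is not used for blocking."""
    blocking = set(blocking)
    return any(f not in blocking for f in nc_factors)


def missing_measurements(n_results: int, n_design: int) -> bool:
    """IVT-6: #results(L, ExecId) < #design(L.experiment)."""
    return n_results < n_design


def small_sample(n_sample: int, threshold: int) -> bool:
    """IVT-7: #sample(E) < threshold; the threshold is discipline-specific."""
    return n_sample < threshold


def diagnose(nc_factors, blocking, n_results, n_design, n_sample, threshold,
             missing_tolerance=0.0) -> List[Tuple[str, str]]:
    """Return (threat, severity) pairs for the three checks.

    missing_tolerance is a hypothetical knob: the largest fraction of missing
    measurements that is still reported as a warning instead of an error.
    """
    findings = []
    if blocking_factors(nc_factors, blocking):
        findings.append(("IVT-5", "warning"))
    if missing_measurements(n_results, n_design):
        # n_design > 0 here, because n_results >= 0 and n_results < n_design.
        missing_ratio = (n_design - n_results) / n_design
        severity = "warning" if missing_ratio <= missing_tolerance else "error"
        findings.append(("IVT-6", severity))
    if small_sample(n_sample, threshold):
        findings.append(("IVT-7", "error"))
    return findings


if __name__ == "__main__":
    # 1000 expected measurements (Figure 6.6 design), 990 actually recorded.
    print(diagnose(["Instance"], ["Instance"], 990, 1000, 1000, 30))
    # [('IVT-6', 'error')]
    print(diagnose(["Instance"], ["Instance"], 990, 1000, 1000, 30,
                   missing_tolerance=0.05))
    # [('IVT-6', 'warning')]
```

Returning a list of findings instead of raising on the first problem matches the intent of warning experimenters about every detected threat at once.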
Statistical preconditions

As explained in Section §3.4.2, some statistical tests are only applicable to datasets with certain characteristics. For instance, ANOVA is applicable only when the data follow a normal distribution. This operation checks whether the datasets specified in a SEDL experiment fulfil all the preconditions required by the statistical tests to be performed. If not, the results would be threatened (IVT-8, Section §3.5.1). This threat is neutralized by repeating the experiment with a set of statistical tests that are suitable for the type of dataset. This can be expressed as follows:

$$\mathit{statPrecnd}(L, \mathit{ExId}) \Leftrightarrow \exists\, a \in A,\ \exists\, p \in a.\mathit{precond} \bullet \neg\, p.\mathit{holds}(\mathit{res}(L, \mathit{ExId}))$$

where A is the set of NHST analyses specified for the experiment, a.precond is the set of preconditions of the NHST a, p.holds is an operation that checks whether the precondition p holds, and res(L, ExId) denotes the results of the execution ExId.

Multiple comparison

This analysis operation checks whether several datasets are being compared using a simple (pairwise) comparison statistical test instead of a specific test for multiple comparisons. If a simple comparison test is used, the error rate accumulates, which may lead to erroneous conclusions (IVT-9, Section §3.5.1). To neutralize this threat, the statistical analysis should be repeated using a suitable technique for comparing multiple datasets.

¹ A sample size of 10 could be inappropriate for MOEs but acceptable for an experiment regarding rare diseases.

Out of range

This operation checks whether any of the values of the variables is out of range. In that case, the conclusions of the experiment could be clearly biased (IVT-10, Section §3.5.1). To neutralize this threat, the experiment should be repeated using suitable ranges for the variables.
This can be expressed as:

$$\mathit{outOfRange}(L, \mathit{ExecId}) \Leftrightarrow \exists\, v \in V,\ m \in \mathit{Projection}(\mathit{res}(L, \mathit{ExecId}), v) \bullet m.\mathit{value} \notin v.\mathit{levels}$$

where V is the set of variables of the experiment of the lab-pack, i.e., L.experiment.variables, and Projection(res(L, ExecId), v) computes the set of values of variable v in the results of the execution ExecId.

Recommended test

This operation checks whether the statistical tests performed on the experimental results are those recommended by a specific analysis methodology. For instance, in this thesis we follow the methodology proposed by Derrac et al. Using the wrong technique may threaten the validity of the conclusions due to the “inaccurate size estimation effect” (IVT-11, Section §3.5.1). The result of this operation should be interpreted as a warning. To neutralize the threat, the experimental analysis should be repeated using one of the tests recommended by the methodology. Our specific implementation uses the decision tree depicted in Figure §8.6.

6.5 EXTENSION POINTS

In order to provide the flexibility required to support the description of experiments in different scientific or engineering areas, several extension points have been defined in SEDL. These are presented in the following subsections.

6.5.1 Context
• Population. This extension point enables a detailed description of the populations of the experiment. The current version of SEDL supports the description of populations in natural language.

6.5.2 Hypotheses
• Assertion. This extension point enables the usage of different languages for specifying the assertions of descriptive hypotheses. Currently, SEDL supports the use of statistical tests (where the assertion states that the null hypothesis is rejected) and descriptive statistics.
• Relation. This extension point allows the usage of different languages for specifying the relation between dependent and independent variables in associational hypotheses.
For instance, this extension point could be used to support relations based on additional kinds of regression (c.f. Table §3.5).

6.5.3 Designs
• Sampling. This extension point enables an accurate and machine-processable description of sampling methods. It could be used to enable the assessment of the external validity of experiments.
• Assignment. This extension point allows the accurate description of custom assignment methods in order to integrate them into the automatic assessment of the internal validity of experiments.
• Experimental design. This extension point enables the succinct and accurate description of well-known experimental designs defined in the literature, such as factorial designs, Latin squares and hypercubes.
• Sizing. This extension point enables the description of group sizes based on different elements of the experiment, such as expressions using the values of the constants of the experiment, the number of levels of variables, and the α used for statistical tests. Currently, SEDL supports constant group sizes.

6.5.4 Configurations
• Experimental Procedure. The purpose of this extension point is to enable an accurate description of the execution procedure of the experiment in different scientific areas. Currently, two DSLs are provided: one for executing computer-based experiments as sequences of commands, and another for executing optimization tasks in MOFs.
• Settings. This extension point enables the description of the specific equipment required for experimental conduction in particular scientific environments. For instance, it could be used to create a DSL for experimental instruments in Physics, enabling the description of oscilloscopes with a minimum precision, mass spectrographs, or particle accelerators with a minimum power.
• Inputs.
This extension point enables the use of alternative sources of input data for experiments, such as public online datasets or experimental repositories [87, 202, 218, 223]. Currently, SEDL supports the specification of input files.
• Outputs. This extension point enables the use of alternative mechanisms for storing and publishing the result datasets of experimental executions. Currently, SEDL supports the specification of output files.

6.5.5 Analyses
• Datasets. This extension point enables the usage of user-friendly languages for specifying the datasets on which experimental analyses should be performed. Currently, SEDL provides a language for specifying simple operations based on the operators of relational algebra, namely Filtering, Grouping and Projection (c.f. Section §6.2.5).
• Analysis specification. This extension point enables the usage of additional types of analyses and their result formats. For instance, it could be used to enable the generation of histograms and box-plots for exploratory data analysis.

6.6 SUMMARY

In this chapter, a language for experimental description named SEDL has been proposed. Its main elements have been introduced through a number of sample experimental descriptions using a concrete syntax based on plain text. Despite the expressiveness and precision that SEDL supports, describing experiments is still tedious and time-consuming due to the high number of elements to be specified. In response to these drawbacks, a specific DSL for the description of MOEs based on SEDL is proposed in the next chapter.

7 METAHEURISTIC OPTIMIZATION EXPERIMENTS DESCRIPTION LANGUAGE

If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart.
Nelson Mandela, 1918 – South African anti-apartheid revolutionary and politician

In this chapter we present MOEDL, a domain-specific language for the description of MOEs based on SEDL. Section §7.1 provides a brief introduction to the language. Section §7.2 describes the structure of a MOEDL document. The types of MOEs supported by MOEDL and their specific syntax are presented in Section §7.3. Section §7.4 presents a set of rules to transform MOEDL documents into SEDL documents. The extension points of the language are described in Section §7.5. Finally, Section §7.6 summarizes the contributions of this chapter.

7.1 INTRODUCTION

In this chapter we present the Metaheuristic Optimization Experiments Description Language (MOEDL), a domain-specific language for the description of MOEs based on SEDL. MOEDL has been designed with the goal of reducing the time and expertise required to describe MOEs with maximum guarantees of validity and replicability. SEDL eases the achievement of those goals since: i) it enables the interpretation of any MOEDL experiment as a SEDL experiment, whose validity can then be checked using the analysis operations defined for SEDL (c.f. Section §6.4); and ii) it frees MOEDL from defining many of the low-level details already supported in SEDL. Figure §7.1 shows the general structure of a MOEDL description and its specific materialization for a sample experiment. The abstract and XML concrete syntax of MOEDL are provided in Appendix §B. The document is divided into three main sections: problems, techniques and configuration. The first includes details about the problem, such as its type and the problem instances to be solved. The second includes information about the metaheuristic techniques used to solve the problem, the termination criterion and the random number generator used. The last includes information concerning the execution configuration.
In this dissertation, the interpretation and analysis of MOEDL documents are performed on the basis of their corresponding SEDL documents. To that purpose, we present a set of transformation rules from MOEDL to SEDL, i.e., any MOEDL document can be automatically transformed into a SEDL document. This approach to the design of MOEDL has important advantages. First, it enables the creation of more succinct experimental descriptions, since we can skip the elements that are common to any MOE and incorporate them into the corresponding SEDL documents during the transformation process. Furthermore, this approach enables grouping several experimental design decisions into alternative choices in MOEDL, reducing the risk of using incompatible designs. Interpreting MOEDL documents as their corresponding SEDL documents also has several consequences. First, MOEDL should be as simple as possible, removing all the elements that could be delegated to SEDL. Second, it requires support tools to define, select and apply the specific transformation strategy, henceforth named the experimental methodology for MOEDL interpretation¹ (c.f. Section §7.4).

¹ Given a MOE, it is possible to define several SEDL descriptions that encode the semantics of such an experiment with alternative designs [23, 30], which are known as experimental methodologies [21].

Figure 7.1: MOEDL structure and its mapping to a sample MOE

Figures §7.2 and §7.3 show a simple MOEDL experiment and its corresponding SEDL counterpart. The experiment describes a simple technique comparison. The optimization problem to be solved is the Traveling Salesman Problem (TSP). The experiment includes two problem instances (burma14 and berlin52, taken from the TSPLib benchmark [1]) and two metaheuristic techniques: a random search and a simulated annealing with an exponential cooling scheme.

MOEDL::EXPERIMENT: SimpleTSP version 1.0 rep: http://moses.us.es/E3
type: TechniqueComparison, methodology: BasicMOSES, runs: 20
Problem Types:
  // Travelling Salesman Problem
  TSP('es.us.isa.fomfw.problems.tsp.TSPProblem') {
    Objective functions: TourLength
    Instances(file: '${instance}.tsp'):
      burma14
      berlin52
  }
Optimization Techniques (encoding: 'es.us.isa.fomfw.problems.tsp.solutions.TSPSolution'):
  RS(RandomSearch) { }
  SA(SA) {
    initialTemperature: 10000
    neighbourPerIteration: 5
    coolingScheme: Exponential(0.95)
  }
Termination Criterion: MaxTime(1000) // in milliseconds -> 1 second
Random Number Generator:
  // Mersenne Twister RNDG
  Basic(seed: 174326, class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting:
      Runtimes: Java 1.6
      Libraries: FOM 0.5
    ...
    Runs:
      E1: start 10:12:27 2013/08/11 finish 20:56:48 2013/08/11
        Result: File '...'
        Analyses:
          A2: Wilcoxon(OptTech, Instance): p-value: 0.00017
Figure 7.2: Sample MOEDL experiment

In order to describe the structure of experimental descriptions in MOEDL, we first depict the elements common to any MOE, and next the particular elements of each specific type of MOE.

EXPERIMENT: SimpleTSP version 1.0 rep: http://moses.us.es/E3 runs: 20
Constants:
  RS-Configuration: 'RandomSearch {}'
  SA-Configuration: 'SimulatedAnnealing { initialTemperature: 10000 neighbourPerIteration: 5 coolingScheme: Exponential(0.95) }'
  TerminationCriterion: 'MaxTime(1000)'
  RandomNumberGenerator: 'class:org.apache.commons.math3.random.MersenneTwister'
  Encoding: 'class:es.us.isa.fomfw.problems.tsp.solutions.TSPSolution'
  ProblemType: 'es.us.isa.fomfw.problems.tsp.TSPProblem'
Variables:
  Factors:
    OptTech enum RS, SA // Optimization technique
  NCFactors:
    Instance enum burma14(File: 'burma14.tsp'), berlin52(File: 'berlin52.tsp')
  Outcomes:
    ObjectiveFunction float // Best value of the obj. func. found
Design:
  Sampling RandomBlock
  Assignment Random
  Blocking Instance
  Groups by OptTech size 20
  Protocol Random
Analyses:
  // Use T-Test or Wilcoxon
  A1: T-Test(Filter(OptTech).Group(Instance))
  A2: Wilcoxon(Filter(OptTech).Group(Instance))
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting:
      Runtimes: Java 1.6
      Libraries: FOM 0.5
    ...
    Runs:
      E1: start 10:12:27 2013/08/11 finish 20:56:48 2013/08/11
        Result: File '...'
        Analyses:
          A2: Wilcoxon(OptTech, Instance): p-value: 0.00017
Figure 7.3: Example of mapping from MOEDL to SEDL

7.2 MOEDL EXPERIMENTAL DESCRIPTIONS

7.2.1 Preamble

The preamble of a MOEDL document identifies the experiment as a MOE by adding the prefix MOEDL:: to the EXPERIMENT declaration, and specifies the type of experiment and the set of transformation rules to be applied, using the type and methodology properties respectively. Figure §7.4 shows a simple MOEDL experiment of type TechniqueComparison that uses the basic set of transformations described in this dissertation for this type of experiment.

7.2.2 Problem types and instances

MOEDL documents must describe the optimization problems to be solved.
The description of each optimization problem comprises i) an identifier of the problem, ii) a definition of the problem, including an enumeration of its objective functions, and iii) the specific problem instances to be solved in the experiment. Problem instances in MOEDL can be described as follows:

• Instances enumeration: The definition of each problem instance contains an identifier and a set of properties. The file containing the data of that particular problem instance is usually specified as a property. Figure §7.4 depicts a sample MOEDL document fragment with a problem instance enumeration. MOEDL provides syntactic sugar for the file property that uses the identifier of the problem instance as a parameter in the file name. This shortcut is supported as a property of the Instances declaration, as shown in the comments of Figure §7.4.
• Benchmark: An optimization benchmark is a published set of instances of a particular optimization problem. It is specified by indicating its download URL and the identifiers of the specific problem instances to be used in the experiment. Figure §7.5 shows a MOEDL document fragment including a benchmark definition.
• Instances generator: This enables the automated generation of problem instances. Generators are specified as an executable command and a set of parameters. The seed to be used, the number of instances to be generated, and the files to be generated are usually parameters of this command. Figure §7.6 depicts a sample MOEDL document fragment using a problem instance generator.

MOEDL::EXPERIMENT: QoSWSCB1 version 1.0
  rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Problem Types:
  // QoS-aware Composite Web Services Binding
  QoSWSCB(class: 'es.us.isa.qowswc.problem.QoSAwareWebServiceBinding') {
    Objective functions: GlobalQoS
    Instances: // More succinct: Instances (file: '${instance}.qoswsc')
      P0 (File: 'P0.qoswsc') // P0, ..., P10
      ...
      P10 (File: 'P10.qoswsc')
  }
...
Figure 7.4: Problem instances enumeration supported by MOEDL

MOEDL::EXPERIMENT: TSP-Sample version 1.0
  rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Problem Type:
  // Travelling Salesman Problem
  TSP(class: 'es.us.isa.fomfw.problems.tsp.TSPProblem') {
    Objective functions: TourLength
    Benchmarks:
      TSPLib(file: '${instance}.tsp') {
        url: http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/tsp/
        Instances: berlin52, burma14, tsp225
      }
    ...
  }
Figure 7.5: Optimization benchmarks specification supported by MOEDL

MOEDL::EXPERIMENT: QoSWSCB2 version 1.0
  rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Problem Types:
  // QoS-aware Composite Web Services Binding
  QoSWSCB(class: 'es.us.isa.qowswc.problem.QoSAwareWebServiceBinding') {
    Command(nInstances: 5, seed: 1, outputPath: '/tmp/qowswc/', fileTemplate: 'I${run}.qowsc'):
      'QoSWSCGen.bat ${seed}, ${outputFile}, ${nInstances}, ${fileTemplate}'
  }
...
Figure 7.6: Problem instance generator defined in MOEDL

7.2.3 Metaheuristic techniques
Each metaheuristic technique applied in a MOE is described through its parameters. A parameter in MOEDL is a key/value pair (e.g., populationSize: 100). Parameters can be nested, which enables the creation of complex parameters. For instance, in Figure §7.7 the survival policy is described using two parameters: mainSelector and secondarySelector.
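The ${...} placeholders used in file templates (for instance '${instance}.tsp' in the Instances declaration, or 'Results-${finishTimestamp}.csv' in the configuration outputs) are resolved by substituting each name with a value bound when the experiment is executed. The following Java sketch illustrates that substitution under our own assumptions. The class TemplateExpander and its expand method are hypothetical names chosen for illustration and are not part of the MOSES or FOM APIs. Unbound placeholders are treated as errors.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not the MOSES implementation) that resolves ${name}
 * placeholders such as those in '${instance}.tsp' or 'Results-${finishTimestamp}.csv'.
 */
final class TemplateExpander {
    // Matches "${", any characters except '}', and the closing "}".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    private TemplateExpander() { }

    static String expand(String template, Map<String, String> bindings) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).trim();       // tolerate "${ instance }"
            String value = bindings.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Unbound placeholder: " + name);
            }
            // quoteReplacement keeps any '$' or '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // One input file per instance identifier, as with Instances (file: '${instance}.tsp')
        for (String id : new String[] {"burma14", "berlin52"}) {
            System.out.println(expand("${instance}.tsp", Collections.singletonMap("instance", id)));
        }
    }
}
```

Running main prints burma14.tsp and berlin52.tsp. These are the file names that the shortcut in Figure 7.2 would otherwise require listing explicitly for each instance.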
The complete grammar of the metaheuristic technique specification language is provided in Appendix §B.3 as an EBNF specification. This grammar specifies two parameters that are common to any metaheuristic technique: the termination criterion and the initialization scheme. Current support for the specification of termination criteria is described in Section §7.2.4. The initialization scheme defines how the initial solutions used by the metaheuristic are generated. It can be used to hybridize metaheuristic techniques by specifying that one metaheuristic is used as the initialization scheme of another [267]. Two initialization schemes are supported: Random, which initializes the technique with random solutions, and Technique, which uses another metaheuristic technique to generate the initial solutions.

MOEDL::EXPERIMENT: QoSWSCB1 version 1.0
  rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Optimization Technique:
  GRASP1(GRASP) {
    InitializationScheme: Random,
    RCL creation: RangeBased(type: , alpha: 0.25
                             g-function: g6(class: 'es.us.isa.qoswsc.G6'))
    Local improvement: SD()
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
  }
  EACanfora(EA) {
    InitializationScheme: Random,
    populationSize: 100,
    mutationProbability: 0.01,
    crossoverProbability: 0.7,
    crossoverSelector: RouletteWheel,
    mutationSelector: RandomSelector,
    survivalPolicy: PrioritizedCompositeSelector(
      mainSelector: ElitistSelector(rate: 2, absolute: true),
      secondarySelector: RouletteWheel)
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBIndividual'
  }
...
Figure 7.7: Metaheuristic techniques specification supported by MOEDL

7.2.4 Termination criterion
MOEDL requires a termination criterion to be specified, either globally or per metaheuristic technique. The termination criteria currently supported by MOEDL are:

• Max Time. It specifies a maximum execution time (in milliseconds). For example, “MaxTime(10000)” specifies an execution of ten seconds.
• Max Iterations. It specifies a maximum number of iterations of the metaheuristic technique. For example, “MaxIterations(100)” specifies the execution of 100 iterations.
• Max Obj. Func. Evaluations. It specifies a maximum number of evaluations of the objective function, e.g., “MaxObjFuncEval(100)”.
• And: A composite termination criterion that takes a set of subordinated termination criteria as parameters. The execution of the metaheuristic technique terminates when all the subordinated termination criteria are met simultaneously. For instance, “MaxTime(10000) AND MaxIterations(100)” specifies that the run terminates once 10 seconds have elapsed and at least 100 iterations have been executed. A metaheuristic that iterates quickly could therefore execute 200 iterations before reaching 10 seconds of execution.
• Or: A composite termination criterion where the execution of the technique terminates when any of the subordinated termination criteria is met. For instance, “MaxTime(10000) OR MaxIterations(100)” specifies that the execution terminates when either 100 iterations or 10 seconds of execution are reached. A metaheuristic that iterates slowly could therefore perform only 70 iterations.
• Repeat for: A composite termination criterion that introduces a blocking variable in the experiment, so that the experiment is repeated for each subordinated criterion. It is only supported as a global termination criterion for the experiment.
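The semantics of the And and Or composite criteria can be made concrete with a small Java sketch. It assumes a hypothetical TerminationCriterion interface that is queried with the elapsed time, the number of iterations and the number of objective function evaluations. These names are ours and do not reproduce the FOM API. Repeat for is not modelled, because it expands the experiment rather than stopping a run.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: not the FOM/MOSES API.
interface TerminationCriterion {
    boolean isMet(long elapsedMillis, long iterations, long evaluations);
}

final class Criteria {
    private Criteria() { }

    // MaxTime(ms): stop once the elapsed time reaches the limit.
    static TerminationCriterion maxTime(long millis) {
        return (time, iters, evals) -> time >= millis;
    }

    // MaxIterations(n)
    static TerminationCriterion maxIterations(long n) {
        return (time, iters, evals) -> iters >= n;
    }

    // MaxObjFuncEval(n)
    static TerminationCriterion maxObjFuncEval(long n) {
        return (time, iters, evals) -> evals >= n;
    }

    // And: stop only when every subordinated criterion is met simultaneously.
    static TerminationCriterion and(TerminationCriterion... criteria) {
        List<TerminationCriterion> all = Arrays.asList(criteria);
        return (time, iters, evals) -> all.stream().allMatch(c -> c.isMet(time, iters, evals));
    }

    // Or: stop as soon as any subordinated criterion is met.
    static TerminationCriterion or(TerminationCriterion... criteria) {
        List<TerminationCriterion> all = Arrays.asList(criteria);
        return (time, iters, evals) -> all.stream().anyMatch(c -> c.isMet(time, iters, evals));
    }

    public static void main(String[] args) {
        TerminationCriterion both = and(maxTime(10000), maxIterations(100));
        TerminationCriterion either = or(maxTime(10000), maxIterations(100));
        // Fast metaheuristic: 200 iterations after 4 s.
        System.out.println(both.isMet(4000, 200, 0));    // false: 10 s not yet reached
        System.out.println(either.isMet(4000, 200, 0));  // true: 100 iterations reached
        // Slow metaheuristic: only 70 iterations after 10 s.
        System.out.println(both.isMet(10000, 70, 0));    // false: 100 iterations not yet reached
        System.out.println(either.isMet(10000, 70, 0));  // true: 10 s reached
    }
}
```

Modelling composite criteria as predicates over the run state keeps And and Or nestable. For example, and(or(a, b), c) needs no extra classes.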
Figure §7.8 shows a sample MOEDL document fragment including a composite termination criterion with several subordinated criteria (maximum execution times). Each metaheuristic program will be executed five times, once for each maximum execution time: 100 ms, 500 ms, 1000 ms, 5000 ms and 10000 ms.

MOEDL::EXPERIMENT: QoSWSCB1 version 1.0
  rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Optimization Techniques:
  GRASP1(GRASP) { ...
  CanforaGA(EA) { ...
Termination Criterion:
  RepeatForEach(MaxTime(100), MaxTime(500), MaxTime(1000), MaxTime(5000), MaxTime(10000))
Random Number Generator: // Mersenne Twister algorithm
  Basic(seed: 21435, class: 'org.apache.commons.math3.random.MersenneTwister')
...
Figure 7.8: Global termination criteria and random number generators in MOEDL

7.2.5 Random number generation algorithm
MOEDL documents should include the definition of the algorithm used to generate random numbers. It can be defined as a parameter, either globally (for all the techniques) or locally (for each specific technique). The algorithm is specified with an identifier and the name of the class that implements it. When the class is not specified, the default random number generation algorithm of the runtime specified in the configuration is assumed. Figure §7.8 illustrates the definition of a random number generator in a MOEDL document.

7.3 TYPES OF MOEs SUPPORTED BY MOEDL
Experiments in MOEDL support the selection, tailoring and tuning phases of the MPS life-cycle. Next, we describe the specific types of experiments supported by MOEDL.

7.3.1 Selection and tailoring experiments
These experiments are intended to compare either different metaheuristic techniques (selection) or different variants of the same technique, e.g.,
EA with one-point crossover vs. EA with uniform crossover (tailoring). This is done by comparing the solutions provided by each technique on a specific set of problem instances. As described in Section §7.2.3, the techniques to be compared in an experiment are described in terms of parameters. In addition, MOEDL provides syntactic sugar for specifying the set of variants of a tailoring point to be compared in tailoring experiments. For instance, Figure §7.9 describes different variants of GRASP with three different greedy functions (g-function) and two possible local improvement methods. This translates into the following six variants to be compared:

• GRASP+SD-1, using SD as the algorithm for the local improvement phase of GRASP, and g1 as the greedy function.
• GRASP+SD-2, using SD as the algorithm for the local improvement phase of GRASP, and g2 as the greedy function.
• GRASP+SD-3, using SD as the algorithm for the local improvement phase of GRASP, and g3 as the greedy function.
• GRASP+TS-1, using Tabu Search as the algorithm for the local improvement phase of GRASP, and g1 as the greedy function.
• GRASP+TS-2, using Tabu Search as the algorithm for the local improvement phase of GRASP, and g2 as the greedy function.
• GRASP+TS-3, using Tabu Search as the algorithm for the local improvement phase of GRASP, and g3 as the greedy function.

MOEDL::EXPERIMENT: QoSWSCB1 version 1.0
  rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
... // The context is omitted for the sake of brevity
Optimization Technique:
  GRASP+${Local improvement}-${g-function} (GRASP) {
    InitializationScheme: Random,
    RCL creation: RangeBased(alpha: 0.25)
    g-function: Variants {
      Custom('G1', class: 'es.us.isa.qoswsc.G1'),
      Custom('G2', class: 'es.us.isa.qoswsc.G6'),
      Custom('G3', class: 'es.us.isa.qoswsc.G6')
    }
    Local improvement: Variants {
      SD,
      TS{ memory: Recency(25), aspiration: BestImprovement(),
          termination criterion: MaxIterations(50) }
    }
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPExplorableSolution'
  }
Termination Criterion: MaxTime(1000)
Random Number Generator: // Mersenne Twister algorithm
  Basic(seed: 643281, class: 'org.apache.commons.math3.random.MersenneTwister')
Figure 7.9: Tailoring experiment in MOEDL

7.3.2 Tuning experiments
The goal of tuning experiments is to find the best configuration of a single technique in terms of parameter values. To do so, these experiments compare the solutions provided by a single technique under different parameter configurations. The parameter values to be used in a MOEDL tuning experiment are described using a parameter space. A parameter space comprises a set of parameter dimensions. Each parameter dimension specifies the domain of one parameter of the metaheuristic technique. For instance, Figure §7.10 shows the definition of a three-dimensional parameter space for an evolutionary algorithm. Each parameter has two possible values. Thus, this experiment would result in eight (2 × 2 × 2) different variants of the metaheuristic program being run, one for each possible combination of values.

MOEDL::EXPERIMENT: QoSWSCB-AUX1 version 1.0
  rep: http://moses.us.es/E3
  type: TechniqueTuning, methodology: BasicMOSES, runs: 20
...
Optimization Technique:
  EACanfora(EA) {
    InitializationScheme: Random,
    crossoverSelector: RouletteWheelSelector,
    mutationSelector: RandomSelector,
    survivalPolicy: PrioritizedCompositeSelector(
      mainSelector: ParentsSelector(selector: ElitistSelector, rate: 2, absolute: true),
      secondarySelector: OffspringSelector(selector: RouletteWheelSelector)),
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_Individual'
  }
Parameters Space:
  Dimensions:
    populationSize enum 50, 100
    mutationProbability enum 0.01, 0.05
    crossoverProbability enum 0.5, 0.8
Figure 7.10: Tuning experiment in MOEDL

7.4 TRANSFORMATION FROM MOEDL TO SEDL
MOEDL documents do not describe fully-fledged experiments, because many details such as the variables or the hypothesis are implicit. As a result, MOEDL documents do not include all the information required to automate and replicate MOEs. Moreover, the information included in MOEDL experiments is insufficient to apply the analysis operations for validity checking proposed in Section §6.4. In order to enable the automated execution and analysis of MOEs, we propose a set of rules to transform MOEDL documents into SEDL documents.

Given a MOE described in MOEDL, it is possible to define several SEDL descriptions that encode the semantics of the experiment with alternative experimental design properties, such as the set of treatments or the sampling method. The literature refers to these alternatives as experimental methodologies [21, 23, 30]. Therefore, different methodologies could lead to different transformation rules from MOEDL to SEDL experiments.
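Both the Variants construct of tailoring experiments (Figure §7.9) and the parameter space of tuning experiments (Figure §7.10) denote the Cartesian product of a set of dimensions. The transformation rules below call this the Permutations function. A minimal Java sketch of that enumeration follows. The class and method names are hypothetical, and parameter values are kept as strings for simplicity. The production order, with the last dimension varying fastest, is our own choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: enumerates every point of a parameter space.
final class ParameterSpace {
    private ParameterSpace() { }

    /** Cartesian product of the dimensions; the last dimension varies fastest. */
    static List<Map<String, String>> points(Map<String, List<String>> dimensions) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>());            // the empty combination
        for (Map.Entry<String, List<String>> dim : dimensions.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : result) {
                for (String value : dim.getValue()) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(dim.getKey(), value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Tuning space of Figure 7.10: 2 x 2 x 2 = 8 configurations.
        Map<String, List<String>> tuning = new LinkedHashMap<>();
        tuning.put("populationSize", Arrays.asList("50", "100"));
        tuning.put("mutationProbability", Arrays.asList("0.01", "0.05"));
        tuning.put("crossoverProbability", Arrays.asList("0.5", "0.8"));
        System.out.println(points(tuning).size());    // 8

        // Tailoring variants of Figure 7.9, named GRASP+${Local improvement}-${g-function}.
        Map<String, List<String>> tailoring = new LinkedHashMap<>();
        tailoring.put("Local improvement", Arrays.asList("SD", "TS"));
        tailoring.put("g-function", Arrays.asList("1", "2", "3"));
        for (Map<String, String> p : points(tailoring)) {
            System.out.println("GRASP+" + p.get("Local improvement") + "-" + p.get("g-function"));
        }
    }
}
```

The second loop prints GRASP+SD-1, GRASP+SD-2, GRASP+SD-3, GRASP+TS-1, GRASP+TS-2 and GRASP+TS-3, which are the six variants listed in Section 7.3.1.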
The methodology described in this dissertation transforms MOEDL experiments into SEDL experiments that include: i) a differential hypothesis, ii) a complete blocking factorial experimental design, and iii) the null hypothesis statistical tests for the analysis described in [69].

The transformations defined in this section are described as correspondences between MOEDL elements and the corresponding SEDL elements. The elements on both sides (source and target) are described in natural language, with the context set to an experiment object denoted as exp. For a more comprehensible and succinct representation of the transformation rules, we also use a syntax inspired by a simplified version of ATL [153] with the following structure:

FROM <type of MOEDL element>
WHEN <precondition for firing the transformation rule>
TO <type of SEDL element to generate>
HAVING <predicates and statements about the elements generated>

7.4.1 Transformation of common elements
In this section, we describe the transformation to SEDL of the elements that are common to all types of MOEDL experiments, namely:

RULE 1.- This transformation rule copies the elements that belong directly to SEDL or that are not directly related to the experimental methodology used for experimentation. Specifically, the context, the configurations, and the executions and analyses present in the original MOEDL experiment are copied directly. As a consequence, consistency is not guaranteed between the pre-existing analyses copied by the basic transformation and the hypothesis and design generated by the specific set of rules applied, because those analyses could have been generated through a different set of rules.

// RULE 1
FROM MOEDL::Experiment(moe)
TO SEDL::Experiment(exp)
HAVING exp.Id = moe.Id + 'bySEDLtoMOEDL'
  AND exp.context = moe.context
  AND exp.name = moe.name
  AND exp.metaID = moe.metaID
  AND exp.annotations = moe.annotations
  AND exp.notes = moe.notes
  AND exp.configurations = moe.configurations
  AND exp.design = new DetailedExperimentalDesign()
  AND exp.configurations.add(new Configuration('MOEDL2SEDL-Configuration'))

In the remainder of this chapter, the variables defined in this transformation are assumed to be available as global variables when defining the remaining transformation rules: moe denotes the MOEDL metaheuristic optimization experiment to transform, and exp denotes the generated SEDL experiment.

RULE 2.- Currently, the basic transformation from MOEDL to SEDL supports only mono-objective problems, since it generates a single outcome variable named “ObjectiveFunctionValue”². The levels of “ObjectiveFunctionValue” are defined by intension, and its values are determined by the domain of the objective function of the problem. In our implementation this variable is rational by default. Strictly speaking, this domain is the union of the domains of the objective function for each problem instance in the experiment. Thus, to ensure an appropriate interpretation of the values during the analysis, the experiments should contain problem instances whose objective functions have simple and compatible domains. This requirement is usually met when all the instances correspond to the same problem type.

FROM MOEDL::Experiment(moe)
TO SEDL::Variable(outcome)
HAVING exp.variables.contains(outcome)
  AND outcome.role = Outcome
  AND outcome.Id = ObjectiveFunction
  AND outcome.type = rational
  AND outcome.domain = Float

RULE 3.- An input file is generated for each problem instance.
Additionally, if there is more than one problem type, or a single problem type with multiple instances, a nominal variable named “instance” is created. The role of “instance” is characteristic (a non-controllable factor), and it is used as a blocking variable. The domain of the variable is the set of identifiers of the problem instances.

// RULE 3
// RULE 3.a (Input files)
FROM MOEDL::ProblemInstance(i)
TO SEDL::Configuration::Input(f)
HAVING f.type = File
  AND f.name = i.file
// RULE 3.b (instance variable);
// the enumeration contains all the instances defined for the experiment
FROM MOEDL::ProblemInstanceEnumeration(instances)
WHEN instances.size() > 1
TO SEDL::Variable(instanceVar)
HAVING exp.variables.contains(instanceVar)
  AND instanceVar.role = Characteristic
  AND instanceVar.Id = instance
  AND instanceVar.type = Nominal
  AND instanceVar.domain = {foreach instance in instances, instance.problemType.Id + '-' + instance.Id}
  AND exp.design.blockingVariables.contains(instanceVar)

² For multi-objective problems, a variable should be defined for each specific objective. In addition, a variable should be defined for the metric used for technique comparison, such as the hypervolume or the Pareto front error.

RULE 4.- The global random number generation algorithm and the global termination criterion become parameters of the “MOEDL2SEDL-Configuration” configuration of the experiment.

// RULE 4.a Global random number generator
FROM MOEDL::RandomNumberGenerator(rng)
WHEN moe.isGlobal(rng)
TO SEDL::Parameter(p)
HAVING exp.configurations->find('MOEDL2SEDL-Configuration').parameters.contains(p)
  AND p.name = 'RandomNumberGenerator'
  AND p.value = rng.class + rng.function
// RULE 4.b Global termination criterion
FROM MOEDL::TerminationCriterion(tc)
WHEN moe.isGlobal(tc)
TO SEDL::Parameter(p)
HAVING exp.configurations->find('MOEDL2SEDL-Configuration').parameters.contains(p)
  AND p.name = 'TerminationCriterion'
  AND p.value = tc.toString()

7.4.2 Transformation of Techniques Comparison Experiments
The rules for the basic transformation of a MOEDL technique comparison experiment into a plain SEDL experimental description are enumerated next. The hypothesis of technique comparison experiments is that the specific technique used for optimization has an impact on the value of the objective function of the solutions obtained, i.e., that some techniques perform better than others.

RULE TCE-1. For technique comparison experiments, a factor variable named “technique” and a differential hypothesis relating “ObjectiveFunctionValue” to “technique” are generated. The factor “technique” is a nominal variable whose levels are defined by extension as the set of labels that identify each technique described in the MOE. The specific details of the configuration used for each technique are expressed as parameters of the experiment named “${technique.Id}-Configuration”.

// RULE TCE-1
FROM MOEDL::Experiment(moe)
// This means that the experiment is a TechniqueComparison
WHEN moe.techniques.size() > 1 OR moe.techniques[0].variants.size() > 0
// TCE-1.a: Factor variable
TO SEDL::Variable(tech)
HAVING tech.Id = 'technique'
  AND tech.role = Factor
  AND tech.type = Nominal
  AND tech.domain = {foreach moe.techniques technique, technique.Id}
// TCE-1.b: Configuration of techniques as experimental constants (parameters)
  AND exp.design.parameters.addAll({foreach moe.techniques techvar,
        new Parameter('#' + techvar.Id + '-Configuration', techvar.parameters)})
// TCE-1.c: Differential hypothesis
TO SEDL::DifferentialHypothesis(dh)
HAVING exp.hypothesis = dh
  AND dh.outcome = exp.design.variables.find('ObjectiveFunctionValue')
  AND dh.factors = {tech}

RULE TCE-2. Regarding the experimental design, the transformation encodes a basic methodology with a complete blocking factorial design, where all the variables that are neither outcomes nor factors are blocking variables. In other words, experiments are replicated for all possible combinations of levels of such variables. This is how experiments that specify multiple problem instances are handled, since RULE 3 in the previous section defines “instance” as a blocking variable. The design contains as many groups as optimization techniques, and the size of each group is the number of runs specified in the MOE. The methodology implements a simple experimental protocol based on a sequence of treatment and measurement in all the groups. The treatment assigns to the variable “technique” the level corresponding to the specific technique associated with the group.

// RULE TCE-2 (Fully blocking factorial experimental design)
FROM MOEDL::Experiment(moe)
// This means that the experiment is a TechniqueComparison
WHEN moe.techniques.size() > 1 OR moe.techniques[0].variants.size() > 0
TO SEDL::FullySpecifiedExperimentalDesign(ddsgn)
HAVING exp.design.detailedDesign = ddsgn
  AND ddsgn.assignmentMethod = new RandomAssignmentMethod()
  // One group per technique
  AND ddsgn.groups.addAll(foreach moe.techniques tech,
        new Group('G-' + tech.Id,
          // The group is associated with a specific technique run
          new Valuation(exp.design.variables.find('technique'), tech.Id),
          moe.NRuns)) // The size of the group is the number of runs
  // Experimental protocol:
  AND ddsgn.protocol = new ExperimentalProtocol()
  AND foreach ddsgn.groups g,
        // 1.- Treatment (tech. run): the variable valuation of the group determines the technique
        ddsgn.protocol.addStep(g, new Treatment(g.variableValuations)),
        // 2.- Measurement of the obj. func. for the best solution found
        ddsgn.protocol.addStep(g, new Measurement(exp.design.variables.find('ObjectiveFunctionValue')))

RULE TCE-3. Regarding the experimental analysis specification, the basic methodology implemented through the transformation follows the guidelines described in Section §3.4.2. Consequently, the transformation generates the specification of a factorial ANOVA with repeated measures as the primary analysis technique. If the premises of ANOVA are not met, the Friedman test with Holm's post-hoc procedure is specified as the secondary analysis option. This analysis methodology is consistent with most of the guidelines provided in the literature [69, 116, 195, 248]. In this case, the filtering is performed for each level of the “technique” variable. This usually generates a multiple comparison test among datasets of objective function values obtained with different optimization techniques³.
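The secondary analysis specified by RULE TCE-3 (the Friedman test followed by Holm's post-hoc procedure) can be illustrated with a minimal Java sketch. It computes the Friedman statistic over a matrix with one row per problem instance (block) and one column per technique, and it applies Holm's step-down adjustment to a set of post-hoc p-values. The sketch assumes a single value per instance and technique, for example the median over runs. This is our simplification of the repeated runs in the design, and no tie correction is applied. The class RankTests is hypothetical and is not the analysis code used by MOSES.

```java
import java.util.Arrays;

// Hypothetical sketch, not the MOSES analysis module.
final class RankTests {
    private RankTests() { }

    /**
     * Friedman statistic (no tie correction). data[b][t] holds the value of
     * technique t on problem instance (block) b.
     */
    static double friedman(double[][] data) {
        int n = data.length;         // number of blocks (instances)
        int k = data[0].length;      // number of treatments (techniques)
        double[] rankSums = new double[k];
        for (double[] block : data) {
            double[] ranks = averageRanks(block);
            for (int t = 0; t < k; t++) {
                rankSums[t] += ranks[t];
            }
        }
        double sumOfSquares = 0.0;
        for (double r : rankSums) {
            sumOfSquares += r * r;
        }
        return 12.0 * sumOfSquares / (n * k * (k + 1)) - 3.0 * n * (k + 1);
    }

    /** Ranks 1..k within one block; tied values share the mean of their ranks. */
    static double[] averageRanks(double[] values) {
        int k = values.length;
        Integer[] order = new Integer[k];
        for (int i = 0; i < k; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] ranks = new double[k];
        int start = 0;
        while (start < k) {
            int end = start;
            while (end + 1 < k && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            double meanRank = (start + end) / 2.0 + 1.0;   // mean of ranks start+1 .. end+1
            for (int j = start; j <= end; j++) {
                ranks[order[j]] = meanRank;
            }
            start = end + 1;
        }
        return ranks;
    }

    /** Holm step-down adjusted p-values, returned in the input order. */
    static double[] holm(double[] pValues) {
        int m = pValues.length;
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(pValues[a], pValues[b]));
        double[] adjusted = new double[m];
        double runningMax = 0.0;
        for (int r = 0; r < m; r++) {
            // The r-th smallest p-value (0-based) is multiplied by (m - r).
            double candidate = Math.min(1.0, (m - r) * pValues[order[r]]);
            runningMax = Math.max(runningMax, candidate);   // keep adjusted p-values monotone
            adjusted[order[r]] = runningMax;
        }
        return adjusted;
    }

    public static void main(String[] args) {
        // Three instances, three techniques; the first technique always ranks best.
        double[][] data = {{1, 2, 3}, {1, 2, 3}, {1, 2, 3}};
        System.out.println(friedman(data));   // 6.0
        System.out.println(Arrays.toString(holm(new double[] {0.01, 0.04, 0.03})));
    }
}
```

Under the null hypothesis the statistic approximately follows a chi-square distribution with k - 1 degrees of freedom. A p-value could therefore be obtained as one minus the cumulative probability of a chi-square distribution, for instance with ChiSquaredDistribution in Apache Commons Math 3.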
// RULE TCE-3 Analysis specification
FROM MOEDL::Experiment(moe)
// This means that the experiment is a TechniqueComparison
WHEN moe.techniques.size() > 1 OR moe.techniques[0].variants.size() > 0
TO SEDL::ExperimentalAnalysisSpecification(ans)
HAVING exp.design.analysisSpecification = ans
  // Factorial ANOVA with repeated measures as primary analysis
  AND ans.addAnalysis(
        new NHST('FactorialANOVAwRD',
          // Dataset for the analysis (according to the design created by rule TCE-2)
          new DataSetSpec({
            new FilterSet(exp.design.variables.find('technique')),
            new Grouping(exp.design.variables.find('instance')),
            new Projection(exp.design.variables.find('ObjectiveFunction'))}),
          // Default alpha is 0.05 (this param. is configurable in the actual impl.)
          0.05)
        .add(new PostHoc('Tukey', ... [The same dataset specified above], 0.05)))
  // Friedman with Holm post-hoc as secondary analysis
  AND ans.addAnalysis(
        new NHST('Friedman', ... [The same dataset specified above], 0.05)
        .add(new PostHoc('Holms', ... [The same dataset specified above], 0.05)))

³ In our current implementation, the transformation takes into account the number of levels of “technique” and generates the corresponding single comparison tests for simple comparisons.

7.4.3 Transformation of technique tuning experiments
The rules for our transformation of MOEDL tuning experiments into plain SEDL experiment descriptions are shown next. The hypothesis of this kind of experiment is that the parameter values used for optimization have an impact on the value of the objective function of the obtained solutions, i.e., that some configurations are better than others for the set of problem instances of the experiment. A complete blocking factorial design is generated, where the hypothesis has a single factor variable named “configuration”, whose levels are the combinations of possible values of each dimension of the parameter space.

RULE TTE-1. For technique tuning experiments, a single factor variable named “configuration” is created. This variable denotes the combination of parameter values used for optimization. A differential hypothesis relating “ObjectiveFunctionValue” to “configuration” is also created. The factor “configuration” is a nominal variable whose levels are defined by extension as the set of points in the parameter space, i.e., all possible combinations of values of the parameter dimensions. The specific value of each parameter dimension is associated with the corresponding level of “configuration” as part of its definition⁴.

// RULE TTE-1
FROM MOEDL::Experiment(moe)
// This means that the experiment is a TechniqueParametrization
WHEN moe.techniques.size() = 1 AND moe.techniques[0].variants.size() = 0
// TTE-1.a: Factor variable for configurations
TO SEDL::Variable(config)
HAVING config.type = Nominal
  AND config.Id = 'configuration'
  AND config.role = Factor
  AND config.domain = Permutations({moe.parametersSpace.dimensions})
// TTE-1.b: Differential hypothesis
TO SEDL::DifferentialHypothesis(dh)
HAVING exp.hypothesis = dh
  AND dh.outcome = exp.design.variables.find('ObjectiveFunctionValue')
  AND dh.factors = {config}

Figure 7.11: Transformation from MOEDL to SEDL

⁴ In our transformation, this set of values is generated through the Permutations function. It provides the combinations of values of the space dimensions as levels and configures each level with the specific value of each dimension. The identifiers of the levels are generated sequentially.

RULE TTE-2. For technique tuning experiments, the specific details of the constant elements of the configuration of the optimization technique are stored in a parameter of the experiment named “Technique-Configuration”.

// RULE TTE-2
FROM MOEDL::Experiment(moe)
// This means that the experiment is a TechniqueParametrization
WHEN moe.techniques.size() = 1 AND moe.techniques[0].variants.size() = 0
TO SEDL::Parameter(tech)
HAVING tech.Id = 'Technique-Configuration'
  AND tech.value = moe.techniques[0].parameters
  AND exp.design.parameters.add(tech)

RULE TTE-3. The transformation encodes a basic methodology with a complete blocking factorial design, where the “instance” variable is used as the blocking factor. The design contains one group per level of the “configuration” variable, and the size of each group is the number of runs specified in the MOE. The methodology implements a simple experimental protocol based on a sequence of treatment and measurement in all the groups. The treatment assigns to the variable “configuration” the level corresponding to the specific parameter values associated with the group.

// RULE TTE-3 (Fully blocking factorial experimental design)
FROM MOEDL::Experiment(moe)
// This means that the experiment is a TechniqueParametrization
WHEN moe.techniques.size() = 1 AND moe.techniques[0].variants.size() = 0
TO SEDL::FullySpecifiedExperimentalDesign(ddsgn)
HAVING exp.design.detailedDesign = ddsgn
  AND ddsgn.assignmentMethod = new RandomAssignmentMethod()
  // One group per parameter configuration
  AND ddsgn.groups.addAll(foreach Permutations({moe.parametersSpace.dimensions}) config,
        new Group('G-' + config.Id,
          // The group is associated with a specific parameter configuration
          new Valuation(exp.design.variables.find('configuration'), config.Id),
          moe.NRuns)) // The size of the group is the number of runs
  // Experimental protocol:
  AND ddsgn.protocol = new ExperimentalProtocol()
  AND foreach ddsgn.groups g,
        // 1.- Treatment (tech. run): the variable valuation of the group determines the config.
        ddsgn.protocol.addStep(g, new Treatment(g.variableValuations)),
        // 2.- Measurement of the obj. func. for the best solution found
        ddsgn.protocol.addStep(g, new Measurement(exp.design.variables.find('ObjectiveFunctionValue')))

RULE TTE-4.
Regarding experimental analysis specification, the basic methodology implemented through the transformation follows the guidelines described in Section §3.4.2. Consequently, the transformation generates the specification of factorial ANOVA with repeated measures as primary analysis technique. If the premises of ANOVA are not met, Friedman with the Homls post-hoc procedure is specified as secondary analysis option. This analysis methodology is consistent with most of the guidelines provided in the literature [69, 116, 195, 248]. The only difference between this transformation rule and rule TTE-3 is the way results are filtered for comparison. In this case, the filtering is performed for each level of the “configuration” variable, usually generating a multiple comparison test among datasets of objective function values obtained with different configurations5 . / / RULE TTE−4 A n a l y s i s S p e c i f i c a t i o n FROM MOEDL: : Experimet ( moe ) / / T h i s means t h a t t h e e x p e r i m e n t i s a T e c h n i q u e P a r a m e t r i z a t i o n WHEN moe . t e c h n i q u e s . s i z e ( ) = 1 AND moe . t e c h n i q u e s [ 0 ] . v a r i a n t s . s i z e ( ) = 0 TO SEDL : : E x p e r i m e n t a l A n a l y s i s S p e c i f i c a t i o n ( ans ) HAVING exp . design . a n a l y s i s S p e c i f i c a t o n =ans / / F a c t o r i a l ANOVA w i t h r e p e a t e d m e a s u r e s a s p r i m a r y a n a l y s i s AND ans . addAnalysis ( new NHST( ’FactorialANOVAwRD ’ , / / D a t a s e t f o r t h e a n a l y s i s ( a c c o r d i n g t o t h e d e s i g n c r e a t e d by r u l e TCE− 2) new DataSetSpec ( { new F i l t e r S e t ( exp . design . v a r i a b l e s . f i n d ( ’configuration ’ ) ) , new Grouping ( exp . design . v a r i a b l e s . f i n d ( ’instance ’ ) ) , new P r o j e c t i o n ( exp . design . v a r i a b l e s . f i n d ( ’ObjectiveFunction ’ ) ) } ) , / / D e f a u l t a l p h a i s 0 . 0 5 ( t h i s param . 
i s c o n f i g u r a b l e i n t h e a c t u a l i m p l . ) 0 . 0 5 ) ) . add ( new PostHoc ( ’Tukey ’ , . . . [ The same D a t a s e t s p e c i f i e d a b o v e ] , 0 . 0 5 ) / / F r i e d m a n w i t h Holms p o s t −h o c a s s e c o n d a r y a n a l y s i s AND ans . addAnalysis ( new NHST( ’Friedman ’ , . . . [ The same D a t a s e t s p e c i f i e d a b o v e ] , 0 . 0 5 ) . add ( PostHoc ( ’Holms ’ , . . . [ The same D a t a s e t s p e c i f i e d a b o v e ] , 0 . 0 5 ) ) ) 5 In our current implementation the transformation takes into account the number of levels of “configuration” and generates the corresponding single comparison tests for simple comparisons 176 7.5. EXTENSION POINTS 7.5 E XTENSION POINTS MOEDL provides the following extension points: • Metaheuristic optimization experiment. The main extension point of MOEDL is the own experiment definition. Thus new types of metaheuristic optimization experiments with additional sections can be created. • Problem type and Instances specification. This extension point enables the integration of DSLs for specifying optimization problems an instances. Through this extension point languages such as AMPL [102], GAMS [70] or MPS could be integrated into MOEDL. • Metaheuristic Optimization technique. This extension point enables the integration of DSLs for specifying metaheuristic optimization techniques, including their tailoring and tuning. The specific languages for used by the MOFs identified in Chapter §5 could be integrated into MOEDL through this extension point. 7.6 S UMMARY In this chapter, a language for describing MOEs named MOEDL has been proposed. Its main elements have been introduced through a number of sample experimental descriptions using a plain text syntax. MOEDL supports the description of MOEs in an the expressive, succinct and precise way. 
This language paves the way for the automated execution of MOEs, enabling the automated, partial validation of their internal consistency through the analysis operations described in Section §6.4. A brief description of MOEDL was published in [215].

8 MOSES: A META-HEURISTIC OPTIMIZATION SOFTWARE ECOSYSTEM

Do you know what my favorite renewable fuel is? An ecosystem for innovation.
Thomas Friedman (b. 1953), American journalist

In this chapter we present a software ecosystem for the development of MPS-based solutions (MOSES). In Section §8.1 we introduce the chapter, motivating the need for a software ecosystem. In Section §8.2 the key features of the ecosystem are enumerated. The design of the ecosystem is detailed in Section §8.3. In Section §8.4 we present a reference implementation of the ecosystem and its main components. Section §8.5 depicts our vision of a fully fledged software ecosystem for metaheuristic optimization. Finally, in Section §8.6 we summarize the chapter.

8.1 INTRODUCTION

The survey of metaheuristic frameworks presented in Chapter §5 revealed several limitations and obstacles in the current tool support for the MPS and MOE life-cycles, among others:

• There are too many MOFs. Up to 34 MOFs were identified in the literature. This proliferation of MOFs is not sustainable. In fact, about half of those tools have been discontinued or abandoned.
• There is no universal MOF. No MOF supports all the metaheuristic techniques proposed. Moreover, the features of MOFs are dispersed, and strong discrepancies appear between their scores in different areas.
• There is a lack of interoperability and reuse between MOFs. The only integration feature supported by some MOFs is the capability of loading problem instances in standard formats.
• MOFs are already big, and they are growing.
Most MOFs have hundreds of classes (cf. Figure §5.6). The size of a framework is an indirect measure of its complexity and therefore of the steepness of its learning curve.
• MOFs are evolving from software frameworks for implementing metaheuristics toward optimization problem solving packages with minimal experimentation support. There is a trend to extend MOFs with a broader set of features that support more activities of the MPS life-cycle and the experimentation process. MOFs are adding capabilities for experiment design, charting and reporting, batch execution and monitoring of jobs, and statistical analysis, among others.
• There is no widespread adoption. Except for ECJ [179], ParadisEO [41] and HeuristicLab [283], MOFs are used mainly by their developers, with a small number of external users.

Summarizing, there is a scenario where a number of organizations create, maintain and evolve competing software (the MOFs) that tries to support a similar set of processes (the MPS life-cycle). These tools are not only competing but also complementary, since none of them supports all the activities and variants of the MPS and MOE life-cycles. This creates a complex environment with high variability and no interoperability, where porting and reusing developments and experiments among MOFs is extremely difficult. Considering the current state of the art, it does not seem that proposing yet another MOF would help to facilitate the development of MPS-based solutions. Thus, in this dissertation we adopt a different approach: we model the set of tools supporting the MPS life-cycle as a software ecosystem that we have coined Metaheuristic Optimization Software EcoSystem (MOSES). Software ecosystems are an emerging topic in software engineering that promotes the development of families of products by independent contributors [53].
This approach embraces the high variability and multi-organizational structure of MPS in a natural way. It also supports the integration into the architecture of the ecosystem of different tools that can be both competitors and complementary. In the following sections, we describe the key features of MOSES and outline the main aspects of its design. Then, we present MOSES[RI], a reference implementation of the ecosystem, detailing its main components. Finally, we present our vision of the future of MOSES as an extensible integrated development environment for the implementation of MPS-based solutions.

8.2 FEATURES

The key features taken into account in the inception and development of MOSES can be divided into functional and non-functional features. Functional features define what must be supported, while non-functional features define constraints on how that support should be provided. The functional features required by our ecosystem are described below:

FF.1 Aid and support for the design and development of metaheuristic algorithms: Executing the MPS life-cycle requires implementing one or several metaheuristic programs. As described in Chapter §5, this feature is currently fulfilled by MOFs. We consider this feature mandatory.
FF.2 Experimental design support: Since several activities of the MPS life-cycle involve decision-making through experimentation, the activities of the experimental process should also be supported by the ecosystem. Consequently, some guidance should be provided for designing the most common decision-making experiments used in the MPS life-cycle. As a first approach to supporting this feature, the ecosystem should enable the automated analysis of experiments and lab-packs using the analysis operations described in Chapter §6.
FF.3 Experiment execution support: Once the metaheuristic algorithm is implemented and a suitable experimental design has been identified, the experiment must be conducted. Thus, the ecosystem should support the automated conduction of MOEs and allow users to monitor and manage the optimization processes involved.
FF.4 Results analysis support: After experiment execution, the results obtained must be analyzed in order to disprove or confirm the hypothesis of the experiment. In metaheuristic experiments, such analysis is usually performed by means of statistical techniques. The specific statistical tests that must be used in the analysis of results depend both on the experimental design and on the number and nature of the variables of the experiment. As a result, the ecosystem should support as many statistical tests as possible, and aid in the appropriate selection and use of the test for a given dataset or lab-pack.
FF.5 Report generation: The ecosystem should provide reports and charts on the results of experiments, such as histograms, tables, etc.

Regarding the non-functional features required by our ecosystem, we have identified the following:

NFF.1 High performance: Since one of our goals is to speed up the execution of the MPS life-cycle, the architecture of the software ecosystem should promote high performance. This requirement suggests the use of distributed and parallel computing capabilities. This feature is especially important for experiment conduction, since some experiments can take weeks to complete.
NFF.2 Incremental deployment: The architecture of the ecosystem will have a number of components, and each implementation of such components can provide a specific set of features in order to fulfil its responsibilities. This structure creates a high degree of variability, with a high number of possible configurations.
For instance, R + JCLEC + R, SPSS + HeuristicLab + Excel, or FOM + STATService are possible configurations. The ecosystem should support the installation/deployment and uninstallation/undeployment of components as required by its users. Each component of the ecosystem should be able to work in isolation, and integrate with other components as they are deployed/installed. This implies keeping the dependency relationships among the components untangled, and protecting their logic from failures in related components, in order to avoid the “nothing works until everything works” syndrome. Since services are deployed, managed and versioned independently, the adoption of a service-oriented architecture could help us support this feature.
NFF.3 Self-awareness & self-description: The architecture of the software ecosystem should enable the enumeration of the components available in the ecosystem, and the evaluation of whether the set of features they provide is enough to execute the tasks required by the activities at hand. In order to enable such a level of self-awareness, each component should be able to describe its supported features and evaluate its ability to successfully fulfil a given request. For instance, a specific statistical analysis component should provide a mechanism to answer the following question: Can you perform a non-parametric multiple comparison analysis of a dataset with a binary dependent variable and a single active independent variable with 3 levels?
NFF.4 Interoperability: The inter-organizational nature of the software ecosystem increases the likelihood of requiring the integration and automation of information flows. As a consequence, interoperability is a must for MOSES.
NFF.5 Openness:
(a) User openness: At least one open source implementation of each component should be available, in such a way that full support of the MPS life-cycle is available to users with zero licensing costs.
(b) Contributors openness: The variability of the applications, and of the variants and configurations of metaheuristics, is huge. For instance, readers can refer to the alternatives for implementing the crossover operator enumerated in Table §A.2. As a consequence, the ecosystem should be open to incorporating new implementations of its components, and some of those implementations should also be open to incorporating changes by third parties, in order to fulfil their specific needs in their application domains. In order to keep the ecosystem fully functional, such variability must be modelled and controlled. Introducing the appropriate variation points in the ecosystem architecture and in the underlying implementations is crucial to achieve such flexibility and control. In this context, users can extend and customize the behaviour of the components without losing stability or the features that are not customized. This flexibility is known as the open-closed principle, and is usually stated as: “software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification” [188]. Fulfilling this requirement implies that the tools clearly define the extension points and interfaces to be used by contributors.
(c) License model: Our experience in developing industry-ready tools (over 14 tools in the last 5 years¹) has led us to reflect upon many aspects [270]. One of the lessons learned is that distributing or licensing our tools as Open Source Software facilitates, to a certain extent, not only their adoption by third parties but also their upkeep by some of these users. On occasion, some of these third parties find it interesting to contribute to the evolution and maintenance of a tool by adding new features or evolving existing ones, or even to create new related tools in order to build a software ecosystem. Currently, MOSES is licensed under LGPLv3 [147].
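The self-description requirement NFF.3 can be illustrated with a minimal sketch of a component that answers the sample question above without executing any analysis. The classes `AnalysisRequest` and `StatisticalAnalysisComponent`, their fields, and the capability rules are hypothetical, chosen only for illustration; they are not the actual MOSES CapacityEvaluator interface.

```python
from dataclasses import dataclass, field


# Hypothetical description of an analysis request (not a MOSES or STATService type).
@dataclass(frozen=True)
class AnalysisRequest:
    parametric: bool            # parametric vs. non-parametric analysis
    comparison: str             # "pairwise" or "multiple"
    dependent_variable: str     # e.g. "binary", "ordinal", "continuous"
    independent_variables: int  # number of active independent variables
    levels: int                 # levels of the independent variable


@dataclass
class StatisticalAnalysisComponent:
    """Hypothetical component that self-describes the analyses it supports."""
    name: str
    supported_comparisons: set = field(default_factory=set)  # {(parametric, comparison)}
    supported_dependent_types: set = field(default_factory=set)
    max_independent_variables: int = 1

    def can_perform(self, request: AnalysisRequest) -> bool:
        """Answer 'can you perform this analysis?' without running it."""
        if (request.parametric, request.comparison) not in self.supported_comparisons:
            return False
        if request.dependent_variable not in self.supported_dependent_types:
            return False
        if request.independent_variables > self.max_independent_variables:
            return False
        # Pairwise comparisons involve exactly 2 levels; multiple comparisons 3 or more.
        if request.comparison == "pairwise":
            return request.levels == 2
        return request.levels >= 3


component = StatisticalAnalysisComponent(
    name="example-analyzer",
    supported_comparisons={(False, "multiple"), (False, "pairwise")},
    supported_dependent_types={"binary", "ordinal"},
)
# The question stated in NFF.3.
question = AnalysisRequest(parametric=False, comparison="multiple",
                           dependent_variable="binary",
                           independent_variables=1, levels=3)
print(component.can_perform(question))  # True
```

Because the check is separate from the analysis itself, a client (or a registry enumerating the deployed components) can query several implementations and choose one before sending any data.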
8.3 REFERENCE ARCHITECTURE

8.3.1 Architectural Style

A plethora of architectural styles and patterns has been proposed in the literature [25, 53, 103], and a system can exhibit a combination of one or more of them. Service Oriented Architecture (SOA) is proposed as the architectural style for the ecosystem, since it provides elements to achieve most of the non-functional features stated above. SOA defines the architecture of a software system as a collection of distributed components that provide or consume services. Those services are reusable, distributed, loosely coupled, and accessible using standardized protocols and data formats. SOA fits naturally into the multi-organizational and distributed nature that is typical of software ecosystems, since services can be developed, maintained, deployed and consumed by different organizations.

Adopting SOA allows the fulfilment of features NFF.2, NFF.3 and NFF.4. It supports the fulfilment of NFF.2 since services are loosely coupled and independently versioned and deployed. It supports the fulfilment of NFF.3 since it introduces mechanisms for registering and querying the available implementations of a service. Since services are technologically agnostic by definition, NFF.4 is also fulfilled.

¹ www.isa.us.es/tools

Regarding features NFF.1 and NFF.5, adopting SOA has both advantages and drawbacks. As for NFF.1, on the one hand, the distributed nature of SOA and the routing and mediation mechanisms it supports enable the parallel and distributed use of several service implementations, which contributes to achieving high performance. On the other hand, the use of standard protocols and distributed computing mechanisms for invocation introduces latency and overhead on each invocation.
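A back-of-the-envelope sketch makes this trade-off concrete. It compares the wall-clock time of executing n independent optimization runs locally, one after another, with dispatching them to k remote service implementations in parallel, where each invocation pays a fixed latency. All figures are hypothetical, chosen only for illustration. The model also ignores data transfer size, queuing and failures.

```python
import math


def sequential_time(n_runs: int, run_seconds: float) -> float:
    """Wall-clock time of executing all runs locally, one after another."""
    return n_runs * run_seconds


def distributed_time(n_runs: int, run_seconds: float,
                     workers: int, latency_seconds: float) -> float:
    """Wall-clock time when runs are spread over `workers` remote services.

    Each remote invocation pays a fixed `latency_seconds` overhead (protocol,
    serialization, network), and runs are dispatched in rounds of `workers`.
    """
    rounds = math.ceil(n_runs / workers)
    return rounds * (run_seconds + latency_seconds)


def relative_overhead(run_seconds: float, latency_seconds: float) -> float:
    """Fraction of each remote invocation spent on overhead instead of computing."""
    return latency_seconds / (run_seconds + latency_seconds)


# Hypothetical figures: 30 runs of 10 minutes each, 8 workers, 2 s latency per call.
seq = sequential_time(30, 600.0)            # 30 * 600 = 18000 s
dist = distributed_time(30, 600.0, 8, 2.0)  # 4 rounds * 602 s = 2408 s
print(seq / dist)                           # speedup of roughly 7.5
print(relative_overhead(600.0, 2.0))        # roughly 0.0033, i.e. about 0.3 %
```

For long-running metaheuristic executions, the per-call latency is a negligible fraction of each invocation. For very short runs, where `run_seconds` approaches `latency_seconds`, the overhead would dominate instead.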
Since optimization problem solving with metaheuristics usually implies a high computational cost, the authors consider the adoption of SOA convenient with respect to feature NFF.1. However, the infrastructure elements required for achieving some of the advantages of SOA can undermine NFF.5a, since the available open source implementations are heavyweight and require powerful computing platforms. Moreover, such infrastructure elements create a complex development and configuration scenario, which also undermines NFF.5b.

Furthermore, the combined use of SOA and web services standards addresses some of the software architecture challenges that the creation of software ecosystems implies [36], namely:

• Interface stability: When using web services standards such as WSDL + XML Schema, there is wide room for backward compatible changes to the services' contracts [83]. Because services are versioned and deployed independently, multiple versions of the same service can be maintained while the backward compatibility constraints are met. Moreover, the impact of interface changes is reduced through the use of more granular, independently maintained services. This challenge is intimately related to that of extensibility, since the extensibility mechanisms of the standards² allow keeping the general interface of a service stable while extending the functionality provided through it.
• Integration of external contributions: The integration challenge in the context of software ecosystems has three dimensions: user interface, workflow and data. Using SOA and web service standards allows addressing this challenge in two of those dimensions. In particular, it provides standards for integrating data using XML, JSON, etc., and for integrating workflows through standardized service interfaces and orchestration engines and languages like BPEL.

² More specifically, XML Schemas, which are used to describe the structure of the input and output parameters of the operations of a web service.

Figure 8.1: Template MOSES component interfaces

8.3.2 Abstract Component View

The abstract component view of MOSES specifies the components of its architecture by describing the interface implementations³ they must provide and those they can require. Thus, this view does not provide information about any particular implementation, i.e., it is implementation agnostic or abstract.

In order to fulfil features NFF.2 and NFF.3, each MOSES component provides two optional interfaces named Validator and CapacityEvaluator, and requires an optional interface named Binder (see Figure §8.1). The Validator interface is responsible for checking the syntactic and semantic correctness of the information used in the domain-specific services provided by the components. It enables the integration of the property checking mechanisms described in Chapter §6 into MOSES by incorporating these checks into the corresponding validation logic. Moreover, it protects participants from failures that could happen in other participants and, therefore, contributes to fulfilling requirement NFF.2. The CapacityEvaluator interface partially fulfils requirement NFF.3: out of the wide set of functional features that a participant could provide, it specifies the subset that a specific component actually provides. For instance, the ExperimentalDataAnalyzer participant is responsible for performing analyses on the data generated by experiment executions. However, the set of possible analyses on experimental data is huge, and not all of them are required in a specific area (for instance, MOEs are usually analysed using statistical hypothesis tests).

³ For the sake of brevity, we refer to an interface implementation simply as an interface, except where confusion may arise.
Consequently, a software component such as STATService playing the role of ExperimentalDataAnalyzer can state, through its CapacityEvaluator interface, whether it can perform a specific analysis on an experimental dataset. This interface also contributes to the fulfilment of requirement NFF.2: invoking an analysis that the ExperimentalDataAnalyzer does not support would result in an error, and the capacity check lets clients detect this beforehand.

For the sake of clarity, the ports of provided interfaces are shown on the left, the ports of required services on the right, and the ports of the general purpose services (Validators, CapacityEvaluators, and Binder) at the bottom. A template component showing this distribution is depicted in Figure §8.1 as a UML component diagram.

The abstract component view of MOSES is depicted in Figure §8.2 as a UML component diagram using the SoaML profile [256]. Each element in this architectural diagram is a participant that defines a component type. Consequently, each participant can be implemented by different tools in different instantiations of the ecosystem. The elements stereotyped as “core” make up MOSES[CORE]. The components depicted in Figure §8.2 are described below:

Metaheuristic Development Platform. It is a core participant, since it provides the basic mechanisms for incorporating problem knowledge into a metaheuristic and for implementing metaheuristic algorithms. Consequently, any software component playing the role of this participant should support the use of one or more MOFs. This component provides two specific interfaces named ProjectGenerator and Packager. The ProjectGenerator interface is responsible for creating development projects based on MOEDL MetaheuristicOptimizationExperiments. Developers then implement their metaheuristic algorithms within those projects.
The MOFs supported by MOSES would be used for such implementation, and consequently the corresponding software artefacts should be included in the project as libraries, dependencies or references (depending on the implementation technology and project type). The Packager service is responsible for packaging such development projects into a SEA lab-pack that can be used to interact with the remaining participants of the ecosystem.

Figure 8.2: Components of MOSES

ExperimentalExecutor. It is a core participant, since it is responsible for executing experiments. In so doing, it requires the information regarding the experiment to be stored as a SEA lab-pack. It provides two specific interfaces named Executor and Deployer. The former is responsible for executing an experiment with a specific configuration. According to SEA, the experiment should be described as a SEDL BasicExperiment or as a domain-specific experiment type that extends SEDL Experiment. This constitutes an important variation point: the software components playing the role of this participant in particular ecosystem instances could support different domain-specific experiment types. Consequently, this participant provides the interfaces CapacityEvaluator<Experiment> and Validator<Experiment>. The execution of an experiment can take a long time, and transmitting lab-packs for service invocation in distributed environments can involve a heavy load⁴. Consequently, in order to fulfil NFF.1, the participant supports the deployment of experiment lab-packs in its execution environment through the Deployer interface. A Validator<SEALabpack> interface is also provided.
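The following minimal sketch illustrates why a separate Deployer interface helps performance. The lab-pack is transferred once and identified by a content hash, and later executions refer to it by identifier instead of re-sending it. The `LocalExecutor` class, its methods, and the lab-pack size are hypothetical and kept in memory for illustration. This is not the actual E3 API.

```python
import hashlib


class LocalExecutor:
    """Hypothetical executor exposing deploy/execute operations (illustrative only)."""

    def __init__(self) -> None:
        self._labpacks: dict[str, bytes] = {}
        self.bytes_received = 0  # total lab-pack bytes transferred to the executor

    def deploy(self, labpack: bytes) -> str:
        """Store a lab-pack once and return its content-based identifier."""
        labpack_id = hashlib.sha256(labpack).hexdigest()
        if labpack_id not in self._labpacks:
            self._labpacks[labpack_id] = labpack
            self.bytes_received += len(labpack)
        return labpack_id

    def execute(self, labpack_id: str, configuration: dict) -> dict:
        """Run an already deployed experiment; only the id and config travel."""
        if labpack_id not in self._labpacks:
            raise KeyError(f"lab-pack {labpack_id} has not been deployed")
        # Placeholder for the real execution: report what would be run.
        return {"labpack": labpack_id, "configuration": configuration}


executor = LocalExecutor()
labpack = b"\x00" * (20 * 1024 * 1024)  # hypothetical 20 MB lab-pack
pack_id = executor.deploy(labpack)
for run in range(10):                    # ten executions, e.g. different seeds
    executor.execute(pack_id, {"seed": run})
executor.deploy(labpack)                 # redeploying identical content is a no-op
print(executor.bytes_received)           # 20971520: transferred only once
```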
The ExperimentalExecutor can optionally consume the services provided by most of the remaining participants of the ecosystem. This automates the publication of experimental lab-packs in repositories (through a Deployer interface), the analysis of the results of the executions performed (through an Analyzer interface), and the generation of reports on them (through a ReportGenerator interface).

ExperimentalDataAnalyzer. It is a core participant, since statistical analysis and hypothesis testing are essential for drawing the right conclusions from the results obtained in the experiments. It provides the interface Analyzer, together with the interfaces CapacityEvaluator<AnalysisRequest> and Validator<AnalysisRequest>. This participant can optionally require a ReportGenerator interface in order to generate reports from the analyzed data.

The additional participants defined in MOSES are:

ExperimentalRepository. This participant is responsible for storing experimental information and allowing its querying and retrieval. It deploys and retrieves experimental information by exposing a Deployer service, which enables the upload and download of SEA lab-packs. It supports querying the deployed experimental lab-packs through the Finder service it provides. Some tools partially fulfil the role of this component, such as the Optimization knowledge base [244] or reproducible research repositories [87, 202, 218, 223].

⁴ The size of the lab-packs of our application experiments is in the order of tens of megabytes.

ExperimentalDesigner. A proper experimental design is essential to get the right answers to our research questions and to solve the problems at hand. This participant is responsible for aiding in the creation of appropriate experimental descriptions.
It provides the interface Designer, which generates experimental descriptions based on DSLs (such as MOEDL) and other kinds of experiment specifications (typically less formal than SEDL). Moreover, through the DesignExpander interface, this participant is responsible for expanding experimental designs, i.e., it takes a PredefinedExperimentalDesign as input and returns a FullySpecifiedExperimentalDesign as output. Some types of experimental designs cannot be expanded. For instance, the treatments and measurements generated by a RacingAlgorithmDesign and some ResponseSurfaceDesigns are not predictable until the experiment is executed. Additionally, this participant provides the interface Validator<ExperimentalDesign> and a capacity evaluator for each exposed service.

ExperimentalReportGenerator. This participant is responsible for generating charts and reports in an automated or semi-automated way. It helps the actors to draw conclusions from the results of the experiment and to make decisions in the context of the MPS life-cycle. Sample tools that could play the role of this participant are plugins or scripts for office suites, or ad-hoc modules. It provides a single domain-specific interface named ReportGenerator. It also provides generic interfaces for validating requests and for evaluating whether a specific implementation can generate a given report.

8.4 MOSES REFERENCE IMPLEMENTATION (MOSES[RI])

In this section we present a reference implementation of MOSES, MOSES[RI]. This implementation is composed of three main components: a MOF (FOM), an experimental execution platform (E3) and a statistical analysis tool (STATService). FOM was fully described in Chapter §5, and STATService is described below in this chapter. E3 enables the automated analysis and execution of SEDL and MOEDL experiments. For details about E3 and SEA we refer the reader to Appendices §F and §E, respectively.

Figure 8.3: MOSES[RI] component diagram

Figure §8.3 depicts a UML component diagram of MOSES[RI] showing how it extends the main components of MOSES. As illustrated, FOM plays the MetaheuristicsDevelopmentEnvironment role, E3 plays the ExperimentExecutor role and STATService plays the ExperimentalDataAnalyzer role.

The current version of MOSES[RI] defines components, interfaces and data exchange formats. However, all these elements must still be integrated to realize their full potential. This is part of our future work. Figure §8.8 depicts a prototype interface of our vision of such an integrated development environment for MPS-based solutions. In such an integrated environment, users could create MPS projects for solving optimization problems, choosing the specific MOF to be used for the implementation. The system could aid in this decision using the data from the comparative framework, showing which MOFs provide better support for the techniques that the user plans to apply. Additionally, the system could integrate E3 and STATService in order to automate the execution and analysis of the experiment.

Figure 8.4: MOSES[RI] deployment diagram

8.4.1 STATService

STATService is a suite of online software tools for performing the most usual Null Hypothesis Statistical Tests (NHST) in the field of metaheuristics. The tool provides several user interfaces, including a web portal, which makes it extremely easy to use. The tool supports a variety of both parametric and non-parametric tests, as well as an integrated decision tree that automatically selects the most suitable tests for the input dataset. Additionally, the tool assists in the interpretation of the results (using colours, tables and graphs), which makes drawing conclusions easier, even for amateur users. The web portal of STATService is available at http://moses.us.es/statservice.
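The kind of decision logic mentioned above can be sketched as follows. This is a simplified, commonly used selection scheme written for illustration. It is not the exact decision tree used by STATService, which is shown in Figure §8.6. The sketch assumes that the normality and homoscedasticity checks have already been run and summarized as booleans. The function name `select_test` is hypothetical.

```python
def select_test(groups: int, paired: bool, normal: bool, homoscedastic: bool) -> str:
    """Pick a hypothesis test from premise checks (simplified illustration).

    groups        -- number of samples/treatments being compared (>= 2)
    paired        -- True if samples are related (e.g. blocked by problem instance)
    normal        -- True if every sample passed a normality test
    homoscedastic -- True if the samples passed a homoscedasticity test
                     (used here as a rough proxy for sphericity in the
                     repeated-measures case)
    """
    if groups < 2:
        raise ValueError("at least two samples are needed for a comparison")
    parametric = normal and homoscedastic
    if groups == 2:
        if paired:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        if parametric:
            return "Student's t-test"
        if normal:
            return "Welch's t-test"  # normal samples with unequal variances
        return "Mann-Whitney U test"
    if paired:
        return "repeated-measures ANOVA" if parametric else "Friedman test"
    return "one-way ANOVA" if parametric else "Kruskal-Wallis test"


# A tuning experiment with 6 configurations blocked by problem instance,
# whose objective values fail the normality check:
print(select_test(groups=6, paired=True, normal=False, homoscedastic=True))  # Friedman test
```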
Interfaces

STATService can be used through four different interfaces, namely:
• A distributable open-source Java package with the implementations of all the tests, to be integrated into other Java applications.
• A web portal that allows importing data and applying the tests and post-hoc analyses from any standard browser. The supported input formats are comma-separated values (CSV), plain text with user-defined delimiters, and MS Excel spreadsheets.
• XML web services that allow the programmatic invocation of STATService from any computing platform and programming language in a distributed, standards-based way.
• An MS Excel plug-in that, like the web portal, aims to ease the use of STATService, in this case directly from the spreadsheet interface.

Figure 8.5: Architecture and users of STATService

Design

The architecture of STATService is described as a UML component diagram in Figure §8.5. The diagram is decorated with additional images that help to describe its elements. STATService is conceived as a distributed system in which users can access the statistical analysis functionality ubiquitously through the Internet. Figure §8.5 shows two different nodes represented as 3D boxes, namely: the server where STATService is deployed and the user's computer. The XML web services implement the interface defined by the ExperimentalDataAnalyzer participant, which integrates STATService as a component of MOSES. This allows a seamless interaction with other MOSES components, independently of the client's platform and development technology. Both the web and XML interfaces use a common core implementation of the statistical analysis logic. This architecture reduces the implementation burden, promotes reuse, and ensures the consistency of results between both interfaces. The core test implementation consists of a set of statistical tests developed by the authors, and a refactoring and redesign of the source code provided by the SCI2S research group as a companion to their papers [111]. Additionally, this core implementation integrates some tests from the JavaNPST library [69] and from the statistics package of the Apache Commons Math library [12]. The different components of the core implementation are shown as a zoomed view on the right side of Figure §8.5.

Statistical tests supported

STATService implements a wide set of parametric (pairwise and multiple comparison), non-parametric (pairwise and multiple comparison), normality and homoscedasticity tests. It also provides post-hoc procedures for multiple comparisons. The tests implemented are listed in Appendix §D. Besides this, STATService offers a service called SMARTest that selects the best set of tests to carry out a statistical comparison according to a specific methodology. This service analyses properties such as normality or homoscedasticity using statistical tests, and executes the best test according to the decision tree shown in Figure §8.6. It evaluates the premises of parametric tests (except for independence), and chooses the specific test to be applied based on the methodology described in [69].

Reporting results

For each p-value provided as a result, STATService reports its value, the value of the statistic, the distribution used to compute it (including its degrees of freedom), and the significance level that should be used for rejecting the null hypothesis H0 (0.05 by default for the usual tests, and the adjusted value for post-hoc procedures). STATService generates rankings for non-parametric multiple comparisons and the complete table of p-values for post-hoc procedures.
The use of font colours for the p-values, and the links to information about the tests, make it straightforward to draw conclusions from the statistical information. When used through its web interface, STATService uses the font colour to convey the meaning of each p-value (red for rejection of H0, and green for non-rejection of H0). Furthermore, STATService shows the evaluation of the decision tree step by step, and describes the alternative tests that should be used if the assumptions and considerations made during the evaluation of the decision tree are not met. In turn, it provides links that allow applying the alternative tests of the decision tree directly. Figure §8.7 shows some snapshots of the web portal interface of STATService.

Figure 8.6: Decision tree used for test selection

8.5 USING MOSES

8.5.1 MOSES Studio

In order to ease the development of MPS solutions, we have incorporated a web application called MOSES Studio. MOSES Studio not only provides an editor for MOEDL/SEDL documents, but also several facilities to validate and analyse them. Its user interface is summarized in the diagram of Figure §8.8. As the figure shows, MOSES Studio presents the information according to the kind of document loaded. Its main features are the following:

1. Basic document management operations (File menu in the figure), including:
• creating default MOEDL/SEDL descriptions, to make it easier to write a description from scratch.
• creating scenarios to launch several analysis operations using a simple JavaScript syntax.
• performing typical file system operations on documents and scenarios, such as open, save, save as, print, or rename.
• downloading documents and scenarios in several formats, namely SEDL, MOEDL, SEDL/MOEDL serialised as XML, and SEA lab-packs.
2. Common advanced operations that apply to any experiment description (Main window menu), including:
• Experiment editing, with facilities such as undo, redo, copy, paste, and find.
• Experiment validation by means of a single validate button that performs different checks and explanations depending on the kind of experiment currently loaded.
• Experiment conduction through E3 by means of a single button.
3. SEDL-specific operations, included in the Tool menu when a SEDL description is loaded. The supported operations are:
• Analysing the experiment conduction. Such an analysis may check whether the validity of the experiment is threatened, using the automated analysis operations described in Chapter §6. If threats are detected, an explanatory report with advice for neutralizing them is shown, and the affected sections of the experimental description are highlighted.
4. MOEDL-specific operations, included in the Tool menu when a MOEDL description is loaded. The supported operations are:
• Obtaining the equivalent SEDL description.
5. Repository management, which allows adding or removing research repositories and searching them by different criteria, such as optimization problem or instances, experimental subjects (author), technique, etc.

The MOEDL and SEDL editor depicted in Figure §8.8 provides the following features:
1. Syntax highlighting.
2. Auto-completion.

The experimental scenario developer, depicted in Figure §8.8, allows launching several analysis operations consecutively using a simple JavaScript notation, and the results are shown in a log window. Note that the scenario developer allows experimenting with more than one analysis operation, configuration and experimental execution.

8.6 SUMMARY

In this chapter, we presented a software ecosystem for the development of MPS-based solutions. This ecosystem sets the basis for the integration of the currently disparate metaheuristic tools.
In particular, we described the features and the design of our ecosystem MOSES in detail, and we proposed a reference implementation of its main components. Finally, we outlined our vision of the future of the ecosystem as an open IDE for the development of MPS-based solutions. A subset of the contributions presented in this chapter were presented at the national conference MAEB. In [211] the concept of a software ecosystem for metaheuristic optimization was presented. In [214] we proposed STATService. Finally, in [215] MOEDL was proposed.

Figure 8.7: Snapshots of the STATService web portal: (a) home page of STATService; (b) STATService data reviewer & editor; (c) test selection form; (d) results provided by STATService (decision tree, ranking and p-values)

Figure 8.8: MOSES Studio user interface navigability

PART IV VALIDATION

9 VALIDATION

At the heart of science is an essential balance between two seemingly contradictory attitudes: an openness to new ideas, no matter how bizarre or counter-intuitive they may be, and the most ruthless sceptical scrutiny of all ideas, old and new. This is how deep truths are winnowed from deep nonsense.
Carl Sagan, 1934–1996, American astronomer, exobiologist and writer

In this chapter, we report the results of the validation of MOSES. In particular, we explain how we used the ecosystem to implement, evaluate and analyse two MPS applications in the context of search-based software engineering: QoS-Gasp and ETHOM. Section §9.1 introduces how the validation was undertaken. Sections §9.2 and §9.3 explain how we used MOSES along the MPS life-cycles of QoS-Gasp and ETHOM, respectively. Finally, some conclusions are summarized in Section §9.4.

9.1 INTRODUCTION

As part of our thesis, we developed two specific algorithms, QoS-Gasp and ETHOM, to solve two significant problems in the context of search-based software engineering.
These problems and algorithms are summarized in the following sections and fully described in Appendices §G and §H. The development, evaluation and analysis of both algorithms were tedious, error-prone and extremely time-consuming. This was one of the main reasons that motivated us to propose a set of tools to reduce the cost of using metaheuristics in the context of software engineering. In this chapter, we report the results of using MOSES to replicate all the experiments performed during the development of QoS-Gasp and ETHOM. This replication serves as a validation of the ecosystem, assessing the gains that it provides in real settings.

For each problem, we explain how we used the tools of the ecosystem along the MPS life-cycle. In each phase, we used MOEDL to describe the experiments and their results. We also created a lab-pack for each experiment, including the instances of the problem. Once the MOEDL documents were ready, we used E3 to automatically check their internal validity. This helped us to detect and fix a bug in the implementation of QoS-Gasp. Then, we used E3 again to load the experiments described in MOEDL and run them automatically using, among others, our framework FOM. Finally, after the execution of each experiment, we used STATService to perform all the required statistical tests automatically and draw conclusions.

9.2 QOS-AWARE COMPOSITE WEB SERVICES BINDING

In service-oriented scenarios, applications are created by composing atomic services and exposing the resulting added-value logic as a service. When several alternative service providers are available for composition, quality of service (QoS) properties such as execution time, cost, or availability are taken into account to make the choice, leading to the creation of QoS-aware composite web services. Finding the set of service providers that results in the best QoS is an NP-hard optimization problem.
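To make the combinatorial nature of the problem concrete, the sketch below scores every possible binding of a toy, purely sequential composition of three abstract tasks. The candidate providers, QoS values, weights and aggregation rules (summed response time and cost, multiplied availability) are illustrative assumptions; they are not the global QoS functions used in Exp1 and Exp2. Exhaustive search visits the product of the number of candidates per task, which grows exponentially with the number of tasks and is why metaheuristics are used for realistic sizes.

```python
from itertools import product
from math import prod

# Hypothetical candidates: task -> [(provider, time_ms, cost, availability)]
CANDIDATES = {
    "search": [("A", 120, 0.05, 0.99), ("B", 80, 0.09, 0.97)],
    "book": [("C", 300, 0.20, 0.995), ("D", 200, 0.35, 0.98)],
    "pay": [("E", 150, 0.10, 0.999), ("F", 90, 0.25, 0.99)],
}


def global_qos(binding, w_time=0.4, w_cost=0.3, w_avail=0.3):
    """Weighted score of a sequential composition (higher is better)."""
    time_ms = sum(c[1] for c in binding)        # response times add up in a sequence
    cost = sum(c[2] for c in binding)           # so do the prices
    availability = prod(c[3] for c in binding)  # every service must be up
    # Seconds instead of milliseconds keeps the three terms on a comparable scale.
    return w_avail * availability - w_time * time_ms / 1000 - w_cost * cost


def best_binding(candidates):
    """Exhaustive search over all prod(len(options)) bindings."""
    tasks = list(candidates)
    best = max(product(*(candidates[t] for t in tasks)), key=global_qos)
    return {task: choice[0] for task, choice in zip(tasks, best)}


print(best_binding(CANDIDATES))  # {'search': 'A', 'book': 'C', 'pay': 'E'}
```

With 2 candidates per task this toy instance has only 8 bindings; with, say, 10 candidates for each of 20 tasks there would be 10^20.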
To address this problem, we propose QoS-Gasp, a metaheuristic algorithm for performing QoS-aware web service composition at runtime. QoS-Gasp is a hybrid approach that combines GRASP with Path Relinking. For the evaluation of our approach, we compared it with related metaheuristic algorithms found in the literature. The experiments show that, when results must be available in seconds, QoS-Gasp improves the results of previous proposals by up to 40%. Besides this, when results must be available in 100 ms, QoS-Gasp found better solutions than any of the runs of the compared techniques in 92% of the runs. The complete description of QoS-Gasp and the experimental results are reported in Appendix §G. This work has been submitted to the IEEE Transactions on Services Computing journal. In the following subsections, we describe how MOSES supported the design and validation of QoS-Gasp in each phase of the MPS life-cycle.

9.2.1 Selection

Regarding the selection phase, it is worth noting that QoS-Gasp is not our first attempt to design an effective algorithm for solving the QoSWSCB problem. Initially, we proposed a hybrid of an Evolutionary Algorithm and Tabu Search, since it is a simple hybridization of the most widely used metaheuristic for this problem [43]. This algorithm provided good results for small instances of the problem and was published in [210]. However, it did not provide good results for large problems. Next, we designed an Ant Colony Optimization algorithm, but its results were unsatisfactory. Finally, we explored the possibilities of using GRASP and its hybridization with Path Relinking for solving this problem. This process of selection by educated guessing, trial and error took a long time, which was one of the main motivations for the development of MOSES.
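The two ingredients of QoS-Gasp, GRASP and path relinking, can be sketched in Python on a deliberately simple, separable toy problem. The sketch assumes the range-based restricted candidate list (RCL) convention in which alpha = 0 is purely greedy and alpha = 1 purely random, a best-improvement local search over single-task changes (standing in for the "SD" local improvement in the MOEDL documents), and a greedy path relinking walk from each new solution towards a random elite solution. All names are hypothetical; this is not the QoS-Gasp implementation described in Appendix §G.

```python
import random
from typing import Callable, List, Sequence

Solution = List[int]                     # solution[t] = option chosen for task t
Objective = Callable[[Solution], float]  # to be maximised


def grasp_construct(greedy: Sequence[Sequence[float]], alpha: float,
                    rng: random.Random) -> Solution:
    """Randomised greedy construction with a range-based RCL."""
    solution = []
    for values in greedy:  # greedy[t][j]: greedy value of option j for task t
        g_max, g_min = max(values), min(values)
        threshold = g_max - alpha * (g_max - g_min)
        rcl = [j for j, g in enumerate(values) if g >= threshold]
        solution.append(rng.choice(rcl))
    return solution


def local_improve(solution: Solution, n_options: Sequence[int], f: Objective) -> Solution:
    """Best-improvement local search: change one task at a time while f improves."""
    current = list(solution)
    while True:
        best, best_value = None, f(current)
        for t, n in enumerate(n_options):
            for j in range(n):
                if j != current[t]:
                    neighbour = current[:t] + [j] + current[t + 1:]
                    value = f(neighbour)
                    if value > best_value:
                        best, best_value = neighbour, value
        if best is None:
            return current  # local optimum reached
        current = best


def path_relink(start: Solution, guide: Solution, f: Objective) -> Solution:
    """Move from start towards guide one attribute at a time; return the best visited."""
    current, best = list(start), list(start)
    differing = [t for t in range(len(start)) if start[t] != guide[t]]
    while differing:
        # Greedily adopt the guide's choice that yields the best intermediate solution.
        t = max(differing, key=lambda i: f(current[:i] + [guide[i]] + current[i + 1:]))
        current[t] = guide[t]
        differing.remove(t)
        if f(current) > f(best):
            best = list(current)
    return best


def grasp_pr(greedy, f: Objective, alpha=0.25, iterations=20, elite_size=5, seed=0):
    rng = random.Random(seed)
    n_options = [len(row) for row in greedy]
    elite: List[Solution] = []
    for _ in range(iterations):
        s = local_improve(grasp_construct(greedy, alpha, rng), n_options, f)
        if elite:
            s = max(s, path_relink(s, rng.choice(elite), f), key=f)
        if s not in elite:
            elite.append(s)
            elite.sort(key=f, reverse=True)
            del elite[elite_size:]  # keep only the best elite_size solutions
    return elite[0]


# Toy separable problem: f is the sum of the chosen options' greedy values.
greedy = [[1.0, 3.0, 2.0], [5.0, 1.0], [2.0, 2.0, 4.0]]


def f(sol: Solution) -> float:
    return sum(greedy[t][j] for t, j in enumerate(sol))


best = grasp_pr(greedy, f)
print(best, f(best))  # [1, 0, 2] 12.0
```

On a real QoSWSCB instance the objective is not separable, so the randomised construction and the relinking between diverse elite solutions matter far more than in this toy case.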
Summarizing, the selection of the metaheuristic was based on the authors' experience, the known properties of previous proposals, and the specific metaheuristic algorithms applied in most cases. Since our first efforts to solve this problem date back to 2008, MOSES could not be applied to the selection of those earlier approaches, only to that of QoS-Gasp. In the remainder of this section, we refer exclusively to the application of the MPS life-cycle aimed at designing and validating QoS-Gasp.

We used MOSES to compare four different metaheuristics and select the one providing the best results for the QoSWSCB problem. More specifically, we compared Genetic Algorithms (GA) [43], hybrid Tabu Search with Simulated Annealing (TS+SA) [166], and our approaches, GRASP and GRASP with Path Relinking (GRASP+PR). For the comparison, we used two different objective functions (a.k.a. global QoS functions) reported in the literature. Figures §9.1 and §9.2 show the MOEDL documents used for the experiments. We will refer to these experiments as Exp1 and Exp2, respectively. For each experiment, we created a lab-pack with 11 instances of the problem.

MOEDL::EXPERIMENT: QoSWSCB1 version 1.0 rep: http://moses.us.es/E3
type: TechniqueComparison, methodology: BasicMOSES, runs: 20
Problem Types:
  QoSWSCB('es.us.isa.qowswc.problem.QoSAwareWebServiceBinding')
    Objective functions: GlobalQoS
    Instances(file: 'Problem-${i}.qoswsc'): P1, P2, P3, P4, P5, P6, P7, P8, P9, P10
Optimization Techniques:
  GRASP1(GRASP) {
    InitializationScheme: Random,
    RCL creation: { type: RangeBased, alpha: 0.25,
                    g-function: {Custom('G1', class: 'es.us.isa.qoswsc.G1')}},
    Local improvement: SD,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
  }
  GRASP+PR${g-function}: {
    Technique: PR,
    initial solution selector: RandomSelector,
    InitializationScheme: Technique(GRASP) {
      InitializationScheme: Random,
      RCL creation: RangeBased{ alpha: 0.25,
        g-function: Variants {
          Custom('G1', class: 'es.us.isa.qoswsc.G1'),
          Custom('G1', class: 'es.us.isa.qoswsc.G1'),
          Custom('G1', class: 'es.us.isa.qoswsc.G1') }
      },
      Local improvement: SD,
      encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
    },
    guiding solution selector: RandomSelector,
    eliteSetSize: 20,
    relinkingSteps: 50,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
  }
  EACanfora(EA) {
    InitializationScheme: Random,
    populationSize: 100,
    mutationProbability: 0.01,
    crossoverProbability: 0.7,
    crossoverSelector: { type: RouletteWheel },
    mutationSelector: RandomSelector,
    survivalReplacer: PrioritizedCompositeSelector{
      mainSelector: { type: ElitistSelector, rate: 2, absolute: true },
      secondarySelector: { type: RouletteWheel } },
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBIndividual'
  }
  TS+SA(class: 'es.us.isa.qoswscb.technique.HybridTSandSA') {
    memory: Recency(100),
    Aspiration: BestImprovement,
    cooling scheme: Exponential(r: 0.95),
    initial temperature: 10000,
    neighbours per iteration: 5,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBExplorableSolution'
  }
Termination Criterion: RepeatForEach(MaxTime(100), MaxTime(500), MaxTime(1000), MaxTime(5000), MaxTime(10000))
Random Number Generator: // Mersenne twister algorithm
  Basic(seed: 231752, class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting:
      Runtimes: Java 1.6
      Libraries: FOM 0.5

Figure 9.1: Selection experiment for QoSWSC (Exp 1)

MOEDL::EXPERIMENT: QoSWSCB2 version 1.0 rep: http://moses.us.es/E3
type: TechniqueComparison, methodology: BasicMOSES, runs: 20
Problem Types:
  QoSWSCB('es.us.isa.qowswc.problem.CanforaProblemDefinition')
    Objective functions: CanforasGlobalQoS
    Instances(file: 'Problem-${i}.qoswsc'): P1, P2, P3, P4, P5, P6, P7, P8, P9, P10
  QoSWSCB
    Objective functions: GlobalQoS
Optimization Techniques:
  GRASP1(GRASP) {
    InitializationScheme: Random,
    RCL creation: { type: RangeBased, alpha: 0.25,
                    g-function: {Custom('G1', class: 'es.us.isa.qoswsc.G1')}},
    Local improvement: SD,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
  }
  GRASP+PR${g-function}: {
    Technique: PR,
    initial solution selector: RandomSelector,
    InitializationScheme: Technique(GRASP) {
      InitializationScheme: Random,
      RCL creation: RangeBased{ alpha: 0.25,
        g-function: Variants {
          Custom('G1', class: 'es.us.isa.qoswsc.G1'),
          Custom('G1', class: 'es.us.isa.qoswsc.G1'),
          Custom('G1', class: 'es.us.isa.qoswsc.G1') }
      },
      Local improvement: SD,
      encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
    },
    guiding solution selector: RandomSelector,
    eliteSetSize: 20,
    relinkingSteps: 50,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
  }
  EACanfora(EA) {
    InitializationScheme: Random,
    populationSize: 100,
    mutationProbability: 0.01,
    crossoverProbability: 0.7,
    crossoverSelector: { type: RouletteWheel },
    mutationSelector: RandomSelector,
    survivalReplacer: PrioritizedCompositeSelector{
      mainSelector: { type: ElitistSelector, rate: 2, absolute: true },
      secondarySelector: { type: RouletteWheel } },
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBIndividual'
  }
  TS+SA(class: 'es.us.isa.qoswscb.technique.HybridTSandSA') {
    memory: Recency(100),
    Aspiration: BestImprovement,
    cooling scheme: Exponential(r: 0.95),
    initial temperature: 10000,
    neighbours per iteration: 5,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBExplorableSolution'
  }
Termination Criterion: RepeatForEach(MaxTime(100), MaxTime(500), MaxTime(1000), MaxTime(5000), MaxTime(10000))
Random Number Generator: // Mersenne twister algorithm
  Basic(seed: 6482752, class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting:
      Runtimes: Java 1.6
      Libraries: FOM 0.5

Figure 9.2: Selection experiment for QoSWSC (Exp 2)

Before running the experiments, we used E3 to check the validity of the MOEDL documents automatically. No threats to validity were detected.
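The kind of internal-validity check E3 performs can be illustrated with a "missing measurement" style check, which compares the measurements actually recorded with those the design predicts. The sketch below assumes a hypothetical row format with `treatment` and `instance` fields and a fixed number of runs per cell; it is not E3's implementation. For the design of Exp3 (Figure §9.3), 7 greedy functions and 3 values of α give 21 treatments, which over 11 instances and 20 runs should yield 21 × 11 × 20 = 4620 measurements.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple


def missing_measurements(treatments: List[str], instances: List[str], runs: int,
                         rows: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, str], int]:
    """Return {(treatment, instance): number of missing runs} for incomplete cells."""
    observed = Counter((row["treatment"], row["instance"]) for row in rows)
    missing = {}
    for t in treatments:
        for i in instances:
            shortfall = runs - observed[(t, i)]  # Counter yields 0 for absent keys
            if shortfall > 0:
                missing[(t, i)] = shortfall
    return missing


# Exp3-like design: 7 greedy functions x 3 alpha values, 11 instances, 20 runs.
treatments = [f"G{g}-a{a}" for g in range(1, 8) for a in (0.25, 0.5, 0.75)]
instances = [f"P{i}" for i in range(11)]
rows = [{"treatment": t, "instance": i}
        for t in treatments for i in instances for _ in range(20)]
print(len(treatments) * len(instances) * 20)  # 4620 expected measurements
rows = rows[:-5]  # simulate a buggy execution that lost its last five measurements
print(missing_measurements(treatments, instances, 20, rows))
# {('G7-a0.75', 'P10'): 5}
```

Reporting the incomplete cells, rather than only a global count, points directly at the treatment whose implementation misbehaved.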
9.2.2 Implementation

For the implementation of the metaheuristic program we used our framework FOM, since it supports all the techniques under comparison in this experiment. Additionally, this was helpful to test the framework in a new application scenario.

9.2.3 Tailoring

In order to find the best variant of GRASP+PR for QoS-Gasp, we performed an additional experiment (Exp3), in which we compared seven greedy functions and three specific values of α (the greediness parameter). Figure §9.3 depicts the MOEDL document used to describe this experiment. Again, we used E3 to check the validity of the MOEDL document automatically. Interestingly, the "missing measurement" operation detected an error invalidating our results, i.e., the number of measurements did not match the number expected according to the design. After some debugging, we found that the problem was a bug in the implementation of the tailorings. We fixed the bug and repeated the experiment and the analysis, with no further threats detected.

MOEDL::EXPERIMENT: QoSWSCBA1 version 1.0 rep: http://moses.us.es/E3
type: TechniqueComparison, methodology: BasicMOSES, runs: 20
Problems:
  QoSWSCB
    Objective functions: GlobalQoS
    class: 'es.us.isa.qowswc.problem.QoSAwareWebServiceBinding'
    Instances(file: 'A1-P${instance}.qoswsc'): P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10
Optimization Techniques:
  GRASPTech(GRASP) {
    InitializationScheme: Random,
    RCL creation: RangeBased(alpha: Variants {0.25, 0.5, 0.75}),
    g-function: Variants {
      Custom('G1', class: 'es.us.isa.qoswsc.G1'),
      Custom('G2', class: 'es.us.isa.qoswsc.G2'),
      Custom('G3', class: 'es.us.isa.qoswsc.G3'),
      Custom('G4', class: 'es.us.isa.qoswsc.G4'),
      Custom('G5', class: 'es.us.isa.qoswsc.G5'),
      Custom('G6', class: 'es.us.isa.qoswsc.G6'),
      Custom('G7', class: 'es.us.isa.qoswsc.G7') },
    Local improvement: SD,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
  }
Termination Criterion: MaxTime(1000)
Random Number Generator: // Mersenne twister algorithm
  Basic(seed: 3723421, class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting:
      Runtimes: Java 1.6
      Libraries: FOM 0.5

Figure 9.3: Selection experiment for QoSWSC (Exp 3)

9.2.4 Tuning

For tuning QoS-Gasp, we performed an additional experiment (Exp4) to search for the best combination of values for the parameters of path relinking, namely: the number of solutions in the elite set generated by GRASP, the number of elite solutions, the number of paths per iteration and the number of neighbours to explore per path. Figure §9.4 shows the MOEDL document describing this experiment. The operations for the automated validation of the experiment revealed no threats.

MOEDL::EXPERIMENT: QoSWSCBA2 version 1.0 rep: http://moses.us.es/E3
type: TechniqueParametrization, methodology: BasicMOSES, runs: 20
Problem Types:
  QoSWSCB('es.us.isa.qowswc.problem.QoSAwareWebServiceBinding')
    Objective functions: GlobalQoS
    Instances(file: 'A1-P${instance}.qoswsc'): P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10
Optimization Techniques:
  GRASP+PR(PR) {
    InitializationScheme: Metaheuristic(GRASP) {
      InitializationScheme: Random,
      RCL creation: RangeBased{ alpha: 0.25,
        g-function: Custom('G6', class: 'es.us.isa.qoswsc.G6') },
      Local improvement: SD,
      encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
    },
    initial solution selector: RandomSelector,
    guiding solution selector: RandomSelector
  }
Parameters Space:
  Dimensions:
    eliteSetSize enum 5, 10, 20
    relinkingSteps enum 10, 20, 50
Termination Criterion: MaxTime(1000)
Random Number Generator: // Mersenne twister algorithm
  Basic(seed: 4632451, class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting:
      Runtimes: Java 1.6
      Libraries: FOM 0.5

Figure 9.4: Selection experiment for QoSWSC (Exp 4)

9.2.5 Analysis

Once the results of each experiment were obtained, we proceeded to analyse the data. To that end, we uploaded the results of each experiment to the web interface of STATService in CSV format. The tool automatically performed all the required tests and returned: i) a ranking of variants, ii) the p-values of all the statistical tests, iii) the p-values of the post-hoc procedures, and iv) the decision path followed to perform the tests (cf. Figure §8.6). Figure §9.5 shows a screenshot of the analysis report provided by STATService for Exp3.

Figure 9.5: Results of STATService for Exp1 (100ms)

9.3 GENERATION OF HARD FMS

A Feature Model (FM) is a compact representation of the products of a software product line.
The automated extraction of information from FMs is a thriving topic involving numerous analysis operations, techniques and tools [26]. Performance evaluations in this domain mainly rely on the use of random FMs. However, these only provide a rough idea of the behaviour of the tools with average problems and are not sufficient to reveal their real strengths and weaknesses. To address this problem, we propose to model the problem of finding computationally hard FMs as an optimization problem, and we solve it using a novel Evolutionary algoriTHm for Optimized feature Models (ETHOM). Given a tool and an analysis operation, ETHOM generates input models of a predefined size that maximize aspects such as the execution time or the memory consumption of the tool when performing the operation on the model. This allows users and developers to know the behaviour of tools in pessimistic cases, providing a better idea of their real power. Experiments using ETHOM on a number of analyses and tools have successfully identified models producing much longer execution times and higher memory consumption than those obtained with random models of identical or even larger size. The complete description of ETHOM and the experimental results are reported in Appendix §H. This work has been submitted to the Information and Software Technology journal.

In the following subsections, we describe how MOSES supported the design and validation of ETHOM in each phase of the MPS life-cycle. We also present how the ecosystem contributed to replicating the experiments performed to evaluate the effectiveness of the algorithm in different scenarios. The experimental descriptions for this application are provided in SEDL since: i) the implementation was not performed with any MOF, and ii) it allows a more complete validation, testing the expressiveness of SEDL with more experiments.
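The objective ETHOM maximizes is the effort an analysis tool spends on an operation over a generated model. The following Python sketch illustrates this with the void-model check used for tailoring (a model is void when it represents no products): the feature model is encoded as propositional clauses, a naive backtracking search looks for one valid product, and the number of search nodes explored serves as a machine-independent proxy for execution time. The clause encoding and the function `is_void` are assumptions for illustration only; ETHOM itself measured real solvers, such as JaCoP through FaMa.

```python
from typing import Dict, List, Tuple

Clause = List[int]  # literal +f: feature f selected; -f: feature f deselected


def is_void(n_features: int, clauses: List[Clause]) -> Tuple[bool, int]:
    """Naive backtracking search for one product; returns (void?, nodes explored)."""
    assignment: Dict[int, bool] = {}
    nodes = 0

    def falsified(clause: Clause) -> bool:
        # A clause is violated once every literal is assigned and false.
        return all(abs(lit) in assignment and assignment[abs(lit)] != (lit > 0)
                   for lit in clause)

    def search(feature: int) -> bool:
        nonlocal nodes
        nodes += 1
        if any(falsified(c) for c in clauses):
            return False  # prune: partial configuration already invalid
        if feature > n_features:
            return True  # all features decided, no constraint violated: a product
        for value in (True, False):
            assignment[feature] = value
            if search(feature + 1):
                return True
            del assignment[feature]
        return False

    return (not search(1), nodes)


# Feature 1 = root, 2 = mandatory child, 3 = optional child, "3 excludes 2".
model = [[1], [-2, 1], [-1, 2], [-3, 1], [-3, -2]]
print(is_void(3, model))          # (False, 5)
print(is_void(3, model + [[3]]))  # (True, 7): requiring 3 makes the model void
```

Models of the same size can require very different numbers of nodes; an evolutionary search over model structures rewarded by such an effort measure is the essence of the hard-instance generation approach.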
9.3.1 Selection

The idea of generating hard FMs to evaluate the performance of analysis tools was inspired by the work of Wegener et al. [290]. In their work, the authors showed that Evolutionary Algorithms (EAs) are effective in finding hard inputs for real-time systems. Thus, we decided to follow the same EA-based approach and did not compare any other techniques, i.e., no selection experiments were performed.

9.3.2 Implementation

The characteristics of the problem led us to define a custom encoding (trees of fixed size) and thus custom crossover and mutation operators, as well as specific repairing mechanisms. As a result, we could not adapt any of the existing MOFs and had to develop an ad-hoc Java implementation of ETHOM.

9.3.3 Tailoring

With the aim of finding a suitable tailoring of ETHOM, we performed numerous executions of a sample optimization problem, evaluating different combinations of values for the tailoring points of the algorithm, presented in Table §9.1. The selected values (marked with * in Table 9.1) were those providing better results, and were therefore used in the final configuration of ETHOM. The optimization problem used for tailoring was to find an FM maximizing the execution time invested by the analysis tool when checking whether the model is void (i.e., whether it represents no products). We chose this analysis operation because it is currently the most quoted in the literature [26]. In particular, we looked for FMs of different sizes maximizing the execution time of the CSP solver JaCoP integrated into the FaMa framework v1.0. We chose FaMa mainly because of our familiarity with the tool.

Tailoring point          Variants evaluated and selected
Selection strategy       Roulette-wheel*, 2-Tournament
Crossover strategy       One-point*, Uniform
Infeasible individuals   Replacing, Repairing*
Table 9.1: Tailoring variants in ETHOM (* = selected value)

Figure §9.6 depicts the description of the tailoring experiment using SEDL.
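The tailoring points of Table 9.1 correspond to standard EA operators. The sketch below shows roulette-wheel and 2-tournament selection together with one-point and uniform crossover over flat lists. ETHOM applies its own operators to a fixed-size tree encoding, so these functions are a simplified, hypothetical illustration rather than ETHOM's code. Roulette-wheel selection assumes non-negative fitness values, such as execution times.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def roulette_wheel(pop: Sequence[T], fitness: Sequence[float], rng: random.Random) -> T:
    """Fitness-proportionate selection; fitness values must be non-negative."""
    pick = rng.uniform(0, sum(fitness))
    acc = 0.0
    for individual, fit in zip(pop, fitness):
        acc += fit
        if acc > pick:
            return individual
    return pop[-1]  # guards against floating-point round-off at the upper end


def tournament(pop: Sequence[T], fitness: Sequence[float],
               rng: random.Random, k: int = 2) -> T:
    """k-tournament selection: draw k distinct individuals, keep the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fitness[i])]


def one_point(a: List[T], b: List[T], rng: random.Random) -> Tuple[List[T], List[T]]:
    """Swap the tails of both parents after a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def uniform(a: List[T], b: List[T], rng: random.Random) -> Tuple[List[T], List[T]]:
    """Swap each gene between the parents with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


rng = random.Random(42)
print(one_point([0] * 6, [1] * 6, rng))
print(tournament(["a", "b", "c"], [1.0, 9.0, 3.0], rng))
```

Tournament selection only compares fitness values, so it is insensitive to their scale; roulette-wheel selection is proportional to them, which matters when execution times differ by orders of magnitude.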
The analysis operations for internal validation revealed no threats in the experiment.

EXPERIMENT: ETHOM-A1 version 1.0 rep: http://moses.us.es/E3
Objects: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  NFeatures: 500 // Number of features of the FM to be generated
  CTC: 20 // Percentage of Cross Tree Constraints to be generated
  Solver: 'CSP-JaCoP' // Solver used to evaluate the analysis operation
  CrossoverProb: 0.7
  MutationProb: 0.005
  PopulationSize: 100
  Executions: 5000
Variables:
  Factors:
    selection enum 'Roulette-wheel', '2-Tournament'
    crossover enum 'One-point', 'Uniform'
    infeasibilityTreatment enum 'Repairing', 'Replacing'
  Outcomes:
    ObjectiveFunction Integer // Best value of the obj. func. found
Hypothesis: Differential
Design:
  Sampling: Random
  DetailedDesign: Custom
    Assignment: Random
    Groups: by selection, crossover, infeasibilityTreatment sizing 10
    Protocol: Random
Analyses Spec: // Use ANOVA or Friedman (with their corresponding PostHoc proc.)
  A1: FactANOVAwRS(Filter(selection, crossover, infeasibilityTreatment), 0.05)
      Tukey(Filter(selection, crossover, infeasibilityTreatment), 0.05)
  A2: Friedman(Filter(selection, crossover, infeasibilityTreatment), 0.05)
      Holms(Filter(selection, crossover, infeasibilityTreatment), 0.05)
Configuration C1:
  Outputs: File 'Results-ETHOM-A1.csv' role: MainEvidence format: CSV mapping: VarsPerColumn
  Setting:
    Runtimes: Java 1.6
    Libraries: FAMA 1.1.2, Betty 1.1.1
  Procedure:
    Command as Treatment(selection, crossover, infeasibilityTreatment):
      'java -jar ETHOM Results-ETHOM-A1.csv ${NFeatures} ${CTC} ${Executions} \
       ${Solver} ${selection} ${crossover} \
       ${infeasibilityTreatment} ${CrossoverProb} ${MutationProb} \
       ${PopulationSize}'
Figure 9.6: Tailoring of ETHOM

9.3.4 Tuning

To tune ETHOM we repeated the process described for the tailoring, but this time we evaluated the solutions found with different values for the key parameters of the algorithm. The parameters and values evaluated are presented in Table 9.2. Figure §9.7 depicts the SEDL document used to describe and automate the execution of the experiment. No threats were detected.

EXPERIMENT: ETHOM-A2 version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  NFeatures: 500 // Number of features of the FM to be generated
  CTC: 20 // Percentage of Cross Tree Constraints to be generated
  Solver: 'CSP-JaCoP' // Solver used to evaluate the analysis operation
  selection: 'Roulette-wheel'
  crossover: 'One-point'
  infeasibilityTreatment: 'Repairing'
Variables:
  Factors:
    CrossoverProb enum 0.7, 0.8, 0.9
    MutationProb enum 0.005, 0.0075, 0.02
    PopulationSize enum 50, 100, 200
    Executions enum 2000, 5000
  Outcomes:
    ObjectiveFunction Integer // Best value of the obj. func. found
Hypothesis: Differential
Design:
  Sampling: Random
  Assignment: Random
  Groups: sizing 10
  Protocol: Random
Analyses: // Use ANOVA or Friedman (with their corresponding PostHoc proc.)
  A1: FactANOVAwRS(Filter(CrossoverProb, MutationProb, PopulationSize, Executions), 0.05)
      Tukey(Filter(CrossoverProb, MutationProb, PopulationSize, Executions), 0.05)
  A2: Friedman(Filter(CrossoverProb, MutationProb, PopulationSize, Executions), 0.05)
      Holms(Filter(CrossoverProb, MutationProb, PopulationSize, Executions), 0.05)
Configurations:
  C1:
    Outputs: File 'Results-ETHOM-A2.csv' role: MainEvidence format: CSV mapping: VarsPerColumn
    Setting:
      Runtimes: Java 1.6
      Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
    Procedure:
      Command as Treatment(CrossoverProb, MutationProb, PopulationSize, Executions):
        'java -jar ETHOM Results-ETHOM-A2.csv ${NFeatures} ${CTC} \
         ${Executions} ${Solver} ${selection} ${crossover} \
         ${infeasibilityTreatment} ${CrossoverProb} \
         ${MutationProb} ${PopulationSize}'
Figure 9.7: Tuning of ETHOM

In total, we performed over 40 million executions of the objective function to find a good tailoring and tuning of ETHOM.

Table 9.2: Tuning values in ETHOM (* = selected value)
Parameter                              Values evaluated
Crossover probability                  0.7, 0.8, 0.9*
Mutation probability                   0.0075*, 0.005, 0.02
Size of the initial population         50, 100, 200*
Number of fitness function executions  2000, 5000*

Figure 9.8: Analysis report and decision path generated by STATService

9.3.5 Analysis

For the statistical tests of each experiment we used the web interface of STATService. This allowed us to check within a few seconds whether our hypotheses were confirmed.
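STATService runs the tests declared in each Analyses section of the SEDL documents above, for instance a Friedman test followed by Holm's post-hoc procedure (the Holms operation in the listings). As an illustration of what those two steps compute, and not of STATService's actual code, here is a pure-Python sketch with hypothetical function names. It omits the tie correction and the p-value of the Friedman test, which would require the chi-square distribution with k - 1 degrees of freedom.

```python
from typing import List, Sequence


def friedman_statistic(blocks: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic (no tie correction).

    `blocks` has one row per block (e.g. per problem instance) and one column
    per treatment; values are ranked within each row, ties get average ranks.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            average_rank = (i + j) / 2 + 1  # ranks start at 1
            for t in range(i, j + 1):
                rank_sums[order[t]] += average_rank
            i = j + 1
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)


def holm(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down correction; True means the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are too
    return reject
```

For instance, when k treatments are ranked identically in all n blocks the statistic reaches its maximum, n(k - 1).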
Figure §9.8 shows a screenshot of the analysis report provided by STATService for experiment #1 of ETHOM. Additionally, we had to do some manual work to create graphs such as fitness evolution plots and histograms. This is a feature that we missed in STATService and that we plan to add in the future (cf. Chapter §10).

9.3.6 Experiments on the generation of hard FMs

Once we had found a suitable configuration for ETHOM, we performed several experiments to evaluate its effectiveness with different optimization criteria. These experiments are described in the following subsections.

Experiment #1: Maximizing execution time in a CSP solver

In this experiment, we evaluated the ability of ETHOM to search for input feature models maximizing the analysis time of a solver. In particular, we measured the execution time required by a CSP solver to find out whether the input model was consistent (i.e., whether it represents at least one product). This was the same problem used to tune the configuration of our algorithm. Again, we chose the consistency operation because it is currently the most used in the literature. Figure §9.9 depicts the SEDL document used to describe this experiment.

In a related experiment we evaluated the same ability using a different solver for the analysis (parameter Solver: 'FAMA-SAT'); this was the only difference with respect to the experiment described above. Thus, we simply copied the experiment description and changed that parameter in order to execute it.

Experiment #2: Maximizing memory consumption in a BDD solver

In this experiment, we evaluated the ability of our evolutionary program to generate input FMs maximizing the memory consumption of a solver. In particular, we measured the memory consumed by a BDD solver when finding out the number of products represented by the model.
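In Experiments #1 and #2 the quantity being maximized (CSP analysis time, BDD memory consumption) acts as ETHOM's fitness function. A minimal Python sketch of both measurements follows, assuming a hypothetical `analyze(model)` callable that runs the analysis operation. It is only illustrative: `tracemalloc` tracks allocations made through the Python interpreter, so measuring the Java-based solvers used in this chapter would require a JVM-side probe instead.

```python
import time
import tracemalloc
from typing import Any, Callable

Analysis = Callable[[Any], Any]  # hypothetical: runs one analysis operation on a model


def time_fitness(analyze: Analysis, model: Any, repeats: int = 1) -> float:
    """Hardness as the mean wall-clock time (seconds) spent analysing `model`."""
    elapsed = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        analyze(model)
        elapsed += time.perf_counter() - start
    return elapsed / repeats


def memory_fitness(analyze: Analysis, model: Any) -> int:
    """Hardness as the peak memory (bytes) allocated by Python code during the analysis."""
    tracemalloc.start()
    try:
        analyze(model)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return peak
```

The `repeats` parameter is our addition: averaging several runs reduces the noise inherent to timing measurements.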
We chose this analysis operation because it is one of the hardest operations in terms of complexity and it is currently the second most quoted operation in the literature [26]. We decided to use a BDD-based reasoner for this experiment since it has proved to be the most efficient option to perform this operation [26]. The SEDL document describing this experiment is presented in Figure §9.10.

EXPERIMENT: ETHOM-E1a version 1.0 rep: http://moses.us.es/E3
Objects: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  Solver: 'CSP-JaCoP' // Solver used to evaluate the analysis operation
  Termination_criterion: 'MaxMObjFuncEvaluations(5000)'
  RandomNumberGenerator: {desc: 'Standard Java RND', class: 'java.util.Random'}
Variables:
  Factors:
    FMGenerator enum ETHOM(command: 'ETHOM', selection: 'Roulette-wheel',
                           crossover: 'One-point', infeasibilityTreatment: 'Repairing',
                           crossoverProb: 0.9, mutationProb: 0.0075, populationSize: 200),
                     RandomGen(command: 'RandomFMGenerator')
  Outcomes:
    ObjectiveFunction in Z // Best value of the obj. func. found
  NCFactors:
    NFeatures enum 200, 400, 600, 800, 1000
    CTC enum 10, 20, 30, 40
Hypothesis: Differential
Design:
  Sampling: Random
  DetailedDesign: Custom
    Assignment: Random
    Blocking: NFeatures, CTC
    Groups: FMGenerator sizing 25
    Protocol: Random
Analyses: // Use T-Test or Wilcoxon (a.k.a. Mann-Whitney)
  A1: TTest(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
      Tukey(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
  A2: Friedman(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
      Holms(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
Configurations:
  C1:
    Outputs: File 'Results-ETHOM-1a.csv'
    Experimental Setting:
      Runtimes: Java 1.6
      Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
    Experimental procedure:
      Command as Treatment(FMGenerator, NFeatures, CTC):
        'java -jar ${FMGenerator} Results-ETHOM-1a.csv ${NFeatures} ${CTC} \
         ${Termination_criterion} ${Solver} ${FMGenerator.selection} \
         ${FMGenerator.crossover} ${FMGenerator.infeasibilityTreatment} \
         ${FMGenerator.crossoverProb} ${FMGenerator.mutationProb} \
         ${FMGenerator.populationSize}'
        // If a property is not defined its value is ''
Figure 9.9: ETHOM - Experiment #1 in SEDL

EXPERIMENT: ETHOM-E2 version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  Solver: 'SPLOT-BDD' // Solver used to evaluate the analysis operation
  Termination_criterion: 'MaxMObjFuncEvaluations(5000)'
  RandomNumberGenerator: {desc: 'Standard Java RND', class: 'java.util.Random'}
Variables:
  Factors:
    FMGenerator enum ETHOM(command: 'ETHOM', selection: 'Roulette-wheel',
                           crossover: 'One-point', infeasibilityTreatment: 'Repairing',
                           crossoverProb: 0.9, mutationProb: 0.0075, populationSize: 200),
                     RandomGen(command: 'RandomFMGenerator')
  NCFactors:
    NFeatures enum 200, 400, 600, 800, 1000
    CTC enum 10, 20, 30, 40
  Outcomes:
    ObjectiveFunction Integer // Best value of the obj. func. found
Hypothesis: Differential
Design:
  Sampling: Random
  Assignment: Random
  Blocking: NFeatures, CTC
  Groups: by FMGenerator sizing 25
  Protocol: Random
Analyses Spec: // Use T-Test or Wilcoxon (a.k.a. Mann-Whitney)
  A1: TTest(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
  A2: Wilcoxon(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
Configurations:
  C1:
    Outputs: File 'Results-ETHOM-2.csv' role: MainEvidence format: CSV mapping: VarsPerColumn
    Setting:
      Runtimes: Java 1.6
      Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
    Procedure:
      Command as Treatment(FMGenerator, NFeatures, CTC):
        'java -jar ${FMGenerator} Results-ETHOM-2.csv ${NFeatures} ${CTC} \
         ${Termination_criterion} ${Solver} ${FMGenerator.selection} \
         ${FMGenerator.crossover} ${FMGenerator.infeasibilityTreatment} \
         ${FMGenerator.crossoverProb} ${FMGenerator.mutationProb} \
         ${FMGenerator.populationSize}'
Figure 9.10: ETHOM - Experiment #2 in SEDL

Experiment #3: Evaluating the impact of the number of generations

While working with ETHOM, we noticed that the maximum number of generations used as the stop criterion had a great impact on the results of the algorithm. We evaluated that impact with a double aim. First, we tried to find out the minimum number of generations required by ETHOM to offer better results than random techniques in the search for hard FMs. Second, we wanted to find out whether ETHOM was able to find even harder models than in our previous experiments when allowed to run for a large number of generations. In particular, we performed two experiments with two different solvers for the evaluation of the fitness function, CSP and BDD. The descriptions of the two experiments differed only in the solver used by the fitness function and were otherwise identical. Figure §9.11 shows the SEDL description of the experiment with the CSP solver (JaCoP).

EXPERIMENT: ETHOM-E3a version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  ETHOM: {selection: 'Roulette-wheel', crossover: 'One-point', infeasibilityTreatment: 'Repairing',
          crossoverProb: 0.9, mutationProb: 0.0075, populationSize: 200}
  NFeatures: 500
  CTC: 20
  Solver: 'CSP-JaCoP' // Solver used to evaluate the analysis operation
  RandomNumberGenerator: {desc: 'Standard Java RND', class: 'java.util.Random'}
Variables:
  Factors:
    NGenerations enum 10, 25, 50, 75, 100, 125
  Outcomes:
    Effectiveness Integer // % of times that ETHOM outperforms Random Search
Hypothesis: Differential
Design:
  Sampling: Random
  Assignment: Random
  Groups: by NGenerations sizing 25
  Protocol: Random
Analyses Spec: // Use ANOVA or Kruskal-Wallis (with their corresponding PostHoc proc.)
  A1: ANOVA(Filter(NGenerations), 0.05)
      Tukey(Filter(NGenerations), 0.05)
  A2: KruskalWallis(Filter(NGenerations), 0.05)
Configurations:
  C1:
    Outputs: File 'Results-ETHOM-3a.csv'
    Setting:
      Runtimes: Java 1.6
      Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
    Procedure:
      Command as Treatment(NGenerations):
        'java -jar Effectiveness Results-ETHOM-3a.csv ${NFeatures} \
         ${CTC} ${NGenerations * ETHOM.populationSize} ${Solver} \
         ${ETHOM.selection} ${ETHOM.crossover} \
         ${ETHOM.infeasibilityTreatment} ${ETHOM.crossoverProb} \
         ${ETHOM.mutationProb} ${ETHOM.populationSize}'
Figure 9.11: ETHOM - Experiment #3 in SEDL

Experiment #4: Evaluating the impact of the analysis heuristics

In this experiment we checked whether the hard FMs generated by our evolutionary approach were also hard for solvers using other heuristics.
In particular, we repeated the analysis of the hardest FMs found in experiment #1 using the other seven heuristics available in the CSP solver JaCoP. Figure §9.12 shows the description of this experiment in SEDL.

EXPERIMENT: ETHOM-E4 version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  Solver: 'CSP-JaCoP' // Solver used to evaluate the analysis operation
  ETHOM: {selection: 'Roulette-wheel', crossover: 'One-point', infeasibilityTreatment: 'Repairing',
          crossoverProb: 0.9, mutationProb: 0.0075, populationSize: 200}
  RandomNumberGenerator: {desc: 'Standard Java RND', class: 'java.util.Random'}
Variables:
  Factors:
    JaCoPHeuristic enum 'MaxRegret', 'LargestMin', 'SmallestMax', 'MostConstrainedDynamic',
                        'MinDomainOverDegree', 'LargestDomain', 'SmallestDomain', 'SmallestMin'
  NCFactors:
    NFeatures enum 200, 400, 600, 800, 1000
    CTC enum 10, 20, 30, 40
  Outcomes:
    ObjectiveFunction Integer // Best value of the obj. func. found
Hypothesis: Differential
Design:
  Sampling: Random
  Assignment: Random
  Blocking: NFeatures, CTC
  Groups: by JaCoPHeuristic sizing 25
  Protocol: Random
Analyses:
  A1: ANOVA(Filter(JaCoPHeuristic).Grouping({NFeatures, CTC}), 0.05)
  A2: Friedman(Filter(JaCoPHeuristic).Grouping({NFeatures, CTC}), 0.05)
Configuration C1:
  Outputs: File 'Results-ETHOM-4.csv' role: MainEvidence format: CSV mapping: VarsPerColumn
  Experimental Setting:
    Runtimes: Java 1.6
    Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
  Experimental procedure:
    Command as Treatment(JaCoPHeuristic, NFeatures, CTC):
      'java -jar ETHOM Results-ETHOM-4.csv ${NFeatures} ${CTC} \
       ${ETHOM.Executions} ${Solver} ${ETHOM.selection} \
       ${ETHOM.crossover} ${ETHOM.infeasibilityTreatment} \
       ${ETHOM.crossoverProb} ${ETHOM.mutationProb} \
       ${ETHOM.populationSize} ${JaCoPHeuristic}'
Figure 9.12: ETHOM - Experiment #4 in SEDL

9.4 Summary

In this chapter, we illustrated how MOSES may contribute to reducing the cost of using metaheuristics in the context of two search-based problems in software engineering. MOEDL and SEDL provided a succinct and self-contained description of the experiments and their results. This in turn made possible the automated detection of possible threats in the experiments; in fact, we automatically detected a bug in the implementation of one of the algorithms. Also, the experimental descriptions and their corresponding lab-packs enabled the one-click automated execution of experiments. Finally, STATService made the statistical analysis of the data completely automated. These results support the validity of our approach as an effective way to reduce the cost of using metaheuristics.

PART V
FINAL REMARKS

10 CONCLUSIONS

If at first, the idea is not absurd, then there is no hope for it.
Albert Einstein, 1879 – 1955, German physicist

It's a dangerous business, Frodo, going out your door. You step onto the road, and if you don't keep your feet, there's no knowing where you might be swept off to.
J. R. R. Tolkien (from The Lord of the Rings), 1892 – 1973, English writer

10.1 Conclusions

The main conclusion that can be drawn from this dissertation is that:

The current support to develop MPS applications can be improved with MOSES.

We are convinced that our dissertation is only a first, baby step, but in the right direction. We contributed in three key points: the language (SEDL/MOEDL), the Software Development Kit (FOM and the analysis operations catalogue) and the development and execution environment (MOSES). These three aspects determine the capabilities of many software development tools and, to the best of our knowledge, in the case of MPS applications we are pioneers in providing both a novel experiment description language and a novel development and execution environment.

It has been a long road, but we are convinced that our research strategy was the right one. Throughout this dissertation we took some decisions that led us to achieve our goal successfully. Firstly, we decided to bet on FOM as the reference MOF. Many times we were about to abandon this decision, but after surveying the state of the art we concluded that, although on average it was not the MOF with the best score, it provided the broadest spectrum of metaheuristic techniques. Therefore, it was a good enough starting point to build a comprehensive solution for supporting the MPS life-cycle. Secondly, we decided to validate our work by building experimentation-intensive MPS applications, which required a significant extra effort. We also forced ourselves to develop publicly available tools to show our progress. In this sense, some of the needs of end-users helped us to identify new and interesting features.

In the following we present some more specific conclusions. Regarding the implementation of MPS-based applications, our comparison framework has proved useful not only as a reference guide for practitioners, but also as a helpful tool in deciding on the directions for the further development of other MOFs such as EvA2 and HeuristicLab (see Appendix §A). Furthermore, FOM has been used by a local company to build MPS-based solutions in the urban traffic and operational management arena.

Regarding the description of MOEs, SEDL was conceived with a twofold purpose: as an end-user language and as an intermediate fully-fledged specification language for experiments, i.e.
to serve as the target domain into which other domain-dependent experiment description languages would be translated in order to benefit from its automated analysis support. Our experience defining and using MOEDL in our validation scenarios allows us to claim that our initial conception has been successful and that it may also be useful to other colleagues. Regarding the automated analysis, the 15 analysis operations identified are very easy to implement, which means that our solution can be easily shared and reproduced. Furthermore, the applicability of these analysis operations goes beyond SEDL: they could potentially be applied to any experiment description language with a formal semantics. With regard to STATService, it was conceived assuming that it would be used by inexperienced users with no background on statistical tests. Considering that it has already been used by 9 labs in 5 countries, we can conclude that our assumption was in the right direction.

Regarding the automated conduction and replication of MOEs, MOSES makes the comprehensive development of MPS applications easier, and it has been validated in experimentation-intensive scenarios. MOSES has been incrementally developed with a set of facilities to implement metaheuristic algorithms, to conduct and replicate SEDL and MOEDL experiments, etc. However, we must validate it with more MPS applications, and some facilities must be included in the tool suite that integrates MOSES in order to enhance its usefulness. In the following section we discuss some of these potential enhancements.

Additionally, during the validation of our approach we obtained results for specific optimization problems in the Search Based Software Engineering area. We provided an algorithm based on the hybridization of GRASP and Path Relinking for solving the QoS-aware Web Service Composition problem at runtime. Furthermore, we collaborated with Dr.
Segura in designing a novel evolutionary algorithm for finding computationally hard feature models.

As a final conclusion, we conjecture that the lack of a commonly accepted MOF, as well as the lack of comprehensive support for the life-cycle of MPS applications, is delaying the application of metaheuristics to optimization problems by researchers and practitioners in Software Engineering. In this regard, we are confident that our work will provide a foundation on which MPS applications can be built. Furthermore, apart from the inherent value of our technique, having developed MOSES as open-source, easy-to-use tooling that can be quickly integrated and reused is a determining factor to achieve a useful result, and it lays the basis for spreading the use of metaheuristics. As an example, replicating QoS-Gasp and ETHOM in MOSES takes only the time needed to push the Start button.

10.2 Support for Results

Some of the results shown in this thesis have already been published in scientific forums. Figure §10.1 summarises these publications, grouping them along two dimensions: type and topic. Five types are defined: books, journals, conferences, tool demos and workshops. Furthermore, some types of publications have an associated quality level: JCR for journals, and the CORE and MAS (Microsoft Academic Search) rankings for conferences.

Figure 10.1: Publications related to the contributions of this dissertation

10.3 Discussion, Limitations and Extensions

In this section we discuss some of the decisions we have made in this dissertation, highlighting their main limitations and possible extensions. As in the conclusions, we organize the content of this section according to the main contributions of our work.
Regarding the description of MOEs, current experimental descriptions in SEDL only support the definition of simple hypotheses that relate a single dependent variable to a whole set of independent variables (either in terms of causality or of covariance). Consequently, complex research studies comprising multiple related hypotheses and nested experimental designs can neither be expressed in SEDL nor be analysed through the provided analysis operations. Similarly, MOEDL has not been tested on multi-objective optimization problems, and the transformations defined in this dissertation do not support them.

Extension: Extend the meta-models of SEDL and MOEDL and the catalogue of analysis operations to deal with complex experiments and multi-objective optimization problems.

Definitions of the transformation MOEDL2SEDL. Although there exist multiple methodologies for performing each type of MOE supported by MOEDL (cf. for instance [21] for technique parametrization experiments), only a single transformation with a specific experimental design has been provided.

Extension: Define a new participant and several services in MOSES to support this new variability dimension. The participant would be responsible for transforming experimental descriptions according to different experimental methodologies, and for validating experimental descriptions against those methodologies.

Flexibility of replicability and internal validity checks. Although the checks defined for replicability and internal validity are appropriate for MOEs, they would lead to wrong conclusions and false positives when applied to experiments in other areas. For instance, the check that the number of actual measurements equals the number expected given the experimental design leads to a positive detection if there is even the smallest percentage of mortality or attrition.
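The strict check described above can be sketched as follows for a full factorial design such as the tailoring experiment of ETHOM (three two-level factors with sizing 10, i.e. 80 expected measurements). The function names and the `tolerated_loss` parameter are hypothetical: the parameter only illustrates the kind of per-area customization proposed below, and with its default value of 0 the check flags any missing or extra measurement.

```python
from math import prod
from typing import Dict, Sequence


def expected_measurements(factor_levels: Dict[str, Sequence], sizing: int) -> int:
    """Full factorial design: one group per combination of levels, `sizing` runs each."""
    return prod(len(levels) for levels in factor_levels.values()) * sizing


def measurement_count_threat(actual: int, factor_levels: Dict[str, Sequence],
                             sizing: int, tolerated_loss: float = 0.0) -> bool:
    """Return True if a threat should be reported.

    tolerated_loss = 0.0 gives the strict check (actual must equal expected);
    a positive value is a hypothetical per-area allowance for mortality/attrition.
    """
    expected = expected_measurements(factor_levels, sizing)
    return actual > expected or actual < expected * (1.0 - tolerated_loss)
```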
However, certain levels of mortality or attrition are acceptable in other areas such as biology, medicine or social sciences.

Extension: Extend the definition of the replicability and internal validity checks in order to support their customization to different scientific areas.

Scope of MOSES [RI]. The contributions presented in this dissertation have been implemented in MOSES [RI]. Nevertheless, this tool suite presents some limitations to be overcome:
• Lack of a reference implementation for the ExperimentalRepository participant.
• Lack of a reference implementation for the ExperimentalDesigner participant.
• Lack of a reference implementation for the ExperimentalReportGenerator participant.

Extensions: Current components could also be extended in the following ways:
• STATService:
  – Generate histograms and box-plots.
  – Generate p-value interpretation diagrams.
  – Compute the power of the tests and the required sample size to reach a minimum power¹.
• FOM: The improvements to be performed in MOSES are described through the evaluation performed in Chapter §5.
• MOSES: We plan to integrate all the tools and implement MOSES Studio. Furthermore, we plan to study the possibilities of other software ecosystems, such as Eclipse, to support the features of MOSES on the desktop.

¹ Remember that, in NHST, the power of a test is the probability of rejecting the null hypothesis when significant differences actually exist, i.e., one minus the probability of a false negative (accepting the null hypothesis when there are significant differences).

PART VI
APPENDICES

A MOFS ASSESSMENT DATA

In this appendix, we provide detailed information about the scores obtained by each framework in each characteristic. Interested readers can obtain more detailed information about the assessment of characteristics and features (including comments on problems found during the assessment, penalizations on some features and their underlying reasons, and the information sources used) at http://www.isa.us.es/MOFComparison.
Moreover, this spreadsheet can be downloaded and exported to various formats, and it is provided in such a way that users can customize the weights of each characteristic, feature and area, allowing the creation of tailored benchmarks better adapted to their specific needs.

A.1 Evaluation per Area

In order to assess the selected MOFs in each area of our comparative framework, we examined their source code, user and technical documentation, and user interface. The results of this assessment are the tables shown in this section. The last column of each table shows the number of MOFs supporting each feature. The last two rows show the number of features supported by each MOF, and a score computed as the weighted sum of the features supported divided by the number of characteristics in the area.

Table §A.1 shows the feature coverage of area C1, along with the weight corresponding to each feature in its associated characteristic. Table §A.2 shows the feature coverage of area C2, along with the weight corresponding to each feature in its associated characteristic. Table §A.3 shows the feature coverage of area C3, along with the weight corresponding to each feature in its associated characteristic. The feature coverage of area C4 is shown in Table §A.4, along with the weight corresponding to each feature in its associated characteristic. As an exception, the features of the GUI characteristic have been assessed using a real value between 0.0 and 1.0.

Table A.1: Coverage of features in area C1 (Supported Metaheuristics)
[The support marks of each framework (ECJ, EvA2, FOM, JCLEC, OAT, Opt4j, ParadisEO, EasyLocal, HeuristicLab and MALLBA) and the per-framework totals could not be recovered from the extracted text; the complete data is available at http://www.isa.us.es/MOFComparison. The features assessed and their weights within each characteristic are:]
SD/HC: Basic implementation (0.5), Multi-start (0.5)
SA: Basic implementation (0.5), Linear annealing (0.1), Exponential/geometric annealing (0.1), Logarithmic annealing (0.1), Metropolis acceptance (0.1), Logistic acceptance (0.1)
TS: Basic implementation (0.3), Recent features/moves based tabu memory (0.2), Frequency based tabu memory (0.3), Basic aspiration criteria (0.2)
GRASP: single feature (1)
VNS: Basic VNS (0.2), Variable Neighborhood Descent, VND (0.2), Reduced VNS, RVNS (0.2), VNS with decomposition (0.2), Skewed VNS, SVNS (0.2)
EA: Basic EA implementation of GA (0.2), of ES (0.2) and of GP (0.2), GAVaPS (0.05), Diploid individuals support (0.05), Coevolution support (0.1), Differential evolution (0.1), Niching methods (0.1)
PSO: Basic implementation (0.3), Discrete variable support (0.2), Customizable dynamic equations (0.2), Topologies (0.2), Lifetime support (0.1)
AIS: CLONALG (0.25), optIA (0.25), Immune networks (0.25), Dendritic cell algorithms (0.25)
ACS: AS (0.1), ACS (0.2), MMAS (0.4), ASrank (0.2), API (0.1)
Scatter Search: Basic implementation (1)
Multi-Objective Metaheuristics: PGA, MOGA, NSGA, NSGA-II, NPGA, SPEA, SPEA-II, PAES, PESA, PESA-II, MOMGA, ARMOGA, Multiobjective AS/ACO, Multiobjective PSO, POSA, MOSA (0.0625 each)
Total number of supported features, summed over all frameworks: 130.5
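The score in the last row of each table can be sketched as follows. Because the feature weights of each characteristic add up to 1 (as the weights listed for area C1 do), each characteristic contributes at most 1 and the area score lies between 0 and 1. The sketch, whose type and function names are hypothetical, takes support values in [0, 1] so that real-valued assessments such as those of the GUI characteristic fit the same formula.

```python
from typing import Dict

# characteristic -> feature -> weight; the weights of one characteristic sum to 1
AreaWeights = Dict[str, Dict[str, float]]
# characteristic -> feature -> support value in [0, 1] for one framework
FrameworkSupport = Dict[str, Dict[str, float]]


def area_score(weights: AreaWeights, support: FrameworkSupport) -> float:
    """Weighted sum of supported features divided by the number of characteristics."""
    total = 0.0
    for characteristic, features in weights.items():
        marks = support.get(characteristic, {})
        total += sum(w * marks.get(feature, 0.0) for feature, w in features.items())
    return total / len(weights)


def feature_support_count(support: FrameworkSupport) -> float:
    """Number of supported features (partial support counts fractionally)."""
    return sum(v for marks in support.values() for v in marks.values())
```

For example, a framework supporting both SD/HC features but only the basic implementation and the aspiration criteria of TS would score (1.0 + 0.5) / 2 = 0.75 over those two characteristics.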
Table A.2 (area C2, Problem Adaption/Encoding), first part
[The support marks of each framework and the per-feature weights could not be reliably recovered from the extracted text; the complete data is available at http://www.isa.us.es/MOFComparison. The characteristics labelled in this part include Solution encoding, N. D., E/A Auxiliary Methods and Solution Selection, and the rows cover:]
Solution encodings: bit vector, bit matrix, bit map, bit Gray-code vector, integer vector, integer matrix, integer map, real vector, real matrix, real map, string vector, string matrix, string map, permutation, expression tree, state machine/graph, combined/arbitrary representations
Neighborhood structures: for solution encodings, for composite solution encodings, and complex neighborhood structures
Crossover operators: integer/bit vector (one-point, n-points, uniform, half uniform, punctuated, shuffled, random respectful recombination); real vector (one-point, n-points, uniform, arithmetic, heuristic, simplex, fitness scanning based, diagonal multi-parental); geometric and blended (BLX-alpha) crossover; permutation (Davis order, partially mapped, order 2, position, Davis uniform, maximal preservative, cycle); tree (Cramer, Koza, Montana); state machine (Fogel, Zou & Grefenstette, uniform, join operator); composite/combined solution encoding crossover; complex crossover operator
Mutation operators: binary/integer vector basic mutation; real vector (uniform, normal, Cauchy, Laplace, Schwefel dynamic, Fogel dynamic); permutation (2-opt, 3-opt, k-opt, swap, insertion, scramble); tree (grow, shrink, swapping, cycle); state machine basic mutation; composite/combined solution encoding mutation; complex mutation operator; dynamic probability mutation
Solution selection: elitist, expected value, elitist expected value, proportional, deterministic sampling, remaining stochastic sampling, stochastic tournament, stochastic universal sampling, linear ranking, (mu, lambda), (mu + lambda), threshold, Boltzmann, random and combined selectors
DSL for objective function definition (DSLof), weight 0.33
The table continues with the GUI & Graph. characteristic.
0.33 tools for Objective Function Definition (GUIof) Interactive Ob0.33 jective Function Definition (Iof) Explicit Constraint 0.1 4 Modeling Penalization on 0.3 4 4 4 Objective Function Individual Con0.3 4 straint Solution Repairing Mechanism Global Solution Re0.3 4 4 pair Mechanism Feature Support Count 33 36 43 12 36 Weighted Sum 0.254 0.413 0.306 0.090 0.258 Table A.2: Coverage of features in area C2 OAT JCLEC FOM EvA2 ParadisEO ECJ O.F.S. Const. H. Area Characteristic A.1. EVALUATION PER AREA 2 4 3 4 4 7 4 4 7 0 3 4 4 4 8 3 ∼ 0.5 ∼ 0.5 0 1 3 1 2 10 0.078 29 0.210 2 0.150 48 0.484 23 0.102 272 243 Experiment Design 4 4 0.1 4 4 0.1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 0.1 4 4 4 0.1 4 4 4 ∼ 4 4 0.1 0.2 4 0.2 0.2 4 4 10 4 8 4 6 4 7 4 ∼ 4.5 3 4 4 4 ∼ 4 4 4 4 4 4 2.5 8 2.5 4 4 4 3 0.2 0 0.2 0 0.2 ∼ 4 4 2.5 0.2 ∼ 4 4 2.5 Continued on next page 244 4 SUM 4 4 MALLBA 0.1 4 HeuristicLab 4 EasyLocal FOM 4 Opt4j EvA2 4 OAT Weight 0.1 4 JCLEC Batch processing Feature Max iterations terminator Fitness value terminator Max execution time terminator Max objective function evaluations terminator Max iter./exec. time/obj. func. ev. without improvement terminator Composite Logical combinations terminator Technique specific Automated repetition of a single optimization task Automated repetition of a task varying parameters of the technique Automated repetition of different tasks (possibly varying the parameter of each one) Automated repetition of different tasks (possibly varying the parameter of each one) over different instances of the problem Random execution of planified tasks Hypothesis Definition Support Experiment Modelling Support (Definition of Dependent and Independent Variables) ParadisEO C4 Optimization Process Support Finalization Conditions ECJ Area Characteristic APPENDIX A. 
MOFS ASSESSMENT DATA Statistical Analysis One way ANOVA Two way ANOVA Multi-way ANOVA Wilcoxon Mann-Withney KolmogorovSmirnov Statistical Analysis Systems export/import GUI Design & Usability Metaheuristic techniques configuration & spec. Problem definition. modelling and data import Support for planning of optimization tasks Graphical Support for methodological guidance Charting Interoperab. Data Export Data Import Web Services Facade XML usage for projects & config Feature Support Count Weighted Sum SUM MALLBA HeuristicLab EasyLocal Opt4j OAT JCLEC FOM EvA2 Weight 0.2 ParadisEO Feature Experiment Design Suppoort (Factorial. fractional. latin squares. nested. etc.) Experiment Execution Support Experiment Execution Plan Generation Support Experimental Design Systems import/export T-Student ECJ Area Characteristic A.1. EVALUATION PER AREA ∼ 0.5 0.2 4 4 4 3 0.2 ∼ ∼ ∼ ∼ 0.1 4 4 4 4 0.1 0.1 0.1 4 4 4 4 4 4 4 4 0.1 0.1 0.1 4 4 4 4 4 4 4 ∼ 2.5 0.2 0.4 0.5 0.8 0.5 0.1b 6 0.1b 6 3 3 2 3 4 1 4 ∼ 0.3 0.1b 6 0.1b 6 4 0.5 0.2 1 0.5 0.5 1 0.1b 6 0.4 0.1b 6 0.25 0.25 0.25 1 4 4 0.25 ∼ ∼ ∼ 1 ∼ ∼ 1 1 1 0.5 1 0.8 1 1 4.9 4.3 0.5 1 2 1 1 3 2.3 0.5 0.4 1 1 ∼ ∼ 0.5 1 4 4 4 12.4 8 12.7 17.2 18.1 0.368 0.192 0.347 0.411 0.470 Table A.4: Coverage of features in area C4 0.5 7 3.5 3.5 0 2.5 4 23.2 0.582 7.7 0.208 8.2 0.165 14.5 0.458 5.5 0.131 127.5 245 APPENDIX A. MOFS ASSESSMENT DATA A.2 G LOBAL EVALUATION Table §A.2 show the global assessment computed per characteristic given the evaluation of each feature provided in th previous section. The value associated to each characteristic is computed by the aggregated sum of their features multiplied by its corresponding weight. The value associated to each feature is 1.0 if total support is provided (4), 0.5 if the MOF provides partial support (∼), and 0.0 if the feature is not supported. Table §A.2 shows the evaluation of area C5. Table §A.2 summarizes the total scores obtained by each framework on the different areas of the comparative. 
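The scoring rule above can be illustrated with a short Python sketch. It is an assumed reading of the text: the 1.0/0.5/0.0 values for full, partial and no support and the weighted sum come from this section, the support profile below is hypothetical, and treating the per-framework average of Table A.7 as a plain mean of the six area scores is inferred from the published values rather than stated explicitly.

```python
# Sketch of the MOF scoring scheme (assumed reading of Section A.2).
from statistics import mean

# Value of each feature according to the level of support provided by the MOF.
SUPPORT_VALUE = {"full": 1.0, "partial": 0.5, "none": 0.0}


def characteristic_score(features):
    """Weighted sum of feature values; features is an iterable of (weight, support)."""
    return sum(weight * SUPPORT_VALUE[support] for weight, support in features)


def framework_average(area_scores):
    """Unweighted mean of the per-area scores of one framework."""
    return mean(area_scores)


# Hypothetical support profile over the four C3 hybridization features,
# using the weights 0.1, 0.2, 0.6 and 0.1 listed for them in Table A.3.
hybridization = [(0.1, "full"), (0.2, "partial"), (0.6, "none"), (0.1, "full")]
print(round(characteristic_score(hybridization), 3))  # 0.3

# Scores of ECJ in areas 1 to 6, as reported in Table A.7.
ecj = [0.207, 0.254, 0.400, 0.368, 0.905, 0.789]
print(round(framework_average(ecj), 3))  # 0.487, matching Table A.7
```

The same mean reproduces the "Average per Framework" value of every column of Table A.7, which is why an unweighted mean was chosen here.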
Table A.3: Coverage of features in area C3

Hybridization: (BEMIh) Batch Execution Multiple Instances Hybridization, weight 0.1; (BEMMh) Batch Execution Multiple Metaheuristics Hybridization, 0.2; (IMMh) Interleaved Multiple Metaheuristics Hybridization, 0.6; (Ch) Combined Hybridization, 0.1.
Hyper-heuristics: Pre-implemented Parameter Setting meta-problem, 0.25; Pre-implemented Technique Selection meta-problem, 0.25; Pre-implemented Operators/Low-level Heuristics Selection meta-problem, 0.25; Pre-implemented Solution Encoding Selection meta-problem, 0.25.
Parallel & distributed metaheuristics: (IPDM) Independent Parallel & Distributed Metaheuristics Execution, 0.2; (SSPDM) Shared Solutions (or Populations) Parallel & Distributed Metaheuristics, 0.2; (LSPDNM) Local Search using Parallel & Distributed Neighborhood Exploration, 0.2; (PDPEDM) Parallel & Distributed Population Evaluation Metaheuristics, 0.2; (PDESSM) Parallel & Distributed Evaluation of Single Solution Metaheuristics, 0.2.

Framework      Feature support count   Weighted sum
ECJ            5.5                     0.400
ParadisEO      6                       0.500
EvA2           3                       0.200
FOM            4                       0.333
JCLEC          1.5                     0.133
OAT            1                       0.033
Opt4j          1                       0.033
EasyLocal      2                       0.100
HeuristicLab   3                       0.300
MALLBA         4.5                     0.367
SUM            31.5

Table A.5: Scores for C1 - C4 and C6
Columns: ECJ, ParadisEO, EvA2, FOM, JCLEC, OAT, Opt4j, EasyLocal, HeuristicLab, MALLBA and Avg. Rows (characteristics per area):
C1 - Metaheuristic Techniques: SD (Steepest Descent / Hill Climbing); SA (Simulated Annealing); TS (Tabu Search); GRASP; VNS (Variable Neighborhood Search); EA (Evolutionary Algorithms); PSO (Particle Swarm Optimization); AIS (Artificial Immune Systems); ACO; Scatter Search; Multi-objective Metaheuristics.
C2 - Adaptation to the Problem and Its Structure: Solution encoding; Neighborhood definition; E/A auxiliary methods; Solution selection; Objective function specification; Constraint handling.
C3 - Advanced Characteristics: Hybridization support; Hyper-heuristics support; Parallel & distributed optimization.
C4 - MPS Life-cycle Support: Finalization conditions support; Batch processing; Experiment design support; Statistical analysis features; User interface & graphical reports; Interoperability.
C6 - Documentation & Support: Problems & tutorials; Papers; Documentation; Popularity / users.

Table A.6: Scores for C5 design, implementation & licensing
Columns: ECJ, ParadisEO, EvA2, FOM, JCLEC, OAT, Opt4j, EasyLocal, HeuristicLab and MALLBA. Rows: Licensing; Supported platforms; Software engineering best practices; Packages / modules; Classes / files (for non-OO languages); Numerical handling.

Table A.7: Global scores
Framework      A1     A2     A3     A4     A5     A6     Average per framework
ECJ            0.207  0.254  0.400  0.368  0.905  0.789  0.487
ParadisEO      0.381  0.413  0.500  0.192  0.797  0.348  0.439
EvA2           0.394  0.306  0.200  0.347  0.738  0.238  0.371
FOM            0.450  0.090  0.333  0.411  0.660  0.118  0.344
JCLEC          0.081  0.258  0.133  0.470  0.975  0.234  0.359
OAT            0.259  0.078  0.033  0.582  0.650  0.364  0.328
Opt4j          0.175  0.210  0.033  0.208  0.925  0.213  0.294
EasyLocal      0.264  0.150  0.100  0.165  0.717  0.094  0.248
HeuristicLab   0.324  0.484  0.300  0.458  0.708  0.340  0.436
MALLBA         0.245  0.102  0.367  0.131  0.417  0.177  0.240
Avg            0.282  0.249  0.226  0.356  0.786  0.304  0.367
Areas: A1 Supported Metaheuristics; A2 Problem Adaptation/Encoding; A3 Advanced Metaheuristic Characteristics; A4 Optimization Process Support; A5 Design, Implementation & Licensing; A6 Documentation, Samples & Popularity.

B META-MODELS AND SCHEMAS
This appendix provides formal definitions of the languages described in this dissertation in terms of their UML meta-models. As a consequence, these descriptions are independent of the specific concrete syntaxes and serializations provided for such meta-models, such as XML schemas and documents, plain-text syntaxes like SEDL4People, or graphical notations.
In the UML diagrams, extension points of the meta-models are denoted by shading the classes in red and by the ExtensionPoint stereotype. Specifically, Section §B.1 provides the full specification of the meta-model of SEDL, and Section §B.2 provides the full specification of the meta-model of MOEDL.

B.1 SEDL META-MODEL
The SEDL meta-model is the result of an extensive analysis of a variety of experiments developed by the authors [210, 213, 247], a careful study of the related literature, and a process of successive refinements of the meta-model after applying it to different scenarios. Specifically, we have taken [116] as the main reference for general experiment descriptions and [21, 23] for the specific details of metaheuristic optimization experiments. Additionally, we have evaluated other approaches (cf. Chapter §4) and the proprietary formats and classes used for experiment description by the MOFs assessed in Chapter §5.
In general, we use UML class diagrams to define the structure of the meta-model. The general structure of the SEDL meta-model is depicted in Figure §B.1. Specifically, it defines Experiment as a base abstract class that provides the basic identification attributes of the experiment and acts as an extension point. DSLs can use this extension point to define domain-specific experiments by subclassing. SEDL provides a domain-independent subclass of Experiment named BasicExperiment for describing any kind of scientific experiment. BasicExperiment enables a comprehensive specification of experiments by formalizing the basic concepts described in Chapter §3. In particular, the attributes of a BasicExperiment are defined as follows:
• id: string. Every experiment must be uniquely identified by an identifier, which is usually a number preceded by 'Exp'.
• name: string. This attribute provides a descriptive name for the experiment.
• metaId: string. This attribute contextualizes the identifier of the experiment, drastically reducing the possibility of identification conflicts. It identifies the author who performed the experiment, its supporting organization, and the experimental repository where the experiment is stored. Its purpose is similar to that of the namespace attribute of XML documents. The use of URLs as the value of this attribute is encouraged.
• context: Context. This attribute provides detailed information about the people, organizations and projects related to the experiment.
• hypothesis: Hypothesis. Experiments in SEDL have a unique scientific hypothesis. This scientific hypothesis must be a logical assertion testable using the data gathered during experimental conduction [183, 224], [4, chap. 6]. A detailed description of Hypothesis is provided below.
• design: Design. The design of a SEDL experiment describes the set of variables and constants involved in the experiment, and the experimental protocol, which specifies when and how the variables are measured and modified. Moreover, designs contain a specification of the analysis procedures to be applied to the data gathered during experimental conduction.
• configurations: Configuration[0..*]. Configurations describe the specific experimental settings and details about experimental conduction. For instance, in metaheuristic optimization experiments the configuration should specify the metaheuristic programs executed and the execution platform used (hardware and software). Configurations also provide details about the inputs and expected outputs of the experiment. Additionally, configurations are relevant for the description of the experiment along its life-cycle, since they describe the set of experimental conductions performed (executions: Execution[0..*]). Executions describe a specific conduction of the experiment, in terms of the execution process and its results. Moreover, each execution contains the results of applying the analysis methods specified in the design (analyses: Analysis[0..*]). These results are intended to support testing the hypothesis of the experiment in order to draw conclusions.
• annotations: string[0..*]. Annotations are machine-processable information that can be included in the experiment for use by specific tools.
• notes: string[0..*]. Other information about the experiment that does not fit in the previous fields can be recorded here.

The specific meta-model of the context of experiments in SEDL is shown in Figure §B.2. The structure and types of hypotheses supported by SEDL are depicted in Figure §B.3. A RelationalHypothesis states an assertion about the relationship between the outcome (dependent variable) and a non-empty set of factors or characteristics (independent variables). This relationship can be causal (a change in the levels of the independent variables causes a change in the level of the dependent variable), or it can be ruled by a specific mathematical expression that allows predicting the level of the dependent variable from the levels of the independent ones. This dichotomy leads to two different types of relational hypotheses in SEDL: DifferentialHypothesis and AsociationalHypothesis. We have included extension points for the description of assertions in DescriptiveHypothesis and of relationships between variables in AsociationalHypothesis, so that users can define their own DSLs for specifying such elements.

Figure B.1: Meta-model of experiments in SEDL
Figure B.2: Meta-model of experiments context in SEDL
Figure B.3: Meta-model of experimental hypotheses in SEDL
Figure B.4: Meta-model of experimental variables in SEDL

The meta-model of SEDL variables is depicted in Figure §B.4. The structure of detailed designs is described in Figure §B.6 as a UML class diagram.
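To make the BasicExperiment attributes and the hypothesis hierarchy more concrete, the following Python sketch mirrors them as data classes. It is a hypothetical illustration, not the MOSES implementation: the Python field names (e.g. meta_id), the dict placeholders for Context, Design and Configuration, and the example values are assumptions; only the requirement that a RelationalHypothesis relates the outcome to a non-empty set of independent variables is taken from the text.

```python
# Hypothetical sketch of the SEDL BasicExperiment attributes and hypothesis types.
# Class names follow the meta-model; Python types and field names are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    """Base type for the single scientific hypothesis of an experiment."""


@dataclass
class DescriptiveHypothesis(Hypothesis):
    assertion: str  # extension point: assertion written in a user-defined DSL


@dataclass
class RelationalHypothesis(Hypothesis):
    dependent_variable: str
    independent_variables: List[str]

    def __post_init__(self) -> None:
        # The outcome is related to a non-empty set of factors or characteristics.
        if not self.independent_variables:
            raise ValueError("a relational hypothesis needs at least one independent variable")


@dataclass
class DifferentialHypothesis(RelationalHypothesis):
    """Causal form: changing the independent variables changes the outcome."""


@dataclass
class AsociationalHypothesis(RelationalHypothesis):  # spelled as in the meta-model
    relation: str = ""  # extension point: expression predicting the outcome


@dataclass
class BasicExperiment:
    id: str                              # usually 'Exp' followed by a number
    name: str
    meta_id: str                         # namespace-like context, ideally a URL
    hypothesis: Hypothesis
    context: Optional[dict] = None       # placeholder for the Context element
    design: Optional[dict] = None        # placeholder for the Design element
    configurations: List[dict] = field(default_factory=list)  # Configuration[0..*]
    annotations: List[str] = field(default_factory=list)      # string[0..*]
    notes: List[str] = field(default_factory=list)            # string[0..*]


# Illustrative values only: a differential hypothesis relating the optimization
# technique to the objective value obtained.
exp = BasicExperiment(
    id="Exp1",
    name="Technique comparison",
    meta_id="http://example.org/experiments",
    hypothesis=DifferentialHypothesis("objectiveValue", ["technique"]),
)
print(exp.id, type(exp.hypothesis).__name__)  # Exp1 DifferentialHypothesis
```

Keeping the multi-valued attributes as lists with empty defaults reflects their [0..*] multiplicity, while the single hypothesis is a required field.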
Some classes of this diagram have invariants. Specifically, the invariant of Treatment specifies that the variables referenced in its valuations are ActiveIndependentVariables (otherwise the level of the variable could not be changed). The invariant of VariableValuation ensures that its level is in the domain of the associated variable. The specification of predefined experimental designs is supported through the PredefinedDesign extension point. In order to perform property checking on SEDL documents with predefined designs, such designs must be expanded into their complete specification. In Chapter §8 we define the contract that authors must fulfill in order to support the expansion of their specific predefined designs. In this dissertation we only provide the specific designs needed by the metaheuristic optimization experiments we focus on (technique comparison designs and technique parametrization designs), which are defined in Chapter §7.

Figure B.5: Meta-model of design in SEDL
Figure B.6: Meta-model of experimental designs in SEDL
Figure B.7: Meta-model of experimental configurations in SEDL

The components of a Configuration are depicted in Figure §B.7. The input data required for experimental conduction is described in SEDL through the ExperimentalInputs element, which comprises a set of inputDataSources. Additionally, the InputFile of the experiment can contain a set of VariableValuations named features, in order to describe the levels of the variables associated with a specific input file. The structure of SEDL Executions is described in Figure §B.8. Figure §B.9 depicts the structure of the specification of the analyses to be performed (ExperimentalAnalysisSpecification) and of their results (AnalysisResult). Since SEDL
is aimed at the automation of the experimentation life-cycle, it mainly supports statistical analyses (other types of analyses, such as charts or summary tables, are useful but require human interpretation and evaluation). Specifically, the Statistic package provides specific subclasses of ExperimentalAnalysisSpecification and AnalysisResult. The elements of this package are described in detail below. Additionally, the DatasetSpecification package provides mechanisms for specifying the subset of the results on which the analyses should be performed. Figure §B.10 depicts the elements provided in this package. They are translations to our model of the basic operators of relational algebra (Projection and Filtering), plus a grouping operator for specifying how to compare datasets in the presence of blocking variables. Thus, the results of a dataset specification are the union of the results of its associated projections, applied to the results obtained by applying its filters sequentially (which is equivalent to a single filter whose criterion is the AND of the corresponding filtering criteria).

Figure B.8: Meta-model of experimental executions in SEDL
Figure B.9: Meta-model of experimental analyses specifications and results
Figure B.10: Meta-model of dataset specifications in SEDL

The structure of the StatisticalAnalysisSpecifications and StatisticalAnalysisResults supported by SEDL is depicted in Figure §B.11. Regarding analysis specification, the meta-model defines:
• CentralTendencyMeasure describes the value around which a real DependentVariable clusters, which in our meta-model is stored in the attribute centralValue. Specifically, SEDL supports the following types of measures: the Mean, the Median, the Mode, and the ConfidenceInterval. The Mode is a special kind
of central tendency measure, since it can describe any kind of variable; as a consequence, it can be associated with a specific level. ConfidenceIntervals provide information both about the clustering of values and about their dispersion; as a consequence, they are also a type of variability measure. The limits of a confidence interval are min and max.
• VariabilityMeasure describes the dispersion of the values of a real DependentVariable, expressed as a magnitude that in our meta-model is stored in the attribute variabilityValue. Specifically, SEDL supports the following types of measures: the StandardDeviation, the Range, the InterQuartileRange, and the ConfidenceInterval.
• Ranking defines an order relation on the Levels of an IndependentVariable based on the value of a descriptive statistic. This class has an invariant specifying that the associated DescriptiveStatistic must not itself be a Ranking.

Figure B.11: Meta-model of statistical analyses in SEDL

Regarding StatisticalAnalysis, SEDL supports the description of several types:
• A Null Hypothesis Significance Test (NHST) is a decision-making mechanism about a hypothesis (named the null hypothesis) based on a dataset. These tests determine whether the results lead to the rejection of the null hypothesis at a prespecified significance level. Specifically, NHSTs answer the following question: assuming that the null hypothesis is true, what is the probability of observing a value of the test statistic at least as extreme as the value actually observed? That probability is named the p-value. The usual practice is to reject the null hypothesis when the p-value is lower than the significance level. The p-value is computed from the value of a test statistic and its theoretical distribution under the null hypothesis; this distribution may be parameterized by a number of degrees of freedom.
In SEDL, the specific test to be applied is identified by the attribute name. When NHSTs are used to detect significant differences between two distributions (the null hypothesis being that the distributions are identical), they are called simple comparison NHSTs. Conversely, when NHSTs are used to detect significant differences among three or more distributions (the null hypothesis being that all the distributions are identical), they are called multiple comparison NHSTs. Additionally, a MultipleComparisonNHST can be associated with a set of Post-hocProcedures. These procedures are a special kind of NHST, concerned with finding relationships between pairs of distributions from the associated multiple comparison test. The specific distributions compared are identified by the respective variable valuations, as shown in Figure §B.9.
• CorrelationCoeficient describes the degree of relationship between two or more data sets, and is used to infer the presence or absence of an association. There are several methods to compute correlation. In the SEDL meta-model both CorrelationCoeficient and NHST are generic, i.e., they are not bound to a specific hypothesis testing or coefficient computation method. This enhances the expressiveness of the language.

For each specific type of analysis specification, SEDL defines a corresponding subtype of StatisticalAnalysisResult:
• DescriptiveStatisticValue supports expressing the results of any DescriptiveStatistic, such as Means, Medians, StandardDeviations, etc. It is associated with a level (value) of the dependent variable of the experiment. Since the relationship between the level and the dependent variable is not shown in the diagram of Figure §B.11, this constraint is specified as an invariant of DescriptiveStatisticValue (the level specified as result must belong to the domain of a DependentVariable).
• RankingResult provides an ordered list of the levels for the ranking criterion.
Class RankingResult has an OCL invariant specifying that the values in the ranking must be in the domain of the ranking variable.
• ConfidenceIntervalValue provides a minimum and a maximum value for a confidence interval.
• CorrelationCoeficientValue provides a value and an optional description of the correlation coefficient computed for the results of the experiment.
• PValue provides the p-value of the associated hypothesis test. It contains the value, the degrees of freedom, and an optional description.

B.2 MOEDL META-MODEL
Figure §B.12 depicts the main elements of MOEDL, their structure and their relationships as a UML class diagram.

Figure B.12: Types of Experiments supported by MOEDL and their structure
Figure B.13: Termination criteria supported by MOEDL and their structure

MetaheuristicOptimizationExperiment has an OCL invariant specifying that either a global termination criterion is specified for the experiment, or each optimization technique defines its own termination criterion. It is worth noting that using different termination criteria in this kind of experiment could bias the comparison (some algorithms could use more computational resources than others in their execution), and consequently lead to wrong conclusions about the performance of the algorithms. Another OCL invariant in MetaheuristicOptimizationExperiment states that a random number generation algorithm and seed must be specified either for the experiment as a global setting, or for each optimization technique in particular. Figure §B.12 depicts the meta-model of the description of this kind of experiments supported by MOEDL.
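The two invariants of MetaheuristicOptimizationExperiment can be paraphrased as executable checks. The Python sketch below is hypothetical: it restates the OCL rules in Python rather than OCL, and the class layout and field names (termination, random, RandomSetup) are assumptions; the termination strings follow the MaxIterations/MaxTime syntax of Appendix C.

```python
# Hypothetical Python rendering of the two OCL invariants of
# MetaheuristicOptimizationExperiment; field names are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RandomSetup:
    algorithm: str  # random number generation algorithm
    seed: int


@dataclass
class Technique:
    name: str
    termination: Optional[str] = None     # e.g. "MaxIterations(1000)"
    random: Optional[RandomSetup] = None


@dataclass
class MetaheuristicOptimizationExperiment:
    techniques: List[Technique] = field(default_factory=list)
    termination: Optional[str] = None     # global termination criterion
    random: Optional[RandomSetup] = None  # global RNG algorithm and seed

    def termination_invariant(self) -> bool:
        # Either a global criterion exists, or every technique defines its own.
        return self.termination is not None or all(
            t.termination is not None for t in self.techniques)

    def random_invariant(self) -> bool:
        # Same rule for the random number generator and its seed.
        return self.random is not None or all(
            t.random is not None for t in self.techniques)


exp = MetaheuristicOptimizationExperiment(
    techniques=[Technique("EA"), Technique("SA", termination="MaxTime(60)")],
    random=RandomSetup("MersenneTwister", 42),
)
print(exp.termination_invariant())  # False: EA has no criterion and there is no global one
print(exp.random_invariant())       # True: the global setup covers every technique
```

Setting a single global criterion (for instance exp.termination = "MaxIterations(1000)") makes the first invariant hold and, in line with the remark above, avoids comparing techniques under different computational budgets. Note that an experiment with no techniques satisfies both checks vacuously.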
A TechniqueParametrizationExperiment is associated with a single technique (note the first clause of its invariant), and contains a set of SimpleParameter definitions. These definitions specify the domain of the parameter using a SEDL Domain but not its value, since finding its optimal value is the purpose of the experiment. Currently only simple parameter dimensions are supported, i.e., the possible values of the parameters must be enumerated. Figure §B.13 describes the termination criteria supported in MOEDL. 264 B.3. XML SCHEMAS OF SEDL AND MOEDL B.3 XML S CHEMAS OF SEDL AND MOEDL The XML schemas that provide a concrete syntax for the above described metamodels are available at: • SEDL: http://moses.us.es/schemas/sedl/v1/SEDL.xsd • MOEDL: http://moses.us.es/schemas/moedl/v1/MOEDL.xsd Those schemas has been generated directly from the meta-models using the default Eclipse EMF transformation to XML schema. 265 C A M ETAHEURISTICS D ESCRIPTION S YNTAX IN EBNF 1 Syntactical definitions Metaheuristics Supported meataheuristic ::= [openPar meaheuristicDec closePars] | metaheuristicDef metaheuristicDef ::= EA | SD | VNS | TS | SA | AS | GRASP | PR | RS ; Common parameters MHParams initParam solutionClass initType metaheuristicInit randomInit ::= ::= ::= ::= ::= ::= initScheme [listSep solutionClass [listSep terminationCriterion]] ; 0 Init 0 valueSep initType ; 0 class 0 ; |0 encoding 0 classSpec ; randomInit | metaheuristicInit ; metaheuristic ; ‘Random 0 ; Termination Criteria TerminationCriterion NIterationsCriterion MaxTimeCriterion RepeatCriterion TerminationCriterionList ::= ::= ::= ::= ::= NIterationsCriterion | MaxTimeCriterion | RepeatCriterion ; ‘MaxIterations 0 openParams Integer closeParams ; ‘MaxTime 0 openParams Integer closeParams ; ‘Repeat 0 openParams TerminationCriterionList closeParams ; TerminationCriterion { listSep TerminationCriterion } ∗ ; Selection criteria ::= (ElitistSelector | RandomSelector | RouletteWheelSelector | TournamentSelector | 
ProportionalRakSelector | CustomSelector ) openPars SignSpec listSep RepeatSpec closePars ; ElitistSelector ::= 0 Elitist 0 ; RandomSelector ::= ‘Random 0 ; RouleteWheelSelector ::= ‘Roulette 0 ; TournamentSelector ::= ‘Tournament 0 Integerselector ; ProportionalRanksSelector ::= ‘ProportionalToRanks 0 ; CustomSelector ::= CustomElement ; SignSpec ::= [‘sign 0 ]‘positive 0 | ‘negative 0 ; RepeatSpec ::= [‘repeat 0 ]Boolean ; Selector Steepest Descent / Hill Climbing SD ::= (0 SD 0 |0 HC 0 )[initSep MHParams endSep] ; Simulated Annealing SA SAParams CoolingScheme LinearCS ExponentialCS LogaritmicCS InitialTemperature SolutionsPerIter ::= ::= ::= ::= ::= ::= ::= ::= initSep MHParams SAParams endSep ; listSep CoolingScheme listSep InitialTemperature listSep SolutionsPerIter ; [0 coolingScheme 0 valueSep] (LinearCS | ExponentialCS | LogaritmicCS ) ; 0 Linear 0 openPars Float closePars ; 0 Exponential 0 openPars Float closePars ; 0 Logarimic 0 openPars Float closePars ; 0 [ initialTemperature 0 valueSep] Float ; [0 solutionPerIter 0 valueSep] Integer ; 1 267 Tabu Search TS ::= initSep MHParamslistSep TSParams endSep ; Tabu Search parameters TSParams TSMemory RecencyTSMemory FrequencyTSMemory CustomTSMemory TSAspriationCriterion ImproveAspCrit CustomAspCrit ::= ::= ::= ::= ::= ::= ::= ::= TSMemorylistSep TSAspirationCriterion ; RecencyTSMemory | FrequencyTSMemory | CustomTSMemory ; 0 Recency 0 openPar Integer closePar ; Frecuency openPar Integer closePar ; CustomElement ; ImproveAspCrit | CustomAspCrit ; 0 Improvement 0 ; 0 Custom 0 openPars classSpec closePars ; Variable Neighbourhood Search (VNS) VNS ::= initSep MHParamslistSep VNSParams endSep ; VNS parameters VNSParams SolutionsPerNeigh NeighList NeighbourhoodStruct ::= ::= ::= ::= SolutionsPerNeigh listSep NeighList ; [0 solutionsPerNeigh 0 valueSep] Integer ; NeighbourhoodStruct listSepNeighbourhoodStruct + ; CutomElement ; Evolutionary Algorithm EA ::= ‘EA0 initSepMHParamslistSepEAParamsendSep ; Evolutionary 
Algorithm Parameters EAParams ::= PopulationSize listSep CrossoverOp listSep CrossoverSel listSep CrossoverProb [IncestThreshold listSetp] IncestTreatment [MutationOplistSetp] MutationSel listSep MutationProb SurvivalPolicy ; CrossoverOp ::= 0 1px 0 |0 2px 0 |0 Unifform 0 | CustomOp ; CrossoverSel ::= [0 crossoverSel 0 cvalueSep]selector ; CrossoverProb ::= [0 crossoverProb 0 cvalueSep] Float ; CustomOp ::= CustomElement MutationOp ::= [0 mutationOp 0 cvalueSep] 0 bitflip 0 | CustomOp ; MutationSel ::= [0 mutationSel 0 cvalueSep] Selector ; MutationProb ::= [0 mutationProb 0 cvalueSep] Float ; IncestThreshold ::= [0 mutationThreshold 0 cvalueSep] Float ; IncestTreatment ::= [0 incestTreatment 0 cvalueSep] ‘Repair 0 ReparisonMechanism |0 Remove 0 ; SurvivalPolicy ::= [0 survivalPolicy 0 cvalueSep]SelectorRelpacer ; SelectorReplacer ::= Selector ; GRASP GRASP ::= ‘GRASP 0 initSep MHParams listSep GRASPParams endSep ; 2 268 GRASP parameters GRASPParams RCLSelection GreedyAndAlphaRCL GFunction Alpha CustomRCLSelection ::= ::= ::= ::= ::= ::= RCLSelection, LocalImprovement ; [RCLcreation valueSep] GreedyAndAlphaRCL | CustomRCLSecletion ; 0 Greedy 0 openPars GFunction listSep Alpha closePars ; 0 [ g − function 0 valueSep] CustomElement ; [0 alpha 0 valueSep] Float ; CustomeElement ; PR PR ::= ‘PR 0 initSep MHParams listSep PRParams endSep ; PR parameters PRParams GSS EliteSetSize RelinkingSteps ::= ::= ::= ::= GSS listSep EliteSetSize listSepRelinkingSteps ; [0 guidings olutions elector 0 ] Selector ; [0 eliteSetSize 0 valueSep] Integer ; [0 relinkingSteps 0 valueSep] Integer ; Ant Systems AS ::= ‘AS 0 initSep MHParams listSep ASParams endSep ; Ant System parameters ASParams ::= ::= NAnts ::= NAnts ::= PheromoneTrailUpdater ::= EvaporationScheme ::= NAnts listSepNUpdaters listSep PheromoneTrailUpdater listSep EvaporationScheme ; [0 ants 0 valueSep] Integer ; [0 updaters 0 valueSep]Integer [0 %0 ] ; Selector ; CoolingScheme ; 3 269 2 Lexical definitions Basic Types 
CustomElement ::= ['Custom' openPars [String listSep] classSpec closePars] | classSpec ;
classSpec ::= ['class' valueSep] String ;
Float ::= Integer ('.' Integer) | 'f' ;
Integer ::= ['+' | '-'] Digit {Digit}* ;
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
Boolean ::= 'true' | 'false' ;
String ::= "'" Character {Character}* "'" ;
Character ::= letter | Digit ;
letter ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;

Separators
listSep ::= ',' ;
valueSep ::= ':' ;
initSep ::= '{' ;
endSep ::= '}' ;
openPars ::= '(' ;
closePars ::= ')' ;

D STATISTICAL TESTS SUPPORTED

NHST support in SEDL
Purpose | Tests [reference]
Normality condition | Kolmogorov-Smirnov [255], Lilliefors [175], Shapiro-Wilk [251]
Homoscedasticity condition | Levene [172]
Parametric pairwise comparison | T-student [252]
Non-parametric pairwise comparison | Wilcoxon [296], McNemar [185]
Parametric multiple comparison | ANOVA [252]
Non-parametric multiple comparison | Friedman [106], Aligned Friedman [137], Iman & Davenport [146], Quade [228], Cochran Q [252]
Post-hoc analyses | Bonferroni-Dunn [79], Holm [141], Hochberg [136], Hommel [142], Holland [138], Rom [239], Finner [92], Li [174], Shaffer [249], Nemenyi [198]
Table D.1: Set of tests and post-hoc analyses supported by SEDL

E SEA

According to the experimentation guidelines in the literature [21, 80, 286], reporting experiments with the aim of reproducibility requires not only specifying their hypotheses, design and analyses (tasks for which we have provided support with the languages described above), but also providing all the input and output data of
the experiment, along with all the experimental artefacts used to conduct it, such as survey forms, data-gathering spreadsheets, etc. In the context of computational experiments, those artefacts are usually algorithm implementations or other elements encoded as computer files; consequently, they can be packaged along with the experiment description in an electronic resource that fully describes the experiment. We denote such a resource as the experimental lab-pack.

Using a lab-pack for replicating an experiment requires identifying the role of each of its elements (inputs, outputs, experimental artefacts) and using them properly during the execution of the experimental procedure. We consider that the role of those elements in the experiment should determine their location in the lab-pack, in order to ease its use independently of the experiment description and to promote a structured layout. Thus, we propose a standard layout for the elements of lab-packs depending on their role in the experimental protocol, and a default location and file name for the main experimental description (written in SEDL or in one of its DSLs). Such a standard folder structure is depicted in Figure §E.1. According to this figure, the experimental description file of a SEA lab-pack must be in the root folder, and its name must be “experimentalDescription” (with the corresponding extension, for instance “.sed” for plain SEDL or “.moe” for MOEDL descriptions).

The layout of the lab-pack folders is divided into general and configuration-specific files. The general elements are placed in folders directly under the root of the lab-pack, such as the “/input” folder, which contains all the general input data used during the experiment; for instance, the files that contain the problem instance data of a MOEDL experiment, if they are available in a configuration-independent format (such as plain text or XML files).
The “configurations” folder contains one sub-folder per Configuration specified in the experimental description of the lab-pack, where the name of the folder must be the identifier of the corresponding Configuration. Each configuration folder contains a set of folders with configuration-specific information, as described next:
• “configurations/⟨ConfigID⟩/input” contains the input information that is specific to this configuration.
• “configurations/⟨ConfigID⟩/artefact” contains the set of artefacts used for the execution of the experimental procedure of the configuration, such as the source code of the algorithms, their binary executables, or the documents that contain the survey used in the experiment. Those artefacts are arranged according to their nature in three different folders:
– “src” for the editable documents or the source code of the algorithms (in this way, they can be used to generate the artefacts needed to perform future non-exact replications).
– “bin” for the actual artefacts used in the execution of the experimental protocol. For instance, this folder could contain the forms used for the survey in “.pdf” format or the executable “.jar” archives of Java algorithms.
– “docs” for additional documentation about how to use the artefacts while conducting the experiment. For instance, this folder could contain a transcription of the speech given by instructors prior to surveying users, or the documentation about the usage of the executable files or the source code provided.
• “configurations/⟨ConfigID⟩/executions” contains the information about each individual execution of the experiment with this configuration. The results of each execution are stored in a folder whose name is the execution ID according to the experimental description of the lab-pack, i.e., “configurations/⟨ConfigID⟩/executions/⟨ExecutionID⟩/⟨ResultFileName⟩”.
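As a rough illustration of the layout just described, the following Python sketch derives the folder skeleton of a lab-pack from its configuration and execution identifiers. The helper names (`labpack_layout`, `materialize`) are hypothetical and not part of MOSES; folder names are taken literally from the bullets above, and “.sed” is assumed as the extension of the description file.

```python
from pathlib import Path, PurePosixPath


# Hypothetical helpers (not part of MOSES) that derive the SEA lab-pack
# skeleton from the layout description. Folder names follow the bullets
# literally; ".sed" (plain SEDL) is assumed as the default extension.
def labpack_layout(config_ids, execution_ids, extension=".sed"):
    """Return (directories, files) of a lab-pack as paths relative to its root."""
    dirs = [PurePosixPath("input")]  # general, configuration-independent inputs
    files = [PurePosixPath("experimentalDescription" + extension)]  # at the root
    for config_id in config_ids:
        base = PurePosixPath("configurations", config_id)
        dirs.append(base / "input")  # configuration-specific inputs
        # artefacts arranged by nature: sources, actual artefacts, documentation
        dirs.extend(base / "artefact" / kind for kind in ("src", "bin", "docs"))
        # one folder per execution ID, holding that execution's result files
        dirs.extend(base / "executions" / exec_id
                    for exec_id in execution_ids.get(config_id, []))
    return dirs, files


def materialize(root, config_ids, execution_ids, extension=".sed"):
    """Create the skeleton on disk under root (existing folders are kept)."""
    root = Path(root)
    dirs, files = labpack_layout(config_ids, execution_ids, extension)
    for d in dirs:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in files:
        (root / f).touch()
    return root


dirs, files = labpack_layout(["C1"], {"C1": ["E1", "E2"]})
```

Keeping the layout as pure data (relative paths) also makes it easy to check an existing lab-pack against the expected structure before creating anything on disk.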
We propose using the tar packaging format, as described by IEEE in the standard [145], for distributing whole lab-packs as single files, and the “.sea” extension for such files. Additionally, a “.zea” extension could be used if the lab-pack file is compressed using the zip format as described by ISO/IEC in [149].

Regarding the use of the elements in the lab-pack during the execution of the experimental procedure, we consider that it strongly depends on the experiment domain and on the corresponding DSL used for describing the experimental procedure.

Figure E.1: Layout and structure of SEA lab-packs

For non-automated experimental procedures, the actors involved in the replication can access the elements in the lab-pack and use them according to the description provided in the procedure. For automated experimental procedures, we rely on the extensibility of SEDL for supporting different DSLs for describing experimental procedures, and on the extension mechanisms provided in MOSES for creating modules that perform the execution accordingly. Regarding the languages for describing experimental procedures provided in this dissertation, our current implementations ensure that:
For experimental procedures described using the command-based language, the elements provided in the global and configuration-specific¹ “/input” folders are copied to the command execution environment in a folder named “/input”; consequently, they can be used as parameters for any command. Moreover, the elements provided in the global and configuration-specific “/artefacts/bin” folders are available in the path of the command execution environment, and can consequently be invoked as commands.
Experiments described using MOEDL have an implicit experimental procedure.

¹ Note that any experimental procedure execution is associated with a specific configuration of the experiment.
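A minimal sketch of the proposed packaging, assuming Python's standard `tarfile` and `zipfile` modules are an acceptable stand-in for the tar [145] and zip [149] formats cited above (no claim is made that Python's tar writer matches a particular profile of [145]). The function name `pack_labpack` is hypothetical.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def pack_labpack(labpack_dir, out_base, compressed=False):
    """Package a lab-pack folder as a single file (hypothetical helper).

    Uncompressed -> tar archive with the ".sea" extension.
    Compressed   -> zip archive with the ".zea" extension.
    Paths inside the archive are relative to the lab-pack root, so the
    experimentalDescription file stays at the top level.
    """
    labpack_dir = Path(labpack_dir)
    if not compressed:
        target = Path(f"{out_base}.sea")
        with tarfile.open(target, "w") as tar:  # mode "w": plain tar, no compression
            tar.add(labpack_dir, arcname=".")   # recursive, relative to the root
    else:
        target = Path(f"{out_base}.zea")
        with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(labpack_dir.rglob("*")):
                zf.write(path, path.relative_to(labpack_dir).as_posix())
    return target


if __name__ == "__main__":
    # Small demo: build a toy lab-pack in a temporary folder and list the archive.
    with tempfile.TemporaryDirectory() as tmp:
        lab = Path(tmp, "lab")
        (lab / "input").mkdir(parents=True)
        (lab / "experimentalDescription.sed").write_text("...")
        sea = pack_labpack(lab, Path(tmp, "experiment"))
        with tarfile.open(sea) as tar:
            print(sorted(tar.getnames()))
```

Writing the archive outside the lab-pack folder matters: packing into the folder being archived would make the archive include itself.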
These procedures are executed using an optimization execution engine provided by MOSES (E3) that executes the specified algorithms on the problem instances in a randomized order. MOSES-EE delegates the responsibility of running optimization-specific tasks (algorithm X on problem instance Y) to a MOF-specific module named the metaheuristics experimental tasks execution engine (MOF-E3). E3 ensures that any MOF-EE will have access to a copy of the whole lab-pack in a specific path given as a parameter at invocation. In the specific case of FOM, since it is a Java framework, the optimization task execution engine adds all the jar files provided in the global and configuration-specific “/artefacts/bin” folders to the class-loader, and the global and configuration-specific “/inputs” folders are copied in the same way as in the command-based language. Thus, problem instance data can be loaded directly from the “/inputs” folder, and the optimization algorithms created according to the description provided in the configuration parameters of the experiment (whose implementation could be stored in the jars provided in the “/artefacts/bin” folder of the lab-pack) can optimize them.

F EEE: EXPERIMENTAL EXECUTION ENVIRONMENT

The purpose of E3 is to provide a basic implementation of the ExperimentalExecutor service as defined in MOSES (cf. §8.3.2). Currently it is a command-line tool that loads the experimental descriptions (either SEDL, or MOEDL through a transformation) and executes the experiment. This execution is performed in one of two alternative ways. If the experimental procedure is a Command, E3 executes the corresponding invocations through the shell of the operating system, according to the sequence specified by the experimental protocol (currently only Random protocols are provided). Otherwise, it tries to execute the experiment through a MOF, by running the algorithms specified in FOM.
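The randomized execution order mentioned above could be sketched as follows. This is an illustration under stated assumptions, not E3's code: `schedule_tasks` enumerates every (algorithm, problem instance, repetition) triple and shuffles it, the `repetitions` and `seed` parameters are hypothetical, and `run_task` is a hypothetical callback standing in for the MOF-specific engine.

```python
import itertools
import random


def schedule_tasks(algorithms, instances, repetitions=1, seed=None):
    """List every optimization task (algorithm X on instance Y) in random order.

    A fixed seed makes the randomized order itself reproducible.
    """
    tasks = list(itertools.product(algorithms, instances, range(repetitions)))
    random.Random(seed).shuffle(tasks)
    return tasks


def run_experiment(tasks, run_task):
    """Delegate each task to a MOF-specific runner (hypothetical callback)."""
    return [(alg, inst, rep, run_task(alg, inst)) for alg, inst, rep in tasks]


# Hypothetical algorithm labels and instance files.
tasks = schedule_tasks(["GRASP", "EA"], ["inst1.txt", "inst2.txt"], repetitions=3, seed=42)
results = run_experiment(tasks, lambda alg, inst: f"{alg} solved {inst}")
```

Randomizing the order of tasks, rather than running all repetitions of one algorithm back to back, helps spread any drift in the execution environment across all configurations.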
E3 assumes that the current directory where it is invoked is a lab-pack that follows the structure specified by SEDL. Thus, all file names are interpreted accordingly. For instance, problem instance files are searched for in “/input” and “/configurations/⟨configID⟩/input”. This implementation is currently a prototype, not intended for industrial use. Furthermore, the automated execution of the analyses specified in the experimental description is not yet implemented.

G QOS-AWARE BINDING OF COMPOSITE WEB SERVICES

Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives
William A. Foster, Found in Igniting the Spirit at Work: Daily Reflections

G.1 THE QOS-AWARE COMPOSITE WEB SERVICES BINDING PROBLEM

In service-oriented environments, complex applications are developed by composing web services; i.e., the control flow instructions of the application specify the sequence of invocations of the composed services. These applications can be created using general-purpose programming languages, such as Java or C#, with instructions that invoke several web services, or expressed through web service composition languages such as BPEL [10]. In the latter case, the compositions are usually deployed as services, in such a way that other service-based applications can use them (as services being composed). In the remainder of this dissertation we focus on this latter case, but the algorithms and solutions described are applicable, with minimal modifications, to service-based applications implemented with general-purpose programming languages. Service-based applications can be composed using abstract services, where the services to be invoked are chosen dynamically at runtime from a set of candidates that implement the same functionality and present a compatible interface.
In this context, Quality of Service (QoS) has been identified as a key element to guide the selection of those candidates, using the values of the QoS properties of each candidate service, such as execution time, invocation cost or availability. The development of QoS-aware composite services leads to the creation of context-aware and automatically optimized applications, depending on the available services and user preferences. This problem represents an important SOC research challenge [207, 208] identified as part of the Search Based Software Engineering area [129].

Figure G.1: Goods Ordering Composite Service

QoS-aware Composite Web Service Binding (QoSWSCB) implies solving an NP-hard optimization problem [14, 35]. In [14] it is shown that this problem is similar to the MMKP (multiple-choice multidimensional knapsack problem), since every instance of the MMKP can be formulated as a QoSWSC problem instance. Since the QoS levels provided by a service may change frequently, and some services can even become unavailable or new services emerge [44], composition approaches that take into account runtime changes in the QoS of component services are needed. When this problem is solved at runtime (during the execution of the sequence of invocations that make up the composition) or at invocation time (immediately before its start), it is called a reoptimization or rebinding problem [303][15]. In these circumstances, the time to obtain a good solution is a critical issue, and the use of heuristics appears as a promising approach [27]. Some metaheuristic techniques have been proposed in the literature to solve this problem [42, 166, 287].

Actor | Provider | Task | Candidate service | Cost (cents) | Execution time (s)
Bank | A | t1 | s1,A | 1 | 0.2
Bank | A | t2 | s2,A | 2 | 0.2
Bank | B | t1 | s1,B | 1.5 | 0.1
Bank | B | t2 | s2,B | 5 | 0.15
Provider | C | t3 | s3,C | 1 | 0.2
Provider | C | t4 | s4,C | 2 | 0.2
Provider | D | t3 | s3,D | 1 | 0.4
Provider | D | t4 | s4,D | 5 | 0.25
Delivery | E | t5 | s5,E | 1 | 0.2
Delivery | F | t5 | s5,F | 2 | 0.2
Dig. sign. | G | t6 | s6,G | 1 | 0.2
Dig. sign. | H | t6 | s6,H | 2 | 0.1
Surveying | I | t7 | s7,I | 1.5 | 0.1
Surveying | J | t7 | s7,J | 5 | 0.15
Table G.1: Service providers per role and their corresponding QoS guarantees

A motivating example
In order to illustrate the QoSWSCB problem, a goods ordering service inspired by the example provided in [305] is depicted in Fig. §G.1 using BPMN 2.0. The diagram specifies a business process exposed as a composite web service that uses 7 services with alternative providers (henceforth named tasks, t1, ..., t7). Table §G.1 shows the available service providers for each task and their corresponding QoS attributes. As illustrated, two candidate services are available for each task.
The composition starts when a client sends an order. First, the order is registered. Next, if the payment type of the order is “Credit Card”, the card is checked (t1) and the payment (t2) is performed. As depicted in Table §G.1, two bank providers are available, A and B, and each of them provides candidate services for the tasks t1 and t2, denoted as s1,A, s2,A, s1,B and s2,B. Different providers could be chosen in the binding of the CWS for each task; e.g. A for t1, and B for t2. Next, the stock is checked (t3) and the products are reserved for pick-up (t4). If any product in the order is not in stock, the user is informed of the delay and the CWS waits for some time until activities t3 and t4 are repeated (creating a loop). It is worth noting that the same provider must be chosen for the tasks t3 and t4, since the reservation in t4 refers to the stock of the specific provider queried in t3. Once the order is ready for delivery, two branches are performed in parallel: the pick-up and delivery (t5) to the client is requested, and an e-mail is sent to the client with an enclosed digitally signed invoice (t6). Once the activities on both branches are performed, the completion of a user satisfaction survey (t7) is requested.
Additionally, Fig. §G.1 shows several QoS constraints that must be fulfilled. These constraints may affect single tasks (e.g. “the cost of credit card payment must be lower than 0.1$”) or a group of tasks (e.g. “the total execution time of the remaining activities after having the order ready for delivery must be lower than 0.5 seconds”). The QoSWSCB problem can be stated as finding the binding that fulfills all the QoS constraints and maximizes or minimizes certain user-defined optimization criteria, e.g. minimize cost. Note that this may become extremely complex as the number of candidate services increases. In this example two providers are available for each task, thus 2^7 = 128 different bindings are possible. This problem becomes especially convoluted in rebinding scenarios, where providers can become unavailable and QoS levels may change unexpectedly. Moreover, a single service provider can expose its services with different QoS values, defining a whole set of alternative candidates. For instance, the SLA of the Amazon Simple Storage Service (Amazon S3) provides three types of storage (standard, redundant, and glacier) with different QoS values.
The global cost of the binding χ = (A, B, D, D, F, H, J) for an invocation where payment is performed using a credit card would be:
$$Q_{cost}(\chi) = \sum_{i=1}^{7} cost_i(\chi) = cost(s_{1,A}) + cost(s_{2,B}) + cost(s_{3,D}) + cost(s_{4,D}) + cost(s_{5,F}) + cost(s_{6,H}) + cost(s_{7,J}) = 1 + 5 + 1 + 5 + 2 + 2 + 5 = 21 \text{ cents}$$
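The cost arithmetic above, and the count of 2^7 candidate bindings, can be checked with a short Python sketch over the costs of Table §G.1. It assumes, as the text does, that every task is executed exactly once (credit-card payment, goods in stock); the `respects_dependence` filter on t3/t4 is a hypothetical helper added only to illustrate the service dependence between those two tasks.

```python
from itertools import product

# Table G.1: cost (in cents) of each candidate provider, per task t1..t7.
COST = [
    {"A": 1, "B": 1.5},  # t1: check credit card (bank)
    {"A": 2, "B": 5},    # t2: payment (bank)
    {"C": 1, "D": 1},    # t3: check stock (provider)
    {"C": 2, "D": 5},    # t4: reserve products (provider)
    {"E": 1, "F": 2},    # t5: pick-up and delivery
    {"G": 1, "H": 2},    # t6: digitally signed invoice
    {"I": 1.5, "J": 5},  # t7: satisfaction survey
]


def global_cost(binding):
    """Q_cost when every task runs exactly once (credit card, goods in stock)."""
    return sum(COST[i][provider] for i, provider in enumerate(binding))


def all_bindings():
    """Every combination of one provider per task, constraints ignored."""
    return list(product(*(sorted(options) for options in COST)))


def respects_dependence(binding):
    """Service dependence: t3 and t4 (positions 2 and 3) share a provider."""
    return binding[2] == binding[3]


chi = ("A", "B", "D", "D", "F", "H", "J")
print(global_cost(chi))                                     # 21
print(len(all_bindings()))                                  # 128
print(sum(respects_dependence(b) for b in all_bindings()))  # 64
```

Exhaustive enumeration is only viable for toy instances like this one; with m candidates per task and n tasks the search space grows as m^n, which is why heuristics are used in practice.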
Since the total execution time of two parallel branches is equal to the maximum of their execution times, the global execution time of such a binding, under the assumption that the goods are in stock, would be:
$$Q_{ExecTime}(\chi) = ExecTime_1(\chi) + ExecTime_2(\chi) + ExecTime_3(\chi) + ExecTime_4(\chi) + \max(ExecTime_5(\chi), ExecTime_6(\chi)) + ExecTime_7(\chi)$$
$$= ExecTime(s_{1,A}) + ExecTime(s_{2,B}) + ExecTime(s_{3,D}) + ExecTime(s_{4,D}) + \max(ExecTime(s_{5,F}), ExecTime(s_{6,H})) + ExecTime(s_{7,J}) = 0.2 + 0.15 + 0.4 + 0.25 + \max(0.2, 0.1) + 0.15 = 1.35 \text{ seconds}$$

The process of QoS-aware binding of composite web services
The QoS-aware binding of a composite web service is performed as follows. When the CWS is invoked or a rebinding is needed [44], the set T = {t1, ..., tn} of tasks is identified. For each task ti, the set of available service providers Si = {si,1, ..., si,m} (named candidate services) is determined by performing a search on a service registry. For each candidate service si,j, the QoS information is retrieved. The value provided by si,j for the QoS property q is denoted as qi,j; e.g. according to Table §G.1 the cost of invoking the payment service of provider A (cost2,A) is 2 cents (0.02$). Given that some registry technologies do not support QoS information, a QoS-enriched registry or an alternative QoS information source (such as a Service Level Agreements repository or a Service Trading Framework [90]) is needed. The set of QoS properties taken into account is denoted as Q.
Taking this information into account, the expected QoS provided by the application can be optimized. The goal of this optimization is to find the binding that maximizes the utility of the global QoS provided, according to the consumers’ preferences. Such preferences determine which binding is more valuable based on the global QoS levels Qq provided for each property q. For instance, a total execution time QExTime of 2 seconds could be fair for some users but too much for others.
User preferences are expressed as weights wq and utility functions Uq for each QoS property q. The weights define the relative importance of each property. For instance, wCost = 0.2 and wExTime = 0.1 means that cost is twice as important as execution time for the user. Utility functions Uq define which values of the specific property are more useful for the user. Thus, our goal translates into finding the best binding χ* that maximizes the global user utility, computed as:
$$GlobUtil(\chi) = \sum_{q \in Q} U_q(Q_q(\chi)) \cdot w_q \qquad \text{(G.1)}$$
having $\sum_{q \in Q} w_q = 1$. Similar schemes for expressing user preferences and global utility functions have been used extensively in the literature [15, 43, 261, 303].

G.1.1 QoS Model
QoS properties
The set of quality properties Q = {C, T, A, R, S} considered in this dissertation has been used extensively in related work [15, 43, 303]. It comprises:
Cost (C). Fee that users must pay for invoking a service.
Execution Time (T). Expected delay between the service invocation and the instant when the result is obtained.
Availability (A). Probability of accessing the service per invocation; its domain is [0, 1].
Reliability (R). It measures the trustworthiness of the service, i.e., its ability to meet the quality guarantees for the rest of the properties. Its value is usually computed based on a ranking performed by end users. For example, in www.amazon.com the range is [0, 5], where 0 means that QoS guarantees are violated systematically, and 5 means that guarantees are always respected. In this dissertation we assume its domain is [0, 1].
Security (S). It represents the ability of a service to provide mechanisms that assure confidentiality, authentication and non-repudiation of the parties involved. Usually this property implies the use of encryption algorithms with different strengths, different key sizes on the underlying messages, and some kind of access control.
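Equation (G.1) maps directly to a weighted sum. In this sketch the property keys and the two utility functions are hypothetical placeholders (any Uq can be plugged in), and the weights keep the 2:1 ratio of the example above (“cost is twice as important as execution time”) but are normalized so that they sum to 1, as (G.1) requires.

```python
def glob_util(global_qos, utilities, weights):
    """GlobUtil(chi) = sum over q of U_q(Q_q(chi)) * w_q  (Eq. G.1).

    global_qos: property -> aggregated value Q_q(chi)
    utilities:  property -> utility function U_q
    weights:    property -> w_q, expected to sum to 1
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("the weights w_q must sum to 1")
    return sum(utilities[q](global_qos[q]) * w for q, w in weights.items())


# Cost twice as important as execution time (ratio 2:1, normalized to sum 1).
weights = {"cost": 2 / 3, "time": 1 / 3}
utilities = {
    "cost": lambda c: max(0.0, 1 - c / 50),  # hypothetical: 0 cents -> 1, >= 50 cents -> 0
    "time": lambda t: max(0.0, 1 - t / 2),   # hypothetical: 0 s -> 1, >= 2 s -> 0
}
print(glob_util({"cost": 21, "time": 1.35}, utilities, weights))
```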
In this dissertation we use a categorization of security, where the use of an encryption algorithm and key size in a service implies a numerical value associated with this property for the service. Its domain is [0, 1], where value 0 means no security at all and value 1 means maximum security.
Although all those properties are domain-independent, new and possibly domain-dependent quality properties can be added without fundamentally altering our approach.
QoS properties are usually classified as negative or positive. A quality property is positive if the higher the value, the higher the user utility. For instance, availability is a positive property, since the higher the availability the better. A quality property is negative if the higher the value, the lower the utility. For instance, cost is a negative property. A widely used [15, 43, 303] definition of the utility of the value x for a QoS property q is:
$$U_q(x) = \begin{cases} 1 & \text{if } q^{max} - q^{min} = 0 \\ \dfrac{x - q^{min}}{q^{max} - q^{min}} & \text{if } q \text{ is positive} \\ \dfrac{q^{max} - x}{q^{max} - q^{min}} & \text{if } q \text{ is negative} \end{cases} \qquad \text{(G.2)}$$
where q^max and q^min are the maximum and minimum values of q_{i,j} over all candidate services.

Computing the Global QoS
Apart from the specific providers chosen for each task, the global QoS values for the CWS depend on:
The workflow of the composition and the type of QoS property. Global QoS is computed by recursively applying a QoS aggregation function according to the building blocks that define the structure of the composition. Table §G.2 summarizes the aggregation functions applied for each QoS property q and type of building block. These functions are widely applied in the literature [15, 43, 261, 287, 303]. For instance, the total execution time of parallel branches is computed as the maximum execution time of any branch, whereas the execution time of a sequence of tasks is computed as the sum.
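Equation (G.2) can be sketched as follows; the helper names are hypothetical. `utility_for` takes q^min and q^max over the values offered by all candidate services, here the costs of Table §G.1, with cost treated as a negative property.

```python
def utility(x, q_min, q_max, positive):
    """U_q(x) from Eq. (G.2): linear normalization of x into [0, 1]."""
    if q_max - q_min == 0:
        return 1.0
    if positive:  # higher is better, e.g. availability
        return (x - q_min) / (q_max - q_min)
    return (q_max - x) / (q_max - q_min)  # lower is better, e.g. cost


def utility_for(candidate_values, positive):
    """Build U_q using q_min/q_max over the values q_{i,j} of all candidate services."""
    q_min, q_max = min(candidate_values), max(candidate_values)
    return lambda x: utility(x, q_min, q_max, positive)


# Costs (cents) of all candidate services in Table G.1; cost is a negative property.
costs = [1, 2, 1.5, 5, 1, 2, 1, 5, 1, 2, 1, 2, 1.5, 5]
u_cost = utility_for(costs, positive=False)
print(u_cost(1), u_cost(3), u_cost(5))  # 1.0 0.5 0.0
```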
Similarly, given a specific workflow, for instance the parallel branches of our motivating example (tasks t5 and t6), the total cost is computed as the sum of the costs of the tasks, but the total execution time is computed as the maximum execution time of any branch.
The specific branches chosen for execution and the number of iterations performed in loops. Since, in general, the specific run-time behaviour of loops and alternative branches is unknown in advance, an estimate of this behaviour is needed to perform QoS-aware binding [44]. Specifically, for each loop and alternative execution branch in the workflow, the average number of iterations k and the probability of branch execution pb are estimated.

Property | Sequence (S) | Loop (L) | Branch (B) | Fork (F)
Cost (C) | $\sum_{i=1}^{m} C(a_i)$ | $k \cdot \sum_{i=1}^{n} C(a_i)$ | $\sum_{i=1}^{m} P_i \cdot C(s_i^b)$ | $\sum_{i=1}^{p_f} C(s_i)$
Time (T) | $\sum_{i=1}^{m} T(a_i)$ | $k \cdot \sum_{i=1}^{n} T(a_i)$ | $\sum_{i=1}^{m} P_i \cdot T(s_i^b)$ | $\max_{i=1}^{p_f} T(s_i)$
Reliability (R) | $\prod_{i=1}^{m} R(a_i)$ | $(\prod_{i=1}^{n} R(a_i))^k$ | $\sum_{i=1}^{m} P_i \cdot R(s_i^b)$ | $\prod_{i=1}^{p_f} R(s_i)$
Availability (A) | $\prod_{i=1}^{m} A(a_i)$ | $(\prod_{i=1}^{n} A(a_i))^k$ | $\sum_{i=1}^{m} P_i \cdot A(s_i^b)$ | $\prod_{i=1}^{p_f} A(s_i)$
Security (S) | $\min_{i \in \{1..m\}} S(a_i)$ | $\min_{i \in \{1..n\}} S(a_i)$ | $\sum_{i=1}^{m} P_i \cdot S(s_i^b)$ | $\min_{i=1}^{p_f} S(s_i)$
Custom attribute (F) | $f_S(F(a_i))_{i \in 1..m}$ | $f_L(s_L, k)$ | $f_B(F(s_i^b), [p_i])$ | $f_F(F(s_i))_{i \in 1..p_f}$
Table G.2: QoS aggregation functions

For instance, given P_CCard = 0.8 and k = 2, i.e. the probability of using a credit card is 0.8 and 2 iterations of stock reservation are performed, the estimated global cost of the binding χ = (A, B, D, D, F, H, J) in the sample problem instance presented in Section §G.1 is:
$$Q_{Cost}(\chi) = Cost_{switch}(\chi) + Cost_{loop}(\chi) + Cost_{fork}(\chi) + Cost_7(\chi) = 0.8 \cdot 0.06 + 2 \cdot 0.06 + 0.04 + 0.05 = \$\,0.258 $$
The actual global QoS values provided can differ significantly from the estimations in some invocations, since those values are estimates. In the worst case, this deviation can lead to the violation of global QoS constraints.
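To make the recursive aggregation of Table §G.2 concrete, the following sketch models the four building blocks as Python dataclasses and aggregates only cost and execution time. The class names are hypothetical, costs are in dollars (taken from Table §G.1), and the alternative to credit-card payment is assumed to invoke no bound services (cost and time 0). Under those assumptions the binding χ = (A, B, D, D, F, H, J) with P_CCard = 0.8 and k = 2 yields an expected cost of 0.8 · 0.06 + 2 · 0.06 + 0.04 + 0.05 = 0.258$.

```python
from dataclasses import dataclass


# Hypothetical workflow model: one class per building block of Table G.2.
@dataclass
class Task:
    cost: float  # cost of the bound candidate service ($)
    time: float  # execution time of the bound candidate service (s)


@dataclass
class Seq:
    children: list


@dataclass
class Loop:
    body: list
    k: float  # estimated average number of iterations


@dataclass
class Branch:
    alternatives: list  # (probability, node) pairs


@dataclass
class Fork:
    branches: list


def aggregate(node, prop):
    """Recursively aggregate 'cost' or 'time' as in Table G.2."""
    if isinstance(node, Task):
        return getattr(node, prop)
    if isinstance(node, Seq):
        return sum(aggregate(c, prop) for c in node.children)
    if isinstance(node, Loop):
        return node.k * sum(aggregate(c, prop) for c in node.body)
    if isinstance(node, Branch):
        return sum(p * aggregate(n, prop) for p, n in node.alternatives)
    if isinstance(node, Fork):
        values = [aggregate(b, prop) for b in node.branches]
        # cost adds up over parallel branches; time is bounded by the slowest one
        return sum(values) if prop == "cost" else max(values)
    raise TypeError(f"unknown building block: {node!r}")


# Binding (A, B, D, D, F, H, J) of the motivating example, P_CCard = 0.8, k = 2.
workflow = Seq([
    Branch([(0.8, Seq([Task(0.01, 0.2), Task(0.05, 0.15)])),  # credit card: t1, t2
            (0.2, Seq([]))]),                                  # other payment types
    Loop([Task(0.01, 0.4), Task(0.05, 0.25)], k=2),            # t3, t4
    Fork([Task(0.02, 0.2), Task(0.02, 0.1)]),                  # t5 || t6
    Task(0.05, 0.15),                                          # t7
])
print(round(aggregate(workflow, "cost"), 3))  # 0.258
```

Reliability, availability and security would follow the same recursion with the product, power and minimum functions listed in the table.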
To avoid this problem, the rebinding triggering approach proposed in [44] could be used. The approach used to compute the global QoS of a binding, Qq(χ), is similar to that proposed in [46]. These functions are recursively defined on the activities of the composition structure, allowing global values to be computed by recursively applying the corresponding function on each building block.

Constraints of the QoSWSCB problem
The QoSWSCB problem has three types of constraints [15, 303]:
Global QoS constraints. They affect the QoS of the CWS as a whole; e.g. the total cost of the composition must be lower than five ≡ Qcost(χ) < 5.
Local QoS constraints. They affect the QoS values provided by the service chosen for a specific task; e.g. the cost of payment (t2) must be lower than 1 ≡ cost2(χ) < 1.
Service dependence constraints. A CWS may use several services that must be bound to the same provider. This situation creates a dependence, i.e., if the provider is selected for one of the tasks, then it must be selected for the rest of the tasks it implements. In our motivating example there exists a dependence constraint between tasks t3 and t4 (stock management and reservation). Some examples of these interdependences regarding stateful services can be found in [15].

Current approaches for solving the QoSWSC problem
QoS-aware service composition is one of the most promising possibilities that service-oriented technologies bring to service developers and software architects. It brings the dynamic, loosely coupled service selection paradigm of service orientation to its maximum expression.
This problem provides an excellent application scenario for different methods and techniques, ranging from pure optimization techniques to artificial intelligence systems. In this context, two kinds of optimization strategies have been formulated for this problem in the literature [303][14]: global and local selection.
• Local approaches have two main drawbacks: (i) the solutions obtained are suboptimal with regard to the overall quality of the composite web service; (ii) global constraints according to the structure of the composite web service and their quality properties cannot be imposed.
• Global approaches try to optimize the whole set of services used in the composition according to their QoS properties, taking into account the structure of the composition. Therefore, global QoS constraints can be formulated. In doing so, more information about the expected properties of the composition must be provided, in order to overcome the variability that could arise in the optimization process.
Global approaches to solve the QoSWSC problem comprise:
• The use of Integer Programming techniques ([303][7]), Linear Programming ([45]) or Mixed Integer/Linear Programming techniques ([15][227]). These kinds of approaches model the problem using integer and/or real variables and a set of constraints. Although these approaches provide the global optimum of the problem, and their performance is better for small instances of the problem, genetic algorithms outperform these techniques for problem instances with avg(|Si|) > 17 [42]. The use of metaheuristics is more flexible, because those techniques can consider non-linear composition rules and different fitness function formulations [14].
• The use of heuristic techniques. [150] and [55] develop specific heuristics to solve the service composition problem.
Applications of metaheuristics to this problem are present in the literature, mainly using different genetic algorithm based approaches.
These incorporate variants of the work presented in [42], either in the encoding scheme, the fitness function or the QoS model ([110][263][287]), or by using population diversity handling techniques ([304]). [52] and [282] use a multi-objective evolutionary approach to identify a set of optimal solutions according to different quality properties without generating a global ranking. [220] uses fuzzy logic to relax the QoS constraints, in order to find alternative solutions when it is not possible to find any solution to the problem. Some authors have proposed the use of simulated annealing ([287]), but no experimental results have been provided.
In recent years, the trend has been to apply slight modifications to the formulation of the QoSWSCB problem itself. In [171] cost is used as the QoS property guiding the search, but the penalties and rewards expressed in the SLAs that define the QoS levels of candidate services are taken into account. In [305], the building blocks supported by the problem-solving algorithm are extended to unstructured conditional and loop patterns. In [180] the input parameters of the composition and the specific state of the execution in rebinding are taken into account to improve the estimation of the QoS provided by each possible binding, making the search more accurate. In [164] the latency of service invocations is taken into account for computing the global QoS.

G.2 OUR PROPOSAL: QOSGASP

In this section we present QoS-Gasp, a novel proposal for solving the QoSWSCB problem. It stands for “QoS-aware GRASP+PR algorithm for service-based applications binding”. It is a hybrid algorithm, where GRASP is used for initializing the elite set used in Path Relinking. Next we describe how GRASP and PR have been adapted for solving the QoSWSCB problem.

Solution encoding
In order to apply metaheuristic optimization algorithms, a suitable encoding of solutions is needed.
An encoding is the mechanism used for expressing the characteristics of solutions in a form that facilitates their manipulation during the rest of the algorithm. In QoS-Gasp a vector-based encoding structure is used. This encoding has been used extensively in the literature [42, 110]. Solutions are encoded as a vector of integer values, with a size equal to the number of tasks |T| = n. Value j at position i of this vector encodes the choice of service j as provider for task i. Thus, the vector [j| ... |k] of size n encodes the binding χ = {s1,j, ..., sn,k}. For instance, in our motivating example, the vector that encodes the binding (A, B, D, D, F, H, J) would be [0|1|1|1|1|1|1]. The index of each provider is determined by its order of appearance in Table §G.1; e.g. for banks, A ≡ 0 and B ≡ 1. Note that the values in each position of the vector are either 0 or 1 only because there are two providers per task in our motivating example; in general, the encoding is not binary.

Constraints support
GRASP and PR do not support constrained optimization problems directly. In order to overcome this drawback, a variant of equation (G.1) is used as the objective function. This variant takes into account a penalization term, in a very similar way to [43]. This term is computed using a weight w_unf and a function D_f that measures the distance of a binding χ from full constraint satisfaction:
$$D_f(\chi, C) = \frac{\sum_{c \in C} Meet(c, \chi)}{|C|} \qquad \text{(G.3)}$$
C being the set of global and interdependence constraints of the problem¹. Meet(c, χ) is a function that measures the distance to the fulfillment of a single constraint c by the binding χ:
$$Meet(c, \chi) = \begin{cases} 0 & \text{if } c \text{ is met} \\ |Q_q(\chi) - T_q| & \text{if } c \text{ is global and unmet (distance to threshold)} \\ \text{fraction of interdependences missing} & \text{if } c \text{ is a service dependence constraint and unmet} \end{cases} \qquad \text{(G.4)}$$

¹ Local constraints are not taken into account, since they can be met by preprocessing the set of candidate services [15].
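The vector encoding and the penalty terms of (G.3) and (G.4) could be sketched as below. Assumptions, all labeled as such: tasks are 0-based positions (t3 and t4 are positions 2 and 3), global constraints have the form Q_q(χ) < T_q, and “fraction of interdependences missing” is interpreted as the fraction of dependent tasks not bound to the same provider as the first task of the constraint. All function names are hypothetical.

```python
# Candidate providers per task, in order of appearance in Table G.1.
PROVIDERS = [["A", "B"], ["A", "B"], ["C", "D"], ["C", "D"],
             ["E", "F"], ["G", "H"], ["I", "J"]]


def encode(binding):
    """Binding (one provider name per task) -> integer vector."""
    return [PROVIDERS[i].index(p) for i, p in enumerate(binding)]


def decode(vector):
    """Integer vector -> binding (one provider name per task)."""
    return tuple(PROVIDERS[i][j] for i, j in enumerate(vector))


def meet_global(value, threshold):
    """Meet for a global constraint Q_q(chi) < T_q: 0 if met, else distance to T_q."""
    return 0.0 if value < threshold else abs(value - threshold)


def meet_dependence(vector, tasks):
    """Meet for a service dependence constraint over `tasks` (0-based positions).

    Assumed reading of 'fraction of interdependences missing': the fraction of
    the other tasks not bound to the provider chosen for the first one.
    """
    providers = decode(vector)
    first = providers[tasks[0]]
    missing = sum(1 for t in tasks[1:] if providers[t] != first)
    return missing / (len(tasks) - 1)


def distance_to_feasibility(meets):
    """D_f (Eq. G.3): average Meet value over global and dependence constraints."""
    return sum(meets) / len(meets)


chi = encode(("A", "B", "D", "D", "F", "H", "J"))
print(chi)  # [0, 1, 1, 1, 1, 1, 1]
```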
In this function, we denote the threshold of each global constraint on QoS property q as T_q. For instance, given the global constraint “the total cost of the composition must be lower than five” ≡ Q_cost(χ) < 5, then T_cost = 5. Thus, our final function to be maximized is:

$$\mathit{ObjFunc}(\chi) = \mathit{GlobUtil}(\chi) - w_{unf} \cdot D_f(\chi) \tag{G.5}$$

with 0 ≤ w_unf ≤ 1.

GRASP building phase

In QoS-Gasp, GRASP elements represent choices of a specific candidate service for a task. Thus, the solution χ is built by choosing a service for one task at each iteration of the loop until the solution is a complete binding. The partial solution at iteration k is denoted as χ_k. The task to bind at iteration k is chosen randomly. The set of valid elements for task t_i is determined by the service dependency constraints. For instance, in our motivating example there is a dependency constraint between t3 (stock querying) and t4 (reservation for pickup). Thus, if a provider has been chosen for task t3 in our partial solution χ_k, then the same provider must be chosen for t4. If conflicting dependency constraints are found, the construction phase restarts, since it is not possible to create a feasible solution from χ_{k−1}.

QoS-Gasp uses a restricted candidate list (RCL) selection scheme that has been applied extensively in the GRASP literature [235]. Specifically, this selection is driven by an evaluation function g, which must be defined for the specific optimization problem to solve, and a greediness parameter α (between 0 and 1). Function g provides a value in ℝ for each candidate service; g_min and g_max denote the minimum and maximum of those values. A service s_{i,j} is included in the RCL if g(s_{i,j}) ≥ g_min + α · (g_max − g_min); i.e., α defines the proportion of the range [g_min, g_max] in which candidates are discarded from the RCL. Thus, for α = 0 all the candidates are in the RCL (none is discarded), and the construction phase becomes random.
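The penalized objective of equation G.5 and the RCL membership rule can be sketched in a few lines of Python. The function names are hypothetical, g is passed in as any callable, and picking uniformly at random from the RCL is the usual GRASP choice but is an assumption here, since the text does not state how the element is drawn from the list.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def obj_func(glob_util: float, d_f_value: float, w_unf: float) -> float:
    """ObjFunc(chi) = GlobUtil(chi) - w_unf * D_f(chi) (equation G.5)."""
    if not 0.0 <= w_unf <= 1.0:
        raise ValueError("w_unf must lie in [0, 1]")
    return glob_util - w_unf * d_f_value


def restricted_candidate_list(candidates: Sequence[T], g: Callable[[T], float],
                              alpha: float) -> List[T]:
    """Keep the candidates whose g value reaches g_min + alpha * (g_max - g_min)."""
    values = [g(c) for c in candidates]
    g_min, g_max = min(values), max(values)
    threshold = g_min + alpha * (g_max - g_min)
    return [c for c, v in zip(candidates, values) if v >= threshold]


def grasp_choice(candidates: Sequence[T], g: Callable[[T], float], alpha: float,
                 rng: random.Random) -> T:
    """One construction step. Assumption: uniform random pick from the RCL."""
    return rng.choice(restricted_candidate_list(candidates, g, alpha))


scores = {"s1": 0.25, "s2": 0.5, "s3": 1.0, "s4": 0.75}  # hypothetical g values
for alpha in (0.0, 0.5, 1.0):
    print(alpha, restricted_candidate_list(list(scores), scores.get, alpha))
# 0.0 ['s1', 's2', 's3', 's4']
# 0.5 ['s3', 's4']
# 1.0 ['s3']
print(grasp_choice(list(scores), scores.get, 0.5, random.Random(42)))
```

When every candidate has the same g value, as happens with G2 on unconstrained instances (see Experiment #A1), the threshold collapses to g_min and the whole candidate set enters the RCL regardless of α.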
If α = 1, only the candidates whose value of g equals g_max are in the RCL. The function g and the value of α are crucial for the performance of GRASP [235]. We defined up to seven different greedy functions for the QoSWSCB problem. Since the optimal values of these parameters depend on the problem to be solved, we performed a preliminary experiment testing each function g with several values of α. All the details about the g functions and their evaluation are reported in [212]. In summary, an auxiliary experiment, #A1, was performed in order to choose the best combination of values for α and g. The best average results were obtained for α = 0.25 with the greedy functions G1, G2 and G6 shown below:

$$G_1(s_{i,j}, \chi_k) = \sum_{q \in Q} w_q \cdot U_q(q_{i,j}) \tag{G.6}$$

G1 is “myopic” and non-adaptive, meaning that it only considers the QoS values of each service and ignores the solution under construction χ_k, but its evaluation is extremely fast.

$$G_2(s_{i,j}, \chi_k) = D_f(\chi_k) - D_f(\chi_k \cup s_{i,j}) \tag{G.7}$$

G2 uses the difference between the distance to constraint satisfaction of the current partial solution χ_k and that of the new partial solution, denoted as χ_k ∪ s_{i,j}, but it ignores the QoS weights.

$$G_6(s_{i,j}, \chi_k) = \mathit{ObjFunc}(\chi_k \cup s_{i,j}) - \mathit{GlobUtil}(\chi_k) \tag{G.8}$$

G6 is based directly on the gradient of the global QoS, but ignores the distance to constraint satisfaction of the current solution. This subtle variant penalizes the selection of elements that generate constraint violations. In order to evaluate D_f, GlobUtil and ObjFunc on partial solutions, a random solution is generated at the beginning of the construction phase, and its elements are used to complete the choices for the unassigned tasks in χ_k.

GRASP improvement phase

The GRASP improvement phase in QoS-Gasp is a local search procedure based on a neighbourhood definition. The neighbourhood of a binding χ comprises all possible bindings that have exactly n − 1 assignments in common with χ; i.e.
have the same candidate services selected for each task except for one. QoS-Gasp uses Hill Climbing, exploring only a percentage of the neighbourhood.

Path Relinking

QoS-Gasp uses the adaptation of GRASP described above to initialize the elite set used by PR. In QoS-Gasp, the length of the path between the initiating and guiding solutions is determined by the number of tasks whose service candidates differ between them, since each step of a relinking path incorporates one service candidate from the guiding solution. The order in which service candidates are incorporated defines different paths; consequently, a high number of different paths could be explored for each pair of initiating and guiding solutions. In order to reduce the computational cost of such exploration, QoS-Gasp restricts the number of paths generated between each pair of solutions to N_paths, introduces the service candidates from the guiding solution in a random order, and limits the number of neighbours explored in each path to N_steps. These parameters control the balance between the diversification of the areas of the search space explored and the exhaustiveness of the search within those areas, which is crucial in rebinding scenarios where execution time is scarce.

G.3 PREVIOUS PROPOSALS

G.3.1 Genetic Algorithms

The proposal described in [42] has been implemented for comparison, since it is the most cited GA-based approach for this problem. In particular, the initial population is generated randomly, and a standard one-point crossover operator [78] is used. The mutation operator changes the candidate service bound to a single task; both the task and the new candidate are chosen randomly. Parameter values are chosen according to [42].

G.3.2 Hybrid TS with SA

A hybrid of Tabu Search (TS) with Simulated Annealing (SA) was proposed in [166] for solving the QoSWSCB problem.
This proposal was aimed at finding feasible solutions for constrained instances; thus, the search was driven by the constraint-meeting distance, and the execution terminated as soon as a feasible solution was found. In order to enable the comparison with our proposals, and to continue optimizing according to user preferences even when all constraints are met, we modified the algorithm. When all the constraints are met, the difference between the QoS value of the current solution, Q_q, and the average QoS for that property, Avg_q, is used to guide the search. Specifically, the QoS property selected to guide the improvement is the one minimizing s · (Q_q(χ) − Avg_q) · w_q, where s is 1 if q is positive and −1 if it is negative; i.e., our modification tries to generate neighbours that improve the solution in the QoS property with the largest room for improvement and importance for users. The pseudo-code of the resulting algorithm and a detailed explanation of how it works are available in the additional material ([212]).

G.4 EXPERIMENTS PERFORMED ON THE QOSWSCB PROBLEM

G.4.1 Experiment QoSWSCB-#A1: Tailoring of GRASP

The aim of preliminary experiment #A1 was to choose the parameters of the GRASP construction phase to be used in the experimentation. There are two such parameters: (i) α, which controls the level of elitism versus randomness when creating the RCL; and (ii) the greedy function G used to evaluate the elements (candidate services) at each iteration of the construction phase. This experiment is similar to those presented in [222], which we used as inspiration. In this experiment we assume that the extreme values of α, zero and one, are not valid options, since they reduce the construction phase to a degenerate procedure: under the RCL definition given in Section §G.2, when α is zero the construction phase is a purely random solution generation procedure, and when α is one it becomes a purely elitist (greedy) algorithm, similar to local optimization methods.
Consequently, the following values of α are used in this experiment: 0.25, 0.5 and 0.75. The candidate greedy functions, G1 to G7, were introduced in Section §G.2 (G1, G2 and G6 are formulated there; the remaining ones are detailed in [212]). A good greedy function should not only provide near-optimal solutions, but also generate enough diversity to explore the multiple local optima of the search space [222, 235]. A measure of solution diversity is defined based on our notion of neighbouring solutions. A factorial design is used for this preliminary experiment, generating all the possible combinations of parameter values. Eleven problem instances were generated by the algorithm described in Appendix C of [212].

In general, the best global configuration is α = 0.25 and G = G6. However, for tightly constrained instances, the configuration α = 0.5 and G = G2 provides better mean results and should be used for this kind of problem. It is not surprising that G2 generates much diversity, since for unconstrained instances of the problem its use is equivalent to a random selection of candidate services: its evaluation is always zero for unconstrained problems, and thus all the candidates are inserted in the RCL regardless of the value of α. G1 and G4 also perform well in general, but worse than G2.

Table G.3: Parameter ranges
Parameter                                   Range
Initial GRASP iterations                    {10, 50, 100}
Number of elite solutions                   {2, 5, 10}
Number of paths per iteration               {2, 5, 10}
Number of neighbours to explore per path    {5, 10, 50}

G.4.2 Experiment #A2: Tuning of GRASP+PR

The aim of preliminary experiment #A2 was to choose the parameters of the GRASP+PR approach to be used in the experimentation. The parameter values for the underlying GRASP technique were chosen based on the results of the previous section. Thus, only the values of the parameters specific to Path Relinking remain to be chosen.
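Both preliminary experiments use full factorial designs. The sketch below enumerates them from the values given in the text (three values of α crossed with G1 to G7 for #A1) and from the ranges of Table G.3 (for #A2), and computes the number of GRASP+PR runs implied by the 11 instances, 30 runs and 100 ms time limit reported for #A2. The parameter names are illustrative labels, not identifiers from the original implementation.

```python
from itertools import product

# Experiment #A1: values of alpha crossed with the seven greedy functions G1..G7.
ALPHAS = (0.25, 0.5, 0.75)
GREEDY_FUNCTIONS = tuple(f"G{i}" for i in range(1, 8))

# Experiment #A2: parameter ranges of Table G.3 (names are illustrative).
PR_RANGES = {
    "initial_grasp_iterations": (10, 50, 100),
    "elite_solutions": (2, 5, 10),
    "paths_per_iteration": (2, 5, 10),
    "neighbours_per_path": (5, 10, 50),
}
INSTANCES, RUNS, TIME_LIMIT_MS = 11, 30, 100  # as reported for Experiment #A2


def full_factorial(ranges):
    """All combinations of parameter values, one dict per configuration."""
    names = list(ranges)
    return [dict(zip(names, combo)) for combo in product(*ranges.values())]


a1_configs = list(product(ALPHAS, GREEDY_FUNCTIONS))
a2_configs = full_factorial(PR_RANGES)
a2_runs = len(a2_configs) * INSTANCES * RUNS
minutes = a2_runs * TIME_LIMIT_MS / 60000  # upper bound on pure optimization time
print(len(a1_configs), len(a2_configs), a2_runs, minutes)  # 21 81 26730 44.55
```

The 81 configurations of #A2 amount to 26,730 runs, that is, up to about 45 minutes of optimization time alone, which is consistent with the decision to report only the conclusions of that experiment.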
These parameters are: (i) initial GRASP iterations (the number of GRASP iterations prior to the intensification phase of PR); (ii) number of elite solutions; (iii) number of paths per iteration; and (iv) number of neighbours to be explored in each path. A factorial design is used for this preliminary experiment, generating all the possible combinations of parameter values within the ranges described in Table §G.3. These ranges were determined by the authors' judgement and some exploratory executions of GRASP+PR. Eleven problem instances were generated by the algorithm described in [212]. GRASP+PR was run 30 times for each problem instance and parameter configuration, using a termination criterion based on a maximum execution time of 100 ms. Due to the size of the experimental design, we only describe its conclusions here; both the source code of the implementation and the raw results are available online for interested readers. The configuration with the best mean results was: (i) initial GRASP iterations = 50; (ii) number of elite solutions = 10; (iii) number of paths per iteration = 2; and (iv) number of neighbours to explore per path = 50.

G.4.3 Experiment #1: Selection of a technique for QoSWSCB

The aim of this experiment is to compare the performance of our proposal and previous ones in terms of the QoS of the solutions they provide. Previous proposals (as described in Section §G.3) are compared to ours when solving a number of instances of the QoSWSCB problem. Specifically, we compare Genetic Algorithms (GA) and the hybrid of Tabu Search with Simulated Annealing (TS/SA) with a GRASP using G1 (GRASP(G1)) and two variants of GRASP with Path Relinking (GRASP+PR) that use G2 and G6.
Positive scaling utility functions were used for availability, reliability and security (we denote this set of properties as Q+ = {A, R, S}); cf. Section §G.1.1. Negative scaling utility functions were used for the remaining properties (Q− = {C, T}). The weights used were w_unf = 0.5, w_C = 0.3, w_T = 0.3, w_A = 0.1, w_S = 0.2 and w_R = 0.1. Since FOM solves minimization problems, the objective function used was 1.0 − ObjFunc(χ), with ObjFunc as defined in equation G.5. Problem instances were generated by the algorithm described in Appendix C of [212].

Table §G.4 shows the mean results per problem instance and execution time. It is divided into four sub-tables, one per execution time. In each sub-table, rows depict the results obtained for each problem instance, and columns the results obtained by each optimization technique. Note that the problem was modelled as a minimization problem for compatibility with the experimental framework FOM, which means that lower values are better; the best mean for each problem instance and execution time is therefore the lowest value in its row. GRASP+PR(G6) obtained the best mean results in almost all cases; the only exceptions are P7 for execution times of 500 ms and longer, and P10 for 1000 ms and longer, where GRASP+PR(G2) obtained marginally lower means. GA provides intermediate results, better than those of TS+SA but not as good as those of GRASP+PR and GRASP. The performance of TS+SA was poor except for tightly constrained problem instances. Our statistical analysis revealed that the differences between GRASP+PR(G6) and the other techniques are statistically significant (at a significance level of 0.05) except for one problem instance and technique. Specifically, the differences between GRASP(G1) and GRASP+PR(G6) are not significant for problem P7 when execution times are longer than 500 ms. It is worth noting that P7 is significantly smaller than the others: it contains only 7 tasks and 63 candidate services. Thus, we infer that for small instances of the problem, GRASP(G1) can behave nearly as well as GRASP+PR(G6).
The causes of this behaviour could be: (i) the inefficiency of the intensification strategy of PR, since the probability of paths overlapping is higher for small problem instances; and (ii) the capability of GRASP to explore the promising area of the search space for small problem instances.

In order to evaluate the extent to which some techniques outperform others, we computed the percentage of runs in which the result obtained by one technique is better than every result (out of the 30 runs) obtained by another technique for the same problem instance and execution time. Table §G.5 summarizes these results. It is divided into four sub-tables by execution time, where each sub-table contains a square matrix with the optimization techniques in rows and columns. The value of each cell is the mean of the percentage described above over all the problem instances. For instance, the value in the second row and first column of the top-left sub-table specifies that, for an execution time of 100 ms, on average over the problem instances 92.42% of the solutions obtained by GRASP+PR(G6) are better than every solution obtained by GA. This means that the results obtained by GRASP+PR(G6) outperform those obtained by GA. Since the percentages are averaged over all the problem instances and refer to different pairs of techniques, rows and columns do not sum to 100%. Table §G.5 confirms the conclusions drawn above, since the row of GRASP+PR(G6) has the highest percentages for almost every execution time and column. However, it is noticeable that this row has a small percentage in the column of GRASP(G1), while the transposed cell (row GRASP(G1) and column GRASP+PR(G6)) also has a small percentage. This means that, although on average the results of GRASP+PR(G6) are better and less dispersed than those of GRASP(G1), the latter can occasionally find better solutions than those usually found by the former.
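The dominance percentage behind Table §G.5 can be sketched as follows. The function names and the toy data are hypothetical (4 runs per instance instead of 30), and minimization is assumed, as in FOM; a run counts only if it is strictly lower than the best run of the other technique on the same instance.

```python
from statistics import mean
from typing import Dict, Sequence


def dominance_pct(a_runs: Sequence[float], b_runs: Sequence[float]) -> float:
    """% of A's runs strictly better (lower) than every run of B on one instance."""
    best_b = min(b_runs)
    return 100.0 * sum(r < best_b for r in a_runs) / len(a_runs)


def mean_dominance(a: Dict[str, Sequence[float]], b: Dict[str, Sequence[float]]) -> float:
    """One table cell: dominance_pct averaged over the problem instances."""
    return mean(dominance_pct(a[p], b[p]) for p in a)


# Hypothetical results of two techniques on two instances.
tech_a = {"P0": [1.0, 2.0, 3.0, 4.0], "P1": [1.0, 1.0, 1.0, 1.0]}
tech_b = {"P0": [2.5, 3.0, 5.0, 6.0], "P1": [0.5, 2.0, 2.0, 2.0]}
print(mean_dominance(tech_a, tech_b), mean_dominance(tech_b, tech_a))  # 25.0 12.5
```

As in Table §G.5, the cells for (A, B) and (B, A) need not add up to 100%, and both can be small when the two result distributions overlap.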
Another noticeable finding is the progressive decrease of the percentages of GRASP+PR(G6) and GRASP(G1) as execution time increases.

Figures §G.2 and §G.3 show box plots for two problem instances with a termination criterion of 100 ms. Each figure depicts four populations, defined as the values of ObjFunc for the best solution obtained in each run of an optimization technique; thus, each population has 30 samples. Results of GRASP+PR(G6) are labelled GRASP+PR, and those of GRASP(G1) are labelled GRASP. For each population, the box plot shows: the minimum sample as the lower horizontal line segment; the lower quartile (Q1) as the lower limit of the box; the median (Q2) as the segment dividing the box; the upper quartile (Q3) as the top of the box; and the largest sample as the upper horizontal line segment. Samples considered outliers are represented as circles or stars. The distribution of the results obtained by GRASP+PR is the best in both figures. The small variability of the results provided by TS+SA is analysed in depth in [212].

The improvements provided by our proposals are significant not only in a statistical sense, but also in terms of the actual QoS provided. As an illustrative example, the QoS of the solutions provided by GRASP+PR(G6) for problem instance C4 is on average 49.25% and 28% better than that of the solutions provided by GAs and TS+SA, respectively. These improvements are noteworthy when translated into cost savings and execution time decreases.

Figure G.2: Box plot of results in Experiment #1 and problem instance 9
Figure G.3: Results in Experiment #1 and problem instance 2
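The five values drawn for each population in these box plots can be computed as shown below. This is a sketch: quartile conventions differ between statistics packages, and the one used to draw the figures is not stated, so the choice of the "inclusive" method of Python's `statistics.quantiles` is an assumption. Outlier marking is omitted.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def box_plot_summary(samples: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (minimum, Q1, median, Q3, maximum) for one population of results.

    Assumption: quartiles use the 'inclusive' method of statistics.quantiles;
    the tool used to draw the figures may use a different convention.
    """
    q1, q2, q3 = quantiles(samples, n=4, method="inclusive")
    return min(samples), q1, q2, q3, max(samples)


# Hypothetical ObjFunc values of five runs (each population in the study has 30).
print(box_plot_summary([0.7461, 0.7458, 0.7460, 0.7463, 0.7459]))
```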
Execution time: 100 ms
Problem  GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS/SA
P0       0.317053066   0.31467194    0.31559766    0.31585178    0.37823648
P1       0.832070664   0.82958546    0.83114955    0.82996641    0.90526782
P2       0.314220241   0.30428238    0.30821952    0.30961194    0.40335166
P3       0.786458899   0.77332510    0.77774798    0.77939848    0.87387101
P4       0.810939436   0.81066853    0.81082234    0.81086532    0.81292188
P5       0.345341638   0.33979296    0.34254369    0.34201717    0.39219569
P6       0.814693606   0.79437721    0.80665351    0.79564660    0.89469352
P7       0.755621698   0.74596884    0.74604446    0.74597200    0.82326859
P8       0.859142490   0.85159186    0.85524606    0.85185832    0.91504732
P9       0.802587993   0.78813945    0.79375533    0.79500608    0.88106812
P10      0.333850406   0.33271258    0.33290040    0.33326753    0.34420161

Execution time: 500 ms
Problem  GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS/SA
P0       0.316847884   0.31465716    0.31541106    0.31559366    0.37819911
P1       0.832185790   0.82958531    0.83047811    0.82996641    0.90526782
P2       0.314327155   0.30334881    0.30667738    0.30770218    0.40335070
P3       0.785376182   0.77303310    0.77626168    0.77730067    0.87377022
P4       0.810937976   0.81066853    0.81077270    0.81081414    0.81291786
P5       0.345156788   0.33979296    0.34138343    0.34179962    0.39217004
P6       0.815804062   0.79307634    0.80427884    0.79564660    0.89464300
P7       0.756262913   0.74577745    0.74566567    0.74580965    0.82099777
P8       0.859046630   0.85159186    0.85385226    0.85185832    0.91504127
P9       0.803349334   0.78810276    0.79208361    0.79353237    0.88106812
P10      0.334067627   0.33271171    0.33276010    0.33286655    0.34393207

Execution time: 1000 ms
Problem  GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS/SA
P0       0.31702924    0.31464257    0.31514186    0.31521718    0.37819911
P1       0.83257414    0.82958531    0.82991653    0.82992594    0.90526782
P2       0.31406676    0.30334250    0.30495659    0.30548735    0.40335070
P3       0.78422132    0.77294846    0.77398728    0.77478427    0.87377022
P4       0.81094311    0.81066853    0.81072816    0.81073885    0.81291786
P5       0.34514013    0.33978323    0.34021847    0.34087602    0.39217004
P6       0.81558663    0.79244253    0.79865127    0.79564660    0.89464300
P7       0.75901268    0.74549613    0.74544938    0.74549475    0.82099777
P8       0.85937275    0.85159186    0.85210501    0.85185832    0.91504127
P9       0.80275109    0.78810276    0.79004525    0.79100871    0.88106812
P10      0.33372791    0.33268621    0.33266052    0.33268679    0.34393207

Execution time: 50000 ms
Problem  GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS/SA
P0       0.31708794    0.31464248    0.31503530    0.31511347    0.37819911
P1       0.83234986    0.82958531    0.82979438    0.82981234    0.90526782
P2       0.31456350    0.30334250    0.30436232    0.30472537    0.40335070
P3       0.78437278    0.77284804    0.77318822    0.77396728    0.87377022
P4       0.81090575    0.81066853    0.81070306    0.81072227    0.81291786
P5       0.34478886    0.33977041    0.33982145    0.34035985    0.39217004
P6       0.81608455    0.79238052    0.79697338    0.79564660    0.89464300
P7       0.75804813    0.74542966    0.74541133    0.74542556    0.82099777
P8       0.85909064    0.85159186    0.85185755    0.85183389    0.91504127
P9       0.80183890    0.78810276    0.78949346    0.78994663    0.88106812
P10      0.33395456    0.33265462    0.33264226    0.33265207    0.34393207

Table G.4: Means of objective function values per algorithm and execution time (Experiment #1)
Execution time: 100 ms
                GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS+SA
GA              -             0.00%         0.30%         0.00%         100.00%
GRASP+PR(G6)    92.42%        -             80.30%        3.94%         100.00%
GRASP+PR(G2)    35.45%        0.30%         -             0.30%         100.00%
GRASP(G1)       84.55%        0.91%         70.00%        -             100.00%
TS+SA           0.00%         0.00%         0.00%         0.00%         -

Execution time: 500 ms
                GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS+SA
GA              -             0.00%         0.30%         0.00%         90.91%
GRASP+PR(G6)    87.27%        -             63.33%        1.21%         90.91%
GRASP+PR(G2)    58.48%        1.21%         -             0.91%         90.91%
GRASP(G1)       83.94%        0.61%         60.30%        -             90.91%
TS+SA           0.30%         0.30%         0.30%         0.30%         -

Execution time: 1000 ms
                GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS+SA
GA              -             0.00%         0.00%         0.00%         90.91%
GRASP+PR(G6)    86.67%        -             68.18%        1.52%         90.91%
GRASP+PR(G2)    74.85%        0.91%         -             4.24%         90.91%
GRASP(G1)       83.03%        0.00%         60.30%        -             90.91%
TS+SA           0.30%         0.30%         0.30%         0.30%         -

Execution time: 50000 ms
                GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS+SA
GA              -             0.91%         0.91%         0.91%         36.97%
GRASP+PR(G6)    71.52%        -             36.06%        5.15%         89.39%
GRASP+PR(G2)    72.73%        0.91%         -             9.09%         76.06%
GRASP(G1)       72.73%        0.00%         31.82%        -             90.61%
TS+SA           0.91%         0.30%         0.30%         0.61%         -

Table G.5: Mean percentage of solutions improving any obtained by other tech. (Exp1). Each cell gives the mean percentage of runs of the row technique that are better than every run of the column technique.

G.4.4 Experiment #2: Selection of a technique for QoSWSCB (with a different objective function)

In order to ensure that the differences between our proposals and the previous approaches do not depend on the specific fitness function and problem instances used, we repeated the experiment using 11 additional problem instances and the objective function defined in [43]:

$$\min f_{Canf}(\chi) = \frac{\sum_{q \in Q^-} w_q \cdot U_q(Q_q(\chi))}{\sum_{q \in Q^+} w_q \cdot U_q(Q_q(\chi))} + w_{unf} \cdot D_f(\chi) \tag{G.9}$$

The results obtained in this experiment are shown in Table §G.6, using the same structure and notation as Table §G.4. GRASP+PR(G6) generates the best mean results for most problem instances. Specifically, for execution times of 500 ms, GRASP+PR(G6) provides the best average results for 8 out of 11 problem instances. TS+SA provided the best mean results for problem C2. This confirms that for tightly constrained problem instances it can perform better than GA and the GRASP-based proposals.
This result is coherent, since TS+SA prioritizes constraint satisfaction in the search process [166]. GRASP provided the best mean results for two problem instances (C5 and C6). Table §G.7 shows the mean percentages of improvement in the same way as Table §G.5. Again, GRASP+PR(G6) provided the highest percentages in general. The capability of GRASP(G1) to sporadically find the best results is confirmed by the results in Table §G.7. Moreover, the decreasing trend of the percentages of GRASP+PR(G6) as execution time increases is also noticeable. A notable difference with respect to Table §G.5 lies in the percentages of TS+SA: the performance of this technique is much better in this experiment. Thus, the performance of TS+SA is highly influenced by the specific objective function used to model the global utility. Statistical tests confirmed that the differences among the techniques were statistically significant in almost all cases. The only exception was the difference between GRASP+PR(G6) and TS+SA for problem C2 with execution times of 50000 ms. Figures §G.4 and §G.5 show the results of each technique for two different problem instances, with equation G.9 as the objective function and a termination criterion of 100 ms. Again, the distribution of GRASP+PR is the best in both figures.
Execution time: 100 ms
Problem  GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS/SA
C0       20.3294       18.1494       19.0278       18.8089       19.4567
C1       17546.9250    16798.8826    16892.6761    16883.4022    18028.5566
C2       77.4635       53.3737       69.3314       50.6206       47.2274
C3       365838.7379   354607.1630   357263.5720   357324.9570   381218.1130
C4       4660.0503     2688.8626     4032.7248     2817.3613     2758.6429
C5       43077.7130    40087.5854    41927.6160    39712.0757    39804.2874
C6       504.0981      353.8804      367.8332      347.8916      348.1642
C7       29445.2042    32899.8809    25257.0317    19163.0773    20070.3702
C8       623.4833      590.7976      604.8967      605.9958      653.1621
C9       141414.7664   129780.8750   135381.2160   133877.3010   144955.3870
C10      21682.8585    20345.8959    20448.2644    20392.9211    26421.6496

Execution time: 500 ms
Problem  GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS/SA
C0       20.2267       18.1300       18.7870       18.7038       19.4567
C1       17538.6257    16793.7861    16853.8274    16814.3450    18028.5566
C2       77.2953       50.7654       64.8281       50.6206       47.2274
C3       371234.6973   354238.9660   354854.3030   354498.9940   381218.1130
C4       4717.7005     2522.7860     3846.3891     2817.3613     2758.6429
C5       43167.4373    40087.5854    40992.9427    39712.0757    39804.2874
C6       512.1532      352.3514      368.4614      347.2431      348.1642
C7       27976.6404    16228.1583    22018.1279    19156.5452    20070.3702
C8       621.2869      565.9943      593.6641      591.7945      653.1621
C9       143803.8980   126129.2150   131166.6470   130685.0550   144955.3870
C10      21803.1250    20189.7876    20295.9563    20295.1878    26421.6496

Execution time: 1000 ms
Problem  GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS/SA
C0       20.2537       18.1294       18.7945       18.3892       19.4567
C1       17344.5197    16795.3229    16836.7681    16799.8261    18028.5566
C2       77.8725       49.1226       62.7276       50.6205       47.2274
C3       366237.9430   353935.1220   355399.0430   353959.6700   381218.1130
C4       4729.2815     2379.8780     3529.8844     2817.3613     2758.6429
C5       43157.9618    40039.7574    40893.9150    39712.0757    39804.2874
C6       499.4118      354.9664      368.2569      344.4944      348.1642
C7       28586.1747    15381.4457    19556.9182    18875.8913    20070.3702
C8       622.9089      556.9008      581.6285      574.9983      653.1621
C9       143082.5650   125785.1540   129145.8090   128143.6330   144955.3870
C10      21699.4216    20183.3250    20295.5048    20228.4809    26421.6496

Execution time: 50000 ms
Problem  GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS/SA
C0       20.5018       18.1271       18.7918       18.2799       19.4567
C1       17444.8214    16789.8868    16841.7142    16789.4079    18028.5566
C2       79.1106       47.3028       63.4201       50.6206       47.227387
C3       368806.7140   353530.0840   354685.5790   353517.1300   381218.1130
C4       4591.2618     2474.0530     3550.5512     2817.3613     2758.6429
C5       43269.8455    40228.1459    40794.3524    39712.0757    39804.2874
C6       513.9603      360.8437      377.2588      340.9694      348.1642
C7       28453.8342    14439.0012    19615.3699    18326.1133    20070.3702
C8       623.6269      557.2021      584.9948      568.9675      653.1621
C9       142111.7900   125284.279    129281.0100   127287.3410   144955.3870
C10      21672.8291    20204.3569    20282.9220    20213.8838    26421.6496

Table G.6: Means of objective function values per algorithm and execution time (Experiment #2)
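Equation G.9 can be sketched as follows. The utilities are passed in precomputed, and the weights of Experiment #1 are reused purely for illustration, since the weights used in Experiment #2 are not listed here; the function name is hypothetical.

```python
from typing import Dict, Iterable


def f_canf(utilities: Dict[str, float], weights: Dict[str, float],
           negative: Iterable[str], positive: Iterable[str],
           w_unf: float, d_f_value: float) -> float:
    """Equation G.9 (to be minimized): weighted Q- terms over weighted Q+ terms, plus penalty."""
    numerator = sum(weights[q] * utilities[q] for q in negative)
    denominator = sum(weights[q] * utilities[q] for q in positive)
    if denominator == 0:
        raise ZeroDivisionError("the Q+ term of equation G.9 is zero")
    return numerator / denominator + w_unf * d_f_value


# Illustrative values only: weights of Experiment #1 and a utility of 0.5 everywhere.
WEIGHTS = {"C": 0.3, "T": 0.3, "A": 0.1, "S": 0.2, "R": 0.1}
UTILITIES = {q: 0.5 for q in WEIGHTS}
print(f_canf(UTILITIES, WEIGHTS, ("C", "T"), ("A", "R", "S"), w_unf=0.5, d_f_value=0.0))
```

Unlike the weighted difference of equation G.5, this ratio grows without bound as its denominator approaches zero, which is consistent with the much larger values in Table G.6 than in Table G.4.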
Execution time: 200 ms
                GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS+SA
GA              -             0.00%         0.00%         0.00%         41.21%
GRASP+PR(G6)    94.85%        -             59.09%        13.94%        76.97%
GRASP+PR(G2)    62.73%        0.00%         -             1.21%         62.12%
GRASP(G1)       100.00%       7.88%         62.42%        -             90.91%
TS+SA           45.15%        9.09%         26.97%        9.09%         -

Execution time: 500 ms
                GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS+SA
GA              -             0.30%         0.30%         0.30%         42.73%
GRASP+PR(G6)    88.79%        -             74.24%        8.18%         73.33%
GRASP+PR(G2)    61.21%        0.00%         -             1.21%         56.06%
GRASP(G1)       90.61%        3.33%         73.94%        -             74.55%
TS+SA           36.06%        18.18%        36.36%        18.18%        -

Execution time: 1000 ms
                GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS+SA
GA              -             0.61%         0.61%         0.61%         35.15%
GRASP+PR(G6)    75.45%        -             59.39%        2.42%         72.12%
GRASP+PR(G2)    40.91%        0.30%         -             2.73%         54.55%
GRASP(G1)       75.45%        0.61%         57.88%        -             70.00%
TS+SA           36.06%        18.18%        27.27%        27.27%        -

Execution time: 50000 ms
                GA            GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)     TS+SA
GA              -             0.91%         0.91%         0.91%         35.45%
GRASP+PR(G6)    71.52%        -             56.97%        8.48%         66.97%
GRASP+PR(G2)    54.24%        0.00%         -             1.82%         48.79%
GRASP(G1)       72.42%        0.00%         56.06%        -             65.15%
TS+SA           26.97%        18.18%        18.48%        18.18%        -

Table G.7: Mean percentage of solutions improving any obtained by other tech. (Exp2). Each cell gives the mean percentage of runs of the row technique that are better than every run of the column technique.

One of the most surprising results of the study is the low dispersion of the results obtained by the TS/SA algorithm across executions, especially when using the objective function of equation G.9. We logged the executions of this technique in order to determine the root cause of these results. One of the causes was the acceptance criterion: according to [166], the probability of accepting a solution χ′ as the next current solution, given that the current solution is χ, is exp((F(χ) − F(χ′)) · iteration), where F is the fitness function. For functions with large-scale values, such as equation G.9, non-improving solutions are rarely chosen: when the difference in the objective function is large and the number of iterations increases, the probability of accepting them becomes extremely small.
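The effect of the objective scale on this acceptance rule can be made concrete with a short sketch. Minimization is assumed, the function name is hypothetical, and returning 1 for non-worsening moves (instead of the raw exponential, which would exceed 1) is our assumption about how [166] treats improvements.

```python
import math


def acceptance_probability(f_current: float, f_candidate: float, iteration: int) -> float:
    """Probability of moving from chi to chi' under exp((F(chi) - F(chi')) * iteration).

    Minimization is assumed; capping at 1 for non-worsening moves is an assumption.
    """
    exponent = (f_current - f_candidate) * iteration
    return 1.0 if exponent >= 0 else math.exp(exponent)


# The same iteration with a worsening of 0.01 (scale of Table G.4)
# and of 1000 (scale of Table G.6).
for delta in (0.01, 1000.0):
    print(delta, acceptance_probability(0.0, delta, iteration=10))
```

With a worsening of 0.01 the move is still accepted about 90% of the time at iteration 10, whereas a worsening of 1000 gives exp(−10000), which underflows to 0.0: at the scale of equation G.9, non-improving moves are practically never accepted.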
Therefore, the algorithm only chooses non-improving neighbours very early in the optimization process. Furthermore, for constrained problem instances the constraint-satisfaction criterion prevails, since it is applied before the optimization criterion. Under these circumstances, the neighbour generation subroutine tends to always use the same properties for improvement, thus leading the search to similar areas of the search space. Finally, the deterministic initial solution generation method (local optimization based initialization, in Table 4 of [166]) also contributes to generating nearly constant results.

Figure G.4: Results of each technique in Experiment #2 for problem instance 0
Figure G.5: Results of each technique in Experiment #2 for problem instance 9

G.5 THREATS TO VALIDITY

In order to clearly delineate the limitations of the experimental study, we next discuss internal and external validity threats.

Internal validity. This refers to whether there is sufficient evidence to support the conclusions, and to the sources of bias that could compromise them. In order to minimize the impact of external factors on our results, QoS-Gasp was executed 30 times per problem instance to compute averages. Moreover, statistical tests were performed to ensure the significance of the differences identified between the results obtained by the compared proposals.

External validity. This is concerned with how well the experiments capture the objectives of the research and the extent to which the conclusions drawn can be generalized. It can be divided into the limitations of the approach and the generalizability of the conclusions. Regarding the limitations, the experiments showed no significant improvements when comparing QoS-Gasp with a simple GRASP for small problem instances and short execution times.
This limitation is due to: (i) the capability of GRASP to explore a significant part of the search space, and (ii) the overlapping of the paths explored by PR for such small problem instances. Regarding the generalizability of the conclusions, two different objective functions and two different sets of problem instances were used. Additionally, the parameters and size of the problem instances were chosen from a survey of the most common values used in the literature (cf. the tables of problem instance parameters in [212] and [261]). Finally, the conclusions regarding the performance of QoS-Gasp are not generalizable to scenarios with longer execution times, which points to a direction for future work.

H GENERATION OF HARD FEATURE MODELS

The most effective way to do it, is to do it.
Amelia Earhart, 1897 – disappeared in 1937
American aviation pioneer and author

H.1 INTRODUCTION

Recent publications reflect an increasing interest in evaluating and comparing the performance of analysis techniques and tools on the analysis of feature models. One of the main challenges when assessing performance is to find hard problems that show the strengths and weaknesses of the tools under test in extreme situations (e.g., those producing the longest execution times). Feature models from real domains are by far the most appealing input problems. Unfortunately, although there are references to large-scale real feature models, only small examples from research publications or case studies are available. For instance, the largest feature model available in the SPLOT feature model repository [258] at the time of writing has 287 features. This lack of hard realistic feature models has led authors to evaluate their tools with large-scale randomly generated feature models of 5000, 10000 and up to 20000 features. More recently, some authors have also suggested looking for tough and realistic feature models in the open source community.
Regardless of the type of feature model used during experimentation, the characteristics of the tools under test are not considered in the current state of the art. As a result, current performance evaluations only provide a rough idea of the behaviour of tools with average problems rather than looking for specific weak points related to the type of technique or algorithm under evaluation. Hence, developers and users would probably be more interested in knowing whether their tool can crash with a hard realistic model of small or medium size than in knowing the execution times of huge random models outside their scope. The main goal of software testing is to find inputs that reveal errors in the software under test. The exhaustive search for these inputs is acknowledged to be infeasible due to the size and complexity of the programs: there are simply too many input combinations. As pointed out by McMinn [184], random testing is not a feasible solution: “random methods are unreliable and unlikely to exercise ‘deeper’ features of software that are not exercised by mere chance”. In this context, metaheuristic search techniques have proved to be a promising solution for the automated generation of test data for both functional [184] and non-functional properties [6]. Metaheuristic search techniques are frameworks which use heuristics to find solutions to hard problems at an affordable computational cost. Typical metaheuristic techniques are evolutionary algorithms, hill climbing and simulated annealing [280]. For the generation of test data, these strategies translate the test criteria into an objective function (also called fitness function) that is used to evaluate and compare the candidate solutions with respect to the overall search goal. Using this information, the search is guided toward promising areas of the search space. Wegener et al.
[288, 289] were among the first to propose using evolutionary algorithms to verify the time constraints of software, back in 1996. In their work, the authors used genetic algorithms to find input combinations that violate the time constraints of real-time systems, that is, those inputs producing an output too early or too late. Their experimental results showed that evolutionary algorithms are much more effective than random search in finding input combinations maximizing or minimizing execution times. Since then, a number of authors have followed in their steps, using metaheuristics and especially evolutionary algorithms for the testing of non-functional properties such as execution time, quality of service, security, usability or safety [6, 184]. Inspired by the ideas of Wegener and later authors, in this appendix we propose an evolutionary algorithm for the automated generation of hard feature models. In particular, we model the problem of finding computationally-hard feature models as an optimization problem and we solve it using a novel evolutionary algorithm for feature models. Given a tool and an analysis operation, our algorithm generates input models of a predefined size maximizing aspects such as the execution time or the memory consumption of the tool when performing the operation. For the evaluation of our approach, we performed several experiments using different analysis operations, paradigms, tools and optimization criteria. In total, we performed over 50 million executions of analysis operations for the configuration and evaluation of our approach. The results showed that our evolutionary program successfully identified input models causing much longer execution times and higher memory consumption than random models of identical or even larger size. Furthermore, the data revealed that the hard feature models found have similar properties to the realistic models found in the literature.
H.2 FEATURE MODELS
Feature models were first introduced as part of the Feature-Oriented Domain Analysis method (FODA) by Kang et al. back in 1990 [157]. In this first approach, three kinds of relationships between features were proposed:
• Mandatory. If a child feature is defined as mandatory, it must be included in all the products in which its parent feature is included. Mandatory features are generally modelled using a filled circle, as shown in Figure §H.1(a). For instance, according to the feature model of Figure §H.3, the software for mobile phones must include support for calls.
• Optional. If a child feature is defined as optional, it may or may not be included in the products in which its parent feature appears. Optional features are generally represented using an empty circle, as shown in Figure §H.1(b). For instance, support for multimedia devices (e.g. camera) is an optional feature in the feature model of Figure §H.3.
• Alternative. A set of child features is defined as alternative if exactly one of them must be selected when their parent feature is part of the product. Figure §H.1(c) depicts the usual visual representation of this relationship. As an example, according to the feature model of Figure §H.3, a mobile phone may use a Symbian or a WinCE operating system, but not both in the same product.
Notice that a child feature can only appear in a product if its parent feature does. The root feature is part of all the products within the software product line. In addition to the parental relationships between features, two kinds of cross-tree constraints between features were identified in FODA:
• Requires. If a feature A requires a feature B, the inclusion of A in a product implies the inclusion of B in that product. Requires constraints are commonly modelled using unidirectional arrows, as shown in Figure §H.2(a). For instance, according to the feature model of Figure §H.3, mobile phones including a camera require a HighDefinition screen.
• Excludes. If a feature A excludes a feature B, both features cannot be part of the same product. These constraints are visually represented using bidirectional arrows, as shown in Figure §H.2(b). As an example, the software product line represented by the model of Figure §H.3 rules out the possibility of offering support for GPS with a basic screen in the same product.

Figure H.1: Feature relationships ((a) Mandatory, (b) Optional, (c) Alternative)
Figure H.2: Cross-tree constraints ((a) Requires, (b) Excludes)

In order to clarify the concepts concerning basic feature models, we present an example. Figure §H.3 depicts a simplified feature model inspired by the mobile phone industry. The model illustrates how features are used to specify and build software for mobile phones. The software loaded in the phone is determined by the features that it supports. According to the model, all phones must include support for calls, messaging, and one screen (Basic, Colour or HighDefinition, but not two screens). Furthermore, the software for mobile phones may optionally include support for multimedia devices such as camera, MP3 or both of them.

Figure H.3: Mobile phone feature model (without cross-tree constraints)

H.3 ETHOM: AN EVOLUTIONARY ALGORITHM FOR FEATURE MODELS
ETHOM is a novel evolutionary algorithm for solving optimization problems on feature models. The algorithm takes several size constraints and a fitness function as input and returns a feature model of the given size maximizing the optimization criteria defined by the function.
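The FODA relationships and cross-tree constraints of Section H.2 define which feature sets are valid products. As an illustration only, the following Python sketch checks a candidate product against a simplified version of the mobile phone model. The dictionary layout, the `is_valid_product` name and the exact shape of the model (including the Camera requires HighDefinition and GPS excludes Basic constraints, taken from the examples in the text) are our own assumptions, not FaMa's model format.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified rendering of the mobile phone model (Figure H.3).
# Each parent maps to groups of (relationship kind, children).
MODEL: Dict[str, List[Tuple[str, List[str]]]] = {
    "MobilePhone": [
        ("mandatory", ["Calls"]),
        ("mandatory", ["Messaging"]),
        ("mandatory", ["Screen"]),
        ("optional", ["GPS"]),
        ("optional", ["Multimedia"]),
    ],
    "Screen": [("alternative", ["Basic", "Colour", "HighDefinition"])],
    "Multimedia": [("optional", ["Camera"]), ("optional", ["MP3"])],
}
ROOT = "MobilePhone"
REQUIRES = [("Camera", "HighDefinition")]  # A requires B
EXCLUDES = [("GPS", "Basic")]              # A excludes B


def is_valid_product(product: Set[str]) -> bool:
    """Check a set of selected features against the FODA semantics."""
    if ROOT not in product:
        return False  # the root feature is part of every product
    parent_of = {c: p for p, groups in MODEL.items() for _, cs in groups for c in cs}
    for feature in product - {ROOT}:
        # a child feature can only appear if its parent does
        # (unknown features have no parent and are rejected here)
        if parent_of.get(feature) not in product:
            return False
    for parent, groups in MODEL.items():
        if parent not in product:
            continue
        for kind, children in groups:
            chosen = sum(c in product for c in children)
            if kind == "mandatory" and chosen != len(children):
                return False
            if kind == "alternative" and chosen != 1:
                return False  # exactly one alternative child
    if any(a in product and b not in product for a, b in REQUIRES):
        return False
    return not any(a in product and b in product for a, b in EXCLUDES)
```

For example, a phone with calls, messaging and a colour screen is accepted, while adding a camera without a HighDefinition screen is rejected by the requires constraint.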
In Section §2.4.1, we described the general structure of an evolutionary algorithm and explained its basic steps. In the following, we describe how these basic steps are carried out in our algorithm.
Initial population. The initial population is generated randomly according to the size constraints received as input. The current version of our algorithm allows the user to specify the number of features, the percentage of cross-tree constraints and the maximum branching factor of the feature model to be generated.
Evaluation. Feature models are evaluated according to the fitness function received as input, yielding a numeric value that represents the quality of the candidate solution (i.e. its fitness).
Encoding. For the representation of feature models as individuals we propose a custom encoding. Existing encodings were ruled out since they were either not adequate to represent tree structures (e.g. binary encoding [16]) or not able to produce solutions of a fixed size (e.g. tree encoding [168]), a key requirement in our approach. Figure §H.4 depicts an example of our encoding. As illustrated, each model is represented by means of two arrays, one storing information about the tree and another one storing information about Cross-Tree Constraints (CTC). The position of each feature in the tree array corresponds to its position in the Depth-First Traversal (DFT) of the tree. Hence, the feature labelled with ‘0’ in the tree is stored in the first position of the array, the feature labelled with ‘1’ is stored in the second position, and so on. Each feature in the tree array is defined as a two-tuple <PR, C>, where PR is the type of relationship with its parent feature and C is the number of children of the given feature.

Figure H.4: Encoding of a feature model (M: Mandatory, Op: Optional, Or: Or-relationship, Alt: Alternative)
As an example, the first position in the tree array, <Op, 2>, indicates that the feature labelled with ‘0’ in the tree has an optional relationship with its parent feature and has two child features (those labelled with ‘1’ and ‘3’). Analogously, each position in the CTC array stores information about one constraint in the form <TC, O, D>, where TC is the type of constraint (R: Requires, E: Excludes) and O and D are the indexes of the origin and destination features in the tree array, respectively.
Selection. This step determines how the individuals of one generation are selected to be combined and produce new offspring. Selection strategies are generic and can be applied regardless of how the individuals are represented. In our algorithm, we experimented with both rank-based roulette-wheel and binary tournament selection strategies, obtaining positive results with both of them.
Crossover. Crossover techniques combine chromosomes to produce new individuals, analogously to biological reproduction. We tried two different crossover techniques in our algorithm with positive results: one-point and uniform crossover. Figure §H.5 depicts an example of the application of one-point crossover in our algorithm. The process starts by selecting two parent chromosomes to be combined. For each array in the chromosomes, the tree and CTC arrays, a random point is chosen (the so-called crossover point). Finally, the offspring is created by copying the content of the arrays from the beginning to the crossover point from one parent and the rest from the other one. Notice that the characteristics of our encoding guarantee a fixed size for the individuals.

Figure H.5: Example of one-point crossover in our algorithm

Mutation. In this step, random changes are applied to the chromosomes to prevent the algorithm from getting stuck prematurely at a locally optimal solution. Mutation operators must be specifically designed for the type of encoding used. In our algorithm, we defined four different types of custom mutation operators, namely:
• Operator 1. It randomly changes the type of a relationship in the tree array, e.g. from mandatory, <M, 3>, to optional, <Op, 3>.
• Operator 2. It randomly changes the number of children of a feature in the tree, e.g. from <M, 3> to <M, 5>. The new number of children is in the range [0, BF], where BF is the maximum branching factor indicated as input.
• Operator 3. It changes the type of a cross-tree constraint in the CTC array, e.g. from excludes, <E, 3, 6>, to requires, <R, 3, 6>.
• Operator 4. It randomly changes (with equal probability) the origin or destination feature of a constraint in the CTC array, e.g. from <E, 3, 6> to <E, 1, 6>. Origin and destination features are ensured to be different.
These operators are applied randomly with the same probability.
Decoding. At this stage, the array-based chromosomes are translated back into feature models in order to be evaluated. In our algorithm, we identified three types of patterns making a chromosome infeasible or semantically redundant, namely: (i) those encoding set relationships (or and alternative) with a single child feature (e.g. Figure §H.6(a)), (ii) those containing cross-tree constraints between features with a parental relationship (e.g. Figure §H.6(b)), and (iii) those containing features sharing contradictory or redundant cross-tree constraints (e.g. Figure §H.6(c)). The specific approach used to address infeasible individuals, replacement or repairing, depends mainly on the problem and is ultimately up to the user.
Survival. Finally, the next population is created by including all the new offspring plus those individuals from the previous generation that were selected for crossover but did not generate descendants because of the crossover probability.
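The one-point crossover and the four mutation operators described above can be sketched in Python over the two-array encoding. This is an illustrative sketch, not the ETHOM implementation (which was written in Java): the `Chromosome` class, the function names and the choice of drawing a new relationship type from the four types listed in Figure H.4 are our assumptions, and decoding and repair are left out.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative rendering of the encoding of Figure H.4 (names are ours).
Node = Tuple[str, int]             # <PR, C>: relation to parent, number of children
Constraint = Tuple[str, int, int]  # <TC, O, D>: "R"/"E", origin, destination index
RELATIONS = ["M", "Op", "Or", "Alt"]


@dataclass
class Chromosome:
    tree: List[Node]        # features in depth-first traversal order
    ctc: List[Constraint]   # cross-tree constraints


def one_point(a: list, b: list, rng: random.Random) -> Tuple[list, list]:
    """Head of one parent plus tail of the other; lengths are preserved."""
    k = rng.randint(0, len(a))
    return a[:k] + b[k:], b[:k] + a[k:]


def crossover(p1: Chromosome, p2: Chromosome,
              rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    t1, t2 = one_point(p1.tree, p2.tree, rng)  # independent point per array
    c1, c2 = one_point(p1.ctc, p2.ctc, rng)
    return Chromosome(t1, c1), Chromosome(t2, c2)


def mutate(ch: Chromosome, max_bf: int, rng: random.Random) -> Chromosome:
    """Apply one of the four operators, chosen with equal probability."""
    tree, ctc = list(ch.tree), list(ch.ctc)
    op = rng.randint(1, 4)
    if op >= 3 and not ctc:
        op = rng.randint(1, 2)  # no constraints to mutate
    if op == 1:    # Operator 1: change the relationship type
        i = rng.randrange(len(tree))
        pr, c = tree[i]
        tree[i] = (rng.choice([r for r in RELATIONS if r != pr]), c)
    elif op == 2:  # Operator 2: new number of children in [0, BF]
        i = rng.randrange(len(tree))
        tree[i] = (tree[i][0], rng.randint(0, max_bf))
    elif op == 3:  # Operator 3: requires <-> excludes
        j = rng.randrange(len(ctc))
        tc, o, d = ctc[j]
        ctc[j] = ("E" if tc == "R" else "R", o, d)
    else:          # Operator 4: move origin or destination, keeping O != D
        j = rng.randrange(len(ctc))
        tc, o, d = ctc[j]
        if rng.random() < 0.5:
            o = rng.choice([f for f in range(len(tree)) if f != d])
        else:
            d = rng.choice([f for f in range(len(tree)) if f != o])
        ctc[j] = (tc, o, d)
    return Chromosome(tree, ctc)
```

Because both parents have arrays of the same length, every child keeps the same number of features and constraints, which is the fixed-size property the encoding was designed to guarantee.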
Figure H.6: Examples of infeasible individuals and repairs

H.3.1 Instantiation of the algorithm
In this section, we propose to model the problem of finding computationally-hard feature models as an optimization problem and to solve it using an instantiation of our evolutionary algorithm. We chose evolutionary computation because it has proved to be a robust search technique suited to the complex search spaces and noisy objective functions found when dealing with non-functional properties [6]. A key benefit of our approach is that it takes into account the characteristics of the tools under test, trying to exploit their vulnerabilities. Also, our approach is generic: it is applicable to any automated operation on feature models, not only analyses, in which the quality (i.e. fitness) of the models can be measured quantitatively. Next, we clarify the main aspects of the configuration of our algorithm:
• Fitness function. Our first attempt was to measure the execution time in milliseconds invested by FaMa to perform the operation. However, we found that this was very inaccurate, since the result of the function was deeply affected by the system load, i.e. it was not deterministic. To solve this problem, we decided to measure the fitness of a feature model as the number of backtracks produced by the analysis tool during its analysis. A backtrack represents a partial candidate solution to a problem that is discarded because it cannot be extended to a full valid solution [271]. Unlike the execution time, the number of backtracks is deterministic for most CSP backtracking heuristics. Together with execution time, the number of backtracks is commonly used to measure the complexity of constraint satisfaction problems [271]. Thus, we may assume that the higher the number of backtracks, the longer the computation time.
• Infeasible individuals.
We evaluated the effectiveness of both replacement and repairing techniques and we finally opted for the latter. More specifically, we used the following repairing algorithm with infeasible individuals: (i) isolated set relationships are converted into optional relationships (e.g. the model in Figure §H.6(a) is changed as in Figure §H.6(d)), (ii) cross-tree constraints between features with parental relationships are removed (e.g. the model in Figure §H.6(b) is changed as in Figure §H.6(e)), and (iii) two features cannot share more than one constraint (e.g. the model in Figure §H.6(c) is changed as in Figure §H.6(f)).
• Stop criteria. There is no means of deciding when an optimum input has been found and the evolutionary algorithm should be stopped [289]. Therefore, we decided to allow the algorithm to continue for a given number of executions of the fitness function, taking the largest number of backtracks obtained as the optimum, i.e. the solution to the problem.

H.4 EXPERIMENTS ON THE GENERATION OF HARD FEATURE MODELS
In order to evaluate our approach, we developed a prototype implementation of our algorithm in Java. With the aim of finding a suitable tailoring and tuning of our algorithm, we performed numerous executions of a sample optimization problem, evaluating different combinations of values for the variants and key parameters of the algorithm, presented in Tables §H.1 and §H.2 respectively. The optimization problem was to find a feature model maximizing the execution time invested by the analysis tool when checking whether the model is void (i.e., whether it represents at least one product). We chose this analysis operation because it is currently the most quoted in the literature [26]. In particular, we looked for feature models of different sizes maximizing execution time in the CSP solver JaCoP integrated into the FaMa framework v1.0. We chose FaMa mainly because of our familiarity with the tool.
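To see why the number of backtracks is a deterministic fitness measure, consider a naive chronological backtracking search that decides whether a feature model is void. The Python sketch below is our own simplification, not JaCoP or FaMa: it translates the relationships of Section H.2 into boolean clauses (a common propositional mapping; the function names and integer feature identifiers are hypothetical), always assigns variables in the same order trying `True` first, and counts every discarded partial assignment as a backtrack. Given the same model, it always returns the same count, unlike a wall-clock measurement.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Clause = List[int]  # DIMACS-style literals: +v means "feature v selected"


def feature_model_clauses(root: int,
                          mandatory: List[Tuple[int, int]],
                          optional: List[Tuple[int, int]],
                          alternative: List[Tuple[int, List[int]]],
                          requires: List[Tuple[int, int]],
                          excludes: List[Tuple[int, int]]) -> List[Clause]:
    clauses: List[Clause] = [[root]]                     # root in every product
    for p, c in mandatory:
        clauses += [[-c, p], [-p, c]]                    # c <-> p
    for p, c in optional:
        clauses.append([-c, p])                          # c -> p
    for p, cs in alternative:
        clauses += [[-c, p] for c in cs]                 # child -> parent
        clauses.append([-p] + cs)                        # parent -> some child
        clauses += [[-a, -b] for a, b in combinations(cs, 2)]  # at most one
    clauses += [[-a, b] for a, b in requires]            # a -> b
    clauses += [[-a, -b] for a, b in excludes]           # not (a and b)
    return clauses


def count_backtracks(n_vars: int, clauses: List[Clause]) -> Tuple[bool, int]:
    """Chronological backtracking over variables 1..n_vars (True tried first).

    Returns (satisfiable, backtracks); the model is void iff not satisfiable.
    """
    assignment: Dict[int, bool] = {}
    backtracks = 0

    def violated() -> bool:
        # a clause is violated when every literal is assigned and false
        return any(all(abs(l) in assignment and assignment[abs(l)] != (l > 0)
                       for l in clause) for clause in clauses)

    def search(var: int) -> bool:
        nonlocal backtracks
        if var > n_vars:
            return True
        for value in (True, False):
            assignment[var] = value
            if not violated() and search(var + 1):
                return True
            backtracks += 1          # partial solution discarded
            del assignment[var]
        return False

    return search(1), backtracks
```

For instance, a root (1) with two mandatory children (2 and 3) that exclude each other is reported as void after 6 backtracks, and the same count is obtained on every run. A real CSP heuristic such as MostConstrainedDynamic would generally explore the variables in a different order and therefore produce a different count for the same model.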
Table H.1: Algorithm tailoring, experiment ETHOM #A1
Tailoring point | Variants evaluated and selected
Selection strategy | Roulette-wheel, 2-Tournament
Crossover strategy | One-point, Uniform
Infeasible individuals | Replacing, Repairing

Table H.2: ETHOM tuning, experiment ETHOM #A1
Parameter | Values evaluated and selected
Crossover probability | 0.7, 0.8, 0.9
Mutation probability | 0.0075, 0.005, 0.02
Size of initial population | 50, 100, 200
#Executions of fitness function | 2000, 5000

Underlined values were those providing better results and therefore those selected for the final configuration of ETHOM. In total, we performed over 40 million executions of the objective function to find a good setup for our algorithm (taking into account experiments #A1 and #A2).

Experiment #1(a): Maximizing execution time in a CSP solver
In this experiment, we evaluated the ability of ETHOM to search for input feature models maximizing the analysis time of a solver. In particular, we measured the execution time required by a CSP solver to find out whether the input model is consistent (i.e., whether it is void or not). This was the same problem used to tune the configuration of ETHOM. Again, we chose the consistency operation because it is currently the most used in the literature. Next, we present the experimental description of this experiment in SEDL4People. We define the effectiveness of our evolutionary program as the percentage of times (out of 20) in which the program found a better optimum than random models, i.e. a higher number of backtracks. The effectiveness of evolutionary programming was over 90% in most of the cases, reaching 100% in six of them. Overall, our evolutionary program found harder feature models than those generated randomly in 88.75% of the executions. We remark that ETHOM showed its lowest effectiveness with models containing 10% of cross-tree constraints.
This is due to the simplicity of the analysis of these models. The number of backtracks produced by these models was very low, zero in most of the cases, and thus our evolutionary program had problems finding promising individuals that could evolve towards optimal solutions. Table §H.3 depicts the evaluation results for the range of feature models with 20% of cross-tree constraints. For each number of features and search technique, random and evolutionary, the table shows the average and maximum fitness obtained as well as the average and maximum execution times of the hardest feature models found. The effectiveness of the evolutionary program is also presented in the last column. As illustrated, the evolutionary program found feature models producing a number of backtracks several orders of magnitude larger than those produced by random models. The fitness of the hardest models generated using our evolutionary approach was on average about 1,950 times higher than that of random models (108,579.13 backtracks against 55.63) and about 2,430 times higher in the maximum value (6.6 million backtracks against 2,751). As expected, these results were also reflected in the execution times. On average, the CSP solver invested 0.03 seconds to analyse the random models and 5.35 seconds to analyse those generated using our evolutionary generator. The superiority of evolutionary search was especially remarkable in the maximum times, ranging from the 0.23 seconds of random models to the 251.45 seconds (4.2 minutes) invested by the CSP solver to analyse the hardest feature model generated by our evolutionary program. Overall, our evolutionary approach produced a harder feature model than random techniques in 94% of the executions in the range of 20% of constraints.

Table H.3: Evaluation results on the generation of feature models maximizing execution time in a CSP solver (20% of cross-tree constraints)
#Feat. | Random: Avg fitness | Max fitness | Avg time (s) | Max time (s) | Evolutionary: Avg fitness | Max fitness | Avg time (s) | Max time (s) | Effectiveness (%)
200 | 6.45 | 15 | 0.01 | 0.01 | 310.25 | 2,122 | 0.02 | 0.06 | 90
400 | 13.20 | 37 | 0.02 | 0.02 | 8,028.85 | 153,599 | 0.28 | 4.80 | 95
600 | 29.50 | 223 | 0.03 | 0.06 | 8,765.65 | 118,848 | 0.67 | 7.33 | 100
800 | 53.95 | 304 | 0.05 | 0.06 | 346,217.95 | 6,678,168 | 13.19 | 251.45 | 95
1,000 | 175.05 | 2,751 | 0.07 | 0.23 | 179,572.95 | 3,167,253 | 12.58 | 208.91 | 90
Total | 55.63 | 2,751 | 0.03 | 0.23 | 108,579.13 | 6,678,168 | 5.35 | 251.45 | 94

A global summary of the results is presented in Table §H.4. The table depicts the maximum execution times invested by the CSP solver to analyse the hardest models found using random and evolutionary search. The data show that our approach was more effective than random models in all size ranges. The hardest random model required 0.23 seconds to be processed. In contrast, our evolutionary approach found five models requiring between 1 and 4.2 minutes to be analysed. Interestingly, our algorithm was able to find harder but significantly smaller feature models (between 400 and 800 features) than the hardest random model found (1,000 features). This emphasizes the ability of our approach to generate motivating input models of realistic size that reveal the vulnerabilities of tools and heuristics, instead of just running them on large random models.

Table H.4: Maximum execution times (s) produced by random models (Rand.) and our evolutionary program (EA)
#Feat. | 10% CTC Rand. | 10% CTC EA | 20% CTC Rand. | 20% CTC EA | 30% CTC Rand. | 30% CTC EA | 40% CTC Rand. | 40% CTC EA
200 | 0.01 | 0.05 | 0.01 | 0.06 | 0.02 | 0.14 | 0.01 | 0.02
400 | 0.07 | 0.22 | 0.02 | 4.80 | 0.02 | 1.02 | 0.02 | 0.10
600 | 0.05 | 1.78 | 0.06 | 7.33 | 0.03 | 4.79 | 0.03 | 7.25
800 | 0.05 | 62.30 | 0.06 | 251.45 | 0.05 | 250.95 | 0.05 | 0.35
1,000 | 0.10 | 3.43 | 0.23 | 208.91 | 0.07 | 84.99 | 0.06 | 0.49
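The effectiveness measure defined for Experiment #1(a) and the summary figures of Table H.3 can be recomputed directly. In the Python sketch below, the `effectiveness` helper assumes that evolutionary and random runs are paired by index and that ties do not count as wins (both assumptions of ours). The remaining lines copy the 20% CTC rows of Table H.3 and derive its Total row, giving improvement factors of roughly 1,952 on the average fitness and 2,428 on the maximum fitness, and a mean effectiveness of 94%.

```python
from statistics import mean
from typing import Sequence


def effectiveness(evo_best: Sequence[float], rnd_best: Sequence[float]) -> float:
    """Percentage of paired runs where the evolutionary optimum beats random."""
    if len(evo_best) != len(rnd_best) or not evo_best:
        raise ValueError("both techniques need the same, non-zero number of runs")
    wins = sum(e > r for e, r in zip(evo_best, rnd_best))  # ties are not wins
    return 100.0 * wins / len(evo_best)


# Rows of Table H.3 (20% cross-tree constraints), for 200 to 1,000 features.
random_avg = [6.45, 13.20, 29.50, 53.95, 175.05]
random_max = [15, 37, 223, 304, 2_751]
evo_avg = [310.25, 8_028.85, 8_765.65, 346_217.95, 179_572.95]
evo_max = [2_122, 153_599, 118_848, 6_678_168, 3_167_253]
effectiveness_pct = [90, 95, 100, 95, 90]

avg_ratio = mean(evo_avg) / mean(random_avg)  # Total row: 108,579.13 vs 55.63
max_ratio = max(evo_max) / max(random_max)    # Total row: 6,678,168 vs 2,751

print(f"average fitness ratio: x{avg_ratio:,.0f}")
print(f"maximum fitness ratio: x{max_ratio:,.0f}")
print(f"mean effectiveness: {mean(effectiveness_pct)}%")
```

With 20 runs per configuration, for example, 18 wins give an effectiveness of 90%.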
H.4.1 Experiment #1(b): Maximizing execution time in a SAT solver
The only difference between this experiment and experiment #1(a) is the solver used for the analysis (parameter Solver: ‘FAMA-SAT’). Consequently, we omit the experimental description. Regarding the results, the differences in the execution times obtained using random and evolutionary techniques were not significant. This finding supports the results of Mendonça et al. [187], which show that checking the consistency of feature models with simple cross-tree constraints (i.e. those involving three features or fewer) using SAT solvers is highly efficient. We emphasize, however, that SAT solvers are not the optimal solution for all the analyses that can be performed on a feature model [26]. Previous studies show that CSP and BDD solvers are often a better alternative than SAT solvers, and therefore experiments with these and other solvers are still necessary to study their applicability. The complete set of results, summarized in tables and figures showing the effectiveness of ETHOM for this experiment, is available in [246].

H.4.2 Experiment #2: Maximizing memory consumption in a BDD solver
In this experiment, we evaluated the ability of our evolutionary program to generate input feature models maximizing the memory consumption of a solver. In particular, we measured the memory consumed by a BDD solver when finding out the number of products represented by the model. We chose this analysis operation because it is one of the hardest operations in terms of complexity and it is currently the second most quoted operation in the literature [26]. We decided to use a BDD-based reasoner for this experiment since it has proved to be the most efficient option to perform this operation [26].
Although it is possible to find a good variable ordering that reduces the size of the BDD, the problem of finding the best variable ordering remains NP-complete. Table §H.5 depicts the number of BDD nodes of the hardest feature models found using random techniques and our evolutionary program. For each size range, the table also shows the computation time (BDD building time + execution time) invested by SPLOT to analyse the model. As illustrated, our evolutionary program found better results than random techniques in all size ranges. On average, the BDD size found by our evolutionary approach was between 2 and 12.5 times higher than that obtained with random models. The largest BDD generated from random models had 25.3 million nodes, while the largest BDD obtained using our evolutionary program had 27.9 million nodes. The results suggest, however, that the maximum found by evolutionary search would have been much higher had we not limited the improvement factor in the range of 250 features (30% constraints) to make the experiment affordable. As expected, the superiority of our evolutionary program was also observed in the computation times required by each model to be compiled and analysed. This suggests that our approach can also deal with optimization criteria involving compilation time. Overall, our evolutionary program found feature models producing higher memory consumption than random models in 99.3% of the executions.

Table H.5: BDD size and computation time of the hardest feature models found
#Feat. | CTC | Random BDD size | Random time (s) | Evolutionary BDD size | Evolutionary time (s)
50 | 10% | 781 | 0 | 1,963 | 0
100 | 10% | 7,629 | 0.01 | 20,077 | 0.02
150 | 10% | 65,627 | 0.10 | 188,985 | 0.31
200 | 10% | 203,041 | 0.09 | 924,832 | 0.86
250 | 10% | 1,720,983 | 3.69 | 7,170,121 | 25.94
50 | 20% | 2,074 | 0 | 8,252 | 0.01
100 | 20% | 33,522 | 0.03 | 161,157 | 0.20
150 | 20% | 374,675 | 0.91 | 3,060,590 | 12.80
200 | 20% | 2,735,005 | 4.34 | 19,698,780 | 75.05
250 | 20% | 25,392,597 | 82.28 | 27,970,630 | 253.32
50 | 30% | 2,455 | 0.01 | 10,992 | 0.01
100 | 30% | 95,587 | 0.08 | 419,835 | 0.73
150 | 30% | 673,410 | 1.28 | 11,221,303 | 24.22
200 | 30% | 3,394,435 | 58.22 | 23,398,161 | 380.52
250 | 30% | 20,579,015 | 343.72 | 22,310,416 | 431.62

Figure §H.7 shows the frequency with which each fitness value was found during the search for a feature model producing the largest BDD. The data presented correspond to the hardest feature models generated in the range of 50 features and 10% of cross-tree constraints. We chose this size range because it produced the smallest BDD sizes and facilitated the comparison of the results of both techniques using the same scale. For random models (Figure §H.7(a)), a narrow Gaussian-like curve is obtained, with more than 99% of the executions producing fitness values under 300 BDD nodes. During evolutionary execution (Figure §H.7(b)), however, a wider curve is obtained, with 39% of the executions producing values over 300 nodes. Both histograms clearly show how evolutionary programming performed a more exhaustive search in a larger portion of the solution space than that explored by random models. This trend was also observed in the rest of the size ranges. During this experiment, we found that the fitness function was not deterministic, that is, different executions with the same input feature model produced different numbers of BDD nodes.
We found, however, that the variations in the number of nodes were small and did not affect the effectiveness of our evolutionary program.

H.4.3 Experiment #3(a): Evaluating the impact of the number of generations on the effectiveness of ETHOM (with JaCoP)
During the work with ETHOM, we detected that the maximum number of generations used as stop criterion had a great impact on the results of the algorithm. In this experiment, we evaluated that impact with a double aim. First, we tried to find out the minimum number of generations required by ETHOM to offer better results than random techniques in the search for hard feature models. Second, we wanted to find out whether ETHOM was able to find even harder models than in our previous experiments when allowed to run for a large number of generations. Next, we present the experimental description of our experiment in SEDL4People and its results. For each number of generations (i.e., stop criterion), the maximum fitness and the effectiveness of both random and evolutionary search are presented. The results revealed that the effectiveness of ETHOM was around 96% (CSP solver) and 100% (BDD solver) when the number of generations was 25 or higher. More importantly, we found that the results provided by evolutionary search kept improving as the number of generations was increased, without reaching a clear peak, whereas the results of random search showed little or no improvement at all. In the execution with the CSP solver, ETHOM produced a new maximum fitness of more than 77 million backtracks, whereas random search found a maximum value of only 1,603 backtracks.

Figure H.7: Distribution of fitness values for random and evolutionary search ((a) random models; (b) evolutionary search)
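The summaries read off Figure H.7 (for example, the share of executions above 300 BDD nodes) can be computed from the fitness values recorded during a search. The Python sketch below assumes those values are available as a list of integers and reads "over 300" as strictly greater; the function names are ours and the sample values are invented for illustration, not data from the study.

```python
from collections import Counter
from typing import Dict, Sequence


def share_above(values: Sequence[int], threshold: int) -> float:
    """Percentage of fitness evaluations strictly above `threshold`."""
    if not values:
        raise ValueError("no fitness values recorded")
    return 100.0 * sum(v > threshold for v in values) / len(values)


def histogram(values: Sequence[int], bin_width: int) -> Dict[int, int]:
    """Frequency of fitness values per bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical BDD sizes recorded during one search (not data from the study).
sample = [120, 250, 310, 480, 299]
print(share_above(sample, 300))   # 40.0
print(histogram(sample, 100))     # {100: 1, 200: 2, 300: 1, 400: 1}
```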
H.4.4 Experiment #3(b): Evaluating the impact of the number of generations on the effectiveness of ETHOM (with BDD)
The only difference between this experiment and experiment #3(a) is the solver used for the analysis (parameter Solver: ‘SPLOT-BDD’) and the analysis operation, which was the number of products. Consequently, we omit the experimental description. As in the previous experiment, the maximum random fitness produced in experiment #3(b), 89,779 nodes, was far from the best fitness obtained by our evolutionary program, 22.7 million nodes. Finally, we emphasize that the maximum number of BDD nodes found by ETHOM in the range of 125 generations (22.2 million nodes) was 120 times higher than the maximum obtained when using 25 generations as stop criterion (185,203 nodes). This shows the power of ETHOM when it is allowed to run for a large number of generations.

H.4.5 Experiment #4: Evaluating the impact of the heuristics of JaCoP
In this experiment, we checked whether the hard feature models generated by our evolutionary approach were also hard for solvers using other heuristics. In particular, we repeated the analysis of the hardest feature models found in experiment #1 using the other seven heuristics available in the CSP solver JaCoP. The results revealed that the hardest feature models found in our experiment, using the heuristic MostConstrainedDynamic, were trivially solved by some of the other heuristics. This finding supports our working hypothesis: feature models that are hard to analyse by one tool or technique could be trivially processed by others and vice versa. Hence, we conclude that using a standard set of problems, random or not, is not sufficient for a full evaluation of the performance of different tools.
Instead, as in our approach, the characteristics of the techniques and tools under evaluation must be carefully examined to identify their strengths and weaknesses, providing helpful information for both users and developers. Next, we present the experimental description of our experiment in SEDL4People.

H.5 THREATS TO VALIDITY
We briefly discuss the threats to validity of our work:
• Experimental procedure. In order to ensure the validity of the experimental approach, experiments were performed in a randomized order on the same computer and were replicated 25 times for each experimental configuration. Additionally, the results were formally validated by means of statistical tests that clearly showed the superiority of our algorithm when compared to random search. More specifically, as detailed in [246], Wilcoxon tests were performed on the results obtained with random and evolutionary search.
• Limitations of the approach. Experiments showed no significant improvements when using our algorithm with problems of low complexity, i.e., feature models with 10% of constraints in Experiment #1.
This limitation is due to the extremely flat shape of fitness landscape found in simple problems in which most fitness values are equal or close to zero. Another limitation of the experimental approach is that experiments for extremely hard feature models become unfeasible. We may remark, however, that this limitation is intrinsic to the problem of looking hard feature models and thus it also affects to random search. Finally, we emphasize that in the worst case our algorithm behave randomly equalling the strategies for the generation of hard feature models used in the current state of the art. • Generalizability of the conclusions. In our experiments, we used two different analysis operations which could seem not sufficient to generalize the conclusions of our study. We remark, however, that these operations are currently the most quoted in the literature, have significantly different complexity and, more importantly, are the base for the implementation of many other analysis operations on feature models [26]. Thus, feature models that are hard to analyse for these operations would certainly be hard to analyse by those operations that use them as an auxiliary function making our results extensible to other analyses. Similarly, we just used two different analysis tools for the experiments, FaMa and SPLOT. We remark, however, that these tools are developed and maintained by independent laboratories providing the sufficient degree of heterogeneity for our study. Finally, the results obtained reveal that the shape and properties of the hard feature models generated are similar to those found in the literature and therefore there is no threat to validity due to the lack of realism of the generated models. 
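The Wilcoxon tests mentioned under "Experimental procedure" compare the final fitness values of independent runs of random and evolutionary search. The following is a minimal sketch of a two-sided Wilcoxon rank-sum (Mann–Whitney U) test, assuming independent samples and using the normal approximation with a tie correction and no continuity correction. It is not the implementation used in [246]; the function names and the sample values in the demo are hypothetical and purely illustrative.

```python
import math
from typing import List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U statistic of sample x, two-sided p-value) using the normal
    approximation with tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        # Every observation is tied: no evidence of a difference.
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


if __name__ == "__main__":
    # Synthetic values for 25 replications each (not results from this dissertation).
    evolutionary = [100 + i for i in range(25)]
    random_search = [i for i in range(25)]
    u, p = wilcoxon_rank_sum(evolutionary, random_search)
    print(f"U = {u}, p = {p:.3g}")
```

A p-value below the chosen significance level (e.g. 0.05) suggests that the two samples come from different distributions, and comparing U with n1*n2/2 indicates which sample tends to be larger. For small samples, an exact test such as `scipy.stats.mannwhitneyu(..., method="exact")` may be preferable; if runs were paired, the signed-rank variant would apply instead.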
I EVIDENCES OF UTILITY AND APPLICABILITY
I.1 UTILITY OF THE COMPARATIVE FRAMEWORK FOR MOFS
While developing the Comparative Framework (CF) of MOFs, we gathered letters from MOF authors stating the usefulness of the CF for deciding which features to include in their next releases.
Figure I.1: Support letter from University of Tübingen
Figure I.2: Support letter from University of Applied Sciences Upper Austria
I.2 UTILITY OF MOSES[RI]
I.2.1 Utility of FOM
FOM has been used for almost a decade to solve optimization problems. Unfortunately, the number of downloads and active users has not been tracked over these years. However, there is evidence of its use to solve traffic planning problems in industry. Additionally, we attach an expression of interest from a local engineering company.
Figure I.3: Expression of interest in FOM from ISOIN
I.2.2 Utility of STATService
Since STATService has a web interface, its usage can be tracked. Figures I.4 and I.5 show the distribution of visits over time and across countries, respectively.
[Google Analytics audience overview for labs.isa.us.es:8080/statservice, Dec 1, 2011 to Jul 8, 2013: 2,391 visits by 1,451 unique visitors (60.2% new, 39.8% returning), 6,633 pageviews, 2.77 pages per visit, average visit duration 00:03:03, bounce rate 69.76%. Top countries by visits: Spain (1,180; 49.35%), United States (166; 6.94%), Italy (156; 6.52%), India (79; 3.30%), United Kingdom (58; 2.43%).]
Figure I.4: Timeline of individual visitors to the STATService web portal
[Google Analytics map overlay for labs.isa.us.es:8080/statservice, Nov 1, 2011 to Jul 8, 2013: 2,483 visits from 96 countries/territories. Top countries by visits: Spain (1,268), United States (166), Italy (157), India (80), United Kingdom (58), Brazil (44), Indonesia (40), Australia (35), Malaysia (32) and France (26).]
Figure I.5: Map of visits to the STATService web portal
I.2.3 Utility of EEE
E3 is currently in beta and has not been released except for testing purposes and for validating the contributions of this dissertation.
J ACRONYMS
EEE: Experiment Execution Environment.
ETHOM: Evolutionary algoriTHm for Optimized feature Models.
FOM: Framework for Metaheuristic Optimization.
MIaMOE: Minimum Information about Metaheuristic Optimization Experiments.
MOE: Metaheuristic Optimization Experiment.
MOEDL: Metaheuristic Optimization Experiments Description Language.
MOF: Metaheuristic Optimization Framework.
MOSES: Metaheuristic Optimization Software EcoSystem.
MOSES[RI]: Reference Implementation of the Core of MOSES.
MPS: Metaheuristic Problem Solving.
QoS-Gasp: QoS-aware GRASP+PR algorithm for service-based applications binding.
QoSWSCB: QoS-aware Web Service Composition Binding.
SEA: Scientific Experiment Archive.
SEDL: Scientific Experiments Description Language.
BIBLIOGRAPHY
[1] TSPLIB benchmark library. Accessible at: http://www.iwr.uni-heidelberg.de/iwr/comopt/soft/TSPLIB95/TSPLIB.html. (page 160).
[2] The R project. GNU project, 2013. URL http://www.r-project.org/. (page 86).
[3] E. Aarts and J. Lenstra. Local Search in Combinatorial Optimization. Wiley, 1997. (pages 102, 108).
[4] P. Achinstein. Scientific Evidence: Philosophical Theories and Applications. Scientific Evidence. Johns Hopkins University Press, 2005. ISBN 9780801881183. URL http://books.google.es/books?id=xTN7NGO52OoC. (page 253).
[5] D. H. Ackley. A connectionist machine for genetic hillclimbing. Kluwer Academic Publishers, Norwell, MA, USA, 1987. ISBN 0-89838-236-X. (pages 34, 109).
[6] W. Afzal, R. Torkar, and R. Feldt. A systematic review of search-based testing for non-functional system properties. Information and Software Technology, 51(6):957–976, 2009. ISSN 0950-5849. doi: 10.1016/j.infsof.2008.12.005. (pages 306, 312).
[7] R. Aggarwal, K. Verma, J. Miller, and W. Milnor. Constraint driven web service composition in METEOR-S. In SCC ’04: Proceedings of the 2004 IEEE International Conference on Services Computing, pages 23–30, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7695-2225-4. (page 286).
[8] E. Alba and J. F. Chicano. Software project management with GAs. Information Sciences, 177(11):2380–2401, 2007. ISSN 0020-0255. doi: 10.1016/j.ins.2006.12.020. URL http://www.sciencedirect.com/science/article/B6V0C-4MTK976-2/2/8570bc12b346047bd32fed96dc473c3c. (page 98).
[9] B. Andresen and J. M. Gordon. Constant thermodynamic speed for minimizing entropy production in thermodynamic processes and simulated annealing. Phys. Rev. E, 50(6):4346–4351, Dec 1994. doi: 10.1103/PhysRevE.50.4346. (page 103).
[10] T. Andrews, F. Curbera, H. Dholakia, Y. Goland, J. Klein, F. Leymann, K. Liu, D. Roller, D. Smith, S. Thatte, I. Trickovic, and S. Weerawarana.
BPEL4WS specification, 2003. (page 279).
[11] P. J. Angeline, D. B. Fogel, and L. J. Fogel. A comparison of self-adaptation methods for finite state machines in a dynamic environment. In Proc. 5th Ann. Conf. on Evolutionary Programming, pages 441–450, 1996. (page 110).
[12] Apache. The statistical package of the Apache Commons Math library. http://commons.apache.org/math/userguide/stat.html, 2011. (pages 87, 90, 196).
[13] J. Arabas, Z. Michalewicz, and J. Mulawka. GAVaPS-a genetic algorithm with varying population size. In Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence., Proceedings of the First IEEE Conference on, pages 73–78 vol.1, Jun 1994. doi: 10.1109/ICEC.1994.350039. (page 104).
[14] D. Ardagna and B. Pernici. Global and local QoS guarantee in web service selection. In Business Process Management Workshops, pages 32–46, 2005. (pages 280, 286, 287).
[15] D. Ardagna and B. Pernici. Adaptive service composition in flexible processes. Software Engineering, IEEE Transactions on, 33(6):369–384, 2007. (pages 280, 283, 284, 285, 286, 288).
[16] T. Back, D. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. IOP Publishing Ltd., Bristol, UK, 1997. ISBN 0750303921. (page 309).
[17] T. Back, D. B. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. IOP Publishing Ltd., Bristol, UK, 1997. ISBN 0750303921. URL http://portal.acm.org/citation.cfm?id=548530. (pages 32, 33, 34, 102, 104, 108, 109, 110).
[18] J. Baker. Reducing bias and inefficiency in the selection algorithm. In Proceedings of the Second International Conference on Genetic Algorithms, pages 14–21, 1987. (page 112).
[19] P. Balaprakash, M. Birattari, and T. Stützle. Improvement strategies for the F-Race algorithm: sampling design and iterative refinement. In Proceedings of the 4th international conference on Hybrid metaheuristics, HM’07, pages 108–122, Berlin, Heidelberg, 2007. Springer-Verlag.
ISBN 3-540-75513-6, 978-3-540-75513-5. URL http://dl.acm.org/citation.cfm?id=1777124.1777133. (pages 6, 75, 76).
[20] R. Barga, J. Jackson, N. Araujo, D. Guo, N. Gautam, and Y. Simmhan. The Trident scientific workflow workbench. In eScience, 2008. eScience ’08. IEEE Fourth International Conference on, pages 317–318, 2008. doi: 10.1109/eScience.2008.126. (page 85).
[21] T. Bartz-Beielstein. Experimental Research in Evolutionary Computation: The New Experimentalism (Natural Computing Series). Springer, 1 edition, Apr. 2006. ISBN 3540320261. (pages 10, 73, 76, 82, 158, 169, 231, 252, 273).
[22] T. Bartz-Beielstein and M. Preuss. Experimental research in evolutionary computation. In Proceedings of the 2007 GECCO conference companion on Genetic and evolutionary computation, GECCO ’07, pages 3001–3020, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-698-1. doi: 10.1145/1274000.1274102. URL http://doi.acm.org/10.1145/1274000.1274102. (pages 5, 6, 75).
[23] T. Bartz-Beielstein and M. Preuss. Tuning and experimental analysis in evolutionary computation: what we still have wrong. In GECCO (Companion), pages 2625–2646, 2010. (pages 158, 169, 252).
[24] T. Bartz-Beielstein, M. Chiarandini, and L. Paquete. Experimental Methods for the Analysis of Optimization Algorithms. SpringerLink: Springer e-Books. Springer, 2010. ISBN 9783642025389. URL http://books.google.es/books?id=UXogQWx8_HEC. (page 76).
[25] L. Bass, P. Clements, and R. Kazman. Software Architecture in Practice. Addison Wesley, 3 edition, 2013. (page 186).
[26] D. Benavides, S. Segura, and A. Ruiz-Cortés. Automated analysis of feature models 20 years later: A literature review. Information Systems, 35(6):615–636, 2010. ISSN 0306-4379. doi: 10.1016/j.is.2010.01.001. (pages 213, 215, 219, 313, 315, 316, 320).
[27] R. Berbner, M. Spahn, N. Repp, O. Heckmann, and R. Steinmetz. Heuristics for QoS-aware web service composition. Web Services, 2006. ICWS ’06. International Conference on, pages 72–82, Sept.
2006. (page 280).
[28] A. J. Bertie. Java applications for teaching statistics. MSOR Connections, 2(3):28–81, 2002. (page 87).
[29] M. Birattari, T. Stützle, L. Paquete, and K. Varrentrapp. A racing algorithm for configuring metaheuristics. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’02, pages 11–18, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc. ISBN 1-55860-878-8. URL http://dl.acm.org/citation.cfm?id=646205.682291. (pages 6, 75, 76).
[30] M. Birattari, M. Zlochin, and M. Dorigo. Towards a theory of practice in metaheuristics design: A machine learning perspective. RAIRO-Theoretical Informatics and Applications, 40(02):353–369, 2006. (pages 158, 169).
[31] M. Birgmeier. Evolutionary programming for the optimization of trellis-coded modulation schemes. In Proc. 5th Ann. Conf. on Evolutionary Programming, 1996. (page 110).
[32] J. L. Blanton, Jr. and R. L. Wainwright. Multiple vehicle routing with time and capacity constraints using genetic algorithms. In Proceedings of the 5th International Conference on Genetic Algorithms, pages 452–459, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc. ISBN 1-55860-299-2. (page 109).
[33] C. I. Bliss. The calculation of the dosage-mortality curve. Annals of Applied Biology, 22(1):134–167, 1935. ISSN 1744-7348. doi: 10.1111/j.1744-7348.1935.tb07713.x. URL http://dx.doi.org/10.1111/j.1744-7348.1935.tb07713.x. (page 141).
[34] C. Blum, M. J. B. Aguilera, A. Roli, and M. Sampels. Hybrid Metaheuristics: An Emerging Approach to Optimization. Springer Publishing Company, Incorporated, 1st edition, 2008. ISBN 354078294X, 9783540782940. (page 27).
[35] P. A. Bonatti and P. Festa. On optimal service selection. In WWW ’05: Proceedings of the 14th international conference on World Wide Web, pages 530–538, New York, NY, USA, 2005. ACM. ISBN 1-59593-046-9. (page 280).
[36] J. Bosch. Architecture challenges for software ecosystems.
In Proceedings of the Fourth European Conference on Software Architecture: Companion Volume, pages 93–95. ACM, 2010. (page 187).
[37] G. E. Box and K. Wilson. On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society. Series B (Methodological), 13(1):1–45, 1951. (page 76).
[38] A. Brindle. Genetic Algorithms for Function Optimization. PhD thesis, University of Alberta, Edmonton, 1981. (page 112).
[39] A. W. Brown and K. C. Wallnau. A framework for evaluating software technology. IEEE Softw., 13(5):39–49, 1996. ISSN 0740-7459. doi: 10.1109/52.536457. (page 94).
[40] J. Brownlee. OAT: The optimization algorithm toolkit. Technical report, Complex Intelligent Systems Laboratory, Swinburne University of Technology, 2007. (page 98).
[41] S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO: A framework for the reusable design of parallel and distributed metaheuristics. Journal of Heuristics, 10(3):357–380, 2004. ISSN 1381-1231. doi: 10.1023/B:HEUR.0000026900.92269.ec. (pages 10, 83, 98, 116, 182).
[42] G. Canfora, M. D. Penta, R. Esposito, and M. Villani. QoS-aware replanning of composite web services. Web Services, 2005. ICWS 2005. Proceedings. 2005 IEEE International Conference on, 1:121–129, 11-15 July 2005. doi: 10.1109/ICWS.2005.96. (pages 280, 287, 288, 291).
[43] G. Canfora, M. D. Penta, R. Esposito, and M. L. Villani. An approach for QoS-aware service composition based on genetic algorithms. In GECCO ’05: Proceedings of the 2005 conference on Genetic and evolutionary computation, pages 1069–1075, New York, NY, USA, 2005. ACM. ISBN 1-59593-010-8. (pages 34, 207, 283, 284, 288, 299).
[44] G. Canfora, M. D. Penta, R. Esposito, and M. L. Villani. A framework for QoS-aware binding and re-binding of composite web services. Journal of Systems and Software, 81(10):1754–1769, 2008. (pages 280, 282, 284, 285).
[45] V. Cardellini, E. Casalicchio, V. Grassi, and F. L. Presti.
Efficient provisioning of service level agreements for service oriented applications. In IW-SOSWE ’07: 2nd international workshop on Service oriented software engineering, pages 29–35, New York, NY, USA, 2007. ACM. ISBN 9781595937230. URL http://portal.acm.org/citation.cfm?id=1294936. (page 286).
[46] J. Cardoso, A. Sheth, J. Miller, J. Arnold, and K. Kochut. Quality of service for workflows and web service processes. Web Semantics: Science, Services and Agents on the World Wide Web, 1(3):281–308, April 2004. doi: 10.1016/j.websem.2004.03.001. (page 285).
[47] K. Chakhlevitch and P. Cowling. Hyperheuristics: Recent developments. In Adaptive and Multilevel Metaheuristics, pages 3–29. 2008. (page 115).
[48] A. F. Chalmers. What Is This Thing Called Science? Hackett Pub Co, 3 edition, Jan. 1999. ISBN 0702230936. (page 53).
[49] A. Chatterjee and P. Siarry. Nonlinear inertia weight variation for dynamic adaptation in particle swarm optimization. Computers and Operations Research, 33(3):859–871, 2006. ISSN 0305-0548. (page 38).
[50] M. Chiarandini, L. Paquete, M. Preuss, and E. Ridge. Experiments on metaheuristics: methodological overview and open issues. Technical Report DMF-2007-03-003, The Danish Mathematical Society, 2007. (pages 5, 6, 75).
[51] A. Chu, J. Cui, and I. D. Dinov. SOCR Analyses: Implementation and demonstration of a new graphical statistics educational toolkit. Journal of Statistical Software, 30(3):1–19, April 2009. (page 87).
[52] D. Claro, P. Albers, and J. Hao. Selecting web services for optimal composition. In Proc. Int. Conf. Web Services (ICWS 05), 2005. (page 287).
[53] P. Clements, D. Garlan, L. Bass, J. Stafford, R. Nord, J. Ivers, and R. Little. Documenting Software Architectures: Views and Beyond. Pearson Education, 2010. ISBN 0201703726. (pages 183, 186).
[54] M. Clerc. Particle Swarm Optimization. ISTE Publishing Company, February 2006. ISBN 1905209045. (pages 37, 102).
[55] D. Comes, H. Baraki, R. Reichle, M.
Zapf, and K. Geihs. Heuristic approaches for QoS-based service selection. Service-Oriented Computing, 6470:441–455, 2010. (page 287).
[56] D. Corne, J. D. Knowles, and M. J. Oates. The Pareto envelope-based selection algorithm for multi-objective optimisation. In PPSN VI: Proceedings of the 6th International Conference on Parallel Problem Solving from Nature, pages 839–848, London, UK, 2000. Springer-Verlag. ISBN 3-540-41056-2. (page 105).
[57] P. I. Cowling, G. Kendall, and E. Soubeiga. Hyperheuristics: A tool for rapid prototyping in scheduling and optimisation. In Proceedings of the Applications of Evolutionary Computing on EvoWorkshops 2002, pages 1–10, London, UK, 2002. Springer-Verlag. ISBN 3-540-43432-1. (page 115).
[58] N. L. Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the 1st International Conference on Genetic Algorithms, pages 183–187, Hillsdale, NJ, USA, 1985. L. Erlbaum Associates Inc. ISBN 0-8058-0426-9. (page 110).
[59] H. Cramér. Mathematical Methods of Statistics (PMS-9), volume 9. Princeton University Press, 1999. (page 144).
[60] L. J. Cronbach and K. Shapiro. Designing evaluations of educational and social programs. Taylor & Francis, 1983. (page 72).
[61] L. J. Cronbach, S. R. Ambron, S. M. Dornbusch, R. D. Hess, R. C. Hornik, D. Phillips, D. F. Walker, and S. S. Weiner. Toward reform of program evaluation. Jossey-Bass Publishers, San Francisco, 1980. (page 72).
[62] J. W. Daly, A. Brooks, J. Miller, M. Roper, and M. Wood. Verification of results in software maintenance through external replication. In H. A. Müller and M. Georges, editors, ICSM, pages 50–57. IEEE Computer Society, 1994. ISBN 0-8186-6330-8. (page 10).
[63] L. Davis. Applying adaptive algorithms to epistatic domains. In IJCAI’85: Proceedings of the 9th international joint conference on Artificial intelligence, pages 162–164, San Francisco, CA, USA, 1985. Morgan Kaufmann Publishers Inc. ISBN 0-934613-02-8, 978-0-934-613026.
(page 109).
[64] L. Davis. Adapting operator probabilities in genetic algorithms. In Proceedings of the third international conference on Genetic algorithms, pages 61–69, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc. ISBN 1-55860-006-3. (page 110).
[65] L. de Castro and F. Von Zuben. Learning and optimization using the clonal selection principle. Evolutionary Computation, IEEE Transactions on, 6(3):239–251, Jun 2002. ISSN 1089-778X. doi: 10.1109/TEVC.2002.1011539. (page 105).
[66] M. C. de Souza and P. Martins. Skewed VNS enclosing second order algorithm for the degree constrained minimum spanning tree problem. European Journal of Operational Research, 191(3):677–690, 2008. ISSN 0377-2217. doi: 10.1016/j.ejor.2006.12.061. URL http://www.sciencedirect.com/science/article/B6VCT-4N2KTC4-7/2/7799160d76fbba32ad42f719ee72bbf9. (page 103).
[67] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6:182–197, 2002. (page 105).
[68] J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006. (page 61).
[69] J. Derrac, S. García, D. Molina, and F. Herrera. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, 1(1):3–18, 2011. ISSN 2210-6502. doi: 10.1016/j.swevo.2011.02.002. URL http://www.sciencedirect.com/science/article/pii/S2210650211000034. (pages 61, 63, 70, 71, 75, 77, 87, 154, 169, 172, 176, 196).
[70] G. Development and R. Rosenthal. GAMS: A User’s Guide. Books on Demand, 2006. ISBN 9783833435089. URL http://books.google.es/books?id=253PPQAACAAJ. (pages 85, 177).
[71] L. Di Gaspero and A. Schaerf. EasyLocal++: An object-oriented framework for flexible design of local search algorithms.
Software: Practice and Experience, 33(8):733–765, July 2003. doi: 10.1002/spe.524. (pages 10, 83, 98).
[72] M. Dorigo and G. Di Caro. The ant colony optimization meta-heuristic. New ideas in optimization, pages 11–32, 1999. (page 26).
[73] M. Dorigo and L. M. Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, April 1997. URL http://citeseer.ist.psu.edu/115428.html. (pages 39, 105).
[74] S. Dower. Specifying ant system with ESDL. Technical Report TR/CIS/2010/2, Swinburne University of Technology, 2010. (page 85).
[75] S. Dower and C. Woodward. Specifying particle swarm optimisation with ESDL. Technical Report TR/CIS/2010/5, Swinburne University of Technology, 2010. (page 85).
[76] S. Dower and C. J. Woodward. ESDL: a simple description language for population-based evolutionary computation. In Proceedings of the 13th annual conference on Genetic and evolutionary computation, GECCO ’11, pages 1045–1052, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0557-0. doi: 10.1145/2001576.2001718. URL http://doi.acm.org/10.1145/2001576.2001718. (page 85).
[77] J. Dreo, A. Pétrowski, P. Siarry, and E. Taillard. Metaheuristics for Hard Optimization: Methods and Case Studies. Springer, December 2005. ISBN 354023022X. (page 105).
[78] J. Dreo, A. Petrowski, and E. Taillard. Metaheuristics for Hard Optimization. Springer, 2003. (pages 25, 291).
[79] O. J. Dunn. Multiple comparisons among means. Journal of the American Statistical Association, 56:52–64, 1961. (page 271).
[80] A. Eiben and M. Jelasity. A critical note on experimental research methodology in EC. Computational Intelligence, Proceedings of the World on Congress on, 1:582–587, 2002. (pages 87, 273).
[81] A. E. Eiben, P.-E. Raué, and Z. Ruttkay. Genetic algorithms with multi-parent recombination.
In PPSN III: Proceedings of the International Conference on Evolutionary Computation. The Third Conference on Parallel Problem Solving from Nature, pages 78–87, London, UK, 1994. Springer-Verlag. ISBN 3-540-58484-6. (page 109).
[82] Elsevier. The executable papers grand challenge, 2011. URL http://www.executablepapers.com/. (page 8).
[83] T. Erl, A. Karmarkar, P. Walmsley, H. Haas, L. U. Yalcinalp, K. Liu, D. Orchard, A. Tost, and J. Pasley. Web Service Contract Design and Versioning for SOA. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1 edition, 2009. ISBN 9780136135173. (page 187).
[84] L. J. Eshelman. The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination. Foundations of Genetic Algorithms, pages 265–283, 1991. URL http://ci.nii.ac.jp/naid/10000024547/en/. (page 109).
[85] L. J. Eshelman and J. D. Schaffer. Real-coded genetic algorithms and interval-schemata. In D. L. Whitley, editor, Foundation of Genetic Algorithms 2, pages 187–202, San Mateo, CA, 1993. Morgan Kaufmann. (page 109).
[86] L. J. Eshelman, R. A. Caruana, and J. D. Schaffer. Biases in the crossover landscape. In Proceedings of the third international conference on Genetic algorithms, pages 10–19, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc. ISBN 1-55860-006-3. (page 109).
[87] Science Exchange. The reproducibility initiative, 2011. URL https://www.scienceexchange.com/reproducibility. (pages 8, 86, 156, 192).
[88] N. E. Fenton and S. L. Pfleeger. Software Metrics: A Rigorous and Practical Approach. PWS Publishing Co., Boston, MA, USA, 2nd edition, 1998. ISBN 0534954251. (page 52).
[89] T. A. Feo and M. G. Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6:109–133, 1995. (page 38).
[90] P. Fernández, M. Resinas, and R. Corchuelo. Towards an automatic service trading. Upgrade, 7(5):26–29, 2006. ISSN 1684-5285. (page 282).
[91] P. Festa and M. Resende.
An annotated bibliography of GRASP part II: Applications. International Transactions in Operational Research, 16(2):131–172, 2009. ISSN 1475-3995. (page 38).
[92] H. Finner. On a monotonicity problem in step-down multiple test procedures. Journal of the American Statistical Association, 88:920–923, 1993. (page 271).
[93] R. A. Fisher. The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 33:503–513, 1926. (page 57).
[94] C. A. Floudas and P. M. Pardalos. Encyclopedia of Optimization, 2nd edition, October 2008. ISBN 978-0-387-74760-6. (pages 4, 5).
[95] T. C. Fogarty. Varying the probability of mutation in the genetic algorithm. In Proceedings of the 3rd International Conference on Genetic Algorithms, pages 104–109, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc. ISBN 1-55860-066-3. (page 111).
[96] D. Fogel, L. Fogel, and J. Atmar. Meta-evolutionary programming. In Signals, Systems and Computers, 1991. 1991 Conference Record of the Twenty-Fifth Asilomar Conference on, pages 540–545 vol.1, Nov 1991. doi: 10.1109/ACSSC.1991.186507. (page 110).
[97] L. J. Fogel. On the Organization of Intellect. PhD thesis, UCLA, 1964. (page 109).
[98] L. J. Fogel and D. B. Fogel. Artificial intelligence through evolutionary programming. Technical report, Final Report for US Army Research Institute, contract no PO-9-X561102C-1, 1986. (page 110).
[99] L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial intelligence through simulated evolution. Wiley, 1966. (pages 103, 109).
[100] C. M. Fonseca and P. J. Fleming. Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization. In Proceedings of the 5th International Conference on Genetic Algorithms, pages 416–423, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc. ISBN 1-55860-299-2. (page 105).
[101] M. Fontoura, C. Lucena, A. Andreatta, S. Carvalho, and C. Ribeiro.
Using UML-F to enhance framework development: A case study in the local search heuristics domain. Journal of Systems and Software, 57(3):201–206, 2001. (page 133).
[102] R. Fourer, D. Gay, and B. Kernighan. AMPL: a modeling language for mathematical programming. Thomson/Brooks/Cole, 2003. ISBN 9780534388096. URL http://books.google.es/books?id=Ij8ZAQAAIAAJ. (pages 85, 177).
[103] M. Fowler. Patterns of enterprise application architecture. A Martin Fowler signature book. Addison Wesley Professional, 2003. ISBN 9780321127426. URL http://books.google.es/books?id=FyWZt5DdvFkC. (page 186).
[104] M. Fowler. Inversion of control containers and the dependency injection pattern, 2004. URL http://www.martinfowler.com/articles/injection.html. (page 123).
[105] G. Fraser and J. T. de Souza, editors. Search Based Software Engineering. Proceedings of the 4th International Symposium on SSBSE. Springer, 2012. (page 82).
[106] M. Friedman. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200):675–701, 1937. (page 271).
[107] C. Gagné and M. Parizeau. Genericity in evolutionary computation software tools: Principles and case-study. International Journal on Artificial Intelligence Tools, 15(2):173–194, 2006. (pages 10, 83, 94, 97).
[108] M. Gallego, F. Gortazar, and E. G. Pardo. Optsicom optimization suite, un conjunto de herramientas para la investigacion en optimizacion. In Proceedings of the VII Spanish Conference on Metaheuristic, Evolutionary and Bio-inspired Algorithms (MAEB), pages 352–363, 2010. (page 84).
[109] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, illustrated edition, January 1994. (page 111).
[110] C. Gao, M. Cai, and H. Chen. QoS-driven global optimization of services selection supporting services flow re-planning.
In Advances in Web and Network Technologies, and Information Management, Lecture Notes in Computer Science, pages 516–521. Springer, 2007. doi: 10.1007/978-3-540-72909-9_56. (pages 287, 288).
[111] S. García, A. Fernández, J. Luengo, and F. Herrera. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information Sciences, 180(10):2044–2064, 2010. (pages 77, 87, 90, 196).
[112] J. García-Nieto, E. Alba, and F. Chicano. Using metaheuristic algorithms remotely via ROS. In GECCO ’07: Proceedings of the 9th annual conference on Genetic and evolutionary computation, pages 1510–1510, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-697-4. doi: 10.1145/1276958.1277239. (page 121).
[113] P. García-Sánchez, J. González, P. A. Castillo, J. Merelo, A. M. Mora, J. L. J. Laredo, and M. G. Arenas. A distributed service oriented framework for metaheuristics using a public standard. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), pages 211–222. Springer, 2010. (page 86).
[114] M. Gavish and D. Donoho. A universal identifier for computational results. Procedia Computer Science, 4:637–647, 2011. Proceedings of the International Conference on Computational Science, ICCS 2011. (page 8).
[115] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. Readings in computer vision: issues, problems, principles, and paradigms, pages 564–584, 1987. (page 103).
[116] J. A. Gliner, G. A. Morgan, and N. L. Leech. Research methods in applied settings: an integrated approach to design and analysis (second edition). Taylor and Francis Group, 2009. (pages 6, 7, 10, 46, 47, 52, 53, 55, 61, 62, 65, 66, 67, 68, 69, 72, 76, 77, 149, 172, 176, 252).
[117] F. Glover. Heuristics for integer programming using surrogate constraints. Decision Sciences, 8(1):156–166, 1977.
URL http://dx.doi.org/10.1111/j.1540-5915.1977.tb01074.x. (page 38).
[118] F. Glover. Tabu search: part I. ORSA Journal on Computing, 1:190–206, 1989. (page 30).
[119] F. Glover. A template for scatter search and path relinking. In J.-K. Hao, E. Lutton, E. Ronald, M. Schoenauer, and D. Snyers, editors, Artificial Evolution, volume 1363 of Lecture Notes in Computer Science, pages 1–51. Springer Berlin / Heidelberg, 1998. ISBN 978-3-540-64169-8. (page 36).
[120] F. Glover and G. A. Kochenberger. Handbook of Metaheuristics. Kluwer Academic Publishers, 2002. (pages 5, 25, 26, 27, 102).
[121] D. Goldberg and R. Lingle. Alleles, loci and the traveling salesman problem. In Proc. 1st Int. Conf. on Genetic Algorithms and their Applications, pages 154–159, 1985. (page 109).
[122] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989. (page 105).
[123] D. E. Goldberg. A note on Boltzmann tournament selection for genetic algorithms and population-oriented simulated annealing. Complex Systems, 4(4):445–460, 1990. (page 103).
[124] D. E. Goldberg and K. Deb. A comparative analysis of selection schemes used in genetic algorithms. In FOGA. Proceedings of the First Workshop on Foundations of Genetic Algorithms, pages 69–93, 1990. (page 33).
[125] D. E. Goldberg and R. E. Smith. Nonstationary function optimization using genetic algorithm with dominance and diploidy. In Proceedings of the Second International Conference on Genetic Algorithms on Genetic algorithms and their application, pages 59–68, Hillsdale, NJ, USA, 1987. L. Erlbaum Associates Inc. ISBN 0-8058-0158-8. (page 104).
[126] O. S. Gómez, N. Juristo, and S. Vegas. Replications types in experimental disciplines. In Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’10, pages 3:1–3:10, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0039-1. doi: 10.1145/1852786.1852790.
URL http://doi.acm.org/10.1145/1852786.1852790. (page 7). [127] P. Van Gorp and S. Mazanek. SHARE: a web portal for creating and sharing executable research papers. Procedia Computer Science, 4:589–597, 2011. ISSN 1877-0509. URL http://www.sciencedirect.com/science/article/pii/S1877050911001207. Proceedings of the International Conference on Computational Science, ICCS 2011. (pages 8, 86). [128] P. Hansen, N. Mladenović, and D. Perez-Britos. Variable neighborhood decomposition search. Journal of Heuristics, 7(4):335–350, 2001. ISSN 1381-1231. doi: http://dx.doi.org/10.1023/A:1011336210885. (page 103). [129] M. Harman. The current state and future of search based software engineering. Future of Software Engineering, 2007. FOSE ’07, pages 342–357, 23-25 May 2007. doi: 10.1109/FOSE.2007.29. (pages 82, 280). [130] M. Harman and A. Mansouri. Search based software engineering: Introduction to the special issue of the IEEE Transactions on Software Engineering. IEEE Transactions on Software Engineering, 36(6):737–741, 2010. ISSN 0098-5589. doi: http://doi.ieeecomputersociety.org/10.1109/TSE.2010.106. (page 82). [131] M. Harman, P. McMinn, J. de Souza, and S. Yoo. Search based software engineering: Techniques, taxonomy, tutorial. In B. Meyer and M. Nordio, editors, Empirical Software Engineering and Verification, volume 7007 of Lecture Notes in Computer Science, pages 1–59. Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-25230-3. URL http://dx.doi.org/10.1007/978-3-642-25231-0_1. (page 82). [132] P. Van Hentenryck, L. Michel, P. Laborie, W. Nuijten, and J. Rogerie. Combinatorial optimization in OPL Studio. In Proceedings of the 9th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence, EPIA ’99, pages 1–15, London, UK, 1999. Springer-Verlag. ISBN 3-540-66548-X. (page 40). [133] J. Hilbe. Logistic Regression Models. A Chapman & Hall book. CRC Press, 2009. ISBN 9781420075755.
URL http://books.google.es/books?id=eJcMIAAACAAJ. (page 141). [134] K. Hinkelmann and O. Kempthorne. Design and Analysis of Experiments, Introduction to Experimental Design. Design and Analysis of Experiments. Wiley, 2007. ISBN 9780470191743. URL http://books.google.es/books?id=T3wWj2kVYZgC. (pages 54, 55, 61, 62). [135] Y. C. Ho and D. L. Pepyne. Simple explanation of the no-free-lunch theorem and its implications. Journal of Optimization Theory and Applications, 115(3):549–570, December 2002. URL http://www.ingentaconnect.com/content/klu/jota/2002/00000115/00000003/00450394. (page 42). [136] Y. Hochberg. A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75:800–803, 1988. (page 271). [137] J. L. Hodges and E. L. Lehmann. Rank methods for combination of independent experiments in analysis of variance. Annals of Mathematical Statistics, 33:482–497, 1962. (page 271). [138] B. S. Holland and M. D. Copenhaver. An improved sequentially rejective Bonferroni test procedure. Biometrics, 43:417–423, 1987. (page 271). [139] J. H. Holland. Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. University of Michigan Press, 1975. ISBN 0472084607. (pages 34, 110, 112). [140] J. H. Holland. Adaptation in Natural and Artificial Systems. MIT Press, 2nd edition, 1992. (pages 104, 109). [141] S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70, 1979. (page 271). [142] G. Hommel. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75:383–386, 1988. (page 271). [143] J. Horn, N. Nafpliotis, and D. Goldberg. A niched Pareto genetic algorithm for multiobjective optimization. In Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence, Proceedings of the First IEEE Conference on, pages 82–87 vol.1, Jun 1994.
doi: 10.1109/ICEC.1994.350037. (page 105). [144] IBM. SPSS 17 Statistical Package. http://www.spss.com/, accessed November 2010. (page 86). [145] IEEE. POSIX Std 1003.1, 2004. (page 274). [146] R. L. Iman and J. M. Davenport. Approximations of the critical region of the Friedman statistic. Commun. Stat., 18:571–595, 1980. (page 271). [147] Free Software Foundation, Inc. GNU Lesser General Public License version 3. http://www.gnu.org/copyleft/lesser.html. (page 186). [148] S. Iredi, D. Merkle, and M. Middendorf. Bi-criterion optimization with multi colony ant algorithms. In EMO ’01: Proceedings of the First International Conference on Evolutionary Multi-Criterion Optimization, pages 359–372, London, UK, 2001. Springer-Verlag. ISBN 3-540-41745-1. (page 105). [149] ISO/IEC. Information technology – Document container file – Part 1: Core. Profile of ZIP file format. NP 21320-1, 2011. (page 274). [150] M. C. Jaeger, G. Mühl, and S. Golze. QoS-aware composition of web services: An evaluation of selection algorithms. On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE, pages 646–661, 2005. (page 287). [151] D. S. Johnson. A theoretician’s guide to the experimental analysis of algorithms. 2002. URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.1935. (page 10). [152] K. A. De Jong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, University of Michigan, 1975. (pages 109, 112). [153] F. Jouault, F. Allilaire, J. Bézivin, I. Kurtev, and P. Valduriez. ATL: a QVT-like transformation language. In Companion to the 21st ACM SIGPLAN symposium on Object-oriented programming systems, languages, and applications, pages 719–720. ACM, 2006. (page 169). [154] N. Juristo and A. M. Moreno. Basics of Software Engineering Experimentation. Kluwer Academic Publishers, 2005. (pages 6, 46, 52, 54, 63). [155] N. Juristo and A. Moreno. Basics of Software Engineering Experimentation. Springer, 2001. ISBN 9780792379904.
URL http://books.google.es/books?id=ovWfOeW653EC. (pages 10, 84). [156] J. Kallrath. Modeling Languages in Mathematical Optimization. Applied Optimization. Springer, 2004. ISBN 9781402075476. URL http://books.google.es/books?id=wJYART7VYe8C. (page 85). [157] K. Kang, S. Cohen, J. Hess, W. Novak, and S. Peterson. Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21, SEI, 1990. (page 307). [158] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938. (page 144). [159] J. Kennedy and R. Eberhart. Particle swarm optimization. In Neural Networks, 1995. Proceedings., IEEE International Conference on, volume 4, pages 1942–1948, 1995. doi: 10.1109/ICNN.1995.488968. URL http://dx.doi.org/10.1109/ICNN.1995.488968. (page 37). [160] J. Kennedy and R. Mendes. Population structure and particle swarm performance. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), 2002. (page 104). [161] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983. (pages 28, 103). [162] B. A. Kitchenham. Software Metrics: Measurement for Software Process Improvement. Blackwell Publishers, Inc., Cambridge, MA, USA, 1996. ISBN 1855548208. (page 52). [163] B. A. Kitchenham. Procedures for undertaking systematic reviews. Technical report, Computer Science Department, Keele University, 2004. (page 95). [164] A. Klein, F. Ishikawa, and S. Honiden. Towards network-aware service composition in the cloud. In Proceedings of the 21st international conference on World Wide Web, WWW ’12, pages 959–968, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1229-5. doi: 10.1145/2187836.2187965. URL http://doi.acm.org/10.1145/2187836.2187965. (page 287). [165] J. D. Knowles and D. W. Corne. Approximating the nondominated front using the Pareto archived evolution strategy. Evol. Comput., 8(2):149–172, 2000. ISSN 1063-6560. doi: http://dx.doi.org/10.1162/106365600568167.
(page 105). [166] J. M. Ko, C. O. Kim, and I.-H. Kwon. Quality-of-service oriented web service composition algorithm and planning architecture. Journal of Systems and Software, 81(11):2079–2090, 2008. (pages 207, 280, 291, 299, 301, 302). [167] D. Köhn and N. Le Novère. SED-ML - an XML format for the implementation of the MIASE guidelines. In Proceedings of the 6th International Conference on Computational Methods in Systems Biology, CMSB ’08, pages 176–190, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 978-3-540-88561-0. doi: 10.1007/978-3-540-88562-7_15. URL http://dx.doi.org/10.1007/978-3-540-88562-7_15. (page 84). [168] J. Koza. Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge, MA, USA, 1992. ISBN 0-262-11170-5. (page 309). [169] J. R. Koza. Genetic programming: On the programming of computers by means of natural selection. MIT Press, 1992. (pages 33, 110). [170] M. Kronfeld, H. Planatscher, and A. Zell. The EvA2 optimization framework. In C. Blum and R. Battiti, editors, Learning and Intelligent Optimization Conference, Special Session on Software for Optimization (LION-SWOP), number 6073 in Lecture Notes in Computer Science, LNCS, pages 247–250, Venice, Italy, Jan. 2010. Springer Verlag. URL http://www.ra.cs.uni-tuebingen.de/publikationen/2010/Kron10EvA2Short.pdf. (page 98). [171] P. Leitner, W. Hummer, and S. Dustdar. Cost-based optimization of service compositions. IEEE T. Services Computing, 6(2):239–251, 2013. (page 287). [172] H. Levene. Robust tests for equality of variances. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, pages 278–292, 1960. (page 271). [173] J. A. Lewis, S. M. Henry, D. G. Kafura, and R. S. Schulman. On the relationship between the object-oriented paradigm and software reuse: An empirical investigation. Technical report, Blacksburg, VA, USA, 1992. (page 45). [174] J. Li. A two-step rejection procedure for testing multiple hypotheses.
Journal of Statistical Planning and Inference, 138:1521–1527, 2008. (page 271). [175] H. W. Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62:399–402, 1967. (page 271). [176] R. Lowry. VassarStats: Website for statistical computation. 1998. URL http://faculty.vassar.edu/lowry/VassarStats.html. (page 87). [177] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience, 18(10):1039–1065, 2006. ISSN 1532-0634. doi: 10.1002/cpe.994. URL http://dx.doi.org/10.1002/cpe.994. (page 85). [178] M. Lukasiewycz, M. Glaß, F. Reimann, and S. Helwig. Opt4J - the optimization framework for Java. http://www.opt4j.org, 2009. (pages 98, 126). [179] S. Luke, L. Panait, G. Balan, S. Paus, Z. Skolicki, E. Popovici, K. Sullivan, J. Harrison, J. Bassett, R. Hubley, A. Chircop, J. Compton, W. Haddon, S. Donnelly, B. Jamil, and J. O’Beirne. ECJ: A Java-based evolutionary computation research system. http://cs.gmu.edu/~eclab/projects/ecj/, 2009. (pages 98, 182). [180] H. Ma, F. Bastani, I.-L. Yen, and H. Mei. QoS-driven service composition with reconfigurable services. IEEE Transactions on Services Computing, 6(1):20–34, 2013. ISSN 1939-1374. doi: http://doi.ieeecomputersociety.org/10.1109/TSC.2011.21. (page 287). [181] O. Maron and A. W. Moore. The racing algorithm: Model selection for lazy learners. Artificial Intelligence Review, 11:193–225, 1997. (page 76). [182] M. Mattsson, J. Bosch, and M. E. Fayad. Framework integration problems, causes, solutions. Commun. ACM, 42(10):80–87, Oct. 1999. ISSN 0001-0782. doi: 10.1145/317665.317679. URL http://doi.acm.org/10.1145/317665.317679. (page 10). [183] D. G. Mayo. An objective theory of statistical testing. Synthese, 57:297–340, 1983. ISSN 0039-7857.
URL http://dx.doi.org/10.1007/BF01064701. (page 253). [184] P. McMinn. Search-based software test data generation: a survey. Software Testing, Verification and Reliability, 14(2):105–156, 2004. ISSN 0960-0833. doi: 10.1002/stvr.v14:2. (page 306). [185] Q. McNemar. On the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2):153–157, 1947. (page 271). [186] K. Meffert. JUnit Profi-Tips. Entwickler.Press, 2006. (page 133). [187] M. Mendonça, A. Wasowski, and K. Czarnecki. SAT-based analysis of feature models is easy. In Proceedings of the Software Product Line Conference, 2009. (page 315). [188] B. Meyer. Object-Oriented Software Construction. Prentice Hall, 1988. (page 186). [189] Z. Michalewicz. Genetic Algorithms Plus Data Structures Equals Evolution Programs. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1994. ISBN 0387580905. (page 109). [190] Z. Michalewicz and D. B. Fogel. How to Solve It: Modern Heuristics. Springer, December 2004. ISBN 3540224947. URL http://www.amazon.ca/exec/obidos/redirect?tag=citeulike09-20&path=ASIN/3540224947. (pages 25, 108, 113). [191] L. Michel and P. Van Hentenryck. Localizer: a modeling language for local search. In Principles and Practice of Constraint Programming-CP97, pages 237–251. Springer, 1997. (page 85). [192] N. Monmarché, G. Venturini, and M. Slimane. On how Pachycondyla apicalis ants suggest a new search algorithm. Future Gener. Comput. Syst., 16(9):937–946, 2000. ISSN 0167-739X. (page 105). [193] D. J. Montana. Strongly typed genetic programming. Evol. Comput., 3(2):199–230, 1995. ISSN 1063-6560. doi: http://dx.doi.org/10.1162/evco.1995.3.2.199. (pages 110, 111). [194] D. J. Montana and L. Davis. Training feedforward neural networks using genetic algorithms. In IJCAI’89: Proceedings of the 11th international joint conference on Artificial intelligence, pages 762–767, San Francisco, CA, USA, 1989.
Morgan Kaufmann Publishers Inc. (page 110). [195] D. Montgomery. Design and Analysis of Experiments. John Wiley & Sons Canada, Limited, 1997. ISBN 9780471260080. (pages 54, 55, 61, 62, 172, 176). [196] S. T. Mueller. The PEBL Manual. 2010. (page 84). [197] H. Mühlenbein. Evolution in time and space - the parallel genetic algorithm. Foundations of Genetic Algorithms, 1991. URL http://ci.nii.ac.jp/naid/10016718767/en/. (page 109). [198] P. B. Nemenyi. Distribution-free multiple comparisons. PhD thesis, Princeton University, 1963. (page 271). [199] G. J. V. Nossal and J. Lederberg. Antibody production by single cells. Nature, 181(4620):1419–1420, May 1958. doi: 10.1038/1811419a0. URL http://dx.doi.org/10.1038/1811419a0. (page 105). [200] P. Nowakowski, E. Ciepiela, D. Harezlak, J. Kocot, M. Kasztelnik, T. Bartybski, J. Meizner, G. Dyk, and M. Malawski. The Collage authoring environment. Procedia Computer Science, 4:608–617, 2011. ISSN 1877-0509. URL http://www.sciencedirect.com/science/article/pii/S1877050911001220. Proceedings of the International Conference on Computational Science, ICCS 2011. (pages 8, 86). [201] J. D. Nulton and P. Salamon. Statistical mechanics of combinatorial optimization. Phys. Rev. A, 37(4):1351–1356, Feb 1988. doi: 10.1103/PhysRevA.37.1351. (page 103). [202] School of Electronics and Computer Science at the University of Southampton. Reproducible research repository. URL http://rr.epfl.ch/. (pages 8, 86, 156, 192). [203] T. Oinn, M. Greenwood, M. Addis, M. N. Alpdemir, J. Ferris, K. Glover, C. Goble, A. Goderis, D. Hull, D. Marvin, P. Li, P. Lord, M. R. Pocock, M. Senger, R. Stevens, A. Wipat, and C. Wroe. Taverna: lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience, 18(10):1067–1100, 2006. ISSN 1532-0634. doi: 10.1002/cpe.993. URL http://dx.doi.org/10.1002/cpe.993. (page 85). [204] I. M. Oliver, D. J. Smith, and J. R. C. Holland.
A study of permutation crossover operators on the traveling salesman problem. In Proceedings of the Second International Conference on Genetic Algorithms on Genetic algorithms and their application, pages 224–230, Hillsdale, NJ, USA, 1987. L. Erlbaum Associates Inc. ISBN 0-8058-0158-8. (page 109). [205] I. Osman and G. Laporte. Metaheuristics: A bibliography. Annals of Operations Research, 63:511–623, 1996. ISSN 0254-5330. URL http://dx.doi.org/10.1007/BF02125421. (page 25). [206] B. Paechter, T. Bäck, M. Schoenauer, M. Sebag, A. Eiben, J. J. Merelo, and T. C. Fogarty. A distributed resource evolutionary algorithm machine (DREAM). In Evolutionary Computation, 2000. Proceedings of the 2000 Congress on, volume 2, pages 951–958. IEEE, 2000. (page 86). [207] M. P. Papazoglou and W.-J. van den Heuvel. Service oriented architectures: approaches, technologies and research issues. VLDB J., 16(3):389–415, 2007. (page 280). [208] M. P. Papazoglou, P. Traverso, S. Dustdar, F. Leymann, and B. J. Krämer. Service-oriented computing: A research roadmap. In F. Curbera, B. J. Krämer, and M. P. Papazoglou, editors, Service Oriented Computing, volume 05462 of Dagstuhl Seminar Proceedings. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2005. (page 280). [209] J. A. Parejo, J. Racero, F. Guerrero, T. Kwok, and K. Smith. FOM: A framework for metaheuristic optimization. Computational Science ICCS 2003. Lecture Notes in Computer Science, 2660:886–895, 2003. Not indexed. (page 98). [210] J. A. Parejo, P. Fernández, and A. Ruiz-Cortés. QoS-aware services composition using tabu search and hybrid genetic algorithms. In Talleres de las JISBD. IADIS’08, 2008. (pages 207, 252). [211] J. A. Parejo, P. Fernández, and A. Ruiz-Cortés. De frameworks a ecosistemas: Evolución del software para optimización metaheurística. In Actas del Congreso Español de Metaheurísticas, Algoritmos Evolutivos y Bioinspirados. MAEB2010.
Held as part of the Congreso Español de Informática (CEDI 2010), Sep 2010. (page 199). [212] J. A. Parejo, P. Fernández, and A. Ruiz-Cortés. On parameter selection and problem instances generation for QoS-aware binding using GRASP and path-relinking. Research Report 2011-4, ETSII. Av. Reina Mercedes s/n. 41012. Sevilla. Spain, 2011. (pages 289, 291, 292, 293, 294, 295, 303). [213] J. A. Parejo, S. Lozano, A. Ruiz-Cortés, and P. Fernández. Metaheuristic optimization frameworks: A survey and benchmarking. Soft Computing, 2011. (pages 12, 41, 134, 252). [214] J. A. Parejo, A. R.-C. Jorge García, and J. C. Riquelme. StatService: Herramienta de análisis estadístico como soporte para la investigación con metaheurísticas. In Actas del VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bio-inspirados, 2012. (page 199). [215] J. A. Parejo, S. Segura, and A. Ruiz-Cortés. Achieving replicability: Is there life for our experiments after publication? In Actas del IX Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bio-inspirados, 2013. (pages 177, 199). [216] K. Parsopoulos and M. Vrahatis. Recent approaches to global optimization problems through particle swarm optimization. Natural Computing, 1(2):235–306, 2002. (page 37). [217] K. E. Parsopoulos and M. N. Vrahatis. Particle swarm optimization method in multiobjective problems. In SAC ’02: Proceedings of the 2002 ACM symposium on Applied computing, pages 603–607, New York, NY, USA, 2002. ACM. ISBN 1-58113-445-2. doi: http://doi.acm.org/10.1145/508791.508907. (page 105). [218] P. Vandewalle, J. Kovacevic, and M. Vetterli. Reproducible research portal, 2009. (pages 8, 86, 156, 192). [219] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison Wesley, 1984. (page 4). [220] M. Di Penta and L. Troiano. Using fuzzy logic to relax constraints in GA-based service composition.
In GECCO ’05: Proceedings of the 2005 conference on Genetic and evolutionary computation, June 2005. (page 287). [221] J. C. Pezzullo. Statpages.org. 2010. URL http://statpages.org. (page 87). [222] E. Piñana, I. Plana, V. Campos, and R. Martí. GRASP and path relinking for the matrix bandwidth minimization. European Journal of Operational Research, 153(1):200–210, 2004. (pages 39, 292). [223] Reproducible Research Planet. The reproducible research librum, 2011. URL http://www.rrplanet.com/. (pages 8, 86, 156, 192). [224] K. R. Popper. Objective Knowledge. Oxford University Press, 1972. (page 253). [225] K. V. Price, R. M. Storn, and J. A. Lampinen. Differential Evolution: A Practical Approach to Global Optimization. Natural Computing Series. Springer-Verlag, Berlin, Germany, 2005. (page 104). [226] The R Project. Web site of the R project. Visited in 2013. URL http://www.r-project.org. (page 40). [227] Y. Qu, C. Lin, Y. Wang, and Z. Shan. QoS-aware composite service selection in grids. Grid and Cooperative Computing, 2006. GCC 2006. Fifth International Conference, pages 458–465, Oct. 2006. doi: 10.1109/GCC.2006.77. (page 286). [228] D. Quade. Using weighted rankings in the analysis of complete blocks with additive block effects. Journal of the American Statistical Association, 74:680–683, 1979. (page 271). [229] N. J. Radcliffe. Forma analysis and random respectful recombination. In Foundations of Genetic Algorithms, pages 222–229, 1991. (page 109). [230] I. Rahman, A. K. Das, R. B. Mankar, and B. D. Kulkarni. Evaluation of repulsive particle swarm method for phase equilibrium and phase stability problems. Fluid Phase Equilibria, May 2009. ISSN 0378-3812. doi: 10.1016/j.fluid.2009.04.014. URL http://dx.doi.org/10.1016/j.fluid.2009.04.014. (page 38). [231] G. R. Raidl. Hybrid Metaheuristics, chapter A Unified View on Hybrid Metaheuristics, pages 1–12. Springer, 2006. (page 115). [232] R. L. Rardin and R. Uzsoy.
Experimental evaluation of heuristic optimization algorithms: A tutorial. Journal of Heuristics, 7(3):261–304, May 2001. ISSN 1381-1231. doi: 10.1023/A:1011319115230. URL http://dx.doi.org/10.1023/A:1011319115230. (page 4). [233] I. Rechenberg. Cybernetic solution path of an experimental problem. Royal Aircraft Establishment Library Translation 1122, Farnborough, UK, 1965. (page 103). [234] J.-M. Renders and H. Bersini. Hybridizing genetic algorithms with hill-climbing methods for global optimization: two possible ways. In Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence, Proceedings of the First IEEE Conference on, pages 312–317 vol.1, Jun 1994. doi: 10.1109/ICEC.1994.349948. (page 109). [235] M. G. C. Resende. Greedy randomized adaptive search procedures. In Encyclopedia of Optimization, pages 1460–1469. 2009. (pages 39, 289, 292). [236] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI pipeline processing environment. NeuroImage, 19(3):1033–1048, 2003. (page 85). [237] E. Ridge and D. Kudenko. Tuning the performance of the MMAS heuristic. In T. Stützle, M. Birattari, and H. H. Hoos, editors, Engineering Stochastic Local Search Algorithms. Designing, Implementing and Analyzing Effective Heuristics, volume 4638 of Lecture Notes in Computer Science, pages 46–60. Springer Berlin / Heidelberg, 2007. ISBN 978-3-540-74445-0. (pages 6, 75, 76). [238] A. Roli and C. Blum. Hybrid metaheuristics: An introduction. In Hybrid Metaheuristics. Springer, 2008. (page 115). [239] D. M. Rom. A sequentially rejective test procedure based on a modified Bonferroni inequality. Biometrika, 77:663–665, 1990. (page 271). [240] F. Rothlauf. Representations for Genetic and Evolutionary Algorithms. Springer, 2nd edition, 2006. (page 108). [241] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. 3rd edition. Prentice Hall Series in Artificial Intelligence. Pearson Education/Prentice Hall, 2010. ISBN 9780136042594.
URL http://books.google.es/books?id=8jZBksh-bUMC. (pages 4, 25). [242] D. Sasaki. ARMOGA: An efficient multi-objective genetic algorithm. Technical report, 2005. (page 105). [243] J. D. Schaffer and A. Morishima. An adaptive crossover distribution mechanism for genetic algorithms. In Proceedings of the Second International Conference on Genetic Algorithms on Genetic algorithms and their application, pages 36–40, Hillsdale, NJ, USA, 1987. L. Erlbaum Associates Inc. ISBN 0-8058-0158-8. (page 109). [244] A. Scheibenpflug, S. Wagner, E. Pitzer, and M. Affenzeller. Optimization knowledge base: an open database for algorithm and problem characteristics and optimization results. In Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference companion, GECCO Companion ’12, pages 141–148, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1178-6. doi: 10.1145/2330784.2330806. URL http://doi.acm.org/10.1145/2330784.2330806. (page 192). [245] H.-P. Schwefel. Numerical Optimization of Computer Models. John Wiley & Sons, Inc., New York, NY, USA, 1981. ISBN 0471099880. (page 110). [246] S. Segura, J. A. Parejo, R. M. Hierons, D. Benavides, and A. Ruiz-Cortés. ETHOM: An evolutionary algorithm for optimized feature models generation (v. 1.2). Technical report, July 2012. (pages 316, 320). [247] S. Segura, J. A. Parejo, R. M. Hierons, D. Benavides, and A. Ruiz-Cortés. ETHOM: An evolutionary algorithm for optimized feature models generation (v. 1.2). Technical report, July 2012. (pages 10, 252). [248] W. R. Shadish, T. D. Cook, and D. T. Campbell. Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin, 2nd edition, July 2001. ISBN 0395615569. URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0395615569. (pages 6, 7, 46, 54, 56, 61, 65, 66, 67, 69, 71, 72, 172, 176). [249] J. P. Shaffer. Modified sequentially rejective multiple test procedures.
Journal of the American Statistical Association, 81(395):826–831, 1986. (page 271). [250] T. Shane. A partial implementation of the BICA cognitive decathlon using the Psychology Experiment Building Language (PEBL). International Journal of Machine Consciousness, 2(02):273–288, 2010. (page 84). [251] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality. Biometrika, 52(3–4):591–611, 1965. (page 271). [252] D. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press, Boca Raton, 2006. (page 271). [253] R. Sinnema. Experiment Description Language (EDL), 2003. URL http://edl.sourceforge.net/. (page 84). [254] N. J. A. Sloane and R. H. Hardin. Gosset: A general-purpose program for designing experiments, 1991-2003. URL http://www.research.att.com/~njas/gosset/index.html. (page 119). [255] N. V. Smirnov. Estimate of deviation between empirical distribution functions in two independent samples (in Russian). Bulletin of Moscow University, 2:3–16, 1939. (page 271). [256] SoaML 1.0. Service Oriented Architecture Modeling Language (SoaML) Specification, Version 1.0. Object Management Group, Mar. 2012. URL http://www.omg.org/spec/SoaML/1.0/. (page 189). [257] C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904. (page 144). [258] SPLOT. S.P.L.O.T.: Software Product Lines Online Tools. http://www.splot-research.org/, accessed October 2010. (page 305). [259] StatPoint Technologies, Inc. Statgraphics Online, 2012. URL http://www.statgraphicsonline.com/SGOnline.aspx. (page 87). [260] S. M. Stigler. Francis Galton’s account of the invention of correlation. Statist. Sci., 4(2):73–79, 1989. (page 144). [261] A. Strunk. QoS-aware service composition: A survey. In Proceedings of the European Conference on Web Services (ECOWS10), 2010. (pages 283, 284, 303). [262] T. Stützle and H. Hoos.
Max-min ant system and local search for the traveling salesman problem. Evolutionary Computation, 1997., IEEE International Conference on, pages 309–314, Apr 1997. doi: 10.1109/ICEC.1997.592327. (page 105). [263] S. Su, C. Zhang, and J. Chen. An improved genetic algorithm for web services selection. In Distributed Applications and Interoperable Systems, volume 4531/2007 of Lecture Notes in Computer Science, pages 284–295. Springer, 2007. doi: 10.1007/978-3-540-72883-2_21. URL http://dx.doi.org/10.1007/978-3-540-72883-2_21. (page 287). [264] P. N. Suganthan. Particle swarm optimiser with neighbourhood operator. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), pages 1958–1962, 1999. (page 104). [265] R. Suresh and K. Mohanasundaram. Pareto archived simulated annealing for permutation flow shop scheduling with multiple objectives. In Cybernetics and Intelligent Systems, 2004 IEEE Conference on, volume 2, pages 712–717, 2004. doi: 10.1109/ICCIS.2004.1460675. (page 105). [266] G. Syswerda. Foundations of genetic algorithms, chapter A Study of Reproduction in Generational and Steady-State Genetic Algorithms. Morgan Kaufmann, 1991. (pages 109, 110). [267] E.-G. Talbi. A taxonomy of hybrid metaheuristics. J. Heuristics, 8(5):541–564, 2002. (pages 25, 26, 27, 115, 163). [268] E.-G. Talbi. Metaheuristics - From Design to Implementation. Wiley, 2009. ISBN 978-0-470-27858-1. (pages 25, 26, 28). [269] J. Timmer. Keeping computers from ending science’s reproducibility. Article in Ars Technica, Jan 2010. URL http://arstechnica.com/science/2010/01/keeping-computers-from-ending-sciences-reproducibility/. (page 10). [270] P. Trinidad, C. Müller, J. García-Galán, and A. Ruiz-Cortés. Building industry-ready tools: FaMa framework and ADA. In Third International Workshop on Academic Software Development Tools and Techniques, pages 160–173, 2010. (page 186). [271] E. Tsang. Foundations of Constraint Satisfaction. Academic Press, 1995. ISBN 0-12-701610-4.
(page 312). [272] D. G. Uitenbroek. SISA: Simple interactive statistical analysis. 1997. URL http://www.quantitativeskills.com/sisa/. (page 87). [273] E. Ulungu, J. Teghem, P. Fortemps, and D. Tuyttens. MOSA method: a tool for solving multiobjective combinatorial optimization problems. Journal of Multi-Criteria Decision Analysis, 8(4):221–236, 1999. (page 105). [274] P. Van Hentenryck and L. Michel. Constraint-Based Local Search. The MIT Press, 2005. ISBN 0262220776. (page 40). [275] D. A. Van Veldhuizen and G. B. Lamont. Multiobjective optimization with messy genetic algorithms. In SAC ’00: Proceedings of the 2000 ACM symposium on Applied computing, pages 470–476, New York, NY, USA, 2000. ACM. ISBN 1-58113-240-9. doi: http://doi.acm.org/10.1145/335603.335914. (page 105). [276] P. Vandewalle, J. Kovacevic, and M. Vetterli. Reproducible Research in Signal Processing - What, why, and how. IEEE Signal Processing Magazine, 26(3):37–47, 2009. doi: 10.1109/MSP.2009.932122. (page 8). [277] S. Ventura, C. Romero, A. Zafra, J. Delgado, and C. Hervás. JCLEC: a Java framework for evolutionary computation. Soft Computing, 12(4):381–392, 2008. URL http://dx.doi.org/10.1007/s00500-007-0172-0. (pages 84, 98). [278] J. S. Vesterstrøm and J. Riget. Particle swarms: Extensions for improved local, multi-modal, and dynamic search in numerical optimization. PhD thesis, Dept. of Computer Science, University of Aarhus, 2002. (page 38). [279] S. Voß. Meta-heuristics: The state of the art. pages 1–23. 2001. (page 40). [280] S. Voß. Meta-heuristics: The state of the art. In Proceedings of the Workshop on Local Search for Planning and Scheduling-Revised Papers in ECAI, pages 1–23. Springer-Verlag, London, UK, 2001. ISBN 3-540-42898-4. (page 306). [281] S. Voß and D. L. Woodruff. Optimization Software Class Libraries. Kluwer Academic Publishers, 2002. (pages 9, 42, 83, 97). [282] H. Wada, P. Champrasert, J. Suzuki, and K. Oba.
Multiobjective optimization of SLA-aware service composition. Congress on Services - Part I, 2008. SERVICES ’08. IEEE, pages 368–375, July 2008. doi: 10.1109/SERVICES-1.2008.77. (page 287). [283] S. Wagner. Heuristic Optimization Software Systems - Modeling of Heuristic Optimization Algorithms in the HeuristicLab Software Environment. PhD thesis, Johannes Kepler University, Linz, Austria, 2009. (pages 98, 182). [284] S. Wagner and M. Affenzeller. HeuristicLab: A generic and extensible optimization environment, 2005. URL http://dx.doi.org/10.1007/3-211-27389-1_130. (page 84). [285] S. Wagner, G. Kronberger, A. Beham, S. Winkler, and M. Affenzeller. Model driven rapid prototyping of heuristic optimization algorithms, 2009. URL http://dx.doi.org/10.1007/978-3-642-04772-5_94. (page 85). [286] D. Waltemath, R. Adams, D. Beard, F. Bergmann, U. Bhalla, R. Britten, V. Chelliah, M. Cooling, J. Cooper, E. Crampin, et al. Minimum information about a simulation experiment (MIASE). PLoS Computational Biology, 7(4):e1001122, 2011. (page 273). [287] H. Wang, P. Tong, P. Thompson, and Y. Li. QoS-based web services selection. In ICEBE, pages 631–637, 2007. doi: http://doi.ieeecomputersociety.org/10.1109/ICEBE.2007.88. (pages 280, 284, 287). [288] J. Wegener, K. Grimm, M. Grochtmann, and H. Sthamer. Systematic testing of real-time systems. In Proceedings of the Fourth International Conference on Software Testing and Review (EuroSTAR), 1996. (page 306). [289] J. Wegener, H. Sthamer, B. Jones, and D. Eyres. Testing real-time systems using genetic algorithms. Software Quality Control, 6(2):127–135, 1997. ISSN 0963-9314. doi: 10.1023/A:1018551716639. (pages 306, 313). [290] J. Wegener, H. Sthamer, B. F. Jones, and D. E. Eyres. Testing real-time systems using genetic algorithms. Software Quality Journal, 6:127–135, 1997. ISSN 0963-9314. (pages 24, 214). [291] T. Weise. Global Optimization Algorithms - Theory and Application.
Self-Published, second edition, 2009. Online available at http://www.it-weise.de/. (pages 25, 26). [292] W. West and T. Ogden. Statistical analysis with webstat, a java applet for the world wide web. Journal of Statistical Software, 2(3):1–7, 9 1997. ISSN 1548-7660. URL http: //www.jstatsoft.org/v02/i03. (page 87). [293] W. West, Y. Wu, and D. Heydt. An introduction to statcrunch 3.0. Journal of Statistical Software, 9(5):??–??, 3 2004. ISSN 1548-7660. URL http://www.jstatsoft.org/v09/i05. (page 87). [294] D. Whitley. The genitor algorithim and selection pressure: Why rank-based allocation of reproductive trials is best. In Proceedings of the Third International Conference on Genetic Algorithms, pages 116–121, 1989. (page 112). [295] D. Whitley, S. Rana, and R. B. Heckendorn. The island model genetic algorithm : On separability, population size and convergence. CIT. Journal of computing and information technology, 7(1):33 – 47, 1999. (page 116). 354 BIBLIOGRAPHY [296] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945. (page 271). [297] D. N. Wilke, S. Kok, and A. A. Groenwold. Comparison of linear and classical velocity update rules in particle swarm optimization: notes on diversity. International Journal for Numerical Methods in Engineering, 70(8):962 984, 2007. (page 38). [298] G. C. Wilson, A. Mc Intyre, and M. I. Heywood. Resource review: Three open source systems for evolving programs–lilgp, ecj and grammatical evolution. Genetic Programming and Evolvable Machines, 5(1):103–105, 2004. ISSN 1389-2576. doi: http://dx.doi.org/10. 1023/B:GENP.0000017053.10351.dc. (page 83). [299] C. Wohlin. Experimentation in Software Engineering: An Introduction. The Kluwer International Series in Software Engineering. Kluwer Academic, 2000. ISBN 9780792386827. URL http://books.google.es/books?id=nG2UShV0wAEC. (page 63). [300] D. Wolpert and W. Macready. No free lunch theorems for optimization. 
Evolutionary Computation, IEEE Transactions on, 1(1):67–82, Apr 1997. ISSN 1089-778X. (pages 5, 9, 41). [301] A. H. Wright. Genetic algorithms for real parameter optimization. In Foundations of Genetic Algorithms, pages 205–218. Morgan Kaufmann, 1994. (page 109). [302] X. Yao and Y. Liu. Fast evolutionary programming. In Proc. 5th Ann. Conf. on Evolutionary Programming, 1996. (page 110). [303] L. Zeng, B. Benatallah, A. H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang. Qos-aware middleware for web services composition. IEEE Trans. Softw. Eng., 30(5):311–327, 2004. ISSN 0098-5589. doi: http://dx.doi.org/10.1109/TSE.2004.11. (pages 280, 283, 284, 285, 286). [304] C. Zhang, S. Su, and J. Chen. Diga: Population diversity handling genetic algorithm for qos-aware web services selection. Comput. Commun., 30(5):1082–1090, March 2007. ISSN 0140-3664. doi: 10.1016/j.comcom.2006.11.002. URL http://portal.acm.org/ citation.cfm?id=1228023. (page 287). [305] H. Zheng, W. Zhao, J. Yang, and A. Bouguettaya. Qos analysis for web service compositions with complex structures. Services Computing, IEEE Transactions on, PP(99):1, 2012. ISSN 1939-1374. doi: 10.1109/TSC.2012.7. (pages 281, 287). [306] H. Zhou and J. J. Grefenstette. Induction of finite automata by genetic algorithms. In oceedings of the 1986 IEEE International Conference on Systems, Man, and Cybernetics, page 170 to 174, 1986. (page 109). 355 BIBLIOGRAPHY [307] E. Zitzler and L. Thiele. Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans. Evolutionary Computation, 3(4):257– 271, 1999. (page 105). [308] E. Zitzler, M. Laumanns, and L. Thiele. Spea2: Improving the strength pareto evolutionary algorithm. Technical report, Computer Engineering and Networks Laboratory (TIK). Department of Electrical Engineering. Swiss Federal Institute of Technology (ETH), 2001. (page 105). 356