MOSES: A METAHEURISTIC OPTIMIZATION SOFTWARE ECOSYSTEM
APPLICATIONS TO THE AUTOMATED ANALYSIS OF SOFTWARE PRODUCT LINES AND SERVICE-BASED APPLICATIONS
JOSÉ ANTONIO PAREJO MAESTRE
PhD dissertation
Supervised by Dr. Antonio Ruiz Cortés and
Dr. Sergio Segura Rueda
Universidad de Sevilla
October 2013
First published on 1/10/2013 by
José Antonio Parejo Maestre
Copyright ©
http://www.isa.us.es/members/joseantonio.parejo
[email protected]
This is a copyleft document but the content is copyrighted
Support: PhD dissertation funded by the Spanish Government under CICYT projects
SETI (TIN-2009-07366) and TAPAS (TIN2012-32273), the Andalusian Government
projects ISABEL (TIC-2533) and THEOS (TIC-5906), and the European Commission
through the European Network of Excellence in Software Services and Systems
(S-Cube).
Dr. Antonio Ruiz Cortés, Associate Professor (Profesor Titular) in the area of Computer Languages and Systems at the Universidad de Sevilla, and Dr. Sergio Segura Rueda, Assistant Professor (Profesor Contratado Doctor) in the area of Computer Languages and Systems at the Universidad de Sevilla,
HEREBY CERTIFY
that Mr. José Antonio Parejo Maestre, Computer Engineer by the Universidad de Sevilla, has carried out under our supervision the work entitled
MOSES: a Metaheuristic Optimization Software EcoSystem,
Applications to the Automated Analysis of Software Product Lines and Service-based
Applications
Having reviewed it, we authorize the start of the procedures for its presentation as a Doctoral Thesis to the committee that is to judge it.
Signed: Dr. Antonio Ruiz Cortés and Dr. Sergio Segura Rueda
at the Universidad de Sevilla
1/10/2013
I, Mr. José Antonio Parejo Maestre, with national ID number 44602941F,
HEREBY DECLARE
my authorship of the work presented in the report of this doctoral thesis, entitled:
MOSES: a Metaheuristic Optimization Software EcoSystem,
Applications to the Automated Analysis of Software Product Lines and Service-based
Applications
In witness whereof, I sign,
Signed: Mr. José Antonio Parejo Maestre
at the Universidad de Sevilla
1/10/2013
Universidad de Sevilla
The committee in charge of evaluating the dissertation presented by
José Antonio Parejo Maestre in partial fulfillment of the requirements for the
degree of Doctor of Philosophy in Software Engineering hereby recommends
_______ of this dissertation and awards the author the grade _______.
Miguel Toro Bonilla
Catedrático de Universidad
Universidad de Sevilla
José Cristobal Riquelme Santos
Catedrático de Universidad
Universidad de Sevilla
José Raul Romero
Profesor Contratado Doctor
Universidad de Córdoba
Stefan Wagner
Professor
University of Applied Sciences
Upper Austria
Jorge Cardoso
Associate Professor
University of Coimbra, Portugal
In witness whereof, where necessary, we sign these minutes in _______,
Dedicated to my family.
And especially to Ana Rut,
because without her love and devotion
this document
and the happiness of its author
would not be possible.
ACKNOWLEDGEMENTS
After a long period of hard work, it is finally time to look back and thank all the people
who made this thesis possible. With all my gratitude.
First of all, I thank God, for giving me each day of life, and the gifts and strength to
carry out this thesis.
Second, I want to thank my supervisors, Antonio and Sergio, because this thesis is
the result of their encouragement, guidance, support and help. Thank you Antonio, for
being open-minded and brave enough to embark on this adventure, and for allowing me
to work on this topic. Thank you for your useful advice on both work and life, and for
your good humor, your support and your leadership. Thank you Sergio, for your willingness,
for your hard work, for the long hours you have spent side by side with me. Without
your contribution this thesis would be much worse. I must thank you both for accepting my limitations without blame, for helping me to overcome them, and for all your
dedication during this time.
My fellow researchers in the ISA group also deserve a big thank you because, in one
way or another, they contributed to this thesis. In particular I am grateful to Pablo
Fernández, for being a true friend and a great colleague all these years; to Guti, Manolo
and Joaquín, for giving me their companionship and support; to Amador and Pablo
Trinidad, for our discussions and for making me laugh even in the hardest times;
to Carlos, Jesús and Adela, for making the long hours of work more bearable with
their company at lunches and breaks; to Beatriz, for always having her door open
for me; and to the rest of the members of the ISA group, for their companionship and
great help throughout this time, especially those who are abroad, Cristina and José María.
My gratitude also goes to the technicians of our group: Manuel León, Alejandro Trinidad,
Alberto Calleja and Pablo León. I also want to show my gratitude to other members
of the Department of Computer Languages and Systems, especially to Jorge García, for
being by my side all this time, and for his willingness to work with me on STATService.
I am also grateful to Pepe for his contagious laughter, for his visits to our office, and,
along with Fernando, Rafael and Miguel, for everything you have taught me. During
my education, you made me discern the beauty hidden in algorithms, in languages,
in software and its engineering; this thesis is also the result of your work.
I must express my gratitude to Barbara Pernici, who welcomed me warmly into her lab
during my research stay in Milan, and to Maria-Grazia Fugini, who supported me and
worked with me during the stay.
Finally, a few words of thanks, originally written in Spanish. Thanks to my family: my
parents, my sisters, my brothers- and sisters-in-law, my parents-in-law, etc., for always being
there. Especially, thank you Mom and Dad, for teaching me the culture of effort, but
also the importance of rest, for fully trusting my abilities, and for always supporting me.
Finally, to the most important people to me: THANK YOU, ANA RUT,
because without your sacrifice and devotion, without your love and your support, this thesis
would have been impossible. Thank you, Ana, my little one, for being a ray of light in the
dark moments, the flower that smiles at me from your photo in the corner of my desk.
Thank you, Antonio and Marcos, for forcing me to disconnect, and for turning the journey
home into the most rewarding moment of the day.
José Antonio Parejo Maestre
September 2013
ACKNOWLEDGEMENTS

After a long period of hard work, it is time to look back and thank all the people
who made this thesis possible. With all my gratitude.

First of all, I thank God, for giving me each day of life, and the gifts and strength
needed to carry out this thesis.

Second, I want to thank my supervisors, Antonio and Sergio, because this thesis is
the result of their encouragement, guidance, support and help. Thank you Antonio,
for being brave and open-minded enough to embark on this adventure, and for letting
me work on this topic I am so passionate about. Thank you for your advice, always
useful and on point, for my work and for my life. Thank you above all for your good
humor and your manner, for your support and your gentle, inspiring leadership. Thank
you Sergio, for your willingness, for how hard you have worked over so many hours,
always side by side with me. Without your contribution this thesis would be much worse.
Thank you both for accepting my limitations without reproach, for helping me to
overcome them, and for all your dedication during this time.

My colleagues in the ISA group also deserve a big thank you because, in one way or
another, they have contributed to making this thesis possible. I am especially grateful
to Pablo Fernández, for being a true friend and a great colleague all these years; to
Guti, Manolo and Joaquín, for their company and support; to Amador and Pablo
Trinidad, for our conversations and for making me laugh even in the hardest moments;
to Carlos, Jesús and Adela, for making the long hours of work more bearable with their
company at lunches and snacks; to Beatriz Bernárdez, for always keeping her door
open for me; and to the rest of the members of the ISA group, Jose, Octavio, David
Ruiz and David Benavides, Ana Belén, Fabricia, but very especially to those who are
far away, Cristina and José María. My gratitude also goes to the group's technicians,
for their great work and willingness whenever we have collaborated on the various
projects: Manuel León, Alejandro Trinidad, Alberto Calleja and Pablo León. I also
want to express my gratitude to the members of the Department of Computer Languages
and Systems for the way they have welcomed and treated me over these years, especially
to Jorge García, for being by my side and listening to me all this time, and for his
willingness to collaborate with me on STATService. I cannot fail to thank Pepe, for so
often infecting me with his laughter and for his visits to our office, and, together with
Fernando, Rafael and Miguel, for everything they have taught me. During my education,
they helped me glimpse the beauty hidden in algorithms, in languages, in software and
its engineering, and that is why this thesis is also the fruit of their work.

I must also express my gratitude to Barbara Pernici, for welcoming me into her group
during my research stay in Milan, and to Maria-Grazia Fugini for her support and
collaboration during the stay.

Thanks to my family: my parents, my sisters, my brothers- and sisters-in-law, my
parents-in-law, and my other brothers, for always being there. Especially, thank you
Mom and Dad, for teaching me the culture of effort, but also the importance of rest,
for fully trusting my abilities, and for always supporting me. Finally, to the most
important people to me: THANK YOU, ANA RUT, because without your sacrifice and
devotion, without your love and your support, this thesis would have been impossible.
Thank you, Ana, my little one, for being a ray of light in the dark moments, the flower
that smiles at me from your photo in the corner of my desk. Thank you, Antonio and
Marcos, for forcing me to disconnect, and for turning the journey home into the most
rewarding moment of the day.

José Antonio Parejo Maestre
September 2013
ABSTRACT

Most of the problems we face nowadays can be expressed as optimization problems. An
optimization problem is solved by finding, from a set of candidate solutions, the one
that best fulfills a set of objectives. Finding the best solution to an optimization problem
is hard, or even infeasible, in most real cases. Heuristic algorithms have been used for
decades to guide the search for satisfactory solutions to hard optimization problems at
an affordable cost. Metaheuristics are reusable schemes that ease the implementation of
heuristic-based algorithms for solving optimization problems.
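The notion of a metaheuristic as a reusable scheme can be made concrete with a short sketch. The following Python fragment is purely illustrative (it is not code from the thesis): a generic hill-climbing scheme in which the problem-specific parts, namely the initial solution, the neighborhood, and the objective, are supplied as parameters.

```python
def hill_climbing(initial, neighbors, objective, max_steps=1000):
    """Generic hill-climbing scheme: repeatedly move to the best
    neighbor while it improves the objective (maximization)."""
    current = initial
    for _ in range(max_steps):
        best = max(neighbors(current), key=objective, default=None)
        if best is None or objective(best) <= objective(current):
            break  # local optimum: no improving neighbor
        current = best
    return current

# Toy instance: maximize f(x) = -(x - 7)^2 over the integers 0..20.
result = hill_climbing(
    initial=0,
    neighbors=lambda x: [v for v in (x - 1, x + 1) if 0 <= v <= 20],
    objective=lambda x: -(x - 7) ** 2,
)
print(result)  # → 7, the optimum of this toy instance
```

The same scheme is reused across problems by swapping in a different neighborhood and objective, which is precisely the kind of reuse that metaheuristic frameworks industrialize.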
The use of metaheuristics to solve optimization problems is a widely studied topic
in computer science. In this context, software engineers have recently realized the
benefits of using metaheuristics to solve hard optimization problems, usually referred
to as search-based problems. This has led to a "search-based" trend observable in a
number of software engineering conferences and special issues on the matter. However,
despite its many benefits, the application of metaheuristics requires overcoming
numerous obstacles. First, the implementation of efficient metaheuristic programs is a
complex and error-prone process that requires knowledgeable developers. Although
some supporting tools have been proposed, these usually automate only single tasks
of the process. A further key challenge in the application of metaheuristics is
experimentation. This is due to the fact that there is no analytical method to choose a
suitable metaheuristic program for a given problem. Instead, experiments must be
performed to compare the candidate techniques and their possible variants. This can
lead to hundreds of potential alternatives to be compared, making the design, execution
and analysis of experiments complex and time-consuming. Besides this, experiments
are usually performed ad hoc, with generic tools and no clear guidelines, introducing
threats to validity and making them hard to automate and reproduce.
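The experimentation burden can be illustrated with a toy comparison, written for this summary as a much simplified stand-in for the thesis's tooling: even deciding between two parameter settings of a trivial randomized heuristic already requires repeated, independently seeded runs and a statistical summary, and realistic studies multiply this across many techniques, parameters, and problem instances.

```python
import random
import statistics

def sphere(x):
    """Objective to minimize: the sphere function, sum of squares."""
    return sum(v * v for v in x)

def random_search(evals, step, rng, dim=5):
    """A deliberately simple randomized heuristic: perturb the
    best-so-far point with Gaussian noise of width `step`."""
    best = [rng.uniform(-5, 5) for _ in range(dim)]
    for _ in range(evals):
        cand = [v + rng.gauss(0, step) for v in best]
        if sphere(cand) < sphere(best):
            best = cand
    return sphere(best)

# Compare two variants over 30 independently seeded runs each.
results = {}
for step in (0.1, 2.0):
    runs = [random_search(500, step, random.Random(seed)) for seed in range(30)]
    results[step] = statistics.mean(runs)
print(results)  # mean best objective value per variant (lower is better)
```

A sound conclusion would further require statistical hypothesis testing on the per-run results, which is exactly the kind of analysis the thesis aims to automate.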
The goal of this thesis is to reduce the cost of applying metaheuristics to solve
optimization problems. To that purpose, we present a set of tools to support the
selection, configuration and evaluation of metaheuristic-based applications. First, we
present a comparison framework and a survey of Metaheuristic Optimization
Frameworks (MOFs). This supports the selection of the right MOF for a given
optimization problem. Second, we present an experimental description language (SEDL),
and an extension of it (MOEDL), to support the description of experiments and their
results in a succinct, self-contained and machine-processable way. Third, we present a
set of analysis operations for SEDL documents. Among others, these operations support
the automated validation of SEDL experiments, warning users about potential threats
and suggesting possible fixes. Fourth, we present a software ecosystem (MOSES) to
support the integration of metaheuristic and experimentation tools. We also present a
reference implementation of the ecosystem, including the following tools: i) FOM, a
MOF developed by the authors; ii) an Experimental Execution Environment (E3) for
the automated analysis, execution and replication of experiments described in SEDL;
and iii) a suite of on-line software tools (STATService) supporting the most common
statistical tests used in the context of metaheuristics.

For the validation of our work, we used MOSES to solve two relevant search-based
problems in the context of quality-driven web service composition and performance
testing in the analysis of feature models. As a result, MOSES lessened the implementation
effort and the experimentation burden, producing algorithms that improve the state of
the art for both problems.
SUMMARY

Many of the situations we face every day can be expressed as optimization problems.
An optimization problem is solved by finding, from among a set of candidate solutions,
the one that best satisfies a set of objectives. Finding the best solution to an optimization
problem is difficult, or even infeasible, in many real cases. Heuristic algorithms have
been used for decades to guide the search for satisfactory solutions to hard optimization
problems within an affordable execution time. Metaheuristics are reusable algorithm
schemes that ease the design of heuristic algorithms for solving hard optimization
problems.

The use of metaheuristics to solve optimization problems is a widely studied topic. In
this context, software engineers have recently realized the benefits of using
metaheuristics to solve hard optimization problems, generally known as search-based
problems. This has led to an emerging line of research on search-based problems that
can be seen in software engineering conferences and journal special issues on the subject.

However, despite its many advantages, the application of metaheuristics presents
numerous obstacles. First, implementing metaheuristics as efficient programs is a
complex and error-prone process that requires expert developers. Although some
supporting tools have been proposed, they generally automate only isolated tasks of this
process. Another key challenge in the application of metaheuristics is experimentation.
This is because there is no general theoretical method for choosing a suitable
metaheuristic program for a given problem, so experiments must be performed to
compare the candidate techniques and their possible variants. This can lead to hundreds
of possible alternatives that must be compared, making the design, execution and
analysis of the experiments complex and lengthy. Finally, experiments are usually
performed with generic tools and without guidelines regarding threats to validity,
their automation and their replicability.

The goal of this thesis is to reduce the cost of applying metaheuristics to solve
optimization problems. To that end, we present a set of tools to support the selection,
configuration and evaluation of metaheuristic-based solutions. First, we present a
comparison framework, on the basis of which the characteristics of several frameworks
for metaheuristic optimization (MOFs) have been studied. This supports the selection
of the right MOF for the optimization problem to be solved. Second, we present an
experimental description language (SEDL), and an extension of it (MOEDL), to support
the description of experiments and their results in a succinct, self-contained and
automatically processable way. Third, we present a set of analysis operations on SEDL
documents. Among others, these operations support the automated validation of
potential threats to validity, warning SEDL users and suggesting possible fixes. Fourth,
we present a software ecosystem (MOSES) to support the integration of metaheuristic
and experimentation tools. In addition, we present a reference implementation of the
ecosystem, including the following tools: i) FOM, the framework developed by the
authors; ii) an Experimental Execution Environment (E3) for the automated analysis,
execution and replication of experiments described in SEDL and MOEDL; and iii) a
suite of on-line software tools (STATService) supporting statistical analysis with the
most common tests in the context of metaheuristics.

For the validation of this work, MOSES has been used to solve two relevant search-based
optimization problems in the context of software engineering: maximizing the quality of
web service compositions, and performance testing in the analysis of feature models.
As a result, MOSES has reduced the implementation effort and the experimentation
burden, and algorithms have been designed that improve the state of the art for both
problems.
C ONTENTS
I
Preface
1
1
Introduction
3
1.1
Research context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
1.1.1
Metaheuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
1.1.2
Experimentation . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
1.1.3
Tooling support . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8
1.2
Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8
1.3
Thesis goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
1.4
Proposal solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
1.4.1
On the implementation of MPS applications . . . . . . . . . . . .
12
1.4.2
On the description of MOEs . . . . . . . . . . . . . . . . . . . . . .
12
1.4.3
On the automated analysis of MOEs . . . . . . . . . . . . . . . . .
13
1.4.4
On the automated conduction and replication of MOEs . . . . . .
13
1.4.5
On the development of MPS applications . . . . . . . . . . . . . .
14
1.4.6
Overall contributions . . . . . . . . . . . . . . . . . . . . . . . . . .
14
1.5
Thesis context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
1.6
Structure of this dissertation . . . . . . . . . . . . . . . . . . . . . . . . . .
17
ix
CONTENTS
II
2
Background Information
Optimization Problems and Metaheuristics
21
2.1
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
22
2.1.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
22
2.1.2
Why are optimization problems hard? . . . . . . . . . . . . . . . .
22
Metaheuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
2.2.1
Hybridization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
Single-solution based metaheuristics . . . . . . . . . . . . . . . . . . . . .
27
2.3.1
Hill Climbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
2.3.2
Simulated annealing . . . . . . . . . . . . . . . . . . . . . . . . . .
28
2.3.3
Tabu search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
30
Population methods based metaheuristics . . . . . . . . . . . . . . . . . .
32
2.4.1
Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . .
32
2.4.2
Path Relinking . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
36
2.4.3
Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . .
37
2.4.4
Scatter Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
38
Building methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
38
2.5.1
GRASP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
38
2.5.2
Ant Colony Optimization . . . . . . . . . . . . . . . . . . . . . . .
39
Metaheuristic Optimization Frameworks . . . . . . . . . . . . . . . . . .
40
2.6.1
Why are MOFs valuable? . . . . . . . . . . . . . . . . . . . . . . .
41
2.6.2
Drawbacks: All that glitters ain’t gold . . . . . . . . . . . . . . . .
42
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
43
2.2
2.3
2.4
2.5
2.6
2.7
3
Experimentation
x
19
45
CONTENTS
3.1
The concept of Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . .
46
3.2
Sample experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
3.3
Experimental Description . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
3.3.1
Objects, subjects and populations . . . . . . . . . . . . . . . . . . .
48
3.3.2
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
50
3.3.3
Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
53
3.3.4
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
54
Experimental Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . .
59
3.4.1
Exploratory data analysis . . . . . . . . . . . . . . . . . . . . . . .
60
3.4.2
Confirmatory data analysis . . . . . . . . . . . . . . . . . . . . . .
61
3.4
3.5
3.6
III
4
Experimental Validity
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
65
3.5.1
Internal validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
65
3.5.2
External Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . .
72
Metaheuristic Optimization Experiments . . . . . . . . . . . . . . . . . .
73
3.6.1
Selection and Tailoring experiments . . . . . . . . . . . . . . . . .
73
3.6.2
Tuning Experiments . . . . . . . . . . . . . . . . . . . . . . . . . .
75
3.6.3
Designs for MOEs . . . . . . . . . . . . . . . . . . . . . . . . . . .
76
3.6.4
Analyses for MOEs . . . . . . . . . . . . . . . . . . . . . . . . . . .
76
3.6.5
Threats to validity in MOEs . . . . . . . . . . . . . . . . . . . . . .
77
Contributions
79
Motivation
81
4.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
82
4.2
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
83
xi
CONTENTS
4.3
4.4
5
4.2.1
On the implementation of MPS applications . . . . . . . . . . . .
83
4.2.2
On the description of MOEs . . . . . . . . . . . . . . . . . . . . . .
83
4.2.3
On the execution of MOEs . . . . . . . . . . . . . . . . . . . . . . .
85
4.2.4
On the analysis of MOEs . . . . . . . . . . . . . . . . . . . . . . . .
86
4.2.5
On the replicability of MOEs . . . . . . . . . . . . . . . . . . . . .
87
Overview of our contributions . . . . . . . . . . . . . . . . . . . . . . . .
88
4.3.1
On the implementation of MPS applications . . . . . . . . . . . .
88
4.3.2
On the description of MOEs . . . . . . . . . . . . . . . . . . . . . .
88
4.3.3
On the execution of MOEs . . . . . . . . . . . . . . . . . . . . . . .
89
4.3.4
On the analysis of MOEs . . . . . . . . . . . . . . . . . . . . . . . .
90
4.3.5
On the replicability of MOEs . . . . . . . . . . . . . . . . . . . . .
90
4.3.6
On the development of MPS-based applications . . . . . . . . . .
91
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
91
Comparative framework for MOFs
93
5.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
94
5.2
Review Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
94
5.2.1
Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . .
95
5.2.2
Source material . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
97
5.2.3
Inclusion and Exclusion criteria . . . . . . . . . . . . . . . . . . . .
97
5.2.4
Comparison Criteria . . . . . . . . . . . . . . . . . . . . . . . . . .
99
5.3
xii
Metaheuristic Techniques (C1) . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.1
Characteristics Description . . . . . . . . . . . . . . . . . . . . . . 102
5.3.2
Assessment and Feature Coverage Analysis . . . . . . . . . . . . 105
5.3.3
Comparative analysis . . . . . . . . . . . . . . . . . . . . . . . . . 106
CONTENTS
5.4
5.5
5.6
5.7
5.8
Adapting to a problem and its structure (C2) . . . . . . . . . . . . . . . . 107
5.4.1
Characteristics Description . . . . . . . . . . . . . . . . . . . . . . 108
5.4.2
Assessment and Feature Coverage Analysis . . . . . . . . . . . . 114
5.4.3
Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 114
Advanced characteristics (C3) . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5.1
Characteristics Description . . . . . . . . . . . . . . . . . . . . . . 115
5.5.2
Assessment and Feature Cover Analysis . . . . . . . . . . . . . . 116
MPS life-cycle Support (C4) . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.6.1
Characteristics Description . . . . . . . . . . . . . . . . . . . . . . 118
5.6.2
Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 121
Design, Implementation and Licensing (C5) . . . . . . . . . . . . . . . . . 122
5.7.1
Characteristics description . . . . . . . . . . . . . . . . . . . . . . 122
5.7.2
Assessment and feature cover analysis . . . . . . . . . . . . . . . 124
Documentation & support (C6) . . . . . . . . . . . . . . . . . . . . . . . . 125
5.8.1
5.9
Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 128
Discussion and Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.9.1
Capabilities Discussion . . . . . . . . . . . . . . . . . . . . . . . . 130
5.9.2
Evolution of the market of MOFs . . . . . . . . . . . . . . . . . . . 131
5.9.3
Potential areas of improvement of current frameworks . . . . . . 132
5.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6
Scientific Experiments Description Language
137
6.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.2
Experimental description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.2.1
Objects, subjects and population . . . . . . . . . . . . . . . . . . . 138
xiii
CONTENTS
Constants and variables . . . . . . . . . . . . . . . . . . . . . . . . 140
6.2.3
Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.2.4
Experimental design . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.2.5
Analyses specification . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.3
Experimental execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.4
Automated analysis of SEDL documents . . . . . . . . . . . . . . . . . . . 148
6.5
6.6
7
6.2.2
6.4.1
Information extraction operations . . . . . . . . . . . . . . . . . . 149
6.4.2
Operations for validity checking . . . . . . . . . . . . . . . . . . . 151
Extension points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
6.5.1
Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
6.5.2
Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.5.3
Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.5.4
Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.5.5
Analyses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Metaheuristic Optimization Experiments Description Language
157
7.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.2
MOEDL experimental descriptions . . . . . . . . . . . . . . . . . . . . . . 161
7.3
xiv
7.2.1
Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.2.2
Problem types and instances . . . . . . . . . . . . . . . . . . . . . 162
7.2.3
Metaheuristic techniques . . . . . . . . . . . . . . . . . . . . . . . 163
7.2.4
Termination criterion . . . . . . . . . . . . . . . . . . . . . . . . . . 164
7.2.5
Random number generation algorithm . . . . . . . . . . . . . . . 166
Types of MOEs supported by MOEDL . . . . . . . . . . . . . . . . . . . . 166
CONTENTS
7.4
8
7.3.1
Selection and tailoring experiments . . . . . . . . . . . . . . . . . 166
7.3.2
Tuning experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Transformation from MOEDL to SEDL . . . . . . . . . . . . . . . . . . . . 168
7.4.1
Transformation of common elements . . . . . . . . . . . . . . . . 169
7.4.2
Transformation of Techniques Comparison Experiments . . . . . 171
7.4.3
Transformation of technique tuning experiments . . . . . . . . . . 173
7.5
Extension points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.6
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
MOSES: A Meta-heuristic Optimization Software Ecosystem
8.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
8.2
Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.3
Reference Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.4
8.3.1
Architectural Style . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.3.2
Abstract Component View . . . . . . . . . . . . . . . . . . . . . . 188
MOSES Reference Implementation (MOSES[RI]) . . . . . . . . . . . . . . 192
8.4.1
8.5
8.6
9
STATService . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Using MOSES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.5.1
IV
181
MOSES Studio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
Validation
Validation
203
205
9.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
9.2
QoS-aware composite web services binding . . . . . . . . . . . . . . . . . 206
xv
CONTENTS
9.2.1 Selection . . . 207
9.2.2 Implementation . . . 210
9.2.3 Tailoring . . . 210
9.2.4 Tuning . . . 211
9.2.5 Analysis . . . 212
9.3 Generation of hard FMs . . . 213
9.3.1 Selection . . . 214
9.3.2 Implementation . . . 214
9.3.3 Tailoring . . . 215
9.3.4 Tuning . . . 216
9.3.5 Analysis . . . 218
9.3.6 Experiments on the generation of hard FMs . . . 218
9.4 Summary . . . 223

V Final Remarks . . . 225

10 Conclusions . . . 227
10.1 Conclusions . . . 227
10.2 Support for Results . . . 229
10.3 Discussion, Limitations and Extensions . . . 230
VI Appendices . . . 233

A MOFs assessment data . . . 235
A.1 Evaluation per Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
A.2 Global evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
B Meta-models and Schemas . . . 251
B.1 SEDL Meta-model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
B.2 MOEDL Meta-model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
B.3 XML Schemas of SEDL and MOEDL . . . . . . . . . . . . . . . . . . . . . 265
C A Metaheuristics Description Syntax in EBNF . . . 267

D Statistical tests supported . . . 271

E SEA . . . 273

F EEE: Experimental Execution Environment . . . 277

G QoS-aware Binding of Composite Web Services . . . 279
G.1 The QoS-aware Composite Web Services Binding Problem . . . . . . . . 279
G.1.1 QoS Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
G.2 Our Proposal: QoSGasp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
G.3 Previous Proposals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
G.3.1 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
G.3.2 Hybrid TS with SA . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
G.4 Experiments performed on the QoSWSCB Problem . . . . . . . . . . . . 292
G.4.1 Experiment QoSWSCB-#A1: Tailoring of GRASP . . . . . . . . . . 292
G.4.2 Experiment #A2: Tuning of GRASP+PR . . . . . . . . . . . . . . . 293
G.4.3 Experiment #1: Selection of a technique for QoSWSCB . . . . . . 293
G.4.4 Experiment #2: Selection of a technique for QoSWSCB (with a
different objective function) . . . . . . . . . . . . . . . . . . . . . . 298
G.5 Threats to validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
H Generation of Hard Feature Models . . . 305
H.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
H.2 Feature Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
H.3 ETHOM: An evolutionary algorithm for feature models . . . . . . . . . . 309
H.3.1 Instantiation of the algorithm . . . . . . . . . . . . . . . . . . . . . 312
H.4 Experiments on the generation hard feature models . . . . . . . . . . . . 313
H.4.1 Experiment #1(b): Maximizing execution time in a SAT Solver . . 315
H.4.2 Experiment #2: Maximizing memory consumption in a BDD solver . . . 316
H.4.3 Experiment #3(a): Evaluating the impact of the number of generations on the effectiveness of ETHOM (with JaCoP) . . . . . . . . 317
H.4.4 Experiment #3(b): Evaluating the impact of the number of generations on the effectiveness of ETHOM (with SAT) . . . . . . . . 319
H.4.5 Experiment #4: Evaluating the impact of the Heuristics of JaCoP . . . 319
H.5 Threats to validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
I Evidences of Utility and Applicability . . . 321
I.1 Utility of the Comparative Framework for MOFs . . . 321
I.2 Utility of MOSES[RI] . . . 323
I.2.1 Utility of FOM . . . 323
I.2.2 Utility of STATService . . . 323
I.2.3 Utility of EEE . . . 325
Acronyms . . . 327

Bibliography . . . 328
LIST OF FIGURES
1.1 Metaheuristic problem solving life-cycle . . . 6
1.2 Experimentation in the context of the MPS life-cycle . . . 7
1.3 Summary of contributions per phase of the MPS life-cycle . . . 15
2.1 Objective function landscapes of several optimization problems . . . 23
2.2 Taxonomy of optimization techniques . . . 26
2.3 Search paths generated by HC and SA . . . 31
2.4 Crossover operators with binary encoding . . . 33
2.5 Sample crossover and mutation for car design solutions . . . 34
2.6 Path generation in PR . . . 36
2.7 Paths between binary encoded solutions of the car design problem . . . 38
2.8 MOFs conceptual map . . . 41
3.1 Experimental life-cycle . . . 47
3.2 Conceptual map about experimental description . . . 49
3.3 Experimental objects, populations and sample . . . 50
3.4 Taxonomy of experimental variables . . . 51
3.5 Taxonomy of experimental variables according to their levels . . . 53
3.6 Hypothesis acceptance and rejection areas . . . 63
5.1 Stacked Bar Chart showing MOFs techniques support . . . 107
5.2 Adaptation to the problem and its structure support . . . 112
5.3 Advanced characteristics support . . . 117
5.4 General optimization process support . . . 121
5.5 Design, implementation & licensing assessment . . . 125
5.6 Frameworks size . . . 125
5.7 Publications and external authors per MOF . . . 129
5.8 Documentation and technical support . . . 130
5.9 General scores of MOFs as Kiviat diagrams . . . 135
6.1 SEDL structure and its mapping to a sample experiment . . . 139
6.2 Schema of the context information supported by SEDL . . . 140
6.3 Schema of the context information supported by SEDL . . . 140
6.4 SEDL document with randomized design . . . 141
6.5 Descriptive hypothesis supported by SEDL . . . 142
6.6 Simple randomized design supported by SEDL . . . 142
6.7 Simple experimental execution in SEDL . . . 145
6.8 Experiment description and relational model of its results . . . 146
6.9 Sample of command experimental procedure . . . 147
6.10 Samples of statistical analyses specifications and results . . . 148
7.1 MOEDL structure and its mapping to a sample MOE . . . 159
7.2 Sample MOEDL experiment . . . 160
7.3 Example of mapping from MOEDL to SEDL . . . 161
7.4 Problem instances enumeration supported by MOEDL . . . 162
7.5 Optimization benchmarks specification supported by MOEDL . . . 163
7.6 Problem instance generator defined in MOEDL . . . 163
7.7 Metaheuristic techniques specification supported by MOEDL . . . 164
7.8 Global termination criteria and random number generators in MOEDL . . . 165
7.9 Tailoring experiment in MOEDL . . . 167
7.10 Tuning experiment in MOEDL . . . 168
7.11 Transformation from MOEDL to SEDL . . . 174
8.1 Template MOSES component . . . 188
8.2 Components of MOSES . . . 190
8.3 MOSES[RI] component diagram . . . 193
8.4 MOSES[RI] deployment diagram . . . 194
8.5 Architecture and users of STATService . . . 195
8.6 Decision tree used for test selection . . . 197
8.7 Snapshots of the STATService web portal . . . 200
8.8 MOSES Studio user interface navigability . . . 201
9.1 Selection experiment for QoSWSC (Exp 1) . . . 208
9.2 Selection experiment for QoSWSC (Exp 2) . . . 209
9.3 Selection experiment for QoSWSC (Exp 3) . . . 211
9.4 Selection experiment for QoSWSC (Exp 4) . . . 212
9.5 Results of STATService for Exp1 (100ms) . . . 213
9.6 Tailoring of ETHOM . . . 216
9.7 Tuning of ETHOM . . . 217
9.8 Analysis report and decision path generated by STATService . . . 218
9.9 ETHOM - Experiment #1 in SEDL . . . 220
9.10 ETHOM - Experiment #2 in SEDL . . . 221
9.11 ETHOM - Experiment #3 in SEDL . . . 222
9.12 ETHOM - Experiment #4 in SEDL . . . 223
10.1 Publications related to the contributions of this dissertation . . . . . . . . 230
B.1 Meta-model of experiments in SEDL . . . . . . . . . . . . . . . . . . . . . 254
B.2 Meta-model of experiments context in SEDL . . . . . . . . . . . . . . . . 254
B.3 Meta-model of experimental hypotheses in SEDL . . . . . . . . . . . . . 255
B.4 Meta-model of experimental variables in SEDL . . . . . . . . . . . . . . . 255
B.5 Meta-model of design in SEDL . . . . . . . . . . . . . . . . . . . . . . . . 256
B.6 Meta-model of experimental designs in SEDL . . . . . . . . . . . . . . . . 256
B.7 Meta-model of experimental configurations in SEDL . . . . . . . . . . . . 257
B.8 Meta-model of experimental executions in SEDL . . . . . . . . . . . . . . 258
B.9 Meta-model of experimental analyses specifications and results . . . . . 258
B.10 Meta-model of dataset specifications in SEDL . . . . . . . . . . . . . . . . 259
B.11 Meta-model of statistical analyses in SEDL . . . . . . . . . . . . . . . . . 260
B.12 Types of Experiments supported by MOEDL and their structure . . . . . 263
B.13 Termination criteria supported by MOEDL and their structure . . . . . . 264
E.1 Layout and structure of SEA lab-packs . . . . . . . . . . . . . . . . . . . . 275
G.1 Goods Ordering Composite Service . . . 280
G.2 Box plot of results in Experiment #1 and problem instance 9 . . . 296
G.3 Results in Experiment #1 and problem instance 2 . . . 296
G.4 Results of each technique in Experiment #2 for problem instance 0 . . . 302
G.5 Results of each technique in Experiment #2 for problem instance 9 . . . 302
H.1 Feature relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
H.2 Cross-tree constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
H.3 Mobile phone feature model (without cross-tree constraints) . . . . . . . 309
H.4 Encoding of a feature model . . . . . . . . . . . . . . . . . . . . . . . . . . 310
H.5 Example of one-point crossover in our algorithm . . . . . . . . . . . . . . 311
H.6 Examples of infeasible individuals and repairs . . . . . . . . . . . . . . . 312
H.7 Distribution of fitness values for random and evolutionary search . . . . 318
I.1 Support letter from University of Tubingen . . . 321
I.2 Support letter from University of Applied Science of Upper Austria . . . 322
I.3 Expression of interest on FOM from ISOIN . . . 323
I.4 Timeline of individual visitors to the STATService web portal . . . 324
I.5 Map of visits to the STATService web portal . . . 325
LIST OF TABLES
1.1 Support per contribution for each phase of the MPS life-cycle . . . 16
1.2 Support per contribution for each phase of the experimental life-cycle . . . 16
2.1 Pros and Cons of using MOFs . . . 43
3.1 Two sample 3x3 Latin squares for a technique comparison experiment . . . 58
3.2 Statistical procedure decision table . . . 61
3.3 Specific STH for basic experiments with a single independent variable . . . 61
3.4 Specific STH for experiments with multiple independent variables . . . 62
3.5 Regression coefficients and models . . . 62
5.1 Selected MOFs . . . 98
5.2 Areas of interest and comparison characteristics . . . 100
5.3 MOFs programming languages, platforms and licenses . . . 126
9.1 Tailoring variants in ETHOM . . . 215
9.2 Tuning values in ETHOM . . . 218
A.1 Coverage of features in area C1 . . . . . . . . . . . . . . . . . . . . . . . . 238
A.2 Coverage of features in area C2 . . . . . . . . . . . . . . . . . . . . . . . . 243
A.4 Coverage of features in area C4 . . . . . . . . . . . . . . . . . . . . . . . . 245
A.3 Coverage of features in area C3 . . . . . . . . . . . . . . . . . . . . . . . . 247
A.5 Scores for C1 - C4 and C6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
A.6 Scores for C5 design, implementation & licensing . . . . . . . . . . . . . 249
A.7 Global scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
D.1 Set of tests and post-hoc analyses supported by SEDL . . . . . . . . . . . 271
G.1 Service providers per Role and their corresponding QoS Guarantees . . 280
G.2 QoS aggregation functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
G.3 Parameters Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
G.4 Means of obj. func. per algorithm and exec. time (Experiment 1) . . . . . 297
G.5 Mean percentage of solutions improving any obtained by other tech. (Exp1) . . . 298
G.6 Means of obj. func. values per algorithm and execution time in Exp. 2 . 300
G.7 Mean percentage of solutions improving any obtained by other tech. (Exp2) . . . 301
H.1 Algorithm tailoring, experiment ETHOM #A1 . . . . . . . . . . . . . . . 313
H.2 ETHOM tuning, experiment ETHOM #A1 . . . . . . . . . . . . . . . . . . 313
H.3 Evaluation results on the generation of feature models maximizing execution time in a CSP solver . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
H.4 Maximum execution times produced by random models and our evolutionary program. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
H.5 BDD size and computation time of the hardest feature models found . . 316
PART I

PREFACE
1

INTRODUCTION

Life is an optimisation problem, with tons of variables and constraints...
We can only optimise life, never solve it.
Chetan Bhagat (from The 3 Mistakes of My Life), Indian novelist
This chapter presents an overview of the results presented throughout this dissertation. In Section §1.1 we review our research context. Section §1.2 describes the purpose of this work and motivates the problems addressed in this thesis. Section §1.3 describes our research goals. Section §1.4 describes the approach followed to fulfill such goals. Section §1.5 explains the context in which this work has been performed. Finally, in Section §1.6 we present the structure of this dissertation.
1.1 RESEARCH CONTEXT
The progress and prosperity of our species has been largely determined by our ability to optimize the tasks that we perform. The step forward in the Neolithic was to optimize our ability to produce food by cultivating the land and taming animals. Through the industrial revolution we greatly optimized production processes. Nowadays, the optimization of the processing and transmission of information is driving the revolution of computers and the Internet. Humankind has not stopped looking for, and finding, more optimized solutions to the problems we face every day. Solving optimization problems is therefore an important task which appears in virtually every area of human activity [94].
An optimization problem can be defined as finding, from a set of candidate solutions, the one which best fulfills a set of objectives, where in general not all candidate solutions are feasible. In Mathematics and Computer Science, functions are used to define which solution is better, describing the objectives of maximization or minimization, and a set of constraints is used to specify which solutions are feasible. Solutions are usually expressed as assignments of values to the set of variables on which the objective functions and constraints are defined.
Depending on the frequency of and available time for problem solving, and on the required quality of the solutions, Rardin and Uzsoy [232] establish three kinds of optimization problems. Design problems are solved once (or at least infrequently), and the quality of their solutions is critical. Control problems are solved very frequently; solutions must be provided in near real-time, and their quality is important but not inalienable. Planning problems provide a balance between those extremes.
1.1.1 Metaheuristics
Heuristics are optimization algorithms that use details of the problem to improve the solutions obtained [219, 241]. Heuristic methods have proven to be a handy tool to solve hard optimization problems. Heuristics are usually approximate, providing a balance between the quality of the solutions and the execution time required to obtain them. However, the problem-specificity of heuristics makes their design and development a time-consuming task that must be faced anew for each problem. Metaheuristics avoid the need of designing an ad-hoc heuristic algorithm for each problem from scratch: they provide a reusable algorithm scheme that can be tailored to each problem. The advantage of reducing the cost of designing optimization algorithms, as well as their good results when solving NP-hard problems, has boosted the widespread adoption of metaheuristics. Consequently, metaheuristics have been used in a plethora of contexts for solving disparate optimization problems in recent decades [120], leading also to a boom of the research in this area [94].
The tailoring of a metaheuristic is performed by completing the steps of its algorithmic scheme that are not fully specified. For those steps, the intended behaviour is loosely specified by the metaheuristic, defining what should be done at a high abstraction level, but not how. This abstraction mechanism enables the definition of the algorithmic scheme of the metaheuristic independently of the problem, while taking advantage of problem-specific knowledge in the behaviour of the tailored algorithms. We name such steps the tailoring points of the metaheuristic.
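As an illustration of this idea (ours, not a scheme taken from the dissertation), a minimal hill-climbing scheme can be written so that the algorithmic skeleton is problem-independent, while its tailoring points (here, the initial solution and the neighbourhood) are supplied per problem:

```python
def hill_climbing(initial, neighbours, objective, max_iters=1000):
    """Problem-independent algorithmic scheme.

    `initial` and `neighbours` play the role of tailoring points: the
    scheme fixes *what* to do (repeatedly move to a better neighbour)
    but not *how* solutions and neighbourhoods are built for a
    concrete problem."""
    current = initial()
    for _ in range(max_iters):
        better = [n for n in neighbours(current)
                  if objective(n) > objective(current)]
        if not better:
            return current  # local optimum: no improving neighbour
        current = max(better, key=objective)
    return current

# Tailoring the scheme to a toy problem: maximize -(x - 3)^2 over the integers.
solution = hill_climbing(
    initial=lambda: 0,
    neighbours=lambda x: [x - 1, x + 1],
    objective=lambda x: -(x - 3) ** 2,
)
```

The same skeleton could be retailored to a routing or scheduling problem simply by supplying a different encoding of solutions and neighbourhoods, which is the reuse mechanism the text describes.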
Solving optimization problems using metaheuristics requires numerous activities to be undertaken in a coordinated manner. Figure §1.1 shows a possible grouping of such activities as a process with five major stages: Selection, Tailoring, Implementation, Tuning and Execution. We coin this process the "Metaheuristic Problem Solving (MPS) life-cycle". In the Selection stage, the specific metaheuristic to use for problem solving is chosen. In the Tailoring stage, the algorithmic scheme of the metaheuristic is completed and tailored to the specific problem at hand, obtaining a fully specified algorithm. In the Implementation stage, the algorithm is implemented as a metaheuristic optimization program. In the Tuning stage, specific values for the parameters of the metaheuristic program are set (e.g. population size in evolutionary algorithms). The result of this stage, a tuned metaheuristic program that can be invoked for a problem instance and provides a solution (or a set of solutions), is denoted as an MPS application. Finally, the MPS application is executed to obtain solutions to the problem in the Execution stage.
The quality of the solutions provided by an MPS application depends on the appropriate matching of optimization problem and solving algorithm; this is a widely accepted consequence of the No Free Lunch (NFL) theorem [300]. In the specific context of metaheuristic optimization, this translates into appropriate decision making in the stages of selection, tailoring and tuning. However, current theoretical development does not provide analytical methods to make those decisions [50], and the advised procedure is experimentation [22, 50]. We name the experiments intended to make such decisions Metaheuristic Optimization Experiments (MOEs).
Figure 1.1: Metaheuristic problem solving life-cycle
1.1.2 Experimentation
Experimentation refers to a methodical procedure of actions and measurements with the goal of empirically verifying or falsifying a hypothesis [116, 154, 248]. In the context of the MPS life-cycle, experimentation means a process of structured inquiry on the alternatives for decision making in selection, tailoring, and tuning, and an analysis of the impact of such alternatives on the performance of the algorithm. Current methodologies involve performing several experiments and analyses in a specific sequence in order to make such decisions [19, 22, 29, 50, 237], which requires a considerable effort. In order to illustrate this point, it is necessary to consider the activities associated with experimentation in the context of the MPS life-cycle. Figure §1.2 shows a possible grouping as a process with four main activities: Design, Development, Conduction, and Analysis. These stages are integrated into the activities of the MPS life-cycle as described below.
Figure 1.2: Experimentation in the context of the MPS life-cycle

The Design of the experiment is the first activity to be performed. The output of this activity is a detailed plan (hereinafter the experimental protocol) that tries to maximize the information obtained from the experiment. In MPS contexts, it implies deciding which metaheuristic algorithms will be implemented using which tailorings and parameter values, and how we are going to run them in order to reach the conclusions we are searching for. Next, the experimental artefacts are developed. In MPS contexts this involves two activities: implementing the metaheuristic algorithms chosen in the previous activity, and implementing the program that executes the experimental protocol as defined in the design phase (hereinafter the experimental program). Once all experimental artefacts are available, the experiment is conducted. In our context this implies executing the experimental program. This execution generates a dataset of results to be analysed. Finally, based on the results of such analysis, conclusions are drawn. In our context, this involves interpreting the results of the analysis to choose the metaheuristic algorithm and its parameter setting.
The quality of the MPS life-cycle is determined mostly by the quality of the experiments performed for its decision making activities. The quality of the experiments is determined primarily by two factors: their degree of validity and their replicability [116].

Traditionally, researchers have established two types of experimental validity: external and internal [248]. In this dissertation we focus on internal validity, since it is essential to ensure the accuracy of the decision making¹. Internal validity is defined as the extent to which we can infer that the hypothesis holds (or not) from the experimental process and data. In turn, an experiment is replicable when its results can be verified and/or clarified through conducting another experiment, either following the same experimental protocol in similar conditions (exact replication), or through a different procedure aimed to verify similar hypotheses about the same phenomenon (conceptual replication) [126].

¹ Precision of measurements in MOEs depends mainly on the implementation technology and execution platform, and external validity is mainly related to the generalizability of conclusions.

The data-intensive nature of current research in computer science, and the ever-changing environment of current computation platforms, make the achievement of replicability even more difficult. This situation has led to the recent rise of two trends: reproducible research and executable papers. The former emphasizes the need of providing a comprehensive, detailed and unambiguous description of the experiments
and a copy of its results and artefacts [276]. These elements are usually provided
as laboratory packages (henceforth named lab-packs) that contain all the relevant information of the experiment. The goal of providing a comprehensive description of the experiments is currently being achieved by creating repositories of experimental lab-packs [87, 202, 218, 223]. The executable papers trend focuses on automating the dependent replication of experiments, rather than on enabling manual replication by other researchers. Some initiatives have emerged recently to support the creation of executable papers, such as "the executable papers grand challenge" [82], which promotes the creation of platforms for authoring and publishing executable papers [114, 127, 200].
1.1.3 Tooling support
The effort required for running the MPS life-cycle depends strongly on tooling support. In the context of MOEs (the stages of selection, tailoring and tuning), those tools are mainly statistical analysis packages and design of experiment systems. The burden of the implementation stage can be significantly reduced by using specific software tools, which in this dissertation are named Metaheuristic Optimization Frameworks (MOFs). The use of MOFs also improves the confidence in the correctness of the implementation, since the algorithms and tailoring mechanisms provided have been validated by other users. A considerable number of MOFs have been proposed in the literature (in Chapter §5 we have identified up to thirty-four).
In this dissertation we focus on supporting the process of optimization problem solving with metaheuristics when experimentation is required and MOFs are used. Additionally, we aim at enabling the creation of automatically reproducible and easy-to-replicate experiments with high internal validity.
1.2 MOTIVATION
A key question regarding the MPS life-cycle is when it is worthwhile to apply metaheuristics to solve a problem.
In order to answer this question, we may model the worthiness of a technique T for solving an optimization problem P. The value of an optimization technique for a given problem can be measured as the value of the solutions that it provides in terms of cost savings, increased profits, better quality of products and services, etc. Thus, our model is based on the net profit generated by the solutions obtained. It is computed as the value generated by the solution obtained minus the cost of solving the problem:

Profit(T, P) = Value(Solution(T, P)) - Cost(T, P)
For instance, an optimization problem that most of us solve each week is to maximize the number of items from the market that we store in our fridge. In this case, the profit of using an optimization technique is usually negative, since the costs of modelling the problem and applying the technique are usually bigger than the value of the solutions provided. Moreover, the dimensionality of this problem is usually small, so humans can provide good solutions cheaply. Nevertheless, solving a similar packing problem, such as the transportation container filling problem, is very profitable. The causes of this high profit are that the value generated by good solutions is high, and that the problem is solved very frequently. Moreover, the standardization of containers and item sizes reduces the complexity of the problem and the cost of solving it.
The Value depends on the optimality of the solutions provided by T, and the Cost is comprised of the costs of executing each activity of the MPS life-cycle. In particular, the cost comprises the implementation cost, the execution cost, and the decision making cost (i.e., the experimentation cost).
There are two ways of improving the worthiness of metaheuristics for any problem: reducing the costs of executing the MPS life-cycle, or increasing the value of the solutions that they provide. The former involves reducing implementation and experimentation costs, since execution costs are usually small and almost fixed. According to the NFL theorem [300], the latter implies improving the accuracy of the decisions made in the MPS life-cycle, i.e., improving the capability to draw clear and accurate conclusions from MOEs.
A way to improve the performance of processes is to provide tooling support. In this sense, to reduce the implementation burden, an extensive number of MOFs have been created (up to thirty-four, c.f. Section §5.2.2). However, the support provided by MOFs for the different metaheuristics is uneven, and their use involves overcoming a steep learning curve (c.f. Chapter §5). Thus, the choice of the appropriate MOF is crucial to ensure that the costs are actually reduced by its use. However, neither general reviews nor comparative studies on MOFs have been conducted: the literature either lacks comparative analyses (e.g. [281]), or focuses on very specific criteria (such as performance or genetic operators) with a narrow perspective and few MOFs compared (e.g. [41, 71, 107]). Moreover, even though no framework supports a majority of metaheuristics (c.f. Section §5.3) and interoperability problems between frameworks have been known for decades [182], there are no proposals to support MOF integration or interface standardization.
Regarding experimentation, to the best of our knowledge there is no tool providing support for the whole process depicted in Figure §1.2. On the contrary, there are different tools that each support one or two activities, and a general lack of interoperability and of automation in the information exchange between them. This situation has three main consequences on the experimentation process:
1. It becomes tedious and time-consuming, since users must execute the corresponding actions on each tool and must perform the information exchange manually. This problem is aggravated by the fact that several replications are needed to reach strong conclusions, and that each experiment could be executed several times². As a consequence, it is not surprising that the usual strategy to make the decisions in the selection, tailoring and tuning stages is the "best-guess strategy" [21].

2. It becomes error-prone and knowledge-demanding. Since no tool has a complete picture of the experimental process, the responsibility of maintaining the consistency of the activities of the process relies on the experimenter. Moreover, users are forced to master a set of complex tools and to choose the correct path through a forest of features and options. This problem is aggravated by the inherent complexity of the subjects (design of experiments, statistical analysis, etc.), which requires some education [155].

3. It becomes harder to replicate. The need of using several independent tools, each with its own configuration, versioning, dependencies, and feature changes along time, creates a complex environment that is difficult to replicate [62, 269]. This problem is aggravated by some specific issues of randomized optimization algorithms and experimental computer science in general [21, 151], by the number of tailoring points and possible variants of metaheuristics, and by the lack of a widely accepted scheme of experimental reporting similar to those used in the natural sciences [116].
² In some of our articles [247] the experimentation had to be carried out up to four times, due to the need to take into account special cases in the encoding and repair mechanisms. Those problems were noticed in the analysis stage, leading to the need to modify the implementations and re-conduct the experimentation.
1.3 THESIS GOALS
The main goal of this thesis is:
Main Goal
Improving the applicability of metaheuristics for solving optimization problems when experiments must be carried out, by reducing their cost
This abstract goal translates into three specific objectives:
Specific Goals
• Define a comparison and selection mechanism for MOFs
• Ensure the replicability and internal validity of MOEs
• Speed-up and automate the execution of the MPS lifecycle
An additional goal of this work is to devise a minimal implementation able to support the conceptual solutions provided to meet the above goals and to enable their validation.
Finally, some of the stated objectives are not specific to the metaheuristic optimization area. The automation of experimentation and the systematization of experimental descriptions are challenging goals for any experimental branch of computer science. We believe that the contributions provided may constitute a suitable starting point for the creation of a general platform for executable experiments and reproducible research. Thus, we try to maximize the scope of our approaches so as to contribute to the creation of such a platform in the future.
1.4 PROPOSED SOLUTION
The main contribution of this thesis is a set of support tools to reduce the cost of using metaheuristics to solve optimization problems. These contributions can be divided into five groups, described below.
1.4.1 On the implementation of MPS applications
In order to ease the implementation of metaheuristic-based applications, we propose the following contribution:
• A Comparison Framework (CF) to reduce the cost of selecting the best MOF to solve a given optimization problem. The framework includes a comprehensive set of features that an ideal MOF should support, definitions of metrics for assessing the support of such features, and means to aggregate such assessments into general quantitative scores. Based on this comparison framework, ten Metaheuristic Optimization Frameworks (MOFs) are assessed to provide a picture of the current state of the art. This contribution has been published in the Soft Computing Journal [213].
1.4.2 On the description of MOEs
For the description of metaheuristic experiments we propose the following contributions:
• Two languages to reduce the cost of describing, automating and replicating experiments: SEDL and MOEDL. Scientific Experiments Description Language (SEDL) enables the description of domain-independent experiments in a precise, unambiguous, tool-independent, and machine-processable way. SEDL documents include all the information required to describe the design and execution of experiments, including the definition of variables, hypotheses and analysis tests. SEDL also includes several extension points for the creation of domain-specific languages. In turn, Metaheuristic Optimization Experiments Description Language (MOEDL) is an extension of SEDL for the description of MOEs. MOEDL abstracts the user from the majority of the implicit details of typical metaheuristic experiments, such as technique comparison or parameter tuning.
1.4.3 On the automated analysis of MOEs
For the automated analysis and validation of experimental descriptions and results,
we present the following approaches:
• A set of 15 analysis operations on SEDL documents to reduce the cost of checking the validity and replicability of experiments. These operations automatically check for validity threats, warning users and suggesting fixes. For instance, we provide an operation that checks whether the size of the data generated by the experimental conduction is consistent with the design of the experiment.
• A statistical analysis tool (STATService) to reduce the cost of validating experimental conclusions by testing hypotheses. STATService is especially designed to be used by inexperienced users with no background in statistical tests. Given an input data set, the tool automatically chooses the most suitable statistical tests and provides the corresponding results. STATService offers several interfaces, including a web interface, which makes it intuitive and easy to use by experimenters from any research discipline. The tool has already been used by 9 laboratories in 5 countries³.
1.4.4 On the automated conduction and replication of MOEs
In order to automate the conduction and replication of MOEs, we present the following contributions:
• A Metaheuristic Optimization Software EcoSystem (MOSES), to reduce the cost of executing the MPS life-cycle. MOSES provides the design of a global architecture for supporting the automation of the experimentation process in the context of metaheuristic optimization. This architecture is defined in terms of service contracts, and software components that act as providers and consumers of those contracts, binding with other components. The information exchange between those components is based on SEDL, MOEDL and a format for experimental lab-packs (Scientific Experiment Archive (SEA)) proposed by the authors.
³ This information refers to registered users. The number of anonymous users is much larger (cf. web statistics in Appendix §I).
• A Reference Implementation of MOSES (MOSES[RI]) including an implementation of the key components of the ecosystem, namely FOM, E3 and STATService. Framework for Metaheuristic Optimization (FOM) is a MOF developed by the authors. Since FOM is the framework that supports the largest number of metaheuristic techniques (cf. Section §5.3), it enables the exploration of alternative metaheuristic approaches and their respective tailorings at a low cost. Experiment Execution Environment (E3) enables the full automation of metaheuristic experiments described as a MOEDL document plus a SEA lab-pack.
1.4.5 On the development of MPS applications
We evaluated our contributions by developing MPS-based applications for solving
two relevant software engineering problems, namely:
• Quality-driven web service composition. In this problem, the goal is to find a set of candidate services that maximizes the overall non-functional properties (i.e., quality) of a web service composition. Experiments show that our algorithm, the QoS-aware GRASP+PR algorithm for service-based applications binding (QoS-Gasp), outperforms previous metaheuristic approaches proposed in the literature for real-time binding scenarios. Specifically, QoS-Gasp provided bindings that improve the QoS obtained by previous proposals by up to 40%.
• Hard Feature Model Generation. In this problem, we try to create feature models (cf. Section §H.2) that are as difficult to analyze as possible for current tools, in order to determine their performance in pessimistic scenarios. The proposed algorithm, called ETHOM, found feature models of realistic size whose analysis takes current tools more than 30 minutes.
1.4.6 Overall contributions
The set of contributions provided, and the corresponding stages of the MPS life-cycle for which they reduce costs, are depicted in Figure §1.3.
The contributions for the description (Section §1.4.2), automation (Section §1.4.4) and validation (Section §1.4.3) of MOEs reduce the cost of the stages where experimentation takes place, that is, selection, tailoring and tuning. Our comparison framework as well as FOM (Section §1.4.1) reduce the cost of
Figure 1.3: Summary of contributions per phase of the MPS life-cycle
implementing metaheuristic applications. Finally, the two metaheuristic algorithms
presented (Section §1.4.5) address specific search-based problems and were used to
validate the rest of our contributions.
Tables §1.1 and §1.2 provide a tabular summary of our contributions, indicating the specific stages of the MPS and MOE life-cycles that they support. As illustrated, most of our contributions are intended to ease the burden of experimentation in the stages of selection, tailoring and tuning. We may also remark that we provide support for all the tasks of both life-cycles.
1.5 THESIS CONTEXT
This thesis has been developed in the context of the Applied Software Engineering research group (Ingeniería del Software Aplicada, ISA) of the University of Seville. The following research projects and networks made this thesis possible:
Impact of our contributions on the phases of the MPS life-cycle
Phases: Selection, Tailoring, Tuning, Implementation, Execution
Contributions: MPS implementation (CF, FOM); MOE automation (MOSES, E3); MOE description (MOEDL & SEDL, SEA); MOE validation (Analysis Operations, STATService)
Table 1.1: Support per contribution for each phase of the MPS life-cycle
Impact of our contributions on the phases of the experimentation life-cycle
Phases: Design, Develop, Conduct, Analyse
Contributions: MOE automation (MOSES & MOSES[RI], E3); MOE description (MOEDL & SEDL, SEA); MOE validation (Analysis Operations, STATService)
Table 1.2: Support per contribution for each phase of the experimental life-cycle
SETI: reSearching on intElligent Tools for the Internet of services (TIN2009-07366), project funded by the Spanish Government (Ministerio de Economía y Competitividad). In the context of this project I was awarded a four-year grant for the development of my PhD thesis. Thus, this project constitutes the main source of funding of this work.
ISABEL: Ingeniería de Sistemas Abiertos Basada en LínEas de productos (TIC-2533). Excellence project funded by the Regional Government of Andalusia. It laid the groundwork for the definition of our comparison framework for MOFs.
S-Cube: the European Network of Excellence in Software Services and Systems, funded by the European Commission. Thanks to our participation in this network, we identified the need to provide a QoS-aware binding algorithm for Service-Based Applications in real time.
THEOS: Tecnologías Habilitadoras para EcOsistemas Software. Excellence project funded by the Regional Government of Andalusia. It supported the development of MOSES[RI] and the Evolutionary algoriTHm for Optimized feature Models (ETHOM).
TAPAS: Tecnologías Avanzadas para Procesos como Servicios (TIN2012-32273), project funded by the Spanish Government (Ministerio de Economía y Competitividad). It supported the development of QoS-Gasp.
1.6 STRUCTURE OF THIS DISSERTATION
This dissertation is organised as follows:
Part I: Preface. It comprises this introductory chapter, in which we introduce our research context, motivate our thesis by presenting the problems addressed, establish our goals and summarize our contributions.
Part II: Background Information. This part provides the reader with information regarding the research context in which our work has been developed. In Chapter §2, we introduce the main concepts of optimization, metaheuristics and experimentation. In Chapter §3 we delve into the concept of experiment, its properties, and the specific type of experiments performed in the MPS life-cycle.
Part III: Our Contribution. This part is the core of our dissertation and is organized in five chapters. In Chapter §4 we define the problems identified regarding the support for the MPS and experimentation life-cycles, and we review the literature with regard to each problem identified. In Chapter §5 we present our comparison framework for MOFs and a benchmark of current MOFs developed based on it. Chapter §6 provides a description of SEDL, our general-purpose experimental description language. This chapter also presents a catalog of analysis operations for SEDL documents. Chapter §7 describes a domain-specific language for describing
metaheuristic experiments. In this chapter a set of transformation rules from MOEDL to SEDL is also presented. Chapter §8 describes a software ecosystem architecture for supporting experimentation, named MOSES, and its reference implementation (named Reference Implementation of the Core of MOSES (MOSES[RI])). Furthermore, this chapter describes our proposal for the statistical analysis required in the context of the MPS life-cycle, named STATService (part of MOSES[RI]).
Part IV: Validation. This part describes the validation performed on MOSES[RI].
This validation is based on the application of MOSES[RI] to two search-based problems
described in Chapter §9.
Part V: Final Remarks. Chapter §10 concludes this dissertation and highlights
some future research directions.
Part VI: Appendices. Several appendices have been attached to this dissertation to provide supplemental material. Appendix §A provides the full data tables of our evaluation of MOFs based on the comparison framework described in Chapter §5. Appendix §B describes the abstract syntax of SEDL and MOEDL by using UML meta-models. Appendix §C describes, in EBNF, the syntax for metaheuristic algorithm specification supported by MOEDL. Appendix §D provides a table with the set of statistical tests supported by STATService and SEDL. Appendix §E describes our proposal for the structure and packaging format of experimental lab-packs (SEA). Appendix §F briefly describes the experimental execution environment proposed as part of the reference implementation of MOSES, named E3. In Appendix §G we describe our approach for solving the quality-driven web service composition problem, and the results of the experiments performed to compare it with previous proposals. In Appendix §H we describe our approach for hard feature model generation and the results of the experiments performed on it. Finally, Appendix §I shows some additional evidence of the utility of our contributions. First, it provides support letters from the authors of two of the MOFs evaluated in our survey, stating that the survey has been useful for the selection of features to be added in upcoming versions of their tools. Next, an expression of interest in FOM from a consultancy company is shown. Finally, the usage statistics of the web interface of STATService are provided.
PART II
BACKGROUND INFORMATION
2 OPTIMIZATION PROBLEMS AND METAHEURISTICS
You should reach the limits of virtue, before you cross the border of death
Ancient Spartan Saying
In this chapter, we introduce optimization problems and metaheuristics. In Section §2.1 the concepts of optimization problem and global optimum, in contrast to local optima, are presented. Furthermore, we discuss the underlying reasons why optimization problems are difficult to solve, and present some sample optimization problems that are related to our specific application problems. Section §2.2 describes different techniques for optimization problem solving, focusing on metaheuristics. Section §2.6 presents software tools available for solving optimization problems through metaheuristics, focusing on software frameworks. Finally, Section §2.7 summarizes the concepts presented in this chapter.
2.1 OPTIMIZATION
2.1.1 Introduction
Optimization is about choosing the best element from a set A of available alternative solutions. In the simplest case, this means minimizing or maximizing a function f by choosing the values of its parameters. Thus, an optimization problem P = (A, f) amounts to finding an x∗ ∈ A such that, given a function f : A −→ R, ∀x ∈ A • f(x∗) ≥ f(x)¹. In this formulation x∗ denotes the best solution, also named the global optimum. Some optimization problems can have multiple global optima with the same value of f (note the ≥ in the formula).
Optimization problems are often expressed with a special notation. For instance, min_{x∈R}(x² + 1) asks for the minimum value of the objective function x² + 1, where x ranges over the real numbers R. The single global optimum in this case is x = 0. Figure §2.1(a) depicts the value of the objective function f(x, y) = e^(−(x²+y²)). A maximization problem can be defined as max_{x,y∈[−2,2]} e^(−(x²+y²)), where both x and y are real variables that range from −2 to 2. A solution to this problem can be denoted as (v_x, v_y), where v_x and v_y refer to the values of x and y respectively. The optimal solution to this problem is (0, 0), as Figure §2.1(a) shows.
When more than one objective function must be optimized at the same time, the problem becomes multiobjective. In multiobjective problems there can be multiple optimal solutions for each function to be optimized and, consequently, there may be no single solution that is the global optimum for all of the objective functions. For instance, decision making for economic policies is a typical area of application for multiobjective optimization, since several closely related indicators must be controlled and optimized simultaneously: minimize inflation, unemployment and deficit while maximizing growth and trade balance.
2.1.2 Why are optimization problems hard?
Solving optimization problems is in general a hard task. Most classical optimization problems are NP-hard, and in typical real-life situations problem instances have huge or even infinite solution spaces. Furthermore, in some cases there is no analytical
¹ This formulation defines a maximization problem, but we could define a minimization problem by stating that ∀x ∈ A • f(x∗) ≤ f(x)
Figure 2.1: Objective function landscapes of several optimization problems. (a) e^(−(x²+y²)) has a single global optimum. (b) Objective function with a local optimum. (c) The neighborhood of the solution (0, 0) is highlighted. (d) Objective function with infinite local optima.
expression for the objective function, or its evaluation is so time-consuming or resource-intensive that sampling a significant part of the search space is inconceivable.
As an example, let us consider the design of a car as an optimization problem². The goal is to create a car design that maximizes speed, which is a hard problem since a car is a highly complex system in which speed depends on a number of parameters such as the engine type and its components, as well as the shape and body elements. The objective function of this optimization problem would not have an analytical expression; instead, a simulator would be used to obtain measurements of the speed that the designs can reach. As a consequence, the evaluation of the objective function would require the execution of a number of simulation runs, and would be extremely time-consuming. Furthermore, this problem is likely to have extra constraints, like keeping the cost of the car under a certain value, making some designs infeasible.
One additional difficulty of optimization problem solving is that, even for single-objective problems, it is not clear when the global optimum has been found, with the risk of choosing a sub-optimal solution as a result of stopping the search. For instance, for the objective function f(x, y) = 2e^(−(x−1.7)²−(y−1.7)²) + e^(−x²−y²), shown in Figure §2.1(b), being a maximization problem, a problem-solving technique that traverses the search space could find the solution (0, 0) and check that all its adjacent solutions are worse than (0, 0). If the technique decides to stop the search and return (0, 0), it misses the actual global optimum (1.7, 1.7).
A key concept related to the solution space is neighboring. A solution y ∈ A is said to be a neighbour of another solution x ∈ A if y is close to x according to a specific criterion. Neighborhoods are defined by a neighboring function neighborhood : A −→ P(A), where P(A) is the powerset of A. Thus, neighborhood(x) provides the set of solutions in A that are neighbors of x. An alternative way of defining a neighborhood is by means of a boolean function isNeighbor : A × A −→ {true, false} which, given two solutions x and y, indicates whether y is a neighbor of x or not.
For instance, given the problem of maximizing the function shown in Figure §2.1(b), with R² as search space, we can define a neighborhood based on the euclidean distance, such as: isNeighbor(p1, p2) ≡ √((p1.x − p2.x)² + (p1.y − p2.y)²) < 1. Figure §2.1(c) shows the neighborhood of a solution using this function.
Given a maximization problem P = (A, f), a neighboring function n, and a solution x ∈ A, x is a local optimum iff x is better than the remaining solutions in its neighborhood; i.e. ∀y ∈ n(x) • f(x) ≥ f(y). An additional difficulty is that the number of local optima in the solution space of an optimization problem can be huge (or even infinite). For instance, Figure §2.1(d) shows the search landscape for the problem of maximizing f(x, y) = 2e^(−(x−1.7)²−(y−1.7)²) + e^(−x²−y²) − 0.1 sin(2(x − y)) + 0.1 sin(2(x + y)), which has an infinite number of local optima when using A = R² as solution space.
² A similar example was used to illustrate the working of evolutionary algorithms in [290].
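The local-optimum definition, ∀y ∈ n(x) • f(x) ≥ f(y), can be checked directly when the neighborhood is finite. The sketch below is our own illustration (names and the toy integer-line problem are ours):

```python
# x is a local optimum under neighborhood function n iff no neighbor beats it.
def is_local_optimum(f, n, x):
    return all(f(x) >= f(y) for y in n(x))

# Toy problem: integers with f(x) = -(x - 2)^2 and neighbors x-1, x+1.
f = lambda x: -(x - 2) ** 2
neighbors = lambda x: [x - 1, x + 1]

assert is_local_optimum(f, neighbors, 2)        # 2 is the (global) optimum
assert not is_local_optimum(f, neighbors, 0)    # moving to 1 improves f
```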
2.2 METAHEURISTICS
The techniques to solve optimization problems may be classified into exact and heuristic (see Figure §2.2, based on [267, 268, 291]). The former provide the global optimum under some convergence conditions, and range from finding the optimum of the objective function analytically to applying algorithms such as Newton's method or Dantzig's simplex. When the convergence conditions are met and solutions are obtained with affordable time and resource consumption, exact techniques are preferred.
Regarding the latter, many optimization problems do not meet those constraints, and heuristics emerge as strategies that try to find good solutions within time and resource limits deemed practical. However, heuristics are approximate, and thus do not guarantee finding the global optimum for the problem. Additionally, heuristic optimization strategies use problem-specific knowledge beyond the definition of the problem itself to find solutions more efficiently [241]. Therefore, heuristics are problem-specific, and must be carefully designed for each problem.
Metaheuristics appear as general schemes of algorithms that can be tailored to solve different optimization problems, and they have generated a great amount of research and industrial activity during the previous decades [205]. There exist a number of definitions of metaheuristic [78, 120, 190, 205], the most widely accepted being that of Glover and Kochenberger [120], who define a metaheuristic as: “An iterative process that guides the operation of one or more subordinated heuristics to efficiently produce quality solutions for an optimization problem”.
In this dissertation we refer to subordinate heuristics as tailoring points, since they enable the tailoring of the metaheuristic to each particular problem. For instance, the simplest metaheuristic conceivable for any problem is randomly sampling the problem's solution space, named Random Search (RS) (see Algorithm 1³). This metaheuristic has three tailoring points, namely: the random solution generation procedure, the objective function evaluation, and the termination criterion [120].
³ The efficiency of this algorithm can be improved by storing the evaluation of the objective function for the best solution found, as most MOFs do. In this dissertation we prioritize readability over efficiency.
Figure 2.2: Taxonomy of optimization techniques
Algorithm 1 Random Search
bestSolFound ← random()
repeat
currentSolution ← random()
if f (currentSolution) > f (bestSolFound) then
bestSolFound ← currentSolution
end if
until Termination Criterion is satisfied
return bestSolFound
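Algorithm 1 can be transcribed almost line by line. The following Python sketch is ours (names are ours, not FOM's): the sample parameter plays the role of the random() tailoring point, f is the objective function, and a fixed iteration count acts as a simple termination criterion.

```python
import random

# Transcription of Algorithm 1 (Random Search) for maximization.
def random_search(f, sample, iterations=1000):
    best = sample()                    # bestSolFound <- random()
    for _ in range(iterations):        # termination criterion
        current = sample()             # currentSolution <- random()
        if f(current) > f(best):
            best = current
    return best

# Usage: maximize f(x) = -(x - 3)^2 over the interval [0, 10].
best = random_search(lambda x: -(x - 3) ** 2,
                     lambda: random.uniform(0, 10))
```

Note that, as the footnote to Algorithm 1 remarks, a practical implementation would cache f(best) instead of re-evaluating it on every iteration.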
According to the way they deal with the solution space, metaheuristics are classified as single-solution based, population-method based and building-method based [267, 268, 291]. This classification is not disjoint; for instance, Ant Colony Optimization (ACO) [72] is both a population-method based and a building-method based metaheuristic, since various ants work together to build, iteratively but simultaneously, various solutions to the problem. In the next sections we describe some metaheuristics of each class that will be used later in this dissertation.
2.2.1 Hybridization
A large number of publications in the literature do not purely follow the concepts of one single metaheuristic. Instead, they combine various algorithmic ideas, sometimes also with exact techniques. These approaches are commonly referred to as hybrid metaheuristics. This specific line of research is becoming very popular and has been successful in a large number of applications [34].
Last but not least, metaheuristics can be combined in different ways: using one technique for generating the initial solution(s) of another, using one technique as a subroutine in the iterations of the main loop of another, etc⁴. In the context of this dissertation, a hybrid approach that combines GRASP with Path Relinking (PR) is applied to solve the QoS-aware Web Service Composition Binding (QoSWSCB) problem (see Section §G.2). In our implementation, GRASP is used to initialize the elite set of PR. The elite set is not updated in each iteration of PR; instead, different starting and target solutions are chosen.
2.3 SINGLE-SOLUTION BASED METAHEURISTICS
Single-solution based metaheuristics search for the optimum of an optimization problem by iteratively improving a single solution. The execution of this kind of metaheuristic can be regarded as a trajectory through the search space [120]. When this trajectory is driven by a neighbourhood, the next solution to be explored is always a neighbour of the current solution. In that case, the algorithm is said to be a local search algorithm.
2.3.1 Hill Climbing
One of the simplest local search approaches is the Hill Climbing (HC) algorithm (see Algorithm 2), a.k.a. Steepest Descent (SD) when solving minimization problems. This technique successively searches for the best neighbour solution until reaching a local optimum, i.e., until no neighbour of the current solution improves the objective function.
Hill Climbing suffers from a number of drawbacks: (i) it converges toward local optima, (ii) it is very sensitive to the initial solution, and (iii) it traverses the whole neighbourhood of the current solution before choosing the next one, and thus for large neighbourhoods the search is extremely time-consuming [268]. Many alternative approaches to hill climbing have been proposed to overcome those problems, such as simulated annealing and tabu search. These approaches are described next⁵.
⁴ Interested readers can find a taxonomy of hybridization in [267].

Algorithm 2 Hill climbing
nextSolution ← initialSolution
repeat
currentSolution ← nextSolution
for all x ∈ neighborhood(currentSolution) do
if f ( x ) > f (nextSolution) then
nextSolution ← x
end if
end for
until f (nextSolution) ≤ f (currentSolution)
return currentSolution
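Algorithm 2 can be sketched in Python as follows (our own transcription, with names of our choosing; the toy integer-line problem in the usage line is also ours):

```python
# Transcription of Algorithm 2 (Hill Climbing) for maximization over a
# discrete search space. `neighborhood` returns the neighbors of a solution;
# the search stops when no neighbor improves the objective function f.
def hill_climbing(f, neighborhood, initial):
    next_solution = initial
    while True:
        current = next_solution
        for x in neighborhood(current):      # traverse the whole neighborhood
            if f(x) > f(next_solution):
                next_solution = x            # keep the best neighbor so far
        if f(next_solution) <= f(current):   # local optimum reached
            return current

# Usage: maximize f(x) = -(x - 5)^2 over the integers, neighbors x-1 and x+1.
best = hill_climbing(lambda x: -(x - 5) ** 2, lambda x: [x - 1, x + 1], 0)
```

On this unimodal toy landscape the algorithm reaches the optimum; on the landscape of Figure §2.1(b) it would get stuck exactly as described in the text.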
2.3.2 Simulated annealing
The application of Simulated Annealing (SA) to solve optimization problems was proposed by Kirkpatrick et al. in [161]. Inspired by the natural process of slow cooling used in metallurgy, SA is a local search algorithm (see Algorithm 3) that enables the stochastic acceptance of degrading solutions. This acceptance allows escaping from local optima and continuing the search. At each iteration, SA generates a set of neighbours and picks among them the solution whose neighbourhood will be explored in the next iteration. Improving neighbours are always accepted, but degrading solutions (those whose objective function evaluation is worse than the current solution's) can also be accepted based on a probability P(∆E, τ). This probability depends on the amount of degradation ∆E = f(currentSolution) − f(nextSolution) of the objective function and the current temperature τ. The temperature is a time-varying parameter that determines the probability of accepting non-improving solutions. The following constraints are usually imposed on P(∆E, τ):
• P(∆E, τ) ∈ [0, 1]
• Solutions that do not improve can be accepted as the next solution if the temperature is not zero, i.e., when ∆E > 0 and τ > 0 then P(∆E, τ) > 0.
⁵ Some authors consider GRASP an improvement of local search strategies that uses a multi-start strategy. However, given that the construction phase of GRASP can be far more complex than the local search algorithm itself, we consider GRASP a building technique that uses local search as a subroutine; consequently, we describe it in Section §2.5.
• When the temperature is zero the procedure becomes Hill Climbing, i.e., P(∆E, 0) = 0 for degrading solutions (∆E > 0).
This probability usually follows the Boltzmann distribution P(∆E, τ) = e^(−∆E/τ).
The rule that drives the evolution of τ along time is named the cooling scheme of the simulated annealing algorithm. Several cooling schemes have been proposed in the literature, each with a different cooling speed, namely: linear, logarithmic, and exponential.
Algorithm 3 Simulated Annealing
τ ← initialTemperature, currentSolution ← initialSolution, bestSolFound ← initialSolution
{Main loop}
repeat
for all x ∈ pickNeighbors(currentSolution, neighborsPerIteration) do
if f ( x ) > f (bestSolFound) then
bestSolFound ← x
end if
if P( f (currentSolution) − f ( x ), τ ) > random() then
currentSolution ← x
break
end if
end for
τ ← cooling(τ )
until Termination Criterion is satisfied
return bestSolFound
Algorithm 3 shows the pseudo-code of simulated annealing. The cooling scheme is implemented through the cooling subroutine. The exploration of the neighbourhood of the current solution is performed through the pickNeighbors(solution, size) subroutine, which returns a set of size randomly chosen neighbours of solution. The exploration of the neighborhood is thus bounded by a maximum number of neighbors, which is a parameter of the algorithm named neighborsPerIteration; in the algorithm this parameter is used when invoking pickNeighbors.
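A compact Python sketch of Algorithm 3 follows. It is our own transcription under stated assumptions: all names are ours, the cooling scheme is a hypothetical exponential one with factor 0.99, and the default parameter values are illustrative, not taken from any experiment in this dissertation.

```python
import math
import random

# Sketch of Algorithm 3 (Simulated Annealing) for maximization.
# `pick_neighbors(s, n)` returns n random neighbors of s; P is the
# Boltzmann acceptance probability; a fixed iteration count terminates.
def simulated_annealing(f, pick_neighbors, initial, initial_temperature=10.0,
                        neighbors_per_iteration=5, iterations=500):
    def p(delta_e, tau):
        if delta_e <= 0:
            return 1.0
        return 0.0 if tau == 0 else math.exp(-delta_e / tau)

    tau, current, best = initial_temperature, initial, initial
    for _ in range(iterations):                       # termination criterion
        for x in pick_neighbors(current, neighbors_per_iteration):
            if f(x) > f(best):
                best = x                              # track best solution found
            if p(f(current) - f(x), tau) > random.random():
                current = x                           # stochastic acceptance
                break
        tau *= 0.99                                   # exponential cooling
    return best

# Usage: maximize f(x) = -(x - 3)^2, neighbors drawn uniformly around s.
random.seed(1)
best = simulated_annealing(
    lambda x: -(x - 3) ** 2,
    lambda s, n: [s + random.uniform(-0.5, 0.5) for _ in range(n)],
    initial=0.0)
```

As in the pseudo-code, degrading moves are accepted frequently while τ is high and almost never once it has cooled, so the search ends close to the optimum.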
Figure §2.3 shows the different paths generated by HC and SA in a small solution space (19 solutions) with a fitness function similar to that shown in Figure §2.1(b), with a local optimum at s3 and a global optimum at s15. The neighbourhood definition in this example is based on the position of the solution (the neighbours of a solution si being the previous (si−1) and next (si+1) solutions in the one-dimensional search space). Using s2 as the initial solution, HC will get stuck in the local optimum s3. On the contrary, using the same initial solution, SA has a certain probability of reaching the global optimum s15 directly⁶.
2.3.3
Tabu search
The basic ideas of Tabu Search were proposed by Glover in [118]. This technique uses an adaptive memory that guides the search process, avoiding searching in circles through the solution space. This memory scheme is implemented using data structures that store either visited solutions (the tabu list), some components of those solutions, or even the frequency of appearance of some components in the visited solutions. If a solution is identified as tabu by the memory structure, the search will discard it as the next solution to explore, and the search will be driven to a different area of the solution space. In order to avoid discarding promising solutions, an aspiration criterion is implemented. For instance, a usual aspiration criterion is to allow selecting a tabu solution if it improves the current solution by a given percentage.
Algorithm 4 shows the pseudo-code of Tabu Search. The neighborhood is explored using pickNeighbor, which returns a random neighbor of the solution provided as parameter, and by direct enumeration using n.7 The tabu sub-routine checks whether a solution is marked as tabu given the current state of the tabu memory.
6 It is worth noting that the probabilities shown in Figure §2.3 are not the probabilities of reaching each solution from the previous one in the path, but the acceptance probabilities according to the Boltzmann distribution. For instance, the probability of going from s2 to s3 is not the probability of acceptance of s3 given the current temperature τ (whose value is 1), since there is a probability that the alternative neighbour s1 is chosen (the order of evaluation of the available neighbours is random).
7 Real implementations of Tabu Search (such as those provided by FOM, cf. Section §8.4) usually perform the exploration of the neighborhood through the generation of non-tabu moves. This modification avoids the enumeration of all the neighbours of the current solution, improving the performance significantly on large neighbourhood structures.
2.3. SINGLE-SOLUTION BASED METAHEURISTICS
Figure 2.3: Search paths generated by HC and SA
Algorithm 4 Tabu Search
  bestSolFound ← initialSolution
  currentSolution ← initialSolution
  nextSolution ← initialSolution
  InitializeTabuMemory
  {Main Loop}
  repeat
    nextSolution ← pickNeighbor(currentSolution)
    {Neighbourhood Exploration}
    for all x ∈ neighbors(currentSolution) do
      if ¬tabu(x) ∨ aspiration criterion is satisfied then
        if f(x) > f(nextSolution) then
          nextSolution ← x
          if f(nextSolution) > f(bestSolFound) then
            bestSolFound ← nextSolution
          end if
        end if
      end if
    end for
    updateTabuMemory
    currentSolution ← nextSolution
  until Termination Criterion is satisfied
  return bestSolFound
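A minimal Python sketch of this scheme follows. The fixed-length FIFO tabu list of visited solutions and the aspiration criterion (accept a tabu solution only if it beats the best found so far) are illustrative choices among the memory schemes described above, not the only possible ones.

```python
from collections import deque
import random

def tabu_search(f, initial, neighbors, tabu_size=10, iterations=100):
    """Maximizes f. Keeps a bounded FIFO tabu list of recently visited solutions."""
    current = best = initial
    tabu = deque([initial], maxlen=tabu_size)   # tabu memory
    for _ in range(iterations):
        candidates = neighbors(current)
        next_sol = None
        for x in candidates:
            # aspiration criterion: a tabu solution is allowed if it beats the best found
            if x in tabu and f(x) <= f(best):
                continue
            if next_sol is None or f(x) > f(next_sol):
                next_sol = x
        if next_sol is None:          # every neighbor is tabu: pick one at random
            next_sol = random.choice(candidates)
        if f(next_sol) > f(best):
            best = next_sol
        tabu.append(next_sol)         # update tabu memory
        current = next_sol
    return best

# Toy usage: maximize f(x) = -(x - 5)^2; neighbors are x-1 and x+1
f = lambda x: -(x - 5) ** 2
result = tabu_search(f, 0, lambda x: [x - 1, x + 1])
print(result)  # 5
```

Because the just-visited (worse) neighbor stays in the tabu list, the search cannot oscillate back and forth around the optimum, illustrating how the memory drives the search to new areas.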
2.4
P OPULATION METHODS BASED METAHEURISTICS
Population methods simultaneously manage a set of solutions in the search space in order to diversify the search. Two different population methods are used to solve the application problems shown in this dissertation: evolutionary algorithms and path relinking.
2.4.1
Evolutionary Algorithms
Principles of biological evolution have inspired the development of a set of metaheuristic optimization techniques called Evolutionary Algorithms (EAs). In EAs, a set of candidate solutions is combined and modified iteratively to obtain better solutions. Each solution is referred to as an individual or chromosome, in analogy to the evolution of species in biological genetics, where the DNA of individuals is combined and modified along generations, enhancing species through natural selection. EAs are among the most widely used metaheuristics, being applied successfully in nearly all scientific and
engineering areas by thousands of practitioners [17, Section D]. All EA variants, such as genetic algorithms or evolutionary strategies, are based on a common working scheme shown in Algorithm 5. First, the initial population (i.e. the set of candidate solutions to the problem) is created. This initialization is usually performed by randomly sampling the solution space. The remainder of the algorithm is iterated until the termination criterion is met.
Figure 2.4: Crossover operators with binary encoding
In order to create offspring, individuals need to be encoded, expressing their characteristics in a form that facilitates their manipulation during the rest of the algorithm. In biological genetics, DNA encodes an individual's characteristics on chromosomes that are used in reproduction and whose modifications produce mutants. For instance, classical encoding mechanisms in EAs are binary vectors encoding numerical values in genetic algorithms (binary encoding) [17, Sec. C1.2] and tree structures encoding the abstract syntax trees of programs in genetic programming [169].
In the main loop of the algorithm (see Algorithm 5), chromosomes are selected from the current population in order to create new offspring. In this process, better chromosomes usually have a greater probability of being selected, resembling natural evolution, where stronger individuals have more chances of reproduction. For instance, two classic selection mechanisms are roulette wheel and tournament selection [124]. When using the former, the probability of choosing a chromosome is proportional to its fitness (objective function evaluation), which determines the width of its slice of a hypothetical spinning roulette wheel. This mechanism is often modified by assigning probabilities based on the position of the chromosomes in a fitness-ordered ranking. When using tournament selection, a group of n chromosomes is randomly chosen from the population and a winner is selected according to its fitness.
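Both selection mechanisms can be sketched in Python. This is an illustrative sketch (function names are ours); the roulette-wheel version assumes non-negative fitness values.

```python
import random

def roulette_wheel(population, fitness):
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    total = sum(fitness(ind) for ind in population)
    r = random.uniform(0, total)
    acc = 0.0
    for ind in population:
        acc += fitness(ind)
        if acc >= r:
            return ind
    return population[-1]   # guard against floating-point rounding

def tournament(population, fitness, n=2):
    """Pick n random individuals and return the fittest of them."""
    contenders = random.sample(population, n)
    return max(contenders, key=fitness)

# Toy usage: individuals are integers, fitness is the value itself
random.seed(1)
pop = [1, 2, 3, 10]
winner = tournament(pop, fitness=lambda x: x, n=3)
chosen = roulette_wheel(pop, fitness=lambda x: x)
print(winner, chosen)
```

Increasing the tournament size n raises the selection pressure: with n equal to the population size, the best individual always wins.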
Once parents are chosen, crossover is performed. It combines individuals and produces new individuals in a way analogous to biological reproduction. Crossover mechanisms depend on the encoding scheme used, but standard mechanisms are present in the literature for widely used encodings [17, Sec. C3.3]. For instance, two classical crossover mechanisms for binary encoding are one-point crossover [139] and uniform crossover [5]. When using the former, a random location in the vector is chosen as break point and the portions of the vectors after the break point are exchanged to produce offspring. When using uniform crossover, the value of each vector element is taken from one parent or the other with a certain probability, usually 50%. Figure §2.4 8 shows an example of the application of both crossover mechanisms with binary encoding. Fig. §2.5(a) shows an illustrative application of crossover in our example of car design. An F1 car and a small family car are combined by crossover producing a sports car.
Figure 2.5: Sample crossover and mutation for car design solutions
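Both crossover mechanisms can be sketched for binary vectors as follows (a minimal illustration; the function names are ours):

```python
import random

def one_point_crossover(a, b):
    """Exchange the tails of two equal-length vectors after a random break point."""
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def uniform_crossover(a, b, p=0.5):
    """Take each gene from one parent or the other with probability p."""
    child1, child2 = [], []
    for ga, gb in zip(a, b):
        if random.random() < p:
            child1.append(ga)
            child2.append(gb)
        else:
            child1.append(gb)
            child2.append(ga)
    return child1, child2

# Toy usage on two complementary 8-bit parents
random.seed(0)
c1, c2 = one_point_crossover([0] * 8, [1] * 8)
u1, u2 = uniform_crossover([0] * 8, [1] * 8)
print(c1, c2)
print(u1, u2)
```

Note that in both cases every gene of the parents ends up in exactly one of the two children, so no genetic material is created or lost by crossover itself.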
At the mutation step, random changes are applied to the individuals. Changes are performed with a certain probability, where small modifications are more likely than larger ones. This step is crucial to prevent the algorithm from getting stuck prematurely at a locally optimal solution. An example of mutation in our car optimization problem is presented in Figure §2.5(b). The shape of a family car is changed by adding a back spoiler while the rest of its design parameters remain intact.
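For binary encoding, a common concrete form of this step is bit-flip mutation, sketched below (the mutation rate value is illustrative):

```python
import random

def bit_flip_mutation(chromosome, rate=0.1):
    """Flip each bit independently with a small probability (the mutation rate)."""
    return [1 - g if random.random() < rate else g for g in chromosome]

# Toy usage: on average one bit in ten is flipped
random.seed(7)
print(bit_flip_mutation([0, 1, 1, 0, 1], rate=0.2))
```

With a low rate most offspring differ from their source in zero or one position, matching the idea that small modifications are more likely than large ones.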
In order to evaluate the fitness of new and modified individuals, decoding is performed. It often happens that the changes performed in the crossover and mutation steps create individuals that are not valid designs or break a constraint; these are usually referred to as infeasible individuals [17], e.g. a car with three wheels. Once an infeasible individual is detected, it can either be replaced by a new correct one or be repaired, i.e. slightly changed to make it feasible.
8 Inspired by fig. 2 of [43]
Finally, individuals are evaluated and the next population is formed, in which individuals with better fitness values are more likely to remain. This process simulates the natural selection of the better adapted individuals, which survive and generate offspring, improving the species.
Algorithm 5 Evolutionary Algorithm
  Initialize population
  Encode population
  bestSolFound ← decode(population[0])
  for all chromosome ∈ Population do
    if f(decode(chromosome)) > f(bestSolFound) then
      bestSolFound ← decode(chromosome)
    end if
  end for
  {Main loop}
  repeat
    Parents ← crossoverSelection(Population) {Selection for Crossover}
    Offspring ← crossover(Parents) {Crossover}
    Population ← mutation(Population) {Mutation}
    {Evaluation of new population and Offspring}
    for all chromosome ∈ (Population ∪ Offspring) do
      if f(decode(chromosome)) > f(bestSolFound) then
        bestSolFound ← decode(chromosome)
      end if
    end for
    {Selection of surviving chromosomes (Next population)}
    Population ← survivalSelection(Population ∪ Offspring)
  until Termination Criterion is satisfied
  return bestSolFound
Algorithm 5 shows the pseudo-code of an Evolutionary Algorithm. The crossover and mutation sub-routines implement the operators on the population, taking a set of chromosomes as parameter and returning another; they depend on the specific problem at hand and the solution encoding used. The selection for crossover and survival is performed through the crossoverSelection and survivalSelection sub-routines respectively, which are independent of the problem at hand.
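Putting the pieces together, the working scheme of Algorithm 5 can be sketched for the classic OneMax problem (maximize the number of ones in a binary string). Tournament selection, one-point crossover, bit-flip mutation and elitist survival selection are illustrative choices, and the parameter values are arbitrary.

```python
import random

def evolve(length=20, pop_size=30, generations=60, mutation_rate=0.02):
    """Minimal evolutionary algorithm for OneMax (maximize the number of 1-bits)."""
    f = sum  # fitness: number of ones in the chromosome
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=f)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # binary tournament selection of two parents
            p1 = max(random.sample(population, 2), key=f)
            p2 = max(random.sample(population, 2), key=f)
            # one-point crossover
            cut = random.randint(1, length - 1)
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - g if random.random() < mutation_rate else g for g in child]
            offspring.append(child)
        # elitist survival selection: keep the best pop_size individuals
        population = sorted(population + offspring, key=f, reverse=True)[:pop_size]
        best = max(best, population[0], key=f)
    return best

random.seed(42)
solution = evolve()
print(sum(solution))  # close to 20, the optimum (often exactly 20)
```

Note that encoding/decoding is trivial here because the chromosome is the solution itself; for richer problems (such as the car design example) the decode step would map the bit vector back to a design.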
Figure 2.6: Path generation in PR. (a) A sequence of neighboring solutions is generated iteratively until reaching t. (b) Two different paths (red and black) between s and t.
2.4.2
Path Relinking
Path relinking is a search procedure proposed by Glover [119] that generates new solutions by exploring trajectories that connect other solutions in the search space. The set of solutions that are connected is named "the elite set". The procedure starts from the elite set, chooses an initiating or starting solution s, and generates a path in the search space that leads toward another solution in the elite set, called the guiding or target solution t. Thus a sequence (s, x1, x2, . . . , xm, t) of neighboring solutions is generated from s to t, where x1 ∈ n(s), x2 ∈ n(x1), . . . , t ∈ n(xm). The best solution in the path is returned as result.
It is worth noting that, given a starting and a target solution, multiple neighboring paths between those solutions are possible; as a consequence, a criterion for choosing the neighbor leading to the target solution is needed. Figure §2.6(a) depicts the process of path generation from a start solution s to a target solution t. Figure §2.6(b) shows two different neighboring paths from s to t.
Algorithm 6 shows the pseudo-code of Path Relinking. The pickNeighborTowards sub-routine returns a random neighbour of the first solution provided as parameter that approaches the second solution provided as parameter. The exhaustive exploration of the solutions neighbouring a solution s that approach a target solution t is performed through the sub-routine neighborsTowards9. The criterion for path neighbour selection in this specific algorithm is elitist, i.e., the best neighbor that leads towards t
9 This pseudo-code has the disadvantage that for large neighbourhoods the exhaustive exploration makes the search inefficient. In those situations, the technique is adapted to explore a fixed number of randomly chosen approaching neighbours (using the pickNeighborTowards sub-routine).
Algorithm 6 Path Relinking
  Initialize elite set
  bestSolFound ← best solution in eliteSet
  {Main Loop}
  repeat
    s ← pickStartSolution(eliteSet)
    bestSolPath ← s
    t ← pickTargetSolution(eliteSet)
    currentSolution ← s
    nextSolution ← pickNeighbourTowards(s, t)
    pathSize ← 0
    while (distance(currentSolution, t) > 0) ∧ (pathSize < maxPathSize) do
      for all x ∈ neighborsTowards(currentSolution, t) do
        if f(x) > f(nextSolution) then
          nextSolution ← x
          if f(x) > f(bestSolPath) then
            bestSolPath ← x
          end if
        end if
      end for
      currentSolution ← nextSolution
      pathSize ← pathSize + 1
    end while
    if f(bestSolPath) > f(bestSolFound) then
      bestSolFound ← bestSolPath
    end if
  until Termination Criterion is satisfied
  return bestSolFound
is chosen to create the path.
Figure §2.7 shows a concrete example of two paths between initial and final solutions in the context of the binary encoded car design. Each node (vector of bits) in the
path encodes a design that becomes gradually more similar to the guiding solution
(the differences are removed step by step).
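The elitist path generation between binary-encoded solutions can be sketched as follows: at each step, every bit that still differs from the guiding solution is tried, and the flip yielding the best fitness is kept. This is an illustrative rendering of the neighborsTowards exploration with a toy fitness function, not the dissertation's implementation.

```python
def path_relinking(f, start, target):
    """Walk from `start` to `target` flipping one differing bit per step (elitist choice).
    Returns the best solution found on the path, under fitness f (maximized)."""
    current = list(start)
    best = list(current)
    while current != list(target):
        # neighborsTowards: flip each position where current still differs from target
        candidates = []
        for i, (c, t) in enumerate(zip(current, target)):
            if c != t:
                neighbor = current.copy()
                neighbor[i] = t
                candidates.append(neighbor)
        current = max(candidates, key=f)   # elitist: best approaching neighbor
        if f(current) > f(best):
            best = list(current)
    return best

# Toy fitness: reward alternating adjacent bits
f = lambda s: sum(1 for a, b in zip(s, s[1:]) if a != b)
best = path_relinking(f, [0, 0, 0, 0], [1, 1, 1, 1])
print(best)  # [0, 1, 0, 1]
```

Each step removes exactly one difference with the guiding solution, so the path length equals the Hamming distance between start and target, and the best intermediate solution may beat both endpoints.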
2.4.3
Particle Swarm Optimization
This technique is a stochastic algorithm inspired by the behaviour of bird flocking and fish schooling. The algorithm iteratively modifies a population of solutions (named the swarm), whose interactions are expressed as equations. Solutions in the swarm are represented as particles in an n-dimensional space, each with a position and a speed. The original proposal by Kennedy and Eberhart has been applied successfully to a variety of problems [54, 216]. This technique has also been adapted to support discrete variables, and different equations to rule the swarm interactions have been proposed [49, 230, 278, 297].
Figure 2.7: Paths between binary encoded solutions of the car design problem.
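Although the interaction equations are not reproduced in the text, the commonly used inertia-weight variant of the Kennedy-Eberhart update can be sketched as follows; the constants w, c1 and c2 and the toy objective are illustrative assumptions.

```python
import random

def pso_step(positions, velocities, pbest, gbest, f, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration (maximization):
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v"""
    for i, (x, v) in enumerate(zip(positions, velocities)):
        r1, r2 = random.random(), random.random()
        velocities[i] = [w * vj + c1 * r1 * (pb - xj) + c2 * r2 * (gb - xj)
                         for vj, xj, pb, gb in zip(v, x, pbest[i], gbest)]
        positions[i] = [xj + vj for xj, vj in zip(x, velocities[i])]
        if f(positions[i]) > f(pbest[i]):        # update personal best
            pbest[i] = positions[i][:]
    best = max(pbest, key=f)                      # update global best
    return best if f(best) > f(gbest) else gbest

# Toy usage: maximize f(x, y) = -(x^2 + y^2); optimum at the origin
random.seed(3)
f = lambda p: -(p[0] ** 2 + p[1] ** 2)
positions = [[random.uniform(-5, 5), random.uniform(-5, 5)] for _ in range(10)]
velocities = [[0.0, 0.0] for _ in range(10)]
pbest = [p[:] for p in positions]
gbest = max(pbest, key=f)
for _ in range(50):
    gbest = pso_step(positions, velocities, pbest, gbest, f)
print(f(gbest))  # close to 0, the optimum
```

The inertia weight w balances exploration (large w) against convergence (small w), while c1 and c2 weigh the pull toward each particle's personal best and the swarm's global best, respectively.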
2.4.4
Scatter Search
Initially proposed by Glover, it operates on a set of solutions, the reference set, by
combining existing solutions to create new ones. In contrast to other evolutionary
methods like genetic algorithms, scatter search is based on systematic designs and
methods, where new solutions are created from the linear combination of two solutions of the reference set, using strategies for search diversification and intensification.
2.5
B UILDING METHODS
Building methods work with incomplete solutions, adding elements iteratively until a feasible solution for the problem at hand is found. The metaheuristics of this class shown in Figure §2.2 are GRASP and Ant Systems. Next we describe GRASP in detail, since it is used during the validation of this dissertation.
2.5.1
GRASP
The Greedy Randomized Adaptive Search Procedure (GRASP) [89] is an iterative optimization technique that has been successfully applied to a plethora of real-life applications and research problems [91]. Each iteration of GRASP consists of constructing a solution and then applying an improvement procedure, typically a local search method. Iterations are completely independent, without any memory, and could lead to creating the same solution. GRASP is thus efficient only if the construction step samples different promising regions of the search space; to this end, it creates feasible solutions using a randomized greedy algorithm, as shown next.
This algorithm is based on iteratively adding elements to a partial solution. At each iteration of the construction phase, a Restricted Candidate List (RCL) is generated, containing a subset of the candidate elements that could be added to the current partial solution. The RCL is the key of GRASP, determining the greedy and stochastic behaviour of the technique. The greedy behaviour is based on a greedy function g : E −→ R, which is used to decide whether a candidate element e ∈ E is in the RCL or not. The stochastic behaviour is caused by the random selection from the RCL of the element e to be added to the current partial solution. The most usual criteria used to decide which elements are in the RCL are:
• Cardinality based: The RCL comprises the p best elements (according to the value provided by g for all the candidate elements).
• Value based: The RCL comprises the elements ei that are better than a given threshold t on g; i.e., ei ∈ RCL ⇒ g(ei) ≥ t. This threshold is usually computed based on a parameter α and the maximum gmax and minimum gmin values of g for the elements in E, where t = gmax − α · (gmax − gmin). The parameter α controls the balance between greediness and randomness of the algorithm: α ≈ 1 implies that the construction procedure is almost random, while α ≈ 0 implies that the g-best element will be chosen at each iteration.
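The value-based criterion can be illustrated in Python. This is a sketch rather than any framework's actual API; the threshold form t = gmax − α · (gmax − gmin) assumes that g is to be maximized.

```python
import random

def build_rcl(elements, g, alpha):
    """Value-based RCL: keep the elements whose greedy value is above the threshold
    t = gmax - alpha * (gmax - gmin). alpha=0 is pure greedy, alpha=1 is pure random."""
    values = [g(e) for e in elements]
    gmax, gmin = max(values), min(values)
    t = gmax - alpha * (gmax - gmin)
    return [e for e in elements if g(e) >= t]

# Toy usage: candidate elements are integers, greedy value is the element itself
elements = [1, 4, 7, 10]
g = lambda e: e
assert build_rcl(elements, g, alpha=0.0) == [10]           # greedy: only the best
assert build_rcl(elements, g, alpha=1.0) == [1, 4, 7, 10]  # random: everything qualifies
# the next element to add is then drawn at random from the RCL
chosen = build_rcl(elements, g, alpha=0.5)
print(random.choice(chosen))
```

Varying α between these extremes tunes the diversification/intensification balance of the construction phase.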
Once an element is added to the partial solution, the RCL is updated. The construction phase is extremely important for the success of GRASP [222, 235], since it must provide a proper balance between diversification and intensification in the search. The factors that affect this balance are the specific greedy function used and the RCL membership criterion.
In the GRASP algorithm (see Algorithm 7), the sub-routine named greedyRandomizedSolution() implements the construction of new solutions. This sub-routine is described in detail in Algorithm 8.
Algorithm 7 GRASP main loop
  bestSolFound ← random()
  {Main loop}
  repeat
    currentSolution ← greedyRandomizedSolution() {Construction}
    currentSolution ← improve(currentSolution) {Improvement}
    if f(currentSolution) > f(bestSolFound) then
      bestSolFound ← currentSolution
    end if
  until Termination Criterion is satisfied
  return bestSolFound
Algorithm 8 GRASP Construction Phase
  currentSolution ← neutralSolution()
  validElements ← {}
  RCL ← {}
  {Main Construction loop}
  repeat
    validElements ← elements(currentSolution)
    RCL ← selectCandidateElements(validElements, g)
    chosenElement ← selectElement(RCL)
    addElement(currentSolution, chosenElement)
  until isComplete(currentSolution)
  return currentSolution
2.5.2
Ant Colony Optimization
This technique, a.k.a. Ant Systems (AS), was originally proposed by Dorigo and Gambardella. It is a probabilistic optimization algorithm inspired by the food foraging behaviour of ants, which use a data structure called "pheromone trace" to support communication between them.
2.6
M ETAHEURISTIC O PTIMIZATION F RAMEWORKS
Solving problems by using metaheuristics can be aided by software tools that range from problem specification languages and editors with integrated solvers (such as COMET, OptQuest, or OPL Studio) [132, 274, 279] to statistical analysis packages such as SPSS or R [226]. For instance, when new problems are considered, metaheuristics must be implemented and tested, implying costs and risks. The Object-Oriented Paradigm has become a successful mechanism to ease the burden of application development and, particularly, of adapting a given metaheuristic to the specific problem to solve. Based on this paradigm, there are some proposals which jointly offer support for the most widespread techniques, platforms and languages. These kinds of approaches are named Metaheuristic Optimization Frameworks (MOFs).
According to Parejo et al. [213], a MOF can be defined as:
“a set of software tools that provide a correct and reusable implementation of a set of metaheuristics, and the basic mechanisms to accelerate the implementation of its partner subordinate
heuristics (possibly including solution encodings and technique-specific operators), which are
necessary to solve a particular problem instance using techniques provided”.
Figure §2.8 depicts a conceptual map showing these elements and their relationships. In this figure, MOFs and their components are shaded.
Figure 2.8: MOFs conceptual map
MOFs not only provide a set of implemented techniques; they also simplify the adaptation of those implementations to the problem. MOFs also provide additional tools that help throughout the whole optimization problem-solving process, such as mechanisms to monitor the optimization processes, supporting tools to determine appropriate values for the parameters of the techniques, and tools to identify the reasons that prevent techniques from finding optimal solutions.
2.6.1
Why are MOFs valuable?
The No Free Lunch (NFL) theorem [300] claims that there is no strategy or algorithm that generally behaves better than another for the entire set of possible problems.
In Ho and Pepyne's words, “Universal optimizers are impossible” [135].
The NFL theorem has been used as an argument against the use of MOFs, since there can be no universal optimal solver nor a software implementation of it [281, Chapter 4, pp. 82-83]. Nevertheless, frameworks are not intended to be a universal optimal implemented solution, but tailorable tools that allow performing this implementation in a better way in terms of implementation cost and effort.
The NFL theorem implies the need to “match” a problem and the optimization technique used to solve it in order to obtain optimal or near-optimal solutions. Metaheuristics allow performing such a matching by adapting their underlying heuristics. The purpose of MOFs is to provide such adaptation mechanisms in a more reusable and easier way. Furthermore, when trying to solve a new problem without specific knowledge (with regard to well-known similar problems and their best-matching techniques), it is even advantageous to use several metaheuristics to ensure a proper matching to the problem. The benefits of using MOFs that implement several metaheuristics are even more obvious in this scenario.
The main advantage of using MOFs is that they provide validated, fully functional and optimized versions of a set of metaheuristic techniques and their variants. They also provide mechanisms to facilitate the proper implementation of the underlying heuristics, depending on the problem, the representation of solutions, etc. As a consequence, we only have to implement those elements directly related to the problem, freeing us, as far as possible, from worrying about the aspects that do not depend on it. In addition, the use of MOFs decreases the risk of bugs in the implementation, and therefore the time (and associated cost) invested in testing and debugging. Some MOFs provide additional features to aid in solving the optimization problem, such as optimization process monitoring and result analysis tools, capabilities for parallel and distributed execution of optimization tasks, supporting mechanisms for determining the values of technique parameters, graphical reports and user-friendly interfaces.
2.6.2
Drawbacks: All that glitters ain’t gold
MOFs also have some drawbacks. One is their steep learning curve. The user needs to know the set of variation and extension points to use in order to adapt the framework to the problem, and to understand how they are related to the behavior of the software. This means that when we know exactly which technique to apply and we are confident in our implementation skills, using a MOF may be discouraging unless we have expertise in using it. Another drawback to consider when using MOFs is that the flexibility to adapt the MOF is limited by its design. Consequently, a proper framework design is essential to achieve the most favorable balance between the capabilities provided and its flexibility. This drawback implies that it could be impossible to implement certain variants or to modify certain behavior when using a MOF, which is especially serious in the context of research, where experimentation with different variants and the capability of customization are key features (cf. Sec. §2.6.1). An increased testing and debugging complexity is a disadvantage resulting from the inversion of control (i.e. the loss of explicit control over the execution flow of our application) that the use of a framework involves. Finally, the use of MOFs implies increasing the size of the software, creating dependencies on third-party libraries, and an increase in the complexity of the application.
Table 2.1: Pros and Cons of using MOFs
Advantages:
• Reduced implementation burden and the ability to apply various techniques and variants with little additional effort.
• Additional tools to help problem solving (monitoring, reporting, parallel and distributed computing).
• Optimized and validated implementations (except for the extensions and adaptations created by users, or undetected errors potentially present in the MOF).
• Users with little knowledge can use the framework not only as a software application development environment but also as a methodological aid.
Drawbacks:
• Steep learning curve.
• Advanced knowledge needed to make adaptations; lack of flexibility to implement variants of metaheuristics.
• Induced complexity (when debugging and testing) and additional dependencies.
• The choice of the right MOF may be an issue, since switching from one MOF to another has a high cost, MOFs provide diverse features, and there are no comparative benchmarks in the literature.
2.7
S UMMARY
In this chapter, basic concepts about optimization have been presented. Next, metaheuristics have been defined, showing their relationship with other types of optimization problem-solving techniques. In order to consolidate such concepts, the algorithmic schemes of the metaheuristics used in the applications and validation of this dissertation have been described. Finally, the types of software tools used for solving optimization problems with metaheuristics have been presented. At this point, we have focused on MOFs, since they are probably the most widely used type of tool.
3
E XPERIMENTATION
The use of precise, repeatable experiments is
the hallmark of a mature scientific or engineering discipline
Lewis et al. in [173],
This chapter provides the basic concepts on experimentation required to understand the contributions of this dissertation. First, Section §3.1 provides a definition of experiment and describes its life-cycle. Section §3.2 presents two sample experiments used throughout this and subsequent chapters. Next, Section §3.3 introduces the basic concepts related to the description of experiments, including variables, hypotheses and design. Section §3.4 describes the concepts and techniques used during the execution of experiments and the statistical analysis of their results. The concept of experimental validity is presented in Section §3.5. Section §3.6 describes the specific issues of MOEs and the usual types of experiments performed in the context of the MPS life-cycle. Additionally, Sections §3.6.3 and §3.6.4 respectively describe some specific experimental designs and analyses proposed in the literature for MOEs.
3.1
T HE CONCEPT OF E XPERIMENT
Experimentation1 refers to a methodical procedure of actions and observations with the goal of empirically verifying or falsifying a hypothesis [116, 154, 248]. Shadish et al. [248] define an experiment as “A study in which an intervention is deliberately introduced to observe its effect”. For instance, in the context of a medical experiment with a new antipyretic (a drug used to reduce fever), the intervention would be to treat feverish patients with the drug, and the observation would be to measure the reduction of body temperature over time.
Related concepts are natural experiments and correlational studies [248]. Natural experiments are studies where the cause usually cannot be manipulated, i.e., there is no intervention, and the study examines a naturally occurring event such as the temperature of the ocean. Correlational studies (a.k.a. non-experimental or observational studies) simply observe the size and direction of a relationship among variables, without establishing a relationship of causality or generalizing the relationship beyond the observed data. Natural experiments and correlational studies are not experiments according to the definition provided by Shadish et al.
In this dissertation the concept of experiment is more general, similar to that used in [116], including both natural experiments and correlational studies, namely a process of systematic inquiry and data collection with the aim of confirming or disproving a hypothesis. Thus, an experiment is not a static object; it is a process that flows through an ordered sequence of activities, from the formulation of the experimental hypothesis to the drawing of conclusions regarding the hypothesis. Figure §3.1 depicts this process as a life-cycle.
1 There is no absolute consensus about experimental terminology in the research community [116, preface], [248, page 12]. This dissertation uses the terminology and definitions provided by Gliner et al. in [116] for the basic concepts. The terminology defined in [248] is used for threats to internal validity, since it is more appropriate for computational experiments.
Figure 3.1: Experimental life-cycle
The first activity in the experimentation life-cycle is the statement of the experimental hypothesis. According to [116], this activity comprises several steps: the identification of the research problem or question, a survey of the literature to check whether the question has been answered by others, and the reduction of the research question to a testable hypothesis. For instance, this reduction would transform the question “is the new antipyretic drug effective?” into a hypothesis such as “the average reduction of body temperature in feverish patients, measured 2 hours after the administration of a dose of 100mg of the drug, is statistically significant”.
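Such a hypothesis is typically checked with an inferential test, e.g. a one-sample t-test on the measured temperature reductions. The sketch below uses fabricated, purely illustrative values and a conventional significance level of α = 0.05; it is not experimental data from this dissertation.

```python
import math

def one_sample_t(data, mu0=0.0):
    """t statistic for H0: the mean of `data` equals mu0."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)   # sample variance
    return (mean - mu0) / math.sqrt(var / n)

# Fabricated illustrative temperature reductions (degrees Celsius), 2h after the dose
reductions = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3, 1.0, 1.2]
t = one_sample_t(reductions)
# For n=10 (9 degrees of freedom), the two-sided critical value at alpha=0.05 is ~2.262
significant = abs(t) > 2.262
print(round(t, 2), significant)  # 13.49 True
```

A |t| value above the critical value leads to rejecting the null hypothesis of no reduction; a full analysis would also report the p-value and an effect size.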
The design of the experiment is the next activity. According to [116], it comprises the selection of a sample for the experiment, the identification of the instruments and artefacts required for experimentation, and the creation of a plan for data collection and analysis. The output of this activity is thus a detailed plan (called the experimental protocol) that tries to maximize the information obtained from the experiment.
Next, experimental artefacts are developed. In the social sciences this activity usually involves the creation of the forms used to collect information, and a database or some computer support for storing the information. In computational experiments this activity also involves the implementation of the algorithms. Once all experimental artefacts are available, the experiment is conducted and data is collected. Next, the data is analysed, usually through inferential statistics. Finally, based on the results of such analysis, either conclusions are drawn or the experimental hypothesis and design are modified, leading to a new execution of the life-cycle to reach new conclusions.
3.2
S AMPLE EXPERIMENTS
Throughout this chapter, two basic experimentation scenarios are used to illustrate the basic concepts about experimentation. The first one is taken from the medical area, and the other is taken from the specific area of application of this thesis (optimization problem solving with metaheuristics). This selection is intentional, since our aim is to show that the experimental concepts presented in this chapter are general and potentially applicable to any area of human knowledge.
Experiment #1. The goal of this experiment is to discern whether a new drug can be used as an antipyretic and at which doses. A set of individuals with fever will be used in this experiment to evaluate the effects of the drug. The reduction of body temperature of the patients will be measured, and the reduction induced by the new drug will be compared with the reduction produced by previous alternatives and placebos.
Experiment #2. The goal of this experiment is to compare the effectiveness of several metaheuristic techniques in solving optimization problems in quality-driven web service composition2. In particular, the metaheuristics to be compared are Tabu Search (TS), Simulated Annealing (SA), Evolutionary Algorithms (EA), the Greedy Randomized Adaptive Search Procedure (GRASP) and GRASP hybridized with Path Relinking (GRASP+PR). For the experiment, each metaheuristic will be implemented in a specific metaheuristic program. Then, each program will be run to find solutions for a set of given optimization problems, comparing their results.
3.3
E XPERIMENTAL D ESCRIPTION
In this section, we focus on the concepts regarding the initial activities of the experimental life-cycle. These mainly refer to the selection of the experimental objects and subjects, the hypothesis statement and the experimental design. Figure §3.2 depicts the concepts described in this section and their relationships as a conceptual map.
3.3.1 Objects, subjects and populations
²This problem is described in Section §9.2.

Figure 3.2: Conceptual map about experimental description

The first elements to be defined in any experiment are the experimental objects (a.k.a. experimental units). The experimental objects are the elements of interest that participate in a particular experiment. In the social and biological sciences, those objects are
usually people, and are called participants or individuals. For instance, in experiment #1 the experimental objects are sick people with fever. In experiment #2, the experimental
objects represent algorithm runs.
The target population of an experiment is defined as the set of experimental objects
to which we would like to generalize the conclusions obtained from the experiment,
i.e., the subject of the hypothesis of the experiment. The target population of experiment #1 could be all adult humans with fever. The target population of experiment #2
could be all the runs of an algorithm on any instance of the optimization problem to
solve. Since the target population of an experiment can be huge or even infinite, the
researcher usually uses a sample of it in the experiment. The sampling of the target
population can be divided into two phases. First, the accessible population (a.k.a. sampling frame) is defined. The accessible population is the set of experimental individuals that could participate in the experiment. For instance, the accessible population of experiment #1 could be the set of patients diagnosed with fever in a specific hospital in the province of Seville. The selected sample is the set of experimental objects that actually participate in the experiment, taken from the accessible population. This selection can
be performed in a number of different ways, from random to convenience. For instance, the selected sample of experiment #1 could be 20 individuals with fever chosen
at random from the Seville Hospital.
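The two-phase sampling described above can be sketched in a few lines of Python; the patient identifiers and population size below are hypothetical, chosen only for illustration:

```python
import random

# Accessible population: patients diagnosed with fever in one hospital
# (hypothetical identifiers, for illustration only).
accessible_population = [f"patient-{i:03d}" for i in range(1, 201)]

# Selected sample: 20 experimental objects chosen at random.
random.seed(42)  # fixed seed so the selection is reproducible
selected_sample = random.sample(accessible_population, k=20)

print(len(selected_sample))       # 20
print(len(set(selected_sample)))  # 20 distinct objects, no repetition
```

Note that `random.sample` draws without replacement, so no patient can be selected twice; other selection methods (e.g. convenience sampling) would replace this single call.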
Experimental subjects (a.k.a. experimenters) are the people who apply the methods,
techniques or treatments to the experimental objects. For instance, in experiment #1,
the specific doctors or nurses that administer the drug to the patients are the experimental subjects. In some cases, experimental subjects may influence the results of the
experiments and therefore they must be controlled during the design of the experiment, i.e., experimental subjects are treated as a variable.
Figure 3.3: Experimental objects, populations and sample
3.3.2 Variables
An experimental variable is defined as a characteristic of the experimental objects or
of the experimental environment that can have different values. For instance, in experiment #2 the specific metaheuristic technique applied for solving a problem in a run is
a variable, since it varies between runs. When a relevant characteristic has only one
value in the context of an experiment, it is a constant (a.k.a. parameter). For instance,
in experiment #2 the termination criterion is the same for all the metaheuristics in the
experiment, thus it is a constant.
The roles that a variable can play in an experiment are outcome (a.k.a. output, response or dependent variable) and factor (a.k.a. independent variable or input). This taxonomy is depicted in Figure §3.4 as a UML class hierarchy.

Figure 3.4: Taxonomy of experimental variables
The outcome is the presumed result of the experiment, and its value is used for
testing the hypothesis of the experiment. It measures or assesses the effect of factor(s).
For instance, in experiment #2 the value of the objective function for the best solution
found by each metaheuristic is the outcome. The outcome of experiment #1 is the
difference in body temperature two hours after dose administration.
Factors can be classified into two types: controllable factors and non-controllable factors.
A controllable factor (a.k.a. active independent variable) is a variable whose value is
applied or given to the experimental objects. The values of this kind of variable are
usually controlled or manipulated in some way by the experimenter. A controllable
factor in experiment #1 is the dose administered to the patients. A controllable factor
in experiment #2 is the specific optimization technique applied in a run. The values
of a non-controllable factor are not changed during the study. For instance, the age and
the gender of patients in experiment #1 are non-controllable factors since they are not
modified by experimenters but are different among patients.
The different values of a factor could make the outcomes of the experiment not
comparable. When this risk exists, the experimental objects must be grouped into
blocks according to the value of the factor, which is used as a so-called blocking variable.
This creates homogeneous blocks that receive the same treatment throughout the experiment, making the results comparable. For instance, in experiment #1, the effects of the drug measured in children and adults might not be comparable. Thus, the age of the patients could be used as a blocking variable, dividing patients into two blocks: those under 18 and those aged 18 and older.
If a factor is of no particular interest in an experiment, but could be useful in subsequent replications or its impact on the response is unknown, it is called a nuisance variable (a.k.a. extraneous variable). Nuisance variables must be ruled out or controlled in order to ensure the validity of the experiment. According to [116], a way to control the effect of this kind of variable on the conclusions of the experiments is the random assignment of experimental objects to experimental groups (this type of assignment procedure is described below). For instance, in experiment #1 the sex of patients is not of interest, but its impact on the effectiveness of antipyretics is unknown.
Consequently, the specific drug and dose administered to patients should be chosen
randomly. Randomization ensures that the effect of the sex factor is averaged among
the results of all the drugs and doses, reducing the bias introduced in the results.
The levels of a variable are the set of values that it can have in the context of the
experiment. For instance, in experiment #1, the levels of the “dose” variable could be:
0mg, 100mg, 200mg, etc. In experiment #2, the factor metaheuristic has five levels: EA,
TS, SA, GRASP and GRASP+PR.
One important characteristic about variables is whether the levels are unordered
categories or they are ordered from low to high³. For instance, the levels of the “metaheuristic” variable in experiment #2 are not ordered, since they are essentially labels.
The variables whose levels are not ordered are said to be nominal variables. Conversely,
ordered variables have a set of values that vary from low to high within a certain range.
Depending on the measurement scale of the levels, ordered variables can be divided
into ordinal and real variables. In ordinal variables, the levels are ordered from low
to high in a ranking, but the intervals between the various ranks are not equal. For
instance, in an F1 race the second-place car may finish twenty seconds after the winner but only a fraction of a second before the third-place car. Real variables (a.k.a. scalar variables) not only have levels that are ordered, but also the values associated with those levels are equally spaced. These variables are named rational (ratio) or interval variables depending on whether they have a true zero value or not. It is worth noting that the kind of
variable may not be directly related to the output of the mechanism used to measure
it or the nature and range of its value in the real world. In this sense, we can have an
ordinal variable, whose levels are “low” and “high”, but whose values are measured
as real values ranging from 0 to 50 (we must define the two intervals [0, X] and (X, 50]
that determine when a real value is high or low). In the same way, we can have two
variables that are both rational, but whose levels have integer and floating point values
respectively. The taxonomy of variables according to their levels is depicted in Figure
§3.5 as a UML class hierarchy.
³Here we describe the most common sorts of measurement scales following [88], [162] and [154].
Figure 3.5: Taxonomy of experimental variables according to their levels
The experimental objects that have the same levels for factors are usually arranged
into groups. In this sense, a group denotes a specific set of individuals sharing specific experimental conditions throughout the conduction process. The mechanism used to decide which experimental individuals belong to each group, i.e., which treatment will be applied to them, is denoted as assignment.
3.3.3 Hypotheses
The hypothesis of an experiment is a statement about its variables. This hypothesis can be related to an underlying theory that predicts that the hypothesis holds.
The objective of an experiment is to disprove or confirm the hypothesis. According
to Popper’s concept of scientific truth, theories whose predictions have not been disproved by experiments and for which no alternative theory is available are considered true [48]. Scientific hypotheses are divided into three types [116, p. 38]:
difference (or differential), associational and descriptive.
Differential hypotheses
Differential hypotheses state that there is a difference in the value of the outcome of
two or more groups, that is, the factors make a significant difference on the outcome.
For instance, a differential hypothesis for experiment #2 is that the specific metaheuristic used for optimization (controllable factor) has a significant impact on the value
of the objective function (outcome). Another typical differential hypothesis used in
medicine is that a specific drug has an impact on the symptoms of a particular disease,
e.g. whether the drug actually reduces the fever or not in experiment #1.
Associational hypotheses
Associational hypotheses state that there is a specific relationship between the levels
of two variables. For instance, a typical associational hypothesis is that of a linear dependence in the value of two variables x and y, where y = a ∗ x + b. In a possible
associational hypothesis for experiment #1 would be that “the decrease in body temperature is proportional to the dose”.
Descriptive hypotheses
Descriptive hypotheses state that the variables have some value, central tendency or
variability, and summarize the data obtained. In this sense, descriptive hypotheses
usually do not have factors, and their assertions refer only to the current sample; i.e.,
the target population, the accessible population and the sample are the same. A sample
descriptive hypothesis for experiment #1 is that the reduction of body temperature for the specific individuals that participate in the experiment, treated with the antipyretic, ranges between 0.4 and 2.6 degrees. This kind of hypothesis allows correlational studies and natural observations to be included as experiments; such studies are usually performed to explore new questions, or constitute the only data available on a subject at a given moment.
3.3.4 Design
Experimental design is what differentiates scientific and engineering experiments
from a careful natural observation [154, 248]. The main point of experimental design is
controlling factors. Montgomery defines experimental design as “the process of planning the experiment so that appropriate data can be collected and analysed by statistical methods, resulting in valid and objective conclusions (regarding the experimental hypothesis)”. We further refine Montgomery’s definition based on the description
by Hinkelmann and Kempthorne [134] as follows: Experimental design is the specification
of (i) the actuations to be performed during experimental conduction (regarding the levels of the variables involved), (ii) the specific experimental objects on which they will be performed, and (iii) the arrangement of (i) and (ii) with the aim of minimizing experimental error and systematic
bias. Thus, an experimental design should specify:
• Selection of experimental objects. This is about selecting, from the accessible population, the experimental objects that will take part in the experiment. This is
usually done by means of the algorithm used to perform the selection, not by
enumerating the experimental objects explicitly. This algorithm is called the selection method. For instance, in experiment #1 the selection could be performed by choosing randomly 50 individuals among all the feverish patients in the accessible population.
• Specification of variables and levels. This determines the factor variables that will
be modified by experimental subjects during the experimental conductions and
which levels will be set. The modification of the level of a factor variable is usually referred to as a treatment. For instance, in experiment #1 the action associated with treatments is the administration of the drug, and the levels of the corresponding factor “dose” could be 0mg (i.e., a placebo), 100mg, and 200mg.
• Specification of treatments and groups. It specifies the experimental objects that will
receive each specific treatment. The set of experimental objects that receive the
same sequence of treatments are usually denoted as a group. This specification
is also performed by means of the algorithm used to perform the assignment
of experimental objects to groups, not by enumerating the experimental objects
explicitly. This algorithm is usually named the assignment method. For instance,
in experiment #1, the assignment method used to decide which treatments are
applied to each one of the 50 individuals in the sample could be random.
• Treatment and Measurement sequence. The specific sequence in which the experimental objects receive the treatments and the outcome variables are measured.
Those details are finally expressed in an experimental protocol.
It is worth noting that the experimental design determines the information that
will be gathered from the experiment and the capability of the subsequent analyses to
disprove or confirm the hypothesis. Thus, the consistency of the experimental design
with regard to the experimental hypothesis is a crucial characteristic of any experiment. Experimental design is also intimately related to the internal validity of the
experiment, since the specific arrangement of the treatments and measurements, and
the methods used for selection and assignment of experimental objects can avoid most
threats to validity (c.f. Section §3.5).
Principles of experimental design
In order to ensure the validity of the analysis, and to increase its capability of providing
clear conclusions regarding the hypothesis, experimental designs should fulfil three
basic principles [116, 134, 195]:
1. Repetition. This principle establishes that each treatment must be repeated on
different experimental objects a number of times. The pursued goal is reducing
the bias introduced by the specific characteristics of every single experimental object in the observations of the outcome variable. For instance, in experiment
#1, the effect of each specific dose and antipyretic drug should be measured on
several patients. Determining the right number of repetitions requires some information about the treatments, the experimental objects and the distribution of the outcome. This information is usually available only to experimenters who are experienced in the experimental
domain. If such information is not available, one option is to use values accepted
in the literature of the domain. For instance, for simple comparison experiments
with normal distribution of the outcomes, a sample size of 30 experimental objects with a single treatment and measurement per object is widely accepted as a
reasonable minimum [248].
2. Randomization. This principle establishes that the decision on which treatment
should be applied to each experimental object must be made according to a random scheme. The goal pursued is to reduce the bias introduced when all the repetitions of a treatment are performed on individuals with similar characteristics. For instance, in experiment #1, if the new drug were administered to the youngest patients only, the results of the study could be biased, since younger patients are more sensitive to antipyretics.
3. Local Control or Blocking. The basic idea behind this principle is that when there
are factors that make the outcomes of the experiment non-comparable, the selected sample should be partitioned into blocks as homogeneous as possible.
Within those blocks observations are comparable. The groups, treatments and
observations are then replicated for each block. For instance, in experiment #2
we should have one block per problem instance, since executions of algorithms
on different problem instances are not comparable.
Next, some classical experimental designs compliant with these principles are presented.
Completely randomized design
The simplest design for differential hypotheses is the completely randomized design. Given t treatments and N = t · r homogeneous experimental objects, the N experimental objects are partitioned into t experimental groups of r elements, each group with
a different treatment, and the experimental objects are assigned randomly to an experimental group. The outcome is measured once the treatment is performed on each
experimental object. Thus the dataset generated contains N values for the outcome
in total, and r observations for each specific treatment. This design requires that the
assignment procedure of the experimental objects to groups and the specific treatment
applied to each group is random, and that there are no blocking variables.
For instance, let’s suppose that a single problem instance is going to be optimized
in experiment #2. In those circumstances a completely randomized design would be appropriate. We could set the number of experimental objects (algorithm runs) to 100, with one group for each level of the factor “metaheuristic” (TS, SA, EA, GRASP and GRASP+PR). Each group would comprise 20 experimental objects (algorithm runs).
The specific order of execution of the runs would be chosen randomly.
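This completely randomized design can be sketched in Python as follows; the seed and run identifiers are assumed values, introduced only to make the sketch reproducible:

```python
import random

techniques = ["TS", "SA", "EA", "GRASP", "GRASP+PR"]  # t = 5 treatments
r = 20                                                # repetitions per group
runs = list(range(5 * r))                             # N = 100 algorithm runs

random.seed(0)
random.shuffle(runs)  # random assignment of runs to groups

# Partition the shuffled runs into t groups of r elements each.
groups = {tech: runs[i * r:(i + 1) * r] for i, tech in enumerate(techniques)}

# Random order of execution of the whole experiment.
schedule = [(tech, run) for tech, members in groups.items() for run in members]
random.shuffle(schedule)

print({tech: len(members) for tech, members in groups.items()})
```

Shuffling once and slicing guarantees both the random assignment to groups and equal group sizes; the second shuffle randomizes the execution order, as the design requires.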
Randomized complete block design
There are many situations with systematic variation among sets of experimental
objects depending on their factors. In such situations, a completely randomized design is not feasible, since the design should take this variation into account to “eliminate” its effect on the conclusions, making observations comparable. This leads us to the concept of local control or blocking, introduced by Fisher [93]. For instance, in experiment #2 the variable “instance” has a strong impact on the value of the outcome “objectiveFunction”.
Consequently, the results obtained by techniques for different problem instances cannot be compared. Randomized complete block designs are used for experiments with
differential hypotheses and a single blocking variable. In a randomized complete block
design, the selected sample is divided into b sets of homogeneous experimental objects
called blocks. A complete randomized design is performed on each block. Thus the
dataset generated contains b ∗ N values for the outcome variable in total, and b ∗ r
observations for each specific treatment.
For instance, in experiment #2 a randomized complete block design could specify
a number of objects (algorithm runs) equal to 500. There will be 5 groups with their
corresponding levels of the factor variable “metaheuristic”. There will be 10 blocks,
one per level of the blocking variable “instance”. Within each block, each group (and the corresponding algorithm) will comprise 10 experimental objects (runs) chosen randomly. Thus,
the dataset generated contains 500 values for outcome variable “objectiveFunction” in
total, 100 values for each specific optimization technique, and 50 values for each
specific problem instance.
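A minimal sketch of this randomized complete block design, assuming hypothetical instance names I1 to I10 and an arbitrary seed:

```python
import random

techniques = ["TS", "SA", "EA", "GRASP", "GRASP+PR"]  # 5 groups (treatments)
instances = [f"I{i}" for i in range(1, 11)]           # 10 blocks (hypothetical names)
r = 10                                                # runs per technique per block

random.seed(1)
design = []  # one (instance, technique, run-index) triple per experimental object
for instance in instances:
    # A completely randomized design is performed within each block.
    block_runs = [(technique, k) for technique in techniques for k in range(r)]
    random.shuffle(block_runs)
    design.extend((instance, technique, k) for technique, k in block_runs)

print(len(design))                                         # 500 runs in total
print(sum(1 for inst, tech, _ in design if tech == "EA"))  # 100 per technique
print(sum(1 for inst, _, _ in design if inst == "I3"))     # 50 per block
```

The counts match the figures in the text: 500 outcome values in total, 100 per optimization technique, and 50 per problem instance.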
Latin square
The Latin square design is used for experiments with differential hypotheses and
two blocking variables. It is used to compare t different treatments in a matrix with
t rows and t columns. The rows and columns actually represent two restrictions on
randomization. In this design a single treatment is used for each combination of levels
of the blocking variables. This means that the treatments of the elements for each row
and column of the matrix are different (no treatment is repeated per row and column).
For instance, in experiment #2 the techniques available to find solutions for the
optimization problem are TS, SA and EA. The experiment has a single factor, i.e., the
technique used to solve the problem. It also has two blocking variables, the specific
problem instance (I1, I2, and I3) and the termination criterion used (maximum execution time): 100ms, 5000ms, and 100000ms.
Two latin squares for experiment #2 are shown in table §3.1.
Table 3.1: Two sample 3x3 latin squares for a technique comparison experiment

Square 1:
                                      Factor: Problem Instance
Termination Criterion (MaxExecTime)   I1    I2    I3
100ms                                 EA    TS    SA
5000ms                                TS    SA    EA
100000ms                              SA    EA    TS

Square 2:
                                      Factor: Problem Instance
Termination Criterion (MaxExecTime)   I1    I2    I3
100ms                                 TS    EA    SA
5000ms                                SA    TS    EA
100000ms                              EA    SA    TS
It is worth noting that latin squares are reduced (or incomplete) designs, in the
sense that not every treatment is applied under every combination of levels of the
blocking variables. Thus, their experimental protocol requires fewer treatments and measurements (optimization technique runs in this example) than randomized complete block designs, making experimental conduction faster and cheaper. In our example, the latin square design requires 9 runs while the randomized complete block design requires 27 runs (assuming that no repetition is performed, i.e., group size is 1).
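A t × t latin square satisfying the row and column constraints can be generated, for instance, by cyclic rotation of the treatment list; the following sketch illustrates this for the three techniques of the example (cyclic construction is one simple option, not the only way to build latin squares):

```python
# Build a t x t latin square by cyclic rotation: no treatment is repeated
# in any row or column.
def latin_square(treatments):
    t = len(treatments)
    return [[treatments[(row + col) % t] for col in range(t)] for row in range(t)]

square = latin_square(["TS", "SA", "EA"])
for row in square:
    print(row)

# Each row and each column contains every treatment exactly once.
assert all(len(set(row)) == 3 for row in square)
assert all(len({square[r][c] for r in range(3)}) == 3 for c in range(3))
```

Randomly permuting the rows, columns and treatment labels of such a square yields the different squares shown in Table §3.1.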
Factorial designs
Factorial designs are used when several factors are present in the experiment. For
instance, in experiment #1 there are two controllable factors: drug and dose. The factor drug could be nominal, with hypothetical levels “never-fever”, “colder-plus”, and
“bye-fever”. The factor dose could be scalar with levels 10mg, 50mg, etc. In this case,
a treatment consists of a combination of levels for drug and dose. The experimental
protocol of factorial designs varies the levels of each controllable factor until all possible level combinations have been considered. Factorial experiments are used widely in scientific and industrial experimentation because they allow evaluating the effects of each factor and of their combinations (named interactions). A 2^k factorial design
is a specific kind of factorial design where k factors with 2 levels are studied.
For instance, let us consider a slight modification of experiment #2. Instead of comparing different techniques we could compare different variants of EA. In particular,
let us suppose we compare two alternative crossover operators (uniform and one-point) and two alternative selection strategies (roulette and tournament). In this situation, with two controllable factors, a factorial design is appropriate. Specifically, we have a 2^2 factorial design describing the 4 possible tailorings of the EA. The experimental protocol would have 4 groups of equal size, i.e., one per possible combination of the levels
of the factors. The sequence of application of the treatments would be random.
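The enumeration of treatments in this 2^2 factorial design can be sketched as follows; the group size of 25 runs per treatment is an assumed value, chosen only for illustration:

```python
import itertools
import random

# Two controllable factors of the EA, two levels each (a 2^2 factorial design).
factors = {
    "crossover": ["uniform", "one-point"],
    "selection": ["roulette", "tournament"],
}

# Treatments: all possible combinations of factor levels.
treatments = list(itertools.product(*factors.values()))
print(len(treatments))  # 4 treatments

group_size = 25  # repetitions per treatment (an assumed value)
protocol = [t for t in treatments for _ in range(group_size)]
random.seed(7)
random.shuffle(protocol)  # random sequence of application of the treatments
print(len(protocol))  # 100 runs in total
```

`itertools.product` generalizes directly to k factors with any number of levels, i.e., to arbitrary full factorial designs.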
3.4 EXPERIMENTAL EXECUTION
In this section, we focus on the concepts regarding the final activities of the experimental life-cycle, denoted as experimental execution. According to our definition of
the experimentation life-cycle, experimental execution comprises two phases: experimental conduction and data analysis.
The description of an experiment is not enough for automating the replication of
the experiment. Even for experiments in computer science where experimental protocols are implemented as programs (such as metaheuristics optimization experiments),
more details are required to perform an automated conduction of the experiment. In
order to support such automation, a detailed description of the process of experimental conduction, including its inputs and outputs is required. Additionally, in order
to evaluate if an experiment could be replicated, additional details are required (such
as ranges for environmental and extraneous variables, human resources required for
replication, material equipment, etc.). Experimental conduction involves treatment
application and data collection. This activity should be performed in an unbiased and
objective way.
Analysis of data is the process of inspecting and modelling data with the goal of
discovering information, drawing conclusions, and supporting decision making. Statistics is the basic tool used to draw conclusions from the data retrieved during the experiment. The specific type of statistical procedure to be used for data analysis depends
strongly on both the type of hypothesis and the design of the experiment, and it is
usually performed in two phases: exploratory and confirmatory analyses.
Through exploratory data analysis it is possible to detect mistakes in experimental conduction, check the assumptions taken during experimental design, and assess the direction and rough size of the relationship among the experimental variables. In turn, through confirmatory data analysis it is possible to confirm or disprove the hypothesis of the experiment by means of statistical inferences.
3.4.1 Exploratory data analysis
There is a plethora of disparate techniques used for exploratory data analysis, which can be classified in very different ways. A possible classification divides such techniques into three groups: tabulations of the data, graphs (a.k.a. charts) and descriptive statistics. Graphs are visual representations of the sample, histograms and bar charts being two of the most widely used graph types. However, a wide variety of graph types exists for more specific purposes. For example, the so-called scatter plot is often used in experiments with associational hypotheses, since it allows observing the degree of association between two variables.
Descriptive statistics can be further sub-classified into central tendency measures and
variability measures. Central tendency measures such as the mean, the median and the
mode describe the way the values of a sample cluster around some value. The mean
(a.k.a. arithmetic average) is appropriate when the sample follows a normal distribution in the absence of outliers, and it is defined only for real variables. The median (a.k.a. middle score) is more suitable than the mean when the frequency distribution is markedly skewed or outliers are present, and it is defined for both real and ordinal
variables. Finally, the mode, the most common value in the sample, is usually the
least precise central measure for real and ordinal variables, but it can be computed for
nominal variables.
Variability measures describe the spread or dispersion of a given sample. Thus,
in an extreme case where all the scores in a distribution are the same, the variability
of the sample is zero. Standard deviation and inter-quartile range (distance between the 25th and 75th percentiles) are the most commonly used variability measures for real and ordinal variables respectively. For nominal variables, a usual variability measure
is the number of different categories present in the data and the percentage of the
samples in each category.
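These descriptive statistics can be computed, for instance, with Python's standard statistics module; the sample of temperature reductions below is hypothetical:

```python
import statistics

# Hypothetical sample: temperature reductions (in degrees) of 11 patients.
sample = [0.4, 0.9, 1.1, 1.2, 1.2, 1.4, 1.6, 1.9, 2.1, 2.4, 2.6]

# Central tendency measures.
print(statistics.mean(sample))    # arithmetic average
print(statistics.median(sample))  # middle score, robust to outliers
print(statistics.mode(sample))    # most common value

# Variability measures.
print(statistics.stdev(sample))   # sample standard deviation

# Inter-quartile range: distance between the 25th and 75th percentiles.
q1, _, q3 = statistics.quantiles(sample, n=4)
print(q3 - q1)
```

Note that `statistics.quantiles` (Python 3.8+) returns the three cut points that divide the sample into four equal groups, from which the inter-quartile range follows directly.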
3.4.2 Confirmatory data analysis
Overview
Using the appropriate confirmatory data analysis technique is crucial to obtaining sound
conclusions for a specific experimental design and hypothesis. Table §3.2 shows the
relationship among hypothesis types, number of factors, and appropriate type of statistical technique used for confirmatory data analysis. Tables §3.3, §3.4, and §3.5 show
the specific statistical techniques to be used under specific circumstances. Those tables are based on the recommendations provided by Gliner et al. in [116], completed
in some cases with the recommendations provided in [68], [248], [195] and [134]. It is
worth noting that although tables §3.3, §3.4, and §3.5 provide a number of statistical techniques and a complex selection framework, the casuistry and the set of techniques to be used are still incomplete. For instance, when ANOVA or Friedman tests are applied, additional post-hoc procedures such as Bonferroni, Holm, etc., are usually required to decide if a specific treatment is better than another (see [69] for a detailed description of the techniques to be used). This complexity is one of the main
motivations for our goal of automating this task of the experimental life-cycle.
Table 3.2: Statistical procedure decision table

                          Type of Hypothesis
Number of   Differential               Associational                            Descriptive
factors
Zero        -                          -                                        Exploratory analysis and basic STH
One         Basic STH (table §3.3)     Correlation coefficients (table §3.5)    -
More        Complex STH (table §3.4)   Complex correlation models (table §3.5)  -
Table 3.3: Specific STH for basic experiments with a single independent variable

Type and                          Experimental Design
distribution        two-levels factor                     three-or-more-levels factor
of the outcome      No blocking         Blocking          No blocking      Blocking⁴
Real Normal         Independent         Paired samples    One-way ANOVA    Repeated Measures
                    Samples t-Test      t-Test                             ANOVA
Real not-Normal     Mann-Whitney        Wilcoxon or       Kruskal-Wallis   Friedman
or Ordinal                              Sign Test
Nominal             Chi Square or       McNemar           Chi Square       Cochran Q
                    Fisher exact Test
Table 3.4: Specific STH for experiments with multiple independent variables

Type and                          Experimental Design
distribution        two-levels factor                     three-or-more-levels factor
of the outcome      Not blocking        Blocking          Not blocking      Blocking
Real Normal         Factorial ANOVA     Factorial ANOVA   Factorial ANOVA   Factorial ANOVA
                                        (rep. measures)                     (rep. measures)
Real not-normal     -                   Friedman          -                 Friedman
Ordinal             -                   Friedman          -                 Friedman
Nominal             Log linear          -                 Log linear        -
Table 3.5: Regression coefficients and models

Single factor (by type and distribution of the outcome and independent variables):
- Outcome and independent variable all Real and Normal: Pearson or Bi-variate Regression
- Outcome or independent variable Mixed: Spearman ρ or Kendall τ
- Outcome or independent variable all Ordinal or Nominal: Φ or CRAMER’s V

More-than-three factors:
- Outcome all Real and Normal: Multiple Regression
- Otherwise: Discriminant Analysis (independent variables all Real and Normal)
  or Logistic Regression (otherwise)
According to table §3.2, the recommended technique for confirmatory data analysis depends strongly on the type of experimental hypothesis. For instance, for associational hypotheses with a single factor, correlation coefficients such as the Pearson product
moment correlation and the Spearman rho could be used. Providing a comprehensive description of each method described in tables §3.3, §3.4, and §3.5 is out of the scope of
this dissertation. Excellent descriptions of such procedures, their premises and application constraints, and numerous real examples are provided in [116], [195] and [134].
Since the types of MOEs taken into account in this dissertation use differential hypotheses (c.f. Sections §3.6 and §7.3) the primary analysis mechanism used in practice
is Statistical Testing of Hypotheses5 (STH). STH is applied to decide whether there are
significant differences between datasets of experimental results. Consequently, it can assess whether a metaheuristic, tailoring or tuning is better than another. STH is described in more detail in the next subsection.
Statistical Testing of Hypotheses
STH work by defining two hypotheses, the null hypothesis H0 and the alternative hypothesis H1 . The null hypothesis is a statement of no effect or no difference,
whereas the alternative hypothesis represents the presence of an effect or a difference.
5 Confidence intervals are also used, but STH is more popular.
3.4. EXPERIMENTAL EXECUTION
Figure 3.6: Hypothesis acceptance and rejection areas
Thus, if H1 holds, then significant differences exist between groups of individuals, the
performance of algorithms, or the effect of a technique or methodology for software
development. Both hypotheses are mutually exclusive; i.e., if H0 holds then H1 does not hold, and vice versa.
The result generated by most statistical tests is a p-value. A p-value is the probability of obtaining observations at least as extreme as those provided by the experiment, assuming that H0 is true. A p-value provides information about whether a hypothesis test is significant or not, and it also indicates how significant the result is: the smaller the p-value, the stronger the evidence against H0 [69]. Decision making on the hypotheses using statistical tests is performed by imposing a threshold on the p-value below which we consider that the null hypothesis H0 is false. This threshold is named the significance level and denoted as α. Figure §3.6 shows a diagram that describes the roles of α and the p-value in the decision making about the hypotheses.
The usual process for applying STH is [154, 299]:
1. Map the experimental hypothesis of the experiment into a pair of statistical hypotheses
(H0 and H1 ). The hypotheses are described in terms of the parameters of the distributions of random variables from which a sample can be obtained by conducting the experiment (the data-set to be analysed). As a consequence, researchers
must identify metrics that measure the variables that appear in the experimental hypothesis, and define the mechanism that will be used to instrument them.
For instance, in MOEs the performance of a metaheuristic is usually measured
by the value of the objective function for the best solution found in a run with a
specific termination criterion. Once the random variables are identified and their
instrumentation mechanisms are specified, statistical hypotheses H0 and H1 are
stated. The mapping between the experimental and the statistical hypothesis is
usually performed in terms of the parameters of the distributions of the random
variables. For instance, in experiment #2 the null hypothesis H0 would state that “there is no significant difference in the means of the distribution of values for the objective function provided by the different techniques; i.e., that their performance is similar.”
2. Decide which statistical analysis procedure is appropriate (the specific test of hypothesis). The main factors affecting this decision are the type of hypothesis, the experimental design, and the nature of the data. This last factor is in turn twofold: the type of the variables, which can be nominal (e.g., X, Y, Z), ordinal (e.g., good, fair, bad) or interval/ratio (e.g., 10.0, 5.0, 2.0), and the statistical properties of the data-set. In the case of non-Gaussian distributions, non-parametric tests should be used.
3. Select a significance level (α). It is widely accepted that if the p-value is lower than
0.05, there is enough evidence to reject the null hypothesis H0 and assume that
H1 holds.
4. Compute the p-value. In order to compute the p-value, the value of the test statistic
T must be previously calculated from the dataset. Given the test statistic T and
the expected distribution of the data D, the p-value is computed.
5. Decide whether to reject the null hypothesis. It is widely accepted that the null hypothesis must be rejected if and only if the p-value is less than the significance level α. For instance, in the example shown in Figure §3.6 the p-value is not below the selected α, thus there is no evidence for rejecting H0.
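The five steps above can be sketched programmatically. The following is a minimal illustration using SciPy; the two samples of best objective values (one per hypothetical technique) are invented for the example.

```python
# Minimal sketch of the five STH steps, with invented data: best objective
# values from 10 runs of two hypothetical techniques in a MOE.
from scipy import stats

runs_a = [10.2, 11.1, 9.8, 10.5, 10.9, 10.1, 10.7, 10.4, 10.6, 10.3]
runs_b = [12.1, 12.5, 11.9, 12.3, 12.0, 12.4, 11.8, 12.2, 12.6, 12.1]

# Steps 1-2: H0 = "equal mean value on f"; a two-sample t-test matches a
# two-level single-factor design with real, roughly normal outcomes.
alpha = 0.05                                       # Step 3: significance level
t_stat, p_value = stats.ttest_ind(runs_a, runs_b)  # Step 4: compute T and p
reject_h0 = p_value < alpha                        # Step 5: reject H0 iff p < alpha
print(reject_h0)
```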
When STH is used to detect significant differences between two distributions (the null hypothesis being that the distributions are identical), the tests are called simple comparison tests. Conversely, when STH is used to detect significant differences among three or more distributions (the null hypothesis being that all the distributions are identical), they are called multiple comparison tests. The use of such a null hypothesis in multiple comparison tests implies that the alternative hypothesis states that there is at least one distribution that is different from the rest. If the null hypothesis in a multiple comparison test is rejected, then we know that there are significant differences, but not among which of the distributions. Thus, in order to reach concrete conclusions about which specific distributions are different from others, an additional type of statistical technique named post-hoc procedure must be applied.
Post-hoc procedures are a special kind of STH, concerned with finding differences between pairs of distributions from the associated multiple comparison test. They control the accumulation of potential errors that derives from chaining a sequence of statistical tests, in order to provide a global significance level for all the comparisons performed. For each specific multiple comparison test (such as ANOVA or the Friedman test), a specific set of post-hoc procedures has been defined in the literature.
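As an illustration, the sketch below applies Friedman's test to invented results of three hypothetical techniques over six problem instances and, when differences are detected, runs pairwise Wilcoxon tests with a simplified Holm correction (the monotonicity enforcement of full Holm implementations is omitted for brevity).

```python
# Multiple comparison (Friedman) followed by pairwise post-hoc tests.
# All technique names and measurements are invented for the example.
from itertools import combinations
from scipy import stats

results = {  # best value on f per problem instance (the blocks)
    "A": [10.1, 11.3, 9.8, 10.7, 10.2, 10.9],
    "B": [12.0, 13.1, 11.5, 12.4, 12.2, 12.8],
    "C": [10.0, 11.5, 9.9, 10.5, 10.4, 11.0],
}
_, p_friedman = stats.friedmanchisquare(*results.values())

adjusted = {}
if p_friedman < 0.05:  # differences exist, but between which techniques?
    pairs = list(combinations(results, 2))
    raw = [stats.wilcoxon(results[a], results[b]).pvalue for a, b in pairs]
    # Simplified Holm step-down: multiply the i-th smallest p-value by (m - i)
    order = sorted(range(len(raw)), key=raw.__getitem__)
    m = len(raw)
    for rank, idx in enumerate(order):
        adjusted[pairs[idx]] = min(1.0, raw[idx] * (m - rank))
print(p_friedman < 0.05, adjusted)
```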
3.5 EXPERIMENTAL VALIDITY
The term validity refers to the approximate truth of a statement or inference. In
the context of experimentation, it is usually applied to the conclusions regarding the
hypothesis of an experiment. Thus, when an inference about the hypothesis (whether
that it holds or not) is valid, it means that there is sufficient evidence, both in the data
and the experimental process, to support the inference [248]. Validity is thus a property of inferences (or conclusions); it is not a property of designs or hypotheses. For
example, using a complete randomized design does not guarantee the validity of an
inference about the effectiveness of the antipyretic in experiment #1. There could be
many reasons invalidating the inference. For instance, in experiment #1 the nurses could administer an erroneous dose of the drug, leading to wrong conclusions. For the sake of simplicity, and in accordance with most of the literature on the topic, we will define an experiment as valid if it allows drawing valid conclusions.
Threats to validity are the specific causes that compromise the validity of a conclusion. In this dissertation, the enumeration of threats to validity presented by Shadish et al. in [248] is used with slight adaptations to our terminology. In general terms, threats to validity can be divided into two groups: internal and external threats. The former are those that affect the validity of the conclusions of the experiment. The latter are those that put at risk the generalization of the conclusions obtained. In the next subsections we list some of these threats.
3.5.1 Internal validity
The internal validity of an experiment is defined as the extent to which we can infer
that the hypothesis holds (or not) from the experimental process and data gathered.
Consequently, the threats to internal validity are those caused by the characteristics
of the experimental process and its setting. In the following, we describe some of the
most common internal threats reported in the literature [116, 248]. In particular, we
focus on those that could be automatically warned, detected or fixed. For a better understanding, each threat is presented by means of: i) a brief definition, ii) examples of situations where the threat could appear in experiments #1 and #2, iii) possible mechanisms to diagnose (i.e., detect) the threat, and iv) ways of neutralizing the threat. The internal validity threats can be classified into four groups, described in the following subsections.
Threats caused by environmental factors and nuisance variables
IVT-1 Wrong temporal precedence
Definition: The measurements are erroneously taken before applying the treatment.
Example: In experiment #1, a nurse could measure the body temperature of a patient before administering the drug. In experiment #2, an implementation bug could make the program return the initial solution as a result, instead of the best solution found during all iterations.
Diagnosis: This threat is difficult to diagnose in general. However, in computational
experiments, the execution environment can monitor the conduction of the experiment
ensuring that no measures are taken before the treatments.
Neutralization: This threat is neutralized by fixing the problems that cause a wrong treatment-measurement sequence and repeating the conduction of the experiment. This cannot be done automatically in general.
IVT-2 History and environmental factors
Definition: External events influence the conduction of the experiment affecting the
outcome.
Example: Experiment #1 could take place during a heat wave, distorting the measured effect of the drug. In experiment #2, an operating system update could happen during the execution of a program, decreasing the computational resources available for its execution.
Diagnosis: Since the events that cause the bias on the outcome are, in general, neither predictable nor monitored, there is no general mechanism for diagnosing this threat.
Neutralization: According to Shadish et al. and Gliner et al. the best approach to minimize this threat is the use of a random assignment and a randomized sequence of
treatment application.
IVT-3 Testing effects
Definition: Treatments on an experimental object affect other treatments.
Examples: In experiment #1, the reaction to the drug could be different in patients that had already taken the drug before. In experiment #2, the effect of memory caches and speculative execution in modern processors could lead to an improvement of the results provided by some techniques.
Diagnosis: To the best of our knowledge, there are no methods to diagnose this threat. A tentative approach would be to compute the correlation of the values of the outcome with the index of the treatments related to the measurements in the sequence of the experimental protocol (globally, per block, and per group). If there is a strong correlation between the values, then the conclusions of the experiment could be threatened.
Neutralization: Again, the recommended approach to minimize this threat is the use of
a random assignment and a randomized sequence of treatment application [116, 248].
IVT-4 Instrumentation effects
Definition: The way in which a variable is measured has an effect on its value.
Example: In experiment #1, the conclusions would be threatened if the instrument used for measuring body temperature deteriorated during the experiment, providing different measurements. In experiment #2, this threat could be a risk if the solutions generated by the metaheuristic program were dynamic, changing over time.
Diagnosis: This threat can be diagnosed by obtaining several measurements before and after the conduction of the experiment. Another diagnosis strategy for this threat is using multiple instrumentation artefacts during the experimental conduction. For more
approaches to diagnose this specific threat to validity c.f. [116, chapter 11].
Neutralization: To the best of our knowledge, the only way to neutralize this threat is to use different artefacts for the measurement. This strategy cannot be automated in general. The use of multiple instrumentation artefacts and their random assignment to perform the measurements during the experimental conduction can also mitigate the effects of this threat.
Threats caused by the characteristics of groups
IVT-5 Heterogeneity of experimental objects
Definition: Experimental objects are not homogeneous. As a result, the effects of the
treatments on the outcomes are confounded with that of the specific levels of the noncontrollable factors of the experimental objects. This threat is also referred to as assignment and selection bias [116].
Examples: In experiment #1, if babies and adults are mixed in the groups, the effect of the drug could be confounded due to the difference in the relation between weight and dose; i.e., in babies weighing 10 kilograms a dose of 100 mg can cause a much bigger effect than in adults weighing 100 kilograms.
Diagnosis: This threat arises if the experimental description contains non-controllable
factors that are not used in the blocking of the design. This diagnosis should be interpreted as a warning for the experimenters. Another approach would be to measure
the correlation between the levels of non-controllable factors not present in the blocking criterion and the outcome. If there is a strong correlation between the variables the
conclusions drawn are threatened.
Neutralization: The only way to minimize this threat is to reconduct the experiment
introducing the non-controllable factors as blocking criteria.
IVT-6 Attrition
Definition: The measurement of outcomes for some experimental objects fails, is lost, or becomes impossible to obtain during experimental conduction. This threat is also referred to as mortality.
Examples: In experiment #1 some patients could die during the conduction of the experiment, leading to no measurements of the outcome. It is worth noting that certain levels of attrition are acceptable (but not desirable) in experimental areas such as biology, medicine, etc.
Diagnosis: Based on the experimental description the expected number of outcome
measurements can be computed for classical designs. Since the measurements of the
outcome variable should be provided as part of the lab-pack, it is possible to diagnose
this threat by comparing the expected and actual number of measurements.
Neutralization: The only way to neutralize this threat is to repeat the conduction of the experiment. For MOEs this mechanism can be automated.
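For a blocking factorial design, the diagnosis described above reduces to comparing expected and actual measurement counts. A sketch, with illustrative counts and data:

```python
# Attrition diagnosis sketch for a blocking factorial design: the expected
# number of measurements is levels x blocks x runs; any shortfall in the
# lab-pack signals attrition. Counts and data are illustrative.
def missing_measurements(measurements, n_levels, n_blocks, n_runs):
    expected = n_levels * n_blocks * n_runs
    return expected - len(measurements)  # 0 means no attrition detected

lab_pack = [0.73] * 28  # 2 techniques x 3 instances x 5 runs = 30 expected
print(missing_measurements(lab_pack, n_levels=2, n_blocks=3, n_runs=5))
```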
Threats related to the statistical analysis of the data
IVT-7 Small sampling
Definition: Statistical tests do not recognize the differences in the observations as statistically significant when the sample is very small. This threat is also referred to as low statistical power.
Example: In experiment #1, if the number of feverish patients in the hospital is small, for instance 5, the experiment conclusions would be threatened, since specific and rare characteristics of one or two of the patients could lead to wrong conclusions.
Diagnosis: A simple diagnostic mechanism for this threat is to check if the size of the
sample is larger than a minimum. Historically, authors have used 30 experimental
objects as such minimum for simple comparisons.
Neutralization: According to [116, 248] there are several ways to increase the statistical power, but the most usual is to increase the size of the sample. In MOEs, this can usually be automated by increasing the number of runs of the metaheuristic programs.
IVT-8 Violations of the preconditions of statistical tests
Definition: The preconditions of the statistical tests are not met, and the conclusions drawn from them are erroneous.
Example: In experiment #2, depending on the distribution and characteristics of the
data, the recommended statistical test for the experiment according to table §3.3 would
be either a T-test or a Wilcoxon test. If the data is not normal and the T-test is applied,
the preconditions of the T-test would be violated and the conclusions could be erroneous.
Diagnosis: Most of the preconditions of statistical tests, such as normality or homoscedasticity of data, can be evaluated through other statistical tests. Thus, those auxiliary tests
can be used to check whether certain preconditions are fulfilled or not.
Neutralization: In order to neutralize this threat, the statistical analysis must be repeated
using statistical tests whose preconditions are not violated. The automation of this
mechanism involves selecting and running the statistical test automatically.
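A sketch of such an automated selection, assuming a two-sample comparison: a Shapiro-Wilk test checks the normality precondition and drives the choice between the parametric T-test and the non-parametric Mann-Whitney test. The function name, threshold and data are illustrative.

```python
# Precondition-driven test selection: use the T-test only if both samples
# pass a Shapiro-Wilk normality check; otherwise fall back to Mann-Whitney.
from scipy import stats

def select_and_run_test(sample_a, sample_b, alpha=0.05):
    normal = (stats.shapiro(sample_a).pvalue > alpha
              and stats.shapiro(sample_b).pvalue > alpha)
    if normal:
        return "t-test", stats.ttest_ind(sample_a, sample_b).pvalue
    return "mann-whitney", stats.mannwhitneyu(sample_a, sample_b).pvalue

name, p = select_and_run_test(
    [10.2, 10.5, 9.9, 10.1, 10.4, 10.0, 10.3, 9.8, 10.6, 10.2],
    [12.1, 12.4, 11.9, 12.2, 12.0, 12.5, 11.8, 12.3, 12.6, 12.1])
print(name, p)
```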
IVT-9 Fishing and error rate
Definition: Several comparisons among pairs of observations are performed using simple comparison tests. As a result, the error rate accumulates and the conclusions of the statistical tests are misleading.
Example: In experiment #2 up to five optimization techniques are compared. If we apply a simple comparison test, such as the T-test with α = 0.05, to evaluate the significance of the differences for each pair of techniques, we would perform 10 simple comparisons. The probability of error of the whole set of comparisons is not 0.05, but significantly higher [69]. Thus, a multiple comparison test with post-hoc procedures should be used.
Diagnosis: The automated diagnosis of this threat requires the analysis of the experimental description to determine the number of comparisons to be performed and the
type of the test specified.
Neutralization: In order to neutralize this threat, the statistical analysis must be repeated using appropriate multiple comparison statistical tests and post-hoc procedures. Again, the automation of this mechanism involves the automated selection and
execution of the statistical test.
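The accumulation of error can be quantified directly: with k independent comparisons at level α, the probability of at least one false rejection is 1 − (1 − α)^k. For the 10 pairwise comparisons of the example (technique names are illustrative):

```python
# Family-wise error rate of 10 independent pairwise comparisons at alpha=0.05.
from itertools import combinations

techniques = ["T1", "T2", "T3", "T4", "T5"]   # five techniques, as above
k = len(list(combinations(techniques, 2)))    # 10 pairwise comparisons
alpha = 0.05
fwer = 1 - (1 - alpha) ** k                   # ~0.40, far above 0.05
print(k, round(fwer, 3))
```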
IVT-10 Restriction of range
Definition: Variables are restricted to a narrow range that does not match the actual domain of the observations.
Example: This threat is equivalent to the out-of-range errors and precision losses due to castings in programming languages. For instance, in experiment #1 we could use [36, 40] as the valid range for the patient temperature. If we get a measurement of 41, it would be truncated during the analysis of the data, leading to erroneous conclusions. In experiment #2, if the objective function is real with a range of [0, 500,000], but the levels specified for the experimental variable are in the range [0, 10] (meaning that measurements of values bigger than 10 are interpreted as a level of 10), it is probable that almost every observation would be associated with the level 10. Consequently, the statistical test could miss actual differences in the distributions of observations of the variable.
Diagnosis: This threat can be diagnosed by checking that all the values registered at the end of the experiment are in the ranges defined for each variable. We are not aware of any diagnosis method for the cases in which the values are in the range but that range does not match the actual range of the observations.
Neutralization: The experiment should be repeated using an adequate range for the variables.
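The range check described in the diagnosis can be sketched as follows; the variable name and range are illustrative:

```python
# Range-restriction diagnosis: flag any registered value outside the range
# declared for its variable. Variable names and ranges are illustrative.
ranges = {"temperature": (36.0, 40.0)}

def out_of_range(variable, values):
    low, high = ranges[variable]
    return [v for v in values if not low <= v <= high]

print(out_of_range("temperature", [36.5, 38.2, 41.0, 39.1]))  # -> [41.0]
```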
IVT-11 Inaccurate size estimation effect
Definition: Some statistical tests overestimate or underestimate the differences in the
observations depending on the type of variable. This leads experimenters to draw
wrong conclusions about the experimental hypothesis.
Example: In experiment #2, we could use Quade’s test for multiple comparison. However, it is well known [69] that this test overestimates the differences of results between techniques when problem instances are very different. See [248, page 52] for some additional examples.
Diagnosis: The diagnosis of this threat depends on the behaviour of the specific statistical test or post-hoc procedure and on the specific experiment. For instance, the use of Quade’s test could be admissible if the difference of behaviour of the techniques with different problem instances were of paramount importance for solving the optimization problem. However, its use is not recommended in general. As a consequence, the diagnosis of this threat should be interpreted as a warning for experimenters that should be confirmed.
Neutralization: In order to neutralize this threat, the statistical analysis must be repeated
using appropriate multiple comparison statistical tests and post-hoc procedures.
Threats caused by the characteristics of the experimental conduction
IVT-12 Unreliability of treatment implementation
Definition: The implementation of the treatment on an experimental object is erroneous.
Example: In experiment #1 some nurses could forget to administer the drug to several patients, or could administer a different dose.
Diagnosis: This could be diagnosed by analysing the variance and the outliers of the distributions of measurements per instrumentation artefact, block and group. However, this method is only applicable when groups have a minimum size and the variance of the distribution is small.
Neutralization: If unreliable measures are detected as outliers, filtering those measurements out is a valid neutralization mechanism. Otherwise, removing the threat could require re-conducting the experiment with different instrumentation artefacts.
3.5.2 External Validity
The external validity of an experiment is defined as the extent to which conclusions
can be generalized. Thus, most external validity issues are related to experimental
objects, settings, treatments or outcomes that were not studied in the experiment [60,
61, 248].
Threats to external validity
The threats to the external validity of the experiment are those that could affect
the way in which the conclusions are generalized from the experimental sample to
the target population. Next, we enumerate those proposed by [248] and [116]. The information regarding the diagnosis and neutralization of external validity threats is omitted since it is out of the scope of this work.
Interaction of experimental objects with factor effects: A conclusion drawn with a sample might not be extrapolable to a different sample. In experiment #1, the reductions of body temperature observed with specific feverish patients from Sevilla could be different from those observed on a different set of patients.
Interaction of treatment implementation and factor effects: The effect observed for a treatment could vary depending on the specific details of its application. In experiment #1, the effect of the drug could be different if patients gulp it with water or not.
Interaction of factor effects and outcome variable: The effect observed for a treatment could depend on the specific outcome variable measured. In experiment #1, the effect of the drug could be different if we measure fever through the amount of sweat.
Interaction of factor effects and experimental setting: The effect observed for a treatment could depend strongly on specific elements of the experimental setting or the
context of the experiment. In experiment #1, the reduction of body temperature could
depend on the temperature of the room where the treatments were applied.
3.6
M ETAHEURISTIC O PTIMIZATION E XPERIMENTS
Metaheuristic optimization experiments have some characteristics that make them
even more time-consuming and difficult to set up than other computational algorithmic experiments. First, the stochastic nature of the algorithms makes a high number of runs per group necessary to get significant results, leading to long-running experiments. Second, even using a MOF, a significant development effort is needed. This development effort is due to the need to adapt the techniques to the problem at hand (by using the extension points of the MOFs that encode the tailoring points of the implemented metaheuristics) and to implement the experimental procedure; thus there will be bugs that are usually detected during the execution of the experiment or in the data analysis of the results. As a consequence, experiments are usually run several times until valid results are reached. Third, the high number of variants and
parameters that the different techniques present lead to experimental variables with a
high number of levels and complex designs. Finally, one experiment usually requires
the realization of subordinate experiments, magnifying the effect of the previous characteristics. For instance, in technique comparison experiments the performance of the
techniques depends strongly on the parameter values used. As a consequence, in order to perform a fair comparison, one subordinate experiment per technique must be performed to find its optimal parameter configuration [21]. Next we describe
the types of MOEs taken into account in this dissertation to support the MPS life-cycle.
3.6.1 Selection and Tailoring experiments
Given an optimization problem P = ( f , A) and a set of metaheuristic algorithms M = { M1 , . . . , Mn }, the goal of this experiment is to determine the techniques that perform better than all the others; i.e., to find the best techniques to solve P. Each metaheuristic algorithm is run with a set of parameter values that is constant along the experiment. Each metaheuristic Mi generates a set of solutions Si = {si,1 , . . . , si,nruns }, where si,j ∈ A is the solution generated by the j-th run of Mi .
This notion of better performance is usually operationalized, in maximization problems, as the maximum average value on f of the solutions provided by each technique, i.e., avg( f , Si ) = (1/|Si |) · ∑_{j=1}^{|Si |} f (si,j ). Thus the goal of the experiment is finding the subset M∗ ⊆ M such that ∀( Mi ∈ M∗ , Mk ∉ M∗ ) • avg( f , Si ) > avg( f , Sk ).
The experimental objects in this experiment are runs of the metaheuristics in M solving P. There is one single controllable factor (we name it “technique”), whose levels are labels corresponding to the different metaheuristics {‘M1’, . . . , ‘Mn’}. The outcome is the value on f of the solution provided by each experimental object (algorithm run).
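The selection criterion above can be sketched as follows, with a toy objective function and invented solution sets:

```python
# Sketch of the selection criterion: compute avg(f, Si) per technique and
# keep the techniques whose average beats all others (maximization).
# Objective function and data are toy examples.
def avg(f, solutions):
    return sum(f(s) for s in solutions) / len(solutions)

f = lambda s: s  # toy objective: the solution is its own value on f
S = {"M1": [10.0, 11.0, 10.5],
     "M2": [12.0, 12.5, 11.5],
     "M3": [12.4, 11.9, 12.3]}
averages = {m: avg(f, sols) for m, sols in S.items()}
best = max(averages.values())
m_star = {m for m, a in averages.items() if a == best}
print(averages, m_star)
```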
The hypothesis of this type of experiment is differential, stating that the specific
metaheuristic used to solve the problem has an impact on the value on f of the solutions provided. In statistical terms, the null hypothesis H0TC of this type of MOE states
that “there is no difference in the mean value on f for the populations of solutions
generated by the metaheuristics”. The alternative hypothesis H1TC states that “there is
a significant difference in the mean value on f for the populations of solutions generated by the metaheuristics”, i.e., that there are some techniques that perform better
than the rest.
Usually, this comparison is performed not for a single problem instance, but for a
finite set I = { I1 , . . . , Im } of problem instances. Since the specific problem instance to
be solved can have an important impact on the outcome of the experiment, it is treated
usually as a blocking variable playing the role of nuisance factor.
The design of this type of MOE is usually a blocking factorial, where each metaheuristic Mi is executed nruns times on each problem instance Ik . The analysis for
testing the statistical hypothesis depends on the number of metaheuristics in M. Since
one single factor is present, the techniques described in table §3.3 are used.
When we compare a pair of metaheuristics (| M| = 2), we have a simple comparison experiment, and the recommended analyses are the T-test, and the Mann-Whitney or Wilcoxon tests if some of the premises of the T-test are violated (typically normality, but also sphericity or homoscedasticity, and independence). If the statistical test rejects the null hypothesis, we conclude that some metaheuristics are better than others, and use the computed means to determine which is the best.
When comparing three or more metaheuristics, we have a multiple comparison experiment. The recommended analyses are ANOVA, and Friedman’s test if some of the premises of the ANOVA are violated (typically normality, sphericity or homoscedasticity, and independence). If the statistical test rejects the null hypothesis, we conclude that some metaheuristics are better than others. In this case we need to perform post-hoc tests to determine between which metaheuristics there are significant differences, and to determine those that perform better [69].
A slight variant of this kind of MOE arises when several tailorings (algorithms) generated for the same metaheuristic are compared. The hypotheses, experimental objects, design and analyses are similar to those used for selection experiments, hence the generalization as technique comparison experiments.
3.6.2 Tuning Experiments
In this type of MOE a single metaheuristic algorithm Mx having a set of parameters
{ρ1 , . . . , ρn } with specific domains { D1 , . . . , Dn } is used. The goal of the experiment is
to find an optimal parametrization of Mx , i.e., the parameter values V ∗ = (v1∗ , . . . , v∗n )
that provide the best performance for solving P with Mx where vi ∈ Di , i = 1, . . . , n.
The experimental objects in this experiment are runs of Mx solving P. The outcome is the value on f of the solution provided by each experimental object (algorithm run).
There is one controllable factor per parameter, and the levels of such variables depend
on the specific domain of each parameter.
The comparison is usually performed not for a single problem instance, but for a
finite set I = { I1 , . . . , Im } of problem instances. Since the specific problem instance to
be solved can have an important impact on the outcome of the experiment, it is treated
usually as a blocking variable playing the role of nuisance factor.
The main difference between both types of experiments is that the number of factors and their levels can be much higher for Technique Tuning Experiments than for Technique Comparison Experiments. Furthermore, the domain of some parameters in Technique Tuning Experiments can have an infinite number of values, leading to its discretization by the experimenter, to the use of complex designs [19, 29, 50, 237], or to carrying out a number of related Technique Tuning Experiments for performing the Tuning stage of the MPS as proposed in [22].
For the purpose of this dissertation, a simple blocking factorial design is chosen for this kind of experiment. The problem instance is usually treated as a blocking variable because the specific problem instance may strongly impact the outcome. However, the contributions of this dissertation are extensible and in most cases can be adapted to alternative designs. In the remainder of this subsection the use of this design is assumed.
The null hypothesis of this type of MOE states that “there is no difference in the mean value on f for the populations of solutions generated by any of the parametrizations chosen”. The alternative hypothesis states that “there is a significant difference in the mean value on f for the populations of solutions generated by the parametrizations”, i.e., there are some parametrizations that perform better than the rest.
The analysis for testing the hypotheses is similar to that specified for Technique Comparison Experiments when using a blocking factorial design, but in this case multiple factors are present, leading to the use of the techniques described in table §3.4.
3.6.3 Designs for MOEs
The experimental designs used for selection and tailoring experiments are usually classical designs. When the experiment has a single controllable factor, completely randomized designs are used (with blocking when non-controllable factors are present); since the problem instance is usually a blocking factor, the most usual design in this case is the randomized complete block design. In a very similar way, when the experiment has multiple controllable factors (such as in tailoring experiments with variants in multiple tailoring points), factorial and factorial blocking designs are applied.
Tuning experiments present a much wider diversity of experimental designs. Although factorial and factorial blocking designs are usually applied, several complex designs for this kind of MOE have been proposed in the literature. The most relevant proposals in this area are: response surface designs [37, 237], SPO (Sequential Parameter Optimization) [21], and racing algorithms [19, 29, 181] (interested readers can find a comprehensive survey in [24]). It is worth noting that most of those proposals do not provide an explicit experimental protocol, but an algorithm that decides the experimental protocol at conduction time based on the observations obtained at each moment.
3.6.4
Analyses for MOEs
MOEs present some specific characteristics that have led to the development of specific criteria for deciding which statistical analysis techniques should be used. In
the specific context of computer science, the assumptions enumerated above are less
likely to hold than in natural or social sciences such as biology and psychology, where variables are usually normally distributed [116]. In this sense, MOEs require a wide spectrum of
hypothesis tests.
Furthermore, multiple comparison tests have one important drawback: they can
only detect significant differences over the whole set of data; i.e., the test detects that
there are significant differences among the multiple compared groups, treatments or algorithms, but it does not identify between which ones. One could think
of applying simple comparison tests to detect the differences between every pair of
variables in the dataset, but this process introduces an important risk of error. This error
comes from the combination of many pairwise comparisons, which increases
the probability of rejecting one or more true null hypotheses (in statistical terminology,
this error is denoted the Family-Wise Error Rate) [111]. In these circumstances the
use of post-hoc procedures is recommended. Useful practical considerations for non-parametric multiple comparison tests are provided in [69], including recommendations
for the selection of the post-hoc procedures to be applied.
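To make the role of such procedures concrete, the following sketch implements Holm's step-down adjustment of pairwise p-values, one of the post-hoc procedures covered by the works cited above. The p-values in the example are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjustment of a list of pairwise p-values.

    Comparing each adjusted p-value against a significance level alpha
    is equivalent to Holm's sequential procedure, which controls the
    Family-Wise Error Rate across the whole set of comparisons.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 corresponds to the smallest p-value
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

# Hypothetical p-values from pairwise comparisons of four parametrizations.
pairwise_p = [0.010, 0.040, 0.030, 0.005]
print([round(p, 4) for p in holm_adjust(pairwise_p)])  # [0.03, 0.06, 0.06, 0.02]
```

With alpha = 0.05, only the first and last comparisons remain significant after adjustment, illustrating how naive pairwise testing would have over-reported differences.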
3.6.5
Threats to validity in MOEs
Experimental conduction in MOEs implies the repeated execution of different metaheuristic programs to find solutions to each problem instance. The assignment of
executions to metaheuristic programs and problem instances is usually randomized
and automated, thus avoiding human bias. Furthermore, since the experimental objects
are metaheuristic executions on specific problem instances, no risk of mortality or attrition exists.
However, even with such an automated experiment conduction and assignment,
there exist risks of maturation, testing or memory effects. For instance, the effect of
memory caches and speculative execution in modern processors could lead to an improvement of the results provided by the programs if their execution is repeated sequentially. In order to minimize the impact of such effects on the results of the experiment, the order in which metaheuristic programs are run on the different problem
instances is randomized. MOEs are also affected by all the threats regarding the
validity of the analysis described in Section §3.5.2. For instance, fishing and error rate
threats are also important, hence the need for using multiple comparison tests and post-hoc procedures.
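This randomization of the execution order can be sketched as follows; the program and instance names are illustrative.

```python
import itertools
import random

# Illustrative names; a real experiment would list its own programs and instances.
programs = ["GA", "SA", "TS"]
instances = ["fm-100", "fm-500", "fm-1000"]
replicates = range(10)

# Build the full factorial list of runs, then apply a single global shuffle
# so that no program is executed repeatedly in sequence, mitigating
# cache-warming and other maturation effects.
runs = list(itertools.product(programs, instances, replicates))
random.Random(42).shuffle(runs)  # fixed seed keeps the schedule reproducible
```

Executing `runs` in order interleaves programs and instances while still covering the complete design.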
Additionally, history threats (also called extraneous environmental events in [116]) can
also affect MOEs. This threat is caused by events that affect the experimental environment concurrently with the experiment conduction, impacting the outcomes of the experiment. For instance, an automated operating system update could
be performed on the experimentation computer while the experiment is carried out. In
this situation, the measured performance of the techniques could be affected, depending on the specific stopping criteria used in the experiment.
Finally, since metaheuristic algorithms must be implemented as programs, MOEs
are threatened by unreliability of treatment implementation (implementations can have
bugs).
PART III
CONTRIBUTIONS
4
MOTIVATION
Every problem has in it the seeds of its own solution.
If you don’t have any problems, you don’t get any seeds.
Norman Vincent Peale,
1898 – 1993
American author and minister
In this chapter, we motivate our contributions. After a brief introduction to the chapter
goals in Section §4.1, we describe the problems addressed in this dissertation and the
current solutions found in the literature in Section §4.2. Section §4.3 outlines the main
contributions of this thesis relating them to the problem that motivated our work. Finally, a
summary of the content closes the chapter in Section §4.4.
4.1
INTRODUCTION
The use of metaheuristics to solve optimization problems is a widely studied topic
in the field of computer science. In this context, software engineers have recently realized
the benefits of using metaheuristics to solve hard optimization problems, usually
referred to as search-based problems. This has led to a “search-based” trend observed in
a number of software engineering conferences and special journal issues on the matter
[105, 129, 130, 131].
However, despite its many benefits, the application of metaheuristics requires overcoming numerous obstacles. First, the implementation of metaheuristic programs is a
complex and error-prone process that requires knowledgeable developers. Although
some supporting tools have been proposed, these usually automate only single parts
of the MPS life-cycle.
Furthermore, a key challenge in the application of metaheuristics is experimentation. This is due to the fact that there is no analytical method to choose a suitable
metaheuristic program for a given problem. Instead, experiments must be performed
to compare the candidate techniques and their possible variants. This can lead to hundreds of potential alternatives to be compared, making the design, execution and analysis of experiments complex and time-consuming. Besides this, experiments are usually
performed ad hoc, with generic tools and no clear guidelines [21], introducing threats
to validity and making them hard to automate and reproduce.
The goal of this chapter is to motivate the need for specific tools that reduce the
cost of using metaheuristics, especially for those not familiar with them, such as software
engineers.
To that purpose, we follow a software engineering approach and present a set of
generic and extensible tools to support most of the activities of the MPS and MOE
life-cycles. These contributions are the result of an exhaustive analysis of the state
of the art as well as our own experience in solving optimization problems in the area
of software engineering. We trust that our work may contribute to the progress of
metaheuristics in general and search-based software engineering in particular.
4.2
PROBLEMS
The following subsections describe the problems that motivate the contributions
presented in this dissertation and the related work found in the literature.
4.2.1
On the implementation of MPS applications
There exist tens of different MOFs available to researchers and practitioners. Each
MOF supports a subset of metaheuristic techniques and phases of the MPS life-cycle,
resulting in a quite heterogeneous set of features. This hinders the comparison of MOFs
and therefore the selection of the right framework for a given optimization problem. As
a result, using a MOF usually involves reviewing extensive documentation (or code)
in order to find out the features it supports in terms of metaheuristic techniques,
tailoring mechanisms, formats, parallelization, hybridization, etc. This is a tedious
and time-consuming task, especially for inexperienced users, that increases the cost of
using metaheuristics.
Comparison frameworks for MOFs in the literature are either informal evaluations based on the authors' criteria or focus on performance [298]. Gagné and Parizeau present
a comparison of MOFs, but only those supporting evolutionary algorithms [107]. In
[281], Voß and Woodruff present a constructive discussion of various software libraries,
but a comparative analysis is lacking. Finally, some articles, such as [41, 71],
present a concrete MOF and include a related work section comparing their tool with
others. However, those works present a narrow perspective, with a brief comparison
of a few MOFs.
4.2.2
On the description of MOEs
One of the main obstacles to automating the execution, validation and replication of
experiments is the lack of rigorous and detailed experimental descriptions. Although
many papers report an experimental setup, this is usually an informal description in
natural language where many details are omitted or vaguely described. Instead, as
presented in Chapter §3, experimental descriptions should include information about
the experimental objects, subjects and population, as well as the definition of the variables,
hypotheses and design protocol.
The lack of means to describe experiments also affects their validity. In many
cases, researchers and practitioners are not even aware of the different types of threats
to the validity of their results. In fact, not all authors describe the threats to the validity of
their experiments, and only a few distinguish between internal and external validity. In
any case, it is up to users to detect the threats to their experiments and to implement
the necessary measures to neutralize them. This is a manual task that requires a deep
knowledge of experimentation, and is therefore out of the scope of many users, including software engineers. This slows down the experimentation process and affects
its validity, hindering the progress of research disciplines.
Current solutions for describing experiments rely on the so-called Experiment Description Languages (EDLs). EDLs can be categorized as descriptive or operational. Descriptive languages allow describing experiments by ensuring that a minimum set of sections and element descriptions is provided, and they rely on experimenters to interpret the description and use it for replication. Thus, descriptive languages
hinder automation and expose the experimentation process to human errors during
conduction, due to mistakes or misinterpretations of experimental descriptions. An
example of a generic descriptive language is EDL [253], which provides a basic XML syntax to organize the description of experiments in any scientific area. Numerous domain-specific formats and guidelines for experimental descriptions that can be regarded as
descriptive languages have been proposed in specific areas. For instance, a scheme for
experimental descriptions in software engineering is provided by Juristo and Moreno
in [155].
On the contrary, operational languages, such as SED-ML [167], allow describing
experiments while ensuring their automated replication. To achieve this, operational languages
must allow specifying numerous details, which is the main reason why most proposals of operational languages are domain-specific. For instance, SED-ML is intended
for simulation experiments, and PEBL [196, 250] for the creation of form-based experiments in psychology. To the best of our knowledge, previous proposals of EDLs for
MOEs are limited exclusively to the proprietary formats defined by the MOFs. Those
formats usually specify the problems to be solved and the techniques used to solve
them, but they are not fully fledged experimental descriptions, since they do not specify any
hypothesis, experimental design, or analysis procedure. The only format that provides support for execution results is that of the Optsicom Optimization Suite [108]. Other
frameworks, such as JCLEC [277] and HeuristicLab [284], only provide support and
templates for the specification of optimization techniques, problems and tasks.
There are two elements of MOE descriptions for which previous approaches have
been proposed in the literature, namely optimization problem specifications and optimization technique specifications. Regarding the former, several languages have been
proposed in the literature [156], such as AMPL [102] or GAMS [70]. Regarding the
latter, some proposals for particular metaheuristics (such as EDL [76] for evolutionary
algorithms and Localizer [191] for local search), and other notations for the general
description of optimization algorithms, have been proposed [74, 75]. In this sense, it
is noticeable that HeuristicLab provides a graphical notation for this purpose [285].
Again, those formats are not fully fledged experimental description formats, since they
focus on describing optimization problems or metaheuristic algorithms. They
lack elements for describing hypotheses, experimental designs, analysis procedures
or the requirements of the experimental setting for replication.
4.2.3
On the execution of MOEs
Analogously to experimental descriptions, there is a lack of means to describe
the information required to execute experiments. This hinders the automation of experiments and, more importantly, their reproduction in comparable settings. This
information should include, as a minimum, the configuration required to run the experiment: operating system, libraries, pre/post processing, etc.
A related issue is the lack of interoperable means to distribute the results of experiments, which are usually given in natural language or in ad-hoc formats such as Excel
spreadsheets. This hinders tackling the threats to validity related to the consistency
between experimental descriptions and results. For instance, it is currently up to the
user to check that the number of algorithm runs matches the expected number
indicated in the description (see threat ITV-6 in Chapter §3). This makes experiments difficult to undertake and increases the chances of obtaining wrong conclusions.
Related work on the execution of MOEs can be classified into scientific workflow
systems (SWS), generic experiment execution platforms, and domain-specific metaheuristic execution tools and services. SWS are designed specifically to compose and
execute a series of computational or data-manipulation steps in engineering or research
contexts. The trend of creating SWS originated in the bioinformatics area, where the
needs of data processing are massive, but it has expanded to other areas, culminating in the creation of several generic SWS such as Taverna [203], Kepler [177],
LONI Pipeline [236] and Trident [20].
SWS seamlessly integrate the tasks to be performed by researchers and their supporting tools, but: i) they are not designed specifically for experimentation, and thus require
specifying the experimentation life-cycle itself as part of the workflow; ii) they
do not provide any validation mechanism regarding the threats to the validity of the
experiments; iii) they do not force the specification of a minimum set of information
regarding the experiment per se, i.e., checking the presence of such information
must be coded in the scientific workflow itself; and iv) when alternative analyses or
complex designs are present, they require coding the alternatives and decision
mechanisms in the scientific workflow itself. The tasks associated with issues iii) and iv) are
very tedious and error-prone for most experiments.
Regarding generic experiment execution platforms, we point out the Collage authoring environment [200], which provides a computational environment for the execution of scripts that implement experiments. It supports several implementation
languages, such as Python or Bash. Another interesting proposal is SHARE [127], a
web portal for the creation and sharing of executable papers through virtual machines.
None of these tools introduce any validity checking or minimum requirements on the
information provided for the experiments. Moreover, they do not use any kind of EDL,
but rely on descriptive languages or source code implementations. Finally, a recent trend
in this category is experimental data repositories [87, 202, 218, 223], which do not support the automated replication or execution of experiments, but the dissemination
of experimental execution results, analyses, and lab-packs.
Regarding metaheuristic execution tools, some proposals provide the execution of
metaheuristic programs as a service [113, 206], but to the extent of our knowledge no
specific tools have been developed for experimentation.
4.2.4
On the analysis of MOEs
Once the results of an experiment are obtained, they must be analysed to check
whether they support or disprove the experimental hypothesis. As explained in Chapter §3, in the context of metaheuristics this is commonly done using statistical tests.
This is a complex task that requires a deep knowledge of the available tests and their
application conditions. Furthermore, statistical analysis tools such as SPSS [144] or R
[2] involve a steep learning curve, especially for those outside the fields
of mathematics and statistics. The consequence of such complexity is an increase in the resources needed to analyse experimental results.
Related work in this context can be classified into three categories: software tools
and libraries for statistical analysis, on-line software for statistical analysis, and web
services for statistical hypothesis testing. A number of statistical analysis libraries have
been created by different authors, such as JavaNPST [69], the supporting library developed by García et al. [111], the Java Statistical Classes [28], and the Apache Commons
Math library [12]. However, the set of tests available in those libraries is usually incomplete, focusing either on parametric or non-parametric tests, and lacking in most
cases post-hoc analysis procedures.
Regarding on-line statistical analysis tools, there are also a number of proposals
[51, 176, 221, 259, 272, 292, 293]. However, those tools provide neither some of
the non-parametric multiple comparison tests, such as Quade's test or the Aligned Friedman test, nor post-hoc analysis procedures. They provide neither integrated
wizards nor methodological guidance for the selection of the statistical test to be applied. Moreover, the approaches of [51] and [292] are not based on web standards, but
on Java applets embedded in the pages.
As far as we know, the only system providing XML web services for statistical hypothesis testing is [259]. This site provides some operations implementing parametric tests (specifically ANOVA and Student's t-test), but does not provide non-parametric
tests.
4.2.5
On the replicability of MOEs
The possibility of reproducing experiments is a key point for their validation. Replicability is a widely adopted practice in certain areas of science. For instance, every time
a relevant finding is published in the area of physics, other labs around the world
rapidly reproduce the experiment to confirm or disprove the discovery. In this sense,
in order to reproduce an experiment, all the information related to its description and
execution should be provided. This is usually not the case in the current computer science literature, where the experimental details presented in a typical paper are usually
insufficient to implement the same algorithm [80].
As pointed out by Eiben and Jelasity in [80], effectively reproducing and verifying
the results found in the literature is almost impossible or, at best, an extremely laborious task. In this sense, even when the source code of the experiment is provided, it
may not be reproducible. Dependences on the computational environment, runtimes and specific configurations impose limitations, forcing researchers
to perform code reviews, and even substantial modifications of the source code, in order to replicate the experiment. Furthermore, the code quality and comments of
research prototypes are not as good and complete as in production code, leading to a
process that is usually more tedious and error-prone than re-implementing the algorithms and experimental execution procedures from scratch.
The problem of reproducing experiments is aggravated in the field of metaheuristics due to the huge number of technique variants, their stochastic nature, and the lack of
a widely accepted scheme for describing experiments and their execution.
4.3
OVERVIEW OF OUR CONTRIBUTIONS
Next, we overview the main contributions of this dissertation, relating them to the
problems presented in the previous section.
4.3.1
On the implementation of MPS applications
We present a comparison framework to facilitate the comparison and selection of the
best MOF for a given problem. The framework includes 271 features that an ideal MOF
should support. A metric has been defined for each feature in order to assess the support provided by current MOFs. Furthermore, means to aggregate such assessments
into general quantitative scores in six different areas have been defined. In total, we
have studied 34 different MOFs found in the literature. Based on this comparison
framework, ten MOFs are assessed to provide a picture of the current state of the art.
4.3.2
On the description of MOEs
We present two languages for the description of experiments: SEDL and MOEDL.
SEDL (Scientific Experiments Description Language) is a generic language to describe
experiments in a precise, unambiguous and tool-independent way. SEDL documents
include all the information that a basic experiment description should provide regardless of the application domain, namely: objects, subjects, population, variables, hypotheses, treatments and analysis design. SEDL also defines a set of extension points
that allow the creation of Domain-Specific Languages (DSLs) for the definition of domain-specific experiments. MOEDL (Metaheuristic Optimization Experiments Description
Language) is a DSL based on SEDL for describing MOEs, which avoids the need to
provide a full description of the most common elements of typical metaheuristic experiments, such as technique comparison and parameter tuning. Moreover, MOEDL
includes extension points that enable adding support for multiple types of problems
and techniques, e.g., multi-objective optimization problems.
Although MOEDL does not currently provide support for the problem and technique specification languages described in Section §4.2.2, an extension point has been
introduced in order to add such support in the future.
The benefits of using experimental description languages are plentiful. First, they
contribute to the automation and replication of experiments by making experimental descriptions complete and self-contained. Second, these languages facilitate exchanging
experimental information among researchers and practitioners. Finally, the languages
can be helpful for novice users, who can use them as a guide to the basic information
required to perform a correct experiment.
In addition to the previous languages, we present a catalog of 15 analysis operations
on SEDL documents. These operations automatically check the most common validity threats associated with experiments, warning users and suggesting possible fixes (see
Chapter §3). For instance, these analysis operations can detect inconsistencies between
the hypothesis of the experiment and the design specified for its conduction, and warn
the user that the design cannot confirm or disprove the hypothesis.
Finally, we present MOSES, a software ecosystem and associated reference implementation for the automated processing of SEDL and MOEDL documents. Among
other features, MOSES supports the automated analysis of SEDL and MOEDL documents. To the best of our knowledge, this is the first approach proposing a generic,
extensible and operational language for describing experiments, together with specific tools
supporting their analysis.
4.3.3
On the execution of MOEs
SEDL and MOEDL documents include specific sections for the information required to execute the experiment, as well as for reporting the statistical analysis of the
results. This contributes to the automation and replication of experiments by including all the information needed to perform the experiment and to compare the
outcomes. Additionally, we propose 15 analysis operations on SEDL documents for
checking the threats to validity related to the matching between experimental descriptions and results. For instance, one of those operations compares the expected number of measurements of the outcome variable that should be generated according to
the experimental design with the actual number of measurements present in the results. If these values differ, the conclusions of the experiment are
threatened due to: i) attrition (ITV-6), i.e., the experiment generated fewer measurements
than expected; ii) a problem in the implementation of the treatments (ITV-12); or iii) a
failure in the measurements, i.e., the experiment generated more measurements than
expected.
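A check of this kind can be sketched as follows; the dictionary layout and the returned messages are illustrative, not the actual MOSES data model.

```python
def check_measurement_count(design, results):
    """Compare the expected and actual number of outcome measurements.

    Assumes a simple full-factorial design where the expected number of
    measurements is parametrizations x instances x replicates.
    """
    expected = (design["parametrizations"]
                * design["instances"]
                * design["replicates"])
    actual = len(results)
    if actual < expected:
        # Fewer measurements than the design demands: attrition (ITV-6)
        # or a faulty treatment implementation (ITV-12).
        return "warning: attrition or treatment failure"
    if actual > expected:
        # More measurements than expected: a failure in the measurements.
        return "warning: measurement failure"
    return "ok"

design = {"parametrizations": 4, "instances": 3, "replicates": 10}
print(check_measurement_count(design, [0.0] * 120))  # ok
```

Running the check against a result set with only 100 measurements would instead flag possible attrition or a treatment failure.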
Given a SEDL or MOEDL document representing an experiment, MOSES supports
the implementation of the described algorithms through FOM, the automated analysis
of the experiment, the automated conduction and replication of MOEs through the Experiment Execution Environment (E3), and the publication of lab-packs following a specific
format named Scientific Experiment Archive (SEA).
4.3.4
On the analysis of MOEs
We present STATService, a suite of on-line software tools for performing statistical
analyses. Roughly speaking, STATService provides a multi-purpose and multi-user
platform to carry out statistical analyses that focuses on reusability, ease of use and
learnability. It also provides three different interfaces (web, programmatic, and MS-Excel-based) to its statistical analysis capabilities. Furthermore, STATService
provides added-value utilities, such as a wizard to help non-expert users select the
most appropriate test for their concrete dataset, and a programmatic interface as XML
web services for advanced users.
STATService integrates some traditional libraries with additional tests developed
by the authors in order to provide a comprehensive set of statistical tests. Specifically,
STATService integrates tests provided by JavaNPST, the supporting library developed
by García et al. in [111], and the Apache Commons Math library [12].
4.3.5
On the replicability of MOEs
Each of the approaches presented in the previous sections contributes in one
way or another to the replicability of experiments. First, the languages SEDL and
MOEDL enable the complete description of experiments in a self-contained and machine-processable way. Second, MOSES enables the automated execution of experiments described in SEDL and MOEDL, as well as the automated validation of their descriptions
and results. The very nature of the ecosystem makes it easily extensible, easing the replication of experiments using third-party MOFs or statistical analysis tools.
4.3.6
On the development of MPS-based applications
For the evaluation of our approaches we used them to solve two relevant problems
in the context of software engineering.
• Quality-driven web service composition. In this problem, the Quality of Service (QoS)
provided by each service is used to drive the choice of the specific service provider
to invoke in a composition, trying to maximize the global QoS experienced by
the users. Experiments show that our algorithm, named QoS-Gasp, outperforms
previous metaheuristic approaches proposed in the literature for real-time binding
scenarios. Specifically, QoS-Gasp provided bindings that improve the QoS of
previous proposals by up to 40%.
• Hard Feature Model Generation. In this problem, we searched for feature models
(cf. Section §H.2) that are as difficult to analyse as possible for current tools, in order
to determine their performance in pessimistic scenarios. The proposed algorithm,
named ETHOM, successfully identified models producing much longer execution times (up to 30 minutes) and higher memory consumption than those obtained with random models of identical or even larger size.
4.4
SUMMARY
In this chapter, we have presented the main problems that hinder the use of metaheuristics and that motivated this dissertation. These mainly concern the lack of comparison frameworks for MOFs and the lack of support for the description, execution,
analysis and replicability of MOEs. We have also outlined the current solutions and
summarized our contributions, emphasizing the gap they fill.
5
COMPARATIVE FRAMEWORK FOR
MOFS
If you cannot measure it, you cannot control it.
John Grebe,
1900 – 1984
American chemist
This chapter presents a comparative study of Metaheuristic Optimization Frameworks.
As criteria for comparison, a set of 271 features grouped into 30 characteristics and 6
areas has been selected. These features include the different metaheuristic techniques
supported, mechanisms for solution encoding, constraint handling, neighbourhood specification, hybridization, parallel and distributed computation, software engineering best practices,
documentation, user interface, etc. A metric has been defined for each feature so that the
scores obtained by a framework are averaged within each group of features, leading to a final
average score for each framework. Out of the thirty-four frameworks identified in the literature, ten
have been selected using well-defined filtering criteria, and the results of the comparison are
analyzed with the aim of identifying improvement areas and gaps in specific frameworks and
in the whole set. Section §5.2 describes the methodology used to create our comparative framework, divided into six areas. In further sections, each area is developed in detail (Sections §5.3
to §5.8), defining a set of characteristics, their importance, metrics, and the data sources used for their
evaluation. In each section, charts and interesting results on the current support by the selected
MOFs are provided. In Section §5.9 we discuss the results obtained from a global perspective,
showing significant gaps and general tendencies. Finally, in Section §5.10 we summarize and
present the main conclusions.
5.1
INTRODUCTION
The goal of this chapter is to provide a general comparative framework to
guide the selection of a particular MOF and to evaluate the current MOFs found in
the literature. In doing so, this chapter extends the comparative framework of [107],
including frameworks that incorporate several types of metaheuristic techniques (cf.
Sec. §5.3), and presents a comparative analysis of a large set of features.
Specifically, this chapter advances the state of the art in the following ways:
1. A general comparative framework for MOFs that can be used to classify, evaluate
and compare them.
2. An analysis of the currently relevant MOFs in the literature based on the proposed comparative framework.
3. An evaluation of the current state of the art of MOFs from the research context
that can be used: (i) to guide newcomers in the area and (ii) to identify relevant
gaps for MOF developers.
It is important to highlight that the main value of this study lies neither in comparing
the rankings of two concrete MOFs on a feature or characteristic, nor in stating which
MOF best fulfills the benchmark criteria, but in the establishment of a general comparison framework that clearly defines the set of desirable features of MOFs, depicting
a real “state of the art” of MOFs with improvement directions and gaps in feature
support. This comparison framework has shown its value and generality, allowing the
evaluation, without modifications, of the new versions of assessed MOFs released during the realization of this
study (four MOFs released new versions). Moreover, the possibility of downloading the benchmark as a spreadsheet and tailoring it to user needs by
modifying its weights is also crucial for making it more relevant and applicable.
5.2
REVIEW METHOD
This comparison is based on the software technology evaluation methodology proposed by Brown and Wallnau in [39], which seeks to identify the value added
by a technology through the establishment of a descriptive model in terms of its features
of interest and their relationship and importance to its usage contexts. In our case,
94
5.2. REVIEW METHOD
only the first phase is performed, providing a descriptive model which enables the
evaluation of technologies and the description of the features of interest. The second
phase which involves conducting experiments with each of the MOFs associated with
specific use scenarios, and is beyond the scope of this dissertation.
In order to establish our descriptive model of the characteristics to be supported by MOFs, and to select the set of MOFs to assess, we followed a systematic and structured method inspired by the guidelines of Kitchenham [163]: First, we stated a set of research questions (see next sub-section). Secondly, we established the information sources used for the search of candidate MOFs. Then, we applied filtering criteria to obtain the final set of MOFs to be analyzed. Finally, we composed and grouped the full set of comparison criteria, and used them to assess the MOFs.
5.2.1 Research Questions
In this section, we further refine the research questions presented in Chapter §4
regarding MOFs:
• RQ1: What metaheuristics are currently supported by MOFs? This question motivates the following sub-questions:
– Is there a MOF that supports the whole set of techniques?
– What is the most popular technique? That is, which technique is implemented by the most MOFs?
– Is there a “core set of techniques” supported by more than 50% of the assessed MOFs?
• RQ2: What tailoring mechanisms do current MOFs support, and to what extent are those mechanisms supported? This question motivates the following sub-questions:
– Is there a “core set of adaptation mechanisms” (such as solution encoding mechanisms, operators, etc.) supported by more than 50% of the assessed MOFs?
– Which MOF is best suited for adaptation to specific problems?
• RQ3: What combinations of techniques (hybrid approaches) are supported when using a MOF? This question motivates the following sub-questions:
– Is hybridization a widely supported feature (supported by more than half of the assessed frameworks)?
– What is the most common hybridization mechanism supported by MOFs?
• RQ4: Can current MOFs help to find the best tuning for their supported metaheuristics (for instance, by performing hyper-heuristic search)?
• RQ5: To what extent do current MOFs take advantage of parallelization capabilities of
metaheuristics and distributed computing?
• RQ6: What additional tools are provided by current MOFs in order to support the MPS
life-cycle?
• RQ7: What cost and licensing models do current MOFs follow?
• RQ8: What platforms (operating system, programming languages, etc.) are supported
by current MOFs?
• RQ9: Are current MOFs using software engineering best practices in order to improve
code quality, maintainability, stability and performance?
After reviewing all this information we also want to answer some more general
questions:
• RQ10: What degree of maturity and popularity do current MOFs have? This question
motivates the following sub-questions:
– What problems have been solved with each MOF?
– What documentation and help on its use does each MOF provide?
– Are current MOFs supported by scientific publications?
– What is the user community of each current MOF?
– Which is currently the most popular MOF?
• RQ11: What are the challenges to be faced in the evolution and development of MOFs?
5.2.2 Source material
The information sources used for the search of MOFs have primarily been electronic
databases through their online search engines. Specifically, we have searched on: IEEE
Xplore, ACM Digital Library, SpringerLink and Scopus. The following search strings
have been used: “Metaheuristic Optimization Framework”, “Heuristic Optimization
Framework”, “Metaheuristic Software library”, “Metaheuristic Optimization Library”
and “Metaheuristic Optimization Tool”.
Based on the results obtained, a list of candidate MOFs was generated, which was later enlarged using direct web searches (using Google and the search strings described above) and references present in papers and on the frameworks’ websites. Key references obtained during this phase were [281] by Voß and Woodruff, and the work by Gagné and Parizeau. However, the frameworks’ websites were a key data source, given that their links, articles and related-work sections allowed us to establish the full reference set to study. After a detailed analysis of these references, an initial set of main supported features and MOFs was established, and basic information gathering on those tools was performed.
The list of candidate optimization tools contains 34 entries: Comet, EvA2, evolvica, Evolutionary::Algorithm, GAPlayground, jaga, JCLEC, JGAP, jMetal, n-genes, Open Beagle, Opt4j, ParadisEO/EO, Pisa, Watchmaker, FOM, Hypercube, HotFrame, Templar, EasyLocal, iOpt, OptQuest, JDEAL, Optimization Algorithm Toolkit, HeuristicLab, MAFRA, Localizer++, GALIB, DREAM, Discropt, MALLBA, MAGMA and UOF.
5.2.3 Inclusion and Exclusion criteria
Some MOFs were discarded in order to keep the size and complexity of the review at a manageable level, applying the following filtering criteria:
• The development of a MOF must be alive, with error fixing supported by its developers. A MOF where users must debug all errors they find by themselves, and that will not provide future improvements or features, is not a valid option. Consequently, this is our first filtering criterion. We consider as abandoned those frameworks without new versions (even minor bug fixes) or papers published in the last five years. This criterion eliminated 8 MOFs: jaga, HotFrame, Templar, MAFRA, DREAM, Discropt and UOF.
• MOFs to be evaluated must be frameworks implemented in general-purpose object-oriented languages (such as Java or C++). They must provide a general design where user-defined classes are integrated in order to produce an optimization application for solving the problem at hand. There are useful optimization tools that do not meet those requirements and are consequently out of the scope of this chapter, but they might be studied in a similar comparative research work. This criterion eliminated 3 MOFs: Evolutionary::Algorithm, PISA, Comet and OptQuest.
• MOFs must support at least two different optimization techniques, considering multi-objective variants of techniques as different techniques. Otherwise, they are considered specific applications, even if they can adapt to various problems. This criterion eliminated 9 MOFs: evolvica, n-genes, GALib, GAPlayground, Hypercube, JGAP, Open Beagle, jMetal and Watchmaker.
• Those frameworks for which an executable version or source code with its documentation could not be obtained were also eliminated (after contacting the authors and requesting a valid version from them). This criterion eliminated 4 MOFs: iOpt, JDEAL, OptQuest and MAGMA.
Table §5.1 shows the final set of frameworks compared, along with their specific versions and websites.

Name                                  | Ver. | Web (http address)
--------------------------------------|------|--------------------------------------------------------------
EasyLocal ([71])                      | 2.0  | http://satt.diegm.uniud.it/EasyLocal++/
ECJ ([179])                           | 20   | http://cs.gmu.edu/~eclab/projects/ecj/
EO/ParadisEO/MOEO/PEO ([41])          | 1.2  | http://paradiseo.gforge.inria.fr, http://eodev.sourceforge.net/
EvA2 ([170])                          | 2    | http://www.ra.cs.uni-tuebingen.de/software/EvA2/
FOM ([209])                           | 0.8  | http://www.isa.us.es/fom
HeuristicLab ([283])                  | 3.3  | http://dev.heuristiclab.com
JCLEC (and KEEL) ([277])              | 4.0  | http://JCLEC.sourceforge.net, http://sci2s.ugr.es/keel/
MALLBA ([8])                          | 2.0  | http://neo.lcc.uma.es/mallba/easy-mallba/index.html
Optimization Algorithm Toolkit ([40]) | 1.4  | http://optalgtoolkit.sourceforge.net
Opt4j ([178])                         | 2.1  | http://opt4j.sourceforge.net

Table 5.1: Selected MOFs
In spite of the considerable effort invested during the development of this work, and although the MOFs have been chosen based on well-defined and consistent filtering criteria, some metaheuristic optimization libraries of great practical interest have not been included in this study (e.g., JGAP, Hypercube, Watchmaker or Comet).
5.2.4 Comparison Criteria
Evaluating a software tool usually implies understanding and balancing competing concerns regarding the new technology. In this sense, the proposed comparison criteria cover 6 areas of interest that are subdivided into 30 specific characteristics, which in turn are subdivided into 271 features. Table §5.2 shows the areas and the corresponding set of characteristics in this study, along with the associated research question that we intend to answer through the evaluation of each characteristic.
Table §5.2 covers a wide range of concerns, from MOF-specific characteristics such as supported metaheuristic techniques or solution encoding (covered in areas C1, C2 and C3), to general concerns such as usability, documentation and licensing model (covered in areas C4, C5 and C6).
Specifically, these areas are directly related to our research questions:
• Area C1 establishes a set of metaheuristic techniques and variants to be supported by MOFs. The assessment of this area for each framework allows us to answer RQ1 and its sub-questions.
• Area C2 describes the possible ways of tailoring the metaheuristics to the problem. Thus, its assessment provides a basic way of answering RQ2, showing the support provided by each framework.
• Area C3 comprises a set of advanced capabilities, such as distributed and parallel processing, or hybridization.
• Area C4 defines different kinds of additional features that are (or could be) supported by MOFs.
• Area C5 shows the platforms and programming languages supported by each framework, along with the use of software engineering best practices.
• Area C6 defines characteristics that assess the issues concerning the sub-questions of RQ10.
Due to the different kinds of characteristics present in the comparison framework, a proper quantification of the facilities provided by each MOF is a complex issue. Sometimes it is meaningless to use quantitative values for assessing certain characteristics (e.g., it makes no sense to associate a quantitative value with the language in which the MOF is implemented). Therefore, for some characteristics we avoid defining metrics, using them as simple attributes of MOFs which might be relevant to users. Other characteristics, such as MOF size, have been left out of the comparative analysis because they do not affect the research questions. However, the information harvested can be useful for further analysis.

C1 Metaheuristic Techniques (RQ1):
    C1.1 Steepest Descent / Hill Climbing
    C1.2 Simulated Annealing
    C1.3 Tabu Search
    C1.4 GRASP
    C1.5 Variable Neighborhood Search (VNS)
    C1.6 Evolutionary Algorithms
    C1.7 Particle Swarm Optimization
    C1.8 Artificial Immune Systems
    C1.9 ACO
    C1.10 Scatter Search
    C1.11 Multi-objective Metaheuristics
C2 Adaption to the Problem and Its Structure (RQ2):
    C2.1 Solution Encoding
    C2.2 Neighborhood Structure definition
    C2.3 Auxiliary Mechanisms supporting population based heuristics (Genetic Operators)
    C2.4 Solution Selection Mechanisms
    C2.5 Fitness Function Specification
    C2.6 Constraint Handling
C3 Advanced Characteristics:
    C3.1 Hybridization (RQ3)
    C3.2 Hyper-heuristics (RQ4)
    C3.3 Parallel & Distributed Computing (RQ5)
C4 General Optimization Process Support (RQ6):
    C4.1 Termination Conditions
    C4.2 Batch execution
    C4.3 Experiments Design
    C4.4 Statistical Analysis
    C4.5 User Interface & Graphical Reports
    C4.6 Interoperability
C5 Design, Implementation & Licensing:
    C5.1 Implementation Language (RQ8)
    C5.2 Licensing model (RQ7)
    C5.3 Platforms availability (RQ8)
    C5.4 Usage of Soft. Eng. Best Practices (Test, Design Patterns, UML) (RQ9)
    C5.5 Size (classes and packages/modules)
    C5.6 Numerical Handling
C6 Documentation & support (RQ10):
    C6.1 Sample problems types
    C6.2 Articles & papers
    C6.3 Documentation
    C6.4 Users & Popularity

Table 5.2: Areas of interest and comparison characteristics (the related research question is shown in parentheses)
In our comparative approach, we have attempted to obtain a knowledge base about the real capabilities provided by MOFs that is as objective as possible. In doing so, each characteristic has been defined, and a set of features has been identified to evaluate its support (with minor exceptions). Features are defined taking into account the maximum possible support that an ideal MOF could provide, not the current state-of-the-art MOFs, in order to identify gaps and answer RQ11. Consequently, there are characteristics that are not fully supported by any MOF, and even some for which current support is nearly non-existent. Where a subjective criterion is needed, we have adopted the perspective of the research use context (cf. Section §2.6.1) and the research questions stated. We are working on three levels: areas, characteristics and features, where characteristics are aggregated into areas and various features are used to evaluate individual characteristics. For each feature and MOF, a value is measured with one of two methods. First, features corresponding to characteristics of areas C1 to C4 are evaluated using a binary true/false value, avoiding subjectivity in the value assignment. This information is defined as feature coverage, and is the basis of a more general evaluation that provides a global quantitative value for each characteristic and area. Second, areas C5 and C6 represent non-functional characteristics corresponding to transversal aspects that cannot be measured in an objective way; as a consequence, each feature is given a score informed by the research use context.
A specific value is computed for each characteristic based on these features, using a weighting that defines the contribution of each feature to the overall support of the characteristic. In the same way, each area is measured as a weighted sum of the evaluations of its corresponding characteristics. The proposed weights range from 0.0 to 1.0, meaning no contribution and full contribution to characteristic support, respectively.
Three different types of metrics have been devised:
• Uniform: the weight is distributed evenly among the features of the characteristic. This metric type is usually associated with variants or features with no clear predominance in terms of popularity or performance.
• Proportional: a basic feature is given a significant weight (usually 0.5) and the remaining weight is evenly distributed among the other features of the characteristic. This metric type is associated with characteristics that have one predominant feature plus some rare variants or additional features.
• Ad Hoc: weights are assigned to features based on specific author criteria.
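To make the aggregation concrete, the following sketch computes a characteristic score as the weighted sum of binary feature coverage; the feature names and weights below are purely illustrative, not taken from the actual benchmark:

```python
def characteristic_score(coverage, weights):
    """Weighted sum of binary feature coverage; weights sum to 1.0."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[f] for f, covered in coverage.items() if covered)

# Hypothetical "proportional" metric: the basic feature weighs 0.5 and the
# remaining 0.5 is split evenly among four variants (illustrative values only).
weights = {"basic": 0.5, "v1": 0.125, "v2": 0.125, "v3": 0.125, "v4": 0.125}
coverage = {"basic": True, "v1": True, "v2": False, "v3": False, "v4": False}
score = characteristic_score(coverage, weights)
print(score)  # 0.625
```

An area score would then be a second weighted sum over its characteristic scores, following the same scheme.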
It is important to note that we have set the weights from a research use context on optimization problem solving. However, in other specific scenarios, such as teaching or industrial problem solving, the weights could vary in order to reflect the exact importance of features, characteristics and areas in those contexts. This mechanism allows customized versions of the comparative study and tailored conclusions. This information is published as a public spreadsheet¹, so the data can be verified and reused, and the weights can be redefined. Moreover, for areas C1, C2, C3 and C4, tables showing feature coverage per framework, along with the weights, are provided in this chapter (Tables §A.1, §A.2, §A.3 and §A.4, respectively). In the following sections we describe each area, its characteristics, corresponding features and weights, and the global scores obtained by each MOF. Tables 9, 10, and 11 in the appendix show these scores in detail.
5.3 METAHEURISTIC TECHNIQUES (C1)
The main feature of any MOF is its set of supported metaheuristics. A characteristic is defined for each metaheuristic, indicating the support the MOFs provide for it. Most of these metaheuristics have been described in detail in Section §2.2; thus, for the sake of brevity, only the specific features taken into account and their weights are provided in this section.
5.3.1 Characteristics Description
A set of 11 characteristics has been defined, with 52 features, comprising the major metaheuristics proposed in the literature, whether based on intelligent search (characteristics C1.1, C1.2, C1.3 and C1.5), on solution building (C1.4, C1.9 and C1.10) or on populations (C1.6, C1.7, C1.8, C1.9 and C1.10). Furthermore, we have evaluated the incorporation of techniques for multi-objective problem solving (C1.11). The metaheuristics and variants taken into account have been chosen following [120] and some technique-specific references such as [3], [17] and [54]. Next, we describe each of these characteristics in detail. The coverage of features by frameworks and their weights are shown in Table §A.1.
1 http://www.isa.us.es/MOFComparison. This document contains comments about feature coverage and why some features are assessed as partially supported by some MOFs.
C1.1 Steepest Descent / Hill Climbing: This technique searches successively for the best neighboring solution until a local optimum is reached. It is commonly used for hybridization (cf. characteristic C3.1). Metric: We have defined two different features: (i) a basic implementation that runs until a local optimum is found, and (ii) a multi-start implementation that restarts from a random initial solution when a local optimum is found. A uniform metric is used (with each feature weighing 0.5).
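The two features can be sketched as follows (a minimal illustration on a toy integer problem; the function names, the objective and all parameters are our own, not taken from any assessed MOF):

```python
import random

def hill_climb(x, neighbors, f):
    """Feature (i): move to the best neighbor until none improves."""
    while True:
        best = min(neighbors(x), key=f, default=x)
        if f(best) >= f(x):
            return x  # local optimum reached
        x = best

def multi_start_hill_climb(random_solution, neighbors, f, starts=10):
    """Feature (ii): restart from random solutions, keep the best optimum."""
    return min((hill_climb(random_solution(), neighbors, f)
                for _ in range(starts)), key=f)

# Toy usage: minimize f(x) = (x - 3)^2 over integers in [0, 10].
f = lambda x: (x - 3) ** 2
neighbors = lambda x: [y for y in (x - 1, x + 1) if 0 <= y <= 10]
rand = lambda: random.randint(0, 10)
result = multi_start_hill_climb(rand, neighbors, f)
print(result)  # 3
```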
C1.2 Simulated Annealing: We have defined a feature associated with the basic implementation of this technique, and features for some of its variants. Variants on the cooling scheme: the linear and exponential schemes as proposed by Kirkpatrick et al. [161], the logarithmic scheme as defined by Geman and Geman [115], and schemes based on thermodynamics (defined by Nulton and Salamon, and Andresen and Gordon). Additionally, we have evaluated the variants on the acceptance criterion for worsening solutions: Metropolis acceptance as proposed by [161] and logistic acceptance [123]. Metric: A proportional metric is used, where the basic implementation has a weight of 0.5, each cooling scheme variant weighs 0.1, and each acceptance criterion variant weighs 0.1.
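A minimal sketch of the basic implementation with exponential cooling and Metropolis acceptance, on a toy integer problem (illustrative only; the initial temperature, cooling rate and step count are arbitrary choices):

```python
import math
import random

def simulated_annealing(x, neighbor, f, t0=10.0, alpha=0.95, steps=2000):
    """Basic SA: exponential cooling (t <- alpha * t) and Metropolis
    acceptance exp(-delta / t) for worsening moves."""
    t, best = t0, x
    for _ in range(steps):
        y = neighbor(x)
        delta = f(y) - f(x)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            x = y
            if f(x) < f(best):
                best = x
        t *= alpha  # exponential cooling scheme
    return best

# Toy usage: minimize f(x) = (x - 3)^2 over integers in [0, 10].
random.seed(0)
f = lambda x: (x - 3) ** 2
neighbor = lambda x: min(10, max(0, x + random.choice((-1, 1))))
result = simulated_annealing(5, neighbor, f)
print(result)
```

Swapping the cooling line for a linear or logarithmic schedule yields the corresponding variants discussed above.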
C1.3 Tabu Search: This technique uses procedures designed to cross the boundaries of local optima by establishing an adaptive memory to guide the search process. Metric: An ad-hoc metric is used to assess this characteristic. A feature representing the basic implementation of this technique using a tabu list as memory weighs 0.3, recency-based memory for solution components weighs 0.2, frequency-based memory for solution components weighs 0.3, and the inclusion of aspiration criteria weighs 0.2.
C1.4 GRASP: A single feature indicating support for this technique is used, evaluated as a binary value indicating whether the framework provides some kind of support.
C1.5 Variable Neighbourhood Search (VNS): Several variants of this technique have been proposed in the literature; based on them, we propose the following features: (i) implementation of the original proposal (VNS); (ii) Variable Neighbourhood Descent (VND); (iii) Reduced VNS (RVNS); (iv) Variable Neighbourhood Decomposition Search (VNDS) by [128]; and (v) Skewed VNS by [66]. Metric: A uniform metric is used, with a weight of 0.2 for each feature.
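The original VNS scheme (feature (i)) can be sketched as follows, with hypothetical shake and local search procedures on a toy problem (all names and parameter values are illustrative, not taken from any assessed MOF):

```python
import random

def vns(x, f, shake, local_search, k_max=3, iters=50):
    """Basic VNS: shake in neighborhood k, apply local search, then move and
    reset k on improvement, or switch to a larger neighborhood otherwise."""
    k = 1
    for _ in range(iters):
        y = local_search(shake(x, k))      # perturb, then improve
        if f(y) < f(x):
            x, k = y, 1                    # improvement: restart at N_1
        else:
            k = k + 1 if k < k_max else 1  # try the next neighborhood
    return x

# Toy usage: minimize f(x) = (x - 3)^2 over integers in [0, 20].
f = lambda x: (x - 3) ** 2

def shake(x, k):
    return min(20, max(0, x + random.randint(-k, k)))

def local_search(x):
    while True:
        best = min((y for y in (x - 1, x + 1) if 0 <= y <= 20), key=f)
        if f(best) >= f(x):
            return x
        x = best

random.seed(1)
result = vns(10, f, shake, local_search)
print(result)  # 3
```

Dropping the local search step yields RVNS, and replacing the random shake with a systematic descent over the neighborhoods yields VND.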
C1.6 Evolutionary Algorithms (EA): There are many techniques based on the principles of biological evolution that are denoted as Evolutionary Algorithms. Specifically, EAs comprise three independently developed approaches: evolutionary strategies (ES) proposed by [233], evolutionary programming according to [99], and genetic algorithms as developed by [140]. These techniques present different variants based on the elements used for adapting to the problem (some of them present in other techniques) and some additional variation points. In order to create global and coherent comparison criteria, we have identified various characteristics for those variations. Remarkably, the selection of individuals for crossover and survival is independent of the solution encoding; thus frameworks can provide implementations using different selection criteria and can reuse them, since mechanisms for selecting solutions are used in various metaheuristics. We have created a characteristic for evaluating the support for solution selection (C2.4). Crossover and mutation mechanisms are dependent on the representation scheme used, and the efficiency of a specific mechanism will strongly depend on the problem to be solved. Consequently, we have created an associated characteristic in the area of adaptation to the problem (C2.3).
Thus, this characteristic (C1.6) only measures the support provided by frameworks for general evolutionary algorithms, without taking into account solution encoding capabilities, the genetic operators, or the selection mechanisms available. Of the many variants that have been proposed in the literature for the basic evolutionary algorithm, we take into account: (i) the use of variable population sizes (e.g., GAVaPS [13]); (ii) niching methods (commonly used to solve multi-modal optimization problems); (iii) individuals that encode more than one solution to the problem (usually diploid) [125]; (iv) co-evolution of multiple populations in competitive and cooperative environments, as described in [17, chapter on co-evolutionary algorithms]; and (v) differential evolution as developed by [225]. Variants (i), (iii) and (iv), as well as some versions of (ii), can be implemented regardless of the problem, the solution encoding or the operators used. Metric: An ad hoc metric is defined to assess this characteristic. Three features have been identified to evaluate the support of the different evolutionary approaches, with each feature weighing 0.2. With regard to the variants, (i) weighs 0.05, (ii) weighs 0.1, (iii) weighs 0.05, (iv) weighs 0.1 and (v) weighs 0.1. We evaluate variants as binary variables, in terms of the support afforded by frameworks.
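The generic evolutionary loop that such frameworks abstract can be sketched as follows (a minimal generational GA on bit strings with illustrative parameter choices; it shows the general approach, not any particular framework's API):

```python
import random

def evolutionary_algorithm(fitness, length=20, pop_size=30, gens=100, p_mut=0.05):
    """Minimal generational GA on bit strings: binary tournament selection,
    one-point crossover and bit-flip mutation (maximization)."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(gens):
        def tournament():
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = random.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if random.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# OneMax toy problem: maximize the number of ones in the bit string.
random.seed(0)
best = evolutionary_algorithm(sum)
print(sum(best))
```

In a MOF, the selection, crossover and mutation steps would be pluggable components, which is precisely what characteristics C2.3 and C2.4 evaluate.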
C1.7 Particle Swarm Optimization (PSO): In this technique, the topology of the neighbourhood of particles, i.e., the set of particles that influence the position of a given particle according to the update equations, generates a full set of possible variants. In the original PSO, two different kinds of topologies were defined: (i) global, specifying that all particles are neighbours of each other; and (ii) local, specifying that only a specific number of particles can affect a given particle. In [160] a systematic review of neighbourhood topologies is presented, and in [264] the concept of a “dynamic” neighbourhood topology is proposed. Another interesting variant is the use of a “lifetime” for solutions in the swarm, after which solutions are randomized. Metric: We have created a feature to represent the original proposal for real variables and the classic equations; it weighs 0.3. Discrete variable support weighs 0.2. Equation customization weighs 0.2. The explicit modelling and support of different neighbourhood topologies weighs 0.2. Finally, lifetime support weighs 0.1.
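The original global-topology proposal with the classic velocity and position updates can be sketched as follows (the inertia and acceleration coefficients are illustrative values, not mandated by the original proposal):

```python
import random

def pso(f, dim=2, n=20, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Classic global-topology PSO for real vectors:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v."""
    xs = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vs = [[0.0] * dim for _ in range(n)]
    pbest = [x[:] for x in xs]
    gbest = min(pbest, key=f)
    for _ in range(iters):
        for i, x in enumerate(xs):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - x[d])
                            + c2 * r2 * (gbest[d] - x[d]))
                x[d] += vs[i][d]
            if f(x) < f(pbest[i]):
                pbest[i] = x[:]          # update personal best
                if f(x) < f(gbest):
                    gbest = x[:]         # update global best (global topology)
    return gbest

# Toy usage: minimize the 2-D sphere function.
random.seed(0)
sphere = lambda x: sum(v * v for v in x)
best = pso(sphere)
print(best)
```

A local or dynamic topology variant would replace the single `gbest` with a per-particle neighborhood best.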
C1.8 Artificial Immune Systems (AIS): This technique mimics the structure and operation of the biological immune systems of mammals, applying them to solving optimization problems. It comprises various proposals: Clonal Selection algorithms, originally proposed by [199], and variants such as CLONALG, developed by [65], and optIA; Immune Network algorithms; and Dendritic Cell algorithms. Metric: A uniform metric is used to assess this characteristic (with each feature weighing 0.25).
C1.9 Ant Colony System (ACS): In this chapter, the following variants are taken into account: the original proposals of Ant System (AS) and Ant Colony System (ACS) by [73], Ant System using Rankings (ASrank), Min-Max Ant System (MMAS) according to [262], and API as developed by [192]. Metric: An ad hoc metric is defined for this characteristic; the corresponding weights are shown in Table §A.1.
C1.10 Scatter Search (SS): This technique has a single feature, evaluated as a binary value, which indicates whether the framework provides some kind of support for it.
C1.11 Multi-objective Metaheuristics: The technique most commonly used to solve multi-objective optimization problems is EA ([77]). However, some variants of other techniques have also been taken into account: SA (MOSA as proposed by [273] and PASA as developed by [265]), PSO ([217]) and ACO ([148]). Those variants have been adapted to solve multi-objective optimization problems. Regarding the EA variants to evaluate, we have taken into account: the original proposal by [122] (PGA), MOGA as proposed by [100], the Non-Dominated Sorting Genetic Algorithm (NSGA and NSGA-II) as developed by [67], the Niched Pareto Genetic Algorithm (NPGA) according to [143], the Strength Pareto Evolutionary Algorithm (SPEA and SPEA-II) ([307, 308]), the Pareto Envelope-based Selection Algorithms (PESA and PESA-II) ([56]), the Pareto-Archived ES (PAES) ([165]), the multi-objective messy GA (MOMGA) ([275]) and ARMOGA ([242]). Metric: A uniform metric is used to assess this characteristic.
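All of these algorithms share the notion of Pareto dominance for ranking solutions, which can be sketched as follows (toy objective vectors, minimization; the example data are our own):

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): no worse in all objectives,
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Extract the Pareto front: the core operation shared by NSGA-II,
    SPEA-II and the other multi-objective algorithms listed above."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
print(non_dominated(pts))  # [(1, 5), (2, 2), (4, 1)]
```

The algorithms differ mainly in how they preserve diversity along this front (niching, crowding distance, external archives, etc.).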
5.3.2 Assessment and Feature Coverage Analysis
It is remarkable that only four features of this area are supported by at least six of the ten MOFs under study. These features correspond to the metaheuristics SD/HC, SA and EA. This fact shows a dispersion in the techniques supported by MOFs, and consequently implies that users have little choice if they want to use techniques outside this set: the choice of MOF is then determined by the technique the user wants to apply.
An interesting fact shown in Table §A.1 is that 39% of the features in this area are not supported by any MOF. Consequently, current MOFs have room for improvement in this area. Moreover, the distribution of those unsupported features implies that the support of MOFs for techniques is aimed at the basic variants. This does not apply to the techniques in the core set, TS and some multi-objective variants, since those techniques only have features representing variants that are supported by more than 30% of the MOFs. ParadisEO, EvA2 and FOM have the highest number of features supported in this area, followed by HeuristicLab and OAT.
5.3.3 Comparative analysis
FOM is the framework that provides the broadest support of optimization techniques, closely followed by ParadisEO, EvA2 and HeuristicLab. It is important to note that more supported features does not imply more supported techniques, since some techniques have a number of variants and specific heuristic implementations modeled as features. The weights help to express this fact by making each technique sum to a total score of 1 once its features are weighted. Figure §5.1 shows a stacked-columns diagram for the C1 area characteristics. Each color or texture represents a metaheuristic, and each column the support provided by a MOF. The number of techniques supported by each MOF can be easily identified by the number of different colors/textures in its column. The degree of support for each technique is expressed through each pattern’s height (computed from the weights associated with its features and the feature support information shown). The total height of each column provides a measure of the global support of metaheuristics by its corresponding MOF.
The almost universal support for EA and the lack of support for AIS are remarkable. SS is only supported by EvA2, and GRASP is only supported by FOM. Other metaheuristics with very little support are ACO, TS and VNS. This could be due to the complexity of modeling, in the abstract, the elements involved in their operation and of reusing or customizing them (ACO and TS are based on features of solutions, and VNS needs to apply different neighborhood structures). When applying EAs using Java, ECJ, JCLEC and EvA2 appear as highly competitive options, whilst ParadisEO and MALLBA are the MOFs available if the user plans to use C++. In .NET environments, the only option available for applying EAs is HeuristicLab.
Figure 5.1: Stacked Bar Chart showing MOFs techniques support
We can provide an answer to RQ1 and its sub-questions based on the information shown in Table §A.1 and Figure §5.1: the characteristics of area C1 summarize the whole set of metaheuristics currently supported by the assessed frameworks. Most variants of those techniques are unsupported. The most widely supported techniques are EA, SD/HC and SA, which are supported by more than 60% of the assessed frameworks. Finally, there is no universal MOF providing support for all the techniques.
5.4 ADAPTING TO A PROBLEM AND ITS STRUCTURE (C2)
As stated in the previous section, MOFs provide implementations of metaheuristic techniques for problem solving. They also provide mechanisms to express problems properly in order to apply these techniques, and they allow the adaptation of their supported metaheuristics for better problem solving. For instance, frameworks can provide appropriate data structures that the techniques can handle. This two-way adaptation (techniques to problem, for efficient problem solving; and problem to techniques, for proper solution handling and underlying heuristics implementation) is basically done in three ways: selecting an appropriate solution representation/encoding, specifying the objective function to optimize, and implementing the set of underlying heuristics required by the metaheuristic used to solve the problem.
5.4.1 Characteristics Description
This area evaluates the capabilities provided by MOFs to support this adaptation. Characteristic C2.1 assesses the capabilities to represent solutions to optimization problems based on the set of data structures provided by frameworks. Characteristics C2.2, C2.3 and C2.4 assess the supported set of underlying heuristics. Characteristic C2.5 assesses the capabilities for declarative objective function specification based on the representations assessed in C2.1. Finally, C2.6 assesses the capabilities for constraint handling. The features and characteristics described in this section have been structured following [17] and [240] for solution encodings (C2.1), [17] for selection and genetic operators (C2.3 and C2.4), [3] for neighborhood definition capabilities (C2.2), and [190] for constraint handling techniques (C2.6). Next, we describe each of these characteristics in detail:
• C2.1 Solution Encoding: Solution encodings are data structures that allow the modeling of solutions for metaheuristic techniques to handle. In this sense, the greater the flexibility and the more data structures provided, the lower the effort invested by users to address their problems. Metric: In order to evaluate this characteristic, we have taken into account 3 criteria: the provided data structures (vectors, matrices, trees, graphs and maps), the data types and information encoding, and the ability to use combined representations as described by [240]. A proportional metric is used, where this last feature weighs 0.4. The data types taken into account are bits (with usual or Gray encoding), integers, floating point numbers, and strings. The remaining weight is evenly divided among the combinations of data type and data structure.
• C2.2 Neighborhood Structure Definition: A proper neighborhood structure definition is a key factor for the success of intelligent search-based heuristics. The neighborhood structure strongly depends on the solution representation, and its suitability depends on the problem to be solved and on the technique used to solve it (as stated by [3]). Metric: The assessment is divided into three features: pre-defined neighborhood structures provided by MOFs weigh 0.6; neighborhood structures for composite representations weigh 0.3; and a weight of 0.1 is given to complex neighborhood structures that apply different neighborhood structures randomly or based on some rule.
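As an illustration, the classic 2-opt neighborhood for permutation encodings, one of the pre-defined structures a MOF may provide, can be sketched as follows (the Python code and function name are illustrative only, not taken from any of the assessed MOFs):

```python
def two_opt_neighborhood(perm):
    """Yield all 2-opt neighbors of a permutation: each neighbor is
    obtained by reversing the segment between two cut positions."""
    n = len(perm)
    for i in range(n - 1):
        for j in range(i + 1, n):
            yield perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
```

A local search technique would scan (or sample) this neighborhood at each iteration, which is why its definition is so tightly coupled to both the representation and the technique.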
• C2.3 Auxiliary Mechanisms supporting population-based heuristics (Genetic Operators): Genetic operators are the main underlying heuristics in EAs. Their implementation (except for selection operators, evaluated in C2.4) is usually dependent on the solution representation; therefore, MOFs must provide the corresponding implementations for their supported representations. Various alternatives for implementing each genetic operator have been proposed in the literature, as described below. We have relied primarily on [17, chapter C3.3] to develop the definition and features of this characteristic.
The most common genetic operators are crossover and mutation. Weights have been evenly distributed among all the variants provided for each operator. Next we enumerate the crossover operators proposed in the literature for the solution encodings of Table 3.
– Binary and integer vectors: The original crossover operator, named “one point crossover” (1PX), was proposed by [140]; its generalization to n crossover points (NPX) was proposed by [152]. Other variants are uniform crossover (UX) [5], punctuated crossover (PNCTX) [243], shuffled crossover (SX) [86], half uniform crossover (HCX) [84] and random respectful crossover (RRX) as proposed by [229].
– Floating Point vectors: Operators 1PX, NPX and UX are in principle applicable to floating point vectors, but this encoding also supports a set of specific crossover operators that MOFs can implement: arithmetic crossover (AX/BLX) [189, p 112], heuristic crossover (HX) [301], simplex crossover (SPLX) [234], geometric crossover (GEOMX) [189], blend crossover (BLX-alpha) [85], crossover operators based on objective function scanning (F-BSX) and diagonal multi-parental crossover (DMPX) as proposed by [81].
– Permutations: Basic crossover operators, such as 1PX, NPX or UX, generate infeasible individuals when used with permutation-based representations; it is therefore necessary to design specific operators for such representations, such as: the order crossover operator (OX) [63], partially mapped crossover (PMX) [121], order-2 and position crossover [266], uniform crossover for permutations (UPX) [63, p 80], maximal preservative crossover (MPX) [197, p 331], cycle crossover (CX) [204] and merge crossover (MX), defined by Blanton and Wainwright [32].
– State Machines: Crossover operators for state machines (SMFx) were initially proposed by [97, 99] (pp. 21-23). In this comparative study, we evaluate those operators, 1PX applied to a vectorial representation of the state machine (SM1PX) as defined by [306], one-to-one state interchange as proposed by [98], uniform crossover for state machines (SMUX), and the merge operator (SMJO) as defined by [31].
– Trees: Defining proper crossover operators for trees, and specifically for trees representing programs, is genuinely difficult, since constraints generally have to be imposed on their structure, semantics and associated data types. The most common crossover operators for trees were proposed by [58]. In this comparison, we also considered those defined by [169], and the adaptations proposed by [193].
– Crossover operators for Composite Representations (CSX): Crossover operators for individuals using composite representations can be created by applying the corresponding operators to each component of the representation.
– Composite crossover operators (CMPX): Composite crossover operators can be built by assigning a probability (or decision rule) to the application of each operator from a set of crossover operators valid for the representation used.
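To make these operator families concrete, a minimal sketch of two of the crossover operators named above for binary and integer vectors, 1PX and UX (illustrative Python, not the implementation of any assessed MOF):

```python
import random

def one_point_crossover(parent1, parent2, rng):
    """1PX: choose one cut point and swap the tails of the parents."""
    point = rng.randrange(1, len(parent1))
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def uniform_crossover(parent1, parent2, rng):
    """UX: each gene is inherited from one parent or the other with
    probability 0.5; the sibling receives the remaining gene."""
    child1, child2 = [], []
    for g1, g2 in zip(parent1, parent2):
        if rng.random() < 0.5:
            child1.append(g1)
            child2.append(g2)
        else:
            child1.append(g2)
            child2.append(g1)
    return child1, child2
```

Both operators produce children whose genes are drawn gene-by-gene from the two parents, which is exactly why they break feasibility for permutation encodings and specific operators such as OX or PMX are needed there.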
Next we enumerate the mutation operators proposed in the literature for the solution encodings of Table 3.
– Binary and integer vectors: We have taken into account the original mutation
operator proposed by [139, pp 109-111].
– Floating Point Vectors: The mutation operators considered for floating point vectors are: the mutation operator based on a uniform distribution U(−b, b) (RUm) proposed by [64], the normal mutation operator (RNm) developed by [245], the mutation operators based on the Cauchy (RCm) and Laplace (RLm) distributions as proposed by [194, 302], and the proposals for adapting the mutation rate according to [245] and [96].
– Permutations: The mutation operators for permutations covered by this comparison are: 2-opt (P2Optm), 3-opt (P3Optm) and k-opt (PKOptm), the simple interchange mutation operator (PSWm), the 2-element insertion operator (PIm) (which deletes the item from its original position), and the “scramble mutation operator” (PSCm) [266].
– State machines: The basic mutation operator for state machines operates on the set of its states and transitions, slightly modifying a state or transition, as proposed by [17, C3.2.4].
– Trees: The mutation operators for trees covered by this comparison are those proposed by [11]: (i) the grow mutation operator (TGm); (ii) the reduction mutation operator (TSHRm); (iii) the swapping mutation operator (TSWm); (iv) the cycle mutation operator (TCm); and (v) the Gaussian mutation operator for numeric nodes (TGNm). The adaptation proposed by [193] is also taken into account.
– Mutation operators for composite representations (CSm): Mutation operators for
individuals using composite representations can be created by applying the
corresponding operators to each component of the representation.
– Composite Mutation operators (CPXm): Composite mutation operators can be built by assigning a probability (or decision rule) to the application of each operator from a set of mutation operators valid for the representation used.
– Mutation operators using dynamic probability (DEm): There is empirical evidence [95] that using a dynamic mutation probability that decreases exponentially along the evolution process improves the performance of EAs. In this comparison, we have taken this feature into account.
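For instance, the normal mutation operator (RNm) combined with a dynamic, exponentially decreasing mutation probability (DEm) could be sketched as follows (illustrative Python; the decay constant is an arbitrary assumption of ours):

```python
import math
import random

def gaussian_mutation(vector, sigma, p_mut, rng):
    """RNm: add N(0, sigma) noise to each gene with probability p_mut."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < p_mut else g
            for g in vector]

def dynamic_mutation_probability(p0, generation, decay=0.05):
    """DEm: mutation probability decaying exponentially over generations."""
    return p0 * math.exp(-decay * generation)
```

At generation g the algorithm would mutate with probability `dynamic_mutation_probability(p0, g)`, so early generations explore aggressively while later ones mostly refine.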
Metric: A uniform metric is defined, where the weight is evenly distributed between mutation (0.5) and crossover (0.5) operators. Within each operator, the weight is uniformly distributed among its variants.
• C2.4 Selection Mechanisms: This characteristic assesses the support for the different criteria for solution selection. The problem of selecting a subset from a larger set of solutions appears as a specific heuristic in a number of metaheuristic techniques (SA, TS, EA, ACO, etc.). By applying OO analysis and design methodologies, and specifically the strategy design pattern 2, the objects encapsulating the solution selection logic are called selectors. The use of different selectors allows for controlling the trade-off between exploration and exploitation of the search space. As a consequence, the performance of metaheuristic techniques in finding good solutions is drastically affected by those selection criteria. Usually, selection criteria are based on the adequacy of solutions, but there is a wide range of possibilities, from random selection to elitism (stochastic and deterministic).
In this comparison the following criteria are taken into account: (i) the elitist selector (Es), which picks the best solutions, and its variants, the expected value selector (EVs) and the elitist expected value selector (EEVs), as proposed by [152]; (ii) the proportional selector (Ps) as proposed by [139], where the probability of selecting s, P(s), is proportional to its fitness, and its variants, the random sampling selector (RSSs) and the stochastic tournament selector (STs) [38], and the stochastic universal sampling selector (SUSs) as proposed by [18]; (iii) ranking based selectors, linear (LRs) and non-linear (NLRs), developed by [294]; (iv) the selection schemas (µ, λ) and (µ + λ); (v) threshold based selectors (Ths); (vi) the Boltzmann selector (Bs); (vii) a fully random selector (RNDs); and (viii) a selector that combines a pair of different selectors (COMBs) by dividing the set of elements to select among its components.

2 The strategy pattern is a software design pattern whereby algorithms can be selected at runtime. This pattern is useful in situations where it is necessary to dynamically swap the algorithms used in an application. The strategy pattern is intended to provide a means to define a family of algorithms, encapsulate each one as an object, and make them interchangeable [109].
Metric: A uniform metric is used to assess this characteristic.
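The selector abstraction can be illustrated with two interchangeable strategies, an elitist selector (Es) and a tournament-style selector, following the strategy pattern described above (illustrative Python; the class names are ours, not taken from any assessed MOF):

```python
import random

class ElitistSelector:
    """Es: deterministically picks the k best solutions."""
    def select(self, population, fitness, k, rng=None):
        return sorted(population, key=fitness, reverse=True)[:k]

class TournamentSelector:
    """Stochastic tournament: each pick is the best of a random sample."""
    def __init__(self, size=2):
        self.size = size

    def select(self, population, fitness, k, rng=random):
        return [max(rng.sample(population, self.size), key=fitness)
                for _ in range(k)]
```

Swapping one selector object for another changes the exploration/exploitation balance without touching the rest of the algorithm, which is precisely the benefit of encapsulating selection as a strategy.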
Figure 5.2: Adaptation to the problem and its structure support
• C2.5 Fitness Function Specification Support: The most problem-dependent element of metaheuristic techniques is the objective function to be optimized. Therefore, even when using MOFs, its evaluation is usually implemented explicitly by users and integrated into the framework through its extension points. However, based on the solution encodings supplied by MOFs, it is possible to provide tools for the declarative specification of objective functions, freeing the user from the low-level task of implementing them.
In this case, a Domain Specific Language (DSL) is a tool of great interest for objective function specification. The advantages of using a DSL, compared to a classical implementation, are that the DSL can be a much simpler language than the implementation language, and that integration of the objective function can be automatic if
the MOF supports it. If the MOF provides suitable DSL tools for the specification
of the objective function (such as syntax highlighting and in-line debugging and
error information), it could lead to a more declarative paradigm for metaheuristic problem solving, improving the usability of metaheuristics and contributing
to a wider application of such techniques. There are also drawbacks when using DSLs for objective function specification, such as the need to learn a new
language, performance loss, and the inability to model some objective functions
using the language constructs.
Finally, there are problem types for which the automation of objective function evaluation is impossible, since it relies on the interaction of a human operator to evaluate solutions. In order to support this kind of problem, MOFs can provide a form in which users directly provide the evaluation of solutions. Moreover, a partial implementation can be provided, where MOF users customize the data entry form and the solution representation (graphical or textual), designing a user friendly interface integrated within the framework. Metric: A uniform metric is defined to assess this characteristic, using the features enumerated above: DSL support, DSL tools, and forms for solution evaluation by human operators.
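A minimal flavor of such declarative specification can be given by compiling a textual arithmetic expression into a fitness function. The sketch below (illustrative Python over a tiny arithmetic subset; not a feature of any assessed MOF) shows how a user could state an objective without coding its evaluation by hand:

```python
import ast
import operator

# Supported operations of this mini-DSL (a deliberate, tiny subset).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def compile_objective(expression):
    """Turn a textual objective such as 'x**2 + 3*y' into a callable
    fitness function over a dict of decision-variable values."""
    tree = ast.parse(expression, mode="eval")

    def evaluate(node, env):
        if isinstance(node, ast.Expression):
            return evaluate(node.body, env)
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](evaluate(node.left, env),
                                       evaluate(node.right, env))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](evaluate(node.operand, env))
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        raise ValueError("construct not supported by this mini-DSL")

    return lambda env: evaluate(tree, env)
```

The last drawback mentioned above is visible even here: any objective outside the supported arithmetic constructs simply cannot be expressed in the DSL.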
• C2.6 Constraint Handling: A feature of great importance for proper problem modeling is constraint definition support. There are usually two different ways to handle constraints when solving optimization problems 3: (i) including constraint satisfaction in the objective function definition as penalties; and (ii) creating repairing mechanisms that are applied to infeasible solutions. There are three implementation alternatives for those mechanisms in MOFs: (a) providing global repairing mechanisms that users can implement for the problem at hand; (b) explicit modeling of each constraint; and (c) specific repairing mechanisms for each constraint. As in characteristic C2.5, (iii) the use of a DSL can make it easier for users to specify constraints, and some mechanisms, such as penalization (cf. (i)), can then be applied without requiring implementation by users. Metric: An ad hoc metric is defined to assess this characteristic, where the weights have been associated to each feature as follows: (i) penalization 0.3, (ii.a) global repairing mechanism 0.2, (ii.b) individual constraint modeling 0.2, (ii.c) individual constraint repairing mechanisms 0.2, and (iii) DSL support 0.1.
3 Various techniques to adapt metaheuristics to constrained problems have been proposed in the literature (cf. [190] for instance). However, most of these approaches require an ad hoc implementation depending on the problem and the type of constraints to handle; consequently, it is difficult to integrate those proposals into a MOF. Those ad hoc techniques have been omitted from our comparison.
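Mechanism (i), penalization, can be sketched generically: the objective to minimize is wrapped so that each violated constraint adds a penalty proportional to its violation (illustrative Python; the penalty weight is an arbitrary assumption):

```python
def penalized_objective(objective, constraints, weight=1000.0):
    """Wrap a minimization objective so that each violated constraint
    adds a penalty proportional to its amount of violation. Each
    constraint is a callable returning its violation (0 when satisfied)."""
    def wrapped(solution):
        violation = sum(c(solution) for c in constraints)
        return objective(solution) + weight * violation
    return wrapped
```

Because the wrapper only needs the constraint callables, a MOF supporting explicit constraint modeling (alternative (b)) can apply this mechanism without any implementation effort from the user.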
5.4.2 Assessment and Feature Coverage Analysis
It is remarkable that only 9.57% of the features of this area are supported by a minimum of six out of the ten MOFs under study. Moreover, those features are associated with only three characteristics (namely C2.1, C2.3 and C2.4), and are mainly related to EAs. An interesting fact shown in Table §A.2 is that more than 25% of the features in this area are not supported by any framework.
5.4.3 Comparative Analysis
Areas C2 and C3 have the smallest average scores in our benchmark, evidencing that framework developers have put more emphasis on coding algorithms for problem solving than on supporting an easy and efficient adaptation of these algorithms to the problem. Remarkably, there is a lack of support for: (i) the definition of neighborhood structures (except in EasyLocal, ParadisEO and HeuristicLab); (ii) the specification of the objective function; and (iii) constraint handling (the exceptions are FOM, Eva2, ParadisEO and HeuristicLab).
Figure §5.2 shows a stacked column diagram for the characteristics of this area. As in Figure §5.1, colors represent the characteristics of this area and columns their support by the assessed MOFs.
Based on the information shown in Table §A.2 and Figure §5.2, we can provide an answer for RQ2: the means of problem adaptation are summarized by the characteristics of area C2; however, current support for these mechanisms is limited and strongly depends on the MOF and on the metaheuristic used for problem solving.
It is important to note that characteristic C2.4 is intimately related to EA support, and consequently those MOFs that do not support this technique are not able to support the features of this characteristic. However, those MOFs, such as EasyLocal, are still able to provide support for the rest of the area, and constitute very useful alternatives when applying other techniques. Thus, users must take this into account when comparing different MOFs.
5.5 ADVANCED CHARACTERISTICS (C3)
In this area we evaluate general and advanced characteristics that are not related to specific metaheuristic techniques. Specifically, the characteristics assessed in this area are: the use of hybrid techniques, the implementation of hyper-heuristics, and distributed and parallel execution. These characteristics are of great interest since they can either drastically improve the results obtained or simplify the application of techniques. They are especially interesting because their implementation involves a high cost and complexity, which prevents their application in many contexts. As MOFs can provide these characteristics pre-implemented, their applicability is significantly broadened.
5.5.1 Characteristics Description
C3.1 Hybridization: There is ample empirical evidence of the success of hybrid techniques for optimization problem solving, as stated by Talbi [267]. Several authors, such as [267] and [238], have described taxonomies of hybrid metaheuristics to discern the ways in which techniques can be combined. In this work we restrict the concept of hybrid metaheuristic to a combination of techniques integrated at a high level (as defined by [231]), where each technique keeps its overall structure except at the point of invocation of the other. Specifically, we have considered four different types of hybridization: (i) batch execution of the same technique (BEMIh), in which the technique is executed several times; (ii) batch execution of different techniques (BEMMh), where various techniques are executed sequentially and the results of one can be used as an initial solution for another; (iii) interleaved execution of a technique as a step in each iteration of another, possibly affecting its internal variables (IMMh); and (iv) combinations of the above (Ch). Metric: An ad hoc metric is defined to assess this characteristic, with the weights of the features being: (i) BEMIh 0.1, (ii) BEMMh 0.2, (iii) IMMh 0.6 and (iv) Ch 0.1.
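Type (ii), BEMMh, essentially amounts to piping solutions through a sequence of techniques; a minimal sketch (illustrative Python, with each technique abstracted as a callable from solution to solution):

```python
def batch_hybrid(techniques, initial_solution):
    """BEMMh sketch: run a sequence of techniques, feeding the result
    of each one as the initial solution of the next."""
    solution = initial_solution
    for technique in techniques:
        solution = technique(solution)
    return solution
```

IMMh, in contrast, requires invoking one technique inside each iteration of another, which explains both its higher weight here and its weaker support in current MOFs.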
C3.2 Hyper-heuristics: A hyper-heuristic can be readily defined as a heuristic that selects heuristics. Hyper-heuristics are intended to provide robust and general techniques of broad applicability without needing extensive knowledge of either the technique or the problem to solve. Hyper-heuristics have received much attention in recent years [47, 57]. Hyper-heuristics search the space of heuristics for the one that best solves a particular problem. The search space for hyper-heuristics can consist of four different subspaces: (i) the space of optimization techniques, with fixed parameters for each technique; (ii) the space of parameter values for a technique; (iii) the space of underlying heuristics for a technique (e.g. searching a space of applicable selection, mutation or crossover operators when using an evolutionary algorithm); and (iv) the space of possible solution encodings. Metric: A uniform metric is defined to assess this characteristic (with each search space weighing 0.25).
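As a toy illustration of searching subspace (iii), a hyper-heuristic may choose among low-level heuristics at each step and keep only improving moves (illustrative Python; a real hyper-heuristic would typically bias the choice by each heuristic's past performance rather than choosing uniformly at random):

```python
import random

def hyperheuristic_search(solution, objective, heuristics, steps, rng):
    """Minimal hyper-heuristic over the underlying-heuristics subspace:
    at each step pick a low-level heuristic uniformly at random, apply
    it, and keep the move only if it improves the objective (minimized)."""
    best, best_f = solution, objective(solution)
    for _ in range(steps):
        heuristic = rng.choice(heuristics)
        candidate = heuristic(best, rng)
        f = objective(candidate)
        if f < best_f:
            best, best_f = candidate, f
    return best, best_f
```

The heuristics themselves stay untouched; only the policy that selects among them is being searched, which is the defining trait of a hyper-heuristic.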
C3.3 Parallel & Distributed Computation: Many adaptations of metaheuristics have been proposed in the literature to exploit the parallel processing capabilities available in current distributed environments. Incorporating these strategies in a MOF significantly improves its applicability and relevance to the resolution of a great number of real problems, given the complexity and cost of their implementation. Parallel and distributed execution of metaheuristic techniques without intercommunication (IPDM) can be implemented independently of the technique to apply. The only requirements are installing the MOF in each of the computers of the distributed environment and enabling a mechanism for communication and control in order to design, plan, launch and control optimization tasks in that distributed environment. A similar variant is one in which techniques can exchange solutions (SSPDM). A parallel island-based EA with migration (as proposed by [295]) would qualify as an SSPDM technique. Finally, techniques that need a change in the implementation of the metaheuristics are sub-classified by [41] into:
Parallel Local Search Metaheuristics: a unique executing instance of the metaheuristic controls the distributed and parallel exploration of its current solution's neighborhood (LSPDNM).

Parallel Population-based Metaheuristics: There are two different approaches to create parallel population-based metaheuristics: (i) parallel and distributed objective function evaluation for the individuals of the population (PDPEDM), where each network node evaluates a different subset of the individuals conforming the current population; the main difference with SSPDM is that a unique instance of the metaheuristic algorithm is executed in the distributed environment; and (ii) parallel evaluation of the objective function itself, where computing the objective function of a single solution implies parallel processing in various nodes (PDESSM). Metric: A uniform metric is defined to assess this characteristic, where the variants taken into account are IPDM, SSPDM, LSPDNM, PDPEDM and PDESSM.
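The PDPEDM variant can be sketched on a single machine by farming out fitness evaluations to a worker pool; in a real MOF the pool would span the nodes of the distributed environment (illustrative Python):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_evaluate(population, objective, workers=4):
    """PDPEDM sketch: a unique algorithm instance farms out the
    evaluation of the individuals to a pool of workers; results are
    returned in the original population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(objective, population))
```

Note that the rest of the algorithm (selection, variation, replacement) remains a single sequential instance, which is what distinguishes PDPEDM from the island-based SSPDM scheme.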
5.5.2 Assessment and Feature Coverage Analysis
It is remarkable that only 6.25% of the features of this area are supported by a minimum of six out of the ten MOFs under study. Furthermore, 40% of MOFs provide nearly nil support (fewer than 10% of features) in this area.
With respect to the features of this area, the highest scores correspond to ParadisEO and FOM. Although both frameworks support the first characteristic, FOM does not support Parallel and Distributed Optimization, whilst ParadisEO does not support Hyper-heuristics. Currently, FOM is the only framework that supports Hyper-heuristics. Figure §5.3 shows a stacked column diagram for the characteristics of this area.
Figure 5.3: Advanced characteristics support
Table §A.1 and Figure §5.3 answer RQ3, RQ4 and RQ5: basic hybridization, such as BEMIh and BEMMh, is currently supported by many MOFs, but more advanced hybridization techniques, such as IMMh and Ch, are not. Parallel and distributed computing is currently supported by ParadisEO, ECJ and MALLBA, and to a limited extent by other mainly EA-oriented frameworks such as JCLEC and EvA2.
5.6 MPS LIFE-CYCLE SUPPORT (C4)
One of the strengths of MOFs is their capacity to support the MPS life-cycle. This support allows users without deep knowledge of the area to apply metaheuristic techniques and obtain useful real-world results. This area evaluates these capacities.
5.6.1 Characteristics Description
Seven characteristics have been established, covering the various stages of execution of the global optimization problem solving process (4.1, 4.2, 4.3, 4.4 and 4.7), and
the ability to interact with the user (4.5) and with other systems (4.6). The following
describes those characteristics:
C4.1 Termination Conditions: Metaheuristics do not provide explicit termination criteria, since, in general, it is not possible to evaluate whether they have reached the globally optimal solution. Thus, users have to set criteria, based on the specific needs and context of the problem, to decide when to stop the execution of the metaheuristic. MOFs can provide implementations of the usual criteria for reuse, among which we find: (i) maximum number of iterations; (ii) maximum execution time; (iii) maximum number of objective function evaluations; (iv) maximum number of iterations or execution time without improvement in the best solution found; (v) reaching a concrete objective function value; and (vi) logical combinations (using AND/OR operators) of the above (e.g. ExecTime ≤ 36000 OR ExecTimeWithoutImprovement ≥ 3600). (vii) Furthermore, termination conditions can be established independently of the problem to solve but dependent on the technique used, such as a termination criterion based on the diversity of the population when using an EA. Finally, (viii) we evaluate the facilities provided to enable the definition of specific criteria by implementing them. In this sense, we have assessed the use of abstract classes or interfaces to evaluate the termination condition and their use in the implementation of the metaheuristic techniques provided. Metric: A proportional metric is defined, where (viii) weighs 0.3, and the remaining weight is evenly distributed among the other criteria.
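The interface-based approach of (viii), together with the logical combinations of (vi), can be sketched as follows (illustrative Python; the class and field names are ours, not taken from any assessed MOF):

```python
import time

class MaxIterations:
    """Criterion (i): stop after a fixed number of iterations."""
    def __init__(self, limit):
        self.limit = limit
    def met(self, state):
        return state["iterations"] >= self.limit

class MaxTimeWithoutImprovement:
    """Criterion (iv): stop after a period with no improvement."""
    def __init__(self, seconds):
        self.seconds = seconds
    def met(self, state):
        return time.time() - state["last_improvement"] >= self.seconds

class Or:
    """Criterion (vi): a logical OR combination of other criteria."""
    def __init__(self, *criteria):
        self.criteria = criteria
    def met(self, state):
        return any(c.met(state) for c in self.criteria)
```

A metaheuristic implemented against the common `met(state)` interface can then accept any user-defined criterion, which is exactly what the abstract-class facilities of (viii) enable.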
C4.2 Batch mode execution: The ability to automatically run a set of optimization tasks, where the user only has to specify the sequence and the number of times to execute each task, is important when performing experiments. The support of this feature promotes cost reduction by automating one of the most tedious tasks of research and of studies with empirical validation. We have defined four features related to this automation: (i) repeated execution of a task (using the same technique, parameter values and instance of the problem); (ii) repeated execution of a task with different parameters (defining a range or set of values for the parameters of the technique); (iii) execution of various tasks on the same instance of the problem; and (iv) execution of various tasks on multiple instances of the problem. Metric: A weight of 0.2 has been given to each of the four features described above. In addition, the ability to randomize the optimization task execution sequence and the generation and loading of a document or file where tasks are defined (the task execution plan, where the description of the tasks to execute can be user-supplied or generated by MOFs) jointly weigh 0.2.
C4.3 Experimental Design: The appropriate design of experiments is essential to obtain valid conclusions in any study. This characteristic assesses the support provided by MOFs to establish hypotheses, identify dependent and independent variables, and select and define experiments properly using standard designs (factorial, Latin squares, fractional, etc.). This characteristic is assessed independently of the previous characteristic (C4.2) and of the capacity for statistical analysis of results (C4.4). There are two different ways to support this characteristic: (i) providing integration mechanisms with design of experiments systems (such as GOSSET [254]); and (ii) implementing the utilities for experimental design in the MOF itself. Alternative (i) implies that the capabilities for experiment design are those of the system to integrate with, which are difficult to assess in the context of this comparison.
We have created a set of features in order to assess the capabilities of frameworks that use approach (ii): (a) hypothesis definition support, specifically for common hypotheses, such as the equality of performance of two techniques or the irrelevance of the value of a parameter in a range; (b) experiment modeling, supporting the definition of dependent and independent variables and their nature (nominal, ordinal or scalar); (c) experiment design based on the previous model using common schemes; and finally (d) the capability of executing the experiments automatically; this feature assesses the capability of generating a proper task execution plan for the designed experiments (C4.2 evaluates the capabilities for automating the execution of those plans). Metric: A proportional metric is defined, where approach (i) weighs 0.2, and the remaining weight is evenly distributed among the features of approach (ii).
C4.4 Statistical Analysis: One of the most common tasks in solving optimization
problems (and in any study with an empirical component) is the statistical analysis of
experimental data and results. There are two different ways to support this characteristic: (i) to provide integration mechanisms with statistical analysis systems (such as R
or SPSS); and (ii) to implement the utilities for statistical analysis in the MOF itself.
One of the disadvantages of approach (i) is that the user must import the data into the statistical analysis system, perform the statistical tests on it, interpret the results, and return to the framework to change parameters or implementations if necessary. On the other hand, this approach frees the MOF from implementing the statistical tests, and statistical analysis systems are usually more complete and powerful than test implementations integrated into frameworks. In contrast, the use of strategy (ii) allows the framework to automate the tests and the associated data exchange, show the results integrated in its user interface, and even react autonomously to the results of the tests. A set of features has been created in order to assess the capabilities of frameworks that use approach (ii), concerning the support of various parametric and non-parametric tests: (a) Student's t-test; (b) one-way ANOVA; (c) two-way ANOVA; (d) n-way ANOVA; (e) the Mann-Whitney U test; (f) the Wilcoxon test; and (g) the Kolmogorov-Smirnov test (or any test to assess data normality). The use of approach (ii) does not necessarily imply that approach (i) cannot be applied. In this sense, the integration with the statistical software can be performed at the test execution level (to free the MOF from the implementation burden), while providing programmatic support or graphical interfaces integrated in the MOF. Metric: A proportional metric is defined, where approach (i) weighs 0.3, and the remaining weight is distributed uniformly among the features of approach (ii).
C4.5 User Interface, Graphical Reports and Charts: The usability of applications strongly depends on the proper design of their Graphical User Interface (GUI). Specifically, an appropriate GUI for MOFs must take into account the rest of the characteristics of this comparison area: the ability to select and configure the parameters of the different techniques, the reporting of results, the monitoring of the status of optimization tasks and of the global execution plan, the control of nodes in distributed and parallel computing environments, the on-line technical support, and the assistance or communication with the user and developer forums of the MOF. Moreover, although GUI design and usability could be assessed directly, such an evaluation would include a subjective bias. In order to avoid it, we have defined the following set of features to be evaluated: (i) integrated help and basic usability (menus, shortcut buttons, etc.); (ii) technique specification and parameter configuration support; (iii) problem modeling and data import; (iv) graphical support of advanced features (subdivided into batch mode execution configuration, design of experiments, and statistical analysis of results); (v) the use of optimization projects where all the information about problem instances, techniques and results is stored; and (vi) the graphical representation of results through diagrams and figures. Metric: A uniform metric is defined to assess this characteristic (each feature weighs 0.2). If the MOF only shows the evolution of the objective function of the best solution, and no additional metrics are provided (such as population diversity when using an EA, or the current solution when using TS or SA), feature (vi) is evaluated with half of its weight.
C4.6 Interoperability: This characteristic assesses the set of capabilities that frameworks provide to exchange information and interact with other systems. Specifically, the following features are taken into account: (i) results and data export capabilities (considering formats such as CSV or Excel/ODF files); (ii) data import capabilities (using formats such as CSV, Excel/ODF files, or the specific formats of standard problem libraries, such as SATLIB or TSPLIB); (iii) the capability of deployment and invocation as a web service (as in [112]); and (iv) the use of XML to store the information associated to optimization projects (selected solution encoding, objective function and problem model, techniques and their parameters, experiment design, results and statistical analysis, etc.), so that other systems can process these data and parameters in a simple way. Metric: A uniform metric is defined to assess this characteristic (each feature weighs 0.25).
5.6.2 Comparative Analysis
The low score obtained by ParadisEO in this area is surprising, highlighting this as a potential area of improvement for that framework. OAT is among the highest scored frameworks (it has a well-designed GUI as well as powerful experiment execution and statistical analysis support), followed by JCLEC, whose characteristics in this area have been evaluated together with those of its associated project KEEL (focused on Data Mining and classification applications). Note that this area has, together with areas C2 and C3, the lowest support levels, thus representing a significant area of improvement in the present framework ecosystem. Figure §5.4 shows a stacked column diagram for the characteristics of this area.
Figure 5.4: General optimization process support
Table §A.4 and Figure §5.4 answer RQ6: the characteristics of area C4 summarize the capabilities provided by current MOFs to support the conduction of research studies and the general problem solving process. Those capabilities range from statistical analysis and experiment execution engines to GUIs with wizards and chart generation. These tools, however, are not interoperable, and their quality and support are not homogeneous; they are dispersed over the set of frameworks. Consequently, those tools are not available for all techniques, nor for all programming languages and platforms.
5.7 DESIGN, IMPLEMENTATION AND LICENSING (C5)
Both a suitable licensing model and the ability to run on multiple platforms are essential to the success of any software product. In the case of software frameworks, proper design and effective implementation are also very important, since applications created with a framework incorporate its design (along with any errors and problems it may contain). Moreover, the efficiency of those applications is limited by the efficiency of the framework. As a consequence, a comparison area has been defined to group this set of characteristics, as described below.
5.7.1 Characteristics description
C5.1 Language: Implementation language can be a key factor for users of MOFs, since the use of a well-known programming language reduces development costs and the likelihood of errors. The frameworks under consideration in this study are implemented in C++, C# and Java.
C5.2 Licensing: Cost is not a characteristic of interest, since all the frameworks assessed are free; however, the licensing of a MOF can limit the context and purposes of its use, or force users to provide clients with the source code of the generated application. From this perspective, the types of license we take into account are: (i) commercial; (ii) free binaries, without MOF source code or commercial use; (iii) free with MOF source code available only for certain organizations and usages (usually universities and non-profit activities); (iv) MOF source code available under the GPL (GNU General Public License) or similar, which forces the distribution of the source code of derived products under the GPL; and (v) MOF source code available under the LGPL (GNU Lesser General Public License) or similar, which allows commercial use without restrictions on source code availability. Metric: This characteristic is not evaluated using a set of features; instead we establish a direct score based on the freedom that each license provides: (i) commercial licensing = 0; (ii) free binaries (no commercial use) = 0.25; (iii) restricted availability of source code = 0.5; (iv) GPL = 0.75; and (v) LGPL = 1.
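The direct scoring above amounts to a simple lookup. The following Java sketch (hypothetical names) encodes the five license categories and the scores assigned to them.

```java
// Minimal sketch (hypothetical enum; score values taken from the C5.2 metric):
// licenses are scored directly by the freedom they grant to users.
public class LicenseScore {

    public enum License { COMMERCIAL, FREE_BINARIES, RESTRICTED_SOURCE, GPL, LGPL }

    public static double score(License license) {
        switch (license) {
            case COMMERCIAL:        return 0.0;  // commercial licensing
            case FREE_BINARIES:     return 0.25; // free binaries, no commercial use
            case RESTRICTED_SOURCE: return 0.5;  // source only for some organizations
            case GPL:               return 0.75; // copyleft on derived products
            case LGPL:              return 1.0;  // no restriction on derived products
            default: throw new IllegalArgumentException("Unknown license");
        }
    }

    public static void main(String[] args) {
        System.out.println(score(License.LGPL)); // 1.0
    }
}
```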
C5.3 Supported Platforms: The set of platforms taken into account comprises Windows, Unix (Linux, Solaris, HP-UX, etc.) and Mac. Metric: A uniform metric is defined, with each platform weighing 1/3; in the case of partial support (only a limited set of features is available on a certain platform) we apply a 50% penalty.
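The platform metric, including the 50% penalty for partial support, can be sketched as follows (Java, hypothetical names).

```java
// Minimal sketch (hypothetical names) of the C5.3 metric: Windows, Unix and
// Mac each weigh 1/3, and partial support on a platform is penalized 50%.
public class PlatformScore {

    public enum Support { FULL, PARTIAL, NONE }

    public static double score(Support windows, Support unix, Support mac) {
        return value(windows) + value(unix) + value(mac);
    }

    private static double value(Support s) {
        double weight = 1.0 / 3.0;
        switch (s) {
            case FULL:    return weight;
            case PARTIAL: return weight * 0.5; // 50% penalty for partial support
            default:      return 0.0;
        }
    }

    public static void main(String[] args) {
        // e.g. a hypothetical MOF fully supported on Windows and Unix,
        // partially supported on Mac: 1/3 + 1/3 + 1/6
        System.out.println(score(Support.FULL, Support.FULL, Support.PARTIAL));
    }
}
```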
C5.4 Software engineering best practices: A proper design and adherence to software engineering best practices are especially important for MOFs. However, assessing the design of a framework in a quantitative and objective way is a difficult task. As a result, the features only evaluate the basic use of certain tools and processes recognized as best practices, such as: (i) the use of design patterns to promote flexibility in variation points; (ii) the use of automated tests (unit tests), evaluated based on the source code of the MOFs (for those that do not provide source code, the evaluation is based on the documentation, if tests exist); (iii) explicit documentation of the MOF variation and extension points; and (iv) the use of reflective capabilities and dependency injection to promote flexibility, as described by [104]. The latter feature corresponds to the capability of the framework to dynamically load problem types, objective functions, and other elements associated with customization or extension without having to recompile the framework. With regard to feature (iv), MOFs that perform runtime loading of modules are assigned half of the weight, while those that use a dependency injection system for the management of modules receive the full weight. Metric: A uniform metric is defined to assess this characteristic.
C5.5 Size: A basic measure of the complexity of a framework is its size. The size of a framework can be measured by various metrics: number of lines of code, number of classes and packages/modules, number of variation points and possible combinations of components, etc. It would be inappropriate to use the size of frameworks as a quantitative evaluation criterion, since the functionality supported is not directly related to it, and an increase in size does not necessarily imply greater complexity of use. Therefore, we consider it as a qualitative criterion: we report some of these measures for each framework, but they are not included in the quantitative assessments.
C5.6 Numerical handling: Most metaheuristic techniques are stochastic, requiring the use of a random number generator. This fact has two consequences: (i) choosing a good random number generator is key to the proper behavior of the techniques implemented by MOFs; and (ii) in order to support the replicability of experiments, a unique seed must be used by all the random number generators across the framework and the customizations/extensions developed by users. Features evaluating these two points are defined for this characteristic: (i) evaluates whether a proper random number generator is provided (either a Mersenne Twister implementation or support for customizing the random number generation scheme); and (ii) evaluates the replicability of experiments, based on the support of a global seed and the provision of a seed-driven random number generator to user-implemented modules. Metric: A uniform metric is defined to assess this characteristic.
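The replicability scheme evaluated by feature (ii) can be illustrated with a small Java sketch (hypothetical names; java.util.Random is used for brevity, although it is a linear congruential generator rather than the Mersenne Twister recommended above): a single global seed deterministically derives the generator handed to each module, so two runs with the same seed produce identical random sequences.

```java
import java.util.Random;

// Minimal sketch (hypothetical names) of the replicability support described
// in C5.6: a single global seed feeds every generator handed out to framework
// modules and user extensions, so a run can be reproduced exactly.
public class GlobalRandom {

    private static Random seeder = new Random();

    // Fixing the global seed makes every subsequently created generator
    // deterministic.
    public static void setGlobalSeed(long seed) {
        seeder = new Random(seed);
    }

    // Each module asks for its own generator; all of them derive
    // deterministically from the global seed.
    public static Random generatorForModule() {
        return new Random(seeder.nextLong());
    }

    public static void main(String[] args) {
        setGlobalSeed(42L);
        double a = generatorForModule().nextDouble();
        setGlobalSeed(42L);
        double b = generatorForModule().nextDouble();
        System.out.println(a == b); // true: both runs derive from the same seed
    }
}
```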
5.7.2 Assessment and feature cover analysis
This area seems to be the most homogeneous and best supported, in the sense that most frameworks support almost all the features to a high degree. Platform support is practically universal, except for HeuristicLab, EasyLocal and some modules of ParadisEO. The general adoption of the UML notation is also remarkable, as well as the prevalence of open source licensing models. Figure §5.5 shows a stacked-column diagram for some characteristics of this area. With regard to the size of MOFs, Figure §5.6 shows the framework sizes in terms of number of packages (or modules) and classes (or files, when there is no direct relation between files and classes). These attributes may be of interest because the size of a framework may be an indirect measure of its complexity, and therefore of its possible difficulty of use. However, the restrictions imposed by the language should be taken into account; for example, in Java each public class must be in a separate file.
Table §5.3 and Figure §5.5 provide the answers to RQ7, RQ8 and RQ9. There is wide availability of MOFs per platform, with each technique available on nearly all platforms. This is due to the use of platform-independent programming languages such as Java and C++ (using standard libraries). However, as no single MOF supports all techniques, users must be careful: although other MOFs may provide implementations of missing techniques, the effort needed to change frameworks is considerable, and implies giving up other features or variants. All the frameworks evaluated provide GPL or free licenses for teaching or research purposes. Finally, basic software engineering best practices, such as UML diagrams of the MOFs' architecture and dynamic module loading, are widely supported, but more advanced ones, such as automated tests, the use of dependency injection libraries and explicit documentation of variation points, are not. Notably, some frameworks do not support the use of a proper random number generator nor its customization.

Figure 5.5: Design, implementation & licensing assessment

Figure 5.6: Frameworks size
5.8 DOCUMENTATION & SUPPORT (C6)
MOF                             Prog. Lang.  Platforms                           License
EasyLocal                       C++          Unix                                GPL
ECJ                             Java         All                                 Open Source (Academic Free License)
ParadisEO                       C++          All (except Windows if using PEO)   CECILL (ParadisEO) and LGPL (EO)
EvA2                            Java         All                                 LGPL
FOM                             Java         All                                 GPL
HeuristicLab                    C#           Windows                             GPL
JCLEC (and KEEL)                Java         All                                 LGPL
MALLBA                          C++          Unix                                Open Source
Optimization Algorithm Toolkit  Java         All                                 LGPL
Opt4j [178]                     Java         All                                 LGPL

Table 5.3: MOFs programming languages, platforms and licenses

When selecting a framework for developing any kind of application, documentation, technical support and user community responsiveness are important. These are the factors that can smooth out the learning curve when users have no experience and need to solve the problems or errors that arise during use. Consequently, we have considered those factors, together with additional features that measure the maturity of the frameworks, such as the types of problems that MOFs bring as samples and the number of scientific articles published using each framework:
C6.1 Sample problem types: As a measure of the maturity and supportiveness of frameworks, this characteristic assesses the implemented problem types that MOFs provide. It also measures to what extent MOFs have been applied and tested on different kinds of problems. Moreover, solved problem types can be excellent starting points if users try to solve problems similar to those provided. The set of problem types considered comprises problem families such as TSP, SAT, QAP, job shop scheduling, flow shop scheduling, knapsack, the iterated prisoner's dilemma, symbolic regression problems, and others. The exact problem types can be consulted in the evaluation data sheet mentioned previously. Metric: A uniform metric is defined, where the weight is distributed evenly amongst the evaluated problem types. The set comprises fifty-nine different problem types.
C6.2 Articles & papers: Another way to assess the maturity and quality of MOFs is through the scientific publications that describe MOFs or report their use. The assessment of this characteristic relies on the publications found during our literature review and on the publications enumerated on the MOFs' websites. A total of 285 publications were found for the selected MOFs, searching for papers from 2000 to 2010. Metric: An ad hoc metric is defined: the maximum score (1.0) was assigned to the framework with the most publications, namely ECJ with 113, and the scores of the other frameworks were computed with the formula score = (publications of MOF N) / (maximum number of publications per MOF).4
C6.3 Documentation: Documentation is the main source of information for the users of a framework, a key element enabling its use. This characteristic is assessed based on the presence (or absence) of the following features: (i) user manual; (ii) technical/development documentation; (iii) a "how to" document, where short recipes are provided to perform common actions; (iv) a frequently asked questions section on the framework's website; and (v) a MOF website. Metric: A uniform metric is defined, where each feature weighs 0.2.
C6.4 Users & Popularity: This characteristic intends to assess the number of users of each framework. Its evaluation is based on the number of researchers using each framework outside the MOF creators' research group and development team; we name them "external users". In order to evaluate this characteristic, we have filtered the publications found during our literature review and those enumerated on the MOFs' websites, removing those where one of the authors is a member of the development team or research group of the MOF creators. Metric: An ad hoc metric is defined: the maximum score (1.0) was assigned to the framework with the most external publications, namely ECJ with 84, and the scores of the other frameworks were computed with the formula score = (external publications of MOF N) / (maximum number of external publications per MOF). The whole set of publications found per framework is available at http://www.isa.us.es/uploads/MOFs/bib/N-external.bib, where N is the name of each MOF; for instance, the ECJ bibliography of external publications is available at http://www.isa.us.es/uploads/MOFs/bib/ECJ-external.bib.
4 The whole set of publications found per framework is available at http://www.isa.us.es/uploads/MOFs/bib/N.bib, where N is the name of each MOF; for instance, the ECJ bibliography is available at http://www.isa.us.es/uploads/MOFs/bib/ECJ.bib.
5.8.1 Comparative Analysis
In general, the least supported feature in this area is the implemented problem types. With regard to papers that describe or apply MOFs, and popularity among external authors, ECJ is the most salient framework, dwarfing the other MOFs in this comparison. Figures §5.7(a) to §5.7(d) illustrate this fact: Figure §5.7(a) shows the number of publications per MOF and year. ECJ appears as the senior framework, obtaining a dominant position early on which it still holds. Figure §5.7(b) shows the total number of publications per MOF. Figure §5.7(c) shows the number of external and internal publications per MOF as a stacked-column chart. Figure §5.7(d) shows the number of external authors per MOF. ECJ is followed by ParadisEO and HeuristicLab in number of publications. ECJ has nearly 75% of the external publications and 65% of the external authors. The least popular frameworks are FOM and OAT, with nearly null external usage and a small number of publications.
Note that two frameworks score low on documentation, namely OAT and EasyLocal. All frameworks have active and supportive communities of users and developers. Figure §5.8 uses a stacked-column diagram to summarize the support of this area's characteristics.
Figures §5.7 and §5.8, and the information gathered throughout this study, provide an answer to RQ10: a high number of MOFs are currently available that support a wide set of features. So, when addressing new problems or performing research studies on well-known ones, the use of MOFs is a valid approach. MOF use outside the developers' research groups could be boosted by improving framework documentation and support. Currently, the most popular framework is ECJ, which has a large community of external users and a wealth of publications year after year. Moreover, there seems to be a correlation between the score in area C3 and MOF popularity, since the frameworks with higher scores in that area are those with higher popularity. This fact is not surprising, since that area contains some of the features that add the most value for users. These features, such as distributed and parallel optimization, make MOFs capable of solving extremely complex problems, and are difficult to implement from scratch; thus, the frameworks offering them are more attractive for users that need those features, which contributes to their popularity.
(a) Publications per MOF and year
(b) Total publications per MOF
(c) Total number of internal and external publications per MOF
(d) Total number of external authors per MOF
Figure 5.7: Publications and external authors per MOF
Figure 5.8: Documentation and technical support
5.9 DISCUSSION AND CHALLENGES
In this section we discuss the results obtained in this study. Based on these results, we identify a number of challenges (RQ11) to be addressed in the future. The challenges reflect the authors' own view of the open questions, based on the analysis presented in this chapter. Figure §5.9 shows the global score results for the MOFs as Kiviat diagrams, summarizing the results of this study from a research user perspective. In the appendix, Table §A.2 shows the global score obtained for each MOF and characteristic, as well as the average for each area.
To achieve the maximum score in areas C1, C2 and C3, each MOF would have to implement an ample subset of the current state of the art in metaheuristics, so it is not surprising that the scores do not generally reach the maximum possible value. On the contrary, the small average values in areas C4 and C6 are significant, and therefore indicate a general direction of improvement for current MOFs.
5.9.1 Capabilities Discussion
On average, the MOF with the best score is ECJ (maximum area in Fig. §5.9), making it a preferred choice if users can use EAs in Java. However, this MOF scores below average in areas C1 and C5, which are clear improvement areas for it, and could lead users to evaluate different options (C1 measures the techniques available). The next best
scored MOF is ParadisEO, salient in areas C1 and C3, which uses C++ as its implementation language. This MOF, however, scores below average in area C4, making this a
clear improvement area. The MOFs that provide the amplest support in terms of the variety of metaheuristics (criterion C1) are FOM and ParadisEO. The score obtained by OAT in area C4 is remarkable, much above average, due to its GUI and its experiment execution and statistical analysis tooling. In this same area, the support of JCLEC (and its twin project KEEL) is also above average. However, the best score for the GUI characteristic is obtained by HeuristicLab, which in its last version (3.3) provides a complete, highly configurable and intuitive user interface. Area C5 is where all the MOFs provide the best average results. This is not surprising, given that these characteristics are key for a framework's use and success, and are clear signs of technical competence and maturity. In this sense, MOFs without good design or implementation simply do not survive. Finally, the average value of area C6 indicates the need to improve documentation, user guidance and support. Thus we define Challenge 1: Improve documentation, user guidance and support, and GUI tooling.
5.9.2 Evolution of the market of MOFs
The creation of this benchmark has been a time-consuming and demanding task. However, the length of this task has allowed us to evaluate an additional feature of the set of MOFs: its liveliness and evolution speed. During the creation of this benchmark, several frameworks released new major versions with important improvements, namely ECJ, ParadisEO, JCLEC and HeuristicLab; moreover, other frameworks, such as EvA2 and Opt4j, released minor versions with bug fixes and minor features. This evolution allowed us to test the evaluation framework presented in this study. No modifications were needed in order to assess the new versions of the MOFs and their features, which validates the flexibility and completeness of our approach. Moreover, both the previous and the new versions of those frameworks were evaluated, providing a dynamic view of the ecosystem, in contrast with the static one shown in the previous sections. In this sense, we can evaluate the "hot areas", i.e. those areas where most evolution has taken place, and the speed of evolution of the assessed MOFs. The areas with the biggest improvements are C4 and C5, primarily due to the improvements in the GUI and licensing model of HeuristicLab and the new GUI of ECJ. Additionally, C1 and C6 have also improved significantly, but on a smaller scale, since new techniques and better documentation are provided by the assessed MOFs. The MOF with the biggest improvement in this period was HeuristicLab, moving directly from version 1.1 to version 3.3. In this new version, significant improvements were made to the licensing model (it became an open source project under the GPL license)
and to the GUI and documentation. The next framework in terms of improvement during the creation of this benchmark was ECJ, where a multi-objective technique and a GUI were added; in addition, significant improvements were made to its documentation. Finally, the evolution measured shows that the current MOF ecosystem is a vibrant and living one, where new versions and important features are added continuously.
Both the final evaluation of the current versions and the previous one are available as Google Docs spreadsheets at http://www.isa.us.es/MOFComparison and http://www.isa.us.es/MOFComparison-OLD, respectively. They can be downloaded and exported to different formats, such as MS Office or OpenOffice, for customization and tailoring.
5.9.3 Potential areas of improvement of current frameworks
In addition to the points stated above about area C6, and based on the comparative study carried out and the results described above, we enumerate below some gaps and unsupported features that have been identified. The areas with the most room for improvement are C2 (Adaptation to the Problem and its Structure), C3 (Advanced Characteristics) and C4 (General Optimization Process Support). Specifically, some features that have room for improvement are:
• Hyper-heuristics support.
• Support for designing and automatically running experiments, and for analyzing their results.
• User guides, together with wizards, project templates and GUIs to aid the optimization process.
• Parallel and distributed computing support.
• Domain-specific languages for objective function and constraint formulation.
Thus we define Challenge 2: Provide added-value features for optimization, such as hyper-heuristics and parallel and distributed computing capabilities.
In particular, with regard to area C5 (Design, Implementation & Licensing), we have identified the following unmet software engineering best practices:
• Absence of unit tests. Note that one of the discarded EA-oriented optimization libraries (JGAP) is a recognized reference for this practice [186]; however, the assessed MOFs do not generally provide unit tests (except for JCLEC and HeuristicLab).
• Heterogeneity of project building and description mechanisms. It would be interesting if, as in ParadisEO, projects provided files for framework compilation using standard mechanisms, such as makefiles in C++ or Ant or Maven build files in Java.
• Absence of explicit documentation of variation points. Although all the frameworks that have been evaluated provide extensive technical documentation of their classes and modules, none of them provides a scheme (such as feature models) to describe the variation points of the framework, nor are these described explicitly in natural language in the documentation. Moreover, none of the frameworks uses the UML profiles for framework documentation [101].
• Limited dynamic and reflective capabilities for loading problems, heuristics and technique variants. Only Opt4j uses a dependency injection mechanism (such as Google Guice or Spring).
Finally, regarding area C1 (Metaheuristic techniques), there is always the possibility of enlarging the portfolio of implemented techniques. The current support is uneven, with some techniques (such as EA) practically universally supported and others (such as GRASP, SS, ACO or AIS) rarely implemented. Thus we define Challenge 3: Improve the support of techniques and their variants, and Challenge 4: Develop standard benchmarks for MOFs.
5.10 SUMMARY
In this chapter we have performed an assessment of the main MOFs based on the state of the art. The motivation for the study lies in the implications of the NFL theorem regarding the desirability and advantages of using such tools, in the complexity and difficulty of learning and mastering any of these frameworks, and in the availability of a good number of MOFs.
From the MOF assessment carried out, we can draw the following conclusions:
• Frameworks are useful tools that can speed up the development of optimization-based problem solving projects, reducing their development time and costs. They can also be applied by non-expert users, extending the user base and the application scope of metaheuristic techniques.
• There are many MOFs available that overlap and provide similar capabilities, which means that a certain duplication of effort has taken place. It would be desirable to coordinate and standardize these MOFs in order to improve the support given to the user community.
• There are visible gaps in the support of specific key characteristics, as shown in
section §5.9.3.
The contributions presented in this chapter were published in the indexed journal
Soft Computing [213].
Figure 5.9: General scores of MOFS as Kiviat diagrams
6 SCIENTIFIC EXPERIMENTS DESCRIPTION LANGUAGE
The limits of my language mean the limits of my world.
Ludwig Wittgenstein,
1889 – 1951
Austrian-British philosopher
Writing cannot express all words, words cannot encompass all ideas.
Confucius,
In this chapter we present SEDL, a language for describing scientific experiments. Section §6.1 provides an introduction to the scope and general structure of the language. Section §6.2 illustrates the elements of SEDL experimental descriptions. Section §6.3 shows how SEDL can be used to describe experimental executions. In Section §6.4 we present a catalog of analysis operations for SEDL documents. The extension points of the language are described in Section §6.5. Finally, Section §6.6 summarizes the contributions of this chapter.
6.1 INTRODUCTION
In this chapter we present SEDL (Scientific Experiments Description Language), a domain-independent language to describe experiments in a precise, tool-independent and machine-processable way. Additionally, we present a catalog of analysis operations to support the automated validation of SEDL experiments and the extraction of information from them.
Figure §6.1 shows the structure of an experiment in SEDL and a sample experiment.
The document is divided into two main sections: experimental description and experimental execution. The former includes details about the objects, subjects, population,
constants, variables, hypothesis, design and analysis specification. The latter includes
information concerning the configurations used to run the experiments and the results
obtained.
A formal definition of the abstract syntax of SEDL using UML metamodels is presented in Appendix §B. The appendix also includes a concrete XML-based syntax to support serialization in our ecosystem MOSES (see Chapter §8). In this chapter, a human-readable syntax based on plain text is used. In the following sections, the different parts of a SEDL document are described with examples.
6.2 EXPERIMENTAL DESCRIPTION
The description of an experiment includes all the information required to conduct
the experiment. In the next subsections we explain how experiments are described in
a SEDL document.
6.2.1 Objects, subjects and population
The first section of a SEDL document includes information about the context of the experiment, namely the experimental subjects, population and objects. The accessible population can also optionally be described. Figure §6.2 shows a fragment of the SEDL document for experiment #1 presented in Chapter §3. The subjects (experimenters) are Bart and Lisa Simpson. The experimental objects are individual human beings with fever. The population is composed of all sick people with fever. The accessible population is the sick people in the Seville Hospital.
Figure 6.1: SEDL structure and its mapping to a sample experiment
EXPERIMENT: New-Antipyretic1 version 1.0 rep: http://moses.us.es/E3
Subjects:
    Bart Simpson
    Lisa Simpson
Objects: 'Individuals with fever'
Population: 'Any feverish person'
AccessiblePopulation: 'Feverish people in the Sevilla Hospital'
...
Figure 6.2: Schema of the context information supported by SEDL
6.2.2 Constants and variables
Constants in SEDL are defined by an identifier and a value. Variables are defined by an identifier, a type and a domain. The default types are integer, real and enumerated, the latter being either ordered or not. Variable domains can be described by extension (i.e., by enumerating each possible value) or by intension (i.e., by defining a minimum and a maximum value). Variables are further divided into controllable factors (Factors), non-controllable factors (NCFactors), nuisance variables (Nuisances) and outcomes (Outcomes). Figure §6.3 depicts the constants and variables sections of the SEDL document for experiment #2 presented in Chapter §3. The experiment has a constant (TerminationCriterion), a controllable factor (OptTech), a non-controllable factor (Instance), and an outcome (ObjectiveFunction). The levels of the variables can be simple values, such as labels or integers, or a named list of properties (pairs of keys and values). For instance, P0(File: '/tmp/p0.qwsc') means that the level P0 of the variable Instance has a property named File with value '/tmp/p0.qwsc'.
EXPERIMENT: New-Antipyretic1 version 1.0 rep: http://moses.us.es/E3
...
Constants:
    TerminationCriterion: 'MaxTime(10000)' // In milliseconds
Variables:
    Factors:
        OptTech enum EA, GRASP1, ..., TS+SA // Opt. technique
    NCFactors: // Here the problems are ordered to show the syntax of the language
        Instance enum ordered P0(file: 'P1.qoswsc'), ..., P10(...)
    Outcomes:
        ObjectiveFunction float // Best value of the obj. func. found
...
Figure 6.3: Schema of the context information supported by SEDL
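The two ways of describing a variable domain, by extension and by intension, can be sketched as follows (Java, with hypothetical classes; SEDL itself is the language being defined, so this is only an illustration of the intended semantics).

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch (hypothetical classes) of the two domain descriptions in
// SEDL: by extension (enumerating each level) or by intension (bounds).
public class Domains {

    interface Domain { boolean contains(Object value); }

    // Domain by extension: the set of levels is enumerated explicitly.
    static class ExtensionDomain implements Domain {
        private final List<Object> levels;
        ExtensionDomain(Object... levels) { this.levels = Arrays.asList(levels); }
        public boolean contains(Object value) { return levels.contains(value); }
    }

    // Domain by intension: only a minimum and a maximum value are given.
    static class IntensionDomain implements Domain {
        private final double min, max;
        IntensionDomain(double min, double max) { this.min = min; this.max = max; }
        public boolean contains(Object value) {
            double v = ((Number) value).doubleValue();
            return v >= min && v <= max;
        }
    }

    public static void main(String[] args) {
        Domain optTech = new ExtensionDomain("EA", "GRASP1", "TS+SA");
        Domain age = new IntensionDomain(18, 60);
        System.out.println(optTech.contains("EA")); // true
        System.out.println(age.contains(65));       // false
    }
}
```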
6.2.3 Hypotheses
The type of hypothesis of a SEDL experiment is specified with a keyword indicating
whether the hypothesis is differential, associational or descriptive (see Section §3.3.3 for
details). If the hypothesis is differential, it is assumed that the goal of the experiment
is to confirm or disprove that the values of the controllable factors make a difference
in the outcome. Associational hypotheses are intended to confirm or disprove that the
relationship between the controllable factor and the outcomes follows a specific mathematical relation. Finally, descriptive hypotheses state that the value of the outcome
has certain statistical properties.
An example of a differential hypothesis is presented in Figure §6.4. The hypothesis implicitly states that the specific optimization technique used has a significant impact on the value of the objective function, i.e., that the techniques have different performance. Figure §6.5 shows a descriptive hypothesis, stating that the average decrease in body temperature for the patients who participated in experiment #1 is 2.83 Celsius degrees. The syntax for describing statistical analyses is defined in Section §6.3.
EXPERIMENT: QoS-Gasp1 version 1.0 rep: http://moses.us.es/E3
...
Variables:
    Factors:
        OptTech enum EA, GRASP1, ..., TS+SA // Opt. technique
    NCFactors: // Here the problems are ordered to show the syntax of the language
        Instance enum ordered P0(File: 'P1.qoswsc'), ..., P10(...)
    Outcomes:
        ObjectiveFunction float // Best value of the obj. func. found
Hypothesis: Differential // Means that the combination of factors, in this case
                         // the opt. tech., makes a difference on the value of
                         // the outcome
Figure 6.4: SEDL document with randomized design
Regarding associational hypotheses, the mathematical relationship depends on the type of the outcome variable. For scalar outcomes this relation can be Linear, meaning that the value of the outcome variable depends on the values of the factors (both controllable and non-controllable) according to the following equation: outcome = C0 + C1 * factor1 + ... + Cn * factorn, where C0, ..., Cn are constants in R. If the outcome is not real-valued, the type of relation should be specified: for ordered and enumerated outcomes the relationship is usually Logistic [133], and for binary variables it is Probit [33].
CHAPTER 6. SCIENTIFIC EXPERIMENTS DESCRIPTION LANGUAGE
EXPERIMENT: Antipyretics-Desc-Hyp-1 version 1.0 rep: http://moses.us.es/E3
...
Constants:
    dose: 200 // measured in milligrams
Variables:
    Outcomes: bodyTemperatureDecrease float // in Celsius degrees, measured 2
                                            // hours after the administration
    Nuisances:
        age: integer(18, 60)
        weight: float(40, 120)
Hypothesis: Descriptive
    Mean(bodyTemperatureDecrease) = 2.83

Figure 6.5: Descriptive hypothesis supported by SEDL
6.2.4 Experimental design
The design of a SEDL experiment includes information about the sampling, assignments, blocks, groups and the experimental protocol. These concepts are fully
described in Section §3.3.4. Figure §6.6 shows the design of experiment #2.
EXPERIMENT: QoS-Gasp1 version 1.0 rep: http://moses.us.es/E3
...
Variables:
    Factors:
        OptTech enum EA, GRASP1, ..., RS // Opt. technique
    NCFactors: // Here the problems are ordered to show the syntax of the
        Instance enum P0(File: 'P1.qoswsc'), ..., P10(...) // language
    Outcomes:
        ObjectiveFunction float // Best value of the obj. func. found
Hypothesis: Differential
Design:
    Sampling RandomBlock
    Blocking Instance
    Assignment Random
    Groups OptTech size 20
    Protocol Random
    Analyses // Use ANOVA or Friedman
        A1:
            FactANOVAwRS(Filter(OptTech).Group(Instance))
            Tukey(Filter(OptTech).Group(Instance))
        A2:
            Friedman(Filter(OptTech).Group(Instance), 0.03)
            Holms(Filter(OptTech).Group(Instance))

Figure 6.6: Simple randomized design supported by SEDL
The supported sampling methods are random, random block and custom. If the sampling method is random, the objects are randomly selected from the population. If the sampling method is random block, the objects are randomly picked from the population and assigned to the blocks until all blocks are complete. The number of blocks depends on the non-controllable factors used for blocking in the assignments. Finally, custom sampling methods are also allowed using the extension points of the language (see Section §6.5). In the example, a random block sampling method is used. Thus, the runs of the metaheuristic programs are divided into 10 blocks, one for each problem instance.
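A random block sampling step like the one in the example can be sketched as follows; the population, block count and block size are hypothetical parameters of our own, not SEDL syntax:

```python
# Sketch of random block sampling: objects are randomly picked from the
# population and assigned to blocks until all blocks are complete.
import random

def random_block_sampling(population, num_blocks, block_size, seed=None):
    """Return num_blocks lists of block_size randomly chosen objects."""
    rng = random.Random(seed)
    sample = rng.sample(population, num_blocks * block_size)
    return [sample[i * block_size:(i + 1) * block_size]
            for i in range(num_blocks)]

# 10 blocks of 20 runs each, mirroring the example design.
blocks = random_block_sampling(list(range(1000)), num_blocks=10,
                               block_size=20, seed=42)
print(len(blocks), len(blocks[0]))
```
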
The assignment indicates how the objects are grouped, the size of the groups, and which values of the controllable factors (i.e., treatments) are applied to each group. In Figure §6.6, this means that each metaheuristic program will be run 20 times for each block, i.e., problem instance.
The protocol establishes the order in which the individual objects receive the treatments. The experimental protocols supported in SEDL are random and custom. In the example shown in Figure §6.6, the protocol establishes that the order in which the metaheuristic programs are run with each problem instance is random. Custom protocols can be defined using the extension points of the language.
6.2.5 Analyses specification
The description of a SEDL experiment concludes with the specification of the statistical analyses to be performed on the outcomes. This specification comprises a prioritized list of alternative sets of statistical tests: the first test set whose preconditions hold is applied, ignoring the rest. Each set of statistical tests has an identifier (A1 and A2 in the example) and may include one or more tests that are expected to be applied in sequential order. The experiment described in Figure §6.6 includes two alternative sets of statistical analyses, one using parametric tests (ANOVA and Tukey's post-hoc procedure) and the other using non-parametric tests (Friedman's test and Holm's post-hoc procedure).
SEDL supports the specification of two types of analyses on experimental datasets: descriptive statistics and statistical analyses. Regarding descriptive statistics, SEDL supports the specification of:
• Central tendency measures include mean, median, mode, and confidence intervals.
• Variability measures include standard deviation, range, inter-quartile range, and confidence intervals.
• Rankings establish an order relation on the levels of a set of factors based on the value of a descriptive statistic, e.g. mean.
Regarding statistical analyses, SEDL supports the specification of:
• Null Hypothesis Significance Tests (NHST). The tests and post-hoc procedures supported are described in Appendix §D. In SEDL, an NHST comprises: the name of the specific test to be performed, the dataset on which the test should be performed, and the significance level α; e.g., ANOVA( Filter (OptTech), 0.001) means: "perform an ANOVA multiple comparison test on the datasets per optimization technique with α = 0.001".
• Correlation coefficients. In SEDL the specific methods used to compute correlation
are specified by name. The correlation coefficients supported are Spearman [257],
Pearson [260], Kendall [158] and Cramer [59].
The datasets on which the analyses should be performed are specified in terms of three different operations:
• Filtering: This operation selects a subset of the results dataset based on the values of specific variables, much like the WHERE clause in SQL. SEDL supports two types of filters, per variable and per group. For instance, in experiment #2, Filter (OptTech = EA) would generate a single dataset with the measurements of the variables where the metaheuristic EA was used. Another example in the context of experiment #2 is Filter (OptTech), which generates as many datasets as levels the OptTech variable has, i.e., one dataset per metaheuristic.
• Projection: A projection defines the set of variable measurements that a dataset will contain. It is similar to the enumeration of columns after the SELECT in SQL queries. For instance, in experiment #2, Project(ObjectiveFunction) would generate a single dataset with all the measurements of ObjectiveFunction. When a single outcome variable is specified in the experiment, a projection by this variable is implicitly assumed. For instance, in experiment #1, the mean of the body temperature decrease per dose can be specified as Mean( Filter (dose)), which is equivalent to Mean( Filter (dose).Project(bodyTemperatureDecrease)).
• Grouping: Grouping operations define how the elements in different datasets will be arranged for comparison in statistical tests. Its primary use is the specification of the blocks defined by blocking variables. For instance, in experiment #2, Grouping( Instance) means that the datasets to be compared must contain tuples with the same value for the Instance variable.
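One possible reading of these dataset operations over a results relation stored as a list of rows is sketched below; the row layout and function names are our own, not part of SEDL:

```python
# Illustrative implementations of Filter and Project over a results relation
# modeled as a list of dicts (one dict per measurement tuple).
def filter_by(rows, var):
    """Filter(var): one dataset per level of the variable."""
    datasets = {}
    for row in rows:
        datasets.setdefault(row[var], []).append(row)
    return datasets

def project(rows, *variables):
    """Project(v1, ...): keep only the given variables in each tuple."""
    return [{v: row[v] for v in variables} for row in rows]

rows = [
    {"OptTech": "EA", "Instance": "P0", "ObjectiveFunction": 6.3},
    {"OptTech": "EA", "Instance": "P1", "ObjectiveFunction": 7.1},
    {"OptTech": "RS", "Instance": "P0", "ObjectiveFunction": 2.5},
]
per_technique = filter_by(rows, "OptTech")  # Filter(OptTech): one dataset per level
ea_values = project(per_technique["EA"], "ObjectiveFunction")
print(len(per_technique), ea_values)
```
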
6.3 EXPERIMENTAL EXECUTION
As introduced in Section §3.4, experimental executions comprise two phases: experimental conduction and data analysis. Detailing these phases is especially relevant to automate the execution and replication of computational experiments. Figure §6.7 describes a simple SEDL configuration.
EXPERIMENT: QoS-Gasp1 version 1.0 rep: http://moses.us.es/E3
...
Configuration C1:
    Inputs:
        File per Instance '${Instance}.qoswsc'
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting: Runtimes: Java 1.6 Libraries: FOM 0.5 ...
    Runs: E1:
        Result: File '...'
        Analyses:
            A2: p-value: 0.00017 // Thus there are sig. diff. (our hyp. is TRUE)
                // Conclusion: Differential Hypothesis Accepted

Figure 6.7: Simple experimental execution in SEDL
SEDL describes the experimental execution in terms of configurations. A configuration includes all the information required to conduct and replicate the experiment. This includes the URIs of the input files and the output files to be generated, as well as the experimental settings.
The results files of an experiment are assumed to have a relational structure, where each tuple contains the information obtained from a measurement of the corresponding values of the variables. In particular, the attributes of the relation correspond to the variables of the experiment or any other relevant information about the object. For instance, in experiment #1, in addition to the values of the variables (dose and decrease in body temperature), the results tuples can also contain the social security number (SSN) of the patient.
The experimental settings describe the requirements to conduct the experiment within a specific configuration. This point is clearly domain-dependent and thus it is defined as an extension point of the language. However, a specific extension for computational experiments is provided. This SEDL extension describes experimental settings in terms of operating system, runtimes and libraries. Other types of experimental settings (such as medical equipment, or measurement instruments) could be specified through specific DSLs using the corresponding extension point of the language.

Figure 6.8: Experiment description and relational model of its results
Furthermore, configurations may contain a detailed description of the conduction process named experimental procedure. This description is aimed specifically at automation. While the experimental design defines what treatments will be applied on which experimental objects and in which order, the experimental procedure details how to apply the treatments. The language required to specify such a description depends strongly on the type of experiment, thus we have created an extension point for this purpose. In our reference implementation two different DSLs are provided: one supporting the execution of treatments through shell commands, and another one specific to metaheuristic optimizations with MOFs.
Figure 6.9 shows an example of an experimental procedure specified using the command-line procedure. Note that the values of the variables and constants of the experiment are used to apply the treatment. Specifically, the procedure described in this figure states that the treatment depends on a single variable named "solver" (but can use any constant defined in the experiment). The application of the treatment is performed by invoking a command that runs a Java program named ETHOM, and the procedure specifies as parameters of this Java program the output file to be generated, a set of constant values, and a property of the specific variable that the treatment takes as input (solver).
EXPERIMENT: SampleCommandExperimentalProcedure version 1.0 // no repository
...
Constants:
    /* Parameters of the Feature Model to be generated */
    NFeatures: 500        // Number of features
    CTC: 20               // Percentage of Cross Tree Constraints
    /* Parameters of ETHOM (it is an Evolutionary Algorithm) */
    CrossoverProb: 0.7    // Crossover rate or probability
    MutationProb: 0.005   // Mutation probability
    PopulationSize: 100   // Population size
Variables:
    Factors:
        // Solver used to evaluate the analysis operation
        solver enum JaCob(name: 'CSP-JaCoB'), Choco(name: 'CSP-Choco')
    Outcomes:
        ObjectiveFunction integer // Best value of the obj. func. found
...
Configurations: C1
    Procedure:
        Command as Treatment(solver):
            'java -jar ETHOM Results.csv ${NFeatures} ${CTC} ${solver.name}\
             ${CrossoverProb} ${MutationProb} ${PopulationSize}'
...

Figure 6.9: Sample of command experimental procedure
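A tool executing such a procedure must expand the ${...} placeholders from the constants and the treatment variable before invoking the command. The sketch below shows one possible way to do this in Python; the SedlTemplate class and the flattened "solver.name" key are our own assumptions, not part of the reference implementation:

```python
# Hypothetical expansion of the ${...} placeholders of a command procedure.
# The command string mirrors Figure 6.9.
from string import Template

class SedlTemplate(Template):
    # Extend the default identifier pattern so that dotted placeholders
    # such as ${solver.name} are recognized.
    idpattern = r"[A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*)?"

settings = {
    "NFeatures": 500, "CTC": 20, "CrossoverProb": 0.7,
    "MutationProb": 0.005, "PopulationSize": 100,
    "solver.name": "CSP-Choco",   # property of the treatment variable
}
cmd = SedlTemplate(
    "java -jar ETHOM Results.csv ${NFeatures} ${CTC} ${solver.name} "
    "${CrossoverProb} ${MutationProb} ${PopulationSize}"
).substitute(settings)
print(cmd)
```
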
Configurations may include a set of so-called runs. A run represents a conduction of the experiment using a specific configuration. Note that this is primordial to make experiments self-contained and to enable the comparison of results in subsequent replications. An experimental run is described in terms of the result dataset and the results of the analyses. As described in the previous section, the results can be either descriptive statistics or results of statistical tests. The results of descriptive statistics are presented in terms of their value per object/group, e.g. in experiment #2, for a mean of the objective function by metaheuristic (specified as Mean( Filter (OptTech))) the results would be ( EA) : 7.21, ( RS) : 2.57, etc.
Another example in the context of experiment #2 would be computing the confidence interval of the mean per metaheuristic and problem instance, specified as CI ( Filter ( Instance, OptTech)). The results generated for this analysis would be ( P1| EA) : 12.53, ( P1| RS) : 9.12, . . ., ( P10| EA) : 721.3, ( P10| RS) : 512.32, etc. The results of statistical tests are presented per object/group indicating the name and value of each statistical measure, namely, p-value, degrees of freedom and threshold. Figure §6.10 shows a set of examples of analyses and their corresponding results.
EXPERIMENT: SampleAnalysesSpecification version 1.0 // no repository
...
Design:
    ...
    Analyses:
        A1: Avg(Filter(OptTech).Group(Instance).Proj(ObjFunc))
        A2: StdDev(Filter(OptTech).Group(Instance).Proj(ObjFunc))
        A3: Range(Filter(OptTech).Proj(ObjFunc))
        A4: CI(Filter(OptTech).Group(Instance).Proj(ObjFunc))
        A5: IQR(Filter(OptTech).Group(Instance).Proj(ObjFunc))
        A6: Ranking(Avg(Filter(OptTech).Group(Instance).Proj(ObjFunc)), OptTech)
        A7: Median(Proj(ObjFunc))
        A8: Friedman(Filter(OptTech).Group(Instance).Proj(ObjFunc), 0.05)
            Holms(Filter(OptTech).Group(Instance).Proj(ObjFunc), 0.05)
        A9: Pearson(BodyTempDiff, {Dose}, {'Cx'})
...
Configurations: C1
    Runs: E1:
        ...
        Analyses:
            A1: Avg(EA|PI0): 6.32, ..., Avg(GRASP6|P0): 5.41, ...,
                (EA|PI0): 6.32, ..., (GRASP6|PI0): 5.41
            A2: StdDev(EA|PI0): 1.64, ..., StdDev(GRASP6|P0): 0.41, ...,
                StdDev(EA|PI10): 5.32, ..., StdDev(GRASP6|PI10): 1.1
            A3: Range(EA): 1.7-223.62, ..., Range(GRASP): 0.87-183.43
            A4: CI(EA|PI0): 6.12-7.218, ..., CI(GRASP6|P0): 5.35-5.93, ...,
                CI(EA|PI10): 6.32, ..., CI(GRASP6|PI10): 5.41
            A5: IQR(EA|PI0): 1.82, ..., IQR(GRASP6|PI0): 0.57, ...,
                IQR(EA|PI10): 6.32, ..., IQR(GRASP6|PI10): 5.41
            A6: Ranking: (GRASP+PR1): 1, (GRASP6): 2, ..., (TS+SA): 5
            A7: Median: 3.176
            A8: Friedman: Pvalue: 0.00017, description: 'Chi-Squared dist.', freedom degrees: 24
                {(EA vs TS+SA) Pvalue: 0.063 Sthreshold: 0.02}, ...,
                {(EA vs GRASP6) Pvalue: 0.003 Sthreshold: 0.02}
            A9: Pearson(BodyTempDiff, {Dose}): {C: 0.2, Cdose: 0.0015, r: 0.78912}

Figure 6.10: Samples of statistical analyses specifications and results
6.4 AUTOMATED ANALYSIS OF SEDL DOCUMENTS
In this section, we present a catalog of 15 operations for the automated analysis of SEDL documents. In particular, we first present several generic operations to extract information from SEDL documents. Then, we propose several operations for checking the internal validity of experiments described in SEDL. We remark that the characteristics of SEDL make the implementation of these operations straightforward. To the best of our knowledge, this is the first approach supporting the automated validation of experiments.
6.4.1 Information extraction operations
In this section, we present a set of generic operations for the automated extraction of information from SEDL documents. These can be helpful to make decisions during the experimental process and to report further information about experiments. They can also be used as auxiliary operations to check the validity of experiments described in SEDL.
Number of Blocks
This operation computes the number of blocks of the experiment as the product of
the number of levels of the blocking variables. It takes as input an SEDL experiment E.
This operation can be expressed as follows:
#blocks(E) = ∏_{v ∈ B} (v.levels.size)   if B ≠ ∅;   1 otherwise        (6.1)
where B is the set of blocking variables of the experiment, i.e., E.design.blockings. For instance, the number of blocks of the experiment described in Figure §6.6 is 10, since it has a single blocking variable with 10 levels. Another example is the experiment Exp1a described in Figure §9.9 (in Chapter §9), which has two blocking variables NFeatures and CTC with 4 levels each. For this experiment, #blocks(Exp1a) = 4 ∗ 4 = 16.
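The operation is straightforward to implement; the following sketch computes it from the list of level counts of the blocking variables (a simplified stand-in for a parsed SEDL design, with names of our own):

```python
# Sketch of the #blocks analysis operation: product of the number of levels
# of the blocking variables, or 1 when no blocking is used.
from math import prod

def num_blocks(blocking_levels):
    """blocking_levels: number of levels of each blocking variable."""
    return prod(blocking_levels) if blocking_levels else 1

print(num_blocks([10]))    # Figure 6.6: one blocking variable with 10 levels
print(num_blocks([4, 4]))  # Exp1a: NFeatures and CTC with 4 levels each
print(num_blocks([]))      # no blocking variables
```
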
Measurements per object

This analysis operation computes the number of measurements performed on each experimental object in a group (this is usually specified by the experimental protocol). For complete randomized designs, blocking factorial designs, and most of the designs described in the literature, its value is 1 [116]. A scenario where the result of this operation would be 2 is experiment #1 if the decrease in body temperature of the patients were measured twice. We denote this operation as #measurementPerObject.
Measurements per experimental block

This operation computes the total number of measurements that should be generated for each block in a classical design. This operation can be expressed as follows:

measurementsPerBlock(E) = ∑_{g ∈ G} (g.size ∗ #measurementPerObject(E.design.protocol[g]))
where G is the set of groups defined in the design of the experiment, i.e., E.design.groups, and #measurementPerObject is the analysis operation defined above. For instance, for the experiment E described in Figure §6.6, measurementsPerBlock(E) = ∑_{i=1}^{5} (20 ∗ 1) = 100, since it has five groups (one per level of factor OptTech) of size 20.
Design cardinality (a.k.a. design size)

This operation computes the expected number of measurements that should be generated according to a specific design. It takes as input an experiment E. It can only compute the result for classical designs (this is a precondition of the operation). The cardinality is computed as the product of the number of blocks (computed through the analysis operation #blocks) and the number of measurements per block specified in the design (computed through the analysis operation measurementsPerBlock). This operation can be expressed as follows:

#design(E) = #blocks(E) ∗ #measurementsPerBlock(E)

For instance, in the experiment E described in Figure §6.6, #design(E) = 10 ∗ 100 = 1000, since #blocks(E) = 10 and #measurementsPerBlock(E) = 100.
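Both operations can be sketched together as follows, using the group sizes of the example (five groups of size 20, one measurement per object, 10 blocks); the function names are our own:

```python
# Sketch of measurementsPerBlock and #design for a classical design.
def measurements_per_block(group_sizes, measurements_per_object=1):
    """Sum of group size times measurements per object over all groups."""
    return sum(size * measurements_per_object for size in group_sizes)

def design_cardinality(n_blocks, group_sizes, measurements_per_object=1):
    """#design = #blocks * measurementsPerBlock."""
    return n_blocks * measurements_per_block(group_sizes,
                                             measurements_per_object)

groups = [20] * 5                        # one group per optimization technique
print(measurements_per_block(groups))    # 100
print(design_cardinality(10, groups))    # 1000
```
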
Sample size (#sample)

This operation computes the number of objects in the sample. It can only compute the result for classical designs (this is a precondition of the operation). The cardinality is computed as the product of the number of blocks and the sum of the sizes of all groups. This operation can be expressed as follows:

#sample(E) = #blocks(E) ∗ ∑_{g ∈ G} (g.size)

where G is the set of groups defined in the design of the experiment, i.e., E.design.groups, and #blocks is the analysis operation defined above. Note that when #measurementsPerObject is 1 for all the groups of the experiment, #sample(E) = #design(E).
Results cardinality (a.k.a. results size)

This operation computes the total size of the dataset of a specific execution of a lab-pack. We assume that the structure of the dataset is a relation (in the sense of the relational model). This operation can be expressed as follows in relational algebra:

#results(L, ExId) = Ω_{count(∗)}(rel(L.executions[ExId].results))

where L is the lab-pack that contains both the description of the experiment and the results of the executions, and the expression rel(L.executions[ExId].results) provides the results of the execution identified by ExId as a relation.
6.4.2 Operations for validity checking
In this section, we present a set of analysis operations for checking the internal validity of experiments described in SEDL. Given a SEDL document, the operations diagnose possible internal threats in the experiment. The results of these operations could be used to warn experimenters about potential threats and suggest possible fixes. The operations can be divided into two groups: those used to validate the description of the experiment (description validation) and those used to validate the results and statistical tests (execution validation). Also, we distinguish two different severity levels for the results of the operations: warning and error. Warnings show evidence of possible threats to validity, while errors confirm their existence. Table §6.4.2 depicts a summary of the analysis operations for validity checking. For each operation, the threat that it diagnoses, its name, type of validation (description or execution validation) and severity level are shown.

In the following subsections, we define each operation indicating the threat that it detects and how to neutralize or minimize it.
Analysis operations for checking the internal validity of SEDL experiments

Threat    Operation                   Validation    Severity
IVT-2     Random design               Description   Warning
IVT-3     Random conduction           Description   Warning
IVT-5     Blocking factors            Description   Warning
IVT-6     Missing measurements        Execution     Warning/Error
IVT-7     Small sampling              Description   Error
IVT-8     Statistical preconditions   Execution     Error
IVT-9     Multiple comparison         Description   Error
IVT-10    Out of range                Execution     Error
IVT-11    Recommended test            Execution     Warning
Random design

This analysis operation checks whether the sampling, the assignment, and the protocol are randomized. If they are not, the experiment would be more susceptible to external events that may influence the conduction of the experiment and its outcome. This should be reported as a warning of an internal validity threat in the experiment (IVT-2, Section §3.5.1). To minimize this threat, a random sampling, assignment mechanism and protocol should be used.
Random conduction

This analysis operation checks whether the experimental protocol defined in the SEDL document is randomized. If it is not, the results could be strongly biased in the case that certain treatments on objects affect subsequent treatments. This should be interpreted as a warning related to the "testing effects" threat to validity (IVT-3, Section §3.5.1). To minimize this threat, a random protocol should be used.
Blocking factors

This analysis operation checks if there exists a non-controllable factor that is not used for blocking in the design. In such a case, the conclusions of the experiment could be threatened (IVT-5, Section §3.5.1). Since the materialization of the threat is not fully confirmed, this should be interpreted as a warning. To neutralize the threat, all non-controllable factors should be used for blocking. This operation can be expressed as follows:

blockingFactors(E) ⇔ ∃ ncf ∈ N • ncf ∉ E.Design.Blocking

where N stands for the set of non-controllable factors of the experiment, i.e., E.Variables.NCFactors.
Missing measurements

This analysis operation checks if the results of an experimental conduction contain fewer measurements than those specified in the design. If the operation returns true, it means that some measurements of the outcomes failed or were lost (IVT-6, Section §3.5.1). In general, this is considered an error. However, in some cases, a small percentage of missing measurements can be acceptable and therefore this can be interpreted as a warning. To neutralize the threat, the experiment should be repeated. This operation can be defined as follows:

missingMeasurements(L, ExecId) ⇔ #results(L, ExecId) < #design(L.experiment)

where L represents the experimental lab-pack and ExecId the identifier of the execution to check. The analysis operations #results and #design are defined above. For instance, in the first run of experiment #A1 with QoS-Gasp (c.f. Section ??) this operation found missing measurements (fewer results than expected), warning us of a bug in the implementation of the techniques.
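The check itself reduces to comparing the number of recorded result tuples with the design cardinality, as in this sketch (the lab-pack is modeled as a plain dict of our own, not the actual SEDL object model):

```python
# Sketch of the missingMeasurements check.
def missing_measurements(labpack, exec_id, design_size):
    """True if the run recorded fewer tuples than the design prescribes."""
    results = labpack["executions"][exec_id]["results"]
    return len(results) < design_size

# A run that recorded 950 of the 1000 tuples prescribed by the design.
labpack = {"executions": {"E1": {"results": [{"obj": 1.0}] * 950}}}
print(missing_measurements(labpack, "E1", design_size=1000))  # True
```
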
Small sampling

This analysis operation returns true if the size of the sample is smaller than an input parameter s. It is used to diagnose the validity threat derived from using too small a number of experimental objects (IVT-7, Section §3.5.1). Since the minimum accepted sample size varies among experimental disciplines and areas¹, it is provided as a parameter to the operation. To neutralize this threat, the experiment should be repeated with a larger sample. This can be expressed as follows:

smallSample(E, threshold) ⇔ #sample(E) < threshold

where #sample is the analysis operation described above.
Statistical preconditions

As explained in Section §3.4.2, some statistical tests are only applicable to datasets with certain characteristics. For instance, ANOVA is applicable only when the data follow a normal distribution. This operation checks if the dataset specified in a SEDL experiment fulfils all the preconditions required by the statistical tests to be performed. If not, the results would be threatened (IVT-8, Section §3.5.1). This threat is neutralized by repeating the analysis with a set of statistical tests that are suitable for the type of dataset. This can be expressed as follows:

statPrecnd(L, ExId) ⇔ ∃ a ∈ A, ∃ p ∈ a.precond • ¬p.holds(res(L, ExId))

where A is the set of NHST analyses specified for the experiment, a.precond is the set of preconditions of the NHST a, and p.holds is an operation that checks whether the preconditions hold or not.
Multiple comparison

This analysis operation checks if several datasets are being compared using a simple comparison statistical test instead of a specific test for multiple comparisons. If a simple comparison test is used, the error rate could accumulate, leading to erroneous conclusions (IVT-9, Section §3.5.1). To neutralize this threat, the statistical analysis should be repeated using a technique suitable for comparing multiple datasets.
¹ A sample size of 10 could be inappropriate for MOEs but acceptable for an experiment regarding rare diseases.
Out of range

This operation checks if any of the values of the variables is out of range. In that case, the conclusions of the experiment could be clearly biased (IVT-10, Section §3.5.1). To neutralize it, the experiment should be repeated using suitable ranges for the variables. This can be expressed as:

outOfRange(L, ExecId) ⇔ ∃ v ∈ V, m ∈ Projection(res(L, ExecId), v) • m.value ∉ v.levels

where V is the set of variables of the experiment of the lab-pack, i.e., L.experiment.variables, and Projection(res(L, ExecId), v) computes the set of values of variable v in the results of the execution ExecId.
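For range-typed variables like those of experiment #1, the check can be sketched as follows; the declarations mimic the age and weight nuisance variables, and the function name is our own:

```python
# Sketch of the outOfRange check for variables declared with numeric ranges.
def out_of_range(variables, results):
    """variables: name -> (min, max); results: list of measurement tuples."""
    for row in results:
        for name, value in row.items():
            lo, hi = variables[name]
            if not (lo <= value <= hi):
                return True
    return False

variables = {"age": (18, 60), "weight": (40, 120)}
print(out_of_range(variables, [{"age": 25, "weight": 70}]))  # False
print(out_of_range(variables, [{"age": 17, "weight": 70}]))  # True
```
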
Recommended test

This operation checks whether the statistical tests performed on the experimental results are those recommended by a specific analysis methodology. For instance, in this thesis we follow the methodology proposed by Derrac et al. Using the wrong technique may threaten the validity of the conclusions due to the "inaccurate size estimation effect" (IVT-11, Section §3.5.1). The result of this operation should be interpreted as a warning. To neutralize the threat, the experimental analysis should be repeated using one of the tests recommended by the methodology. In our specific implementation, we use the decision tree depicted in Figure §8.6.
6.5 EXTENSION POINTS
In order to provide the flexibility required to support the description of experiments
in different scientific or engineering areas, several extension points have been defined
in SEDL. These are presented in the following subsections.
6.5.1 Context
• Population. This extension point enables a detailed description of the populations
of the experiment. The current version of SEDL supports the description of the
populations in natural language.
6.5.2 Hypotheses
• Assertion. This extension point enables the usage of different languages for specifying the assertions of descriptive hypotheses. Currently, SEDL supports the use
of statistical tests (where the assertion states that the null hypothesis is rejected)
and descriptive statistics.
• Relation. This extension point allows the usage of different languages for specifying the relation among dependent and independent variables in associational
hypotheses. For instance, this extension point could be used to provide support
for relations based on additional kinds of regression (c.f. Table §3.5).
6.5.3 Designs
• Sampling. This extension point enables the accurate and machine-processable description of sampling methods. It could be used to enable the assessment of the
external validity of experiments.
• Assignment. This extension point allows the accurate description of custom assignment methods in order to integrate them into the automatic assessment of the internal validity of experiments.
• Experimental design. This extension point enables the succinct and accurate description of well-known experimental designs defined in the literature such as
factorials, Latin squares and hypercubes, etc.
• Sizing. This extension point enables the description of group sizes based on different elements of the experiment such as expressions using the value of the constants of the experiment, number of levels of variables, and the α used for statistical tests. Currently SEDL supports constant group sizes.
6.5.4 Configurations
• Experimental Procedure. The purpose of this extension point is to enable an accurate description of the procedure of execution of the experiment in different
scientific areas. Currently, two DSLs are provided, one for executing computer
based experiments as sequences of commands, and another for executing optimization tasks in MOFs.
155
CHAPTER 6. SCIENTIFIC EXPERIMENTS DESCRIPTION LANGUAGE
• Settings. This extension point enables the description of the specific equipment
required for experimental conduction in particular scientific environments. For
instance, this extension point could be used to create a DSL for experimental
instruments in Physics, enabling the description of oscilloscopes with a minimum
precision, a mass spectrograph, or particle accelerators with a minimum power.
• Inputs. This extension point enables the use of alternative sources of input data
for experiments such as public online datasets or experimental repositories [87,
202, 218, 223]. Currently SEDL supports the specifications of input files.
• Outputs. This extension point enables the use of alternative mechanisms for storing and publishing the result datasets of experimental executions. Currently, SEDL supports the specification of output files.
6.5.5 Analyses
• Datasets. This extension point enables the usage of user-friendly languages for
specifying the datasets on which experimental analyses should be performed.
Currently, SEDL provides a language for specifying simple operations based on
the operators of relational algebra, namely Filtering, Grouping and Projection
(c.f. Section §6.2.5).
• Analysis specification. This extension point enables the usage of additional types
of analyses and their result format. For instance, this extension point could be
used to enable the generation of histograms and box-plots for exploratory data
analysis.
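To make the Datasets extension point concrete, the three relational operators it names can be sketched over in-memory result records as follows. This is an illustrative sketch only; the record fields (technique, instance, value) and the function names are our own assumptions, not part of SEDL:

```python
# Illustrative sketch of SEDL's dataset operators (Filtering, Grouping,
# Projection) over in-memory result records. Field names are hypothetical.

def filtering(rows, predicate):
    """Keep only the rows satisfying the predicate."""
    return [r for r in rows if predicate(r)]

def grouping(rows, key):
    """Partition rows into groups sharing the same key value."""
    groups = {}
    for r in rows:
        groups.setdefault(key(r), []).append(r)
    return groups

def projection(rows, fields):
    """Keep only the selected fields of each row."""
    return [{f: r[f] for f in fields} for r in rows]

results = [
    {"technique": "RS", "instance": "burma14", "value": 40.2},
    {"technique": "SA", "instance": "burma14", "value": 33.7},
    {"technique": "SA", "instance": "berlin52", "value": 8120.0},
]
sa_rows = filtering(results, lambda r: r["technique"] == "SA")
by_instance = grouping(sa_rows, lambda r: r["instance"])
values = projection(sa_rows, ["value"])
```

A dataset specification such as Filter(OptTech).Group(Instance) would then correspond to composing filtering and grouping in this style.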
6.6 SUMMARY
In this chapter, a language for experimental description named SEDL has been proposed. Its main elements have been introduced through a number of sample experimental descriptions using a concrete syntax based on plain text. Despite the expressiveness and precision in experimental description that SEDL provides, describing experiments is still tedious and time-consuming due to the high number of elements to be specified. In response to these drawbacks, a specific DSL for the description of MOEs based on SEDL is proposed in the next chapter.
7
METAHEURISTIC OPTIMIZATION EXPERIMENTS DESCRIPTION LANGUAGE
If you talk to a man in a language he understands, that goes to his head.
If you talk to him in his language, that goes to his heart.
Nelson Mandela (1918 –),
South African anti-apartheid revolutionary and politician
In this chapter we present MOEDL, a domain-specific language for the description of MOEs based on SEDL. Section §7.1 provides a brief introduction to the language. Section §7.2 describes the structure of a MOEDL document. The types of MOEs supported by MOEDL and their specific syntax in the language are presented in Section §7.3. Section §7.4 presents a set of transformation rules to transform MOEDL documents into SEDL documents. The extension points of the language are described in Section §7.5. Finally, Section §7.6 summarizes the contributions of this chapter.
7.1 INTRODUCTION
In this chapter we present the Metaheuristic Optimization Experiments Description Language (MOEDL), a domain-specific language for the description of MOEs based on SEDL. MOEDL has been designed with the goal of reducing the time and expertise required for describing MOEs with maximum guarantees of validity and replicability. SEDL eases the achievement of those goals since: i) it enables interpreting any MOEDL experiment as a SEDL experiment and checking its validity using the analysis operations defined for SEDL (c.f. Section §6.4); and ii) it frees MOEDL from defining many of the low-level details already supported in SEDL.
Figure §7.1 shows the general structure of a MOEDL description and its specific materialization for a sample experiment. The abstract and XML concrete syntaxes of MOEDL are provided in Appendix §B. The document is divided into three main sections: problems, techniques and configuration. The first includes details about the problem, such as its type and the problem instances to be solved. The second includes information about the metaheuristic techniques used to solve the problem, the termination criterion and the random number generator used. The last includes information concerning the execution configuration.
In this dissertation, the interpretation and analysis of MOEDL documents are performed on the basis of their corresponding SEDL documents. To that purpose, we present a set of transformation rules from MOEDL to SEDL, i.e., any MOEDL document can be automatically transformed into a SEDL document. This approach to the design of MOEDL has important advantages. First, it enables the creation of more succinct experimental descriptions, since we can skip the elements that are common to any MOE and incorporate them into the corresponding SEDL documents during the transformation process. Furthermore, this approach enables grouping several experimental design decisions into alternative choices in MOEDL, reducing the risk of using incompatible designs.
Interpreting MOEDL documents as their corresponding SEDL documents also has several consequences. First, MOEDL should be as simple as possible, removing all the elements that can be delegated to SEDL. Second, it requires support tools to define, select and apply the specific transformation strategy, henceforth named the experimental methodology for MOEDL interpretation1 (c.f. Section §7.4).
1 Given a MOE, it is possible to define several SEDL descriptions that encode the semantics of such an experiment with alternative designs [23, 30]; these are known as experimental methodologies [21].
Figure 7.1: MOEDL structure and its mapping to a sample MOE
Figures §7.2 and §7.3 show a simple MOEDL experiment and its corresponding SEDL counterpart. The experiment describes a simple technique comparison. The optimization problem to be solved is the Traveling Salesman Problem (TSP). The experiment includes two problem instances (burma14 and berlin52, taken from the TSPLib benchmark [1]) and two metaheuristic techniques: a random search and a simulated annealing with an exponential cooling scheme.
MOEDL::EXPERIMENT: SimpleTSP version 1.0 rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
Problem Types: // Travelling Salesman Problem
  TSP('es.us.isa.fomfw.problems.tsp.TSPProblem')
  {
    Objective functions: TourLength
    Instances(file: '${instance}.tsp'):
      burma14
      berlin52
  }
Optimization Techniques(encoding: 'es.us.isa.fomfw.problems.tsp.solutions.TSPSolution'):
  RS(RandomSearch) { }
  SA(SA) {
    initialTemperature: 10000
    neighbourPerIteration: 5
    coolingScheme: Exponential(0.95)
  }
Termination Criterion: MaxTime(1000) // in milliseconds -> 1 second
Random Number Generator: // Mersenne Twister RNDG
  Basic(seed: 174326,
        class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting: Runtimes: Java 1.6 Libraries: FOM 0.5 ...
    Runs:
      E1: start 10:12:27 2013/08/11 finish 20:56:48 2013/08/11
        Result: File '...'
    Analyses:
      A2: Wilcoxon(OptTech, Instance): p-value: 0.00017

Figure 7.2: Sample MOEDL experiment
In order to describe the structure of experimental descriptions in MOEDL, we first depict the elements common to any MOE, and then the particular elements of each specific type of MOE.
EXPERIMENT: SimpleTSP version 1.0 rep: http://moses.us.es/E3 runs: 20
Constants:
  RS-Configuration: 'RandomSearch {}'
  SA-Configuration: 'SimulatedAnnealing {
      initialTemperature: 10000
      neighbourPerIteration: 5
      coolingScheme: Exponential(0.95)
    }'
  TerminationCriterion: 'MaxTime(1000)'
  RandomNumberGenerator: 'class:org.apache.commons.math3.random.MersenneTwister'
  Encoding: 'class:es.us.isa.fomfw.problems.tsp.solutions.TSPSolution'
  ProblemType: 'es.us.isa.fomfw.problems.tsp.TSPProblem'
Variables:
  Factors: OptTech enum RS, SA // Optimization technique
  NCFactors: Instance enum burma14(File: 'burma14.tsp'), berlin52(File: 'berlin52.tsp')
  Outcomes: ObjectiveFunction float // Best value of the obj. func. found
Design:
  Sampling RandomBlock
  Assignment Random Blocking Instance
  Groups by OptTech size 20
  Protocol Random
Analyses: // Use T-Test or Wilcoxon
  A1: T-Test(Filter(OptTech).Group(Instance))
  A2: Wilcoxon(Filter(OptTech).Group(Instance))
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting: Runtimes: Java 1.6 Libraries: FOM 0.5 ...
    Runs:
      E1: start 10:12:27 2013/08/11 finish 20:56:48 2013/08/11
        Result: File '...'
    Analyses:
      A2: Wilcoxon(OptTech, Instance): p-value: 0.00017

Figure 7.3: Example of mapping from MOEDL to SEDL
7.2 MOEDL EXPERIMENTAL DESCRIPTIONS
7.2.1 Preamble
The preamble of a MOEDL document identifies the experiment as a MOE by adding the prefix MOEDL:: to the EXPERIMENT declaration, and specifies the type of experiment and the set of transformation rules to be applied, using the type and methodology properties respectively. Figure §7.4 shows a simple MOEDL experiment of type TechniqueComparison that uses the basic set of transformations described in this dissertation for this type of experiment.
7.2.2 Problem types and instances
MOEDL documents must describe the optimization problems to be solved. The description of each optimization problem comprises i) an identifier of the problem, ii) a definition of the problem, including an enumeration of its objective functions, and iii) the specific problem instances to be solved in the experiment. Problem instances in MOEDL can be described as follows:
• Instance enumeration: The definition of each problem instance contains an identifier and a set of properties. The file containing the data of that particular problem instance is usually specified as a property. Figure §7.4 depicts a sample MOEDL document fragment with a problem instance enumeration. MOEDL provides syntactic sugar for the file property that uses the identifier of the problem instance as a parameter in the file name. This short-cut is supported as a property of the Instances declaration, as shown in the comments of Figure §7.4.
• Benchmark: An optimization benchmark is a published set of instances of a particular optimization problem. It is specified by indicating the download URL and the identifiers of the specific problem instances to be used in the experiment. Figure §7.5 shows a MOEDL document fragment including a benchmark definition.
• Instance generator: This mechanism enables the automated generation of problem instances. Generators are specified as an executable command and a set of parameters. The seed to be used, the number of instances to be generated, and the files to be generated are usually parameters of this command. Figure §7.6 depicts a sample MOEDL document fragment using a problem instance generator.
MOEDL::EXPERIMENT: QoSWSCB1 version 1.0 rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Problem Types: // QoS-aware Composite Web Services Binding
  QoSWSCB(class: 'es.us.isa.qowswc.problem.QoSAwareWebServiceBinding') {
    Objective functions: GlobalQoS
    Instances:
      // More succinct: Instances(file: '${instance}.qoswsc')
      P0(File: 'P0.qoswsc')   // P0, ..., P10
      ...
      P10(File: 'P10.qoswsc')
  }
...

Figure 7.4: Problem instance enumeration supported by MOEDL
MOEDL::EXPERIMENT: TSP-Sample version 1.0 rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Problem Type: // Travelling Salesman Problem
  TSP(class: 'es.us.isa.fomfw.problems.tsp.TSPProblem') {
    Objective functions: TourLength
    Benchmarks:
      TSPLib(file: '${instance}.tsp') {
        url: http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/tsp/
        Instances: berlin52, burma14, tsp225
      }
  ...}

Figure 7.5: Optimization benchmark specification supported by MOEDL
MOEDL::EXPERIMENT: QoSWSCB2 version 1.0 rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Problem Types: // QoS-aware Composite Web Services Binding
  QoSWSCB(class: 'es.us.isa.qowswc.problem.QoSAwareWebServiceBinding') {
    Command(nInstances: 5, seed: 1, outputPath: '/tmp/qowswc/', fileTemplate: 'I${run}.qowsc'):
      'QoSWSCGen.bat ${seed}, ${outputFile}, ${nInstances}, ${fileTemplate}'
  }
...

Figure 7.6: Problem instance generator defined in MOEDL
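The way such a generator command template could be resolved is sketched below with simple string substitution. The expansion rule, in particular resolving ${run} inside the file template per generated instance and composing ${outputFile} from outputPath, is our assumption for illustration, not a documented MOEDL semantics:

```python
# Sketch: expanding the generator command template of Figure 7.6.
# The composition of ${outputFile} and the per-instance ${run} resolution
# are assumptions made for illustration.
from string import Template

command = 'QoSWSCGen.bat ${seed}, ${outputFile}, ${nInstances}, ${fileTemplate}'
params = {"nInstances": 5, "seed": 1,
          "outputPath": "/tmp/qowswc/", "fileTemplate": "I${run}.qowsc"}

def expand(template, params, run):
    # Resolve ${run} inside the file template for this particular instance.
    file_name = Template(params["fileTemplate"]).safe_substitute(run=run)
    output_file = params["outputPath"] + file_name
    return Template(template).safe_substitute(
        seed=params["seed"], outputFile=output_file,
        nInstances=params["nInstances"], fileTemplate=params["fileTemplate"])

cmd = expand(command, params, run=1)
# cmd == 'QoSWSCGen.bat 1, /tmp/qowswc/I1.qowsc, 5, I${run}.qowsc'
```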
7.2.3 Metaheuristic techniques
The description of each metaheuristic technique applied in a MOE is given through its parameters. A parameter in MOEDL is a key/value pair (e.g., populationSize: 100). Parameters can be nested, enabling the creation of complex parameters. For instance, in Figure §7.7 the survival policy is described using two parameters: the mainSelector and the secondarySelector.
The complete grammar of the metaheuristic technique specification language is provided in Appendix §B.3 as an EBNF syntax specification. This grammar specifies two parameters common to any metaheuristic technique, namely the termination criterion and the initialization scheme. Current support for termination criteria specification is described in Section §7.2.4. The initialization scheme defines how the initial solutions used by the metaheuristic are generated. It can be used for hybridizing metaheuristic techniques, specifying that one metaheuristic is used as the initialization scheme of another [267]. Regarding initialization schemes, two alternatives are supported: Random, which initializes the technique with random solutions, and Technique, which uses another metaheuristic technique to generate the initial solutions.
MOEDL::EXPERIMENT: QoSWSCB1 version 1.0 rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Optimization Techniques:
  GRASP1(GRASP) {
    InitializationScheme: Random,
    RCL creation: RangeBased(type: , alpha: 0.25,
      g-function: g6(class: 'es.us.isa.qoswsc.G6'))
    Local improvement: SD()
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
  }
  EACanfora(EA) {
    InitializationScheme: Random,
    populationSize: 100, mutationProbability: 0.01,
    crossoverProbability: 0.7, crossoverSelector: RouletteWheel,
    mutationSelector: RandomSelector,
    survivalPolicy: PrioritizedCompositeSelector(
      mainSelector: ElitistSelector(rate: 2, absolute: true),
      secondarySelector: RouletteWheel)
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBIndividual'
  }
...

Figure 7.7: Metaheuristic technique specification supported by MOEDL
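The nested parameters above (e.g., the survivalPolicy of EACanfora) form a tree of key/value pairs. A minimal sketch of this structure, using a nested dictionary representation of our own choosing rather than MOEDL syntax:

```python
# Sketch: nested MOEDL parameters as a Python dict, flattened to dotted keys.
# The representation is illustrative, not part of MOEDL.

def flatten(params, prefix=""):
    flat = {}
    for key, value in params.items():
        path = prefix + key
        if isinstance(value, dict):      # nested (complex) parameter
            flat.update(flatten(value, path + "."))
        else:                            # leaf key/value pair
            flat[path] = value
    return flat

survival = {"survivalPolicy": {
    "mainSelector": {"type": "ElitistSelector", "rate": 2, "absolute": True},
    "secondarySelector": "RouletteWheel"}}
flat = flatten(survival)
# e.g. flat["survivalPolicy.mainSelector.rate"] == 2
```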
7.2.4 Termination criterion
MOEDL requires the specification of a termination criterion either globally or per
metaheuristic technique. The termination criteria currently supported by MOEDL are:
• Max Time. It specifies a maximum execution time (in milliseconds), e.g. “MaxTime(10000)” specifies an execution of ten seconds.
• Max Iterations. It specifies a maximum number of iterations for the execution of
the metaheuristic technique, e.g. “MaxIterations(100)” specifies the execution of
100 iterations.
• Max Obj. Func. Evaluations. It specifies a maximum number of evaluations of the objective function, e.g. “MaxObjFuncEval(100)”.
• And: This is a composite termination criterion that takes as parameters a set of subordinated termination criteria. The execution of the metaheuristic technique will terminate when all the subordinated termination criteria are met simultaneously. For instance, “MaxTime(10000) AND MaxIterations(100)” specifies that the run will terminate once 10 seconds of execution have elapsed and 100 iterations have been executed; i.e., the metaheuristic could execute 200 iterations if it iterates very quickly, since it still runs until reaching 10 seconds of execution.
• Or: This is a composite termination criterion where the execution of the technique will terminate as soon as one of the subordinated termination criteria is met. For instance, “MaxTime(10000) OR MaxIterations(100)” specifies that the execution will terminate upon reaching either 100 iterations or 10 seconds of execution; i.e., it could perform only 70 iterations if the metaheuristic iterates slowly.
• Repeat for: This is a composite termination criterion that introduces a blocking variable in the experiment: the experiment is repeated for each subordinated criterion. It is only supported as the global termination criterion of the experiment.
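The semantics of the simple and composite criteria above can be sketched as predicates over the state of a run; the state fields (elapsed_ms, iterations) are hypothetical names introduced for illustration:

```python
# Sketch of MOEDL's termination criteria as predicates over the run state.
# Criterion names follow Section 7.2.4; the state fields are assumptions.

def max_time(ms):
    return lambda state: state["elapsed_ms"] >= ms

def max_iterations(n):
    return lambda state: state["iterations"] >= n

def and_criterion(*criteria):           # all subordinated criteria met
    return lambda state: all(c(state) for c in criteria)

def or_criterion(*criteria):            # any subordinated criterion met
    return lambda state: any(c(state) for c in criteria)

stop = and_criterion(max_time(10000), max_iterations(100))
# A fast metaheuristic: 200 iterations done but only 9 s elapsed -> keep running.
assert not stop({"elapsed_ms": 9000, "iterations": 200})
assert stop({"elapsed_ms": 10000, "iterations": 200})
```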
Figure §7.8 shows a sample MOEDL document fragment including a composite
termination criterion with several subordinated criteria (maximum execution times).
Each metaheuristic program will be executed five times, once for each maximum execution time, i.e., 100ms, 500ms, 1000ms, 5000ms and 10000ms.
MOEDL::EXPERIMENT: QoSWSCB1 version 1.0 rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
...
Optimization Techniques:
  GRASP1(GRASP) {
    ...
  CanforaGA(EA) {
    ...
Termination Criterion: RepeatForEach(MaxTime(100), MaxTime(500),
  MaxTime(1000), MaxTime(5000), MaxTime(10000))
Random Number Generator: // Mersenne Twister algorithm
  Basic(seed: 21435,
        class: 'org.apache.commons.math3.random.MersenneTwister')
...

Figure 7.8: Global termination criteria and random number generators in MOEDL
7.2.5 Random number generation algorithm
MOEDL documents should include the definition of the algorithm used to generate
random numbers. This can be defined globally (for all the techniques) or locally (for
each specific technique) as parameters. The algorithm is specified with an identifier
and the name of the class that implements it. When the class is not specified, the default
random number generation algorithm of the runtime specified in the configuration is
assumed. Figure §7.8 illustrates the definition of a random number generator in a
MOEDL document.
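The motivation for recording both the algorithm class and the seed is replicability: the same seeded generator reproduces the same number stream, and thus the same stochastic run. A minimal sketch using Python's standard generator (which, like the class named in Figure §7.8, implements the Mersenne Twister algorithm):

```python
# Sketch: why MOEDL records the RNG algorithm and seed. Two generators
# created with the same algorithm and seed yield identical number streams,
# which is what makes a stochastic experiment replicable.
import random

run1 = random.Random(174326)   # seed taken from the sample experiment
run2 = random.Random(174326)

stream1 = [run1.random() for _ in range(5)]
stream2 = [run2.random() for _ in range(5)]
assert stream1 == stream2      # identical sequences -> replicable runs
```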
7.3 TYPES OF MOES SUPPORTED BY MOEDL
Experiments in MOEDL provide support for the selection, tailoring and tuning
phases of the MPS life-cycle. Next, we describe the specific types of experiments supported by MOEDL.
7.3.1 Selection and tailoring experiments
These experiments are intended to compare either different metaheuristic techniques (selection) or different variants of the same technique, e.g. an EA with one-point crossover vs. an EA with uniform crossover (tailoring). This is done by comparing the solutions provided by each technique on a specific set of problem instances.
As described in Section §7.2.3, the techniques to be compared in an experiment are described in terms of parameters. In addition, MOEDL provides syntactic sugar for specifying the set of variants of a tailoring point to be compared in tailoring experiments. For instance, Figure §7.9 shows the description of different variants of GRASP with three different greedy functions (g-function) and two possible local improvement methods. This translates into the following six variants to be compared:
• GRASP+SD-1, using SD as the algorithm for the local improvement phase of GRASP, and g1 as the greedy function.
• GRASP+SD-2, using SD as the algorithm for the local improvement phase of GRASP, and g2 as the greedy function.
• GRASP+SD-3, using SD as the algorithm for the local improvement phase of GRASP, and g3 as the greedy function.
• GRASP+TS-1, using Tabu Search as the algorithm for the local improvement phase of GRASP, and g1 as the greedy function.
• GRASP+TS-2, using Tabu Search as the algorithm for the local improvement phase of GRASP, and g2 as the greedy function.
• GRASP+TS-3, using Tabu Search as the algorithm for the local improvement phase of GRASP, and g3 as the greedy function.
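The expansion of the Variants declarations of Figure §7.9 into these six variants is a plain cross product of the tailoring points, which can be sketched as:

```python
# Sketch: how the Variants declarations of Figure 7.9 expand into the six
# technique variants listed above, via a cross product of tailoring points.
from itertools import product

local_improvements = ["SD", "TS"]
g_functions = ["1", "2", "3"]          # g1, g2, g3

variants = ["GRASP+%s-%s" % (li, g)
            for li, g in product(local_improvements, g_functions)]
# -> ['GRASP+SD-1', 'GRASP+SD-2', 'GRASP+SD-3',
#     'GRASP+TS-1', 'GRASP+TS-2', 'GRASP+TS-3']
```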
MOEDL::EXPERIMENT: QoSWSCB1 version 1.0 rep: http://moses.us.es/E3
  type: TechniqueComparison, methodology: BasicMOSES, runs: 20
... // The context is omitted for the sake of brevity
Optimization Techniques:
  GRASP+${Local improvement}-${g-function}(GRASP) {
    InitializationScheme: Random,
    RCL creation: RangeBased(alpha: 0.25)
    g-function: Variants {
      Custom('G1', class: 'es.us.isa.qoswsc.G1'),
      Custom('G2', class: 'es.us.isa.qoswsc.G6'),
      Custom('G3', class: 'es.us.isa.qoswsc.G6') }
    Local improvement: Variants {
      SD,
      TS {
        memory: Recency(25),
        aspiration: BestImprovement(),
        termination criterion: MaxIterations(50) }
    }
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPExplorableSolution'
  }
Termination Criterion: MaxTime(1000)
Random Number Generator: // Mersenne Twister algorithm
  Basic(seed: 643281, class: 'org.apache.commons.math3.random.MersenneTwister')

Figure 7.9: Tailoring experiment in MOEDL
7.3.2 Tuning experiments
The goal of tuning experiments is to find the best configuration of a single technique, in terms of parameter values. Thus, in these experiments the solutions provided by a single technique with different parameter configurations are compared.
The parameter values to be used in a MOEDL tuning experiment are described using a parameter space. A parameter space comprises a set of parameter dimensions. Each parameter dimension specifies the domain of one parameter of the metaheuristic technique. For instance, Figure §7.10 shows the definition of a three-dimensional parameter space for an evolutionary algorithm. Each parameter has two possible values; thus, this experiment would result in eight different variants of the metaheuristic program being run, i.e., one for each possible combination of values.
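The enumeration of this parameter space into the eight configurations is a cross product of the parameter dimensions, sketched below with the dimension names and values of Figure §7.10:

```python
# Sketch: enumerating the 2x2x2 parameter space of Figure 7.10 into the
# eight configurations to be compared in the tuning experiment.
from itertools import product

dimensions = {
    "populationSize": [50, 100],
    "mutationProbability": [0.01, 0.05],
    "crossoverProbability": [0.5, 0.8],
}
names = list(dimensions)
configurations = [dict(zip(names, values))
                  for values in product(*dimensions.values())]
assert len(configurations) == 8        # one variant per combination
```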
MOEDL::EXPERIMENT: QoSWSCB-AUX1 version 1.0 rep: http://moses.us.es/E3
  type: TechniqueTuning, methodology: BasicMOSES, runs: 20
...
Optimization Technique:
  EACanfora(EA) {
    InitializationScheme: Random,
    crossoverSelector: RouletteWheelSelector,
    mutationSelector: RandomSelector,
    survivalPolicy:
      PrioritizedCompositeSelector(
        mainSelector: ParentsSelector(
          selector: ElitistSelector,
          rate: 2, absolute: true),
        secondarySelector: OffspringSelector(
          selector: RouletteWheelSelector)
      ),
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_Individual'
  }
Parameters Space:
  Dimensions:
    populationSize enum 50, 100
    mutationProbability enum 0.01, 0.05
    crossoverProbability enum 0.5, 0.8

Figure 7.10: Tuning experiment in MOEDL
7.4 TRANSFORMATION FROM MOEDL TO SEDL
MOEDL documents do not describe fully-fledged experiments, since many details such as the variables or the hypothesis remain implicit. As a result, MOEDL documents do not include all the information required to automate and replicate MOEs. Moreover, the information included in MOEDL experiments is insufficient to apply the analysis operations for validity checking proposed in Section §6.4.
In order to enable the automated execution and analysis of MOEs, we propose a set of transformation rules to transform MOEDL documents into SEDL documents. Given a MOE described in MOEDL, it is possible to define several SEDL descriptions that encode the semantics of such an experiment with alternative experimental design properties, such as the set of treatments or the sampling method; these alternatives are known in the literature as experimental methodologies [21, 23, 30]. Therefore, different methodologies lead to different transformation rules from MOEDL to SEDL experiments. The methodology described in this dissertation transforms MOEDL experiments into SEDL experiments including: i) a differential hypothesis, ii) a complete blocking factorial experimental design, and iii) the null hypothesis statistical tests for the analysis described in [69].
The transformations defined in this section are described as correspondences between MOEDL elements and their corresponding SEDL elements. The elements on both sides (source and target) are described in natural language, where the context is set to an experiment object denoted as exp. Additionally, for a more comprehensible and succinct representation of the transformation rules, we propose a syntax inspired by a simplified version of ATL [153] with the following structure:
FROM <type of MOEDL element>
WHEN <precondition for firing the transformation rule>
TO <type of SEDL element to generate>
HAVING <predicates and statements about the elements generated>
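The intended reading of this notation can be sketched as follows; the toy classes and the rule-application mechanics are illustrative assumptions, not the actual MOEDL-to-SEDL implementation:

```python
# Sketch of the FROM/WHEN/TO/HAVING rule structure as plain functions:
# a rule matches source elements of a given type, checks a precondition,
# and produces target elements. This mirrors the notation only.

class Rule:
    def __init__(self, source_type, when, transform):
        self.source_type = source_type   # FROM
        self.when = when                 # WHEN
        self.transform = transform       # TO/HAVING

    def apply(self, element):
        if isinstance(element, self.source_type) and self.when(element):
            return self.transform(element)
        return None

# Hypothetical toy element type standing in for a MOEDL metamodel class.
class ProblemInstance:
    def __init__(self, file):
        self.file = file

# In the spirit of RULE 3.a below: each problem instance yields an input file.
rule3a = Rule(ProblemInstance,
              when=lambda i: True,
              transform=lambda i: {"type": "File", "name": i.file})

assert rule3a.apply(ProblemInstance("burma14.tsp")) == {"type": "File",
                                                        "name": "burma14.tsp"}
```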
7.4.1 Transformation of common elements
In this section, we describe the transformation of the elements that are common to all types of MOEDL experiments to SEDL.
RULE 1.- This rule copies the elements that belong directly to SEDL or that are not directly related to the experimental methodology used. Specifically, the context, the configurations, and the executions and analyses present in the original MOEDL experiment are copied directly. As a consequence, consistency between the pre-existing analyses copied by this basic transformation and the hypothesis and design generated by the specific set of rules applied is not guaranteed, since those previous analyses could have been generated through a different set of rules.
// RULE 1
FROM MOEDL::Experiment(moe)
TO SEDL::Experiment(exp)
HAVING exp.Id=moe.Id+'bySEDLtoMOEDL' AND exp.context=moe.context AND exp.name=moe.name
  AND exp.metaID=moe.metaID AND exp.annotations=moe.annotations AND exp.notes=moe.notes
  AND exp.configurations=moe.configurations AND exp.design=new DetailedExperimentalDesign()
  AND exp.configurations.add(new Configuration('MOEDL2SEDL-Configuration'))
In the remainder of this chapter, it is assumed that the variables defined in this transformation are available as global variables for the remaining transformation rules, namely moe, denoting the MOEDL metaheuristic optimization experiment to be transformed, and exp, denoting the generated SEDL experiment.
RULE 2.- Currently, the basic transformation from MOEDL to SEDL supports only mono-objective problems, since it generates a single outcome variable named “ObjectiveFunctionValue”2. The levels of “ObjectiveFunctionValue” are defined by intension, and its domain is determined by the domain of the objective function of the problem (in our implementation this variable is rational by default). Strictly speaking, this domain is the union of the domains of the objective function over all problem instances in the experiment. Thus, in order to ensure an appropriate interpretation of the values during the analysis, the experiment should contain problem instances whose objective functions have simple and compatible domains. Note that if all the instances correspond to the same problem type, this requirement is usually met.
// RULE 2
FROM MOEDL::Experiment(moe)
TO SEDL::Variable(outcome)
HAVING exp.variables.contains(outcome) AND outcome.role=Outcome
  AND outcome.Id=ObjectiveFunction AND outcome.type=rational
  AND outcome.domain=Float
RULE 3.- For each problem instance, an input file is generated. Additionally, if there is more than one problem type, or a single problem type with multiple instances, then a nominal variable named “instance” is created. The role of “instance” is characteristic (a non-controllable factor), and it is used as a blocking variable. The domain of the variable is the set of identifiers of the problem instances.
// RULE 3
// RULE 3.a (Input Files)
FROM MOEDL::ProblemInstance(i)
TO SEDL::Configuration::Input(f)
HAVING f.type=File AND f.name=i.file

// RULE 3.b (instance variable),
// the enumeration contains all the instances defined for the experiment
FROM MOEDL::ProblemInstanceEnumeration(instances)
WHEN instances.size()>1
TO SEDL::Variable(instanceVar)
HAVING exp.variables.contains(instanceVar) AND instanceVar.role=Characteristic
  AND instanceVar.Id=instance AND instanceVar.type=Nominal
  AND instanceVar.domain=foreach instance in instances,
    {instance.problemType.Id+'-'+instance.Id}
  AND exp.design.blockingVariables.contains(instanceVar)

2 For multi-objective problems, a variable for each specific objective should be defined. Additionally, a variable denoting the metric used for technique comparison, such as the hypervolume, Pareto front error, etc., should be defined.
RULE 4.- The global random number generation algorithm and the global termination criterion become parameters of the generated SEDL configuration.
// RULE 4.a Global Random Number Generator
FROM MOEDL::RandomNumberGenerator(rng)
WHEN moe.isGlobal(rng)
TO SEDL::Parameter(p)
HAVING exp.configurations->find('MOEDL2SEDL-Configuration').parameters.contains(p)
  AND p.name='RandomNumberGenerator' AND (p.value=rng.class+rng.function)

// RULE 4.b Global Termination Criterion
FROM MOEDL::TerminationCriterion(tc)
WHEN moe.isGlobal(tc)
TO SEDL::Parameter(p)
HAVING exp.configurations->find('MOEDL2SEDL-Configuration').parameters.contains(p)
  AND p.name='TerminationCriterion' AND p.value=tc.toString()
7.4.2 Transformation of Technique Comparison Experiments
The rules for the basic transformation from a MOEDL technique comparison experiment into a plain SEDL experimental description are enumerated next.
The hypothesis of technique comparison experiments is that the specific technique used for optimization has an impact on the value of the objective function of the solutions obtained, i.e., some techniques perform better than others.
RULE TCE-1.- For technique comparison experiments, a factor variable named “technique” and a differential hypothesis relating “ObjectiveFunctionValue” to “technique” are generated. The factor “technique” is a nominal variable whose levels are defined by extension as the set of labels that identify each technique described in the experiment. The specific details of the configuration used for each technique are expressed as parameters of the experiment with the name “${technique.Id}-Configuration”.
// RULE TCE-1
FROM MOEDL::Experiment(moe)
// This means that the experiment is a Technique Comparison
WHEN moe.techniques.size() > 1 OR moe.techniques[0].variants.size() > 0
// TCE-1.a: Factor Variable
TO SEDL::Variable(tech)
HAVING tech.Id='technique' AND tech.role=Factor AND tech.type=Nominal
AND tech.domain={foreach moe.techniques technique, technique.Id}
// TCE-1.b: configuration of techniques as experimental constants (parameters)
AND exp.design.parameters.addAll({foreach moe.techniques techvar,
  new Parameter('#'+techvar.Id+'-Configuration', techvar.parameters)
})
// TCE-1.c: Differential Hypothesis
TO SEDL::DifferentialHypothesis(dh)
HAVING exp.hypothesis=dh AND dh.outcome=exp.design.variables.find('ObjectiveFunctionValue')
AND dh.factors={tech}
RULE TCE-2. Regarding the experimental design, the transformation encodes a
basic methodology with a complete blocking factorial design, where all the variables
that are neither outcomes nor factors are blocking variables; i.e., experiments are replicated for all possible combinations of levels of such variables. This technique is used for
dealing with experiments that specify multiple problem instances, as described in the previous section. The design contains as many groups as optimization techniques, whose size is the
number of runs specified in the MOE. The methodology implements a simple experimental protocol based on a sequence of treatment and measurement in all the groups
(the treatment assigns the variable “technique” the level corresponding to the specific
technique associated with the group).
// RULE TCE-2 (Fully blocking factorial experimental design)
FROM MOEDL::Experiment(moe)
// This means that the experiment is a Technique Comparison
WHEN moe.techniques.size() > 1 OR moe.techniques[0].variants.size() > 0
TO SEDL::FullySpecifiedExperimentalDesign(ddsgn)
HAVING exp.design.detailedDesign=ddsgn AND ddsgn.assignmentMethod=new RandomAssignmentMethod()
// One Group per Technique
AND ddsgn.groups.addAll(foreach moe.techniques tech,
  new Group('G-'+tech.Id,
    // The group is associated with a specific technique run
    new Valuation(exp.design.variables.find('technique'), tech.Id)),
  moe.NRuns) // The size of the group is the number of runs
// Experimental protocol:
AND ddsgn.protocol=new ExperimentalProtocol()
AND foreach(foreach ddsgn.group g,
  // 1.- Treatment (tech. run), the variable valuation of the group determines the tech.
  ddsgn.protocol.addStep(g, new Treatment(g.variableValuations)),
  // 2.- Measurement of the obj. func. for the best solution found.
  ddsgn.protocol.addStep(g, new Measurement(exp.design.variables.find('ObjectiveFunctionValue'))))
RULE TCE-3. Regarding the experimental analysis specification, the basic methodology implemented through the transformation follows the guidelines described in
Section §3.4.2. Consequently, the transformation generates the specification of factorial ANOVA with repeated measures as the primary analysis technique. If the premises
of ANOVA are not met, Friedman with the Holm post-hoc procedure is specified
as the secondary analysis option. This analysis methodology is consistent with most of
the guidelines provided in the literature [69, 116, 195, 248]. In this case, the filtering
is performed for each level of the “technique” variable, usually generating a multiple
comparison test among datasets of objective function values obtained with different
optimization techniques³.
// RULE TCE-3 Analysis Specification
FROM MOEDL::Experiment(moe)
// This means that the experiment is a Technique Comparison
WHEN moe.techniques.size() > 1 OR moe.techniques[0].variants.size() > 0
TO SEDL::ExperimentalAnalysisSpecification(ans)
HAVING exp.design.analysisSpecification=ans
// Factorial ANOVA with repeated measures as primary analysis
AND ans.addAnalysis(
  new NHST('FactorialANOVAwRD',
    // Dataset for the analysis (according to the design created by rule TCE-2)
    new DataSetSpec({
      new FilterSet(exp.design.variables.find('technique')),
      new Grouping(exp.design.variables.find('instance')),
      new Projection(exp.design.variables.find('ObjectiveFunctionValue'))}),
    // Default alpha is 0.05 (this param. is configurable in the actual impl.)
    0.05).add(new PostHoc('Tukey',
      ...[The same Dataset specified above], 0.05)))
// Friedman with Holm's post-hoc as secondary analysis
AND ans.addAnalysis(
  new NHST('Friedman', ...[The same Dataset specified above], 0.05).add(
    new PostHoc('Holm', ...[The same Dataset specified above], 0.05)))
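The secondary (non-parametric) analysis path generated by this rule — a Friedman test over per-instance blocks, followed by Holm-corrected pairwise post-hoc comparisons — can be sketched as follows. This is a minimal illustration, not the MOSES implementation: the function name `friedman_holm`, the data layout, and the use of a pairwise sign test are assumptions; ties are ignored, and the closed-form chi-square survival function covers only an even number of degrees of freedom (i.e., an odd number of techniques).

```python
import math
from itertools import combinations

def friedman_holm(results, alpha=0.05):
    """results[i][j] = objective value of technique j on problem instance i
    (lower is better). Returns the Friedman p-value and Holm-adjusted
    pairwise p-values. Illustrative sketch only."""
    n, k = len(results), len(results[0])
    # 1. Rank techniques within each block (problem instance).
    rank_sums = [0.0] * k
    for block in results:
        for rank, j in enumerate(sorted(range(k), key=lambda j: block[j]), 1):
            rank_sums[j] += rank
    # 2. Friedman chi-square statistic with k-1 degrees of freedom.
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    # Closed-form chi-square survival function (valid for even df only).
    m = (k - 1) // 2
    p_friedman = math.exp(-chi2 / 2) * sum((chi2 / 2) ** i / math.factorial(i)
                                           for i in range(m))
    # 3. Holm step-down correction over pairwise two-sided sign tests.
    raw = []
    for a, b in combinations(range(k), 2):
        wins = sum(1 for block in results if block[a] < block[b])
        tail = sum(math.comb(n, i) for i in range(min(wins, n - wins) + 1)) / 2 ** n
        raw.append(((a, b), min(1.0, 2 * tail)))
    raw.sort(key=lambda t: t[1])
    adjusted, running = [], 0.0
    for i, (pair, p) in enumerate(raw):
        # Holm multiplies the i-th smallest p by (m - i) and enforces monotonicity.
        running = max(running, min(1.0, p * (len(raw) - i)))
        adjusted.append((pair, running))
    return p_friedman, adjusted
```

With three techniques whose results are consistently ordered across five instances, the Friedman p-value falls below the default alpha of 0.05, triggering the pairwise post-hoc comparisons.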
7.4.3 Transformation of Technique Tuning Experiments
The rules for our transformation of MOEDL tuning experiments into plain SEDL
experiment descriptions are shown next. The hypothesis of this kind of experiment is that the parameter values used for
optimization have an impact on the value of the objective function of the obtained
solutions, i.e., that some configurations are better than others for the set of problem
instances of the experiment. A complete blocking factorial design is generated, where
the hypothesis has a single factor variable named “configuration”, whose levels are the
combinations of possible values of each dimension of the parameter space.
RULE TTE-1. For technique parametrization experiments, a single factor variable
named “configuration” is created. This variable denotes the combination of parameter
values used for optimization. A differential hypothesis relating “ObjectiveFunctionValue” to “configuration” is also created. The factor “configuration” is a nominal
variable whose levels are defined by extension as the set of points in the parameter space, i.e., all possible combinations of values of the parameter dimensions. The
specific value of each parameter dimension is associated with the corresponding level of
“configuration” as part of its definition⁴.

³ In our current implementation the transformation takes into account the number of levels of “technique” and generates the corresponding single comparison tests for simple comparisons.

Figure 7.11: Transformation from MOEDL to SEDL
// RULE TTE-1
FROM MOEDL::Experiment(moe)
// This means that the experiment is a Technique Parametrization
WHEN moe.techniques.size() = 1 AND moe.techniques[0].variants.size() = 0
// TTE-1.a: Factor Variable for configurations
TO SEDL::Variable(config)
HAVING config.type=Nominal AND config.Id='configuration' AND config.role=Factor
AND config.domain=Permutations({moe.parametersSpace.dimensions})
// TTE-1.b: Differential Hypothesis
TO SEDL::DifferentialHypothesis(dh)
HAVING exp.hypothesis=dh AND dh.outcome=exp.design.variables.find('ObjectiveFunctionValue')
AND dh.factors={config}
RULE TTE-2. For technique tuning experiments, the specific details of the constant
elements of the configuration of the optimization technique are stored in a parameter
of the experiment named “Technique-Configuration”.
// RULE TTE-2
FROM MOEDL::Experiment(moe)
// This means that the experiment is a Technique Parametrization
WHEN moe.techniques.size() = 1 AND moe.techniques[0].variants.size() = 0
TO SEDL::Parameter(tech)
HAVING tech.Id='Technique-Configuration' AND tech.value=moe.techniques[0].parameters
AND exp.design.parameters.add(tech)
RULE TTE-3. The transformation encodes a basic methodology with a complete
blocking factorial design, where the “instance” variable is used as a blocking factor. The
design contains as many groups as the “configuration” variable has levels, and the size of each group
is the number of runs specified in the MOE. The methodology implements a simple
experimental protocol based on a sequence of treatment and measurement in all the
groups (the treatment assigns the variable “configuration” the level corresponding to
the specific parameter values associated with the group).
// RULE TTE-3 (Fully blocking factorial experimental design)
FROM MOEDL::Experiment(moe)
// This means that the experiment is a Technique Parametrization
WHEN moe.techniques.size() = 1 AND moe.techniques[0].variants.size() = 0
TO SEDL::FullySpecifiedExperimentalDesign(ddsgn)
HAVING exp.design.detailedDesign=ddsgn AND ddsgn.assignmentMethod=new RandomAssignmentMethod()
// One Group per Configuration
AND ddsgn.groups.addAll(
  foreach Permutations({moe.parametersSpace.dimensions}) config,
  new Group('G-'+config.Id,
    // The group is associated with a specific parameter configuration
    new Valuation(exp.design.variables.find('configuration'), config.Id)),
  moe.NRuns) // The size of the group is the number of runs
⁴ In our transformation the generation of such a set of values is performed through the Permutations function, which provides the set of permutations of the space dimensions as levels, and configures each level with the specific value for each dimension. The identifiers of the levels are generated sequentially.
// Experimental protocol:
AND ddsgn.protocol=new ExperimentalProtocol()
AND foreach(foreach ddsgn.group g,
  // 1.- Treatment (tech. run), the variable valuation of the group determines the config.
  ddsgn.protocol.addStep(g, new Treatment(g.variableValuations)),
  // 2.- Measurement of the obj. func. for the best solution found.
  ddsgn.protocol.addStep(g, new Measurement(exp.design.variables.find('ObjectiveFunctionValue'))))
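The level generation performed by the Permutations function used in this rule can be sketched as the cartesian product of the parameter-space dimensions, with sequential level identifiers. The names below (`permutations_of`, the `"config-%d"` identifier scheme, the dimension tuples) are illustrative assumptions, not the actual MOSES code:

```python
from itertools import product

def permutations_of(dimensions):
    """Enumerate the cartesian product of the parameter-space dimensions,
    assigning each resulting level a sequentially generated identifier.
    `dimensions` is a list of (name, possible_values) pairs."""
    names = [name for name, _ in dimensions]
    levels = []
    for i, values in enumerate(product(*(vals for _, vals in dimensions))):
        levels.append({"Id": "config-%d" % i, **dict(zip(names, values))})
    return levels

space = [("populationSize", [50, 100]), ("mutationRate", [0.01, 0.05])]
levels = permutations_of(space)
# Yields four levels, config-0 .. config-3, one per combination of values.
```

Each level then becomes a value of the “configuration” factor, and rule TTE-3 creates one experimental group per level.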
RULE TTE-4. Regarding the experimental analysis specification, the basic methodology implemented through the transformation follows the guidelines described in Section §3.4.2. Consequently, the transformation generates the specification of factorial
ANOVA with repeated measures as the primary analysis technique. If the premises of
ANOVA are not met, Friedman with the Holm post-hoc procedure is specified as the
secondary analysis option. This analysis methodology is consistent with most of the
guidelines provided in the literature [69, 116, 195, 248]. The only difference between
this transformation rule and rule TCE-3 is the way results are filtered for comparison.
In this case, the filtering is performed for each level of the “configuration” variable, usually generating a multiple comparison test among datasets of objective function values
obtained with different configurations⁵.
// RULE TTE-4 Analysis Specification
FROM MOEDL::Experiment(moe)
// This means that the experiment is a Technique Parametrization
WHEN moe.techniques.size() = 1 AND moe.techniques[0].variants.size() = 0
TO SEDL::ExperimentalAnalysisSpecification(ans)
HAVING exp.design.analysisSpecification=ans
// Factorial ANOVA with repeated measures as primary analysis
AND ans.addAnalysis(
  new NHST('FactorialANOVAwRD',
    // Dataset for the analysis (according to the design created by rule TTE-3)
    new DataSetSpec({
      new FilterSet(exp.design.variables.find('configuration')),
      new Grouping(exp.design.variables.find('instance')),
      new Projection(exp.design.variables.find('ObjectiveFunctionValue'))}),
    // Default alpha is 0.05 (this param. is configurable in the actual impl.)
    0.05).add(new PostHoc('Tukey',
      ...[The same Dataset specified above], 0.05)))
// Friedman with Holm's post-hoc as secondary analysis
AND ans.addAnalysis(
  new NHST('Friedman', ...[The same Dataset specified above], 0.05).add(
    new PostHoc('Holm', ...[The same Dataset specified above], 0.05)))
⁵ In our current implementation the transformation takes into account the number of levels of “configuration” and generates the corresponding single comparison tests for simple comparisons.
7.5 EXTENSION POINTS
MOEDL provides the following extension points:
• Metaheuristic optimization experiment. The main extension point of MOEDL is the
experiment definition itself. Thus, new types of metaheuristic optimization experiments with additional sections can be created.
• Problem type and instances specification. This extension point enables the integration of DSLs for specifying optimization problems and instances. Through this
extension point, languages such as AMPL [102], GAMS [70] or MPS could be integrated into MOEDL.
• Metaheuristic optimization technique. This extension point enables the integration
of DSLs for specifying metaheuristic optimization techniques, including their tailoring and tuning. The specific languages used by the MOFs identified in
Chapter §5 could be integrated into MOEDL through this extension point.
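One way a MOEDL processor could expose these extension points is through a registry mapping each point to the language-specific handlers plugged into it. The sketch below is purely hypothetical: the class, method names and extension-point identifiers are assumptions for illustration, not part of MOEDL:

```python
class ExtensionRegistry:
    """Hypothetical registry for the three MOEDL extension points listed
    above; handlers (e.g., parsers for external DSLs) are registered per
    extension point and looked up by language identifier."""

    EXTENSION_POINTS = ("experiment-type",
                        "problem-specification",
                        "technique-specification")

    def __init__(self):
        self._handlers = {point: {} for point in self.EXTENSION_POINTS}

    def register(self, point, language_id, handler):
        # Reject unknown extension points so the ecosystem stays consistent.
        if point not in self._handlers:
            raise ValueError("unknown extension point: %s" % point)
        self._handlers[point][language_id] = handler

    def handler_for(self, point, language_id):
        return self._handlers[point].get(language_id)

registry = ExtensionRegistry()
# A third party could plug, e.g., an AMPL parser into the
# problem-specification extension point:
registry.register("problem-specification", "AMPL", lambda text: ("parsed", text))
```

Under this scheme, processing a MOEDL document would dispatch each embedded problem or technique specification to the handler registered for its language.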
7.6 SUMMARY
In this chapter, a language for describing MOEs named MOEDL has been proposed.
Its main elements have been introduced through a number of sample experimental descriptions using a plain text syntax. MOEDL supports the description of MOEs in an
expressive, succinct and precise way. This language paves the way for the automated execution of MOEs, enabling the partial validation of their internal consistency through the analysis operations described in Section §6.4. A brief description
of MOEDL was published in [215].
8 MOSES: A META-HEURISTIC OPTIMIZATION SOFTWARE ECOSYSTEM
Do you know what my favorite renewable fuel is? An ecosystem for innovation.
Thomas Friedman, 1953–
American journalist
In this chapter we present a software ecosystem for the development of MPS-based solutions (MOSES). In Section §8.1 we introduce the chapter, motivating the need for a
software ecosystem. In Section §8.2 the key features of the ecosystem are enumerated.
The design of the ecosystem is detailed in Section §8.3. In this chapter we also present a reference implementation of the ecosystem and its main components. Section §8.5 depicts our vision
of a fully fledged software ecosystem for metaheuristic optimization. Finally, in Section §8.6 we
summarize the chapter.
8.1 INTRODUCTION
The survey of metaheuristic frameworks presented in Chapter §5 revealed several
limitations and obstacles in the current tool support for the MPS and MOE life-cycles,
among others:
• There are too many MOFs. Up to 34 MOFs were identified in the literature. This
proliferation of MOFs is not sustainable. In fact, about half of those tools have
been discontinued or abandoned.
• There is no universal MOF. No MOF supports all the metaheuristic
techniques proposed. Also, the features of MOFs are dispersed, and strong discrepancies appear between the scores in different areas.
• There is a lack of interoperability and reuse between MOFs. The only integration
feature supported by some MOFs is the capability of loading problem instances
in standard formats.
• MOFs are big already, and they are growing. Most MOFs have hundreds of
classes (cf. Figure §5.6). The size of a framework is an indirect measure of its
complexity and therefore of the steepness of its learning curve.
• MOFs are evolving from software frameworks for implementing metaheuristics toward optimization problem solving packages with minimal experimentation
support. There exists a trend to extend MOFs with a broader set of features for
supporting more activities in the MPS life-cycle and the experimentation process. MOFs are adding capabilities for experiment design, charting and reporting, batch execution and monitoring of jobs, and statistical analysis, among others.
• There is no widespread adoption. Except for ECJ [179], ParadisEO [41] and HeuristicLab [283], MOFs are used mainly by their developers, with a small number of
external users.
Summarizing, there exists a scenario where a number of organizations create, maintain and evolve competing software tools (the MOFs) that try to support a similar set of
processes (the MPS life-cycle). These tools are not only competing, they are also
complementary, since none provides support for all the activities and variants of the
MPS and MOE life-cycles. This creates a complex environment with high variability
and an absence of interoperability, where portability and reuse of developments and
experiments among MOFs is extremely difficult.
Considering the current state of the art, it does not seem that proposing yet another MOF would contribute to facilitating the development of MPS-based solutions. Thus,
in this dissertation we adopt a different approach: we model the set of tools supporting the MPS life-cycle as a software ecosystem that we have coined the Metaheuristic
Optimization Software EcoSystem (MOSES). Software ecosystems are an emerging topic
in software engineering promoting families of products developed
by independent contributors [53]. This approach enables embracing the high variability and multi-organizational structure of MPS in a natural way. It also supports the
integration of the different tools, which can be both competitors and complementary, into
the architecture of the ecosystem.
In the following sections, we describe the key features of MOSES, outlining the main
aspects of its design. Then, we present MOSES[RI], a reference implementation of the
ecosystem, detailing its main components. Finally, we present our vision of the future
of MOSES as an extensible integrated development environment for the implementation of MPS-based solutions.
8.2 FEATURES
The key features taken into account in the inception and development of MOSES can
be summarized by differentiating between functional and non-functional features. Functional features define what must be supported, while non-functional features define
constraints on how that support should be provided. The functional features required
by our ecosystem are described below:
FF.1 Aid and support for the design and development of metaheuristic algorithms: Executing the MPS life-cycle requires implementing one or several
metaheuristic programs. As described in Chapter §5, this feature is currently
fulfilled by MOFs. We consider this feature mandatory.
FF.2 Experimental design support: Since several activities of the MPS life-cycle
involve decision-making through experimentation, the activities of the experimental process should also be supported by the ecosystem. Consequently,
some guidance in the design of the most common decision-making experiments used in the MPS life-cycle should be provided. As a first
approach to support this feature, the ecosystem should enable the automated
analysis of experiments and lab-packs using the analysis operations described
in Chapter §6.
FF.3 Experiment execution support: Once the metaheuristic algorithm is implemented and a suitable experimental design has been identified, the experiment must be conducted. The ecosystem should automate such execution and allow users to monitor and manage the optimization process involved; i.e., it should support the automated conduction of MOEs.
FF.4 Results analysis support: After experiment execution, the results obtained
must be analyzed in order to confirm or disprove the hypothesis of the experiment. In the case of metaheuristic experiments, such analysis is usually performed by means of statistical techniques. The specific statistical tests that must
be used in the analysis of results depend both on the experiment design and
on the number and nature of the variables of the experiment. As a result, the
ecosystem should support as many statistical tests as possible, and aid in the
appropriate selection and use of a test for a given dataset or lab-pack.
FF.5 Report generation: The ecosystem should provide reports and charts on the
results of experiments, such as histograms, tables, etc.
Regarding the non-functional features required by our ecosystem we have identified the following:
NFF.1 High performance: Since one of our goals is to speed up the execution of
the MPS life-cycle, the architecture of the software ecosystem should promote high performance. This requirement suggests the use of distributed
and parallel computing capabilities. This feature is especially important for experimental conduction, since some experiments can take weeks to complete.
NFF.2 Incremental deployment: The architecture of the ecosystem will have a
number of components, and each implementation of such components can
provide a specific set of features in order to fulfil its responsibilities. This
structure creates a high degree of variability, with a high number of possible
configurations, for instance, R + JCLEC + R, SPSS + HeuristicLab + Excel,
or FOM + STATService. The ecosystem should support the installation/deployment and uninstallation/undeployment of components as required by
its users. Each component of the ecosystem should be able to work in isolation, and integrate with other components as they are deployed/installed.
This implies keeping the dependency relationships among the components
untangled, and protecting their logic from failures in related components, in
order to avoid the “nothing works until everything works” syndrome. Since
services are deployed, managed and versioned independently, the adoption
of a service-oriented architecture could help us to support this feature.
NFF.3 Self-awareness & self-description: The architecture of the software ecosystem should enable the enumeration of the components available in the ecosystem, and the evaluation of whether the set of features provided is enough for executing the required activities. In order to enable
such a level of self-awareness, each component should be able to describe its
supported features and evaluate its ability to successfully fulfill a given request. For instance, a specific statistical analysis component should provide
a mechanism to answer the following question: Can you perform a non-parametric multiple comparison analysis of a dataset with a binary dependent variable and a single active independent variable with 3 levels?
NFF.4 Interoperability: The inter-organizational nature of the software ecosystem
increases the likelihood of requiring the integration and automation of
information flows. As a consequence, interoperability is a must for MOSES.
NFF.5 Openness:
(a) User openness: At least one open source implementation of each component should be available, in such a way that full support of the MPS
life-cycle is available to users with zero licensing costs.
(b) Contributor openness: The variability of the applications, and of the different variants and configurations of metaheuristics, is huge. For instance,
readers can refer to the alternatives for implementing the crossover
operator enumerated in Table §A.2. As a consequence, the ecosystem
should be open to incorporating new implementations of its components,
and some of those implementations should also be open to incorporating
changes by third parties, in order to fulfil their specific needs in their
application domains. In order to keep the ecosystem fully functional, such variability must be modelled and controlled.
Introducing the appropriate variation points in the ecosystem architecture and in the underlying implementations is crucial to achieve such
flexibility and control. In this context, users can extend and customize
the components' behaviour without losing stability or features that are not customized. This flexibility is known as the open-closed principle, usually stated as: “software entities (classes, modules, functions, etc.) should
be open for extension, but closed for modification” [188]. The fulfilment of
this requirement by the tools implies the clear definition of the extension points and interfaces to be used by the contributors.
(c) License model: Our experience in developing industry-ready tools (over
14 tools in the last 5 years¹) has caused us to reflect upon many aspects [270].
One of the lessons learned is that distributing or licensing our tools as
Open Source Software to a certain extent facilitates not only their adoption by third parties but also their upkeep by some of these users. On occasion, some of these third parties find it interesting to contribute to
the tool's evolution and maintenance by adding new features or evolving existing ones, or even creating new related tools in order to build a
software ecosystem. Currently, MOSES is licensed under LGPLv3 [147].
8.3 REFERENCE ARCHITECTURE

8.3.1 Architectural Style
A plethora of architectural styles and patterns has been proposed in the literature
[25, 53, 103], and systems can exhibit a combination of one or more styles and patterns. Service-Oriented Architecture (SOA) is proposed as the architectural style for the
ecosystem, since it provides elements to achieve a majority of the non-functional
features stated above. SOA defines the architecture of a software system as a collection of distributed components that provide or consume services. Those services are
reusable, distributed, loosely coupled, and accessible using standardized protocols and
data formats. SOA fits in a natural way the multi-organizational and distributed nature that is typical of software ecosystems, since services can be developed,
maintained, deployed and consumed by different organizations.
Adopting SOA allows the fulfilment of features NFF.2, NFF.3 and NFF.4. It supports
the fulfilment of NFF.2 since services are loosely coupled and independently versioned
and deployed. It supports the fulfilment of NFF.3 since it introduces mechanisms for
registering and querying the available implementations of a service. Since services are
technologically agnostic by definition, NFF.4 is also fulfilled.

¹ www.isa.us.es/tools
Regarding features NFF.1 and NFF.5, adopting SOA provides both advantages and
drawbacks. On the one hand, the distributed nature of
SOA and the routing and mediation mechanisms it supports enable the parallel
and distributed use of several service implementations, which contributes to achieving
high performance. On the other hand, the use of standard protocols and distributed
computing mechanisms for invocation introduces latency and overhead on each invocation. Since optimization problem solving with metaheuristics usually implies a high
computational cost, we consider that the adoption of SOA is convenient regarding feature NFF.1. However, the use of the infrastructure elements required for achieving some of the advantages of SOA can undermine NFF.5a, since the available open
source implementations are heavyweight and require powerful computing platforms.
Moreover, the use of such infrastructure elements creates a complex development and
configuration scenario, which also undermines NFF.5b.
Furthermore, the combined use of SOA and web services standards provides responses to some of the software architecture challenges implied by the creation of software
ecosystems [36], namely:
• Interface stability: When using web service standards such as WSDL+XML
Schema, the room for backward-compatible changes to the services' contracts
is wide [83]. The independent versioning and deployment of the services allows
maintaining multiple versions of the same service while the backward compatibility constraint is met. The impact of changes on the interface can also be reduced through
the use of more granular, independently maintained services. This challenge is
intimately related to that of extensibility, since the use of the extensibility mechanisms of the standards² allows maintaining the general interface of the service
stable while extending the functionality provided through it.
• Integration of external contributions: The integration challenge in the context of
software ecosystems has three dimensions: user interface, workflow and data.
Using SOA and web service standards allows addressing this challenge in two
of those dimensions. In particular, it provides standards for integrating data
using XML, JSON, etc., and for integrating workflows through standardized service
interfaces and orchestration engines and languages like BPEL.

² And more specifically XML Schema, which is used to describe the structure of the input and output parameters of the operations in a web service.

Figure 8.1: Template MOSES component
8.3.2 Abstract Component View
The abstract component view of MOSES specifies the components of its architecture
by describing the interface implementations3 they must provide and they can require.
Thus, this view does not provide information about any implementation in particular,
i.e., it is implementation agnostic or abstract.
In order to fulfil features NFF.2, NFF.3 each MOSES component provides two optional interfaces named Validator and CapacityEvaluator, and requires an optional interface named Binder (see Figure §8.1)The validator interface has the responsibility of
checking the syntactic and semantic correctness of the information used in the domainspecific services provided by the components. Validator interface enables the integration of the property checking mechanisms described in Chapter §6 into MOSES by
incorporating these checks into the corresponding validation logic. Moreover, it constitutes a mechanism to protect some participants from the failures that could happen
in others and therefore, it contributes to fulfil requirement NFF.2.
The CapacityEvaluator interface supports the partial fulfilment of requirement NFF.3 by
defining, among the wide set of functional features that could be provided by a participant,
the specific subset that a given component actually provides. For instance, the
ExperimentalDataAnalyzer participant is responsible for performing analyses on the data
generated by experiment executions. However, the set of possible analyses on experimental
data is huge, and not all of them are required in a specific area (for instance, MOEs are
usually analysed using statistical tests of hypotheses).
Consequently, a software component such as STATService playing the role of
ExperimentalDataAnalyzer can state, through its CapacityEvaluator interface, whether it
can perform a specific analysis on an experimental data set. It is worth mentioning that this
interface also contributes to the fulfilment of requirement NFF.2, since the invocation of
an analysis that is not supported by the ExperimentalDataAnalyzer would result in an
error.

3 For the sake of brevity, we refer to an interface implementation simply as an interface except when confusion may arise.
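To make the contract of these general-purpose interfaces concrete, the following Java sketch renders them as generics. Only the names Validator and CapacityEvaluator come from the architecture; the method signatures and the ToyAnalyzer class are illustrative assumptions, not part of the MOSES specification.

```java
// Hypothetical sketch of the general-purpose MOSES component interfaces.
// Only the names Validator and CapacityEvaluator come from the text;
// the method signatures are illustrative.
import java.util.List;

public class MosesInterfacesSketch {

    /** Checks syntactic and semantic correctness of domain-specific inputs (NFF.2). */
    interface Validator<T> {
        List<String> validate(T artefact); // empty list means "valid"
    }

    /** States whether a concrete component actually supports a given request (NFF.3). */
    interface CapacityEvaluator<T> {
        boolean supports(T request);
    }

    /** A toy ExperimentalDataAnalyzer that only supports a fixed set of tests. */
    static class ToyAnalyzer implements CapacityEvaluator<String>, Validator<String> {
        private final List<String> supported = List.of("wilcoxon", "friedman");

        public boolean supports(String analysis) {
            return supported.contains(analysis);
        }

        public List<String> validate(String analysis) {
            return supports(analysis)
                    ? List.of()
                    : List.of("Unsupported analysis: " + analysis);
        }
    }

    public static void main(String[] args) {
        ToyAnalyzer analyzer = new ToyAnalyzer();
        // Rejecting an unsupported analysis before execution protects
        // other participants from failures (NFF.2).
        System.out.println(analyzer.supports("friedman")); // true
        System.out.println(analyzer.validate("anova"));    // [Unsupported analysis: anova]
    }
}
```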
For the sake of clarity, the ports of the interfaces provided are shown on the left, the
ports of the services required are shown on the right, and the ports of the general-purpose
services (Validators, CapacityEvaluators, and Binder) are shown at the bottom. A component
template showing this distribution is depicted in Figure §8.1 as a UML component diagram.
The abstract component view of MOSES is depicted in Figure §8.2 as a UML Component
Diagram using the SOAML profile [256]. It is important to note that each element in this
architectural diagram is a participant that defines a component type. Consequently, each
participant can be implemented by different tools in different instantiations of the
ecosystem. The elements stereotyped as “core” constitute MOSES [CORE]. The set of
components depicted in Figure §8.2 is described below:
Metaheuristic Development Platform. It is a core participant since it provides the
basic mechanisms for incorporating problem knowledge into a metaheuristic and
implementing metaheuristic algorithms. Consequently, any software component playing
the role of this participant should support the use of one or more MOFs.
This component provides two specific interfaces named ProjectGenerator and Packager.
The project generation interface is responsible for creating development projects
based on MOEDL MetaheuristicOptimizationExperiments. Those projects enable the
implementation of metaheuristic algorithms by developers. The MOFs supported by
MOSES would be used for such implementation, and consequently the corresponding
software artefacts should be included in the project as libraries, dependencies or
references (depending on the implementation technology and project type).
The packaging service is responsible for packaging such development projects into
a SEA lab-pack that can be used to interact with the remaining participants of the
ecosystem.

Figure 8.2: Components of MOSES
ExperimentalExecutor. It is a core participant since it has the responsibility of executing
experiments. In so doing, it requires the information regarding the experiment to be stored
as a SEA lab-pack. Consequently, this participant provides two specific interfaces named
Executor and Deployer. The former is responsible for executing an experiment with
a specific configuration. According to SEA, the experiment should be described as a
SEDL BasicExperiment or as a domain-specific experiment type that extends SEDL
Experiment. This constitutes an important variation point, where different software
components playing the role of this participant in particular ecosystem instances can
provide support for different domain-specific experiment types. Consequently, this
participant provides the interfaces CapacityEvaluator<Experiment> and Validator<Experiment>.
The execution of an experiment is a process that can take a long time, and the
transmission of lab-packs for service invocation in distributed environments can involve
a heavy load4. Consequently, in order to fulfil NFF.1, the participant supports
the deployment of experiment lab-packs in its execution environment through the Deployer
interface. A Validator<SEALabpack> interface is also provided. This participant can
optionally consume the services provided by most of the remaining participants of the
ecosystem, in order to automate the publication of experimental lab-packs in
repositories (through a Deployer interface), the analysis of the results of the
executions performed (through an Analyzer interface), and the generation of reports
about them (through a ReportGenerator interface).
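The deploy-then-execute interaction motivated by NFF.1 can be sketched as follows. The Executor and Deployer names come from the architecture, but the signatures and the in-memory ToyExecutor are illustrative assumptions, not the actual E3 API.

```java
// Hypothetical sketch: deploying a lab-pack once and then executing it by
// reference avoids re-transmitting tens of megabytes on every run (NFF.1).
// All names except Executor and Deployer are illustrative.
import java.util.HashMap;
import java.util.Map;

public class ExecutorSketch {

    interface Deployer {
        String deploy(byte[] labPack);        // returns an identifier
    }

    interface Executor {
        String execute(String labPackId, String configuration);
    }

    /** Toy in-memory executor: stores deployed lab-packs and "runs" them. */
    static class ToyExecutor implements Deployer, Executor {
        private final Map<String, byte[]> deployed = new HashMap<>();
        private int nextId = 0;

        public String deploy(byte[] labPack) {
            String id = "labpack-" + (nextId++);
            deployed.put(id, labPack);
            return id;
        }

        public String execute(String labPackId, String configuration) {
            if (!deployed.containsKey(labPackId))
                throw new IllegalArgumentException("Unknown lab-pack: " + labPackId);
            return "executed " + labPackId + " with " + configuration;
        }
    }

    public static void main(String[] args) {
        ToyExecutor e3 = new ToyExecutor();
        String id = e3.deploy(new byte[]{1, 2, 3}); // heavy upload happens once
        // Subsequent runs only transmit the identifier and configuration.
        System.out.println(e3.execute(id, "C1"));
    }
}
```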
ExperimentalDataAnalyzer. It is a core participant since statistical analysis and hypothesis
testing are essential for drawing the right conclusions from the results obtained in the
experiments. It provides the interface Analyzer and, consequently, it also provides the
interfaces CapacityEvaluator<AnalysisRequest> and Validator<AnalysisRequest>.
This participant can optionally require a ReportGenerator interface in order to generate
reports from the analyzed data.
The additional participants defined in MOSES are:
ExperimentalRepository. This participant has the responsibility of storing experimental
information and allowing its querying and retrieval. The capability of deploying
and retrieving experimental information is achieved by exposing a Deployer
service that enables the upload and download of SEA lab-packs. The capability of
querying the deployed experimental lab-packs is achieved through the Finder service
provided. There exist some tools that partially fulfil the role of this component, such
as the Optimization knowledge base [244] or the reproducible research repositories
[87, 202, 218, 223].

4 The size of the lab-packs of our application experiments is in the order of tens of megabytes.
ExperimentalDesigner. A proper experimental design is essential to get the right answers
to our research questions and solve the problems at hand. This participant is
responsible for aiding in the creation of appropriate experimental descriptions. It provides
the interface Designer, which generates experimental descriptions based on DSLs (such
as MOEDL) and other kinds of experiment specifications (typically less formal than
SEDL). Moreover, through the DesignExpander interface this participant is responsible
for expanding experimental designs; i.e., it takes a PredefinedExperimentalDesign
and returns a FullySpecifiedExperimentalDesign as output. It is worth noting that
there exist types of experimental designs whose expansion is not possible. For instance,
the treatments and measurements generated by a RacingAlgorithmDesign and
some ResponseSurfaceDesigns are not predictable until the experiment is executed.
Additionally, this participant provides the interface Validator<ExperimentalDesign> and
a capacity evaluator for each exposed service.
ExperimentalReportGenerator. This participant has the responsibility of generating
charts and reports in an automated or semi-automated way. It helps the actors to draw
conclusions from the results of the experiment and make decisions in the context of the
MPS life-cycle. Sample tools that could play the role of this participant are plugins or
scripts for office suites, or ad-hoc modules. It provides a single domain-specific
interface named ReportGenerator, together with generic interfaces for validating
requests and for evaluating the capability of specific implementations to generate reports.
8.4 MOSES REFERENCE IMPLEMENTATION (MOSES[RI])
In this section we present a reference implementation of MOSES, MOSES [RI]. This
implementation is composed of three main components: a MOF (FOM), an Experimental
Execution Platform (E3), and a statistical analysis tool (STATService). FOM
was fully described in Chapter §5, and STATService is described later in this chapter.
E3 enables the automated analysis and execution of SEDL and MOEDL experiments.
For details about E3 and SEA we refer the reader to Appendixes §F and §E respectively.

Figure 8.3: MOSES[RI] component diagram
Figure §8.3 depicts a UML component diagram of MOSES[RI] showing how it extends
the main components of MOSES. As illustrated, FOM plays the MetaheuristicsDevelopmentEnvironment role, E3 plays the ExperimentExecutor role, and STATService plays
the ExperimentalDataAnalyzer role.
The current version of MOSES [RI] defines components, interfaces and exchange
data formats. However, all these elements must still be integrated to release their full
potential. This is part of our future work. Figure §8.8 depicts a prototype interface
of our vision of such an integrated development environment for MPS-based solutions. In such an integrated environment, users could create MPS projects for solving
optimization problems, choosing the specific MOF to be used for implementation. The
system could aid in this decision using the data from our comparative framework, showing which MOFs provide better support for the techniques that the user plans to apply.
Additionally, the system could integrate E3 and STATService in order to automate the
execution and analysis of the experiment.
Figure 8.4: MOSES[RI] deployment diagram
8.4.1 STATService
STATService is a suite of on-line software tools to perform the most usual Null Hypothesis Statistical Tests (NHST) in the field of metaheuristics. The tool provides several user interfaces, including a web portal, which makes it extremely easy to use. The
tool supports a variety of both parametric and non-parametric tests, together with an
integrated decision tree that automatically selects the most suitable tests for the input
dataset. Additionally, the tool assists in the interpretation of the results (using colours,
tables and graphs), which eases drawing conclusions even for amateur users. The web
portal of STATService is available at http://moses.us.es/statservice.
Interfaces
STATService can be accessed through four different interfaces, namely:
• A distributable open-source Java package with all the implementations of the
tests, so that they can be integrated into other Java applications.
• A web portal that allows importing data and applying the tests and post-hoc
analyses from any standard browser. The input data formats supported are: comma
separated values (CSV), plain text with user-defined delimiters, and MS Excel
spreadsheets.
• XML web services that allow the programmatic invocation of STATService from
any computer platform and programming language in a distributed and standards-based way.
• A MS Excel plugin which (like the web portal) aims to ease the use of STATService directly through the spreadsheet interface.

Figure 8.5: Architecture and users of STATService
Design
The architecture of STATService is described as a UML component diagram in
Figure §8.5. The diagram has been decorated with additional images that better
describe its elements.
The architecture of STATService is conceived for creating a distributed system
where users can access the statistical analysis functionality ubiquitously through
the Internet. Figure §8.5 shows two different nodes represented as 3D boxes, namely: the
server where STATService is deployed and the user's computer.
The XML web services implement the interface defined by the ExperimentalDataAnalyzer participant, which integrates STATService as a component of MOSES. This allows
a seamless interaction with other MOSES components, independently of the client's
platform and development technology.
Both the web and XML interfaces use a common core implementation of the statistical analysis logic. This architecture reduces the implementation burden, promotes
reuse, and ensures the consistency of results between both interfaces. The core test
implementation consists of a set of statistical tests developed by the authors, and a refactoring and re-design of the code base provided by the SCI2S research group as a companion to their papers [111]. Additionally, this core test implementation integrates
some tests from the JavaNPST library [69] and from the statistics package of the Apache
Commons Math library [12]. The different components of the core implementation are
shown zoomed on the right side of Figure §8.5.
Statistical tests supported
STATService implements a wide set of parametric (pairwise and multiple comparison), non-parametric (pairwise and multiple comparison), normality and homoscedasticity tests. It also provides post-hoc procedures for multiple comparison. The tests
implemented are shown in Appendix §D. Besides this, STATService offers a service
called SMARTest that selects the best set of tests to carry out a statistical comparison
according to a specific methodology. This service analyses properties such as normality or homoscedasticity using statistical tests, and executes the best test according to
the decision tree shown in Figure §8.6. It performs an evaluation of the premises of
parametric tests (except for independence), and chooses the specific test to be applied
based on the methodology described in [69].
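The selection logic of such a decision tree can be sketched as a simple dispatch on the properties of the data. The following Java fragment is illustrative only: SMARTest's actual tree is the one in Figure §8.6, and it decides normality and homoscedasticity by running statistical tests, whereas here those properties are plain inputs.

```java
// Hypothetical sketch of a test-selection decision tree in the spirit of
// SMARTest. Real implementations decide normality/homoscedasticity with
// statistical tests (e.g., Shapiro-Wilk, Levene); here they are inputs.
public class TestSelectorSketch {

    static String selectTest(int numGroups, boolean paired,
                             boolean normal, boolean homoscedastic) {
        boolean parametricOk = normal && homoscedastic; // premises of parametric tests
        if (numGroups == 2) {
            if (parametricOk) return paired ? "paired t-test" : "t-test";
            return paired ? "Wilcoxon signed-rank" : "Mann-Whitney U";
        }
        // Multiple comparison (more than two techniques under comparison).
        if (parametricOk) return "ANOVA";
        return "Friedman"; // followed by post-hoc procedures such as Holm
    }

    public static void main(String[] args) {
        // Two metaheuristics, paired runs, non-normal results:
        System.out.println(selectTest(2, true, false, true));  // Wilcoxon signed-rank
        // Five techniques, premises of parametric tests hold:
        System.out.println(selectTest(5, false, true, true));  // ANOVA
    }
}
```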
Reporting results
For each p-value provided as a result, STATService provides its value, the value of
the statistic, the distribution used to compute it (including its degrees of freedom), and
the significance level that should be used for rejecting the null hypothesis H0 (using
0.05 by default for usual tests and the adjusted value for post-hoc procedures). STATService generates rankings and the complete table of p-values for non-parametric multiple comparisons and post-hoc procedures respectively. The use of coloured fonts for the
p-values, and the links to information regarding the tests, make it
straightforward to draw conclusions from the statistical information.
When used through its web interface, STATService uses the font colour to convey the
meaning of the p-value, in order to help users interpret it (using red for H0
rejection, and green for H0 non-rejection). Furthermore, STATService shows the results
of the evaluation of the decision tree step by step, and it describes the alternative tests
that should be used if the assumptions and considerations made during the evaluation
of the decision tree are not met. In turn, it shows links that allow applying the
alternative tests in the decision tree directly. Figure §8.7 shows some snapshots of the
web portal interface of STATService.
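As a generic illustration of the adjusted significance levels used in post-hoc procedures, the following sketch implements the Holm step-down adjustment. It is a standard textbook procedure and is not claimed to be STATService's exact implementation; the procedures actually offered are listed in Appendix §D.

```java
import java.util.Arrays;

// Generic example of the Holm step-down procedure used to adjust the
// significance level in post-hoc multiple comparisons.
public class HolmSketch {

    /** Returns, for each p-value (in the original order), whether H0 is rejected. */
    static boolean[] holmRejections(double[] pValues, double alpha) {
        int m = pValues.length;
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) order[i] = i;
        // Visit p-values from smallest to largest.
        Arrays.sort(order, (a, b) -> Double.compare(pValues[a], pValues[b]));

        boolean[] reject = new boolean[m];
        for (int i = 0; i < m; i++) {
            double threshold = alpha / (m - i); // step-down adjusted level
            if (pValues[order[i]] <= threshold) {
                reject[order[i]] = true;
            } else {
                break; // first non-rejection stops the procedure
            }
        }
        return reject;
    }

    public static void main(String[] args) {
        // Three pairwise comparisons at family-wise alpha = 0.05.
        boolean[] r = holmRejections(new double[]{0.01, 0.04, 0.03}, 0.05);
        System.out.println(Arrays.toString(r)); // [true, false, false]
    }
}
```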
Figure 8.6: Decision tree used for test selection

8.5 USING MOSES

8.5.1 MOSES Studio
In order to ease the development of MPS solutions, we have incorporated a web
application called MOSES Studio. MOSES Studio not only provides a MOEDL/SEDL document editor, but also several facilities to validate and analyze
MOEDL/SEDL documents. Its user interface is summarized in the diagram of Figure
§8.8. As the figure shows, MOSES Studio presents the information according to the kind
of document loaded. Its main features are the following.
1. Basic document management operations (the File menu depicted in the figure), including:
• create default MOEDL/SEDL descriptions to ease writing them from
scratch.
• create scenarios to launch several analysis operations by using a simple JavaScript
syntax.
• perform typical file system operations with the documents and scenarios,
such as: open, save, save as, print, or rename.
• download documents and scenarios in several formats, namely SEDL, MOEDL,
SEDL/MOEDL serialised as XML, and SEA lab-pack.
2. Common advanced operations that apply to any experiment description (the Main
window menu), including:
• Experiment editing, including facilities such as undo, redo, copy, paste, and
find.
• Experiment validation by means of a single validate button that performs
different checks and explanations depending on the kind of experiment
currently loaded.
• Experiment conduction through E3 by means of a single button.
3. SEDL-specific operations that are included in the Tool menu when a SEDL description is loaded. The supported operations are:
• To analyse the experiment conduction. Such an analysis may check whether the validity of the experiment is threatened, using the automated analysis operations
described in Chapter §6. If threats are detected, an explanatory report with
advice for their neutralization is shown, and the affected sections of the experimental description are highlighted.
4. MOEDL-specific operations that are included in the Tool menu when
a MOEDL description is loaded. The supported operations are:
• To obtain the equivalent SEDL description.
5. Repository management, which allows adding or removing research repositories and
performing searches by different criteria, such as optimization problem or instances,
experimental subjects (author), technique, etc.
The MOEDL and SEDL editor depicted in Figure §8.8 provides the following features:
1. Syntax highlighting.
2. Auto-completion.
The experimental scenario developer, depicted in Figure §8.8, allows launching
several analysis operations consecutively by using a simple JavaScript notation; the
results are shown in a log window. Note that this scenario developer allows experimenting with more than one analysis operation, configuration and experimental
execution.
8.6 SUMMARY
In this chapter, we presented a software ecosystem for the development of MPS-based
solutions. This ecosystem sets the basis for the integration of the currently disparate
metaheuristic tools. In particular, we described the features and the design of our
ecosystem MOSES in detail, and we proposed a reference implementation of its main
components. Finally, we glimpsed our vision of the future of the ecosystem as an
open IDE for the development of MPS-based solutions. A subset of the contributions
presented in this chapter was presented at the national conference MAEB. In [211] the
concept of a software ecosystem for metaheuristic optimization was presented. In [214]
we proposed STATService. Finally, in [215] MOEDL was proposed.
(a) Home page of STATService
(b) STATService data reviewer & editor
(c) Test selection form
(d) Results provided by STATService (decision tree, ranking and p-values)

Figure 8.7: Snapshots of the STATService web portal

Figure 8.8: MOSES Studio user interface navigability.

PART IV
VALIDATION
9
VALIDATION
PART IV
VALIDATION
9
VALIDATION
At the heart of science is an essential balance between two seemingly contradictory attitudes – an openness
to new ideas, no matter how bizarre or counter-intuitive they may be, and the most ruthless sceptical scrutiny
of all ideas, old and new. This is how deep truths are winnowed from deep nonsense.
Carl Sagan,
1934–1996
American astronomer, exobiologist and writer
In this chapter, we report the results of MOSES validation. In particular, we explain
how we used the ecosystem to implement, evaluate and analyze two MPS applications
in the context of search-based software engineering: QoS-Gasp and ETHOM. Section
§9.1 introduces how the validation was undertaken. Sections §9.2 and §9.3 explain how we
used MOSES along the MPS life-cycles of QoS-Gasp and ETHOM respectively. Finally, some
conclusions are summarized in Section §9.4.

9.1 INTRODUCTION
As part of our thesis, we developed two specific algorithms, QoS-Gasp and ETHOM,
to solve two significant problems in the context of search-based software engineering.
These problems and algorithms are summarized in the following sections and fully described in Appendixes §G and §H. The development, evaluation and analysis of both
algorithms were tedious, error-prone and extremely time-consuming. This was one of
the main reasons that motivated us to propose a set of tools to reduce the cost of using
metaheuristics in the context of software engineering. In this chapter, we report the
results of using MOSES to replicate all the experiments performed during the development of QoS-Gasp and ETHOM. This served as a validation of the ecosystem,
assessing the gains it provides in real settings.
For each problem, we explain how we used the tools of the ecosystem along the
MPS life-cycle. In each phase, we used MOEDL to describe the experiments and their results. Also, we created a lab-pack for each experiment including the
instances of the problem. Once the MOEDL documents were ready, we used E3 to
automatically check their internal validity. This helped us to detect and fix a bug in
the implementation of QoS-Gasp. Then, we again used E3 to load the experiments described in MOEDL and run them automatically using, among others, our framework
FOM. Finally, after the execution of each experiment, we used STATService to perform
all the required statistical tests automatically and obtain conclusions.
9.2 QOS-AWARE COMPOSITE WEB SERVICES BINDING
In service-oriented scenarios, applications are created by composing atomic services
and exposing the resulting added-value logic as a service. When several alternative service providers are available for composition, quality of service (QoS) properties such
as execution time, cost, or availability are taken into account to make the choice, leading to the creation of QoS-aware composite web services. Finding the set of service
providers that results in the best QoS is an NP-hard optimization problem. To address
this problem, we propose QoS-Gasp, a metaheuristic algorithm for performing QoS-aware web service composition at runtime. QoS-Gasp is a hybrid approach that combines GRASP with Path Relinking. For the evaluation of our approach we compared
it with related metaheuristic algorithms found in the literature. The experiments show
that when results must be available in seconds, QoS-Gasp improves the results of previous proposals by up to 40%. Besides this, QoS-Gasp found better solutions than any of
the runs of the compared techniques in 92% of the runs when results had to be available in 100ms. The complete description of QoS-Gasp and the experimental results are
fully reported in Appendix §G. This work has been submitted to the IEEE Transactions
on Services Computing journal.
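For illustration, a typical global QoS function for a sequential composition aggregates per-provider attributes by summing execution times and costs and multiplying availabilities. The following sketch is a generic, textbook-style example; the exact objective functions used in the experiments are those defined in Appendix §G and [43], and the attribute values and weights below are illustrative.

```java
// Generic sketch of QoS aggregation for a sequential composition: one
// candidate provider is bound to each abstract task. Attribute values and
// the weighting are illustrative, not those of the QoS-Gasp experiments.
public class GlobalQoSSketch {

    static double globalQoS(double[] time, double[] cost, double[] availability,
                            double wTime, double wCost, double wAvail) {
        double totalTime = 0, totalCost = 0, totalAvail = 1;
        for (int i = 0; i < time.length; i++) {
            totalTime += time[i];          // times add up along the sequence
            totalCost += cost[i];          // costs add up as well
            totalAvail *= availability[i]; // availabilities multiply
        }
        // Lower time/cost and higher availability are better; a weighted
        // aggregation turns the three attributes into a single objective.
        return wAvail * totalAvail - wTime * totalTime - wCost * totalCost;
    }

    public static void main(String[] args) {
        double q = globalQoS(new double[]{0.2, 0.3}, new double[]{1.0, 2.0},
                             new double[]{0.99, 0.98}, 1.0, 0.1, 10.0);
        System.out.println(q); // a single scalar the metaheuristic maximizes
    }
}
```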
In the following subsections, we provide a complete description of how MOSES
supported the design and validation of QoS-Gasp on each one of the phases of the
MPS life-cycle.
9.2.1 Selection
Regarding the selection phase, it is worth noting that QoS-Gasp is not our first
attempt to design an effective algorithm for solving the QoSWSCB problem. On the
contrary, a hybrid of an Evolutionary Algorithm and Tabu Search was initially proposed
for solving the problem, since it is a simple hybridization of the most widely used
metaheuristic for this problem [43]. The results of this algorithm were good for small
instances of the problem, and it was published in [210]. However, the algorithm did not
provide good results for large problems. Next, an Ant Colony Optimization algorithm
was designed, but its results were unsatisfactory. Finally, we explored the possibilities
of using GRASP and its hybridization with Path Relinking for solving this problem.
This process of selection by educated guessing, trial and error took a long time, which
was one of the main motivations for the development of MOSES. Summarizing, the
selection of the metaheuristic was performed based on the authors' experience, the known
properties of previous proposals, and the specific metaheuristic algorithms applied in
most cases. Since our first efforts to solve this problem date back to 2008, MOSES
could not be applied for the selection of such approaches, except for QoS-Gasp. In the
remainder of this section, we refer exclusively to the application of the MPS life-cycle
aimed at designing and validating QoS-Gasp.
We used MOSES to compare four different metaheuristics and select the one providing the best results for the QoSWSCB problem. More specifically, we compared Genetic Algorithms (GA) [43], a hybrid of Tabu Search with Simulated Annealing (TS+SA)
[166], and our approaches, GRASP and GRASP with Path Relinking (GRASP+PR).
For the comparison, we used two different objective functions (a.k.a. global QoS functions) reported in the literature. Figures §9.1 and §9.2 show the MOEDL documents
used for the experiments. We will refer to these experiments as Exp1 and Exp2 respectively. For each experiment, we created a lab-pack with 11 instances of the problem.
MOEDL::EXPERIMENT: QoSWSCB1 version 1.0 rep: http://moses.us.es/E3
type: TechniqueComparison, methodology: BasicMOSES, runs: 20
Problem Types:
  QoSWSCB('es.us.isa.qowswc.problem.QoSAwareWebServiceBinding')
  Objective functions: GlobalQoS
  Instances (file: 'Problem-${i}.qoswsc'):
    P1, P2, P3, P4, P5, P6, P7, P8, P9, P10
Optimization Techniques:
  GRASP1(GRASP) {
    Initialization Scheme: Random,
    RCL creation: { type: RangeBased, alpha: 0.25,
      g-function: { Custom('G1', class: 'es.us.isa.qoswsc.G1') } }
    Local improvement: SD
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution' }
  GRASP+PR${g-function}: {
    Technique: PR,
    initial solution selector: RandomSelector,
    Initialization Scheme: Technique(GRASP) {
      Initialization Scheme: Random,
      RCL creation: RangeBased { alpha: 0.25,
        g-function: Variants {
          Custom('G1', class: 'es.us.isa.qoswsc.G1'),
          Custom('G1', class: 'es.us.isa.qoswsc.G1'),
          Custom('G1', class: 'es.us.isa.qoswsc.G1') }
      Local improvement: SD,
      encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
      }
    },
    guiding solution selector: RandomSelector, eliteSetSize: 20, relinkingSteps: 50
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution' }
  EACanfora(EA) {
    Initialization Scheme: Random,
    populationSize: 100, mutationProbability: 0.01,
    crossoverProbability: 0.7, crossoverSelector: { type: RouletteWheel }
    mutationSelector: RandomSelector, survivalReplacer:
      PrioritizedCompositeSelector {
        mainSelector: { type: ElitistSelector, rate: 2, absolute: true }
        secondarySelector: { type: RouletteWheel }
      }
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBIndividual' }
  TS+SA(class: 'es.us.isa.qoswscb.technique.HybridTSandSA') {
    memory: Recency(100), Aspiration: BestImprovement,
    cooling scheme: Exponential(r: 0.95), initial temperature: 10000,
    neigbours per iteration: 5,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBExplorableSolution'
  }
Termination Criterion: RepeatForEach(MaxTime(100), MaxTime(500),
  MaxTime(1000), MaxTime(5000), MaxTime(10000))
Random Number Generator: // Mersenne twister algorithm
  Basic(seed: 231752, class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting: Runtimes: Java 1.6 Libraries: FOM 0.5

Figure 9.1: Selection experiment for QoSWSC (Exp 1)
MOEDL::EXPERIMENT: QoSWSCB2 version 1.0 rep: http://moses.us.es/E3
type: TechniqueComparison, methodology: BasicMOSES, runs: 20
Problem Types:
  QoSWSCB('es.us.isa.qowswc.problem.CanforaProblemDefinition')
  Objective functions: CanforasGlobalQoS
  Instances (file: 'Problem-${i}.qoswsc'):
    P1, P2, P3, P4, P5, P6, P7, P8, P9, P10
  QoSWSCB Objective functions: GlobalQoS
Optimization Techniques:
  GRASP1(GRASP) {
    Initialization Scheme: Random,
    RCL creation: { type: RangeBased, alpha: 0.25,
      g-function: { Custom('G1', class: 'es.us.isa.qoswsc.G1') } }
    Local improvement: SD
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution' }
  GRASP+PR${g-function}: {
    Technique: PR,
    initial solution selector: RandomSelector,
    Initialization Scheme: Technique(GRASP) {
      Initialization Scheme: Random,
      RCL creation: RangeBased { alpha: 0.25,
        g-function: Variants {
          Custom('G1', class: 'es.us.isa.qoswsc.G1'),
          Custom('G1', class: 'es.us.isa.qoswsc.G1'),
          Custom('G1', class: 'es.us.isa.qoswsc.G1') }
      Local improvement: SD,
      encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
      }
    },
    guiding solution selector: RandomSelector, eliteSetSize: 20, relinkingSteps: 50
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution' }
  EACanfora(EA) {
    Initialization Scheme: Random,
    populationSize: 100, mutationProbability: 0.01,
    crossoverProbability: 0.7, crossoverSelector: { type: RouletteWheel }
    mutationSelector: RandomSelector, survivalReplacer:
      PrioritizedCompositeSelector {
        mainSelector: { type: ElitistSelector, rate: 2, absolute: true }
        secondarySelector: { type: RouletteWheel }
      }
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBIndividual' }
  TS+SA(class: 'es.us.isa.qoswscb.technique.HybridTSandSA') {
    memory: Recency(100), Aspiration: BestImprovement,
    cooling scheme: Exponential(r: 0.95), initial temperature: 10000,
    neigbours per iteration: 5,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCBExplorableSolution'
  }
Termination Criterion: RepeatForEach(MaxTime(100), MaxTime(500),
  MaxTime(1000), MaxTime(5000), MaxTime(10000))
Random Number Generator: // Mersenne twister algorithm
  Basic(seed: 6482752, class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting: Runtimes: Java 1.6 Libraries: FOM 0.5

Figure 9.2: Selection experiment for QoSWSC (Exp 2)
Before running the experiments, we used E3 to check the validity of the MOEDL
documents automatically. No threats to validity were detected.
9.2.2 Implementation
For the implementation of the metaheuristic program we used our framework FOM
since it supports all the techniques under comparison in this experiment. Additionally,
this was helpful to test the framework in a new application scenario.
9.2.3 Tailoring
In order to find the best variant of GRASP+PR for QoS-Gasp we performed an additional experiment (Exp3), in which we compared seven greedy functions
and three specific values of α (the greediness parameter). Figure §9.3 depicts the MOEDL
document used to describe this experiment. Again, we used E3 to check the validity of
the MOEDL documents automatically. Interestingly, the operation “missing measurement” detected an error invalidating our results, i.e., the number of measurements did
not match the expected number of measurements in the design. After some debugging
we found that the problem was a bug in the implementation of the tailorings. We
fixed the bug and repeated the experiment and the analysis, with no further threats
detected.
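The α parameter compared in this experiment controls the greediness of the restricted candidate list (RCL) built in the construction phase of GRASP. A range-based RCL, in the spirit of the RangeBased setting used here, can be sketched as follows. This is generic code for a minimization greedy function; the actual greedy functions G1–G7 are those defined in Appendix §G.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Generic sketch of range-based RCL construction in GRASP. With alpha = 0
// the choice is purely greedy; with alpha = 1 it is purely random.
public class RclSketch {

    /** Candidates whose greedy value g(c) is within alpha of the best enter the RCL. */
    static List<Integer> buildRcl(double[] g, double alpha) {
        double gMin = Double.POSITIVE_INFINITY, gMax = Double.NEGATIVE_INFINITY;
        for (double v : g) { gMin = Math.min(gMin, v); gMax = Math.max(gMax, v); }
        double threshold = gMin + alpha * (gMax - gMin);
        List<Integer> rcl = new ArrayList<>();
        for (int i = 0; i < g.length; i++)
            if (g[i] <= threshold) rcl.add(i);
        return rcl;
    }

    /** One construction step: a random choice restricted to the RCL. */
    static int pickCandidate(double[] g, double alpha, Random rnd) {
        List<Integer> rcl = buildRcl(g, alpha);
        return rcl.get(rnd.nextInt(rcl.size()));
    }

    public static void main(String[] args) {
        double[] g = {5.0, 1.0, 3.0, 2.0};
        System.out.println(buildRcl(g, 0.25)); // [1, 3]: values <= 1 + 0.25*(5-1) = 2
        System.out.println(buildRcl(g, 0.0));  // [1]: purely greedy
    }
}
```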
MOEDL::EXPERIMENT: QoSWSCBA1 version 1.0 rep: http://moses.us.es/E3
type: TechniqueComparison, methodology: BasicMOSES, runs: 20
Problems:
  QoSWSCB Objective functions: GlobalQoS
    class: 'es.us.isa.qowswc.problem.QoSAwareWebServiceBinding'
  Instances (file: 'A1-P${instance}.qoswsc'):
    P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10
Optimization Techniques:
  GRASPTech(GRASP) {
    Initialization Scheme: Random,
    RCL creation: RangeBased(alpha: Variants { 0.25, 0.5, 0.75 }),
    g-function: Variants {
      Custom('G1', class: 'es.us.isa.qoswsc.G1'),
      Custom('G2', class: 'es.us.isa.qoswsc.G2'),
      Custom('G3', class: 'es.us.isa.qoswsc.G3'),
      Custom('G4', class: 'es.us.isa.qoswsc.G5'),
      Custom('G5', class: 'es.us.isa.qoswsc.G6'),
      Custom('G6', class: 'es.us.isa.qoswsc.G6'),
      Custom('G7', class: 'es.us.isa.qoswsc.G7')
    }
    Local improvement: SD,
    encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
  }
Termination Criterion: MaxTime(1000)
Random Number Generator: // Mersenne twister algorithm
  Basic(seed: 3723421,
    class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting: Runtimes: Java 1.6 Libraries: FOM 0.5

Figure 9.3: Tailoring experiment for QoSWSC (Exp 3)
9.2.4 Tuning
For tuning QoS-Gasp we performed an additional experiment (Exp4) to search for the best combination of values for the parameters of path relinking, namely: the number of solutions in the elite set generated by GRASP, the number of elite solutions, the number of paths per iteration, and the number of neighbours to explore per path. Figure §9.4 shows the MOEDL document describing this experiment. The operations for the automated validation of the experiment revealed no threats.
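To give an idea of the procedure being tuned, and assuming an integer vector encoding (one candidate service per abstract task), a basic path relinking walk from an initial solution toward a guiding elite solution could be sketched as follows. The encoding and the fitness interface are illustrative assumptions, not the FOM code:

```java
import java.util.function.ToDoubleFunction;

// Sketch of path relinking: at each step, one position where the current
// solution still differs from the guiding solution is set to the guide's
// value, and the best intermediate visited along the path is returned.
public class PathRelinking {

    public static int[] relink(int[] initial, int[] guiding,
                               ToDoubleFunction<int[]> fitness) {
        int[] current = initial.clone();
        int[] best = initial.clone();
        double bestFitness = fitness.applyAsDouble(best);
        boolean moved = true;
        while (moved) {
            moved = false;
            for (int i = 0; i < current.length; i++) {
                if (current[i] != guiding[i]) {
                    current[i] = guiding[i];   // one relinking step toward the guide
                    double f = fitness.applyAsDouble(current);
                    if (f > bestFitness) {     // maximizing global QoS
                        bestFitness = f;
                        best = current.clone();
                    }
                    moved = true;
                    break;
                }
            }
        }
        return best;
    }
}
```

The parameters tuned in Exp4 control how many elite solutions feed this walk and how many such paths and neighbours are explored per iteration.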
MOEDL:: EXPERIMENT: QoSWSCBA2 version 1.0 rep: http://moses.us.es/E3
type: TechniqueParametrization, methodology: BasicMOSES, runs: 20
Problem Types:
  QoSWSCB('es.us.isa.qowswc.problem.QoSAwareWebServiceBinding')
    Objective functions: GlobalQoS
    Instances (file: 'A1-P${instance}.qoswsc'):
      P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10
Optimization Techniques:
  GRASP+PR (PR) {
    Initialization Scheme: Metaheuristic(GRASP) {
      Initialization Scheme: Random,
      RCL creation: RangeBased{
        alpha: 0.25,
        g-function: Custom('G6', class: 'es.us.isa.qoswsc.G6')},
      Local improvement: SD,
      encoding: 'es.us.isa.qoswscb.solutions.QoSWSCB_GRASPSolution'
    }
  },
  initial solution selector: RandomSelector,
  guiding solution selector: RandomSelector }
Parameters Space:
  Dimensions:
    eliteSetSize enum 5, 10, 20
    relinkingSteps enum 10, 20, 50
Termination Criterion: MaxTime(1000)
Random Number Generator: // Mersenne twister algorithm
  Basic(seed: 4632451,
    class: 'org.apache.commons.math3.random.MersenneTwister')
Configurations:
  C1:
    Outputs: File 'Results-${finishTimestamp}.csv'
    Setting: Runtimes: Java 1.6 Libraries: FOM 0.5

Figure 9.4: Selection experiment for QoSWSC (Exp 4)
9.2.5 Analysis
Once the results of each experiment were obtained, we proceeded to analyze the data. For that purpose, we uploaded the results of each experiment to the web interface of STATService in CSV format. The tool automatically performed all the required tests and returned the following results: i) a ranking of variants, ii) the p-values of all the statistical tests, iii) the p-values of the post-hoc procedures, and iv) the decision path followed to perform the tests (cf. Figure §8.6). Figure §9.5 shows a screenshot of the analysis report provided by STATService for Exp3.
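The core of the non-parametric branch of such an analysis is the Friedman ranking step. A minimal sketch, assuming distinct (untied) measurements per row and omitting the p-value lookup that STATService performs:

```java
import java.util.Arrays;

// Sketch of the ranking step behind a Friedman test: each row holds one
// problem instance's results for k techniques (lower = better here).
// Techniques are ranked within each row, mean ranks are averaged across
// rows, and the Friedman chi-square statistic is computed from them.
public class FriedmanSketch {

    // Mean rank of each technique across all instances (rank 1 = best).
    public static double[] meanRanks(double[][] results) {
        int n = results.length, k = results[0].length;
        double[] sums = new double[k];
        for (double[] row : results) {
            double[] sorted = row.clone();
            Arrays.sort(sorted);
            for (int j = 0; j < k; j++) {
                // Rank = position of the value in the sorted row (no ties assumed).
                sums[j] += Arrays.binarySearch(sorted, row[j]) + 1;
            }
        }
        for (int j = 0; j < k; j++) sums[j] /= n;
        return sums;
    }

    // Friedman statistic: chi2 = 12n/(k(k+1)) * sum(Rj^2) - 3n(k+1).
    public static double chiSquare(double[][] results) {
        int n = results.length, k = results[0].length;
        double sumSq = 0;
        for (double rj : meanRanks(results)) sumSq += rj * rj;
        return 12.0 * n / (k * (k + 1)) * sumSq - 3.0 * n * (k + 1);
    }
}
```

For instance, three instances on which three techniques always finish in the same order yield mean ranks {1, 2, 3} and the maximum statistic n(k-1) = 6, the case of perfect agreement.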
Figure 9.5: Results of STATService for Exp1 (100ms)
9.3 GENERATION OF HARD FMS
A Feature Model (FM) is a compact representation of the products of a software
product line. The automated extraction of information from FMs is a thriving topic
involving numerous analysis operations, techniques and tools [26]. Performance evaluations in this domain mainly rely on the use of random FMs. However, these only
provide a rough idea of the behaviour of the tools on average problems and are not sufficient to reveal their real strengths and weaknesses. To address this problem, we propose to model the problem of finding computationally-hard FMs as an optimization problem and we solve it using a novel Evolutionary algoriTHm for Optimized
feature Models (ETHOM). Given a tool and an analysis operation, ETHOM generates
input models of a predefined size maximizing aspects such as the execution time or
the memory consumption of the tool when performing the operation over the model.
This allows users and developers to know the behaviour of tools in pessimistic cases
providing a better idea of their real power. Experiments using ETHOM on a number of analyses and tools have successfully identified models producing much longer
execution times and higher memory consumption than those obtained with random
models of identical or even larger size. The complete description of ETHOM and the
experimental results are fully reported in Appendix §H. This work has been submitted
to the Information and Software Technology journal.
In the following subsections, we provide a complete description of how MOSES
supported the design and validation of ETHOM on each one of the phases of the MPS
life-cycle. Also, we present how the ecosystem contributed to replicate the experiments performed to evaluate the effectiveness of the algorithm in different scenarios.
The experimental descriptions for this application are provided in SEDL since: i) the implementation was not performed with any MOF, and ii) it allows us to perform a more complete validation, testing the expressiveness of SEDL with more experiments.
9.3.1 Selection
The idea of generating hard FMs to evaluate the performance of analysis tools was
inspired by the work of Wegener et al. [290]. In their work, the authors showed that
Evolutionary Algorithms (EAs) are effective in finding hard inputs for real-time systems. Thus, we decided to follow the same approach, based on EAs, and we did not
compare any other techniques, i.e., no selection experiments were performed.
9.3.2 Implementation
The characteristics of the problem led us to define a custom encoding (trees of fixed size) and thus custom crossover and mutation operators, as well as specific repairing mechanisms. As a result, we could not adapt any of the existing MOFs and had to develop an ad hoc Java implementation of ETHOM.
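A minimal sketch of what such a fixed-size tree encoding might look like; the representation, the mutation operator and the validity check below are illustrative assumptions, not the actual ETHOM operators:

```java
import java.util.Random;

// Sketch of a fixed-size tree encoding in the spirit of ETHOM's custom
// chromosome: entry i describes feature i+1 (feature 0 is the root), storing
// its parent index and the relationship type connecting it to that parent.
public class FMChromosome {
    public enum Rel { MANDATORY, OPTIONAL, OR, ALTERNATIVE }

    public final int[] parent;  // parent[i] <= i keeps the encoding a valid tree
    public final Rel[] rel;

    public FMChromosome(int[] parent, Rel[] rel) {
        this.parent = parent;
        this.rel = rel;
    }

    // Mutation: change the relationship type of one randomly chosen feature.
    // Note this never touches the parent array, so the tree stays well-formed.
    public void mutateRelation(Random rnd) {
        int i = rnd.nextInt(rel.length);
        Rel[] values = Rel.values();
        rel[i] = values[rnd.nextInt(values.length)];
    }

    // Well-formedness check: every feature's parent must be an earlier feature.
    // A repair mechanism would clamp offending indices instead of rejecting.
    public boolean isTree() {
        for (int i = 0; i < parent.length; i++) {
            if (parent[i] < 0 || parent[i] > i) return false;
        }
        return true;
    }
}
```

Operators such as crossover must preserve exactly this kind of structural invariant, which is why off-the-shelf MOF operators on flat vectors were not directly reusable.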
9.3.3 Tailoring
With the aim of finding a suitable tailoring of ETHOM, we performed numerous executions of a sample optimization problem, evaluating different combinations of values for the tailoring points of the algorithm, presented in Table §9.1. Underlined values were those providing better results and therefore those selected for the final configuration of ETHOM. The optimization problem used for tailoring was to find an FM maximizing the execution time invested by the analysis tool when checking whether the model is void (i.e., whether it represents at least one product). We chose this analysis operation because it is currently the most quoted in the literature [26]. In particular, we looked for FMs of different sizes maximizing execution time in the CSP solver JaCoP integrated into the FaMa framework v1.0. We chose FaMa mainly because of our familiarity with the tool.
Tailoring Point          Variants evaluated and selected
Selection strategy       Roulette-wheel, 2-Tournament
Crossover strategy       One-point, Uniform
Infeasible individuals   Replacing, Repairing

Table 9.1: Tailoring variants in ETHOM
Figure §9.6 depicts the description of the tailoring experiment using SEDL. The
analysis operations for internal validation revealed no threats in the experiment.
EXPERIMENT: ETHOM-A1 version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  NFeatures: 500       // Number of features of the FM to be generated
  CTC: 20              // Percentage of Cross Tree Constraints to be generated
  Solver: 'CSP-JaCoP'  // Solver used to evaluate the analysis operation
  CrossoverProb: 0.7
  MutationProb: 0.005
  PopulationSize: 100
  Executions: 5000
Variables:
  Factors:
    selection enum 'Roulette-wheel', '2-Tournament'
    crossover enum 'One-point', 'Uniform'
    infeasibilityTreatment enum 'Repairing', 'Replacement'
  Outcomes:
    ObjectiveFunction Integer  // Best value of the obj. func. found
Hypothesis: Differential
Design:
  Sampling: Random
  Detailed Design: Custom
  Assignment: Random
  Groups: by selection, crossover, infeasibilityTreatment sizing 10
  Protocol: Random
Analyses Spec:  // Use ANOVA or Friedman (with their corresponding PostHoc proc.)
  A1:
    FactANOVAwRS(Filter(selection, crossover, infeasibilityTreatment), 0.05)
    Tukey(Filter(selection, crossover, infeasibilityTreatment), 0.05)
  A2:
    Friedman(Filter(selection, crossover, infeasibilityTreatment), 0.05)
    Holms(Filter(selection, crossover, infeasibilityTreatment), 0.05)
Configuration C1:
  Outputs: File 'Results-ETHOM-A1.csv' role: MainEvidence
    format: CSV mapping: VarsPerColumn
  Setting: Runtimes: Java 1.6 Libraries: FAMA 1.1.2, Betty 1.1.1
  Procedure:
    Command as Treatment(selection, crossover, infeasibilityTreatment):
      'java -jar ETHOM Results.csv ${NFeatures} ${CTC} ${Executions} \
       ${Solver} ${selection} ${crossover} \
       ${infeasibilityTreatment} ${CrossoverProb} ${MutationProb} \
       ${PopulationSize}'

Figure 9.6: Tailoring of ETHOM
9.3.4 Tuning
For tuning ETHOM we repeated the same process described for the tailoring but
this time we evaluated the solutions found with different values for the key parameters
of the algorithm. The parameters and values evaluated are presented in Table 9.2.
Figure §9.7 depicts the SEDL document used to describe and automate the execution of the experiment. No threats were detected.
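A "Command as Treatment" procedure of this kind boils down to enumerating the full factorial design and issuing one command line per treatment. A hypothetical sketch of that enumeration (the factor values and the command template below are illustrative, not the E3 implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of full-factorial treatment enumeration: every combination of the
// declared factor levels yields one command line to be executed.
public class TreatmentGrid {

    public static List<String> commands(double[] crossoverProbs,
                                        double[] mutationProbs,
                                        int[] populationSizes) {
        List<String> cmds = new ArrayList<>();
        for (double cp : crossoverProbs)
            for (double mp : mutationProbs)
                for (int ps : populationSizes)
                    // Hypothetical command template; the real procedure also
                    // passes constants such as NFeatures, CTC and the solver.
                    cmds.add(String.format(
                        "java -jar ETHOM Results.csv %s %s %d", cp, mp, ps));
        return cmds;
    }
}
```

With the three three-level factors of Table 9.2 this yields 27 command lines; the additional two-level Executions factor doubles that to 54 treatments, each run the number of times the group sizing demands.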
EXPERIMENT: ETHOM-A2 version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  NFeatures: 500       // Number of features of the FM to be generated
  CTC: 20              // Percentage of Cross Tree Constraints to be generated
  Solver: 'CSP-JaCoP'  // Solver used to evaluate the analysis operation
  selection: 'Roulette-wheel'
  crossover: 'One-point'
  infeasibilityTreatment: 'Repairing'
Variables:
  Factors:
    CrossoverProb enum 0.7, 0.8, 0.9
    MutationProb enum 0.005, 0.0075, 0.02
    PopulationSize enum 50, 100, 200
    Executions enum 2000, 5000
  Outcome ObjectiveFunction Integer  // Best value of the obj. func. found
Hypothesis: Differential
Design:
  Sampling: Random
  Assignment: Random
  Groups: sizing 10
  Protocol: Random
Analyses  // Use ANOVA or Friedman (with their corresponding PostHoc proc.)
  A1:
    FactANOVAwRS(Filter(CrossoverProb, MutationProb, PopulationSize, Executions), 0.05)
    Tukey(Filter(CrossoverProb, MutationProb, PopulationSize, Executions), 0.05)
  A2:
    Friedman(Filter(CrossoverProb, MutationProb, PopulationSize, Executions), 0.05)
    Holms(Filter(CrossoverProb, MutationProb, PopulationSize, Executions), 0.05)
Configurations
  C1:
    Outputs: File 'Results-ETHOM-A2.csv' role: MainEvidence
      format: CSV mapping: VarsPerColumn
    Setting: Runtimes: Java 1.6
      Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
    Procedure:
      Command as Treatment(CrossoverProb, MutationProb, PopulationSize, Executions):
        'java -jar ETHOM Results-ETHOM-A2.csv ${NFeatures} ${CTC} \
         ${Executions} ${Solver} ${selection} ${crossover} \
         ${infeasibilityTreatment} ${CrossoverProb} \
         ${MutationProb} ${PopulationSize}'

Figure 9.7: Tuning of ETHOM
In total, we performed over 40 million executions of the objective function to find a
good tailoring and tuning of ETHOM.
Parameter                      Values evaluated and selected
Crossover probability          0.7, 0.8, 0.9
Mutation probability           0.0075, 0.005, 0.02
Size initial population        50, 100, 200
#Executions fitness function   2000, 5000

Table 9.2: Tuning values in ETHOM
Figure 9.8: Analysis report and decision path generated by STATService
9.3.5 Analysis
For the statistical tests of each experiment we used the web interface of STATService. This allowed us to know in a few seconds whether our hypotheses were confirmed. Figure §9.8 shows a screenshot of the analysis report provided by STATService for experiment #1 of ETHOM. Additionally, we had to do some manual work to create fitness evolution graphs, e.g. histograms. This is a feature that we missed in STATService and that we plan to add in the future (cf. Chapter §10).
9.3.6 Experiments on the generation of hard FMs
Once we found a suitable configuration for ETHOM, we performed several
experiments to evaluate its effectiveness with different optimization criteria. These
experiments are described in the following subsections.
Experiment #1: Maximizing execution time in a CSP Solver
In this experiment, we evaluated the ability of ETHOM to search for input feature
models maximizing the analysis time of a solver. In particular, we measured the execution time required by a CSP solver to find out if the input model was consistent
(i.e. it represents at least one product). This was the same problem used to tune the
configuration of our algorithm. Again, we chose the consistency operation because it
is currently the most used in the literature. Figure §9.9 depicts the SEDL document
used to describe this experiment.
In a related experiment we evaluated the ability of ETHOM to search for input feature models maximizing the analysis time of a different solver. The only difference between this experiment and the one previously described is the solver used for the analysis (parameter Solver: 'FAMA-SAT'). Thus, we simply copied the experiment and changed that parameter in order to execute it.
Experiment #2: Maximizing memory consumption in a BDD solver
In this experiment, we evaluated the ability of our evolutionary program to generate input FMs maximizing the memory consumption of a solver. In particular, we measured the memory consumed by a BDD solver when computing the number of products represented by the model. We chose this analysis operation because it is one of the hardest operations in terms of complexity and is currently the second most quoted operation in the literature [26]. We decided to use a BDD-based reasoner for this experiment since it has proved to be the most efficient option for this operation [26]. The SEDL document describing this experiment is presented in Figure §9.10.
EXPERIMENT: ETHOM-E1a version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  Solver: 'CSP-JaCoP'  // Solver used to evaluate the analysis operation
  Termination criterion: 'MaxMObjFuncEvaluations(5000)'
  RandomNumberGenerator: {desc: 'Standard Java RND', class: 'java.util.Random'}
Variables:
  Factor FMGenerator enum
    ETHOM(command: 'ETHOM', selection: 'Roulette-wheel',
          crossover: 'One-point', infeasibilityTreatment: 'Repairing',
          crossoverProb: 0.9, mutationProb: 0.0075, populationSize: 200),
    RandomGen(command: 'RandomFMGenerator')
  Outcome ObjectiveFunction in Z  // Best value of the obj. func. found
  NCFactors:
    NFeatures enum 200, 400, 600, 800, 1000
    CTC enum 10, 20, 30, 40
Hypothesis: Differential
Design:
  Sampling: Random
  Detailed Design: Custom
  Assignment: Random Blocking: NFeatures, CTC
  Groups: FMGenerator sizing 25
  Protocol: Random
Analyses:  // Use T-Test or Wilcoxon (a.k.a. Mann-Whitney)
  A1:
    TTest(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
    Tukey(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
  A2:
    Friedman(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
    Holms(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
Configurations:
  C1:
    Outputs: File 'Results-ETHOM-1a.csv'
    Experimental Setting: Runtimes: Java 1.6
      Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
    Experimental procedure:
      Command as Treatment(FMGenerator, NFeatures, CTC):
        'java -jar ${FMGenerator} Results-ETHOM-1a.csv ${NFeatures} ${CTC} \
         ${Termination_criterion} ${Solver} ${FMGenerator.selection} \
         ${FMGenerator.crossover} ${FMGenerator.infeasibilityTreatment} \
         ${FMGenerator.CrossoverProb} ${FMGenerator.MutationProb} \
         ${FMGenerator.PopulationSize}'  // If a property is not defined its value is ''

Figure 9.9: ETHOM - Experiment #1 in SEDL
Experiment #3: Evaluating the impact of the number of generations
During the work with ETHOM, we detected that the maximum number of generations used as stopping criterion had a great impact on the results of the algorithm. We evaluated that impact with a double aim. First, we tried to find out the minimum number of generations required by ETHOM to offer better results than random techniques
EXPERIMENT: ETHOM-E2 version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  Solver: 'SPLOT-BDD'  // Solver used to evaluate the analysis operation
  Termination criterion: 'MaxMObjFuncEvaluations(5000)'
  RandomNumberGenerator: {desc: 'Standard Java RND', class: 'java.util.Random'}
Variables:
  Factors:
    FMGenerator enum
      ETHOM(command: 'ETHOM', selection: 'Roulette-wheel',
            crossover: 'One-point', infeasibilityTreatment: 'Repairing',
            crossoverProb: 0.9, mutationProb: 0.0075, populationSize: 200),
      RandomGen(command: 'RandomFMGenerator')
  NCFactors:
    NFeatures enum 200, 400, 600, 800, 1000
    CTC enum 10, 20, 30, 40
  Outcomes: ObjectiveFunction Integer  // Best value of the obj. func. found
Hypothesis: Differential
Design:
  Sampling: Random
  Assignment: Random Blocking: NFeatures, CTC
  Groups: by FMGenerator sizing 25
  Protocol: Random
Analyses Spec:  // Use ANOVA or Friedman (with their corresponding PostHoc proc.)
  A1:
    TTest(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
  A2:
    Wilcoxon(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
Configurations:
  C1:
    Outputs: File 'Results-ETHOM-2.csv' role: MainEvidence
      format: CSV mapping: VarsPerColumn
    Setting:
      Runtimes: Java 1.6
      Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
    Procedure:
      Command as Treatment(FMGenerator, NFeatures, CTC):
        'java -jar ${FMGenerator} Results-ETHOM-2.csv ${NFeatures} ${CTC} \
         ${Termination_criterion} ${Solver} ${FMGenerator.selection} \
         ${FMGenerator.crossover} ${FMGenerator.infeasibilityTreatment} \
         ${FMGenerator.CrossoverProb} ${FMGenerator.MutationProb} \
         ${FMGenerator.PopulationSize}'

Figure 9.10: ETHOM - Experiment #2 in SEDL
in the search for hard FMs. Second, we wanted to find out whether ETHOM was able
to find even harder models than in our previous experiments when allowed to run for
a large number of generations. In particular, we performed two experiments with two
different solvers for the evaluation of the fitness function, CSP and BDD. The descriptions of the experiments differed only in the fitness function and so were almost identical. Figure §9.11 shows the SEDL description of the experiment with BDD.
EXPERIMENT: ETHOM-E3a version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  ETHOM: {selection: 'Roulette-wheel', crossover: 'One-point',
          infeasibilityTreatment: 'Repairing', crossoverProb: 0.9,
          mutationProb: 0.0075, populationSize: 200}
  NFeatures: 500
  CTC: 20
  Solver: 'CSP-JaCoP'  // Solver used to evaluate the analysis operation
  RandomNumberGenerator: {desc: 'Standard Java RND', class: 'java.util.Random'}
Variables:
  Factors:
    NGenerations enum 10, 25, 50, 75, 100, 125
  Outcomes:
    Effectiveness Integer  // % of times that ETHOM outperforms Random Search
Hypothesis: Differential
Design:
  Sampling: Random Assignment: Random
  Groups: by FMGenerator sizing 25
  Protocol: Random
Analyses Spec:  // Use ANOVA or Friedman (with their corresponding PostHoc proc.)
  A1:
    ANOVA(Filter(NGenerations), 0.05)
    Tukey(Filter(NGenerations), 0.05)
  A2:
    KruskalWallis(Filter(NGenerations), 0.05)
Configurations:
  C1:
    Outputs: File 'Results-ETHOM-3a.csv'
    Setting:
      Runtimes: Java 1.6
      Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
    Procedure:
      Command as Treatment(NGenerations):
        'java -jar Effectiveness Results-ETHOM-3a.csv ${NFeatures} \
         ${CTC} ${NGenerations * ETHOM.populationSize} ${Solver} \
         ${ETHOM.selection} ${ETHOM.crossover} \
         ${ETHOM.infeasibilityTreatment} ${ETHOM.CrossoverProb} \
         ${ETHOM.MutationProb} ${ETHOM.PopulationSize}'

Figure 9.11: ETHOM - Experiment #3 in SEDL
Experiment #4: Evaluating the impact of the analysis heuristics
In this experiment we checked whether the hard FMs generated by our evolutionary approach were also hard for solvers using other heuristics. In particular, we repeated the analysis of the hardest FMs found in experiment #1 using the other seven
heuristics available in the CSP solver JaCoP. Figure §9.12 shows the description of this
experiment in SEDL.
EXPERIMENT: ETHOM-E4 version 1.0 rep: http://moses.us.es/E3
Object: 'Run of ETHOM for the parameters specified'
Population: 'Any run of ETHOM with a valid tuning for the parameters specified'
Constants:
  Solver: 'CSP-JaCoP'  // Solver used to evaluate the analysis operation
  ETHOM: {selection: 'Roulette-wheel', crossover: 'One-point',
          infeasibilityTreatment: 'Repairing', crossoverProb: 0.9,
          mutationProb: 0.0075, populationSize: 200}
  RandomNumberGenerator: {desc: 'Standard Java RND', class: 'java.util.Random'}
Variables:
  Factors:
    JaCoPHeuristic enum 'MaxRegret', 'LargestMin', 'SmallestMax',
      'MostConstrainedDynamic', 'MinDomainOverDegree', 'LargestDomain',
      'SmallestDomain', 'SmallestMin'
  NCFactors:
    NFeatures enum 200, 400, 600, 800, 1000
    CTC enum 10, 20, 30, 40
  Outcomes: ObjectiveFunction Integer  // Best value of the obj. func. found
Hypothesis: Differential
Design:
  Sampling: Random Assignment: Random
  Blocking: NFeatures, CTC
  Groups: by FMGenerator sizing 25
  Protocol: Random
Analyses:
  A1:
    ANOVA(Filter(JaCoPHeuristic).Grouping({NFeatures, CTC}), 0.05)
  A2:
    Friedman(Filter(FMGenerator).Grouping({NFeatures, CTC}), 0.05)
Configuration C1:
  Outputs: File 'Results-ETHOM-4.csv' role: MainEvidence
    format: CSV mapping: VarsPerColumn
  Experimental Setting: Runtimes: Java 1.6
    Libraries: FAMA 1.1.2, Betty 1.1.1, ETHOM 1.0
  Experimental procedure:
    Command as Treatment(JaCoPHeuristic, NFeatures, CTC):
      'java -jar ETHOM Results-ETHOM-4.csv ${NFeatures} ${CTC} \
       ${ETHOM.Executions} ${Solver} ${ETHOM.selection} \
       ${ETHOM.crossover} ${ETHOM.infeasibilityTreatment} \
       ${ETHOM.CrossoverProb} ${ETHOM.MutationProb} \
       ${ETHOM.PopulationSize} ${JaCoPHeuristic}'

Figure 9.12: ETHOM - Experiment #4 in SEDL
9.4 SUMMARY
In this chapter, we illustrated how MOSES may contribute to reducing the cost of using metaheuristics in the context of two search-based problems in software engineering. MOEDL and SEDL contributed to providing a succinct and self-contained description of the experiments and their results. This in turn made possible the automated detection of potential threats in the experiments; in fact, we automatically detected a bug in the implementation of one of the algorithms. Also, the experimental descriptions and their corresponding lab-packs enabled the automated execution of experiments with one click. Finally, STATService made the statistical analysis of the data completely automated. These results support our approach as an effective means to reduce the cost of using metaheuristics.
PART V
FINAL REMARKS
10 CONCLUSIONS
If at first, the idea is not absurd, then there is no hope for it.
Albert Einstein,
1879 – 1955
German physicist
It's a dangerous business, Frodo, going out your door. You step onto the road, and if you don't keep your feet, there's no knowing where you might be swept off to.
J. R. R. Tolkien,
(from The Lord of the Rings)
1892 – 1973
English writer
10.1 CONCLUSIONS
The main conclusion that can be drawn from this dissertation is that:
The current support to develop MPS applications can be improved
with MOSES.
We are convinced that our dissertation is only a first, baby step, but it is a step in the right direction. We have contributed at three key points: the language (SEDL/MOEDL), the Software Development Kit (FOM and the analysis operations catalogue), and the development and execution environment (MOSES). These three aspects determine the capabilities of many software development tools, and to the best of our knowledge, in the case of MPS applications we are pioneers in providing both a novel experiment description language and a novel development and execution environment. It has been a long road, but we are convinced that our research strategy was the right one.
Throughout this dissertation we made some decisions that led us to achieve our goal. Firstly, we decided to bet on FOM as the reference MOF. We were often tempted to abandon this decision, but after surveying the state of the art we concluded that, although on average it was not the MOF with the best score, it provided the broadest spectrum of metaheuristic techniques. Therefore, it was a good enough starting point on which to build comprehensive support for the MPS life-cycle. Secondly, we decided to validate our work by building experimentation-intensive MPS applications, which involved a significant extra effort. We also forced ourselves to develop publicly available tools to show our progress; in this sense, the needs of end users helped us to identify new and interesting features. In the following, we present some more specific conclusions.
Regarding the implementation of MPS-based applications, our comparison framework has proved useful not only as a reference guide for practitioners, but also as a helpful tool for deciding on the directions of further development of other MOFs such as Eva2 and HeuristicLab (see Appendix §A). Furthermore, FOM has been used by a local company to build MPS-based solutions in the urban traffic and operational management arena.
Regarding the description of MOEs, SEDL was conceived with a twofold purpose: as an end-user language and as an intermediate, fully-fledged specification language for experiments, i.e., to serve as the target domain into which other domain-dependent experiment description languages can be translated in order to benefit from its automated analysis support. Our experience defining and using MOEDL in our validation scenarios allows us to claim that this initial conception was successful and may also prove useful to other colleagues.
Regarding the automated analysis, the 15 analysis operations identified are straightforward to implement, which means our solution can be easily shared and reproduced. Furthermore, the applicability of these analysis operations goes beyond SEDL: they are potentially applicable to any experiment description language with a formal semantics. With regard to STATService, it was conceived assuming that it would be used by inexperienced users with no background in statistical tests. Considering that it has already been used by 9 labs in 5 countries, we can conclude that our assumption was in the right direction.
Regarding the automated conduction and replication of MOEs, MOSES eases the comprehensive development of MPS applications and has been validated in experimentation-intensive scenarios. MOSES has been incrementally developed with a set of facilities to implement metaheuristic algorithms, to conduct and replicate SEDL and MOEDL experiments, etc. However, we must validate it with more MPS applications, and further facilities must be added to the tool suite that integrates MOSES in order to enhance its usefulness. In the following section we discuss some of these potential enhancements.
Additionally, during the validation of our approach we obtained results for solving specific optimization problems in the Search-Based Software Engineering area. We provided an algorithm based on the hybridization of GRASP and Path Relinking for solving the QoS-aware Web Service Composition problem at runtime. Furthermore, we collaborated with Dr. Segura in designing a novel evolutionary algorithm for finding computationally-hard feature models.
As a final conclusion, we conjecture that the lack of a commonly accepted MOF, as well as the lack of comprehensive support for the MPS application life-cycle, is delaying the application of metaheuristics for solving optimization problems by researchers and practitioners in Software Engineering. In this regard, we are confident that our work will provide a foundation on which MPS applications can be built. Furthermore, apart from its inherent technical value, having developed MOSES as easy-to-use, open-source tooling support that can be quickly integrated and reused is a determining factor in achieving a useful result, and lays the basis for spreading the use of metaheuristics. As an example, the replication of QoS-Gasp and ETHOM in MOSES takes only the time to push the Start button.
10.2
SUPPORT FOR RESULTS
Some of the results shown in this thesis have already been published in scientific forums. Figure §10.1 summarises these publications, grouping them along two dimensions: type and topic. Five types are defined: book, journal, conference, tool-demo and workshop. Furthermore, some types of publications have an associated quality level: JCR for journals, and CORE and MAS (Microsoft Academic Search) rankings for conferences.
Figure 10.1: Publications related to the contributions of this dissertation
10.3
DISCUSSION, LIMITATIONS AND EXTENSIONS
In this section we discuss some of the decisions made in this dissertation, highlighting their main limitations and possible extensions. As in the conclusions, we organize the content of this section around the main contributions of our work.
Regarding the description of MOEs, current experimental descriptions in SEDL only support the definition of simple hypotheses that relate a single dependent variable to a whole set of independent variables (either in terms of causality or of covariance). Consequently, complex research studies comprising multiple related hypotheses and nested experimental designs are neither expressible in SEDL nor supported by the provided analysis operations. Similarly, MOEDL has not been tested on multi-objective optimization problems, and the transformations defined in this dissertation do not support them.
Extension: Extend the meta-models of SEDL and MOEDL and the catalogue of analysis operations to deal with complex experiments and multi-objective optimization problems.
Definition of the MOEDL2SEDL transformation. Although multiple methodologies exist for performing each type of MOE supported by MOEDL (cf. for instance [21] for technique parametrization experiments), a single transformation with a specific experimental design has been provided.
Extension: Define a new participant and several services in MOSES to support this new variability dimension. The participant would be responsible for transforming experimental descriptions according to different experimental methodologies, and for validating experimental descriptions against those methodologies.
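A minimal sketch of what such a participant's contract could look like; the interface and method names (ExperimentalMethodologyAdapter, transform, validate) are hypothetical illustrations, not part of the current MOSES API:

```python
from abc import ABC, abstractmethod

class ExperimentalMethodologyAdapter(ABC):
    """Hypothetical MOSES participant: transforms and validates experiment
    descriptions according to a pluggable experimental methodology."""

    @abstractmethod
    def transform(self, moedl_experiment):
        """Return a SEDL description following this methodology's design."""

    @abstractmethod
    def validate(self, sedl_experiment):
        """Return a list of methodology violations (empty if valid)."""
```

Each supported methodology (e.g. the one of [21]) would then be a concrete subclass registered as a service.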
Flexibility of replicability and internal validity checks. Although the checks defined for replicability and internal validity are appropriate for MOEs, they would lead to wrong conclusions and false positives when applied to experiments in other areas. For instance, the check that the number of actual measurements equals the number expected given the experimental design reports a positive detection even for the smallest percentage of mortality or attrition. However, certain levels of mortality or attrition are acceptable in other areas such as biology, medicine or the social sciences.
Extension: Extend the definition of the replicability and internal validity checks in order to support their customization to different scientific areas.
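One way to make the measurement-count check customizable is to parameterize it with a tolerated attrition rate; a sketch, where the strict MOE behaviour corresponds to a tolerance of zero:

```python
def check_measurements(expected, actual, max_attrition=0.0):
    """Flag a replicability problem only when the shortfall of measurements
    exceeds the tolerated attrition rate (0.0 reproduces the strict MOE check)."""
    if expected <= 0:
        raise ValueError("expected number of measurements must be positive")
    attrition = max(0, expected - actual) / expected
    return attrition <= max_attrition
```

A biology or social-science experiment could then declare, say, a 10% acceptable attrition, while MOEs keep the default of 0%.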
Scope of MOSES [RI]. The contributions presented in this dissertation have been implemented in MOSES [RI]. Nevertheless, this tool suite presents some limitations to be overcome:
• Lack of a reference implementation for the ExperimentalRepository participant.
• Lack of a reference implementation for the ExperimentalDesigner participant.
• Lack of a reference implementation for the ExperimentalReportGenerator participant.
Extensions: Current components could also be extended in the following ways:
• STATService:
– Generate histograms and box-plots.
– Generate p-value interpretation diagrams.
– Compute the power of the tests and the required sample size to reach a minimum power¹.
• FOM: The improvements to be performed are indicated by the evaluation performed in Chapter §5.
• MOSES: We plan to integrate all the tools and implement MOSES Studio. Furthermore, we plan to study the possibilities of other software ecosystems, such as Eclipse, to support the features of MOSES on the desktop.
¹ Remember that the power of a test is the probability of avoiding a false negative in NHST, i.e., of rejecting the null hypothesis when significant differences actually exist.
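As an illustration of the proposed extension, the per-group sample size required for a two-sided, two-sample test can be approximated from the desired power via the normal approximation; this is a generic statistical sketch under that approximation, not STATService code:

```python
import math
from statistics import NormalDist

def sample_size_two_groups(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation of the per-group sample size needed so that a
    two-sided two-sample test reaches the requested power for a given
    standardized effect size (Cohen's d)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the significance level
    z_beta = z(power)            # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)
```

For a medium effect (d = 0.5) at the usual alpha = 0.05 and power = 0.8, this yields about 63 subjects per group; the exact t-based computation gives a slightly larger value.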
PART VI
A PPENDICES
A
MOFS ASSESSMENT DATA
In this appendix, we provide detailed information about the scores obtained in each characteristic by each framework. Interested readers can obtain more detailed information about the assessment of characteristics and features (including comments on problems found during the assessment, penalizations applied to some features and their underlying reasons, and the information sources used for the assessment) at http://www.isa.us.es/MOFComparison. Moreover, this spreadsheet can be downloaded and exported to various formats, and is provided in such a way that users can customize the weights of each characteristic, feature and area, allowing the creation of tailored benchmarks adapted to their specific needs.
A.1
EVALUATION PER AREA
In order to assess the selected MOFs on each area of our comparative framework, we have reviewed their source code, user and technical documentation, and user interface. The results of this assessment are the tables shown in this section. The last column of each table shows the number of MOFs supporting each feature. The last two rows show the number of features supported by each MOF, and a score computed as the weighted sum of the features supported divided by the number of characteristics in the area. Table §A.1 shows the feature coverage of area C1, along with the weight corresponding to each feature in its associated characteristic.
Table §A.2 shows the feature coverage of area C2, along with the weight corresponding to each feature in its associated characteristic.
Table §A.3 shows the feature coverage of area C3, along with the weight corresponding to each feature in its associated characteristic.
The feature coverage of area C4 is shown in Table §A.4, along with the weight corresponding to each feature in its associated characteristic. As an exception, the features of the GUI characteristic have been assessed using a real value between 0.0 and 1.0.
Table A.1: Coverage of features in area C1 (Supported Metaheuristics). [Checkmark matrix, per framework (ECJ, ParadisEO, EvA2, FOM, JCLEC, OAT, Opt4j, EasyLocal, HeuristicLab, MALLBA), of the weighted features of each characteristic: SD/HC, SA, TS, GRASP, VNS, EA, PSO, AIS, ACO, Scatter Search and multi-objective metaheuristics.]
Table A.2: Coverage of features in area C2 (Problem Adaption/Encoding). [Checkmark matrix per framework of the weighted features of each characteristic: solution encoding, neighborhood definition, E/A auxiliary methods (crossover and mutation operators), solution selection, objective function specification and constraint handling.]
Table A.4: Coverage of features in area C4 (Optimization Process Support). [Checkmark matrix per framework of the weighted features of each characteristic: finalization conditions, batch processing, experiment design, statistical analysis, GUI and interoperability.]
A.2
GLOBAL EVALUATION
Table §A.5 shows the global assessment computed per characteristic given the evaluation of each feature provided in the previous section. The value associated with each characteristic is computed as the aggregated sum of its features, each multiplied by its corresponding weight. The value associated with each feature is 1.0 if total support is provided (✓), 0.5 if the MOF provides partial support (∼), and 0.0 if the feature is not supported.
Table §A.6 shows the evaluation of area C5.
Table §A.7 summarizes the total scores obtained by each framework on the different areas of the comparative study.
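The scoring scheme just described can be expressed compactly; a sketch of the aggregation, with illustrative helper names:

```python
def characteristic_score(features):
    """Score of a characteristic: weighted sum of feature support values,
    where support is 1.0 (full), 0.5 (partial, '~') or 0.0 (none)."""
    return sum(weight * support for weight, support in features)

def area_score(characteristic_scores):
    """Score of an area: sum of its characteristic scores divided by the
    number of characteristics in the area."""
    return sum(characteristic_scores) / len(characteristic_scores)
```

For example, a characteristic with two features of weight 0.5, one fully and one partially supported, scores 0.75.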
Table A.3: Coverage of features in area C3 (Advanced Metaheuristic Characteristics). [Checkmark matrix per framework of the weighted features of each characteristic: hybridization, hyper-heuristics, and parallel & distributed optimization.]
Table A.5: Scores for C1 - C4 and C6. [Per-framework scores in [0.0, 1.0] for each characteristic of areas C1 (metaheuristic techniques), C2 (adaption to the problem and its structure), C3 (advanced characteristics), C4 (MPS life-cycle support) and C6 (documentation & support), with per-characteristic averages.]
Table A.6: Scores for C5, design, implementation & licensing. [Per-framework licensing (GPL, LGPL, CECILL, or open source with academic free license), supported platforms, software engineering best practices scores, and size metrics: packages/modules and classes/files.]
Area                                   ECJ    ParadisEO  EvA2   FOM    JCLEC  OAT    Opt4j  EasyLocal  HeuristicLab  MALLBA  Avg
1 Supported Metaheuristics             0.207  0.381      0.394  0.450  0.081  0.259  0.175  0.264      0.324         0.245   0.282
2 Problem Adaption/Encoding            0.254  0.413      0.306  0.090  0.258  0.078  0.210  0.150      0.484         0.102   0.249
3 Advanced Metaheuristic Charact.      0.400  0.500      0.200  0.333  0.133  0.033  0.033  0.100      0.300         0.367   0.226
4 Optimization Process Support         0.368  0.192      0.347  0.411  0.470  0.582  0.208  0.165      0.458         0.131   0.356
5 Design, Implementation & Licensing   0.905  0.797      0.738  0.660  0.975  0.650  0.925  0.717      0.708         0.417   0.786
6 Documentation, Samples & Popularity  0.789  0.348      0.238  0.118  0.234  0.364  0.213  0.094      0.340         0.177   0.304
Average per Framework                  0.487  0.439      0.371  0.344  0.359  0.328  0.294  0.248      0.436         0.240   0.367

Table A.7: Global scores
B
META-MODELS AND SCHEMAS
This appendix provides formal definitions of the languages described in this dissertation in terms of their UML meta-models. As a consequence, those descriptions are independent of the specific concrete syntaxes and serializations provided for such meta-models, like XML schemas and documents, plain-text syntaxes such as SEDL4People, or graphical notations. In the UML diagrams, the extension points of the meta-models are denoted by shading the classes in red and marking them with the ExtensionPoint stereotype.
Specifically, Section §B.1 provides the full specification of the meta-model of SEDL. Section §B.2 provides the full specification of the meta-model of MOEDL.
B.1
SEDL META-MODEL
The SEDL meta-model is the result of an extensive analysis of a variety of experiments developed by the authors [210, 213, 247], a careful study of the related literature, and a process of successive refinements of the meta-model after applying it to different scenarios. Specifically, we have taken [116] as the main reference for general experiment descriptions and [21, 23] for the specific details of metaheuristic optimization experiments. Additionally, we have evaluated other approaches (cf. Chapter §4) and the proprietary formats and classes used for experiment description by the set of MOFs assessed in Chapter §5.
In general, we use UML class diagrams to define the structure of the meta-model. The general structure of the meta-model of SEDL is depicted in Figure §B.1.
Specifically, it defines Experiment as a base abstract class that provides the basic identification attributes for the experiment and acts as an extension point. This extension point can be used by DSLs to define domain-specific experiments by subclassing. SEDL provides a domain-independent subclass of Experiment named BasicExperiment for describing any kind of scientific experiment. BasicExperiment enables a comprehensive specification of experiments by providing formalizations of the basic concepts described in Chapter §3. In particular, the attributes of a BasicExperiment are defined as follows:
• id: string. Every experiment must be uniquely identified by an identifier, which will usually be a number preceded by ’Exp’.
• name: string. This attribute provides a descriptive name for the experiment.
• metaId: string. This attribute contextualizes the identifier of the experiment, drastically reducing the possibility of identification conflicts. It identifies the author who performed the experiment, its supporting organization, and the experimental repository where the experiment is stored. Its purpose is similar to the namespace attribute of XML documents. The use of URLs as the value of this attribute is encouraged.
• context: Context. This attribute provides detailed information about the people, organizations and projects related to the experiment.
• hypothesis: Hypothesis. Experiments in SEDL have a unique scientific hypothesis. This scientific hypothesis must be a logical assertion testable using the data gathered during experimental conduction [183, 224], [4, chap. 6]. In the next section, a detailed description of Hypothesis is provided.
• design: Design. The design of a SEDL experiment describes the set of variables and constants involved in the experiment, together with a description of the experimental protocol stating when and how the variables are measured and modified. Moreover, designs contain a specification of the analysis procedures to be applied to the data gathered during experimental conduction.
• configurations: Configuration[0..*]. Configurations describe the specific experimental settings and details about experimental conduction. For instance, in metaheuristic optimization experiments the configuration should specify the metaheuristic programs executed and the execution platform used (hardware and software). Configurations also provide details about the inputs and expected outputs of the experiment. Additionally, configurations are relevant for the description of the experiment along its life-cycle, since they describe the set of experimental conductions performed (executions: Execution[0..*]). Executions describe a specific conduction of the experiment, in terms of the execution process and its results. Moreover, each execution contains the results of applying the analysis methods specified in the design (analyses: Analysis[0..*]). These are intended to support the testing of the hypothesis of the experiment in order to draw conclusions.
• annotations: string[0..*]. Annotations are machine-processable pieces of information that can be included in the experiment for use by specific tools.
• notes: string[0..*]. Any other information about the experiment that does not fit in the previous fields can be recorded here.
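To illustrate the shape of this meta-class, the following sketch mirrors its attributes as a Python dataclass. This is an illustrative rendering only: the actual meta-model is defined in UML, and the placeholder types are assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BasicExperiment:
    """Illustrative Python mirror of the SEDL BasicExperiment meta-class."""
    id: str                                 # unique identifier, e.g. 'Exp001'
    name: str                               # descriptive name
    meta_id: str                            # namespace-like context (URL encouraged)
    context: Optional[object] = None        # people, organizations, projects
    hypothesis: Optional[object] = None     # the single testable hypothesis
    design: Optional[object] = None         # variables, protocol, analyses
    configurations: List[object] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
```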
The specific meta-model of the context of experiments in SEDL is shown in Figure §B.2. The structure and types of hypotheses supported by SEDL are depicted in Figure §B.3. A RelationalHypothesis states an assertion about the relationship between the outcome and a non-empty set of factors or characteristics (independent variables). This relationship can be a causal relationship (a change in the levels of the independent variables causes a change in the level of the dependent variable), or it can be ruled by a specific mathematical expression that allows predicting the level of the dependent variable from the levels of the independent ones. This dichotomy leads to two different types of relational hypotheses in SEDL: DifferentialHypothesis and AsociationalHypothesis. We have included extension points for the description of assertions in DescriptiveHypothesis and of relationships between variables in AsociationalHypothesis. Thus, users can define their own DSLs for specifying such elements.
Figure B.1: Meta-model of experiments in SEDL
Figure B.2: Meta-model of experiments context in SEDL
Figure B.3: Meta-model of experimental hypotheses in SEDL
Figure B.4: Meta-model of experimental variables in SEDL
The meta-model of SEDL variables is depicted in Figure §B.4.
The structure of detailed designs is described in Figure §B.6 as a UML class diagram. Some classes of this diagram have invariants. Specifically, the invariant of Treatment specifies that the variables referenced in its valuations are ActiveIndependentVariables (otherwise the level of the variable could not be changed). The invariant of VariableValuation ensures that its level is in the domain of the associated variable.
The specification of predefined experimental designs is supported through the PredefinedDesign extension point. In order to perform property checking on SEDL documents with predefined designs, such designs must be expanded into their complete specifications. In Chapter §8 we define the contract that authors must fulfill in order to support the expansion of their specific predefined designs. In this dissertation we only provide the specific designs needed by the metaheuristic optimization experiments we focus on (technique comparison designs and technique parametrization designs), which are defined in Chapter §7.
Figure B.5: Meta-model of design in SEDL
Figure B.6: Meta-model of experimental designs in SEDL
Figure B.7: Meta-model of experimental configurations in SEDL
The components of a Configuration are depicted in Figure §B.7.
The description of the input data required for experimental conduction is performed in SEDL through the ExperimentalInputs element, which comprises a set of inputDataSources. Additionally, the InputFile of the experiment can contain a set of VariableValuations named features, in order to describe the levels of the variables associated with a specific input file.
The structure of SEDL Executions is described in Figure §B.8.
Figure B.8: Meta-model of experimental executions in SEDL
Figure B.9: Meta-model of experimental analyses specifications and results
Figure B.10: Meta-model of dataset specifications in SEDL
Figure §B.9 depicts the structure of the specification of analyses to be performed (ExperimentalAnalysisSpecification) and their results (AnalysisResult). Since SEDL
is aimed at the automation of the experimentation life-cycle, it provides support mainly for statistical analyses (other types of analyses, such as charts or summary tables, are useful but require human interpretation and evaluation). Specifically, the Statistic package provides specific subclasses of ExperimentalAnalysisSpecification and AnalysisResult. The elements of this package are described in detail in the following subsections.
Additionally, the package DatasetSpecification provides mechanisms for specifying the subset of the results on which the analyses should be performed. Figure §B.10 depicts the elements provided in this package.
The elements of this package are translations of the basic operators of relational
algebra (Projection and Filtering) to our model, plus a grouping operator for specifying how to compare datasets in the presence of blocking variables. In this sense, the
results of a dataset specification are defined as the union of the results of its associated projections, as applied to the results obtained by applying its filters sequentially
(equivalent to a single filter whose criterion is the conjunction (AND) of the corresponding filtering criteria).
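The filter-then-project-then-union semantics just described can be sketched in a few lines of Python (an illustrative model with made-up names, not the MOSES implementation):

```python
# Sequential filtering is equivalent to a single filter whose criterion is the
# AND of all filtering criteria; the dataset is then the union of the results
# of each projection applied to the filtered rows.

def apply_dataset_specification(rows, filters, projections):
    """rows: list of dicts; filters: predicates; projections: lists of column names."""
    for pred in filters:                     # sequential filtering (implicit AND)
        rows = [r for r in rows if pred(r)]
    result = []
    for cols in projections:                 # union of the projections' results
        for r in rows:
            projected = {c: r[c] for c in cols}
            if projected not in result:      # set-union semantics (no duplicates)
                result.append(projected)
    return result

data = [
    {"technique": "EA", "problem": "p1", "fitness": 0.9},
    {"technique": "SA", "problem": "p1", "fitness": 0.7},
    {"technique": "EA", "problem": "p2", "fitness": 0.8},
]
subset = apply_dataset_specification(
    data,
    filters=[lambda r: r["technique"] == "EA"],
    projections=[["technique", "fitness"]],
)
print(subset)  # [{'technique': 'EA', 'fitness': 0.9}, {'technique': 'EA', 'fitness': 0.8}]
```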
The structure of the StatisticalAnalysisSpecifications and StatisticalAnalysisResults supported by SEDL is depicted in Figure §B.11.
Regarding analysis specification the meta-model defines:
• CentralTendencyMeasure describes the way in which the values of a real DependentVariable
cluster around some value, which in our meta-model is stored in the attribute centralValue. Specifically, SEDL supports the following types of measures: the Mean,
the Median, the Mode, and the ConfidenceInterval. The Mode is a special kind
of central tendency measure since it can describe information about any kind of
variable; as a consequence, it can be associated with a specific level. ConfidenceIntervals provide information both about the clustering of values and their dispersion; as a consequence, a confidence interval is also a type of variability measure. The limits of the
confidence interval are min and max.

Figure B.11: Meta-model of statistical analyses in SEDL
• VariabilityMeasure describes the dispersion of the values of a real DependentVariable, expressed as a magnitude that in our meta-model is stored in the attribute
variabilityValue. Specifically, SEDL supports the following types of measures:
the StandardDeviation, the Range, the InterQuartileRange, and the ConfidenceInterval.
• Ranking defines an order relation on the Levels of an IndependentVariable based
on the value of a descriptive statistic. This class has an invariant that specifies
that the associated DescriptiveStatistic must not itself be a Ranking.
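These descriptive statistics, and a Ranking derived from one of them, can be illustrated with Python's standard library (the data and variable names are invented for the example):

```python
# Descriptive statistics SEDL can express (central tendency and variability),
# plus a Ranking of the levels of an independent variable ("technique") by the
# mean of a dependent variable ("fitness").
from statistics import mean, median, stdev

results = {          # level of "technique" -> observed fitness values
    "EA": [0.90, 0.85, 0.88],
    "SA": [0.70, 0.75, 0.72],
    "TS": [0.80, 0.82, 0.81],
}

central = {t: mean(v) for t, v in results.items()}    # CentralTendencyMeasure (Mean)
medians = {t: median(v) for t, v in results.items()}  # CentralTendencyMeasure (Median)
spread = {t: stdev(v) for t, v in results.items()}    # VariabilityMeasure (StandardDeviation)

# Ranking: an order relation on the levels based on a descriptive statistic
ranking = sorted(results, key=lambda t: central[t], reverse=True)
print(ranking)  # best-to-worst by mean fitness
```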
Regarding StatisticalAnalysis, SEDL supports the description of several types:
• Null Hypothesis Significance Test (NHST) is a decision-making mechanism about
a hypothesis (named the null hypothesis) based on a dataset. These tests determine whether the results would lead to the rejection of the null hypothesis for a pre-specified level of significance. Specifically, NHSTs answer the following question: assuming that the null hypothesis is true, what is the probability of observing a
value for the test statistic that is at least as extreme as the value that was actually observed? That probability is named the p-value. The usual practice is to assume
that the null hypothesis is false when the p-value is lower than the significance
level. The p-value is computed based on the value of a test statistic and on its
theoretical distribution under the null hypothesis. Such a distribution can have a
number of degrees of freedom. In SEDL the specific test to be applied is identified by the attribute name. When NHSTs are used to detect significant differences
between two distributions (the null hypothesis would be that the distributions are
identical), they are called simple comparison NHSTs. Conversely, when NHSTs are
used to detect significant differences among three or more distributions (the null
hypothesis would be that all the distributions are identical), they are called multiple comparison NHSTs. Additionally, a MultipleComparisonNHST can be associated
with a set of Post-hocProcedures. These procedures are a special kind of NHST,
concerned with finding relationships between pairs of distributions from the associated multiple comparison test. The specific distributions compared are identified by the respective variable valuations as shown in Figure §B.9.
• CorrelationCoeficient describes the degree of relationship between two or more
data sets, and is used to infer the presence or absence of an association. There are
several methods to compute correlation. In the SEDL meta-model both CorrelationCoeficient and NHST are generic, without binding to a specific hypothesis
testing or coefficient computation method. This enhances the expressiveness of
the language.
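The NHST decision scheme can be illustrated with a tiny exact permutation test, which computes a p-value exactly as defined above: the probability, under the null hypothesis, of a test statistic at least as extreme as the observed one. This is a conceptual sketch, not one of the specific tests SEDL names (Wilcoxon, ANOVA, etc.):

```python
# A two-sample NHST sketched as an exact permutation test: assuming the null
# hypothesis (both samples come from the same distribution), the p-value is the
# probability of a difference of means at least as extreme as the observed one.
from itertools import combinations
from statistics import mean

a = [0.90, 0.85, 0.88, 0.91]   # e.g. fitness achieved by technique A
b = [0.70, 0.75, 0.72, 0.74]   # e.g. fitness achieved by technique B
observed = abs(mean(a) - mean(b))

pooled = a + b
extreme = total = 0
for idx in combinations(range(len(pooled)), len(a)):   # every relabelling
    ga = [pooled[i] for i in idx]
    gb = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if abs(mean(ga) - mean(gb)) >= observed:
        extreme += 1

p_value = extreme / total
print(p_value, p_value < 0.05)   # reject the null hypothesis at alpha = 0.05
```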
For each specific type of analysis specification, SEDL defines a corresponding subtype of StatisticalAnalysisResult:
• DescriptiveStatisticValue supports expressing the results of any DescriptiveStatistic, such as Means, Medians, StandardDeviations, etc. It is associated with a
level (value) of the dependent variable of the experiment. Since the relationship
of the level with the dependent variable is not shown in the diagram of Figure
§B.11, this constraint is specified as an invariant of DescriptiveStatisticValue (the
variable of the level specified as result pertains to the domain of a DependentVariable).
• RankingResult provides an ordered list of the levels according to the ranking criterion. Class
RankingResult has an OCL invariant that specifies that the values specified in the
ranking must be in the domain of the ranking variable.
• ConfidenceIntervalValue provides a minimum and maximum value for a confidence interval.
• CorrelationCoeficientValue provides a value and optional description of the correlation coeficient computed for the results of the experiment.
• PValue provides the p-value of the associated hypothesis test. It contains the
value, the degrees of freedom, and an optional description.
B.2
MOEDL META-MODEL
Figure §B.12 depicts the main elements of MOEDL, their structure and their relationships as a UML class diagram.
Figure B.12: Types of Experiments supported by MOEDL and their structure
Figure B.13: Termination criteria supported by MOEDL and their structure
MetaheuristicOptimizationExperiment has an OCL invariant specifying that a termination criterion must be defined either as a global setting for the experiment, or for each optimization technique in particular. It is worth noting that the use of different termination criteria in this kind of experiment could lead to bias in the comparison (some algorithms may use more computational resources than others in their
execution), and consequently to wrong conclusions regarding the performance of the algorithms.
A second OCL invariant in MetaheuristicOptimizationExperiment states that a random
number generation algorithm and seed must likewise be specified either for the experiment
as a global setting, or for each optimization technique in particular.
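The either-global-or-per-technique constraint can be sketched as a simple validity check (the dict keys informally mirror the meta-model; this is an illustration, not MOSES code):

```python
# Hedged sketch of the invariant: an experiment description is valid only if a
# global termination criterion exists, or every technique defines its own.

def satisfies_termination_invariant(experiment):
    if experiment.get("terminationCriterion") is not None:
        return True                                   # global setting present
    return all(t.get("terminationCriterion") is not None
               for t in experiment["techniques"])     # or one per technique

exp = {"terminationCriterion": None,
       "techniques": [{"name": "EA", "terminationCriterion": "MaxIterations(1000)"},
                      {"name": "SA", "terminationCriterion": None}]}
print(satisfies_termination_invariant(exp))           # False: SA lacks a criterion
```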
Figure §B.12 depicts the meta-model of the descriptions of this kind of experiment
supported by MOEDL. A TechniqueParametrizationExperiment is associated with a
single technique (note the first clause of its invariant), and contains a set of SimpleParameter definitions. These definitions specify the domain of each parameter using a
SEDL Domain, but not its value, since finding its optimal value is the purpose of the
experiment. Currently only simple parameter dimensions are supported, i.e., the possible values of the parameters must be enumerated.
Figure §B.13 describes the termination criteria supported in MOEDL.
B.3
XML SCHEMAS OF SEDL AND MOEDL
The XML schemas that provide a concrete syntax for the above-described meta-models are available at:
• SEDL: http://moses.us.es/schemas/sedl/v1/SEDL.xsd
• MOEDL: http://moses.us.es/schemas/moedl/v1/MOEDL.xsd
Those schemas have been generated directly from the meta-models using the default
Eclipse EMF transformation to XML Schema.
C
A METAHEURISTICS DESCRIPTION SYNTAX IN EBNF
1 Syntactical definitions
Metaheuristics Supported
metaheuristic    ::= [openPars metaheuristicDef closePars] | metaheuristicDef ;
metaheuristicDef ::= EA | SD | VNS | TS | SA | AS | GRASP | PR | RS ;
Common parameters
MHParams          ::= initScheme [listSep solutionClass [listSep terminationCriterion]] ;
initParam         ::= 'Init' valueSep initType ;
solutionClass     ::= 'class' | 'encoding' classSpec ;
initType          ::= randomInit | metaheuristicInit ;
metaheuristicInit ::= metaheuristic ;
randomInit        ::= 'Random' ;
Termination Criteria
TerminationCriterion     ::= NIterationsCriterion | MaxTimeCriterion | RepeatCriterion ;
NIterationsCriterion     ::= 'MaxIterations' openPars Integer closePars ;
MaxTimeCriterion         ::= 'MaxTime' openPars Integer closePars ;
RepeatCriterion          ::= 'Repeat' openPars TerminationCriterionList closePars ;
TerminationCriterionList ::= TerminationCriterion { listSep TerminationCriterion }* ;
Selection criteria
Selector                  ::= (ElitistSelector | RandomSelector | RouletteWheelSelector |
                               TournamentSelector | ProportionalRanksSelector | CustomSelector)
                              openPars SignSpec listSep RepeatSpec closePars ;
ElitistSelector           ::= 'Elitist' ;
RandomSelector            ::= 'Random' ;
RouletteWheelSelector     ::= 'Roulette' ;
TournamentSelector        ::= 'Tournament' Integer ;
ProportionalRanksSelector ::= 'ProportionalToRanks' ;
CustomSelector            ::= CustomElement ;
SignSpec                  ::= ['sign'] ('positive' | 'negative') ;
RepeatSpec                ::= ['repeat'] Boolean ;
Steepest Descent / Hill Climbing
SD ::= ('SD' | 'HC') [initSep MHParams endSep] ;
Simulated Annealing
SA                 ::= 'SA' initSep MHParams SAParams endSep ;
SAParams           ::= listSep CoolingScheme listSep InitialTemperature listSep SolutionsPerIter ;
CoolingScheme      ::= ['coolingScheme' valueSep] (LinearCS | ExponentialCS | LogarithmicCS) ;
LinearCS           ::= 'Linear' openPars Float closePars ;
ExponentialCS      ::= 'Exponential' openPars Float closePars ;
LogarithmicCS      ::= 'Logarithmic' openPars Float closePars ;
InitialTemperature ::= ['initialTemperature' valueSep] Float ;
SolutionsPerIter   ::= ['solutionPerIter' valueSep] Integer ;
Tabu Search
TS ::= 'TS' initSep MHParams listSep TSParams endSep ;
Tabu Search parameters
TSParams              ::= TSMemory listSep TSAspirationCriterion ;
TSMemory              ::= RecencyTSMemory | FrequencyTSMemory | CustomTSMemory ;
RecencyTSMemory       ::= 'Recency' openPars Integer closePars ;
FrequencyTSMemory     ::= 'Frequency' openPars Integer closePars ;
CustomTSMemory        ::= CustomElement ;
TSAspirationCriterion ::= ImproveAspCrit | CustomAspCrit ;
ImproveAspCrit        ::= 'Improvement' ;
CustomAspCrit         ::= 'Custom' openPars classSpec closePars ;
Variable Neighbourhood Search (VNS)
VNS ::= 'VNS' initSep MHParams listSep VNSParams endSep ;
VNS parameters
VNSParams           ::= SolutionsPerNeigh listSep NeighList ;
SolutionsPerNeigh   ::= ['solutionsPerNeigh' valueSep] Integer ;
NeighList           ::= NeighbourhoodStruct { listSep NeighbourhoodStruct }+ ;
NeighbourhoodStruct ::= CustomElement ;
Evolutionary Algorithm
EA ::= 'EA' initSep MHParams listSep EAParams endSep ;
Evolutionary Algorithm Parameters
EAParams         ::= PopulationSize listSep
                     CrossoverOp listSep CrossoverSel listSep CrossoverProb
                     [IncestThreshold listSep] IncestTreatment
                     [MutationOp listSep] MutationSel listSep MutationProb
                     SurvivalPolicy ;
CrossoverOp      ::= '1px' | '2px' | 'Uniform' | CustomOp ;
CrossoverSel     ::= ['crossoverSel' valueSep] Selector ;
CrossoverProb    ::= ['crossoverProb' valueSep] Float ;
CustomOp         ::= CustomElement ;
MutationOp       ::= ['mutationOp' valueSep] ('bitflip' | CustomOp) ;
MutationSel      ::= ['mutationSel' valueSep] Selector ;
MutationProb     ::= ['mutationProb' valueSep] Float ;
IncestThreshold  ::= ['incestThreshold' valueSep] Float ;
IncestTreatment  ::= ['incestTreatment' valueSep] ('Repair' ReparationMechanism | 'Remove') ;
SurvivalPolicy   ::= ['survivalPolicy' valueSep] SelectorReplacer ;
SelectorReplacer ::= Selector ;
GRASP
GRASP ::= 'GRASP' initSep MHParams listSep GRASPParams endSep ;
GRASP parameters
GRASPParams        ::= RCLSelection listSep LocalImprovement ;
RCLSelection       ::= ['RCLcreation' valueSep] (GreedyAndAlphaRCL | CustomRCLSelection) ;
GreedyAndAlphaRCL  ::= 'Greedy' openPars GFunction listSep Alpha closePars ;
GFunction          ::= ['g-function' valueSep] CustomElement ;
Alpha              ::= ['alpha' valueSep] Float ;
CustomRCLSelection ::= CustomElement ;
PR
PR ::= 'PR' initSep MHParams listSep PRParams endSep ;
PR parameters
PRParams       ::= GSS listSep EliteSetSize listSep RelinkingSteps ;
GSS            ::= ['guidingSolutionsSelector'] Selector ;
EliteSetSize   ::= ['eliteSetSize' valueSep] Integer ;
RelinkingSteps ::= ['relinkingSteps' valueSep] Integer ;
Ant Systems
AS ::= 'AS' initSep MHParams listSep ASParams endSep ;
Ant System parameters
ASParams              ::= NAnts listSep NUpdaters listSep
                          PheromoneTrailUpdater listSep EvaporationScheme ;
NAnts                 ::= ['ants' valueSep] Integer ;
NUpdaters             ::= ['updaters' valueSep] Integer ['%'] ;
PheromoneTrailUpdater ::= Selector ;
EvaporationScheme     ::= CoolingScheme ;
2 Lexical definitions
Basic Types
CustomElement ::= ['Custom' openPars [String listSep] classSpec closePars] | classSpec ;
classSpec     ::= ['class' valueSep] String ;
Float         ::= Integer ['.' Integer] ['f'] ;
Integer       ::= ['+' | '-'] Digit {Digit}* ;
Digit         ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
Boolean       ::= 'true' | 'false' ;
String        ::= "'" Character {Character}* "'" ;
Character     ::= Letter | Digit ;
Letter        ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm'
                | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
                | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M'
                | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;
Separators
listSep   ::= ',' ;
valueSep  ::= ':' ;
initSep   ::= '{' ;
endSep    ::= '}' ;
openPars  ::= '(' ;
closePars ::= ')' ;
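Putting the syntactical and lexical rules together, a description conforming to this grammar might look as follows. This is an illustrative instance written by hand from the rules above; it assumes that the initScheme element of MHParams corresponds to the initParam rule, and uses the simulated annealing production:

```text
SA{Init:Random, coolingScheme:Linear(0.95), initialTemperature:100.0f, solutionPerIter:25}
```

Here '{' and '}' are initSep and endSep, ',' is listSep, ':' is valueSep, and the optional parameter-name prefixes (coolingScheme, initialTemperature, solutionPerIter) are spelled out for readability.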
D
STATISTICAL TESTS SUPPORTED
NHST support in SEDL

Purpose                              Test                 Reference
Normality condition                  Kolmogorov-Smirnov   [255]
                                     Lilliefors           [175]
                                     Shapiro-Wilk         [251]
Homoscedasticity condition           Levene               [172]
Parametric pairwise comparison       T-student            [252]
Non-parametric pairwise comparison   Wilcoxon             [296]
                                     McNemar              [185]
Parametric multiple comparison       ANOVA                [252]
Non-parametric multiple comparison   Friedman             [106]
                                     Aligned Friedman     [137]
                                     Iman & Davenport     [146]
                                     Quade                [228]
                                     Cochran Q            [252]
Post-hoc analyses                    Bonferroni-Dunn      [79]
                                     Holm                 [141]
                                     Hochberg             [136]
                                     Hommel               [142]
                                     Holland              [138]
                                     Rom                  [239]
                                     Finner               [92]
                                     Li                   [174]
                                     Shaffer              [249]
                                     Nemenyi              [198]
Table D.1: Set of tests and post-hoc analyses supported by SEDL
E
SEA
According to the experimentation guidelines in the literature [21, 80, 286], reporting an experiment with the aim of reproducibility requires not only specifying its hypothesis, design and analyses (tasks for which we have provided support with the
languages described above), but also providing all the input and output data of the
experiment, along with all the experimental artefacts used for its conduction, such as
survey forms, data gathering spreadsheets, etc. In the context of computational experiments those artefacts are usually algorithm implementations or elements encoded as
computer files; consequently, such information can be packaged along with the experiment description in an electronic resource that fully describes the experiment. We
denote such a resource as the experimental lab-pack.
Using a lab-pack for replicating an experiment requires identifying the role of each
of its constituent elements (inputs, outputs, experimental artefacts), and using them properly
during the execution of the experimental procedure. We consider that the role of those
elements in the experiment should determine their location in the lab-pack,
in order to ease their use independently of the experiment description and to promote a
structured layout.
Thus, we propose a standard layout for the elements of lab-packs depending on their role in the experimental protocol, and a default location and file name
for the main experimental description (written in SEDL or in one of its DSLs). Such a
standard folder structure is depicted in Figure §E.1. According to this figure, the experimental description file of a SEA lab-pack must be at the root folder, and its name must
be “experimentalDescription” (with the corresponding extension, for instance “.sed” if
it is plain SEDL or “.moe” for MOEDL descriptions). The layout of the lab-pack folders is divided into general and configuration-specific files. The general elements are
placed in folders directly under the root of the lab-pack, such as the “/input” folder,
which contains all the general input data used during the experiment, for instance the
files that contain the specific problem instance data in a MOEDL experiment if they are
available in a configuration-independent format (such as plain text or XML files). The
configurations folder contains one sub-folder per Configuration specified in the experimental description of the lab-pack, where the name of the folder must be the identifier
of the corresponding Configuration. Each configuration folder contains a set of sub-folders with
configuration-specific information, as described next:
• “configurations/⟨ConfigID⟩/input” contains the input information that is specific
to this configuration.
• “configurations/⟨ConfigID⟩/artefacts” contains the set of artefacts used for the execution of the experimental procedure of the configuration, such as the source
code of the algorithms, their binary executables, or the documents that contain
the survey used in the experiment. Those artefacts are arranged according to their
nature in three different folders:
– “src” for the editable documents or source code of the algorithms (in this
way they can be used to generate the artefacts needed to perform future
non-exact replications).
– “bin” for the actual artefacts used in the experimental protocol execution.
For instance, this folder could contain the forms used for the survey in
“.pdf” format or the executable “.jar” archives for Java algorithms.
– “docs” for additional documentation about how to use the artefacts during the experimental conduction. For instance, this folder could contain
a transcription of the speech given by instructors prior to surveying
users, or documentation about the usage of the executable files or the
source code provided.
• “configurations/⟨ConfigID⟩/executions” contains the information about
each individual execution of the experiment with this configuration. The results
of each execution are stored in a folder whose name is the execution ID according to the experimental description of the lab-pack,
i.e., “configurations/⟨ConfigID⟩/executions/⟨ExecutionID⟩/⟨ResultFileName⟩”.
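The folder layout above can be materialized with a short script; the helper below is an illustrative assumption for the example (not part of MOSES), and uses “.sed” for the description file:

```python
# Illustrative helper that creates the proposed SEA lab-pack layout.
import tempfile
from pathlib import Path

def create_labpack(root, config_ids):
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "experimentalDescription.sed").touch()    # description at the root
    (root / "input").mkdir(exist_ok=True)             # general input data
    for cid in config_ids:                            # one sub-folder per Configuration
        base = root / "configurations" / cid
        for sub in ("input", "artefacts/src", "artefacts/bin",
                    "artefacts/docs", "executions"):
            (base / sub).mkdir(parents=True, exist_ok=True)
    return root

pack = create_labpack(tempfile.mkdtemp(), ["C1", "C2"])
print(sorted(p.name for p in (pack / "configurations").iterdir()))  # ['C1', 'C2']
```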
We propose using the tar packaging format as described by IEEE in the standard
[145] for distributing whole lab-packs as single files, and the use of the “.sea” extension
for such files. Additionally, a “.zea” extension could also be used if the lab-pack file is
compressed using the zip format as described by ISO/IEC in [149].
Figure E.1: Layout and structure of SEA lab-packs

Regarding the use of the elements of the lab-pack in the context of the execution
of the experimental procedure, we consider that it is strongly dependent on the experiment domain and the corresponding DSL used for describing the experimental
procedure. For non-automated experimental procedures, the actors involved in the
replication have access to the elements of the lab-pack and use them according
to the description provided in the procedure. For automated experimental procedures,
we rely on the extensibility of SEDL for supporting different DSLs for experimental
procedures description and the extension mechanisms provided in MOSES for creating modules that perform the execution accordingly.
Regarding the languages for describing experimental procedures provided in this
dissertation, our current implementations ensure the following. For experimental procedures described using the command-based language, the elements provided in the global and
configuration-specific “/input” folders (note that any experimental procedure execution is associated with a specific configuration of the experiment) are copied to the command execution environment in a folder named “/input”; consequently, they can be used as parameters
for any command. Moreover, the elements provided in the global and configuration-specific “/artefacts/bin” folders are available in the path of the command execution environment, and can consequently be invoked as commands. Those experiments
described using MOEDL have an implicit experimental procedure. These procedures
are executed using an optimization execution engine provided by MOSES (E3) that
executes the specified algorithms on the problem instances in a randomized order.
E3 delegates the responsibility of running optimization-specific tasks (algorithm X on problem instance Y) to a MOF-specific module named the metaheuristic
experimental tasks execution engine (MOF-E3). E3 ensures that any MOF-E3 has
access to a copy of the whole lab-pack in a specific path passed as a parameter at
invocation. In the specific case of FOM, since it is a Java framework, the optimization task execution engine adds all the jar files provided in the global and configuration-specific “/artefacts/bin” folders to the class loader, and the global and configuration-specific “/input” folders are copied in a similar way as in the command-based language.
Thus, problem instance data can be loaded directly from the “/input” folder and
optimized by the algorithms created according to the description provided in the configuration parameters of the experiment (whose implementations may be stored in the jars
provided in the “/artefacts/bin” folder of the lab-pack).
F
EEE: EXPERIMENTAL EXECUTION ENVIRONMENT
The purpose of E3 is to provide a basic implementation of the ExperimentalExecutor
service as defined in MOSES (cf. §8.3.2). Currently it is a command-line tool that loads
the experimental description (either SEDL, or MOEDL through a transformation) and
executes the experiment.
This execution is performed in two alternative ways. If the experimental procedure
is a Command, it executes the corresponding invocations through the shell of the operating system, according to the sequence specified by the experimental protocol (currently
only Random protocols are provided). Otherwise, it tries to execute the experiment as
a MOF by running the algorithms specified in FOM.
E3 assumes that the current directory where it is invoked is a lab-pack that follows
the structure specified by SEDL. Thus, all the file names are interpreted accordingly.
For instance, problem instance files are searched for in the “/input” and
“/configurations/⟨configID⟩/input” folders.
This implementation is currently a prototype, not intended for industrial use. Furthermore, the automated execution of the analyses specified in the experimental description is not yet implemented.
G
QOS-AWARE BINDING OF COMPOSITE WEB SERVICES
Quality is never an accident; it is always the result of high intention,
sincere effort, intelligent direction and skillful execution;
it represents the wise choice of many alternatives
William A. Foster,
Found in Igniting the Spirit at Work: Daily Reflections
G.1
THE QOS-AWARE COMPOSITE WEB SERVICES BINDING PROBLEM
In service-oriented environments, complex applications are developed by composing web services, i.e., specifying through the control flow instructions of the application the sequence of invocation of the composed services. These applications can be
created using general-purpose programming languages that contain instructions invoking
several web services, such as Java or C#, or expressed through web service composition
languages such as BPEL [10]. In the latter case, the compositions are usually deployed
as services, in such a way that other service-based applications can use them (as services
being composed). In the remainder of this dissertation we focus on this latter case,
but the algorithms and solutions described are applicable with minimal modifications to service-based applications
implemented with general-purpose programming languages. Service-based applications can be composed using abstract services, where the
services to be invoked are chosen dynamically at runtime from a set of candidates that
implement the same functionality and present a compatible interface.
In this context, Quality of Service (QoS) has been identified as a key element to guide
the selection of those candidates, using the values of the QoS properties for each candidate service, such as execution time, invocation cost or availability. The development
of QoS-aware composite services leads to the creation of context-aware and automatically optimized applications, depending on available services and user preferences.

Figure G.1: Goods Ordering Composite Service
This problem represents an important SOC research challenge [207, 208] identified as
part of the Search Based Software Engineering area [129].
QoS-aware Composite Web Service Binding (QoSWSCB) implies solving an NP-hard
optimization problem [14, 35]. In [14] it is shown that this problem is similar to the
MMKP, since every instance of the MMKP can be formulated as a QoSWSCB instance. Since
the QoS levels provided by a service may change frequently, and some services can
even become unavailable or new services emerge [44], composition approaches that take
into account runtime changes in the QoS of component services are needed.
When this problem is solved at runtime (during the execution of the sequence of
invocations that conforms the composition) or at invocation time (immediately before its
start), it is called a reoptimization or rebinding problem [303][15]. In these circumstances,
the time to obtain a good solution is a critical issue, and the use of heuristics appears
as a promising approach [27]. In the literature, some metaheuristic techniques have been
proposed to solve this problem [42, 166, 287].
Actor        Provider   Task   Candidate Service   Cost (in cents)   Execution Time (s)
BANK         A          t1     s1,A                1                 0.2
BANK         A          t2     s2,A                2                 0.2
BANK         B          t1     s1,B                1.5               0.1
BANK         B          t2     s2,B                5                 0.15
PROVIDER     C          t3     s3,C                1                 0.2
PROVIDER     C          t4     s4,C                2                 0.2
PROVIDER     D          t3     s3,D                1                 0.4
PROVIDER     D          t4     s4,D                5                 0.25
DELIVERY     E          t5     s5,E                1                 0.2
DELIVERY     F          t5     s5,F                2                 0.2
DIG. SIGN.   G          t6     s6,G                1                 0.2
DIG. SIGN.   H          t6     s6,H                2                 0.1
SURVEYING    I          t7     s7,I                1.5               0.1
SURVEYING    J          t7     s7,J                5                 0.15
Table G.1: Service providers per Role and their corresponding QoS Guarantees
A motivating example
In order to illustrate the QoSWSCB problem, a goods ordering service inspired by
the example provided in [305] is depicted in Fig. §G.1 using BPMN 2.0. The diagram
specifies a business process exposed as a composite web service that uses 7 abstract services
(henceforth named tasks, t1 , . . . , t7 ), each with alternative providers. Table §G.1 shows the available
service providers for each task and their corresponding QoS attributes. As illustrated,
two candidate services are available for each task.
The composition starts when a client sends an order. First, the order is registered.
Next, if the payment type of the order is “Credit Card”, the card is checked (t1 ) and
the payment (t2 ) is performed. As depicted in Table §G.1, two bank providers are
available, A and B, and each of them provides candidate services for the tasks t1 and t2 ,
denoted as s1,A , s2,A , s1,B and s2,B . Different providers could be chosen in the binding
of the CWS for each task; e.g. A for t1 , and B for t2 .
Next the stock is checked (t3 ) and the products are reserved for pick-up (t4 ). If any
product in the order is not in stock, the user is informed of the delay and the CWS waits
for some time until activities t3 and t4 are repeated (creating a loop). It is worth noting
that the same provider must be chosen for the tasks t3 and t4 , since the reservation in
t4 refers to the stock of the specific provider queried in t3 . Once the order is ready for
delivery two branches are performed in parallel. The pick-up and delivery (t5 ) to the
client is requested, and an e-mail is sent to the client with an enclosed digitally signed
invoice (t6 ). Once the activities on both branches are performed, the completion of a
user satisfaction survey (t7 ) is requested.
Additionally, Fig. §G.1 shows several QoS constraints that must be fulfilled. These
constraints may affect single tasks (e.g. “the cost of credit card payment must be lower
than 0.1$”) or a group of tasks (e.g. “the total execution time of the remaining activities
after having the order ready for delivery must be lower than 0.5 seconds”).
The QoSWSCB problem can be stated as finding the binding that fulfils all the QoS
constraints and maximizes or minimizes certain user-defined optimization criteria, e.g.
minimizing cost. Note that this may become extremely complex as the number of candidate services increases. In this example two providers are available for each task, thus
128 (2^7) different bindings are possible. This problem becomes especially convoluted
in rebinding scenarios, where providers can become unavailable and QoS levels may
change unexpectedly.
Moreover, a single service provider can expose its services with different QoS values,
defining a whole set of alternative candidates. For instance, the SLA of the Amazon
Simple Storage Service (Amazon S3) provides three types of storage (standard, redundant, and glacier) with different QoS values.
The global cost of the binding χ = ( A, B, D, D, F, H, J ) for an invocation where payment is performed using a credit card would be:

Qcost (χ) = cost1 (χ) + cost2 (χ) + cost3 (χ) + cost4 (χ) + cost5 (χ) + cost6 (χ) + cost7 (χ) = cost(s1,A ) + cost(s2,B ) +
cost(s3,D ) + cost(s4,D ) + cost(s5,F ) + cost(s6,H ) + cost(s7,J ) = 1 + 5 + 1 + 5 + 2 + 2 + 5 = 21 cents.
Since the total execution time of two parallel branches is equal to the maximum of
their execution times, the global execution time for such a binding, under the assumption of
finding the goods in stock, would be:

Q ExecTime (χ) = ExecTime1 (χ) + ExecTime2 (χ) + ExecTime3 (χ) + ExecTime4 (χ) +
+ Max ( ExecTime5 (χ), ExecTime6 (χ)) + ExecTime7 (χ) = ExecTime(s1,A ) + ExecTime(s2,B ) + ExecTime(s3,D ) +
ExecTime(s4,D ) + Max ( ExecTime(s5,F ), ExecTime(s6,H )) + ExecTime(s7,J ) = 0.2 + 0.15 + 0.4 + 0.25 +
+ Max (0.2, 0.1) + 0.15 = 1.35 seconds.
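The two computations above can be checked mechanically against the values in Table §G.1 with a plain Python sketch (the identifiers are ad hoc):

```python
# Recomputing the global QoS of the binding (A, B, D, D, F, H, J): costs add
# up over all invoked tasks, while the two parallel branches (t5, t6)
# contribute only the maximum of their execution times.
cost = {"s1,A": 1, "s2,B": 5, "s3,D": 1, "s4,D": 5, "s5,F": 2, "s6,H": 2, "s7,J": 5}
time = {"s1,A": 0.2, "s2,B": 0.15, "s3,D": 0.4, "s4,D": 0.25,
        "s5,F": 0.2, "s6,H": 0.1, "s7,J": 0.15}

q_cost = sum(cost.values())                                    # in cents
q_time = (time["s1,A"] + time["s2,B"] + time["s3,D"] + time["s4,D"]
          + max(time["s5,F"], time["s6,H"]) + time["s7,J"])    # in seconds
print(q_cost, round(q_time, 2))
```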
The process of QoS-aware binding of composite web services
The QoS-aware binding of a composite web service is performed as follows. When
the CWS is invoked or a rebinding is needed [44], the set T = {t1 , . . . , tn } of tasks is
identified. For each task ti , the set of available service providers Si = {si,1 , . . . , si,m }
(named candidate services) is determined by performing a search on a service registry.
For each candidate service si,j , the QoS information is retrieved. The value provided
by si,j for the QoS property q is denoted as qi,j ; e.g. according to Table §G.1 the cost of
invoking the payment service of provider A (cost2,A ) is 0.02$. Given that some registry
technologies do not support QoS information, a QoS-enriched registry or an alternative
QoS information source (such as a Service Level Agreements repository or a Service
Trading framework [90]) is needed. The set of QoS properties taken into account is
denoted as Q.
Taking this information into account, the expected QoS provided by the application
can be optimized. The goal of this optimization is to find the binding that maximizes
the utility of the global QoS provided, according to the consumers’ preferences. Such
preferences determine which binding is more valuable based on the global QoS levels
Qq provided for each property q. For instance, a total execution time Q ExTime of 2
seconds could be fair for some users but too much for others. User preferences are
expressed as weights wq and utility functions Uq for each QoS property q. The weights
define the relative importance of each property. For instance, wCost = 0.2 and wExTime =
0.1 means cost is twice as important as execution time for the user. Utility functions Uq
define which values of the specific property are more useful for the user.
Thus, our goal translates into finding the best binding χ∗ that maximizes the global
user utility computed as:
GlobUtil(χ) = ∑q∈Q Uq(Qq(χ)) ∗ wq        (G.1)
with ∑q∈Q wq = 1. Similar schemes for expressing user preferences and global utility functions have been used extensively in the literature [15, 43, 261, 303].
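The weighted aggregation of Equation G.1 can be sketched as follows (the property names and values are illustrative, not taken from Table §G.1):

```python
def global_utility(utilities, weights):
    """Equation G.1: weighted sum of per-property utilities U_q(Q_q(x)).

    Both arguments are dicts keyed by QoS property; the weights are
    assumed to add up to 1, as the model requires.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(utilities[q] * weights[q] for q in weights)

# Illustrative preferences: cost is twice as important as execution time.
weights = {"cost": 0.2, "time": 0.1, "availability": 0.3,
           "reliability": 0.2, "security": 0.2}
utilities = {"cost": 0.8, "time": 0.5, "availability": 0.9,
             "reliability": 0.7, "security": 0.6}
print(global_utility(utilities, weights))  # weighted sum in [0, 1]
```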
G.1.1 QoS Model
QoS properties
The set of quality properties Q = {C, T, A, R, S} considered in this dissertation has been used extensively in related work [15, 43, 303]. It comprises:
Cost (C). Fee that users must pay for invoking a service.
Execution Time (T). Expected delay between service invocation and the instant when
result is obtained.
Availability (A). Probability of successfully accessing the service per invocation; its domain is [0, 1].
Reliability (R). It measures the trustworthiness of the service, i.e., its ability to meet the quality guarantees for the rest of the properties. Its value is usually computed from a ranking performed by end users. For example, in www.amazon.com the range is [0, 5], where 0 means that QoS guarantees are violated systematically and 5 means that guarantees are always respected. In this dissertation we assume its domain is [0, 1].
Security (S). It represents the ability of a service to provide mechanisms that assure confidentiality, authentication and non-repudiation of the parties involved. Usually this property implies the use of encryption algorithms of different strengths, different key sizes for the underlying messages, and some kind of access control. In this dissertation we use a categorization of security, where the use of a given encryption algorithm and key size in a service implies a numerical value of this property for the service. Its domain is [0, 1], where 0 means no security at all and 1 means maximum security. Although all these properties are domain-independent, new and possibly domain-dependent quality properties can be added without fundamentally altering our approach.
QoS properties are usually classified as negative or positive. A quality property is
positive if the higher the value, the higher the user utility. For instance, availability is
a positive property, since the higher the availability the better. A quality property is
negative if the higher the value, the lower the utility. For instance, cost is a negative
property. A widely used [15, 43, 303] definition of the utility of the value x for a QoS property q is:

Uq(x) = 1                                  if qmax − qmin = 0
        (x − qmin) / (qmax − qmin)         if q is positive          (G.2)
        (qmax − x) / (qmax − qmin)         if q is negative
where qmax and qmin are the maximum and minimum values of qi,j for all candidate
services.
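A direct transcription of Equation G.2, with the min-max values taken over the candidate services (the sample values below are illustrative):

```python
def utility(x, q_min, q_max, positive):
    """Equation G.2: normalized utility of value x for one QoS property."""
    if q_max == q_min:          # all candidates offer the same value
        return 1.0
    if positive:                # e.g. availability: higher is better
        return (x - q_min) / (q_max - q_min)
    return (q_max - x) / (q_max - q_min)   # e.g. cost: lower is better

print(utility(0.9, 0.5, 1.0, positive=True))    # availability -> 0.8
print(utility(2.0, 1.0, 5.0, positive=False))   # cost -> 0.75
```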
Computing the Global QoS
Apart from the specific providers chosen for each task, the global QoS values for
the CWS depend on:
The workflow of the composition and the type of QoS property. Global QoS is computed by recursively applying a QoS aggregation function according to the building blocks that define the structure of the composition. Table §G.2 summarizes the aggregation functions applied for each QoS property q and type of building block. These functions are widely applied in the literature [15, 43, 261, 287, 303]. For instance, the total execution time of parallel branches is computed as the maximum execution time of any branch, whereas the execution time of a sequence of tasks is computed as the sum. Similarly, given a specific workflow, for instance the parallel branches of our motivating example (tasks t6 and t5), the total cost is computed as the sum of the costs of the tasks, but the total execution time is computed as the maximum execution time of any branch.
The specific branches chosen for execution and the number of iterations performed in loops. Since in general the specific run-time behaviour of loops and alternative branches is unknown in advance, an estimate of this behaviour is needed to perform QoS-aware binding [44]. Specifically, for each loop and alternative execution branch in the workflow, the average number of iterations k and the probability of branch execution pb are estimated. For instance, given PCCard = 0.8 and k = 2, i.e. the probability of using the credit card is 0.8 and 2 iterations of stock reservation are performed, the estimated global cost for the binding χ = (A, B, D, D, F, H, J) in the sample problem instance presented in Section §G.1 is: QCost(χ) = Cost of switch(χ) + Cost of loop(χ) + Cost of fork(χ) + Cost7(χ) = 0.8 ∗ 0.025 + 2 ∗ 0.06 + 0.09 = 0.23$

                  Sequence (S)           Loop (L)                  Branch (B)                  Fork (F)
Cost (C)          ∑_{i=1..m} C(a_i)      k · ∑_{i=1..n} C(a_i)     ∑_{i=1..m} p_i · C(s_i^b)   ∑_{i=1..p_f} C(s_i^f)
Time (T)          ∑_{i=1..m} T(a_i)      k · ∑_{i=1..n} T(a_i)     ∑_{i=1..m} p_i · T(s_i^b)   max_{i=1..p_f} T(s_i^f)
Reliability (R)   ∏_{i=1..m} R(a_i)      (∏_{i=1..n} R(a_i))^k     ∑_{i=1..m} p_i · R(s_i^b)   ∏_{i=1..p_f} R(s_i^f)
Availability (A)  ∏_{i=1..m} A(a_i)      (∏_{i=1..n} A(a_i))^k     ∑_{i=1..m} p_i · A(s_i^b)   ∏_{i=1..p_f} A(s_i^f)
Security (S)      min_{i=1..m} S(a_i)    min_{i=1..n} S(a_i)       ∑_{i=1..m} p_i · S(s_i^b)   min_{i=1..p_f} S(s_i^f)
Custom attr. (F)  f_S(F(a_i))_{i∈1..m}   f_L(s_L, k)               f_B(F(s_i^b), [p_i])        f_F(F(s_i^f))_{i∈1..p_f}

Table G.2: QoS aggregation functions, where a_i are the activities of the block, s_i^b the services of the alternative branches (executed with probability p_i), s_i^f the services of the p_f parallel branches, and k the estimated number of loop iterations.
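The aggregation rules of Table §G.2 for the cost and execution-time properties can be sketched as follows (the function and block names are ours, not those of the dissertation's implementation):

```python
def aggregate_cost(block, values, k=1, probs=None):
    """Cost row of Table G.2: costs add up in every block type."""
    if block == "branch":                 # expected cost over branch probabilities
        return sum(p * c for p, c in zip(probs, values))
    factor = k if block == "loop" else 1  # a loop repeats its body k times
    return factor * sum(values)           # sequence, loop and fork all sum

def aggregate_time(block, values, k=1, probs=None):
    """Time row of Table G.2: parallel branches run concurrently."""
    if block == "fork":                   # the slowest parallel branch dominates
        return max(values)
    if block == "branch":                 # expected time over branch probabilities
        return sum(p * t for p, t in zip(probs, values))
    factor = k if block == "loop" else 1
    return factor * sum(values)

print(aggregate_time("fork", [0.2, 0.1]))   # -> 0.2
print(aggregate_cost("loop", [0.06], k=2))  # -> 0.12
```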
The actual global QoS values provided can differ significantly from the estimations in some invocations, since those values are estimates. In the worst case this deviation can lead to the violation of global QoS constraints. To avoid this problem, the re-binding triggering approach proposed in [44] could be used. The approach used to compute the global QoS of a binding, Qq(χ), is similar to that proposed in [46]. These functions are recursively defined on the activities of the composition structure, allowing global values to be computed by the recursive application of the corresponding function on each building block.
Constraints of the QoSWSCB problem
The QoSWSCB problem has three types of constraints [15, 303]:
Global QoS constraints. They affect the QoS of the CWS as a whole. E.g., the total cost of the composition must be lower than five ≡ Qcost(χ) < 5.
Local QoS constraints. They affect the QoS values provided by the service chosen for a specific task. E.g., the cost of payment (t2) must be lower than 1 ≡ cost2(χ) < 1.
Service dependence constraints. A CWS may use several services that must be bound to the same provider. This situation creates a dependence: if the provider is selected for one of the tasks, then it must be selected for the rest of the tasks it implements. In our motivating example there exists a dependence constraint between tasks t3 and t4 (stock management and reservation). Some examples of these interdependences can be found in [15] regarding stateful services.
Our Proposal: QoS-Gasp
QoS-Gasp is a novel proposal for solving the QoSWSCB problem. It stands for "QoS-aware GRASP+PR algorithm for service-based applications binding". It is a hybrid algorithm, where GRASP is used for initializing the elite set of Path Relinking.
Current approaches for solving the QoSWSC problem
QoS-aware service composition is one of the most promising possibilities that Service Oriented technologies bring to service developers and software architects. It takes the dynamic, loosely coupled service selection paradigm of service orientation to its maximum expression. This problem provides an excellent application scenario for different methods and techniques, ranging from pure optimization techniques to artificial intelligence systems. In this context, two kinds of optimization strategies have been formulated for this problem in the literature [14, 303]: global and local selection.
• Local approaches have two main drawbacks: (i) the solutions obtained are suboptimal with regard to the overall quality of the composite web service; and (ii) global constraints on the structure of the composite web service and its quality properties cannot be imposed.
• Global approaches try to optimize the whole set of services used in the composition according to their QoS properties, taking into account the structure of the composition. Therefore, global QoS constraints can be formulated. In doing so, more information about the expected properties of the composition must be provided, in order to overcome the variability that could arise in the optimization process.
Global approaches to solve the QoSWSC problem comprise:
• The use of Integer Programming ([7, 303]), Linear Programming ([45]) or Mixed Integer/Linear Programming techniques ([15, 227]). These kinds of approaches model the problem using integer and/or real variables and a set of constraints. Although these approaches provide the global optimum of the problem, and their performance is better for small instances of the problem, genetic algorithms outperform these techniques for problem instances with avg(|Si|) > 17 [42]. The use of metaheuristics is more flexible, because those techniques can consider non-linear composition rules and different fitness function formulations [14].
• The use of heuristic techniques. [150] and [55] develop specific heuristics to solve the service composition problem. Applications of metaheuristics to this problem are present in the literature, mainly using different genetic-algorithm-based approaches. These incorporate variants of the work presented in [42], either in the encoding scheme, the fitness function or the QoS model ([110, 263, 287]), or using population diversity handling techniques ([304]). [52] and [282] use a multi-objective evolutionary approach to identify a set of optimal solutions according to different quality properties without generating a global ranking. [220] uses fuzzy logic to relax the QoS constraints, in order to find alternative solutions when it is not possible to find any solution to the problem. Some authors have proposed the use of simulated annealing ([287]), but no experimental results have been provided.
In recent years, the trend has been to apply slight modifications to the formulation of the QoSWSCB problem itself. In [171] the cost is used as the QoS property guiding the search, but penalties and rewards expressed in the SLAs that define the QoS levels of candidate services are taken into account. In [305], the building blocks supported by the problem solving algorithm are extended to unstructured conditional and loop patterns. In [180] the input parameters of the composition and the specific state of the execution at rebinding time are taken into account to improve the estimation of the QoS provided by each possible binding, making the search more accurate. In [164] the latency of service invocations is taken into account for computing the global QoS.
G.2 OUR PROPOSAL: QOSGASP
In this section we present QoS-Gasp, a novel proposal for solving the QoSWSCB problem. It stands for "QoS-aware GRASP+PR algorithm for service-based applications binding". It is a hybrid algorithm, where GRASP is used for initializing the elite set used in Path Relinking.
Next we describe how GRASP and PR have been adapted for solving the QoSWSCB
problem.
APPENDIX G. QOS-AWARE BINDING OF COMPOSITE WEB SERVICES
Solution encoding
In order to apply metaheuristic optimization algorithms, a suitable encoding of solutions is needed. An encoding is the mechanism used for expressing the characteristics of solutions in a form that facilitates their manipulation during the rest of the algorithm. In QoS-Gasp a vector-based encoding is used. This encoding has been used extensively in the literature [42, 110]. Solutions are encoded as a vector of integer values, with a size equal to the number of tasks |T| = n. Value j at position i of this vector encodes the choice of service j as provider for task i. Thus, the vector [j| . . . |k] of size n encodes the binding χ = {s1,j, . . . , sn,k}.
For instance, in our motivating example, the vector that encodes the binding (A, B, D, D, F, H, J) would be [0|1|1|1|1|1|0|0]. The index of each provider is determined by its order of appearance in Table §G.1; e.g. for banks, A ≡ 0 and B ≡ 1. Note that the values in each position of the vector happen to be either 0 or 1 only because there are two providers per task in our motivating example; i.e., the encoding is not binary in general.
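A sketch of the vector encoding, assuming for illustration that each task's providers are indexed by their order of appearance (the provider table below is hypothetical, not the one in Table §G.1):

```python
# Hypothetical provider lists, one per task of the motivating example.
PROVIDERS = [["A", "B"], ["A", "B"], ["C", "D"], ["C", "D"],
             ["E", "F"], ["G", "H"], ["I", "J"]]

def decode(vector):
    """Map a vector of provider indices to the bound services."""
    return [PROVIDERS[task][choice] for task, choice in enumerate(vector)]

def encode(binding):
    """Inverse mapping: provider names back to their indices."""
    return [PROVIDERS[task].index(name) for task, name in enumerate(binding)]

print(decode([0, 1, 1, 1, 1, 1, 1]))  # -> ['A', 'B', 'D', 'D', 'F', 'H', 'J']
```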
Constraints support
GRASP and PR do not handle constrained optimization problems directly. In order to overcome this drawback, a variant of Equation G.1 is used as the objective function. This variant incorporates a penalization term in a very similar way to [43]. This term is computed using a weight wunf and a function Df that measures the distance of a binding χ from full constraint satisfaction:
Df(χ, C) = (∑c∈C Meet(c, χ)) / |C|        (G.3)
where C is the set of global and interdependence constraints of the problem¹. Meet(c, χ) is a function that measures the distance to the fulfillment of a single constraint c by the binding χ:

Meet(c, χ) = 0                                                 if c is met
             |Qq(χ) − Tq|   (distance to threshold)            if c is global and unmet                      (G.4)
             fraction of interdependent services missing       if c is a dependence constraint and unmet
¹ Local constraints are not taken into account, since they can be met by preprocessing the set of candidate services [15].
In this function, we denote the threshold of each global constraint on QoS property q as Tq. For instance, given the global constraint "the total cost of the composition must be lower than five" ≡ Qcost(χ) < 5, then Tcost = 5.
Thus, our final function to be maximized is:

ObjFunc(χ) = GlobUtil(χ) − (wunf ∗ Df(χ))        (G.5)

with 0 ≤ wunf ≤ 1.
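Equations G.3 to G.5 combine into the penalized objective; the sketch below assumes the per-constraint Meet distances have already been computed:

```python
def distance_to_feasibility(meet_values):
    """Equation G.3: mean of the per-constraint Meet(c, x) distances."""
    return sum(meet_values) / len(meet_values) if meet_values else 0.0

def obj_func(glob_util, meet_values, w_unf):
    """Equation G.5: global utility minus the weighted violation penalty."""
    assert 0.0 <= w_unf <= 1.0
    return glob_util - w_unf * distance_to_feasibility(meet_values)

# One constraint met (distance 0), one global constraint exceeded by 0.4:
print(obj_func(0.7, [0.0, 0.4], w_unf=0.5))  # equals 0.7 - 0.5 * 0.2
```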
GRASP building phase
In QoS-Gasp, GRASP elements represent choices of a specific candidate service for a task. Thus, the solution χ is built by choosing a service for a task at each iteration of the loop, until the solution is a complete binding. The partial solution at iteration k is denoted as χk. The specific task to bind at iteration k is chosen randomly.
The set of valid elements for a task ti is determined by the service dependency constraints. For instance, in our motivating example there exists a dependency constraint between t3 (stock querying) and t4 (reservation for pickup). Thus, if a provider has been chosen for task t3 in our partial solution χk, then the same provider should be chosen for t4. If conflicting dependency constraints are found, the construction phase restarts, since it is not possible to create a feasible solution from χk−1.
QoS-Gasp uses an RCL selection scheme that has been applied extensively in the GRASP literature [235]. Specifically, this selection is driven by an evaluation function g, which must be defined for the specific optimization problem to solve, and a greediness parameter α (between 0 and 1). Function g provides a value in R for each candidate service, where gmin is the minimum and gmax the maximum of those values. A service si,j will be in the RCL if g(si,j) is greater than or equal to gmin + α · (gmax − gmin); i.e., α defines the proportion of the range [gmin, gmax] in which candidates are discarded from the RCL. Thus, for α = 0 all the candidates are in the RCL (none is discarded) and the construction phase becomes random, while for α = 1 only the candidates with a value of g equal to gmax are in the RCL.
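The RCL rule can be sketched as follows (the candidate names and greedy values are illustrative):

```python
import random

def build_rcl(candidates, g, alpha):
    """Keep the candidates whose greedy value reaches
    g_min + alpha * (g_max - g_min)."""
    values = {c: g(c) for c in candidates}
    g_min, g_max = min(values.values()), max(values.values())
    threshold = g_min + alpha * (g_max - g_min)
    return [c for c in candidates if values[c] >= threshold]

g = {"s1": 1.0, "s2": 2.0, "s3": 3.0}.get
print(build_rcl(["s1", "s2", "s3"], g, alpha=0.0))  # all kept: random phase
print(build_rcl(["s1", "s2", "s3"], g, alpha=1.0))  # only the best: greedy
chosen = random.choice(build_rcl(["s1", "s2", "s3"], g, alpha=0.25))
```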
The function g and the value of α are crucial for the performance of GRASP [235]. We defined up to seven different greedy functions for the QoSWSCB problem. Since the optimal values of those parameters depend on the problem to be solved, we performed a preliminary experiment testing each function g with several values of α. All the details about the g functions and their evaluation are reported in [212]. Summarizing, an auxiliary experiment #A1 was performed in order to choose the best combination of values for α and g. The best average results were obtained for α = 0.25 with the greedy functions G1, G2 and G6 shown below:
G1(si,j, χk) = ∑q∈Q wq · Uq(qi,j)        (G.6)
G1 is "myopic" and unadaptive, meaning that it only considers the QoS value of each service, ignoring the current solution under construction χk; however, its evaluation is extremely fast.
G2(si,j, χk) = Df(χk) − Df(χk ∪ si,j)        (G.7)
G2 uses the difference between the distance to constraint satisfaction of the current partial solution χk and that of the new partial solution, denoted as χk ∪ si,j, but it ignores the QoS weights.
G6(si,j, χk) = ObjFunc(χk ∪ si,j) − GlobUtil(χk)        (G.8)
G6 is based directly on the gradient of the global QoS, ignoring the distance to constraint satisfaction of the current solution. This subtle variant penalizes the selection of elements that generate constraint violations.
In order to evaluate Df, GlobUtil and ObjFunc on partial solutions, a random solution is generated at the beginning of the construction phase, and its elements are used to complete the choices for the tasks still unassigned in χk.
GRASP improvement phase
The GRASP improvement phase in QoS-Gasp is a local search procedure based on a neighbourhood definition. The neighbourhood of a binding χ comprises all possible bindings that have exactly n − 1 assignments in common with χ; i.e., those that select the same candidate service for every task except one. QoS-Gasp uses Hill Climbing, where only a percentage of the neighbourhood is explored.
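A minimal sketch of hill climbing over this one-change neighbourhood (the sampling fraction and the toy score function are assumptions for illustration):

```python
import random

def hill_climb(binding, n_providers, score, fraction=0.5, seed=0):
    """Repeatedly move to a better neighbour that differs in the provider
    of exactly one task; only a sampled fraction of the neighbourhood is
    examined in each pass."""
    rng = random.Random(seed)
    current, best = list(binding), score(binding)
    improved = True
    while improved:
        improved = False
        neighbours = [(t, p) for t in range(len(current))
                      for p in range(n_providers[t]) if p != current[t]]
        sample = rng.sample(neighbours, max(1, int(fraction * len(neighbours))))
        for task, provider in sample:
            candidate = current[:]
            candidate[task] = provider
            if score(candidate) > best:
                current, best = candidate, score(candidate)
                improved = True
                break
    return current

# Toy score: with fraction=1.0 the full neighbourhood is examined, so the
# search reaches the optimum [1, 1, 1, 1].
print(hill_climb([0, 0, 0, 0], [2, 2, 2, 2], score=sum, fraction=1.0))
```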
Path Relinking
QoS-Gasp uses the adaptation of GRASP described above to initialize the elite set used by PR. The length of the path between the initiating and guiding solutions in QoS-Gasp is determined by the number of differing service candidates. Each step of a relinking path incorporates one service candidate from the guiding solution. It is worth noting that the order in which service candidates are incorporated defines different paths. Consequently, for each pair of initiating and guiding solutions a high number of different paths could be explored. In order to reduce the computational cost of such exploration, QoS-Gasp restricts the number of paths generated between each pair of solutions to Npaths. It introduces the service candidates from the guiding solution in a random order, and it limits the number of neighbours explored in each path to Nsteps. These parameters control the balance between the diversification of the areas of the search space explored and the exhaustiveness of the search in those areas, which is crucial in rebinding scenarios where execution time is scarce.
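One relinking path under these rules might be generated as follows (a sketch; the real algorithm also evaluates every intermediate binding):

```python
import random

def relinking_path(initiating, guiding, n_steps, seed=7):
    """Walk from the initiating towards the guiding solution, adopting one
    of the guiding solution's provider choices per step, in random order."""
    rng = random.Random(seed)
    differing = [t for t in range(len(initiating))
                 if initiating[t] != guiding[t]]
    rng.shuffle(differing)            # a different order yields a different path
    current, path = list(initiating), []
    for task in differing[:n_steps]:  # Nsteps bounds the walk's length
        current = current[:]
        current[task] = guiding[task]
        path.append(current)
    return path

path = relinking_path([0, 0, 0, 0], [1, 1, 0, 1], n_steps=2)
print(len(path))  # 2: only Nsteps of the 3 differing tasks are relinked
```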
G.3 PREVIOUS PROPOSALS
G.3.1 Genetic Algorithms
The proposal described in [42] has been implemented for comparison, since it is the most cited GA-based approach for this problem. In particular, the initial population is generated randomly, and a standard one-point crossover operator [78] is used. The mutation operator modifies the candidate service assigned to a single task, both chosen randomly. Parameter values are chosen according to [42].
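The two operators can be sketched on the integer-vector encoding (the parameter values are illustrative, not those of [42]):

```python
import random

def one_point_crossover(parent1, parent2, rng):
    """Standard one-point crossover: swap the tails after a random cut."""
    point = rng.randrange(1, len(parent1))
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(binding, n_providers, rng):
    """Replace the candidate of one randomly chosen task by a randomly
    chosen provider for that task."""
    child = list(binding)
    task = rng.randrange(len(child))
    child[task] = rng.randrange(n_providers[task])
    return child

rng = random.Random(42)
child1, child2 = one_point_crossover([0, 0, 0, 0], [1, 1, 1, 1], rng)
print(child1, child2)  # complementary offspring of the two parents
```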
G.3.2 Hybrid TS with SA
A hybrid of TS with Simulated Annealing (SA) was proposed in [166] for solving the QoSWSCB problem. This proposal was aimed at finding feasible solutions of constrained instances; thus the search was driven by the constraint meeting distance, and execution terminated as soon as a feasible solution was found. In order to enable the comparison with our proposals, and to continue optimizing according to user preferences even when all constraints are met, a modification has been carried out. When all the constraints are met, the difference between the QoS value of the current solution Qq and the average QoS for this property Avgq is used for guiding the search. Specifically, the QoS property selected to guide the improvement in the algorithm is the one minimizing s ∗ (Qq(χ) − Avgq) ∗ wq, where s is 1 if q is positive and −1 if it is negative; i.e., our modification tries to generate neighbours improving the solution in the QoS property with the largest room for improvement and importance for users. The pseudo-code of the resulting algorithm, and a detailed explanation of its working scheme, is available in the additional material ([212]).
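The property-selection rule of this modification can be sketched as follows (the names and sample values are ours):

```python
def guiding_property(qos, avg, weights, is_positive):
    """Pick the property minimizing s * (Q_q - Avg_q) * w_q, i.e. the one
    with the largest weighted room for improvement."""
    def key(q):
        s = 1.0 if is_positive[q] else -1.0
        return s * (qos[q] - avg[q]) * weights[q]
    return min(qos, key=key)

# Availability is well below average, so it guides the next improvement.
print(guiding_property(
    qos={"cost": 1.0, "availability": 0.6},
    avg={"cost": 1.2, "availability": 0.9},
    weights={"cost": 0.5, "availability": 0.5},
    is_positive={"cost": False, "availability": True}))  # -> availability
```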
APPENDIX G. QOS-AWARE BINDING OF COMPOSITE WEB SERVICES
G.4 EXPERIMENTS PERFORMED ON THE QOSWSCB PROBLEM
G.4.1 Experiment QoSWSCB-#A1: Tailoring of GRASP
The aim of preliminary experiment #A1 was to choose the parameters of the GRASP construction phase to be used in the experimentation. These parameters are two: (i) α, which controls the level of elitism versus randomness when creating the RCL; and (ii) the greedy function G, used in the evaluation of features at each iteration of the construction phase. This experiment is similar to those presented in [222], which we have used for inspiration.
In this experiment we assume that the extreme values of α, zero and one, are not valid options. When α is zero, the construction phase turns into a purely random solution generation procedure. When α is one, the construction method becomes purely elitist, similar to local optimization methods. Consequently, in this experiment the following values of α are used: 0.25, 0.5 and 0.75. The candidate greedy functions, along with their formulations, are described in Section §G.2, from G1 to G7.
A good greedy function should provide not only near-optimal solutions, but also generate enough diversity to explore the multiple local optima of the search space [222, 235]. A measure of the diversity of solutions is defined based on our notion of neighbouring solutions.
A factorial design is used for this preliminary experiment, generating all the possible combinations of parameter values. Eleven problem instances were generated by
the algorithm described in Appendix C of [212].
In general, the best global configuration is α = 0.25 and G = G6. However, for tightly constrained instances, the configuration α = 0.5 and G = G2 provides better mean results, and should be used for this kind of problem. It is not surprising that G2 generates much diversity, since for unconstrained instances of the problem its use is equivalent to a random selection of features: the evaluation of this greedy function is always zero for unconstrained problems, and thus all the features are inserted in the RCL independently of the value of α. G1 and G4 also perform well in general, but worse than G2.
Parameter                                   Range
Initial Grasp Iterations                    {10, 50, 100}
Number of Elite Solutions                   {2, 5, 10}
Number of paths per Iteration               {2, 5, 10}
Number of neighbors to explore per path     {5, 10, 50}

Table G.3: Parameter ranges
G.4.2 Experiment #A2: Tuning of GRASP+PR
The aim of preliminary experiment #A2 was to choose the parameters of the GRASP+PR approach to be used in the experimentation. The parameter values for the underlying GRASP technique are chosen based on the results of the previous section. Thus, only the values of the parameters specific to Path Relinking remain to be chosen. These parameters are: (i) Initial Grasp Iterations (number of GRASP iterations prior to the intensification phase of PR); (ii) Number of Elite Solutions; (iii) Number of Paths per Iteration; and (iv) Number of neighbours to be explored on each path.
A factorial design is used for this preliminary experiment, generating all the possible combinations of parameter values in a certain range. The ranges used for each parameter are described in Table §G.3. These ranges were determined by the authors' criteria and some exploratory executions of GRASP+PR.
Eleven problem instances were generated by the algorithm described in [212]. GRASP+PR was run 30 times for each problem instance and parameter configuration. Techniques were configured to use a termination criterion based on a maximum execution time of 100 ms.
Due to the size of the experiment design, we simply describe the conclusions of the
experiment. Both the source code of the implementation and the raw data of results
are available on-line for interested readers.
The configuration with best mean results was: (i) Initial Grasp Iterations = 50; (ii)
Number of Elite Solutions = 10; (iii) Number of paths per Iteration = 2; and (iv) Number
of neighbors to explore per path = 50.
G.4.3 Experiment #1: Selection of a technique for QoSWSCB
The aim of this experiment is to compare the performance of our proposal and previous ones in terms of the QoS of the solutions they provide. Previous proposals (as described in Section §G.3) are compared to ours when solving a number of instances of the QoSWSCB problem. Specifically, we compare Genetic Algorithms (GAs) and Hybrid
Tabu Search with Simulated Annealing (TS/SA), with a GRASP using G1 (GRASP(G1)),
and two variants of GRASP with Path Relinking (GRASP+PR) that use G2 and G6.
Positive scaling utility functions were used for Availability, Reliability and Security (we denote this set of properties as Q+ = {A, R, S}), cf. Section §G.1.1. Negative scaling utility functions were used for the remaining properties (Q− = {C, T}). The weights used for each QoS property were: wunf = 0.5, wC = 0.3, wT = 0.3, wA = 0.1, wS = 0.2, wR = 0.1. Since FOM solves minimization problems, an objective function that subtracts the value of ObjFunc (as described in Equation G.5) from 1.0 was used. Problem instances were generated by the algorithm described in Appendix C of [212].
Table §G.4 shows the mean results per problem instance and execution time. Specifically, Table §G.4 is divided into four sub-tables by execution time. In each sub-table, rows depict the results obtained for each problem instance, and columns depict the results obtained by each optimization technique. Note that the problem was modelled as a minimization problem for compatibility with the experimental framework FOM, which means that the lower the value the better. It is noticeable that GRASP+PR(G6) obtained the best mean results in almost all cases. GA provides intermediate results, better than TS+SA, but not as good as GRASP+PR and GRASP. The performance of TS+SA was poor except for tightly constrained problem instances. Our statistical analysis revealed that the differences between GRASP+PR(G6) and the other techniques are statistically significant (with α = 0.05) except for one problem instance and technique. Specifically, the differences between GRASP(G1) and GRASP+PR(G6) are not significant for problem P7 when execution times are longer than 500 ms. It is worth noting that P7 is significantly smaller than the others: it contains only 7 tasks and 63 candidate services. Thus, the authors infer that for small instances of the problem GRASP(G1) can behave nearly as well as GRASP+PR(G6). The causes of this behaviour could be: (i) the inefficiency of the intensification strategy of PR, since the probability of paths overlapping is bigger for small problem instances; and (ii) the capability of GRASP for exploring the promising area of the search space for small problem instances.
In order to evaluate the extent to which some techniques outperform others, we computed the percentage of runs where the result obtained by one technique is better than any result (out of the 30 runs) obtained by another technique (for the same problem instance and execution time). Table §G.5 summarizes these results. It is divided into four sub-tables by execution time, where each sub-table contains a square matrix with the optimization techniques in rows and columns. Specifically, the value of a cell is the mean of the percentage described above over the problem instances. For instance, the value in the second row and first column of the top-left sub-table specifies that, for execution times of 100 ms, on average 92.42% of the solutions obtained by GRASP+PR(G6) are better than any solution obtained by GA. This means that the results obtained by GRASP+PR(G6) outperform those obtained by GA. Since the percentages are averaged over all the problem instances and refer to different pairs of techniques, the sum by rows and columns is not 100%. Table §G.5 confirms the conclusions drawn above, since the row of GRASP+PR(G6) has the highest percentage for almost any execution time and column. However, the small percentage in that row for the column of GRASP(G1) is noticeable, while the transposed cell (row GRASP(G1) and column GRASP+PR(G6)) also has a small percentage. This means that, although on average the results of GRASP+PR(G6) are better and have less dispersion than those of GRASP(G1), the latter can occasionally find better solutions than those usually found by the former. Another noticeable finding is the progressive decrease of the percentages of GRASP+PR(G6) and GRASP(G1) when execution time increases.
Figures §G.2 and §G.3 show box plots for two problem instances with a termination criterion of 100 ms. Each figure depicts four populations, defined as the values of ObjFunc for the best solution obtained in the runs of an optimization technique. Thus each population has 30 samples. Results of GRASP+PR(G6) are labelled as GRASP+PR, and those of GRASP(G1) as GRASP. Specifically, for each population the box plot shows: the minimum sample, represented as the lower horizontal line segment; the lower quartile (Q1), represented as the lower limit of the box; the median (Q2), the segment dividing the box; the upper quartile (Q3), represented as the top of the box; and the largest sample, represented as the upper horizontal line segment. Samples considered outliers are represented as circles or stars. The distribution of the results obtained by GRASP+PR is the best in both figures. The small variability of the results provided by TS+SA is analyzed in depth in [212].
The improvements provided by our proposals are significant not only in a statistical sense, but also in terms of the actual QoS provided. As a motivating example, the QoS of the solutions provided by GRASP+PR(G6) for problem instance C4 is on average 49.25% and 28% better than that provided by GAs and TS+SA respectively. These improvements are noteworthy when translated into cost savings and execution time reductions.
Figure G.2: Box plot of results in Experiment #1 and problem instance 9
Figure G.3: Results in Experiment #1 and problem instance 2
Execution Time: 100 ms

Problem   GA            GRASP+PR(G6)   GRASP+PR(G2)   GRASP(G1)     TS/SA
P0        0.317053066   0.31467194     0.31559766     0.31585178    0.37823648
P1        0.832070664   0.82958546     0.83114955     0.82996641    0.90526782
P2        0.314220241   0.30428238     0.30821952     0.30961194    0.40335166
P3        0.786458899   0.77332510     0.77774798     0.77939848    0.87387101
P4        0.810939436   0.81066853     0.81082234     0.81086532    0.81292188
P5        0.345341638   0.33979296     0.34254369     0.34201717    0.39219569
P6        0.814693606   0.79437721     0.80665351     0.79564660    0.89469352
P7        0.755621698   0.74596884     0.74604446     0.74597200    0.82326859
P8        0.859142490   0.85159186     0.85524606     0.85185832    0.91504732
P9        0.802587993   0.78813945     0.79375533     0.79500608    0.88106812
P10       0.333850406   0.33271258     0.33290040     0.33326753    0.34420161

Execution Time: 500 ms

Problem   GA            GRASP+PR(G6)   GRASP+PR(G2)   GRASP(G1)     TS/SA
P0        0.316847884   0.31465716     0.31541106     0.31559366    0.37819911
P1        0.832185790   0.82958531     0.83047811     0.82996641    0.90526782
P2        0.314327155   0.30334881     0.30667738     0.30770218    0.40335070
P3        0.785376182   0.77303310     0.77626168     0.77730067    0.87377022
P4        0.810937976   0.81066853     0.81077270     0.81081414    0.81291786
P5        0.345156788   0.33979296     0.34138343     0.34179962    0.39217004
P6        0.815804062   0.79307634     0.80427884     0.79564660    0.89464300
P7        0.756262913   0.74577745     0.74566567     0.74580965    0.82099777
P8        0.859046630   0.85159186     0.85385226     0.85185832    0.91504127
P9        0.803349334   0.78810276     0.79208361     0.79353237    0.88106812
P10       0.334067627   0.33271171     0.33276010     0.33286655    0.34393207

Execution Time: 1000 ms

Problem   GA            GRASP+PR(G6)   GRASP+PR(G2)   GRASP(G1)     TS/SA
P0        0.31702924    0.31464257     0.31514186     0.31521718    0.37819911
P1        0.83257414    0.82958531     0.82991653     0.82992594    0.90526782
P2        0.31406676    0.30334250     0.30495659     0.30548735    0.40335070
P3        0.78422132    0.77294846     0.77398728     0.77478427    0.87377022
P4        0.81094311    0.81066853     0.81072816     0.81073885    0.81291786
P5        0.34514013    0.33978323     0.34021847     0.34087602    0.39217004
P6        0.81558663    0.79244253     0.79865127     0.79564660    0.89464300
P7        0.75901268    0.74549613     0.74544938     0.74549475    0.82099777
P8        0.85937275    0.85159186     0.85210501     0.85185832    0.91504127
P9        0.80275109    0.78810276     0.79004525     0.79100871    0.88106812
P10       0.33372791    0.33268621     0.33266052     0.33268679    0.34393207

Execution Time: 50000 ms

Problem   GA            GRASP+PR(G6)   GRASP+PR(G2)   GRASP(G1)     TS/SA
P0        0.31708794    0.31464248     0.31503530     0.31511347    0.37819911
P1        0.83234986    0.82958531     0.82979438     0.82981234    0.90526782
P2        0.31456350    0.30334250     0.30436232     0.30472537    0.40335070
P3        0.78437278    0.77284804     0.77318822     0.77396728    0.87377022
P4        0.81090575    0.81066853     0.81070306     0.81072227    0.81291786
P5        0.34478886    0.33977041     0.33982145     0.34035985    0.39217004
P6        0.81608455    0.79238052     0.79697338     0.79564660    0.89464300
P7        0.75804813    0.74542966     0.74541133     0.74542556    0.82099777
P8        0.85909064    0.85159186     0.85185755     0.85183389    0.91504127
P9        0.80183890    0.78810276     0.78949346     0.78994663    0.88106812
P10       0.33395456    0.33265462     0.33264226     0.33265207    0.34393207

Table G.4: Means of obj. func. per algorithm and exec. time (Experiment 1)
500 ms
              GA       GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)  TS+SA
GA            -        0.00%         0.30%         0.00%      90.91%
GRASP+PR(G6)  87.27%   -             63.33%        1.21%      90.91%
GRASP+PR(G2)  58.48%   1.21%         -             0.91%      90.91%
GRASP(G1)     83.94%   0.61%         60.30%        -          90.91%
TS+SA         0.30%    0.30%         0.30%         0.30%      -

50000 ms
              GA       GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)  TS+SA
GA            -        0.91%         0.91%         0.91%      36.97%
GRASP+PR(G6)  71.52%   -             36.06%        5.15%      89.39%
GRASP+PR(G2)  72.73%   0.91%         -             9.09%      76.06%
GRASP(G1)     72.73%   0.00%         31.82%        -          90.61%
TS+SA         0.91%    0.30%         0.30%         0.61%      -

[100 ms and 1000 ms matrices: cell alignment not recoverable from the source]
Table G.5: Mean percentage of solutions improving any obtained by other tech. (Exp1)
G.4.4 Experiment #2: Selection of a technique for QoSWSCB (with a different objective function)

In order to ensure that the differences between our proposals and the previous approaches do not depend on the specific fitness function and problem instances used, we repeated the experiment using 11 additional problem instances and the objective function defined in [43].
min f_Canf(χ) = ( ∑_{q ∈ Q−} wq · Uq(Qq(χ)) ) / ( ∑_{q ∈ Q+} wq · Uq(Qq(χ)) ) + wunf · Df(χ)    (G.9)
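For illustration, the following sketch (in Java; the weights, utility values and attribute names are hypothetical, not taken from the experiments) computes this aggregated objective of equation (G.9) for a feasible candidate solution:

```java
public class AggregatedUtilitySketch {
    // Equation (G.9): the weighted utilities of the negative attributes
    // (to be minimised) divided by those of the positive attributes
    // (to be maximised), plus a weighted penalty D for constraint violations.
    static double objective(double[] wNeg, double[] uNeg,
                            double[] wPos, double[] uPos,
                            double wUnf, double distance) {
        double numerator = 0.0, denominator = 0.0;
        for (int i = 0; i < wNeg.length; i++) numerator += wNeg[i] * uNeg[i];
        for (int i = 0; i < wPos.length; i++) denominator += wPos[i] * uPos[i];
        return numerator / denominator + wUnf * distance;
    }

    public static void main(String[] args) {
        // Hypothetical candidate binding: two negative attributes (e.g. price,
        // response time), two positive ones (e.g. availability, reliability),
        // and no constraint violation (distance 0).
        double f = objective(new double[]{0.5, 0.5}, new double[]{0.4, 0.6},
                             new double[]{0.5, 0.5}, new double[]{0.9, 0.8},
                             100.0, 0.0);
        System.out.println(f); // (0.2 + 0.3) / (0.45 + 0.4) = 0.588...
    }
}
```

Note that an infeasible solution (distance > 0) is heavily penalised through wunf, which is what drives the search towards constraint satisfaction first.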
The results obtained for this experiment are shown in Table §G.6, using the same structure and notation as Table §G.4. GRASP+PR(G6) generates the best mean results for most problem instances. Specifically, for execution times of 500 ms, GRASP+PR(G6) provides the best average results for 8 out of 11 problem instances. TS+SA provided the best mean results for problem C2. This confirms that for tightly constrained problem instances it can perform better than GA and the GRASP-based proposals, which is consistent with the fact that it prioritizes constraint satisfaction in the search process [166]. GRASP provided the best mean results for two problem instances (C5 and C6).
Table §G.7 shows the mean percentages of improvements, in a similar way to Table §G.5. Again, GRASP+PR(G6) provided the highest percentages in general. The capability of GRASP(G1) to sporadically find the best results is confirmed by the results in Table §G.7. Moreover, the decreasing trend of the percentages of GRASP+PR(G6) as execution time increases is also appreciable. A noticeable difference with respect to Table §G.5 lies in the percentages of TS+SA: the performance of this technique is much better in this experiment. Thus, the performance of TS+SA is highly influenced by the specific objective function used for modelling the global utility. Statistical tests confirmed that the differences in the group of techniques were statistically significant in almost all cases. The only exception was the difference between GRASP+PR(G6) and TS+SA for problem C2 and execution times of 50000 ms.
Figures §G.4 and §G.5 show two plots depicting the results of each technique for two different problem instances with equation G.9 as objective function and a termination criterion of 100 ms. Again, the distribution of GRASP+PR is the best in both figures.
Execution time: 100 ms
Problem   GA           GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)    TS/SA
C0        20.3294      18.1494       19.0278       18.8089      19.4567
C1        17546.9250   16798.8826    16892.6761    16883.4022   18028.5566
C2        77.4635      53.3737       69.3314       50.6206      47.2274
C3        365838.7379  354607.1630   357263.5720   357324.9570  381218.1130
C4        4660.0503    2688.8626     4032.7248     2817.3613    2758.6429
C5        43077.7130   40087.5854    41927.6160    39712.0757   39804.2874
C6        504.0981     353.8804      367.8332      347.8916     348.1642
C7        29445.2042   32899.8809    25257.0317    19163.0773   20070.3702
C8        623.4833     590.7976      604.8967      605.9958     653.1621
C9        141414.7664  129780.8750   135381.2160   133877.3010  144955.3870
C10       21682.8585   20345.8959    20448.2644    20392.9211   26421.6496

Execution time: 500 ms
Problem   GA           GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)    TS/SA
C0        20.2267      18.1300       18.7870       18.7038      19.4567
C1        17538.6257   16793.7861    16853.8274    16814.3450   18028.5566
C2        77.2953      50.7654       64.8281       50.6206      47.2274
C3        371234.6973  354238.9660   354854.3030   354498.9940  381218.1130
C4        4717.7005    2522.7860     3846.3891     2817.3613    2758.6429
C5        43167.4373   40087.5854    40992.9427    39712.0757   39804.2874
C6        512.1532     352.3514      368.4614      347.2431     348.1642
C7        27976.6404   16228.1583    22018.1279    19156.5452   20070.3702
C8        621.2869     565.9943      593.6641      591.7945     653.1621
C9        143803.8980  126129.2150   131166.6470   130685.0550  144955.3870
C10       21803.1250   20189.7876    20295.9563    20295.1878   26421.6496

Execution time: 1000 ms
Problem   GA           GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)    TS/SA
C0        20.2537      18.1294       18.7945       18.3892      19.4567
C1        17344.5197   16795.3229    16836.7681    16799.8261   18028.5566
C2        77.8725      49.1226       62.7276       50.6205      47.2274
C3        366237.9430  353935.1220   355399.0430   353959.6700  381218.1130
C4        4729.2815    2379.8780     3529.8844     2817.3613    2758.6429
C5        43157.9618   40039.7574    40893.9150    39712.0757   39804.2874
C6        499.4118     354.9664      368.2569      344.4944     348.1642
C7        28586.1747   15381.4457    19556.9182    18875.8913   20070.3702
C8        622.9089     556.9008      581.6285      574.9983     653.1621
C9        143082.5650  125785.1540   129145.8090   128143.6330  144955.3870
C10       21699.4216   20183.3250    20295.5048    20228.4809   26421.6496

Execution time: 50000 ms
Problem   GA           GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)    TS/SA
C0        20.5018      18.1271       18.7918       18.2799      19.4567
C1        17444.8214   16789.8868    16841.7142    16789.4079   18028.5566
C2        79.1106      47.3028       63.4201       50.6206      47.227387
C3        368806.7140  353530.0840   354685.5790   353517.1300  381218.1130
C4        4591.2618    2474.0530     3550.5512     2817.3613    2758.6429
C5        43269.8455   40228.1459    40794.3524    39712.0757   39804.2874
C6        513.9603     360.8437      377.2588      340.9694     348.1642
C7        28453.8342   14439.0012    19615.3699    18326.1133   20070.3702
C8        623.6269     557.2021      584.9948      568.9675     653.1621
C9        142111.7900  125284.279    129281.0100   127287.3410  144955.3870
C10       21672.8291   20204.3569    20282.9220    20213.8838   26421.6496

Table G.6: Means of obj. func. values per algorithm and execution time in Exp. 2
200 ms
              GA       GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)  TS+SA
GA            -        0.00%         0.00%         0.00%      41.21%
GRASP+PR(G6)  94.85%   -             59.09%        13.94%     76.97%
GRASP+PR(G2)  62.73%   0.00%         -             1.21%      62.12%
GRASP(G1)     100.00%  7.88%         62.42%        -          90.91%
TS+SA         45.15%   9.09%         26.97%        9.09%      -

500 ms
              GA       GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)  TS+SA
GA            -        0.30%         0.30%         0.30%      42.73%
GRASP+PR(G6)  88.79%   -             74.24%        8.18%      73.33%
GRASP+PR(G2)  61.21%   0.00%         -             1.21%      56.06%
GRASP(G1)     90.61%   3.33%         73.94%        -          74.55%
TS+SA         36.06%   18.18%        36.36%        18.18%     -

1000 ms
              GA       GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)  TS+SA
GA            -        0.61%         0.61%         0.61%      35.15%
GRASP+PR(G6)  75.45%   -             59.39%        2.42%      72.12%
GRASP+PR(G2)  40.91%   0.30%         -             2.73%      54.55%
GRASP(G1)     75.45%   0.61%         57.88%        -          70.00%
TS+SA         36.06%   18.18%        27.27%        27.27%     -

50000 ms
              GA       GRASP+PR(G6)  GRASP+PR(G2)  GRASP(G1)  TS+SA
GA            -        0.91%         0.91%         0.91%      35.45%
GRASP+PR(G6)  71.52%   -             56.97%        8.48%      66.97%
GRASP+PR(G2)  54.24%   0.00%         -             1.82%      48.79%
GRASP(G1)     72.42%   0.00%         56.06%        -          65.15%
TS+SA         26.97%   18.18%        18.48%        18.18%     -

Table G.7: Mean percentage of solutions improving any obtained by other tech. (Exp2)
Figure G.4: Results of each technique in Experiment #2 for problem instance 0
Figure G.5: Results of each technique in Experiment #2 for problem instance 9

One of the most surprising results of the study is the low dispersion of the results obtained by the TS/SA algorithm across different executions, especially when using the objective function of equation G.9. We logged the execution of this technique in order to determine the root cause of this behaviour. One of the causes was the acceptance criterion defined: according to [166], the probability of accepting a solution χ′ as the next current solution, given that the current solution is χ, is exp((F(χ) − F(χ′)) · iteration), where F is the fitness function. For objective functions with large-scale values such as equation G.9, it is rare to accept non-improving solutions: when the difference in the objective function is large and the number of iterations grows, the acceptance probability becomes extremely small. Therefore, the algorithm stops choosing non-improving neighbours very early in the optimization process. Furthermore, for constrained problem instances the constraint-satisfaction criterion prevails, since it is applied before the optimization criterion. Under these circumstances, the neighbour generation subroutine tends to always use the same properties for improvement, thus leading the search to similar areas of the search space. Finally, the deterministic initial solution generation method (local optimization based initialization, in Table 4 of [166]) also contributes to the nearly constant results.
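The effect of this acceptance criterion can be reproduced numerically. The sketch below (illustrative only; the objective values are hypothetical, not taken from the experiments) evaluates exp((F(χ) − F(χ′)) · iteration) at the two scales discussed above:

```java
public class AcceptanceProbabilitySketch {
    // Acceptance probability of a non-improving neighbour chi' in the
    // TS/SA hybrid of [166]: exp((F(chi) - F(chi')) * iteration).
    static double acceptance(double fCurrent, double fNeighbour, int iteration) {
        return Math.exp((fCurrent - fNeighbour) * iteration);
    }

    public static void main(String[] args) {
        // Objective on a small scale (difference of 0.01): a non-improving
        // move is still accepted with probability ~0.90 at iteration 10.
        System.out.println(acceptance(0.80, 0.81, 10));
        // Objective on the scale of equation G.9 (difference of 55): the
        // probability is on the order of 1e-239 already at iteration 10,
        // so non-improving moves are effectively never accepted.
        System.out.println(acceptance(20345.0, 20400.0, 10));
    }
}
```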
G.5
THREATS TO VALIDITY
In order to clearly delineate the limitations of the experimental study, we next discuss internal and external validity threats.
Internal validity. This refers to whether there is sufficient evidence to support the conclusions, and to the sources of bias that could compromise them. In order to minimize the impact of external factors on our results, QoS-Gasp was executed 30 times per problem instance to compute averages. Moreover, statistical tests were performed to ensure the significance of the differences identified between the results obtained by the compared proposals.
External validity. This is concerned with how well the experiments capture the objectives of the research and the extent to which the conclusions drawn can be generalized. It can be divided into limitations of the approach and generalizability of the conclusions. Regarding the limitations, the experiments showed no significant improvements when comparing QoS-Gasp with a simple GRASP for small problem instances and short execution times. This limitation is due to: (i) the capability of GRASP to explore a significant amount of the search space, and (ii) the overlapping of the paths explored by PR for such small problem instances.
Regarding the generalizability of the conclusions, two different objective functions and two different sets of problem instances were used. Additionally, the parameters and sizes of the problem instances were chosen from a survey of the most common values used in the literature (cf. tables of problem instance parameters in [212] and [261]). Finally, conclusions regarding the performance of QoS-Gasp are not generalizable to scenarios with longer execution times, pointing out a direction for future work.
H
GENERATION OF HARD FEATURE MODELS
The most effective way to do it, is to do it.
Amelia Earhart,
1897 – disappeared in 1937
American aviation pioneer and author
H.1
INTRODUCTION
Recent publications reflect an increasing interest in evaluating and comparing the performance of analysis techniques and tools on the analysis of feature models. One of the main challenges when assessing performance is to find hard problems that show the strengths and weaknesses of the tools under test in extreme situations (e.g. those producing the longest execution times). Feature models from real domains are by far the most appealing input problems. Unfortunately, although there are references to large-scale real feature models, only small examples from research publications or case studies are available. For instance, the largest feature model available in the SPLOT feature model repository [258] at the time of writing has 287 features. This lack of hard realistic feature models has led authors to evaluate their tools with large-scale randomly generated feature models of 5000, 10000 and up to 20000 features. More recently, some authors have also suggested looking for tough and realistic feature models in the open source community.
Regardless of the type of feature model used during experimentation, the characteristics of the tools under test are not considered in the current state of the art. As a result, current performance evaluations only provide a rough idea of the behaviour of tools with average problems, rather than looking for specific weak points related to the type of technique or algorithm under evaluation. Hence, developers and users would probably be more interested in knowing whether their tool can crash with a hard realistic model of small or medium size than in knowing the execution times of huge random models out of their scope.
The main goal of software testing is to find inputs that reveal errors in the software under test. The exhaustive search for these inputs is acknowledged to be unfeasible due to the size and complexity of programs: there are simply too many input combinations. As pointed out by McMinn [184], random testing is not a feasible solution: "random methods are unreliable and unlikely to exercise 'deeper' features of software that are not exercised by mere chance". In this context, metaheuristic search techniques have proved to be a promising solution for the automated generation of test data for both functional [184] and non-functional properties [6]. Metaheuristic search techniques are frameworks which use heuristics to find solutions to hard problems at an affordable computational cost. Typical metaheuristic techniques are evolutionary algorithms, hill climbing or simulated annealing [280]. For the generation of test data, these strategies translate the test criteria into an objective function (also called fitness function) that is used to evaluate and compare the candidate solutions with respect to the overall search goal. Using this information, the search is guided toward promising areas of the search space. Wegener et al. [288, 289] were among the first to propose the use of evolutionary algorithms to verify the time constraints of software, back in 1996. In their work, the authors used genetic algorithms to find input combinations that violate the time constraints of real-time systems, that is, those inputs producing an output too early or too late. Their experimental results showed that evolutionary algorithms are much more effective than random search in finding input combinations maximizing or minimizing execution times. Since then, a number of authors have followed their steps, using metaheuristics and especially evolutionary algorithms for the testing of non-functional properties such as execution time, quality of service, security, usability or safety [6, 184].
Inspired by the ideas of Wegener and later authors, in this appendix an evolutionary algorithm for the automated generation of hard feature models is proposed. In particular, we model the problem of finding computationally-hard feature models as an optimization problem and solve it using a novel evolutionary algorithm for feature models. Given a tool and an analysis operation, our algorithm generates input models of a predefined size maximizing aspects such as the execution time or the memory consumption of the tool when performing the operation. For the evaluation of our approach, we performed several experiments using different analysis operations, paradigms, tools and optimization criteria. In total, we performed over 50 million executions of analysis operations for the configuration and evaluation of our approach.
The results showed how our evolutionary program successfully identified input models causing much longer execution times and higher memory consumption than random models of identical or even larger size. Furthermore, the data revealed that the hard feature models found have properties similar to those of the realistic models found in the literature.
H.2
FEATURE MODELS
Feature models were first introduced as part of the Feature-Oriented Domain Analysis (FODA) method by Kang back in 1990 [157]. In this first proposal, three kinds of relationships between features were defined:
• Mandatory. If a child feature is defined as mandatory, it must be included in all the products in which its parent feature is included. Mandatory features are generally modelled using a filled circle, as shown in Figure §H.1(a). For instance, according to the feature model of Figure §H.3, it is mandatory that the software for mobile phones includes support for calls.
• Optional. If a child feature is defined as optional, it can be optionally included in all products in which its parent feature appears. Optional features are generally represented using an empty circle, as shown in Figure §H.1(b). For instance, support for multimedia devices (e.g. camera) is an optional feature in the feature model of Figure §H.3.
• Alternative. A set of child features is defined as alternative if only one of them can be selected when its parent feature is part of the product. Figure §H.1(c) depicts the usual visual representation of this relationship. As an example, according to the feature model of Figure §H.3, a mobile phone may use a Symbian or a WinCE operating system, but not both in the same product.
Notice that a child feature can only appear in a product if its parent feature does.
The root feature is a part of all the products within the software product line. In addition to the parental relationships between features, two kinds of cross-tree constraints
between features were identified in FODA. These are:
(a) Mandatory (b) Optional (c) Alternative
Figure H.1: Feature relationships

• Requires. If a feature A requires a feature B, the inclusion of A in a product implies the inclusion of B in that product. Requires constraints are commonly modelled using unidirectional arrows, as shown in Figure §H.2(a). For instance, according to the feature model of Figure §H.3, mobile phones including Camera require HighDefinition.
• Excludes. If a feature A excludes a feature B, both features cannot be part of the same product. These constraints are visually represented using bidirectional arrows, as shown in Figure §H.2(b). As an example, the software product line represented by the model of Figure §H.3 rules out the possibility of offering support for GPS with a basic screen in the same product.
(a) Requires
(b) Excludes
Figure H.2: Cross-tree constraints
In order to clarify the concepts concerning basic feature models, we present some examples. Figure §H.3 depicts a simplified feature model inspired by the mobile phone industry. The model illustrates how features are used to specify and build software for mobile phones. The software loaded in the phone is determined by the features that it supports. According to the model, all phones must include support for Calls, Messaging, and exactly one screen (Basic, Colour or HighDefinition), but not two screens. Furthermore, the software for mobile phones may optionally include support for multimedia devices such as Camera, MP3 or both of them.
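As an illustration of these semantics, the following sketch (our own simplified encoding, not the notation of FODA or of any tool discussed here) checks a product of the mobile phone example against the mandatory, alternative and requires rules mentioned above:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MobilePhoneProductCheck {
    // A product is the set of selected feature names. This check covers the
    // rules discussed above: Calls and Messaging are mandatory, exactly one
    // screen (Basic, Colour or HighDefinition) is selected, and Camera
    // requires HighDefinition.
    static boolean isValidProduct(Set<String> product) {
        if (!product.contains("Calls") || !product.contains("Messaging")) return false;
        List<String> screens = Arrays.asList("Basic", "Colour", "HighDefinition");
        long selectedScreens = screens.stream().filter(product::contains).count();
        if (selectedScreens != 1) return false; // alternative: exactly one screen
        if (product.contains("Camera") && !product.contains("HighDefinition")) return false;
        return true;
    }

    public static void main(String[] args) {
        Set<String> valid = new HashSet<>(Arrays.asList("Calls", "Messaging", "Colour"));
        Set<String> invalid = new HashSet<>(Arrays.asList("Calls", "Camera", "Basic"));
        System.out.println(isValidProduct(valid));   // true
        System.out.println(isValidProduct(invalid)); // false: no Messaging, Camera without HighDefinition
    }
}
```

Automated analysis tools generalise exactly this kind of check: they translate the whole model into a constraint satisfaction problem instead of hard-coding each rule.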
Figure H.3: Mobile phone feature model (without cross-tree constraints)
H.3
ETHOM: AN EVOLUTIONARY ALGORITHM FOR FEATURE MODELS
ETHOM is a novel evolutionary algorithm for solving optimization problems on feature models. The algorithm takes several size constraints and a fitness function as input and returns a feature model of the given size maximizing the optimization criteria defined by the function. In Section §2.4.1, we described the general structure of an evolutionary algorithm and explained its basic steps. In the following, we describe how these basic steps are carried out in our algorithm.
Initial population. The initial population is generated randomly according to the size constraints received as input. The current version of our algorithm allows the user to specify the number of features, the percentage of cross-tree constraints and the maximum branching factor of the feature model to be generated.
Evaluation. Feature models are evaluated according to the fitness function received as input, obtaining a numeric value that represents the quality of the candidate solution (i.e. its fitness).
Encoding. For the representation of feature models as individuals we propose a custom encoding. Existing encodings were ruled out since they were either not adequate to represent tree structures (e.g. binary encoding [16]) or not able to produce solutions of a fixed size (e.g. tree encoding [168]), a key requirement in our approach. Figure §H.4 depicts an example of our encoding. As illustrated, each model is represented by means of two arrays, one storing information about the tree and another one with information about the Cross-Tree Constraints (CTC). The order of each feature in the array corresponds to the Depth-First Traversal (DFT) order of the tree. Hence, the feature labelled with '0' in the tree is stored in the first position of the array, the feature labelled with '1' in the second position, and so on. Each feature in the tree array is defined as a tuple <PR, C>, where PR is the type of relationship with its parent feature (M: Mandatory, Op: Optional, Or: Or-relationship, Alt: Alternative) and C is the number of children of the given feature. As an example, the first position in the tree array, <Op, 2>, indicates that the feature labelled with '0' in the tree has an optional relationship with its parent feature and has two child features (those labelled with '1' and '3'). Analogously, each position in the CTC array stores information about one constraint in the form <TC, O, D>, where TC is the type of constraint (R: Requires, E: Excludes) and O and D are the indexes of the origin and destination features in the tree array respectively.

Figure H.4: Encoding of a feature model
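The two arrays of this encoding can be sketched as plain data structures; the class and field names below are illustrative, not those of the actual ETHOM implementation:

```java
public class EthomEncoding {
    // Relationship of a feature with its parent: Mandatory, Optional,
    // Or-relationship or Alternative.
    enum Rel { M, OP, OR, ALT }

    // One cell of the tree array: <PR, C>, listed in DFT order.
    static class TreeCell {
        final Rel parentRelation;
        final int children;
        TreeCell(Rel pr, int c) { parentRelation = pr; children = c; }
    }

    // One cell of the CTC array: <TC, O, D>, where O and D index the tree array.
    enum CtcType { REQUIRES, EXCLUDES }

    static class CtcCell {
        final CtcType type;
        final int origin;
        final int destination;
        CtcCell(CtcType t, int o, int d) { type = t; origin = o; destination = d; }
    }

    public static void main(String[] args) {
        // <Op, 2>: the first feature in DFT order is optional and has two children.
        TreeCell[] tree = {
            new TreeCell(Rel.OP, 2), new TreeCell(Rel.M, 1),
            new TreeCell(Rel.ALT, 0), new TreeCell(Rel.OP, 0)
        };
        // Feature 1 requires feature 3.
        CtcCell[] ctc = { new CtcCell(CtcType.REQUIRES, 1, 3) };
        System.out.println(tree.length + " features, " + ctc.length + " cross-tree constraint");
    }
}
```

Because both arrays have a fixed length for a given model size, all individuals in a population share the same chromosome shape, which is what makes the crossover step below straightforward.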
Selection. This step determines how the individuals of one generation are selected to be combined to produce new offspring. Selection strategies are generic and can be applied regardless of how the individuals are represented. In our algorithm, we experimented with both rank-based roulette-wheel and binary tournament selection strategies, obtaining positive results with both of them.
Crossover. These techniques combine chromosomes to produce new individuals, in a way analogous to biological reproduction. We tried two different crossover techniques in our algorithm with positive results: one-point and uniform crossover. Figure §H.5 depicts an example of the application of one-point crossover in our algorithm. The process starts by selecting two parent chromosomes to be combined. For each array in the chromosomes, the tree and CTC arrays, a random point is chosen (the so-called crossover point). Finally, the offspring is created by copying the content of the arrays from the beginning to the crossover point from one parent and the rest from the other one. Notice that the characteristics of our encoding guarantee a fixed size for the individuals.
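The copy step of one-point crossover can be sketched as follows (for brevity, each <PR, C> or <TC, O, D> tuple is flattened to a single int; in ETHOM the same routine would be applied to both the tree and the CTC arrays):

```java
import java.util.Arrays;
import java.util.Random;

public class OnePointCrossoverSketch {
    // One-point crossover on a fixed-length chromosome array: the child
    // takes parent1[0..point) and parent2[point..end). Since both parents
    // encode models of the same size, the offspring size is preserved.
    static int[] crossover(int[] parent1, int[] parent2, int point) {
        int[] child = new int[parent1.length];
        System.arraycopy(parent1, 0, child, 0, point);
        System.arraycopy(parent2, point, child, point, parent2.length - point);
        return child;
    }

    public static void main(String[] args) {
        int[] p1 = {1, 1, 1, 1, 1};
        int[] p2 = {2, 2, 2, 2, 2};
        Random rnd = new Random();
        int point = 1 + rnd.nextInt(p1.length - 1); // random crossover point
        System.out.println(Arrays.toString(crossover(p1, p2, point)));
    }
}
```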
Figure H.5: Example of one-point crossover in our algorithm

Mutation. In this step, random changes are applied to the chromosomes to prevent the algorithm from getting stuck prematurely at a locally optimal solution. Mutation operators must be specifically designed for the type of encoding used. In our algorithm, we defined four types of custom mutation operators, namely:
• Operator 1. It randomly changes the type of a relationship in the tree array, e.g. from mandatory, <M, 3>, to optional, <Op, 3>.
• Operator 2. It randomly changes the number of children of a feature in the tree, e.g. from <M, 3> to <M, 5>. The new number of children is in the range [0, BF], where BF is the maximum branching factor indicated as input.
• Operator 3. It changes the type of a cross-tree constraint in the CTC array, e.g. from excludes, <E, 3, 6>, to requires, <R, 3, 6>.
• Operator 4. It randomly changes (with equal probability) the origin or destination feature of a constraint in the CTC array, e.g. from <E, 3, 6> to <E, 1, 6>. Origin and destination features are ensured to be different.
These operators are applied randomly with the same probability.
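As an example, Operator 2 can be sketched as follows (illustrative code, not the actual ETHOM implementation); the other three operators follow the same pattern on the relationship type and on the CTC entries:

```java
import java.util.Arrays;
import java.util.Random;

public class EthomMutationSketch {
    // Operator 2: replace the number of children of a randomly chosen
    // feature with a new value in the range [0, BF], where BF is the
    // maximum branching factor given as input.
    static void mutateChildrenCount(int[] childrenCounts, int maxBranchingFactor, Random rnd) {
        int feature = rnd.nextInt(childrenCounts.length);
        childrenCounts[feature] = rnd.nextInt(maxBranchingFactor + 1);
    }

    public static void main(String[] args) {
        int[] children = {3, 0, 2, 1};
        mutateChildrenCount(children, 5, new Random());
        // One entry possibly changed; all entries remain in [0, 5].
        System.out.println(Arrays.toString(children));
    }
}
```

Note that after a mutation like this the chromosome may decode to an infeasible model, which is precisely why the decoding step below includes a repair mechanism.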
Decoding. At this stage, the array-based chromosomes are translated back into feature models in order to be evaluated. In our algorithm, we identified three types of patterns making a chromosome infeasible or semantically redundant, namely: i) those encoding set relationships (or- and alternative) with a single child feature (e.g. Figure §H.6(a)), ii) those containing cross-tree constraints between features with a parental relationship (e.g. Figure §H.6(b)), and iii) those containing features sharing contradictory or redundant cross-tree constraints (e.g. Figure §H.6(c)). The specific approach used to address infeasible individuals, replacement or repair, mainly depends on the problem and is ultimately up to the user.
Survival. Finally, the next population is created by including all the new offspring plus those individuals from the previous generation that were selected for crossover but did not generate descendants (since crossover is applied with a given probability).
Figure H.6: Examples of infeasible individuals and repairs
H.3.1
Instantiation of the algorithm
In this section, we model the problem of finding computationally-hard feature models as an optimization problem and solve it using an instantiation of our evolutionary algorithm. We chose evolutionary computation because it has proved to be a robust search technique suited to the complex search spaces and noisy objective functions that arise when dealing with non-functional properties [6]. A key benefit of our approach is that it takes into account the characteristics of the tools under test, trying to exploit their vulnerabilities. Also, our approach is very generic, being applicable to any automated operation on feature models, not only analyses, in which the quality (i.e. fitness) of the models can be measured quantitatively.
Next, we clarify the main aspects of the configuration of our algorithm:
• Fitness function. Our first attempt was to measure the execution time in milliseconds invested by FaMa to perform the operation. However, we found that this was very inaccurate, since the result of the function was deeply affected by the system load, i.e. it was not deterministic. To solve this problem, we decided to measure the fitness of a feature model as the number of backtracks produced by the analysis tool during its analysis. A backtrack represents a partial candidate solution to a problem that is discarded because it cannot be extended to a full valid solution [271]. In contrast to execution time, most CSP backtracking heuristics are deterministic. Together with execution time, the number of backtracks is commonly used to measure the complexity of constraint satisfaction problems [271]. Thus, we may assume that the higher the number of backtracks, the longer the computation time.
• Infeasible individuals. We evaluated the effectiveness of both replacement and repair techniques and finally opted for the latter. More specifically, we used the following repair algorithm for infeasible individuals: i) isolated set relationships are converted into optional relationships (e.g. the model in Figure §H.6(a) is changed as in Figure §H.6(d)), ii) cross-tree constraints between features with parental relationships are removed (e.g. the model in Figure §H.6(b) is changed as in Figure §H.6(e)), and iii) two features cannot share more than one constraint (e.g. the model in Figure §H.6(c) is changed as in Figure §H.6(f)).
• Stop criterion. There is no means of deciding when an optimal input has been found and the evolutionary algorithm should be stopped [289]. Therefore, we decided to let the algorithm run for a given number of executions of the fitness function, taking the largest number of backtracks obtained as the optimum, i.e. the solution to the problem.
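Putting the steps together, the overall loop can be sketched as follows. The fitness function below is a toy surrogate standing in for the number of backtracks reported by the analysis tool, so that the sketch is self-contained and runnable; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class EthomLoopSketch {
    // Toy surrogate fitness: stands in for the number of backtracks
    // produced by the analysis tool when analysing the decoded model.
    static int fitness(int[] chromosome) {
        int value = 0;
        for (int gene : chromosome) value += gene * gene;
        return value;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int populationSize = 20, genes = 8, budget = 2000;
        // Initial population: random chromosomes of the requested size.
        List<int[]> population = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) {
            int[] c = new int[genes];
            for (int g = 0; g < genes; g++) c[g] = rnd.nextInt(10);
            population.add(c);
        }
        // Stop criterion: a fixed budget of fitness evaluations; the largest
        // value observed is taken as the solution to the problem.
        int best = 0;
        for (int e = 0; e < budget; e++) {
            int[] child = population.get(rnd.nextInt(populationSize)).clone();
            child[rnd.nextInt(genes)] = rnd.nextInt(10);        // mutation
            best = Math.max(best, fitness(child));
            population.set(rnd.nextInt(populationSize), child); // survival
        }
        System.out.println("Best fitness found: " + best);
    }
}
```

In the real setting, evaluating a chromosome means decoding it into a feature model, repairing it if needed, running the analysis operation and reading the backtrack counter, which is why each of the 2000 or 5000 evaluations of the budget is comparatively expensive.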
H.4
EXPERIMENTS ON THE GENERATION OF HARD FEATURE MODELS
In order to evaluate our approach, we developed a prototype implementation of our algorithm in Java. With the aim of finding a suitable tailoring and tuning of the algorithm, we performed numerous executions of a sample optimization problem, evaluating different combinations of values for the variants and key parameters of the algorithm, presented in Tables §H.1 and §H.2 respectively. The optimization problem was to find a feature model maximizing the execution time invested by the analysis tool in checking whether the model is void (i.e., whether it represents at least one product). We chose this analysis operation because it is currently the most quoted in the literature [26]. In particular, we looked for feature models of different sizes maximizing the execution time of the CSP solver JaCoP integrated into the FaMa framework v1.0. We chose FaMa mainly because of our familiarity with the tool.
Tailoring point          Variants evaluated and selected
Selection strategy       Roulette-wheel, 2-Tournament
Crossover strategy       One-point, Uniform
Infeasible individuals   Replacing, Repairing

Table H.1: Algorithm tailoring, experiment ETHOM #A1
Parameter                         Values evaluated and selected
Crossover probability             0.7, 0.8, 0.9
Mutation probability              0.0075, 0.005, 0.02
Initial population size           50, 100, 200
#Executions of fitness function   2000, 5000

Table H.2: ETHOM tuning, experiment ETHOM #A1
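Evaluating "different combinations of values" for these parameters amounts to a full-factorial sweep over the candidate values of Table §H.2. The sketch below is illustrative only; the `Setup` record and its field names are ours, not part of the ETHOM configuration API.

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates every combination of the candidate parameter values of Table H.2. */
final class TuningGrid {

    /** One candidate configuration of the evolutionary algorithm (illustrative names). */
    record Setup(double crossover, double mutation, int population, int evaluations) {}

    static List<Setup> allSetups() {
        double[] crossoverProbs = {0.7, 0.8, 0.9};
        double[] mutationProbs  = {0.0075, 0.005, 0.02};
        int[] populationSizes   = {50, 100, 200};
        int[] fitnessBudgets    = {2000, 5000};
        List<Setup> setups = new ArrayList<>();
        for (double c : crossoverProbs)
            for (double m : mutationProbs)
                for (int p : populationSizes)
                    for (int e : fitnessBudgets)
                        setups.add(new Setup(c, m, p, e));
        return setups; // 3 * 3 * 3 * 2 = 54 configurations to evaluate
    }
}
```

Each of the 54 configurations is then run repeatedly on the sample problem, which explains the large total number of fitness-function executions reported below.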
APPENDIX H. GENERATION OF HARD FEATURE MODELS
Underlined values were those providing better results and therefore those selected for the
final configuration of ETHOM. In total, we performed over 40 million executions of the objective function to find a good setup for our algorithm (taking into account experiments #A1 and
#A2).
Experiment #1(a): Maximizing execution time in a CSP Solver
In this experiment, we evaluated the ability of ETHOM to search for input feature models maximizing the analysis time of a solver. In particular, we measured the execution time required
by a CSP solver to find out if the input model is consistent (i.e., whether it is void or not). This
was the same problem used to tune the configuration of ETHOM. Again, we chose the consistency operation because it is currently the most used in the literature. Next, we present the
experimental description of our experiment in SEDL4People and its results.
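A fitness evaluation of this kind wraps a timed consistency check. The sketch below is a hedged illustration: the `voidCheck` supplier stands in for the FaMa/JaCoP call (which we do not reproduce here), and the actual fitness in this experiment counts backtracks rather than wall-clock time.

```java
import java.util.function.BooleanSupplier;

/** Times a consistency (void) check, used here as a proxy for solver effort. */
final class TimedCheck {

    /** Outcome of one check: the verdict plus the elapsed wall-clock time. */
    record Result(boolean consistent, long nanos) {}

    static Result run(BooleanSupplier voidCheck) {
        long start = System.nanoTime();
        // The supplier stands in for an analysis call such as "is this feature model void?"
        boolean consistent = voidCheck.getAsBoolean();
        return new Result(consistent, System.nanoTime() - start);
    }
}
```

Timing the check this way makes the reported "Avg Time" and "Max Time" columns of the result tables directly reproducible for any solver wrapped behind the same interface.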
We define the effectiveness of our evolutionary program as the percentage of times (out of
20) in which the program found a better optimum than random models, i.e. a higher number of
backtracks. The effectiveness of evolutionary programming was over 90% in most of the cases,
reaching 100% in six of them. Overall, our evolutionary program found harder feature models
than those generated randomly in 88.75% of the executions. We may remark that ETHOM revealed the lowest effectiveness with those models containing 10% of cross-tree constraints. This
is due to the simplicity of the analysis in these models: the number of backtracks produced by
these models was very low, zero in most of the cases, and thus our evolutionary program had
problems finding promising individuals that could evolve towards optimal solutions.
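The effectiveness metric defined above reduces to a paired comparison of best fitness values. A minimal sketch (the input arrays of best backtrack counts are illustrative, not data from the experiment):

```java
/** Effectiveness as defined above: percentage of paired runs in which the
 *  evolutionary optimum (in backtracks) exceeds the random one. */
final class Effectiveness {

    static double percent(long[] evolutionaryBest, long[] randomBest) {
        if (evolutionaryBest.length != randomBest.length || evolutionaryBest.length == 0)
            throw new IllegalArgumentException("need paired, non-empty samples");
        int wins = 0;
        for (int i = 0; i < evolutionaryBest.length; i++)
            if (evolutionaryBest[i] > randomBest[i]) wins++; // EA found a harder model
        return 100.0 * wins / evolutionaryBest.length;
    }
}
```

With 20 paired runs per configuration, an effectiveness of 90 therefore means the evolutionary program won 18 of the 20 comparisons.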
Table §H.3 depicts the evaluation results for the range of feature models with 20% of cross-tree constraints. For each number of features and search technique, random and evolutionary,
the table shows the average and maximum fitness obtained as well as the average and maximum execution times of the hardest feature models found. The effectiveness of the evolutionary program is also presented in the last column. As illustrated, the evolutionary program
found feature models producing a number of backtracks larger by several orders of magnitude
than those produced by random models. The fitness of the hardest models generated using
our evolutionary approach was on average 1,950 times higher than that of random models
(108,579.13 backtracks against 55.63) and 2,450 times higher in the maximum value (6.6 million backtracks against 2,751). As expected, these results were also reflected in the execution
times. On average, the CSP solver invested 0.03 seconds to analyse the random models and
5.35 seconds to analyse those generated using our evolutionary generator. The superiority of
evolutionary search was especially remarkable in the maximum times ranging from the 0.23
seconds of random models to the 251.45 seconds (4.2 minutes) invested by the CSP solver to
analyse the hardest feature model generated by our evolutionary program. Overall, our evolutionary approach produced a harder feature model than random techniques in 94% of the
executions in the range of 20% of constraints.
         Random Testing                                        Evolutionary Algorithm                                     Effectiveness
#Feat.   Avg Fitness  Max Fitness  Avg Time (s)  Max Time (s)  Avg Fitness  Max Fitness  Avg Time (s)  Max Time (s)      (%)
200      6.45         15           0.01          0.01          310.25       2,122        0.02          0.06               90
400      13.20        37           0.02          0.02          8,028.85     153,599      0.28          4.80               95
600      29.50        223          0.03          0.06          8,765.65     118,848      0.67          7.33              100
800      53.95        304          0.05          0.06          346,217.95   6,678,168    13.19         251.45             95
1,000    175.05       2,751        0.07          0.23          179,572.95   3,167,253    12.58         208.91             90
Total    55.63        2,751        0.03          0.23          108,579.13   6,678,168    5.35          251.45             94

Table H.3: Evaluation results on the generation of feature models maximizing execution time in a CSP solver
A global summary of the results is presented in Table §H.4. The table depicts the maximum
execution times invested by the CSP solver to analyse the hardest models found using random
and evolutionary search. The data show that our approach was more effective than random
models in all size ranges. The hardest random model required 0.23 seconds to be processed.
In contrast, our evolutionary approach found five models requiring between 1 and 4.2 minutes
to be analysed. Interestingly, our algorithm was able to find harder but significantly smaller
feature models (between 400 and 800 features) than the hardest random model found (1,000
features). This emphasizes the ability of our approach to generate motivating input models of
realistic size that reveal the vulnerabilities of tools and heuristics instead of just running them
using large random models.
         10% CTC                    20% CTC                    30% CTC                    40% CTC
#Feat.   Rand. (s)   EA (s)        Rand. (s)   EA (s)        Rand. (s)   EA (s)        Rand. (s)   EA (s)
200      0.01        0.05          0.01        0.06          0.02        0.14          0.01        0.02
400      0.07        0.22          0.02        4.80          0.02        1.02          0.02        0.10
600      0.05        1.78          0.06        7.33          0.03        4.79          0.03        7.25
800      0.05        62.30         0.06        251.45        0.05        250.95        0.05        0.35
1,000    0.10        3.43          0.23        208.91        0.07        84.99         0.06        0.49

Table H.4: Maximum execution times produced by random models and our evolutionary program.
H.4.1
Experiment #1(b): Maximizing execution time in a SAT Solver
The only difference between this experiment and experiment #1(a) is the solver used for
the analysis (Parameter Solver: ’FAMA-SAT’). Consequently, we omit the experimental description. Regarding the results, the differences in the execution times obtained using random and
evolutionary techniques were not significant. This finding supports the results of Mendonca
et al. [187], which show that checking the consistency of feature models with simple cross-tree
constraints (i.e. those involving three features or less) using SAT solvers is highly efficient. We
emphasize, however, that SAT solvers are not the optimum solution for all the analyses that
can be performed on a feature model [26]. Previous studies show that CSP and BDD solvers
are often a better alternative to SAT solvers and therefore experiments with these and other
solvers are still necessary to study their applicability. The complete set of results, summarized
in tables and figures showing the effectiveness of ETHOM for this experiment, is available in
[246].
H.4.2
Experiment #2: Maximizing memory consumption in a BDD
solver
In this experiment, we evaluated the ability of our evolutionary program to generate input
feature models maximizing the memory consumption of a solver. In particular, we measured
the memory consumed by a BDD solver when finding out the number of products represented
by the model. We chose this analysis operation because it is one of the hardest operations in terms
of complexity and is currently the second most quoted operation in the literature [26]. We
decided to use a BDD-based reasoner for this experiment since it has proved to be the most
efficient option to perform this operation [26].
Although it is possible to find a good variable ordering that reduces the size of the BDD,
the problem of finding the best variable ordering remains NP-complete.
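A memory-based fitness of this kind can be approximated on the JVM by sampling heap usage around the analysis call. The following is a hedged sketch, not the instrumentation actually used in the experiment: the `analysis` task stands in for the BDD-based product counting, and JVM heap deltas are inherently noisy (which is consistent with the fitness non-determinism reported later in this section).

```java
/** Approximates the extra heap consumed by an analysis task, as a stand-in for
 *  the BDD-size fitness used in Experiment #2. */
final class MemoryFitness {

    static long usedHeapDelta(Runnable analysis) {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // best effort: reduce noise from earlier garbage before sampling
        long before = rt.totalMemory() - rt.freeMemory();
        analysis.run(); // e.g. counting the products of a feature model with a BDD solver
        long after = rt.totalMemory() - rt.freeMemory();
        return Math.max(0, after - before); // clamp: GC during the task can shrink the heap
    }
}
```

In practice, counting the nodes of the resulting BDD (as reported in Table §H.5) is a more stable proxy for memory consumption than raw heap deltas.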
Table §H.5 depicts the number of BDD nodes of the hardest feature models found using
random techniques and our evolutionary program. For each size range, the table also shows
the computation time (BDD building time + execution time) invested by SPLOT to analyse
the model. As illustrated, our evolutionary program found better results than random techniques in all size ranges. On average, the BDD size found by our evolutionary approach was
between 2 and 12.5 times higher than those obtained with random models. The largest BDD
generated from random models had 25.3 million nodes while the largest BDD obtained using
our evolutionary program had 27.9 million nodes. The results suggest, however, that the maximum found by evolutionary search would have been much higher had we not limited the
improvement factor in the range of 250 features (30% constraints) to make the experiment affordable. As expected, the superiority of our evolutionary program was also observed in the
computation times required by each model to be compiled and analysed. This suggests that our
approach can also deal with optimization criteria involving compilation time. Overall, our evolutionary program found feature models producing higher memory consumption than random
models in 99.3% of the executions.
         10% CTC                                     20% CTC                                       30% CTC
         Random               Evolutionary          Random                Evolutionary            Random                Evolutionary
#Feat.   BDD size   Time (s)  BDD size   Time (s)   BDD size    Time (s)  BDD size     Time (s)   BDD size    Time (s)  BDD size     Time (s)
50       781        0         1,963      0          2,074       0         8,252        0.01       2,455       0.01      10,992       0.01
100      7,629      0.01      20,077     0.02       33,522      0.03      161,157      0.20       95,587      0.08      419,835      0.73
150      65,627     0.10      188,985    0.31       374,675     0.91      3,060,590    12.80      673,410     1.28      11,221,303   24.22
200      203,041    0.09      924,832    0.86       2,735,005   4.34      19,698,780   75.05      3,394,435   58.22     23,398,161   380.52
250      1,720,983  3.69      7,170,121  25.94      25,392,597  82.28     27,970,630   253.32     20,579,015  343.72    22,310,416   431.62

Table H.5: BDD size and computation time of the hardest feature models found
Figure §H.7 shows the frequency with which each fitness value was found during the search
of a feature model producing the largest BDD. The data presented corresponds to the hardest
feature models generated in the range of 50 features and 10% of cross-tree constraints. We chose
this size range because it produced the smallest BDD sizes and facilitated the comparison of the
results of both techniques using the same scale. For random models (Figure §H.7(a)), a narrow
Gaussian-like curve is obtained with more than 99% of the executions producing fitness values
under 300 BDD nodes. During evolutionary execution (Figure §H.7(b)), however, a wider curve
is obtained, with 39% of the executions producing values over 300 nodes. Both histograms clearly
show how evolutionary programming performed a more exhaustive search in a larger portion
of the solution space than that explored by random models. This trend was also observed in
the rest of size ranges.
During this experiment, we found that the fitness function was not deterministic, that is,
different executions with the same input feature model produced different numbers of BDD
nodes. We found, however, that the variations in the number of nodes were small and did not
affect the effectiveness of our evolutionary program.
H.4.3
Experiment #3(a): Evaluating the impact of the number of generations on the effectiveness of ETHOM (with JaCoP)
During the work with ETHOM, we detected that the maximum number of generations
used as stop criterion had a great impact on the results of the algorithm. In this experiment, we
evaluated that impact with a double aim. First, we tried to find out the minimum number of
generations required by ETHOM to offer better results than random techniques on the search
for hard feature models. Second, we wanted to find out whether ETHOM was able to find even
harder models than in our previous experiments when allowed to run for a large number of
generations. Next, we present the experimental description of our experiment in SEDL4People
and its results.
For each number of generations (i.e., stop criterion), the maximum fitness and the effectiveness of both random and evolutionary search are presented. The results revealed that the
effectiveness of ETHOM was around 96% (CSP solver) and 100% (BDD solver) when the number of generations was 25 or higher. More importantly, we found that the results provided by
evolutionary search became progressively better as the number of generations was increased,
without reaching a clear peak, while the results of random search showed little or no improvement
at all. In the execution with the CSP solver, ETHOM produced a new maximum fitness of more
than 77 million backtracks, while random search found a maximum value of only 1,603
backtracks.
Figure H.7: Distribution of fitness values for random and evolutionary search ((a) random models; (b) evolutionary search)
H.4.4
Experiment #3(b): Evaluating the impact of the number of generations on the effectiveness of ETHOM (with SAT)
The only difference between this experiment and experiment #3(a) is the solver used for
the analysis (Parameter Solver: ’SPLOT-BDD’) and the analysis operation, which was
the number of products. Consequently, we omit the experimental description. In a very similar
way as in the previous experiment, the maximum random fitness produced in experiment #3(b)
was 89,779 nodes, far from the best fitness obtained by our evolutionary program, 22.7 million
nodes. Finally, we may emphasize that the maximum number of BDD nodes found by ETHOM
in the range of 125 generations (22.2 million nodes) was 120 times higher than the maximum
obtained when using 25 generations as stop criterion (185,203 nodes). This shows the power of
ETHOM when it is allowed to run for a large number of generations.
H.4.5
Experiment #4: Evaluating the impact of the Heuristics of JaCoP
In this experiment we checked whether the hard feature models generated by our evolutionary approach were also hard for solvers using other heuristics. In particular, we repeated
the analysis of the hardest feature models found in experiment #1 using the other seven heuristics available in the CSP solver JaCoP. Next, we present the experimental description of our
experiment in SEDL4People and its results.
The results revealed that the hardest feature models found in our experiment, using the
heuristic MostConstrainedDynamic, were trivially solved by some of the other heuristics. This
finding supports our working hypothesis: feature models that are hard to analyse by one tool
or technique could be trivially processed by others and vice versa. Hence, we conclude that
using a standard set of problems, random or not, is not sufficient for a full evaluation
of the performance of different tools. Instead, as in our approach, the characteristics of the
techniques and tools under evaluation must be carefully examined to identify their strengths
and weaknesses, providing helpful information for both users and developers.
H.5
THREATS TO VALIDITY
We briefly discuss the threats to validity of our work:
• Experimental procedure. In order to ensure the validity of the experimental approach, experiments were performed in a randomized order on the same computer and were replicated 25 times for each experimental configuration. Additionally, the results were formally validated by means of statistical tests that clearly showed the superiority of our
algorithm when compared to random search. More specifically, as detailed in [246],
Wilcoxon tests were performed on the results obtained with random and evolutionary
search.
• Limitations of the approach. Experiments showed no significant improvements when
using our algorithm with problems of low complexity, i.e., feature models with 10% of
constraints in Experiment #1. This limitation is due to the extremely flat shape of the fitness
landscape found in simple problems, in which most fitness values are equal or close to
zero. Another limitation of the experimental approach is that experiments for extremely
hard feature models become unfeasible. We may remark, however, that this limitation is
intrinsic to the problem of looking for hard feature models and thus it also affects random
search. Finally, we emphasize that in the worst case our algorithm behaves randomly,
equalling the strategies for the generation of hard feature models used in the current
state of the art.
• Generalizability of the conclusions. In our experiments, we used two different analysis
operations, which might seem insufficient to generalize the conclusions of our study.
We remark, however, that these operations are currently the most quoted in the literature, have significantly different complexity and, more importantly, are the basis for the
implementation of many other analysis operations on feature models [26]. Thus, feature models that are hard to analyse for these operations would certainly be hard to
analyse by those operations that use them as an auxiliary function, making our results
extensible to other analyses. Similarly, we just used two different analysis tools for the
experiments, FaMa and SPLOT. We remark, however, that these tools are developed and
maintained by independent laboratories, providing a sufficient degree of heterogeneity
for our study. Finally, the results obtained reveal that the shape and properties of the
hard feature models generated are similar to those found in the literature and therefore
there is no threat to validity due to the lack of realism of the generated models.
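The Wilcoxon validation mentioned in the first item can be sketched by computing the signed-rank statistic for paired random/evolutionary results. This minimal version is illustrative only: it computes W+ for paired samples, assumes no zero or tied absolute differences, and omits the p-value lookup performed by a full statistical package such as the one cited in [12].

```java
import java.util.Arrays;

/** Wilcoxon signed-rank statistic W+ for paired samples (ties and zero
 *  differences are not handled, for brevity). A W+ much larger than W-
 *  indicates that the first sample dominates the second. */
final class Wilcoxon {

    static double wPlus(double[] a, double[] b) {
        int n = a.length;
        double[] diff = new double[n];
        double[] absDiff = new double[n];
        for (int i = 0; i < n; i++) {
            diff[i] = a[i] - b[i];
            absDiff[i] = Math.abs(diff[i]);
        }
        double[] sorted = absDiff.clone();
        Arrays.sort(sorted);
        double w = 0;
        for (int i = 0; i < n; i++) {
            int rank = Arrays.binarySearch(sorted, absDiff[i]) + 1; // 1-based rank of |diff|
            if (diff[i] > 0) w += rank; // sum ranks of positive differences
        }
        return w;
    }
}
```

Comparing W+ (and the complementary W-) against tabulated critical values for the sample size then yields the significance verdict reported in [246].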
I
EVIDENCE OF UTILITY AND APPLICABILITY
I.1
UTILITY OF THE COMPARATIVE FRAMEWORK FOR MOFS
During the elaboration of the Comparative Framework of MOFs, we gathered letters from
MOF authors stating the usefulness of the CF for making decisions about the planning of
features to include in their next releases.
Figure I.1: Support letter from the University of Tübingen
Figure I.2: Support letter from the University of Applied Sciences of Upper Austria
I.2
UTILITY OF MOSES[RI]
I.2.1
Utility of FOM
FOM has been used for almost a decade for solving optimization problems. Unfortunately, the number of downloads and active users has not been tracked over these years.
However, there is evidence of its use to solve traffic planning problems in industry. Additionally,
we attach an expression of interest from a local engineering company.
Figure I.3: Expression of interest on FOM from ISOIN
I.2.2
Utility of STATService
Since STATService has a web interface, it is possible to track the use of the application.
Figures §I.4 and §I.5 show the distribution of visits over time and space, respectively.
[Google Analytics audience overview for labs.isa.us.es:8080/statservice, Dec 1, 2011 – Jul 8, 2013: 2,391 visits, 1,451 unique visitors, 6,633 pageviews, 2.77 pages/visit, average visit duration 00:03:03, 60.1% new visits, 69.76% bounce rate; top countries by visits: Spain (49.35%), United States (6.94%), Italy (6.52%), India (3.30%), United Kingdom (2.43%).]

Figure I.4: Timeline of individual visitors to the STATService web portal
[Google Analytics location report for labs.isa.us.es:8080/statservice, Nov 1, 2011 – Jul 8, 2013: visits from 96 countries, led by Spain (1,268 visits), the United States (166), Italy (157), India (80) and the United Kingdom (58), with per-country pages/visit, visit duration, % new visits and bounce rate.]

Figure I.5: Map of visits to the STATService web portal
I.2.3
Utility of EEE
EEE is currently in a beta version and has not been published, except for testing purposes
and validation of the contributions of this dissertation.
J
ACRONYMS
EEE          Experiment Execution Environment.
ETHOM        Evolutionary algoriTHm for Optimized feature Models.
FOM          Framework for Metaheuristic Optimization.
MIaMOE       Minimum Information about Metaheuristic Optimization Experiments.
MOE          Metaheuristic Optimization Experiment.
MOEDL        Metaheuristic Optimization Experiments Description Language.
MOF          Metaheuristic Optimization Framework.
MOSES        Metaheuristic Optimization Software EcoSystem.
MOSES[RI]    Reference Implementation of the Core of MOSES.
MPS          Metaheuristic Problem Solving.
QoS-Gasp     QoS-aware GRASP+PR algorithm for service-based applications binding.
QoSWSCB      QoS-aware Web Service Composition Binding.
SEA          Scientific Experiment Archive.
SEDL         Scientific Experiments Description Language.
BIBLIOGRAPHY
[1] TSPLIB benchmark library. Accessible at: http://www.iwr.uni-heidelberg.de/iwr/comopt/soft/TSPLIB95/TSPLIB.html. (page 160).
[2] The r project. GNU project, 2013. URL http://www.r-project.org/. (page 86).
[3] E. Aarts and J. Lenstra. Local Search in Combinatorial Optimization. Wiley, 1997. (pages
102, 108).
[4] P. Achinstein. Scientific Evidence: Philosophical Theories and Applications. Scientific Evidence. Johns Hopkins University Press, 2005. ISBN 9780801881183. URL http://books.google.es/books?id=xTN7NGO52OoC. (page 253).
[5] D. H. Ackley. A connectionist machine for genetic hillclimbing. Kluwer Academic Publishers,
Norwell, MA, USA, 1987. ISBN 0-89838-236-X. (pages 34, 109).
[6] W. Afzal, R. Torkar, and R. Feldt. A systematic review of search-based testing for nonfunctional system properties. Information and Software Technology, 51(6):957–976, 2009.
ISSN 0950-5849. doi: 10.1016/j.infsof.2008.12.005. (pages 306, 312).
[7] R. Aggarwal, K. Verma, J. Miller, and W. Milnor. Constraint driven web service composition in meteor-s. In SCC ’04: Proceedings of the 2004 IEEE International Conference on
Services Computing, pages 23–30, Washington, DC, USA, 2004. IEEE Computer Society.
ISBN 0-7695-2225-4. (page 286).
[8] E. Alba and J. F. Chicano. Software project management with GAs. Information Sciences, 177(11):2380 – 2401, 2007. ISSN 0020-0255. doi: 10.1016/j.ins.2006.12.020. URL http://www.sciencedirect.com/science/article/B6V0C-4MTK976-2/2/8570bc12b346047bd32fed96dc473c3c. (page 98).
[9] B. Andresen and J. M. Gordon. Constant thermodynamic speed for minimizing entropy
production in thermodynamic processes and simulated annealing. Phys. Rev. E, 50(6):
4346–4351, Dec 1994. doi: 10.1103/PhysRevE.50.4346. (page 103).
[10] T. Andrews, F. Curbera, H. Dholakia, Y. Goland, J. Klein, F. Leymann, K. Liu, D. Roller,
D. Smith, S. Thatte, I. Trickovic, and S. Weerawarana. BPEL4WS specification, 2003.
(page 279).
[11] P. J. Angeline, D. B. Fogel, and L. J. Fogel. A comparison of self-adaptation methods for
finite state machines in a dynamic environment. In Proc. 5th Ann. Conf. on Evolutionary
Programming, pages 441–450, 1996. (page 110).
[12] Apache. The statistical package of the Apache Commons Math library. http://commons.apache.org/math/userguide/stat.html, 2011. (pages 87, 90, 196).
[13] J. Arabas, Z. Michalewicz, and J. Mulawka. Gavaps-a genetic algorithm with varying
population size. In Evolutionary Computation, 1994. IEEE World Congress on Computational
Intelligence., Proceedings of the First IEEE Conference on, pages 73–78 vol.1, Jun 1994. doi:
10.1109/ICEC.1994.350039. (page 104).
[14] D. Ardagna and B. Pernici. Global and local qos guarantee in web service selection. In
Business Process Management Workshops, pages 32–46, 2005. (pages 280, 286, 287).
[15] D. Ardagna and B. Pernici. Adaptive service composition in flexible processes. Software
Engineering, IEEE Transactions on, 33(6):369–384, 2007. (pages 280, 283, 284, 285, 286, 288).
[16] T. Back, D. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. IOP
Publishing Ltd., Bristol, UK, UK, 1997. ISBN 0750303921. (page 309).
[17] T. Back, D. B. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation.
IOP Publishing Ltd., Bristol, UK, UK, 1997. ISBN 0750303921. URL http://portal.acm.
org/citation.cfm?id=548530. (pages 32, 33, 34, 102, 104, 108, 109, 110).
[18] J. Baker. Reducing bias and inefficiency in the selection algorithm. In Proceedings of the
Second International Conference on Genetic Algorithms, pages 14–21, 1987. (page 112).
[19] P. Balaprakash, M. Birattari, and T. Stützle. Improvement strategies for the f-race algorithm: sampling design and iterative refinement. In Proceedings of the 4th international conference on Hybrid metaheuristics, HM’07, pages 108–122, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 3-540-75513-6, 978-3-540-75513-5. URL http://dl.acm.org/citation.cfm?id=1777124.1777133. (pages 6, 75, 76).
[20] R. Barga, J. Jackson, N. Araujo, D. Guo, N. Gautam, and Y. Simmhan. The trident scientific workflow workbench. In eScience, 2008. eScience ’08. IEEE Fourth International Conference on, pages 317–318, 2008. doi: 10.1109/eScience.2008.126. (page 85).
[21] T. Bartz-Beielstein. Experimental Research in Evolutionary Computation: The New Experimentalism (Natural Computing Series). Springer, 1 edition, Apr. 2006. ISBN 3540320261. (pages
10, 73, 76, 82, 158, 169, 231, 252, 273).
[22] T. Bartz-Beielstein and M. Preuss. Experimental research in evolutionary computation.
In Proceedings of the 2007 GECCO conference companion on Genetic and evolutionary computation, GECCO ’07, pages 3001–3020, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-698-1. doi: 10.1145/1274000.1274102. URL http://doi.acm.org/10.1145/1274000.1274102. (pages 5, 6, 75).
[23] T. Bartz-Beielstein and M. Preuss. Tuning and experimental analysis in evolutionary
computation: what we still have wrong. In GECCO (Companion), pages 2625–2646, 2010.
(pages 158, 169, 252).
[24] T. Bartz-Beielstein, M. Chiarandini, and L. Paquete. Experimental Methods for the Analysis of Optimization Algorithms. SpringerLink: Springer e-Books. Springer, 2010. ISBN
9783642025389. URL http://books.google.es/books?id=UXogQWx8_HEC. (page 76).
[25] L. Bass, P. Clements, and R. Kazman. Software Architecture in Practice. Addison Wesley, 3
edition, 2013. (page 186).
[26] D. Benavides, S. Segura, and A. Ruiz-Cortés. Automated analysis of feature models 20
years later: A literature review. Information Systems, 35(6):615 – 636, 2010. ISSN 0306-4379.
doi: 10.1016/j.is.2010.01.001. (pages 213, 215, 219, 313, 315, 316, 320).
[27] R. Berbner, M. Spahn, N. Repp, O. Heckmann, and R. Steinmetz. Heuristics for qos-aware web service composition. Web Services, 2006. ICWS ’06. International Conference on,
pages 72–82, Sept. 2006. (page 280).
[28] A. J. Bertie. Java applications for teaching statistics. MSOR Connections, 2(3):28–81, 2002.
(page 87).
[29] M. Birattari, T. Stützle, L. Paquete, and K. Varrentrapp. A racing algorithm for configuring metaheuristics. In Proceedings of the Genetic and Evolutionary Computation Conference,
GECCO ’02, pages 11–18, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers
Inc. ISBN 1-55860-878-8. URL http://dl.acm.org/citation.cfm?id=646205.682291.
(pages 6, 75, 76).
[30] M. Birattari, M. Zlochin, and M. Dorigo. Towards a theory of practice in metaheuristics
design: A machine learning perspective. RAIRO-Theoretical Informatics and Applications,
40(02):353–369, 2006. (pages 158, 169).
[31] M. Birgmeier. Evolutionary programming for the optimization of trellis-coded modulation schemes. In Proc. 5th Ann. Conf. on Evolutionary Programming, 1996. (page 110).
[32] J. L. Blanton, Jr. and R. L. Wainwright. Multiple vehicle routing with time and capacity
constraints using genetic algorithms. In Proceedings of the 5th International Conference on
Genetic Algorithms, pages 452–459, San Francisco, CA, USA, 1993. Morgan Kaufmann
Publishers Inc. ISBN 1-55860-299-2. (page 109).
[33] C. I. Bliss. The calculation of the dosage-mortality curve. Annals of Applied Biology, 22
(1):134–167, 1935. ISSN 1744-7348. doi: 10.1111/j.1744-7348.1935.tb07713.x. URL http:
//dx.doi.org/10.1111/j.1744-7348.1935.tb07713.x. (page 141).
[34] C. Blum, M. J. B. Aguilera, A. Roli, and M. Sampels. Hybrid Metaheuristics: An Emerging
Approach to Optimization. Springer Publishing Company, Incorporated, 1st edition, 2008.
ISBN 354078294X, 9783540782940. (page 27).
[35] P. A. Bonatti and P. Festa. On optimal service selection. In WWW ’05: Proceedings of the
14th international conference on World Wide Web, pages 530–538, New York, NY, USA, 2005.
ACM. ISBN 1-59593-046-9. (page 280).
[36] J. Bosch. Architecture challenges for software ecosystems. In Proceedings of the Fourth
European Conference on Software Architecture: Companion Volume, pages 93–95. ACM, 2010.
(page 187).
[37] G. E. Box and K. Wilson. On the experimental attainment of optimum conditions. Journal
of the Royal Statistical Society. Series B (Methodological), 13(1):1–45, 1951. (page 76).
[38] A. Brindle. Genetic Algorithms for Function Optimization. PhD thesis, University of Alberta,
Edmonton, 1981. (page 112).
[39] A. W. Brown and K. C. Wallnau. A framework for evaluating software technology. IEEE
Softw., 13(5):39–49, 1996. ISSN 0740-7459. doi: http://dx.doi.org/10.1109/52.536457.
(page 94).
[40] J. Brownlee. Oat: The optimization algorithm toolkit. Technical report, Complex Intelligent Systems Laboratory, Swinburne University of Technology, 2007. (page 98).
[41] S. Cahon, N. Melab, and E.-G. Talbi. Paradiseo: A framework for the reusable design of
parallel and distributed metaheuristics. Journal of Heuristics, 10(3):357–380, 2004. ISSN
1381-1231. doi: http://dx.doi.org/10.1023/B:HEUR.0000026900.92269.ec. (pages 10, 83,
98, 116, 182).
[42] G. Canfora, M. D. Penta, R. Esposito, and M. Villani. Qos-aware replanning of composite
web services. Web Services, 2005. ICWS 2005. Proceedings. 2005 IEEE International Conference on, 1:121–129, 11-15 July 2005. doi: 10.1109/ICWS.2005.96. (pages 280, 287, 288,
291).
[43] G. Canfora, M. D. Penta, R. Esposito, and M. L. Villani. An approach for qos-aware
service composition based on genetic algorithms. In GECCO ’05: Proceedings of the 2005
conference on Genetic and evolutionary computation, pages 1069–1075, New York, NY, USA,
2005. ACM. ISBN 1-59593-010-8. (pages 34, 207, 283, 284, 288, 299).
[44] G. Canfora, M. D. Penta, R. Esposito, and M. L. Villani. A framework for qos-aware
binding and re-binding of composite web services. Journal of Systems and Software, 81(10):
1754–1769, 2008. (pages 280, 282, 284, 285).
[45] V. Cardellini, E. Casalicchio, V. Grassi, and F. L. Presti. Efficient provisioning of service
level agreements for service oriented applications. In IW-SOSWE ’07: 2nd international
workshop on Service oriented software engineering, pages 29–35, New York, NY, USA, 2007.
ACM. ISBN 9781595937230. URL http://portal.acm.org/citation.cfm?id=1294936.
(page 286).
[46] J. Cardoso, A. Sheth, J. Miller, J. Arnold, and K. Kochut. Quality of service for workflows
and web service processes. Web Semantics: Science, Services and Agents on the World Wide
Web, 1(3):281–308, April 2004. doi: 10.1016/j.websem.2004.03.001. (page 285).
[47] K. Chakhlevitch and P. Cowling. Hyperheuristics: Recent developments. In Adaptive and
Multilevel Metaheuristics, pages 3–29. 2008. (page 115).
[48] A. F. Chalmers. What Is This Thing Called Science? Hackett Pub Co, 3 edition, Jan. 1999.
ISBN 0702230936. (page 53).
[49] A. Chatterjee and P. Siarry. Nonlinear inertia weight variation for dynamic adaptation
in particle swarm optimization. Computers and Operations Research, 33(3):859 – 871, 2006.
ISSN 0305-0548. (page 38).
[50] M. Chiarandini, L. Paquete, M. Preuss, and E. Ridge. Experiments on metaheuristics:
methodological overview and open issues. Technical Report DMF-2007-03-003, The Danish Mathematical Society, 2007. (pages 5, 6, 75).
[51] A. Chu, J. Cui, and I. D. Dinov. Socr analyses: Implementation and demonstration of a
new graphical statistics educational toolkit. Journal of Statistical Software, 30(3):1–19, April
2009. (page 87).
[52] D. Claro, P. Albers, and J. Hao. Selecting web services for optimal composition. In Proc.
Int. Conf. Web Services (ICWS 05), 2005. (page 287).
[53] P. Clements, D. Garlan, L. Bass, J. Stafford, R. Nord, J. Ivers, and R. Little. Documenting
Software Architectures: Views and Beyond. Pearson Education, 2010. ISBN 0201703726.
(pages 183, 186).
[54] M. Clerc. Particle Swarm Optimization. ISTE Publishing Company, February 2006. ISBN
1905209045. (pages 37, 102).
[55] D. Comes, H. Baraki, R. Reichle, M. Zapf, and K. Geihs. Heuristic approaches for qos-based service selection. Service-Oriented Computing, 6470:441–455, 2010. (page 287).
[56] D. Corne, J. D. Knowles, and M. J. Oates. The pareto envelope-based selection algorithm
for multi-objective optimisation. In PPSN VI: Proceedings of the 6th International Conference on Parallel Problem Solving from Nature, pages 839–848, London, UK, 2000. Springer-Verlag. ISBN 3-540-41056-2. (page 105).
[57] P. I. Cowling, G. Kendall, and E. Soubeiga. Hyperheuristics: A tool for rapid prototyping in scheduling and optimisation. In Proceedings of the Applications of Evolutionary
Computing on EvoWorkshops 2002, pages 1–10, London, UK, 2002. Springer-Verlag. ISBN
3-540-43432-1. (page 115).
[58] N. L. Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of the 1st International Conference on Genetic Algorithms, pages 183–
187, Hillsdale, NJ, USA, 1985. L. Erlbaum Associates Inc. ISBN 0-8058-0426-9. (page
110).
[59] H. Cramér. Mathematical Methods of Statistics (PMS-9), volume 9. Princeton University Press, 1999. (page 144).
[60] L. J. Cronbach and K. Shapiro. Designing evaluations of educational and social programs.
Taylor & Francis, 1983. (page 72).
[61] L. J. Cronbach, S. R. Ambron, S. M. Dornbusch, R. D. Hess, R. C. Hornik, D. Phillips, D. F.
Walker, and S. S. Weiner. Toward reform of program evaluation. Jossey-Bass Publishers San
Francisco, 1980. (page 72).
[62] J. W. Daly, A. Brooks, J. Miller, M. Roper, and M. Wood. Verification of results in software maintenance through external replication. In H. A. Müller and M. Georges, editors,
ICSM, pages 50–57. IEEE Computer Society, 1994. ISBN 0-8186-6330-8. (page 10).
[63] L. Davis. Applying adaptive algorithms to epistatic domains. In IJCAI’85: Proceedings of
the 9th international joint conference on Artificial intelligence, pages 162–164, San Francisco,
CA, USA, 1985. Morgan Kaufmann Publishers Inc. ISBN 0-934613-02-8, 978-0-934-613026. (page 109).
[64] L. Davis. Adapting operator probabilities in genetic algorithms. In Proceedings of the third
international conference on Genetic algorithms, pages 61–69, San Francisco, CA, USA, 1989.
Morgan Kaufmann Publishers Inc. ISBN 1-55860-006-3. (page 110).
[65] L. de Castro and F. Von Zuben. Learning and optimization using the clonal selection
principle. Evolutionary Computation, IEEE Transactions on, 6(3):239–251, Jun 2002. ISSN
1089-778X. doi: 10.1109/TEVC.2002.1011539. (page 105).
[66] M. C. de Souza and P. Martins. Skewed vns enclosing second order algorithm for the degree constrained minimum spanning tree problem. European Journal of Operational Research, 191(3):677–690, 2008. ISSN 0377-2217. doi: 10.1016/j.ejor.2006.12.061. URL http://www.sciencedirect.com/science/article/B6VCT-4N2KTC4-7/2/7799160d76fbba32ad42f719ee72bbf9. (page 103).
[67] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic
algorithm: Nsga-ii. IEEE Transactions on Evolutionary Computation, 6:182–197, 2002. (page
105).
[68] J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006. (page 61).
[69] J. Derrac, S. García, D. Molina, and F. Herrera. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm
intelligence algorithms. Swarm and Evolutionary Computation, 1(1):3 – 18, 2011. ISSN
2210-6502. doi: 10.1016/j.swevo.2011.02.002. URL http://www.sciencedirect.com/
science/article/pii/S2210650211000034. (pages 61, 63, 70, 71, 75, 77, 87, 154, 169,
172, 176, 196).
[70] GAMS Development Corporation and R. Rosenthal. GAMS: A User’s Guide. Books on Demand, 2006.
ISBN 9783833435089. URL http://books.google.es/books?id=253PPQAACAAJ. (pages
85, 177).
[71] L. Di Gaspero and A. Schaerf. Easylocal++: An object-oriented framework for flexible
design of local search algorithms. Software — Practice & Experience, 33(8):733–765, July
2003. doi: 10.1002/spe.524. (pages 10, 83, 98).
[72] M. Dorigo and G. Di Caro. The ant colony optimization meta-heuristic. New ideas in
optimization, pages 11–32, 1999. (page 26).
[73] M. Dorigo and L. M. Gambardella. Ant colony system: A cooperative learning approach
to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):
53–66, April 1997. URL http://citeseer.ist.psu.edu/115428.html. (pages 39, 105).
[74] S. Dower. Specifying ant system with esdl. Technical Report TR/CIS/2010/2, Swinburne University of Technology, 2010. (page 85).
[75] S. Dower and C. Woodward. Specifying particle swarm optimisation with esdl. Technical Report TR/CIS/2010/5, Swinburne University of Technology, 2010.
(page 85).
[76] S. Dower and C. J. Woodward. Esdl: a simple description language for population-based evolutionary computation. In Proceedings of the 13th annual conference on Genetic
and evolutionary computation, GECCO ’11, pages 1045–1052, New York, NY, USA, 2011.
ACM. ISBN 978-1-4503-0557-0. doi: 10.1145/2001576.2001718. URL http://doi.acm.
org/10.1145/2001576.2001718. (page 85).
[77] J. Dréo, A. Pétrowski, P. Siarry, and E. Taillard. Metaheuristics for Hard Optimization: Methods
and Case Studies. Springer, December 2005. ISBN 354023022X. (page 105).
[78] J. Dreo, A. Petrowski, and E. Taillard. Metaheuristics for Hard Optimization. Springer, 2003.
(pages 25, 291).
[79] O. J. Dunn. Multiple comparisons among means. Journal of the American Statistical Association, 56:52–64, 1961. (page 271).
[80] A. Eiben and M. Jelasity. A critical note on experimental research methodology in ec.
Computational Intelligence, Proceedings of the World on Congress on, 1:582–587, 2002. (pages
87, 273).
[81] A. E. Eiben, P.-E. Raué, and Z. Ruttkay. Genetic algorithms with multi-parent recombination. In PPSN III: Proceedings of the International Conference on Evolutionary Computation.
The Third Conference on Parallel Problem Solving from Nature, pages 78–87, London, UK,
1994. Springer-Verlag. ISBN 3-540-58484-6. (page 109).
[82] Elsevier. The executable papers grand challenge, 2011. URL http://www.executablepapers.com/. (page 8).
[83] T. Erl, A. Karmarkar, P. Walmsley, H. Haas, L. U. Yalcinalp, K. Liu, D. Orchard, A. Tost,
and J. Pasley. Web Service Contract Design and Versioning for SOA. Prentice Hall PTR,
Upper Saddle River, NJ, USA, 1 edition, 2009. ISBN 9780136135173. (page 187).
[84] L. J. Eshelman. The chc adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination. Foundations of Genetic Algorithms, pages
265–283, 1991. URL http://ci.nii.ac.jp/naid/10000024547/en/. (page 109).
[85] L. J. Eshelman and J. D. Schaffer. Real-coded genetic algorithms and interval-schemata.
In D. L. Whitley, editor, Foundation of Genetic Algorithms 2, pages 187–202, San Mateo, CA,
1993. Morgan Kaufmann. (page 109).
[86] L. J. Eshelman, R. A. Caruana, and J. D. Schaffer. Biases in the crossover landscape.
In Proceedings of the third international conference on Genetic algorithms, pages 10–19, San
Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc. ISBN 1-55860-006-3. (page
109).
[87] Science Exchange. The reproducibility initiative, 2011. URL https://www.scienceexchange.
com/reproducibility. (pages 8, 86, 156, 192).
[88] N. E. Fenton and S. L. Pfleeger. Software Metrics: A Rigorous and Practical Approach. PWS
Publishing Co., Boston, MA, USA, 2nd edition, 1998. ISBN 0534954251. (page 52).
[89] T. A. Feo and M. G. Resende. Greedy randomized adaptive search procedures. Journal of
Global Optimization, 6:109–133, 1995. (page 38).
[90] P. Fernández, M. Resinas, and R. Corchuelo. Towards an automatic service trading.
Upgrade, 7(5):26–29, 2006. ISSN 1684-5285. (page 282).
[91] P. Festa and M. Resende. An annotated bibliography of grasp part ii: Applications. International Transactions in Operational Research, 16(2):131–172, 2009. ISSN 1475-3995. (page
38).
[92] H. Finner. On a monotonicity problem in step-down multiple test procedures. Journal of
the American Statistical Association, 88:920–923, 1993. (page 271).
[93] R. A. Fisher. The arrangement of field experiments. Journal of the Ministry of Agriculture
of Great Britain, 33:503 – 513, 1926. (page 57).
[94] C. A. Floudas and P. M. Pardalos. Encyclopedia of Optimization, 2nd edition, October 2008.
ISBN 978-0-387-74760-6. (pages 4, 5).
[95] T. C. Fogarty. Varying the probability of mutation in the genetic algorithm. In Proceedings
of the 3rd International Conference on Genetic Algorithms, pages 104–109, San Francisco, CA,
USA, 1989. Morgan Kaufmann Publishers Inc. ISBN 1-55860-006-3. (page 111).
[96] D. Fogel, L. Fogel, and J. Atmar. Meta-evolutionary programming. In Signals, Systems and
Computers, 1991. 1991 Conference Record of the Twenty-Fifth Asilomar Conference on, pages
540–545 vol.1, Nov 1991. doi: 10.1109/ACSSC.1991.186507. (page 110).
[97] L. J. Fogel. On the Organization of Intellect. PhD thesis, UCLA, 1964. (page 109).
[98] L. J. Fogel and D. B. Fogel. Artificial intelligence through evolutionary programming.
Technical report, Final Report for US Army Research Institute, contract no PO-9-X561102C-1, 1986. (page 110).
[99] L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial intelligence through simulated evolution. Wiley, 1966. (pages 103, 109).
[100] C. M. Fonseca and P. J. Fleming. Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization. In Proceedings of the 5th International Conference
on Genetic Algorithms, pages 416–423, San Francisco, CA, USA, 1993. Morgan Kaufmann
Publishers Inc. ISBN 1-55860-299-2. (page 105).
[101] M. Fontoura, C. Lucena, A. Andreatta, S. Carvalho, and C. Ribeiro. Using uml-f to enhance framework development: A case study in the local search heuristics domain. Journal of Systems and Software, 57(3):201–206, 2001. (page 133).
[102] R. Fourer, D. Gay, and B. Kernighan. AMPL: a modeling language for mathematical programming. Thomson/Brooks/Cole, 2003. ISBN 9780534388096. URL http://books.google.
es/books?id=Ij8ZAQAAIAAJ. (pages 85, 177).
[103] M. Fowler. Patterns of enterprise application architecture. A Martin Fowler signature book.
Addison Wesley Professional, 2003. ISBN 9780321127426. URL http://books.google.
es/books?id=FyWZt5DdvFkC. (page 186).
[104] M. Fowler. Inversion of control containers and the dependency injection pattern, 2004. URL http://www.martinfowler.com/articles/injection.html. (page 123).
[105] G. Fraser and J. T. de Souza, editors. Search Based Software Engineering. Proceedings of the
4th International Symposium on SSBSE. Springer, 2012. (page 82).
[106] M. Friedman. The use of ranks to avoid the assumption of normality implicit in the
analysis of variance. Journal of the American Statistical Association, 32(200):675–701, 1937.
(page 271).
[107] C. Gagné and M. Parizeau. Genericity in evolutionary computation software tools: Principles and case-study. International Journal on Artificial Intelligence Tools, 15(2):173–194,
2006. (pages 10, 83, 94, 97).
[108] M. Gallego, F. Gortazar, and E. G. Pardo. Optsicom optimization suite, un conjunto
de herramientas para la investigación en optimización. In Proceedings of the VII Spanish
Conference on Metaheuristic, Evolutionary and Bio-inspired Algorithms (MAEB), pages 352–
363, 2010. (page 84).
[109] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable
Object-Oriented Software. Addison-Wesley Professional, illustrated edition, January 1994. (page 111).
[110] C. Gao, M. Cai, and H. Chen. Qos-driven global optimization of services selection supporting services flow re-planning. In Advances in Web and Network Technologies, and Information Management, Lecture Notes in Computer Science, pages 516–521. Springer, 2007.
doi: 10.1007/978-3-540-72909-9_56. (pages 287, 288).
[111] S. García, A. Fernández, J. Luengo, and F. Herrera. Advanced nonparametric tests for
multiple comparisons in the design of experiments in computational intelligence and
data mining: Experimental analysis of power. Information Sciences, 180(10):2044–2064,
2010. (pages 77, 87, 90, 196).
[112] J. García-Nieto, E. Alba, and F. Chicano. Using metaheuristic algorithms remotely via
ros. In GECCO ’07: Proceedings of the 9th annual conference on Genetic and evolutionary
computation, pages 1510–1510, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-697-4.
doi: http://doi.acm.org/10.1145/1276958.1277239. (page 121).
[113] P. García-Sánchez, J. González, P. A. Castillo, J. Merelo, A. M. Mora, J. L. J. Laredo, and
M. G. Arenas. A distributed service oriented framework for metaheuristics using a public
standard. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), pages
211–222. Springer, 2010. (page 86).
[114] M. Gavish and D. Donoho. A universal identifier for computational results. Procedia
Computer Science, 4(0):637 – 647, 2011. Proceedings of the International Conference on
Computational Science, ICCS 2011. (page 8).
[115] S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and the bayesian
restoration of images. Readings in computer vision: issues, problems, principles, and
paradigms, pages 564–584, 1987. (page 103).
[116] J. A. Gliner, G. A. Morgan, and N. L. Leech. Research methods in applied settings: an integrated approach to design and analysis, 2nd edition. Taylor and Francis
Group, 2009. (pages 6, 7, 10, 46, 47, 52, 53, 55, 61, 62, 65, 66, 67, 68, 69, 72, 76, 77, 149, 172,
176, 252).
[117] F. Glover. Heuristics for integer programming using surrogate constraints. Decision
Sciences, 8(1):156–166, 1977. URL http://dx.doi.org/10.1111/j.1540-5915.1977.
tb01074.x. (page 38).
[118] F. Glover. Tabu search – part i. ORSA Journal on Computing, 1(3):190–206, 1989. (page 30).
[119] F. Glover. A template for scatter search and path relinking. In J.-K. Hao, E. Lutton,
E. Ronald, M. Schoenauer, and D. Snyers, editors, Artificial Evolution, volume 1363 of
Lecture Notes in Computer Science, pages 1–51. Springer Berlin / Heidelberg, 1998. ISBN
978-3-540-64169-8. (page 36).
[120] F. Glover and G. A. Kochenberger. Handbook of Metaheuristics. Kluwer Academic Publishers, 2002. (pages 5, 25, 26, 27, 102).
[121] D. Goldberg and R. Lingle. Alleles, loci, and the traveling salesman problem. In Proc. 1st Int. Conf. on Genetic Algorithms and their Applications, pages 154–159, 1985. (page
109).
[122] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine learning. Addison
Wesley, 1989. (page 105).
[123] D. E. Goldberg. A note on boltzmann tournament selection for genetic algorithms and
population-oriented simulated annealing. Complex Systems, 4(4):445–460, 1990. (page
103).
[124] D. E. Goldberg and K. Deb. A comparative analysis of selection schemes used in genetic
algorithms. In FOGA. Proceedings of the First Workshop on Foundations of Genetic Algorithms., pages 69–93, 1990. (page 33).
[125] D. E. Goldberg and R. E. Smith. Nonstationary function optimization using genetic algorithm with dominance and diploidy. In Proceedings of the Second International Conference
on Genetic Algorithms on Genetic algorithms and their application, pages 59–68, Hillsdale, NJ,
USA, 1987. L. Erlbaum Associates Inc. ISBN 0-8058-0158-8. (page 104).
[126] O. S. Gómez, N. Juristo, and S. Vegas. Replications types in experimental disciplines. In
Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’10, pages 3:1–3:10, New York, NY, USA, 2010. ACM. ISBN
978-1-4503-0039-1. doi: 10.1145/1852786.1852790. URL http://doi.acm.org/10.1145/
1852786.1852790. (page 7).
[127] P. V. Gorp and S. Mazanek. Share: a web portal for creating and sharing executable research papers. Procedia Computer Science, 4(0):589 – 597, 2011. ISSN 1877-0509. URL http:
//www.sciencedirect.com/science/article/pii/S1877050911001207. Proceedings
of the International Conference on Computational Science, ICCS 2011. (pages 8, 86).
[128] P. Hansen, N. Mladenović, and D. Perez-Britos. Variable neighborhood decomposition
search. Journal of Heuristics, 7(4):335–350, 2001. ISSN 1381-1231. doi: http://dx.doi.org/
10.1023/A:1011336210885. (page 103).
[129] M. Harman. The current state and future of search based software engineering. Future of
Software Engineering, 2007. FOSE ’07, pages 342–357, 23-25 May 2007. doi: 10.1109/FOSE.
2007.29. (pages 82, 280).
[130] M. Harman and A. Mansouri. Search based software engineering: Introduction to
the special issue of the ieee transactions on software engineering. IEEE Transactions on Software Engineering, 36(6):737–741, 2010. ISSN 0098-5589. doi: http://doi.
ieeecomputersociety.org/10.1109/TSE.2010.106. (page 82).
[131] M. Harman, P. McMinn, J. de Souza, and S. Yoo. Search based software engineering:
Techniques, taxonomy, tutorial. In B. Meyer and M. Nordio, editors, Empirical Software
Engineering and Verification, volume 7007 of Lecture Notes in Computer Science, pages 1–59.
Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-25230-3. URL http://dx.doi.org/
10.1007/978-3-642-25231-0_1. (page 82).
[132] P. V. Hentenryck, L. Michel, P. Laborie, W. Nuijten, and J. Rogerie. Combinatorial optimization in opl studio. In Proceedings of the 9th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence, EPIA ’99, pages 1–15, London, UK, UK, 1999.
Springer-Verlag. ISBN 3-540-66548-X. (page 40).
[133] J. Hilbe. Logistic Regression Models. A Chapman & Hall book. CRC PressINC, 2009. ISBN
9781420075755. URL http://books.google.es/books?id=eJcMIAAACAAJ. (page 141).
[134] K. Hinkelmann and O. Kempthorne. Design and Analysis of Experiments, Introduction to Experimental Design. Design and Analysis of Experiments. Wiley, 2007. ISBN 9780470191743.
URL http://books.google.es/books?id=T3wWj2kVYZgC. (pages 54, 55, 61, 62).
[135] Y. C. Ho and D. L. Pepyne. Simple explanation of the no-free-lunch theorem and its
implications. Journal of Optimization Theory and Applications, 115(3):549–570, December 2002. URL http://www.ingentaconnect.com/content/klu/jota/2002/00000115/
00000003/00450394. (page 42).
[136] Y. Hochberg. A sharper bonferroni procedure for multiple tests of significance.
Biometrika, 75:800–803, 1988. (page 271).
[137] J. L. Hodges and E. L. Lehmann. Rank methods for combination of independent experiments in analysis of variance. Annals of Mathematical Statistics, 33:482–497, 1962. (page
271).
[138] B. S. Holland and M. D. Copenhaver. An improved sequentially rejective bonferroni test
procedure. Biometrics, 43:417–423, 1987. (page 271).
[139] J. H. Holland. Adaptation in natural and artificial systems: An introductory analysis with
applications to biology, control, and artificial intelligence. University of Michigan Press, 1975.
ISBN 0472084607. (pages 34, 110, 112).
[140] J. H. Holland. Adaptation in Natural and Artificial Systems. MIT press. 2nd edition, 1992.
(pages 104, 109).
[141] S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of
Statistics, 6:65–70, 1979. (page 271).
[142] G. Hommel. A stagewise rejective multiple test procedure based on a modified bonferroni test. Biometrika, 75:383–386, 1988. (page 271).
[143] J. Horn, N. Nafpliotis, and D. Goldberg. A niched pareto genetic algorithm for multiobjective optimization. In Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence., Proceedings of the First IEEE Conference on, pages 82–87 vol.1, Jun 1994.
doi: 10.1109/ICEC.1994.350037. (page 105).
[144] IBM. SPSS 17 Statistical Package. http://www.spss.com/, accessed November 2010.
(page 86).
[145] IEEE. Posix std 1003.1, 2004. (page 274).
[146] R. L. Iman and J. M. Davenport. Approximations of the critical region of the friedman
statistic. Commun. Stat., 18:571–595, 1980. (page 271).
[147] Free Software Foundation, Inc. GNU Lesser General Public License, version 3. http://www.gnu.org/copyleft/lesser.html. (page 186).
[148] S. Iredi, D. Merkle, and M. Middendorf. Bi-criterion optimization with multi colony ant
algorithms. In EMO ’01: Proceedings of the First International Conference on Evolutionary
Multi-Criterion Optimization, pages 359–372, London, UK, 2001. Springer-Verlag. ISBN
3-540-41745-1. (page 105).
[149] ISO/IEC. Information technology – document container file – part 1: Core. Profile of the ZIP file format. NP 21320-1, 2011. (page 274).
[150] M. C. Jaeger, G. Mühl, and S. Golze. Qos-aware composition of web services: An evaluation of selection algorithms. On the Move to Meaningful Internet Systems 2005: CoopIS,
DOA, and ODBASE, pages 646–661, 2005. (page 287).
[151] D. S. Johnson. A theoretician’s guide to the experimental analysis of algorithms. 2002.
URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.1935. (page
10).
[152] K. A. D. Jong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis,
University of Michigan, 1975. (pages 109, 112).
[153] F. Jouault, F. Allilaire, J. Bézivin, I. Kurtev, and P. Valduriez. Atl: a qvt-like transformation language. In Companion to the 21st ACM SIGPLAN symposium on Object-oriented
programming systems, languages, and applications, pages 719–720. ACM, 2006. (page 169).
[154] N. Juristo and A. M. Moreno. Basics of Software Engineering Experimentation. Kluwer
Academic Publishers, 2005. (pages 6, 46, 52, 54, 63).
[155] N. Juristo and A. Moreno. Basics of Software Engineering Experimentation. Springer, 2001.
ISBN 9780792379904. URL http://books.google.es/books?id=ovWfOeW653EC. (pages
10, 84).
[156] J. Kallrath. Modeling Languages in Mathematical Optimization. Applied Optimization. Springer, 2004. ISBN 9781402075476. URL http://books.google.es/books?id=
wJYART7VYe8C. (page 85).
[157] K. Kang, S. Cohen, J. Hess, W. Novak, and S. Peterson. Feature–Oriented Domain Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21, SEI, 1990. (page
307).
[158] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938. (page
144).
[159] J. Kennedy and R. Eberhart. Particle swarm optimization. In Neural Networks, 1995. Proceedings., IEEE International Conference on, volume 4, pages 1942–1948 vol.4, 1995. doi:
10.1109/ICNN.1995.488968. URL http://dx.doi.org/10.1109/ICNN.1995.488968.
(page 37).
[160] J. Kennedy and R. Mendes. Population structure and particle swarm performance. In
Proceedings of the IEEE Congress on Evolutionary Computation (CEC), 2002. (page 104).
[161] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing.
Science, 220:671–680, 1983. (pages 28, 103).
[162] B. A. Kitchenham. Software Metrics: Measurement for Software Process Improvement. Blackwell Publishers, Inc., Cambridge, MA, USA, 1996. ISBN 1855548208. (page 52).
[163] B. A. Kitchenham. Procedures for undertaking systematic reviews. Technical report,
Computer Science Department, Keele University, 2004. (page 95).
[164] A. Klein, F. Ishikawa, and S. Honiden. Towards network-aware service composition in
the cloud. In Proceedings of the 21st international conference on World Wide Web, WWW ’12,
pages 959–968, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1229-5. doi: 10.1145/
2187836.2187965. URL http://doi.acm.org/10.1145/2187836.2187965. (page 287).
[165] J. D. Knowles and D. W. Corne. Approximating the nondominated front using the pareto
archived evolution strategy. Evol. Comput., 8(2):149–172, 2000. ISSN 1063-6560. doi:
http://dx.doi.org/10.1162/106365600568167. (page 105).
[166] J. M. Ko, C. O. Kim, and I.-H. Kwon. Quality-of-service oriented web service composition algorithm and planning architecture. Journal of Systems and Software, 81(11):2079–
2090, 2008. (pages 207, 280, 291, 299, 301, 302).
[167] D. Köhn and N. Novère. Sed-ml — an xml format for the implementation of the miase guidelines. In Proceedings of the 6th International Conference on Computational Methods
in Systems Biology, CMSB ’08, pages 176–190, Berlin, Heidelberg, 2008. Springer-Verlag.
ISBN 978-3-540-88561-0. doi: 10.1007/978-3-540-88562-7_15. URL http://dx.doi.org/
10.1007/978-3-540-88562-7_15. (page 84).
[168] J. Koza. Genetic programming: on the programming of computers by means of natural selection.
MIT Press, Cambridge, MA, USA, 1992. ISBN 0-262-11170-5. (page 309).
[169] J. R. Koza. Genetic programming: On the programming of computers by natural selection. MIT
Press, 1992. (pages 33, 110).
[170] M. Kronfeld, H. Planatscher, and A. Zell. The EvA2 optimization framework. In C. Blum
and R. Battiti, editors, Learning and Intelligent Optimization Conference, Special Session on
Software for Optimization (LION-SWOP), number 6073 in Lecture Notes in Computer
Science, LNCS, pages 247–250, Venice, Italy, Jan. 2010. Springer Verlag. URL http:
//www.ra.cs.uni-tuebingen.de/publikationen/2010/Kron10EvA2Short.pdf. (page
98).
[171] P. Leitner, W. Hummer, and S. Dustdar. Cost-based optimization of service compositions.
IEEE T. Services Computing, 6(2):239–251, 2013. (page 287).
[172] H. Levene. Robust tests for equality of variances. Contributions to Probability and Statistics:
Essays in Honor of Harold Hotelling, pages 278–292, 1960. (page 271).
[173] J. A. Lewis, S. M. Henry, D. G. Kafura, and R. S. Schulman. On the relationship between
the object-oriented paradigm and software reuse: An empirical investigation. Technical
report, Blacksburg, VA, USA, 1992. (page 45).
[174] J. Li. A two-step rejection procedure for testing multiple hypotheses. Journal of Statistical
Planning and Inference, 138:1521–1527, 2008. (page 271).
[175] H. W. Lilliefors. On the kolmogorov-smirnov test for normality with mean and variance
unknown. Journal of the American Statistical Association, 62:399–402, 1967. (page 271).
[176] R. Lowry. Vassarstats: Website for statistical computation. 1998. URL http://faculty.
vassar.edu/lowry/VassarStats.html. (page 87).
[177] B. Ludascher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao,
and Y. Zhao. Scientific workflow management and the kepler system. Concurrency and
Computation: Practice and Experience, 18(10):1039–1065, 2006. ISSN 1532-0634. doi: 10.
1002/cpe.994. URL http://dx.doi.org/10.1002/cpe.994. (page 85).
[178] M. Lukasiewycz, M. Glaß, F. Reimann, and S. Helwig. Opt4j - the optimization framework for java. http://www.opt4j.org, 2009. (pages 98, 126).
[179] S. Luke, L. Panait, G. Balan, S. Paus, Z. Skolicki, E. Popovici, K. Sullivan, J. Harrison, J. Bassett, R. Hubley, A. Chircop, J. Compton, W. Haddon, S. Donnelly,
B. Jamil, and J. O’Beirne. Ecj: A java-based evolutionary computation research system.
http://cs.gmu.edu/~eclab/projects/ecj/, 2009. (pages 98, 182).
[180] H. Ma, F. Bastani, I.-L. Yen, and H. Mei. Qos-driven service composition with reconfigurable services. IEEE Transactions on Services Computing, 6(1):20–34, 2013. ISSN 1939-1374.
doi: http://doi.ieeecomputersociety.org/10.1109/TSC.2011.21. (page 287).
[181] O. Maron and A. W. Moore. The racing algorithm: Model selection for lazy learners.
Artificial Intelligence Review, 11:193–225, 1997. (page 76).
[182] M. Mattsson, J. Bosch, and M. E. Fayad. Framework integration problems, causes, solutions. Commun. ACM, 42(10):80–87, Oct. 1999. ISSN 0001-0782. doi: 10.1145/317665.
317679. URL http://doi.acm.org/10.1145/317665.317679. (page 10).
[183] D. G. Mayo. An objective theory of statistical testing. Synthese, 57:297–340, 1983. ISSN
0039-7857. URL http://dx.doi.org/10.1007/BF01064701. (page
253).
[184] P. McMinn. Search-based software test data generation: a survey. Software Testing, Verification and Reliability, 14(2):105–156, 2004. ISSN 0960-0833. doi:
10.1002/stvr.v14:2. (page 306).
[185] Q. McNemar. On the sampling error of the difference between correlated proportions or
percentages. Psychometrika, 12(2):153–157, 1947. (page 271).
[186] K. Meffert. JUnit Profi-Tips. Entwickler.Press, 2006. (page 133).
[187] M. Mendonça, A. Wasowski, and K. Czarnecki. SAT–based analysis of feature models is
easy. In Proceedings of the Sofware Product Line Conference, 2009. (page 315).
[188] B. Meyer. Object-Oriented Software Construction. Prentice Hall, 1988. (page 186).
[189] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs.
Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1994. ISBN 0387580905. (page
109).
[190] Z. Michalewicz and D. B. Fogel. How to Solve It: Modern Heuristics. Springer, December 2004. ISBN 3540224947. URL http://www.amazon.ca/exec/obidos/redirect?tag=
citeulike09-20&path=ASIN/3540224947. (pages 25, 108, 113).
[191] L. Michel and P. Van Hentenryck. Localizer a modeling language for local search. In Principles and Practice of Constraint Programming-CP97, pages 237–251. Springer, 1997. (page
85).
[192] N. Monmarché, G. Venturini, and M. Slimane. On how pachycondyla apicalis ants suggest a new search algorithm. Future Gener. Comput. Syst., 16(9):937–946, 2000. ISSN 0167-739X. (page 105).
[193] D. J. Montana. Strongly typed genetic programming. Evol. Comput., 3(2):199–230, 1995.
ISSN 1063-6560. doi: http://dx.doi.org/10.1162/evco.1995.3.2.199. (pages 110, 111).
[194] D. J. Montana and L. Davis. Training feedforward neural networks using genetic algorithms. In IJCAI’89: Proceedings of the 11th international joint conference on Artificial intelligence, pages 762–767, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc.
(page 110).
[195] D. Montgomery. Design and Analysis of Experiments. John Wiley & Sons Canada, Limited,
1997. ISBN 9780471260080. (pages 54, 55, 61, 62, 172, 176).
[196] S. T. Mueller. The PEBL Manual. 2010. (page 84).
[197] H. Mühlenbein. Evolution in time and space – the parallel genetic algorithm. Foundations
of Genetic Algorithms, 1991. URL http://ci.nii.ac.jp/naid/10016718767/en/. (page
109).
[198] P. B. Nemenyi. Distribution-free Multiple comparisons. PhD thesis, Princeton University,
1963. (page 271).
[199] G. J. V. Nossal and J. Lederberg. Antibody production by single cells. Nature, 181(4620):
1419–1420, May 1958. doi: 10.1038/1811419a0. URL http://dx.doi.org/10.1038/
1811419a0. (page 105).
[200] P. Nowakowski, E. Ciepiela, D. Harezlak, J. Kocot, M. Kasztelnik, T. Bartybski, J. Meizner,
G. Dyk, and M. Malawski. The collage authoring environment. Procedia Computer Science, 4(0):608 – 617, 2011. ISSN 1877-0509. URL http://www.sciencedirect.com/
science/article/pii/S1877050911001220. Proceedings of the International Conference on Computational Science, ICCS 2011. (pages 8, 86).
[201] J. D. Nulton and P. Salamon. Statistical mechanics of combinatorial optimization. Phys.
Rev. A, 37(4):1351–1356, Feb 1988. doi: 10.1103/PhysRevA.37.1351. (page 103).
[202] School of Electronics and Computer Science at the University of Southampton. Reproducible research repository. URL http://rr.epfl.ch/. (pages 8, 86, 156, 192).
[203] T. Oinn, M. Greenwood, M. Addis, M. N. Alpdemir, J. Ferris, K. Glover, C. Goble,
A. Goderis, D. Hull, D. Marvin, P. Li, P. Lord, M. R. Pocock, M. Senger, R. Stevens,
A. Wipat, and C. Wroe. Taverna: lessons in creating a workflow environment for the life
sciences. Concurrency and Computation: Practice and Experience, 18(10):1067–1100, 2006.
ISSN 1532-0634. doi: 10.1002/cpe.993. URL http://dx.doi.org/10.1002/cpe.993.
(page 85).
[204] I. M. Oliver, D. J. Smith, and J. R. C. Holland. A study of permutation crossover operators
on the traveling salesman problem. In Proceedings of the Second International Conference on
Genetic Algorithms on Genetic algorithms and their application, pages 224–230, Hillsdale, NJ,
USA, 1987. L. Erlbaum Associates Inc. ISBN 0-8058-0158-8. (page 109).
[205] I. Osman and G. Laporte. Metaheuristics: A bibliography. Annals of Operations Research,
63:511–623, 1996. ISSN 0254-5330. URL http://dx.doi.org/10.1007/BF02125421. (page 25).
[206] B. Paechter, T. Bäck, M. Schoenauer, M. Sebag, A. Eiben, J. J. Merelo, and T. C. Fogarty. A distributed resource evolutionary algorithm machine (DREAM). In Evolutionary Computation, 2000. Proceedings of the 2000 Congress on, volume 2, pages 951–958. IEEE, 2000. (page 86).
[207] M. P. Papazoglou and W.-J. van den Heuvel. Service oriented architectures: approaches,
technologies and research issues. VLDB J., 16(3):389–415, 2007. (page 280).
[208] M. P. Papazoglou, P. Traverso, S. Dustdar, F. Leymann, and B. J. Krämer. Service-oriented
computing: A research roadmap. In F. Curbera, B. J. Krämer, and M. P. Papazoglou,
editors, Service Oriented Computing, volume 05462 of Dagstuhl Seminar Proceedings. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl,
Germany, 2005. (page 280).
[209] J. A. Parejo, J. Racero, F. Guerrero, T. Kwok, and K. Smith. FOM: A framework for metaheuristic optimization. Computational Science - ICCS 2003. Lecture Notes in Computer Science, 2660:886–895, 2003. (page 98).
[210] J. A. Parejo, P. Fernández, and A. Ruiz-Cortés. Qos-aware services composition using
tabu search and hybrid genetic algorithms. In Talleres de las JISBD. IADIS’08., 2008.
(pages 207, 252).
[211] J. A. Parejo, P. Fernández, and A. Ruiz-Cortés. De frameworks a ecosistemas: Evolución del software para optimización metaheurística. In Actas del Congreso Español de Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB 2010), celebrado en el marco del Congreso Español de Informática (CEDI 2010), Sep 2010. (page 199).
[212] J. A. Parejo, P. Fernández, and A. Ruiz-Cortés. On parameter selection and problem
instances generation for qos-aware binding using grasp and path-relinking. Research
Report 2011-4, ETSII. Av. Reina Mercedes s/n. 41012. Sevilla. Spain, 2011. (pages 289,
291, 292, 293, 294, 295, 303).
[213] J. A. Parejo, S. Lozano, A. Ruiz-Cortès, and P. Fernandez. Metaheuristic optimization
frameworks: A survey and benchmarking. Soft Computing, 2011. (pages 12, 41, 134, 252).
[214] J. A. Parejo, J. García, A. Ruiz-Cortés, and J. C. Riquelme. StatService: Herramienta de análisis estadístico como soporte para la investigación con metaheurísticas. In Actas del VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bio-inspirados, 2012. (page 199).
[215] J. A. Parejo, S. Segura, and A. Ruiz-Cortés. Achieving replicability: Is there life for our
experiments after publication? In Actas del IX Congreso Español sobre Metaheurı́sticas,
Algoritmos Evolutivos y Bio-inspirados, 2013. (pages 177, 199).
[216] K. Parsopoulos and M. Vrahatis. Recent approaches to global optimization problems through particle swarm optimization. Natural Computing, 1(2):235–306, 2002. (page 37).
[217] K. E. Parsopoulos and M. N. Vrahatis. Particle swarm optimization method in multiobjective problems. In SAC ’02: Proceedings of the 2002 ACM symposium on Applied
computing, pages 603–607, New York, NY, USA, 2002. ACM. ISBN 1-58113-445-2. doi:
http://doi.acm.org/10.1145/508791.508907. (page 105).
[218] P. Vandewalle, J. Kovacevic, and M. Vetterli. Reproducible research portal, 2009. (pages 8, 86, 156, 192).
[219] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison
Wesley, 1984. (page 4).
[220] M. D. Penta and L. Troiano. Using fuzzy logic to relax constraints in ga-based service
composition. In GECCO ’05: Proceedings of the 2005 conference on Genetic and evolutionary
computation, June 2005. (page 287).
[221] J. C. Pezzullo. Statpages.org. 2010. URL http://statpages.org. (page 87).
[222] E. Piñana, I. Plana, V. Campos, and R. Martı́. Grasp and path relinking for the matrix
bandwidth minimization. European Journal of Operational Research, 153(1):200–210, 2004.
(pages 39, 292).
[223] Reproducible Research Planet. The reproducible research librum, 2011. URL http://www.rrplanet.com/.
(pages 8, 86, 156, 192).
[224] K. R. Popper. Objective Knowledge. Oxford University Press, 1972. (page 253).
[225] K. V. Price, R. M. Storn, and J. A. Lampinen. Differential Evolution A Practical Approach to
Global Optimization. Natural Computing Series. Springer-Verlag, Berlin, Germany, 2005.
(page 104).
[226] The R Project. Web site of the R Project. Visited in 2013. URL http://www.r-project.org.
(page 40).
[227] Y. Qu, C. Lin, Y. Wang, and Z. Shan. Qos-aware composite service selection in grids. Grid
and Cooperative Computing, 2006. GCC 2006. Fifth International Conference, pages 458–465,
Oct. 2006. doi: 10.1109/GCC.2006.77. (page 286).
[228] D. Quade. Using weighted rankings in the analysis of complete blocks with additive
block effects. Journal of the American Statistical Association, 74:680–683, 1979. (page 271).
[229] N. J. Radcliffe. Forma analysis and random respectful recombination. In Foundations of Genetic Algorithms, pages 222–229, 1991. (page 109).
[230] I. Rahman, A. K. Das, R. B. Mankar, and B. D. Kulkarni. Evaluation of repulsive particle
swarm method for phase equilibrium and phase stability problems. Fluid Phase Equilibria,
May 2009. ISSN 03783812. doi: 10.1016/j.fluid.2009.04.014. URL http://dx.doi.org/
10.1016/j.fluid.2009.04.014. (page 38).
[231] G. R. Raidl. Hybrid Metaheuristics, chapter A Unified View on Hybrid Metaheuristics,
pages 1 – 12. Springer, 2006. (page 115).
[232] R. L. Rardin and R. Uzsoy. Experimental evaluation of heuristic optimization algorithms:
A tutorial. Journal of Heuristics, 7(3):261–304, May 2001. ISSN 1381-1231. doi: 10.1023/A:
1011319115230. URL http://dx.doi.org/10.1023/A:1011319115230. (page 4).
[233] I. Rechenberg. Cybernetic solution path of an experimental problem. Royal Aircraft Establishment Library Translation 1122, Farnborough, Uk, 1965. (page 103).
[234] J.-M. Renders and H. Bersini. Hybridizing genetic algorithms with hill-climbing methods
for global optimization: two possible ways. In Evolutionary Computation, 1994. IEEE World
Congress on Computational Intelligence., Proceedings of the First IEEE Conference on, pages
312–317 vol.1, Jun 1994. doi: 10.1109/ICEC.1994.349948. (page 109).
[235] M. G. C. Resende. Greedy randomized adaptive search procedures. In Encyclopedia of
Optimization, pages 1460–1469. 2009. (pages 39, 289, 292).
[236] D. E. Rex, J. Q. Ma, and A. W. Toga. The loni pipeline processing environment. Neuroimage, 19(3):1033–1048, 2003. (page 85).
[237] E. Ridge and D. Kudenko. Tuning the performance of the mmas heuristic. In T. Stützle,
M. Birattari, and H. H. Hoos, editors, Engineering Stochastic Local Search Algorithms. Designing, Implementing and Analyzing Effective Heuristics, volume 4638 of Lecture Notes in
Computer Science, pages 46–60. Springer Berlin / Heidelberg, 2007. ISBN 978-3-540-744450. (pages 6, 75, 76).
[238] A. Roli and C. Blum. Hybrid metaheuristics: An introduction. In Hybrid Metaheuristics.
Springer, 2008. (page 115).
[239] D. M. Rom. A sequentially rejective test procedure based on a modified bonferroni inequality. Biometrika, 77:663–665, 1990. (page 271).
[240] F. Rothlauf. Representations for Genetic and Evolutionary Algorithms. Springer, 2nd edition,
2006. (page 108).
[241] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. 3rd edition. Prentice Hall Series in Artificial Intelligence. Pearson Education/Prentice Hall, 2010. ISBN
9780136042594. URL http://books.google.es/books?id=8jZBksh-bUMC. (pages 4,
25).
[242] D. Sasaki. Armoga: An efficient multi-objective genetic algorithm. Technical report, 2005.
(page 105).
[243] J. D. Schaffer and A. Morishima. An adaptive crossover distribution mechanism for genetic algorithms. In Proceedings of the Second International Conference on Genetic Algorithms
on Genetic algorithms and their application, pages 36–40, Hillsdale, NJ, USA, 1987. L. Erlbaum Associates Inc. ISBN 0-8058-0158-8. (page 109).
[244] A. Scheibenpflug, S. Wagner, E. Pitzer, and M. Affenzeller. Optimization knowledge base:
an open database for algorithm and problem characteristics and optimization results. In
Proceedings of the fourteenth international conference on Genetic and evolutionary computation
conference companion, GECCO Companion ’12, pages 141–148, New York, NY, USA, 2012.
ACM. ISBN 978-1-4503-1178-6. doi: 10.1145/2330784.2330806. URL http://doi.acm.
org/10.1145/2330784.2330806. (page 192).
[245] H.-P. Schwefel. Numerical Optimization of Computer Models. John Wiley & Sons, Inc., New
York, NY, USA, 1981. ISBN 0471099880. (page 110).
[246] S. Segura, J. A. Parejo, R. M. Hierons, D. Benavides, and A. Ruiz-Cortés. Ethom: An
evolutionary algorithm for optimized feature models generation (v. 1.2). Technical report,
July 2012. (pages 316, 320).
[247] S. Segura, J. A. Parejo, R. M. Hierons, D. Benavides, and A. Ruiz-Cortés. Ethom: An
evolutionary algorithm for optimized feature models generation (v. 1.2). Technical report,
July 2012. (pages 10, 252).
[248] W. R. Shadish, T. D. Cook, and D. T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin, 2 edition, July 2001. ISBN 0395615569. URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0395615569. (pages 6, 7, 46, 54, 56, 61, 65, 66, 67, 69, 71, 72, 172, 176).
[249] J. P. Shaffer. Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81(395):826 –831, 1986. (page 271).
[250] S. T. Mueller. A partial implementation of the BICA cognitive decathlon using the Psychology Experiment Building Language (PEBL). International Journal of Machine Consciousness, 2(02):273–288, 2010. (page 84).
[251] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality. Biometrika, 52
(3–4):591–611, 1965. (page 271).
[252] D. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press,
Boca Raton, 2006. (page 271).
[253] R. Sinnema. Experiment description language (EDL), 2003. URL http://edl.sourceforge.net/. (page 84).
[254] N. J. A. Sloane and R. H. Hardin. Gosset: A general-purpose program for designing experiments, 1991-2003. URL http://www.research.att.com/~njas/gosset/index.html. (page 119).
[255] N. V. Smirnov. Estimate of deviation between empirical distribution functions in two
independent samples (in russian). Bulletin of Moscow University, 2:3–16, 1939. (page 271).
[256] SoaML 1 0. Service Oriented Architecture Modeling Language (SoaML) Specification, Version
1.0. Object Management Group, Mar. 2012. URL http://www.omg.org/spec/SoaML/1.
0/. (page 189).
[257] C. Spearman. The proof and measurement of association between two things. The American journal of psychology, 15(1):72–101, 1904. (page 144).
[258] S.P.L.O.T.: Software Product Lines Online Tools. http://www.splot-research.org/, accessed October 2010. (page 305).
[259] StatPoint Technologies, Inc. Statgraphics Online, 2012. URL http://www.statgraphicsonline.com/SGOnline.aspx. (page 87).
[260] S. M. Stigler. Francis galton’s account of the invention of correlation. Statist. Sci., 4(2):
73–79, 1989. (page 144).
[261] A. Strunk. Qos-aware service composition: A survey;. In Proceedings of the European
Conference on Web Services (ECOWS10), 2010. (pages 283, 284, 303).
[262] T. Stützle and H. Hoos. Max-min ant system and local search for the traveling salesman
problem. Evolutionary Computation, 1997., IEEE International Conference on, pages 309–314,
Apr 1997. doi: 10.1109/ICEC.1997.592327. (page 105).
[263] S. Su, C. Zhang, and J. Chen. An improved genetic algorithm for web services selection.
In Distributed Applications and Interoperable Systems, volume 4531/2007 of Lecture Notes
in Computer Science, pages 284–295. Springer, 2007. doi: 10.1007/978-3-540-72883-2_21.
URL http://dx.doi.org/10.1007/978-3-540-72883-2_21. (page 287).
[264] P. N. Suganthan. Particle swarm optimiser with neighbourhood operator. In Proceedings
of the IEEE Congress on Evolutionary Computation (CEC, pages 1958–1962, 1999. (page 104).
[265] R. Suresh and K. Mohanasundaram. Pareto archived simulated annealing for permutation flow shop scheduling with multiple objectives. In Cybernetics and Intelligent Systems, 2004 IEEE Conference on, volume 2, pages 712–717, 2004. doi: 10.1109/ICCIS.2004.
1460675. (page 105).
[266] G. Syswerda. Foundations of genetic algorithms, chapter A Study of Reproduction in Generational and Steady-State Genetic Algorithms. Morgan Kaufmann, 1991. (pages 109,
110).
[267] E.-G. Talbi. A taxonomy of hybrid metaheuristics. J. Heuristics, 8(5):541–564, 2002. (pages
25, 26, 27, 115, 163).
[268] E.-G. Talbi. Metaheuristics - From Design to Implementation. Wiley, 2009. ISBN 978-0-47027858-1. (pages 25, 26, 28).
[269] J. Timmer. Keeping computers from ending science's reproducibility. Article in Ars Technica, Jan 2010. URL http://arstechnica.com/science/2010/01/keeping-computers-from-ending-sciences-reproducibility/. (page 10).
[270] P. Trinidad, C. Müller, J. Garcı́a-Galán, and A. Ruiz-Cortés. Building industry-ready tools:
Fama framework and ada. In Third International Workshop on Academic Software Development Tools and Techniques, pages 160–173, 2010. (page 186).
[271] E. Tsang. Foundations of Constraint Satisfaction. Academic Press, 1995. ISBN 0-12-701610-4.
(page 312).
[272] D. G. Uitenbroek. Sisa: Simple interactive statistical analysis. 1997. URL http://www.
quantitativeskills.com/sisa/. (page 87).
[273] E. Ulungu, J. Teghem, P. Fortemps, and D. Tuyttens. MOSA method: a tool for solving multiobjective combinatorial optimization problems. Journal of Multi-Criteria Decision Analysis, 8(4):221–236, 1999. (page 105).
[274] P. Van Hentenryck and L. Michel. Constraint-Based Local Search. The MIT Press, 2005.
ISBN 0262220776. (page 40).
[275] D. A. Van Veldhuizen and G. B. Lamont. Multiobjective optimization with messy genetic
algorithms. In SAC ’00: Proceedings of the 2000 ACM symposium on Applied computing,
pages 470–476, New York, NY, USA, 2000. ACM. ISBN 1-58113-240-9. doi: http://doi.
acm.org/10.1145/335603.335914. (page 105).
[276] P. Vandewalle, J. Kovacevic, and M. Vetterli. Reproducible Research in Signal Processing
- What, why, and how. IEEE Signal Processing Magazine, 26(3):37–47, 2009. doi: 10.1109/
MSP.2009.932122. (page 8).
[277] S. Ventura, C. Romero, A. Zafra, J. Delgado, and C. Hervás. Jclec: a java framework for
evolutionary computation. Soft Computing, 12(4):381–392, 2008. URL http://dx.doi.
org/10.1007/s00500-007-0172-0. (pages 84, 98).
[278] J. S. Vesterstrøm and J. Riget. Particle swarms: Extensions for improved local, multi-modal,
and dynamic search in numerical optimization. PhD thesis, Dept. of Computer Science, University of Aarhus, 2002. (page 38).
[279] S. Voß. Meta-heuristics: The state of the art. In Proceedings of the Workshop on Local Search for Planning and Scheduling-Revised Papers in ECAI, pages 1–23. Springer-Verlag, 2001. (page 40).
[280] S. Voß. Meta-heuristics: The state of the art. In Proceedings of the Workshop on Local Search
for Planning and Scheduling-Revised Papers in ECAI, pages 1–23. Springer-Verlag, London,
UK, 2001. ISBN 3-540-42898-4. (page 306).
[281] S. Voß and D. L. Woodruff. Optimization Software Class Libraries. Kluwer Academic Publishers, 2002. (pages 9, 42, 83, 97).
[282] H. Wada, P. Champrasert, J. Suzuki, and K. Oba. Multiobjective optimization of slaaware service composition. Congress on Services - Part I, 2008. SERVICES ’08. IEEE, pages
368–375, July 2008. doi: 10.1109/SERVICES-1.2008.77. (page 287).
[283] S. Wagner. Heuristic Optimization Software Systems - Modeling of Heuristic Optimization Algorithms in the HeuristicLab Software Environment. PhD thesis, Johannes Kepler University, Linz, Austria, 2009. (pages 98, 182).
[284] S. Wagner and M. Affenzeller. Heuristiclab: A generic and extensible optimization environment, 2005. URL http://dx.doi.org/10.1007/3-211-27389-1_130. (page 84).
[285] S. Wagner, G. Kronberger, A. Beham, S. Winkler, and M. Affenzeller. Model driven rapid
prototyping of heuristic optimization algorithms, 2009. URL http://dx.doi.org/10.
1007/978-3-642-04772-5_94. (page 85).
[286] D. Waltemath, R. Adams, D. Beard, F. Bergmann, U. Bhalla, R. Britten, V. Chelliah,
M. Cooling, J. Cooper, E. Crampin, et al. Minimum information about a simulation experiment (miase). PLoS computational biology, 7(4):e1001122, 2011. (page 273).
[287] H. Wang, P. Tong, P. Thompson, and Y. Li. QoS-based web services selection. In IEEE International Conference on e-Business Engineering (ICEBE), pages 631–637, 2007. doi: http://doi.ieeecomputersociety.org/10.1109/ICEBE.2007.88. (pages 280, 284, 287).
[288] J. Wegener, K. Grimm, M. Grochtmann, and H. Sthamer. Systematic testing of real-time
systems. In Proceedings of the Fourth International Conference on Software Testing and Review
(EuroSTAR), 1996. (page 306).
[289] J. Wegener, H. Sthamer, B. Jones, and D. Eyres. Testing real-time systems using genetic
algorithms. Software Quality Control, 6(2):127–135, 1997. ISSN 0963-9314. doi: 10.1023/A:
1018551716639. (pages 306, 313).
[290] J. Wegener, H. Sthamer, B. F. Jones, and D. E. Eyres. Testing real-time systems using
genetic algorithms. Software Quality Journal, 6:127–135, 1997. ISSN 0963-9314. (pages 24,
214).
[291] T. Weise. Global Optimization Algorithms - Theory and Application. Self-Published, second
edition, 2009. Online available at http://www.it-weise.de/. (pages 25, 26).
[292] W. West and T. Ogden. Statistical analysis with webstat, a java applet for the world
wide web. Journal of Statistical Software, 2(3):1–7, 9 1997. ISSN 1548-7660. URL http:
//www.jstatsoft.org/v02/i03. (page 87).
[293] W. West, Y. Wu, and D. Heydt. An introduction to statcrunch 3.0. Journal of Statistical
Software, 9(5):??–??, 3 2004. ISSN 1548-7660. URL http://www.jstatsoft.org/v09/i05.
(page 87).
[294] D. Whitley. The GENITOR algorithm and selection pressure: Why rank-based allocation
of reproductive trials is best. In Proceedings of the Third International Conference on Genetic
Algorithms, pages 116–121, 1989. (page 112).
[295] D. Whitley, S. Rana, and R. B. Heckendorn. The island model genetic algorithm : On
separability, population size and convergence. CIT. Journal of computing and information
technology, 7(1):33 – 47, 1999. (page 116).
[296] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83,
1945. (page 271).
[297] D. N. Wilke, S. Kok, and A. A. Groenwold. Comparison of linear and classical velocity
update rules in particle swarm optimization: notes on diversity. International Journal for
Numerical Methods in Engineering, 70(8):962–984, 2007. (page 38).
[298] G. C. Wilson, A. Mc Intyre, and M. I. Heywood. Resource review: Three open source systems for evolving programs–lilgp, ecj and grammatical evolution. Genetic Programming
and Evolvable Machines, 5(1):103–105, 2004. ISSN 1389-2576. doi: http://dx.doi.org/10.
1023/B:GENP.0000017053.10351.dc. (page 83).
[299] C. Wohlin. Experimentation in Software Engineering: An Introduction. The Kluwer International Series in Software Engineering. Kluwer Academic, 2000. ISBN 9780792386827.
URL http://books.google.es/books?id=nG2UShV0wAEC. (page 63).
[300] D. Wolpert and W. Macready. No free lunch theorems for optimization. Evolutionary
Computation, IEEE Transactions on, 1(1):67–82, Apr 1997. ISSN 1089-778X. (pages 5, 9, 41).
[301] A. H. Wright. Genetic algorithms for real parameter optimization. In Foundations of
Genetic Algorithms, pages 205–218. Morgan Kaufmann, 1994. (page 109).
[302] X. Yao and Y. Liu. Fast evolutionary programming. In Proc. 5th Ann. Conf. on Evolutionary
Programming, 1996. (page 110).
[303] L. Zeng, B. Benatallah, A. H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang. Qos-aware
middleware for web services composition. IEEE Trans. Softw. Eng., 30(5):311–327, 2004.
ISSN 0098-5589. doi: http://dx.doi.org/10.1109/TSE.2004.11. (pages 280, 283, 284, 285,
286).
[304] C. Zhang, S. Su, and J. Chen. Diga: Population diversity handling genetic algorithm
for qos-aware web services selection. Comput. Commun., 30(5):1082–1090, March 2007.
ISSN 0140-3664. doi: 10.1016/j.comcom.2006.11.002. URL http://portal.acm.org/
citation.cfm?id=1228023. (page 287).
[305] H. Zheng, W. Zhao, J. Yang, and A. Bouguettaya. Qos analysis for web service compositions with complex structures. Services Computing, IEEE Transactions on, PP(99):1, 2012.
ISSN 1939-1374. doi: 10.1109/TSC.2012.7. (pages 281, 287).
[306] H. Zhou and J. J. Grefenstette. Induction of finite automata by genetic algorithms. In Proceedings of the 1986 IEEE International Conference on Systems, Man, and Cybernetics, pages 170–174, 1986. (page 109).
[307] E. Zitzler and L. Thiele. Multiobjective evolutionary algorithms: a comparative case
study and the strength pareto approach. IEEE Trans. Evolutionary Computation, 3(4):257–
271, 1999. (page 105).
[308] E. Zitzler, M. Laumanns, and L. Thiele. Spea2: Improving the strength pareto evolutionary algorithm. Technical report, Computer Engineering and Networks Laboratory (TIK).
Department of Electrical Engineering. Swiss Federal Institute of Technology (ETH), 2001.
(page 105).