UNA ARQUITECTURA DISTRIBUIDA PARA EL CONTROL DE ROBOTS AUTÓNOMOS MÓVILES: UN ENFOQUE APLICADO A LA CONSTRUCCIÓN DE LA PLATAFORMA QUAKY-ANT

A DISTRIBUTED ARCHITECTURE FOR INTELLIGENT CONTROL IN AUTONOMOUS MOBILE ROBOTS: AN APPLIED APPROACH TO THE DEVELOPMENT OF THE QUAKY-ANT PLATFORM

Dept. of Communications and Information Engineering, University of Murcia
Presented by: Humberto Martínez Barberá
Supervised by: Antonio Fernando Gómez Skarmeta
January 2001

Antonio Fernando Gómez Skarmeta, Associate Professor (Profesor Titular de Universidad) in the area of Telematics Engineering at the Department of Information and Communications Engineering, AUTHORIZES the presentation of the doctoral thesis entitled "Una Arquitectura Distribuida para el Control de Robots Autónomos Móviles: un Enfoque Aplicado a la Construcción de la Plataforma Quaky-Ant", carried out by Humberto Martínez Barberá under my direct guidance and supervision in the Department of Information and Communications Engineering, and submitted to obtain the degree of Doctor from the University of Murcia. Murcia, 5 February 2001.

To Esther.

Acknowledgements

No self-respecting thesis lacks its acknowledgements page, and this one is no exception. To begin with, this work would not have been possible without the invaluable help of several colleagues: Miguel Zamora and Benito Úbeda (Quaky-Ant hardware), Eduardo Martínez (multicast video transmission), Juan Botía and Mercedes Valdés (intelligent techniques for sensor fusion), Jesús Ibáñez (genetic algorithm for behaviour fusion), and of course the whole Intelligent Systems group at the University of Murcia, especially the basement-dwelling subgroup that calls itself the ANTS group.
They have also contributed in various complementary, and no less important, facets of daily life, in particular the Thursday football match (go red team!). I would also like to express my thanks to the people at the Centre for Applied Autonomous Sensor Systems, University of Örebro, Sweden. They made my work very pleasant during my stay in the white hell. I would especially like to thank Dr. Alessandro Saffiotti for his interest and valuable help. Of course, a large share of the credit for finishing this thesis goes to my supervisor, Dr. Antonio Gómez, who gave me enough freedom to explore different paths, advised me when decisions had to be made, and also made me carry out the most thankless parts of the thesis work. I must also thank my family, because much, if not all, of what I am I owe to them. Yes! All those gadgets and messes I made were apparently good for something. Finally, I can only thank Esther for her understanding and support during these years, which certainly helped me switch off and reset my head, and in any case renewed my enthusiasm to finish this work. Thank you, truly.
Contents

PART I
SUMMARY
GENERAL DESCRIPTION
0.1 Introduction
0.2 The Quaky-Ant platform
0.3 Sensor fusion
0.4 Navigation
0.5 Control architecture
0.6 Behaviour fusion learning
0.7 A web-based teleoperation application
0.8 Conclusions
0.9 Future work

PART II
ABSTRACT
INTRODUCTION
1.1 Objectives and contents
1.2 Some background on mobile robotics
1.2.1 Mobile robotics
1.2.2 History of mobile robotics
1.2.3 Robot intelligence
1.2.4 Soft Computing
1.3 Contributions of this thesis
HARDWARE ARCHITECTURE
2.1 Introduction
2.2 MadCar micro-robot
2.2.1 Description
2.2.2 Sonar controller
2.2.3 Shaft encoder
2.2.4 Electronic compass
2.2.5 Conclusions
2.3 La-Garra robot
2.3.1 Description
2.3.2 Conclusions
2.4 Quaky-Ant robot
2.4.1 Description
2.4.2 Modular architecture and interconnection
2.4.3 External video and data communication
2.4.4 Sonar and infrared module
2.4.5 Motor control module
2.4.6 Compass module
2.4.7 Bus master module
2.4.8 Conclusions
SENSOR FUSION
3.1 Introduction
3.2 Sensor models
3.2.1 Ultrasonic sonar range finders
3.2.2 Infrared sensors
3.3 Sensor fusion using neural networks
3.3.1 Experiments and results
3.4 Sensor fusion using fuzzy rules
3.4.1 Some background on clustering methods
3.4.2 Fuzzy tabu clustering
3.4.3 Adaptive network based fuzzy inference system (ANFIS)
3.4.4 Experiments and results
3.5 Conclusions
ROBOT NAVIGATION
4.1 Introduction
4.1.1 Map representation methods
4.1.2 Path planning
4.1.3 Localisation
4.3 Map building and path planning
4.3.1 Fuzzy grid map
4.3.2 A*-based planning
4.3.3 Experiments and results
4.4 Map building and localisation
4.4.1 Fuzzy segments map
4.4.2 Fuzzy segments localisation
4.4.3 Experiments and results
4.5 Conclusions
CONTROL ARCHITECTURE
5.1 Introduction
5.1.1 Deliberative and reactive control architectures
5.1.2 Hybrid control architectures
5.1.3 Behaviour definition
5.1.4 Behaviour coordination
5.1.5 Multiagent systems
5.2 The BGA control architecture
5.2.1 Antecedents: MadCar's architecture
5.2.2 BGA: a multiagent architecture
5.2.3 Inter-agent communication in BGA
5.2.4 Conclusions
5.3 The BG programming language
5.3.1 Antecedents
5.3.2 BG syntax and semantics
5.3.3 The BGen development environment
5.3.4 Conclusions
A SAMPLE AGENT ARCHITECTURE
6.1 Introduction
6.2 Sensor processing and actuator control
6.3 Reactive control
6.3.1 Common block
6.3.2 Obstacle avoidance behaviours
6.3.3 Goal reaching behaviour
6.3.4 Alignment behaviours
6.3.5 Behaviour blending
6.4 Navigation
6.5 Planning
6.6 Experiments and results
6.6.1 Point-to-point navigation experiment (nav-A)
6.6.2 Point-to-point navigation experiment (nav-B)
6.6.3 Point-to-point navigation experiment (nav-C)
6.6.4 Map-building experiment (map-A)
6.7 Conclusions
BEHAVIOUR FUSION LEARNING
7.1 Introduction
7.2 Behaviour fusion using genetic-based algorithms
7.2.1 Introduction
7.2.2 Representation and initial population
7.2.3 Genetic operators and parameters
7.2.4 Fitness function
7.3 Experimental results
7.4 Conclusions
A WEB-BASED TELEOPERATION APPLICATION
8.1 Introduction
8.2 Web robotics technologies
8.2.1 The UWA Telerobot
8.2.2 EPFL KhepOnTheWeb
8.2.3 CMU Xavier
8.2.4 NASA WITS
8.3 The BGen-Web system
8.3.1 Objectives
8.3.2 BGen-Web architecture
8.3.3 Experiments and results
8.4 Conclusions
CONCLUSIONS AND FUTURE WORK
9.1 Conclusions
9.1.1 The Quaky-Ant mobile robot
9.1.2 Sensor fusion
9.1.3 Navigation
9.1.4 Control architecture and programming language
9.1.5 Web robotics application
9.2 Future work
COMMERCIAL MOBILE ROBOTS: A SURVEY
A.1 IS ROBOTICS
A.2 ACTIVMEDIA ROBOTICS
A.3 NOMADICS TECHNOLOGIES
A.4 K-TEAM
A.5 ROBOSOFT SERVICE ROBOTICS
A.6 APPLIED AI SYSTEMS
A.7 PROBOTICS
OBTAINING THE BGEN SOFTWARE
REFERENCES

PART I

"... they could never use words or other signs, composing them as we do, to declare our thoughts to others. ... But one can conceive of a machine that expresses words and even emits some responses to bodily actions performed upon it that produce changes in its organs ..." (René Descartes, Discourse on the Method)

"I wish to work miracles" (Leonardo da Vinci)

Summary

This thesis presents a series of works related to mobile robotics, covering the different levels needed for a mobile robot to carry out tasks independently in a previously unknown environment. Soft Computing techniques (neural networks, genetic algorithms, and especially fuzzy logic) are used in the development of the different levels. The thesis begins by describing the hardware architecture of the Quaky-Ant platform, an autonomous mobile robot that is the basis for most of the real-environment tests of the different algorithms. Building on this platform, a solution is presented to the sensor fusion problem that does not rely on a model of the environment. This fusion is solved initially with a simple neural network and later with a hybrid system that generates fuzzy rules, based on the ANFIS network and several clustering algorithms, one of which, fuzzy clustering with tabu search, was developed as part of this work. Once improved sensors are assumed (after the sensor fusion process), the navigation problem is addressed. It is solved in two stages by applying two navigation techniques. The first stage applies a fuzzy grid map to the map-building problem and an A* search to the path-planning problem. The second stage applies a fuzzy segments map to the map-building problem, together with a localisation system based on the fuzzy segments map.
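To illustrate the planning half of the first navigation stage, the sketch below runs a minimal A* search over a plain binary occupancy grid. This is not the thesis implementation: the actual planner works on a fuzzy grid map, whereas here the fuzzy cell values are simplified to 0/1 occupancy, and all names and the example grid are illustrative.

```python
import heapq

def astar(grid, start, goal):
    """A* search on a 4-connected occupancy grid.
    grid: list of lists, 0 = free, 1 = occupied.
    start, goal: (row, col) tuples. Returns a cell path or None."""
    def h(cell):
        # Manhattan distance: admissible on a 4-connected grid
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]   # (f, g, cell)
    g = {start: 0}
    parent = {start: None}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:                 # reconstruct path back to start
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > g[cell]:
            continue                     # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = new_cost
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap,
                                   (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None                          # goal unreachable

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
path = astar(grid, (0, 0), (3, 3))
```

With an admissible heuristic such as Manhattan distance, the returned path is optimal in number of grid moves, which is why A* is a natural fit for grid-based map representations.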
Once the basic navigation and environment-perception capabilities are in place, a distributed control architecture (BGA) is defined that serves as the basis for developing the different control elements. This architecture distinguishes between deliberative processes and reactive behaviours. The different elements of the architecture communicate through a KQML-based protocol over UDP. To ease the development of control programs with the BGA architecture, a language (BG) has been defined that uses fuzzy logic to specify and define the different control modules. The BGA architecture and the BG language have been integrated into a programming and simulation environment (BGen), developed entirely in Java so that it can be integrated into different mobile robots. To solve the behaviour fusion problem, a genetic algorithm is used that generates and tunes the set of fuzzy fusion rules, evaluating the candidate solutions in the BGen environment. Finally, an application of the Quaky-Ant platform and the BGen environment in a web setting (BGen-Web) is shown, which allows teleoperation of the mobile robot in exploration tasks and offers the user a high degree of interactivity thanks to the use of multicast video transmission.

Chapter 0
General Description

0.1 Introduction

Mobile robotics is a relatively new research field concerned with the control of autonomous or semi-autonomous vehicles. Mobile robots are not just collections of algorithms but also physical systems that work in real environments. Thus, mobile robots provide a real-world framework in which theoretical concepts and algorithms can be tested. Although most mobile robots are experimental, there are emerging applications of industrial and commercial importance.
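As a side note on the KQML-over-UDP inter-agent protocol mentioned in the summary above: KQML messages are performatives written as parenthesised keyword/value lists. The sketch below composes and parses such messages as strings; the concrete performatives and parameter names used by BGA are not reproduced here (only generic KQML fields), and in a real system the resulting string would travel as a UDP datagram via `socket.sendto`.

```python
def make_kqml(performative, sender, receiver, content):
    """Compose a KQML-style message string (generic fields only)."""
    return (f"({performative} :sender {sender} :receiver {receiver} "
            f":content \"{content}\")")

def parse_kqml(msg):
    """Tiny parser for the flat messages produced above.
    Assumes the content itself contains no ':' characters."""
    body = msg.strip()[1:-1]                 # drop outer parentheses
    performative, rest = body.split(" ", 1)
    fields = {"performative": performative}
    for part in rest.split(":")[1:]:         # each part: 'key value '
        key, value = part.split(" ", 1)
        fields[key] = value.strip().strip('"')
    return fields

msg = make_kqml("tell", "navigator", "pilot", "goal reached")
fields = parse_kqml(msg)
```

UDP fits this message-passing style well: each performative is small, self-describing, and independent, so occasional datagram loss can be tolerated or handled at the agent level.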
This thesis addresses the problem of making a mobile robot carry out tasks with a certain reliability in completely unknown environments. This topic is quite general, and is thus related to the specification of mobile robot tasks, the development of mobile robot behaviours, the interpretation of the environment, and the validation of the final system. Many families of techniques are available to attack the different subproblems. This thesis uses and develops intelligent techniques from the field known as soft computing (Zadeh, 1994) or computational intelligence (Schult et al., 2000), making extensive use of fuzzy logic, genetic algorithms, and neural networks. An important point is the use of learning techniques to assist in the development of both the robot's behaviours and the interpretation of the environment. Given the above robotics problems, the main objective of this thesis is the use, development, and integration of techniques that do not rely on a priori models of the environment. In addition, a development environment has been created to make the robotic system robust and to help in programming control software. This development environment includes modules for the specification, learning, and validation of mobile robot behaviours, and integrates a hybrid (reactive-deliberative) architecture that runs in a distributed setting. The work in this thesis can be grouped into the following points:
• The development of a control architecture based on the use of intelligent techniques.
• The specification and development of the platform and of the communication model between the different elements of the system.
• The use and development of learning techniques for behaviour definition and fusion, as well as for the interpretation of the environment.
• The use and development of navigation techniques for unknown environments: map building, path planning, and localisation.

0.2 The Quaky-Ant platform

During the preparation of this thesis, several platforms were developed, built, and tested, from the very simple MadCar to the final Quaky-Ant robot (see Chapter 1). The development of the different robots was conditioned by several factors, the most important being the physical dimensions of the laboratories and adjacent corridors, and the total cost. The physical dimensions limited the maximum size to 50 cm in diameter, and the cost was limited by a budget of 600,000 pesetas. Additional constraints were imposed on the final development:
• The platform had to be flexible enough to test the different algorithms and architectures presented in this thesis.
• The platform had to be robust and easily upgradeable, making use of different sensors (ultrasonic and infrared).
• The platform had to be able to carry out complex tasks satisfactorily, operating both indoors and in smooth outdoor environments.
The result was the Quaky-Ant robot which, curiously, resembles a Pioneer robot from ActivMedia Robotics (see Appendix A), although the resemblance is purely aesthetic. This robot incorporates a distributed architecture organised in two layers: a low-level layer based on microcontrollers and a high-level layer based on a microprocessor. The robot has two 12 V DC motors that drive the wheels of the differential drive, plus the support of a small caster wheel. This is the typical triangle configuration, which gives good balance. In addition, the ground clearance is small, allowing a low centre of gravity.
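A differential drive like this pairs naturally with standard dead reckoning from the wheel encoders. The sketch below shows the textbook odometry update, not the robot's actual firmware; the wheel base and travelled distances in the example are made up for illustration.

```python
import math

def odometry_step(x, y, theta, d_left, d_right, wheel_base):
    """One dead-reckoning step for a differential-drive robot.
    d_left / d_right: distance travelled by each wheel (m) this period.
    wheel_base: distance between the two drive wheels (m)."""
    d_center = (d_left + d_right) / 2.0          # translation of the midpoint
    d_theta = (d_right - d_left) / wheel_base    # rotation this period
    # integrate using the mid-arc heading for a better small-arc estimate
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    theta = (theta + d_theta) % (2.0 * math.pi)
    return x, y, theta

# straight segment: both wheels travel the same distance
x, y, th = odometry_step(0.0, 0.0, 0.0, 0.10, 0.10, 0.35)
```

Because each step integrates small measurement errors, the estimated pose drifts over time, which is precisely why the compass (indoors) and GPS (outdoors) described below are needed to compensate odometry errors.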
The robot (Figure 0.1) (Figure 0.2) is equipped with several sensors, described below:
• Ten ultrasonic sensors, used to measure distances to walls and far objects.
• Seven infrared sensors, used to measure distances to walls and nearby objects.
• Two high-resolution encoders mounted on the axes of the driving wheels, used for the odometric computations.
• An electronic compass that measures the robot's heading with respect to the Earth's magnetic north pole. It is used to compensate odometry errors in indoor environments.
• A GPS receiver for absolute positioning, used to compensate odometry errors in outdoor environments.
• An RGB camera, used as visual feedback in teleoperation tasks.
Figure 0.1: The Quaky-Ant robot. Figure 0.2: Components of Quaky-Ant.
Taking cost, performance and expandability into account, the Quaky-Ant robot compares favourably with several commercial robots (see appendix A). The main features of the platform can be summarised as follows:
• It exhibits an open architecture. Two key points are the use of the I2C bus for interconnecting the different modules and the definition of a generic open protocol.
• A consequence of the architecture's design is its robustness and expandability (all modules are accessed in the same way regardless of their nature).
• The platform is not tied to any control architecture or development system, although it has served as the basis for the development of one.
• The robot operates smoothly and correctly in indoor environments and is also able to operate in smooth outdoor environments. In both cases it does not exhibit a high maximum speed (around 0.45 m/s), since this was not fundamental in the design. In contrast, it exhibits good manoeuvrability and precision.
0.3 Sensor fusion

One of the goals of the thesis is for the Quaky-Ant robot to carry out tasks in office environments, with no prior knowledge of the environment and without any modification of it (see chapter 3). Among other things, this goal dictates the need for reliable sensors, both for building maps and for controlling the robot. The Quaky-Ant robot only has low-cost sensors that output a distance: ultrasonic and infrared. These two types of sensors can be complementary (each has its own sources of error), and a natural way of handling the uncertainty is to perform sensor fusion at the signal level. Thus, the output of each pair of sensors pointing at the same zone is combined into what is called a virtual sensor. Since the robot builds its own world model, techniques that require a prior model of the environment (as is the case with Kalman filters) are not applicable. Supervised learning techniques have previously been used for sensor fusion, and they are also used in this thesis, in particular a neural network and a fuzzy rule system. One of the most crucial problems of supervised learning for sensor fusion is data acquisition. This acquisition is not simple because enough representative measurements are needed: both the sensor readings and the true distance, for different angles of incidence, object surfaces, and distances. In this case, setting up the experiments in a real environment is very difficult. Good sensor models are available in the literature, and therefore simulation is used for data acquisition. The fusion method is trained with these data, and the resulting fusion model is validated in a real environment following an iterative process (Figure 0.3). Figure 0.3: Procedure for building the fusion method.
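As a minimal illustration of the virtual-sensor idea, a trivial rule-based baseline could combine a co-located ultrasonic/infrared pair as follows (the 0.8 m threshold and the rule itself are assumptions for the sketch, not the thesis's learned fusion method):

```python
def fuse_pair(us, ir, ir_max=0.8):
    """Rule-based fusion of one ultrasonic (us) and one infrared (ir)
    reading, in metres. Hypothetical rule: trust the IR sensor whenever
    the obstacle is within its reliable range, otherwise fall back on
    the sonar reading."""
    if ir < ir_max:   # obstacle inside the IR's reliable range
        return ir
    return us         # IR saturated: only the sonar is informative

# A "virtual sensor" is then just the fused output of a co-located pair.
readings = [(1.90, 0.85), (0.45, 0.42), (3.10, 0.85)]
fused = [fuse_pair(us, ir) for us, ir in readings]
```

The learned fusion methods described next replace this hand-written rule with a trained model.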
The sonar-tracing method (Gallardo, 1999) has been used as the ultrasonic sensor model. This method produces good results because it can model the different sources of ultrasonic error, including double rebounds. However, it does not completely model the effect of the main lobe. Thus, when the minimum-distance ray is far from the centre of the lobe, the resulting time of flight is too conservative and yields distances smaller than the real ones. The method has been modified to take this effect into account as well, introducing the effect of the angular distance of the ray from the centre of the lobe. Since not many infrared models are available in the literature, a linearisation of the sensor's empirical response, based on the manufacturer's information, has been used directly. Two different environments have been used to acquire data for training the neural network and for its subsequent validation. The first is inspired by the floor plan of the basement of the Facultad de Informática (where the author's laboratory is located), with dimensions of 17 x 22 m, 2 m corridors and 80 cm doors (Figure 0.4). The second is based on the floor plan of the Trinity College Fire Fighting Contest, with dimensions of 3.5 x 3.5 m, and 65 cm corridors and doors (Figure 0.5). Figure 0.4: Basement environment. Figure 0.5: Trinity environment. To select appropriate values for the network parameters, an iterative process is followed (Figure 0.6). Instead of training a single network, several networks with different parameters are trained, and the best network, the one with the smallest error on test examples, is used as the output of the learning process. To speed up the learning process, four machines running a backpropagation algorithm are used simultaneously.
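The train-several-and-keep-the-best scheme can be sketched as follows. This is a toy stand-in, not the thesis's network: the data set, network sizes, learning rate and epoch counts are all illustrative assumptions, and a single machine is used instead of four:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the simulated data set: inputs are a (sonar, IR)
# pair, the target is the true distance (a made-up noisy mixture).
X = rng.uniform(0.1, 3.0, size=(200, 2))
y = np.where(X[:, 1] < 0.8, X[:, 1], X[:, 0]) + rng.normal(0, 0.02, 200)
Xtr, Xte, ytr, yte = X[:150], X[150:], y[:150], y[150:]

def train_mlp(hidden, epochs=300, lr=0.05):
    """One-hidden-layer MLP trained with plain batch backpropagation;
    returns the RMSE on the held-out test examples."""
    W1 = rng.normal(0, 0.5, (2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(Xtr @ W1 + b1)                  # forward pass
        out = (H @ W2 + b2).ravel()
        err = out - ytr
        gW2 = H.T @ err[:, None] / len(ytr)         # backward pass
        gb2 = err.mean()
        dH = err[:, None] @ W2.T * (1 - H ** 2)
        gW1 = Xtr.T @ dH / len(ytr); gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    pred = (np.tanh(Xte @ W1 + b1) @ W2 + b2).ravel()
    return float(np.sqrt(np.mean((pred - yte) ** 2)))

# Train several candidate networks and keep the one with the lowest test error.
errors = {h: train_mlp(h) for h in (2, 4, 8)}
best = min(errors, key=errors.get)
```

In the thesis the candidate networks are trained in parallel on four machines; here the loop over `(2, 4, 8)` merely plays that role sequentially.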
Once the process has finished, the robot is simulated in both scenarios over several predefined paths, along which measurements of both sensor types are taken. The measurements are fused and the result is compared with the true distance to obtain an error measure. The purpose of this step is to measure the generalisation capability of the neural network. Figure 0.6: Network training process. The results are tabulated using the mean error (Avg), variance (Var), and root mean square error (RMSE). The results of the neural network are compared with those of the individual sensors and with Flynn's simple-rule fusion method (Flynn, 1988). It is worth noting that the network produces better results (Table 3.1) (Table 3.2) in all cases except for the mean in the trinity scenario, because there the walls are always very close and the infrared sensors can be used reliably, which is what Flynn's method always does. In addition to this simple fusion method, a more complex one based on a fuzzy rule system has been developed. In this case the rules are obtained as the result of a hybrid learning process. This process is composed of three techniques, each applied to one phase of the fuzzy modelling. First, a technique is used to find the number of rules that make up the system. This is done by means of mountain clustering (Yager and Filev, 1994), which outputs the initial number of centroids that represent the different behaviours of the data. Given the number of centroids, a second clustering algorithm is applied to obtain improved centroids that will later constitute the initial parameters of the rules. At this stage the k-means, fuzzy k-means and fuzzy tabu clustering methods are used. Finally, an ANFIS adaptive network (Jang, 1993) is applied, which takes care of the final tuning of the rules.
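The first stage of the hybrid process, the mountain method, can be sketched as follows. The parameters `alpha`, `beta` and the stopping ratio `tol` are illustrative values, not those of the thesis:

```python
import numpy as np

def mountain_clustering(X, grid, alpha=5.0, beta=2.0, tol=0.2):
    """Sketch of the mountain method (Yager and Filev, 1994): build a
    'mountain' of data density on candidate grid points, repeatedly pick
    the highest peak as a centroid and subtract its influence, until the
    remaining peaks fall below a fraction tol of the first one."""
    M = np.array([np.exp(-alpha * ((X - g) ** 2).sum(1)).sum() for g in grid])
    centroids, first_peak = [], M.max()
    while M.max() > tol * first_peak:
        k = int(M.argmax())
        centroids.append(grid[k])
        # destruct the selected peak so a different region wins next time
        M = M - M[k] * np.exp(-beta * ((grid - grid[k]) ** 2).sum(1))
    return np.array(centroids)

# Two well-separated blobs should yield two centroids (i.e. two rules).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (40, 2)), rng.normal(2.0, 0.1, (40, 2))])
gx = np.linspace(-0.5, 2.5, 13)
grid = np.array([(a, b) for a in gx for b in gx])
C = mountain_clustering(X, grid)
```

The number of centroids returned (here two) is what the thesis feeds into the second clustering stage as the number of rules.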
Fuzzy tabu clustering is a new algorithm developed to deal with the existence of multiple local minima in clustering tasks. It is based on the tabu search technique and on some ideas from the fuzzy c-means algorithm. Tabu search in clustering tasks (Al-Sultan, 1995) differs from hill-climbing local search in that it does not get trapped in local minima, as happens with fuzzy c-means. The new algorithm is, in essence, a tabu search that takes into account the membership degrees of the data in the different centroids. Thus, the goal is to partition the data into a known number of groups so as to minimise an objective function similar to that of fuzzy c-means, which is based on the membership degrees in the different centroids. The Euclidean distance is used as the error measure from the data to the centroids, although any other could have been used. Several experiments have been carried out showing that with this technique globally optimal or near-optimal solutions can be obtained more easily than with traditional clustering techniques. For this purpose two data sets have been used: one in which the clusters are clearly differentiated, and another in which they are not (Anderson's famous IRIS data set). In a similar way to what was done with the neural network, two different environments have been used to acquire data for learning the fuzzy rules and for their subsequent validation. The first resembles a corridor to which several rooms are connected, with dimensions of 5.2 x 4.9 m, 1.2 m corridors and 80 cm doors (Figure 0.7). The second is based on the floor plan of the Trinity College Fire Fighting Contest, with dimensions of 3.5 x 3.5 m, and 65 cm corridors and doors (Figure 0.8).
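The flavour of the algorithm can be conveyed with a small sketch: a fuzzy c-means-style objective evaluated over randomly perturbed centroid positions, with a short tabu memory and acceptance of the best non-tabu neighbour (which may be worse, letting the search escape local minima). All parameters, the move generator and the memory length are assumptions for illustration, not the thesis's algorithm:

```python
import numpy as np

def fcm_objective(X, C, m=2.0):
    """Fuzzy c-means style objective J = sum_ik u_ik^m * d_ik^2, with the
    memberships u computed from inverse squared distances (closed form
    for m = 2, the default)."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1) + 1e-12
    u = (1.0 / d2) / (1.0 / d2).sum(1, keepdims=True)
    return float((u ** m * d2).sum())

def fuzzy_tabu_clustering(X, k, iters=200, step=0.3, tabu_len=10, seed=0):
    """Tabu-flavoured search over centroid positions."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    best_C, best_J = C.copy(), fcm_objective(X, C)
    tabu = []
    for _ in range(iters):
        moves = [C + rng.normal(0, step, C.shape) for _ in range(5)]
        moves = [M for M in moves
                 if not any(np.allclose(M, T, atol=1e-3) for T in tabu)]
        if not moves:
            continue
        C = min(moves, key=lambda M: fcm_objective(X, M))  # best neighbour
        tabu.append(C.copy()); tabu = tabu[-tabu_len:]     # tabu memory
        J = fcm_objective(X, C)
        if J < best_J:                                     # keep incumbent
            best_C, best_J = C.copy(), J
    return best_C, best_J

X = np.array([[0.0, 0.0], [0.0, 0.1], [2.0, 2.0], [2.0, 2.1]])
best_C, best_J = fuzzy_tabu_clustering(X, 2)
```

The incumbent solution only ever improves, so the returned objective is never worse than that of the initial centroids.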
To generate the fuzzy rules, several combinations of clustering methods have been used, three of which stand out for their results. All of them use mountain clustering to obtain the number of clusters (Table 3.9), an additional clustering method to obtain the centroids, and an ANFIS network (Table 3.11) to generate the rules from the centroids. As clustering methods for obtaining the centroids, fuzzy k-means (Table 3.9), k-means, and fuzzy tabu clustering (Table 3.10) have been used. Figure 0.7: Trinity environment. Figure 0.8: Corridor environment. Once the learning process has finished, the robot is simulated in both scenarios over the paths used for data acquisition (Figure 0.7) (Figure 0.8). The purpose of this step is to check the learning characteristics of the fuzzy rule bases. The results are likewise tabulated using the mean error (Avg), variance (Var), and root mean square error (RMSE). The results of the fuzzy rules are compared with those of the individual sensors, with the previously generated neural network, and with Flynn's rule-based fusion method. It is worth noting that the combination based on tabu search produces the best results (Table 3.12) (Table 3.13) in all cases, both in Avg and in RMSE. Compared with Flynn's rules, the fuzzy rules obtain better results in the corridor scenario because its proportions are similar to those of a real environment, whereas the trinity scenario has corridors that are excessively narrow for the robot's size. Finally, the robot is simulated in both scenarios over alternative paths (Figure 0.9) (Figure 0.10), different from those used for data acquisition. The purpose of this step is to check the generalisation characteristics of the fuzzy rule bases.
It is again worth noting that the combination based on tabu search produces the best results (Table 3.14) (Table 3.15) in the corridor scenario, although in the trinity scenario Flynn's rules and the k-means-based combination produce slightly better results. The reason is that the trinity scenario can be navigated using the infrared sensors alone, and the validation run follows a very complicated trajectory. In any case, the rules obtained by the combination based on fuzzy tabu clustering obtain the best results overall. Figure 0.9: Validation in trinity. Figure 0.10: Validation in corridor.

0.4 Navigation

For a mobile robot, the ability to navigate is one of the most important. Navigation can be defined as the combination of three fundamental abilities: map building, which is the process of producing maps from sensor readings taken at different positions; localisation, which is the process of determining the robot's current position from the sensor readings and the current map; and path generation, which is the process of obtaining a feasible and safe trajectory from a given position to a goal position using the current map. The Quaky-Ant platform has been endowed with several algorithms and methods to attack the navigation problem (see chapter 4). Initially, the system assumed reliable odometry for operations of short duration and distance. Under this assumption, it incorporated a map-building method and a path-generation method. To overcome the strong odometry restriction, the system was endowed with a localisation method based on another map model. The final combination of both methods turned out to work quite well.
The first navigation model uses a fuzzy grid map and an A*-based algorithm to generate trajectories. Unlike other solutions, the robot builds the maps while it is moving rather than constantly stopping to update the map. The idea is for the robot to exhibit a certain degree of reactivity similar to that of animals and people. The navigation process is as follows. Every time a sensor produces a new value, the map is updated. This step is not time-consuming and is performed at a high frequency (approximately every 300 ms). Simultaneously, the robot keeps moving, and from time to time a new trajectory is generated taking the current map into account. This step is performed at a lower frequency (approximately every 1200 ms) because path generation is comparatively more time-consuming. The map representation method is based on fuzzy grid maps (Oriolo et al., 1998). With this method three maps are used and updated: the map of cells that are possibly empty (Figure 0.11), which have a high membership degree in the fuzzy set empty cell; the map of cells that are possibly occupied (Figure 0.12), which have a high membership degree in the fuzzy set occupied cell; and the map of cells that are safe for navigation (Figure 0.13), which is obtained by combining the other two, so that safe cells simultaneously have a high membership degree in the fuzzy set empty cell and a low membership degree in the fuzzy set occupied cell. Figure 0.11: Empty-cell map. Figure 0.12: Occupied-cell map. Figure 0.13: Navigation map. An important point of the adopted solution is that the original method only takes ultrasonic sensor measurements into account, modelling their uncertainty directly on the map.
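The combination of the two fuzzy cell maps into the navigation map can be written in one line. Using `min` as the conjunction and `1 - x` as the complement is one common choice and an assumption of this sketch; Oriolo et al. (1998) define the exact operators:

```python
import numpy as np

# Membership degrees of four cells in the fuzzy sets 'empty cell' and
# 'occupied cell' (toy 2x2 maps).
empty = np.array([[0.9, 0.8],
                  [0.2, 0.7]])
occupied = np.array([[0.1, 0.3],
                     [0.9, 0.2]])

# A cell is safe for navigation when it is simultaneously quite empty
# AND not very occupied: min acts as the fuzzy AND, 1-x as NOT.
safe = np.minimum(empty, 1.0 - occupied)
```

Here the cell with `empty = 0.2` and `occupied = 0.9` ends up with a very low safety degree, as expected.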
In this work, the virtual sensors that output the fusion of the ultrasonic and infrared readings are used instead, in order to obtain a more reliable measurement. Since the infrared sensor is dominant only for distances under 80 cm, its beam aperture is a fairly good approximation of the sonar's when projected onto the map cells, since these usually have a size of 10 to 20 cm. In this way the map model remains valid. Another important difference lies in the initialisation of the map. The original method follows a sense-map-plan-move approach and models the unexplored area as ambiguous or occupied. Since the approach of this work is different, the path-generation method requires the unmodelled area to be represented as empty, so that candidate trajectories can cross the unexplored area if necessary. Finally, the cells under the robot's position are considered unoccupied and empty: trivially, if the robot is at that position there is no obstacle underneath. This strategy produces very good results, not only in the quality of the map but also when generating trajectories. The most direct way to generate trajectories on a grid map is to consider each cell as a node of a graph and then apply A*. This algorithm (Hart et al., 1968) allows the use of heuristic information when available, resulting in an efficient search. This method can be applied starting from the safe-navigation map. There are approaches in the literature that solve this problem taking into account sensor noise, robot size, or minimum-length paths (Oriolo et al., 1998), but not simultaneously. In this work a solution has been chosen that finds minimum-length paths while producing trajectories that are safe and stay away from obstacles.
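A minimal grid A* of this kind, with a local dilation of the occupancy degrees and an exponential cost augmentation (both discussed below; the dilation radius, the exponent `k` and the max-filter formulation are illustrative assumptions), might look like:

```python
import heapq
import numpy as np

def plan(grid_occ, start, goal, dilate=1, k=4.0):
    """A* over a fuzzy occupancy grid: dilate the occupancy degrees with
    a local max filter, raise them exponentially so occupied cells become
    prohibitively expensive, then search with a Manhattan heuristic
    (admissible, since every cell costs at least exp(0) = 1)."""
    h, w = grid_occ.shape
    pad = np.pad(grid_occ, dilate, mode='edge')
    dil = np.zeros_like(grid_occ)
    for y in range(h):                       # local dilation (max filter)
        for x in range(w):
            dil[y, x] = pad[y:y + 2 * dilate + 1, x:x + 2 * dilate + 1].max()
    cost = np.exp(k * dil)                   # exponential augmentation
    openq = [(0.0, start)]
    g = {start: 0.0}; parent = {start: None}
    while openq:
        _, cur = heapq.heappop(openq)
        if cur == goal:                      # rebuild the path
            path = []
            while cur is not None:
                path.append(cur); cur = parent[cur]
            return path[::-1]
        y, x = cur
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w:
                ng = g[cur] + cost[ny, nx]
                if (ny, nx) not in g or ng < g[(ny, nx)]:
                    g[(ny, nx)] = ng; parent[(ny, nx)] = cur
                    f = ng + abs(goal[0] - ny) + abs(goal[1] - nx)
                    heapq.heappush(openq, (f, (ny, nx)))
    return None

occ = np.zeros((7, 7))
occ[[0, 1, 2, 6], 3] = 1.0     # a wall on column 3 with a 3-cell gap
path = plan(occ, (0, 0), (0, 6))
```

On this toy map the planner detours through the middle of the gap, `(4, 3)`, because dilation makes the cells adjacent to the wall expensive as well.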
The lower bound on the minimum cost in the A* evaluation function is computed in two steps: first the fuzzy map is locally dilated around the current cell, and then the cost function is increased exponentially (Figure 0.14). Figure 0.14: Evaluation function. The dilation step serves to fill small gaps in the map, so that tentative paths do not cross walls through small openings. A parameter allows the dilation diameter to be specified, so that robots of different sizes can be taken into account. The augmentation step exponentially expands the degree of certainty that cells are occupied, so that small values give similar results while large values give very different ones. The idea is to avoid using occupied cells for path generation. The dilation step is applied locally, to avoid applying the transformation to the whole map, because path generation typically involves small portions of the map and, moreover, dilation is a time-consuming process. Using this A*-based scheme, the navigation strategy uses the obtained path to establish a look-ahead point that is used for local navigation. This point is placed on the path at a certain distance from the robot, which is a function of the robot's average speed and the frequency of the control cycle. Several experiments with the navigation strategy have been carried out (Figure 0.15) (Figure 0.16), in both simulated and real environments, showing its effectiveness. Figure 0.15: Simulated navigation strategy. Figure 0.16: Navigation strategy in a real environment. The first navigation model assumed reliable odometry for short times and distances. This is clearly an important restriction in large environments or in operations requiring a long time.
To overcome this limitation, the previous method is complemented with a localisation method based on fuzzy segment maps. Global grid maps are not valid for localisation, while local grid maps require a precise compass reference to obtain reliable results (Duckett and Nehmzow, 1998). On the other hand, the previously described method, if provided with reliable localisation, produces safe paths quickly. Thus, the second navigation model is based on the first one, with the addition of a localisation technique that does not need an a priori model of the environment. The new model works as follows. The fuzzy cell map and the path generation run just as in the previous case, but instead of using the position obtained by odometry they use the position corrected by the localisation method. Simultaneously, a local map of fuzzy segments is updated every time an ultrasonic buffer is filled (approximately every 1200 ms). This map is compared (approximately every 2500 ms) with a global map of fuzzy segments to obtain the differences and compute the current odometry error, and subsequently to correct the position. Both maps are built without any prior knowledge of the environment. The fuzzy segment map method (Gasós and Martín, 1996b) (Gasós, 2000) is a geometric representation formed exclusively by line segments. By carefully handling the uncertainty in the robot's position and the sensor noise, the method tries to solve the problems typically associated with geometric models. In this case, the map largely reduces the lack of stability by decreasing the precision of the line fits (making them thicker), maintaining the uncertainty throughout the whole process.
The original model generates the line segments by considering each sensor individually over time. The sensor measurements are preprocessed to remove clearly detectable errors and to group correlated measurements. Finally these groups are approximated by a straight line to generate a fuzzy segment. This approach generates acceptable maps, but it lacks the ability to combine measurements from different sensors at the sensor level. This makes the map generation process slow, because a long time is needed to obtain good measurements that can be fitted by a line. A new segment-generation method has been developed to solve this problem. This method heuristically builds and maintains a circular buffer that stores the latest measurements from all the sensors. Every time a new set of measurements becomes available (one firing cycle of the sensors), those below a certain length are stored sequentially in the buffer, overwriting the oldest values. This buffer is used to extract the point clusters that are later fitted with a straight line to produce a fuzzy segment (Figure 0.17). This method heuristically obtains more consistent segments (Figure 0.19) than the original model (Figure 0.18), while requiring fewer movements. It has been tested in simulated and real environments with satisfactory results. Figure 0.17: Sensor buffer and segment generation. Figure 0.18: Original generation. Figure 0.19: Modified generation. For localisation (Gasós and Martín, 1996a), fuzzy segment maps obtained as above are used. While the robot navigates the environment to carry out a task, the sensor observations are used to build a partial map of fuzzy segments covering the robot's immediate surroundings.
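The circular-buffer scheme just described can be sketched as follows. The capacity, the range threshold, and the minimum number of points are illustrative; the clustering of buffered points into groups and the thickness/uncertainty handling of the fuzzy segments are omitted, so the fit below produces a crisp segment:

```python
import numpy as np
from collections import deque

class SegmentBuffer:
    """Circular buffer of short-range hits from all sensors, from which
    line segments are extracted by least-squares fitting."""
    def __init__(self, capacity=50, max_range=1.0):
        self.buf = deque(maxlen=capacity)   # oldest points are overwritten
        self.max_range = max_range

    def add_cycle(self, hits):
        """Store one firing cycle: hits is a list of (distance, (x, y))."""
        for dist, point in hits:
            if dist < self.max_range:       # keep only short, reliable hits
                self.buf.append(point)

    def fit_segment(self):
        """Fit a straight line y = a*x + b through the buffered points and
        return its endpoints, or None if there are too few points."""
        pts = np.array(self.buf)
        if len(pts) < 5:
            return None
        a, b = np.polyfit(pts[:, 0], pts[:, 1], 1)
        x0, x1 = pts[:, 0].min(), pts[:, 0].max()
        return (x0, a * x0 + b), (x1, a * x1 + b)

# Feed ten cycles of hits lying on the wall y = 2x + 1.
sb = SegmentBuffer()
for x in [i / 10 for i in range(10)]:
    sb.add_cycle([(0.5, (x, 2 * x + 1))])
seg = sb.fit_segment()
```

The recovered segment runs from (0, 1) to (0.9, 2.8), i.e. along the simulated wall.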
This partial map is compared with the global map built previously (which may have been updated an instant earlier). When the superposition of both maps produces enough evidence of a match, the robot's localisation is updated based on the differences between the maps. Localisation runs as a continuous process during the whole time the robot is moving. An important point in the localisation process is the correct matching between the local and the global map. If there are not enough perpendicular segments, the candidate match is not taken into account: if only parallel segments match, there cannot be enough certainty about the position along the direction of the parallels. Clearly, the larger the number of correct segments generated, the higher the chance of achieving successful localisation. This is the main benefit obtained from the segment-generation method proposed above. Several localisation tests have been carried out, both in simulation (Figure 0.20) and in a real environment (Figure 0.21), showing good results. Figure 0.20: Localisation in a simulated environment. Figure 0.21: Localisation in a real environment.

0.5 Control architecture

Once the Quaky-Ant platform has been endowed with basic abilities (sensor fusion and navigation), the next problem that arises is how to structure and organise the different elements of a control system. In this context an architecture, called BGA (see chapter 5), has been defined; it is based on a multi-agent model (Hendler, 1999) to combine reactive and deliberative agents. These agents play an important role as structural elements of the architecture, and they communicate by means of a blackboard-based scheme (Hayes-Roth, 1985).
These agents can be organised in different levels and distributed according to their functionality. Specifically, a division into levels similar to that of three-layer hybrid architectures (Gat, 1998) has been used. Since these agents can incorporate reactive behaviours, the fusion of their outputs is performed by a method based on context-dependent blending (Saffiotti, 1997b). The design of the architecture has been based on the following requirements:
• Integration of reactive modules (e.g. obstacle avoidance) and deliberative modules (e.g. path generation), so that each module is a semi-autonomous, independent entity in charge of a given task. This achieves encapsulation, isolation, and local control.
• No assumption is made about the knowledge held by each subsystem or about the resolution methods it implements. In addition, extensive use of fuzzy logic must be made.
• A flexible and dynamic organisation, allowing modules to be distributed depending on their processing needs, and allowing them to be enabled and disabled at run time.
• Suitability for the incorporation of learning techniques into the different elements of the architecture.
In the context of the BGA architecture, three basic computation units are defined:
• Behaviour: the simplest computational unit of BGA. Behaviours are concurrent processes whose execution time has a nearly constant bound. Behaviours serve a reactive purpose.
• Blender: when there are one or more behaviours, the blender is in charge of combining them, covering both behaviour fusion and command fusion. No blender is strictly required, but having one is recommended.
• Task: the most complex computational unit of BGA. Tasks are asynchronous processes whose execution time has no constant bound.
Tasks serve a deliberative purpose. Using these basic units, the BGA architecture (Figure 0.22) is composed of elements of the following types:
• Agent: an agent is a mechanism for grouping behaviours, blenders and tasks. An agent cannot contain behaviours and tasks simultaneously. This gives rise to two types of agents: reactive agents, which contain only behaviours, and deliberative agents, which contain only tasks. Agents run asynchronously and in a distributed fashion.
• Blackboard: the blackboard serves as a communication infrastructure linking the different agents of the system. All agents use a common protocol, based on KQML, to communicate with the blackboard. This communication can take place over a network or through shared memory.
• Sensor processing: the sensor processing units are in charge of accessing the different sensor signals to filter and process them, and then sending the information to the blackboard. Typically there is only one such module in the architecture.
• Actuator control: the actuator control units are in charge of accessing the blackboard to read motor commands and then sending them to the different actuators. Typically there is only one such module in the architecture.
Figure 0.22: The BGA architecture.
A key idea in BGA is that each reactive agent is decomposed into very simple behaviours that run concurrently and access the blackboard through input and output variables. A reactive agent in BGA (Figure 0.23) is composed of three types of modules that run sequentially:
• A common module that runs before any behaviour and builds or updates the agent's local perceptual space (LPS) (Saffiotti, 1997b).
The LPS is a structure that stores perceptual information, interpretations of these data, and representations of subgoals created by the planner.
• A set of behaviours that run concurrently. These are standard reactive modules that link inputs (from the sensors or the LPS) directly to actions.
• A blender module that specifies which behaviours are active and how their outputs are fused.
Figure 0.23: BGA reactive agent.
Since two behaviours may modify the same output variable, a behaviour coordination mechanism is needed. In this case, BGA blenders use a method based on context-dependent blending (Saffiotti, 1997b). In BGA the method works as follows. At development time, each behaviour is assigned a weight that remains fixed throughout the execution and determines how important that behaviour is (it is therefore used to weight the different outputs). Each behaviour, based on its sensory information (i.e. its input variables), produces a series of actions (i.e. its output variables) to be placed on the blackboard. Instead of working directly on the blackboard, each behaviour uses a local copy of its output variables. The blender implements a fuzzy meta-rule base, where the rule inputs are sensory or context information and the outputs represent the activation degrees of the different behaviours. A fuzzy inference mechanism finally obtains the activation degrees to be applied. These degrees define the arbitration scheme. When a behaviour has an activation degree of zero, it is deactivated; with any other degree, it is partially activated. Finally, the local variable copies are combined taking into account both the behaviour weights and their activation degrees.
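The final fusion step can be sketched as a weighted average of the behaviours' local copies, with each weight multiplied by the activation degree inferred by the blender's meta-rules. The weighted-average combination operator is an assumption of this sketch; the behaviour names and numbers are made up:

```python
def blend(commands, weights, activations):
    """Fuse the local copies of one shared output variable: each
    behaviour's proposal is weighted by its design-time weight times its
    current activation degree. Behaviours with activation 0 are thereby
    switched off; returns None when no behaviour is active."""
    num = den = 0.0
    for name, value in commands.items():
        w = weights[name] * activations[name]
        num += w * value
        den += w
    return num / den if den else None

# e.g. 'avoid' fully active near an obstacle, 'goto' partially inhibited
speed = blend({'avoid': 0.1, 'goto': 0.6},
              weights={'avoid': 1.0, 'goto': 0.8},
              activations={'avoid': 1.0, 'goto': 0.5})
```

With these values the fused speed lies between the two proposals but closer to the fully active `avoid` behaviour.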
The advantage of using these weights lies in their role when learning techniques are applied to obtain the fusion rule base (see chapter 7). In this case, the weights express the user's knowledge about the importance of each behaviour. Figure 0.24: BGA deliberative agent. A deliberative agent in BGA (Figure 0.24) is composed of two types of modules that run sequentially:
• A common module that runs before any task and builds or updates the agent's extended perceptual space (EPS). The EPS is a structure that stores global, high-level information, interpretations of these data, and representations of goals and subgoals.
• A set of tasks that run sequentially. These are standard deliberative modules that do not link inputs directly to actions, but instead model the world (through the use of the EPS) or reason about the world.
An important point that differentiates reactive agents from deliberative ones is that tasks have no explicit arbitration schemes. It is the designer's responsibility to avoid conflicts between the different tasks. This is not restrictive since, in case of conflict between tasks, a task that resolves the conflicts between the others can be implemented. In the BGA architecture the communication services offered by the blackboard are a very important point. Agents communicate by sending data to and receiving data from the blackboard. For this task BGA uses KQML (Labrou et al., 1999), an agent communication language. Typical KQML implementations use TCP, a robust protocol designed to provide reliable end-to-end byte streams over unreliable networks.
This reliability comes at the cost of a high latency compared with UDP, a connectionless transport protocol that provides unreliable end-to-end byte transmission. The BGA architecture is aimed at robotic applications where agents typically run on the same network and response times are critical. Since reducing latency reduces response time, the BGA communication service has been implemented over UDP. To ease program development within the BGA architecture, a programming language called BG has been defined, and its corresponding translator has been developed. The language fully supports all the elements of the architecture and lets the user develop BGA-based control systems in a way that is both friendly and powerful. In addition, behaviour modelling is supported through fuzzy logic. A BG program consists of the definitions of a series of BGA agents and their interfaces with the blackboard. A BG program is meant to run as a single process (which may contain several threads of execution). Thus, a fully distributed BGA architecture is composed of several BG programs, each running on a different processor. In its current version, the BG language includes many of the elements and control structures of classical imperative languages:
• Local and global variables.
• Expressions and assignments.
• Arithmetic, logical and relational operators.
• Conditionals.
• C-like functions.

BG supports two data types: real numbers and fuzzy sets with standard shapes. Additionally, BG includes advanced control structures:
• Deterministic finite automata.
• Fuzzy rule bases.
• Support for approximate reasoning.
• Support for robot navigation.
• Support for agents, behaviours, tasks and blenders.

BG supports only real numbers and fuzzy sets (Listing 5.9). One design criterion for BG was to keep it as simple as possible, supporting only a minimal instruction set without losing generality. In this sense, integers, booleans, etc. can be seen as particular cases of reals. Real variables can be accessed for both reading and writing, and are only available within the scope of a BG program. Modifiers can be used to access variables outside a program's scope (generally to access the blackboard). These modifiers restrict access to read-only or write-only mode so as to reduce the external access overhead. Three modifiers are supported: sensor, for read-only access; effector, for write-only access; and status, for read-write access. Five kinds of fuzzy sets are allowed: trapezoidal, triangular, crisp, sigmoid, and bell functions. In addition, a special data type allows TSK-style consequents to be defined. Real variables can be operated on with the standard arithmetic, logical and relational operators (Listing 5.10), and fuzzy variables with the t-norm, t-conorm, inverse, and membership-degree operators (Listing 5.11). The BG language, much like C, supports the classical control structures and assignments, as well as fuzzy rule bases and navigation support methods. These structures are called commands and can be grouped into blocks, which are lists of commands executed sequentially (Listing 5.12). Furthermore, BG supports user-defined functions (Listing 5.13), which are global to all modules defined within a BG program.
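Of the five fuzzy-set shapes mentioned above, the trapezoid is the most general of the piecewise-linear ones; the following is an illustrative sketch (the names and parameterisation are assumptions, not the BG runtime's actual API).

```java
// Sketch of a trapezoidal membership function, one of the five fuzzy-set
// shapes BG supports (trapezoidal, triangular, crisp, sigmoid, bell).
// Parameters a <= b <= c <= d delimit the feet (a, d) and the plateau [b, c];
// a triangular set is the special case b == c.
public class Trapezoid {
    public static double membership(double x, double a, double b, double c, double d) {
        if (x <= a || x >= d) return 0.0;
        if (x >= b && x <= c) return 1.0;
        if (x < b) return (x - a) / (b - a);   // rising edge
        return (d - x) / (d - c);              // falling edge
    }

    public static void main(String[] args) {
        // A hypothetical "near" distance set with plateau between 0.2 and 0.5 m
        System.out.println(membership(0.1, 0.0, 0.2, 0.5, 1.0)); // 0.5
        System.out.println(membership(0.3, 0.0, 0.2, 0.5, 1.0)); // 1.0
    }
}
```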
The implicit return type is real. Function parameters are also implicitly real numbers. If a function has no return command, it implicitly returns zero. Deterministic finite automata (Listing 5.14) allow the programmer to sequence tasks and actions, and to define the conditions for switching between tasks. The current version of the BG language has no support for loops (while, for, or repeat-until), although these can be modelled as automata. The main reason for not including loops is to keep the execution cycle of the different commands under control (so that an error cannot block it). In the future they will be incorporated as in COLBERT (Konolige, 1997), where they are ultimately decomposed into automata. Another use of automata is the implementation of simple planners that merely chain actions. Fuzzy rule bases (Listing 5.16) are intended both for implementing control mechanisms and for behaviour fusion, and are one of the most distinctive elements of the language. Fuzzy rules in BG may combine any fuzzy operators, variables and fuzzy sets in the antecedent, and any variables and fuzzy sets in the consequent (including TSK). In addition to standard fuzzy rules, BG includes background rules, whose antecedents always evaluate to the same membership degree, given by the rule's parameter. These rules are useful for covering undefined regions of the input space. This is achieved by specifying a very low firing degree, which dominates if no other rule fires and produces a negligible contribution if any does.
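The effect of a background rule can be seen in a deliberately simplified inference sketch (a hypothetical reduction of BG's inference to TSK-style constant consequents with weighted-average defuzzification; the names are invented):

```java
// Illustrative sketch of how a background rule covers undefined input
// regions. firing[i] is each ordinary rule's antecedent degree; the
// background rule always fires with the small constant degree bg and
// proposes a default output. If nothing else fires, the default wins;
// if anything does, the background contribution becomes negligible.
public class BackgroundRule {
    public static double infer(double[] firing, double[] output, double bg, double defaultOut) {
        double num = bg * defaultOut, den = bg;
        for (int i = 0; i < firing.length; i++) {
            num += firing[i] * output[i];
            den += firing[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Uncovered input: no ordinary rule fires, so the default (0.0) dominates.
        System.out.println(infer(new double[]{0.0, 0.0}, new double[]{5.0, -5.0}, 0.01, 0.0));
        // Covered input: the ordinary rule swamps the tiny background term.
        System.out.println(infer(new double[]{0.8, 0.0}, new double[]{5.0, -5.0}, 0.01, 0.0));
    }
}
```

Note how the output of the second call is close to, but not exactly, the firing rule's consequent: the background term perturbs it only slightly, as described above.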
Background rules play an important role when used together with a machine-learning system, since such systems tend to leave regions of the input space uncovered, depending on the examples. In the current implementation, different types of t-norms and t-conorms can be chosen, but they can only be changed globally before a BG program runs. Unless changed, the minimum t-norm and the maximum t-conorm are used. The BG language provides a structure and methods for managing two-dimensional spatial information for map building and path generation (Listing 5.18). Fuzzy grid maps are used as the map structure. The user can specify the map initialisation parameters: the horizontal and vertical size, the cell size, and the path-generation method. Two path-generation methods are supported: A* and direct. The latter computes the heading of the goal point with respect to the robot's current orientation without taking any obstacle into account. This method is useful for implementing fully reactive goal-seeking behaviours. BG was designed to serve as the development language of the BGA architecture, and thus supports the definition of the different BGA elements (Listing 5.20): agents, tasks, behaviours and blenders. The different processing blocks (behaviours, tasks and blenders) may contain local real variables, although local fuzzy sets are not supported. Behaviour definitions may include a weight, which is used in the BGA fusion process; if none is specified, the value 1 is assumed. When a BG program is used together with learning techniques, the variables, and their respective ranges, to be used in the blender block can be specified.
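The "direct" path-generation method described above reduces to a bearing computation; the following sketch shows the idea (names and conventions are assumptions: angles in radians, counter-clockwise positive).

```java
// Sketch of the "direct" path-generation method: the commanded heading
// is just the bearing of the goal from the robot's pose, ignoring all
// obstacles. Useful for fully reactive goal seeking, at the cost of the
// U-trap and loop problems discussed later for the reactive run.
public class DirectPath {
    // Relative turn needed to face (gx, gy) from pose (x, y, theta),
    // normalised to (-pi, pi].
    public static double headingToGoal(double x, double y, double theta, double gx, double gy) {
        double bearing = Math.atan2(gy - y, gx - x) - theta;
        while (bearing <= -Math.PI) bearing += 2 * Math.PI;
        while (bearing > Math.PI) bearing -= 2 * Math.PI;
        return bearing;
    }

    public static void main(String[] args) {
        // Robot at origin facing +x; goal on the front-left diagonal.
        System.out.println(headingToGoal(0, 0, 0, 1, 1)); // pi/4
    }
}
```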
In this way, the search space of the learning method can be limited (see chapter 7). A BG program (Listing 5.22) is simply a collection of agents and variables. Additionally, an initialisation block can be specified; it runs exactly once when the BG program starts and can perform any required initialisation. The BG language interpreter has been developed entirely in Java. The parser is based on an LALR parser generated from the grammar definition, much as with YACC (Johnson, 1975), but using Java instead of C. The BG interpreter consists of two modules: the BG parser and the BG executor. The BG parser takes as input a file containing a BG program and performs lexical and syntactic analysis. Since the BG language is simple, semantic analysis (basically just type checking) is carried out simultaneously with parsing. Once a BG program has been analysed, the module produces intermediate code for later execution. The BG executor takes that intermediate code as input without performing any checks, and interprets it (acting, in effect, as a transliterator). The intermediate code is stored as a directed graph, whose nodes correspond to the different elements of the BG language (variables, expressions, commands, behaviours, tasks and agents). In this way, the most time-consuming part of the interpretation process (lexical and semantic analysis) is decoupled from execution for efficiency reasons. Although the BG executor is also developed entirely in Java, itself an interpreted language, BG programs have run satisfactorily even on an i486 processor.
The BG language interpreter has been incorporated into a development environment called BGen, which also integrates a parameterised simulator of different holonomic robots, visualisation and data-logging tools, and a behaviour-learning tool. The BGen environment is intended not only for development but also for final deployment. During development, the user defines and specifies a BG program reflecting a particular BGA architecture, then simulates the robot's overall behaviour to check the suitability of the solution. Once the BG program is ready for deployment, it is transferred to the real robot, which also runs the BGen environment but uses an interface to the robot's low-level elements instead of the simulator. This is possible because the entire environment is developed in Java, which implies that the robot must include a Java virtual machine, available for most operating systems. The robots used in this thesis to test the BGen environment run Linux. This development model has several advantages:
• The development and final execution environments are identical.
• The time needed to go from the development environment to execution is minimal.
• Different platforms can reuse code, except for the very low-level functions.

Figure 0.25: Example BGA architecture.

Once an architecture (BGA), a language (BG) and a development environment (BGen) are available, the most natural next step is to use them in an application on the Quaky-Ant platform (see chapter 6). The robot's task is to navigate from a given initial position to a goal position and then return to the initial one, in a completely unknown environment similar to any indoor office setting (with different rooms and corridors).
The coordinates of the two positions are known in advance. Additionally, the robot must neither collide nor become stuck. To fulfil this goal the system has been decomposed, by extracting its most important elements (Armstrong et al., 1998), into the following BGA agents (Figure 0.25):
• Reactive Control. The BGA agent in charge of reactive navigation. It implements a series of behaviours to avoid collisions, follow a given path, align with a wall, or escape from U-shaped obstacles.
• Planning. The deliberative BGA agent in charge of high-level planning and goal supervision.
• Navigation. The deliberative BGA agent in charge of building a model of the environment, localising the robot within it, and generating a possible path to reach the goal.
• Sensor Processing and Actuator Control. The BGA agents in charge of processing sensory data and controlling the motors. The former reads the raw sensor signals, filters and processes them, and puts the relevant information on the blackboard; this is the agent where the sensor-fusion algorithms run (see chapter 3). The latter reads motor commands from the blackboard and sends the corresponding control actions to the motors.

Figure 0.26: Reactive Control agent.

The Reactive Control agent (Figure 0.26) is the most important and critical one, since it implements the basic reactive behaviours responsible for maintaining the robot's physical integrity by avoiding collisions. Almost all behaviours are defined through fuzzy rule bases, obtained from prior experience or from fuzzy-modelling techniques.
The following reactive behaviours have been identified and developed:
• Avoid-left. Avoids an obstacle to the robot's left.
• Avoid-front. Avoids an obstacle in front of the robot.
• Avoid-right. Avoids an obstacle to the robot's right.
• Align-left. Keeps the robot at a given distance from the left wall.
• Align-right. Keeps the robot at a given distance from the right wall.
• Move-to-goal. Drives the robot in the direction of the current goal.
• Escape. Turns the robot randomly to escape from a U-shaped wall.

Figure 0.27: Navigation agent.

Figure 0.28: Planning agent.

The Navigation agent (Figure 0.27) is in charge of building a map of the environment, computing a path to the goal position, and keeping the robot localised. It uses the navigation structure supported by BG. The map is continuously updated with the sensor measurements. The Planning agent (Figure 0.28) supervises the task being executed, changing the goal whenever a leg is completed and halting execution when the overall task is done. This simple planner is implemented with the automaton structure of the BG language. The different automaton states have associated goal coordinates, which the Navigation agent uses to produce a heading that the robot then follows until the state's goal is fulfilled. Different experiments were carried out to test the architecture, the language, the development environment, and the application described above as a whole. The first (Figure 0.30) took place in a 5.2 x 4.9 m environment (Figure 0.29), with doors that are narrow and only slightly wider than the robot's diameter. It was run in simulation with the Quaky-Ant platform.
The results show that the robot is able to accomplish the task, describing a smooth trajectory while avoiding obstacles.

Figure 0.29: Environment of the first experiment.

Figure 0.30: Results of the first experiment.

The second experiment (Figure 0.32) was carried out with a real Nomad 200 robot in a laboratory (Figure 0.31) at Örebro University, Sweden, in a 4.2 x 7.5 m environment and without odometry correction. It is worth noting that, besides accomplishing the task, this experiment was repeated several times with different start and end points, and even with human obstacles at low speed. Once the behaviours and fusion rules (originally developed for the Quaky-Ant robot) were properly tuned, the robot never collided and always found the route to the goal point, provided one existed.

Figure 0.31: Environment of the second experiment.

Figure 0.32: Results of the second experiment.

The third experiment compares the robot's behaviour under A*-based and direct path generation. In the first case generation is a deliberative process, while in the second it is purely reactive. The deliberative run (Figure 0.33) yields a result similar to the first experiment: the reactive behaviours help the robot with obstacle avoidance, while the path-generation module produces safe paths. The reactive run (Figure 0.34) unfolds differently: while the reactive behaviours still help the robot avoid obstacles, the path-generation module produces paths straight to the goal without taking obstacles into account.
In this case the robot completes the task, but the trajectory it describes depends on the configuration of the environment. Note that with this reactive approach the robot can become trapped in U-shaped obstacles or in infinite loops.

Figure 0.33: Third experiment, deliberative run.

Figure 0.34: Third experiment, reactive run.

The fourth experiment (Figure 0.37) was carried out with a real Nomad 200 robot in a laboratory (Figure 0.35) at Örebro University, Sweden, in a 4.2 x 7.5 m environment, this time with odometry correction. The setup is similar to the second experiment, but involves longer legs and a much longer task execution time. The robot starts at an arbitrary point and tries to build a map of the environment by moving towards unexplored areas. There are two notable differences with respect to the previous experiments:
• In addition to the fuzzy grid map, used for path generation, the robot also builds a fuzzy segment map, used for localisation (see chapter 2).
• The Planning agent is similar to the previous ones, but instead of receiving the goal coordinates from a human operator (or as a parameter of a BG program), it receives them from a new agent called Topology, which incorporates a planner based on a topological map.

Figure 0.35: Environment of the fourth experiment.

The topological planner builds and updates a global topological map (Duckett and Nehmzov, 1998) using the relaxation technique (Duckett et al., 2000). The topological map (Figure 0.36) keeps track of the places that have been visited, as well as the candidate unoccupied places. The planner selects one of these places and sends its coordinates to the Planning agent as the new goal.
The experiment shows that the robot is able to build a map of the environment while keeping itself localised. Note that odometry errors become very significant after a few minutes.

Figure 0.36: Topological map.

Figure 0.37: Results of the fourth experiment.

From the experiments carried out, a number of conclusions can be drawn about the BGen environment and the underlying language and architecture. BG programs are simple and easy to understand. Without knowing in advance what a BG program does, it is easy to work out its purpose, although this of course also depends on how the program was written. In any case, even though the tasks involved in achieving the robot's goal are complicated, the BG language hides the details and lets the programmer concentrate on the application domain. Specific navigation techniques are accessible from the language, and new methods can even be added. This favours rapid development of robot programs as well as their future maintenance. The BGen environment is open and portable. While BGen was developed at the University of Murcia in parallel with the Quaky-Ant robot, some of the experiments show results with a completely different one. Interfacing BGen with the new system, and modifying the BG program to account for the new robot's geometry and speed, took less than two weeks, while the author was at Örebro University, Sweden. Remarkably, at that point the environment had not yet been fully tested on a real robot. These facts support the flexibility of the environment and the ease of porting it to other systems. Java-based robotic environments are practical and usable.
It is commonly argued that Java programs are extremely slow. While it is true that just-in-time compiled Java programs are slightly slower than their natively compiled equivalents, Java can be used in most robotic applications. Moreover, the BG interpreter runs on top of the Java virtual machine, which means BG programs are interpreted twice. This is not a serious problem, since most of the processing-intensive techniques (map building, path generation, and localisation) run directly as Java libraries. The added value of the BGen environment is, again, its portability and code reuse at two levels: at the system level, by reusing the complete BGen libraries, and at the application level, by reusing BG programs across different platforms. The BGen environment is extensible. One of the experiments shows that new techniques or modules can be added to the BGA architecture. In that case the topological planner is not a Java program but a C/C++ program that communicates through the Java Native Interface. In this way, already developed services or applications can be added to the environment.

0.6 Learning behaviour fusion

A key issue in behaviour-based control is how to efficiently coordinate conflicts and competition among different kinds of behaviours in order to achieve good performance (see chapter 7). The BGA architecture's solution, based on a fuzzy rule system, has already been presented. Applying learning techniques to the fusion rule base can improve the robot's performance.
One advantage of this approach is that the robot can be trained with only a few behaviours; if new ones are later added, only the fusion rule base needs to be retrained, preserving the previous behaviours. The learning process proposed in this thesis works as follows. The user selects the BGA agent whose fusion rule base is to be learned. An interesting and practical approach is to start with a predefined set of fuzzy rules used as input to the learning process (an empty rule set can also be passed). Learning uses a genetic algorithm whose individuals are fuzzy rule bases (Pittsburgh encoding), and the outcome of the learning process is a fusion rule base. The algorithm is based on a modification of one used for fuzzy modelling (Gómez Skarmeta and Jiménez, 1997). To simplify the encoding, the algorithm imposes and assumes some restrictions: fuzzy sets must be trapezoidal, and rules may only use conjunctions. While the first restriction may affect the performance of the results, the second imposes no real limitation: any fuzzy rule set with disjunctions and negations can be reduced to one with only conjunctions, simply by changing the number of rules and complementing some sets. Trapezoidal sets are commonly used in many control applications, and the experiments show that they are adequate for this problem. The original encoding has been modified to include background rules. Apart from the rule encoding and some parameters derived from it, a key point is the definition of the fitness function. This depends strongly on the application context and is generally not simple.
The function must measure how good each individual is, which, in effect, reflects how it affects the robot's performance. Thus, the function must take into account how the robot executes a predefined task. The user must define a simulation setup with at least the following parameters: the robot's start and end positions and what it must do, an environment with walls, doors, corridors and obstacles, and the number of simulations each individual will run. To avoid infinite loops in the simulations, two more parameters are needed: the maximum time available to solve the task, and the maximum number of collisions allowed. Evaluating the fitness function for an individual involves using its fuzzy rule base as the fusion rule base of a BG program, and simulating the robot with that program. Each generation thus involves as many BG programs as there are individuals. When a simulation finishes, it returns the following parameters:
• Whether the goal was achieved or not.
• The total number of simulation steps taken.
• The number of collisions between the robot and the environment.
• The last leg completed by the robot.
• The percentage of the distance remaining for the robot to finish the current leg.

The fitness function defined from these parameters tries to capture the different aspects relevant to good robot performance: rewarding task completion, rewarding low execution times, and penalising collisions with walls. Additionally, the function must measure a robot's performance even if it did not achieve the goal; to that end it also considers the number of legs completed and the degree of completion of the last unfinished leg. Otherwise, evolution could not distinguish among individuals that are not good enough to complete the goal and yet may be good candidates for further evolution.
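A fitness function with exactly these ingredients might look as follows. The weights and the particular combination are invented for illustration; the thesis does not commit to this formula.

```java
// Hedged sketch of a fitness function built from the simulation outputs
// listed above: reward completion, reward short execution, penalise
// collisions, and grade partial progress through legs so that evolution
// can rank individuals that never reach the goal.
public class Fitness {
    public static double fitness(boolean goalReached, int steps, int maxSteps,
                                 int collisions, int legsDone, int totalLegs,
                                 double legRemaining /* fraction 0..1 of current leg */) {
        double f = 0.0;
        if (goalReached) {
            f += 100.0;                                    // completing the task
            f += 50.0 * (1.0 - (double) steps / maxSteps); // faster is better
        } else {
            // partial credit: completed legs plus progress on the current one
            double progress = (legsDone + (1.0 - legRemaining)) / totalLegs;
            f += 60.0 * progress;
        }
        f -= 5.0 * collisions;                             // penalise collisions
        return f;
    }

    public static void main(String[] args) {
        System.out.println(fitness(true, 400, 1000, 0, 3, 3, 0.0));   // finished, fast, clean
        System.out.println(fitness(false, 1000, 1000, 2, 1, 3, 0.5)); // timed out halfway
    }
}
```

The key design point, mirrored from the text, is the `else` branch: without it, all failing individuals would score identically and early generations would provide no gradient for selection.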
This is very important, especially at the beginning of the evolutionary process. Different experiments have been carried out to assess the performance of the genetic algorithm on the behaviour-fusion problem. In all cases the learning process starts from a given BG program, the one described in the BGA architecture examples. The BG program specifies the number of input and output variables, which correspond to the input variables of the fusion block and to the reactive behaviours, respectively. Because simulation is very time-consuming, the scenarios used (Figure 0.38) are very small, although they present three classic reactive problems: the goal is occluded by a symmetric obstacle, the robot must turn left with the goal on its right, and the robot must turn right with the goal on its left. The mutation and crossover probabilities were chosen slightly higher than in typical genetic systems. The reason is related to the time needed to run a simulation: with low probabilities, the exploration time (the time needed to explore different parts of the solution space) is longer; with high probabilities, exploration time is shorter, at the cost of a poorer local search. In this application the global search capability is considered more important, so high probabilities are used.

Figure 0.38: Learning environments.

The definition of behaviours in BG and their corresponding BGA model allow weights to be used to indicate the importance of each behaviour. In this learning context the weights play an important role, since they are information that is not evolved and is kept static throughout the learning process.
Moreover, this information is provided by the user, who acts as the domain expert. To assess the effect of these weights on the learning process, two experiments were performed, one of them with all weights set to the same value. In these cases, learning with weights was found to produce better results than learning without them and, furthermore, better results than the initial rule set.

0.7 A web-based teleoperation application

In recent years, web-based robotic applications (Taylor and Dalton, 2000) have grown to the point that most major laboratories offer applications to access their robots over the Internet, and several technologies exist to allow a robot to be teleoperated over a network. BGen-Web is a web-based robotic exploration application (see chapter 8) that serves as a proof of concept. The aim of this application is to provide a web-based service for highly interactive supervisory control (Sheridan, 1992) of a mobile robot, with the goal of exploring a little-known or completely unknown environment. The objectives of the BGen-Web application can thus be summarised as:
• The application must work in real environments, both indoors and outdoors. In the latter case, since no all-terrain vehicle is available, only paved paths are considered. The intended uses are exploration and surveillance tasks.
• The system must be built on the BGA architecture, thus making use of previously developed resources.
• The user experience must be as interactive as possible, limited only by bandwidth requirements.
• The user's degree of control must depend on a profile, ranging from closed-loop control to supervisory control by the operator.
As can be seen in examples available in the literature (Taylor and Dalton, 2000a) (Saucy and Mondada, 2000) (Simmons et al., 2000), HTML-based interfaces are not sufficient for highly interactive systems because they demand too much server-side processing. As another example shows (Tso et al., 1998), the way to solve this problem is to increase processing on the client side, and here Java technology provides a multi-platform framework for building high-performance interfaces. Since the BGA architecture and its associated BGen environment and BG language were developed in Java, it is straightforward to incorporate the previous developments into a web-based architecture. Although the BGen-Web application does not depend on a specific platform, the Quaky-Ant robot was used. Among other elements (see chapter 2), it features a set of ultrasonic and infrared sensors, a video camera, a radio modem for bidirectional data transmission, and a 2.4 GHz video transmitter. The BGen-Web architecture (Figure 0.39) uses HTML as the distribution mechanism for the software and configuration files, and to run the Java applet. The clients provide interfaces for sensory data, visual feedback, and robot control. The web server provides the software repository and the gateway for the BGA architecture protocol (KQML). The video server provides real-time video; both servers can physically run on the same machine. The BGA gateway acts as a proxy service: it receives requests from the different clients and deposits them on the blackboard, and it reads sensory information from the blackboard and sends it to the clients. Real-time video is provided by the Java Media Framework (JMF), an API that lets applications and applets incorporate time-based media.
It thus allows programmers to develop software that presents, captures, and stores such media, and also to control the kind of processing applied to those streams. Finally, JMF offers RTP support (RFC 1889) for use in on-demand video or audio applications and in interactive videoconferencing applications. The media type used for BGen-Web transmission is the ITU H.261 recommendation (Tekalp, 1995). In the current implementation of BGen-Web, the video server sends QCIF H.261 images at 7.5 frames/s, which provides sufficient video quality over the Internet.

Figure 0.39: BGen-Web architecture.

The operation of BGen-Web is very simple. When the applet (Figure 0.40) starts, it connects both to the BGA proxy (through a unicast address) and to the video server (through a multicast address), and begins receiving sensor data and real-time video feedback. The applet contains components that display the sensor readings (ultrasonic, infrared, virtual sensors, bumpers, compass, odometry, and GPS), the video, and the current fuzzy grid map, and that allow the user to steer the robot. Initially the map is empty, and it is updated as the robot moves. The operator, in supervisory control, can choose any desired goal point by clicking on the map. The robot executes a BG program that tries to reach the current goal; when it does, it stops and waits. The operator can select a new goal at any time, and can thus explore unknown environments. For data-acquisition purposes, the operator can also drive the robot by closing the control loop through the speed and steering scroll bars. This option is available only to those operators whose host identification is stored in the proxy.
When a speed or steering command reaches the proxy, the proxy compares the origin address and, if it is not among those allowed, the command is not forwarded to the robot.

Figure 0.40: Applet of the BGen-Web application.

The BGen-Web system has been tested successfully in different places and with different configurations (Figure 0.41). The application is intended as a proof of concept, and the service is not available at all times: it is only activated for exhibitions and tests, since the equipment is shared with other experiments.

Figure 0.41: Sequence of a teleoperation with BGen-Web.

0.8 Conclusions

This thesis presents a series of works related to mobile robotics, covering the different layers needed to make a mobile robot carry out tasks autonomously in an a priori unknown environment. Each layer provides a set of capabilities that are used by the layers above it. These layers are the physical platform, the sensor fusion layer, the navigation layer, the control architecture and language layer, and the application layer. The results and contributions achieved in each of them are summarised below.

Quaky-Ant is a simple but powerful design with a flexible and modular architecture. Compared with commercial robots, Quaky-Ant fares better in aspects such as cost, performance, and extensibility. The main features of this robot can be summarised as follows:
• It presents an open architecture. Two key points are the use of the standard I2C bus for interconnecting the different modules and the definition of an open generic protocol.
• A consequence of the architecture design is its robustness (mainly due to the custom protocols) and its extensibility (all modules are treated in the same way, regardless of their nature).
• The architecture is not tied to any control architecture or development system, although it serves as the reference point for the control architecture developed here. It operates smoothly and correctly in indoor environments and can also operate in non-rugged outdoor environments. In any case, it does not exhibit an outstanding top speed (approximately 0.45 m/s), because speed was not a design concern; instead, it offers manoeuvrability and precision.

Two different methods have been developed to solve the sensor fusion problem in a priori unknown environments. In this setting, model-based techniques that rely on a priori information are not applicable, so neural networks and fuzzy rule-based systems have been used instead. These fusion techniques require examples for the learning stage, and obtaining such examples from the real sensors is an extremely complex problem. The approach followed has been to use examples obtained from a simulator. For this reason, sonar and infrared sensor models have been evaluated and integrated into the simulation environment, and were also used to validate the sensor fusion. The sonar model is based on a sonar-tracing technique, modified to take the sonar lobe effects into account, and the infrared sensor model was obtained by linearising experimental values. Once the neural network and fuzzy rule methods had been trained, they were tested on different simulated environments, from small to large. The fusion methods were then compared with the raw sensor data and with a simple system based on heuristic rules. The results show that, in most cases, the proposed fusion methods perform better than both the raw sensor data and the heuristic rule system.
Moreover, among all the fusion methods tested, the fuzzy rule-based system obtained using fuzzy tabu clustering shows the best overall performance. Fuzzy tabu clustering is an algorithm based on fuzzy c-means, with the addition of a global-minimum search by means of a tabu search technique. The algorithm has also been applied successfully to different examples, showing results comparable to the best-known fuzzy clustering algorithms available in the literature.

Two navigation schemes have been developed to allow the Quaky-Ant robot to navigate in an a priori unknown environment. The first procedure relies on the assumption that odometry is reliable over short time intervals and short distances. Under this assumption, the robot is provided with a map-building method and a path-planning algorithm that run simultaneously, so that the robot builds a map as it moves, resulting in continuous motion. This approach is based on a fuzzy grid map for map building and an A*-based algorithm for path planning. The fuzzy grid map was modified to meet the required task: virtual sensors are used instead of the sonars, the initialisation procedure was changed, and some heuristics were incorporated to modify the certainty of visited cells. The A*-based planner was defined with a new evaluation function, both to generate paths that leave enough room between the robot and the obstacles and to be time-efficient, so that node evaluation is fast enough to run the planner at a high frequency. This navigation scheme has been used successfully in different scenarios, both simulated and real.
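The idea of an A* evaluation function that keeps paths away from obstacles can be sketched in Java as follows. The grid representation, the neighbour-counting clearance penalty, and the weights are illustrative assumptions; the thesis' actual evaluation function is not reproduced here:

```java
import java.util.*;

/** Hypothetical sketch of grid A* whose cost adds a clearance
 *  penalty so paths keep their distance from obstacles. The grid,
 *  weights, and penalty shape are illustrative, not the thesis'
 *  actual evaluation function. */
public class ClearanceAStar {
    static final int[][] DIRS = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

    /** Returns the path length in steps, or -1 if unreachable. */
    public static int plan(boolean[][] occ, int sx, int sy, int gx, int gy, double penaltyWeight) {
        int h = occ.length, w = occ[0].length;
        double[][] best = new double[h][w];
        for (double[] row : best) Arrays.fill(row, Double.POSITIVE_INFINITY);
        int[][] steps = new int[h][w];
        // open list ordered by f = g (with penalties) + heuristic
        PriorityQueue<double[]> open = new PriorityQueue<>(Comparator.comparingDouble((double[] n) -> n[0]));
        best[sy][sx] = 0;
        open.add(new double[]{heur(sx, sy, gx, gy), sx, sy});
        while (!open.isEmpty()) {
            double[] n = open.poll();
            int x = (int) n[1], y = (int) n[2];
            if (x == gx && y == gy) return steps[y][x];
            for (int[] d : DIRS) {
                int nx = x + d[0], ny = y + d[1];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h || occ[ny][nx]) continue;
                // unit step cost plus a penalty that grows near obstacles
                double g = best[y][x] + 1 + penaltyWeight * proximity(occ, nx, ny);
                if (g < best[ny][nx]) {
                    best[ny][nx] = g;
                    steps[ny][nx] = steps[y][x] + 1;
                    open.add(new double[]{g + heur(nx, ny, gx, gy), nx, ny});
                }
            }
        }
        return -1;
    }

    static double heur(int x, int y, int gx, int gy) {
        return Math.abs(x - gx) + Math.abs(y - gy);   // Manhattan distance, admissible
    }

    /** Fraction of 8-neighbours that are occupied: a crude clearance measure. */
    static double proximity(boolean[][] occ, int x, int y) {
        int h = occ.length, w = occ[0].length, count = 0;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx, ny = y + dy;
                if ((dx != 0 || dy != 0) && nx >= 0 && ny >= 0 && nx < w && ny < h && occ[ny][nx]) count++;
            }
        return count / 8.0;
    }

    public static void main(String[] args) {
        boolean[][] occ = new boolean[5][5];
        occ[2][2] = true;  // a single obstacle in the middle
        System.out.println(plan(occ, 0, 0, 4, 4, 2.0)); // prints 8
    }
}
```

Because the penalty is folded into the path cost `g` while the heuristic stays admissible, the planner trades path length against clearance; with `penaltyWeight` set to zero it degenerates into plain shortest-path A*.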
The BGA architecture draws on several well-known, well-developed concepts. It is based on a blackboard architecture that coordinates a multi-agent system. Agents can be divided into levels and distributed according to their functionality, accommodating three-layer hybrid architecture schemes, where behaviour coordination is achieved by a fusion method derived from context-dependent blending. The main contributions of the BGA architecture are the following:
• BGA effectively brings together different control architecture concepts and paradigms (namely blackboard architectures, hybrid architectures, and multi-agent systems), extending and combining their features.
• BGA incorporates a new fusion mechanism, based on context-dependent blending, that effectively tackles the behaviour coordination problem. Moreover, its design allows expert knowledge to be used while learning the fusion rule base.
• BGA uses KQML, typical of message-passing systems but unusual in blackboard systems, implemented over UDP to improve performance.

The BG language is a high-level robot programming language that fully supports the BGA architecture and thus lets the user develop BGA-based control systems in a way that is both simple and powerful. BG assists the robotics application development cycle by providing a single, uniform environment for declaring and specifying all the elements of the BGA architecture (behaviours, tasks, blenders, and agents), supporting behaviour modelling through the extensive use of fuzzy logic. In addition, a fully distributed BGA architecture can be obtained by running BG programs on different processors.
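The KQML-over-UDP transport mentioned in the last contribution can be sketched as follows. The performative fields, agent names, and port are made up for the example and are not BGA's actual wire format:

```java
import java.net.*;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of sending a KQML-style performative over UDP.
 *  The performative fields and the addressing are assumptions for
 *  this example, not the protocol actually used by BGA. */
public class KqmlUdpSketch {
    /** Builds a KQML-style message as a parenthesised keyword list. */
    public static String performative(String verb, String sender, String receiver, String content) {
        return "(" + verb
             + " :sender " + sender
             + " :receiver " + receiver
             + " :language BG"
             + " :content \"" + content + "\")";
    }

    /** One datagram per performative: no connection setup, low latency. */
    public static void send(String host, int port, String msg) throws Exception {
        byte[] data = msg.getBytes(StandardCharsets.US_ASCII);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) {
        System.out.println(performative("tell", "planner", "blackboard", "goal reached"));
    }
}
```

Using UDP rather than TCP avoids per-message connection overhead, which matches the performance motivation given above, at the cost of delegating loss handling to the architecture.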
The BGen development environment incorporates a BG language interpreter, as well as a simulator for several suitable holonomic robots, visualisation and data-logging tools, and a behaviour learning tool. The BGen environment is intended both for development and for deployment: it provides a generic, simple way to create and test BG programs off the robot, and to run those programs on a real robot. Development and deployment times are thereby reduced considerably, and code can be reused across different platforms. A series of experiments has shown that BGen, which includes the BG language and the BGA architecture, provides a good working environment for mobile robotics problems.

A method based on genetic algorithms has been developed to assist in creating the fuzzy rule bases for behaviour fusion. The algorithm has been used to obtain and learn a new set of fusion rules from an initial rule set supplied by the user (which may be empty). The genetic algorithm has been adapted to encode individuals that are converted into BG fuzzy fusion rule bases; these individuals can therefore be standard MIMO fuzzy rules as well as background fuzzy rules. Furthermore, the fitness function has been defined to include the effects of experimentally evaluating the robot's behaviour in simulation. This function takes into account different parameters that measure the robot's performance in both successful and unsuccessful runs. This aspect is crucial in the early part of the learning process, because most individuals do not reach the goal, and discriminating among them is very important for speeding up learning.
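The role the fitness function plays in discriminating among unsuccessful runs can be illustrated with a hypothetical sketch. The terms and weights below are invented for the example; the thesis' actual fitness definition uses its own performance parameters:

```java
/** Hypothetical fitness in the spirit described above: candidates
 *  that do not reach the goal are still ranked by how far they got,
 *  so early generations can be discriminated. All terms and weights
 *  are illustrative, not the thesis' actual fitness definition. */
public class FusionFitness {
    /**
     * @param reachedGoal  whether the simulated run reached the goal
     * @param distanceLeft remaining distance to the goal at the end of the run
     * @param collisions   number of collisions during the run
     * @param timeUsed     fraction of the allotted time consumed (0..1)
     */
    public static double evaluate(boolean reachedGoal, double distanceLeft,
                                  int collisions, double timeUsed) {
        double f = reachedGoal
            ? 1000.0 - 200.0 * timeUsed        // reward fast successful runs
            : 500.0 / (1.0 + distanceLeft);    // unsuccessful runs ranked by progress
        return f - 50.0 * collisions;          // always penalise collisions
    }

    public static void main(String[] args) {
        double success  = evaluate(true, 0.0, 0, 0.5);
        double nearMiss = evaluate(false, 1.0, 0, 1.0);
        double farMiss  = evaluate(false, 10.0, 0, 1.0);
        System.out.println(success > nearMiss && nearMiss > farMiss); // prints true
    }
}
```

The key design point is that the unsuccessful branch is still a smooth function of progress, so a population in which no individual reaches the goal still yields a usable selection gradient.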
Several experiments have been carried out to test the influence of the behaviour weights and the results of the learning process. As shown, the behaviour weights can help the learning process reduce the search space by exploiting the user's domain knowledge. Moreover, the learning process improves the robot's performance with respect to the user-defined fusion rules.

BGen-Web is a web-based robotic exploration application created as a proof of concept. The system provides a web-based service for highly interactive supervision of a mobile robot, with the aim of exploring a poorly known or completely unknown environment, and as such it has been tested with satisfactory results in real environments, both indoors and outdoors. The system is built on top of the BGA architecture in order to use all the previously developed software resources. Using the sensor data and the real-time video feedback, the operator's experience is highly interactive when controlling a robot over the Internet (limited only by bandwidth requirements). This is one aspect that distinguishes it from other web-based robotics applications. Another important aspect of BGen-Web is task granularity. Some systems provide control interfaces at the lowest level (for example, direct motor control), while others offer command control at a higher level (for example, placing the robot at a previously known location). BGen-Web provides commands at an intermediate level: the user chooses the desired position and the robot tries to reach it.
BGen-Web therefore shows levels of interactivity beyond those of other systems, which give the operator much less information and are not perceived as interactive. Moreover, BGen-Web operates in real-world environments, as WITS and Xavier do.

0.9 Future work

This thesis presents solutions in different areas for achieving a distributed intelligent control architecture for autonomous mobile robots. It opens up new lines and areas of work for improving the different components.

The sensor fusion techniques have shown good performance, especially the hybrid neuro-fuzzy approach, which, from a qualitative point of view, produces good results in the real world. The experimental setup needed to extend these tests to include real data and obtain quantitative results is very difficult, and acquiring real data is genuinely complex and has not yet been done. Doing so would also improve the results.

The navigation problem has been solved satisfactorily for most situations, but some aspects still need improvement. The fuzzy grid map assumes perfect odometry, and cells are updated on the basis of the current location. If this map is used together with a fuzzy segment map, the quality of the results depends strongly on the location update frequency. If tracking of the current environment is lost for any reason, initial localisation could fail in highly symmetric environments, such as those found in buildings with corridors. In that case a geometric matching technique is not enough, and this remains an open problem.

The BGA architecture is flexible enough to tackle different problems and tasks, such as those presented in this thesis.
Although the architecture is not restricted to a single robot, some additional structures are needed before BGA can be used in a multi-robot scenario. One possible improvement is the inclusion of a message-passing facility for communication between different robots, while keeping the current blackboard model for communication within a robot. In this context, the Java-based JINI network technology is an interesting framework for establishing communication between the different agents of the architecture: it provides a simple mechanism that allows agents to join together into a community without any planning, installation, or human intervention. Additionally, the BG language should be modified to incorporate the new control structures needed for communicating with other robots.

The BG language was originally designed to use fuzzy grid maps and A* path planning. New methods and techniques have been added during the course of this thesis, but they are not visible in the BG language. A clear improvement would be to use generic navigation structures (map building, path planning, and localisation) that would be instantiated at run time using a configuration file. In this way, the control program would simply send update commands to these structures and receive the corresponding feedback through a generic interface. From the point of view of readability, the language could also be extended to support more data types (booleans, integers, data structures).
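The generic, configuration-driven navigation structures proposed above could look roughly as follows. The interface, the method names, and the reflection-based instantiation are illustrative assumptions, not part of the existing BG language:

```java
/** Sketch of the generic navigation structures suggested above: a
 *  common interface that concrete map builders or planners would
 *  implement, instantiated by name at run time (e.g. from a
 *  configuration file). All names here are illustrative. */
public class NavFactorySketch {
    public interface Navigator {
        void update(double x, double y, double[] readings); // feed pose + sensor data
        double[] feedback();                                // e.g. next heading and speed
    }

    /** Trivial stand-in implementation used for the demonstration. */
    public static class StraightToGoal implements Navigator {
        public void update(double x, double y, double[] readings) { }
        public double[] feedback() { return new double[]{0.0, 0.3}; }
    }

    /** Reflection lets a configuration file choose the class by name. */
    public static Navigator create(String className) throws Exception {
        return (Navigator) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Navigator nav = create("NavFactorySketch$StraightToGoal");
        nav.update(0, 0, new double[]{1.0});
        System.out.println(nav.feedback()[1]); // prints 0.3
    }
}
```

With such an interface, swapping the fuzzy grid map for another map builder would be a one-line change in the configuration file, with no change to the control program itself.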
From the point of view of performance, the current BG interpreter is very flexible and fast, but compiling BG programs to Java would bring a considerable increase in execution speed, at the expense of more complex program preparation (on-the-fly program modifications would not be supported). Currently, the BGen architecture supports the Quaky-Ant and Nomad 200 robots, and it is being updated to work with other commercial robots, such as the RWI B21.

For behaviour fusion, the genetic algorithm approach has been shown to produce good results. Its drawback lies in the time required to obtain a good solution. The problem is twofold. First, several scenarios are needed to rule out random search and to test different skills. If there are many scenarios, or their dimensions are large, the algorithm spends a long time simulating the robot. In this case, a distributed or parallel version of the algorithm could notably improve the time needed to obtain a solution. In addition, more termination conditions should be incorporated to avoid wasting time on clearly useless solutions.

The web-based robotics application BGen-Web has proved effective. It offers a high level of interactivity, mainly thanks to the video feedback, and has many interesting real-world applications, such as semi-autonomous exploration. Normally, only a few Internet Service Providers (ISPs) provide or allow routing of multicast packets, which clearly limits the number of potential users. One improvement would be to add a video-on-demand server so that non-multicast users could receive the video signal, at the expense of a heavier load on the network. Another problem is security.
Currently, access control is based on the IP address, and no encryption is performed. One possible improvement would be to use a Java implementation of SSL (Alan et al., 1996) to provide certificate-based encryption and authentication. Finally, if JINI technology is incorporated into the BGA architecture, the BGen-Web application would benefit from several value-added services, such as automatic discovery of BGen-Web servers.

The BGen environment could also incorporate new technologies to support distributed simulation. First, the current user interface includes only a two-dimensional representation of the environment. BGen could benefit from web-based three-dimensional visualisation technologies such as VRML (Roehl et al., 1997) or Java3D (Brown and Petersen, 1999), which allow a web browser to display three-dimensional content generated from a Java application. Most browsers support VRML, whereas only those running on Windows and Solaris currently support Java3D; this may change in the near future, though, since Sun is developing Java3D on top of OpenGL. Secondly, using the IEEE 1278.1 standard, the Distributed Interactive Simulation (DIS) protocol, could allow a more elaborate and organised simulation environment within a distributed framework. In this case, DIS could be a cornerstone for integrating non-monolithic simulation models for multiple robots. Moreover, Java implementations of DIS are available.

PART II

Abstract

This thesis presents a series of mobile robotics related works, which cover the different layers needed to have a mobile robot perform tasks autonomously in an a priori unknown environment.
Soft-computing techniques (neural networks, genetic algorithms, and especially fuzzy logic) are used in the development of the different layers. The hardware architecture that leads to the Quaky-Ant platform is described first. This mobile robot is the basis of most of the tests of the different algorithms in real environments. A solution to the sensor fusion problem, based on this platform, is then described. The fusion is first solved using a simple neural network, and then by a hybrid system based on the ANFIS network and several clustering methods. One of them, fuzzy tabu clustering, has been developed in the context of this thesis. Once a reliable sensor system is assumed (after the sensor fusion process), the navigation problem is outlined. This is solved, in two steps, by the application of two navigation techniques. In the first step, a fuzzy grid map is applied to the map-building problem and an A* search to the path-planning problem. In the second step, a fuzzy segments map is applied to the map-building and localisation problems. Once the basic navigation and environment perception capabilities have been developed, a distributed control architecture (BGA) is defined. This serves as the basis of the different control-related elements and allows us to distinguish between deliberative and reactive processes. The different elements of the architecture communicate through a KQML-over-UDP protocol. To aid in the development of control programs using the BGA architecture, a programming language (BG) has been defined. It makes heavy use of fuzzy logic and serves to define and specify the different control modules. The BGA architecture and the BG language have been integrated into a programming and simulation environment (BGen), developed completely in Java so that it can be incorporated into different mobile robots. To solve the behaviour fusion problem, a genetic algorithm has been used.
It generates and tunes the fuzzy fusion rules by evaluating the different candidate solutions in the BGen environment. Finally, an application of both the Quaky-Ant platform and the BGen environment is shown (BGen-Web). It allows teleoperation of the mobile robot for exploratory tasks and exhibits a high degree of interactivity thanks to the use of multicast video transmission.

Chapter 1 Introduction

1.1 Objectives and contents

Mobile robotics is a relatively new research field related to the control of autonomous or semi-autonomous vehicles. Mobile robots are not only collections of algorithms but also physical embodiments that perform in real environments. Thus, mobile robots provide a real-world framework in which to test theoretical concepts and algorithms. Although most mobile robots are in an experimental stage, important industrial and commercial applications are emerging. This thesis addresses the problem of making a mobile robot work with some reliability in completely unknown environments. This topic is rather general, and thus it is related to the specification of mobile robot tasks, the development of mobile robot behaviours, the interpretation of the environment, and the validation of the final system. There are many families of techniques available to tackle the different subproblems. This thesis uses and develops intelligent techniques from what is known as soft computing (Zadeh, 1994) or computational intelligence (Schultz et al., 2000), making extensive use of fuzzy logic, genetic algorithms, and neural networks. One important point is the use of learning techniques to assist in the development of both robot behaviours and environment interpretation. Given the robotics problem above, the main objective of this thesis is the use, development and integration of techniques that do not make use of a priori environment models.
Moreover, a development environment has been created to make the robotics system robust and to assist in the programming of control software. This development environment includes modules for specification, learning and validation of mobile robot behaviours, and integrates a hybrid (reactive-deliberative) architecture which runs in a distributed environment. The different areas of work of this thesis can be summarised in the following points:
• Development of a distributed control architecture, based on the use of intelligent techniques.
• Specification and development of the platform and the communication model between the different elements of the system.
• Use and development of learning techniques for behaviour definition and combination and for environment interpretation.
• Use and development of navigation techniques for unknown environments: map building, path planning and localisation.
This thesis is divided into a small number of chapters. Each chapter is self-contained, examines previous research in the area, describes the proposed approach, and finally presents some results and conclusions. This organisation reflects the layered approach usually applied to mobile robotics problems. Chapter 2 deals with the hardware platform developed for testing and validation of the different algorithms and techniques. It describes the physical platform of the Quaky-Ant robot and the previous developments upon which it is built. Chapter 3 is about sensors and sensor fusion. It describes some techniques to obtain valuable data from noisy and/or unreliable sensors. Chapter 4 is related to robot navigation. It describes the way the robot builds a model of the world, how it plans the path to reach the desired goal point, and how it remains referenced and localised to the modelled world.
Chapter 5 deals with the software architecture used to develop mobile robotics control software (BGA) and the programming language used for robot programming (BG). Chapter 6 presents a sample software scheme, based on the BGA architecture and the BG language, developed to perform different tasks, and shows some results obtained with the Quaky-Ant robot. Chapter 7 is related to robot behaviour learning. It focuses on how to combine the different behaviours of a robot using learning techniques without the intervention of a human operator. Chapter 8 presents an application of the different works of this thesis in a web-based exploration task. Finally, chapter 9 makes a brief overview of the thesis, summarises the results, and indicates directions of further research. The remaining sections of this introductory chapter serve as an overview of some fields related to this thesis and put the later chapters into context.

1.2 Some background on mobile robotics

1.2.1 Mobile robotics

Mobile robotics (Dudek and Jenkin, 2000) is a relatively new research area that deals with the control of autonomous and semiautonomous vehicles. What sets mobile robotics apart from other research areas such as conventional manipulator robotics, artificial intelligence, and computer vision is the emphasis on problems related to the understanding of large-scale spaces, that is, regions of space substantially larger than those that can be observed from a single vantage point. Although at first blush the distinction between sensing in large-scale space, with its requirement for mobility, and local sensing may appear obscure, it has far-reaching implications. To behave in a large-scale environment not only implies dealing with the incremental acquisition of knowledge, the estimation of positional error, the ability to recognise important or familiar objects or places, and real-time response, but it requires that all these functionalities be exhibited in concert.
This issue of extended space influences all of mobile robotics. The tasks of moving through space, sensing about space and reasoning about space are fundamental problems within the study of mobile robots. Mobile robots are not only a collection of algorithms for sensing, reasoning, and moving about space. They are also physical embodiments of these algorithms and ideas that must cope with all of the vagaries of the real world. As such, mobile robots provide a reality check for theoretical concepts and algorithms. Mobile robotics is the domain where literally the "rubber meets the road" for many algorithms in path planning, knowledge representation, sensing and reasoning. Although most of the mobile robot systems currently in operation are experimental, some mobile robot systems are beginning to be deployed in industrial settings. Real applications in which current mobile robots have been successfully installed are characterised by one of the following attributes: the absence of an on-site human operator, a potentially high cost, and the need to tolerate environmental conditions that might not be acceptable to a human. As such, robots are especially well suited for tasks that exhibit one or more of the following characteristics:
• An inhospitable environment into which sending a human being would be either very costly or very dangerous.
• A remote environment into which sending a human operator would be too difficult or would take too long. An extreme instance is domains completely inaccessible to humans, such as microscopic environments.
• A task with a very demanding duty cycle or a very high fatigue factor.
• A task that is highly disagreeable to a human.
Mobile robots are feats of engineering. The actuators, processors, user interfaces, sensors, and communication mechanisms that permit a mobile robot to operate must be integrated so as to permit the entire system to function as a complete whole.
The physical structure of a mobile robot is complex, requiring a considerable investment of human and financial resources to keep it operating. Robots can be considered from several different perspectives. At a physical, hardware, or mechanistic level, robots can be decomposed into the following:
• A mechanism for making the robot move through its environment. It includes the physical organisation of motors, belts, and gears necessary to make the robot move.
• A computer or collection of computers for controlling the robot.
• A collection of sensors with which the robot gathers information concerning its environment.
• Communications hardware to enable the robot to communicate to an off-board operator or any externally based computers.
At a device level, the hardware details can be abstracted, and a robot can be considered as:
• A software level abstraction of the motors, encoders, and motor driver boards that allow the robot to move. Most mobile robot hardware manufacturers provide support for the underlying hardware at this level rather than force the users to deal with the details of actually turning the motors.
• Software level mechanisms or libraries to provide access to the robot's sensors. For example, the current readings of the sonar sensors as an array of distances.
• A standard communications mechanism such as an interface across a distributed network to the outside world.
From a still more abstract perspective, mobile robots can be considered at a purely computational level, so that the sensors, communications, and locomotive systems are seen simply as software modules that enable the robot to interact with its environment. Typical components in a software architecture include the following:
• A motion control subsystem.
• A sensor control and processing subsystem.
• A sensor interpretation subsystem, which decides on future actions.
Even higher levels of abstraction exist.
The term cognitive robotics is used to refer to the use of artificial intelligence techniques within a mobile robot, and often assumes the existence of an idealised computational abstraction of the robot.

1.2.2 History of mobile robotics

Artificial intelligence and mobile robotics (Nehmzow, 2000) have always been linked. Even before the days of the 1956 Dartmouth College Conference, at which the term "artificial intelligence" was coined, it was known that mobile robots could be made to perform interesting tasks and to learn. William G. Walter built a couple of mobile robots in the early 50s which were able to learn tasks like obstacle avoidance and phototaxis by instrumental conditioning, changing charges in a robot's capacitor which controlled the robot's behaviour (Walter, 1950). Early pioneers in artificial intelligence, such as Marvin Minsky and John McCarthy, became interested in robotics almost straightaway after the 1956 Dartmouth Conference. In the late 50s Minsky, together with Richard Greenblatt and William Gosper, attempted to build a ping-pong playing robot. Due to technical difficulties with the machine hardware, eventually a robot was built that could catch a ball using a basket instead of the robot's gripper.

At Stanford, Nils Nilsson (Nilsson, 1969) developed the mobile robot SHAKEY (Figure 1.1) in 1969. This robot used two stepper motors in a differential drive arrangement to provide locomotion, and was equipped with a triangulating optical range finder, a vidicon TV camera with controllable focus and iris, and tactile sensors mounted on bumpers. It was connected to DEC PDP-10 and PDP-15 computers via radio and video links for off-board data processing. SHAKEY used programs for perception, world modelling, and acting. Low-level action routines took care of simple moving, turning, and route planning. Intermediate-level actions strung the low-level ones together in ways that robustly accomplished more complex tasks.
The highest-level programs could make and execute plans to achieve goals given to the robot by a user. The system also generalised and saved these plans for possible future use. SHAKEY's tasks included both obstacle avoidance and object movement within a highly structured environment. All obstacles were simply uniformly coloured blocks and wedges. SHAKEY maintained a list of formulae representing the objects of its environment and, using the classical logic-based problem solver called STRIPS, it determined plans of actions that it then executed. Again, although the reasoning system worked properly, SHAKEY often had problems generating the symbolic information needed for the planner from the raw data obtained from the sensors. The hardware was the difficult part of the robot project. Hans Moravec, then a student at Stanford, recalls (Crevier, 1993):

"An entire run of SHAKEY could involve the robot getting into a room, finding a block, being asked to move the block over the top of the platform, pushing a wedge against the platform, rolling up the ramp, and pushing the block up. SHAKEY never did this as one complete sequence. It did it in several independent attempts, which each had a high probability of failure. You were able to put together a movie that had all the pieces in it, but it really was flaky."

Figure 1.1: The SHAKEY mobile robot.

Also at Stanford, John McCarthy started a project in the early 70s to build a robot that would assemble a colour television kit, and again the hardware of the robot (the physical act of inserting components into printed circuit boards with sufficient accuracy) proved to be the difficult part. Many researchers who were interested in robotics in the early days of artificial intelligence left the hardware aspect of robotics aside and concentrated again on the software and reasoning components of the control system.
The JPL Rover, developed in the 70s at the Jet Propulsion Laboratory in Pasadena, was designed for planetary exploration. Using a TV camera, a laser range finder and tactile sensors, the robot categorised its environment as "traversable", "non-traversable" and "unknown". Navigation was performed by dead reckoning using an inertial compass.

At Stanford, Hans Moravec developed CART (Figure 1.2) in the late 70s (Moravec, 1990). This mobile robot's task was obstacle avoidance using a camera sensor. The robot would take nine pictures at one location to create a two-dimensional world model. It would then move 1 metre ahead and repeat the process. Processing those images took 15 minutes: 5 minutes to digitise the 9 pictures, 5 minutes to perform a low-level visual reduction of the image, in which obstacles were represented as circles, and 5 minutes for the maintenance of the world model and path planning. CART was successful at avoiding obstacles, albeit very slow. It had, however, problems in getting its own position right or in seeing obstacles which lacked sufficiently high contrast.

Figure 1.2: The Stanford CART mobile robot.

In the late 70s, HILARE was developed at LAAS in Toulouse (Briot et al., 1979). It was one of the first European mobile robot projects. HILARE used computer vision, laser range finders and ultrasonic sensors to navigate in its environment. A slow process of scene analysis, which was carried out every 10 seconds, and a faster dynamic vision process, which was carried out every 20 centimetres of movement, were the underlying control principles. Nearby obstacles were avoided using ultrasonic sensors. Navigation and path planning were achieved by using a two-dimensional polygon representation of space and a global coordinate system.

1.2.3 Robot intelligence

But, what about robot intelligence? There is no usable and clear definition of intelligence (Nehmzow, 2000).
Intelligence refers to behaviour, and the yardstick by which it is measured is very dependent on education, understanding, and point of view. Alan Turing wrote in 1947:

"The extent to which we regard something as behaving in an intelligent manner is determined as much by our own state of mind and training as by the properties of the object under consideration. If we are able to explain and predict its behaviour or if there seems to be little underlying plan, we have little temptation to imagine intelligence. With the same object, therefore, it is possible that one man would consider it as intelligent and another would not; the second man would have found out the rules of its behaviour."

However intelligence is defined, the definition is tied to human behaviour. Humans consider themselves intelligent, and therefore any machine that does what humans do has to be considered intelligent too. For mobile robotics, this view has interesting consequences. People have always felt that games like chess require intelligence, whereas moving about in the world without any major problem is just ordinary stuff, not requiring intelligence. Rodney Brooks defined intelligence as:

"… what humans do, pretty much all the time."

With this definition, ordinary behaviour is the key to intelligent robots. It has proven a lot more difficult to build robots that move around without getting into trouble than it has been to build chess-playing machines.
This artificial intelligence based point of view is well accepted and serves as a generalised reference, although there are some cases where it does not, as Wallace Marshall stated in 1987:

"Artificial stupidity may be defined as an attempt by computer scientists to create computer programs capable of causing problems of a type normally associated with human thought."

Like intelligence, the behaviour of any robot cannot be evaluated independently from the robot's environment and the task which the robot is performing. Robot, task and environment depend upon each other, and influence each other (Figure 1.3).

Figure 1.3: Robot, task and environment interlinks.

A famous example of this fact is Tim Smithers' spider example:

"… extremely competent at survival in the countryside, but utterly incompetent in the bathtub!"

This means, consequently, that a general-purpose robot cannot exist, in the same way that a general-purpose living being does not exist. A robot's function and operation are defined by the robot's own behaviour within a specific environment, taking into account a specific task. Only the simultaneous description of agent, task and environment describes an agent completely.

1.2.4 Soft Computing

There are many techniques that can be used to provide a robot with intelligence. Nowadays, one of the hottest fields is what is called Soft Computing (Bonissone, 1997), a recently coined term describing the symbiotic use of many emerging computing disciplines. Lotfi Zadeh wrote in 1994:

"... in contrast to traditional, hard computing, soft computing is tolerant of imprecision, uncertainty, and partial truth."

In this context, the main components of Soft Computing are Fuzzy Logic (FL), Probabilistic Reasoning (PR), Neural Networks (NN) and Genetic Algorithms (GA).
Fuzzy Logic, introduced by Zadeh (Zadeh, 1965), provides a language, with syntax and local semantics, in which an expert can translate his qualitative knowledge about a problem to be solved. FL's main characteristic is the robustness of its interpolative reasoning mechanism. Probabilistic Reasoning, based on the original work of Bayes (Bayes, 1763) and the Dempster-Shafer theory of belief (Dempster, 1967)(Shafer, 1976), provides a mechanism to evaluate the outcome of systems affected by randomness or other types of probabilistic uncertainty. PR's main characteristic is its ability to update previous outcome estimates by conditioning them with newly available information. Neural Networks, first explored by Rosenblatt (Rosenblatt, 1959) and Widrow and Hoff (Widrow and Hoff, 1960), are computational structures that can be trained to learn patterns from examples. By using a training set that samples the relation between inputs and outputs, and a back-propagation type algorithm (Werbos, 1974), NNs provide a supervised learning algorithm that performs fine-grained local optimisation. Genetic Algorithms, proposed by Holland (Holland, 1975), provide a way to perform randomised global search in a solution space. In this space, a population of candidate solutions, encoded as chromosomes, is evaluated by a fitness function in terms of its performance. The best candidates evolve and pass some of their characteristics to their offspring.

The common denominator of these technologies is their departure from classical reasoning and modelling approaches that are usually based on boolean logic, analytical models, crisp classification, and deterministic search. In ideal problem formulations, the systems to be modelled or controlled are described by complete and precise information. In this case, formal reasoning systems, such as theorem provers, can be used to attach binary truth-values to statements describing the state or behaviour of the physical systems.
When attempting to solve real-world problems, however, it is easy to realise that such problems are typically ill-defined systems, difficult to model and with large-scale solution spaces. In this case, precise models are impractical, too expensive, or non-existent. The relevant available information is usually in the form of empirical prior knowledge and input-output data representing instances of the system's behaviour. Therefore, approximate reasoning systems capable of handling such imperfect information are needed. Soft Computing technologies provide a set of flexible computing tools to perform these approximate reasoning and search tasks. Moreover, these tools can also be combined into hybrid systems, such as the control of GA and NN parameters by FL, the evolution of NN topologies and weights by GAs or their application to tune FL controllers, and the realisation of FL controllers as NNs tuned by backpropagation-type algorithms. The figure below (Figure 1.4) provides a graphical summary of these hybrid algorithms and their components. Lighter cells correspond to techniques and technologies covered or used in this thesis. In the following subsections neural networks, genetic algorithms and fuzzy logic are introduced. The idea is to provide a very light introduction so that the non-expert reader can follow the parts of this thesis related to intelligent techniques and their applications. For a more in-depth discussion the reader is referred to (Mitchell, 1997) and (Jang et al., 1997).

Figure 1.4: Soft Computing overview. Approximate reasoning approaches: probabilistic models (Bayesian belief nets, Dempster-Shafer theory) and multivalued and fuzzy logics (MV-algebras, fuzzy logic). Search/optimisation approaches: neural networks (feedforward: single layer, multilayer, RBF nets; feedback: Kohonen SOM, ART models, Hopfield nets, competitive nets) and evolutionary computation (evolution strategies, evolutionary programming, genetic programming, genetic algorithms). Hybrid combinations include FL controllers generated and tuned by GAs or tuned by NNs, NN parameters controlled by FL, NN topologies and weights generated by GAs, and GA parameters controlled by GAs or by FL.

1.2.4.1 Neural networks

Neural network (Mitchell, 1997) learning methods provide a robust approach to approximating real-valued, discrete-valued, and vector-valued target functions. For certain types of problems, such as learning to interpret real-world sensor data, artificial neural networks are among the most effective learning methods currently known. The study of artificial neural networks has been inspired in part by the observation that biological learning systems are built of very complex webs of interconnected neurons. In rough analogy, artificial neural networks are built out of a densely interconnected set of simple units, where each unit takes a number of real-valued inputs and produces a single real-valued output.

The simplest neural network is the perceptron. It is composed of a single neuron with some real inputs and one output, which gives 1 if a linear combination of the inputs is greater than 0, or −1 otherwise. Formally, given the inputs x1, …, xn, the output is as follows (Eq. 1.1):

o(x1, …, xn) = 1 if w0·x0 + w1·x1 + … + wn·xn > 0, and −1 otherwise (1.1)

where wi, i = 0, …, n are the weights assigned to the inputs (with x0 = 1 by convention, so that w0 acts as a threshold). These weights indicate the importance of the corresponding inputs. The perceptron produces an output using a step function, which is called the squashing function (Figure 1.5).

Figure 1.5: Single perceptron structure.

This kind of node is very powerful for function representation or approximation. A perceptron can represent logic functions such as AND and OR. It is important to notice that a single perceptron cannot represent an XOR function or other arbitrary functions.
The basic rule for perceptron training is very simple. It applies each example in turn and varies the weights wi accordingly (Eq. 1.2):

Δwi = η·(t − o)·xi
wi ← wi + Δwi (1.2)

where t is the desired output, and o is the output obtained with the current example. This process converges provided the example set is linearly separable and η is small enough. This η, called the learning rate, controls the convergence speed of the hyperplane that separates the examples.

When the examples are not linearly separable the basic rule does not provide convergence. In this case a different technique, known as the delta rule, solves the problem by providing asymptotic convergence to the hyperplane that best matches the examples, even if they are not linearly separable. It uses a function (Eq. 1.3) to measure the training error, that is, how far the approximation of the perceptron is from the function defined by the examples:

E(w) = (1/2) Σd∈D (td − od)² (1.3)

where D is the training set, td is the desired output and od is the output of the perceptron. From the definition it is clear that the function to be minimised is a parabolic surface with a single global minimum. The delta rule is a two-step process. First, the different Δwi are calculated for each example, and second, the wi are updated with the new values (Eq. 1.4). The different wi are initialised randomly and the Δwi are set to zero. The process is stopped when the desired error is achieved. Other learning rules are described in the literature (Mitchell, 1997).

Δwi ← Δwi + η·(t − o)·xi
wi ← wi + Δwi (1.4)

Figure 1.6: Multilayer neural network.

The perceptron can approximate simple and linear functions. If more complex functions are to be approximated or modelled, a richer structure is needed.
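As a small illustration, the threshold output (Eq. 1.1) and the basic training rule (Eq. 1.2) can be sketched in a few lines of Python. The function names, learning rate, epoch count, and the AND training set are illustrative choices, not taken from the thesis:

```python
# Sketch of the perceptron training rule (Eq. 1.2), learning logical AND.
# Targets are in {-1, 1}; eta is the learning rate.

def train_perceptron(examples, eta=0.1, epochs=50):
    # examples: list of ((x1, ..., xn), target)
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                 # w[0] is the threshold weight (x0 = 1)
    for _ in range(epochs):
        for xs, t in examples:
            x = (1.0,) + tuple(xs)      # prepend the conventional input x0 = 1
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            for i in range(n + 1):      # Delta w_i = eta * (t - o) * x_i
                w[i] += eta * (t - o) * x[i]
    return w

def predict(w, xs):
    x = (1.0,) + tuple(xs)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# AND is linearly separable, so the basic rule converges.
and_examples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_perceptron(and_examples)
print([predict(w, xs) for xs, _ in and_examples])   # -> [-1, -1, -1, 1]
```

Replacing the AND set with XOR makes the loop oscillate forever, which is exactly the non-separable case that motivates the delta rule above.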
The multi-layer network (Figure 1.6) is similar to a series of interconnected perceptrons where the linear output unit has been replaced by a non-linear function (Figure 1.7), usually a sigmoid function (Eq. 1.5):

σ(y) = 1 / (1 + e⁻ʸ) (1.5)

Figure 1.7: Sigmoid neuron structure.

This structure is more complex than the single perceptron, and thus it can approximate more complex and non-linear functions. In a similar way as with the delta rule, the backpropagation algorithm learns the weights for a multilayer network. It also uses a training error function (Eq. 1.6), considering different input and output nodes:

E(w) = (1/2) Σd∈D Σk∈O (tkd − okd)² (1.6)

where D is the training set, O is the set of output nodes, tkd is the desired output of the example d in the node k, and okd is the output obtained in node k for the example d. In this case the error surface is not like the perceptron one, and more than one local minimum can appear. Unfortunately, this means that gradient descent is guaranteed only to converge toward some local minimum, and not necessarily to the global minimum error. Despite this fact, backpropagation has been found to produce excellent results in many real-world applications.

The backpropagation algorithm proceeds as follows. It first initialises the values of the different weights wi randomly. Then, it iteratively applies a two-phase process to each example. The first phase propagates the input forward through the network, computing the output ou of every unit u in the network. The second phase propagates the error backward through the network. For each network output unit k, its error term is computed (Eq. 1.7):

δk ← ok·(1 − ok)·(tk − ok) (1.7)

Then, for each hidden unit h, its error term is computed (Eq. 1.8):

δh ← oh·(1 − oh) Σk∈O wkh·δk (1.8)

Finally, the network weights wji are updated (Eq. 1.9).
Δwji = η·δj·xji
wji ← wji + Δwji (1.9)

This method (the stochastic gradient descent version of backpropagation) is in fact very similar to the delta rule, and takes into account the derivative of the sigmoid units. The algorithm is typically applied thousands of times (epochs) over the entire training set. The termination condition is usually a maximum number of epochs, although different criteria, like a threshold on the learning error gradient, can be used.

The set of functions that can be represented by a feedforward multilayer network depends on the width and depth of the network. Although much is still unknown about which function classes can be described by which types of networks, three quite general results are known:

• Boolean functions. Every boolean function can be represented exactly by some network with two layers of units, although the number of hidden units grows exponentially in the worst case with the number of network inputs.
• Continuous functions. Every bounded continuous function can be approximated with arbitrarily small error (under a finite norm) by a network with two layers of units (Cybenko, 1989). The theorem in this case applies to networks that use sigmoid units at the hidden layer and linear units at the output layer. The number of hidden units required depends on the function to be approximated.
• Arbitrary functions. Any function can be approximated to arbitrary accuracy by a network with three layers of units (Cybenko, 1988). Again, the output layer uses linear units, the two hidden layers use sigmoid units, and the number of units required at each layer is not known in general.

1.2.4.2 Genetic algorithms

Genetic algorithms (Mitchell, 1997) provide a learning method motivated by an analogy to biological evolution.
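The two-phase process of Eqs. 1.5 to 1.9 can be sketched as follows, assuming a single hidden layer and using XOR as the training set. The network size, learning rate, epoch count and seed are arbitrary illustrative choices:

```python
# Sketch of stochastic gradient descent backpropagation (Eqs. 1.5-1.9)
# on a 2-input, one-hidden-layer sigmoid network, trained on XOR.
import math
import random

random.seed(1)

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))   # Eq. 1.5

N_IN, N_HID = 2, 3
# each unit has a bias weight at index 0 (conventional input x0 = 1)
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(N_HID + 1)]

def forward(xs):
    x = [1.0] + list(xs)
    h = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return x, h, o

def train(examples, eta=0.5, epochs=5000):
    for _ in range(epochs):
        for xs, t in examples:
            x, h, o = forward(xs)                               # phase 1: forward
            delta_o = o * (1 - o) * (t - o)                     # Eq. 1.7
            delta_h = [h[j] * (1 - h[j]) * w_out[j] * delta_o   # Eq. 1.8
                       for j in range(1, N_HID + 1)]
            for j in range(N_HID + 1):                          # Eq. 1.9
                w_out[j] += eta * delta_o * h[j]
            for j in range(N_HID):
                for i in range(N_IN + 1):
                    w_hid[j][i] += eta * delta_h[j] * x[i]

xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def error(examples):                                            # Eq. 1.6
    return 0.5 * sum((t - forward(xs)[2]) ** 2 for xs, t in examples)

e0 = error(xor)
train(xor)
e1 = error(xor)
print(e0, e1)   # the training error typically drops markedly
```

Note that, as the text warns, convergence is only to some local minimum: a different seed can leave the network stuck with a higher residual error.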
Rather than search from general-to-specific hypotheses, or from simple-to-complex, genetic algorithms generate successor hypotheses by repeatedly mutating and recombining parts of the best currently known hypotheses. At each step, a collection of hypotheses, called the current population, is updated by replacing some fraction of the population by offspring of the best current hypotheses. The best hypothesis is defined as the one that optimises a predefined numerical measure for the problem at hand, called the hypothesis fitness.

Although different implementations of genetic algorithms vary in their details, they typically share the following structure. The algorithm operates by iteratively updating the population, a pool of hypotheses. On each iteration, all members of the population are evaluated according to the fitness function. A new population is then generated by probabilistically selecting the best individuals from the current population. Some of these selected individuals are carried forward into the next generation population intact. Others are used as the basis for creating new offspring individuals by applying genetic operations such as crossover and mutation.

Hypotheses in genetic algorithms are often represented by bit strings, so that they can be easily manipulated by genetic operators. These hypotheses can be quite complex. For example, sets of if-then rules can easily be represented in this way by choosing an encoding of rules that allocates specific substrings for each rule precondition and postcondition.

The generation of successors in a genetic algorithm is determined by a set of operators that recombine and mutate selected members of the current population. These operators correspond to idealised versions of the genetic operations found in biological evolution. The two most common operators are crossover and mutation. The crossover operator (Figure 1.8) produces two new offspring from two parent strings by copying selected bits from each parent.
The bit at position i in each offspring is copied from the bit at position i in one of the parents. The choice of which parent contributes the bit for position i is determined by an additional string called the crossover mask. In addition to recombination operators, a second type of operator produces offspring from a single parent.

Figure 1.8: Crossover operator.

The mutation operator (Figure 1.9) produces small random changes to the bit string by choosing a single bit at random and then changing its value. Some genetic systems employ additional operators, especially operators that are specialised to the particular hypothesis representation used by the system (Mitchell, 1997).

Figure 1.9: Mutation operator.

The fitness function defines the criterion for ranking potential hypotheses and for probabilistically selecting them for inclusion in the next generation population. If the task is to learn classification rules, then the fitness function typically has a component that scores the classification accuracy of the rule set over a set of provided examples. More generally, when the bit-string hypothesis is interpreted as a complex procedure, the fitness function may measure the overall performance of the resulting procedure rather than the performance of individual rules. This is the case when the bit string represents a collection of if-then rules that will be chained together to control a robotic device.

In some genetic systems the probability that a hypothesis h will be selected is given by the ratio of its fitness to the fitness of the other members of the current population (Eq. 1.10). This method is called roulette wheel selection:

Pr(h) = f(h) / Σi f(hi) (1.10)

In tournament selection two hypotheses are first chosen at random from the current population. With some predefined probability p the best of these two is then selected, and with probability (1 − p) the worst hypothesis is selected.
Tournament selection often yields a more diverse population than roulette wheel selection. Genetic algorithms employ a randomised beam search method to seek a maximally fit hypothesis. To contrast the hypothesis space search of genetic algorithms with that of neural network backpropagation, for example, the gradient descent search in backpropagation moves smoothly from one hypothesis to a new hypothesis which is very similar. In contrast, the genetic algorithm search can move much more abruptly, replacing a parent hypothesis by an offspring which may be radically different from the parent. Thus a genetic algorithm is less likely to fall into the same kind of local minima that can plague gradient descent methods.

In many natural systems, individual organisms learn to adapt significantly during their lifetime. Different evolution models can be defined to integrate species evolution and individual learning. In Lamarckian evolution the experiences of a single individual directly affect the genetic makeup of its offspring. Although current scientific evidence overwhelmingly contradicts Lamarck's model in biological systems, recent computer studies have shown that Lamarckian processes can sometimes improve the effectiveness of genetic algorithms. Although Lamarckian evolution is not an accepted model of biological evolution, other mechanisms have been suggested by which individual learning can alter the course of evolution.
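Returning to the basic algorithm, the generational loop with the operators described above can be sketched as follows. The OneMax fitness function (count of 1 bits), the single-point crossover mask, and every parameter value are illustrative assumptions:

```python
# Sketch of a generational genetic algorithm on bit strings.
# Fitness is OneMax (number of 1 bits), so the optimum is the all-ones string.
import random

random.seed(2)
L, POP, GENS = 20, 30, 60

def fitness(h):
    return sum(h)

def roulette(pop):
    # Roulette wheel selection (Eq. 1.10): probability proportional to fitness.
    total = sum(fitness(h) for h in pop) or 1
    r = random.uniform(0, total)
    acc = 0.0
    for h in pop:
        acc += fitness(h)
        if acc >= r:
            return h
    return pop[-1]

def tournament(pop, p=0.8):
    # Pick two at random; take the best with probability p, the worst otherwise.
    a, b = random.sample(pop, 2)
    best, worst = (a, b) if fitness(a) >= fitness(b) else (b, a)
    return best if random.random() < p else worst

def crossover(p1, p2):
    # Single-point mask: each offspring copies bit i from one parent or the other.
    cut = random.randrange(1, L)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(h, rate=0.02):
    # Flip each bit independently with a small probability.
    return [1 - b if random.random() < rate else b for b in h]

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for _ in range(GENS):
    elite = max(pop, key=fitness)        # carry the best individual forward intact
    nxt = [elite]
    while len(nxt) < POP:
        # tournament selection used here; roulette() is an interchangeable alternative
        c1, c2 = crossover(tournament(pop), tournament(pop))
        nxt += [mutate(c1), mutate(c2)]
    pop = nxt[:POP]

print(fitness(max(pop, key=fitness)))    # close to L after evolution
```

Carrying the elite forward intact corresponds to the "some of these selected individuals are carried forward" step in the text; the rest of the next generation comes from crossover and mutation.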
The Baldwin effect, which provides an indirect mechanism for individual learning to positively impact the rate of evolutionary progress, is based on the following observations: if a species is evolving in a changing environment, there will be evolutionary pressure to favour individuals with the capability to learn during their lifetime; and those individuals who are able to learn many traits will rely less strongly on their genetic code to hard-wire traits, relying on individual learning to overcome the missing or not quite optimised traits in their genetic code.

1.2.4.3 Fuzzy sets and systems

Fuzzy set theory (Jang et al., 1997) provides a systematic calculus to deal with imprecise and incomplete information linguistically. Numerical computations can be performed using the membership functions associated to linguistic labels. These linguistic labels can be placed in the antecedent and consequent of if-then rules, resulting in what are called fuzzy if-then rules. Moreover, a selection of fuzzy if-then rules forms the key component of a fuzzy inference system that can effectively model human expertise in a specific application.

A classical set is a set with a crisp boundary. Usually a classical set (Eq. 1.11) can be defined using a membership function µ, which characterises the elements that belong to the set over a given universe of discourse X:

µc : X → {0, 1} (1.11)

In contrast to a classical set, a fuzzy set, as the name implies, is a set without a crisp boundary. That is, the transition from "belongs to a set" to "does not belong to a set" is gradual. Usually, a fuzzy set (Eq. 1.12) is defined using a membership function µ, which characterises the degree of membership to the set of all the elements from a given universe of discourse X. Given this definition, a classical set is a special case of a fuzzy set where the elements have full or zero degree of membership:

µf : X → [0, 1] (1.12)

A fuzzy set is uniquely specified by its membership function.
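A membership function in the sense of Eq. 1.12 can be written directly as a real-valued function. The sketch below uses triangular sets (one of the standard shapes discussed next) together with the classical max/min/complement operators; the temperature-like labels and breakpoints are illustrative only:

```python
# Sketch of fuzzy membership functions (Eq. 1.12) and the classical
# fuzzy operators. The sets "cold" and "warm" are illustrative labels.

def triangular(a, b, c):
    # Triangular membership function: 0 outside [a, c], peak 1 at b.
    # Assumes a < b < c strictly.
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

cold = triangular(-10.0, 0.0, 15.0)
warm = triangular(5.0, 15.0, 25.0)

def union(mu_a, mu_b):            # pointwise max
    return lambda x: max(mu_a(x), mu_b(x))

def intersection(mu_a, mu_b):     # pointwise min
    return lambda x: min(mu_a(x), mu_b(x))

def complement(mu_a):             # 1 - membership degree
    return lambda x: 1.0 - mu_a(x)

mild = intersection(cold, warm)
# 10 degrees is partially cold (1/3) and partially warm (1/2):
print(cold(10.0), warm(10.0), mild(10.0), complement(cold)(10.0))
```

The gradual transition is visible in the output: the same element belongs to both sets with different non-zero degrees, which a classical {0, 1} membership function cannot express.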
In practice, when the universe of discourse X is a continuous space, X is usually partitioned (Figure 1.10) into several fuzzy sets whose membership functions cover X in a more or less uniform manner. These fuzzy sets, which usually carry names that conform to adjectives appearing in daily linguistic usage, are called linguistic values or linguistic labels.

Figure 1.10: A fuzzy partition.

Union, intersection, and complement are the most basic operations on classical sets. Corresponding to these ordinary operations, fuzzy sets have similar operations: union (Eq. 1.13), which intuitively corresponds to the smallest fuzzy set containing A and B; intersection (Eq. 1.14), which intuitively corresponds to the largest fuzzy set contained in both A and B; and complement (Eq. 1.15), which intuitively corresponds to the elements that do not belong to the set A. The operations defined this way are usually referred to as the classical fuzzy operators:

C = A ∪ B, µC(x) = max{µA(x), µB(x)} (1.13)
C = A ∩ B, µC(x) = min{µA(x), µB(x)} (1.14)
Ā = X − A, µĀ(x) = max{0, 1 − µA(x)} (1.15)

Since most fuzzy sets in use have a universe of discourse X consisting of the real line R, it would be impractical to list all the pairs defining a membership function. A more convenient and concise way to define a membership function is a mathematical formula. Typical membership functions (Figure 1.11) have triangular, trapezoidal, gaussian or bell shapes.

Figure 1.11: Standard membership function shapes.

The extension principle is a basic concept of fuzzy set theory that provides a general procedure for extending crisp domains of mathematical expressions to fuzzy domains. If f is a function from X to Y, and A is a fuzzy set on X defined as (Eq. 1.16):

A = µA(x1)/x1 + … + µA(xn)/xn (1.16)

then the extension principle states that the image of the fuzzy set A under the mapping f can be expressed as a fuzzy set B (Eq.
1.17):

B = f(A) = µB(y1)/y1 + … + µB(yn)/yn, where yi = f(xi) and µB(y) = max{µA(x) | x = f⁻¹(y)} (1.17)

Binary fuzzy relations (Eq. 1.18) are fuzzy sets in X × Y, which map each element in X × Y to a membership grade between 0 and 1. These fuzzy relations are in fact 2D membership functions:

R = {((x, y), µR(x, y)) | (x, y) ∈ X × Y} (1.18)

A fuzzy if-then rule assumes the form (Eq. 1.19):

if x is A then y is B (1.19)

where A and B are linguistic values defined by fuzzy sets on universes of discourse X and Y. Fuzzy rules are usually also expressed as fuzzy implications (A → B). In essence, this expression describes a fuzzy relation between two variables x and y, and thus it can be viewed as a 2D membership function.

Fuzzy reasoning is an inference procedure that derives conclusions from a set of fuzzy if-then rules and known facts. The basic rule of inference in traditional two-valued logic is modus ponens, according to which the truth of a proposition B can be inferred from the truth of A and the implication A → B. If these propositions make use of fuzzy sets of appropriate universes, the procedure is called fuzzy reasoning or generalised modus ponens. Given a certain rule and fact (Eq. 1.20), the fuzzy reasoning proceeds as follows (Eq. 1.21):

Premise 1 (fact): x is A'
Premise 2 (rule): if x is A then y is B
Consequence (conclusion): y is B' (1.20)

µB'(y) = max_x min{µA'(x), µR(x, y)} (1.21)

where µR(x, y) is a fuzzy implication, which, for instance, can be the Mamdani fuzzy implication (Eq. 1.22):

Rm = A × B = ∫X×Y µA(x) ∧ µB(y) / (x, y), with fc(x, y) = min{x, y} (1.22)

For instance, given a series of rules with multiple antecedents (Eq.
1.23), the fuzzy reasoning procedure is depicted graphically in Figure 1.12:

Premise 1 (fact): x is A' and y is B'
Premise 2 (rule): if x is A1 and y is B1 then z is C1
Premise 3 (rule): if x is A2 and y is B2 then z is C2
Consequence (conclusion): z is C' (1.23)

Figure 1.12: Graphic representation of fuzzy reasoning.

A fuzzy inference system is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning. It has found successful applications in a wide variety of fields, such as automatic control, data classification, decision analysis, expert systems, time series prediction, robotics, and pattern recognition. It is also known by other names, such as fuzzy logic controller. The basic structure of a fuzzy inference system consists of three conceptual components: a rule base, which contains a selection of fuzzy rules; a database, which defines the membership functions used in the fuzzy rules; and a reasoning mechanism, which performs the inference procedure upon the rules and given facts to derive a reasonable output or conclusion.

Usually, a fuzzy inference system can take either fuzzy inputs or crisp inputs (which are viewed as fuzzy singletons), and the outputs are fuzzy sets. Sometimes it is necessary to have crisp outputs instead, especially in situations where the fuzzy system is used as a controller. In this case a defuzzification method is applied to obtain a crisp value that best represents a fuzzy set. With crisp inputs and outputs, a fuzzy inference system implements a nonlinear mapping from its input space to its output space. This mapping is accomplished by a number of fuzzy if-then rules, each one describing the local behaviour of the mapping. In particular, the antecedent of a rule defines a fuzzy region in the input space, while the consequent specifies the output in that fuzzy region. Three types of fuzzy inference systems have been widely employed in various applications.
The differences between them lie in the consequents of their fuzzy rules, and thus their aggregation and defuzzification procedures differ accordingly. The Mamdani model uses standard fuzzy sets in both the antecedents and consequents of the rules, and applies a defuzzification procedure to obtain a crisp value from the output. There are many different defuzzification methods, although the most widely adopted strategy is the centroid of area (Eq. 1.24), which returns the centroid of the area under the output fuzzy set:

    z_{COA} = \frac{\int_Z \mu_A(z) \cdot z \, dz}{\int_Z \mu_A(z) \, dz}    (1.24)

The Sugeno model, also known as TSK, uses standard fuzzy sets in the antecedents of the rules, and a crisp function f(x_1, …, x_n) in the output. Usually this function is a polynomial in the input variables x_1, …, x_n, but it can be any function as long as it approximately describes the output of the model within the fuzzy region specified by the antecedent of the rule. Since each rule has a crisp output, the overall output is obtained via a weighted average, thus avoiding the time-consuming process of defuzzification required in a Mamdani model. Unlike the Mamdani model, the Sugeno model cannot follow the compositional rule of inference strictly in its fuzzy reasoning mechanism, which poses some difficulties when the inputs to a Sugeno model are fuzzy. By dispensing with the time-consuming and mathematically intractable defuzzification operation, the Sugeno fuzzy model is by far the most popular candidate for sample-data-based fuzzy modelling. The Tsukamoto model uses standard fuzzy sets in both the antecedents and consequents of the rules, but the consequent fuzzy sets are constrained to have monotonic membership functions. As a result, the inferred output of each rule is a crisp value induced by the rule's firing strength. The overall output is taken as the weighted average of each rule's output.
Since each rule infers a crisp output and the aggregation is a weighted average, the Tsukamoto model also avoids the time-consuming process of defuzzification. However, this model is not used often, since it is not as transparent as either the Mamdani or Sugeno models. As with neural networks, the representational power of fuzzy systems has been characterised: from a control point of view, any continuous nonlinear function can be approximated as exactly as needed with a finite set of fuzzy variables, values, and rules (Kosko, 1992). What remains unanswered is how many rules are needed and how to find them. In many cases, relatively small and simple systems perform well in real-world applications. The advantages of using fuzzy systems for control usually fall into the following categories (Driankov et al., 1993):

• Expert knowledge representation. In many industrial processes there is a variety of conventional control loops, but a human operator is still needed during certain phases of the operation (the starting/closing phase, parameterisation of controllers, or switching between operational modes). The knowledge of the operator is usually based on experience and does not lend itself to being expressed in differential equations. In this case, fuzzy control offers a method for representing and implementing the expert's knowledge.

• Robust nonlinear control. In some cases good and exact models of the system are available. It is then not difficult to realise classical controllers that work well for known system parameters within a narrow range. In the presence of major disturbances, however, classical systems usually face a trade-off between fast reactions with significant overshoot and smooth but slow reactions. In this case, fuzzy control offers ways to implement simple but robust solutions that cover a wide range of system parameters and can cope with major disturbances.
• Reduction of development time. Fuzzy control offers languages at two different levels of expertise: a symbolic level, which is appropriate for describing the application-related expert knowledge, and a compiled level, which is well understood by the development engineers. Since there is a well-defined, formal translation between those levels, a fuzzy-based approach can help reduce communication problems. In typical real systems, communication problems between application and development specialists can lead to very long development times.

1.3 Contributions of this thesis

This thesis describes an experimental investigation into the issues of mobile robot task specification, mobile robot behaviour development, environment interpretation, and robot performance validation. To this end, a series of methods, algorithms, techniques and physical robots have been integrated into effective working systems. The novel contributions of this research consist of:

• The definition of BGA, a new control architecture, which is based on the concept of multi-agent systems and relies on a blackboard architecture. From a cognitive point of view, this architecture is used as a hybrid three-layer architecture.

• In the context of the BGA architecture, the definition and implementation of BGL, a new robot programming language, and its development environment, to speed up the development of control software.

• The implementation of a UDP-based KQML protocol for agent communication in BGA, including a simple yet powerful content description language. One of its applications is the extension of the BGA architecture to a web-based robotics application.

• The modification of the context-dependent blending technique for behaviour fusion and its later integration in a learning environment to automatically obtain the blending rules.

• The development of a new clustering algorithm, which is applied to a sensor fusion problem.
• The modification and improvement of different map-building and path-planning algorithms to achieve autonomous control of a robot in unknown environments.

Chapter 2
Hardware Architecture

2.1 Introduction

During the preparation of this thesis, different mobile robotics platforms were built and tested, from the very simple first MadCar to the final Quaky-Ant robot. In this chapter the different models are presented and described, and their most important pros and cons are analysed. The Quaky-Ant robot, which is one of the robots used to obtain the real-world results of this thesis, is described in greater depth. The development of the different robots was constrained by different factors, the most important being the physical dimensions of the laboratory and adjacent corridors, and the cost. The physical constraint required that the robots be no larger than medium size (approximately 50 centimetres in diameter). The budget limited the final robot cost to no more than $3,500. In addition to the initial constraints, some requisites were imposed on the final robot, namely:

• The platform was to be flexible enough to test the different algorithms and architectures presented in this thesis (see the following chapters).

• The platform was to be robust and easy to upgrade, and to use different sensors (mainly sonar and infrared sensors).

• The platform was to be capable of successfully carrying out complex tasks, operating in both indoor environments and non-rough outdoor environments.

A brief survey of the medium-size robotics market (see appendix A) shows that there are clearly two commercial robot groups: the high-tech group and the do-it-yourself group. While the latter does not meet the criteria specified above, the former provides ways of enhancing and expanding the systems, and includes the necessary set of sensors, either in the basic configuration or as optional packages.
The platforms that come closest to fulfilling the requirements are the Magellan and the Pioneer-2 DX (see appendix A). They present some important drawbacks:

• They are not easy to expand using custom sensor modules, mainly because their interconnection buses and protocols are proprietary.

• They are intended to be used in conjunction with their own development environments, and the only option for fully testing different systems is to build gateways, with the corresponding performance penalty.

Although the present work could have been performed using off-the-shelf mobile robots, it was decided to develop one from scratch, taking commercial robots as models when possible and trying to improve on them. Either way, leaving aside comparisons of performance with commercial robots, the Quaky-Ant robot features a flexible and modular architecture designed to support custom developments, namely algorithms for sensors (chapter 3), navigation algorithms (chapter 4), and a control architecture (chapter 5). Finally, the robot has been successfully used in real-world applications (chapters 6 and 8).

2.2 MadCar micro-robot

2.2.1 Description

The robot named MadCar (Figure 2.1) was the first platform developed and built to perform tests for this thesis (Villalba and Martínez Barberá, 1998). MadCar is based on an AA-Kidsmate BMW Sport RC (radio-controlled, for hobbyist use) 1:10 scale model car. The purpose of this robot was to test some simple reactive algorithms on a low-cost platform, using sonar as the main sensory system. The whole system is based on a microcontroller that is in charge of reading the sensors, performing some processing, and actuating the motors. The car has two 12V DC motors, one connected to a mechanical differential for traction (capable of driving the car forward or backward), and the other connected to an Ackermann-style steering system (to turn the front wheels to the left or to the right).
The main characteristics of the platform are presented in Table 2.1, and the distribution of the different elements is shown in Figure 2.2.

Dimensions: base 50 x 25 cm; height 18 cm; weight 2 kg.
Sensors: 4 sonar sensors; 1 shaft encoder; 1 Hall-effect compass; 4 switches.
Actuators: 2 DC motors.
Additional equipment: 1 MC68HC11; 1 RF modem (100 MHz); 10 NiCd 1.2 V 1.8 Ah batteries; 1 NiCd 9 V 0.8 Ah battery.
Table 2.1: Main characteristics of MadCar.

The car has different sensors (Table 2.1 and Figure 2.2), whose most relevant characteristics are listed below:

• The custom Hall-effect compass is used for measuring direction and elevation. Its accuracy is not very good because of the mechanical design: it is based on an oil-gimballed mechanical compass, and the magnetic ball has a tendency to get trapped in the mountings.

• The four ultrasonic sonar sensors are used to measure the distance to walls and objects. A custom board was designed to control and multiplex the sonar firing modules. This control board does not allow more than one sensor to be fired simultaneously.

• The shaft encoder mounted on the main axle is used for measuring the car's speed. Its precision is 8 pulses per revolution, and it was intended only as a rough estimate of the car's speed.

• The four switches are arranged in two bumpers (one located at the front and the other at the rear) and are used to detect collisions.

Figure 2.1: MadCar's lateral view. Figure 2.2: Hardware arrangement.

A HandyBoard (Martin, 1994), a board based on the Motorola 68HC11 microcontroller with 32 KB of static memory, controls the robot. It has a set of A/D-D/A input/output converters, to which all the sensors and actuators are connected (Figure 2.3), and an integrated RS-232 serial port, to which a wireless modem is attached. The control software was developed in Interactive C (Martin, 1996) using a 68040 Macintosh as the host computer.
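Returning to the shaft-encoder figures above: with 8 pulses per revolution and the 7 cm wheel diameter quoted in section 2.2.3, the per-pulse resolution can be recomputed with a little arithmetic. This is an illustrative sketch; the function and constant names are ours, not from the thesis.

```python
# Arithmetic behind the quoted encoder resolution: with 8 magnets on a
# 7 cm wheel, each pulse corresponds to one eighth of the circumference.
import math

WHEEL_DIAMETER_CM = 7.0
PULSES_PER_REV = 8

resolution_cm = math.pi * WHEEL_DIAMETER_CM / PULSES_PER_REV
print(round(resolution_cm, 2))   # -> 2.75 (the text rounds down to 2.74)

def speed_cm_per_s(pulse_count, interval_s):
    """Rough speed estimate from pulses counted over a sampling window."""
    return pulse_count * resolution_cm / interval_s
```

At roughly 2.75 cm per pulse, the encoder is indeed only good for coarse speed estimation, which matches the intent stated above.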
The modem, a custom half-duplex wireless 1200-baud modem, in conjunction with the RC emitter/receiver, is used to control the car manually and to transfer sensor data to the host computer for data analysis. The resulting control software (see chapter 5 for the control software architecture used) is then downloaded to the microcontroller for later autonomous execution.

Figure 2.3: MadCar's sensors and effectors (sonar, compass, switches and shaft encoder as inputs to the 68HC11; traction and steering motors as outputs).

2.2.2 Sonar controller

The sonar controller uses the Polaroid 6500 series Ultrasonic Sonar Range Finder modules. These sensors operate using a single transducer for both sending and receiving the ultrasonic pulses (at 49.4 kHz). The maximum-intensity width of the ultrasonic beam is typically between 15 and 20 degrees. The module can measure from 15 centimetres up to 10 metres with an accuracy of ±1% over the entire range. The interface to these modules is simple: a few signals need to follow a simple timing pattern (Figure 2.4) to control the firing of the sensor.

Figure 2.4: Polaroid 6500 timing diagram.

As the microcontroller board does not have enough free I/O ports to control the four Polaroid 6500 modules, a sonar controller board was designed to fulfil the following objectives:

• The board had to be capable of firing all the sensors (although not simultaneously).

• The number of I/O lines had to be minimal.

The sonar control board (Figure 2.5) works as follows. The microcontroller selects the sensor to be fired (Select0 and Select1 lines), and then sends the firing command (Init line). The firing signals are generated and, as a result, the 6500 module emits an ultrasonic pulse. When this pulse returns to the transducer, an echo signal is sent to the microcontroller (Echo line), where an interrupt is raised.

Figure 2.5: Sonar controller schematics.
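The controller's job described above (fire a module, then time the Echo interrupt) reduces to a time-of-flight computation: the one-way distance is d = c · t / 2. The following sketch illustrates this; the 343 m/s speed of sound (air at roughly 20 °C) and the function names are our assumptions, not from the text.

```python
# Sketch of the time-of-flight computation behind the sonar controller:
# the microcontroller measures the delay between the Init command and the
# Echo interrupt, and the one-way distance follows as d = c * t / 2.

SPEED_OF_SOUND = 343.0   # m/s, assuming air at about 20 degrees Celsius

def echo_to_distance_m(echo_time_s):
    """Round-trip echo time in seconds -> one-way distance in metres."""
    return SPEED_OF_SOUND * echo_time_s / 2.0

def distance_to_echo_ms(distance_m):
    """Expected round-trip time in milliseconds for a given distance."""
    return 2.0 * distance_m / SPEED_OF_SOUND * 1000.0

# The module's stated range of 0.15-10 m maps to echo delays of roughly
# 0.9-58 ms, which bounds the timeout the firing loop has to allow for:
print(round(distance_to_echo_ms(0.15), 1))   # -> 0.9
print(round(distance_to_echo_ms(10.0), 1))   # -> 58.3
```

The ~58 ms worst-case echo delay also explains why firing the four sensors strictly one at a time keeps the update rate of the whole ring modest.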
Figure 2.5 shows the control circuitry for one 6500 module. The Select0 and Select1 lines set the active module using a 74LS139 multiplexer. When an Init signal arrives from the microcontroller (active low), it is passed on to a BC237 transistor in common-emitter configuration, which acts as an inverter and current driver to generate the INIT signal in the 6500 module. In order to avoid detecting the emitted pulse as an echo (only one transducer per 6500 module is used), the 6500 module generates an internal blanking signal for 2.38 ms. This sets the minimum sensing distance to 0.4 m. To reduce this distance to 0.1 m, a blanking-inhibit signal BINH is also needed. The Init signal from the microcontroller is driven through an RC network that produces a delay of 0.6 ms, which is passed on to the 74LS139, and finally to a 7404 inverter to obtain the BINH signal. Additionally, BINH is inverted to obtain the blanking signal BLNK. The ECHO signal from the 6500 module is driven to an open-collector inverter gate to obtain the Echo signal in the microcontroller.

2.2.3 Shaft encoder

To measure the speed of the car approximately, 8 magnets were placed around the perimeter of one of the rear wheels. A Hall-effect-based encoder generates a pulse, which is driven to the microcontroller, whenever a magnet is detected. The wheels have a diameter of 7 cm, giving an encoder accuracy of 2.74 cm. As the Hall-effect sensor signal cannot be directly attached to the microcontroller, because it needs some conditioning, an encoder board was designed. The encoder circuitry (Figure 2.6) consists of three stages: a band-pass filter, an amplifier, and a hysteresis comparator. The UGN3501 Hall-effect sensor produces a voltage that depends on the distance to the magnet. In our case there was some noise with a frequency of 500 Hz.
The band-pass filter (R1C1-R2C2) eliminates noise below 5 Hz and above 100 Hz (with 8 magnets this gives an upper bound of 2 m/s, far higher than the car's working speed). After the filter, the signal is driven to the µA741 opamp-based amplifier, which amplifies the signal 10 times. Finally, the amplified signal is converted to a square-wave signal by the hysteresis comparator. This square-wave signal is driven to a PN2221 transistor in common-emitter configuration to limit the output voltage to +5V, in order to protect the digital I/O circuitry of the HandyBoard.

Figure 2.6: Shaft encoder schematics.

2.2.4 Electronic compass

As an additional sensor system, MadCar has a custom electronic compass, which measures the orientation of the car with respect to the Earth's magnetic North pole. The electronic compass, developed and built for this work, is composed of two subsystems:

• Mechanical subsystem: an extremely low-cost mechanical oil-gimballed compass, intended for use on bicycles.

• Electronic subsystem: detects the orientation of the mechanical compass and sends the data to the microcontroller.

The compass system works as follows: the mechanical compass contains a sphere that aligns itself to the Earth's magnetic North pole using a small magnet mounted on its surface. Two Hall-effect sensors are mounted orthogonally on the outside of the sphere mountings. These sensors measure the deviation of the sphere with respect to a reference position, and so indirectly measure the orientation with respect to the magnetic North pole. In this way the system does not measure the Earth's magnetic field directly, but the mechanical compass's magnetic field. The interface circuitry works in a similar way to the shaft encoder's, but uses a pair of 634SS2 Hall-effect sensors. Figure 2.7 shows the circuitry corresponding to only one sensor.
The output of the 634SS2 is differential, so the first stage subtracts the corresponding outputs, eliminates noise, and amplifies the signal 10 times (using µA741 operational amplifiers). The signal is then sent to a variable-gain stage to adapt the range of the signal (from full magnetic North to full South). Finally, the output is shifted so that a 0 Gauss magnetic field reads as 2.5 V.

Figure 2.7: Electronic compass schematics.

2.2.5 Conclusions

Although much effort was put into developing good electronic and software systems, no modifications were made to the mechanics. This strongly conditioned the way the robot performed its tasks (see chapter 5 for a performance discussion). The main lesson learned was: "mechanics do matter!". The most important mechanical drawbacks were:

• The steering system is an all-or-nothing turning system, which makes it a major problem to accomplish smooth turns with a long radius of curvature. Thus, the robot performed jerkily when turning.

• The sphere of the mechanical compass, due to nonuniformities in the mountings, could suddenly get trapped, producing the same output even though the car was changing its direction abruptly.

• The original car assembly was very lightweight. As a children's toy, it was designed to reach a good top speed on smooth surfaces, and consequently it is fitted with a low-reduction gearbox. As the added weight was 5 times the original, control at low speed was very critical.

Despite the above facts, the robot nevertheless provided a very good robotics background, and most of it would become the basis of the following robots.

2.3 La-Garra robot

2.3.1 Description

The robot named La-Garra (Figure 2.8) was developed just after the completion of MadCar. It is based on a MiniMax'96 Robot Kit from Wirz Electronics. The purpose of this robot was to continue the development of the software architecture while taking advantage of the previously developed sensing system.
The whole system is based on a microcontroller, which is in charge of reading the sensors and actuating the motors, and a personal computer without its cover. The robot has two 12V DC motors attached to the differentially steered wheels (so that they can turn independently forward or backward) and has the support of two castor wheels (one at the front and one at the rear). This is a typical diamond configuration, chosen to overcome the steering difficulties of MadCar, and thus La-Garra is a holonomic platform. The main characteristics of the platform are presented in Table 2.2, and the distribution of the different elements is shown in Figure 2.9.

Dimensions: base 40 x 40 cm; height 55 cm; weight 6 kg.
Sensors: 4 sonar sensors; 2 shaft encoders; 4 switches; 3 battery-level sensors; 1 B/W camera.
Actuators: 2 DC motors; 1 robotics arm.
Additional equipment: 1 MC68HC11; 1 i486DX/2 @ 66 MHz; 1 10baseT ethernet; 1 NiCd 12 V 1.8 Ah battery; 1 NiCd 12 V 2 Ah battery; 1 NiCd 9 V 0.8 Ah battery.
Table 2.2: Main characteristics of La-Garra.

The robot has different sensors (Table 2.2 and Figure 2.9). The most relevant characteristics are listed below:

• The four ultrasonic sonar sensors are used to measure the distance to walls and objects. The control circuitry is identical to that used in MadCar.

• The two optical shaft encoders mounted on the axles of the steering wheels are used for odometry. Their precision is 28 pulses per revolution, intended only as a rough estimate of the robot's speed and position.

• The four switches are positioned in two bumpers (one located at the front and the other at the rear) and are used to detect collisions.

• The three proprioceptive battery-level sensors are used to trigger an alarm when the output voltage drops dangerously.

In addition, some extra elements were added to test whether they were feasible and worth the development time:

• A low-resolution QuickCam camera with 256 grey levels.
The idea was for it to be capable of identifying tennis balls and/or soft drink cans in order to perform collection and cleaning tasks.

• A LynxArm mechanical robotics arm with 4 degrees of freedom, which serves as a pan-tilt support for the camera.

Figure 2.8: La-Garra's lateral view. Figure 2.9: Hardware arrangement.

La-Garra, like MadCar, has a HandyBoard, although now it only takes care of interfacing the sensing system and the actuators. The control board is a standard i486DX2 personal computer without its covering case. It has 10 Mb of memory and 500 Mb of hard disk, and runs the Linux operating system. It works as the master of the system, whereas the HandyBoard runs as a slave. For data communication between the different subsystems, there are two RS232C serial ports (486 to 68HC11, and 486 to LynxArm) and a Centronics parallel port (486 to QuickCam). In addition, there is a wired ethernet link that is used only for development and testing purposes. At the very beginning the control software was developed in C, but there were some problems testing the software both on board and off board, and thus it was decided to rewrite everything in Java (see chapter 5 regarding the new control software architecture developed).

2.3.2 Conclusions

With this robot the suitability of a distributed processing architecture was tested and simple tasks were attempted. The major achievements were due to software, as there was no significant difference in the final result compared to MadCar. The main lesson learned was: "mechanics still do matter!". The most important problems found were:

• The platform was still inadequate because of its lack of stability in tight turns.

• Odometry needs encoders with much higher precision.

• There is not enough processing power on board to perform both control and image analysis.
2.4 Quaky-Ant robot

2.4.1 Description

The robot named Quaky-Ant (Figure 2.10) started as some "minor" modifications to La-Garra in search of robustness. It soon became apparent that the only way to achieve better performance was to spend some time designing the mechanics of the platform and to increase the number and type of sensors. Curiously, the platform bears a resemblance to a Pioneer from ActivMedia Robotics (see appendix A), but it is merely a resemblance. A major step forward with this robot was the incorporation of a distributed computing architecture organised in two layers: a low-level layer based on small microcontrollers and a high-level layer based on a microprocessor. In addition, the software and algorithms had evolved sufficiently for more complex tasks to be accomplished, and consequently the robot needed to exhibit more features. The design goal was to develop a mobile robot that could work autonomously and could be teleoperated in both indoor and outdoor environments, while reusing most of the hardware and all the software. The robot has two 12V DC motors attached to the differentially steered wheels (so that they can turn independently forward or backward) and has the support of a small castor wheel (located at the rear). This is a typical triangle configuration, chosen to achieve better equilibrium. In addition, the ground clearance is small to allow for a lower centre of gravity. The main characteristics of the platform are presented in Table 2.3, and the distribution of the different elements is shown in Figure 2.11.

Dimensions: base 50 x 40 cm; height 22 cm; weight 15 kg.
Sensors: 10 sonar sensors; 7 infrared sensors; 2 shaft encoders; 1 flux-gate compass; 1 GPS receiver; 1 RGB camera.
Actuators: 2 DC motors.
Additional equipment: 10 PIC16F877; 1 i486DX/4 @ 133 MHz; 1 10baseT ethernet; 1 radio modem (900 MHz); 1 video transmitter (2.4 GHz); 1 NiCd 12 V 2 Ah battery; 1 NiCd 12 V 7 Ah battery.
Table 2.3: Main characteristics of Quaky-Ant.
The robot has different sensors (Table 2.3 and Figure 2.11). The most relevant characteristics are listed below:

• The ten ultrasonic sonar sensors are used to measure the distance to distant walls and objects. The control circuitry is based on the one used in MadCar and La-Garra.

• The seven infrared sensors are used to measure the distance to nearby walls and objects.

• The two high-resolution shaft encoders mounted on the axles of the driving wheels are used for odometry.

• The flux-gate compass measures the heading of the robot with respect to the Earth's magnetic North pole. It is intended to compensate odometry errors in indoor environments.

• The GPS receiver serves as an absolute positioning system. It is intended to compensate odometry errors in outdoor environments.

• The RGB camera provides video feedback in remote-controlled robotics tasks.

Figure 2.10: Quaky-Ant's lateral view. Figure 2.11: Hardware arrangement.

2.4.2 Modular architecture and interconnection

The core of the processing system is an i486DX4-based single board computer (SBC) at 133 MHz, with 32 Mb of memory and 24 Mb of solid-state disk (DiskOnChip). It runs the Java virtual machine under the Linux operating system, conveniently reduced to fit into 24 Mb. It was decided to modularise the system (Figure 2.12) so that the SBC is in charge of the high-level control software, leaving the low-level control and data-acquisition software in different units. These modules interface with each other through a standardised bus for power and inter-module communication. Each module contains its own microprocessor, which implements the inter-module communication protocol and performs the low-level signal processing required for sensor and actuator control. Only a few units are directly attached to the SBC (the GPS receiver and the RGB camera focus controller), because it was not worth adapting them to the interconnection bus.
This architecture presents the following benefits:

• Expandability: new sensor or actuator modules can be added at any time without any modification of the existing units.

• Flexibility: all modules use the same bus protocol, and thus system maintenance and upgrading do not require much effort.

• Robustness: if one module fails, the others continue working.

• Specialisation: each module contains only the hardware necessary to fulfil its mission, reducing the overall size and price of the system.

The inter-module communication bus is an I2C bus (Lekei, 1997) operating at 100 Kb/s. I2C is a bus design and communication protocol that allows multiple modules to be connected to a common two-wire bus. One wire provides a high-speed synchronous clock (SCL), while the other provides a two-way data communication line (SDA). Target modules are distinguished from one another by pre-appending an address header to each message on the data line. Only the module that matches the address acts on that message. In addition to the standard I2C lines, there are additional lines for +5 V power (VCC), ground (GND), reset (RESET), and two global sensor-disabling lines (US-INT and IR-INT). These last two lines are intended for power saving while in debug mode. All the modules connected to the bus are based on the PIC16F877 microcontroller, running at 20 MHz, with 8 Kb of program memory and 196 bytes of data memory. The I2C bus master is also attached to the SBC. The transition from the 68HC11 to the 16F877 enabled the fast development of all the different modules, at the expense of a more complex programming task (the use of a very reduced instruction set assembler instead of the C language). The PIC16F877 was chosen, among other things, because it has built-in hardware support for the I2C bus.
108 HUMBERTO M ARTÍNEZ BARBERÁ Radio Link Video Transmitter RGB Camera Serial SBC i486 GPS Receiver Serial Serial Infrared Sonar Bus Master Infrared Sonar Motor Driver Encoder I2 C Bus Infrared Sonar Infrared Sonar Infrared Sonar Motor Driver Encoder Compass Figure 2.12: Quaky-Ant modular architecture. The custom protocol that has been defined and implemented over the I2C bus uses a single -master mode configuration, with two different high-level data packet formats: PWRT for sending commands and PANS for requesting data. The protocol has been defined as stateless, so that if any packet between any other two is lost, the microcontrollers do not enter into a wait state. Should any slave get hanged for any reason, the whole system only loses the functionalities given by that microcontroller, while the others continue working. This decision has been taken for simplicity and robustness, mainly because communication deadlocks are thus avoided. As I2C uses byte-oriented and master-initiated data transmission, PWRT and PANS data packets are then converted to I 2C logic sequences. PWRT packets (Figure 2.13) are used to send commands from the master module to any slave module. The data interchange process is as follows (Figure 2.15): the master initiates the communication sending the slave address (ADDR), the code of the command to be executed (CMD), then the list of parameters needed for that command (PARAMi), and finally the end of packet code (TRAIL). ADDR CMD PARAM 1 … PARAM n TRAIL Figure 2.13: PWRT packet format. PANS packets (Figure 2.14) are used to receive data from any slave module to the master module. The data interchange process is as follows (Figure 2.15): the master initiates the communication sending the slave address (ADDR), and then the slave sends the list of data available (ANSi) and the end of packet code (TRAIL). ADDR ANS1 ANS2 … Figure 2.14: PANS packet format. 
Figure 2.15: PWRT (a) and PANS (b) packet data interchange.

Each slave module implements only a subset of the available commands, but does not produce an error when an unimplemented command arrives. Usually, the data that the slave modules return is the current reading of their own sensors. The amount of data that each module type returns depends on the sensors it implements, and the control application is responsible for knowing how to parse that data correctly. This protocol enables the interoperation of many different subsystems with a low protocol overhead. The different parameters and options are discussed in the following sections.

2.4.3 External video and data communication

One of the objectives when building this robot was mobility while remaining connected to a fixed station, both for debugging tasks and distributed control (see chapter 5) and for teleoperation (see chapter 8). In the development and test scenario, the robot is connected remotely to the Faculty of Computer Science network, where the remote-debugging host is located. Additionally, there is also a video receiver connected to another host that broadcasts images across the network. The mobility objective directly excludes the use of a tethered solution and implies the need to send data (the system status), receive data (the desired commands), and send video (visual feedback) wirelessly (Cheah, 1999). The video was not a problem, because a custom-modified video transmitter, working in the 2.4 GHz band, was reused from another project. The data communication between the different high-level systems is based on the UDP/IP protocols (see chapter 5), so support for these protocols was a must. The first option tested was a wireless ethernet link at 2 Mb/s, also working in the 2.4 GHz band. After some tests, the frequency assignment decided upon was the one that appears in Figure 2.16.
The leftmost peak corresponds to the video signal and the rightmost peak corresponds to the ethernet signal.

Figure 2.16: Wireless video and ethernet bandwidths.

The ethernet solution had a major problem: since it acts as a repeater to the wired 10/100 Mb/s fixed ethernet network, the wireless link was overloaded almost all the time due to the amount of traffic. In short, it only worked at night (when the Faculty's traffic decreases noticeably). To overcome these difficulties it was decided to use a wireless modem at 115 Kb/s, working in the 900 MHz band (Cheah, 1999). A PPP channel is established over the wireless link to support the UDP/IP protocols. This final solution provided robustness and high availability (the frequency does not interfere with either the video link or other wireless links that exist in the area) at the expense of a reduced bandwidth that excludes the use of complex real-time communication patterns. A possible improvement over these two approaches (the wireless ethernet and the wireless modem) is the use of modern IEEE 802.11 wireless LAN cards (Welch, 1999), which also operate in the 2.4 GHz band and provide 11 Mb/s transmission. It was not possible to test any of these cards during the work of this thesis, although such a link will be incorporated in the future. Another possibility is the use of a low-end computer with two network cards acting as an inexpensive router. This option was not tested because the necessary equipment was not available at the time.

2.4.4 Sonar and infrared module

Each sonar and infrared module can handle up to four sensors: two sonar sensors and two infrared sensors. The module is designed so that a flexible firing pattern can be achieved, and is based on the work on MadCar's sonar controller. In fact, the previous experience with MadCar's sonar controller sped up the development of the sonar and infrared module.
The sensor used is still the Polaroid 6500 range finder module, which operates using a single transducer for both sending and receiving the ultrasonic pulses (at 49.4 kHz). The maximum intensity width of the ultrasonic beam is typically between 15 and 20 degrees. The module can measure from 15 centimetres up to 10 metres with an accuracy of ±1% over the entire range. For infrared range measurement, this module uses the Sharp GP2D02 High Sensitive Distance Measuring Sensor. These sensors operate using an infrared LED emitter and a PSD light detector. The maximum intensity width of the light beam is typically about 10 centimetres. The sensor can measure from 10 centimetres up to 120 centimetres, with accuracy and resolution that depend on the distance.

[Figure 2.17: Sonar and infrared module schematics — LM7806 power regulation, PIC16C77 microcontroller, SN7408 AND gates gating the sonar INIT and infrared VIN lines, NPN firing transistors, sonar (INIT/BINH/ECHO) and infrared (VIN/VOUT) interface circuitry, I2C bus connectors and address selection switches.]
The board (Figure 2.17) has two I2C connectors, four address selection switches, and an additional +12 V power connector. As the 6500 modules are powered at +6 V, an LM7806 voltage regulator obtains the required voltage from the +12 V supply. No sensor is multiplexed, so each one has its own firing circuitry. The sonar INIT lines and the infrared VIN lines are globally enabled/disabled using the US-INT and IR-INT bus lines and the 7408 AND gates. The sonar sensor firing timing (Figure 2.4) and circuitry are similar to those described for MadCar's sonar controller. The microcontroller uses INIT to fire the 6500 module, which is driven through a BC237 transistor in common-emitter configuration. This acts as an inverter and a current driver to excite the 6500 module. In order to avoid detecting the emitted pulse as an echo (only one transducer is used per 6500 module), the 6500 module generates an internal blanking signal for 2.38 ms. This sets the minimum sensing distance to 0.4 m. To reduce this distance to 0.1 m, a blanking inhibit signal (BINH) is also needed. The microcontroller introduces a delay of 0.9 ms between the INIT and BINH signals. The ECHO signal from the 6500 module is attached to an interrupt-driven digital input of the microcontroller, in open-collector configuration. The infrared firing is simple (Figure 2.18). The serial protocol to interface the GP2D02 uses the VIN and VOUT lines. The microcontroller uses VIN to fire the GP2D02. After 70 ms, when a valid measure is present, the infrared triggers VOUT, so that the serial protocol interchange starts. Each bit is signalled by the microcontroller using VIN, and the infrared returns bit values using VOUT. The microcontroller should wait 0.2 ms between two contiguous VIN pulses. When finished, a valid 8-bit value has been received.

Figure 2.18: GP2D02 infrared sensor timing chart.
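The GP2D02 readout just described can be sketched as the following bit-bang routine. It is a simplified illustration, not firmware: the I/O lines are injected as callables so the sketch stays hardware-agnostic, the wait for the VOUT trigger is reduced to a fixed delay, and MSB-first bit order is an assumption.

```python
import time

def read_gp2d02(set_vin, get_vout, wait_s=0.070, bit_gap_s=0.0002):
    """Read one 8-bit GP2D02 measurement following the timing described above.
    set_vin(level) drives the VIN line; get_vout() samples the VOUT line.
    Bit order (MSB first) is an assumption of this sketch."""
    set_vin(0)                 # falling edge on VIN starts a measurement
    time.sleep(wait_s)         # ~70 ms until a valid measure is present
    value = 0
    for _ in range(8):         # clock out 8 bits
        set_vin(1)
        set_vin(0)             # one VIN pulse per bit
        value = (value << 1) | (get_vout() & 1)
        time.sleep(bit_gap_s)  # >= 0.2 ms between contiguous VIN pulses
    set_vin(1)                 # leave VIN high until the next measurement
    return value
```

With the stated 70 ms conversion time plus eight 0.2 ms bit gaps, one infrared reading takes roughly 72 ms, which is consistent with the 100 ms minimum sensor cycle discussed in the next subsection.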
The board interprets three different commands, implemented using the custom PANS/PWRT protocol: two for setting the firing masks for the sonar (US-PAT) and infrared (IR-PAT) sensors, and one for actually firing them (START). Each mask specifies how many sensors of its class are to be fired: none, one, or both. Between a START and the following PANS packet there should be a minimum period of 100 ms in order to obtain valid readings. Otherwise, the board returns the last valid set of measures. The typical sensor cycle (Figure 2.19) is as follows: the master sends a US-PAT command to set the sonar mask, then sends an IR-PAT command to set the infrared mask, and finally sends a START command to fire the specified sensors simultaneously. After a time greater than 100 ms, the master can send a PANS packet to receive data. It is important to notice here that, whatever sensor masks are set, the board always sends the last valid readings corresponding to all the sensors.

Figure 2.19: Typical sonar and infrared protocol sequence.

2.4.5 Motor control module

Each motor control module can handle one motor, one encoder and up to eight digital inputs, intended for bumpers or switches. The board (Figure 2.20) has one I2C connector, one encoder connector, one motor connector, four address selection switches, and an additional +12 V power connector. Each motor control board is attached to a Maxon A-max 32 motor (12 V, 15 W) and an HP HEDS-5540 three-channel, 500 pulses per revolution optical encoder. The motor is geared with a 60:1 ratio, so that 30,000 pulses are generated per wheel revolution. As the wheels have a diameter of 14.87 cm, in theory a high accuracy can be obtained. The microcontroller interfaces the encoder using three interrupt-driven digital input ports to process the two quadrature signals (A and B) and the index signal (Z). To interface the motor, each board has an L298 dual H-bridge current driver, with both drivers in parallel to support up to 4 A at 12 V.
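The encoder figures above imply a very fine odometric resolution, which the following arithmetic sketch makes explicit (it only restates the numbers given in the text: 500 pulses per revolution, 60:1 gearing, 14.87 cm wheel diameter).

```python
import math

# Odometry resolution implied by the figures above: a 500-pulse encoder
# behind a 60:1 gearbox gives 30,000 pulses per wheel revolution.
PULSES_PER_REV = 500 * 60      # 30,000
WHEEL_DIAMETER_M = 0.1487      # 14.87 cm

def pulses_to_metres(pulses: int) -> float:
    """Convert an encoder pulse count to linear wheel travel in metres."""
    return pulses * math.pi * WHEEL_DIAMETER_M / PULSES_PER_REV
```

One pulse therefore corresponds to about 15.6 micrometres of wheel travel, which is why the text speaks of a theoretically high accuracy (in practice wheel slip, not encoder resolution, limits odometry).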
The microcontroller then sets the speed of the motor by varying the duty cycle of a 4 kHz PWM signal. To change the direction of turn or to brake the motor, it uses two lines (A1 and A2), and to disable the motor (free-running mode) it uses an enable signal (E).

[Figure 2.20: Motor control module schematics — power supply, L298 dual H-bridge driver with protection diodes, PIC16C77 microcontroller, encoder (A/B/Z) and bumper inputs, I2C bus and address selection connectors.]

There are two different commands that the board interprets: one to enable the motor (ENA) and the other to change the speed and direction (CTRL). A CTRL command is valid until a new CTRL command is received, independently of the ENA command. The pulse counter is a 3-byte register, to permit flexible encoder management. Whenever the master issues a PANS, the microcontroller returns the current encoder value and the current bumper status, and then resets the encoder counter.

2.4.6 Compass module

The compass board (Figure 2.21) has two I2C connectors and four address selection switches. It uses a Precision Navigation Vector 2XG mechanically gimballed 2-axis electronic compass.
It features an accuracy of 2° RMS with a resolution of 1°. The microcontroller polls the Vector 2XG at its maximum rate (5 Hz). When reading data from the compass (Figure 2.22), the microcontroller starts by setting the polling signal (P/C) and then waits approximately 100 ms until the compass gets a valid measure and sets the end-of-capture line (EOC). The microcontroller then sets the slave mode line (SS), so that the serial protocol interchange starts. Each bit is signalled by the microcontroller using SCLK, and the compass returns bit values using SDO. The microcontroller should wait 0.1 ms between two contiguous SCLK pulses. When finished, two valid 16-bit values, corresponding to the X and Y components of the heading, have been received.

[Figure 2.21: Compass module schematics — PIC16C77 microcontroller, Vector 2XG interface lines (P/C, EOC, SS, SCLK, SDO, plus configuration lines such as BCD/BIN, X FLIP, Y FLIP, CAL, RAW, M/S), I2C bus and address selection connectors.]

Figure 2.22: Vector 2XG timing chart.

As the compass board works in continuous polling mode, it does not need any special configuration, and so the board does not implement any command. Whenever the master issues a PANS, the microcontroller returns the last valid reading of the compass, consisting of the X and Y values.
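The X and Y heading components returned above can be turned into a single heading angle on the host side. The text does not say how the components are combined, so treating them as a heading vector and using atan2 is an assumption of this sketch.

```python
import math

def heading_degrees(x: float, y: float) -> float:
    """Convert the two 16-bit heading components read from the compass into a
    heading in [0, 360) degrees. Interpreting (x, y) as a heading vector and
    applying atan2 is an assumption; the thesis only states that the X and Y
    components are received."""
    return math.degrees(math.atan2(y, x)) % 360.0
```

The modulo keeps the result in a single 0–360 degree turn regardless of the signs of the raw components.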
2.4.7 Bus master module

The bus master board (Figure 2.23) has one I2C connector, one RS-232 connector, one 20x2 LCD connector, and one +12 V power connector. The +12 V input is tied to an LM7805 voltage regulator to obtain the +5 V needed to power all the boards connected to the bus. The microcontroller's logic is in charge of accepting and sending serial data through the RS-232 port, using a MAX232 integrated circuit. The serial packet format is similar to the bus packet format. The RS-232 port is set at a speed of 38.4 Kbaud. In addition to the commands supported by the different boards, the bus master also accepts commands to send or format data on the LCD display.

[Figure 2.23: Bus master module schematics — LM7805 power supply, PIC16C77 microcontroller, MAX232 RS-232 level converter, LCD connector with contrast potentiometer, US-INT/IR-INT lines, and I2C bus connector.]
2.4.8 Conclusions

The Quaky-Ant is a simple but powerful design that sticks to the operational requirements set out and discussed at the beginning of the chapter. When measuring cost, performance and expandability, Quaky-Ant compares favourably with commercial robots. The main characteristics of this robot can be summarised as:

• It features an open architecture. Two key points are the use of a standard I2C bus for the interconnection of the different modules and the definition of a generic open protocol (PWRT/PANS packets).
• One consequence of the design of the architecture is its robustness (mainly due to the custom protocols) and its expandability (all the modules are addressed the same way irrespective of their nature).
• The architecture is not tied to any control architecture or development system, although one has been developed (see chapter 5).
• It operates smoothly and correctly in indoor environments and is also capable of operating in non-rough outdoor environments. Either way, it does not exhibit an outstanding top speed (approximately 0.45 m/s), because this was not a design concern. In contrast, it exhibits good manoeuvrability and precision.

Chapter 3

Sensor Fusion

3.1 Introduction

A mobile robot collects information about its environment through external sensors. This information is combined with that provided by internal sensors, and both are used to perform navigation tasks. External sensors used in mobile robots are ultrasonic, infrared, and laser sensors, vision cameras and others. Each of them provides readings with a degree of uncertainty depending on the sensor characteristics. In many applications, matching inaccurate information coming from various sensors results in a reduction of the uncertainty of the global sensor information. The Quaky-Ant robot (see chapter 2) has ultrasonic and infrared sensors.
Sonar sensors measure distances up to nine metres, but short distances are very inaccurate and they have many problems with multiple reflections and wide beam width. Infrared sensors measure up to eighty centimetres, with less uncertainty than sonar in obstacle avoidance tasks (due to their narrower aperture). Both kinds of sensors are placed with similar orientation and location around the robot, and thus the redundant sensors can reduce the uncertainty of the information. Many techniques have been studied for the process of integrating data from different sensors (sensor fusion). These methods translate the different sensory inputs into reliable estimates that can be used by other navigation subsystems. There have been several reviews of sensor fusion (Luo, 1989), (Kam, 1997). Luo (Luo, 1989) shows the advantages gained through the use of multi-sensory information as a combination of four aspects: redundancy, complementarity, timeliness and cost of information. Kam (Kam, 1997) makes a review describing integration techniques in two categories: low-level fusion, used for direct integration of sensory data, and high-level fusion, used for indirect integration of sensory information through control signals or commands suggested by different modules in the control architecture. Sensor fusion with known statistics relies on well-developed techniques such as Kalman filtering or Bayesian theory. Their goal is to determine a parameter θ ∈ Rm from a set of observations z ∈ Rn, where both are related through a non-linear function g(z, θ) = 0. Extended Kalman filtering (Leonard, 1992) is the most popular tool proposed in the literature for sensor fusion in mobile robot navigation. If there is no model of the uncertainty, other specific techniques such as rule-based sensor fusion (Flynn, 1988), fuzzy logic or neural networks (Chang, 1996) and others are used.
These algorithms do not require an explicit analytical model, thus resulting in more flexible models. These models are obtained by taking into account experiments (inputs and outputs) with no prior knowledge, or the knowledge of an expert operator expressed through descriptors or rules. One of the goals of this thesis is for the Quaky-Ant robot to perform delivery tasks in an indoor office-like environment, without prior knowledge of the floor plan, and without any modification or special engineering of the environment. This objective, among other things, dictates the necessity of having reliable sensors both to build maps and to navigate the robot. Quaky-Ant has only low-cost sensors that return a distance as output: ultrasonic sonars and infrared sensors. These two kinds of sensors can be complementary (each one has its own error sources), and a natural way to cope with their uncertainty is to perform signal-level sensor fusion. Thus, the output of each pair of sensors targeting the same area is fused into what is called a virtual sensor. As the robot builds its own map without an a priori world model, fusion techniques that need a model of the environment in advance (like Kalman filters) are not applicable. Supervised learning based techniques have been previously used for fusion and are also used in this thesis, in particular a neural network and a fuzzy reasoning system. One of the most crucial problems when using a supervised learning method for sensor fusion is data acquisition. This acquisition is not trivial because quite a few representative sensor measurements are needed: both the sensor reading and the real distance, for different incidence angles, object surfaces, and distances. In this case, the real experimental set-up is very difficult. Although important, the real set-up is very time consuming, and thus a different approach is taken. There are good sensor models available in the literature, and thus the data acquisition is performed using sensor simulation.
The fusion method is trained with these data, and then the fusion model is validated in real-world environments through an iterative process (Figure 3.1). The fusion module accepts the distance measured by each sensor and produces the fused distance, which is used by other robotics-related tasks like map building (chapter 4), robot control (chapter 5), or feature-level sensor fusion (chapter 6).

Figure 3.1: Fusion method training and validation.

3.2 Sensor models

3.2.1 Ultrasonic sonar range finders

Quaky-Ant uses the Polaroid 6500 series Ultrasonic Sonar Range Finder. These sensors operate using a single transducer for both sending and receiving the ultrasonic pulses (at 49.4 kHz). The maximum intensity width of the ultrasonic beam is typically about 15–20 degrees. The module can measure from 15 centimetres up to 10 metres with an accuracy of ±1% over the entire range. There are three fundamental kinds of error:

• Loss of echo signal. When the angle of incidence between the sonar and a given object is wide enough, the pulses do not go back to the receiver and the sensor does not detect the object.
• Uncertainty due to angular aperture. Given the typical sonar beam, the further away the object is, the more uncertainty the measure has. The uncertainty zone of the sonar is a cone with its vertex located on the sensor. The distance to the object is known, but its position is not.
• Double rebounds. Sometimes, when the angle of incidence between the sonar and the object is large, the reflected pulse can reach another object and then go back to the sensor. In this case the sensor detects the object but produces a wrong measurement (longer than the real one).

There are some sonar models available in the literature (Barshan, 1990), (Leonard, 1992), based on the physical model of an ultrasonic pulse reaching planes and corners.
These methods have a local scope, because they decompose the environment into discrete elements (segments, corners, cylinders, etc.) and then model the sonar behaviour with respect to each element. The major drawbacks of these methods are that a given environment must first be decomposed into the different elements, and that they fail to model double rebounds, which are very significant in a cluttered environment. To cope with these problems there are techniques (Gallardo, 1999) based on the well-known ray-tracing method from computer graphics (Watt, 1992) and the sonar models described below. The sonar-tracing method for a given sonar is as follows (Gallardo, 1999):

• Let θ be the orientation of the sonar. First compute the sensibility level Δθd (Eq. 3.1). Then send δ rays from θ − Δθd to θ + Δθd.
• Recursively follow each of the rays until they reach the sensor.
• Return the distance of the shortest path.

To compute the path of an emitted ray r_out, first find the intersection point p with the environment, and the reflected direction r_ref. Then test whether r_ref reaches the sensor with an amplitude greater than a0. In this case the distance d is calculated (Eq. 3.2) from the path followed by the ray (p1 … pn). If the amplitude is not greater than a0, then compute the next reflection in the r_ref direction. In this way r_ref becomes r_out, and the same method is applied again.

The method to compute the amplitude of a ray is as follows:

• Let r_in be the ray from the transducer to the intersection point p.
• Compute the angle Δκ between r_in and r_ref.
• Compute the angle Δη between r_in and θ.
• Compute the sensibility a (Eq. 3.3).

    Δθd = θ0 · sqrt(−ln(a0) / 2)    (3.1)

    d = Σ_{i=1..n} p_i    (3.2)

    a = exp(−2 · (Δθ²/θ0² + Δη²/θ0² + Δκ²/κ0²))    (3.3)

This method produces quite good results because it can model the three sonar error sources.
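The Gaussian sensibility model of Eqs. 3.1 and 3.3 can be sketched directly. This follows the reconstructed reading of the equations above, so the exact functional form is an assumption; θ0 and κ0 are the beam-width parameters of the model.

```python
import math

def sensibility_width(theta0: float, a0: float) -> float:
    """Delta-theta_d from Eq. 3.1: the half-angle within which the emitted
    amplitude stays above the threshold a0 (reconstructed form)."""
    return theta0 * math.sqrt(-math.log(a0) / 2.0)

def sensibility(d_theta, d_eta, d_kappa, theta0, kappa0):
    """Ray amplitude from Eq. 3.3: Gaussian fall-off in the emission angle
    (d_theta), the reception angle (d_eta) and the reflection angle (d_kappa)."""
    return math.exp(-2.0 * ((d_theta / theta0) ** 2
                            + (d_eta / theta0) ** 2
                            + (d_kappa / kappa0) ** 2))
```

The two equations are consistent under this reading: a perfectly specular, perfectly received ray emitted at exactly Δθd from the axis has amplitude a0, which is the threshold used to decide whether a reflected ray still counts as an echo.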
Nevertheless, it does not take into account the main-lobe effect in not completely specular environments, which is the case in real-world environments. Thus, when the minimum-distance ray is far from the centre of the lobe, the resulting time-of-flight is quite conservative and produces distances shorter than the real ones. To produce more realistic distances, the previous method has been modified to take into account the angular distance from the centre of the lobe. The new addition is as follows:

• Let tof_i be the rays that reach the sensor, and tof_min the one with the minimum distance.
• Let sε be the set of rays such that |tof_i − tof_min| < ε.
• Compute the corrected minimum distance tof_out as a weighted average of the rays in sε (Eq. 3.4).

    wgt_i = 1 − (Δη_i − Δη_min)²

    tof_out = Σ_{i∈sε} wgt_i · tof_i / Σ_{i∈sε} wgt_i    (3.4)

In this case, ε has been empirically set to 10 centimetres, which produces fair results. The original method gives results with more uncertainty in regular environments, while the modified one produces more consistent measures (Figure 3.2).

Figure 3.2: Sonar-tracing original method (left) and modified method (right).

3.2.2 Infrared sensors

Quaky-Ant uses the Sharp GP2D02 High Sensitive Distance Measuring Sensor. These sensors operate using an infrared LED emitter and a PSD light detector. The maximum intensity width of the light beam is typically about 10 centimetres. The module can measure from 10 centimetres up to 120 centimetres, with accuracy and resolution depending on the distance. There are three fundamental kinds of error:

• Object colour/reflectivity. As light reflectivity varies with colour, the darker the object, the further away it appears. The accuracy degrades quickly with the distance. The object can even remain undetected.
• Global illuminance. As regular illumination contains light in the infrared band, the brighter the light, the further away the object appears.
• Quantisation error.
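The lobe correction of Eq. 3.4 can be sketched as follows. Taking Δη_min as the reception angle of the minimum-distance ray is an assumption of this sketch (the text does not define it explicitly), and distances stand in for times of flight, matching the ε given in centimetres.

```python
# Sketch of the lobe-corrected time-of-flight of Eq. 3.4: among the returning
# rays within epsilon of the shortest one, average the distances, weighting
# each ray by its angular closeness to the minimum-distance ray.
def corrected_tof(rays, eps=0.10):
    """rays: list of (tof, d_eta) pairs, one per ray reaching the sensor;
    returns the corrected distance tof_out."""
    tof_min = min(t for t, _ in rays)
    eta_min = next(e for t, e in rays if t == tof_min)   # assumed Delta-eta_min
    s_eps = [(t, e) for t, e in rays if abs(t - tof_min) < eps]
    weights = [1.0 - (e - eta_min) ** 2 for _, e in s_eps]
    return sum(w * t for w, (t, _) in zip(weights, s_eps)) / sum(weights)
```

Rays far outside the ε band are simply excluded, so a single distant double-rebound ray cannot drag the corrected reading away from the near cluster.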
This sensor converts the measure to 8 bits. This conversion is not linear over the full range, and thus, the further away the object, the less accurate the reading.

Figure 3.3: Sharp GP2D02 distance output (left) and sensing range (right).

As there are not many infrared models in the literature, a simple model based on the manufacturer's datasheet has been developed. The angle of incidence is not as important as in sonars, and thus it is simply ignored. The output of the infrared sensor is not linear with the distance (Figure 3.3). Due to this non-linearity, the sensor response is modelled using the following equations (Eq. 3.5), with the response decomposed into a piece-wise linear curve:

    y = −0.25x + 95.00     x ∈ [60, 120]
    y = −1.14x + 148.57    x ∈ [25, 60]
    y = −5.33x + 253.33    x ∈ [10, 25]    (3.5)

where y is the output of the sensor and x is the real distance to the wall. The output of the sensor is an 8-bit value. This model is rather approximate for objects with reflectivity between 18% and 90%, and the range of maximum accuracy is between 10 and 80 centimetres.

3.3 Sensor fusion using neural networks

As stated above, neural networks (see chapter 1) are used for solving the fusion problem. Many authors have used neural networks as sensor fusion models with sonar sensors (Chang, 1996) or acoustic sensors (Pearson et al., 1995). Others use them as interpreters (Thrun, 1998a), (Santos, 1994), where information coming from different sonar sensors is merged to produce a probability value of space occupancy. There are many antecedents of sensor fusion with sensors of different natures as well. Buchberger (Buchberger, 1993) combines information coming from a laser radar and some sonar sensors, and Courtney (Courtney, 1994) uses readings from ultrasonic, vision and infrared sensors to localise a mobile robot. In this case, a standard multilayer perceptron is used.
The following subsection describes the training process and presents some results.

3.3.1 Experiments and results

To assess the neural network fusion model, the simple but effective Flynn's rules model (Flynn, 1988) is used for comparison. This fusion method is as follows:

• If the sonar reading is greater than the maximum range of the infrared sensor, then ignore the infrared sensor.
• If the sonar reading is at its maximum value, then the real distance is greater.
• Whenever the infrared sensor detects a change from non-detection to detection, and the associated sonar reading is less than 10 feet, then a valid depth discontinuity has been detected. In this way, the original sonar boundary is redrawn to take into account features that are found by the infrared sensor.

Although this model works well in simulators and in some real environments, it presents some problems:

• The rules do not exploit the fact that modern infrared sensors (like those of Quaky-Ant) have a greater accuracy, and the robot should rely on them to get accurate short-distance measures. A better fusion method should give more importance to the infrareds.
• It does not take into account the physical characteristics of each pair of sensors: the sensor manufacturing process is not really uniform and there are some calibration differences between sensors (Gallardo, 1999).

Two different scenarios have been used both for collecting sensor data and for neural network validation. The first one is heavily influenced by the basement of the Computer Science Faculty (where our lab is located). Its overall dimensions are 17 by 22 metres, with corridors 2 metres wide and doorways of 80 centimetres (Figure 3.4). The other one is based on the floor plan of the Trinity College Robot Fire Fighting Contest¹. Its overall dimensions are 3.5 by 3.5 metres, with corridors and doorways of 65 centimetres (Figure 3.5).
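The first two of Flynn's rules above can be sketched per reading as follows. This is a deliberate simplification: the depth-discontinuity rule operates on map boundaries rather than single readings and is omitted, and the range constants are assumptions (infrared valid up to 0.8 m, sonar saturating at 10 m), not values taken from Flynn's paper.

```python
# Simplified per-reading sketch of Flynn's rule-based fusion. Range
# constants are assumptions for illustration; distances in metres.
IR_MAX = 0.8       # assumed maximum infrared range
SONAR_MAX = 10.0   # assumed sonar saturation reading

def flynn_fuse(sonar: float, infrared: float) -> float:
    if sonar >= SONAR_MAX:
        # Saturated sonar: the real distance is greater than the reading.
        return SONAR_MAX
    if sonar > IR_MAX:
        # Infrared cannot see that far: ignore it.
        return sonar
    # Within infrared range, trust the infrared reading.
    return infrared
```

This sketch also makes the first criticism above concrete: within infrared range the rules switch sensors outright instead of weighting the more accurate infrared against the sonar.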
¹ http://www.trincoll.edu/~robot

Figure 3.4: Basement environment.

Figure 3.5: Trinity environment.

The data acquisition procedure consists of storing triplets of data (sonar, infrared, and real measures) at different positions along two different, manually defined paths in both scenarios (Figure 3.6). This data set is divided into two sets: the training set, which is directly used for neural network training (75% of the data set), and the test set (25% of the data set), which is used for neural network validation. It can be observed that the data set is not regular and contains many contradictory values, as happens in the real world. This fact reinforces the idea that learning with simulated data can be helpful in a real-world scenario.

Figure 3.6: Training data set.

The neural networks are defined by the following parameters:

• The topology, defined by the number of input, hidden and output nodes.
• The learning rate, which governs the influence of each example on learning.
• The momentum, which indicates the strength of the vector pointing in the direction of highest error variation.
• The number of epochs, which defines the duration of the training procedure.

In order to select appropriate values for the network parameters, an iterative procedure is followed (Figure 3.7). Instead of training a single network, different networks with different parameter values are trained, and the best network (the one with the minimum test-set error) is returned as the output of the learning procedure. To speed up the execution of the neural network training procedure, four hosts (on a variety of different platforms) have been involved, each one running a backpropagation algorithm, the only difference between them being the computational power of the machine they run on.

Figure 3.7: Neural network training process.
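The selection procedure above amounts to a grid search over the two free parameters. The sketch below reproduces only that outer loop; the backpropagation trainer itself is injected as a callable, since it is not reproduced here.

```python
# Sketch of the parameter search described above: train one network per
# (hidden nodes, learning rate) pair and keep the one with the minimum
# test-set error. The parameter grids match the sets given in the text.
HIDDEN = range(5, 85, 5)            # {5, 10, 15, ..., 80}
RATES = [0.01, 0.06, 0.11, 0.16]

def grid_search(train_and_score):
    """train_and_score(hidden, rate) -> test-set error of the trained net;
    returns the best (hidden, rate, error) of the 16 x 4 = 64 experiments."""
    best = None
    for h in HIDDEN:
        for lr in RATES:
            err = train_and_score(h, lr)
            if best is None or err < best[2]:
                best = (h, lr, err)
    return best
```

Since the 64 experiments are independent, they can be farmed out to several hosts exactly as the text describes; only the final minimum over the collected errors has to be centralised.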
The number of input and output nodes is fixed: two input nodes (sonar and infrared sensors) and one output node (virtual sensor). The momentum and the number of epochs have also been fixed. The momentum is not important for a coarse-grained search, and the number of epochs, if set high enough, is only an arbitrary measure. Thus, for each learning experiment, a series of hidden node counts and learning rates have been tried: the number of hidden nodes is taken from the set {5, 10, 15, ..., 80}, and the learning rate from the set {0.01, 0.06, 0.11, 0.16}. A total of 16 x 4 = 64 learning experiments have been run. By applying the automated training procedure to the acquired data set, the best test-set error is obtained (Figure 3.8) with the following parameters: 10 hidden nodes and a learning rate of 0.04363 in 20,000 epochs.

Figure 3.8: Error evolution with different training parameters.

The output of the neural network approximates and smoothes the data set (Figure 3.9). One point to notice is that the same neural network is used for each pair of sensors, with all of them exhibiting the same physical characteristics. If data were to be collected from the real world, it would be necessary to obtain different data sets, and then produce a series of neural networks, one for each pair of sensors. Gallardo (Gallardo, 1999) reported that there are noticeable differences among a series of Polaroid 6500 sonar sensors.

Figure 3.9: Neural network fusion surface.

After the training process has finished, the robot is run in both scenarios along different predefined paths, taking measurements from both sensors. These measurements are fused and compared to the real distance to calculate the error of the fused distance. The purpose of this step is to test the generalisation capabilities of the neural network, and to quantify how well it performs. A Euclidean metric is used to compute the absolute error of all the sensors, measured in metres (Eq. 3.6).
err = √( ∑_{i=1}^{n} (virtu_i − real_i)² )   (3.6)

where virtu_i is the fused distance and real_i is the real distance to the wall. The results of the tests are summarised in tables 3.1 and 3.2. They represent the average error (Avg), the variance (Var), and the root mean square error (RMSE) using the previously defined error metric. It can be noted that the neural network fusion performs better in all cases except in the trinity scenario, in which its average error is higher. This is due to the fact that the walls are always so close that most of the time the infrared suffices to approximate the real measure, while under Flynn's rules the sonar is only used for long distances.

Method    Avg      Var        RMSE
ANN       0.5499   9.5839     0.6638
IR        0.4278   8.6730     0.5194
SONAR     3.7716   192.4129   4.0186
FLYNN     0.4841   17.0875    0.6778

Table 3.1: Errors in the trinity scenario.

Method    Avg      Var        RMSE
ANN       2.8014   153.7345   3.3804
IR        4.4054   256.6320   5.0515
SONAR     4.7821   123.8674   5.0812
FLYNN     3.2924   143.5941   3.7761

Table 3.2: Errors in the basement scenario.

3.4 Sensor fusion using fuzzy rules

In addition to simple neural networks, this thesis approaches the fusion problem by way of fuzzy rule systems (see chapter 1). In this case, the rules are obtained using a hybrid learning mechanism. The process is composed of three techniques, each one carrying out one of the different phases of fuzzy modelling. First, a technique to find the number of rules that compose the system (the system structure) is needed. A self-organised clustering method called mountain clustering is used to find the number of different behaviours in the data set. The output of this method is a set of initial centroids, whose size is the number of rules in the system. These centroids are the input to another clustering method whose goal is to tune them in order to obtain improved centroids, which constitute the initial parameters of the rules.
This step is performed by the K-means, the fuzzy K-means or the fuzzy tabu clustering methods. As the final step in this scheme, an adaptive neural network called ANFIS carries out the tuning of the rules. The following subsections describe the different algorithms that are part of the learning process.

3.4.1 Some background on clustering methods

Clustering is an important technique used in searching for structure in data. Given a finite set of data, p, the problem of clustering in p is to find a collection of cluster centres which can properly characterise relevant classes of p. In classical cluster analysis, these classes are required to form a partition of p, such that the degree of association is strong for data lying in the same block of the partition and weak for data in different blocks (Duda et al., 1973). If this requirement is relaxed, the crisp partition on p is replaced with the weaker requirement of a fuzzy partition on p, and the problem therefore falls in the area of fuzzy clustering. The fuzzy clustering problem may be stated as follows: given mp patterns in R^n, allocate each pattern to a cluster with a certain degree of membership such that an objective function is minimised. This function has three parameters: the patterns, the clusters and the memberships. Generally, this function is a sum of squared distances between each pattern and each centroid of the clusters, multiplied by the membership to that cluster. Several distances may be used: Hamming, Euclidean, Mahalanobis, maximum, Minkowski, etc. (Yuan et al., 1995). The fuzzy clustering problem can be mathematically described as follows (Eq. 3.7):

Minimise J(p, c, u)   (3.7)

subject to:

∑_{i=1}^{mp} u_ij = 1,  j ∈ [1, ..., nc]
0 ≤ u_ij ≤ 1
p_i ∈ R^n,  i ∈ [1, ..., mp]
c_j ∈ R^n,  j ∈ [1, ..., nc]

where:
• J is the objective function (given)
• mp is the number of patterns (given)
• p is the pattern vector (given)
• nc is the number of clusters (given)
• c is the centroids vector (to be found)
• u is the memberships matrix (to be found)

and the memberships matrix is defined as follows:
• u_ij is the degree of membership of pattern p_i to the cluster c_j

The modified mountain clustering is an improvement of the mountain clustering method proposed in (Yager and Filev, 1994). It features a computing time which is proportional to the number of data points instead of their dimension. At the beginning, all the data in the training set are candidates for clusters. The second step is to calculate a density function, called the mountain, for each candidate. In the third step, the point with maximum density is chosen as the next cluster. After this, the effect of the last cluster found must be eliminated, so a Gaussian component, centred on this cluster and proportional to its height, is subtracted from its density function. If the density of the last cluster is higher than a predetermined constant, the process continues at the second step.

The k-means clustering divides the data X = {x_1, ..., x_n} into a known number of groups {C_1, ..., C_k}, and finds the point (centre) c_j of each group that minimises a cost function J. The groups are defined by a k x n matrix U, where u_i,j = 1 ⇔ x_j ∈ C_i. If every c_j is fixed, then the u_i,j values minimising J are (Eq. 3.8):

u_i,j = 1  if ‖x_j − c_i‖² ≤ ‖x_j − c_k‖² ∀k ≠ i
u_i,j = 0  otherwise   (3.8)

On the other hand, if every u_i,j is fixed, the c_j that minimise J are the centres of the areas of each group (Eq. 3.9):

c_j = (1 / |C_j|) ∑_{x_k ∈ C_j} x_k   (3.9)

Therefore the clustering process is as follows:

1. Initialise the centres c_j randomly.
2. Obtain the U matrix.
3. Modify the centres c_j.
4. If J does not reach a tolerance level, go to step 2.
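The four k-means steps above can be sketched as follows; the random initialisation from data points and the toy two-blob data set are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, tol=1e-6, max_iter=100, seed=0):
    """Hard k-means following the four steps above: initialise the centres,
    build the U matrix (Eq. 3.8), move the centres (Eq. 3.9), repeat."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]  # step 1
    prev_J = np.inf
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centre (Eq. 3.8)
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        J = d2[np.arange(len(X)), labels].sum()
        # Step 3: move each centre to the mean of its group (Eq. 3.9)
        centres = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centres[j] for j in range(k)])
        # Step 4: stop once the cost J stabilises
        if abs(prev_J - J) < tol:
            break
        prev_J = J
    return centres, labels

# Two well-separated blobs: k-means should place one centre on each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centres, labels = kmeans(X, 2)
print(np.sort(centres[:, 0]).round(2))
```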
The fuzzy k-means clustering is an improvement on the previous method in which the data belong to each cluster in a fuzzy way: the U matrix is no longer binary, and the u_i,j take values in the range [0..1]. One technique of fuzzy clustering is the fuzzy c-means clustering method, which is based on fuzzy c-partitions, where c designates the number of fuzzy classes in the partition. The fuzzy c-means clustering algorithm was initially developed by Dunn and later generalised by Bezdek (Bezdek, 1981). It is an iterative procedure for approximately minimising the objective function J by means of a Picard iteration. A characteristic of this procedure is that it converges q-linearly to either a local minimum or a saddle point of J.

3.4.2 Fuzzy tabu clustering

In order to deal with the possible existence of multiple local minima, a new algorithm has been developed based on a tabu search technique while preserving the ideas of the fuzzy c-means algorithm. The tabu search technique differs from the well-known hill-climbing local search techniques in that it does not become trapped in locally optimal solutions, as is the case with the classical fuzzy c-means (Al-Sultan, 1995). The algorithm has been implemented and tested on various problems, and the results are very encouraging.

3.4.2.1 The tabu search

The tabu search is a heuristic that can be used to solve combinatorial optimisation problems. It allows moves out of a current solution that make the objective function worse, in the hope that it will eventually achieve a better solution (Al-Sultan, 1995). The tabu search requires the following basic elements to be defined:
• Configuration: a solution or an assignment of values to variables.
• Move: the process of generating a feasible solution to the combinatorial problem that is related to the current solution (i.e. a move is a procedure by which a new trial solution is generated from the current one).
• Set of candidate moves: the set of all possible moves out of a current configuration. If this set is too large, one could operate with a subset of it.
• Tabu restrictions: certain conditions imposed on moves that make some of them forbidden. These forbidden moves are known as tabu. To take these tabu moves into account, a list of a given size records them; it is called the tabu list.
• Aspiration criteria: rules that override tabu restrictions, i.e. if a certain move is forbidden by a tabu restriction, then the aspiration criterion, when satisfied, can make this move allowable.

Given the above basic elements, the tabu search scheme can be described as follows: start with a certain (current) configuration and evaluate the criterion function for that configuration. Then, follow a certain set of candidate moves. If the best of these moves is not tabu, or if it is tabu but satisfies the aspiration criterion, then pick that move and consider it to be the next current configuration; otherwise, pick the best move that is not tabu and consider it to be the new current configuration. Repeat the procedure for a certain number of iterations. On termination, the best solution obtained so far is the solution returned by the algorithm. The tabu list has a certain size, and when the length of the tabu list reaches that size and a new move enters the list, the first move on the list is freed from being tabu and the process continues (i.e. the tabu list is circular).

3.4.2.2 The fuzzy tabu clustering algorithm

In this section a new tabu search-based algorithm for fuzzy clustering is presented (Delgado et al., 1997). Before the algorithm is stated, some notation is needed. The objective function (Eq. 3.10) is similar to that of the fuzzy c-means algorithm (Liu et al., 1995), where the parameter m is the fuzziness of the clustering.
The Euclidean distance is always used in the calculations, although a different metric could be used. The aim of this method is to divide the data P = {p_1, ..., p_mp} into a known number of groups C = {C_1, ..., C_nc}, and to find the point (centre) c_j of each group that minimises a cost function J. The groups are defined by an mp x nc matrix U, where u_ij ∈ [0..1] is the degree of membership of p_i to cluster C_j. The cost function J is shown in (Eq. 3.10):

J = ∑_{i=1}^{mp} ∑_{j=1}^{nc} ‖p_i − c_j‖² · u_ij^m   (3.10)

The centroid of each cluster is calculated as follows (Eq. 3.11):

c_j = ( ∑_{i=1}^{mp} p_i · u_ij^m ) / ( ∑_{i=1}^{mp} u_ij^m )   (3.11)

The membership of a pattern to a cluster is calculated as follows (Eq. 3.12):

u_ij = (1 / ‖p_i − c_j‖²)^{1/(m−1)} / ∑_{k=1}^{nc} (1 / ‖p_i − c_k‖²)^{1/(m−1)}   (3.12)

To simplify the explanations, let a configuration be a tuple composed of a centroids vector and the corresponding degree of membership matrix, denoted A_k. Each tentative solution will be a specific configuration, calculated from equations (Eq. 3.11) or (Eq. 3.12) depending on the method used to generate these tentatives. These methods are discussed later on. In addition, the following parameter names are used:
• MTLS: maximum tabu list size.
• P: probability threshold.
• NTS: number of trial solutions.
• ITMAX: maximum number of iterations.
• TLL: tabu list length.

The algorithm can be described by means of four simple steps, as follows:

(s1) Initialisation step. Let A_0 be an arbitrary solution and J_0 be the corresponding objective function value computed using equation (Eq. 3.10). Let A_b = A_0 and J_b = J_0. Select values for the following parameters: MTLS, P, NTS, and ITMAX. Let k = 1, let TLL = 0, and go to step s2.

(s2) Generation step. Using A_b, generate NTS trial solutions A_k^1, A_k^2, ..., A_k^NTS, evaluate their corresponding objective function values J_k^1, J_k^2, ..., J_k^NTS, and go to step s3.

(s3) Selection step. Order J_k^1, J_k^2, ..., J_k^NTS in ascending order and denote them by J_k^[1], J_k^[2], ..., J_k^[NTS]. If A_k^[1] is not tabu, or if it is tabu but J_k^[1] < J_b, then let A_c = A_k^[1] and J_c = J_k^[1], and go to step s4; otherwise let A_c = A_k^[L] and J_c = J_k^[L], where J_k^[L] is the best objective function value of J_k^[1], J_k^[2], ..., J_k^[NTS] that is not tabu, and go to step s4. If all of J_k^[1], J_k^[2], ..., J_k^[NTS] are tabu, go to step s2.

(s4) Modification step. Insert A_c at the bottom of the tabu list and let TLL = TLL + 1 (if TLL = MTLS + 1, delete the first element in the tabu list and let TLL = TLL − 1). If J_b > J_c, let A_b = A_c and J_b = J_c. If k > ITMAX, stop (A_b is the best solution found and J_b is the corresponding best objective function value); otherwise, let k = k + 1 and go to step s2.

To generate the trial solutions, two methods have been tested:

(t1) Move centroids. In this case the idea is to move the centroids over the space and then to calculate the new memberships matrix, using equation (Eq. 3.12). For each centroid, if a random number is greater than P, the move is performed. This move can be done in two ways:

(m1) Complete space. For each coordinate of the centroid compute a random number and, if it is greater than P, move this coordinate across the complete solution space, delimited by the patterns.

(m2) Decreasing radius. In the same way as the method described above, compute a random number and, if it is greater than P, move the coordinate inside an n-dimensional sphere centred on the centroid. At the first iteration, the radius is set to include the complete space; it is then reduced linearly at each iteration.

(t2) Shake memberships. In this method the idea is to swap the degrees of membership of patterns to clusters and then to calculate the new centroids, using equation (Eq. 3.11). For each pattern, a shake is performed if a random number is greater than P.
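A minimal sketch of steps s1–s4 with the move-centroids trial generation (method t1-m1) might look as follows. It is not the thesis implementation: how two configurations are compared for tabu membership is not specified above, so this sketch identifies a configuration by its rounded centroid coordinates, and the toy data set and parameter values are illustrative.

```python
import numpy as np

def memberships(P, C, m):
    """Eq. 3.12: fuzzy membership of every pattern to every cluster."""
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2) + 1e-12
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=1, keepdims=True)

def objective(P, C, m):
    """Eq. 3.10: J = sum_i sum_j ||p_i - c_j||^2 * u_ij^m."""
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float((d2 * memberships(P, C, m) ** m).sum())

def fuzzy_tabu(P, nc, m=1.5, MTLS=50, prob=0.6, NTS=15, ITMAX=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = P.min(axis=0), P.max(axis=0)
    # (s1) arbitrary initial configuration
    C_best = rng.uniform(lo, hi, (nc, P.shape[1]))
    J_best = objective(P, C_best, m)
    tabu = []                       # circular tabu list of visited configurations
    for _ in range(ITMAX):
        # (s2) generate NTS trial solutions by moving centroids (t1-m1)
        trials = []
        for _ in range(NTS):
            C = C_best.copy()
            mask = rng.random(C.shape) > prob
            C[mask] = rng.uniform(lo, hi, C.shape)[mask]
            trials.append((objective(P, C, m), C))
        # (s3) pick the best trial that is not tabu (aspiration: beats J_best)
        trials.sort(key=lambda t: t[0])
        for J, C in trials:
            key = tuple(np.round(C, 3).ravel())
            if key not in tabu or J < J_best:
                # (s4) record the configuration, keep the best so far
                tabu.append(key)
                if len(tabu) > MTLS:
                    tabu.pop(0)
                if J < J_best:
                    J_best, C_best = J, C
                break
    return C_best, J_best

rng = np.random.default_rng(1)
P = np.vstack([rng.normal(0.0, 0.2, (30, 2)), rng.normal(4.0, 0.2, (30, 2))])
C, J = fuzzy_tabu(P, nc=2)
print("J =", round(J, 3))
```

Note how the aspiration criterion is implemented directly in step s3: a tabu trial is still accepted when its objective value improves on the best one found so far.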
3.4.2.3 Experiments and results

Several experimental results are presented to demonstrate that, with the tabu search-based fuzzy clustering algorithm, it is much easier to find global or near-global optimum solutions than with the traditional fuzzy clustering algorithms. To show the influence of the different parameters of the algorithm, the SIMPLE data set (Figure 3.10), which consists of four differentiated and well-spaced clusters, and the well-known Anderson's IRIS data set, which consists of three clusters that are not clearly differentiated, are used. Each test is identified by:
• Parameters: a set of values for the algorithm parameters (as described in the previous section).
• Repeat: as stated in the previous section, the first step of the algorithm generates a random initial cluster configuration, so each run depends on this configuration. To reduce the effect of this randomness, each test is run several times.
• Move: the kind of movement generation method used in each test.

Figure 3.10: The SIMPLE dataset.

The probability threshold (P) is a key parameter in obtaining good objective function values. The lower the value of P, the more global the search performed; the higher the value of P, the more local the search performed. There is a trade-off between local and global search. As shown below (Figure 3.11), the range [0.55 ... 0.65] has been found to be the best. The initial parameters are (Table 3.3):

MTLS = 350        NTS = 15
ITMAX = 500       m = 1.5
move = t1-m1      mp = 52
repeat = 10       nc = 4

Table 3.3: Initial parameters for the probability threshold experiment.

The following curve (Figure 3.11) represents the average of the objective function for each run with different probability thresholds.

Figure 3.11: Values of the objective function versus the probability threshold (SIMPLE dataset).
The next parameter that heavily conditions the success of the algorithm is the maximum number of iterations (ITMAX). Moreover, the performance of algorithms tends to be measured in terms of the number of iterations. The experiment consists of a series of runs on the SIMPLE data set with the parameters listed in (Table 3.4), varying ITMAX from 100 to 3000, and a series of runs on the IRIS data set with the parameters listed in (Table 3.5), varying ITMAX from 100 to 4000. As can be seen, the performance grows with the maximum number of iterations, as might be expected.

MTLS = 70% of ITMAX    NTS = 15
P = 0.6                m = 1.9
move = t1-m1           mp = 52
repeat = 10            nc = 4

Table 3.4: Initial parameters for the iterations experiment (SIMPLE dataset).

MTLS = 2500       NTS = 15
P = 0.85          m = 1.5
move = t1-m1      mp = 150
repeat = 10       nc = 3

Table 3.5: Initial parameters for the IRIS experiment.

The objective function values and percentages of well-classified elements for the SIMPLE data set are shown in (Figure 3.12). The upper curve is the average objective function, and the lower one is the percentage of well-classified elements. As can be seen, the percentage of good classifications approaches 100% at approximately 1500 iterations.

Figure 3.12: Values of the objective function and percentage of well classified elements versus iterations (SIMPLE dataset).

The percentages of well-classified patterns for the IRIS data set are shown in (Figure 3.13). The curve is the average of the percentage of well-classified patterns. All values below the discontinuous line are acceptable classifications, with less than 15 per cent classification error (Bezdek et al., 1992).

Figure 3.13: Percentage of well classified elements versus iterations (IRIS dataset).

The objective function values for the IRIS data set are shown in (Figure 3.14).
The upper curve is the average objective function and the lower one is the average deviation from the average objective function. All values below the discontinuous line are acceptable classifications (with less than 15 per cent classification error).

Figure 3.14: Values of the objective function versus iterations (IRIS dataset).

Another important parameter is the fuzziness (m). The lower the value of m, the crisper the partitions obtained; the higher the value of m, the fuzzier the partitions obtained. There is a trade-off between fully crisp and fully fuzzy clustering. As can be seen (Figure 3.15), the best results are obtained with m in the range [1.5 ... 1.9]. The experiment consists of a series of runs on the IRIS data set with the parameters listed in (Table 3.6), with m varying from 1.1 to 2.5.

MTLS = 850       NTS = 15
P = 0.65         ITMAX = 1000
move = t1-m1     mp = 150
repeat = 10      nc = 3

Table 3.6: Initial parameters for the fuzziness experiment (IRIS dataset).

Figure 3.15: Percentage of well classified elements versus fuzziness (IRIS dataset).

The movement generation methods have been tested with different data sets. The experiment consists of a series of runs on the IRIS data set with the parameters listed in (Table 3.7), and the methods: shake memberships (S), move centroids with constant radius (MC), and move centroids with linear radius (ML).

MTLS = 70% of ITMAX    NTS = 15
P = 0.6                ITMAX = 500, ..., 3000
m = 1.9                mp = 150
repeat = 10            nc = 3

Table 3.7: Initial parameters for the movement experiment (IRIS dataset).

The results of the movement generation test (Figure 3.16) show that the best method is ML. The method MC presents a high degree of randomness (due to its global nature).
The method S presents a poorer degree of correct classification in the long run, but with few iterations it obtains the best results (approximately 1000 iterations). Thus, this method should be used as the initialisation step of the algorithm in order to get a good first trial and reduce the number of iterations significantly.

Figure 3.16: Percentage of well classified elements versus movement method (IRIS dataset).

It should be noted that although the number of iterations is greater than in the case of the fuzzy c-means algorithm, the kind of search performed by the proposed algorithm needs them if it is not to become trapped in a local minimum. The algorithm has been applied to different examples (Figure 3.14) (Figure 3.17), showing results comparable to the well-known fuzzy clustering algorithms available in the literature (Bezdek et al., 1992) (Yuan et al., 1995) (Liu and Xie, 1995) (Pal et al., 1993).

Figure 3.17: Percentage of well classified elements and objective function values versus PIPE iterations (IRIS dataset).

3.4.3 Adaptive network based fuzzy inference system (ANFIS)

The ANFIS adaptive network (Jang et al., 1997) obtains a model consisting of fuzzy rules. It simulates the TSK reasoning mechanism (Kam, 1997) (Kurz, 1996), and it benefits from its adaptive capabilities to tune the rule parameters, which may have been estimated using some previous clustering algorithm. The output of clustering algorithms is a set of centroids {c_1, ..., c_t} describing the training set points with similar features. These clusters become the antecedent parameters of the fuzzy rules, which in TSK form look like (Eq. 3.13):

R_i ≡ if (x_1 ∈ A_1i) ∧ ... ∧ (x_n ∈ A_ni) then y_i = p_0i + p_1i·x_1 + ... + p_ni·x_n   (3.13)

where x_1, ..., x_n are crisp values.
Each consequent is a linear function with crisp inputs x_1, ..., x_n, crisp output y_i, and crisp parameters p_0i, ..., p_ni (it can be observed that if p_1i, ..., p_ni are 0, then the consequent is a crisp constant). Using approximate reasoning theory and the centre of area (CoA) mechanism, the inferred output for a TSK model with t rules is obtained using (Eq. 3.14):

y* = ( ∑_{i=1}^{t} τ_i · (p_0i + p_1i·x_1 + ... + p_ni·x_n) ) / ( ∑_{i=1}^{t} τ_i )   (3.14)

where the firing strength of each rule is (Eq. 3.15):

τ_i = ∏_{j=1}^{n} A_ji(x_j)   (3.15)

The ANFIS network architecture is composed of 4 layers (Figure 3.18). Nodes of the same layer perform similar tasks. The nodes of layers 1 and 4 are adaptive nodes, and their values are the parameters of the antecedents and consequents, respectively.

Figure 3.18: ANFIS network structure.

Learning is achieved by combining the backpropagation algorithm and the least squares method. In each epoch the method runs a forward and a backward step. In the forward step, for each input vector, the net is evaluated up to layer 4, and the parameters of the consequents are estimated using least squares. Next, errors are calculated from each pair of network and desired outputs. Last, in the backward step, the errors are propagated and the parameters of the antecedents are modified (backpropagation).

3.4.4 Experiments and results

The experimental setup for generating fuzzy rules is quite similar to that of the neural network training (see previous section). Two different scenarios have been used both for collecting sensor data and for validation. The first one is the trinity scenario, whose overall dimensions are 3.5 by 3.5 metres, with corridors and doorways of 65 centimetres (Figure 3.19). The other one resembles a corridor with different rooms. Its overall dimensions are 5.2 by 4.9 metres, with corridors of 1.2 metres width and doorways of 80 centimetres (Figure 3.20).
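The TSK inference mechanism of equations (Eq. 3.14) and (Eq. 3.15), with bell-shaped antecedents of the kind used by ANFIS, can be sketched as follows. This covers inference only, not the ANFIS learning loop, and the two rules and their parameters are toy values, not rules produced by the experiments below.

```python
import numpy as np

def bell(x, a, b, c):
    """Generalised bell membership function, as used in the ANFIS antecedents."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def tsk_infer(x, rules):
    """First-order TSK inference (Eqs. 3.14 and 3.15). Each rule is a pair
    (antecedents, consequent): antecedents is a list of (a, b, c) bell
    parameters, one per input; consequent is the vector (p0, p1, ..., pn)."""
    num = den = 0.0
    for antecedents, p in rules:
        # Eq. 3.15: firing strength = product of antecedent memberships
        tau = np.prod([bell(xj, *abc) for xj, abc in zip(x, antecedents)])
        # Eq. 3.14: firing-strength-weighted average of linear consequents
        num += tau * (p[0] + np.dot(p[1:], x))
        den += tau
    return num / den

# Two toy rules over one input: "near 0" maps to y = 0, "near 4" maps to y = x.
rules = [([(1.0, 2.0, 0.0)], np.array([0.0, 0.0])),   # bell centred at 0
         ([(1.0, 2.0, 4.0)], np.array([0.0, 1.0]))]   # bell centred at 4
print(round(float(tsk_infer(np.array([4.0]), rules)), 2))
```

At an input of 4.0 the second rule fires almost exclusively, so the inferred output is close to its consequent value of 4.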
The set of examples (Figure 3.6) and the error metric (Eq. 3.6) are the same as in the neural network training.

Figure 3.19: Trinity learning environment. Figure 3.20: Corridor learning environment.

To generate the fuzzy rules, different combinations of clustering methods have been tried, but only the three combinations that obtained the best results are described. They use mountain clustering to find the number of clusters (Table 3.8), a clustering method (tabu, k-means or fuzzy k-means), and an ANFIS network (Table 3.11) to generate rules from the clusters. The output of the ANFIS network is a series of TSK rules, which use bell fuzzy sets in the antecedents and the product t-norm. The specific clustering methods of the three combinations are:
• Experiment ANFIS-3: uses the fuzzy k-means clustering (Table 3.9).
• Experiment ANFIS-5: uses the k-means clustering.
• Experiment ANFIS-8: uses the fuzzy tabu clustering (Table 3.10).

Parameter   Meaning                           Value
α           Density function height           1
β           Last cluster effect elimination   1.5
γ           Tolerance threshold               0.001

Table 3.8: Mountain clustering parameters.

Parameter   Meaning               Value
f           Fuzziness             2
δ           Tolerance threshold   0.001

Table 3.9: Fuzzy k-means clustering parameters.

Parameter   Meaning                        Value
m           Fuzziness                      1.5
MTLS        Maximum tabu list size         300
P           Probability threshold          0.65
NTS         Number of trial solutions      5
ITMAX       Maximum number of iterations   1000
TLL         Tolerance threshold            0.001

Table 3.10: Fuzzy tabu clustering parameters.

Parameter   Meaning                              Value
λ           Forgetting factor                    1
κ           Initial step size                    0.2
n           Decreases before augmenting          2
m           Oscillations before decreasing       4
p           Step increase ratio                  0.1
q           Step decrease ratio                  0.1
epochs      Iterations of the learning process   10000

Table 3.11: ANFIS network parameters.

After the training process has finished, the robot is run in both scenarios along the paths used to extract the examples (Figure 3.19) (Figure 3.20).
The purpose of this process is to test the learning capabilities of the different fuzzy rule sets. The results are summarised below (Table 3.12) (Table 3.13), including the results of the neural network and of the raw infrared and sonar sensors for comparison purposes. They represent the average error (Avg), the variance (Var) and the root mean square error (RMSE) using the previously defined error metric.

Method    Avg      Var       RMSE
ANFIS-3   0.2112   7.7404    0.3493
ANFIS-5   0.1807   1.1832    0.2617
ANFIS-8   0.1743   5.7561    0.2966
ANN       0.2052   5.0959    0.3050
FLYNN     0.1781   6.3026    0.3078
IR        0.2498   2.6540    0.3779
SONAR     0.5906   20.8355   0.9901

Table 3.12: Errors in the trinity learning scenario.

Method    Avg      Var        RMSE
ANFIS-3   0.5954   65.7508    1.0059
ANFIS-5   0.5901   64.3527    0.9958
ANFIS-8   0.5599   65.3739    0.9834
ANN       0.6024   64.8455    1.0056
FLYNN     0.6091   71.5587    1.0424
IR        1.0938   57.7467    1.3318
SONAR     1.8561   342.3615   2.6208

Table 3.13: Errors in the corridor learning scenario.

The results have been plotted (Figure 3.21) (Figure 3.22), ordered from the lowest to the highest average error. It can be noted that the combination ANFIS-8 produces the best results, both in Avg and in RMSE. The fuzzy rules obtain better results (compared with the simple Flynn's rules) in the corridor scenario because it is similar to a real world setup, whereas the trinity scenario has corridors which are too narrow for the robot size.

Figure 3.21: Trinity learning results. Figure 3.22: Corridor learning results.

Finally, the robot is run in both scenarios along different paths (Figure 3.23) (Figure 3.24). The purpose of this process is to test the generalisation capabilities of the different fuzzy rule sets. The results are summarised below (Table 3.14) (Table 3.15), including the results of the neural network and of the raw infrared and sonar sensors for comparison purposes. They represent the average error (Avg), the variance (Var) and the root mean square error (RMSE) using the previously defined error metric.
Figure 3.23: Trinity validation environment. Figure 3.24: Corridor validation environment.

Method    Avg      Var       RMSE
ANFIS-3   0.3018   11.9367   0.4588
ANFIS-5   0.2657   11.9246   0.3590
ANFIS-8   0.2846   11.0117   0.4372
ANN       0.3130   11.1385   0.4575
FLYNN     0.2750   14.2645   0.4672
IR        0.3547   4.9095    0.5240
SONAR     1.2684   47.5786   1.7466

Table 3.14: Errors in the trinity validation scenario.

Method    Avg      Var        RMSE
ANFIS-3   0.6920   55.9554    1.0190
ANFIS-5   0.6905   55.5552    1.0160
ANFIS-8   0.6282   59.3456    0.9940
ANN       0.6762   56.8823    1.0130
FLYNN     0.6829   66.1706    1.0621
IR        1.1076   60.4763    1.3534
SONAR     2.0324   328.0456   2.7223

Table 3.15: Errors in the corridor validation scenario.

The results have also been plotted (Figure 3.25) (Figure 3.26), ordered from the lowest to the highest average error. It can be noted that the combination ANFIS-8 produces the best result in the corridor scenario, but in the trinity scenario ANFIS-5 and Flynn's rules obtain slightly better results. This is because the trinity scenario can be navigated using only the infrareds, and the validation path follows a very complicated trajectory. In any case, the best overall results are obtained with the ANFIS-8 fuzzy rules, which have been obtained using the fuzzy tabu clustering method described above.

Figure 3.25: Trinity validation results. Figure 3.26: Corridor validation results.

3.5 Conclusions

Two different methods have been developed for solving the sensor fusion problem in a priori unknown environments. In this problem, model-based techniques that make use of a priori information are not applicable, and thus neural networks and fuzzy rule based systems are used instead. These fusion techniques require examples for the learning stage. Obtaining such examples from real sensors is an extremely complex problem. The approach followed is to use examples obtained using a simulator.
For this reason, models of sonar and infrared sensors have been evaluated and integrated in a simulation environment, which is also used for sensor fusion validation. The sonar model is based on the sonar-tracing technique, modified to take into account the sonar lobe effects, and the infrared model has been obtained by linearising experimental values.

Once the neural network and fuzzy rule fusion methods have been trained, they are tested on different simulated environments ranging from a small to a large environment. The fusion methods are then compared with raw sensor data and with a simple, heuristic rule based system. The results show that in most cases the proposed fusion methods perform better than the raw sensors and the simple rules. In addition, the fuzzy rule system obtained using fuzzy tabu clustering has the best overall performance of all the fusion methods tested. The fuzzy tabu clustering is an algorithm based on the fuzzy c-means algorithm, with the introduction of a global minimum search by means of a tabu search technique. The algorithm has also been successfully applied to different examples, showing results comparable to the well-known fuzzy clustering algorithms available in the literature.

Chapter 4

Robot Navigation

4.1 Introduction

For a mobile robot, the ability to navigate is one of the most important capabilities of all. Staying operational, for instance avoiding dangerous situations such as collisions, comes first. But if any tasks which relate to specific places in the robot's environment are to be performed, navigation is a must. Navigation can be defined as the combination of three fundamental competences (Figure 4.1):
• Map-building. This is the process involved in constructing a map from sensor readings taken at different robot locations. The correct treatment of sensory information and the reliable localisation of the robot are fundamental points in the map-building process.
• Localisation. This is the process involved in obtaining or knowing the robot's actual pose or location from sensor readings and the current map. An accurate map and reliable sensors are key points in achieving good localisation.
• Path-planning. This is the process involved in generating a feasible and safe trajectory, based on the current map, from the current robot location to a goal position. In this case it is very important to have an accurate map and reliable localisation.

As can be noted, these competences are tightly coupled. If one of them fails, the robot's performance is greatly degraded or reduced. A typical example is when the sensory system does not detect a door and the path-planner cannot trace a path through the corresponding wall. Often, one or more of these competences may not be present in a given robot, usually when it is working in known and structured environments and when the robot is provided with a priori information or special sensor systems. One of the objectives of this thesis is to work in unknown environments, and thus the robot is provided with all three competences. The following subsections describe some basic techniques to develop and implement these competences.

The Quaky-Ant platform has been provided with algorithms and methods to address the navigation problem. Initially, the system relied on the assumption of reliable odometry for short time and distance operations. In line with this assumption, it incorporated a map-building method and a path planner. The chosen algorithms proved to be adequate in terms of performance and robustness. Then, to overcome that demanding assumption, the platform incorporated a localisation method based on another map-building method. The final combination of methods proved to perform quite well. The following sections describe the methods used in the Quaky-Ant platform.

Figure 4.1: Navigational tasks.
4.1.1 Map representation methods

The most natural representation of the robot's environment is a map (Dudek and Jenkin, 2000). In addition to representing places in an environment, a map may include other information, such as reflectance properties of objects, regions that are unsafe or difficult to traverse, or information about prior experiences. In general, map representation methods can be divided into two main groups: those that rely primarily on an underlying metric representation and those that are topological. Metric maps can also be decomposed into spatial and geometric representations.

The most straightforward spatial representation is to sample discretely the two- or three-dimensional environment to be described. The simplest method is to sample space in cells of a uniform grid. Samples taken at points in the lattice express the degree of occupancy at the sample point. In two dimensions the grids are known as occupancy grids. In three dimensions the sampling elements are known as voxels. The main advantage of a regular lattice representation is its extreme generality: no strong assumptions are made regarding object type, and thus these grids can represent anything. The main disadvantage of this representation is that grid resolution or fidelity is limited by the cell size, and the grid is storage intensive even if much of the environment is empty or occupied. In addition to representing the level of occupancy at particular locations in space, elements of the lattice can also be tagged with other attribute information, such as the confidence of the occupancy value, the safety or desirability of occupying the cell, terrain information and so forth.

HUMBERTO MARTÍNEZ BARBERÁ

One alternative to the storage problem is to represent space using cells of nonuniform shape and size, although a recursive hierarchical representation is used more commonly.
The most common of these methods is the quadtree, which is a recursive data structure for representing a square two-dimensional region. It is a hierarchical representation of cells that can potentially reduce storage and aid in certain types of computations. Cells that are neither uniformly empty nor full are subdivided into four equal subparts. Subparts are subdivided in turn until they are either uniformly empty or full, or until a resolution limit is reached. The three-dimensional analogue of the quadtree is the octree. In general, the number of cells varies roughly with the areas of the obstacles being described. Thus, for environments where most of the space is free or occupied, quadtree-like representations are very suitable, at the expense of some computational overhead. Quadtrees are somewhat restrictive because, unfortunately, not all space is well characterised by this power-of-two representation, nor are all environments aligned with power-of-two boundary planes. An alternative spatial decomposition method, which is not as restrictive, is the binary space partitioning tree (BSP tree). It is a hierarchical representation within which a rectangular cell is either a leaf or is divided into two BSP trees by a plane parallel to one of the external boundaries of the environment. BSP trees provide the same binary space-splitting characteristics as quadtrees but do not enforce exactly even subdivision. A BSP representation of a given space is not necessarily unique, and radically different representations of the same space can be obtained.

Geometric maps are those made up of discrete geometric primitives: lines, polygons, points, polynomial functions and so forth. Such maps have the advantage of being highly space-efficient, because an arbitrarily large region of space can be represented by a model with only a few numerical parameters.
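The recursive quadtree subdivision described above can be sketched in a few lines. The fragment below is an illustrative Python sketch, not code from the thesis: it subdivides a square binary occupancy grid until regions are uniformly empty or full, or a resolution limit is reached.

```python
def build_quadtree(grid, x, y, size, limit=1):
    """Recursively subdivide a square region of a binary occupancy grid.

    Returns 'empty', 'full', 'mixed' (resolution limit hit), or a dict
    with the four sub-quadrants. Cell values: 0 = empty, 1 = occupied.
    Illustrative sketch, not the thesis implementation.
    """
    cells = [grid[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    if all(c == 0 for c in cells):
        return 'empty'
    if all(c == 1 for c in cells):
        return 'full'
    if size <= limit:                      # resolution limit reached
        return 'mixed'
    h = size // 2                          # split into four equal subparts
    return {'nw': build_quadtree(grid, x,     y,     h, limit),
            'ne': build_quadtree(grid, x + h, y,     h, limit),
            'sw': build_quadtree(grid, x,     y + h, h, limit),
            'se': build_quadtree(grid, x + h, y + h, h, limit)}
```

In an environment that is mostly free space, whole quadrants collapse to a single leaf, which is where the storage saving comes from.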
In addition, geometric maps can store occupancy data with almost arbitrarily high resolution, without incurring the storage penalties of techniques based on spatial sampling. Geometric maps are composed of the union of simple geometric primitives. Such maps are characterised by two key properties: the set of basic primitives used for describing objects, and the set of composition and deformation operators used to manipulate objects. The primary shortcoming of geometric model-based representations is that they can be difficult to infer reliably from sensor data. Three fundamental modelling problems are regularly encountered: lack of stability, lack of uniqueness, and lack of expressive power.

The lack of stability is related to the representation, which may change in a drastic way given just a small variation in the input. One approach to addressing the instability is to add a stabiliser that acts to damp variations in the modelling system with respect to variations in the input. The stabiliser is a criterion that serves to bias the model-fitting process toward specific types of solutions. The lack of uniqueness is due to the fact that many different environments may map to the same representation. This problem occurs because many modelling systems, especially those that have substantial expressive power and that approximate the input data rather than expressing it exactly, can express a single set of observations in more than one way. Some approaches associate a cost with each model used and its approximation errors, and attempt to find a single model that minimises the total cost. The lack of expressive power is due to the fact that it may be difficult (or impossible) to represent the salient features of the environment within the modelling system, because such systems are not well suited to expressing distributions of measurements and the associated error models.
One approach is to explicitly represent aspects of the original distribution that are important, such as means or standard deviations of uniform or Gaussian distributions. Although better than ignoring the data misfits, such error models may fail to capture the complex distributions associated even with simple sensors. Geometric representations rely on metric data as the core of the representation. Unfortunately these metric data are likely to be corrupted by sensor noise at the very least. To avoid reliance on error-prone metric data, a non-metric topological representation may be used. The key to a topological relationship is some explicit representation of connectivity between regions or objects. In its purest form this may involve a complete absence of metric data. A topological representation is based on an abstraction of the environment in terms of discrete places with edges connecting them. In some cases, the edges have length and the edges are oriented with respect to the nodes. Typically, a robot’s environment is modelled as a graph whose vertices correspond to landmarks, which can be recognised using sensors, placed within the environment. Each vertex corresponds to one of the unique landmarks, whereas edges correspond to known straight paths between landmarks. Each edge may be labelled with the distance that needs to be travelled along the edge to arrive at the next landmark. Although the robot has no real understanding of the geometric relationship between locations in the environment, the representation does encode sufficient information for the robot to conduct point-to-point navigation. The representation is also extremely compact. 
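A topological map of the kind just described reduces to a labelled graph. The following sketch (hypothetical landmark names, illustrative Python) stores landmarks as vertices and known straight paths as edges labelled with travel distance, and performs point-to-point navigation with a plain breadth-first search:

```python
from collections import deque

# Hypothetical landmark graph: vertices are sensor-recognisable landmarks,
# edges are known straight paths labelled with the distance to be travelled.
topo_map = {
    'door_A':   {'corridor': 3.0},
    'corridor': {'door_A': 3.0, 'door_B': 4.5, 'lift': 6.0},
    'door_B':   {'corridor': 4.5},
    'lift':     {'corridor': 6.0},
}

def route(graph, start, goal):
    """Point-to-point navigation over the landmark graph (breadth-first)."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None                 # goal not connected to start
```

Note that the robot never needs the geometric relationship between places: the connectivity alone is enough to plan a landmark-to-landmark route, which is why the representation is so compact.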
4.1.2 Path planning

The basic path planning problem refers to determining a path in configuration space between an initial configuration of the robot and a final configuration, such that the robot does not collide with any obstacle in the environment, and such that the planned motion is consistent with the kinematic constraints of the vehicle (Dudek and Jenkin, 2000). Configuration space is a formalism for motion planning, in which a configuration is a specification of the physical state of a robot with respect to a fixed environmental frame. The basic formulation can be augmented in several ways, such as considering the minimum length path, moving obstacles, the minimum time path, etc. To render the planning problem tractable it is often necessary to make a variety of simplifications with respect to the real environment. There are two groups of techniques for path planning: discrete and continuous state space search.

Searching a discrete state space involves a set of possible problem states and a state transition function to determine the states directly reachable from any given state. A search method is an algorithm to control the exploration of the state space in order to identify a path from some initial state to the goal. Many graph search methods exist in the literature. They assign a node to each possible state and establish arcs between nodes that are traversable, that is, where there is a free path from one to another. Given this state transition function, a graph search algorithm will find a path from an initial node s to any node of a final set t, if any such path exists. Some graph search formulations make use of evaluation functions to obtain optimal or heuristically optimal paths. Usually, these formulations try to reduce the number of nodes that are evaluated in order to obtain faster searches.
If no path exists from the start to the goal, these algorithms will eventually evaluate every reachable state before returning failure. This failure can take a considerable amount of time to detect. Although effective, simple graph search can be too slow for robot path planning in a reasonably sized search space. The most widely used graph search algorithm for path planning is A*, which finds optimal paths from s to t, and can be tuned to expand a minimum number of nodes.

Dynamic programming is another general purpose technique that can be used for path planning. It is a recursive procedure for evaluating the minimum cost path to any point in the environment from some source. To use this method the problem must adhere to a principle of optimality, which can be stated as an assertion that, given three points A, B, and C, the optimal path from A to C via B can be computed by determining the optimal path from A to B and from B to C. The discretised environment is referred to as a cost table. The idea is to compute the cost of each cell in the table by computing an increment to the cost of an adjacent cell for which the cost is already known. To efficiently compute the path from s to t without completing the entire cost table, heuristic functions are often used to expand cells preferentially. In general, these methods produce results similar (in optimality and search time) to those of graph search techniques. The difference usually resides in the simplicity of defining or computing the evaluation function, which depends heavily on the nature of the environment and the specific problem.

Given a technique to search a discrete state space for a path, it remains to take an environment and construct a state space to represent it. The most straightforward approach is to represent the space using a grid map, assign a node to each cell, and then connect adjacent cells.
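The cost-table idea above can be sketched for a 4-connected grid with unit step costs. This is an illustrative Python fragment, not the thesis planner; the principle of optimality is what allows each cell's cost to be built from an already-known adjacent cell:

```python
from collections import deque

def cost_table(grid, source):
    """Fill a dynamic-programming cost table from a source cell.

    Each free cell's cost is one more than the cheapest already-known
    4-connected neighbour. grid[y][x] == 1 marks an obstacle.
    Illustrative sketch with unit step costs.
    """
    rows, cols = len(grid), len(grid[0])
    INF = float('inf')
    cost = [[INF] * cols for _ in range(rows)]
    sx, sy = source
    cost[sy][sx] = 0
    queue = deque([source])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < cols and 0 <= ny < rows and grid[ny][nx] == 0:
                if cost[y][x] + 1 < cost[ny][nx]:   # relax from a known cell
                    cost[ny][nx] = cost[y][x] + 1
                    queue.append((nx, ny))
    return cost
```

Once the table is filled, the minimum cost path from any cell is recovered by descending the cost gradient towards the source.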
Other mechanisms for mapping a robot's environment onto a discrete searchable space include visibility graphs and Voronoi diagrams. A visibility graph produces a minimum length path from a start to a goal by solving a simple graph traversal problem. The visibility graph is defined such that its set of vertices is the union of all the vertices of the polygonal obstacles in the environment, plus the start and end points. The edges of the graph connect all vertices that are visible to one another. One unfortunate feature of a path found using a visibility graph is that it passes through vertices and comes arbitrarily close to the edges of obstacles. For non-circular robots this method is not complete: it may fail to find a path even if one exists, because the visibility graph brings the robot too close to the boundaries of the obstacles.

An alternative approach is the Voronoi diagram, which keeps the robot as far as possible from all the obstacles in the environment. A Voronoi diagram is the locus of points equidistant from the closest two or more obstacle boundaries, including the workspace boundary. This set of points has the useful property of maximising the clearance between points and obstacles. Given a start and end position, the robot can move directly away from its nearest obstacle until it reaches a point lying on the Voronoi diagram. It can then follow the set of arcs and straight line segments that compose the diagram until it reaches the nearest collision-free position to the goal. The major problem with this type of planning is that paths are relatively long, because the technique is often too conservative to be attractive.

Rather than searching through a discrete space, an alternative is to model the space of the robot as a continuum and to consider path planning as the determination of an appropriate trajectory within this continuum. This is known as continuous state space planning.
Two of the most well known techniques for continuous state space planning are the potential fields and the vector field histogram methods. These methods model the world as forces that drive the robot towards the goal while driving it away from obstacles (see chapter 5 for a detailed description). They are very fast to compute, at the expense of possibly driving the robot into local minima, although some heuristics try to overcome this problem. Alternatively, there is a class of path planners that are based on the bug algorithm and are guaranteed to find a path from the start to the goal if such a path exists. These methods operate by switching between two simple behaviours: moving directly towards the goal location, and circumnavigating an obstacle. Their drawback is that the generated trajectories can be arbitrarily worse than the optimal path to the goal.

4.1.3 Localisation

For numerous tasks a mobile robot needs to know where it is, either on an ongoing basis or when specific events occur. This problem has many different connotations. In the strongest sense, knowing where the robot is involves estimating some global representation of the space. This is usually referred to as strong localisation. The weak localisation problem, in contrast, involves merely knowing if the current location has been visited before. Between the extremes of the weak and strong localisation problems exists a continuum of different problem specifications that involve knowing where the robot is or estimating the robot's pose. In certain circumstances it may be necessary to infer the robot's position without an a priori estimate of its location. This type of positioning is referred to as global localisation. A more common version of the localisation problem is the need to refine an estimate of the robot's pose continually. This task is known as pose maintenance or local localisation.
A key step in the process of performing either local or global localisation involves matching the set of current observations to some established map. Standard matching methods can be broadly classified into the following categories:

• Data-data matching. Directly matching the current raw data with predicted raw data (extracted from the map either by predictive modelling or using stored data sets).

• Data-model matching. Matching the observed data to more abstract models stored in the map (based on a model of how models and data are associated).

• Model-model matching. Matching models stored in the map to models generated from current observations.

In general, matching with raw data can reduce dependence on a priori assumptions about the environment, but tends to be less robust unless the matching technique itself is very sophisticated (in which case it resembles model-model matching).

Most devices for measuring position and distance are relative measurement tools. By counting the number of rotations executed by a vehicle's drive wheels, for example, and using knowledge of the wheel's size and the vehicle's kinematics, an estimate of the rate of position change can be obtained. Computing absolute coordinates thus involves the integration of such local differential quantities. This technique is known as dead reckoning, and when using odometers it is called odometry. Dead reckoning can have acceptable accuracy over sufficiently small steps, given a suitable terrain and drive mechanism. For larger or more complex motions, however, the unavoidable errors in the individual position estimates have a major consequence. Simple scaling errors can be corrected, but more complex errors are essentially impossible to eliminate. Another broad set of techniques is based on the use of external referencing systems. These are usually known as landmark-based localisation.
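The dead-reckoning integration described above can be sketched for a differential-drive vehicle. The fragment below is an illustrative first-order kinematic model, not the actual Quaky-Ant drive:

```python
from math import cos, sin

def dead_reckon(pose, d_left, d_right, wheel_base):
    """Integrate one odometry step for a differential-drive vehicle.

    pose = (x, y, theta); d_left and d_right are the distances travelled
    by each wheel, derived from encoder counts and wheel size.
    Illustrative model, not the Quaky-Ant kinematics.
    """
    x, y, theta = pose
    d = (d_left + d_right) / 2.0               # midpoint travel distance
    dtheta = (d_right - d_left) / wheel_base   # heading change
    # first-order integration of the local differential quantities,
    # evaluated at the mid-step heading
    x += d * cos(theta + dtheta / 2.0)
    y += d * sin(theta + dtheta / 2.0)
    return (x, y, theta + dtheta)
```

Because each step only adds a local differential quantity, any per-step measurement error is integrated as well, which is why the drift grows with travelled distance and eventually requires external referencing.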
In principle, the problem is related to pose estimation of a landmark with respect to a fixed sensor. Important variations of the problem arise when the landmarks are unlabelled, when the landmarks are difficult to detect, or when the measurements are inaccurate. Landmarks can be active or passive, natural or artificial. Active landmarks are typically transmitters that emit unique signals and are placed about the environment. Active landmarks avoid many of the problems commonly associated with passive, naturally occurring landmarks. In general, the use of an artificial landmark, either active or passive, can extend the operational range of the underlying sensor technology relative to a natural landmark. Triangulation refers to the solution of constraint equations relating the pose of an observer to the positions of a set of landmarks. Pose estimation using triangulation from known landmarks has been practised since ancient times, and it is the basis of multiple-landmark localisation.

Sensor-based servoing is a technique based on storing a specific sensor image I(q_goal) associated with a target position q_goal for the robot, and using the difference between I(q_goal) and I(q_c) to move the robot from its current position q_c to the goal. As a result of the approximations used and sensor noise, actually attaining the target position usually involves an iterative approach. Sensor-based servoing usually lacks a robust and stable feature space and quantitative positioning, and can only be used to return to previously visited places. On the other hand, the technique has low overhead and makes only simple assumptions about the environment.

To accomplish position estimation using external referencing, the disparity between the known and observed locations of obstacles provides an estimate of position that can be used to correct the drift in estimates from internal sensors.
A Kalman filter is one of the most widely used techniques to combine pose estimates from dead reckoning and other landmark-based measures in an optimal manner. In the context of mobile robotics, the criteria that assure an optimal result are often violated. In practice, to account for the nonlinearities of the sensor response functions, a variant of the Kalman filter, the extended Kalman filter (EKF), is used, obtained by linearising the model of the system. Although the Kalman filter is a powerful technique that often performs well under suitable conditions, it does have its shortcomings. Fundamentally, it depends on the acceptability of the linearisation of the system being modelled (in the extended case), on an acceptable estimate of the sources of error in the system, and on well-behaved error distributions (e.g. Gaussian). In addition, the need for a simple relationship between geometric structures and observed measurements makes it difficult to apply such methods in some environments.

4.3 Map building and path planning

The first navigation procedure relies on the assumption of reliable odometry for short-time and short-distance operations. With this assumption in mind, the robot is provided with a map-building method and a path-planning algorithm. The robot does not know the environment a priori, and thus it has to perform both tasks simultaneously. A fuzzy grid map is used for map building, while the path planning is based on the A* algorithm. As opposed to other navigation schemes, the robot is required to build the maps while it is moving, and thus approaches based on non-continuous movement (the robot maps the area, generates a short distance map, and then moves that distance) are not desired. The rationale for this requisite is to exhibit some kind of reactivity (see chapter 5), so that the robot behaves in a more animal- or human-like way. The proposed navigation procedure is as follows.
Each time new sensor values are available, the map is updated. This step, which is not very time consuming, is performed at a given frequency (approximately every 300 ms). Meanwhile the robot keeps moving, and with a certain periodicity a new path is computed based on the current map. This step is performed at a slightly lower frequency (approximately every 1200 ms), because path planning is comparatively more time consuming. The different algorithms are explained below.

4.3.1 Fuzzy grid map

The choice of internal map representation depends on various characteristics, but mainly on the environment itself (obstacle density, room dimensions, etc.). Lee (Lee, 1996) makes a classification according to the range of geometric properties that can be derived from the map. Considering the properties of an indoor, high density environment, a metric map may be considered the most appropriate choice because of its compact representation of the environment. Square-cell grid maps are the most widely used metric maps (Moravec, 1985), (Borenstein, 1988), (Lim, 1992). Although they have many shortcomings, the advantage of employing grid maps is that path planning is quite direct and simple. In recent years, different tools based on neural networks (Kurz, 1996), (Thrun, 1998) or fuzzy logic (Oriolo et al., 1998) have been introduced to build grid maps. The method used for map building is based on fuzzy grid maps (Oriolo et al., 1998), and is described below.

Each cell in the map has two associated values: the degree of certainty that the cell is empty, ε_ij, and the degree of certainty that the cell is occupied, ω_ij. These values are calculated for each cell in the beam of a single sensor. As sonar and infrared sensors have different aperture widths, that of the wider sensor (sonar) is used. Let τ be the fused value for a given pair s of sonar and infrared sensors. Let c_s be the cells in the aperture of s.
For each c_s, whose polar coordinates from s are θ and ρ, the value of its corresponding ε_ij and ω_ij is computed using equations (Eq. 4.1) and (Eq. 4.2):

ε(θ, ρ) = k_ε · g_b(ρ) · g_c(ρ) · g_d(θ)   (4.1)

ω(θ, ρ) = k_ω · g_a(ρ) · g_c(ρ) · g_d(θ)   (4.2)

These equations combine some possibilities and beliefs about the cells, taking into account their position along the sensor beam. In the equations described below, the following values have been used: Δρ = 0.15, k_ε = 0.1, k_ω = 0.5, h_1 = 1.2, h_2 = 1.0, and h_3 = 0.1.

• The possibility g_a that cells in the arc at the sensor range τ are obstacles (Eq. 4.3).
• The possibility g_b that cells in the circular sector of radius τ are empty (Eq. 4.4).
• The belief g_c that shorter distances are more accurate (Eq. 4.5).
• The belief g_d that cells near the beam axis are empty (Eq. 4.6).

g_a(ρ) = 1 − ((ρ − τ) / Δρ)²   if |ρ − τ| < Δρ;   0 otherwise   (4.3)

g_b(ρ) = 1   if ρ < τ − Δρ;   0 otherwise   (4.4)

g_c(ρ) = min(1, h_1 e^(−h_2 ρ) + h_3)   (4.5)

g_d(θ) = 1 − (12θ / π)²   if |θ| < π/12;   0 otherwise   (4.6)

The ε_ij and ω_ij obtained in this way are then aggregated with the previous values of each cell, to obtain the aggregated degrees of certainty that the cells are empty, E(i,j), or occupied, O(i,j), using equations (Eq. 4.7) and (Eq. 4.8):

E(i,j) = ∪_k ε_k(i,j)   (4.7)

O(i,j) = ∪_k ω_k(i,j)   (4.8)

As E(i,j) and O(i,j) correspond to fuzzy degrees of membership, this information is combined in an additional value for each cell, M(i,j), which represents, in a conservative way, whether the cell is occupied. Its meaning can be defined precisely from its definition: the ambiguity of the cell (Eq. 4.9), and the degree of certainty that the cell is empty, is not occupied, and is not ambiguous (Eq. 4.10):

C(i,j) = E(i,j) ∩ O(i,j)   (4.9)

M(i,j) = E(i,j) ∩ ¬O(i,j) ∩ ¬C(i,j)   (4.10)

Using this method, the robot uses and maintains three different maps:

• The map of cells that are likely to be empty (Figure 4.2), which is constructed using the E(i,j).
Darker cells correspond to a higher possibility of being empty, while lighter cells correspond to a lower possibility.

• The map of cells that are likely to be occupied (Figure 4.3), which is constructed using the O(i,j). Darker cells correspond to a higher possibility of being occupied, while lighter cells correspond to a lower possibility.

• The navigation map (Figure 4.4), which is constructed using the M(i,j). This is the map that is later used for path planning. Darker cells correspond to a higher certainty of being an obstacle, while lighter cells correspond to a higher certainty of being empty.

Figure 4.2: Empty cells map. Figure 4.3: Occupied cells map. Figure 4.4: Navigation map.

One important point to be noted here derives from the fact that the original method (Oriolo et al., 1998) only takes sonar sensors into account, modelling the uncertainty directly on the map. In this case, virtual sensors are used to build the map instead of using the sonars directly. These virtual sensors fuse sonar and infrared sensors to obtain a more reliable measure (see chapter 3). For distances greater than 80 centimetres the fused value behaves like a sonar reading, while for smaller distances it behaves like an infrared reading. In this case the infrared beam is more or less similar to the sonar one when projected onto a cell map, because the beam is discretised into 10-20 centimetre cells. Thus, the map model is still valid.

Another important difference is the map initialisation. The original method was intended for a sense-map-plan-move approach, and the unexplored area is modelled as an ambiguous (ε_ij = 1.0, ω_ij = 1.0) or occupied area (ε_ij = 0.0, ω_ij = 1.0). As has been stated above, this behaviour is not desired here, and consequently the path-planning procedure is different.
This requires that the unexplored area is modelled as empty area (ε_ij = 1.0, ω_ij = 0.0), so that tentative paths may cross unexplored areas if necessary. Finally, all the cells occupied by the robot are set to the maximum values of certainty that the cells are empty (ε_ij = 1.0) and not occupied (ω_ij = 0.0). Trivially, if the robot is over these cells, there is no obstacle there. The influence of this strategy can be noted in the maps shown above (Figure 4.2) as very dark cells over the robot's trajectory.

4.3.2 A*-based planning

The most direct method for path planning using grid maps is to consider each cell as a node in a graph and then apply a graph search algorithm. The A* algorithm (Hart et al., 1968) allows heuristic information to be incorporated when available, resulting in an efficient search. Let s be the cell on which the robot is currently located. Let t be the desired objective cell. Let f(n) be an evaluation function that heuristically decides how likely the node n is to be expanded. The A* algorithm proceeds as follows:

(s1) Mark s as open, and calculate f(s).
(s2) Select the open node n whose value f(n) is smallest. Resolve ties arbitrarily, but always in favour of any node n = t.
(s3) If n = t, mark n as closed and terminate the algorithm.
(s4) Otherwise, mark n as closed and apply the successor operator to n. Calculate f(n) for each successor of n and mark as open each successor not already marked as closed. Re-mark as open any closed node n_i which is a successor of n and for which f(n_i) is smaller now than it was when n_i was marked closed. Go to step s2.

The evaluation function f(n) can be written as a sum of two parts (Eq. 4.11):

f(n) = g(n) + h(n)   (4.11)

where g(n) is the actual cost of an optimal path from s to n, and h(n) is the actual cost of an optimal path from n to the goal t. As f(n) is not known a priori, an estimate f*(n) is used. The new evaluation function can be written as a sum of two parts (Eq.
4.12):

f*(n) = g*(n) + h*(n)   (4.12)

where g*(n) is the cost of the path from s to n with minimum cost found so far, and h*(n) is an estimate of the cost of an optimal path from n to the goal t. From the definitions, the following condition (Eq. 4.13) must hold:

0 ≤ g(n) ≤ g*(n)   (4.13)

A* is said to be complete under the admissibility condition (Eq. 4.14):

0 ≤ h*(n) ≤ h(n)   (4.14)

Moreover, the heuristic function h*(n) is said to be locally consistent if, for any pair of adjacent cells (a, b), the following condition (Eq. 4.15) holds:

0 ≤ h*(a) ≤ h*(b) + w(a,b)   (4.15)

where w(a,b) is the cost of the arc between the cells a and b. The choice h*(n) = 0 is trivially admissible and locally consistent, resulting, however, in a non-informed search.

Taking a fuzzy grid map M(i,j), as described in the previous subsection, the path planning problem can be stated in the following way: the robot has to navigate from an initial cell s' to a goal cell t'. Instead of looking for a path starting at the robot location, A* is applied to look for a path starting at the goal cell. In this way, the starting position s' corresponds to the A* goal t, and the goal position t' corresponds to the A* start s. There are approaches that solve this problem taking into account sensor noise, the robot size and minimum length paths (Oriolo et al., 1998), but not simultaneously. In this thesis a different approach has been taken to find minimum length paths while providing safe clearance of obstacles. Using the previous problem statement, the h*(n) function (Eq. 4.16) is defined as follows:

h*(n) = √((t_x − n_x)² + (t_y − n_y)²)   (4.16)

where (t_x, t_y) and (n_x, n_y) are the coordinates of the cells t and n respectively. Trivially, the admissibility condition (Eq. 4.14) holds, as do the local consistency criteria (Eq. 4.15), because this h*(n) is the Euclidean distance of the most direct path between n and the goal t.
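Steps (s1)-(s4), together with the Euclidean heuristic, can be sketched on an 8-connected grid. The fragment below is an illustrative Python version that uses plain unit and √2 step costs, rather than the dilated and augmented cost g*(n) used in the thesis:

```python
import heapq
from math import hypot, sqrt

def astar(grid, start, goal):
    """A* over an 8-connected grid with an admissible Euclidean heuristic.

    grid[y][x] == 1 marks an obstacle; start and goal are (x, y) cells.
    Minimal sketch with unit/sqrt(2) step costs, not the thesis's
    dilated and augmented cost function.
    """
    cols, rows = len(grid[0]), len(grid)
    h = lambda c: hypot(goal[0] - c[0], goal[1] - c[1])
    open_heap = [(h(start), 0.0, start, [start])]   # (f, g, cell, path)
    closed = {}                                     # cell -> best g so far
    while open_heap:
        f, g, cell, path = heapq.heappop(open_heap)
        if cell == goal:
            return path
        if cell in closed and closed[cell] <= g:    # stale queue entry
            continue
        closed[cell] = g
        x, y = cell
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == dy == 0:
                    continue
                nx, ny = x + dx, y + dy
                if 0 <= nx < cols and 0 <= ny < rows and grid[ny][nx] == 0:
                    ng = g + (sqrt(2) if dx and dy else 1.0)
                    succ = (nx, ny)
                    heapq.heappush(open_heap,
                                   (ng + h(succ), ng, succ, [*path, succ]))
    return None     # every reachable state evaluated, no path exists
```

Because the Euclidean heuristic never overestimates the remaining cost on such a grid, the admissibility condition holds and the first time the goal is popped its path is optimal.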
The definition of g*(n) is more complex, as it is computed in a two-step process: first, the fuzzy map is dilated locally around the cell a, and second, the resulting cost is augmented. The dilation step dil(a) is performed as follows (Eq. 4.17):

dil(a) = min { M_ij : i ∈ [a_x − δ, a_x + δ], j ∈ [a_y − δ, a_y + δ] } / k_dil   (4.17)

where δ is the size of the dilation window, and k_dil is the dilation weight. The augmentation step aug(x) is performed as follows (Eq. 4.18):

aug(x) = e^(k_aug · x)   (4.18)

where k_aug is the augmentation weight. Finally, the g*(n) function is recursively computed as follows (Eq. 4.19):

v(n) = aug(dil(n))
g*_inc(n, n+1) = v(n)   if n_x = (n+1)_x ∨ n_y = (n+1)_y;   √2 · v(n) otherwise   (4.19)
g*(s) = 0
g*(n+1) = g*(n) + g*_inc(n, n+1)

where v(n) is the composition of the dilation and augmentation steps, and g*_inc(n, n+1) is the incremental step for the computation of g*(n). The incremental function g*_inc(n, n+1) computes the cost of traversing from cell n to n+1, where n+1 is the next cell in the path to the goal. Basically, the function returns v(n) if both cells lie on the same horizontal or vertical line, or √2 · v(n) if the cells are diagonally aligned. The dilation function serves to fill small gaps in the maps, so that tentative paths do not cross walls or small openings. By choosing different values of δ, different robot radii or sizes can be taken into account. The augmentation function exponentially spreads the degrees of certainty that cells are occupied or empty, so that small certainty values give similar results while high certainty values give much bigger results. The idea is to avoid the use of occupied cells for path planning, and thus this augmentation favours the use of cells with lower certainty values M(i,j) when possible.
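Equations (4.17)-(4.19) can be transcribed almost directly. The fragment below is an illustrative Python transcription that follows the printed form of (4.17) (window extremum of M scaled by k_dil) and uses the parameter values reported later in the text for Quaky-Ant; it is a sketch of the cost terms, not the thesis planner itself:

```python
from math import exp, sqrt

# Parameter values as reported in the text for the Quaky-Ant robot.
K_DIL, K_AUG, DELTA = 2.0, 5.0, 2

def dil(m, ax, ay, delta=DELTA, k_dil=K_DIL):
    """Local dilation (Eq. 4.17): extremum of M over a (2*delta+1)-wide
    window around cell (ax, ay), scaled by k_dil. Window is clipped at
    the map borders."""
    rows, cols = len(m), len(m[0])
    window = [m[j][i]
              for j in range(max(0, ay - delta), min(rows, ay + delta + 1))
              for i in range(max(0, ax - delta), min(cols, ax + delta + 1))]
    return min(window) / k_dil

def aug(x, k_aug=K_AUG):
    """Augmentation (Eq. 4.18): exponential spreading of certainty values."""
    return exp(k_aug * x)

def g_inc(m, n, n1):
    """Incremental cost g*_inc between adjacent cells n and n1 (Eq. 4.19)."""
    v = aug(dil(m, n[0], n[1]))            # v(n) = aug(dil(n))
    straight = n[0] == n1[0] or n[1] == n1[1]
    return v if straight else sqrt(2) * v
```

Summing g_inc along a candidate path, starting from g*(s) = 0, yields the g*(n) term of the evaluation function; the dilation is only ever evaluated around cells the search actually visits, which is what keeps the transformation local.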
The dilation function is applied only locally, to avoid transforming the whole map: plans usually involve small portions of the map and the dilation process is time consuming. A sample of the effects of the evaluation function f*(n) is shown below (Figure 4.5). The dilation effect is observed as peaks with half the height of their surroundings. The augmentation effect is observed as the difference between cells with different certainty values.

Figure 4.5: Evaluation function f*(n).

Using the previous A* based planning scheme, the navigation strategy uses the computed path to set a look-ahead point, which is used as a short-term goal location. This look-ahead point is defined as the point of the path from s to t which is at a distance of δ_look cells from s. The distance δ_look depends on both the robot's average speed and the navigation control cycle frequency. For the Quaky-Ant robot, the following navigation parameters are used: k_dil = 2.0, δ = 2, k_aug = 5.0, δ_look = 10. A sample path is shown below (Figure 4.6), where the dilation effect (the path leaves enough clearance near the walls) and the look-ahead point (small circle in front of the robot) can be noted.

Figure 4.6: A* based path.

4.3.3 Experiments and results

Different experiments have been conducted to test the simultaneous map-building and path-planning. One of these tests is shown below (Figure 4.7). The Quaky-Ant robot is simulated in autonomous mode (see chapter 6), and has to navigate from a given home position (bottom-right corner) to a goal location (top-left corner) in an a priori unknown environment. The different slides show the evolution of the fuzzy grid map (top image) and the A* evaluation function f*(n) (bottom image). The world map is superimposed on the map images for reference. Cells that are not expanded are represented with the maximum value of f*(n) found so far.
The robot starts (position “a”) with no previous knowledge, and thus the path to the goal is an almost straight line. As the robot adds new features to the fuzzy grid map, the path consequently avoids the modelled walls (position “b”). Due to sensor noise, parts of the walls are not modelled, and in this way paths can be generated through them if the corresponding holes are bigger than the robot (position “c”). If more evidence about the wrongly modelled walls is acquired, even if they are not completely continuous, the dilation step prevents the method from crossing the walls with a path (position “d”). When the local environment is almost fully modelled (position “e”), the path is traced over safe positions until the robot reaches the goal (position “f”).

a) Starting the run. b) Crossing the 1st doorway. c) Entering the corridor. d) In the corridor. e) Crossing the 2nd doorway. f) Reaching the goal.

Figure 4.7: Simultaneous map-building and path-planning (simulated).

The experiment area has a dimension of 5.4 x 4.95 metres, and it is tessellated into 0.075 metre wide cells, resulting in a grid map that contains 4,752 cells. The following table (Table 4.1) shows the number of cells in the path, the number of expanded nodes, and the time taken to expand the nodes in each of the previous navigation slides (Figure 4.7). The test has been performed on a Mac PowerPC G3 running at 300 MHz on top of the BGen environment (see chapter 5), which has been developed in Java.

Position    a    b    c    d    e    f
Path        52   53   48   40   20   5
Expanded    52   814  579  374  99   5
Time (ms)   5    100  148  23   7    1

Table 4.1: Number of expanded cells.

In addition to simulated experiments, the navigation approach has also been tested in real world environments (see chapter 6). One of these is shown below (Figure 4.8).
The mapped area has a dimension of approximately 8 x 8 metres, and it is composed of wooden walls (bottom-left room) and standard building walls (top-right corridor). The cleanness of the map and the safe path generated are points to be noticed.

Figure 4.8: Simultaneous map-building and path-planning (real world).

4.4 Map building and localisation

The first navigation procedure relied on the assumption of reliable odometry for short time and distance operations. This is clearly an important constraint in large environments or in operations requiring a long time. To overcome this limitation, the previous system is complemented with a localisation method based on a fuzzy segments map. Global grid maps are not suitable for localisation, while local grid maps (based on sonar signature matching) require a precise compass reference to obtain reliable results (Duckett and Nehmzow, 1998). In addition, the previous approach, if provided with a reliable localisation, is fast and produces safe paths. Taking these facts into account, it was decided to improve the system by adding a separate localisation technique which does not need an a priori environment model. The new navigation procedure is as follows. The fuzzy grid map and the A* path planner are executed in a similar way as described in the previous section, but instead of using positions returned by the odometry, they use the corrected localised positions. Simultaneously, a local fuzzy segments map is updated each time a sensor buffer is full (approximately every 1200 ms). This is compared (approximately every 2500 ms) to a global fuzzy segments map to compute the differences and thus obtain the current odometry drift, which is then corrected. Both the local and global maps are constructed from scratch, without any prior knowledge. The different algorithms are explained below.
4.4.1 Fuzzy segments map

The method used for map building is based on the fuzzy segments map (Gasós and Martín, 1996b)(Gasós, 2000), which is a geometric representation composed of line segments only. By carefully managing the uncertainty related to the robot's position and sensor noise, the method tries to overcome some of the problems typically associated with geometric models. In this way, a fuzzy segments map largely avoids the instability of such models by reducing the expressive power of the line fittings and keeping the uncertainty all the way along the process. The method is described below. Let {o_u, …, o_v}, v ≥ u + 3, be a set of at least four consecutive collinear sensor observations, and let {(x_u, y_u), …, (x_v, y_v)} be the corresponding points in the plane. A segment is a tuple (Eq. 4.20):

B = ⟨θ, ρ, (x_i, y_i), (x_j, y_j), k⟩    (4.20)

where θ, ρ are the parameters of the line equation x·cos(θ) + y·sin(θ) = ρ, obtained by eigenvector line fitting of the k = v − u + 1 points in the plane, and the limits of the segment (x_i, y_i), (x_j, y_j) are the perpendicular projections onto the line of the first and last points (x_u, y_u) and (x_v, y_v) respectively. The uncertainty in the real location of a segment B is due to three different groups of factors: the lack of knowledge about the way the sensor signal reflects off objects, the distance between the sensor and the object, and the uncertainty of the robot location due to estimation by dead reckoning. Given a segment B, the uncertainty of its real location due to the first group of factors is expressed by representing the value of ρ as a trapezoidal fuzzy set f_ρ1 which depends on the scatter of the points. The trapezoidal fuzzy set is built by assigning the confidence interval with confidence level 0.68 to the α-cut at 1, and the interval with
For a normal distribution of observations, the confidence levels 0.68 and 0.95 generate interval limits located at a distance equal to one and two times the standard deviation, respectively. These intervals are computed from a sample of size k drawn from a normal distribution with sample mean ρ and variance of the mean s_ρ² given by the following equation (Eq. 4.21):

s_ρ² = ( Σ_{l=u..v} ρ_l² ) / ( k(k − 1) )    (4.21)

where ρ_l is the perpendicular distance from the point (x_l, y_l) to the line. Symmetrical 1−β confidence limits are given by ρ ± t_{β/2}·s_ρ, in which t_{β/2} is the β/2 percentage point of the t distribution with k−1 degrees of freedom. Thus f_ρ1 is defined as follows (Eq. 4.22):

f_ρ1 = (−t_{0.025}·s_ρ, −t_{0.16}·s_ρ, t_{0.16}·s_ρ, t_{0.025}·s_ρ)    (4.22)

The uncertainty of the real location of B due to the second group of factors is represented by a trapezoidal fuzzy set f_ρ2 (Eq. 4.23):

f_ρ2 = (−k₂·d, −k₁·d, k₁·d, k₂·d), k₂ > k₁    (4.23)

where k₁ and k₂ are constants whose values depend on the particular sensor and working environment, and they are calculated as bounds of the uncertainty due to these factors in normal working conditions. The value d is the average distance of the observations used to build the segment. Finally, the uncertainty in the real location of B due to the third group of factors is represented by a trapezoidal fuzzy set f_ρ3 (Eq. 4.24):

f_ρ3 = (−k₄·a, −k₃·a, k₃·a, k₄·a), k₄ > k₃    (4.24)

where k₃ and k₄ are constants whose values depend on the particular robot and terrain properties, and a measures the displacement of the robot since the last time its location was updated using an absolute positioning system. Since the three sources of uncertainty are mutually independent, the total uncertainty in the real location of the segment, f_ρ, is obtained by addition of the corresponding trapezoidal fuzzy sets (Eq.
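The trapezoidal fuzzy sets f_ρ1, f_ρ2, f_ρ3 and their addition can be sketched with a small helper class. This is an illustrative data structure under the standard convention that a trapezoid (a, b, c, d) has support [a, d] and core [b, c]; fuzzy addition of trapezoids reduces to adding the four parameters (interval arithmetic on each α-cut).

```python
class Trapezoid:
    """Trapezoidal fuzzy set (a, b, c, d): support [a, d], core [b, c]."""

    def __init__(self, a, b, c, d):
        assert a <= b <= c <= d
        self.a, self.b, self.c, self.d = a, b, c, d

    def __add__(self, other):
        # Fuzzy addition: alpha-cut interval arithmetic, which for
        # trapezoids is just parameter-wise addition.
        return Trapezoid(self.a + other.a, self.b + other.b,
                         self.c + other.c, self.d + other.d)

    def membership(self, x):
        """Piecewise-linear membership degree of x in [0, 1]."""
        if x < self.a or x > self.d:
            return 0.0
        if self.b <= x <= self.c:
            return 1.0
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.d - x) / (self.d - self.c)
```

Summing three symmetric trapezoids built from Eqs. 4.22–4.24 yields another symmetric trapezoid, whose width directly reflects the accumulated uncertainty on ρ.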
4.25):

f_ρ = f_ρ1 ⊕ f_ρ2 ⊕ f_ρ3    (4.25)

A fuzzy segment is a segment to which information regarding the uncertainty of the parameter ρ has been added as a trapezoidal fuzzy set (Eq. 4.26):

F = ⟨θ, ρ, f_ρ, (x_i, y_i), (x_j, y_j), k⟩    (4.26)

Given a fuzzy segment F, the uncertainty of the real position has been represented previously as a fuzzy set f_ρ, and the uncertainty of the real orientation θ is now computed as a trapezoidal fuzzy set f_θ (Eq. 4.27). Writing f_ρ = (−ρ₀, −ρ₁, ρ₁, ρ₀):

f_θ = ( θ − atan(2ρ₀/l), θ − atan(2ρ₁/l), θ + atan(2ρ₁/l), θ + atan(2ρ₀/l) )    (4.27)

where l is the length of the segment. Each fuzzy segment provides information about the position and orientation of an object boundary, and also about the uncertainty in the robot's knowledge of these values. Since the segments are obtained from different sensors or from different positions of the robot, a key aspect of the map building process is detecting and combining the segments that might have come from the same object boundary. Fuzzy segments that are found to be collinear are combined and their respective uncertainties are propagated in the integration process. Two fuzzy segments F1 and F2 are considered to be collinear if the following criterion is satisfied (Eq. 4.28):

M(f_θ1, f_θ2) ≥ 0.5 ∧ M(f_ρ1, f_ρ2) ≥ 0.5    (4.28)

where

M(X, Y) = ( (A_X + A_Y) · A_XY ) / ( 2·A_X·A_Y )    (4.29)

and A_X and A_Y denote the area enclosed by the fuzzy sets X and Y respectively, and A_XY denotes the area of the intersection of X and Y. Once two collinear segments have been detected, they have to be combined to form a boundary and their associated uncertainty has to be propagated. A fuzzy boundary is a fuzzy segment obtained by successively combining all collinear segments that intersect. Given two collinear fuzzy segments F1 and F2, the combination is a new fuzzy segment Fr (Eq.
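The collinearity test of Eq. 4.28 can be sketched numerically. A caveat: the matching degree M(X, Y) is reconstructed here from a damaged formula as the mean of the two relative overlaps A_XY/A_X and A_XY/A_Y, so this reading of Eq. 4.29 is an assumption; the intersection area is integrated numerically to keep the sketch simple.

```python
def trap_mu(t, x):
    """Membership of x in trapezoid t = (a, b, c, d)."""
    a, b, c, d = t
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def trap_area(t):
    """Exact area under a trapezoidal membership function."""
    a, b, c, d = t
    return ((d - a) + (c - b)) / 2.0

def matching(t1, t2, n=100000):
    """Reconstructed matching degree M(X, Y) (Eq. 4.29, assumed form):
    (A_X + A_Y) * A_XY / (2 * A_X * A_Y), i.e. the mean of the two
    relative overlaps. A_XY is computed by midpoint-rule integration
    of min(mu_X, mu_Y)."""
    lo, hi = min(t1[0], t2[0]), max(t1[3], t2[3])
    dx = (hi - lo) / n
    a_xy = sum(min(trap_mu(t1, lo + (i + 0.5) * dx),
                   trap_mu(t2, lo + (i + 0.5) * dx)) for i in range(n)) * dx
    ax, ay = trap_area(t1), trap_area(t2)
    return (ax + ay) * a_xy / (2.0 * ax * ay)

def collinear(f_theta1, f_rho1, f_theta2, f_rho2):
    """Collinearity criterion (Eq. 4.28): both the orientation and the
    distance fuzzy sets must match with degree >= 0.5."""
    return matching(f_theta1, f_theta2) >= 0.5 and matching(f_rho1, f_rho2) >= 0.5
```

Identical fuzzy sets give M ≈ 1, disjoint ones give M = 0, so the 0.5 threshold of Eq. 4.28 sits halfway between full overlap and no overlap.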
4.30):

Fr = ⟨θ_r, ρ_r, f_ρr, (x_ir, y_ir), (x_jr, y_jr), k₁ + k₂⟩    (4.30)

where

θ_r = (k₁·θ₁ + k₂·θ₂) / (k₁ + k₂)
ρ_r = (k₁·ρ₁ + k₂·ρ₂) / (k₁ + k₂)    (4.31)
f_ρr = (2 − M(f_ρ1, f_ρ2)) · (k₁·f_ρ1 ⊕ k₂·f_ρ2) / (k₁ + k₂)

and (x_ir, y_ir), (x_jr, y_jr) are the extreme points of the perpendicular projections of (x_i1, y_i1), (x_j1, y_j1), (x_i2, y_i2), (x_j2, y_j2) onto the line (θ_r, ρ_r). A fuzzy segments map is a collection FM (Eq. 4.32) of fuzzy boundaries:

FM = {F₁, F₂, …, F_m}    (4.32)

The original work included a method to generate segments based on single sensor scans. The idea was to obtain some measurements from a single sensor. These measurements were then preprocessed to eliminate easily detectable misreadings and to group together smoothly changing measurements. Finally, each group of measurements was fitted to a straight line to obtain fuzzy segments. This approach (Figure 4.9) combines the information from different sensors at the segment level: if two segments from two different sensors are collinear, the segments are combined and merged into a new segment.

Figure 4.9: Fuzzy segments map from single sensors.

While the original approach generates good maps, it lacks the ability to combine measurements from different sensors at the sensor level. This makes the map generation process slow, because a lot of time is needed to obtain enough sensor measurements to fit a straight line. The previous example (Figure 4.9) shows the trajectory followed by the robot (yellow line), the fuzzy segments (blue and grey lines), and some sonar scans (orange dots). As can be clearly seen, the bottom-right measurements could be fitted with a line, but as the points come from different sensors, the original approach does not generate a fuzzy segment there. To overcome this problem, a new segment generation method has been developed. The new method builds and maintains a circular sensor buffer in a heuristic way, which is described below.
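The combination rule of Eqs. 4.30–4.31 can be sketched as below. The equations were reconstructed from damaged text, so the weighted-average form (support counts k as weights, and a factor 2 − M inflating the merged uncertainty when the match is weak) is an interpretation; the tuple layout of a segment is illustrative.

```python
def combine(seg1, seg2, m_rho):
    """Merge two collinear fuzzy segments (sketch of Eqs. 4.30-4.31).

    seg = (theta, rho, f_rho, k), with f_rho a trapezoid given as a
    4-tuple of offsets and k the number of supporting observations.
    m_rho is the matching degree M(f_rho1, f_rho2) in [0, 1]; the
    scale factor (2 - m_rho) >= 1 widens the merged uncertainty when
    the two segments match poorly."""
    t1, r1, f1, k1 = seg1
    t2, r2, f2, k2 = seg2
    k = k1 + k2
    theta = (k1 * t1 + k2 * t2) / k          # support-weighted orientation
    rho = (k1 * r1 + k2 * r2) / k            # support-weighted distance
    scale = 2.0 - m_rho
    f_rho = tuple(scale * (k1 * a + k2 * b) / k for a, b in zip(f1, f2))
    return theta, rho, f_rho, k
```

Merging a segment with an identical one under a perfect match (m_rho = 1) leaves θ, ρ and f_ρ unchanged while doubling the support count, which is the behaviour one would expect of a consistent fusion rule.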
Let n be the number of sensors, and let m be the number of measurements stored in the buffer for each sensor. The number of entries in the sensor buffer is t = n x m. When a series of new values for the n sensors is available, those measurements that are smaller than a given threshold δ_len sequentially overwrite the oldest previously stored values. The sensor buffer is shown below (Figure 4.10), where light grey cells are the old values and dark grey cells are the most recent sensor readings. The buffer thus constructed serves to extract candidate points for the line fitting procedure.

Figure 4.10: Raw sensor buffer (left) and sensor buffer with segments (right).

To generate a fuzzy segment, at least four points are required to be considered as a line. The algorithm to generate a line fitting is as follows:

(s1) Initialisation. Let O = {o₁, o₂, …, o_t} be the sensor buffer. Let P be the set of candidate points that constitute a segment. Let i be an arbitrary cell in the sensor buffer, which for simplicity is set to 1. Let δ_max be the maximum distance between consecutive points on a line. Let δ_gap be the maximum number of outliers in a segment, that is, the number of cells in a sequence that do not belong to a line. Let δ_ang be the maximum difference in orientation between two lines for them to be considered collinear.

(s2) First point. Let i_last be equal to i, let i_next be equal to i + 1, and let g be equal to 0. The value o_i is set as the first candidate point to form the line, and consequently P = {o_i}. Increment the index i.

(s3) Second point. If the distance from o_i to P₁ is smaller than δ_max, then let i_last be equal to i, let P = P ∪ {o_i}, increment i, and go to step s4. Otherwise, increment i and g. If g > δ_gap then let i be equal to i_next, and go to step s2. Otherwise, repeat step s3.

(s4) Third point. Let α be the orientation of the line P₁P₂, and β be the orientation of the line P₂o_i.
If the distance from o_i to P₂ is smaller than δ_max and |α − β| < δ_ang, then let i_last be equal to i, let P = P ∪ {o_i}, increment i, and go to step s5. Otherwise, increment i and g. If g > δ_gap then let i be equal to i_next, and go to step s2. Otherwise, repeat step s4.

(s5) Extra points. Let u be the number of points in P. Let α be the orientation of the line P_{u−1}P_u, and β be the orientation of the line P_u o_i.

(s5.1) If the distance from o_i to P_u is smaller than δ_max and |α − β| < δ_ang, then let i_last be equal to i, let P = P ∪ {o_i}, and increment i. If i = t, then generate a fuzzy segment with P and go to step s2. Otherwise, repeat step s5.

(s5.2) Otherwise, increment i and g. If g > δ_gap and u ≥ 4, then generate a fuzzy segment with P and go to step s2. Otherwise, if g > δ_gap then let i be equal to i_next, and go to step s2. Otherwise, repeat step s5.

This method heuristically obtains more consistent segments in a given environment than the original approach, while requiring fewer movements. The method has been tested with the following values: n = 16, m = 10, δ_len = 2 metres, δ_max = 0.2 metres, δ_gap = 10, δ_ang = 15º. An example is shown below (Figure 4.11) with the trajectory followed by the robot (yellow line), the fuzzy segments (blue and red lines), and some sonar scans (grey dots). As can be seen, the trajectory followed by the robot is much shorter than in the previous example (Figure 4.9) for the same environment, while there are many more consistent segments. This is a key point for achieving a good localisation (see next subsection).

Figure 4.11: Fuzzy segments map from sensor buffer.

4.4.2 Fuzzy segments localisation

The fuzzy segments map representation is used for self-localisation (Gasós and Martín, 1996a). While the robot is navigating in an environment in order to accomplish a task, the sensor observations are used to build a partial fuzzy segments map of the robot's surroundings. This map is compared to the global fuzzy segments map, which has been previously constructed.
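The chaining logic of steps s1–s5 can be condensed into a short sketch. This is a simplified reading of the algorithm, not a faithful transcription: it walks the buffer once, grows a point chain while consecutive points stay within δ_max and the local orientation changes by less than δ_ang, tolerates up to δ_gap outliers, and accepts chains of at least four points; the restart bookkeeping (i_last, i_next) is approximated.

```python
import math

def extract_segments(buf, d_max=0.2, d_gap=10, d_ang=math.radians(15)):
    """Heuristic point-chaining over a sensor buffer (sketch of steps s1-s5).

    buf is a list of (x, y) points, with None for discarded readings.
    Returns the list of point chains (candidate segments) of length >= 4,
    ready to be fitted to fuzzy segments."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def ang(p, q):
        return math.atan2(q[1] - p[1], q[0] - p[0])

    segments, i, t = [], 0, len(buf)
    while i < t:
        if buf[i] is None:
            i += 1
            continue
        pts, gaps, j = [buf[i]], 0, i + 1
        while j < t and gaps <= d_gap:
            p = buf[j]
            if p is not None and dist(p, pts[-1]) < d_max and (
                    len(pts) < 2 or
                    abs(ang(pts[-2], pts[-1]) - ang(pts[-1], p)) < d_ang):
                pts.append(p)      # point continues the chain
                gaps = 0
            else:
                gaps += 1          # tolerated outlier
            j += 1
        if len(pts) >= 4:          # at least four points make a segment
            segments.append(pts)
            i = j                  # resume after the consumed stretch
        else:
            i += 1                 # restart from the next buffer cell
    return segments
```

Each returned chain would then be passed to the eigenvector line fitting to produce a fuzzy segment with its trapezoidal uncertainty.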
When the matching results provide enough evidence of the coincidence of both maps, the robot location is updated based on their differences. This localisation task runs as a continuous process for as long as the robot is moving, in order to continuously update its position. The global map is built from scratch while keeping track of the areas that have already been visited. For this task a coarse binary grid (1 x 1 metre cells) represents the areas visited by the robot. While the robot moves through areas not yet visited, it adds new segments to the global map and the cells are marked as visited. When the robot arrives at visited areas, it adds the new segments to the local map. The local and global maps are compared until a significant number of local segments match those in the global map. Then, the transformation that brings both maps together is used to estimate the dead reckoning errors accumulated in the time span between the construction of the maps. In this case, the local map is added to the global map, allowing the map to be completed and updated with changes in the environment, and the process continues in the same way. One key point in the localisation process is the correct matching between the local and global maps. If there are not enough perpendicular segments, the matching is not considered: in other words, if only parallel segments match, there is not enough certainty in the position and the localisation is not performed. Clearly, the greater the number of correct segments that are generated, the better the localisation results. This is the main benefit achieved with the segment generation scheme described in the previous subsection.

Figure 4.12: Fuzzy segments localisation.

An example is shown above (Figure 4.12), which represents the robot trajectory (yellow line), the current local map (blue segments), the visited areas map (light grey boxes), and the global map (dark grey segments).
The significant odometric error accumulated in the fifth complete round of the environment can be noticed; previous errors were of the same magnitude. The localisation succeeds in finding the transformation between the local map and the global map.

4.4.3 Experiments and results

Different experiments have been conducted to test the simultaneous map-building and self-localisation. One of these tests is shown below (Figure 4.13). The Quaky-Ant robot is simulated in autonomous mode (see chapter 6), and has to navigate from a given home position (bottom-right corner) to a goal location (top-left corner) and then return to the home position in an a priori unknown environment. The different slides show the evolution of the sensor buffer (top image) and the local and global fuzzy segments maps (bottom image).

a) Starting the run. b) In the corridor. c) Reaching the goal. d) Returning to the home position. e) Crossing a doorway. f) Reaching the home position.

Figure 4.13: Simultaneous map-building and self-localisation.

The robot starts (position “a”) with no previous knowledge, and updates the global map while it is moving through unexplored areas (positions “b” and “c”). When the robot reaches a visited area it starts matching the local and global maps (position “d”) and correcting the odometry (position “e”) until it reaches the home position (position “f”).

4.5 Conclusions

Two navigation schemes have been developed to allow the Quaky-Ant robot to navigate in a priori unknown environments. The first procedure relies on the assumption of reliable odometry for short time and distance operations. Bearing this assumption in mind, the robot is provided with a map building method and a path planning algorithm, which are run simultaneously: the robot builds the map while it is moving, resulting in continuous movement. This approach is based on a fuzzy grid map for map-building and an A* based algorithm for path-planning.
The fuzzy grid map has been modified to accomplish the required task: virtual sensors are used instead of sonars, the initialisation procedure has been changed, and some heuristics have been incorporated to modify the certainty of visited cells. The A* based planner has been defined with a new evaluation function to generate paths that leave enough space between the robot and the obstacles, and also to be time efficient, so that node evaluation is fast enough to run the planner at a high frequency. The navigation scheme has been successfully used in different scenarios, both simulated and real world. The second procedure overcomes the limitation of the first approach by adding a localisation method. This is accomplished by building fuzzy segments maps to model the previously visited environment and the currently sensed environment. The transformations needed to match these maps are used to correct the current robot's position. The performance of the localisation method is greatly affected by the number and quality of the fuzzy segments. The segment generation procedure has been modified to make use of a sensor buffer, which produces more consistent segments. Although the motivation behind the navigation procedure was to make the overall system more robust, it also proved capable of solving the SLAM (Simultaneous Localisation And Map-building) problem in the different tests carried out, both simulated and real world. More tests are still needed to determine in which situations or environments this set-up is valid for the SLAM problem.

Chapter 5

Control Architecture

5.1 Introduction

Previous chapters have dealt with robot hardware, sensor processing and navigation algorithms. This chapter deals with frameworks to group those systems, concepts and algorithms into a working system by providing a processing structure. This section starts by introducing some concepts and previous works related to control architectures.
The other sections then describe:

• A new control architecture, putting emphasis on its characteristics and relating it to previous works.
• A new robot programming language, presenting its syntax and semantics and a comparison with other robotic languages.
• A sample control system, developed using the architecture and language proposed, intended to control the Quaky-Ant robot.

Finally, some examples are presented using the development environment built upon this language and architecture, and some remarks about the implementation of the system are given.

5.1.1 Deliberative and reactive control architectures

The operation of a mobile autonomous robot in an unstructured environment, as occurs in the real world, needs to take into account many details. A main function of the controller is to be able to operate under conditions of imprecision and uncertainty. For instance, the a priori knowledge of the environment is, in general, incomplete, uncertain, and approximate. Typically, the perceptual information acquired is also incomplete and affected by noise. Moreover, the execution of a control command is not completely reliable due to the complexity and unpredictability of real world dynamics. Early mobile robots were more concerned with the process of actually getting the robot to move than with considering the limitations of the control architecture on the performance of the vehicle. As an example, the Stanford CART (Figure 1.2) was capable of performing limited point-to-point navigation in an unknown but static environment. The operational cycle (Moravec, 1990) started with the robot in a halted position, taking some pictures, and taking a long time to process them. Then it planned a new path, executed a little bit of it, and paused again. This process was repeated until the robot achieved its goal.
The structure of the software that controlled the CART can be thought of as a sequence of separate computational units or modules, each of which is processed serially, with the output of one action forming the input to the next. Although the overall system performance seems low by modern standards, the structure of the underlying system architecture is the basis of many of today's mobile robot systems. This horizontal or functional decomposition of the robotic task forms the basis of deliberative architectures (Figure 5.1). A deliberative architecture (Nilsson, 1980) breaks the problem of driving the robot into separate functions or components, each of which must be processed in turn, with the output of one module acting as the input to the next. Traditional deliberative architectures are based on sensing followed by planning. This is also known as the sense-plan-act or sense-model-plan-act cycle. These deliberative architectures must deal with the problem of organising information and its representation within the robot. Two standard mechanisms have emerged to address this issue: hierarchical and blackboard systems.

Figure 5.1: Deliberative architecture.

Hierarchical systems decompose the control process by function. Low-level processes provide simple functions that are grouped together by higher-level processes in order to provide overall vehicle control. The NASA-NBS Standard Reference Model for Telerobot Control System Architecture (NASREM) is a classical example of a hierarchical control system (Albus et al., 1994), although it originally included both a hierarchical and a blackboard control architecture. The Autonomous Benthic Explorer (Bradley et al., 1990) underwater robot provides another typical example of hierarchical control. It utilises a distributed hierarchical control architecture organised in two broad layers.
The lower layer provides low-level control of the robot's sensors and actuators, whereas the higher layer provides mission planning and operational activity monitoring of the lower level. Blackboard systems (Hayes-Roth, 1985) decompose the control task into separate units and attempt to minimise the communication between units. Blackboard based systems rely on a common pool of information (or blackboard) shared by the independent computational processes that act on the blackboard. Fundamental to any blackboard-based system is a mechanism to provide efficient communication between the various computational processes and the blackboard. These systems provide a loose coupling between subsystems, allowing the straightforward exchange of information between them, and they provide a clear and consistent representation of information that can be used by each individual subsystem. The Ground Surveillance Robot (GSR) system (Harmon, 1987), an autonomous robotic system designed to navigate from one known geographical location to another over unknown natural terrain, is a typical blackboard-based system. GSR uses a blackboard to represent information and to pass it from one software module to another. Although blackboard based systems offer a natural mechanism for sharing information among a number of parallel computational tasks, the shared database can lead to processing bottlenecks. In addition, the asynchronous nature of the blackboard system can make software development difficult and can lead to subtle timing errors between the various computational modules. By 1985 it was becoming clear that deliberative model based systems had many shortcomings. The standard criticism of these systems is the structure's inability to react rapidly, for example in the case of an emergency.
These systems tend to have a long latency time built into the system, although there have been attempts to address this by specifying temporal constraints (Tsotsos, 1997). In addition to latency problems, deliberative systems often make the traditional assumption that the world remains static between successive activations of the perception module. Conditions that violate this assumption, for example moving obstacles, can cause problems for the robot. As quoted from Gat (Gat, 1998): “This problem often manifests itself as running researcher syndrome, characterised by having to chase the robot to push the emergency stop button after it makes a wrong turn.” Some researchers in the mid 80s suggested that a different execution mechanism was needed (Payton, 1986). The reason: mobile robots were increasingly required to navigate and perform purposeful autonomous tasks in more complex domains, in real-world-like environments. These requirements demanded some reactive capacity in their navigation systems. Whereas deliberative systems are based upon planning, reactive architectures (Figure 5.2) directly couple sensors and actuators. The reactive control paradigm is based on animal models of intelligence: the overall action of the robot is decomposed by behaviour rather than by a deliberative reasoning process. Common to most reactive systems is that goals may not be represented explicitly, but rather are encoded by the individual behaviours that operate within the controller. One key point in reactive systems is how to coordinate the different behaviours. There are two modes of behaviour coordination: cooperative and competitive. The cooperative model (Figure 5.4) provides the ability to concurrently use the outputs of more than one behaviour at a time. Since the behaviours are combined with a set of gains, the output tends to be smooth in the presence of sensor noise.
However, it is prone to local minimum traps or deadlock situations arising from possible conflicts among the different behaviours. On the other hand, the competitive model (Figure 5.3) selects only one behaviour as the final output. Consequently, it bypasses the local minimum problem, but at the expense of not being smooth in the presence of sensor noise and also suffering from limit cycles.

Figure 5.2: Reactive architecture.

To address the coordination problem, different reactive architectures have emerged, such as subsumption (Brooks, 1986), motor schemas (Arkin, 1989), reflexive behaviour (Payton, 1986) and situated agents (Agre et al., 1987). Subsumption control systems (Brooks, 1986), perhaps the best known of the reactive architectures, consist of a number of behaviour modules arranged in a hierarchy. Different layers in the architecture take care of different behaviours. Lower layers control the most basic behaviours of the device, while the higher behaviour modules control more advanced functions. The overall structure is similar to hierarchical motor patterns as used in ethology. The subsumption architecture introduced the concept of providing levels of competence, where each level includes all earlier levels. The behaviours are organised in a static hierarchy in which lower levels of competence have an implicit priority over higher ones. The final emergent behaviour of the system is then the set of reactions that emerge from the various modules provided to achieve each of these levels of competence. Lower-level layers interface to higher-level ones by suppressing the higher-level computations and superseding their results. That is, subsumption is based on a competitive coordination model (Figure 5.3).

Figure 5.3: Competitive behaviour coordination.
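The two coordination modes described above can be sketched side by side. The behaviour record fields (active flag, priority, gain, output vector) are illustrative assumptions, not an interface from the thesis: competitive coordination selects the highest-priority active behaviour, while cooperative coordination returns the gain-weighted, normalised vector sum of all active outputs, as in motor schema systems.

```python
def competitive(behaviours):
    """Competitive (winner-takes-all) coordination: the active behaviour
    with the highest priority drives the robot; the rest are suppressed."""
    active = [b for b in behaviours if b["active"]]
    return max(active, key=lambda b: b["priority"])["output"]

def cooperative(behaviours):
    """Cooperative coordination: gain-weighted, normalised vector sum of
    all active behaviour outputs (motor-schema style fusion)."""
    active = [b for b in behaviours if b["active"]]
    vx = sum(b["gain"] * b["output"][0] for b in active)
    vy = sum(b["gain"] * b["output"][1] for b in active)
    total = sum(b["gain"] for b in active)
    return (vx / total, vy / total)
```

The contrast in the text is visible directly: the competitive output is always one behaviour's exact vector (sharp but prone to chattering with noisy sensors), whereas the cooperative output is a blend (smooth but able to cancel to zero in a local-minimum conflict).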
In its original presentation, subsumption is, at least ostensibly, not a radical departure from deliberative control at all, but rather an attempt to make deliberative architectures more efficient by applying task-dependent constraints to the subsumption layers. Where subsumption radically departs from deliberative models is in its repudiation of plans, and indeed of symbolic representation in general. Time-consuming computations like planning and world modelling generate an internal state whose semantics reflect world states, whether past, present or future. Deliberative approaches get into trouble when the robot’s internal state loses sync with the reality it is intended to represent. The reactive solution to this problem is to minimise the use of internal state as much as possible. If there is no state, then it cannot lose sync with the world, sometimes stated as: “The world is its own best model.” Subsumption reached a pinnacle with a robot called Herbert, which was programmed to find and retrieve soda cans in an office environment. While Herbert’s capabilities were impressive even by today’s standards, it also appeared to represent the limits of what could be achieved with subsumption. Herbert was very unreliable, and no subsumption-based robot seems to have matched its capabilities since. One possible cause of subsumption’s apparent capability ceiling is that the architecture lacks mechanisms for managing complexity. The years following the introduction of subsumption saw a profusion of new robot control architectures, some developed more or less independently (Arkin, 1990) and others introduced as a direct response to subsumption’s shortcomings (Rosenblatt et al., 1989).
Motor schema systems (Arkin, 1989) are characterised by their neuroscientific and psychological plausibility, the absence of hard arbitration between behaviours, the fusion of behavioural outputs through vector summation, and their inherent flexibility due to the dynamic instantiation and deinstantiation of behaviours on an as-needed basis. Motor schemas consist of individual motor behaviours, each of which reacts to sensory information obtained from the environment. The output of each schema is a velocity vector that represents the direction and speed at which the robot is to move for that schema given the current sensed view of the environment. Individual schemas are simply the vector output of a computation based on a local internal state and the current input sensor values of the robot. Each schema is straightforward to compute. At any point in time a given set of schemas is active, and the outputs of the members of the active set are combined as a normalised vector sum. That is, the behaviour coordination model is cooperative (Figure 5.4), as opposed to the winner-takes-all approach of the subsumption architecture.

Figure 5.4: Cooperative behaviour coordination.

5.1.2 Hybrid control architectures

If a reactive system can be built to provide control for a limited task, one obvious extension is to build different reactive modules for different portions of the robot’s mission and to swap between them as required. Various mechanisms have been proposed to allow the specification of which set of behaviours should be operational and for how long. These include BDL (Rochwerger et al., 1994), RAP (Firby, 1994), DAMM (Payton, 1986), RPL (McDermott, 1991), and many others.
The Behaviour Description Language (BDL) was designed for the integration of multiple behaviours on a task-specific basis (Rochwerger et al., 1994) in an autonomous all-terrain robot (based on an HMMWV platform) for driving on and off roads. Overall control of the robot is accomplished via a collection of independent perception-action processes that are executed concurrently. Each process converts sensory data into some type of action. A global blackboard is used for communication between the processes. The overall behaviour of the robot is defined by the set of processes currently running. BDL is based on an augmented finite state machine model. Transitions between states are controlled by external events, and associated with each state is a run set (the minimum set of behaviours to be executed) and a kill set (the behaviours that must not be executed when the robot is in this state). This model provides a mechanism to allow different sets of behaviours to be enabled under specific circumstances. The Reactive Action Packages system (RAP) provides an architecture to integrate and interface between reactive control modules (Firby, 1989)(Firby, 1994). It allows continuous reactive primitives to be grouped together and executed in sequence or in parallel to perform complex tasks. Underlying the RAP system is the concept that individual reactive behaviours are executed in parallel, and the state of the individual behaviours, the robot and the environment can be queried by the RAP controller. On the basis of this state information and of asynchronous events generated by the reactive behaviours, different behaviours can be scheduled and sequenced to generate more complex plans. These systems, in general, are not taskable, that is, it is not possible to change the task they perform without rewriting their control program. This is due to the fact that the behaviour combination mechanism implicitly specifies the task.
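The run-set/kill-set state model described above for BDL can be sketched roughly as follows. The state names, events and behaviour names are invented, and BDL's actual syntax and semantics are of course richer than this toy dictionary-driven machine.

```python
# Sketch of a BDL-style augmented state machine: each state carries a run
# set (behaviours that must execute) and a kill set (behaviours that must
# not), and external events drive the transitions.

STATES = {
    "on_road":  {"run": {"follow_road", "avoid"},    "kill": {"cross_country"}},
    "off_road": {"run": {"cross_country", "avoid"},  "kill": {"follow_road"}},
}
TRANSITIONS = {("on_road", "road_ends"):   "off_road",
               ("off_road", "road_found"): "on_road"}

def step(state, event):
    """Follow a transition if one matches, otherwise stay in place."""
    return TRANSITIONS.get((state, event), state)

def active_behaviours(state, available):
    """Enable the run set, then remove anything in the kill set."""
    spec = STATES[state]
    return (available | spec["run"]) - spec["kill"]

s = step("on_road", "road_ends")                    # now "off_road"
enabled = active_behaviours(s, {"telemetry"})       # behaviours enabled off-road
```

Events the machine does not recognise leave the state unchanged, which matches the idea that the set of running processes only changes under specific circumstances.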
The natural evolution of these systems is the so-called hybrid architecture, which combines reactive controllers and deliberative modules. At least three different groups of researchers working more or less independently came up with very similar solutions to the tasking problem at about the same time (Connell, 1992)(Gat, 1991)(Bonasso, 1991). All three solutions consisted of control architectures comprising three main components: a reactive feedback control mechanism, a slow deliberative planner, and a sequencing mechanism connecting the first two. Connell’s sequencer was based on subsumption, Bonasso used the REX/GAPPS system (Kaelbling, 1987), and Gat’s was based on RAP. Bonasso’s group later adopted RAP as their sequencing component, while Gat’s sequencer was developed into the new language ESL (Gat, 1997). The RAP-based architecture is called 3T, Connell’s subsumption-based architecture is called SSS, and Gat’s architecture is called ATLANTIS.

Figure 5.5: Three-layer hybrid architecture.

These architectures, known as three-layer architectures (Gat, 1998), organise algorithms according to whether they contain no state, state reflecting memories about the past (world models), or state reflecting predictions about the future (plans). The three-layer architecture consists of three components (Figure 5.5): a reactive feedback control mechanism, a reactive plan-execution mechanism, and a mechanism for performing time-consuming deliberative computations. These components run as separate computational processes. This is most easily accomplished by using a multitasking or multithreading operating system, but it can also be done by carefully structuring the code so that the components are interleaved within a single computational process. Several names have been proposed in the literature for these layers.
For instance, in the 3T architecture they are called skill layer, sequencing layer, and planning layer, and in the ATLANTIS architecture they are called controller, sequencer, and deliberator. In the context of this thesis they are referred to as controller, sequencer, and planner. The controller consists of one or more execution threads that implement one or more feedback control loops, tightly coupling sensors to actuators. The transfer function(s) computed by the controller can be changed at run time. Usually the controller contains a library of transfer functions called basic reactive behaviours. Which ones are active at any given time is determined by an external input to the controller. Classic examples of basic reactive behaviours are wall following, collision avoidance, moving to a close destination, etc. There are several architectural constraints on the algorithms that go into the controller:
• Computing one iteration of the transfer function should be of constant-bounded time and space complexity, and this constant should be small enough to provide stable closed-loop control for the desired behaviour.
• Algorithms in the controller should be designed to detect any failure to perform the function for which they were designed. Rather than attempting to design algorithms that never fail (which is impossible on real robots), it is more worthwhile to design algorithms that never fail to detect a failure. This allows other components of the system to take corrective actions to recover from the failure.
• The use of internal state should be avoided whenever possible. An important exception to this rule is filtering algorithms, which can be used in the controller although they rely on internal state. If internal state is to be used for other purposes, it should expire after some constant-bounded time.
• Internal state in the controller should not introduce discontinuities in a behaviour, in the mathematical sense.
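A minimal sketch of one controller iteration honouring these constraints might look as follows. The behaviours, sensor names and gains are invented for the example; a real controller would run this loop at a fixed rate and feed the commands to the actuators.

```python
# Each behaviour is a constant-time, stateless transfer function from sensor
# values to a command, and it must report failure rather than mask it.

def wall_follow(sensors, target=0.5, gain=2.0):
    d = sensors["side_range"]
    if d is None:                        # sensor dropout: surface the failure
        return {"ok": False, "turn": 0.0}
    return {"ok": True, "turn": gain * (target - d)}

def collision_avoid(sensors, limit=0.3):
    d = sensors["front_range"]
    if d is None:
        return {"ok": False, "turn": 0.0}
    return {"ok": True, "turn": 1.0 if d < limit else 0.0}

BEHAVIOURS = {"wall_follow": wall_follow, "collision_avoid": collision_avoid}

def controller_step(active, sensors):
    """One bounded-time iteration over the externally selected behaviours;
    any detected failures are returned so other layers can recover."""
    results = {name: BEHAVIOURS[name](sensors) for name in active}
    failed = [n for n, r in results.items() if not r["ok"]]
    return results, failed

results, failed = controller_step({"wall_follow", "collision_avoid"},
                                  {"side_range": 0.4, "front_range": 1.0})
```

The point of the `ok` flag is the second constraint above: the behaviour does not try to be infallible, it only guarantees that a failure never goes unnoticed.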
It is the responsibility of the sequencer to manage transitions between regimes of continuous operation. A number of special-purpose languages have been developed for programming the controller, like ESL (Gat, 1991). Most of these languages were developed at a time when robots could support only very small processors for which no other development tools were available. One current trend is simply to program the controller in C. The job of the sequencer is to select which basic reactive behaviour, or set of basic reactive behaviours, the controller should use at a given time, and to supply parameters for the behaviour. By changing primitive behaviours at strategic moments, the robot can be led into performing useful tasks. The problem is that the outcome of selecting a particular basic reactive behaviour (or set) in a particular situation might not be the intended one, and so a simple linear sequence of behaviours is unreliable. The sequencer must be able to respond conditionally to the current situation, whatever it might be. One approach to the problem is to enumerate all the possible states the robot can be in and pre-compute the correct primitive to use in each state for a particular task. In some constrained environments this may be a tractable problem. However, this universal plan approach has two drawbacks: it is often not possible for a robot to know its current state, and it does not take into account the previous history, which often contains useful information. A practical alternative is conditional sequencing, a more complex model of plan execution motivated by human instruction following. Conditional sequencing systems include constructs for responding to contingencies and managing multiple parallel interacting tasks.
It is possible to construct a conditional sequencing system in a traditional programming language like C, but because the required control constructs are so much more complex than those provided by such languages, conditional sequencing is much more effectively done with special-purpose languages. These include RAP (Firby, 1989), PRS (Georgeff et al., 1987), REX/GAPPS (Kaelbling, 1987), Behaviour Language (Brooks, 1989), and ESL (Gat, 1997). There are two major approaches to the design of conditional sequencing languages. They can be complete languages on their own (e.g. RAP and PRS) or they can be layered on top of a syntactically extensible programming language like LISP (e.g. Behaviour Language and ESL). In any case, the sequencer should not perform computations that take a long time compared to the rate of environment change at the level of abstraction presented by the controller. Exactly how long that is depends on both the environment and the repertoire of behaviours. Usually this implies that the sequencer should not perform any search or temporal projection. The planner is the locus of time-consuming computations. Usually this means such things as exponential search-based algorithms, but it could also include polynomial-time algorithms with large constants (such as certain vision processing algorithms). The key architectural feature of the planner is that several behaviour transitions can occur between the time a deliberative algorithm is invoked and the time it produces a result. There are no architectural constraints on algorithms in the planner, which are usually written in standard programming languages. The planner can interface to the rest of the system in two different ways. It can produce plans for the sequencer to execute (e.g. the 3T architecture) or it can respond to specific queries from the sequencer (e.g. the ATLANTIS architecture), although these two approaches are not mutually exclusive.
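The three-layer organisation can be sketched as three communicating threads. Only the plumbing is shown here: the layer contents are placeholders, all names are invented, and a real planner would run a slow search where this one returns a canned answer.

```python
import queue
import threading

commands = queue.Queue()   # sequencer -> controller (behaviour selection)
queries  = queue.Queue()   # sequencer -> planner (deliberative requests)
answers  = queue.Queue()   # planner   -> sequencer (plans or query results)
executed = []

def controller():
    """Fast layer: runs whichever behaviour the sequencer selected."""
    while True:
        behaviour = commands.get()
        if behaviour is None:            # shutdown sentinel
            break
        executed.append(behaviour)       # stand-in for a reactive control loop

def sequencer():
    """Middle layer: hands slow problems to the planner, sequences the answer."""
    queries.put("route to goal?")
    for step in answers.get():           # several control cycles may pass here
        commands.put(step)
    commands.put(None)

def planner():
    """Slow layer: time-consuming deliberation, faked as a fixed plan."""
    queries.get()
    answers.put(["follow_corridor", "enter_room"])

threads = [threading.Thread(target=f) for f in (controller, sequencer, planner)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Running the layers as separate threads is exactly the multithreading option mentioned above; the queues decouple their rates, so the controller keeps iterating while the planner deliberates.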
5.1.3 Behaviour definition

The three-layer architecture specifies how to organise different algorithms and imposes some constraints on them. One key point is how to formulate the behaviours that compose the controller while sticking to the restrictions imposed: constant-bounded execution time, failure detection, no internal state usage, and a continuous transfer function. In fact, behaviours can be viewed from a classical control point of view: they have to provide closed-loop feedback control (Figure 5.6). There are many formulations to deal with the reactive behaviour formulation problem, namely from control theory, potential field based methods, neural networks, and fuzzy logic, to cite just a few. Examples, and the applicability of some of them to mobile robotics, are presented below.

Figure 5.6: Typical closed-loop feedback based behaviour.

Fuzzy logic provides tools that are of potential interest to mobile robot control (Saffiotti, 1999). Most applications of fuzzy logic in this field concern the use of fuzzy control techniques to implement individual behaviour units. Fuzzy controllers are a convenient choice when an analytical linear model of the system to be controlled cannot be easily obtained. Fuzzy controllers have shown a good degree of robustness in the face of large variability and uncertainty in the parameters, and they lend themselves to efficient implementations, including hardware solutions. These characteristics fit the requirements for reactive behaviours well: a mathematical model of the environment is usually not available, sensor data is uncertain and imprecise, and bounded-time execution (that is, real-time operation) is of the essence. For this reason fuzzy control has been the first application of fuzzy logic techniques in this domain since Sugeno and Nishida’s model car (Sugeno et al., 1985), and it is still the most common (Voudouris et al., 1994)(Surmann et al., 1995).
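A behaviour implemented as a fuzzy controller can be sketched minimally as follows: a two-rule Sugeno-style controller mapping a front-range reading to a turn command. The membership shapes and consequents are invented for illustration and are not any actual rule base from the literature.

```python
# Memberships for the linguistic terms over distance in metres.
def near(d):
    """Fully 'near' at 0 m, fading out linearly by 1 m."""
    return max(0.0, min(1.0, 1.0 - d))

def far(d):
    """Complementary 'far' set."""
    return max(0.0, min(1.0, d))

def avoid_turn(distance):
    """IF near THEN turn hard; IF far THEN go straight (weighted average)."""
    w_near, w_far = near(distance), far(distance)
    turn_near, turn_far = 45.0, 0.0      # consequent singletons (degrees)
    return (w_near * turn_near + w_far * turn_far) / (w_near + w_far)

avoid_turn(0.0)   # 45.0: obstacle right ahead, turn hard
avoid_turn(0.5)   # 22.5: both rules partially active
avoid_turn(1.0)   # 0.0: clear path, go straight
```

Note how the command varies continuously with the input, satisfying the no-discontinuity constraint on controller behaviours, and each evaluation is constant-time.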
Fuzzy logic has later been applied to path-tracking (Sánchez et al., 1997)(Ollero et al., 1998), simulated (Biris and Shen, 1997) and real-world navigation (García-Alegre et al., 1993)(Topalov et al., 1998)(Xu and Tso, 1999), low-level car motor control (Reyes et al., 1999), obstacle avoidance (Stanev et al., 1998)(Martínez et al., 1994), and wall-following (Urzelaiz et al., 1997)(García-Rosa and García-Alegre, 1990). Fuzzy modelling, as a complement to conventional modelling techniques, has become an active research topic and has found successful applications in many areas. However, most present fuzzy models have been built based only on an operator's experience and knowledge, but when a process is complex there may not be an expert (Wang et al., 1996). This can be the case when developing fuzzy behaviours for robot control, for example when there are many input variables and the control of the plant is not simple. In this kind of situation the use of learning techniques is of fundamental importance. The problem can be stated as follows: given a set of data for which we presume some functional dependency, the question arises whether there is a suitable methodology to derive fuzzy rules from these data which characterise the unknown function as precisely as possible. Recently, several approaches have been proposed for automatically generating fuzzy if-then rules from numerical data without domain experts (Wang and Mendel, 1992)(Ishibuchi et al., 1995)(Joo et al., 1997)(Gómez-Skarmeta and Jiménez, 1997a)(Gómez-Skarmeta and Jiménez, 1997b). Mobile robotics applications using fuzzy modelling include, for instance, wall-following (García-Cerezo et al., 1997), obstacle avoidance (Fukuda and Kubota, 1999), moving obstacle avoidance (Pratihar et al., 1999), truck-and-trailer parking (Denna et al., 1999), and robot stabilisation (Kwon et al., 1998).
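The flavour of these rule-generation schemes can be sketched with a heavily simplified, one-input version of the table-lookup idea in Wang and Mendel (1992): assign each data pair to its best-matching fuzzy regions, score the resulting rule, and keep only the strongest rule per antecedent. The partitions and data below are invented, and the real method handles multiple inputs and finer partitions.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Three regions over [0, 1] shared by the input and output variables.
REGIONS = {"low": (-0.5, 0.0, 0.5), "mid": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}

def best_region(x):
    return max(REGIONS, key=lambda r: tri(x, *REGIONS[r]))

def generate_rules(samples):
    """samples: (input, output) pairs; returns {antecedent: consequent}."""
    rules = {}
    for x, y in samples:
        rx, ry = best_region(x), best_region(y)
        degree = tri(x, *REGIONS[rx]) * tri(y, *REGIONS[ry])
        # Conflict resolution: keep the highest-degree rule per antecedent.
        if rx not in rules or degree > rules[rx][1]:
            rules[rx] = (ry, degree)
    return {ant: cons for ant, (cons, _) in rules.items()}

data = [(0.1, 0.9), (0.5, 0.5), (0.95, 0.05)]
generate_rules(data)   # {'low': 'high', 'mid': 'mid', 'high': 'low'}
```

From three noisy-free samples the sketch recovers a complete rule base describing a decreasing function, which is the essential appeal of such methods: fuzzy rules without a domain expert.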
Neural networks have an important application area in the field of mobile robotics because of their adaptive and learning properties (Kröse and van Dam, 1997). Neural networks can be used for the control of systems with unknown or complex behaviour, where the operator’s experience can be used to obtain a set of examples to train the network. In the case of robotics, neural networks are applied to the control of nonlinear systems because of their ability to learn a nonlinear model from examples. Moreover, neural networks can be adaptive, in the sense that they continue learning during operation. This allows for continuous improvement of the controller while in operation. One of the first examples of neural sensorimotor mapping is the Autonomous Land Vehicle in a Neural Net, ALVINN (Pomerleau, 1991). The neural network controlled CMU’s Navlab vehicle on a road-following task without having a model of the road. It learned associations between visual patterns and steering commands. There are many successful uses of neural networks for controlling mobile robots in the literature (Opitz and Lernfahrzeug, 1990)(Glasius et al., 1995)(Nguyen and Widrow, 1990)(Aguilar and Contreras, 1994)(Zalama et al., 1995). Nevertheless, the effectiveness of neural networks is completely restricted by the training patterns, since the robot is unable to react properly to an environment which does not fall in the space specified by the training patterns, on account of the poor exploration property of neural networks (Xu and Tso, 1999). Potential field methods (PFM) for obstacle avoidance have gained increased popularity among researchers in the field of mobile robots. PFMs are based on the idea of imaginary forces acting on a robot (Andrews and Hogan, 1983)(Khatib, 1985). In these approaches obstacles exert repulsive forces onto the robot, while the target applies an attractive force to the robot.
The sum of all forces, the resultant force, determines the subsequent direction and speed of travel. The popularity of PFMs is due to several advantages: they are simple and elegant, they can be implemented quickly, and they initially provide acceptable results without requiring many refinements. Early implementations of PFMs appeared in purely reactive robots (Brooks, 1986)(Arkin, 1989) and in off-line systems for path planning (Thorpe, 1985). Generalised PFMs were proposed as an attempt to combine local and global path planning (Krogh, 1984)(Krogh and Thorpe, 1986). One interesting implementation of PFMs is the virtual force field (VFF) method (Borenstein and Koren, 1989). The VFF method is based on the force field method (FFM), another early PFM implementation (Brooks, 1986)(Brooks and Connel, 1986), and on the certainty grid (see the section on maps in chapter 4)(Moravec and Elfes, 1985). As the robot moves, range readings are taken and projected into the certainty grid, here called the histogram grid. The algorithm uses a small square window of the grid, such that the robot is centred in the window. Each occupied cell inside the window applies a repulsive force to the robot, which is inversely proportional to the square of the distance. The resultant repulsive force is thus the vectorial sum of all individual repulsive forces. At any time during the motion, a constant-magnitude force pulls the robot towards the target. Finally, the resultant of the vectorial sum of the global repulsive force and the goal’s attractive force is used as the reference for the robot’s steering rate command. Despite the initial soundness of the VFF method, there are four problems that are inherent to PFMs, independently of their particular implementation (Koren and Borenstein, 1991):
• They produce trap situations due to local minima.
• They fail to find a passage between closely spaced obstacles.
• There are oscillations in the presence of obstacles.
• There are oscillations in narrow passageways.

The experience with PFMs led to the development of the vector field histogram (VFH) method for point robots (Borenstein and Koren, 1991) and later for non-point robots (Borenstein and Raschke, 1992). The VFH method builds the histogram grid the same way as VFF does. However, VFH introduces an intermediate data representation, called the polar histogram, that retains the statistical information of the histogram grid but reduces the amount of data that needs to be handled in real time. The polar histogram is an array where each cell represents an arc sector. The certainty values of the active window are mapped onto the polar histogram, and the resulting values can be viewed as the instantaneous polar obstacle density. The algorithm then computes the required steering direction in the same way as the VFF, from the goal’s orientation and the polar histogram. Some modifications to the original VFH have been published recently, namely the VFH+ (Ulrich and Borenstein, 1998) and the VFH* (Ulrich and Borenstein, 2000) methods. There are several more approaches to the PFM problem in the literature. Some of them introduce fictitious goals or attractors (Maravall et al., 2000), and others are based on different concepts, like variable thermal conductivity (Wang and Chirikjian, 2000).

5.1.4 Behaviour coordination

Behaviour-based control systems have been a major cornerstone in mobile robotics, and have achieved notable success when incorporated in hybrid architectures. Since the first appearance of behaviour-based approaches, researchers have noticed the importance of the behaviour coordination problem: how to coordinate the simultaneous activity of several independent behavioural components to obtain an overall coherent behaviour that carries out the intended task. The first approaches were based on a competitive coordination model (Brooks, 1986)(Payton, 1986)(Firby, 1987).
A common example to illustrate that this model may sometimes be inadequate is the following: a robot navigates following a path and suddenly encounters an obstacle. At this point it has two options, to turn to the left or to the right. This choice may be indifferent to the obstacle avoidance behaviour, but from the path following perspective one option might be dramatically better than the other. In competitive coordination schemes the obstacle avoidance behaviour usually knows nothing about this, and therefore makes an arbitrary decision. Of course, a more elaborate coding of the behaviours could solve the problem, but at the expense of computational cost, behaviour maintenance due to side effects, or behaviour independence. In order to overcome these deficiencies several researchers have proposed cooperative coordination schemes that allow parallel execution of different behaviours, as in motor schemas (Arkin, 1989) and context-dependent blending (Saffiotti, 1997b). The behaviour coordination problem can be divided into two conceptually different sub-problems: how to decide which behaviour should be activated at a time (behaviour arbitration), and how to combine the results from different behaviours that are simultaneously active (command fusion). The behaviour arbitration policy determines which behaviour or set of behaviours should influence the operation of the robot at each moment, and thus ultimately determines the task actually performed by the robot. Subsumption-based architectures rely on a fixed arbitration policy, hard-wired into a network of suppression and inhibition links (Brooks, 1986). These simple arbitration strategies, which only allow one behaviour at a time to be active, may incur serious limitations. Hence, a further step is to decide dynamically which behaviours are active, based on both the current goal and the environmental contingencies, but still without allowing the concurrent execution of behaviours (Payton, 1986)(Firby, 1987).
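The difference between a fixed and a dynamic arbitration policy can be sketched as follows. The behaviour names, priorities and situation predicates are invented for the example.

```python
# Fixed policy: behaviour priority is hard-wired, whatever the goal is.
FIXED_PRIORITY = ["avoid_obstacle", "follow_path", "wander"]

def fixed_arbitration(applicable):
    """The highest fixed-priority applicable behaviour always wins."""
    for name in FIXED_PRIORITY:
        if name in applicable:
            return name

# Dynamic policy: the choice depends on the current goal and on
# environmental contingencies observed at run time.
def dynamic_arbitration(goal, situation):
    if situation.get("obstacle_close"):
        return "avoid_obstacle"
    if goal == "reach_waypoint" and situation.get("path_visible"):
        return "follow_path"
    return "wander"

fixed_arbitration({"follow_path", "wander"})                     # follow_path
dynamic_arbitration("reach_waypoint", {"path_visible": False})   # wander
```

In both policies exactly one behaviour is active at a time, which is precisely the limitation that the cooperative schemes discussed next are designed to remove.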
Both fixed and dynamic arbitration policies can be implemented using the mechanisms of fuzzy logic (Saffiotti, 1997b). Fuzzy logic provides some advantages: the ability to express partial and concurrent activations of behaviours, and smooth transitions between behaviours. The first appearances of fuzzy logic for arbitration were based on fixed priority schemes (Berenji et al., 1990)(Pin and Watanabe, 1994). More flexible arbitration policies were obtained using fuzzy meta-rules, or context rules, where more than one behaviour can be activated at a time and their outputs fused (see below for a discussion on output fusion). Fuzzy context rules appeared to switch between flight modes in a fuzzy-controlled unmanned helicopter (Sugeno et al., 1993) and to combine behaviours in the robot Flakey (Saffiotti et al., 1993). Fuzzy context rules have since been applied, with success, in an increasing number of robots (Voudouris et al., 1994)(Surmann et al., 1995)(Tunstel, 1996)(Goodridge and Luo, 1994)(Michaud et al., 1996). It should be noted that the use of fuzzy context rules is independent of the way in which individual behaviours are implemented, that is, behaviours do not need to be fuzzy. Some researchers have proposed coordination schemes where several behaviours can be concurrently activated. Command fusion relates to how the outputs of the different active behaviours are combined to produce a single output. Initial approaches to command fusion were based on vector summation methods (Arkin, 1989)(Khatib, 1986). In these approaches, each command is represented by a force vector, and commands from different behaviours are combined by vector summation. The robot responds to the force resulting from the combination. Here, fuzzy logic can be effectively used to perform a more general form of weighted command fusion.
The first proposal of fuzzy command fusion appeared as a reaction to the existing single-behaviour arbitration schemes (Rosenblatt and Payton, 1989), and it was later re-stated in terms of fuzzy logic (Yen and Pfluger, 1995). Several other authors have used fuzzy logic to perform fuzzy command fusion. Some of these actually implement a trivialised form of fuzzy command fusion, using weighted singletons as fuzzy outputs and CoG defuzzification (Goodridge et al., 1994)(Goodridge et al., 1997), or symmetric rectangles and CoG defuzzification (Pin and Watanabe, 1994). Both these methods can be shown to be equivalent to a vector summation scheme. The majority of authors have opted for a full implementation of fuzzy command fusion, through the combination of arbitrary fuzzy sets followed by a defuzzification step (Yen and Pfluger, 1995)(Saffiotti et al., 1993)(Tunstel, 1996)(Voudouris et al., 1994)(Michaud et al., 1996)(Michaud, 1997). The most general form of behaviour combination using fuzzy logic is obtained by using both fuzzy behaviour arbitration (with fuzzy context rules) and fuzzy command fusion (with fuzzy combination rules). This approach subsumes all the approaches listed above as special cases, and it is known as context-dependent blending, or CDB for short (Saffiotti, 1997b). The idea of context-dependent blending was initially suggested by Ruspini (Ruspini, 1991) and later developed by Saffiotti (Saffiotti et al., 1993). CDB can be characterised as follows:
• Each behaviour generates preferences from the perspective of its goal.
• Each behaviour has a context of activation, representing the situations where it should be used.
• The preferences of all behaviours, weighted by the truth value of their contexts, are fused to form a collective preference.
• One command is chosen from this collective preference.

Figure 5.7: Context-dependent blending as a hierarchical fuzzy controller.
The preferences are represented by fuzzy sets, which correspond to control actions generated by fuzzy controllers. Contexts are represented by formulae in fuzzy logic, which serve as the antecedents in the fuzzy arbitration rules. Fusion and choice are respectively performed by a fuzzy combination operator and by defuzzification. CDB can be implemented in a hierarchical fuzzy controller (Figure 5.7), where the behaviours correspond to the lower-level rule bases and the context rules correspond to the meta-rule base of the hierarchical controller. This is the implementation originally used in the Flakey robot (Saffiotti et al., 1993). Hierarchical fuzzy controllers have been proposed and applied to building complex robot behaviours using CDB (Voudouris et al., 1994)(Congdon et al., 1993). When using CDB, behaviours take their input from the local perceptual space (LPS), a common storage structure that stores perceptual data, interpretations of these data, and the representation of subgoals created by the planner. The use of the LPS thus makes the integration of reactivity and long-term planning possible (Saffiotti, 1993). The planner generates a set of subgoals that are later exploited by the controller to orient the robot’s activity. In general, CDB can be used to execute a full plan for action, represented by a set of context rules. The rules tell which behaviours should be used in each situation. CDB is used to blend the recommendations from different concurrent behaviours, each weighted by the corresponding context. Fuzzy context rules provide a flexible means to encode behaviour arbitration strategies. Like fuzzy control rules, context rules make it possible to write complex strategies in a modular way using a logical format. Fuzzy command fusion can then be used to combine recommendations from concurrent behaviours. The resulting context-dependent blending is strictly more general than other coordination schemes commonly used in robotics.
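The CDB loop of context-weighted fuzzy preferences, fusion, and defuzzification can be sketched as follows. The command grid, the triangular preference shape and the context truth values are invented for illustration; real CDB implementations use full fuzzy rule bases for both levels.

```python
COMMANDS = [-30, -15, 0, 15, 30]          # candidate steering angles (degrees)

def prefer(desired, spread=30.0):
    """A behaviour's preference: a fuzzy set peaked at its desired command."""
    return [max(0.0, 1.0 - abs(c - desired) / spread) for c in COMMANDS]

def blend(behaviours):
    """behaviours: list of (context_truth, preference) pairs.
    Fuse the context-weighted preferences, then defuzzify by centre of gravity."""
    fused = [max(w * p[i] for w, p in behaviours)      # weighted fuzzy union
             for i in range(len(COMMANDS))]
    total = sum(fused)
    return sum(c * m for c, m in zip(COMMANDS, fused)) / total

# Obstacle context is strong (0.8) and prefers a hard left; the goal-seeking
# context is weaker (0.4) and prefers a slight right.
command = blend([(0.8, prefer(-30)), (0.4, prefer(15))])   # -9.0
```

The defuzzified command is a left turn, biased towards the behaviour whose context is more strongly true, yet it still reflects the weaker behaviour's preference, which is the point of blending over winner-takes-all selection.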
The fact that the same format is used for control rules and arbitration rules allows increasingly complex behaviours to be written in a hierarchical fashion. Following its implementation in Flakey, CDB has been used by several researchers in autonomous mobile robots, both in real robots (Goodridge and Luo, 1994)(Surmann et al., 1995)(Voudouris et al., 1994)(Arrúe et al., 1997) and in simulators (Tunstel, 1996)(Michaud et al., 1996)(Aguirre et al., 1998). It is interesting to note that learning techniques have also been applied to CDB (Bonarini and Basso, 1997). A detailed discussion is included in chapter 7. Several other methods exist: case-based reasoning (Ram et al., 1997), utility fusion (Rosenblatt, 2000), and neuro-fuzzy approaches, both as a general architecture (Li et al., 1997) and as a simplistic linear combination of behaviours (Benreguieg et al., 1998), to cite just a few.

5.1.5 Multiagent systems

In recent years, the term agent (Russell and Norvig, 1995)(Hendler, 1999) has grown in significance in many different fields, and especially within the artificial intelligence (AI) community. Initially the term intelligent agent was applied to planning systems that interact with a changing environment in real time. The underlying idea was that agents have deliberative capabilities while being able to work in real-world-like applications. When things like e-commerce appeared, the term bounded rational agent was used to refer to systems with decision-making capabilities that are restricted in terms of time or computational resources. The underlying idea was that agents are capable of making decisions in real-world-like applications as well. This new idea was particularly well received and reinforced an important part of the agents literature.
Meanwhile, both the embedding of an agent in an environment and the notion of a decision-making agent came to the forefront when working with mobile robots, particularly autonomous mobile robots that operate in natural environments. As the capabilities of such robots improved (they seemed more intelligent), more agent terminology entered the robotics literature, and roboticists also started using the term agent to refer to their work. The underlying idea was that agents are also physical systems that take decisions in real-world applications. Many current agent architecture ideas, particularly those of multiple-level architectures and emergent behaviour, grew largely from the robotics field. The field of artificial intelligence has recognised some key aspects (although they are not required simultaneously) related to the notion of agent: changing environments, decision-making capabilities, and physical systems. Much of the work on artificial intelligence is related somehow to agents. Hendler relates an interesting anecdote on the effect of agents on artificial intelligence (Hendler, 1999): “David Miller […] was fond of telling government agencies that they should stop funding AI (artificial intelligence) and start funding IA (intelligent agents).” Now, several other communities are using the term agent to refer to many different concepts. For instance, with the growth of Internet applications a new model of software has arisen, one that takes encapsulation beyond the object level and gives software modules a temporal extent. These agents are particularly useful when combined with mobile code. When these mobile agents work through the net to perform purposeful tasks for their users, some new terms appear: search agent, which helps a user to find things on the net; routing agent, which helps packets to move faster; and information-filtering agent, which looks for particular conditions in a network or data resource, and alerts the user or takes other actions.
In addition, the term agent also relates to advanced user interfaces. Thus the term agent, even limited to an information technology context, refers to many different software concepts. Although agents have been deployed in many domains, the functions that researchers are exploring have some commonalities. In this thesis the term agent is used to refer to decision-making entities that operate in real-world environments to control mobile robots, that is, the robotics and artificial intelligence domain. The distributed artificial intelligence (DAI) field has existed for less than two decades. Traditionally, distributed artificial intelligence is broken into two sub-disciplines: distributed problem solving (DPS) and multiagent systems (MAS). The main topics considered in DPS are information management issues such as task decomposition and solution synthesis. For example, a constraint satisfaction problem can often be decomposed into several not entirely independent sub-problems which can be solved on different processors; these partial solutions can then be synthesised into a solution of the original problem. Multiagent systems aim to provide both principles for the construction of complex systems involving multiple agents and mechanisms for the coordination of independent agents. Thus, MAS allow the sub-problems of a constraint satisfaction problem to be subcontracted to different problem-solving agents with their own interests and goals. MAS are applicable in many domains (Stone and Veloso, 2000), particularly when there are different elements with distinct (possibly conflicting) goals and proprietary information, and a system is needed to handle their interactions. MAS are also conceivable in systems that are not distributed, for example in a domain that is easily broken into components so that independent tasks can be handled by separate agents. Another benefit of multiagent systems is their scalability.
Since they are inherently modular, it should be easier to add new agents to a multiagent system than it is to add new capabilities to a monolithic system. Moreover, from a programmer’s point of view the modularity of multiagent systems can lead to simpler programming, thus enhancing performance and/or reliability (following the “keep it simple, stupid!” principle). In the mobile robotics field, MAS are particularly useful both in single-robot systems and in multi-robot systems. As this thesis is on single-robot systems, the latter case is not considered. For example, the three-layer architecture can be broken into different agents that can be laid out as a multiagent system. This topic is discussed in the following section. Several taxonomies, which make use of more than three dimensions, have been presented for DAI (Decker, 1987) and MAS (Parunak, 1996). Typical MAS have coarse agent granularity and high-level communication, while they vary across the whole range in the dimensions of heterogeneity and distributed control. In fact, the degree of heterogeneity is a major MAS dimension, and all the methods of distributing control appear in the literature as major issues. Recently, Stone and Veloso presented a new MAS taxonomy based solely on heterogeneity and communication characteristics (Stone and Veloso, 2000). In all these cases, communicating agents (be they homogeneous or heterogeneous) need a framework within which to communicate (Brenner, 1998). This framework should define both a communication/coordination method, which defines how agents communicate with each other to share information (the architecture of the communication infrastructure), and an agent communication language, which defines what agents communicate to the others (the format of the messages that are interchanged). There are a number of fundamentally different communication/coordination methods (Brenner, 1998)(Cabri et al., 2000), but only non-mobile approaches are considered here.
The simplest case, when referring to software-based agents, is the invocation of an agent’s procedure by another agent. The invoking agent explicitly uses calling parameters to inform its communication partner of a specific wish and of its intentions. The return values of the procedure represent the answer of the addressed agent. Depending on where the addressed agent is physically located, the procedure invocation can take place either locally or remotely. This simple method lacks the ability to support complex communication patterns. The second method arises out of DAI and is known as the blackboard metaphor. The blackboard (Hayes-Roth, 1985) represents an extension to the agenda of traditional AI systems and rule-based expert systems. A blackboard provides all agents within a MAS (Kirn, 1996) with a common work area in which they can exchange information, data, and knowledge. The blackboard can be used both for task sharing, where an agent requests other agents for help in solving specific subtasks, and for result sharing, where an agent which has solved a specific subtask communicates with other agents to make its results available. No private areas exist within simple blackboards. All agents have access to every information item, and they have the complete responsibility for deciding which information they actually access. For example, the Ambit system is a formal MAS model based on a blackboard for agent coordination (Cardelli and Gordon, 1998). Blackboards provide a very flexible concept for the cooperation and communication of problem-solving units. However, the central structure of a blackboard is a growing obstacle to the continuing expansion of network-centric applications. All agents registered on the blackboard are forced to place their information directly on the blackboard, irrespective of their position in the network. In some cases this might produce a bottleneck because of high network load.
The third method of agent communication, message passing, tries to overcome this limitation. In this case, the messages that agents exchange with each other can be used to establish communication and cooperation mechanisms using defined protocols. In message passing systems an agent, called the sender, transfers a specific message to another agent, called the receiver. In contrast to blackboard systems, messages are directly exchanged between two agents. One important issue is that all the agents have to know the semantics of the content of the messages. An agent should not treat a received message just as a fact, but rather analyse it for implicitly associated actions and intentions. The speech-act theory (Austin, 1962) attempts to cater for these demands. A speech act designates a message that not only contains a true or false statement but also exercises a direct influence on its environment by causing changes within it. Communication within a typical MAS does not usually consist just of sending unrelated messages: dialogues take place, in which several related messages are exchanged between two or more conversation partners. For example, D’Agents is a Java-based MAS which uses a message-passing scheme for agent coordination (Karnik and Tripathi, 1998). Although message-passing based MAS are more efficient in terms of network bandwidth usage, agents must somehow know about the others, their capabilities and their knowledge. In contrast, in blackboard-based MAS, agents do not need to know each other, but only the blackboard. The choice between these methods is dictated by the application at hand. There are other possible coordination strategies in the field of MAS, mainly related to mobile agents, for instance meeting-oriented coordination and Linda-like coordination (Cabri et al., 2000), that will not be discussed here because they are outside the scope of this thesis.
Just as the term agent has been overused, the term agent communication language (ACL) refers to several different concepts. Hendler proposes a classification of ACLs according to their level of use (Hendler, 1999):
• Performative. This level provides the protocols with which agents can establish some form of knowledge interchange. Many current proposals for standard ways for agents to communicate focus primarily on the design of unambiguous performative systems.
• Service. When agents work across a distributed network, they might need to advertise their capabilities, find agents that can provide capabilities which they cannot, or both. This level provides the infrastructure upon which such tasks can be performed.
• Content. Once agents are communicating (using performative terms and having found each other through the service advertisements), they need a deeper level of problem-solving knowledge, often domain-dependent. The content level is in charge of providing a framework so that different agents can understand or extract knowledge from the messages.
• Control add-ins. Where multiple agents might compete for resources, they need mechanisms that let them represent or reason about their own computational needs or those of other agents. This level is in charge of providing a framework for the exchange of information related to their processing requirements (or even to negotiate mobility).
Many different languages have been used for the exchange of information and knowledge between applications (sockets, remote procedure calls, remote method invocation, etc). What distinguishes ACLs from such attempts are the objects of the discourse and their semantic complexity (Labrou et al., 1999). ACLs stand a level above such methods for two reasons: ACLs handle propositions, rules, and actions instead of simple objects with no semantics associated with them, and an ACL message describes a desired state in a declarative language, rather than a procedural method.
When using an ACL, agents transport messages over a lower-level protocol (for example TCP/IP, HTTP, IIOP, etc). The ACL itself defines the types of messages, and their meanings, which agents can exchange. Currently, the best known ACLs are KQML, with its many dialects and variants, and FIPA ACL. KQML (Labrou and Finin, 1994)(Labrou and Finin, 1997) is a high-level, message-oriented communication language and protocol for information exchange, independent of content syntax and applicable ontology. Thus, KQML is independent of the transport mechanism (TCP/IP, IIOP, etc), independent of the content language (KIF, SWL, PROLOG, etc), and independent of the ontology assumed by the content. Conceptually there are three layers in a KQML message: content, communication, and message. The content layer bears the actual content of the message in the program’s own representation language. Every KQML implementation ignores the content portion of the message, except to determine where it ends. The communication layer encodes in the message a set of features that describe the lower-level communication parameters, such as the identity of the sender and recipient, and a unique identifier associated with the communication. The message layer, which encodes a message that one agent would like to transmit to another, is the core of KQML. This layer determines the kinds of interactions one can have with a KQML-speaking agent. The message layer’s primary function is to identify the network protocol with which to deliver the message and to supply a speech act or performative that the sender attaches to the content. This speech act indicates whether the message is an assertion, a query, a command, or any other of a set of known performatives. In addition, since the content is opaque to KQML, the message layer also includes optional features which describe the content language, the ontology it assumes, and some type of description of the content.
The Foundation for Intelligent Physical Agents ACL (FIPA ACL), like KQML, is based on speech-act theory (Austin, 1962): messages are actions, or communicative acts, as they are intended to perform some action by virtue of being sent (FIPA, 1997). The FIPA ACL specification consists of a set of message types and the description of their pragmatics, that is, their effects on the mental attitudes of the sender and receiver agents. FIPA ACL is superficially similar to KQML. Its syntax is identical to KQML’s except for different names for some reserved primitives. Thus, it maintains the KQML approach of separating the outer language (that of the message format) from the inner language (that of the content format). In FIPA ACL the communication primitives are called communicative acts, instead of performatives as in KQML.
5.2 The BGA control architecture
In this thesis a new control architecture, called BGA, is proposed. BGA is a multiagent architecture that combines both reactive and deliberative agents. These agents play an important role as they are the structural elements of the architecture, and BGA uses a blackboard scheme as the mechanism by which the different agents communicate. Agents can be layered and distributed according to their functionalities, and some examples are presented using a three-layer hybrid architecture schema. As some of these agents may implement different reactive behaviours, the fusion of their outputs is accomplished by a new behaviour blending method derived from context-dependent blending. This section introduces the antecedents of the BGA architecture, an early study on reactive architectures for small micro-robots (MadCar’s architecture). Then the BGA architecture is described in detail, with emphasis on its structure and how the different elements are interconnected.
Finally, there is also a discussion of the characteristics of the architecture and how it relates to previous works and concepts (those described in the previous section).
5.2.1 Antecedents: MadCar’s architecture
Early work on the BGA architecture was based on reactive architectures for small micro-robots, when the available robotics platform was the non-holonomic robot called MadCar (see chapter 2), which possesses four sonar sensors, one wheel encoder and one electronic compass. The intended task was a simple safe-wander while maintaining a given magnetic heading. MadCar’s processing unit is a small 8-bit microcontroller (MC68HC11) which is programmed using a pre-emptive, multi-tasking, interpreted C-based language (Interactive-C). Purely reactive architectures provide a valuable framework when working with robots with a reduced set of sensors and low processing capabilities. The reactive process of perception, behaviour generation and behaviour fusion can easily be implemented in such small systems. Moreover, these architectures (compared to deliberative ones) perform well when there is not enough information to model the world. For instance, MadCar’s processing unit has 32 KB of RAM and a 4 MHz clock. Evidently, with such features, an occupancy grid with an A* path-planner was not viable (see chapter 4). A first look into existing reactive architectures showed that subsumption (Brooks, 1986) and motor schema (Arkin, 1989) did not provide any means of communication between the different behaviours, at least from a theoretical point of view (particular implementations may share some global information). In addition, when the control system becomes complex, the modification of individual behaviours without introducing side effects is very difficult. This is related to granularity.
For instance, in the following example (Figure 5.8) the granularity of the behaviours is high, implying that some tasks are unnecessarily duplicated. A careful division of the two behaviours into a set of more elaborate ones can solve the problem, at the expense of obtaining a more complex behaviour inhibition/suppression network.
Figure 5.8: Sample subsumption-based behaviours.
Blackboard systems (Hayes-Roth, 1985) provide a common pool shared by independent computational processes and a communication mechanism between them. Blackboard architectures typically have three different elements:
• The blackboard: the structure that stores information, typically implemented as shared memory. The blackboard is the only way to share data between the different processes.
• The knowledge bases (KB): the different computational processes, each one possibly implementing its own problem resolution method.
• The control module (CM): the process in charge of sequencing and/or arbitrating the execution of the different KBs.
It has been shown that blackboard-based architectures are well suited to the integration of multiple paradigms and real-time execution (Vranes and Stanojevic, 1995). Real-time operation can be achieved by assigning a given time quantum to every KB; the CM can then schedule the execution of the different KBs to meet the desired real-time constraints. In principle, blackboard systems can be classified as deliberative architectures (see the section above). With some minor modifications, mainly in the way that the different KBs are specified, hybrid architectures can be developed using the blackboard paradigm as the basis, and thus reactivity can be achieved. The architecture proposed for MadCar (Figure 5.9) was based on this hybrid blackboard-based architecture (Gómez-Skarmeta and Martínez-Barberá, 1997), where the different KBs are now called behaviours.
The main differences with respect to standard blackboard systems are:
• Most of the behaviours are direct mappings between sensor perceptions and motor actions, and thus they are referred to as reactive behaviours. These exhibit bounded-time execution, so the system’s latency can be kept sufficiently small, and a low granularity is obtained as well.
• The control module is no longer explicit; that is, there is no special process in charge of activating/deactivating the different behaviours. The responsibility lies with the behaviours themselves, which are in charge of specifying which behaviour should be executed next. This schema is similar to that of a subsumption behaviour network.
• This implicit control module is implemented using a priority-based stack, which stores pointers to the behaviours that are to be executed (called events). When a behaviour is running it pushes events onto the stack. When a behaviour finishes running, the behaviour scheduled next for execution is the one with the highest priority at the top of the stack.
Additionally, there are two asynchronous processes that manage sensory-motor information. The sensing module is in charge of acquiring sensory information, processing it, and then signalling important events (like a collision against a wall). Events are generated by inserting behaviours with high priority into the stack. The motor module is in charge of implementing the closed-loop control of the physical robot’s actuators.
Figure 5.9: MadCar’s hybrid architecture.
Following this architecture, MadCar’s task (a safe-wander behaviour while maintaining a given magnetic heading) is implemented using six behaviours (Figure 5.10), a sensing module, and a motor module. Both the Forward and Backward behaviours are direct mappings between the desired forward/backward velocity and the corresponding traction motor controls. The Turn behaviour is a direct mapping between the desired heading and the corresponding steering motor control.
The Collision behaviour checks for the activation of bumper sensors and then proposes a new direction to follow (it commands the robot to move in the opposite direction to the current one). The Path behaviour, using the information produced by the compass, tries to maintain the desired robot heading. Lastly, the Avoid behaviour performs obstacle avoidance, using range information produced by the sonars, and modifies the desired speed and heading. The sensing module implements the interface to fire the four sonar modules. The motor module implements a MISO fuzzy inference system, with trapezoidal sets and singleton outputs. Its rule base, which takes as input the desired heading and speed, produces control actions that are then applied to the steering and the traction motors using PWM signals.
Figure 5.10: MadCar’s behaviours implementation (des = desired).
To illustrate how the stack-based controller is implemented, some pseudo-code listings are shown. The sensing module (Listing 5.1) operates as an asynchronous process that reads in raw sensory data (read function), processes and filters them (filter function), and then writes data onto the blackboard (write function). In case of a collision, the module puts a high priority event (HIT) into the blackboard (add function).

do forever {
    read sonars;
    read encoder;
    read compass;
    read bumpers;
    filter sonars;
    write sonars;
    write encoder;
    write compass;
    write bumpers;
    if (collision)
        add (HIT, PRI_CRITICAL);
}

Listing 5.1: Sensing module.
The stack control (Listing 5.2) runs as another asynchronous process that continually pops events from the stack (FORWARD, BACKWARD, HIT, ..., etc.), using a priority based extraction method (extract function), and then executes their corresponding reactive behaviours (parse_forward, parse_backward, ..., etc.).
add (FORWARD, PRI_NORMAL);
do forever {
    switch (extract ()) {
        case FORWARD:  parse_forward ();  break;
        case BACKWARD: parse_backward (); break;
        case REVERSE:  parse_reverse ();  break;
        case HIT:      parse_hit ();      break;
        case STEER:    parse_steer ();    break;
    }
}

Listing 5.2: Stack control process.
When a behaviour is executed, it runs in the same thread as the stack control process. Behaviour implementations have several tasks to do: they should add the required events to the stack, perform their own processing, and then send commands to the motor control module. For example, the collision reaction behaviour (Listing 5.3), depending on where the robot collided (front, back, left or right), drives the robot in the opposite direction by pushing events onto the stack, i.e. add (REVERSE, PRI_NORMAL), and sending commands to the steering motor, i.e. servo (RIGHT).

if (collision & HIT_MASK_FORWARD) {
    add (REVERSE, PRI_NORMAL);
    add (BACKWARD, PRI_NORMAL);
    add (REVERSE, PRI_NORMAL);
} else if (collision & HIT_MASK_BACKWARD) {
    add (REVERSE, PRI_NORMAL);
    add (FORWARD, PRI_NORMAL);
    add (REVERSE, PRI_NORMAL);
} else if (collision & HIT_MASK_LEFT) {
    servo (RIGHT);
    add (current_sense, PRI_NORMAL);
} else if (collision & HIT_MASK_RIGHT) {
    servo (LEFT);
    add (current_sense, PRI_NORMAL);
}

Listing 5.3: Collision reaction behaviour.
The behaviour to drive the robot forward (Listing 5.4) is very simple. First it signals two events (that will be processed afterwards), and then it sets the traction motor speed (motor function) to full forward.

add (FORWARD, PRI_NORMAL);
add (STEER, PRI_NORMAL);
motor (FULL_FORWARD);

Listing 5.4: Drive forward behaviour.
The proposed architecture enabled MadCar to achieve the intended task successfully, despite its mechanical and electronic constraints (see chapter 2).
Some facts affected the operation of MadCar negatively:
• The sonar subsystem was not very reliable in operation, producing lots of false readings due to the voltage drop of the sonar battery.
• The minimum speed was extremely high compared to the sonar firing rate.
• The steering system was all-or-nothing, producing jerky paths.
Although these problems caused many collisions in highly cluttered environments, the reactive nature of the system and its ability to drive the non-holonomic platform backwards performed acceptably well. The architecture had an important drawback, inherent to most reactive architectures: the behaviour coordination scheme. It was very easy to implement but suffered, as subsumption and others do, from the tasking problem. To change the robot’s intended task, it is necessary to re-code the behaviour sequencing scheme by introducing new events in the stack. This is not hard as long as the task is not complex. Nevertheless, complex tasks need to take into account the inter-dependence of many behaviours, and side effects do occur (Saffiotti, 1997b), and thus this simple mechanism does not scale appropriately in this scenario. As a result, the BGA architecture was proposed to address more complex tasks, integrating reactivity and deliberation.
5.2.2 BGA: a multiagent architecture
MadCar’s reactive architecture was well suited to small systems with few sensors that exhibit simple reactive behaviours, and, in fact, it was used successfully both in MadCar and La-Garra. But the more evolved Quaky-Ant robot (see chapter 2) allows for new and more complex tasks, and consequently the architecture was modified to meet the following requirements:
• Seamless integration of reactive (e.g. obstacle avoidance) and deliberative (e.g. path-planning) modules, each being a semiautonomous and independent entity that is in charge of a given task, so that encapsulation, isolation, and local control can be achieved.
• There is no assumption about the knowledge each subsystem has or the resolution method it implements. In addition, extensive use of fuzzy logic should be made.
• Flexible and dynamic organisation, allowing the layout of modules to depend on processing needs, and enabling and disabling modules at run-time.
• Learning techniques are incorporated in the architecture at different levels (behaviours, blenders, tasks). For a discussion of learning in BGA see chapter 7.
In this context, and to achieve the desired objectives, the initial simple blackboard-based architecture was developed into a multiagent system that uses a blackboard to share information. The new architecture, called BGA, is built using three basic computational units:
• Behaviour: the simplest computational unit in BGA. Behaviours are concurrent processes that have a near-constant bounded execution time. Behaviour units are intended for reactive purposes.
• Blender: when there are two or more behaviours, the blender is in charge of behaviour coordination, covering both behaviour arbitration and command fusion. The blender is not mandatory, although it is recommended.
• Task: the most complex computational unit in BGA. Tasks are asynchronous processes that do not have a constant bounded execution time. Task units are intended for deliberative purposes.
Figure 5.11: BGA architecture.
Using behaviours, blenders, and tasks as the basic building blocks, the BGA architecture (Figure 5.11) is composed of the following types of elements:
• Agent: an agent is simply a canvas that serves to group behaviours, blenders and tasks. An agent should not simultaneously contain behaviours and tasks, and thus two types of agents can be distinguished: reactive agents, those that only contain behaviours, and deliberative agents, those that only contain tasks. Agents run in an asynchronously distributed fashion.
• Blackboard: the blackboard serves as the inter-connection infrastructure that links the different agents of the system. All the agents use a common protocol to communicate with the blackboard. This communication can be realised using network or shared-memory based resources.
• Sensor processing: sensor processing units are in charge of accessing the different raw sensor signals, filtering and processing them, and then sending that data to the blackboard. There is typically only one sensor processing module in the architecture.
• Actuator control: actuator control units are in charge of accessing the blackboard to read motor commands and then sending them to the different actuators. There is typically only one actuator control module in the architecture.
5.2.2.1 Reactive agents
A key idea in BGA is that each reactive agent is decomposed into very simple behaviours. These access the BGA blackboard by way of some input and output variables. As two different behaviours may modify the same variable, a behaviour coordination method is needed to arbitrate behaviours and fuse commands. BGA blenders are based on the context-dependent blending concept, a cooperative behaviour coordination method. In addition to the characteristics described in section 5.1.4, this method presents some advantages related to scalability and learning. In competitive behaviour coordination methods, like subsumption (Brooks, 1986) and MadCar’s architecture (see the previous subsection), the selection of which behaviour is active depends on how the behaviours are implemented (Gómez et al., 1999c). If the overall task needs to be changed, some effort needs to be applied to changing the suppression/inhibition network (subsumption) or modifying the order in which events are pushed onto the stack (MadCar’s architecture).
Nevertheless, with cooperative approaches the management resides in the fusion method, which makes the process of defining individual behaviours a simple task, while more effort must be devoted to the definition of the blending rules. This may appear to be a lot more work compared to competitive approaches, but it is not so. Firstly, once the behaviours and blending rules have been defined it is not very difficult to modify the individual behaviours or the rules. In this case, if the overall task needs to be changed, it often only requires adding further behaviours and blending rules. Secondly, if learning techniques are used in conjunction with context-dependent blending, the modification of the overall task only requires adding the required behaviours and retraining the blending rule base (Gómez et al., 1999c). For a detailed discussion on the learning of blending rules see chapter 7.
Figure 5.12: BGA reactive agent.
A BGA reactive agent (Figure 5.12) is composed of three kinds of modules that are placed sequentially inside the agent's execution loop:
• A common module, which is executed before any behaviour, builds or updates the local perceptual space (LPS) of the agent (see section 5.1.4).
• A set of behaviours that are executed concurrently. These are standard reactive modules that directly map input (from sensors or the LPS) to actions.
• A blender module, a new behaviour blending method based on context-dependent blending (CDB), which specifies which behaviours are active and how the outputs of these behaviours are fused (see section 5.1.4).
The reactive agent’s execution loop starts in the common module. It reads sensory information, performs whatever processing is needed, and then builds or updates the LPS. All the modules that compose the agent share this LPS. When the common module ends, all the behaviours are executed concurrently, provided that their computation time is small and bounded.
They can use both information stored in the blackboard and the LPS. When finished, they produce some output that should be stored in the blackboard. Instead of the behaviours accessing the blackboard directly, the blending module selects which outputs will be taken into account and how they will be fused. The BGA blending mechanism works as follows. At development time, each behaviour is provided with a given fixed weight (wgt), which determines how important the output of the behaviour is (actually it is used to weight the different outputs). Each behaviour, based on its sensory input (blackboard and LPS), produces actions or output variables (ov) that are meant to be put in the blackboard. Instead of working directly with output variables, behaviours work with local copies of the output variables (lov). The blender implements a fuzzy metarule base (crisp rule bases are merely particular cases), where each rule (eq. 5.1) has inputs (iv) from the LPS as the antecedent and degrees of activation of some behaviours (beh) as the consequent.

IF iv1 is ISET1 and … and ivn is ISETn THEN beh1 is OSET1 and … and behm is OSETm   (5.1)

where:
• ivi is the ith input variable.
• ISETi is the fuzzy set for the ith input variable.
• behj is the jth behaviour of the rule.
• OSETj is the fuzzy set for the jth output variable (behj).

The fuzzy inference mechanism then produces the degree of activation of the different behaviours. These degrees define the behaviour arbitration schema. When the degree for a certain behaviour is zero, the behaviour is not active. If the degree is one, the behaviour is fully active. In any other case, the behaviour is only partially activated. The last step in the blending procedure is the command fusion (eq. 5.2). The local copies of the output variables are weighted taking into account both the weight and the degree of activation of the behaviour that produced them.
αj = wgtj ⋅ behj

ovi = ( Σj αj ⋅ lovij ) / ( Σj αj )   (5.2)

where:
• ovi is the ith output variable.
• lovij is the local copy of the ith output variable in the jth behaviour.
• wgtj is the weight of the jth behaviour.
• behj is the result of the fuzzy inference for the jth behaviour.

These fuzzy context rules provide, as they do in CDB, a flexible means of encoding behaviour arbitration strategies. Like fuzzy control rules, context rules allow complex strategies to be written in a modular way using a logical format. In addition, context rules overcome the limitations of typical fixed arbitration strategies, like subsumption, in which relationships between behaviours are hard-coded in advance, and thus the modification of one behaviour produces side effects because of behavioural dependence. These context rules provide a framework to define independent behaviours and a way to combine them. While the activation degrees produced by the context rules vary according to environmental changes, fixed weights act as scaling factors. Each particular application will dictate which combination of weight and inference is the best (to use only weights, to use only inference, or to combine both). Weights differentiate the BGA blending scheme from CDB, playing their role by assisting in the development of the control system and expressing the natural importance of the behaviours. Some behaviours appear to be more important than others; for instance, it seems rational that the behaviour in charge of collision avoidance has a higher weight than the behaviour in charge of path following. The developer sets in advance the overall importance of each behaviour. This can be viewed, in some sense, as a competitive coordination approach, because these weights are fixed a priori. When the behaviours are running, the LPS is modified based on the sensory input, and the blending mechanism adapts or reacts to the environment by changing the degree of activation of the different behaviours.
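As a hedged illustration of the fusion step in eq. 5.2, the following minimal sketch computes αj = wgtj ⋅ behj for each behaviour and fuses the local output copies by a normalised weighted sum. All class, method and variable names here are illustrative, not the thesis implementation:

```java
// Illustrative sketch of the BGA command fusion of eq. 5.2.
import java.util.List;

public class Blender {
    /** One behaviour's vote: fixed weight (wgt), inferred activation (beh),
     *  and local copy of one output variable (lov). */
    public record Vote(double wgt, double beh, double lov) {}

    /** Fuses the local copies into the output variable value ov (eq. 5.2). */
    public static double fuse(List<Vote> votes) {
        double num = 0.0, den = 0.0;
        for (Vote v : votes) {
            double alpha = v.wgt() * v.beh();  // alpha_j = wgt_j * beh_j
            num += alpha * v.lov();
            den += alpha;
        }
        return den == 0.0 ? 0.0 : num / den;   // no active behaviour -> 0 (a convention)
    }

    public static void main(String[] args) {
        // collision avoidance: wgt 1.0, fully active, wants heading -30
        // path following:      wgt 0.5, half active, wants heading +20
        System.out.println(fuse(List.of(new Vote(1.0, 1.0, -30.0),
                                        new Vote(0.5, 0.5, 20.0))));  // -20.0
    }
}
```

Because the degrees behj come from the fuzzy inference, they change with the environment at every execution cycle, while the weights wgtj stay fixed at their development-time values.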
This can be viewed as a cooperative coordination approach, because the degree of activation of the different behaviours dynamically changes according to the environment, and some behaviours are active at the same time. Thus, the BGA behaviour coordination mechanism mixes both the competitive and the cooperative parts to produce the final result. When all the behaviours’ weights are set to the same value, the result is approximately equivalent to the CDB method, and in some cases the same result could be achieved with CDB as long as the context rules are carefully designed, at the cost of losing the expressiveness of the weights. Weights are also important when learning techniques are used in the development of the blending rule base. Instead of performing a blind search over a wide parameter space, weights allow the learning mechanism to converge faster because the search space is reduced. Of course, this approach works because it uses the developer’s knowledge about the problem (the importance of the different behaviours). When this knowledge is unavailable (or partially unavailable), the method reduces to the CDB scheme. Chapter 7 includes a detailed discussion on how learning techniques are applied to BGA blending. It is important to note here that there is a subtle difference, regarding defuzzification, between BGA blending and CDB: the order in which defuzzification is performed when behaviours are implemented using fuzzy rules. The simplest approach is to perform defuzzification at each behaviour, just before command fusion takes place (Figure 5.13a). In this case each behaviour has the same importance, irrespective of the shape of its output fuzzy sets, and the output is averaged.

Figure 5.13a: Defuzzification before command fusion.

The more complex approach is that of CDB, which is intended for behaviours implemented as fuzzy inference systems that do not perform the defuzzification step (Figure 5.13b).
The output variables of the different behaviours are fuzzy sets instead of crisp values (as in the previous example), and the defuzzification takes place just after the command fusion step (Saffiotti, 1997). In this case the shape of the behaviours’ output fuzzy sets does matter, and in fact influences the output of the system.

Figure 5.13b: Defuzzification after command fusion.

Finally, the approach of the BGA blending scheme, which is intended for behaviours that are not necessarily implemented as fuzzy inference systems, tries to make a trade-off between the two previous schemes (Figure 5.13c). The first exhibits the simplest implementation while the second has a more appealing or “natural” response. In this case defuzzification is performed at each behaviour, just before command fusion takes place, as in the first example. It is here where weights play their role, by weighting the output of the behaviours. Now different behaviours, when activated, influence the overall system with different strengths.

Figure 5.13c: Defuzzification before weighted command fusion.

In CDB the outputs of the different behaviours reflect both preferences (the commands to be executed) and desirability measures for these preferences (the shapes of the output fuzzy sets). Combining preferences thus uses more information than a simple vector-weighting schema. This desirability depends on the current behaviour interests (it depends on the behaviour’s goal and the sensory input), and it dynamically changes as the behaviour changes its preferences. In BGA blending, the outputs of the different behaviours reflect only preferences, and the weights of the behaviours are fixed all the time and depend on the overall importance of the behaviours (in some sense they are no longer desirability measures, but have a similar effect as fixed ones).
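The difference between the orderings of Figures 5.13a and 5.13b can be seen numerically in the toy sketch below (a discretised universe with made-up sets, not the thesis code): defuzzifying first makes both behaviours count equally, whereas fusing the fuzzy sets first lets the larger-area set pull the result toward its own centroid.

```java
// Toy comparison of defuzzify-then-average (Fig. 5.13a) versus
// fuse-then-defuzzify (Fig. 5.13b) on a discretised steering universe.
public class DefuzzOrder {
    static final int N = 21;                       // universe: -10 .. +10
    static double value(int i) { return i - 10; }

    /** Centre-of-gravity defuzzification over the discretised universe. */
    static double centroid(double[] mu) {
        double num = 0, den = 0;
        for (int i = 0; i < N; i++) { num += mu[i] * value(i); den += mu[i]; }
        return den == 0 ? 0 : num / den;
    }

    /** Triangular output set centred at peak with the given half-width. */
    static double[] triangle(double peak, double halfWidth) {
        double[] mu = new double[N];
        for (int i = 0; i < N; i++)
            mu[i] = Math.max(0, 1 - Math.abs(value(i) - peak) / halfWidth);
        return mu;
    }

    public static void main(String[] args) {
        double[] a = triangle(-5, 2);              // narrow, small-area set
        double[] b = triangle(5, 8);               // wide, large-area set
        double early = (centroid(a) + centroid(b)) / 2;   // Fig. 5.13a: shapes ignored
        double[] sum = new double[N];
        for (int i = 0; i < N; i++) sum[i] = a[i] + b[i];
        double late = centroid(sum);               // Fig. 5.13b: shapes matter
        System.out.println(early + " " + late);    // the two orderings disagree
    }
}
```

In the BGA variant of Figure 5.13c, the fixed weights would simply multiply each behaviour's centroid before the average is taken.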
As has been discussed above, this is by no means more restrictive, and in fact it directly expresses some of the developer’s knowledge about the behaviours.

5.2.2.2 Deliberative agents

BGA deliberative agents perform typical deliberative tasks, like reasoning, planning, and world modelling. Just as reactive behaviours use the concept of local perceptual space, deliberative tasks use the extended perceptual space (EPS). The EPS is a structure that stores global and high-level perceptual data, interpretations of these data, and the representation of goals and subgoals. Thus, deliberative agents take their input both from the blackboard and their EPS, and put their output both in their EPS and on the blackboard. Tasks can also share their EPSs, or parts of them, through the blackboard, allowing for more decentralised computing paradigms. An example of an EPS can be the information associated with a grid map and a path-planner. In this case the EPS stores the position of the goal, and it is updated with the fusion of low-level sensory information (sonars, infrareds, or other range sensors) and with the resulting heading of the path-planner. Occasionally, a deliberative agent may want to communicate some command to a reactive agent. The deliberative agent takes the appropriate information from its EPS, and then sends the command to the blackboard. The reactive agent, at each execution cycle, updates its LPS with information from the blackboard. This is how the BGA architecture enables the asynchronous transmission of commands between reactive and deliberative agents. In short, the LPS is related to data of a local nature, while the EPS is related to data of a global nature.
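A minimal sketch of this command path, with hypothetical variable names (goalX, plannerHeading, targetHeading) and plain maps standing in for the EPS, the blackboard, and the LPS:

```java
// Illustrative EPS -> blackboard -> LPS command path (names are hypothetical).
import java.util.HashMap;
import java.util.Map;

public class CommandPath {
    public static void main(String[] args) {
        Map<String, Double> blackboard = new HashMap<>();

        // Deliberative side: the EPS holds the goal; a planner task derives a heading.
        Map<String, Double> eps = new HashMap<>();
        eps.put("goalX", 4.0);
        eps.put("goalY", 3.0);
        double heading = Math.toDegrees(Math.atan2(eps.get("goalY"), eps.get("goalX")));
        blackboard.put("plannerHeading", heading);       // asynchronous command

        // Reactive side: each execution cycle refreshes the LPS from the blackboard.
        Map<String, Double> lps = new HashMap<>();
        lps.put("targetHeading", blackboard.getOrDefault("plannerHeading", 0.0));
        System.out.println(lps.get("targetHeading"));    // about 36.87 degrees
    }
}
```

The reactive agent never reads the EPS directly; the blackboard decouples the two sides, which is what makes the transmission asynchronous.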
A BGA deliberative agent (Figure 5.14) is composed of two kinds of modules that are placed sequentially inside the agent's execution loop:
• A common module, executed before any task, which usually updates one or some of the extended perceptual spaces (EPS) of the different tasks.
• A set of tasks that are executed sequentially. These are standard deliberative modules that do not directly map input to actions, but instead model the world (through the use of the EPS) or reason about the world.

Figure 5.14: BGA deliberative agent.

The deliberative agent’s execution loop starts in the common module. It reads data from the blackboard, performs whatever processing is needed, and then builds or updates one or some of the EPSs. All the modules that compose the agent have their own EPS (although they can share them by using the blackboard). When the common module ends, the first task in the list is executed. It uses information stored in the blackboard and its EPS to perform a deliberative computation. When finished, the following task in the list performs its processing. Deliberative agents do not need to perform time-bounded computations, and their tasks can be arranged hierarchically using the task sequencing mechanism. Thus, traditional AI deliberative schemes, like the sense-plan-act model, can be achieved. One important point that differentiates reactive and deliberative agents is that tasks do not have explicit arbitration schemes. It is the designer’s responsibility to avoid conflicts between the different tasks. Consider, for example, a typical case where two deliberative agents, using different techniques, produce commands or goals for reactive agents. These goals can be contradictory, and direct access to the blackboard might produce undesirable results. The solution is that these agents do not access the same variable in the blackboard, but instead use their own variables.
Then a third agent can combine or fuse the results of these two agents and effectively perform command fusion at the agent level. Following this example, it is easy to see that the BGA deliberative agent approach is by no means restrictive and can group a variety of possible arbitration and computational schemes.

5.2.2.3 Discussion

BGA relies on and takes advantage of various well-developed concepts. It is based on a blackboard architecture as the mechanism that interconnects the different subsystems. These subsystems are based on the notion of agent, which plays an important role as the structural element of BGA. Agents can be layered and distributed according to their functionalities, and thus different cognitive models can be implemented. In the following sections an example based on the three-layer hybrid architecture schema is presented. As some of these agents may implement different reactive behaviours, the fusion of their outputs is accomplished by a blending method derived from context-dependent blending. The main contributions of the BGA architecture are the following:
• BGA effectively groups different concepts and paradigms of control architectures (namely blackboard architectures, hybrid architectures, and multi-agent systems) by extending and combining their characteristics.
• BGA incorporates a new blending mechanism, based on context-dependent blending, that effectively addresses the behaviour coordination problem.

The blackboard architecture (Hayes-Roth, 1985) is a proven mechanism to coordinate different processing elements. It is simpler to implement and more efficient than a message-passing system when coordinating loosely coupled subsystems. Moreover, the blackboard architecture does not assume any special problem-solving technique in the elements that it coordinates (Vranes et al., 1995).
It is important to note that when hard timing constraints exist, blackboard architectures are not appropriate for the coordination of distributed systems (although there are some real-time single-processor systems successfully working which use blackboard architectures). Under this paradigm, in BGA, a series of elements shares the available information about the system and the environment. Each element can read the relevant information from the blackboard, perform its own processing, and then write each possible result onto the blackboard. These elements run in the same or different machines as asynchronous processes (or execution threads), allowing an effective distributed architecture. In BGA, these elements are in fact agents that are capable of taking decisions and also serve as a way to modularise and organise the robot control software. In this way these BGA elements can also be considered intelligent agents (Hendler, 1999), and both reactive and deliberative agents can coexist in BGA. Additionally, the BGA architecture has a sensor-processing unit and an actuator-control unit, which can be viewed as specialised agents. The BGA architecture can be classified as a heterogeneous communicating multiagent system (Stone and Veloso, 2000), as there are agents of different natures that communicate with one another via the blackboard. Reactive agents can exhibit different reactive behaviours simultaneously. Hence, they need to rely on a behaviour blending mechanism to address the behaviour coordination problem, by providing a way to arbitrate the behaviours (enabling and disabling their outputs, both totally and partially) and to fuse the commands they produce (by somehow weighting the outputs). The BGA blending mechanism is based on the context-dependent blending concept (Saffiotti, 1997), modified to fit into the architecture and augmented to allow for and assist in the use of learning techniques (see chapter 7).
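The read-process-write cycle described above can be sketched with two threads sharing a map-based blackboard; the variable names and the map representation are assumptions for illustration, not the thesis implementation:

```java
// Sketch of asynchronous elements sharing data through a blackboard.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BlackboardCycle {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Double> blackboard = new ConcurrentHashMap<>();

        // A sensor-processing element writes a filtered reading.
        Thread sensor = new Thread(() -> blackboard.put("sonarFront", 0.8));

        // A reactive element reads it and writes a motor command.
        Thread behaviour = new Thread(() ->
            blackboard.put("motL", blackboard.getOrDefault("sonarFront", 1.0) * 50));

        sensor.start(); sensor.join();        // the elements are asynchronous;
        behaviour.start(); behaviour.join();  // the joins only order this demo
        System.out.println(blackboard.get("motL")); // 40.0
    }
}
```

No element needs to know which other elements exist, only which blackboard variables it reads and writes; this is the loose coupling the blackboard paradigm provides.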
The distinction between reactive and deliberative agents allows for a layered distribution of functionalities in the architecture. Although BGA does not impose any constraints on the organisation of agents, the usual way to organise agents is to follow the three-layer architecture (Gat, 1998) schema. A three-layer architecture can be implemented in BGA by using only one deliberative agent (planner layer), which implements some tasks, and one reactive agent, which implements some behaviours (controller layer) and contains a behaviour blender (sequencer layer). In this case, a BGA instance is functionally equivalent to a three-layer architecture. In a typical BGA system there are usually more than two agents (see the following sections and chapters), and the corresponding agent organisation can be viewed as an augmented and distributed three-layer architecture, where the planner, controller and sequencer layers are partitioned and distributed across the different agents.

5.2.3 Inter-agent communication in BGA

The BGA architecture relies heavily on the communication services provided by the blackboard (Figure 5.11). Agents communicate by sending data to the blackboard and receiving data from the blackboard. Thus, the BGA architecture needs an agent communication language (ACL) to support this data interchange, and both KQML and FIPA ACL are, in principle, good candidates. They are almost identical with respect to their basic concepts and the principles they observe (Labrou et al., 1999)(Vasudevan, 1998). The two languages differ primarily in the details of their semantic frameworks. In one sense this difference is substantial: because of the different semantic frameworks it would be impossible to come up with exact mappings or transformations between KQML performatives and their completely equivalent FIPA primitives, or vice versa.
Both languages assume a basic non-commitment to a reserved language, although in FIPA ACL an agent must have some limited understanding of a semantic language to process a received message properly. Another difference between the two ACLs is in their treatment of the registration and facilitation primitives. In KQML these tasks are associated with performatives that the language treats as first-class objects. FIPA ACL, intended to be a pure ACL, does not consider these tasks in their own right. Instead, it treats them as requests for action and defines a range of reserved actions that cover the registration and life-cycle tasks. Moreover, FIPA ACL does not provide facilitation primitives. There are two important points regarding the use of FIPA ACL and KQML as the ACL for the BGA architecture:
• From a functional point of view, both ACLs provide more facilities than those really needed for BGA. BGA itself is not intended as a general-purpose agent architecture, but as an architecture for integrating reactive and deliberative agents in a mobile robotics framework. This way, the number of performatives required in BGA is limited and application-specific.
• There are not many implementations of FIPA ACL, but there are many implementations of KQML. Moreover, some of these KQML implementations are for Java. One of the requirements of the BGA architecture is that it needs to be platform independent, and the way to achieve this is through the use of Java.

Although FIPA ACL seems to be the natural successor to KQML, it was decided to use KQML as the ACL for the BGA architecture for the above reasons. There are three important Java-based agent development systems which implement KQML, namely JATLite, Jackal, and JAFMAS. These are described below. JATLite² (Java Agent Template Lite), developed at Stanford, is a package of Java programs that allows users to create communicating agents quickly.
Agents run as applets launched from the browser, and for that reason all agents register with an agent message router facilitator that handles message delivery. This way, JATLite is a centralised system, and peer-to-peer communication between agents is not possible. Jackal³, developed at the University of Maryland, is a Java package that provides a comprehensive communications infrastructure for Java-based agents, while maintaining maximum flexibility and ease of integration (Cost et al., 1998). Jackal, which strongly emphasises conversations between agents, provides an environment for a planning and scheduling project for manufacturing (CIIMPLEX). It provides a flexible framework for designing agents around conversations and includes extensive support for registration, naming, and control of agents, through its own protocol (KNS). Conversation specifications can be downloaded and employed on the fly in Jackal's multi-threaded conversation space. A blackboard-style interface provides the agent application with a maximally flexible interface to the message stream. JAFMAS⁴ (Java-based Agent Framework for Multi-Agent Systems), developed at the University of Cincinnati, is a set of Java classes that supports the implementation of communicating agents (Chauhan, 1997)(Chauhan and Baker, 1998). JAFMAS supports directed (point-to-point) communication as well as subject-based, broadcast communications. JAFMAS, which also includes some support for conversations, provides a working environment for AARIA, a planning and scheduling project for manufacturing (Parunak et al., 1997).
² http://java.stanford.edu
³ http://jackal.cs.umbc.edu/Jackal
⁴ http://www.ececs.uc.edu/~abaker/JAFMAS

Although both Jackal and JAFMAS are good candidates for general-purpose MAS systems, the KQML implementation in BGA relies on JATLite for the following reasons:
• JATLite is an agent-router-centred implementation, as opposed to decentralised implementations such as Jackal and JAFMAS. This is not a drawback by any means, because BGA is a blackboard architecture and all the agents are in communication with the blackboard. This way, agents do not need to know how to reach others, and those decentralised capabilities are not needed.
• JATLite is the simplest implementation, providing basic methods for constructing and parsing KQML messages.
• JATLite does not provide conversation support, as Jackal and JAFMAS do. The BGA architecture does not require all the features of a full general-purpose KQML-based MAS, and hence the “the simpler the better” principle supports the use of JATLite.

There is a drawback common to all three implementations of KQML: they use TCP as the message transmission protocol (although Jackal and JAFMAS can use multicast UDP for agent discovery). TCP (Tanenbaum, 1996) is a robust protocol designed to provide a reliable end-to-end byte stream over an unreliable internetwork. An internetwork differs from a single network in that different parts may have wildly different topologies, bandwidths, delays, packet sizes, and other parameters. This reliability is achieved at the expense of a very high latency compared to UDP (Tanenbaum, 1996), a connectionless transport protocol that provides an unreliable datagram service. The BGA architecture is intended for mobile robotics applications, where agents typically run in a single network and response times are very important.
Since lowering the latency lowers the response time, it was decided to re-implement the JATLite transport layer to support both shared-memory communication (between threads in a single Java Virtual Machine) and UDP communication. A BGA connection is a one-way stateless link between an agent and the blackboard, implemented as a UDP socket (Comer and Stevens, 1993). Each connection is referred to by the server-side address and UDP port. For a full-duplex transmission to occur between the blackboard and an agent, two connections are needed. Thus, each connection is used to send data from the client side to the server side of that connection. There is a facilitator process (following KQML terminology) which is aware of the existence of agents and their respective network addresses, and is in charge of maintaining a list of available connections. In BGA the blackboard acts as the facilitator (Figure 5.15).

Figure 5.15: BGA transport layer infrastructure.

When the BGA system starts up, the blackboard process starts to listen on a given UDP port, and each agent starts to listen on its own local UDP port as well. All the agents are provided with the address and port on which the blackboard process is listening. When an agent starts running, it should send a message (CONNECT) to the blackboard to indicate that it is available. The blackboard process then updates its internal connections table with the agent’s local UDP port, so that the blackboard can later send any answer to that agent. When an agent stops running, it should send a message (DISCONNECT) to the blackboard to indicate that it will no longer be available. There are two additional messages available to the agents: sending data to the blackboard (PUT) and receiving data from the blackboard (GET). These four types of BGA messages are implemented using the KQML (Labrou and Weber, 1993)(Labrou, 1997)(Labrou and Finin, 1997) performatives register, unregister, tell, and ask-if.
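As a hedged illustration of the CONNECT step over the UDP transport, the sketch below sends a register performative from an agent socket to a blackboard socket; the ephemeral ports and the raw-string encoding of the KQML message are assumptions for the sketch, not the actual re-implemented JATLite layer:

```java
// Sketch: an agent sends its CONNECT (register) message to the blackboard
// over UDP; the facilitator would record the sender's address and port.
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class ConnectSketch {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket blackboard = new DatagramSocket();   // facilitator side
             DatagramSocket agent = new DatagramSocket()) {      // agent side
            blackboard.setSoTimeout(2000);                       // do not block forever
            String kqml = "(register :sender Agent-A :receiver Blackboard :reply-with none)";
            byte[] out = kqml.getBytes(StandardCharsets.US_ASCII);
            agent.send(new DatagramPacket(out, out.length,
                    InetAddress.getLoopbackAddress(), blackboard.getLocalPort()));
            byte[] buf = new byte[1024];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            blackboard.receive(in);                              // CONNECT arrives here
            System.out.println(new String(in.getData(), 0, in.getLength(),
                    StandardCharsets.US_ASCII));                 // prints the register message
        }
    }
}
```

Being stateless datagrams, such messages carry no delivery guarantee, which is the latency-for-reliability trade-off discussed above.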
Both the tell and ask-if performatives need the specification of the ontology and the language used for the content field. In this case, all the messages currently assume a given ontology named ROBOT and a content description language called BGCL. The BG Content Language (BGCL) is based on the s-expressions of the LISP language, which are balanced-parenthesis lists (it is the same syntax as KQML itself). The ROBOT ontology specifies the semantics of the elements of the BGCL. Each element of a BGCL list should be a tuple containing, at least, two elements: a variable name and its value. The CONNECT message is implemented using the performative register. In the example below (Listing 5.5), the agent named Agent-A wants to notify that it has just started running, and thus it simply sends the register performative to the Blackboard process.

(register
    :sender     Agent-A
    :receiver   Blackboard
    :reply-with none
)

Listing 5.5: CONNECT message.

The DISCONNECT message is implemented using the performative unregister. The KQML standard specifies that every unregister performative should correspond to a previously issued register performative. Agents are registered the first time they send a register message to the facilitator (that is, the blackboard), and thus the in-reply-to field is not taken into account and can be left blank. When the facilitator receives an unregister performative, it simply eliminates the sender from the connection list and performs no additional processing. In the example below (Listing 5.6) the agent named Agent-A wants to end and simply sends the unregister performative to the Blackboard process.

(unregister
    :sender      Agent-A
    :receiver    Blackboard
    :in-reply-to none
)

Listing 5.6: DISCONNECT message.

The PUT message is implemented using the performative tell. This message is utilised to modify the values of a list of variables on the blackboard.
In the example below (Listing 5.7), the agent named Agent-A wants to modify two variables corresponding to the speed and direction the motors should run. The agent forms a valid BGCL list with the names of the variables and the desired values and then sends a tell performative to the blackboard including that list in the content field.

(tell
    :sender   Agent-A
    :receiver Blackboard
    :language BGCL
    :ontology ROBOT
    :content  ((motL 50.0) (motR -35.0))
)

Listing 5.7: PUT message.

The GET message is implemented using the performatives ask-if and tell. This message is utilised to get the values of a list of variables from the blackboard. In the example below (Listing 5.8), the agent named Agent-A wants to receive the current value of two variables corresponding to the speed and direction the motors are running. The agent forms a valid BGCL list with the names of the variables and dummy values, and then sends an ask-if performative to the blackboard which includes that list in the content field. The blackboard then replies using a tell performative, including in the content field the list of the required variables that do exist with their corresponding values.

(ask-if
    :sender     Agent-A
    :receiver   Blackboard
    :reply-with id1
    :language   BGCL
    :ontology   ROBOT
    :content    ((motL 0.0) (motR 0.0))
)

(tell
    :sender      Blackboard
    :receiver    Agent-A
    :in-reply-to id1
    :language    BGCL
    :ontology    ROBOT
    :content     ((motL 50.0) (motR -35.0))
)

Listing 5.8: GET message sequence.

5.2.4 Conclusions

BGA relies on and takes advantage of various well-developed concepts. It is based on a blackboard architecture that coordinates a multiagent system. Agents can be layered and distributed according to their functionalities, including three-layer hybrid architecture schemas, where the behaviour coordination is accomplished by a blending method derived from context-dependent blending.
The main contributions of the BGA architecture are the following:
• BGA effectively groups different concepts and paradigms of control architectures (namely blackboard architectures, hybrid architectures, and multi-agent systems), extending and combining their characteristics.
• BGA incorporates a new blending mechanism, based on context-dependent blending, that effectively addresses the behaviour coordination problem.
• BGA makes use of KQML, which is typical of message-passing systems but not habitual in blackboard systems, implemented over UDP for performance.

5.3 The BG programming language

One of the objectives of this research is to have a simple mobile robot performing delivery tasks in an indoor, office-like environment, without prior knowledge of the floor plan, and without any modification or special engineering of the environment. This objective, among other things, means that methods are needed to develop the control software based on different robot behaviours. The BGA architecture, defined in the previous section, provides a good framework to develop such control software. To support the development of control programs further, a high-level robot programming language, called BG, has been defined and its corresponding interpreter has been built. The language fully supports the BGA architecture and allows the user to develop BGA-based control systems in a friendly but powerful way. This interpreter has been incorporated into a development environment, and it is also intended for deployment. This section discusses some related robot programming languages and development environments. Then the BG language is described in detail, with emphasis on its syntax and its semantics, and presenting some examples. Finally, the development environment for the BG language is described.
5.3.1 Antecedents

Traditionally, behaviour-based robotics researchers have avoided complex control and data structures, in part because of the need for efficiency. More recently, a number of C-like languages for robot programming have been developed (Horswill, 2000). The parallel functional decision trees (PFDT) language (Schaad, 1998) is able to emulate many of the popular behaviour-based programming constructs, including both parallel and serial control structures. It is implemented as a C++ class library. The CES language (Thrun, 1998b) extends C with probabilistic data types and support for gradient-descent function approximators. A particularly appealing property of CES is that training information need not be specified at the output of the function approximator. Instead, desired values can be provided for other variables that are computed from the output of the approximator. The compiler does the appropriate symbolic manipulations to compute the gradient of the derived quantity with respect to the weights of the approximator. TDL (Simmons and Apfelbaum, 1998) is a C++ extension that provides support for subgoal decomposition and synchronisation. It includes a graphical front-end for the language that allows a form of visual programming. MissionLab (MacKenzie and Arkin, 1997) is another visual tool, which is intended to allow novice programmers to task robots efficiently, although it does not allow the development of new low-level behaviours. COLBERT (Konolige, 1997) is another C-like language for low-level reactive systems that is designed to interface with the Saphira architecture (Konolige et al., 1997). COLBERT makes use of fuzzy rule bases for the definition of behaviours and behaviour blenders. Frob (Peterson et al., 1999) is a language embedded in Haskell which can be used for reactive control loops.
GOLOG (Levesque et al., 1997) is a logic programming language that combines the automatic inference capabilities of logic programming with the explicit control operations of imperative languages. GRL (Horswill, 2000) is a language that extends traditional functional programming techniques to behaviour-based systems, in a similar way to Frob. GRL is an architecture-neutral language embedded within Scheme and provides a wide range of constructs for defining data flow within communicating parallel control loops. A simple compiler distils the code into a single while loop containing straight-line code and then emits it as Scheme, C, C++ or BASIC code. Both Frob and GOLOG provide considerably more expressiveness than GRL at the cost of increased execution time. Each of the languages above has its own pros and cons. One of the main goals of the BGA architecture is to tackle the behaviour blending problem and the learning of the rules that drive the blending process. The two languages that are most interesting from the BGA point of view are COLBERT and GRL. COLBERT uses a similar mechanism for behaviour definition and blending, but it is intended for the implementation of the low-level software only. GRL is a higher-level language that is flexible enough to implement different control schemes. As new ideas and functionalities needed to be tried, especially the use of Java, a new high-level programming language, named BG, was defined and implemented, borrowing ideas from COLBERT and GRL.

5.3.2 BG syntax and semantics

To support the development of BGA-based control systems, a high-level language, called BG, along with its corresponding interpreter, was specified and developed. The main goal of BG is to aid in the development cycle of robotics applications by providing a single and uniform framework in which to declare and specify all the elements of the BGA architecture (behaviours, tasks, blenders and agents).
Moreover, behaviour modelling is supported by an extensive use of fuzzy logic. A BG program is made up of the definition of some BGA agents and their interface with the blackboard. A BG program is intended to be executed as a single process (which can contain different execution threads), and thus a full distributed BGA architecture is composed of many BG programs, each one running in a different processor. BG, in its current version, incorporates many traditional elements and control structures of classical imperative languages:

• Local and global variables.
• Expressions and assignments.
• Arithmetical, logical and relational operators.
• Conditionals.
• C-like functions.

BG supports two data types: real numbers and fuzzy sets with standard shapes. In addition, BG presents advanced non-standard control structures:

• Deterministic finite state machines.
• Fuzzy rule bases.
• Approximate reasoning support.
• Robot navigation support.
• Agents, tasks, behaviours and behaviour blending support.

The BNF notation is used to specify the BG syntax, which is largely based on the C language syntax. In BNF, lowercase names represent non-terminal symbols and uppercase names represent terminal symbols. The non-terminal number represents any real value (either positive or negative) and the terminal ID represents any correctly defined identifier. In both cases their syntax corresponds to that of C. All terminal symbols except ID are considered reserved keywords and thus they cannot be used as identifiers. Moreover, the character string corresponding to a terminal symbol is the one composed of the same letters, but in lowercase. For instance, the terminal FLOAT corresponds to the string “float”. BG supports the use of comments as in C (“/* */”) or in C++ (“//”), and they are used along the different examples to clarify the meaning of some elements of the language.

5.3.2.1 Variables, operators and expressions

BG supports only real numbers and fuzzy sets.
One design criterion of BG was to make it as simple as possible by only supporting a minimal set of types and instructions without losing generality. As integers can be viewed as a particular case of real numbers, only the latter type is available. Variables are declared by specifying their type. The vdecl non-terminal corresponds to a variable declaration (Listing 5.9). The FLOAT type corresponds to real numbers coded using the IEEE 754 format, and they can be optionally initialised. Real variables can be accessed to read or to modify their content and are only available in the scope of a BG program. When these variables are not in the scope of a BG program (that is, when they are used to access a BGA blackboard) they must be declared using modifiers. Modifiers constrain the access to these variables in read or write operations to reduce the communication overhead. Three types of modifiers are supported:

• Sensor: the variable is accessed in read-only mode.
• Effector: the variable is accessed in write-only mode.
• Status: the variable is accessed in read-write mode.

    vdecl   ::= SET ID “=” ftype fuzzy “;”
              | FLOAT varlist “;”
              | SENSOR FLOAT idlist “;”
              | EFFECTOR FLOAT idlist “;”
              | STATUS FLOAT idlist “;”
    ftype   ::= TRAPEZOID | SIGMOID | BELL | TRIANGLE | CRISP | TSK
    fuzzy   ::= “[“ number ( “,” number )* “]”
    varlist ::= var ( “,” var )*
    var     ::= ID | ID “=” number
    idlist  ::= ID ( “,” ID )*

Listing 5.9: BNF for variable declaration.

The SET type corresponds to fuzzy sets, which must be initialised. Five types of fuzzy sets are supported, as well as TSK rule consequents:

• Trapezoid. Corresponds to fuzzy sets with trapezoidal shape (Figure 5.16) defined by four points [a, b, c, d]. It is mandatory to specify all four points so that they form a valid trapezoid, although b and c are allowed to be coincident (this way a triangle is obtained).
• Triangle.
Corresponds to fuzzy sets with triangular shape (Figure 5.16) defined by three points [a, b, c]. It is mandatory to specify all three points so that they form a valid triangle.
• Crisp. Corresponds to crisp sets (Figure 5.16) defined by a single number [a]. It is mandatory to specify just a single point.
• Sigmoid. Corresponds to fuzzy sets with sigmoidal shape (Figure 5.17) defined by two points [a, b]. It is mandatory to specify both points so that they form a valid sigmoidal function (Eq. 5.3).
• Bell. Corresponds to fuzzy sets with bell-like shape (Figure 5.17) defined by three points [a, b, c]. It is mandatory to specify all three points so that they form a valid bell-like function (Eq. 5.4).
• TSK. Corresponds to TSK rule consequents defined by points [x0, x1, …, xn] so that x0 is the independent term and x1 … xn are the coefficients corresponding to the n input variables. It is mandatory to specify enough points so that they form a valid TSK consequent (see chapters 1 and 3) at run-time, that is, the number of points is checked only when executing the rules.

Figure 5.16: Trapezoid, triangle and crisp fuzzy sets.

Figure 5.17: Bell and sigmoid fuzzy sets.

    µ(x) = 1 / (1 + e^(−b(x−a)))          (5.3)

    µ(x) = 1 / (1 + |(x − c) / a|^(2b))   (5.4)

Real variables can be operated on with standard arithmetical (+, -, *, /), logical (&&, ||, !), and relational operators (==, !=, <, >, <=, >=). The logical and relational operators are always evaluated and give the result as a real number (1.0 for true and 0.0 for false). Additionally, real variables and fuzzy sets can be operated on with standard fuzzy operators (&&, ||, !, IS) that represent the t-norm, t-conorm, inverse and degree of membership, respectively. By default, the semantics of the t-norm and t-conorm are the min and the max, respectively. Variables, sets, numbers and operators can be combined to form expressions.
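The set shapes above map directly onto small membership functions. The following Python sketch is illustrative only (the function names are not part of BG); it implements the piecewise trapezoid shape and Eqs. 5.3 and 5.4:

```python
import math

def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoid [a, b, c, d]; b == c yields a triangle."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    # rising edge on (a, b), falling edge on (c, d)
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def sigmoid(x, a, b):
    """Eq. 5.3: mu(x) = 1 / (1 + exp(-b * (x - a)))."""
    return 1.0 / (1.0 + math.exp(-b * (x - a)))

def bell(x, a, b, c):
    """Eq. 5.4: mu(x) = 1 / (1 + |(x - c) / a| ** (2 * b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))
```

With the CLOSE set of the later examples, trapezoid(0.1, 0.0, 0.0, 0.17, 0.2) gives full membership, while values past 0.2 fall to zero.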
The expression non-terminal corresponds to a real variable expression (Listing 5.10), and the fexpression non-terminal corresponds to a fuzzy expression (Listing 5.11).

    expression ::= expression “+” expression
                 | expression “-” expression
                 | expression “*” expression
                 | expression “/” expression
                 | “!” expression
                 | expression “&&” expression
                 | expression “||” expression
                 | expression “==” expression
                 | expression “!=” expression
                 | expression “<” expression
                 | expression “>” expression
                 | expression “<=” expression
                 | expression “>=” expression
                 | “(” expression “)”
                 | ID
                 | number
                 | function

Listing 5.10: BNF for real expressions.

    fexpression ::= fexpression “&&” fexpression
                  | fexpression “||” fexpression
                  | “!” fexpression
                  | fexpression IS fexpression
                  | “(” fexpression “)”
                  | ID
                  | number
                  | set fuzzy
                  | function

Listing 5.11: BNF for fuzzy expressions.

5.3.2.2 Commands and functions

The BG language, like C, allows the use of classical control structures such as assignments and conditionals, as well as finite state machines, fuzzy rule bases and navigation support methods. These structures are called commands or instructions, and they can be grouped in so-called blocks. A block is a set of commands that are executed sequentially. The block non-terminal corresponds to a block declaration, which is composed of one or many command non-terminal symbols (Listing 5.12).

    block   ::= “{” (command)+ “}”
              | command
    command ::= IF “(” expression “)” block
              | IF “(” expression “)” block ELSE block
              | ID “=” expression “;”
              | PRINTF “(” ID “)” “;”
              | HALT “;”
              | function “;”
              | rules
              | grid
              | fsm
              | SHIFT ID “;”
              | RETURN expression “;”

Listing 5.12: BNF for commands.

Conditionals (IF and IF-ELSE) and assignments (“=”) are similar to their counterparts in C. Additionally, the PRINTF command outputs the value of the variable that is its argument, and the HALT command stops the execution of the BG interpreter, so that no more actions are sent to the robot.
The special-case fsm, grid and rules non-terminals are discussed below.

    funcdecl ::= FUNCTION ID “(” idlist “)” block
    function ::= ID “(” idlist “)”

Listing 5.13: BNF for functions.

BG supports the use of simple user-defined functions that are global to the different BG modules, which correspond to the elements of the BGA architecture (agents, behaviours and tasks). The implicit return type of BG functions is real numbers and the actual value that a function returns is specified using the RETURN command (Listing 5.12). The funcdecl non-terminal corresponds to the declaration of a BG function and the function non-terminal corresponds to the call of a BG function (Listing 5.13). The parameters of a BG function are also explicit and must be real values. If no return command appears, BG functions return zero by default.

5.3.2.3 Finite state machines

The use of finite state machines (FSM) allows the programmer to sequence tasks and actions, and to define the conditions to change from one task to another. The BG language, in its current implementation, does not have instructions for loops (while, for, or repeat-until), but these can be modelled using finite state machines. The main reason for this lack of loops is to keep the duration of each control loop bounded (so that an error cannot lock it). In the future, loops will be incorporated which, as in COLBERT (Konolige, 1997), are internally in fact finite state machines. Another use of FSMs is the implementation of simple planners that merely chain simple actions. The fsm non-terminal corresponds to the declaration of an FSM (Listing 5.14). Each state has a unique identifier and a block of commands that are executed continually while the state is active. The SHIFT command (Listing 5.12) can be used to change the current state of the FSM. The initial state of an FSM is indicated in its definition (Listing 5.14).
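This shift-driven execution model can be sketched in a few lines of Python. The sketch below is a minimal illustration of the semantics (one block per state, executed once per control cycle, with shift selecting the block for the next cycle), not the BG implementation; the state names are invented for the example:

```python
# Minimal sketch of BG's FSM semantics (illustrative, not thesis code).
class FSM:
    def __init__(self, start):
        self.state = start          # like "fsm start NAME"
        self.states = {}

    def add(self, name, block):
        # block(fsm) is executed repeatedly while the state is active
        self.states[name] = block

    def shift(self, name):
        # like BG's "shift NAME;": takes effect on the next cycle
        self.state = name

    def cycle(self):
        # one control-loop iteration: run the active state's block once
        self.states[self.state](self)

fsm = FSM("PATROL")
fsm.add("PATROL", lambda f: f.shift("CHARGE"))  # toy blocks for illustration
fsm.add("CHARGE", lambda f: None)
```

Because each cycle runs exactly one bounded block, an erroneous condition cannot lock the control loop, which is the property the text attributes to the absence of general loop instructions.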
    fsm    ::= FSM START ID “{” states “}”
    states ::= ( STATE ID “:” block )+

Listing 5.14: BNF for finite state machines.

For example, suppose a robot has to move from a home location (home_x, home_y) to given goal coordinates (gx, gy), and then return home. It features a sensor to identify its position (x, y), and a function (ingoal) that determines whether the robot is at its current goal. The task supervision could correspond to the following sample BG code (Listing 5.15), with its corresponding state machine (Figure 5.18). This graph uses ellipses to represent states, a thick arrow to represent the initial state and thin arrows to represent transitions. Each thin arrow is labelled with the symbol that triggers the corresponding transition. A special symbol (“*”) is used as a wildcard and denotes every possible input symbol except those corresponding to the other thin arrows leaving the state. In the example below, the FSM starts in the HOME state, which sets some variables in order to establish the home and goal coordinates and then shifts to the LOOKING state. This state executes its code (labelled as “user code”) repeatedly until the ingoal variable is set to 1.0 (that is, the robot is at the goal location). When this occurs, some variables are modified in order to replace the goal coordinates by the home position and then CHANGING is set as the new active state. This state is executed until ingoal is set to 0.0 (that is, the robot is no longer at the goal location), and then it shifts to the RETURNING state. This state is executed until ingoal is set to 1.0 (that is, the robot is at the home location), and then it shifts to the GOAL state. This state simply sets a flag variable to indicate that the task has been accomplished.

Figure 5.18: Sample FSM-based planner.
fsm start HOME {
    state HOME: {
        goal   = 0.0;            // Goal not reached
        home_x = x;              // Set origin coordinates
        home_y = y;
        goal_x = gx;             // Set goal coordinates
        goal_y = gy;
        shift LOOKING;
    }
    state LOOKING: {
        /* User code */
        if (ingoal == 1.0) {
            goal_x = home_x;     // Change goal coordinates
            goal_y = home_y;
            shift CHANGING;
        }
    }
    state CHANGING: {
        /* User code */
        if (ingoal == 0.0) shift RETURNING;
    }
    state RETURNING: {
        /* User code */
        if (ingoal == 1.0) shift GOAL;
    }
    state GOAL: {
        goal = 1.0;              // Goal reached
        halt;
    }
}

Listing 5.15: Example of an FSM.

5.3.2.4 Fuzzy rule bases

Fuzzy rule bases are intended for both control and behaviour blending (see the previous section on the BGA architecture), and are one of the most noticeable features of the BG language. BG fuzzy rules can be constructed by using any combination of BG fuzzy operators (t-norm, t-conorm, inverse, and degree of membership), variables and fuzzy sets to define the antecedent, and by using any combination of variables and fuzzy sets to define the consequent. In this way BG supports multiple-input multiple-output (MIMO) fuzzy rules, as well as any combination of input and output variables. The rules non-terminal corresponds to a fuzzy rule base (Listing 5.16), which is composed of a set of fuzzy rules (the IF terminal) and also of a special kind of rules, called background rules (the BACKGROUND terminal). A background rule represents a fuzzy rule whose antecedent always evaluates to the same fixed degree of membership, that of the parameter. These rules are useful to cover undefined input space. This is typically achieved by specifying a very low degree of membership, which dominates the output if no other rule fires and otherwise affects the output without any significance. Background rules play an important role when used in conjunction with automated learning systems, because these methods usually tend to leave input space regions uncovered.
In the current implementation, the t-norm and t-conorm operators can only be changed globally for all the fuzzy rule bases. The examples in this thesis assume that the min and the max operators are used as t-norm and t-conorm, except where explicitly indicated. Finally, the centre of gravity method is used for defuzzification.

    rules    ::= RULES “{” (rule)+ “}”
    rule     ::= IF “(” fexpression “)” postlist “;”
               | BACKGROUND “(” expression “)” postcond “;”
    postlist ::= postcond ( “,” postcond )*
    postcond ::= ID
               | ID IS ID
               | ID IS ftype fuzzy
               | ID IS number

Listing 5.16: BNF for fuzzy rule bases.

For instance, if a robot has to maintain a given distance to a wall on its right, the following BG code performs the task (Listing 5.17). Right is the input variable that measures the distance from the robot to the wall, which can be small (fuzzy sets CLOSE and NEAR), medium (fuzzy set MED), or large (fuzzy set FAR). Turn is the output variable that controls whether the robot should steer to the left (fuzzy sets TSL and TL), to the right (fuzzy sets TSR and TR), or should continue straight ahead (fuzzy set TN).

// Sets for ToF sensor range [0 .. 10] (metres)
set CLOSE = trapezoid {0.0, 0.0, 0.17, 0.2};
set NEAR  = trapezoid {0.17, 0.2, 0.3, 0.4};
set MED   = trapezoid {0.3, 0.4, 1.3, 1.4};
set FAR   = trapezoid {1.3, 1.4, 10.0, 10.0};

// Sets for steering [-10 .. 10]
set TL  = trapezoid {-10.0, -9.0, -6.0, -5.0};  // Left
set TSL = trapezoid {-6.0, -5.0, -2.0, -1.0};   // Small left
set TN  = trapezoid {-2.0, -1.0, 1.0, 2.0};     // None
set TSR = trapezoid {1.0, 2.0, 5.0, 6.0};       // Small right
set TR  = trapezoid {5.0, 6.0, 9.0, 10.0};      // Right

rules {
    if (right is FAR)   turn is TSR;   // Turn small right
    if (right is MED)   turn is TN;    // No turn
    if (right is NEAR)  turn is TSL;   // Turn small left
    if (right is CLOSE) turn is TL;    // Turn left
}

Listing 5.17: Example of a fuzzy rule base.
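The semantics of such a rule base (min as t-norm, max for rule aggregation, centre-of-gravity defuzzification) can be sketched in Python. The set parameters below are those of Listing 5.17; the evaluator itself is an illustrative sketch over a sampled output universe, not the BG interpreter:

```python
def trap(x, a, b, c, d):
    # trapezoidal membership [a, b, c, d]
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

RIGHT = {"CLOSE": (0.0, 0.0, 0.17, 0.2), "NEAR": (0.17, 0.2, 0.3, 0.4),
         "MED": (0.3, 0.4, 1.3, 1.4),    "FAR": (1.3, 1.4, 10.0, 10.0)}
TURN  = {"TL": (-10.0, -9.0, -6.0, -5.0), "TSL": (-6.0, -5.0, -2.0, -1.0),
         "TN": (-2.0, -1.0, 1.0, 2.0),    "TSR": (1.0, 2.0, 5.0, 6.0)}
# (antecedent set, consequent set) pairs from the wall-following rules
RULES = [("FAR", "TSR"), ("MED", "TN"), ("NEAR", "TSL"), ("CLOSE", "TL")]

def turn_for(right):
    num = den = 0.0
    for i in range(-1000, 1001):        # sample the steering universe [-10, 10]
        y = i / 100.0
        # aggregate with max over rules of min(firing degree, consequent)
        mu = max(min(trap(right, *RIGHT[a]), trap(y, *TURN[c]))
                 for a, c in RULES)
        num += mu * y
        den += mu
    return num / den if den else 0.0    # centre of gravity
```

As expected, a far wall (right = 2.0 m) yields a positive (rightward) steering value and a close wall a negative one, while the MED distance band keeps the output near zero.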
5.3.2.5 Robot navigation support

The BG language provides a basic structure and methods to manage two-dimensional spatial information for constructing maps and for path planning. The grid non-terminal corresponds to a grid map structure (Listing 5.18) that consists of a series of square cells arranged in an N by M matrix. Each cell has two values associated: the degree of certainty that the cell is empty, eij, and the degree of certainty that the cell is occupied, wij. These values are calculated and fused for each cell in the beam of a single sensor to produce the final M(i,j) values that are used for navigation purposes. A detailed description of the map-building and path-planning processes is given in chapter 4. When a grid is initialised, the user provides some parameters: the horizontal (N) and vertical (M) size of the grid map (the bottom-left cell serves as the origin of coordinates), the length in metres of the cell sides, and the path-planning method. Currently BG supports two types of path-planning methods, which produce as a result the absolute heading that the robot should follow to reach the goal:

• A*. An A* algorithm uses the M(i,j) values of the grid to search for the shortest path from the current robot position to the goal. This process is time consuming and obtains the full collision-free path to the goal. This mode is used when the A_STAR terminal is specified.
• Direct. This method calculates the heading to the goal from the current robot position without taking the grid map into account, which is useful to implement completely reactive goal-seeking behaviours. This process is very fast and merely obtains the heading to the goal. This mode is used when the DIRECT terminal is specified.

Each time the grid structure is processed, it can be updated with the following information:

• The current absolute location (x, y) of the robot, using the LOCATION grid command (Listing 5.18).
• The current absolute location (x, y) of the goal place, using the GOALPOS grid command (Listing 5.18).
• The absolute location (x, y) of a possible obstacle, using the OBSTACLE grid command (Listing 5.18).
• A relative sensor reading (x, y, sensor, sensor_value) from the given absolute location, using the UPDATE grid command (Listing 5.18).

    grid     ::= GRID gmode UPDATES ID gbody
    gmode    ::= A_STAR | DIRECT
    gbody    ::= CELLS number BY number SIDE number gblock
               | CELLS number SIDE number gblock
               | gblock
    gblock   ::= “{” (gcommand)+ “}”
    gcoord   ::= ID “,” ID
    gcommand ::= IF “(” expression “)” gaction
               | gaction
    gaction  ::= LOCATION “(” gcoord “)” “;”
               | GOALPOS “(” gcoord “)” “;”
               | OBSTACLE “(” gcoord “)” “;”
               | UPDATE “(” gcoord “,” ID “,” number “,” ID “)” “;”

Listing 5.18: BNF for grid map structures.

An example of a grid map generated in a robot run is shown below (Figure 5.19). A circular robot navigates in an indoor office-like floor plan. It has ten sonar range sensors (sonar0, …, sonar9) at different angles, each one identified by a number (0, …, 9). In each control loop cycle, the current robot location and the current goal are updated. If a sonar reading is less than a given threshold (5 metres) the grid is updated with the reading; otherwise it is discarded. Finally, the variable heading is updated using the output of the A* path-planner and the values of the cells that form the map are updated (Listing 5.19). It is interesting to note here that “a_star” refers to the path-planning method, “updates heading” refers to the variable that is updated with the result of the path-planning, and “update ()” refers to the command to update the grid map.

Figure 5.19: Example of a BG grid map and the path obtained.
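The contract of the planning modes above is worth making concrete: the grid search yields a path, but the value written into the updated variable is only the heading of the first step. The sketch below illustrates this on a toy occupancy grid, using breadth-first search in place of A* for brevity; the function and its signature are hypothetical, not thesis code:

```python
import math
from collections import deque

def plan_heading(blocked, n, m, start, goal):
    """Heading (radians) of the first step on a shortest 4-connected path
    from start to goal over an n x m grid, or None if unreachable."""
    prev = {start: None}
    q = deque([start])
    while q:
        x, y = q.popleft()
        if (x, y) == goal:
            break
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < n and 0 <= ny < m
                    and (nx, ny) not in prev and (nx, ny) not in blocked):
                prev[(nx, ny)] = (x, y)
                q.append((nx, ny))
    if goal not in prev:
        return None
    cell = goal                      # walk back to the first step after start
    while prev[cell] != start:
        cell = prev[cell]
    return math.atan2(cell[1] - start[1], cell[0] - start[0])
```

With an obstacle directly between start and goal, the returned heading points around it rather than straight at the goal, which is exactly how the A_STAR mode differs from the DIRECT mode.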
grid a_star updates heading cells 71.0 by 71.0 side 0.05 {
    location (x, y);
    goalpos (gx, gy);

    if (sonar0 < 5.0) update (x, y, alpha, 0, sonar0);
    if (sonar1 < 5.0) update (x, y, alpha, 1, sonar1);
    if (sonar2 < 5.0) update (x, y, alpha, 2, sonar2);
    if (sonar3 < 5.0) update (x, y, alpha, 3, sonar3);
    if (sonar4 < 5.0) update (x, y, alpha, 4, sonar4);
    if (sonar5 < 5.0) update (x, y, alpha, 5, sonar5);
    if (sonar6 < 5.0) update (x, y, alpha, 6, sonar6);
    if (sonar7 < 5.0) update (x, y, alpha, 7, sonar7);
    if (sonar8 < 5.0) update (x, y, alpha, 8, sonar8);
    if (sonar9 < 5.0) update (x, y, alpha, 9, sonar9);
}

Listing 5.19: Example of grid-based processing.

5.3.2.6 Agents, behaviours and tasks definition

BG was designed to serve as the development language of BGA architectures, and thus it supports the definition of the different elements of a BGA architecture: agents, tasks, behaviours and blenders. The agent non-terminal corresponds to an agent (Listing 5.20), which can be reactive or deliberative. The former is specified using the REACTIVE keyword. A deliberative agent is declared by specifying an optional common block (the common non-terminal) and a series of tasks (the task non-terminal). A reactive agent is slightly more complex. It is declared by specifying an optional common block (the common non-terminal), a series of behaviours (the behaviour non-terminal), and an optional blender block (the blender non-terminal). All the different processing blocks (namely common, task, behaviour, and blender blocks) have in common that they can make use of local real variables (local fuzzy sets are not supported), and these can be used to store values that are not used elsewhere along a BG program. In addition, behaviours can optionally specify a weight, which serves in the BGA behaviour fusion procedure (see previous section). If the weight is not specified, it is set to 1.0 by default.
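The role of these weights in the fusion can be illustrated with a small sketch: each behaviour proposes a command, the blender yields an activation degree per behaviour, and the result is the activation- and weight-normalised average of the proposals. The function below is a hedged illustration of this idea, not the BGA implementation:

```python
def blend(proposals, activations, weights):
    """Weighted fusion sketch: dicts keyed by behaviour name.
    proposals   - command value each behaviour suggests
    activations - degree of activation produced by the blender rules
    weights     - the static per-behaviour weights (default 1.0 in BG)"""
    num = sum(proposals[b] * activations[b] * weights[b] for b in proposals)
    den = sum(activations[b] * weights[b] for b in proposals)
    return num / den if den else 0.0

# Toy figures (not from the thesis): a strongly activated avoid behaviour
# dominates two weakly activated alignment behaviours.
turn = blend({"avoid": 8.0, "alignL": -2.0, "alignR": 0.0},
             {"avoid": 0.9, "alignL": 0.1, "alignR": 0.01},
             {"avoid": 1.0, "alignL": 0.5, "alignR": 0.5})
```

Under this scheme a behaviour with a low weight (such as the 0.5 of the alignment behaviours in the next example) needs a proportionally higher activation to influence the fused command.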
When using a BG program in conjunction with learning techniques to develop a certain behaviour blender, reactive agents can specify what input variables will be used inside the blender block, along with their respective ranges. This way the learning method can limit the search space (see chapter 7 for an in-depth discussion). The blendl non-terminal corresponds to a list of real variables that are used in the blender block.

    agent     ::= AGENT ID REACTIVE agbody
              | AGENT ID “{” common (task)+ “}”
    agbody    ::= “{” blendl common (behaviour)+ blender “}”
    blendl    ::= BLENDING blenv ( “,” blenv )* “;”
    blenv     ::= ID RANGE “(” number “,” number “)”
              | ID
    local     ::= ( FLOAT varlist “;” )?
    common    ::= ( COMMON “{” local (command)+ “}” )?
    blender   ::= ( BLENDER “{” local (command)+ “}” )?
    behaviour ::= BEHAVIOUR ID ( WEIGHT number )? behbody
    behbody   ::= “{” local (command)+ “}”
    task      ::= TASK ID “{” local (command)+ “}”

Listing 5.20: BNF for agent definition.

A simple reactive agent definition is shown below (Listing 5.21). The agent has three behaviours: one which avoids obstacles (avoid), one which aligns the robot to the left wall (alignL), and one which aligns the robot to the right wall (alignR). These behaviours, based on sensory data (left, right, and front, the input variables that represent the distance to objects on the left, right and front respectively), propose control commands (turn and speed). The blender block specifies how the output of these behaviours is fused in each time step (Figure 5.20), based on its sensory input (left, right, and front). The example shows how these elements are specified using BG. The agent is specified to be reactive (“reactive” keyword), and thus it is composed of a common module, some behaviours and a blender module. The actual implementation of the behaviours and the common module is not shown in the example. The blender simply contains a fuzzy rule base (“rules” keyword), where the consequents are the behaviours previously mentioned.
The declaration of the fuzzy sets is not shown; they should be declared before the agent declaration. The fuzzy sets that appear in the antecedents are similar to those of previous examples (Listing 5.17), while the fuzzy sets that appear in the consequents (LOW, HALF, and HIGH) represent low, medium and high degrees of activation.

Figure 5.20: Result of the behaviour fusion.

agent simpleControl reactive {
    blending left, right, front;

    common                      { /* Code goes here */ }
    behaviour avoid  weight 1.0 { /* Code goes here */ }
    behaviour alignL weight 0.5 { /* Code goes here */ }
    behaviour alignR weight 0.5 { /* Code goes here */ }

    blender {
        rules {
            background (0.01) avoid  is LOW;
            background (0.01) alignR is LOW;
            background (0.01) alignL is LOW;

            if (left  is CLOSE) avoid is HIGH;
            if (front is CLOSE) avoid is HIGH;
            if (right is CLOSE) avoid is HIGH;

            if (left  is NEAR)  avoid is HALF;
            if (front is NEAR)  avoid is HALF;
            if (right is NEAR)  avoid is HALF;

            if (left  is MED)   alignL is HALF;
            if (right is MED)   alignR is HALF;

            if (left  is FAR)   alignL is HIGH;
            if (right is FAR)   alignR is HIGH;
        }
    }
}

Listing 5.21: Example of a reactive agent definition.

5.3.2.7 BG programs definition

A BG program is simply a collection of agents and variables. The program non-terminal corresponds to a valid BG program (Listing 5.22), composed of a variable declaration section, a function declaration section, an initialisation block and an agent declaration section. The initialisation block (the initializ non-terminal) is executed only once, when the BG program is started, and it is intended to perform whatever initialisation is needed.

    program   ::= (vdecl)* (funcdecl)* initializ (agent)+
    initializ ::= ( INITIALIZATION “{” (command)+ “}” )?

Listing 5.22: BNF for BG programs.

5.3.3 The BGen development environment

The BG language interpreter has been developed using the Java language.
The BG parser has been produced using the Java CUP parser generator (http://www.cs.princeton.edu/~appel/modern/java/CUP). Java CUP is a system for generating LALR parsers from a grammar specification. It serves the same role as the widely used program YACC (Johnson, 1975), which uses C instead of Java. The BG interpreter is composed of two modules: the BG parser and the BG executor. The BG parser takes as input a file containing a BG program and then proceeds with the lexical analysis (using a custom-made finite state lexical parser) and the syntactical analysis (using the CUP parser). As the BG language is simple, the semantic analysis (basically just type checking) is performed simultaneously with the syntactical analysis. When a valid BG program has been parsed, the module produces intermediate code for later execution. The BG executor takes the intermediate code as input, makes no additional checking, and interprets the code (working in fact as a transliterator). The intermediate code is stored as a directed graph, where the nodes correspond to different elements of the BG language (variables, expressions, commands, behaviours, tasks and agents). This way, the time-consuming part of the interpretation process (the lexical and syntactical parsing) is decoupled from the execution part for efficiency reasons. Although the BG executor is programmed using Java, which in fact is also an interpreted language, BG programs can be executed successfully on i486 processors. The BG language interpreter has been incorporated into a development environment, called BGen, which also integrates a simulator for different customised holonomic robots, data visualisation and recording tools, and a behaviour learning tool. The BGen environment is intended not only for development but also for deployment. In the development stage the user defines and specifies a BG program that reflects a particular BGA architecture and then simulates the overall behaviour of the robot to test the feasibility of the solution.
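The parse-once / execute-many split described above can be illustrated with a toy node graph: expressions are built once into linked nodes, and each control cycle merely walks the graph without re-parsing. This is an illustrative sketch, not the BG intermediate code (node class names are invented):

```python
# Toy intermediate-code graph for expressions (illustrative only).
class Num:
    def __init__(self, v): self.v = v
    def eval(self, env): return self.v

class Var:
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]   # blackboard-like lookup

class BinOp:
    OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def eval(self, env):
        # walk the graph: no lexing or parsing happens at execution time
        return self.OPS[self.op](self.left.eval(env), self.right.eval(env))

# "speed * 2 + 1" parsed once into a graph, then evaluated every cycle:
expr = BinOp("+", BinOp("*", Var("speed"), Num(2.0)), Num(1.0))
```

Re-evaluating the same graph against fresh variable values each cycle is what keeps the per-cycle cost low enough for modest processors, the property the text attributes to this design.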
Once the BG program is ready for deployment, it is transferred to the real robot, which contains the BGen environment but uses a robot interface module instead of the simulator. This is possible because all the software has been developed in Java, and it also implies that the real robot must include a Java virtual machine, which is available for most common operating systems. The robots used in this thesis to test the BGen environment run the Linux operating system. This approach for development and deployment presents some advantages:

• Both the development and deployment environments are identical.
• The development to deployment time is considerably reduced.
• Different platforms can reuse the code, except for the lowest-level functions.

Figure 5.21: BGen edition and component testing modules.

BGen consists of different modules, the most important of which are described below. The edition module (Figure 5.21) contains a simple text editor and some tools to aid in the process of writing BG programs. One of these tools presents the structure of a BG program graphically, and another serves to test functions, individual behaviours or complete agents by computing their output when input variables vary inside a given range. In the example, the “Controller Graph” window displays the controller as a tree, where the root is the controller itself. The first level corresponds to high-level module categories, namely functions and agents. The second level in the hierarchy corresponds to the current elements inside these categories. In this case there are four functions (min, max, sgn, and limits) and four agents (VehicleControl, Localization, Planning, and ReactiveControl). The third level corresponds to the different sub-elements of the elements that appear in the second level. In this case they are the leaf nodes of the agents.
The “Component Test Options” window lets the user select a component, two variables and a range of values. Using these values the component is evaluated and the corresponding control surface is displayed in the “Component Test Output” window.

Figure 5.22: BGen simulation module.

The robot simulation module (Figure 5.22) simulates holonomic mobile robots and executes BG programs. It graphically displays the robot’s behaviour in a given environment, the map being constructed by the robot, and also the current behaviour activation. It additionally shows the values of the robot’s sensors, the robot’s path, the number of collisions and epochs, and the shortest path from the robot to the goal. In the example, the “Robot Simulation Environment” window lets the user specify the parameters of the simulation, that is, the actual map of the environment, the robotic platform being simulated, and the BG control program. The “Simulated World” window displays the robot, the map, and the robot’s sensor values. The “Behaviour Blending” window displays the degree of activation of the different behaviours implemented in the BG program at each time step. The “Occupancy Grid” window has two tabs: one that displays the grid map generated by the BG program and the path-planning result, with the real map superimposed, and another that displays the 3D cost function used by the path-planning method specified in the BG program. Additionally, windows that display the values of the effector variables of the BG program can also be activated and shown.

Figure 5.23: BGen sensor simulation and map-building module.

The sensor simulation and map-building module (Figure 5.23) simulates sonar and infrared sensors, and serves as a testbed for different data fusion algorithms and map-building techniques. In the example, the “Sensor Simulation Environment” window lets the user specify the map, the robot, and the path that the robot will follow.
It also allows the selection of parameters related to the simulation: the simulation method used for the sensors, the number and type of sensors to be simulated, the filters to be applied to the sensors, and the sensor fusion method that is used. When a simulation is run, the module displays the values of the sensors, the path of the robot, some performance figures (for instance, the root mean square error of the sensor fusion method), and optionally the internal data used to simulate sensors (for instance, the sound paths in the case of sonar sensors). The “Occupancy Grid” window displays the grid map generated along the robot’s path, and lets the user change the way cells are updated.

5.3.4 Conclusions

BG is a high-level robot programming language that fully supports the BGA architecture and thus allows the user to develop BGA-based control systems in a friendly but powerful way. BG aids in the development cycle of robotics applications by providing a single uniform framework to declare and specify all the elements of the BGA architecture (behaviours, tasks, blenders and agents), and by supporting behaviour modelling via extensive use of fuzzy logic. In addition, a fully distributed BGA architecture can be achieved by running BG programs on different processors. The BGen development environment incorporates a BG language interpreter, and integrates a simulator for different customised holonomic robots, data visualisation and recording tools, and a behaviour learning tool. The BGen environment is intended both for development and deployment, and provides a simple, generic way to develop and test BG programs off-robot, and to execute such programs on a real robot. Thus, the development to deployment time is considerably reduced, and different platforms can reuse the code.
Chapter 6
A Sample Agent Architecture

6.1 Introduction

Once an architecture (BGA), a programming language (BG) and a development environment (BGen) are available, the natural step is to use them in an application with the Quaky-Ant robot (see chapter 2). The robot’s task is to navigate from a given home position to a goal location, and then return to the initial position, in a completely unknown environment similar to a real indoor floor plan (with different rooms and corridors). The coordinates of the two places are known beforehand. In addition, the robot should neither collide nor get frozen (or get stuck in an infinite loop). To accomplish this goal the robot must be able to perform certain tasks: navigation with obstacle avoidance, goal seeking, map building and localisation, some form of high-level planning, etc. The whole system has been decomposed, by extracting the most important elements (Armstrong et al., 1998), into the following series of agents (Figure 6.1), which will be described in the next subsections:
• Reactive Control. This is the BGA reactive agent that is in charge of reactive navigation. It implements a series of behaviours to avoid collisions, follow a given path, align to the walls, and escape from U-shaped obstacles.
• Planning. This is the BGA deliberative agent that is in charge of high-level planning and global goal supervision.
• Navigation. This is the BGA deliberative agent that is in charge of building a model of the environment, locating the robot in it, and generating feasible paths to reach the goal.
• Sensor Processing and Actuator Control. These are the BGA elements that are in charge of processing raw sensor data and controlling the motors. The former accesses raw sensor signals, filters and processes them, and puts relevant data on the blackboard. The latter reads motor commands from the blackboard and sends the corresponding control actions to the motors.
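The blackboard that these agents share can be pictured as a thread-safe key-value store: the sensor processing module writes to it and the agents read from it. The class below is a minimal illustrative sketch in Python, not BGen code; only the variable names come from this chapter:

```python
import threading

class Blackboard:
    """Minimal shared store: sensor modules write, agents read."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def write(self, name, value):
        with self._lock:
            self._data[name] = value

    def read(self, name, default=0.0):
        with self._lock:
            return self._data.get(name, default)

bb = Blackboard()
# The sensor processing module publishes fused ranges and odometry...
bb.write("virtu0", 2.35)
bb.write("alpha", 0.12)
bb.write("ingoal", 0.0)
# ...and any agent can read them on its next control cycle.
heading_now = bb.read("alpha")
```

Decoupling producers and consumers this way is what lets the reactive and deliberative agents run at different rates against the same data.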
Figure 6.1: Sample agent architecture.

6.2 Sensor processing and actuator control

The sensor processing module accesses raw sensor signals, filters and processes them, and puts relevant data on the blackboard. Most parts of this module are platform dependent, because they communicate with the hardware platform, although they have been encapsulated to provide a uniform interface across different platforms. The basic functionalities of this module are the following:
• Sonar and infrared range sensor data acquisition and fusion.
• Encoder data sampling and processing.
• Collision detection.
• Absolute positioning and compass sensor data acquisition.

The sensor fusion method, which is platform independent and user selectable, is based on a set of fuzzy rules that combine information from the sonar and infrared sensors to produce a feasible range. This method has been described in depth in chapter 3. The sensor processing module continuously writes the following information on the blackboard:
• Range sensors: information provided by the fusion method (virtu0 – virtu9).
• Collision sensors: information provided by the touch sensors (bump0 – bump4).
• Odometry: information provided by the encoders (x, y, alpha).
• Absolute position: information provided by the GPS (gps_x, gps_y).
• Absolute heading: information provided by the compass (compass).
• Goal detection: flag to indicate that the goal (gx, gy) has been reached (ingoal).

In the current example, the collision sensors, the compass and the GPS are not used, but they are in other applications (mostly that presented in chapter 8). The actuator control module reads motor commands from the blackboard and sends the corresponding control actions to the motors. As occurs with the sensor processing module, most parts of this module are platform dependent and have been encapsulated as well.
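The actuator control module's job, turning the decoupled command that the next paragraphs formalise (Eqs. 6.1–6.4) into differential motor powers, can be checked with a short sketch. This is an illustrative Python translation, not part of the BG runtime:

```python
def motor_commands(speed, turn, km=10.0):
    """Map decoupled <speed, turn> to differential <motL, motR>.

    turn in [-10, 10], speed in [-1, 1], and the resulting
    motL/motR in [-100, 100] (Eqs. 6.1-6.4)."""
    ml = 2.0 * turn + 10.0 if turn < 0 else 10.0    # Eq. 6.1
    mr = -2.0 * turn + 10.0 if turn > 0 else 10.0   # Eq. 6.2
    return ml * speed * km, mr * speed * km         # Eqs. 6.3, 6.4

# turn = 0 drives both motors equally; turn = +/-10 pivots in place.
print(motor_commands(1.0, 0.0))    # (100.0, 100.0)
print(motor_commands(1.0, 10.0))   # (100.0, -100.0)
```

Note how a positive turn only slows (and eventually reverses) the right motor, which is what makes the mapping intuitive for programmers.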
The Quaky-Ant robot presents differential steering, and thus the motor module accepts commands of the form <motL, motR>, where motL and motR correspond to the power applied to the left and right motors respectively. To allow for simple and intuitive programs, the tuple <speed, turn> is used to decouple the robot’s speed and steering angle, instead of using motL and motR directly. The reason for decoupling these variables is that developers usually feel more comfortable with decoupled speed and turning rate than with differential motor commands. These four variables are related by the following equations:

ml = 2 ⋅ turn + 10, if turn < 0;  ml = 10, if turn ≥ 0   (6.1)

mr = 10, if turn ≤ 0;  mr = (−2) ⋅ turn + 10, if turn > 0   (6.2)

motL = ml ⋅ speed ⋅ km   (6.3)

motR = mr ⋅ speed ⋅ km   (6.4)

The turn variable operates in the [-10, 10] range, the speed variable operates in the [-1, 1] range, and the motL and motR variables operate in the [-100, 100] range. The ml (Eq. 6.1) and mr (Eq. 6.2) variables, which operate in the [-10, 10] range, translate the turn rotation into a percentage of the desired maximum speed. ml and mr are later combined into motL (Eq. 6.3) and motR (Eq. 6.4), scaling the result by km (in this case km = 10).

6.3 Reactive control

This is the most important and crucial agent because it implements the basic reactive behaviours and has the responsibility of maintaining the physical integrity of the robot by avoiding collisions. The reactive behaviours make extensive use of fuzzy logic in the form of fuzzy rule bases, which have been obtained both from previous experience and from fuzzy modelling techniques (Gómez-Skarmeta and Martínez-Barberá, 1997)(Gómez-Skarmeta et al., 1999c). The following reactive behaviours (Figure 6.2) have been identified and implemented:
• Avoid-left (avoidL): avoids an obstacle located on its left.
• Avoid-front (avoidF): avoids an obstacle located on its front.
• Avoid-right (avoidR): avoids an obstacle located on its right.
• Align-right (alignR): holds the robot at a given distance from the wall on its right.
• Align-left (alignL): holds the robot at a given distance from the wall on its left.
• Move-to-goal (toGoal): drives the robot in the direction of the current goal.
• Escape (escape): turns the robot randomly in order to escape from a U-shaped wall.

These behaviours and the different modules that compose the reactive control agent are described in the following subsections. Special attention is paid to showing the different fuzzy rule bases, the fuzzy sets that appear in the rule bases, and the different variables that play an active role in the control of the robot (both those from the blackboard and those from the LPS).

Figure 6.2: Reactive control agent (blackboard variables heading, turn, alpha, speed and virtui; LPS features uobject, left, leftd, front, rightd and right; behaviours toGoal, alignL, alignR, avoidL, avoidF, avoidR and escape; common block and blender).

6.3.1 Common block

The common block reads sensory information from the blackboard, processes it, and then updates the LPS, which is used later on by the different behaviours. In the current example the LPS (Figure 6.2) contains the following features extracted from the environment: uobject, left, leftd, front, rightd, and right. These features account for the presence of objects in different directions, and are defined as follows:
• On-front (front): indicates the presence of an object at the front (Eq. 6.5).
• On-left (left): indicates the presence of an object on the left (Eq. 6.6).
• On-right (right): indicates the presence of an object on the right (Eq. 6.7).
• On-diagonal-left (leftd): indicates the presence of an object between the front and the left (Eq. 6.8).
• On-diagonal-right (rightd): indicates the presence of an object between the front and the right (Eq. 6.9).
• U-shaped-wall (uobject): indicates whether the robot is in a U-shaped wall (Listing 6.1).
Most of these features are computed based on the distances provided by the virtual sensors, which fuse data from infrared and sonar sensors located around the robot’s perimeter (Figure 6.3). These features do not correspond directly to real measures but to reference distances, which are platform dependent. The features are, in fact, the result of a simple and heuristic sensor fusion method. Thus, the features combine information from different sensors to reduce the amount of sensory information provided to the behaviours. In this case, the set of 10 virtual sensors (virtui) is reduced to only 5 features (left, leftd, front, rightd, and right). The coefficients that appear in the fusion equations (Eqs. 6.5, 6.6, 6.7, 6.8, and 6.9) have been obtained by trial and error, and the underlying heuristic reflects how important each sensor is for the corresponding feature.

Figure 6.3: Combination of Quaky-Ant’s range sensors.

front = min{virtu3, virtu4}   (6.5)

left = (virtu9 + virtu0) ⋅ 0.5   (6.6)

right = (virtu7 + virtu8) ⋅ 0.5   (6.7)

leftd = 0.45 ⋅ min{virtu5, 0.75} + 0.45 ⋅ min{virtu6, 0.75} + 0.1 ⋅ min{virtu7, 0.75}   (6.8)

rightd = 0.45 ⋅ min{virtu2, 0.75} + 0.45 ⋅ min{virtu1, 0.75} + 0.1 ⋅ min{virtu0, 0.75}   (6.9)

A different case is the uobject feature. It is computed using a series of fuzzy rules (Listing 6.1), in which the antecedents are previously computed features (leftd, front, and rightd). These rules use fuzzy sets (Figure 6.4) that are also used by the behaviours. The interpretation of the uobject feature is that the higher the value, the more likely it is that the robot is in a U-shaped obstacle.
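The fusion equations above translate directly into code. The sketch below is illustrative Python (not BG), taking the ten fused virtual sensor readings as a list:

```python
def extract_features(virtu):
    """Reduce 10 virtual sensor ranges to the 5 LPS distance
    features of Eqs. 6.5-6.9 (virtu is a list of 10 ranges)."""
    front = min(virtu[3], virtu[4])                       # Eq. 6.5
    left = (virtu[9] + virtu[0]) * 0.5                    # Eq. 6.6
    right = (virtu[7] + virtu[8]) * 0.5                   # Eq. 6.7
    leftd = (0.45 * min(virtu[5], 0.75)                   # Eq. 6.8
             + 0.45 * min(virtu[6], 0.75)
             + 0.10 * min(virtu[7], 0.75))
    rightd = (0.45 * min(virtu[2], 0.75)                  # Eq. 6.9
              + 0.45 * min(virtu[1], 0.75)
              + 0.10 * min(virtu[0], 0.75))
    return left, leftd, front, rightd, right

# With every reading far (10 m), the diagonal features saturate
# at 0.75 because of the min() clamps.
print(extract_features([10.0] * 10))
```

The clamping at 0.75 m means the diagonal features only discriminate inside the safety area, which is exactly where the avoidance behaviours need resolution.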
rules {
    background (0.01) uobject is 0.0;

    if ((front is CLOSE) && ((rightd is CLOSE) || (leftd is CLOSE))) uobject is 1.0;
    if ((front is NEAR) && ((rightd is CLOSE) || (leftd is CLOSE))) uobject is 0.7;
    if ((front is CLOSE) && ((rightd is NEAR) || (leftd is NEAR))) uobject is 1.0;
    if ((front is NEAR) && ((rightd is NEAR) || (leftd is NEAR))) uobject is 0.7;
}

Listing 6.1: U-shaped obstacle recognition.

In addition to the LPS, the different behaviours also use the following variables extracted from the blackboard (Figure 6.2): alpha is the robot’s current global heading, heading is the desired robot heading, speed is the desired linear speed, and turn is the desired rate of turn. A deliberative agent sets the heading variable, while the sensor processing and actuator control module sets or updates the others.

6.3.2 Obstacle avoidance behaviours

Most of the reactive behaviours have been defined using expressive fuzzy logic controllers (as is the case of the obstacle avoidance behaviours), which, for instance, use fuzzy sets for range measures, speed control, and turning rate control. Five fuzzy sets (CLOSE, NEAR, MED, FAR, and VFAR) have been defined for range measures (Figure 6.4). These sets are not uniformly distributed along the sensor range [0, 10], and the safety area (the area where an object might be dangerous for the robot) is covered by CLOSE, NEAR, and MED. Three fuzzy sets (SLOW, SMEDIUM, and SFULL) have been defined for speed setting (Figure 6.5). These sets are almost uniformly distributed along the speed range [0, 1]. Seven fuzzy sets (TTL, TL, TSL, TC, TSR, TR, TTR) have been defined for turning rate setting (Figure 6.6). These sets are almost uniformly distributed along the turning rate range [-10, 10], except TTL and TTR, which are outside the actuator range.
Although the actuator control module forces values into range, these out-of-range values are useful when a behaviour needs to stress that a tight turn is really needed. Thus, higher values are used in the inference process and it is more likely that higher values will be obtained as a result.

Figure 6.4: Fuzzy sets for range sensors (CLOSE, NEAR, MED, FAR and VFAR, with breakpoints 0.0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 1.3, 1.4 and 10.0).

Figure 6.5: Fuzzy sets for speed setting (SLOW, SMEDIUM and SFULL, with breakpoints 0.0, 0.3, 0.4, 0.6, 0.7 and 1.0).

Figure 6.6: Fuzzy sets for turning rate setting (TTL, TL, TSL, TC, TSR, TR and TTR, with breakpoints -15, -10, -9, -6, -5, -2, -1, 1, 2, 5, 6, 9, 10 and 15).

The obstacle avoidance behaviours are intended to avoid collisions with obstacles that are in the vicinity of the robot. There are four obstacle avoidance behaviours (Figure 6.2):
• Avoid-left (avoidL): drives the robot away from an obstacle on the left.
• Avoid-front (avoidF): drives the robot away from an obstacle at the front.
• Avoid-right (avoidR): drives the robot away from an obstacle on the right.
• Escape (escape): drives the robot out of a U-shaped obstacle.

The avoidL behaviour (Listing 6.2) drives the robot to the right when it encounters an obstacle on the left. The closer the obstacle, the tighter the turn to the right. When the obstacle is close enough, the translation speed is set to a small value. This behaviour is crucial when crossing doorways and small corners.
behaviour avoidL weight 1.0 {
    fusion turn, speed;
    rules {
        background (0.01) speed is SFULL;
        background (0.01) turn is TC;

        if ((leftd is CLOSE) && (front is CLOSE)) turn is TTR;
        if ((leftd is CLOSE) && (front is NEAR)) turn is TTR;
        if ((leftd is CLOSE) && (front is MED)) turn is TTR;
        if ((leftd is CLOSE) && (front is FAR)) turn is TTR;
        if ((leftd is CLOSE) && (front is VFAR)) turn is TTR;

        if ((leftd is NEAR) && (front is CLOSE)) turn is TR;
        if ((leftd is NEAR) && (front is NEAR)) turn is TR;
        if ((leftd is NEAR) && (front is MED)) turn is TR;
        if ((leftd is NEAR) && (front is FAR)) turn is TSR;
        if ((leftd is NEAR) && (front is VFAR)) turn is TSR;

        if ((leftd is MED) && (front is CLOSE)) turn is TR;
        if ((leftd is MED) && (front is NEAR)) turn is TR;
        if ((leftd is MED) && (front is MED)) turn is TSR;
        if ((leftd is MED) && (front is FAR)) turn is TC;
        if ((leftd is MED) && (front is VFAR)) turn is TC;

        if ((leftd is CLOSE) || (front is CLOSE)) speed is SLOW;
        if ((leftd is NEAR) || (front is NEAR)) speed is SLOW;
        if ((leftd is MED) || (front is MED)) speed is SMEDIUM;
    }
}

Listing 6.2: Avoid-left reactive behaviour.

The avoidR behaviour (Listing 6.3) drives the robot to the left when it encounters an obstacle on the right. The closer the obstacle, the tighter the turn to the left. When the obstacle is close enough, the translation speed is set to a small value. This behaviour is crucial when crossing doorways and small corners.
behaviour avoidR weight 1.0 {
    fusion turn, speed;
    rules {
        background (0.01) speed is SFULL;
        background (0.01) turn is TC;

        if ((rightd is CLOSE) && (front is CLOSE)) turn is TTL;
        if ((rightd is CLOSE) && (front is NEAR)) turn is TTL;
        if ((rightd is CLOSE) && (front is MED)) turn is TTL;
        if ((rightd is CLOSE) && (front is FAR)) turn is TTL;
        if ((rightd is CLOSE) && (front is VFAR)) turn is TTL;

        if ((rightd is NEAR) && (front is CLOSE)) turn is TL;
        if ((rightd is NEAR) && (front is NEAR)) turn is TL;
        if ((rightd is NEAR) && (front is MED)) turn is TL;
        if ((rightd is NEAR) && (front is FAR)) turn is TSL;
        if ((rightd is NEAR) && (front is VFAR)) turn is TSL;

        if ((rightd is MED) && (front is CLOSE)) turn is TL;
        if ((rightd is MED) && (front is NEAR)) turn is TL;
        if ((rightd is MED) && (front is MED)) turn is TSL;
        if ((rightd is MED) && (front is FAR)) turn is TC;
        if ((rightd is MED) && (front is VFAR)) turn is TC;

        if ((rightd is CLOSE) || (front is CLOSE)) speed is SLOW;
        if ((rightd is NEAR) || (front is NEAR)) speed is SLOW;
        if ((rightd is MED) || (front is MED)) speed is SMEDIUM;
    }
}

Listing 6.3: Avoid-right reactive behaviour.

The avoidF behaviour (Listing 6.4) drives the robot to the left or to the right when it encounters an obstacle at the front. The closer the obstacle, the tighter the turn to the left or right. The turn is made to the left or to the right depending on which side has a greater open area. When the obstacle is close enough, the translation speed is set to a small value. This behaviour is crucial when avoiding frontal collisions.
behaviour avoidF weight 1.0 {
    fusion turn, speed;
    rules {
        background (0.01) turn is TC;
        background (0.01) speed is SFULL;

        if ((rightd is CLOSE) && (front is CLOSE) && !(leftd is CLOSE)) turn is TTL;
        if ((left is CLOSE) && (front is CLOSE) && !(right is CLOSE)) turn is TTR;
        if ((rightd is NEAR) && (front is CLOSE) && !(leftd is NEAR)) turn is TTL;
        if ((left is NEAR) && (front is CLOSE) && !(right is NEAR)) turn is TTR;
        if ((rightd is CLOSE) && (front is NEAR) && !(leftd is CLOSE)) turn is TL;
        if ((left is CLOSE) && (front is NEAR) && !(right is CLOSE)) turn is TR;
        if ((rightd is NEAR) && (front is NEAR) && !(leftd is NEAR)) turn is TL;
        if ((left is NEAR) && (front is NEAR) && !(right is NEAR)) turn is TR;

        if ((front is CLOSE) || (front is NEAR)) speed is SLOW;
    }
}

Listing 6.4: Avoid-front reactive behaviour.

The escape behaviour (Listing 6.5) drives the robot to the left or to the right when it encounters a U-shaped obstacle. The behaviour works with some hysteresis: it becomes active when the front object is at less than 0.2 metres, and changes to inactive when the front obstacle is at more than 0.3 metres. This behaviour, in fact, is not purely reactive because it has a state that accounts for the hysteresis effect. The turn is made to the left or to the right depending on which side has a greater open area. When a U-shaped obstacle is detected, the translation speed is set to a small value. This behaviour is crucial when approaching sharp concave corners.

behaviour escape weight 1.25 {
    fusion turn, speed;

    if (front > 0.3) ushape = 0.0;
    if ((front < 0.2) && (ushape == 0.0)) {
        ushape = 1.0;
        if (leftd < rightd) usense = 1.0;
        if (leftd >= rightd) usense = -1.0;
    }
    if (ushape == 1.0) {
        if (usense == 1.0) turn = 10.0;
        if (usense == -1.0) turn = -10.0;
        turn = turn * 10.0;
        speed = 0.5;
    }
}

Listing 6.5: Escape behaviour.

6.3.3 Goal reaching behaviour

The goal reaching behaviours are intended to drive the robot to the current goal.
There is only one goal seeking behaviour (Figure 6.2): toGoal. This behaviour (Listing 6.6) drives the robot to the left or to the right depending on the difference between the current heading and the desired heading (the heading required to reach the goal). The behaviour itself is purely reactive because it has no state, but the resulting behaviour might not be reactive because it uses a heading that could have been computed by a state-based method. This behaviour is important to reach the goal in an efficient way, although it is not strictly necessary because the robot could reach the goal performing an indefinite random walk.

behaviour toGoal weight 0.3 {
    fusion turn;
    float diff;

    diff = alpha - heading;
    if (diff > 3.1415) diff = diff - (2 * PI);
    if (diff < -3.1415) diff = diff + (2 * PI);

    rules {
        if (diff is GN) turn is TTL;
        if (diff is GSN) turn is TL;
        if (diff is GZ) turn is TC;
        if (diff is GSP) turn is TR;
        if (diff is GP) turn is TTR;
    }
}

Listing 6.6: Move-to-goal reactive behaviour.

Five fuzzy sets (GN, GSN, GZ, GSP, and GP) have been defined for heading differences (Figure 6.7). These sets are not uniformly distributed along the heading difference range [-3.5, 3.5].

Figure 6.7: Fuzzy sets for heading difference (GN, GSN, GZ, GSP and GP, with breakpoints -3.5, -0.87, -0.52, -0.087, 0, 0.087, 0.52, 0.87 and 3.5).

6.3.4 Alignment behaviours

The alignment behaviours are intended to maintain the robot along the centre of corridors. There are two alignment behaviours (Figure 6.2):
• Align-right (alignR): aligns the robot to the wall on its right.
• Align-left (alignL): aligns the robot to the wall on its left.

The alignR behaviour (Listing 6.8) keeps the robot at a given distance from the wall on its right. The greater the difference between the actual distance and the desired distance, the tighter the turn to the left or right. If the wall is close the behaviour drives the robot to the left, otherwise it drives the robot to the right.
When the wall is close enough, the translation speed is set to a small value. This behaviour is important when navigating along corridors.

behaviour alignR weight 0.15 {
    fusion turn, speed;
    rules {
        if (right is VFAR) turn is TR, speed is SFULL;
        if (right is FAR) turn is TSR, speed is SFULL;
        if (right is MED) turn is TC, speed is SFULL;
        if (right is NEAR) turn is TSL, speed is SMEDIUM;
        if (right is CLOSE) turn is TL, speed is SLOW;
    }
}

Listing 6.8: Align-right reactive behaviour.

The alignL behaviour (Listing 6.9) keeps the robot at a given distance from the wall on its left. The greater the difference between the actual distance and the desired distance, the tighter the turn to the left or right. If the wall is close the behaviour drives the robot to the right, otherwise it drives the robot to the left. When the wall is close enough, the translation speed is set to a small value. This behaviour is important when navigating along corridors.

behaviour alignL weight 0.15 {
    fusion turn, speed;
    rules {
        if (left is VFAR) turn is TL, speed is SFULL;
        if (left is FAR) turn is TSL, speed is SFULL;
        if (left is MED) turn is TC, speed is SFULL;
        if (left is NEAR) turn is TSR, speed is SMEDIUM;
        if (left is CLOSE) turn is TR, speed is SLOW;
    }
}

Listing 6.9: Align-left reactive behaviour.

6.3.5 Behaviour blending

The output of the different behaviours is fused by the blender block (Figure 6.2), which is defined using a fuzzy rule base. The blender (Listing 6.10) selects at each control cycle the behaviours that are more appropriate given the current sensory information, extracted from the LPS. There are three groups of rules:
• Rules for U-shaped obstacles, which modify the escape and toGoal behaviours.
• Rules for obstacle avoidance, which modify the avoidL, avoidR, avoidF and toGoal behaviours.
• Rules for wall alignment, which modify the alignL and alignR behaviours.
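Downstream of these activation rules, the commands proposed by the individual behaviours must be merged into a single <turn, speed> pair according to their activations. The operator sketched below is an assumed activation-and-weight weighted average, shown only to make the mechanism concrete; the exact fusion operator is the one defined by the BG language:

```python
def blend(commands):
    """Fuse behaviour outputs into one command.

    commands: list of (activation, weight, turn, speed) tuples,
    one per behaviour. Each command is weighted by
    activation * behaviour weight (an illustrative choice)."""
    total = sum(a * w for a, w, _, _ in commands)
    if total == 0.0:
        return 0.0, 0.0
    turn = sum(a * w * t for a, w, t, _ in commands) / total
    speed = sum(a * w * s for a, w, _, s in commands) / total
    return turn, speed

# avoidL highly active, toGoal suppressed: the fused turn leans right.
cmd = blend([(0.9, 1.0, 8.0, 0.2),    # avoidL (weight 1.0)
             (0.1, 0.3, -4.0, 0.8)])  # toGoal (weight 0.3)
print(cmd)
```

Because activations vary continuously, the fused command shifts smoothly between behaviours instead of switching abruptly, which is the point of blending over arbitration.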
The underlying idea is the following: when the robot is navigating a free area the wall alignment rules prevail, but when there are obstacles in the vicinity of the robot the obstacle avoidance rules prevail. If the obstacles are close enough, then the U-shaped obstacle rules apply. In either case, the toGoal behaviour has a default high value, so that it is active until it is explicitly deactivated, which occurs when the robot reaches any obstacle.

blender {
    rules {
        background (0.01) escape is LOW;
        background (0.01) avoidR is LOW;
        background (0.01) avoidL is LOW;
        background (0.01) avoidF is LOW;
        background (0.01) toGoal is HIGH;
        background (0.001) alignR is LOW;
        background (0.001) alignL is LOW;

        // Rules for U-shaped objects
        if (uobject is UZONE) escape is HIGH, toGoal is 0.0;

        // Rules for obstacle avoidance
        if (leftd is CLOSE) avoidL is HIGH, toGoal is LOW;
        if (front is CLOSE) avoidF is HIGH, toGoal is LOW;
        if (rightd is CLOSE) avoidR is HIGH, toGoal is LOW;

        if (leftd is NEAR) avoidL is HALFH, toGoal is HALFL;
        if (front is NEAR) avoidF is HALFH, toGoal is HALFL;
        if (rightd is NEAR) avoidR is HALFH, toGoal is HALFL;

        if (leftd is MED) avoidL is HALFL, toGoal is HALFH;
        if (front is MED) avoidF is HALFL, toGoal is HALFH;
        if (rightd is MED) avoidR is HALFL, toGoal is HALFH;

        // Rules for corridor alignment
        if (left is MED) alignL is HALFL;
        if (right is MED) alignR is HALFL;

        if (left is FAR) alignL is HALFH;
        if (right is FAR) alignR is HALFH;

        if (left is VFAR) alignL is HIGH;
        if (right is VFAR) alignR is HIGH;
    }
}

Listing 6.10: Behaviour blending rules.

Four fuzzy sets (LOW, HALFL, HALFH, and HIGH) have been defined for behaviour activation (Figure 6.7). These sets are distributed along the degree of activation range [0, 1]. In this case, a value of 1 means that the behaviour is executed at its maximum degree, while a value of 0 means that the behaviour is not executed. One fuzzy set (UZONE) has been defined for U-shaped object presence (Figure 6.8).

Figure 6.7: Fuzzy sets for behaviour activation (LOW, HALFL, HALFH and HIGH, with breakpoints 0.0, 0.1, 0.2, 0.4, 0.5, 0.7, 0.8 and 1.0).
This set uses part of the degree of presence range [0, 1]. In this case, a value of 1 means that a U-shaped object is likely to be present, while a value of 0 means that no U-shaped object is likely to be present.

Figure 6.8: Fuzzy set for U-shaped object presence (UZONE, over the range [0, 1]).

6.4 Navigation

The navigation agent is in charge of building a map of the environment, computing a path to the goal location, and keeping track of the robot’s location. To implement the map, this agent uses the grid structure provided by the BG language. This grid is updated continuously with the different sonar readings. The robot’s pose is computed relying only on odometry, under the assumption of near perfect odometry. This assumption holds true as long as the total distance of each robot run is kept short; in fact, experiments with real robots were restricted to an area of 20 by 20 metres (see also chapter 8). Only one task (map) has been defined for this agent (Figure 6.9), which is in charge of updating the grid map.

Figure 6.9: Navigation agent (blackboard variables heading, gx, gy, virtui, alpha, x and y; EPS grid; task map).

The map task (Listing 6.11) updates the grid map, which is stored in its EPS, using the different sensor measures (virtui) at a given robot pose (x, y, and alpha). Sensor readings longer than 5 metres are rejected in the map-building process, to reduce the uncertainty in the map. This task also specifies the use of the A* algorithm for path planning, and the initial (x, y) and final (gx, gy) positions of the path. Finally, the path-planning method updates the desired heading variable (heading).
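The per-reading update step of the map task can be sketched as follows. Only the 5-metre rejection threshold and the 0.075 m cell side come from the text; the additive cell update and the cell indexing used here are simplifying assumptions for illustration (the actual cell update rule is the fuzzy one described earlier in the thesis):

```python
import math

def update_cell(grid, x, y, alpha, sensor_angle, dist,
                side=0.075, max_range=5.0, hit=0.2):
    """Mark the cell at the measured range along one sensor's
    direction as more occupied; readings of 5 m or more are
    rejected, as in the map task."""
    if dist >= max_range:
        return
    cx = int((x + dist * math.cos(alpha + sensor_angle)) / side)
    cy = int((y + dist * math.sin(alpha + sensor_angle)) / side)
    grid[(cx, cy)] = min(1.0, grid.get((cx, cy), 0.0) + hit)

grid = {}
update_cell(grid, 1.0, 1.0, 0.0, 0.0, 0.6)   # obstacle 0.6 m ahead
update_cell(grid, 1.0, 1.0, 0.0, 0.0, 9.9)   # rejected: too long
print(len(grid))  # 1
```

A full implementation would also lower the occupancy of the cells traversed by the beam before the hit, which is what lets the grid recover from spurious readings.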
agent Navigation {
    task map {
        grid a_star updates heading cells 72.0 by 66.0 side 0.075 {
            location (x, y);
            goalpos (gx, gy);

            if (virtu0 < 5.0) update (x, y, alpha, 0.0, virtu0);
            if (virtu1 < 5.0) update (x, y, alpha, 1.0, virtu1);
            if (virtu2 < 5.0) update (x, y, alpha, 2.0, virtu2);
            if (virtu3 < 5.0) update (x, y, alpha, 3.0, virtu3);
            if (virtu4 < 5.0) update (x, y, alpha, 4.0, virtu4);
            if (virtu5 < 5.0) update (x, y, alpha, 5.0, virtu5);
            if (virtu6 < 5.0) update (x, y, alpha, 6.0, virtu6);
            if (virtu7 < 5.0) update (x, y, alpha, 7.0, virtu7);
            if (virtu8 < 5.0) update (x, y, alpha, 8.0, virtu8);
            if (virtu9 < 5.0) update (x, y, alpha, 9.0, virtu9);
        }
    }
}

Listing 6.11: Navigation agent.

6.5 Planning

The planning agent is in charge of supervising the task which is executed by the robot, changing the goal when a partial run has been completed, and stopping the execution when the overall task has been accomplished. To implement this simple planner, this agent uses the finite state machine structure provided by the BG language. The different states of this FSM have associated goal coordinates. These are used for path planning (navigation agent), which produces a heading that later drives the robot (reactive control agent) to the desired state’s goal. Only one task (plan) has been defined for this agent (Figure 6.10), which is in charge of managing the goal coordinates.

Figure 6.10: Planning agent (blackboard variables x, y, gx, gy and ingoal; EPS stage, home_x and home_y; task plan).

The plan task (Listing 6.12) manages information related to the robot’s goal. The task starts by updating its EPS home location (home_x, home_y) with the starting robot position (x, y). The sensor processing module provides a function that sets a blackboard variable (ingoal) when the current goal (gx, gy) is achieved. State transitions occur based on this variable.
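The plan task's state logic (Listing 6.12) can be mirrored in ordinary code. The sketch below is an illustrative Python rendering, not BG; as in BG, the active state's body executes once per control cycle, with `bb` standing in for the blackboard plus EPS:

```python
def plan_step(state, bb):
    """One control-cycle step of the plan finite state machine."""
    if state == "HOME":
        bb["home_x"], bb["home_y"] = bb["x"], bb["y"]
        bb["stage"] = 2.0
        return "LOOKING"
    if state == "LOOKING":           # wait until the goal is reached
        return "CHANGING" if bb["ingoal"] == 1.0 else "LOOKING"
    if state == "CHANGING":          # retarget the home location
        bb["gx"], bb["gy"] = bb["home_x"], bb["home_y"]
        bb["stage"] = 1.0
        return "RETURNING" if bb["ingoal"] == 0.0 else "CHANGING"
    if state == "RETURNING":         # wait until home is reached
        return "HALT" if bb["ingoal"] == 1.0 else "RETURNING"
    bb["stage"] = 0.0                # HALT: task accomplished
    return "HALT"

bb = {"x": 0.0, "y": 0.0, "ingoal": 0.0}
state = "HOME"
for flag in (0.0, 1.0, 0.0, 1.0, 1.0):   # simulated ingoal sequence
    bb["ingoal"] = flag
    state = plan_step(state, bb)
print(state, bb["stage"])  # HALT 0.0
```

Driving the sketch with a simulated ingoal sequence walks it through all five states, ending halted with stage 0.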
A counter of how many subgoals are left to accomplish the global task is also updated (stage). This is valuable information when a learning procedure is applied for behaviour learning. For a detailed discussion see chapter 7.

agent Planning {
    task plan {
        fsm start HOME {
            state HOME: {
                home_x = x;
                home_y = y;
                stage = 2.0;
                shift LOOKING;
            }
            state LOOKING: {
                if (ingoal == 1.0) shift CHANGING;
            }
            state CHANGING: {
                gx = home_x;
                gy = home_y;
                stage = 1.0;
                if (ingoal == 0.0) shift RETURNING;
            }
            state RETURNING: {
                if (ingoal == 1.0) shift HALT;
            }
            state HALT: {
                stage = 0.0;
                halt;
            }
        }
    }
}

Listing 6.12: Planning agent.

The state machine (Figure 6.11), which corresponds to the planning agent (Listing 6.12), starts in the HOME state, which sets some variables in order to establish the home and goal coordinates, and then shifts to the LOOKING state. This state is executed until the ingoal variable is set to 1.0 (the robot is at the goal location), and then it shifts to the CHANGING state. This new state is executed until the ingoal variable is set to 0.0 (the robot is no longer at the goal location). While the state is being executed, the new goal coordinates are set to those of the home location. When the robot exits the goal area, the new state is set to RETURNING. This state is executed until ingoal is set to 1.0 (the robot is at the home location), and then it shifts to the HALT state. This state simply halts the robot and indicates that the robot’s task has been achieved (stage is set to 0.0).

Figure 6.11: Planning agent’s finite state machine (HOME → LOOKING → CHANGING → RETURNING → HALT; transitions fire on the ingoal flag).

6.6 Experiments and results

Using the agent architecture explained above, a series of simulator and real robot runs have been performed to test the performance and suitability of both the BG language and the BGA architecture. This performance is not measured quantitatively, but qualitatively.
For some quantitative results see chapter 7, and for web-based experiments with a real robot see also chapter 8. All simulator experiments presented in this section have been simulated using sound-reflective walls (to emphasise the robustness of the control programs, the sonar sensors chosen are noisier in simulation than in real-world environments). The floor plans of the experiments correspond to office-like environments with narrow corridors and doorways, and these are unknown to the robot before each experiment.

6.6.1 Point-to-point navigation experiment (nav-A)

This simple experiment, named nav-A, shows a point-to-point navigation task, in which a simulated robot has to reach a goal location starting from a given initial position. No odometry errors are taken into account, and thus perfect odometry is assumed. This experiment is carried out in a complex floor plan (Figure 6.12), named “corridor”, which simulates a corridor with different rooms but on a smaller scale. The overall dimensions of this scenario are 5.2 by 4.9 metres, which forces the robot to move in a reduced environment and make sharp turns (to test its obstacle avoidance behaviours). The robot starts in the bottom-right corner and has to reach the top-left corner. The BG program for this experiment is that described in the previous section, with a minor modification in the planning agent: the LOOKING state of the planner shifts to the HALT state when the ingoal condition is true, and thus the robot finishes the task.

Figure 6.12: Corridor floor plan (goal and home positions marked).

The BGen environment is provided with the Quaky-Ant robot description, the “corridor” floor plan description (which includes the home and goal locations) and the BG program. Some snapshots of the sequence are shown below (Figure 6.13). These have been obtained by modifying the images captured from two BGen windows.
The upper part of each image shows the current behaviour activation, while the lower part shows the current measurement of the sensors, a portion of the complete "corridor" map, and the path followed by the robot. The thin triangles correspond to infrared sensors, the thick triangles correspond to sonar sensors, and the arrows correspond to the fused virtual sensor. Some snapshots of the map-building and path-planning process are also shown below (Figure 6.14). These have been obtained by capturing a BGen window. The image shows the current grid map represented as small squares of different shading (the darker the cell, the more certainty that it is occupied). The "corridor" floor plan is superimposed as thin straight lines. The shortest path to the goal is represented as a thin non-straight line. And the robot position is represented as a circle (larger than the robot).

Figure 6.13: Navigation experiment with the simulated Quaky-Ant robot (nav-A). Step 1: the robot starts. Step 2: the robot has just passed through a doorway. Step 3: the robot is in the corridor. Step 4: the robot is going through a doorway. Step 5: the robot has reached the goal.

Figure 6.14: Different views of the grid map during the nav-A experiment (steps 1 to 5).

The experiment shows that, with the BG program described above, the robot is able to reach the goal while describing a smooth path and avoiding obstacles. It is important to note here that the difficulty posed by the doorways is very real, since the width of the doorways is only slightly greater than that of the robot. In this example the reactive behaviours help the robot in the obstacle avoidance problem, while the path-planning module modifies the most direct path to the goal in order to guide the robot through zones without obstacles.
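The division of labour just described — a deliberative planner bending the direct route around cells that the grid map marks as occupied — can be illustrated with a toy A* search on a binary occupancy grid. This is an illustrative sketch only; BGen's planner operates on fuzzy grid maps and is not shown here:

```python
import heapq
import itertools

def astar(grid, start, goal):
    """A* over a 2D occupancy grid (0 = free, 1 = occupied), 4-connected.
    Returns the list of cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan heuristic
    tie = itertools.count()                       # tie-breaker for heap entries
    frontier = [(h(start), next(tie), 0, start, None)]
    parent, best_g = {}, {start: 0}
    while frontier:
        _, _, g, cur, prev = heapq.heappop(frontier)
        if cur in parent:                         # already expanded
            continue
        parent[cur] = prev
        if cur == goal:                           # walk the parent chain back
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), next(tie), g + 1, nxt, cur))
    return None
```

On a grid where a wall blocks the straight line between start and goal, the returned path detours around the occupied cells, which is the qualitative behaviour visible in Figure 6.14.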
6.6.2 Point-to-point navigation experiment (nav-B)

This experiment, named nav-B, shows a point-to-point navigation task, in which a real robot has to reach a goal location, starting from a given initial position, and then return to the initial position. The robot, Milou (Figure 6.15), is a Nomad-200 commercial robot (http://www.robots.com), which belongs to the Centre for Applied Autonomous Sensor Systems (AASS), Örebro University, Sweden, which the author visited for some months. Due to a hardware failure, the infrareds were not working properly, and thus the robot relied only on sonars. As the experiment involves only small runs, the BG program relies on odometry and does not use any localisation method. Odometry errors are therefore not taken into account.

Figure 6.15: Milou, a Nomad-200 mobile robot.

This experiment is carried out in a laboratory floor plan (Figure 6.16), named "AASS-lab-1", which is made of wooden walls. The approximate overall dimensions of this scenario are 4.2 by 7.5 metres. The robot starts in the bottom-right corner, has to reach the top-right corner, and then return to the bottom-right corner. There is no direct path between the starting and goal locations.

Figure 6.16: AASS-lab-1 floor plan.

The BG program for this experiment is similar to the one described in the previous section, but with some obvious minor modifications. The changes from the original BG program for Quaky-Ant described in the previous section can be summarised as:
• The number and orientation of the sonar sensors are different, and thus the numbering is different.
• The infrareds are not fused with the sonars. Moreover, the navigation agent uses more sensors (16 instead of 10) to update the grid map.
• Milou runs a little faster than Quaky-Ant, and thus a larger clearance area is needed.
For this reason the distance-related fuzzy sets have been shifted 15 centimetres to allow for a greater reaction time.
• The linear and rotational speeds are already decoupled in Milou's low-level controllers, and thus the BG variables turn and speed are simply scaled and passed on to Milou's controllers.

The BGen environment is provided with the Milou robot description, the home and goal locations, and the BG program. Some images of the sequence are shown below (Figure 6.17).

Figure 6.17: Navigation experiment with the Milou robot (nav-B).

The experiment shows that, with the BG program described above, the robot is able to reach the goal and return to the initial position, describing a smooth trajectory and without colliding with the walls. It is important to remember here that the robot does not know the environment in advance, and thus the paths are recomputed dynamically as soon as new sensor information is available. Moreover, this experiment has been repeated with different starting and goal points, and even with slow-moving human obstacles, and once the low-level behaviours and blending rules are correctly adjusted the robot never collides and always finds its way to the goal point (provided there is a valid path).

6.6.3 Point-to-point navigation experiment (nav-C)

This experiment, named nav-C, shows a point-to-point navigation task, in which a simulated robot has to reach a goal location starting from a given initial position and then return to the initial position. This experiment is similar to nav-B, but with the robot and floor plan (Figure 6.12) of nav-A. Two different runs, which differ in the path-planning method, are compared: the first is a deliberative approach, based on an A* path-planner, while the second is a purely reactive approach, based on the most direct path to the goal. The deliberative run (Figure 6.18) produces similar results to experiments nav-A and nav-B.
The reactive behaviours help the robot in the obstacle avoidance problem, while the path-planning module modifies the most direct path to the goal in order to guide the robot through free zones. In this case no paths are traced through previously discovered obstacles.

Figure 6.18: Navigation experiment with deliberative path-planning (nav-C).

The reactive run (Figure 6.19) proceeds in a completely different way. While the reactive behaviours help the robot in the obstacle avoidance problem, the path-planning module always returns a direct path to the goal without taking into account known obstacles. In this case the robot still completes the task but follows a path that depends on the configuration of the environment. It is important to notice that when using this reactive approach the robot can get trapped in U-shaped obstacles or in infinite loops (as is usual in reactive path-planning based systems).

Figure 6.19: Navigation experiment with reactive path-planning (nav-C).

The experiment shows that changing a strategy in BGen can be as simple as changing a line of BG code. In this case, the only difference between both runs is the first parameter of the grid structure, which is defined in the navigation agent. The deliberative approach is obtained using an A* path-planner (a_star option), and the reactive approach is obtained using a reactive planner (direct option). It is important to note that the obstacle avoidance behaviour prevails in both cases, and thus the robot keeps its integrity in different environments, provided the robot's sensors can detect the obstacles. A typical problem with the Nomad-200 robot is that its sonar sensors are placed too high and usually cannot detect chairs, tables and small furniture in general.

6.6.4 Map-building experiment (map-A)

This experiment, named map-A, shows an autonomous map-building task.
A real robot is placed in a completely unknown environment in order to build a map. This experiment has been carried out using a set-up similar to that of experiment nav-B: the same robot without the infrared sensors. As the experiment involves long runs, the BG program uses a localisation method based on a fuzzy segments map (see chapter 4). This experiment was carried out in a laboratory floor plan (Figure 6.20), named "AASS-lab-2", which is made of wooden walls. The approximate overall dimensions of this scenario are 4.2 by 7.5 metres. The robot starts somewhere in the environment and tries to build a map of the environment by moving to open and unmapped areas.

Figure 6.20: AASS-lab-2 floor plan.

The BG program for this experiment is similar to the one used in the nav-B experiment, but with two important modifications:
• In addition to the fuzzy grid map, which is used for path-planning, the robot also builds a fuzzy segments map, which is used for localisation. The combination of these two kinds of maps has proved to perform very well.
• The planning agent is similar to that of the nav-A experiment, but instead of receiving goal coordinates from a human operator (as a parameter of the BG program), it receives goal coordinates from a new agent, the topology-based planner.

The topology-based planner builds and maintains a global topological map (Duckett and Nehmzow, 1998) by relaxation (Duckett et al., 2000). The topological map (Figure 6.21) keeps track of the places visited (black circles) and also of tentative free places (red circles). The planner selects one of these free places, and sends its coordinates as a new goal to the planning agent.

Figure 6.21: Topological map.

The BGen environment is provided with the Milou robot description and the BG program. The map resulting after a ten-minute run is shown below (Figure 6.22).
The darker segments correspond to the global map, and the lighter segments correspond to the local map.

Figure 6.22: Map-building experiment with the Milou robot (map-A).

The experiment shows that the robot is able (with the BG program described above and a topology-based planner) to build a map of the environment while staying localised. It is important to note that odometry errors are very significant after some minutes and the robot is still capable of localising. Although the purpose of the experiment was to test the interoperability of a BG program with a non BGen-based program, the set-up also proved capable of solving the SLAM (Simultaneous Localisation And Map-building) problem. More tests are still needed to determine in which situations or environments this set-up is valid for the SLAM problem.

6.7 Conclusions

The experiments presented in the previous section show that BGen, which includes the BG language and the underlying BGA architecture, provides a good working environment for mobile robotics related problems.

BG programs are simple and easy to understand. Without knowing in advance what a BG program does, it is easy to figure out the purpose of the program, although of course this also depends on how well written the program is. In any case, although the tasks involved in achieving a robot's goal are complex, the BG language hides the details and allows the robot programmer to concentrate on the application domain. All the specific techniques for map-building, localisation, path-planning, etc. are available directly from the language, while new methods can be added in the future. This favours quick development of robot programs as well as their maintenance.

The BGen environment is open and portable.
While all the development of the BGen environment has been carried out at the University of Murcia (Spain) in parallel to the development of the Quaky-Ant robot, one of the experiments (nav-B) shows its performance on a completely different robot. The porting and interfacing of the BGen environment to the new system and the rewriting of the BG program to account for the new robot geometry and speed took less than two weeks, while the author was at Örebro University (Sweden). It is also important to note that, until then, the environment had not been completely tried on a real robot. These facts support that the BGen environment is flexible and easy to port to other platforms.

Java-based robotics environments are practical and usable. It is often argued that Java programs are extremely slow. While it is true that Java just-in-time compiled programs are a little slower than native ones, it is still possible to use Java in most robotics applications. Moreover, the BG interpreter runs on top of a Java virtual machine. Thus, BG programs are interpreted twice, but this is not a problem because most computing-intensive techniques (map-building, localisation and path-planning) run directly as Java libraries. The added value of the BGen system is again its portability and reuse of code at two different levels: at a system level, reusing the whole set of BGen libraries, and at an application level, reusing BG programs for different robot platforms.

The BGen environment is expandable. One of the experiments (map-A) shows that new techniques or modules can be incorporated into the BGA architecture. In this case the topology-based planner is not a Java program but a C/C++ program interfaced by way of the Java Native Interface (JNI). In this way new services or legacy applications can be added.
Chapter 7
Behaviour Fusion Learning

7.1 Introduction

A key issue of behaviour-based control is how to efficiently coordinate conflicts and competition among different types of behaviour in order to achieve good performance. Instead of just inhibiting and suppressing strategies according to assigned priorities, the BGA architecture uses a fusion method which is based on context-dependent blending (see chapter 5). A fuzzy rule base carries out this fusion, enabling/disabling the output of the different behaviours, both totally and partially. The use of learning techniques (Bonarini and Basso, 1997) (Dorigo and Colombetti, 1998) in the fusion rule base may improve the robot's navigation performance. One of the advantages of this approach is that the system can be trained with just some simple behaviours; then, if new behaviours are to be added, only the fusion rule base needs to be re-trained, while the behaviours previously developed are preserved.

Different learning techniques have been used for the fusion problem, for instance: Q-learning of motor schemas in robot soccer and robot formation maintenance (Balch and Arkin, 1995) (Balch, 1998), genetic programming for hierarchical fuzzy control of robots (Tunstel, 1995) (Tunstel, 1996), reinforcement learning of behaviour selection in reactive systems and in sequential behaviour patterns (Dorigo and Colombetti, 1998), Q-learning based methods for behaviour selection in robot soccer (Stone, 2000), and reinforcement learning of fuzzy behaviour combination (Bonarini and Basso, 1997). Other approaches do not consider the fusion problem and apply learning techniques to develop single or multiple behaviours.

The proposed learning process is as follows. The user selects the BGA agent whose blending rule base is to be learned. An interesting and practical approach is to start with a predefined set of fuzzy rules, which are then used as input to the learning method. This rule set can even be the empty set.
In either case, the individuals of the genetic algorithm are fuzzy rule bases (Pittsburgh coding), and the result of the learning process is a fusion fuzzy rule base. The structure of the learning method is described in the following section.

7.2 Behaviour fusion using genetic-based algorithms

7.2.1 Introduction

Genetic algorithms (Biethahn et al., 1995) (Goldberg, 1989) are adaptive procedures of search and optimisation, inspired by the mechanisms of natural evolution, that find solutions to problems (see chapter 1). They imitate, on an abstract level, biological principles such as a population-based approach, the inheritance of information, the variation of information via crossover/mutation, and the selection of individuals based on fitness. For any type of application, genetic algorithms are characterised by:
• A genetic-like representation of solutions to the problem.
• A method to obtain or create an initial population of solutions.
• A fitness function to measure how good any solution is. The function plays the role of the environment, in which the best solutions have a greater probability of survival.
• Genetic operators that affect the composition of children during reproduction.
• A set of parameters used by the algorithm to guide its evolution: population size, number of generations, crossover and mutation probabilities, etc.

Genetic algorithms have shown good results in obtaining fuzzy rule bases, although usually using different algorithms in the generation and tuning process, or just performing one of these steps (Herrera et al., 1995) (Ishigami et al., 1995). The proposed algorithm for behaviour fusion (Gómez Skarmeta and Jiménez, 1997) (Gómez Skarmeta and Jiménez, 1999) (Gómez Skarmeta et al., 1997a) has two operation modes:
• Only tuning. The genetic algorithm is used to tune a fuzzy rule base, which has been previously introduced using any method.
• Generation and tuning.
The genetic algorithm is used to generate and tune a fuzzy rule base directly from data. It is also possible to incorporate an initial rule base, as in the previous mode, to help the search process. The user defines the maximum number of rules (RMAX) a priori.

In the context of behaviour fusion, the algorithm is used in generation and tuning mode with an initial rule base obtained manually from user knowledge. The main characteristics of the algorithm and its parameters are described in the following subsections.

7.2.2 Representation and initial population

To simplify the representation of the fuzzy rules, the algorithm imposes and assumes some constraints: the fuzzy sets must be trapezoidal and the rules must use only conjunctions. While the first constraint can influence the performance of the results, the second one does not impose any restriction: any set of fuzzy rules with disjunctions and negations can be reduced to a rule set with conjunctions only by changing the number of rules and complementing some fuzzy sets. Trapezoidal fuzzy sets are commonly used in many control applications, and the experimental results show that they are adequate for this problem. Conjunction-only MIMO fuzzy rules have the following form (Eq. 7.1):

$$R_i \equiv \text{if } x_{i1} \in A_{i1} \wedge \cdots \wedge x_{in} \in A_{in} \text{ then } y_{i1} = B_{i1} \wedge \cdots \wedge y_{im} = B_{im} \quad (7.1)$$

where $x_{i1} \ldots x_{in}$ are the n input variables, $y_{i1} \ldots y_{im}$ are the m output variables, and $A_{i1} \ldots A_{in}$, $B_{i1} \ldots B_{im}$ are trapezoidal fuzzy sets, which are defined as follows (Eq. 7.2):

$$T_{ij} = \left[ t^1_{ij}, t^2_{ij}, t^3_{ij}, t^4_{ij} \right] \quad (7.2)$$

Each rule $R_i$ is coded by concatenating all the points that determine the different fuzzy sets. To avoid using very small fuzzy sets, a cut-off threshold $s_{th}$ is applied: if one fuzzy set has a support which is smaller than a percentage $s_{th}$ of its range, it is not taken into account for the final rule. The general format is shown below (Eq.
7.3):

$$R_i \equiv \left[ a^1_{i1}, a^2_{i1}, a^3_{i1}, a^4_{i1}, \ldots, a^1_{in}, a^2_{in}, a^3_{in}, a^4_{in}, b^1_{i1}, b^2_{i1}, b^3_{i1}, b^4_{i1}, \ldots, b^1_{im}, b^2_{im}, b^3_{im}, b^4_{im} \right] \quad (7.3)$$

An individual $H_i$ is composed by the concatenation of the different rules and a series of control bits (Eq. 7.4). These are used to determine which rules are enabled or disabled ($d_i$) and which ones are background rules ($k_i$). Background rules (see chapter 5) use the same coding as regular fuzzy rules, but they only use the first number, which corresponds to the degree of activation of the rule, and the values representing the consequent. To avoid the excessive generation of background rules, only a percentage $b_{th}$ of the total number of rules RMAX is allowed.

$$H_i \equiv \left[ R_1, \ldots, R_{RMAX}, d_1, k_1, \ldots, d_{RMAX}, k_{RMAX} \right] \quad (7.4)$$

If the individuals are to be used in tuning mode only, the control bits are not used, and RMAX corresponds to the initial set of rules. Using this coding scheme, the input variables are those used to distinguish or establish the different activation contexts, and the output variables are the degrees of activation of the different behaviours.

As stated previously, the learning procedure is run in generation and tuning mode. To create the initial population two criteria are followed:
• To exploit all the available information (the user-defined fusion rule base).
• To explore the solution space to obtain an appropriate diversity.

According to these criteria, three different alternatives F1, F2, F3 are used to initialise the solutions, which are applied with different probabilities pf1, pf2, pf3. The F1 and F2 forms assume the existence of an initial rule base with k rules, where 1 ≤ k ≤ RMAX.
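Under stated assumptions (plain Python lists, one trapezoid per variable, and illustrative names that are not those of the actual implementation), the Pittsburgh coding of Eqs. 7.3–7.4 and the support cut-off $s_{th}$ can be sketched as:

```python
import random

def random_trapezoid(lo, hi):
    """Four ordered points defining a trapezoidal fuzzy set on [lo, hi]."""
    return sorted(random.uniform(lo, hi) for _ in range(4))

def random_individual(rmax, domains_in, domains_out):
    """One Pittsburgh-coded individual: RMAX rules plus per-rule control bits.
    Each rule concatenates the trapezoid points of the n input and m output
    variables; d[i] enables/disables rule i, k[i] marks it as a background rule."""
    rules = []
    for _ in range(rmax):
        rule = []
        for lo, hi in domains_in + domains_out:
            rule.extend(random_trapezoid(lo, hi))
        rules.append(rule)
    d = [random.randint(0, 1) for _ in range(rmax)]  # enabled/disabled bits
    k = [random.randint(0, 1) for _ in range(rmax)]  # background-rule bits
    return {"rules": rules, "d": d, "k": k}

def support_ok(trap, lo, hi, sth=0.0005):
    """Cut-off threshold: discard a fuzzy set whose support (t4 - t1) is
    smaller than a fraction sth of the variable's range (0.0005 = 0.05%,
    the value used in Table 7.1)."""
    return (trap[3] - trap[0]) >= sth * (hi - lo)
```

With n = 6 inputs and m = 5 outputs, as in the experiments below, each rule carries 4·(6+5) = 44 real-valued genes, and an individual carries RMAX such rules plus 2·RMAX control bits.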
The initialisation methods are described below:

(F1) All the rules of the user-defined rule base are introduced at random positions of the solution, the corresponding values of the control set $d_j$ are set to 1, and the corresponding values of the control set $k_j$ are set according to the type of the rules. The remaining (RMAX − k) rules are obtained randomly according to the domains of the input and output variables, the corresponding values of the control set $d_j$ are set to 0, and the corresponding values of the control set $k_j$ are obtained randomly.

(F2) Each rule of the solution is, with equal probability, either selected from the initial rule base or obtained randomly. All the values of the control sets $d_j$ and $k_j$ are generated randomly.

(F3) All the RMAX rules of the solution and their corresponding control sets $d_j$ and $k_j$ are obtained randomly.

7.2.3 Genetic operators and parameters

The operation of the algorithm is directed by a series of parameters (Goldberg, 1989). Basically, in addition to the above-defined parameters, they correspond to the type of genetic operators used, the probabilities of both mutation (Pm) and crossover (Pc), the maximum number of generations (Geners) and the population size (Popul). Since the individuals contain two different types of information (denoted as rule set and control set) it is necessary to use different genetic operators to perform the recombination process completely. The following operators are used for the rule set (Gómez Skarmeta and Jiménez, 1999):

• Simple crossover. Two individuals $H_i$ and $H_j$ are selected as parents for the crossover operation. Let r and s be two random numbers such that $r \in [1, \ldots, RMAX \cdot (n + m + 2)]$ and $s \in [r, \ldots, RMAX \cdot (n + m + 2)]$. The elements of $H_i$ and $H_j$ are interchanged between positions r and s to obtain new individuals.

• Arithmetical crossover. Two individuals $H_i$ and $H_j$ are selected as parents for the crossover operation.
Let r and s be two random numbers such that $r \in [1, \ldots, RMAX \cdot (n + m + 2)]$ and $s \in [r, \ldots, RMAX \cdot (n + m + 2)]$, and let $\alpha \in [0, 1]$ be a random number. For each fuzzy number $u_k$ from $H_i$ with $r \le k \le s$ and for each number $v_k$ from $H_j$ with $r \le k \le s$, the following equation is applied (Eq. 7.5):

$$u^t_k \leftarrow \alpha u^t_k + (1 - \alpha) v^t_k, \qquad v^t_k \leftarrow \alpha v^t_k + (1 - \alpha) u^t_k, \qquad t = 1, \ldots, 4 \quad (7.5)$$

• MaxMin crossover. Two individuals $H_i$ and $H_j$ are selected as parents for the crossover operation. Let r and s be two random numbers such that $r \in [1, \ldots, RMAX \cdot (n + m + 2)]$ and $s \in [r, \ldots, RMAX \cdot (n + m + 2)]$. For each fuzzy number $u_k$ from $H_i$ with $r \le k \le s$ and for each number $v_k$ from $H_j$ with $r \le k \le s$, the following equation is applied (Eq. 7.6):

$$u^t_k \leftarrow \max\{u^t_k, v^t_k\}, \qquad v^t_k \leftarrow \min\{u^t_k, v^t_k\}, \qquad t = 1, \ldots, 4 \quad (7.6)$$

• Simple mutation. Let $u_k$ be a fuzzy set which has been selected for mutation, and let $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ be random real numbers. The new fuzzy set is obtained as follows (Eq. 7.7):

$$u^1_k \leftarrow \alpha_1 \in \left[ \max\!\left(a,\, u^1_k - \tfrac{u^2_k - u^1_k}{2}\right),\; u^1_k + \tfrac{u^2_k - u^1_k}{2} \right]$$
$$u^2_k \leftarrow \alpha_2 \in \left[ u^2_k - \tfrac{u^2_k - u^1_k}{2},\; u^2_k + \tfrac{u^3_k - u^2_k}{2} \right]$$
$$u^3_k \leftarrow \alpha_3 \in \left[ u^3_k - \tfrac{u^3_k - u^2_k}{2},\; u^3_k + \tfrac{u^4_k - u^3_k}{2} \right]$$
$$u^4_k \leftarrow \alpha_4 \in \left[ u^4_k - \tfrac{u^4_k - u^3_k}{2},\; \min\!\left(b,\, u^4_k + \tfrac{u^4_k - u^3_k}{2}\right) \right] \quad (7.7)$$

where [a, b] is the domain of the variable corresponding to the fuzzy set $u_k$.

• Nonuniform mutation. The action of this operator depends on the age of the population, and its effect is a fine local tuning in the last generations. Let $u_k$ be a fuzzy set which has been selected for mutation, let $r \in [0, 1]$ be a random real number, let $s \in \{0, 1\}$ be a random integer, and let c be a parameter which represents the degree of nonuniformity. The new fuzzy set is obtained as follows (Eq. 7.8):

$$u^t_k \leftarrow \begin{cases} u^t_k + (b - u^t_k)\left(1 - r^{\left(1 - \frac{g}{Geners}\right)^c}\right) & s = 0 \\[4pt] u^t_k - (u^t_k - a)\left(1 - r^{\left(1 - \frac{g}{Geners}\right)^c}\right) & s = 1 \end{cases} \qquad t = 1, \ldots, 4 \quad (7.8)$$

where g is the number of the current generation, and [a, b] is the domain of the variable corresponding to the fuzzy set $u_k$.
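A compact sketch of these rule-set operators (arithmetical and MaxMin crossover, and the nonuniform mutation of Eq. 7.8, applied here to a single gene value rather than a whole trapezoid), with hypothetical function names:

```python
def arithmetical_crossover(hi, hj, r, s, alpha):
    """Blend the genes in positions r..s of two parents (Eq. 7.5)."""
    ci, cj = hi[:], hj[:]
    for t in range(r, s + 1):
        ci[t] = alpha * hi[t] + (1 - alpha) * hj[t]
        cj[t] = alpha * hj[t] + (1 - alpha) * hi[t]
    return ci, cj

def maxmin_crossover(hi, hj, r, s):
    """Children take the max/min of the parents' genes in r..s (Eq. 7.6)."""
    ci, cj = hi[:], hj[:]
    for t in range(r, s + 1):
        ci[t] = max(hi[t], hj[t])
        cj[t] = min(hi[t], hj[t])
    return ci, cj

def nonuniform_mutation(u, a, b, g, geners, c, r, s):
    """Eq. 7.8 for one gene u on domain [a, b]: the perturbation shrinks to
    zero as the generation g approaches Geners (fine local tuning late on)."""
    delta = 1 - r ** ((1 - g / geners) ** c)
    return u + (b - u) * delta if s == 0 else u - (u - a) * delta
```

Note how at g = Geners the nonuniform mutation leaves the gene unchanged, while early generations allow moves spanning most of the domain.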
As the control sets are bit strings, simple binary operators are used (Gómez Skarmeta and Jiménez, 1999). They are described below:

• Classical crossover. Two individuals $H_i$ and $H_j$ are selected as parents for the crossover operation. Let r and s be two integer random numbers such that $r \in [1, \ldots, RMAX]$ and $s \in [r, \ldots, RMAX]$. The elements $d_r \ldots d_s$ and $k_r \ldots k_s$ are interchanged to obtain new individuals.

• Classical mutation. Let $d_i$ and $k_i$ be two control bits which have been selected for mutation. The new control set is obtained by assigning 0 or 1 randomly to $d_i$ and $k_i$.

When crossover or mutation is to be applied, the algorithm selects one of the previously defined methods randomly. Although different probabilities may be selected, in the current application all the operators have the same probability of being selected.

7.2.4 Fitness function

While the other algorithm-related topics are more or less standard, the definition of the fitness function depends strongly on the application context, and is not usually straightforward. This function has to measure how good each of the individuals is, which directly affects the performance of the robot. Thus, the fitness function takes into account how the robot executes a predefined task. The user must define a simulation framework with, at least, the following properties or parameters:
• Goal: where the robot starts, where the robot ends, and what the robot has to do.
• Map: a floor plan with walls, doors, corridors and obstacles.
• Iters: the number of simulation runs per individual.

To avoid infinite loops in the simulation some additional parameters are needed:
• Timeout: maximum allowable time for each run, measured in epochs.
• Maxcolls: maximum allowable number of collisions.
Evaluating the fitness function for each individual implies using its fuzzy rule base as the fusion rules in a BG program, and then simulating the robot with that BG program, with a different BG program for each individual in each generation. When a simulation finishes, it returns the following values:
• Gotcha: −1.0 if the goal is accomplished, 1.0 otherwise.
• Epochs: the total number of simulation steps performed.
• Colls: the number of robot-wall collisions.
• Stage: the last portion of the whole task already completed by the robot.
• Way: the percentage of the distance left from the start to the goal location in the current stage.

The way (Eq. 7.11) is computed as a percentage relating the shortest Euclidean distance between the start and goal locations (Eq. 7.9) and that between the current robot and goal locations (Eq. 7.10). The fitness function (Eq. 7.12) is then computed using the different parameters and values defined above.

$$Total = \sqrt{(start_x - goal_x)^2 + (start_y - goal_y)^2} \quad (7.9)$$

$$ToGoal = \sqrt{(robot_x - goal_x)^2 + (robot_y - goal_y)^2} \quad (7.10)$$

$$Way = \frac{ToGoal}{Total} \times 100 \quad (7.11)$$

$$Fitness = \sum_i^{Iters} \left[ (Timeout_i - Epochs_i) - k_1 \cdot Colls_i - k_2 \cdot Gotcha_i - k_3 \cdot Stage \cdot Way \right] \quad (7.12)$$

The fitness function thus defined tries to take into account the different aspects relevant to a good robot performance: rewarding the full completion of the task (gotcha), rewarding low execution times (epochs), and punishing collisions with the walls (colls). In addition, this function is required to somehow measure the robot's performance even if it did not complete the task. Thus, it also takes into account the number of subgoals completely achieved (stage) and the degree of completion of the last subgoal (way). Otherwise, the evolution could not differentiate between individuals that are not good enough as final solutions but could be good candidates to continue evolving. This is very important at the beginning of the learning process.
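Assuming each simulation run is summarised in a small record, Eqs. 7.9–7.12 translate almost directly into code; the constants k1–k3 below are illustrative placeholders, not values from the thesis:

```python
from math import hypot

def way(start, goal, robot):
    """Percentage of the start-to-goal distance still left (Eqs. 7.9-7.11)."""
    total = hypot(start[0] - goal[0], start[1] - goal[1])    # Eq. 7.9
    to_goal = hypot(robot[0] - goal[0], robot[1] - goal[1])  # Eq. 7.10
    return 100.0 * to_goal / total                           # Eq. 7.11

def fitness(runs, timeout, k1=10.0, k2=100.0, k3=1.0):
    """Eq. 7.12: reward fast completion, punish collisions, and grade
    partially completed tasks via stage and way. Each run is a dict with
    keys: gotcha (-1.0 done / +1.0 not), epochs, colls, stage, way."""
    return sum((timeout - r["epochs"])
               - k1 * r["colls"]
               - k2 * r["gotcha"]
               - k3 * r["stage"] * r["way"]
               for r in runs)
```

Since gotcha is −1.0 on success, the term −k2·Gotcha adds a bonus for completing the task, while the stage·way term penalises individuals in proportion to how much of the task remains, which is what lets early, non-completing individuals still be ranked against each other.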
7.3 Experimental results

Different experiments have been carried out to test the performance of the genetic-based algorithm on the behaviour fusion problem. In all cases, the learning process starts from a given BG program, which is the one described previously as a sample architecture (see chapter 6). The BG program specifies the number of input and output variables, which correspond respectively to the different fusion variables (Figure 6.3) and the reactive behaviours (Figure 6.2) of the reactive control agent. It also specifies the initial fuzzy rule base (Listing 6.10), which is used in the initialisation process.

The task (simulation parameter Goal) has been defined in a similar way as in experiments nav-B and nav-C of the sample architecture: the robot starts at a home location, it has to reach a goal location, and then it has to return to the home position. As the fitness function is computed using the results of a simulation, this needs to take the least time possible. For this reason only very small environments are used (simulation parameter Map). Small environments have the disadvantage of presenting only a few situations which are useful for the learning process. Thus, the approach used for training the fusion rule base is to run the simulation in three different small scenarios which are complementary (Figure 7.1). These scenarios represent three classical reactive problems: the goal is occluded by a symmetrical obstacle, the robot has to turn to the left while the goal is to its right, and the robot has to turn to the right while the goal is to its left. If only one simulation is run per individual (simulation parameter Iters), only the first scenario is used. Otherwise, the other two are also used, if Iters is set to 2 or 3.
It is important to note that the alignL and alignR behaviours (see chapter 6) are not used due to the size of the scenarios: these behaviours are intended for larger environments with wider corridors.

Figure 7.1: Simulation maps for learning.

Different tests have been carried out to select the appropriate parameters of the genetic algorithm, which are summarised below (Table 7.1). As can be noted, the mutation and crossover probabilities are higher than in typical genetic-based systems. The reason is related to simulation execution time. If the probabilities are set to low values, the exploratory time (i.e. the number of generations needed to explore different parts of the solution space) is greater. If the probabilities are set to higher values, the exploratory time is smaller at the expense of a poorer local search. In the current application the global search capability is preferred, and thus high probabilities are used.

Parameter      Meaning                                      Value
pf1, pf2, pf3  Initialisation probabilities for F1, F2, F3  0.33, 0.33, 0.33
sth            Set size cut-off threshold                   0.05%
bth            Background rules number threshold            7.5%
Pm             Probability of mutation (control set)        0.15
               Probability of mutation (rule set)           0.25
Pc             Probability of crossover (control set)       0.3
               Probability of crossover (rule set)          0.9
RMAX           Maximum number of rules                      100
n              Number of input variables                    6
m              Number of output variables                   5

Table 7.1: Parameters of the genetic algorithm.

Using the previously defined learning environment set-up and genetic parameters, an experiment (Test#31) is performed with the initial fusion rule base modified so that the robot neither reaches the goal nor collides. The underlying idea is to test the exploratory ability of the algorithm, because the initial fusion rule base is very simple. However, it is not a search from scratch, because the rules provide a good starting point. The specific parameters of the experiment are shown below (Table 7.2).
The fitness function (Figure 7.2) starts with negative values, which correspond to runs in which the goal is not reached, and ends with high positive values, which correspond to runs in which the goal is reached in a small number of epochs. The dark curve corresponds to the fitness of the best individual, while the light one corresponds to the average fitness of all the individuals. It can be noted that the population is benefiting from the evolution (the average is slowly growing). The learning process took 111.9 hours to finish on a Mac PowerPC G3 at 300 MHz.

Parameter   Meaning                                Value
Geners      Number of generations                  1033
Popul       Number of individuals                  8
Iters       Number of simulations per individual   2
Timeout     Maximum number of epochs               500
Maxcolls    Maximum number of collisions           5

Table 7.2: Parameters of Test#31.

Figure 7.2: Evolution of the fitness function (Test#31).

As stated previously (see chapter 5), the behaviour definition approach of BG and the underlying BGA model allow the use of weights to indicate the importance of each behaviour. In the learning context these weights play an important role because they are not evolved but kept static during the learning process. This information is contributed by the user, who acts as a domain expert. The behaviour fusion mechanism uses both the fusion fuzzy rules and the behaviour weights. Thus, evolving just the fuzzy rules makes sense, and it is justified by the fact that the weights establish high-level priorities, while the fuzzy rules give finer control. To test the influence of the weights on the learning process, two experiments (Test#34 and Test#35) have been carried out. The first one was performed using different weights for the different behaviours, and the second one without different weights (i.e., all the weights set to 1.0).
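The interplay between the static user-defined weights and the evolved rule activations can be sketched as a weighted command blend. This is a simplified illustration with assumed names, not the actual BGA fusion implementation:

```java
// Simplified weight-modulated behaviour fusion: each behaviour proposes a
// command with an activation level from the fusion fuzzy rules, and the
// static user-defined weights scale the activations before blending.
// All names are illustrative, not the BGA implementation.
public class WeightedFusion {

    /** Weighted average of behaviour commands; returns 0 if nothing fires. */
    public static double fuse(double[] commands, double[] activations, double[] weights) {
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < commands.length; i++) {
            double a = activations[i] * weights[i];  // weight sets the priority
            num += a * commands[i];
            den += a;
        }
        return (den == 0.0) ? 0.0 : num / den;
    }
}
```

Setting a behaviour's weight to zero removes it from the blend regardless of what the evolved rules propose, which is how the weights act as high-level priorities while the rules give finer control.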
The specific parameters of both experiments are shown below (Table 7.3).

Parameter   Meaning                                Value
Geners      Number of generations                  50
Popul       Number of individuals                  6
Iters       Number of simulations per individual   2
Timeout     Maximum number of epochs               700
Maxcolls    Maximum number of collisions           7

Table 7.3: Parameters of Test#34 and Test#35.

The same set of initial fusion rules as in the nav-B experiment (see chapter 6) is used to launch two learning processes: one which uses user-defined behaviour weights and another which does not use behaviour weights. Each learning process took 15.7 hours to finish on a PC Pentium III at 450 MHz. The first one (Figure 7.3) achieves a lower best fitness value but a much better population average.

Figure 7.3: Evolution of the fitness function (Test#34 and Test#35).

To test the generalisation capabilities of both approaches, a series of experiments has been carried out. Each experiment consists of five tests that measure the performance in different scenarios, which are described below:
• Test A*: the initial rule set using behaviour weights and a path planner using the A* algorithm.
• Test A*+GA+: the same as above but the fusion rules are those of Test#34.
• Test A*+GA-: the same as above but the fusion rules are those of Test#35.
• Test Direct: the initial rule set using behaviour weights and a path planner using the reactive approach.
• Test Direct+GA+: the same as above but the fusion rules are those of Test#34.

The experiment has been run five times, and the results and their averages are summarised below (Table 7.4). As can be noted, the use of weights produces better results in all cases: the average of test A*+GA+ is greater than the average of test A*, and the average of test Direct+GA+ is greater than the average of test Direct. Test A*+GA- produces better local results but on average produces poorer results compared to tests A* and A*+GA+.
It should be noted that the tests involving the reactive path planner are more difficult because the timeout is fixed for all tests and, as has been shown (see chapter 6), this planner does not produce optimal paths.

Tests     A*      A*+GA+   A*+GA-   Direct   Direct+GA+
          533     541      558      520      529
          529     550      -1010    -1135    -635
          523     549      -1140    508      341
          522     548      569      513      513
          514     539      -1320    -1140    509
Average   524.2   545      -468     -146.8   251

Table 7.4: Influence of priorities in the learning process.

7.4 Conclusions

A genetic-based method has been developed to assist in the development of behaviour fusion fuzzy rule bases. The algorithm is used to obtain or learn a new fusion rule set from an initial set of rules provided by the user (this may be the empty set). The genetic algorithm has been adapted to encode individuals which are converted to BG fusion fuzzy rule bases. Thus, these individuals can be standard MIMO fuzzy rules and also background fuzzy rules. Moreover, the fitness function has been defined to include the effects of the experimental evaluation of the robot's behaviour by using simulation. This function takes into account different parameters which measure the robot's performance in both successful and unsuccessful runs. This point is crucial in the initial part of the learning procedure because most of the individuals do not reach the goal, and it is very important to discriminate between them to speed up the learning process. Different experiments have been carried out to test the influence of the behaviour weights and the results of the learning process. As has been shown, behaviour weights can aid the learning process by narrowing the search space through the use of the user's knowledge of the domain. Moreover, the use of the learning process improves the performance of the robot with respect to user-defined fusion rules.
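The need to discriminate among unsuccessful individuals, stressed above, can be illustrated with a fitness of the same qualitative shape: graded negative scores for failed runs, and positive scores that grow the fewer epochs a successful run takes. The formula below is an illustrative shape only, not the exact fitness function of the thesis:

```java
// Illustrative fitness shape for the behaviour fusion learning problem:
// failed runs score negatively (graded by collisions), successful runs
// score positively and higher the fewer epochs they take. This is NOT
// the thesis's exact fitness function, only its qualitative shape.
public class FitnessSketch {

    public static double fitness(boolean goalReached, int epochs,
                                 int maxEpochs, int collisions) {
        if (!goalReached) {
            // Still discriminate among failures: fewer collisions is better.
            return -100.0 - 10.0 * collisions;
        }
        // Reward early completion, penalise collisions.
        return 100.0 + (maxEpochs - epochs) - 10.0 * collisions;
    }
}
```

Because failed runs receive graded rather than uniform scores, early generations where no individual reaches the goal still produce a usable selection gradient.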
Chapter 8
A Web-Based Teleoperation Application

8.1 Introduction

In recent years, web-based robotics applications (Taylor and Dalton, 2000) have grown in such a way that many of the major robotics laboratories offer applications to access one of their robots from the web. The roots of the new web-based robotic applications go back to the early days of teleoperation. In teleoperation, a human operator forms part of the closed-loop control of a robot. At its simplest, communication is through mechanical linkages and feedback is by direct viewing. Such equipment has long been used in the nuclear industry, where just a few centimetres of lead glass separate the operator from the robot. Teleoperation from greater distances through communication links, with video or other forms of feedback, finds application in hazardous environments, notably in the nuclear industry (hundreds of metres), the control of underwater remotely operated vehicles (several kilometres), and space applications (thousands of kilometres). However, regardless of the application, all teleoperation systems have the following specific features (Sayers, 1999):
• An operator interface, incorporating a master input device that the operator uses to command the system.
• A slave output device that performs the operator's commanded actions at the remote site.
• A communication scheme between sites.

As teleoperation systems have moved from mechanical connections to digital technology, the opportunities for assisting the operator have increased. Possibilities for operator aids include (Sayers, 1999):
• Reindexing. Reindexing is where the connection between master and slave devices is temporarily removed, the master arm is repositioned, and then the connection is re-established. This allows a master device with a relatively small workspace to move a slave device through a large motion by using a series of discrete motions. This technique is basically used with teleoperated robotic arms (Sayers, 1999).
• Feedback interfaces. Video cameras at the remote site have long been used as a feedback aid to provide views of the task site from different points of view. Video transmission demands high bandwidth availability. When this is not possible, computer-generated imagery supplies the operator with a virtual interface that combines low-bandwidth sensory data to form a realistic image. These virtual interfaces are also useful for overlaying computer predictions or visual clues onto video images (Kim and Stark, 1989).
• Kinesthetic aids. For some tasks it is interesting that the operator receives physical feedback in addition to video feedback. These haptic-augmented systems give superior performance compared to purely visual systems. Using haptic systems the operator can sense the response to his commands directly in the control interface. This technique is applied in teleoperated robotic arms and in the control of remotely operated vehicles (Brooks et al., 1990).

When there is significant time delay in the communication link (that is, as the distance between sites increases), instability occurs and manual closed-loop control is no longer suitable (Sheridan, 1992). There are some techniques to cope with such delays:
• Long range teleoperation. This was the first way of eliminating manual closed-loop control. It consists of a classical open-loop control scheme where the operator does not have direct feedback. These systems tolerate delays by using the human operator as an intelligent link in the control system. An error in the pose of the remote device is visible in the returned imagery, and the operator sends commands to the remote device to compensate. While open-loop control systems are certainly not optimal, they are simple and they work. However, as delays become large, the efficiency with which the operator can work drops dramatically (Ferrell, 1965).
• Predictive displays.
The way to improve performance is to provide the operator with immediate feedback. Because of the communications delay, that immediate feedback cannot come from the remote site, and instead it must be generated at the operator station. Predictive display is suitable where a model of the system is available. Examples of such systems include manipulator control (Machida et al., 1990) and underwater vehicle control (Sayers, 1999).
• Supervisory control. One technique that can improve operator performance and avoid the instability problem is a shared or supervisory control scheme, where control of the robot is shared between a local control loop and the remote human operator (Sheridan, 1992). The goal is not to make a robot that can realise the full task autonomously, but to enable the robot to perform some simple subtasks that the operator can sequence.

Web telerobotics is a concept still in its infancy. It involves control of a remote robot or device from within a web browser over the Internet. It requires a supervisory control scheme to avoid instability, and makes a robot available to a vast number of people, thus opening up a new range of applications. In August 1994, a SCARA-type robot was connected to the Internet at the University of California at Berkeley (http://www.usc.edu/dept/garden). Four weeks later, in September 1994, an ASEA IRB-6 robot was connected to the Internet through a web server at the University of Western Australia (http://telerobot.mech.uwa.edu.au) so that anyone with web access could direct the robot to manipulate wooden blocks on a table. These were the first physical devices connected to the web (Taylor and Dalton, 2000a). In August 1996, the IRB-6 robot was replaced by an ABB IRB 1400 robot. The software was rewritten to work with the new robot and to incorporate knowledge gained from the first version. The robot is now in use most of the time. Other groups have since pursued this concept.
Cox put up a system that allows web users to remotely schedule photos from a robotic telescope (http://www.eia.brad.ac.uk/rti). Paulos and Canny have implemented several Internet robots with elegant user interfaces (Paulos and Canny, 1996). Bekey and Goldberg used a six-axis robot, a rotating platform, and stereo cameras to allow remote scholars to closely inspect a rare sculpture (Akatsuka et al., 1998). An impressive variety of other Internet robot projects were reported in the recent IROS and ICRA workshops. To get a rough idea of the number and type of such web-based robots see (Taylor and Dalton, 2000a)(Goldberg et al., 2000)(Saucy and Mondada, 2000)(Schulz et al., 2000)(Taylor and Dalton, 2000b)(Stein, 2000)(Simmons et al., 2000)(Hirukawa and Hara, 2000). Additionally, there is a long list of mobile robots that can be controlled over the web. Most of these systems provide exclusive remote control to a single person or provide queues to schedule user requests. KhepOnTheWeb (Saucy and Mondada, 2000) is a typical representative of mobile robots with a web interface. Users can give elementary movement actions to the robot and observe it using several web-cams. Xavier (Simmons et al., 2000) was probably the first mobile robot to operate in a populated office building while controlled through the web. Xavier can be advised by web users to move to an office and tell a “knock-knock” joke after arrival. The robot collects requests both on-line and off-line and then processes them during special working hours. After the completion of the tasks, the robot informs the user via e-mail. The Autonomous Observer (Simmons et al., 1999) consists of a semi-autonomous mobile robot that follows routes specified by a remote user, searches for soft-drink cans, and collects them in a bag. For this mission the robot has a gripper and a basket attached. The remote user sends the robot requests to move to different places and tells it when to look for cans.
The Rhino and Minerva museum tour-guide robots (Schulz et al., 2000) are good examples of successful interaction of web-robots with people in real-world environments. They operate in highly populated museums, performing the function of a robotic tour guide and leading the people in the museum through the exhibitions. The web is used to make the museum accessible to remote visitors, who watch images from the robot's camera and can control the robot's operation. Web-based operation can take place while the museum is open, which means that both virtual and physical visitors share the control of the robot. NASA has also used the Internet for the control of vehicles in space (Tso et al., 1998). For example, NASA's rover interface (http://mars.graham.com/wits) provides the user with different views of the surrounding environment. The images in these views are calibrated and contain range data. The user performs measurements of the environment and navigates through a point-and-click interface. Although the field is rather immature and there are almost no commercial services provided with these technologies, there are a large number of groups pursuing this concept. One of these is well funded and provides an Internet-accessible machining service called CyberCut (http://cybercut.berkeley.edu). This suggests that commercial services will become common in the future. Web-based telerobotics is suitable not only for classical hazardous environments but also for new areas such as the following:
• Tele-manufacturing. The CyberCut machining service is available from the web, and it seems to be the first commercial venture of web-telerobotics. Remote access to machinery can reduce production costs by reducing external manpower. This is a very promising application.
• Training.
Providing access to robots and other expensive equipment for training purposes, where purchasing cannot be justified, is another promising area. Nevertheless, it seems that there are currently no web-based training applications.
• Entertainment. Following the experience of the University of Western Australia's web-based robotic arm, it is apparent that many people consider operating such devices entertaining. There are other successful experiences with museum tour-guide robots like Rhino and Minerva (Schulz et al., 2000).
• Surveillance. The Autonomous Observer (González-Baños et al., 1999) is an incipient web-based application in which a mobile robot can detect and track objects. This has practical use in surveillance and law-enforcement applications. The operator can command the robot to follow a suspicious subject without being exposed to danger. Web-operated surveillance robots can also have utility for personal applications, for instance when the owners of a house go on holiday.
• Exploration. The experiments carried out by NASA (Tso et al., 1998) have shown that there is a potential application of web-robotics in space missions. A private company called LunaCorp, in conjunction with Carnegie Mellon University, plans to launch the first private lunar mission (http://www.ri.cmu.edu/lri). The project involves the landing of a pair of web-based teleoperated robotic vehicles on the Moon's surface. This application is also intended as an entertainment mission.

8.2 Web robotics technologies

Currently there are many different technologies which enable a robot to be teleoperated over a network. The term web-telerobotics is commonly used to refer to web-based telerobotics applications, that is, teleoperation applications that can be run inside, or be launched from, a web browser.
What these applications have in common is that, as long as the client machine has a compatible web browser and the required Java Virtual Machine (Java VM), they are platform independent. This is not really a constraint because there are free browser implementations and free Java VMs for almost all typical operating systems and machines. This section describes the technologies involved in some successful web-robotics applications. These do not include cases like the GRIP system (http://underdog.stanford.edu/GRIP) (González-Baños et al., 1999), which makes use of X-Windows programs for the client side and cannot be considered a truly platform-independent solution: an X-Windows server is needed on the client machine, and in some cases it is difficult to get free implementations.

8.2.1 The UWA Telerobot

The goal of the University of Western Australia's Telerobot (Taylor and Dalton, 2000a) is to provide enough information to the remote operator of a robotic arm to make interpretation easy, while minimising transmitted data and maintaining a simple layout. The Telerobot, initially an ASEA IRB-6, is an ABB IRB 1400 robotic arm. Operator users of the robot submit requested moves by filling in an HTML form, by clicking on images of the workspace, or by specifying multiple moves in a script. Additionally, there are observer users that do not control the robot but observe what it is doing. The control method of the UWA Telerobot (Figure 8.1) is implemented by submitting the details of an HTML page, the user interface (Figure 8.2), to a web server, which receives the request and launches a CGI script. One CGI is launched for every request, and several copies may be running simultaneously to serve an operator and many observers. Only one person at a time can control the robot. The CGI script determines whether the request has come from an operator and, in such a case, establishes communication with the robot server and the image server.
It then generates an HTML page from a template, replacing tags representing variables with their current values. The image server provides services such as operator-controllable image sizes, a software zoom, and images from multiple cameras. Once a move is complete, new images of the workspace are taken, and a new form with these images and the robot position is returned via CGI to the user.

Figure 8.1: UWA Telerobot system architecture.

a) Operator/observer interface. b) Live video feedback.
Figure 8.2: UWA Telerobot user interface.

Other technologies like JavaScript and Java have been used in the Telerobot. JavaScript provides the capacity for limited local data processing instead of relying on CGIs for interface management. When specifying 3D locations using images, clicks in two different images are needed. Under normal CGI operation a click results in a request being sent to the web server. JavaScript is used to allow specification of robot moves with a single submission to the web server. Java is used to produce a wireframe model of the robot, which allows users to simulate moves before submitting them to the real robot. Some other web-based teleoperated robotic arms are very similar in both operation and technological support. For instance, the Mercury Project (Goldberg et al., 2000) allows users to command a remote arm to perform archaeological-like tasks.

8.2.2 EPFL KhepOnTheWeb

The goal of the Swiss Federal Institute of Technology of Lausanne's KhepOnTheWeb (Saucy and Mondada, 2000) is to provide access, through network communication facilities, to a complex and unique mobile robotics setup for carrying out research in mobile robotics. The system consists of a mobile robot that moves in a wooden maze. The robot is a small Khepera mobile robot (55 mm in diameter) which carries a small color CCD camera.
Additionally, there is a panoramic overhead camera. The labyrinth is made in such a way that the operator can see several effects. It contains a mirror that allows the user to watch the robot, some walls are lower than the robot, allowing the user to look further, and there is a ramp that gives the robot access to a view outside the maze.

Figure 8.3: EPFL KhepOnTheWeb system architecture.

The control method of KhepOnTheWeb (Figure 8.3) is based on plain HTML pages. When the user connects to the system (Figure 8.4) the server creates a new CGI process, which continuously updates the video on the user interface. The user, using clickable images, can control the robot movement and the overhead camera orientation and zoom, and has the option of changing from the robot camera to the overhead camera. Additionally, a Java applet running in the client continually requests information about the state of the robot and the time left to the user. There is no local intelligence on the robot, such as obstacle avoidance. Thus, the operator performs closed-loop control of the robot using video as the feedback. In the Khepera case, there is no need for intelligent behaviours because of the robot's light weight. This also means that there is no risk of destroying a wall or the robot itself. The KhepOnTheWeb designers chose this option because direct control has the advantage that the user can see the result of his own actions without any external contribution. Of course, this has a major drawback: the control of the robot is more difficult without help and under significant delays. Moreover, the robot operates in a tethered mode, and thus autonomy is not a concern: it is always available and does not run out of batteries. Although the experience with KhepOnTheWeb is quite satisfactory for the user, the approach does not scale to real-world environments.

a) Operator interface. b) Overhead live video feedback.
Figure 8.4: EPFL KhepOnTheWeb user interface.

Additionally, a virtual model of the experiment has also been developed using VRML and Java. This provides the user with the possibility of training himself by running a simulation locally on his machine, without time-delay constraints (Michel et al., 1997).

8.2.3 CMU Xavier

Carnegie Mellon University's Xavier (Simmons et al., 2000) web venture started as an aid in testing navigation algorithms. A web site was provided so that users could send tasks to the robot and get some feedback on the degree of completion of the task. The experiment consists of an autonomous mobile robot that moves in a real building. The robot, Xavier, is a custom mobile robot (0.6 m in diameter) based on a robotic base from Real World Interface, which has a full set of sensors (bumpers, encoders, laser, and sonars) and a color CCD camera. The control method of Xavier (Figure 8.5) is very simple and is based on plain HTML pages. There are two different interfaces: the observer interface (Figure 8.6-b) and the operator interface (Figure 8.6-a). When an observer connects to the system the server creates a new CGI process, which continuously updates the video on the user interface. In addition to the video, the user is provided with a map of the building and the current position of the robot. The operator interface is just another HTML page where the user can select the required task and then send the command to a CGI. When the task is completed, the user is notified via email. There are three types of tasks: telling a joke, saying hello, or taking a picture. In all cases the user selects the destination room where the task is to be performed.

Figure 8.5: CMU Xavier system architecture.

Xavier, as opposed to KhepOnTheWeb, does not permit operator closed-loop control. Instead, it allows only very high-level task specification.
To achieve the desired task, the robot has a task sequencing module and a navigation module (Figure 8.5). The robot is provided in advance with a model of the building. The navigation module is composed of four layers:
• Servo-control. This layer controls both the base and the pan-tilt head, and provides simple velocity and/or position control. It also provides feedback on command execution and position information, based on encoder readings.
• Obstacle-avoidance. This layer keeps the robot moving in a desired direction, while avoiding static and dynamic obstacles. This module is based on the Lane-Curvature method to find highly traversable lanes in the desired direction, and the Curvature-Velocity method to switch between lanes and avoid dynamic obstacles (Ko and Simmons, 1998). Both methods take vehicle dynamics into account to provide safe high-speed motion, at an average of 0.45 m/s.
• Navigation. The navigation layer is responsible for getting the robot from one location to another. It uses a partially observable Markov decision process (POMDP) model to maintain a probability distribution over where the robot is at all times (Koenig and Simmons, 1998). Thus, while the robot almost never knows precisely where it is, it rarely gets lost.
• Path-planning. This layer determines efficient routes based on a topological map that is augmented with rough metric information. It uses a decision-theoretic approach to choose plans with high expected utility, which takes sensor and actuator uncertainty into account.

a) Operator interface. b) Observer interface.
Figure 8.6: CMU Xavier user interface.

The Xavier experiment suffers from real-world problems: inaccuracy of the sensors, moving obstacles, and limited autonomy. Despite these facts, the system performs satisfactorily most of the time, with an extremely low failure ratio.
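The POMDP position tracking used by the navigation layer boils down to a discrete Bayes filter over map locations. A minimal version, illustrative only and not the CMU implementation, looks like this:

```java
// Minimal discrete Bayes (POMDP-style) belief update over n locations:
// belief'(s) is proportional to P(obs | s) * sum over s' of
// P(s | s', action) * belief(s'). Illustrative sketch, not the CMU code.
public class BeliefUpdate {

    /**
     * @param belief  current probability distribution over locations
     * @param trans   trans[from][to] = P(to | from, action)
     * @param obsLik  obsLik[s] = likelihood of the observation at s
     * @return        normalised posterior distribution
     */
    public static double[] update(double[] belief, double[][] trans, double[] obsLik) {
        int n = belief.length;
        double[] next = new double[n];
        double norm = 0.0;
        for (int to = 0; to < n; to++) {
            double predicted = 0.0;
            for (int from = 0; from < n; from++) {
                predicted += trans[from][to] * belief[from];  // motion model
            }
            next[to] = obsLik[to] * predicted;                // sensor model
            norm += next[to];
        }
        for (int s = 0; s < n; s++) {
            next[s] /= norm;                                  // normalise
        }
        return next;
    }
}
```

Because the result is a full distribution rather than a single position estimate, the robot can act sensibly even when its location is ambiguous, which is why Xavier "almost never knows precisely where it is" yet "rarely gets lost".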
Since the robot is tasked at a high level, high-bandwidth interaction is not needed, and both video and position are updated at a very low rate. Even if communication is lost completely, Xavier can still continue achieving its current task. The Xavier approach has a real negative impact on web-based interaction: commanding at a high level is not as interactive as teleoperation. Some other web-based tasked mobile robots are very similar in both operation and technological support. For instance, the Rhino and Minerva museum tour-guide robots (Schulz et al., 2000) allow users to select which area of the museum they want to see. One of the differences between Minerva and Xavier is that the former uses a higher update rate, so that the user experience feels more interactive.

8.2.4 NASA WITS

The NASA Web Interface for Telescience (WITS) has been developed to provide Internet-based distributed ground operations for planetary lander and rover missions (Tso et al., 1998). WITS provides an integrated environment for operations at a central location and for collaboration with geographically distributed scientists. There are two versions of WITS: the private one for NASA use, and the public one that is free and available to the general public. Both versions have similar functionalities, but they differ in the real robot kinematics and commands (as required by law). WITS has been used and deployed satisfactorily in some NASA Mars missions:
• Sojourner mission (1997). The WITS control interface was tested off-line with a simulated rover, but providing real data.
• Polar Lander mission (1998). The WITS control interface was tested on-line to control both the robotic arm and the camera of the mission. The public version works like the Sojourner version: a simulated robot with real data.
• Field Integrated Design and Operations (FIDO) mission (2003). The WITS control interface will be used for Internet-based control of the FIDO rover.
During May 2000, field tests were conducted with a FIDO prototype on Earth (Backes and Norris, 2000). During these tests, operators used WITS to visualise downlink data and generate command sequences from several laboratories and control centres around the world. The WITS system architecture (Figure 8.7) includes a database, a server, and multiple clients (Backes et al., 2000a)(Backes et al., 2000b). The database is a structured file system that holds downlink data and uplink sequence information. The server provides communication between clients and the database. Clients are distributed over the Internet and provide the interface needed to view downlink data and generate command sequences. The WITS system is implemented using Java and some of its extensions, like Java3D (for drawing 3D synthetic models) and the Java Cryptography Extension (to ensure secure communications). The user gets the software through an HTML page and then stops using HTML. The Java applet is then in charge of accessing the user's local WITS database and the WITS server. The user generates a sequence of actions locally, using the FIDO simulator to check the results. When finished, the user sends the sequence to the WITS server. It is checked using a sequence integration and verification module (SEQ_GEN), and then the full sequence is sent to the rover. Additionally, the user accesses the data received through the downlink in the remote WITS database. These data include the robot position, images from the navigation stereo cameras and the panoramic stereo cameras, and other sensor readings. WITS, like Xavier, does not allow closed-loop control of the remote rover, although the reason is quite different. WITS has been designed to assist in space missions where delays are very considerable and closed-loop control offers very low performance or none at all. Nevertheless, WITS, in contrast to Xavier, allows the user to specify complete tasks without any constraints.
The user generates a sequence of actions that the rover executes at the end of the process. The sequence includes an integrity test.

Figure 8.7: NASA WITS system architecture.

Figure 8.8: NASA WITS user interface.

One of WITS's advantages is that it allows collaboration between mission users (Backes et al., 2000a) by providing secure communications and daily downlink data, and by allowing them to specify and generate command sequences that are stored in a common database. Users can see each other's targets and sequences and can use target inputs from each other's sequences. Secure communications are provided through the WITS Encrypted Data Delivery System (WEDDS) service (Norris and Backes, 2000), which is a communication platform based on the Secure Sockets Layer (SSL) protocol. WITS is a fully functional Internet-based teleoperation system which features a very complete and detailed user interface aimed at assisting in space missions. Although the user is presented with many controls and options, interactivity is greatly penalised because, as happens with Xavier, the user does not feel that he is really controlling the robot (this was not a design concern), except when the system is in simulation mode.

8.3 The BGen-Web system

8.3.1 Objectives

BGen-Web is a proof-of-concept web-robotics exploration application. The aim of the BGen-Web system is to provide a web-based service for highly interactive supervisory control (Sheridan, 1992) of a mobile robot, in order to perform exploration of scarcely known, or completely unknown, environments. This objective is very similar to some of the examples described in the previous subsection. Nevertheless, it also exhibits features not found in those examples. The BGen-Web application's objectives can be summarised in the following points:
• The application should run in real-world environments, both indoor and outdoor.
In the latter case, as there is no all-terrain vehicle available, only the on-road case is considered.
• The intended use is exploration and surveillance of scarcely known or completely unknown environments.
• The system should be built on top of the BGA architecture and thus use all the resources from the software already developed.
• The user’s experience should be as highly interactive as possible, limited only by the network bandwidth requirements.
• The user’s degree of control should be profile dependent, from operator closed-loop control (mostly for data-gathering use) to operator supervisory control (for regular remote operators).

A major premise is that neither the robot nor the remote operators know the environment in advance. In some sense, both the NASA WITS and EPFL KhepOnTheWeb systems can be considered to belong to the same class of applications. In both cases, the first time the operator connects to the system, he does not know anything about the environment and has to rely on the robot’s sensors to get a model of the world. The key difference is that in BGen-Web the user himself does not build a model of the environment but uses the model provided by the robot. Thus, in BGen-Web the robot is provided with range sensors to construct a model of the environment and positioning sensors to locate the robot within the model. In addition, the robot has a camera to provide the operator with video feedback. Although the operator can be allowed to close the control loop, the operator typically only sends desired goal locations, relative to the robot’s initial position, and then the robot moves autonomously to that place. While the robot is moving, it constructs a map of the environment and computes the path to reach the goal, without colliding with obstacles. Hence, BGen-Web makes full use of the characteristics of BGen and the sample agent architecture developed for the Quaky-Ant robot.
Another important aspect of BGen-Web is the task granularity. KhepOnTheWeb provides an interface for control at the lowest level, i.e., direct control of the motors. WITS provides control commands at a higher level, for instance, positioning the robot at a given heading and distance. Xavier provides control commands at the task level, for instance, going to a previously known room. BGen-Web provides commands at an intermediate level between WITS and Xavier. Thus, BGen-Web achieves interactivity levels beyond those of KhepOnTheWeb (much more information is provided to the operator) and of WITS and Xavier (they do not feel interactive). Additionally, BGen-Web, like WITS and Xavier, operates in real world environments. Nevertheless, BGen-Web operation time and distance are somewhat limited because it incorporates localisation methods that degrade noticeably with time and distance. In any case, as was stated at the beginning of the section, the system is intended to be a proof-of-concept and by no means a robust and readily deployable system.

8.3.2 BGen-Web architecture

As shown in the previous examples (Telerobot, KhepOnTheWeb, and Xavier), HTML-based interfaces are not the answer for highly interactive systems because HTML demands a lot of processing on the server side. As WITS demonstrates, the way to accomplish the objective is to increase the processing on the client side, and thus Java technology provides a multi-platform and easily deployable framework to achieve high performance interfaces (in the multi-platform context). As the whole BGA and BGL implementations are already based on Java, it is a very simple process to incorporate previous developments in a web-based architecture. Although the BGen-Web architecture is not robot dependent, the current implementation relies on the Quaky-Ant robot (see chapter 2).
Among other elements, it has a series of sonar and infrared sensors, a video camera, a radio modem for bidirectional data transmission and a 2.4 GHz video emitter. As in WITS, the BGen-Web architecture (Figure 8.9) uses HTML as the means of distributing software and configuration files, and launching the Java applet. The clients provide interfaces to sensor data, video feedback, and robot control. The web server provides the software repository and a gateway to the BGA architecture protocol (KQML). The video server provides the real-time video. Both servers can run physically on the same machine. The BGA gateway acts as a proxy: it receives requests from the different clients and puts the information onto the blackboard, then it reads sensor data from the blackboard and sends the information to the different clients.

Figure 8.9: BGen-Web system architecture.

The real-time video feedback has been developed using the Java Media Framework (JMF), which is an API that allows incorporating time-based media types in Java applications and applets. It allows programmers to develop software that presents, captures (using video cameras, microphones, etc.), and stores such media, and also controls the kind of processing applied to the multimedia flows. Moreover, JMF offers RTP support to be used with on-demand audio and video applications or interactive videoconferencing applications. The Real-Time Transport Protocol (RTP) is a protocol that provides end-to-end real-time data transmission. It is the Internet standard for audio and video transport, as defined in the IETF (Internet Engineering Task Force) RFC 1889, developed by the AVT (Audio Video Transport) working group. RTP by itself does not offer on-time delivery of packets or other quality of service (QoS) guarantees. RTP is therefore complemented with the Real-Time Control Protocol (RTCP), which allows monitoring the quality of data distribution and controlling and identifying RTP transmissions.
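To make the packet layout concrete, the following sketch decodes the fixed 12-byte RTP header defined in RFC 1889. It is an illustrative stand-alone class, not part of the BGen-Web sources (JMF performs this parsing internally), and the class and field names are this author's choices:

```java
// Sketch: decoding the fixed 12-byte RTP header (RFC 1889).
// Illustrative only; field names follow the RFC.
public class RtpHeader {
    public final int version;      // 2 bits
    public final boolean marker;   // 1 bit
    public final int payloadType;  // 7 bits: index into a payload format profile (RFC 1890)
    public final int sequence;     // 16 bits
    public final long timestamp;   // 32 bits
    public final long ssrc;        // 32 bits: synchronisation source identifier

    public RtpHeader(byte[] p) {
        version     = (p[0] >> 6) & 0x03;
        marker      = ((p[1] >> 7) & 0x01) != 0;
        payloadType = p[1] & 0x7F;              // the 7-bit payload-type field
        sequence    = ((p[2] & 0xFF) << 8) | (p[3] & 0xFF);
        timestamp   = ((long)(p[4] & 0xFF) << 24) | ((p[5] & 0xFF) << 16)
                    | ((p[6] & 0xFF) << 8)  | (p[7] & 0xFF);
        ssrc        = ((long)(p[8] & 0xFF) << 24) | ((p[9] & 0xFF) << 16)
                    | ((p[10] & 0xFF) << 8) | (p[11] & 0xFF);
    }
}
```

In the static RFC 1890 profile, payload type 31 identifies H.261 video, the format used below.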
An RTP stream is a series of UDP packets which have their origin in a given source and can be sent to a multicast address, thus producing a multi-receiver transmission which makes a scalable use of the network. Each packet has two parts: a header structure and a payload. There is a 7-bit payload-type identification field in the header. Its value is an index into a payload format profile. The mappings for audio and video payloads are specified in RFC 1890. The media type used for video transmission in BGen-Web is the H.261 ITU recommendation (Tekalp, 1995). It is a video compression standard developed to support ISDN videoconferencing services at n × 64 kbps, with n = 1 … 30 (n is the number of ISDN channels). The H.261 standard presents two important features: it specifies a maximum coding delay of 150 ms (with a greater delay the user does not perceive the visual feedback as interactive), and the codecs can be implemented in low-cost VLSI hardware (this is very important for standardising the format in videoconferencing equipment). The input image format is specified using the Common Intermediate Format (CIF). For low bit-rate applications there is also a smaller format (QCIF) which is a quarter of the CIF. The CIF and QCIF formats are specified as follows (Table 8.1):

                                   CIF                  QCIF
  Number of pixels / line
    Luminance (Y)                  360 (352)            180 (176)
    Chrominance (U, V)             180 (176)            90 (88)
  Number of lines / image
    Luminance (Y)                  288                  144
    Chrominance (U, V)             144                  72
  Interlacing                      1:1                  1:1
  Temporal ratio (frames/s)        30, 15, 10 or 7.5    30, 15, 10 or 7.5
  Aspect ratio                     4:3                  4:3

Table 8.1: H.261 CIF and QCIF image formats.

At 30 frames/s, the CIF data rate is 37.3 Mbps and the QCIF data rate is 9.35 Mbps. With 10 frames/s QCIF images, a 48:1 compression is required to use only one 64 kbps channel. CIF images should only be used with six or more 64 kbps channels.
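The data rates quoted above can be checked with a short calculation, assuming 8 bits per pixel of uncompressed video and the 360/180-pixel line widths of Table 8.1 (with chrominance at half resolution in both dimensions). This check is illustrative and not part of the BGen-Web sources:

```java
// Sanity check of the H.261 data rates quoted in the text, assuming
// 8 bits per pixel of uncompressed video and the Table 8.1 line widths.
public class H261Rates {
    static long bitsPerFrame(int lumW, int lumH) {
        // one luminance plane plus two chrominance planes at half
        // resolution in both dimensions (Table 8.1), 8 bits per pixel
        return (long)(lumW * lumH + 2 * (lumW / 2) * (lumH / 2)) * 8;
    }

    public static void main(String[] args) {
        double cifMbps  = bitsPerFrame(360, 288) * 30 / 1e6;   // ≈ 37.3 Mbps at 30 frames/s
        double qcifMbps = bitsPerFrame(180, 144) * 30 / 1e6;   // ≈ 9.3 Mbps at 30 frames/s
        double ratio    = bitsPerFrame(180, 144) * 10 / 64e3;  // ≈ 48:1 for one 64 kbps channel
        System.out.printf("CIF %.1f Mbps, QCIF %.2f Mbps, QCIF@10fps ratio %.0f:1%n",
                          cifMbps, qcifMbps, ratio);
    }
}
```

The computed values (37.3 Mbps, 9.33 Mbps, 48.6:1) match the figures in the text to rounding.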
In the current BGen-Web implementation, the video server sends H.261 QCIF images at 7.5 frames/s, which produces enough video feedback quality over the Internet. The operation method in BGen-Web is very simple. When the applet (Figure 8.10) starts up, it connects to both the BGA proxy (using a unicast address) and the video server (using a multicast address), and starts receiving sensor data and real-time video feedback. The applet has some widgets to show the current sensor readings (sonar, infrared, virtual sensors, bumpers, compass, odometry and GPS), the video feedback and the current fuzzy grid map, and to drive the robot. At start-up the map is empty, and it is updated while the robot moves. The operator in supervisory control mode can choose the desired goal point by clicking on the map. The robot runs a BG program which tries to reach the current desired goal, and when it reaches the goal, it stays still. The operator can choose a different goal point at any time. In this way, the typical operator can explore unknown environments. In addition, for data acquisition purposes, the operator can also work in human closed-loop control mode using the steer and speed sliders. This option is only available to a certain set of operators whose machine identifiers are stored in the BGA proxy. Thus, whenever a steering/driving command arrives at the proxy, its source address is compared to the allowed list, and if it is not found, the command is not forwarded to the robot. The BG program implements an architecture similar to that described in chapter 6, which is composed of three agents:

• Reactive behaviours. This agent contains the same set and implementation of behaviours used for autonomous robot navigation. These include obstacle avoidance and goal seeking behaviours.
• Navigation. This agent builds a fuzzy grid map and computes robot paths using the A* path planning algorithm.
• Task supervision. This agent is an FSM which supervises the current goal.
When the robot reaches the desired goal position, this agent stops the robot until a new goal position is available.

Figure 8.10: The BGen-Web teleoperation applet.

8.3.3 Experiments and results

The BGen-Web system has been tried and tested in different places and configurations. With this proof-of-concept application the service is not available all the time; it is set up only for exhibitions or tests, because the hardware is shared with other experiments. A typical set-up (Figure 8.11), mainly used for exhibitions, is as follows: the Quaky-Ant robot is used in its standard hardware (see chapter 2) and software (see chapter 6) configurations, with a modified task supervision FSM. The robot uses a bidirectional radio modem link to send sensor data and receive commands (usually goal points), and a 2.4 GHz video emitter to send video feedback. The analogue video is digitised and streamed by a PC running the video server. The radio modem is connected to a PC, which manages the PPP link and runs the BGA proxy. Another PC runs the web-based client to monitor sensors, get video feedback and send desired goal points.

Figure 8.11: Experimental set-up.

A sequence of images from a teleoperation run is shown below (Figure 8.12). The robot is commanded to reach the position occupied by the public. The robot tries to get to that position twice, but cannot occupy it because there is an obstacle. Finally it is ordered to reach a position near the exhibition table, where it stays still.

Figure 8.12: Sequence of a Quaky teleoperation run.

8.4 Conclusions

BGen-Web is a proof-of-concept web-robotics exploration application.
The BGen-Web system provides a web-based service for highly interactive supervisory control of a mobile robot to perform exploration of scarcely known, or completely unknown, environments, and as such it has been successfully tested in real world environments, both indoor and outdoor. This system has been built on top of the BGA architecture in order to use all the resources from the already developed software. Using sensor data and video feedback in real-time, the operator’s experience is highly interactive when controlling a robot over the Internet (it is only limited by network bandwidth requirements). This is an important aspect which distinguishes it from other web robotics applications. Another important aspect of BGen-Web is the task granularity. Some systems provide interfaces to control at the lowest level (for instance, direct control of the motors), while others provide control commands at a higher level (for instance, positioning the robot in a given previously known place). BGen-Web provides commands at an intermediate level: the user chooses the desired position and the robot tries to reach it. Thus, BGen-Web shows interactivity levels beyond those of KhepOnTheWeb (much more information is provided to the operator), and of WITS and Xavier (they do not feel interactive). Additionally, BGen-Web operates in real world environments, as do WITS and Xavier.

Chapter 9

Conclusions and Future Work

9.1 Conclusions

This thesis presents a series of mobile robotics related works, which cover the different layers needed to have a mobile robot perform tasks autonomously in an a priori unknown environment. Each layer provides a series of capabilities that are used by upper level layers. These layers are the physical platform, the sensor fusion layer, the navigation layer, the control architecture and language layer, and the application layer.
The following subsections summarise the results and the contributions achieved in each layer.

9.1.1 The Quaky-Ant mobile robot

The Quaky-Ant is a simple but powerful design that features a flexible and modular architecture. When measuring cost, performance and expandability, Quaky-Ant compares favourably with commercial robots. The main characteristics of this robot can be summarised as:

• It features an open architecture. Two key points are the use of a standard I2C bus for the interconnection of different modules and the definition of a generic open protocol.
• One consequence of the design of the architecture is its robustness (mainly due to the custom protocols) and its expandability (all the modules are addressed the same way irrespective of their nature).
• The architecture is not tied to any control architecture or development system, although it serves as the reference point of the control architecture developed.
• It operates smoothly and correctly in indoor environments and is also capable of operating in non-rough outdoor environments. Either way, it does not exhibit an outstanding top speed (approximately 0.45 m/s) because this was not a design concern. In contrast, it exhibits manoeuvrability and precision.

9.1.2 Sensor fusion

Two different methods have been developed for solving the sensor fusion problem in a priori unknown environments. In this problem, model based techniques that make use of a priori information are not applicable, and thus neural networks and fuzzy rule based systems are used instead. These fusion techniques require some examples for the learning stage. Obtaining such examples from real sensors is an extremely complex problem. The approach followed is to use examples obtained using a simulator. For this reason, models of sonar and infrared sensors have been evaluated and integrated in a simulation environment, which is also used for sensor fusion validation.
The sonar model is based on the sonar-tracing technique, which has been modified to take into account the sonar lobe effects, and the infrared model has been obtained by linearising experimental values. Once the neural network and fuzzy rules fusion methods have been trained, they are tested on different simulated environments, ranging from a small to a large environment. The fusion methods are then compared with raw sensor data and with a simple and heuristic rule based system. The results show that in most cases the proposed fusion methods perform better than the raw sensors and simple rules. In addition, the fuzzy rule system obtained using fuzzy tabu clustering has the best overall performance of all the different fusion methods tested. Fuzzy tabu clustering is an algorithm based on the fuzzy c-means algorithm, but with the introduction of a global minimum search by means of a tabu search technique. The algorithm has also been successfully applied to different examples, showing results comparable with those of the well-known fuzzy clustering algorithms available in the literature.

9.1.3 Navigation

Two navigation schemes have been developed to allow the Quaky-Ant robot to navigate in a priori unknown environments. The first procedure relies on the assumption of reliable odometry for short time and distance operations. Bearing this assumption in mind, the robot is provided with a map building method and a path planning algorithm, which are run simultaneously; thus the robot is required to build the map while it is moving, resulting in a continuous movement. This approach is based on a fuzzy grid map for map-building and an A* based algorithm for path-planning. The fuzzy grid map has been modified to accomplish the required task: the virtual sensors are used instead of sonars, the initialisation procedure has been changed, and some heuristics have been incorporated to modify the certainty of visited cells.
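As an illustration of the kind of structure described above, the following sketch shows a minimal fuzzy grid map: each cell keeps a membership value in [0, 1] of being occupied, updated from (virtual) sensor evidence, with a heuristic that lowers the certainty of visited cells. The update rule, initialisation and names here are hypothetical simplifications, not the actual BGen implementation:

```java
// Illustrative sketch of a fuzzy occupancy grid: each cell holds a
// membership value in [0, 1] of "cell is occupied". The actual BGen
// update rules and initialisation differ; names are hypothetical.
public class FuzzyGridMap {
    private final double[][] occ;   // membership of "cell is occupied"

    public FuzzyGridMap(int width, int height, double prior) {
        occ = new double[width][height];
        for (double[] column : occ) java.util.Arrays.fill(column, prior);
    }

    /** Blend new sensor evidence into a cell with learning rate alpha. */
    public void update(int x, int y, double evidence, double alpha) {
        occ[x][y] = (1 - alpha) * occ[x][y] + alpha * evidence;
    }

    /** Heuristic from the text: a traversed cell is very likely free. */
    public void markVisited(int x, int y) {
        occ[x][y] = Math.min(occ[x][y], 0.1);
    }

    public double occupancy(int x, int y) { return occ[x][y]; }
}
```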
The A* based planner has been defined with a new evaluation function to generate paths that leave enough space between the robot and the obstacles, and also to be efficient in time, so that the node evaluation is fast enough to run the planner at a high frequency. The navigation scheme has been successfully used in different scenarios, both simulated and real world. The second procedure overcomes the limitation of the previous approach by adding a localisation method. This is accomplished by building fuzzy segment maps to model the previously visited environment and the currently sensed environment. The transformations needed to match these maps are used to correct the current robot’s position. The performance of the localisation method is greatly affected by the number and quality of the fuzzy segments. The segment generation procedure has been modified to make use of a sensor buffer, which produces more consistent segments. Although the motivation behind the navigation procedure was to make the overall work of the system more robust, it also proved to be capable of solving the SLAM (Simultaneous Localisation And Map-building) problem in the different tests carried out, both simulated and real world. More tests are still needed to verify in which situations or environments this set-up is valid for the SLAM problem.

9.1.4 Control architecture and programming language

The BGA architecture relies on and takes advantage of various well-developed concepts. It is based on a blackboard architecture that coordinates a multiagent system. Agents can be layered and distributed according to their functionalities, including three-layer hybrid architecture schemas, where the behaviour coordination is accomplished by a blending method derived from context-dependent blending.
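A minimal sketch may help to fix the idea of context-dependent blending, the coordination scheme from which the BGA blending mechanism derives: each behaviour proposes a command, its context yields an activation weight, and the blender outputs the weighted average. The interfaces and the plain weighted average shown here are hypothetical simplifications, not the actual BG constructs:

```java
// Minimal sketch of context-dependent blending of behaviours.
// Hypothetical interfaces, not the actual BG language constructs.
import java.util.List;

interface Behaviour {
    double propose(double[] sensors);     // proposed command (e.g. steering)
    double activation(double[] sensors);  // context-dependent weight in [0, 1]
}

class Blender {
    /** Weighted average of the commands proposed by the active behaviours. */
    double blend(List<Behaviour> behaviours, double[] sensors) {
        double num = 0, den = 0;
        for (Behaviour b : behaviours) {
            double w = b.activation(sensors);
            num += w * b.propose(sensors);
            den += w;
        }
        return den > 0 ? num / den : 0;   // no active behaviour: neutral command
    }
}
```

In the real architecture the activation weights come from fuzzy context rules, which is what allows expert knowledge to be injected and the fusion rule base to be learned, as discussed next.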
The main contributions of the BGA architecture are the following:

• BGA effectively groups different concepts and paradigms of control architectures (namely blackboard architectures, hybrid architectures, and multi-agent systems), extending and combining their characteristics.
• BGA incorporates a new blending mechanism, based on context-dependent blending, that effectively addresses the behaviour coordination problem. In addition, its design allows the use of expert knowledge during the process of learning the fuzzy fusion rule base.
• BGA makes use of KQML, which is typical of message-passing systems but is not common in blackboard systems, implemented over UDP for performance.

The BG language is a high-level robot programming language that fully supports the BGA architecture and thus allows the user to develop BGA based control systems in a friendly but powerful way. BG aids in the development cycle of robotics applications by providing a single uniform framework to declare and specify all the elements of the BGA architecture (behaviours, tasks, blenders and agents), and by supporting behaviour modelling via extensive use of fuzzy logic. In addition, a fully distributed BGA architecture can be achieved by running BG programs in different processors. The BGen development environment incorporates a BG language interpreter, as well as integrating a simulator for different customised holonomic robots, data visualisation and recording tools, and also a behaviour learning tool. The BGen environment is intended both for development and deployment, and provides a simple generic way to develop and test BG programs off-robot, and to execute such programs in a real robot. Thus, the development to deployment time is considerably reduced, and different platforms can reuse the code.
A series of experiments has shown that BGen, which includes the BG language and the underlying BGA architecture, provides a good working environment for mobile robotics related problems. Some conclusions have been obtained:

• BG programs are simple and easy to understand. Without knowing in advance what a BG program does, it is easy to figure out the purpose of the program, although of course this also depends on how well written the program is. In any case, although the tasks involved in achieving a robot’s goal are complex, the BG language hides the details and allows the robot programmer to concentrate on the application domain.
• All the specific techniques for map-building, localisation, path-planning, etc. are available directly from the language, while new methods can be added in the future. This favours and helps in a quick development of robot programs as well as in their maintenance.
• The BGen environment is open and portable. While all the development of the BGen environment has been carried out at the University of Murcia (Spain) in parallel to the development of the Quaky-Ant robot, one of the experiments shows the performance of a completely different robot. The porting and interfacing of the BGen environment to the new system and the rewriting of the BG program to account for the new robot geometry and speed took less than two weeks, when the author was at the University of Örebro (Sweden). It is important to note also that the environment was not completely tried in a real robot. These facts support that the BGen environment is flexible and easy to port to other platforms.
• Java-based robotics environments are practical and usable. It is often argued that Java programs are extremely slow. While it is true that Java just-in-time compiled programs are a little bit slower than native ones, it is still possible to use Java in most robotics applications. Moreover, the BG interpreter runs on top of a Java virtual machine.
Thus, BG programs are interpreted twice, but this is not a problem because most computing-intensive techniques run directly as Java libraries (map-building, localisation and path-planning). The added value of the BGen system is again its portability and reuse of code at two different levels: at a system level, reusing the whole set of BGen libraries, and at an application level, reusing BG programs for different robot platforms.
• The BGen environment is expandable. One of the experiments shows that new techniques or modules can be incorporated into the BGA architecture. In this case the topology-based planner is not a Java program but a C/C++ program interfaced by way of the Java Native Interface. In this way new services or legacy applications can be added.

A genetic-based method has been developed to assist in the development of behaviour fusion fuzzy rule bases. The algorithm is used to obtain or learn a new fusion rule set from an initial set of rules provided by the user (this may be the empty set). The genetic algorithm has been adapted to encode individuals which are converted to BG fusion fuzzy rule bases. Thus, these individuals can be standard MIMO fuzzy rules and also background fuzzy rules. Moreover, the fitness function has been defined to include the effects of the experimental evaluation of the robot’s behaviour by using simulation. This function takes into account different parameters which measure the robot’s performance in both successful and unsuccessful runs. This point is crucial in the initial part of the learning procedure, because most of the individuals do not reach the goal, and it is very important to discriminate between them to speed up the learning process. Different experiments have been carried out to test the influence of the behaviour weights and the results of the learning process.
As has been shown, behaviour weights can aid the learning process by narrowing the search space through the use of the user’s knowledge of the domain. Moreover, the use of the learning process improves the performance of the robot with respect to user-defined fusion rules.

9.1.5 Web robotics application

BGen-Web is a proof-of-concept web-robotics exploration application. The BGen-Web system provides a web-based service for highly interactive supervisory control of a mobile robot to perform exploration of scarcely known, or completely unknown, environments, and as such it has been successfully tested in real world environments, both indoor and outdoor. This system has been built on top of the BGA architecture in order to use all the resources from the already developed software. Using sensor data and video feedback in real-time, the operator’s experience is highly interactive when controlling a robot over the Internet (it is only limited by network bandwidth requirements). This is an important aspect which distinguishes it from other web robotics applications. Another important aspect of BGen-Web is the task granularity. Some systems provide interfaces to control at the lowest level (for instance, direct control of the motors), while others provide control commands at a higher level (for instance, positioning the robot in a given previously known place). BGen-Web provides commands at an intermediate level: the user chooses the desired position and the robot tries to reach it. Thus, BGen-Web shows interactivity levels beyond those of other systems, which provide much less information to the operator and do not feel interactive. Additionally, BGen-Web operates in real world environments, as do WITS and Xavier.

9.2 Future work

This thesis presents solutions in different areas to achieve a distributed architecture for intelligent control in autonomous mobile robots. From this point, new lines and areas of work arise for the improvement of the different components.
The sensor fusion approaches have shown good performance, especially the hybrid neuro-fuzzy approach, which from a qualitative point of view produces fair results in the real world. The experimental setup needed to extend these tests, in order to include real world data and obtain quantitative results, is very difficult. Moreover, real world data acquisition is quite complex and has not yet been performed. This could also improve the results. The navigation problem has been successfully solved for most situations, but there are still some points that can be improved. The fuzzy grid map assumes perfect odometry, and cells are updated based on the current localisation. If this map is used in conjunction with the fuzzy segments map, the quality of the results depends heavily on the update frequency of the localisation. The uncertainty in the position is not forwarded to the fuzzy map and, clearly, this could be an improvement. In addition, the fuzzy segments localisation suffers from the initial localisation problem. If the tracking of the current environment is lost for any reason, the initial localisation may fail in highly symmetrical environments, like those found in buildings with corridors. In this case, a simple geometrical matching does not suffice, and it remains an open problem. The BGA architecture is flexible enough to approach different problems and tasks, like those presented in this thesis. Although this architecture is not restricted to single robots, some additional structures are needed to use BGA in a multi robot scenario. A possible improvement is the inclusion of a message passing facility to communicate between the different robots, while keeping the current blackboard model for intra robot communication. In this context, the Java-based JINI14 network technology is an interesting framework for establishing communications between the different agents of the architecture.
It provides simple mechanisms which enable agents to plug together to form a community without any planning, installation or human intervention. Additionally, the BG language should be modified to incorporate the new control structures needed for communicating with other robots. The BG language, at its beginning, was designed to use fuzzy grid maps and A* path planning. During the work of this thesis new methods and techniques were added, but they are not visible from the BG language. A clear improvement could be the use of generic navigation structures (map-building, path-planning and localisation) which are instantiated at run-time using a configuration file. In this way, the control program simply sends update commands to these structures and receives the corresponding feedback using a generic interface. From a readability point of view, the language could also be extended to support more data types (booleans, integers and data structures). From a performance point of view, the current BG interpreter is very flexible and fast, but if BG programs were compiled to Java, an important speed-up could be achieved at the expense of a more complex program preparation (program modification on the fly would no longer be supported). The BGen architecture supports the Quaky-Ant and Nomad 200 robots, and it is currently being updated to support other commercial robots, like the RWI B21. The genetic-based approach for behaviour fusion has been shown to produce fair results. Its disadvantage is the time required to obtain a good solution. The problem is twofold. First, different scenarios are needed to discard random search and to test different abilities. If there are many scenarios, or their dimensions are large, the algorithm spends too much time simulating the robot. In this case, the use of a parallelised or distributed version of the algorithm could noticeably improve the time needed to obtain a solution.

14 http://www.sun.com/jini
Additionally, more termination conditions should be incorporated to avoid wasting time on clearly useless solutions. The BGen-Web web-robotics application has been shown to be very effective. It presents a high degree of interactivity, mostly due to the video feedback, and has many interesting real world applications, like semi-autonomous exploration. One problem with the current approach is that it uses multicast. Usually, only some Internet Service Providers (ISPs) provide or allow the routing of multicast packets. This clearly limits the number of potential users. An improvement could consist of the addition of an on-demand video server to allow non-multicast users to receive video feedback, at the expense of a greater network load. Another problem is security. Currently, the access control is performed based on IP addresses and no encryption is performed. A possible improvement could be the use of a Java implementation of SSL15 (Alan et al., 1996) to implement certificate-based authentication and encryption. Finally, if JINI technology is incorporated in the BGA architecture, the BGen-Web application could benefit from some added value services, like automatic discovery of BGen-Web servers. The BGen environment could also incorporate new technologies for distributed simulation. First, the current user interface includes a two-dimensional representation of the world. The environment could benefit from the use of web-based three-dimensional visualisation technologies, like VRML16 (Roehl et al., 1997) or Java3D (Brown and Petersen, 1999). These allow a web browser to display three-dimensional contents generated from a Java application. Most web browsers support VRML, while currently only Windows and Solaris based browsers support Java3D. However, this should change, since Sun is developing Java3D on top of OpenGL.
Additionally, the use of the IEEE 1278.1 standard, the Distributed Interactive Simulation17 (DIS) protocol, could allow the use of a more elaborate simulation environment organised in a distributed framework. In this case, such a framework could be a cornerstone for integrating a non-monolithic multi-robot simulation. Moreover, Java implementations of the DIS protocol are available.

15 http://jcewww.iaik.at
16 http://www.vrml.org
17 http://standards.ieee.org/catalog/simint.html

Appendix A
Commercial Mobile Robots: a Survey

A.1 IS Robotics

Real World Interface18 (RWI), a division of IS Robotics, manufactures mobile robots and develops robot programming environments. In the medium size segment they have the following models (Table A.1): B14r, Magellan and microATRV. They are built around the rFLEX™ robot control architecture, which guarantees time-synchronised sensor and encoder readings, providing flexibility and responsiveness in robot control. The sensing bus is based on the FARnet™ logical ring architecture, which permits simple hub-and-node configurations, allows the use of standardised components, and provides low network latency. The idea behind the rFLEX architecture is to speed up the development and modification of robot research projects and applications. The rFLEX architecture is accessed through the Mobility development environment.
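As a rough illustration of what DIS interoperability involves, every DIS Protocol Data Unit starts with a fixed 12-byte header (protocol version, exercise ID, PDU type, protocol family, timestamp, length and padding), serialised in network byte order. The sketch below encodes such a header with `java.nio.ByteBuffer`; the field values used are illustrative only, not taken from a real exercise:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: encode the fixed 12-byte PDU header of IEEE 1278.1 (DIS).
public class DisHeader {
    static byte[] encode(int version, int exerciseId, int pduType,
                         int family, long timestamp, int length) {
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.BIG_ENDIAN);
        buf.put((byte) version);      // protocol version (e.g. 5 for 1278.1-1995)
        buf.put((byte) exerciseId);   // exercise identifier
        buf.put((byte) pduType);      // e.g. 1 = Entity State PDU
        buf.put((byte) family);       // e.g. 1 = Entity Information family
        buf.putInt((int) timestamp);  // DIS timestamp
        buf.putShort((short) length); // total PDU length in bytes
        buf.putShort((short) 0);      // padding
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] header = DisHeader.encode(5, 1, 1, 1, 0L, 144);
        System.out.println(header.length); // prints 12
    }
}
```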
B14r
  Characteristics: Diameter: 32 cm; Height: 60 cm; Weight: 27.2 Kg; Payload: 20 Kg; Ground clearance: 0.65 cm; CPU: Pentium 233 MHz and rFLEX
  Sensors: 16 sonar sensors; 16 infrared sensors; 22 tactile sensors. Optional: vision system, compass
  Actuators: 3 wheels; 2 24V DC motors; Speed: 0.9 m/s; Drive: synchronised

Magellan
  Characteristics: Diameter: 36.8 cm; Height: 19.8 cm; Weight: 13.2 Kg; Payload: 9.1 Kg; CPU: rFLEX
  Sensors: 16 sonar sensors; 16 infrared sensors; 16 tactile sensors. Optional: vision system, laser range finder, compass
  Actuators: 2 wheels; 2 24V DC motors; Speed: 2.5 m/s; Drive: differential

microATRV
  Characteristics: Length: 42 cm; Height: 32 cm; Weight: 11.5 Kg; Payload: 5 Kg; Ground clearance: 5.1 cm; CPU: rFLEX
  Sensors: 17 sonar sensors. Optional: vision system, compass, inertial navigation, laser range finder, GPS
  Actuators: 4 wheels; 2 24V DC motors; Speed: 1.0 m/s; Drive: differential

Table A.1: IS Robotics medium size robots

18 http://www.rwii.com

A.2 ActivMedia Robotics

ActivMedia Robotics19 manufactures mobile robots. In the medium size segment they have the following models (Table A.2): Pioneer-2 DX and AmigoBot. They implement a custom sensing bus, which is accessed through the Saphira development environment and the COLBERT and AmigoEYES programming languages.
Pioneer-2 DX
  Characteristics: Length: 44 cm; Width: 38 cm; Height: 22 cm; Weight: 9 Kg; Payload: 23 Kg; Ground clearance: 6 cm; CPU: Siemens C166
  Sensors: 8 sonar sensors. Optional: 8 additional sonar sensors, vision system, compass, tactile sensors, laser range finder, GPS
  Actuators: 2 wheels; 2 12V DC motors; Speed: 1.6 m/s; Drive: differential

AmigoBot
  Characteristics: Length: 33 cm; Width: 28 cm; Height: 15 cm; Weight: 3.6 Kg; Payload: 1 Kg; Ground clearance: 3 cm; CPU: Hitachi H8S; Price: $1,800
  Sensors: 10 sonar sensors. Optional: vision system
  Actuators: 2 wheels; 2 12V DC motors; Speed: 1.0 m/s; Drive: differential

Table A.2: ActivMedia Robotics medium size robots

19 http://www.activmedia.com

A.3 Nomadics Technologies

Nomadics Technologies20 manufactures mobile robots and develops robot programming environments. In the medium size segment they have the following models (Table A.3): Nomad Scout II and Nomad Super Scout II. They implement a custom sensing bus, which is accessed through the XRDev development environment.

Scout II
  Characteristics: Diameter: 41 cm; Height: 35 cm; Weight: 23 Kg; Payload: 5 Kg; Ground clearance: 1.5 cm; CPU: MC68332 16 MHz
  Sensors: 16 sonar sensors; 6 tactile sensors. No optional sensors
  Actuators: 2 wheels; 2 12V DC motors; Speed: 1.0 m/s; Drive: differential

Super Scout II
  Characteristics: Diameter: 41 cm; Height: 35 cm; Weight: 23 Kg; Payload: 5 Kg; Ground clearance: 1.5 cm; CPU: Pentium 233 MHz
  Sensors: 16 sonar sensors; 6 tactile sensors. No optional sensors
  Actuators: 2 wheels; 2 12V DC motors; Speed: 1.0 m/s; Drive: differential

Table A.3: Nomadics Technologies medium size robots

A.4 K-Team

K-Team21 manufactures small mobile robots. In the medium size segment they have the following model (Table A.4): Koala. It implements a custom sensing bus, which is accessed through plug-in modules for Matlab and LabView or via cross-compiling in C.
20 http://www.robots.com
21 http://www.k-team.com

Koala
  Characteristics: Length: 32 cm; Width: 32 cm; Height: 20 cm; Weight: 3 Kg; CPU: MC68331 22 MHz
  Sensors: 16 infrared sensors. Optional: 4 triangulation infrared sensors
  Actuators: 6 wheels; 2 DC motors; Drive: differential

Table A.4: K-Team medium size robot

A.5 Robosoft Service Robotics

Robosoft Service Robotics22 manufactures and resells mobile robots and manipulators. In the medium size segment they have the following model (Table A.5): Robulab-III. It implements a custom sensing bus, which is accessed by programming the ASDP libraries in C.

Robulab-III
  Characteristics: Diameter: 48 cm; Height: 47 cm; Weight: 50 Kg; Payload: 30 Kg; CPU: MC68040 25 MHz
  Sensors: 6 sonar sensors; 6 infrared sensors; tactile sensors. Optional: video system
  Actuators: 2 wheels; 2 24V DC motors; Speed: 1.8 m/s; Drive: differential

Table A.5: Robosoft medium size robot

A.6 Applied AI Systems

Applied AI Systems23 manufactures and resells mobile robots and manipulators. In the medium size segment they have the following models (Table A.6): Gaia-1a and Labo-3. They implement a custom sensing bus, which is accessed through cross-compiling in C.

22 http://www.robosoft.fr
23 http://www.aai.ca

Gaia-1a
  Characteristics: Length: 53 cm; Width: 36 cm; Height: 26 cm; Weight: 26 Kg; Payload: 10 Kg; CPU: MC68332 16 MHz; Price: $13,000
  Sensors: 7 sonar sensors; 4 infrared sensors. Optional: video system
  Actuators: 4 wheels; 2 12V DC motors; Speed: 0.8 m/s; Drive: differential

Labo-3
  Characteristics: Length: 46 cm; Width: 34 cm; Height: 38 cm; Weight: 25 Kg; Payload: 30 Kg; CPU: MC68332 16 MHz; Price: $7,500
  Sensors: 7 sonar sensors; 4 infrared sensors. Optional: video system
  Actuators: 2 wheels; 2 12V DC motors; Speed: 1.8 m/s; Drive: differential

Table A.6: Applied AI Systems medium size robots

A.7 Probotics

Probotics24 manufactures low-cost mobile robots. In the medium size segment they have the following model (Table A.7): Cye-sr.
This robot is extremely low-cost, and uses only current sensors to detect whether any of its wheels is blocked (that is, whether the robot has collided). It provides a digital I/O port, and it is programmed using both Java libraries and the Map-N-Zap software.

Cye-sr
  Characteristics: Length: 25 cm; Width: 40 cm; Height: 12.5 cm; Weight: 4 Kg; Payload: 4.5 Kg; CPU: 16 bit, 16 MHz; Price: $695
  Sensors: 2 current sensors. No optional sensors
  Actuators: 2 wheels; 2 12V DC motors; Speed: 1.0 m/s; Drive: differential

Table A.7: Probotics medium size robot

24 http://www.personalrobotics.com

Appendix B
Obtaining the BGen Software

The BGen environment, some papers and other documents are available on-line for web download at the following address:

http://ants.dif.um.es/~humberto/robots

Additionally, the author can be contacted at the following address:

Humberto Martínez Barberá
Facultad de Informática
Universidad de Murcia
30100 Murcia (Spain)
Phone: +34-968.364666
Fax: +34-968.364151
E-Mail: [email protected]

References

Adams, D. (1979). The Hitchhiker’s Guide to the Galaxy, Pan Books, London
Agre, P.E. & Chapman, D. (1987). Pengi: an Implementation of a Theory of Activity, Proc. 6th National Conference on AI, Morgan Kaufman, pp 268-272
Aguilar, J.M. & Contreras, J.L. (1994). NAVITE: a Neural Network System for Sensory-Based Robot Navigation, World Congress on Neural Networks, New Jersey, USA, pp 177-182
Aguirre, E.; García Alegre, M.C. & González, A. (1998). A Fuzzy Safe Follow Wall Behaviour Fusing Simpler Fuzzy Behaviours, 3rd IFAC Symposium on Intelligent Autonomous Vehicles, Madrid, Spain, pp 607-612
Akatsuka, Y.; Goldberg, S.B. & Bekey, G.A. (1998). Digimuse: an Interactive Telerobotic System for Viewing of Three-Dimensional Art Objects, IEEE Intl. Conf. on Intelligent Robots and Systems, Victoria, Canada
Al-Sultan, K.S. (1995). A Tabu Search Approach to the Clustering Problem, Pattern Recognition, 28(2):1443-1451
Alan, A.O.; Freier, P. & Kocher, P.C. (1996).
The SSL Protocol Version 3.0, IETF Internet Draft
Albus, J.S.; Quintero, R. & Lumia, R. (1994). An Overview of NASREM: the NASA-NBS Standard Reference Model for Telerobot Control System Architecture, NIST Technical Report 5412, Gaithersburg
Andrews, J.R. & Hogan, N. (1983). Impedance Control as a Framework for Implementing Obstacle Avoidance in a Manipulator, in Control of Manufacturing Processes and Robotic Systems (D.E. Hardt and W. Book eds.), ASME, Boston, USA
Araújo, R. & Almeida, A. (1999). Learning Sensor-Based Navigation of a Real Mobile Robot in Unknown Worlds, IEEE Trans. on Systems, Man and Cybernetics, 29(2):164-178
Arkin, R. (1989). Motor-Schema-Based Mobile Robot Navigation, Intl. Jour. Robot. Res., 8(4):92-112
Arkin, R. (1990). Integrating Behavioural, Perceptual and World Knowledge in Reactive Navigation, Robotics and Autonomous Systems, 6(1):105-122
Armstrong, D.G. & Crane, C.D. (1998). A Modular, Scalable, Architecture for Intelligent, Autonomous Vehicles, 3rd IFAC Symposium on Intelligent Autonomous Vehicles, Universidad Carlos III, Madrid, Spain, pp 62-66
Arrúe, B.C.; Cuesta, F.; Braunstingl, B. & Ollero, A. (1997). Fuzzy Behaviour Combination to Control a Non Holonomic Mobile Robot using Virtual Perception Memory, 6th IEEE Intl. Conf. on Fuzzy Systems, Barcelona, Spain, pp 1239-1244
Austin, J.L. (1962). How to Do Things with Words, Harvard University Press, Cambridge
Backes, P. & Norris, J.S. (2000). Mars Rover Mission Distributed Operations, TMOD Technology and Science Programs News, June
Backes, P.; Tso, K.S.; Norris, J.S.; Tharp, G.K.; Slostad, J.T. & Bonitz, R.G. (2000a). Mars Polar Lander Mission Distributed Operations, IEEE Aerospace Conf., Big Sky, USA
Backes, P.; Tso, K.S.; Norris, J.S.; Tharp, G.K.; Slostad, J.T.; Bonitz, R.G. & Ali, K.S. (2000b). Internet-Based Operations for the Mars Polar Lander Mission, IEEE Intl. Conf. on Robotics and Automation, San Francisco, USA
Balch, T. (1998).
Behavioral Diversity in Learning Robot Teams, Ph.D. Thesis, Georgia Institute of Technology, USA
Balch, T. & Arkin, R.C. (1994). Communication in Reactive Multiagent Robotic Systems, Autonomous Robots, 1(1)
Balch, T. & Arkin, R.C. (1995). Motor Schema-Based Formation Control for Multiagent Robot Teams, 1st Intl. Conf. on Multi-Agent Systems (ICMAS-95), Menlo Park, USA, pp 10-16
Barman, R.A.; Kingdon, S.J.; Little, J.J.; Mackworth, A.K.; Pai, D.K.; Sahota, M.; Wilkinson, H. & Zhang, Y. (1993). DYNAMO: Real-Time Experiments with Multiple Mobile Robots, Intelligent Vehicles Symposium, Tokyo, Japan, pp 261-266
Barshan, B. & Kuc, R. (1990). Differentiating Sonar Reflections from Corners and Planes by Employing an Intelligent Sensor, IEEE Trans. on Pattern Analysis and Machine Intelligence, 12(6):560-569
Bayes, T. (1763). An Essay Towards Solving a Problem in the Doctrine of Chances, Philosophical Transactions of the Royal Society of London, 53(1):370-418
Benreguieg, M.; Maaref, H. & Barret, C. (1998). Navigation of an Autonomous Mobile Robot by Coordination of Behaviours, 3rd IFAC Symposium on Intelligent Autonomous Vehicles, Madrid, Spain, pp 589-594
Beom, H.R. & Cho, H.S. (1995). A Sensor-Based Navigation for a Mobile Robot using Fuzzy Logic and Reinforcement Learning, IEEE Trans. Systems Man and Cybernetics, 3(25):464-477
Berenji, H.; Chen, Y.Y.; Lee, C.C.; Jang, C.S. & Murugesan, S. (1990). A Hierarchical Approach to Designing Approximate Reasoning-Based Controllers for Dynamic Physical Systems, 6th Conf. on Uncertainty in Artificial Intelligence, Cambridge, USA
Berlanga, A.; Borrajo, D.; Fernández, F.; García-Martínez, R.; Molina, J.M. & Sanchís, A. (1999). Robótica Cognoscitiva y Aprendizaje Automático, VIII Conf. de la Asociación Española para la Inteligencia Artificial, Murcia, Spain, pp 1-8
Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, New York
Bezdek, J.C.; Tsao, E.C.-K. & Pal, N.R.
(1992). Fuzzy Kohonen Clustering Networks, IEEE Intl. Conf. on Fuzzy Systems, pp 1035-1043
Biethahn, J. & Nissen, V. (1995). Evolutionary Algorithms in Management Applications, Springer-Verlag, Berlin
Biris, E. & Shen, Q. (1997). Navigating Mobile Robot with a Fuzzy Logic Controller, 7th ISAF World Congress, Prague, Czech Republic, pp 345-350
Bonarini, A. & Basso, F. (1997). Learning to Compose Fuzzy Behaviours for Autonomous Agents, Intl. Journal of Approximate Reasoning, 17(1):409-432
Bonasso, R.P. (1991). Integrating Reaction Plans and Layered Competences through Synchronous Control, 12th Intl. Joint Conf. on Artificial Intelligence, San Francisco, USA, pp 1225-1231
Bonissone, P. (1997). Soft Computing: the Convergence of Emerging Reasoning Technologies, Soft Computing, 1(1):6-18
Borenstein, J.; Everett, H.R. & Feng, L. (1996). Navigating Mobile Robots: Systems and Techniques, A.K. Peters, Wellesley, MA
Borenstein, J. & Koren, Y. (1988). Obstacle Avoidance with Ultrasonic Sensors, IEEE J. Robotics and Automation, 4(1):213-218
Borenstein, J. & Koren, Y. (1989). Real-Time Obstacle Avoidance for Fast Mobile Robots, IEEE Trans. on Systems, Man and Cybernetics, 19(5):1179-1187
Borenstein, J. & Koren, Y. (1991). The Vector Field Histogram: Fast Obstacle Avoidance for Mobile Robots, IEEE J. on Robotics and Automation, 17(3):278-288
Borenstein, J. & Raschke, U. (1992). Real-Time Obstacle Avoidance for Non-Point Mobile Robots, SME Trans. on Robotics Research, 2(1):1-10
Bradley, A.M. & Yoerger, D.R. (1993). Design and Testing of the Autonomous Benthic Explorer, 20th Ann. Technical Symposium and Exhibition (AUVS-93), Washington, USA
Brenner, W.; Zarnekow, R. & Wittig, H. (1998). Intelligent Software Agents, Springer-Verlag, Berlin, Germany
Briot, M.; Talou, J.C. & Bauzil, G. (1979). Le Systeme de Perception du Robot HILARE, 2eme Congress AFCET/IRIA, Toulouse, France
Brooks, F.P.; Ouh-Young, M.; Batter, J.J. & Kilpatrick, P.J. (1990).
Project GROPE: Haptic Displays for Scientific Visualisation, ACM Computer Graphics, 24(4):177-185
Brooks, R.A. (1986). A Robust Layered Control System for a Mobile Robot, IEEE J. Robotics and Automation, 1(2):14-23
Brooks, R.A. (1989). The Behaviour Language User’s Guide, Artificial Intelligence Laboratory Memo 1227, MIT, Cambridge, USA
Brooks, R.A. & Connel, J.H. (1986). Asynchronous Distributed Control System for a Mobile Robot, in Proceedings of the SPIE, 727(1):77-84
Brown, K. & Petersen, D. (1999). Ready to Run Java 3D, Wiley, New York, USA
Buchberger, M.; Jorg, K.W. & Puttkamer (1993). Laser Radar and Sonar Based World Modelling and Motion Control for Fast Obstacle Avoidance of the Autonomous Mobile Robot MOBOT-IV, IEEE Intl. Conf. on Robotics and Automation, pp 534-539
Cabri, G.; Leonardi, L. & Zambonelli, F. (2000). Mobile-Agent Coordination Models for Internet Applications, IEEE Computer, 33(2):82-89
Cardelli, L. & Gordon, D. (1998). Mobile Ambients, in Foundations of Software Science and Computational Structures, Springer-Verlag, Stuttgart, Germany
Chang, C.C. & Song, K.T. (1996). Ultrasonic Sensor Data Integration and its Application to Environment Perception, Journal of Robotic Systems, 13(10):663-677
Chauhan, D. (1997). JAFMAS: a Java-Based Agent Framework for Multiagent Systems Development and Implementation, Ph.D. Thesis, ECECS Dept., University of Cincinnati, USA
Chauhan, D. & Baker, A. (1998). JAFMAS: a Multiagent Application Development System, in 2nd ACM Conf. on Autonomous Agents (M. Woolridge and T. Finin eds.), ACM Press
Cheah, J.Y.C. (1999). Practical Wireless Data Modem Design, Artech House Publishers, Boston, USA
Chien, S.Y.P.; Xue, L.Q. & Palakal, M. (1997). Task Planning for a Mobile Robot in an Indoor Environment using Object-Oriented Domain Information, IEEE Trans. on Systems, Man and Cybernetics, 27(6):1007-1016
Comer, D.E. & Stevens, D.L. (1993).
Internetworking with TCP/IP, Volume III: Client-Server Programming and Applications, BSD Sockets Version, Prentice Hall, New Jersey, USA
Congdon, C.; Huber, M.; Kortenkamp, D.; Konolige, K.; Myers, K.; Ruspini, E. & Saffiotti, A. (1993). CARMEL vs Flakey: a Comparison of Two Winners, AI Magazine, 14(1):49-57
Connel, J. (1992). SSS: a Hybrid Architecture Applied to Robot Navigation, IEEE Int. Conf. on Robotics and Automation, Los Alamitos, USA, pp 2719-2724
Cost, R.S. (1998). Jackal: a Java-Based Tool for Agent Development, in Working Papers of the AAAI-98 Workshop on Software Tools for Developing Agents, AAAI Press
Courtney, J.D. & Jain, A.K. (1994). Neural Network Learning of Variable Grid-Based Maps for the Autonomous Navigation of Robots, IEEE Intl. Conf. on Robotics and Automation, pp 40-45
Crevier, D. (1993). AI: The Tumultuous History of the Search for Artificial Intelligence, Basic Books, Harper Collins, New York, USA
Cybenko, G. (1988). Continuous Valued Neural Networks with Two Hidden Layers Are Sufficient, Technical Report, Dept. of Computer Science, Tufts University, USA
Cybenko, G. (1989). Approximation by Superpositions of a Sigmoidal Function, Mathematics of Control, Signals, and Systems, 2(1):303-314
Dain, R. (1998). Developing Mobile Robot Wall-Following Algorithms using Genetic Programming, Applied Intelligence, 8(1):33-41
Decker, K.S. (1987). Distributed Problem Solving: a Survey, IEEE Trans. on Systems, Man and Cybernetics, 17(5):729-740
Delgado, M.; Gómez Skarmeta, A.F. & Martínez Barberá, H. (1997). A Tabu Search Approach to the Fuzzy Clustering Problem, Sixth IEEE Intl. Conf. on Fuzzy Systems (FuzzIEEE'97), Barcelona, Spain, pp 125-129
Delgado, M.; Gómez Skarmeta, A.F.; Martínez Barberá, H. & Gómez, J. (1998). Fuzzy Range Sensor Filtering for Reactive Autonomous Robots, Fifth Intl. Conference on Soft Computing and Information / Intelligent Systems (IIZUKA'98), Fukuoka, Japan, pp 688-694
Dempster, A.P. (1967).
Upper and Lower Probabilities Induced by a Multivalued Mapping, Annals of Mathematical Statistics, 38(1):325-339
Denna, M.; Mauri, G. & Zanaboni, A.M. (1999). Learning Fuzzy Rules with Tabu Search – An Application to Control, IEEE Trans. on Fuzzy Systems, 7(2):295-318
Dorigo, M. & Colombetti, M. (1998). Robot Shaping: an Experiment in Behaviour Engineering, The MIT Press, Cambridge, USA
Driankov, D.; Hellendoorn, H. & Reinfrank, M. (1993). An Introduction to Fuzzy Control, Springer-Verlag, Berlin, Germany
Duckett, T.; Marsland, S. & Shapiro, J. (2000). Learning Globally Consistent Maps by Relaxation, IEEE Intl. Conf. on Robotics and Automation, California, USA
Duckett, T. & Nehmzow, U. (1998). Mobile Robot Self-Localisation and Measurement of Performance in Middle-Scale Environments, Journal of Robotics and Autonomous Systems, 24(2):57-69
Duda, R. & Hart, R. (1973). Pattern Classification and Pattern Analysis, Wiley, New York, USA
Dudek, G. & Jenkin, M. (2000). Computational Principles of Mobile Robotics, Cambridge University Press, Cambridge, UK
Ferrell, W.R. (1965). Remote Manipulation with Transmission Delays, IEEE Trans. on Human Factors in Electronics, 6(1):24-32
FIPA (1997). FIPA ACL Version 1.0 Specifications: Agent Management and Agent Communication Language, in http://www.fipa.org
Firby, R.J. (1989). Adaptive Execution in Complex Dynamic Worlds, Technical Report YALEU/CSD/RR 672, Dept. Computer Science, Yale University, USA
Firby, R.J. (1994). Task Networks for Controlling Continuous Processes, 2nd Intl. Conf. on AI Planning Systems, Menlo Park, USA, pp 49-54
Flynn, A.M. (1988). Combining Sonar and Infrared Sensor for Mobile Robot Navigation, Intl. Journal Robot. Res., pp 5-14
Fukuda, T. & Kubota, N. (1999). An Intelligent Robotic System Based on a Fuzzy Approach, Proceedings of the IEEE, 87(9):1448-1470
Gallardo, D. (1999).
Aplicación del Muestreo Bayesiano en Robots Móviles: Estrategias para la Localización y Estimación de Mapas del Entorno, Ph.D. Thesis, Dept. of Computer Technology and Computer Science, University of Alicante, Spain
García Alegre, M.C.; Gasós, J. & García Rosa, R. (1991). Fuzzy Control of Autonomous Mobile Robots, Congreso Español sobre Tecnologías y Lógica Fuzzy (ESTYLF’91), Granada, Spain, pp 71-77
García Alegre, M.C.; Ribeiro, A.; Gasós, J. & Salido, J. (1993). Optimization of Fuzzy Behaviour-Based Robots Navigation in Partially Known Industrial Environments, 3rd IEEE Intl. Conf. on Industrial Fuzzy Control and Intelligent Systems, Texas, USA
García Cerezo, A.; Mandow, A. & López, M.J. (1997). Fuzzy Modelling Operator Navigation Behaviours, 6th IEEE Intl. Conf. on Fuzzy Systems, Barcelona, Spain, pp 1339-1345
García Rosa, R. & García Alegre, M.C. (1990). Fuzzy Logic Strategies to Control an Autonomous Mobile Robot, Intl. Journal on Cybernetics and Systems, 21(1):267-276
Gasós, J. & Martín, A. (1996a). Mobile Robot Localization using Fuzzy Maps, in Lecture Notes in Artificial Intelligence (A. Ralescu and T. Martín eds.), Springer-Verlag, Germany
Gasós, J. & Martín, A. (1996b). A Fuzzy Approach to Build Sonar Maps for Mobile Robots, Computers in Industry, 32(1):151-167
Gasós, J. (2000). Integrating Linguistic Descriptions and Sensor Observations for the Navigation of Autonomous Robots, in Fuzzy Logic Techniques for Autonomous Vehicle Navigation (D. Driankov and A. Saffiotti eds.), Springer-Verlag, Germany
Gat, E. (1991). Reliable Goal-Directed Reactive Control for Real-World Autonomous Mobile Robots, Ph.D. Thesis, Dept. Computer Science, Virginia Polytechnic Institute and State University, USA
Gat, E. (1997). ESL: a Language for Supporting Robust Plan Execution in Embedded Autonomous Agents, IEEE Aerospace Conference, Los Alamitos, USA
Gat, E. (1998). Three Layer Architectures, in Artificial Intelligence and Mobile Robots (D. Kortenkamp, P. Bonasso and R.
Murphy eds.), The MIT Press
Georgeff, M.P. & Lanskey, A. (1987). Reactive Reasoning and Planning, 6th National Conf. of Artificial Intelligence, Menlo Park, USA, pp 677
Glasius, R.; Komoda, A. & Gielen, S. (1995). Neural Networks Dynamics for Path Planning and Obstacle Avoidance, Neural Networks, 8(1):125-133
Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley
Goldberg, K.; Gentner, S.; Sutter, C. & Wiegley, J. (2000). The Mercury Project: a Feasibility Study for Internet Robots, IEEE Robotics and Automation Magazine, 7(1):35-40
Gómez Skarmeta, A.F. & Jiménez, F. (1997). Generating and Tuning Fuzzy Rules with Hybrid Systems, Sixth IEEE Intl. Conference on Fuzzy Systems (FuzzIEEE’97), Barcelona, Spain, pp 247-252
Gómez Skarmeta, A.F. & Jiménez, F. (1999). Fuzzy Modelling with Hybrid Systems, Fuzzy Sets and Systems, 104(1):199-208
Gómez Skarmeta, A.F. & Martínez Barberá, H. (1997). Fuzzy Logic Based Intelligent Agents for Reactive Navigation in Autonomous Systems, Fifth International Conference on Fuzzy Theory and Technology, Raleigh, USA, pp 125-131
Gómez Skarmeta, A.F.; Martínez Barberá, H. & Botía, J. (2000). Fuzzy Modelling in a Multi-Agent Framework for Learning in Autonomous Systems, in Fuzzy Learning and Applications (M. Russo and L.C. Jain eds.), CRC Press
Gómez Skarmeta, A.F.; Martínez Barberá, H. & Sánchez, M. (1999a). A Fuzzy Agents Architecture for Autonomous Mobile Robots, International Fuzzy Systems World Congress (IFSA'99), Taiwan, pp 713-717
Gómez Skarmeta, A.F.; Martínez Barberá, H. & Sánchez, M. (1999b). A Fuzzy Logic Based Language to Model Autonomous Mobile Robots, 8th IEEE Intl. Conference on Fuzzy Systems (FuzzIEEE’99), Seoul, Korea, pp 550-555
Gómez Skarmeta, A.F.; Martínez Barberá, H. & Sánchez, M. (1999c).
Learning Behaviour Fusion in Autonomous Mobile Robots, First EUSFLAT/ESTYLF Joint Conference (ESTYLF’99), Mallorca, Spain, pp 91-94
González Baños, H.; Gordillo, J.L.; Lin, D.; Latombe, J.C.; Sarmiento, A. & Tomasi, C. (1999). The Autonomous Observer: a Tool for Remote Experimentation in Robotics, SPIE Intl. Symposium on Voice, Video and Data Communications, Boston, USA
Goodridge, S.G.; Kay, M.G. & Luo, R.C. (1997). Multi-Layered Fuzzy Behaviour Fusion for Reactive Control of an Autonomous Mobile Robot, IEEE Int. Conf. on Fuzzy Systems, Barcelona, Spain, pp 579-584
Goodridge, S.G. & Luo, R.C. (1994). Fuzzy Behaviour Fusion for Reactive Control of an Autonomous Mobile Robot: MARGE, IEEE Intl. Conf. on Robotics and Automation, San Diego, USA, pp 1622-1627
Grand, S. & Cliff, D. (1998). Creatures: Entertainment Software Agents with Artificial Life, Autonomous Agents and Multi-Agent Systems, 1(1):39-58
Harmon, S.Y. (1987). The Ground Surveillance Robot (GSR): an Autonomous Vehicle Designed to Transit Unknown Terrain, IEEE Trans. on Robotics and Automation, 3(3)
Hart, P.E.; Nilsson, N.J. & Raphael, B. (1968). A Formal Basis for the Heuristic Determination of Minimum Cost Paths, IEEE Trans. on Man, Systems, and Cybernetics, 4(2):100-107
Hayes-Roth, B.A. (1985). Blackboard for Control, Artificial Intelligence, 1(26):251-324
Hendler, J. (1999). Making Sense out of Agents, IEEE Intelligent Systems, 2(14):32-37
Herrera, F.; Lozano, M. & Verdegay, J.L. (1995). Tuning Fuzzy Logic Controllers by Genetic Algorithms, Intl. J. of Approximate Reasoning, 12(1):299-315
Hexmoor, H. & Meeden, L. (1997). Learning in Autonomous Robots: A Summary of the 1996 Robolearn Workshop, Knowledge Engineering Review, 11(4)
Hirukawa, H. & Hara, I. (2000). Web-Top Robotics, IEEE Robotics and Automation Magazine, 7(2):40-45
Hoffmann, F. & Pfister, G. (1996). Evolutionary Learning of Fuzzy Control Rule Base for an Autonomous Vehicle, Proc. Sixth Int. Conf.
Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'96), Granada, Spain, pp 1235-1240
Holland, J.H. (1975). Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, USA
Horswill, I.D. (2000). Functional Programming of Behaviour-Based Systems, Autonomous Robots, 9(1):83-93
Huberman, B. & Clearwater, S.H. (1995). A Multi-Agent System for Controlling Building Environments, 1st Intl. Conf. on Multi-Agent Systems, Menlo Park, USA, pp 171-176
Ishibuchi, H.; Nozaki, K.; Yamamoto, N. & Tanaka, H. (1995). Selecting Fuzzy If-Then Rules for Classification Problems Using Genetic Algorithms, IEEE Trans. on Fuzzy Systems, 3(3):260-270
Ishigami, H.; Fukuda, T.; Shibata, T. & Arai, F. (1995). Structure Optimization of Fuzzy Neural Network by Genetic Algorithms, Fuzzy Sets and Systems, 71(1):257-264
Jang, J.S. (1993). ANFIS: Adaptive-Network-Based Fuzzy Inference Systems, IEEE Trans. on Systems, Man, and Cybernetics, 23(3):665-685
Jang, J.S.; Sun, C.T. & Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing, Prentice Hall, New Jersey, USA
Johnson, S.C. (1975). YACC: Yet Another Compiler Compiler, CS Technical Report #32, Bell Telephone Laboratories, Murray Hill, USA
Joo, Y.H.; Hwang, H.S.; Kim, K.B. & Woo, K.B. (1997). Fuzzy System Modelling by Fuzzy Partition and GA Hybrid Schemes, Fuzzy Sets and Systems, 86(1):279-288
Kaelbling, L.P. (1987). REX: a Symbolic Language for the Design and Parallel Implementation of Embedded Systems, AIAA Conf. on Computers in Aerospace, Wakefield, USA
Kam, M.; Zhu, X. & Kalata, P. (1997). Sensor Fusion for Mobile Robot Navigation, Proceedings of the IEEE, 85(1):108-119
Karnik, N.M. & Tripathi, A.R. (1998). Design Issues in Mobile-Agent Programming Systems, IEEE Concurrency, July/Sept, pp 52-61
Khatib, O. (1985). Real-Time Obstacle Avoidance for Manipulators and Mobile Robots, IEEE Intl. Conf. on Robotics and Automation, St. Louis, USA, pp 500-505
Khatib, O. (1986).
Real-Time Obstacle Avoidance for Manipulators and Mobile Robots, Intl. Journal of Robotics Research, 5(1):90-98
Kim, W.S. & Stark, L.W. (1989). Cooperative Control of Visual Displays for Telemanipulation, IEEE Intl. Conf. on Robotics and Automation, pp 1327-1332
Kirn, S. (1996). Kooperativ-Intelligente Softwareagenten, Information Management, 11(1)
Ko, N.Y. & Simmons, R.G. (1998). The Lane-Curvature Method for Local Obstacle Avoidance, IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, Victoria, Canada
Koenig, S. & Simmons, R.G. (1998). Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models, in Artificial Intelligence Based Mobile Robotics (D. Kortenkamp, R. Bonasso and R. Murphy eds.), MIT Press, USA
Konolige, K. (1997). COLBERT: a Language for Reactive Control in Saphira, German Conference on Artificial Intelligence, Freiburg, Germany
Konolige, K.; Meyers, K.; Saffiotti, A. & Ruspini, E. (1997). The Saphira Architecture: A Design for Autonomy, Journal of Experimental and Theoretical Artificial Intelligence, 9(1):215-235
Koren, Y. & Borenstein, J. (1991). Potential Field Methods and Their Inherent Limitations for Mobile Robot Navigation, IEEE Intl. Conf. on Robotics and Automation, Sacramento, USA, pp 1398-1404
Kosko, B. (1992). Neural Networks and Fuzzy Systems, Prentice Hall, New Jersey, USA
Krogh, B.H. (1984). A Generalised Potential Field Approach to Obstacle Avoidance Control, IEEE Intl. Conf. on Robotics Research, Bethlehem, USA
Krogh, B.H. & Thorpe, C.E. (1986). Integrated Path Planning and Dynamic Steering Control for Autonomous Vehicles, IEEE Intl. Conf. on Robotics and Automation, San Francisco, USA, pp 1664-1669
Kröse, B. & van Dam, J. (1997). Neural Vehicles, in Neural Systems for Robotics (O. Omidvar and P. van der Smagt eds.), Academic Press, San Diego
Kurz, A. (1996). Constructing Maps for Mobile Robot Navigation Based on Ultrasonic Range Data, IEEE Trans.
on System, Man, and Cybernetics, 26(2):233-242
Kwon, Y.D.; Won, J.M. & Lee, J.S. (1998). Control of a Mobile Robot by using Evolutionary Fuzzy Controller, IEEE Intl. Conf. ICEC, pp 422-426
Labrou, Y. (1997). Semantics for an Agent Communication Language, Ph.D. Thesis, Dept. of Computer Science and Electronic Engineering, University of Maryland, USA
Labrou, Y. & Finin, T. (1994). A Semantics Approach for KQML – a General Purpose Communication Language for Software Agents, 3rd Intl. Conf. on Information and Knowledge Management
Labrou, Y. & Finin, T. (1997). A Proposal for a New KQML Specification, Technical Report TR-CS-97-03, Computer Science and Electrical Engineering Dept., University of Maryland, USA
Labrou, Y.; Finin, T. & Peng, Y. (1999). Agent Communication Languages: the Current Landscape, IEEE Intelligent Systems, 14(2):45-52
Labrou, Y. & Weber, J. (1993). Specification of the KQML Agent Communication Language, DARPA Knowledge Sharing Initiative, External Interfaces Working Group
Lee, D. (1996). The Map-Building and Exploration Strategies of a Simple Sonar-Equipped Robot, Cambridge University Press, Cambridge
Lekei, D. (1997). Using a PIC16C5X as a Smart I2C Peripheral, AN541, Microchip Technology, Chandler, USA
Leonard, J. & Durrant-Whyte, H. (1992). Directed Sonar Sensing for Mobile Robot Navigation, Kluwer, Dordrecht, The Netherlands
Levesque, H.; Reiter, R.; Lespérance, Y.; Lin, F. & Scherl, R. (1997). GOLOG: A Logic Programming Language for Dynamic Domains, Journal of Logic Programming, 31(1):59-84
Levy, R. & Rosenschein, J.S. (1992). A Game Theoretic Approach to the Pursuit Problem, 11th Intl. Workshop on Distributed Artificial Intelligence, pp 195-213
Li, W.; Ma, C. & Wahl, F.M. (1997). A Neuro-Fuzzy System Architecture for Behaviour-Based Control of a Mobile Robot in Unknown Environments, Fuzzy Sets and Systems, 87(1):133-140
Lim, J.H. & Cho, D.W. (1992).
Physically Based Sensor Modelling for a Sonar Map in a Specular Environment, IEEE Intl. Conf. on Robotics and Automation, pp 1714-1719
Liu, J. & Xie, W. (1995). A Genetics-Based Approach to Fuzzy Clustering, IEEE Intl. Conf. on Fuzzy Systems
López Sánchez, M.; López de Mántaras, R. & Sierra, C. (1998). Possibility Theory-Based Environment Modelling by Means of Behaviour Based Autonomous Robots, European Conference on Artificial Intelligence, pp 590-594
Lund, H.H.; Miglino, O.; Pagliarini, L.; Billard, A. & Ijspeert, A. (1998). Evolutionary Robotics – A Children’s Game, IEEE Intl. Conf. ICEC, pp 154-158
Luo, R.C. & Kay, M.G. (1989). Multisensor Integration and Fusion in Intelligent Systems, IEEE Transaction on Systems, Man and Cybernetics, 19(5):901-931
Machida, K.; Toda, Y. & Iwata, T. (1990). Graphic-Simulator-Augmented Teleoperation System for Space Applications, J. of Spacecraft and Rockets, 27(1):64-69
MacKenzie, D. & Arkin, R. (1997). Evaluating the Usability of Robot Programming Toolsets, Technical Report, Georgia Institute of Technology, USA
Maravall, D.; de Lope, J. & Serradilla, F. (2000). Combination of Model-Based and Reactive Methods in Autonomous Navigation, IEEE Intl. Conf. on Robotics and Automation, San Francisco, USA, pp 2328-2333
Martin, F.G. (1994). Circuit to Control: Learning Engineering by Designing LEGO Robots, Ph.D. Thesis, MIT, USA
Martin, F.G. (1996). The Interactive C Manual for the HandyBoard, MIT, USA
Martinez, A.; Tunstel, E. & Jamshidi, M. (1994). Fuzzy Logic Based Collision Avoidance for a Mobile Robot, Robotica
Matsuyama, T. (1997). Cooperative Distributed Vision, 1st Intl. Workshop on Cooperative Distributed Vision, Kyoto, Japan
McDermott, D. (1994). A Reactive Plan Language, Technical Report YALEU/CSD/RR 864, Dept. of Computer Science, Yale University, USA
Michaud, F. (1997). Selecting Behaviours using Fuzzy Logic, IEEE Int. Conf.
on Fuzzy Systems, Barcelona, Spain, pp 585-592
Michaud, F.; Lachiver, G. & Dinh, C.T.L. (1996). Fuzzy Selection and Blending of Behaviours for Situated Agents, IEEE Intl. Conf. on Fuzzy Systems, New Orleans, USA
Michel, O.; Saucy, P. & Mondada, F. (1997). KhepOnTheWeb: an Experimental Demonstrator in Telerobotics and Virtual Reality, Intl. Conf. on Virtual Systems and Multimedia, IEEE Computer Society Press
Mitchell, T. (1997). Machine Learning, McGraw Hill, New York, USA
Moravec, H. (1990). The Stanford CART and the CMU Rover, in Autonomous Robot Vehicles (I.J. Cox and G.T. Wilfong eds.), Springer-Verlag
Moravec, H.P. & Elfes, A. (1985). High Resolution Maps from Wide Angle Sonar, IEEE Intl. Conf. on Robotics and Automation, Washington, USA, pp 116-121
Nehmzow, U. (2000). Mobile Robotics: a Practical Introduction, Series on Applied Computing, Springer-Verlag, London, UK
Nguyen, D. & Widrow, B. (1990). The Truck Backer-Upper: an Example of Self-Learning in Neural Networks, in Neural Networks for Control (W.T. Miller, R. Sutton and P. Werbos eds.), Bradford Book, MIT Press, Cambridge
Nilsson, N.J. (1969). A Mobile Automaton: An Application of Artificial Intelligence Techniques, Proc. of IJCAI, reprinted in Autonomous Mobile Robots: Control, Planning and Architecture, 2(1):233-239
Nilsson, N.J. (1980). Principles of Artificial Intelligence, Tioga Press, Palo Alto, USA
Norris, J.S. & Backes, P. (2000). WEDDS: The WITS Encrypted Data Delivery System, IEEE Aerospace Conf., Big Sky, USA
Ollero, A.; García Cerezo, A.; Martínez, J.L. & Mandow, A. (1998). Fuzzy Tracking Methods for Mobile Robots, in Applications of Fuzzy Logic: Towards a High Machine Intelligence Quotient (M. Jamshidi ed.), Prentice Hall
Opitz, R. & Lernfahrzeug, D. (1990). Neural Network Application for Autonomous Mobile Robots, in Advanced Neural Computers (R. Eckmiller ed.), Elsevier, Amsterdam
Oriolo, G.; Ulivi, G. & Venditelli, M. (1998).
Real-Time Map Building and Navigation for Autonomous Mobile Robots in Unknown Environments, IEEE Trans. on Systems, Man and Cybernetics-Part B: Cybernetics, 28(3):316-333
Pal, N.C.; Bezdek, J.C. & Tsao, E.C.-K. (1993). Generalized Clustering Networks and Kohonen's Self-Organizing Scheme, IEEE Trans. on Neural Networks, 4(1):549-557
Parunak, H.vD. (1996). Applications of Distributed Artificial Intelligence in Industry, in Foundations of Distributed Artificial Intelligence (G.M.P. O'Hare and N.R. Jennings eds.), Wiley Interscience
Parunak, H.vD.; Baker, A.D. & Clark, S.J. (1997). The AARIA Agent Architecture: an Example of Requirements-Driven Agent-Based System Design, 1st Intl. Conf. on Autonomous Agents, Marina del Rey, USA
Paulos, E. & Canny, J. (1996). Delivering Real Reality to the World Wide Web via Telerobotics, IEEE Intl. Conf. on Robotics and Automation
Paulos, E. & Canny, J. (1999). Designing Personal Tele-Embodiment, IEEE Intl. Conf. on Robotics and Automation, Detroit, USA
Payton, D.W. (1986). An Architecture for Reflexive Autonomous Vehicle Control, IEEE Intl. Conf. on Robotics and Automation, pp 1838-1843
Pearson, J.C.; Gelfand, J.J.; Sullivan, W.E.; Peterson, R.M. & Spence, C.D. (1995). Neural Network Approach to Sensory Fusion, in Multisensor Integration and Fusion for Intelligent Machines and Systems (R.C. Luo and M.G. Kay eds.), Ablex Publishing, USA
Peterson, J.; Hager, G. & Hudak, P. (1999). A Language for Declarative Robot Programming, in Proceedings of the Intl. Conference on Robotics and Automation, IEEE Press, USA
Pin, F.G. & Watanabe, Y. (1994). Navigation of Mobile Robots using a Fuzzy Behaviourist Approach and Custom-Designed Fuzzy Inference Boards, Robotica, 12(1):491-503
Pomerleau, D.A. (1991). Efficient Training of Artificial Neural Networks for Autonomous Navigation, Neural Computation, 3(1):88-97
Pratihar, D.K.; Deb, K. & Ghosh, A. (1999).
A Genetic-Fuzzy Approach for Mobile Robot Navigation Among Moving Obstacles, Intl. J. of Approximate Reasoning, 20(1):145-172
Ram, A.; Arkin, R.C.; Boone, G. & Pearce, M. (1994). Using Genetic Algorithms to Learn Reactive Control Parameters for Autonomous Robotic Navigation, Adaptive Behaviour, 2(3):277-304
Ram, A.; Arkin, R.C.; Moorman, K. & Clark, R.J. (1997). Case-Based Reactive Navigation: a Method for On-Line Selection and Adaptation of Reactive Robotic Control Parameters, IEEE Trans. on Systems, Man and Cybernetics, 27(3):376-394
Reyes, C.; Alcalde, S.; Hernán, I.; Reviejo, J.; García-Rosa, R. & de Pedro, T. (1999). Controlador de Vehículos sin Conductor: Simulación e Instrumentación, VIII Conf. de la Asociación Española para la Inteligencia Artificial, Murcia, Spain, pp 9-17
Rochwerger, B.; Fennema, C.L.; Draper, B.; Hanson, A.R. & Riseman, E.M. (1994). Executing Reactive Behaviour for Autonomous Navigator, Technical Report CMPSCI TR94-05, Dept. of Computer Science, University of Massachusetts at Amherst, USA
Roehl, B.; Couch, J.; Reed-Ballreich, C.; Rohaly, T. & Drown, J. (1997). Late Night VRML 2.0 with Java, Ziff-Davis Press, USA
Rosenblatt, F. (1959). Two Theorems of Statistical Separability in the Perceptron, Symp. on Mechanization of Thought Processes, London, UK, pp 421-456
Rosenblatt, J.K. (2000). Optimal Selection of Uncertain Actions by Maximizing Expected Utility, Autonomous Robots, 9(1):17-25
Rosenblatt, J.K. & Payton, D.W. (1989). A Fine-Grained Alternative to the Subsumption Architecture, AAAI Spring Symposium on Robot Navigation, Stanford, USA
Ruspini, E. (1991). Truth as Utility: a Conceptual Synthesis, 7th Conf. on Uncertainty in Artificial Intelligence, Los Angeles, USA
Russell, S.J. & Norvig, P. (1995). Artificial Intelligence: a Modern Approach, Prentice Hall, Englewood Cliffs, USA
Saffiotti, A. (1993).
Some Notes on the Integration of Planning and Reactivity in Autonomous Mobile Robots, AAAI Spring Symposium on Foundations of Planning, Stanford, USA, pp 122-126
Saffiotti, A. (1997a). The Use of Fuzzy Logic for Autonomous Robot Navigation, Soft Computing, 1(4):180-197
Saffiotti, A. (1997b). Fuzzy Logic in Autonomous Robotics: Behaviour Coordination, 6th IEEE Intl. Conf. on Fuzzy Systems (FuzzIEEE'97), Barcelona, Spain, pp 573-578
Saffiotti, A. et al. (1994). Robust Execution of Robot Plans using Fuzzy Logic, in Fuzzy Logic in Artificial Intelligence: IJCAI'93 Workshop, Lecture Notes in Artificial Intelligence 847 (A. Ralescu ed.), Springer-Verlag, pp 24-37
Saffiotti, A.; Ruspini, E. & Konolige, K. (1993). Blending Reactivity and Goal-Directedness in a Fuzzy Controller, 2nd IEEE Intl. Conf. on Fuzzy Systems (FuzzIEEE'93), San Francisco, USA, pp 134-139
Saffiotti, A.; Ruspini, E. & Konolige, K. (1999). Using Fuzzy Logic for Mobile Robot Control, in International Handbook of Fuzzy Sets (D. Dubois, H. Prade and H. Zimmermann eds.), Kluwer Academic Publishing
Sánchez, O.; Ollero, A. & Heredia, G. (1997). Adaptive Fuzzy Control for Automatic Path Tracking of Outdoor Mobile Robots: Application to Romeo 3R, IEEE Intl. Conf. on Fuzzy Systems, Barcelona, Spain, pp 593-599
Santos, V.; Gonzalves, J. & Vaz, F. (1994). Perception Maps for the Local Navigation of a Mobile Robot: a Neural Network Approach, IEEE Intl. Conf. on Robotics and Automation, pp 2193-2198
Saucy, P. & Mondada, F. (2000). KhepOnTheWeb: Open Access to a Mobile Robot on the Internet, IEEE Robotics and Automation Magazine, 7(1):41-47
Sayers, C. (1999). Remote Control Robotics, Springer-Verlag, New York, USA
Schultz, A.C. & Nourbakhsh, I.R. (2000). Computational Intelligence in Robotics and Automation, Autonomous Robots, 9(1):5-6
Schultz, D.; Burgard, W.; Fox, D.; Thrun, S. & Cremers, A.B. (2000).
Web Interfaces for Mobile Robots in Public Places, IEEE Robotics and Automation Magazine, 7(1):48-56
Serradilla, F. & Maravall, D. (1998). A Novel Approach to Autonomous Navigation: the Sensory Gradient Concept and its Application to Mobile Robots, 3rd IFAC Symposium on Intelligent Autonomous Vehicles, Madrid, Spain, pp 383-388
Shaad, R. (1998). Representation and Execution of Situated Action Sequences, PhD Thesis, Faculty of Economics, University of Zurich, Switzerland
Shafer, G. (1976). A Mathematical Theory of Evidence, Princeton University Press, Princeton, USA
Sheridan, T. (1992). Telerobotics, Automation, and Human Supervisory Control, MIT Press, Cambridge, USA
Simmons, R. & Apfelbaum, D. (1998). A Task Description Language for Robot Control, in Proceedings of the Conference on Intelligent Robots and Systems, IEEE Press, USA
Simmons, R.; Fernández, J.L.; Goodwin, R.; Koenig, S. & O'Sullivan, J. (2000). Lessons Learned from Xavier, IEEE Robotics and Automation Magazine, 7(2):33-39
Stanev, P.; Enchev, E.; Noikov, S.; Manolov, O. & Zaprianov, J. (1998). Fuzzy Control and Reflexive Behaviour of an Autonomous Vehicle in an Unknown Environment, 3rd IFAC Symposium on Intelligent Autonomous Vehicles, Madrid, Spain, pp 521-525
Stein, M.R. (2000). Interactive Internet Artistry, IEEE Robotics and Automation Magazine, 7(2):28-32
Stone, P. (2000). Layered Learning in Multiagent Systems, The MIT Press, Cambridge, USA
Stone, P. & Veloso, M. (1999). Task Decomposition, Dynamic Role Assignment, and Low-Bandwidth Communication for Real-Time Strategic Teamwork, Artificial Intelligence, 110(2):241-273
Stone, P. & Veloso, M. (2000). Multiagent Systems: a Survey from a Machine Learning Perspective, Autonomous Robots, 8(3):345-383
Stone, P.; Veloso, M. & Riley, P. (1999). The CMUnited-98 Champion Simulator Team, in RoboCup-98: Robot Soccer World Cup II (M. Asada and H. Kitano eds.), Springer-Verlag, Berlin, Germany
Sugeno, M. & Nishida, M. (1985).
Fuzzy Control of a Model Car, Fuzzy Sets and Systems, 16(1):103-113
Sugeno, M. & Takagi, T. (1983). Multidimensional Fuzzy Reasoning, Fuzzy Sets and Systems, 9(1)
Surmann, H.; Huser, J. & Peters, L. (1995). A Fuzzy System for Indoor Mobile Robot Navigation, IEEE Intl. Conf. on Fuzzy Systems, Yokohama, Japan, pp 83-86
Sycara, K.; Decker, K.; Pannu, A.; Williamson, M. & Zeng, D. (1996). Distributed Intelligent Agents, IEEE Expert, 11(6):36-46
Takagi, T. & Sugeno, M. (1985). Fuzzy Identification of Systems and its Application to Modelling and Control, IEEE Trans. on Systems, Man, and Cybernetics, 15(1):116-132
Tanenbaum, A.S. (1996). Computer Networks, Prentice Hall, New Jersey, USA
Taylor, K. & Dalton, B. (2000a). Internet Robots: a New Robotics Niche, IEEE Robotics and Automation Magazine, 7(1):27-34
Taylor, K. & Dalton, B. (2000b). Distributed Robotics over the Internet, IEEE Robotics and Automation Magazine, 7(2):22-27
Tekalp, A.M. (1995). Digital Video Processing, Prentice Hall, New Jersey, USA
Thorpe, C.F. (1985). Path Relaxation: Path Planning for a Mobile Robot, Annual Report, Robotics Institute, Mobile Robots Laboratory, Carnegie-Mellon University, USA, pp 39-42
Thrun, S. (1998a). Learning Metric-Topological Maps for Indoor Mobile Robot Navigation, Artificial Intelligence, 99(1):21-71
Thrun, S. (1998b). A Framework for Programming Embedded Systems: Initial Design and Results, Technical Report CMU-CS-98-142, Carnegie Mellon University, USA
Topalov, A.V.; Tsankova, D.D.; Petrov, M.G. & Proychev, T.P. (1998). Intelligent Sensor-Based Navigation and Control of Mobile Robot in a Partially Known Environment, 3rd IFAC Symposium on Intelligent Autonomous Vehicles, Madrid, Spain, pp 439-444
Tso, K.S.; Backes, P.G. & Tharp, G.K. (1998). Mars Pathfinder Mission Internet-Based Operations using WITS, IEEE Intl. Conf. on Robotics and Automation, Leuven, Belgium
Tsotsos, J.K. (1997).
Intelligent Control for Perceptually Attentive Agents: the S* Proposal, Robotics and Autonomous Systems, 21(5):5-21
Tunstel, E. (1995). Adaptive Hierarchy of Distributed Fuzzy Control: Application to Behaviour Control of Rovers, PhD Thesis, University of New Mexico, USA
Tunstel, E. (1996). Mobile Robot Autonomy via Hierarchical Fuzzy Behaviour Control, World Automation Congress, ISRAM Track, Montpellier, France, pp 837-842
Tunstel, E. & Jamshidi, M. (1996). On Genetic Programming of Fuzzy Rule Based Systems for Intelligent Control, Intl. Journal of Intelligent Automation and Soft Computing, 3(1)
Ulrich, I. & Borenstein, J. (1998). VFH+: Reliable Obstacle Avoidance for Fast Mobile Robots, IEEE Intl. Conf. on Robotics and Automation, Leuven, Belgium, pp 1572-1577
Ulrich, I. & Borenstein, J. (2000). VFH*: Local Obstacle Avoidance with Look-Ahead Verification, IEEE Intl. Conf. on Robotics and Automation, San Francisco, USA, pp 2505-2511
Urzelai, J.; Uribe, J.P. & Ezkerra, M. (1997). Fuzzy Controller for Wall-Following with a Non-Holonomous Mobile Robot, IEEE Intl. Conf. on Fuzzy Systems, Barcelona, Spain, pp 1361-1368
Vasudevan, V. (1998). Comparing Agent Communication Languages, OBJS Technical Note 9807, Object Services and Consulting Inc., USA (http://www.objs.com/agility/techreports/9807-comparing-ACLs.html)
Velasco, J.R. & Ventero, F.E. (1994). Some Applications of Fuzzy Clustering to Fuzzy Control Systems, 3rd Intl. Conf. on Fuzzy Theory and Technology, Durham, USA
Villalba, G. & Martínez Barberá, H. (1998). Robot Autónomo con Control Reactivo, MsE Thesis, Escuela Politécnica Superior, Universidad de Murcia, Spain
Voudouris, C.; Chernett, P.; Wang, C.J. & Callaghan, V.L. (1994). Fuzzy Hierarchical Control for Autonomous Vehicles, Intl. Workshop on Intelligent Robotic Systems
Vranes, S. & Stanojevic, M. (1995). Integrating Multiple Paradigms within the Blackboard Architecture, IEEE Transactions on Software Engineering, 21(3):244-262
Walter, W.G.
(1950). An Imitation of Life, Scientific American, 182(5):42-45
Wang, L. & Langari, R. (1996). Complex Systems Modelling via Fuzzy Logic, IEEE Trans. on Systems, Man and Cybernetics, 26(1):100-106
Wang, L. & Mendel, J.M. (1992). Generating Fuzzy Rules by Learning from Examples, IEEE Trans. on Systems, Man and Cybernetics, 22(6):1414-1427
Wang, X. (1996). Planning while Learning Operators, 3rd Intl. Conf. on AI Planning Systems
Wang, Y. & Chirikjian, G. (2000). A New Potential Field Method for Robot Path Planning, IEEE Intl. Conf. on Robotics and Automation, San Francisco, USA, pp 977-982
Watt, A. & Watt, M. (1992). Advanced Animation and Rendering Techniques, ACM Press
Welch, J.C. (1999). Wireless Networking and the AirPort, MacTech Magazine, 15(12):14-32
Werbos, P. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD Thesis, Harvard University, Cambridge, USA
Widrow, B. & Hoff, M.E. (1960). Adaptive Switching Circuits, IRE Western Electric Show and Convention Record, USA, pp 96-104
Xu, W.L. & Tso, S.K. (1999). Sensor-Based Fuzzy Reactive Navigation of a Mobile Robot Through Local Target Switching, IEEE Trans. on Systems, Man and Cybernetics, 29(3):451-459
Yager, R. & Filev, D. (1994). Approximate Clustering via the Mountain Method, IEEE Trans. on Systems, Man, and Cybernetics, 24(1):1279-1284
Yen, J. & Pfluger, N. (1995). A Fuzzy Logic Based Extension to Payton and Rosenblatt's Command Fusion Method for Mobile Robot Navigation, IEEE Trans. on Systems, Man, and Cybernetics, 25(6):971-978
Yeong, W.; Howes, T. & Kille, S. (1993). Lightweight Directory Access Protocol, RFC 1487
Yuan, B.; Klir, G.J. & Swan-Stone, J.F. (1995). Evolutionary Fuzzy C-Means Clustering Algorithm, IEEE
Zadeh, L.A. (1965). Fuzzy Sets, Information and Control, 8(1):338-353
Zalama, E.; Gaudiano, P. & López Coronado, J. (1995).
A Real-Time, Unsupervised Neural Network for the Low-Level Control of a Mobile Robot in a Nonstationary Environment, Neural Networks, 8(1):103-123