XI Jornadas de Tiempo Real (JTR2008)
Palma de Mallorca, 7-8 February 2008

Editors:
Albert Llemosí
Julián Proenza

Organizing Committee:
Albert Llemosí
Julián Proenza

Departament de Ciències Matemàtiques i Informàtica
Universitat de les Illes Balears
Cra de Valldemossa km 7,5
07122 Palma de Mallorca

Sponsoring institutions:
Conselleria d'Economia, Hisenda i Innovació del Govern Balear*
Fundació "La Caixa"
Ministerio de Educación y Ciencia
Universitat de les Illes Balears

(*) The contribution of the Conselleria d'Economia, Hisenda i Innovació del Govern Balear has been co-financed with FEDER funds.
List of Contributions

1 Modelización y Métodos Formales  1
1.1 Adaptive Petri Nets implementation. The Execution Time Controller.
    R. Piedrafita Moreno and J. L. Villarroel Salcedo  3
1.2 Modeling and Verification of Master/Slave Clock Synchronization Using Hybrid Automata and Model-Checking.
    G. Rodríguez-Navas, J. Proenza and H. Hansson  11
1.3 Software Modeling of Quality-Adaptable Systems.
    J. F. Briones, M. A. de Miguel, A. Alonso and J. P. Silva  21

2 Análisis Temporal  27
2.1 Considerations on the LEON cache effects on the timing analysis of on-board applications.
    G. Bernat, A. Colin, J. Esteves, G. Garcia, C. Moreno, N. Holsti, T. Vardanega and M. Hernek  29
2.2 A Stochastic Analysis Method for Obtaining the Distribution of Task Response Times.
    J. Vila-Carbó and E. Hernández-Orallo  38
2.3 D-P domain feasibility region in dynamic priority systems.
    P. Balbastre, I. Ripoll and A. Crespo  45
2.4 Providing Memory QoS Guarantees for Real-Time Applications.
    A. Marchand, P. Balbastre, I. Ripoll and A. Crespo  55

3 Sistemas Operativos y Middleware  65
3.1 Operating System Support for Execution Time Budgets for Thread Groups.
    M. Aldea Rivas and M. González Harbour  67
3.2 Una Máquina Virtual para Sistemas de Tiempo Real Críticos.
    J. A. Pulido, S. Urueña, J. Zamorano and J. A. de la Puente  72
3.3 Middleware based on XML technologies for achieving true interoperability between PLC programming tools.
    E. Estevez, M. Marcos, F. Perez and D. Orive  78
3.4 Real-Time Distribution Middleware from the Ada Perspective.
    H. Pérez, J. J. Gutiérrez, Daniel Sangorrín and M. González Harbour  84

4 Sistemas Distribuidos  99
4.1 Integración de RT-CORBA en robots Eyebot.
    M. Díaz, D. Garrido, L. Llopis and R. Luque  101
4.2 An Ada 2005 Technology for Distributed and Real-Time Component-based Applications.
    P. López Martínez, J. M. Drake, P. Pacheco and J. L. Medina  108
4.3 An Architecture to Support Dynamic Service Composition in Distributed Real-Time Systems.
    I. Estévez-Ayres, L. Almeida, M. García-Valls and P. Basanta-Val  122

5 Herramientas de Desarrollo  131
5.1 Plataforma Java para el Desarrollo de Aplicaciones Empotradas con Restricciones Temporales.
    J. Viúdez and J. A. Holgado  133

6 Sistemas de Control  153
6.1 A Taxonomy on Prior Work on Sampling Period Selection for Resource-Constrained Real-Time Control Systems.
    C. Lozoya, M. Velasco, P. Martí, J. Yépez, F. Pérez, J. Guàrdia, J. Ayza, R. Villà and J. M. Fuertes  155
6.2 Distributed Control of parallel robots using passive sensor data.
    A. Zubizarreta, I. Cabanes, M. Marcos, D. Orive and C. Pinto  159
1. Modelización y Métodos Formales
Adaptive Petri Nets implementation. The Execution Time Controller.
Ramón Piedrafita Moreno, José Luis Villarroel Salcedo
Aragon Institute for Engineering Research (I3A), University of Zaragoza,
Maria de Luna, 1. 50018, Zaragoza, Spain
(Tel: 0034 976762335; e-mail: {piedrafi, jlvilla}@unizar.es)
Abstract: In this work we have developed a technique which allows the choice, in real time, of the most
suitable algorithm to execute a Petri Net for control applications, in accordance with its structure and the
sequence of control events. To this end, we decided to design a controller, which we have called the
Execution Time Controller (ETC). The aim of the ETC is to determine in real time which algorithm
executes the Petri Net fastest and to change the execution algorithm when necessary. The two Petri net
implementation algorithms with the best performance have been taken into account: the Enabled Transitions
and the Static Representing Places techniques. In the case of system control, this minimizes the controller
reaction time and also the power consumed by the controller. One possible application of the technique is
the minimization of the execution time of Programmable Logic Controller programs developed in
Grafcet. The good behavior of the ETC has been established by a set of tests using a parametric Petri net
library and the Java real-time specification.
1. INTRODUCTION
Petri Nets (PN) are a formalism which is well suited to
modeling concurrent discrete event systems: it has been
satisfactorily applied in fields such as communication
networks, computer systems, discrete part manufacturing
systems, etc. Net models are often regarded as self-documented specifications, because their graphical nature
facilitates communication among designers and users. Petri
Nets have a strong mathematical basis which allows
validation and verification of a wide range of correctness and
liveness properties. Moreover, these models are executable
and can be used to animate and simulate the behavior of the
system and also for monitoring purposes once the system is
readily working. The final system can be derived from a Petri
Net model by means of hardware and software (code
generation) implementation techniques. In other words, they
can be used throughout the life cycle of a system. In this
paper we assume that the reader is familiar with the basic
concepts of Petri Nets (Murata, 1989).
The implementation of Petri Nets has received considerable
attention from researchers over the last twenty-five years.
This implementation involves the translation of a system
model expressed by a Petri Net to an actual system with the
same behavior as the model. It has been studied in numerous
application fields such as digital hardware, simulation,
robotic systems and other concurrent software applications.
However, the most extended application is the GRAFCET, a
standardized language for programmable logic controllers.
A Petri Net implementation may be either hardware or
software. However, we are interested in the latter; i.e.
software implementation. A software implementation is a
program which fires the net transitions, observing marking
evolution rules; i.e., it plays the “token game”. An
implementation is composed of a control part and an
operational part. The control part applies to the structure,
marking and evolution rules of the Petri Net. On the other
hand, the operational part is the set of actions and/or codes of
the application associated with the net elements.
In a centralized implementation, which is the most extended
approach, the full control part is executed by a single task,
commonly referred to as the token player or coordinator. The
execution of a Petri net without a suitable algorithm can
lead to increases in the response time of control
applications. Moreover, the reduction of the CPU overload
introduced by the implementation algorithm can allow the execution
of other tasks (tests, communications, statistics, …) and the
use of more complex control models.
The objective of the technique developed in the present paper
is to determine in real time which algorithm executes a Petri
Net fastest and to minimize the execution computation time.
Furthermore, this must be achieved with minimum
computation time overload.
An analysis of the features of Petri Net centralized
implementation algorithms was carried out in (Piedrafita and
Villarroel, 2007). Brute Force, Enabled Transitions, Static
Representing Places and Dynamic Representing Places
algorithms were analyzed. The following conclusions were
reached:
• The implementation of the Enabled Transitions, Static and Dynamic Representing Places algorithms brings about a drastic reduction in the execution computation time compared to the Brute Force algorithm.
• If the Static Representing Places algorithm chooses suitable Representing Places, performance is similar to or better than Dynamic Representing Places.
• The choice of the most suitable type of algorithm to execute a Petri Net depends on the Petri Net behavior (effective concurrency vs. effective conflicts).
Whether or not an algorithm is suitable to execute a Petri Net
depends on its structure, but also on the sequence of events
that fire the transitions. Even if the net structure is analyzed
beforehand, the evolution of its marking, brought about by the
events that arise, may make it advisable to shift to the
algorithm that would execute the net fastest in the light of
that sequence of events. In this work we have developed a
technique which allows the choice in real time of the most
suitable algorithm to execute a Petri Net in accordance with
its structure and the sequence of events. With this aim in
mind, we decided to design a controller, which we have
called Execution Time Controller (ETC). The aim of the ETC
is to determine in real time which algorithm executes the
Petri Net fastest and to change the execution algorithm when
necessary.
In the case of system control, this minimizes the controller
reaction time and also the power consumed by the controller.
One possible application of the technique is the minimization
of execution time of the Programmable Logic Controllers
programs developed in Grafcet.
The organization of this paper is as follows: in Section 2 we review the centralized implementation techniques for PN; in Section 3 the ETC is introduced; Section 4 describes the techniques developed for algorithm computation time estimation; Section 5 describes the tests run to evaluate the estimation techniques and the working of the ETC in real time. Finally, in Section 6, we present the main conclusions and suggest future lines of research.

2. CENTRALIZED PETRI NETS IMPLEMENTATION

Petri Net implementation is highly dependent on the interpretation of the net model; namely, how inputs, actions and code are associated to the net elements. In this work we consider binary Petri Nets with an interpretation that associates application code to places, and predicates and priorities to transitions. Centralized implementation techniques (Colom et al., 1985; Villarroel, 1990) are codified in a task called the Coordinator, which plays the so-called token game, i.e., it makes the net evolve over time. The Coordinator establishes when transitions are enabled and must fire.

Apart from the simple exhaustive test of all transitions (brute force approach), various solutions are available for reducing the cost of the enabling test and subsequently the overload introduced by the Coordinator. Depending on the solution chosen, centralized implementation techniques can be classified into one of the following classes (Briz, 1995):

Place-driven approaches. Only the output transitions of some representative marked places are tested (Colom et al., 1985). Each transition is represented by one of its input places, the Representing Place. The remaining input places are called synchronization places. Only transitions whose Representing Place is marked are considered as candidates for firing.

Transition-driven approaches. A characterization of the enabling of transitions, other than marking, is supplied, and only fully enabled transitions are considered. This kind of technique is studied in works such as (Briz, 1995; Silva et al., 1982).

In the present work we have implemented two algorithms in which different enabled transition search techniques are developed:
• Enabled Transitions (ET)
• Static Representing Places (SRP)

Other implementation algorithms, such as Brute Force or Dynamic Representing Places, are not taken into account, following the conclusions of (Piedrafita and Villarroel, 2007).

2.1 Data Structures

In a centralized interpreted implementation we need a static data structure which encodes the Petri Net structure and a dynamic one which represents the net state or marking. All the algorithms share the same basic data structure, in which the Petri Net is encoded, with different access possibilities adapted to each technique. Likewise, they have two kinds of lists to make the net evolve: treatment lists, to be processed in the present treatment cycle, and formation lists, to be processed in the next cycle. With the use of these two lists, the net is executed by steps, avoiding the appearance of runs. The fundamental difference between the implementation techniques lies in the way in which the formation lists are built, and hence in the transitions which are considered in each treatment cycle.

In the Enabled Transitions technique the following data structures are available:
• Enabled Transitions List (ETL). Treatment list made up of the transitions with all input places marked.
• Almost Enabled Transitions List (AETL). Formation list built with the output transitions of the places marked in the firing of the transitions, that is, the transitions that can become enabled in the next cycle.

In the Static Representing Places technique the following data structures are available:
• Marked Representing Places list (MRPL) and Marked Synchronization Places list (MSPL). Treatment lists with the marked Representing Places and Synchronization Places.
• Marked Representing Places list for the next cycle (MRPLnext) and Marked Synchronization Places list for the next cycle (MSPLnext). Formation lists with the Representing Places and Synchronization Places that will be marked in the next cycle by the firing of the transitions.
2.2 Algorithm Execution Cycle
Program 1 presents the basic treatment cycle of the
Coordinator for the ET technique, and Program 2 for the SRP
technique. The code for launching the actions
associated to the net places has been omitted in these
programs.
loop forever
    while elements in ETL do
        T = next_element(ETL);
        // enabled transition analysis
        if enabled(T) and predicate(T) then
            // transition firing
            Demark_input_places(T, ETL);   // update ETL
            Mark_output_places(T);         // update AETL
        end if;
    end while;
    ETL.update(AETL);                      // update ETL with AETL
    Clear(AETL);
end loop;
3. EXECUTION TIME CONTROLLER
In the programs referred to in the previous section it is
demonstrated that the coordinator cycle computation time
depends on the size of the treatment and formation lists. The
size of the treatment lists in the case of ET and SRP depends
on the current net marking. The current marking determines
the number of enabled transitions and the number of
representing places marked. The size of the formation lists
depends on the number of transitions that fire in the cycle
(and also on the net structure). If no transitions fire, the
formation list size will be zero. Thus, the computation time
depends on the evolution of the net marking and on the active
net part, the net structure and the sequence of events. As
algorithms ET and SRP use different lists, their computation
times will be different.
Program 1. ET Coordinator Treatment Loop
In the ET technique, ETL contains all the enabled transitions at
the beginning of the cycle. During the cycle execution, the
fired transitions and the disabled transitions (effective
conflicts) must be extracted from this list. AETL is built with
the output transitions of the places marked in the firing of the
transitions. When ETL is updated for the next cycle, the
transitions in AETL are verified for enabling.
loop forever
    while elements in MRPL do
        Rplace = next_element(MRPL);
        Transitionsrepr = Rplace.transitionsrep;
        while elements in Transitionsrepr do
            T = next_element(Transitionsrepr);
            // enabled transition analysis
            if enabled(T) and predicate(T) then
                // transition firing
                Demark_input_places(T, MRPL, MSPL);         // update MRPL and MSPL
                Mark_output_places(T, MRPLnext, MSPLnext);  // update MRPLnext and MSPLnext
                Break();
            end if;
        end while;
    end while;
    MRPL.update(MRPLnext);   // update MRPL with MRPLnext
    MSPL.update(MSPLnext);   // update MSPL with MSPLnext
    Clear(MRPLnext); Clear(MSPLnext);
end loop;
Program 2. SRP Coordinator Treatment Loop
In the case of SRP technique, MRPL contains the marked
representing places and MSPL the marked synchronization
places. The output transitions of a marked representing place
are verified for enabling. If a represented transition fires, the
verification process ends, because the rest of the represented
transitions become disabled (effective conflict). MRPLnext
and MSPLnext are built with the places that become marked
in a treatment cycle. Finally, MRPL and MSPL are
incremented with MRPLnext and MSPLnext respectively.
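The SRP cycle of Program 2 can be sketched in the same illustrative style (our own data layout and names, with MSPL handling elided for brevity). Each marked Representing Place proposes its represented transitions; when one fires, the search for that place stops, since the remaining represented transitions lose the effective conflict.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the SRP treatment cycle of Program 2 for a binary Petri net.
public class SrpCoordinator {
    public int[][] representedTransitions; // transitions represented by place p
    public int[][] inputPlaces, outputPlaces;
    public boolean[] marking;
    public boolean[] isRepresenting;       // static role chosen for each place

    public boolean enabled(int t) {
        for (int p : inputPlaces[t]) if (!marking[p]) return false;
        return true;
    }

    // One cycle over MRPL; returns MRPLnext (MSPLnext handling elided).
    public Deque<Integer> cycle(Deque<Integer> mrpl) {
        Deque<Integer> mrplNext = new ArrayDeque<>();
        for (int rp : mrpl) {
            for (int t : representedTransitions[rp]) {
                if (enabled(t)) {                                     // predicate(T) assumed true
                    for (int p : inputPlaces[t]) marking[p] = false;  // demark input places
                    for (int p : outputPlaces[t]) {
                        marking[p] = true;
                        if (isRepresenting[p]) mrplNext.add(p);       // update MRPLnext
                        // synchronization places would go to MSPLnext
                    }
                    break; // effective conflict: sibling represented transitions are disabled
                }
            }
        }
        return mrplNext;
    }
}
```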
Fig. 1. Execution Time Controller. From the list sizes (ETLsize, AETLsize, MRPLsize, MRPLNsize, MSPLNsize) the ETC computes TET = f(ETLsize, AETLsize) and TSRP = f(MRPLsize, MRPLNsize, …); the error Error = TET − TSRP is accumulated into an integral, and when the integral exceeds a threshold the algorithm (ET or SRP) is changed.
With a view to minimizing Petri Net computation time, we
propose an adaptive implementation that will choose the best
algorithm to execute the net at a given time. We refer to this
solution as the Execution Time Controller (ETC). The main
function of the ETC will be to determine in real time which
algorithm executes a Petri Net fastest. The ETC will execute
the algorithm chosen and estimate the computation time of
the non-executed algorithm, making the necessary
comparisons and choosing the best algorithm in line with the
controlled system. If necessary, the ETC will make the
algorithm change.
The execution time controller will be implemented as a high-priority
thread that will be executed with the same period as
the thread that executes the Petri Net. In an initial phase, the
ETC loads the net and measures several static execution
times related to ET and SRP (see the next section). This
measurement is made without carrying out any actions. The
initial choice of the best algorithm is made with the results of
the time measurements. The Petri Net then returns to its
initial marking and the control loop is activated. The control
system inputs are read, the Petri Net is executed and the
outputs are written cyclically. The computation times of the
algorithms are then estimated.
Although the computation time of the executed algorithm
could be measured by reading the system clock, to avoid
overloading the control actions the execution time of the
executed algorithm (running_alg) is instead calculated with
equation (2) or (3), depending on the algorithm that is
executed.
The execution time of the alternative algorithm
(alternative_alg) must be estimated with equation (2) or (3).
The comparison is made with the calculated time of the
executed algorithm, and the decision whether or not to
change the algorithm is based on these computing times.
To avoid the overload of continuous algorithm changes, an
integral cost function is used:
ε(k) = ExTcalculated(running_alg) − ExTestimated(alternative_alg)    (1)

I(k) = I(k−1) + ε(k)   if I(k−1) + ε(k) > 0
I(k) = 0               if I(k−1) + ε(k) ≤ 0

The change is made when I(k) is greater than half the
computation time of the executed algorithm. When a change
happens, I(k−1) is reset to 0.
// Offline control
Load Petri Net
Measure times
First choice of the best algorithm
Return to initial marking
// Online control
loop forever
    Read inputs
    Petri Net execution with the best algorithm
    Write outputs
    Compute execution time of running_alg
    Estimate execution time of alternative_alg
    Compute I(k)
    if I(k) > (ExTcalculated(running_alg)/2) then
        Change algorithm
        Initialize data structures
        I(k-1) = 0
    end if
    Wait for next period()
end loop
Program 3. Execution Time Controller
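The decision logic of Program 3 — equation (1), the integral accumulation, and the half-cycle-time threshold — can be sketched in a few lines of Java. `EtcController` and its method names are our own illustrative choices; in the real system the two times would come from the calculation and estimation of equations (2) and (3).

```java
// Sketch of the ETC change rule: accumulate the (clamped) error integral
// and request an algorithm change when it exceeds half the running
// algorithm's cycle time.
public class EtcController {
    private double integral = 0.0; // I(k)

    // runningTime: calculated time of the executed algorithm;
    // alternativeTime: estimated time of the alternative algorithm.
    // Returns true when the ETC should switch algorithms.
    public boolean step(double runningTime, double alternativeTime) {
        double error = runningTime - alternativeTime;  // eq. (1)
        integral = Math.max(0.0, integral + error);    // I(k), clamped at 0
        if (integral > runningTime / 2.0) {            // change threshold
            integral = 0.0;                            // I(k-1) = 0 after a change
            return true;
        }
        return false;
    }
}
```

A one-off advantage of the alternative algorithm does not trigger a change; only a sustained advantage accumulates enough integral to cross the threshold, which is what damps continuous back-and-forth switching.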
4. ESTIMATION OF ALGORITHMS EXECUTION TIME
In the implementation algorithms the dependency of the
computation time on the size of the treatment and formation
lists is observed. First, we study the SRP algorithm. The SRP
algorithm cycle time expression is:
ET(SRP) = T1*size(MRPL) + T2*FTnumber + T3*size(MRPLnext) + T4*size(MSPLnext)    (2)

Where:
• FTnumber is the number of fired transitions
• T1: mean consulting time of the output transitions of a marked representing place
• T2: mean firing time of an enabled transition
• T3: mean update time of MRPL with a place of MRPLnext
• T4: mean update time of MSPL with a place of MSPLnext
The ET algorithm is also analyzed. The cycle time expression
is:

ET(ET) = T5*size(ETL) + T6*FTnumber + T7*size(AETL)    (3)

Where:
• T5: mean time of the enabling test of a transition
• T6: mean firing time of an enabled transition
• T7: mean update time of ETL with a transition of AETL
The ETC must determine the execution time of the alternative
algorithm. Thus, it needs to know the values of times T1 to
T7 and the number of fired transitions, and to estimate the
size of the treatment and formation lists of that algorithm if
it were being executed. Times T1 to T7 are measured in an
off-line execution test. For this purpose, the required measurement
instrumentation is incorporated into the program. This
instrumentation comprises the instructions required for
reading the system clock and the necessary time difference
calculations.
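Equations (2) and (3) translate directly into code. A minimal sketch follows; the class and method names are illustrative, and the T1..T7 fields stand for the mean times measured in the off-line test.

```java
// Sketch of the cycle-time estimates of equations (2) and (3).
public class CycleTimeEstimator {
    // mean times (e.g. in nanoseconds), measured in the off-line test
    public double t1, t2, t3, t4, t5, t6, t7;

    // eq. (2): SRP cycle time from the SRP list sizes
    public double estimateSrp(int mrplSize, int firedTransitions,
                              int mrplNextSize, int msplNextSize) {
        return t1 * mrplSize + t2 * firedTransitions
             + t3 * mrplNextSize + t4 * msplNextSize;
    }

    // eq. (3): ET cycle time from the ET list sizes
    public double estimateEt(int etlSize, int firedTransitions, int aetlSize) {
        return t5 * etlSize + t6 * firedTransitions + t7 * aetlSize;
    }
}
```

When the estimated algorithm is not the one running, the list sizes themselves must also be estimated, as described next.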
If the net is executed with the SRP algorithm, the net
execution computation time with the ET algorithm must be
estimated. The number of fired transitions (FTnumber) is
known because the different algorithms lead the net to evolve
in the same way and, therefore, the number of fired
transitions will be equal in all the algorithms. The size(ETL)
will be estimated when the enabling of the transitions
represented by the marked representing places is tested. The
solution adopted is to make an approximation that involves
considering as enabled half of the represented transitions yet
to be verified when an enabled transition that can be fired is
found. The size(AETL) will be estimated in the transition
firing as the size of the set of descending transitions of the
output places of the fired transitions. The results are shown in
section 5.
In the execution of the ET algorithm, the sizes of the lists of
the SRP algorithm must be estimated. The mean number of
marked representing places is more or less constant in most
nets; therefore, size(MRPL) will be the mean value
estimated in the off-line time measurement test.
Consequently, it can be stated that, on average, the firing of a
transition involves the unmarking of its representing place
and the marking of a new one. size(MRPLnext) can be
approximated by the number of transitions fired:
size(MRPLnext) ≈ FTnumber    (4)

size(MSPLnext) can be approximated by the expression:

size(MSPLnext) ≈ FTnumber * (fp − 1)    (5)
Where fp is the mean parallelism factor (number of output
places of a transition) of the net.
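The approximations of equations (4) and (5) can be sketched as follows; the class and method names are our own, and fp is the mean parallelism factor defined above.

```java
// Sketch of the SRP formation-list size approximations used while the
// ET algorithm is running (equations (4) and (5)).
public class SrpListSizeEstimator {
    // eq. (4): on average, each firing unmarks one representing place
    // and marks a new one.
    public static int mrplNextSize(int firedTransitions) {
        return firedTransitions;
    }

    // eq. (5): the remaining fp - 1 output places of each fired
    // transition become synchronization places.
    public static double msplNextSize(int firedTransitions, double fp) {
        return firedTransitions * (fp - 1.0);
    }
}
```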
5. TECHNIQUE EVALUATION
5.1 Test Platform
A library of Petri Nets has been developed for carrying out
the tests. The library is based on eight base models which can
be scaled using a parameter. Some of these models are Petri
Nets which are well-known and frequently used in the
literature. The library comprises the following nets:
• SEQ. Petri Nets with one sequential process with ne
(1..100) states (Fig. 2.a).
• PAR. Petri Nets with p (1..100) sequential processes with
2 places (Fig. 2.b).
• PR1. Petri Nets with p (1..40) sequential processes with 2
states and a common resource (Fig. 2.c). These belong to
s3pr net class (Ezpeleta and Colom, 1995).
• DB. Petri Nets of b (5..11) databases (Fig. 2.d) (Jensen,
1997).
• P1R. Petri Nets with 1 sequential process and r (1..40)
resources (Fig. 2.e). These belong to s3pr net class
(Ezpeleta and Colom, 1995).
• PH. Petri Nets with the philosophers' problem (Dijkstra,
1971) (Fig. 2.f) with f (5..40) philosophers.
• SQUARE. Petri Nets with r (1..40) sequential processes of
r+1 states and r common resources (Fig. 2.g, defined by
the authors).
• PR5. Petri Nets of p (5..62) sequential processes of 6
states and 5 common resources (not shown in the figure).
These belong to s3pr net class (Ezpeleta and Colom,
1995).
We have implemented centralized PN implementation
techniques in the Java language using the Java Real-Time
extension (RTSJ, 2000) and following some ideas presented in
(Piedrafita and Villarroel, 2006 a, b). In our implementations,
we used the Real-Time Java Virtual Machine JamaicaVM
v2.7 (Aicas, 2007). The target hardware was a personal
computer with a Pentium IV processor at 1.7 GHz, running
Red Hat Linux 2.4.
5.2 Computation Time Estimation Tests
The results of the tests carried out to establish the precision
of the estimations are shown in Fig. 3. The execution time of
2000 transition firings is shown in nanoseconds. Figures 3.a
to 3.e show the estimation of the execution time compared with the
measured execution time for the two algorithms and for
several kinds of nets (SEQ, PAR, PR5, SQUARE, PH) and
sizes. In Fig. 3.a the results for 100 SEQ nets are shown,
varying the number of places from 1 to 100. In Fig. 3.b the
results for PAR nets of 1 to 100 p processes are shown.
Fig. 2. Petri Nets Library: a) SEQ, b) PAR, c) PR1, d) DB, e) P1R, f) PH, g) SQUARE.
Fig. 3. Petri Net tests with ET and SRP. SRPreal and ETreal are the actual execution times of the algorithms and SRPest and ETest the estimated ones. Panels a) SEQ, b) PAR, c) PR5, d) SQUARE and e) PH show the real execution and the computing time estimation. In panels f (PAR nets) and g (SQUARE nets), the plot ET is the execution time of the algorithm without performing the estimation of SRP, and ETestSRP with the estimation. In panels h (SEQ nets) and i (SQUARE nets), the plot SRP is the execution time of the algorithm without performing the estimation of ET, and SRPestET with the estimation.
In Fig. 3.c the results for 35 PR5 nets are shown, varying the
number of processes from 5 to 39. In Fig. 3.d the results for
10 SQUARE nets are shown, varying the number of
processes from 5 to 15. In Fig. 3.e the results for 36 PH nets
are shown, varying the number of philosophers from 5 to 40.
In all the performed tests the estimation has been precise
enough.
On the other hand, Fig. 3.f to Fig. 3.i show four examples of
the overload introduced by the computation of the estimation
of the alternative algorithm's execution time. In all the performed
tests the overload is minimal.
5.3 Real Time Execution of ETC
The experiments shown were carried out on a net comprising
a SQUARE of 8 resources and a PAR of 20 processes. In this
net, if events reach the SQUARE and the PAR parts in
similar amounts, the SRP and ET algorithms display the
same computation time. If more events reach the SQUARE
part of the net, the algorithm with the best behavior is SRP
(see Fig. 3.d). If more events reach the PAR part, the
algorithm with the best behavior is ET (see Fig. 3.b).
1. Modelización y Métodos Formales
9
0
0
0.5
x 10
10
1.5
2
time
b) SQUARE8PAR20 Real Time execution ETC, SRP and ET
1
2.5
ETC
SRP
ET
5
0
0
0.5
1
change algorithm ET to SRP
1.5
2
time
change algorithm SRP to ET
c) SQUARE8PAR20 Time computing
7
x 10
10
time nanosecons
ETC
ESTsame
ESTother
Integral+
5
6
cycle time nanosecons
a) SQUARE8PAR20 Real Time execution an Estimation
2.5
0
-5
0
0.5
1
1.5
2
integSRP
integET
2.5
time
Fig. 4 Execution of ETC
As has been stated, the change takes place when I(k) is
greater than half the current cycle time of the executed
algorithm. Fig. 4.b shows the execution of the ETC,
compared with the actual execution of SRP and ET when
faced with the same sequence of events.
cycle time nanosecons
6
cycle time nanosecons
a) SQUARE8PAR20 Real Time execution an Estimation
x 10
10
x 10
a) SQUARE8PAR20 Real Time execution and Estimation
Integral+
0
0.5
ETC
ESTsame
ESTother
Integral+
1
1.5
2
time
6
10
ESTsame
ESTother
5
0
ETC
x 10
b) SQUARE8PAR20 Real Time execution ETC, SRP and ET
2.5
ETC
SRP
ET
5
0
0
0.5
1
1.5
2
2.5
time
4
c) SQUARE8PAR20 Time computing
x 10
2
inteSRP
0
-2
inteET
0
0.5
1
1.5
2
2.5
time
Fig. 6. Execution of ETC
In Fig. 4.c, Fig. 5.c and Fig. 6.c we see the real error integral
between SRP and ETC (inteSRP), and the real error integral
between ET and ETC (inteET). When these graphs are
positive, they show the computation time gained with respect
to the two algorithms; when they are negative, they show the
slight overload that the ETC introduces with respect to the
best algorithm.
5
0
0
0.5
6
x 10
1
1.5
2
2.5
3
6. CONCLUSIONS
3.5
time
b) SQUARE8PAR20 Real Time execution ETC, SRP and ET
ETC
SRP
ET
10
5
0
0
0.5
1
1.5
change algorithm SRP to ET
2
2.5
3
3.5
change algorithm ET to SRP
time
c) SQUARE8PAR20 Time computing
integSRP
integET
7
time nanosecons
6
10
7
5
time nanosecons
cycle time nanosecons
6
x 10
10
Fig. 6 shows a test in which events only reach the PAR part
of the net during odd cycles (best algorithm ET) and only
reach the SQUARE part during even cycles (best SRP
algorithm). As can be observed, the ETC does not make any
algorithm change; this is due to the fact that I(k) (Integral+)
does not reach the change value and executes the SRP
algorithm, which is the one with the shortest mean
computation time.
cycle time nanosecons
Fig. 4.a shows the execution of the ETC, the estimation of
the algorithm that is being executed at the time (ESTsame)
and of the other algorithm (ESTother). The cost function
integral I(k) (Integral+ in the figure) is calculated from the
difference between the two estimations.
In the test of Fig. 5, between 1.8 and 2 seconds the two algorithms behave practically identically, and from second 2 onwards more events reach the SQUARE part (the best algorithm will be SRP).
The ETC is implemented in a high-priority Periodic Real Time Thread with a 20 ms period. Fig. 4 shows a test in which events only reach the SQUARE part for 20 cycles (the best algorithm will be SRP), and in the following 60 cycles events only reach the PAR part of the net (the best algorithm will be ET).
[Fig. 5. Execution of the ETC: (a) SQUARE8PAR20 real-time execution and estimation; (b) real-time execution of ETC, SRP and ET, with the changes of algorithm (SRP to ET, and ET to SRP) marked; (c) computing time (integSRP, integET). Vertical axes: cycle time in nanoseconds; horizontal axes: time.]
In Fig. 5 we see a test in which more events reach the PAR part for the first 1.8 seconds (the best algorithm will be ET), while the number of events that reach the SQUARE part is gradually increased.
6. CONCLUSIONS

In this work we have developed a technique which allows the choice, in real time, of the most suitable algorithm to execute a Petri Net in accordance with its structure and the sequence of events. The execution of a Petri Net without a suitable algorithm can lead to significant increases in computation time, together with a poorer and slower response in control applications.
The ETC has been tested with all the nets comprising the PN
library, with a high best-algorithm success rate. The
execution of the ETC can lead to enormous savings in
computation time; in a SQUARE net with 15 resources, the
saving would amount to 43% (see Fig. 3.d); in a PH net with
40 philosophers, the saving would amount to 77% (see Fig.
3.e). The success rate is lower in nets in which the SRP and
ET algorithm computation time is very similar, such as PAR
nets with 5 to 15 processes.
XI Jornadas de Tiempo Real (JTR2008)
The ETC enables the algorithm that executes a Petri Net
fastest to be chosen in real time, thus leading to faster
reaction of Petri Net-based control systems. One application
of the technique could be minimization of the execution time
of programs written in Grafcet, by minimizing the
Programmable Logic Controller cycle time.
Future work will examine the following:
• Incorporation of new Petri Net execution algorithms into the ETC.
• Improvement of the real-time estimation of algorithm computation time.
• Systematization of the use of the reachability graph so as to choose the most suitable algorithm to implement a Petri Net.
ACKNOWLEDGMENTS
This work was funded by the NERO project DPI2006-07928 of the Spanish Ministry of Science and Technology.
REFERENCES
Aicas GmbH (2007). JamaicaVM Realtime Java Technology.
http://www.aicas.com/jamaica.html
Briz, J.L. (1995). Técnicas de implementación de Redes de
Petri. PhD thesis, Univ. Zaragoza.
Colom, J.M., Silva M., and Villarroel, J.L. (1986). On
software implementation of Petri Nets and coloured Petri
Nets using high-level concurrent languages. Proc of 7th
European Workshop on Application and Theory of Petri
Nets, Oxford. pp 207-241
Dijkstra E. W. (1971). Hierarchical ordering of sequential
processes. Acta Informática, vol.1, pp. 111-127
Ezpeleta, J., Colom, J.M. and Martínez, J.(1995). A Petri Net
based deadlock prevention policy for flexible
manufacturing systems. IEEE Transactions on Robotics
and Automation, Vol. 11, 2, pp. 173-184.
Jensen, K. (1987). Coloured Petri nets. Petri Nets: Central Models and Their Properties, pages 248-299. Springer-Verlag. LNCS 254.
Murata, T. (1989). Petri Nets: Properties, Analysis and
Applications. Proc. of the IEEE, 77(4) , 541-580.
Piedrafita, R. and Villarroel, J.L. (2006). Petri Nets and Java. Real-Time Control of a flexible manufacturing cell. 11th IEEE International Conference on Emerging Technologies and Factory Automation. Prague.
Piedrafita, R. and Villarroel, J.L. (2006). Implementation of
Time Petri Nets in Real-time Java. The 4th International
Workshop on Java Technologies for Real-time and
Embedded Systems. Paris.
Piedrafita, R. and Villarroel, J.L. (2007). Performance
Evaluation of Petri Nets Execution Algorithms. 2007
IEEE International Conference on Systems, Man, and
Cybernetics. Montreal. (to be published)
RTJS. The Real-Time for Java Expert Group (2000). The Real-Time Specification for Java. https://rtsj.dev.java.net/. Addison Wesley.
Silva, M. and Velilla, S. (1982). Programmable logic
controllers and Petri Nets: A comparative study. Proc. of
the Third IFAC/IFIP Symposium, Software for Computer
Control, pp 83-88.
Villarroel, J.L. (1990). Integración Informática del Control de
Sistemas Flexibles de Fabricación. PhD thesis,
University of Zaragoza.
1. Modelización y Métodos Formales
Modeling and Verification of Master/Slave Clock
Synchronization Using Hybrid Automata and
Model-Checking
Guillermo Rodríguez-Navas, Julián Proenza
SRV, Dept. de Matemàtiques i Informàtica
Universitat de les Illes Balears, Spain
[email protected], [email protected]
Hans Hansson
Mälardalen Real-Time Research Centre
Dept. of Computer Science and Electronics
Mälardalen University, Sweden
[email protected]
Abstract— An accurate and reliable clock synchronization
mechanism is a basic requirement for the correctness of many
safety-critical systems. Establishing the correctness of such mechanisms is thus imperative. This paper addresses the modeling
and formal verification of a specific fault-tolerant master/slave
clock synchronization system for the Controller Area Network.
It is shown that this system may be modeled with hybrid
automata in a very natural way. However, the verification of
the resulting hybrid automata is intractable, since the modeling
requires variables that are dependent. This particularity forced
us to develop some modeling techniques by which we translate
the hybrid automata into single-rate timed automata verifiable
with the model-checker UPPAAL. These techniques are described
and illustrated by means of a simple example.
I. INTRODUCTION
This paper addresses the formal verification of a specific
solution for fault-tolerant clock synchronization over the Controller Area Network (CAN) fieldbus [1]. This solution is
called OCS-CAN, which stands for Orthogonal Clock Subsystem for the Controller Area Network [2], [3]. The aim of
this formal verification is to use model checking in order to
determine whether the designed fault tolerance mechanisms
guarantee the desired precision in the presence of potential
channel and node faults.
OCS-CAN can be naturally described with the formalism of
hybrid automata [4] by assuming that clocks are continuous
variables. Unfortunately, the resulting automata cannot be
directly verified with model checking. The main difficulties
are caused by two specific characteristics of the adopted clock
synchronization algorithm: the existence of clocks of various
rates, and the fact that neither the rates nor the values of the
clocks are independent.
Without the second characteristic, the first one would not
be a real problem. It is known that a system with clocks of
different rates, also known as a multirate clock system, can be translated into a verifiable single-rate timed automaton as long
as the rates of the clocks are independent [5], [6]. But the
second characteristic (the lack of independence) poses a real
challenge to model checking, as it actually relates to a more
general issue in the field of hybrid systems: the undecidability
of the reachability problem in hybrid automata where variables
are not decoupled [4], also called non-rectangular hybrid
automata.
Despite this limitation, we were able to translate our non-rectangular hybrid automata into a network of timed automata verifiable with UPPAAL [7], and thus model check the precision guaranteed by OCS-CAN, as shown in [3], [8]. The
essence of this translation is twofold: 1) the behavior of the
system is expressed over a single timeline, and 2) the lack
of precision (the offset) between the clocks is converted into
the corresponding delays over that timeline. The techniques
developed to perform these tasks, which are closely related to
the notion of perturbed timed automata [6], are discussed in
this paper.
The contribution of this paper is relevant in many senses.
First, it concerns the application of model checking to a
realistic, and relatively complex, system. Second, it addresses
a very important topic in the context of dependable embedded systems: formal verification of clock synchronization;
and proposes a novel approach, since to the authors’ best
knowledge, model checking has not been previously applied
to master/slave clock synchronization. Third, it shows that
despite the theoretical limitation of verifying non-rectangular
hybrid automata, the model of OCS-CAN can be translated
into timed automata to allow model checking of certain
properties. The discussed translation techniques may inspire
other researchers willing to model check hybrid systems with
dependent variables.
The rest of the paper is organized as follows. Sect. II
introduces the notion of perturbed timed automaton and relates
it to the problem of clock synchronization. In Sect. III,
the main characteristics of OCS-CAN are discussed, paying
special attention to the properties of its clock synchronization
algorithm. In Sect. IV, the basic notation of OCS-CAN is
12
Fig. 1.
XI Jornadas de Tiempo Real (JTR2008)
An example of two perturbed timed automata
defined, and the aim of the formal verification is stated in terms
of this notation. Sect. V describes the modeling of OCS-CAN
as a network of non-rectangular hybrid automata. In Sect. VI,
the translation of such hybrid automata into a network of timed
automata verifiable with U PPAAL is addressed. Some verification results are presented in Sect. VII, whereas Sect. VIII
summarizes the paper.
Fig. 2. An external observer to check precision between clock x1 and clock x2
II. PERTURBED TIMED AUTOMATA
Timed automata are, in principle, a very useful formalism to
model systems with clocks. However, timed automata exhibit
an important limitation: although they allow definition of multiple clocks, all clocks must evolve at the same pace [9]. This
represents a limitation because real systems often work with
drifting clocks, i.e. clocks that evolve at a slightly different
rate, and therefore such systems cannot be directly modeled as
timed automata. This limitation may, however, be overcome by
adopting certain modeling techniques. One such technique,
which is known as perturbed timed automata [6], proposes to
move the uncertainty caused by the drifting clocks into the
guards and invariants of the automata. A similar technique is
also used in [5].
The usefulness of perturbed timed automata is illustrated
by the example in Fig. 1. This example shows two automata
which exhibit the same behavior: they both use a clock (x1
and x2, respectively) to trigger a periodical action (signaled
through channels a1 and a2, respectively), with period R.
Both clocks are assumed to start simultaneously and to have
the same maximum drift (ρ) with respect to real time. Due to
this drift, they actually do not trigger the periodical actions
at an exact point of time, but they may trigger it within a
time interval [R - ρR, R + ρR], as defined by the guard and
invariant expressions.
When using such a model, the lack of synchronism between
the clocks can be easily checked by an external observer,
which just measures the time elapsed between the signaling
over channel a1 and the signaling over channel a2. This observer is depicted in Fig. 2. Notice that location Failure can
only be reached when one of the automata has performed the
periodical signaling at least Π time units later than the other
one. Assuming that exceeding such a threshold is undesirable for
some reason, the following safety property should be defined
for the system: A[ ] not Observer.Failure, stating
that location Failure should never be reached.
Note that according to the automata of Fig. 1, the location
Failure is reachable, regardless of the value of Π, because
the clocks are never resynchronized. Therefore, behaviors in
which they continuously diverge are possible. This perfectly matches the behavior of a real system with unsynchronized clocks.

Fig. 3. Architecture of an OCS-CAN system
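The observer's check, and the growth of the worst-case divergence between two free-running drifting clocks, can be sketched numerically (all constants below are invented):

```python
def observer_fails(t_a: float, t_b: float, precision: float) -> bool:
    """Location Failure is reached when one automaton signals more than
    PI time units later than the other (the observer of Fig. 2)."""
    return abs(t_a - t_b) > precision

def tick_window(k: int, R: float, rho: float):
    """Worst-case window for the k-th periodic action of a clock with
    maximum drift rho: each period lies in [R - rho*R, R + rho*R]."""
    return (k * R * (1 - rho), k * R * (1 + rho))

lo, hi = tick_window(10, R=100.0, rho=0.01)
assert abs(lo - 990.0) < 1e-9 and abs(hi - 1010.0) < 1e-9
# Without resynchronization the window widens with k, so for any fixed
# precision PI the two clocks can eventually violate it:
assert observer_fails(lo, hi, precision=15.0)
```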
Nevertheless, the aim of this work is to model check the
clock error of a system (OCS-CAN) where clock resynchronization is periodically performed, and where the effect of
resynchronization is to dynamically change the values and
drifts of the clocks. For instance, we wish to specify actions
such as x2:= x1, which means that clock x2 takes the
value of clock x1 (i.e. x2 synchronizes to x1). This requires
more complex modeling than just perturbed automata. The
techniques we have developed for this modeling are described
in Sect. VI.
III. SYSTEM UNDER VERIFICATION
OCS-CAN is designed to be incorporated into a CAN-based
distributed embedded system. The role of OCS-CAN within
such a system is to provide a common time view, which
the processors of the nodes can rely on in order to perform
coordinated actions [2], [3].
A. Architecture of OCS-CAN
OCS-CAN is made up of a set of specifically designed
hardware components, named clock units, which are interconnected through a CAN bus. When OCS-CAN is used, a clock
unit is attached to every node of the system, as depicted in
Fig. 3, along with the processor and the fieldbus controller
(FC). Notice that the clock unit has its own connection to the
CAN bus.
The clock unit is provided with a discrete counter, the so-called virtual clock, which is intended to measure real time.
The clock units execute a master/slave clock synchronization
algorithm, which aims at keeping all virtual clocks within
a given interval of tolerance, which is called precision. In
Fig. 4. Transmission pattern of the TM in the absence of faults

Fig. 5. The Time Message contains a timestamp of the Start of Frame bit

Fig. 6. Order of events within a synchronization round
principle, only one of the clock units (the master) is allowed
to spread its time view, and the rest of clock units (the slaves)
synchronize to this time view.
In order to spread its time view, the master periodically
broadcasts a specific message, which is called the Time Message (TM). Fig. 4 shows the transmission pattern of the TM
when the resynchronization period is R time units.
The function of the TM is twofold: it signals the resynchronization event, which coincides with the first bit (the Start of
Frame bit) of the TM, and also contains a timestamp that
indicates the occurrence time of that event. This is depicted in
Fig. 5. Thanks to such timestamp mechanism, after receiving
the TM, every slave can adjust the value and the rate of its
virtual clock to take the value and the rate of the master’s
virtual clock [2].
B. Fault Tolerance Issues
Concerning the fault model, it is important to remark that
the failure semantics of the clock unit is restricted to crash
failure semantics by means of internal duplication with comparison. With respect to channel faults, OCS-CAN assumes the
CAN bus to provide timely service but neither reliability nor data
consistency. This means that a TM broadcast by a master clock
unit at time t is expected to be delivered to some clock unit
within the interval (t, t + wcrt] or not delivered at all, where
wcrt is the worst-case response time of the message [10]. Both
inconsistent duplicates and inconsistent omissions of the TM,
as defined in [11], [12], may occur. Permanent failures of the
bus, such as bus partition or stuck-at-dominant failures, are
not addressed by OCS-CAN.
In order to provide tolerance to faults of the master, OCS-CAN defines a number of backup masters, one of which should
take over upon failure of the active master. The mechanism
for master replacement assumes that masters are organized
hierarchically. The priority of a master is defined with two
parameters. The first parameter is the identifier of the TM
broadcast by the master; following the common convention in
CAN, a lower identifier implies higher priority. The second
parameter is the release time of the TM, which for every
round indicates to every master when it is allowed to broadcast
its corresponding TM. The release time of master m in the
resynchronization round k, is calculated as follows:
Trls_m = k · R + ∆m

where R is the resynchronization period (the same for all masters) and ∆m (the release delay) is a small delay, in the order of a few ms, whose length is inversely proportional to the priority of the master.
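The release-time formula can be exercised with a small sketch (R and the ∆ values below are invented):

```python
def release_time(k: int, R: float, delta: float) -> float:
    """Trls_m = k*R + delta_m for resynchronization round k (formula above)."""
    return k * R + delta

# Hypothetical configuration: R = 100 ms and release delays of 0, 2 and
# 4 ms for masters of decreasing priority (smaller delay = higher priority).
deltas = [0.0, 2.0, 4.0]
times = [release_time(3, 100.0, d) for d in deltas]
assert times == [300.0, 302.0, 304.0]
# In every round, the highest-priority master is released first:
assert times == sorted(times)
```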
The release time, combined with the assignment of identifiers discussed above, must guarantee that in a round, a master
may broadcast its TM before a master of higher priority only
if the latter is faulty. This is depicted in Fig. 6, for the case of
three masters. In the absence of faults, the second and third
TM are usually not broadcast, and if any of them is broadcast
(for instance because one backup master could not timely abort
a just-requested TM broadcast) then it is ignored by the slaves.
The spare TMs are only taken into account if master 0 fails
and is not able to broadcast its TM. Thanks to the master
redundancy, in such situation the system will recover after a
very short delay.
Nevertheless, in a CAN network it may happen that a
message is not consistently received by all the nodes, as
discussed in [11], [12]. In such cases, the clock units might not
receive a TM to synchronize with, or even worse, in the same
round different clock units may synchronize to TMs broadcast
by different masters. These scenarios, although being rather
unlikely, may jeopardize clock synchronization and should be
carefully studied.
A fundamental property of the CAN protocol states that,
regardless of being consistent or not, a CAN broadcast always
finishes within a bounded time interval, so the worst-case
response time of any broadcast can be calculated, as discussed
in [10]. In OCS-CAN this property implies that whenever a
master clock unit requests a TM broadcast, this request causes
a reception of the TM in some other clock units before wcrt
time units, or it does not cause any reception at all.
This property also means that for every resynchronization
round, receptions of the TM may only happen within a
bounded temporal interval. This is shown in Fig. 6 by means
of a shadowed window, which is called TMdelay. In an OCS-CAN system, the length of TMdelay is equal to ∆_l + wcrt_l,
where l is the master of lowest priority in the system. Since
clock synchronization may only happen after reception of a
TM, this implies that the maximum distance between two
consecutive synchronizations of a clock unit is Rmax = R + TMdelay. Although it is not properly represented in Fig. 6,
R is always much greater than TMdelay.
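A small numerical sketch of TMdelay and Rmax, with invented values:

```python
def tm_delay(delta_l: float, wcrt_l: float) -> float:
    """TMdelay = delta_l + wcrt_l, where l is the lowest-priority master."""
    return delta_l + wcrt_l

def r_max(R: float, tmdelay: float) -> float:
    """Maximum distance between two consecutive synchronizations:
    Rmax = R + TMdelay."""
    return R + tmdelay

# Hypothetical values (ms): release delay 4, worst-case response time 1, R = 100.
d = tm_delay(4.0, 1.0)
assert d == 5.0
assert r_max(100.0, d) == 105.0  # R is much greater than TMdelay, as stated
```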
C. Master and Slave Finite State Machines
This section describes the algorithms executed by the clock
units, as they are fundamental to understand the model used
for formal verification. Every clock unit may behave either as a
master or a slave. A non-faulty master clock unit executes the
finite state machine in Fig. 7, whereas a non-faulty slave clock
unit executes the finite state machine in Fig. 8. Both algorithms
are built upon five primitives: TM.Request, TM.Indication,
TM.Confirm, TM.Abort and Sync.
TM.Request. A master executes TM.Request to broadcast
its TM as soon as it reaches the corresponding release time.
This primitive is denoted TM.Req(n), where n is the identifier
of the TM broadcast. Further information about the low-level actions triggered by TM.Req, such as timestamping, is
available in [3].
TM.Indication. This primitive is executed when a TM
is received. It is denoted TM.Ind(n), where n indicates the
identifier of the received TM. Every master compares the value
of n with its own identifier (m) to determine whether this TM
comes from a higher priority master (case n < m) or not.
Masters may only synchronize to masters of higher priority.
TM.Confirm. This primitive indicates to the transmitting
master that a previously requested TM broadcast has been
successful. It is denoted TM.Conf(n), where n indicates the
identifier of the successfully broadcast TM.
TM.Abort. A master uses this primitive to abort the broadcast of a TM whose transmission was previously requested. It
is denoted TM.Abort(n), where n is the identifier of the TM to
be aborted. This action is caused by the reception of a higher
priority TM, and has some associated latency so it may be the
case that the TM broadcast is not timely aborted.
Sync. This primitive is executed by any clock unit (either
master or slave) that receives a valid TM and wants to adjust
its own virtual clock to the value conveyed by the TM.
For the slaves, a valid TM is the first TM received in any
resynchronization round (first TM.Ind(n)). For the masters, a
valid TM is the first TM of higher priority received in any
resynchronization round (the first TM.Ind(n) with n < m),
provided that the master did not successfully broadcast its own
TM in that round. This primitive is denoted Sync(n, a), where
a indicates the clock unit that is adjusting its virtual clock,
and n is the identifier of the TM which clock unit a is using
as a reference.
Concerning the Sync primitive, it is important to remark that
the clock adjustment can never be exact. Even with the very
accurate timestamping mechanism of OCS-CAN [3], certain
imprecision remains, for instance due to small system latencies
or to fixed-point arithmetic.
Note that a clock unit can only synchronize once per round.
This is ensured by entering a waiting state after execution of
the Sync primitive, in which further receptions of TM are
ignored. Given that R > TMdelay (as already indicated in
Sect. III-B), we ensure that TM duplicates and non-aborted
TMs cannot cause duplicated resynchronizations.
IV. AIM OF OUR FORMAL VERIFICATION
In this section, the basic notions of OCS-CAN, such as clock
unit or virtual clock, are formally defined. These definitions
Fig. 8. Behavior of a non-faulty slave s
are especially useful for describing the aim of our formal
verification, which is to model check the precision guaranteed
by OCS-CAN under diverse fault assumptions.
A. Basic Definitions
The synchronization algorithm is characterized by the resynchronization period R and two parameters, ε0 and γ0, which indicate the "quality" of the mechanism for clock adjustment.
The failure assumptions are defined with two values OD
(the omission degree) and CD (the crash degree), which
indicate the maximum number of consecutive rounds affected
by inconsistent message omissions and the maximum number
of faulty masters, respectively.
Definition 1 An OCS-CAN system is a set:

OCSS = {A, R, ε0, γ0, OD, CD}

such that:
• A is a set of clock units.
• R ∈ R+ is the resynchronization period of the clock synchronization algorithm.
• ε0 ∈ R+ is the maximum offset error after synchronization.
• γ0 ∈ R+, γ0 << 1, is the maximum drift error after synchronization.
• OD ∈ N is the omission degree.
• CD ∈ N is the crash degree.
In an OCS-CAN system, the state of a clock unit is defined at any instant by the three following variables: the value of its virtual clock vc(t), the rate of its virtual clock v̇c(t), and its operational state f(t). Furthermore, every clock unit is characterized by the following three additional parameters, which indicate how the clock unit executes the clock synchronization algorithm: the relative priority (p) of the TM that the clock unit broadcasts, the release delay (∆) of the TM that the clock unit broadcasts, and the worst case response time (wcrt) of the TM that the clock unit broadcasts.

Definition 2 A clock unit a ∈ A is a 6-tuple:

a = (vc_a, v̇c_a, f_a, p_a, ∆_a, wcrt_a)

such that:
• vc_a(t) ∈ R+ is the value of the virtual clock of a at time t, ∀t ∈ R+.
• v̇c_a(t) ∈ R+ is the instantaneous rate (or speed) of the virtual clock of a at time t, ∀t ∈ R+.
• f_a : R+ → {0, 1} is the operational state of clock unit a. f_a(t) = 1 when a is faulty at time t, otherwise f_a(t) = 0.
• p_a ∈ N is the relative priority of the TM that clock unit a broadcasts, where p_a = 0 means that the clock unit never broadcasts the TM.
• ∆_a ∈ R+ is the release delay of the TM that clock unit a broadcasts.
• wcrt_a ∈ R+ is the worst case response time of the TM that clock unit a broadcasts.

Fig. 7. Behavior of a non-faulty master m
Note that although the virtual clock of a clock unit is
actually implemented as a discrete counter, and therefore it
may take only values over N, we define it over R+ for
compatibility with the definition of time in timed automata.
Also note that the values of ∆_a and wcrt_a are irrelevant for slaves.
B. Offset and Precision
In OCS-CAN, each clock unit supplies its corresponding
processor with a local view of real time. Therefore, the
consistency in the perception of time depends on the difference
(or offset) exhibited by the virtual clocks.
Definition 3 Let A be a set of clock units. The maximum offset of set A at time t is:

Φ_A(t) = max_{a,b ∈ A} |Φ_ab(t)|

where Φ_ab(t) = vc_a(t) − vc_b(t) is the offset between clock units a, b ∈ A at time t.
When the maximum offset between the clock units is
always bounded, then the OCS-CAN system is said to be
synchronized.
Definition 4 An OCS-CAN system is Π-synchronized when there exists a constant Π ∈ R+, which is called the precision, such that Φ_A(t) ≤ Π, ∀t ∈ R+.
The extent to which the system is synchronized depends on
the value of Π. The lower the value of Π, the higher is the
achieved precision.
Last, we define the concept of consonance between two
clock units, as this concept turns out to be very important
when modeling drifting clocks.
Definition 5 Let a, b ∈ A be two clock units. The consonance between them at time t is:

γ_ab(t) = v̇c_a(t) − v̇c_b(t)
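Definitions 3 and 5 translate directly into a few lines of Python; the clock snapshot below is invented for illustration:

```python
def max_offset(clocks: dict) -> float:
    """Phi_A(t): maximum pairwise offset |vc_a(t) - vc_b(t)| (Definition 3)."""
    vs = list(clocks.values())
    return max(abs(a - b) for a in vs for b in vs)

def consonance(rate_a: float, rate_b: float) -> float:
    """gamma_ab(t): difference between instantaneous clock rates (Definition 5)."""
    return rate_a - rate_b

# Invented snapshot of three virtual clocks at some instant t:
clocks = {"m0": 1000.0, "m1": 1000.4, "s0": 999.7}
assert abs(max_offset(clocks) - 0.7) < 1e-9
# The system is PI-synchronized at this instant for any PI >= 0.7 (Definition 4).
assert abs(consonance(1.0001, 0.9999) - 0.0002) < 1e-9
```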
V. MODELING OCS-CAN AS A NETWORK OF HYBRID AUTOMATA
The first step of model checking is to specify a formal model
of the system under verification. Whenever a system combines
both continuous components, which evolve over time as expressed by a differential equation, and discrete components,
expressed by finite state machines, hybrid automata are very
suitable for the modeling [13]. This is the case of OCS-CAN,
since the virtual clocks can be easily modeled as continuous
variables that are modified by the (discrete) synchronization
actions performed by the clock units.
In this section, we discuss how the behavior of OCS-CAN can be specified by means of hybrid automata. It is
shown that the resulting model includes variables that are not
independent. Although this characteristic makes, in principle,
the verification of our model unfeasible by model checking, in
Sect. VI we show that the model can still be translated into
timed automata. Thanks to this, some safety properties, such
as the guaranteed precision, can be verified.
A. Channel Abstraction
The communication channel is abstracted by means of
an additional process channel_control, together with a
global variable msg_id and a broadcast channel [7] called
Since the Sync(n,a) primitive may cause discontinuities of the virtual clock values as well as discontinuities of the virtual clock rates, it makes sense to define the following notation:

f(t0+) = lim_{t↘t0} f(t),    f(t0−) = lim_{t↗t0} f(t)

Fig. 9. Abstraction of the communication channel in OCS-CAN
tx_msg. This abstraction is shown in Fig. 9. The function of
the automaton channel_control is to enforce the worst-case response time of the TM broadcasts. A full description
of channel_control is available in [3].
The variable msg_id represents the identifier of the TM
being broadcast. TM.Request is modeled as a write operation
over msg_id, and CAN arbitration [1] is modeled by allowing
the masters to overwrite the value of msg_id only whenever
they have higher priority than the TM being transmitted. Therefore, TM.Req(m) is modeled with the following assignment: msg_id:= min{m,msg_id}. However, for compatibility with the UPPAAL model checker, we hereafter use
the C-like assignment: msg_id:= m <? msg_id, which
is equivalent.
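The arbitration modeling can be sketched as follows; the NO_TM sentinel value is an assumption for illustration, not part of the model:

```python
def tm_request(msg_id: int, m: int) -> int:
    """TM.Req(m) as modeled above: msg_id := min(m, msg_id), i.e. a master
    only overwrites the identifier of the TM being transmitted when it has
    higher priority (a lower identifier, as in CAN arbitration)."""
    return min(m, msg_id)

NO_TM = 255  # hypothetical 'no message in transit' value, above any real id
msg_id = NO_TM
for requesting_master in [3, 1, 2]:  # three masters request in arbitrary order
    msg_id = tm_request(msg_id, requesting_master)
assert msg_id == 1  # the lowest identifier (highest priority) wins
```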
The broadcast channel tx_msg is used by
channel_control to signal the instant at which the
TM is delivered. Therefore, TM.Confirm and TM.Indication
primitives are both signaled through tx_msg. For a master,
a signaling through tx_msg is a TM.Confirm if the value
of msg_id is equal to the identifier written by the master.
Otherwise, it is a TM.Indication.
B. Abstraction of Clock Correction
In [2], we provide some details about the way virtual clocks
are corrected (or adjusted) in OCS-CAN, and we highlight
that clock correction is never performed immediately, but it is
gradually carried out. This is called clock amortization.
Nevertheless, for the purpose of modeling and formal verification, we assume instantaneous clock correction instead of
clock amortization. We make this abstraction because including clock amortization would cause unnecessary complexity
in the modeling. We are interested in assessing the maximum
error (the achievable precision) between virtual clocks, and to
do this we only have to examine the value of the virtual clocks
a long time after the last resynchronization action. At these
time instants, and provided that clock amortization is properly
implemented [14], there is no difference between considering
either instantaneous clock correction or clock amortization.
When instantaneous clock correction is assumed, executing
the Sync(n,a) primitive is equivalent to assigning the value and
the rate of the virtual clock of master n to the virtual clock
of the synchronizing clock unit a. Since this assignment is
never exact, the value and the rate assigned are always within
an error interval. The width of this interval is determined
by the maximum offset error ε0, in the case of clock value
assignments, and by the maximum drift error γ0 , in the case
of clock rate assignments.
After that, the points of discontinuity can be characterized.

Definition 6 Let a ∈ A be a clock unit and m ∈ M be a master. Then both v̇c_a(t) and vc_a(t) are piecewise linear functions such that:
• v̇c_a(t+) ∈ B̄(v̇c_m(t−), γ0) when clock unit a executes Sync(m, a) at time t.
• vc_a(t+) ∈ B̄(vc_m(t−), ε0) when clock unit a executes Sync(m, a) at time t.
where B̄(x, ε) = [x − ε, x + ε].

Remark 1 Let m ∈ M be a master and a ∈ A be a clock unit. If clock unit a executes Sync(m,a) at time t then |Φ_ma(t+)| ≤ ε0 and |γ_ma(t+)| ≤ γ0.
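Definition 6 and Remark 1 can be illustrated with a small simulation sketch; sampling the adjustment intervals with a seeded random generator is an assumption for illustration:

```python
import random

def sync(master_value: float, master_rate: float,
         eps0: float, gamma0: float, rng: random.Random):
    """Instantaneous clock correction per Definition 6: the adjusted value
    and rate land in [x - eps0, x + eps0] and [r - gamma0, r + gamma0]."""
    return (master_value + rng.uniform(-eps0, eps0),
            master_rate + rng.uniform(-gamma0, gamma0))

rng = random.Random(0)
v, r = sync(1000.0, 1.0, eps0=0.5, gamma0=1e-4, rng=rng)
# Remark 1: right after Sync(m,a), offset and drift w.r.t. the master are bounded.
assert abs(v - 1000.0) <= 0.5
assert abs(r - 1.0) <= 1e-4
```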
C. Master and Slave Hybrid Automata
When using the discussed abstractions for the communication channel and for clock correction, the hybrid automaton
of a master corresponds to the one in Fig. 10.
Notice that in the transitions where the Sync(n, a) primitive
should be executed, which were described in Sect. III-C, this
model includes assignments to the virtual clock’s value and to
the virtual clock’s rate, as specified in Sect. V-B (Definition 6).
Particularly, these assignments occur in the transitions from
location 1 to location 4 and from location 2 to the committed
location right before locations 3 and 4.
Furthermore, this automaton models three additional characteristics of OCS-CAN masters: the inconsistent reception of
the TM, the possible non-abortion of the TM, and the possibility of master crash. A full description of these characteristics
can be found in [3].
Inconsistent receptions of the TM are modeled at the
receiver’s side, by ignoring TM.Indications. For this reason,
in locations 1 and 2, it is possible that a transition fired by
a valid TM (tx_msg? with msg_id < m) does not cause
any modification of the virtual clock.
When describing the management of the TM in Sect. III-C, it was mentioned that a TM broadcast may not be timely
aborted. This is modeled with a committed location, between
locations 2 and 3, which is reached when the master has
performed a TM.Request, but receives a TM.Indication of a
higher priority master. From this location, the master may
either overwrite the variable msg_id again or not. The former behavior represents a non-aborted message.
The master hybrid automaton includes a location that represents the crash failure (location 5). Notice that a master may
nondeterministically step into this state as long as there is
another non-faulty master in the system (condition nalive
> 1).
1. Modelización y Métodos Formales

Fig. 10. Hybrid automaton of master m
Fig. 11. Hybrid automaton of slave s
The hybrid automaton of a slave is depicted in Fig. 11. This
automaton also models the synchronization as an assignment
to the virtual clock’s value and to the virtual clock’s rate,
according to Definition 6. The possibility of inconsistent
receptions of the TM is modeled by having transitions that are
fired by TM.Indications but do not cause any clock correction.
Crash failures are not modeled for slaves, as such failures
do not have any consequence for the rest of the system.
VI. TRANSLATING THE MODEL INTO TIMED AUTOMATA
As discussed in Sect. IV, the aim of our formal verification is to determine whether or not an OCS-CAN system is Π-synchronized under certain fault hypotheses. This formal verification is addressed by translating our hybrid automata into a network of timed automata verifiable with UPPAAL.
The main challenge of this translation is that the Sync actions introduce clock and rate assignments.
Although these assignments cannot be directly specified in
timed automata notation, we circumvent this limitation in the
following way: 1) the behavior of the system is expressed over
a single timeline, and 2) the offset between the virtual clocks
is converted into delays over that timeline.
Therefore, the first step is to decide what this single timeline
represents. In our model, time corresponds to the clock of the
highest priority master, which is called reference clock hereafter. For the rest of clock units, we use the consonance (γi )
with respect to this clock in order to calculate the delays over
the reference timeline. Furthermore, four additional aspects
need special consideration:
• The instant when the offset is to be checked has to be
properly defined. This instant is called the observance
instant.
• Updates of the value of a virtual clock, as defined in the
equations of Sect. V-B, must be modeled.
• Updates of the rate of a virtual clock need to be modeled
as well. Particularly, it is important to model how a rate
change may affect the consonance with respect to the
reference clock.
• The model must include changes of the reference clock
when the master of highest priority crashes.
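As an illustration of step 2), the offset of a drifting clock can be expressed as a delay over the reference timeline (a sketch under our own simplified reading: a clock with consonance γ signals an event scheduled at clock time R within a factor (1 ± γ) of reference time):

```python
def occurrence_interval(R, gamma):
    """Reference-time window in which a clock with consonance gamma
    signals an event scheduled at clock time R (simplified model)."""
    return (R * (1 - gamma), R * (1 + gamma))

lo, hi = occurrence_interval(R=1.0, gamma=1e-4)  # R = 1 s, as in Sect. VII
# The width of the window bounds the delay this clock can exhibit
# with respect to the reference clock over one round.
assert abs((hi - lo) - 2 * 1e-4) < 1e-12
```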
In the following, these aspects are described in detail. A
model of a simplified OCS-CAN system, which is made up
of two masters and an arbitrary number of slaves, is used
to illustrate the main points. The modeling of the failure assumptions of OCS-CAN is not included, in order to reduce the complexity of the model and to aid the reader's understanding. The complete UPPAAL model can be found in [8].
A. Definition of the Observance Instant
In order to adopt the verification technique described in
Sect. II (the precision observer), the observance instant must be known a priori, and it must be signaled by
all of the clock units. Since we are interested in knowing
the precision of OCS-CAN, we should check it at the instant
with the maximum offset. This instant must be located before
a Sync(n,a) primitive because the involved clocks converge
immediately after this primitive is executed.
Although it is not possible to know the exact instant
of execution of any Sync(n,a) primitive, it is possible to
determine the maximum distance between the synchronization
instants of two consecutive rounds. In Sect. III-C it was shown
that the maximum distance is given by Rmax= R + TMdelay.
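This bound can be computed numerically (a hedged sketch of our own: the accumulated offset is estimated as the residual error ε0 after the last Sync plus drift γ over at most Rmax; the values of TMdelay, ε0 and γ below are illustrative assumptions, not parameters from the paper):

```python
def round_offset_bound(R, tm_delay, eps0, gamma):
    """Upper bound on the offset accumulated between two Sync instants:
    residual error eps0 plus drift over the maximum distance Rmax."""
    r_max = R + tm_delay          # maximum distance between sync instants
    return eps0 + gamma * r_max

# Illustrative numbers only (tm_delay, eps0 and gamma are assumptions).
bound = round_offset_bound(R=1.0, tm_delay=0.006, eps0=1e-6, gamma=1e-6)
assert bound == 1e-6 + 1e-6 * (1.0 + 0.006)
```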
XI Jornadas de Tiempo Real (JTR2008)

Fig. 12. Virtual clock of clock unit i
Fig. 13. Precision observer
Fig. 14. Auxiliary automata

This value can be used to upper bound the offset accumulated during one synchronization round, as described next.

B. Modeling of Virtual Clock Value Assignments
Fig. 12 depicts the virtual clock automaton, which
models the behavior of virtual clock i. Although one
virtual clock is included for each clock unit in the
system, this automaton does not describe the behavior of a
master or a slave; what it actually models is the passage
of time as measured by clock unit i, and represents it with
the clock vc[i]. According to the value of vc[i] certain
events are signaled, so the clock units (either master or slave)
can execute the clock synchronization algorithm discussed in
Sect. III-C. This means that in every round, every master and
slave automaton chooses which virtual clock it uses, which is
equivalent to having clock assignments.
As shown in Fig. 12, the virtual clock automaton
signals three events: the instant to broadcast the TM (through
channel begin[i]); the observance instant; and the instant
for resetting the virtual clocks (through channel end[i]).
Notice that the first two events are signaled within time
intervals whose lengths depend on the consonance (γi ) with
respect to the reference clock.
In the second event, the integer variable nsync is incremented. This variable is monitored by the observer depicted
in Fig. 13, in order to detect the first virtual clock to reach
this point. This observer resets clock watch after that event.
If watch exceeds a given value Π before all nodes increment
nsync then location Failure will be reached, expressing
that the system is not Π-synchronized. Note that the observer
makes use of the synchronization channel a, which is activated
by the dummy automaton shown in Fig. 14.
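The observer's check can be paraphrased operationally (a plain-code sketch of ours, not UPPAAL syntax): start a stopwatch when the first clock reaches the observance instant and fail if the last one arrives more than Π later.

```python
def pi_synchronized(observance_times, pi):
    """Mimics the precision observer of Fig. 13: the system is
    Pi-synchronized iff all virtual clocks reach the observance
    instant within a window of length pi."""
    watch_start = min(observance_times)   # first clock to arrive resets 'watch'
    return max(observance_times) - watch_start <= pi

assert pi_synchronized([10.000, 10.001, 10.002], pi=0.004)      # within Pi
assert not pi_synchronized([10.000, 10.001, 10.006], pi=0.004)  # Failure reached
```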
Once the observance instant and the precision observer have
been defined, the model must ensure that, for each virtual
clock, the delay in reaching this instant really corresponds to
the offset between the virtual clock and the reference clock.
According to the hybrid automata of Sect. V-C, the
Sync(n,a) primitive causes an update on the value of the
synchronizing virtual clock. This kind of clock assignment may be indirectly modeled with the simultaneous restart of
the clocks involved in the synchronization action. However, in
our model virtual clocks cannot be restarted immediately after
the Sync(n,a) primitive because this would interfere with the role
of the observer. Instead, virtual clocks have to continue until
they reach the observance instant and signal it.
This forces us to delay the simultaneous restart. In this
manner, the Sync(n,a) primitive does not cause a clock assignment (which is not possible in a timed automaton) nor an
immediate restart (which would make the measurement of the
precision impossible). Instead, Sync(n,a) causes an assignment
to a clock pointer, the variable ref_id, which is used later
on to detect when to restart vc[i]. This is shown in Fig. 15
for master 2, and in Fig. 16 for slave j.
In these automata, the channel abstraction of Sect. V-C,
based on the variable msg_id and the channel tx_msg, is
further simplified to reduce the complexity of the automata
and improve legibility. In fact, Sync(1,j) is signaled through
channel s1 whereas Sync(2,j) is signaled through channel s2.
In both automata it can be observed that vc[i] is restarted
when the corresponding virtual clock automaton signals
(through channel end[ref_id]) that the pointed clock has reached a certain value R1 = R/2 (third event in Fig. 12). This
modeling technique guarantees that all the clocks that have
synchronized to the same master are restarted simultaneously,
thus fulfilling Remark 1 in Sect. V-B. In contrast, whenever
two clocks do not synchronize to the same master, the offset
that these two clocks have accumulated in the round is kept
for the next round.
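The clock-pointer technique can be mimicked in plain code (a hedged sketch; ref_id and the restart at R1 follow the description above, everything else is our own scaffolding):

```python
class VirtualClock:
    """Minimal stand-in for the vc[i] clock of the UPPAAL model."""
    def __init__(self):
        self.value = 0.0
        self.ref_id = None        # clock pointer set by Sync, used later to restart

    def sync(self, master_id):
        # Sync does NOT assign the clock (impossible in a timed automaton);
        # it only records which master's clock to follow.
        self.ref_id = master_id

    def on_end_signal(self, master_id):
        # Restart only when the *pointed* clock signals end[ref_id],
        # i.e. when it reaches R1 = R/2 (third event in Fig. 12).
        if self.ref_id == master_id:
            self.value = 0.0
            return True
        return False

# Two clocks synchronized to master 1 restart together; a clock
# synchronized to master 2 keeps its accumulated offset for the next round.
a, b, c = VirtualClock(), VirtualClock(), VirtualClock()
a.value, b.value, c.value = 0.5001, 0.5002, 0.5004
a.sync(1); b.sync(1); c.sync(2)
restarted = [vc.on_end_signal(1) for vc in (a, b, c)]
assert restarted == [True, True, False] and c.value == 0.5004
```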
Channel all_end, which appears in the automaton of
Fig. 12, is used in order to avoid violation of time invariants.
The left auxiliary automaton of Fig. 14 uses this channel
to make every virtual clock automaton wait until all
masters and slaves have reset vc[i].
C. Modeling of Virtual Clock Rate Assignments
Clock rate assignments can be easily modeled with a variable γi that keeps the consonance with respect to the reference
clock. This variable is used by the virtual clock automata in
order to define the interval of occurrence of any relevant event.
TABLE I
FAULT ASSUMPTIONS AND PRECISION GUARANTEED (IN µs) WITH R = 1 s

                       # Faulty masters
# Channel faults     0       1       2       3
No faults            2       2.1     2.1     2.1
OD = 0               2.1     2.1     2.1     2.1
OD = 1               6.1     6.1     6.1     6.1
OD = 2               10.1    12.1    12.1    12.1
OD = 3               14.1    16.1    16.1    16.1

Fig. 15. Automaton of master 2
Fig. 16. Automaton of slave j
In Fig. 15 and 16 it can be seen that the value of γi is updated
in every synchronization action. Whenever a clock unit does
not synchronize to any master within a synchronization round,
the value of γi remains unchanged.
It is important to remark that whenever a clock unit synchronizes to a master that is not the current reference clock,
the clock unit "inherits" the drift error of that master. In this
case, the consonance after synchronization may be worse than
before synchronization. This can be observed in one of the
transitions fired by s2 in the slave automaton of Fig. 16.
D. Change of the Reference Clock due to Master Crash
Whenever the reference clock crashes, the timeline of the model needs to be redefined. Although it is not shown in the
automata, this recalculation is implicitly performed if in every
round the value of γi is assigned as follows:
• If master i is the current reference clock: γi := 0.
• If clock unit i synchronizes to the current reference clock:
γi := γ0 .
• If clock unit i synchronizes to a master n that is not the
current reference clock: γi := γ0 + γn + γref ; where γref
is the consonance between the current reference clock
and the reference clock of the previous synchronization
round.
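The three assignment rules above can be written down directly (a sketch of ours; gamma_ref is carried over from the previous round as described):

```python
def update_consonance(i, ref_id, synced_to, gamma0, gamma, gamma_ref):
    """Per-round consonance update for clock unit i.
    synced_to: master that i synchronized to in this round (None if none).
    gamma: dict with the current consonance value of each clock unit."""
    if i == ref_id:                      # i is the current reference clock
        return 0.0
    if synced_to == ref_id:              # synchronized to the reference clock
        return gamma0
    if synced_to is not None:            # synchronized to another master n
        return gamma0 + gamma[synced_to] + gamma_ref
    return gamma[i]                      # no synchronization: unchanged

gamma = {1: 0.0, 2: 1e-4, 3: 2e-4}
# Clock 3 syncs to master 2 (not the reference): it inherits master 2's drift.
assert update_consonance(3, ref_id=1, synced_to=2, gamma0=1e-4,
                         gamma=gamma, gamma_ref=0.0) == 2e-4
```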
VII. SOME VERIFICATION RESULTS
By applying the transformations described above, an OCS-CAN system can be modeled as a network of timed automata
and the guaranteed precision can be model checked. In a
previous paper we provided some results that were achieved
with the complete UPPAAL model of OCS-CAN [8]. These
verification results were obtained in the following situations:
• Fault-free scenario.
• Only master faults (no channel faults).
• Only channel faults (no master faults), assuming data
consistency and without assuming data consistency.
• Master faults and channel faults, assuming data consistency and without assuming data consistency.
Concerning the precision guaranteed by the clock synchronization service, Table I shows the precision that was verified
under diverse fault assumptions. These results were obtained
with the following parameters: N = 4 masters, R = 1 s, ∆0 = 0, ∆1 = 1 ms, ∆2 = 2 ms, ∆3 = 3 ms. Regarding the network load, it was assumed that no other messages were sent on the bus, so wcrt = 1.04 ms was used in the scenarios without channel faults whereas wcrt = 6 ms was used in the scenarios with channel faults.
The first cell in Table I shows the precision guaranteed in
the fault-free scenario. This precision equals 2 µs. The first
row of Table I corresponds to the scenarios in which only master faults were assumed. Note that the number of faulty masters does not significantly affect the guaranteed precision. This is because master replacement takes place in a very short time, which is negligible compared to R.
The first column of Table I corresponds to the scenarios in which only channel faults were assumed. OD = 0 indicates
that no inconsistent omissions can occur, which is a common
assumption in other clock synchronization protocols for CAN.
The rest of the cells in Table I correspond to the scenarios where a combination of node and channel faults is assumed. In particular, the bottom-right cell corresponds to the most severe
fault scenario.
VIII. CONCLUSIONS
In this paper, the formal verification of OCS-CAN has been
discussed. OCS-CAN is a solution for clock synchronization over CAN that adopts a fault-tolerant master/slave clock synchronization scheme. It has been shown that this system can be
naturally described with hybrid automata, by modeling the
virtual clocks as variables that evolve over time with certain
rates.
An important particularity of these hybrid automata is that they are not rectangular, because of the inevitable dependencies that the clock synchronization actions cause among the clocks. This lack of independence makes, in principle, the verification of these hybrid automata intractable by model checking. However, we have shown that it is possible to
translate the hybrid automata into a network of timed automata verifiable with the UPPAAL model checker. Thanks to this, the precision guaranteed by OCS-CAN has been successfully model checked under diverse fault assumptions.
The techniques developed to carry out this translation have been presented and illustrated with a simple example. These techniques somehow extend the
notion of perturbed timed automata, by allowing drifting
clocks whose rates may change dynamically as a consequence
of discrete actions. Our modeling may be useful for other researchers who aim at model checking hybrid systems in which variables are dependent.
Acknowledgments: This work is partially supported by DPI 2005-09001-C03-02 and FEDER funding. The authors would like to thank Mercè Llabrés and Antonio E. Teruel for their useful remarks on the mathematical notation.
R EFERENCES
[1] ISO: ISO 11898. Road vehicles - Interchange of digital information - Controller area network (CAN) for high-speed communication (1993)
[2] Rodríguez-Navas, G., Bosch, J., Proenza, J.: Hardware Design of a
High-precision and Fault-tolerant Clock Subsystem for CAN Networks.
Proceedings of the 5th IFAC International Conference on Fieldbus
Systems and their Applications (FeT 2003), Aveiro, Portugal (2003)
[3] Rodríguez-Navas, G., Proenza, J., Hansson, H.: Using UPPAAL to
Model and Verify a Clock Synchronization Protocol for the Controller
Area Network. Proc. of the 10th IEEE International Conference on
Emerging Technologies and Factory Automation, Catania, Italy (2005)
[4] Henzinger, T.A., Kopke, P.W., Puri, A., Varaiya, P.: What’s decidable
about hybrid automata? Journal of Computer and System Sciences 57(1)
(1998) 94–124
[5] Daws, C., Yovine, S.: Two examples of verification of multirate timed
automata with KRONOS. In: Proceedings of the 16th IEEE Real-Time
Systems Symposium (RTSS’95), Pisa, Italy. (1995) 66–75
[6] Alur, R., Torre, S.L., Madhusudan, P.: Perturbed Timed Automata.
In Morari, M., Thiele, L., eds.: 8th International Workshop, Hybrid
Systems: Computation and Control, HSCC 2005. Number 3414 in
LNCS, Springer–Verlag (2005) 70–85
[7] Behrmann, G., David, A., Larsen, K.G.: A tutorial on UPPAAL. In
Bernardo, M., Corradini, F., eds.: Formal Methods for the Design of
Real-Time Systems: 4th International School on Formal Methods for
the Design of Computer, Communication, and Software Systems, SFM-RT 2004. Number 3185 in LNCS, Springer–Verlag (2004) 200–236
[8] Rodriguez-Navas, G., Proenza, J., Hansson, H.: An UPPAAL Model
for Formal Verification of Master/Slave Clock Synchronization over
the Controller Area Network. In: Proc. of the 6th IEEE International
Workshop on Factory Communication Systems, Torino, Italy. (2006)
[9] Alur, R., Madhusudan, P.: Decision problems for timed automata:
A survey. In Bernardo, M., Corradini, F., eds.: Formal Methods
for the Design of Real-Time Systems: 4th International School on
Formal Methods for the Design of Computer, Communication, and
Software Systems, SFM-RT 2004. Number 3185 in LNCS, Springer–
Verlag (2004) 200–236
[10] Tindell, K., Burns, A., Wellings, A.J.: Calculating Controller Area
Network (CAN) Message Response Time. Control Engineering Practice
3(8) (1995) 1163–1169
[11] Rufino, J., Veríssimo, P., Arroz, G., Almeida, C., Rodrigues, L.: Fault-tolerant broadcasts in CAN. Digest of papers, The 28th IEEE International Symposium on Fault-Tolerant Computing, Munich, Germany
(1998)
[12] Proenza, J., Miro-Julia, J.: MajorCAN: A modification to the Controller
Area Network to achieve Atomic Broadcast. IEEE Int. Workshop on
Group Communication and Computations. Taipei, Taiwan (2000)
[13] Henzinger, T.A., Ho, P.-H., Wong-Toi, H.: Algorithmic analysis of
nonlinear hybrid systems. IEEE Transactions on Automatic Control
43(4) (1998) 540–554
[14] Schmuck, F., Cristian, F.: Continuous clock amortization need not
affect the precision of a clock synchronization algorithm. In: PODC
’90: Proceedings of the ninth annual ACM symposium on Principles of
distributed computing, New York, NY, USA, ACM Press (1990) 133–
143
Software Modeling of
Quality-Adaptable Systems
Javier F. Briones, Miguel Ángel de Miguel, Alejandro Alonso, Juan Pedro Silva
Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid
Abstract—Enclosing quality properties with software designs is typically used to improve system understanding. Nevertheless, these properties can also be used to compose systems whose quality can be adapted and to predict their behavior. Existing software modeling languages lack sufficient mechanisms to describe software elements that may offer/require several quality levels. This paper presents the concepts that such a language needs to include to describe quality-adaptable systems.
Index Terms— Software Modeling Language, Software
Architecture, Quality of Service
I. INTRODUCTION
Dependable systems are those on which reliance can justifiably be placed on the functionality they deliver.
Reliance is usually evaluated by means of non-functional
parameters and encompasses measures such as reliability,
availability, safety and security. As the dependence on
computing systems increases, so does the likelihood of
hazards and malfunctions caused by those systems. The
consequences range from mere inconveniences (e.g. poor quality in a video application) to the loss of human lives.
Engineering systems to the highest reasonably practicable
standards of dependability is a great challenge. To better deal
with this challenge, engineers could study primary sketches of
the systems attempting to estimate their probable
dependability measures; and not less important they could
include exhibited quality constrictions in analysis and design
documents. During the design of the system architecture these
constrictions can be allocated to the different subsystems and
thus responsibilities broken down. Only when the functional
and the non-functional characteristics of a subsystem are
completely described and a rigorous development process is
put in practice, enough confidence can be placed on the future
subsystems.
Sometimes it is appealing to create subsystems adaptable to their environment, e.g. when an air traffic management vendor makes a subsystem and expects to profit from subsequent sales, or when a multimedia subsystem needs to deliver different quality levels according to user wishes or environmental conditions. Adaptable subsystems can vary the set of non-functional properties they provide. This adjustment can occur at design time, as in the first example, or at run-time, as in the second. Nevertheless, the use of non-functional properties from design documents is useful in both cases: in the first, to help compose the architecture by selecting the subsystems (potentially from different vendors) that together deliver the required dependability properties (it may, of course, turn out that such properties cannot be delivered); and in the second, to simulate how the composition and adaptation would work and thus detect quality levels that cannot be achieved, or subsystems that are overused because under most circumstances they are delivering very high quality.
In this work we undertake:
• the representation of non-functional properties in design
documents
• the description of software architectures adaptable in
quality
• the automatic composition of contracts based on the non-functional properties exposed by design documents
• the prediction of the quality behavior that the system being
constructed will exhibit
• the required mechanisms to allow simulations of adaptable
architectures.
II. RELATED WORK
The notion of building dependable systems by enforcing
contracts between subsystems has been largely exploited in
the literature. [1] remarks on the importance of determining
beforehand whether we can use a given component within
mission-critical applications. This information takes the form
of a specification against which the component can be verified and validated, thus providing a kind of contract between the component and its clients. The authors differentiate four
classes of contracts in the software component world: basic or
syntactic, behavioral, synchronization, and quality of service
contracts. They explain the particularities of each one and
examine the different technologies used to deal with each
kind. They distinguish four phases a contract passes through:
definition, subscription, application, and termination/deletion.
During the contract application, contracts should be monitored
to check whether they are violated, and violations can be
handled in different ways: ignore, reject, wait, and renegotiate.
The authors also identify different types of data constraints: precondition, postcondition and invariant, as they do with control constraints. It is a wide-ranging paper that covers the lifecycle of the four different contracts, as opposed to our work, which focuses only on quality of service contracts.
Currently, we concentrate more on the design phase of development, so the definition and subscription (what we call composition) of contracts are better covered. We focus, as well, on just a specific group of technologies covered by the term model-driven software development. Concerning constraints, up to now we only work with invariants.
About the specification of QoS-aware architectures, we
want to highlight [3]. This paper recognizes three main
techniques used to specify QoS systems: extensions of
Interface Description Languages (IDL), mathematical models, and UML extensions and meta-models. The author chooses this last technique, as it is part of the initial submission of the OMG standard [11] “UML Profile for Quality of Service and Fault Tolerance Characteristics and Mechanisms”. We concur with this choice, and our work could be considered an extension of this standard, although the work is reusable if other UML extensions or similar meta-models are used. [3] also points out that QoS specifications can be used for
different purposes: i) specification of QoS-aware
architectures, ii) management of QoS information in QoS
reflective infrastructures (e.g. QoS adaptable systems), and iii)
generation of code for management of QoS concepts (e.g.
negotiation, access to resource managers). This is our main intention: to demonstrate that QoS specifications can be used to compose, analyze, simulate and develop QoS-adaptable systems. Nevertheless, the cited standard centers on the annotation of design models, providing support for QoS characteristics, QoS constraints, QoS execution modes, and QoS adaptation and monitoring. [3] and this standard are references for our work in the annotation of QoS-aware architectures.
In the position paper [9] P. Collet investigates what
properties need to be provided by the languages supporting
the four levels defined above in the context of software
components. He considers that a contract must provide: a specification formalism, a rule of conformance (to allow substitution), and a runtime monitoring technique. For the first requirement he does not have an ideal
candidate although he thinks QML [8] (QoS Modeling
Language) looks like the most advanced QoS specification
language. QML does not provide any means to express a QoS
contract according to some parameters that would come from
the component interface. In QML, a contract is an instance of a contract type (QoS aspects, such as performance or reliability), and a profile associates a QML contract with an interface. Regarding the second requirement, he remarks on the importance of considering partial conformance. Concerning
the last requirement, after assuming that it is not possible to fully verify statically that contracts are never violated, he shows the need for a contract monitoring framework to verify the functional and non-functional properties included in the contracts. Besides QML (proposed e.g. by [6] and [9]), other specification formalisms are being used. We will use an
extended version of the OMG standard [11] “UML Profile for
Modeling QoS and Fault Tolerance Characteristics and
Mechanisms” (used e.g. by [3] and [5]); complemented with
OCL2 [12] to represent the constraints. Another option is CCL-J, inspired by OCL. The work in [7] enhances this formalism to
adapt it to a specific component model. CCL-J provides five
kinds of specifications: inv, post, pre, guarantee (guarantor is
the implementation of the method), rely (guarantor is the
method caller). Another choice is QoSCL [10] (QoS Contract
Language). This is a QoS-aware component meta-model
which includes some QML concepts and allows designers to
define quality levels contracts attached to provided and
required interfaces.
Another work [2] describes the architecture of a distributed
middleware that provides end-to-end QoS management
services in an application independent framework. Services
provided include: characterization and specification,
negotiation and establishment, and adaptation. As we do, it asserts the importance of providing, for each component, information on how the QoS of its output depends on the QoS of its input and the amounts of resources it uses. The authors
use the concepts of feasible region and reward function;
similar notions can be found in our work with the minor
difference that we treat resources as another QoS (a QoS
provided by a model element). The paper, nevertheless, focuses on negotiation at run-time, whereas we mainly focus on contract binding at design time.
There exist works that research how to incorporate contracts in a model-driven development life cycle. The authors of [4] consider contracts, including non-functional ones, as a modeling technology to foster the assembly of component-based applications, mainly by allowing CASE tools to check when the contracts are fulfilled. To support modeling contract-aware components in UML they present an extension to the UML meta-model. Even though they intended to keep changes as small as possible, this diverges from our intention of not modifying UML at all. For the authors, abstract
contracts identified during the analysis phase are transformed into more concrete contracts to accommodate design,
implementation, deployment and runtime phases; also new
contracts can emerge as the development process advances.
They use the concept of phase transition to deal with such
transformations. [5] takes this idea further, promoting QoS-aware model transformations to gradually resolve QoS requirements in order to deliver efficient code. Our work does not consider contract/requirement transformations and operates only within a single modeling phase. Besides creating a QoS-aware specification framework and proposing a way of handling and resolving QoS specifications in model transformations, [5] also describes a QoS-aware execution platform for resolving requirements at run-time. In contrast, we study how our work could be reused during run-time or be mapped to a QoS-aware run-time platform. For the
implementation of transformations, [6] proposes a novel method: i) specify non-functional properties as QoS contracts with a small set of stereotypes; ii) specify how they can be implemented with aspects, representing them a bit like design pattern occurrences, that is, using parameterized collaborations equipped with transformation rules expressed in meta-level OCL2. A design-level aspect weaver takes the form of a meta-level OCL2 interpreter that outputs UML models that can serve as a basis for code generation.
[7] presents the contracting system, ConFract, for an open
and hierarchical component model, Fractal. Fractal is a
component model with several features: composite components, shared components, reflective capabilities and openness. A Fractal component is composed of a content and a membrane, where interceptors and controllers (contract, lifecycle, binding, content) are located. Three types of
contracts (run-time specifications for the authors) are distinguished: the interface contract, between each pair of client and server interfaces; the external composition contract, which refers only to the external interfaces of the component to express usage; and the internal composition contract, to express assembly and internal behavior rules of the implementation. In ConFract, the system dynamically builds contracts from specifications at assembly time and updates them according to the dynamic reconfigurations of components. Contract violations are
handled by activating an atomic negotiation. Our work shares
with this one the automatic building of contracts, even when
ours is used mainly for simulation. They achieve it because in
their negotiation model, components have clearly identified
their roles and so they can interact automatically. They
distinguish three roles that we adopt: guarantor, beneficiary
and contributor. On the other hand, we do not confine ourselves to hierarchical components, as we use general UML2 [13].
To conclude, there are numerous QoS-aware architectures proposed, some of them component-based. Most of them cover many of the aspects involved in this kind of system: admission testing, resource reservation, contract negotiation, composition, quality monitoring, adaptation, and maintenance. Since we concentrate on design models and on simulating the composition to analyze some properties of the architecture, there is no need to discuss them here. If the architecture is going to be realized, they will have to be studied in depth.
III. META-MODEL IN THE UML PROFILE FOR QOS
We consider it important to have a clear and comprehensive model for QoS concepts. This model can be used to explain the concepts used, as a source to build a UML profile, or as the model from which to create modeling and analysis tools. We take as
foundation of our model the meta-model included in the
standard: “UML Profile for Modeling QoS and Fault
Tolerance Characteristics and Mechanisms”. In our opinion,
this meta-model lacks some mechanisms to enable the composition of adaptable architectures in the way we sketched in the introduction.
The following figure shows part of the current meta-model, illustrating the relationships among some of the concepts of the standard. We will first describe the main meta-classes of the current meta-model and then highlight the main enhancements we propose.
Fig. 1. Current meta-model used to build the UML Profile for QoS.
• QoSCharacteristic. It is used to represent non-functional
characteristics of the system elements (e.g. services) included
in the design models. QoSCharacteristic is the constructor for
the description of non-functional aspects like, for instance:
latency, throughput or capacity. It allows specifying these
characteristics independently of the elements they qualify.
• QoSDimension. It is a dimension for the quantification of
QoSCharacteristics. We can quantify a QoSCharacteristic in
different ways, for instance: absolute value, maximum and
minimum values, or statistical values.
• QoSConstraint. It is an abstract meta-class to limit the
allowed values of one or more QoSCharacteristics. Two
approaches for the description of allowed values are: an
enumeration of QoSValues for each involved
QoSCharacteristic, or an expression that must be fulfilled by
the QoSCharacteristics. These expressions define, for instance,
the maximum and minimum values and the dependencies
among QoSCharacteristics.
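The two styles of allowed-value description can be sketched in Python; the characteristic names and values below are illustrative, not part of the standard:

```python
# Style 1: an enumeration of allowed QoSValues for each involved
# QoSCharacteristic (illustrative values).
latency_allowed = {"latency_ms": {10, 20, 50}}

def enumeration_holds(values, allowed):
    return all(values[c] in vals for c, vals in allowed.items())

# Style 2: an expression over the QoSCharacteristics that must be
# fulfilled, here encoding a maximum value and a dependency.
def expression_holds(values):
    return (values["latency_ms"] <= 50 and
            values["throughput_mbps"] >= 100 / values["latency_ms"])

print(enumeration_holds({"latency_ms": 20}, latency_allowed))        # True
print(expression_holds({"latency_ms": 10, "throughput_mbps": 20}))   # True
```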
• QoSContext. The QoSContext establishes the vocabulary of
the constraint. This meta-class is required because a
QoSConstraint often combines functional and non-functional
elements and can have more than one associated
QoSCharacteristic.
• QoSRequired. When a client (of a software element or a
resource) defines its required QoSConstraint, the provider that
supports it must deliver quality levels that satisfy its
client’s requirements. When the provider defines its required
QoSConstraint, the client must meet certain quality
requirements to obtain the quality offered.
• QoSOffered. When the provider defines a QoSOffered
constraint, it is the provider who must achieve the constraint.
When a client defines a QoSOffered constraint, the client must
achieve the constraint. Often a QoS offered depends on the
QoS provided by the resources and the providers that the
software element uses.
• QoSContract. The quality provider specifies the quality
values it can support (provider-QoSOffered) and the
requirements that must achieve its clients (providerQoSRequired); and the client, the quality it requires (clientQoSRequired) and the quality it ensures (client-QoSOffered).
Finally, in an assembly process, we must establish an
agreement between all constraints. In general, the allowed
values that client-QoSRequired specifies must be a subset of
the values supported in provider-QoSOffered, and the allowed
values that provider-QoSRequired specifies must be a subset
of the values supported in client-QoSOffered. Sometimes we
cannot compute the QoSContract statically, because it
depends on the resources available or on quality attributes
fixed dynamically.
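The agreement rule above reduces to a set-inclusion check, which can be sketched as follows (a minimal illustration with invented value sets, not the profile's actual machinery):

```python
def contract_agreement(client_required, provider_offered,
                       provider_required, client_offered):
    """Agreement rule sketched above: each required value set must be
    a subset of the corresponding offered value set."""
    return (client_required <= provider_offered and
            provider_required <= client_offered)

# Invented values: latencies (ms) on the provider side, and a CPU
# share (%) the provider requires from the client's platform.
ok = contract_agreement(client_required={10, 20},
                        provider_offered={5, 10, 20, 50},
                        provider_required={30},
                        client_offered={30, 40})
print(ok)  # True: the contract can be bound statically
```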
• QoSLevel. It represents the different modes of QoS that a
subsystem can support. Depending on the algorithms or the
configurations of the systems, the component can support
different working modes, and these working modes provide
different qualities for the same services. For each working
mode, we specify a QoSLevel. They represent states in the
system from a quality point of view.
• QoSTransition. It is used to model possible transitions
among QoSLevels.
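As a rough illustration, QoSLevels and QoSTransitions together behave like a small state machine over working modes (the mode names below are invented):

```python
# QoSLevels as working modes and QoSTransitions as the allowed moves
# between them.
levels = {"high", "medium", "low"}
transitions = {("high", "medium"), ("medium", "high"),
               ("medium", "low"), ("low", "medium")}

def can_adapt(current, target):
    return current in levels and (current, target) in transitions

print(can_adapt("high", "medium"))  # True
print(can_adapt("high", "low"))     # False: must pass through "medium"
```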
IV. NEW REQUIREMENTS
We enlarged the meta-model to allow:
• The specification of adaptable architectures whose elements
can change their quality characteristics to react to environment
changes and/or user wishes. The meta-class QoSOffer allows
grouping all the quality levels that can be offered in a
mutually exclusive way: only one can be ensured at a time
during execution. An equivalent concept, QoSRequirement,
groups quality levels of which only one can be relied on.
Implementing a quality level implies fulfilling a set of
constraints (the “allowed space”). If the quality level is
located on the offer side, the constraints are of the kind
QoSOffered; if it is on the requirement side, they are
QoSRequired.
• The composition of quality-aware elements
(QoSAwareEntities) to build architectures that fulfill the
expected requirements. This composition is studied only
from a non-functional point of view. QoSContracts
need to be bound in order to ensure the fulfillment of every
QoSRequired constraint based on the available QoSOffered.
This process can be automated if enough information is given
within the constraints. At the design phase, some QoSContracts
can be “given” by the modeler, either because it is a preferred
contract or the only one allowed in a contract binding. A
contract binding comprises the negotiation of the constraints to
be held by QoSAwareInstances. A negotiation may involve
several QoSOffers and QoSRequirements, according to the
QoSContexts of the constraints they involve. The negotiation
ends when the level for every QoSOffer and
QoSRequirement is established. To compute all the QoS
contracts, the resources available and the quality attributes
need to be fixed statically, but they can also be
simulated/estimated.
Fig. 2. QoS Negotiation.
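A toy sketch of such a negotiation, under the simplifying assumption that levels are matched with the subset-based agreement rule of Section III (all level names and values are invented):

```python
# Invented frame-rate levels: each level maps to its allowed value set.
offers = {"video_high": {30}, "video_low": {15, 30}}   # offer side
requirements = {"smooth": {30}, "basic": {15}}         # requirement side

def negotiate(requirement_levels, offer_levels):
    """Return the first (requirement level, offer level) pair whose
    value sets satisfy the subset-based agreement rule."""
    for r_name, r_vals in requirement_levels.items():
        for o_name, o_vals in offer_levels.items():
            if r_vals <= o_vals:
                return r_name, o_name   # contract bound at these levels
    return None                         # negotiation fails

print(negotiate(requirements, offers))  # ('smooth', 'video_high')
```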
• The definition of responsibilities in monitoring and
adaptation. Three responsibilities have been identified:
guarantor, beneficiary and contributor.
- A QoSGuarantor has at least one offer whose quality can
be adapted; this means it must ensure a QoSLevel.
- A QoSBeneficiary has at least one adaptable
requirement; this likewise means a QoSLevel it can rely on.
- A QoSContributor represents any other element that
wants to be aware of a QoSConstraint: for example, an
element responsible for monitoring a constraint involved in a
contract at run-time, this element being neither the guarantor
nor the beneficiary.
V. CONCEPTS ADDED TO THE METAMODEL
The following figure shows some of the concepts that have
been added and their relationships. It includes some of the
described elements.
QoSAwareEntity represents elements of a model that play
a role in the definition of quality-aware architectures. Not
every QoSAwareEntity can be adapted. The number of
instances can be specified. We distinguish three different
roles:
• QoSGuarantor: the one that ensures a negotiated quality in
a quality contract
• QoSBeneficiary: the one that relies on a negotiated quality
• QoSContributor: other elements aware of a specific contract
QoSAwareInstance represents instances of
QoSAwareEntities. The specializations identified are
QoSGuarantorInstance, QoSBeneficiaryInstance, and
QoSContributorInstance, the last modeling other elements
aware of a specific contract.
QoSContract now includes two new attributes: “given”, to
indicate a contract defined by the modeler, and “renegotiable”,
to indicate that a contract can be renegotiated in the quality
adaptations occurring in the system.
Fig. 3. New QoS concepts added to the meta-model: QoSAwareEntity and QoSAwareEntityInstance.
Fig. 4. New QoS concepts included in the meta-model: QoSOffer and QoSRequirement.
QoSOffer is used to group the quality levels at the offer-side, from which only one can be ensured at a time. Several
QoS guarantors that have the same QoSOffer need to
implement all the quality levels included in the offer.
QoSRequirement is used to group the quality levels at the
requirement-side from which only one has to be selected at a
given instant, the one that can be relied on in a contract.
Several QoS beneficiaries that have the same
QoSRequirement need to implement all the quality levels
included in the requirement.
QoSOffer and QoSRequirement inherit from the abstract
meta-class QoSExternalBehavior. A QoSAwareEntity presents
zero or more QoSExternalBehaviors. Adaptable entities are
those that can change the exhibited quality level for at least
one behavior. Notice that during execution-time, an entity
instance exhibits at most one quality level for each behavior
its entity type declares. A meta-class QoSBehaviorRealization
links the behavior, the quality level exhibited on that behavior
and the instance exhibiting that quality level.
To offer a QoSLevel included in a QoSOffer, a guarantor
needs to meet all the QoSOffered constraints of the
allowedSpace. If any of the constraints is not satisfied, the
QoSLevel is not respected. When a beneficiary requests a
QoSLevel included in a QoSRequirement, it is claiming that
the counterpart in the contract meets all the QoSRequired
constraints of the allowedSpace.
QoSContributorBehavior is somehow an artifice denoting
that a QoSContributor is a special QoSAwareEntity. OpLevel
refers to the constraints a contributor is related to. A
contributor may exhibit a NoOpLevel, indicating that the task
it performs is not required for the whole system to work.
At this moment, the concept QoSNegotiation of figure 2
can be better explained. It is used to group all the offers and
requirements involved in a negotiation process because they
are compatible. Two QoSExternalBehaviors are compatible
when the QoSConstraints they involve constrain the same
model elements: QoSCharacteristics and QoSAwareEntities.
After the negotiation process, one or more contracts are
bound. The contracts given by the user can alter the collection
of contracts bound. A bound contract implies that several
instances will need to fulfill some QoSBehaviorRealizations.
Another issue tackled is the dependency between the
quality offered by an element and the quality required by that
element. This is a common situation where the quality a
software component offers depends on the quality other
components supply to it. A situation where this does not occur
is the quality provided by a resource, such as the bandwidth of
a network; in this case there is no input quality. We had to
further develop the idea of dependency among quality
constraints because we care about the composition and
simulation of quality aware architectures. We need to describe
how an element is going to behave internally from a quality
point of view (QoSInternalBehavior).
• A table could be used to match input QoS required level
and output QoS provided level. This table should be included
in the design models.
• A mathematical or statistical function to include in the
design models.
• A constraint expressing output and input quality levels.
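The table option above, matching the input (required) quality level to the output (offered) quality level, can be sketched as a simple lookup (level names invented):

```python
# Table-based QoSInternalBehavior: required input level on the left,
# offered output level on the right.
internal_behavior = {
    "net_high":   "video_high",
    "net_medium": "video_medium",
    "net_low":    "video_low",
}

def output_level(input_level):
    return internal_behavior[input_level]

print(output_level("net_medium"))  # video_medium
```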
VI. CONCLUSION
Once responsibilities, constraints, and quality levels are
incorporated within the design models, it is possible to
analyze the architecture to find out some features of its
behavior. To enumerate some examples:
• Is it possible to fulfill all the QoS requirements even at
their lowest level of quality required? Will the system meet all
the non-functional requirements?
• Which quality-aware entities, among a repository, make it
possible to meet all the QoS requirements in the system?
Which vendor’s component can I use to meet the
requirements?
• Is the architecture able to operate at every level of quality
offered by an entity of the system? Will the system be able to
offer the environment all the quality levels exposed?
• Does any of the quality-aware entities have to operate at the
highest level of quality offered (given the quality requirements
established for the system)? Is any component of the system
overused?
• How many quality reconfigurations need to be made
when a quality requirement changes its quality level? How
many internal changes of quality level are triggered when the
user demands a new quality level?
To achieve the composition of quality-aware architectures
at design-time, all the quality characteristics need to be
modeled. This includes the quality behavior of entities, that is,
how an entity behaves when the quality it relies on and/or the
resources available change. There is a big difference between
functional and QoS composition. For functional composition
an external view of the entities is enough, whereas for QoS
composition some properties of the internals (behavior) need
to be known. A simulation of the system behavior allows us
one or more degrees of freedom, by estimating some values in
some elements, but the rest of the architecture needs to be
fully described.
We used a UML profile as the concrete syntax of the
proposed meta-model. We use it to include QoS elements in
the design models. Since we take into consideration more
concepts than the ones included in the standard “UML Profile
for Modeling QoS and FT Characteristics and Mechanisms”,
we needed to augment the profile accordingly. The created
profile enables us to enhance models with features not
included in the standard.
We are working on a tool to enable the composition of
quality-aware architectures. Initially, this tool will validate the
quality constraints included in the model, check the contracts
declared, bind requirements and offers to build quality
contracts, find quality levels at the requirement-side that
cannot be met because there is no offer in their context, and
discover required constraints that cannot be held because the
architecture does not allow satisfying them. This tool will be
extended to fully compose and simulate quality-aware
architectures.
REFERENCES
[1] A. Beugnard, J.-M. Jézéquel, N. Plouzeau and D. Watkins, “Making
Components Contract Aware”, Computer, vol. 32, no. 7, pp. 38-45, July
1999.
[2] M. Shankar, M. de Miguel and J. W. S. Liu, “An End-to-End QoS
Management Architecture”, in Real-Time Applications Symposium
(RTAS’99), IEEE Computer Society, 1999.
[3] M. de Miguel, “QoS Modeling Language for High Quality Systems”, in
8th IEEE International Workshop on Object-Oriented Real-Time
Dependable Systems, IEEE Computer Society, January 2003.
[4] T. Weis, C. Becker, K. Geihs and N. Plouzeau, “A UML Meta-model
for Contract-Aware Components”, in Proc. of the UML 2001
Conference, LNCS, Springer-Verlag, 2001.
[5] A. Solberg, J. Oldevik and J. Aagedal, “A Framework for QoS-Aware
Model Transformation, Using a Pattern-Based Approach”, in
International Symposium on Distributed Objects and Applications,
LNCS, Springer-Verlag, 2004.
[6] J.-M. Jézéquel, N. Plouzeau, T. Weis and K. Geihs, “From Contracts to
Aspects in UML Designs”, AOSD Workshop on “AOP in UML”, 2002.
[7] P. Collet, R. Rousseau, T. Coupaye and N. Rivierre, “A Contracting
System for Hierarchical Components”, in Component-Based Software
Engineering, LNCS, Springer-Verlag, 2005.
[8] S. Frolund and J. Koistinen, “Quality of Service Specification in
Distributed Object Systems Design”, in Proceedings of the 4th USENIX
Conference on Object-Oriented Technologies and Systems, 1998.
[9] P. Collet, “Functional and Non-Functional Contracts Support for
Component-Oriented Programming”, First OOPSLA Workshop on
Language Mechanisms for Programming Software Components, 2001.
[10] O. Defour, J.-M. Jézéquel and N. Plouzeau, “Extra-Functional Contract
Support in Components”, in Proceedings of the 7th International
Symposium on Component-Based Software Engineering, LNCS,
Springer-Verlag, 2004.
[11] UML Profile for Modeling QoS and Fault Tolerance Characteristics and
Mechanisms, OMG.
[12] Object Constraint Language, OMG.
[13] Unified Modeling Language, OMG.
2. Análisis Temporal
Considerations on the LEON cache effects on the timing
analysis of on-board applications
G. Bernat, A. Colin
Rapita Systems
IT Centre, York Science Park
York YO10 5DG
United Kingdom
[email protected]

J. Esteves, G. Garcia, C. Moreno
Thales Alenia Space
100, bd. du Midi
BP99, F-06156 CANNES LA BOCCA
France
[email protected]

N. Holsti
Tidorum
Tiirasaarentie 32
FI-00200 Helsinki
Finland
[email protected]

T. Vardanega
University of Padua
via Trieste 63, I-35121 Padova
Italy
[email protected]

M. Hernek
ESA/ESTEC TEC-SWE
Keplerlaan 1, Postbus 299, 2200 AG Noordwijk
The Netherlands
[email protected]
1. Introduction
This paper provides a short account on the findings
of the project “Prototype Execution-time Analyser
for LEON” (PEAL) funded by ESA/ESTEC and
executed in the course of 2006. The PEAL project
was a collaboration between academic researchers,
small and medium-sized tool vendors, and a large,
established space company. Our goal was to study if
and how the presence of cache memory in an on-board computer system complicates the reliable
verification of its real-time performance. We
adapted some timing analysis tools to the LEON
processor in a prototype fashion, performed
experiments, and drew the conclusions reported
here.
This paper is organized as follows. Section 2 sets
the scene by explaining the trend towards caches,
the verification problems that caches create, and
some suggested solutions to these problems. Section
3 summarises the objectives of the PEAL study and
section 4 sets out the assumptions of the study.
Section 5 describes the experiments and results and
section 6 presents our conclusions.
2. Study Context
2.1 The processor-memory gap
European space projects are moving away from
16-bit on-board processors such as the MIL-STD-1750 series through rather simple 32-bit processors
such as the ERC32 and special-purpose processors
such as the ADSP-21020, to more complex and
powerful processors such as the LEON family. In
the long-term view the evolution of on-board
processors is expected to provide for: more
computing power through higher processor speeds;
simpler programming thanks to large, flat address
spaces, avoiding complications such as memory
overlays; the ability to run several applications on
one and the same computer, using memory
management units and other hardware features to
prevent space and time interference between
applications; and better software reusability and
portability through relaxed constraints on speed
and memory and through the adoption of general,
rather than space-specific, processor architectures
and software-development tools.
As processor speed increases, however, the speed
gap between the processor and the memory
becomes a critical bottleneck, as it has already long
been for ground-based computing. The LEON
processor family has therefore introduced a
memory hierarchy with a fast, on-chip cache
memory between the processor and the slower,
external (off-chip) main memory. The LEON design
also includes other accelerator features that aim to
isolate the processor from slow memory accesses:
instructions can be fetched in burst mode, and
writes to memory are buffered and completed
asynchronously.
2.2 The problem of variability and worst-case execution time
The cache memories in the LEON processor are
expected to provide for higher average computation
speed, but at the cost of more variable execution
time, as a cache miss takes much longer than a
cache hit. The number of hits and misses incurred
by an execution depends on the cache architecture
and on what the program does and how and when it
does that. Most on-board software is subject to real-time
deadlines, so that the important figure is not (only) the
average execution time, but the worst-case execution time
(WCET) or, in practice, a trustworthy upper bound or
estimate of the WCET.
The PEAL project addressed the question of
whether and how the presence of a cache helps or
else hinders the goals listed above, when we
consider the WCET and not just the average
execution speed. For the purposes of the project this
basic question presented four main avenues of
investigation:
• Is the verifiably-usable computing power always
really higher with a cache? Perhaps some (rare)
programs or situations may exist in which the
miss rate is so high that the cache is a brake
rather than an accelerator.
• Is programming simpler with a large memory,
even if the memory is slow, and fast access is
limited to a relatively small cache? Perhaps the
software designers must pay a great deal of
attention to making good use of the memory
hierarchy so as to not risk poor performance.
• Can we isolate applications running on the same
computer from one another? If one and the same
cache is shared among all applications that run
on one and the same processor, the applications
may compete for cache-space and thus slow each
other down. Some cache-handling errors in one
application could even propagate to corrupt other
applications.
• Can we achieve these goals without sacrificing
portability, maintainability and reusability of the
on-board software? If the design and coding of
the programs has to match the details of the
cache architecture, such as the associativity or
total size, the program becomes unportable to
other cache architectures, or at least may perform
worse when ported.
2.3 How caches affect execution time
Compared to a cache-less processor, four new
factors emerge which influence the execution time
of a program on a cache-equipped processor by
changing the pattern of cache hits and misses:
1. the total size of the program code and data in
relation to the cache size,
2. the location (i.e. the memory addresses) of the
program code and data,
3. the history of code and data addresses accessed
by the program in the past, and
4. the interrupts and preemptions that occurred
during program execution.
In actual fact, the LEON processor can also be
configured with a Memory Management Unit
(MMU) which uses page-mapping tables to
translate virtual memory addresses to physical
memory addresses. To speed this translation up (in
the average case) the MMU contains a small cache-like memory, called the Translation Look-aside
Buffer (TLB). References to the TLB can also “hit”
or “miss” and thereby reduce or increase the
execution time.
These effects of the cache and the MMU cause
problems to the verification of the real-time
performance of an application because they tend to
reduce execution time in average cases but make it
more difficult to set bounds on the worst-case
execution time. In particular, these effects reduce
the predictive value of execution-time
measurements obtained by test, because the total
program size, the memory layout, the execution
paths and the pattern of interrupts or preemptions
may all change from testing to flight, and the tests
themselves are unlikely to hit on the worst
combination of all these factors together. Cache
effects also reduce the precision of static analysis for
WCET because the analysis can only approximate
the state of the caches at various points in the
program. Moreover, schedulability analysis
becomes harder because, firstly, the overhead of an
interrupt or context switch is no longer constant but
depends on the cache state at the time, and
secondly, the WCET of a task is no longer a property
of the task alone but depends on the interrupts and
preemptions imposed on the task, unless the kernel
takes costly measures to preserve the state of the
caches across an interrupt or preemption.
The crudest reaction to this problem is to avoid it
by simply disabling the caches, at least for the
time-critical parts of the application. The drawback of
that solution, of course, is that the performance gain
is totally lost. This strategy could be defended if it
could be shown that the cache performance is so
unpredictable that the verifiable (worst-case) gain is
close to nil.
Another crude way is to ignore the problem
altogether by using caches fully and freely in the
same way as in non-critical systems such as desktop
computers and number crunchers. This way one
may enjoy increased (average) performance, but at
the cost of risking occasional, unpredicted and
quizzical performance problems. One could attempt
to reduce the risk by adopting larger performance
margins, even if this cuts into the available
performance gain, or by more performance testing,
even if the quantitative risk and risk reduction are
unknown. To defend this solution one can point out
that no in-flight mission failure or problem has yet
been traced to cache-related performance problems
– at least as far as we know.
Where the crude solutions are not satisfactory
then the only approach left is to use some
systematic method to get a safe upper bound on the
WCET, or a sufficiently reliable and precise estimate
of the WCET. If we discount the traditional end-to-end execution-time measurement tests as too
unlikely to find the worst case, we are left with two
methods:
(i) static cache-aware WCET analysis [1, 5]; and
(ii) measurement-based WCET analysis [2, 4] with
some systematic way to specify and measure the
test coverage to ensure a sufficiently reliable result.
2.4 Methods for WCET analysis
We know of two methods that can give better
(safer) bounds or estimates of the WCET of a task
than end-to-end measurements can. The static
WCET analysis method makes a model of the task
that includes all its possible executions and gives an
upper bound on the duration of each execution. This
model combines the control-flow graph with the call
graph and includes all instructions that the task can
execute. An execution of the task corresponds to a
path through this graph. A model of the processor
gives an upper bound on the execution time of each
basic block in the graph. When this time depends on
the processor state (e.g. cache contents or register
values) a static analysis of the state changes is
necessary to find a safe approximation (upper
bound) on the state and the time; this step typically
uses abstract interpretation of the instructions. For
complex processors it can be quite difficult to devise
a model-analysis combination that is precise
enough but not detailed to the point of failing the
analysis through combinatorial explosion. Finally
an upper bound on the WCET of the whole task is
found by summing up the execution-time bounds of
the blocks along the worst-case execution path. This
step typically uses Integer Linear Programming to
find the maximum sum without explicitly trying all
execution paths.
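The final step can be sketched without ILP: for a toy acyclic block graph (loop bounds assumed already folded into the block bounds), the WCET bound is simply the longest path from entry to exit. Block names and cycle counts below are invented:

```python
# Toy basic-block graph: per-block upper bounds in cycles and the
# successor relation. Loops are assumed already bounded and folded
# into the block bounds, so the graph is acyclic.
wcet_bound = {"A": 40, "B": 25, "C": 60, "D": 10}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def task_wcet(block):
    """Longest path from `block` to the task exit, summing block bounds."""
    tail = max((task_wcet(s) for s in successors[block]), default=0)
    return wcet_bound[block] + tail

print(task_wcet("A"))  # 110 = 40 + max(25, 60) + 10
```

Real static analyzers express the same maximization as an ILP with flow constraints, which scales to graphs where path enumeration is infeasible.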
The second method, called measurement-based
WCET analysis, also computes a WCET estimate
from the execution times of the blocks of the task
and the possible execution paths through these
blocks. However, here the execution times of the
blocks are measured, i.e. observed in test runs, and
not computed from a static analysis. Some
processors have special tracing hardware that can
measure the execution time of each block
transparently, without interfering with the
execution at all; the NEXUS interface is one
example. For other processors one must instrument
the program with additional instructions that read
the clock at suitable points in the code (e.g. on entry
to a block) and somehow record the resulting time
stamps (together with an identification of the
instrumentation point) in an execution trace. This
gives a set of samples of the execution times of each
block. For complex processors the execution time of
a block depends on the initial processor state when
the block is entered. Even if the test set executes
each block many times there is no guarantee that
the worst case for this block is measured. Some
implementations of measurement-based analysis
try to set the processor into a worst-case initial state
before each block so that the measured time is the
WCET for the block. However, for complex
processors it can be very difficult to find the
worst-case initial state and to set the processor to that
state.
The measured block execution times are
combined into a WCET estimate for the whole task
by considering all possible paths in the task, as in
the static analysis method, but perhaps using
sophisticated statistics to compute execution-time
distributions.
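A minimal sketch of the measurement step, with an invented trace of (block, time) samples:

```python
# Invented execution trace: (instrumentation point, observed cycles).
trace = [("A", 38), ("B", 21), ("A", 40), ("B", 25), ("A", 35)]

# Keep the worst observation per block.
block_max = {}
for block, cycles in trace:
    block_max[block] = max(block_max.get(block, 0), cycles)

# WCET *estimate* for the path A -> B: a combination of observed
# maxima, not a guaranteed bound, since the worst case for a block
# may never have been triggered in testing.
estimate = block_max["A"] + block_max["B"]
print(block_max, estimate)  # {'A': 40, 'B': 25} 65
```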
Comparing the static and measurement-based
methods we may note the following:
• Complex hardware, such as caches, is a problem
for both methods: making a processor model is
difficult for the static analysis; and measuring the
worst case is difficult for the measurement-based
analysis.
• Both methods have another common problem:
the analysis may include execution paths – paths
through the control-flow and call graph – that are
impossible (i.e. infeasible) according to the
overall logic of the task. This defect leads to a
WCET over-estimation factor that depends on the
design of the task and can be important.
• Both methods must usually be guided manually.
The engineer running the analysis must usually
specify iteration bounds on complex loops and
mark some execution paths as infeasible so as to
reduce over-estimation.
2.5 Cache locking, freezing and partitioning
Caches are good and easy to use for reducing the
average execution time because they are self-adjusting
containers of the dynamically changing
working set of code and data. But this dynamic
behaviour causes unpredictability. Several
researchers suggest cache locking for increasing
predictability, e.g. [6]. This technique requires that
the application deliberately loads some frequently
accessed code or data into the cache and then locks
this content in the cache so that it stays in the cache
and is not evicted until it is unlocked. The locked
part of the cache is similar to a scratchpad memory,
e.g. [7, 8]. The general LEON architecture supports
cache locking per cache line, but the fault-tolerant
(FT) LEON variant used in space only supports
cache freezing, which is equivalent to locking the
whole cache at once.
Another suggestion is to partition the cache so
that each task (or each software function) is given a
piece of the cache for its own use. This removes the
dynamic aspect of the cache-mediated interference
between tasks or functions but reduces the usable
cache size for each task or function, which may
increase the unpredictability within each task or
function. The current LEON architecture does not
support cache partitioning.
3. Study Objectives
Overall the PEAL project was given the following
four objectives:
• To evaluate the cache-sensitivity of typical on-board software (OBSW).
• To consider how the LEON cache should be
configured and how the application software and
kernels should use the cache (including possible
changes to the compilers and kernels) to improve
predictability, testability or analysability.
• To procure a prototype static WCET analyser for
the LEON, based on the ERC32 version of the
Bound-T tool [3] from Tidorum Ltd (however,
excluding static cache analysis at this stage).
• To procure and evaluate the measurement-based
WCET analysis by experiments with the
RapiTime tool [4] from Rapita Systems Ltd.
The project achieved most of these objectives
and also identified subjects for future work, whether
technical investigation or tool development. The
project reports are available from ESA [9].
4. Study Assumptions
In the PEAL project we considered primarily the
instruction cache (I-cache), for it can be analyzed
with much greater accuracy than the data cache
(D-cache). Arguably, in fact, the former depends, in
ways that can be accurately analyzed, on the fixed
and unmodifiable contents and the finitely variable
evolution of the program flow. The latter instead
depends on application-specific behaviour that is
difficult to categorize, as well as on programming
and coding techniques (e.g. modes and numbers of
subprogram parameters; use of pointers) which vary
greatly across software suppliers. Thus the D-cache
was disabled in all our experiments except for
experiments expressly aimed at the D-cache only. In
particular, the study focused on the cache
configuration as provided in the LEON AT697E chip
from Atmel, which specifies an I-cache equipped
with 32 KB of memory, 4-way associativity and a
least-recently-used (LRU) replacement policy. The
D-cache is similar but has 16 KB of memory and
2-way associativity.
In the study we intentionally excluded reliance
on any ad-hoc support from the compilation
system. However, we surveyed and evaluated
research on compiler support for memory
hierarchies, for example compiler-supported use of
scratchpads as an alternative to caches.
Our fundamental intent was to gauge the
magnitude of the impact of I-cache effects on the
verification and validation process of industrial-quality
OBSW. We did so empirically, by way of
experiments, yet striving to relate our findings to
sound engineering principles and industrial best
practices. On this account, the ultimate products of
the PEAL study are guidelines and recommendations,
backed by experimental results and observations, for
anticipating, assessing and taming the
effects that the LEON cache may have on typical
OBSW.
5. Experiments
5.1 Cache-risk patterns
To guide our experiments we constructed a set of
“cache-risk” design patterns: reasonable examples
of code for which a bad memory layout could cause
up to 100% I-cache misses, persistently or
sporadically, making execution up to four times
slower even for the fastest external memory. These
patterns helped us select the OBSW parts for our
experiments and set the experimental conditions,
such as good or bad memory layouts, but they are
certainly not an exhaustive list of designs that can
cause cache problems.
A common feature of the patterns is that they
“overload” some cache sets by using more memory
blocks that map to the same set than there are cache
lines in the set. If the design accesses these memory
blocks cyclically in the same sequence the LRU
replacement policy always evicts the blocks before
they are needed again, so we have 100% cache
misses for the overloaded sets.
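The eviction behaviour of an overloaded set can be sketched with a small LRU simulation (illustrative Python, not part of the study; a single 4-way set as in the AT697E I-cache):

```python
from collections import OrderedDict

def misses_per_cycle(blocks, ways=4, cycles=10):
    """Cyclically access `blocks` in one `ways`-associative LRU cache set
    and return the number of misses in the final (steady-state) cycle."""
    lru = OrderedDict()            # resident blocks, ordered by recency
    misses = 0
    for _ in range(cycles):
        misses = 0
        for b in blocks:
            if b in lru:
                lru.move_to_end(b)           # hit: refresh recency
            else:
                misses += 1                  # miss: load the block
                if len(lru) == ways:
                    lru.popitem(last=False)  # evict least recently used
                lru[b] = True
    return misses

# Four blocks in a 4-way set: after warm-up, every access hits.
print(misses_per_cycle(["A", "B", "C", "D"]))       # 0
# Five blocks cycled through the same 4-way set: LRU always evicts the
# block that is needed next, so every access misses.
print(misses_per_cycle(["A", "B", "C", "D", "E"]))  # 5
```

With four blocks the set warms up and then hits forever; the fifth block makes LRU evict each block just before its next use, reproducing the 100% miss behaviour.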
For example, on the AT697E consider a loop in
which the loop body consists of 32 KB of code with
consecutive addresses and no inner loops. The first
iteration of the loop loads all this code into the
I-cache, which means that later iterations run with
no cache misses (assuming that the cache is not
changed by preemptions or interrupts). However, if
the loop body grows larger than 32 KB then some
cache sets are assigned more than 4 memory blocks;
these sets are overloaded and cause cache misses. If
the loop body is 40 KB or larger all I-cache sets are
overloaded and all code fetches are cache misses.
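The arithmetic of this example can be checked directly (a sketch assuming the AT697E geometry described above: a 32 KB, 4-way I-cache, so each way spans 8 KB of consecutive addresses):

```python
WAY_SPAN = 8 * 1024  # 32 KB I-cache / 4 ways: set mapping repeats every 8 KB
WAYS = 4

def max_set_load(code_bytes):
    """Largest number of memory blocks mapped to any one I-cache set by a
    consecutively laid-out loop body of the given size (ceiling division)."""
    return -(-code_bytes // WAY_SPAN)

print(max_set_load(32 * 1024))  # 4: fits the 4 ways, no steady-state misses
print(max_set_load(36 * 1024))  # 5: some sets overloaded
print(max_set_load(40 * 1024))  # 5: every set overloaded, 100% fetch misses
```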
For code that is not laid out consecutively similar
problems may happen with much smaller amounts
of code. For example, on the AT697E consider a
loop with a body that just calls five procedures from
five separate modules. If we are unlucky the linker
may place these modules in memory so that these
five procedures are mapped to the same I-cache sets
(that is, the procedures have the same address mod
8 KB). In the worst case the procedures have the
same size and overlap exactly; then all cache sets for
the code in these procedures are overloaded, resulting in 100% misses. Figure 1 illustrates the good
and bad layouts. This is an example pattern for the
effect of memory layout only.
The rest of the cache-risk patterns are variants of
this “loop-calls-five-modules” pattern. For example,
to illustrate the effect of execution history we make
one of the five calls conditional. As long as the
condition is false, the loop loads its I-cache sets with
only four memory blocks per set, so there are no
I-cache misses. However, if the condition becomes
true the load increases to five blocks per set and all
instruction fetches cause cache misses in the worstcase memory layout.
The patterns that illustrate the impact of
preemptions and interrupts put the loop and some
of the five calls in one task and the rest of the calls
in another task or an interrupt handler. Thus the
cache sets used in the loop are overloaded only
when a preemption or interrupt happens while the
loop is executing.
2. Análisis Temporal
[Figure 1 residue. Panel (a), "Good layout": blocks A-E occupy distinct addresses mod 8 KB, so the I-cache load is at most 4 blocks per set and all accesses hit. Panel (b), "Bad layout": blocks A-E share the same addresses mod 8 KB, so some sets are loaded with more than 4 blocks and suffer 100% misses. Axes: address mod 8 KB versus address div 8 KB; bars show the I-cache load in blocks per set.]
Figure 1: Good and bad layouts of "loop A; B; C; D; E; end loop".
5.2 The example OBSW
We made several experiments on one example of
on-board software (OBSW) for the LEON from
Thales Alenia Space (then known as Alcatel Alenia
Space). This OBSW is a baseline intended for reuse
across new-generation on-board systems. Its architecture is highly representative of real industrial
OBSW with a real-time kernel (OSTRALES),
concurrent and preemptive tasks and interrupt
handlers. The OBSW is easily customisable and can
be run in a fully simulated environment. It is not a
full, ready-to-fly OBSW; it implements only a
selection of services and functions, but these include
both heavy algorithmic parts such as an Attitude
and Orbit Control System (AOCS) and heavy data-handling parts such as an on-board Mission Time-Line (MTL) of time-tagged telecommands. The
AOCS is autocoded in C from Matlab Simulink. The
MTL implements most of the corresponding PUS
service, is written in Ada and uses several balanced
trees to hold the telecommands.
The OSTRALES kernel was extended to be
cache-aware. Firstly, the cache can be frozen during
interrupt handling. Secondly, the cache can be
frozen during the execution of some tasks and
unfrozen for other tasks as controlled by a settable
task attribute. Thirdly, the kernel provides an API
that lets the OBSW flush, freeze or unfreeze the
cache at any time. “Flushing” the cache means to
discard all cached code or data; after a flush the first
access to any address is always a miss. For our
experiments we also extended the OBSW with direct
commands that control the cache via this API.
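The flush/freeze semantics described here can be modelled with a toy cache (illustrative Python; the real API belongs to the Ada/C kernel and its signatures are not given in the text, so the names and behaviour below are only a reading of the description):

```python
class ToyCache:
    """Minimal model of the kernel's cache controls: a frozen cache keeps
    serving hits but never loads new content; a flush empties it."""

    def __init__(self):
        self.contents = set()   # addresses currently cached
        self.frozen = False

    def flush(self):
        self.contents.clear()   # next access to any address will miss

    def freeze(self):
        self.frozen = True

    def unfreeze(self):
        self.frozen = False

    def access(self, addr):
        """Return True on a hit; on a miss, load the line only if unfrozen."""
        if addr in self.contents:
            return True
        if not self.frozen:
            self.contents.add(addr)
        return False

cache = ToyCache()
cache.access(0x100)         # miss: line loaded
cache.freeze()
cache.access(0x200)         # miss: NOT loaded while frozen
print(cache.access(0x100))  # True  (a frozen cache still serves hits)
print(cache.access(0x200))  # False (the line was never loaded)
```

This is the mechanism behind experimental conditions such as freezing the I-cache during MTL execution so that the AOCS content is preserved.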
5.3 The OBSW I-cache experiments
Our OBSW experiments focused on the I-cache
usage in the AOCS and the MTL. All the
experiments were run in the same input/output
scenario. The AOCS scenario was a gyro-stellar
control mode with open-loop tabulated equipment
inputs. The MTL scenario was a sequence of
insertions of telecommands in the time-line. We ran
over 20 experiments with different ways of using
the cache and/or different memory layouts. Table 1
shows in summary how we varied the experimental
conditions to investigate various questions. In all
OBSW experiments the D-cache was disabled and
Instruction Burst Fetch was enabled.
Our OBSW experiments were run on the TSIM
simulator from Gaisler Research (professional
version 1.3.9). It should be noted that execution
times on TSIM may differ by some 20% from the
execution time on a real AT697E. We chose to
generate the execution traces by instrumenting the
OBSW because the non-intrusive alternative was
too slow on TSIM. We compared the experimental
results from the two WCET tools, RapiTime and
Bound-T, both for the overall execution time (end to
end) and at the detailed levels of single subprograms and basic blocks of code, and for different
experimental conditions.
Figure 2 shows the process for instrumenting,
compiling, executing and analysing the OBSW
experiments, starting from the OBSW source-code
on the left and ending with a comparison of the
WCETs found by Bound-T and RapiTime on the
right. (The arrow carrying the Control Flow Graph
(CFG) from Bound-T to RapiTime is special to the
PEAL project and was necessary only because
RapiTime was not yet fully adapted to Ada
programs on the LEON.)
XI Jornadas de Tiempo Real (JTR2008)
Table 1: Questions and conditions for OBSW experiments

Question: Importance of the I-cache overall.
  Condition 1: I-cache enabled and unfrozen for the whole OBSW (including interrupts).
  Condition 2: I-cache disabled.

Question: Importance of I-cache retained content from one task activation to the next (task is AOCS or MTL).
  Condition 1: I-cache enabled and never flushed; code loaded by a task activation may be reused in the next activation.
  Condition 2: Flush the I-cache at the start of each task activation.

Question: Effect of a preemption on a task (task is AOCS or MTL).
  Condition 1: Flush the I-cache at start of task.
  Condition 2: Flush the I-cache also at some chosen point inside the task.

Question: Effect of MTL task on AOCS task.
  Condition 1: Let both the AOCS and the MTL use and update the I-cache (unfrozen).
  Condition 2: Freeze the I-cache during MTL execution; unfreeze during AOCS.

Question: Effect of AOCS task on MTL task.
  Condition 1: As above.
  Condition 2: Freeze the I-cache during AOCS execution; unfreeze during MTL.

Question: Effect of good memory layout on MTL.
  Condition 1: I-cache unfrozen for MTL only; uncontrolled memory layout as chosen by the linker.
  Condition 2: I-cache unfrozen for MTL only; memory layout of MTL modules chosen to reduce the number of cache conflicts.

Question: Effect of bad memory layout on MTL.
  Condition 1: As above.
  Condition 2: As above, but MTL module layout chosen to increase the number of cache conflicts.
[Figure 2 residue. The tool chain: OBSW sources pass through the RapiTime instrumenter and a first compilation; compilation with a layout (ld script, produced by a layout tool) yields the LEON binary for the test case. Bound-T derives WCET bounds without cache and an extended CFG with i-points (XML). The test execution environment (TSIM + equipment simulator) produces an execution trace with time stamps, from which RapiTime computes ET statistics and WCET estimates; these are compared with the Bound-T bounds.]
Figure 2: Compilation, execution and analysis of OBSW experiments
5.4 The D-cache experiments
We made some experiments with the D-cache
using small routines for CRC-16 computations. The
CRC procedure takes two inputs: the data packet
(an array of 512 octets) and a look-up table (a 256-element array of 16-bit constants). To increase the
number of data references we made two variants of
this procedure: the first variant computes the CRC
and also copies the data packet into an output
packet, octet by octet; the second variant computes
the CRC and also compares the data packet to a
second input packet, octet by octet. The experimental variables were the procedure variant, the
data in the packet, the layout (addresses) of the
data, D-cache disabled or enabled, and a possible D-cache flush in the middle of the CRC computation to
simulate a preemption. These experiments were run
on a LEON FPGA board, model GR-XC3S-1500,
from Pender Electronic Design and Gaisler
Research. We ran the experiments once with SRAM
memory and once with SDRAM. We used Bound-T
to get static WCET bounds for comparison. We did
not use the RapiTime tool for these very simple
programs.
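To make the access pattern concrete, here is a table-driven CRC-16 of the shape described (a sketch only: the study does not give the routine, polynomial or initial value, so the common CCITT parameters, polynomial 0x1021 and initial value 0xFFFF, are assumed):

```python
POLY = 0x1021   # assumed CCITT polynomial; not specified in the text

# The 256-entry look-up table: the second data object the D-cache must hold.
TABLE = []
for byte in range(256):
    crc = byte << 8
    for _ in range(8):
        crc = ((crc << 1) ^ POLY) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    TABLE.append(crc)

def crc16(packet, crc=0xFFFF):
    """Table-driven CRC: each octet causes one sequential read of the
    packet and one data-dependent read of the look-up table."""
    for octet in packet:
        crc = ((crc << 8) ^ TABLE[((crc >> 8) ^ octet) & 0xFF]) & 0xFFFF
    return crc

print(hex(crc16(b"123456789")))   # 0x29b1 for these parameters
```

Because the table index depends on the evolving CRC value, the table addresses touched, and hence the D-cache conflicts, depend on the packet contents; this is why a specific input packet was needed to maximize conflicts.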
5.5 Results of the experiments
All numerical results in cache-performance
experiments are very specific to the application, the
experimental conditions and the processor, cache
and memory architectures. The numbers that we
present below are only samples or examples. They
should not be taken as the “typical” impact of the
LEON cache and certainly not as the maximum
impact. Moreover, the impact is proportional to the
cache-miss penalty, which is quite low on the
AT697E, while larger cache-miss penalties are
expected in future space systems.
Bound-T versus RapiTime
We found that the static WCET bounds from
Bound-T were quite comparable to the measurement-based WCET estimates from RapiTime with the caches disabled. This is not
surprising because when caches are disabled the
execution time of a LEON code block is almost (but
not quite) independent of context and both tools use
the same method to compute the total WCET from
the WCETs of the blocks. A similar comparison with
the caches enabled is not possible because Bound-T
does not yet have a static cache analysis.
Over-estimation?
The WCETs computed by Bound-T and
RapiTime were clearly larger than the largest
measured end-to-end execution times, by 22% for
the AOCS and by 115% for the MTL. We do not
know how much of this difference is due to overestimation, for example from infeasible paths,
because we do not know the real WCET. The larger
percentage for the MTL is probably explained by the
larger logical complexity of the MTL code which
makes it less likely that our experiments measured
the real worst case (end-to-end) and more likely
that the analyses included infeasible paths.
Cache gain
Enabling the I-cache for the OBSW decreased
execution times by a factor of 2.32 to 2.64. A factor
of 4 is typically quoted as the LEON/AT697E gain
from the cache; our smaller factor shows only the
I-cache effect since the D-cache was always disabled
in our OBSW experiments. Enabling the D-cache for
the CRC experiments (with the I-cache already
enabled) decreased execution time by a factor of 1.4
to 1.9. The smaller factor is for SRAM and the larger
for SDRAM, showing the influence of the larger
cache miss penalty in SDRAM.
Suspension impact
A typical OBSW task does not run continuously
but is activated (triggered) periodically or by some
event, performs some computation, and then
suspends itself to wait for its next activation. While
the task is suspended other intervening tasks may
use the cache and change its contents; when the
suspended task is activated again it will usually
incur several cache misses as it reloads its own code
and data into the cache. We call this the suspension
impact. We measured suspension impact by comparing the execution time of a given task under
three conditions: 1) no suspension impact because
only this task is allowed to change the cache (the
cache is frozen in all other tasks); 2) natural
suspension impact because all tasks are allowed to
change the cache; 3) the worst-case suspension
impact for this task because we flush the cache at
the start of the task. A very clear example of I-cache
suspension impact was observed for the MTL task
where execution time was 22% longer for condition
(3) than for condition (1). The D-cache suspension
impact for the CRC computation was 4% to 16%.
Interestingly we also saw one case of beneficial
suspension impact: a task ran faster under
condition (2) than under condition (1) because an
intervening task brought into the I-cache the code
for a subprogram that the suspended task calls early
in its (next) activation, but which is then evicted by
other code in this task. Under condition (2) the
cache misses for reloading this subprogram happen
in the intervening task.
Preemption impact
If a task is preempted and the preempting task is
allowed to change the cache, the preempted task
usually incurs more cache misses after the
preemption as it reloads its own code and data into
the cache. We call this the preemption impact. It is
difficult to measure because it depends on the point
of preemption, not only on the cache conflicts
between the two tasks. Our experiments simulated
preemptions by flushing the cache at some chosen
point in the “preempted” task. If the preemption
point is placed very close to the start of a task it is
similar to the suspension impact. Our OBSW
experiments did not find very large I-cache
preemption impacts, probably because we did not
find the “bad” preemption points in the rather
complex OBSW code. The D-cache preemption
impact for the CRC computation increased the
execution time by 2% to 11% for a single simulated
preemption.
Layout impact
We also experimented with the layout impact, or
how the execution time depends on the memory
layout (memory addresses) of the relevant code and
data, as illustrated in Figure 1. We chose the MTL
function in the OBSW for this experiment because it
contains loops that call several procedures, in a
manner that resembles the pattern described in
section 5.1. We created a “good” layout of the MTL
procedures by placing them consecutively in
memory; they occupy only 22 KB so the consecutive
layout does not overload any I-cache set. We also
created a “bad” layout by placing the procedures to
start at the same address mod 8 KB, thus mapping
all starting points to the same cache set. However,
we later found a still worse “hybrid” layout in which
the mid-points of the MTL procedures are placed at
the same address mod 8 KB. We compared these
layouts under two conditions: 1) the I-cache is
flushed at the start of an MTL task activation; 2) the
I-cache is not flushed at that point. In both cases the
I-cache was frozen for all other tasks, so under
condition (2) the I-cache has the same content at
the start of an MTL activation as at the end of the
preceding activation.
Under condition (1) we found essentially no
execution-time difference between the layouts. A
closer look into the MTL code showed that no loop
within one MTL activation calls more than 4 of
these procedures, which means that even the “bad”
layouts do not overload the 4-way I-cache. In
contrast, under condition (2) the repeated
activation of the MTL task forms an outer loop and
this loop does call more than 4 procedures. Thus the
4-way I-cache can be overloaded (although not in all
cache sets) and execution time increases by 11% for
the “hybrid” layout over the “good” layout. Note that
these layouts are probably not the really worst and
best layouts for this code.
For the D-cache experiments we controlled the
addresses of the data packet, the look-up table and
the comparison packet or copy packet. The data-access pattern of the CRC computation is so simple
that it was rather easy to create a very “good” layout
and a very “bad” layout. Note, however, that the
addresses (indices) used in the look-up table
depend on the data in the input packet. We used a
specific input packet to maximize the conflicts.
As expected we found no D-cache layout impact
in the pure CRC computation because it accesses
only two data objects (the data packet and the look-up table) and so does not overload the 2-way D-cache. Nor was any layout impact observed in the
variant that computes a CRC and makes a copy
packet, because the third data object (the copy
packet) is only written, and data writes in the LEON
never evict any cached data (for data writes the
LEON uses write-through with no allocation of
cache lines in case of cache miss). However, the
variant that computes a CRC and compares the data
packet to a comparison packet has three data inputs
and suffers a massive layout impact: the “bad”
layout runs 35% to 63% slower than the “good”
layout. In fact, if we completely disable the D-cache
with the “bad” layout in SRAM we only add 8% to
the execution time, which shows that the layout
impact almost cancels the whole D-cache gain in
this experiment.
Reserving the cache for one task
We did not experiment with selective cache
locking as that feature is not supported in the LEON
FT. As a rough alternative we used cache freezing to
select which tasks update the cache. For example,
we found that the MTL runs 17% slower if other
tasks also update the cache.
Summary of results
To summarise, the size of the impacts we found
could be significant with regard to typical OBSW
deadlines and margins. The impacts could arise or
stay dormant in rather complex ways. The impact is
likely to increase with larger cache-miss penalties in
future processor architectures.
6. Conclusions
In this section we recapitulate the lessons we
drew from the study and formulate some
considerations on ways (whether technical or
methodological) in which space industry could
master the presence of I- and D-caches in new-generation processors and mitigate the risk effects
that may arise from them on the qualification of the
execution performance of on-board software.
Based on this work, we consider caches to be a
significant problem for the reliable verification of
real-time performance in OBSW. At least, when
real-time performance is critical, the risk is high
enough to be worth considering and countering in
some way. A small modification (in layout only, for
example) to an already verified system can result in
a system with a different behavior with little
warning to the user.
The basic problem is that the performance of a
cache (miss ratio) depends on so many factors that
test-sets of practical size are unlikely to cover the
worst case, especially if evolution of the software is
considered (e.g. possible changes to the memory
layout and to the code itself). Cache designers try to
make the bad cases very unlikely. There is a critical
area where the likelihood of a bad case is so small
that it is missed in testing, but is too large to be
risked in flight. This risk is inherent in the idea of a
cache as a self-adjusting container for the current
working set. The contents of the working set are
dynamic, history-sensitive, and thus hard to predict,
especially when there are concurrent activities.
Furthermore, practical cache designs cannot
contain the “true” working set, but are limited (by
capacity and address conflicts) to an approximation
of the working set. In extreme cases (i.e. with many
cache conflicts) the approximation is quite poor,
which may cause many misses.
The risk of bad cases can be reduced somewhat
by suitable software designs and coding styles.
Caches are designed to make use of temporal and
spatial locality of reference and thus will work
better when the design and code create more local
references (to code and data) than non-local ones.
One recommendation is to favour "local" or low-level loops that do not call many subprograms from
other modules, and to avoid high-level loops that do
call many subprograms from other modules when
the called subprograms do not contain local loops.
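This recommendation can be illustrated schematically (Python stand-ins for subprograms in five modules; the principle is language-independent). The first shape executes code from all five "modules" on every iteration; the restructured shape keeps only one subprogram's code hot at a time and computes the same result:

```python
# Stand-ins for procedures located in five different modules.
def step_a(x): return x + 1
def step_b(x): return x * 2
def step_c(x): return x - 3
def step_d(x): return x * x
def step_e(x): return x % 7

def high_level_loop(samples):
    """Risky shape: each iteration calls into all five modules, so five
    code regions compete for the same I-cache sets on every pass."""
    return [step_e(step_d(step_c(step_b(step_a(s))))) for s in samples]

def local_loops(samples):
    """Cache-friendlier shape: each subprogram runs its own local loop,
    so only one code region is hot at a time."""
    out = list(samples)
    for step in (step_a, step_b, step_c, step_d, step_e):
        out = [step(x) for x in out]
    return out

print(high_level_loop([1, 2, 3]) == local_loops([1, 2, 3]))  # True
```

The restructuring preserves the computation while trading one high-level loop over five modules for five local loops, which matches the locality the cache is designed to exploit.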
There is much research into more predictable
alternatives to cache memories, such as
scratchpads, or more predictable ways of using
caches, such as locking parts of the code and data in
the cache. We feel that such ideas could be useful
and practical for small, simple applications where
the critical code or data are easy to identify. Actual
OBSW is too complex for this, especially if several
applications share one and the same computer.
Industry does not want to return to such detailed
memory management, which is also unportable.
Still, some gains could be expected by locking the
instructions and data of some time-critical tasks in
cache across successive task activations, of course
on a scale limited by the tiny size of the caches. The
current LEON FT and AT697E do not provide
locking for parts of the cache but a similar effect can
be achieved by designing the memory layout so that
some cache sets (some ranges of address mod 8 KB)
are reserved for the critical code and data, placing at
most 4 code blocks in each I-cache set and at most 2
data blocks in each D-cache set, and placing all non-critical code and data in other sets (other values of
address mod 8 KB).
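The layout discipline suggested here can be checked mechanically from a linker map: count how many critical code blocks load each I-cache set and flag sets loaded beyond the associativity (a sketch assuming the AT697E's 8 KB way span and 4 ways; the 32-byte line size is an assumption, and a real tool would parse the map file):

```python
WAY_SPAN = 8 * 1024   # address range mapping onto one way
WAYS = 4
LINE = 32             # assumed I-cache line size in bytes

def set_load(blocks):
    """blocks: (start_address, size_in_bytes) of each critical code block.
    Returns a map from cache-set index to the number of blocks loading it."""
    load = {}
    for start, size in blocks:
        for addr in range(start, start + size, LINE):
            s = (addr % WAY_SPAN) // LINE
            load[s] = load.get(s, 0) + 1
    return load

def overloaded(blocks):
    """Cache sets loaded with more blocks than there are ways."""
    return sorted(s for s, n in set_load(blocks).items() if n > WAYS)

# Five 1 KB procedures at the same address mod 8 KB: their sets overload.
bad = [(i * WAY_SPAN, 1024) for i in range(5)]
# The same procedures laid out consecutively: one block per set.
good = [(i * 1024, 1024) for i in range(5)]
print(len(overloaded(bad)))   # 32
print(len(overloaded(good)))  # 0
```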
In theory, a static analysis of the program, based
on a detailed model of the processor, the caches and
other relevant parts of the system (e.g. the MMU)
can give a safe upper bound on the WCET of each
task. Such WCET tools are currently used in the
aerospace and automotive industries [5] but are not
yet available for the LEON. However, the detailed
processor model is hard to define and implement,
and the WCET bound is often too large (pessimistic), especially for the D-cache. Our experiments
illustrate how the pessimism depends on the
software design and how the WCET bound can be
improved by design changes or manual guidance of
the analysis to exclude infeasible execution paths.
However, including interrupts and preemptions in
the WCET analysis or the schedulability analysis is
still an issue for research, as is a good D-cache
analysis for pointer-heavy programs.
Measurement-based WCET tools such as
RapiTime do not need a processor model, but only a
way to measure the execution time of small
program fragments, by either hardware probes or
software instrumentation. Though far better than
simple end-to-end measurements, the WCET estimates obtained from these tools are not guaranteed
safe upper bounds and can also become pessimistic
through undetected infeasible execution paths.
Current static and measurement-based WCET
tools have an important common problem: they
apply to a single memory layout, the current one.
They cannot consider all possible layouts, except by
trying them all. There is room for tools that visualise
the layout and its possible cache conflicts and
perhaps improve or optimize the layout to reduce
conflicts. One such tool, the RapiCache cache-conflict analyzer, was initiated in this project.
In conclusion, we find that, for the prospective
use of LEON uniprocessors, caches currently
represent the most significant problem for reliable
verification of real-time performance. The MMU,
and especially the TLB, can cause similar problems,
but we have not studied this issue experimentally.
For multiprocessors and systems-on-chip, analysing the timing and performance of traffic on the shared buses, like the AMBA bus for LEON systems, will also be a hard or perhaps even harder problem.
References
[1] “Fast and efficient cache behaviour prediction
for real-time systems”, by C. Ferdinand and
R. Wilhelm. Real-Time Systems 17, 131–181,
1999.
[2] “WCET analysis of probabilistic hard real-time
systems”, by G. Bernat, A. Colin and
S.M. Petters. In Proceedings of the 23rd Real-Time Systems Symposium (RTSS 2002), Austin,
Texas, USA, 279–288.
[3] Bound-T Execution Time Analyzer.
http://www.bound-t.com, Tidorum Ltd.
[4] RapiTime.
http://www.rapitasystems.com/wcet.html,
Rapita Systems Ltd.
[5] aiT Worst-Case Execution Time Analyzers.
http://www.absint.com/ait/,
AbsInt Angewandte Informatik GmbH.
[6] “Cache contents selection for statically-locked
instruction caches: an algorithm comparison”,
by A. Martí Campoy, I. Puaut, A. Perles Ivars
and J. V. Busquets Mataix. In Proceedings of
the 17th Euromicro Conference on Real-Time
Systems (ECRTS 2005), 6-8 July 2005, 49-56.
[7] “Influence of memory hierarchies on
predictability for time constrained embedded
software”, by L. Wehmeyer and P. Marwedel. In
Design, Automation and Test in Europe (DATE
2005), 7-11 March 2005, Vol. 1, 600-605.
[8] “WCET centric data allocation to scratchpad
memory”, by V. Suhendra, T. Mithra,
A. Roychoudhury and T. Cheng. In Proceedings
of the 26th IEEE International Real-Time
Systems Symposium (RTSS '05), 223-232.
[9] “PEAL Final Report”, TR-PEAL-FR-001;
“WP13 - Study of Cache Usage Effects in
Typical OBSW”, TR-PEAL-TN-WP13; "Caches
and the LEON", TR-PEAL-TN-003. Send
requests for copies to Maria Hernek,
[email protected].
PEAL is ESTEC/Contract 19535/05/NL/JD/jk.
A Stochastic Analysis Method for Obtaining the
Distribution of Task Response Times
Joan Vila-Carbó and Enrique Hernández-Orallo
Departamento de Informática de Sistemas y Computadores. Universidad Politécnica de Valencia. Spain
Abstract—Real-time analysis methods are usually based on
worst-case execution times (WCET). This leads to pessimistic
results and resource underutilisation when applied to tasks with
highly variable execution times. This paper uses a discrete
statistical description of tasks, known as histograms, that enables
a powerful analysis method providing the statistical distribution
of task response times. The analysis makes it possible to study
workloads with utilisations higher than 1 during some overloads.
System behaviour is shown to be a stochastic process that
converges to a steady-state probability distribution as long as the
average utilisation is not higher than 1. The paper shows that
workload isolation is a desirable property of scheduling
algorithms that greatly eases the analysis and makes it algorithm
independent. An algorithm-dependent analysis method, referred
to as the "interference method", is also introduced and
illustrated for the case of GPS algorithms.
I. INTRODUCTION
The histogram method is based on the idea of improving on
the results of worst-case analysis in the areas of
real-time systems and network calculus. The basic hypothesis
is that most workloads are inherently variable and the worst
case is, in general, far from the average case. This is a well
known fact in network traffic and multimedia systems, because
most media encoding patterns produce bursty workloads. But it
is also true for real-time systems, not only because multimedia
processing is more and more common in these systems, but
also because there are important sources for uncertainty, like
cache memories or complex and unpredictable algorithms. In
this scenario worst-case analysis produces very pessimistic
results and leads to resource underutilisation.
Two main approaches have been proposed in the literature to
deal with real-time requirements: deterministic and statistical
techniques. Temporal requirements in deterministic techniques
are mainly based on the deadline mechanism. They allow no
deadline violations in data processing or packet losses in data
transmission. A lot of work has been done in this area during
the last decades (see [1] for example).
Providing statistical guarantees has also been analysed in
the literature, although to a lesser extent than the deterministic
model. Statistical techniques are usually regarded as QoS
(Quality of Service) techniques. They usually guarantee a
processor (or transmission) bandwidth and allow some
percentage of deadline violations or data losses in order to
improve resource utilisation.
(This work was developed under grant TIC2002-04123-C03-03 of the Spanish Government CICYT.)
For example, the work of [2] is representative of a set of servers
for jointly scheduling hard and soft real-time tasks, using
variable execution times in soft tasks and
probabilistic deadlines. The statistical model is much more
common in network calculus, where it was first introduced in
[3] for a channel-establishment test with statistical guarantees
using the Earliest-Due-Date (EDD) packet scheduling policy.
A statistical analysis of the GPS (Generalised Processor Sharing) flow based theory was introduced in [4].
One of the main problems with highly variable execution
times is finding a simple task model described by a reasonable
number of parameters that enables powerful analysis methods
and provides accurate results. Some works based on the
statistical model concentrate on matching statistical parameters
of continuous distributions. This is the case of some traffic
modelling of video sources [5], [6]. However, there is no
consensus on a model [7]; the proposals usually degenerate to
a readily analysable Gaussian model on very large networks.
An interesting approach to traffic characterisation is
histogram-based models, which describe tasks or network
traffic as a discrete statistical probability mass function.
In the area of real time systems one of the first works
to model execution times using statistical variables and to
calculate task interferences using the concept of statistical
convolution is by Tia et al. [8]. The work by [9] uses the
same approach but a different method. In [10] Lehoczky performs
a stochastic analysis under heavy traffic using a "Brownian
motion" approximation. The work by Díaz, García et al. [11], [12] makes
a complete analysis of the RM (Rate Monotonic) algorithm
with task execution times characterised by histograms. The
analysis includes a method for calculating task interference
and the resulting stochastic process.
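The statistical convolution used in these works computes the pmf of the sum of independent execution times, the basic step for accumulating demand or interference. A minimal sketch:

```python
def convolve(x, y):
    """pmf of X + Y for independent discrete variables, each given as a
    list of per-interval probabilities; result has len(x)+len(y)-1 bins."""
    z = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            z[i + j] += xi * yj
    return z

# Two tasks, each demanding 1 or 2 time units with equal probability:
print(convolve([0.5, 0.5], [0.5, 0.5]))  # [0.25, 0.5, 0.25]
```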
In the area of network calculus the histogram model was
used by Skelly et al. [13] to predict buffer occupancy distributions and cell loss rates for multiplexed streams. It was also
used by Kweon and Shin [14] to propose an implementation
of statistical real-time channels in ATM using a modified
version of the Traffic Controlled Rate-Monotonic Priority
Scheduling (TCRM). These works used an analysis method
based on an M/D/1/N queueing system where traffic sources
are approximated by a Poisson distribution with a rate λ which
is modelled as a discrete random variable.
This paper generalises the stochastic analysis for real-time
tasks with highly variable task execution times modelled as
discrete probability mass functions (histograms). The method
introduces the hypothesis of workload isolation and shows
that the stochastic analysis of the system can be done in an
algorithm-independent way when this hypothesis holds.
Interval number i   Class interval interval(i)   Midpoint ci   Probability xi   Cumulative probability x+i
0                   [0, 20[                      10            0                0
1                   [20, 40[                     30            0.1              0.1
2                   [40, 60[                     50            0.4              0.5
3                   [60, 80[                     70            0.2              0.7
4                   [80, 100[                    90            0.15             0.85
5                   [100, 120[                   110           0.15             1.0
Fig. 1: Grouped probability distribution
Fig. 2: Probability mass function and cumulative distribution
Workload isolation is only approximately met by some algorithms.
When it does not hold, the paper shows how to extend the
method using a given scheduling algorithm. This is known
as the interference method and is solved for the case of GPS
algorithms.
II. HISTOGRAM BASICS
The basic idea behind this work is improving the results of
the deterministic analysis by providing a task characterisation
that describes more accurately the workload variability than
deterministic or simplistic descriptions and provides a powerful analysis method.
The proposed model characterises tasks with variable execution times using histograms. A histogram is a
bar-graph representation of a grouped probability distribution
(fig. 1), which is a table that maps the values of a variable
to their corresponding probabilities (or frequencies). The
range of values of the variable is, in general, continuous and divided into intervals (also referred to as classes). The probabilities
of values in an interval are grouped together. All the intervals
are the same width and are characterised by their lower limit,
upper limit, midpoint and interval number. Figure 1 shows
the grouped probability distribution of a sample workload and
figure 2 shows the corresponding histogram. For convenience,
the X-axis of the histogram will show the interval number or
its midpoint rather than the interval limits.
Histogram processing will be done using probability mass functions.
Given a variable X representing a task execution time characterised by
a grouped probability distribution
divided into n intervals, the probability mass function (pmf) of
X, denoted as X (in calligraphical letters), is the sequence of
values xi that represent the (grouped) probability that variable
X takes a value in interval number i:
X = [xi , i = 0 . . . n],
xi = P (lower(i) < X ≤ upper(i))
The values upper(i) and lower(i) represent the upper and
lower limits of interval i of variable X. Figure 2 represents the
pmf (in bars) and the cumulative distribution function (cdf) (in
dashed lines) of the previous grouped probability distribution.
Some important values of a histogram are the mean value (or
expectation) E[X] = Σ_i xi · i, and the maximum value
M[X] = max(i : xi > 0).
The goal of this pmf definition is to abstract away some attributes of
the variable, particularly the range of values of X. Two random
variables X and Y with different ranges, denoted range(X) = [0, Xmax]
and range(Y) = [0, Ymax], will be said to be equivalent if X = Y as
pmfs. This means that the ranges of variables X and Y are divided into
the same number of intervals and the probabilities of all intervals are
the same. This occurs frequently when scaling variables.
For example, if some variable X represents the amount of data arrived
over a time period, the computation time Y of a task is usually
proportional to X, so histograms X and Y will be equivalent as long as
they are expressed with the same number of intervals. Similarly, if
some variable X represents the computation time of a task referred to a
processor with rate (speed) r1, then the execution time referred to a
processor with rate r2 will be Y = X · r1/r2, but X and Y will be
equivalent.
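As an illustration of these definitions (not part of the paper), the pmf characterisation can be sketched in a few lines; the helper names are our own:

```python
# Sketch of the pmf characterisation of section II (helper names are ours).

def expectation(pmf):
    """E[X] = sum_i x_i * i, expressed in interval-number units."""
    return sum(i * p for i, p in enumerate(pmf))

def maximum(pmf):
    """M[X] = largest interval number with non-zero probability."""
    return max(i for i, p in enumerate(pmf) if p > 0)

def equivalent(x, y, tol=1e-12):
    """X and Y are equivalent if their pmfs coincide class by class,
    regardless of the underlying value ranges."""
    return len(x) == len(y) and all(abs(a - b) <= tol for a, b in zip(x, y))

# Sample workload of figures 1 and 2:
X = [0, 0.1, 0.4, 0.2, 0.15, 0.15]
# A variable proportional to X, expressed over the same six classes,
# has the same pmf and is therefore equivalent to X.
```

For the sample workload, expectation(X) gives 2.85 interval units and maximum(X) gives 5, matching Fig. 1.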
A. Histogram number of classes and precision

One of the most critical issues of the histogram method is the
precision that can be obtained by making a continuous variable
discrete. The key issue is to determine the number of classes of a
histogram. This is in general a tradeoff between representation economy
and precision: with too many intervals the representation becomes
cumbersome and histogram processing expensive, since the complexity of
the algorithms mostly depends on the number of classes; on the other
hand, too few intervals may cause a loss of information about the
distribution and mask trends in the data.
Another important problem is that histogram processing with a low
number of classes results in significant precision errors.
Paradoxically, these errors occur even if that low number of classes is
enough to properly describe a given workload without losing much
information. The reason for these inaccuracies seems to be the effect
of discontinuities in histogram processing through iterative
algorithms. The solution proposed in this paper consists in reducing
the discontinuities of histograms by "artificially" increasing the
number of classes through a transformation called overclassing. This
transformation is a function Δf : X → Y that distributes each
probability xi uniformly over the f classes of Y that replace class i
of X, centred around it:

yj = xi / f,  for each of the f classes j of Y derived from class i of X.

This transformation satisfies E[Y] = f · E[X].
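A minimal sketch of overclassing under this reading (the exact placement of the f sub-classes is our assumption; the paper only describes it informally):

```python
# Overclassing sketch: each class probability x_i is spread uniformly over
# the f consecutive sub-classes that replace class i. Left-aligned placement
# is an assumption; centring the sub-classes around the original class only
# shifts the result by a constant half-class offset.

def overclass(pmf, f):
    y = [0.0] * (len(pmf) * f)
    for i, p in enumerate(pmf):
        for j in range(i * f, (i + 1) * f):
            y[j] += p / f
    return y
```

The transformation preserves total probability mass while multiplying the number of classes by f, which smooths the discontinuities seen by iterative algorithms.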
III. SYSTEM MODEL
The model of real-time system assumed in this work can be defined as
"soft" real-time and was originally devised in the scope of multimedia
transmission and processing, although the proposed analysis can also be
applied to some other areas,
XI Jornadas de Tiempo Real (JTR2008)

especially those using tasks with highly variable processing times.

Fig. 3: End-to-end processing
[figure: media source → buffer → processor → buffer → processor → ...
→ playback buffer → media player]

In multimedia processing, computations are usually data-driven. Data
are produced continuously by media sources and at highly variable
rates. Computations are structured according to the pipeline paradigm
(see figure 3): a processor reads data, processes them, and sends the
processed data to the next processor. The chain of processors forms an
end-to-end system. Processors are interconnected through buffers: a
processor's output is enqueued into the next processor's input buffer.
This model is assumed by most streaming applications. When the last
stage in a data stream is a media player, its input buffer is usually
known as the playback buffer. One of the main problems of streaming is
to avoid situations where the playback buffer is empty, because this
causes disruptions in media playing. An empty playback buffer is mainly
due to long and unpredictable delays in the buffers of previous stages.
This often occurs during congestion, when some buffer lengths can
become too long and buffers may even overflow, causing data losses. At
other times, data losses are due to data dropping in routers to
alleviate congestion.

In the proposed system model, every processor executes a set of tasks
X = {Xi : i = 1 . . . m} to process the corresponding data streams.
Tasks are periodic and are characterised by their computation time and
their period. One of the main differences between the proposed model
and the classical real-time system model is that execution times are
not characterised by a constant (WCET), but by discrete statistical
variables, whose pmfs will be denoted as X = {Xi : i = 1 . . . m}.

Multimedia processing tasks, like most real-time tasks, are periodic.
Task computation times will always be referred to the hyperperiod. This
is because the proposed method analyses the system's period or
hyperperiod, defined as the least common multiple of all task periods.
These periods are usually related to application parameters like the
frame frequency or the audio sampling period. We assume that there is
certain flexibility in adjusting a task period (to some multiple or
submultiple of the "natural" period). The goal is to make the
hyperperiod as small as possible.

Unlike hard real-time systems, in systems with highly variable
execution times the temporal requirements do not involve strict
deadlines. This requirement will be replaced by the concept of system
stability. A task set executing on a processor will be said to be
stable if the pending execution time of every task is bounded. The
pending execution time of a task is defined as the amount of its
computation time that cannot be executed by the end of a hyperperiod
and hence has to be enqueued to the next hyperperiod. This time
corresponds to the processing time of data stored in the buffer.
Stability will be the soft real-time requirement in the proposed
system.

A. Problem statement
The problem addressed in this work is an end-to-end system as shown in
figure 3. The analysis first considers a single node of the system.
This node processes a set of data streams as inputs (see figure 4).
Data are supplied through buffers of finite or infinite capacity. The
goals of the analysis are to study system stability, task response
times and some other QoS parameters such as data loss rates.

The amount of data generated by a data source is, in general, time
dependent. Accordingly, the execution times of the task set executing
on a processor will also be a set of time dependent functions, referred
to as: A(t) = {Ai(t), i : 1 . . . m}.
We will start our discussion by expressing these functions
as continuous time dependent functions. They will be later
transformed into discrete statistical variables.
The service time for a particular data source, denoted Si(t), will be
defined as the processor bandwidth allocated to data stream i. In the
general case (figure 4), the service time Si(t) is a function of the
set of all traffic sources and the processor scheduling algorithm F:
Si(t) = F(A(t)), i : 1 . . . n.
However, this problem can be simplified by requiring strong workload
isolation: every data source is assigned a constant bandwidth that is
time and workload independent: Si(t) = ri, ∀t. This hypothesis allows
the problem to be decomposed into n independent problems like the one
illustrated in figure 5. This assumption is ideal because it only holds
in the fluid model of GPS used in network calculus [15]. Some other
algorithms exhibit a weaker form of this property that will be referred
to simply as workload isolation: the average of the service time over a
time interval (for example the hyperperiod) is constant. This is true
for some algorithms based on the idea of constant bandwidth
([16], [17]). The general case, where the service time cannot be
decoupled from the rest of the data sources, will be analysed using the
so-called interference method. This kind of analysis was used in [12]
for a statistical study of the RM algorithm, and is applied in this
paper to the GPS algorithm.
IV. ANALYSIS METHOD BASED ON WORKLOAD ISOLATION
This section analyses system stability under the hypothesis of workload
isolation in an algorithm independent fashion. Later sections develop
an interference based method for the case of the GPS algorithm used in
media processing and transmission.

System stability requires computing the pending computation time for a
particular data stream in a given processor at time t. This time
corresponds to the time required to process
Fig. 4: General scenario
[figure: data sources A0 . . . Am feeding queues Q0 . . . Qm, served by
a single processor with service rates S0 . . . Sm]
Algorithm HBSP(A,r,b)
A: arrival rate
r: service rate
b: buffer length
Q(0) = [1]
k = 0
do
  k = k + 1
  I(k) = Q(k−1) ⊗ A
  Q(k) = Φ_r^b(I(k))
while E[Q(k)] − E[Q(k−1)] > ε
return Q(k)
Fig. 6: HBSP algorithm
Fig. 5: Workload isolation scenario
[figure: a single data source A feeding queue Q, served by the
processor at a constant rate r]

the data stored in the buffer up to that time. Expressing the data
arrival rate A(t) as a time dependent function and assuming that the
buffer has a finite capacity l, the buffer length Q(t) of a data stream
at a given time t can be expressed as:

Q(t) = ∫_0^t φ_0^l (A(t) − S(t)) dt                    (1)

where S(t) is the processor service rate and operator φ limits buffer
lengths so that they cannot be negative and cannot overflow the value l
either. This operator is defined as follows:

φ_a^b(x) = 0        for x < a
         = x − a    for a ≤ x < b + a                  (2)
         = b        for x ≥ b + a

This expression can be rewritten as a recurrence equation (known as
Lindley's equation [18]) assuming a discrete time space
T = t0, t1, t2, . . . where tk = k × τ for some integer value k and τ
being the duration of the hyperperiod. This way, functions A(t), S(t)
and Q(t) can be replaced by discrete functions expressed in terms of
the discrete time k:

Q(k) = φ_0^l (Q(k−1) + A(k) − S(k))

In this expression A(k) is the cumulative number of bits that the data
source puts into the buffer during the k-th hyperperiod. Analogously,
the service rate S(k) is the cumulative number of bits that the
processor removes from the buffer during the same hyperperiod.

Assuming workload isolation, the service time for each data source can
be expressed as a constant r:

Q(k) = φ_0^l (Q(k−1) + A(k) − r) = φ_r^l (Q(k−1) + A(k))

The foundation of the histogram method basically consists of
eliminating the time dependence of A(k) in the previous expression by
expressing it as a discrete random variable with pmf
A = [ak, k = 0 . . . n]. This way, the previous equation can be
transformed into a statistical equation:

Q(k) = Φ_r^l (Q(k−1) ⊗ A)                              (3)

where operator ⊗ stands for the standard statistical convolution and
the bound operator Φ_a^b() is defined as the statistical generalisation
of the previously defined φ_a^b() operator:

Φ_a^b(X) = Φ_a^b([x0, x1, · · ·, xn]) =
  = [ Σ_{i=0..a} xi , x_{a+1}, x_{a+2}, . . . , x_{b+a−1} , Σ_{i=b+a..n} xi ]   (4)

Notation Φ_a(X) will be used as an equivalent for Φ_a^∞(X).
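Under this reading of equation 4, the bound operator can be sketched as follows (an illustration, not the authors' code):

```python
# Sketch of the bound operator Φ_a^b on a pmf: mass at classes <= a
# collapses into position 0, interior classes shift down by a, and mass
# at classes >= b+a accumulates in the last position b.

def bound(pmf, a, b):
    out = [0.0] * (b + 1)
    for i, p in enumerate(pmf):
        if i <= a:
            out[0] += p
        elif i >= b + a:
            out[b] += p
        else:
            out[i - a] += p
    return out
```

For example, bound([0.1, 0.2, 0.3, 0.4], 1, 2) yields [0.3, 0.3, 0.4]: total probability is preserved while the support is clipped to the buffer capacity.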
Note that Q(k) has now become a discrete time stochastic process. The
evolution in time of this stochastic process depends on the mean value
of A.

When M[A] < r, the buffer length is always zero, because it is easy to
prove from the definition that Φ_r(A) is zero in this case. The case
M[A] ≥ r is the most interesting one because, unlike worst-case
analyses, statistical analyses allow arrival rates to occasionally
exceed the processor capacity during transitory overloads and still
yield a stable system, depending on E[A]. Two cases must be considered:
infinite and finite buffer. In the infinite buffer case, the system
converges to a steady-state pmf iff E[A] ≤ r. With a finite buffer, the
process always converges because it is always bounded by operator
Φ_a^b().
Figure 6 shows an iterative algorithm to calculate the steady state
solution of the buffer length problem of equation 3. It will be
referred to as the HBSP (Histogram Buffer Stochastic Process)
algorithm.
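The HBSP iteration can be sketched directly from figure 6 (an illustrative implementation under our reading of the Φ operator; helper names are ours):

```python
# HBSP sketch: iterate Q(k) = Φ_r^b(Q(k-1) ⊗ A) until the mean buffer
# length settles. `convolve` is the plain discrete convolution (⊗) and
# `bound` is the Φ_a^b operator of equation 4.

def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def bound(pmf, a, b):
    out = [0.0] * (b + 1)
    for i, p in enumerate(pmf):
        out[0 if i <= a else (b if i >= b + a else i - a)] += p
    return out

def mean(pmf):
    return sum(i * p for i, p in enumerate(pmf))

def hbsp(A, r, b, eps=1e-9):
    Q = [1.0]                       # buffer initially empty w.p. 1
    while True:
        Q_next = bound(convolve(Q, A), r, b)
        if abs(mean(Q_next) - mean(Q)) <= eps:
            return Q_next
        Q = Q_next
```

With an arrival pmf whose maximum does not exceed r the buffer stays empty, while for overloaded sources the finite bound b keeps the process convergent, as stated in the text.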
V. ANALYSIS METHOD BASED ON INTERFERENCES
The analysis method of the previous sections assumes workload
isolation, i.e., every data source has a constant service rate. The
interference method assumes instead that the service rate for each
workload depends, in general, on the processor bandwidth r, the task
set execution times X, and the scheduling algorithm F: Si(t) = F(A(t)).
This section develops a method for when the workload isolation
hypothesis does not hold, using the GPS family of algorithms. A similar
study for the RM algorithm was developed in [11].

The interference of task set X on task Ai using scheduling algorithm F,
referred to as F(X, Ai), is defined as the pmf of the response time of
task Ai when the processor executes task set X using scheduling
algorithm F. In other words, it is the time that the processor devotes
to processing Ai when it is also executing X.
The calculation of F(X, Ai) is F-dependent and may have a high
computational cost for some scheduling algorithms. This is the reason
why workload isolation simplifies the analysis of real-time systems.
System stability requires computing the pending execution time in a
hyperperiod. This time corresponds to the processing of data stored in
the buffer, i.e. data that could not be processed during a hyperperiod.
The buffer length in this situation can be obtained by adapting
equation 3. In that equation, the expression Q(k−1) ⊗ A is the required
processing time in hyperperiod k assuming a virtual processor that only
executes stream A (workload isolation). When workload isolation does
not hold, it has to be replaced by the time that the processor devotes
to processing Qi(k−1) ⊗ Ai:

Qi(k) = Φ_r^l (F(X(k−1), Qi(k−1) ⊗ Ai))                (5)
where X(k−1) is the interfering task set for hyperperiod k. After the
first iteration, the task set that interferes with Ai may have changed.
This is because the workloads in the new hyperperiod have to be
convolved with the corresponding pending workloads. In other words, in
each iteration k the interfering workload has to be recalculated as:

X(0) = X
X(k) = {Qi(k−1) ⊗ Ai,  i = 1 · · · m}                  (6)

System stability now becomes more complex to compute and to prove. In
this case, the condition for convergence is Σ_{i=1..m} E[Xi] ≤ r,
although the proof has been omitted.

Fig. 7: Example of the interferences method
[figure: (a) task set X0, X1, X2 and bandwidth ω0; (b) interference in
interval (0,2); (c) interference in interval (2,6); (d) interference in
interval (6,8); (e) resulting interference in (0,8); (f) result of
simulation]
A. Analysis of response times of the GPS algorithm

The GPS algorithm [15] is based on sharing the processor bandwidth
among a set of tasks in a weighted way. Each task Xi is assigned a
percentage of the processor bandwidth that will be referred to as ωi.
When a task finishes its execution, its processor bandwidth is
reassigned to the rest of the tasks in proportion to their percentages,
so the relation ωi/ωj is preserved for every pair of tasks Xi, Xj. The
GPS theory assumes an ideal fluid model in which the processor
bandwidth can be split among tasks over any infinitesimal interval of
time. This ideal behaviour has been approximated by some
algorithms [16].
This section presents the analysis of the response time of the GPS
algorithm using histograms as an example of the interference method.
The analysis will first be introduced through an example, and then
generalised in algorithmic form.

The example calculates the response time of a task X0 as a consequence
of the interferences of tasks {X1, X2}. Task X0 has an execution time
that is uniformly distributed in the interval [0, 8]. Its histogram is
shown in figure 7a in solid bars. In a first approximation to the
problem, we will assume that interfering tasks X1 and X2 have
deterministic execution times of 4 units and 2 units respectively, also
shown in figure 7a. This set of tasks is executed using a GPS algorithm
with the following bandwidth percentages: ω0 = 3/10, ω1 = 6/10 and
ω2 = 1/10, so ω0 = (1/2)ω1, ω0 = 3ω2, and ω0 + ω1 + ω2 = 1.
The first step to calculate the response time pmf of X0 is to find out
the order in which the deterministic tasks finish their execution, and
the times at which they do so. In the above example, it is easy to see
that X1 will finish first, at time 4/ω1 = 6.67, so up to this time it
has been interfering with task X0. By that time, X0 has executed 2
computation units, because ω0 = (1/2)ω1. Once X1 is finished, its
bandwidth is reassigned to X0 and X2 in such a way that ω0/ω2 is
preserved. So the new bandwidth percentage of X0 is now
ω0 = (3/10)/(1 − 6/10) = 3/4, and ω2 = 1/4. Task X2 will finish when it
completes its 2 execution units. Since the relation ω0/ω2 = 3 is always
preserved, X2 will interfere with task X0 until X0 has executed
(ω0/ω2) · 2 = 6 time units. From then on, X0 will have the full
processor bandwidth, so ω0 = 1. The different values of ω0 with respect
to the execution time of X0 are represented with a dashed line in
figure 7a.
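The arithmetic of this bandwidth evolution can be checked numerically (a sketch with the example's figures hard-coded):

```python
# GPS bandwidth reassignment for the example (values from the text).

w0, w1, w2 = 3 / 10, 6 / 10, 1 / 10

t1 = 4 / w1                 # X1 (4 units) finishes first, at time 6.67
x0_done = w0 * t1           # X0 has executed 2 units by then

# X1's share is redistributed so that the ratio w0/w2 is preserved:
w0_after = w0 / (1 - w1)    # 3/4
w2_after = w2 / (1 - w1)    # 1/4

# X2 keeps interfering until X0 has executed (w0/w2) * 2 = 6 units:
x0_when_x2_ends = (w0 / w2) * 2
```

The redistribution rule divides each remaining share by the total remaining bandwidth, which is what keeps all pairwise ratios constant.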
Once the evolution of ω0 has been established, the pmf of the execution
time of X0 is calculated by intervals with the same ω0. In this case,
three intervals are considered: [0, 2[, [2, 6[ and [6, 8[.

In interval [0, 2[, the value of ω0 is 3/10. If the execution time of
X0 belongs to this interval, the task will be executed
1/ω0 = 10/3 = 3.33 times slower than if it had the full processor
bandwidth, so the execution times have to be multiplied by this factor.
This means that the response time is obtained by scaling up the X-axis
of the histogram in this interval by a factor of 10/3. This scaling is
done via linear interpolation.
The resulting histogram is shown in figure 7b. The procedure can be
expressed as:
interval[] = getInterval(0,2) = [1/8, 1/8]
result[] = scale(10/3, interval) = [1/28, 1/28, 1/28, 1/28, 1/28, 1/28, 1/28]
In interval [2, 6[, the value of ω0 is 3/4. If the execution time of X0
is in this interval, the histogram interval has to be scaled up by a
factor of 1/ω0 = 4/3. But the calculation of the response time in this
interval needs to take into account the execution time of previously
finished tasks. Task X1 finishes just before this interval starts, so
the scaled interval must be shifted to the right 7 positions (the
round-up of 6.67). In statistical terms, this means convolving the
scaled interval with a finished histogram that has 7 zeros and a one in
position 8 (the one function). The result of this convolution is shown
in figure 7c. Algorithmically:

interval[] = getInterval(2,6,X0) = [1/8, 1/8, 1/8, 1/8]
sint[] = scale(4/3, interval) = [1/10, 1/10, 1/10, 1/10, 1/10]
finished[] = one(7) = [0, 0, 0, 0, 0, 0, 0, 1]
csint[] = sint[] ⊗ finished[] = [0, 0, 0, 0, 0, 0, 0, 1/10, 1/10, 1/10, 1/10, 1/10]
result[] = sum(result[], csint[])

In interval [6, 8[, the value of ω0 is 1. The histogram in this
interval has to be scaled up by a factor of 1. This value has to be
convolved with the execution time of previously finished tasks, which
in this case includes X1 and X2 (12 units). The result of this
convolution is shown in figure 7d. The particularisation of the
algorithm for this case is:

interval[] = getInterval(6,8,X0) = [1/8, 1/8]
sint[] = scale(1, interval) = [1/8, 1/8]
finished[] = one(12)
csint[] = sint[] ⊗ finished[] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1/8, 1/8]
result[] = sum(result[], csint[]) =
  [1/28, 1/28, 1/28, 1/28, 1/28, 1/28, 1/28, 0.1, 0.1, 0.1, 0.1, 0.1, 1/8, 1/8]

Algorithm result[] = detGPSInterference(A,DX[])
A: struct of the interfered task
DX[]: array of struct of deterministic tasks
result: pmf of the interfered task
[W[],T[],F[]] = getUtilisations(A,DX[]);
result = [];
finished = [1];
for i=1:length(T[])-1
  interval[] = getInterval(T(i),T(i+1),A.ρ[]);
  sint[] = scale(1/W(i),interval);
  finished[] = one(F(i));
  csint[] = sint[] ⊗ finished[];
  result[] = sum(result[],csint[]);
end
Fig. 8: Algorithm for deterministic interferences with GPS.

The variable result in the previous algorithm accumulates the
probabilities for each interval. The resulting histogram is obtained by
summing up the probabilities of all intervals. This histogram is shown
in figure 7e. These results almost perfectly match the results of the
packet-by-packet simulation of figure 7f, which confirms the
correctness of the algorithm.
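The bookkeeping of the example can be reproduced with a short script (the per-interval scaled pmfs are taken from the text as given; helper names are ours, not the paper's):

```python
# Reproduce the example's result histogram: convolve each scaled interval
# pmf with the `one` histogram of the finished tasks, then sum element-wise.

def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def one(n):
    return [0.0] * n + [1.0]          # unit probability mass at index n

def sum_pmfs(p, q):
    out = [0.0] * max(len(p), len(q))
    for i, v in enumerate(p):
        out[i] += v
    for i, v in enumerate(q):
        out[i] += v
    return out

result = [1/28] * 7                                        # interval [0,2[
result = sum_pmfs(result, convolve([1/10] * 5, one(7)))    # interval [2,6[
result = sum_pmfs(result, convolve([1/8] * 2, one(12)))    # interval [6,8[
# result: seven 1/28 values, five 1/10 values, two 1/8 values (total 1).
```

Convolution with one(n) is simply a shift of n positions, so the three partial histograms land in disjoint regions of the response-time axis and their probabilities add up to one.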
The general algorithm for computing the interference of a deterministic
task set is shown in figure 8 and can be easily understood once the
previous example has been introduced. The task whose response time has
to be computed is represented as A, with A being a struct with two
fields: A.ω, which is the nominal processor utilisation of A, and
A.ρ[], which is an array with the pmf of the execution time of A. The
set of deterministic tasks that interfere with A is denoted by the
array of structs DX[]. Each element i of the array is a structure with
two fields: DX(i).ω, which is the nominal processor utilisation of
DX(i), and DX(i).τ, which is the (deterministic) execution time of
DX(i).

The algorithm starts by invoking procedure getUtilisations to calculate
the values of the utilisation percentage ω and their corresponding
intervals. It also calculates the finish time of each interval. In the
previous example: W[] = [3/10, 3/4, 1], T[] = [0, 2, 6, 8],
F[] = [0, 6.67, 12].

Once these values and intervals of ω have been obtained, the algorithm
performs as detailed in the example: the values of the execution times
for each interval are scaled up by a factor of 1/ω and convolved with
the execution time of the previously finished deterministic tasks.
Finally, the probability values of each interval are summed element by
element using procedure sum.
Algorithm getUtilisations (figure 9) calculates the values of the
utilisation percentage ω of a task A as a consequence of the
interference of a deterministic task set DX, the intervals of the pmf
of A with those values, and the amount of finished deterministic task
computation in each interval. It starts by scaling the execution times
of the deterministic tasks to the value of A, according to their
utilisation factors: if, for example, task i has twice the utilisation
factor of A, then its execution time should be divided by 2. Then, the
interfering tasks are sorted by execution time. Value ωtot is
initialised to the sum of all utilisation factors (usually 1). The
heart of the algorithm is the loop that starts in line 8. The loop
filters repeated time values in array T[]. In each pass, ωtot is
updated by subtracting the ω of the finished tasks, and the new value
of A.ω is readjusted according to the GPS algorithm as A.ω/ωtot.
Once the interference of a set of deterministic tasks has been
calculated, the algorithm for computing the interference of a set of
tasks (not necessarily deterministic) can be obtained using the
previous one. The general algorithm is presented in figure 10. The
general idea is to decompose each non-deterministic task into a set of
deterministic tasks. For example, a task with a pmf [0, 0.3, 0.7] would
be decomposed into two tasks with pmfs [0, 0.3] and [0, 0, 0.7]. The
corresponding pmfs for each case then have to be summed, weighted
according to their probabilities: 0.3 and 0.7. The algorithm of
figure 10 computes all the combinations of deterministic task sets that
can be formed with an array of non-deterministic tasks X[]. This is a
double array DXA[][] where DXA(i)[] is a deterministic task set
characterised by three parameters: DX(i).ω is the utilisation factor,
DX(i).τ is the deterministic execution time, and DX(i).ρ is the
probability of this execution time. The algorithm detGPSInterference is
invoked for each deterministic task set DXA(i)[], and the results for
each case are summed with weight DX(i).ρ.
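The decomposition step can be sketched as follows; itertools.product plays the role of the combinations procedure, whose internals the paper does not give:

```python
# Sketch: split each non-deterministic pmf into weighted deterministic
# parts and enumerate all cross-combinations with their joint probability.

from itertools import product

def decompose(pmf):
    """(execution_time, probability) pairs with non-zero probability."""
    return [(t, p) for t, p in enumerate(pmf) if p > 0]

def deterministic_sets(task_pmfs):
    """All deterministic task sets and their joint probabilities."""
    sets = []
    for combo in product(*(decompose(p) for p in task_pmfs)):
        times = [t for t, _ in combo]
        prob = 1.0
        for _, p in combo:
            prob *= p
        sets.append((times, prob))
    return sets

# The pmf [0, 0.3, 0.7] of the text splits into parts (1, 0.3) and (2, 0.7):
combos = deterministic_sets([[0, 0.3, 0.7], [0, 1.0]])
```

The joint probabilities of all enumerated task sets sum to one, which is what allows the weighted sum of the detGPSInterference results to be a pmf.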
Algorithm [W[],T[],F[]] = getUtilisations(A,DX[])
A: struct of the interfered task
DX[]: array of struct of deterministic tasks
W[]: array of utilisations
T[]: array of times of utilisation changes
F[]: array of finished times
 1  for i=1:length(DX[])
 2    DX(i).τ = round(DX(i).τ * (A.ω/DX(i).ω));
 3  end
 4  DX[] = sortByTime(DX[]);
 5  ωtot = A.ω + Σ_i DX(i).ω
 6  T[]=[0]; W[]=[A.ω/ωtot];
 7  F[]=[0]; f=0;
 8  for i=1:length(DX[])
 9    if ( DX(i).τ ≤ length(A) )
10      ωtot = ωtot − DX(i).ω;
11      if ( i==1 | (i>1 & (DX(i).τ ≠ DX(i-1).τ)) )
12        T[]=[T, DX(i).τ];
13        W[]=[W, A.ω/ωtot];
14      else
15        W(end)=A.ω/ωtot;
16      end
17    end
18  end
19  if ( T(length(T)) < length(A) )
20    T=[T, length(A)];
21  end
22  for i=2:length(T[])
23    F[]=[F, F(i-1)+(T(i)-T(i-1))/W(i-1)];
24  end
Fig. 9: Algorithm for determining the intervals of ω
Algorithm result = GPSInterference(A,X[])
A: struct of the interfered task
X[]: array of struct of the interfering tasks
result: pmf of the interfered task
result=[0];
for i=1:length(X[])
  DXA[][] = combinations(DXA[][], X(i));
end
[l,w] = size(DXA[][]);
for i=1:l
  DR(i)[] = detGPSInterference(A,DXA(i)[]);
  q=1;
  for j=1:length(X[])
    q = q * DXA(i)(j).ρ;
  end
  WDR(i)[] = sum(WDR(i)[], q*DR(i)[]);
end
for i=1:l
  result[] = sum(result[], WDR(i)[]);
end
Fig. 10: Algorithm for computing interferences using GPS
VI. CONCLUSIONS

This paper has presented a stochastic method for calculating response
times in systems where the average load can occasionally be higher than
one during system overloads. System stability is proposed as a soft
real-time requirement alternative to the deadline requirement. It has
been shown that system stability can be studied in an algorithm
independent fashion under the hypothesis of workload isolation. This
hypothesis holds approximately for some scheduling algorithms. A method
based on calculating task interferences has been developed for the GPS
family of algorithms, and it has been shown to have a much higher
computational complexity. An interesting issue is how different the
solutions obtained assuming workload isolation are from those obtained
with the interference method, but this has been left as future work.
The developed method is applicable in the fields of real-time systems
and network calculus.
REFERENCES
[1] J. W. Liu, Real-Time Systems, Vol. 1, Pearson, 2000.
[2] L. Abeni, G. Buttazzo, QoS guarantee using probabilistic deadlines,
in: Proc. of the Euromicro Conference on Real-Time Systems, 1999.
[3] D. Ferrari, D. Verma, A scheme for real-time channel establishment
in wide-area networks, IEEE Journal on Selected Areas in Communications
8 (2) (1990) 368–379.
[4] Z. L. Zhang, D. Towsley, J. Kurose, Statistical analysis of the
generalized processor sharing scheduling discipline, IEEE Journal on
Selected Areas in Communications 14 (6) (1995) 1071–1080.
[5] W. E. Leland, M. S. Taqqu, W. Willinger, D. V. Wilson, On the
self-similar nature of Ethernet traffic (extended version), IEEE/ACM
Transactions on Networking 2 (1) (1994) 1–15.
[6] M. Zukerman, T. D. Neame, R. G. Addie, Internet traffic modeling
and future technology implications, in: IEEE Infocom, 2003.
[7] R. G. Addie, M. Zukerman, T. D. Neame, Broadband traffic modeling:
Simple solutions to hard problems, IEEE Communications Magazine (1998)
88–95.
[8] T. Tia, Z. Deng, M. Shankar, M. Storch, J. Sun, L. Wu, J. Liu,
Probabilistic performance guarantee for real-time tasks with varying
computation times.
[9] M. Gardner, Probabilistic analysis and scheduling of critical soft
real-time systems, PhD thesis, University of Illinois,
Urbana-Champaign (1999).
[10] J. Lehoczky, Real-time queueing network theory, in: Proc. of the
18th IEEE Real-Time Systems Symposium, 1997, pp. 58–67.
[11] J. Díaz, D. García, K. Kim, C. Lee, L. L. Bello, J. López,
S. L. Min, O. Mirabella, Stochastic analysis of periodic real-time
systems, in: Proc. of the 23rd IEEE Real-Time Systems Symposium, 2002,
pp. 289–300.
[12] J. Díaz, Técnicas estocásticas para el cálculo del tiempo de
respuesta en sistemas de tiempo real, PhD thesis, Universidad de
Oviedo, Spain (2003).
[13] P. Skelly, M. Schwartz, S. Dixit, A histogram-based model for
video traffic behavior in an ATM multiplexer, IEEE/ACM Transactions on
Networking 1 (4) (1993) 446–459.
[14] S.-K. Kweon, K. G. Shin, Real-time transport of MPEG video with a
statistically guaranteed loss ratio in ATM networks, IEEE Transactions
on Parallel and Distributed Systems 12 (4) (2001) 387–403.
[15] A. K. Parekh, R. G. Gallager, A generalized processor sharing
approach to flow control in integrated services networks: The single
node case, IEEE/ACM Transactions on Networking 1 (3) (1993) 344–357.
[16] J. Bennett, H. Zhang, WF2Q: Worst-case fair weighted fair
queueing, in: Proc. IEEE INFOCOM, 1996, pp. 120–128.
[17] M. Spuri, G. C. Buttazzo, Scheduling aperiodic tasks in dynamic
priority systems, Real-Time Systems 10 (2) (1996) 179–210.
[18] L. Kleinrock, Queueing Systems. Volume 2: Computer Applications,
Wiley-Interscience, New York, 1976.
D-P domain feasibility region in dynamic priority systems
Patricia Balbastre, Ismael Ripoll and Alfons Crespo∗
Department of Computer Engineering
Technical University of Valencia, Spain
{patricia,iripoll, alfons}@disca.upv.es
Abstract
In the design of a real-time application it is fundamental to know how a change in the task parameters would
affect the feasibility of the system. Relaxing the classical
assumptions on static task sets with fixed periods and deadlines can
give higher resource utilisation and better performance. But changes to
task parameters must always be made while maintaining feasibility. In
practice, period and deadline modifications are only needed for single
tasks.
Our work focuses on finding the feasibility region of deadlines and periods (called D-P feasibility region) for a single
task in the context of dynamic, uniprocessor scheduling of
hard real-time systems. This way, designers can choose the
optimal deadline and period pairs that best fit application
requirements. We provide an exact and an approximate algorithm to
calculate this region. We will show that the approximate solution is
very close to the exact one and takes considerably less time.
1 Introduction
Real-time systems are often designed using a set of periodic activities
running on top of a real-time operating system. Timing parameters, such
as computation times, periods and deadlines, are susceptible to
adjustment during the design phase of the system. In practice,
deadlines and periods are chosen to satisfy certain performance
requirements. For hard real-time applications, the feasibility of the
schedule has to be guaranteed a priori.
In the context of real-time task scheduling of flexible
applications, Earliest Deadline First (EDF) seems more attractive than Rate Monotonic (RM), since it makes a better
use of CPU resources. Traditionally, many of the works on
EDF scheduling are focused on tasks with deadlines equal
∗ This work was partially supported by the Spanish Government Research Office (CICYT) under grants DPI2005-09327-C02-02 and TIN2005-08665-C03 and by the European Union's Sixth Framework Programme
(FP6/2005/IST/5-034026)
to periods. This assumption greatly simplifies the feasibility
analysis, but limits the applicability of the results.
In control applications, setting deadlines less than periods allows the reduction of jitter and delay. Variable delays and jitter may degrade the performance of the system and even jeopardise its stability [25, 27]. In real-time databases, temporal freshness of data is assured and workload minimised
by assigning a deadline lower than the transaction’s period,
such as in the More-Less model proposed in [28].
Flexible applications, such as multimedia systems, need to execute periodic activities, but task periods do not need to be as rigid as in control applications. Moreover,
control applications can benefit from executing at different rates for different operating conditions. Relaxing the
classical assumptions on static task sets with fixed periods
and deadlines can give higher resource utilisation and better control performance. One of the typical strategies is to
adapt on-line periods, computation times and deadlines in
a way that optimises overall control performance [10, 19].
But the changes on task parameters have to be done always
maintaining feasibility.
Thus, it is necessary to provide control designers with a feasibility region. This feasibility region can be obtained by
means of a sensitivity analysis, which is a generalisation of
feasibility analysis [8]. Sensitivity analysis is a promising
approach to deal with design uncertainties and allows the
system designer to keep track of the flexibility of the system. In the literature, sensitivity analysis has been applied
considering how a change in a task affects the rest of the
tasks of the system. We are interested in how a change in
a task parameter affects the rest of parameters of the same
task.
1.1 Motivating example
Let us consider the task set with parameters listed in Table 1 (see Section 2 for notation details). The example illustrates a task set where deadlines are lower than periods,
which is used in control applications to limit the jitter and
latency of control tasks. The execution chronogram is depicted in Figure 1(a). As can be seen in the chronogram, the CPU is not fully utilised, and task periods and deadlines can be reduced. Indeed, in general, in a real-time control application, the shorter the period, the better the performance [24].
If T1 is a control task but T2 is not, then we are interested in
the set of feasible deadline and period assignments for task
T1 .
Table 1. Two tasks example

     C   D   P
T1   2   6   10
T2   5   8   12
(a) Two periodic tasks with D < P
(b) T1 = (2,2,7)
(c) T1 = (2,4,5)
Figure 1. Execution chronogram of two periodic tasks with different period and deadline
assignment
Figures 1(b) and 1(c) show two alternatives of period and
deadline assignment for task T1 while maintaining feasibility. In the first case, the goal is to reduce the task deadline as
much as possible, and then the period has been reduced. In
the second case, the period has been shortened to its minimum value and afterwards the deadline has been reduced. In both cases the task set is on the feasibility borderline, although the order in which the reduction is made (period first or deadline first) leads to different task parameters.
Which alternative is better from the application point of
view? It depends on the kind of application. Even in control
applications, assigning a short period is not always better
than assigning a shorter deadline. In Figure 1(b), assigning the minimum possible deadline to T1 allows a very low jitter but, on the other hand, assigning the shortest possible period to T1, as in Figure 1(c), can lead to better control performance. In any case, it depends on the dynamics of the controlled system. As stated in [1], the effect of delays is not the same in every control loop, meaning that sometimes the best alternative is the one depicted in Figure 1(b) and sometimes it is the one shown in Figure 1(c). Therefore, it is
a designer’s decision.
To integrate the design phase with the real-time implementation, we will provide the range of feasible deadlines
and periods for a task so that system designers can choose
the appropriate values to quickly adapt to the system dynamics or to improve the system performance.
1.2 Related work
Sensitivity analysis has focused on permissible changes to task WCETs, mainly because it has been applied to fault-tolerance design. In this sense, [15, 26, 22] define the critical scaling factor as the largest possible scaling factor for task computation times such that the task set remains schedulable. These works assume Rate Monotonic priority assignment and deadlines equal to periods.
Sensitivity analysis for computation times using EDF
scheduling has been performed by Balbastre et al. in [2].
Moreover, the Optional Computation Window (OCW) is defined as the possibility that a task executes, in n activations out of a window of m consecutive invocations, with a computation time larger than the one initially assigned. Deadlines less than or equal to periods are assumed.
The analysis tool MAST [21] includes a slack calculation tool that calculates the system, processing-resource or transaction slacks by repeating the analysis in a binary search in which computation times are successively increased or decreased. This implies executing a feasibility test in each iteration, making the computational complexity very high.
Sensitivity analysis for task periods for dynamic priorities can be found in [9] by Buttazzo et al. In this paper, periods are modeled as springs with given elastic coefficients
and minimum lengths. Requested variations in task execution rates or overload conditions are managed by changing
the rates based on the spring elastic coefficients. This work
assumes deadlines equal to periods and EDF scheduling.
In [24], task periods are chosen to minimise a cost function while preserving feasibility under Rate Monotonic. A similar problem is addressed in [7, 8] using the exact feasibility region in the domain of task frequencies. Rate analysis in the domain of embedded systems is analysed in [20], and for multimedia streams in [18]. However, we are concerned with rate and deadline analysis for hard real-time systems.
Regarding deadline sensitivity analysis, there are some
papers which assign new deadlines to periodic tasks in order to achieve a secondary objective. Cervin et al. [11] calculate new deadlines for control tasks in order to guarantee closed-loop stability of real-time control systems. Baruah
et al. [4] developed two methods to minimise output jitter
of tasks by assigning shorter relative deadlines. The first
method does not achieve the minimum deadline but it has
polynomial time complexity. The second method calculates
the minimum deadline at the expense of a higher computational complexity. Finding the minimum deadline of a periodic task has also been independently addressed by Balbastre et al. [3] and by Hoang et al. [14]. The feasibility region for task deadlines in EDF when deadlines are less than periods is formally presented by Bini and Buttazzo [6]; however, no algorithm to derive this region is presented, mainly due to the intrinsic complexity of the problem.
1.3 Contributions and outline
The main contribution of this paper is the characterisation of the feasibility region in the domain of deadlines and
periods (D-P feasibility region) for one task. We will provide both the exact feasibility region and an approximated
solution with reduced complexity. To calculate this region,
it is necessary to calculate the minimum deadline and period
of a task. The minimum deadline calculation is proposed
in [3]. As far as the authors are aware, there is no work on
finding the minimum period of a periodic task in EDF when
deadlines are different from periods. This is another contribution of our work. Our final goal is to present efficient and applicable algorithms that can be directly used by system designers.
The remainder of this paper is organised as follows: Section 2 introduces the computational model and the assumptions used. Section 3 presents the problem definition. To
calculate the feasibility region we need to compute the minimum deadline and period of a task, which is detailed in
Sections 4 and 5, respectively. In Section 6, the characterisation of the D-P feasibility region is detailed. We have also implemented the proposed algorithms and conducted some simulations in order to assess the validity and effectiveness of our proposals; the results are presented in Section 7. Finally, in Section 8 some conclusions are highlighted and future lines of research are discussed.
2 System model and notation
Let τ = {T1, ..., Tn} be a periodic task system scheduled under EDF. Each task Ti ∈ τ has the following temporal parameters: Ti = (Ci, Di, Pi), where Pi is the period, Di is the deadline relative to the start time, and Ci is the worst-case execution time, with Ci ≤ Di. The total utilisation factor of τ is expressed as $U_\tau = \sum_{i=1}^{n} \frac{C_i}{P_i}$. Each task Ti produces a sequence of jobs Jik (k = 0, 1, 2, ...) that must be completed by its absolute deadline $d_{ik} = kP_i + D_i$.
Furthermore, the following definitions are needed:
Definition 1 The period improvement β_j for a task T_j ∈ τ is computed from the ratio between the shortened period (P′_j) and the original period, that is:

$$\beta_j = 1 - \frac{P'_j}{P_j}$$

Thus, β_j = 0 means that no reduction is achieved, whereas β_j → 1 means that the maximum possible period reduction was achieved.
Definition 2 [5] Let $H_\tau(t) = \sum_{i=1}^{n} \left\lfloor \frac{t + P_i - D_i}{P_i} \right\rfloor C_i$. It denotes the maximum cumulative execution time requested by jobs of τ whose absolute deadlines are less than or equal to t.
2.1 EDF feasibility when Di ≤ Pi

The Earliest Deadline First scheduling algorithm was first described in [17], and it was shown to be optimal in [12]. The feasibility test for EDF (when Di ≤ Pi) consists of checking that the inequality Hτ(t) ≤ t holds ∀t. Leung and Merrill showed in [16] that the schedulability condition must only be checked in the interval [0, P), where P is the hyper-period. Baruah et al. [5] proposed a more accurate feasibility test, with the same schedulability condition but with a shorter interval to check. Ripoll et al. [23] presented a more accurate interval. Moreover, that work proposes a
new schedulability test based on the idea of Initial Critical
Interval (ICI), that is, the first interval [0, Rτ ) in which there
is no idle time. Other authors refer to this interval as the
first busy period. These ideas are summarised in the next
theorem:
Theorem 1 [23] τ is feasible if and only if:

$$H_\tau(t) \le t \qquad \forall t < B_\tau$$

where $B_\tau = \min(L_\tau, R_\tau)$ and

$$L_\tau = \frac{\sum_{i=1}^{n} C_i \left(1 - \frac{D_i}{P_i}\right)}{1 - U_\tau}$$
In fact, there is no need to check that Hτ(t) ≤ t at every time instant in the interval [0, Bτ] to determine whether τ is feasible: it suffices to check the designated scheduling points.
Definition 3 The set Sτ of scheduling points for τ is defined by:

$$S_\tau = \{\, d_{ik} \;/\; 1 \le i \le n,\; 1 \le k \le K \wedge d_{ik} \le P \,\}$$

where $K = \lceil P/P_i \rceil$ denotes the number of activations of Ti in [0, P).
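As an illustration, the bound Bτ of Theorem 1 can be computed for the task set of Table 1. The sketch below obtains Rτ with the standard busy-period fixed-point iteration (the iteration itself is an assumption; the paper only introduces the Initial Critical Interval):

```python
from math import ceil

def l_bound(tasks):
    # L_tau = sum_i C_i (1 - D_i/P_i) / (1 - U_tau), from Theorem 1.
    u = sum(c / p for (c, _, p) in tasks)
    return sum(c * (1 - d / p) for (c, d, p) in tasks) / (1 - u)

def busy_period(tasks):
    # R_tau: length of the initial busy interval, i.e. the least fixed
    # point of w = sum_i ceil(w / P_i) * C_i, starting from w = sum_i C_i.
    w = sum(c for (c, _, _) in tasks)
    while True:
        nxt = sum(ceil(w / p) * c for (c, _, p) in tasks)
        if nxt == w:
            return w
        w = nxt

tau = [(2, 6, 10), (5, 8, 12)]
b_tau = min(l_bound(tau), busy_period(tau))  # min of ~6.43 and 7
```

For this set the only scheduling point below Bτ is t = 6, where Hτ(6) = 2 ≤ 6, so the set is deemed feasible by Theorem 1.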
XI Jornadas de Tiempo Real (JTR2008)
3 Problem statement
As formally defined in [8], the feasibility region in the X-space for a task set τ is the set of values of X such that τ is feasible. In our paper, the feasibility region is redefined as follows:

Definition 4 The feasibility region in the X1, ..., Xm-space for a task Ti is the set of values of X1, ..., Xm of Ti such that τ is feasible.
According to the latter definition, the D-P feasibility region for Ti ∈ τ is the set of values (Di , Pi ) such that the task
set τ is feasible. For simplicity, we will refer to the points
(Di , Pi ) within the D-P feasibility region as feasible points.
It is important to note that all timing parameters of tasks
different from Ti remain unchanged.
To build this region, we need to know the shortest values
for deadlines and periods, that is, we need to compute the
minimum period and the minimum deadline of a periodic
task. These calculations provide us two points in the D-P
space, which are:
• Point A=(DA , PA ): DA results from first computing the
minimum deadline of Ti and then PA is derived from
calculating the minimum period for Ti with a deadline
equal to DA .
• Point B=(DB , PB ): PB results from first computing the
minimum period of Ti and then DB is derived from
calculating the minimum deadline for Ti with a period
equal to PB .
The resulting region is depicted in Figure 2. Note that
it is not possible that DB < DA , since DA is the minimum
deadline of Ti . For the same reason, it is not possible that
PA > PB . It is possible that A=B. All the points in the lined
area generate a feasible EDF schedule. We can state that the
exact feasibility region includes the region depicted in Figure 2 delimited by the continuous lines. However, the exact boundary from A to B (dotted lines) is unknown. One of the contributions of this paper is to demonstrate that all the points belonging to the line AB are feasible. Up to now, we only
not necessary feasibility region. Before studying the D-P
feasibility region in greater depth, we need to know how to
compute the minimum deadline and the minimum period of
a periodic task, in order to obtain points A and B. This is
the aim of the next two sections.
Figure 2. D-P Feasibility region of Ti (first approach)

4 Minimum deadline of a periodic task

Finding the minimum deadline of a periodic task has been independently addressed by Balbastre et al. in [3] and by Hoang et al. in [14]. Although both algorithms are
optimal, in the sense that they obtain the minimum deadline and run in pseudo-polynomial time complexity, it is
shown in [3] that Balbastre’s algorithm (called Deadlinemin) is much faster than Hoang’s algorithm.
The Deadlinemin algorithm, which minimises the deadline of a task Ti, is presented in Listing 1. The algorithm starts at t = di0 and computes the slack time at the immediately preceding scheduling point of t. If the slack is greater than Ci, the algorithm continues backwards to the next scheduling point; otherwise, a candidate for the minimum deadline is
stored. The procedure is repeated for all jobs of Ti whose
deadline is in the interval [0, Rτ ).
5 Minimum period of a periodic task
Up to now, there is no efficient method to calculate the minimum period of a periodic task in EDF when Di ≤ Pi. In the case of deadlines equal to periods, the minimum period of a periodic task in EDF is easily deduced by assigning the remaining processor utilisation to the task we want to minimise. However, the minimisation when deadlines are less than periods is more complex. In this section, a method to minimise a task period is formally presented.
Let’s assume that we add a task Ti to the feasible task set
τ. The only known parameter of Ti is its computation time
Ci . We want to calculate the minimum period of task Ti .
The following theorem provides a way to calculate it.
Theorem 2 Let τ be a feasible set of n periodic tasks. Let τ′ = τ ∪ {Ti} with Ti = (Ci, Pi, Pi). Then, τ′ is schedulable in (0, I] if and only if:

$$P_i \ge \max_{0 \le t \le I} \left( P_i^t \right)$$

where $P_i^t = \left\lceil \frac{H_\tau(t)}{A_i(t)} \right\rceil + C_i$ and $A_i(t) = \frac{t - H_\tau(t)}{C_i}$.
Listing 1. Deadlinemin algorithm

function Deadlinemin(Ti) is
  deadline := Ci;
  k := ⌈Rτ/Pi⌉;
  Dmin_i := 0;
  for s in 0..(k − 1) loop
    t := min{sPi + Di, Rτ};
    deadline := Ci;
    while t > sPi + Ci loop
      if ∃j, 1 ≤ j ≤ n / t = (⌈t/Pj⌉ − 1)Pj + Dj then
        if t − Hτ(t) < Ci then
          deadline := Hτ(t) + Ci − sPi;
          break while;
        end if;
      end if;
      t := t − 1;
    end while;
    Dmin_i := max(Dmin_i, deadline);
  end for;
  return Dmin_i;
end Deadlinemin;
Proof Let t ∈ (0, I]. We have to demonstrate that the processor demand function of the task set τ′ is still feasible, that is, Hτ′(t) ≤ t. Introducing a new task increases the processor demand function. As τ′ = τ ∪ {Ti}, the following expression holds:

$$H_{\tau'}(t) = H_\tau(t) + C_i \left\lfloor \frac{t}{P_i^t} \right\rfloor$$

Substituting $P_i^t$ by its value and taking into account that $A_i(t) C_i = t - H_\tau(t)$, we have:

$$H_{\tau'}(t) = H_\tau(t) + C_i \left\lfloor \frac{t}{\left\lceil H_\tau(t)/A_i(t) \right\rceil + C_i} \right\rfloor
\le H_\tau(t) + C_i \, \frac{t}{H_\tau(t)/A_i(t) + C_i}$$
$$= H_\tau(t) + C_i \, \frac{t A_i(t)}{H_\tau(t) + A_i(t) C_i}
= H_\tau(t) + C_i \, \frac{t A_i(t)}{H_\tau(t) + t - H_\tau(t)}
= H_\tau(t) + C_i A_i(t)
= H_\tau(t) + t - H_\tau(t) = t$$

Then, if $P_i \ge \max_t (P_i^t)$, no deadline is missed, since $H_{\tau'}(t) \le t \;\; \forall t \in (0, I]$.
Theorem 2 provides a way to calculate off-line the minimum period of a periodic task in EDF. Moreover, note that we always use Hτ(t) instead of Hτ′(t), so there is no need to re-calculate the Hτ(t) function at every scheduling point. The theorem makes it possible to calculate the minimum period of a new task entering the system, and it also allows reducing the period of an existing task.
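A simplified sketch of this calculation: it evaluates $P_i^t$ at every integer instant of a fixed checking interval (here twice the hyper-period of τ, an assumption) instead of the dynamically shrinking bound $B_{\tau'}$ used by the Periodmin algorithm of Listing 2 below, trading efficiency for simplicity while returning Theorem 2's bound:

```python
from math import ceil, lcm

def demand(tasks, t):
    # H_tau(t) for tasks (C, D, P): sum of floor((t + P - D)/P) * C.
    return sum(((t + p - d) // p) * c for (c, d, p) in tasks)

def min_period_bound(tasks, ci, horizon=None):
    # Theorem 2: P_i >= max_t ceil(H_tau(t)/A_i(t)) + C_i,
    # with A_i(t) = (t - H_tau(t)) / C_i.
    if horizon is None:
        horizon = 2 * lcm(*(p for (_, _, p) in tasks))
    best = ci  # a period can never be smaller than the WCET
    for t in range(1, horizon + 1):
        h = demand(tasks, t)
        if h >= t:
            continue  # no slack at t; cannot happen if tau has slack everywhere
        a = (t - h) / ci
        best = max(best, ceil(h / a) + ci)
    return best

# Adding a task with C_i = 2 to tau = {(5, 8, 12)}: the binding instant is
# t = 8, where H(8) = 5 and A_i(8) = 1.5, giving a minimum period of 6.
assert min_period_bound([(5, 8, 12)], 2) == 6
```

As a sanity check, with implicit deadlines the result matches the simple closed form Ci/(1 − Uτ): for τ = {(6, 12, 12)} and Ci = 3, both give 6.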
The algorithm to calculate the minimum period of a periodic task (Periodmin algorithm) is presented in Listing 2.
Some comments about the pseudo-code are needed. To ensure the system's feasibility we must calculate Pi^t for every scheduling point up to the feasibility bound Bτ′ (I ≤ Bτ′). However, note that this bound depends on task periods, so Bτ′ needs to be re-calculated at every t ∈ Sτ′. Let us call B^t_τ′ the feasibility bound of τ′ when Pi = Pi^t.
We need to define the initial value Pi^0 to compute B^0_τ′ at the first scheduling point. As we know that the period of Ti will never be greater than the corresponding minimum period of a system with D = P, we set $P_i^0 = \frac{C_i}{1 - U_\tau}$. This first value of Pi^0 gives the initial checking interval (0, B^0_τ′]. In the next iteration, Pi^1 can take the following values:

• Pi^1 = Pi^0: the checking interval remains the same.

• Pi^1 > Pi^0: the new checking interval is (0, B^1_τ′]. Note that, in this case, B^1_τ′ < B^0_τ′.
Therefore, when executing the Periodmin algorithm the
checking interval (end_time=min(Bτt ′ )) is reduced more
and more at each iteration, making the algorithm highly efficient (see Section 7 for runtime simulations).
The algorithm presented in Listing 2 has pseudo-polynomial time complexity in the size of the problem instance. However, as stated in [4], this bound is quite reasonable in practice whenever the system has a relatively small utilisation. Moreover, minor changes to some periods may drastically reduce the complexity to polynomial time [13].
6 D-P feasibility region
Section 3 introduced a first approach to the D-P feasibility region. Once we know how to calculate the minimum deadline and period of a periodic task, we can calculate points A and B. Listing 3 shows the pseudocode that computes the points A and B of the D-P feasibility region of Ti, which takes pseudo-polynomial time. The algorithm starts the calculation of point A with the original deadline and period of Ti (Di, Pi), storing them in D0 and P0 so that they can be restored for the calculation of point B.
Listing 2. Periodmin algorithm

function Periodmin(Ti) is
  Pi^0 := Ci / (1 − Uτ);
  Pi := Pi^0;
  τ′ := τ ∪ {Ti};
  end_time := Bτ′;
  t := 0;
  while t ≤ end_time loop    −− t ranges over the scheduling points in Sτ′
    Pi^t := ⌈Hτ(t)/Ai(t)⌉ + Ci;
    if Pi^t > Pi then
      Pi := Pi^t;
      end_time := Bτ′;
    end if;
  end while;
  return Pi;
end Periodmin;
Listing 3. A and B calculation

procedure Calculate_AB(in Ti; out A, B) is
  P0 := Pi;
  D0 := Di;
  DA := Deadlinemin(Ti);
  PA := Periodmin(Ti);
  Pi := P0;
  Di := D0;
  PB := Periodmin(Ti);
  DB := Deadlinemin(Ti);
  A := (DA, PA);
  B := (DB, PB);
end Calculate_AB;
The exact D-P feasibility region can be larger than the one depicted in Figure 2: there may exist feasible points inside the rectangle defined by vertices A and B. Moreover, we state that the exact feasibility region is larger than or equal to the area represented in Figure 3.
In the remainder of the section we will demonstrate that any point inside the lined area depicted in Figure 3 generates a feasible EDF schedule. First, an interesting property will be introduced:
Property 1 Let T1 = (C1, D1, P1) and T2 = (C2, D2, P2), where D1 ≤ D2 and P1 ≥ P2. There is a positive value $\bar{k} = \frac{D_2 - D_1}{P_1 - P_2}$, $0 \le \bar{k} \le \infty$, such that:

$$k \in (0, \bar{k}] \Rightarrow d_{1k} \le d_{2k} \qquad\qquad k \in [\bar{k}, \infty) \Rightarrow d_{1k} \ge d_{2k}$$
Figure 3. D-P Feasibility region of Ti (second approach)
Proof We are interested in the k-th activation of T1 and T2 from which the scheduling point d1k is greater than or equal to the scheduling point d2k. Substituting the expression of a scheduling point at the k-th activation, $d_{1k} = kP_1 + D_1$ and $d_{2k} = kP_2 + D_2$:

$$d_{1k} \ge d_{2k} \iff kP_1 + D_1 \ge kP_2 + D_2 \iff k \ge \frac{D_2 - D_1}{P_1 - P_2}$$
An example graphically illustrating this property is depicted in Figure 4. In this example, from the second activation of T1 and T2 (k = 1) onwards, the i-th scheduling points of T1 are greater than the i-th scheduling points of T2.
Figure 4. Example of Property 1
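Property 1 is easy to check numerically; a sketch with hypothetical parameters D1 = 2, P1 = 10 and D2 = 6, P2 = 7, for which the crossover is k̄ = (6 − 2)/(10 − 7) = 4/3:

```python
D1, P1 = 2, 10   # T1: smaller relative deadline, larger period
D2, P2 = 6, 7    # T2: larger relative deadline, smaller period
k_bar = (D2 - D1) / (P1 - P2)   # crossover activation index, here 4/3

for k in range(50):
    d1k, d2k = k * P1 + D1, k * P2 + D2   # absolute deadlines of the k-th jobs
    if k <= k_bar:
        assert d1k <= d2k   # up to k_bar, T1's deadline comes first
    else:
        assert d1k >= d2k   # beyond k_bar, T2's deadline comes first
```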
The following theorem demonstrates that the points on the line AB of Figure 3 are feasible points.

Theorem 3 Let X = (DX, PX), Y = (DY, PY) and Z = (DZ, PZ) be three aligned points in the D-P space, such that DX ≤ DZ ≤ DY and PX ≥ PZ ≥ PY. If X and Y are feasible points then Z is feasible too.
Proof If X and Y are feasible, then¹:

$$H_X(t) \le t \qquad \text{and} \qquad H_Y(t) \le t$$

¹To simplify the notation, $H_W(t)$ denotes the cumulative execution time function of τ ($H_\tau(t)$) where Ti = (Ci, DW, PW) ∈ τ, W being any point in the D-P space.

We want to demonstrate that $H_Z(t) \le t$. To do so, we will demonstrate that at any activation k the k-th scheduling point of Z is greater than or equal to the k-th scheduling point of either X or Y. By Property 1 the following conditions hold:

$$k \in \left(0, \tfrac{D_Z - D_X}{P_X - P_Z}\right] \Rightarrow d_{Xk} \le d_{Zk} \qquad\qquad k \in \left[\tfrac{D_Z - D_Y}{P_Y - P_Z}, \infty\right) \Rightarrow d_{Yk} \le d_{Zk}$$

The uncertainty occurs in the interval $\left[\tfrac{D_Z - D_X}{P_X - P_Z}, \tfrac{D_Z - D_Y}{P_Y - P_Z}\right]$. However, since the three points X, Z and Y are aligned in the D-P space, the lines X–Z and Z–Y have the same slope. Therefore:

$$\frac{D_X - D_Z}{P_Z - P_X} = \frac{D_Z - D_Y}{P_Y - P_Z}$$

and we know that the scheduling points of Z are always greater than or equal to the scheduling points of X or Y of the same activation.
Note that Hτ(t) is a non-decreasing step function that increases by Ci units of time at every scheduling point of Ti ∈ τ. As points X, Y and Z have the same computation time, the steps of H are of the same amount. If all the scheduling points of Z are greater than or equal to the scheduling points of X or Y in the same activation, then it is easy to see that:

$$H_Z(t) \le H_X(t) \le t \qquad \text{or} \qquad H_Z(t) \le H_Y(t) \le t$$

Theorem 3 states that if two points X and Y are feasible, then any point Z on the line XY will also be feasible. Hence, any point E on the line AB of Figure 3 is feasible, since both A and B are feasible points.

Listing 4 shows the algorithm to generate the approximated feasibility region, that is, the region depicted in Figure 3. We will refer to it as the Approx-DP-region algorithm. Once the points A and B have been calculated, all the points E on the region's perimeter between A and B are obtained using the characteristic equation of the AB line. Simulations have shown that there may exist feasible points below the AB line, meaning that the approximated region is sufficient but not necessary. It remains an open issue to formally characterise the exact D-P feasibility region. Nevertheless, Listing 5 shows a method to obtain the exact region (called Exact-DP-region) by calculating the intermediate points L between A and B, that is, the minimum deadline for each given period between PA and PB. Another way to calculate the exact D-P region is to calculate the minimum period for each given deadline between DA and DB.

It is important to note that the calculation of points between A and B in the Approx-DP-region algorithm has constant-time cost (O(1)), while the calculation of the exact perimeter with the Exact-DP-region algorithm requires executing the Deadlinemin algorithm, which takes pseudo-polynomial time and implies the re-calculation of the hyper-period (P) and of the Hτ function. Therefore, the Approx-DP-region algorithm is expected to be much more efficient than the Exact-DP-region algorithm. Indeed, in Section 7 we will show that Approx-DP-region is a tight approximation to the exact D-P feasibility region and that its runtime is considerably lower.

Listing 4. Approximated D-P region algorithm

procedure Approx-DP-region(Ti) is
  Calculate_AB(Ti, A, B);
  slope := (DB − DA) / (PB − PA);
  b := (PB·DA − PA·DB) / (PB − PA);   −− intercept of the AB line: D = slope·P + b
  for PE in PA..PB loop
    DE := slope·PE + b;
  end loop;
end Approx-DP-region;

Listing 5. Exact D-P region algorithm

procedure Exact-DP-region(Ti) is
  Calculate_AB(Ti, A, B);
  for PL in PA..PB loop
    Pi := PL;
    DL := Deadlinemin(Ti);
  end loop;
end Exact-DP-region;
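Theorem 3 can be illustrated on the motivating example of Section 1.1: the two borderline assignments X = (D = 2, P = 7) and Y = (D = 4, P = 5) for T1 are collinear with the intermediate point Z = (3, 6), and a direct processor-demand check (a numerical sketch, not the output of Exact-DP-region) confirms that all three are feasible:

```python
from math import lcm

def edf_feasible(tasks):
    # Processor-demand test at every absolute deadline up to the hyper-period.
    horizon = lcm(*(p for (_, _, p) in tasks))
    deadlines = {k * p + d for (_, d, p) in tasks for k in range(horizon // p + 1)}
    return all(sum(((t + p - d) // p) * c for (c, d, p) in tasks) <= t
               for t in deadlines if t <= horizon)

t2 = (5, 8, 12)
for d, p in [(2, 7), (4, 5), (3, 6)]:   # X, Y, and the aligned midpoint Z
    assert edf_feasible([(2, d, p), t2])
```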
7 Experimental evaluation
This section describes a set of experiments to evaluate
the performance of the Periodmin and the D-P region algorithms. A number of tests have been run: specifically, 10^4 synthetic task sets have been generated for each utilisation from Uτ = 0.5 to Uτ = 0.95 in steps of 0.05, resulting in 10^5 total simulations. Each task was generated by randomly choosing the task period as an integer between 10 and 250 and the task utilisation between 0 and 1, and then randomly selecting the task computation time in such a way that the total system utilisation is approximately equal to the desired load. To generate tasks with deadlines less than periods, deadlines were randomly reduced to between 40% and 90% of the task period. A generated task is added to the task set only if the resulting set is schedulable.
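The generation procedure can be sketched as follows; the paper does not give the generator, so the uniform draws, the rounding and the acceptance test used here are assumptions (the feasibility test follows Theorem 1):

```python
import random
from math import ceil

def edf_feasible(tasks):
    # Theorem 1: check H(t) <= t at every deadline below B = min(L, R).
    u = sum(c / p for (c, _, p) in tasks)
    if u >= 1:
        return False  # conservative: the L bound requires U < 1
    l = sum(c * (1 - d / p) for (c, d, p) in tasks) / (1 - u)
    w = sum(c for (c, _, _) in tasks)            # busy-period iteration for R
    while (nxt := sum(ceil(w / p) * c for (c, _, p) in tasks)) != w:
        w = nxt
    bound = min(l, w)
    deadlines = (k * p + d for (_, d, p) in tasks
                 for k in range(int(bound // p) + 1))
    return all(sum(((t + p - d) // p) * c for (c, d, p) in tasks) <= t
               for t in deadlines if t < bound)

def generate_task_set(n, target_u, rng):
    tasks = []
    while len(tasks) < n:
        p = rng.randint(10, 250)
        # share of the remaining utilisation assigned to this task
        share = (target_u - sum(c / pp for (c, _, pp) in tasks)) / (n - len(tasks))
        c = max(1, round(share * p))
        d = max(c, round(p * rng.uniform(0.4, 0.9)))  # deadline at 40%..90% of P
        if edf_feasible(tasks + [(c, d, p)]):          # accept only if schedulable
            tasks.append((c, d, p))
    return tasks

tau = generate_task_set(3, 0.7, random.Random(1))
assert edf_feasible(tau) and all(c <= d <= p for (c, d, p) in tau)
```

By construction, every accepted task keeps the set schedulable and has a deadline no larger than its period.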
We have evaluated the performance and the runtime of
Periodmin, Approx-DP-region and Exact-DP-region. To
Figure 5. Period improvement of Periodmin algorithm
Figure 6. Periodmin algorithm runtime
evaluate the runtime of the proposed algorithms, the number of processor cycles needed by each algorithm has been measured on an AMD Athlon 64 2 GHz (dual-core). In order to avoid external interference, interrupts were disabled during the execution of the algorithms.

7.1 Evaluation of the Periodmin algorithm
To evaluate the performance of the algorithm presented in Listing 2 in reducing the period of a task Ti, we have measured the average period improvement (βi) defined in Section 2. For each task set, we have randomly chosen one of the tasks Ti and applied the Periodmin algorithm to it. The results are depicted in Figure 5.

As expected, the period improvement decreases as the total utilisation increases. It is also worth observing that a greater period reduction is achieved as the number of tasks increases. The main reason for this behaviour is the experiment design: as the workload generator tries to obtain a set of tasks for a specific utilisation, in general the period of a task is greater in a set of 10 tasks than in a set of 3 tasks. As the reduction depends on the remaining utilisation, the minimum period is roughly the same for a task in a set of 3 or 10 tasks; thus the period improvement is greater as the number of tasks increases.
We have also tested the complexity of the algorithm as a function of the total processor utilisation for different numbers of tasks. The runtime in milliseconds is depicted in Figure 6. Obviously, the complexity increases with the number of tasks and the utilisation.
In order to compare the Periodmin algorithm with other
methods, we have implemented a binary search to obtain
the minimum period of a task. The results are shown in
Figure 7. The binary search is implemented by iteratively
reducing the search interval bounded by upper and lower
values for the period. In each iteration, the feasibility test for the candidate period is executed to decide whether to increase or decrease the period in the next iteration. Note that this method is much faster than successively reducing the task period from an upper bound, as in the MAST tool [21]; even so, the Periodmin algorithm is faster.

Figure 7. Periodmin vs Binary search runtime comparison
7.2 Evaluation of D-P region algorithms
We now compare the Approx-DP-region and Exact-DP-region algorithms. We expect Approx-DP-region to be faster than Exact-DP-region, but this is useless if the approximated region differs greatly from the exact one. Therefore, the first experiment consists of evaluating how close the approximated area is to the exact one. The metric used is DL/DE, that is, the ratio between the points on the perimeter of the exact area (L) and the points belonging to the AB line (E). The results are presented in Figure 8 for 10 tasks, where we can observe that the area
calculated by the Approx-DP-region algorithm is very close to the real area calculated by the Exact-DP-region algorithm. Indeed, the error is below 0.8% in the worst case. For low utilisations, A and B are probably the same point in the D-P space, meaning that the approximated region coincides with the exact region and DL/DE = 1. For higher utilisations, we can observe that DL/DE is very close to 1, which means that the two regions differ in only a few points. Similar results are obtained for different numbers of tasks.
Figures 9 and 10 show the processor cycles needed by both algorithms depending on the utilisation of the task set, for 5 and 10 tasks. As both algorithms execute the Calculate_AB algorithm, only the execution inside the for loop has been measured. Note that a logarithmic scale has been used to appreciate the great differences between the two algorithms. As expected, the Approx-DP-region algorithm runs several orders of magnitude faster than Exact-DP-region. In conclusion, the Approx-DP-region algorithm is a good alternative to efficiently obtain the D-P feasibility region.
Figure 8. Quality of the solution provided by Approx-DP-region algorithm
Figure 9. Approx vs Exact D-P region. Runtime comparison (5 tasks)

8 Conclusions and future work
This paper addresses the problem of finding the feasibility region for the deadlines and periods of a single task. The aim is to provide system designers with a tool to choose the best D and P values to adapt to a dynamic environment and enhance the performance of the selected task. The feasibility region is redefined, and exact and approximated solutions are provided, so designers can analyse the influence of deadline variations of a task on the period of the same task and vice versa. The approximated region is very close to the exact region (differing by less than 0.2% in most cases) and is computationally less expensive by several orders of magnitude. To calculate this region, the minimum period of a task must be known; we have also presented a pseudo-polynomial method to calculate the minimum period of a periodic task.

Our analysis is done for one task at a time. However, it is easy to see that the analysis can be performed for more than one task in a sequential way. The order in which tasks are selected determines the shape of the regions, but this would be the first step towards constructing the D-P feasibility region of the entire task set. This is part of our ongoing research. Our future work also focuses on extending the feasibility region to the domain of task computation times. The idea is to construct a three-dimensional feasibility region (the D, P and C space) where designers can choose the best D-P-C combination without jeopardising feasibility.
References
[1] P. Albertos, M. Salgado, and M. Olivares. Are delays in digital control implementation always bad? In Asian Control Conference, Shanghai, 2000.
Figure 10. Approx vs Exact D-P region. Runtime comparison (10 tasks)
[2] P. Balbastre, I. Ripoll, and A. Crespo. Analysis of window-constrained execution time systems. Journal of Real-Time Systems, 35(2):109–179, 2007.
[3] P. Balbastre, I. Ripoll, and A. Crespo. Minimum deadline calculation for periodic real-time tasks in dynamic priority systems. IEEE Transactions on Computers, 57:96–109, 2008.
[4] S. Baruah, G. Buttazzo, S. Gorinsky, and G. Lipari. Scheduling periodic task systems to minimize output jitter. In Sixth
Conference on Real-Time Computing Systems and Applications, pages 62–69, 1999.
[5] S. Baruah, A. Mok, and L. Rosier. Preemptively scheduling hard real-time sporadic tasks on one processor. In IEEE
Real-Time Systems Symposium, pages 182–190, 1990.
[6] E. Bini and G. Buttazzo. The space of EDF feasible deadlines. In 19th Euromicro Conference on Real-Time Systems, 2007.
[7] E. Bini and M. D. Natale. Optimal task rate selection in fixed
priority systems. In IEEE Real-Time Systems Symposium,
pages 399–409, 2005.
[8] E. Bini, M. D. Natale, and G. Buttazzo. Sensitivity analysis for fixed priority real-time systems. In 18th Euromicro
Conference on Real-Time Systems, pages 13–22, 2006.
[9] G. Buttazzo, G. Lipari, and L. Abeni. Elastic task model for
adaptive rate control. In IEEE Real-Time Systems Symposium, pages 286–295, December 1998.
[10] A. Cervin and J. Eker. Feedback scheduling of control tasks.
In Proceedings of the 39th IEEE Conference on Decision
and Control, 2000.
[11] A. Cervin, B. Lincoln, J. Eker, K. Arzen, and G. Buttazzo. The jitter margin and its application in the design of real-time control systems. In Proceedings of RTCSA, 2004.
[12] M. Dertouzos. Control robotics: the procedural control
of physical processors. In IFIP Congress, pages 807–813,
1974.
[13] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.
[14] H. Hoang, G. Buttazzo, M. Jonsson, and S. Karlsson. Computing the minimum EDF feasible deadline in periodic systems. In Proceedings of RTCSA, August 2006.
[15] J. Lehoczky, L. Sha, and Y. Ding. The rate monotonic
scheduling algorithm: Exact characterization and average
case behaviour. In IEEE Real-Time Systems Symposium,
pages 166–171, 1989.
[16] J. Leung and R. Merrill. A note on the preemptive scheduling of periodic, real-time tasks. Information Processing Letters, 11(3):115–118, 1980.
[17] C. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20(1):46–61, 1973.
[18] Y. Liu, S. Chakraborty, and R. Mardulescu. Generalized rate
analysis for media-processing platforms. In Proceedings of
the RTCSA, 2006.
[19] P. Martı́, J. M. Fuertes, and G. Fohler. Jitter compensation
for real-time control systems. In IEEE Real-Time Systems
Symposium, 2001.
[20] A. Mathur, A. Dasdan, and K. Gupta. Rate analysis for embedded systems. ACM Trans. on Design Automation of Electronic Systems, 3(3):408–436, 1998.
XI Jornadas de Tiempo Real (JTR2008)
[21] J. L. Medina, M. González-Harbour, and J. M. Drake. MAST real-time view: A graphic UML tool for modeling object-oriented real-time systems. In IEEE Real-Time Systems Symposium, 2001.
[22] S. Punnekkat, R. Davis, and A. Burns. Sensitivity analysis
of real-time task sets. In Proceedings of the Conference of
Advances in Computing Science, pages 72–82, 1997.
[23] I. Ripoll, A. Crespo, and A. Mok. Improvement in feasibility
testing for real-time tasks. Journal of Real-Time Systems,
11:19–40, 1996.
[24] D. Seto, J. Lehoczky, and L. Sha. Task period selection and
schedulability in real-time systems. In IEEE Real-Time Systems Symposium, pages 188–198, 1998.
[25] K. Shin and X. Cui. Computing time delay and its effects on
real-time control systems. IEEE Trans. on Control Systems
Technology, 3(2):218–224, 1996.
[26] S. Vestal. Fixed-priority sensitivity analysis for linear compute time models. IEEE Transactions on Software Engineering, 20(4):308–317, April 1994.
[27] B. Wittenmark, J. Nilsson, and M. Törngren. Timing problems in real-time control systems. In Proceedings of the
American Control Conference, Jan. 1995.
[28] M. Xiong and K. Ramamritham. Deriving deadlines and
periods for real-time update transactions. In IEEE Real-Time
Systems Symposium, pages 32–43, 1999.
2. Análisis Temporal
Providing Memory QoS Guarantees for Real-Time Applications
A. Marchand, P. Balbastre, I. Ripoll and A. Crespo
Universidad Politécnica de Valencia, Spain
[email protected]
{patricia, iripoll, acrespo}@disca.upv.es
Abstract
Nowadays, systems often integrate a variety of applications whose service requirements are heterogeneous. Consequently, systems must be able to concurrently serve applications that rely on different constraints. This raises the problem of the dynamic distribution of the system resources (CPU, memory, network, etc.), so an integrated Quality of Service (QoS) management is needed to efficiently assign resources according to the various application demands. In this paper, we focus on a dynamic approach to QoS management for memory resource allocation based on the Skip-Over model. We detail our solution and show how it improves the service of task memory requests while providing guarantees. Quantitative experiments using the TLSF allocator have been carried out in order to evaluate the memory failure probability with and without the memory QoS manager.
Keywords: Real-time, Quality of Service, Memory
Management.
1 Introduction
Nowadays, new real-time applications require more flexibility, and the ability to adjust system resources to load conditions is of major importance. The system resources (CPU, memory, energy, network, disk, etc.) that an application can use can be adapted to the global needs. Up to now, most efforts have focused on CPU, energy and network management, while memory has not been considered as a dynamic resource by the real-time community.

Recently, a new algorithm for dynamic memory allocation (TLSF) [16] has solved the problem of bounding the worst case while maintaining the efficiency of the allocation and deallocation operations, making the use of dynamic memory management reasonable in real-time applications. The algorithm, with a constant cost Θ(1), opens new possibilities with respect to the use of dynamic memory in real-time applications. There is an increasing number of emerging applications using large amounts of memory, such as multimedia systems, video streaming, video surveillance, virtual reality, scientific data gathering, or data acquisition in control systems.
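TLSF achieves this Θ(1) cost with a two-level segregated-fit structure: the size of each request is mapped, using only bit operations, to a pair of indices selecting a free list. The following is a minimal sketch of that mapping, not the authors' implementation (the number of second-level bins and the function name are our choices):

```python
SL_BITS = 4  # log2 of the number of second-level bins (a typical choice)

def tlsf_mapping(size: int) -> tuple[int, int]:
    """Map a block size to (first-level, second-level) free-list indices.

    The first-level index is floor(log2(size)); the second level splits
    each power-of-two range linearly into 2**SL_BITS bins. Both are O(1)
    bit computations, which is what yields the constant allocation cost.
    Assumes size >= 2**SL_BITS (real allocators treat small sizes apart).
    """
    fl = size.bit_length() - 1                      # floor(log2(size))
    sl = (size >> (fl - SL_BITS)) - (1 << SL_BITS)  # linear sub-index
    return fl, sl
```

For instance, all sizes in [1024, 2048) share first-level index 10 and are spread over 16 second-level bins, so a free block of a suitable size class is located without searching.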
Viewing memory as a dynamic resource to be shared between real-time tasks implies managing this resource and giving some kind of guarantee. CPU scheduling techniques can be adapted and used to deal with memory management. In [14], a memory management system is defined that adjusts memory resources to meet changing demands and user needs. The architectural framework that realises this approach allows adaptive allocation of memory resources to applications involving both periodic and aperiodic tasks. However, in this model memory overruns are not considered. Since fragmentation can lead to significant system degradation, the possibility that a memory request fails must be taken into account in the real-time model, and it must be avoided as much as possible.

When considering CPU use, several scheduling policies have been proposed to perform CPU adaptation from different points of view. In particular, resource-based algorithms have been developed to characterise the timing requirements and processor capacity reservation requirements of real-time applications ([17, 2, 1, 10, 8]). Some works based on a job-skipping scheme [9, 12] and providing flexible task models have also been introduced. More recently, enhanced scheduling results based on the EDL (Earliest Deadline as Late as Possible) algorithm [6] have been proposed in [15] to optimise resource allocation in a job-skipping context. Additional approaches such as the weakly-hard model [3] provide more general frameworks for QoS-based systems.
1.1 Summary and contributions
Memory allocation is the problem of maintaining an application’s heap space by keeping track of allocated and
freed blocks. The decision to be made by the memory allocator is where to place the requested block in the heap. The
allocator has no information about when the blocks will be
freed after they are allocated. The order of these requests is entirely up to the application. In some cases, the allocator may be unable to satisfy a request because there does not exist a free block of memory of the specified size. It is then up to the application program to deal with such a failure in an appropriate manner.

This paper proposes a framework to minimise the number of failed memory requests. The proposed methodology is based on skippable tasks; specifically, we adapt the Skip-Over model used in CPU scheduling to manage memory overruns.

2 Skip-Over based CPU overload management

Different techniques have been proposed to deal with CPU overload management. Efforts are oriented towards scheduling techniques which are able to include quality features into the scheduling of tasks while letting these tasks meet certain real-time constraints. To represent such quality of service constraints, Hamdaoui and Ramanathan in [9] proposed a model called (m,k)-firm deadlines. It guarantees that a statistical number of deadlines will be met, by using a distance-based priority scheme to increase the priority of an activity in danger of missing more than m deadlines over a window of k requests. If m = k, the system becomes a hard-deadline system. The inherent problem of this model is that, in some cases, the constraints can be satisfied even when many consecutive instances miss their deadline (e.g. the (100,1000)-firm deadline is respected even if, over 1000 activations of the task, 100 consecutive instances miss their deadline).

This problem is solved for the special case m = k − 1, for which the (m,k) model reduces to the Skip-Over model [12]. The skip-over scheduling algorithms skip some task invocations according to a skip factor. The overload is then reduced, thus exploiting skips to increase the feasible periodic load. West and Poellabauer in [18] proposed a windowed lost rate, which specifies that a task can tolerate x deadlines missed over a finite range or window of y consecutive instances. In [7], Bernat et al. introduce a general framework for specifying tolerance of missed deadlines under the definition of weakly-hard constraints.

In what follows, we focus on the Skip-Over approach. Known results about the feasibility of periodic task sets under this model are also recalled.

2.1 Model description

The Skip-Over model [12] deals with the problem of scheduling periodic tasks which allow occasional deadline violations (i.e. skippable periodic tasks) on a uniprocessor system. A task τi is characterized by a worst-case computation time Ci, a period Ti, a relative deadline equal to its period, and a skip parameter si. This parameter represents the tolerance of the task to missed deadlines: the distance between two consecutive skips must be at least si periods. When si equals infinity, no skips are allowed and τi is a hard periodic task. Every task τi is divided into instances, where each instance occurs during a single period of the task. Every instance of a task is either red or blue [12]. A red task instance must complete before its deadline, whereas a blue task instance can be aborted at any time. However, if a blue instance completes successfully, the next task instance is still blue.

Two scheduling algorithms were introduced about ten years ago by Koren and Shasha in [12]. The first one is the Red Tasks Only (RTO) algorithm: red instances are scheduled as soon as possible according to the Earliest Deadline First (EDF) algorithm [13], while blue ones are always rejected. The second algorithm is the Blue When Possible (BWP) algorithm, an improvement of the first one. Indeed, BWP schedules blue instances whenever their execution does not prevent the red ones from completing within their deadlines. In other words, blue instances are served in the background relative to red instances.

2.2 Feasibility of skippable periodic task sets

Liu and Layland in [13] have shown that a task set {τi; 1 ≤ i ≤ n} is schedulable if and only if its cumulative processor utilization (ignoring skips) is no greater than 1, i.e.,

∑_{i=1}^{n} Ci/Ti ≤ 1.  (1)

Koren and Shasha proved that determining whether a set of periodic occasionally skippable tasks is schedulable is NP-hard [12]. However, they have shown the following necessary condition for schedulability of a given set Γ = {τi(Ci, Ti, si)} of periodic tasks that allow skips:

∑_{i=1}^{n} Ci(si − 1)/(Ti si) ≤ 1.  (2)

In [4], Caccamo and Buttazzo introduced the notion of equivalent utilization factor, defined as follows.

Definition 1 Given a set Γ = {τi(Ci, Ti, si)} of n periodic tasks that allows skips, the equivalent utilization factor is defined as:

U*p = max_{L≥0} ( ∑i D(i, [0, L]) ) / L  (3)
where

D(i, [0, L]) = (⌊L/Ti⌋ − ⌊L/(Ti si)⌋) Ci.  (4)

They also provided a necessary and sufficient condition for guaranteeing a feasible schedule of a set of skippable tasks which are deeply-red (i.e. all tasks are synchronously activated and the first si − 1 instances of every task τi are red) [5]:

Theorem 1 A set Γ of skippable periodic tasks, which are deeply-red, is schedulable if and only if

U*p ≤ 1.  (5)

2.3 Illustrative example

Hereafter is presented a scheduling example with two tasks τi(Ci, Ti, si) defined according to the Skip-Over model. Task τ1(4, 6, ∞) is a hard real-time periodic task (s1 = ∞), while task τ2(1, 2, 3) allows deadline skips (s2 = 3).

Figure 1. A Skip-Over schedule

The system is overloaded (Up = ∑_{i=1}^{n} Ci/Ti = 4/6 + 1/2 = 1.17), but the tasks can be schedulable provided τ2 skips exactly one instance every 3.

3 Skip-Over based memory overload management

3.1 New task model and notations

In this section, we formally define the task model used. Let τ = {τ1, ..., τn} be a periodic task system. It is assumed that a periodic task requiring dynamic memory requests an amount of memory each period. This amount of memory is allocated as the result of one or several dynamic memory requests. Allocated memory is freed after some time interval by the same or another task. Note that in this model it is not relevant which task frees the memory; once it has been allocated, the relevant aspect is the holding time.

Taking this behavior into account, each task τi ∈ τ has the following temporal parameters: a worst-case computation time Ci, a period Ti, a relative deadline Di, the dynamic memory needs Mi, and an additional parameter si which gives the tolerance of the task to memory failures. Thus, a real-time set of periodic tasks consists of τi = (Ci, Ti, Di, Mi, si).

In addition, Mi can be described by a 2-tuple Mi = (gi, hi), considering the maximum amount of memory gi requested each period and the maximal time hi during which allocations are persistent in memory (expressed in terms of numbers of periods of task τi).

Consequently, the amount of memory used by the application in the worst case is given by ∑i hi gi. Additionally, dynamic memory allocation presents a level of memory fragmentation or wasted memory. Following the processor model, this wasted memory can be considered as spatial overhead; in this paper it is referred to as M^w. Moreover, the allocator uses a data structure to organise the available free blocks (M^ds). Considering these aspects, the total amount of memory M^T needed to fulfill the application requirements can be expressed as:

M^T = ∑i hi gi + M^w + M^ds  (6)

3.2 Memory feasibility of skippable periodic task sets

For memory feasibility, we have shown in [14] that a task set {τi; 1 ≤ i ≤ n} is memory-schedulable if and only if its cumulative memory utilization (ignoring skips) is no greater than the total memory M^T assigned to the application:

∑_{i=1}^{n} hi gi ≤ M^T.  (7)

As an analogy to the processor demand criterion [11], we turn to another form of schedulability test: the memory demand criterion.

Definition 2 Given a set Γ = {τi(Ci, Ti, Mi, si)} of n skippable periodic tasks with memory constraints, the equivalent memory utilization factor is defined as:

M* = max_{L≥0} ∑i D(i, [0, L])  (8)

where

D(i, [0, L]) = (⌊L/Ti⌋ − ⌊L/(Ti si)⌋ − ⌊(L − Ti hi)/Ti⌋ + ⌊(L − Ti hi)/(Ti si)⌋) gi.

Proof: Let D(i, [0, L[) denote the total memory demand within [0, L[ for task τi. First, let us evaluate the amount of memory requested by task τi over the interval [0, L[. Within any interval [0, L[, the number of periods observed for task τi is equal to ⌊L/Ti⌋, thus involving a total demand for memory allocations equal to ⌊L/Ti⌋ gi. According to the Skip-Over definition, every task τi is allowed to skip one instance every si task activations; thus, for every task τi, the total skipped memory allocation within [0, L[ is ⌊L/(Ti si)⌋ gi. Let us now evaluate the amount of memory released by task τi over the interval [0, L[. Without skips, this amount would be equal to ⌊(L − Ti hi)/Ti⌋ gi, taking into account the fact that task τi does not perform any memory release within the interval [0, Ti hi[. However, every skippable periodic task τi does not release any memory every Ti si periods. Hence, we have to subtract from the previous quantity an amount of memory corresponding to skipped task instances (i.e. non-allocated memory), which is equal to ⌊(L − Ti hi)/(Ti si)⌋ gi. Consequently, the total amount of memory remaining at time t = L for task τi is D(i, [0, L]) = (⌊L/Ti⌋ − ⌊L/(Ti si)⌋ − ⌊(L − Ti hi)/Ti⌋ + ⌊(L − Ti hi)/(Ti si)⌋) gi. It follows that the maximal memory utilization is given by M* = max_{L≥0} ∑i D(i, [0, L]).

Theorem 2 A set Γ of skippable periodic tasks, which are deeply-red, is memory-schedulable if and only if

M* ≤ M^T.  (9)

3.3 Illustrative example

Hereafter is given an example of the memory allocations that a task which allows skips can perform (see Figure 2). Task τi has period Ti = 10, skip factor si = 3, and its data persistence time in memory (hi) is equal to 4 periods.

Figure 2. Memory requests of a skippable periodic task (si = 3)

Observe that skips do not necessarily happen in a regular fashion, but that the distance between every two skips is at least si periods, thus providing a minimal Quality of Service (QoS) for memory allocations to tasks. That is, after suffering a memory failure (i.e. gi = 0 for the corresponding period), memory allocations must be satisfied during at least si − 1 periods of the task.

4 R-MRM Implementation Framework

The R-MRM (Robust-Memory Resource Controller) implementation framework is the enabling technology for efficiently managing memory-constrained tasks using Skip-Over principles. It provides users with an operational framework to manage real-time applications. This framework has the following generic characteristics:

• the ability to dynamically handle task memory requests,
• a recovery service invocation for failed requests,
• a robust and fair service for tasks.

Robust service is the ability to queue memory requests and ensure guaranteed allocation of these requests to the memory. Primarily used for handling requests, this capability is crucial for responding to tasks and for a reliable R-MRM implementation. It is typically implemented using reliable queues with store-and-forward as well as rejection capabilities.

The implementation framework partly relies on the framework previously proposed in [14]. R-MRM is a component that mediates between tasks and the dynamic memory allocator, as depicted in Figure 3.

Figure 3. R-MRM external interaction view

R-MRM offers two kinds of operations that a task can perform: memory_request and free_request. The memory_request function involves the memory size requested and a deadline for this request. If there is available memory to serve the request (this is evaluated according to a memory granting policy), the allocation is done by means of the malloc function provided by the dynamic allocator. If not, the requesting task is blocked until the request can be solved or the deadline is reached. Solving the request means that other tasks may free memory blocks during this interval (calling the free_request function) that can fulfill the memory needs of this request.
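The mediation between tasks and the allocator can be illustrated with a toy model (names and bookkeeping are ours; the real component also handles request deadlines, red-task priority and the rejection policy):

```python
from collections import deque

class MemoryManager:
    """Toy sketch of an R-MRM-like mediator: grants a request only if
    the resulting live memory stays within the budget M_T, and queues
    failed (blue) requests for re-evaluation when memory is freed."""

    def __init__(self, total: int):
        self.total = total     # M_T: memory budget granted to the application
        self.live = 0          # memory currently allocated
        self.failed = deque()  # pending (task, size) requests

    def memory_request(self, task: str, size: int) -> bool:
        if self.live + size <= self.total:  # granting policy
            self.live += size
            return True
        self.failed.append((task, size))    # blocked until memory is freed
        return False

    def free_request(self, size: int) -> list[str]:
        self.live -= size
        granted = []                        # recovery policy: retry in FIFO order
        while self.failed and self.live + self.failed[0][1] <= self.total:
            task, sz = self.failed.popleft()
            self.live += sz
            granted.append(task)
        return granted

mgr = MemoryManager(total=100)
assert mgr.memory_request("t1", 80)
assert not mgr.memory_request("t2", 40)  # queued: not enough memory
assert mgr.free_request(80) == ["t2"]    # freeing t1's block unblocks t2
```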
When a task wants to free memory previously allocated, it calls the free_request function. In addition to freeing the allocated memory, the tasks that were blocked waiting for available memory are re-evaluated according to a recovery policy (indicated in Listing 2 by the "re-evaluate Failed requests queue" label). The pseudo-code of the memory_request and free_request operations is shown in Listings 1 and 2.

Listing 1. Memory request

function memory_request(size, deadline) is
  if (memory granting policy)
    insert in Granted requests queue;
    malloc(size);
  else
    if (red task)
      rejection policy;
      insert in Granted requests queue;
      malloc(size);
    else
      insert in Failed requests queue;
      -- the calling task is blocked
    end if;
  end if;
end memory_request;

Listing 2. Free request

function free_request(int ptr) is
  free(ptr);
  re-evaluate Failed requests queue;
end free_request;

When a blue blocked task reaches its deadline, it is immediately discarded by the R-MRM component, exits the Failed requests queue, and a memory request failure is sent to the task.

The R-MRM component embeds three kinds of policies to address all its functionalities:

• a memory granting policy,
• a rejection policy,
• a recovery policy.

All these policies are implemented by the "Skip-Over based Memory Controller" sub-component in a centralized manner. It is a foundational piece of the R-MRM solution. It includes crucial information needed for defining and provisioning memory. It implements the necessary functions to keep track of the available memory in the system, manages the Failed and Granted request queues, and controls the timer associated with the deadline of a blue blocked task in the Failed requests queue.

Figure 4. R-MRM internal view

We can see that the component implements a memory granting policy that determines whether a memory request can be granted or not. If it cannot (i.e. the request fails), the re-evaluation of the failed requests is done according to a recovery policy. Moreover, the R-MRM component can apply a rejection policy so as to give priority to the most important tasks (i.e. red tasks in the Skip-Over model).

4.1 Framework Policies

In the following, we consider the case of a real-time application whose heap space has been properly sized according to condition (9) of Theorem 2. We now examine the three aforementioned policies in more detail.

4.1.1 The memory granting policy

We are interested in a granting policy that controls access to the application's heap space efficiently, according to the skip-over constraints of all the tasks. When a memory allocation request is submitted, the "Skip-Over based Memory Controller" evaluates the amount of memory already allocated and compares it to the total amount of memory granted to the application. Consequently, it is always possible to know the amount of memory available at the current time in order to make a decision about the acceptance of the request.

Let t be the current time, which coincides with the arrival of a memory request I. Upon arrival, request I(d, g, h) is characterized by its deadline d, its maximum amount of memory g, and the maximal time h during which the allocation is persistent in memory. We assume that several memory requests are present in the Granted requests queue at time t. Let I(t) = {Ii(di, gi, hi), i = 1 to req(t)} denote the memory request set supported by the machine at t. Then, the acceptance problem to solve when a memory request I occurs reduces to the test of a necessary and sufficient condition:
Theorem 3 Memory is granted to request I if and only if, considering the request set I(t) ∪ I, we have:

∑_{i=1}^{req(t)+1} hi gi ≤ M^T  (10)

If there is enough memory available then the allocation is granted; otherwise the request undergoes a recovery process aiming at attempting the request later on.

4.1.2 The recovery policy

If the memory granting policy determines that there is not enough memory to serve the request, the task is temporarily put into the queue named "Failed requests queue" (see Figure 4), waiting there to make another attempt at being accepted. According to the Skip-Over model, tasks in this queue are only blue and can exit the queue in the following cases:

• When a sufficient amount of memory to serve its request is freed (by one or several other tasks). In this case, the request can be granted and the task exits the Failed requests queue.
• When the deadline is reached. Then, the task exits the Failed requests queue with a failure.

4.1.3 The rejection policy

Considering the working hypothesis that memory can always be granted to red tasks, the problem of a red request failure is always resolvable, provided that one (or several) previously accepted blue tasks are removed from memory. Therefore, the rejection decision merely consists in determining which blue task has to be rejected. The criterion set for rejection consists in identifying the blue task having the least actual failure factor. Hence, we propose as a metric the Task Failure Factor (TTFi), defined as the ratio between the number of failures observed for task τi since initialization time and its number of activations at time t:

TTFi(t) = nb_failuresi(t) / ⌊t/Ti⌋  (11)

That means that the ready blue task whose failure ratio TTFi(t), computed from the initialization time, is least is the candidate for rejection. Ties are broken in favor of the task with the earliest deadline. Note that this is an arbitrary metric. For instance, we might equally base the rejection criterion upon a sliding window, whose size would need to be specified as a function of the task periods, for example. The criterion could as well rely on the evaluation of the task having the greatest number of successive successes since the last failure. This could be the subject of another study.

5 Simulation Study

In this section, we evaluate how effectively the proposed task model and scheduling scheme can solve the problem of guaranteeing memory allocation according to the QoS specification inherently provided by the Skip-Over task model. The pursued objective is to show the influence of the skip parameter si upon the probability of the system experiencing memory failures.

5.1 Experiments

To evaluate the actual performance of our solution, we constructed a simulator that models the behavior of the R-MRM component. Its evaluation was performed by means of four simulation experiments, all operating with the memory_request and free_request operations presented in Listings 1 and 2. The proposed scenarios (see Table 1) were specially designed to provide a comprehensive and comparative analysis of the proposed approach regarding the memory and QoS requirements previously exposed.

Test | si
  1  | ∞
  2  | 10
  3  | 6
  4  | 2

Table 1. Simulation scenarios

This permits evaluating the capability of the system to exploit skips so as to resolve memory failures, thus always guaranteeing memory service for the most important tasks (i.e. red tasks). The si parameters have been considered identical for all tasks in order to clearly demonstrate the influence of the QoS specification with respect to the observed memory failure probability.

Experiments were carried out for 100 randomly generated task sets, with periods Ti uniformly distributed in the range 20..250 and maximal amounts of memory Gi^max uniformly distributed in the range 4096..102400. Deadlines are equal to periods. Additional parameters, Gi^avg and Gi^stdev, define a normal distribution (average and standard deviation) used by task τi to request memory blocks that will be used during an interval randomly generated as a uniform distribution between hi^min and hi^max periods.

5.2 Results

Our results are shown in several different ways in Figures 5 to 12. In all cases, the x-axis displays the different percentages of memory amounts provided for the application with respect to the total live memory (i.e. ∑i hi gi) given by the task specification itself.

Four output parameters have been evaluated:

• the number of failed requests,
• the number of retries, i.e. the number of times a memory request is re-attempted,
• the number of solved requests,
• the number of overruns, occurring when a task reaches its deadline without having received its memory allocation.

Figure 5. Number of failed requests according to si
Figure 6. Number of retries according to si
Figure 7. Number of solved requests according to si
Figure 8. Number of overruns according to si

Figures 5, 6, 7 and 8 show the absolute memory failure probability for each output parameter for the four tested scenarios.

First, notice that all the curves decrease with an increasing percentage of memory given to the application, which is the expected behavior. Note also that at the remarkable point where the amount of memory is exactly equal to the total live memory (i.e. 100%), we observe a non-zero memory failure probability. This is due to the spatial overhead induced by the dynamic memory allocator (here, TLSF). For a zero memory failure probability, the amount of memory assigned to the application must be higher, in order to take into account the data structures needed by the dynamic memory allocator (see equation (6)).

On the other hand, as expected, we observe that the memory failure probability for si = ∞ (i.e. no skips allowed) is significantly higher than in the other scenarios. Interestingly, the curves for si = 10, si = 6 and si = 2 have almost identical distributions for a memory level greater than or equal to 95% of the total live memory.

In Figures 9, 10, 11 and 12, the relative memory failure probability is shown for each scenario. First, note that even when si = ∞, the memory failure ratio is relatively low, thus underlining the good behavior of the TLSF allocator. For instance, for a memory level greater than or equal to 100%, the failed requests ratio observed
XI Jornadas de Tiempo Real (JTR2008)
Number of failures / Number of mallocs
5
Number of overruns / Number of mallocs
62
Test 1 (si = inf)
Test 2 (si = 10)
Test 3 (si = 6)
Test 4 (si = 2)
4
3
2
1
0
80
85
90
95
100
105
110
Memory level w.r.t total live memory (%)
115
Number of overruns / Number of failures
Number of solved / Number of failures
4
3
2
1
0
85
90
95
100
105
110
Memory level w.r.t total live memory (%)
60
40
20
100
80
60
40
Test 1 (si = inf)
Test 2 (si = 10)
Test 3 (si = 6)
Test 4 (si = 2)
20
0
80
0
80
85
90
95
100
105
110
Memory level w.r.t total live memory (%)
115
Figure 11. Influence of si on the ratio of the
number of overruns to the number of mallocs
Test 1 (si = inf)
Test 2 (si = 10)
Test 3 (si = 6)
Test 4 (si = 2)
80
Test 1 (si = inf)
Test 2 (si = 10)
Test 3 (si = 6)
Test 4 (si = 2)
80
Figure 9. Influence of si on the ratio of the
number of failures to the number of mallocs
100
5
115
85
90
95
100
105
110
Memory level w.r.t total live memory (%)
115
Figure 12. Influence of si on the ratio of the
number of overruns to the number of failures
Figure 10. Influence of si on the ratio of the
number of solved requests to the number of
failures
6 Conclusions
While feasibility and schedulability analysis from the
CPU point of view is well understood, memory analysis for
real-time systems has received less attention. In this paper we addressed the problem of scheduling real-time task
sets with memory constraints. In particular, we presented
a memory feasibility analysis for skippable periodic task
sets. The memory feasibility test contained in this paper
represents the first known result for periodic real-time tasks
based on the Skip-Over model. Our main contribution was
actually to design a component for a skip-over based memory overload management and to evaluate it. Through the
results, we showed to what extend the proposed R-MRM
component can minimize the memory failure occurrence
probability, while a QoS level (i.e., skip parameter established by the application programmer) is always guaranteed
for tasks. The strong point of this approach relies on the
memory guarantees provided by the component. We be-
is less than 1% of the total memory requests. Results for
si ∈ {2, 6, 10} improve the memory failure occurrence probability. For instance, with si = 2 and a memory level equal
to 100%, the R-MRM enjoys more than factor of 2 memory failure advantage over the R-MRM used without skips.
Moreover, as expected, we can note that this advantage is
all the less significant as the memory level is higher. For
high memory levels (> 105%), the R-MRM tends to have
the same behavior for any scenario considered. Finally, we
see that the ratio of solved requests is higher for small values of si . For instance, for a memory level equal to 100%,
the R-MRM applied to memory requests with si = 2 will
resolve more than 1.7 as many failed requests as with the
R-MRM with si = ∞. In summary, we can say that the RMRM used with skips gives the best results from both the
memory failure probability point of view and the memory
failure resolution capabilities.
8
2. Análisis Temporal
63
lieve that the present approach is promising for enhancing
the performance of memory-constraint applications and applying memory analysis in the real-time field.
References

[1] L. Abeni and G. Buttazzo. Resource reservation in dynamic real-time systems. Journal of Real-Time Systems, 27(2):123–167, 1998.
[2] L. Abeni, T. Cucinotta, G. Lipari, L. Marzario, and L. Palopoli. QoS management through adaptive reservations. Journal of Real-Time Systems, 29(2-3):131–155, 2005.
[3] G. Bernat, A. Burns, and A. Llamosi. Weakly hard real-time systems. IEEE Transactions on Computers, 50(4):308–321, 2001.
[4] M. Caccamo and G. Buttazzo. Exploiting skips in periodic tasks for enhancing aperiodic responsiveness. In Proceedings of the 18th IEEE Real-Time Systems Symposium (RTSS'97), San Francisco, California, pages 330–339, 1997.
[5] M. Caccamo and G. Buttazzo. Optimal scheduling for fault-tolerant and firm real-time systems. In Proceedings of the Fifth Conference on Real-Time Computing Systems and Applications (RTCSA'98), Hiroshima, Japan, 1998.
[6] M. Chetto and H. Chetto. Some results of the earliest deadline scheduling algorithm. IEEE Transactions on Software Engineering, 15(10):1261–1269, 1989.
[7] G. Bernat, A. Burns, and A. Llamosi. Weakly-hard real-time systems. IEEE Transactions on Computers, 50(4):308–321, 2001.
[8] C. Hamann, J. Loser, L. Reuther, S. Schonberg, J. Wolter, and H. Hartig. Quality-assuring scheduling: Using stochastic behavior to improve resource utilization. In 22nd IEEE Real-Time Systems Symposium, pages 119–128, 2001.
[9] M. Hamdaoui and P. Ramanathan. A dynamic priority assignment technique for streams with (m,k)-firm deadlines. IEEE Transactions on Computers, 44:1443–1451, 1995.
[10] K. Jeffay, F. D. Smith, A. Moorthy, and J. Anderson. Proportional share scheduling of operating system services for real-time applications. In IEEE RTSS, pages 480–491, 1998.
[11] K. Jeffay and D. Stone. Accounting for interrupt handling costs in dynamic priority task systems. In Proceedings of the 14th IEEE Real-Time Systems Symposium (RTSS'93), Raleigh-Durham, NC, pages 212–221, 1993.
[12] G. Koren and D. Shasha. Skip-over algorithms and complexity for overloaded systems that allow skips. In Proceedings of the 16th IEEE Real-Time Systems Symposium (RTSS'95), Pisa, Italy, 1995.
[13] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1):46–61, 1973.
[14] A. Marchand, P. Balbastre, I. Ripoll, M. Masmano, and A. Crespo. Memory resource management for real-time systems. In Proceedings of the 19th Euromicro Conference on Real-Time Systems, Pisa, Italy, 2007.
[15] A. Marchand and M. Silly-Chetto. Dynamic real-time scheduling of firm periodic tasks with hard and soft aperiodic tasks. Real-Time Systems, 32(1-2):21–47, 2006.
[16] M. Masmano, I. Ripoll, A. Crespo, and J. Real. TLSF: A new dynamic memory allocator for real-time systems. In Proceedings of the 16th Euromicro Conference on Real-Time Systems, Catania, Italy, 2004.
[17] C. W. Mercer, S. Savage, and H. Tokuda. Processor capacity reserves for multimedia operating systems. Technical report, Pittsburgh, PA, USA, 1993.
[18] R. West and C. Poellabauer. Analysis of a window-constrained scheduler for real-time and best-effort packet streams. In Proceedings of the 21st IEEE Real-Time Systems Symposium (RTSS'00), Orlando, Florida (USA), 2000.
3. Sistemas Operativos y Middleware
Operating System Support for Execution Time Budgets for Thread Groups
Mario Aldea Rivas and Michael González Harbour
Universidad de Cantabria
39005-Santander, SPAIN
{mgh, aldeam}@unican.es
Abstract
The recent Ada 2005 standard introduced a number of
new real-time services, with the capability of creating and
managing execution time budgets for groups of tasks. This
capability has many practical applications in real-time systems in general, and therefore it is also interesting for real-time operating systems. In this paper we present an implementation of thread group budgets inside a POSIX real-time operating system, which can be used to implement the new Ada
2005 services. The architecture and details of the implementation are shown, as they may be useful to other implementers of this functionality defined in the new standard.
Keywords: Real-time systems, Execution time budgets,
Thread groups, CPU time, Ada 2005.
1. Introduction
In hard real-time systems it is essential to monitor the
execution times of all tasks and detect situations in which
the estimated worst-case execution time (WCET) is
exceeded. This detection was usually available in systems
scheduled with cyclic executives, because the periodic
nature of its cycle allowed checking that all initiated work
had been completed at each cycle. In event-driven concurrent systems the same capability should be available, and
can be accomplished with execution time clocks and timers.
This need for managing execution time is recognized in
standards related to real-time systems. The POSIX standard
[4] defines services for execution time measurement and
budget overrun detection, and its associated real-time profiles [5] require implementations to support these services.
The recent Ada 2005 standard introduced a number of new
1. This work has been funded by the Plan Nacional de I+D+I of the
Spanish Government under grant TIC2005-08665-C03 (THREAD
project), by Ada Core, and by the European Union’s Sixth Framework
Programme under contracts FP6/2005/IST/5-034026 (FRESCOR project)
and IST-004527 (ARTIST2 NoE). This work reflects only the author’s
views; the EU is not liable for any use that may be made of the information
contained herein.
real-time services intended to provide applications with a
higher degree of flexibility. In particular this standard
defines capabilities for measuring the execution time of
individual tasks, and the ability to detect and handle execution-time budget overruns.
As real-time applications evolve towards an increased
complexity level, issues such as composability of independently developed application components and support for
legacy code introduce the need for supporting different levels of hierarchy in the scheduling mechanism, leading to a
hierarchical concurrency model with different layers, and
with capabilities for establishing boundaries for the protection of different parts of the application. In this context of
hierarchical scheduling it is often required to bound the
execution time of a group of activities that are inside the
same protection boundary, so that they cannot interfere
with other activities in other protection boundaries by using
up more resources than they should. This need introduces a
requirement on the underlying implementation to support
the measurement of the execution times of groups of tasks,
and the handling of potential budget overruns, in a way
similar to what is usually done for individual tasks.
Following this general requirement, the Ada 2005 standard defines services for execution-time budgets for groups of tasks, and is now a step ahead of the real-time extensions to POSIX, which still have no such service.
In this paper we propose an implementation of a mechanism to support execution-time budgets for thread groups
inside a POSIX operating system. The API of this implementation could be used as a basis for a future extension to
POSIX. It will also be used to implement the task group
budgets defined in Ada 2005. The architecture and details
of the implementation are shown, as they may be useful to
other implementers of this functionality defined in the new
standard. Some performance metrics are provided.
The paper is organized as follows. Section 2 discusses
the current services that are available in the platform chosen
for this implementation, MaRTE OS and GNAT, and that
are related to thread group budgets. Section 3 introduces the
services designed to represent sets of threads. Section 4
discusses the implementation of the execution time clocks
for groups of threads, while Section 5 does the same for
budgets and their associated handlers. Section 6 provides
some performance metrics and, finally, Section 7 gives our
conclusions.
2. Background
The implementation of execution time budgets for
thread groups presented in this paper has been developed in
MaRTE OS [1] [2], which is a real-time operating system
(RTOS) that follows the POSIX.13 [5] minimum real-time
system profile, and is mostly written in Ada. It is available
for the ix86 architecture as a bare machine, and it can also
be configured as a POSIX-thread library for GNU/Linux.
The GNAT run-time library has been adapted to run on top
of MaRTE OS, which is itself being extended in a joint
effort between Ada Core and the University of Cantabria
with the objective of providing a platform fully compliant
with Ada 2005, available for industrial, research, and
teaching environments. The implementation of thread
group budgets presented in this paper is part of the effort to
achieve this objective.
Two of the new Ada 2005 real-time services are closely
related to the thread group budgets and are already available in MaRTE OS and GNAT [3]:
• Timing events are defined in Ada 2005 as an effective
and efficient mechanism to execute user-defined timetriggered procedures without the need to use a task. They
are very efficient because the event handler may be executed directly in the context of the interrupt handler,
avoiding the need for a server task.
• Execution time clocks and timers are defined in Ada
2005 as a standardized interface to obtain the execution
time consumption of a task, together with a mechanism
that allows creating handlers that are triggered when the
execution time of a task reaches a given value, providing
the means to execute a user-defined action when the execution time assigned to a specific task expires.
Timing events have been implemented in MaRTE OS
through a service that we call “timed handlers”, which are
not only useful to implement their Ada counterpart, but are
also useful to other applications as a general-purpose
RTOS mechanism.
MaRTE OS supports the execution-time clocks and timers defined in POSIX.1, which would be appropriate to implement their counterparts in Ada. However, the timers
defined in POSIX to detect execution time overruns use an
operating system signal to notify about their expiration.
Signals are a very scarce resource inside an RTOS.
Besides, the signal is usually handled through a thread that
is waiting to accept the signal, but this is a mechanism that
introduces relatively high overheads, mainly due to the
need for the handler to be a thread, with the associated
costs in context switches. These are the same reasons that motivated the introduction of the new "timing events" mechanism for regular time management.
As a consequence, the Ada implementation of execution
time clocks and timers has been achieved in MaRTE
through the "timed handler" mechanism, which allows a
direct handling of the event inside the hardware timer interrupt handler, thus avoiding the use of a signal and the subsequent double context switch that would be necessary
otherwise.
To implement thread group budgets inside MaRTE OS
we will follow an approach similar to that followed for
execution time budgets for individual threads, creating the
appropriate execution time clocks for thread groups and
extending the "timed handler" mechanism to also support
these new clocks.
3. Thread sets
Before creating the execution time clocks for thread
groups or sets, it is necessary to specify a mechanism to
represent the groups themselves. Instead of defining a
mechanism specific to execution-time clocks, we have chosen to create an independent RTOS object that represents a
group of threads. In this way, we will be able to address
future extensions that require handling groups of threads
using these same objects. Examples of such new services
might be related to the requirements for supporting hierarchical scheduling, for instance to suspend or resume a
group of threads atomically.
A thread set is implemented by a record that may be
extended in the future to add functionality. This record has
the following fields:
• Set: A list of the threads belonging to the set.
• Iterator: A reference to the current thread in the list, used when iterating through marte_threadset_first and marte_threadset_next.
A restriction has been made so that a thread can belong to only one thread set. This restriction is also made in the Ada 2005 standard, and its rationale is that in the hierarchical scheduling environments for which thread groups are useful, threads belong to only one specific scheduling class, and therefore to one specific set. This restriction allows a more efficient implementation, because at each context switch only the Consumed_Time field of the set to which the running thread belongs needs to be updated.
Threads can be added/removed to/from a thread set
dynamically.
Every thread has a pointer in its thread control block
(TCB) to the set it belongs to. This field is null if the thread
doesn’t belong to any thread set.
The C language API to manage thread sets from the application level is the following:

// create an empty thread set
int marte_threadset_create
    (marte_threadset_id_t *set_id);

// destroy a thread set
int marte_threadset_destroy
    (marte_threadset_id_t set_id);

// empty an existing thread set
int marte_threadset_empty
    (marte_threadset_id_t set_id);

// add a thread to a set
int marte_threadset_add
    (marte_threadset_id_t set_id,
     pthread_t thread_id);

// delete a thread from a set
int marte_threadset_del
    (marte_threadset_id_t set_id,
     pthread_t thread_id);

// check thread membership
int marte_threadset_ismember
    (marte_threadset_id_t set_id,
     pthread_t thread_id);

// reset the iterator and get the first thread id
int marte_threadset_first
    (marte_threadset_id_t set_id,
     pthread_t *thread_id);

// advance the iterator and get next thread id
int marte_threadset_next
    (marte_threadset_id_t set_id,
     pthread_t *thread_id);

// check whether the iterator can be advanced
int marte_threadset_hasnext
    (marte_threadset_id_t set_id);

// get the set associated with the given thread
int marte_threadset_getset
    (marte_threadset_id_t *set_id,
     pthread_t thread_id);

4. Execution time clocks for thread groups

To implement execution time clocks for groups of threads we add the following information to the object that represents a thread set:

• Consumed_Time: CPU time consumed by all the tasks in the group. Every time a thread of a given set leaves the CPU, the time consumed by this task since its last activation is added to the Consumed_Time of its thread set, even if there is no timed event associated with it, because the value of the execution-time clock may be read at any time by the application.

• Group_Timed_Event: A reference to the internal RTOS execution time event, used by the scheduling mechanism. A set can be associated with at most one such event.

The API to obtain an execution-time clock from a thread set is:

// get the execution-time clock associated with a thread set
int marte_getgroupcpuclockid
    (marte_threadset_id_t set_id,
     clockid_t *clock_id);

The returned id represents a clock that can be read and set through the standard POSIX API for clocks, i.e., using functions clock_gettime, clock_settime, etc. These clocks can also be used as the base for POSIX timers and MaRTE OS timed events, as any other clock defined in the system. They cannot, however, be used as the base for the clock_nanosleep operation, as is also the case with the single-thread CPU-time clocks: POSIX leaves this behavior unspecified, and Ada does not define execution time as a type that can be used in the equivalent delay statements.

POSIX requires the type clockid_t to be defined as an arithmetic type, and therefore clock ids are implemented using an unsigned 32-bit number. The value stored in a clock id can have different interpretations:

• Special values for the regular calendar-time clock CLOCK_REALTIME, the execution time clock of the current thread CLOCK_THREAD_CPUTIME_ID, and the monotonic clock CLOCK_MONOTONIC.

• A pointer to a thread control block when the clock is a thread CPU-time clock of a particular thread.

• A pointer to a thread set when it is a thread group clock.
5. Timed events based on a group clock
Group clocks can be used as the base of timers and
timed handlers. When a timer or a timed handler is armed,
a MaRTE OS timed event is enqueued in the system event
queues. Time-based events in MaRTE OS are of two kinds:
standard time and execution time. They are kept in separate
priority queues because they cannot be compared with each
other for ordering. Events based on group clocks are a special case of execution time events. An execution time event
has the following information:
• CPU_Time: The event will expire when the execution time consumed by the associated task reaches this value.
• Group_Expiration_Time: The event will expire when the Consumed_Time field of the task set associated with the event reaches this value. This field is only used in events based on a group clock.
• Is_Based_On_Group_Clock: A boolean used to identify events based on group clocks.
• Base_Clock: A clock id representing the clock used as the timing base of the event. It can be a thread CPU-time clock or a group clock.
• Task_Where_Queued: A pointer to the task that has queued the event.

Execution time events are kept in a queue associated with the task on which the event is based, stored as the CPU_Time_Timed_Event_Queue in the task control block. Every time a new thread gets the CPU, the events at the head of the standard-time event queue and of the running task's CPU_Time_Timed_Event_Queue are compared, and the hardware timer is programmed to expire at the most urgent of the two.

Events based on group clocks are special CPU-time events that "jump" between the CPU_Time_Timed_Event_Queue of the threads in the group. Each time the system schedules a task included in a thread set that has an associated event, the following actions are performed in the Do_Scheduling internal kernel operation:
-- Set CPU_Time of the event according to the
-- time consumed by T
T.Set.Group_TE_Ac.CPU_Time := T.Used_CPU_Time +
  (T.Set.Group_TE_Ac.Group_Expiration_Time -
   T.Set.Consumed_Time);

-- Move Group_TE_Ac from one task to another
if T.Set.Group_TE_Ac.Task_Where_Queued /= null then
   -- Dequeue from the list where it was queued
   Dequeue (T.Set.Group_TE_Ac,
            T.Set.Group_TE_Ac.Task_Where_Queued.
              CPU_Time_TEs_Q);
end if;

-- Enqueue in T's list
Enqueue_In_Order (T.Set.Group_TE_Ac, T.CPU_Time_TEs_Q);
T.Set.Group_TE_Ac.Task_Where_Queued := T;
Dequeue and enqueue operations are very fast, because the number of CPU-time events associated with a task will usually be very small, typically one or two: a CPU-time event and a "group event". Consequently, the number of extra operations required at each context switch to manage these clocks is kept small, and the implementation can schedule the threads efficiently with an acceptable overhead, as can be seen in the following performance metrics section.
6. Performance metrics
The support for group budgets has already been implemented in MaRTE OS. Execution time accounting introduces a small overhead: enabling this service in MaRTE OS increments the context switch time by less than 5%. Group execution time accounting increments the context switch time by another 4%, for a total increment of 9% with respect to a system with no CPU-time accounting, on an x86 architecture.
The overheads of budget overrun detection are also relatively small. Table 1 compares the overheads of two detection mechanisms, as measured on a 3.4 GHz Pentium IV. The first is implemented using a regular POSIX timer that sends a signal when the budget expires, plus a handler thread that blocks waiting to accept the signal. The second mechanism is implemented using the new timed handler service. We can see that the overhead of the second mechanism is much smaller.
Table 1. Overhead of the budget overrun notification mechanism

Metric                           Time (μs)             Time (μs)
                                 (timer and            (timed
                                 auxiliary thread)     handlers)
From user's thread to handler        1.1                  0.4
From handler to user's thread        0.8                  0.7
Total time                           1.9                  1.1
7. Conclusion
As the complexity of real-time systems grows, hierarchical scheduling and partitioning are mechanisms used to cope with it, by helping to establish protection boundaries and easing the composability of independently developed application components. One of the requirements of this partitioning is time protection among the different groups of tasks in the hierarchy, which can be achieved by using thread group budgets such as those specified in the new Ada 2005 standard.

This paper has presented an implementation of the support needed to provide such budgeting services in a real-time operating system called MaRTE OS. The paper describes the architecture and details of the implementation, together with the rationale for the main design decisions, so that this information can be used by other implementers of this functionality, either as part of Ada run-time systems or as part of a general-purpose RTOS. The implementation has proven to be straightforward, and the overheads introduced are small, both in the context switch times and in the budget overrun notification mechanism.
As future work, the functionality defined in Ada 2005
for group budgets will be implemented. It is anticipated
that support for the Ada group budgets will be a simple
package built on top of the MaRTE OS implementation
described in this paper.
References

[1] Aldea Rivas, M. and González Harbour, M. MaRTE OS: Minimal Real-Time Operating System for Embedded Applications. Universidad de Cantabria. http://marte.unican.es/
[2] Aldea Rivas, M. and González Harbour, M. MaRTE OS: An Ada Kernel for Real-Time Embedded Applications. Proceedings of the International Conference on Reliable Software Technologies, Ada-Europe 2001, Leuven, Belgium, Lecture Notes in Computer Science, LNCS 2043, May 2001, ISBN 3-540-42123-8, pp. 305–316.
[3] Aldea Rivas, M. and Ruiz, J. F. Implementation of new Ada 2005 real-time services in MaRTE OS and GNAT. International Conference on Reliable Software Technologies, Ada-Europe 2007, Switzerland.
[4] IEEE Std. 1003.1, 2004 Edition. Information Technology - Portable Operating System Interface (POSIX). The Institute of Electrical and Electronics Engineers.
[5] IEEE Std. 1003.13-2003. Information Technology - Standardized Application Environment Profile - POSIX Realtime and Embedded Application Support (AEP). The Institute of Electrical and Electronics Engineers.
[6] S. Tucker Taft, Robert A. Duff, Randall L. Brukardt, Erhard Ploedereder, Pascal Leroy (Eds.). Ada 2005 Reference Manual. Language and Standard Libraries. International Standard ISO/IEC 8652/1995(E) with Technical Corrigendum 1 and Amendment 1. Number 4348 in Lecture Notes in Computer Science, Springer-Verlag, 2006.
XI Jornadas de Tiempo Real (JTR2008)
UNA MÁQUINA VIRTUAL
PARA SISTEMAS DE TIEMPO REAL CRÍTICOS
José A. Pulido, Santiago Urueña, Juan Zamorano y Juan A. de la Puente
Grupo de Sistemas de Tiempo Real y Arquitectura de Servicios Telemáticos
Universidad Politécnica de Madrid (UPM)
Resumen
software a partir de modelos construidos con herramientas como
R.
Simulink El código generado
La máquina virtual de ASSERT proporciona un
entorno de ejecución con un comportamiento temporal determinista para sistemas de tiempo real
con requisitos de alta integridad, como los utilizados en los sistemas embarcados en los vehículos
espaciales. Para ello, se congura como una plataforma que admite componentes de software con
un comportamiento temporal acorde con una serie
de normas, y rechaza la ejecución de otros tipos
de componentes. Los componentes de software se
generan automáticamente a partir de una descripción de alto nivel del sistema, que permite incluir
elementos derivados de modelos funcionales.
de esta forma se integra manualmente en una es-
Palabras clave: Sistemas de tiempo real, siste-
El proyecto ASSERT
tructura de concurrencia, que puede estar basada
en un ejecutivo cíclico [9] o, más recientemente, en
un núcleo de tiempo real [10]. Este tipo de intervención manual complica el proceso de desarrollo
de software y también el proceso de vericación y
validación del mismo, ya que a menudo ingenieros
expertos en la parte funcional del proyecto deben
encargarse de cumplir también con los requisitos
de tiempo real relacionados con los aspectos concurrentes del sistema, complicando su labor y, por
tanto, aumentando el número de errores introducidos a lo largo del proceso.
mas de alta integridad, desarrollo basado en mo-
(Automated proof based System and Software Engineering for Real-Time) tie-
delos.
ne como objetivo la mejora de los procesos de desarrollo de sistemas y software en el dominio aeroes-
1.
Introducción
pacial mediante la aplicación de métodos formales
y modelos de componentes de software. Con ello se
pretende desarrollar un conjunto de
Los sistemas de tiempo real embarcados en vehículos espaciales a menudo tienen unos requisitos de
tructivos
bloques cons-
para su utilización en entornos abier-
tos que se puedan compartir por diversos equi-
seguridad muy estrictos. Esto se debe a la ne-
pos de trabajo en este dominio. En este contexto
cesidad de asegurar que las acciones funcionales
se ha desarrollado un nuevo proceso de desarro-
se ejecutan en los intervalos de tiempo marcados
llo de software que permite separar los aspectos
por el algoritmo correspondiente, ya que el funcio-
funcionales (por ejemplo, algoritmos de control)
namiento incorrecto de muchos de estos sistemas
de los mecanismos de concurrencia y tiempo real.
puede comprometer seriamente la seguridad de los
Este enfoque permite que los ingenieros especia-
bienes, o incluso de las personas en el caso de una
lizados en el dominio aeroespacial se centren en
misión tripulada, por lo que se hace necesario ase-
el diseño de los algoritmos funcionales, y que el
gurar que su comportamiento está determinado en
código correspondiente se inserte de manera semi-
todas las situaciones posibles. El proceso de veri-
automática en un conjunto de
cación y validación de este tipo de sistemas es,
denen los elementos de concurrencia tareas y
por tanto, muy estricto. La Agencia Espacial Eu-
elementos de sincronización entra ellas y tem-
ropea (ESA) ha desarrollado normas estrictas pa-
porización comportamiento periódico o esporá-
ra el desarrollo y el control de calidad del software
dico, plazos de ejecución, etc. Los contenedores se
embarcado [4, 5]. Entre otros requisitos, estas nor-
ejecutan sobre una
mas establecen la necesidad de efectuar análisis de
que el comportamiento de los elementos de sincro-
tiempos de respuesta en el software de tiempo real
nización y temporización es correcto. En conjunto,
embarcado.
el proceso de desarrollo de ASSERT asegura que
La tendencia actual apunta hacia una separación
entre los métodos que se emplean para el desarrollo del código funcional y el que se utiliza para
garantizar el comportamiento temporal especicado. En el primer caso es corriente desarrollar el
contenedores
máquina virtual
que
que garantiza
los requisitos temporales especicados para el sistema se reejan en propiedades de las entidades
de ejecución de forma correcta.
3. Sistemas Operativos y Middleware
73

This paper describes the concept and the characteristics of the ASSERT virtual machine. First, an introductory presentation of the development process is given; later, some implementation aspects are described. Finally, the preliminary application of these concepts in some industrial pilot projects is reported.

2. The ASSERT development process

The ASSERT development process [1] is based on the following principles:

- The definition of a specific metamodel that makes it possible to express the properties of interest in the application domain under consideration.
- The separation between the aspects related to functionality, concurrency and interfaces, and those that depend on the execution platform.
- The formal definition of properties and of transformations between models that guarantee that the timing properties are preserved.

The metamodel is a precise definition of the elements that can be used to build models of computer systems in a given domain [7]. In the case of ASSERT, a metamodel has been defined based on the Ravenscar profile [2], a subset of the concurrency part of the Ada language [8] aimed at building high-integrity systems with predictable and analysable timing behaviour. This metamodel is called RCM (Ravenscar Computation Model).

The separation between the functional and the temporal aspects is carried out by means of the notion of container. A container is a generic component into which the functional elements of the software are inserted, ensuring the rest of its properties by construction. ASSERT distinguishes two kinds of containers:

- Application-level containers (APLC). They define the components of the system (for example, controllers or sensors), including their functional content and the interface through which they interact with each other, all in an abstract way, independent of the execution platform.
- Virtual-machine-level containers (VMLC). They define the components that run directly on the execution platform, the ASSERT virtual machine (ASSERT VM). Examples of VMLC are periodic and sporadic tasks and shared data objects.

Figure 1 shows a general outline of the development process established in ASSERT. The figure highlights the separation between the different views of the system (functional, interface, concurrency, ...) and distinguishes, on either side of the vertical line, the part visible to the system designers (on the left) from the part that remains hidden behind a process of automatic transformations.

The ASSERT development process lets the engineers in charge of developing the system concentrate on its functional aspect, carrying out their work with the models at the highest level of abstraction (APLC). These elements are not directly executable on the virtual machine. It is the use of automatic transformations that converts this model into a semantically equivalent one made up exclusively of VMLC. The last step consists in applying a series of predefined patterns to generate, from the VMLC, the source code that will finally run on the ASSERT virtual machine.

An important feature of this scheme is that the semantics of all component types is based on a single metamodel, RCM. This formal basis allows models to be transformed into one another while preserving their properties, including the temporal ones.

Figure 2 represents how the contents of an APLC are embedded into the fixed structure of one or more VMLC, which already includes the mechanisms needed to preserve the properties defined at high level.

74
XI Jornadas de Tiempo Real (JTR2008)

3. The ASSERT virtual machine

The ASSERT virtual machine is the execution platform for the final elements of the real-time system (VMLC). In order to guarantee the real-time properties specified at the higher levels of abstraction, it only accepts entities that are legal from the point of view of the chosen computational model, RCM. These entities, which take the form of virtual-machine components, correspond to the following archetypes:
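The pattern-based generation step described above can be pictured with a small sketch. This is not the actual ASSERT tool chain; the pattern text, the attribute names and the sample VMLC description are all invented for illustration, showing only the idea of expanding a fixed, predefined code pattern from the attributes of a VMLC instance:

```python
# Illustrative sketch only: a predefined periodic-task pattern (here an
# Ada-like text template, invented for this example) is instantiated
# from the attributes of one VMLC description.
PERIODIC_PATTERN = """\
task body {name} is
   Next_Time : Time := Clock;
begin
   loop
      {operation};
      Next_Time := Next_Time + Milliseconds ({period_ms});
      delay until Next_Time;
   end loop;
end {name};
"""

def generate_source(vmlc: dict) -> str:
    """Instantiate the periodic-task pattern for one VMLC description."""
    return PERIODIC_PATTERN.format(**vmlc)

# Hypothetical VMLC description produced by the model transformations.
sample = {"name": "Imager_Task", "operation": "Take_Sample", "period_ms": 100}
```

In the real process the patterns are fixed in advance, so the generated code is correct by construction with respect to the RCM properties of the instance.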
- Active components: periodic or sporadic; they execute an action at regular intervals (in the case of periodic components) or whenever a given event occurs (in the case of sporadic ones).
- Protected components: they contain objects shared by two or more active components, protected against concurrent access in order to guarantee their integrity by providing mutual exclusion on their accesses.
- Passive components: they contain objects used by a single active component and therefore need no protection against concurrent access.

Figure 1: ASSERT component and view model
[The figure shows the functional, interface and concurrency views, the functional code, the feasibility and sensitivity analysis, WCET, deployment information, the final code and the execution platform.]

Figure 2: ASSERT transformation scheme
[The figure shows how the concurrency, functional and interface parts of an APLC are embedded into a VMLC made up of a task, a control module and a functional module.]

The component archetypes follow the Ravenscar concurrency model (RCM). The ASSERT virtual machine supports the execution of instances of these archetypes with different functional content and specific temporal attributes (period, worst-case execution time, response deadline, etc.). In addition, it includes supervision mechanisms that detect timing errors during the execution of the system and execute corrective actions whenever possible. These mechanisms make it possible to:

- Ensure the periodic execution of periodic components.
- Monitor the response time of each action, immediately detecting any violation of the specified deadline.
- Measure the processor execution time of each action.
- Assign an execution-time budget to each component and detect when it is exceeded.
- Ensure a minimum time between activations for sporadic components.
- Use the time during which the processor remains idle to correct overload situations without compromising the deadlines of the other tasks.
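The first two mechanisms in this list can be sketched in a few lines. This is only an illustration of the idea of release-time and deadline supervision, not the actual virtual machine, which implements it in Ada with timing events and execution-time clocks; all names below are invented:

```python
import time

def run_periodic(action, period, deadline, on_deadline_miss, jobs):
    """Release `action` every `period` seconds for `jobs` releases,
    checking each response time against `deadline` (illustration only)."""
    next_release = time.monotonic()
    for _ in range(jobs):
        start = time.monotonic()
        action()
        response_time = time.monotonic() - start
        if response_time > deadline:
            on_deadline_miss(response_time)   # hook for a corrective action
        next_release += period                # fixed release times: no drift
        delay = next_release - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```

Computing each release from the previous one (rather than from "now") keeps the periodic pattern free of cumulative drift, which is the same design choice the Ravenscar `delay until` idiom makes.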
With these features, the ASSERT virtual machine not only executes applications that conform to the Ravenscar computational model; it can also implement a temporal partitioning mechanism which guarantees that the correct execution of one application will not be affected by the faulty behaviour of another application, even when both run on the same processor.

In addition, the ASSERT virtual machine supports the distributed execution of parts of a system on different computers connected by a local network with deterministic timing behaviour.

The choice of the Ravenscar computational model suggests using the Ada language to implement the virtual machine. The current language standard [8] supports all the elements of the Ravenscar profile, together with a series of timing and supervision mechanisms that make it possible to satisfy the above requirements and to provide temporal isolation. The ASSERT virtual machine has therefore been built around an Ada 2005 compiler (GNAT, www.adacore.com), together with ORK [10], a real-time kernel specialized for the Ravenscar profile. The current version runs on a computer based on the LEON2 processor, a radiation-resistant version of the SPARC V8 architecture (www.gaisler.com).

Figure 4: LEON2 computer

The architecture of the virtual machine also contains other components intended to support distribution (figure 3): a protocol stack for the message transfer service (MTS) of SOIS (Spacecraft On Board Interface Services) [3] over a SpaceWire local network, and an intermediation software layer based on PolyORB-HI [6], a middleware specialized for high-integrity systems and compatible with the Ravenscar profile.

4. Industrial experience

Within the ASSERT project, three pilot projects have been defined, led by public and private entities of great weight in the aerospace sector, including the European Space Agency (ESA), in order to validate the technologies and methods developed in the project, including the virtual machine described in this paper. These projects are:

- HRI: aimed at the domain of long-lived satellites with no possibility of maintenance;
- MPC: focused on distributed systems based on satellite flotillas;
- MA3S: focused on mission-critical systems on board unmanned space vehicles.

The final version of the ASSERT virtual machine has been used successfully in the above projects. Especially relevant is the HRI project, whose final presentation showed a system consisting of a satellite with two nodes (LEON2) connected by a SpaceWire bus, containing an Imager and an Altimeter that provide topographic data for later studies. The system used real software, including the management of telemetry and telecommands (data sent and received by the satellite, respectively), guidance and navigation control, and everything else needed for the real operation of a satellite.

5. Conclusions and future work

The approach to real-time software development described in this paper is based on the following principles:

- Separation of the functional aspects of the system from those related to synchronization and timing (concurrency and real-time model);
- Definition of a formal concurrency model, the Ravenscar computational model, on which the system models at the different levels of abstraction are based;
Figure 3: Architecture of the ASSERT virtual machine
[For each of the two nodes, the figure shows the layers: generated application code, ASSERT middleware, SOIS MTS comms services, comms drivers, ASSERT RT kernel and LEON 2 hardware, with both nodes linked by a SpaceWire communication channel.]
- Model-based software development, with transformations between models that preserve the timing properties of the system;
- Execution of the software on a specialized platform, the ASSERT virtual machine, which guarantees by construction that the timing requirements of the system are satisfied;
- Isolation between applications, which prevents a faulty application from compromising the deadlines established for the others.

This approach has shown its viability through the construction of a system with real hardware and software, coordinated by the European Space Agency. In the near future, the structure and the implementation of the virtual machine are expected to be improved with the following developments:

- Spatial partitioning, allowing the data of each application to be isolated in memory and thus protected against erroneous accesses caused by a faulty foreign application.
- Improved temporal isolation, using the new hardware available in future versions of the LEON hardware platform.

Acknowledgements

This work has been partially funded by the Spanish National R&D Plan (project TIN2005-08665-C03-01) and by the 7th Framework Programme of the European Union (project ASSERT, IST-004033). The authors wish to thank José Redondo and Jorge López for their contribution to the implementation of the virtual machine. The fundamental ideas on the component and transformation model are due to the team led by Tullio Vardanega at the University of Padua. The work on the PolyORB-HI middleware has been carried out at ENST, Paris, under the direction of Laurent Pautet and Jérôme Hugues. The MTS software that is part of the virtual machine has been developed at SciSys by Stuart Fowell and Marek Prochazka.

References

[1] Matteo Bordin and Tullio Vardanega. Correctness by construction for high-integrity real-time systems: A metamodel-driven approach. In Nabil Abdennadher and Fabrice Kordon, editors, 12th International Conference on Reliable Software Technologies, Ada-Europe 2007, number 4498 in LNCS, pages 114-127. Springer-Verlag, 2007.
[2] Alan Burns, Brian Dobbing, and George Romanski. The Ravenscar tasking profile for high integrity real-time programs. In Lars Asplund, editor, Reliable Software Technologies, Ada-Europe'98, number 1411 in LNCS, pages 263-275. Springer-Verlag, 1998.
[3] Consultative Committee for Space Data Standards (CCSDS). CCSDS Spacecraft On-board Interface Services. Green Book CCSDS 830.0-G-0.4, December 2004. Draft.
[4] ECSS. ECSS-E-40 Part 1B: Space engineering - Software - Part 1: Principles and requirements, November 2003. Available from ESA.
[5] ECSS. ECSS-Q-80B Space Product Assurance - Software Product Assurance, 2003. Available from ESA.
[6] Jérôme Hugues, Bechir Zalila, and Laurent Pautet. Middleware and tool suite for high integrity systems. In Proceedings of RTSS-WiP'06, pages 1-4, Rio de Janeiro, Brazil, December 2006. IEEE.
[7] OMG. MDA Guide Version 1.0.1, 2003. Available at www.omg.org/mda.
[8] S. T. Taft, R. A. Duff, R. L. Brukardt, E. Plöedereder, and P. Leroy, editors. Ada 2005 Reference Manual. Language and Standard Libraries. International Standard ISO/IEC 8652/1995(E) with Technical Corrigendum 1 and Amendment 1. Number 4348 in Lecture Notes in Computer Science. Springer-Verlag, 2006.
[9] Juan Zamorano, Alejandro Alonso, and Juan Antonio de la Puente. Building safety critical real-time systems with reusable cyclic executives. Control Engineering Practice, 5(7), July 1997.
[10] Juan Zamorano and José F. Ruiz. GNAT/ORK: An open cross-development environment for embedded Ravenscar-Ada software. In Eduardo F. Camacho, Luis Basañez, and Juan Antonio de la Puente, editors, Proceedings of the 15th IFAC World Congress. Elsevier Press, 2003.
Middleware based on XML technologies for achieving true interoperability
between PLC programming tools
E. Estevez, M. Marcos, F. Perez, D. Orive
Automatic Control and Systems Engineering Department, University of the Basque Country
Bilbao, Spain (e-mail: [email protected], [email protected], [email protected], [email protected])
Abstract: Industrial Process Measurement and Control Systems are used in most industrial sectors to achieve production improvement, process optimization and time and cost reduction. Integration, reuse, flexibility and optimization are demanded in order to adapt to a rapidly changing and competitive market, and standardization is key to achieving these goals. The international standardization efforts have led to the definition of the IEC 61131 standard. Part 3 of this standard defines a software model for defining automation projects, as well as five programming languages. Nowadays, most Programmable Logic Controller (PLC) vendors follow this standard, although each programming tool adds its own particularities and stores the automation project in a different manner. Thus, even though the tools may use the same software model and the same programming languages, source code reuse is not available. This work presents an infrastructure that allows transferring source code from one PLC programming tool to any other, transparently to the users. The proposal consists of a textual expression of the software model and the programming languages, as well as the mechanisms, based on XML technologies, to achieve tool interoperability.
1. INTRODUCTION
Nowadays most industrial sectors use Programmable Logic Controllers (PLCs) to control their productive systems. In recent years, technological advances in these controllers have enabled production improvement, process optimization and time and cost reduction. On the other hand, for many years only proprietary programming languages could be used for vendor-specific equipment. Although some languages, such as ladder diagram or instruction list, were very widespread, their implementations used to be rather different. The need for standardization in the field was obvious, covering everything from hardware and configuration issues up to the programming languages. In
1993, the International Electrotechnical Commission (IEC)
published the IEC 61131, International Standard for
Programmable Controllers (IEC, 2003).
The IEC 61131-3 standard deals with the software model and
programming languages for Industrial Process Measurement
and Control Systems (IPMCS) (Lewis, R.W, 1998), (John,
K.H and Tiegelkamp M, 2001). In this sense, it has prompted a movement towards Open Systems in this application field. Thus, so-called Open PLCs, open-architecture controllers that replace a conventional PLC with a computer, have begun to appear on the market.
Nowadays, most PLC vendors are making a great effort to become IEC 61131-3 compliant. This offers great advantages to control system engineers, as the programming techniques become vendor independent. Notwithstanding this, the standard does not specify an import/export format, but only the elements of the software model and the mechanisms to be offered to the user in order to define an application graphically. Thus, every tool uses its own storage format and commonly offers a set of Application Program Interface (API) functions or, alternatively, an import/export option. As a result, code programmed in one tool cannot be reused in others; it must be edited again. In order to achieve true reusability, interoperability among tools is needed. There are international organizations, such as PLCopen (1992), a vendor- and product-independent worldwide association, whose mission is to be the leading association resolving topics related to control programming. Its main goal is to support the use of international standards in this field.
PLCopen has several technical and promotional committees
(TCs). In particular, TC6 for XML has defined an open
interface between all different kinds of software tools, which
provides the ability to transfer the information that is on the
screen to other platforms. The eXtensible Markup Language
(XML) (W3C, 2006a) was selected for defining the interface
format and in April 2004, the first XML schema for the
graphical languages was released for comments (W3C,
2004).
Nevertheless, the PLCopen interface is not universally supported yet. Besides, the proposed interface focuses mainly on transferring what is on the screen, and thus adds graphical information as well as new elements to those used by the standard. Finally, it does not impose an architectural style, assuming that the code being transferred is correct.
The goal of the work presented here goes further, as an interoperability middleware is proposed. It consists of a common XML format for representing the IEC 61131-3 software model and languages, and the mechanisms to import/export information from/to every tool.
The layout of the paper is as follows: section 2 briefly
describes the elements of the IEC 61131-3 software model.
Section 3 presents the interoperability middleware that acts as
a common road for achieving true interoperability. Finally,
section 4 illustrates an example of interoperability between
two PLC programming tools.
2. THE IEC 61131-3 SOFTWARE MODEL
This section describes the software model proposed by IEC
61131-3 standard in order to identify the architectural style and composition rules that any IEC 61131-3 compliant application must meet.
The architectural style is identified in a Component-based
fashion. The component-based strategy aims at managing
complexity, shortening time-to-market, and reducing
maintenance requirements by building systems with existing
components. Generally speaking, software architectures are
defined as configurations of components and connectors.
An architectural style defines the patterns and the semantic
constraints on a configuration of components and connectors.
As such, a style can define a set or family of systems that
share common architectural semantics (Medvidovic, N. and
Taylor, R.N., 1997).
In (E. Estevez, et al, 2007a) the different components and connectors for defining the IEC 61131-3 software model are identified. Two types of components can be distinguished: those that do not encapsulate code (Configurations, Resources and Tasks), and the Program Organization Units (POUs), which encapsulate code. The latter can be Programs, Function Blocks and Functions.
Fig. 1. Architectural style of the IEC 61131-3 software model

On the other hand, in this model the variables act as connectors: they represent the communication between software components. In fact, their visibility identifies the components involved in the communication: VAR_ACCESS identifies the communication between programs residing in different configurations; VAR_GLOBAL at configuration level identifies the communication between programs residing in different resources of the same configuration; VAR_GLOBAL at resource level identifies the communication between programs of the same resource; finally, VAR_LOCAL identifies the communication among nested POUs. In Fig. 1 the IEC 61131-3 software model architectural style is illustrated using a meta-model expression. Every component and connector has its own characteristics, as defined in (E. Estevez, et al, 2007a). The architectural style needs to be combined with a set of composition rules in order to assure a correct software architecture definition. In Table 1 the identified composition rules are summarized.

TABLE 1. COMPOSITION RULES FOR IEC 61131-3 SW MODEL

1. The type of a global variable must be elementary or previously defined by the programmer.
2. The value of a global variable must be in concordance with its type.
3. The type of POU formal parameters must be elementary or previously defined by the programmer.
4. The value of POU instance parameters must be in concordance with its type.
5. An access variable must give permission to a previously defined variable.
6. Resources of the same configuration must be downloaded to the same processor.
7. Resource POU instances can only be organized by tasks of the same resource.

3. INTEROPERABILITY MIDDLEWARE

The proposed interoperability middleware is formed by two main modules. The first one consists of representing the IEC 61131-3 software model in a standard and generic format. The second is related to the integration mechanisms that allow connecting the tools through the middleware, achieving code exchange.

The proposed middleware is based on XML technologies. In particular, XML Schema (W3C, 2004) jointly with schematron rules (Rick, J., 2006) has been used for defining a common and generic format of the software model. This format takes into account both the architectural style and the composition rules. The integration techniques involve related XML technologies, such as XML stylesheets and the Document Object Model (DOM) (W3C, 2005) or the Simple API for XML (SAX, 2004), jointly with a programming language. Fig. 2 illustrates the general scenario of the proposed middleware for achieving true interoperability between any PLC programming tools.

Fig. 2. General scenario of interoperability of PLC programming tools
80
XI Jornadas de Tiempo Real (JTR2008)
Fig. 3 Architectural style of IEC 61131-3 software model in swEng markup Language
The following sub-sections define both modules in more
detail: a generic representation format of the software model
and the integration mechanisms.
3.1. Standard format for Reusable Code
As commented above, the generic format proposed for representing the IEC 61131-3 software model uses W3C Schema and schematron rules. In particular, each model element is represented as an XML schema element. The architectural style is captured by means of the choice, sequence and multiplicity mechanisms of W3C Schema (E. Estevez, et al, 2007a,b). On the other hand, the composition rules are enforced by means of the key/keyref constraints of W3C Schema (Van der Vlist, E., 2002) and also by means of schematron rules. Fig. 3 illustrates a general overview of the IEC 61131-3 standard software model. It shows the architectural style as well as composition rule number five of Table 1, which is implemented by means of the key/keyref mechanism.
The complete definition of this module is available in (swEngML, 2007). This interface can also be very useful to test the degree to which a programming tool is IEC 61131-3 compliant.
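The kind of constraint that the schema and the schematron rules express can be sketched programmatically. The check below implements composition rule 2 of Table 1 (value/type concordance for global variables) over a deliberately simplified document; the element and attribute names are invented for illustration and are not the actual swEngML grammar:

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified stand-in for a swEngML document.
SAMPLE = """
<swEng>
  <Configuration name="C1">
    <GlobalVar name="maxTemp" type="INT"  value="80"/>
    <GlobalVar name="alarmOn" type="BOOL" value="TRUE"/>
  </Configuration>
</swEng>
"""

# One predicate per elementary type (only two shown).
CHECKS = {
    "INT":  lambda v: v.lstrip("+-").isdigit(),
    "BOOL": lambda v: v in ("TRUE", "FALSE"),
}

def violations(doc):
    """Return the names of global variables whose value does not match
    the declared elementary type (composition rule 2 of Table 1)."""
    root = ET.fromstring(doc)
    return [g.get("name")
            for g in root.iter("GlobalVar")
            if not CHECKS.get(g.get("type"), lambda v: True)(g.get("value"))]
```

In the real middleware this checking is declarative, carried out by the schema validator and the schematron processor rather than by hand-written code.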
3.2. Integration Techniques
In this section, the different integration techniques that allow
transferring information between programming tools and the
interoperability middleware are analyzed.
The integration techniques depend on the export/import
format of the PLC programming tool, as well as on the API
the tool offers. XML technologies can be very useful for
implementing tool integration.
Three tool categories can be distinguished:

a) Tools that import/export information in XML format, such as MultiprogTM from KW Software (KW software, 2006) or Unity ProTM from Schneider (Unity ProTM, 2007).

b) Tools that import/export information in structured text format, e.g. CoDeSysTM from 3S (Smart Software Solutions) (CoDeSys, 2006)(Automation Alliance, 2006).

c) Tools that import/export information in any other format, e.g. ISaGRAFTM from ICS Triplex (ICS triplex, ISaGRAF, 2006) and Simatic AdministratorTM from Siemens (Administrador Simatic, 2006).
Following sub-sections describe the selected integration
technique for each type of PLC programming tool.
Tools with an import/export option in XML format. If the tool provides an XML interface, or allows exporting/importing projects to/from an XML file, the integration is practically direct. In this case, it is necessary to develop an XML stylesheet (W3C, 2006b). This XML technology can be used for processing an input XML file coming from the tool, filtering the information and transforming it, giving as output the reusable code expressed in the format proposed in this work (tool2standard.xsl). XSLT offers two types of templates that can be used to define the processing of the input file: a match template contains the processing to be applied to a particular XML element, and this processing can be organized by means of so-called named templates (Tidwell, D., 2001). The same XML technology is also used for filtering and transforming the reusable code expressed in standard form into the format of the target PLC programming tool (standard2tool.xsl).
In the first case the XSL match and named templates are tool dependent. The match templates of the second case are known, as they correspond to the elements of the IEC 61131-3 standard; notwithstanding this, the algorithms they contain are again tool dependent. In Table 2 the templates necessary for transforming the reusable code in standard format into tool format are summarized.
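The match-template idea behind tool2standard.xsl and standard2tool.xsl can be imitated in a short sketch. Python's standard library has no XSLT processor, so the following uses a dispatch table over element tags instead of real XSLT; the element names and the emitted output format are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A tiny imitation of XSLT match templates: one handler per element tag,
# each producing a fragment of tool-specific output (names invented).
def t_pou(elem, emit):
    emit(f"POU {elem.get('name')}")
    for child in elem:                      # like <xsl:apply-templates/>
        TEMPLATES.get(child.tag, t_default)(child, emit)

def t_var(elem, emit):
    emit(f"  VAR {elem.get('name')} : {elem.get('type')}")

def t_default(elem, emit):
    for child in elem:                      # no match: just recurse
        TEMPLATES.get(child.tag, t_default)(child, emit)

TEMPLATES = {"POU": t_pou, "VarInput": t_var}

def transform(doc):
    """Apply the 'templates' to an XML document, returning the output text."""
    out = []
    root = ET.fromstring(doc)
    TEMPLATES.get(root.tag, t_default)(root, out.append)
    return "\n".join(out)
```

As in XSLT, adding support for a new element means adding one more entry to the dispatch table, without touching the existing handlers.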
Tools that import/export structured text. Although the number of tools that allow exporting/importing information in XML format is increasing, it is currently not the common case. This sub-section describes the integration techniques for those tools that allow exporting/importing code to/from structured text format. In this case, it is necessary to know the file structure.

TABLE 2. LIST OF STANDARD2TOOL.XSL TEMPLATES

- <xsl:template match="sw:swEng">: main template; it guides the transformation.
- <xsl:template match="sw:DataTypes">: generates the derived data types.
- <xsl:template match="sw:POUs">: organizes the POU structure.
- <xsl:template match="*" mode="POU">: selects the type of POU.
- <xsl:template match="sw:Interface">, <xsl:template match="sw:Variables">, <xsl:template match="sw:Body">, <xsl:template match="sw:FBD">, ...: a set of templates for generating the POU interface and its functionality, expressed in any of the five languages of IEC 61131-3.
- <xsl:template match="sw:Configuration">, <xsl:template match="sw:Resource">, <xsl:template name="globalVars">, <xsl:template match="sw:Task">, <xsl:template match="sw:ProgInst">: a set of templates for generating the automation project itself.

There are different technologies that allow transforming structured text into XML, such as the Chaperon project (Stephan M, 2000). It can also be achieved by developing an application that takes the structured text file as input and uses DOM or SAX methods to generate an XML file. Finally, an XML stylesheet is necessary for transforming this XML file into the standard format (see Fig. 4).

If the tool needs to import source code, XML stylesheets can be used to transform the code expressed in the generic XML format into a text file that follows the structure the tool expects. Besides the templates of Table 2, it is also necessary to indicate to the XSLT processor the extension of the resulting text file. Fig. 4 illustrates the general scenario for exporting and importing reusable code.

Fig. 4. General scenario for importing/exporting reusable code in structured text format

Tools that import/export in any other format. Currently, this is the most common case: tools neither have an XML interface nor export/import code to/from structured text. In this case the integration technique consists of developing an application that makes use of functions provided by the tool API, as well as of methods and functions offered by SAX or DOM.

Thus, the integration techniques for capturing information from the tool and expressing it in a generic format consist of an application that generates an initial XML file, plus an XML stylesheet for filtering and transforming the information into the standard format.

The application programming language depends on the tool API, and the application needs:

- The functions provided by the tool API for getting information from the automation project.
- The functions or methods provided by SAX or DOM, which are very useful for generating an initial XML file.

In the same way, the integration techniques for setting information coming from the generic XML file into the tool consist of an XML stylesheet that adds the tool particularities to the XML file, plus an application that reads this XML file and sets all the information in the storage format of the target tool. The programming language of this application also depends on the tool API, and it makes use of:

- SAX or DOM methods for reading and manipulating the input XML file.
- Tool API functions for setting the information into the target tool storage format.

Fig. 5. General scenario for getting/setting information via the tool API

Fig. 5 illustrates the general scenario for getting information from the tool via its API and transforming it into the generic format. The figure also illustrates the mechanisms for setting the code expressed in the standard format (ReusableCode.xml) into the target tool storage format.
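The structured-text route can be sketched with Python's standard library instead of Chaperon. The export format below is a heavily simplified, invented stand-in for a real tool export (it is not CoDeSys's actual grammar), and the XML element names are likewise illustrative:

```python
import re
import xml.etree.ElementTree as ET

# Invented, heavily simplified structured-text export of the inRange POU.
ST_EXPORT = """\
FUNCTION inRange : BOOL
VAR_INPUT
  low : INT;
  x : INT;
  high : INT;
END_VAR
inRange := AND(LT(low, x), GT(high, x));
END_FUNCTION
"""

def st_to_xml(text):
    """Translate the simplified export into an XML tree (the 'initial
    XML file' of the integration technique)."""
    header = re.match(r"FUNCTION\s+(\w+)\s*:\s*(\w+)", text)
    pou = ET.Element("POU", name=header.group(1),
                     pouType="FUNCTION", returnType=header.group(2))
    iface = ET.SubElement(pou, "Interface")
    in_vars = re.search(r"VAR_INPUT(.*?)END_VAR", text, re.S).group(1)
    for name, typ in re.findall(r"(\w+)\s*:\s*(\w+)\s*;", in_vars):
        ET.SubElement(iface, "VarInput", name=name, type=typ)
    body = ET.SubElement(pou, "Body")
    body.text = text.split("END_VAR", 1)[1].replace("END_FUNCTION", "").strip()
    return pou
```

A stylesheet (or a transformer like the one sketched earlier) would then filter this initial XML into the standard format.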
4. CASE STUDY
This section illustrates the proposed interoperability framework as applied to transferring one POU programmed in CoDeSysTM from 3S to MultiprogTM from KW Software. The first tool allows exporting/importing information to/from structured text format; the second follows the interface proposed by PLCopen TC6 XML. The code to be transferred is a very simple example of a function (inRange) written in the Function Block Diagram language. This POU checks whether the content of a variable is between a minimum and a maximum. To do this, three standard functions are used (LT, GT and AND). Fig. 6 illustrates this POU programmed in the CoDeSysTM PLC programming tool.

The first step to achieve interoperability is to export the code in text format. This definition contains the POU interface, formed by three input formal parameters, and the body, expressed by reserved commands which represent the functionality, originally in FBD, in text format (note that this is a particularity of the CoDeSysTM tool).
In the second step (see Fig. 4) an application programmed in Java using DOM translates the textual file into an XML file. An XML stylesheet is applied to this file, obtaining ReusableCode.xml (illustrated in Fig. 7).

Fig. 6. POU example programmed in CoDeSysTM

Fig. 7. Project expressed in generic format (ReusableCode.xml)

Fig. 8. Example in PLCopen TC6 XML format
The third step consists of transforming the generic-format XML file to the PLCopen TC6 XML grammar. This transformation is done by means of an XML stylesheet. The file resulting from the transformation is illustrated in Fig. 8. This file contains graphical information, sometimes as attributes of an XML element (e.g. a block has width and height attributes) and sometimes as a new element of the schema (e.g. a child element of a block is its position on the screen). Finally, the MultiprogTM tool can import this file. The resulting project is illustrated in Fig. 9.
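The two placements of graphical data described above can be sketched as follows; the element and attribute names only approximate the PLCopen TC6 schema and are not taken from the actual transformation:

```python
import xml.etree.ElementTree as ET

def make_block(type_name, local_id, width, height, x, y):
    # Size is carried as attributes of the block element itself...
    block = ET.Element("block", typeName=type_name, localId=str(local_id),
                       width=str(width), height=str(height))
    # ...while the screen position is a child element of the block.
    ET.SubElement(block, "position", x=str(x), y=str(y))
    return block

lt = make_block("LT", 1, 60, 40, 100, 80)
print(ET.tostring(lt, encoding="unicode"))
```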
Fig. 9. InRange POU reused in MultiprogTM KW
Thus, by means of the proposed integration middleware, the inRange POU initially programmed in CoDeSysTM can be reused in the MultiprogTM PLC programming tool. It is important to remark that the transformations, which are the task of the middleware, are transparent to the user.
5. CONCLUSIONS
This paper has presented a formal approach that allows transferring source code among PLC programming tools from different vendors. True interoperability between tools is thus achieved by means of the proposed middleware. The potential of XML technologies has been exploited to develop the interoperability middleware. In particular, an XML schema together with Schematron rules forms the core of the proposed middleware. They have been used to express the IEC 61131-3 software model in XML, taking into account the architectural style and composition rules. The middleware also offers mechanisms for tool integration making use of related XML technologies, such as XSLT, SAX and DOM. These techniques allow both transforming any tool model into a generic XML format and obtaining vendor-understandable information from this generic XML format.
6. ACKNOWLEDGEMENTS
This work was financed by the MCYT&FEDER under DPI2006-4003 and DIPE 06-16.
REFERENCES
Administrador Simatic (2006). Available at: http://www.automation.siemens.com/
Automation Alliance (2006). Available at: http://www.automation-alliance.com/
CoDeSys (2006). CoDeSys of Smart Software Solutions. Available at: http://www.3s-software.com/
Estévez, E., M. Marcos and D. Orive (2007a). Automatic Generation of PLC Automation Projects from Component-Based Models. The International Journal of Advanced Manufacturing Technology. Springer London. [Online] Available at: http://www.springerlink.com/content/1517w52443416526/
Estévez, E., M. Marcos, D. Orive, E. Irisarri and F. Lopez (2007b). XML based Visualization of the IEC 61131-3 Graphical Languages. Proc. of the 5th International Conference on Industrial Informatics (INDIN 2007), pp. 279-285. Vienna, Austria.
IEC (2003). International Electrotechnical Commission. IEC International Standard IEC 61131-3, Programmable Controllers. Part 3: Programming Languages.
ICS Triplex (2006). ISaGRAF. Available at: http://www.isagraf.com/
John, K.H. and Tiegelkamp, M. (2001). Programming Industrial Automation Systems. Springer.
KW Software (2006). Available at: http://www.kw-software.com/
Lewis, R.W. (1998). Programming Industrial Control Systems using IEC 61131-3. IEE Control Engineering Series.
Medvidovic, N. and Taylor, R.N. (1997). Exploiting architectural style to develop a family of applications. IEE Proc. Software Eng. 144 (5-6), pp. 237-248.
PLCopen (1992). Web site: http://www.plcopen.org
Rick, J. (2006). Resource Directory (RDDL) for Schematron 1.5. Web site: http://xml.ascc.net/schematron/
SAX (2004). Simple API for XML (SAX). Web site: http://www.saxproject.org/
Stephan, M. (2000). Chaperon Project. Web site: http://chaperon.sourceforge.net/index.html
swEngML (2007). IEC 61131-3 Markup Language. Available at: http://www.disa.bi.ehu.es/gcis/projects/merconidi
Tidwell, D. (2001). XSLT. O'Reilly.
Unity Pro (2007). Unity ProTM from Schneider. Available at: http://www.schneider-electric.com/
Van der Vlist, E. (2002). XML Schema. O'Reilly.
W3C (2004). XML Schema Part 0: Primer (Second Edition), W3C REC-xmlschema-0-20041028. Available at: http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/
W3C (2005). Document Object Model (DOM). Web site: http://www.w3.org/DOM/
W3C (2006a). eXtensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation. Available at: http://www.w3.org/TR/2006/REC-xml-20060816/
W3C (2006b). Extensible Stylesheet Language (XSL) Version 1.1, W3C Proposed Recommendation PR-xsl11-20061006.
Real-Time Distribution Middleware from the Ada
Perspective1
Héctor Pérez, J. Javier Gutiérrez, Daniel Sangorrín, and Michael González Harbour
Computers and Real-Time Group
Universidad de Cantabria, 39005 - Santander, SPAIN
{perezh, gutierjj, daniel.sangorrin, mgh}@unican.es
http://www.ctr.unican.es/
Abstract. Standards for distribution middleware sometimes impose restrictions
and often allow the implementations to decide on aspects that are fundamental to
the correct and efficient behaviour of the applications using them, especially
when these applications have real-time requirements. This work presents a study
of two standard approaches for distribution middleware and the problems associated with their implementations in real-time systems. Moreover, the paper considers the problem of integration of the distribution middleware with a new
generation of scheduling mechanisms based on contracts. The standards considered are RT-CORBA, and the Distributed Systems Annex (DSA) of the Ada programming language.
Key words: distribution middleware, real-time, real-time communications, RT-CORBA, Ada DSA, performance.
1 Introduction
The concept of a distributed application is not new; it has existed since two computers
were first connected. However, the programming techniques of these systems have
evolved greatly and they have become especially relevant in the last decade.
Traditionally, message-passing mechanisms were used for communication among the parts of a distributed application, with the communication coded explicitly by the programmer. Since then, new object distribution
techniques have evolved, for instance using Remote Procedure Calls (RPCs) that allow
operations to be transparently used regardless of whether the functionality is offered in
the local processor or in a remote one.
The object distribution paradigm is probably the most relevant in current industrial
applications, and an important example is the CORBA standard [12]. This standard
also includes other, higher-level distribution techniques such as CCM (CORBA Component Model) or DDS (Data Distribution Service), but their degree of
1. This work has been submitted to the 13th International Conference on Reliable Software
Technologies, Ada-Europe 2008.
2. This work has been funded in part by the Spanish Ministry of Science and Technology under grant
number TIC2005-08665-C03-02 (THREAD), and by the IST Programme of the European
Commission under project FP6/2005/IST/5-034026 (FRESCOR). This work reflects only the
author’s views; the EU is not liable for any use that may be made of the information contained herein.
acceptance in industry is still lower. CORBA provides an object distribution
mechanism based on a language for the specification of interfaces (IDL, Interface
Definition Language) that enables the use of different programming languages in the
development of an application.
In addition to distribution standards, there are programming languages that allow the
development of distributed applications. This is the case of Java (a de facto standard)
with its specification for distributed systems, Java RMI (Java Remote Method
Invocation) [17], based on the distribution of objects. Also, the Ada standard allows
distribution through its DSA (Distributed Systems Annex, Annex E) [19], which
supports both distribution of objects and RPCs.
This work will focus on the real-time perspectives of distribution with the CORBA and
Ada standards. It does not consider Java RMI as real-time aspects have not yet been
fully addressed in commercial implementations. RT-CORBA [13] offers the CORBA
specification for real-time systems, and although Ada’s DSA is not specifically
designed for real-time systems, there are works that demonstrate that it is possible to
write real-time implementations within the standard [14][7][8]. The aims of this work
are to make a comparative study of the functionality offered by these standards for
implementing distributed real-time applications, an analysis of some of their
implementations from the viewpoint of management of calls to remote resources, and
an experimental evaluation of the response times that can be obtained in remote calls
in order to get an idea of the overheads introduced. Sometimes, the real-time
distribution middleware is developed over operating systems and network protocols
that are not real time. In this work some of the middleware implementations are
evaluated over a real-time platform.
The evolving complexity of real-time systems has led to the need for using more
sophisticated scheduling techniques, capable of simultaneously satisfying multiple
types of requirements such as hard real-time guarantees and quality of service
requirements, in the same system. To better handle the complexity of these systems,
instead of asking the application to interact directly with the scheduling policies,
scheduling services of a higher level of abstraction are being designed, usually based
on the concept of resource reservations [2]. The FRESCOR European Union project
[3] in which we are participating is aimed at investigating these aspects by creating a
contract-based scheduling framework. In this framework the application specifies its
requirements for resource usage through service contracts that are negotiated with the
system to obtain a resource reservation that optimizes the quality of service, while
providing guarantees of a minimum level of resource usage. The FRESCOR
framework is aimed at managing multiple resources such as processors,
communication networks, memory, disk bandwidth, and energy. In such an
environment it is necessary to study the relationship between the distribution
middleware and the scheduling framework. In [8], some initial ideas were given about
the integration of middleware and advanced scheduling services, and in this paper we
extend those ideas to address the problem of handling distributed transactions.
Our research group has been working for several years with real-time distributed
systems programmed in Ada. We believe that although the DSA is not very popular, it
enables straightforward programming of distributed systems, and it makes it possible
to incorporate into the real-time middleware concepts such as distributed transactions
[8] and flexible scheduling [2][3]. Thus, another objective of this work is to establish
the basis for incorporating the experience acquired in systems programmed in Ada into
the world of RT-CORBA.
The document is organized as follows. Section 2 is dedicated to the presentation of the
basic characteristics of the distribution middleware based on RT-CORBA and Ada’s
DSA, and their implementations. Section 3 analyses in detail the aspects of scheduling,
distribution mechanisms, and management of the remote calls proposed in the two
standards and their implementations. The evaluation and comparison of the response
times of calls to remote operations for these implementations is dealt with in Section 4.
Section 5 proposes the integration of the distribution middleware with the scheduling
framework for flexible scheduling. Finally, Section 6 draws the conclusions and
considers future work.
2 Real-Time Distribution Middleware
This section will describe the scheduling models of RT-CORBA and of the DSA for
the execution of remote calls and will discuss how the distributed transaction model
can be supported. Furthermore, the different implementations to be analysed, all of
which are open source code, will be briefly introduced.
A distributed transaction is defined as a part of an application consisting of multiple
threads executing code in multiple processing nodes, and exchanging messages with
information and events through one or more communication networks. In a
transaction, events arriving at the system trigger the execution of activities, which can
be either task jobs in the processors or messages in the networks. These activities, in
turn, may generate additional events that trigger other activities, and this gives way to
a chain of events and activities, possibly with end-to-end timing requirements [8]. This
model is traditionally used for analysing the response time in real-time distributed
applications.
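The transaction model just described can be sketched as a small data structure. This is an illustrative Python model (the names are invented, and the latency computation deliberately ignores interference, blocking and jitter, which a real response-time analysis must include):

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str          # a task job on a processor or a message on a network
    resource: str      # e.g. "cpu1" or "net"
    wcet: float        # worst-case execution or transmission time

@dataclass
class Transaction:
    trigger: str                       # external event releasing the chain
    activities: list = field(default_factory=list)
    deadline: float = 0.0              # end-to-end timing requirement

    def worst_case_latency(self):
        # Ignoring all interference, the chain's latency is the sum of its
        # activities' worst-case times; each activity's completion is the
        # event that triggers the next one.
        return sum(a.wcet for a in self.activities)

t = Transaction("sensor_event",
                [Activity("job1", "cpu1", 2.0),
                 Activity("request_msg", "net", 1.5),
                 Activity("job2", "cpu2", 3.0)],
                deadline=10.0)
print(t.worst_case_latency() <= t.deadline)  # prints True for this toy chain
```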
The main characteristics of the architecture proposed by RT-CORBA in its
specification [12] with respect to scheduling are the following:
• Use of threads as scheduling entities, for which an RT-CORBA priority can be
applied and for which there are functions for conversion to the native priorities of
the system on which they run.
• Use of two models for the specification of the priority of remote calls (following
the Client-Server model): Client_Propagated (the invocation is executed in the
remote node at the priority of the client, which is transmitted with the request
message), and Server_Declared (all the requests to a particular object are executed
at a priority preset in the server). In addition, it is possible for the user to define
priority transformations that modify the priority associated with the server. This is
done with two functions called inbound (which transforms the priority before
running the server's code) and outbound (which transforms the priority with which
the server makes calls to other remote services).
• Definition of Threadpools as mechanisms for managing remote requests. The
threads in the pool may be preallocated, or can be created dynamically. There may
be several groups of threadpools, each group using a specific priority band.
• Definition of Priority-Banded Connections. This mechanism is proposed for
reducing priority inversions when a transport protocol without priorities is used.
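The two priority-specification models and the inbound transform listed above can be sketched as a small dispatch function. This is a simplified behavioural model, not the RT-CORBA API; the priority values and the transform are illustrative:

```python
def server_side_priority(model, client_priority, declared_priority,
                         inbound=lambda p: p):
    # Client_Propagated: the request message carries the client's RT-CORBA
    # priority and the invocation runs at it on the remote node.
    # Server_Declared: all requests to the object run at a preset priority.
    if model == "CLIENT_PROPAGATED":
        base = client_priority
    elif model == "SERVER_DECLARED":
        base = declared_priority
    else:
        raise ValueError(model)
    # The user-defined inbound transform may adjust the priority before the
    # server's code runs (an outbound transform would similarly apply to
    # nested calls the server makes to other remote services).
    return inbound(base)

print(server_side_priority("CLIENT_PROPAGATED", 12, 20))                   # 12
print(server_side_priority("SERVER_DECLARED", 12, 20))                     # 20
print(server_side_priority("CLIENT_PROPAGATED", 12, 20, lambda p: p + 1))  # 13
```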
Ada’s DSA does not specify any mechanism for transmission of priorities, so this is left to the criterion of each implementation. What is specified is
that it must provide mechanisms for executing concurrent remote calls, as well as
mechanisms for waiting until the return of the remote call. The communication among
active partitions is carried out in a standard way using the Partition Communication
Subsystem (PCS). The concurrency and the real-time mechanisms are supported by
the language itself with tasks, protected types and the services specified in Annex D. In
[4], a mechanism for handling the transmission of priorities in the DSA is proposed.
This mechanism is in principle more powerful than that of RT-CORBA, as it allows
total freedom in the assignment of priorities both in the processors and in the
communication networks used.
The specification of RT-CORBA incorporates a chapter dedicated to dynamic
scheduling, which basically introduces two concepts:
• The possibility of introducing other scheduling policies in addition to the fixed
priority policy, such as, EDF (Earliest Deadline First), LLF (Least Laxity First),
and MAU (Maximize Accrued Utility). The scheduling parameters are defined as
a container that can contain more than one simple value, and can be changed by
the application dynamically.
• The Distributable Thread that allows end-to-end scheduling and the identification
of Scheduling Segments each one of which can be run on a processor. This concept
is similar to the distributed transaction presented in [8].
Ada included in its latest revision the EDF and Round Robin scheduling policies as part of its Real-Time Systems Annex (Annex D). Nevertheless, it does not contemplate the existence of distributed transactions.
Neither RT-CORBA nor Ada’s DSA consider the possibility of passing scheduling
parameters to the communications networks.
This work analyses and assesses the following implementations of RT-CORBA and the
DSA:
• TAO [18] is an open source implementation of RT-CORBA that has been evolving
for several years. The applications are programmed in C++ and the version we
have used (1.5) runs on Linux and TCP/IP. It is offered as an implementation of
the complete specification.
• PolyORB [15][20] is presented as a “schizophrenic” middleware that can support
distribution with different personalities such as CORBA, RT-CORBA, or DSA. It
is distributed with the GNAT compiler [1] and in principle it is envisaged for
applications programmed in Ada. The version used (2007) supports CORBA and
some basic notions of RT-CORBA (priorities and their propagation), and allows
distribution through the DSA although it does not incorporate the scheduling
mechanisms. The execution platform is Linux and TCP/IP.
• GLADE [14] is the original implementation of the DSA offered by GNAT [1] to
support the development of distributed applications with real-time requirements.
The scheduling is done through fixed priorities and implements two policies for
distribution of priorities in the style of RT-CORBA (Client Propagated and Server
Declared). The 2007 version is used, and once again the execution platform is
Linux and TCP/IP.
• RT-GLADE is a modification of GLADE that optimizes its real-time behaviour.
There are two versions: in the first one [7] free assignment of priorities in remote
calls is permitted (both in the processors and in the communication networks).
The second version [8] proposes a way of incorporating distributed transactions
into the DSA and giving support to different scheduling policies in a distributed
system. The execution platform is MaRTE OS [9] and the network protocol is RT-EP [10]. This communication protocol is based on token passing in a logical ring
over standard Ethernet, and it supports three different scheduling policies: fixed
priorities, sporadic servers, and resource reservations through contracts [2][3].
3 Analysis of Distribution Middleware Implementations
The objective of this section is to analyse the mechanisms for management of remote
calls used by the implementations of RT-CORBA or DSA to support their respective
specifications. It also discusses the properties of the solutions adopted and
whether they can be improved, both in the standards and in their implementations.
3.1. Implementations of RT-CORBA and DSA
From the viewpoint of management of remote calls, TAO defines several elements that
can be configured [16]:
• Number of ORBs. The ORB is the management unit of the calls to a service.
There may be several or only one, given that each ORB can accept requests from
different parts of the application.
• The strategy of the concurrency server. Two models are defined: The reactive one,
in which a thread is executed to provide service to multiple connections; and a
thread-per-connection, in which for each new connection the ORB creates a
thread to serve it.
• The threadpools. Two types of thread groups are defined with two different
behaviours. In the ORB-per-Thread model each thread has an associated ORB that
accepts and processes the services requested. In the Leader/Followers model the
user can create several threads and each ORB will select them in turn so that they wait for and serve new requests arriving from the network.
For the management of remote calls, PolyORB defines the following configurable
elements [15]:
• ORB tasking policies. Four policies are defined:
- No_Tasking: the ORB does not create threads and uses the environment task to
process the jobs
- Thread_Pool: a set of threads is created at start-up time; this set can grow up to
an absolute maximum, and unused threads are removed from it if its size
exceeds a configurable intermediate value.
- Thread_per_Session: a thread is created for each session that is opened
- Thread_per_Request: a thread is created for each request that arrives and is
destroyed when the job is done
• Configuration of the tasking runtimes. It is possible to choose among a Ravenscar-compliant, no-tasking, or full-tasking runtime system.
• ORB control policies. Four policies are defined that affect the internal behaviour
of the middleware:
- No Tasking: a loop monitors I/O operations and processes the jobs
- Workers: all the threads are equal and they take turns monitoring the I/O operations
- Half Sync/Half Async: one thread monitors the I/O operations and adds the
requests to a queue, and the other threads process them
- Leader/Followers: similar to TAO, the threads take turns in awaiting and
processing requests
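The Thread_Pool sizing rule listed above (a start-up minimum, growth on demand up to an absolute maximum, and trimming of idle threads above an intermediate stable value) can be sketched as follows. This is a behavioural model only, with invented names, not PolyORB's actual Ada implementation:

```python
class ThreadPoolPolicy:
    """Sketch of a Thread_Pool sizing rule: start with a minimum number of
    threads, grow on demand up to a hard maximum, and trim idle threads
    whenever the pool exceeds an intermediate 'stable' size."""

    def __init__(self, minimum, stable, maximum):
        assert minimum <= stable <= maximum
        self.stable, self.maximum = stable, maximum
        self.size = minimum          # threads created at start-up time
        self.busy = 0

    def request_arrived(self):
        if self.busy == self.size and self.size < self.maximum:
            self.size += 1           # grow on demand, bounded by the maximum
        if self.busy < self.size:
            self.busy += 1
            return True              # a worker picked up the job
        return False                 # saturated: the request must wait

    def request_done(self):
        self.busy -= 1
        if self.size > self.stable:  # trim an idle thread above the stable size
            self.size -= 1

pool = ThreadPoolPolicy(minimum=2, stable=4, maximum=8)
for _ in range(6):
    pool.request_arrived()
print(pool.size)   # prints 6: the pool grew on demand
for _ in range(6):
    pool.request_done()
print(pool.size)   # prints 4: trimmed back to the stable size
```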
The implementation of the DSA carried out in GLADE defines a group of threads to
process the requests with similar parameters to those of PolyORB in terms of the
number of threads (minimum number of threads created at start-up time, stable value
and absolute maximum), and uses another two intermediate threads for the requests;
one awaits the arrival of requests from the network, and the other one processes these
requests and selects one of the threads of the group to finally process the job.
The modifications made to GLADE to obtain the first version of RT-GLADE
eliminated one of the intermediate threads, so that there was a thread waiting for
requests arriving from the net, which in turn activated one of the threads of the group
to carry out the job (similar to the Half Sync/Half Async of PolyORB but without the
intermediate queue). In the second version of RT-GLADE, an API was provided to
allow an explicit configuration of the threads that execute the jobs, and they are
designed to wait directly on the net. This is done through the definition of
communication endpoints which handle the association with the remote thread and
support the scheduling parameters for the network. These parameters, that can be
complex, are associated with the appropriate entity when a distributed transaction is
installed and do not need to be transmitted each time the remote service is called.
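The endpoint idea can be sketched as follows. This is a hypothetical Python model, not RT-GLADE's actual Ada API: scheduling parameters are bound to an endpoint once, when the distributed transaction is installed, so each remote call only needs to carry a small identifier:

```python
class Endpoint:
    """Holds the (possibly complex) scheduling parameters installed for one
    distributed transaction: parameters for the network messages and for
    the thread that will serve the requests."""
    def __init__(self, transaction_id, net_params, thread_params):
        self.transaction_id = transaction_id
        self.net_params = net_params
        self.thread_params = thread_params

REGISTRY = {}  # endpoints installed ahead of time, keyed by transaction id

def install(transaction_id, net_params, thread_params):
    REGISTRY[transaction_id] = Endpoint(transaction_id, net_params, thread_params)

def remote_call(transaction_id, payload):
    # The request message carries only the small transaction id; the receiver
    # looks up the scheduling parameters installed beforehand, so they do not
    # have to be transmitted on every call.
    ep = REGISTRY[transaction_id]
    return {"sched": ep.thread_params, "payload": payload}

install(7, net_params={"priority": 12}, thread_params={"policy": "FP", "prio": 15})
reply = remote_call(7, payload=("add", 2, 3))
print(reply["sched"]["prio"])  # prints 15
```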
TAO, PolyORB, and GLADE all use the priority assignment policies defined in RT-CORBA. In contrast, in the first version of RT-GLADE [7] free assignment of
priorities is allowed for the remote services and for the request and reply messages.
This approach enables the use of optimization techniques in the assignment of
priorities in distributed systems.
On the other hand, in the second version of RT-GLADE [8], the definition of the
connection endpoints allows the programming of distributed transactions, which are
identified just by specifying a small number at the beginning of the transaction.
Moreover, the transaction is executed with the scheduling parameters associated to its
threads and messages. This concept is similar to the distributable thread of RT-CORBA, except that this specification never takes the network scheduling into
account. TAO implements this part of the dynamic scheduling of RT-CORBA, in
which dynamic changing of the scheduling parameters of a scheduling segment is
permitted [5].
In this work, we have made a prototype port of PolyORB to the MaRTE OS [9]
real-time operating system and we have adapted it to the RT-EP real-time network
protocol [10]. The CORBA personality (PolyORB-CORBA) allows the use of the
control policies of the ORB defined in PolyORB. The DSA personality of PolyORB
does not currently allow the definition of any particular control policy. For this
personality (PolyORB-DSA), a basic version of the scheduling defined in [8] has been
implemented over our real-time platform.
3.2. Discussion
This section discusses the analogies and differences found both in the implementations
and in the standards themselves, and their suitability for real time, through the
objectives that real-time distribution middleware must pursue. Among these objectives are the following:
• Allow a schedulability analysis of the complete application. Although the
middleware is executed in the processor and it is no more than a user of the
networks through clearly separate interfaces, in many cases the timing behaviour
of the networks has a strong influence on the overall response times, and therefore
the networks should be scheduled with appropriate techniques [6]. The
middleware should incorporate the ability to specify the scheduling parameters of
the networks through suitable models. RT-GLADE could be used as a reference
[8].
• Transactions or distributable threads. In agreement with the previous point, the
transactions or distributable threads should incorporate all the information about
scheduling in the processors and networks, either in the model proposed by RT-CORBA or in the one proposed in RT-GLADE [8].
• Control of remote calls. The task models implemented in TAO and PolyORB can
be used as a reference, adding an extra case in which there is one thread per kind
of request, directly waiting on the net (as in the second version of RT-GLADE).
The latter case can be useful in flexible scheduling environments when threads
execute under contracts and the cost of negotiating or changing contracts is very
high. In the case when there are intermediate threads for managing remote calls
(GLADE, RT-GLADE or PolyORB) it is important to control their scheduling
parameters. This is also the case of groups of threads in which threads can execute
with different parameters each time.
• Allow the free assignment of scheduling parameters. This is the approach used in
RT-GLADE. In RT-CORBA there is a specification for static real-time systems,
and an extension for dynamic real-time systems (see Section 3 in [13]). The
specification for static systems imposes restrictions on the assignment of
priorities, but these restrictions are removed in the specification for dynamic
systems, in which it is possible for the implementation to define scheduling
policies.
• Integration of the distribution middleware itself with more complex scheduling
frameworks that allow building flexible real-time systems.
Finally, and although this could be considered a subjective criterion, the distribution middleware should pursue the aim of programming simplicity. CORBA is far from fulfilling this aim.
4 Evaluating Distribution Middleware Implementations
The objective of this section is to provide an idea about the overhead introduced by the
analysed implementations in a distributed application. A hardware platform consisting of two 800-MHz AMD Duron processors connected by a 100-Mbps Ethernet has been used, together with the following two software platforms:
• Linux kernel 2.6.10 with TCP/IP to evaluate the implementations of TAO,
PolyORB with CORBA personality and GLADE.
• MaRTE OS 1.6.9a with RT-EP to evaluate PolyORB-CORBA, PolyORB-DSA
and MaRTE OS 1.4 for RT-GLADE. The first version of RT-GLADE is used, which runs on the same platform to which PolyORB was ported.
The tests will measure the execution time of a remote operation that adds two integers
and returns the result. The measurement is carried out from the time when the call is
made until the response is returned. This operation will be carried out in two modes:
alone and with four other clients carrying out the same operation, but with a lower
priority. The objective is not to obtain exhaustive measurements of the platform, but an
idea of the performance that can be achieved with the middleware. In all the tests the
operation to be evaluated is executed 10,000 times, and the average, maximum, and
minimum times are evaluated.
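The measurement procedure can be sketched as a simple harness. Here a local stand-in replaces the remote operation and the iteration count is reduced, so the numbers produced are not comparable to the measurements reported in the tables; only the shape of the experiment (timing each call from invocation until the reply returns, then reporting average, maximum and minimum) is illustrated:

```python
import time

def measure(operation, iterations=10_000):
    # Time each round trip, from the moment the call is made until the
    # response is returned; in the paper's tests the operation is a remote
    # addition of two integers, here a local function stands in for it.
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        operation(2, 3)
        samples.append((time.perf_counter() - t0) * 1e6)  # microseconds
    return (sum(samples) / len(samples), max(samples), min(samples))

avg, worst, best = measure(lambda a, b: a + b, iterations=1000)
print(f"avg={avg:.2f}us max={worst:.2f}us min={best:.2f}us")
```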
Table 1 shows the results of the measurements taken with the Linux platforms, using
the middleware configurations that introduce the least overhead. For the case of a
single client in TAO the reactive concurrency model with a single thread in the group
has been used. In PolyORB the model with full tasking without internal threads has
been used for the experiment with one client. For the five-client case both in TAO and
in PolyORB a configuration of a group of 5 threads with a Leader/Followers model
has been used. In GLADE, a static group of threads equal to the number of clients is
defined. The priority specification model for TAO, PolyORB and GLADE was the
client propagated one. In order to make the middleware overhead measurements
comparable, the temporal cost of using the net is also evaluated. Thus, Table 1 includes
the average, maximum and minimum times for the case when a message is sent and a
Table 1. Response times in Linux for one and five clients

                     |  Times for one client (μs)  | Times for the highest priority
                     |                             | client with five clients (μs)
                     |   Avg.     Max.     Min.    |   Avg.     Max.     Min.
TAO                  |    914     8044      861    |   1072     8237      846
PolyORB-CORBA        |   1276     3176     1214    |   7610    12675     4447
GLADE                |    333     2201      303    |   1566     3960      601
Stand-alone network  |    113      366      109    |     --       --       --
Table 2. Response times in MaRTE OS for one and five clients

                     |  Times for one client (μs)  | Times for the highest priority
                     |                             | client with five clients (μs)
                     |   Avg.     Max.     Min.    |   Avg.     Max.     Min.
PolyORB-CORBA        |   4013     4033     3996    |   6303     7988     4412
PolyORB-DSA          |   2990     3017     2963    |   4609     5104     3640
RT-GLADE             |   1119     1309     1065    |   1490     1642     1079
Stand-alone network  |    727      735      723    |     --       --       --
response is received; the program on the server side answers immediately upon
reception.
In the results obtained for one client in Linux, it can be observed that GLADE achieves better times than TAO and PolyORB, which demonstrates that it has lighter code. The maximum times obtained for TAO should be noted, because they are remarkably higher than the average times in both the one-client and the five-client cases. The explanation may be that Linux is not designed for hard real-time systems and introduces jitter. The average numbers for one and five clients show large differences in PolyORB and GLADE, while in TAO they are relatively similar. We can conclude that TAO manages the priorities and the queues better on this platform.
Table 2 shows the results of the measurements carried out over the three implementations on the MaRTE OS/RT-EP platform. The configuration of PolyORB-CORBA is the same as for Linux. The PolyORB-DSA configuration creates a task explicitly to serve the remote requests. The configuration of the group of threads for
RT-GLADE is made equal to the number of clients, that is, five. As for the RT-EP
network, the parameter corresponding to the delay between arbitration tokens is set to
a value of 150 μs. This value limits the overhead in the processor due to the network.
A simple transmission in the network is also evaluated for the same reason as in the
case of Linux.
From the results obtained in the evaluation on the real-time platform, it can be
observed that, firstly, the network protocol has a greater latency and it makes the times
of a simple round-trip transmission higher than in Linux; the trade-off is that this is a
predictable network with less dispersion among the values of the measurements.
Furthermore, the minimum and average times of RT-GLADE for one client are also
greater than those of GLADE over Linux, although the maximum time remains within
a bound indicating a much lower dispersion. An important part of the response times
obtained for RT-GLADE is due to the network, but is also due to the operating system
and the dynamic memory manager used [11] (to make the timing predictable). If we
observe the times of RT-GLADE for five clients, we can see that only the minimum
time is worse than in GLADE, although with less difference; in contrast the average
time and above all the maximum are now clearly better. The increase in all the times of
RT-GLADE with respect to the case of one client is reasonable and can be justified by
the blocking times that can be suffered both in the processor and in the network.
In the measurement of the times of PolyORB-CORBA over MaRTE OS we have
found a great disparity of the measurements for five clients depending on the priorities
used in them. After analysing the PolyORB code, we found that the implementation
made of the Leader/Followers model does not really follow this model. Instead of
having a single thread that awaits the remote request and then executes it and sends
the response, there is still an intermediate thread that separates the reception of messages
from the network from the remote execution. Thus, PolyORB implements two groups of threads: one for
RT-CORBA (it is necessary to create it explicitly to support the model of priorities),
and the other which corresponds to the concurrency support of CORBA. The threads
that serve the clients' requests are taken from the RT-CORBA group, but the
intermediate threads are taken from the other group and they are executed at the
intermediate priority of the system, which can introduce large priority inversions
depending on the priorities of the servers. This is a part which must be improved to
guarantee lower bounds of the worst-case response times. In any case, the
measurements reflected in Table 2 for PolyORB-CORBA with five clients have been
obtained in a best-case scenario in which the low-priority clients are not preempted by
any of the threads in the thread pool.
Furthermore, in PolyORB-DSA we have replaced the scheduling with a very simple
version implementing an experimental prototype of the model defined in [8]. The
response times obtained are worse than those of RT-GLADE, but better than those of
PolyORB-CORBA.
Therefore, with respect to the times of PolyORB over MaRTE OS, it is again shown,
by comparing the results with those of RT-GLADE, that the pure implementation of
the DSA can be much lighter than that of RT-CORBA. Comparing the tests of
PolyORB-CORBA for one and for five clients it can be seen that there is an important
difference between the minimum and maximum times for five clients, which is due to
the priority inversion introduced by the intermediate tasks.
5 Integration of the Distribution Middleware with a Contract-Based Scheduling Framework
The FRESCOR (Framework for Real-time Embedded Systems based on COntRacts)
EU project [3] has the objective of providing engineers with a scheduling framework
that represents a high-level abstraction that lets them concentrate on the specification
of the application requirements, while the system transparently uses advanced real-
time scheduling techniques to meet those requirements. In order to keep the
framework independent of specific scheduling schemes, FRESCOR introduces an
interface between the applications and the scheduler, called the service contract.
Application requirements related to a given resource are mapped to a contract, which
can be verified at design time by providing off-line guarantees, or can be negotiated at
runtime, when it may or may not be admitted. As a result of the negotiation a virtual
resource is created, representing a certain resource reservation. The resources managed
by the framework are the processors, networks, memory, shared resources, disk
bandwidth, and energy; additional resources could be added in the future.
Careful use of virtual resources allows different parts of the system (whether they are
processes, applications, components, or schedulers) to use budgeting schemes. Not
only can virtual resources be used to help enforce temporal independence, but a
process can interact with a virtual resource to query its resource usage and hence
support the kinds of algorithms where execution paths depend on the available
resources.
When distribution middleware is implemented on operating systems and network
protocols with priority-based scheduling, it is easy to transmit the priority at which a
remote service must be executed inside the messages sent through the network.
However, this solution does not work if more complex scheduling policies, such as
those of the FRESCOR framework, are used. Sending the contract parameters of the RPC handler
and the reply message through the network is inefficient because these parameters are
large in size. Dynamically changing the scheduling parameters of the RPC handler is
also inefficient because dynamically changing a contract requires an expensive
renegotiation process.
The solution proposed in [8] consisted in explicitly creating the network and processor
schedulable entities required to establish the communication and execute the remote
calls. The contracts of these entities are negotiated and created before they are used.
They are then referenced with a short identifier that can be easily encoded in the
messages transmitted. For identifying these schedulable entities the transactional
model is used and the identifier, called an Event_Id, represents the event that triggers
the activity executed by the schedulable entity.
In the current FRESCOR framework, support for the transactional model is being
built. A tool called the Distributed Transaction Manager (DTM) is a distributed
application responsible for the negotiation of transactions in the local and remote
processing nodes in a FRESCOR system that implements the contract-scheduling
framework. Managing distributed transactions cannot be done on an individual
processing node because it requires dynamic knowledge of the contracts negotiated in
the other nodes, leading to a distributed consensus problem. The objective of the
Distributed Transaction Manager is to allow the remote management of contracts in
distributed systems, including capabilities for remote negotiation and renegotiation,
and management of the coherence of the results of these negotiation processes. In this
way, FRESCOR provides support for distributed global activities or transactions
consisting of multiple actions executed in processing nodes and synchronized through
messages sent across communication networks.
The implementation of the DTM contains an agent in every node, which listens to
messages either from the local node or from remote nodes, performs the requested
actions, and sends back the replies. In every node there is also a DTM data structure
with the information used by the corresponding agent. Part of this information is
shared with the DTM services invoked locally from the application threads. This
architecture could benefit from the presence of a distribution middleware, by making
the agents offer operations that could be invoked remotely, thus simplifying the current
need for a special communications protocol between the agents.
The current version of the transaction manager limits its capabilities just to the
management of remote contracts. In the future, the DTM should also provide a full
support for the transactional model, integrated with the distribution middleware. For
this purpose the following services would need to be added to it:
• Specification of the full transaction with identification of its activities, remote
services and events, and contracts for the different resources (processors and
networks).
• Automatic deployment of the transaction in the middleware. This would require:
- choosing unused Event_Ids for the transaction events
- choosing unused ports in the involved nodes, for the communications
- creating send endpoints for the client side of the communications, using the desired contracts and networks
- creating receive endpoints for the reception of the reply in the client side of the communications, using the desired networks, ports, and event ids
- creating the necessary RPC handlers with their corresponding contracts
- creating the receive endpoints of the server side of the communications using the desired contracts and networks
- creating the send endpoints of the server side of the communication using the desired contracts and networks.
All this deployment would be done by the DTM from the information of the
transaction, which could be written using a suitable deployment and configuration
language. After this initialization, the transaction would start executing, its RPCs
would be invoked and the middleware would automatically direct them through the
appropriate endpoints and RPC handlers almost transparently. We would just specify
the appropriate event ids.
With the described approach we would achieve a complete integration of the
distribution middleware and the transactional model in a system managed through a
resource reservation scheduler.
6 Conclusions and Future Work
The work presented here reports an analysis and evaluation of some implementations
of distribution middleware from the viewpoint of their suitability for the
implementation of real-time systems. Specifically, the following aspects have been
highlighted: the way remote calls are managed, the mechanisms for establishing the
scheduling parameters, and the importance of giving support to the transactions or
distributable threads.
The time measurements have been carried out over Linux as the native operating
system of the middleware analysed, and over a real-time platform based on the
MaRTE operating system and the RT-EP real-time network protocol, to which
PolyORB has been ported in this work. In the measurements obtained, it can be
observed that the implementations of Ada’s DSA are lighter than the implementations
of RT-CORBA. This demonstrates that Ada could be a good option for programming
distributed systems, and that it could find its niche in medium-sized embedded
distributed real-time systems. The measurements on the real-time platform also show
that the predictability has a cost in terms of overhead in the network and in memory
management.
Furthermore, new mechanisms for contract-based resource management in a
distributed real-time system have been identified, and the necessity to integrate the
distribution middleware with them has been described, together with some ideas on
future work needed to support this integration.
Our work will continue with experimentation on the PolyORB real-time platform that
we already have, given our experience in Ada and in GLADE. The objective will be to
progress with the improvement of the real-time aspects of this platform both for the
DSA and for RT-CORBA, and to integrate the distributed transaction model along with
its managers and the new contract-based scheduling mechanisms for processors and
networks using the ideas described in this paper.
References
1. Ada-Core Technologies, The GNAT Pro Company, http://www.gnat.com/
2. M. Aldea, G. Bernat, I. Broster, A. Burns, R. Dobrin, J.M. Drake, G. Fohler, P. Gai, M. González Harbour, G. Guidi, J.J. Gutiérrez, T. Lennvall, G. Lipari, J.M. Martínez, J.L. Medina, J.C. Palencia, and M. Trimarchi. “FSF: A Real-Time Scheduling Architecture Framework”. Proc. of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2006, San Jose (CA, USA), 2006.
3. FRESCOR project web page: http://frescor.org
4. J.J. Gutiérrez, and M. González Harbour. “Prioritizing Remote Procedure Calls in Ada Distributed Systems”. Proc. of the 9th International Real-Time Ada Workshop, ACM Ada Letters, XIX, 2, pp. 67–72, June 1999.
5. Y. Krishnamurthy, I. Pyarali, C. Gill, L. Mgeta, Y. Zhang, S. Torri, and D.C. Schmidt. “The Design and Implementation of Real-Time CORBA 2.0: Dynamic Scheduling in TAO”. Proc. of the 10th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'04), Toronto (Canada), May 2004.
6. J. Liu. “Real-Time Systems”. Prentice Hall, 2000.
7. J. López Campos, J.J. Gutiérrez, and M. González Harbour. “The Chance for Ada to Support Distribution and Real Time in Embedded Systems”. Proc. of the International Conference on Reliable Software Technologies, Palma de Mallorca (Spain), in LNCS, Vol. 3063, Springer, June 2004.
8. J. López Campos, J.J. Gutiérrez, and M. González Harbour. “Interchangeable Scheduling Policies in Real-Time Middleware for Distribution”. Proc. of the 11th International Conference on Reliable Software Technologies, Porto (Portugal), in LNCS, Vol. 4006, Springer, June 2006.
9. MaRTE OS web page, http://marte.unican.es/
10. J.M. Martínez, and M. González Harbour. “RT-EP: A Fixed-Priority Real Time Communication Protocol over Standard Ethernet”. Proc. of the 10th International Conference on Reliable Software Technologies, York (UK), in LNCS, Vol. 3555, Springer, June 2005.
11. M. Masmano, I. Ripoll, A. Crespo, and J. Real. “TLSF: A New Dynamic Memory Allocator for Real-Time Systems”. Proc. of the 16th Euromicro Conference on Real-Time Systems, Catania (Italy), June 2004.
12. Object Management Group. “CORBA Core Specification”. OMG Document, v3.0 formal/02-06-01, July 2003.
13. Object Management Group. “Realtime CORBA Specification”. OMG Document, v1.2 formal/05-01-04, January 2005.
14. L. Pautet, and S. Tardieu. “GLADE: a Framework for Building Large Object-Oriented Real-Time Distributed Systems”. Proc. of the 3rd IEEE Intl. Symposium on Object-Oriented Real-Time Distributed Computing (ISORC'00), Newport Beach (USA), March 2000.
15. PolyORB web page, http://polyorb.objectweb.org/
16. I. Pyarali, M. Spivak, D.C. Schmidt, and R. Cytron. “Optimizing Thread-Pool Strategies for Real-Time CORBA”. Proc. of the ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems (OM 2001), Snowbird (Utah, USA), June 2001.
17. Sun Developer Network, http://java.sun.com
18. TAO web page, http://www.cs.wustl.edu/~schmidt/TAO.html
19. S. Tucker Taft, Robert A. Duff, Randall L. Brukardt, Erhard Ploedereder, and Pascal Leroy (Eds.). “Ada 2005 Reference Manual. Language and Standard Libraries. International Standard ISO/IEC 8652:1995(E) with Technical Corrigendum 1 and Amendment 1”. LNCS 4348, Springer, 2006.
20. T. Vergnaud, J. Hugues, L. Pautet, and F. Kordon. “PolyORB: a Schizophrenic Middleware to Build Versatile Reliable Distributed Applications”. Proc. of the 9th International Conference on Reliable Software Technologies, Palma de Mallorca (Spain), in LNCS, Vol. 3063, Springer, June 2004.
4. Distributed Systems
Integration of RT-CORBA in Eyebot robots
Manuel Díaz, Daniel Garrido, Luis Llopis, Raúl Luque
{mdr, dgarrido, luisll}@lcc.uma.es, [email protected]
Depart. Lenguajes y Ciencias de la Computación
Universidad de Málaga, Grupo de Ingeniería del Software (GISUM)
Abstract
The development of embedded systems can benefit from the use of the paradigms, methodologies and techniques proposed by Software Engineering. In particular, the use of CORBA could considerably simplify communications, since it abstracts away the communication platform. However, the constraints of this kind of system make it difficult to use CORBA implementations, which were designed for systems with larger processing or memory capacities. This work presents the integration of ROFES, an implementation of RT-CORBA and minimumCORBA, into a type of robot called Eyebot, which incorporates a wide set of elements such as a motor, wheels, radio, camera, etc. An application developed with ROFES that allows the robot to be controlled remotely is also presented.
1. Introduction
For some time now, the progressive deployment of distributed embedded real-time systems has been a fact in every sphere of our daily life. Nevertheless, as is also well known, the development of these systems is not an easy task, and it can benefit from the application of the techniques, models, paradigms, etc. provided by Software Engineering. In this respect, the OMG (Object Management Group) has defined both the RT-CORBA specification [OMGRT02][OMGRT05] and minimumCORBA [OMG98] for the use of CORBA in this kind of system. The benefits that the use of CORBA can bring are broad, ranging from the use of a high-level, object-based communication paradigm to independence, for the developer, from the communication platform or the operating system. However, there are few implementations that allow RT-CORBA or minimumCORBA to be used in embedded systems with strong memory, processing and similar constraints. This article presents the port of ROFES [ROFES], an implementation of RT-CORBA, for its use in Eyebot robots [EYEBOT], which offer many possibilities for the development of distributed embedded applications but nevertheless have constraints, such as those mentioned, that make the use of CORBA difficult [GORAPPA].
The structure of the article is as follows: the next section presents a brief introduction to ROFES together with RT-CORBA and minimumCORBA. Section 3 presents the main features of the Eyebot robot, covering both hardware and software aspects. Section 4 details the process of porting ROFES to the Eyebot robots. Section 5 presents an application developed to operate the robot using ROFES. Finally, some conclusions are presented.
2. RT-CORBA and minimumCORBA
The Real-Time CORBA for Embedded Systems (ROFES) project implements a version of CORBA [OMG02][VINOS][SCHMIDT] for embedded real-time systems, following the OMG standards Real-Time CORBA Specification version 1.1, of August 2002, and minimumCORBA, of August 17, 1998. This project is being developed by the Chair of Operating Systems group at RWTH Aachen University, Germany.
The RT-CORBA specification defines an optional set of extensions and a series of scheduling services that allow CORBA to be used as one more component of a real-time system. The objective of these extensions is to offer support for resource management that ensures that the activities performed by applications with real-time constraints execute in a predictable way.
The RT-CORBA 1.1 specification imposes a scheduling model based on fixed priorities and depends on a real-time operating system that schedules the threads representing the activities of the system and that offers a mutex as a synchronization mechanism for access to shared resources. If some of these mechanisms are not provided by the operating system, they have to be developed, a situation that has arisen in the work presented in this article in certain areas such as synchronization or threads.
Embedded real-time systems need a reduced version of CORBA that they can execute, since standard CORBA implementations usually require systems with larger memory or processing capacities. This reduced version is called minimumCORBA, and it defines a profile, or subset, of CORBA. This specification has now been superseded by the Common Object Request Broker Architecture for embedded (CORBA/e) specification of August 3, 2006 [OMG06]. Nevertheless, the RT-CORBA implementation used here, ROFES, makes use of minimumCORBA.
The CORBA features omitted by this profile are important in CORBA applications for general-purpose systems. However, these features have a cost in terms of resources, and there is a significant class of applications for which this cost cannot be justified, as is the case of embedded systems.
minimumCORBA defines a single profile that preserves the main benefits of CORBA: application portability and interoperability between ORBs. The following goals are recognized in the choice of this profile:
• Any feature that is kept must have wide applicability in the world of resource-constrained systems.
• minimumCORBA should be fully interoperable with CORBA.
• minimumCORBA should support the complete IDL.
In general, the features that provide the dynamic aspects of CORBA are omitted.
3. Description of the Eyebot
The so-called Eyebot is a mobile robot designed to be used in the field of education and research on robotics and embedded real-time systems. This section briefly describes the hardware of the Eyebot robot, the RoBIOS (Robot Basic Input Output System) operating system [BRAUNL], and the programming interface provided by the manufacturer. The features and/or limitations of these elements influence the process of porting ROFES for its use in the robots.
3.1. Hardware elements
The hardware of the Eyebot includes, among others, the following elements:
• Motorola 68332 microprocessor at 35 MHz with a 512 KB Flash-ROM memory, of which 128 KB are reserved for the operating system and the rest for user programs. It also has 1 MB of RAM, extensible to 2 MB, which allows user programs to be stored and executed.
• Sensors: it includes sensors for detecting the movements of the robot and infrared sensors for detecting the presence and position of nearby obstacles.
• Digital camera: 80x60 pixel, 24-bit resolution, located at the front, offering the possibility of capturing images of the robot's environment.
• Motors in charge of moving the wheels.
• Serial port, through which applications can be downloaded to the Eyebot, and which is also used for communication between the robot and a computer.
• Radio module, with Wifi or Bluetooth technology, to communicate one robot with others or with a computer.
• Parallel port, used as an external debugger.
• LCD screen, where messages from the running program can be displayed.
• Buttons for interaction with the user.
• Microphone and internal speaker to capture and play sounds.
As can be observed, the hardware of the robot includes different elements that give it great versatility. Figures 1 and 2 show different views of the robot.
Figure 1. Front view of the Eyebot
Figure 2. Rear view of the Eyebot
3.2. Operating system
The Eyebot robot includes an operating system called RoBIOS, which consists of three elements: the console, the HDT table with the connected hardware devices, and the programming interface.
The operating system manages the different resources of the system and presents the user with a console through which the robot can be operated. From it, programs can be loaded and executed, and the operation of the robot's sensors and motors can be checked. The operating system contains a table, called the HDT (Hardware Description Table), in which the hardware devices actually connected to the robot are defined. To ease access to the different resources from user programs, RoBIOS also offers a programming interface with functions suited to each type of device, such as the camera.
3.3. Programming interface
The programming interface of the Eyebot consists of a set of functions for accessing and manipulating the different sensors, actuators and devices connected to the robot. These functions are written in C and are linked with the code of the user program, so that accessing the different elements is simple.
Some of the features of this API that are significant for the port of ROFES are described here.
The RoBIOS operating system has two different types of scheduler: cooperative and preemptive.
• Cooperative: the running task is not removed by the scheduler until it itself yields the token to another task. The task that takes the token is the task in the ready queue with the highest priority.
• Preemptive: the running task is the one with the highest priority that is not blocked or suspended, allowing lower-priority tasks to be preempted.
ROFES is based on the use of threads, so it is necessary to use and/or adapt the threads provided by RoBIOS.
As a synchronization mechanism, RoBIOS provides semaphores, which are also needed to implement, for example, the mutexes offered by ROFES.
Finally, it also offers the possibility of using timers, which may be needed for the scheduling mechanisms.
The development of applications for the Eyebot is based on a cross-compilation process under Linux, in which the programs are downloaded through the serial port.
4. Porting ROFES to the Eyebot robot
The ROFES project implements an ORB intended to run on embedded real-time systems such as the Eyebot. But even so, the hardware resources available in this robot are insufficient to run that ORB, for example in the amount of memory. There are also software limitations; that is, the cross-compiler for the Eyebot does not provide all the functions needed to build the ROFES project for the robot.
To overcome the memory problem, the following implementation strategy was adopted:
• The functions in charge of opening and reading shared libraries will not be available; only the static ROFES libraries can be created.
• The methods needed for the communication protocols that the ROFES library implements but that are not used will not be available.
Taking the above into account, all the code related to opening and reading shared libraries is removed, together with all the classes that implement the unused communication protocols. All the code specific to the operating systems on which the ROFES library can be built is also removed.
With respect to the software limitations, the following problems arise:
• The POSIX threads library is not available. For threads, only the functions provided by the Eyebot's RoBIOS can be used.
• Functions provided by the cross-compiler that are reimplemented by RoBIOS cannot be used, because at link time the compiler reports that it cannot find the library containing the function. This happens, for example, with the function sleep(n), which sleeps the process for n seconds, while RoBIOS provides the function OSWait(n), which sleeps the process for n / 100.
• The stack size of the programs is insufficient.
Once all these problems were solved, an ORB capable of running on the Eyebot using the ROFES libraries was obtained. The ORBs communicate through the serial port, and communication protocols that use the Bluetooth or Wifi modules provided by the Eyebot manufacturer can be added.
In addition to the above limitations, during the development of the port it was necessary to study the structure of the ROFES source code in order to modify different parts of ROFES. It was also necessary to fix some errors present in ROFES itself, since it is a project under development.
Changes in the ORB.
Because of the memory limitations mentioned above, all the elements of the ORB core that were not indispensable or that no longer made sense have been removed, such as those related to the communication protocols that were no longer going to be used in the robot. Some of the main changes have been:
• Limiting to one the number of ORBs that can be initialized on the Eyebot. This is not really a great limitation for the developer, since usually only one ORB is initialized.
• Changes in the configuration: some of the configuration options accepted by the ORB have been removed, and others have been modified, for example to allow communication through the mechanisms offered by the robot. Thus, for example, the option ORBDefaultEndpoint accepts the values rs232, bluetooth and wireless, depending on whether the communication takes place through the serial port, the Bluetooth module or the wireless module, respectively.
It has not been necessary to modify the real-time ORB.
Changes in the object adapter.
The object adapter is the element of the ORB in charge of locating remote objects. The main modification has consisted in removing the code associated with the unused communication protocols. It has also been necessary to make this modification in the real-time object adapter.
Priority management.
The range of priorities supported by threads on the Eyebot is defined by the constants MIN_PRI and MAX_PRI, with the values 1 and 8, respectively. It is necessary to make changes in ROFES so that these values are recognized.
Thus, for example, it is necessary to redefine the constant logical_div_native, whose value makes it possible to translate the CORBA priority into the native priority and vice versa. The value of this constant is the maximum CORBA priority divided by the constant MAX_PRI, since Eyebot threads can only take eight different values. The definition of these constants is:
int RTCORBA::PriorityMapping::os_maxPriority = MAX_PRI;
int RTCORBA::PriorityMapping::os_minPriority = MIN_PRI;
#define logical_div_native (32767 / MAX_PRI)
Since the PriorityMapping class is in charge of translating thread priorities between the CORBA priority and that of the operating system, all the code involved in defining and obtaining the priorities supported by the operating systems on which the ROFES project can be built has been removed from this class. The to_native and to_CORBA methods have also been reimplemented so that they take into account only the priority values valid for RoBIOS. The code of the to_native method would look as follows:
if ((corba_priority < minPriority) ||
    (corba_priority > maxPriority))
{
    ROFES_LEAVE;
    return false;
}
native_priority = corba_priority / logical_div_native;
// Priorities start at 1
native_priority++;
if (native_priority > os_maxPriority)
    native_priority = os_maxPriority;
ROFES_LEAVE;
return true;
It is worth noting that the methods the_priority(Priority) and the_priority() of the RTCORBA::Current class, in charge of setting and getting the priority of the current thread, have no functionality, because RoBIOS does not provide any function or method that allows the priority of a thread to be accessed and/or modified once it has been initialized. The priority of a thread on the Eyebot can only be set at the moment of its creation.
Synchronization.
The CORBA::OSMutex class uses the mutex operations specific to each operating system. RoBIOS only provides three semaphore functions: OSSemInit, OSSemP and OSSemV. These functions have been used in the lock and unlock methods in order to emulate the mutexes provided by ROFES.
Thread pools.
The main problem in porting the ROFES project to the Eyebot has been the creation and management of threads through the functions provided by RoBIOS. The problem is that the OSSpawn function expects as one of its input parameters a pointer to a C function containing the code to be executed, and not a pointer to a C++ method.
Fortunately, pointers to static C++ methods are compatible with ordinary pointers to C functions. But a new problem then arises: static methods can only use other static methods and attributes, which is rather limiting. To overcome these problems, a static attribute is created to store the reference to the created object, together with a static method that calls the C++ method containing the code the thread must execute.
Of the classes involved in the implementation of Threadpools in ROFES, only the Threadpool_impl class needs to be modified. All the code related to the functions in charge of creating and managing threads on the operating systems on which the ROFES project can be built has been removed.
Another modification has consisted in changing the stack size of the threads. For example, when the camera functions are used, the value must be at least 150,000 bytes.
Changes in GIOP.
The so-called General Inter-ORB Protocol defines the messages needed to establish communications between servers and clients. In the case of ROFES, it includes GIOP implementations for several transports, such as TCP/IP or CAN.
For the port to RoBIOS, it has been necessary to modify the GIOPDevice and GIOPServer classes, adding the possibility of using the robot's serial port.
In the GIOPDevice class, all the code related to the CAN network implementation and to the loading of shared libraries has been removed. In the GIOPConnector class, the code needed to create the devices in charge of communication through the serial port, Bluetooth and wireless has been added, as the following code shows:
_major = profile->major();
_minor = profile->minor();
if (proto)
    proper = proto->transport_protocol_properties;
if (profile->id() == IOP::TAG_RS232IOP) {
    _device = new RS232IOP::RS232Device(proper);
#ifdef HAVE_BLUETOOTH
} else if (profile->id() == IOP::TAG_BLUETOOTHIOP) {
    _device = new BLUETOOTHIOP::BLUETOOTHDevice(proper);
#endif
#ifdef HAVE_WIRELESS
} else if (profile->id() == IOP::TAG_WIRELESSIOP) {
    _device = new WIRELESSIOP::WIRELESSDevice(proper);
#endif
...
It has also been necessary to modify the IOR reference mechanism of ROFES objects, since the unused communication protocols have been removed and new ones, such as serial-port communication, have been added. Changes have therefore been made in different parts of ROFES so that the new protocols are taken into account.
Serial-port communication protocol.
The implementation of this protocol in the ROFES project cannot be used, because RoBIOS provides its own functions for communicating with the serial port.
The RS232Device class allows GIOP messages to be sent and received through the Eyebot's serial port. The receiveHeader(*buff) method, which reads from the serial port the first twelve bytes corresponding to the header of a GIOP message, looks as follows:
while (bytes < 12) {
    diff = OSRecvRS232(&(buff[bytes]), SERIAL1);
    if (diff == 0) {           // successful read
        bytes++;
    } else if (diff != 10) {   // read error
        if (!hang_on--) {      // no retries left
            ROFES_LEAVE;
            return false;
        }
    } else {
        ROFES_PRINT_ERROR1("error=%d\n", diff);
        ROFES_LEAVE;
        return false;
    }
}
5. Console for the Eyebot robot
Once a version of CORBA capable of running on the Eyebot was available, an application was designed for the Eyebot which creates an ORB, activating an object that provides access to the robot's different hardware components. That is, the object activated in the Eyebot's ORB has the following goals:
• Read the PSD infrared sensors.
• Obtain the Eyebot's speeds.
• Control the camera servo and the kicker servo.
• Control the Eyebot's type of movement and speed.
• Obtain images from the Eyebot's camera.
XI Jornadas de Tiempo Real (JTR2008)
• Report to clients, if any, the errors produced by the execution of the different RoBIOS functions.
In addition, an application has been developed that communicates with the above object; its functions are:
• Display the values read by the Eyebot's PSD infrared sensors.
• Display the linear and angular speed at which the Eyebot is moving.
• Set the new position of the robot's servos.
• Set the robot's new type of movement, i.e. the new linear and angular speed, the turning angle and the distance to travel.
• Display the images obtained by the Eyebot's camera.
• Inform the user of the different exceptions raised while sending and receiving requests to the object.
So that the applications can carry out these functions, the following IDL interface is defined:
module TestEyebot {
typedef char Image[15252];
interface ControlEyebot
{
exception NoCamera {};
exception InitErrorCamera {};
exception ReleaseCamera {};
exception WrongHandleServo {};
exception WrongHandleVW {};
void initCamServo()
raises (NoCamera, InitErrorCamera,
WrongHandleServo);
void releaseCamServo()
raises (ReleaseCamera, WrongHandleServo);
Image getImage();
void turnServo(in short phi)
raises (WrongHandleServo);
void setSpeed(in float v, in float w)
raises (WrongHandleVW);
void getSpeed(out float v, out float w)
raises (WrongHandleVW);
void driveStraight(in float dist, in float v)
raises (WrongHandleVW);
void driveTurn(in float phi, in float w)
raises (WrongHandleVW);
void driveCurve(in float dist, in float phi,
in float v) raises (WrongHandleVW);
void getValuePSD(out short front,
out short left, out short right);
void hit() raises (WrongHandleServo);
};
};
The size of the Image type has not been defined as a constant, because the ROFES IDL compiler does not translate constants to C++ correctly. The size of this type must equal the size of the types defined in RoBIOS for storing images: it holds a colour image of size 82x62x3 obtained by the Eyebot's camera.
The exceptions defined handle the errors that may occur during the execution of the RoBIOS functions used; their meaning is as follows:
• NoCamera: no camera is installed on the Eyebot.
• InitErrorCamera: error while reserving the resources needed to use the camera.
• ReleaseCamera: error while releasing the resources assigned to the camera.
• WrongHandleServo: the servo handle is invalid.
• WrongHandleVW: the handle of the V-Omega (driving) interface is invalid.
The following methods allow interaction with the robot's camera:
● initCamServo: resets and initializes the Eyebot's camera and the servo that controls the camera position.
● releaseCamServo: releases the resources assigned to the robot's camera and to the camera servo.
● getImage: obtains an 82x62 colour image from the camera.
● turnServo: sets the new position of the camera servo.
The following methods allow obtaining and/or modifying the robot's type of movement:
● setSpeed: sets the Eyebot's new linear and angular speeds.
● getSpeed: obtains the current values of the Eyebot's linear and angular speed.
● driveStraight: the robot drives forwards or backwards in a straight line for dist metres at speed v.
● driveTurn: the robot turns left or right by phi degrees at speed w.
● driveCurve: the robot drives forwards or backwards along a curve of phi degrees over dist metres at speed v.
The getValuePSD method reads the values measured by the Eyebot's three infrared sensors. Finally, the hit method moves the kicker servo.
As an example, the getImage method returns an 82x62 colour image captured by the Eyebot's camera. For the implementation of this method, compressing the image in PNG format was initially considered. However, this would require porting the PNG library to RoBIOS and adding the PNG header to the captured image, since the RoBIOS functions return a matrix with the image's pixel map. This would consume part of the scarce resources available on the Eyebot for running CORBA, and the time saved by sending the compressed image through the serial port is not significant compared with the time needed to send the uncompressed image. Therefore, the implementation of this method is:
colimage colimg;
CAMGetColFrame(&colimg, FALSE);
// Image_dup allocates the returned array, so no separate Image_alloc is needed
TestEyebot::Image_slice* img = Image_dup((Image_slice*) colimg);
return img;
As can be seen, functions from the API provided by RoBIOS, such as CAMGetColFrame, are used.
Figure 3 shows the client application developed with the Qt library [QT3], from which the Eyebot robot can be controlled.
Figure 3. Front view of the Eyebot
6. Conclusions
This work has presented the integration of a CORBA environment on a real-time embedded system. The RT-CORBA and minimumCORBA specifications help to ensure that the inherent limitations of embedded systems are not an obstacle to obtaining a CORBA core with its main features, capable of becoming one more component of the real-time embedded system.
It has been shown that, even with an operating system as basic as the Eyebot's RoBIOS, it is possible to obtain a CORBA core with its main features, capable of communicating with other CORBA environments or of becoming part of a more complex CORBA system.
Finally, it is worth noting that the minimumCORBA specification has become obsolete. It has been replaced by the Common Object Request Broker Architecture for embedded specification, the CORBA/e profile, which adds more functionality to the CORBA versions that run on embedded systems, since these systems have more resources with every passing day.
References
[BRAUNL] T. Bräunl. Embedded Robotics – Mobile Robot Design and Applications with Embedded Systems. Springer, 2003.
[EYEBOT] http://robotics.ee.uwa.edu.au/eyebot
[GORAPPA] S. Gorappa, J. A. Colmenares, H. Jafarpour, R. Klefstad. "Tool-based Configuration of Real-time CORBA Middleware for Embedded Systems". ISORC 2005.
[OMG98] Object Management Group. minimumCORBA. 1998.
[OMG02] Object Management Group. The Common Object Request Broker: Architecture and Specification, Version 3.0. 2002.
[OMG06] Object Management Group. Common Object Request Broker Architecture (CORBA) for embedded Specification. 2006.
[OMGRT02] Object Management Group. Real-Time CORBA Specification, Version 1.1. 2002.
[OMGRT05] Object Management Group. Real-Time CORBA Specification, Version 1.2. 2005.
[QT3] http://www.trolltech.com/qt
[ROFES] Chair of Operating Systems, RWTH Aachen University, Germany. http://www.lfbs.rwth-aachen.de
[SCHMIDT] D. C. Schmidt, F. Kuhns. "An Overview of the Real-Time CORBA Specification". IEEE Computer, special issue on Object-Oriented Real-Time Distributed Computing, June 2000.
[VINOS] M. Henning, S. Vinoski. Advanced CORBA Programming with C++. Addison-Wesley, 2002.
An Ada 2005 Technology for Distributed and Real-Time
Component-based Applications
Patricia López Martínez, José M. Drake, Pablo Pacheco, Julio L. Medina
Departamento de Electrónica y Computadores, Universidad de Cantabria,
39005-Santander, SPAIN
{lopezpa,drakej,pachecop,medinajl}@unican.es
Abstract: The concept of interface in Ada 2005 significantly facilitates its usage as the basis for a software component technology. Such a technology, taking benefit of the resources that Ada offers for real-time systems development, is suitable for component-based real-time applications that run on embedded platforms with limited resources. This paper proposes a model-based technology for the implementation of distributed real-time component-based applications with Ada 2005. The proposed technology uses the specification of components and the framework defined in the LwCCM standard, modifying it with some key features that make the temporal behaviour of the applications executed on it predictable and analyzable with schedulability analysis tools. Among these features, the dependency on CORBA is replaced by specialized communication components called connectors, the threads required by the components are created and managed by the environment, and interception mechanisms are put in place to control their scheduling parameters on a per-transaction basis. This effort aims to lead to a new IDL-to-Ada mapping, a prospective standard of the OMG.
Keywords: Ada 2005, component-based technology, embedded systems, real-time, OMG standards
1 Introduction
While in the general-purpose software applications domain the component-based software engineering (CBSE) approach is progressing as a promising technology to improve productivity and to deal with the increasing complexity of applications, in the embedded and real-time systems domain its adoption has evolved significantly more slowly. The main reason for this delay is that the best-known CBSE technologies, like EJB, .NET, or CCM, are inherently heavy and complex: they introduce overheads that are not easily predictable and do not scale well enough to fit the significant restrictions on resource availability usually suffered by embedded systems.
Trying to find an appropriate solution to this problem, European research projects like COMPARE [1] and FRESCOR [2] tackle, from different points of view, the
1. This work has been funded by the European Union’s FP6 under contracts FP6/2005/IST/5-034026
(FRESCOR project) and IST-004527 (ARTIST2 One). This work reflects only the author’s views; the
EU is not liable for any use that may be made of the information contained herein.
development of a real-time component-based technology compatible with embedded systems. Their approach is based on the Container/Component model pattern defined in the LwCCM specification developed by the OMG [3], but avoids the usage of CORBA as communication middleware, which is too heavy for this kind of application. With this pattern, the interaction of the component with the run-time environment is completely carried out through the container, whose code is generated by automatic tools with the purpose of isolating the component developer from the details of the execution environment's code.
The recent revision of the Ada language specification [4], known as Ada 2005, provides an enhanced option for the implementation of fully native Ada component-based technologies, which is well suited to embedded platforms. On top of the known assets of Ada for real-time and distributed applications, such as native support for concurrency, scheduling policies, synchronization mechanisms, and remote invocations, Ada 2005 includes the concept of interface, which allows the services offered and required by components (Facets and Receptacles in LwCCM, respectively) to be implemented directly. Additionally, Ada 2005 handles incomplete types, which enable the definition of cross-references, frequently used in component-based applications.
This paper proposes a component-based technology based on Ada. It implements the LwCCM framework, with the container/component model, and both the code of the environment and the code of the components are written in Ada 2005. The technology incorporates mechanisms into the run-time environment, and extends the specification of the components, in such a way that the timing behaviour of the final application is totally controlled by the automatically generated execution environment. In this way, real-time models of the application can be elaborated and analysed in order to verify its schedulability when it runs on closed platforms, or to define the resource-usage contracts required to operate in open environments like FRESCOR [2][5]. The description and deployment of applications and components in the technology follow the "Deployment and Configuration of Component-Based Distributed Applications" standard of the OMG [6] (D&C). The paper focuses on the description of the framework that is the basis of the technology, particularly on the resources used to guarantee the required predictability.
Various proposals dealing with the adaptation of CBSE to real-time systems have appeared in recent years, though none of them has fully satisfied industry requirements [7]. In the absence of a standard, some companies have developed their own solutions, adapted to their corresponding domains. Examples of this kind of technology are Koala [8], developed by Philips, or Rubus [9], developed by Arcticus Systems and used by Volvo. These technologies have been successfully applied in the companies where they were created, but they do not enable the emergence of an inter-enterprise software component market. However, they have served as the basis for other academic approaches. The Robocop component model [10] is based on Koala and adds some features to support analysis of real-time properties; Bondarev et al. [11] have developed an integrated environment for the design and performance analysis of Robocop models. Similarly, Rubus has been used as the starting point of the SaveCCT technology [12]; the component concept in SAVE is applied at a very low granularity; even though, under appropriate assumptions for concurrency, simple RMA analysis can
be applied and the resulting timing properties introduced as quality attributes of the
assemblies; SaveCCT focuses on control systems for the automotive domain. In a
similar way, COMDES-II [13] encapsulates control tasks following a hierarchical
composition scheme, applied in an ad-hoc C based RT-kernel.
The technology presented in this paper follows the idea proposed by PECT (Prediction-Enabled Component Technology) [14]: to include in the component-based technology sets of constraints that make it possible to predict the behaviour of an assembly of components before its execution, based on properties of the components. In our case, this approach is applied to obtain the complete real-time model of the application. Though the Ada language is widely used in the design and implementation of embedded real-time systems, we have not found references to its usage in support of component-based environments. This is probably due to the lack of support for multiple inheritance in the old versions of the language.
The rest of this paper is organized as follows. Section 2 describes the two main
processes involved in a components technology, emphasizing the main contributions
of the proposal. Section 3 describes in detail the reference model of the framework,
and the aspects included for developing analyzable applications. Section 4 details the
architecture and classes to which a component is mapped in the technology and finally,
Section 5 explains our conclusions and future work.
2 Real-time component-based development
A component technology defines two different development processes, shown in Figure 1. The component development process comprises the specification, implementation, and packaging of components as reusable and independently distributable entities, while the development of component-based applications includes the specification, configuration, deployment and launching of applications built as assemblies of available components. Both processes are independent and are carried out by different agents at different stages; however, they need to be coordinated, because the final products of the first process are the inputs of the second. So, in order to guarantee their coherence, a component technology must define a set of rules about the kinds of products and information that are generated in each phase of the process, and the formats in which they are supplied. A key aspect of a component technology is the opacity of the components. This means that during the process of application development, components must be used without any knowledge of the internal details of their implementation or code. To achieve this, a component is stored as a package that includes, together with the implementation code, other files which supply models and complementary information (metadata) about different aspects (functional and non-functional) of the component, required for its usage.
A component development process starts when the “specifier”, who is an expert in a
particular application domain, elaborates the specification of a component that brings a
concrete functionality demanded in the domain. The “developer” implements this
specification according to a certain technology and elaborates the models that describe
the installation requirements of the component. This work is supported by automatic
tools, which generate the skeletons for the code of the component based on the
Fig. 1. Main processes in a component technology
selected technology. Therefore, the developer's task is reduced to designing and implementing the specific business code of the component, without having to be aware of internal details of the technology. Finally, the "packager" gathers all the information required to make use of the component, and creates and publishes the distributable element that constitutes the component.
Relevant aspects of the proposed technology related to components development
are:
• The methodology for functional specification of components and the framework
proposed by the LwCCM specification have been adopted as the basis for the
technology. Hence, a container/component model is used in the component implementations, but CORBA is replaced by simpler static communication mechanisms
with predictable behaviour, and suitable for the execution platform. Remote communication between components is achieved by using Connectors. These are components whose code is completely and automatically generated by the tools and
encapsulate all the support for interactions among components.
• Component implementations are generated in Ada 2005, so the set of Ada packages to which a component is mapped, as well as the code structure of all the elements that form the LwCCM framework, have been defined. An automatic code generation tool has been developed. It takes the specification of a component as input and generates all the code elements that provide support for the component inside the framework, including the frames in which the developer must write the business code of the component.
• The technology follows the D&C specification for the description of the package
which represents the distributable component.
• In order to apply the technology to hard real-time component-based applications,
both standard specifications, D&C and LwCCM, have been extended with new
elements that are used to describe the temporal behaviour of components and the
requirements they impose on the resources in order to meet timing requirements:
- D&C specification has been extended in order to associate a temporal behaviour model to the specifications and implementations of components. This realtime model is used to describe the temporal responses of the component and the
configuration parameters that it admits. Although this paper does not detail the
modelling approach used, which is explained in [15], the basic idea is that the
real-time model of a component is a parameterized model, which describes the
component temporal behaviour having references to the models of the platform
in which the component is executed and to the models of other components that
it uses in order to implement its functionality. These real-time models have the
composability properties required to generate the real-time model of the complete application by composition of the individual real-time models of the software and hardware components that form it. This real-time model can be used
to obtain the response time of services, analyze schedulability, or evaluate the scheduling parameters required to satisfy the timing requirements imposed on the application. In our case, the real-time models of the components are formulated according to the MAST model [16], so that the set of tools offered by the MAST environment can be used to analyze the system.
- With the purpose of controlling the threading characteristics (number and assignment of threads and scheduling parameters) of the components used in the technology, the functional specification of a component, as described in LwCCM, has been refined. A component cannot create threads inside its business code. Instead, for each thread that a component requires, it declares a port in its specification. This port implements one of the predefined interfaces OneShotActivation or PeriodicActivation (see Section 3). When the component is instantiated, the environment provides the thread for the execution of the port. Scheduling parameters for the thread are assigned as configuration properties of the instance in the deployment plan.
- New mechanisms have been introduced in the container/component model to
define the scheduling parameters with which each invocation received by a
component is executed. The run-time control of these parameters is done by
means of interceptors which can be introduced in the framework for each operation offered by a component.
The application development process consists of assembling component instances, choosing them from those which have been previously developed and stored in the repository of the design environment. This process is carried out by three different agents in three consecutive phases. The "assembler" builds the application by choosing the required component instances and connecting them according to their instantiation requirements. This work is guided by the functional specification of the application, the real-time requirements of the application, and the description of the available components. The result of this first stage is a description of the application as a composite component, which is useful by itself. The "planner" (usually the same agent as the assembler) takes this description and decides the deployment of the application, which means that it chooses the nodes on which each component instance will be installed, and the communication mechanism between instances. The result of this stage is the deployment plan, which completely describes the application and the way in which it is planned to be executed. Finally, the "executor" deploys, installs, and executes the application, taking the deployment plan and the information about the execution platform as inputs. This labour is usually assisted by automatic tools.
Relevant aspects of the proposed technology regarding application development are:
• The D&C specification is taken as the basis for the process of designing and
deploying an application. D&C defines the structure of the deployment plan that
leads this process. It describes the component instances that form the application,
their connections, the configuration parameters for each instance and the assignment of instances to nodes.
• A deployment tool processes the information provided by the deployment plan. It
selects the code of the components suitable for the target platform and generates
the code required to support the execution of the components in each node. Specifically, it automatically generates the connectors, which provide the communication mechanisms between remote component instances, as well as the code for the
main procedures executed on each node.
• The specific aspects included to support hard real-time applications are:
- Once the planner has developed the deployment plan, the local or remote nature of each connection between component ports is defined. Then, an automatic tool generates the code of the connectors based on the selected communication service and its corresponding configuration parameters, which were assigned to the connection in the deployment plan. The communication service used must exhibit predictable behaviour; hence, the tool also generates the real-time models that describe the temporal behaviour of those connectors. These models will later be composed with the real-time models of the other components in order to build the analysis model of the complete application.
- Based on the deployment plan, a tool elaborates the real-time model of the
application by composition of the real-time models of the components that form
it (connectors included) and the models of the platform resources, which should
also be stored in the repository. This model is used either to analyze the schedulability of the application under a certain workload, or to calculate the resource
usage contracts necessary to guarantee its operation in an open contractual
environment [5]. These contracts will be negotiated, prior to the application
execution, by the launching tool.
- The execution environment includes a special internal service and interception
mechanisms that manage in an automated way the scheduling parameters of the
threads involved in the application execution. The configuration parameters of
this service, whose values may be obtained by schedulability analysis, are specified in the deployment plan and assigned to the service at launching time.
3 Reference model of the technology
The proposed technology is based on the reusability (with no modification) of the business code of the components, and on the complete generation, by automatic tools, of the code that adapts the component to the execution environment. This code is generated according to the reference model shown in Figure 2. It takes the LwCCM framework as a starting point and adds to it the features required to control the real-time behaviour of the application's execution. Each of the elements that take part in the execution environment is explained below.
Component: A component is a reusable software module that offers a well-defined
business functionality. This functionality is specified through the set of services that
the component offers to other components, grouped in ports called facets, and the set
of services it requires from other components, grouped in ports called receptacles.
With the purpose of having complete control of the threading and scheduling
characteristics of an application, and in the look for being able to analyze it,
components in our technology are passive. The operations they offer through their
facets are made up of passive code that can call protected objects. But this does not
mean that there can not be active components in the framework, concurrency is
provided by means of activation ports. When a component requires a thread for
implementing its functionality, it declares a port that implements one of the two special
interfaces defined in the framework: OneShotActivation or PeriodicActivation. These
two kinds of ports are recognized by the environment, which creates and activates the
corresponding threads. The interface OneShotActivation declares a run() procedure,
which will be executed once by the created thread after the component is instantiated,
connected and configured. The interface PeriodicActivation declares an update()
procedure, which will be invoked periodically. A component can declare several
activation ports, each of them representing an independent unit of concurrency
managed by the component, and which are independent of the business invocations.
Activation ports are declared in the component specification (in the IDL file), and
all the elements required for their execution are created by the code generation tool.
Their configuration parameters, which include the scheduling parameters of the
threads as well as the activation period (in the case of PeriodicActivation ports), are
assigned for each component instance in the deployment plan.
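The division of roles between the two activation interfaces and the environment can be illustrated with a small language-neutral sketch (here in Python; only the interface and procedure names OneShotActivation.run() and PeriodicActivation.update() come from the framework, the surrounding environment code is an assumption for illustration):

```python
from abc import ABC, abstractmethod
import threading

class OneShotActivation(ABC):
    @abstractmethod
    def run(self) -> None:
        """Executed once, after the component is instantiated,
        connected and configured."""

class PeriodicActivation(ABC):
    @abstractmethod
    def update(self) -> None:
        """Invoked on every periodic release."""

def activate(port, cycles: int = 1) -> threading.Thread:
    """Environment's role: create and start the thread serving an activation port."""
    def body():
        if isinstance(port, PeriodicActivation):
            # A real environment would loop forever, sleeping one activation
            # period between releases; 'cycles' bounds the loop for illustration.
            for _ in range(cycles):
                port.update()
        else:
            port.run()  # OneShotActivation: executed exactly once
    t = threading.Thread(target=body)
    t.start()
    return t
```

A component simply implements one of the interfaces in an activation port; the thread itself is created and managed by the environment, never by the business code.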
Adapter: It represents the part of the component's code that provides the run-time
support for the business code. All the platform-related aspects are included in the
adapter. Its code is automatically generated according to the component/container
model. With this programming approach the component developer does not need to
know any detail about the underlying technology; he or she is only in charge of the
business code development.
Fig. 2. Reference model of the technology

4. Sistemas Distribuidos

Connector: It represents the mechanism through which a component communicates
with another component connected to it by a port. In our technology, a connector has
the same structure as a component, but its business code is also generated by the
deployment tool, based on:
• The interface of the connected ports. The connectors are generated from a set of
templates which are adapted so that they implement the operations of the required
interface.
• The location of the components (local vs remote), and the type of invocation (synchronous or asynchronous). Combinations among these different characteristics
lead to different types of connectors:
- For local and synchronous invocations the connector is not necessary: the client
component invokes the operation directly on the server.
- For local and asynchronous invocations the connector requires an additional
thread to execute the operation (through activation ports).
- If the invocation is distributed, the connector is divided into two fragments: the
proxy fragment, which is instantiated in the client node, and the servant fragment, which is instantiated in the server node. The communication between the
two fragments is achieved by means of the communication service selected for
the connection. In this case, the connector can also implement synchronous or
asynchronous invocations, including the required mechanisms in the proxy
fragment.
• The communication service or middleware used for the connection and its corresponding configuration parameters, which are assigned for each connection
between ports in the deployment plan.
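The deployment-time choice among these connector variants can be sketched as follows (a simplified illustration; the enumerations, names and factory function are assumptions, not part of the technology's API):

```python
from enum import Enum, auto

class Location(Enum):
    LOCAL = auto()
    REMOTE = auto()

class Invocation(Enum):
    SYNCHRONOUS = auto()
    ASYNCHRONOUS = auto()

def select_connector(location: Location, invocation: Invocation) -> list[str]:
    """Return the connector pieces the deployment tool must generate."""
    if location is Location.LOCAL:
        if invocation is Invocation.SYNCHRONOUS:
            return []                      # direct call: no connector needed
        return ["async_connector"]         # extra thread via an activation port
    # Distributed case: split into a client-side proxy and a server-side
    # servant; sync/async semantics are implemented in the proxy fragment.
    return ["proxy_fragment", "servant_fragment"]
```

In all cases the client component only ever performs a local call, either directly on the server or on the connector piece instantiated in its own node.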
Interceptors: The concept of interception is taken from QoSforCCM [17]. It provides a
way to support the management of non-functional features of the application. An interceptor allows calls to the environment services to be incorporated into the sequence of an
invocation, by executing certain actions before and after the operation is executed on
the component. The support for interceptors is introduced in the adapter, so it is hidden
from the component developer. Their introduction is optional for each operation, and it is
specified in the deployment plan.
In our technology they are used to control the scheduling parameters with which
each received invocation is executed. Based on the configuration parameters assigned
to it in the deployment plan, each interceptor knows the scheduling parameter which
corresponds to the current invocation, and uses the SchedulingParameterService to
modify it in the invoking thread. With this strategy, the following scheduling
parameters assignment schemes can be implemented:
• Client propagated: The scheduling parameters are those of the client that makes
the invocation.
• Server declared: The scheduling parameters are defined in the server component
and they are the same for all the received invocations.
• Transaction controlled: The scheduling parameters of an invocation depend on
the transaction [16] and the particular step inside the transaction in which the invocation takes place. This scheme enables better schedulability results, since it
allows imposing at run time scheduling parameters that may be different for each
invocation in the context of an end-to-end flow [18]. The values of these parameters are obtained from the analysis using holistic priority assignment tools.
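The three assignment schemes can be sketched as follows. Only the service name (SchedulingParameterService) and the scheme names come from the text; all signatures and data structures below are illustrative assumptions:

```python
class SchedulingParameterService:
    """Internal environment service: changes the scheduling parameters of the
    invoking thread (modeled here as a thread-id -> parameter map)."""
    def __init__(self):
        self.current = {}

    def set_for_thread(self, thread_id, param):
        self.current[thread_id] = param

def receive_request(service, thread_id, scheme, *, client_param=None,
                    server_param=None, step_table=None, step=None):
    """Interceptor action executed before the operation runs on the component."""
    if scheme == "client_propagated":
        param = client_param           # keep the parameters of the caller
    elif scheme == "server_declared":
        param = server_param           # one value for all received invocations
    elif scheme == "transaction_controlled":
        param = step_table[step]       # per-step value from holistic analysis
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    service.set_for_thread(thread_id, param)
    return param
```

The configuration parameters of the interceptor in the deployment plan would determine which branch is taken for each operation.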
SchedulingParameterService: It is an internal environment service invoked by the
interceptors to change the scheduling parameters of the invoking thread. The kind of
scheduling parameters that will be effectively used depends strongly on the execution
platform: it may be a single priority, a deadline, or the contract to use in the case of a
FRESCOR flexible scheduling platform.
4 Architecture of a component implementation
There are two complementary aspects that a component implementation must address:
• The component has to implement the functionality that it offers through its facets,
making use of its own business logic and the services of other components.
• The implementation must include the resources necessary to instantiate, connect
and execute the component in the corresponding platform. This aspect is addressed
by implementing the appropriate interfaces that allow the component to be managed
in a standard way; in our case, those defined by LwCCM.
Each aspect requires knowledge about a different domain. For the first aspect, an expert
on the application domain corresponding to the component functionality is required.
For the second, however, what is required is an expert on the corresponding component technology.
The proposed architecture for a component implementation seeks a structural
pattern that achieves independence between the Ada packages that implement each aspect.
Besides, the packages that implement the technology-related aspects are to be automatically generated according to the component specification. With this approach, the
component developer only has to design the business code of the component.
The proposed architecture is based on the reference one proposed by LwCCM, but
adapted for:
• Making use of the abstraction, security and predictability characteristics of Ada.
• Including the capacity for controlling threading characteristics of the components.
• Facilitating the automatic generation of code taking the IDL3 specification of the
component as input and generating the set of classes that represent a component in
the technology.
• Providing a well-defined frame in which the component developer designs and
writes the business code.
In the proposed technology, the architecture of a component is significantly simplified
as a consequence of the usage of connectors. When two connected components are
installed in different nodes, the client component only interacts locally with the proxy
fragment of the connector, while the server component only interacts locally with the
servant fragment of the connector. Therefore, all the interactions between components
are local, since it is the connector that hides the communication mechanisms used for
the interaction.
Fig. 3. Example of Component Wrapper Structure for ComponentX
For each component, four Ada packages are generated. Three of them are
completely generated by the tool, while the last package leaves the “blank” spaces in
which the component developer should include the business code of the component.
The first module represents the adapter (or container) of the component. It includes the
set of resources that adapt the business code of the component to the platform,
following the interaction rules imposed by the technology. It defines three classes:
• The wrapper class of the component, called {ComponentName}_Wrapper, which
represents the outermost class of the component. It offers the equivalent interface of the component, which LwCCM establishes as the only interface that can
be used by clients or by the deployment tool to access the component. For that,
the class implements the CCMObject interface, which, among others, offers operations to access the component facets, or to connect the corresponding server
components to the receptacles. Besides, the capacity to incorporate interceptors is
achieved by implementing the Client/ServerContainerInterceptorRegistration
interfaces, a modified version of the interfaces with the same name defined in
QoSCCM [17]. As shown in Figure 3, this class is a container that aggregates or references all the elements that form the component:
- The component context, through which components access their receptacles.
- The home, which represents the factory used to create the component instance.
- The executor of the component, which represents its real business code implementation. Its structure is explained below.
- An instance of a facet wrapper class that is aggregated for each facet of the
component. They capture the invocations received in the component and transfer them to the corresponding facet implementations, which are defined in the
executor. The facet wrappers are the place in which the interceptors for managing non-functional features are included.
• The class that represents the context implementation, called {ComponentName}_Context. It includes all the information and resources required by the
component to access the components connected to its receptacles.
• The {ComponentName}_Home_Wrapper, which implements the equivalent interface of the home of the component. It includes the class procedures (static) that
are used as factories for component instantiation.
The rest of the generated Ada packages contain the classes that represent the implementation of the business code of the component (the executor). The LwCCM standard fixes
a set of rules that define the programming model to follow in order to develop a component implementation. Taking the IDL3 specification of a component, CCM defines a
set of abstract classes and interfaces which have to be implemented, either automatically or by the user, to develop the functionality of the component. This set of root
classes and interfaces are grouped in the generated package {ComponentName}_Exec.
The {ComponentName}_Exec_Impl package includes the concrete classes for the
component implementation. They inherit from the classes defined in the previous
package. The class that represents the component implementation, {ComponentName}_Exec_Impl, which is shown in Figure 4, has two attributes:
• A reference to the component context. It is set by the environment through the
set_session_context() operation, and it is used to access the receptacles.
• An aggregated object, of the {ComponentName}_Impl class, whose skeleton is
generated by the tool and has to be completed by the developer.
The {ComponentName}_Impl class, represented in Figure 4, is defined in a new package, in order to hide the environment internals from the code developer. It represents the
reference frame in which the developer introduces the business code. Relevant elements of that class are:
• For each facet offered by the component, a facet implementation object is aggregated.
• Each activation port defined in the specification of the component represents a
thread that is required by the component to implement its functionality. For
implementing those threads, two kinds of Ada task types are defined. The
OneShotActivationTask executes the corresponding run() procedure of the
port once, while the PeriodicActivationTask executes the update()
procedure of the corresponding port periodically. Both task types receive as a discriminant,
during their instantiation, a reference to the data structure that qualifies their
Fig. 4. Example of Component Implementation Structure for ComponentX
execution, including scheduling parameters, period, state of the component, etc.
These threads are activated and terminated by the environment by means of
standard procedures that LwCCM includes in the CCMObject interface to control
the lifecycle of the component.
• All the implementation elements (facet implementations, activation tasks, etc.)
operate according to the state of the component, which is unique for each instance.
Because of that, the state has been implemented as an independent aggregated class,
which can be accessed by the rest of the elements, avoiding cyclic dependencies.
Most of the code of this class is generated automatically; the component developer
only has to write the body of the activation port procedures (run and update) and the
set of operations offered by each of the facet implementations. The structure generated for a connector is exactly the same, but in that case the “business” code, which consists of the code required to implement remote invocations, is also generated
automatically by the deployment tool.
The code generation tool follows a very simple scheme. It is based on a set of
parameterized source code templates that represent all the possible code blocks that
can appear in any of the four packages to which a component specification is mapped.
The parameters represent the identifiers or options inside the blocks which depend on
the implemented component. The tool generates the whole code by inserting
consecutive blocks according to the elements defined in the specification of the
component (ports, operations, etc). For each element, its corresponding identifiers and
qualifiers are extracted, and they are used to replace the corresponding parameters in
the code block.
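The template-instantiation step can be illustrated with a minimal sketch (the template text, placeholder names and port identifiers are invented for the example; the real tool emits Ada packages from its own template set):

```python
import string

# A parameterized code-block template, as the generation tool might store it;
# ${port} and ${interface} stand for identifiers extracted from the IDL spec.
FACET_TEMPLATE = string.Template(
    "function Provide_${port}\n"
    "  return ${interface}_Ptr;\n"
)

def generate_blocks(ports):
    """Instantiate one code block per element declared in the component spec,
    replacing each parameter with the element's identifiers."""
    return "".join(
        FACET_TEMPLATE.substitute(port=p, interface=i) for p, i in ports
    )

print(generate_blocks([("ThePortA", "InterfaceA")]))
```

Consecutive blocks are appended in this way for every port and operation of the component, yielding the complete generated package.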
The currently available Ada mapping for IDL [19] is based on Ada 95, so for the
development of the code generation tool it has been necessary to define new mappings
for some IDL types in order to benefit from the new concepts introduced in Ada 2005.
The main change concerns the usage of interfaces: the old mapping for the IDL
“interface” type leads to a complex Ada structure, whereas it can now be mapped directly to
an Ada interface. Besides, some data structures defined in IDL, such as the
“sequence” type, can now be implemented with the new Ada 2005 containers.
5 Practical experience
At the time of the first attempts made to validate the proposed technology, there was
no real-time operating system with support for Ada 2005 applications, so the tests
were run on a Linux platform, using the GNAT (GAP) 2007 compiler. The construction of the connectors for the communication between remote components was made
using the native Ada Distributed Systems Annex (DSA), Annex E of the Ada specification. The implementation of the DSA used was GLADE [20]. Distributed test applications
were developed and executed successfully. The platforms used in this evaluation were
sufficient for the conceptual validation of the technology, since from the point of view
of the software architecture the final code is equivalent; but, of course, they are not appropriate for the validation of the timing properties of real-time applications.
The recently released new version of MaRTE_OS [21] now provides support for the
execution of Ada 2005 applications, and allows the technology to be tested on a hard real-time environment. Still, a real-time communication middleware is lacking. An
enhanced version of GLADE that enables message priority assignment exists for
MaRTE_OS & GNAT [22], but it has not been ported to the new versions. To
overcome this limitation, we have developed simpler connectors using a link-layer
real-time protocol. Our first tests on a real-time platform have been done with
connectors that directly use the RT-EP [23] protocol for the communication between
remote components. The same application tested on the Linux platform was used on
MaRTE_OS and, as expected, the code of the components did not require any
modification; the only necessary change was the development of the new connectors
suitable for the new communication service (RT-EP).
6 Conclusion and future work
This paper proposes a model-based technology for the development of real-time component-based applications. The usage of the Ada language for its implementation
makes it particularly suitable for applications that run on embedded nodes with limited
resources and strict timing requirements. The technology is based on the D&C and
LwCCM standard specifications, which have been extended in order to support the
development of applications with a predictable and analyzable behaviour.
The key features of this technology have been specified and tested successfully. Nevertheless, some challenges arise for this community to face. The most rewarding of them
is the availability of a native Ada communication middleware, here used in the development of connectors, which must exhibit predictable behaviour and allow a priority
assignment for the messages based on the transactional (or so-called end-to-end flow)
model. Our aim is to develop the connectors using the Ada Distributed Systems Annex,
so that applications rely only on the Ada run-time infrastructure with no additional
middleware, which is highly desirable to target small embedded systems.
As future work, more tests have to be applied in order to quantify the concrete
overheads introduced by the technology. A planned enhancement for the technology is
the construction of a graphical environment to integrate all the stages of development
of an application: design, code generation, analysis and, finally, execution. Another
effort that has been started in the OMG and arises from this work is the elaboration of
an updated version of the mapping from IDL to Ada 2005 [24].
References
[1] IST project COMPARE: Component-based approach for real-time and embedded systems. http://www.ist-compare.org
[2] IST project FRESCOR: Framework for Real-time Embedded Systems based on Contracts. http://www.frescor.org
[3] OMG: "Lightweight Corba Component Model", ptc/03-11-03, November 2003.
[4] T. Taft et al., editors: Ada 2005 Reference Manual. Int. Standard ISO/IEC 8652/1995(E) with Technical Corrigendum 1 and Amendment 1. LNCS 4348, Springer-Verlag, 2006.
[5] M. Aldea et al.: "FSF: A Real-Time Scheduling Architecture Framework". Proc. of the 12th RTAS Conference, San Jose (USA), April 2006.
[6] OMG: "Deployment and Configuration of Component-Based Distributed Applications Specification", version 4.0, formal/06-04-02, April 2006.
[7] A. Möller, M. Åkerholm, J. Fredriksson and M. Nolin: "Evaluation of Component Technologies with Respect to Industrial Requirements". Proc. of the 30th Euromicro Conference on Software Engineering and Advanced Applications, August 2004.
[8] R. Ommering, F. Linden and J. Kramer: "The Koala component model for consumer electronics software". IEEE Computer, IEEE (2000) 78-85.
[9] K.-L. Lundbäck, J. Lundbäck and M. Lindberg: "Component based development of dependable real-time applications". Arcticus Systems, http://www.arcticus-systems.com
[10] E. Bondarev, P. de With and M. Chaudron: "Predicting Real-Time Properties of Component-Based Applications". Proc. of the 10th RTCSA Conference, Goteborg, August 2004.
[11] E. Bondarev et al.: "CARAT: a toolkit for design and performance analysis of component-based embedded systems". Proc. of the DATE 2007 Conference, April 2007.
[12] M. Åkerholm et al.: "The SAVE approach to component-based development of vehicular systems". Journal of Systems and Software, Vol. 80, No. 5, May 2007.
[13] X. Ke, K. Sierszecki and C. Angelov: "COMDES-II: A Component-Based Framework for Generative Development of Distributed Real-Time Control Systems". Proc. of the 13th RTCSA Conference, August 2007.
[14] K. C. Wallnau: "Volume III: A Technology for Predictable Assembly from Certifiable Components". Technical report, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, USA, April 2003.
[15] P. López, J. M. Drake and J. L. Medina: "Real-Time Modelling of Distributed Component-Based Applications". Proc. of the 32nd Euromicro Conference on Software Engineering and Advanced Applications, Croatia, August 2006.
[16] M. González Harbour, J. J. Gutiérrez, J. C. Palencia and J. M. Drake: "MAST: Modeling and Analysis Suite for Real-Time Applications". Proc. of the Euromicro Conference on Real-Time Systems, June 2001.
[17] OMG: "Quality of Service for CORBA Components", ptc/06-04-05, April 2006.
[18] OMG: "A UML Profile for MARTE", ptc/07-08-04, August 2007.
[19] OMG: "Ada Language Mapping Specification", Version 1.2, October 2001.
[20] L. Pautet and S. Tardieu: "GLADE: a Framework for Building Large Object-Oriented Real-Time Distributed Systems". Proc. of the 3rd IEEE Intl. Symposium on Object-Oriented Real-Time Distributed Computing, Newport Beach, USA, March 2000.
[21] M. Aldea and M. González: "MaRTE OS: An Ada Kernel for Real-Time Embedded Applications". Proc. of the International Conference on Reliable Software Technologies, Ada-Europe 2001, Leuven, Belgium, Springer LNCS 2043, May 2001.
[22] J. López Campos, J. J. Gutiérrez and M. González Harbour: "The chance for Ada to support distribution and real-time in embedded systems". Proc. of the 9th Intl. Conference on Reliable Software Technologies, Ada-Europe 2004, Palma de Mallorca, Spain, June 2004.
[23] J. M. Martínez and M. González: "RT-EP: A Fixed-Priority Real Time Communication Protocol over Standard Ethernet". Proc. of the 10th Int. Conference on Reliable Software Technologies, Ada-Europe 2005, York (UK), June 2005.
[24] J. Medina: "Status report of the Ada2005 expected impact on the IDL to Ada Mapping". OMG documents mars/07-09-12 and mars/07-06-13, http://www.omg.org, 2007.
An Architecture to Support Dynamic Service Composition in Distributed
Real-Time Systems
Iria Estévez-Ayres
Dpto. Ing. Telemática
Univ. Carlos III de Madrid
Leganés, Madrid, Spain
[email protected]
Luís Almeida
DET / IEETA-LSE
Universidade de Aveiro
Aveiro, Portugal
[email protected]
Abstract
Recently, new trends in application development for distributed platforms, such as the composable services model,
attempt to provide more flexibility in system design, deployment and execution. Such trends, and particularly the referred composable services model, can also be beneficial
in real-time distributed embedded systems, providing
a means to support more adaptive behaviors, reacting to
the execution environment or coping with system reconfiguration. This paper explores a relatively new direction,
which is the extension of the service-based model to dynamic, i.e. at run-time, composition in real-time distributed
environments, in order to support the level of flexibility and
adaptability referred above. The paper proposes an architecture to support such dynamic service composition that
is based on the Flexible Time Triggered (FTT) communication
paradigm. To achieve the desired goal, we also redefine the concepts of service and service-based application in the context of the FTT paradigm. Finally, we show
experimental results obtained with a prototype implementation of the proposed architecture that confirm its feasibility
and good temporal behavior.
1 Introduction
In the last few years, distributed software systems have
become more dynamic allowing transparent distribution,
self-reconfiguration, portability and migration of applications, etc. As a consequence, new application development
paradigms emerged such as those based on the usage of
multiple services dispersed in the environment [22]. These
new paradigms, used in conjunction with the composable
service model, provide more flexibility in the application development and execution. Instead of monolithic applications resident in one single node, it is possible to create
Marisol García-Valls, Pablo Basanta-Val
Dpto. Ing. Telemática
Universidad Carlos III de Madrid
Leganés, Madrid, Spain
{mvalls, pbasanta}@it.uc3m.es
applications dynamically from existing services, possibly
remote, enhancing the reuse of code and decreasing the development time.
The composable service model can also be advantageously used in the development of distributed real–time
embedded systems, identifying services with clear interfaces that are executed according to a specific sequence.
The services available, possibly with multiple versions, can
be shared among different applications and can change online providing a high level of adaptability to varying operational conditions, enhancing the efficiency in the use of
system resources [19]. Examples include networked control
systems, where different controllers and filters can be seen
as services that can be shared and updated on-line, and also
distributed multimedia systems, in which the different services correspond to filters and encoders/decoders possibly
with different levels of Quality of Service (QoS).
The dynamic composition of services in this kind of applications can be interesting not only from the application
development point of view, by means of automatic composition of software components, but mainly as a way to support
on-line updating and reconfiguration of services, for example to provide dynamic QoS management, load balancing
and fault tolerance, as explained next:
• Dynamic software and hardware updating. Whenever
a new version of a service appears in the system, it
will be analyzed to check whether the performance of
the respective application using this new version can
be improved, according to a pre-specified metric. Depending on such analysis, the system might switch service versions and recompose the applications that use
them.
• Fault tolerance: Different service implementations
are used as backups to assure the survival of the system
if one of the nodes involved goes down. In this situation, upon a service failure, a failure detector requests
the removal of the failed service, causing the applications recomposition with another available version of
that service.
flexibility and the timeliness required to support dynamic
composition of real-time systems based on services allocated in nodes connected to a real-time network.
The rest of the paper is organised as follows: Section 2
briefly introduces the background of this work; Section 3
presents the underlying application model; Section 4 describes the proposed approach; Section 5 presents preliminary experimental results; and, finally, Section 6 outlines
the main conclusions.
• Load balancing: Whenever the performance of an application degrades because excessive load in a given
node is causing poor performance to one of its services, then the system can look for other versions of
that service residing in different nodes and currently
exhibiting better performance and invoke the recomposition of the application to exploit those alternatives.
This allows maximizing application performance at
each moment. Worst-case response time analysis can
be used to deduce the current performance level of a
given service as a function of the load in the respective
node.
2 Related Work
The component service model has been recently applied
to distributed environments such as Ubiquitous Computing
and Peer-to-Peer, to develop generic frameworks that allow
providing end–to–end application QoS via QoS–aware middleware systems [16, 10, 11, 13]. However, these systems
are not suitable for composing real–time applications since
the specified QoS does not take into account the underlying hardware and the scheduling requirements of the whole
system.
Integration of QoS characteristics into component technology has also been studied in [5]. However, these approaches aim at a rather static composition environment
and enterprise applications based on components and not
strictly on services.
In the real–time world, component–based software development (CBSD) [4, 12] is an emerging development
paradigm that enables the transition from monolithic to
open and flexible systems. This allows systems to be assembled from a pre–defined set of components explicitly
developed for multiple usages. However, none of these approaches can be directly applied to dynamic composition
of real–time components (or more generally, applications)
in a distributed environment because they are focused on
the design phase rather than on the execution phase. Within
the context of reconfigurable real–time systems Tesanovic’s
work [23] on reusable software using aspects and components defines a WCET–based composition model for systems that execute in a single node, thus not being suitable for distributed applications. Wang’s work [24], on the
other hand, studies the use of aspects and components in
a COTS solution, CIAO, the Component–Integrated ACE
ORB. This solution is too heavy to be applied to distributed
embedded systems based on simple microcontrollers.
Another line of work that bears some relationship with
the scope of this paper is the use of multiple modes. In a
way, multiple modes also correspond to dynamic reconfigurations. Traditionally, real–time systems are assumed to
be formed by a fixed set of tasks that are scheduled to fulfil
the application requirements and exhibit a desired behavior.
Applications needing to exhibit more than one behaviour
are known as multi–moded applications [20]. These appli-
As a practical example consider a vision system for textile inspection based in a parallel computer cluster[3]. Each
acquisition node delivers camera frames to one among n
computing nodes. However, acquisition nodes are sometimes idle, for example when the fabric roll is changed, and
thus, the load of the computing nodes varies. Using a compositional framework as referred above we can consider the
processing of each vision data stream as an application and
distribute the load of the computing nodes on–line to enhance the performance of the whole system. Another example concerns high availability systems, such as those involved in energy transportation [6]. A fault tolerance solution using the dynamic service composition referred above
can provide the required flexibility, reusability and portability, at a reasonable cost, instead of an ad hoc solution
at application level that lacks such attributes and would be
more expensive. Also, solutions at hardware and OS–level,
only, do not solve all dependability problems of those applications as indicated by the end-to-end argument [21].
However, in order to be effective for distributed real-time
embedded systems, the proposed compositional framework
must allow for on–line addition/removal of services and
nodes, transparent code mobility, scalability and continued
timeliness. This combination of attributes is not trivial to
achieve and requires an adequate arquitectural support.
Previous work [7, 9] focused on the definition of a framework that allowed performing the off-line composition of
static real-time applications from existing downloadable
services but this framework did not deal with dynamic distributed real-time systems or with the possibility of using
remote services. In this paper we define a model and propose an architecture that integrate techniques inspired by
emerging models of distributed computing for on-line composition of real-time services, offering the possibility of dynamic reconfiguration of software and hardware. To build
this architecture we use the Flexible Time Triggered (FTT)
communication paradigm [17] that provides the operational
XI Jornadas de Tiempo Real (JTR2008)
cations consist of a fixed number of operating modes, each
one producing a different behaviour, implemented by different task sets. The set of modes is established off–line and
thus, despite the dynamism introduced by allowing on–line
mode changes, the overall run–time flexibility is still limited, particularly with respect to the level targeted by the architecture that we propose in this paper. In fact, we focus on
run–time flexible systems in which the composition of the
applications is not established a priori and can change during the system lifetime due to the arrival/departure of new
versions of the services that compose the applications, and
due to the possible composition of new applications with
the existing services.
tion, nor the means for service composition. This extension
is the main contribution of this paper.
3 Application model
We consider a distributed system executing recurrent applications that span over several nodes. Each application is
composed of a set of services possibly residing in different
nodes and invoked in a given sequence. Services are materialized in tasks, each with worst-case computation time
Cit and relative deadline Dit . Given the recurrent nature of
the applications, tasks will also be recurrent with period Tit .
Global synchronization is available thus supporting use of a
release offset Oit .
Tasks communicate with each other across the system
exchanging messages which are read at the beginning of
their execution and generated at the end, in every periodic
instance. We classify tasks in four groups according to the
way they interact with one another. A task needing data
from other tasks is a consumer; a task that generates data
for other tasks is a producer; a task can also be both simultaneously, i.e., a consumer/producer; tasks that do not
interact with others are called stand–alone. The tasks in the
first three groups are generically called interactive and their
interactions are supported by message–passing with multicast, i.e. several consumers may receive a message sent by
one producer.
There is no control flow between tasks and messages. Their interface is based on data buffers which are
read/written by the tasks. Messages are then considered independent entities, triggered autonomously by the network,
each with worst-case transmission time Cim , relative deadline Dim , period Tim and release offset Oim .
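As an illustration, the task and message parameters above can be captured in a small data model; the names below (Task, Message, and the concrete values) are ours, not part of any FTT implementation:

```python
from dataclasses import dataclass

# Illustrative model of the paper's task/message parameters (C, D, T, O).
# Names and values are hypothetical; the real FTT structures differ.

@dataclass
class Task:
    name: str
    wcet: int      # Ci^t, worst-case computation time
    deadline: int  # Di^t, relative deadline
    period: int    # Ti^t, period
    offset: int    # Oi^t, release offset

@dataclass
class Message:
    name: str
    wctt: int      # Ci^m, worst-case transmission time
    deadline: int  # Di^m, relative deadline
    period: int    # Ti^m, period
    offset: int    # Oi^m, release offset

# In a single-rate transaction a producer task and the message it
# generates share the same period; the message is released once the
# task's deadline has passed.
t1 = Task("tau1", wcet=2, deadline=10, period=100, offset=0)
m1 = Message("msg1", wctt=1, deadline=5, period=100, offset=t1.offset + t1.deadline)
```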
Hence, an application corresponds to an ordered graph of
services (tasks) and the respective interactions (messages)
as represented in fig. 1(a), which is executed periodically. A corresponding transaction (or data stream, as in [2])
is then defined as the sequence of task executions and message transmissions between the triggering of the first producer and termination of the last consumer involved in an
application. Typically, all tasks and messages in a transaction share the same period, though not necessarily, as in
multirate control systems in which inner loops execute at
rates that are integer multiples of a lower rate at which an
outer loop executes.
If we have multiple implementations (versions) of a task
on different nodes, e.g. tasks 2A and 2B in fig. 1(b), the
system must decide, according to an appropriate parameter,
e.g., the worst–case execution time (WCET), worst–case response time (WCRT) or another QoS metric, which service
version will be used to compose the application, using a
composer engine. However, for transparent composition we
need a mechanism to hide the existence of different imple-
More related work can be found in the field of Fault Tolerance, where active, passive and hybrid replication mechanisms have been developed [15]. The architecture proposed in this paper supports a passive replication approach
in which backup services are activated upon the failure of
active ones. A particular aspect is that previous approaches
seem to have focused on using similar service or component instances, replicas, for backup elements. However, in
our case the backups do not need to be similar and can even present different levels of performance or QoS.
When necessary, the system should use the best backup
service available to compose each application. This may
support more efficient and thus less expensive fault-tolerant
replication mechanisms. N-version software [1] is a particular approach that targets high coverage of software faults.
This approach also uses several different versions of each
software entity but it is normally associated with active replication and voting mechanisms that are capable of handling
broader fault models but are substantially more expensive
and less flexible than the approach proposed in this paper.
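A minimal sketch of the best-backup selection argued above, under stated assumptions (version names and QoS scores are illustrative, not from the cited systems):

```python
# Sketch: on failure of the active version of a service, the system
# composes the application with the best remaining version by QoS
# score. Backups need not be identical replicas; they may offer
# different levels of performance. All names/scores are illustrative.

def best_backup(versions, failed):
    """versions: dict mapping version name -> QoS score (higher is better)."""
    alive = {name: qos for name, qos in versions.items() if name != failed}
    return max(alive, key=alive.get) if alive else None

versions = {"2a": 0.9, "2b": 0.7, "2c": 0.5}
assert best_backup(versions, failed="2a") == "2b"
```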
In the remainder of the paper we propose and describe
an architecture that exploits the FTT paradigm [17] to support dynamic service composition in distributed environments. The composition is driven by specific performance
(QoS) metrics defined a priori thus being QoS–aware. The
FTT paradigm is particularly suited to support this model
because it concentrates in a single node, known as FTT–
master, the temporal properties of all message streams and
tasks currently running in the system, and distributes to the remaining nodes the schedules generated at run–time. Thus,
changes in the system configuration are readily and synchronously applied to the whole system. Moreover, most
of the system management overhead is concentrated on one
node only, i.e., the FTT–master, which can be replicated for fault tolerance purposes [8], while the remaining nodes can
be relatively simple. This paradigm has been implemented
over Controller Area Network (CAN) [8] using simple 8-bit microcontrollers and over Ethernet, both in shared [18]
and switched [14] modes. However, the FTT paradigm by
itself does not provide the notions of service and applica-
4. Sistemas Distribuidos
carried out by data buffers, as considered in the application
model above, without explicit control signals.
Beyond the tasks and messages scheduled by the master, the FTT paradigm also considers the existence of a
specific window inside each EC to support the transfer of
non-periodic messages, called asynchronous, e.g., related
to management actions, signalling and logging (CM and
NRTM). These, however, do not interfere with the scheduled, i.e., synchronous, messages (SM).
Figure 1. Application model abstraction: (a) distributed application abstraction; (b) different versions of service 2; (c) hiding the different versions.

Figure 2. FTT Elementary Cycle: each EC starts with a trigger message (TM), followed by a synchronous window (SM messages) and an asynchronous window (CM and NRTM messages).
mentations of the same service and make it transparent to
the other tasks that interact with that service (fig. 1(c)).
Such a mechanism is basically a system manager that controls the execution of tasks and transmission of messages
across the distributed system, including the respective multiple versions, thus inherently playing the role of composer.
To implement this mechanism we use the FTT–paradigm
that already includes a system manager to control task executions and message transmissions by means of specific
trigger messages, and we add the necessary structures to
support the composition, namely to describe the applications and their associated services as well as all the service
versions available and their properties.
Some important features of the FTT paradigm are: Slave
nodes are not aware of the particular scheduling policy and
the master can change it on–the–fly autonomously or on demand; Slave nodes are synchronized upon the reception of
the TM; All dispatching information per EC is conveyed in
the TM; The master holds all current timing requirements
of the system; Any changes to such requirements are subject to on–line admission control to filter out changes that
could jeopardize the system schedulability; The activation
periods, the deadlines and the offsets are integer multiples
of the EC duration.
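A minimal sketch of the per-EC dispatching implied by these features, assuming all times are expressed in EC units; function and stream names are hypothetical, not the real FTT master data structures:

```python
# Sketch of the FTT master's per-EC dispatching: every stream whose
# release pattern matches the current EC index is listed in that EC's
# trigger message (TM). Periods and offsets are integer multiples of
# the EC duration, as the FTT paradigm requires.

def due_in_ec(ec_index, offset, period):
    """A stream with the given offset/period (in ECs) is dispatched in
    ECs offset, offset + period, offset + 2*period, ..."""
    return ec_index >= offset and (ec_index - offset) % period == 0

def build_trigger_message(ec_index, streams):
    """streams: list of (id, offset, period) tuples, times in EC units."""
    return [sid for sid, off, per in streams if due_in_ec(ec_index, off, per)]

streams = [("msg01", 0, 2), ("msg12", 1, 2), ("msg3", 0, 4)]
assert build_trigger_message(0, streams) == ["msg01", "msg3"]
assert build_trigger_message(1, streams) == ["msg12"]
```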
4 Applying the model to FTT

4.2 Temporal model of an application
In the FTT–paradigm the transmission of messages and
the execution of tasks can be triggered independently of
each other by the central scheduler. Alternatively, the
scheduler may trigger just the message transmissions while
the associated tasks are triggered by the reception of those messages,
using callback functions. In any case, the application transactions are triggered by the central scheduler, which resides
in a network component, and thus we call this a network–
centric approach [2, 14].
The network–centric approach requires the definition of
appropriate message offsets to achieve low transaction end-to-end delay. In the example shown in fig. 3, messages m1
and m2 are scheduled by the central scheduler and trigger
tasks τ2 and τ3 respectively. Task τ1 is triggered by the central scheduler with offset 0 and represents the start of the
respective transaction. The offset O2m must be larger than, but as close as possible to, O1m + W1m + W2t , where W1m and
as close as possible to O1m + W1m + W2t where W1m and
W2t are the worst-case response times of message m1 and
task τ2 , respectively, considering all possible interferences.
If a new implementation of task τ2 becomes available with
a shorter C2t , thus a shorter W2t , the central manager can reduce O2m accordingly, thus decreasing the transaction end-to-end delay. The worst-case response time analysis must
4.1 FTT–paradigm principles
The FTT paradigm [17] uses an asymmetric architecture,
with one master node controlling multiple slave nodes. The
master holds the system requirements (communications and
computations) and executes the scheduling policies and on–
line admission control. Since the requirements and operational data and mechanisms are located in just one node, this
architecture supports fast and atomic changes to the system
configuration. Master redundancy is used to tolerate master
crashes.
The scheduling of the tasks and messages is carried out
on–line in the master, for a time window called Elementary
Cycle (EC). At the beginning of each EC, the master distributes across the system the schedule for that EC, called
the EC-schedule, using trigger messages (TMs), fig. 2. The
TM (see structure in table 1) is the only control message
sent per EC and it may trigger the transmission of several
slave messages, called synchronous messages, and task executions. These messages use indirect source addressing
with unique identifiers assigned to all sources. Moreover,
the interface between tasks and synchronous messages is
Table 1. Original structure of the Trigger Message

Type (2 Bytes): [b15, b12] = TM type (TM MESG ID); [b11, b0] = Master ID, range [0, 4096)
TM Flags (2 Bytes): reserved bits, not defined; [b7, b0] = Num. Sec., range [0, 256)
Num. synch. msgs (2 Bytes): [b15, b0], range [0, 65536)
Per synchronous message (repeated): ID (1 Byte, [b7, b0], range [0, 256)); Time Tx (2 Bytes, [b15, b0], range [0, 65536))
Figure 3. A transaction and its timeline: producer τ1, producer–consumer τ2 and consumer τ3, linked by messages msg1 and msg2; the end-to-end delay runs from the release of τ1 (t = 0) to the completion of τ3.

Figure 4. Proposed architecture: the Application-Service-Nodes DB, Lookup service, App Admission Control and Composer (with an addMessage(id)/delMessage(id) interface) sit on top of the FTT-Ethernet layer (App Interface, Admission Control, Scheduler, EC Scheduler, Register, Dispatcher) over Ethernet.
be executed on–line whenever a service is added/removed
from a node. The application composer must then deduce
the appropriate offsets, for example, using the techniques
proposed in [2]. Notice, however, that both the response time analysis and the generation of the required offsets are beyond the scope of this work.
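The offset rule above can be sketched as follows, assuming all times are expressed in EC units so that offsets round up to integer EC multiples; the function name is ours:

```python
import math

def next_offset(prev_offset, prev_msg_wcrt, task_wcrt):
    """O2m is the smallest integer EC multiple not smaller than
    O1m + W1m + W2t (offsets must be integer multiples of the EC)."""
    return math.ceil(prev_offset + prev_msg_wcrt + task_wcrt)

# A faster version of tau2 (smaller W2t) lets the composer shrink O2m,
# reducing the transaction end-to-end delay.
assert next_offset(0, 1.2, 2.5) == 4   # O1m + W1m + W2t = 3.7 -> 4 ECs
assert next_offset(0, 1.2, 1.1) == 3   # 2.3 -> 3 ECs
```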
parameters and the messages involved. All such information is stored in the Application-Service-Nodes Database.
When a new service implementation arrives in the system, it notifies the master using an asynchronous message
(Lookup service), the master analyzes the respective worstcase response time as well as the performance and schedulability of the target application and of the whole system
(Application Admission Control). If a given QoS parameter is improved, e.g., application end-to-end delay, and the
schedulability of the system is not jeopardized, the master
(Composer) makes the necessary changes at the application
level to replace the old with the new service implementation within the respective application transaction. These
changes are then passed down to the FTT layer as new messages/tasks to be added or replaced in the FTT System Requirements Database.
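The admission decision described above can be sketched as a simple predicate; the names and the choice of end-to-end delay as the QoS metric are illustrative, not prescribed by the architecture:

```python
# Sketch of the admission/composition decision when a new service
# version announces itself: the master (Composer) replaces the active
# version only if the chosen QoS metric improves (here: lower
# end-to-end delay) and the system remains schedulable according to
# the on-line admission control. Names are hypothetical.

def admit_new_version(active_delay, new_delay, schedulable):
    return schedulable and new_delay < active_delay

assert admit_new_version(active_delay=250, new_delay=150, schedulable=True)
assert not admit_new_version(active_delay=150, new_delay=250, schedulable=True)
assert not admit_new_version(active_delay=250, new_delay=150, schedulable=False)
```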
In order to achieve the transparency referred to in Section 3,
the unique identifier of each message is broken into two
parts, a service level identifier and an application level identifier. The former separates different versions of the same
service while the latter identifies the service within an application. The consumers of a message will only check the
application-level identifier, i.e., they listen to any service
version indistinctively. On the other hand, the master and
the producers will look at the whole identifier, which al-
4.3 Proposed architecture
The proposed system architecture is an extension to the
FTT architecture, which simplifies substantially the overall design task by reusing and exploiting the operational
flexibility features of the FTT paradigm. Moreover, it also
allows easy porting to different hardware platforms for
which FTT protocol implementations are available. Fig. 4
shows an implementation over FTT-Ethernet. Using other
communication platforms, e.g., CAN, just requires replacing the FTT-Ethernet block with an FTT-CAN block, whose interface is similar.
Basically, the internal architecture of the FTT–master is
complemented with an extra layer that handles applications
and services. At the application level, the new layer keeps
track of the applications being executed and the services
that compose them while, at the service level, it manages
the several versions of the available services, their location,
worst-case execution/response times, possibly other QoS
lows scheduling one given service implementation among
several available.
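The two-part identifier scheme can be illustrated with the paper's two-digit example identifiers; the packing below is a sketch, not the actual FTT bit layout:

```python
# Illustrative two-part message identifier, mirroring the paper's
# two-digit example: the first digit is the service-level (version)
# part, the second the application-level part. Only the matching rule
# is shown; the real identifier layout differs.

def make_id(version, app_id):
    return version * 10 + app_id

def consumer_accepts(msg_id, wanted_app_id):
    # Consumers check only the application-level part (version = "X",
    # i.e., don't care), so version switches are transparent to them.
    return msg_id % 10 == wanted_app_id

# Messages 12 and 22 are versions 1 and 2 of service 2: a consumer of
# application id 2 accepts either.
assert consumer_accepts(make_id(1, 2), 2)
assert consumer_accepts(make_id(2, 2), 2)
assert not consumer_accepts(make_id(0, 1), 2)
```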
(a) Configuration of nodes
uses Switched Ethernet [14]. Then we defined an experimental setup (fig. 6) using computers running RT-Linux.
The FTT EC is set to 1 millisecond. The system executes
one synthetic application composed of three services, similar to what is shown in fig. 1(b).
(b) New node attach

Figure 6. Configuration of nodes: producer, consumer and two intermediate nodes plus the master, connected through an Ethernet switch (Intel Celeron 735 MHz and EPIC 266 MHz Intel Mobile Pentium MMX machines, each with 128 MB RAM, on 10 Mb/s links).
5.1 Two similar implementations of a consumer/producer task
(c) Reconfiguration

Figure 5. Example of reconfiguration
Fig. 5(a) depicts an illustrative simplified scenario in
which all nodes implement only one service and the application spans over three nodes. The message identifiers have
two digits, the first one referring to the service-level identifier and the second one referring to the application-level
identifier. At run–time, the Trigger Message triggers the
transmission of the synchronous messages 01 (version 0,
service 1) and 12 (version 1, service 2). So, the nodes 1 and
2a that hold the scheduled service versions transmit their
messages over the network in a broadcast fashion. These
messages are received by nodes 2a and 3 respectively. The
X in the service-level identifier means "don't care". When a new node joins the network, it uses a reserved asynchronous message to notify the master of all its information, fig. 5(b). The master assigns an identifier to each new service version (2b in this case) and decides to replace the other
version 2a. Thus, it starts scheduling message 22 (version
2, service 2) instead of 12, fig. 5(c). The following service
in the application, i.e., 3, continues receiving any message
that has application identifier 2, either 12 or 22 and thus it
will not be aware of the change.
Figure 7. Experiment 1: Message arrivals at
the consumer node
This experiment verifies the switching between two versions of a service that corresponds to a consumer/producer
task. The versions reside in the nodes intermediate 1 and
2 and exhibit similar temporal parameters, namely O =
200EC, T = 1000EC, D = T . The master switches between those implementations every 10 seconds. The consumer node is actually receiving both the producer message
and the consumer/producer messages and logging their reception instants (fig. 7). We can see that the switching between both service versions does not cause any significant
extra delay. The end–to–end delay experienced by the
whole application stays around 25.3 microseconds with a
jitter of ±30 nanoseconds and rare peaks of 100 nanosec-
5 Experimental Results
To verify the practical feasibility of the proposed architecture, we implemented it over the FTT-SE protocol, which
onds caused by the run–time system, despite the switching
between versions. Similarly, the interactivation delay of the
consumer task stays steadily near 1 second, as expected,
with again a jitter of ±30 nanoseconds. This experiment
shows that the proposed architecture is able to switch between service implementations in different nodes, on-line,
maintaining the same QoS, and with the remaining tasks not
being aware of the switching.
5.2 Two implementations of a consumer/producer task with different QoS
Figure 9. Experiment 2: End–to–end delay
Figure 10. Experiment 2: Interactivation delay
Figure 8. Experiment 2: Message arrivals at
the consumer node
6 Conclusions and Future Work
In this experiment we change a temporal parameter of one of the consumer/producer tasks. One has an offset of O2Am = 100EC for producing the second message, and the other has O2Bm = 200EC for the same. This offset impacts
directly on the QoS of the application, leading to different
end–to–end delays. We can see (figures 8 and 9) how the
end–to–end delay is modified from 150 to 250 microseconds every time the service implementation is switched, as
expected. The interactivation delay (fig. 10) shows peaks
of ±100 milliseconds when the switching occurs, due to the difference between the offsets of the transmissions carried out by the two versions of the intermediate service.
This experiment shows that when we switch between
two different profiles, i.e., implementations, of a service,
the consumer of the messages produced by that service will not notice the switch, except for an anticipation or delay of its activation caused by the different timing parameters of the profiles, which leads to a corresponding
end–to–end delay modification. This result shows that the
proposed architecture can support on–line updating of code,
even when its timing characteristics are changed.
Next generation distributed real–time systems will require more flexibility and the ability to react and change their behaviour at run–time, i.e., adaptability. Service–
based approaches can be used in the development of such
systems, in order to provide this desired flexibility by means
of the definition of a model and an architecture that support dynamic service composition, allowing the applications to
change dynamically the set of services that compose them,
supporting for each service the definition of different versions or profiles. These approaches also offer the possibility
of providing dynamic QoS management, load balancing and
fault tolerance.
This paper proposed an architecture to support dynamic
service composition in a distributed embedded real–time
environment. This architecture allows an application to be
composed of existing services dispersed in the network, and
to dynamically switch between different versions of any
given service in a transparent way with respect to service
location, timing features or set of consumers of its messages. The architecture is built as an extension to the FTT
paradigm. This extension required the introduction of the
notion of service and service–based application in the FTT
scope. Also, a prototype of the proposed architecture was
implemented as proof of concept, and the experimental results show that the architecture is able to switch between
service implementations in different nodes, without affecting the remaining services in the application, and that it
can support on–line updating of code, even when its timing
characteristics are changed. Future work will consider the
integration of offset configuration tools and schedulability
analysis that can be applied on-line.
[10] X. Gu and K. Nahrstedt. Distributed Multimedia Service
Composition with Statistical QoS Assurances. IEEE Transactions on Multimedia, 8(1):141–151, February 2006.
[11] X. Gu and K. Nahrstedt. On Composing Stream Applications in Peer-to-Peer Environments. IEEE Transactions on
Parallel and Distributed Systems, 17(8):824–837, August
2006.
[12] D. Isovic and C. Norström. Components in Real–time Systems. In Proc. of the 8th Conf. on Real–Time Computing
Systems and Applications, Tokyo, 2002.
[13] J. Jin and K. Nahrstedt. A Distributed Approach for
QoS Service Multicast with Geometric Location Awareness.
IEEE Distributed Systems Online, 4(6), June 2003.
[14] R. Marau, L. Almeida, and P. Pedreiras. Enhancing Real–
Time Communication over COTS Ethernet switches. In
WFCS 2006, IEEE 6th Workshop on Factory Communication Systems, Turin, Italy, June 2006.
[15] S. Mullender, editor. Distributed systems. Addison Wesley,
2nd edition, 1993. Chapters 7 and 8.
[16] K. Nahrstedt, D. Xu, D. Wichadakul, and B. Li. QoS–
Aware Middleware for Ubiquitous and Heterogeneous Environments. IEEE Communications Magazine, 39(2):140–
148, Nov. 2001.
[17] P. Pedreiras and L. Almeida. The Flexible Time-Triggered
(FTT) Paradigm: An Approach to QoS Management in Distributed Real-Time Systems. In IPDPS ’03: Proc. of the
17th International Symposium on Parallel and Distributed
Processing, April 2003.
[18] P. Pedreiras, P. Gai, L. Almeida, and G. Buttazzo. FTTethernet: a flexible real-time communication protocol that
supports dynamic QoS management on Ethernet-based
systems. IEEE Transactions on Industrial Informatics,
1(3):162–172, August 2005.
[19] D. Prasad, A. Burns, and M. Atkin. The Measurement and
Usage of Utility in Adaptive Real–Time Systems. Journal
of Real–Time Systems, 25(2/3):277–296, 2003.
[20] J. Real and A. Crespo. Mode Change Protocols for RealTime Systems: A Survey and a New Proposal. Real Time
Systems Journal, 26(2):161–197, March 2004.
[21] J. Saltzer, D. Reed, and D. Clark. End-to-end arguments in
system design. ACM Transactions on Computer Systems,
2(4):277–288, 1984.
[22] M. Satyanarayanan. Pervasive computing: vision and challenges. IEEE Personal Communications, 8(4):10–17, August 2001.
[23] A. Tešanović, D. Nyström, J. Hansson, and C. Norström.
Aspects and components in real–time system development:
Towards reconfigurable and reusable software. Journal of
Embedded Computing, Feb. 2004.
[24] N. Wang, C. D. Gill, D. C. Schmidt, and V. Subramonian.
Configuring Real–Time Aspects in Component Middleware.
In R. Meersman and Z. Tari, editors, CoopIS/DOA/ODBASE
(2), volume LNCS 3291, pages 1520–1537. Springer, 2004.
Acknowledgements
This work has been partially funded by e-MAGERIT (S-0505/TIC/0251), a project funded by the Education Council of the Government of the Region of Madrid, the European Social Fund,
and the European Regional Development Fund; also by a grant
from IEETA, Universidade de Aveiro, Portugal; and by the European Commission (ARTIST2 NoE, IST-2004-004527).
References
[1] A. A. Avizienis. The Methodology of N-Version Programming. In M. R. Lyu, editor, Software Fault Tolerance, pages
25–46. John Wiley & Sons Ltd, 1995.
[2] M. Calha and J. Fonseca. Data streams – an analysis of the
interactions between real–time tasks. In ETFA 2005, 10th
IEEE Conference on Emerging Technologies for Factory Automation, pages 375– 380, Catania, Italy, September 2005.
[3] J. Cano, J. Perez-Cortes, J. Valiente, R. Paredes, and J. Arlandis. Textile Inspection with a Parallel Computer Cluster.
In 5th International Conference on Quality Control By Artificial Vision. QCAV-2001, Le Creusot (France), 2001.
[4] I. Crnkovic and M. Larsson. A case study: Demands on
Component–based Development. In Proc. of 22nd Int. Conf.
of Software Engineering, Limerick (Ireland), June 2000.
[5] M. A. de Miguel, J. Ruiz, and M. Garcı́a-Valls. QoS–Aware
Component Frameworks. In Proc. of the International Workshop on Quality of Service, May 2002.
[6] G. Deconinck, T. A. V. Vincenzo De Florio, and E. Verentziotis. The EFTOS Approach to Dependability in Embedded Supercomputing. IEEE Transactions on Reliability,
51(1):76, March 2002.
[7] I. Estevez-Ayres, M. Garcia-Valls, and P. Basanta-Val. Enabling WCET–based Composition of Service–based Real–
Time Applications. ACM SIGBED Review, 2(3):25–29, July
2005.
[8] J. Ferreira, L. Almeida, J. A. Fonseca, P. Pedreiras, E. Martins, G. Rodriguez-Navas, J. Rigo, and J. Proenza. Combining operational flexibility and dependability in FTT-CAN.
IEEE Transactions on Industrial Informatics, 2(2):95–102,
May 2006.
[9] M. Garcı́a-Valls, I. Estévez-Ayres, P. Basanta-Val, and
C. Delgado-Kloos. CoSeRT: A Framework for Composing Service–Based Real–Time Applications. In C. Bussler
and A. Haller, editors, Business Process Management Workshops, volume LNCS 3812, pages 329–341. Springer, 2005.
5. Herramientas de Desarrollo
A JAVA PLATFORM FOR THE DEVELOPMENT OF EMBEDDED APPLICATIONS WITH TIMING CONSTRAINTS
Jaime Viúdez, Juan A. Holgado
Departamento de Lenguajes y Sistemas Informáticos
Universidad de Granada
{jviudez, jholgado}@ugr.es
Abstract

Java brings undeniable benefits to the development and maintenance of applications, easing the implementation of flexible, robust, reusable and secure programs. Moreover, its portability property in theory ensures execution on different platforms without recompiling the programs. However, all these desirable properties come at a price, paid in performance, in a lack of predictability, and in excessive memory requirements, which cannot be accepted in applications that must run on embedded environments with limited processing and memory resources and real-time requirements. This work presents the development of a new Java platform, which we have named JAHASE (Java para el Control del Hardware de Sistemas Empotrados: Java for hardware control in embedded systems), which eases the development of applications on a range of commercial microcontrollers and embedded systems that use Java as their main programming language. The platform solves the portability problem present in small-scale systems and adds new mechanisms and functionalities, some of them inspired by the RTSJ specification, providing the developer with a set of utilities that ease their work, such as the unified management of the I2C, SPI and 1-Wire digital buses, or the asynchronous event model.

Since the JAHASE platform can be used on several commercial embedded platforms, its behaviour depends on the concrete characteristics of the embedded device and of the execution environment. For this reason, JAHASE includes a component with which tests or experiments can be implemented to assess more directly the performance the embedded device offers.
1 INTRODUCTION
The development of embedded devices requires a high level of integration between their hardware and software components in order to obtain a product with a low per-unit cost, suitable for mass production [4]. The developer must select the most appropriate hardware components according to the needs in dependability, cost, performance, power consumption, lifetime, etc., from a range of 8-, 16- or 32-bit microprocessors, DSPs (digital signal processors) or ASICs (application-specific integrated circuits), as well as memory modules, input/output devices, counters, etc. On the other hand, the most adequate software components must be integrated; this requires installing an executive or an execution environment that eases the later loading of the final application, and choosing a flexible and safe programming language that allows implementing a program optimized for the concrete needs of the application and the characteristics of the hardware platform.

Most applications for embedded environments are still built using a mixture of the C and assembly programming languages. In this way, developers can implement simpler programs that extract maximum performance from the device, with minimal memory and energy consumption. However, the use of unsafe procedural languages such as C has a negative impact on software development, especially in the case of complex applications, turning it into a tedious and error-prone task in which maintenance and reuse are very difficult [16].
Java is a modern object-oriented programming language that has been successfully applied to the development of desktop and enterprise applications, and has recently been gaining the attention of embedded-systems developers [10][17-18][28]. Java provides the infrastructure and the mechanisms needed to ease the development of robust, flexible, secure and reusable applications which, in theory, can be executed on any hardware platform thanks to the portability property. Java can therefore be an interesting alternative for the design of embedded systems, since it can simplify the design, development, testing and maintenance of complex applications. However, all these characteristics, desirable for application development, require sacrificing other properties that can be very important for a concrete application running on an embedded system with limited memory and processing resources, such as performance, the temporal determinism of the system, or memory consumption.
Se han adoptado varias posibles soluciones que
solventan en parte algunas de estas dificultades, pero
en ningún caso pueden considerarse como soluciones
estándares [17-18, 20, 28]. Cada una de estas
propuestas proporciona una plataforma Java que,
siguiendo en parte algunas de las especificaciones
estándares, imponen restricciones en la máquina
virtual Java (JVM) donde se ejecutan las aplicaciones
Java, y añade nuevos mecanismos y facilidades para
el desarrollo de aplicaciones empotradas incluyendo
para ello nuevas librerías propias generalmente no
conformes con ninguna especificación estándar.
Cuando comparamos algunas de estas propuestas
entre sí nos podemos encontrar diferentes
mecanismos o incluso modelos de programación
diferentes para efectuar las mismas funciones, como
ocurre con la gestión de los temporizadores. Las
divergencias entre las distintas librerías pueden
originar incompatibilidades entre las distintas
soluciones empotradas a las que el desarrollador de
aplicaciones debe estar atento. Es por ello que en
este trabajo se propone el desarrollo de una nueva
plataforma software JAHASE (JAva para el control
del HArdware de Sistemas Empotrados) con tres
objetivos: (a) facilitar la portabilidad de programas
Java sobre sistemas empotrados aún teniendo JVM
diferentes, (b) ofrecer nuevas posibilidades a los
desarrolladores de empotrados como servicios de
perro guardián (watchdog), diferentes tipos de
temporizadores, servicios de bitácora, acceso a
memoria, comunicaciones, etc., y (c) proporcionar un
banco de pruebas que facilite la medida de las
prestaciones concretas del entorno empotrado
utilizado. La plataforma consta de un conjunto de
paquetes y clases que proporcionan un marco de
trabajo uniforme, coherente y transparente para
trabajar con los diferentes entornos empotrados, así
como las facilidades que éstos ofrecen como el
acceso a los puertos de entrada y salida digitales o
analógicos, interfaces de bus (I2C, CAN, 1-Wire, SPI), temporizadores o manejadores de interrupciones. El desarrollo de esta plataforma
software es un componente importante del
middleware JOVIM (Java Open Virtual Instrument
Middleware) [8], y es utilizada para el control de una
maqueta domótica [31], así como parte del desarrollo
de una arquitectura abierta para el control domótico
[9].
2 JAVA PARA SISTEMAS EMPOTRADOS
El desarrollo de aplicaciones para entornos
empotrados requiere conocer las características
hardware del entorno empotrado, el entorno de
ejecución utilizado, y el lenguaje de programación
sobre el que se implementará la aplicación. Pese a los
beneficios que ofrece Java como lenguaje de
programación para el desarrollo de aplicaciones,
existen algunas particularidades que pueden causar
dificultades en el desarrollo de aplicaciones para
sistemas empotrados. A continuación vamos a analizar algunas de estas características,
centrándonos en aspectos de Java que pueden
comprometer el rendimiento y el cumplimiento de las
restricciones temporales.
2.1 MODELOS DE EJECUCIÓN DE LA JVM
La máquina virtual de Java (JVM, Java Virtual
Machine) es el núcleo del paradigma Java, y
determina el rendimiento de la ejecución de los
programas Java. Para ejecutar programas en Java
primero deben ser compilados en un formato binario
portable (en la forma de código intermedio neutro,
llamado bytecode) contenido en ficheros .class, para
que luego los bytecodes contenidos en dichos
ficheros sean ejecutados en la JVM. La JVM no sólo
traduce los bytecodes a instrucciones nativas, sino
que además verifica todos los bytecodes antes de su
ejecución. Existen cuatro modelos de ejecución de la
JVM:
a) Interpretado: este es el modo clásico de ejecución,
en el que la JVM interpreta bytecodes del fichero
.class uno a la vez y ejecuta la operación o secuencia
de operaciones nativas sobre la plataforma software
del dispositivo (el entorno de ejecución del mismo).
En este modo se satisface la característica de
portabilidad de la filosofía Java, pero se
obtiene el peor rendimiento, por lo que este modelo
no se emplea en la actualidad en la implementación
de la JVM.
b) Just-in-Time (JIT). El proceso de traducción de
bytecodes a código nativo se realiza cada vez que un
método se invoca o una clase se carga por primera
vez. La secuencia correspondiente de código nativo
se guarda en memoria RAM para una rápida
ejecución la próxima vez que se invoque el mismo método. Este modo es más rápido que el anterior,
pero requiere una cantidad importante de memoria
RAM para guardar tanto el código Java como su
correspondiente código nativo. Por ello, existen
variaciones del algoritmo JIT como la compilación
adaptativa dinámica (DAC) [6]. La mayoría de las
máquinas virtuales comerciales están desarrolladas
siguiendo este enfoque, como la HotSpot de SUN y
J9 de IBM.
c) Ahead-of-Time (AOT) o compilación cruzada: los
bytecodes Java, o incluso el propio código fuente
Java se traducen directamente a formato binario de la
plataforma nativa del dispositivo, utilizando un
compilador cruzado. Este método es similar al
utilizado por aplicaciones en C/C++ con librerías
compartidas. El fichero ejecutable que se obtiene
puede incluir el fichero objeto más todas las librerías
precompiladas en un solo fichero, o bien las librerías
precompiladas pueden ser enlazadas en tiempo de
ejecución. En cualquier caso, sólo puede ser
ejecutado por el entorno de ejecución específico para
el que fue construido. Con este modo se logra un
rendimiento superior con menos uso de RAM, pero la
portabilidad, la independencia entre plataformas y la
flexibilidad de Java se sacrifican. Este modelo es el
que más frecuentemente se utiliza en el diseño de
máquinas virtuales para empotrados, como, por
ejemplo, JAMAICA que cumple la especificación
RTSJ [23], LEJOS, JELATINE [1], etc.
d) Procesador Java: este modo es el más rápido, ya
que los bytecodes son interpretados y ejecutados
directamente por el hardware del procesador, sin
necesidad de traducción o de tan siquiera un sistema
operativo subyacente. Existen dos formas de ejecutar
bytecodes en hardware: con un procesador Java [7] o
con un chip acelerador [29]. En la primera, se
reemplaza al procesador de uso general por un
procesador hardware específico capaz de ejecutar
bytecodes de JVM como su conjunto de instrucciones
nativo, con lo que sólo puede ejecutar bytecodes
Java. En cambio, en la segunda se incorpora un coprocesador junto al microprocesador de propósito
general, el cual se encarga de traducir los bytecodes
Java en secuencias de instrucciones para el
procesador de propósito general. Con este tipo, el
mismo hardware puede utilizarse para ejecutar
código mixto. Existen varios procesadores Java con
implementaciones diferentes de este enfoque, como
el aJile JEM2, CJip, etc.
2.2 ESTANDARIZACIÓN DEL LENGUAJE Y LAS LIBRERÍAS
El lenguaje Java está sustentado sobre dos pilares
básicos: (a) la especificación del lenguaje y las
librerías, y (b) la implementación del lenguaje.
Con la especificación del lenguaje y las librerías se
definen los elementos y estructura del lenguaje,
estableciendo tanto su sintaxis como su semántica.
La funcionalidad se obtiene a través de las librerías (APIs), que se organizan en un conjunto de paquetes.
Ahora bien, con objeto de controlar el desarrollo y
evolución del lenguaje y de las librerías, se ha
regulado la estandarización de las especificaciones a
través de la comunidad Java JCP (Java Community
Process), en la cual grupos especializados de
compañías y organizaciones académicas pueden
desarrollar nuevas especificaciones de librerías
(APIs) para campos de aplicación específicos y para
las nuevas tecnologías que van surgiendo, como
Bluetooth (JSR-82), tecnología inalámbrica (JSR-185), programación en tiempo real (JSR-1), Java TV
(JSR-927), etc.
Por otra parte, la implementación del lenguaje
conlleva la implementación de la máquina virtual
JVM sobre la cual se van a ejecutar las aplicaciones
Java, y además la implementación de las librerías que
pueden incluir código nativo, cumpliendo la
especificación estándar que satisfacen.
En el mercado de los dispositivos empotrados Java,
se pueden encontrar una gran gama de productos que
satisfacen algunas de las siguientes especificaciones:
− Java Card. La primera versión de esta
especificación fue introducida en 1996, y está
orientada al desarrollo de aplicaciones para
pequeñas tarjetas inteligentes con procesadores
de 8/16/32 bits y fuertes restricciones de
memoria, en el rango de 1 KB de RAM y 16 KB de ROM [12].
− PersonalJava [18]. Introducida en 1997, y
compatible con la versión de JDK 1.1.8, ahora
discontinuada, está orientada a dispositivos
empotrados con 2,5 MB de ROM y un mínimo
de 1MB de RAM. Actualmente esta
especificación ha sido reemplazada por la de
J2ME/CDC. Sin embargo, hay fabricantes que
aún venden sistemas empotrados con librerías
que satisfacen esta especificación como Tini
[25].
− EmbeddedJava. Introducida en 1998, es
compatible con la versión discontinuada del JDK
1.1 orientada a dispositivos de gama baja con
512kB de ROM y 512 KB de RAM sin
interfaces gráficas. Actualmente se ha reemplazado esta especificación por J2ME/CLDC.
− J2ME/CLDC (JSR-30) [11]. Introducida en el
2000, su mercado natural son dispositivos con
capacidad de conexión inalámbrica, como
teléfonos móviles, buscapersonas, etc., y se ha
convertido en el estándar de facto para la
mayoría de los dispositivos empotrados Java,
incluyendo equipos de TV, etc. Requiere al menos 192 kB de memoria y un procesador de 16 o 32 bits.
− J2ME/CDC (JSR-36) [11]. Introducida en el
2001, reemplaza a la especificación PersonalJava
para los dispositivos empotrados con al menos
2MB de memoria y procesadores de 32 bits. Se
orienta principalmente a PDAs y a dispositivos
empotrados con más recursos.
− RTSJ (JSR-1) [13]. En 2001 se introdujo la
especificación para la programación de
aplicaciones de tiempo real. Extiende el lenguaje
Java, sin modificar la semántica del lenguaje
para hilos que no tienen características de tiempo
real, para que soporte tareas de tiempo real con
planificadores que pueden ser modificados
dependiendo del tipo de sistema de tiempo real.
RTSJ fue diseñado para extender a la
especificación J2SE (no a J2ME), por lo que uno
de sus mayores problemas está en la gran
cantidad de memoria que necesitan sus
implementaciones, lo que dificulta su uso en
dispositivos empotrados, especialmente en
aquellos de gama baja. Por ejemplo, la
implementación de referencia de Timesys
requiere 2,6 MB de memoria para la JVM
además de 2,5 MB para las librerías.
− JSR-302. También conocido como “Safety Critical Java Technology”. Es una especificación basada en el JSR-1 (Real-Time Specification for Java) que agrupa una serie de características mínimas para sistemas con requisitos de seguridad críticos (válido para sistemas embebidos). Es una especificación que
está todavía en desarrollo.
El uso de la implementación de una librería API que
sea conforme a una de las especificaciones descritas
anteriormente sobre un sistema empotrado establece
la semántica de Java y las posibilidades que ofrece en
cuanto a la programación. Los fabricantes de
empotrados superan algunas de las debilidades de Java desarrollando nuevas librerías que complementan los mecanismos y medios que se le
ofrecen a los desarrolladores de sistemas empotrados.
Por tanto, por una parte, los fabricantes anuncian la
compatibilidad de sus dispositivos empotrados con
especificaciones Java estándar (como PersonalJava) y
por otro lado incluyen librerías propietarias para
completar el entorno de desarrollo de software para
los dispositivos.
2.3. RECOLECTOR DE BASURA.
El recolector de basura (RC) es un elemento
fundamental de la JVM, cuyo cometido es ayudar al programador, liberándolo de la tarea de liberar los recursos (objetos) que dejan de utilizarse. Este gran
beneficio tiene como contrapartida los siguientes
problemas:
- Produce sobrecarga, debido a que su acción suele ser costosa en tiempo de computación.
- Indeterminismo, pues reclama la memoria de los objetos que han quedado fuera de ámbito en cualquier momento, invalidando los plazos de tiempo de otras tareas.
- Fragmentación de la memoria: las aplicaciones Java crean objetos temporales frecuentemente, lo cual origina huecos en la memoria cuando son liberados.
Para solventar estos problemas existen diversas
posibilidades:
- Implementar nuevos algoritmos para el
recolector de basura que sean más eficientes,
preventivos, que optimicen los recursos de
memoria, etc. Existe una nueva generación de
RC de tiempo real.
- Definir nuevos modelos de memoria, de modo
que el RC sólo pueda actuar sobre objetos
temporales y no sobre objetos que requieran una
persistencia mayor en el tiempo.
- Deshabilitar el uso del RC en las aplicaciones.
2.4. MODELO DE CONCURRENCIA.
La concurrencia posibilita la ejecución de varios
hilos para simplificar la escritura de un programa al
separarlo en tareas independientes. Ahora bien, la
implementación de aplicaciones concurrentes
requiere en Java mecanismos de sincronización para
controlar el acceso a los datos compartidos en
exclusión mutua.
El soporte de concurrencia del Java estándar tiene
una semántica débil que no está suficientemente
especificada. Así, no tenemos garantía de que los
hilos de mayor prioridad se estén siempre ejecutando,
o que los hilos de igual prioridad se ejecuten en
lapsos de tiempo fijos [33].
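El mecanismo básico de exclusión mutua que ofrece el lenguaje es la palabra clave synchronized. El siguiente esbozo es ilustrativo (no pertenece a ninguna de las plataformas citadas) y protege un dato compartido entre hilos:

```java
// Esbozo mínimo: acceso en exclusión mutua a un dato compartido
// mediante métodos synchronized (monitor implícito del objeto).
public class ContadorCompartido {
    private int valor = 0;

    // Sólo un hilo puede ejecutar a la vez los métodos synchronized
    // de una misma instancia.
    public synchronized void incrementar() { valor++; }

    public synchronized int getValor() { return valor; }
}
```

Nótese que synchronized garantiza la exclusión mutua, pero no el orden ni el instante en que cada hilo obtiene el monitor, que es precisamente la debilidad semántica comentada.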
En la mayoría de las ocasiones esto se resuelve
incluyendo en el entorno de ejecución un sistema
operativo de tiempo real sobre el que se ejecuta una
máquina virtual de Java que utiliza el modelo de
hilos nativos, es decir, hace corresponder los hilos de
Java con los hilos del sistema operativo. Sin
embargo, esto no es suficiente para garantizar que
tenga un comportamiento en tiempo real. La otra
solución más razonable consiste en incluir una
máquina virtual RTSJ, dado que esta especificación
define un modelo de concurrencia seguro junto a
planificadores que posibilitan garantizar el
cumplimiento de las restricciones temporales de las
tareas en ejecución.
El soporte de concurrencia no siempre es necesario, y
en algunos casos no se utiliza porque el sistema
empotrado tiene escasos recursos para disponer de un
entorno de ejecución completo. Una alternativa muy
utilizada en los empotrados es la técnica basada en
ejecutivos cíclicos.
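La idea del ejecutivo cíclico puede esbozarse en pocas líneas. El siguiente fragmento es un esbozo ilustrativo (la duración del marco y el número de ciclos son parámetros supuestos), no el mecanismo de ninguna plataforma concreta:

```java
// Esbozo de ejecutivo cíclico: un único hilo ejecuta las tareas en un
// orden fijo dentro de cada marco temporal, sin soporte multihilo.
public class EjecutivoCiclico {
    public static final long MARCO_MS = 50; // duración del marco (supuesta)

    // Ejecuta 'ciclos' iteraciones del plan de tareas 'marco'.
    public static void ejecutar(Runnable[] marco, int ciclos)
            throws InterruptedException {
        for (int c = 0; c < ciclos; c++) {
            long inicio = System.currentTimeMillis();
            for (int i = 0; i < marco.length; i++) {
                marco[i].run(); // cada tarea debe terminar dentro del marco
            }
            long consumido = System.currentTimeMillis() - inicio;
            if (consumido < MARCO_MS) {
                Thread.sleep(MARCO_MS - consumido); // espera al próximo marco
            }
        }
    }
}
```

La planificación queda así resuelta fuera de línea: el orden de las tareas dentro del marco es el plan, y no se necesita ni sincronización ni planificador en tiempo de ejecución.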
2.5. CARGA DINÁMICA DE CLASES.
La carga dinámica de clases es una facilidad única en
Java que permite cargar componentes Java en tiempo
de ejecución. Esto permite cargar clases independientemente de su ubicación y tipo (applets, servlets, beans).
Los pasos que se llevan a cabo son los siguientes:
- Buscar y cargar los bytecodes asociados a la nueva clase (carga perezosa de clases, “lazy loading”).
- Verificar el formato del .class.
- Enlazar la clase con las estructuras de datos manejadas por la JVM.
- Preparación e inicialización de la clase.
- Ejecución del constructor para instanciar el nuevo objeto.
El problema que presenta la carga dinámica es que
penaliza la ejecución de las aplicaciones y requiere
un consumo de recursos que puede ser excesivo. Por
este motivo en algunos sistemas empotrados Java no
se soporta esta funcionalidad, debido a los escasos
recursos disponibles. Como soluciones a este
problema se pueden utilizar las siguientes
alternativas: (a) Carga estática de clases: Se genera
un binario (similar al compilado de C), o (b) definir
nuevos mecanismos más ligeros de carga de clases.
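Los pasos anteriores quedan encapsulados en la API estándar de Java; por ejemplo, una clase puede localizarse, cargarse e instanciarse por su nombre en tiempo de ejecución:

```java
// Carga dinámica mínima: Class.forName busca, carga, verifica y enlaza
// la clase; newInstance la inicializa y ejecuta su constructor por
// defecto. Se usa la API clásica por compatibilidad con JVM antiguas.
public class CargaDinamica {
    public static Object cargar(String nombreClase) throws Exception {
        Class c = Class.forName(nombreClase); // búsqueda, carga y enlace
        return c.newInstance();               // inicialización e instancia
    }
}
```

Es precisamente esta cadena de búsqueda, verificación y enlace la que penaliza la ejecución y la que se elimina al optar por la carga estática de clases.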
2.6. ACCESO AL HARDWARE.
El modelo de seguridad de Java no permite un acceso
directo al hardware del dispositivo para el que se
programa, debido a que Java carece de punteros y sólo dispone de referencias. Así se consigue una mayor seguridad y
control de los programas que se están ejecutando. Por
tanto, no es posible controlar directamente el
contenido de los registros hardware, ni de las
posiciones de memoria, lo cual hace imposible la
implementación de controladores (drivers) para dispositivos de entrada/salida.
Sólo es posible acceder al hardware utilizando
métodos nativos Java a través de la extensión JNI
(Java Native Interface), que permite invocar procedimientos externos escritos en otros
lenguajes de programación como C o ensamblador.
Sin embargo la eficiencia de este mecanismo es muy
dependiente del entorno de ejecución y tiene un
rendimiento muy pobre. Por este motivo la mayoría
de los fabricantes de dispositivos empotrados aportan
sus propios interfaces a métodos nativos para mejorar
el rendimiento global de las aplicaciones.
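A modo de ilustración, la declaración de métodos nativos vía JNI sigue este esquema; los nombres de la clase, de los métodos y de la librería son supuestos, y la implementación en C no se incluye:

```java
// Esbozo del mecanismo JNI: los métodos 'native' se declaran en Java y
// se implementan en C dentro de una librería compartida. Nombres
// hipotéticos, a modo de ejemplo.
public class PuertoES {
    // Carga de la librería nativa (libpuertoes.so / puertoes.dll).
    public static void cargarNativa() {
        System.loadLibrary("puertoes");
    }

    // Implementado en C: lee el registro hardware del puerto indicado.
    public native int leer(int direccion);

    // Implementado en C: escribe un valor en el registro del puerto.
    public native void escribir(int direccion, int valor);
}
```

Cada llamada a un método nativo implica una transición entre la JVM y el código C, y es ese coste de transición el que hace que el rendimiento dependa tanto del entorno de ejecución.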
2.7. GESTIÓN DE MEMORIA.

Java libera al programador del compromiso de la
reserva y liberación de memoria que necesita para el desarrollo de aplicaciones, no permitiéndole acceder directamente a la memoria y limitando su uso únicamente a la memoria heap. El
proceso de reserva de memoria lo realiza la JVM a
medida que se va necesitando en la instanciación de
las clases, mientras que el proceso de liberación se
produce en instantes concretos de modo automático
en cuanto los objetos creados quedan fuera de ámbito
llamando al recolector de basura.
La gestión de memoria debe estar controlada en las
aplicaciones con restricciones de tiempo real, ya que
puede ser una de las causas de indeterminismo. En el
caso de Java, el recolector de basura puede interferir
y hacer perder plazos de tiempo, por lo que se deben
adoptar soluciones que controlen el recolector de
basura.
Una posible solución para impedir el efecto del
recolector de basura es eliminarlo de la JVM, en cuyo caso o bien hay que añadir un mecanismo para liberar la memoria ocupada, o bien ésta no se libera sino que se
sobrescribe. Otra solución consiste en definir un pool
de objetos de modo que se pueda reutilizar la misma
zona de memoria sin la intervención del recolector de
basura al no permitir que se quede fuera de ámbito.
En RTSJ esto se resuelve definiendo nuevas áreas de
memoria como la memoria Inmortal y la memoria
Scoped.
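La solución del pool de objetos puede esbozarse así. Es un esbozo hipotético, no una clase de JAHASE, y usa java.util.Vector por compatibilidad con las JVM antiguas de los empotrados citados:

```java
import java.util.Vector;

// Esbozo hipotético de pool de objetos: los buffers se reservan una
// sola vez al arrancar y se reutilizan, de modo que nunca quedan fuera
// de ámbito y el recolector de basura no interviene sobre ellos.
public class PoolDeObjetos {
    private final Vector libres = new Vector();

    public PoolDeObjetos(int numBuffers, int longitud) {
        for (int i = 0; i < numBuffers; i++) {
            libres.addElement(new byte[longitud]); // reserva inicial única
        }
    }

    // Obtiene un buffer del pool; devuelve null si está agotado.
    public synchronized byte[] adquirir() {
        if (libres.isEmpty()) return null;
        byte[] buf = (byte[]) libres.lastElement();
        libres.removeElementAt(libres.size() - 1);
        return buf;
    }

    // Devuelve el buffer al pool para su reutilización.
    public synchronized void liberar(byte[] buf) {
        libres.addElement(buf);
    }
}
```

Como toda la memoria se reserva en la inicialización, el coste y el indeterminismo de la reserva dinámica desaparecen del régimen permanente de la aplicación.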
3 DISPOSITIVOS EMPOTRADOS JAVA
En esta sección se ofrece una descripción de algunos
dispositivos empotrados Java comerciales. Los
dispositivos seleccionados cumplen con las
siguientes restricciones. Primero, pueden incluir
procesadores de 8, 16 o 32 bits, pero deben ser
soluciones comerciales con un costo por unidad
conocido a través de sus sitios Web o distribuidores.
En segundo lugar, el lenguaje de programación
principal utilizado para dichos dispositivos debe ser
Java. En tercer lugar, los dispositivos no deben
pertenecer al dominio de los dispositivos móviles con conectividad inalámbrica, como PDAs, buscapersonas o teléfonos móviles. En la tabla 1 se
muestra un resumen de las características particulares
de los sistemas empotrados estudiados.
3.1 JAVELIN STAMP
El Javelin Stamp [14] es una placa de tamaño
reducido de Parallax Inc. que simplifica el
desarrollo y despliegue de pequeños sistemas
prototipo. Incluye convertidores de señal analógica a
digital (A/D) y digital a analógica (D/A) y pines de
entrada/salida (E/S) genéricos. Soporta un
subconjunto de la especificación Java 1.2. No soporta
multihilo, recolector de basura ni interfaces, por lo
que no es estándar.
3.2 AJILE

El aJile JEMcore [2] es un procesador Java de ejecución directa (es una implementación hardware de la JVM). El proceso de desarrollo y despliegue de aplicaciones es complejo, ya que requiere el manejo y configuración de aplicaciones específicas como JemBuilder y el cargador/depurador Charade.

3.3 TINI

Tiny InterNet Interface [30] es una especificación hardware-software de plataformas empotradas desarrollada por Dallas Semiconductor (ahora Maxim Integrated Products). Soporta parcialmente el JDK 1.1.8 de Sun, pues no posee carga dinámica de clases ni finalización de objetos. El proceso de desarrollo y despliegue de aplicaciones requiere herramientas específicas (TINI SDK), las cuales generan un fichero propio (.tini), pues no soporta la ejecución directa de ficheros Java (.class) como en el caso de Ajile.

3.4 SNAP

SNAP (Simple Network Application Platform) [24] es una placa de Imsys Tech basada en el procesador Java Cjip, capaz de ejecutar directamente bytecodes Java. Se acopla a una placa portadora que provee hardware adicional y que es compatible con el hardware TINI. El Cjip soporta y es conforme con J2ME-CLDC. Incluye la implementación del perfil MIDP junto con otros paquetes de clases específicos. La programación y descarga de aplicaciones es vía telnet y FTP.

3.5 EJC

EJC (Embedded Java Controller) [25] es una plataforma empotrada Java de uso general diseñada por Snijder Micro Systems, orientada hacia aplicaciones Java con conexión a Internet, al igual que TINI. El hardware se basa en un controlador de la familia EC200 y su JVM sigue un modelo de ejecución JIT, en el cual los bytecodes de Java deben ser primero traducidos a código nativo (a través de la JVM) antes de su ejecución, como en TINI y Javelin Stamp. Soporta y es conforme con la especificación PersonalJava, versión 1.2. Adicionalmente, Snijder incluye el paquete com.snijder para manejar características de bajo nivel. La programación y descarga de aplicaciones es vía telnet y FTP.

4 PLATAFORMA JAHASE

4.1 DESCRIPCIÓN GENERAL

JAHASE es una plataforma software basada en Java que proporciona un marco de trabajo que permite el acceso al hardware en sistemas empotrados o embebidos de una forma homogénea y sencilla, abstrayendo la complejidad y la heterogeneidad que pueda existir dependiendo de los diferentes fabricantes.

Este entorno de programación facilita a los programadores de aplicaciones para sistemas empotrados la manipulación del hardware independientemente de su arquitectura, lo que permite desacoplar las aplicaciones respecto del empotrado usado, pudiéndose modificar o actualizar dicho hardware en función de las necesidades cambiantes de la aplicación empotrada.

JAHASE también aporta nuevas funcionalidades de alto nivel al programador que pueden no estar presentes en el hardware subyacente, como la gestión de temporizadores o de watchdog, o en la JVM subyacente, como la reserva de memoria que excluye la intervención del recolector de basura.

Los beneficios que aporta son numerosos, desde las facilidades de extensibilidad, homogeneidad en el tratamiento del hardware, portabilidad de las aplicaciones y simplificación en el modelo de programación, hasta la posibilidad de aumentar el número de sistemas empotrados controlados de forma sencilla y rápida.
4.2 CARACTERÍSTICAS DE JAHASE
Este entorno de trabajo ofrece una serie de
características y funcionalidades que facilitan la
programación de aplicaciones empotradas y de
tiempo real. De forma resumida podemos destacar las
siguientes características principales:
1. Tratamiento homogéneo de las diferentes entradas/salidas del sistema en conjunto, independientemente de que éstas estén conectadas a buses digitales mediante expansores de puertos, o pertenezcan al propio microcontrolador.
2. Acceso homogéneo a los diferentes tipos de buses digitales disponibles en una plataforma empotrada como, por ejemplo, I2C, SPI o 1-Wire.
Resources (Speed): Javelin ~25 MHz; Ajile 100 MHz (aJ-100) / 66 MHz (aJ-80); Snap 66.67 MHz; EJC 74 MHz; TINI 75 MHz.
Resources (Memory): Javelin 32 kB RAM, 32 kB program EEPROM; Ajile 32 kB on-chip, 2-8 MB external RAM; Snap 1 kB on-chip, 8 MB external RAM, 2 MB Flash; EJC 8 kB on-chip, 64 MB external RAM, 16 MB Flash; TINI 64 kB on-chip, 1 MB external RAM, 1 MB Flash.
Processor: Javelin RISC Ubicom SX48AC; Ajile aJile aJ-100 / aJ-80; Snap Imsys Cjip; EJC Cirrus Logic EP7312 (+ slave PIC 16LF872); TINI Maxim/Dallas Semiconductor DS80C410.
Power: Javelin 150 mW; Ajile 260 mW; Snap 1700 mW (w/active Ethernet); EJC 300 mW; TINI 120 mW.
Internal Connectivity: Javelin 16 I/O pins, PWM; Ajile 8 I/O pins, SPI, I2C, CAN, 1-Wire; Snap 15 I/O pins, I2C, 1-Wire; EJC 1-Wire, CAN; TINI 22 I/O pins, SPI, I2C, CAN, 1-Wire.
External Connectivity: Javelin RS-232; Ajile Ethernet, RS-232; Snap Ethernet, RS-232, IrDA; EJC Ethernet, RS-232; TINI Ethernet, RS-232.
Java Execution Model: Javelin JVM (Java interpreter); Ajile direct on-chip Java bytecodes execution; Snap direct on-chip Java bytecodes execution; EJC JVM (Java interpreter); TINI JVM (Java interpreter).
Java Standard Compliance: Javelin non-standard compliant Java 1.2 subset; Ajile J2ME CLDC compliant; Snap J2ME CLDC, MIDP profile compliant; EJC PersonalJava compliant (Java 1.1.8-based); TINI Java 1.1.8 subset support (not 1.1.8 compliant).
Concurrence/Synchronization: Javelin limited (1); Ajile standard (2); Snap standard; EJC standard; TINI standard.
Performance: Javelin low; Ajile high; Snap normal; EJC high; TINI normal.
Efficiency: Javelin low; Ajile high; Snap normal; EJC high; TINI normal.
Real-Time Support: Javelin real-time capable (low-level programming), no schedulers; Ajile real-time threads and scheduling schemes (Piano Roll, periodic threads) support; Snap real-time OS, deterministic timers support; EJC real-time OS, thread RTOS-Java priority mapping support; TINI real-time OS, but no Java real-time support.
Latency: Javelin high; Ajile low; Snap high; EJC low; TINI high.
Reliability: Javelin low; Ajile high; Snap medium; EJC high; TINI medium.
Standardization: Javelin low; Ajile medium; Snap high; EJC medium; TINI high.
Ease of Development: Javelin medium; Ajile low; Snap medium-high; EJC high; TINI medium-high.
Flexibility: Javelin low; Ajile medium; Snap medium; EJC medium; TINI medium.
Portability: Javelin none; Ajile none; Snap none; EJC none; TINI none.
Escalability: Javelin low; Ajile high; Snap medium; EJC high; TINI medium.
Integration: Javelin low; Ajile medium; Snap high; EJC medium; TINI medium.
JNI: Javelin none; Ajile none; Snap partially supported; EJC none; TINI TNI (3).

Tabla 1. Embedded system features summary.

(1) Javelin soporta la programación de Periféricos Virtuales que se ejecutan en pseudo-concurrencia, pero no soporta programación multi-hebra.
(2) Ajile posee algunas características adicionales como PeriodicThread, PeriodicTimer, PianoRoll, etc.
(3) TNI: Tini Native Interface. Es un subconjunto de JNI.
3. Abstracción de la comunicación, la cual se puede realizar de forma independiente a la tecnología subyacente (ej.: RS-232, Ethernet, etc.).
4. Acceso homogéneo a ficheros en memoria RAM o flash, y posibilidad de almacenamiento persistente si el sistema empotrado tiene soporte para un sistema de archivos.
5. Funcionalidad de generación de bitácora (logging). Permite realizar anotaciones ordenadas cronológicamente en registros (o logs) durante la ejecución de las aplicaciones de forma sencilla, con la posibilidad de definir diferentes niveles de logging (Warning, Error, Info, etc.). Los registros pueden realizarse sobre memoria, consola o en un sistema de almacenamiento persistente.
6. Pool de objetos. Permite reutilizar objetos para optimizar el uso de la memoria.
7. Controlador virtual de acceso a memoria. Crea espacios de memoria permitiendo el acceso a los mismos como si se tratase de memoria física.
8. Modelo de eventos asíncronos genéricos que permite unificar el modelo de interrupciones y eventos software, proporcionando la posibilidad de extender la funcionalidad de los manejadores o controladores asociados.
9. Perro guardián (watchdog) software y hardware. Permite asociar la ejecución de una determinada tarea o el reinicio del sistema (funcionamiento clásico) cuando la aplicación se bloquea en la zona protegida por el perro guardián.
10. Acceso uniforme al reloj del sistema. Se puede obtener el tiempo real del sistema de forma genérica.
11. Temporizadores de un solo disparo y periódicos. Facilitan la programación de tareas periódicas y no periódicas.
12. Cronómetros y utilidades para medidas del tiempo de peor ejecución (WCET, Worst Case Execution Time). Permiten realizar mediciones de tiempo de forma sencilla.
13. Medida de prestaciones mediante un conjunto de pruebas que pueden ser personalizadas por el usuario.
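A modo de ilustración del punto 12, un cronómetro de este tipo puede esbozarse así (esbozo hipotético, no la clase real de JAHASE). Conviene recordar que medir tiempos de ejecución sólo proporciona una cota inferior del WCET real:

```java
// Esbozo hipotético de un cronómetro para medir tiempos de ejecución,
// acumulando el peor caso observado como aproximación al WCET.
public class Cronometro {
    private long inicio;
    private long peor; // peor tiempo observado hasta el momento

    // Arranca la medida de un nuevo intervalo.
    public void empezar() { inicio = System.currentTimeMillis(); }

    // Detiene la medida, actualiza el peor caso y devuelve el intervalo.
    public long parar() {
        long t = System.currentTimeMillis() - inicio;
        if (t > peor) peor = t;
        return t;
    }

    public long peorCaso() { return peor; }
}
```

Se usa System.currentTimeMillis() porque está disponible incluso en los perfiles Java más reducidos, a costa de una resolución de milisegundos.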
Para el desarrollo de JAHASE se han tenido en
cuenta varios objetivos. En primer lugar se han
utilizado técnicas avanzadas de diseño mediante el
uso de diferentes patrones de diseño como el patrón
Factoría, Adaptador, Proxy, etc. [15] que facilitan
tanto el desarrollo como el mantenimiento de la
plataforma. En segundo lugar, las clases
implementadas en JAHASE están optimizadas para
reducir el consumo de memoria y aumentar la
velocidad de ejecución. En tercer lugar, se ha seguido
un esquema genérico mediante la incorporación de
jerarquías que posibilita la adición de nuevas clases
para la extensión de JAHASE a nuevos entornos
empotrados.
El entorno de programación de JAHASE sigue el esquema de trabajo host-target empleado frecuentemente para el desarrollo de aplicaciones para sistemas empotrados. En una primera etapa la aplicación Java se compila a través de JAHASE, que incluirá todos los paquetes comunes y genéricos de la plataforma, comprobará si hay algún error, y posteriormente añadirá las clases específicas del sistema empotrado concreto a utilizar.
Posteriormente se invoca a los scripts y herramientas
específicas de cada fabricante para crear la imagen de
la aplicación (en unos casos serán bytecodes, en otros
serán un archivo binario, etc.) dependiendo del
modelo de ejecución de la JVM.
4.3 ARQUITECTURA DE JAHASE
En la arquitectura de JAHASE se pueden diferenciar
dos capas: una capa de bajo nivel encargada de
ofrecer la abstracción respecto del hardware
subyacente, permitiendo la homogenización en el
acceso de bajo nivel, y una capa superior que
contiene los componentes y servicios de alto nivel
que permiten una mayor flexibilidad a la hora de
desarrollar aplicaciones sobre esta plataforma. La
figura 1 muestra gráficamente tanto el nivel de
componentes como el de acceso al hardware de la
arquitectura de JAHASE.
Los componentes soportados actualmente son:
1-Acceso a Buses Digitales.
2-Servicios de Bitácora (Logging).
3-Utilidades para el Acceso a memoria.
4-Servicios de tiempo.
5-Comunicaciones.
6-Utilidades de entrada/salida (Hardware).
Fig. 1. Arquitectura JAHASE. Sobre la JVM (J2ME, J2SE o RTSJ) y las librerías específicas de cada empotrado (Javelin Stamp, EJC, TINI, JPro, SNAP) se sitúa la capa de homogeneidad y abstracción, y sobre ella los componentes de JAHASE (DigitalBuses con I2C, SPI, 1-Wire y CAN; Logging; Memory; Time; Com; CPU) que utiliza la aplicación Java.
Cada uno de los componentes tiene asociado un
paquete que agrupa todas las clases, paquetes e
interfaces relacionados con dicho componente, lo
cual da un mayor grado de estructuración a la
plataforma. El paquete JAHASE contiene los
servicios genéricos y funcionalidades de alto nivel
independientes del hardware, mientras que el paquete
HDK (Hardware Development Kit) contiene las
implementaciones específicas para cada tipo de
sistema hardware soportado. Las clases incluidas en
el paquete HDK son extensiones de las clases
genéricas del paquete JAHASE, por lo que existirá
una extensión de cada clase en JAHASE por cada
tipo de hardware empotrado soportado a menos que
dicha implementación sea la misma para varios
sistemas hardware.
4.4 COMPONENTE DE BUSES DIGITALES
En este apartado se analiza con detalle uno de los
componentes básicos de JAHASE. Con objeto de
uniformar el acceso a los diferentes tipos de buses
digitales que puede soportar un sistema empotrado,
se ha utilizado un modelo “maestro-esclavo” para
facilitar el acceso a los mismos. El modelo es
modularizable y extensible, y permite de forma
genérica tratar del mismo modo tanto dispositivos
conectados a un bus I2C, a un bus SPI, a un bus 1-Wire o a un bus CAN. El modelo “maestro-esclavo”
es una versión software del modelo utilizado
físicamente. El maestro, el empotrado, se encarga de
la sincronización del bus y de iniciar las
comunicaciones con cualquiera de los dispositivos
esclavos, que encapsulan a otros dispositivos
electrónicos como conversores ADC, extensores de
E/S, sensores de temperatura, etc. La ventaja de
utilizar el bus digital es que permite simplificar el
cableado, pues usa dos o tres cables para acceder al
dispositivo electrónico, y por otra parte, posibilita
aumentar la capacidad del empotrado en algún
aspecto como por ejemplo, proporcionar más E/S de
la que inicialmente tiene el empotrado, acceder a una
unidad de memoria, etc.
Para virtualizar desde el software el modelo maestro-esclavo (figura 2) se requiere manejar un maestro
virtual que, internamente, utiliza una instancia del
maestro concreto en función del tipo de bus digital
seleccionado. Además necesitamos conocer el
142
XI Jornadas de Tiempo Real (JTR2008)
esclavo concreto con el que queremos trabajar. Esto
es necesario, ya que aunque los esclavos utilizan un
mismo protocolo de comunicaciones fijado por el bus
digital, no ocurre así con los datos que son capaces
de interpretar. Por tanto se debe implementar un
esclavo virtual para cada familia de esclavos como conversores ADC, sensores de temperatura, etc.
JAHASE incluye algunos esclavos virtuales por
defecto como el esclavo virtual del dispositivo LM75
de National Semiconductor que permite manejar un
sensor de temperatura por I2C, o el esclavo virtual
del dispositivo PCF8574 de Philips que permite
expandir 8 entradas/salidas digitales por el bus I2C.
Por otra parte, también es necesario incluir un
mecanismo que permita especializar los esclavos
virtuales según las necesidades del programador.
Fig. 2. Modelo maestro-esclavo genérico para buses digitales: un maestro de bus digital genérico conectado a los esclavos 1..N.

Fig. 3. Detalle de la jerarquía de clases de JAHASE respecto a los buses digitales: las clases raíz DigitalBus y Slave (con los métodos genéricos read, write, setSlaveAddress, setSpeed, directRead y directWrite) se especializan por tipo de bus (I2CBus, SPIBus, OneWireBus; I2CSlave, SPISlave, OneWireSlave), por plataforma (AjileI2CBus, SnijderI2CBus, SnapI2CBus) y por dispositivo concreto (LM75_I2CSlave, Pcf8574_I2CSlave, Pcf8591_I2CSlave).
4.3.1. Diseño del Componente.
Dependiendo del tipo de sistema empotrado que
estemos usando, la API proporcionada por el
fabricante para el acceso a los diferentes tipos de
buses digitales es diferente, así como los mecanismos
definidos para acceder a los mismos. Esto nos obliga
a realizar, para cada dispositivo y para cada tipo de
sistema empotrado según el tipo de bus usado, una
implementación concreta que tiene que ajustarse al
modelo de programación de JAHASE.
Para manejar los buses digitales con el modelo
maestro-esclavo virtual se ha creado un conjunto de
clases donde, gracias a la herencia y al patrón
Factoría, se consigue abstraer el tipo de bus que
internamente se está utilizando, al igual que el tipo de
sistema empotrado (hardware) subyacente. El diseño
realizado permite que la implementación de los
diferentes tipos de esclavos sea única, pues el acceso
específico de cada esclavo al bus digital se realiza
utilizando las primitivas genéricas de lectura y/o
escritura del bus.
Como se ve en la figura 3 las principales clases que
soportan el modelo maestro-esclavo están en la raíz
de los árboles de herencia, DigitalBus y Slave, las
cuales contienen los métodos y atributos comunes a
las tecnologías de buses digitales.
La clase DigitalBus es el maestro virtual que abstrae
el acceso a cualquiera de los tipos de buses digitales
soportados por un sistema empotrado. Contiene
esencialmente los métodos y atributos comunes a
todos los tipos de buses digitales y se extiende en
clases especializadas para cada tipo de bus digital y
para cada tipo de empotrado. Estos tienen que
implementar los métodos abstractos de acceso al bus
utilizando la API específica de cada plataforma
hardware.
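La idea de esta abstracción puede esbozarse en Java del siguiente modo (esbozo mínimo e hipotético: la clase simulada FakeI2CBus y su búfer interno no forman parte de JAHASE; sólo ilustran cómo una especialización implementaría los métodos abstractos con la API nativa de cada plataforma):

```java
// Esbozo (no es el código real de JAHASE): maestro virtual abstracto y una
// especialización simulada que haría de implementación de plataforma.
abstract class DigitalBus {
    protected final String name;
    protected DigitalBus(String name) { this.name = name; }
    // Métodos abstractos que cada plataforma implementa con su API nativa
    public abstract int read(byte[] data, int off, int len);
    public abstract int write(byte[] data, int off, int len);
    public abstract int setSlaveAddress(int address);
}

// Especialización hipotética: la implementación real usaría la API I2C del
// fabricante; aquí se simula el dispositivo con un búfer en memoria.
class FakeI2CBus extends DigitalBus {
    private int slaveAddress;
    private final byte[] registers = new byte[256];
    FakeI2CBus(String name) { super(name); }
    public int setSlaveAddress(int address) { slaveAddress = address; return 0; }
    public int read(byte[] data, int off, int len) {
        System.arraycopy(registers, 0, data, off, len);
        return len;
    }
    public int write(byte[] data, int off, int len) {
        System.arraycopy(data, off, registers, 0, len);
        return len;
    }
}

public class DigitalBusSketch {
    public static void main(String[] args) {
        DigitalBus bus = new FakeI2CBus("Bus_I2C"); // el cliente sólo ve DigitalBus
        bus.setSlaveAddress(0x48);
        bus.write(new byte[] { 42 }, 0, 1);
        byte[] in = new byte[1];
        bus.read(in, 0, 1);
        System.out.println(in[0]); // imprime 42
    }
}
```

El código cliente trabaja únicamente contra la superclase DigitalBus, por lo que puede cambiarse la plataforma subyacente sin modificarlo.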
La clase Slave contiene todos los atributos y métodos
genéricos de un esclavo virtual para cualquier tipo de
bus digital. Para cada uno de los diferentes tipos de
buses digitales y dispositivo concreto existe una clase
que especializa a “Slave” de acuerdo con la
semántica concreta de cada bus. Cuando se desee
realizar una implementación para un dispositivo
concreto, como el PCF8591 de Philips, se deberá
extender la clase esclavo del bus al que pertenece
(I2CSlave) en una nueva clase PCF8591_I2CSlave.
Una aplicación que utilice JAHASE puede
aprovechar este mecanismo de acceso a buses
digitales para controlar cualquier tipo de dispositivo
esclavo conectado al bus mediante una clase
específica por cada uno de los esclavos.
El acceso a los buses digitales se realiza desde la
clase estática JAHASE.cpus.CPU, mediante la
llamada a un método getDigitalBus al que se le
indica el tipo de bus deseado (SPI, I2C, 1-Wire, etc.)
como se puede ver en la figura 4. Las flechas blancas
representan el orden en las llamadas originadas,
mientras que el resto de flechas rellenas indican las
relaciones de herencia entre las diferentes clases.
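El patrón Factoría descrito puede esbozarse así (esbozo hipotético y simplificado: la factoría real es JAHASE.cpus.CPU y tiene en cuenta, además del tipo de bus, la plataforma detectada; los nombres BusFactory, I2CBusStub y SPIBusStub son ilustrativos):

```java
// Esbozo del patrón Factoría (hipotético): la factoría real es la clase
// estática JAHASE.cpus.CPU, que además tiene en cuenta la plataforma.
abstract class Bus {
    static final int I2C = 0, SPI = 1, ONE_WIRE = 2;
    final String name;
    Bus(String name) { this.name = name; }
}

class I2CBusStub extends Bus { I2CBusStub(String n) { super(n); } }
class SPIBusStub extends Bus { SPIBusStub(String n) { super(n); } }

final class BusFactory {
    // La factoría oculta al cliente qué subclase concreta se instancia
    static Bus getDigitalBus(int id, int busType, String busName) {
        switch (busType) {
            case Bus.I2C: return new I2CBusStub(busName);
            case Bus.SPI: return new SPIBusStub(busName);
            default: throw new IllegalArgumentException("bus no soportado: " + busType);
        }
    }
}

public class FactorySketch {
    public static void main(String[] args) {
        Bus bus = BusFactory.getDigitalBus(0, Bus.I2C, "Bus_I2C");
        System.out.println(bus.getClass().getSimpleName()); // imprime I2CBusStub
    }
}
```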
La implementación específica de cada tipo de sistema
empotrado para un determinado bus digital se
encuentra en la jerarquía de paquetes HDK, y
siempre extiende la clase del tipo de bus
correspondiente de la jerarquía de paquetes JAHASE
(representado en el diagrama con las circunferencias
rojas de HDK y sus correspondientes flechas de
herencia hacia JAHASE).
Fig. 4. Diagrama para la creación de buses digitales: la clase estática JAHASE.cpus.CPU crea, mediante el método getDigitalBus(id, busType, busName), la instancia del bus concreto de la jerarquía JAHASE (I2CBus, SPIBus, OneWireBus), que cada plataforma especializa en la jerarquía de paquetes HDK (SnapI2CBus, AjileI2CBus, SnijderI2CBus, etc.).

4.3.2. Ejemplo de Aplicación

Para mostrar el funcionamiento del componente de acceso a buses digitales, se presenta un ejemplo de aplicación en el que se accede a un sensor de temperatura LM92 mediante el bus I2C.

package example.buses;

import JAHASE.digitalBus.*;
import JAHASE.digitalBus.I2C.*;
import JAHASE.cpus.*;

/** This is an example of use of the I2C bus, a kind of DigitalBus. */
public class pruebaLM92 {

    public static void main(String args[]) {
        CPU.initialize(); // To initiate any of the supported embedded systems
        DigitalBus bus = CPU.getDigitalBus(0, DigitalBus.I2C, "Bus_I2C");
        TemperatureSlave sensor = new LM92_I2CSlave(0, "LM92_sensor", bus);
        while (true) {
            System.out.println("-Temperatura: " + sensor.getTemperature());
            try {
                Thread.sleep(1000);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

Fig. 5. Ejemplo de utilización de los buses digitales (para un sensor de temperatura I2C LM92).

En el ejemplo de la figura 5 se puede ver cómo se accede de forma sencilla a un determinado dispositivo electrónico a través del bus digital I2C, concretamente a un sensor de temperatura LM92. En primer lugar, la llamada estática initialize de la clase CPU inicializa la plataforma JAHASE para poder trabajar sobre el sistema empotrado concreto. A continuación se obtiene el maestro virtual del bus concreto a partir de la invocación al método getDigitalBus, pasando como argumento el tipo de bus deseado. Posteriormente se crea una instancia de LM92_I2CSlave, especificando en los argumentos el bus utilizado y la dirección física del puerto en el que se ha conectado el dispositivo LM92. El resto del código de la figura 5 muestra un ejemplo sencillo de cómo se puede muestrear de forma periódica el sensor de temperatura y, a continuación, imprimir el valor de temperatura actual.

4.3.3. Extensión del Componente

Cuando se desee realizar una implementación de un esclavo virtual para un dispositivo concreto, se deberá extender la clase esclavo del bus al que pertenece dicho dispositivo. Esto nos permite utilizar la semántica del protocolo de comunicaciones asociado al bus, que está implementada en los métodos directRead y directWrite. El protocolo de comunicaciones del bus digital establece de forma general cómo se realizan las lecturas y escrituras entre los distintos dispositivos del bus, pero no especifica el formato o máscara de bits de los datos ni el orden en el que son interpretados por cada dispositivo conectado al bus; esto viene definido por el fabricante de cada dispositivo que se conecta al bus. Por tanto, la implementación del esclavo virtual requiere establecer el protocolo de datos específico de cada dispositivo utilizando las primitivas directRead y directWrite.

El mecanismo de extensión permite la total independencia del esclavo virtual con respecto al tipo de empotrado subyacente, e incluso podría ser independiente respecto al tipo de bus digital si el protocolo de datos utilizado por el esclavo virtual fuera el mismo para cualquiera de los buses digitales. Esto se consigue al manejar los métodos abstractos de la superclase DigitalBus, que por la ligadura dinámica se especializan en los métodos de acceso al bus concreto de las clases especializadas.
5. ANÁLISIS DEL ENTORNO DE EJECUCIÓN
Cuando se trabaja con sistemas empotrados es
importante conocer las prestaciones físicas del
mismo, el entorno de ejecución o sistema operativo
instalado, cómo se realiza la carga de una aplicación en
el empotrado y las posibilidades que ofrece el
sistema para la programación. Toda esta información
facilita al desarrollador la implementación de
aplicaciones optimizadas en función de las
características concretas del entorno empotrado.
Aunque la plataforma JAHASE proporciona un
marco de trabajo común para trabajar con distintos
tipos de plataformas empotradas, su funcionamiento
está ligado a las características concretas del
empotrado, por lo que debe incluirse un
banco de pruebas que permita conocer de forma
explícita las posibilidades que ofrece. JAHASE
suministra un componente que facilita el diseño de
aplicaciones de prueba, e incorpora un conjunto de
ellas para que puedan ser utilizadas por el
desarrollador.
5.1 DISEÑO DE UN EXPERIMENTO

En esta sección se explica un modelo de programación que hemos diseñado para posibilitar el diseño de experimentos sobre el entorno de JAHASE. El modelo desarrollado es adaptable a diferentes gamas de experimentos, extensible para que se puedan incluir nuevos tipos de experimentos, y reutilizable para probar las diferentes plataformas soportadas por JAHASE. Además es necesario disponer de un mecanismo automático capaz de chequear el sistema ante una serie de experimentos predefinidos o diseñados por el desarrollador. Básicamente, el modelo genérico desarrollado incluye:

1. Esquema del programa principal: inicia el entorno JAHASE, crea un objeto logger para ir registrando los datos de medición y realiza cada uno de los experimentos, cuyos resultados posteriormente pueden ser volcados a consola o a un archivo.

2. Método de inclusión de carga y/o ruido: con objeto de comprobar la influencia de otras hebras de ejecución en el propio experimento, se han diseñado diferentes niveles de carga y/o ruido (carga por número de hebras y periodo de activación de éstas).

3. Método de realización común de los diferentes tipos de medidas o pruebas: mediante los métodos establecidos en una interfaz de experimento (Tester.java), se consigue que cualquier objeto experimento que la implemente pueda ser ejecutado dentro de este esquema (método makeTest(subType, Logger, patrón)).

4. Mecanismo de extensión para incorporar nuevos tipos de medidas o pruebas: para ello se debe implementar la interfaz “Tester” y añadir el nombre de la nueva clase en el listado de experimentos a realizar (bien en un fichero de texto o en un array de cadenas en el propio programa principal).

En la figura 6 se puede ver la relación entre las diferentes clases del modelo, incluyendo tres ejemplos (rendimiento, latencia de despacho y temporizador periódico). La clase “Test” contiene el método main, donde se utiliza un objeto “Logger” como bitácora para ir capturando las salidas de los diferentes experimentos, los cuales se ejecutan mediante las sucesivas llamadas al método “makeTest” de la interfaz Tester (la cual se implementa según cada tipo de experimento), ofreciendo un modelo genérico y flexible para la adición de nuevos experimentos.

Fig. 6. Diagrama de clases del modelo de test: la clase Test (método main) utiliza un Logger y una lista de objetos que implementan la interfaz Tester (makeTest, setOverload, enableOverload, etc.); GenericTester implementa los métodos comunes, y PerformanceTester, DispatchLatencyTester y PeriodicTimerTester implementan makeTest para cada experimento.
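El esquema anterior puede esbozarse en Java del siguiente modo (esbozo simplificado: se usa un StringBuilder como bitácora en lugar de la clase Logger real de JAHASE, y la firma de makeTest es una aproximación a la de la figura 6):

```java
// Esbozo del modelo de test de la figura 6 (simplificado: StringBuilder en
// lugar de la clase Logger real de JAHASE).
interface Tester {
    void makeTest(int subType, StringBuilder logger, String pattern);
}

// Un experimento concreto implementa la interfaz y se registra por su nombre
class PerformanceTester implements Tester {
    public void makeTest(int subType, StringBuilder logger, String pattern) {
        long t0 = System.nanoTime();
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) acc += i; // carga de ejemplo
        long elapsed = System.nanoTime() - t0;
        logger.append(pattern).append(": ").append(elapsed >= 0 && acc > 0).append('\n');
    }
}

public class TestRunner {
    // Listado de experimentos; el punto 4 permite ampliarlo con nuevas clases
    static final String[] TESTS = { "PerformanceTester" };

    public static void main(String[] args) throws Exception {
        StringBuilder logger = new StringBuilder(); // bitácora simplificada
        for (String name : TESTS) {
            // Carga reflexiva de cada experimento a partir de su nombre
            Tester t = (Tester) Class.forName(name).getDeclaredConstructor().newInstance();
            t.makeTest(0, logger, name);
        }
        System.out.print(logger);
    }
}
```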
Para realizar una medida basada en tiempo es
importante tener en cuenta algunas propiedades como
la resolución, precisión y granularidad de los relojes
que se utilizan durante el experimento y de los
resultados obtenidos [27]. Aunque en general los
empotrados suelen incluir algún reloj de alta
resolución basado en un reloj hardware a través de la
implementación de una clase particular, las
especificaciones Java anteriores al JDK 1.5 no
incluyen un mecanismo para medir tiempo por
debajo del umbral de los milisegundos.
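Una forma sencilla de comprobar esta limitación es estimar la granularidad observable del reloj de milisegundos y compararla con System.nanoTime(), disponible a partir del JDK 1.5 (esbozo ilustrativo, no forma parte de JAHASE):

```java
// Esbozo ilustrativo: granularidad observable del reloj de milisegundos
// frente a System.nanoTime() (disponible desde el JDK 1.5).
public class ClockGranularity {
    // Menor incremento observable de System.currentTimeMillis()
    static long millisGranularity() {
        long t0 = System.currentTimeMillis();
        long t1;
        while ((t1 = System.currentTimeMillis()) == t0) {
            // espera activa hasta que el reloj avance
        }
        return t1 - t0;
    }

    public static void main(String[] args) {
        System.out.println("granularidad de currentTimeMillis (ms): " + millisGranularity());
        long n0 = System.nanoTime();
        long n1 = System.nanoTime();
        System.out.println("delta entre dos lecturas de nanoTime (ns): " + (n1 - n0));
    }
}
```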
5.2 EXPERIMENTOS REALIZADOS
A continuación se presentan algunos de los
experimentos realizados más significativos en cuanto
a características de tiempo real: el experimento de
rendimiento, el retardo en la activación de los
temporizadores, la gestión de memoria, y las
latencias y el jitter en la activación de tareas
periódicas. Todos los experimentos se han realizado
con siete niveles diferentes de sobrecarga en el
sistema.
Para el estudio de los experimentos realizados hemos
considerado tres targets concretos: el entorno
empotrado SNAP de Imsys, el microcontrolador EJC
de Snijder, y el empotrado JSTIK de Systronix
basado en el procesador Java aJile aJ-100. Los datos
se contrastan con otra plataforma neutra basada en
una estación de trabajo SUN ULTRA con un
procesador de doble núcleo AMD Opteron 1214 a
una frecuencia de 2.2 GHz (cada núcleo), con una
memoria caché de 2 MB (1 MB por núcleo) y 3 GB
de memoria RAM, que incluye Windows XP de 32
bits y una JVM HotSpot basada en JDK 1.6.02.
5.2.1 Rendimiento (Performance)
-Descripción:
Para obtener una medida del rendimiento se han
realizado una serie de pruebas en la que se mide la
capacidad que tiene la JVM del entorno empotrado
para procesar distintos tipos de operaciones sobre
tipos primitivos (byte, int, etc.) o sobre colecciones
(Vector, array, etc.) y Strings. Este experimento se
basa en el benchmark realizado por Systronix [26].
-Variable a medir:
Número de operaciones por segundo.
-Procedimiento de medida:
Se utiliza el tiempo que tarda en procesarse n veces
una serie de operaciones de copia sobre tipos
primitivos (byte, integer, double, float), arrays y
cadenas String. A partir del número total de
operaciones realizadas se pueden obtener las
operaciones por segundo de una plataforma.
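El procedimiento puede esbozarse con un micro-benchmark mínimo (ilustrativo e hipotético, no el benchmark real de Systronix): se cronometran n operaciones de copia y se derivan las operaciones por segundo.

```java
// Micro-benchmark ilustrativo (no es el benchmark real de Systronix):
// operaciones de copia por segundo sobre un array de int.
public class OpsPerSecond {
    static double measureIntCopies(int n) {
        int[] src = new int[256];
        int[] dst = new int[256];
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            System.arraycopy(src, 0, dst, 0, src.length); // operación medida
        }
        long elapsed = Math.max(1, System.nanoTime() - t0); // evita dividir por 0
        return n * 1e9 / elapsed; // operaciones por segundo
    }

    public static void main(String[] args) {
        System.out.printf("copias de array int/s: %.0f%n", measureIntCopies(100_000));
    }
}
```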
-Resultado:
Según la tabla 2, el sistema basado en el
microcontrolador aJile es superior en casi todas las
mediciones, lo cual se debe a que posee una
frecuencia de reloj bastante alta y a que utiliza un
modelo de ejecución basado en procesador Java. No
obstante, en algunas operaciones aritméticas obtiene
mejores resultados el sistema EJC pese a su menor
frecuencia de reloj. Por el contrario, la plataforma
SNAP no puede utilizarse para aplicaciones que
demanden grandes prestaciones del entorno.
Tabla 2. Performance Summary
5.2.2 Temporizadores
-Descripción:
Se mide el tiempo transcurrido entre disparos para
compararlo con el teórico previamente establecido
para los diferentes tipos de temporizadores de
JAHASE: oneShotTimer (un solo disparo),
PeriodicTimer (periódico) y CycledTimer (cíclico),
obteniendo de este modo el retardo en el disparo del
temporizador.
-Procedimiento de medida:
La forma de realizar esta medida es, en primer lugar,
realizar un registro de tiempo antes de iniciar el
temporizador, y luego registrar el tiempo cada vez
que se activa el temporizador. Si el temporizador es
periódico, se podrá estimar la precisión a lo largo de
varios disparos. En el caso de un temporizador de un
solo disparo se medirá varias veces el tiempo en el
que se activa desde que se inicia. Se debe tomar
también el máximo y mínimo además del promedio
de tiempos y medir la dispersión en el caso de los
temporizadores periódicos.
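El procedimiento descrito puede esbozarse con java.util.Timer (esbozo ilustrativo: JAHASE utiliza sus propios temporizadores; aquí sólo se muestra el registro de los instantes de activación frente al instante teórico):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;

// Esbozo ilustrativo con java.util.Timer (JAHASE usa sus propios
// temporizadores): retardo = instante real de activación - instante teórico.
public class TimerDelayProbe {
    public static List<Long> measureDelays(final long periodMs, final int shots)
            throws InterruptedException {
        final List<Long> delays = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(shots);
        final long t0 = System.currentTimeMillis(); // registro antes de iniciar
        final Timer timer = new Timer();
        timer.scheduleAtFixedRate(new TimerTask() {
            private int shot = 0;
            public void run() {
                shot++;
                // instante real menos instante teórico t0 + shot * periodo
                delays.add(System.currentTimeMillis() - (t0 + shot * periodMs));
                done.countDown();
                if (shot >= shots) timer.cancel();
            }
        }, periodMs, periodMs);
        done.await();
        return delays;
    }

    public static void main(String[] args) throws InterruptedException {
        // Con el promedio, máximo y mínimo de esta lista se estima la precisión
        System.out.println(measureDelays(50, 5));
    }
}
```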
-Variable a medir:
Retardo en el disparo del temporizador.
-Resultados:
Se puede observar en la tabla 3 que la plataforma
JSTIK prácticamente no tiene ningún retardo en la
activación del temporizador periódico, ni en valores
medios ni en cuanto al valor máximo, lo que
posiblemente se deba a que éstos se encuentren en un
rango de tiempos por debajo del milisegundo, que es
la mínima resolución con la que se puede medir en
esta plataforma. En el caso de la estación de trabajo
se obtienen unos valores muy bajos indudablemente
debido a la diferencia de prestaciones de este tipo de
equipos. En cambio, en el resto de sistemas
empotrados, al aumentar el nivel de sobrecarga del
sistema, se comienza a retrasar el disparo del
temporizador periódico, siendo este aumento casi
lineal, tanto en el valor medio como en el máximo
tiempo de retraso.
Fig. 7. VPeriodicTimer Maximum Summary

Tabla 3. Timers Summary

Según la figura 7, la plataforma JSTIK es la más estable de todas, eliminando cualquier tipo de retardo incluso cuando el sistema tiene la mayor carga posible, salvo algún retardo de 1 ms debido a la poca precisión del reloj utilizado para hacer las medidas. Por tanto, esta plataforma es adecuada para el desarrollo de aplicaciones de tiempo real. En los demás casos, incluso en la estación de trabajo, se observan retardos locales en la activación del temporizador periódico que no se acumulan en las sucesivas activaciones. Sólo la plataforma SNAP muestra retardos locales excesivos que se acumulan cuando el sistema tiene sobrecarga.

5.2.3 Gestión de Recursos de Memoria

-Descripción:
En este experimento se mide la capacidad que tiene el entorno empotrado para la gestión de memoria. JAHASE permite gestionar dos tipos de memoria: la memoria Heap convencional de cualquier JVM, que tiene asociado un recolector de basura para liberar y desfragmentar automáticamente la memoria después de su uso, y por otra parte, la memoria estática o Memory Spaces, que establece un tipo de memoria (similar a la memoria inmortal en RTSJ) donde se impide la actuación del recolector de basura. Este experimento está basado en el “HeapTest” de Systronix [26].

-Variable a medir:
Estado de la memoria antes y después de la prueba, y el tiempo necesario para realizar la prueba.

-Procedimiento de medida:
La forma de realizar esta medida consiste en reservar diferentes tamaños de memoria varias veces y evaluar el tiempo que tarda en hacerlo, y en segundo lugar medir el tiempo que tarda en copiar diferentes bloques de datos dentro de la memoria. Para hacer el experimento realista, el proceso se realiza con varias tareas que gestionan distintas partes de la memoria. Este experimento se hace para los dos tipos de memoria: memoria controlada por el recolector de basura y memoria no controlada por el recolector de basura.
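Una versión mínima de este procedimiento puede esbozarse así (ilustrativo, no el HeapTest real de Systronix): se cronometra la reserva de bloques y se consulta la memoria libre antes y después.

```java
// Esbozo ilustrativo (no es el HeapTest real de Systronix): tiempo de reserva
// de bloques y estado de la memoria antes y después.
public class HeapProbe {
    // Tiempo (ns) empleado en reservar 'count' bloques de 'size' bytes
    static long timeAllocations(int count, int size) {
        long t0 = System.nanoTime();
        byte[][] blocks = new byte[count][];
        for (int i = 0; i < count; i++) {
            blocks[i] = new byte[size];
        }
        long elapsed = System.nanoTime() - t0;
        if (blocks[count - 1] == null) { // mantiene vivas las reservas
            throw new AssertionError();
        }
        return elapsed;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long before = rt.freeMemory(); // estado de la memoria antes
        long ns = timeAllocations(1_000, 1_024);
        long after = rt.freeMemory(); // estado de la memoria después
        System.out.println("tiempo de reserva (ns): " + ns);
        System.out.println("memoria libre antes/después: " + before + "/" + after);
    }
}
```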
-Resultados:
En la figura 8 se observa el tiempo total de las pruebas y el tiempo que requiere el recolector de basura (GC Time). El tiempo total (TotalTime) hace referencia al tiempo que tardan en ejecutarse concurrentemente tres hebras. En este experimento podemos comprobar en primer lugar que el tiempo total necesario para la reserva y liberación sucesiva de memoria aumenta a medida que aumenta la carga del sistema. Ahora bien, el tiempo necesario es mucho más severo en la plataforma SNAP, del orden de 2500 segundos (unos 40 minutos) para cada ejecución, casi dos órdenes de magnitud mayor que en otras plataformas.

Fig. 8. Total Time and Total Garbage Collector Time Summary

Pese a que SNAP es el que más tarda con diferencia, el número de veces y, por tanto, el tiempo dedicado al recolector de basura es menor que en el EJC y muy similar al del JSTIK, lo cual significa que el recolector de basura implementado en dichas plataformas es mucho más eficiente y menos invasivo. Aunque el EJC llama un mayor número de veces al recolector de basura, y el tiempo que le dedica es bastante grande, sigue obteniendo un tiempo total bastante bueno al tener mejor controlado el tiempo necesario para reservar nueva memoria.

5.2.4 Latencias

En este experimento se mide la latencia del sistema desde varios puntos de vista: la latencia de despacho debida al cambio de contexto, la latencia debida al mecanismo de sincronización, y la latencia asociada a la activación de eventos.

a. Latencia de Despacho
-Descripción:
La latencia de despacho se corresponde con el tiempo total desde que una tarea es interrumpida por otra de mayor prioridad hasta que la tarea de mayor prioridad ocupa el procesador.
-Variable a medir:
Latencia de despacho.
-Procedimiento de medida:
Para medir esta latencia se procede a la inversa, ya que el proceso es simétrico. Se marca el tiempo en una tarea más prioritaria antes de hacer un yield() para devolver el control al planificador, que justo a continuación cede el procesador a la tarea menos prioritaria que se encontraba bloqueada. Para que la tarea menos prioritaria pueda marcar el tiempo se requiere que la tarea más prioritaria active un flag en un recurso compartido protegido. Esta medida se repite un número de veces dado.

b. Latencia de Evento
-Descripción:
La latencia de evento se corresponde con el tiempo total desde la ocurrencia de un evento hasta que éste es tratado.
-Variable a medir:
Latencia de evento.
-Procedimiento de medida:
Para medir la latencia de evento se marca el tiempo antes de disparar un evento AsyncEvent, y luego se mide justo en el momento en el que se activa el manejador de evento AsyncEventHandler. Esto se repite un número determinado de veces.

c. Latencia de Sincronización
-Descripción:
La latencia de sincronización mide el tiempo que
tarda un hilo en tomar el control de la sección crítica
desde que solicita el cerrojo para entrar en ella.
-Variable a medir:
Latencia de sincronización.
- Procedimiento de medida:
Para medir la latencia de sincronización se registra el
tiempo antes de invocar un método protegido por
synchronized, y después de entrar en la sección
crítica. Para medir el tiempo se resta el tiempo que
tardaría en invocar un método no synchronized.
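El procedimiento puede esbozarse así (esbozo ilustrativo): se compara el coste de invocar repetidamente un método synchronized con el de un método equivalente sin sincronizar.

```java
// Esbozo ilustrativo: coste de un método synchronized frente a uno
// equivalente sin sincronizar.
public class SyncLatencyProbe {
    int counter;

    synchronized void syncInc() { counter++; }
    void plainInc() { counter++; }

    // Diferencia media (ns) por llamada entre ambas versiones; en JVM modernas
    // puede ser cercana a cero por las optimizaciones de bloqueo
    static long measure(int n) {
        SyncLatencyProbe p = new SyncLatencyProbe();
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) p.syncInc();
        long tSync = System.nanoTime() - t0;
        t0 = System.nanoTime();
        for (int i = 0; i < n; i++) p.plainInc();
        long tPlain = System.nanoTime() - t0;
        return (tSync - tPlain) / n;
    }

    public static void main(String[] args) {
        System.out.println("latencia media estimada (ns): " + measure(1_000_000));
    }
}
```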
-Resultados:
Según la tabla 4 y la figura 9, la latencia de despacho crece de forma lineal tanto en el sistema EJC como en el SNAP (en el último con una pendiente mucho más pronunciada, como cabía esperar). Sin embargo, en el JSTIK es muy baja incluso con alta sobrecarga, obteniendo resultados muy semejantes a los de la estación de trabajo. En el caso de la latencia de evento (tabla 4) se obtienen unos resultados bastante parecidos a los de la de despacho. En el EJC y el SNAP crece según el nivel de carga de manera lineal, mientras que en el JSTIK se mantiene estable y muy baja (por debajo del milisegundo). Respecto a los valores mínimos, en el EJC siempre es superior a 2 ms, mientras que en el SNAP el mínimo es 4 ms en situaciones de sobrecarga nula. Estos datos pueden ser relevantes cuando se requiera un tiempo de respuesta frente a eventos con un margen de tiempo muy reducido.
Tabla 4. Latencies Summary

La latencia de sincronización vuelve a ser dependiente del nivel de sobrecarga, incluso para el JSTIK, que se ve afectado por las altas sobrecargas. El sistema JSTIK presenta varios picos muy pronunciados bajo alta sobrecarga, lo que lo hace poco recomendable para aplicaciones que requieran un gran número de métodos sincronizados cuando exista un alto grado de carga en el sistema. Por el contrario, el EJC es ahora el que mejor responde entre los sistemas empotrados para la sincronización de métodos en altos niveles de carga. La latencia crece de forma lineal según la carga, por lo que es más predecible.

Fig. 9. Dispatch Latency Summary (Average & Deviation)

5.2.5 Jitter en la activación de tareas periódicas

El jitter nos permite medir la fluctuación que hay en la activación de una tarea periódica; es decir, la diferencia entre el máximo retardo y el mínimo retardo en la activación de una tarea. Para ello hemos realizado dos tipos de experimentos.

En el primer experimento, que hemos denominado modelo síncrono, la activación de las tareas periódicas se programa mediante un hilo que, dentro de un bucle, bloquea su ejecución el tiempo especificado por el periodo mediante instrucciones basadas en retardos como sleep, tal y como se muestra a continuación:

while (true) {
    suspensión(Periodo);
    // código de la tarea periódica
}

Para medir el jitter tenemos que estudiar el retraso que se origina entre la activación real de la tarea periódica y la esperada.

En el segundo tipo de experimento, basado en los temporizadores de JAHASE y que hemos denominado modelo asíncrono, utilizamos el mecanismo basado en eventos, de modo que la tarea periódica se activa como consecuencia de la ocurrencia de un evento asíncrono que se dispara en el momento en el que se agota el tiempo programado para el temporizador. En ese momento se transfiere el control a un manejador de eventos bloqueado a la espera del disparo de eventos de temporizador. A la diferencia entre el tiempo de activación de la tarea y el esperado la hemos denominado TimerDelay.

Para ambos experimentos hemos considerado que la tarea periódica se ejecuta con un periodo de 200 ms.

-Descripción:
En este experimento se mide el jitter que presenta el entorno empotrado, es decir, las fluctuaciones que se producen en la activación de tareas periódicas. Dado que JAHASE funciona sobre la JVM del empotrado, su medida dependerá esencialmente de la JVM subyacente.

-Variable a medir:
Jitter, es decir, la dispersión máxima que hay entre la activación real y la activación teórica.

-Procedimiento de medida:
Para medir el jitter se registran los instantes de activación de una tarea periódica respecto del instante en que realmente debería haberse activado (que se calcula antes de iniciar la tarea periódica).

-Resultados:
Viendo estos resultados (tabla 5), se podría decir que, en ausencia de sobrecarga, el jitter de los diferentes sistemas es el siguiente:
-SNAP: en torno a los 3-4 milisegundos.
-EJC: en torno a los 1-2 milisegundos.
-JSTIK: en torno al milisegundo.
-SUN Ultra 20: por debajo del milisegundo.

Tabla 5. Jitter Times (ms)

No obstante, si aumentamos el nivel de carga los tiempos crecen, lo cual hace que el sistema sea menos predecible y menos adecuado para aplicaciones con requisitos temporales estrictos.

Fig. 10. Jitter (Synchronous Model)

De nuevo el empotrado JSTIK ofrece muy buenos resultados, casi exactos a la temporización teórica (jitter → 0), aunque el EJC también se puede utilizar, debido a que su jitter no es demasiado excesivo en altos niveles de sobrecarga.

Fig. 11. Jitter (Asynchronous Model)

6. CONCLUSIONES Y TRABAJO FUTURO

Java constituye una plataforma muy potente para el desarrollo de aplicaciones empotradas con requerimientos estrictos de rendimiento y prestaciones. Sin embargo, existe una gran cantidad de especificaciones, mecanismos y modelos de programación particulares de cada fabricante de dispositivos, lo cual dificulta la adopción de Java por parte de los desarrolladores de sistemas empotrados.

En este trabajo se han expuesto las características principales de una muestra significativa de las soluciones Java empotradas más populares, y se propone una plataforma de alto nivel, basada en los mecanismos abstractos proporcionados por los distintos fabricantes, que permite manejar de forma homogénea y coherente los recursos de bajo nivel de todos los dispositivos analizados. En el futuro se prevé expandir la plataforma desarrollada para cubrir mecanismos y tecnologías aún no soportados, como una adaptación a una JVM conforme a RTSJ como Jamaica o jRate, así como el diseño de una JVM para sistemas empotrados de baja escala con recursos muy limitados de memoria y procesamiento.

Referencias
[1] Agosta G., Crespi S., Svelto G. (2006)
“Jelatine: a virtual machine for small embedded
systems”. Proceedings of the 4th international
workshop on Java technologies for real-time
and embedded systems. Francia.
[2] aJile Processor®, http://www.ajile.com
[3] Chen, G. (2002) “PennBench: a benchmark suite
for embedded Java”, Proceedings WWC-5, 2002
IEEE International Workshop on Workload
Characterization, pp. 71-80.
[4] Corsaro, A., (2002) “Evaluating real-time Java
features and performance for real-time
embedded systems”, IEEE Real-Time and
Embedded Technology and Applications
Symposium, 2002. Proceedings., pp. 90-100.
[5] Debardelaben, J.A., (1997) “Incorporating cost
modeling in embedded-system design”, Design
& Test of Computers, IEEE, pp. 24-35.
[6] Debbabi, M., (2005) “Accelerating embedded
Java for mobile devices”, Communications
Magazine, IEEE, pp 80-85.
[7] Hardin, D.S., (2001) “Real-Time Objects on the
Bare Metal: An Efficient Hardware Realization
of the JavaTM Virtual Machine”, Proceedings.
4th International Symposium on Object-Oriented
Real-Time Distributed Computing, pp. 53-59.
[8] Holgado J.A., Moreno A., Capel M. (2007)
“Java-based Adaptable Middleware Platform
for Virtual Instrumentation”. Proceedings IEEE
Int. Conf. on Virtual Environments, Human-Computer Interfaces and Measurement Systems.
Editorial: IEEE. Italia, pp. 1-6.
[15] Kuchana, P. (1994) “Software Architecture
Design Patterns in Java”, Auerbach Publications.
[16] Meyer, B. (2000) “Object-Oriented Software
Construction”. 2nd Edition. Prentice-Hall
[17] Mulchandani D., Java for Embedded Systems,
IEEE Internet Computing, Vol. 2, 30-39 (1998).
[18] Nilsson, A., (2001) “Deterministic Java in Tiny
Embedded Systems”, Proceedings Fourth IEEE
International Symposium on Object-Oriented
Real-Time Distributed Computing (ISORC 2001),
pp. 60-68.
[19] PersonalJava Application Environment, Sun
Microsystems,
http://java.sun.com/products/personaljava/
[20] Schoeberl, M., (2004) “Restrictions of Java for
Embedded Real-Time Systems”, Proceedings
Seventh IEEE International Symposium on
Object-Oriented Real-Time Distributed
Computing, pp. 93-100.
[21] Schoeberl, M., (2005) “JOP: A Java Optimized
Processor for Embedded Real-Time Systems”,
Memoria de Tesis Doctoral, Vienna University
of Technology.
[9] Holgado J.A., Viúdez J. (2007) “Arquitectura
abierta basada en Java para Entornos
Domóticos”. 3rd Int. Symposium on Ubiquitous
Computing & Ambient Intelligence. Zaragoza,
pp. 57-66.
[22] Sharp D., Pla E., Luecke K. (2003) “Evaluating
mission critical large-scale embedded system
performance in real-time”. Proceedings of the
24th IEEE International Real-Time Systems
Symposium. IEEE Computer Society,
[10] Holgado J.A., Viúdez J, Capel M.I., Montes
J.M. (2006) “Diseño de un Sistema de Control
Domótico basado en Java”. Actas de las XXVII
Jornadas de Automática (CEA), Almería, pp.
1401-1408.
[23] Siebert F. (2002) “Bringing the full power of
java technology to embedded realtiem
applications”. MSy'02 Embedded Systems in
Mechatronics, 3.-4. Oct 2002, Winterthur,
Switzerland
[11] Java 2 Micro Edition, Sun Developer Network,
Sun Microsystems, http://java.sun.com/javame
[24] Snap®.
Imsys
Development
tools,
http://www.imsys.se/products/devtools.htm.
[12] Java Card Documentation, Sun Microsystems,
http://java.sun.com/products/javacard/
[25] Snijder®, http://www.embedded-web.com
[26] Systronix, http://www.systronix.com/
[13] Java SE Real-Time, Sun Microsystems,
http://java.sun.com/javase/technologies/realtime
.jsp
[14] Javelin
Stamp
de
http://www.parallax.com/javelin/.
[27] Stewart D. (2006) “Measuring execution time
and real time performance”. Embedded Systems
Conference. Boston.
Parallax,
[28] Strom O., Svarstad K. and Ass E. (2003) “On
the Utilization of Java Technology in
5. Herramientas de Desarrollo
Embedded Systems”. Design Automation for
Embedded Systems, 8, 87-106 (2003).
[29] Tan, Y.Y. Yau, C.H. Lo, K.M. Yu, W.S.
Mok, P.L.
Fong, A.S. (2006) “Design and
implementation of Java processors”. Computers
and Digital Techniques, IEE Proceedings, pp.
20-30.
[30] Tini. Tiny Internet Interface de Maxim,
http://www.maximic.com/products/microcontrollers/tini/
[31] Viúdez J., Holgado J.A. (2007) "Diseño y
construcción de una maqueta domótica
controlable a través de microcontroladores
Java”. V Jornadas de Enseñanza a través de
Internet/Web de la Ingeniería de Sistemas y
Automática (EIWISA’07). Zaragoza, pp. 47-52.
[32] Viúdez J. (2007) “Plataforma para el diseño de
Sistemas de Control en entornos empotrados
basada en Java”. Máster en Desarrollo de
Software. Dpto. Lenguajes y Sistemas
Informáticos. Universidad de Granada.
[33] Wellings A., (2004) "Concurrent and Real-Time
Programming in java" Ed. Wiley
151
6. Sistemas de Control
A Taxonomy on Prior Work on Sampling Period Selection for
Resource-Constrained Real-Time Control Systems
Camilo Lozoya, Manel Velasco, Pau Martı́, José Yépez, Frederic Pérez, Josep Guàrdia,
Jordi Ayza, Ricard Villà and Josep M. Fuertes
Automatic Control Department
Technical University of Catalonia
Abstract
In this paper we present a partial taxonomy of prior work on sampling period selection for resource-constrained real-time control systems. The selection of sampling periods for real-time control tasks determines resource utilization (or, alternatively, task set schedulability) as well as overall control performance. Ultimately, it determines the sequence of control task instance executions over time, that is, the schedule. Different schedules are obtained depending on which criterion is used to select sampling periods, what real-time paradigm is demanded of the underlying execution platform, who decides which task to execute, when the decision is taken, and how the decision is enforced. To analyze all these aspects, ten papers from the last decade, one per year from 1998 to 2007, have been selected and categorized. The taxonomy, although incomplete, reveals key tendencies and raises important research challenges.
1. Introduction
Traditionally, computer-controlled systems are implemented using real-time periodic tasks. Each periodic task is statically assigned a period obtained following well-established procedures that mandate sampling and controlling periodically. However, the embedded systems market requires systems with more and better functionality at lower prices. As a consequence, control applications must be implemented on platforms where resources are scarce and/or where increasing performance is a must, and the traditional static periodic approach to control system implementation fails at minimizing resource utilization while maximizing control performance.
To provide solutions fulfilling the tight demands posed by modern embedded systems, the control and real-time communities have shown in recent years a renewed interest in deriving novel sampling period selection methods for the efficient implementation of real-time control systems. In this paper we present a partial taxonomy of some of these methods. Although most of them focus on the CPU, many can also be adapted to networks or battery-limited platforms.
Many of the novel methods for sampling period selection go beyond merely finding the best values for control task periods: they provide complete real-time frameworks tailored to the effective concurrent execution of control tasks. They are characterized by which criterion is used to select sampling periods, what real-time paradigm is thereby demanded of the underlying execution platform, who decides which task to execute, when the decision is taken, and how the decision is enforced.
To analyze these frameworks and their key properties in terms of Which, What, Who, When and How, ten papers [1]-[10] from the last decade (1998-2007) have been arbitrarily selected and categorized. The papers are listed in chronological order in the references section to provide an initial view of the contributions over time. The selection of one paper per year, although incomplete in terms of time coverage and number of papers, collects a wide variety of approaches while revealing the key existing tendencies in control task scheduling and raising important research challenges.
2. Taxonomy
A key aspect of these new methods is the theoretical criterion used to obtain the set of sampling periods. From the criterion, key aspects are derived and analyzed in order to construct the taxonomy summarized in Table 1.
2.1. Criterion
Two main criteria can be identified: an optimization approach, or bounding the intersampling dynamics.
XI Jornadas de Tiempo Real (JTR2008)

Table 1. Taxonomy of sampling period selection approaches.

            Which       What       Who      When         How
            Criterion   Trig.      Trig.    Solving the  Solution    Timing                  Sched.
                        Paradigm   Entity   problem                  Constraints
[1]  Set98  Optimizat.  TT         Coord.   Off-line     Periods     Static periodic         EDF/FP
[2]  Arz99  Bound dyn.  ET         Task     On-line      Periods     Aperiodic ET            Missing
[3]  Reh00  Optimizat.  TT         Coord.   Off-line     Sequences   Static pseudo periodic  Cyc. Ex.
[4]  Hri01  Bound dyn.  TT         Coord.   Off-line     Sequences   Static pseudo periodic  Cyc. Ex.
[5]  Pal02  Optimizat.  TT         Coord.   Off-line     Periods     Static periodic         EDF
[6]  Cha03  Optimizat.  TT         Coord.   Off-line     Periods     Static periodic         EDF/FP
[7]  Mar04  Optimizat.  TT         Coord.   On-line      Periods     Varying periodic        EDF
[8]  Hen05  Optimizat.  TT         Coord.   On-line      Periods     Varying periodic        EDF
[9]  Tab06  Bound dyn.  ET         Task     On-line      Periods     Aperiodic TT            Missing
[10] Lem07  Bound dyn.  ET         Task     On-line      Periods     Aperiodic TT            Elastic sch.
In the optimization approaches ([1], [3], [5], [6], [7], [8]), sampling periods are selected by solving an optimization problem. These approaches assume there is a cost function, parameterized in terms of control performance and sampling periods, that has to be minimized or maximized depending on whether it denotes penalty or benefit. The optimization problem domain is restricted by closed-loop stability and task set schedulability constraints.
In the approaches based on bounding the intersampling dynamics ([2], [4], [9], [10]), sampling periods are selected to keep each closed-loop dynamics within predefined thresholds. These thresholds, derived from purely control-theoretical arguments, are used to bound changes in the dynamics or to ensure closed-loop stability.
It is important to identify whether the theoretical criteria capture the dual problem posed by modern embedded systems: minimizing resource utilization while maximizing control performance. In all the optimization approaches the duality is captured by the cost function and the optimization constraints. However, the bounding approaches usually capture only control performance issues; utilization, and ultimately schedulability, is not addressed.
2.2. Triggering paradigm and entity

The two criteria of Section 2.1 influence whether the period selection solution requires a real-time architecture following a time-triggered (TT) or an event-triggered (ET) paradigm. All the solutions based on optimization require a time-triggered architecture, while almost all the solutions based on bounding the closed-loop dynamics require an event-triggered architecture, the exception being [4].

The classification also considers who is in charge of selecting the sampling periods (the triggering entity). All solutions requiring a TT architecture rely on a global coordinator that decides the best periods for the whole set of control tasks. On the contrary, in the solutions requiring an ET architecture, the control tasks themselves are in charge of deciding their periods.

Once periods are selected, they must be enforced by the underlying real-time architecture, so it is also important to examine how the solutions are enforced. This can be analyzed in three steps: first, looking at the solution in more detail; second, looking at the type of timing constraints it demands; and third, looking at the scheduling policies capable of enforcing those constraints.
2.3. When to solve the problem
The previous classification (TT vs. ET) is related to whether the period selection is performed off-line or on-line. In all the ET approaches ([2], [9], [10]) periods are derived on-line, whereas among the TT approaches some solutions are computed off-line ([1], [3], [4], [5], [6]) and the rest on-line ([7], [8]).

It is important to identify when the sampling periods are selected for two main reasons. First, the computational overhead must be considered, which may be a disadvantage of on-line approaches. Second, the ability to adapt to workload changes, that is, to varying available resources or varying demands from the control applications, must also be accounted for, which may be an advantage of on-line approaches.
2.4. Solution and its enforcement using
real-time technology
Solution. Although the taxonomy reviews methods for sampling period selection, two of the methods ([3], [4]) do not establish sampling periods; rather, they provide periodic sequences of ordered control task instances. All the others provide periods, time intervals or timing bounds that establish when tasks have to be executed, i.e., they provide diverse timing constraints for the control tasks.
Timing constraints. All solutions following a TT paradigm impose periodic timing constraints on the control tasks. In particular, [1], [5] and [6] specify static periodic timing constraints: the outcome of solving their methods off-line is a set of periods for the control tasks that will not change at run-time, named static periodic. Similarly, [3] and [4] specify static periodic sequences of ordered task instances that will not change at run-time; looking at a single task, this is a static pseudo-periodic execution. Finally, the on-line solutions of the frameworks presented in [7] and [8] provide periods for the control tasks that change at run-time, named varying periodic.

The timing of the solutions demanding an event-triggered architecture is, in the general case, aperiodic. However, different types of aperiodicity can be distinguished. In [2], the execution is purely aperiodic: it is triggered whenever an external event, detected using specific hardware (an event detector), is identified. For control safety reasons, an upper bound is imposed in order to force an execution if no events are detected; the triggering condition cannot be predicted. In [9], although tasks execute aperiodically, a lower bound on the inter-arrival time of job executions is predicted at each job execution, thus indicating a sporadic task behavior. Finally, in [10], the execution is again aperiodic, but with the advantage that at each job execution the next job deadline is predicted. Both [9] and [10] can therefore be considered aperiodic time-triggered.

Scheduling. All solutions demanding a time-triggered architecture can enforce the derived timing constraints for control tasks using well-known scheduling strategies. In particular, [3] and [4] can exploit cyclic executives, while the rest can exploit scheduling policies for periodic tasks, such as earliest deadline first (EDF) and fixed priority (FP). Table 1 specifies for each of these solutions what scheduling policy can be applied. For the on-line approach [7], a specific resource allocator that computes the sampling periods on-line is required before the EDF dispatching. In the other on-line approach [8], EDF can be applied directly, because the computation of the periods is performed periodically by a particular task named the feedback scheduler task.

For the solutions demanding an event-based architecture, a scheduling policy that can enforce the presented solution is, in the general case, lacking. Only the result provided in [10] integrates the presented event-triggered control with existing scheduling theory: at each job execution the deadline for the following job is predicted, and elastic scheduling is invoked to accommodate the new timing demands, considering the whole task set. However, if the elastic scheduling cannot meet them, problems may occur.

2.5. Discussion
The presented taxonomy mainly considers key real-time aspects of the reviewed methods. However, some other aspects have been omitted. Which task model is used, in terms of avoiding or reducing sampling and latency jitter? In the optimization problem, are the solutions general, or do they depend on each controlled system? Are they exact (closed forms) or approximate? Does solving them also yield the appropriate controller gains? Looking more at control aspects, further questions not analyzed are: Which types of controllers do the presented solutions support? Are observers considered? Is noise also considered?

Overall, many questions have not been analyzed. However, the taxonomy is not closed, and all the previous questions can be incorporated. In addition, many existing papers on sampling period selection (and related issues) for real-time control systems have not been cited or analyzed; they could also be included in the taxonomy. Nevertheless, the main tendencies that the taxonomy reveals, and that we analyze next, would remain the same (or very similar).
3. Tendencies
To identify tendencies, we focus on the When and What columns of Table 1, reading them chronologically from top to bottom.

The When column, which refers to whether the sampling period selection is performed off-line or on-line, clearly shows a tendency toward on-line approaches. This tendency reflects, and aims at meeting, the demands of modern embedded systems that are required to work in dynamic environments, adapting to available resources that can change abruptly, or to the resource demands of control applications, which can be considered to vary depending on the state of the controlled plants.
The What column refers to whether the presented solution requires a time-triggered or event-triggered real-time architecture. The first conclusion that can be extracted is that TT solutions are more common than ET solutions. More revealing, however, it shows a tendency toward event-triggered approaches. A reason for this trend could be the wish to push to the limit the logic of the periodic on-line approaches ([7] or [8]). In on-line approaches, sampling periods are held until the sampling period selection procedure takes place and new periods are derived. This logic has been shown to be effective at minimizing resource utilization and/or maximizing control performance. Pushing it to the limit means executing the sampling period selection procedure each time a control task instance executes, which may provide even better resource utilization and control performance.
4. Research challenges
The key tendencies indicate a current interest in on-line event-triggered approaches. However, there is a recognized lack of scheduling support for event-based control theoretical results [2], a problem that can also be identified by looking at the last column of Table 1: scheduling solutions for [2] and [9] are missing, and the scheduling solution adopted by [10] may not be able to fulfill the demanded timing.

A key property of the solutions demanding a TT architecture is that they include resource constraints in the theoretical criterion; task set schedulability is therefore already considered, for example in terms of utilization tests, as mentioned in Section 2.1. However, the criteria used to derive sampling periods in the ET approaches do not consider resource constraints in their formulation, so the existing solutions do not implicitly solve the scheduling problem.
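For reference, the elastic scheduling invoked by [10] follows the elastic task model, in which task utilizations are compressed from their nominal values toward minimum values in proportion to elastic coefficients. The following is our own minimal rendering of that idea (function name, iteration scheme and sample values are illustrative, not code from [10]):

```python
def elastic_compress(u_nom, u_min, e, u_d):
    """Compress task utilizations from nominal u_nom toward u_min,
    proportionally to the elastic coefficients e, until the total fits
    the desired utilization u_d (elastic task model).
    Returns the compressed utilizations, or None if infeasible."""
    n = len(u_nom)
    if sum(u_min) > u_d:
        return None                     # even fully compressed it does not fit
    fixed = [False] * n                 # tasks pinned at their minimum
    while True:
        e_var = sum(e[i] for i in range(n) if not fixed[i])
        u_fix = sum(u_min[i] for i in range(n) if fixed[i])
        u_var = sum(u_nom[i] for i in range(n) if not fixed[i])
        excess = u_var + u_fix - u_d
        if e_var == 0 or excess <= 0:   # nothing left to (or no need to) compress
            return [u_min[i] if fixed[i] else u_nom[i] for i in range(n)]
        u = [u_min[i] if fixed[i] else u_nom[i] - excess * e[i] / e_var
             for i in range(n)]
        violating = [i for i in range(n) if not fixed[i] and u[i] < u_min[i]]
        if not violating:
            return u
        for i in violating:             # pin violators and redistribute
            fixed[i] = True

print(elastic_compress([0.4, 0.4, 0.3], [0.2, 0.2, 0.2], [1.0, 1.0, 1.0], 0.9))
```

The `None` return is precisely the failure case noted above for [10]: when even the minimum utilizations exceed the available capacity, the elastic compression cannot accommodate the demanded timing.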
To overcome this limitation, several solutions can be envisioned. In the last two reviewed papers, [9] and [10], tasks decide their rate of progress in an event-triggered fashion: at each control task instance execution, the same information used to compute the next value of the control signal is also used to decide when the control task has to be executed again. This latter decision could also consider resource utilization. As mentioned earlier, resource utilization is usually expressed in terms of the utilization factor; however, this measure is atemporal and pessimistic. A more appealing metric for deciding when the next instance has to be executed may be the synthetic utilization factor, which depends on time and was developed for aperiodic scheduling. Future work will explore this approach.
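A minimal sketch of how such a time-dependent metric could be computed. The definition follows the synthetic utilization factor from aperiodic scheduling theory (the sum of C/D over jobs that have arrived and whose deadlines have not yet expired); the job trace is invented for the example.

```python
def synthetic_utilization(jobs, t):
    """Synthetic utilization at time t: the sum of C/D over all jobs
    that have arrived (a <= t) but whose absolute deadlines have not
    yet expired (t < a + D).  Unlike the classic utilization factor,
    it varies over time as jobs arrive and their deadlines pass.
    `jobs` is a list of (arrival, wcet, relative_deadline) tuples."""
    return sum(c / d for a, c, d in jobs if a <= t < a + d)

# Invented aperiodic job trace: (arrival, C, D)
jobs = [(0, 1.0, 4.0), (2, 2.0, 5.0), (10, 1.0, 2.0)]
print(synthetic_utilization(jobs, 3), synthetic_utilization(jobs, 5))
```

At t = 3 both early jobs contribute (1/4 + 2/5 = 0.65); by t = 5 the first deadline has passed and the metric has dropped to 0.4, illustrating why it is less pessimistic than the static utilization factor.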
A more radical solution envisions dispatching aperiodic jobs without characterizing them with timing constraints (periods and deadlines). The solutions presented in [9] and [10] translate, at each job execution, control demands into the timing constraints of the next job. An alternative vision could be to avoid translating control demands into timing constraints at all. This approach would require all control tasks to be always ready to execute, with a global coordinator that each time picks the most appropriate task for execution. In this case, concepts like task set schedulability would no longer hold, and equivalent concepts should be derived; for example, schedulability could mean stability, in the sense of determining how many control loops can be kept stable. Future work will also explore this approach.
5. Conclusions
A ten-year taxonomy of prior work on sampling period selection for resource-constrained real-time control systems has been presented. The taxonomy shows that current trends point to on-line event-triggered approaches, that is, to selecting periods at run-time, generating aperiodic task instance executions. In addition, research challenges for effectively building this type of solution have been presented and discussed.
References
[1] D. Seto, J.P. Lehoczky, L. Sha, “Task Period Selection and Schedulability in Real-Time Systems”, IEEE Real-Time Systems Symposium, 1998.

[2] K.-E. Årzén, “A Simple Event-Based PID Controller”, 14th World Congress of IFAC, January 1999.

[3] H. Rehbinder and M. Sanfridson, “Integration of off-line scheduling and optimal control”, 12th Euromicro Conference on Real-Time Systems, 2000.

[4] D. Hristu-Varsakelis, “Feedback control systems as users of a shared network: communication sequences that guarantee stability”, 40th IEEE Conference on Decision and Control, 2001.

[5] L. Palopoli, C. Pinello, A. L. Sangiovanni-Vincentelli, L. Elghaoui and A. Bicchi, “Synthesis of Robust Control Systems under Resource Constraints”, Hybrid Systems: Computation and Control, 2002.

[6] R. Chandra, X. Liu and L. Sha, “On the Scheduling of Flexible and Reliable Real-Time Control Systems”, Real-Time Systems 24(2), March 2003.

[7] P. Martí, C. Lin, S. Brandt, M. Velasco and J.M. Fuertes, “Optimal State Feedback Based Resource Allocation for Resource-Constrained Control Tasks”, 25th IEEE Real-Time Systems Symposium, Lisbon, Portugal, December 2004.

[8] D. Henriksson and A. Cervin, “Optimal On-line Sampling Period Assignment for Real-Time Control Tasks Based on Plant State Information”, 44th IEEE Conference on Decision and Control and European Control Conference ECC 2005, December 2005.

[9] P. Tabuada and X. Wang, “Preliminary results on state-triggered scheduling of stabilizing control tasks”, 45th IEEE Conference on Decision and Control, December 2006.

[10] M. Lemmon, T. Chantem, X. Hu and M. Zyskowski, “On Self-Triggered Full Information H-infinity Controllers”, Hybrid Systems: Computation and Control, April 2007.
Distributed Control of parallel robots using passive sensor data
Asier Zubizarreta, Itziar Cabanes, Marga Marcos, Dario Orive, Charles Pinto
University of the Basque Country
Abstract
This article introduces a novel control architecture for parallel robots. A closed form of the dynamic model of parallel robots is difficult to obtain, due to the complex kinematic relations of this kind of mechanism. However, with the use of the extra data provided by passive sensors, kinematic and dynamic modelling can be simplified. The dynamic model can then be used to implement advanced control techniques that improve the efficiency of parallel robots. In this paper, monoarticular and multiarticular control techniques are implemented on a 5R parallel robot, showing that the use of the extra sensor data leads to better and more accurate control.
1 Introduction

Accuracy and high-speed operation are two opposed characteristics required by current robotic applications. However, current serial robots cannot operate at high speed without showing poor accuracy, due to their serial chain structure. As an alternative, Parallel Kinematic Robots have been proposed [14]. This kind of mechanism is composed of an end-effector platform and a group of serial subchains joining it to a fixed base. This structure provides a high stiffness that makes them appropriate for high loads or for tasks in which high speed and accuracy are required.

However, to exploit the full potential of these robots, advanced control techniques based on the dynamic model are necessary. This can be a difficult task, as these mechanisms have highly coupled kinematics and dynamics, which, in most cases, cannot be solved in closed form and require numerical approaches. As there is no extended, generalized approach to the dynamic modelling of parallel robots, most control approaches in the literature belong to PID-based monoarticular control techniques, leaving model-based control an almost unexplored area.

As most of the control techniques are based on serial robot control schemes, researchers' contributions in this area can be grouped into two general approaches: monoarticular and multiarticular control. In monoarticular control, each actuated joint is considered separately as a system, so the rest of the robot is treated as a disturbance acting on it. Its efficiency is poor, however, in high-speed or precision tasks, due to the effect of the dynamics of the rest of the robot. To increase the rejection of this disturbance, Chiacchio et al. [2] propose adding an external acceleration loop to the traditional servocontrol scheme. As acceleration is difficult to measure, they propose a state-space filter to calculate it. They also conclude that including a feedforward loop with the inverse dynamics of the actuated joint improves the control efficiency. Feedforward loops are also implemented in [1] to reduce the effect of dynamic coupling between actuators in a 2-DOF parallel robot. Another proposed technique to reduce the positioning error is to implement a PD plus gravity compensation control, which requires calculating the gravity term of the dynamic model of the robot. Gunawardana and Ghorbel in [6],[8],[7] apply this control scheme to the 5R parallel manipulator; the gravity term is calculated with the dynamic modelling method proposed by the authors, making use of the principle of virtual work and a reduced modelling technique. Su et al. [18] combine this technique with cross-coupling control, in order to compensate on the actuated joint for the disturbances caused by the dynamics of the rest of the robot.

Multiarticular control, on the other hand, considers the whole robot as a system and controls its actuated joints taking into account the coupling between the joints. If the model is accurate, this technique performs better than the previous one. The most widespread control technique in this group is Computed Torque Control (CTC), whose basic idea is to compensate the nonlinear dynamics of the robot using its inverse dynamic model. Due to the difficulty of obtaining this model, few approaches can be found in the literature. Codourey [3] obtains the inverse dynamic model and uses it to implement the CTC scheme, which is found to reduce the tracking errors in a pick-and-place application compared to monoarticular control schemes. A similar study, with the same conclusions, is done by Denkena and Holtz in [4], in which the CTC scheme is applied to the 6-DOF PaLiDa robot. However, to accurately implement the CTC scheme, the parameters of the inverse dynamic model must be identified, which is not an easy task in most cases. To avoid this, some adaptive control schemes have been combined with CTC schemes: Honegger et al. in [9] apply this method to the 6-DOF Hexaglide, and Pietsch et al. in [17] use the 2-DOF PORTYs robot. Another technique to compensate for unmodeled dynamics and parameter variation is the Dual Model-Based Control proposed by Li [10].

Along with these two approaches, some authors have tried to apply other control techniques to parallel robots, most of them directly exported from serial ones. That is the case in [19], where Vivas and Poignet apply a Predictive Control scheme to the H4 robot, and in [5], where Garrido applies vision control techniques to a redundant 2-DOF planar parallel robot. Another interesting approach is Design for Control, proposed by Li [11], where the objective is to design parallel robots so as to obtain a simple dynamic model. The basic idea is that complex dynamics require complex control techniques, while simple dynamics require simple control; this way, a simple controller can be used to control the system, obtaining a performance level similar to that obtained with an advanced control technique.

As stated above, to obtain precise and high-speed robot operation, advanced, model-based control techniques are required. However, due to the construction of parallel robots, the control loops, even those proposed by advanced controllers, only consider the active joints. This way, from a control engineering point of view, the rest of the mechanical structure, that is, the structure from the active joints to the end effector, remains in open loop. Therefore, the accuracy of the positioning of the end-effector depends on the accuracy of the modelling and parameter identification. To improve control, some authors have proposed the use of extra sensors in passive joints [13]. Although such sensors can be difficult to introduce in the mechanical structure, their advantages compensate the effort. For instance, with a certain number of sensors in strategic passive joints, the direct kinematic model can be obtained analytically in most cases, and the dynamic model can be simplified. This approach is applied by Marquet et al. in [12] to the H4 robot, using a simple monoarticular PID control; their results show an increase in the positioning accuracy.

In this paper, based on this approach, a modified CTC scheme is presented. The objective is to introduce passive sensor data into the control scheme in order to increase positioning accuracy and speed. For that purpose, simulations on the 5R parallel robot are presented. The rest of the paper is organized as follows: Section 2 presents the 5R parallel robot and its kinematic and dynamic models. Section 3 describes the CTC scheme with extra sensor data. In Section 4, control performance results based on Matlab-ADAMS co-simulation are presented. In Section 5 the main results are summarized.
2 Kinematic and Dynamic Modelling of the 5R parallel robot considering passive joints

The 5R 2-DOF parallel robot consists of 4 mobile links connected by 5 revolute joints. Active and passive joints are identified in Figure 1, where qai correspond to the actuated or active joints and qpi to the passive ones. The joint coordinates are grouped as q = [qa, qp]^T, with q̇ = dq/dt and q̈ = d²q/dt². Cartesian coordinates will be denoted as x = [x, y]^T.

Figure 1: 5R Robot Structure

Kinematic and dynamic models considering only active sensors are obtained in [8].

2.1 Kinematic Modelling

In this section, the kinematic model of the 5-bar mechanism considering the passive joints is presented.

Figure 2: Triangle Decomposition

Considering the triangle decomposition of Figure 2, the direct and inverse kinematic models can be derived.

2.1.1 Direct Kinematic Model

Using the Direct Kinematic Model, the cartesian coordinates x can be obtained from the articular coordinates q. Based on Figure 2, the direct kinematics relating q and x can be easily obtained as a redundant equation system:

x = L1·cos qa1 + l1·cos qp1 = L + L2·cos qa2 + l2·cos qp2
y = L1·sin qa1 + l1·sin qp1 = L2·sin qa2 + l2·sin qp2        (1)

2.1.2 Inverse Kinematic Model

The Inverse Kinematic Model relates the cartesian coordinates x and the articular ones q. Based on Figure 2 and using the cosine theorem and trigonometric relations, the following expressions can be obtained:

qa1 = ±arccos[ (l1² − L1² − (x² + y²)) / (2·√(x² + y²)·L1) ] + arctan(y/x)
qa2 = π − ( ±arccos[ (l2² − L2² − ((L − x)² + y²)) / (2·√((L − x)² + y²)·L2) ] + arctan(y/(L − x)) )
qp1 = ±π ∓ arccos[ (−l1² − L1² + x² + y²) / (2·L1·l1) ]
qp2 = ±π ∓ arccos[ (−l2² − L2² + (L − x)² + y²) / (2·L2·l2) ]        (2)

2.1.3 Jacobian Matrix

Differentiating expression (1) with respect to time:

Je·q̇a + Js·q̇p = 0        (3)

with:

Je = [ −L1·sin qa1 − l1·sin(qa1 + qp1)     L2·sin qa2 + l2·sin(qa2 + qp2)
        L1·cos qa1 + l1·cos(qa1 + qp1)    −L2·cos qa2 − l2·cos(qa2 + qp2) ]

Js = [ −l1·sin(qa1 + qp1)     l2·sin(qa2 + qp2)
        l1·cos(qa1 + qp1)    −l2·cos(qa2 + qp2) ]

2.2 Dynamic Modelling

To implement the CTC scheme considering passive joint sensor data, it is necessary to include the passive joint information in the Inverse Dynamic Model (IDM). For that purpose, a reduced-model-based method is used, based on those presented by Nakamura [16], Murray [15] and Ghorbel [7]. The main idea is to divide the 5R robot through the end effector, obtaining 2 serial subchains. These two subchains are considered as fully actuated, and their dynamic models are obtained using the Lagrangian or Newton-Euler formulation. Therefore, a set of 4 equations is obtained, corresponding to both passive and active joints:

τr = [τa1, τa2, τp1, τp2]^T = Dr·q̈ + Cr·q̇ + Gr        (4)

Using the proposed formulation, the virtual torques of the reduced model τr are projected onto the active articular coordinate space τ = [τa1, τa2]^T by means of the transformation matrix T, defined as:

T = [ I ; −Js⁻¹·Je ]        (5)

This matrix has been obtained by several authors [7][15][16] applying the Principle of Virtual Work.
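The kinematic relations above can be exercised numerically. The sketch below is our own illustration (using the kinematic parameters identified in Section 4): it evaluates the left chain of equation (1) and solves equation (3) for the passive joint velocities, with the 2x2 solve written out by hand to keep the example dependency-free.

```python
import math

# Kinematic parameters of the 5R robot, as identified in Section 4.
L, L1, l1, L2, l2 = 0.35, 0.15, 0.25, 0.3, 0.2

def direct_kinematics(qa1, qp1):
    """Left chain of equation (1); qp1 is taken here as the absolute
    angle of the distal link, as written in (1)."""
    return (L1 * math.cos(qa1) + l1 * math.cos(qp1),
            L1 * math.sin(qa1) + l1 * math.sin(qp1))

def passive_rates(qa, qp, qa_dot):
    """Solve Je*qa_dot + Js*qp_dot = 0 (equation (3)) for the passive
    joint velocities.  The Jacobians use qa_i + qp_i for the distal
    links, i.e. relative passive angles are assumed here."""
    s1, c1 = math.sin(qa[0] + qp[0]), math.cos(qa[0] + qp[0])
    s2, c2 = math.sin(qa[1] + qp[1]), math.cos(qa[1] + qp[1])
    Je = [[-L1 * math.sin(qa[0]) - l1 * s1,  L2 * math.sin(qa[1]) + l2 * s2],
          [ L1 * math.cos(qa[0]) + l1 * c1, -L2 * math.cos(qa[1]) - l2 * c2]]
    Js = [[-l1 * s1,  l2 * s2],
          [ l1 * c1, -l2 * c2]]
    # Right-hand side b = -Je * qa_dot, then a hand-written 2x2 solve.
    b = [-(Je[0][0] * qa_dot[0] + Je[0][1] * qa_dot[1]),
         -(Je[1][0] * qa_dot[0] + Je[1][1] * qa_dot[1])]
    det = Js[0][0] * Js[1][1] - Js[0][1] * Js[1][0]
    return [(b[0] * Js[1][1] - Js[0][1] * b[1]) / det,
            (Js[0][0] * b[1] - b[0] * Js[1][0]) / det]

print(direct_kinematics(0.0, math.pi / 2))
print(passive_rates((0.0, math.pi), (math.pi / 2, -math.pi), (1.0, 2.0)))
```

Singular configurations (det close to zero) correspond to the distal links becoming parallel, where the passive rates are no longer determined by the active ones.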
So, the inverse dynamic model, with respect to the articular coordinates q, is:

τ = T^T · (Dr·q̈ + Cr·q̇ + Gr)        (6)
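A numerical sketch of the projection in equations (5)-(6); the matrices below are illustrative stand-ins, not the identified 5R dynamics.

```python
import numpy as np

def projected_torque(Dr, Cr, Gr, Je, Js, qdd, qd):
    """Equation (6): tau = T^T (Dr*qdd + Cr*qd + Gr), with
    T = [I; -Js^-1 Je] (equation (5)) projecting the 4 reduced-model
    torques onto the 2 active joints."""
    T = np.vstack([np.eye(2), -np.linalg.solve(Js, Je)])
    return T.T @ (Dr @ qdd + Cr @ qd + Gr)

# Illustrative stand-in matrices (NOT the identified 5R dynamics):
Dr = np.eye(4)
Cr = np.zeros((4, 4))
Gr = np.zeros(4)
Je = np.eye(2)
Js = 2.0 * np.eye(2)
tau = projected_torque(Dr, Cr, Gr, Je, Js,
                       np.array([1.0, 2.0, 3.0, 4.0]), np.zeros(4))
print(tau)  # -> [-0.5  0. ]
```

Note that the 4-dimensional reduced-model torque collapses to a 2-dimensional actuator torque: the passive-joint components are folded in through the -Js^-1 Je block of T.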
3  Control Schemes using extra sensor data

Based on the kinematic and dynamic models, two
control strategies are analyzed: monoarticular control, considering one PID control loop for each actuator, and the CTC scheme, using the inverse dynamic
model of the 5R. The implementation of these two
schemes, when applied only to active joints, is the
same as on serial robots. However, when the extra passive sensor data are considered, the system turns
redundant, as there are more error signals than actuators.

In monoarticular control, as there only exist two
actuators in the robot, the T transformation matrix
is required to generate a combined error signal, as
proposed in [12] (Figure 3). As stated in (3), this
matrix is composed of the Jacobian matrix of the
robot, and projects the 4-dimensional error e =
qd − q to the 2-dimensional e* needed to implement
the control loops (equation (7)).

Figure 3: PID Scheme with Extra Sensors

e* = T^T · e    (7)

Based on this idea, the CTC scheme can be modified to consider passive joint sensor information, as
can be seen in Figure 4. In this case, the 4 error
signals are multiplied by T^T·Dr in the feedback
loop to obtain a 2-dimensional decoupled error signal. The IDM, defined in equation (6), needs the
coordinates and velocities of the 4 passive and active joints to generate the feedforward compensation torque. The control algorithm can be written
as stated in equation (8):

τ = T^T·Dr·(Kp·e + Kv·ė) + T^T·(Dr·q̈d + Cr·q̇ + Gr)    (8)

where q̈d is the desired acceleration.

Figure 4: CTC Scheme with Redundant Sensors

As stated in Section 1, in theory adding extra
sensor information leads to a scheme with improved
robustness against model uncertainties.

4  Simulation and Results

To validate the proposed control schemes, a set of
experiments has been conducted with the 5R parallel robot. Using the ADAMS multibody software,
the robot has been modelled and its parameters
identified. Then, the control loop has been implemented in the Matlab/Simulink environment, using
ADAMS as the plant. In this co-simulation environment, the 4 control schemes previously introduced (PID and CTC, with and without redundant
sensors) have been studied.

Model parameters have been identified as follows. Kinematic parameters: L = 0.35, L1 = 0.15,
L2 = 0.3, l1 = 0.25, l2 = 0.2. Link center of mass
positions: lcL1 = 0.071, lcL2 = 0.146, lcl1 = 0.11,
lcl2 = 0.1. Link masses: mL1 = 0.1680250925,
ml1 = 0.2781673197, ml2 = 0.2593537503, mL2 =
0.5218302134. Passive sensor masses: msi =
6.5578861688·10^-2 with i = 1, 2, 3. Link inertia moments: IL1 = 0.0014, IL2 = 0.0160, Il1 =
0.0052, Il2 = 0.0041. Sensor inertia moments:
Isi = 3.6877748727·10^-6 with i = 1, 2, 3. All data
in SI units.

The PID control has been experimentally tuned, obtaining the following parameters: Kp1 = 300,
Kd1 = 10, Ki1 = 0.1 and Kp2 = 1000, Kd2 = 20,
Ki2 = 0.05. The Kp and Kv gains of the CTC control scheme have been tuned to obtain a maximum
overshoot of 10% and a peak time of 1 ms.

An example nonsingular trajectory has been defined in cartesian coordinates (Figure 5). Using the Inverse
Kinematics defined in Section 2.1.2, the trajectories for the 4 active and passive joints have been derived and introduced into the control schemes. The
comparative analysis of the control schemes consists of
studying the ISE, IAE and ITAE performance indexes of the end-effector positioning error. To show
the effect of parameter uncertainty, the model parameters have been randomly modified by 10% of
their actual value and 10 iterations of co-simulation
have been run. Results are summarized in Table 1.

Figure 5: Reference Trajectory

Table 2 illustrates the relative improvement percentage of the extended approaches, which consider
passive sensor data, over the traditional schemes. The
first conclusion, as stated previously by other authors, is that the model-based CTC control scheme improves the control performance of the monoarticular PID control. In second place, results show that
when passive joint sensors are introduced, the control
performance of the corresponding control scheme
considering only active joints is improved. It is also
worth noting that, referring to the ITAE index, the improvement is higher in the y coordinate than in the x coordinate. In the case of the PID control scheme, the ITAE
index variation percentage is negative, but near zero.
This implies that the proposed scheme has a performance similar
to the classical one in that coordinate,
due to model uncertainties. However, due to the gravity effect, the improvement in the y coordinate is much
higher. Finally, the use of passive joint sensors also
increases the robustness to model parameter uncertainty. This proves the effectiveness of the approach.

6. Sistemas de Control

5  Real-Time Implementation on Labview-RT

The previously proposed strategies are going to be
implemented in the 5R parallel robot prototype designed and constructed in the Control Engineering
Department, with the collaboration of the CompMech Research Group of the Department of Mechanical Engineering. The prototype structure
(Figure 6) is made of four aluminium links. The
actuated joints are driven by two Maxon EC32 motors and controlled by respective EPOS 24/5 Position Controllers. The three passive joints have
been sensorized with absolute encoders (500 pulses
per turn).

Figure 6: 5R Prototype with Aluminium structure

The control system is implemented in a commercial PC running Labview-RT, using a classical
Host-Target architecture. The communication with
the motor controllers is made using the CANopen protocol. The passive joint sensors are read using the RS-232 communication protocol.
             Mean of Performance Index of TCP positioning error
           |           x-coordinate           |          y-coordinate
           |  ISE         IAE         ITAE    |  ISE      IAE     ITAE
Ext. CTC   |  1.02518e-7  6.88278e-8  0.0113  |  0.0106   0.0071  0.0041
CTC        |  1.61019e-7  1.27142e-7  0.0144  |  0.0131   0.0087  0.0066
Ext. PID   |  0.0011      0.0009      5.7115  |  4.9227   8.6435  8.4185
PID        |  0.0013      0.0124      5.7027  |  17.3301  8.7030  21.0786

Table 1: Co-Simulation Results
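The ISE, IAE and ITAE indexes reported above can be approximated from the sampled end-effector positioning error. A minimal sketch, using a toy exponentially decaying error rather than the co-simulation data:

```python
import numpy as np

def performance_indexes(e, dt):
    """ISE, IAE and ITAE of a sampled error signal e with sample time dt.

    ISE = integral of e^2, IAE = integral of |e|, ITAE = integral of t*|e|
    (rectangular approximation of the integrals).
    """
    t = np.arange(len(e)) * dt
    ise  = np.sum(e**2) * dt
    iae  = np.sum(np.abs(e)) * dt
    itae = np.sum(t * np.abs(e)) * dt
    return ise, iae, itae

# Toy decaying positioning error, for illustration only
dt = 0.001
t = np.arange(0.0, 1.0, dt)
e = 0.01 * np.exp(-5.0 * t)
ise, iae, itae = performance_indexes(e, dt)
```

ITAE weights late errors more heavily, which is why the text singles it out when discussing steady behaviour along the trajectory.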
        Extended vs Classical Perf. Index Variation %
      |      x-coordinate      |      y-coordinate
      |  ISE    IAE    ITAE    |  ISE    IAE    ITAE
CTC   |  36.33  45.86  21.81   |  19.32  18.68  37.95
PID   |  12.54  92.98  -0.15   |  71.59  0.68   60.06

Table 2: Extended vs Classical Scheme Variation %
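The percentages of Table 2 are the relative reduction of each Table 1 index when passive sensor data is added, and can be checked directly:

```python
def variation_pct(classical, extended):
    """Relative improvement (%) of the extended scheme over the classical one,
    as used in Table 2: positive means the extended scheme reduces the index."""
    return 100.0 * (classical - extended) / classical

# CTC x-coordinate ISE (Table 1) reproduces Table 2's 36.33 %
print(round(variation_pct(1.61019e-7, 1.02518e-7), 2))  # 36.33
# PID x-coordinate ITAE: slightly negative, i.e. "negative, but near zero"
print(round(variation_pct(5.7027, 5.7115), 2))  # -0.15
```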
The communication architecture can be seen in Figure 7.
To implement the control loop, a cascade control
scheme is proposed. Maxon's EPOS Position Controllers implement the inner velocity control loop,
while the target PC, running Labview-RT, generates the references for this loop and implements the
outer position loop using the proposed PID or CTC control.
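The outer loop of this cascade can be sketched as follows; the proportional gain, the saturation limit and the function interface are hypothetical, chosen only to illustrate how a position error becomes a velocity reference for the drives' inner loop:

```python
# Minimal sketch of the outer position loop of the cascade scheme: the host PC
# computes a velocity reference that is sent to the drives' inner velocity loop.
# The gain kp and the saturation limit v_max are illustrative values.

def outer_position_loop(q_ref, q_meas, kp=50.0, v_max=3.0):
    """Proportional outer loop: position error -> saturated velocity reference."""
    v_ref = [kp * (r - m) for r, m in zip(q_ref, q_meas)]
    # Saturate to the drives' admissible velocity range
    return [max(-v_max, min(v_max, v)) for v in v_ref]

v = outer_position_loop([1.0, -1.0], [0.75, -1.5], kp=4.0, v_max=3.0)
print(v)  # [1.0, 2.0]
```

In the real system the references would be written over CANopen each control period, and the CTC variant would add the model-based feedforward of equation (8).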
Figure 8: Control Distribution

6  Conclusions

Model-based control schemes are necessary in parallel robotics in order to achieve accuracy in high-speed operation. However, model parameter identification turns out to be difficult: in most cases, parameters are estimated and modelling errors are
introduced. In this paper, a novel CTC control
scheme considering the passive joint sensors of
parallel robots has been introduced. With the
extra data, improved robustness against model-parameter uncertainty can be achieved. Experiments show that in both the monoarticular and CTC
control schemes, the end-effector positioning errors
are reduced. Besides, a Real-Time framework for
these control schemes is introduced. Future work in
this area covers the implementation in Real Time
using the framework and prototype introduced, and the
accuracy analysis of the control schemes proposed
in this paper.

Acknowledgments

This work has been funded by the Education, University and Research Department of the Basque
Government (BF106.2R129 research fellowship).
The authors would like to thank the Department
of Mechanical Engineering for their support.

This article has been submitted to ISR2008.
Figure 7: Communication Architecture

References

[1] C. Brecher, T. Ostermann, and D.A. Friedrich. Control concept for PKM considering the mechanical coupling between actuators. Proceedings of the 5th Chemnitz Parallel Kinematics Seminar, pages 413–427, 2006.

[2] P. Chiacchio, F. Pierrot, L. Sciavicco, and B. Siciliano. Robust design of independent joint controllers with experimentation on a high-speed parallel robot. IEEE Transactions on Industrial Electronics, 40(4):393–403, 1993.

[3] A. Codourey. Dynamic modeling of parallel robots for computed-torque control implementation. The International Journal of Robotics Research, 17(12):1325–1336, 1998.

[4] B. Denkena and C. Holz. Advanced position and force control concepts for the linear direct driven hexapod PaLiDA. Proceedings of the 5th Chemnitz Parallel Kinematics Seminar, pages 259–377, 2006.

[5] R. Garrido, D. Torres-Cruz, and J.C. Martinez-García. Control visual de robots paralelos planares. XI Latinoamerican Control Congress, 2004.

[6] F. Ghorbel. Modeling and PD control of closed-chain mechanical systems. Proceedings of the 34th IEEE Conference on Decision and Control, pages 549–542, 1995.

[7] F. Ghorbel, O. Chételat, R. Gunawardana, and R. Longchamp. Modeling and set point control of closed-chain mechanisms: Theory and experiment. IEEE Transactions on Control Systems Technology, 8(5):801–815, 2000.

[8] R. Gunawardana and F. Ghorbel. PD control of closed-chain mechanical systems: An experimental study. Proceedings of the 5th IFAC Symposium on Robot Control (SYROCO'97), pages 79–84, 1997.

[9] M. Honegger, A. Codourey, and E. Burdet. Adaptive control of the Hexaglide, a 6 dof parallel manipulator. IEEE International Conference on Robotics and Automation, 1997.

[10] Q. Li. Error attenuation in the control of a parallel robot manipulator using a dual-model-based structure. Journal of Mechanical Engineering Science, 217(2):161–171, 2003.

[11] Q. Li and F. Wu. Control performance improvement of a parallel robot via the design for control approach. Mechatronics, 14:947–964, 2004.

[12] F. Marquet, O. Company, S. Krut, and F. Pierrot. Enhancing parallel robots accuracy with redundant sensors. Proceedings of the 2002 IEEE International Conference on Robotics and Automation, pages 4114–4119, 2002.

[13] J. P. Merlet. Still a long way to go on the road for parallel mechanisms. Proceedings of the ASME 2002 DETC Conference, 2002.

[14] J. P. Merlet. Parallel Robots (Second Edition). Springer, 2006.

[15] J. Murray and G. Lovell. Dynamic modeling of closed-chain robotic manipulators and implications for trajectory control. IEEE Transactions on Robotics and Automation, 5(4):522–528, 1989.

[16] Y. Nakamura and M. Ghodoussi. Dynamics computation of closed-link robot mechanisms with nonredundant and redundant actuators. IEEE Transactions on Robotics and Automation, 5(3):294–302, 1989.

[17] I.T. Pietsch, M. Krefft, O.T. Becker, C.C. Bier, and J. Hesselbach. How to reach the dynamic limits of parallel robots? An autonomous control approach. IEEE Transactions on Automation Science and Engineering, 2:369–380, 2005.

[18] Y.X. Su, D. Sun, L. Ren, X. Wang, and J.K. Mills. Nonlinear PD synchronized control for parallel manipulators. Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pages 1374–1379, 2005.

[19] A. Vivas and P. Poignet. Control predictivo de un robot paralelo. RIAI, 1:46–53, 2004.
Índice de autores
Aldea Rivas, M., 67
Almeida, L., 122
Alonso, A., 21
Ayza, J., 155
Balbastre, P., 45, 55
Basanta-Val, P., 122
Bernat, G., 29
Briones, J. F., 21
Luque, R., 101
Marchand, A., 55
Marcos, M., 78, 159
Martí, P., 155
Medina, J. L., 108
Miguel, M. A. de, 21
Moreno, C., 29
Orive, D., 78, 159
Cabanes, I., 159
Colin, A., 29
Crespo, A., 45, 55
Díaz, M., 101
Drake, J. M., 108
Estévez, E., 78
Estévez-Ayres, I., 122
Esteves, J., 29
Fuertes, J. M., 155
Garcı́a, C., 29
Garcı́a-Valls, M., 122
Garrido, D., 101
González Harbour, M., 67, 84
Guàrdia, J., 155
Gutiérrez, J. J., 84
Hansson, H., 11
Hernández-Orallo, E., 38
Hernek, M., 29
Holgado, J. A., 133
Holsti, N., 29
Pérez, F., 78, 155
Pérez, H., 84
Pacheco, P., 108
Piedrafita Moreno, R., 3
Pinto, C., 159
Proenza, J., 11
Puente, J. A. de la, 72
Pulido, J. A., 72
Ripoll, I., 45, 55
Rodríguez-Navas, G., 11
Sangorrı́n, D., 84
Silva, J. P., 21
Urueña, S., 72
Vardanega, T., 29
Velasco, M., 155
Viúdez, J., 133
Vila-Carbó, J., 38
Villà, J., 155
Villarroel Salcedo, J. L., 3
Yépez, J., 155
López Martínez, P., 108
Llopis, L., 101
Lozoya, C., 155
Zamorano, J., 72
Zubizarreta, A., 159
