Investigation and Improvement on the Impact of TLB misses in Real

Transcripción

Investigation and Improvement on the Impact of TLB misses in Real
Investigation and Improvement on the Impact of TLB misses in Real-‐‑‒Time Systems Takuya Ishikawa, Toshikazu Kato, Shinya Honda and Hiroaki Takada Nagoya University, Japan ERTL Nagoya University
Embedded and Real-Time systems Lab.
Introduction (1/2)
l  Memory protection for real-‐‑‒time systems is needed to ensure safety
l 
l 
detects illegal memory access
requires hardware support
l  Memory Management Unit (MMU)
l  performs address translation with a page table
ü  mapping a virtual address to a physical address
ü  contains a field for access permission
l  Translation Lookaside Buffer (TLB)
ü  a page table entry is cached
➔  to improve address translation speed ➔  a page table is in main memory
ü  TLB miss : TLB does not contain a required entry
➔  a required entry is obtained from a page table (page table walk) and loaded it to a TLB entry
ERTL Nagoya University
Embedded and Real-Time systems Lab.
2
Introduction (2/2)
l  In real-‐‑‒time systems
l  worst case execution time (WCET) is important for schedulability analysis
l  WCET is pessimistically estimated
ü  it is difficult to estimate the exact WCET
l  TLB have the potential to make WCET more pessimistic
l 
it is difficult to predict the occurrence of TLB misses
l  Impact of TLB misses on WCETs should be studied
l  predictability of WCETs with TLB should be improved
ERTL Nagoya University
Embedded and Real-Time systems Lab.
3
Overview of the Research
1. 
l 
2. 
l 
l 
Impact of TLB misses on WCETs is evaluated
execution time distribution of a real-‐‑‒time task in a micro-‐‑‒benchmark with TLB is measured
Methods to improve WCETs with TLB are proposed
take advantage of TLB locking
ü  inhibit the occurrence of TLB misses related to a specified TLB entry
2 types of methods: static one and dynamic one
ü  enable a WCET of a real-‐‑‒time task to be estimated
ERTL Nagoya University
Embedded and Real-Time systems Lab.
4
Evaluation Environment l  target processor: SH7750R(235MHz)
l  MMU with TLB which has 64 entries for instruction and data
ü  page size is 4096 bytes
ü  each entry has ASID (address space identifier)
➔  TLB needs not to be flushed with context switch
l 
TLB miss exception is handled by software (RTOS)
ü  TOPPERS/HRP2 kernel is used as RTOS
➔  ASID identifies a currently running application
➔  each application has its own address space
➔  HRP2 changes ASID with context switch
l  to make it easier to evaluate an impact of TLB misses
l 
l 
cache is disabled (turned off)
interrupts are occurred by only a timer, and the result of evaluation does not include this interrupt execution time
ERTL Nagoya University
Embedded and Real-Time systems Lab.
5
Micro-‐‑‒Benchmark Software (1/2)
l  includes automotive application software and multimedia application software
l 
l 
l 
consideration of automotive software integration
ü  ex.) both automotive control systems and automotive navigation systems are in the same electronic control unit automotive software is real-‐‑‒time application
multimedia software is non-‐‑‒real-‐‑‒time application
l  benchmark software is included in EEMBC benchmark*
l 
l 
automotive software is AutoBench
multimedia software is DENBench
* EEMBC – The Embedded Microprocessor Benchmark Consortium, http://www.eembc.org/
ERTL Nagoya University
Embedded and Real-Time systems Lab.
6
Micro-‐‑‒Benchmark Software (2/2)
l  There are 5 real-‐‑‒time tasks (RT_̲TASK1,2,...5) and 1 non-‐‑‒real-‐‑‒time task (NR_̲TASK)
l 
each task has different ASIDs (is allocated different application)
l  Execution flow
l 
l 
l 
NR_̲TASK is activated firstly
RT_̲TASK1 is activated periodically
RT_̲TASKn executes the own benchmark, and activates next RT_̲TASKn+1 ü  RT_̲TASKs are activated 10,000 times
50ms realtime RT_TASK1
task ID
benchmark
name
memory size
(bytes)
RT_TASK2 RT_TASK1
ttsprk01
58,164
RT_TASK2
a2time01
6,564
RT_TASK3
rspeed01
4,088
RT_TASK4
puwmod01
13,392
RT_TASK5
canrdr01
9,360
NR_TASK
djpegv2data4
535,332
RT_TASK3 RT_TASK4 nonrealtime RT_TASK5 NR_TASK ERTL Nagoya University
Embedded and Real-Time systems Lab.
time
7
Impact of TLB misses on WCETs
EH
[%]
l  evaluation metrics:
ET − EH
l 
l 
l 
l 
EH : execution time of TLB miss exception handlers
ET : execution time of an evaluated task
a ratio of the execution time of TLB miss exception handlers to that of an evaluated task with no occurrence of TLB misses
evaluate the average and maximum value
l  evaluated task
l  RT_̲TASK1(ttsprk01)
ü  constant input value, which maximizes the execution time
l 
RT_̲TASK1 with random input value and RT_̲TASK1 to RT_̲TASK5 with random input value (omitted in this paper...)
l  average execution time of each TLB miss exception handler
l 
16.9 microseconds
ERTL Nagoya University
Embedded and Real-Time systems Lab.
8
Result of Evaluation
l  execution time distribution and metrics value
(times) (回)
10000
no TLB miss 1000
maximum metrics value :
8.13 % measured WCET :
15 TLB misses each TLB miss exception:
about 17 microsecond Frequency 発
生 100
頻
度
10
average metrics value :
2.91 % 1
3130 3150 3170 3190 3210 3230 3250 3270 3290 3310 3330 3350 3370 3390 (μ秒)
(microseconds) 実行時間time Execution
l 
the impact of TLB misses on the WCET can be considerable
ERTL Nagoya University
Embedded and Real-Time systems Lab.
9
Improvement on WCET with TLB Locking
l  TLB locking
l  makes a specified TLB entry not to be replaced
ü  inhibits the occurrence of TLB misses
l 
l 
assigns a normal entry of TLB as a locked entry
ü  after locked entry is released, that can be used as normal one
implemented in an RTOS (on SH7750R processors)
ü  some of ARM processors implement in hardware
l  proposed TLB locking methods
l 
l 
reduce TLB misses while a real-‐‑‒time task is running improve the WCET of a real-‐‑‒time task
TLB ASID 1 normal entry can be replaced … 2 1 locked entry … can not be replaced 1 ERTL Nagoya University
Embedded and Real-Time systems Lab.
virtual page 0x80100
… 0xc0200
0xc0f00
…
0x80108
physical page 0x80100
…
0xc0200
0xc0f00
…
0x80108
attribute RX, Non-Shared … RW, Non-Shared RW, Shared … RX, Non-Shared 10
Static Locking Method
l  locks the target TLB entries during system initialization
these entries remain locked while system is running
TLB locking increase only the execution time of system initialization response time of an entire system can be increased
ü  TLB entries for NR_̲TASK are reduced, and TLB misses can be increased while NR_̲TASK is running
l 
RT_TASK NR_TASK Normal Entry Locked Entry for
RT_TASK TLB used only RT_TASK ERTL Nagoya University
Embedded and Real-Time systems Lab.
time 11
Dynamic Locking Method
l  locks the target TLB entries when a job of a RT_̲TASK is activated, and releases these entries when a job ends
released entries can be used as normal entries
TLB misses while NR_̲TASK is running can be less than static locking method
execution time of RT_̲TASK can be increased, because of TLB locking and releasing
ü  prepare a table for locked TLB entries instead of a page table walk
l 
lock TLB entries release locked entries RT_TASK TLB lock, or
release
NR_TASK NR_TASK can use
all TLB entries Normal Entry Locked Entry
for RT_TASK TLB ERTL Nagoya University
Embedded and Real-Time systems Lab.
time 12
TLB entry invalidation
l  an entry which indicates the same virtual page as a locked entry shall be invalidated before locking
l 
different TLB entries indicates the same virtual page: exception
ASID 1 Normal Entry … 2 1 1 Locked Entry TLB virtual page 0x80100
…
0xc0200 0xc0100
0x80f00
… … … … … … lock
VP = 0x80100
(ASID = 1) ASID 1 … 2 1 1 TLB virtual page 0x80100
…
0xc0200 0x80100
0x80f00
… … … … … … different entries
indicates the
same virtual page exception occurs if not invalidate ... l  static locking method: invalidate all TLB entries
l 
l 
invalidation has little impact on subsequent TLB misses
invalidate all entries by one memory access (to memory mapped MMU register)
l  Dynamic locking method: invalidate not all TLB entries simply
l 
invalidation has much impact on entire TLB misses
ERTL Nagoya University
Embedded and Real-Time systems Lab.
13
Dynamic Invalidation Method I
l  invalidates only TLB entries related to locked entries
l 
l 
does not invalidate unrelated entries invalidation processes are increased by the number of locked entries
search and
invalidate entry
for
VP = 0x80100
(ASID = 1) Normal Entry Locked Entry ASID 1 … 2 1 1 TLB virtual page 0x80100
…
0xc0200 0xc0100
0x80f00
… … … … … … lock
VP = 0x80100
(ASID = 1) ASID TLB virtual page … … 2 1 1 …
0xc0200 0xc0100
0x80f00
… … … … invalidation ASID TLB virtual page … … 2 1 1 …
0xc0200 0x80100
0x80f00
… … … … TLB locking iterate this process for each locked entry ERTL Nagoya University
Embedded and Real-Time systems Lab.
14
Dynamic Invalidation Method II
l  invalidate all TLB entries
l  saves all TLB entries to memory before invalidation
l  restores all TLB entries from memory when locked entries are released
ü  TLB entries for NR_̲TASK are not expired
ü  saving and restoring all entries can be increase the response time
TLB locking TLB ASID virtual page … TLB ASID virtual page 2 0x80200
…
Normal … 2 0xc0200 Entry 1 0xc0100
2 0x80202
save entries … … … … … … invalidate all entries copy of entries ASI
D 仮想ページ番号 2 0x80200
…
… …
…
2 …
1 0xc0100
…
2 0x80202
…
ERTL Nagoya University
Normal
Entry 1 1 0x80100
0x80f00
… Locked
… Entry TLB locking TLB releasing …
0xc0200 in main memory TLB ASID virtual pgae … restore
entries Embedded and Real-Time systems Lab.
ASID virtual page 2 0x80200
… …
2 0xc0200 1 0xc0100
2 0x80202
… … … … … … 15
Evaluation for TLB Locking
l  apply proposed methods to the micro-‐‑‒benchmark
l  evaluation items
1. 
2. 
3. 
execution time distribution of RT_̲TASK and metrics value
execution time of each dynamic locking method
execution efficiency of the entire micro-‐‑‒benchmark
ü  measure response time, execution time and the number of TLB misses of NR_̲TASK
l  apply TLB locking to
ü  RT_̲TASK1 with the constant input
➔  16 pages are locked
ü  RT_̲TASK1 to RT_̲TASK5 (all RT_̲TASKs) with the random inputs
➔  49pages are locked
ERTL Nagoya University
Embedded and Real-Time systems Lab.
16
Result of Execution Time of RT_̲TASK1
l  execution time distribution and metrics value
(回)
(times) 10000
improve the impact of
TLB misses: 8.13 % TLBロッ
ク未適用
w/o TLB
locking TLBロッ
w/ TLBク適用
locking 1000
Frequency 発
生
頻
度
100
対
数
目
盛
10
1
3130
average and maximum
metrics value: 0 %
3150
3170
3190
3210
3230
3250
3270
3290
3310
3330
3350
3370
3390
(μ秒)
(microseconds) 実行時間time Execution
l  the metrics value are 0 % in all cases
l 
all TLB locking methods can get rid of TLB misses for RT_̲TASK
ERTL Nagoya University
Embedded and Real-Time systems Lab.
17
Result of Execution Time of Locking
(μ秒)
(microseconds) 1200
gradients: increase 17
microseconds / page TLB miss
exception TLBミ
ス例外処理
1000
dynamic method
I 動的ロッ
ク手法①
Execution time dynamic method
II 動的ロッ
ク手法②
800
save and restore all entries:
268 microseconds 処
理 600
時
間
gradients: increase 4.2
microseconds / page 400
200
gradients: increase 9.6
microseconds / page 0
1
l 
l 
6
11
16
21
26
31
36
41
46
ページ数
Number
of pages 51
56
61
66
dynamic method I is effective if locked pages are less dynamic method II is effective if locked pages are more
ü  threshold is 52 ERTL Nagoya University
Embedded and Real-Time systems Lab.
18
1.2
1
relative time ratio 時
間
apply to all RT_TASKs (lock 49 pages) 相
対 相
0.8
比 対
0.6 ミ 比
ス 時
0.4 回 間
数
0.2
0
動的②
static w/ 動的①
dynamic w/
dynamic
w/oなし
lock w/ 静的
method method
I method
II TLBロック手法
TLB locking method − static locking method
1.07
1.06
1.05
1.04
1.03
1.02
1.01
1
0.99
0.98
0.97
0.96
16
14
relative TLB miss ratio 相
対
比
1.004
1.003
1.002
1.001
1
0.999
0.998
0.997
0.996
0.995
apply to RT_TASK1 (lock 16 pages) relative TLB miss ratio relative time ratio Result for NR_̲TASK
相
対
10 比
12
8
ミ
6 ス
回
4 数
2
0
Number of
TLBミ
ス回数
TLB
misses response
time 応答時間
execution
実行時間time dynamic
なし w/ static
静的 w/ dynamic
動的① w/動的②
w/o lock method method
I method
II TLBロック手法
TLB locking method * vertical axis indicates ratio to w/o TLB locking •  difference is about 0.1 % if the number of locked page is less
ü decrease response time because decrease execution time of RT_̲TASK1
•  increase response time, execution time and TLB misses if the number of locked page is more
− dynamic locking method
•  method II has less impact on NR_̲TASK than method I saving and restoring TLB entries are effective for execution of NR_̲TASK
ERTL Nagoya University
Embedded and Real-Time systems Lab.
19
Conclusion
l  evaluate the impact of TLB misses on WCETs
l  TLB misses can have a considerable impact on WCETs of real-‐‑‒time tasks
l  propose the TLB locking methods
l  improve WCETs of real-‐‑‒time tasks
ü  lock entries related to real-‐‑‒time tasks
l  static method and 2 dynamic methods are proposed
ü  evaluation shows features of methods
l  Future Works
l  the impact of TLB misses and proposed methods should be evaluated with other environments
ü  different task sets, hardware TLB management, enabled cache, ...
l  methods to select locked entries should be considered
ü  for lack of lockable TLB entries
ERTL Nagoya University
Embedded and Real-Time systems Lab.
20
21
ERTL Nagoya University
Embedded and Real-Time systems Lab.
22
Page Table Walk
l  implementation in hardware
l  a TLB miss is handled by hardware
ü  overhead is less
l  format of page table is defined by hardware
ü  hierarchical page table
l  TLB controls by software are limited
ü  ex) TLB flush is enabled
l  adopted by ARM, SPARC V8 architecture
l  implementation in software
l  a TLB miss is handled by software
ü  overhead is more
l  format of page table is freedom, and TLB can be controlled by software
ü  flexible
l  adopted by SH, MIPS architechture
ERTL Nagoya University
Embedded and Real-Time systems Lab.
23
Page Table in HRP2 on SH7750R
address space .data_app1
.bss_app1 page table application 2 application 1 RW, Non-Shared .data_app2
.bss_app2 RW, Non-Shared .text_app1
.rodata_app1 RX, Non-Shared .text_shared
.rodata_shared RX, Shared ERTL Nagoya University
Embedded and Real-Time systems Lab.
RX, Shared 24
MMU in SH7750R
l  paging virtual address space (32 bit)
l  page size is 1KB, 4KB, 64KB or 1MB
l  TLB
l 
l 
instruction TLB (ITLB) : 4 entries, fully associative
ü  upper level of UTLB, controlled by hardware
shared TLB (UTLB): 64 entries, fully associative
ü  software page table walk
l  UTLB controlled by l 
MMUCR LDTLB instruction
ü  replaced entry is specified by MMUCR.URC
➔  each memory access increments
l 
access to memory mapped TLB
ü  UTLB is mapped to memory
ERTL Nagoya University
Embedded and Real-Time systems Lab.
URC PTEH PTEL LDTLB UTLB 0 : n : 63 25
TLB locking in SH7750R
l  TLB locking process
l  invalidate the same virtual page entry in TLB
ü  stop CPU exception
l 
modify MMUCR
ü  MMUCR.URB configures area where URC is ranged
l 
load page table entry to UTLB
ü  set PTEH, PTEL and MMUCR.URC and issue LDTLB instruction
l  data and code for TLB locking are in TLB-‐‑‒
invalidated area
l 
special virtual address space
l  interrupt and exception is disabled while TLB locking
ERTL Nagoya University
Embedded and Real-Time systems Lab.
UTLB 0 : (n-1) n : 63 Normal
Entry Locked
Entry URC is
ranged in
this area URB 26
Address Space of SH7750R
ERTL Nagoya University
Embedded and Real-Time systems Lab.
27
Static Locking Process
entry disable int TLB flush invalidate all
TLB entries save ASID critical section complete TLB
locking ? YES restore ASID and set
MMUCR.URC to 0
NO set a page table entry to
PTEH and PTEL set MMUCR.URB
and MMUCR.URC
obtain the entry for
the target page modify
locked area LDTLB instruction
enable int exit ERTL Nagoya University
Embedded and Real-Time systems Lab.
28
Dynamic Locking Process I
l  lock
l  release
entry disable int save ASID each entry for locked
page is invalidated entry disable int complete TLB
lock ? YES NO set MMUCR.URB to 0 invalidate TLB entry set a page table entry to
PTEH and PTEL restore ASID and set
MMUCR.URC to 0
release all
locked entries enable int exit set MMUCR.URB
and MMUCR.URC
enable int LDTLB instruction
exit ERTL Nagoya University
Embedded and Real-Time systems Lab.
29
Dynamic Locking Process II
l  lock
l  release
entry disable int saving
TLB save all TLB entries TLB flush save ASID entry disable int invalidate all
TLB entries set MMUCR.URB to 0 restore all TLB entries complete TLB
lock ? YES restore ASID and set
MMUCR.URC to 0
restoring
TLB NO enable int set a page table entry to
PTEH and PTEL exit set MMUCR.URB
and MMUCR.URC
enable int LDTLB instruction
exit ERTL Nagoya University
Embedded and Real-Time systems Lab.
30
TLB miss overhead
l  measure execution time from entry of TLB miss exception handler to end
(
回)
(times) 2500
Frequency 2000
発 1500
生
頻
度 1000
500
16.5
17.0
17.5
Execution time ERTL Nagoya University
(
μ秒)
(microseconds) 0
Embedded and Real-Time systems Lab.
18.0
実行時間
18.5
19.0
31
Evaluation for Impact of TLB miss
l  RT_̲TASK1 with random inputs
(%)
(%) 90
execution time is several
hundreds microseconds 80
平均影響度
average
value average: 11.01 %
maximum: 76.28 %
70
metrics value 60
影
響
度
最大影響度
maximum value execution time is several
thousand microseconds 50
40
30
20
10
0
0
6
12
18
24
30
36
入力値セット番号
42
48
54
60
ID of execution l  RT_̲TASK1 to RT_̲TASK5 (entire RT_̲TASKs)
l 
average: 10.21%, maximum: 36.91%
ERTL Nagoya University
Embedded and Real-Time systems Lab.
32
Result of Evaluation for TLB Locking
method
WCET
overhead
execution efficiency
less pages
more pages
less pages
more pages
static locking
○
○
○
△
×
dynamic locking I
○
○
△
△
△
dynamic locking II
○
×
△
○
○
l  Conclusion in this evaluation
l 
l 
l 
TLB locking is efficient for improvement on WCETs
static locking method is effective if the number of locked entries is less
dynamic locking method II is effective if the number of locked entries is more
ERTL Nagoya University
Embedded and Real-Time systems Lab.
33
Related Works
l  Cache miss and cache lock [1][2][3][4][5]
l  Page fault [6]
l  TLB miss [7]
[1] Isabelle Puaut and David Decotigny. Low-Complexity Algorithms for Static Cache Locking in
Multitasking Hard Real-Time Systems.
[2] Antonio Marti Campoy, Isabelle Puaut, Angel Perles Ivars, and Jose Vicente Busquets Mataix.
Cache Contents Selection for Statically-Locked Instruction Caches: An Algorithm Comparison.
[3] Heiko Falk, Sascha Plazar, and Henrik Theiling. Compile-time decided instruction cache
locking using worst-case execution paths.
[4] Tiantian Liu, Minming Li, and Chun Jason Xue. Minimizing WCET for Real-Time Embedded
Systems via Static Instruction Cache Locking.
[5] Antonio Marti Campoy, Angel Perles Ivars, Francisco Rodriguez, and Jose Vicente Busquets
Mataix. Static use of locking caches vs. dynamic use of locking caches for real-time systems.
[6] Isabelle Puaut and Damien Hardy. Predictable Paging in Real-Time Systems: A Compiler
Approach.
[7] CHEN Chang-Jiu and CHENG Wei-Min. Reducing the TLB Context Switching Miss Ratio
With Banked and Prefetching Mechanism.
ERTL Nagoya University
Embedded and Real-Time systems Lab.
34

Documentos relacionados