Investigation and Improvement on the Impact of TLB misses in Real
Transcripción
Investigation and Improvement on the Impact of TLB misses in Real
Investigation and Improvement on the Impact of TLB misses in Real-‐‑‒Time Systems Takuya Ishikawa, Toshikazu Kato, Shinya Honda and Hiroaki Takada Nagoya University, Japan ERTL Nagoya University Embedded and Real-Time systems Lab. Introduction (1/2) l Memory protection for real-‐‑‒time systems is needed to ensure safety l l detects illegal memory access requires hardware support l Memory Management Unit (MMU) l performs address translation with a page table ü mapping a virtual address to a physical address ü contains a field for access permission l Translation Lookaside Buffer (TLB) ü a page table entry is cached ➔ to improve address translation speed ➔ a page table is in main memory ü TLB miss : TLB does not contain a required entry ➔ a required entry is obtained from a page table (page table walk) and loaded it to a TLB entry ERTL Nagoya University Embedded and Real-Time systems Lab. 2 Introduction (2/2) l In real-‐‑‒time systems l worst case execution time (WCET) is important for schedulability analysis l WCET is pessimistically estimated ü it is difficult to estimate the exact WCET l TLB have the potential to make WCET more pessimistic l it is difficult to predict the occurrence of TLB misses l Impact of TLB misses on WCETs should be studied l predictability of WCETs with TLB should be improved ERTL Nagoya University Embedded and Real-Time systems Lab. 3 Overview of the Research 1. l 2. l l Impact of TLB misses on WCETs is evaluated execution time distribution of a real-‐‑‒time task in a micro-‐‑‒benchmark with TLB is measured Methods to improve WCETs with TLB are proposed take advantage of TLB locking ü inhibit the occurrence of TLB misses related to a specified TLB entry 2 types of methods: static one and dynamic one ü enable a WCET of a real-‐‑‒time task to be estimated ERTL Nagoya University Embedded and Real-Time systems Lab. 4 Evaluation Environment l target processor: SH7750R(235MHz) l MMU with TLB which has 64 entries for instruction and data ü page size is 4096 bytes ü each entry has ASID (address space identifier) ➔ TLB needs not to be flushed with context switch l TLB miss exception is handled by software (RTOS) ü TOPPERS/HRP2 kernel is used as RTOS ➔ ASID identifies a currently running application ➔ each application has its own address space ➔ HRP2 changes ASID with context switch l to make it easier to evaluate an impact of TLB misses l l cache is disabled (turned off) interrupts are occurred by only a timer, and the result of evaluation does not include this interrupt execution time ERTL Nagoya University Embedded and Real-Time systems Lab. 5 Micro-‐‑‒Benchmark Software (1/2) l includes automotive application software and multimedia application software l l l consideration of automotive software integration ü ex.) both automotive control systems and automotive navigation systems are in the same electronic control unit automotive software is real-‐‑‒time application multimedia software is non-‐‑‒real-‐‑‒time application l benchmark software is included in EEMBC benchmark* l l automotive software is AutoBench multimedia software is DENBench * EEMBC – The Embedded Microprocessor Benchmark Consortium, http://www.eembc.org/ ERTL Nagoya University Embedded and Real-Time systems Lab. 6 Micro-‐‑‒Benchmark Software (2/2) l There are 5 real-‐‑‒time tasks (RT_̲TASK1,2,...5) and 1 non-‐‑‒real-‐‑‒time task (NR_̲TASK) l each task has different ASIDs (is allocated different application) l Execution flow l l l NR_̲TASK is activated firstly RT_̲TASK1 is activated periodically RT_̲TASKn executes the own benchmark, and activates next RT_̲TASKn+1 ü RT_̲TASKs are activated 10,000 times 50ms realtime RT_TASK1 task ID benchmark name memory size (bytes) RT_TASK2 RT_TASK1 ttsprk01 58,164 RT_TASK2 a2time01 6,564 RT_TASK3 rspeed01 4,088 RT_TASK4 puwmod01 13,392 RT_TASK5 canrdr01 9,360 NR_TASK djpegv2data4 535,332 RT_TASK3 RT_TASK4 nonrealtime RT_TASK5 NR_TASK ERTL Nagoya University Embedded and Real-Time systems Lab. time 7 Impact of TLB misses on WCETs EH [%] l evaluation metrics: ET − EH l l l l EH : execution time of TLB miss exception handlers ET : execution time of an evaluated task a ratio of the execution time of TLB miss exception handlers to that of an evaluated task with no occurrence of TLB misses evaluate the average and maximum value l evaluated task l RT_̲TASK1(ttsprk01) ü constant input value, which maximizes the execution time l RT_̲TASK1 with random input value and RT_̲TASK1 to RT_̲TASK5 with random input value (omitted in this paper...) l average execution time of each TLB miss exception handler l 16.9 microseconds ERTL Nagoya University Embedded and Real-Time systems Lab. 8 Result of Evaluation l execution time distribution and metrics value (times) (回) 10000 no TLB miss 1000 maximum metrics value : 8.13 % measured WCET : 15 TLB misses each TLB miss exception: about 17 microsecond Frequency 発 生 100 頻 度 10 average metrics value : 2.91 % 1 3130 3150 3170 3190 3210 3230 3250 3270 3290 3310 3330 3350 3370 3390 (μ秒) (microseconds) 実行時間time Execution l the impact of TLB misses on the WCET can be considerable ERTL Nagoya University Embedded and Real-Time systems Lab. 9 Improvement on WCET with TLB Locking l TLB locking l makes a specified TLB entry not to be replaced ü inhibits the occurrence of TLB misses l l assigns a normal entry of TLB as a locked entry ü after locked entry is released, that can be used as normal one implemented in an RTOS (on SH7750R processors) ü some of ARM processors implement in hardware l proposed TLB locking methods l l reduce TLB misses while a real-‐‑‒time task is running improve the WCET of a real-‐‑‒time task TLB ASID 1 normal entry can be replaced … 2 1 locked entry … can not be replaced 1 ERTL Nagoya University Embedded and Real-Time systems Lab. virtual page 0x80100 … 0xc0200 0xc0f00 … 0x80108 physical page 0x80100 … 0xc0200 0xc0f00 … 0x80108 attribute RX, Non-Shared … RW, Non-Shared RW, Shared … RX, Non-Shared 10 Static Locking Method l locks the target TLB entries during system initialization these entries remain locked while system is running TLB locking increase only the execution time of system initialization response time of an entire system can be increased ü TLB entries for NR_̲TASK are reduced, and TLB misses can be increased while NR_̲TASK is running l RT_TASK NR_TASK Normal Entry Locked Entry for RT_TASK TLB used only RT_TASK ERTL Nagoya University Embedded and Real-Time systems Lab. time 11 Dynamic Locking Method l locks the target TLB entries when a job of a RT_̲TASK is activated, and releases these entries when a job ends released entries can be used as normal entries TLB misses while NR_̲TASK is running can be less than static locking method execution time of RT_̲TASK can be increased, because of TLB locking and releasing ü prepare a table for locked TLB entries instead of a page table walk l lock TLB entries release locked entries RT_TASK TLB lock, or release NR_TASK NR_TASK can use all TLB entries Normal Entry Locked Entry for RT_TASK TLB ERTL Nagoya University Embedded and Real-Time systems Lab. time 12 TLB entry invalidation l an entry which indicates the same virtual page as a locked entry shall be invalidated before locking l different TLB entries indicates the same virtual page: exception ASID 1 Normal Entry … 2 1 1 Locked Entry TLB virtual page 0x80100 … 0xc0200 0xc0100 0x80f00 … … … … … … lock VP = 0x80100 (ASID = 1) ASID 1 … 2 1 1 TLB virtual page 0x80100 … 0xc0200 0x80100 0x80f00 … … … … … … different entries indicates the same virtual page exception occurs if not invalidate ... l static locking method: invalidate all TLB entries l l invalidation has little impact on subsequent TLB misses invalidate all entries by one memory access (to memory mapped MMU register) l Dynamic locking method: invalidate not all TLB entries simply l invalidation has much impact on entire TLB misses ERTL Nagoya University Embedded and Real-Time systems Lab. 13 Dynamic Invalidation Method I l invalidates only TLB entries related to locked entries l l does not invalidate unrelated entries invalidation processes are increased by the number of locked entries search and invalidate entry for VP = 0x80100 (ASID = 1) Normal Entry Locked Entry ASID 1 … 2 1 1 TLB virtual page 0x80100 … 0xc0200 0xc0100 0x80f00 … … … … … … lock VP = 0x80100 (ASID = 1) ASID TLB virtual page … … 2 1 1 … 0xc0200 0xc0100 0x80f00 … … … … invalidation ASID TLB virtual page … … 2 1 1 … 0xc0200 0x80100 0x80f00 … … … … TLB locking iterate this process for each locked entry ERTL Nagoya University Embedded and Real-Time systems Lab. 14 Dynamic Invalidation Method II l invalidate all TLB entries l saves all TLB entries to memory before invalidation l restores all TLB entries from memory when locked entries are released ü TLB entries for NR_̲TASK are not expired ü saving and restoring all entries can be increase the response time TLB locking TLB ASID virtual page … TLB ASID virtual page 2 0x80200 … Normal … 2 0xc0200 Entry 1 0xc0100 2 0x80202 save entries … … … … … … invalidate all entries copy of entries ASI D 仮想ページ番号 2 0x80200 … … … … 2 … 1 0xc0100 … 2 0x80202 … ERTL Nagoya University Normal Entry 1 1 0x80100 0x80f00 … Locked … Entry TLB locking TLB releasing … 0xc0200 in main memory TLB ASID virtual pgae … restore entries Embedded and Real-Time systems Lab. ASID virtual page 2 0x80200 … … 2 0xc0200 1 0xc0100 2 0x80202 … … … … … … 15 Evaluation for TLB Locking l apply proposed methods to the micro-‐‑‒benchmark l evaluation items 1. 2. 3. execution time distribution of RT_̲TASK and metrics value execution time of each dynamic locking method execution efficiency of the entire micro-‐‑‒benchmark ü measure response time, execution time and the number of TLB misses of NR_̲TASK l apply TLB locking to ü RT_̲TASK1 with the constant input ➔ 16 pages are locked ü RT_̲TASK1 to RT_̲TASK5 (all RT_̲TASKs) with the random inputs ➔ 49pages are locked ERTL Nagoya University Embedded and Real-Time systems Lab. 16 Result of Execution Time of RT_̲TASK1 l execution time distribution and metrics value (回) (times) 10000 improve the impact of TLB misses: 8.13 % TLBロッ ク未適用 w/o TLB locking TLBロッ w/ TLBク適用 locking 1000 Frequency 発 生 頻 度 100 対 数 目 盛 10 1 3130 average and maximum metrics value: 0 % 3150 3170 3190 3210 3230 3250 3270 3290 3310 3330 3350 3370 3390 (μ秒) (microseconds) 実行時間time Execution l the metrics value are 0 % in all cases l all TLB locking methods can get rid of TLB misses for RT_̲TASK ERTL Nagoya University Embedded and Real-Time systems Lab. 17 Result of Execution Time of Locking (μ秒) (microseconds) 1200 gradients: increase 17 microseconds / page TLB miss exception TLBミ ス例外処理 1000 dynamic method I 動的ロッ ク手法① Execution time dynamic method II 動的ロッ ク手法② 800 save and restore all entries: 268 microseconds 処 理 600 時 間 gradients: increase 4.2 microseconds / page 400 200 gradients: increase 9.6 microseconds / page 0 1 l l 6 11 16 21 26 31 36 41 46 ページ数 Number of pages 51 56 61 66 dynamic method I is effective if locked pages are less dynamic method II is effective if locked pages are more ü threshold is 52 ERTL Nagoya University Embedded and Real-Time systems Lab. 18 1.2 1 relative time ratio 時 間 apply to all RT_TASKs (lock 49 pages) 相 対 相 0.8 比 対 0.6 ミ 比 ス 時 0.4 回 間 数 0.2 0 動的② static w/ 動的① dynamic w/ dynamic w/oなし lock w/ 静的 method method I method II TLBロック手法 TLB locking method − static locking method 1.07 1.06 1.05 1.04 1.03 1.02 1.01 1 0.99 0.98 0.97 0.96 16 14 relative TLB miss ratio 相 対 比 1.004 1.003 1.002 1.001 1 0.999 0.998 0.997 0.996 0.995 apply to RT_TASK1 (lock 16 pages) relative TLB miss ratio relative time ratio Result for NR_̲TASK 相 対 10 比 12 8 ミ 6 ス 回 4 数 2 0 Number of TLBミ ス回数 TLB misses response time 応答時間 execution 実行時間time dynamic なし w/ static 静的 w/ dynamic 動的① w/動的② w/o lock method method I method II TLBロック手法 TLB locking method * vertical axis indicates ratio to w/o TLB locking • difference is about 0.1 % if the number of locked page is less ü decrease response time because decrease execution time of RT_̲TASK1 • increase response time, execution time and TLB misses if the number of locked page is more − dynamic locking method • method II has less impact on NR_̲TASK than method I saving and restoring TLB entries are effective for execution of NR_̲TASK ERTL Nagoya University Embedded and Real-Time systems Lab. 19 Conclusion l evaluate the impact of TLB misses on WCETs l TLB misses can have a considerable impact on WCETs of real-‐‑‒time tasks l propose the TLB locking methods l improve WCETs of real-‐‑‒time tasks ü lock entries related to real-‐‑‒time tasks l static method and 2 dynamic methods are proposed ü evaluation shows features of methods l Future Works l the impact of TLB misses and proposed methods should be evaluated with other environments ü different task sets, hardware TLB management, enabled cache, ... l methods to select locked entries should be considered ü for lack of lockable TLB entries ERTL Nagoya University Embedded and Real-Time systems Lab. 20 21 ERTL Nagoya University Embedded and Real-Time systems Lab. 22 Page Table Walk l implementation in hardware l a TLB miss is handled by hardware ü overhead is less l format of page table is defined by hardware ü hierarchical page table l TLB controls by software are limited ü ex) TLB flush is enabled l adopted by ARM, SPARC V8 architecture l implementation in software l a TLB miss is handled by software ü overhead is more l format of page table is freedom, and TLB can be controlled by software ü flexible l adopted by SH, MIPS architechture ERTL Nagoya University Embedded and Real-Time systems Lab. 23 Page Table in HRP2 on SH7750R address space .data_app1 .bss_app1 page table application 2 application 1 RW, Non-Shared .data_app2 .bss_app2 RW, Non-Shared .text_app1 .rodata_app1 RX, Non-Shared .text_shared .rodata_shared RX, Shared ERTL Nagoya University Embedded and Real-Time systems Lab. RX, Shared 24 MMU in SH7750R l paging virtual address space (32 bit) l page size is 1KB, 4KB, 64KB or 1MB l TLB l l instruction TLB (ITLB) : 4 entries, fully associative ü upper level of UTLB, controlled by hardware shared TLB (UTLB): 64 entries, fully associative ü software page table walk l UTLB controlled by l MMUCR LDTLB instruction ü replaced entry is specified by MMUCR.URC ➔ each memory access increments l access to memory mapped TLB ü UTLB is mapped to memory ERTL Nagoya University Embedded and Real-Time systems Lab. URC PTEH PTEL LDTLB UTLB 0 : n : 63 25 TLB locking in SH7750R l TLB locking process l invalidate the same virtual page entry in TLB ü stop CPU exception l modify MMUCR ü MMUCR.URB configures area where URC is ranged l load page table entry to UTLB ü set PTEH, PTEL and MMUCR.URC and issue LDTLB instruction l data and code for TLB locking are in TLB-‐‑‒ invalidated area l special virtual address space l interrupt and exception is disabled while TLB locking ERTL Nagoya University Embedded and Real-Time systems Lab. UTLB 0 : (n-1) n : 63 Normal Entry Locked Entry URC is ranged in this area URB 26 Address Space of SH7750R ERTL Nagoya University Embedded and Real-Time systems Lab. 27 Static Locking Process entry disable int TLB flush invalidate all TLB entries save ASID critical section complete TLB locking ? YES restore ASID and set MMUCR.URC to 0 NO set a page table entry to PTEH and PTEL set MMUCR.URB and MMUCR.URC obtain the entry for the target page modify locked area LDTLB instruction enable int exit ERTL Nagoya University Embedded and Real-Time systems Lab. 28 Dynamic Locking Process I l lock l release entry disable int save ASID each entry for locked page is invalidated entry disable int complete TLB lock ? YES NO set MMUCR.URB to 0 invalidate TLB entry set a page table entry to PTEH and PTEL restore ASID and set MMUCR.URC to 0 release all locked entries enable int exit set MMUCR.URB and MMUCR.URC enable int LDTLB instruction exit ERTL Nagoya University Embedded and Real-Time systems Lab. 29 Dynamic Locking Process II l lock l release entry disable int saving TLB save all TLB entries TLB flush save ASID entry disable int invalidate all TLB entries set MMUCR.URB to 0 restore all TLB entries complete TLB lock ? YES restore ASID and set MMUCR.URC to 0 restoring TLB NO enable int set a page table entry to PTEH and PTEL exit set MMUCR.URB and MMUCR.URC enable int LDTLB instruction exit ERTL Nagoya University Embedded and Real-Time systems Lab. 30 TLB miss overhead l measure execution time from entry of TLB miss exception handler to end ( 回) (times) 2500 Frequency 2000 発 1500 生 頻 度 1000 500 16.5 17.0 17.5 Execution time ERTL Nagoya University ( μ秒) (microseconds) 0 Embedded and Real-Time systems Lab. 18.0 実行時間 18.5 19.0 31 Evaluation for Impact of TLB miss l RT_̲TASK1 with random inputs (%) (%) 90 execution time is several hundreds microseconds 80 平均影響度 average value average: 11.01 % maximum: 76.28 % 70 metrics value 60 影 響 度 最大影響度 maximum value execution time is several thousand microseconds 50 40 30 20 10 0 0 6 12 18 24 30 36 入力値セット番号 42 48 54 60 ID of execution l RT_̲TASK1 to RT_̲TASK5 (entire RT_̲TASKs) l average: 10.21%, maximum: 36.91% ERTL Nagoya University Embedded and Real-Time systems Lab. 32 Result of Evaluation for TLB Locking method WCET overhead execution efficiency less pages more pages less pages more pages static locking ○ ○ ○ △ × dynamic locking I ○ ○ △ △ △ dynamic locking II ○ × △ ○ ○ l Conclusion in this evaluation l l l TLB locking is efficient for improvement on WCETs static locking method is effective if the number of locked entries is less dynamic locking method II is effective if the number of locked entries is more ERTL Nagoya University Embedded and Real-Time systems Lab. 33 Related Works l Cache miss and cache lock [1][2][3][4][5] l Page fault [6] l TLB miss [7] [1] Isabelle Puaut and David Decotigny. Low-Complexity Algorithms for Static Cache Locking in Multitasking Hard Real-Time Systems. [2] Antonio Marti Campoy, Isabelle Puaut, Angel Perles Ivars, and Jose Vicente Busquets Mataix. Cache Contents Selection for Statically-Locked Instruction Caches: An Algorithm Comparison. [3] Heiko Falk, Sascha Plazar, and Henrik Theiling. Compile-time decided instruction cache locking using worst-case execution paths. [4] Tiantian Liu, Minming Li, and Chun Jason Xue. Minimizing WCET for Real-Time Embedded Systems via Static Instruction Cache Locking. [5] Antonio Marti Campoy, Angel Perles Ivars, Francisco Rodriguez, and Jose Vicente Busquets Mataix. Static use of locking caches vs. dynamic use of locking caches for real-time systems. [6] Isabelle Puaut and Damien Hardy. Predictable Paging in Real-Time Systems: A Compiler Approach. [7] CHEN Chang-Jiu and CHENG Wei-Min. Reducing the TLB Context Switching Miss Ratio With Banked and Prefetching Mechanism. ERTL Nagoya University Embedded and Real-Time systems Lab. 34