FreeBSD/arm Superpages for ARMv7
This is a historical page.
Engineer
Mentor
Abstract
The objective of this project is to provide FreeBSD/arm with superpages support.
The functionality is intended to work on all ARMv7-based processors. The reference hardware platform for this development is the Pandaboard, a popular and widely available system based on the Cortex-A9 ARMv7 CPU core.
Problem description
The ARM architecture is increasingly prevalent, and not only in the mobile and embedded space. One of the more interesting industry trends to emerge in recent months is the "ARM server" concept. Some top-tier companies (Dell, HP) have already started developing such systems.
Sophisticated features are key to FreeBSD's success in these new areas. Among them are superpages, which make more efficient use of TLB entries by letting each translation cover a larger physical region, leading to improved performance and scalability in many applications.
The contemporary ARM architecture (ARMv7, the upcoming ARMv8) is already on par with the traditional PC architecture in terms of advanced CPU features (MMU, multi-level cache, TLB, multi-core, hardware coherency and similar) and, in particular, can make use of transparent superpages support.
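The TLB coverage argument can be illustrated with a little arithmetic. The sketch below computes how much memory a fully populated TLB can map with 4KB pages versus 1MB sections. The entry count of 128 is a hypothetical value chosen for illustration only, not a figure taken from this page or from any particular CPU.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of memory covered when every TLB entry maps one page."""
    return entries * page_size


# Hypothetical TLB size, chosen only to make the arithmetic concrete.
ENTRIES = 128

small = tlb_reach(ENTRIES, 4 * KIB)   # every entry maps a 4KB page
large = tlb_reach(ENTRIES, 1 * MIB)   # every entry maps a 1MB section

print("4KB pages:   ", small // KIB, "KiB covered")
print("1MB sections:", large // MIB, "MiB covered")
print("ratio:       ", large // small)
```

Whatever the real entry count is, the ratio between the two cases stays at 256, because one 1MB section replaces 256 4KB pages.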
Milestones
| M1 | Clean-ups of pmap-v6.c module | 100% |
| M2 | Initial implementation of superpages on ARM | 100% |
| M3 | Performance measurements, testing, documentation | 100% |
| M4 | Final integration with the FreeBSD source repo | 100% |
Clean-ups of pmap-v6.c module
Completed tasks:
- Port of the PV entry allocator
- Switch to "AP[2:1]" access permissions model
- PTE based, page referenced/modified emulation
- Fixes regarding page replacement strategy
- Code optimizations and bug fixes
- Integration of the current work into FreeBSD HEAD
- Implementation of pmap_copy()
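Two of the tasks above, the "AP[2:1]" permissions model and the emulation of the modified bit, can be sketched as follows. In the ARMv7 simplified access permission model, AP[2] marks a mapping read-only and AP[1] enables unprivileged access. The `SoftPte` class and `handle_write_fault()` are hypothetical names that illustrate the general fault-driven technique; they are not taken from pmap-v6.c, and the actual implementation may differ.

```python
from dataclasses import dataclass


def decode_ap21(ap2, ap1):
    """Decode the ARMv7 AP[2:1] permission bits into kernel/user access."""
    access = "ro" if ap2 else "rw"          # AP[2] = 1 means read-only
    return {
        "kernel": access,
        "user": access if ap1 else "none",  # AP[1] = 1 enables user access
    }


@dataclass
class SoftPte:
    """Hypothetical page table entry model, for illustration only."""
    sw_writable: bool        # software-only flag: writes are permitted
    ap2: int = 1             # start read-only so the first write faults
    ap1: int = 1             # user-accessible
    modified: bool = False   # emulated "dirty" bit


def handle_write_fault(pte):
    """Return True if the fault was emulation and the access can be retried."""
    if not pte.sw_writable:
        return False         # genuine protection violation
    pte.modified = True      # record the first write
    pte.ap2 = 0              # grant hardware write access from now on
    return True


pte = SoftPte(sw_writable=True)
print(decode_ap21(pte.ap2, pte.ap1))   # before the first write
print(handle_write_fault(pte), pte.modified)
print(decode_ap21(pte.ap2, pte.ap1))   # after the fault is handled
```

The design idea is that the hardware never sets a modified bit itself, so a writable page is first mapped read-only and the resulting permission fault is used to record the write.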
Initial implementation of superpages on ARM
Completed tasks:
- Support for multiple page sizes
  - 4KB and 1MB section mappings are allowed
- Implement basic page promotion and demotion mechanisms
  - Promotion to 1MB section
  - Demotion from 1MB section to 4KB pages
  - 1MB section creation and removal
- PV entry management for superpages
- Enable generic, reservation based allocation
- Bring-up of the functionality
- Adjust pmap-v6 infrastructure to utilize superpages' mechanisms
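The promotion and demotion steps listed above can be sketched as a small model. A 1MB section corresponds to 256 4KB pages, so a conventional promotion test checks that all 256 entries are valid, that the first physical address is 1MB-aligned, that the pages are physically contiguous, and that their attributes match. The names `Pte`, `can_promote()` and `demote()` are hypothetical and the code is a simplified illustration of the general technique, not the pmap-v6 code.

```python
from typing import NamedTuple, Optional, Sequence

PAGE_SIZE = 4 * 1024
SECTION_SIZE = 1024 * 1024
PTES_PER_SECTION = SECTION_SIZE // PAGE_SIZE  # 256 small pages per section


class Pte(NamedTuple):
    """Hypothetical 4KB mapping: physical address plus attribute string."""
    pa: int
    attrs: str


def can_promote(l2_table: Sequence[Optional[Pte]]) -> bool:
    """Check whether 256 small-page entries can become one 1MB section."""
    if len(l2_table) != PTES_PER_SECTION:
        return False
    first = l2_table[0]
    if first is None or first.pa % SECTION_SIZE != 0:
        return False                          # base must be 1MB-aligned
    for i, pte in enumerate(l2_table):
        if pte is None:
            return False                      # every page must be mapped
        if pte.pa != first.pa + i * PAGE_SIZE:
            return False                      # must be physically contiguous
        if pte.attrs != first.attrs:
            return False                      # attributes must be identical
    return True


def demote(section_pa: int, attrs: str):
    """Split one 1MB section into 256 equivalent 4KB mappings."""
    return [Pte(section_pa + i * PAGE_SIZE, attrs)
            for i in range(PTES_PER_SECTION)]


table = demote(0x80100000, "rw")
print(can_promote(table))   # a freshly demoted section can be promoted again
table[7] = None             # unmap a single 4KB page
print(can_promote(table))   # promotion is no longer possible
```

Demotion followed by promotion is an identity operation in this model, which is the property a transparent superpage implementation relies on.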
Performance measurements and testing
Currently pmap-v6 suffers from a demotion problem caused by continuous scanning of the active queue in the VM subsystem.
For that reason, all tests were performed with vm.pageout_update_period set to a number greater than the number of available pages in the system, which effectively disables active queue scanning.
GUPS benchmark
GUPS (Giga Updates Per Second) measures how frequently a system can issue updates to randomly generated memory locations.
In particular, GUPS measures both memory latency and bandwidth capabilities.
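The access pattern that GUPS exercises can be sketched as follows. This is a simplified illustration and not the benchmark program used for the measurements: the table size, the update count, and the linear congruential generator are assumptions made for the example, and an interpreted loop yields rates far below those of a native benchmark.

```python
import time

MASK64 = 0xFFFFFFFFFFFFFFFF


def random_updates(table, n_updates, seed=1):
    """XOR pseudo-random values into pseudo-random slots of the table."""
    assert len(table) & (len(table) - 1) == 0, "table length must be 2**k"
    index_mask = len(table) - 1
    x = seed
    for _ in range(n_updates):
        # 64-bit linear congruential generator (illustrative choice).
        x = (x * 6364136223846793005 + 1442695040888963407) & MASK64
        table[(x >> 32) & index_mask] ^= x


def measure(table_size=1 << 16, n_updates=200_000):
    """Return the achieved rate in billions of updates per second."""
    table = list(range(table_size))
    start = time.perf_counter()
    random_updates(table, n_updates)
    elapsed = time.perf_counter() - start
    return n_updates / elapsed / 1e9


print(f"{measure():.6f} billion updates per second")
```

Because consecutive updates touch unrelated addresses, almost every access needs a fresh address translation, which is why this workload is sensitive to TLB coverage.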
| Test | CPU time used [s] | Real time used [s] | Updates per second [billions/s] | SP support |
| 1 | 146.421875 | 146.420915 | 0.003666627 | Disabled |
| 2 | 146.476562 | 146.476513 | 0.003665235 | Disabled |
| 3 | 146.398438 | 146.396621 | 0.003667236 | Disabled |
| 4 | 146.695312 | 146.699617 | 0.003659661 | Disabled |
| 5 | 96.453125 | 96.450370 | 0.005566292 | Enabled |
| 6 | 96.429688 | 96.426973 | 0.005567643 | Enabled |
| 7 | 96.953125 | 96.948327 | 0.005537702 | Enabled |
| 8 | 96.421875 | 96.423033 | 0.005567870 | Enabled |
| Improvement | 34% | 34% | 52% | |
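The improvement figures can be reproduced from the eight runs listed above. The sketch below assumes that they were derived from the arithmetic mean of each configuration, with time improvement expressed as a reduction and update rate improvement expressed as a gain; the page does not state the method, so this is an assumption.

```python
from statistics import mean

# Measurements copied from the GUPS results (tests 1-4 and 5-8).
cpu_disabled = [146.421875, 146.476562, 146.398438, 146.695312]
cpu_enabled = [96.453125, 96.429688, 96.953125, 96.421875]
real_disabled = [146.420915, 146.476513, 146.396621, 146.699617]
real_enabled = [96.450370, 96.426973, 96.948327, 96.423033]
rate_disabled = [0.003666627, 0.003665235, 0.003667236, 0.003659661]
rate_enabled = [0.005566292, 0.005567643, 0.005537702, 0.005567870]


def reduction_percent(before, after):
    """Relative decrease of the mean (for lower-is-better metrics)."""
    return (mean(before) - mean(after)) / mean(before) * 100.0


def gain_percent(before, after):
    """Relative increase of the mean (for higher-is-better metrics)."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


cpu_reduction = reduction_percent(cpu_disabled, cpu_enabled)
real_reduction = reduction_percent(real_disabled, real_enabled)
rate_gain = gain_percent(rate_disabled, rate_enabled)

print(f"CPU time reduction:  {cpu_reduction:.1f}%")
print(f"Real time reduction: {real_reduction:.1f}%")
print(f"Update rate gain:    {rate_gain:.1f}%")
```

The time and rate percentages differ because they use different baselines: a run that takes about two thirds of the original time completes about one and a half times as many updates per second.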
LMbench
LMbench is a popular suite of system performance benchmarks.
Its memory bandwidth and latency tests can be used to verify the effect of superpages.
LMbench uses the STREAM test program to examine memory performance. Results are broken down by type of operation.
| Mmap reread [MB/s] | Bcopy (libc) [MB/s] | Bcopy (hand) [MB/s] | Mem read [MB/s] | Mem write [MB/s] | Mem latency [ns] | SP support |
| 645.4 | 305.4 | 432.3 | 681 | 3043 | 238.8 | Disabled |
| 660.0 | 312.4 | 446.9 | 696 | 3300 | 148.4 | Enabled |
| 2.26% | 2.29% | 3.37% | 2.2% | 8.44% | 37.85% | Improvement |
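The LMbench table mixes two kinds of metrics: bandwidth, where higher is better, and latency, where lower is better. The sketch below recomputes each improvement relative to the value measured with superpages disabled. That baseline is an assumption, since the page does not describe how the percentages were calculated; the results agree with the reported ones to within 0.01 percentage points.

```python
# Values copied from the LMbench results.
disabled = {
    "Mmap reread": 645.4, "Bcopy (libc)": 305.4, "Bcopy (hand)": 432.3,
    "Mem read": 681.0, "Mem write": 3043.0, "Mem latency": 238.8,
}
enabled = {
    "Mmap reread": 660.0, "Bcopy (libc)": 312.4, "Bcopy (hand)": 446.9,
    "Mem read": 696.0, "Mem write": 3300.0, "Mem latency": 148.4,
}

# Latency is measured in ns, so a smaller value is an improvement.
LOWER_IS_BETTER = {"Mem latency"}


def improvement_percent(name):
    """Relative improvement against the superpages-disabled baseline."""
    before, after = disabled[name], enabled[name]
    if name in LOWER_IS_BETTER:
        return (before - after) / before * 100.0
    return (after - before) / before * 100.0


for name in disabled:
    print(f"{name}: {improvement_percent(name):.2f}%")
```

The bandwidth tests mostly stream through memory sequentially and gain a few percent, while the latency test benefits far more from the larger translations.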
Self-hosted world build
A reduction in the duration of a self-hosted world build can be observed when using GCC.
No time reduction can be observed when using CLANG, despite the creation of ~570000 superpages in the process.
| GCC | CLANG | SP support |
| 6h 36min | 6h 16min | Disabled |
| 5h 14min | 6h 15min | Enabled |
Repository
The code has been integrated into FreeBSD HEAD.
Superpages support SVN Revision
References/Links
Practical, transparent operating system support for superpages PAPER