FreeBSD/arm Superpages for ARMv7

Engineer: ZbigniewBodek

Mentor: GrzegorzBernacki

Abstract

The objective of this project is to provide FreeBSD/arm with superpages support.

This functionality is intended to work on all ARMv7-based processors. The reference hardware platform for development is the Pandaboard, a popular and widely available system based on the Cortex-A9 ARMv7 CPU core.

Problem description

The ARM architecture is increasingly prevalent, and not only in the mobile and embedded space. One of the more interesting industry trends of recent months is the "ARM server" concept, and some top-tier companies (Dell, HP) have already started developing such systems.

Sophisticated features are key to FreeBSD's success in these new areas. Superpages are one of them: they let a single TLB entry cover a large physical region, which makes more efficient use of the TLB and improves performance and scalability in many applications.

Contemporary ARM architecture (ARMv7 and the upcoming ARMv8) is already on par with the traditional PC architecture in terms of advanced CPU features (MMU, multi-level cache, TLB, multi-core, hardware coherency and similar) and, in particular, can make use of transparent superpages support.
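To make the TLB coverage argument concrete, the sketch below computes how much memory a TLB can map at each page size defined by the ARMv7 short-descriptor translation format (4 KB small page, 64 KB large page, 1 MB section, 16 MB supersection). The 128-entry TLB is an assumption chosen for illustration, not a figure for any particular core, and the names tlb_reach_bytes and ASSUMED_TLB_ENTRIES are hypothetical.

```c
#include <stdint.h>

/* Page sizes defined by the ARMv7 short-descriptor translation format. */
#define SMALL_PAGE_SIZE    (4u * 1024u)
#define LARGE_PAGE_SIZE    (64u * 1024u)
#define SECTION_SIZE       (1024u * 1024u)
#define SUPERSECTION_SIZE  (16u * 1024u * 1024u)

/* Assumption for illustration only; real TLB sizes vary by core. */
#define ASSUMED_TLB_ENTRIES 128u

/* Memory covered when every TLB entry maps one page of the given size. */
uint64_t
tlb_reach_bytes(uint32_t entries, uint32_t page_size)
{
	return ((uint64_t)entries * page_size);
}
```

Under this assumption, tlb_reach_bytes(ASSUMED_TLB_ENTRIES, SMALL_PAGE_SIZE) is 512 KB, while tlb_reach_bytes(ASSUMED_TLB_ENTRIES, SECTION_SIZE) is 128 MB: the same number of entries covers 256 times more memory.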

Milestones

M1  Clean-ups of pmap-v6.c module                     100%
M2  Initial implementation of superpages on ARM       100%
M3  Performance measurements, testing, documentation  100%
M4  Final integration with the FreeBSD source repo    100%

Clean-ups of pmap-v6.c module

Completed tasks:

Initial implementation of superpages on ARM

Completed tasks:

Performance measurements and testing

GUPS benchmark

GUPS (Giga Updates Per Second) measures how frequently a system can issue updates to randomly generated memory locations.

In particular, GUPS exercises both the memory latency and the bandwidth capabilities of the system.
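The access pattern described above can be sketched as a loop that XORs pseudo-random values into pseudo-random table slots. This is a minimal illustration, not the benchmark code used for the measurements: the xorshift64 generator is a stand-in chosen for brevity, and the names next_random, random_updates and gups are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One xorshift64 step; a stand-in generator, not the benchmark's own. */
uint64_t
next_random(uint64_t x)
{
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return (x);
}

/*
 * Apply n_updates XOR updates to pseudo-random slots of table.
 * table_len must be a power of two so the mask yields a valid index,
 * and seed must be non-zero for xorshift to produce a useful sequence.
 */
void
random_updates(uint64_t *table, size_t table_len, uint64_t n_updates,
    uint64_t seed)
{
	uint64_t r = seed;
	uint64_t i;

	for (i = 0; i < n_updates; i++) {
		r = next_random(r);
		table[r & (table_len - 1)] ^= r;
	}
}

/* Giga-updates per second from an update count and elapsed seconds. */
double
gups(uint64_t n_updates, double seconds)
{
	return ((double)n_updates / seconds / 1e9);
}
```

When the table is much larger than the memory the TLB can map, nearly every update lands on a different page, so address translation cost shows up directly in the result. That is why this kind of workload is sensitive to superpages.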

Test         CPU time used [s]  Real time used [s]  Updates per second [bn/s]  SP support
1            146.421875         146.420915          0.003666627                Disabled
2            146.476562         146.476513          0.003665235                Disabled
3            146.398438         146.396621          0.003667236                Disabled
4            146.695312         146.699617          0.003659661                Disabled
5            96.453125          96.450370           0.005566292                Enabled
6            96.429688          96.426973           0.005567643                Enabled
7            96.953125          96.948327           0.005537702                Enabled
8            96.421875          96.423033           0.005567870                Enabled
Improvement  34%                34%                 52%
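The Improvement row can be reproduced from the individual runs. The sketch below assumes the figures are computed from the mean of the four runs in each configuration, which is an assumption about the method rather than something the page states; the helper names mean and percent_change are hypothetical.

```c
#include <stddef.h>

/* Measurements copied from the GUPS table: runs 1-4 disabled, 5-8 enabled. */
const double cpu_disabled[] = { 146.421875, 146.476562, 146.398438, 146.695312 };
const double cpu_enabled[] = { 96.453125, 96.429688, 96.953125, 96.421875 };
const double rate_disabled[] = { 0.003666627, 0.003665235, 0.003667236, 0.003659661 };
const double rate_enabled[] = { 0.005566292, 0.005567643, 0.005537702, 0.005567870 };

/* Arithmetic mean of n samples. */
double
mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return (sum / (double)n);
}

/* Relative change from before to after, in percent (negative = decrease). */
double
percent_change(double before, double after)
{
	return ((after - before) / before * 100.0);
}
```

With these inputs, percent_change(mean(cpu_disabled, 4), mean(cpu_enabled, 4)) is about -34 (CPU time falls by roughly 34%) and percent_change(mean(rate_disabled, 4), mean(rate_enabled, 4)) is about +52, consistent with the Improvement row.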

LMbench

LMbench is a popular suite of system performance benchmarks.

Its memory bandwidth and latency tests can be used to verify the effect of superpages.

LMbench uses the STREAM test program to examine memory performance. Results are broken down by type of operation.

SP support   Mmap reread [MB/s]  Bcopy (libc) [MB/s]  Bcopy (hand) [MB/s]  Mem read [MB/s]  Mem write [MB/s]  Mem latency [ns]
Disabled     645.4               305.4                432.3                681              3043              238.8
Enabled      660.0               312.4                446.9                696              3300              148.4
Improvement  2.26%               2.29%                3.37%                2.2%             8.44%             37.85%
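Memory latency figures of this kind are commonly obtained by pointer chasing: a chain of dependent loads in which each load's address comes from the previous load, so the loads cannot overlap. The sketch below illustrates the idea only; it is not LMbench's code, and the names build_chain and chase are hypothetical.

```c
#include <stddef.h>

/*
 * Link n slots into a cycle: slot i points to slot (i + stride) % n.
 * When stride and n share no common factor, the cycle visits every slot.
 */
void
build_chain(size_t *next, size_t n, size_t stride)
{
	size_t i;

	for (i = 0; i < n; i++)
		next[i] = (i + stride) % n;
}

/* Follow the chain for the given number of dependent loads. */
size_t
chase(const size_t *next, size_t start, size_t hops)
{
	size_t p = start;

	while (hops-- > 0)
		p = next[p];
	return (p);
}
```

Timing a large number of hops and dividing by the hop count gives an average latency per load. If consecutive hops land on different pages, each load may also need a new address translation, which is one plausible reason the latency column responds to superpages far more than the bandwidth columns do.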

Self host world build

A reduction in the duration of a self-hosted world build can be observed when using GCC: from 6h 36min to 5h 14min, about 21% shorter.

No meaningful time reduction can be observed when using CLANG, despite the creation of ~570000 superpages in the process.

SP support  GCC       CLANG
Disabled    6h 36min  6h 16min
Enabled     5h 14min  6h 15min


Repository

The code has been integrated into FreeBSD HEAD.

Superpages support SVN Revision

Practical, transparent operating system support for superpages PAPER

ARMSuperpages (last edited 2013-09-16 10:26:01 by ZbigniewBodek)