Project name
- Student: Marko Vlaić (mvlaic@freebsd.org)
- Mentor: Bojan Novković (bnovkov@)
Project description
It is common for kernel subsystems to conditionally include functionality based on compile-time and runtime configuration that rarely changes. Typical examples include toggling DTrace probes, boot-time optimizations based on hardware capabilities, and the inclusion of additional security checks. This is often done by examining the state of a global flag and conditionally executing a block of code based on that state. When this is done in a "hot" (i.e. frequently executed) code path, the overhead of the conditional execution can become significant. Moreover, when the state of the inspected flags rarely changes, most of the performed checks are redundant. The primary goal of this project is to design and implement a low-overhead mechanism for conditional execution in contexts where the branching condition does not change often. The mechanism will be based on runtime patching of instructions. The second major goal is to apply the developed mechanism to an existing block of kernel code. The current working name for the interface is zcond.
Deliverables
- A patch implementing the code patching and static branching mechanism for the x86-64 architecture
- A patch which applies the developed mechanism to an existing piece of kernel code
What is done
All of the work has been aimed at supporting the amd64 architecture.
A safe kernel instruction patching mechanism
Before proceeding with a description, it is important to note that this mechanism did not pass review. The full discussion can be found in revision D46379. However, considerable time was spent designing and implementing this mechanism, so its description is kept in this report for completeness.
A new mechanism for kernel instruction patching was implemented to support the zcond interface. Any unauthorized write into kernel text poses a serious security threat, so special care was taken to harden the mechanism in that regard. While an instruction is being patched, all CPUs except the one performing the patch are stopped with the smp_rendezvous routine, for the sake of both safety and correctness.
Certain hardware protections are normally active and forbid any write access to kernel instruction memory. These need to be bypassed in order to patch an instruction. Turning the hardware protections off completely for the duration of the patch (e.g. with disable_wp() on amd64) would needlessly expose all kernel instructions. Instead, this mechanism relies on a separate kernel page table, initialized at boot with parts of the kernel pmap. This separate page table also holds an additional page table entry that maps a virtual address, allocated at boot, with write permissions. With this in mind, an instruction is patched as follows:
- CPUs other than the patching CPU are stopped with smp_rendezvous
- The VM page containing the instruction to be patched is mapped to the preallocated address in the dedicated page table
- The dedicated page table is loaded onto the patching CPU
- The instruction is patched
- The kernel page table is restored on the patching CPU
- Other CPUs are resumed
In terms of security, this implementation accomplishes two things:
- Because the dedicated page table is only loaded on the patching CPU, an attacker cannot gain any elevated privilege by hijacking any of the stopped CPUs.
- Only one page of kernel .text is exposed at a time, instead of the whole section.
A low-cost conditional execution mechanism
A low-cost conditional execution mechanism was implemented and can be used through the zcond interface. Target code blocks of the form:
Can be migrated to the zcond interface as:
The state of a zcond can be toggled with the zcond_enable() and zcond_disable() functions.
The basic idea of the mechanism is to save execution time by avoiding a memory access to the flag before selecting the branch to execute. A single branch direction is "baked in" at compile time by emitting either a nop or an unconditional jmp instruction. When the state of a zcond is toggled at runtime, the "baked in" instruction is patched through the safe patching mechanism: a nop is replaced with a jmp and, symmetrically, a jmp is replaced with a nop.
The implementation relies on the asm goto statement to record, at build time, the address of each patchable instruction and its jump target in a dedicated ELF section. This data is loaded at boot, as well as when a kernel module is loaded.
This patch is currently awaiting review: D46379
DTrace SDTs have been ported to the zcond interface
DTrace statically defined tracing (SDT) was the motivating use case for the introduction of the zcond interface. An optimization similar to zcond was recently applied to SDTs (D44483). The zcond mechanism can be seen as a generalization of that optimization, with additional security measures. SDTs were therefore ported to the zcond interface. This port also serves as a proof of concept and an example of how to use the new mechanism.
This change is included in the same revision as the zcond interface: D46379
What is left
- If the review process deems the approach acceptable, support for architectures other than amd64 will be added
- Extensive benchmarks are needed to ascertain the level of optimization gained from applying the mechanism
- The mechanism should be applied to other kernel subsystems.
Milestones
- May 13th - May 19th: Start working on static branch selection
- Implement the DEFINE_ZCOND_TRUE() and DEFINE_ZCOND_FALSE() macros
- Implement rudimentary versions of zcond_true() and zcond_false() functions
- May 20th - May 26th: Continue working on the static branch selection
- Finish the implementation of the zcond_true() and zcond_false() functions, using the asm goto statement.
- May 27th - June 2nd: Finish work on the static branch selection
- Perform some rudimentary testing
- Begin researching the mechanisms needed to support code patching
- June 3rd - June 9th: Start working on the code patching
- Start work on the single-processor versions of the zcond_enable() and zcond_disable() functions.
- July 1st - July 7th: Finish working on the single-processor version of code patching
- Start researching the locking mechanisms needed for an SMP version of the code patching mechanism
- July 8th - July 21st: Implement the SMP version of the code patching mechanism
- July 22nd - July 28th
- Work out any remaining issues
- Benchmark the mechanism with an artificial test suite
- Start looking for kernel code suitable to be refactored to use the new interface
- July 29th - August 4th
- Study the existing kernel code selected for refactoring
- Rewrite the selected piece of the kernel
- August 5th - August 11th: Final code review with my mentor
- August 12th - August 19th:
- Work out remaining issues
- Work on documentation
The Code
Useful links
linux static keys proposal.pdf