Project name

Project description

It is common for kernel subsystems to include functionality conditionally, based on compile-time and runtime configuration that changes relatively infrequently. Typical examples include toggling DTrace probes, boot-time optimizations based on hardware capabilities, and the inclusion of additional security checks. This is often done by examining the state of a global flag and executing a block of code conditionally based on that state. When this is done in a "hot" (i.e. frequently executed) code path, the overhead of the conditional execution can become significant. Moreover, when the state of the inspected flags changes rarely, most of the performed checks are redundant. The primary goal of this project is to design and implement a low-overhead mechanism for conditional execution in contexts in which the branching condition does not change often. The mechanism will be based on runtime patching of instructions. The second major goal is to apply the developed mechanism to an existing block of kernel code. The current working name for the interface is zcond.

Deliverables

What is done

All of the work has been aimed at supporting the amd64 architecture.

A safe kernel instruction patching mechanism

Before we proceed with a description, it is important to note that this mechanism did not pass the review. The whole discussion can be found at revision D46379. However, considerable time was spent designing and implementing this mechanism, so its description is left in this report for completeness.

A new mechanism for kernel instruction patching was implemented in order to support the zcond interface. Any unprivileged write into kernel text can pose serious security threats, so special care was devoted to hardening the mechanism in that regard. While an instruction is being patched, all CPUs except the one performing the patch are stopped by the smp_rendezvous routine, for the purposes of both safety and correctness.

Certain hardware protections are ordinarily active, forbidding any write access to instruction memory sections. These need to be bypassed in order to perform an instruction patch. Turning the hardware protections off completely for the duration of the patch (e.g. disable_wp() on amd64) would leave all kernel instructions unnecessarily exposed. Instead, this mechanism relies on a separate kernel page table, initialized with parts of the kernel pmap at boot. This separate page table also holds an additional page table entry, with write permissions set, for a virtual address allocated at boot. With all of this in mind, the process of patching an instruction is implemented as follows:

  1. CPUs other than the patching CPU are stopped with smp_rendezvous
  2. The VM page containing the instruction to be patched is mapped to the preallocated address in the dedicated page table
  3. The dedicated page table is loaded onto the patching CPU
  4. The instruction is patched
  5. The kernel page table is restored on the patching CPU
  6. Other CPUs are resumed
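The central idea of steps 2 to 5, writing through a temporary writable alias while the ordinary view stays read-only, can be illustrated with a user-space analogy. This is only a sketch and not the kernel code: the names sketch_text, sketch_text_init and sketch_patch are hypothetical, two mmap() views of one temporary file stand in for the two page tables, and stopping the other CPUs (steps 1 and 6) is not modelled at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for one page of kernel text. */
struct sketch_text {
    FILE *backing;      /* temporary file backing both views */
    size_t len;         /* size of the region: one page */
    const uint8_t *ro;  /* the ordinary, read-only view */
};

/* Create a one-page region filled with `fill` and map it read-only. */
static int
sketch_text_init(struct sketch_text *t, uint8_t fill)
{
    assert(t != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return (-1);
    t->len = (size_t)page;
    t->backing = tmpfile();
    if (t->backing == NULL)
        return (-1);
    int fd = fileno(t->backing);
    if (ftruncate(fd, (off_t)t->len) != 0)
        return (-1);
    /* Fill the page through a short-lived writable mapping. */
    uint8_t *rw = mmap(NULL, t->len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (rw == MAP_FAILED)
        return (-1);
    memset(rw, fill, t->len);
    munmap(rw, t->len);
    /* From here on, the long-lived view is read-only. */
    void *ro = mmap(NULL, t->len, PROT_READ, MAP_SHARED, fd, 0);
    if (ro == MAP_FAILED)
        return (-1);
    t->ro = ro;
    return (0);
}

/* Patch `n` bytes at offset `off` through a temporary writable alias. */
static int
sketch_patch(struct sketch_text *t, size_t off, const uint8_t *bytes, size_t n)
{
    assert(t != NULL && bytes != NULL);
    if (off > t->len || n > t->len - off)
        return (-1);
    /* Analogue of steps 2 and 3: make a writable alias of the page visible. */
    uint8_t *rw = mmap(NULL, t->len, PROT_READ | PROT_WRITE, MAP_SHARED,
        fileno(t->backing), 0);
    if (rw == MAP_FAILED)
        return (-1);
    memcpy(rw + off, bytes, n);  /* analogue of step 4: patch the bytes */
    munmap(rw, t->len);          /* analogue of step 5: drop the alias */
    return (0);
}
```

After sketch_patch() returns, the new bytes are visible through t->ro, although that view was never made writable. In the kernel mechanism described above, the writable alias lives in the dedicated page table and is only reachable while that table is loaded on the patching CPU.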

This implementation accomplishes two things, in terms of security:

A low cost conditional execution mechanism

A low cost conditional execution mechanism was implemented and can be used through the zcond interface. Target code blocks of the form:

    bool flag = false;

    if (!flag) {
      false_action();
    }

    if (flag) {
      true_action();
    }

can be migrated to the zcond interface as follows:

    DEFINE_ZCOND_FALSE(flag);

    if (zcond_false(flag)) {
      false_action();
    }

    if (zcond_true(flag)) {
      true_action();
    }

The state of a zcond can be toggled with the zcond_enable() and zcond_disable() functions.

The basic idea of the mechanism is to save execution time by avoiding a memory access to the flag before selecting the branch to be executed. A single branch direction is "baked in" at compile time by selecting either a nop or an unconditional jmp instruction. When the state of a zcond is toggled at runtime, the "baked in" instruction gets patched through the safe patching mechanism. A nop instruction gets replaced by a jmp and, symmetrically, a jmp gets replaced by a nop.
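To make the nop/jmp swap concrete, the following sketch builds the replacement bytes for one patch site in an ordinary byte buffer. It rests on assumptions the report does not confirm: that both forms are 5 bytes long (the amd64 near jump, opcode 0xe9 followed by a 32-bit displacement relative to the end of the instruction, and the 5-byte nop 0f 1f 44 00 00). All names (sketch_encode_jmp, sketch_toggle, sketch_nop5) are hypothetical, and the buffer stands in for kernel text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_INSN_LEN 5  /* assumed length of both the nop and the jmp */

/* The 5-byte amd64 nop (nopl 0x0(%rax,%rax,1)). */
static const uint8_t sketch_nop5[SKETCH_INSN_LEN] =
    { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Encode "jmp rel32" located at insn_addr so that it lands on target_addr. */
static void
sketch_encode_jmp(uint8_t out[SKETCH_INSN_LEN], uint64_t insn_addr,
    uint64_t target_addr)
{
    /* The displacement is relative to the address after the jmp. */
    int64_t disp = (int64_t)(target_addr - (insn_addr + SKETCH_INSN_LEN));
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    uint32_t rel = (uint32_t)disp;

    out[0] = 0xe9;
    out[1] = (uint8_t)(rel & 0xff);          /* little-endian rel32 */
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
}

/*
 * Rewrite the patch site at insn_off inside `text`: a jmp to target_off
 * when take_jump is true, a nop (fall through) otherwise.
 */
static void
sketch_toggle(uint8_t *text, size_t text_len, size_t insn_off,
    size_t target_off, bool take_jump)
{
    uint8_t insn[SKETCH_INSN_LEN];

    assert(text != NULL);
    assert(insn_off <= text_len && text_len - insn_off >= SKETCH_INSN_LEN);
    assert(target_off <= text_len);

    if (take_jump)
        sketch_encode_jmp(insn, insn_off, target_off);
    else
        memcpy(insn, sketch_nop5, sizeof(insn));
    memcpy(text + insn_off, insn, sizeof(insn));
}
```

Because both forms have the same length in this sketch, toggling never shifts the surrounding code; only the bytes of the patch site change.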

The implementation relies on the asm goto statement to record the instruction and jump address into a dedicated ELF section at build time. This data is loaded at boot, as well as on module load.
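The general technique of recording a patch site with asm goto can be sketched as follows. This is not the zcond implementation: the section name zcond_sketch, the record layout struct sketch_site and the function names are hypothetical, the single nop is only a placeholder for the patchable instruction, and the sketch assumes GCC or Clang, a 64-bit ELF target, and a linker (GNU ld or lld) that provides __start_/__stop_ symbols for sections whose names are valid C identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout: one entry per patch site. */
struct sketch_site {
    uintptr_t insn;    /* address of the patchable instruction */
    uintptr_t target;  /* address the patched-in jmp would go to */
};

/* Section bounds provided by the linker (assumption: GNU ld or lld, ELF). */
extern struct sketch_site __start_zcond_sketch[];
extern struct sketch_site __stop_zcond_sketch[];

/*
 * Emit a placeholder nop and record its address together with the address
 * of the `taken` label in the zcond_sketch section. Until the nop is
 * patched into a jmp, execution falls through and 0 is returned.
 */
static __attribute__((noinline)) int
sketch_branch(void)
{
    __asm__ goto(
        "1: nop\n\t"
        ".pushsection zcond_sketch, \"aw\"\n\t"
        ".balign 8\n\t"
        ".quad 1b, %l[taken]\n\t"
        ".popsection\n\t"
        : : : : taken);
    return (0);
taken:
    return (1);
}

/* Number of patch sites recorded at build time. */
static size_t
sketch_site_count(void)
{
    return ((size_t)(__stop_zcond_sketch - __start_zcond_sketch));
}
```

Code that runs at boot or module load could walk the entries between __start_zcond_sketch and __stop_zcond_sketch to learn where every patchable instruction lives, without any registration happening at runtime.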

This patch is currently waiting on a review: D46379

DTrace SDTs have been ported to the zcond interface

DTrace statically defined tracing was the motivational use case for the introduction of the zcond interface. An optimization similar to zcond was recently applied to SDTs (D44483). The zcond mechanism can be seen as a generalization of these optimizations, with additional security measures. SDTs were thus ported to the zcond interface. Furthermore, this port serves as a proof of concept and example of use for the new mechanism.

This change is included in the same revision as the zcond interface: D46379

What is left

Milestones

The Code

Github D46379

linux static keys proposal.pdf



SummerOfCode2024Projects/ZeroCostConditionalExecutionMechanism (last edited 2024-11-01T00:30:56+0000 by MarkLinimon)