SpencerWhitman/K - FreeBSD Wiki

Abstract

K comes from some discussions amongst some developers about how cool and or useful it would be to have language extensions that make it possible to write kernel code more cleanly and with less bugs. An example of this would be to have language support of linked lists, instead of mucking about in MACROspace.

Although tarballs are given, the latest code should in p4 at //depot/projects/soc2005/K/

Goals

General Thoughts

Documents

The paper SMART C: A Semantic Macro Replacement Translator for C (dead describes a complementary approach.

Random Examples

Assert

assert.tgz statically tries to verify a predicate contained in an assert statement. At each assert statement, paths going to that assert statement from the beginning of the function including the assert are examined. These paths are given to a theorem prover to check if paths imply the predicate included in the assert statement. More paths are examined if a parameter specifying the depth of conditionals to examine. The tool ignores things like loops and uses some heuristics to prune paths to get rid of statements that are not relevent to the predicate.

int check (int b) {
    if (b > 0)
        return 1;
    else
        return 0;
}
 
void assert (int b) {}
 
void main() {
    int b = 100;
    int cked = check (b);
 
    assert (cked == 1);
}

The assert in the code above can statically be verified that the assert can hold.

Note that there is a limitation in interprocedural analysis because our paths stop at the beginning of the main function. This prevents us from checking properties like.

void unlock(int *lock) {
    assert(*lock == 1);
    *lock = 0;
}
void main() {
    int lock = 1;
    unlock(&lock);
}

This assert in unlock is a cool thing to have to prevent double unlocks or double locks. This can be fixed by creating an assume construct. assume is similiar to a precondition. At the beginning of each function, an assume can be made that used to verify later assert in the function. The validity of the assume is made by adding asserts before each function call. So following our previous example:

With the addition of assumes.

void unlock(int *lock) {
    assume(*lock == 1);
    assert(*lock == 1);
    *lock = 0;
}
void main() {
    int lock = 1;
    assert(*(&lock) == 1); // automatically inserted assert
    unlock(&lock);
}

Magic

Adding a magic number field associated with a structure would be useful for debugging and be automatically removed when released. For example,

struct blah {
        . . .
        int __magic;
}

Whenever a new blah structure is created, the __magic field is set to a number uniquely identifying the blah structure. The current scheme uses just the size of the structure. But if more uniquiness is needed than a 32 bit number where the most significant 20 bits is a number that is incremented whenever we see a new struct definition and the least significant 12 bits is the size of the structure can be used.

struct blah *ptr = malloc (sizeof(struct blah));
ptr->__magic = << some magic number >>;

Now when a pointer is dereferenced resolving to the type struct blah, __magic is checked to see if it's equal to the magic number associated with struct blah. This is cool because you can figure out the memory being dereferenced as the same type that you expect and so is useful during a "night of heavy pointer gymnastics." ptr -> some_other_field = 5; becomes

assert (ptr -> __magic == << some magic number >>);
ptr -> some_other_field = 5;

This is also related to detecting buffer overflows. Previous work like Stackguard add a random number on the stack to detect mucking of the return address by some intruder. This __magic field somewhat detects mucking in the heap.

Note that there is an issue involving hardcoded sizes. For example, if a programmer believes the size of struct blah is 6 and hardcodes 8 in struct blah *ptr = malloc(8), then we need to add 4 (the size of the __magic field) to be correct. Although in this example, we can see a simple solution, we cannot detect this in the general case.

You can try it out at magic.tgz

Using magic . . .

struct blah {
    int stuff;
};

void main() {
    struct blah *ptr = malloc (sizeof(struct blah));
    ptr->stuff = 1;
}

is converted to

struct blah {
    int __magic ;
    int stuff ;
};

extern int ( /* missing proto */  malloc)() ;

void main(void) { 
    struct blah *ptr@main ;
    struct blah *tmp@main ;
    
    {
        tmp@main = (struct blah *)malloc(sizeof(struct blah ));
        tmp@main->__magic = sizeof(struct blah );
        ptr@main = tmp@main;

        assert(ptr@main->__magic == 0 + sizeof(struct blah ));
        ptr@main->stuff = 1;
        return;
    }
}

Scopes

One of the things I would like to be able to do is use the scoping of the language to help me get locking right, something like:

        mutex(&sc->mtx) {
                error = something(sc);
                if (error != 0)
                        break;
                error = something_else(sc);
                if (error != 0)
                        break;
                error = last_thing(sc);
        }
        if (error != 0)
                return (error);

In this case the compiler will automatically release the mutex whenever the {...} block is exited, no matter how this happens.

Currently we have a more general implementation coupled.tgz that can couple two C statements together within a scope. However, we do not extend the C syntax to allow the code above. We use a keyword function void coupled() that if coupled() is called in the beginning of a block, then the next statement is the statement to be executed whenever the block is exited and the statement after next is the statement to be executed when the block is entered. A macro is used to make things look pretty. For example,

#define COUPLED(before, after) coupled(); after; before;

    do {
        COUPLED(lock(), unlock())
        error = something(); 
        if (error != 0)
            break; 
         error = something(); 
         if (error != 0)
            break;
         error = something();
    } while (0);

Whenever the while block is entered, the first statement in COUPLED is executed: lock()}} in this case. Whenever the while block is exited, the second statement in {{{COUPLED is executed: unlock() in this case. The code is transformed to

  while (1) {
    while_0_continue: /* CIL Label */ ;
    lock();
    error@main = something();
    if (error@main != 0) {
      {
      unlock();
      goto while_0_break;
      }
    }
    error@main = something();
    if (error@main != 0) {
      {
      unlock();
      goto while_0_break;
      }
    }
    error@main = something();
    {
    unlock();
    goto while_0_break;
    }
  }
  while_0_break: /* CIL Label */ ;
  }

Note that there is now an unlock(); placed in front of each break statement. The current implementation just addresses the exit issue. It does not handle entrance to block from other means. For example, there could be a goto from elsewhere in the program that enters the block and that "goto" will not have acquired the lock. Although I think that case is rare, the next implementation will take account of that case.

Breaks

Here's another idea with respect to scopes:

        for(i = 0; i < 10; i++) {
                for (j = 0; j < 10; j++) {
                        if (a[i][j] == 2)
                                breakn(2);
                }
        }

Being able to say: "break out of two levels"

Cil can convert this to:

  i@main = 0;
  {
  while (1) {
    while_0_continue: /* CIL Label */ ;
    if (! (i@main < 10)) {
      goto while_0_break;
    }
    j@main = 0;
    {
    while (1) {
      while_1_continue: /* CIL Label */ ;
      if (! (j@main < 10)) {
        goto while_1_break;
      }
      if ((*((*(a@main + i@main)) + j@main)) == 2) {
        {
        goto while_0_break; 
        }
      }
      j@main ++;
    }
    while_1_break: /* CIL Label */ ;
    }
    i@main ++;
  }
  while_0_break: /* CIL Label */ ;
  }

Try it at breaks.tgz. Basically, call the function void blastn(int n) to break out of n levels. For more help, look at the example in ./test to figure out how to use it.

I would like a bitmap type, instead of the tedious

        #define THIS_ONE  0x0000001
        #define THAT_ONE  0x0000002

I would prefer to be able to:

        bitmap {
                 foo,
                 bar,
                 barf
        };

Structure packing control would be nice too:

        struct foo {
                bigendian uint32_t      magic   @0;
                ...
        };

Most CPUs have magic instructions for byte-order and alignment, there is no reason why the compiler can't do this for us.

In the debugging dept, an option to have the compiler watch out for 0xdeadc0de would be neat. Basically whenever a pointer contains something which looks like 0xdeadc0de, panic instead of going haywire. Similarly, avoid jumps through NULL pointers etc:

kcc -Wdeadcode -WNULLderef

And as was said above, in-language support for linked lists instead of macro-madness would be nice as well.

Playing with the Examples

The examples breaks.tgz and coupled.tgz are purely in the development stages and do not have a nice build system. To build and run these examples, first download Ocaml 3.08 which is available as a port and then download the Cil library. The installation for Ocaml should be straightforward. Cil needs to be built but the instructions (./configure; make; make quicktest) should work without a hitch.

To make the examples, you need to modify the Makefile. In Makefile, change COMPILEFLAGS to include the location of Cil. In Makefile.ocaml.build, change the directories to point to the location of Cil. After those modification, run make and the executables will be placed in ./obj. You can the run the executables on the *.c files in the ./test directory.

./obj/doCoupled.asm.exe ./test/coupled.c
./obj/doBreaks.asm.exe ./test/wiki.c

What Other People Are Doing

Microsoft Singularity An operating system designed from scratch by both the systems people and the programming languages people at Microsoft Research
E. Brewer, J. Condit, B. McCloskey, F. Zhou; Thirty Years Is Long Enough: Getting Beyond C, HotOS X, 2005
S Chandra, P McCann, Packet Types Sigcomm 2000 : Seems like a good basis to look at for bitvectors. It's probably has a bit more than we need but it's good start.
D. Mandelin, L. Xu, R. Bodík, D. Kimelman Jungloids PLDI 2005: Another way to deal with lotta API's or in our case lotta macro's
Linux has Sparse, some sort of static checker

TODO: Sept 09, 2005

Do a strawman Bitvector implementation. Try running it on ip_input
- convert chunks of header files to use bit fields
Use coupled and break(n) code in BSD somewhere.
Get Cil to work with RELENG_6 GENERIC
- Fix linking
- Check warnings (chances are most are spurious)
- UPDATE: Have been running into gcc weirdness with the current cvs kernel. The kernel sometimes compiles and sometimes not under identical settings; this is quite possibly a hardware bug on my machine.
Try magic tags for debugging sometime
Do assert / assume for locks or some other coupled property

DONE: Sept 9, 2005

magic tags for structures
p4 is setup and all code has been checked in
Update of Wiki

DONE: Aug 30, 2005

Static checking of asserts

DONE: Aug 04, 2005

Coupled statements within a block or scope

DONE: July 19, 2005

How much time does it take extra to run GENERIC through Cil?
- From Daniel, for 4.10 GENERIC kernel
- Normal build: 221.177u 33.490s 4:47.93
- CIL build w/ derefernce checking: 835.489u 143.103s 18:29.99
Do a deadc0de check when dereferencing function pointers
Break(n)
Look into licensing issues
- Ocaml Compiler is QPL and library is LGPL
- Cil has a BSD License

CategoryGsoc