Adding Syscalls To FreeBSD

The process of adding syscalls to FreeBSD has slowly grown more complex over the years. New features such as audit have added extra fields to the syscalls.master files, symbol versioning added another file to edit in src/lib/libc, and 32-bit compatibility on 64-bit kernels requires two declarations of each syscall and translation of some arguments.

NOTE WELL: Adding new syscalls is not to be done lightly. Once we have shipped a syscall in a release, we are generally stuck with it or at least with something that implements the same interface forever. When in doubt, request and obtain review from senior developers!

WARNING: Incorrectly implemented system calls may present serious risks to system stability and security. This document does not address many issues that must be considered when adding any new kernel interface and should not be considered complete.

Registering a syscall

Syscalls are declared in sys/kern/syscalls.master with entries like:

485     AUE_NULL        STD     { int cpuset_setid(cpuwhich_t which, id_t id, \
                                    cpusetid_t setid); }

The format of this line is documented at the top of the file. In general, all new syscalls should be of type STD or NOSTD. The function declaration part contains the prototype of the syscall as seen from userspace. This prototype must also be declared somewhere so userland code can call it, typically in a header under sys/sys/. In this case, it is declared in sys/sys/cpuset.h. After adding an entry to sys/kern/syscalls.master, you must regenerate the generated files in sys/kern and sys/sys:

$ make -C sys/kern/ sysent
mv -f init_sysent.c init_sysent.c.bak
mv -f syscalls.c syscalls.c.bak
mv -f systrace_args.c systrace_args.c.bak
mv -f ../sys/syscall.h ../sys/syscall.h.bak
mv -f ../sys/syscall.mk ../sys/syscall.mk.bak
mv -f ../sys/sysproto.h ../sys/sysproto.h.bak
sh makesyscalls.sh syscalls.master
$ make -C sys/i386/linux sysent
$ make -C sys/amd64/linux32 sysent
$ make -C sys/compat/freebsd32 sysent
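
Once sys/sys/syscall.h has been regenerated, it carries a SYS_ constant for the new entry (SYS_cpuset_setid for the example above). As a minimal sketch of how such a constant can be exercised from userland before a libc stub exists, the fragment below goes through syscall(2). SYS_getpid is used as a stand-in, since the number of a newly added syscall only exists on a tree that contains it, and the helper name is hypothetical.

```c
#define _DEFAULT_SOURCE         /* assumption: exposes syscall() on glibc in strict ISO C mode */

#include <sys/types.h>
#include <sys/syscall.h>        /* SYS_* numbers generated from syscalls.master */
#include <unistd.h>             /* syscall(2), getpid(2) */

/*
 * Hypothetical helper: invoke a syscall by number rather than through
 * its libc stub.  SYS_getpid stands in for a newly added SYS_* constant.
 */
static long
getpid_by_number(void)
{
        return ((long)syscall(SYS_getpid));
}
```

Calling by number is only a convenience for early testing; shipped code should use the prototype declared in the header and the libc symbol described below.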

In addition to adding the syscall to the kernel's table, the symbol must be added to the symbol map in libc. This is done by adding entries to src/lib/libc/sys/Symbol.map. Each syscall results in three symbols. In the example of cpuset_setid, they are cpuset_setid, _cpuset_setid, and __sys_cpuset_setid. The plain cpuset_setid symbol should be added to the most recent namespace map (currently FBSD_1.2 for new syscalls introduced in 9.0-CURRENT, though this changes over time) as per http://people.freebsd.org/~deischen/symver/freebsd_versioning.txt. By convention, we place the other two in the FBSDprivate_1.0 map.

Implementing a syscall

The actual system calls of type STD are implemented by a function with a prototype of the form:

/* XXX: Padding members removed for clarity.  See sysproto.h for details. */
struct cpuset_setid_args {
        cpuwhich_t which;
        id_t id;
        cpusetid_t setid;
};
/* XXX: variable names below are typical, but are not defined in sysproto.h. */
int     cpuset_setid(struct thread *td, struct cpuset_setid_args *uap);

The return value of this function becomes the value of errno in userspace; if it is non-zero, the userspace function returns -1. On success, an explicit return value can be set in td->td_retval[0]. In the cpuset_setid case, this function resides in sys/kern/kern_cpuset.c.
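
The return convention can be modeled in plain userspace C. The sketch below is not kernel or libc code; it only mimics what the caller observes, with kern_error standing for the value returned by the kernel implementation and retval0 standing for td->td_retval[0]. The function and parameter names are hypothetical.

```c
#include <errno.h>

/*
 * Userspace model (not kernel code) of the convention described above.
 * kern_error is what the kernel implementation returned; retval0 plays
 * the role of td->td_retval[0].  Both names are hypothetical.
 */
static long
stub_result(int kern_error, long retval0)
{
        if (kern_error != 0) {
                errno = kern_error;     /* error number surfaces in errno */
                return (-1);            /* and the caller sees -1 */
        }
        return (retval0);               /* success: the explicit return value */
}
```

In this model, an implementation that returns EINVAL leaves the caller with -1 and errno set to EINVAL, while one that returns 0 hands back whatever was stored as the explicit return value.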

Auditing

Security event auditing allows fine-grained logging of security-relevant events in the operating system, and is described by the CAPP common criteria protection profile. Detailed information on configuring audit can be found in the FreeBSD Handbook http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/audit.html, and an implementation paper can be found on the TrustedBSD website http://www.TrustedBSD.org/.

Most system calls are considered security-relevant, as they frequently involve access control checks, communication between processes, or system configuration, and therefore almost all system calls will be audited. You can find more detailed information on the AddingAuditEvents page. Please do not forget to add appropriate auditing to your new system call, be it a native call or a compat one!

Please e-mail the trustedbsd-audit mailing list, or RobertWatson, if you require help. The audit event identifier space is managed by the OpenBSM project, and you will need to request assignment of a new identifier if an existing one isn't a good match.

Adding 32-bit compatibility

In addition to the main implementation, all new syscalls must include a 32-bit compatibility definition. For syscalls where all arguments are of identical size and layout or where arguments are 32-bit on 32-bit systems and 64-bit on 64-bit systems such as pointers, long ints, or size_t variables, it is sufficient to add a definition of type NOPROTO to sys/compat/freebsd32/syscalls.master:

484     AUE_NULL        NOPROTO { int cpuset(cpusetid_t *setid); }

As with sys/kern/syscalls.master, files in sys/compat/freebsd32 need to be regenerated after a change to sys/compat/freebsd32/syscalls.master:

$ cd sys/compat/freebsd32/
$ make sysent
mv -f freebsd32_sysent.c freebsd32_sysent.c.bak
mv -f freebsd32_syscalls.c freebsd32_syscalls.c.bak
mv -f freebsd32_syscall.h freebsd32_syscall.h.bak
mv -f freebsd32_proto.h freebsd32_proto.h.bak
sh ../../kern/makesyscalls.sh syscalls.master syscalls.conf
$

In other cases, things are more complex. Two key cases are 64-bit arguments such as off_t and id_t, and pointers to objects of different sizes and/or layouts such as size_t or struct statfs. In the former case, 64-bit arguments are split into two arguments when actually passed. As a result, the function needs to be defined with appropriate arguments in syscalls.master and use an implementation that stitches the pieces back together. For example, this is the sys/compat/freebsd32/syscalls.master definition for cpuset_setid:

485     AUE_NULL        STD     { int freebsd32_cpuset_setid(cpuwhich_t which, \
                                    uint32_t idlo, uint32_t idhi, \
                                    cpusetid_t setid); }

and this is the implementation in sys/compat/freebsd32/freebsd32_misc.c:

int
freebsd32_cpuset_setid(struct thread *td,
    struct freebsd32_cpuset_setid_args *uap)
{
        struct cpuset_setid_args ap;

        ap.which = uap->which;
        ap.id = (uap->idlo | ((id_t)uap->idhi << 32));
        ap.setid = uap->setid;

        return (cpuset_setid(td, &ap));
}

In this case, it is sufficient to simply create a new argument pointer and pass it in to the primary implementation.
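
The stitching step itself can be tried in isolation. The following is a minimal userspace sketch, assuming the low half arrives in lo and the high half in hi as in the idlo/idhi arguments above; the helper name join_halves is hypothetical and is not a FreeBSD function.

```c
#include <stdint.h>

/*
 * Combine the low and high 32-bit halves of a split 64-bit argument.
 * The high half is widened to 64 bits before shifting; shifting a
 * 32-bit value by 32 would be undefined.
 */
static int64_t
join_halves(uint32_t lo, uint32_t hi)
{
        return ((int64_t)(lo | ((uint64_t)hi << 32)));
}
```

The widening cast is the part that is easy to forget: without it the shift operates on a 32-bit value and the high half is lost.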

In the case of pointers to differently sized or aligned arguments, things get more complex. Calls to copyin and copyout must be altered so that correctly formatted data is passed to the primary implementation function and returned to userspace. In this case, a kern_syscall function is usually defined which is called to do the actual work by both the normal and 32-bit compatibility interface functions which simply handle marshaling of data. For example, let us consider a hypothetical syscall foo which takes a pointer to a buffer and a pointer to the size of the buffer and copies data to the buffer and the amount of data copied to the size pointer. The declaration of foo in sys/kern/syscalls.master would look like:

666     AUE_NULL        STD     { int foo(char *buf, size_t *bufsize); }

The primary implementation would look like this (ignoring kern_foo(), since we don't care what it does):

int
foo(struct thread *td, struct foo_args *uap)
{
        int error;
        size_t bufsize;

        if ((error = copyin(uap->bufsize, &bufsize, sizeof(bufsize))) != 0)
                return(error);

        if ((error = kern_foo(td, uap->buf, &bufsize)) != 0)
                return(error);

        return (copyout(&bufsize, uap->bufsize, sizeof(bufsize)));
}

In sys/compat/freebsd32/syscalls.master we would need a somewhat different definition:

666     AUE_NULL        STD     { int freebsd32_foo(char *buf, \
                                    uint32_t *bufsize); }

Likewise, the implementation is a little more complex:

int
freebsd32_foo(struct thread *td, struct freebsd32_foo_args *uap)
{
        int error;
        uint32_t bufsize32;
        size_t bufsize;

        if ((error = copyin(uap->bufsize, &bufsize32, sizeof(bufsize32))) != 0)
                return(error);
        bufsize = bufsize32;

        if ((error = kern_foo(td, uap->buf, &bufsize)) != 0)
                return(error);

        bufsize32 = bufsize;
        return (copyout(&bufsize32, uap->bufsize, sizeof(bufsize32)));
}
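
The same pattern extends to structures whose layout differs between the two ABIs, such as the struct statfs case mentioned earlier. As a rough sketch, assume a hypothetical struct bar containing a long and a pointer: the 32-bit wrapper would copy in the 32-bit layout and widen it field by field before calling the shared kern_ function. Neither structure nor the helper below exists in FreeBSD; they only illustrate the marshaling step, and the copyin itself is omitted so that the fragment compiles in userspace.

```c
#include <stdint.h>

/* Hypothetical native structure as seen by a 64-bit kernel. */
struct bar {
        long            b_count;
        void            *b_addr;
};

/* Hypothetical layout of the same structure in a 32-bit process. */
struct bar32 {
        int32_t         b_count;
        uint32_t        b_addr;
};

/* Widen each field of the 32-bit layout into the native layout. */
static void
bar32_to_native(const struct bar32 *in, struct bar *out)
{
        out->b_count = in->b_count;                     /* sign-extended */
        out->b_addr = (void *)(uintptr_t)in->b_addr;    /* zero-extended */
}
```

Results travelling back to userspace need the reverse conversion into the 32-bit layout before copyout, in the same way bufsize is narrowed to bufsize32 above.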

Testing 32-bit compatibility

32-bit compatibility is somewhat tricky, so it is important to test it. If you have access to an amd64 machine, this is easy provided there is a utility in the base system you can use for testing. Just get your new code up and running normally, then buildworld with TARGET=i386. You can then grab the executables you need from /usr/obj and run your tests with them.

Committing

We commit syscalls in two steps. First we commit the implementation along with the syscalls.master files. Then we run "make sysent" in sys/kern and sys/compat/freebsd32 and commit the generated files. This is done so that the generated files contain the right values in their "created from FreeBSD:" lines.

MFCing

Some time prior to an MFC, you should add stub entries for your syscalls to ease rollback. This can help prevent problems for users who upgrade, install applications that use the new system call, and then downgrade their kernels, which results in those applications being killed with SIGSYS. If the new feature is used by login or sh, this can be very bad. An example of adding such stubs can be found in revision 156791.

AddingSyscalls (last edited 2013-11-06 22:26:09 by HirenPanchasara)