Author: Steven Danneman
One prerequisite for implementing an efficient multi-threaded userspace file server is the ability to set user credentials on a per-thread basis, rather than the per-process basis that POSIX allows.
This is necessary so that each thread can impersonate a separate connected user, and thus have the kernel handle all filesystem access checks for the file server daemon.
Though there is no POSIX defined interface to set and retrieve per-thread credentials, Apple in their Darwin OS has implemented several non-portable syscalls to accomplish this task. They are defined as:
int pthread_setugid_np(uid_t, gid_t)
int pthread_getugid_np(uid_t *, gid_t *)
int setgroups(int, const gid_t *)
Through this API, per-thread credentials are set in two separate calls, one for UID and GID and a second for supplementary group list. A thread's cred can then be reverted to that of the process by setting the UID to the well defined KAUTH_UID_NONE value.
There are a few details in this design that I dislike. First, separating the setting of the UID/GID from the setting of the supplementary groups invites application programming errors (forgetting to make the setgroups call) and also leaves the credential state incomplete between the two calls.
Second, the use of a well known UID is troublesome.
#define KAUTH_UID_NONE (~(uid_t)0 - 100)
Applying special meaning to any part of the UID/GID space is dangerous as nothing prevents administrators from using these IDs for general users and groups which will cause undesired results.
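To make the concern concrete, the following minimal sketch evaluates the sentinel expression quoted above and checks an ordinary account ID against it. The helper names uid_collides_with_sentinel() and print_sentinel() are made up for this illustration, and the macro is redefined locally rather than taken from a Darwin header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Sentinel expression as quoted above, redefined locally for illustration. */
#define KAUTH_UID_NONE (~(uid_t)0 - 100)

/* Hypothetical helper: returns 1 if an account's UID equals the sentinel. */
int uid_collides_with_sentinel(uid_t uid)
{
    return uid == KAUTH_UID_NONE;
}

/* Hypothetical helper: prints the numeric value of the sentinel. */
void print_sentinel(void)
{
    printf("KAUTH_UID_NONE = %ju\n", (uintmax_t)KAUTH_UID_NONE);
}
```

With a 32-bit unsigned uid_t the sentinel evaluates to 4294967195, which sits inside the range of assignable UIDs. An account created with that UID could never be impersonated through such an API, because setting it would be interpreted as a revert.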
My proposed implementation does not have these problems.
Currently, both the process and thread structures contain references to the same in-kernel credential structure. In almost all places, the cred pointer in the thread is used for checking access.
Thus, little work is necessary in the access check path of the kernel credentials. Instead, most of the work needed is implementing new syscalls and defining how they interact with the existing per-process cred syscalls.
These are the main tasks to implement:
Define new libthr library functions to set the per-thread credential.
int pthread_getcred_np(uid_t *uid, int *gidsetlen, gid_t *gidset)
The pthread_getcred_np() system call retrieves the real user ID of the calling thread, and stores the current group access list of the calling thread in the array gidset. The gidsetlen argument indicates the number of entries that may be placed in gidset. The pthread_getcred_np() function sets this variable to the actual number of groups returned in gidset, even on error. If the incoming gidsetlen is less than the number of groups in the credential, ERANGE will be returned and gidsetlen will be set to the number of groups. A successful call should always set gidsetlen to at least one, for the primary GID. If successful, the pthread_getcred_np() function will return zero. Otherwise an error number will be returned to indicate the error.
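The in/out behaviour of gidsetlen can be sketched in a small userspace model. This is not the proposed kernel code: struct model_cred, model_getcred_groups() and demo_getcred() are hypothetical names, the 16-entry array is an arbitrary size chosen for the model, and leaving *uid untouched on ERANGE is an assumption the text does not specify.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-in for a thread credential (not a kernel structure). */
struct model_cred {
    uid_t uid;
    int   ngroups;       /* counts the primary GID stored at index 0 */
    gid_t groups[16];    /* arbitrary capacity for this model */
};

/*
 * Model of the documented contract: *gidsetlen carries the buffer capacity
 * in and the group count out, and is updated even when ERANGE is returned.
 */
int model_getcred_groups(const struct model_cred *cr, uid_t *uid,
                         int *gidsetlen, gid_t *gidset)
{
    int capacity = *gidsetlen;

    *gidsetlen = cr->ngroups;        /* report the real count first */
    if (capacity < cr->ngroups)
        return ERANGE;               /* assumption: *uid is left untouched */
    *uid = cr->uid;
    for (int i = 0; i < cr->ngroups; i++)
        gidset[i] = cr->groups[i];
    return 0;
}

/* Caller pattern: probe with a small buffer, then retry with the reported size. */
void demo_getcred(void)
{
    struct model_cred cr = { 1000, 3, { 1000, 20, 80 } };
    gid_t small[1], big[16];
    uid_t uid = 0;
    int n = 1;

    assert(model_getcred_groups(&cr, &uid, &n, small) == ERANGE);
    assert(n == 3);                  /* required size was reported */
    assert(model_getcred_groups(&cr, &uid, &n, big) == 0);
    assert(uid == 1000 && n == 3 && big[0] == 1000);
}
```

Reporting the count on ERANGE lets a caller size its buffer in at most two calls, much as callers of getgroups() typically do.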
int pthread_setcred_np(uid_t uid, int gidsetlen, const gid_t *gidset)
The pthread_setcred_np() system call sets the real, effective, and saved UIDs and GIDs, along with the supplementary group access list of the current thread. The first parameter, uid, can be any UID. The second parameter, gidsetlen, indicates the number of entries in the array and must be no more than NGROUPS, as defined in <sys/param.h>. The third parameter, gidset, is an array of GIDs, the first being the primary GID to set. This call will replace all other supplementary groups in the credential. This function may only be called if the process has super-user privileges. If successful, the pthread_setcred_np() function will return zero. Otherwise an error number will be returned to indicate the error.
int pthread_revertcred_np(void)
The pthread_revertcred_np() system call reverts the thread's credential, including the real, effective, and saved UIDs and GIDs, to the per-process credential. This function may only be called if the process has super-user privileges. If successful, the pthread_revertcred_np() function will return zero. Otherwise an error number will be returned to indicate the error.
Matching syscalls gettcred(), settcred(), and reverttcred() implement the library functions.
We require the caller to set the UID, GID, and supplementary groups in a single syscall. This is a hint to the application programmer that all three of these pieces of information are necessary to create a full credential. It also avoids the race between a setuid() and a setgroups() call where the kernel credential is incomplete.
However, one side effect of this implementation is that only the super-user can make these calls. If there were separate pthread_seteuid_np() and pthread_setegid_np() calls, an unprivileged user who intended to set the euid/egid to the real or saved uid/gid for that thread would be allowed to do so. Furthermore, instead of checking privileges against the current thread credential, we check them against the process credential. The process itself must be running as a privileged user in order for any thread to call pthread_setcred_np() or pthread_revertcred_np(). This provides more consistency in the interaction between setting process credentials and setting thread credentials. Only a thread running in a process that has privileged credentials may revert back to the credentials of that process. This is the same behavior as the Apple implementation.
We don't want to assume that the FreeBSD kernel will always provide a 1-to-1 mapping between userspace threads and kernel threads. Thus we add an explicit pthread_getcred_np() function to retrieve the uid/gid/groups from the thread.
Regarding the interaction between the per-process credential and the per-thread credential: each is considered a separate entity. Thus, if no per-thread credential has been explicitly set, a call to retrieve that per-thread credential will return an ENOENT error. The explicitly set per-thread credential will always take precedence over the per-process credential. For example, if pthread_setcred_np() is called setting a UID of 1000, and then setuid() is called setting a UID of 2000, that thread will retain the UID of 1000.
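The precedence and privilege rules described so far can be summarised in a userspace model. Everything here is hypothetical: the model_* names are invented for illustration, credentials are reduced to a single UID, super-user privilege is modelled as a process UID of 0, and EPERM is an assumed error code, since the text only says the call is not permitted.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical model types; none of these exist in FreeBSD. */
struct model_proc {
    uid_t uid;                /* per-process credential, UID only */
};

struct model_thread {
    struct model_proc *proc;
    int   has_cred;           /* nonzero once the thread has its own credential */
    uid_t uid;                /* per-thread credential, valid if has_cred */
};

/* Privilege is judged on the process credential; modelled here as UID 0. */
static int proc_is_privileged(const struct model_proc *p)
{
    return p->uid == 0;
}

int model_setcred(struct model_thread *td, uid_t uid)
{
    if (!proc_is_privileged(td->proc))
        return EPERM;         /* assumed error code */
    td->uid = uid;
    td->has_cred = 1;
    return 0;
}

int model_revertcred(struct model_thread *td)
{
    if (!proc_is_privileged(td->proc))
        return EPERM;         /* assumed error code */
    td->has_cred = 0;
    return 0;
}

/* ENOENT when no per-thread credential has been set explicitly. */
int model_thread_getuid(const struct model_thread *td, uid_t *uid)
{
    if (!td->has_cred)
        return ENOENT;
    *uid = td->uid;
    return 0;
}

/* UID used for access checks: an explicit thread credential wins. */
uid_t model_access_uid(const struct model_thread *td)
{
    return td->has_cred ? td->uid : td->proc->uid;
}

void demo_precedence(void)
{
    struct model_proc p = { 0 };
    struct model_thread td = { &p, 0, 0 };
    uid_t uid = 0;

    assert(model_thread_getuid(&td, &uid) == ENOENT);
    assert(model_setcred(&td, 1000) == 0);
    p.uid = 2000;                            /* process calls setuid(2000) */
    assert(model_access_uid(&td) == 1000);   /* thread keeps UID 1000 */
    assert(model_revertcred(&td) == EPERM);  /* process no longer privileged */
}
```

The last assertion shows a consequence of checking the process credential: once the process has dropped privilege, a thread that set its own credential earlier cannot revert.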
We must also update P_SUGID inside these functions to designate that the process is "tainted" as defined in the issetugid() call.
Don't automatically update thread cred to process cred on traps
The cred_update_thread() helper function swaps a thread's current ucred with the one from its process at the beginning of each syscall and trap. We'll define a new private thread flag, TDP_SUGID, which will designate whether a thread has explicitly changed its credential. Inside cred_update_thread() we'll check whether this bit has been set, and if so we won't reset the thread's credential.
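The intended check can be sketched as a userspace model. This is not kernel code: the structures are simplified stand-ins that borrow kernel field names, the bit value chosen for the flag is an assumption, and the reference counting and locking a real kernel path would need are left out.

```c
#include <assert.h>
#include <sys/types.h>

#define MODEL_TDP_SUGID 0x1   /* hypothetical bit value for the new flag */

/* Simplified stand-ins for the kernel structures (model only). */
struct model_ucred  { uid_t cr_uid; };
struct model_proc   { struct model_ucred *p_ucred; };
struct model_thread {
    struct model_proc  *td_proc;
    struct model_ucred *td_ucred;
    int                 td_pflags;
};

/* Model of the proposed behaviour on syscall or trap entry. */
void model_cred_update_thread(struct model_thread *td)
{
    if (td->td_pflags & MODEL_TDP_SUGID)
        return;                           /* thread set its own cred; keep it */
    td->td_ucred = td->td_proc->p_ucred;  /* otherwise follow the process */
}

void demo_cred_update(void)
{
    struct model_ucred root = { 0 }, user = { 500 }, client = { 1000 };
    struct model_proc p = { &root };
    struct model_thread td = { &p, &root, 0 };

    p.p_ucred = &user;                    /* process changes its credential */
    model_cred_update_thread(&td);
    assert(td.td_ucred->cr_uid == 500);   /* thread follows the process */

    td.td_ucred = &client;                /* thread sets its own credential */
    td.td_pflags |= MODEL_TDP_SUGID;
    p.p_ucred = &root;
    model_cred_update_thread(&td);
    assert(td.td_ucred->cr_uid == 1000);  /* explicit credential survives */

    td.td_pflags &= ~MODEL_TDP_SUGID;     /* revert clears the flag */
    model_cred_update_thread(&td);
    assert(td.td_ucred->cr_uid == 0);     /* back to the process credential */
}
```

In this model a revert only needs to clear the flag, because the next call to the update function picks up the process credential again.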
cred_update_thread() isn't called from the trap handler on every architecture. Each architecture will need to be audited to see whether it should call this function or check the TDP_SUGID flag in another way.
Modify existing get[ug]id() calls to return process credential
The FreeBSD 6.2 get[ug]id() implementation returns information from the per-thread credential, which prior to this API was identical to the per-process credential. Now that these two structures can differ, we must maintain the semantic meaning of these existing POSIX calls and return user/group information from the process credential.
Audit all places where credentials are checked using the process credential and change these to use the thread's credentials.
Most access checks are made using the thread's credential, though it appears that a few VOP calls (see do_sendfile) and ifs code use p_ucred in 5.5. These will need to be updated to use td_ucred.
Add per-thread credential knowledge to utilities and kernel subsystems
Several utilities and kernel subsystems provide useful information about running processes. These should be updated to also print out per-thread credentials. My list so far is:
Upstream to FreeBSD
It's our intention for our design and implementation to be accepted into the main FreeBSD codebase. Zach Loafman has already started a dialog with key members of the FreeBSD community to gain acceptance for our proposed design.
- How per-thread credentials interact with Mandatory Access Control (MAC) in FreeBSD still needs to be investigated.
- How per-thread credentials interact with jails
- Use new privilege infrastructure in 7.3 instead of suser()
Add auditing events as discussed here: AddingSyscalls
FreeBSD man pages
- man setuid
- man getuid
- man setgroups
- man getgroups
- man getgrouplist