Improving the Linux compatibility layer in the FreeBSD kernel
The goal is to bring the emulated Linux syscalls up to Linux 2.6.16 compatibility. Without this compatibility, we are not able to update to a more recent linux_base port.
Test equipment
Linux Test Project: README and INSTALL are straightforward. Don't forget to mount devfs, linprocfs and linsysfs in the Linux chroot where you compile and test (I'm using gentoo-stage3 to have a complete build environment).
GNU libc nptl regression test: not as easy as LTP, but doable; precompiled stuff available
mmap fingerprinter: a small tool by Marcin Cieslak, extended by Jung-uk Kim, that determines the fingerprint of the mmap() call. It could also be used to extend the FreeBSD regression tests. Note: Marcin confirmed that mmap_test.c is public domain.
Darren Hart's futex test suite: this suite runs various futex-related functionality, performance and stress tests.
Trinity: a Linux system call fuzz tester: a package that calls syscalls at random, with random arguments, to find potential vulnerabilities and bugs.
LTP
Very easy to set up:
- Install emulators/linux_dist-gentoo-stage3
- Get the latest stable release of the LTP and copy it to the gentoo-stage3 directory
- cd into /usr/local/gentoo-stage3/, create the dev and sys directories, mount devfs, linprocfs and linsysfs, and then run "chroot /usr/local/gentoo-stage3 bash"
- To prepare the LTP, extract the tarball, cd into the newly extracted directory and type "./configure && make all install". This installs the LTP into /usr/local/gentoo-stage3/opt/ltp. To choose a different installation path, use "make DESTDIR=YOUR_FAVOURITE_PATH install".
- cd into the LTP installation directory and start a full LTP test run with "./runltp -p -l/var/tmp/results-log -o/var/tmp/results-output -C/var/tmp/results-failed -d/tmp"
Keep an eye on /usr/local/gentoo-stage3/tmp between tests and clean it up. You should also run "ipcs" before and after each run, compare the output, and remove any leftover IPC objects.
A slightly amusing note: we are even more compatible than a real Linux distribution (don't take this literally). LTP 20061222 does not even compile on Ubuntu 6.10. The error message is ../../generate.sh: 60: arith: syntax error: "cnt--" and the line in question is while [ $((cnt--)) -gt 0 ] ; do. If you unroll the loop by hand, the LTP fails later at run time, before it even starts the first test. We did not try to proceed further; we want to fix problems on FreeBSD, not on Linux.
Missing stuff
Syscalls
syscall | status |
add_key | not started |
adjtimex | not started |
capget | not started |
create_module | will not be implemented |
epoll_create | patch available, this syscall is needed by recent firefox-linux versions (e.g. 3.6.10 or 4.0 beta 6), problem with the patch: epoll does not work after fork()ing |
epoll_ctl | |
epoll_wait | |
fadvise64(_64) | not started, used by gnu-sort, ... |
fstatfs64 | not started |
futex | in RELENG_8_0+, problems when swapping out/mmaping etc., no support for clockrt (FUTEX_CLOCK_REALTIME [op=265]), additional patches available |
inotify_add_watch | not started |
inotify_init | not started, needed by acroread9, problem: unfinished implementation |
inotify_rm_watch | not started |
io_cancel | not started |
io_destroy | not started |
io_getevents | not started |
io_setup | not started |
io_submit | not started |
ioprio_get | not started |
ioprio_set | not started |
keyctl | not started, used by sshd, cron, ... |
lookup_dcookie | implemented in RELENG_8_0+ |
mbind | not started |
mincore | not started |
pipe2 | |
ppoll | not started |
prctl | partially implemented |
pselect6 | not started |
ptrace | implemented in RELENG_8_0+ |
quotactl | not started |
readahead | implemented in RELENG_8_0+ |
remap_file_pages | not started |
rt_sigqueueinfo | not started |
sendfile | not started |
sendfile64 | implemented in RELENG_8_0+ |
setfsgid | not started |
setfsuid | not started |
set_mempolicy | not started |
stime | not started |
swapoff | not started |
sysctl | will probably not be implemented |
sysfs | not started |
syslog | not started |
unshare | not started |
vhangup | not started |
waitid | not started |
The syscalls(2) manual page lists all syscalls up to linux-2.6-mainline, including the Linux kernel version in which each syscall first appeared. It is part of the Linux man-pages project maintained by Michael Kerrisk.
Right now the goal is to catch up with linux-2.6.16 syscall-wise. Once this goal has been reached, the next step will be linux-2.6-mainline syscall compatibility.
IPC
type | name | status |
11 | LINUX_GETPID | Appears to be implemented? |
12 | LINUX_GETVAL | Appears to be implemented? |
13 | LINUX_GETALL | Implemented (in r166008) |
17 | LINUX_SETALL | Implemented (in r166008) |
-257 | ?? | not started, probably a "has to fail" regression test sentinel |
ioctl
type | status | used by |
0x541c ('T',28) | not implemented | consoletype |
0x5801 ('X',1) | not implemented | quake4.x86 |
0x6d02 ('m',2) | not implemented | dd |
0x8905 ('M',5) | not implemented | ltp-20090930 (sockioctl01) |
0x8910 ('M',16) | not implemented | quake4.x86, opera 11 |
0x8914 ('M',20) | not implemented | ltp-20090930 (sockioctl01) |
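The type column pairs each 16-bit command value with a group byte and a command number. The following is a minimal userland sketch of that split, assuming the classic layout (high byte is the group, low byte is the number). The names split_ioctl, format_ioctl and struct ioctl_parts are hypothetical and not taken from the linuxulator sources. A group byte that is not printable ASCII, such as 0x89, is printed in hex by this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: the two parts of a 16-bit ioctl command. */
struct ioctl_parts {
	unsigned group;		/* bits 8..15, e.g. 0x54 == 'T' */
	unsigned number;	/* bits 0..7 */
};

/* Split a command value; only the low 16 bits are considered. */
static struct ioctl_parts
split_ioctl(unsigned cmd)
{
	struct ioctl_parts p;

	cmd &= 0xffff;
	p.group = (cmd >> 8) & 0xff;
	p.number = cmd & 0xff;
	return (p);
}

/* Render a command in the "0x541c ('T',28)" style used in the table. */
static void
format_ioctl(unsigned cmd, char *buf, size_t len)
{
	struct ioctl_parts p = split_ioctl(cmd);

	assert(buf != NULL && len > 0);
	if (p.group >= 0x20 && p.group < 0x7f)
		snprintf(buf, len, "0x%04x ('%c',%u)", cmd & 0xffff,
		    (int)p.group, p.number);
	else
		snprintf(buf, len, "0x%04x (0x%02x,%u)", cmd & 0xffff,
		    p.group, p.number);
}
```

Calling split_ioctl(0x541c) yields group 'T' and number 28, matching the first row of the table.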
futex operators and flags
operator | status |
FUTEX_WAIT | implemented |
FUTEX_WAKE | implemented |
FUTEX_FD | abandoned in 2006; removed in 2008 (2.6.26) |
FUTEX_REQUEUE | ? |
FUTEX_CMP_REQUEUE | implemented |
FUTEX_WAKE_OP | implemented |
FUTEX_LOCK_PI | unimplemented |
FUTEX_UNLOCK_PI | unimplemented |
FUTEX_TRYLOCK_PI | unimplemented |
FUTEX_WAIT_BITSET | implemented in 9.0 (r218117) |
FUTEX_WAKE_BITSET | implemented in 9.0 (r218117) |
FUTEX_WAIT_REQUEUE_PI | unimplemented |
FUTEX_CMP_REQUEUE_PI | unimplemented |
FUTEX_PRIVATE_FLAG | ignored |
FUTEX_CLOCK_REALTIME | ignored |
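The op value 265 mentioned in the syscall table combines an operator with a flag. The sketch below shows how such a value splits into command and flags, using the constants from the Linux <linux/futex.h> header. The struct futex_op and decode_futex_op names are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Constants as defined in the Linux <linux/futex.h> header. */
#define FUTEX_WAIT_BITSET	9
#define FUTEX_PRIVATE_FLAG	128
#define FUTEX_CLOCK_REALTIME	256
#define FUTEX_CMD_MASK		(~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME))

/* Hypothetical container for a decoded futex op argument. */
struct futex_op {
	int	cmd;		/* operator, e.g. FUTEX_WAIT_BITSET */
	bool	private_flag;	/* FUTEX_PRIVATE_FLAG was set */
	bool	clock_realtime;	/* FUTEX_CLOCK_REALTIME was set */
};

/* Separate the operator from the two flag bits. */
static struct futex_op
decode_futex_op(int op)
{
	struct futex_op d;

	assert(op >= 0);
	d.cmd = op & FUTEX_CMD_MASK;
	d.private_flag = (op & FUTEX_PRIVATE_FLAG) != 0;
	d.clock_realtime = (op & FUTEX_CLOCK_REALTIME) != 0;
	return (d);
}
```

For op = 265 this gives cmd 9 (FUTEX_WAIT_BITSET) with FUTEX_CLOCK_REALTIME set, since 265 = 256 + 9.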
Workarounds
If you run an application under the Linux Java that wants to use the Linux epoll functions (you should see "not implemented" messages in dmesg), you can start java with the argument -Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.PollSelectorProvider
PRs
21463: Kris Kennaway suggests the introduction of a sysctl switch which allows/disallows execution of Linux setugid binaries. This would protect FreeBSD from certain Linux userland vulnerabilities.
29698: Most of the problems are fixed; only kern.ipc.semmap is missing, but it is present in the code. Bug? We don't have ipcs installed in linux_base, but the Gentoo version works with the current code.
36952: The linuxulator can only distinguish between FreeBSD and Linux ELF binaries. When it comes to shell scripts, it cannot decide whether to run them with a native FreeBSD shell or a Linux shell as interpreter. This makes Linux Bourne-Again shell scripts fail, since /bin/bash isn't present in a regular FreeBSD installation. A patch was submitted along with the problem report, but it was never tested. It's unknown if it fixes the problem.
39201: If a process gets started with rfork(flags|RFLINUXTHPN) and then traced with ptrace(PT_ATTACH), calling waitpid() fails with ECHILD. Also ptrace(PT_DETACH) might kill the traced process (this hasn't been confirmed though). A patch has been submitted, but is outdated (4.6-RC) and needs to be rewritten.
44293: fixed
55835: This was committed in rev. 1.54 of compat/linux/linux-ipc.c
56451: proc/cpuinfo (linprocfs.c) reports wrong CPU model.: fixed in 9-current (214982).
72920: seems to be important for Oracle, the fix may be just prepending /compat/linux to the name variable in the args of bind() and connect()
73777: we need to look up the original intent of the code which is removed in this patch: the idea of removing the special handling for "/" in the linuxulator was abandoned.
77710: fixed in 7-current
93199: fixed
97326: fixed
99068: affects linux-java: fixed in stable/8 (r180768) and MFC'ed to stable/7 (r173628).
102897: errno problem?: fixed in rev 1.97 of linux_file.c (MFC'd to 6.x)
102956: fixed in 8-current (r192203) and MFC'ed to 7-stable (r194281).
133144: Linux has gone through two threading model changes. If a Linux application or library has been linked against the old pthreads without fast TLS support or pthreads with internal TLS support libraries it will segfault. This thread describes the threading situation under FreeBSD and Linux in detail: http://lists.freebsd.org/pipermail/freebsd-threads/2003-June/000530.html. This PR is a 2.6.16 stopper since it entirely prevents binaries/libraries compiled against an older Linux threading model from running. As long as this PR doesn't get resolved legacy code (2.4.2 emulation) cannot be removed.
138860: linux_socketcall() can lead to buffer overflows.
138880: munmap() segfaults after intensive linux_mmap2() stresstest
151714: lack of inotify support breaks Acrobat Reader 9
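The fix proposed for PR 72920 amounts to prepending /compat/linux to the socket path before it reaches bind() or connect(). The following is a userland sketch of that idea only. The function name prefix_socket_path is hypothetical, and the sketch does not reflect how the linuxulator performs path translation internally. It assumes that only absolute paths are rewritten and that a result which does not fit the destination buffer is an error.

```c
#include <assert.h>
#include <string.h>

#define LINUX_PREFIX	"/compat/linux"

/*
 * Hypothetical helper: write LINUX_PREFIX followed by name into out.
 * Returns 0 on success, -1 if name is not absolute or does not fit.
 */
static int
prefix_socket_path(const char *name, char *out, size_t outlen)
{
	size_t plen, nlen;

	assert(name != NULL && out != NULL);
	plen = strlen(LINUX_PREFIX);
	nlen = strlen(name);
	if (name[0] != '/' || plen + nlen + 1 > outlen)
		return (-1);
	memcpy(out, LINUX_PREFIX, plen);
	memcpy(out + plen, name, nlen + 1);	/* includes the NUL */
	return (0);
}
```

The length check matters because socket address path buffers are small and fixed in size, so a prefixed path can overflow where the original fit.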
TODO
in -current
- Futexes lack synchronization with VM and virtual memory independence.
- add the compat.linux.strict_emu sysctl (the name is not set in stone) and use it to enable/disable functionality that FreeBSD allows but Linux does not, e.g. opening more than one million files or calling read() on directories
- add detection of unhandled flags to syscalls (e.g. add each detected flag to a variable and compare this variable to the input; if there's a difference, print the difference)
- print a message if a known but unhandled flag (e.g. LINUX_O_NOATIME for open()) is used (maybe protected by bootverbose or debug, depending upon the severity and frequency of the use of this flag)
- change the non P1003_1B_MQUEUE part of linux_mq_*() into its own module (the changes for linux aio in p4@114975 can serve as an example)
- SO_PEERCRED: PID handling (compat/linux/linux_socket.c)
- add in the normal device driver entries into the device_handler stuff that are common to all Linux machines
- add in /dev/passX and if possible /dev/sdX support into the device_handler stuff (message "color scanner software for FreeBSD" on emulation@)
- have a look at the linux dev_t issue (HEADSUP message from phk to arch@ in March of 2005)
- cleanup/review of existing code
- Verify that each copyin()/copyout() is handled correctly. A quick grep revealed at least 5 copyout()s where the return value is ignored.
There are places where a function A calls function B, then calls copyin() on the result of function B, and only then checks the return value of function B for errors -> BOOM on error when calling copyin()
- safety-net like in rt_sigpending (but no "new" return values, KASSERT if possible, rt_sigpending needs to be reviewed regarding this)
- CLK_TCK still valid? / XXX comment: both in linux_misc.c
- add doxygen comments to the code
- Use 1:1 threads instead of processes for Linux threads?
- write the MD stuff for amd64 (seems to be done (p4 diff above), modulo bugs, needs testers)
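The unhandled-flag detection item in the list above (collect each recognized flag in a variable, then compare it with the input) can be sketched as follows. This is a userland illustration under stated assumptions, not linuxulator code: the names flag_map, open_flag_map and translate_open_flags are hypothetical, the left column uses Linux i386 open() flag values, and the right column uses FreeBSD open() flag values purely as example targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapping from a Linux flag bit to a native flag bit. */
struct flag_map {
	unsigned linux_flag;
	unsigned native_flag;
};

static const struct flag_map open_flag_map[] = {
	{ 00000001, 0x0001 },	/* O_WRONLY */
	{ 00000100, 0x0200 },	/* O_CREAT */
	{ 00001000, 0x0400 },	/* O_TRUNC */
	{ 00002000, 0x0008 },	/* O_APPEND */
};

/*
 * Translate the flags we know about and report the rest.
 * *unhandled receives every input bit that no table entry matched.
 */
static unsigned
translate_open_flags(unsigned lflags, unsigned *unhandled)
{
	unsigned handled = 0, native = 0;
	size_t i;

	assert(unhandled != NULL);
	for (i = 0; i < sizeof(open_flag_map) / sizeof(open_flag_map[0]); i++) {
		if (lflags & open_flag_map[i].linux_flag) {
			native |= open_flag_map[i].native_flag;
			handled |= open_flag_map[i].linux_flag;
		}
	}
	*unhandled = lflags & ~handled;
	return (native);
}
```

A caller would print a message when *unhandled is non-zero. With the Linux value 01000000 for O_NOATIME in the input, that bit ends up in *unhandled because the table has no entry for it.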
MFC
The following needs to be MFCed (incomplete list):
mmap() improvements: done
- bugfixes (commits need to be reviewed, telling us about specific stuff/commits would be appreciated): partly done, maybe some are outstanding, we have to check
utimes() syscall: done
clock_*() syscalls: done
rt_*() syscalls: done
linprocfs improvements: done
deCOMPAT43ify: done
PR77710: done
The following will not be MFCed (maybe incomplete list):
- TLS
- NPTL
Bugs
A note to users: Feel free to read the following list of things to fix, but do not draw conclusions from it. The Linux compatibility environment runs just fine. There are some broken edge cases which don't affect daily use. The following list is only meaningful for developers. Just because something is marked as a bug, it doesn't mean it doesn't work. It may be the case that some obscure error condition does not return the expected error value, or that a seldom used feature is not implemented.
Misc
- the linux-ldd doesn't work when it is not run from the Linux compat shell. When it is run from the Linux compat shell, it prints nothing for dynamic executables (it works correctly for libs). Because it is a shell script which calls ld-linux.so.2, the cause may be something which is tested in the LTP... or not
report by LTP and/or the linux pkill command: "2.4+ kernel w/o ELF notes? -- report this": what are ELF notes and why do we need them - the patch is available.: fixed
bug regarding mkdir, reported by "Steven Hartland" < killing@multiplay.co.uk > on emulation@ on 13 Jul 2006: trailing slash in mkdir() call results in failure: no longer reproducible in current
remove() does not delete empty directories (PR 102897), also reported with LTP: fixed in current
mmap problems (errors in LTP runs, not taking PROT_EXEC into account): fixed in current/stable
broken mmap behavior with PROT_EXEC: fixed in current/stable
- linux_lseek() silently truncates the offset modulo 2^32, lseek() in 2.6.x has error handling for this (EOVERFLOW). Unlike the FreeBSD lseek() (open(), ...) the Linux one can't handle 64bit file sizes.
- the linux "dmesg" utility does not work, missing syscall
LORs with 2.6.16: fixed in p4
problems with 2.6.16 and FC6 linux base (not available in the ports collection), primary issue: *at() calls: in HEAD.
zombie processes with 2.6.16 when closing linux-opera
utimes changes permissions of files, see http://lists.freebsd.org/pipermail/freebsd-emulation/2006-December/002937.html : fixed in current as of 2006/12/31 13:16:00
Automated test results
- The "ballista" testcases of the Linux Test Project should be run when we are finished with fixing bugs.
- The Open POSIX testsuite (comes with the LTP); we need a reference output from Linux to be able to compare this.
- Some of the LTP testcases which are not run by default.
- The glibc regression tests.