Adrian's Xen Hackery
Overview
This page is intended as a set of informal notes about my foray into FreeBSD/Xen. Just to clarify:
- Don't email me directly about this stuff; email the freebsd-xen mailing list at freebsd.org;
- I'm not being paid for this (at all!) and this is all very much personally motivated, so please don't demand help from me because you can't get it to work. Just email the freebsd-xen mailing list and see if people (me included) are available to give you a hand;
- I'm using FreeBSD-current so I can push fixes and improvements into the upcoming FreeBSD 8.0-RELEASE - no, I'm not currently using the 7.x or 6.3 Xen branches so please don't ask why things don't work if you're using those versions!;
- This stuff can and will change frequently;
- And FreeBSD/Xen is still under development.
If you'd like to talk to someone about commercial FreeBSD/Xen improvements, the man to talk to is Kip Macy. He's done the bulk of the FreeBSD/Xen port. I'm just trying to coax it to work.
Background
I've been attempting to make FreeBSD/Xen (under FreeBSD-current) work. My test environment is a CentOS 5.3 i386 server, single CPU AMD XP 2000+ with 2GB of RAM.
My commercial hosting services use Xen and Linux virtual domains. I'd like to eventually make all of this work well enough to offer stable FreeBSD Xen virtual domains on the same platform but as I said above, this is all by and large personal interest at the moment.
I've been attempting to mostly mirror the same deployment for FreeBSD/Xen as I have with Linux - using pygrub to boot a bootloader; supplying separate LVM exported logical volumes as filesystems rather than a whole disk image (so extending the filesystems is made much easier.)
Block Device Naming
Xen hijacks the Linux block device major/minor numbering scheme and stuffs it into the block devices available for the DomUs. So if you use "sda1" for your Xen jail, the Linux Xen DomU block device driver hijacks the major number from the SCSI device driver and uses it.
FreeBSD/Xen used to simply create an "xbd" device with the raw device number after it. This changed recently (thanks to dfr?) to somewhat emulate what Linux was doing. From what I can tell, using hdX and partitions (hdXa, hdXb, etc) will result in the relevant ATA devices being created in the kernel (ad0s1, ad1s1, etc.) Similarly for SCSI - sdX becomes daX in FreeBSD.
To get "xbd" devices (which I was using to make sure I had separately named devices that clearly shouldn't carry a normal DOS label) I need to use the major 202 (0xCA), with the unit number being minor >> 4. So 0xCA00 is xbd0, 0xCA10 is xbd1, 0xCA20 is xbd2, etc. This gives me xbd0 -> xbd15 before I run out of "xbd" slices. (Actually that isn't true but I won't assume I'll get any more. Those who are interested should read sys/dev/xen/blkfront/blkfront.c:blkfront_vdevice_to_unit() to see.)
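As a rough illustration of the arithmetic above, here is a minimal Python sketch of the major-202 case only. The function name xbd_unit_from_vdevice is made up for this example; it is not the kernel's blkfront_vdevice_to_unit() and it ignores every other device class that function handles.

```python
XBD_MAJOR = 202  # 0xCA, the major number described above


def xbd_unit_from_vdevice(vdevice):
    """Map a Xen virtual device number such as 0xCA10 to an xbd unit.

    Hypothetical helper: only the major-202 case from the text is covered.
    """
    major = vdevice >> 8      # upper byte is the major number
    minor = vdevice & 0xFF    # lower byte is the minor number
    if major != XBD_MAJOR:
        raise ValueError("not a major-202 device: %#x" % vdevice)
    return minor >> 4         # 16 minors per unit, so units 0..15


for vdev in (0xCA00, 0xCA10, 0xCA20, 0xCAF0):
    print("%#06x -> xbd%d" % (vdev, xbd_unit_from_vdevice(vdev)))
```

With an 8-bit minor and a shift of 4 this yields exactly the sixteen units (xbd0 to xbd15) mentioned above.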
How it all boots
Xen has two main boot methods - either loading a kernel image from the Dom0 filesystem or via "pygrub". Pygrub will fondle the disk image to find a /boot/grub/menu.lst file; load the kernel from that into the Dom0 filesystem and execute it. It also grabs some parameters from the kernel configuration line and appends them as arguments.
I first tried the former. This is an example configuration file:
memory = 256
name = "freebsd"
vif = [ 'mac=00:bd:c4:12:00:ef,bridge=xenbr0' ]
disk = [ 'phy:/dev/hosting2_data2/XEN_freebsd,hda,w' ]
on_crash = 'preserve'
extra = "boot_verbose=1"
extra += ",vfs.root.mountfrom=ufs:/dev/ad0s1a"
extra += ",kern.hz=100"
A Linux boot will take "ramdisk" and "root" parameters to define the ramdisk and root filesystem/options. FreeBSD/Xen doesn't seem to understand these; note how normal kernel environment hints are passed via "extra" instead, as a single comma-separated string.
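Since the configuration file appears to be evaluated as Python (the "extra +=" lines rely on it), the comma-separated "extra" string can also be built from a list of hints, which avoids forgetting a leading comma. This is only a sketch under that assumption; the "hints" name is made up for this example and the values are the ones from the configuration above.

```python
# Kernel environment hints as (name, value) pairs, in the order used above.
# "hints" is an illustrative name, not something Xen itself looks at.
hints = [
    ("boot_verbose", "1"),
    ("vfs.root.mountfrom", "ufs:/dev/ad0s1a"),
    ("kern.hz", "100"),
]

# Join into the single comma-separated string handed to the kernel.
extra = ",".join("%s=%s" % (name, value) for name, value in hints)

print(extra)
```

The resulting string is identical to the one assembled by the three "extra" lines in the example configuration.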
Also note that the virtual disk is just that - a completely virtual DOS disk, complete with DOS slices and a FreeBSD disklabel on ad0s1. Building this was relatively easy:
# truncate -s 10G disk.img
# mdconfig -f disk.img
(this outputs the attached unit; assume it's md0 here)
# fdisk -i md0
(follow the prompts to create one FreeBSD slice covering the entire disk)
# disklabel -i md0s1
(this creates a single partition "a" covering the whole FreeBSD slice.)
Making the Xen Filesystem
I just used a normal world/kernel build and install plus a distribution install to set up the basic filesystem. There are a couple of changes which are needed once you've done this.
# cd /path/to/world && make buildworld && make buildkernel KERNCONF=XEN
Assuming that you're using the above "md" file method to populate a filesystem:
# mount /dev/md0s1a /mnt
# cd /path/to/world && make DESTDIR=/mnt installworld && make DESTDIR=/mnt installkernel KERNCONF=XEN && make DESTDIR=/mnt distribution
Then, you need to edit a couple of files - /mnt/etc/fstab and /mnt/etc/ttys. Add a normal root drive entry to /mnt/etc/fstab and then add the following to /mnt/etc/ttys:
xc0 "/usr/libexec/getty Pc" vt100 on secure
Then, unmount, un-md, and copy over:
# umount /mnt
# mdconfig -d -u <unit number>
Using separate slices
It is possible to use separate slices for each filesystem (and swap.) I'm sure you can do it with mdconfig - just skip the DOS/FreeBSD label and newfs/mount the raw md device directly (eg /dev/md0.) I haven't tried this though; I ended up grabbing "sysutils/makefs" to create FFS filesystems for me.
# makefs -M512m root.fs /path/to/install/root
My configuration file for this has a slightly different disk device line:
disk = [ 'file:/home/adrian/xen/root.fs,0xCA00,w' ]
Note the entry uses a device id of 0xCA00 rather than the strangely over-loaded Linux-y device names (hda, etc.) This becomes "/dev/xbd0" inside the DomU, so your extra line would then become:
extra += ",vfs.root.mountfrom=ufs:xbd0"
Using pygrub
Xen installs these days use "pygrub", a GRUB style bootloader mostly written in Python. Xen's "bootloader" support allows an external program to determine the relevant Xen configuration sections.
I've had no luck trying to make pygrub work with a DOS disk image and FreeBSD disklabel - it needs to be taught to read FreeBSD slices (it has support for Solaris slices; so someone with some care could probably copy that and make it work.)
Instead, it -does- have support for raw UFS partitions and the UFS code seems to handle a UFS1 partition from "makefs" just fine. I've not yet tried UFS2.
"pygrub" does the following:
- It fondles the -first- disk device listed in the configuration file for a DOS partition label or FS signature;
- If it eventually finds an FS signature, it calls the FS specific code to open it up and read the GRUB config file (/boot/grub/menu.lst);
- It then either runs interactively or just boots the default;
- It will then copy the required kernel and ramdisk out into the Dom0 filesystem (/var/lib/xen/);
- Finally, it returns a configuration stanza to the Xen environment which Xen then uses to load the domain.
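To make the first step more concrete, here is a small Python sketch of what probing an image for a DOS partition label or a UFS signature can look like. It is not pygrub's code (I haven't traced how pygrub does its detection). The constants are the usual ones: the 0x55AA boot signature at byte 510, and the UFS magic at offset 1372 inside a superblock that sits at 8192 (UFS1) or 65536 (UFS2). The sketch assumes a little-endian image, and the name probe_image is made up.

```python
import struct

MBR_SIGNATURE_OFFSET = 510          # last two bytes of the first sector
MBR_SIGNATURE = b"\x55\xaa"
UFS_MAGIC_OFFSET = 1372             # offset of fs_magic inside the superblock
UFS1_SUPERBLOCK, UFS1_MAGIC = 8192, 0x00011954
UFS2_SUPERBLOCK, UFS2_MAGIC = 65536, 0x19540119


def _read_le32(image, offset):
    """Return the little-endian 32-bit value at offset, or None if too short."""
    chunk = image[offset:offset + 4]
    if len(chunk) < 4:
        return None
    return struct.unpack("<I", chunk)[0]


def probe_image(image):
    """Classify the leading bytes of a disk image (hypothetical helper).

    UFS is checked first because a filesystem with boot blocks installed
    may also carry the 0x55AA signature in its first sector.
    """
    if _read_le32(image, UFS1_SUPERBLOCK + UFS_MAGIC_OFFSET) == UFS1_MAGIC:
        return "ufs1"
    if _read_le32(image, UFS2_SUPERBLOCK + UFS_MAGIC_OFFSET) == UFS2_MAGIC:
        return "ufs2"
    if image[MBR_SIGNATURE_OFFSET:MBR_SIGNATURE_OFFSET + 2] == MBR_SIGNATURE:
        return "dos"
    return "unknown"
```

To try it, read the first 128KB or so of an image (for example the root.fs produced by makefs) and pass the bytes to probe_image().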
To boot a FreeBSD install on /dev/xbd0, I have the following Xen config file:
bootloader = "/usr/bin/pygrub"
memory = 256
name = "freebsd"
vif = [ 'mac=00:bd:c4:12:00:ef,bridge=xenbr0' ]
disk = [ 'file:/home/adrian/xen/root.fs,0xCA00,w' ]
on_crash = 'preserve'
Inside the FreeBSD install I have one file, /boot/grub/menu.lst, with the following:
title FreeBSD
root (hd0,0)
kernel /boot/kernel/kernel vfs.root.mountfrom=ufs:xbd0,kern.hz=100,boot_verbose=1
Finally, to test, you can run pygrub from the command line:
# pygrub /path/to/root.fs
It will then run interactively, pop up a menu, and then spit out a configuration snippet. The above menu.lst generates this config snippet:
linux (kernel /var/lib/xen/boot_kernel.XU_kel)(args "vfs.root.mountfrom=ufs:xbd0,kern.hz=100,boot_verbose=1")
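The relationship between the menu.lst entry and that snippet can be sketched in a few lines of Python. This is only an illustration of the transformation as observed from the outside, not pygrub's actual parser; the function names are made up, and the path of the copied kernel is passed in because pygrub picks that temporary name itself.

```python
def parse_menu_lst(text):
    """Parse a minimal GRUB menu.lst into a list of entry dicts.

    Hypothetical helper: handles only simple "keyword value" lines.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key == "title":
            current = {"title": value}   # every "title" starts a new entry
            entries.append(current)
        elif current is not None:
            current[key] = value
    return entries


def to_config_snippet(entry, copied_kernel):
    """Build the "linux (kernel ...)(args ...)" line for one entry."""
    parts = entry["kernel"].split(None, 1)
    args = parts[1] if len(parts) > 1 else ""   # everything after the path
    return 'linux (kernel %s)(args "%s")' % (copied_kernel, args)


MENU = """\
title FreeBSD
root (hd0,0)
kernel /boot/kernel/kernel vfs.root.mountfrom=ufs:xbd0,kern.hz=100,boot_verbose=1
"""

entry = parse_menu_lst(MENU)[0]
print(to_config_snippet(entry, "/var/lib/xen/boot_kernel.XU_kel"))
```

Everything after the kernel path on the "kernel" line ends up verbatim in "args", which is how the comma-separated hints reach the FreeBSD kernel.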
Finally, the running kernel environment, printed via kenv:
# kenv
vfs.root.mountfrom="ufs:xbd0"
kern.hz="100"
boot_verbose="1"
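The mapping from the args string to the kenv output above is a plain split on commas and then on the first "=". The Python sketch below reproduces that mapping for illustration; the real parsing happens in the FreeBSD/Xen kernel startup code, which I haven't checked, so treat the details (such as ignoring empty items) as assumptions. The function name is made up.

```python
def args_to_kenv(args):
    """Split a comma-separated "name=value" string into a dict.

    Hypothetical helper mirroring the observed kenv output only.
    """
    env = {}
    for item in args.split(","):
        if not item:
            continue                      # tolerate stray commas
        name, _, value = item.partition("=")
        env[name] = value
    return env


kenv = args_to_kenv("vfs.root.mountfrom=ufs:xbd0,kern.hz=100,boot_verbose=1")
for name, value in kenv.items():
    print('%s="%s"' % (name, value))
```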
Now. A few things to note:
- I bet the fact that the above pygrub hackery works is very, very undocumented.
- No, pygrub won't accept a BSD-y entry for root - eg root (hd0,0,a) - it isn't coded for that and will just error out.
- I haven't read the code to make certain that the args are always turned into the "extra=" style of Xen arguments, versus changing the "known" arguments (root, ramdisk, nfs_server, nfs_root, etc) into Xen arguments and the others into "extra=" arguments. As I said, I bet the fact that this works at all is a fluke.
Timekeeping
TODO - this needs looking at..
Console
The only console support at the moment is the serial-like "xencons". This doesn't hijack the normal cons25 device (ttyv0) so you must add an entry to /etc/ttys or no getty will be started:
xc0 "/usr/libexec/getty Pc" vt100 on secure
Remote GDB
The kernel remote GDB won't function. Kip has some basic instructions somewhere on bootstrapping Xen from source and then using the "xen-gdbserver" stuff to provide remote GDB for a domain. I haven't yet tried this so I can't document how to set it up or use it. Kip does say it works though.
Growing partitions online
TODO.. (haven't tried yet.)
Adding/Changing/Removing network devices online
TODO.. (haven't tried yet.)
Adding/Changing/Removing block devices online
TODO.. (haven't tried yet.)
I've grown the block device (lvresize), rebooted into single user mode and used growfs. Ugly but it works.
Memory Balloon Driver
The Xen code -has- a balloon driver to grow/shrink the DomU memory but I haven't yet tried it to ensure it works.
Network related stuff
There's a bunch of stuff in netfront which I should play with. Specifically tuning mbufs, the way it implements TCP segment offloading/checksum stuff (which I -believe- requires the Dom0 hardware to export it somehow to the DomU..) and other bits and pieces.
Rebooting issues
I think "reboot" doesn't force the Xen control tools to re-run pygrub and suck in a new kernel; I need to investigate what is going on there. I've noticed that my DomU crashes if I replace the kernel image and then restart. Normal restarting (without replacing the kernel) works fine.