"image"
To learn more about "image" you will find the overview page here.
Discussion during EuroBSDCon hackathon 2007
1. Panellists (unordered)
PawelJakubDawidek, IsaacLevy, BrooksDavis, PhilipPaeps, MarkoZec, SimonNielsen, BjoernZeeb
2. Notes by IsaacLevy
2.1. Overview of "Image"/Jail meeting
A number of projects are directly related to jail in the context of virtual servers, yet have more valuable roles and futures on their own. As PHK suggested in NotesEuroBSDConDevsummit2006, the group discussed migrating jail to "image" (a system image). However, a design strategy that keeps the various components separated prevents the jail/image from turning into too much of a singularity or tangent in the OS, and makes the valuable aspects of each component available to the system for use in other contexts and applications.
The following projects have direct impact on jailing:
- Marko's work: TCP/IP virtualization
- Pawel's work: ZFS/filesystem
- Brooks' work: resource control/job control (post-meeting note added below)
Jail Cleanup:
- get rid of jid
- go with jail names (unique)
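Dropping the jid in favour of unique names means that jail creation has to detect name collisions. A minimal sketch of such a check, assuming a small chained hash table; `name_registry`, `registry_contains` and `registry_add` are hypothetical names used for illustration, not existing kernel interfaces:

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64    /* illustrative size, must be a power of two */

struct name_entry {
    char *name;
    struct name_entry *next;
};

struct name_registry {
    struct name_entry *buckets[NBUCKETS];
};

/* FNV-1a string hash, reduced to a bucket index. */
static unsigned
name_hash(const char *s)
{
    unsigned h = 2166136261u;

    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return (h & (NBUCKETS - 1));
}

/* Return 1 if the name is already taken, 0 otherwise. */
int
registry_contains(const struct name_registry *r, const char *name)
{
    const struct name_entry *e;

    for (e = r->buckets[name_hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return (1);
    return (0);
}

/* Claim a name: 0 on success, -1 on collision or allocation failure. */
int
registry_add(struct name_registry *r, const char *name)
{
    struct name_entry *e;
    unsigned h = name_hash(name);

    if (registry_contains(r, name))
        return (-1);
    if ((e = malloc(sizeof(*e))) == NULL)
        return (-1);
    if ((e->name = malloc(strlen(name) + 1)) == NULL) {
        free(e);
        return (-1);
    }
    strcpy(e->name, name);
    e->next = r->buckets[h];
    r->buckets[h] = e;
    return (0);
}
```

A registry that starts zeroed accepts "www" once and rejects a second "www"; the lookup cost stays close to constant as long as the bucket count is sized for the expected number of jails.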
2.2. Discussion of a few problems with the current jail implementation:
2.2.1. Jails do not always stop and restart immediately
- hang from credential structure
- TCP timeout code...
- Jail will persist until timeout ends (also in devfs code?!?)
- bug in TTY code
- on lookup, devfs makes an entry using the credential of the process doing the lookup
- devfs will never remove this entry
- THE JAIL WILL NEVER GO AWAY (a killed jail persisting in jls(8) is most visible example)
- Possible solutions?
- always create tty with kernel credentials
- jails with unique names (performance problem with new jails? [scanning names, collision detection])
simply use a faster data structure if this becomes a problem
- Like separate exec syscall, (pre-populate jails)
- No chroot with open directory descriptor, but with open files OK.
- Would be nice to create empty jails and execute binary inside the jail
- Jail name (separate from hostname)
NO trust for hostname
- (never been a good strategy, in the 4.x era or the 5-6.x era; in real-world administration jail hostnames change often, and jailed users may simply need to change the hostname)
sidenote: Pawel has a patch for jail within a jail http://garage.freebsd.pl/mljail.README
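The persistence problem above comes down to reference counting: a jail can only go away once every credential pointing at it has been released. A toy user-space model of that chain, in which a device entry created during a lookup keeps the credential of the process that did the lookup. All type and function names here are hypothetical and only loosely mirror the kernel's structures:

```c
#include <stddef.h>

struct jail {
    int refs;    /* credentials pointing at this jail */
    int dead;    /* set once the last reference is dropped */
};

struct cred {
    int refs;
    struct jail *jail;
};

struct dev_entry {
    struct cred *cred;    /* credential captured at lookup time */
};

/* Create a credential that belongs to a jail. */
void
cred_init(struct cred *c, struct jail *j)
{
    c->refs = 1;
    c->jail = j;
    j->refs++;
}

void
cred_hold(struct cred *c)
{
    c->refs++;
}

/* Drop one reference; the last one also releases the jail. */
void
cred_free(struct cred *c)
{
    if (--c->refs == 0 && --c->jail->refs == 0)
        c->jail->dead = 1;
}

/* The problematic pattern: the entry keeps the caller's credential. */
void
dev_lookup(struct dev_entry *d, struct cred *caller)
{
    cred_hold(caller);
    d->cred = caller;
}
```

In this model, freeing the process's credential after `dev_lookup()` leaves `dead` at 0; the jail is only released when the entry's reference is dropped too. Creating the entry with a kernel credential, as proposed above, would avoid taking the reference on the jailed credential in the first place.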
2.2.2. Misc. jail issues
- get rid of sysctls
- manage sysctls per-jail
- raw sockets for one jail, not for another- etc...
Message buffer, be careful- it's quite large
- priv(9) in the kernel: fine-grained privileges make it possible to allow a mask of privs inside a jail
- Maximum set of privs. inside jail, then we can assign privs to jails.
- need to keep child jails in order
- Allow user to mount filesystems, etc...
- in Linux, users get different capabilities to mount filesystems (get ucaps, etc...)
- remove setuid root from ping, etc... or setuid gid...
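The "maximum set of privs, then assign privs to jails" idea can be sketched as a bitmask per jail, with a child jail's mask clipped to its parent's. The privilege bits and the `jail_privs` structure below are hypothetical and are not the priv(9) constants:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical privilege bits, for illustration only. */
#define JPRIV_RAW_SOCKET    (1u << 0)
#define JPRIV_MOUNT         (1u << 1)
#define JPRIV_SET_HOSTNAME  (1u << 2)

struct jail_privs {
    const struct jail_privs *parent;  /* NULL for a top-level jail */
    uint32_t allowed;                 /* privileges granted to this jail */
};

/* A child jail may never hold more than its parent does. */
void
jail_privs_init(struct jail_privs *jp, const struct jail_privs *parent,
    uint32_t requested)
{
    jp->parent = parent;
    jp->allowed = (parent != NULL) ?
        (requested & parent->allowed) : requested;
}

/* Return 1 if the jail holds every bit in priv, 0 otherwise. */
int
jail_priv_check(const struct jail_privs *jp, uint32_t priv)
{
    return ((jp->allowed & priv) == priv);
}
```

With this layout, raw sockets can be granted to one jail and withheld from another, and a child that requests raw sockets under a parent without them simply does not get the bit.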
2.3. Marko's virtualized network stack work
http://imunes.tel.fer.hr/virtnet/
2.3.1. Project Status
- Reasonably stable
- conditionally compilable
- bunch of macros, which can revert back to head with some exceptions of special cases
- removal should not harm system
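A rough sketch of how such conditionally compiled macros can work: stack code refers to its state through one set of names, which resolve either to fields of the current instance or to plain globals. The `VIRT_STACK` option, the `V_` names and `net_instance` are assumptions made for this example and are not taken from the actual patch:

```c
/* Build with -DVIRT_STACK to get per-instance state. */
struct net_instance {
    int ip_forwarding;
    int tcp_conn_count;
};

#ifdef VIRT_STACK
/* Virtualized build: the names resolve through the current instance. */
static struct net_instance default_instance;
static struct net_instance *cur_instance = &default_instance;
#define V_ip_forwarding   (cur_instance->ip_forwarding)
#define V_tcp_conn_count  (cur_instance->tcp_conn_count)
#else
/* Stock build: the same names collapse back to plain globals. */
static int ip_forwarding;
static int tcp_conn_count;
#define V_ip_forwarding   ip_forwarding
#define V_tcp_conn_count  tcp_conn_count
#endif

/* Stack code is written once, against the V_ names. */
int
tcp_conn_opened(void)
{
    return (++V_tcp_conn_count);
}

int
ip_forwarding_enabled(void)
{
    return (V_ip_forwarding != 0);
}
```

Because both builds expose the same names, compiling without the option yields code equivalent to the unvirtualized stack, which matches the "removal should not harm system" goal.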
2.3.2. Implementation Discussion
- The socket knows where it lives; each thread holds a pointer to the instance that needs to be worked on.
- Different threads can operate on different instances
- performance relies on current thread macro, (this is cheap, in the end)
- The Per-CPU macro feels cheap,
- Pawel note, it's not as cheap as reading a pointer
- The good thing about having this implicit propagation of the context to operate on:
- every socket is attached to one instance, one socket to each instance at all times
- one can always deduce state of a given process
Pawel sidenote, never knew which IP to operate on before...
- options:
- The one case where one cannot know is the timers
- cannot operate without context
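One way to picture the timer case: a timer has no calling thread to inherit a context from, so it has to carry its instance explicitly and set the per-thread pointer before its handler runs. A minimal sketch under that assumption; `cur_instance`, `net_timer` and `timer_fire` are hypothetical names, and `_Thread_local` requires a C11 compiler:

```c
#include <stddef.h>

struct net_instance {
    int id;
    int timer_fires;
};

/* Per-thread "current instance" pointer. */
static _Thread_local struct net_instance *cur_instance;

/* A timer has no caller to inherit from, so it stores its instance. */
struct net_timer {
    struct net_instance *instance;
    void (*handler)(void);
};

/* Handlers find their state implicitly through cur_instance. */
void
count_fire(void)
{
    cur_instance->timer_fires++;
}

/* Set the context, run the handler, then restore the previous context. */
void
timer_fire(struct net_timer *t)
{
    struct net_instance *saved = cur_instance;

    cur_instance = t->instance;
    t->handler();
    cur_instance = saved;
}
```

Saving and restoring the previous pointer keeps the handler from leaking its context into whatever the thread does next.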
2.3.3. Not done
- cleanup of the state.
- killing one, is messy.
- Problematic issue:
- proto pr_init
- record the sequence, so to explicitly instantiate instance,
- replay an instance
- Will replay in reverse order be captured correctly? For init, it seems to work fine.
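The record-and-replay idea can be sketched as a log of protocol hooks that is walked forwards to instantiate a stack and backwards to tear it down. Whether reverse replay is sufficient for cleanup was left open in the discussion, so this only illustrates the mechanism; `proto_hooks`, `proto_record` and the sample ip/tcp hooks are hypothetical:

```c
#include <stddef.h>
#include <string.h>

#define MAX_PROTOS 16    /* illustrative limit */

struct proto_hooks {
    const char *name;
    void (*init)(void *instance);
    void (*destroy)(void *instance);
};

static const struct proto_hooks *init_log[MAX_PROTOS];
static int init_count;

/* Record a protocol in the order it was first initialized. */
int
proto_record(const struct proto_hooks *p)
{
    if (init_count == MAX_PROTOS)
        return (-1);
    init_log[init_count++] = p;
    return (0);
}

/* Bring up a new instance by replaying the recorded sequence. */
void
instance_init(void *instance)
{
    int i;

    for (i = 0; i < init_count; i++)
        init_log[i]->init(instance);
}

/* Tear down in reverse, so dependents go before their dependencies. */
void
instance_destroy(void *instance)
{
    int i;

    for (i = init_count - 1; i >= 0; i--)
        if (init_log[i]->destroy != NULL)
            init_log[i]->destroy(instance);
}

/* Sample hooks: the "instance" is a string; each call appends a letter. */
static void
trace(void *instance, char c)
{
    char *s = instance;
    size_t n = strlen(s);

    s[n] = c;
    s[n + 1] = '\0';
}

static void ip_init(void *i)  { trace(i, 'I'); }
static void ip_fini(void *i)  { trace(i, 'i'); }
static void tcp_init(void *i) { trace(i, 'T'); }
static void tcp_fini(void *i) { trace(i, 't'); }

const struct proto_hooks ip_hooks  = { "ip",  ip_init,  ip_fini };
const struct proto_hooks tcp_hooks = { "tcp", tcp_init, tcp_fini };
```

Recording ip and then tcp makes `instance_init()` run the hooks in that order, and `instance_destroy()` undo tcp before ip.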
2.3.4. Before any cleanup can commence:
- the stack must be free of processes sticking
- sockets, and interfaces
- it doesn't attempt to do anything.
- For network emulation, it's cool.
2.3.5. Q: What about vlans- vlan ID collisions or not?
- Retain the association with parent id, you won't know it's a vlan id
- Can create independent Vlan interface inside or outside
- Can assign physical interface to virtual stack
- TSO and fancy stuff working at full speed...
Keep jail code and virtual stack separate, as well as other resource constraints (general consensus that this is an important idea)
Pawel sidenote: Then can we still call jail jail?
Re. #2, 'Dummynet' with regard to Resource Control...? Talk to Jeff Roberson about it. nice(1), renice(8) discussion.
2.4. Virtualize filesystems?
2.4.1. Big Change Order:
3. Addendum, regarding Brooks' work on Process Scheduling
3.1. Notes from an informal conversation with Brian Redman, regarding Process Scheduling
3.2. HOG Source Code, a simple utility to hog memory:
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
/* written by Brian Redman (BER), sometime around 1986
Disclaimer
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY
EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR HIS
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
*/
/*
Basic Instructions
Compile this code to a binary:
# cc hog.c -o hog
then run something like:
# hog 10m
- and the hog will do just that: sit and hog 10 MB of RAM.
(Without a k, m or g suffix the size is taken as bytes.)
To run a hog stampede, (a fork bomb):
# while (1)
# hog 99m&
# end
# note: BER has used this code to break nearly every stock UNIX system that's
# existed since 1986 or so, fork bombs are a complicated problem which nobody
# has eloquently resolved in a dynamic manner, to date (2006).
#
# This code is good for replicating overt system memory leaks as well.
*/
volatile sig_atomic_t dosleep = 0;    /* toggled by SIGUSR1 */
struct rlimit rl;

/* SIGUSR2: halve the soft RSS limit and report the old and new values. */
void catch2(int i) {
    printf("rlim_cur was: %ld\n", (long)rl.rlim_cur);
    printf("rlim_max was: %ld\n", (long)rl.rlim_max);
    rl.rlim_cur /= 2;
    setrlimit(RLIMIT_RSS, &rl);
    getrlimit(RLIMIT_RSS, &rl);
    printf("rlim_cur is: %ld\n", (long)rl.rlim_cur);
    signal(SIGUSR2, catch2);
}

/* SIGUSR1: toggle between touching memory and sleeping. */
void catch1(int i) {
    dosleep = !dosleep;
    signal(SIGUSR1, catch1);
}

int main(int argc, char *argv[]) {
    unsigned long i, n;
    long *ip, *p;
    unsigned long m = 1;
    size_t len;

    if (argc < 2 || (len = strlen(argv[1])) == 0) {
        printf("usage: hog size[k|m|g]\n");
        return 1;
    }
    signal(SIGUSR1, catch1);
    signal(SIGUSR2, catch2);
    printf("%d\n", (int)getpid());
    /* The cases fall through on purpose: g = 1024 * m, m = 1024 * k. */
    switch (argv[1][len-1]) {
    case 'g': m = 1024;
    case 'm': m = m * 1024;
    case 'k': m = m * 1024;
        argv[1][len-1] = '\0';
    }
    n = m * strtoul(argv[1], (char **)NULL, 10);
    getrlimit(RLIMIT_RSS, &rl);
    rl.rlim_cur = n+2*1024*1024;
    setrlimit(RLIMIT_RSS, &rl);
    if ((p = (long *)malloc(n)) != NULL) {
        printf("malloced %lu bytes\n", n);
        while (1) {
            while (dosleep) {
                sleep(10);
            }
            /* Touch every word so the pages stay resident. */
            ip = p;
            for (i = 0; i < n/sizeof(long); i++) {
                *ip++ = i;
            }
        }
    } else {
        printf("failed to malloc %lu bytes\n", n);
    }
    return 0;
}