TODO list for ports cluster items. Please do not edit without consulting portmgr. Thanks.
- all tasks on rackmac1 fail with "jail: unknown parameter: allow.nomount". Offlined.
- beefy1 offlined due to suspected bad RAM.
- dynode offlined. Unknown failure. Needs on-site hands.
- baytech-b2b. Solution: replace.
- Upgrade gohan09. Needed for pkgng.
- Bring nova online.
- Upgrade sol to 9.0.
debug failure modes
- Figure out why package1N failed so badly as i386 nodes but only on pointyhat-prime, not pointyhat-west.
- Figure out how to build i386-9 pxe for packageN.
- Figure out why i386-9 nodes at NYI are unreliable at > 4 maxjobs.
- beefy* can run with 48 jobs without too much trouble iff package*.nyi.freebsd.org are online. If they are offlined, beefy* will panic often unless backed down to 32 jobs.
- gohan09 seems to run into 'swap exhausted' even with a small-ish number of jobs. Should be able to run far more.
- ssh timeouts on beefy* and gohan09.
- fix the "truncated distfile" problem (occurs randomly and only under load). Possibly a problem in tmpfs.
- figure out why ganglia on -west just shows a blank page.
- pxebuild sync task.
- re-merge /a/pxeboot at nyi to reflect reality.
- looks like 2 copies of /usr/local/etc/ssh/ ?
- modify pxebuild script to also cross-build packages (at least, for i386).
- bring sparc64 boots up to 9-STABLE.
- auto-reboot from ddb?
- turn off periodic scripts on narutoN. (possibly for any that PXE boot?)
- clean up the NTP configuration problem.
- clean up the annoying messages from sendmail.
See general index on pointyhat.
- Eventually, once the new codebase is more mature, and one of the other machines has been in production for a while, switch over to the new codebase. At the same time, probably migrate HTTP serving of historical pages somewhere else. Once that is done, make this the new testbed for script debugging.
See general index on pointyhat-west.
- continue to write up the "how to create a package dispatch node" notes. They are starting to get scattered around.
- come up with a backup strategy, and finish zbackup and zexpire.
- once the software tasks below are completed, turn it over for general use.
- finish on-disk FreeBSD system install. (Right now, it just PXE boots.)
- once that is done, migrate the DHCP server over to it.
- use the above document to configure ports and scripts. Edit the document to reflect reality.
- all tasks as per pointyhat-west.
- test refactored errorlog parsing code.
- figure out a better restart strategy for qmanager. What we have now is far better than before but still imperfect.
- come up with a synchronization strategy for log results, config files, and pxe setup files.
- instantiate pollmachine as a task under qmanager. Use this as a way to remove XXX notify.
- figure out a better way to kill a hung build.
- switch the 'failures list' pages over to duds.verbose
- automate the 'failures list' pages
- debug problems with scripts (e.g. results of scp(1) not being checked). This is believed to be the source of the "trying to make package over and over" problem.
- fix transient database timeout problem on "uploaded packages" query on overview page.
- fix transient database timeout problem on dependency tree page.
- create ganglia-monitor-mode-client
- continue to quantify the performance that we need for nodes.
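
The two script items above (unchecked scp(1) results and killing a hung build) could be approached with one small wrapper. What follows is only a sketch, assuming a Python helper is acceptable; `run_checked` is a hypothetical name and is not part of the existing scripts. It runs a command in its own process group, kills the whole group if the timeout expires, and reports a non-zero exit status instead of ignoring it.

```python
import os
import signal
import subprocess


def run_checked(cmd, timeout):
    """Run cmd and return (ok, reason).

    Hypothetical helper for illustration; not taken from the cluster scripts.
    """
    # start_new_session=True puts the child in its own session and process
    # group, so the whole tree (for example make and its children) can be
    # signalled at once.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Treat the command as hung: kill the process group, then reap it.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()
        return False, "timed out after %s s" % timeout
    if rc != 0:
        # A failed copy must not be mistaken for a delivered package.
        return False, "exit status %d" % rc
    return True, "ok"
```

A caller would wrap each copy or build step, for example `ok, why = run_checked(["scp", src, dest], timeout=600)`, where `src`, `dest`, and the timeout are placeholders, and would stop or requeue the job when `ok` is false rather than carrying on as if the step had succeeded.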