Linux packages for pkg(8)
Student: ManuelWiesinger
Mentor: BaptisteDaroussin
Project description
The idea is to make pkg(8) able to install foreign packages (i.e. .deb/.rpm) from foreign repositories, for use with linux(4). This is done by first extending libpkg so that it can deal with foreign package sources, and then feeding the SAT solver of pkg(8) to resolve the dependencies and fetch/install whatever is necessary. I will start this project with Debian packages. If time allows, I will start to implement .rpm support as well. The biggest challenge is to use the SAT solver of pkg(8) for foreign dependencies.
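As an illustration of what feeding foreign dependencies to a solver involves, here is a minimal Python sketch (not libpkg code) that turns a Debian Depends field into a list of clauses. Commas separate requirements that must all hold, and `|` separates alternatives within one requirement. The name `parse_depends` and the tuple layout are assumptions made for this example; how pkg(8) actually represents solver input is not shown here.

```python
import re

# One dependency atom: name, optional ":arch" qualifier, optional
# "(op version)" constraint, optional "[arch list]" restriction.
_ATOM = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]+)(?::\w+)?\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<ver>[^)\s]+)\s*\))?\s*"
    r"(?:\[[^\]]*\])?\s*$"
)


def parse_depends(field):
    """Return a list of clauses; each clause is a list of alternatives.

    Every alternative is a (name, operator, version) tuple, where operator
    and version are None when the dependency is unversioned.
    """
    clauses = []
    for group in field.split(","):          # "," means AND
        if not group.strip():
            continue
        alternatives = []
        for atom in group.split("|"):       # "|" means OR
            match = _ATOM.match(atom)
            if match is None:
                raise ValueError(f"cannot parse dependency: {atom!r}")
            alternatives.append((match["name"], match["op"], match["ver"]))
        clauses.append(alternatives)
    return clauses


if __name__ == "__main__":
    print(parse_depends("libc6 (>= 2.14), debconf (>= 0.5) | debconf-2.0"))
```

Each inner list is one "at least one of these must be installed" clause, which is the shape a SAT-style solver typically works with.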
The recommended way to verify packages (used by most major Linux distros, including Debian) is gpgv. I think verifying Debian packages with OpenSSL can easily become very cumbersome and error prone. The simplest approach seems to be calling gpgv through an exec function. Unfortunately this means depending on GPLv3 code. In my eyes a reasonable compromise is to ask the user to install it manually if it is not already installed.
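The compromise described above can be sketched as follows. This is a Python illustration of the control flow only, not the C code in the repository: look for gpgv, ask the user to install it if it is missing, and otherwise run it on a detached signature. The names `verify_detached` and `GpgvMissing` are hypothetical; the only external interface assumed is the gpgv command line (`gpgv --keyring FILE SIGFILE DATAFILE`).

```python
import shutil
import subprocess


class GpgvMissing(RuntimeError):
    """Raised when the external verifier is not installed."""


def verify_detached(data_path, sig_path, keyring_path, tool="gpgv"):
    """Return True if `tool` accepts sig_path as a signature of data_path."""
    exe = shutil.which(tool)
    if exe is None:
        # GPLv3 code is not bundled; the user has to install it manually.
        raise GpgvMissing(f"{tool} not found, please install it manually")
    result = subprocess.run(
        [exe, "--keyring", keyring_path, sig_path, data_path],
        capture_output=True,
        text=True,
    )
    # gpgv exits with status 0 only if the signature verified.
    return result.returncode == 0
```

A caller would catch `GpgvMissing`, print the hint and abort the repository update rather than continue with unverified data.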
Deliverables
a forked git repository including:
- My adaptations, extensions and bug fixes, styled according to style(9)
- Suitable sections in the manpages
- Configuration examples
- ATF-based tests where useful
You can find the repo here: https://github.com/manufactory/pkg
Project Status
In a nutshell: I did not complete everything as planned. See Official Final Status for details. However, I'll keep hacking and send pull requests on GitHub.
Here is the list of suggestions that I will work on. This list will be updated:
Task | Status
Get the code compatible with the latest trunk of pkg | In Progress
Apply suggestions of bapt@ (mainly just exchange function calls) | In Progress
Consistently use pkgdb_* instead of sqlite | In Progress
Make pkg_parse_manifest() repo specific | to be done
Add the package type to pkg query | to be done
Make pkgdb_query() repo specific | to be done
Make pkg audit repo specific | to be done
Make pkgdb_* functions more generic, to avoid code duplication | to be done
Make functions in pkgdb_iterator.c less restrictive for non-binary functions | to be done
Make pkg_is_valid() repo specific | to be done
Write a pkg_is_valid() | to be done
Make pkg_add() repo specific | to be done
Integrate my pkg_add() | to be done
Write a pkg_search() | to be done
Changes in the architecture in detail:
First of all, when typing 'pkg something', pkg has to know somehow that it is not working with a 'normal' repo. For example, files should be installed to '/compat/linux'. This could be done by simply providing the repository configuration file when calling pkg.

libpkg/pkg_add.c
--------------------------
pkg_add_from_remote()
pkg_add_upgrade()
Minimal changes, just call the right pkg_add().

add.c:
-------------
exec_add():
.) Needs to open the right database according to the repo provided on the command line.
.) Needs to call the right pkg_add function. I suggest that a pkg_add() function is added to struct pkg_repo_op.

pkg_manifest.c
--------------------------
The manifest_keys are not repo specific. I'd move pkg_parse_manifest() to the repo-specific directory.
Functions that need to be adapted (that is, all non-static functions):
pkg_emit_filelist()
pkg_string()
pkgdb_register_pkg()
pkg_parse_manifest_fileat()
pkg_parse_manifest_file()
pkg_emit_object()
pkg_emit_manifest_file()
pkg_emit_manifest_sbuf()
pkg_emit_manifest()

src/query.c
--------------------------
print_query(): I'd love to see the pkg type here. This can be derived from the repo type.

pkg_query.c:
--------------------------
I think pkgdb_query should belong to struct pkg_repo_op. All occurrences need to be adapted accordingly. I could list them all here, but I think it would just look like I'm trying to make this mail look longer when I just paste the output of a grep -r here.

src/rquery.c and rquery.c:
--------------------------
Should work once query works nicely. Same adaptations as for query.c.

src/annotate.c and annotate.c:
--------------------------
Just need to call the right pkg_query. It probably makes sense here to exclude non-binary repos unless one is given explicitly. I cannot think of a sensible use case where an annotation is set for binary and .deb packages.

libpkg/pkg_audit.c
--------------------------
struct pkg_audit needs an entry for the audit type (e.g. DEB_AUDIT).
pkg_audit_process() then needs to call the right function. struct pkg_repo_op is probably a fine place for a specific parsing function.

audit.c:
--------------------------
Adapt warnings to allow filenames other than vuln.xml.
Should work once pkg_audit.c is adapted.

fetch.c:
--------------------------
Should work without any change once repo_ops.get_cached_name() and repo_ops.fetch_pkg() work.

libpkg/pkg_jobs.c:
--------------------------
jobs_solve_install_upgrade() should work nicely, but depending on the repo type not all flags may be available. However, since not all packages have e.g. shared libs anyway, this should work without change.

pkg_delete.c:
-------------
pkg_start_stop_rc_scripts() needs to be moved to repo_ops, if this is desired. It may be 'not so easy' to start e.g. Debian rc scripts or, even worse, systemd stuff.

libpkg/pkg_version.c:
--------------------------
pkg_version_cmp() imo needs to be moved to repo_ops.

pkgdb.c:
-------------
The SQL statements are database specific and imo it would make sense to move them to a header file in repo/_sometype_/.
The same applies to:
pkgdb_init()
pkgdb_register_pkg()
pkgdb_unregister_pkg()
pkgdb_vset()
pkgdb_open(): needs a small adaptation to open the given repository. Alternatively, a separate function can be created that opens the database specified for the given repo. This is necessary for e.g. pkg_add() to work.
pkgdb_obtain_lock(): can be reused. Of course all 'special' databases have to provide a lock table then.
Other functions (including pkgdb_release_lock(), pkgdb_close()) can safely be reused.

pkgdb_iterator.c
--------------------------
All functions in load_on_flag[] must not throw an error if something that is not mandatory does not exist. E.g. the Debian manifest provides no licenses.
For Debian the following values do not exist, or make little sense:
pkgdb_load_scripts (can be changed easily)
pkgdb_load_options
pkgdb_load_category (present in the Debian manifest too, but has nothing to do with FreeBSD categories)
pkgdb_load_license
pkgdb_load_user
pkgdb_load_group
pkgdb_load_shlib_required
pkgdb_load_shlib_provided
pkgdb_load_provides
pkgdb_load_requires

libpkg/pkg.c
--------------------------
pkg_is_valid() needs to be generic.

src/install.c:
--------------------------
pkg_flags f = PKG_FLAG_NONE | PKG_FLAG_PKG_VERSION_TEST;
There is no sensible package "pkg" for non-binary packages. This only needs to test for pkg if it is a binary repo.

How to avoid code copy pasting in my code:
----------------------------------------------------
There are many functions that can be used generically for different types, but that do not necessarily fit all future repo types. Thus I suggest providing generic functions which can be called by the repo-specific functions. This applies especially to the database functions. Maybe it is sensible to make them generic and just give them pointers to sql_prstmt plus an index. These are:
.init() - with pointer to sql_stmts
.access()
.open()
.close()
.stat()
.mirror_pkg() only makes sense if we want to mirror foreign repos; there are better tools for that imo.

--------------------------------------------------------------------
--------------------------------------------------------------------
TODO:
-----------------
pkg_search()

Small stuff:
-----------------
add.c: pkgdb instead of sqlite

usual small stuff mentioned before:
-----------------
kick out mktemp()
NELEM instead of STRLEN
autotools magic

todo:
-----------------
get repo/linux_deb/utils to repo
I nuked code for a generic way to put a pkg into the database. I simply cannot know what foreign manifests provide, and prepared statements are already there and repo dependent.

--------------------------------------------------------------------
brought in by bapt@ but not yet done:
--------------------------------------------------------------------
pkg_repo_util_check_gpg(): use posix_spawn()
pkg_repo_linux_deb_fetch_check_extract_packages() should say: only amd64 and i686 supported
several times: get rid of fgets() in favour of getline()

I hope I did not forget something important. I expect that there is stuff missing in this mail, which will show up during testing. However.
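The recurring suggestion in this mail, moving pkg_add() and friends into struct pkg_repo_op so that exec_add() simply calls the right one, amounts to a dispatch table keyed by repo type. The following Python sketch shows that idea only; `RepoOps`, `REPO_OPS`, the repo type names and the two add functions are hypothetical stand-ins, not the libpkg structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RepoOps:
    """Hypothetical per-repo-type operations, loosely modelled on struct pkg_repo_op."""
    install_root: str                  # where files of this repo type end up
    add: Callable[[str, str], str]     # repo-specific pkg_add()


def add_binary(pkg_file: str, root: str) -> str:
    # Placeholder for the existing FreeBSD package installation path.
    return f"install FreeBSD package {pkg_file} under {root}"


def add_linux_deb(pkg_file: str, root: str) -> str:
    # Placeholder for a Debian-specific implementation.
    return f"unpack Debian package {pkg_file} under {root}"


# One entry per repository type (type names are assumptions for this sketch).
REPO_OPS: Dict[str, RepoOps] = {
    "binary": RepoOps("/", add_binary),
    "linux_deb": RepoOps("/compat/linux", add_linux_deb),
}


def exec_add(repo_type: str, pkg_file: str) -> str:
    """Look up the ops for the given repo type and call its add function."""
    ops = REPO_OPS.get(repo_type)
    if ops is None:
        raise ValueError(f"unknown repository type: {repo_type}")
    return ops.add(pkg_file, ops.install_root)


if __name__ == "__main__":
    print(exec_add("linux_deb", "hello.deb"))
```

With this shape the generic front end never needs to know about '/compat/linux'; supporting another package type means adding one more table entry.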
Official Final Status
Here is my final status mail:
Fully integrating my code into pkg(8) was unfortunately not entirely possible within the scope of this year's Summer of Code. The architecture for installing packages is very specific and tailored to binary packages (i.e. common FreeBSD packages). Changing huge parts of an existing software architecture requires lots of discussion, which was not possible in time. I think it would have made little sense to spend much time getting it to work in a hackish way that is ugly and thus useless in the end. For this reason I mocked many functions in accordance with bapt@ and created a very detailed list of suggested changes to abstract functions so that they are repo specific. This list was mailed to bapt@ for evaluation.
Of course, it is important to me to fully integrate my code into pkg(8) and make it a part of FreeBSD. I will work through my list of suggestions step by step and open pull requests on GitHub. This way discussion can take place regardless of the availability of upstream developers. I hope that in the end this will greatly simplify development for other package types, such as RPMs (and possibly even Ruby gems, etc.).
There is fully working and fully integrated code for updating repositories, verifying the manifests using gpg, and creating/deleting databases. There is mocked code for comparing package versions according to Debian's specification and for analysing, registering and extracting Debian packages. This includes dependency and conflict parsing. This is basically all that is needed to use an official Debian repository.
I spent a lot of time learning the ropes, and I want to use the knowledge I gained to continue contributing to pkg(8). If possible I'll attend EuroBSDcon in Stockholm and present what I have so far.
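For readers unfamiliar with Debian's version ordering, the following Python sketch shows the comparison rules as I understand them from Debian Policy: an optional numeric epoch, then the upstream version and the Debian revision, each compared by alternating non-digit and digit runs, with `~` sorting before everything, including the end of the string. It is not the mocked C code from the repository, and `deb_version_cmp` and its helpers are names chosen for this example.

```python
import string

_DIGITS = set(string.digits)
_LETTERS = set(string.ascii_letters)


def _order(ch):
    """Sort weight of one non-digit character ('' stands for end of string)."""
    if ch == "~":
        return -1
    if ch == "" or ch in _DIGITS:
        return 0
    if ch in _LETTERS:
        return ord(ch)
    return ord(ch) + 256          # other characters sort after letters


def _cmp_part(a, b):
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit prefixes character by character.
        while (i < len(a) and a[i] not in _DIGITS) or (j < len(b) and b[j] not in _DIGITS):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Compare the following digit runs numerically (empty run counts as 0).
        si, sj = i, j
        while i < len(a) and a[i] in _DIGITS:
            i += 1
        while j < len(b) and b[j] in _DIGITS:
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def deb_version_cmp(a, b):
    """Return -1, 0 or 1 if version a is older than, equal to or newer than b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _cmp_part(ua, ub) or _cmp_part(ra, rb)
```

A function of this shape is what a repo-specific replacement for pkg_version_cmp() would have to provide for Debian repositories.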
Milestones
May, 25 (official start of coding)
Getting the code flexible enough to allow several repository types. Fix any bugs that turn up. It's not fun, but it makes life a lot easier: write tests to avoid regressions. Four weeks seem long, but being too fast is better than being too slow.
June, 26 (begin of mid-term evaluations)
Writing a (basic) backend for Debian packages. Start by defining test cases.
July, 10
Resolving dependencies using the pkg-SAT-solver
July, 24
Extra week to extend and improve the .deb backend, and to deal with any problems that come up
July, 31
pkg-audit-interface
August, 7
extra time
August, 17
scrubbing the code, write documentation and additional tests.
August, 21
extra time until firm end
August, 28 (official end)
Test Plan
- You can see my commit history on GitHub.
- Where applicable there will be ATF-based tests.
The Code
Useful links
Debian repositories: