Ports and Compiled Python Packages

Preface

CPython's various setup mechanisms (such as distutils and easy_install), as well as the CPython interpreter itself, create byte-compiled copies (.pyc and/or .pyo) of Python files at installation time or at run time. The interpreter uses these byte-compiled files primarily to speed up loading and importing modules when a program executes Python code; the code itself does not run any faster once it is loaded.
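As a small illustration, the sketch below byte-compiles a single file by hand with Python 3's standard py_compile module. This is not what the ports framework runs. Note that Python 3.2 and later write the result into a __pycache__ directory (PEP 3147) rather than next to the source, and Python 3.5 and later replace .pyo files with .opt-N.pyc files (PEP 488). The Python 2 interpreters this page deals with write foo.pyc and foo.pyo alongside foo.py.

```python
import importlib.util
import os
import py_compile
import tempfile


def byte_compile(source_path):
    """Byte-compile one .py file and return the path of the bytecode file written."""
    # doraise=True raises py_compile.PyCompileError on syntax errors
    # instead of only printing them.
    return py_compile.compile(source_path, doraise=True)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.py")
    with open(src, "w") as fh:
        fh.write("ANSWER = 42\n")
    pyc = byte_compile(src)
    # The default target is the cache location the import system consults.
    print(pyc == importlib.util.cache_from_source(src))
    print(pyc)
```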

Trouble in Userland

Currently, both package management tools, pkg_install and the upcoming pkgng, as well as the ports tree's make install and make deinstall mechanisms, install (and remove) Python packages including their byte-compiled files.

At packaging and installation time, the files are picked up by the fake-pkg target; a checksum is created for each of them and written into ${PKG_DBDIR}/${PKGNAME}/+CONTENTS (pkg_install) or into the package checksum file (pkgng).
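In pkg_install's +CONTENTS, each registered file is followed by an "@comment MD5:" line holding its checksum. The sketch below produces entries in that shape for a list of files. It illustrates the format only; the function name contents_entries is hypothetical and this is not the code pkg_install itself uses.

```python
import hashlib
import os


def contents_entries(prefix, relpaths):
    """Yield +CONTENTS-style lines: each file followed by its MD5 checksum comment."""
    for rel in relpaths:
        with open(os.path.join(prefix, rel), "rb") as fh:
            digest = hashlib.md5(fh.read()).hexdigest()
        yield rel
        yield "@comment MD5:" + digest
```

Because a .pyc file's bytes change whenever it is recompiled, an MD5 recorded this way goes stale as soon as that happens.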

When CPython creates a byte-compiled version of a .py file, it stores the source file's st_mtime and the interpreter's magic number in the byte-compiled file. It recompiles the file whenever the stored st_mtime no longer matches that of the .py file, or the stored magic number does not match that of the running interpreter.
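This check can be reproduced with the standard library. The sketch assumes the Python 3.7+ .pyc layout from PEP 552: a 16-byte header holding the magic number, a flags word, the source mtime and the source size. Python 2 headers, which this page is about, contain only the magic number and the mtime. Python 3 also compares the source size, so the sketch does too. The function names are illustrative.

```python
import importlib.util
import os
import struct


def read_pyc_header(pyc_path):
    """Return (magic, flags, mtime, size) from a Python 3.7+ .pyc header."""
    with open(pyc_path, "rb") as fh:
        header = fh.read(16)
    if len(header) < 16:
        raise ValueError("truncated .pyc header")
    # Bytes 0-3: magic number, then three little-endian unsigned 32-bit words.
    flags, mtime, size = struct.unpack("<III", header[4:16])
    return header[:4], flags, mtime, size


def needs_recompile(source_path, pyc_path):
    """Approximate CPython's staleness check for a timestamp-based .pyc."""
    magic, flags, mtime, size = read_pyc_header(pyc_path)
    if magic != importlib.util.MAGIC_NUMBER:
        return True  # written by an interpreter with a different magic number
    if flags != 0:
        raise ValueError("hash-based .pyc (PEP 552): no mtime recorded")
    st = os.stat(source_path)
    # CPython truncates both values to 32 bits before comparing.
    return mtime != (int(st.st_mtime) & 0xFFFFFFFF) or size != (st.st_size & 0xFFFFFFFF)
```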

Recompiling those files causes checksum mismatches in the package management tools. This can cause problems when auditing secured environments and can affect package build systems that verify package integrity. If post-install hooks trigger a recompile, checksum verification will also fail for every package that installs Python modules. Such a recompile is likely because both the st_mtime of a file and the installed Python interpreter's magic number may differ between the target system and the system on which the packages were built.
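The effect on package checksums is easy to demonstrate. In the Python 3 sketch below, recompiling an unchanged source file after only its st_mtime has moved yields a .pyc with a different MD5. That is the kind of mismatch a checksum-verifying package tool would report. The helper names are illustrative, and timestamp-based .pyc files are assumed.

```python
import hashlib
import os
import py_compile


def md5_of(path):
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def checksum_before_and_after_touch(source_path, pyc_path, shift_seconds=60):
    """Compile, shift the source st_mtime, recompile; return both .pyc MD5 digests."""
    mode = py_compile.PycInvalidationMode.TIMESTAMP
    py_compile.compile(source_path, cfile=pyc_path, doraise=True, invalidation_mode=mode)
    before = md5_of(pyc_path)
    st = os.stat(source_path)
    # Only the timestamp changes; the source text stays byte-for-byte identical.
    os.utime(source_path, (st.st_atime, st.st_mtime + shift_seconds))
    py_compile.compile(source_path, cfile=pyc_path, doraise=True, invalidation_mode=mode)
    return before, md5_of(pyc_path)
```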

Avoiding the Trouble

To avoid those issues, the simplest solution would be to not create checksums for those files, since the CPython interpreter will not stop (re)compiling the files as long as

FreeBSD port maintainers and the packaging tools are not in a position to decide for users whether files should be byte-compiled. Hence the three requirements above should be supported, but not enforced on users.

Solution for pkg_install

To avoid checksum mismatches with pkg_install (pkg_delete and make deinstall), ${PKG_DBDIR}/${PKGNAME}/+CONTENTS must not contain checksums for the byte-compiled files. Since changing pkg_install just for this case is not a viable option, the necessary code could go into the ports tree's Makefile infrastructure instead.

By enhancing the fake-pkg target in bsd.port.mk, we can easily strip the offending checksums while avoiding changes to ${TMPPLIST} that could cause side effects for other ports.

                ${ECHO_MSG} "===>   Registering installation for ${PKGNAME}"; \
                ${MKDIR} ${PKG_DBDIR}/${PKGNAME}; \
                ${PKG_CMD} ${PKG_ARGS} -O ${PKGFILE} > ${PKG_DBDIR}/${PKGNAME}/+CONTENTS; \
+               ${SED} -i '' -e '/\.py[co]$$/{n;d;}' ${PKG_DBDIR}/${PKGNAME}/+CONTENTS; \
                ${CP} ${DESCR} ${PKG_DBDIR}/${PKGNAME}/+DESC; \
                ${ECHO_CMD} ${COMMENT:Q} > ${PKG_DBDIR}/${PKGNAME}/+COMMENT; \
                if [ -f ${PKGINSTALL} ]; then \

Note: http://people.freebsd.org/~mva/python_checksum.patch will always contain the most recent version of the patch.
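In shell terms, the sed expression '/\.py[co]$/{n;d;}' keeps every line ending in .pyc or .pyo and deletes the line that immediately follows it. In +CONTENTS, that following line is the "@comment MD5:" checksum. The Python sketch below mirrors sed's behaviour. This includes the fact that the deleted line is not itself tested against the pattern. The function name is illustrative.

```python
def strip_bytecode_checksums(contents_lines):
    """Mirror sed '/\\.py[co]$/{n;d;}': keep .pyc/.pyo lines, drop the line after each."""
    result = []
    skip_next = False
    for line in contents_lines:
        if skip_next:
            # Like sed's 'd' after 'n': this line is removed without being matched.
            skip_next = False
            continue
        result.append(line)
        if line.endswith((".pyc", ".pyo")):
            skip_next = True
    return result
```

A more defensive variant would drop the following line only if it starts with "@comment MD5:". The sed one-liner relies on pkg_install always emitting that line directly after each file entry.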

Solution for pkgng

In contrast to pkg_install, pkgng reads ${TMPPLIST} and creates the checksums internally, so there is no opportunity for a post-processing step like the one used for pkg_install. Hence pkg register should be extended to accept a set of exclude patterns (e.g. regular expressions) and to skip checksum creation for files matching any of them.
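The sketch below shows how such an exclude list might behave. Everything in it is hypothetical: the proposed option does not exist yet, the function name and signature are invented for illustration, and SHA-256 is assumed as the checksum algorithm. Matching files are still registered, but without a checksum.

```python
import hashlib
import re


def checksum_with_excludes(paths, read_bytes, exclude_patterns):
    """Hypothetical: map each plist path to its SHA-256, or None if it is excluded."""
    excludes = [re.compile(p) for p in exclude_patterns]
    sums = {}
    for path in paths:
        if any(rx.search(path) for rx in excludes):
            sums[path] = None  # registered, but no checksum is recorded
        else:
            sums[path] = hashlib.sha256(read_bytes(path)).hexdigest()
    return sums


files = {"foo/__init__.py": b"X = 1\n", "foo/__init__.pyc": b"\x00bytecode"}
sums = checksum_with_excludes(files, files.__getitem__, [r"\.py[co]$"])
```

Passing read_bytes as a callable keeps the sketch independent of the file system; a real implementation would read the staged files.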

TODO: Implement/describe the pkgng extension.

Avoid byte-compiling

If checksums are no longer created, audits based on the packaging tools have a blind spot for the byte-compiled files: unless their st_mtime is tracked elsewhere, the auditor cannot tell whether they were modified, which is a potential security issue. Additionally, avoiding byte-compiled files saves disk space by reducing the installed package to the minimum set of necessary files.
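The sketch below estimates the disk space at stake by summing the sizes of all byte-compiled files below a directory such as site-packages. It counts .pyc and .pyo files, including Python 3's __pycache__ copies. The function name is illustrative.

```python
import os


def bytecode_footprint(root):
    """Return (file_count, total_bytes) for all .pyc/.pyo files below root."""
    count = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith((".pyc", ".pyo")):
                count += 1
                total += os.path.getsize(os.path.join(dirpath, name))
    return count, total
```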

pkg_install

To create a kind of frozen package without any byte-compiled files, byte-compiling should be made optional, leaving it up to the user whether it is wanted. For the ports tree, this means making compilation at installation time optional, which is possible because distutils and easy_install are aware of a --no-compile flag.

At the moment, PYDISTUTILS_INSTALLARGS enforces the -c -O1 options, causing .pyc and .pyo files to be created without giving users a chance to influence this behaviour. Giving them that choice is necessary, however, so that users do not have to deal with the potential security issues described above.
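For distutils' install command, -c is short for --compile (write .pyc files) and -O1 for --optimize=1 (additionally write optimized files, as produced by python -O). The --no-compile flag turns byte-compiling off. The Python 3 sketch below does roughly the same work with the standard compileall module. Here the "optimized" output is an .opt-1.pyc file (PEP 488) rather than a .pyo file. The function name is illustrative.

```python
import compileall


def compile_like_c_O1(directory):
    """Byte-compile a tree twice: plain .pyc (like -c) and optimization level 1 (like -O1)."""
    ok_plain = compileall.compile_dir(directory, quiet=1, optimize=0)
    ok_opt = compileall.compile_dir(directory, quiet=1, optimize=1)
    # compile_dir returns a true value only if every file compiled successfully.
    return bool(ok_plain and ok_opt)
```

Skipping this call entirely corresponds to --no-compile: no bytecode is written at install time, and the interpreter may still create it later at import time if it has write access.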

To make byte-compiling optional, several prerequisites have to be met.

distutils (and easy_install, which builds on distutils) distinguishes between C extensions and pure Python modules and uses different intermediate build directories for them. These need to be unified into a single directory so that the last requirement (auto-populating) can be met for packages that mix both.

This change enables us to track which Python files are installed as Python packages and modules (and hence would be byte-compiled by default).

-PYDISTUTILS_BUILDARGS?=
-PYDISTUTILS_INSTALLARGS?=      -c -O1 --prefix=${PREFIX}
+PYDISTUTILS_BUILDDIR?=         ${WRKSRC}/build/lib
+PYDISTUTILS_BUILDARGS?=                --build-platlib ${PYDISTUTILS_BUILDDIR} --build-purelib ${PYDISTUTILS_BUILDDIR}
+.if !defined(WITHOUT_PYTHON_BYTECOMPILE)
+PYDISTUTILS_COMPILEARGS?=      -c -O1
+.else
+PYDISTUTILS_COMPILEARGS?=      --no-compile
+.endif
+PYDISTUTILS_INSTALLARGS?=      ${PYDISTUTILS_COMPILEARGS} --prefix=${PREFIX}

To populate ${TMPPLIST} automatically, ${PYDISTUTILS_BUILDDIR} can now be scanned for *.py files. For each file found, a .pyc and a .pyo entry are added to ${TMPPLIST}.

.if defined(USE_PYDISTUTILS) && !defined(WITHOUT_PYTHON_BYTECOMPILE)
# site-packages directory relative to ${PREFIX}, as used in plist entries
_RELDIR=        ${PYTHONPREFIX_SITELIBDIR:S/^${PREFIX}\///}
add-plist-post: add-plist-pyc
# Prepend a .pyc and a .pyo entry for every .py file in the build directory
add-plist-pyc:
        @${RM} -f ${TMPPLIST}.pyc_tmp
        @for i in `${FIND} ${PYDISTUTILS_BUILDDIR} -type f -name '*.py'`; do \
                PYC=`${ECHO_CMD} $$i | ${SED} "s|\.py$$|.pyc|"`; \
                NEWC=`${ECHO_CMD} $${PYC} | ${SED} "s|${PYDISTUTILS_BUILDDIR}||"`; \
                NEWO=`${ECHO_CMD} $${NEWC} | ${SED} "s|\.pyc$$|.pyo|"`; \
                ${ECHO_CMD} "${_RELDIR}$${NEWC}" >> ${TMPPLIST}.pyc_tmp; \
                ${ECHO_CMD} "${_RELDIR}$${NEWO}" >> ${TMPPLIST}.pyc_tmp; \
        done; \
        ${CAT} ${TMPPLIST} >> ${TMPPLIST}.pyc_tmp; \
        ${CAT} ${TMPPLIST}.pyc_tmp > ${TMPPLIST}; \
        ${RM} -f ${TMPPLIST}.pyc_tmp

.endif
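The add-plist-pyc loop maps every .py file under ${PYDISTUTILS_BUILDDIR} to two plist entries relative to the site-packages directory. The Python sketch below performs the same mapping. The parameters build_dir and reldir stand in for ${PYDISTUTILS_BUILDDIR} and ${_RELDIR}, and the function name is illustrative. Like the make target, it emits Python 2 style .pyc/.pyo names next to the source path.

```python
import os


def bytecode_plist_entries(build_dir, reldir):
    """Return a .pyc and a .pyo plist entry for each .py file below build_dir."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(build_dir):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), build_dir)
            stem = rel.replace(os.sep, "/")[:-len(".py")]
            entries.append(f"{reldir}/{stem}.pyc")
            entries.append(f"{reldir}/{stem}.pyo")
    return entries
```

Unlike the shell loop, which follows find's output order, the sketch sorts directory and file names so that the generated entries are reproducible.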

The last remaining task is to remove the .pyc and .pyo entries from all pkg-plist files in the ports tree.
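The Python sketch below shows how that cleanup could be done for a single pkg-plist file. It assumes that every plist line ending in .pyc or .pyo is one of the entries now generated automatically. Ports that deliberately install bytecode by other means would need a manual review. The function name is illustrative.

```python
def clean_pkg_plist(path):
    """Remove .pyc/.pyo entries from a pkg-plist file in place; return how many were dropped."""
    with open(path) as fh:
        lines = fh.readlines()
    kept = [line for line in lines if not line.rstrip("\n").endswith((".pyc", ".pyo"))]
    if len(kept) != len(lines):
        with open(path, "w") as fh:
            fh.writelines(kept)
    return len(lines) - len(kept)
```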

Note: http://people.freebsd.org/~mva/pyc_compile.bsd.python.mk.patch will always contain the most recent version of the patch.

pkgng

TODO: Is it necessary for pkgng to do something here?

Python/CompiledPackages (last edited 2013-11-03 09:08:17 by KubilayKocak)