Nss-LDAP importing and nsswitch subsystem improvement
Contact: MichaelBushkov
Email: <bushman AT rsu DOT ru>
Possible mentors
Hajimu Umemoto and Brooks Davis (though he may not be available in June) agreed to be my mentors on this project.
Synopsis
During the Summer of Code 2005 I was lucky to work on improving FreeBSD's nsswitch subsystem and implementing caching for it. Almost all of the goals were achieved, and the code was eventually committed to -current. This proposal collects ideas that came to me during that work and ideas that I found in the FreeBSD projects list. The project consists of several tasks.
Task 1. Moving current nsswitch modules out of the libc
- Move the DNS source out of libc into a separate nss-module. This includes moving the HESIOD support out of libc. With the DNS code excluded from libc, we will also be able to move the resolver code out of libc.
- Move the NIS/YP source out of libc into a separate nss-module. This will make it possible to move all YP-related code from libc into a separate library.
- Moving the 'files' module out of libc is also worth doing. Once that is done, the whole nsswitch subsystem will be implemented in a uniform way (each module as a shared library). Moving 'files' out of libc will also make it possible to move the BDB code out of libc.
Another advantage of moving all modules out of libc into shared libraries is that, once it is done, nsdispatch() can be called from places other than libc without any problems. Currently nsdispatch() has the following prototype: int nsdispatch(void *retval, const ns_dtab dtab[], const char *database, const char *method_name, const ns_src defaults[], ...);
The dtab argument contains the addresses of the modules' functions, for example files_getpwent, files_getgrent and others. In addition, the mdata member of each dtab entry is used to customize the behaviour of these functions. This makes calls to nsdispatch() from places other than libc almost impossible, because the caller does not know how to fill the dtab structure: it has neither the addresses of the modules' functions nor the values expected in the mdata member. There are 2 ways of solving this problem:
- We can expose the nsswitch-module functions, or pre-filled dtab structures, so that they are visible outside libc. This is a bad way, IMHO, because a) both the function addresses and the dtab contents are implementation-specific and b) they can change.
- We can move all nsswitch modules into shared libraries. This will solve the problem automatically.
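The dtab problem can be illustrated with a simplified model. The sketch below is not the libc implementation: the demo_* names, the status codes and the three-argument callback are all hypothetical stand-ins chosen for illustration. It only shows why a caller must know both the callback address and the meaning of mdata before it can build a usable table.

```c
#include <stddef.h>
#include <string.h>

/* Demo status codes (hypothetical, not the real NS_* values). */
enum { DEMO_SUCCESS = 1, DEMO_NOTFOUND = 4 };

/* Hypothetical callback type: result pointer, private data, lookup key. */
typedef int (*demo_method)(void *retval, void *mdata, const char *key);

/* Simplified stand-in for a dtab entry. */
struct demo_dtab {
    const char *src;      /* source name, e.g. "files" */
    demo_method callback; /* address of the module function */
    void *mdata;          /* module-specific customization data */
};

/* Hypothetical "files" lookup: mdata carries the only name it knows. */
int demo_files_lookup(void *retval, void *mdata, const char *key)
{
    const char *known = mdata;

    if (strcmp(key, known) != 0)
        return DEMO_NOTFOUND;
    *(int *)retval = 0; /* pretend the entry has uid 0 */
    return DEMO_SUCCESS;
}

/* Walk the table in order and stop at the first source that succeeds. */
int demo_dispatch(void *retval, const struct demo_dtab dtab[], const char *key)
{
    for (size_t i = 0; dtab[i].src != NULL; i++) {
        if (dtab[i].callback(retval, dtab[i].mdata, key) == DEMO_SUCCESS)
            return DEMO_SUCCESS;
    }
    return DEMO_NOTFOUND;
}
```

A caller that cannot see demo_files_lookup, or does not know that mdata must point to a name string, cannot construct the table. This is the situation of code outside libc today; with every module in a shared library, the table entries would be resolved from the loaded modules instead.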
What will the ability to call nsdispatch() from outside libc give us? Currently I see at least 1 good application. We will be able to implement the 'perform-actual-lookups' logic in the caching daemon for all nsswitch databases. The advantage of the 'perform-actual-lookups' mode is that it provides a global cache, shared by all users, for the specified database. This can be useful, for example, for the 'hosts' database (and for any other database that would benefit from a global cache).
Task 2. Importing nss_ldap into the source tree
Currently nss_ldap can be installed from the ports collection. As it becomes more widely used, it will be useful to include it in the default FreeBSD distribution. There are 2 ways this could be done:
- We can import nss_ldap into libc.
- We can import nss_ldap as a shared nss-module.
With the first approach, we would probably have to import OpenLDAP into libc, and we would also inherit all the drawbacks of libc-integrated nsswitch modules that were mentioned above. So the 2nd way seems more suitable.
The nss_ldap files are covered by 2 kinds of license: the GNU Library General Public License and a variation of the BSD license. If there is a need, I can rewrite the files that are currently under the GNU LGPL. In addition, nss_ldap should be extended to support the pw_class member of the passwd structure.
Task 3. Regression tests
Now that the caching daemon has been committed to -current, there is a need for a set of regression tests for the nsswitch-related areas. These tests should check the performance and correctness of the nsswitch subsystem under various types of configuration. Such a set of tests will be very useful during all nsswitch-related work, such as moving the nss-modules out of libc and optimizing the caching daemon. The tests must cover all libc functions that make use of nsdispatch(): getpwbyXXX(), getgrbyXXX(), getservbyXXX(), gethostbyXXX(), getipnodebyXXX(), getaddrinfo(), getrpcbyXXX(), getnetbyXXX(), getprotobyXXX() and other such functions.
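One kind of check such a test set might contain is a consistency test between two lookup paths for the same database. The following is a minimal sketch using only the standard getpwuid() and getpwnam() calls; the function name passwd_lookups_agree and its return convention are assumptions made for this example, not part of any existing test suite.

```c
#include <sys/types.h>
#include <pwd.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 1 if the by-uid and by-name lookups agree for the given uid,
 * 0 if they disagree, and -1 if no source knows the uid.
 */
int passwd_lookups_agree(uid_t uid)
{
    char name[256];
    struct passwd *pw = getpwuid(uid);

    if (pw == NULL)
        return -1;

    /* Copy the name: the next getpw*() call may reuse static storage. */
    strncpy(name, pw->pw_name, sizeof(name) - 1);
    name[sizeof(name) - 1] = '\0';

    pw = getpwnam(name);
    if (pw == NULL)
        return 0;
    return pw->pw_uid == uid ? 1 : 0;
}
```

Running the same check against different nsswitch.conf configurations (for example with and without the cache enabled) should give the same answer for every uid; a difference would point at the source or the cache that returned inconsistent data.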
Task 4. Caching daemon performance improvement
The caching daemon (http://www.freebsd.org/cgi/man.cgi?query=cached&apropos=0&sektion=0&manpath=FreeBSD+7.0-current&format=html) can currently be very helpful with network data sources, such as DNS and LDAP, but it introduces a performance penalty when it is used with only the "files" source. It should be optimized to reduce or (in the best case) remove this penalty. The daemon's threading and communication parts should be examined for the bottlenecks that cause the slowdown.
Deliverables and dates
To summarize the above, here is the list of deliverables with time estimates:
- Making a set of regression tests for the nsswitch subsystem. This will require about 3-4 days.
- Moving the DNS, NIS and files modules out of libc. This will make libc more lightweight and will make it possible to exclude several libraries (such as the resolver and bdb) from it. Proper nsdispatch() calls from outside libc will also become available, so the 'perform-actual-lookups' option of the caching daemon could be implemented for all nsswitch databases (currently it is implemented only for passwd, groups and services). All this work will require about 2-2.5 weeks.
- Making a set of patches that will make it possible to import nss_ldap into the source tree (perhaps with some of its files rewritten to avoid licensing problems). This will require about 2 weeks.
- Improving the caching daemon's performance to make it comparable with the speed of the "files" nsswitch source. This will require about 7-10 days.
So it will require about 2 months to get the work done. I will be able to start working at the beginning of June and work about 35-40 hours per week. All of the work should be finished by the beginning of August.
Biography
I'm a 4th year student at Rostov State University, Russia. I also work as a programmer in the University Computer Center. I have been developing for FreeBSD for about 4 years. My first major work was the lookupd daemon (http://rsu.ru/~bushman/lookupd), which was somewhat similar to a light version of FreeBSD's caching daemon and is available as sysutils/lookupd in the ports collection. I was lucky to participate in, and successfully complete, SOC 2005 with a project whose goal was to extend the nsswitch subsystem and implement the caching daemon (http://wikitest.freebsd.org/moin.cgi/NsswitchAndCachingFinalReport). This proposal is the logical continuation of that work. It is inspired by some of my own ideas (which came up during the past work) and by the ideas of other developers (found mostly in FreeBSD's list of projects for volunteers).