This is my project's final report. Before diving into the technical details I want to say a big thank you to my mentors, Brooks Davis and Jacques Vidrine. They've helped me with a lot of things during the project and answered all my questions (even the silly ones :)). I hope not to lose contact with them after the project ends. I'm also very thankful to all the people who tested the project and shared their ideas with me.

My Perforce Directory Structure

Initial Goals

The project itself has 2 main goals:

  1. Extend the nsswitch to support more databases.
  2. Make nsswitch support caching.

Both of these goals were achieved.

Nsswitch Extensions

During the project the nsswitch was extended to support several new databases:

  1. Services
  2. Protocols
  3. RPC
  4. OpenSSH host keys
  5. Globus Grid Toolkit 4 gridmap

The services, protocols and RPC databases are implemented inside libc. Support for the OpenSSH host keys and GT4 gridmap databases was added by patching the openssh-portable port and the upcoming GT4 port respectively.

Services database

Support for the services database is implemented in the getservent.c file. It was quite a complicated task compared to protocols and RPC. I had some problems with the getserv*_r interface: these functions were implemented in the HP-UX style, and I had to rewrite them in the Linux/Solaris style. Here are these functions before:

int getservbyname_r(const char *, const char *, struct servent *, struct servent_data *);
int getservbyport_r(int, const char *, struct servent *, struct servent_data *);
int getservent_r(struct servent *, struct servent_data *);

and after modification:

int getservbyname_r(const char *name, const char *proto, struct servent *serv, char *buffer, size_t bufsize, struct servent **result);
int getservbyport_r(int port, const char *proto, struct servent *serv, char *buffer, size_t bufsize, struct servent **result);
int getservent_r(struct servent *serv, char *buffer, size_t bufsize, struct servent **result);

The Linux/Solaris interface gives several advantages:

  1. Compatibility with Linux/Solaris nsswitch plugins
  2. Nsswitch plugin developers work with a "general" data structure (a memory buffer) instead of the very specialized servent_data. Servent_data forces developers to use one particular structure to store their data; a memory buffer doesn't have this restriction.

The files, nis and compat sources were implemented for the services database. I used the passwd and group databases as examples of how things should be done.

RPC and Protocols databases

These were much easier to implement through nsdispatch, mainly because they don't have the compat source. I've made the necessary modifications to the code; the files and nis sources are supported.

OpenSSH hostkeys database

I've made a patch for the openssh-portable port. Several files were patched, but the main modifications were made to the hostfile.c file. All modifications are wrapped in #if defined(HAVE_NSDISPATCH) && defined(USE_NSSWITCH) guards. The HAVE_NSDISPATCH macro is set by autoconf; a test for the presence of nsdispatch was added to the autoconf input file. The user can also always turn nsswitch support off by unsetting the USE_NSSWITCH variable. If openssh is compiled with nsswitch support turned on, it will use the appropriate nsswitch sources (it will even mention nsswitch in its debug messages :)). The nis and files sources are currently supported. Danny Braniss requested HESIOD support and agreed to help with it, so it will be done in the near future.

Currently, interaction with the NIS source is very similar to working with the ssh_known_hosts file: we iterate through all lines and then get the result. The advantage of this approach is that we support absolutely all ssh_known_hosts functionality (host masks, etc.). The disadvantage is speed. I'm thinking about rewriting the NIS source so that it uses the yp_match() function instead of yp_first() and yp_next(). This will impose some restrictions on the NIS map file, though.

GT4 gridmap

I've modified the GT4 sources (the gss_assist_gridmap.c and configure.in files in the gss_assist folder, actually) so that it can use nsswitch instead of the gridmap file. Currently only the files source is supported. It's not hard to implement NIS and other sources, but I think the whole approach should be tested first. I haven't made a patch out of my modifications yet; I'll do it in a little while, after the testing is done.

Some more details

I've made a small and simple test toolkit (//depot/projects/soc2005/tests/common). For each nsswitch libc database that I had to implement, I made a kind of "sandbox" environment (so that I didn't have to rebuild the world every time the code changed). For each database a series of tests was made. One of the most important tests was to determine whether the new functions return exactly the same results as the old ones. After the code had been tested, it was cleaned up and moved into libc.

Caching library

To make caching possible, I've implemented the caching library. It defines a simple interface for organizing a cache of arbitrary data. The cache itself is divided into several entries. There are 2 basic types of entries:

  1. Common entries. Common entries allow the user to cache key=value data. These entries support eviction policies (LRU - least recently used, LFU - least frequently used and FIFO - first in, first out). The policy is applied when there are too many cached elements in the entry.

  2. Multipart entries. These entries allow the user to cache a sequence of arbitrary data. They are used for caching the results of the getXXXent() functions. To work with such an entry, the user must initialize a session. If the user opens a write session, he can sequentially write data to the entry, but the changes are submitted only by calling the close_cache_mp_write_session() function. If, instead, he calls abandon_cache_mp_write_session(), all data from the current write session is freed and the session is abandoned. This approach is very useful for the getXXXent() functions. When the first call to getXXXent() is made, a write session is opened. If the setXXXent() or endXXXent() functions are called, the session is abandoned. If a getXXXent() function indicates the end of the sequence, the session is gracefully closed with close_cache_mp_write_session() and all session data is placed in the cache. A read session allows the user to sequentially read data from the entry. While there are open read sessions on an entry, a write session can't be submitted.

The multipart entries implementation wasn't very complicated. Common entries required much more work. I've implemented a queue.h-like hashtable interface, so the data in common entries is stored in a hashtable. The FIFO and LRU policies are implemented via macros from the queue.h file. With the FIFO policy, each new element is simply pushed to the front of the queue, and when the policy is applied, elements are deleted from the back of the queue first. With the LRU policy, elements are moved to the front of the queue on each request and, similarly to FIFO, elements are deleted from the back of the queue first.

The LFU policy is somewhat more complicated. Frequency is a real value, so when it's updated we would have to update the whole array (we can't use a queue, because we'd need sorting or binary search). Using an array with sorting/binary search is not optimal, especially when the cache is huge, so I've used a kind of hashtable for this policy. The idea is that if an element has an access frequency of, for example, 0.15 (meaning 0.15 requests per second), it is placed in hashtable entry number 15. Elements from the hashtable entries with the lowest numbers are deleted first. To make this approach really effective, the hashing function (which maps 0.15 to 15) shouldn't be linear, and frequency should probably be calculated as requests per minute or per half-minute instead of requests per second. I'll research which values are optimal and optimize the LFU policy code accordingly.

Caching Daemon

The caching daemon is built on top of the caching library. It has a defined communication protocol, which is, with some exceptions, a wrapper around the caching library functions. During startup it parses the cached.conf configuration file and initializes the cache. The cached.conf file contains information about the cache entries and specific values like the number of threads and the connection timeout. To communicate with users, the caching daemon uses a UNIX socket.
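To give an idea of the kind of settings described above, here is a hypothetical configuration fragment. The directive names below are illustrative only; consult the cached.conf documentation shipped with the code for the real syntax:

```
# Number of worker threads in the daemon (hypothetical directive names)
threads			8

# Enable and tune the cache entry for the services database
enable-cache		services yes
positive-time-to-live	services 3600
positive-policy		services lru
```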

There are 2 kqueues in the daemon: one for all socket events and one for timer events. Both kqueues are shared among all daemon threads. Each request is characterized by a struct query_state and is processed using a state machine. Each processing step is performed only when the appropriate number of bytes can be read from or written to the socket (the NOTE_LOWAT kevent flag is used for this). The daemon correctly handles cases where the amount of data to be sent or received exceeds the socket buffer size.

The caching daemon doesn't do the actual nsswitch lookups; it only caches the results. I'll add the ability to make the lookups - I just didn't plan to do it during the SoC. I think these two approaches (the daemon doing the lookups itself, versus only caching the results) are roughly equivalent - both have advantages and disadvantages. So in the future the caching daemon will be able to work in both ways; currently it can only cache existing results. Because of this behavior, the cache is per-user - each cache entry name contains the user's euid (for example, root's entry name for services will be "0_services"). This is needed to avoid cache poisoning. Unfortunately, I don't think a global shared cache is possible - it's too vulnerable. To avoid potential issues with setgid programs, the daemon will also check that the calling user's uid equals its euid and that its gid equals its egid. IMHO, this makes cache poisoning impossible - a user will only be able to poison the cache that he himself uses.

Caching And Libc

To support caching, I've made some modifications to libc. All these modifications are wrapped in #ifdef NS_CACHING/#endif guards, so caching can be easily turned off. The core of the caching support is in the nsdispatch function. If, while processing the sources chain, it finds the cache source (currently I simply do strcmp() with "cache"), it doesn't process this source in the usual way. Instead, it retrieves a pointer to a struct nss_cache_data from the cache source (from the void *mdata, actually). This structure contains 3 function pointers:

  1. id_func - this function fills the given buffer with a string that identifies the cached data

  2. marshal_func - this function marshals the lookup result into a memory buffer

  3. unmarshal_func - this function unmarshals the lookup result from the memory buffer into the user-supplied arguments

These functions are used to store the lookup results in the cache. All daemon-communication functions were placed into the nscachedcli.c file. All caching-related functions were placed into the nscache.c file.

To add caching support for a particular database (services, passwd, group, etc), we need to implement XXX_id_func, XXX_marshal_func and XXX_unmarshal_func for that database's functions. To ease the definition of the cache-related structures, 3 macros were made.

Here is an example of how this is used for the getservbyname_r function:

#ifdef NS_CACHING
/* serv_id_func, serv_marshal_func and serv_unmarshal_func are implemented here */
#endif

int 
getservbyname_r(const char *name, const char *proto, 
        struct servent *serv, char *buffer, size_t bufsize, 
        struct servent **result)
{
        static const struct servent_mdata mdata = { nss_lt_name, 0 };
        static const struct servent_mdata compat_mdata = { nss_lt_name, 1 };
#ifdef NS_CACHING
        static const nss_cache_info cache_info = 
        NS_COMMON_CACHE_INFO_INITIALIZER(
                services, (void *)nss_lt_name,
                serv_id_func, serv_marshal_func, serv_unmarshal_func);
#endif /* NS_CACHING */
        
        static const ns_dtab dtab[] = {
                { NSSRC_FILES, files_servent, (void *)&mdata },
#ifdef YP
                { NSSRC_NIS, nis_servent, (void *)nss_lt_name },
#endif
                { NSSRC_COMPAT, files_servent, (void *)&compat_mdata },
#ifdef NS_CACHING
                NS_CACHE_CB(&cache_info)
#endif /* NS_CACHING */         
                { NULL, NULL, NULL }
        };
        
        int     rv, ret_errno;

        ret_errno = 0;
        *result = NULL;
        rv = nsdispatch(result, dtab, NSDB_SERVICES, "getservbyname_r", 
            defaultsrc, name, proto, serv, buffer, bufsize, &ret_errno);
        
        if (rv == NS_SUCCESS)
                return (0);
        else
                return (ret_errno);
}

I've written the id, marshal and unmarshal functions and patched libc to support caching for several databases.

Everything described above can be found in my main src-tree branch (//depot/projects/soc2005/nsswitch_cached/src). I've made a patch out of it so that everybody can try and test it. After posting the patch to the -hackers and -current mailing lists, I received some very useful feedback - several bugs were fixed and some improvements were made as a result.

In-Process Caching Branch

In my original proposal I mentioned the possibility of in-process caching - where there is no caching daemon and the cache is process-specific. I've made a special branch (//depot/projects/soc2005/nsswitch_cached/tests/src.inproc) where it is implemented. I don't think this branch will be used further, but I'll do some benchmarking with it.

P.S.

I've worked with great people, learned a lot of new things, and gained awesome experience during my work on this project. I'll be extremely glad to continue working with FreeBSD after the SoC ends. I have some ideas about what to do in the near future.

NsswitchAndCachingFinalReport (last edited 2008-06-17 21:38:26 by localhost)