General notes about the auditdistd implementation
As of r320481.
Lists
Auditdistd uses four lists to pass requests between threads. Those lists are:
adist_disk_list
adist_free_list
adist_recv_list
adist_send_list
Sender lists
Auditdistd seems to use adist_free_list, adist_send_list and adist_recv_list when acting as a sender.
Free list
adist_free_list is filled with ADIST_CMD_UNDEFINED requests within the init_environment() function. It acts as a pool of requests: a fixed number of request structures is created there and then moves between the different lists.
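The pool idea can be sketched with one mutex-protected TAILQ per list, assuming <sys/queue.h> is available (it is on FreeBSD and glibc). The names struct req, struct reqlist, pool_init(), reqlist_put(), reqlist_take() and POOL_SIZE are hypothetical and only mimic the behaviour described above; this is not auditdistd's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

#define CMD_UNDEFINED 0	/* same value as ADIST_CMD_UNDEFINED */
#define POOL_SIZE     4	/* hypothetical; the real pool size is not shown here */

/* Hypothetical request structure; only the list linkage matters here. */
struct req {
	int cmd;
	TAILQ_ENTRY(req) next;
};

/* Hypothetical shared list: a queue head protected by a mutex. */
struct reqlist {
	TAILQ_HEAD(, req) head;
	pthread_mutex_t lock;
};

static struct req pool[POOL_SIZE];
static struct reqlist free_list;

static void
reqlist_init(struct reqlist *l)
{
	TAILQ_INIT(&l->head);
	pthread_mutex_init(&l->lock, NULL);
}

static void
reqlist_put(struct reqlist *l, struct req *r)
{
	pthread_mutex_lock(&l->lock);
	TAILQ_INSERT_TAIL(&l->head, r, next);
	pthread_mutex_unlock(&l->lock);
}

/* Returns NULL if the list is empty (this sketch does not wait). */
static struct req *
reqlist_take(struct reqlist *l)
{
	struct req *r;

	pthread_mutex_lock(&l->lock);
	r = TAILQ_FIRST(&l->head);
	if (r != NULL)
		TAILQ_REMOVE(&l->head, r, next);
	pthread_mutex_unlock(&l->lock);
	return (r);
}

/* Fill the free list once at startup; no request is created later. */
static void
pool_init(void)
{
	reqlist_init(&free_list);
	for (size_t i = 0; i < POOL_SIZE; i++) {
		pool[i].cmd = CMD_UNDEFINED;
		reqlist_put(&free_list, &pool[i]);
	}
}
```

Because the structures live in a static array that is filled once, no more than POOL_SIZE requests can ever be in flight. A thread that holds on to a request it no longer needs shrinks the pool for every other thread.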
Q: What happens when the end of the file is reached and trail_switch() returns false, so that the following code is executed?
Source: sender.c:read_thread()
A: trail_switch() returns false only when the currently processed file is not terminated and it could not be accessed (faccessat(fd, trail->tr_filename, F_OK, 0) == 0 is false). Then the read thread continues to loop until there is a file it can process.
Q: Why is it so important to give the request structure back to the free list?
A: The lists are shared between all the threads, and the free list is a pool with a fixed number of request structures. A request that is not needed at the moment should therefore be returned right away: every structure held back is one less that other threads can take from the free list.
In sender_disconnect() both the send list and the recv list are merged into the free list.
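The merge in sender_disconnect() amounts to moving every element of two queues onto a third. Below is a minimal sketch with hypothetical types (struct req, struct reqlist); disconnect_merge() and reqlist_drain_into() are illustrative names, and the lock order (free list first, then the source list) is an assumption of this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical request and list types: a mutex-protected TAILQ. */
struct req {
	int id;
	TAILQ_ENTRY(req) next;
};

struct reqlist {
	TAILQ_HEAD(, req) head;
	pthread_mutex_t lock;
};

static void
reqlist_init(struct reqlist *l)
{
	TAILQ_INIT(&l->head);
	pthread_mutex_init(&l->lock, NULL);
}

/*
 * Move every request from src to the tail of dst. Both locks are held
 * for the whole move, so no other thread sees a request in neither list.
 */
static void
reqlist_drain_into(struct reqlist *dst, struct reqlist *src)
{
	struct req *r;

	pthread_mutex_lock(&dst->lock);
	pthread_mutex_lock(&src->lock);
	while ((r = TAILQ_FIRST(&src->head)) != NULL) {
		TAILQ_REMOVE(&src->head, r, next);
		TAILQ_INSERT_TAIL(&dst->head, r, next);
	}
	pthread_mutex_unlock(&src->lock);
	pthread_mutex_unlock(&dst->lock);
}

/* Hypothetical counterpart of the merge done in sender_disconnect(). */
static void
disconnect_merge(struct reqlist *free_l, struct reqlist *send_l,
    struct reqlist *recv_l)
{
	reqlist_drain_into(free_l, send_l);
	reqlist_drain_into(free_l, recv_l);
}

static size_t
reqlist_len(struct reqlist *l)
{
	struct req *r;
	size_t n = 0;

	pthread_mutex_lock(&l->lock);
	TAILQ_FOREACH(r, &l->head, next)
		n++;
	pthread_mutex_unlock(&l->lock);
	return (n);
}
```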
Understanding sender threads
For every host in its configuration file that it is supposed to send audit trails to, auditdistd starts adist_sender().
adist_sender() starts 4 threads: guard_thread(), send_thread(), recv_thread() and read_thread().
Guard thread
The guard thread is responsible for signal-handling and restarting connections when needed.
Every ADIST_KEEPALIVE seconds the guard thread calls guard_check_connection(), which checks whether adhost->adh_remote is NULL. If it is (i.e. there is no connection), it calls sender_connect() to reestablish the connection.
Send thread
The send thread tries to take a request from adist_send_list, waiting at most ADIST_KEEPALIVE seconds. If no request arrives in time, it calls keepalive_send(), which tries to take a free request structure from the free list, fills it with an ADIST_CMD_KEEPALIVE command and puts it into the send list. Otherwise, the send thread checks whether adhost->adh_remote is still set (i.e. the connection still exists) and sends the request packet to the server. Note that the request is inserted into the recv list before it is sent.
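Waiting for a request with a deadline maps naturally onto pthread_cond_timedwait(). This sketch assumes that each list carries a condition variable signalled on every insert; reqlist_take_timed() and the other names are hypothetical. A NULL return plays the role of the timeout after which the send thread calls keepalive_send().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>
#include <time.h>

/* Hypothetical request structure. */
struct req {
	int cmd;
	TAILQ_ENTRY(req) next;
};

/* Hypothetical list with a condition variable signalled on insert. */
struct reqlist {
	TAILQ_HEAD(, req) head;
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

static void
reqlist_init(struct reqlist *l)
{
	TAILQ_INIT(&l->head);
	pthread_mutex_init(&l->lock, NULL);
	pthread_cond_init(&l->cond, NULL);
}

static void
reqlist_put(struct reqlist *l, struct req *r)
{
	pthread_mutex_lock(&l->lock);
	TAILQ_INSERT_TAIL(&l->head, r, next);
	pthread_cond_signal(&l->cond);
	pthread_mutex_unlock(&l->lock);
}

/*
 * Wait up to timeout_sec seconds for a request. Returns NULL on timeout,
 * which is where a send thread would fall back to sending a keepalive.
 */
static struct req *
reqlist_take_timed(struct reqlist *l, int timeout_sec)
{
	struct timespec deadline;
	struct req *r;
	int error = 0;

	/* pthread_cond_timedwait() uses CLOCK_REALTIME by default. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&l->lock);
	while ((r = TAILQ_FIRST(&l->head)) == NULL && error != ETIMEDOUT)
		error = pthread_cond_timedwait(&l->cond, &l->lock, &deadline);
	if (r != NULL)
		TAILQ_REMOVE(&l->head, r, next);
	pthread_mutex_unlock(&l->lock);
	return (r);
}
```

The loop re-checks the list after every wakeup, so spurious wakeups are harmless and a request that arrives right at the deadline is still taken.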
Recv thread
Tasks:
- Confirm that the server (receiver) received a packet from the client (sender).
- Call trail_unlink() on the client side when the server confirms that it successfully received the ADIST_CMD_CLOSE command.
This thread gets requests from the recv list (which are inserted there by the send thread before sending a packet to the receiver) and compares their sequence numbers against replies it gets from the receiving server. If the confirmed request was ADIST_CMD_CLOSE then the trail_unlink() function is called.
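The matching step can be sketched as follows, assuming the receiver confirms requests in the order in which they were sent, so that each reply must match the oldest entry of the recv list. confirm_reply(), struct reqq and fake_trail_unlink() (a counter standing in for trail_unlink()) are hypothetical; the command values are copied from auditdistd.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

#define CMD_APPEND 2	/* same values as ADIST_CMD_* in auditdistd.h */
#define CMD_CLOSE  3

/* Hypothetical request: command, sequence number, trail file name. */
struct req {
	uint8_t cmd;
	uint64_t seq;
	char name[32];
	TAILQ_ENTRY(req) next;
};

TAILQ_HEAD(reqq, req);

/* Hypothetical stand-in for trail_unlink(); it only counts calls. */
static int unlink_calls;

static void
fake_trail_unlink(const char *name)
{
	(void)name;
	unlink_calls++;
}

/*
 * Match one reply against the oldest request in the recv list (assumes
 * in-order confirmation). Returns false on a mismatch, which a real
 * implementation would treat as a protocol error.
 */
static bool
confirm_reply(struct reqq *recv_l, struct reqq *free_l, uint64_t reply_seq)
{
	struct req *r = TAILQ_FIRST(recv_l);

	if (r == NULL || r->seq != reply_seq)
		return (false);
	TAILQ_REMOVE(recv_l, r, next);
	if (r->cmd == CMD_CLOSE)
		fake_trail_unlink(r->name);	/* file fully stored remotely */
	TAILQ_INSERT_TAIL(free_l, r, next);	/* give the request back */
	return (true);
}
```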
Read thread
Tasks:
- Open files for reading.
- Check if a new file should be opened for reading.
- Read the content of the files.
A new file is opened when the connection has been reestablished or when trail_filefd(adist_trail) == -1.
The thread checks if there is a new file using read_thread_wait(). Then it takes a request from the free list. When the read thread starts processing a new file the request is filled with the ADIST_CMD_OPEN command and the new file name:
Source: sender.c:read_thread()
Then the request is inserted into the send list.
However, when the file is already open, the read thread reads from it, fills the request with that data and moves it to adist_send_list as an ADIST_CMD_APPEND request. If an error occurs while reading from the file, the request is filled with ADIST_CMD_ERROR instead.
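Below is a sketch of how a request could be filled in the two cases just described. struct req with a fixed 4096-byte buffer, fill_open() and fill_from_fd() are hypothetical, and sending the file name together with its terminating NUL is an assumption. End of file is reported separately because, as discussed earlier, that is where the read thread has to decide whether to switch files.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Command values as defined in auditdistd.h. */
#define CMD_OPEN   1
#define CMD_APPEND 2
#define CMD_ERROR  5

/* Hypothetical request: a command plus a fixed payload buffer. */
struct req {
	int cmd;
	size_t datasize;
	char data[4096];
};

/* New file: the payload is the trail file name (NUL included, assumed). */
static int
fill_open(struct req *r, const char *filename)
{
	int len = snprintf(r->data, sizeof(r->data), "%s", filename);

	if (len < 0 || (size_t)len >= sizeof(r->data))
		return (-1);	/* name does not fit */
	r->cmd = CMD_OPEN;
	r->datasize = (size_t)len + 1;
	return (0);
}

/*
 * Already open file: read the next chunk into the request.
 * Returns 1 if the request was filled (APPEND or ERROR) and 0 at end of
 * file, where nothing is sent and the caller decides what to do next.
 */
static int
fill_from_fd(struct req *r, int fd)
{
	ssize_t done = read(fd, r->data, sizeof(r->data));

	if (done < 0) {
		r->cmd = CMD_ERROR;
		r->datasize = 0;
		return (1);
	}
	if (done == 0)
		return (0);
	r->cmd = CMD_APPEND;
	r->datasize = (size_t)done;
	return (1);
}
```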
About auditdistd packets
Apparently, auditdistd nodes communicate with each other using packets. The definition of the packet struct can be found in auditdistd.h:
struct adpkt {
	uint8_t		adp_byteorder;
#define	ADIST_CMD_UNDEFINED	0
#define	ADIST_CMD_OPEN		1
#define	ADIST_CMD_APPEND	2
#define	ADIST_CMD_CLOSE		3
#define	ADIST_CMD_KEEPALIVE	4
#define	ADIST_CMD_ERROR		5
	uint8_t		adp_cmd;
	uint64_t	adp_seq;
	uint32_t	adp_datasize;
	unsigned char	adp_data[0];
} __packed;
Source: auditdistd.h
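Because the struct is __packed, its fixed part occupies 1 + 1 + 8 + 4 = 14 bytes, presumably followed by adp_datasize bytes of adp_data. The adp_byteorder field suggests that the sender records its byte order so the peer can interpret adp_seq and adp_datasize correctly. The sketch below makes this explicit by encoding and decoding the header by hand; BO_LITTLE, BO_BIG and the function names are hypothetical and are not auditdistd's actual constants or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-order markers; the real values are not shown above. */
#define BO_LITTLE 0
#define BO_BIG    1

/* Fixed part of the packed struct: 1 + 1 + 8 + 4 bytes, no padding. */
#define ADPKT_HDR_SIZE 14

/* Store the low n bytes of v in the requested byte order. */
static void
put_uint(unsigned char *p, uint64_t v, size_t n, int bo)
{
	for (size_t i = 0; i < n; i++) {
		size_t shift = (bo == BO_LITTLE) ? i : n - 1 - i;
		p[i] = (unsigned char)(v >> (8 * shift));
	}
}

static uint64_t
get_uint(const unsigned char *p, size_t n, int bo)
{
	uint64_t v = 0;

	for (size_t i = 0; i < n; i++) {
		size_t shift = (bo == BO_LITTLE) ? i : n - 1 - i;
		v |= (uint64_t)p[i] << (8 * shift);
	}
	return (v);
}

/* Hypothetical encoder for the header; adp_data would follow it. */
static void
adpkt_encode(unsigned char out[ADPKT_HDR_SIZE], int bo, uint8_t cmd,
    uint64_t seq, uint32_t datasize)
{
	out[0] = (unsigned char)bo;	/* adp_byteorder */
	out[1] = cmd;			/* adp_cmd */
	put_uint(out + 2, seq, 8, bo);	/* adp_seq */
	put_uint(out + 10, datasize, 4, bo);	/* adp_datasize */
}

/* Decode using the byte order the sender recorded in the first byte. */
static void
adpkt_decode(const unsigned char in[ADPKT_HDR_SIZE], uint8_t *cmd,
    uint64_t *seq, uint32_t *datasize)
{
	int bo = in[0];

	*cmd = in[1];
	*seq = get_uint(in + 2, 8, bo);
	*datasize = (uint32_t)get_uint(in + 10, 4, bo);
}
```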
Audit trail files handling (from the sender side)
trail.c
trail.c looks like a good file to start with.
struct trail {
	int	 tr_magic;
	/* Path usually to /var/audit/dist/ directory. */
	char	 tr_dirname[PATH_MAX];
	/* Descriptor to td_dirname directory. */
	DIR	*tr_dirfp;
	/* Path to audit trail file. */
	char	 tr_filename[PATH_MAX];
	/* Descriptor to audit trail file. */
	int	 tr_filefd;
};
Source: trail.c
When the connection is lost and trail_reset() is called, trail->tr_filename is reset (set to '\0'), so auditdistd has to start processing the files from scratch.
Q: Is auditdistd going to resend all the trail files again in case of a lost connection?
A: No, as the receiver side seems to be able to recover the name of the last file it worked with thanks to the call to trail_last() in the receiver_connect() function.
Trail files
Trail file suffixes:
.not_terminated
.crash_recovery
.[0-9]{14}
A trail file can be renamed from .not_terminated to .[0-9]{14} or to .crash_recovery, for example while the hosts are disconnected. The file can still be found after such a rename, because the first part of its name (14 characters) never changes.
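Finding a renamed file therefore only needs a comparison of the fixed part of the name. This sketch searches an array of names instead of reading the directory; find_renamed() is hypothetical. It compares the 14-character timestamp plus the dot, so that a name that merely starts with the same digits does not match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the start timestamp, e.g. "20170803223904". */
#define TRAIL_PREFIX_LEN 14

/*
 * Hypothetical helper: find the current name of a trail file that may
 * have been renamed. Only the timestamp and the dot are compared, since
 * a rename changes the suffix only. Returns the match or NULL.
 */
static const char *
find_renamed(const char *oldname, const char *const entries[], size_t n)
{
	if (strlen(oldname) <= TRAIL_PREFIX_LEN ||
	    oldname[TRAIL_PREFIX_LEN] != '.')
		return (NULL);
	for (size_t i = 0; i < n; i++) {
		if (strncmp(entries[i], oldname, TRAIL_PREFIX_LEN + 1) == 0)
			return (entries[i]);
	}
	return (NULL);
}
```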
trail.c:trail_start()
trail_start() is called when the daemon wants to open a trail file at a certain offset. If the file doesn't exist, then trail_next() is called. If the file cannot be opened then the function checks if the file's name was changed and tries to find it. If it fails to do so, it moves to another file by calling trail_next(). There are a couple more checks which might result in calling trail_next() (effectively skipping the current file). A comment inside trail_start() describes in what circumstances auditdistd continues to process a file:
	/*
	 * We continue sending requested file if:
	 * 1. It is not fully sent yet, or
	 * 2. It is fully sent, but is not terminated, so new data can be
	 *    appended still, or
	 * 3. It is fully sent but file name has changed.
	 *
	 * Note that we are fine if our .not_terminated or .crash_recovery file
	 * is smaller than the one on the receiver side, as it is possible that
	 * more data was send to the receiver than was safely stored on disk.
	 * We accept .not_terminated only because auditdistd can start before
	 * auditd manage to rename it to .crash_recovery.
	 */
	if (offset < sb.st_size ||
	    (offset >= sb.st_size &&
	     trail_is_not_terminated(trail->tr_filename)) ||
	    (offset >= sb.st_size && trail_is_not_terminated(filename) &&
	     trail_is_crash_recovery(trail->tr_filename))) {
Source: trail.c
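The condition above can be restated as a standalone predicate. keep_sending() and the suffix helpers are hypothetical stand-ins for the checks in trail_start() and for trail_is_not_terminated() / trail_is_crash_recovery(); oldname corresponds to filename and curname to trail->tr_filename.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical suffix test; the real helpers live in trail.c. */
static bool
has_suffix(const char *name, const char *suffix)
{
	size_t nlen = strlen(name), slen = strlen(suffix);

	return (nlen >= slen && strcmp(name + nlen - slen, suffix) == 0);
}

static bool
is_not_terminated(const char *name)
{
	return (has_suffix(name, ".not_terminated"));
}

static bool
is_crash_recovery(const char *name)
{
	return (has_suffix(name, ".crash_recovery"));
}

/*
 * The three cases of the comment: offset is how much the receiver has,
 * size is the local file size.
 */
static bool
keep_sending(const char *oldname, const char *curname, off_t offset,
    off_t size)
{
	if (offset < size)
		return (true);	/* 1. not fully sent yet */
	if (is_not_terminated(curname))
		return (true);	/* 2. fully sent, but may still grow */
	/* 3. fully sent, renamed from .not_terminated to .crash_recovery */
	return (is_not_terminated(oldname) && is_crash_recovery(curname));
}
```

Note that the code is narrower than point 3 of the comment: a fully sent file that was renamed is only kept when the rename went from .not_terminated to .crash_recovery. A .not_terminated file that became .20170803225824 is not continued.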
Q: What is the difference between filename and trail->tr_filename in trail_start()?
A: The filename argument is adhost->adh_trail_name, the name of the trail file currently being processed on the receiver side. At first the contents of filename are copied into trail->tr_filename, but the two do not necessarily stay equal until the end of trail_start(): if the file was renamed, trail_find() may be called. In that case trail->tr_filename holds the new file name (the one auditdistd is really interested in), while filename still holds the old name that auditdistd believed was valid.
Source: auditdistd.h
The file is removed if it is fully processed (sent). Afterwards, trail_next() is called.
trail.c:trail_next()
When searching for the next file, all the entries in the log directory are checked. Entries whose names do not start with a digit, and entries that are not regular files, are ignored. The next file is the first one that follows the name stored in trail->tr_filename in lexical order; if that file cannot be opened, it is skipped. Nothing happens if there are no new files.
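The selection rule can be sketched over an in-memory list of entries rather than a real readdir() loop; struct dent and next_trail() are hypothetical, and skipping files that cannot be opened is not modelled. An empty current name, as left behind by trail_reset(), selects the oldest file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry: a name and whether it is a regular file. */
struct dent {
	const char *name;
	bool regular;
};

/*
 * Pick the smallest candidate name that sorts after current. Entries not
 * starting with a digit, or not regular files, are ignored. Returns NULL
 * when there is no newer file.
 */
static const char *
next_trail(const char *current, const struct dent *ents, size_t n)
{
	const char *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const char *name = ents[i].name;

		if (name[0] < '0' || name[0] > '9' || !ents[i].regular)
			continue;
		if (strcmp(name, current) <= 0)
			continue;	/* not newer than the current file */
		if (best == NULL || strcmp(name, best) < 0)
			best = name;
	}
	return (best);
}
```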
Auditd and its interactions with the environment of auditdistd
Examples of file names inside /var/audit:
20170803223904.not_terminated (while auditd is still running)
20170803223904.20170803225824 (after auditd stopped writing to that file)
current
After auditd is stopped, /var/audit/current disappears. This file is in fact a symbolic link to the trail file that is currently open, which has a .not_terminated suffix.
A related comment before the trail_validate_name() function:
/*
 * Check if the given file name is a valid audit trail file name.
 * Possible names:
 * 20120106132657.20120106132805
 * 20120106132657.not_terminated
 * 20120106132657.crash_recovery
 * If two names are given, check if the first name can be renamed
 * to the second name. When renaming, first part of the name has
 * to be identical and only the following renames are valid:
 * 20120106132657.not_terminated -> 20120106132657.20120106132805
 * 20120106132657.not_terminated -> 20120106132657.crash_recovery
 */
Source: trail.c:trail_validate_name()
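The rules in this comment are compact enough to sketch directly. validate_name() below is a hypothetical reimplementation based only on the comment, not on the body of trail_validate_name(): a name is 14 digits, a dot and one of the three suffixes, and a rename must keep the first 15 characters and start from .not_terminated.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TS_LEN 14	/* YYYYMMDDhhmmss */

static bool
all_digits(const char *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!isdigit((unsigned char)s[i]))
			return (false);
	}
	return (true);
}

/* Is the suffix after the dot one of the three accepted forms? */
static bool
valid_suffix(const char *suffix)
{
	if (strcmp(suffix, "not_terminated") == 0 ||
	    strcmp(suffix, "crash_recovery") == 0)
		return (true);
	return (strlen(suffix) == TS_LEN && all_digits(suffix, TS_LEN));
}

/*
 * Hypothetical check following the comment: with newname == NULL only
 * name is validated; otherwise also check that name may become newname.
 */
static bool
validate_name(const char *name, const char *newname)
{
	if (strlen(name) <= TS_LEN || !all_digits(name, TS_LEN) ||
	    name[TS_LEN] != '.' || !valid_suffix(name + TS_LEN + 1))
		return (false);
	if (newname == NULL)
		return (true);
	if (!validate_name(newname, NULL))
		return (false);
	/* The timestamp and the dot must match; only the suffix changes. */
	if (strncmp(name, newname, TS_LEN + 1) != 0)
		return (false);
	/* Only .not_terminated may be renamed, and not to itself. */
	return (strcmp(name + TS_LEN + 1, "not_terminated") == 0 &&
	    strcmp(newname + TS_LEN + 1, "not_terminated") != 0);
}
```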
Q: What program (or function) manages the .not_terminated suffix of audit trail files?
A: The auditd daemon seems to be in charge of this. Look at the comment in auditdistd trail.c:
Source: trail.c
Q: Does auditdistd delete old files in case there is no available disk space?
A: