General notes about the auditdistd implementation

As of r320481.

Lists

Auditdistd uses several lists to pass requests between threads. Those lists are:

adist_disk_list
adist_free_list
adist_recv_list
adist_send_list

Sender lists

Auditdistd seems to use adist_free_list, adist_send_list and adist_recv_list when acting as a sender.

Free list

adist_free_list is filled with ADIST_CMD_UNDEFINED requests in the init_environment() function. It acts as a pool of requests: a fixed number of request structures is allocated once, and afterwards those structures only move between the different lists.
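A minimal sketch of such a pool in C, assuming a simplified request type: struct req, struct lockedlist, pool_init() and the list_* helpers are hypothetical and only illustrate the idea (auditdistd itself uses its adreq structures, mtx_lock() and its QUEUE_* macros). The requests are allocated once; afterwards they only move between lists, so their total number stays constant.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-in for auditdistd's request structure. */
struct req {
	int cmd;                   /* calloc() leaves this at 0, i.e. ADIST_CMD_UNDEFINED */
	TAILQ_ENTRY(req) next;     /* list linkage, like adr_next */
};

TAILQ_HEAD(reqlist, req);

/* A list together with the mutex that protects it. */
struct lockedlist {
	struct reqlist  head;
	pthread_mutex_t lock;
};

static void
list_init(struct lockedlist *l)
{
	TAILQ_INIT(&l->head);
	pthread_mutex_init(&l->lock, NULL);
}

/* Allocate 'n' requests once and put all of them on the free list. */
static int
pool_init(struct lockedlist *freelist, int n)
{
	list_init(freelist);
	for (int i = 0; i < n; i++) {
		struct req *r = calloc(1, sizeof(*r));

		if (r == NULL)
			return (-1);
		TAILQ_INSERT_TAIL(&freelist->head, r, next);
	}
	return (0);
}

/* Remove the first request from a list, or return NULL if it is empty. */
static struct req *
list_take(struct lockedlist *l)
{
	struct req *r;

	pthread_mutex_lock(&l->lock);
	r = TAILQ_FIRST(&l->head);
	if (r != NULL)
		TAILQ_REMOVE(&l->head, r, next);
	pthread_mutex_unlock(&l->lock);
	return (r);
}

/* Append a request to a list. */
static void
list_put(struct lockedlist *l, struct req *r)
{
	pthread_mutex_lock(&l->lock);
	TAILQ_INSERT_TAIL(&l->head, r, next);
	pthread_mutex_unlock(&l->lock);
}

/* Count the elements of a list (for illustration only). */
static int
list_count(struct lockedlist *l)
{
	struct req *r;
	int n = 0;

	pthread_mutex_lock(&l->lock);
	TAILQ_FOREACH(r, &l->head, next)
		n++;
	pthread_mutex_unlock(&l->lock);
	return (n);
}
```

Unlike the daemon, list_take() returns NULL on an empty list instead of waiting for a request to be returned.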

Q: What happens when the end of a file is reached and trail_switch() returns false, which results in the following code being executed:

```
if (!trail_switch(adist_trail)) {
        /* More audit records can arrive. */
        mtx_lock(&adist_free_list_lock);
        TAILQ_INSERT_TAIL(&adist_free_list, adreq,
            adr_next);
        mtx_unlock(&adist_free_list_lock);
        wait_for_file();
        continue;
}
```
^{Source: sender.c:read_thread()}
A: trail_switch() returns false only when the currently processed file is not terminated and it could not be accessed (faccessat(fd, trail->tr_filename, F_OK, 0) == 0 is false). In that case the read thread returns the request to the free list, waits in wait_for_file() and starts the loop again, until there is a file it can process.

Q: Why is it so important to give the request structure back to the free list?

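A: The free list is a pool with a fixed number of request structures shared by all the threads. A request that is not needed at the moment should therefore be given back: otherwise the pool shrinks permanently, and once it is empty, a thread that needs a free request (for example the read thread in QUEUE_TAKE(adreq, &adist_free_list, 0)) would have to wait for one.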

In sender_disconnect() both the send list and the recv list are merged into the free list.
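Merging a list into the free list boils down to moving every element from one TAILQ to another. A minimal sketch with a hypothetical list_merge() helper and a simplified struct req; locking, which the daemon needs because other threads use the same lists, is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical simplified request; only the list linkage matters here. */
struct req {
	int seq;
	TAILQ_ENTRY(req) next;
};

TAILQ_HEAD(reqlist, req);

/*
 * Move every request from 'src' to the tail of 'dst', keeping their order
 * and leaving 'src' empty.  No locking is done here.
 */
static void
list_merge(struct reqlist *dst, struct reqlist *src)
{
	struct req *r;

	while ((r = TAILQ_FIRST(src)) != NULL) {
		TAILQ_REMOVE(src, r, next);
		TAILQ_INSERT_TAIL(dst, r, next);
	}
}
```

On FreeBSD, sys/queue.h also provides TAILQ_CONCAT(), which does the same in constant time.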

Understanding sender threads

Auditdistd starts adist_sender() for every host listed in its configuration file as a destination for audit trails.

adist_sender() starts four threads: guard_thread(), send_thread(), recv_thread() and read_thread().

Guard thread

The guard thread is responsible for signal-handling and restarting connections when needed.

Every ADIST_KEEPALIVE seconds the guard thread calls guard_check_connection(), which checks whether adhost->adh_remote is NULL (i.e. there is no connection). If it is, it calls sender_connect() to re-establish the connection.
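Put differently, the guard reconnects only when there is no connection. A sketch of one guard iteration, with a hypothetical struct host and a reconnect() stub standing in for sender_connect():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified host state: only what the guard looks at. */
struct host {
	void *remote;      /* stands in for adhost->adh_remote; NULL = disconnected */
	int   reconnects;  /* how many times reconnect() ran (for illustration) */
};

/* Hypothetical stub standing in for sender_connect(); always "succeeds". */
static void
reconnect(struct host *h)
{
	static int dummy_connection;

	h->reconnects++;
	h->remote = &dummy_connection;
}

/* One guard iteration: reconnect only if there is no connection. */
static void
check_connection(struct host *h)
{
	if (h->remote == NULL)
		reconnect(h);
}
```

In the daemon this check is repeated every ADIST_KEEPALIVE seconds; an existing connection is left alone.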

Send thread

The send thread tries to take a request from adist_send_list, waiting at most ADIST_KEEPALIVE seconds. If no request arrives in time, it calls keepalive_send(), which tries to take a free request structure from the free list, fill it with an ADIST_CMD_KEEPALIVE command and put it into the send list. Otherwise, it checks whether adhost->adh_remote is still there and sends the request packet to the server. It is worth noting that the request is inserted into the recv list before it is sent, presumably so that the recv thread can find it even if the reply arrives immediately.
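The "wait for a request, but no longer than ADIST_KEEPALIVE seconds" step can be sketched with pthread_cond_timedwait(). This is only an illustration under assumptions: the fixed-size integer queue and queue_take_timed() are hypothetical and are not auditdistd's QUEUE_TAKE macro.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical fixed-size queue of integers standing in for a request list. */
#define QSIZE 8
struct queue {
	int             items[QSIZE];
	int             head, count;
	pthread_mutex_t lock;
	pthread_cond_t  cond;
};

static void
queue_init(struct queue *q)
{
	q->head = q->count = 0;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);
}

/* Append an item and wake up one waiting consumer. */
static void
queue_put(struct queue *q, int item)
{
	pthread_mutex_lock(&q->lock);
	assert(q->count < QSIZE);
	q->items[(q->head + q->count) % QSIZE] = item;
	q->count++;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Wait up to 'timeout' seconds for an item.  Returns 0 and stores the item
 * in *out, or -1 if nothing arrived in time (the point at which the send
 * thread would call keepalive_send()).
 */
static int
queue_take_timed(struct queue *q, int *out, time_t timeout)
{
	struct timespec deadline;
	int error = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);   /* the default clock of condvars */
	deadline.tv_sec += timeout;

	pthread_mutex_lock(&q->lock);
	while (q->count == 0 && error == 0)
		error = pthread_cond_timedwait(&q->cond, &q->lock, &deadline);
	if (q->count == 0) {
		pthread_mutex_unlock(&q->lock);
		return (-1);
	}
	*out = q->items[q->head];
	q->head = (q->head + 1) % QSIZE;
	q->count--;
	pthread_mutex_unlock(&q->lock);
	return (0);
}
```

With a timeout of 0 the call returns immediately when the queue is empty; the loop around pthread_cond_timedwait() also guards against spurious wakeups.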

Recv thread

Tasks:

Confirm that the server (receiver) received a packet from the client (sender).
Call trail_unlink() on the client side when the server confirms that it successfully received the ADIST_CMD_CLOSE command.

This thread gets requests from the recv list (which are inserted there by the send thread before sending a packet to the receiver) and compares their sequence numbers against replies it gets from the receiving server. If the confirmed request was ADIST_CMD_CLOSE then the trail_unlink() function is called.
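Assuming replies arrive in the same order as the requests were sent (so the oldest entry in the recv list is always the one being confirmed), the matching step can be sketched as follows. struct inflight and confirm_reply() are hypothetical; the command values are the ones from auditdistd.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADIST_CMD_APPEND 2   /* values as in auditdistd.h */
#define ADIST_CMD_CLOSE  3

/* Hypothetical record of a request that was sent but not yet confirmed. */
struct inflight {
	uint64_t seq;
	uint8_t  cmd;
};

/*
 * Match a reply's sequence number against the oldest unconfirmed request.
 * Returns 1 if the confirmed request was ADIST_CMD_CLOSE (time to call
 * trail_unlink()), 0 for any other confirmed command, and -1 if the
 * sequence numbers do not match or nothing is pending.
 */
static int
confirm_reply(const struct inflight *pending, size_t npending,
    size_t *nconfirmed, uint64_t reply_seq)
{
	const struct inflight *req;

	if (*nconfirmed >= npending)
		return (-1);
	req = &pending[*nconfirmed];
	if (req->seq != reply_seq)
		return (-1);
	(*nconfirmed)++;
	return (req->cmd == ADIST_CMD_CLOSE ? 1 : 0);
}
```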

Read thread

Tasks:

Open files for reading.
Check if a new file should be opened for reading.
Read the content of the files.

A new file is loaded when the connection has been re-established or when trail_filefd(adist_trail) == -1 (i.e. no trail file is currently open).

The thread checks if there is a new file using read_thread_wait(). Then it takes a request from the free list. When the read thread starts processing a new file the request is filled with the ADIST_CMD_OPEN command and the new file name:

```
newfile = read_thread_wait();
QUEUE_TAKE(adreq, &adist_free_list, 0);
if (newfile) {
        adreq_fill(adreq, ADIST_CMD_OPEN,
            trail_filename(adist_trail), 0);
        newfile = false;
        goto move;
}
```

^{Source: sender.c:read_thread()}

Then the request is inserted into the send list.

If the file is already open, the read thread instead reads from it, fills the request with that data and moves it to adist_send_list as an ADIST_CMD_APPEND request. If an error occurs while reading from the file, the request is filled with ADIST_CMD_ERROR instead.
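A minimal sketch of how a request could be filled, assuming a simplified struct req with a fixed 4096-byte buffer; fill_request() is hypothetical and does not mirror adreq_fill(). Reaching end of file is reported as ADIST_CMD_UNDEFINED here, because the real read thread deals with that case separately through trail_switch().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ADIST_CMD_UNDEFINED 0   /* command values as in auditdistd.h */
#define ADIST_CMD_OPEN      1
#define ADIST_CMD_APPEND    2
#define ADIST_CMD_ERROR     5

/* Hypothetical simplified request: a command plus a payload buffer. */
struct req {
	int    cmd;
	size_t datasize;
	char   data[4096];
};

/*
 * OPEN with the file name for a new file; otherwise APPEND with the data
 * read from 'fd', ERROR if read(2) failed, or UNDEFINED at end of file.
 */
static void
fill_request(struct req *r, int fd, bool newfile, const char *filename)
{
	ssize_t n;

	r->datasize = 0;
	if (newfile) {
		r->cmd = ADIST_CMD_OPEN;
		r->datasize = strlen(filename) + 1;   /* include the NUL */
		assert(r->datasize <= sizeof(r->data));
		memcpy(r->data, filename, r->datasize);
		return;
	}
	n = read(fd, r->data, sizeof(r->data));
	if (n > 0) {
		r->cmd = ADIST_CMD_APPEND;
		r->datasize = (size_t)n;
	} else if (n == 0) {
		r->cmd = ADIST_CMD_UNDEFINED;         /* end of file */
	} else {
		r->cmd = ADIST_CMD_ERROR;
	}
}
```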

About auditdistd packets

Apparently, auditdistd nodes communicate with each other using packets. The definition of the packet struct can be found in auditdistd.h:

```
struct adpkt {
        uint8_t         adp_byteorder;
#define ADIST_CMD_UNDEFINED     0
#define ADIST_CMD_OPEN          1
#define ADIST_CMD_APPEND        2
#define ADIST_CMD_CLOSE         3
#define ADIST_CMD_KEEPALIVE     4
#define ADIST_CMD_ERROR         5
        uint8_t         adp_cmd;
        uint64_t        adp_seq;
        uint32_t        adp_datasize;
        unsigned char   adp_data[0];
} __packed;
```

^{Source: auditdistd.h}
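Because the struct is packed, the header occupies exactly 1 + 1 + 8 + 4 = 14 bytes and the payload follows immediately. A sketch that reproduces the layout in standard C (with GCC/Clang's packed attribute, which is what the BSD __packed macro expands to) and builds a packet; adpkt_new() is hypothetical, and the value stored in adp_byteorder is only a placeholder since its encoding is not shown here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ADIST_CMD_APPEND 2   /* as in auditdistd.h */

/* Same fields as struct adpkt; a C99 flexible array member replaces adp_data[0]. */
struct adpkt {
	uint8_t       adp_byteorder;
	uint8_t       adp_cmd;
	uint64_t      adp_seq;
	uint32_t      adp_datasize;
	unsigned char adp_data[];
} __attribute__((packed));

/* Allocate a packet with room for 'size' payload bytes and copy them in. */
static struct adpkt *
adpkt_new(uint8_t cmd, uint64_t seq, const void *data, uint32_t size)
{
	struct adpkt *pkt = malloc(sizeof(*pkt) + size);

	if (pkt == NULL)
		return (NULL);
	pkt->adp_byteorder = 0;   /* placeholder, real encoding not shown */
	pkt->adp_cmd = cmd;
	pkt->adp_seq = seq;
	pkt->adp_datasize = size;
	memcpy(pkt->adp_data, data, size);
	return (pkt);
}
```

Without the packed attribute, the compiler would typically pad adp_seq to its natural alignment, so the header would lose this compact 14-byte layout.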

Audit trail files handling (from the sender side)

trail.c

trail.c looks like a good file to start with.

```
struct trail {
        int      tr_magic;
        /* Path usually to /var/audit/dist/ directory. */
        char     tr_dirname[PATH_MAX];
        /* Descriptor to td_dirname directory. */
        DIR     *tr_dirfp;
        /* Path to audit trail file. */
        char     tr_filename[PATH_MAX];
        /* Descriptor to audit trail file. */
        int      tr_filefd;
};
```

^{Source: trail.c}

When the connection is lost, trail_reset() is called and trail->tr_filename is reset to an empty string (its first byte is set to '\0'), so auditdistd has to start processing all the files again.

Q: Is auditdistd going to resend all the trail files again in case of a lost connection?

A: No. The receiver side seems to be able to recover the name of the last file it worked with, thanks to the call to trail_last() in the receiver_connect() function, so the sender can resume from that file instead of the oldest one.

Trail files

Trail files suffixes:

.not_terminated
.crash_recovery
.[0-9]{14}

A trail file can be renamed from .not_terminated to .[0-9]{14} or to .crash_recovery, for example while the hosts are disconnected. It can still be found after such a rename, because the first part of its name (the first 14 bytes) never changes.
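A minimal sketch of that lookup, assuming the directory listing is already available as an array of names; same_trail() and find_renamed() are hypothetical helpers, not the trail.c implementation (trail_find()).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRAIL_PREFIX_LEN 14   /* the YYYYMMDDhhmmss part of the name */

/* Do two trail file names start with the same 14-character timestamp? */
static int
same_trail(const char *a, const char *b)
{
	return (strlen(a) >= TRAIL_PREFIX_LEN && strlen(b) >= TRAIL_PREFIX_LEN &&
	    strncmp(a, b, TRAIL_PREFIX_LEN) == 0);
}

/* Return the entry of 'names' that shares 'old's timestamp, or NULL. */
static const char *
find_renamed(const char *old, const char *const *names, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (same_trail(old, names[i]))
			return (names[i]);
	}
	return (NULL);
}
```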

trail.c:trail_start()

trail_start() is called when the daemon wants to open a trail file at a certain offset. If the file does not exist, trail_next() is called. If the file cannot be opened, the function checks whether it was renamed and tries to find it under its new name; if that fails, it moves on to another file by calling trail_next(). A few more checks can also result in calling trail_next() (effectively skipping the current file). A comment inside trail_start() describes the circumstances in which auditdistd continues to process a file:

```
/*
 * We continue sending requested file if:
 * 1. It is not fully sent yet, or
 * 2. It is fully sent, but is not terminated, so new data can be
 *    appended still, or
 * 3. It is fully sent but file name has changed.
 *
 * Note that we are fine if our .not_terminated or .crash_recovery file
 * is smaller than the one on the receiver side, as it is possible that
 * more data was send to the receiver than was safely stored on disk.
 * We accept .not_terminated only because auditdistd can start before
 * auditd manage to rename it to .crash_recovery.
 */
if (offset < sb.st_size ||
    (offset >= sb.st_size &&
     trail_is_not_terminated(trail->tr_filename)) ||
    (offset >= sb.st_size && trail_is_not_terminated(filename) &&
     trail_is_crash_recovery(trail->tr_filename))) {
```

^{Source: trail.c}
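The condition can be lifted into a standalone predicate to check the three cases. This is a sketch that assumes trail_is_not_terminated() and trail_is_crash_recovery() amount to suffix checks; is_not_terminated(), is_crash_recovery() and continue_sending() are hypothetical re-implementations, with 'requested' playing the role of filename and 'current' that of trail->tr_filename.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Does 'name' end with 'suffix'? */
static bool
has_suffix(const char *name, const char *suffix)
{
	size_t nlen = strlen(name), slen = strlen(suffix);

	return (nlen >= slen && strcmp(name + nlen - slen, suffix) == 0);
}

/* Hypothetical suffix-based stand-ins for the trail.c predicates. */
static bool
is_not_terminated(const char *name)
{
	return (has_suffix(name, ".not_terminated"));
}

static bool
is_crash_recovery(const char *name)
{
	return (has_suffix(name, ".crash_recovery"));
}

/*
 * Same structure as the condition in trail_start(): 'offset' is how much
 * the receiver already has, 'size' is the local file size.
 */
static bool
continue_sending(const char *requested, const char *current, off_t offset,
    off_t size)
{
	return (offset < size ||
	    (offset >= size && is_not_terminated(current)) ||
	    (offset >= size && is_not_terminated(requested) &&
	     is_crash_recovery(current)));
}
```

As written, the third case only covers a rename from .not_terminated to .crash_recovery: a fully sent file that was renamed to .[0-9]{14} makes the predicate false.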

Q: What is the difference between filename and trail->tr_filename in trail_start()?

A: The filename argument is adhost->adh_trail_name, the name of the trail file that is currently being processed on the receiver side. At first the contents of filename are copied into trail->tr_filename, but the two are not necessarily equal by the end of trail_start(): if the file was renamed, trail_find() is called, after which trail->tr_filename holds the newest file name (the one auditdistd is really interested in) while filename still holds the old name that auditdistd believed to be valid.
```
/* Receiver-specific fields. */
char     adh_trail_name[ADIST_PATHSIZE];
```
^{Source: auditdistd.h}

If the file has been fully processed (sent) and none of the conditions listed in the comment holds, trail_start() removes it and then calls trail_next().

trail.c:trail_next()

When searching for the next file, trail_next() checks all the entries in the log directory. Entries that are not regular files or whose names do not start with a digit are ignored. The next file is the one that follows the name stored in trail->tr_filename in lexical order; if that file cannot be opened, it is skipped. Nothing happens if there are no new files.
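A minimal sketch of that selection rule, assuming the directory entries are given as an array of names; next_trail() is hypothetical and leaves out the regular-file check and the skipping of files that cannot be opened.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Among 'names', return the lexically smallest name that starts with a
 * digit and sorts after 'current', or NULL if there is none.
 */
static const char *
next_trail(const char *current, const char *const *names, size_t n)
{
	const char *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const char *name = names[i];

		if (!isdigit((unsigned char)name[0]))
			continue;                 /* e.g. "current", ".", ".." */
		if (strcmp(name, current) <= 0)
			continue;                 /* not after the current file */
		if (best == NULL || strcmp(name, best) < 0)
			best = name;
	}
	return (best);
}
```

With current set to "" (as after trail_reset()), it returns the oldest trail file, since the names start with a timestamp.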

Auditd and its interactions with the environment of auditdistd

Examples of file names inside /var/audit:

20170803223904.not_terminated while auditd is still running
20170803223904.20170803225824 after auditd has stopped writing to that file
current

When auditd is stopped, the file /var/audit/current disappears. This file is in fact a symbolic link to the currently open trail file, which has the .not_terminated suffix.

A related comment before the trail_validate_name() function:

```
/*
 * Check if the given file name is a valid audit trail file name.
 * Possible names:
 * 20120106132657.20120106132805
 * 20120106132657.not_terminated
 * 20120106132657.crash_recovery
 * If two names are given, check if the first name can be renamed
 * to the second name. When renaming, first part of the name has
 * to be identical and only the following renames are valid:
 * 20120106132657.not_terminated -> 20120106132657.20120106132805
 * 20120106132657.not_terminated -> 20120106132657.crash_recovery
 */
```

^{Source: trail.c:trail_validate_name()}
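The rules in that comment can be expressed as two small predicates. A sketch with hypothetical valid_name() and valid_rename() functions; it follows the comment only and may differ in detail from what trail_validate_name() actually checks:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define TRAIL_TIME_LEN 14   /* length of the YYYYMMDDhhmmss timestamp */

/* Are the first 'len' characters of 's' all decimal digits? */
static bool
all_digits(const char *s, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (!isdigit((unsigned char)s[i]))
			return (false);
	}
	return (true);
}

/* Is 'name' one of the three valid trail file name forms? */
static bool
valid_name(const char *name)
{
	const char *suffix;

	if (strlen(name) < TRAIL_TIME_LEN + 1 ||
	    !all_digits(name, TRAIL_TIME_LEN) || name[TRAIL_TIME_LEN] != '.')
		return (false);
	suffix = name + TRAIL_TIME_LEN + 1;
	return (strcmp(suffix, "not_terminated") == 0 ||
	    strcmp(suffix, "crash_recovery") == 0 ||
	    (strlen(suffix) == TRAIL_TIME_LEN && all_digits(suffix, TRAIL_TIME_LEN)));
}

/* May 'from' be renamed to 'to'?  Same timestamp, only from .not_terminated. */
static bool
valid_rename(const char *from, const char *to)
{
	if (!valid_name(from) || !valid_name(to))
		return (false);
	if (strncmp(from, to, TRAIL_TIME_LEN) != 0)
		return (false);
	if (strcmp(from + TRAIL_TIME_LEN, ".not_terminated") != 0)
		return (false);
	return (strcmp(to + TRAIL_TIME_LEN, ".not_terminated") != 0);
}
```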

Q: What program (or function) manages the .not_terminated suffix of audit trail files?

A: The auditd daemon seems to be in charge of this. Look at the comment in auditdistd trail.c:
```
 * We accept .not_terminated only because auditdistd can start before
 * auditd manage to rename it to .crash_recovery.
```
^{Source: trail.c}

References

MateuszPiotrowski/Audit/Auditdistd (last edited 2021-03-28T07:03:51+0000 by KubilayKocak)