LibElf Implementation Notes
The ELF(3) API allows applications to create, read and write plain ELF files, and to read ar(1) archives (see below). The current design mmap()s the contents of the target file and makes its uninterpreted bytes available through the ELF descriptor member e_rawfile.
ELF_C_READ operations will map the underlying file read-only and use e_rawfile directly.
ELF_C_WRITE operations use freshly malloc()'ed address ranges and will update the e_rawfile member on an elf_update().
ELF_C_RDWR operations will map the file read-only and read from the mapping as needed when data from a given section is retrieved. At elf_update() time the library will merge "clean" and "dirty" data and write the new contents out to the underlying file. The new contents will then be mapped in, and the size of the mapped range will be adjusted as needed.
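The read-only mapping step can be sketched with POSIX fstat(2) and mmap(2). This is only an illustration under stated assumptions: struct raw_map, raw_map_open() and raw_map_close() are hypothetical names, not the library's internals, and an empty file is simply treated as an error here.

```c
#include <sys/mman.h>
#include <sys/stat.h>

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the raw-file part of an ELF descriptor. */
struct raw_map {
	void	*rm_base;	/* start of the mapping, or NULL */
	size_t	 rm_size;	/* size of the mapping in bytes */
};

/*
 * Map the file open on `fd' read-only.  Returns 0 on success and -1 on
 * failure; an empty file cannot be mapped and is reported as a failure.
 */
int
raw_map_open(int fd, struct raw_map *rm)
{
	struct stat sb;
	void *p;

	assert(rm != NULL);

	rm->rm_base = NULL;
	rm->rm_size = 0;

	if (fstat(fd, &sb) < 0 || sb.st_size <= 0)
		return (-1);

	p = mmap(NULL, (size_t) sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return (-1);

	rm->rm_base = p;
	rm->rm_size = (size_t) sb.st_size;
	return (0);
}

/* Drop the mapping, if any, and reset the descriptor. */
void
raw_map_close(struct raw_map *rm)
{
	assert(rm != NULL);

	if (rm->rm_base != NULL)
		(void) munmap(rm->rm_base, rm->rm_size);
	rm->rm_base = NULL;
	rm->rm_size = 0;
}
```

After a rewrite of the underlying file, a caller in this sketch would call raw_map_close() followed by raw_map_open() to pick up the new size.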
ar(1) archives have a few corner cases that need care.
GNU ar creates archives whose archive headers start at even offsets. I.e., if a file being archived has an odd length, its data in the archive is padded with one 0x0A (newline) character. Extraction always uses the exact size of the file. Solaris ar(1) appears to use 4 byte padding. It isn't clear how to tell a Solaris archive from a GNU ar archive. This means that when reading and parsing data inside an ar(1) archive, we need to be prepared for misaligned data.
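The padding rule reduces to a small offset calculation. The sketch below is an illustration, not library code: ar_next_header() is a hypothetical helper, and the alignment is left as a parameter because the notes above are unsure which padding a given archive uses (2 for GNU-style archives, 4 for the behaviour Solaris ar(1) appears to have).

```c
#include <assert.h>
#include <stdint.h>

#define	AR_HDR_SIZE	60	/* sizeof(struct ar_hdr) in <ar.h> */

/*
 * Given the file offset of a member's archive header and the member
 * size recorded in that header, return the offset at which the next
 * archive header is expected, rounding up to `align' bytes.
 */
uint64_t
ar_next_header(uint64_t hdr_off, uint64_t member_size, uint64_t align)
{
	uint64_t end;

	/* The rounding trick below needs a power-of-two alignment. */
	assert(align != 0 && (align & (align - 1)) == 0);

	end = hdr_off + AR_HDR_SIZE + member_size;
	return ((end + align - 1) & ~(align - 1));
}
```

For example, a 5 byte member whose header sits at offset 8 (directly after the 8 byte archive magic) ends at offset 73, so with 2 byte alignment the next header is expected at offset 74.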
struct ar_hdr, described in /usr/include/ar.h, has limited space (16 bytes) for the name of the file. For archives containing files with longer names, an extended archive format is used, in which the special entries //, /?, /0.../9 denote special members (an archive symbol table, an archive string table, etc.).
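A reader has to classify the name field before treating a member as ordinary file data. The sketch below assumes the GNU/SVR4 naming convention, in which "/" names the archive symbol table, "//" names the archive string table, and "/" followed by decimal digits refers to a long name stored at that offset in the string table. The enum and ar_classify_name() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define	AR_NAME_LEN	16	/* width of ar_name in struct ar_hdr */

enum ar_name_kind {
	AR_NAME_PLAIN,		/* ordinary member name */
	AR_NAME_SYMTAB,		/* "/": archive symbol table */
	AR_NAME_STRTAB,		/* "//": archive string table */
	AR_NAME_LONGREF		/* "/<digits>": offset into the string table */
};

/*
 * Classify a raw 16 byte ar_name field.  The field is space padded
 * and is not NUL terminated.
 */
enum ar_name_kind
ar_classify_name(const char *name)
{
	size_t i, len;

	assert(name != NULL);

	/* Strip the trailing space padding. */
	len = AR_NAME_LEN;
	while (len > 0 && name[len - 1] == ' ')
		len--;

	if (len == 0 || name[0] != '/')
		return (AR_NAME_PLAIN);
	if (len == 1)
		return (AR_NAME_SYMTAB);
	if (len == 2 && name[1] == '/')
		return (AR_NAME_STRTAB);

	/* Anything else starting with '/' must be all digits. */
	for (i = 1; i < len; i++)
		if (!isdigit((unsigned char) name[i]))
			return (AR_NAME_PLAIN);
	return (AR_NAME_LONGREF);
}
```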
- API Issues
elf_rand() is defined in the SVR4 ELF API as taking a size_t for an offset into the archive. I.e., the prototype is elf_rand(Elf *e, size_t offset). Using an off_t seems to be more correct on 32 bit systems: elf_rand(Elf *e, off_t offset).
The SVR4 API uses off_t for the d_off member of an Elf_Data structure and size_t for the d_size member. However, these types are architecture dependent; size_t is 32 bits on FreeBSD/i386 and 64 bits on FreeBSD/sparc64. I've defined both members as uint64_t so that 64 bit ELF objects can be handled on 32 bit systems without bugs caused by field width mismatches.
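With fixed-width fields, the narrowing to a host type happens in one explicit place rather than implicitly at every assignment. A minimal sketch of such a check follows; struct data_desc and data_size_fits() are hypothetical and only mirror the two fields discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor using fixed-width fields. */
struct data_desc {
	uint64_t	d_off;	/* offset of the data within its section */
	uint64_t	d_size;	/* size of the data in bytes */
};

/*
 * Return 1 and store the size in `*out' if the descriptor's size can
 * be represented as a host size_t; return 0 otherwise.
 */
int
data_size_fits(const struct data_desc *d, size_t *out)
{
	size_t narrowed;

	assert(d != NULL && out != NULL);

	/* A lossy conversion shows up as a failed round trip. */
	narrowed = (size_t) d->d_size;
	if ((uint64_t) narrowed != d->d_size)
		return (0);

	*out = narrowed;
	return (1);
}
```

On a host with a 32 bit size_t, a d_size of 4GB or more fails the round trip and is rejected instead of being silently truncated.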
gelf_newehdr() is defined in the SVR4 ELF API as returning unsigned long. However the return type should be a union of Elf32_Ehdr * and Elf64_Ehdr *, or at least void *. We can't portably assume that an unsigned long is large enough to hold a pointer value.
gelf_newphdr() has a similar issue.
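The return type concern can be illustrated with an allocator that hands back a void * and leaves the cast to the caller, who already knows the object's class. Everything in this sketch is hypothetical: struct hdr32 and struct hdr64 are simplified stand-ins for the real header types, and new_header() is not a proposed API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum hdr_class { HDR_CLASS32, HDR_CLASS64 };	/* hypothetical */

/* Simplified stand-ins for the 32 and 64 bit ELF headers. */
struct hdr32 {
	unsigned char	ident[16];
	uint32_t	entry;
};

struct hdr64 {
	unsigned char	ident[16];
	uint64_t	entry;
};

/*
 * Allocate a zeroed header of the requested class.  Returning void *
 * keeps the full pointer value; an integer return type would only be
 * safe if it were known to be at least as wide as a pointer.
 */
void *
new_header(enum hdr_class c)
{
	size_t sz;

	assert(c == HDR_CLASS32 || c == HDR_CLASS64);

	sz = (c == HDR_CLASS32) ? sizeof(struct hdr32) : sizeof(struct hdr64);
	return (calloc(1, sz));
}
```

The caller casts the result to struct hdr32 * or struct hdr64 * according to the class it asked for, and releases it with free().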
It isn't very clear what elf_getident() needs to do if invoked on an ELF file opened for writing. I'm choosing to return ELF_E_SEQUENCE if elf_getident() is invoked before a successful elf_update().
- API Extensions
elf_setshstrndx(), for an application to set the e_shstrndx field (or its equivalent if using extended section numbering).
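The "equivalent" under extended section numbering follows the ELF specification: an index below SHN_LORESERVE is stored directly in e_shstrndx, while a larger index is stored in the sh_link member of section header 0 and e_shstrndx is set to SHN_XINDEX. The sketch below only shows that encoding; shstrndx_encode() is a hypothetical helper, not how the library implements elf_setshstrndx().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SHN_LORESERVE	0xff00	/* start of the reserved index range */
#define	SHN_XINDEX	0xffff	/* "real index is stored elsewhere" */

/*
 * Encode section name string table index `ndx'.  `*e_shstrndx' gets
 * the value for the ELF header field and `*sh0_link' the value for
 * the sh_link member of section header 0.
 */
void
shstrndx_encode(uint32_t ndx, uint16_t *e_shstrndx, uint32_t *sh0_link)
{
	assert(e_shstrndx != NULL && sh0_link != NULL);

	if (ndx < SHN_LORESERVE) {
		/* Small indices fit in the header field itself. */
		*e_shstrndx = (uint16_t) ndx;
		*sh0_link = 0;
	} else {
		/* Large indices use the escape value. */
		*e_shstrndx = SHN_XINDEX;
		*sh0_link = ndx;
	}
}
```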
As an extension to the elf_flag*() APIs, a flag value of zero will query the current set of flags on an object without changing them.
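The query behaviour falls out naturally if the flag routines return the resulting flag word: setting or clearing zero bits changes nothing. The following is a small model of that behaviour, not the library's code; enum flag_cmd and flag_update() are hypothetical, with the real API using ELF_C_SET and ELF_C_CLR as commands.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command values standing in for ELF_C_SET and ELF_C_CLR. */
enum flag_cmd { FLAG_SET, FLAG_CLR };

/*
 * Apply `flags' to `*cur' and return the resulting flag word.  With
 * flags == 0 neither command changes anything, so the call doubles
 * as a query.
 */
unsigned int
flag_update(unsigned int *cur, enum flag_cmd cmd, unsigned int flags)
{
	assert(cur != NULL);

	if (cmd == FLAG_SET)
		*cur |= flags;
	else
		*cur &= ~flags;
	return (*cur);
}
```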
- GNU libelf has a set of extensions that allow removing sections and moving sections around in the file. Are these useful for our tools?
jb@ has proposed that an elf_dump() API be added to -lelf.