Thread Local Storage
From http://people.freebsd.org/~marcel/tls.html
This document tries to collect all kinds of information related to TLS and serves as a design document and implementation guide. Nothing fancy, just something to help us flesh out the details.
The problem space
We seem to have a three-dimensional problem space:
* complete vs shared executable -- Complete means statically linked; the term is used in certain environments. Using complete in this context frees static to mean something else. The big difference between complete and shared is the absence or presence of a runtime linker.
* static vs dynamic TLS -- This refers to the TLS model in use. A process can have both models in use at the same time, but certain technical restrictions apply. The big difference is whether the __tls_get_addr() function is called to get the virtual address of a thread-local variable: dynamic TLS calls it, static TLS does not.
* with pthread vs without pthread -- This means whether a threads library (libthr or libkse) is present and/or in use. The existence of the __thread keyword does not imply that the process will be multi-threaded, so we have to deal with TLS accesses outside the context of a threaded application. The big difference between with and without pthread is the ability to actually have multiple threads.
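The third dimension can be illustrated with a small C sketch. It is not part of the design itself; the names counter, worker and tls_demo are made up for this example, and it assumes a toolchain that accepts __thread and provides POSIX threads. The calling thread uses its copy of the variable exactly as a non-threaded program would, and a second thread sees its own, freshly initialized copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One instance of this variable exists per thread; each starts at 3. */
static __thread int counter = 3;

/* Thread body: modify this thread's copy and report its final value. */
static void *worker(void *arg)
{
    int *result = arg;

    counter += 10;      /* touches only the worker's copy: 3 + 10 */
    *result = counter;
    return NULL;
}

/*
 * Hypothetical demo entry point. Returns the calling thread's copy
 * after a worker thread has run, and stores the worker's final value
 * in *worker_value.
 */
int tls_demo(int *worker_value)
{
    pthread_t t;
    int rc;

    counter = 3;        /* start from the initial value so repeated calls agree */
    counter += 1;       /* calling thread's copy is now 4 */

    rc = pthread_create(&t, NULL, worker, worker_value);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);

    return counter;     /* unaffected by the worker's update */
}
```

Calling tls_demo() from main() should return 4 while the worker reports 13, because the two threads never share storage for counter. Depending on the platform, the program may need to be linked against the threads library.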
Current platform support
Of the current tier 1 and tier 2 platforms, only i386 and ia64 have full toolchain support. This is with GCC 3.3. The status per platform:
* ia64 -- The current version of binutils (2.13.2) is buggy with respect to TLS. This seems to affect dynamic TLS relocations.
* alpha -- The TLS access sequences are not generated at all. The __thread keyword seems to be ignored.
* sparc64 -- The compiler emits an error when the __thread keyword is used.
* amd64 -- The assembler (binutils 2.13.2) does not support thread-local access relocations in 64-bit mode. When generating 32-bit (ILP32) code on amd64, the assembler supports TLS, but this has no practical value.
GCC 3.4 claims to have support for TLS on alpha and sparc64. This has not been tested or verified.
Below are typical TLS access sequences, both static and dynamic, for the platforms that do support TLS. The C code from which the access sequences are generated is:
    int __thread i = 3;

    int x()
    {
        return i;
    }
i386
static TLS access sequence
    movl %gs:0, %eax
    movl i@NTPOFF(%eax), %eax
dynamic TLS access sequence
    addl $_GLOBAL_OFFSET_TABLE_+[.-.L2], %ebx
    leal i@TLSGD(,%ebx,1), %eax
    call ___tls_get_addr@PLT
    movl (%eax), %eax
    popl %ebx
ia64
static TLS access sequence
    addl r14 = @tprel(i), tp
    ;;
    ld4 r8 = [r14]
dynamic TLS access sequence
    addl r14 = @ltoff(@dtpmod(i)), gp
    addl r15 = @ltoff(@dtprel(i)), gp
    ;;
    ld8 out0 = [r14]
    ld8 out1 = [r15]
    br.call.sptk b0 = __tls_get_addr
    ld4 r8 = [r8]
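To make the difference between the two models concrete, the address computation in the sequences above can be modelled in plain C. This is only a toy model under stated assumptions: struct toy_thread, toy_static_addr(), toy_tls_get_addr() and toy_tls_check() are hypothetical names, the per-thread table of module blocks is an assumed layout rather than the actual runtime linker data structure, and the module ID and offset are picked arbitrarily. The static model adds a fixed offset to the thread pointer; the dynamic model takes a (module, offset) pair, like the one the ia64 sequence loads into out0 and out1, and looks up the module's block first.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MODULES 4   /* hypothetical limit for this model */

/* Hypothetical per-thread bookkeeping; not a real libc or rtld structure. */
struct toy_thread {
    char *static_block;                   /* what the thread pointer would address */
    char *module_block[TOY_MAX_MODULES];  /* per-module TLS blocks, by module ID */
};

/* Static model: thread pointer plus an offset fixed at link time. */
static void *toy_static_addr(struct toy_thread *t, size_t tp_offset)
{
    return t->static_block + tp_offset;
}

/* Dynamic model: a stand-in for __tls_get_addr() taking (module, offset). */
static void *toy_tls_get_addr(struct toy_thread *t, size_t module, size_t offset)
{
    assert(module < TOY_MAX_MODULES && t->module_block[module] != NULL);
    return t->module_block[module] + offset;
}

/*
 * Returns 1 if both models resolve `i` to the same address within one
 * thread, and two threads get distinct storage for it.
 */
int toy_tls_check(void)
{
    int block_a[4] = {0, 0, 3, 0};      /* thread A's copy of the TLS image */
    int block_b[4] = {0, 0, 3, 0};      /* thread B's copy of the TLS image */
    size_t off_i = 2 * sizeof(int);     /* offset of `i` within the image */
    struct toy_thread a = { (char *)block_a, { NULL, (char *)block_a } };
    struct toy_thread b = { (char *)block_b, { NULL, (char *)block_b } };

    int *ia = toy_tls_get_addr(&a, 1, off_i);
    int *ib = toy_tls_get_addr(&b, 1, off_i);

    *ia = 42;                           /* write through thread A's address only */

    return (void *)ia == toy_static_addr(&a, off_i) /* models agree for module 1 */
        && ia != ib                                 /* separate storage per thread */
        && *ib == 3;                                /* B keeps the initial value */
}
```

In this model the static path is a single addition, which matches the short static sequences above, while the dynamic path needs a function call because the module's block has to be located per thread at run time.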