A couple of potential changes to the MIPS TLS ABI

Mon Jan 31 20:56:41 UTC 2005

Hi folks,

We have the 32-bit port of NPTL basically working now, and I have been
preparing the first patches for submission - which means binutils.  So
I have been going over the ABI in some detail looking for things that
we want to finalize before the patches are integrated.  I found three
points...

First, a minor correction: LE was using "ori" where the equivalent LD
sequence used "addiu".  I've updated this on the Wiki.  It was a
leftover from an earlier draft.

Second, the %tpoff operator is currently ambiguous.  When we say
%dtpoff, we are always talking about the offset from the base of this
module's DTV entry to the location of the variable; currently this is
always used with %hi and %lo.  However, %tpoff can be used with %hi and
%lo (Local Exec model, refers to variable offset from thread pointer)
%or without (Initial Exec model, refers to GOT slot holding the
variable offset).

How about this instead?
  R_MIPS_TLS_TPOFF -> R_MIPS_TLS_GOTTPOFF
  %tpoff(x)($28) -> %gottpoff(x)($28)

This also frees up %tpoff in case we want to use that the way PowerPC
uses @tprel.  foo at tprel@l is the low 16 bits of the offset; foo at tprel s
the low 16 bits also, but signals an error if the ofset does not fit
entirely in 16 bits.  The alternate sequence III of Local Exec could
take advantage of this.

Third, the Design Choices section of the specification has this to say:

     *  The compiler is not allowed to schedule the sequences below. 

  The sequences below must appear exactly as written in the code
  generated by the compiler. This restriction is present because we have
  not yet determined what linker optimizations may be possible. In order
  to facilitate adding linker optimizations in the future, without
  recompiling current code, the compiler is restricted from scheduling
  these sequences. 

I'd like to settle this one way or the other before finalizing the
spec.

For reference, the possible linker optimizations are:
 General Dynamic -> Initial Exec (whenever linking an exec)
 Local Dynamic -> Local Exec (ditto)
 Initial Exec -> Local Exec (when the symbol is known to live in the
			     executable; can be applied starting at GD too)

The major advantage here is replacing a call to __tls_get_addr with a
rdhwr instruction.  These are, in theory, doable for MIPS.  Here's an
example o32 GD -> IE, probably the most important one:

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsgd(x)               R_MIPS_TLS_GD   x
   0x12 $gp restore (not mentioned in the TLS ABI)

   0x00 rdhwr $4, $5
   0x04 lw $5, %tpoff(x1)($28)         R_MIPS_TLS_TPOFF        x1
   0x08 addu $4, $4, $5
   0x12 the $gp restore can be nop'd out

There are a couple of other quirks for this; the only one I can think
of offhand is MIPS-I load delay slots, which would mean neither
sequence could be used as-is.  The immediate disadvantage is that the
compiler can not schedule the sequences.  I don't know what all the
tradeoffs are here, I just know that the compiler implementation would
be simpler if we did not make the sequences fixed and unschedulable.
So I'd like to ditch that unless folks think that
  (A) the linker optimizations are useful
  (B) the linker optimizations are feasible
  (C) someone is likely to implement the linker optimizations

Any opinions?  I see that Alpha does implement the TLS linker
relaxations; on the other hand, Alpha already had a linker relaxation
mechanism in place, and the GNU tools for MIPS don't.

-- 
Daniel Jacobowitz