[mips-tls] A couple of potential changes to the MIPS TLS ABI

Thu Feb 3 01:00:03 UTC 2005

Rather than respond to each email individually, I'm including one global
response to everybody's feedback.

>> The one area that I'm concerned about is the use of rdhwr to return 
>> the pointer.  There are several reasons why I'm not sure this is the 
>> right thing to do:

Dan> I'm getting a lot of conflicting feedback about this.

Mark>  From our point of view, we've already got a validated implementation 
Mark> using rdhwr.  We'd like to avoid having to rework our code and then 
Mark> revalidate.
Mark> 
Mark> Realistically, if rdhwr isn't officially blessed, some vendors might 
Mark> still use our implementation.  Or, things might just languish.  In
other 
Mark> words, I'm somewhat afraid that we've missed the technical window to 
Mark> debate this particular technical point.
Mark> 
Mark> As Dan says, the new MIPS ABI can do better in this regard, as in 
Mark> others.  Furthermore, if the kernel adds a syscall that can be used by

Mark> the o32 ABI, then the tools can be updated to work with that too.  I 
Mark> think the only immutable aspect of this existing design is that
if/once 
Mark> our implementation escapes into the wild, then kernels forevermore may

Mark> have to support the rdhwr solution, even if most programs no longer
use 
Mark> it.  I think that's a relatively small price to pay to get NPTL
working 
Mark> on MIPS.

In terms of the ship leaving the dock, is the issue one of specifically
rdhwr, or could we use another instruction which also traps as an RI (or
something else that isn't a syscall)? I'll talk more about rdhwr below, but
it's important for me to understand whether it's the instruction, or the
mechanism that makes you believe that the technical window has passed.

>> - rdhwr is a MIPS32/64 Release 2 instruction.  No existing MIPS I-IV 
>> implementation has this instruction and probably never will.  Even 
>> existing MIPS3264 Release 2 implementations don't have the internal 
>> register to hold the data.  This means that it will be years before 
>> any hardware will support the feature, and that support depends on an 
>> architecture decision (see next
>> item)

Dan> Compare this to a syscall.  There is no existing implementation which
will implement the syscall efficiently, and _never_ will be.

Thiemo> Yes. This means we will have a TLS register which is a bit slower
than a regular GPR for MIPS{32,64}R2 and a relatively slow emulated register
for older implementations. If we use a pseudo-syscall instead, we'll have
only the second variant with less performance potential.

I take your point on syscall vs. something else that traps as an ri.  So let
me try to explain my concern about the use of rdhwr specifically.

Compliance with the MIPS32/MIPS64 architectures (which is what's required
for implementations by both MIPS Technologies and MIPS architecture
licensees) requires passing a set of tests.  These tests check the corner
cases of the architecture at each revision.  We do this to prevent
fragmentation of the architecture and make your (you == the community of
people writing software for implementations of the architecture) life
easier.

In the particular case of rdhwr, we explicitly check that this instruction
generates a reserved instruction on implementations of Release 1 of the
architecture, and that all reserved encodings of rdhwr registers (which is
what you're proposing to use) cause a reserved instruction exception on
implementations of Release 2.  This means that there will never be a real
implementation of rdhwr on Release 1 implementations.  With the current
architecture spec, Release 2 implementations will be non-compliant with the
architecture unless we make an architecture change.  Changes to existing
architecture can certainly be done, but we don't take them lightly because
we need to get comment from those people who thought they had a stable
architecture from which to implement.  The fact that it's rdhwr makes it
somewhat simpler because we would make the TLS register optional, and
optional registers would cause a reserved instruction anyway.

But the point is, the decision to use a particular instruction for the TLS
pointer means that the architecture has to change.  To do that is going to
require some time while we consult with all of the architecture licensees.
Once that happens, somebody would have to actually implement the register on
an implementation of Release 2 of the architecture.  It will be years
(probably at least 2-3) before the first implementation appears with the TLS
register implemented via rdhwr, and the total population of those
implementations is going to be small.  The vast majority of MIPS
implementations will continue to trap with a reserved instruction, which
will fundamentally limit the performance of NPTL on MIPS.

The alternatives seem to be to use a GPR (but this requires an ABI change)
or to park the TLB  pointer someplace in the address space. I wondered to
Mark at one point whether we could put it at the base of the stack, then
down-align sp to access it.  We played with this a bit, but couldn't come up
with anything that was relatively clean.

So my feedback on the use of rdhwr (or any other instruction that traps) is
that as long as this is a short-term solution and/or we understand the
performance implications of how often that trap happens, it's OK.  Depending
on rdhwr to appear in a real implementation any time in the next 2-3 years
simply isn't going to happen.

If we do decide to use rdhwr (as opposed to another trapping instruction -
see further comments below), we're probably going to have to change whatever
RDHWR register number that you're using now.  You can't just pick one at
random as that will conflict with the architecture as we add new registers.

>> - We have some concerns at the architecture level about using rdhwr 
>> for this purpose rather than using a GPR under the umbrella of an ABI 
>> re-work that some of you are involved in.

Dan> This objection is way too vague for me to respond to.  Also, using
rdhwr does not prevent future use of a GPR, in an ABI that doesn't exist
yet.  Nice thing about read-only state; you can keep it in multiple places
easily.

Thiemo> The ABI re-work surely isn't mutually exclusive to o32 TLS. The main
reason for o32 TLS is to get rid of unmaintained linuxthreads while
maintaining source-level compatibility. It will also be available soon (the
same rationale applies to n32/n64 TLS, of course). The ABI re-work is much
more ambitious.

I personally believe that the ABI change is the long-term solution, but I
take your point about the needs for o32.

I've talked about my concerns about the use of rdhwr above.  My general
concern is about the widespread use of any instruction whose emulation
requires reading the instruction from memory (which would be pretty much
anything but syscall, which has at dedicated exception vector and passes
arguments via register).  We had occasion to have to debug a problem with
another operating system and a MIPS core from a different manufacturer.  We
discovered that this particular implementation did not guarantee that a load
done off the EPC value would always hit in the TLB.  In fact, it missed and
the kernel didn't use a guarded load, so it took a nested exception and
crashed.

You could say that this is a bug in the implementation, but we started to
look more broadly and concluded that it is possible for implementations,
particularly those that implement a virtual instruction cache, to wind up at
the reserved instruction handler and not have the instruction page mapped.

The advantage of syscall is that the argument is in a register, and no
instruction read is required to interpret the instruction.  One can
certainly use another instruction (e.g., rdhwr) whose emulation requires
reading the instruction, but the read needs to be guarded.  If this is true
of the Linux RI handler, we're all set.  If not, this needs to be considered
in the selection of the instruction that's going to trap.  While a TLB miss
isn't going to happen very often (maybe never on some processors), the code
has to deal with the case to ensure correctness.  When thinking about the
choice of rdhwr or something else that traps, we should consider this
situation.

>> - Some preliminary work at MIPS suggests that a tuned handler for 
>> syscall is faster than one for handling rdhwr at the reserved 
>> instruction handler. This means that we're betting on having actual 
>> hardware implementations of rdhwr out there in sufficient volume to 
>> make up for the fact that we're penalizing everybody else by using an 
>> RI trap hander vs. a syscall trap handler.

Dan> So is this one.  Can you be more specific?  The only substantial
difference in overhead that I am familiar with is the additional register
save/restores; note that this is a substantial _advantage_ for userland
which would otherwise have to save and restore additional registers.
Keeping the save/restores in the kernel is a win for code size and
complexity.

Thiemo> That's surprising, at least for the current Linux implementation.
The basic exception handler is the same, and the syscall path is already
time-critical and loaded with ABI dispatch etc. Adding another path to it
will penalize syscalls further. RI has no critical path yet, adding the
rdhwr emulation should be fast and relatively straightforward. Extracing the
instruction from mapped space could get slow if it interferes with TLB
handling, but I don't think that's the common case.

In thinking this over, I realized that I was combining performance with the
correctness issue that I mentioned above.  If one ignores the need to do a
guarded load, the performance delta between syscall and ri was very small.
I think that Macro did the original testing, and we can show you the code.
But I acknowledge that the isn't much difference in the two implementations.

It's been awhile since I looked at the code, but I thought we could hide the
additional instructions required to do this with syscall under the current
code for almost all implementations of the architecture.  That is, knowing
that all implementations are pipelined and that certain things create holes
in the pipeline, I seem to recall thinking that it would add no more cycles
(as opposed to instructions) to the syscall flow.  But as I said, it's been
awhile.

>> To me, all of these suggest that we may want to use syscall rather 
>> than rdhwr to get the pointer, at least until we can decide whether to 
>> dedicate a GPR for this purpose.

Dan> In any case, I am putting Ralf on the hot seat here.  I'm going to do
whatever he likes, anyway, since it's no good to me if the kernel doesn't
support it :-)

Thiemo> Is there some data available how much pressure has to be expected
for the TLS register vs. normal GPR? I would guess it is significantly
lower, since there are several Linux implementations which use emulation
sucessfully. In that case, rdhwr might be even benefical for the ABI
re-work, and free up an GPR which can be used for better things.

We have started some experiments on register usage and pressure as part of
the ABI work.  We'll let you know as we get more data.

/gmu

---
Michael Uhler, Chief Technology Officer
MIPS Technologies, Inc.   Email: uhler at mips.com
1225 Charleston Road      Voice:  (650)567-5025   FAX:   (650)567-5225
Mountain View, CA 94043   Mobile: (650)868-6870   Admin: (650)567-5085