[mips-tls] A couple of potential changes to the MIPS TLS ABI

Mon Feb 7 20:49:02 UTC 2005

On Wed, Feb 02, 2005 at 05:00:03PM -0800, Michael Uhler wrote:
> In terms of the ship leaving the dock, is the issue one of specifically
> rdhwr, or could we use another instruction which also traps as an RI (or
> something else that isn't a syscall)? I'll talk more about rdhwr below, but
> it's important for me to understand whether it's the instruction, or the
> mechanism that makes you believe that the technical window has passed.

I don't care what the trapping instruction is; I would prefer not to
move away from an RI.  As Mark wrote, that code is tested and working
(although not quite finalized).

> >> - rdhwr is a MIPS32/64 Release 2 instruction.  No existing MIPS I-IV 
> >> implementation has this instruction and probably never will.  Even 
> >> existing MIPS32/64 Release 2 implementations don't have the internal
> >> register to hold the data.  This means that it will be years before 
> >> any hardware will support the feature, and that support depends on an 
> >> architecture decision (see next item)
> 
> Dan> Compare this to a syscall.  There is no existing implementation which
> will implement the syscall efficiently, and _never_ will be.
> 
> Thiemo> Yes. This means we will have a TLS register which is a bit slower
> than a regular GPR for MIPS{32,64}R2 and a relatively slow emulated register
> for older implementations. If we use a pseudo-syscall instead, we'll have
> only the second variant with less performance potential.

Thiemo, you may already be clear on this point, but I'm going to
highlight it for the discussion anyway: the rdhwr solution does not use
a real register on MIPS32r2.  It will trap on every existing CPU.

> I take your point on syscall vs. something else that traps as an ri.  So let
> me try to explain my concern about the use of rdhwr specifically.
> 
> Compliance with the MIPS32/MIPS64 architectures (which is what's required
> for implementations by both MIPS Technologies and MIPS architecture
> licensees) requires passing a set of tests.  These tests check the corner
> cases of the architecture at each revision.  We do this to prevent
> fragmentation of the architecture and make your (you == the community of
> people writing software for implementations of the architecture) life
> easier.
> 
> In the particular case of rdhwr, we explicitly check that this instruction
> generates a reserved instruction on implementations of Release 1 of the
> architecture, and that all reserved encodings of rdhwr registers (which is
> what you're proposing to use) cause a reserved instruction exception on
> implementations of Release 2.  This means that there will never be a real
> implementation of rdhwr on Release 1 implementations.  With the current
> architecture spec, Release 2 implementations will be non-compliant with the
> architecture unless we make an architecture change.  Changes to existing
> architecture can certainly be done, but we don't take them lightly because
> we need to get comment from those people who thought they had a stable
> architecture from which to implement.  The fact that it's rdhwr makes it
> somewhat simpler because we would make the TLS register optional, and
> optional registers would cause a reserved instruction anyway.

No, I don't think you are looking at this from the right side.  The
decision to use a reserved rdhwr encoding for the thread pointer, AT
SOME FUTURE TIME, does not mean that Release 2 has any need to change.
The RI can be trapped on current processors, and it can be added to a
future architecture revision.  On the other hand, if the performance
benefits are compelling enough, that leaves you room to change the
architecture.  The phrase "Release 2 implementations will be
non-compliant" only applies to "Release 2 implementations with this
hypothetical register, of which I expect there to be none".

> But the point is, the decision to use a particular instruction for the TLS
> pointer means that the architecture has to change.  To do that is going to
> require some time while we consult with all of the architecture licensees.
> Once that happens, somebody would have to actually implement the register on
> an implementation of Release 2 of the architecture.  It will be years
> (probably at least 2-3) before the first implementation appears with the TLS
> register implemented via rdhwr, and the total population of those
> implementations is going to be small.  The vast majority of MIPS
> implementations will continue to trap with a reserved instruction, which
> will fundamentally limit the performance of NPTL on MIPS.
> 
> The alternatives seem to be to use a GPR (but this requires an ABI change)

As many people have pointed out, waiting for the ABI change isn't
practical.  In a sense, that would also fundamentally limit the
performance of NPTL on MIPS :-)

> or to park the TLB pointer someplace in the address space. I wondered to
> Mark at one point whether we could put it at the base of the stack, then
> down-align sp to access it.  We played with this a bit, but couldn't come up
> with anything that was relatively clean.

You can't do it that way.  This is what LinuxThreads used to do and it
imposes impossible limits on your stack alignment and sizing.
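
To make the constraint concrete, here is a minimal sketch of the
down-align idea.  It is not the LinuxThreads code; THREAD_STACK_SIZE,
descr_addr_from_sp and stack_layout_ok are names invented for this
illustration, and the 2MB size is just an assumed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed per-thread stack size; must be a power of two. */
#define THREAD_STACK_SIZE ((uintptr_t)2 * 1024 * 1024)

/* Find the thread descriptor by masking the stack pointer down to the
   nearest THREAD_STACK_SIZE boundary (the "base of the stack"). */
static uintptr_t descr_addr_from_sp(uintptr_t sp)
{
    assert((THREAD_STACK_SIZE & (THREAD_STACK_SIZE - 1)) == 0);
    return sp & ~(THREAD_STACK_SIZE - 1);
}

/* The lookup above is only right if every stack starts on a
   THREAD_STACK_SIZE boundary and is exactly that large. */
static bool stack_layout_ok(uintptr_t stack_base, uintptr_t stack_size)
{
    return (stack_base & (THREAD_STACK_SIZE - 1)) == 0 &&
           stack_size == THREAD_STACK_SIZE;
}
```

Any stack that is larger than the fixed size, or that is not aligned to
it, makes the mask land on the wrong address.  That is the sizing and
alignment limit I mean.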

Want to talk to me more about using a parked TLB entry?  I spoke with
someone (Ralf or Jun, probably) about the idea originally; I was told
it wasn't possible on MIPS SMP implementations to make this work, or
that there was some other reason why it was undesirable.  If that's not
accurate then we could use a reserved memory location.  However, that
makes the TLS model dependent on details of Linux's memory mapping -
not good for a hopefully generally useful ABI.

Note that I haven't been doing thread benchmarking, but the
performance overhead from emulating rdhwr has not been significant
in casual testing.  I'm not weeping for the lost speed either way.

> So my feedback on the use of rdhwr (or any other instruction that traps) is
> that as long as this is a short-term solution and/or we understand the
> performance implications of how often that trap happens, it's OK.  Depending
> on rdhwr to appear in a real implementation any time in the next 2-3 years
> simply isn't going to happen.

I understand that.

> If we do decide to use rdhwr (as opposed to another trapping instruction -
> see further comments below), we're probably going to have to change whatever
> RDHWR register number that you're using now.  You can't just pick one at
> random as that will conflict with the architecture as we add new registers.

Hint: that's why I asked MIPS for feedback, so that we could get a
non-conflicting register number assigned.  The only reason I picked $5
was because it was unassigned in the MIPS32r2 spec and I couldn't find
any reference to plans for it.

> I've talked about my concerns about the use of rdhwr above.  My general
> concern is about the widespread use of any instruction whose emulation
> requires reading the instruction from memory (which would be pretty much
> anything but syscall, which has a dedicated exception vector and passes
> arguments via register).  We had occasion to have to debug a problem with
> another operating system and a MIPS core from a different manufacturer.  We
> discovered that this particular implementation did not guarantee that a load
> done off the EPC value would always hit in the TLB.  In fact, it missed and
> the kernel didn't use a guarded load, so it took a nested exception and
> crashed.
> 
> You could say that this is a bug in the implementation, but we started to
> look more broadly and concluded that it is possible for implementations,
> particularly those that implement a virtual instruction cache, to wind up at
> the reserved instruction handler and not have the instruction page mapped.
> 
> The advantage of syscall is that the argument is in a register, and no
> instruction read is required to interpret the instruction.  One can
> certainly use another instruction (e.g., rdhwr) whose emulation requires
> reading the instruction, but the read needs to be guarded.  If this is true
> of the Linux RI handler, we're all set.  If not, this needs to be considered
> in the selection of the instruction that's going to trap.  While a TLB miss
> isn't going to happen very often (maybe never on some processors), the code
> has to deal with the case to ensure correctness.  When thinking about the
> choice of rdhwr or something else that traps, we should consider this
> situation.

All reads from userspace are always protected in Linux; anything else
is a bug, plain and simple.  This is a non-issue.  Your point about the
performance overhead of reading and decoding an instruction is worth
keeping in mind, but hasn't been a big slowdown.
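
For a sense of how little decoding is involved, here is a rough sketch
of the emulation step.  It is not the kernel code.  The function names
are made up, HWR_THREAD_POINTER uses the $5 placeholder from this
thread rather than an assigned number, and the instruction word is
assumed to have been fetched from the EPC already with a protected
user-space read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_SPECIAL3 0x1fu  /* major opcode, bits 31..26 */
#define FUNCT_RDHWR     0x3bu  /* function field, bits 5..0 */

/* Assumption: placeholder hardware register number for the thread
   pointer ($5 in this discussion); not an assigned value. */
#define HWR_THREAD_POINTER 5u

/* Build the instruction word for "rdhwr rt, $hwreg". */
static uint32_t encode_rdhwr(unsigned rt, unsigned hwreg)
{
    assert(rt < 32 && hwreg < 32);
    return (OPCODE_SPECIAL3 << 26) | ((uint32_t)rt << 16) |
           ((uint32_t)hwreg << 11) | FUNCT_RDHWR;
}

/* Try to emulate one trapped instruction word.  Returns true and writes
   the thread pointer to the destination GPR if the word is a rdhwr of
   the thread pointer register; returns false otherwise so the caller
   can deliver the usual reserved instruction signal.  Advancing the EPC
   and branch delay slots are left to the caller. */
static bool emulate_rdhwr(uint32_t insn, uint32_t gpr[32],
                          uint32_t thread_pointer)
{
    /* Check opcode and funct, and that bits 25..21 and 10..6 are zero. */
    if ((insn & 0xffe007ffu) != ((OPCODE_SPECIAL3 << 26) | FUNCT_RDHWR))
        return false;

    unsigned rt = (insn >> 16) & 0x1fu;  /* destination GPR */
    unsigned rd = (insn >> 11) & 0x1fu;  /* hardware register number */

    if (rd != HWR_THREAD_POINTER)
        return false;
    if (rt != 0)                         /* $zero stays zero */
        gpr[rt] = thread_pointer;
    return true;
}
```

The cost on top of the trap itself is one protected load, a mask and
compare, and two field extractions.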

> It's been awhile since I looked at the code, but I thought we could hide the
> additional instructions required to do this with syscall under the current
> code for almost all implementations of the architecture.  That is, knowing
> that all implementations are pipelined and that certain things create holes
> in the pipeline, I seem to recall thinking that it would add no more cycles
> (as opposed to instructions) to the syscall flow.  But as I said, it's been
> awhile.

I've seen pretty strong pushback against adding more complexity to the
already extremely complex syscall path.  A syscall which didn't trash
the same set of registers would add a lot of complexity.

In any case, I'll note that the binutils and GCC portions of TLS
support are ready for submission to the FSF and I'm moving on to glibc.
The GCC bits include the instruction under debate, whatever it turns out
to be.  I don't want to bog this down in discussion longer than
necessary, so I hope we can come to an agreement in the next few days.

-- 
Daniel Jacobowitz