[c++-pthreads] thread-safety definition
Dave Butenhof
David.Butenhof at hp.com
Thu Jan 8 14:04:05 UTC 2004
Mathieu Lacage wrote:
>On Thu, 2004-01-08 at 12:34, Dave Butenhof wrote:
>
>>>1) "inside cancelation": This is basically ExitThread (win32 name). It
>>>exists on every platform I know of that supports some form of threads.
>>>Unfortunately, its semantics vary a lot from one platform to another.
>>>On win32, it will not invoke any thread-specific cleanup
>>>handlers (neither C++ exceptions nor SEH are involved). On BeOS
>>>(exit_thread), it behaves just as it does on windows. On POSIX
>>>(pthread_exit) systems, it will invoke the thread-specific cancelation
>>>handlers.
>>>
>>The term "cancellation" seems heavy here. This is just a voluntary
>>termination. But, yes, there are similar properties -- certainly from
>>the point of view of the rest of the frames on the call stack at the time.
>>
>>
>Indeed. For a C++ POSIX binding, I would assume you might want to make
>such a function throw an exception, caught by the thread-creation
>function, so that the stack unwinds properly. Or is this some kind of
>wild, stupid idea?
>
>
One example: on Tru64 UNIX and OpenVMS, pthread_exit() raises an
exception, which is distinct from the exception provoked by
pthread_cancel(), but with similar characteristics. Specifically, that
an UNCAUGHT exception will terminate only the thread rather than the
process (it's implicitly caught in the thread library's internal "thread
base" routine), and that it's "generally improper" (though not
impossible nor even illegal) for any other agency to finalize
propagation of the exception, that is, to catch it without re-raising it.

It's an exception for exactly the same reason as cancel: so that each
active frame on the stack has the opportunity to perform appropriate
cleanup of resources before termination.

In the "pure POSIX model", without exceptions, both pthread_exit() and
cancellation provoke sequential LIFO execution of a stack of "POSIX
cleanup handlers" designated by the pthread_cleanup_push() operation.
The intended implementation of pthread_cleanup_push() (and our actual
implementation) is as a simple macro that initiates an exception scope,
analogous to a C++ "try {".
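To make the LIFO ordering concrete, here is a minimal C sketch, assuming a POSIX threads implementation. The names note, worker, order_log, and cleanup_order_demo are hypothetical and exist only for illustration. A thread pushes two cleanup handlers and then calls pthread_exit(); each handler appends a letter to a log so the order in which they ran can be inspected afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical log: each handler appends its tag so the order is visible. */
static char order_log[8];

static void note(void *tag)
{
    strcat(order_log, (const char *)tag);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(note, "A");   /* pushed first, runs last */
    pthread_cleanup_push(note, "B");   /* pushed last, runs first */
    pthread_exit(NULL);                /* voluntary termination */
    /* Not reached, but push and pop must pair up lexically. */
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    return NULL;
}

/* Runs the worker and returns the order in which the handlers fired. */
const char *cleanup_order_demo(void)
{
    pthread_t t;
    int rc;

    order_log[0] = '\0';
    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(t, NULL);             /* log is only read after the join */
    return order_log;
}
```

Calling cleanup_order_demo() from main() should return "BA": the handler pushed last runs first, which is the ordering a nested "try {" scope would give.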
>>>2) "outside cancelation": There are two kinds of "outside cancelation":
>>>
>>> 2.1) "async cancelation": The OS removes the thread from its list of
>>>tasks to schedule and does nothing to clean up the thread's resources.
>>>This is about the most useless feature a thread library can offer. BeOS and
>>>win32 provide it. POSIX does not provide it.
>>>
>>>
>I should add: win32 (TerminateThread), BeOS (kill_thread).
>
>>POSIX already defines "async cancel", as a mode where posting a cancel
>>to a thread will cause the cancellation to be delivered at any arbitrary
>>time supported by the OS and hardware. (Usually on the next clock tick,
>>though that's a "common implementation" rather than any rule or even
>>recommendation.)
>>
>>
>OK. I guess this definition of "POSIX async cancel" was already
>explained on the list before but I missed it. I believe this POSIX async
>cancel is similar enough (at least, it feels as unsafe to use) to
>"abort" that we could count it in section 2.1. What do you think ?
>
>
No, not really. POSIX async cancel is still an exception, allowing
hierarchical isolated cleanup of each active frame on the stack. It's
just that, because of the resource ownership dilemma, there's no way to
safely use async-cancel in "general code". It has to be restricted to
areas of code that do not acquire or release resources, including any
calls to external functions that might.

Nevertheless, async cancel CAN be used safely if you're careful, without
disrupting the operation of the process. This is not true of
TerminateThread, or the hypothetical pthread_abort() proposal, which
immediately deschedule the victim thread and abandon any resources it
might own -- including heap (which can cause memory leaks) and
synchronization objects (which, far worse, is almost guaranteed to cause
deadlocks).

And note that it's OK to allocate heap, or lock a mutex, and then enable
async cancel for some section of code, disabling async cancel before
freeing the memory or releasing the mutex. In such a sequence, the
cleanup handlers invoked by async cancel DO know the state of the
resources (they are "acquired"), and can clean up. You simply can't
enable async cancel across a call that allocates or frees heap, or that
locks or unlocks a mutex, because the cleanup handler couldn't tell
whether the operation had completed.
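The sequence described above might look like the following C sketch. It is an illustration under assumptions, not a recommended pattern: the names lock, release_all, cruncher, and async_region_demo are hypothetical, the infinite loop exists only to guarantee that the cancel lands inside the region, and the atomic flags are assumed to compile to plain lock-free stores. Both resources are acquired while cancellation is still deferred, so the cleanup handler can rely on them being held.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int in_region;     /* set once the async-cancel region is entered */
static atomic_int buffer_freed;  /* set by the cleanup handler */

/* Whenever this runs, both the mutex and the buffer are known to be held. */
static void release_all(void *buf)
{
    free(buf);
    atomic_store(&buffer_freed, 1);
    pthread_mutex_unlock(&lock);
}

static void *cruncher(void *arg)
{
    int old;
    volatile unsigned long spin = 0;
    void *buf;
    (void)arg;

    /* Acquire everything while cancellation is still deferred. */
    pthread_mutex_lock(&lock);
    buf = malloc(64);

    pthread_cleanup_push(release_all, buf);
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);

    /* Pure computation: no calls, no acquisition, no release. */
    atomic_store(&in_region, 1);
    for (;;)
        spin++;

    /* Not reached in this demo; a finite loop would continue like this. */
    pthread_setcanceltype(old, &old);
    pthread_cleanup_pop(1);
    return NULL;
}

/* Returns 0 if the thread was cancelled and left nothing behind. */
int async_region_demo(void)
{
    pthread_t t;
    void *result;
    int rc = pthread_create(&t, NULL, cruncher, NULL);
    assert(rc == 0);

    while (!atomic_load(&in_region))
        sched_yield();                 /* wait until the region is active */
    pthread_cancel(t);
    pthread_join(t, &result);

    if (result != PTHREAD_CANCELED || !atomic_load(&buffer_freed))
        return 1;
    if (pthread_mutex_trylock(&lock) != 0)
        return 2;                      /* the mutex was abandoned */
    pthread_mutex_unlock(&lock);
    return 0;
}
```

The only POSIX call made while async cancel is enabled is pthread_setcanceltype() itself. Moving the malloc() or the pthread_mutex_lock() inside the region would leave release_all() unable to tell whether it owns anything.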
In contrast, ANY use of TerminateThread trashes the process
unrecoverably, except in extremely unusual circumstances where an
embedded-type application really knows precisely what the victim thread
might be doing and can reliably repair any predicates and release or
safely discard any resources. You can NEVER do this with a thread that
might be running arbitrary library code, because you can't possibly know
what resources it might own or the effect of abandoning them. (That's
why pthread_abort() was rejected. While it's useful and even essential
for some class of embedded system application, it's very nearly useless,
and extremely dangerous, in any more general environment. Since the real
value of POSIX in true embedded system design is "programmer
portability", not full portability of every API, there would have been
no point to including this specialized function in the general standard.)
>>"Cancellation" (both deferred and async) comes from the Digital "CMA"
>>architecture (where it was called "alert"). The CMA concept derives from
>>a less structured (but fundamentally similar) capability in the SRC
>>research labs' Topaz thread package.
>>
>>
>Do you know of other widely used system-level APIs which provide similar
>features?
>
>
No; though that's no guarantee that some haven't cropped up somewhere.
>>>Definition "Posix thread-safety":
>>>---------------------------------
>>>A library is "posix thread-safe" if it is thread-safe and
>>>deferred-cancelation-safe.
>>>
>>I wouldn't tack cancel-safety onto thread-safety so intimately, although
>>
>>
>I used the POSIX name because I thought it was the only widely deployed
>system which provides this service. Maybe we should rename this to
>"strong thread-safety". Maybe "deferred-cancel thread-safety"?
>
>
But my point was that it's perfectly reasonable to have POSIX
thread-safety without cancel-safety. I don't see how it's relevant
whether anything but POSIX also has cancel-safety.
>>(Async cancel is an oddity; there are, and should be, very few
>>async-cancel-safe functions. Async-cancel regions of code cannot
>>accommodate resource acquisition or release of any sort, as the recovery
>>code is generally unable to determine the state of the resource.)
>>
>>
>Yes. This is why I don't feel it's necessary to discuss it further:
>so little code will be concerned with it that we can ignore it
>altogether for most C++ libraries.
>
>
Introducing asynchronous exceptions into C++ would be pointlessly
disruptive, like introducing continuable exceptions. I'd rather not even
consider it.

Even if it were supported, though, C++ is certainly free to follow the
lead of POSIX. We designated only a very few functions to be
async-cancel safe; and even at that I think we ended up with more than
we really should have had. (I never really figured out why we ended up
with pthread_cancel() being async-cancel safe, and I don't think it
makes any sense. The guy who wrote the text couldn't remember either,
but in the end we decided not to risk changing it.) Really, in terms of
POSIX standard APIs, all you can do with async cancel enabled is to
DISABLE async-cancel. I like it that way. There's no reason at all that
ANY of the standard C++ runtime should be designated (or coded) to be
async-cancel safe.
>>Nevertheless, it's quite reasonable to write a "thread-safe" special
>>purpose application routine that doesn't deal with cancellation simply
>>because the designer KNOWS that a thread running that code cannot be
>>cancelled. One might even make this choice within a general purpose
>>library in some cases -- say, for a daemon thread that could never run
>>application code nor be identified to the application, and that
>>therefore cannot be cancelled.
>>
>>
>Yes. Exactly. I have written a lot of code like that. The core C++
>threaded code is hidden far away from the user, who therefore cannot
>"posix-defer-cancel" it. The user can't even ever get the C++ exceptions, since
>they are caught by catch (...) and transformed into C error codes.
>
>
This doesn't sound like the same thing, though. Your catch(...) may
prevent the cancel from doing what it SHOULD do, but it won't prevent
delivery, and you've just ignored the application's cancel request.
That's bad, and while it may be "cancel safe" in some trivial respect
(an unexpected cancel request won't corrupt the library state), it's not
useful to anyone.

If code runs in an application thread, or a thread for which application
code might have a valid handle, then that thread can be cancelled at the
whim of the application. You can of course simply DOCUMENT that doing
this is an error. You can say it'll be ignored, or you can say that it
may arbitrarily corrupt application state; but that's not a true general
purpose library.

What I'm talking about is a separate thread created within the library
to which no application code could possibly have a reference. It is
physically impossible for the application code to ever REQUEST
cancellation. (Yeah, very little is "physically impossible", and a
simple uninitialized variable could end up holding the handle of such a
thread; but that's an application error against which nobody can
reasonably defend.) Anyway, if the application "CAN'T" cancel the
thread, and the library knows that it WON'T cancel the thread, there's
no point in writing code that runs ONLY within that thread to be cancel
safe.
>As a conclusion to these (tentative) definitions, I believe the purpose
>of this mailing list is to find a solution to develop "deferred-cancel
>thread-safe" C++ libraries: simple "thread-safe" libraries do not
>require special attention. If everyone could agree to the statement
>above, it would probably make the discussion more productive: other
>threading models which do not support async cancelation are of no
>interest to the discussion and can be forgotten.
>
>
Code that cannot ever be subject to cancellation need not be cancel
safe, if that's what you mean. If code was written to a thread model
without cancellation, or written specifically for an environment where
it would not be cancelled, that code can be brought into a new
"cancellable C++" environment safely as long as that basic premise
continues -- that it will not be run in a thread that's cancelled.
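One way a caller might uphold that premise, not something the text above prescribes, is to disable cancelability around the legacy code. The C sketch below assumes POSIX threads; legacy_routine, host_thread, and shielded_legacy_demo are hypothetical names. The legacy routine passes through a cancellation point (nanosleep) and installs no cleanup handlers, yet a cancel request posted while it runs simply stays pending until the caller re-enables cancellation.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int started, release_flag, finished;

/* Stand-in for code written with no thought of cancellation: it passes
 * through a cancellation point and installs no cleanup handlers. */
static void legacy_routine(void)
{
    struct timespec nap = { 0, 1000000 };   /* 1 ms */

    atomic_store(&started, 1);
    while (!atomic_load(&release_flag))
        nanosleep(&nap, NULL);
    atomic_store(&finished, 1);
}

static void *host_thread(void *arg)
{
    int old;
    (void)arg;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
    legacy_routine();                  /* a posted cancel stays pending */
    pthread_setcancelstate(old, &old);
    pthread_testcancel();              /* the pending request is acted on here */
    return NULL;
}

/* Returns 0 if the legacy code ran to completion before the cancel took effect. */
int shielded_legacy_demo(void)
{
    pthread_t t;
    void *result;
    int rc = pthread_create(&t, NULL, host_thread, NULL);
    assert(rc == 0);

    while (!atomic_load(&started))
        sched_yield();
    pthread_cancel(t);                 /* posted while the legacy code runs */
    atomic_store(&release_flag, 1);
    pthread_join(t, &result);

    return (atomic_load(&finished) && result == PTHREAD_CANCELED) ? 0 : 1;
}
```

The request is delayed rather than ignored: the thread still terminates as cancelled, but only after the legacy code has returned and from a point where the caller controls the cleanup.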
>If people agree on this statement, the only issue I can see which
>delimits the design space for the solution to this problem is whether or
>not you wish to allow the C++ library calling into C code (which uses
>pthreads) and/or allow C code to use the C++ library (which uses our C++
>threading solution).
>
>Maybe it would help to consider the cases separately and try to
>figure out what requirements each case creates:
> 1) C++ library calls C++ code and is called by C++ code.
> 2) C++ library calls into C code.
> 3) C code calls C++ library.
>
>The hard part seems to be 2) and 3) where, if you use exceptions to
>propagate a cancel operation from either a cancelation point or a
>pthread_exit call, you need to correctly handle the registered
>cancelation handlers _and_ the C++ catch blocks in the right order. That
>seems pretty hard (i.e., impossible) to me, being just a _user_ of thread
>libraries.
>
>
The impact extends beyond C and C++, to every facility that deals with
exceptions: Java, Ada, Modula-2+, or whatever else. The call stack must
be unwound once, and all handlers, no matter how declared or in what
language, called in the correct sequence. You're right -- it's nearly
impossible without exceptions; yet it's trivial, natural, and all but
unavoidable if everyone uses the same common exception/unwind package.
(And I might point out that any "non exception" mechanism that could
accomplish it would be indistinguishable from a common exception
infrastructure anyway!) That's precisely why cancellation and thread
exit ARE exceptions, were always intended to be exceptions, and cannot
practically be anything else. ;-)
>If people are not interested in 2) and 3) and just want to design a
>solution for 1), then I think it will make the discussion more
>productive to acknowledge it.
>
>
The ANSI C++ committee could well do that; just as POSIX and C++ have so
far essentially ignored each other. However, we might look back at the
recently revealed origin of the name and subject of this mailing list,
which is tangled up with actual implementation on a real system,
specifically gcc. THEY cannot ignore interoperability between C and C++;
nor can anyone else in the real world. So even if the committee were
to decide it cannot or is unwilling to address 2 and 3, I don't think
that decision would be relevant to this mailing list!
--
/--------------------[ David.Butenhof at hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/