[c++-pthreads] thread-safety definition

Thu Jan 8 14:04:05 UTC 2004

Mathieu Lacage wrote:

>On Thu, 2004-01-08 at 12:34, Dave Butenhof wrote:
>
>>>1) "inside cancelation": This is basically ExitThread (win32 name). It
>>>exists on every platform I know of that supports some form of threads.
>>>Unfortunately, its semantics vary a lot from one platform to another.
>>>On win32, it will not invoke any thread-specific cleanup handlers
>>>(neither C++ exceptions nor SEH are involved). On BeOS (exit_thread),
>>>it behaves just like on Windows. On POSIX systems (pthread_exit), it
>>>will invoke the thread-specific cancelation handlers.
>>>
>>The term "cancellation" seems heavy here. This is just a voluntary 
>>termination. But, yes, there are similar properties -- certainly from 
>>the point of view of the rest of the frames on the call stack at the time.
>>    
>>
>Indeed. For a C++ POSIX binding, I would assume you might want to make
>such a function throw an exception caught by the thread-creation
>function, so that the stack is unwound properly. Or is this some kind
>of wild, stupid idea?
>  
>
One example: on Tru64 UNIX and OpenVMS, pthread_exit() raises an
exception, which is distinct from the exception provoked by
pthread_cancel(), but with similar characteristics. Specifically, an
UNCAUGHT exception will terminate only the thread rather than the
process (it's implicitly caught in the thread library's internal "thread
base" routine), and it's "generally improper" (though neither
impossible nor even illegal) for any other agency to finalize
propagation of the exception.

It's an exception for exactly the same reason as cancel: so that each 
active frame on the stack has the opportunity to perform appropriate 
cleanup of resources before termination.

In the "pure POSIX model", without exceptions, both pthread_exit() and 
cancellation provoke sequential LIFO execution of a stack of "POSIX 
cleanup handlers" designated by the pthread_cleanup_push() operation. 
The intended implementation of pthread_cleanup_push() (and our actual 
implementation) is as a simple macro that initiates an exception scope, 
analogous to a C++ "try {".
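
A minimal sketch of that LIFO behavior, assuming a POSIX system where
pthread_cleanup_push() and pthread_cleanup_pop() are lexically paired
macros. The names order_log, note, exit_worker and run_exit_demo are made
up for illustration; only the pthread_* calls come from the standard.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical demo names; only the pthread_* calls are POSIX. */
static char order_log[8];   /* written by the worker, read after join */

static void note(void *tag)
{
    /* Append this handler's tag so the execution order is visible. */
    strcat(order_log, (const char *)tag);
}

static void *exit_worker(void *unused)
{
    (void)unused;
    pthread_cleanup_push(note, "A");   /* pushed first, runs last  */
    pthread_cleanup_push(note, "B");   /* pushed last,  runs first */
    pthread_exit(NULL);                /* voluntary termination    */
    pthread_cleanup_pop(0);            /* not reached; closes the macro scope */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Runs the worker and returns the order in which the handlers fired. */
const char *run_exit_demo(void)
{
    pthread_t t;

    order_log[0] = '\0';
    if (pthread_create(&t, NULL, exit_worker, NULL) != 0)
        return "";
    pthread_join(t, NULL);
    return order_log;
}
```

Calling run_exit_demo() should return "BA": the handler pushed last runs
first, just as nested exception scopes would unwind.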

>>>2) "outside cancelation": There are two kinds of "outside cancelation":
>>>
>>>	2.1) "async cancelation": The OS removes the thread from its list of
>>>tasks to schedule and does nothing to clean up the thread's resources.
>>>This is the most extreme, and most useless, feature of a thread library.
>>>BeOS and win32 provide it. POSIX does not provide it.
>>>      
>>>
>I should add: win32 (TerminateThread), BeOS (kill_thread).
>
>>POSIX already defines "async cancel", as a mode where posting a cancel 
>>to a thread will cause the cancellation to be delivered at any arbitrary 
>>time supported by the OS and hardware. (Usually on the next clock tick, 
>>though that's a "common implementation" rather than any rule or even 
>>recommendation.)
>>    
>>
>OK. I guess this definition of "POSIX async cancel" was explained on
>the list before, but I missed it. I believe this POSIX async cancel is
>similar enough (at least, it feels as unsafe to use) to "abort" that we
>could count it in section 2.1. What do you think?
>  
>
No, not really. POSIX async cancel is still an exception, allowing 
hierarchical isolated cleanup of each active frame on the stack. It's 
just that, because of the resource ownership dilemma, there's no way to 
safely use async-cancel in "general code". It has to be restricted to 
areas of code that do not acquire or release resources, including any 
calls to external functions that might.

Nevertheless, async cancel CAN be used safely if you're careful, without 
disrupting the operation of the process. This is not true of 
TerminateThread, or the hypothetical pthread_abort() proposal, which 
immediately deschedule the victim thread and abandon any resources it 
might own -- including heap (which can cause memory leaks) and 
synchronization objects (which, far worse, is almost guaranteed to cause 
deadlocks).

And note that it's OK to allocate heap, or lock a mutex, and then enable 
async cancel for some section of code, disabling async cancel before 
freeing the memory or releasing the mutex. In such a sequence, the 
cleanup handlers invoked by async cancel DO know the state of the 
resources (they are "acquired"), and can clean up. You simply can't 
enable async cancel across a call that allocates or frees heap, locks or 
unlocks a mutex, because the cleanup handler couldn't tell whether the 
operation had completed.
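
A rough sketch of that sequence, assuming a POSIX system with
asynchronous cancellation support. The names demo_lock, release_all,
async_worker and run_async_demo are hypothetical. The region between the
two pthread_setcanceltype() calls is pure computation: no calls, no
acquisition, no release.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical demo names; only the pthread_* calls are POSIX. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int keep_spinning = 1;   /* never cleared in this demo */
static volatile unsigned long spins;
static int cleanup_runs;                 /* read by the caller after join */

static void release_all(void *buf)
{
    /* Whenever this can run, both resources are known to be held. */
    free(buf);
    pthread_mutex_unlock(&demo_lock);
    cleanup_runs++;
}

static void *async_worker(void *unused)
{
    int old_type;
    void *buf = malloc(64);              /* acquire while cancel is deferred */

    (void)unused;
    pthread_mutex_lock(&demo_lock);
    pthread_cleanup_push(release_all, buf);

    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);
    while (keep_spinning)                /* pure computation only */
        spins++;
    pthread_setcanceltype(old_type, &old_type);   /* leave the region first */

    pthread_cleanup_pop(1);              /* then release on the normal path */
    return NULL;
}

/* Returns 0 if the cancelled worker released everything it held. */
int run_async_demo(void)
{
    pthread_t t;
    void *result;

    cleanup_runs = 0;
    if (pthread_create(&t, NULL, async_worker, NULL) != 0)
        return -1;
    pthread_cancel(t);                   /* stays pending until async is enabled */
    pthread_join(t, &result);
    if (result != PTHREAD_CANCELED || cleanup_runs != 1)
        return -1;
    if (pthread_mutex_trylock(&demo_lock) != 0)
        return -1;                       /* the mutex would have been abandoned */
    pthread_mutex_unlock(&demo_lock);
    return 0;
}
```

The handler never has to guess: malloc() and pthread_mutex_lock() both
completed before async cancel was enabled, and neither free() nor
pthread_mutex_unlock() can be interrupted halfway by it.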

In contrast, ANY use of TerminateThread trashes the process 
unrecoverably, except in extremely unusual circumstances where an 
embedded-type application really knows precisely what the victim thread 
might be doing and can reliably repair any predicates and release or 
safely discard any resources. You can NEVER do this with a thread that 
might be running arbitrary library code, because you can't possibly know 
what resources it might own or the effect of abandoning them. (That's 
why pthread_abort() was rejected. While it's useful and even essential 
for some class of embedded system application, it's very nearly useless, 
and extremely dangerous, in any more general environment. Since the real 
value of POSIX in true embedded system design is "programmer 
portability", not full portability of every API, there would have been 
no point to including this specialized function in the general standard.)

>>"Cancellation" (both deferred and async) comes from the Digital "CMA"
>>architecture (where it was called "alert"). The CMA concept derives from 
>>a less structured (but fundamentally similar) capability in the SRC 
>>research labs' Topaz thread package.
>>    
>>
>Do you know of other widely used system-level APIs which provide similar
>features?
>  
>
No; though that's no guarantee that some haven't cropped up somewhere.

>>>Definition "Posix thread-safety":
>>>---------------------------------
>>>A library is "posix thread-safe" if it is thread-safe and
>>>deferred-cancelation-safe.
>>>
>>I wouldn't tack cancel-safety onto thread-safety so intimately, although 
>>    
>>
>I used the POSIX name because I thought it was the only widely deployed
>system which provides this service. Maybe we should rename this to
>"strong thread-safety". Maybe "deferred-cancel thread-safety"?
>  
>
But my point was that it's perfectly reasonable to have POSIX 
thread-safety without cancel-safety. I don't see how it's relevant 
whether anything but POSIX also has cancel-safety.
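
To make the distinction concrete, here is a small sketch, assuming a
POSIX system where sleep() is a cancellation point. Both routines are
thread-safe, but only the second is also deferred-cancel-safe. All
function names are invented for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical demo names; only the pthread_* calls and sleep() are POSIX. */
static void unlock_on_cancel(void *m)
{
    pthread_mutex_unlock((pthread_mutex_t *)m);
}

/* Thread-safe only: the lock serializes callers, but a cancel delivered
 * inside sleep() leaves the mutex locked. */
static void *thread_safe_only(void *m)
{
    pthread_mutex_lock((pthread_mutex_t *)m);
    sleep(10);                                   /* cancellation point */
    pthread_mutex_unlock((pthread_mutex_t *)m);
    return NULL;
}

/* Thread-safe and deferred-cancel-safe: a cleanup handler releases the lock. */
static void *cancel_safe(void *m)
{
    pthread_mutex_lock((pthread_mutex_t *)m);
    pthread_cleanup_push(unlock_on_cancel, m);
    sleep(10);                                   /* cancellation point */
    pthread_cleanup_pop(1);                      /* normal path unlocks too */
    return NULL;
}

/* Cancels a thread running fn. Returns 1 if the mutex was left locked,
 * 0 if it was released, -1 if the demo could not be set up. */
int lock_abandoned_after_cancel(void *(*fn)(void *))
{
    pthread_mutex_t m;
    pthread_t t;

    if (pthread_mutex_init(&m, NULL) != 0)
        return -1;
    if (pthread_create(&t, NULL, fn, &m) != 0)
        return -1;
    pthread_cancel(t);          /* acted upon once the thread reaches sleep() */
    pthread_join(t, NULL);
    if (pthread_mutex_trylock(&m) == EBUSY)
        return 1;               /* still held on behalf of the dead thread */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return 0;
}
```

The first routine is perfectly usable as long as its callers are never
cancelled; the second pays for a cleanup handler so that they can be.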

>>(Async cancel is an oddity; there are, and should be, very few 
>>async-cancel-safe functions. Async-cancel regions of code cannot 
>>accommodate resource acquisition or release of any sort, as the recovery
>>code is generally unable to determine the state of the resource.)
>>    
>>
>Yes. This is why I don't feel it's necessary to discuss it further:
>so little code will be concerned with it that we can ignore it
>altogether for most C++ libraries.
>  
>
Introducing asynchronous exceptions into C++ would be pointlessly 
disruptive, like introducing continuable exceptions. I'd rather not even 
consider it.

Even if it were supported, though, C++ is certainly free to follow the 
lead of POSIX. We designated only a very few functions to be 
async-cancel safe; and even at that I think we ended up with more than 
we really should have had. (I never really figured out why we ended up 
with pthread_cancel() being async-cancel safe, and I don't think it 
makes any sense. The guy who wrote the text couldn't remember either,
but in the end we decided not to risk changing it.) Really, in terms of 
POSIX standard APIs, all you can do with async cancel enabled is to 
DISABLE async-cancel. I like it that way. There's no reason at all that 
ANY of the standard C++ runtime should be designated (or coded) to be 
async-cancel safe.

>>Nevertheless, it's quite reasonable to write a "thread-safe" special 
>>purpose application routine that doesn't deal with cancellation simply 
>>because the designer KNOWS that a thread running that code cannot be 
>>cancelled. One might even make this choice within a general purpose
>>library in some cases -- say, for a daemon thread that could never run 
>>application code nor be identified to the application, and that 
>>therefore cannot be cancelled.
>>    
>>
>Yes. Exactly. I have written a lot of code like that. The core C++
>threaded code is hidden far away from the user, who therefore cannot
>"posix-defer-cancel" it. The user can't even get the C++ exceptions,
>since they are caught by catch (...) and transformed into C error codes.
>  
>
This doesn't sound like the same thing, though. Your catch(...) may 
prevent the cancel from doing what it SHOULD do, but it won't prevent 
delivery, and you've just ignored the application's cancel request. 
That's bad, and while it may be "cancel safe" in some trivial respect, 
(an unexpected cancel request won't corrupt the library state), it's not 
useful to anyone.

If code runs in an application thread, or a thread for which application 
code might have a valid handle, then that thread can be cancelled at the 
whim of the application. You can of course simply DOCUMENT that doing 
this is an error. You can say it'll be ignored, or you can say that it 
may arbitrarily corrupt application state; but that's not a true general 
purpose library.
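
For comparison, POSIX also lets a routine hold a cancel request pending
rather than swallow it, by disabling cancellation for the duration of
the call and restoring the previous state afterwards. This is only a
sketch of that standard mechanism, not something proposed in this
thread; library_call, app_thread and run_pending_demo are hypothetical
names.

```c
#include <pthread.h>

/* Hypothetical demo names; only the pthread_* calls are POSIX. */
static int library_work_done;   /* read by the caller after join */

/* A library entry point that is not written to be cancel-safe. */
static void library_call(void)
{
    int old_state;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    pthread_testcancel();       /* a cancellation point, inert while disabled */
    library_work_done = 1;
    pthread_setcancelstate(old_state, &old_state);
}

static void *app_thread(void *unused)
{
    (void)unused;
    pthread_cancel(pthread_self());   /* request is deferred, so it stays pending */
    library_call();                   /* runs to completion despite the request */
    pthread_testcancel();             /* the pending request is honored here */
    return NULL;                      /* not reached in this demo */
}

/* Returns 0 if the library call finished AND the cancel was still delivered. */
int run_pending_demo(void)
{
    pthread_t t;
    void *result;

    library_work_done = 0;
    if (pthread_create(&t, NULL, app_thread, NULL) != 0)
        return -1;
    pthread_join(t, &result);
    return (result == PTHREAD_CANCELED && library_work_done == 1) ? 0 : -1;
}
```

The application's request is delayed, not discarded, which is the
difference from a catch (...) that turns it into an error code.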

What I'm talking about is a separate thread created within the library 
to which no application code could possibly have a reference. It is 
physically impossible for the application code to ever REQUEST 
cancellation. (Yeah, very little is "physically impossible", and a 
simple uninitialized variable could end up holding the handle of such a 
thread; but that's an application error against which nobody can 
reasonably defend.) Anyway, if the application "CAN'T" cancel the 
thread, and the library knows that it WON'T cancel the thread, there's 
no point in writing code that runs ONLY within that thread to be cancel 
safe.
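
As a sketch of that situation, assume a library whose worker thread's
handle lives in a file-scope static and is never returned to callers.
The names lib_start, lib_stop and daemon_main are invented. The thread
waits in pthread_cond_wait(), which is a cancellation point, yet pushes
no cleanup handler, because nothing outside this file can name the
thread in order to cancel it.

```c
#include <pthread.h>

/* Hypothetical library internals; none of these names are exported
 * except lib_start() and lib_stop(). */
static pthread_t       daemon_tid;    /* never handed to application code */
static pthread_mutex_t daemon_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  daemon_wake = PTHREAD_COND_INITIALIZER;
static int             daemon_stop;
static int             daemon_exited_normally;

static void *daemon_main(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&daemon_lock);
    /* No cleanup handler: the library never cancels this thread and
     * the application has no handle with which to try. */
    while (!daemon_stop)
        pthread_cond_wait(&daemon_wake, &daemon_lock);
    daemon_exited_normally = 1;
    pthread_mutex_unlock(&daemon_lock);
    return NULL;
}

/* Starts the private worker; returns 0 on success. */
int lib_start(void)
{
    daemon_stop = 0;
    daemon_exited_normally = 0;
    return pthread_create(&daemon_tid, NULL, daemon_main, NULL);
}

/* Asks the worker to finish cooperatively and waits for it. */
int lib_stop(void)
{
    pthread_mutex_lock(&daemon_lock);
    daemon_stop = 1;
    pthread_cond_signal(&daemon_wake);
    pthread_mutex_unlock(&daemon_lock);
    if (pthread_join(daemon_tid, NULL) != 0)
        return -1;
    return daemon_exited_normally ? 0 : -1;
}
```

Shutdown is cooperative (a flag plus a condition variable), so the
thread always leaves through its normal return path.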

>As a conclusion to these (tentative) definitions, I believe the purpose
>of this mailing list is to find a solution to develop "deferred-cancel
>thread-safe" C++ libraries: simple "thread-safe" libraries do not
>require special attention. If everyone could agree to the statement
>above, it would probably make the discussion more productive: other
>threading models which do not support async cancelation are of no
>interest to the discussion and can be forgotten.
>  
>
Code that cannot ever be subject to cancellation need not be cancel 
safe, if that's what you mean. If code was written to a thread model 
without cancellation, or written specifically for an environment where 
it would not be cancelled, that code can be brought into a new 
"cancellable C++" environment safely as long as that basic premise 
continues -- that it will not be run in a thread that's cancelled.

>If people agree on this statement, the only issue I can see which
>delimits the design space for the solution to this problem is whether or
>not you wish to allow the C++ library calling into C code (which uses
>pthreads) and/or allow C code to use the C++ library (which uses our C++
>threading solution).
>
>Maybe it would help to consider the three cases separately and try to
>figure out what requirements each case creates:
>	1) C++ library calls C++ code and is called by C++ code.
>	2) C++ library calls into C code.
>	3) C code calls C++ library.
>
>The hard part seems to be 2) and 3) where, if you use exceptions to
>propagate a cancel operation from either a cancelation point or a
>pthread_exit call, you need to correctly handle the registered
>cancelation handlers _and_ the C++ catch blocks in the right order. That
>seems pretty hard (i.e., impossible) to me, being just a _user_ of thread
>libraries.
>  
>
The impact extends beyond C and C++, to every facility that deals with 
exceptions; Java, Ada, Modula-2+, or whatever else. The call stack must 
be unwound once, and all handlers, no matter how declared or in what 
language, called in the correct sequence. You're right -- it's nearly 
impossible without exceptions; yet it's trivial, natural, and all but 
unavoidable if everyone uses the same common exception/unwind package. 
(And I might point out that any "non exception" mechanism that could 
accomplish it would be indistinguishable from a common exception 
infrastructure anyway!) That's precisely why cancellation and thread 
exit ARE exceptions, were always intended to be exceptions, and cannot 
practically be anything else. ;-)

>If people are not interested in 2) and 3) and just want to design a
>solution for 1), then I think it will make the discussion more
>productive to acknowledge it.
>  
>
The ANSI C++ committee could well do that; just as POSIX and C++ have so 
far essentially ignored each other. However, we might look back at the 
recently revealed origin of the name and subject of this mailing list, 
which is tangled up with actual implementation on a real system, 
specifically gcc. THEY cannot ignore interoperability between C and C++; 
and neither can anyone else in the real world. So even if the committee were
to decide it cannot or is unwilling to address 2 and 3, I don't think 
that decision would be relevant to this mailing list!

-- 
/--------------------[ David.Butenhof at hp.com ]--------------------\
| Hewlett-Packard Company       Tru64 UNIX & VMS Thread Architect |
|     My book: http://www.awl.com/cseng/titles/0-201-63392-2/     |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/