[c++-pthreads] concrete library-code example

Wed Jan 7 15:41:38 UTC 2004

Nathan Myers wrote:

>On Mon, Jan 05, 2004 at 11:57:32AM -0500, Dave Butenhof wrote:
>  
>
>>Nathan Myers wrote:
>>    
>>
>>>Here is a more-or-less concrete example, for discussion purposes.
>>>It's meant as a generic example of code written according to the 
>>>existing contract offered by C libraries.
>>>      
>>>
>>Correction: "... offered by C libraries that support POSIX 1003.1b-1993 
>>or earlier."
>>    
>>
>Very few programmers can identify any POSIX definition by number.   
>They write, and have long written, exception-safe library code that, 
>at most, uses mutexes (wrapped carefully for portability!) to guard 
>global state.  Few have even heard of cancellation.
>
>Many millions of lines of such code have been running for years on 
>millions of installations, worldwide.  It's good code.  To pretend 
>that it's all suddenly worthless because it doesn't take into account 
>new (or newly-deployed) standard revision 7834-stroke-"b"-slash-
>667-stroke-"a" would simply make _us_ irrelevant.
>  
>
You're right -- most people don't understand the distinctions between 
POSIX revisions, so perhaps I've presumed too much. So I'll make this 
clear. Once you're talking about "POSIX" and "threads", there's no such 
ambiguity. There was no concept at all of "threads" in POSIX before 
there was cancellation. Cancellation is a required base feature of 
"pthreads", and has been since well before the first (semi-)public draft 
of the specification. There is no such thing as "pthreads" without 
cancellation, and never has been. A library coded to "1003.1b-1993 or 
before" can't use cancellation... but it can't use (or work with) 
threads, either. (Among other details, until 1003.1c-1995 [threads], 
POSIX overruled ANSI C and unconditionally required a single global 
errno variable; which makes simultaneous or even multiplexed thread 
execution, er, "interesting".)

The only rational or reasonable way to label code "correct" that uses 
POSIX thread interfaces (e.g., mutexes), and does not address 
cancellation, is if that code was designed exclusively for a SPECIALIZED 
(non-general) purpose where it could be known that it would run only in 
threads that the application WILL NOT cancel. There is no such thing as 
a CORRECT, general purpose, "POSIX" library recognizing threads that 
doesn't address cancellation. Period. If the code in question CORRECTLY 
uses non-cancel-safe threading, then it will continue to be safe no 
matter what choices C++ makes for cancellation support... because 
threads running the code will never be cancelled. If this is NOT true, 
then it was never correct in the first place.

Again, as I've said, if you're talking about non-POSIX threaded code, 
the story may be different. Yes, Win32 doesn't have cancellation. 
Solaris "UI Threads" (deprecated since Solaris 2.5, which is a long way 
back) didn't have cancellation. There are undoubtedly embedded system 
threading systems that don't have cancellation. This goes back to my 
question "what are the real goals here"?

Doing cancellation as error returns instead of exceptions doesn't make 
anything cancel-safe, either; not unless you analyze every place that 
calls a cancellable function to be sure it's doing something reasonable 
with this new return value that means something critically different 
from other errors. Many presume that the distinction between errors 
isn't really important, because they all simply mean "it didn't work". 
Many more simply ignore the error. Even code that tries to "decode" the 
error and do something reasonable doesn't pass on the same error code to 
its own caller, for all sorts of reasons. Error returns aren't modular, 
particularly in complicated layered systems. Exceptions are. That's why 
exceptions are better, and why cancellation was always intended to be 
modelled as an exception.
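
To make the "not modular" point concrete, here is a minimal sketch in C. 
Everything in it is hypothetical: MODEL_ECANCELLED stands in for the 
proposed cancel status (it is not a real errno value), fake_read() stands 
in for a cancellable call, and sync_database() is an invented middle layer 
that reports failure in its own terms.

```c
#include <assert.h>

#define MODEL_ECANCELLED 1000   /* hypothetical cancel status, not a real errno */
#define SYNC_FAILED      2000   /* hypothetical library-specific failure code */

/* Stand-in for a cancellable call that reports cancellation through
   its return value instead of unwinding. */
static int fake_read(int cancel_requested)
{
    return cancel_requested ? -MODEL_ECANCELLED : 0;
}

/* A middle layer that legitimately converts low-level failures into
   its own status. The cancel indication is swallowed here. */
static int sync_database(int cancel_requested)
{
    if (fake_read(cancel_requested) < 0)
        return -SYNC_FAILED;
    return 0;
}

/* The top layer can no longer tell a cancel from any other failure. */
static int caller_saw_cancel(int cancel_requested)
{
    int status = sync_database(cancel_requested);

    assert(status <= 0);        /* statuses are zero or negative here */
    return status == -MODEL_ECANCELLED;
}
```

With a cancel requested, sync_database() returns -SYNC_FAILED and 
caller_saw_cancel() returns 0: the request has vanished one layer up, 
and nothing in the middle layer is "wrong" by ordinary C standards.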

>>>int affect_world(struct state* s)
>>>{
>>>  int result;
>>>  violate_invariants_or_claim_resources(s);
>>>  result = c_function_or_system_call(s->member);
>>>  if (result < 0) {
>>>    clean_up(s, result);
>>>    return result;
>>>  }
>>>  act_on_result(s, result);
>>>  restore_invariants_and_release_resources(s);
>>>  return 0;
>>>}
>>>
>>>This pattern is extremely common in both C and C++ libraries.  If 
>>>read() were to throw (or to "just ... not return"), the program state 
>>>would be corrupted.  A redefinition of c_function_or_system_call 
>>>semantics that breaks this code breaks many thousands of existing 
>>>thread-safe C and C++ libraries.
>>>      
>>>
>>If this code exists in a pure ANSI C/POSIX application using threads, 
>>and if the thread running this code can be cancelled, then the 
>>implementation of this function is broken because IT (not the 
>>implementation, nor the cancellation) corrupts program state.
>>    
>>
>No.  The code was written to a documented interface.  Whoever changes 
>the interface semantics without changing the interface name is 
>responsible for corrupting the program state.
>  
>
But the code is NOT written to the documented interface unambiguously 
applicable to ALL POSIX code that uses threads. If it doesn't use 
threads, there's no problem -- either in POSIX or in this hypothetical 
"threaded C++".

The only complication is if it was written for some OTHER threading 
package without cancellation, and if it's really more critical to the 
C++ goals to protect it from cancellation than to allow the rest of the 
application to depend on timely and safe cancellation. (And if so, I'd 
suggest the best way to be safe is to simply omit any support at all for 
cancellation.)

Once more, as explanation and apology, I wrote my original reply on this 
list blindly believing the list's name: "c++-pthreads". Not 
"C++-threads". "pthreads" means "POSIX threads", and "POSIX threads" 
means modular cancellation cleanup deliberately modelled on exceptions. 
I ought to have already known from previous discussions that the list 
name was inaccurate. The question is... how inaccurate, and in what way? 
If cancellation is important, then you'll be compatible with all CORRECT 
general POSIX threaded code by closely following the POSIX cancellation 
semantics. Nothing will magically make "all arbitrary threading package 
code, or incorrect POSIX threading code" cancel-safe, so I personally 
think it's a waste of time to worry about that too much.

Either way, someone's going to need to analyze the library code and make 
sure it's doing the right thing when it's cancelled. It's easier to do 
that with isolated modular cleanup mechanisms like cleanup handlers and 
destructors than to follow the often twisty passages by which typical 
code deals with and propagates error codes (if it does at all).
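
As a rough sketch of what the cleanup-handler approach looks like for the 
affect_world() pattern quoted earlier: pthread_cleanup_push(), 
pthread_cleanup_pop(), and read() are the real POSIX interfaces, while the 
layout of struct state, the release_state() handler, and the worker() 
entry point are assumptions made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

struct state {
    pthread_mutex_t lock;   /* guards the fields below */
    int fd;                 /* descriptor the example reads from */
    char buf[64];
    long total;             /* bytes consumed so far */
};

/* Cleanup handler: runs if the thread is cancelled inside read(),
   and also on the normal path via pthread_cleanup_pop(1). */
static void release_state(void *arg)
{
    struct state *s = arg;
    pthread_mutex_unlock(&s->lock);
}

int affect_world(struct state *s)
{
    ssize_t result;
    int status = 0;

    assert(s != NULL);
    pthread_mutex_lock(&s->lock);            /* claim resources */
    pthread_cleanup_push(release_state, s);

    result = read(s->fd, s->buf, sizeof s->buf);   /* cancellation point */
    if (result < 0)
        status = -1;
    else
        s->total += result;                  /* act on the result */

    /* No "return" between push and pop: POSIX requires them to pair up
       lexically, so the status is carried out in a variable instead. */
    pthread_cleanup_pop(1);                  /* release on every path */
    return status;
}

/* Thread entry point used to exercise the cancelled path. */
static void *worker(void *arg)
{
    affect_world(arg);
    return NULL;
}
```

The release logic lives in exactly one place and runs whether read() 
returns normally, fails, or never returns because the thread was 
cancelled. That is the property that is hard to audit when cleanup is 
spread across error-code branches.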

>>While I'm not at all trying to argue that the issue is at all as simple 
>>as this, that's the facts all the same.
>>    
>>
>Sorry, that's simply disingenuous.  To argue that everybody should have
>coded to an interface that you only just got around to documenting, 
>implementing, and deploying, many years after the code was written,
>borders on arrogant contempt.
>
>Such an attitude may be fine for the POSIX C committee, but I see no 
>reason to match it here.  In any case, we have a great deal more 
>already-thread-safe code to preserve, because thread-safety (by the
>common definition) is the norm, in C++.
>  
>
Again, either the code is "POSIX threads" or not. If it is, then either 
it deals with cancellation, knows a priori that the application will 
never CHOOSE to cancel the thread that runs it, or it's broken. There 
are no other options. (Yes, broken code is common. Some code broken in 
this particular way may have no other problems, though it's not 
generally the way I'd prefer to bet. Designing a solution around support 
for broken code, though, without any guarantee that it'll fare any 
better under the less-modular alternative, doesn't seem to me the best 
way to start.)
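
For what it's worth, POSIX does offer one blunt way to "deal with 
cancellation" around code that cannot be made cancel-safe: disable 
cancellation for the duration of the call with pthread_setcancelstate(). 
The sketch below assumes a hypothetical legacy routine, 
legacy_affect_world(), and a hypothetical wrapper, 
call_without_cancellation(); only pthread_setcancelstate() itself is a 
real interface.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical legacy routine: thread-safe, but not cancel-safe. */
static int legacy_calls = 0;

static int legacy_affect_world(void)
{
    legacy_calls++;
    return 0;
}

/* Run fn() with cancellation disabled, then restore the caller's
   previous cancel state. A cancel requested in the meantime stays
   pending and is acted on at the next cancellation point reached
   once cancellation is enabled again. */
int call_without_cancellation(int (*fn)(void))
{
    int old_state, ignored, result;

    assert(fn != NULL);
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    result = fn();
    pthread_setcancelstate(old_state, &ignored);
    return result;
}
```

The cost is latency: the thread cannot respond to a cancel until the 
wrapped call returns, so this trades timely cancellation for protecting 
the old code.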

If it's NOT "POSIX threads", then it probably has no concept of 
cancellation in the first place, and cannot be made magically safe for 
cancellation by any reasonable set of definitions. If it's C then it 
PROBABLY doesn't handle hierarchical cleanup. But if it's C++ (and at 
least THAT part of the list's name is presumably accurate), then it's 
probably as likely to do the right cleanup when an arbitrary new 
exception propagates as any C or C++ code is to do the right thing on an 
arbitrary new error status. (Which may well be just a way of saying that 
neither will work very often; but the real point is that I believe using 
error returns will make things worse, not better.)

>>Depending on propagation of error statuses is a really bad way to
>>implement cancellation.   At least, given the primitive and limited
>>concept of ANSI/POSIX error codes. Too much code ignores statuses in
>>the first place, which is bad enough. But, worse, there are many
>>legitimate reasons for library code to CONVERT return status values;
>>e.g., I called read() and it returned some error but MY function only
>>implicitly involves a read() and it simply wouldn't be useful or
>>meaningful to return that error to my caller. Instead, I want to
>>indicate that my function (say, synchronizing a database) failed, and
>>so any (or at least most) failures of my "support calls" will result
>>in my returning 'unable to synchronize database' (which often isn't an
>>ANSI/POSIX error number in the first place, but even if it is, it's
>>unlikely to be the value returned by read). The ECANCELLED some have
>>proposed would be lost, and that's unacceptable. This is why we
>>settled on exceptions to represent cancellation. And because POSIX and
>>ANSI C don't have exceptions, we devised the simple "cleanup handler"
>>mechanism that allowed a clean  and transparent implementation on top
>>of exceptions, or a "hack" implementation private to the thread
>>library where exceptions weren't available.
>>    
>>
>Again, that reasoning may be fine for C (did you really ask all those
>C programmers?), but we need not be bound by it here.  
>
>Since a cancellation error return swallowed up in library code must
>surface again at the next cancellation point, eventually (given a 
>well-written library) the failure must propagate upward to the point 
>where it may be turned into an exception.  (A library that never 
>propagates system-call failures to its caller isn't anything-safe, 
>and needn't concern us.)
>  
>
I really don't understand how this is supposed to work.

"Pending cancel" remains until the thread terminates. There was some 
mention somewhere that implied it would be cleared on delivery but 
"re-set" when the exception object was destroyed -- but I don't see how 
you'd manage that in any model where the "real" cancellation is a status 
return that MIGHT sometime later be "converted" to an exception.

That means every cancellable call made during cleanup will fail with 
your ECANCELLED. Which means most things simply cannot be cleaned up. 
OK, so maybe you defer all attempts to clean up until the exception is 
first raised, at which point you "unpend" the cancel until and unless 
the exception object is destroyed. That still breaks your example if the 
clean_up() routine makes any cancellable calls. (So much for preserving 
the old code.) Yet if returning the first ECANCELLED "unpends", you'll lose 
the cancel if the code instead does something like:

  if (result < 0) {
    clean_up(s, result);
    return EINVAL;
  }

... and I thought the whole point of the "sticky cancel" was precisely 
to avoid that risk?

I haven't seen anything so far that seems to offer a way out of this 
maze of messes. And that's precisely why we made cancellation an 
exception in the first place.

>>>(The cancellation model described in
>>>http://www.codesourcery.com/archives/c++-pthreads/msg00021.html
>>>is designed to preserve libraries that contain code that follows 
>>>this pattern.)
>>>
>>>Jason, do you not consider those libraries worth preserving?
>>>      
>>>
>>If you're talking about a currently non-threaded library to which
>>you'd like to transparently add thread support; well, I doubt that's
>>possible, and this particular proposal isn't going to help.  When
>>they're redesigned and recoded to be thread-safe, they can also be
>>made cancel-safe. 
>>    
>>
>No.  I'm talking about the many millions of lines of existing 
>thread-safe library code.  Ordinary thread-safety is the norm in C++ 
>libraries, because it's the natural way to code, in C++.
>  
>
"Natural"? No, it's not, unless you presume that all objects and static 
data are private. And you know that's not true. C++ libraries, like STL, 
have gone through enormous pains to try to be "basically thread-safe", 
and there are still conceptual problems, particularly in areas like the 
interdependencies around iostreams. It's not easy to get any complicated 
system "right". Threading adds complication. Cancellation and/or 
exceptions add complication. Nobody's saying otherwise. But to be 
GENERAL PURPOSE C++ code, you need to be exception-safe; and to be 
GENERAL PURPOSE POSIX thread code, you need to be cancel-safe. To be 
GENERAL PURPOSE POSIX C++ thread code, you need to be both. And since 
there's not much difference between POSIX cancel-safe and C++ 
exception-safe, the combination shouldn't be so difficult, technically.

The problems, as always, are more political. What SORTS of changes are 
required of existing code? (You may like to think it's "WHETHER" changes 
are required -- but you'll never convince me there's any way to avoid 
requiring change or at least careful analysis of every affected code 
path.) WHICH existing code patterns to prefer over others? (And much of 
this pivots on which patterns each person thinks are common or 
important, and that's a matter of seriously subjective opinion.) Whether 
it's OK to lose a cancel? What level of reliability and latency a 
cancellable application ought to be able to expect?

>>If you're talking about adding cancel support transparently 
>>to an existing C++ library, I doubt this is sufficient unless there's 
>>some standard requirement that all C++ libraries must pass through the 
>>system failure code to the caller. (There isn't, can't be, and shouldn't 
>>be.) And it also presupposes that the C++ library isn't exception-safe; 
>>because if it is, then delivering cancellation as an exception would 
>>seem "obviously" to be the most compatible and complete solution.
>>    
>>
>Exception-safety depends on identifying and guarding against documented
>sources of possible exceptions.  System calls and C library functions 
>are not among those. Also, C++ libraries very frequently rely on 
>underlying C libraries, and are written to depend on their documented 
>behavior.  (None of my man(2) or man(3) pages mention unwinding, never 
>mind throwing.)
>  
>
None mention ECANCELLED, either. So if we can't add an undocumented 
throw, how can we add an undocumented error code? While you might like 
to think that "just adding a new error code" is "nearly transparent" to 
"some existing code", you're correct only in some narrow boundary of 
"nearly" and "some". There'll inevitably be enormous volumes of code 
that DOESN'T fit that preconceived pattern, and I'm quite sure we've all 
seen plenty of examples.

The only thing that's FREE is NOTHING. So either we do nothing, or we 
accept a cost. I don't see a whole lot of argument for simply ignoring 
cancellation entirely, which is nothing, and free. So you simply can't 
argue against some strategy because there's a cost. There's always a 
cost. The interesting questions revolve around how much cost is 
acceptable, and the benefits of each strategy relative to its cost.

Currently, ANSI C++ doesn't "support" threads at all. So any use of 
threads is beyond the boundaries of the standard, and therefore 
nonportable and subject to the whims of various implementations. On 
Tru64 UNIX and OpenVMS, cancellation is and always has been an 
exception. In POSIX, cancellation has always worked exactly like a 
special exception (within the constraints of POSIX and ANSI C, of 
course, which don't allow actually using the word "exception" except 
in non-binding explanations). Nowhere has it ever been represented as a 
special error return from general pre-existing system functions. So 
which is more compatible, and with what?

>Even if you claim that the threat of "unwinding" from system calls is
>ancient, and that everything should have been written to assume it, 
>a change to make them throw would be completely new.
>  
>
Replace "unwinding" and "throw" by "cancel", or "threads", and the 
statement still stands. If you've already decided to add threads and 
cancel, I don't see how this argument is relevant. Yes, things will 
change. Old assumptions will be broken. Code will need to adapt. If you 
really CAN change the world so fundamentally without changing the code 
that runs in it, that'd be great; but I don't believe it.

>>And I'm deliberately discounting the mention I've seen several times in 
>>this list of "thread-safe" libraries that aren't "cancel-safe". Such 
>>libraries are simply broken, from basic design.  Cancellation is a basic 
>>and important part of the POSIX thread model, and if you're not safe 
>>you're not safe. The only viable exclusion (there, I avoided using the 
>>word "exception", though it took me a few moments of thought) is if you 
>>can be guaranteed to be running only in threads that can never be 
>>cancelled... and in that case the whole issue is irrelevant!
>>    
>>
>No offense intended, but disingenuity makes a poor substitute for 
>responsible design.
>
You clearly like the word "disingenuous". You've used it several times. 
That's at best a subjective slur that doesn't belong in this discussion. 
I may disagree with you, but I'm trying hard not to accuse you of being 
deliberately "not-straightforward", despite the enormous temptation of 
turning the accusation back at you. Or I could be disingenuous and 
simply say that I agree completely with your statement... but why should 
I take offense when it clearly has nothing at all to do with me?

>We can afford to be more responsible here, because we have stronger 
>language semantics to work with, and well-worked-out exception-safety 
>standards.
>  
>
Yes, let's be responsible. Perhaps we should first define precisely what 
"responsible" is intended to mean in this context, before we start 
arguing over which low-level details or discussions might fall under 
that banner. And I'm not being disingenuous here -- I absolutely mean 
it. Perhaps the C++ committee people already know exactly the full range 
of constraints and requirements on this effort, but I, and presumably 
others involved in this wider discussion group, cannot. If those 
constraints and requirements aren't to be explicitly and fully shared 
with us, then the discussion never should have been opened up in the 
first place... and I might as well just go away.

-- 
/--------------------[ David.Butenhof at hp.com ]--------------------\
| Hewlett-Packard Company       Tru64 UNIX & VMS Thread Architect |
|     My book: http://www.awl.com/cseng/titles/0-201-63392-2/     |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/