From oldham at codesourcery.com Thu Jan 1 17:08:48 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 01 Jan 2004 09:08:48 -0800 Subject: [PATCH] Optimize guard update copy In-Reply-To: References: Message-ID: <3FF45420.3090106@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch removes number four of the copies done for guard update. > Basically, additionally to the three copies I mentioned in the previous > mail, we're doing one extra during the RemoteView expressionApply of the > data-parallel assignment we're doing for the guard domains. Ugh. Fixed by > manually sending/receiving from/to the views. Doesn't work for Cheetah, > so conditionalized on POOMA_MPI. What breaks for Cheetah? > Tested as usual, ok to apply? > > Richard. > > > 2003Dec30 Richard Guenther > > * src/Engine/MultiPatchEngine.cpp: optimize remote to local and > local to remote copy in guard update. > > ===== MultiPatchEngine.cpp 1.6 vs 1.7 ===== > --- 1.6/r2/src/Engine/MultiPatchEngine.cpp Tue Dec 9 12:16:07 2003 > +++ 1.7/r2/src/Engine/MultiPatchEngine.cpp Thu Dec 18 16:41:50 2003 > @@ -34,6 +34,7 @@ > #include "Engine/CompressedFraction.h" > #include "Array/Array.h" > #include "Tulip/ReduceOverContexts.h" > +#include "Tulip/SendReceive.h" > #include "Threads/PoomaCSem.h" > #include "Domain/IteratorPairDomain.h" > > @@ -261,6 +262,40 @@ > // > //----------------------------------------------------------------------------- > > +/// Guard layer assign between non-remote engines, just use the > +/// ET mechanisms > + > +template > +static inline > +void simpleAssign(const Array& lhs, > + const Array& rhs, > + const Interval& domain) > +{ > + lhs(domain) = rhs(domain); > +} > + > +/// Guard layer assign between remote engines, use Send/Receive directly > +/// to avoid one extra copy of the data. > + > +template > +static inline > +void simpleAssign(const Array >& lhs, > + const Array >& rhs, > + const Interval& domain) > +{ > + if (lhs.engine().owningContext() == rhs.engine().owningContext()) > + lhs(domain) = rhs(domain); > + else { > + typedef typename NewEngine, Interval >::Type_t ViewEngine_t; > + if (lhs.engine().engineIsLocal()) > + Receive::receive(ViewEngine_t(lhs.engine().localEngine(), domain), > + rhs.engine().owningContext()); > + else if (rhs.engine().engineIsLocal()) > + SendReceive::send(ViewEngine_t(rhs.engine().localEngine(), domain), > + lhs.engine().owningContext()); > + } > +} > + > template > void Engine >:: > fillGuardsHandler(const WrappedInt &) const > @@ -293,8 +328,12 @@ > Array lhs(data()[dest]), rhs(data()[src]); > > // Now do assignment from the subdomains. > - > + // Optimized lhs(p->domain_m) = rhs(p->domain_m); > +#if POOMA_MPI > + simpleAssign(lhs, rhs, p->domain_m); > +#else > lhs(p->domain_m) = rhs(p->domain_m); > +#endif > > ++p; > } -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 1 17:17:21 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 01 Jan 2004 09:17:21 -0800 Subject: [PATCH] MPI SendReceive In-Reply-To: References: Message-ID: <3FF45621.4000404@codesourcery.com> Richard Guenther wrote: > Hi! > > This is now the MPI version of SendReceive.h, including changes to > RemoteEngine.h which handles (de-)serialization of engines. The latter > change allows optimizing away one of the three(!) 
copies we are doing > currently for communicating an engine at receive time: > - receive into message buffer > - deserialize into temporary brick engine > - copy temporary brick engine to target view > > the message buffer is now directly deserialized into the target view (for > non-Cheetah operation, with Cheetah this is not possible). Patch which > removes a fourth(!!) copy we're doing at guard update follows. > > Tested as usual. > > Ok? Yes. Thanks for improving the performance. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 1 17:24:47 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 01 Jan 2004 09:24:47 -0800 Subject: [PATCH] Add MPI variants for RemoteProxy, CollectFromContexts and ReduceOverContexts In-Reply-To: References: Message-ID: <3FF457DF.6080709@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch adds native MPI variants of the above messaging abstractions. > These patches were tested together with the remaining changes with serial, > Cheetah and MPI. As POOMA_MPI is never defined (for now), this shouldn't > introduce regressions there, too. But of course for it alone, this patch > is useless. More to follow. > > Ok? Yes. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 1 17:25:53 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 01 Jan 2004 09:25:53 -0800 Subject: [PATCH] Back to using Cheetah::CHEETAH for serialization In-Reply-To: References: Message-ID: <3FF45821.8030605@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch is a partial reversion of a previous patch that made us use > Cheetah::DELEGATE serialization for RemoteProxy. It also brings us a > Cheetah::CHEETAH serialization for std::string, which was previously > missing. One step more for the MPI merge. > > Tested together with all other MPI changes with serial, Cheetah and MPI. > > Ok? Yes. Do we need more regression tests for this work to better ensure correctness? -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 1 22:45:45 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 1 Jan 2004 23:45:45 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Back to using Cheetah::CHEETAH for serialization In-Reply-To: <3FF45821.8030605@codesourcery.com> References: <3FF45821.8030605@codesourcery.com> Message-ID: On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This patch is a partial reversion of a previous patch that made us use > > Cheetah::DELEGATE serialization for RemoteProxy. It also brings us a > > Cheetah::CHEETAH serialization for std::string, which was previously > > missing. One step more for the MPI merge. > > > > Tested together with all other MPI changes with serial, Cheetah and MPI. > > > > Ok? > > Yes. Do we need more regression tests for this work to better ensure > correctness? Maybe, at least we get all non-POD types that are not explicitly specialized wrong during serialization. And I can tell you, such errors are _very_ hard to find (happened for me for std::string and RemoteProxy). Richard. From rguenth at tat.physik.uni-tuebingen.de Fri Jan 2 12:26:27 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 13:26:27 +0100 (CET) Subject: [PATCH] Initialize MPI Message-ID: Hi! This patch adds MPI initialization. Ok? Richard. 
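For reference, the ordering the patch sets up is the usual MPI bracket around the whole Pooma lifetime. A minimal standalone sketch of that ordering (illustrative only, not the actual Pooma.cmpl.cpp code; the comments note where the patch places the corresponding calls):

  #include <mpi.h>

  int main(int argc, char **argv)
  {
    // Pooma::initialize() issues this when POOMA_MPI is defined.
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // becomes the context id
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // becomes the context count

    // Barrier so every context is initialized before any cross-context
    // request can reach it.
    MPI_Barrier(MPI_COMM_WORLD);

    // ... data-parallel work ...

    // Pooma::finalize() first calls Pooma::blockAndEvaluate() to drain
    // outstanding iterates, then shuts messaging down.
    MPI_Finalize();
    return 0;
  }
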
2004Jan02 Richard Guenther * src/Pooma/Pooma.cmpl.cpp: add initialization and finalization sequence for MPI. Pooma::blockAndEvaluate() at finalization. --- /home/richard/src/pooma/cvs/r2/src/Pooma/Pooma.cmpl.cpp 2003-12-25 12:26:04.000000000 +0100 +++ Pooma/Pooma.cmpl.cpp 2004-01-02 00:40:15.000000000 +0100 @@ -287,10 +287,10 @@ // we can do this in the other initialize routine by querying for // the Cheetah options from the Options object. -#if POOMA_CHEETAH - +#if POOMA_MPI + MPI_Init(&argc, &argv); +#elif POOMA_CHEETAH controller_g = new Cheetah::Controller(argc, argv); - #endif // Just create an Options object for this argc, argv set, and give that @@ -349,12 +349,20 @@ // Set myContext_s and numContexts_s to the context numbers. -#if POOMA_CHEETAH +#if POOMA_MESSAGING +#if POOMA_MPI + MPI_Comm_rank(MPI_COMM_WORLD, &myContext_g); + MPI_Comm_size(MPI_COMM_WORLD, &numContexts_g); + // ugh... + for (int i=0; imycontext(); numContexts_g = controller_g->ncontexts(); +#endif initializeCheetahHelpers(numContexts_g); @@ -376,14 +384,14 @@ warnMessages(opts.printWarnings()); errorMessages(opts.printErrors()); -#if POOMA_CHEETAH - // This barrier is here so that Pooma is initialized on all contexts // before we continue. (Another context could invoke a remote member // function on us before we're initialized... which would be bad.) +#if POOMA_MPI + MPI_Barrier(MPI_COMM_WORLD); +#elif POOMA_CHEETAH controller_g->barrier(); - #endif // Initialize the Inform streams with info on how many contexts we @@ -416,6 +424,8 @@ bool finalize(bool quitRTS, bool quitArch) { + Pooma::blockAndEvaluate(); + if (initialized_s) { // Wait for threads to finish. @@ -426,7 +436,7 @@ cleanup_s(); -#if POOMA_CHEETAH +#if POOMA_MESSAGING // Clean up the Cheetah helpers. finalizeCheetahHelpers(); @@ -436,15 +446,19 @@ if (quitRTS) { -#if POOMA_CHEETAH +#if POOMA_MESSAGING // Deleting the controller shuts down the cross-context communication // if this is the last thing using this controller. If something // else is using this, Cheetah will not shut down until that item // is destroyed or stops using the controller. +#if POOMA_MPI + MPI_Finalize(); +#elif POOMA_CHEETAH if (controller_g != 0) delete controller_g; +#endif #endif } @@ -784,18 +799,18 @@ SystemContext_t::runSomething(); } -#elif POOMA_REORDER_ITERATES +# elif POOMA_REORDER_ITERATES CTAssert(NO_SUPPORT_FOR_THREADS_WITH_MESSAGING); -#else // we're using the serial scheduler, so we only need to get messages +# else // we're using the serial scheduler, so we only need to get messages while (Pooma::incomingMessages()) { controller_g->poll(); } -#endif // schedulers +# endif // schedulers #else // !POOMA_CHEETAH From rguenth at tat.physik.uni-tuebingen.de Thu Jan 1 22:43:03 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 1 Jan 2004 23:43:03 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Optimize guard update copy In-Reply-To: <3FF45420.3090106@codesourcery.com> References: <3FF45420.3090106@codesourcery.com> Message-ID: On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This patch removes number four of the copies done for guard update. > > Basically, additionally to the three copies I mentioned in the previous > > mail, we're doing one extra during the RemoteView expressionApply of the > > data-parallel assignment we're doing for the guard domains. Ugh. Fixed by > > manually sending/receiving from/to the views. Doesn't work for Cheetah, > > so conditionalized on POOMA_MPI. 
> > What breaks for Cheetah? I don't remember... I can try again next week. Richard. From rguenth at tat.physik.uni-tuebingen.de Thu Jan 1 23:55:48 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 00:55:48 +0100 (CET) Subject: CVS down? Message-ID: Hi! $ traceroute pooma.codesourcery.com traceroute to pooma.codesourcery.com (65.73.237.138), 30 hops max, 38 byte packets 1 kolme.hamnixda.de (192.168.100.254) 5.260 ms 2.008 ms 1.801 ms 2 217.5.98.157 (217.5.98.157) 69.914 ms 61.777 ms 61.311 ms 3 217.237.156.218 (217.237.156.218) 60.429 ms 59.232 ms 60.236 ms 4 WAS-E4.WAS.US.NET.DTAG.DE (62.154.14.134) 191.891 ms 165.471 ms 162.642 ms 5 62.156.138.210 (62.156.138.210) 168.278 ms 173.438 ms 168.534 ms 6 bpr2-ae0.VirginiaEquinix.cw.net (208.173.50.253) 167.892 ms 170.474 ms 168.912 ms 7 208.173.50.242 (208.173.50.242) 163.327 ms 166.910 ms 203.087 ms 8 p7-3.cr01.mcln.eli.net (207.173.114.129) 171.543 ms !H 165.860 ms !H * Oh, btw. www.codesourcery.com seems to be down, too ((61) Connection refused). Richard. From rguenth at tat.physik.uni-tuebingen.de Fri Jan 2 12:34:16 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 13:34:16 +0100 (CET) Subject: [PATCH] Add --mpi configure switch Message-ID: Hi! This (finally) adds --mpi configure switch to enable POOMA_MPI. It checks for mpiCC or mpic++ in either $MPICH_ROOT/bin or the current $PATH and uses the first one found as new $cpp and $link. I didn't change the Cheetah configure switch which now has the slightly confusing name --messaging. Maybe we want to change this to --cheetah. Ok? I'll start full testing of serial, MPI and Cheetah to see if I forgot a part of the changes after the pending stuff is committed. Thanks, Richard. 2004Jan02 Richard Guenther * configure: add --mpi switch to enable MPI messaging using mpiCC/mpic++. --- /home/richard/src/pooma/cvs/r2/configure 2003-12-30 18:19:29.000000000 +0100 +++ configure 2004-01-02 00:40:10.000000000 +0100 @@ -209,8 +208,9 @@ $hdf5nm = "--hdf5"; $fftwnm = "--fftw"; $cheetahnm = "--messaging"; +$mpinm = "--mpi"; $strictnm = "--strict"; $archfnsnm = "--arch-specific-functions"; ### configure options $dbgprntnm = "-v"; # turn on verbose output from configure @@ -236,10 +237,11 @@ [$sharednm, "", "create a shared library."], [$finternm, "", "include fortran support libraries."], [$nofinternm, "", "do not include the fortran libraries."], [$preinstnm, "", "build preinstantiations of several classes."], [$serialnm, "", "configure to run serially, no parallelism."], - [$threadsnm, "", "include threads capability, if available."], + [$threadsnm, "", "include threads capability, if available."], [$cheetahnm, "", "enable use of CHEETAH communications package."], + [$mpinm, "", "enable use of MPI communications package."], [$schednm, "", "use for thread scheduling."], [$pawsnm, "", "enable PAWS program coupling, if available."], [$pawsdevnm, "", "enable PAWS program coupling for PAWS devel."], @@ -1276,13 +1266,22 @@ { $cheetah = 1; } - print "Set cheetah = $cheetah\n" if $dbgprnt; + if (scalar @{$arghash{$mpinm}} > 1) + { + $mpi = 1; + } $messaging = $cheetah + $mpi; + if ($messaging>1 or $messaging and scalar @{$arghash{$serialnm}}> 1) + { + printerror "$cheetahnm and/or $mpinm and/or $serialnm given. 
Use only one."; + } + print "Set messaging = $messaging\n" if $dbgprnt; # add a define indicating whether CHEETAH/MPI is available, and configure # extra options to include and define lists my $defmessaging = $messaging; my $defcheetah = 0; + my $defmpi = 0; if ($cheetah) { if (exists $ENV{"CHEETAHDIR"}) @@ -1299,7 +1298,6 @@ } $defcheetah = 1; - $scheduler = "serialAsync"; # add in the extra compilation settings for CHEETAH. @@ -1315,8 +1313,40 @@ $link = $cheetah_link; } } + elsif ($mpi) + { + my $mpiCC = "\$(MPICH_ROOT)/bin/mpiCC"; + if (system("test -x $MPICH_ROOT/bin/mpiCC") == 0) + { + $mpiCC = "\$(MPICH_ROOT)/bin/mpiCC"; + } + elsif (system("test -x $MPICH_ROOT/bin/mpic++") == 0) + { + $mpiCC = "\$(MPICH_ROOT)/bin/mpic++"; + } + elsif (system("which mpiCC") == 0) + { + $mpiCC = "mpiCC"; + } + elsif (system("which mpic++") == 0) + { + $mpiCC = "mpic++"; + } + else + { + die "There is no known MPI location. Select one by setting MPICH_ROOT or adjusting your PATH.\n"; + } + + $defmpi = 1; + $scheduler = "serialAsync"; + + # use special compiler script for MPI. + $cpp = $mpiCC; + $link = $mpiCC; + } add_yesno_define("POOMA_MESSAGING", $defmessaging); add_yesno_define("POOMA_CHEETAH", $defcheetah); + add_yesno_define("POOMA_MPI", $defmpi); } From rguenth at tat.physik.uni-tuebingen.de Fri Jan 2 12:20:35 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 13:20:35 +0100 (CET) Subject: [PATCH] MPI support for SerialAsync scheduler Message-ID: Hi! This patch moves SerialAsync to the state I have it. So this patch maybe somewhat hard to follow, so I'll go through the obfuscating parts first: - it moves commentary to doxygen style - it moves Iterate definition up due to dependency issues Apart from this, the patch introduces a std::stack for tracking the current generation. This is necessary for MPI messaging to avoid deadlocks waiting for communication on one end that hasn't been issued at the remote end yet. Basically the only places where a full blockAndEvaluate() is safe, is, if we're not inside a generation. And we need to sometimes wait for communication to complete due to a limited amount of MPI_Requests we can have in fly. For asyncronous MPI operation the scheduler maintains the necessary MPI_Request structures and has the ability to wait on the completion of the asyncronous requests. This makes necessary the deferred destruction of the Iterates done via a reference count that is incremented on every MPI request issued and decremented on every MPI request completed. This same mechanism may possibly used to solve the Cheetah use-after-destruct issue -- I'll prepare a seperate patch for this. So, I hope I didn't forget something in the patch. The patch was tested as usual. Ok to commit? Thanks, Richard. 2004Jan02 Richard Guenther * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, add std::stack for generation tracking, add support for asyncronous MPI requests. src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp: define new static variables. 
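Stripped of the scheduler details, the MPI request bookkeeping described above amounts to a fixed pool of MPI_Request slots plus a per-iterate counter that defers destruction until all of its requests have completed. A simplified sketch (names shortened, error handling omitted; the real code lives in SystemContext in the diff below):

  #include <mpi.h>
  #include <map>
  #include <set>

  struct Iterate
  {
    Iterate() : togo(1) {}   // one reference held by the work queue
    int togo;                // queue reference + outstanding requests
  };

  const int nslots = 1024;
  MPI_Request requests[nslots];
  std::map<int, Iterate *> allocated;   // slot -> owning iterate
  std::set<int> freeSlots;              // currently unused slots

  void initSlots()
  {
    for (int i = 0; i < nslots; ++i)
      freeSlots.insert(i);
  }

  // Hand out a slot; assumes one is free (the scheduler guarantees this
  // by draining requests first).  The iterate stays alive until done.
  MPI_Request *getRequest(Iterate *it)
  {
    int i = *freeSlots.begin();
    freeSlots.erase(freeSlots.begin());
    allocated[i] = it;
    ++it->togo;
    return &requests[i];
  }

  // Called for each slot that MPI_Testsome/MPI_Waitsome reports done.
  void releaseRequest(int i)
  {
    Iterate *it = allocated[i];
    allocated.erase(i);
    freeSlots.insert(i);
    if (--it->togo == 0)
      delete it;             // deferred destruction of the iterate
  }

When the outermost generation ends, the scheduler keeps running work (blocking in MPI_Waitsome if need be) until enough slots are free again, which bounds the number of requests in flight.
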
--- /home/richard/src/pooma/cvs/r2/src/Threads/IterateSchedulers/SerialAsync.h 2000-06-09 00:16:50.000000000 +0200 +++ Threads/IterateSchedulers/SerialAsync.h 2004-01-02 00:40:16.000000000 +0100 @@ -42,48 +42,38 @@ // DataObject //----------------------------------------------------------------------------- -#include - #ifndef _SerialAsync_h_ #define _SerialAsync_h_ -/* -LIBRARY: - SerialAsync - -CLASSES: IterateScheduler - -CLASSES: DataObject - -CLASSES: Iterate - -OVERVIEW - SerialAsync IterateScheduler is a policy template to create a - dependence graphs and executes the graph respecting the - dependencies without using threads. There is no parallelism, - but Iterates may be executed out-of-order with respect to the - program text. - ------------------------------------------------------------------------------*/ - -////////////////////////////////////////////////////////////////////// -//----------------------------------------------------------------------------- -// Overview: -// Smarts classes for times when you want no threads but you do want -// dataflow evaluation. -//----------------------------------------------------------------------------- - -//----------------------------------------------------------------------------- -// Typedefs: -//----------------------------------------------------------------------------- +/** @file + * @ingroup IterateSchedulers + * @brief + * Smarts classes for times when you want no threads but you do want + * dataflow evaluation. + * + * SerialAsync IterateScheduler is a policy template to create a + * dependence graphs and executes the graph respecting the + * dependencies without using threads. + * There is no (thread level) parallelism, but Iterates may be executed + * out-of-order with respect to the program text. Also this scheduler is + * used for message based parallelism in which case asyncronous execution + * leads to reduced communication latencies. + */ //----------------------------------------------------------------------------- // Includes: //----------------------------------------------------------------------------- #include +#include +#include +#include +#include +#include #include "Threads/IterateSchedulers/IterateScheduler.h" #include "Threads/IterateSchedulers/Runnable.h" +#include "Tulip/Messaging.h" +#include "Utilities/PAssert.h" //----------------------------------------------------------------------------- // Forward Declarations: @@ -94,76 +84,261 @@ namespace Smarts { -#define MYID 0 -#define MAX_CPUS 1 -// -// Tag class for specializing IterateScheduler, Iterate and DataObject. -// +/** + * Tag class for specializing IterateScheduler, Iterate and DataObject. + */ + struct SerialAsync { - enum Action { Read, Write}; + enum Action { Read, Write }; }; -//----------------------------------------------------------------------------- +/** + * Iterate is used to implement the SerialAsync + * scheduling policy. + * + * An Iterate is a non-blocking unit of concurrency that is used + * to describe a chunk of work. It inherits from the Runnable + * class and as all subclasses of Runnable, the user specializes + * the run() method to specify the operation. + * Iterate is a further specialization of the + * Iterate class to use the SerialAsync Scheduling algorithm to + * generate the data dependency graph for a data-driven + * execution. 
+ */ + +template<> +class Iterate : public Runnable +{ + friend class IterateScheduler; + friend class DataObject; + +public: + + typedef DataObject DataObject_t; + typedef IterateScheduler IterateScheduler_t; + + + /// The Constructor for this class takes the IterateScheduler and a + /// CPU affinity. CPU affinity has a default value of -1 which means + /// it may run on any CPU available. + + inline Iterate(IterateScheduler & s, int affinity=-1) + : scheduler_m(s), notifications_m(1), generation_m(-1), togo_m(1) + {} + + /// The dtor is virtual because the subclasses will need to add to it. + + virtual ~Iterate() {} + + /// The run method does the core work of the Iterate. + /// It is supplied by the subclass. + + virtual void run() = 0; + + //@name Stubs for the affinities + /// There is no such thing in serial. + //@{ + + inline int affinity() const {return 0;} + + inline int hintAffinity() const {return 0;} + + inline void affinity(int) {} + + inline void hintAffinity(int) {} + + //@} + + /// Notify is used to indicate to the Iterate that one of the data + /// objects it had requested has been granted. To do this, we dec a + /// dependence counter which, if equal to 0, the Iterate is ready for + /// execution. + + void notify() + { + if (--notifications_m == 0) + add(this); + } + + /// How many notifications remain? + + int notifications() const { return notifications_m; } + + void addNotification() { notifications_m++; } + + int& generation() { return generation_m; } + + int& togo() { return togo_m; } + +protected: + + /// What scheduler are we working with? + IterateScheduler &scheduler_m; + + /// How many notifications should we receive before we can run? + int notifications_m; + + /// Which generation we were issued in. + int generation_m; + + /// How many times we need to go past a "did something" to be ready + /// for destruction? + int togo_m; + +}; + + +/** + * FIXME. + */ struct SystemContext { void addNCpus(int) {} void wait() {} void concurrency(int){} - int concurrency() {return 1;} + int concurrency() { return 1; } void mustRunOn() {} // We have a separate message queue because they are // higher priority. + typedef Iterate *IteratePtr_t; static std::list workQueueMessages_m; static std::list workQueue_m; +#if POOMA_MPI + static MPI_Request requests_m[1024]; + static std::map allocated_requests_m; + static std::set free_requests_m; +#endif + + +#if POOMA_MPI - /////////////////////////// - // This function lets you check if there are iterates that are - // ready to run. - inline static - bool workReady() + /// Query, if we have lots of MPI_Request slots available + + static bool haveLotsOfMPIRequests() { - return !(workQueue_m.empty() && workQueueMessages_m.empty()); + return free_requests_m.size() > 1024/2; } - /////////////////////////// - // Run an iterate if one is ready. - inline static - void runSomething() + /// Get a MPI_Request slot, associated with an iterate + + static MPI_Request* getMPIRequest(IteratePtr_t p) { - if (!workQueueMessages_m.empty()) - { - // Get the top iterate. - // Delete it from the queue. - // Run the iterate. - // Delete the iterate. This could put more iterates in the queue. 
+ PInsist(!free_requests_m.empty(), "No free MPIRequest slots."); + int i = *free_requests_m.begin(); + free_requests_m.erase(free_requests_m.begin()); + allocated_requests_m[i] = p; + p->togo()++; + return &requests_m[i]; + } - RunnablePtr_t p = workQueueMessages_m.front(); - workQueueMessages_m.pop_front(); - p->execute(); + static void releaseMPIRequest(int i) + { + IteratePtr_t p = allocated_requests_m[i]; + allocated_requests_m.erase(i); + free_requests_m.insert(i); + if (--(p->togo()) == 0) delete p; - } + } + + static bool waitForSomeRequests(bool mayBlock) + { + if (allocated_requests_m.empty()) + return false; + + int last_used_request = allocated_requests_m.rbegin()->first; + int finished[last_used_request+1]; + MPI_Status statuses[last_used_request+1]; + int nr_finished; + int res; + if (mayBlock) + res = MPI_Waitsome(last_used_request+1, requests_m, + &nr_finished, finished, statuses); else - { - if (!workQueue_m.empty()) - { - RunnablePtr_t p = workQueue_m.front(); - workQueue_m.pop_front(); - p->execute(); - delete p; + res = MPI_Testsome(last_used_request+1, requests_m, + &nr_finished, finished, statuses); + PAssert(res == MPI_SUCCESS || res == MPI_ERR_IN_STATUS); + if (nr_finished == MPI_UNDEFINED) + return false; + + // release finised requests + while (nr_finished--) { + if (res == MPI_ERR_IN_STATUS) { + if (statuses[nr_finished].MPI_ERROR != MPI_SUCCESS) { + char msg[MPI_MAX_ERROR_STRING+1]; + int len; + MPI_Error_string(statuses[nr_finished].MPI_ERROR, msg, &len); + msg[len] = '\0'; + PInsist(0, msg); + } } + releaseMPIRequest(finished[nr_finished]); } + return true; + } + +#else + + static bool waitForSomeRequests(bool mayBlock) + { + return false; + } + +#endif + + + /// This function lets you check if there are iterates that are + /// ready to run. + + static bool workReady() + { + return !(workQueue_m.empty() + && workQueueMessages_m.empty() +#if POOMA_MPI + && allocated_requests_m.empty() +#endif + ); + } + + /// Run an iterate if one is ready. Returns if progress + /// was made. + + static bool runSomething(bool mayBlock = true) + { + // do work in this order to minimize communication latency: + // - issue all messages + // - do some regular work + // - wait for messages to complete + + RunnablePtr_t p = NULL; + if (!workQueueMessages_m.empty()) { + p = workQueueMessages_m.front(); + workQueueMessages_m.pop_front(); + } else if (!workQueue_m.empty()) { + p = workQueue_m.front(); + workQueue_m.pop_front(); + } + + if (p) { + p->execute(); + Iterate *it = dynamic_cast(p); + if (it) { + if (--(it->togo()) == 0) + delete it; + } else + delete p; + return true; + + } else + return waitForSomeRequests(mayBlock); } }; -inline void addRunnable(RunnablePtr_t rn) -{ - SystemContext::workQueue_m.push_front(rn); -} +/// Adds a runnable to the appropriate work-queue. inline void add(RunnablePtr_t rn) { @@ -182,25 +357,18 @@ inline void wait() {} inline void mustRunOn(){} -/*------------------------------------------------------------------------ -CLASS - IterateScheduler_Serial_Async - - Implements a asynchronous scheduler for a data driven execution. - Specializes a IterateScheduler. - -KEYWORDS - Data-parallelism, Native-interface, IterateScheduler. - -DESCRIPTION - - The SerialAsync IterateScheduler, Iterate and DataObject - implement a SMARTS scheduler that does dataflow without threads. 
- What that means is that when you hand iterates to the - IterateScheduler it stores them up until you call - IterateScheduler::blockingEvaluate(), at which point it evaluates - iterates until the queue is empty. ------------------------------------------------------------------------------*/ + +/** + * Implements a asynchronous scheduler for a data driven execution. + * Specializes a IterateScheduler. + * + * The SerialAsync IterateScheduler, Iterate and DataObject + * implement a SMARTS scheduler that does dataflow without threads. + * What that means is that when you hand iterates to the + * IterateScheduler it stores them up until you call + * IterateScheduler::blockingEvaluate(), at which point it evaluates + * iterates until the queue is empty. + */ template<> class IterateScheduler @@ -212,196 +380,128 @@ typedef DataObject DataObject_t; typedef Iterate Iterate_t; - /////////////////////////// - // Constructor - // - IterateScheduler() {} - - /////////////////////////// - // Destructor - // - ~IterateScheduler() {} - void setConcurrency(int) {} - - //--------------------------------------------------------------------------- - // Mutators. - //--------------------------------------------------------------------------- - - /////////////////////////// - // Tells the scheduler that the parser thread is starting a new - // data-parallel statement. Any Iterate that is handed off to the - // scheduler between beginGeneration() and endGeneration() belongs - // to the same data-paralllel statement and therefore has the same - // generation number. - // - inline void beginGeneration() { } - - /////////////////////////// - // Tells the scheduler that no more Iterates will be handed off for - // the data parallel statement that was begun with a - // beginGeneration(). - // - inline void endGeneration() {} - - /////////////////////////// - // The parser thread calls this method to evaluate the generated - // graph until all the nodes in the dependence graph has been - // executed by the scheduler. That is to say, the scheduler - // executes all the Iterates that has been handed off to it by the - // parser thread. - // - inline - void blockingEvaluate(); - - /////////////////////////// - // The parser thread calls this method to ask the scheduler to run - // the given Iterate when the dependence on that Iterate has been - // satisfied. - // - inline void handOff(Iterate* it); + IterateScheduler() + : generation_m(0) + {} - inline - void releaseIterates() { } + ~IterateScheduler() {} -protected: -private: + void setConcurrency(int) {} - typedef std::list Container_t; - typedef Container_t::iterator Iterator_t; + /// Tells the scheduler that the parser thread is starting a new + /// data-parallel statement. Any Iterate that is handed off to the + /// scheduler between beginGeneration() and endGeneration() belongs + /// to the same data-paralllel statement and therefore has the same + /// generation number. + /// Nested invocations are handled as being part of the outermost + /// generation. -}; + void beginGeneration() + { + // Ensure proper overflow behavior. + if (++generation_m < 0) + generation_m = 0; + generationStack_m.push(generation_m); + } -//----------------------------------------------------------------------------- + /// Tells the scheduler that no more Iterates will be handed off for + /// the data parallel statement that was begun with a + /// beginGeneration(). 
-/*------------------------------------------------------------------------ -CLASS - Iterate_SerialAsync - - Iterate is used to implement the SerialAsync - scheduling policy. - -KEYWORDS - Data_Parallelism, Native_Interface, IterateScheduler, Data_Flow. - -DESCRIPTION - An Iterate is a non-blocking unit of concurrency that is used - to describe a chunk of work. It inherits from the Runnable - class and as all subclasses of Runnable, the user specializes - the run() method to specify the operation. - Iterate is a further specialization of the - Iterate class to use the SerialAsync Scheduling algorithm to - generate the data dependency graph for a data-driven - execution. */ + void endGeneration() + { + PAssert(inGeneration()); + generationStack_m.pop(); -template<> -class Iterate : public Runnable -{ - friend class IterateScheduler; - friend class DataObject; +#if POOMA_MPI + // this is a safe point to block until we have "lots" of MPI Requests + if (!inGeneration()) + while (!SystemContext::haveLotsOfMPIRequests()) + SystemContext::runSomething(true); +#endif + } -public: + /// Wether we are inside a generation and may not safely block. - typedef DataObject DataObject_t; - typedef IterateScheduler IterateScheduler_t; + bool inGeneration() const + { + return !generationStack_m.empty(); + } + /// What the current generation is. - /////////////////////////// - // The Constructor for this class takes the IterateScheduler and a - // CPU affinity. CPU affinity has a default value of -1 which means - // it may run on any CPU available. - // - inline Iterate(IterateScheduler & s, int affinity=-1); - - /////////////////////////// - // The dtor is virtual because the subclasses will need to add to it. - // - virtual ~Iterate() {} + int generation() const + { + if (!inGeneration()) + return -1; + return generationStack_m.top(); + } - /////////////////////////// - // The run method does the core work of the Iterate. - // It is supplied by the subclass. - // - virtual void run() = 0; + /// The parser thread calls this method to evaluate the generated + /// graph until all the nodes in the dependence graph has been + /// executed by the scheduler. That is to say, the scheduler + /// executes all the Iterates that has been handed off to it by the + /// parser thread. - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline int affinity() const {return 0;} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline int hintAffinity() const {return 0;} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline void affinity(int) {} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline void hintAffinity(int) {} + void blockingEvaluate() + { + if (inGeneration()) { + // It's not safe to block inside a generation, so + // just do as much as we can without blocking. + while (SystemContext::runSomething(false)) + ; + + } else { + // Loop as long as there is anything in the queue. + while (SystemContext::workReady()) + SystemContext::runSomething(true); + } + } - /////////////////////////// - // Notify is used to indicate to the Iterate that one of the data - // objects it had requested has been granted. To do this, we dec a - // dependence counter which, if equal to 0, the Iterate is ready for - // execution. 
- // - inline void notify(); - - /////////////////////////// - // How many notifications remain? - // - inline - int notifications() const { return notifications_m; } + /// The parser thread calls this method to ask the scheduler to run + /// the given Iterate when the dependence on that Iterate has been + /// satisfied. - inline void addNotification() + void handOff(Iterate* it) { - notifications_m++; + // No action needs to be taken here. Iterates will make their + // own way into the execution queue. + it->generation() = generation(); + it->notify(); } -protected: + void releaseIterates() { } - // What scheduler are we working with? - IterateScheduler &scheduler_m; +private: - // How many notifications should we receive before we can run? - int notifications_m; + typedef std::list Container_t; + typedef Container_t::iterator Iterator_t; -private: - // Set notifications dynamically and automatically every time a - // request is made by the iterate - void incr_notifications() { notifications_m++;} + static std::stack generationStack_m; + int generation_m; }; -//----------------------------------------------------------------------------- - -/*------------------------------------------------------------------------ -CLASS - DataObject_SerialAsync - - Implements a asynchronous scheduler for a data driven execution. -KEYWORDS - Data-parallelism, Native-interface, IterateScheduler. - -DESCRIPTION - The DataObject Class is used introduce a type to represent - a resources (normally) blocks of data) that Iterates contend - for atomic access. Iterates make request for either a read or - write to the DataObjects. DataObjects may grant the request if - the object is currently available. Otherwise, the request is - enqueue in a queue private to the data object until the - DataObject is release by another Iterate. A set of read - requests may be granted all at once if there are no - intervening write request to that DataObject. - DataObject is a specialization of DataObject for - the policy template SerialAsync. -*/ +/** + * Implements a asynchronous scheduler for a data driven execution. + * + * The DataObject Class is used introduce a type to represent + * a resources (normally) blocks of data) that Iterates contend + * for atomic access. Iterates make request for either a read or + * write to the DataObjects. DataObjects may grant the request if + * the object is currently available. Otherwise, the request is + * enqueue in a queue private to the data object until the + * DataObject is release by another Iterate. A set of read + * requests may be granted all at once if there are no + * intervening write request to that DataObject. + * DataObject is a specialization of DataObject for + * the policy template SerialAsync. + * + * There are two ways data can be used: to read or to write. + * Don't change this to give more than two states; + * things inside depend on that. + */ template<> class DataObject @@ -413,54 +513,56 @@ typedef IterateScheduler IterateScheduler_t; typedef Iterate Iterate_t; - // There are two ways data can be used: to read or to write. - // Don't change this to give more than two states: - // things inside depend on that. - - /////////////////////////// - // Construct the data object with an empty set of requests - // and the given affinity. - // - inline DataObject(int affinity=-1); + + /// Construct the data object with an empty set of requests + /// and the given affinity. 
+ + DataObject(int affinity=-1) + : released_m(queue_m.end()), notifications_m(0) + { + // released_m to the end of the queue (which should) also be the + // beginning. notifications_m to zero, since nothing has been + // released yet. + } - /////////////////////////// - // for compatibility with other SMARTS schedulers, accept - // Scheduler arguments (unused) - // - inline - DataObject(int affinity, IterateScheduler&); - - /////////////////////////// - // Stub out affinity because there is no affinity in serial. - // - inline int affinity() const { return 0; } - - /////////////////////////// - // Stub out affinity because there is no affinity in serial. - // - inline void affinity(int) {} + /// for compatibility with other SMARTS schedulers, accept + /// Scheduler arguments (unused) - /////////////////////////// - // An iterate makes a request for a certain action in a certain - // generation. - // - inline - void request(Iterate&, SerialAsync::Action); - - /////////////////////////// - // An iterate finishes and tells the DataObject it no longer needs - // it. If this is the last release for the current set of - // requests, have the IterateScheduler release some more. - // - inline void release(SerialAsync::Action); + inline DataObject(int affinity, IterateScheduler&) + : released_m(queue_m.end()), notifications_m(0) + {} + + /// Stub out affinity because there is no affinity in serial. + + int affinity() const { return 0; } + + /// Stub out affinity because there is no affinity in serial. + + void affinity(int) {} + + /// An iterate makes a request for a certain action in a certain + /// generation. + + inline void request(Iterate&, SerialAsync::Action); + + /// An iterate finishes and tells the DataObject it no longer needs + /// it. If this is the last release for the current set of + /// requests, have the IterateScheduler release some more. + + void release(SerialAsync::Action) + { + if (--notifications_m == 0) + releaseIterates(); + } -protected: private: - // If release needs to let more iterates go, it calls this. + /// If release needs to let more iterates go, it calls this. inline void releaseIterates(); - // The type for a request. + /** + * The type for a request. + */ class Request { public: @@ -475,135 +577,27 @@ SerialAsync::Action act_m; }; - // The type of the queue and iterator. + /// The type of the queue and iterator. typedef std::list Container_t; typedef Container_t::iterator Iterator_t; - // The list of requests from various iterates. - // They're granted in FIFO order. + /// The list of requests from various iterates. + /// They're granted in FIFO order. Container_t queue_m; - // Pointer to the last request that has been granted. + /// Pointer to the last request that has been granted. Iterator_t released_m; - // The number of outstanding notifications. + /// The number of outstanding notifications. int notifications_m; }; -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for -// IterateScheduler -// -////////////////////////////////////////////////////////////////////// - -// -// IterateScheduler::handOff(Iterate*) -// No action needs to be taken here. Iterates will make their -// own way into the execution queue. 
-// - -inline void -IterateScheduler::handOff(Iterate* it) -{ - it->notify(); -} - -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for Iterate -// -////////////////////////////////////////////////////////////////////// - -// -// Iterate::Iterate -// Construct with the scheduler and the number of notifications. -// Ignore the affinity. -// - -inline -Iterate::Iterate(IterateScheduler& s, int) -: scheduler_m(s), notifications_m(1) -{ -} - -// -// Iterate::notify -// Notify the iterate that a DataObject is ready. -// Decrement the counter, and if it is zero, alert the scheduler. -// - -inline void -Iterate::notify() -{ - if ( --notifications_m == 0 ) - { - add(this); - } -} - -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for DataObject -// -////////////////////////////////////////////////////////////////////// - -// -// DataObject::DataObject() -// Initialize: -// released_m to the end of the queue (which should) also be the -// beginning. notifications_m to zero, since nothing has been -// released yet. -// - -inline -DataObject::DataObject(int) -: released_m(queue_m.end()), notifications_m(0) -{ -} - -// -// void DataObject::release(Action) -// An iterate has finished and is telling the DataObject that -// it is no longer needed. -// +/// void DataObject::releaseIterates(SerialAsync::Action) +/// When the last released iterate dies, we need to +/// look at the beginning of the queue and tell more iterates +/// that they can access this data. inline void -DataObject::release(SerialAsync::Action) -{ - if ( --notifications_m == 0 ) - releaseIterates(); -} - - - -//----------------------------------------------------------------------------- -// -// void IterateScheduler::blockingEvaluate -// Evaluate all the iterates in the queue. -// -//----------------------------------------------------------------------------- -inline -void -IterateScheduler::blockingEvaluate() -{ - // Loop as long as there is anything in the queue. - while (SystemContext::workReady()) - { - SystemContext::runSomething(); - } -} - -//----------------------------------------------------------------------------- -// -// void DataObject::releaseIterates(SerialAsync::Action) -// When the last released iterate dies, we need to -// look at the beginning of the queue and tell more iterates -// that they can access this data. -// -//----------------------------------------------------------------------------- -inline -void DataObject::releaseIterates() { // Get rid of the reservations that have finished. @@ -622,14 +616,17 @@ released_m->iterate().notify(); ++notifications_m; - // Record what action that one will take. + // Record what action that one will take + // and record its generation number SerialAsync::Action act = released_m->act(); + int generation = released_m->iterate().generation(); // Look at the next iterate. ++released_m; // If the first one was a read, release more. if ( act == SerialAsync::Read ) + { // As long as we aren't at the end and we have more reads... while ((released_m != end) && @@ -642,29 +639,30 @@ // And go on to the next. ++released_m; } + + } + } } +/// void DataObject::request(Iterate&, action) +/// An iterate makes a reservation with this DataObject for a given +/// action in a given generation. The request may be granted +/// immediately. 
-// -// void DataObject::request(Iterate&, action) -// An iterate makes a reservation with this DataObject for a given -// action in a given generation. The request may be granted -// immediately. -// -inline -void +inline void DataObject::request(Iterate& it, SerialAsync::Action act) { // The request can be granted immediately if: // The queue is currently empty, or - // The request is a read and everything in the queue is a read. + // the request is a read and everything in the queue is a read, + // or (with relaxed conditions), everything is the same generation. // Set notifications dynamically and automatically // every time a request is made by the iterate - it.incr_notifications(); + it.notifications_m++; bool allReleased = (queue_m.end() == released_m); bool releasable = queue_m.empty() || @@ -691,17 +689,11 @@ } -//---------------------------------------------------------------------- - - -// -// End of Smarts namespace. -// -} +} // namespace Smarts ////////////////////////////////////////////////////////////////////// -#endif // POOMA_PACKAGE_CLASS_H +#endif // _SerialAsync_h_ /*************************************************************************** * $RCSfile: SerialAsync.h,v $ $Author: sa_smith $ --- /home/richard/src/pooma/cvs/r2/src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp 2000-04-12 02:08:06.000000000 +0200 +++ Threads/IterateSchedulers/SerialAsync.cmpl.cpp 2004-01-02 00:40:16.000000000 +0100 @@ -82,6 +82,12 @@ std::list SystemContext::workQueueMessages_m; std::list SystemContext::workQueue_m; +#if POOMA_MPI + MPI_Request SystemContext::requests_m[1024]; + std::map SystemContext::allocated_requests_m; + std::set SystemContext::free_requests_m; +#endif +std::stack IterateScheduler::generationStack_m; } From rguenth at tat.physik.uni-tuebingen.de Fri Jan 2 15:36:47 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 16:36:47 +0100 (CET) Subject: [PATCH] OpenMP support Message-ID: Hi Jeffrey, would you please look at "[PATCH] OpenMP loop level parallelism" mail I sent Dec23? Additionally to this patch I propose the following, which adds a --openmp switch to configure. Tested with gcc (with and without --openmp, which is the same here) and Intel icpc (with and without --openmp, which makes a difference here). Ok? Thanks, Richard. 2004Jan02 Richard Guenther * config/arch/LINUXICC.conf: don't warn about unused #pragmas. configure: add --openmp switch. scripts/configure.ac: add test to detect wether -openmp works. scripts/configure: regenerated. 
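The new configure.ac check below just verifies that the compiler accepts -openmp together with the basic loop-parallel construct; compiled by hand, the probe is roughly equivalent to (illustrative file name):

  // icpc -openmp probe.cpp   -- the flag configure tries to add
  #include <omp.h>

  int main()
  {
    double d[128];
  #pragma omp parallel for
    for (int i = 0; i < 128; ++i)
      d[i] = 1.0;
    return omp_get_max_threads() > 0 ? 0 : 1;
  }
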
diff -Nru a/r2/config/arch/LINUXICC.conf b/r2/config/arch/LINUXICC.conf --- a/r2/config/arch/LINUXICC.conf Fri Jan 2 16:32:14 2004 +++ b/r2/config/arch/LINUXICC.conf Fri Jan 2 16:32:14 2004 @@ -170,8 +170,8 @@ ### debug or optimized build settings for C++ applications -$cppdbg_app = "-g"; -$cppopt_app = "-DNOPAssert -DNOCTAssert -O2"; +$cppdbg_app = "-g -wd161"; +$cppopt_app = "-DNOPAssert -DNOCTAssert -O2 -wd161"; ### debug or optimized build settings for C++ libraries diff -Nru a/r2/configure b/r2/configure --- a/r2/configure Fri Jan 2 16:32:14 2004 +++ b/r2/configure Fri Jan 2 16:32:14 2004 @@ -170,6 +170,7 @@ $prefixnm = "--prefix"; $serialnm = "--serial"; $threadsnm = "--threads"; +$openmpnm = "--openmp"; $profilenm = "--profile"; $insurenm = "--insure"; $debugnm = "--debug"; @@ -237,7 +238,8 @@ [$finternm, "", "include fortran support libraries."], [$nofinternm, "", "do not include the fortran libraries."], [$serialnm, "", "configure to run serially, no parallelism."], - [$threadsnm, "", "include threads capability, if available."], + [$threadsnm, "", "include threads capability, if available."], + [$openmpnm, "", "enable use of OpenMP, if available."], [$cheetahnm, "", "enable use of CHEETAH communications package."], [$schednm, "", "use for thread scheduling."], [$pawsnm, "", "enable PAWS program coupling, if available."], @@ -434,6 +436,10 @@ $threads_include_makefile = ""; $cpp_threads_arg = ""; +### include OpenMP capability? +$openmp = 0; +$openmpargs = ""; + ### if threads is used, what scheduler should be employed? $scheduler = $schedulerdefault; @@ -1307,9 +1313,9 @@ sub setthreads { # set $threads variable properly - if (scalar @{$arghash{$threadsnm}} > 1 and scalar @{$arghash{$serialnm}}> 1) + if (scalar @{$arghash{$threadsnm}} > 1 and (scalar @{$arghash{$serialnm}} > 1 or scalar @{$arghash{$openmpnm}} > 1)) { - printerror "$threadsnm and $serialnm both given. Use only one."; + printerror "$threadsnm and $serialnm or $openmpnm given. 
Use only one."; } elsif (not $threads_able or scalar @{$arghash{$serialnm}} > 1) { @@ -1438,6 +1444,13 @@ $pooma_reorder_iterates = $threads || ($scheduler eq "serialAsync"); add_yesno_define("POOMA_REORDER_ITERATES", $pooma_reorder_iterates); + + # OpenMP support + if (scalar @{$arghash{$openmpnm}} > 1) + { + $openmpargs = "\@openmpargs\@"; + } + } @@ -1936,20 +1949,20 @@ print FSUITE "LD = $link\n"; print FSUITE "\n"; print FSUITE "### flags for applications\n"; - print FSUITE "CXX_OPT_LIB_ARGS = \@cppargs\@ $cppshare $cppopt_lib\n"; - print FSUITE "CXX_DBG_LIB_ARGS = \@cppargs\@ $cppshare $cppdbg_lib\n"; - print FSUITE "CXX_OPT_APP_ARGS = \@cppargs\@ $cppopt_app\n"; - print FSUITE "CXX_DBG_APP_ARGS = \@cppargs\@ $cppdbg_app\n"; + print FSUITE "CXX_OPT_LIB_ARGS = \@cppargs\@ $openmpargs $cppshare $cppopt_lib\n"; + print FSUITE "CXX_DBG_LIB_ARGS = \@cppargs\@ $openmpargs $cppshare $cppdbg_lib\n"; + print FSUITE "CXX_OPT_APP_ARGS = \@cppargs\@ $openmpargs $cppopt_app\n"; + print FSUITE "CXX_DBG_APP_ARGS = \@cppargs\@ $openmpargs $cppdbg_app\n"; print FSUITE "\n"; - print FSUITE "C_OPT_LIB_ARGS = \@cargs\@ $cshare $copt_lib\n"; - print FSUITE "C_DBG_LIB_ARGS = \@cargs\@ $cshare $cdbg_lib\n"; - print FSUITE "C_OPT_APP_ARGS = \@cargs\@ $copt_app\n"; - print FSUITE "C_DBG_APP_ARGS = \@cargs\@ $cdbg_app\n"; + print FSUITE "C_OPT_LIB_ARGS = \@cargs\@ $openmpargs $cshare $copt_lib\n"; + print FSUITE "C_DBG_LIB_ARGS = \@cargs\@ $openmpargs $cshare $cdbg_lib\n"; + print FSUITE "C_OPT_APP_ARGS = \@cargs\@ $openmpargs $copt_app\n"; + print FSUITE "C_DBG_APP_ARGS = \@cargs\@ $openmpargs $cdbg_app\n"; print FSUITE "\n"; - print FSUITE "F77_OPT_LIB_ARGS = $f77args $f77share $f77opt_lib\n"; - print FSUITE "F77_DBG_LIB_ARGS = $f77args $f77share $f77dbg_lib\n"; - print FSUITE "F77_OPT_APP_ARGS = $f77args $f77opt_app\n"; - print FSUITE "F77_DBG_APP_ARGS = $f77args $f77dbg_app\n"; + print FSUITE "F77_OPT_LIB_ARGS = $f77args $openmpargs $f77share $f77opt_lib\n"; + print FSUITE "F77_DBG_LIB_ARGS = $f77args $openmpargs $f77share $f77dbg_lib\n"; + print FSUITE "F77_OPT_APP_ARGS = $f77args $openmpargs $f77opt_app\n"; + print FSUITE "F77_DBG_APP_ARGS = $f77args $openmpargs $f77dbg_app\n"; print FSUITE "\n"; if ($shared) { print FSUITE "AR_OPT_ARGS = $arshareopt\n"; diff -Nru a/r2/scripts/configure.ac b/r2/scripts/configure.ac --- a/r2/scripts/configure.ac Fri Jan 2 16:32:14 2004 +++ b/r2/scripts/configure.ac Fri Jan 2 16:32:14 2004 @@ -352,6 +352,31 @@ dnl +dnl Check for compiler argument for OpenMP support +dnl + +AC_MSG_CHECKING([for way to enable OpenMP support]) +acx_saved_cxxflags=$CXXFLAGS +CXXFLAGS="$CXXFLAGS -openmp" +AC_TRY_LINK([ +#include +], [ + double d[128]; +#pragma omp parallel for + for (int i=0; i<128; ++i) + d[i] = 1.0; + omp_get_max_threads(); +], [ +AC_MSG_RESULT([-openmp]) +openmpargs="-openmp" +], [ +AC_MSG_RESULT([none]) +]) +CXXFLAGS=$acx_saved_cxxflags +AC_SUBST(openmpargs) + + +dnl dnl Check on how to get failure on unrecognized pragmas dnl gcc: -Wunknown-pragmas -Werror dnl icpc: -we161 From oldham at codesourcery.com Fri Jan 2 20:01:07 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 02 Jan 2004 12:01:07 -0800 Subject: [pooma-dev] CVS down? In-Reply-To: References: Message-ID: <3FF5CE03.4090907@codesourcery.com> Richard Guenther wrote: > Hi! 
> > $ traceroute pooma.codesourcery.com > traceroute to pooma.codesourcery.com (65.73.237.138), 30 hops max, 38 byte > packets > 1 kolme.hamnixda.de (192.168.100.254) 5.260 ms 2.008 ms 1.801 ms > 2 217.5.98.157 (217.5.98.157) 69.914 ms 61.777 ms 61.311 ms > 3 217.237.156.218 (217.237.156.218) 60.429 ms 59.232 ms 60.236 ms > 4 WAS-E4.WAS.US.NET.DTAG.DE (62.154.14.134) 191.891 ms 165.471 ms > 162.642 ms > 5 62.156.138.210 (62.156.138.210) 168.278 ms 173.438 ms 168.534 ms > 6 bpr2-ae0.VirginiaEquinix.cw.net (208.173.50.253) 167.892 ms 170.474 > ms 168.912 ms > 7 208.173.50.242 (208.173.50.242) 163.327 ms 166.910 ms 203.087 ms > 8 p7-3.cr01.mcln.eli.net (207.173.114.129) 171.543 ms !H 165.860 ms !H * > > Oh, btw. www.codesourcery.com seems to be down, too ((61) Connection > refused). Thank you for the report of the difficulties. 01 January, CodeSourcery moved its machines, which now have different IP addresses. As the new DNS entries move through the Internet, these problems will disappear. We apologize for any difficulties these changes may have caused. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:12:38 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:12:38 -0800 Subject: [PATCH] Add --mpi configure switch In-Reply-To: References: Message-ID: <3FF9D346.9040206@codesourcery.com> Richard Guenther wrote: > Hi! > > This (finally) adds --mpi configure switch to enable POOMA_MPI. It checks > for mpiCC or mpic++ in either $MPICH_ROOT/bin or the current $PATH and > uses the first one found as new $cpp and $link. > > I didn't change the Cheetah configure switch which now has the slightly > confusing name --messaging. Maybe we want to change this to --cheetah. > > Ok? Yes. This is good progress. > I'll start full testing of serial, MPI and Cheetah to see if I forgot a > part of the changes after the pending stuff is committed. > > Thanks, > > Richard. > > > 2004Jan02 Richard Guenther > > * configure: add --mpi switch to enable MPI messaging using > mpiCC/mpic++. 
> > --- /home/richard/src/pooma/cvs/r2/configure 2003-12-30 18:19:29.000000000 +0100 > +++ configure 2004-01-02 00:40:10.000000000 +0100 > @@ -209,8 +208,9 @@ > $hdf5nm = "--hdf5"; > $fftwnm = "--fftw"; > $cheetahnm = "--messaging"; > +$mpinm = "--mpi"; > $strictnm = "--strict"; > $archfnsnm = "--arch-specific-functions"; > > ### configure options > $dbgprntnm = "-v"; # turn on verbose output from configure > @@ -236,10 +237,11 @@ > [$sharednm, "", "create a shared library."], > [$finternm, "", "include fortran support libraries."], > [$nofinternm, "", "do not include the fortran libraries."], > [$preinstnm, "", "build preinstantiations of several classes."], > [$serialnm, "", "configure to run serially, no parallelism."], > - [$threadsnm, "", "include threads capability, if available."], > + [$threadsnm, "", "include threads capability, if available."], > [$cheetahnm, "", "enable use of CHEETAH communications package."], > + [$mpinm, "", "enable use of MPI communications package."], > [$schednm, "", "use for thread scheduling."], > [$pawsnm, "", "enable PAWS program coupling, if available."], > [$pawsdevnm, "", "enable PAWS program coupling for PAWS devel."], > @@ -1276,13 +1266,22 @@ > { > $cheetah = 1; > } > - print "Set cheetah = $cheetah\n" if $dbgprnt; > + if (scalar @{$arghash{$mpinm}} > 1) > + { > + $mpi = 1; > + } > $messaging = $cheetah + $mpi; > + if ($messaging>1 or $messaging and scalar @{$arghash{$serialnm}}> 1) > + { > + printerror "$cheetahnm and/or $mpinm and/or $serialnm given. Use only one."; > + } > + print "Set messaging = $messaging\n" if $dbgprnt; > > # add a define indicating whether CHEETAH/MPI is available, and configure > # extra options to include and define lists > my $defmessaging = $messaging; > my $defcheetah = 0; > + my $defmpi = 0; > if ($cheetah) > { > if (exists $ENV{"CHEETAHDIR"}) > @@ -1299,7 +1298,6 @@ > } > > $defcheetah = 1; > - > $scheduler = "serialAsync"; > > # add in the extra compilation settings for CHEETAH. > @@ -1315,8 +1313,40 @@ > $link = $cheetah_link; > } > } > + elsif ($mpi) > + { > + my $mpiCC = "\$(MPICH_ROOT)/bin/mpiCC"; > + if (system("test -x $MPICH_ROOT/bin/mpiCC") == 0) > + { > + $mpiCC = "\$(MPICH_ROOT)/bin/mpiCC"; > + } > + elsif (system("test -x $MPICH_ROOT/bin/mpic++") == 0) > + { > + $mpiCC = "\$(MPICH_ROOT)/bin/mpic++"; > + } > + elsif (system("which mpiCC") == 0) > + { > + $mpiCC = "mpiCC"; > + } > + elsif (system("which mpic++") == 0) > + { > + $mpiCC = "mpic++"; > + } > + else > + { > + die "There is no known MPI location. Select one by setting MPICH_ROOT or adjusting your PATH.\n"; > + } > + > + $defmpi = 1; > + $scheduler = "serialAsync"; > + > + # use special compiler script for MPI. > + $cpp = $mpiCC; > + $link = $mpiCC; > + } > add_yesno_define("POOMA_MESSAGING", $defmessaging); > add_yesno_define("POOMA_CHEETAH", $defcheetah); > + add_yesno_define("POOMA_MPI", $defmpi); > } > > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:30:43 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:30:43 -0800 Subject: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: References: Message-ID: <3FF9D783.5030504@codesourcery.com> Richard Guenther wrote: > The patch was tested as usual. > > Ok to commit? I have some questions and comments interspersed below. > Thanks, Richard. 
> > > 2004Jan02 Richard Guenther > > * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, > add std::stack for generation tracking, add support for > asyncronous MPI requests. Add an 'h' to spell asynchronous. > src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp: define > new static variables. > > --- /home/richard/src/pooma/cvs/r2/src/Threads/IterateSchedulers/SerialAsync.h 2000-06-09 00:16:50.000000000 +0200 > +++ Threads/IterateSchedulers/SerialAsync.h 2004-01-02 00:40:16.000000000 +0100 > @@ -42,48 +42,38 @@ > // DataObject > //----------------------------------------------------------------------------- > > -#include > - > #ifndef _SerialAsync_h_ > #define _SerialAsync_h_ > -/* > -LIBRARY: > - SerialAsync > - > -CLASSES: IterateScheduler > - > -CLASSES: DataObject > - > -CLASSES: Iterate > - > -OVERVIEW > - SerialAsync IterateScheduler is a policy template to create a > - dependence graphs and executes the graph respecting the > - dependencies without using threads. There is no parallelism, > - but Iterates may be executed out-of-order with respect to the > - program text. > - > ------------------------------------------------------------------------------*/ > - > -////////////////////////////////////////////////////////////////////// > > -//----------------------------------------------------------------------------- > -// Overview: > -// Smarts classes for times when you want no threads but you do want > -// dataflow evaluation. > -//----------------------------------------------------------------------------- > - > -//----------------------------------------------------------------------------- > -// Typedefs: > -//----------------------------------------------------------------------------- > +/** @file > + * @ingroup IterateSchedulers > + * @brief > + * Smarts classes for times when you want no threads but you do want > + * dataflow evaluation. > + * > + * SerialAsync IterateScheduler is a policy template to create a > + * dependence graphs and executes the graph respecting the > + * dependencies without using threads. > + * There is no (thread level) parallelism, but Iterates may be executed > + * out-of-order with respect to the program text. Also this scheduler is > + * used for message based parallelism in which case asyncronous execution > + * leads to reduced communication latencies. > + */ > > //----------------------------------------------------------------------------- > // Includes: > //----------------------------------------------------------------------------- > > #include > +#include > +#include > +#include > +#include > +#include > #include "Threads/IterateSchedulers/IterateScheduler.h" > #include "Threads/IterateSchedulers/Runnable.h" > +#include "Tulip/Messaging.h" > +#include "Utilities/PAssert.h" > > //----------------------------------------------------------------------------- > // Forward Declarations: > @@ -94,76 +84,261 @@ > > namespace Smarts { > > -#define MYID 0 > -#define MAX_CPUS 1 > -// > -// Tag class for specializing IterateScheduler, Iterate and DataObject. > -// > +/** > + * Tag class for specializing IterateScheduler, Iterate and DataObject. > + */ > + > struct SerialAsync > { > - enum Action { Read, Write}; > + enum Action { Read, Write }; > }; > > > -//----------------------------------------------------------------------------- > +/** > + * Iterate is used to implement the SerialAsync > + * scheduling policy. > + * > + * An Iterate is a non-blocking unit of concurrency that is used > + * to describe a chunk of work. 
It inherits from the Runnable > + * class and as all subclasses of Runnable, the user specializes > + * the run() method to specify the operation. > + * Iterate is a further specialization of the > + * Iterate class to use the SerialAsync Scheduling algorithm to > + * generate the data dependency graph for a data-driven > + * execution. > + */ > + > +template<> > +class Iterate : public Runnable > +{ > + friend class IterateScheduler; > + friend class DataObject; > + > +public: > + > + typedef DataObject DataObject_t; > + typedef IterateScheduler IterateScheduler_t; > + > + > + /// The Constructor for this class takes the IterateScheduler and a > + /// CPU affinity. CPU affinity has a default value of -1 which means > + /// it may run on any CPU available. > + > + inline Iterate(IterateScheduler & s, int affinity=-1) > + : scheduler_m(s), notifications_m(1), generation_m(-1), togo_m(1) > + {} > + > + /// The dtor is virtual because the subclasses will need to add to it. > + > + virtual ~Iterate() {} > + > + /// The run method does the core work of the Iterate. > + /// It is supplied by the subclass. > + > + virtual void run() = 0; > + > + //@name Stubs for the affinities > + /// There is no such thing in serial. > + //@{ > + > + inline int affinity() const {return 0;} > + > + inline int hintAffinity() const {return 0;} > + > + inline void affinity(int) {} > + > + inline void hintAffinity(int) {} > + > + //@} > + > + /// Notify is used to indicate to the Iterate that one of the data > + /// objects it had requested has been granted. To do this, we dec a > + /// dependence counter which, if equal to 0, the Iterate is ready for > + /// execution. > + > + void notify() > + { > + if (--notifications_m == 0) > + add(this); > + } > + > + /// How many notifications remain? > + > + int notifications() const { return notifications_m; } > + > + void addNotification() { notifications_m++; } > + > + int& generation() { return generation_m; } > + > + int& togo() { return togo_m; } > + > +protected: > + > + /// What scheduler are we working with? > + IterateScheduler &scheduler_m; > + > + /// How many notifications should we receive before we can run? > + int notifications_m; > + > + /// Which generation we were issued in. > + int generation_m; > + > + /// How many times we need to go past a "did something" to be ready > + /// for destruction? > + int togo_m; > + > +}; > + > + > +/** > + * FIXME. > + */ I am wary of adding unfinished code to the code base. At the very least, we need a more extensive comment describing what is not finished. > struct SystemContext > { > void addNCpus(int) {} > void wait() {} > void concurrency(int){} > - int concurrency() {return 1;} > + int concurrency() { return 1; } > void mustRunOn() {} > > // We have a separate message queue because they are > // higher priority. > + typedef Iterate *IteratePtr_t; > static std::list workQueueMessages_m; > static std::list workQueue_m; > +#if POOMA_MPI > + static MPI_Request requests_m[1024]; What is this fixed constant of 1024? Does this come from the MPI standard? > + static std::map allocated_requests_m; > + static std::set free_requests_m; > +#endif > + > + > +#if POOMA_MPI > > - /////////////////////////// > - // This function lets you check if there are iterates that are > - // ready to run. 
> - inline static > - bool workReady() > + /// Query, if we have lots of MPI_Request slots available > + > + static bool haveLotsOfMPIRequests() > { > - return !(workQueue_m.empty() && workQueueMessages_m.empty()); > + return free_requests_m.size() > 1024/2; > } > > - /////////////////////////// > - // Run an iterate if one is ready. > - inline static > - void runSomething() > + /// Get a MPI_Request slot, associated with an iterate > + > + static MPI_Request* getMPIRequest(IteratePtr_t p) > { > - if (!workQueueMessages_m.empty()) > - { > - // Get the top iterate. > - // Delete it from the queue. > - // Run the iterate. > - // Delete the iterate. This could put more iterates in the queue. > + PInsist(!free_requests_m.empty(), "No free MPIRequest slots."); > + int i = *free_requests_m.begin(); > + free_requests_m.erase(free_requests_m.begin()); > + allocated_requests_m[i] = p; > + p->togo()++; > + return &requests_m[i]; > + } > > - RunnablePtr_t p = workQueueMessages_m.front(); > - workQueueMessages_m.pop_front(); > - p->execute(); > + static void releaseMPIRequest(int i) > + { > + IteratePtr_t p = allocated_requests_m[i]; > + allocated_requests_m.erase(i); > + free_requests_m.insert(i); > + if (--(p->togo()) == 0) > delete p; > - } > + } > + > + static bool waitForSomeRequests(bool mayBlock) > + { > + if (allocated_requests_m.empty()) > + return false; > + > + int last_used_request = allocated_requests_m.rbegin()->first; > + int finished[last_used_request+1]; > + MPI_Status statuses[last_used_request+1]; > + int nr_finished; > + int res; > + if (mayBlock) > + res = MPI_Waitsome(last_used_request+1, requests_m, > + &nr_finished, finished, statuses); > else > - { > - if (!workQueue_m.empty()) > - { > - RunnablePtr_t p = workQueue_m.front(); > - workQueue_m.pop_front(); > - p->execute(); > - delete p; > + res = MPI_Testsome(last_used_request+1, requests_m, > + &nr_finished, finished, statuses); > + PAssert(res == MPI_SUCCESS || res == MPI_ERR_IN_STATUS); > + if (nr_finished == MPI_UNDEFINED) > + return false; > + > + // release finised requests > + while (nr_finished--) { > + if (res == MPI_ERR_IN_STATUS) { > + if (statuses[nr_finished].MPI_ERROR != MPI_SUCCESS) { > + char msg[MPI_MAX_ERROR_STRING+1]; > + int len; > + MPI_Error_string(statuses[nr_finished].MPI_ERROR, msg, &len); > + msg[len] = '\0'; > + PInsist(0, msg); > + } > } > + releaseMPIRequest(finished[nr_finished]); > } > + return true; > + } > + > +#else > + > + static bool waitForSomeRequests(bool mayBlock) > + { > + return false; > + } > + > +#endif > + > + > + /// This function lets you check if there are iterates that are > + /// ready to run. > + > + static bool workReady() > + { > + return !(workQueue_m.empty() > + && workQueueMessages_m.empty() > +#if POOMA_MPI > + && allocated_requests_m.empty() > +#endif > + ); > + } > + > + /// Run an iterate if one is ready. Returns if progress > + /// was made. 
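To make the intended use of these request slots concrete, here is a hypothetical iterate (not part of the patch; SendReceive.h does the real work) that posts a nonblocking send and leaves completion to the scheduler. Only getMPIRequest() and the togo() bookkeeping come from the patch; the buffer, tag and destination are made up, and a --mpi build is assumed:

    #include <mpi.h>
    #include "Threads/IterateSchedulers/SerialAsync.h"

    // Hypothetical sketch: a message iterate using the new request slots.
    class SendIterate : public Smarts::Iterate<Smarts::SerialAsync>
    {
    public:
      SendIterate(Smarts::IterateScheduler<Smarts::SerialAsync> &s,
                  const double *buf, int n, int toContext)
        : Smarts::Iterate<Smarts::SerialAsync>(s),
          buffer_m(buf), n_m(n), to_m(toContext)
      {}

      virtual void run()
      {
        // Tie an MPI_Request slot to this iterate; getMPIRequest() bumps
        // togo(), so the iterate stays alive until the request completes.
        MPI_Request *req = Smarts::SystemContext::getMPIRequest(this);
        MPI_Isend(const_cast<double *>(buffer_m), n_m, MPI_DOUBLE,
                  to_m, 0 /* tag, made up */, MPI_COMM_WORLD, req);
        // waitForSomeRequests() later calls releaseMPIRequest(), which
        // drops togo() back to zero and deletes the iterate.
      }

    private:
      const double *buffer_m;
      int n_m, to_m;
    };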
> + > + static bool runSomething(bool mayBlock = true) > + { > + // do work in this order to minimize communication latency: > + // - issue all messages > + // - do some regular work > + // - wait for messages to complete > + > + RunnablePtr_t p = NULL; > + if (!workQueueMessages_m.empty()) { > + p = workQueueMessages_m.front(); > + workQueueMessages_m.pop_front(); > + } else if (!workQueue_m.empty()) { > + p = workQueue_m.front(); > + workQueue_m.pop_front(); > + } > + > + if (p) { > + p->execute(); > + Iterate *it = dynamic_cast(p); > + if (it) { > + if (--(it->togo()) == 0) > + delete it; > + } else > + delete p; > + return true; > + > + } else > + return waitForSomeRequests(mayBlock); > } > > }; > > -inline void addRunnable(RunnablePtr_t rn) > -{ > - SystemContext::workQueue_m.push_front(rn); > -} > +/// Adds a runnable to the appropriate work-queue. > > inline void add(RunnablePtr_t rn) > { > @@ -182,25 +357,18 @@ > inline void wait() {} > inline void mustRunOn(){} > > -/*------------------------------------------------------------------------ > -CLASS > - IterateScheduler_Serial_Async > - > - Implements a asynchronous scheduler for a data driven execution. > - Specializes a IterateScheduler. > - > -KEYWORDS > - Data-parallelism, Native-interface, IterateScheduler. > - > -DESCRIPTION > - > - The SerialAsync IterateScheduler, Iterate and DataObject > - implement a SMARTS scheduler that does dataflow without threads. > - What that means is that when you hand iterates to the > - IterateScheduler it stores them up until you call > - IterateScheduler::blockingEvaluate(), at which point it evaluates > - iterates until the queue is empty. > ------------------------------------------------------------------------------*/ > + > +/** > + * Implements a asynchronous scheduler for a data driven execution. > + * Specializes a IterateScheduler. > + * > + * The SerialAsync IterateScheduler, Iterate and DataObject > + * implement a SMARTS scheduler that does dataflow without threads. > + * What that means is that when you hand iterates to the > + * IterateScheduler it stores them up until you call > + * IterateScheduler::blockingEvaluate(), at which point it evaluates > + * iterates until the queue is empty. > + */ > > template<> > class IterateScheduler > @@ -212,196 +380,128 @@ > typedef DataObject DataObject_t; > typedef Iterate Iterate_t; > > - /////////////////////////// > - // Constructor > - // > - IterateScheduler() {} > - > - /////////////////////////// > - // Destructor > - // > - ~IterateScheduler() {} > - void setConcurrency(int) {} > - > - //--------------------------------------------------------------------------- > - // Mutators. > - //--------------------------------------------------------------------------- > - > - /////////////////////////// > - // Tells the scheduler that the parser thread is starting a new > - // data-parallel statement. Any Iterate that is handed off to the > - // scheduler between beginGeneration() and endGeneration() belongs > - // to the same data-paralllel statement and therefore has the same > - // generation number. > - // > - inline void beginGeneration() { } > - > - /////////////////////////// > - // Tells the scheduler that no more Iterates will be handed off for > - // the data parallel statement that was begun with a > - // beginGeneration(). 
> - // > - inline void endGeneration() {} > - > - /////////////////////////// > - // The parser thread calls this method to evaluate the generated > - // graph until all the nodes in the dependence graph has been > - // executed by the scheduler. That is to say, the scheduler > - // executes all the Iterates that has been handed off to it by the > - // parser thread. > - // > - inline > - void blockingEvaluate(); > - > - /////////////////////////// > - // The parser thread calls this method to ask the scheduler to run > - // the given Iterate when the dependence on that Iterate has been > - // satisfied. > - // > - inline void handOff(Iterate* it); > + IterateScheduler() > + : generation_m(0) > + {} > > - inline > - void releaseIterates() { } > + ~IterateScheduler() {} > > -protected: > -private: > + void setConcurrency(int) {} > > - typedef std::list Container_t; > - typedef Container_t::iterator Iterator_t; > + /// Tells the scheduler that the parser thread is starting a new > + /// data-parallel statement. Any Iterate that is handed off to the > + /// scheduler between beginGeneration() and endGeneration() belongs > + /// to the same data-paralllel statement and therefore has the same > + /// generation number. > + /// Nested invocations are handled as being part of the outermost > + /// generation. > > -}; > + void beginGeneration() > + { > + // Ensure proper overflow behavior. > + if (++generation_m < 0) > + generation_m = 0; > + generationStack_m.push(generation_m); > + } > > -//----------------------------------------------------------------------------- > + /// Tells the scheduler that no more Iterates will be handed off for > + /// the data parallel statement that was begun with a > + /// beginGeneration(). > > -/*------------------------------------------------------------------------ > -CLASS > - Iterate_SerialAsync > - > - Iterate is used to implement the SerialAsync > - scheduling policy. > - > -KEYWORDS > - Data_Parallelism, Native_Interface, IterateScheduler, Data_Flow. > - > -DESCRIPTION > - An Iterate is a non-blocking unit of concurrency that is used > - to describe a chunk of work. It inherits from the Runnable > - class and as all subclasses of Runnable, the user specializes > - the run() method to specify the operation. > - Iterate is a further specialization of the > - Iterate class to use the SerialAsync Scheduling algorithm to > - generate the data dependency graph for a data-driven > - execution. */ > + void endGeneration() > + { > + PAssert(inGeneration()); > + generationStack_m.pop(); > > -template<> > -class Iterate : public Runnable > -{ > - friend class IterateScheduler; > - friend class DataObject; > +#if POOMA_MPI > + // this is a safe point to block until we have "lots" of MPI Requests > + if (!inGeneration()) > + while (!SystemContext::haveLotsOfMPIRequests()) > + SystemContext::runSomething(true); > +#endif > + } > > -public: > + /// Wether we are inside a generation and may not safely block. > > - typedef DataObject DataObject_t; > - typedef IterateScheduler IterateScheduler_t; > + bool inGeneration() const > + { > + return !generationStack_m.empty(); > + } > > + /// What the current generation is. > > - /////////////////////////// > - // The Constructor for this class takes the IterateScheduler and a > - // CPU affinity. CPU affinity has a default value of -1 which means > - // it may run on any CPU available. 
> - // > - inline Iterate(IterateScheduler & s, int affinity=-1); > - > - /////////////////////////// > - // The dtor is virtual because the subclasses will need to add to it. > - // > - virtual ~Iterate() {} > + int generation() const > + { > + if (!inGeneration()) > + return -1; > + return generationStack_m.top(); > + } > > - /////////////////////////// > - // The run method does the core work of the Iterate. > - // It is supplied by the subclass. > - // > - virtual void run() = 0; > + /// The parser thread calls this method to evaluate the generated > + /// graph until all the nodes in the dependence graph has been > + /// executed by the scheduler. That is to say, the scheduler > + /// executes all the Iterates that has been handed off to it by the > + /// parser thread. > > - /////////////////////////// > - // Stub in all the affinities, because there is no such thing > - // in serial. > - // > - inline int affinity() const {return 0;} > - /////////////////////////// > - // Stub in all the affinities, because there is no such thing > - // in serial. > - // > - inline int hintAffinity() const {return 0;} > - /////////////////////////// > - // Stub in all the affinities, because there is no such thing > - // in serial. > - // > - inline void affinity(int) {} > - /////////////////////////// > - // Stub in all the affinities, because there is no such thing > - // in serial. > - // > - inline void hintAffinity(int) {} > + void blockingEvaluate() > + { > + if (inGeneration()) { > + // It's not safe to block inside a generation, so > + // just do as much as we can without blocking. > + while (SystemContext::runSomething(false)) > + ; > + > + } else { > + // Loop as long as there is anything in the queue. > + while (SystemContext::workReady()) > + SystemContext::runSomething(true); > + } > + } > > - /////////////////////////// > - // Notify is used to indicate to the Iterate that one of the data > - // objects it had requested has been granted. To do this, we dec a > - // dependence counter which, if equal to 0, the Iterate is ready for > - // execution. > - // > - inline void notify(); > - > - /////////////////////////// > - // How many notifications remain? > - // > - inline > - int notifications() const { return notifications_m; } > + /// The parser thread calls this method to ask the scheduler to run > + /// the given Iterate when the dependence on that Iterate has been > + /// satisfied. > > - inline void addNotification() > + void handOff(Iterate* it) > { > - notifications_m++; > + // No action needs to be taken here. Iterates will make their > + // own way into the execution queue. > + it->generation() = generation(); > + it->notify(); > } > > -protected: > + void releaseIterates() { } > > - // What scheduler are we working with? > - IterateScheduler &scheduler_m; > +private: > > - // How many notifications should we receive before we can run? > - int notifications_m; > + typedef std::list Container_t; > + typedef Container_t::iterator Iterator_t; > > -private: > - // Set notifications dynamically and automatically every time a > - // request is made by the iterate > - void incr_notifications() { notifications_m++;} > + static std::stack generationStack_m; > + int generation_m; > > }; > > > -//----------------------------------------------------------------------------- > - > -/*------------------------------------------------------------------------ > -CLASS > - DataObject_SerialAsync > - > - Implements a asynchronous scheduler for a data driven execution. 
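For orientation, the calling pattern these methods support, as a hedged sketch; NoopIterate is a made-up stand-in for a real Iterate<SerialAsync> subclass:

    #include "Threads/IterateSchedulers/SerialAsync.h"

    // Illustration only: a trivial iterate.
    class NoopIterate : public Smarts::Iterate<Smarts::SerialAsync>
    {
    public:
      NoopIterate(Smarts::IterateScheduler<Smarts::SerialAsync> &s)
        : Smarts::Iterate<Smarts::SerialAsync>(s) {}
      virtual void run() {}   // real iterates do their chunk of work here
    };

    // How the evaluator side might drive one data-parallel statement.
    void evaluateOneStatement(Smarts::IterateScheduler<Smarts::SerialAsync> &scheduler)
    {
      scheduler.beginGeneration();                    // statement gets a generation
      scheduler.handOff(new NoopIterate(scheduler));  // no data requests, so it is queued at once
      scheduler.endGeneration();                      // safe point; may drain MPI requests
      scheduler.blockingEvaluate();                   // run whatever is still queued
    }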
> -KEYWORDS > - Data-parallelism, Native-interface, IterateScheduler. > - > -DESCRIPTION > - The DataObject Class is used introduce a type to represent > - a resources (normally) blocks of data) that Iterates contend > - for atomic access. Iterates make request for either a read or > - write to the DataObjects. DataObjects may grant the request if > - the object is currently available. Otherwise, the request is > - enqueue in a queue private to the data object until the > - DataObject is release by another Iterate. A set of read > - requests may be granted all at once if there are no > - intervening write request to that DataObject. > - DataObject is a specialization of DataObject for > - the policy template SerialAsync. > -*/ > +/** > + * Implements a asynchronous scheduler for a data driven execution. > + * > + * The DataObject Class is used introduce a type to represent > + * a resources (normally) blocks of data) that Iterates contend > + * for atomic access. Iterates make request for either a read or > + * write to the DataObjects. DataObjects may grant the request if > + * the object is currently available. Otherwise, the request is > + * enqueue in a queue private to the data object until the > + * DataObject is release by another Iterate. A set of read > + * requests may be granted all at once if there are no > + * intervening write request to that DataObject. > + * DataObject is a specialization of DataObject for > + * the policy template SerialAsync. > + * > + * There are two ways data can be used: to read or to write. > + * Don't change this to give more than two states; > + * things inside depend on that. > + */ > > template<> > class DataObject > @@ -413,54 +513,56 @@ > typedef IterateScheduler IterateScheduler_t; > typedef Iterate Iterate_t; > > - // There are two ways data can be used: to read or to write. > - // Don't change this to give more than two states: > - // things inside depend on that. > - > - /////////////////////////// > - // Construct the data object with an empty set of requests > - // and the given affinity. > - // > - inline DataObject(int affinity=-1); > + > + /// Construct the data object with an empty set of requests > + /// and the given affinity. > + > + DataObject(int affinity=-1) > + : released_m(queue_m.end()), notifications_m(0) > + { > + // released_m to the end of the queue (which should) also be the > + // beginning. notifications_m to zero, since nothing has been > + // released yet. > + } > > - /////////////////////////// > - // for compatibility with other SMARTS schedulers, accept > - // Scheduler arguments (unused) > - // > - inline > - DataObject(int affinity, IterateScheduler&); > - > - /////////////////////////// > - // Stub out affinity because there is no affinity in serial. > - // > - inline int affinity() const { return 0; } > - > - /////////////////////////// > - // Stub out affinity because there is no affinity in serial. > - // > - inline void affinity(int) {} > + /// for compatibility with other SMARTS schedulers, accept > + /// Scheduler arguments (unused) > > - /////////////////////////// > - // An iterate makes a request for a certain action in a certain > - // generation. > - // > - inline > - void request(Iterate&, SerialAsync::Action); > - > - /////////////////////////// > - // An iterate finishes and tells the DataObject it no longer needs > - // it. If this is the last release for the current set of > - // requests, have the IterateScheduler release some more. 
> - // > - inline void release(SerialAsync::Action); > + inline DataObject(int affinity, IterateScheduler&) > + : released_m(queue_m.end()), notifications_m(0) > + {} > + > + /// Stub out affinity because there is no affinity in serial. > + > + int affinity() const { return 0; } > + > + /// Stub out affinity because there is no affinity in serial. > + > + void affinity(int) {} > + > + /// An iterate makes a request for a certain action in a certain > + /// generation. > + > + inline void request(Iterate&, SerialAsync::Action); > + > + /// An iterate finishes and tells the DataObject it no longer needs > + /// it. If this is the last release for the current set of > + /// requests, have the IterateScheduler release some more. > + > + void release(SerialAsync::Action) > + { > + if (--notifications_m == 0) > + releaseIterates(); > + } > > -protected: > private: > > - // If release needs to let more iterates go, it calls this. > + /// If release needs to let more iterates go, it calls this. > inline void releaseIterates(); > > - // The type for a request. > + /** > + * The type for a request. > + */ > class Request > { > public: > @@ -475,135 +577,27 @@ > SerialAsync::Action act_m; > }; > > - // The type of the queue and iterator. > + /// The type of the queue and iterator. > typedef std::list Container_t; > typedef Container_t::iterator Iterator_t; > > - // The list of requests from various iterates. > - // They're granted in FIFO order. > + /// The list of requests from various iterates. > + /// They're granted in FIFO order. > Container_t queue_m; > > - // Pointer to the last request that has been granted. > + /// Pointer to the last request that has been granted. > Iterator_t released_m; > > - // The number of outstanding notifications. > + /// The number of outstanding notifications. > int notifications_m; > }; > > -////////////////////////////////////////////////////////////////////// > -// > -// Inline implementation of the functions for > -// IterateScheduler > -// > -////////////////////////////////////////////////////////////////////// > - > -// > -// IterateScheduler::handOff(Iterate*) > -// No action needs to be taken here. Iterates will make their > -// own way into the execution queue. > -// > - > -inline void > -IterateScheduler::handOff(Iterate* it) > -{ > - it->notify(); > -} > - > -////////////////////////////////////////////////////////////////////// > -// > -// Inline implementation of the functions for Iterate > -// > -////////////////////////////////////////////////////////////////////// > - > -// > -// Iterate::Iterate > -// Construct with the scheduler and the number of notifications. > -// Ignore the affinity. > -// > - > -inline > -Iterate::Iterate(IterateScheduler& s, int) > -: scheduler_m(s), notifications_m(1) > -{ > -} > - > -// > -// Iterate::notify > -// Notify the iterate that a DataObject is ready. > -// Decrement the counter, and if it is zero, alert the scheduler. > -// > - > -inline void > -Iterate::notify() > -{ > - if ( --notifications_m == 0 ) > - { > - add(this); > - } > -} > - > -////////////////////////////////////////////////////////////////////// > -// > -// Inline implementation of the functions for DataObject > -// > -////////////////////////////////////////////////////////////////////// > - > -// > -// DataObject::DataObject() > -// Initialize: > -// released_m to the end of the queue (which should) also be the > -// beginning. notifications_m to zero, since nothing has been > -// released yet. 
> -// > - > -inline > -DataObject::DataObject(int) > -: released_m(queue_m.end()), notifications_m(0) > -{ > -} > - > -// > -// void DataObject::release(Action) > -// An iterate has finished and is telling the DataObject that > -// it is no longer needed. > -// > +/// void DataObject::releaseIterates(SerialAsync::Action) > +/// When the last released iterate dies, we need to > +/// look at the beginning of the queue and tell more iterates > +/// that they can access this data. > > inline void > -DataObject::release(SerialAsync::Action) > -{ > - if ( --notifications_m == 0 ) > - releaseIterates(); > -} > - > - > - > -//----------------------------------------------------------------------------- > -// > -// void IterateScheduler::blockingEvaluate > -// Evaluate all the iterates in the queue. > -// > -//----------------------------------------------------------------------------- > -inline > -void > -IterateScheduler::blockingEvaluate() > -{ > - // Loop as long as there is anything in the queue. > - while (SystemContext::workReady()) > - { > - SystemContext::runSomething(); > - } > -} > - > -//----------------------------------------------------------------------------- > -// > -// void DataObject::releaseIterates(SerialAsync::Action) > -// When the last released iterate dies, we need to > -// look at the beginning of the queue and tell more iterates > -// that they can access this data. > -// > -//----------------------------------------------------------------------------- > -inline > -void > DataObject::releaseIterates() > { > // Get rid of the reservations that have finished. > @@ -622,14 +616,17 @@ > released_m->iterate().notify(); > ++notifications_m; > > - // Record what action that one will take. > + // Record what action that one will take > + // and record its generation number > SerialAsync::Action act = released_m->act(); > + int generation = released_m->iterate().generation(); > > // Look at the next iterate. > ++released_m; > > // If the first one was a read, release more. > if ( act == SerialAsync::Read ) > + { > > // As long as we aren't at the end and we have more reads... > while ((released_m != end) && > @@ -642,29 +639,30 @@ > // And go on to the next. > ++released_m; > } > + > + } > + > } > } > > +/// void DataObject::request(Iterate&, action) > +/// An iterate makes a reservation with this DataObject for a given > +/// action in a given generation. The request may be granted > +/// immediately. > > -// > -// void DataObject::request(Iterate&, action) > -// An iterate makes a reservation with this DataObject for a given > -// action in a given generation. The request may be granted > -// immediately. > -// > -inline > -void > +inline void > DataObject::request(Iterate& it, > SerialAsync::Action act) > > { > // The request can be granted immediately if: > // The queue is currently empty, or > - // The request is a read and everything in the queue is a read. > + // the request is a read and everything in the queue is a read, > + // or (with relaxed conditions), everything is the same generation. > > // Set notifications dynamically and automatically > // every time a request is made by the iterate > - it.incr_notifications(); > + it.notifications_m++; > > bool allReleased = (queue_m.end() == released_m); > bool releasable = queue_m.empty() || > @@ -691,17 +689,11 @@ > } > > > -//---------------------------------------------------------------------- > - > - > -// > -// End of Smarts namespace. 
> -// > -} > +} // namespace Smarts > > ////////////////////////////////////////////////////////////////////// > > -#endif // POOMA_PACKAGE_CLASS_H > +#endif // _SerialAsync_h_ > > /*************************************************************************** > * $RCSfile: SerialAsync.h,v $ $Author: sa_smith $ > --- /home/richard/src/pooma/cvs/r2/src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp 2000-04-12 02:08:06.000000000 +0200 > +++ Threads/IterateSchedulers/SerialAsync.cmpl.cpp 2004-01-02 00:40:16.000000000 +0100 > @@ -82,6 +82,12 @@ > > std::list SystemContext::workQueueMessages_m; > std::list SystemContext::workQueue_m; > +#if POOMA_MPI > + MPI_Request SystemContext::requests_m[1024]; > + std::map SystemContext::allocated_requests_m; > + std::set SystemContext::free_requests_m; > +#endif > +std::stack IterateScheduler::generationStack_m; > > } > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:32:06 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:32:06 -0800 Subject: [PATCH] Initialize MPI In-Reply-To: References: Message-ID: <3FF9D7D6.70407@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch adds MPI initialization. > > Ok? Yes. > Richard. > > > 2004Jan02 Richard Guenther > > * src/Pooma/Pooma.cmpl.cpp: add initialization and > finalization sequence for MPI. Pooma::blockAndEvaluate() at > finalization. > > --- /home/richard/src/pooma/cvs/r2/src/Pooma/Pooma.cmpl.cpp 2003-12-25 12:26:04.000000000 +0100 > +++ Pooma/Pooma.cmpl.cpp 2004-01-02 00:40:15.000000000 +0100 > @@ -287,10 +287,10 @@ > // we can do this in the other initialize routine by querying for > // the Cheetah options from the Options object. > > -#if POOMA_CHEETAH > - > +#if POOMA_MPI > + MPI_Init(&argc, &argv); > +#elif POOMA_CHEETAH > controller_g = new Cheetah::Controller(argc, argv); > - > #endif > > // Just create an Options object for this argc, argv set, and give that > @@ -349,12 +349,20 @@ > > // Set myContext_s and numContexts_s to the context numbers. > > -#if POOMA_CHEETAH > +#if POOMA_MESSAGING > > +#if POOMA_MPI > + MPI_Comm_rank(MPI_COMM_WORLD, &myContext_g); > + MPI_Comm_size(MPI_COMM_WORLD, &numContexts_g); > + // ugh... > + for (int i=0; i + Smarts::SystemContext::free_requests_m.insert(i); > +#elif POOMA_CHEETAH > PAssert(controller_g != 0); > > myContext_g = controller_g->mycontext(); > numContexts_g = controller_g->ncontexts(); > +#endif > > initializeCheetahHelpers(numContexts_g); > > @@ -376,14 +384,14 @@ > warnMessages(opts.printWarnings()); > errorMessages(opts.printErrors()); > > -#if POOMA_CHEETAH > - > // This barrier is here so that Pooma is initialized on all contexts > // before we continue. (Another context could invoke a remote member > // function on us before we're initialized... which would be bad.) > > +#if POOMA_MPI > + MPI_Barrier(MPI_COMM_WORLD); > +#elif POOMA_CHEETAH > controller_g->barrier(); > - > #endif > > // Initialize the Inform streams with info on how many contexts we > @@ -416,6 +424,8 @@ > > bool finalize(bool quitRTS, bool quitArch) > { > + Pooma::blockAndEvaluate(); > + > if (initialized_s) > { > // Wait for threads to finish. > @@ -426,7 +436,7 @@ > > cleanup_s(); > > -#if POOMA_CHEETAH > +#if POOMA_MESSAGING > // Clean up the Cheetah helpers. 
> > finalizeCheetahHelpers(); > @@ -436,15 +446,19 @@ > > if (quitRTS) > { > -#if POOMA_CHEETAH > +#if POOMA_MESSAGING > > // Deleting the controller shuts down the cross-context communication > // if this is the last thing using this controller. If something > // else is using this, Cheetah will not shut down until that item > // is destroyed or stops using the controller. > > +#if POOMA_MPI > + MPI_Finalize(); > +#elif POOMA_CHEETAH > if (controller_g != 0) > delete controller_g; > +#endif > > #endif > } > @@ -784,18 +799,18 @@ > SystemContext_t::runSomething(); > } > > -#elif POOMA_REORDER_ITERATES > +# elif POOMA_REORDER_ITERATES > > CTAssert(NO_SUPPORT_FOR_THREADS_WITH_MESSAGING); > > -#else // we're using the serial scheduler, so we only need to get messages > +# else // we're using the serial scheduler, so we only need to get messages > > while (Pooma::incomingMessages()) > { > controller_g->poll(); > } > > -#endif // schedulers > +# endif // schedulers > > #else // !POOMA_CHEETAH > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:37:50 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:37:50 -0800 Subject: [PATCH] Add MPI serializer In-Reply-To: References: Message-ID: <3FF9D92E.8060205@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch adds the serializer for MPI messaging. This is basically a > stripped down version of Cheetahs MatchingHandler/Serialize.h. I omitted > all traces of Cheetah::DELEGATE mechanism which we don't use. > > Ok? Please see the interspersed comments below. > Richard. > > > 2004Jan02 Richard Guenther > > * src/Tulip/CheetahSerialize.h: new file. > src/Tulip/Messaging.h: include it, if POOMA_MPI. > > --- /home/richard/src/pooma/cvs/r2/src/Tulip/Messaging.h 2003-12-25 12:26:35.000000000 +0100 > +++ Tulip/Messaging.h 2004-01-02 00:40:16.000000000 +0100 > @@ -49,7 +49,12 @@ > // Includes: > //----------------------------------------------------------------------------- > > -#include "Pooma/Pooma.h" > +#include "Pooma/Configuration.h" > + > +#if POOMA_MPI > +# include "Tulip/CheetahSerialize.h" > +# include > +#endif > > #if POOMA_CHEETAH > # include "Cheetah/Cheetah.h" > @@ -254,6 +259,6 @@ > // ACL:rcsinfo > // ---------------------------------------------------------------------- > // $RCSfile: Messaging.h,v $ $Author: pooma $ > -// $Revision: 1.8 $ $Date: 2003/12/25 11:26:35 $ > +// $Revision: 1.7 $ $Date: 2003/10/21 18:47:59 $ > // ---------------------------------------------------------------------- > // ACL:rcsinfo > #ifndef CHEETAH_MATCHINGHANDLER_SERIALIZE_H > #define CHEETAH_MATCHINGHANDLER_SERIALIZE_H > > //----------------------------------------------------------------------------- > // Classes: > // Cheetah > // Serialize > //----------------------------------------------------------------------------- > > //----------------------------------------------------------------------------- > // Overview: > // > // Serialize is a simple class that serializes/unserializes items to/from > // a buffer. It can be partially specialized for different types T, > // or for different general tags Tag. Provided tags are: > // > // 1. 'CHEETAH' is a simple tag type for the default case used by other parts > // of Cheetah. Objects are instantiated in place in the provided buffer. Where is number 2? > // 3. 'ARRAY' serializes arrays. API changes a little from other > // serialize tags as array length must be provided in serialize methods. 
> // Objects are instantiated in place in the provided buffer. > // > //----------------------------------------------------------------------------- > > //----------------------------------------------------------------------------- > // Include Files: > //----------------------------------------------------------------------------- > > #include > #include > > > namespace Cheetah { > > //---------------------------------------------------------------------- > // > // class Serialize > // > // Serialize is a class that can be specialized to pack and unpack > // items of type T to/from a provided buffer of bytes. It is used by > // the MatchingHandler to prepare and use data sent between MatchingHandler > // send and request calls. It has two template parameters: a tag, and a data > // type. The tag can be used to specialize to different categories of > // serialize operations; the data type indicates the type of data that > // will be packed or unpacked. > // > // Serialize specializations should define the following four static > // functions: > // > // // Return the storage needed to pack the item of type T > // static int size(const T &item); > // > // // Pack an item of type T into the given buffer. Return space used. > // static int pack(const T &item, char *buffer); > // > // // Unpack an item of type T from the given buffer. Set the given > // // pointer to point at this item. Return bytes unpacked. > // static int unpack(T* &p, char *buffer); > // > // // Delete the item pointed to by the given pointer, that was > // // unpacked with a previous call to unpack(). > // static void cleanup(T *p); > // > // There is a general template for this class that does nothing, > // one specialization for a tag 'CHEETAH'. > // > //---------------------------------------------------------------------- > > > //---------------------------------------------------------------------- > // Returns padding necessary for word alignment. > //---------------------------------------------------------------------- > static inline int padding(int size) > { > int extra = size % sizeof(void*); > return (extra == 0) ? 0 : sizeof(void*) - extra; > } > > > //---------------------------------------------------------------------- > // CHEETAH serialize specialization > //---------------------------------------------------------------------- > > // The general tag type used to specialize Serialize later. > > struct CHEETAH > { > inline CHEETAH() { } > inline ~CHEETAH() { } > }; > > > // The general template, that does nothing. > > template > class Serialize { }; > > > // A specialization for the CHEETAH tag that provides some default ability > // to pack items. > > template > class Serialize< ::Cheetah::CHEETAH, T> > { > public: > // Return the storage needed to pack the item of type T. > // For the default case, this is just sizeof(T), but perhaps rounded > // up to be pointer-word-size aligned. > > static inline int size(const T &) > { Remove the extra blank line. > > return sizeof(double) * ((sizeof(T) + sizeof(double) - 1) / sizeof(double)); > /* > const int off = sizeof(T) % sizeof(void *); > return (sizeof(T) + (off == 0 ? 0 : sizeof(void *) - off)); > */ Why have the commented out code? > } > > // Pack an item of type T into the given buffer. Return space used. > // By default, this just does a placement-new into the buffer, > // assuming the storage required is sizeof(T). 
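As a usage sketch (not from the patch), here is a round trip of a single POD value through this specialization; the buffer and the choice of double are arbitrary, and unpack()/cleanup() appear just below:

    #include "Tulip/CheetahSerialize.h"

    // Illustration only: pack a double into a raw buffer and read it back.
    void roundTripExample()
    {
      typedef Cheetah::Serialize<Cheetah::CHEETAH, double> Packer_t;

      double value = 3.14;
      char buffer[64];                           // assume suitably aligned

      int used = Packer_t::pack(value, buffer);  // placement-new into buffer

      double *p;
      used = Packer_t::unpack(p, buffer);        // p points into the buffer
      // ... use *p ...
      Packer_t::cleanup(p);                      // run the destructor (a no-op here)
      (void)used;
    }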
> > static inline int pack(const T &item, char *buffer) > { > new ((void*)buffer) T(item); > return size(item); > } > > // Unpack an item of type T from the given buffer. Set the given > // pointer to point at this item. Return bytes unpacked. > // By default, this just recasts the current buffer pointer. > > static inline int unpack(T* &p, char *buffer) > { > p = reinterpret_cast(buffer); > return size(*p); > } > > // Delete the item pointed to by the given pointer, that was > // unpacked with a previous call to unpack(). > // By default, this just runs the destructor on the data, which for > // many things will do nothing. > > static inline void cleanup(T *p) > { > p->~T(); > } > }; > > > //---------------------------------------------------------------------- > // ARRAY serialize specialization > //---------------------------------------------------------------------- > > struct ARRAY > { > inline ARRAY() { } > inline ~ARRAY() { } > }; > > > // A specialization for the POINTER tag that provides marshaling of > // arrays. > > template > class Serialize< ::Cheetah::ARRAY, T> > { > public: > > // Return the storage needed to pack count items of type T, > // This includes the bytes needed to store the size of the array. > > static inline int size(const T* items, const int& count) > { > int arraySize = count*sizeof(T); > return ( Serialize::size(count) > + arraySize + padding(arraySize) ); > } > > // Pack an item of type T into the given buffer. Return space used. > // By default, this just does a placement-new into the buffer, > // assuming the storage required is sizeof(T). > > static inline int pack(const T* items, char* buffer, const int& count) > { > int n = Serialize::pack(count, buffer); > memcpy(n+buffer, items, count*sizeof(T)); > return size(items, count); > } > > // Unpack an item of type T from the given buffer. Set the given > // pointer to point at this item. Return bytes unpacked. > > static inline int unpack(T* &p, char *buffer, int& count) > { > int* iPtr; > int n = Serialize::unpack(iPtr, buffer); > count = *iPtr; > p = reinterpret_cast(n+buffer); > return size(p, count); > } > > // Delete the item pointed to by the given pointer, that was unpacked with a > // previous call to unpack(). By default, this just runs the destructor on > // the data, which for many things will do nothing. Memory has been > // allocated from the provided buffer so no freeing of memory need be done > // here. > > static inline void cleanup(T *p) > { > p->~T(); > } > }; > > > // > // This class is used so that serialization routines can be specialized > // for either delegation (WrappedBool) or CHEETAH > // (WrappedBool). > // > > template class WrappedBool > { > public: > WrappedBool() {} > ~WrappedBool() {} > }; > > } // namespace Cheetah > > #endif // CHEETAH_MATCHINGHANDLER_SERIALIZE_H -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:39:19 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:39:19 -0800 Subject: [pooma-dev] Re: [PATCH] Back to using Cheetah::CHEETAH for serialization In-Reply-To: References: <3FF45821.8030605@codesourcery.com> Message-ID: <3FF9D987.3030104@codesourcery.com> Richard Guenther wrote: > On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>This patch is a partial reversion of a previous patch that made us use >>>Cheetah::DELEGATE serialization for RemoteProxy. 
It also brings us a >>>Cheetah::CHEETAH serialization for std::string, which was previously >>>missing. One step more for the MPI merge. >>> >>>Tested together with all other MPI changes with serial, Cheetah and MPI. >>> >>>Ok? >> >>Yes. Do we need more regression tests for this work to better ensure >>correctness? > > > Maybe, at least we get all non-POD types that are not explicitly > specialized wrong during serialization. And I can tell you, such errors > are _very_ hard to find (happened for me for std::string and RemoteProxy). > > Richard. Yes, we're running the regression tests in serial, not parallel, only so testing may be hard. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:44:34 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:44:34 -0800 Subject: [PATCH] OpenMP support In-Reply-To: References: Message-ID: <3FF9DAC2.4020005@codesourcery.com> Richard Guenther wrote: > Hi Jeffrey, > > would you please look at "[PATCH] OpenMP loop level parallelism" mail I > sent Dec23? Additionally to this patch I propose the following, which > adds a --openmp switch to configure. The 23Dec patch is fine. > Tested with gcc (with and without --openmp, which is the same here) and > Intel icpc (with and without --openmp, which makes a difference here). > > Ok? Yes. > Thanks, > > Richard. > > > 2004Jan02 Richard Guenther > > * config/arch/LINUXICC.conf: don't warn about unused #pragmas. > configure: add --openmp switch. > scripts/configure.ac: add test to detect wether -openmp works. s/wether/whether/ > scripts/configure: regenerated. > > diff -Nru a/r2/config/arch/LINUXICC.conf b/r2/config/arch/LINUXICC.conf > --- a/r2/config/arch/LINUXICC.conf Fri Jan 2 16:32:14 2004 > +++ b/r2/config/arch/LINUXICC.conf Fri Jan 2 16:32:14 2004 > @@ -170,8 +170,8 @@ > > ### debug or optimized build settings for C++ applications > > -$cppdbg_app = "-g"; > -$cppopt_app = "-DNOPAssert -DNOCTAssert -O2"; > +$cppdbg_app = "-g -wd161"; > +$cppopt_app = "-DNOPAssert -DNOCTAssert -O2 -wd161"; > > > ### debug or optimized build settings for C++ libraries > diff -Nru a/r2/configure b/r2/configure > --- a/r2/configure Fri Jan 2 16:32:14 2004 > +++ b/r2/configure Fri Jan 2 16:32:14 2004 > @@ -170,6 +170,7 @@ > $prefixnm = "--prefix"; > $serialnm = "--serial"; > $threadsnm = "--threads"; > +$openmpnm = "--openmp"; > $profilenm = "--profile"; > $insurenm = "--insure"; > $debugnm = "--debug"; > @@ -237,7 +238,8 @@ > [$finternm, "", "include fortran support libraries."], > [$nofinternm, "", "do not include the fortran libraries."], > [$serialnm, "", "configure to run serially, no parallelism."], > - [$threadsnm, "", "include threads capability, if available."], > + [$threadsnm, "", "include threads capability, if available."], > + [$openmpnm, "", "enable use of OpenMP, if available."], > [$cheetahnm, "", "enable use of CHEETAH communications package."], > [$schednm, "", "use for thread scheduling."], > [$pawsnm, "", "enable PAWS program coupling, if available."], > @@ -434,6 +436,10 @@ > $threads_include_makefile = ""; > $cpp_threads_arg = ""; > > +### include OpenMP capability? > +$openmp = 0; > +$openmpargs = ""; > + > ### if threads is used, what scheduler should be employed? 
> $scheduler = $schedulerdefault; > > @@ -1307,9 +1313,9 @@ > sub setthreads > { > # set $threads variable properly > - if (scalar @{$arghash{$threadsnm}} > 1 and scalar @{$arghash{$serialnm}}> 1) > + if (scalar @{$arghash{$threadsnm}} > 1 and (scalar @{$arghash{$serialnm}} > 1 or scalar @{$arghash{$openmpnm}} > 1)) > { > - printerror "$threadsnm and $serialnm both given. Use only one."; > + printerror "$threadsnm and $serialnm or $openmpnm given. Use only one."; > } > elsif (not $threads_able or scalar @{$arghash{$serialnm}} > 1) > { > @@ -1438,6 +1444,13 @@ > $pooma_reorder_iterates = $threads || ($scheduler eq "serialAsync"); > > add_yesno_define("POOMA_REORDER_ITERATES", $pooma_reorder_iterates); > + > + # OpenMP support > + if (scalar @{$arghash{$openmpnm}} > 1) > + { > + $openmpargs = "\@openmpargs\@"; > + } > + > } > > > @@ -1936,20 +1949,20 @@ > print FSUITE "LD = $link\n"; > print FSUITE "\n"; > print FSUITE "### flags for applications\n"; > - print FSUITE "CXX_OPT_LIB_ARGS = \@cppargs\@ $cppshare $cppopt_lib\n"; > - print FSUITE "CXX_DBG_LIB_ARGS = \@cppargs\@ $cppshare $cppdbg_lib\n"; > - print FSUITE "CXX_OPT_APP_ARGS = \@cppargs\@ $cppopt_app\n"; > - print FSUITE "CXX_DBG_APP_ARGS = \@cppargs\@ $cppdbg_app\n"; > + print FSUITE "CXX_OPT_LIB_ARGS = \@cppargs\@ $openmpargs $cppshare $cppopt_lib\n"; > + print FSUITE "CXX_DBG_LIB_ARGS = \@cppargs\@ $openmpargs $cppshare $cppdbg_lib\n"; > + print FSUITE "CXX_OPT_APP_ARGS = \@cppargs\@ $openmpargs $cppopt_app\n"; > + print FSUITE "CXX_DBG_APP_ARGS = \@cppargs\@ $openmpargs $cppdbg_app\n"; > print FSUITE "\n"; > - print FSUITE "C_OPT_LIB_ARGS = \@cargs\@ $cshare $copt_lib\n"; > - print FSUITE "C_DBG_LIB_ARGS = \@cargs\@ $cshare $cdbg_lib\n"; > - print FSUITE "C_OPT_APP_ARGS = \@cargs\@ $copt_app\n"; > - print FSUITE "C_DBG_APP_ARGS = \@cargs\@ $cdbg_app\n"; > + print FSUITE "C_OPT_LIB_ARGS = \@cargs\@ $openmpargs $cshare $copt_lib\n"; > + print FSUITE "C_DBG_LIB_ARGS = \@cargs\@ $openmpargs $cshare $cdbg_lib\n"; > + print FSUITE "C_OPT_APP_ARGS = \@cargs\@ $openmpargs $copt_app\n"; > + print FSUITE "C_DBG_APP_ARGS = \@cargs\@ $openmpargs $cdbg_app\n"; > print FSUITE "\n"; > - print FSUITE "F77_OPT_LIB_ARGS = $f77args $f77share $f77opt_lib\n"; > - print FSUITE "F77_DBG_LIB_ARGS = $f77args $f77share $f77dbg_lib\n"; > - print FSUITE "F77_OPT_APP_ARGS = $f77args $f77opt_app\n"; > - print FSUITE "F77_DBG_APP_ARGS = $f77args $f77dbg_app\n"; > + print FSUITE "F77_OPT_LIB_ARGS = $f77args $openmpargs $f77share $f77opt_lib\n"; > + print FSUITE "F77_DBG_LIB_ARGS = $f77args $openmpargs $f77share $f77dbg_lib\n"; > + print FSUITE "F77_OPT_APP_ARGS = $f77args $openmpargs $f77opt_app\n"; > + print FSUITE "F77_DBG_APP_ARGS = $f77args $openmpargs $f77dbg_app\n"; > print FSUITE "\n"; > if ($shared) { > print FSUITE "AR_OPT_ARGS = $arshareopt\n"; > diff -Nru a/r2/scripts/configure.ac b/r2/scripts/configure.ac > --- a/r2/scripts/configure.ac Fri Jan 2 16:32:14 2004 > +++ b/r2/scripts/configure.ac Fri Jan 2 16:32:14 2004 > @@ -352,6 +352,31 @@ > > > dnl > +dnl Check for compiler argument for OpenMP support > +dnl > + > +AC_MSG_CHECKING([for way to enable OpenMP support]) > +acx_saved_cxxflags=$CXXFLAGS > +CXXFLAGS="$CXXFLAGS -openmp" > +AC_TRY_LINK([ > +#include > +], [ > + double d[128]; > +#pragma omp parallel for > + for (int i=0; i<128; ++i) > + d[i] = 1.0; > + omp_get_max_threads(); > +], [ > +AC_MSG_RESULT([-openmp]) > +openmpargs="-openmp" > +], [ > +AC_MSG_RESULT([none]) > +]) > +CXXFLAGS=$acx_saved_cxxflags > +AC_SUBST(openmpargs) > + > + > +dnl > 
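For reference, a standalone version of the probe this check links (the same loop and omp_get_max_threads() call as in the AC_TRY_LINK body above); it only shows what a compiler accepting -openmp has to handle, not how POOMA itself uses OpenMP:

    #include <omp.h>

    // Standalone version of the configure.ac probe above; checks that
    // -openmp (or its equivalent) lets this compile and link.
    int main()
    {
      double d[128];
    #pragma omp parallel for
      for (int i = 0; i < 128; ++i)
        d[i] = 1.0;
      return omp_get_max_threads() > 0 ? 0 : 1;
    }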
dnl Check on how to get failure on unrecognized pragmas > dnl gcc: -Wunknown-pragmas -Werror > dnl icpc: -we161 -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Mon Jan 5 22:15:53 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 5 Jan 2004 23:15:53 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: <3FF9D783.5030504@codesourcery.com> References: <3FF9D783.5030504@codesourcery.com> Message-ID: On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > The patch was tested as usual. > > > > Ok to commit? > > I have some questions and comments interspersed below. > > > Thanks, Richard. > > > > > > 2004Jan02 Richard Guenther > > > > * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, > > add std::stack for generation tracking, add support for > > asyncronous MPI requests. > > Add an 'h' to spell asynchronous. Ok. > > +/** > > + * FIXME. > > + */ > > I am wary of adding unfinished code to the code base. At the very > least, we need a more extensive comment describing what is not finished. Oh, it's just missing documentation of struct SystemContext. I'll strip the FIXME. Ok with this change? Thanks, Richard. From rguenth at tat.physik.uni-tuebingen.de Mon Jan 5 22:20:02 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 5 Jan 2004 23:20:02 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Add MPI serializer In-Reply-To: <3FF9D92E.8060205@codesourcery.com> References: <3FF9D92E.8060205@codesourcery.com> Message-ID: On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This patch adds the serializer for MPI messaging. This is basically a > > stripped down version of Cheetahs MatchingHandler/Serialize.h. I omitted > > all traces of Cheetah::DELEGATE mechanism which we don't use. > > > > Ok? > > Please see the interspersed comments below. > > > // Serialize is a simple class that serializes/unserializes items to/from > > // a buffer. It can be partially specialized for different types T, > > // or for different general tags Tag. Provided tags are: > > // > > // 1. 'CHEETAH' is a simple tag type for the default case used by other parts > > // of Cheetah. Objects are instantiated in place in the provided buffer. > > Where is number 2? Number 2 was 'DELEGATE', which I stripped. I'll change 3 for 2. > > // 3. 'ARRAY' serializes arrays. API changes a little from other > > // serialize tags as array length must be provided in serialize methods. > > // Objects are instantiated in place in the provided buffer. > > // > > //----------------------------------------------------------------------------- > > static inline int size(const T &) > > { > Remove the extra blank line. Ok. > > > > return sizeof(double) * ((sizeof(T) + sizeof(double) - 1) / sizeof(double)); > > /* > > const int off = sizeof(T) % sizeof(void *); > > return (sizeof(T) + (off == 0 ? 0 : sizeof(void *) - off)); > > */ > > Why have the commented out code? It's work in progress, I'll remove it for now. Ok with these changes? Thanks, Richard. From oldham at codesourcery.com Mon Jan 5 22:38:46 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 14:38:46 -0800 Subject: [pooma-dev] Re: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: References: <3FF9D783.5030504@codesourcery.com> Message-ID: <3FF9E776.6030803@codesourcery.com> Richard Guenther wrote: > On Mon, 5 Jan 2004, Jeffrey D. 
Oldham wrote: > > >>Richard Guenther wrote: >> >>>The patch was tested as usual. >>> >>>Ok to commit? >> >>I have some questions and comments interspersed below. >> >> >>>Thanks, Richard. >>> >>> >>>2004Jan02 Richard Guenther >>> >>> * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, >>> add std::stack for generation tracking, add support for >>> asyncronous MPI requests. >> >>Add an 'h' to spell asynchronous. > > > Ok. > > >>>+/** >>>+ * FIXME. >>>+ */ >> >>I am wary of adding unfinished code to the code base. At the very >>least, we need a more extensive comment describing what is not finished. > > > Oh, it's just missing documentation of struct SystemContext. I'll strip > the FIXME. > > Ok with this change? I'd prefer to add some documentation, but either way it is fine. > Thanks, > > Richard. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Mon Jan 5 22:38:51 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 5 Jan 2004 23:38:51 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Back to using Cheetah::CHEETAH for serialization In-Reply-To: <3FF9D987.3030104@codesourcery.com> References: <3FF45821.8030605@codesourcery.com> <3FF9D987.3030104@codesourcery.com> Message-ID: On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > Yes, we're running the regression tests in serial, not parallel, only so > testing may be hard. With native MPI support coming along nicely this gets as easy as exchanging --serial for --mpi at configure time and doing > make check MPIRUN="mpirun -np 2" if using mpich support coming with your favorite distribution. But doing this with QMTest may be harder - I don't know. Richard. From oldham at codesourcery.com Mon Jan 5 22:39:20 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 14:39:20 -0800 Subject: [pooma-dev] Re: [PATCH] Add MPI serializer In-Reply-To: References: <3FF9D92E.8060205@codesourcery.com> Message-ID: <3FF9E798.3000005@codesourcery.com> Richard Guenther wrote: > On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>This patch adds the serializer for MPI messaging. This is basically a >>>stripped down version of Cheetahs MatchingHandler/Serialize.h. I omitted >>>all traces of Cheetah::DELEGATE mechanism which we don't use. >>> >>>Ok? >> >>Please see the interspersed comments below. >> >> >>>// Serialize is a simple class that serializes/unserializes items to/from >>>// a buffer. It can be partially specialized for different types T, >>>// or for different general tags Tag. Provided tags are: >>>// >>>// 1. 'CHEETAH' is a simple tag type for the default case used by other parts >>>// of Cheetah. Objects are instantiated in place in the provided buffer. >> >>Where is number 2? > > > Number 2 was 'DELEGATE', which I stripped. I'll change 3 for 2. > > >>>// 3. 'ARRAY' serializes arrays. API changes a little from other >>>// serialize tags as array length must be provided in serialize methods. >>>// Objects are instantiated in place in the provided buffer. >>>// >>>//----------------------------------------------------------------------------- > > >>> static inline int size(const T &) >>> { >> >>Remove the extra blank line. > > > Ok. > > >>> return sizeof(double) * ((sizeof(T) + sizeof(double) - 1) / sizeof(double)); >>> /* >>> const int off = sizeof(T) % sizeof(void *); >>> return (sizeof(T) + (off == 0 ? 0 : sizeof(void *) - off)); >>> */ >> >>Why have the commented out code? 
> > > It's work in progress, I'll remove it for now. > > Ok with these changes? Yes. > Thanks, > > Richard. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Mon Jan 5 22:59:08 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 5 Jan 2004 23:59:08 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: <3FF9D783.5030504@codesourcery.com> References: <3FF9D783.5030504@codesourcery.com> Message-ID: Whoops, I just noticed I didn't answer one of your questions: On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > > static std::list workQueue_m; > > +#if POOMA_MPI > > + static MPI_Request requests_m[1024]; > > What is this fixed constant of 1024? Does this come from the MPI standard? > > > + static std::map allocated_requests_m; Well - it's somewhat arbitrary, but with some reason. First, with mpich an MPI_Request is an integer identifier, so 1024 requests will fill just a page of memory. Second, the mpich library seems to use poll/select on distinct sockets for which 1024 seems an appropriate upper number. Third, its about the number of in-flight requests I have with my 3d CFD code (but you may see we hard-limit here and wait for requests to finish at appropriate places). So, I dont like having this magic number either, but for the MPI standard the requests need to be allocated continuously and we need _some_ limit. Once someone has a problem with this we could make it configurable, but I dont see the point at the moment. Thus, ok again? Thanks, Richard. From rguenth at tat.physik.uni-tuebingen.de Tue Jan 6 14:07:50 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Tue, 6 Jan 2004 15:07:50 +0100 (CET) Subject: [PATCH] Fix compilation problems Message-ID: Hi! I applied the patch below as obvious which restores compilation. Richard. 2004Jan06 Richard Guenther * src/Tulip/PatchSizeSyncer.cmpl.cpp: fix missing #include. Index: PatchSizeSyncer.cmpl.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Tulip/PatchSizeSyncer.cmpl.cpp,v retrieving revision 1.7 retrieving revision 1.8 diff -u -u -r1.7 -r1.8 --- PatchSizeSyncer.cmpl.cpp 25 Dec 2003 11:26:35 -0000 1.7 +++ PatchSizeSyncer.cmpl.cpp 6 Jan 2004 14:03:25 -0000 1.8 @@ -34,7 +34,9 @@ // Includes: //----------------------------------------------------------------------------- +#include "Pooma/Configuration.h" #include "Tulip/Messaging.h" +#include "Pooma/Pooma.h" #include "Tulip/PatchSizeSyncer.h" #include "Tulip/RemoteProxy.h" #include "Tulip/CollectFromContexts.h" From oldham at codesourcery.com Tue Jan 6 18:36:23 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 06 Jan 2004 10:36:23 -0800 Subject: [pooma-dev] Re: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: References: <3FF9D783.5030504@codesourcery.com> Message-ID: <3FFB0027.6080509@codesourcery.com> Richard Guenther wrote: > Whoops, I just noticed I didn't answer one of your questions: > > On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > > >>> static std::list workQueue_m; >>>+#if POOMA_MPI >>>+ static MPI_Request requests_m[1024]; >> >>What is this fixed constant of 1024? Does this come from the MPI standard? >> >> >>>+ static std::map allocated_requests_m; > > > Well - it's somewhat arbitrary, but with some reason. First, with mpich > an MPI_Request is an integer identifier, so 1024 requests will fill just a > page of memory. 
Second, the mpich library seems to use poll/select on > distinct sockets for which 1024 seems an appropriate upper number. Third, > its about the number of in-flight requests I have with my 3d CFD code (but > you may see we hard-limit here and wait for requests to finish at > appropriate places). > > So, I dont like having this magic number either, but for the MPI standard > the requests need to be allocated continuously and we need _some_ limit. > Once someone has a problem with this we could make it configurable, but I > dont see the point at the moment. > > Thus, ok again? Let's move the magic constant into a const variable instead of having the constant scattered throughout the code. Then, please commit. Thanks. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Tue Jan 6 19:58:33 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Tue, 6 Jan 2004 20:58:33 +0100 (CET) Subject: [pooma-dev] [PATCH] MPI support for SerialAsync scheduler In-Reply-To: <3FFB0027.6080509@codesourcery.com> References: <3FF9D783.5030504@codesourcery.com> <3FFB0027.6080509@codesourcery.com> Message-ID: On Tue, 6 Jan 2004, Jeffrey D. Oldham wrote: > Let's move the magic constant into a const variable instead of having > the constant scattered throughout the code. Then, please commit. Thanks. For the record, this is what I committed. It passes builds for both --serial and --mpi for me. Richard. 2004Jan06 Richard Guenther * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, add std::stack for generation tracking, add support for asyncronous MPI requests. src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp: define new static variables. src/Threads/IterateSchedulers/Runnable.h: declare add(). src/Pooma/Pooma.cmpl.cpp: use SystemContext::max_requests constant. Index: Pooma/Pooma.cmpl.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Pooma/Pooma.cmpl.cpp,v retrieving revision 1.40 diff -u -u -r1.40 Pooma.cmpl.cpp --- Pooma/Pooma.cmpl.cpp 5 Jan 2004 22:34:33 -0000 1.40 +++ Pooma/Pooma.cmpl.cpp 6 Jan 2004 19:52:47 -0000 @@ -354,8 +354,7 @@ #if POOMA_MPI MPI_Comm_rank(MPI_COMM_WORLD, &myContext_g); MPI_Comm_size(MPI_COMM_WORLD, &numContexts_g); - // ugh... - for (int i=0; i SystemContext::workQueueMessages_m; std::list SystemContext::workQueue_m; +#if POOMA_MPI + const int SystemContext::max_requests; + MPI_Request SystemContext::requests_m[SystemContext::max_requests]; + std::map SystemContext::allocated_requests_m; + std::set SystemContext::free_requests_m; +#endif +std::stack IterateScheduler::generationStack_m; } Index: Threads/IterateSchedulers/SerialAsync.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Threads/IterateSchedulers/SerialAsync.h,v retrieving revision 1.9 diff -u -u -r1.9 SerialAsync.h --- Threads/IterateSchedulers/SerialAsync.h 8 Jun 2000 22:16:50 -0000 1.9 +++ Threads/IterateSchedulers/SerialAsync.h 6 Jan 2004 19:52:48 -0000 @@ -42,48 +42,38 @@ // DataObject //----------------------------------------------------------------------------- -#include - #ifndef _SerialAsync_h_ #define _SerialAsync_h_ -/* -LIBRARY: - SerialAsync - -CLASSES: IterateScheduler - -CLASSES: DataObject - -CLASSES: Iterate - -OVERVIEW - SerialAsync IterateScheduler is a policy template to create a - dependence graphs and executes the graph respecting the - dependencies without using threads. 
There is no parallelism, - but Iterates may be executed out-of-order with respect to the - program text. - ------------------------------------------------------------------------------*/ - -////////////////////////////////////////////////////////////////////// -//----------------------------------------------------------------------------- -// Overview: -// Smarts classes for times when you want no threads but you do want -// dataflow evaluation. -//----------------------------------------------------------------------------- - -//----------------------------------------------------------------------------- -// Typedefs: -//----------------------------------------------------------------------------- +/** @file + * @ingroup IterateSchedulers + * @brief + * Smarts classes for times when you want no threads but you do want + * dataflow evaluation. + * + * SerialAsync IterateScheduler is a policy template to create a + * dependence graphs and executes the graph respecting the + * dependencies without using threads. + * There is no (thread level) parallelism, but Iterates may be executed + * out-of-order with respect to the program text. Also this scheduler is + * used for message based parallelism in which case asyncronous execution + * leads to reduced communication latencies. + */ //----------------------------------------------------------------------------- // Includes: //----------------------------------------------------------------------------- #include +#include +#include +#include +#include +#include #include "Threads/IterateSchedulers/IterateScheduler.h" #include "Threads/IterateSchedulers/Runnable.h" +#include "Tulip/Messaging.h" +#include "Utilities/PAssert.h" //----------------------------------------------------------------------------- // Forward Declarations: @@ -94,76 +84,258 @@ namespace Smarts { -#define MYID 0 -#define MAX_CPUS 1 -// -// Tag class for specializing IterateScheduler, Iterate and DataObject. -// +/** + * Tag class for specializing IterateScheduler, Iterate and DataObject. + */ + struct SerialAsync { - enum Action { Read, Write}; + enum Action { Read, Write }; }; -//----------------------------------------------------------------------------- +/** + * Iterate is used to implement the SerialAsync + * scheduling policy. + * + * An Iterate is a non-blocking unit of concurrency that is used + * to describe a chunk of work. It inherits from the Runnable + * class and as all subclasses of Runnable, the user specializes + * the run() method to specify the operation. + * Iterate is a further specialization of the + * Iterate class to use the SerialAsync Scheduling algorithm to + * generate the data dependency graph for a data-driven + * execution. + */ + +template<> +class Iterate : public Runnable +{ + friend class IterateScheduler; + friend class DataObject; + +public: + + typedef DataObject DataObject_t; + typedef IterateScheduler IterateScheduler_t; + + + /// The Constructor for this class takes the IterateScheduler and a + /// CPU affinity. CPU affinity has a default value of -1 which means + /// it may run on any CPU available. + + inline Iterate(IterateScheduler & s, int affinity=-1) + : scheduler_m(s), notifications_m(1), generation_m(-1), togo_m(1) + {} + + /// The dtor is virtual because the subclasses will need to add to it. + + virtual ~Iterate() {} + + /// The run method does the core work of the Iterate. + /// It is supplied by the subclass. + + virtual void run() = 0; + + //@name Stubs for the affinities + /// There is no such thing in serial. 
+ //@{ + + inline int affinity() const {return 0;} + + inline int hintAffinity() const {return 0;} + + inline void affinity(int) {} + + inline void hintAffinity(int) {} + + //@} + + /// Notify is used to indicate to the Iterate that one of the data + /// objects it had requested has been granted. To do this, we dec a + /// dependence counter which, if equal to 0, the Iterate is ready for + /// execution. + + void notify() + { + if (--notifications_m == 0) + add(this); + } + + /// How many notifications remain? + + int notifications() const { return notifications_m; } + + void addNotification() { notifications_m++; } + + int& generation() { return generation_m; } + + int& togo() { return togo_m; } + +protected: + + /// What scheduler are we working with? + IterateScheduler &scheduler_m; + + /// How many notifications should we receive before we can run? + int notifications_m; + + /// Which generation we were issued in. + int generation_m; + + /// How many times we need to go past a "did something" to be ready + /// for destruction? + int togo_m; + +}; + struct SystemContext { void addNCpus(int) {} void wait() {} void concurrency(int){} - int concurrency() {return 1;} + int concurrency() { return 1; } void mustRunOn() {} // We have a separate message queue because they are // higher priority. + typedef Iterate *IteratePtr_t; static std::list workQueueMessages_m; static std::list workQueue_m; +#if POOMA_MPI + static const int max_requests = 1024; + static MPI_Request requests_m[max_requests]; + static std::map allocated_requests_m; + static std::set free_requests_m; +#endif + + +#if POOMA_MPI + + /// Query, if we have lots of MPI_Request slots available - /////////////////////////// - // This function lets you check if there are iterates that are - // ready to run. - inline static - bool workReady() + static bool haveLotsOfMPIRequests() { - return !(workQueue_m.empty() && workQueueMessages_m.empty()); + return free_requests_m.size() > max_requests/2; } - /////////////////////////// - // Run an iterate if one is ready. - inline static - void runSomething() + /// Get a MPI_Request slot, associated with an iterate + + static MPI_Request* getMPIRequest(IteratePtr_t p) { - if (!workQueueMessages_m.empty()) - { - // Get the top iterate. - // Delete it from the queue. - // Run the iterate. - // Delete the iterate. This could put more iterates in the queue. 
+ PInsist(!free_requests_m.empty(), "No free MPIRequest slots."); + int i = *free_requests_m.begin(); + free_requests_m.erase(free_requests_m.begin()); + allocated_requests_m[i] = p; + p->togo()++; + return &requests_m[i]; + } - RunnablePtr_t p = workQueueMessages_m.front(); - workQueueMessages_m.pop_front(); - p->execute(); + static void releaseMPIRequest(int i) + { + IteratePtr_t p = allocated_requests_m[i]; + allocated_requests_m.erase(i); + free_requests_m.insert(i); + if (--(p->togo()) == 0) delete p; - } + } + + static bool waitForSomeRequests(bool mayBlock) + { + if (allocated_requests_m.empty()) + return false; + + int last_used_request = allocated_requests_m.rbegin()->first; + int finished[last_used_request+1]; + MPI_Status statuses[last_used_request+1]; + int nr_finished; + int res; + if (mayBlock) + res = MPI_Waitsome(last_used_request+1, requests_m, + &nr_finished, finished, statuses); else - { - if (!workQueue_m.empty()) - { - RunnablePtr_t p = workQueue_m.front(); - workQueue_m.pop_front(); - p->execute(); - delete p; + res = MPI_Testsome(last_used_request+1, requests_m, + &nr_finished, finished, statuses); + PAssert(res == MPI_SUCCESS || res == MPI_ERR_IN_STATUS); + if (nr_finished == MPI_UNDEFINED) + return false; + + // release finised requests + while (nr_finished--) { + if (res == MPI_ERR_IN_STATUS) { + if (statuses[nr_finished].MPI_ERROR != MPI_SUCCESS) { + char msg[MPI_MAX_ERROR_STRING+1]; + int len; + MPI_Error_string(statuses[nr_finished].MPI_ERROR, msg, &len); + msg[len] = '\0'; + PInsist(0, msg); + } } + releaseMPIRequest(finished[nr_finished]); } + return true; + } + +#else + + static bool waitForSomeRequests(bool mayBlock) + { + return false; + } + +#endif + + + /// This function lets you check if there are iterates that are + /// ready to run. + + static bool workReady() + { + return !(workQueue_m.empty() + && workQueueMessages_m.empty() +#if POOMA_MPI + && allocated_requests_m.empty() +#endif + ); + } + + /// Run an iterate if one is ready. Returns if progress + /// was made. + + static bool runSomething(bool mayBlock = true) + { + // do work in this order to minimize communication latency: + // - issue all messages + // - do some regular work + // - wait for messages to complete + + RunnablePtr_t p = NULL; + if (!workQueueMessages_m.empty()) { + p = workQueueMessages_m.front(); + workQueueMessages_m.pop_front(); + } else if (!workQueue_m.empty()) { + p = workQueue_m.front(); + workQueue_m.pop_front(); + } + + if (p) { + p->execute(); + Iterate *it = dynamic_cast(p); + if (it) { + if (--(it->togo()) == 0) + delete it; + } else + delete p; + return true; + + } else + return waitForSomeRequests(mayBlock); } }; -inline void addRunnable(RunnablePtr_t rn) -{ - SystemContext::workQueue_m.push_front(rn); -} +/// Adds a runnable to the appropriate work-queue. inline void add(RunnablePtr_t rn) { @@ -182,25 +354,18 @@ inline void wait() {} inline void mustRunOn(){} -/*------------------------------------------------------------------------ -CLASS - IterateScheduler_Serial_Async - - Implements a asynchronous scheduler for a data driven execution. - Specializes a IterateScheduler. - -KEYWORDS - Data-parallelism, Native-interface, IterateScheduler. - -DESCRIPTION - - The SerialAsync IterateScheduler, Iterate and DataObject - implement a SMARTS scheduler that does dataflow without threads. 
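As an illustration of how these request slots are meant to be used (this sketch is not part of the patch; buffer, count, dest and tag are hypothetical message parameters), a messaging Iterate pairs getMPIRequest() with a non-blocking MPI call inside its run() method:

    // inside Iterate<SerialAsync>::run() of a hypothetical send iterate
    MPI_Request *req = SystemContext::getMPIRequest(this);  // bumps this->togo()
    MPI_Isend(buffer, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD, req);
    // later, waitForSomeRequests() notices completion, calls releaseMPIRequest()
    // and deletes the iterate once its togo() count drops back to zero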
- What that means is that when you hand iterates to the - IterateScheduler it stores them up until you call - IterateScheduler::blockingEvaluate(), at which point it evaluates - iterates until the queue is empty. ------------------------------------------------------------------------------*/ + +/** + * Implements a asynchronous scheduler for a data driven execution. + * Specializes a IterateScheduler. + * + * The SerialAsync IterateScheduler, Iterate and DataObject + * implement a SMARTS scheduler that does dataflow without threads. + * What that means is that when you hand iterates to the + * IterateScheduler it stores them up until you call + * IterateScheduler::blockingEvaluate(), at which point it evaluates + * iterates until the queue is empty. + */ template<> class IterateScheduler @@ -212,196 +377,128 @@ typedef DataObject DataObject_t; typedef Iterate Iterate_t; - /////////////////////////// - // Constructor - // - IterateScheduler() {} - - /////////////////////////// - // Destructor - // - ~IterateScheduler() {} - void setConcurrency(int) {} - - //--------------------------------------------------------------------------- - // Mutators. - //--------------------------------------------------------------------------- - - /////////////////////////// - // Tells the scheduler that the parser thread is starting a new - // data-parallel statement. Any Iterate that is handed off to the - // scheduler between beginGeneration() and endGeneration() belongs - // to the same data-paralllel statement and therefore has the same - // generation number. - // - inline void beginGeneration() { } - - /////////////////////////// - // Tells the scheduler that no more Iterates will be handed off for - // the data parallel statement that was begun with a - // beginGeneration(). - // - inline void endGeneration() {} - - /////////////////////////// - // The parser thread calls this method to evaluate the generated - // graph until all the nodes in the dependence graph has been - // executed by the scheduler. That is to say, the scheduler - // executes all the Iterates that has been handed off to it by the - // parser thread. - // - inline - void blockingEvaluate(); - - /////////////////////////// - // The parser thread calls this method to ask the scheduler to run - // the given Iterate when the dependence on that Iterate has been - // satisfied. - // - inline void handOff(Iterate* it); + IterateScheduler() + : generation_m(0) + {} - inline - void releaseIterates() { } + ~IterateScheduler() {} -protected: -private: + void setConcurrency(int) {} - typedef std::list Container_t; - typedef Container_t::iterator Iterator_t; + /// Tells the scheduler that the parser thread is starting a new + /// data-parallel statement. Any Iterate that is handed off to the + /// scheduler between beginGeneration() and endGeneration() belongs + /// to the same data-paralllel statement and therefore has the same + /// generation number. + /// Nested invocations are handled as being part of the outermost + /// generation. -}; + void beginGeneration() + { + // Ensure proper overflow behavior. + if (++generation_m < 0) + generation_m = 0; + generationStack_m.push(generation_m); + } -//----------------------------------------------------------------------------- + /// Tells the scheduler that no more Iterates will be handed off for + /// the data parallel statement that was begun with a + /// beginGeneration(). 
-/*------------------------------------------------------------------------ -CLASS - Iterate_SerialAsync - - Iterate is used to implement the SerialAsync - scheduling policy. - -KEYWORDS - Data_Parallelism, Native_Interface, IterateScheduler, Data_Flow. - -DESCRIPTION - An Iterate is a non-blocking unit of concurrency that is used - to describe a chunk of work. It inherits from the Runnable - class and as all subclasses of Runnable, the user specializes - the run() method to specify the operation. - Iterate is a further specialization of the - Iterate class to use the SerialAsync Scheduling algorithm to - generate the data dependency graph for a data-driven - execution. */ + void endGeneration() + { + PAssert(inGeneration()); + generationStack_m.pop(); -template<> -class Iterate : public Runnable -{ - friend class IterateScheduler; - friend class DataObject; +#if POOMA_MPI + // this is a safe point to block until we have "lots" of MPI Requests + if (!inGeneration()) + while (!SystemContext::haveLotsOfMPIRequests()) + SystemContext::runSomething(true); +#endif + } -public: + /// Wether we are inside a generation and may not safely block. - typedef DataObject DataObject_t; - typedef IterateScheduler IterateScheduler_t; + bool inGeneration() const + { + return !generationStack_m.empty(); + } + /// What the current generation is. - /////////////////////////// - // The Constructor for this class takes the IterateScheduler and a - // CPU affinity. CPU affinity has a default value of -1 which means - // it may run on any CPU available. - // - inline Iterate(IterateScheduler & s, int affinity=-1); - - /////////////////////////// - // The dtor is virtual because the subclasses will need to add to it. - // - virtual ~Iterate() {} + int generation() const + { + if (!inGeneration()) + return -1; + return generationStack_m.top(); + } - /////////////////////////// - // The run method does the core work of the Iterate. - // It is supplied by the subclass. - // - virtual void run() = 0; + /// The parser thread calls this method to evaluate the generated + /// graph until all the nodes in the dependence graph has been + /// executed by the scheduler. That is to say, the scheduler + /// executes all the Iterates that has been handed off to it by the + /// parser thread. - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline int affinity() const {return 0;} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline int hintAffinity() const {return 0;} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline void affinity(int) {} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline void hintAffinity(int) {} + void blockingEvaluate() + { + if (inGeneration()) { + // It's not safe to block inside a generation, so + // just do as much as we can without blocking. + while (SystemContext::runSomething(false)) + ; + + } else { + // Loop as long as there is anything in the queue. + while (SystemContext::workReady()) + SystemContext::runSomething(true); + } + } - /////////////////////////// - // Notify is used to indicate to the Iterate that one of the data - // objects it had requested has been granted. To do this, we dec a - // dependence counter which, if equal to 0, the Iterate is ready for - // execution. 
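For illustration only (this sketch is not part of the patch; the per-patch iterate creation is elided), the generation bracketing above is driven by the evaluator roughly like this:

    Smarts::IterateScheduler<Smarts::SerialAsync> scheduler;

    scheduler.beginGeneration();   // one data-parallel statement begins
    // ... handOff() one Iterate per patch; each carries this generation number ...
    scheduler.endGeneration();     // leaving the outermost generation may run queued
                                   // work until enough MPI request slots are free

    scheduler.blockingEvaluate();  // outside a generation: drain the queues, and
                                   // block in MPI_Waitsome when otherwise idle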
- // - inline void notify(); - - /////////////////////////// - // How many notifications remain? - // - inline - int notifications() const { return notifications_m; } + /// The parser thread calls this method to ask the scheduler to run + /// the given Iterate when the dependence on that Iterate has been + /// satisfied. - inline void addNotification() + void handOff(Iterate* it) { - notifications_m++; + // No action needs to be taken here. Iterates will make their + // own way into the execution queue. + it->generation() = generation(); + it->notify(); } -protected: + void releaseIterates() { } - // What scheduler are we working with? - IterateScheduler &scheduler_m; +private: - // How many notifications should we receive before we can run? - int notifications_m; + typedef std::list Container_t; + typedef Container_t::iterator Iterator_t; -private: - // Set notifications dynamically and automatically every time a - // request is made by the iterate - void incr_notifications() { notifications_m++;} + static std::stack generationStack_m; + int generation_m; }; -//----------------------------------------------------------------------------- - -/*------------------------------------------------------------------------ -CLASS - DataObject_SerialAsync - - Implements a asynchronous scheduler for a data driven execution. -KEYWORDS - Data-parallelism, Native-interface, IterateScheduler. - -DESCRIPTION - The DataObject Class is used introduce a type to represent - a resources (normally) blocks of data) that Iterates contend - for atomic access. Iterates make request for either a read or - write to the DataObjects. DataObjects may grant the request if - the object is currently available. Otherwise, the request is - enqueue in a queue private to the data object until the - DataObject is release by another Iterate. A set of read - requests may be granted all at once if there are no - intervening write request to that DataObject. - DataObject is a specialization of DataObject for - the policy template SerialAsync. -*/ +/** + * Implements a asynchronous scheduler for a data driven execution. + * + * The DataObject Class is used introduce a type to represent + * a resources (normally) blocks of data) that Iterates contend + * for atomic access. Iterates make request for either a read or + * write to the DataObjects. DataObjects may grant the request if + * the object is currently available. Otherwise, the request is + * enqueue in a queue private to the data object until the + * DataObject is release by another Iterate. A set of read + * requests may be granted all at once if there are no + * intervening write request to that DataObject. + * DataObject is a specialization of DataObject for + * the policy template SerialAsync. + * + * There are two ways data can be used: to read or to write. + * Don't change this to give more than two states; + * things inside depend on that. + */ template<> class DataObject @@ -413,54 +510,56 @@ typedef IterateScheduler IterateScheduler_t; typedef Iterate Iterate_t; - // There are two ways data can be used: to read or to write. - // Don't change this to give more than two states: - // things inside depend on that. - - /////////////////////////// - // Construct the data object with an empty set of requests - // and the given affinity. - // - inline DataObject(int affinity=-1); + + /// Construct the data object with an empty set of requests + /// and the given affinity. 
+ + DataObject(int affinity=-1) + : released_m(queue_m.end()), notifications_m(0) + { + // released_m to the end of the queue (which should) also be the + // beginning. notifications_m to zero, since nothing has been + // released yet. + } - /////////////////////////// - // for compatibility with other SMARTS schedulers, accept - // Scheduler arguments (unused) - // - inline - DataObject(int affinity, IterateScheduler&); - - /////////////////////////// - // Stub out affinity because there is no affinity in serial. - // - inline int affinity() const { return 0; } - - /////////////////////////// - // Stub out affinity because there is no affinity in serial. - // - inline void affinity(int) {} + /// for compatibility with other SMARTS schedulers, accept + /// Scheduler arguments (unused) - /////////////////////////// - // An iterate makes a request for a certain action in a certain - // generation. - // - inline - void request(Iterate&, SerialAsync::Action); - - /////////////////////////// - // An iterate finishes and tells the DataObject it no longer needs - // it. If this is the last release for the current set of - // requests, have the IterateScheduler release some more. - // - inline void release(SerialAsync::Action); + inline DataObject(int affinity, IterateScheduler&) + : released_m(queue_m.end()), notifications_m(0) + {} + + /// Stub out affinity because there is no affinity in serial. + + int affinity() const { return 0; } + + /// Stub out affinity because there is no affinity in serial. + + void affinity(int) {} + + /// An iterate makes a request for a certain action in a certain + /// generation. + + inline void request(Iterate&, SerialAsync::Action); + + /// An iterate finishes and tells the DataObject it no longer needs + /// it. If this is the last release for the current set of + /// requests, have the IterateScheduler release some more. + + void release(SerialAsync::Action) + { + if (--notifications_m == 0) + releaseIterates(); + } -protected: private: - // If release needs to let more iterates go, it calls this. + /// If release needs to let more iterates go, it calls this. inline void releaseIterates(); - // The type for a request. + /** + * The type for a request. + */ class Request { public: @@ -475,135 +574,27 @@ SerialAsync::Action act_m; }; - // The type of the queue and iterator. + /// The type of the queue and iterator. typedef std::list Container_t; typedef Container_t::iterator Iterator_t; - // The list of requests from various iterates. - // They're granted in FIFO order. + /// The list of requests from various iterates. + /// They're granted in FIFO order. Container_t queue_m; - // Pointer to the last request that has been granted. + /// Pointer to the last request that has been granted. Iterator_t released_m; - // The number of outstanding notifications. + /// The number of outstanding notifications. int notifications_m; }; -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for -// IterateScheduler -// -////////////////////////////////////////////////////////////////////// - -// -// IterateScheduler::handOff(Iterate*) -// No action needs to be taken here. Iterates will make their -// own way into the execution queue. -// +/// void DataObject::releaseIterates(SerialAsync::Action) +/// When the last released iterate dies, we need to +/// look at the beginning of the queue and tell more iterates +/// that they can access this data. 
inline void -IterateScheduler::handOff(Iterate* it) -{ - it->notify(); -} - -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for Iterate -// -////////////////////////////////////////////////////////////////////// - -// -// Iterate::Iterate -// Construct with the scheduler and the number of notifications. -// Ignore the affinity. -// - -inline -Iterate::Iterate(IterateScheduler& s, int) -: scheduler_m(s), notifications_m(1) -{ -} - -// -// Iterate::notify -// Notify the iterate that a DataObject is ready. -// Decrement the counter, and if it is zero, alert the scheduler. -// - -inline void -Iterate::notify() -{ - if ( --notifications_m == 0 ) - { - add(this); - } -} - -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for DataObject -// -////////////////////////////////////////////////////////////////////// - -// -// DataObject::DataObject() -// Initialize: -// released_m to the end of the queue (which should) also be the -// beginning. notifications_m to zero, since nothing has been -// released yet. -// - -inline -DataObject::DataObject(int) -: released_m(queue_m.end()), notifications_m(0) -{ -} - -// -// void DataObject::release(Action) -// An iterate has finished and is telling the DataObject that -// it is no longer needed. -// - -inline void -DataObject::release(SerialAsync::Action) -{ - if ( --notifications_m == 0 ) - releaseIterates(); -} - - - -//----------------------------------------------------------------------------- -// -// void IterateScheduler::blockingEvaluate -// Evaluate all the iterates in the queue. -// -//----------------------------------------------------------------------------- -inline -void -IterateScheduler::blockingEvaluate() -{ - // Loop as long as there is anything in the queue. - while (SystemContext::workReady()) - { - SystemContext::runSomething(); - } -} - -//----------------------------------------------------------------------------- -// -// void DataObject::releaseIterates(SerialAsync::Action) -// When the last released iterate dies, we need to -// look at the beginning of the queue and tell more iterates -// that they can access this data. -// -//----------------------------------------------------------------------------- -inline -void DataObject::releaseIterates() { // Get rid of the reservations that have finished. @@ -622,14 +613,17 @@ released_m->iterate().notify(); ++notifications_m; - // Record what action that one will take. + // Record what action that one will take + // and record its generation number SerialAsync::Action act = released_m->act(); + int generation = released_m->iterate().generation(); // Look at the next iterate. ++released_m; // If the first one was a read, release more. if ( act == SerialAsync::Read ) + { // As long as we aren't at the end and we have more reads... while ((released_m != end) && @@ -642,29 +636,30 @@ // And go on to the next. ++released_m; } + + } + } } +/// void DataObject::request(Iterate&, action) +/// An iterate makes a reservation with this DataObject for a given +/// action in a given generation. The request may be granted +/// immediately. -// -// void DataObject::request(Iterate&, action) -// An iterate makes a reservation with this DataObject for a given -// action in a given generation. The request may be granted -// immediately. 
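As a rough sketch (not part of the patch; ReadIterate is a hypothetical Iterate subclass and scheduler an IterateScheduler<SerialAsync>), the reservation protocol implemented by request() and release() is used like this:

    DataObject<SerialAsync> field;          // a resource iterates contend for
    ReadIterate it(scheduler);              // hypothetical Iterate subclass
    field.request(it, SerialAsync::Read);   // queued, or granted at once via it.notify()
    // ... the iterate runs once all of its requests have been granted ...
    field.release(SerialAsync::Read);       // lets the DataObject grant queued requests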
-// -inline -void +inline void DataObject::request(Iterate& it, SerialAsync::Action act) { // The request can be granted immediately if: // The queue is currently empty, or - // The request is a read and everything in the queue is a read. + // the request is a read and everything in the queue is a read, + // or (with relaxed conditions), everything is the same generation. // Set notifications dynamically and automatically // every time a request is made by the iterate - it.incr_notifications(); + it.notifications_m++; bool allReleased = (queue_m.end() == released_m); bool releasable = queue_m.empty() || @@ -691,17 +686,11 @@ } -//---------------------------------------------------------------------- - - -// -// End of Smarts namespace. -// -} +} // namespace Smarts ////////////////////////////////////////////////////////////////////// -#endif // POOMA_PACKAGE_CLASS_H +#endif // _SerialAsync_h_ /*************************************************************************** * $RCSfile: SerialAsync.h,v $ $Author: sa_smith $ From oldham at codesourcery.com Tue Jan 6 20:10:59 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 06 Jan 2004 12:10:59 -0800 Subject: [pooma-dev] [PATCH] MPI support for SerialAsync scheduler In-Reply-To: References: <3FF9D783.5030504@codesourcery.com> <3FFB0027.6080509@codesourcery.com> Message-ID: <3FFB1653.7040302@codesourcery.com> Richard Guenther wrote: > On Tue, 6 Jan 2004, Jeffrey D. Oldham wrote: > > >>Let's move the magic constant into a const variable instead of having >>the constant scattered throughout the code. Then, please commit. Thanks. > > > For the record, this is what I committed. It passes builds for both > --serial and --mpi for me. > > Richard. > > > 2004Jan06 Richard Guenther > > * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, > add std::stack for generation tracking, add support for > asyncronous MPI requests. > src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp: define > new static variables. > src/Threads/IterateSchedulers/Runnable.h: declare add(). > src/Pooma/Pooma.cmpl.cpp: use SystemContext::max_requests > constant. > > Index: Threads/IterateSchedulers/SerialAsync.cmpl.cpp > =================================================================== > RCS file: /home/pooma/Repository/r2/src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp,v > retrieving revision 1.3 > diff -u -u -r1.3 SerialAsync.cmpl.cpp > --- Threads/IterateSchedulers/SerialAsync.cmpl.cpp 12 Apr 2000 00:08:06 -0000 1.3 > +++ Threads/IterateSchedulers/SerialAsync.cmpl.cpp 6 Jan 2004 19:52:47 -0000 > @@ -82,6 +82,13 @@ > > std::list SystemContext::workQueueMessages_m; > std::list SystemContext::workQueue_m; > +#if POOMA_MPI > + const int SystemContext::max_requests; > + MPI_Request SystemContext::requests_m[SystemContext::max_requests]; Thank you. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Tue Jan 6 22:50:46 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 06 Jan 2004 14:50:46 -0800 Subject: Patch: Fix Compilation of Messaging.cmpl.cpp Message-ID: <3FFB3BC6.7080403@codesourcery.com> The attached patch, approved by Richard Guenther, ensures src/Tulip/Messaging.cmpl.cpp can be compiled. The source code uses 'Pooma::context', so 'Pooma::Pooma.h' must be included. Tested under LINUX gcc and G++ 3.4. 2004-01-06 Jeffrey D. Oldham * Messaging.cmpl.cpp: Include "Pooma.h" so "Pooma::context" is declared. -- Jeffrey D. 
Oldham oldham at codesourcery.com -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: msg.06Jan.13.9.patch URL: From rguenth at tat.physik.uni-tuebingen.de Wed Jan 7 12:55:32 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 7 Jan 2004 13:55:32 +0100 (CET) Subject: [PATCH] Extend/fix some testcases for MPI Message-ID: Hi! This patch fixes some testcases for MPI operation and extends array_test29 to check for individual guard updates (for the optimization patches still pending). Ok? Richard. 2004Jan07 Richard Guenther * src/Array/tests/array_test29: systematically check (partial) guard updates. src/Field/tests/BasicTest3.cpp: use ReplicatedTag, not DistributedTag. src/Layout/tests/dynamiclayout_test1.cpp: #define BARRIER, as in other tests. src/Layout/tests/dynamiclayout_test2.cpp: likewise. src/Tulip/tests/ReduceOverContextsTest.cpp: #include Pooma/Pooma.h. diff -Nru a/r2/src/Array/tests/array_test29.cpp b/r2/src/Array/tests/array_test29.cpp --- a/r2/src/Array/tests/array_test29.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Array/tests/array_test29.cpp Wed Jan 7 13:47:20 2004 @@ -33,17 +33,168 @@ #include "Pooma/Pooma.h" #include "Utilities/Tester.h" -#include "Domain/Loc.h" -#include "Domain/Interval.h" -#include "Partition/UniformGridPartition.h" -#include "Layout/UniformGridLayout.h" -#include "Engine/BrickEngine.h" -#include "Engine/CompressibleBrick.h" -#include "Engine/MultiPatchEngine.h" -#include "Engine/RemoteEngine.h" -#include "Array/Array.h" +#include "Pooma/Arrays.h" #include "Tiny/Vector.h" + +void checks1(Pooma::Tester& tester) +{ + Interval<2> I(9, 9); + Loc<2> blocks(3, 3); + UniformGridPartition<2> partition(blocks, GuardLayers<2>(1)); + UniformGridLayout<2> layout(I, partition, DistributedTag()); + DomainLayout<2> layout2(I, GuardLayers<2>(1)); + + Array<2, int, MultiPatch > > + am(layout), bm(layout); + Array<2, int, Brick> + al(layout2), bl(layout2); + + am = 2; + al = 2; + am(I) = 1; + al(I) = 1; + bm = am; + bl = al; + + bm(I) += am(I-Loc<2>(1, 0)); + bl(I) += al(I-Loc<2>(1, 0)); + tester.check("left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + am(I) += bm(I-Loc<2>(0, 1)); + al(I) += bl(I-Loc<2>(0, 1)); + tester.check("upper guards", all(am(I) == al(I))); + if (!tester.ok()) { + tester.out() << al << am << std::endl; + return; + } + bm(I) += am(I-Loc<2>(-1, 0)); + bl(I) += al(I-Loc<2>(-1, 0)); + tester.check("right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + am(I) += bm(I-Loc<2>(0, -1)); + al(I) += bl(I-Loc<2>(0, -1)); + tester.check("lower guards", all(am(I) == al(I))); + if (!tester.ok()) { + tester.out() << al << am << std::endl; + return; + } + + bm(I) += am(I-Loc<2>(1, 1)); + bl(I) += al(I-Loc<2>(1, 1)); + tester.check("upper left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + am(I) += bm(I-Loc<2>(1, -1)); + al(I) += bl(I-Loc<2>(1, -1)); + tester.check("lower left guards", all(am(I) == al(I))); + if (!tester.ok()) { + tester.out() << al << am << std::endl; + return; + } + bm(I) += am(I-Loc<2>(-1, 1)); + bl(I) += al(I-Loc<2>(-1, 1)); + tester.check("upper right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + am(I) += bm(I-Loc<2>(-1, -1)); + al(I) += bl(I-Loc<2>(-1, -1)); + tester.check("lower right guards", all(am(I) == al(I))); + if 
(!tester.ok()) { + tester.out() << al << am << std::endl; + return; + } +} + +void checks2(Pooma::Tester& tester) +{ + Interval<2> I(9, 9); + Loc<2> blocks(3, 3); + UniformGridPartition<2> partition(blocks, GuardLayers<2>(1)); + UniformGridLayout<2> layout(I, partition, DistributedTag()); + DomainLayout<2> layout2(I, GuardLayers<2>(1)); + + Array<2, int, MultiPatch > > + am(layout), bm(layout); + Array<2, int, Brick> + al(layout2), bl(layout2); + + am = 2; + al = 2; + am(I) = 1; + al(I) = 1; + bm = am; + bl = al; + + bm(I) = am(I-Loc<2>(1, 0)); + bl(I) = al(I-Loc<2>(1, 0)); + tester.check("left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(0, 1)); + bl(I) = al(I-Loc<2>(0, 1)); + tester.check("left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(-1, 0)); + bl(I) = al(I-Loc<2>(-1, 0)); + tester.check("right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(0, -1)); + bl(I) = al(I-Loc<2>(0, -1)); + tester.check("right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + + bm(I) = am(I-Loc<2>(1, 1)); + bl(I) = al(I-Loc<2>(1, 1)); + tester.check("upper left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(1, -1)); + bl(I) = al(I-Loc<2>(1, -1)); + tester.check("upper left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(-1, 1)); + bl(I) = al(I-Loc<2>(-1, 1)); + tester.check("upper right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(-1, -1)); + bl(I) = al(I-Loc<2>(-1, -1)); + tester.check("upper right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } +} + int main(int argc, char *argv[]) { Pooma::initialize(argc, argv); @@ -72,6 +223,16 @@ a1(I) = (a2(I-1)+a2(I+1))/2; tester.check("Average", all(a1(I) == 1)); + Interval<1> J(1,7); + a1 = 0; + a2 = 1; + a1(J) = (a2(J-1)+a2(J+1))/2; + tester.check("Average", all(a1(J) == 1)); + + checks1(tester); + if (tester.ok()) + checks2(tester); + int ret = tester.results("array_test29"); Pooma::finalize(); return ret; diff -Nru a/r2/src/Field/tests/BasicTest3.cpp b/r2/src/Field/tests/BasicTest3.cpp --- a/r2/src/Field/tests/BasicTest3.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Field/tests/BasicTest3.cpp Wed Jan 7 13:47:20 2004 @@ -127,7 +127,7 @@ // MultiPatch tester.out() << "MultiPatch...\n"; { - GridLayout<2> layout1(Interval<2>(I, J), Loc<2>(1, 1), GuardLayers<2>(1), DistributedTag()); + GridLayout<2> layout1(Interval<2>(I, J), Loc<2>(1, 1), GuardLayers<2>(1), ReplicatedTag()); Field::Mesh_t, int, MultiPatch > f(vert, layout1, origin, spacings); check(tester, f); diff -Nru a/r2/src/Layout/tests/dynamiclayout_test1.cpp b/r2/src/Layout/tests/dynamiclayout_test1.cpp --- a/r2/src/Layout/tests/dynamiclayout_test1.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Layout/tests/dynamiclayout_test1.cpp Wed Jan 7 13:47:20 2004 @@ -40,7 +40,7 @@ #include "Layout/DynamicLayout.h" #include "Partition/GridPartition.h" -//#define BARRIER +#define BARRIER #ifndef BARRIER #if POOMA_CHEETAH diff -Nru a/r2/src/Layout/tests/dynamiclayout_test2.cpp 
b/r2/src/Layout/tests/dynamiclayout_test2.cpp --- a/r2/src/Layout/tests/dynamiclayout_test2.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Layout/tests/dynamiclayout_test2.cpp Wed Jan 7 13:47:20 2004 @@ -44,7 +44,7 @@ #include #endif -//#define BARRIER +#define BARRIER #ifndef BARRIER #if POOMA_CHEETAH diff -Nru a/r2/src/Tulip/tests/ReduceOverContextsTest.cpp b/r2/src/Tulip/tests/ReduceOverContextsTest.cpp --- a/r2/src/Tulip/tests/ReduceOverContextsTest.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Tulip/tests/ReduceOverContextsTest.cpp Wed Jan 7 13:47:20 2004 @@ -32,7 +32,7 @@ // Include files -#include "PETE/PETE.h" // seems like overkill... +#include "Pooma/Pooma.h" #include "Tulip/ReduceOverContexts.h" #include "Tulip/RemoteProxy.h" #include "Utilities/Tester.h" From rguenth at tat.physik.uni-tuebingen.de Wed Jan 7 13:01:16 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 7 Jan 2004 14:01:16 +0100 (CET) Subject: [PATCH] Clean Threads/ Message-ID: Hi! This patch cleans the Threads/ files by doxygenifizing them and using common code, if available, rather than duplicating existing stuff. Ok? Richard. 2004Jan07 Richard Guenther * src/Threads/IterateSchedulers/IterateScheduler.h: Doxygenifize, only declare classes. src/Threads/IterateSchedulers/Runnable.h: Doxygenifize, reorder methods. src/Threads/PoomaSmarts.h: Doxygenifize. src/Threads/Scheduler.h: Doxygenifize, use #error, not CTAssert. src/Threads/SmartsStubs.h: Doxygenifize, use IterateScheduler.h and Runnable.h instead of duplicating code. diff -Nru a/r2/src/Threads/IterateSchedulers/IterateScheduler.h b/r2/src/Threads/IterateSchedulers/IterateScheduler.h --- a/r2/src/Threads/IterateSchedulers/IterateScheduler.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/IterateSchedulers/IterateScheduler.h Wed Jan 7 13:47:20 2004 @@ -40,44 +40,28 @@ #ifndef ITERATE_SCHEDULER_H #define ITERATE_SCHEDULER_H -//---------------------------------------------------------------------- -// Functions: -// template class IterateScheduler -//---------------------------------------------------------------------- - -//---------------------------------------------------------------------- -// The templated classes. -// This is sort of like an abstract base class, since it doesn't -// implement anything and you can't build one of these directly. -// -//---------------------------------------------------------------------- +/** @file + * @ingroup IterateSchedulers + * @brief + * Templates for the scheduler classes. + * + * This is sort of like abstract base classes, since it doesn't + * implement anything and you can't build one of these directly. + * They are implemented by specializing for a tag class like + * Stub or SerialAsync. 
+ */ namespace Smarts { -template -class IterateScheduler -{ -public: - IterateScheduler() {}; - int generation() const {return generation_m;} -private: - int generation_m; -}; template -class Iterate -{ -public: - Iterate() {}; -private: +class IterateScheduler; +template +class Iterate; -}; template -class DataObject -{ -public: - DataObject() {}; -private: -}; +class DataObject; + } // close namespace Smarts -#endif// ITERATE_SCHEDULER_H + +#endif // ITERATE_SCHEDULER_H diff -Nru a/r2/src/Threads/IterateSchedulers/Runnable.h b/r2/src/Threads/IterateSchedulers/Runnable.h --- a/r2/src/Threads/IterateSchedulers/Runnable.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/IterateSchedulers/Runnable.h Wed Jan 7 13:47:20 2004 @@ -29,95 +29,63 @@ #ifndef _Runnable_h_ #define _Runnable_h_ +/** @file + * @ingroup IterateSchedulers + * @brief + * Base class for a schedulable object or function to be executed + * by the scheduler asynchronously. + */ + #include namespace Smarts { -/*------------------------------------------------------------------------ -CLASS - Runnable - - Base class for a schedulable object or function to be executed - by the scheduler asynchronously - -KEYWORDS - Thread, Native_Interface, Task_Parallelism, Data_Parallelism. - -DESCRIPTION - Runnable is the base class for system classes "Thread" and - "Iterate". However, the user may define his/her own - sub-class. Any class derived from Runnable, is an object that - the scheduler understands and therefore is the mechanism to - have something executed in parallel by the scheduler on behalf - of the user. - -COPYRIGHT - This program was prepared by the Regents of the University of - California at Los Alamos National Laboratory (the University) - under Contract No. W-7405-ENG-36 with the U.S. Department of - Energy (DOE). The University has certain rights in the program - pursuant to the contract and the program should not be copied - or distributed outside your organization. All rights in the - program are reserved by the DOE and the University. Neither - the U.S. Government nor the University makes any warranty, - express or implied, or assumes any liability or responsibility - for the use of this software. - - Parts of this software have been authored at the University of - Colorado -- Boulder. Neither University of Colorado nor its - employees makes any warranty, express or implied, or assumes - any liability or responsibility for the use of this SOFTWARE. - - This SOFTWARE may be modified for derivative use, but modified - SOFTWARE should be clearly marked as such, so as not to - confuse it with the versions available from Los Alamos - National Laboratory. -*/ +/** + * Runnable is the base class for system classes "Thread" and + * "Iterate". However, the user may define his/her own + * sub-class. Any class derived from Runnable, is an object that + * the scheduler understands and therefore is the mechanism to + * have something executed in parallel by the scheduler on behalf + * of the user. + */ class Runnable { - friend class Context; public: - /////////////////////////// - // Set priority of this runnable relative to other runnables - // being scheduled. - // - inline int - priority() { return priority_m; }; - - /////////////////////////// - // Accessor function to priority. 
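By way of illustration (not part of the patch), a user-defined Runnable only needs to override run(); the scheduler invokes it through execute() and disposes of it afterwards:

    #include <iostream>

    class PrintRunnable : public Smarts::Runnable
    {
    protected:
      virtual void run() { std::cout << "hello from a runnable\n"; }
    };

    // hand it to the scheduler's work queue
    Smarts::add(new PrintRunnable);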
- // - inline void - priority(int _priority) { priority_m = _priority; }; - - ///////// - virtual ~Runnable(){}; - - ///////// Runnable() { priority_m = 0; } - /////////////////////////// - // The parameter to this constructor is the CPU id for - // hard affinity. - // - Runnable(int ) + /// The parameter to this constructor is the CPU id for + /// hard affinity. + + Runnable(int) { priority_m = 0; } - //////// - virtual void execute(){ - run(); - }; + virtual ~Runnable() {} + + /// Accessor function to priority. + inline int + priority() { return priority_m; } + + /// Set priority of this runnable relative to other runnables + /// being scheduled. + + inline void + priority(int _priority) { priority_m = _priority; } + + virtual void execute() + { + run(); + } protected: - virtual void run(){}; + virtual void run() {} private: int priority_m; @@ -130,5 +98,6 @@ inline void add(RunnablePtr_t); -} +} // namespace Smarts + #endif diff -Nru a/r2/src/Threads/PoomaSmarts.h b/r2/src/Threads/PoomaSmarts.h --- a/r2/src/Threads/PoomaSmarts.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/PoomaSmarts.h Wed Jan 7 13:47:20 2004 @@ -29,25 +29,17 @@ #ifndef POOMA_THREADS_POOMA_SMARTS_H #define POOMA_THREADS_POOMA_SMARTS_H -//----------------------------------------------------------------------------- -// Types: -// Pooma::SmartsTag_t -// Pooma::Scheduler_t -// Pooma::DataObject_t -// Pooma::Iterate_t -// -// Global Data: -// Pooma::schedulerVersion -//----------------------------------------------------------------------------- - -//----------------------------------------------------------------------------- -// Overview: -// The POOMA wrapper around defines, includes, and typedefs for the Smarts -// run-time evaluation system. Based on the settings of POOMA_THREADS and -// the selected scheduler, define several typedefs and include the necessary -// files. If we're compiling only for serial runs, use a stub version of -// the Smarts interface instead. -//----------------------------------------------------------------------------- +/** @file + * @ingroup Threads + * @brief + * The POOMA wrapper around defines, includes, and typedefs for the Smarts + * run-time evaluation system. + * + * Based on the settings of POOMA_THREADS and + * the selected scheduler, define several typedefs and include the necessary + * files. If we're compiling only for serial runs, use a stub version of + * the Smarts interface instead. + */ //----------------------------------------------------------------------------- // Includes: diff -Nru a/r2/src/Threads/Scheduler.h b/r2/src/Threads/Scheduler.h --- a/r2/src/Threads/Scheduler.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/Scheduler.h Wed Jan 7 13:47:20 2004 @@ -34,16 +34,16 @@ #ifndef POOMA_THREADS_SCHEDULER_H #define POOMA_THREADS_SCHEDULER_H -////////////////////////////////////////////////////////////////////// - -//----------------------------------------------------------------------------- -// Overview: -// -// This file exist to wrap the correct includes from Smarts based on the -// scheduler that we've selected. If we're running in serial then we include -// the a stub file. This file defines a single typedef: SmartsTag_t, a policy -// tag which is used to select the appropriate smarts data object etc. -//----------------------------------------------------------------------------- +/** @file + * @ingroup Threads + * @brief + * Scheduler multiplexing based on configuration. 
+ * + * This file exist to wrap the correct includes from Smarts based on the + * scheduler that we've selected. If we're running in serial then we include + * the a stub file. This file defines a single typedef: SmartsTag_t, a policy + * tag which is used to select the appropriate smarts data object etc. + */ //----------------------------------------------------------------------------- // Includes: @@ -82,8 +82,7 @@ # else -# include "Utilities/PAssert.h" -CTAssert(YOU_HAVE_NOT_SELECTED_A_SCHEDULER); +# error "You have not selected a scheduler" # endif diff -Nru a/r2/src/Threads/SmartsStubs.h b/r2/src/Threads/SmartsStubs.h --- a/r2/src/Threads/SmartsStubs.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/SmartsStubs.h Wed Jan 7 13:47:20 2004 @@ -29,56 +29,21 @@ #ifndef POOMA_THREADS_SMARTSSTUBS_H #define POOMA_THREADS_SMARTSSTUBS_H -//----------------------------------------------------------------------------- -// Functions: -// SimpleSerialScheduler -// SimpleSerialScheduler::Iterate -// SimpleSerialScheduler::DataObject -// template class IterateScheduler -// IterateScheduler -// Iterate -// DataObject -// void concurrency(int) -// int concurrency() -// wait -//----------------------------------------------------------------------------- +/** @file + * @ingroup IterateSchedulers + * @brief + * Stub scheduler for serial in-order evaluation. + */ //----------------------------------------------------------------------------- // Includes: //----------------------------------------------------------------------------- -//---------------------------------------------------------------------- -// The templated classes. -// This is sort of like an abstract base class, since it doesn't -// implement anything and you can't build one of these directly. -//---------------------------------------------------------------------- +#include "Threads/IterateSchedulers/IterateScheduler.h" +#include "Threads/IterateSchedulers/Runnable.h" namespace Smarts { -template -class IterateScheduler -{ -private: - // Private ctor means you can't build one of these. - IterateScheduler() {} -}; - -template -class Iterate -{ -private: - // Private ctor means you can't build one of these. - Iterate() {} -}; - -template -class DataObject -{ -private: - // Private ctor means you can't build one of these. - DataObject() {} -}; - //---------------------------------------------------------------------- // The tag class we'll use for the template parameter. //---------------------------------------------------------------------- @@ -93,6 +58,7 @@ template<> class IterateScheduler; template<> class DataObject; + //////////////////////////////////////////////////////////////////////// // // The specialization of Iterate for Stub @@ -100,7 +66,7 @@ //////////////////////////////////////////////////////////////////////// template<> -class Iterate +class Iterate : public Runnable { public: // Construct the Iterate with: @@ -244,30 +210,9 @@ { } -class Runnable -{ -public: - // Runnable just takes affinity. - inline Runnable(int affinity) - : affinity_m(affinity) - { } - - // The dtor is virtual because the subclasses will need to add to it. 
- virtual ~Runnable() {} - virtual void run() = 0; - - int affinity() { return affinity_m; } - int hintAffinity() { return affinity_m; } - void affinity(int) {} - void hintAffinity(int) {} - -private: - int affinity_m; -}; - inline void add(Runnable *runnable) { - runnable->run(); + runnable->execute(); delete runnable; } From rguenth at tat.physik.uni-tuebingen.de Wed Jan 7 14:38:54 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 7 Jan 2004 15:38:54 +0100 (CET) Subject: [PATCH] Provide -openmp to user apps, too Message-ID: Hi! Last time I missed the user used Makefile in the lib directory. Fixed with this pathch. Ok? Richard. 2004Jan07 Richard Guenther * configure: provide openmp arguments to lib/$SUITE/Makefile. ===== configure 1.20 vs edited ===== --- 1.20/r2/configure Wed Jan 7 12:23:35 2004 +++ edited/configure Wed Jan 7 15:35:57 2004 @@ -2335,14 +2335,14 @@ print MFILE "LD_PARALLEL = 1\n"; print MFILE "\n"; print MFILE "### flags for applications\n"; - print MFILE "POOMA_CXX_OPT_ARGS = \@cppargs\@ $cppopt_app\n"; - print MFILE "POOMA_CXX_DBG_ARGS = \@cppargs\@ $cppdbg_app\n"; + print MFILE "POOMA_CXX_OPT_ARGS = \@cppargs\@ $openmpargs $cppopt_app\n"; + print MFILE "POOMA_CXX_DBG_ARGS = \@cppargs\@ $openmpargs $cppdbg_app\n"; print MFILE "\n"; - print MFILE "POOMA_CC_OPT_ARGS = \@cargs\@ $copt_app\n"; - print MFILE "POOMA_CC_DBG_ARGS = \@cargs\@ $cdbg_app\n"; + print MFILE "POOMA_CC_OPT_ARGS = \@cargs\@ $openmpargs $copt_app\n"; + print MFILE "POOMA_CC_DBG_ARGS = \@cargs\@ $openmpargs $cdbg_app\n"; print MFILE "\n"; - print MFILE "POOMA_F77_OPT_ARGS = $f77args $f77opt_app\n"; - print MFILE "POOMA_F77_DBG_ARGS = $f77args $f77dbg_app\n"; + print MFILE "POOMA_F77_OPT_ARGS = $openmpargs $f77args $f77opt_app\n"; + print MFILE "POOMA_F77_DBG_ARGS = $openmpargs $f77args $f77dbg_app\n"; print MFILE "\n"; print MFILE "POOMA_INCLUDES = $totinclist\n"; print MFILE "\n"; From rguenth at tat.physik.uni-tuebingen.de Wed Jan 7 16:57:46 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 7 Jan 2004 17:57:46 +0100 (CET) Subject: [PATCH] Support hybrid MPI/OpenMP if available Message-ID: Hi! This patch makes sure to correctly initialize MPI according to the standard when using OpenMP. Tested with mpich and Intel icpc where in fact, this mode is not supported appearantly. Ok? Richard. 2004Jan07 Richard Guenther * src/Pooma/Pooma.cmpl.cpp: initialize MPI using MPI_Init_thread if _OPENMP is defined, require at least MPI_THREAD_FUNNELED support. ===== Pooma/Pooma.cmpl.cpp 1.6 vs edited ===== --- 1.6/r2/src/Pooma/Pooma.cmpl.cpp Wed Jan 7 12:23:35 2004 +++ edited/Pooma/Pooma.cmpl.cpp Wed Jan 7 17:54:30 2004 @@ -288,7 +288,13 @@ // the Cheetah options from the Options object. #if POOMA_MPI +# ifdef _OPENMP + int provided; + MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided); + PInsist(provided >= MPI_THREAD_FUNNELED, "No MPI support for OpenMP"); +# else MPI_Init(&argc, &argv); +# endif #elif POOMA_CHEETAH controller_g = new Cheetah::Controller(argc, argv); #endif From oldham at codesourcery.com Thu Jan 8 21:34:55 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:34:55 -0800 Subject: [PATCH] Extend/fix some testcases for MPI In-Reply-To: References: Message-ID: <3FFDCCFF.7090401@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch fixes some testcases for MPI operation and extends array_test29 > to check for individual guard updates (for the optimization patches still > pending). > > Ok? 
> > Richard. > > > 2004Jan07 Richard Guenther > > * src/Array/tests/array_test29: systematically check (partial) > guard updates. > src/Field/tests/BasicTest3.cpp: use ReplicatedTag, not > DistributedTag. > src/Layout/tests/dynamiclayout_test1.cpp: #define BARRIER, as in > other tests. > src/Layout/tests/dynamiclayout_test2.cpp: likewise. > src/Tulip/tests/ReduceOverContextsTest.cpp: #include Pooma/Pooma.h. Yes. Thanks. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 8 21:37:12 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:37:12 -0800 Subject: [PATCH] Clean Threads/ In-Reply-To: References: Message-ID: <3FFDCD88.5000602@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch cleans the Threads/ files by doxygenifizing them and > using common code, if available, rather than duplicating existing stuff. > > Ok? > > Richard. > > > 2004Jan07 Richard Guenther > > * src/Threads/IterateSchedulers/IterateScheduler.h: Doxygenifize, > only declare classes. > src/Threads/IterateSchedulers/Runnable.h: Doxygenifize, reorder > methods. > src/Threads/PoomaSmarts.h: Doxygenifize. > src/Threads/Scheduler.h: Doxygenifize, use #error, not CTAssert. > src/Threads/SmartsStubs.h: Doxygenifize, use IterateScheduler.h > and Runnable.h instead of duplicating code. Yes. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 8 21:37:53 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:37:53 -0800 Subject: [PATCH] Provide -openmp to user apps, too In-Reply-To: References: Message-ID: <3FFDCDB1.6040102@codesourcery.com> Richard Guenther wrote: > Hi! > > Last time I missed the user used Makefile in the lib directory. Fixed with > this pathch. > > Ok? > > Richard. > > > 2004Jan07 Richard Guenther > > * configure: provide openmp arguments to lib/$SUITE/Makefile. > > ===== configure 1.20 vs edited ===== > --- 1.20/r2/configure Wed Jan 7 12:23:35 2004 > +++ edited/configure Wed Jan 7 15:35:57 2004 > @@ -2335,14 +2335,14 @@ > print MFILE "LD_PARALLEL = 1\n"; > print MFILE "\n"; > print MFILE "### flags for applications\n"; > - print MFILE "POOMA_CXX_OPT_ARGS = \@cppargs\@ $cppopt_app\n"; > - print MFILE "POOMA_CXX_DBG_ARGS = \@cppargs\@ $cppdbg_app\n"; > + print MFILE "POOMA_CXX_OPT_ARGS = \@cppargs\@ $openmpargs $cppopt_app\n"; > + print MFILE "POOMA_CXX_DBG_ARGS = \@cppargs\@ $openmpargs $cppdbg_app\n"; > print MFILE "\n"; > - print MFILE "POOMA_CC_OPT_ARGS = \@cargs\@ $copt_app\n"; > - print MFILE "POOMA_CC_DBG_ARGS = \@cargs\@ $cdbg_app\n"; > + print MFILE "POOMA_CC_OPT_ARGS = \@cargs\@ $openmpargs $copt_app\n"; > + print MFILE "POOMA_CC_DBG_ARGS = \@cargs\@ $openmpargs $cdbg_app\n"; > print MFILE "\n"; > - print MFILE "POOMA_F77_OPT_ARGS = $f77args $f77opt_app\n"; > - print MFILE "POOMA_F77_DBG_ARGS = $f77args $f77dbg_app\n"; > + print MFILE "POOMA_F77_OPT_ARGS = $openmpargs $f77args $f77opt_app\n"; > + print MFILE "POOMA_F77_DBG_ARGS = $openmpargs $f77args $f77dbg_app\n"; > print MFILE "\n"; > print MFILE "POOMA_INCLUDES = $totinclist\n"; > print MFILE "\n"; Yes, please commit. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 8 21:43:48 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:43:48 -0800 Subject: [PATCH] Support hybrid MPI/OpenMP if available In-Reply-To: References: Message-ID: <3FFDCF14.70708@codesourcery.com> Richard Guenther wrote: > Hi! 
> > This patch makes sure to correctly initialize MPI according to the > standard when using OpenMP. > > Tested with mpich and Intel icpc where in fact, this mode is not supported > appearantly. > > Ok? > > Richard. > > > 2004Jan07 Richard Guenther > > * src/Pooma/Pooma.cmpl.cpp: initialize MPI using MPI_Init_thread > if _OPENMP is defined, require at least MPI_THREAD_FUNNELED > support. > > ===== Pooma/Pooma.cmpl.cpp 1.6 vs edited ===== > --- 1.6/r2/src/Pooma/Pooma.cmpl.cpp Wed Jan 7 12:23:35 2004 > +++ edited/Pooma/Pooma.cmpl.cpp Wed Jan 7 17:54:30 2004 > @@ -288,7 +288,13 @@ > // the Cheetah options from the Options object. > > #if POOMA_MPI > +# ifdef _OPENMP > + int provided; > + MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided); > + PInsist(provided >= MPI_THREAD_FUNNELED, "No MPI support for OpenMP"); > +# else > MPI_Init(&argc, &argv); > +# endif > #elif POOMA_CHEETAH > controller_g = new Cheetah::Controller(argc, argv); > #endif OpenMP does not support MPI_init? I'd prefer to initialize OpenMP using the same mechanism as for MPI implementations. Also, does finalization also need to change? -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 8 21:54:05 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 8 Jan 2004 22:54:05 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Support hybrid MPI/OpenMP if available In-Reply-To: <3FFDCF14.70708@codesourcery.com> References: <3FFDCF14.70708@codesourcery.com> Message-ID: On Thu, 8 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > > > #if POOMA_MPI > > +# ifdef _OPENMP > > + int provided; > > + MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided); > > + PInsist(provided >= MPI_THREAD_FUNNELED, "No MPI support for OpenMP"); > > +# else > > MPI_Init(&argc, &argv); > > +# endif > > #elif POOMA_CHEETAH > > controller_g = new Cheetah::Controller(argc, argv); > > #endif > > OpenMP does not support MPI_init? I'd prefer to initialize OpenMP using > the same mechanism as for MPI implementations. It's somewhat difficult. The MPI-1 standard does not support any sort of threading and has only MPI_Init. The MPI-2 standard does support many levels of thread support which needs to be specified through MPI_Init_thread - though using MPI_Init is still possible, which is equivalent to initializing with no thread support. Nearly all implementation support the MPI_Init_thread call as part of (usually incomplete) MPI-2 support. To allow using OpenMP if MPI is used, too, we need at least make the MPI library aware of this. So, if no OpenMP is used, MPI_Init suffices and allows for MPI-1 only implementations. For OpenMP support we absolutely need MPI_Init_threads, so we use it. > Also, does finalization also need to change? No. Ok with this explanation? Thanks, Richard. From oldham at codesourcery.com Thu Jan 8 21:59:09 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:59:09 -0800 Subject: [pooma-dev] Re: [PATCH] Support hybrid MPI/OpenMP if available In-Reply-To: References: <3FFDCF14.70708@codesourcery.com> Message-ID: <3FFDD2AD.1030805@codesourcery.com> Richard Guenther wrote: > On Thu, 8 Jan 2004, Jeffrey D. 
Oldham wrote: > > >>Richard Guenther wrote: >> >>> #if POOMA_MPI >>>+# ifdef _OPENMP >>>+ int provided; >>>+ MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided); >>>+ PInsist(provided >= MPI_THREAD_FUNNELED, "No MPI support for OpenMP"); >>>+# else >>> MPI_Init(&argc, &argv); >>>+# endif >>> #elif POOMA_CHEETAH >>> controller_g = new Cheetah::Controller(argc, argv); >>> #endif >> >>OpenMP does not support MPI_init? I'd prefer to initialize OpenMP using >>the same mechanism as for MPI implementations. > > > It's somewhat difficult. The MPI-1 standard does not support any sort of > threading and has only MPI_Init. The MPI-2 standard does support many > levels of thread support which needs to be specified through > MPI_Init_thread - though using MPI_Init is still possible, which is > equivalent to initializing with no thread support. > > Nearly all implementation support the MPI_Init_thread call as part of > (usually incomplete) MPI-2 support. To allow using OpenMP if MPI is used, > too, we need at least make the MPI library aware of this. So, if no > OpenMP is used, MPI_Init suffices and allows for MPI-1 only > implementations. For OpenMP support we absolutely need MPI_Init_threads, > so we use it. > > >>Also, does finalization also need to change? > > > No. > > Ok with this explanation? Yes. I appreciate the education. > Thanks, > Richard. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 8 22:13:52 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 8 Jan 2004 23:13:52 +0100 (CET) Subject: [PATCH] Fix deadlocks in MPI reduction evaluators Message-ID: Hi! The following patch is necessary to avoid deadlocks with the MPI implementation and multi-patch setups where one context does not participate in the reduction. Fixes failure of array_test_.. - I don't remember - with MPI. Basically the scenario is that the collective synchronous MPI_Gather is called from ReduceOverContexts<> on the non-participating (and thus not receiving) contexts while the SendIterates are still in the schedulers queue. The calculation participating contexts will wait for the ReceiveIterates and patch reductions to complete using the CSem forever then. So the fix is to make the not participating contexts wait on the CSem, too, by using a fake write iterate queued after the send iterates which will trigger as soon as the send iterates complete. Tested using MPI, Cheetah and serial some time ago. Ok? Richard. 2004Jan08 Richard Guenther * src/Engine/RemoteEngine.h: use a waiting iterate to wait for reduction completion in remote single and multi-patch reduction evaluator. Do begin/endGeneration at the toplevel evaluate. src/Evaluator/Reduction.h: do begin/endGeneration at the toplevel evaluate. 
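As a stand-alone illustration of the deadlock described above (a rough sketch, independent of POOMA's classes and not part of the patch): MPI_Gather is collective, so every rank of the communicator must reach the call before any rank can leave it. A rank that is still busy with other queued work, like the contexts whose send iterates have not yet run, simply stalls all the others:

#include <mpi.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Stand-in for the send iterates still sitting in the scheduler queue.
  if (rank == 1)
    sleep(10);

  int local = rank;
  int *all = (rank == 0) ? new int[size] : 0;

  // Collective: blocks until *all* ranks of MPI_COMM_WORLD have called it.
  MPI_Gather(&local, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

  if (rank == 0)
  {
    for (int i = 0; i < size; ++i)
      std::printf("rank %d contributed %d\n", i, all[i]);
    delete[] all;
  }

  MPI_Finalize();
  return 0;
}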
--- src/Engine/RemoteEngine.h 2004-01-02 12:57:48.000000000 +0100 +++ /home/richard/src/pooma/pooma-mpi3/r2/src/Engine/RemoteEngine.h 2004-01-08 23:00:40.000000000 +0100 @@ -1954,6 +1962,29 @@ } }; + +template +struct WaitingIterate : public Pooma::Iterate_t { + WaitingIterate(const Expr& e, Pooma::CountingSemaphore& csem) + : Pooma::Iterate_t(Pooma::scheduler()), + e_m(e), csem_m(csem) + { + DataObjectRequest writeReq(*this); + engineFunctor(e_m, writeReq); + } + virtual void run() + { + csem_m.incr(); + } + virtual ~WaitingIterate() + { + DataObjectRequest writeRel; + engineFunctor(e_m, writeRel); + } + Expr e_m; + Pooma::CountingSemaphore& csem_m; +}; + //----------------------------------------------------------------------------- // Single-patch Reductions involving remote engines: // @@ -1998,12 +2029,11 @@ Pooma::CountingSemaphore csem; csem.height(1); - Pooma::scheduler().beginGeneration(); - if (Pooma::context() != computationContext) { expressionApply(e, RemoteSend(computationContext)); - csem.incr(); + Pooma::Iterate_t *it = new WaitingIterate(e, csem); + Pooma::scheduler().handOff(it); } else { @@ -2013,8 +2043,7 @@ forEach(e, view, TreeCombine()), csem); } - Pooma::scheduler().endGeneration(); - + // Wait for RemoteSend or Reduction to complete. csem.wait(); RemoteProxy globalRet(ret, computationContext); @@ -2102,8 +2131,6 @@ csem.height(n); T *vals = new T[n]; - Pooma::scheduler().beginGeneration(); - i = inter.begin(); k = 0; for (j = 0; j < inter.size(); j++) @@ -2129,13 +2156,19 @@ else { expressionApply(e(*i), RemoteSend(computationalContext[j])); + // One extra RemoteSend to wait for. Maybe we can combine these + // iterates, but maybe not. Play safe for now. + csem.raise_height(1); + Pooma::Iterate_t *it = new WaitingIterate + >::Type_t>(e(*i), csem); + Pooma::scheduler().handOff(it); } } ++i; } - Pooma::scheduler().endGeneration(); + // Wait for RemoteSends and Reductions to complete. csem.wait(); if (n > 0) --- src/Evaluator/Reduction.h 2003-11-21 22:30:38.000000000 +0100 +++ /home/richard/src/pooma/pooma-mpi3/r2/src/Evaluator/Reduction.h 2004-01-02 00:40:14.000000000 +0100 @@ -128,10 +128,15 @@ void evaluate(T &ret, const Op &op, const Expr &e) const { typedef typename EvaluatorTag1::Evaluator_t Evaluator_t; + + Pooma::scheduler().beginGeneration(); + PAssert(checkValidity(e, WrappedInt())); forEach(e, PerformUpdateTag(), NullCombine()); Reduction().evaluate(ret, op, e()); + Pooma::scheduler().endGeneration(); + POOMA_INCREMENT_STATISTIC(NumReductions) } }; @@ -184,12 +189,8 @@ Pooma::CountingSemaphore csem; csem.height(1); - Pooma::scheduler().beginGeneration(); - evaluate(ret, op, e, csem); - Pooma::scheduler().endGeneration(); - csem.wait(); } }; @@ -237,12 +238,10 @@ expressionApply(e, IntersectorTag(inter)); - const int n = std::distance(inter.begin(), inter.end()); + const int n = inter.size(); Pooma::CountingSemaphore csem; csem.height(n); T *vals = new T[n]; - - Pooma::scheduler().beginGeneration(); typename Inter_t::const_iterator i = inter.begin(); int j = 0; @@ -253,8 +252,6 @@ ++i; ++j; } - Pooma::scheduler().endGeneration(); - csem.wait(); ret = vals[0]; From rguenth at tat.physik.uni-tuebingen.de Fri Jan 9 12:36:29 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 9 Jan 2004 13:36:29 +0100 (CET) Subject: [PATCH] Document OpenMP/MPI Message-ID: Hi! While the documentation beyond docs/ is in a rather bad shape, this adds a paragraph to parallelism.html describing the new --mpi and --openmp modes. Ok? 
How is CodeSourcery planning to do releases in the future? There is a lot of old and inaccurate information in the toplevel INSTALL.* and README files, also VERSION.LOG seems to be not up-to-date, likewise src/Field/ToDo or the toplevel Ideas. If it was my personal package, I'd start ripping out those old files entirely (likewise the scripts/build* files) and provide a README with requirements and pointers to the documentation and a generic INSTALL (of course I'd drop all traces of the word Windows there, too ;)). Maybe someone of you can suggest how to proceed with all of the information in these files? Thanks, Richard. ===== parallelism.html 1.1 vs edited ===== --- 1.1/r2/docs/parallelism.html Mon May 13 17:47:21 2002 +++ edited/parallelism.html Fri Jan 9 13:29:04 2004 @@ -1,4 +1,4 @@ -

POOMA 2.3 parallel model
+POOMA post-2.4 parallel model
CONTEXT: In discussing the parallelism models that POOMA supports, the word `context' has a particular @@ -40,6 +40,15 @@ The 2.4 release should address these issues and permit a parallel model with multithreaded contexts that communicate with each other through messages. + +

After POOMA 2.4 two new models of parallelism are supported. Namely +use of OpenMP thread level parallelization, if supported by the compiler, +and the use of an available MPI library such as MPICH or a vendor provided +implementation. Both models, MPI and OpenMP, may be combined simultaneously +if the MPI implementation supports this kind of operation. This is especially +useful for clusters of SMP workstations. Those new modes of operation can +be specified by the --mpi and --openmp configure switches. +

CHEETAH overview:

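To make the combined --mpi/--openmp mode described above concrete, here is a minimal hybrid MPI/OpenMP program (an illustrative sketch, not POOMA code; compiler flags and MPI wrapper names vary). MPI is initialized with MPI_Init_thread requesting MPI_THREAD_FUNNELED, the OpenMP threads do the local work, and all MPI calls stay on the master thread, which is exactly what the funneled level allows:

#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char **argv)
{
  // Ask for funneled support: only the thread that called MPI_Init_thread
  // (the OpenMP master thread) will make MPI calls.
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
  if (provided < MPI_THREAD_FUNNELED) {
    std::printf("MPI library provides no thread support, aborting\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0)
    std::printf("each context runs up to %d OpenMP threads\n",
                omp_get_max_threads());

  double sum = 0.0;
  // Thread-level parallelism inside one context; no MPI calls in here.
#pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < 1000000; ++i)
    sum += 1.0 / (i + 1.0);

  // Back on the master thread: combine the per-context partial sums.
  double total = 0.0;
  MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0)
    std::printf("global sum = %f\n", total);

  MPI_Finalize();
  return 0;
}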
From rguenth at tat.physik.uni-tuebingen.de Fri Jan 9 13:42:25 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 9 Jan 2004 14:42:25 +0100 (CET) Subject: [PATCH] Fix Cheetah operation Message-ID: Hi! This patch reverts part of a previous patch to restore unpacking of compressible brick views in Cheetah mode. Ok? Richard. 2004Jan09 Richard Guenther * src/Engine/RemoteEngine.h: revert removal of unpack(Engine*, char*) and cleanup(Engine*) method in Cheetah::Serialize > class. ===== RemoteEngine.h 1.5 vs edited ===== --- 1.5/r2/src/Engine/RemoteEngine.h Wed Jan 7 09:54:05 2004 +++ edited/RemoteEngine.h Fri Jan 9 14:42:11 2004 @@ -1616,6 +1616,55 @@ return nBytes; } + static inline int + unpack(Engine_t* &a, char *buffer) + { + Interval *dom; + + int change; + int nBytes=0; + + change = Serialize::unpack(dom, buffer); + buffer += change; + nBytes += change; + + bool *compressed; + + change = Serialize::unpack(compressed, buffer); + buffer += change; + nBytes += change; + + if (*compressed) + { + T *value; + + change = Serialize::unpack(value, buffer); + + Engine foo(*dom, *value); + + a = new Engine_t(foo, *dom); + } + else + { + Engine foo(*dom); + + EngineElemDeSerialize op(buffer); + + change = EngineBlockSerialize::apply(op, foo, *dom); + + a = new Engine_t(foo, *dom); + } + nBytes += change; + + return nBytes; + } + + static inline void + cleanup(Engine_t* a) + { + delete a; + } + // We support a special unpack to avoid an extra copy. static inline int From oldham at codesourcery.com Fri Jan 9 17:26:32 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 09 Jan 2004 09:26:32 -0800 Subject: [PATCH] Fix Cheetah operation In-Reply-To: References: Message-ID: <3FFEE448.8030801@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch reverts part of a previous patch to restore unpacking of > compressible brick views in Cheetah mode. > > Ok? Yes. > Richard. > > > 2004Jan09 Richard Guenther > > * src/Engine/RemoteEngine.h: revert removal of unpack(Engine*, > char*) and cleanup(Engine*) method in Cheetah::Serialize Engine > class. > > ===== RemoteEngine.h 1.5 vs edited ===== > --- 1.5/r2/src/Engine/RemoteEngine.h Wed Jan 7 09:54:05 2004 > +++ edited/RemoteEngine.h Fri Jan 9 14:42:11 2004 > @@ -1616,6 +1616,55 @@ > return nBytes; > } > > + static inline int > + unpack(Engine_t* &a, char *buffer) > + { > + Interval *dom; > + > + int change; > + int nBytes=0; > + > + change = Serialize::unpack(dom, buffer); > + buffer += change; > + nBytes += change; > + > + bool *compressed; > + > + change = Serialize::unpack(compressed, buffer); > + buffer += change; > + nBytes += change; > + > + if (*compressed) > + { > + T *value; > + > + change = Serialize::unpack(value, buffer); > + > + Engine foo(*dom, *value); > + > + a = new Engine_t(foo, *dom); > + } > + else > + { > + Engine foo(*dom); > + > + EngineElemDeSerialize op(buffer); > + > + change = EngineBlockSerialize::apply(op, foo, *dom); > + > + a = new Engine_t(foo, *dom); > + } > + nBytes += change; > + > + return nBytes; > + } > + > + static inline void > + cleanup(Engine_t* a) > + { > + delete a; > + } > + > // We support a special unpack to avoid an extra copy. > > static inline int -- Jeffrey D. 
Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Sun Jan 11 14:21:12 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Sun, 11 Jan 2004 15:21:12 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Optimize guard update copy In-Reply-To: References: <3FF45420.3090106@codesourcery.com> Message-ID: On Thu, 1 Jan 2004, Richard Guenther wrote: > On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: > > > Richard Guenther wrote: > > > Hi! > > > > > > This patch removes number four of the copies done for guard update. > > > Basically, additionally to the three copies I mentioned in the previous > > > mail, we're doing one extra during the RemoteView expressionApply of the > > > data-parallel assignment we're doing for the guard domains. Ugh. Fixed by > > > manually sending/receiving from/to the views. Doesn't work for Cheetah, > > > so conditionalized on POOMA_MPI. > > > > What breaks for Cheetah? > > I don't remember... I can try again next week. I tried again, and with Cheetah this is really a mess, because Cheetah cannot receive into a View, so we need to pass a Brick as IncomingView and a BrickView as View to SendReceive::receive() and this breaks all over the place in the Cheetah library... So, unfortunately, no - this doesn't work for Cheetah - at least not with major surgery inside the Cheetah library (which I would rather drop than fix). So, is the patch ok as it is (affecting only POOMA_MPI)? Thanks, Richard. From oldham at codesourcery.com Tue Jan 13 19:06:55 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 13 Jan 2004 11:06:55 -0800 Subject: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: References: Message-ID: <400441CF.1030007@codesourcery.com> Richard Guenther wrote: > Hi! > > The following patch is necessary to avoid deadlocks with the MPI > implementation and multi-patch setups where one context does not > participate in the reduction. > > Fixes failure of array_test_.. - I don't remember - with MPI. > > Basically the scenario is that the collective synchronous MPI_Gather is > called from ReduceOverContexts<> on the non-participating (and thus > not receiving) contexts while the SendIterates are still in the > schedulers queue. The calculation participating contexts will wait for > the ReceiveIterates and patch reductions to complete using the CSem > forever then. > > So the fix is to make the not participating contexts wait on the CSem, > too, by using a fake write iterate queued after the send iterates which > will trigger as soon as the send iterates complete. Instead of adding fake write iterate can we adjust the MPI_Gather so non-participating contexts do not participate? > Tested using MPI, Cheetah and serial some time ago. > > Ok? > > Richard. > > > 2004Jan08 Richard Guenther > > * src/Engine/RemoteEngine.h: use a waiting iterate to wait for > reduction completion in remote single and multi-patch reduction > evaluator. > Do begin/endGeneration at the toplevel evaluate. > src/Evaluator/Reduction.h: do begin/endGeneration at the toplevel > evaluate. 
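Restricting the gather to the participating contexts, as asked above, would mean giving them a communicator of their own first. A rough sketch (not POOMA code; names are illustrative) of what that involves, and why it is not free: MPI_Comm_split is itself a collective over the parent communicator and allocates a new one.

#include <mpi.h>

void gatherFromParticipants(bool participates, int localValue,
                            int *results /* significant on root only */)
{
  // Participants get color 0, everybody else opts out with MPI_UNDEFINED
  // and receives MPI_COMM_NULL.  Note that every rank of MPI_COMM_WORLD
  // still has to enter MPI_Comm_split.
  MPI_Comm subComm;
  int color = participates ? 0 : MPI_UNDEFINED;
  MPI_Comm_split(MPI_COMM_WORLD, color, /* key = */ 0, &subComm);

  if (participates) {
    // Collective only over the participants' communicator.
    MPI_Gather(&localValue, 1, MPI_INT, results, 1, MPI_INT,
               /* root = */ 0, subComm);
    MPI_Comm_free(&subComm);
  }
  // Non-participating ranks skip the gather entirely.
}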
> > --- src/Engine/RemoteEngine.h 2004-01-02 12:57:48.000000000 +0100 > +++ /home/richard/src/pooma/pooma-mpi3/r2/src/Engine/RemoteEngine.h 2004-01-08 23:00:40.000000000 +0100 > @@ -1954,6 +1962,29 @@ > } > }; > > + > +template > +struct WaitingIterate : public Pooma::Iterate_t { > + WaitingIterate(const Expr& e, Pooma::CountingSemaphore& csem) > + : Pooma::Iterate_t(Pooma::scheduler()), > + e_m(e), csem_m(csem) > + { > + DataObjectRequest writeReq(*this); > + engineFunctor(e_m, writeReq); > + } > + virtual void run() > + { > + csem_m.incr(); > + } > + virtual ~WaitingIterate() > + { > + DataObjectRequest writeRel; > + engineFunctor(e_m, writeRel); > + } > + Expr e_m; > + Pooma::CountingSemaphore& csem_m; > +}; > + > //----------------------------------------------------------------------------- > // Single-patch Reductions involving remote engines: > // > @@ -1998,12 +2029,11 @@ > Pooma::CountingSemaphore csem; > csem.height(1); > > - Pooma::scheduler().beginGeneration(); > - > if (Pooma::context() != computationContext) > { > expressionApply(e, RemoteSend(computationContext)); > - csem.incr(); > + Pooma::Iterate_t *it = new WaitingIterate(e, csem); > + Pooma::scheduler().handOff(it); > } > else > { > @@ -2013,8 +2043,7 @@ > forEach(e, view, TreeCombine()), csem); > } > > - Pooma::scheduler().endGeneration(); > - > + // Wait for RemoteSend or Reduction to complete. > csem.wait(); > > RemoteProxy globalRet(ret, computationContext); > @@ -2102,8 +2131,6 @@ > csem.height(n); > T *vals = new T[n]; > > - Pooma::scheduler().beginGeneration(); > - > i = inter.begin(); > k = 0; > for (j = 0; j < inter.size(); j++) > @@ -2129,13 +2156,19 @@ > else > { > expressionApply(e(*i), RemoteSend(computationalContext[j])); > + // One extra RemoteSend to wait for. Maybe we can combine these > + // iterates, but maybe not. Play safe for now. > + csem.raise_height(1); > + Pooma::Iterate_t *it = new WaitingIterate > + >::Type_t>(e(*i), csem); > + Pooma::scheduler().handOff(it); > } > } > > ++i; > } > > - Pooma::scheduler().endGeneration(); > + // Wait for RemoteSends and Reductions to complete. > csem.wait(); > > if (n > 0) > --- src/Evaluator/Reduction.h 2003-11-21 22:30:38.000000000 +0100 > +++ /home/richard/src/pooma/pooma-mpi3/r2/src/Evaluator/Reduction.h 2004-01-02 00:40:14.000000000 +0100 > @@ -128,10 +128,15 @@ > void evaluate(T &ret, const Op &op, const Expr &e) const > { > typedef typename EvaluatorTag1::Evaluator_t Evaluator_t; > + > + Pooma::scheduler().beginGeneration(); > + > PAssert(checkValidity(e, WrappedInt())); > forEach(e, PerformUpdateTag(), NullCombine()); > Reduction().evaluate(ret, op, e()); > > + Pooma::scheduler().endGeneration(); > + > POOMA_INCREMENT_STATISTIC(NumReductions) > } > }; > @@ -184,12 +189,8 @@ > Pooma::CountingSemaphore csem; > csem.height(1); > > - Pooma::scheduler().beginGeneration(); > - > evaluate(ret, op, e, csem); > > - Pooma::scheduler().endGeneration(); > - > csem.wait(); > } > }; > @@ -237,12 +238,10 @@ > > expressionApply(e, IntersectorTag(inter)); > > - const int n = std::distance(inter.begin(), inter.end()); > + const int n = inter.size(); > Pooma::CountingSemaphore csem; > csem.height(n); > T *vals = new T[n]; > - > - Pooma::scheduler().beginGeneration(); > > typename Inter_t::const_iterator i = inter.begin(); > int j = 0; > @@ -253,8 +252,6 @@ > ++i; ++j; > } > > - Pooma::scheduler().endGeneration(); > - > csem.wait(); > > ret = vals[0]; -- Jeffrey D. 
Oldham oldham at codesourcery.com From oldham at codesourcery.com Tue Jan 13 19:07:35 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 13 Jan 2004 11:07:35 -0800 Subject: [PATCH] Document OpenMP/MPI In-Reply-To: References: Message-ID: <400441F7.8000507@codesourcery.com> Richard Guenther wrote: > Hi! > > While the documentation beyond docs/ is in a rather bad shape, this adds a > paragraph to parallelism.html describing the new --mpi and --openmp modes. > > Ok? Yes. > Thanks, > > Richard. > > > ===== parallelism.html 1.1 vs edited ===== > --- 1.1/r2/docs/parallelism.html Mon May 13 17:47:21 2002 > +++ edited/parallelism.html Fri Jan 9 13:29:04 2004 > @@ -1,4 +1,4 @@ > -

POOMA 2.3 parallel model
> +POOMA post-2.4 parallel model
> >

CONTEXT: In discussing the parallelism models that > POOMA supports, the word `context' has a particular > @@ -40,6 +40,15 @@ > The 2.4 release should address these issues and permit > a parallel model with multithreaded contexts that communicate > with each other through messages. > + > +

After POOMA 2.4 two new models of parallelism are supported. Namely > +use of OpenMP thread level parallelization, if supported by the compiler, > +and the use of an available MPI library such as MPICH or a vendor provided > +implementation. Both models, MPI and OpenMP, may be combined simultaneously > +if the MPI implementation supports this kind of operation. This is especially > +useful for clusters of SMP workstations. Those new modes of operation can > +be specified by the --mpi and --openmp configure switches. > + > >

CHEETAH overview:

> -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Tue Jan 13 19:43:46 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Tue, 13 Jan 2004 20:43:46 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: <400441CF.1030007@codesourcery.com> References: <400441CF.1030007@codesourcery.com> Message-ID: On Tue, 13 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > The following patch is necessary to avoid deadlocks with the MPI > > implementation and multi-patch setups where one context does not > > participate in the reduction. > > > > Fixes failure of array_test_.. - I don't remember - with MPI. > > > > Basically the scenario is that the collective synchronous MPI_Gather is > > called from ReduceOverContexts<> on the non-participating (and thus > > not receiving) contexts while the SendIterates are still in the > > schedulers queue. The calculation participating contexts will wait for > > the ReceiveIterates and patch reductions to complete using the CSem > > forever then. > > > > So the fix is to make the not participating contexts wait on the CSem, > > too, by using a fake write iterate queued after the send iterates which > > will trigger as soon as the send iterates complete. > > Instead of adding fake write iterate can we adjust the MPI_Gather so > non-participating contexts do not participate? The problem is not easy to tackle in MPI_Gather, as collective communication primitives involve all contexts and this can be overcome only by creating a new MPI communicator, which is costly. Also I'm not sure that this will solve the problem at all. The problem is that contexts participating only via sending their data to a remote context (i.e. are participating, but not computing) don't have the counting semaphore to block on (its height is zero for them). So after queuing the send iterates they go straight to the final reduction which is not done via an extra iterate and block there, not firing off the send iterate in the first place. Ugh. Same of course for completely non participating contexts, and even this may be a problem because of old unrun iterates. So in first I thought of creating a DataObject to hold the reduction result, so we can do usual data-flow evaluation on it, and not ignore dependencies on it, as we do now. But this turned out to be more invasive and I didn't have time to complete this. So the fake writing iterate solves the problem (only partly, because, I could imagine for completely non-participating contexts the problem is still there) for me. But anyway, I'm not pushing this very hard now, but it's guaranteed to deadlock at reductions otherwise for MPI for me (so there's a race even in the case of all-participating contexts, or the intersector is doing something strange). Richard. From rguenth at tat.physik.uni-tuebingen.de Wed Jan 14 11:11:56 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 14 Jan 2004 12:11:56 +0100 (CET) Subject: [PATCH] Fix PrintField wrt expressions Message-ID: Hi! The following patch allows to print Fields with expression engines. PrintField uses applyRelations() while it should use a tree-walk with PerformUpdateTag. Ok? Richard. 2004Jan14 Richard Guenther * src/Field/PrintField.h: use forEach(,PerformUpdateTag(),) rather than applyRelations(). 
===== PrintField.h 1.3 vs edited ===== --- 1.3/r2/src/Field/PrintField.h Wed Dec 3 12:30:41 2003 +++ edited/PrintField.h Wed Jan 14 12:01:09 2004 @@ -231,7 +231,7 @@ template void print(S &s, const A &a) const { - a.applyRelations(); + forEach(a, PerformUpdateTag(), NullCombine()); Pooma::blockAndEvaluate(); for (int m = 0; m < a.numMaterials(); m++) From rguenth at tat.physik.uni-tuebingen.de Wed Jan 14 19:00:01 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 14 Jan 2004 20:00:01 +0100 (CET) Subject: [PATCH] Return references in LayoutBase Message-ID: Hi! This patch makes internalGuards(), externalGuards() and blocks() methods of LayoutBase return const references rather than copies. Tested with no regressions on MPI intel linux. Ok? Richard. 2004Jan14 Richard Guenther * src/Layout/LayoutBase.h: return const references in guard and block accessors. ===== LayoutBase.h 1.10 vs edited ===== --- 1.10/r2/src/Layout/LayoutBase.h Wed Jan 7 12:17:55 2004 +++ edited/LayoutBase.h Tue Jan 13 23:37:13 2004 @@ -204,12 +204,12 @@ return all_m[i]->allocated(); } - inline GuardLayers_t internalGuards() const + inline const GuardLayers_t& internalGuards() const { return internalGuards_m; } - inline GuardLayers_t externalGuards() const + inline const GuardLayers_t& externalGuards() const { return externalGuards_m; } @@ -243,7 +243,7 @@ /// number of blocks along each axis. - inline Loc blocks() const { return blocks_m; } + inline const Loc& blocks() const { return blocks_m; } ///@name Guard-cell related functions. /// Iterators into the fill list. These are MultiPatch's interface to From rguenth at tat.physik.uni-tuebingen.de Wed Jan 14 20:50:00 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 14 Jan 2004 21:50:00 +0100 (CET) Subject: [PATCH] Canonicalize makeOwnCopy of multipatch engine Message-ID: Hi! This makes the PAssert(valid) to an if as is done in all other Engines. Fixes a problem noted by a guy that likes to mail me privately in german rather than to the list ;) Ok? Richard. 2004Jan14 Richard Guenther * src/Engine/MultiPatchEngine.cpp: don't assert validity in makeOwnCopy(), but rather ignore the request in the invalid case. Index: src/Engine/MultiPatchEngine.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Engine/MultiPatchEngine.cpp,v retrieving revision 1.53 diff -u -u -r1.53 MultiPatchEngine.cpp --- src/Engine/MultiPatchEngine.cpp 6 May 2003 20:50:39 -0000 1.53 +++ src/Engine/MultiPatchEngine.cpp 14 Jan 2004 20:44:24 -0000 @@ -244,8 +244,7 @@ Engine >:: makeOwnCopy() { - PAssert(data_m.isValid()); - if (data_m.isShared()) { + if (data_m.isValid() && data_m.isShared()) { data_m.makeOwnCopy(); pDirty_m = new bool(*pDirty_m); } From rguenth at tat.physik.uni-tuebingen.de Wed Jan 14 20:56:51 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 14 Jan 2004 21:56:51 +0100 (CET) Subject: [PATCH] Speed up guard update. Message-ID: Hi! This is a refined (aka shorter) patch which unifies the tracking of up-to-date faces and the special optimized copy for MPI. Tested on serial ia32 linux with gcc3.4 with no regression. Ok? Richard. 2004Jan14 Richard Guenther * src/Engine/Intersector.h: track used guard faces. src/Engine/MultiPatchEngine.h: track up-to-dateness per face using a bitmask. src/Engine/Stencil.h: track used guard faces. src/Field/DiffOps/FieldStencil.h: track used guard faces. src/Layout/GridLayout.cpp: record face of guard update. 
src/Layout/LayoutBase.h: add face_m member to guard update struct. src/Layout/UniformGridLayout.cpp: record face of guard update. src/Engine/MultiPatchEngine.cpp: update only not up-to-date and needed faces during fillGuards(). Do manual Send/Receive of the inner guards domain for MPI. --- cvs/r2/src/Engine/Intersector.h 2004-01-14 20:08:06.000000000 +0100 +++ pooma-mpi3/r2/src/Engine/Intersector.h 2004-01-14 20:13:32.000000000 +0100 @@ -129,7 +129,8 @@ } template - bool intersect(const Engine &engine, const GuardLayers &guard) + bool intersect(const Engine &engine, const GuardLayers &guard, + GuardLayers &usedGuards) { CTAssert(Engine::dimensions == Dim); @@ -145,9 +146,7 @@ // If we've seen this ID before, we're done. if (ids_m[i] == layout.ID()) - { return false; - } // If we've seen the base ID before and the base domain is the same // we're done. @@ -157,10 +156,27 @@ { shared(layout.ID(),ids_m[i]); - // In this case we are using the guard cells unless this domain - // is exactly the same as one we've seen before. + // was: return (!sameBaseDomain(i,layout.baseDomain())); - return (!sameBaseDomain(i,layout.baseDomain())); + // We should be able to find out the actual shape of the + // used internal guards here, rather than just returning bool. + // Something like: + + // But what do, if Dim2 > baseDims_m[i]!? + if (baseDims_m[i] < Dim2) + return true; + + bool used = false; + for (int j = 0; j < Dim2; j++) + { + usedGuards.lower(j) = std::max(0, baseDomains_m[i][j].first() - layout.baseDomain()[j].first()); + if (usedGuards.lower(j) != 0) + used = true; + usedGuards.upper(j) = std::max(0, layout.baseDomain()[j].last() - baseDomains_m[i][j].last()); + if (usedGuards.upper(j) != 0) + used = true; + } + return used; } } @@ -437,9 +453,9 @@ template inline - bool intersect(const Engine &l, const GuardLayers &guard) + bool intersect(const Engine &l, const GuardLayers &guard, GuardLayers &usedGuards) { - return (data()->intersect(l,guard)); + return (data()->intersect(l,guard,usedGuards)); } private: --- cvs/r2/src/Engine/MultiPatchEngine.h 2004-01-14 20:11:36.000000000 +0100 +++ pooma-mpi3/r2/src/Engine/MultiPatchEngine.h 2004-01-14 20:13:32.000000000 +0100 @@ -628,13 +628,18 @@ //--------------------------------------------------------------------------- /// Fill the internal guard cells. - inline void fillGuards() const + inline void fillGuards(const GuardLayers& g) const { - fillGuardsHandler(WrappedInt()); + fillGuardsHandler(g, WrappedInt()); + } + + inline void fillGuards() const + { + fillGuards(layout().internalGuards()); } - inline void fillGuardsHandler(const WrappedInt&) const { }; - void fillGuardsHandler(const WrappedInt&) const ; + inline void fillGuardsHandler(const GuardLayers&, const WrappedInt&) const { }; + void fillGuardsHandler(const GuardLayers&, const WrappedInt&) const ; //--------------------------------------------------------------------------- /// Set the internal guard cells to a particular value. @@ -650,14 +655,31 @@ /// Set and get the dirty flag (fillGuards is a no-op unless the /// dirty flag is true). 
+ inline int dirty() const { return *pDirty_m; } + inline void setDirty() const { - *pDirty_m = true; + *pDirty_m = (1<<(Dim*2))-1; + } + + inline void clearDirty(int face = -1) const + { + if (face == -1) + *pDirty_m = 0; + else { + PAssert(face >= 0 && face <= Dim*2-1); + *pDirty_m &= ~(1<= 0 && face <= Dim*2-1); + return *pDirty_m & (1<& g) const + { + baseEngine_m.fillGuards(g); + } + //--------------------------------------------------------------------------- /// Set the internal guard cells to a particular value (default zero) @@ -1217,10 +1244,15 @@ { baseEngine_m.setDirty(); } + + inline void clearDirty(int face=-1) const + { + baseEngine_m.clearDirty(face); + } - inline bool isDirty() const + inline bool isDirty(int face=-1) const { - return baseEngine_m.isDirty(); + return baseEngine_m.isDirty(face); } //--------------------------------------------------------------------------- @@ -1694,12 +1726,13 @@ apply(const Engine > &engine, const ExpressionApply > &tag) { + GuardLayers usedGuards; bool useGuards = tag.tag().intersector_m.intersect(engine, - engine.layout().internalGuards()); + engine.layout().internalGuards(), usedGuards); if (useGuards) - engine.fillGuards(); + engine.fillGuards(usedGuards); return 0; } @@ -1725,13 +1758,14 @@ const ExpressionApply > &tag, const WrappedInt &) { + GuardLayers usedGuards; bool useGuards = tag.tag().intersector_m. intersect(engine, - engine.layout().baseLayout().internalGuards()); + engine.layout().baseLayout().internalGuards(), usedGuards); if (useGuards) - engine.fillGuards(); + engine.fillGuards(usedGuards); return 0; } @@ -1741,7 +1775,7 @@ const ExpressionApply > &tag, const WrappedInt &) { - tag.tag().intersector_m.intersect(engine, GuardLayers()); + tag.tag().intersector_m.intersect(engine); return 0; } }; --- cvs/r2/src/Engine/Stencil.h 2004-01-14 20:08:07.000000000 +0100 +++ pooma-mpi3/r2/src/Engine/Stencil.h 2004-01-14 20:13:32.000000000 +0100 @@ -752,11 +752,14 @@ StencilIntersector(const This_t &model) : domain_m(model.domain_m), + stencilExtent_m(model.stencilExtent_m), intersector_m(model.intersector_m) { } - StencilIntersector(const Interval &domain, const Intersect &intersect) + StencilIntersector(const Interval &domain, const Intersect &intersect, + const GuardLayers &stencilExtent) : domain_m(domain), + stencilExtent_m(stencilExtent), intersector_m(intersect) { } @@ -766,6 +769,7 @@ { intersector_m = model.intersector_m; domain_m = model.domain_m; + stencilExtent_m = model.stencilExtent_m; } return *this; } @@ -807,14 +811,19 @@ template inline - bool intersect(const Engine &engine, const GuardLayers &) + bool intersect(const Engine &engine, const GuardLayers &g, + GuardLayers &usedGuards) { intersect(engine); + // FIXME: accumulate used guards from intersect above and + // stencil extent? I.e. allow Stencil<>(a(i-1)+a(i+1))? 
+ usedGuards = stencilExtent_m; return true; } private: Interval domain_m; + GuardLayers stencilExtent_m; Intersect intersector_m; }; @@ -833,8 +842,14 @@ const ExpressionApply > &tag) { typedef StencilIntersector NewIntersector_t; + GuardLayers stencilExtent; + for (int i=0; i(newIntersector)); --- cvs/r2/src/Field/DiffOps/FieldStencil.h 2004-01-14 20:08:09.000000000 +0100 +++ pooma-mpi3/r2/src/Field/DiffOps/FieldStencil.h 2004-01-14 20:13:32.000000000 +0100 @@ -614,11 +617,13 @@ // Constructors FieldStencilIntersector(const This_t &model) - : domain_m(model.domain_m), intersector_m(model.intersector_m) + : domain_m(model.domain_m), stencilExtent_m(model.stencilExtent_m), + intersector_m(model.intersector_m) { } - FieldStencilIntersector(const Domain_t &dom, const Intersect &intersect) - : domain_m(dom), intersector_m(intersect) + FieldStencilIntersector(const Domain_t &dom, const Intersect &intersect, + const GuardLayers &stencilExtent) + : domain_m(dom), stencilExtent_m(stencilExtent), intersector_m(intersect) { } This_t &operator=(const This_t &model) @@ -626,6 +631,7 @@ if (this != &model) { domain_m = model.domain_m; + stencilExtent_m = model.stencilExtent_m; intersector_m = model.intersector_m; } return *this; @@ -662,9 +668,13 @@ } template - inline bool intersect(const Engine &engine, const GuardLayers &) + inline bool intersect(const Engine &engine, const GuardLayers &, + GuardLayers &usedGuards) { intersect(engine); + // FIXME: accumulate used guards from intersect above and + // stencil extent? I.e. allow Stencil<>(a(i-1)+a(i+1))? + usedGuards = stencilExtent_m; return true; } @@ -672,6 +682,7 @@ Interval domain_m; + GuardLayers stencilExtent_m; Intersect intersector_m; }; @@ -699,8 +710,14 @@ // cells results in an error in the multipatch inode view.) typedef FieldStencilIntersector NewIntersector_t; + GuardLayers stencilExtent; + for (int i=0; i(newIntersector)); --- cvs/r2/src/Layout/GridLayout.cpp 2004-01-14 20:08:10.000000000 +0100 +++ pooma-mpi3/r2/src/Layout/GridLayout.cpp 2004-01-14 20:13:32.000000000 +0100 @@ -429,7 +436,7 @@ // Now, push IDs and source into cache... - this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID)); + this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID, d*2)); } } } @@ -481,7 +488,7 @@ // Now, push IDs and source into cache... - this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID)); + this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID, d*2+1)); } } } --- cvs/r2/src/Layout/LayoutBase.h 2004-01-14 20:08:12.000000000 +0100 +++ pooma-mpi3/r2/src/Layout/LayoutBase.h 2004-01-14 20:13:32.000000000 +0100 @@ -119,8 +121,8 @@ struct GCFillInfo { - GCFillInfo(const Domain_t &dom, int ownedID, int guardID) - : domain_m(dom), ownedID_m(ownedID), guardID_m(guardID) { } + GCFillInfo(const Domain_t &dom, int ownedID, int guardID, int face=-1) + : domain_m(dom), ownedID_m(ownedID), guardID_m(guardID), face_m(face) { } // Get a CW warning about this not having a default constructor // when we instantiate the vector below. 
This never @@ -131,6 +133,7 @@ Domain_t domain_m; // guard layer domain int ownedID_m; // node ID for which domain_m is owned int guardID_m; // node ID for which domain_m is in the guards + int face_m; // destination face of the guard layer (or -1, if unknown) Domain_t & domain() { return domain_m;} int & ownedID() { return ownedID_m;} --- cvs/r2/src/Layout/UniformGridLayout.cpp 2004-01-14 20:08:13.000000000 +0100 +++ pooma-mpi3/r2/src/Layout/UniformGridLayout.cpp 2004-01-14 20:13:32.000000000 +0100 @@ -279,7 +279,7 @@ //----------------------------------------------------------------------------- // // template -// void UniformGridLayout::calcGCFillList() +// void UniformGridLayoutData::calcGCFillList() // // Calculates the cached information needed by MultiPatch Engine to // fill the guard cells. @@ -370,7 +370,7 @@ this->all_m[sourceID]->context() == Pooma::context() || this->all_m[destID]->context() == Pooma::context() ) - this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID)); + this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID,d*2)); } } @@ -417,7 +417,7 @@ this->all_m[sourceID]->context() == Pooma::context() || this->all_m[destID]->context() == Pooma::context() ) - this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID)); + this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID,d*2+1)); } } } --- cvs/r2/src/Engine/MultiPatchEngine.cpp 2004-01-14 20:11:34.000000000 +0100 +++ pooma-mpi3/r2/src/Engine/MultiPatchEngine.cpp 2004-01-14 20:23:23.000000000 +0100 @@ -34,6 +34,7 @@ #include "Engine/CompressedFraction.h" #include "Array/Array.h" #include "Tulip/ReduceOverContexts.h" +#include "Tulip/SendReceive.h" #include "Threads/PoomaCSem.h" #include "Domain/IteratorPairDomain.h" @@ -77,16 +78,18 @@ Engine(const Layout_t &layout) : layout_m(layout), data_m(layout.sizeGlobal()), - pDirty_m(new bool(true)) + pDirty_m(new int) { typedef typename Layout_t::Value_t Node_t; + setDirty(); + // check for correct match of PatchTag and the mapper used to make the // layout. // THIS IS A HACK! we test on the context of the first patch, and if it // is -1, we have a Layout made with the LocalMapper. -#if POOMA_CHEETAH +#if POOMA_MESSAGING if( layout_m.nodeListGlobal().size() > 0) { @@ -247,7 +250,7 @@ PAssert(data_m.isValid()); if (data_m.isShared()) { data_m.makeOwnCopy(); - pDirty_m = new bool(*pDirty_m); + pDirty_m = new int(*pDirty_m); } return *this; @@ -261,45 +264,88 @@ // //----------------------------------------------------------------------------- +/// Guard layer assign between non-remote engines, just use the +/// ET mechanisms + +template +static inline +void simpleAssign(const Array& lhs, + const Array& rhs, + const Interval& domain) +{ + lhs(domain) = rhs(domain); +} + +/// Guard layer assign between remote engines, use Send/Receive directly +/// to avoid one extra copy of the data. 
+ +template +static inline +void simpleAssign(const Array >& lhs, + const Array >& rhs, + const Interval& domain) +{ + if (lhs.engine().owningContext() == rhs.engine().owningContext()) + lhs(domain) = rhs(domain); + else { + typedef typename NewEngine, Interval >::Type_t ViewEngine_t; + if (lhs.engine().engineIsLocal()) + Receive::receive(ViewEngine_t(lhs.engine().localEngine(), domain), + rhs.engine().owningContext()); + else if (rhs.engine().engineIsLocal()) + SendReceive::send(ViewEngine_t(rhs.engine().localEngine(), domain), + lhs.engine().owningContext()); + } +} + template void Engine >:: -fillGuardsHandler(const WrappedInt &) const +fillGuardsHandler(const GuardLayers& g, const WrappedInt &) const { if (!isDirty()) return; - -#if POOMA_PURIFY - - // This is here to remove spurious UMRs that result when un-initialized - // guards are copied in the following loop. All of the unitialized data - // is ultimately overwritten with good data, so I don't see why purify - // calls these UMRs in stead of unitialized memory copies, but it does. - // I don't do this in general since it would be slow and since T(0) is - // not generally valid. This does mean that fillGuards() will fail - // with purify for types that do not know what to do with T(0). - - setGuards(T(0)); - -#endif + int updated = 0; typename Layout_t::FillIterator_t p = layout_m.beginFillList(); - + while (p != layout_m.endFillList()) { int src = p->ownedID_m; int dest = p->guardID_m; - // Create patch arrays that see the entire patch: + // Skip face, if not dirty. + + if (isDirty(p->face_m)) { + + // Check, if the p->domain_m is a guard which matches the + // needed guard g. + + int d = p->face_m/2; + int guardSizeNeeded = p->face_m & 1 ? g.upper(d) : g.lower(d); + if (!(p->face_m != -1 + && guardSizeNeeded == 0)) { + + // Create patch arrays that see the entire patch: - Array lhs(data()[dest]), rhs(data()[src]); + Array lhs(data()[dest]), rhs(data()[src]); - // Now do assignment from the subdomains. + // Now do assignment from the subdomains. +#if POOMA_MPI + simpleAssign(lhs, rhs, p->domain_m); +#else + lhs(p->domain_m) = rhs(p->domain_m); +#endif + + // Mark up-to-date. + updated |= 1<face_m; + + } + + } - lhs(p->domain_m) = rhs(p->domain_m); - ++p; } - - *pDirty_m = false; + + *pDirty_m &= ~updated; } @@ -331,7 +377,7 @@ ++p; } - *pDirty_m = true; + setDirty(); } @@ -366,7 +412,7 @@ ++p; } - *pDirty_m = true; + setDirty(); } From oldham at codesourcery.com Thu Jan 15 21:16:11 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 13:16:11 -0800 Subject: [PATCH] Return references in LayoutBase In-Reply-To: References: Message-ID: <4007031B.2040205@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch makes internalGuards(), externalGuards() and blocks() methods > of LayoutBase return const references rather than copies. > > Tested with no regressions on MPI intel linux. > > Ok? > Yes. > Richard. > > > 2004Jan14 Richard Guenther > > * src/Layout/LayoutBase.h: return const references in > guard and block accessors. 
> > ===== LayoutBase.h 1.10 vs edited ===== > --- 1.10/r2/src/Layout/LayoutBase.h Wed Jan 7 12:17:55 2004 > +++ edited/LayoutBase.h Tue Jan 13 23:37:13 2004 > @@ -204,12 +204,12 @@ > return all_m[i]->allocated(); > } > > - inline GuardLayers_t internalGuards() const > + inline const GuardLayers_t& internalGuards() const > { > return internalGuards_m; > } > > - inline GuardLayers_t externalGuards() const > + inline const GuardLayers_t& externalGuards() const > { > return externalGuards_m; > } > @@ -243,7 +243,7 @@ > > /// number of blocks along each axis. > > - inline Loc blocks() const { return blocks_m; } > + inline const Loc& blocks() const { return blocks_m; } > > ///@name Guard-cell related functions. > /// Iterators into the fill list. These are MultiPatch's interface to -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 15 21:17:51 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 13:17:51 -0800 Subject: [PATCH] Fix PrintField wrt expressions In-Reply-To: References: Message-ID: <4007037F.3080103@codesourcery.com> Richard Guenther wrote: > Hi! > > The following patch allows to print Fields with expression engines. > PrintField uses applyRelations() while it should use a tree-walk with > PerformUpdateTag. So, with this change, the field will be guaranteed to be updated by any relations that can change the field? > Ok? > > Richard. > > > 2004Jan14 Richard Guenther > > * src/Field/PrintField.h: use forEach(,PerformUpdateTag(),) rather > than applyRelations(). > > ===== PrintField.h 1.3 vs edited ===== > --- 1.3/r2/src/Field/PrintField.h Wed Dec 3 12:30:41 2003 > +++ edited/PrintField.h Wed Jan 14 12:01:09 2004 > @@ -231,7 +231,7 @@ > template > void print(S &s, const A &a) const > { > - a.applyRelations(); > + forEach(a, PerformUpdateTag(), NullCombine()); > Pooma::blockAndEvaluate(); > > for (int m = 0; m < a.numMaterials(); m++) -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 02:55:08 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 18:55:08 -0800 Subject: [PATCH] Canonicalize makeOwnCopy of multipatch engine In-Reply-To: References: Message-ID: <4007528C.3030701@codesourcery.com> Richard Guenther wrote: > Hi! > > This makes the PAssert(valid) to an if as is done in all other Engines. > Fixes a problem noted by a guy that likes to mail me privately in german > rather than to the list ;) > > Ok? ja. ;) > Richard. > > > 2004Jan14 Richard Guenther > > * src/Engine/MultiPatchEngine.cpp: don't assert validity > in makeOwnCopy(), but rather ignore the request in the > invalid case. > > Index: src/Engine/MultiPatchEngine.cpp > =================================================================== > RCS file: /home/pooma/Repository/r2/src/Engine/MultiPatchEngine.cpp,v > retrieving revision 1.53 > diff -u -u -r1.53 MultiPatchEngine.cpp > --- src/Engine/MultiPatchEngine.cpp 6 May 2003 20:50:39 -0000 1.53 > +++ src/Engine/MultiPatchEngine.cpp 14 Jan 2004 20:44:24 -0000 > @@ -244,8 +244,7 @@ > Engine >:: > makeOwnCopy() > { > - PAssert(data_m.isValid()); > - if (data_m.isShared()) { > + if (data_m.isValid() && data_m.isShared()) { > data_m.makeOwnCopy(); > pDirty_m = new bool(*pDirty_m); > } -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 02:58:21 2004 From: oldham at codesourcery.com (Jeffrey D. 
Oldham) Date: Thu, 15 Jan 2004 18:58:21 -0800 Subject: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: References: <400441CF.1030007@codesourcery.com> Message-ID: <4007534D.90101@codesourcery.com> Richard Guenther wrote: > On Tue, 13 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>The following patch is necessary to avoid deadlocks with the MPI >>>implementation and multi-patch setups where one context does not >>>participate in the reduction. >>> >>>Fixes failure of array_test_.. - I don't remember - with MPI. >>> >>>Basically the scenario is that the collective synchronous MPI_Gather is >>>called from ReduceOverContexts<> on the non-participating (and thus >>>not receiving) contexts while the SendIterates are still in the >>>schedulers queue. The calculation participating contexts will wait for >>>the ReceiveIterates and patch reductions to complete using the CSem >>>forever then. >>> >>>So the fix is to make the not participating contexts wait on the CSem, >>>too, by using a fake write iterate queued after the send iterates which >>>will trigger as soon as the send iterates complete. >> >>Instead of adding fake write iterate can we adjust the MPI_Gather so >>non-participating contexts do not participate? > > > The problem is not easy to tackle in MPI_Gather, as collective > communication primitives involve all contexts and this can be overcome > only by creating a new MPI communicator, which is costly. Also I'm not > sure that this will solve the problem at all. > > The problem is that contexts participating only via sending their data to > a remote context (i.e. are participating, but not computing) don't have > the counting semaphore to block on (its height is zero for them). So > after queuing the send iterates they go straight to the final reduction > which is not done via an extra iterate and block there, not firing off the > send iterate in the first place. Ugh. Same of course for completely non > participating contexts, and even this may be a problem because of old > unrun iterates. > > So in first I thought of creating a DataObject to hold the reduction > result, so we can do usual data-flow evaluation on it, and not ignore > dependencies on it, as we do now. But this turned out to be more invasive > and I didn't have time to complete this. > > So the fake writing iterate solves the problem (only partly, because, I > could imagine for completely non-participating contexts the problem is > still there) for me. > > But anyway, I'm not pushing this very hard now, but it's guaranteed to > deadlock at reductions otherwise for MPI for me (so there's a race even > in the case of all-participating contexts, or the intersector is doing > something strange). > > Richard. I appreciate your finding the difficulty and your taking the time to explain the problem. I am reluctant to add code that is known to be broken for some situations. Is there a way to mark the code so 1) the known brokenness is marked and 2) the program asks sensibly when the brokenness is experienced? -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 03:04:14 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 19:04:14 -0800 Subject: [PATCH] Speed up guard update. In-Reply-To: References: Message-ID: <400754AE.1070102@codesourcery.com> Richard Guenther wrote: > Hi! 
> > This is a refined (aka shorter) patch which unifies the tracking of > up-to-date faces and the special optimized copy for MPI. > > Tested on serial ia32 linux with gcc3.4 with no regression. > > Ok? Yes, assuming the user interface did not change. It looks like GCFillInfo's interface changed but existing code will still run because a parameter with a default argument was added. > Richard. > > > 2004Jan14 Richard Guenther > > * src/Engine/Intersector.h: track used guard faces. > src/Engine/MultiPatchEngine.h: track up-to-dateness per > face using a bitmask. > src/Engine/Stencil.h: track used guard faces. > src/Field/DiffOps/FieldStencil.h: track used guard faces. > src/Layout/GridLayout.cpp: record face of guard update. > src/Layout/LayoutBase.h: add face_m member to guard update > struct. > src/Layout/UniformGridLayout.cpp: record face of guard update. > src/Engine/MultiPatchEngine.cpp: update only not up-to-date > and needed faces during fillGuards(). Do manual Send/Receive > of the inner guards domain for MPI. > > --- cvs/r2/src/Engine/Intersector.h 2004-01-14 20:08:06.000000000 +0100 > +++ pooma-mpi3/r2/src/Engine/Intersector.h 2004-01-14 20:13:32.000000000 +0100 > @@ -129,7 +129,8 @@ > } > > template > - bool intersect(const Engine &engine, const GuardLayers &guard) > + bool intersect(const Engine &engine, const GuardLayers &guard, > + GuardLayers &usedGuards) > { > CTAssert(Engine::dimensions == Dim); > > @@ -145,9 +146,7 @@ > // If we've seen this ID before, we're done. > > if (ids_m[i] == layout.ID()) > - { > return false; > - } > > // If we've seen the base ID before and the base domain is the same > // we're done. > @@ -157,10 +156,27 @@ > { > shared(layout.ID(),ids_m[i]); > > - // In this case we are using the guard cells unless this domain > - // is exactly the same as one we've seen before. > + // was: return (!sameBaseDomain(i,layout.baseDomain())); > > - return (!sameBaseDomain(i,layout.baseDomain())); > + // We should be able to find out the actual shape of the > + // used internal guards here, rather than just returning bool. > + // Something like: > + > + // But what do, if Dim2 > baseDims_m[i]!? > + if (baseDims_m[i] < Dim2) > + return true; > + > + bool used = false; > + for (int j = 0; j < Dim2; j++) > + { > + usedGuards.lower(j) = std::max(0, baseDomains_m[i][j].first() - layout.baseDomain()[j].first()); > + if (usedGuards.lower(j) != 0) > + used = true; > + usedGuards.upper(j) = std::max(0, layout.baseDomain()[j].last() - baseDomains_m[i][j].last()); > + if (usedGuards.upper(j) != 0) > + used = true; > + } > + return used; > } > } > > @@ -437,9 +453,9 @@ > > template > inline > - bool intersect(const Engine &l, const GuardLayers &guard) > + bool intersect(const Engine &l, const GuardLayers &guard, GuardLayers &usedGuards) > { > - return (data()->intersect(l,guard)); > + return (data()->intersect(l,guard,usedGuards)); > } > > private: > --- cvs/r2/src/Engine/MultiPatchEngine.h 2004-01-14 20:11:36.000000000 +0100 > +++ pooma-mpi3/r2/src/Engine/MultiPatchEngine.h 2004-01-14 20:13:32.000000000 +0100 > @@ -628,13 +628,18 @@ > //--------------------------------------------------------------------------- > /// Fill the internal guard cells. 
> > - inline void fillGuards() const > + inline void fillGuards(const GuardLayers& g) const > { > - fillGuardsHandler(WrappedInt()); > + fillGuardsHandler(g, WrappedInt()); > + } > + > + inline void fillGuards() const > + { > + fillGuards(layout().internalGuards()); > } > > - inline void fillGuardsHandler(const WrappedInt&) const { }; > - void fillGuardsHandler(const WrappedInt&) const ; > + inline void fillGuardsHandler(const GuardLayers&, const WrappedInt&) const { }; > + void fillGuardsHandler(const GuardLayers&, const WrappedInt&) const ; > > //--------------------------------------------------------------------------- > /// Set the internal guard cells to a particular value. > @@ -650,14 +655,31 @@ > /// Set and get the dirty flag (fillGuards is a no-op unless the > /// dirty flag is true). > > + inline int dirty() const { return *pDirty_m; } > + > inline void setDirty() const > { > - *pDirty_m = true; > + *pDirty_m = (1<<(Dim*2))-1; > + } > + > + inline void clearDirty(int face = -1) const > + { > + if (face == -1) > + *pDirty_m = 0; > + else { > + PAssert(face >= 0 && face <= Dim*2-1); > + *pDirty_m &= ~(1< + } > } > > - inline bool isDirty() const > + inline bool isDirty(int face = -1) const > { > - return *pDirty_m; > + if (face == -1) > + return *pDirty_m != 0; > + else { > + PAssert(face >= 0 && face <= Dim*2-1); > + return *pDirty_m & (1< + } > } > > //============================================================ > @@ -874,7 +896,7 @@ > /// must share the same flag. We use the reference count in > /// data_m to decide whether to clean this up. > > - bool *pDirty_m; > + int *pDirty_m; > }; > > > @@ -1193,6 +1215,11 @@ > baseEngine_m.fillGuards(); > } > > + inline void fillGuards(const GuardLayers& g) const > + { > + baseEngine_m.fillGuards(g); > + } > + > //--------------------------------------------------------------------------- > /// Set the internal guard cells to a particular value (default zero) > > @@ -1217,10 +1244,15 @@ > { > baseEngine_m.setDirty(); > } > + > + inline void clearDirty(int face=-1) const > + { > + baseEngine_m.clearDirty(face); > + } > > - inline bool isDirty() const > + inline bool isDirty(int face=-1) const > { > - return baseEngine_m.isDirty(); > + return baseEngine_m.isDirty(face); > } > > //--------------------------------------------------------------------------- > @@ -1694,12 +1726,13 @@ > apply(const Engine > &engine, > const ExpressionApply > &tag) > { > + GuardLayers usedGuards; > bool useGuards = > tag.tag().intersector_m.intersect(engine, > - engine.layout().internalGuards()); > + engine.layout().internalGuards(), usedGuards); > > if (useGuards) > - engine.fillGuards(); > + engine.fillGuards(usedGuards); > > return 0; > } > @@ -1725,13 +1758,14 @@ > const ExpressionApply > &tag, > const WrappedInt &) > { > + GuardLayers usedGuards; > bool useGuards = > tag.tag().intersector_m. 
> intersect(engine, > - engine.layout().baseLayout().internalGuards()); > + engine.layout().baseLayout().internalGuards(), usedGuards); > > if (useGuards) > - engine.fillGuards(); > + engine.fillGuards(usedGuards); > > return 0; > } > @@ -1741,7 +1775,7 @@ > const ExpressionApply > &tag, > const WrappedInt &) > { > - tag.tag().intersector_m.intersect(engine, GuardLayers()); > + tag.tag().intersector_m.intersect(engine); > return 0; > } > }; > --- cvs/r2/src/Engine/Stencil.h 2004-01-14 20:08:07.000000000 +0100 > +++ pooma-mpi3/r2/src/Engine/Stencil.h 2004-01-14 20:13:32.000000000 +0100 > @@ -752,11 +752,14 @@ > > StencilIntersector(const This_t &model) > : domain_m(model.domain_m), > + stencilExtent_m(model.stencilExtent_m), > intersector_m(model.intersector_m) > { } > > - StencilIntersector(const Interval &domain, const Intersect &intersect) > + StencilIntersector(const Interval &domain, const Intersect &intersect, > + const GuardLayers &stencilExtent) > : domain_m(domain), > + stencilExtent_m(stencilExtent), > intersector_m(intersect) > { } > > @@ -766,6 +769,7 @@ > { > intersector_m = model.intersector_m; > domain_m = model.domain_m; > + stencilExtent_m = model.stencilExtent_m; > } > return *this; > } > @@ -807,14 +811,19 @@ > > template > inline > - bool intersect(const Engine &engine, const GuardLayers &) > + bool intersect(const Engine &engine, const GuardLayers &g, > + GuardLayers &usedGuards) > { > intersect(engine); > + // FIXME: accumulate used guards from intersect above and > + // stencil extent? I.e. allow Stencil<>(a(i-1)+a(i+1))? > + usedGuards = stencilExtent_m; > return true; > } > > private: > Interval domain_m; > + GuardLayers stencilExtent_m; > Intersect intersector_m; > }; > > @@ -833,8 +842,14 @@ > const ExpressionApply > &tag) > { > typedef StencilIntersector NewIntersector_t; > + GuardLayers stencilExtent; > + for (int i=0; i + stencilExtent.lower(i) = engine.function().lowerExtent(i); > + stencilExtent.upper(i) = engine.function().upperExtent(i); > + } > NewIntersector_t newIntersector(engine.intersectDomain(), > - tag.tag().intersector_m); > + tag.tag().intersector_m, > + stencilExtent); > > expressionApply(engine.expression(), > IntersectorTag(newIntersector)); > --- cvs/r2/src/Field/DiffOps/FieldStencil.h 2004-01-14 20:08:09.000000000 +0100 > +++ pooma-mpi3/r2/src/Field/DiffOps/FieldStencil.h 2004-01-14 20:13:32.000000000 +0100 > @@ -614,11 +617,13 @@ > // Constructors > > FieldStencilIntersector(const This_t &model) > - : domain_m(model.domain_m), intersector_m(model.intersector_m) > + : domain_m(model.domain_m), stencilExtent_m(model.stencilExtent_m), > + intersector_m(model.intersector_m) > { } > > - FieldStencilIntersector(const Domain_t &dom, const Intersect &intersect) > - : domain_m(dom), intersector_m(intersect) > + FieldStencilIntersector(const Domain_t &dom, const Intersect &intersect, > + const GuardLayers &stencilExtent) > + : domain_m(dom), stencilExtent_m(stencilExtent), intersector_m(intersect) > { } > > This_t &operator=(const This_t &model) > @@ -626,6 +631,7 @@ > if (this != &model) > { > domain_m = model.domain_m; > + stencilExtent_m = model.stencilExtent_m; > intersector_m = model.intersector_m; > } > return *this; > @@ -662,9 +668,13 @@ > } > > template > - inline bool intersect(const Engine &engine, const GuardLayers &) > + inline bool intersect(const Engine &engine, const GuardLayers &, > + GuardLayers &usedGuards) > { > intersect(engine); > + // FIXME: accumulate used guards from intersect above and > + // stencil extent? I.e. 
allow Stencil<>(a(i-1)+a(i+1))? > + usedGuards = stencilExtent_m; > return true; > } > > @@ -672,6 +682,7 @@ > > > Interval domain_m; > + GuardLayers stencilExtent_m; > Intersect intersector_m; > }; > > @@ -699,8 +710,14 @@ > // cells results in an error in the multipatch inode view.) > > typedef FieldStencilIntersector NewIntersector_t; > + GuardLayers stencilExtent; > + for (int i=0; i + stencilExtent.lower(i) = engine.functor().lowerExtent(i); > + stencilExtent.upper(i) = engine.functor().upperExtent(i); > + } > NewIntersector_t newIntersector(engine.intersectDomain(), > - tag.tag().intersector_m); > + tag.tag().intersector_m, > + stencilExtent); > > expressionApply(engine.field(), > IntersectorTag(newIntersector)); > --- cvs/r2/src/Layout/GridLayout.cpp 2004-01-14 20:08:10.000000000 +0100 > +++ pooma-mpi3/r2/src/Layout/GridLayout.cpp 2004-01-14 20:13:32.000000000 +0100 > @@ -429,7 +436,7 @@ > > // Now, push IDs and source into cache... > > - this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID)); > + this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID, d*2)); > } > } > } > @@ -481,7 +488,7 @@ > > // Now, push IDs and source into cache... > > - this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID)); > + this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID, d*2+1)); > } > } > } > --- cvs/r2/src/Layout/LayoutBase.h 2004-01-14 20:08:12.000000000 +0100 > +++ pooma-mpi3/r2/src/Layout/LayoutBase.h 2004-01-14 20:13:32.000000000 +0100 > @@ -119,8 +121,8 @@ > > struct GCFillInfo > { > - GCFillInfo(const Domain_t &dom, int ownedID, int guardID) > - : domain_m(dom), ownedID_m(ownedID), guardID_m(guardID) { } > + GCFillInfo(const Domain_t &dom, int ownedID, int guardID, int face=-1) > + : domain_m(dom), ownedID_m(ownedID), guardID_m(guardID), face_m(face) { } > > // Get a CW warning about this not having a default constructor > // when we instantiate the vector below. This never > @@ -131,6 +133,7 @@ > Domain_t domain_m; // guard layer domain > int ownedID_m; // node ID for which domain_m is owned > int guardID_m; // node ID for which domain_m is in the guards > + int face_m; // destination face of the guard layer (or -1, if unknown) > > Domain_t & domain() { return domain_m;} > int & ownedID() { return ownedID_m;} > --- cvs/r2/src/Layout/UniformGridLayout.cpp 2004-01-14 20:08:13.000000000 +0100 > +++ pooma-mpi3/r2/src/Layout/UniformGridLayout.cpp 2004-01-14 20:13:32.000000000 +0100 > @@ -279,7 +279,7 @@ > //----------------------------------------------------------------------------- > // > // template > -// void UniformGridLayout::calcGCFillList() > +// void UniformGridLayoutData::calcGCFillList() > // > // Calculates the cached information needed by MultiPatch Engine to > // fill the guard cells. 
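(Aside, not part of the quoted patch: the face numbering used throughout this change is 2*d for the lower face and 2*d+1 for the upper face of dimension d, which is why the GridLayout.cpp and UniformGridLayout.cpp hunks record d*2 and d*2+1 in GCFillInfo, and the engine then keeps one dirty bit per face. A minimal stand-alone sketch of that bookkeeping, with illustrative names only, not the actual MultiPatchEngine code:)

    #include <cassert>
    #include <iostream>

    // Illustrative sketch of per-face dirty tracking with a bitmask.
    template <int Dim>
    struct FaceDirtySketch
    {
      int bits;                                          // one bit per face

      FaceDirtySketch() : bits((1 << (Dim * 2)) - 1) {}  // all faces start dirty

      // face index: 2*d for the lower face of dimension d, 2*d+1 for the upper
      static int face(int d, bool upper) { return 2 * d + (upper ? 1 : 0); }

      bool isDirty(int f) const
      {
        assert(f >= 0 && f <= Dim * 2 - 1);
        return (bits & (1 << f)) != 0;
      }

      void markClean(int f)
      {
        assert(f >= 0 && f <= Dim * 2 - 1);
        bits &= ~(1 << f);
      }
    };

    int main()
    {
      FaceDirtySketch<2> s;                            // 2D: faces 0..3, all dirty
      s.markClean(FaceDirtySketch<2>::face(1, true));  // clean the upper y face (index 3)
      std::cout << s.isDirty(3) << " " << s.isDirty(0) << "\n";  // prints "0 1"
      return 0;
    }

(End of aside; the quoted patch continues below.)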
> @@ -370,7 +370,7 @@ > this->all_m[sourceID]->context() == Pooma::context() || > this->all_m[destID]->context() == Pooma::context() > ) > - this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID)); > + this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID,d*2)); > } > } > > @@ -417,7 +417,7 @@ > this->all_m[sourceID]->context() == Pooma::context() || > this->all_m[destID]->context() == Pooma::context() > ) > - this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID)); > + this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID,d*2+1)); > } > } > } > --- cvs/r2/src/Engine/MultiPatchEngine.cpp 2004-01-14 20:11:34.000000000 +0100 > +++ pooma-mpi3/r2/src/Engine/MultiPatchEngine.cpp 2004-01-14 20:23:23.000000000 +0100 > @@ -34,6 +34,7 @@ > #include "Engine/CompressedFraction.h" > #include "Array/Array.h" > #include "Tulip/ReduceOverContexts.h" > +#include "Tulip/SendReceive.h" > #include "Threads/PoomaCSem.h" > #include "Domain/IteratorPairDomain.h" > > @@ -77,16 +78,18 @@ > Engine(const Layout_t &layout) > : layout_m(layout), > data_m(layout.sizeGlobal()), > - pDirty_m(new bool(true)) > + pDirty_m(new int) > { > typedef typename Layout_t::Value_t Node_t; > > + setDirty(); > + > // check for correct match of PatchTag and the mapper used to make the > // layout. > // THIS IS A HACK! we test on the context of the first patch, and if it > // is -1, we have a Layout made with the LocalMapper. > > -#if POOMA_CHEETAH > +#if POOMA_MESSAGING > > if( layout_m.nodeListGlobal().size() > 0) > { > @@ -247,7 +250,7 @@ > PAssert(data_m.isValid()); > if (data_m.isShared()) { > data_m.makeOwnCopy(); > - pDirty_m = new bool(*pDirty_m); > + pDirty_m = new int(*pDirty_m); > } > > return *this; > @@ -261,45 +264,88 @@ > // > //----------------------------------------------------------------------------- > > +/// Guard layer assign between non-remote engines, just use the > +/// ET mechanisms > + > +template > +static inline > +void simpleAssign(const Array& lhs, > + const Array& rhs, > + const Interval& domain) > +{ > + lhs(domain) = rhs(domain); > +} > + > +/// Guard layer assign between remote engines, use Send/Receive directly > +/// to avoid one extra copy of the data. > + > +template > +static inline > +void simpleAssign(const Array >& lhs, > + const Array >& rhs, > + const Interval& domain) > +{ > + if (lhs.engine().owningContext() == rhs.engine().owningContext()) > + lhs(domain) = rhs(domain); > + else { > + typedef typename NewEngine, Interval >::Type_t ViewEngine_t; > + if (lhs.engine().engineIsLocal()) > + Receive::receive(ViewEngine_t(lhs.engine().localEngine(), domain), > + rhs.engine().owningContext()); > + else if (rhs.engine().engineIsLocal()) > + SendReceive::send(ViewEngine_t(rhs.engine().localEngine(), domain), > + lhs.engine().owningContext()); > + } > +} > + > template > void Engine >:: > -fillGuardsHandler(const WrappedInt &) const > +fillGuardsHandler(const GuardLayers& g, const WrappedInt &) const > { > if (!isDirty()) return; > - > -#if POOMA_PURIFY > - > - // This is here to remove spurious UMRs that result when un-initialized > - // guards are copied in the following loop. All of the unitialized data > - // is ultimately overwritten with good data, so I don't see why purify > - // calls these UMRs in stead of unitialized memory copies, but it does. > - // I don't do this in general since it would be slow and since T(0) is > - // not generally valid. 
This does mean that fillGuards() will fail > - // with purify for types that do not know what to do with T(0). > - > - setGuards(T(0)); > - > -#endif > > + int updated = 0; > typename Layout_t::FillIterator_t p = layout_m.beginFillList(); > - > + > while (p != layout_m.endFillList()) > { > int src = p->ownedID_m; > int dest = p->guardID_m; > > - // Create patch arrays that see the entire patch: > + // Skip face, if not dirty. > + > + if (isDirty(p->face_m)) { > + > + // Check, if the p->domain_m is a guard which matches the > + // needed guard g. > + > + int d = p->face_m/2; > + int guardSizeNeeded = p->face_m & 1 ? g.upper(d) : g.lower(d); > + if (!(p->face_m != -1 > + && guardSizeNeeded == 0)) { > + > + // Create patch arrays that see the entire patch: > > - Array lhs(data()[dest]), rhs(data()[src]); > + Array lhs(data()[dest]), rhs(data()[src]); > > - // Now do assignment from the subdomains. > + // Now do assignment from the subdomains. > +#if POOMA_MPI > + simpleAssign(lhs, rhs, p->domain_m); > +#else > + lhs(p->domain_m) = rhs(p->domain_m); > +#endif > + > + // Mark up-to-date. > + updated |= 1<face_m; > + > + } > + > + } > > - lhs(p->domain_m) = rhs(p->domain_m); > - > ++p; > } > - > - *pDirty_m = false; > + > + *pDirty_m &= ~updated; > } > > > @@ -331,7 +377,7 @@ > ++p; > } > > - *pDirty_m = true; > + setDirty(); > } > > > @@ -366,7 +412,7 @@ > ++p; > } > > - *pDirty_m = true; > + setDirty(); > } > > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 03:04:54 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 19:04:54 -0800 Subject: [pooma-dev] Re: [PATCH] Optimize guard update copy In-Reply-To: References: <3FF45420.3090106@codesourcery.com> Message-ID: <400754D6.9070606@codesourcery.com> Richard Guenther wrote: > On Thu, 1 Jan 2004, Richard Guenther wrote: > > >>On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: >> >> >>>Richard Guenther wrote: >>> >>>>Hi! >>>> >>>>This patch removes number four of the copies done for guard update. >>>>Basically, additionally to the three copies I mentioned in the previous >>>>mail, we're doing one extra during the RemoteView expressionApply of the >>>>data-parallel assignment we're doing for the guard domains. Ugh. Fixed by >>>>manually sending/receiving from/to the views. Doesn't work for Cheetah, >>>>so conditionalized on POOMA_MPI. >>> >>>What breaks for Cheetah? >> >>I don't remember... I can try again next week. > > > I tried again, and with Cheetah this is really a mess, because Cheetah > cannot receive into a View, so we need to pass a Brick as IncomingView and > a BrickView as View to SendReceive::receive() and this > breaks all over the place in the Cheetah library... > > So, unfortunately, no - this doesn't work for Cheetah - at least not with > major surgery inside the Cheetah library (which I would rather drop than > fix). > > So, is the patch ok as it is (affecting only POOMA_MPI)? OK. > Thanks, > > Richard. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Fri Jan 16 08:55:04 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 16 Jan 2004 09:55:04 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Fix PrintField wrt expressions In-Reply-To: <4007037F.3080103@codesourcery.com> References: <4007037F.3080103@codesourcery.com> Message-ID: On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > The following patch allows to print Fields with expression engines. 
> > PrintField uses applyRelations() while it should use a tree-walk with > > PerformUpdateTag. > > So, with this change, the field will be guaranteed to be updated by any > relations that can change the field? If the field is dirty, yes. Behavoir is exactly the same as before, just the case tester.out() << f + 1.0 << std::endl; didn't work before, because the FieldEngine doesn't have the data() method applyRelations is trying to access. The forEach() magically skips the ExpressionEngines and applies to the leafs only. Richard. > > 2004Jan14 Richard Guenther > > > > * src/Field/PrintField.h: use forEach(,PerformUpdateTag(),) rather > > than applyRelations(). > > > > ===== PrintField.h 1.3 vs edited ===== > > --- 1.3/r2/src/Field/PrintField.h Wed Dec 3 12:30:41 2003 > > +++ edited/PrintField.h Wed Jan 14 12:01:09 2004 > > @@ -231,7 +231,7 @@ > > template > > void print(S &s, const A &a) const > > { > > - a.applyRelations(); > > + forEach(a, PerformUpdateTag(), NullCombine()); > > Pooma::blockAndEvaluate(); > > > > for (int m = 0; m < a.numMaterials(); m++) > > > -- > Jeffrey D. Oldham > oldham at codesourcery.com > -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ From rguenth at tat.physik.uni-tuebingen.de Fri Jan 16 09:00:40 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 16 Jan 2004 10:00:40 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Speed up guard update. In-Reply-To: <400754AE.1070102@codesourcery.com> References: <400754AE.1070102@codesourcery.com> Message-ID: On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This is a refined (aka shorter) patch which unifies the tracking of > > up-to-date faces and the special optimized copy for MPI. > > > > Tested on serial ia32 linux with gcc3.4 with no regression. > > > > Ok? > > Yes, assuming the user interface did not change. It looks like > GCFillInfo's interface changed but existing code will still run because > a parameter with a default argument was added. It depends on where you draw the line between "user" interface and internal interface. I'd consider GCFillInfo's internal interface, only the fillGuards(), setDirty(), etc. methods from the MultiPatchEngine I'd consider "user" interface - and these ones will still work with old code. I also changed the various intersectors to take an additional parameter - old code will break if it used this interface (but I consider this not user interface either). Richard. -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ From rguenth at tat.physik.uni-tuebingen.de Fri Jan 16 09:07:04 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 16 Jan 2004 10:07:04 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: <4007534D.90101@codesourcery.com> References: <400441CF.1030007@codesourcery.com> <4007534D.90101@codesourcery.com> Message-ID: On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > I appreciate your finding the difficulty and your taking the time to > explain the problem. I am reluctant to add code that is known to be > broken for some situations. Is there a way to mark the code so 1) the > known brokenness is marked and 2) the program asks sensibly when the > brokenness is experienced? I understand your concerns. I can add a comment describing the brokenness (which boils down to "we don't have a collective barrier abstraction" - blockAndEvaluate() is _not_ collective). 
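To make the failure mode concrete: a collective such as MPI_Gather completes only once every rank in the communicator has entered it, so a rank that is still blocked on point-to-point traffic which will only be produced after the collective hangs the whole job. A minimal stand-alone illustration (plain MPI, exactly two ranks assumed, illustrative only, not the code from the patch):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);

      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int value = rank;
      int gathered[2] = { 0, 0 };

      // Rank 1 first waits for a point-to-point message that rank 0 will only
      // send after the collective, so rank 1 never reaches MPI_Gather ...
      if (rank == 1)
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      // ... and rank 0 blocks here waiting for rank 1's contribution: deadlock.
      MPI_Gather(&value, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);

      // The send that would have released rank 1 comes too late.
      if (rank == 0)
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

      MPI_Finalize();
      return 0;
    }

Queuing the fake write iterate behind the send iterates amounts to making sure the pending sends have fired before a context falls through to the collective gather.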
Detecting it would be possible by asserting that each context at least does either Send or Compute. I'll think some more about the collective barrier -- maybe extend the generation tracking to allow blocking (collectively) for completion of the generation at endGeneration() time. Richard. -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ From oldham at codesourcery.com Fri Jan 16 17:06:00 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 16 Jan 2004 09:06:00 -0800 Subject: [pooma-dev] Re: [PATCH] Fix PrintField wrt expressions In-Reply-To: References: <4007037F.3080103@codesourcery.com> Message-ID: <400819F8.2060507@codesourcery.com> Richard Guenther wrote: > On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>The following patch allows to print Fields with expression engines. >>>PrintField uses applyRelations() while it should use a tree-walk with >>>PerformUpdateTag. >> >>So, with this change, the field will be guaranteed to be updated by any >>relations that can change the field? > > > If the field is dirty, yes. Behavoir is exactly the same as before, just > the case > > tester.out() << f + 1.0 << std::endl; > > didn't work before, because the FieldEngine doesn't have > the data() method applyRelations is trying to access. The forEach() > magically skips the ExpressionEngines and applies to the leafs only. > > Richard. > > >>>2004Jan14 Richard Guenther >>> >>> * src/Field/PrintField.h: use forEach(,PerformUpdateTag(),) rather >>> than applyRelations(). >>> >>>===== PrintField.h 1.3 vs edited ===== >>>--- 1.3/r2/src/Field/PrintField.h Wed Dec 3 12:30:41 2003 >>>+++ edited/PrintField.h Wed Jan 14 12:01:09 2004 >>>@@ -231,7 +231,7 @@ >>> template >>> void print(S &s, const A &a) const >>> { >>>- a.applyRelations(); >>>+ forEach(a, PerformUpdateTag(), NullCombine()); >>> Pooma::blockAndEvaluate(); >>> >>> for (int m = 0; m < a.numMaterials(); m++) Great! That's a good improvement. Will you please commit the patch? Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 17:11:48 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 16 Jan 2004 09:11:48 -0800 Subject: [pooma-dev] Re: [PATCH] Speed up guard update. In-Reply-To: References: <400754AE.1070102@codesourcery.com> Message-ID: <40081B54.2000501@codesourcery.com> Richard Guenther wrote: > On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>This is a refined (aka shorter) patch which unifies the tracking of >>>up-to-date faces and the special optimized copy for MPI. >>> >>>Tested on serial ia32 linux with gcc3.4 with no regression. >>> >>>Ok? >> >>Yes, assuming the user interface did not change. It looks like >>GCFillInfo's interface changed but existing code will still run because >>a parameter with a default argument was added. > > > It depends on where you draw the line between "user" interface and > internal interface. I'd consider GCFillInfo's internal interface, only > the fillGuards(), setDirty(), etc. methods from the MultiPatchEngine I'd > consider "user" interface - and these ones will still work with old code. > I also changed the various intersectors to take an additional parameter - old code will > break if it used this interface (but I consider this not user interface > either). Great! I interpret this a "no change to the user interface". Would you please commit the patch? -- Jeffrey D. 
Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 17:12:27 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 16 Jan 2004 09:12:27 -0800 Subject: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: References: <400441CF.1030007@codesourcery.com> <4007534D.90101@codesourcery.com> Message-ID: <40081B7B.5060205@codesourcery.com> Richard Guenther wrote: > On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > > >>I appreciate your finding the difficulty and your taking the time to >>explain the problem. I am reluctant to add code that is known to be >>broken for some situations. Is there a way to mark the code so 1) the >>known brokenness is marked and 2) the program asks sensibly when the >>brokenness is experienced? > > > I understand your concerns. I can add a comment describing the brokenness > (which boils down to "we don't have a collective barrier abstraction" - > blockAndEvaluate() is _not_ collective). Detecting it would be possible > by asserting that each context at least does either Send or Compute. > > I'll think some more about the collective barrier -- maybe extend the > generation tracking to allow blocking (collectively) for completion of the > generation at endGeneration() time. Thank you. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Sat Jan 17 19:21:26 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Sat, 17 Jan 2004 20:21:26 +0100 (CET) Subject: [PATCH] Kill Unwrap<> Message-ID: Hi! This patch kills previously introduced Unwrap<> and instead provides a fallback in OpMask<>. This way we don't forget places to update (as I did with PartialReduction in case of OpenMP). Tested on serial ia64 linux with no regressions. Ok? Richard. 2004Jan17 Richard Guenther * src/Engine/RemoteEngine.h: kill use of Unwrap<>. src/Evaluator/Reduction.h: likewise. src/Tulip/ReduceOverContexts.h: likewise. src/Evaluator/OpMask.h: likewise, provide fallback in OpMask instead. diff -Nru a/r2/src/Engine/RemoteEngine.h b/r2/src/Engine/RemoteEngine.h --- a/r2/src/Engine/RemoteEngine.h Sat Jan 17 20:16:00 2004 +++ b/r2/src/Engine/RemoteEngine.h Sat Jan 17 20:16:00 2004 @@ -2193,12 +2193,12 @@ { ret = vals[0]; for (j = 1; j < n; j++) - Unwrap::unwrap(op)(ret, vals[j]); + op(ret, vals[j]); } delete [] vals; - ReduceOverContexts::Op_t> finalReduction(ret, 0, n > 0); + ReduceOverContexts finalReduction(ret, 0, n > 0); if (Pooma::context() == 0) ret = finalReduction; diff -Nru a/r2/src/Evaluator/OpMask.h b/r2/src/Evaluator/OpMask.h --- a/r2/src/Evaluator/OpMask.h Sat Jan 17 20:16:00 2004 +++ b/r2/src/Evaluator/OpMask.h Sat Jan 17 20:16:00 2004 @@ -150,16 +150,25 @@ OpMask(const Op &op) : op_m(op) { } ~OpMask() { } + /// WhereProxy Op, embed a conditional operation. template - inline T1& - operator()(T1 &a, const T2 &b) const + inline void + operator()(T1 &a, const MaskAssign &b) const { if (b.defined()) { op_m(a, b.value()); } - return a; } + + /// Fall back to native operation. 
+ template + inline void + operator()(T1 &a, const T2 &b) const + { + op_m(a, b); + } + Op op_m; }; @@ -167,18 +176,6 @@ struct BinaryReturn > { typedef T1 &Type_t; -}; - -template -struct Unwrap { - typedef Op Op_t; - static inline const Op_t& unwrap(const Op &op) { return op; } -}; - -template -struct Unwrap > { - typedef typename Unwrap::Op_t Op_t; - static inline const Op_t& unwrap(const OpMask &op) { return Unwrap::unwrap(op.op_m); } }; template diff -Nru a/r2/src/Evaluator/Reduction.h b/r2/src/Evaluator/Reduction.h --- a/r2/src/Evaluator/Reduction.h Sat Jan 17 20:16:00 2004 +++ b/r2/src/Evaluator/Reduction.h Sat Jan 17 20:16:00 2004 @@ -259,7 +259,7 @@ ret = vals[0]; for (j = 1; j < n; j++) - Unwrap::unwrap(op)(ret, vals[j]); + op(ret, vals[j]); delete [] vals; } }; diff -Nru a/r2/src/Tulip/ReduceOverContexts.h b/r2/src/Tulip/ReduceOverContexts.h --- a/r2/src/Tulip/ReduceOverContexts.h Sat Jan 17 20:16:00 2004 +++ b/r2/src/Tulip/ReduceOverContexts.h Sat Jan 17 20:16:00 2004 @@ -274,7 +274,7 @@ if (!v.valid()) v = *v2; else - Unwrap::Op_t()(v.value(), v2->value()); + ReductionOp()(v.value(), v2->value()); } Serialize_t::cleanup(v2); } @@ -325,7 +325,7 @@ } else { - Unwrap::Op_t()(me->value_m, v.value()); + ReductionOp()(me->value_m, v.value()); } } From rguenth at tat.physik.uni-tuebingen.de Sun Jan 18 14:07:27 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Sun, 18 Jan 2004 15:07:27 +0100 (CET) Subject: [PATCH] Warn about MPI not supporting cross-context particles Message-ID: Hi! This patch aborts on use of cross-context particles with MPI, which is not supported. Ok? Richard. 2004Jan18 Richard Guenther * src/Particles/PatchSwapLayout.cpp: abort on cross-context particles with MPI. Index: src/Particles/PatchSwapLayout.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Particles/PatchSwapLayout.cpp,v retrieving revision 1.15 diff -u -u -r1.15 PatchSwapLayout.cpp --- src/Particles/PatchSwapLayout.cpp 8 Jun 2000 22:16:23 -0000 1.15 +++ src/Particles/PatchSwapLayout.cpp 18 Jan 2004 13:38:28 -0000 @@ -545,6 +545,8 @@ Pooma::particleSwapHandler()->send(toContext, tag, buf); } +#elif POOMA_MPI + PInsist(false, "Cross-context particles not supported for MPI"); #endif // POOMA_CHEETAH } @@ -621,6 +623,8 @@ while (layout_m.patchInfo(lid).msgReceived() < remotePatches) Pooma::poll(); +#elif POOMA_MPI + PInsist(false, "Cross-context particles not supported for MPI"); #endif // POOMA_CHEETAH } From oldham at codesourcery.com Mon Jan 19 19:00:39 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 19 Jan 2004 11:00:39 -0800 Subject: [PATCH] Kill Unwrap<> In-Reply-To: References: Message-ID: <400C2957.6090706@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch kills previously introduced Unwrap<> and instead provides a > fallback in OpMask<>. This way we don't forget places to update (as I did > with PartialReduction in case of OpenMP). > > Tested on serial ia64 linux with no regressions. > > Ok? Thanks for fixing this. I agree it's a good idea to have the default values do the right thing. Are there any other operators that need modification as well? Is it OK to have an additional if in the Where operator? Will this impact performance? OK to commit. > Richard. > > > 2004Jan17 Richard Guenther > > * src/Engine/RemoteEngine.h: kill use of Unwrap<>. > src/Evaluator/Reduction.h: likewise. > src/Tulip/ReduceOverContexts.h: likewise. 
> src/Evaluator/OpMask.h: likewise, provide fallback in > OpMask instead. > > diff -Nru a/r2/src/Engine/RemoteEngine.h b/r2/src/Engine/RemoteEngine.h > --- a/r2/src/Engine/RemoteEngine.h Sat Jan 17 20:16:00 2004 > +++ b/r2/src/Engine/RemoteEngine.h Sat Jan 17 20:16:00 2004 > @@ -2193,12 +2193,12 @@ > { > ret = vals[0]; > for (j = 1; j < n; j++) > - Unwrap::unwrap(op)(ret, vals[j]); > + op(ret, vals[j]); > } > > delete [] vals; > > - ReduceOverContexts::Op_t> finalReduction(ret, 0, n > 0); > + ReduceOverContexts finalReduction(ret, 0, n > 0); > if (Pooma::context() == 0) > ret = finalReduction; > > diff -Nru a/r2/src/Evaluator/OpMask.h b/r2/src/Evaluator/OpMask.h > --- a/r2/src/Evaluator/OpMask.h Sat Jan 17 20:16:00 2004 > +++ b/r2/src/Evaluator/OpMask.h Sat Jan 17 20:16:00 2004 > @@ -150,16 +150,25 @@ > OpMask(const Op &op) : op_m(op) { } > ~OpMask() { } > > + /// WhereProxy Op, embed a conditional operation. > template > - inline T1& > - operator()(T1 &a, const T2 &b) const > + inline void > + operator()(T1 &a, const MaskAssign &b) const > { > if (b.defined()) > { > op_m(a, b.value()); > } > - return a; > } > + > + /// Fall back to native operation. > + template > + inline void > + operator()(T1 &a, const T2 &b) const > + { > + op_m(a, b); > + } > + > Op op_m; > }; > > @@ -167,18 +176,6 @@ > struct BinaryReturn > > { > typedef T1 &Type_t; > -}; > - > -template > -struct Unwrap { > - typedef Op Op_t; > - static inline const Op_t& unwrap(const Op &op) { return op; } > -}; > - > -template > -struct Unwrap > { > - typedef typename Unwrap::Op_t Op_t; > - static inline const Op_t& unwrap(const OpMask &op) { return Unwrap::unwrap(op.op_m); } > }; > > template > diff -Nru a/r2/src/Evaluator/Reduction.h b/r2/src/Evaluator/Reduction.h > --- a/r2/src/Evaluator/Reduction.h Sat Jan 17 20:16:00 2004 > +++ b/r2/src/Evaluator/Reduction.h Sat Jan 17 20:16:00 2004 > @@ -259,7 +259,7 @@ > > ret = vals[0]; > for (j = 1; j < n; j++) > - Unwrap::unwrap(op)(ret, vals[j]); > + op(ret, vals[j]); > delete [] vals; > } > }; > diff -Nru a/r2/src/Tulip/ReduceOverContexts.h b/r2/src/Tulip/ReduceOverContexts.h > --- a/r2/src/Tulip/ReduceOverContexts.h Sat Jan 17 20:16:00 2004 > +++ b/r2/src/Tulip/ReduceOverContexts.h Sat Jan 17 20:16:00 2004 > @@ -274,7 +274,7 @@ > if (!v.valid()) > v = *v2; > else > - Unwrap::Op_t()(v.value(), v2->value()); > + ReductionOp()(v.value(), v2->value()); > } > Serialize_t::cleanup(v2); > } > @@ -325,7 +325,7 @@ > } > else > { > - Unwrap::Op_t()(me->value_m, v.value()); > + ReductionOp()(me->value_m, v.value()); > } > } > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 19 19:04:31 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 19 Jan 2004 11:04:31 -0800 Subject: [PATCH] Warn about MPI not supporting cross-context particles In-Reply-To: References: Message-ID: <400C2A3F.5060902@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch aborts on use of cross-context particles with MPI, which is not > supported. > > Ok? > > Richard. > > > 2004Jan18 Richard Guenther > > * src/Particles/PatchSwapLayout.cpp: abort on cross-context > particles with MPI. 
> > Index: src/Particles/PatchSwapLayout.cpp > =================================================================== > RCS file: /home/pooma/Repository/r2/src/Particles/PatchSwapLayout.cpp,v > retrieving revision 1.15 > diff -u -u -r1.15 PatchSwapLayout.cpp > --- src/Particles/PatchSwapLayout.cpp 8 Jun 2000 22:16:23 -0000 1.15 > +++ src/Particles/PatchSwapLayout.cpp 18 Jan 2004 13:38:28 -0000 > @@ -545,6 +545,8 @@ > Pooma::particleSwapHandler()->send(toContext, tag, buf); > } > > +#elif POOMA_MPI > + PInsist(false, "Cross-context particles not supported for MPI"); > #endif // POOMA_CHEETAH Is the POOMA_CHEETAH comment correct? I think it is probably correct since the first #if is probably for Cheetah, but is it misleading? Is there a better comment? > } > > @@ -621,6 +623,8 @@ > while (layout_m.patchInfo(lid).msgReceived() < remotePatches) > Pooma::poll(); > > +#elif POOMA_MPI > + PInsist(false, "Cross-context particles not supported for MPI"); > #endif // POOMA_CHEETAH Likewise here. > } > Thanks for tightening the code. OK to commit. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Mon Jan 19 22:07:55 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 19 Jan 2004 23:07:55 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Kill Unwrap<> In-Reply-To: <400C2957.6090706@codesourcery.com> References: <400C2957.6090706@codesourcery.com> Message-ID: On Mon, 19 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This patch kills previously introduced Unwrap<> and instead provides a > > fallback in OpMask<>. This way we don't forget places to update (as I did > > with PartialReduction in case of OpenMP). > > > > Tested on serial ia64 linux with no regressions. > > > > Ok? > > Thanks for fixing this. I agree it's a good idea to have the default > values do the right thing. Are there any other operators that need > modification as well? Is it OK to have an additional if in the Where > operator? Will this impact performance? It's really only the WhereProxy and its OpMask wrapping operator that is this special. Performance should be unaffected. Richard. From rguenth at tat.physik.uni-tuebingen.de Mon Jan 19 22:08:58 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 19 Jan 2004 23:08:58 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Warn about MPI not supporting cross-context particles In-Reply-To: <400C2A3F.5060902@codesourcery.com> References: <400C2A3F.5060902@codesourcery.com> Message-ID: On Mon, 19 Jan 2004, Jeffrey D. Oldham wrote: > > > > +#elif POOMA_MPI > > + PInsist(false, "Cross-context particles not supported for MPI"); > > #endif // POOMA_CHEETAH > > Is the POOMA_CHEETAH comment correct? I think it is probably correct > since the first #if is probably for Cheetah, but is it misleading? Is > there a better comment? I don't know of one - the existing one is probably better than no one. Richard. From oldham at codesourcery.com Tue Jan 20 00:52:34 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 19 Jan 2004 16:52:34 -0800 Subject: [pooma-dev] Re: [PATCH] Kill Unwrap<> In-Reply-To: References: <400C2957.6090706@codesourcery.com> Message-ID: <400C7BD2.8000605@codesourcery.com> Richard Guenther wrote: > On Mon, 19 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>This patch kills previously introduced Unwrap<> and instead provides a >>>fallback in OpMask<>. 
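The fallback in question can be sketched stand-alone; the names below (MaskedValue, MaskedOp, AddAssign) are illustrative stand-ins for MaskAssign, OpMask and the wrapped operator, not the actual POOMA types:

    #include <iostream>

    // Stand-in for MaskAssign<T>: a value plus a flag recording whether the
    // where() condition held at this point.
    template <class T>
    struct MaskedValue
    {
      bool enabled;
      T    value;
    };

    // Stand-in for OpMask<Op>: one overload for masked right-hand sides, and a
    // generic fallback that simply forwards to the wrapped operation.
    template <class Op>
    struct MaskedOp
    {
      Op op;

      template <class T1, class T2>
      void operator()(T1 &a, const MaskedValue<T2> &b) const
      {
        if (b.enabled)
          op(a, b.value);
      }

      template <class T1, class T2>
      void operator()(T1 &a, const T2 &b) const
      {
        op(a, b);
      }
    };

    struct AddAssign
    {
      template <class T1, class T2>
      void operator()(T1 &a, const T2 &b) const { a += b; }
    };

    int main()
    {
      MaskedOp<AddAssign> op = { AddAssign() };
      double sum = 0.0;
      op(sum, 1.0);                               // fallback path: plain value
      MaskedValue<double> skipped = { false, 5.0 };
      MaskedValue<double> taken   = { true,  2.0 };
      op(sum, skipped);                           // masked out, sum unchanged
      op(sum, taken);                             // applied, sum == 3
      std::cout << sum << "\n";                   // prints 3
      return 0;
    }

The point of the second overload is that callers which only ever see plain values, such as the cross-context reduction, can apply the operator directly instead of first unwrapping it.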
This way we don't forget places to update (as I did >>>with PartialReduction in case of OpenMP). >>> >>>Tested on serial ia64 linux with no regressions. >>> >>>Ok? >> >>Thanks for fixing this. I agree it's a good idea to have the default >>values do the right thing. Are there any other operators that need >>modification as well? Is it OK to have an additional if in the Where >>operator? Will this impact performance? > > > It's really only the WhereProxy and its OpMask wrapping operator that is > this special. Performance should be unaffected. Good. Thanks. Please commit the patch. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Tue Jan 20 00:53:05 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 19 Jan 2004 16:53:05 -0800 Subject: [pooma-dev] Re: [PATCH] Warn about MPI not supporting cross-context particles In-Reply-To: References: <400C2A3F.5060902@codesourcery.com> Message-ID: <400C7BF1.6010701@codesourcery.com> Richard Guenther wrote: > On Mon, 19 Jan 2004, Jeffrey D. Oldham wrote: > > >>>+#elif POOMA_MPI >>>+ PInsist(false, "Cross-context particles not supported for MPI"); >>> #endif // POOMA_CHEETAH >> >>Is the POOMA_CHEETAH comment correct? I think it is probably correct >>since the first #if is probably for Cheetah, but is it misleading? Is >>there a better comment? > > > I don't know of one - the existing one is probably better than no one. Thanks for making these changes. Will you please commit them? -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 29 10:44:27 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 29 Jan 2004 11:44:27 +0100 (CET) Subject: [PATCH] Fix dependency generation Message-ID: Hi! This patch fixes long standing possibility of ending up with recursive makefile inclusion... basically, if the dependency making failed somehow, we ended up including it forever on the next invocation of make. Ugh. It also brings dependencies for the testsuite. Ok? Richard. 2004Jan29 Richard Guenther * config/Shared/rules.mk: don't repeat the toplevel makefile in the initial depend.mk. Ignore SCCS and CVS directories for depend files, add testsuite files for depend. ===== rules.mk 1.3 vs edited ===== *** /tmp/rules.mk-1.3-9804 Mon Jun 23 14:50:41 2003 --- edited/rules.mk Thu Jan 29 11:39:35 2004 *************** *** 3,14 **** .PHONY : showtimes showenv clean cleansuite realclean realcleansuite realrealcleansuite tar makestuff dirs .PHONY : showalias echopoomaroot suiteinfo ! depend: @echo "Making Dependencies for suite=$(SUITE)."; \ cd $(PROJECT_ROOT);\ ! filelist=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cc" -o -name "*.C" -o -name "*.cpp" | grep -v tests`;\ ! cp $(PROJECT_ROOT)/makefile $(LIBRARY_ROOT)/depend.mk ; \ ! $(MAKEDEPEND) -f $(LIBRARY_ROOT)/depend.mk $(shell echo $(SUITE_DEFINES) | $(TR) ' ' ',' ) $(SUITE_INCLUDES) $$filelist 2> $(LIBRARY_ROOT)/depend.err;\ $(PERL) $(SHARED_ROOT)/dependo.pl $(LIBRARY_ROOT)/depend.mk $(PROJECT_ROOT) $$filelist;\ rm -f $(LIBRARY_ROOT)/depend.mk.bak --- 3,15 ---- .PHONY : showtimes showenv clean cleansuite realclean realcleansuite realrealcleansuite tar makestuff dirs .PHONY : showalias echopoomaroot suiteinfo ! depend: cleandepend @echo "Making Dependencies for suite=$(SUITE)."; \ cd $(PROJECT_ROOT);\ ! filelist=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cmpl.cpp" -o -name "*.inst.cpp" | grep -v "tests\|FileTemplates\|SCCS\|CVS"`;\ ! 
filelisttests=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cpp" | grep -v "SCCS\|CVS" | grep "tests/.*\.cpp"`;\ ! touch $(LIBRARY_ROOT)/depend.mk ; \ ! $(MAKEDEPEND) -f $(LIBRARY_ROOT)/depend.mk $(shell echo $(SUITE_DEFINES) | $(TR) ' ' ',' ) $(SUITE_INCLUDES) $$filelist $$filelisttests 2> $(LIBRARY_ROOT)/depend.err;\ $(PERL) $(SHARED_ROOT)/dependo.pl $(LIBRARY_ROOT)/depend.mk $(PROJECT_ROOT) $$filelist;\ rm -f $(LIBRARY_ROOT)/depend.mk.bak From oldham at codesourcery.com Thu Jan 29 16:22:19 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 29 Jan 2004 08:22:19 -0800 Subject: [pooma-dev] [PATCH] Fix dependency generation In-Reply-To: References: Message-ID: <4019333B.2000202@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch fixes long standing possibility of ending up with recursive > makefile inclusion... basically, if the dependency making failed somehow, > we ended up including it forever on the next invocation of make. Ugh. > It also brings dependencies for the testsuite. > > Ok? OK. > Richard. > > > 2004Jan29 Richard Guenther > > * config/Shared/rules.mk: don't repeat the toplevel makefile in > the initial depend.mk. Ignore SCCS and CVS directories for depend > files, add testsuite files for depend. > > ===== rules.mk 1.3 vs edited ===== > *** /tmp/rules.mk-1.3-9804 Mon Jun 23 14:50:41 2003 > --- edited/rules.mk Thu Jan 29 11:39:35 2004 > *************** > *** 3,14 **** > .PHONY : showtimes showenv clean cleansuite realclean realcleansuite realrealcleansuite tar makestuff dirs > .PHONY : showalias echopoomaroot suiteinfo > > ! depend: > @echo "Making Dependencies for suite=$(SUITE)."; \ > cd $(PROJECT_ROOT);\ > ! filelist=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cc" -o -name "*.C" -o -name "*.cpp" | grep -v tests`;\ > ! cp $(PROJECT_ROOT)/makefile $(LIBRARY_ROOT)/depend.mk ; \ > ! $(MAKEDEPEND) -f $(LIBRARY_ROOT)/depend.mk $(shell echo $(SUITE_DEFINES) | $(TR) ' ' ',' ) $(SUITE_INCLUDES) $$filelist 2> $(LIBRARY_ROOT)/depend.err;\ > $(PERL) $(SHARED_ROOT)/dependo.pl $(LIBRARY_ROOT)/depend.mk $(PROJECT_ROOT) $$filelist;\ > rm -f $(LIBRARY_ROOT)/depend.mk.bak > > --- 3,15 ---- > .PHONY : showtimes showenv clean cleansuite realclean realcleansuite realrealcleansuite tar makestuff dirs > .PHONY : showalias echopoomaroot suiteinfo > > ! depend: cleandepend > @echo "Making Dependencies for suite=$(SUITE)."; \ > cd $(PROJECT_ROOT);\ > ! filelist=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cmpl.cpp" -o -name "*.inst.cpp" | grep -v "tests\|FileTemplates\|SCCS\|CVS"`;\ > ! filelisttests=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cpp" | grep -v "SCCS\|CVS" | grep "tests/.*\.cpp"`;\ > ! touch $(LIBRARY_ROOT)/depend.mk ; \ > ! $(MAKEDEPEND) -f $(LIBRARY_ROOT)/depend.mk $(shell echo $(SUITE_DEFINES) | $(TR) ' ' ',' ) $(SUITE_INCLUDES) $$filelist $$filelisttests 2> $(LIBRARY_ROOT)/depend.err;\ > $(PERL) $(SHARED_ROOT)/dependo.pl $(LIBRARY_ROOT)/depend.mk $(PROJECT_ROOT) $$filelist;\ > rm -f $(LIBRARY_ROOT)/depend.mk.bak > -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 29 21:49:50 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 29 Jan 2004 22:49:50 +0100 (CET) Subject: Cheetah license Message-ID: Hi! Did anyone had progress regarding the "missing" Cheetah license? Thanks, Richard.