From oldham at codesourcery.com Thu Jan 1 17:08:48 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 01 Jan 2004 09:08:48 -0800 Subject: [PATCH] Optimize guard update copy In-Reply-To: References: Message-ID: <3FF45420.3090106@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch removes number four of the copies done for guard update. > Basically, additionally to the three copies I mentioned in the previous > mail, we're doing one extra during the RemoteView expressionApply of the > data-parallel assignment we're doing for the guard domains. Ugh. Fixed by > manually sending/receiving from/to the views. Doesn't work for Cheetah, > so conditionalized on POOMA_MPI. What breaks for Cheetah? > Tested as usual, ok to apply? > > Richard. > > > 2003Dec30 Richard Guenther > > * src/Engine/MultiPatchEngine.cpp: optimize remote to local and > local to remote copy in guard update. > > ===== MultiPatchEngine.cpp 1.6 vs 1.7 ===== > --- 1.6/r2/src/Engine/MultiPatchEngine.cpp Tue Dec 9 12:16:07 2003 > +++ 1.7/r2/src/Engine/MultiPatchEngine.cpp Thu Dec 18 16:41:50 2003 > @@ -34,6 +34,7 @@ > #include "Engine/CompressedFraction.h" > #include "Array/Array.h" > #include "Tulip/ReduceOverContexts.h" > +#include "Tulip/SendReceive.h" > #include "Threads/PoomaCSem.h" > #include "Domain/IteratorPairDomain.h" > > @@ -261,6 +262,40 @@ > // > //----------------------------------------------------------------------------- > > +/// Guard layer assign between non-remote engines, just use the > +/// ET mechanisms > + > +template > +static inline > +void simpleAssign(const Array& lhs, > + const Array& rhs, > + const Interval& domain) > +{ > + lhs(domain) = rhs(domain); > +} > + > +/// Guard layer assign between remote engines, use Send/Receive directly > +/// to avoid one extra copy of the data. > + > +template > +static inline > +void simpleAssign(const Array >& lhs, > + const Array >& rhs, > + const Interval& domain) > +{ > + if (lhs.engine().owningContext() == rhs.engine().owningContext()) > + lhs(domain) = rhs(domain); > + else { > + typedef typename NewEngine, Interval >::Type_t ViewEngine_t; > + if (lhs.engine().engineIsLocal()) > + Receive::receive(ViewEngine_t(lhs.engine().localEngine(), domain), > + rhs.engine().owningContext()); > + else if (rhs.engine().engineIsLocal()) > + SendReceive::send(ViewEngine_t(rhs.engine().localEngine(), domain), > + lhs.engine().owningContext()); > + } > +} > + > template > void Engine >:: > fillGuardsHandler(const WrappedInt &) const > @@ -293,8 +328,12 @@ > Array lhs(data()[dest]), rhs(data()[src]); > > // Now do assignment from the subdomains. > - > + // Optimized lhs(p->domain_m) = rhs(p->domain_m); > +#if POOMA_MPI > + simpleAssign(lhs, rhs, p->domain_m); > +#else > lhs(p->domain_m) = rhs(p->domain_m); > +#endif > > ++p; > } -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 1 17:17:21 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 01 Jan 2004 09:17:21 -0800 Subject: [PATCH] MPI SendReceive In-Reply-To: References: Message-ID: <3FF45621.4000404@codesourcery.com> Richard Guenther wrote: > Hi! > > This is now the MPI version of SendReceive.h, including changes to > RemoteEngine.h which handles (de-)serialization of engines. The latter > change allows optimizing away one of the three(!) 
copies we are doing > currently for communicating an engine at receive time: > - receive into message buffer > - deserialize into temporary brick engine > - copy temporary brick engine to target view > > the message buffer is now directly deserialized into the target view (for > non-Cheetah operation, with Cheetah this is not possible). Patch which > removes a fourth(!!) copy we're doing at guard update follows. > > Tested as usual. > > Ok? Yes. Thanks for improving the performance. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 1 17:24:47 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 01 Jan 2004 09:24:47 -0800 Subject: [PATCH] Add MPI variants for RemoteProxy, CollectFromContexts and ReduceOverContexts In-Reply-To: References: Message-ID: <3FF457DF.6080709@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch adds native MPI variants of the above messaging abstractions. > These patches were tested together with the remaining changes with serial, > Cheetah and MPI. As POOMA_MPI is never defined (for now), this shouldn't > introduce regressions there, too. But of course for it alone, this patch > is useless. More to follow. > > Ok? Yes. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 1 17:25:53 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 01 Jan 2004 09:25:53 -0800 Subject: [PATCH] Back to using Cheetah::CHEETAH for serialization In-Reply-To: References: Message-ID: <3FF45821.8030605@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch is a partial reversion of a previous patch that made us use > Cheetah::DELEGATE serialization for RemoteProxy. It also brings us a > Cheetah::CHEETAH serialization for std::string, which was previously > missing. One step more for the MPI merge. > > Tested together with all other MPI changes with serial, Cheetah and MPI. > > Ok? Yes. Do we need more regression tests for this work to better ensure correctness? -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 1 22:45:45 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 1 Jan 2004 23:45:45 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Back to using Cheetah::CHEETAH for serialization In-Reply-To: <3FF45821.8030605@codesourcery.com> References: <3FF45821.8030605@codesourcery.com> Message-ID: On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This patch is a partial reversion of a previous patch that made us use > > Cheetah::DELEGATE serialization for RemoteProxy. It also brings us a > > Cheetah::CHEETAH serialization for std::string, which was previously > > missing. One step more for the MPI merge. > > > > Tested together with all other MPI changes with serial, Cheetah and MPI. > > > > Ok? > > Yes. Do we need more regression tests for this work to better ensure > correctness? Maybe, at least we get all non-POD types that are not explicitly specialized wrong during serialization. And I can tell you, such errors are _very_ hard to find (happened for me for std::string and RemoteProxy). Richard. From rguenth at tat.physik.uni-tuebingen.de Fri Jan 2 12:26:27 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 13:26:27 +0100 (CET) Subject: [PATCH] Initialize MPI Message-ID: Hi! This patch adds MPI initialization. Ok? Richard. 
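For reference, the ordering the patch sets up is the usual MPI bracket around the whole Pooma lifetime. A minimal standalone sketch of that ordering (illustrative only, not the actual Pooma.cmpl.cpp code; the comments note where the patch places the corresponding calls):

  #include <mpi.h>

  int main(int argc, char **argv)
  {
    // Pooma::initialize() issues this when POOMA_MPI is defined.
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // becomes the context id
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // becomes the context count

    // Barrier so every context is initialized before any cross-context
    // request can reach it.
    MPI_Barrier(MPI_COMM_WORLD);

    // ... data-parallel work ...

    // Pooma::finalize() first calls Pooma::blockAndEvaluate() to drain
    // outstanding iterates, then shuts messaging down.
    MPI_Finalize();
    return 0;
  }
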
2004Jan02 Richard Guenther * src/Pooma/Pooma.cmpl.cpp: add initialization and finalization sequence for MPI. Pooma::blockAndEvaluate() at finalization. --- /home/richard/src/pooma/cvs/r2/src/Pooma/Pooma.cmpl.cpp 2003-12-25 12:26:04.000000000 +0100 +++ Pooma/Pooma.cmpl.cpp 2004-01-02 00:40:15.000000000 +0100 @@ -287,10 +287,10 @@ // we can do this in the other initialize routine by querying for // the Cheetah options from the Options object. -#if POOMA_CHEETAH - +#if POOMA_MPI + MPI_Init(&argc, &argv); +#elif POOMA_CHEETAH controller_g = new Cheetah::Controller(argc, argv); - #endif // Just create an Options object for this argc, argv set, and give that @@ -349,12 +349,20 @@ // Set myContext_s and numContexts_s to the context numbers. -#if POOMA_CHEETAH +#if POOMA_MESSAGING +#if POOMA_MPI + MPI_Comm_rank(MPI_COMM_WORLD, &myContext_g); + MPI_Comm_size(MPI_COMM_WORLD, &numContexts_g); + // ugh... + for (int i=0; imycontext(); numContexts_g = controller_g->ncontexts(); +#endif initializeCheetahHelpers(numContexts_g); @@ -376,14 +384,14 @@ warnMessages(opts.printWarnings()); errorMessages(opts.printErrors()); -#if POOMA_CHEETAH - // This barrier is here so that Pooma is initialized on all contexts // before we continue. (Another context could invoke a remote member // function on us before we're initialized... which would be bad.) +#if POOMA_MPI + MPI_Barrier(MPI_COMM_WORLD); +#elif POOMA_CHEETAH controller_g->barrier(); - #endif // Initialize the Inform streams with info on how many contexts we @@ -416,6 +424,8 @@ bool finalize(bool quitRTS, bool quitArch) { + Pooma::blockAndEvaluate(); + if (initialized_s) { // Wait for threads to finish. @@ -426,7 +436,7 @@ cleanup_s(); -#if POOMA_CHEETAH +#if POOMA_MESSAGING // Clean up the Cheetah helpers. finalizeCheetahHelpers(); @@ -436,15 +446,19 @@ if (quitRTS) { -#if POOMA_CHEETAH +#if POOMA_MESSAGING // Deleting the controller shuts down the cross-context communication // if this is the last thing using this controller. If something // else is using this, Cheetah will not shut down until that item // is destroyed or stops using the controller. +#if POOMA_MPI + MPI_Finalize(); +#elif POOMA_CHEETAH if (controller_g != 0) delete controller_g; +#endif #endif } @@ -784,18 +799,18 @@ SystemContext_t::runSomething(); } -#elif POOMA_REORDER_ITERATES +# elif POOMA_REORDER_ITERATES CTAssert(NO_SUPPORT_FOR_THREADS_WITH_MESSAGING); -#else // we're using the serial scheduler, so we only need to get messages +# else // we're using the serial scheduler, so we only need to get messages while (Pooma::incomingMessages()) { controller_g->poll(); } -#endif // schedulers +# endif // schedulers #else // !POOMA_CHEETAH From rguenth at tat.physik.uni-tuebingen.de Thu Jan 1 22:43:03 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 1 Jan 2004 23:43:03 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Optimize guard update copy In-Reply-To: <3FF45420.3090106@codesourcery.com> References: <3FF45420.3090106@codesourcery.com> Message-ID: On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This patch removes number four of the copies done for guard update. > > Basically, additionally to the three copies I mentioned in the previous > > mail, we're doing one extra during the RemoteView expressionApply of the > > data-parallel assignment we're doing for the guard domains. Ugh. Fixed by > > manually sending/receiving from/to the views. Doesn't work for Cheetah, > > so conditionalized on POOMA_MPI. 
> > What breaks for Cheetah? I don't remember... I can try again next week. Richard. From rguenth at tat.physik.uni-tuebingen.de Thu Jan 1 23:55:48 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 00:55:48 +0100 (CET) Subject: CVS down? Message-ID: Hi! $ traceroute pooma.codesourcery.com traceroute to pooma.codesourcery.com (65.73.237.138), 30 hops max, 38 byte packets 1 kolme.hamnixda.de (192.168.100.254) 5.260 ms 2.008 ms 1.801 ms 2 217.5.98.157 (217.5.98.157) 69.914 ms 61.777 ms 61.311 ms 3 217.237.156.218 (217.237.156.218) 60.429 ms 59.232 ms 60.236 ms 4 WAS-E4.WAS.US.NET.DTAG.DE (62.154.14.134) 191.891 ms 165.471 ms 162.642 ms 5 62.156.138.210 (62.156.138.210) 168.278 ms 173.438 ms 168.534 ms 6 bpr2-ae0.VirginiaEquinix.cw.net (208.173.50.253) 167.892 ms 170.474 ms 168.912 ms 7 208.173.50.242 (208.173.50.242) 163.327 ms 166.910 ms 203.087 ms 8 p7-3.cr01.mcln.eli.net (207.173.114.129) 171.543 ms !H 165.860 ms !H * Oh, btw. www.codesourcery.com seems to be down, too ((61) Connection refused). Richard. From rguenth at tat.physik.uni-tuebingen.de Fri Jan 2 12:34:16 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 13:34:16 +0100 (CET) Subject: [PATCH] Add --mpi configure switch Message-ID: Hi! This (finally) adds --mpi configure switch to enable POOMA_MPI. It checks for mpiCC or mpic++ in either $MPICH_ROOT/bin or the current $PATH and uses the first one found as new $cpp and $link. I didn't change the Cheetah configure switch which now has the slightly confusing name --messaging. Maybe we want to change this to --cheetah. Ok? I'll start full testing of serial, MPI and Cheetah to see if I forgot a part of the changes after the pending stuff is committed. Thanks, Richard. 2004Jan02 Richard Guenther * configure: add --mpi switch to enable MPI messaging using mpiCC/mpic++. --- /home/richard/src/pooma/cvs/r2/configure 2003-12-30 18:19:29.000000000 +0100 +++ configure 2004-01-02 00:40:10.000000000 +0100 @@ -209,8 +208,9 @@ $hdf5nm = "--hdf5"; $fftwnm = "--fftw"; $cheetahnm = "--messaging"; +$mpinm = "--mpi"; $strictnm = "--strict"; $archfnsnm = "--arch-specific-functions"; ### configure options $dbgprntnm = "-v"; # turn on verbose output from configure @@ -236,10 +237,11 @@ [$sharednm, "", "create a shared library."], [$finternm, "", "include fortran support libraries."], [$nofinternm, "", "do not include the fortran libraries."], [$preinstnm, "", "build preinstantiations of several classes."], [$serialnm, "", "configure to run serially, no parallelism."], - [$threadsnm, "", "include threads capability, if available."], + [$threadsnm, "", "include threads capability, if available."], [$cheetahnm, "", "enable use of CHEETAH communications package."], + [$mpinm, "", "enable use of MPI communications package."], [$schednm, "", "use for thread scheduling."], [$pawsnm, "", "enable PAWS program coupling, if available."], [$pawsdevnm, "", "enable PAWS program coupling for PAWS devel."], @@ -1276,13 +1266,22 @@ { $cheetah = 1; } - print "Set cheetah = $cheetah\n" if $dbgprnt; + if (scalar @{$arghash{$mpinm}} > 1) + { + $mpi = 1; + } $messaging = $cheetah + $mpi; + if ($messaging>1 or $messaging and scalar @{$arghash{$serialnm}}> 1) + { + printerror "$cheetahnm and/or $mpinm and/or $serialnm given. 
Use only one."; + } + print "Set messaging = $messaging\n" if $dbgprnt; # add a define indicating whether CHEETAH/MPI is available, and configure # extra options to include and define lists my $defmessaging = $messaging; my $defcheetah = 0; + my $defmpi = 0; if ($cheetah) { if (exists $ENV{"CHEETAHDIR"}) @@ -1299,7 +1298,6 @@ } $defcheetah = 1; - $scheduler = "serialAsync"; # add in the extra compilation settings for CHEETAH. @@ -1315,8 +1313,40 @@ $link = $cheetah_link; } } + elsif ($mpi) + { + my $mpiCC = "\$(MPICH_ROOT)/bin/mpiCC"; + if (system("test -x $MPICH_ROOT/bin/mpiCC") == 0) + { + $mpiCC = "\$(MPICH_ROOT)/bin/mpiCC"; + } + elsif (system("test -x $MPICH_ROOT/bin/mpic++") == 0) + { + $mpiCC = "\$(MPICH_ROOT)/bin/mpic++"; + } + elsif (system("which mpiCC") == 0) + { + $mpiCC = "mpiCC"; + } + elsif (system("which mpic++") == 0) + { + $mpiCC = "mpic++"; + } + else + { + die "There is no known MPI location. Select one by setting MPICH_ROOT or adjusting your PATH.\n"; + } + + $defmpi = 1; + $scheduler = "serialAsync"; + + # use special compiler script for MPI. + $cpp = $mpiCC; + $link = $mpiCC; + } add_yesno_define("POOMA_MESSAGING", $defmessaging); add_yesno_define("POOMA_CHEETAH", $defcheetah); + add_yesno_define("POOMA_MPI", $defmpi); } From rguenth at tat.physik.uni-tuebingen.de Fri Jan 2 12:20:35 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 13:20:35 +0100 (CET) Subject: [PATCH] MPI support for SerialAsync scheduler Message-ID: Hi! This patch moves SerialAsync to the state I have it. So this patch maybe somewhat hard to follow, so I'll go through the obfuscating parts first: - it moves commentary to doxygen style - it moves Iterate definition up due to dependency issues Apart from this, the patch introduces a std::stack for tracking the current generation. This is necessary for MPI messaging to avoid deadlocks waiting for communication on one end that hasn't been issued at the remote end yet. Basically the only places where a full blockAndEvaluate() is safe, is, if we're not inside a generation. And we need to sometimes wait for communication to complete due to a limited amount of MPI_Requests we can have in fly. For asyncronous MPI operation the scheduler maintains the necessary MPI_Request structures and has the ability to wait on the completion of the asyncronous requests. This makes necessary the deferred destruction of the Iterates done via a reference count that is incremented on every MPI request issued and decremented on every MPI request completed. This same mechanism may possibly used to solve the Cheetah use-after-destruct issue -- I'll prepare a seperate patch for this. So, I hope I didn't forget something in the patch. The patch was tested as usual. Ok to commit? Thanks, Richard. 2004Jan02 Richard Guenther * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, add std::stack for generation tracking, add support for asyncronous MPI requests. src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp: define new static variables. 
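Stripped of the scheduler details, the MPI request bookkeeping described above amounts to a fixed pool of MPI_Request slots plus a per-iterate counter that defers destruction until all of its requests have completed. A simplified sketch (names shortened, error handling omitted; the real code lives in SystemContext in the diff below):

  #include <mpi.h>
  #include <map>
  #include <set>

  struct Iterate
  {
    Iterate() : togo(1) {}   // one reference held by the work queue
    int togo;                // queue reference + outstanding requests
  };

  const int nslots = 1024;
  MPI_Request requests[nslots];
  std::map<int, Iterate *> allocated;   // slot -> owning iterate
  std::set<int> freeSlots;              // currently unused slots

  void initSlots()
  {
    for (int i = 0; i < nslots; ++i)
      freeSlots.insert(i);
  }

  // Hand out a slot; assumes one is free (the scheduler guarantees this
  // by draining requests first).  The iterate stays alive until done.
  MPI_Request *getRequest(Iterate *it)
  {
    int i = *freeSlots.begin();
    freeSlots.erase(freeSlots.begin());
    allocated[i] = it;
    ++it->togo;
    return &requests[i];
  }

  // Called for each slot that MPI_Testsome/MPI_Waitsome reports done.
  void releaseRequest(int i)
  {
    Iterate *it = allocated[i];
    allocated.erase(i);
    freeSlots.insert(i);
    if (--it->togo == 0)
      delete it;             // deferred destruction of the iterate
  }

When the outermost generation ends, the scheduler keeps running work (blocking in MPI_Waitsome if need be) until enough slots are free again, which bounds the number of requests in flight.
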
--- /home/richard/src/pooma/cvs/r2/src/Threads/IterateSchedulers/SerialAsync.h 2000-06-09 00:16:50.000000000 +0200 +++ Threads/IterateSchedulers/SerialAsync.h 2004-01-02 00:40:16.000000000 +0100 @@ -42,48 +42,38 @@ // DataObject //----------------------------------------------------------------------------- -#include - #ifndef _SerialAsync_h_ #define _SerialAsync_h_ -/* -LIBRARY: - SerialAsync - -CLASSES: IterateScheduler - -CLASSES: DataObject - -CLASSES: Iterate - -OVERVIEW - SerialAsync IterateScheduler is a policy template to create a - dependence graphs and executes the graph respecting the - dependencies without using threads. There is no parallelism, - but Iterates may be executed out-of-order with respect to the - program text. - ------------------------------------------------------------------------------*/ - -////////////////////////////////////////////////////////////////////// -//----------------------------------------------------------------------------- -// Overview: -// Smarts classes for times when you want no threads but you do want -// dataflow evaluation. -//----------------------------------------------------------------------------- - -//----------------------------------------------------------------------------- -// Typedefs: -//----------------------------------------------------------------------------- +/** @file + * @ingroup IterateSchedulers + * @brief + * Smarts classes for times when you want no threads but you do want + * dataflow evaluation. + * + * SerialAsync IterateScheduler is a policy template to create a + * dependence graphs and executes the graph respecting the + * dependencies without using threads. + * There is no (thread level) parallelism, but Iterates may be executed + * out-of-order with respect to the program text. Also this scheduler is + * used for message based parallelism in which case asyncronous execution + * leads to reduced communication latencies. + */ //----------------------------------------------------------------------------- // Includes: //----------------------------------------------------------------------------- #include +#include +#include +#include +#include +#include #include "Threads/IterateSchedulers/IterateScheduler.h" #include "Threads/IterateSchedulers/Runnable.h" +#include "Tulip/Messaging.h" +#include "Utilities/PAssert.h" //----------------------------------------------------------------------------- // Forward Declarations: @@ -94,76 +84,261 @@ namespace Smarts { -#define MYID 0 -#define MAX_CPUS 1 -// -// Tag class for specializing IterateScheduler, Iterate and DataObject. -// +/** + * Tag class for specializing IterateScheduler, Iterate and DataObject. + */ + struct SerialAsync { - enum Action { Read, Write}; + enum Action { Read, Write }; }; -//----------------------------------------------------------------------------- +/** + * Iterate is used to implement the SerialAsync + * scheduling policy. + * + * An Iterate is a non-blocking unit of concurrency that is used + * to describe a chunk of work. It inherits from the Runnable + * class and as all subclasses of Runnable, the user specializes + * the run() method to specify the operation. + * Iterate is a further specialization of the + * Iterate class to use the SerialAsync Scheduling algorithm to + * generate the data dependency graph for a data-driven + * execution. 
+ */ + +template<> +class Iterate : public Runnable +{ + friend class IterateScheduler; + friend class DataObject; + +public: + + typedef DataObject DataObject_t; + typedef IterateScheduler IterateScheduler_t; + + + /// The Constructor for this class takes the IterateScheduler and a + /// CPU affinity. CPU affinity has a default value of -1 which means + /// it may run on any CPU available. + + inline Iterate(IterateScheduler & s, int affinity=-1) + : scheduler_m(s), notifications_m(1), generation_m(-1), togo_m(1) + {} + + /// The dtor is virtual because the subclasses will need to add to it. + + virtual ~Iterate() {} + + /// The run method does the core work of the Iterate. + /// It is supplied by the subclass. + + virtual void run() = 0; + + //@name Stubs for the affinities + /// There is no such thing in serial. + //@{ + + inline int affinity() const {return 0;} + + inline int hintAffinity() const {return 0;} + + inline void affinity(int) {} + + inline void hintAffinity(int) {} + + //@} + + /// Notify is used to indicate to the Iterate that one of the data + /// objects it had requested has been granted. To do this, we dec a + /// dependence counter which, if equal to 0, the Iterate is ready for + /// execution. + + void notify() + { + if (--notifications_m == 0) + add(this); + } + + /// How many notifications remain? + + int notifications() const { return notifications_m; } + + void addNotification() { notifications_m++; } + + int& generation() { return generation_m; } + + int& togo() { return togo_m; } + +protected: + + /// What scheduler are we working with? + IterateScheduler &scheduler_m; + + /// How many notifications should we receive before we can run? + int notifications_m; + + /// Which generation we were issued in. + int generation_m; + + /// How many times we need to go past a "did something" to be ready + /// for destruction? + int togo_m; + +}; + + +/** + * FIXME. + */ struct SystemContext { void addNCpus(int) {} void wait() {} void concurrency(int){} - int concurrency() {return 1;} + int concurrency() { return 1; } void mustRunOn() {} // We have a separate message queue because they are // higher priority. + typedef Iterate *IteratePtr_t; static std::list workQueueMessages_m; static std::list workQueue_m; +#if POOMA_MPI + static MPI_Request requests_m[1024]; + static std::map allocated_requests_m; + static std::set free_requests_m; +#endif + + +#if POOMA_MPI - /////////////////////////// - // This function lets you check if there are iterates that are - // ready to run. - inline static - bool workReady() + /// Query, if we have lots of MPI_Request slots available + + static bool haveLotsOfMPIRequests() { - return !(workQueue_m.empty() && workQueueMessages_m.empty()); + return free_requests_m.size() > 1024/2; } - /////////////////////////// - // Run an iterate if one is ready. - inline static - void runSomething() + /// Get a MPI_Request slot, associated with an iterate + + static MPI_Request* getMPIRequest(IteratePtr_t p) { - if (!workQueueMessages_m.empty()) - { - // Get the top iterate. - // Delete it from the queue. - // Run the iterate. - // Delete the iterate. This could put more iterates in the queue. 
+ PInsist(!free_requests_m.empty(), "No free MPIRequest slots."); + int i = *free_requests_m.begin(); + free_requests_m.erase(free_requests_m.begin()); + allocated_requests_m[i] = p; + p->togo()++; + return &requests_m[i]; + } - RunnablePtr_t p = workQueueMessages_m.front(); - workQueueMessages_m.pop_front(); - p->execute(); + static void releaseMPIRequest(int i) + { + IteratePtr_t p = allocated_requests_m[i]; + allocated_requests_m.erase(i); + free_requests_m.insert(i); + if (--(p->togo()) == 0) delete p; - } + } + + static bool waitForSomeRequests(bool mayBlock) + { + if (allocated_requests_m.empty()) + return false; + + int last_used_request = allocated_requests_m.rbegin()->first; + int finished[last_used_request+1]; + MPI_Status statuses[last_used_request+1]; + int nr_finished; + int res; + if (mayBlock) + res = MPI_Waitsome(last_used_request+1, requests_m, + &nr_finished, finished, statuses); else - { - if (!workQueue_m.empty()) - { - RunnablePtr_t p = workQueue_m.front(); - workQueue_m.pop_front(); - p->execute(); - delete p; + res = MPI_Testsome(last_used_request+1, requests_m, + &nr_finished, finished, statuses); + PAssert(res == MPI_SUCCESS || res == MPI_ERR_IN_STATUS); + if (nr_finished == MPI_UNDEFINED) + return false; + + // release finised requests + while (nr_finished--) { + if (res == MPI_ERR_IN_STATUS) { + if (statuses[nr_finished].MPI_ERROR != MPI_SUCCESS) { + char msg[MPI_MAX_ERROR_STRING+1]; + int len; + MPI_Error_string(statuses[nr_finished].MPI_ERROR, msg, &len); + msg[len] = '\0'; + PInsist(0, msg); + } } + releaseMPIRequest(finished[nr_finished]); } + return true; + } + +#else + + static bool waitForSomeRequests(bool mayBlock) + { + return false; + } + +#endif + + + /// This function lets you check if there are iterates that are + /// ready to run. + + static bool workReady() + { + return !(workQueue_m.empty() + && workQueueMessages_m.empty() +#if POOMA_MPI + && allocated_requests_m.empty() +#endif + ); + } + + /// Run an iterate if one is ready. Returns if progress + /// was made. + + static bool runSomething(bool mayBlock = true) + { + // do work in this order to minimize communication latency: + // - issue all messages + // - do some regular work + // - wait for messages to complete + + RunnablePtr_t p = NULL; + if (!workQueueMessages_m.empty()) { + p = workQueueMessages_m.front(); + workQueueMessages_m.pop_front(); + } else if (!workQueue_m.empty()) { + p = workQueue_m.front(); + workQueue_m.pop_front(); + } + + if (p) { + p->execute(); + Iterate *it = dynamic_cast(p); + if (it) { + if (--(it->togo()) == 0) + delete it; + } else + delete p; + return true; + + } else + return waitForSomeRequests(mayBlock); } }; -inline void addRunnable(RunnablePtr_t rn) -{ - SystemContext::workQueue_m.push_front(rn); -} +/// Adds a runnable to the appropriate work-queue. inline void add(RunnablePtr_t rn) { @@ -182,25 +357,18 @@ inline void wait() {} inline void mustRunOn(){} -/*------------------------------------------------------------------------ -CLASS - IterateScheduler_Serial_Async - - Implements a asynchronous scheduler for a data driven execution. - Specializes a IterateScheduler. - -KEYWORDS - Data-parallelism, Native-interface, IterateScheduler. - -DESCRIPTION - - The SerialAsync IterateScheduler, Iterate and DataObject - implement a SMARTS scheduler that does dataflow without threads. 
- What that means is that when you hand iterates to the - IterateScheduler it stores them up until you call - IterateScheduler::blockingEvaluate(), at which point it evaluates - iterates until the queue is empty. ------------------------------------------------------------------------------*/ + +/** + * Implements a asynchronous scheduler for a data driven execution. + * Specializes a IterateScheduler. + * + * The SerialAsync IterateScheduler, Iterate and DataObject + * implement a SMARTS scheduler that does dataflow without threads. + * What that means is that when you hand iterates to the + * IterateScheduler it stores them up until you call + * IterateScheduler::blockingEvaluate(), at which point it evaluates + * iterates until the queue is empty. + */ template<> class IterateScheduler @@ -212,196 +380,128 @@ typedef DataObject DataObject_t; typedef Iterate Iterate_t; - /////////////////////////// - // Constructor - // - IterateScheduler() {} - - /////////////////////////// - // Destructor - // - ~IterateScheduler() {} - void setConcurrency(int) {} - - //--------------------------------------------------------------------------- - // Mutators. - //--------------------------------------------------------------------------- - - /////////////////////////// - // Tells the scheduler that the parser thread is starting a new - // data-parallel statement. Any Iterate that is handed off to the - // scheduler between beginGeneration() and endGeneration() belongs - // to the same data-paralllel statement and therefore has the same - // generation number. - // - inline void beginGeneration() { } - - /////////////////////////// - // Tells the scheduler that no more Iterates will be handed off for - // the data parallel statement that was begun with a - // beginGeneration(). - // - inline void endGeneration() {} - - /////////////////////////// - // The parser thread calls this method to evaluate the generated - // graph until all the nodes in the dependence graph has been - // executed by the scheduler. That is to say, the scheduler - // executes all the Iterates that has been handed off to it by the - // parser thread. - // - inline - void blockingEvaluate(); - - /////////////////////////// - // The parser thread calls this method to ask the scheduler to run - // the given Iterate when the dependence on that Iterate has been - // satisfied. - // - inline void handOff(Iterate* it); + IterateScheduler() + : generation_m(0) + {} - inline - void releaseIterates() { } + ~IterateScheduler() {} -protected: -private: + void setConcurrency(int) {} - typedef std::list Container_t; - typedef Container_t::iterator Iterator_t; + /// Tells the scheduler that the parser thread is starting a new + /// data-parallel statement. Any Iterate that is handed off to the + /// scheduler between beginGeneration() and endGeneration() belongs + /// to the same data-paralllel statement and therefore has the same + /// generation number. + /// Nested invocations are handled as being part of the outermost + /// generation. -}; + void beginGeneration() + { + // Ensure proper overflow behavior. + if (++generation_m < 0) + generation_m = 0; + generationStack_m.push(generation_m); + } -//----------------------------------------------------------------------------- + /// Tells the scheduler that no more Iterates will be handed off for + /// the data parallel statement that was begun with a + /// beginGeneration(). 
-/*------------------------------------------------------------------------ -CLASS - Iterate_SerialAsync - - Iterate is used to implement the SerialAsync - scheduling policy. - -KEYWORDS - Data_Parallelism, Native_Interface, IterateScheduler, Data_Flow. - -DESCRIPTION - An Iterate is a non-blocking unit of concurrency that is used - to describe a chunk of work. It inherits from the Runnable - class and as all subclasses of Runnable, the user specializes - the run() method to specify the operation. - Iterate is a further specialization of the - Iterate class to use the SerialAsync Scheduling algorithm to - generate the data dependency graph for a data-driven - execution. */ + void endGeneration() + { + PAssert(inGeneration()); + generationStack_m.pop(); -template<> -class Iterate : public Runnable -{ - friend class IterateScheduler; - friend class DataObject; +#if POOMA_MPI + // this is a safe point to block until we have "lots" of MPI Requests + if (!inGeneration()) + while (!SystemContext::haveLotsOfMPIRequests()) + SystemContext::runSomething(true); +#endif + } -public: + /// Wether we are inside a generation and may not safely block. - typedef DataObject DataObject_t; - typedef IterateScheduler IterateScheduler_t; + bool inGeneration() const + { + return !generationStack_m.empty(); + } + /// What the current generation is. - /////////////////////////// - // The Constructor for this class takes the IterateScheduler and a - // CPU affinity. CPU affinity has a default value of -1 which means - // it may run on any CPU available. - // - inline Iterate(IterateScheduler & s, int affinity=-1); - - /////////////////////////// - // The dtor is virtual because the subclasses will need to add to it. - // - virtual ~Iterate() {} + int generation() const + { + if (!inGeneration()) + return -1; + return generationStack_m.top(); + } - /////////////////////////// - // The run method does the core work of the Iterate. - // It is supplied by the subclass. - // - virtual void run() = 0; + /// The parser thread calls this method to evaluate the generated + /// graph until all the nodes in the dependence graph has been + /// executed by the scheduler. That is to say, the scheduler + /// executes all the Iterates that has been handed off to it by the + /// parser thread. - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline int affinity() const {return 0;} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline int hintAffinity() const {return 0;} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline void affinity(int) {} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline void hintAffinity(int) {} + void blockingEvaluate() + { + if (inGeneration()) { + // It's not safe to block inside a generation, so + // just do as much as we can without blocking. + while (SystemContext::runSomething(false)) + ; + + } else { + // Loop as long as there is anything in the queue. + while (SystemContext::workReady()) + SystemContext::runSomething(true); + } + } - /////////////////////////// - // Notify is used to indicate to the Iterate that one of the data - // objects it had requested has been granted. To do this, we dec a - // dependence counter which, if equal to 0, the Iterate is ready for - // execution. 
- // - inline void notify(); - - /////////////////////////// - // How many notifications remain? - // - inline - int notifications() const { return notifications_m; } + /// The parser thread calls this method to ask the scheduler to run + /// the given Iterate when the dependence on that Iterate has been + /// satisfied. - inline void addNotification() + void handOff(Iterate* it) { - notifications_m++; + // No action needs to be taken here. Iterates will make their + // own way into the execution queue. + it->generation() = generation(); + it->notify(); } -protected: + void releaseIterates() { } - // What scheduler are we working with? - IterateScheduler &scheduler_m; +private: - // How many notifications should we receive before we can run? - int notifications_m; + typedef std::list Container_t; + typedef Container_t::iterator Iterator_t; -private: - // Set notifications dynamically and automatically every time a - // request is made by the iterate - void incr_notifications() { notifications_m++;} + static std::stack generationStack_m; + int generation_m; }; -//----------------------------------------------------------------------------- - -/*------------------------------------------------------------------------ -CLASS - DataObject_SerialAsync - - Implements a asynchronous scheduler for a data driven execution. -KEYWORDS - Data-parallelism, Native-interface, IterateScheduler. - -DESCRIPTION - The DataObject Class is used introduce a type to represent - a resources (normally) blocks of data) that Iterates contend - for atomic access. Iterates make request for either a read or - write to the DataObjects. DataObjects may grant the request if - the object is currently available. Otherwise, the request is - enqueue in a queue private to the data object until the - DataObject is release by another Iterate. A set of read - requests may be granted all at once if there are no - intervening write request to that DataObject. - DataObject is a specialization of DataObject for - the policy template SerialAsync. -*/ +/** + * Implements a asynchronous scheduler for a data driven execution. + * + * The DataObject Class is used introduce a type to represent + * a resources (normally) blocks of data) that Iterates contend + * for atomic access. Iterates make request for either a read or + * write to the DataObjects. DataObjects may grant the request if + * the object is currently available. Otherwise, the request is + * enqueue in a queue private to the data object until the + * DataObject is release by another Iterate. A set of read + * requests may be granted all at once if there are no + * intervening write request to that DataObject. + * DataObject is a specialization of DataObject for + * the policy template SerialAsync. + * + * There are two ways data can be used: to read or to write. + * Don't change this to give more than two states; + * things inside depend on that. + */ template<> class DataObject @@ -413,54 +513,56 @@ typedef IterateScheduler IterateScheduler_t; typedef Iterate Iterate_t; - // There are two ways data can be used: to read or to write. - // Don't change this to give more than two states: - // things inside depend on that. - - /////////////////////////// - // Construct the data object with an empty set of requests - // and the given affinity. - // - inline DataObject(int affinity=-1); + + /// Construct the data object with an empty set of requests + /// and the given affinity. 
+ + DataObject(int affinity=-1) + : released_m(queue_m.end()), notifications_m(0) + { + // released_m to the end of the queue (which should) also be the + // beginning. notifications_m to zero, since nothing has been + // released yet. + } - /////////////////////////// - // for compatibility with other SMARTS schedulers, accept - // Scheduler arguments (unused) - // - inline - DataObject(int affinity, IterateScheduler&); - - /////////////////////////// - // Stub out affinity because there is no affinity in serial. - // - inline int affinity() const { return 0; } - - /////////////////////////// - // Stub out affinity because there is no affinity in serial. - // - inline void affinity(int) {} + /// for compatibility with other SMARTS schedulers, accept + /// Scheduler arguments (unused) - /////////////////////////// - // An iterate makes a request for a certain action in a certain - // generation. - // - inline - void request(Iterate&, SerialAsync::Action); - - /////////////////////////// - // An iterate finishes and tells the DataObject it no longer needs - // it. If this is the last release for the current set of - // requests, have the IterateScheduler release some more. - // - inline void release(SerialAsync::Action); + inline DataObject(int affinity, IterateScheduler&) + : released_m(queue_m.end()), notifications_m(0) + {} + + /// Stub out affinity because there is no affinity in serial. + + int affinity() const { return 0; } + + /// Stub out affinity because there is no affinity in serial. + + void affinity(int) {} + + /// An iterate makes a request for a certain action in a certain + /// generation. + + inline void request(Iterate&, SerialAsync::Action); + + /// An iterate finishes and tells the DataObject it no longer needs + /// it. If this is the last release for the current set of + /// requests, have the IterateScheduler release some more. + + void release(SerialAsync::Action) + { + if (--notifications_m == 0) + releaseIterates(); + } -protected: private: - // If release needs to let more iterates go, it calls this. + /// If release needs to let more iterates go, it calls this. inline void releaseIterates(); - // The type for a request. + /** + * The type for a request. + */ class Request { public: @@ -475,135 +577,27 @@ SerialAsync::Action act_m; }; - // The type of the queue and iterator. + /// The type of the queue and iterator. typedef std::list Container_t; typedef Container_t::iterator Iterator_t; - // The list of requests from various iterates. - // They're granted in FIFO order. + /// The list of requests from various iterates. + /// They're granted in FIFO order. Container_t queue_m; - // Pointer to the last request that has been granted. + /// Pointer to the last request that has been granted. Iterator_t released_m; - // The number of outstanding notifications. + /// The number of outstanding notifications. int notifications_m; }; -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for -// IterateScheduler -// -////////////////////////////////////////////////////////////////////// - -// -// IterateScheduler::handOff(Iterate*) -// No action needs to be taken here. Iterates will make their -// own way into the execution queue. 
-// - -inline void -IterateScheduler::handOff(Iterate* it) -{ - it->notify(); -} - -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for Iterate -// -////////////////////////////////////////////////////////////////////// - -// -// Iterate::Iterate -// Construct with the scheduler and the number of notifications. -// Ignore the affinity. -// - -inline -Iterate::Iterate(IterateScheduler& s, int) -: scheduler_m(s), notifications_m(1) -{ -} - -// -// Iterate::notify -// Notify the iterate that a DataObject is ready. -// Decrement the counter, and if it is zero, alert the scheduler. -// - -inline void -Iterate::notify() -{ - if ( --notifications_m == 0 ) - { - add(this); - } -} - -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for DataObject -// -////////////////////////////////////////////////////////////////////// - -// -// DataObject::DataObject() -// Initialize: -// released_m to the end of the queue (which should) also be the -// beginning. notifications_m to zero, since nothing has been -// released yet. -// - -inline -DataObject::DataObject(int) -: released_m(queue_m.end()), notifications_m(0) -{ -} - -// -// void DataObject::release(Action) -// An iterate has finished and is telling the DataObject that -// it is no longer needed. -// +/// void DataObject::releaseIterates(SerialAsync::Action) +/// When the last released iterate dies, we need to +/// look at the beginning of the queue and tell more iterates +/// that they can access this data. inline void -DataObject::release(SerialAsync::Action) -{ - if ( --notifications_m == 0 ) - releaseIterates(); -} - - - -//----------------------------------------------------------------------------- -// -// void IterateScheduler::blockingEvaluate -// Evaluate all the iterates in the queue. -// -//----------------------------------------------------------------------------- -inline -void -IterateScheduler::blockingEvaluate() -{ - // Loop as long as there is anything in the queue. - while (SystemContext::workReady()) - { - SystemContext::runSomething(); - } -} - -//----------------------------------------------------------------------------- -// -// void DataObject::releaseIterates(SerialAsync::Action) -// When the last released iterate dies, we need to -// look at the beginning of the queue and tell more iterates -// that they can access this data. -// -//----------------------------------------------------------------------------- -inline -void DataObject::releaseIterates() { // Get rid of the reservations that have finished. @@ -622,14 +616,17 @@ released_m->iterate().notify(); ++notifications_m; - // Record what action that one will take. + // Record what action that one will take + // and record its generation number SerialAsync::Action act = released_m->act(); + int generation = released_m->iterate().generation(); // Look at the next iterate. ++released_m; // If the first one was a read, release more. if ( act == SerialAsync::Read ) + { // As long as we aren't at the end and we have more reads... while ((released_m != end) && @@ -642,29 +639,30 @@ // And go on to the next. ++released_m; } + + } + } } +/// void DataObject::request(Iterate&, action) +/// An iterate makes a reservation with this DataObject for a given +/// action in a given generation. The request may be granted +/// immediately. 
-// -// void DataObject::request(Iterate&, action) -// An iterate makes a reservation with this DataObject for a given -// action in a given generation. The request may be granted -// immediately. -// -inline -void +inline void DataObject::request(Iterate& it, SerialAsync::Action act) { // The request can be granted immediately if: // The queue is currently empty, or - // The request is a read and everything in the queue is a read. + // the request is a read and everything in the queue is a read, + // or (with relaxed conditions), everything is the same generation. // Set notifications dynamically and automatically // every time a request is made by the iterate - it.incr_notifications(); + it.notifications_m++; bool allReleased = (queue_m.end() == released_m); bool releasable = queue_m.empty() || @@ -691,17 +689,11 @@ } -//---------------------------------------------------------------------- - - -// -// End of Smarts namespace. -// -} +} // namespace Smarts ////////////////////////////////////////////////////////////////////// -#endif // POOMA_PACKAGE_CLASS_H +#endif // _SerialAsync_h_ /*************************************************************************** * $RCSfile: SerialAsync.h,v $ $Author: sa_smith $ --- /home/richard/src/pooma/cvs/r2/src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp 2000-04-12 02:08:06.000000000 +0200 +++ Threads/IterateSchedulers/SerialAsync.cmpl.cpp 2004-01-02 00:40:16.000000000 +0100 @@ -82,6 +82,12 @@ std::list SystemContext::workQueueMessages_m; std::list SystemContext::workQueue_m; +#if POOMA_MPI + MPI_Request SystemContext::requests_m[1024]; + std::map SystemContext::allocated_requests_m; + std::set SystemContext::free_requests_m; +#endif +std::stack IterateScheduler::generationStack_m; } From rguenth at tat.physik.uni-tuebingen.de Fri Jan 2 15:36:47 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 2 Jan 2004 16:36:47 +0100 (CET) Subject: [PATCH] OpenMP support Message-ID: Hi Jeffrey, would you please look at "[PATCH] OpenMP loop level parallelism" mail I sent Dec23? Additionally to this patch I propose the following, which adds a --openmp switch to configure. Tested with gcc (with and without --openmp, which is the same here) and Intel icpc (with and without --openmp, which makes a difference here). Ok? Thanks, Richard. 2004Jan02 Richard Guenther * config/arch/LINUXICC.conf: don't warn about unused #pragmas. configure: add --openmp switch. scripts/configure.ac: add test to detect wether -openmp works. scripts/configure: regenerated. 
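The new configure.ac check below just verifies that the compiler accepts -openmp together with the basic loop-parallel construct; compiled by hand, the probe is roughly equivalent to (illustrative file name):

  // icpc -openmp probe.cpp   -- the flag configure tries to add
  #include <omp.h>

  int main()
  {
    double d[128];
  #pragma omp parallel for
    for (int i = 0; i < 128; ++i)
      d[i] = 1.0;
    return omp_get_max_threads() > 0 ? 0 : 1;
  }
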
diff -Nru a/r2/config/arch/LINUXICC.conf b/r2/config/arch/LINUXICC.conf --- a/r2/config/arch/LINUXICC.conf Fri Jan 2 16:32:14 2004 +++ b/r2/config/arch/LINUXICC.conf Fri Jan 2 16:32:14 2004 @@ -170,8 +170,8 @@ ### debug or optimized build settings for C++ applications -$cppdbg_app = "-g"; -$cppopt_app = "-DNOPAssert -DNOCTAssert -O2"; +$cppdbg_app = "-g -wd161"; +$cppopt_app = "-DNOPAssert -DNOCTAssert -O2 -wd161"; ### debug or optimized build settings for C++ libraries diff -Nru a/r2/configure b/r2/configure --- a/r2/configure Fri Jan 2 16:32:14 2004 +++ b/r2/configure Fri Jan 2 16:32:14 2004 @@ -170,6 +170,7 @@ $prefixnm = "--prefix"; $serialnm = "--serial"; $threadsnm = "--threads"; +$openmpnm = "--openmp"; $profilenm = "--profile"; $insurenm = "--insure"; $debugnm = "--debug"; @@ -237,7 +238,8 @@ [$finternm, "", "include fortran support libraries."], [$nofinternm, "", "do not include the fortran libraries."], [$serialnm, "", "configure to run serially, no parallelism."], - [$threadsnm, "", "include threads capability, if available."], + [$threadsnm, "", "include threads capability, if available."], + [$openmpnm, "", "enable use of OpenMP, if available."], [$cheetahnm, "", "enable use of CHEETAH communications package."], [$schednm, "", "use for thread scheduling."], [$pawsnm, "", "enable PAWS program coupling, if available."], @@ -434,6 +436,10 @@ $threads_include_makefile = ""; $cpp_threads_arg = ""; +### include OpenMP capability? +$openmp = 0; +$openmpargs = ""; + ### if threads is used, what scheduler should be employed? $scheduler = $schedulerdefault; @@ -1307,9 +1313,9 @@ sub setthreads { # set $threads variable properly - if (scalar @{$arghash{$threadsnm}} > 1 and scalar @{$arghash{$serialnm}}> 1) + if (scalar @{$arghash{$threadsnm}} > 1 and (scalar @{$arghash{$serialnm}} > 1 or scalar @{$arghash{$openmpnm}} > 1)) { - printerror "$threadsnm and $serialnm both given. Use only one."; + printerror "$threadsnm and $serialnm or $openmpnm given. 
Use only one."; } elsif (not $threads_able or scalar @{$arghash{$serialnm}} > 1) { @@ -1438,6 +1444,13 @@ $pooma_reorder_iterates = $threads || ($scheduler eq "serialAsync"); add_yesno_define("POOMA_REORDER_ITERATES", $pooma_reorder_iterates); + + # OpenMP support + if (scalar @{$arghash{$openmpnm}} > 1) + { + $openmpargs = "\@openmpargs\@"; + } + } @@ -1936,20 +1949,20 @@ print FSUITE "LD = $link\n"; print FSUITE "\n"; print FSUITE "### flags for applications\n"; - print FSUITE "CXX_OPT_LIB_ARGS = \@cppargs\@ $cppshare $cppopt_lib\n"; - print FSUITE "CXX_DBG_LIB_ARGS = \@cppargs\@ $cppshare $cppdbg_lib\n"; - print FSUITE "CXX_OPT_APP_ARGS = \@cppargs\@ $cppopt_app\n"; - print FSUITE "CXX_DBG_APP_ARGS = \@cppargs\@ $cppdbg_app\n"; + print FSUITE "CXX_OPT_LIB_ARGS = \@cppargs\@ $openmpargs $cppshare $cppopt_lib\n"; + print FSUITE "CXX_DBG_LIB_ARGS = \@cppargs\@ $openmpargs $cppshare $cppdbg_lib\n"; + print FSUITE "CXX_OPT_APP_ARGS = \@cppargs\@ $openmpargs $cppopt_app\n"; + print FSUITE "CXX_DBG_APP_ARGS = \@cppargs\@ $openmpargs $cppdbg_app\n"; print FSUITE "\n"; - print FSUITE "C_OPT_LIB_ARGS = \@cargs\@ $cshare $copt_lib\n"; - print FSUITE "C_DBG_LIB_ARGS = \@cargs\@ $cshare $cdbg_lib\n"; - print FSUITE "C_OPT_APP_ARGS = \@cargs\@ $copt_app\n"; - print FSUITE "C_DBG_APP_ARGS = \@cargs\@ $cdbg_app\n"; + print FSUITE "C_OPT_LIB_ARGS = \@cargs\@ $openmpargs $cshare $copt_lib\n"; + print FSUITE "C_DBG_LIB_ARGS = \@cargs\@ $openmpargs $cshare $cdbg_lib\n"; + print FSUITE "C_OPT_APP_ARGS = \@cargs\@ $openmpargs $copt_app\n"; + print FSUITE "C_DBG_APP_ARGS = \@cargs\@ $openmpargs $cdbg_app\n"; print FSUITE "\n"; - print FSUITE "F77_OPT_LIB_ARGS = $f77args $f77share $f77opt_lib\n"; - print FSUITE "F77_DBG_LIB_ARGS = $f77args $f77share $f77dbg_lib\n"; - print FSUITE "F77_OPT_APP_ARGS = $f77args $f77opt_app\n"; - print FSUITE "F77_DBG_APP_ARGS = $f77args $f77dbg_app\n"; + print FSUITE "F77_OPT_LIB_ARGS = $f77args $openmpargs $f77share $f77opt_lib\n"; + print FSUITE "F77_DBG_LIB_ARGS = $f77args $openmpargs $f77share $f77dbg_lib\n"; + print FSUITE "F77_OPT_APP_ARGS = $f77args $openmpargs $f77opt_app\n"; + print FSUITE "F77_DBG_APP_ARGS = $f77args $openmpargs $f77dbg_app\n"; print FSUITE "\n"; if ($shared) { print FSUITE "AR_OPT_ARGS = $arshareopt\n"; diff -Nru a/r2/scripts/configure.ac b/r2/scripts/configure.ac --- a/r2/scripts/configure.ac Fri Jan 2 16:32:14 2004 +++ b/r2/scripts/configure.ac Fri Jan 2 16:32:14 2004 @@ -352,6 +352,31 @@ dnl +dnl Check for compiler argument for OpenMP support +dnl + +AC_MSG_CHECKING([for way to enable OpenMP support]) +acx_saved_cxxflags=$CXXFLAGS +CXXFLAGS="$CXXFLAGS -openmp" +AC_TRY_LINK([ +#include +], [ + double d[128]; +#pragma omp parallel for + for (int i=0; i<128; ++i) + d[i] = 1.0; + omp_get_max_threads(); +], [ +AC_MSG_RESULT([-openmp]) +openmpargs="-openmp" +], [ +AC_MSG_RESULT([none]) +]) +CXXFLAGS=$acx_saved_cxxflags +AC_SUBST(openmpargs) + + +dnl dnl Check on how to get failure on unrecognized pragmas dnl gcc: -Wunknown-pragmas -Werror dnl icpc: -we161 From oldham at codesourcery.com Fri Jan 2 20:01:07 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 02 Jan 2004 12:01:07 -0800 Subject: [pooma-dev] CVS down? In-Reply-To: References: Message-ID: <3FF5CE03.4090907@codesourcery.com> Richard Guenther wrote: > Hi! 
> > $ traceroute pooma.codesourcery.com > traceroute to pooma.codesourcery.com (65.73.237.138), 30 hops max, 38 byte > packets > 1 kolme.hamnixda.de (192.168.100.254) 5.260 ms 2.008 ms 1.801 ms > 2 217.5.98.157 (217.5.98.157) 69.914 ms 61.777 ms 61.311 ms > 3 217.237.156.218 (217.237.156.218) 60.429 ms 59.232 ms 60.236 ms > 4 WAS-E4.WAS.US.NET.DTAG.DE (62.154.14.134) 191.891 ms 165.471 ms > 162.642 ms > 5 62.156.138.210 (62.156.138.210) 168.278 ms 173.438 ms 168.534 ms > 6 bpr2-ae0.VirginiaEquinix.cw.net (208.173.50.253) 167.892 ms 170.474 > ms 168.912 ms > 7 208.173.50.242 (208.173.50.242) 163.327 ms 166.910 ms 203.087 ms > 8 p7-3.cr01.mcln.eli.net (207.173.114.129) 171.543 ms !H 165.860 ms !H * > > Oh, btw. www.codesourcery.com seems to be down, too ((61) Connection > refused). Thank you for the report of the difficulties. 01 January, CodeSourcery moved its machines, which now have different IP addresses. As the new DNS entries move through the Internet, these problems will disappear. We apologize for any difficulties these changes may have caused. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:12:38 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:12:38 -0800 Subject: [PATCH] Add --mpi configure switch In-Reply-To: References: Message-ID: <3FF9D346.9040206@codesourcery.com> Richard Guenther wrote: > Hi! > > This (finally) adds --mpi configure switch to enable POOMA_MPI. It checks > for mpiCC or mpic++ in either $MPICH_ROOT/bin or the current $PATH and > uses the first one found as new $cpp and $link. > > I didn't change the Cheetah configure switch which now has the slightly > confusing name --messaging. Maybe we want to change this to --cheetah. > > Ok? Yes. This is good progress. > I'll start full testing of serial, MPI and Cheetah to see if I forgot a > part of the changes after the pending stuff is committed. > > Thanks, > > Richard. > > > 2004Jan02 Richard Guenther > > * configure: add --mpi switch to enable MPI messaging using > mpiCC/mpic++. 
> > --- /home/richard/src/pooma/cvs/r2/configure 2003-12-30 18:19:29.000000000 +0100 > +++ configure 2004-01-02 00:40:10.000000000 +0100 > @@ -209,8 +208,9 @@ > $hdf5nm = "--hdf5"; > $fftwnm = "--fftw"; > $cheetahnm = "--messaging"; > +$mpinm = "--mpi"; > $strictnm = "--strict"; > $archfnsnm = "--arch-specific-functions"; > > ### configure options > $dbgprntnm = "-v"; # turn on verbose output from configure > @@ -236,10 +237,11 @@ > [$sharednm, "", "create a shared library."], > [$finternm, "", "include fortran support libraries."], > [$nofinternm, "", "do not include the fortran libraries."], > [$preinstnm, "", "build preinstantiations of several classes."], > [$serialnm, "", "configure to run serially, no parallelism."], > - [$threadsnm, "", "include threads capability, if available."], > + [$threadsnm, "", "include threads capability, if available."], > [$cheetahnm, "", "enable use of CHEETAH communications package."], > + [$mpinm, "", "enable use of MPI communications package."], > [$schednm, "", "use for thread scheduling."], > [$pawsnm, "", "enable PAWS program coupling, if available."], > [$pawsdevnm, "", "enable PAWS program coupling for PAWS devel."], > @@ -1276,13 +1266,22 @@ > { > $cheetah = 1; > } > - print "Set cheetah = $cheetah\n" if $dbgprnt; > + if (scalar @{$arghash{$mpinm}} > 1) > + { > + $mpi = 1; > + } > $messaging = $cheetah + $mpi; > + if ($messaging>1 or $messaging and scalar @{$arghash{$serialnm}}> 1) > + { > + printerror "$cheetahnm and/or $mpinm and/or $serialnm given. Use only one."; > + } > + print "Set messaging = $messaging\n" if $dbgprnt; > > # add a define indicating whether CHEETAH/MPI is available, and configure > # extra options to include and define lists > my $defmessaging = $messaging; > my $defcheetah = 0; > + my $defmpi = 0; > if ($cheetah) > { > if (exists $ENV{"CHEETAHDIR"}) > @@ -1299,7 +1298,6 @@ > } > > $defcheetah = 1; > - > $scheduler = "serialAsync"; > > # add in the extra compilation settings for CHEETAH. > @@ -1315,8 +1313,40 @@ > $link = $cheetah_link; > } > } > + elsif ($mpi) > + { > + my $mpiCC = "\$(MPICH_ROOT)/bin/mpiCC"; > + if (system("test -x $MPICH_ROOT/bin/mpiCC") == 0) > + { > + $mpiCC = "\$(MPICH_ROOT)/bin/mpiCC"; > + } > + elsif (system("test -x $MPICH_ROOT/bin/mpic++") == 0) > + { > + $mpiCC = "\$(MPICH_ROOT)/bin/mpic++"; > + } > + elsif (system("which mpiCC") == 0) > + { > + $mpiCC = "mpiCC"; > + } > + elsif (system("which mpic++") == 0) > + { > + $mpiCC = "mpic++"; > + } > + else > + { > + die "There is no known MPI location. Select one by setting MPICH_ROOT or adjusting your PATH.\n"; > + } > + > + $defmpi = 1; > + $scheduler = "serialAsync"; > + > + # use special compiler script for MPI. > + $cpp = $mpiCC; > + $link = $mpiCC; > + } > add_yesno_define("POOMA_MESSAGING", $defmessaging); > add_yesno_define("POOMA_CHEETAH", $defcheetah); > + add_yesno_define("POOMA_MPI", $defmpi); > } > > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:30:43 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:30:43 -0800 Subject: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: References: Message-ID: <3FF9D783.5030504@codesourcery.com> Richard Guenther wrote: > The patch was tested as usual. > > Ok to commit? I have some questions and comments interspersed below. > Thanks, Richard. 
> > > 2004Jan02 Richard Guenther > > * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, > add std::stack for generation tracking, add support for > asyncronous MPI requests. Add an 'h' to spell asynchronous. > src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp: define > new static variables. > > --- /home/richard/src/pooma/cvs/r2/src/Threads/IterateSchedulers/SerialAsync.h 2000-06-09 00:16:50.000000000 +0200 > +++ Threads/IterateSchedulers/SerialAsync.h 2004-01-02 00:40:16.000000000 +0100 > @@ -42,48 +42,38 @@ > // DataObject > //----------------------------------------------------------------------------- > > -#include > - > #ifndef _SerialAsync_h_ > #define _SerialAsync_h_ > -/* > -LIBRARY: > - SerialAsync > - > -CLASSES: IterateScheduler > - > -CLASSES: DataObject > - > -CLASSES: Iterate > - > -OVERVIEW > - SerialAsync IterateScheduler is a policy template to create a > - dependence graphs and executes the graph respecting the > - dependencies without using threads. There is no parallelism, > - but Iterates may be executed out-of-order with respect to the > - program text. > - > ------------------------------------------------------------------------------*/ > - > -////////////////////////////////////////////////////////////////////// > > -//----------------------------------------------------------------------------- > -// Overview: > -// Smarts classes for times when you want no threads but you do want > -// dataflow evaluation. > -//----------------------------------------------------------------------------- > - > -//----------------------------------------------------------------------------- > -// Typedefs: > -//----------------------------------------------------------------------------- > +/** @file > + * @ingroup IterateSchedulers > + * @brief > + * Smarts classes for times when you want no threads but you do want > + * dataflow evaluation. > + * > + * SerialAsync IterateScheduler is a policy template to create a > + * dependence graphs and executes the graph respecting the > + * dependencies without using threads. > + * There is no (thread level) parallelism, but Iterates may be executed > + * out-of-order with respect to the program text. Also this scheduler is > + * used for message based parallelism in which case asyncronous execution > + * leads to reduced communication latencies. > + */ > > //----------------------------------------------------------------------------- > // Includes: > //----------------------------------------------------------------------------- > > #include > +#include > +#include > +#include > +#include > +#include > #include "Threads/IterateSchedulers/IterateScheduler.h" > #include "Threads/IterateSchedulers/Runnable.h" > +#include "Tulip/Messaging.h" > +#include "Utilities/PAssert.h" > > //----------------------------------------------------------------------------- > // Forward Declarations: > @@ -94,76 +84,261 @@ > > namespace Smarts { > > -#define MYID 0 > -#define MAX_CPUS 1 > -// > -// Tag class for specializing IterateScheduler, Iterate and DataObject. > -// > +/** > + * Tag class for specializing IterateScheduler, Iterate and DataObject. > + */ > + > struct SerialAsync > { > - enum Action { Read, Write}; > + enum Action { Read, Write }; > }; > > > -//----------------------------------------------------------------------------- > +/** > + * Iterate is used to implement the SerialAsync > + * scheduling policy. > + * > + * An Iterate is a non-blocking unit of concurrency that is used > + * to describe a chunk of work. 
It inherits from the Runnable > + * class and as all subclasses of Runnable, the user specializes > + * the run() method to specify the operation. > + * Iterate is a further specialization of the > + * Iterate class to use the SerialAsync Scheduling algorithm to > + * generate the data dependency graph for a data-driven > + * execution. > + */ > + > +template<> > +class Iterate : public Runnable > +{ > + friend class IterateScheduler; > + friend class DataObject; > + > +public: > + > + typedef DataObject DataObject_t; > + typedef IterateScheduler IterateScheduler_t; > + > + > + /// The Constructor for this class takes the IterateScheduler and a > + /// CPU affinity. CPU affinity has a default value of -1 which means > + /// it may run on any CPU available. > + > + inline Iterate(IterateScheduler & s, int affinity=-1) > + : scheduler_m(s), notifications_m(1), generation_m(-1), togo_m(1) > + {} > + > + /// The dtor is virtual because the subclasses will need to add to it. > + > + virtual ~Iterate() {} > + > + /// The run method does the core work of the Iterate. > + /// It is supplied by the subclass. > + > + virtual void run() = 0; > + > + //@name Stubs for the affinities > + /// There is no such thing in serial. > + //@{ > + > + inline int affinity() const {return 0;} > + > + inline int hintAffinity() const {return 0;} > + > + inline void affinity(int) {} > + > + inline void hintAffinity(int) {} > + > + //@} > + > + /// Notify is used to indicate to the Iterate that one of the data > + /// objects it had requested has been granted. To do this, we dec a > + /// dependence counter which, if equal to 0, the Iterate is ready for > + /// execution. > + > + void notify() > + { > + if (--notifications_m == 0) > + add(this); > + } > + > + /// How many notifications remain? > + > + int notifications() const { return notifications_m; } > + > + void addNotification() { notifications_m++; } > + > + int& generation() { return generation_m; } > + > + int& togo() { return togo_m; } > + > +protected: > + > + /// What scheduler are we working with? > + IterateScheduler &scheduler_m; > + > + /// How many notifications should we receive before we can run? > + int notifications_m; > + > + /// Which generation we were issued in. > + int generation_m; > + > + /// How many times we need to go past a "did something" to be ready > + /// for destruction? > + int togo_m; > + > +}; > + > + > +/** > + * FIXME. > + */ I am wary of adding unfinished code to the code base. At the very least, we need a more extensive comment describing what is not finished. > struct SystemContext > { > void addNCpus(int) {} > void wait() {} > void concurrency(int){} > - int concurrency() {return 1;} > + int concurrency() { return 1; } > void mustRunOn() {} > > // We have a separate message queue because they are > // higher priority. > + typedef Iterate *IteratePtr_t; > static std::list workQueueMessages_m; > static std::list workQueue_m; > +#if POOMA_MPI > + static MPI_Request requests_m[1024]; What is this fixed constant of 1024? Does this come from the MPI standard? > + static std::map allocated_requests_m; > + static std::set free_requests_m; > +#endif > + > + > +#if POOMA_MPI > > - /////////////////////////// > - // This function lets you check if there are iterates that are > - // ready to run. 
> - inline static > - bool workReady() > + /// Query, if we have lots of MPI_Request slots available > + > + static bool haveLotsOfMPIRequests() > { > - return !(workQueue_m.empty() && workQueueMessages_m.empty()); > + return free_requests_m.size() > 1024/2; > } > > - /////////////////////////// > - // Run an iterate if one is ready. > - inline static > - void runSomething() > + /// Get a MPI_Request slot, associated with an iterate > + > + static MPI_Request* getMPIRequest(IteratePtr_t p) > { > - if (!workQueueMessages_m.empty()) > - { > - // Get the top iterate. > - // Delete it from the queue. > - // Run the iterate. > - // Delete the iterate. This could put more iterates in the queue. > + PInsist(!free_requests_m.empty(), "No free MPIRequest slots."); > + int i = *free_requests_m.begin(); > + free_requests_m.erase(free_requests_m.begin()); > + allocated_requests_m[i] = p; > + p->togo()++; > + return &requests_m[i]; > + } > > - RunnablePtr_t p = workQueueMessages_m.front(); > - workQueueMessages_m.pop_front(); > - p->execute(); > + static void releaseMPIRequest(int i) > + { > + IteratePtr_t p = allocated_requests_m[i]; > + allocated_requests_m.erase(i); > + free_requests_m.insert(i); > + if (--(p->togo()) == 0) > delete p; > - } > + } > + > + static bool waitForSomeRequests(bool mayBlock) > + { > + if (allocated_requests_m.empty()) > + return false; > + > + int last_used_request = allocated_requests_m.rbegin()->first; > + int finished[last_used_request+1]; > + MPI_Status statuses[last_used_request+1]; > + int nr_finished; > + int res; > + if (mayBlock) > + res = MPI_Waitsome(last_used_request+1, requests_m, > + &nr_finished, finished, statuses); > else > - { > - if (!workQueue_m.empty()) > - { > - RunnablePtr_t p = workQueue_m.front(); > - workQueue_m.pop_front(); > - p->execute(); > - delete p; > + res = MPI_Testsome(last_used_request+1, requests_m, > + &nr_finished, finished, statuses); > + PAssert(res == MPI_SUCCESS || res == MPI_ERR_IN_STATUS); > + if (nr_finished == MPI_UNDEFINED) > + return false; > + > + // release finised requests > + while (nr_finished--) { > + if (res == MPI_ERR_IN_STATUS) { > + if (statuses[nr_finished].MPI_ERROR != MPI_SUCCESS) { > + char msg[MPI_MAX_ERROR_STRING+1]; > + int len; > + MPI_Error_string(statuses[nr_finished].MPI_ERROR, msg, &len); > + msg[len] = '\0'; > + PInsist(0, msg); > + } > } > + releaseMPIRequest(finished[nr_finished]); > } > + return true; > + } > + > +#else > + > + static bool waitForSomeRequests(bool mayBlock) > + { > + return false; > + } > + > +#endif > + > + > + /// This function lets you check if there are iterates that are > + /// ready to run. > + > + static bool workReady() > + { > + return !(workQueue_m.empty() > + && workQueueMessages_m.empty() > +#if POOMA_MPI > + && allocated_requests_m.empty() > +#endif > + ); > + } > + > + /// Run an iterate if one is ready. Returns if progress > + /// was made. 
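To make the intended use of these request slots concrete, here is a hypothetical iterate (not part of the patch; SendReceive.h does the real work) that posts a nonblocking send and leaves completion to the scheduler. Only getMPIRequest() and the togo() bookkeeping come from the patch; the buffer, tag and destination are made up, and a --mpi build is assumed:

    #include <mpi.h>
    #include "Threads/IterateSchedulers/SerialAsync.h"

    // Hypothetical sketch: a message iterate using the new request slots.
    class SendIterate : public Smarts::Iterate<Smarts::SerialAsync>
    {
    public:
      SendIterate(Smarts::IterateScheduler<Smarts::SerialAsync> &s,
                  const double *buf, int n, int toContext)
        : Smarts::Iterate<Smarts::SerialAsync>(s),
          buffer_m(buf), n_m(n), to_m(toContext)
      {}

      virtual void run()
      {
        // Tie an MPI_Request slot to this iterate; getMPIRequest() bumps
        // togo(), so the iterate stays alive until the request completes.
        MPI_Request *req = Smarts::SystemContext::getMPIRequest(this);
        MPI_Isend(const_cast<double *>(buffer_m), n_m, MPI_DOUBLE,
                  to_m, 0 /* tag, made up */, MPI_COMM_WORLD, req);
        // waitForSomeRequests() later calls releaseMPIRequest(), which
        // drops togo() back to zero and deletes the iterate.
      }

    private:
      const double *buffer_m;
      int n_m, to_m;
    };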
> + > + static bool runSomething(bool mayBlock = true) > + { > + // do work in this order to minimize communication latency: > + // - issue all messages > + // - do some regular work > + // - wait for messages to complete > + > + RunnablePtr_t p = NULL; > + if (!workQueueMessages_m.empty()) { > + p = workQueueMessages_m.front(); > + workQueueMessages_m.pop_front(); > + } else if (!workQueue_m.empty()) { > + p = workQueue_m.front(); > + workQueue_m.pop_front(); > + } > + > + if (p) { > + p->execute(); > + Iterate *it = dynamic_cast(p); > + if (it) { > + if (--(it->togo()) == 0) > + delete it; > + } else > + delete p; > + return true; > + > + } else > + return waitForSomeRequests(mayBlock); > } > > }; > > -inline void addRunnable(RunnablePtr_t rn) > -{ > - SystemContext::workQueue_m.push_front(rn); > -} > +/// Adds a runnable to the appropriate work-queue. > > inline void add(RunnablePtr_t rn) > { > @@ -182,25 +357,18 @@ > inline void wait() {} > inline void mustRunOn(){} > > -/*------------------------------------------------------------------------ > -CLASS > - IterateScheduler_Serial_Async > - > - Implements a asynchronous scheduler for a data driven execution. > - Specializes a IterateScheduler. > - > -KEYWORDS > - Data-parallelism, Native-interface, IterateScheduler. > - > -DESCRIPTION > - > - The SerialAsync IterateScheduler, Iterate and DataObject > - implement a SMARTS scheduler that does dataflow without threads. > - What that means is that when you hand iterates to the > - IterateScheduler it stores them up until you call > - IterateScheduler::blockingEvaluate(), at which point it evaluates > - iterates until the queue is empty. > ------------------------------------------------------------------------------*/ > + > +/** > + * Implements a asynchronous scheduler for a data driven execution. > + * Specializes a IterateScheduler. > + * > + * The SerialAsync IterateScheduler, Iterate and DataObject > + * implement a SMARTS scheduler that does dataflow without threads. > + * What that means is that when you hand iterates to the > + * IterateScheduler it stores them up until you call > + * IterateScheduler::blockingEvaluate(), at which point it evaluates > + * iterates until the queue is empty. > + */ > > template<> > class IterateScheduler > @@ -212,196 +380,128 @@ > typedef DataObject DataObject_t; > typedef Iterate Iterate_t; > > - /////////////////////////// > - // Constructor > - // > - IterateScheduler() {} > - > - /////////////////////////// > - // Destructor > - // > - ~IterateScheduler() {} > - void setConcurrency(int) {} > - > - //--------------------------------------------------------------------------- > - // Mutators. > - //--------------------------------------------------------------------------- > - > - /////////////////////////// > - // Tells the scheduler that the parser thread is starting a new > - // data-parallel statement. Any Iterate that is handed off to the > - // scheduler between beginGeneration() and endGeneration() belongs > - // to the same data-paralllel statement and therefore has the same > - // generation number. > - // > - inline void beginGeneration() { } > - > - /////////////////////////// > - // Tells the scheduler that no more Iterates will be handed off for > - // the data parallel statement that was begun with a > - // beginGeneration(). 
> - // > - inline void endGeneration() {} > - > - /////////////////////////// > - // The parser thread calls this method to evaluate the generated > - // graph until all the nodes in the dependence graph has been > - // executed by the scheduler. That is to say, the scheduler > - // executes all the Iterates that has been handed off to it by the > - // parser thread. > - // > - inline > - void blockingEvaluate(); > - > - /////////////////////////// > - // The parser thread calls this method to ask the scheduler to run > - // the given Iterate when the dependence on that Iterate has been > - // satisfied. > - // > - inline void handOff(Iterate* it); > + IterateScheduler() > + : generation_m(0) > + {} > > - inline > - void releaseIterates() { } > + ~IterateScheduler() {} > > -protected: > -private: > + void setConcurrency(int) {} > > - typedef std::list Container_t; > - typedef Container_t::iterator Iterator_t; > + /// Tells the scheduler that the parser thread is starting a new > + /// data-parallel statement. Any Iterate that is handed off to the > + /// scheduler between beginGeneration() and endGeneration() belongs > + /// to the same data-paralllel statement and therefore has the same > + /// generation number. > + /// Nested invocations are handled as being part of the outermost > + /// generation. > > -}; > + void beginGeneration() > + { > + // Ensure proper overflow behavior. > + if (++generation_m < 0) > + generation_m = 0; > + generationStack_m.push(generation_m); > + } > > -//----------------------------------------------------------------------------- > + /// Tells the scheduler that no more Iterates will be handed off for > + /// the data parallel statement that was begun with a > + /// beginGeneration(). > > -/*------------------------------------------------------------------------ > -CLASS > - Iterate_SerialAsync > - > - Iterate is used to implement the SerialAsync > - scheduling policy. > - > -KEYWORDS > - Data_Parallelism, Native_Interface, IterateScheduler, Data_Flow. > - > -DESCRIPTION > - An Iterate is a non-blocking unit of concurrency that is used > - to describe a chunk of work. It inherits from the Runnable > - class and as all subclasses of Runnable, the user specializes > - the run() method to specify the operation. > - Iterate is a further specialization of the > - Iterate class to use the SerialAsync Scheduling algorithm to > - generate the data dependency graph for a data-driven > - execution. */ > + void endGeneration() > + { > + PAssert(inGeneration()); > + generationStack_m.pop(); > > -template<> > -class Iterate : public Runnable > -{ > - friend class IterateScheduler; > - friend class DataObject; > +#if POOMA_MPI > + // this is a safe point to block until we have "lots" of MPI Requests > + if (!inGeneration()) > + while (!SystemContext::haveLotsOfMPIRequests()) > + SystemContext::runSomething(true); > +#endif > + } > > -public: > + /// Wether we are inside a generation and may not safely block. > > - typedef DataObject DataObject_t; > - typedef IterateScheduler IterateScheduler_t; > + bool inGeneration() const > + { > + return !generationStack_m.empty(); > + } > > + /// What the current generation is. > > - /////////////////////////// > - // The Constructor for this class takes the IterateScheduler and a > - // CPU affinity. CPU affinity has a default value of -1 which means > - // it may run on any CPU available. 
> - // > - inline Iterate(IterateScheduler & s, int affinity=-1); > - > - /////////////////////////// > - // The dtor is virtual because the subclasses will need to add to it. > - // > - virtual ~Iterate() {} > + int generation() const > + { > + if (!inGeneration()) > + return -1; > + return generationStack_m.top(); > + } > > - /////////////////////////// > - // The run method does the core work of the Iterate. > - // It is supplied by the subclass. > - // > - virtual void run() = 0; > + /// The parser thread calls this method to evaluate the generated > + /// graph until all the nodes in the dependence graph has been > + /// executed by the scheduler. That is to say, the scheduler > + /// executes all the Iterates that has been handed off to it by the > + /// parser thread. > > - /////////////////////////// > - // Stub in all the affinities, because there is no such thing > - // in serial. > - // > - inline int affinity() const {return 0;} > - /////////////////////////// > - // Stub in all the affinities, because there is no such thing > - // in serial. > - // > - inline int hintAffinity() const {return 0;} > - /////////////////////////// > - // Stub in all the affinities, because there is no such thing > - // in serial. > - // > - inline void affinity(int) {} > - /////////////////////////// > - // Stub in all the affinities, because there is no such thing > - // in serial. > - // > - inline void hintAffinity(int) {} > + void blockingEvaluate() > + { > + if (inGeneration()) { > + // It's not safe to block inside a generation, so > + // just do as much as we can without blocking. > + while (SystemContext::runSomething(false)) > + ; > + > + } else { > + // Loop as long as there is anything in the queue. > + while (SystemContext::workReady()) > + SystemContext::runSomething(true); > + } > + } > > - /////////////////////////// > - // Notify is used to indicate to the Iterate that one of the data > - // objects it had requested has been granted. To do this, we dec a > - // dependence counter which, if equal to 0, the Iterate is ready for > - // execution. > - // > - inline void notify(); > - > - /////////////////////////// > - // How many notifications remain? > - // > - inline > - int notifications() const { return notifications_m; } > + /// The parser thread calls this method to ask the scheduler to run > + /// the given Iterate when the dependence on that Iterate has been > + /// satisfied. > > - inline void addNotification() > + void handOff(Iterate* it) > { > - notifications_m++; > + // No action needs to be taken here. Iterates will make their > + // own way into the execution queue. > + it->generation() = generation(); > + it->notify(); > } > > -protected: > + void releaseIterates() { } > > - // What scheduler are we working with? > - IterateScheduler &scheduler_m; > +private: > > - // How many notifications should we receive before we can run? > - int notifications_m; > + typedef std::list Container_t; > + typedef Container_t::iterator Iterator_t; > > -private: > - // Set notifications dynamically and automatically every time a > - // request is made by the iterate > - void incr_notifications() { notifications_m++;} > + static std::stack generationStack_m; > + int generation_m; > > }; > > > -//----------------------------------------------------------------------------- > - > -/*------------------------------------------------------------------------ > -CLASS > - DataObject_SerialAsync > - > - Implements a asynchronous scheduler for a data driven execution. 
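For orientation, the calling pattern these methods support, as a hedged sketch; NoopIterate is a made-up stand-in for a real Iterate<SerialAsync> subclass:

    #include "Threads/IterateSchedulers/SerialAsync.h"

    // Illustration only: a trivial iterate.
    class NoopIterate : public Smarts::Iterate<Smarts::SerialAsync>
    {
    public:
      NoopIterate(Smarts::IterateScheduler<Smarts::SerialAsync> &s)
        : Smarts::Iterate<Smarts::SerialAsync>(s) {}
      virtual void run() {}   // real iterates do their chunk of work here
    };

    // How the evaluator side might drive one data-parallel statement.
    void evaluateOneStatement(Smarts::IterateScheduler<Smarts::SerialAsync> &scheduler)
    {
      scheduler.beginGeneration();                    // statement gets a generation
      scheduler.handOff(new NoopIterate(scheduler));  // no data requests, so it is queued at once
      scheduler.endGeneration();                      // safe point; may drain MPI requests
      scheduler.blockingEvaluate();                   // run whatever is still queued
    }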
> -KEYWORDS > - Data-parallelism, Native-interface, IterateScheduler. > - > -DESCRIPTION > - The DataObject Class is used introduce a type to represent > - a resources (normally) blocks of data) that Iterates contend > - for atomic access. Iterates make request for either a read or > - write to the DataObjects. DataObjects may grant the request if > - the object is currently available. Otherwise, the request is > - enqueue in a queue private to the data object until the > - DataObject is release by another Iterate. A set of read > - requests may be granted all at once if there are no > - intervening write request to that DataObject. > - DataObject is a specialization of DataObject for > - the policy template SerialAsync. > -*/ > +/** > + * Implements a asynchronous scheduler for a data driven execution. > + * > + * The DataObject Class is used introduce a type to represent > + * a resources (normally) blocks of data) that Iterates contend > + * for atomic access. Iterates make request for either a read or > + * write to the DataObjects. DataObjects may grant the request if > + * the object is currently available. Otherwise, the request is > + * enqueue in a queue private to the data object until the > + * DataObject is release by another Iterate. A set of read > + * requests may be granted all at once if there are no > + * intervening write request to that DataObject. > + * DataObject is a specialization of DataObject for > + * the policy template SerialAsync. > + * > + * There are two ways data can be used: to read or to write. > + * Don't change this to give more than two states; > + * things inside depend on that. > + */ > > template<> > class DataObject > @@ -413,54 +513,56 @@ > typedef IterateScheduler IterateScheduler_t; > typedef Iterate Iterate_t; > > - // There are two ways data can be used: to read or to write. > - // Don't change this to give more than two states: > - // things inside depend on that. > - > - /////////////////////////// > - // Construct the data object with an empty set of requests > - // and the given affinity. > - // > - inline DataObject(int affinity=-1); > + > + /// Construct the data object with an empty set of requests > + /// and the given affinity. > + > + DataObject(int affinity=-1) > + : released_m(queue_m.end()), notifications_m(0) > + { > + // released_m to the end of the queue (which should) also be the > + // beginning. notifications_m to zero, since nothing has been > + // released yet. > + } > > - /////////////////////////// > - // for compatibility with other SMARTS schedulers, accept > - // Scheduler arguments (unused) > - // > - inline > - DataObject(int affinity, IterateScheduler&); > - > - /////////////////////////// > - // Stub out affinity because there is no affinity in serial. > - // > - inline int affinity() const { return 0; } > - > - /////////////////////////// > - // Stub out affinity because there is no affinity in serial. > - // > - inline void affinity(int) {} > + /// for compatibility with other SMARTS schedulers, accept > + /// Scheduler arguments (unused) > > - /////////////////////////// > - // An iterate makes a request for a certain action in a certain > - // generation. > - // > - inline > - void request(Iterate&, SerialAsync::Action); > - > - /////////////////////////// > - // An iterate finishes and tells the DataObject it no longer needs > - // it. If this is the last release for the current set of > - // requests, have the IterateScheduler release some more. 
> - // > - inline void release(SerialAsync::Action); > + inline DataObject(int affinity, IterateScheduler&) > + : released_m(queue_m.end()), notifications_m(0) > + {} > + > + /// Stub out affinity because there is no affinity in serial. > + > + int affinity() const { return 0; } > + > + /// Stub out affinity because there is no affinity in serial. > + > + void affinity(int) {} > + > + /// An iterate makes a request for a certain action in a certain > + /// generation. > + > + inline void request(Iterate&, SerialAsync::Action); > + > + /// An iterate finishes and tells the DataObject it no longer needs > + /// it. If this is the last release for the current set of > + /// requests, have the IterateScheduler release some more. > + > + void release(SerialAsync::Action) > + { > + if (--notifications_m == 0) > + releaseIterates(); > + } > > -protected: > private: > > - // If release needs to let more iterates go, it calls this. > + /// If release needs to let more iterates go, it calls this. > inline void releaseIterates(); > > - // The type for a request. > + /** > + * The type for a request. > + */ > class Request > { > public: > @@ -475,135 +577,27 @@ > SerialAsync::Action act_m; > }; > > - // The type of the queue and iterator. > + /// The type of the queue and iterator. > typedef std::list Container_t; > typedef Container_t::iterator Iterator_t; > > - // The list of requests from various iterates. > - // They're granted in FIFO order. > + /// The list of requests from various iterates. > + /// They're granted in FIFO order. > Container_t queue_m; > > - // Pointer to the last request that has been granted. > + /// Pointer to the last request that has been granted. > Iterator_t released_m; > > - // The number of outstanding notifications. > + /// The number of outstanding notifications. > int notifications_m; > }; > > -////////////////////////////////////////////////////////////////////// > -// > -// Inline implementation of the functions for > -// IterateScheduler > -// > -////////////////////////////////////////////////////////////////////// > - > -// > -// IterateScheduler::handOff(Iterate*) > -// No action needs to be taken here. Iterates will make their > -// own way into the execution queue. > -// > - > -inline void > -IterateScheduler::handOff(Iterate* it) > -{ > - it->notify(); > -} > - > -////////////////////////////////////////////////////////////////////// > -// > -// Inline implementation of the functions for Iterate > -// > -////////////////////////////////////////////////////////////////////// > - > -// > -// Iterate::Iterate > -// Construct with the scheduler and the number of notifications. > -// Ignore the affinity. > -// > - > -inline > -Iterate::Iterate(IterateScheduler& s, int) > -: scheduler_m(s), notifications_m(1) > -{ > -} > - > -// > -// Iterate::notify > -// Notify the iterate that a DataObject is ready. > -// Decrement the counter, and if it is zero, alert the scheduler. > -// > - > -inline void > -Iterate::notify() > -{ > - if ( --notifications_m == 0 ) > - { > - add(this); > - } > -} > - > -////////////////////////////////////////////////////////////////////// > -// > -// Inline implementation of the functions for DataObject > -// > -////////////////////////////////////////////////////////////////////// > - > -// > -// DataObject::DataObject() > -// Initialize: > -// released_m to the end of the queue (which should) also be the > -// beginning. notifications_m to zero, since nothing has been > -// released yet. 
> -// > - > -inline > -DataObject::DataObject(int) > -: released_m(queue_m.end()), notifications_m(0) > -{ > -} > - > -// > -// void DataObject::release(Action) > -// An iterate has finished and is telling the DataObject that > -// it is no longer needed. > -// > +/// void DataObject::releaseIterates(SerialAsync::Action) > +/// When the last released iterate dies, we need to > +/// look at the beginning of the queue and tell more iterates > +/// that they can access this data. > > inline void > -DataObject::release(SerialAsync::Action) > -{ > - if ( --notifications_m == 0 ) > - releaseIterates(); > -} > - > - > - > -//----------------------------------------------------------------------------- > -// > -// void IterateScheduler::blockingEvaluate > -// Evaluate all the iterates in the queue. > -// > -//----------------------------------------------------------------------------- > -inline > -void > -IterateScheduler::blockingEvaluate() > -{ > - // Loop as long as there is anything in the queue. > - while (SystemContext::workReady()) > - { > - SystemContext::runSomething(); > - } > -} > - > -//----------------------------------------------------------------------------- > -// > -// void DataObject::releaseIterates(SerialAsync::Action) > -// When the last released iterate dies, we need to > -// look at the beginning of the queue and tell more iterates > -// that they can access this data. > -// > -//----------------------------------------------------------------------------- > -inline > -void > DataObject::releaseIterates() > { > // Get rid of the reservations that have finished. > @@ -622,14 +616,17 @@ > released_m->iterate().notify(); > ++notifications_m; > > - // Record what action that one will take. > + // Record what action that one will take > + // and record its generation number > SerialAsync::Action act = released_m->act(); > + int generation = released_m->iterate().generation(); > > // Look at the next iterate. > ++released_m; > > // If the first one was a read, release more. > if ( act == SerialAsync::Read ) > + { > > // As long as we aren't at the end and we have more reads... > while ((released_m != end) && > @@ -642,29 +639,30 @@ > // And go on to the next. > ++released_m; > } > + > + } > + > } > } > > +/// void DataObject::request(Iterate&, action) > +/// An iterate makes a reservation with this DataObject for a given > +/// action in a given generation. The request may be granted > +/// immediately. > > -// > -// void DataObject::request(Iterate&, action) > -// An iterate makes a reservation with this DataObject for a given > -// action in a given generation. The request may be granted > -// immediately. > -// > -inline > -void > +inline void > DataObject::request(Iterate& it, > SerialAsync::Action act) > > { > // The request can be granted immediately if: > // The queue is currently empty, or > - // The request is a read and everything in the queue is a read. > + // the request is a read and everything in the queue is a read, > + // or (with relaxed conditions), everything is the same generation. > > // Set notifications dynamically and automatically > // every time a request is made by the iterate > - it.incr_notifications(); > + it.notifications_m++; > > bool allReleased = (queue_m.end() == released_m); > bool releasable = queue_m.empty() || > @@ -691,17 +689,11 @@ > } > > > -//---------------------------------------------------------------------- > - > - > -// > -// End of Smarts namespace. 
> -// > -} > +} // namespace Smarts > > ////////////////////////////////////////////////////////////////////// > > -#endif // POOMA_PACKAGE_CLASS_H > +#endif // _SerialAsync_h_ > > /*************************************************************************** > * $RCSfile: SerialAsync.h,v $ $Author: sa_smith $ > --- /home/richard/src/pooma/cvs/r2/src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp 2000-04-12 02:08:06.000000000 +0200 > +++ Threads/IterateSchedulers/SerialAsync.cmpl.cpp 2004-01-02 00:40:16.000000000 +0100 > @@ -82,6 +82,12 @@ > > std::list SystemContext::workQueueMessages_m; > std::list SystemContext::workQueue_m; > +#if POOMA_MPI > + MPI_Request SystemContext::requests_m[1024]; > + std::map SystemContext::allocated_requests_m; > + std::set SystemContext::free_requests_m; > +#endif > +std::stack IterateScheduler::generationStack_m; > > } > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:32:06 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:32:06 -0800 Subject: [PATCH] Initialize MPI In-Reply-To: References: Message-ID: <3FF9D7D6.70407@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch adds MPI initialization. > > Ok? Yes. > Richard. > > > 2004Jan02 Richard Guenther > > * src/Pooma/Pooma.cmpl.cpp: add initialization and > finalization sequence for MPI. Pooma::blockAndEvaluate() at > finalization. > > --- /home/richard/src/pooma/cvs/r2/src/Pooma/Pooma.cmpl.cpp 2003-12-25 12:26:04.000000000 +0100 > +++ Pooma/Pooma.cmpl.cpp 2004-01-02 00:40:15.000000000 +0100 > @@ -287,10 +287,10 @@ > // we can do this in the other initialize routine by querying for > // the Cheetah options from the Options object. > > -#if POOMA_CHEETAH > - > +#if POOMA_MPI > + MPI_Init(&argc, &argv); > +#elif POOMA_CHEETAH > controller_g = new Cheetah::Controller(argc, argv); > - > #endif > > // Just create an Options object for this argc, argv set, and give that > @@ -349,12 +349,20 @@ > > // Set myContext_s and numContexts_s to the context numbers. > > -#if POOMA_CHEETAH > +#if POOMA_MESSAGING > > +#if POOMA_MPI > + MPI_Comm_rank(MPI_COMM_WORLD, &myContext_g); > + MPI_Comm_size(MPI_COMM_WORLD, &numContexts_g); > + // ugh... > + for (int i=0; i + Smarts::SystemContext::free_requests_m.insert(i); > +#elif POOMA_CHEETAH > PAssert(controller_g != 0); > > myContext_g = controller_g->mycontext(); > numContexts_g = controller_g->ncontexts(); > +#endif > > initializeCheetahHelpers(numContexts_g); > > @@ -376,14 +384,14 @@ > warnMessages(opts.printWarnings()); > errorMessages(opts.printErrors()); > > -#if POOMA_CHEETAH > - > // This barrier is here so that Pooma is initialized on all contexts > // before we continue. (Another context could invoke a remote member > // function on us before we're initialized... which would be bad.) > > +#if POOMA_MPI > + MPI_Barrier(MPI_COMM_WORLD); > +#elif POOMA_CHEETAH > controller_g->barrier(); > - > #endif > > // Initialize the Inform streams with info on how many contexts we > @@ -416,6 +424,8 @@ > > bool finalize(bool quitRTS, bool quitArch) > { > + Pooma::blockAndEvaluate(); > + > if (initialized_s) > { > // Wait for threads to finish. > @@ -426,7 +436,7 @@ > > cleanup_s(); > > -#if POOMA_CHEETAH > +#if POOMA_MESSAGING > // Clean up the Cheetah helpers. 
> > finalizeCheetahHelpers(); > @@ -436,15 +446,19 @@ > > if (quitRTS) > { > -#if POOMA_CHEETAH > +#if POOMA_MESSAGING > > // Deleting the controller shuts down the cross-context communication > // if this is the last thing using this controller. If something > // else is using this, Cheetah will not shut down until that item > // is destroyed or stops using the controller. > > +#if POOMA_MPI > + MPI_Finalize(); > +#elif POOMA_CHEETAH > if (controller_g != 0) > delete controller_g; > +#endif > > #endif > } > @@ -784,18 +799,18 @@ > SystemContext_t::runSomething(); > } > > -#elif POOMA_REORDER_ITERATES > +# elif POOMA_REORDER_ITERATES > > CTAssert(NO_SUPPORT_FOR_THREADS_WITH_MESSAGING); > > -#else // we're using the serial scheduler, so we only need to get messages > +# else // we're using the serial scheduler, so we only need to get messages > > while (Pooma::incomingMessages()) > { > controller_g->poll(); > } > > -#endif // schedulers > +# endif // schedulers > > #else // !POOMA_CHEETAH > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:37:50 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:37:50 -0800 Subject: [PATCH] Add MPI serializer In-Reply-To: References: Message-ID: <3FF9D92E.8060205@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch adds the serializer for MPI messaging. This is basically a > stripped down version of Cheetahs MatchingHandler/Serialize.h. I omitted > all traces of Cheetah::DELEGATE mechanism which we don't use. > > Ok? Please see the interspersed comments below. > Richard. > > > 2004Jan02 Richard Guenther > > * src/Tulip/CheetahSerialize.h: new file. > src/Tulip/Messaging.h: include it, if POOMA_MPI. > > --- /home/richard/src/pooma/cvs/r2/src/Tulip/Messaging.h 2003-12-25 12:26:35.000000000 +0100 > +++ Tulip/Messaging.h 2004-01-02 00:40:16.000000000 +0100 > @@ -49,7 +49,12 @@ > // Includes: > //----------------------------------------------------------------------------- > > -#include "Pooma/Pooma.h" > +#include "Pooma/Configuration.h" > + > +#if POOMA_MPI > +# include "Tulip/CheetahSerialize.h" > +# include > +#endif > > #if POOMA_CHEETAH > # include "Cheetah/Cheetah.h" > @@ -254,6 +259,6 @@ > // ACL:rcsinfo > // ---------------------------------------------------------------------- > // $RCSfile: Messaging.h,v $ $Author: pooma $ > -// $Revision: 1.8 $ $Date: 2003/12/25 11:26:35 $ > +// $Revision: 1.7 $ $Date: 2003/10/21 18:47:59 $ > // ---------------------------------------------------------------------- > // ACL:rcsinfo > #ifndef CHEETAH_MATCHINGHANDLER_SERIALIZE_H > #define CHEETAH_MATCHINGHANDLER_SERIALIZE_H > > //----------------------------------------------------------------------------- > // Classes: > // Cheetah > // Serialize > //----------------------------------------------------------------------------- > > //----------------------------------------------------------------------------- > // Overview: > // > // Serialize is a simple class that serializes/unserializes items to/from > // a buffer. It can be partially specialized for different types T, > // or for different general tags Tag. Provided tags are: > // > // 1. 'CHEETAH' is a simple tag type for the default case used by other parts > // of Cheetah. Objects are instantiated in place in the provided buffer. Where is number 2? > // 3. 'ARRAY' serializes arrays. API changes a little from other > // serialize tags as array length must be provided in serialize methods. 
> // Objects are instantiated in place in the provided buffer. > // > //----------------------------------------------------------------------------- > > //----------------------------------------------------------------------------- > // Include Files: > //----------------------------------------------------------------------------- > > #include > #include > > > namespace Cheetah { > > //---------------------------------------------------------------------- > // > // class Serialize > // > // Serialize is a class that can be specialized to pack and unpack > // items of type T to/from a provided buffer of bytes. It is used by > // the MatchingHandler to prepare and use data sent between MatchingHandler > // send and request calls. It has two template parameters: a tag, and a data > // type. The tag can be used to specialize to different categories of > // serialize operations; the data type indicates the type of data that > // will be packed or unpacked. > // > // Serialize specializations should define the following four static > // functions: > // > // // Return the storage needed to pack the item of type T > // static int size(const T &item); > // > // // Pack an item of type T into the given buffer. Return space used. > // static int pack(const T &item, char *buffer); > // > // // Unpack an item of type T from the given buffer. Set the given > // // pointer to point at this item. Return bytes unpacked. > // static int unpack(T* &p, char *buffer); > // > // // Delete the item pointed to by the given pointer, that was > // // unpacked with a previous call to unpack(). > // static void cleanup(T *p); > // > // There is a general template for this class that does nothing, > // one specialization for a tag 'CHEETAH'. > // > //---------------------------------------------------------------------- > > > //---------------------------------------------------------------------- > // Returns padding necessary for word alignment. > //---------------------------------------------------------------------- > static inline int padding(int size) > { > int extra = size % sizeof(void*); > return (extra == 0) ? 0 : sizeof(void*) - extra; > } > > > //---------------------------------------------------------------------- > // CHEETAH serialize specialization > //---------------------------------------------------------------------- > > // The general tag type used to specialize Serialize later. > > struct CHEETAH > { > inline CHEETAH() { } > inline ~CHEETAH() { } > }; > > > // The general template, that does nothing. > > template > class Serialize { }; > > > // A specialization for the CHEETAH tag that provides some default ability > // to pack items. > > template > class Serialize< ::Cheetah::CHEETAH, T> > { > public: > // Return the storage needed to pack the item of type T. > // For the default case, this is just sizeof(T), but perhaps rounded > // up to be pointer-word-size aligned. > > static inline int size(const T &) > { Remove the extra blank line. > > return sizeof(double) * ((sizeof(T) + sizeof(double) - 1) / sizeof(double)); > /* > const int off = sizeof(T) % sizeof(void *); > return (sizeof(T) + (off == 0 ? 0 : sizeof(void *) - off)); > */ Why have the commented out code? > } > > // Pack an item of type T into the given buffer. Return space used. > // By default, this just does a placement-new into the buffer, > // assuming the storage required is sizeof(T). 
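As a usage sketch (not from the patch), here is a round trip of a single POD value through this specialization; the buffer and the choice of double are arbitrary, and unpack()/cleanup() appear just below:

    #include "Tulip/CheetahSerialize.h"

    // Illustration only: pack a double into a raw buffer and read it back.
    void roundTripExample()
    {
      typedef Cheetah::Serialize<Cheetah::CHEETAH, double> Packer_t;

      double value = 3.14;
      char buffer[64];                           // assume suitably aligned

      int used = Packer_t::pack(value, buffer);  // placement-new into buffer

      double *p;
      used = Packer_t::unpack(p, buffer);        // p points into the buffer
      // ... use *p ...
      Packer_t::cleanup(p);                      // run the destructor (a no-op here)
      (void)used;
    }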
> > static inline int pack(const T &item, char *buffer) > { > new ((void*)buffer) T(item); > return size(item); > } > > // Unpack an item of type T from the given buffer. Set the given > // pointer to point at this item. Return bytes unpacked. > // By default, this just recasts the current buffer pointer. > > static inline int unpack(T* &p, char *buffer) > { > p = reinterpret_cast(buffer); > return size(*p); > } > > // Delete the item pointed to by the given pointer, that was > // unpacked with a previous call to unpack(). > // By default, this just runs the destructor on the data, which for > // many things will do nothing. > > static inline void cleanup(T *p) > { > p->~T(); > } > }; > > > //---------------------------------------------------------------------- > // ARRAY serialize specialization > //---------------------------------------------------------------------- > > struct ARRAY > { > inline ARRAY() { } > inline ~ARRAY() { } > }; > > > // A specialization for the POINTER tag that provides marshaling of > // arrays. > > template > class Serialize< ::Cheetah::ARRAY, T> > { > public: > > // Return the storage needed to pack count items of type T, > // This includes the bytes needed to store the size of the array. > > static inline int size(const T* items, const int& count) > { > int arraySize = count*sizeof(T); > return ( Serialize::size(count) > + arraySize + padding(arraySize) ); > } > > // Pack an item of type T into the given buffer. Return space used. > // By default, this just does a placement-new into the buffer, > // assuming the storage required is sizeof(T). > > static inline int pack(const T* items, char* buffer, const int& count) > { > int n = Serialize::pack(count, buffer); > memcpy(n+buffer, items, count*sizeof(T)); > return size(items, count); > } > > // Unpack an item of type T from the given buffer. Set the given > // pointer to point at this item. Return bytes unpacked. > > static inline int unpack(T* &p, char *buffer, int& count) > { > int* iPtr; > int n = Serialize::unpack(iPtr, buffer); > count = *iPtr; > p = reinterpret_cast(n+buffer); > return size(p, count); > } > > // Delete the item pointed to by the given pointer, that was unpacked with a > // previous call to unpack(). By default, this just runs the destructor on > // the data, which for many things will do nothing. Memory has been > // allocated from the provided buffer so no freeing of memory need be done > // here. > > static inline void cleanup(T *p) > { > p->~T(); > } > }; > > > // > // This class is used so that serialization routines can be specialized > // for either delegation (WrappedBool) or CHEETAH > // (WrappedBool). > // > > template class WrappedBool > { > public: > WrappedBool() {} > ~WrappedBool() {} > }; > > } // namespace Cheetah > > #endif // CHEETAH_MATCHINGHANDLER_SERIALIZE_H -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:39:19 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:39:19 -0800 Subject: [pooma-dev] Re: [PATCH] Back to using Cheetah::CHEETAH for serialization In-Reply-To: References: <3FF45821.8030605@codesourcery.com> Message-ID: <3FF9D987.3030104@codesourcery.com> Richard Guenther wrote: > On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>This patch is a partial reversion of a previous patch that made us use >>>Cheetah::DELEGATE serialization for RemoteProxy. 
It also brings us a >>>Cheetah::CHEETAH serialization for std::string, which was previously >>>missing. One step more for the MPI merge. >>> >>>Tested together with all other MPI changes with serial, Cheetah and MPI. >>> >>>Ok? >> >>Yes. Do we need more regression tests for this work to better ensure >>correctness? > > > Maybe, at least we get all non-POD types that are not explicitly > specialized wrong during serialization. And I can tell you, such errors > are _very_ hard to find (happened for me for std::string and RemoteProxy). > > Richard. Yes, we're running the regression tests in serial, not parallel, only so testing may be hard. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 5 21:44:34 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 13:44:34 -0800 Subject: [PATCH] OpenMP support In-Reply-To: References: Message-ID: <3FF9DAC2.4020005@codesourcery.com> Richard Guenther wrote: > Hi Jeffrey, > > would you please look at "[PATCH] OpenMP loop level parallelism" mail I > sent Dec23? Additionally to this patch I propose the following, which > adds a --openmp switch to configure. The 23Dec patch is fine. > Tested with gcc (with and without --openmp, which is the same here) and > Intel icpc (with and without --openmp, which makes a difference here). > > Ok? Yes. > Thanks, > > Richard. > > > 2004Jan02 Richard Guenther > > * config/arch/LINUXICC.conf: don't warn about unused #pragmas. > configure: add --openmp switch. > scripts/configure.ac: add test to detect wether -openmp works. s/wether/whether/ > scripts/configure: regenerated. > > diff -Nru a/r2/config/arch/LINUXICC.conf b/r2/config/arch/LINUXICC.conf > --- a/r2/config/arch/LINUXICC.conf Fri Jan 2 16:32:14 2004 > +++ b/r2/config/arch/LINUXICC.conf Fri Jan 2 16:32:14 2004 > @@ -170,8 +170,8 @@ > > ### debug or optimized build settings for C++ applications > > -$cppdbg_app = "-g"; > -$cppopt_app = "-DNOPAssert -DNOCTAssert -O2"; > +$cppdbg_app = "-g -wd161"; > +$cppopt_app = "-DNOPAssert -DNOCTAssert -O2 -wd161"; > > > ### debug or optimized build settings for C++ libraries > diff -Nru a/r2/configure b/r2/configure > --- a/r2/configure Fri Jan 2 16:32:14 2004 > +++ b/r2/configure Fri Jan 2 16:32:14 2004 > @@ -170,6 +170,7 @@ > $prefixnm = "--prefix"; > $serialnm = "--serial"; > $threadsnm = "--threads"; > +$openmpnm = "--openmp"; > $profilenm = "--profile"; > $insurenm = "--insure"; > $debugnm = "--debug"; > @@ -237,7 +238,8 @@ > [$finternm, "", "include fortran support libraries."], > [$nofinternm, "", "do not include the fortran libraries."], > [$serialnm, "", "configure to run serially, no parallelism."], > - [$threadsnm, "", "include threads capability, if available."], > + [$threadsnm, "", "include threads capability, if available."], > + [$openmpnm, "", "enable use of OpenMP, if available."], > [$cheetahnm, "", "enable use of CHEETAH communications package."], > [$schednm, "", "use for thread scheduling."], > [$pawsnm, "", "enable PAWS program coupling, if available."], > @@ -434,6 +436,10 @@ > $threads_include_makefile = ""; > $cpp_threads_arg = ""; > > +### include OpenMP capability? > +$openmp = 0; > +$openmpargs = ""; > + > ### if threads is used, what scheduler should be employed? 
> $scheduler = $schedulerdefault; > > @@ -1307,9 +1313,9 @@ > sub setthreads > { > # set $threads variable properly > - if (scalar @{$arghash{$threadsnm}} > 1 and scalar @{$arghash{$serialnm}}> 1) > + if (scalar @{$arghash{$threadsnm}} > 1 and (scalar @{$arghash{$serialnm}} > 1 or scalar @{$arghash{$openmpnm}} > 1)) > { > - printerror "$threadsnm and $serialnm both given. Use only one."; > + printerror "$threadsnm and $serialnm or $openmpnm given. Use only one."; > } > elsif (not $threads_able or scalar @{$arghash{$serialnm}} > 1) > { > @@ -1438,6 +1444,13 @@ > $pooma_reorder_iterates = $threads || ($scheduler eq "serialAsync"); > > add_yesno_define("POOMA_REORDER_ITERATES", $pooma_reorder_iterates); > + > + # OpenMP support > + if (scalar @{$arghash{$openmpnm}} > 1) > + { > + $openmpargs = "\@openmpargs\@"; > + } > + > } > > > @@ -1936,20 +1949,20 @@ > print FSUITE "LD = $link\n"; > print FSUITE "\n"; > print FSUITE "### flags for applications\n"; > - print FSUITE "CXX_OPT_LIB_ARGS = \@cppargs\@ $cppshare $cppopt_lib\n"; > - print FSUITE "CXX_DBG_LIB_ARGS = \@cppargs\@ $cppshare $cppdbg_lib\n"; > - print FSUITE "CXX_OPT_APP_ARGS = \@cppargs\@ $cppopt_app\n"; > - print FSUITE "CXX_DBG_APP_ARGS = \@cppargs\@ $cppdbg_app\n"; > + print FSUITE "CXX_OPT_LIB_ARGS = \@cppargs\@ $openmpargs $cppshare $cppopt_lib\n"; > + print FSUITE "CXX_DBG_LIB_ARGS = \@cppargs\@ $openmpargs $cppshare $cppdbg_lib\n"; > + print FSUITE "CXX_OPT_APP_ARGS = \@cppargs\@ $openmpargs $cppopt_app\n"; > + print FSUITE "CXX_DBG_APP_ARGS = \@cppargs\@ $openmpargs $cppdbg_app\n"; > print FSUITE "\n"; > - print FSUITE "C_OPT_LIB_ARGS = \@cargs\@ $cshare $copt_lib\n"; > - print FSUITE "C_DBG_LIB_ARGS = \@cargs\@ $cshare $cdbg_lib\n"; > - print FSUITE "C_OPT_APP_ARGS = \@cargs\@ $copt_app\n"; > - print FSUITE "C_DBG_APP_ARGS = \@cargs\@ $cdbg_app\n"; > + print FSUITE "C_OPT_LIB_ARGS = \@cargs\@ $openmpargs $cshare $copt_lib\n"; > + print FSUITE "C_DBG_LIB_ARGS = \@cargs\@ $openmpargs $cshare $cdbg_lib\n"; > + print FSUITE "C_OPT_APP_ARGS = \@cargs\@ $openmpargs $copt_app\n"; > + print FSUITE "C_DBG_APP_ARGS = \@cargs\@ $openmpargs $cdbg_app\n"; > print FSUITE "\n"; > - print FSUITE "F77_OPT_LIB_ARGS = $f77args $f77share $f77opt_lib\n"; > - print FSUITE "F77_DBG_LIB_ARGS = $f77args $f77share $f77dbg_lib\n"; > - print FSUITE "F77_OPT_APP_ARGS = $f77args $f77opt_app\n"; > - print FSUITE "F77_DBG_APP_ARGS = $f77args $f77dbg_app\n"; > + print FSUITE "F77_OPT_LIB_ARGS = $f77args $openmpargs $f77share $f77opt_lib\n"; > + print FSUITE "F77_DBG_LIB_ARGS = $f77args $openmpargs $f77share $f77dbg_lib\n"; > + print FSUITE "F77_OPT_APP_ARGS = $f77args $openmpargs $f77opt_app\n"; > + print FSUITE "F77_DBG_APP_ARGS = $f77args $openmpargs $f77dbg_app\n"; > print FSUITE "\n"; > if ($shared) { > print FSUITE "AR_OPT_ARGS = $arshareopt\n"; > diff -Nru a/r2/scripts/configure.ac b/r2/scripts/configure.ac > --- a/r2/scripts/configure.ac Fri Jan 2 16:32:14 2004 > +++ b/r2/scripts/configure.ac Fri Jan 2 16:32:14 2004 > @@ -352,6 +352,31 @@ > > > dnl > +dnl Check for compiler argument for OpenMP support > +dnl > + > +AC_MSG_CHECKING([for way to enable OpenMP support]) > +acx_saved_cxxflags=$CXXFLAGS > +CXXFLAGS="$CXXFLAGS -openmp" > +AC_TRY_LINK([ > +#include > +], [ > + double d[128]; > +#pragma omp parallel for > + for (int i=0; i<128; ++i) > + d[i] = 1.0; > + omp_get_max_threads(); > +], [ > +AC_MSG_RESULT([-openmp]) > +openmpargs="-openmp" > +], [ > +AC_MSG_RESULT([none]) > +]) > +CXXFLAGS=$acx_saved_cxxflags > +AC_SUBST(openmpargs) > + > + > +dnl > 
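For reference, a standalone version of the probe this check links (the same loop and omp_get_max_threads() call as in the AC_TRY_LINK body above); it only shows what a compiler accepting -openmp has to handle, not how POOMA itself uses OpenMP:

    #include <omp.h>

    // Standalone version of the configure.ac probe above; checks that
    // -openmp (or its equivalent) lets this compile and link.
    int main()
    {
      double d[128];
    #pragma omp parallel for
      for (int i = 0; i < 128; ++i)
        d[i] = 1.0;
      return omp_get_max_threads() > 0 ? 0 : 1;
    }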
dnl Check on how to get failure on unrecognized pragmas > dnl gcc: -Wunknown-pragmas -Werror > dnl icpc: -we161 -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Mon Jan 5 22:15:53 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 5 Jan 2004 23:15:53 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: <3FF9D783.5030504@codesourcery.com> References: <3FF9D783.5030504@codesourcery.com> Message-ID: On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > The patch was tested as usual. > > > > Ok to commit? > > I have some questions and comments interspersed below. > > > Thanks, Richard. > > > > > > 2004Jan02 Richard Guenther > > > > * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, > > add std::stack for generation tracking, add support for > > asyncronous MPI requests. > > Add an 'h' to spell asynchronous. Ok. > > +/** > > + * FIXME. > > + */ > > I am wary of adding unfinished code to the code base. At the very > least, we need a more extensive comment describing what is not finished. Oh, it's just missing documentation of struct SystemContext. I'll strip the FIXME. Ok with this change? Thanks, Richard. From rguenth at tat.physik.uni-tuebingen.de Mon Jan 5 22:20:02 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 5 Jan 2004 23:20:02 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Add MPI serializer In-Reply-To: <3FF9D92E.8060205@codesourcery.com> References: <3FF9D92E.8060205@codesourcery.com> Message-ID: On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This patch adds the serializer for MPI messaging. This is basically a > > stripped down version of Cheetahs MatchingHandler/Serialize.h. I omitted > > all traces of Cheetah::DELEGATE mechanism which we don't use. > > > > Ok? > > Please see the interspersed comments below. > > > // Serialize is a simple class that serializes/unserializes items to/from > > // a buffer. It can be partially specialized for different types T, > > // or for different general tags Tag. Provided tags are: > > // > > // 1. 'CHEETAH' is a simple tag type for the default case used by other parts > > // of Cheetah. Objects are instantiated in place in the provided buffer. > > Where is number 2? Number 2 was 'DELEGATE', which I stripped. I'll change 3 for 2. > > // 3. 'ARRAY' serializes arrays. API changes a little from other > > // serialize tags as array length must be provided in serialize methods. > > // Objects are instantiated in place in the provided buffer. > > // > > //----------------------------------------------------------------------------- > > static inline int size(const T &) > > { > Remove the extra blank line. Ok. > > > > return sizeof(double) * ((sizeof(T) + sizeof(double) - 1) / sizeof(double)); > > /* > > const int off = sizeof(T) % sizeof(void *); > > return (sizeof(T) + (off == 0 ? 0 : sizeof(void *) - off)); > > */ > > Why have the commented out code? It's work in progress, I'll remove it for now. Ok with these changes? Thanks, Richard. From oldham at codesourcery.com Mon Jan 5 22:38:46 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 14:38:46 -0800 Subject: [pooma-dev] Re: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: References: <3FF9D783.5030504@codesourcery.com> Message-ID: <3FF9E776.6030803@codesourcery.com> Richard Guenther wrote: > On Mon, 5 Jan 2004, Jeffrey D. 
Oldham wrote: > > >>Richard Guenther wrote: >> >>>The patch was tested as usual. >>> >>>Ok to commit? >> >>I have some questions and comments interspersed below. >> >> >>>Thanks, Richard. >>> >>> >>>2004Jan02 Richard Guenther >>> >>> * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, >>> add std::stack for generation tracking, add support for >>> asyncronous MPI requests. >> >>Add an 'h' to spell asynchronous. > > > Ok. > > >>>+/** >>>+ * FIXME. >>>+ */ >> >>I am wary of adding unfinished code to the code base. At the very >>least, we need a more extensive comment describing what is not finished. > > > Oh, it's just missing documentation of struct SystemContext. I'll strip > the FIXME. > > Ok with this change? I'd prefer to add some documentation, but either way it is fine. > Thanks, > > Richard. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Mon Jan 5 22:38:51 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 5 Jan 2004 23:38:51 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Back to using Cheetah::CHEETAH for serialization In-Reply-To: <3FF9D987.3030104@codesourcery.com> References: <3FF45821.8030605@codesourcery.com> <3FF9D987.3030104@codesourcery.com> Message-ID: On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > Yes, we're running the regression tests in serial, not parallel, only so > testing may be hard. With native MPI support coming along nicely this gets as easy as exchanging --serial for --mpi at configure time and doing > make check MPIRUN="mpirun -np 2" if using mpich support coming with your favorite distribution. But doing this with QMTest may be harder - I don't know. Richard. From oldham at codesourcery.com Mon Jan 5 22:39:20 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 05 Jan 2004 14:39:20 -0800 Subject: [pooma-dev] Re: [PATCH] Add MPI serializer In-Reply-To: References: <3FF9D92E.8060205@codesourcery.com> Message-ID: <3FF9E798.3000005@codesourcery.com> Richard Guenther wrote: > On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>This patch adds the serializer for MPI messaging. This is basically a >>>stripped down version of Cheetahs MatchingHandler/Serialize.h. I omitted >>>all traces of Cheetah::DELEGATE mechanism which we don't use. >>> >>>Ok? >> >>Please see the interspersed comments below. >> >> >>>// Serialize is a simple class that serializes/unserializes items to/from >>>// a buffer. It can be partially specialized for different types T, >>>// or for different general tags Tag. Provided tags are: >>>// >>>// 1. 'CHEETAH' is a simple tag type for the default case used by other parts >>>// of Cheetah. Objects are instantiated in place in the provided buffer. >> >>Where is number 2? > > > Number 2 was 'DELEGATE', which I stripped. I'll change 3 for 2. > > >>>// 3. 'ARRAY' serializes arrays. API changes a little from other >>>// serialize tags as array length must be provided in serialize methods. >>>// Objects are instantiated in place in the provided buffer. >>>// >>>//----------------------------------------------------------------------------- > > >>> static inline int size(const T &) >>> { >> >>Remove the extra blank line. > > > Ok. > > >>> return sizeof(double) * ((sizeof(T) + sizeof(double) - 1) / sizeof(double)); >>> /* >>> const int off = sizeof(T) % sizeof(void *); >>> return (sizeof(T) + (off == 0 ? 0 : sizeof(void *) - off)); >>> */ >> >>Why have the commented out code? 
> > > It's work in progress, I'll remove it for now. > > Ok with these changes? Yes. > Thanks, > > Richard. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Mon Jan 5 22:59:08 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 5 Jan 2004 23:59:08 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: <3FF9D783.5030504@codesourcery.com> References: <3FF9D783.5030504@codesourcery.com> Message-ID: Whoops, I just noticed I didn't answer one of your questions: On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > > static std::list workQueue_m; > > +#if POOMA_MPI > > + static MPI_Request requests_m[1024]; > > What is this fixed constant of 1024? Does this come from the MPI standard? > > > + static std::map allocated_requests_m; Well - it's somewhat arbitrary, but with some reason. First, with mpich an MPI_Request is an integer identifier, so 1024 requests will fill just a page of memory. Second, the mpich library seems to use poll/select on distinct sockets for which 1024 seems an appropriate upper number. Third, its about the number of in-flight requests I have with my 3d CFD code (but you may see we hard-limit here and wait for requests to finish at appropriate places). So, I dont like having this magic number either, but for the MPI standard the requests need to be allocated continuously and we need _some_ limit. Once someone has a problem with this we could make it configurable, but I dont see the point at the moment. Thus, ok again? Thanks, Richard. From rguenth at tat.physik.uni-tuebingen.de Tue Jan 6 14:07:50 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Tue, 6 Jan 2004 15:07:50 +0100 (CET) Subject: [PATCH] Fix compilation problems Message-ID: Hi! I applied the patch below as obvious which restores compilation. Richard. 2004Jan06 Richard Guenther * src/Tulip/PatchSizeSyncer.cmpl.cpp: fix missing #include. Index: PatchSizeSyncer.cmpl.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Tulip/PatchSizeSyncer.cmpl.cpp,v retrieving revision 1.7 retrieving revision 1.8 diff -u -u -r1.7 -r1.8 --- PatchSizeSyncer.cmpl.cpp 25 Dec 2003 11:26:35 -0000 1.7 +++ PatchSizeSyncer.cmpl.cpp 6 Jan 2004 14:03:25 -0000 1.8 @@ -34,7 +34,9 @@ // Includes: //----------------------------------------------------------------------------- +#include "Pooma/Configuration.h" #include "Tulip/Messaging.h" +#include "Pooma/Pooma.h" #include "Tulip/PatchSizeSyncer.h" #include "Tulip/RemoteProxy.h" #include "Tulip/CollectFromContexts.h" From oldham at codesourcery.com Tue Jan 6 18:36:23 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 06 Jan 2004 10:36:23 -0800 Subject: [pooma-dev] Re: [PATCH] MPI support for SerialAsync scheduler In-Reply-To: References: <3FF9D783.5030504@codesourcery.com> Message-ID: <3FFB0027.6080509@codesourcery.com> Richard Guenther wrote: > Whoops, I just noticed I didn't answer one of your questions: > > On Mon, 5 Jan 2004, Jeffrey D. Oldham wrote: > > >>> static std::list workQueue_m; >>>+#if POOMA_MPI >>>+ static MPI_Request requests_m[1024]; >> >>What is this fixed constant of 1024? Does this come from the MPI standard? >> >> >>>+ static std::map allocated_requests_m; > > > Well - it's somewhat arbitrary, but with some reason. First, with mpich > an MPI_Request is an integer identifier, so 1024 requests will fill just a > page of memory. 
Second, the mpich library seems to use poll/select on > distinct sockets for which 1024 seems an appropriate upper number. Third, > its about the number of in-flight requests I have with my 3d CFD code (but > you may see we hard-limit here and wait for requests to finish at > appropriate places). > > So, I dont like having this magic number either, but for the MPI standard > the requests need to be allocated continuously and we need _some_ limit. > Once someone has a problem with this we could make it configurable, but I > dont see the point at the moment. > > Thus, ok again? Let's move the magic constant into a const variable instead of having the constant scattered throughout the code. Then, please commit. Thanks. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Tue Jan 6 19:58:33 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Tue, 6 Jan 2004 20:58:33 +0100 (CET) Subject: [pooma-dev] [PATCH] MPI support for SerialAsync scheduler In-Reply-To: <3FFB0027.6080509@codesourcery.com> References: <3FF9D783.5030504@codesourcery.com> <3FFB0027.6080509@codesourcery.com> Message-ID: On Tue, 6 Jan 2004, Jeffrey D. Oldham wrote: > Let's move the magic constant into a const variable instead of having > the constant scattered throughout the code. Then, please commit. Thanks. For the record, this is what I committed. It passes builds for both --serial and --mpi for me. Richard. 2004Jan06 Richard Guenther * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, add std::stack for generation tracking, add support for asyncronous MPI requests. src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp: define new static variables. src/Threads/IterateSchedulers/Runnable.h: declare add(). src/Pooma/Pooma.cmpl.cpp: use SystemContext::max_requests constant. Index: Pooma/Pooma.cmpl.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Pooma/Pooma.cmpl.cpp,v retrieving revision 1.40 diff -u -u -r1.40 Pooma.cmpl.cpp --- Pooma/Pooma.cmpl.cpp 5 Jan 2004 22:34:33 -0000 1.40 +++ Pooma/Pooma.cmpl.cpp 6 Jan 2004 19:52:47 -0000 @@ -354,8 +354,7 @@ #if POOMA_MPI MPI_Comm_rank(MPI_COMM_WORLD, &myContext_g); MPI_Comm_size(MPI_COMM_WORLD, &numContexts_g); - // ugh... - for (int i=0; i SystemContext::workQueueMessages_m; std::list SystemContext::workQueue_m; +#if POOMA_MPI + const int SystemContext::max_requests; + MPI_Request SystemContext::requests_m[SystemContext::max_requests]; + std::map SystemContext::allocated_requests_m; + std::set SystemContext::free_requests_m; +#endif +std::stack IterateScheduler::generationStack_m; } Index: Threads/IterateSchedulers/SerialAsync.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Threads/IterateSchedulers/SerialAsync.h,v retrieving revision 1.9 diff -u -u -r1.9 SerialAsync.h --- Threads/IterateSchedulers/SerialAsync.h 8 Jun 2000 22:16:50 -0000 1.9 +++ Threads/IterateSchedulers/SerialAsync.h 6 Jan 2004 19:52:48 -0000 @@ -42,48 +42,38 @@ // DataObject //----------------------------------------------------------------------------- -#include - #ifndef _SerialAsync_h_ #define _SerialAsync_h_ -/* -LIBRARY: - SerialAsync - -CLASSES: IterateScheduler - -CLASSES: DataObject - -CLASSES: Iterate - -OVERVIEW - SerialAsync IterateScheduler is a policy template to create a - dependence graphs and executes the graph respecting the - dependencies without using threads. 
There is no parallelism, - but Iterates may be executed out-of-order with respect to the - program text. - ------------------------------------------------------------------------------*/ - -////////////////////////////////////////////////////////////////////// -//----------------------------------------------------------------------------- -// Overview: -// Smarts classes for times when you want no threads but you do want -// dataflow evaluation. -//----------------------------------------------------------------------------- - -//----------------------------------------------------------------------------- -// Typedefs: -//----------------------------------------------------------------------------- +/** @file + * @ingroup IterateSchedulers + * @brief + * Smarts classes for times when you want no threads but you do want + * dataflow evaluation. + * + * SerialAsync IterateScheduler is a policy template to create a + * dependence graphs and executes the graph respecting the + * dependencies without using threads. + * There is no (thread level) parallelism, but Iterates may be executed + * out-of-order with respect to the program text. Also this scheduler is + * used for message based parallelism in which case asyncronous execution + * leads to reduced communication latencies. + */ //----------------------------------------------------------------------------- // Includes: //----------------------------------------------------------------------------- #include +#include +#include +#include +#include +#include #include "Threads/IterateSchedulers/IterateScheduler.h" #include "Threads/IterateSchedulers/Runnable.h" +#include "Tulip/Messaging.h" +#include "Utilities/PAssert.h" //----------------------------------------------------------------------------- // Forward Declarations: @@ -94,76 +84,258 @@ namespace Smarts { -#define MYID 0 -#define MAX_CPUS 1 -// -// Tag class for specializing IterateScheduler, Iterate and DataObject. -// +/** + * Tag class for specializing IterateScheduler, Iterate and DataObject. + */ + struct SerialAsync { - enum Action { Read, Write}; + enum Action { Read, Write }; }; -//----------------------------------------------------------------------------- +/** + * Iterate is used to implement the SerialAsync + * scheduling policy. + * + * An Iterate is a non-blocking unit of concurrency that is used + * to describe a chunk of work. It inherits from the Runnable + * class and as all subclasses of Runnable, the user specializes + * the run() method to specify the operation. + * Iterate is a further specialization of the + * Iterate class to use the SerialAsync Scheduling algorithm to + * generate the data dependency graph for a data-driven + * execution. + */ + +template<> +class Iterate : public Runnable +{ + friend class IterateScheduler; + friend class DataObject; + +public: + + typedef DataObject DataObject_t; + typedef IterateScheduler IterateScheduler_t; + + + /// The Constructor for this class takes the IterateScheduler and a + /// CPU affinity. CPU affinity has a default value of -1 which means + /// it may run on any CPU available. + + inline Iterate(IterateScheduler & s, int affinity=-1) + : scheduler_m(s), notifications_m(1), generation_m(-1), togo_m(1) + {} + + /// The dtor is virtual because the subclasses will need to add to it. + + virtual ~Iterate() {} + + /// The run method does the core work of the Iterate. + /// It is supplied by the subclass. + + virtual void run() = 0; + + //@name Stubs for the affinities + /// There is no such thing in serial. 
+ //@{ + + inline int affinity() const {return 0;} + + inline int hintAffinity() const {return 0;} + + inline void affinity(int) {} + + inline void hintAffinity(int) {} + + //@} + + /// Notify is used to indicate to the Iterate that one of the data + /// objects it had requested has been granted. To do this, we dec a + /// dependence counter which, if equal to 0, the Iterate is ready for + /// execution. + + void notify() + { + if (--notifications_m == 0) + add(this); + } + + /// How many notifications remain? + + int notifications() const { return notifications_m; } + + void addNotification() { notifications_m++; } + + int& generation() { return generation_m; } + + int& togo() { return togo_m; } + +protected: + + /// What scheduler are we working with? + IterateScheduler &scheduler_m; + + /// How many notifications should we receive before we can run? + int notifications_m; + + /// Which generation we were issued in. + int generation_m; + + /// How many times we need to go past a "did something" to be ready + /// for destruction? + int togo_m; + +}; + struct SystemContext { void addNCpus(int) {} void wait() {} void concurrency(int){} - int concurrency() {return 1;} + int concurrency() { return 1; } void mustRunOn() {} // We have a separate message queue because they are // higher priority. + typedef Iterate *IteratePtr_t; static std::list workQueueMessages_m; static std::list workQueue_m; +#if POOMA_MPI + static const int max_requests = 1024; + static MPI_Request requests_m[max_requests]; + static std::map allocated_requests_m; + static std::set free_requests_m; +#endif + + +#if POOMA_MPI + + /// Query, if we have lots of MPI_Request slots available - /////////////////////////// - // This function lets you check if there are iterates that are - // ready to run. - inline static - bool workReady() + static bool haveLotsOfMPIRequests() { - return !(workQueue_m.empty() && workQueueMessages_m.empty()); + return free_requests_m.size() > max_requests/2; } - /////////////////////////// - // Run an iterate if one is ready. - inline static - void runSomething() + /// Get a MPI_Request slot, associated with an iterate + + static MPI_Request* getMPIRequest(IteratePtr_t p) { - if (!workQueueMessages_m.empty()) - { - // Get the top iterate. - // Delete it from the queue. - // Run the iterate. - // Delete the iterate. This could put more iterates in the queue. 
+ PInsist(!free_requests_m.empty(), "No free MPIRequest slots."); + int i = *free_requests_m.begin(); + free_requests_m.erase(free_requests_m.begin()); + allocated_requests_m[i] = p; + p->togo()++; + return &requests_m[i]; + } - RunnablePtr_t p = workQueueMessages_m.front(); - workQueueMessages_m.pop_front(); - p->execute(); + static void releaseMPIRequest(int i) + { + IteratePtr_t p = allocated_requests_m[i]; + allocated_requests_m.erase(i); + free_requests_m.insert(i); + if (--(p->togo()) == 0) delete p; - } + } + + static bool waitForSomeRequests(bool mayBlock) + { + if (allocated_requests_m.empty()) + return false; + + int last_used_request = allocated_requests_m.rbegin()->first; + int finished[last_used_request+1]; + MPI_Status statuses[last_used_request+1]; + int nr_finished; + int res; + if (mayBlock) + res = MPI_Waitsome(last_used_request+1, requests_m, + &nr_finished, finished, statuses); else - { - if (!workQueue_m.empty()) - { - RunnablePtr_t p = workQueue_m.front(); - workQueue_m.pop_front(); - p->execute(); - delete p; + res = MPI_Testsome(last_used_request+1, requests_m, + &nr_finished, finished, statuses); + PAssert(res == MPI_SUCCESS || res == MPI_ERR_IN_STATUS); + if (nr_finished == MPI_UNDEFINED) + return false; + + // release finised requests + while (nr_finished--) { + if (res == MPI_ERR_IN_STATUS) { + if (statuses[nr_finished].MPI_ERROR != MPI_SUCCESS) { + char msg[MPI_MAX_ERROR_STRING+1]; + int len; + MPI_Error_string(statuses[nr_finished].MPI_ERROR, msg, &len); + msg[len] = '\0'; + PInsist(0, msg); + } } + releaseMPIRequest(finished[nr_finished]); } + return true; + } + +#else + + static bool waitForSomeRequests(bool mayBlock) + { + return false; + } + +#endif + + + /// This function lets you check if there are iterates that are + /// ready to run. + + static bool workReady() + { + return !(workQueue_m.empty() + && workQueueMessages_m.empty() +#if POOMA_MPI + && allocated_requests_m.empty() +#endif + ); + } + + /// Run an iterate if one is ready. Returns if progress + /// was made. + + static bool runSomething(bool mayBlock = true) + { + // do work in this order to minimize communication latency: + // - issue all messages + // - do some regular work + // - wait for messages to complete + + RunnablePtr_t p = NULL; + if (!workQueueMessages_m.empty()) { + p = workQueueMessages_m.front(); + workQueueMessages_m.pop_front(); + } else if (!workQueue_m.empty()) { + p = workQueue_m.front(); + workQueue_m.pop_front(); + } + + if (p) { + p->execute(); + Iterate *it = dynamic_cast(p); + if (it) { + if (--(it->togo()) == 0) + delete it; + } else + delete p; + return true; + + } else + return waitForSomeRequests(mayBlock); } }; -inline void addRunnable(RunnablePtr_t rn) -{ - SystemContext::workQueue_m.push_front(rn); -} +/// Adds a runnable to the appropriate work-queue. inline void add(RunnablePtr_t rn) { @@ -182,25 +354,18 @@ inline void wait() {} inline void mustRunOn(){} -/*------------------------------------------------------------------------ -CLASS - IterateScheduler_Serial_Async - - Implements a asynchronous scheduler for a data driven execution. - Specializes a IterateScheduler. - -KEYWORDS - Data-parallelism, Native-interface, IterateScheduler. - -DESCRIPTION - - The SerialAsync IterateScheduler, Iterate and DataObject - implement a SMARTS scheduler that does dataflow without threads. 
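As an illustration of how these request slots are meant to be used (this sketch is not part of the patch; buffer, count, dest and tag are hypothetical message parameters), a messaging Iterate pairs getMPIRequest() with a non-blocking MPI call inside its run() method:

    // inside Iterate<SerialAsync>::run() of a hypothetical send iterate
    MPI_Request *req = SystemContext::getMPIRequest(this);  // bumps this->togo()
    MPI_Isend(buffer, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD, req);
    // later, waitForSomeRequests() notices completion, calls releaseMPIRequest()
    // and deletes the iterate once its togo() count drops back to zero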
- What that means is that when you hand iterates to the - IterateScheduler it stores them up until you call - IterateScheduler::blockingEvaluate(), at which point it evaluates - iterates until the queue is empty. ------------------------------------------------------------------------------*/ + +/** + * Implements a asynchronous scheduler for a data driven execution. + * Specializes a IterateScheduler. + * + * The SerialAsync IterateScheduler, Iterate and DataObject + * implement a SMARTS scheduler that does dataflow without threads. + * What that means is that when you hand iterates to the + * IterateScheduler it stores them up until you call + * IterateScheduler::blockingEvaluate(), at which point it evaluates + * iterates until the queue is empty. + */ template<> class IterateScheduler @@ -212,196 +377,128 @@ typedef DataObject DataObject_t; typedef Iterate Iterate_t; - /////////////////////////// - // Constructor - // - IterateScheduler() {} - - /////////////////////////// - // Destructor - // - ~IterateScheduler() {} - void setConcurrency(int) {} - - //--------------------------------------------------------------------------- - // Mutators. - //--------------------------------------------------------------------------- - - /////////////////////////// - // Tells the scheduler that the parser thread is starting a new - // data-parallel statement. Any Iterate that is handed off to the - // scheduler between beginGeneration() and endGeneration() belongs - // to the same data-paralllel statement and therefore has the same - // generation number. - // - inline void beginGeneration() { } - - /////////////////////////// - // Tells the scheduler that no more Iterates will be handed off for - // the data parallel statement that was begun with a - // beginGeneration(). - // - inline void endGeneration() {} - - /////////////////////////// - // The parser thread calls this method to evaluate the generated - // graph until all the nodes in the dependence graph has been - // executed by the scheduler. That is to say, the scheduler - // executes all the Iterates that has been handed off to it by the - // parser thread. - // - inline - void blockingEvaluate(); - - /////////////////////////// - // The parser thread calls this method to ask the scheduler to run - // the given Iterate when the dependence on that Iterate has been - // satisfied. - // - inline void handOff(Iterate* it); + IterateScheduler() + : generation_m(0) + {} - inline - void releaseIterates() { } + ~IterateScheduler() {} -protected: -private: + void setConcurrency(int) {} - typedef std::list Container_t; - typedef Container_t::iterator Iterator_t; + /// Tells the scheduler that the parser thread is starting a new + /// data-parallel statement. Any Iterate that is handed off to the + /// scheduler between beginGeneration() and endGeneration() belongs + /// to the same data-paralllel statement and therefore has the same + /// generation number. + /// Nested invocations are handled as being part of the outermost + /// generation. -}; + void beginGeneration() + { + // Ensure proper overflow behavior. + if (++generation_m < 0) + generation_m = 0; + generationStack_m.push(generation_m); + } -//----------------------------------------------------------------------------- + /// Tells the scheduler that no more Iterates will be handed off for + /// the data parallel statement that was begun with a + /// beginGeneration(). 
-/*------------------------------------------------------------------------ -CLASS - Iterate_SerialAsync - - Iterate is used to implement the SerialAsync - scheduling policy. - -KEYWORDS - Data_Parallelism, Native_Interface, IterateScheduler, Data_Flow. - -DESCRIPTION - An Iterate is a non-blocking unit of concurrency that is used - to describe a chunk of work. It inherits from the Runnable - class and as all subclasses of Runnable, the user specializes - the run() method to specify the operation. - Iterate is a further specialization of the - Iterate class to use the SerialAsync Scheduling algorithm to - generate the data dependency graph for a data-driven - execution. */ + void endGeneration() + { + PAssert(inGeneration()); + generationStack_m.pop(); -template<> -class Iterate : public Runnable -{ - friend class IterateScheduler; - friend class DataObject; +#if POOMA_MPI + // this is a safe point to block until we have "lots" of MPI Requests + if (!inGeneration()) + while (!SystemContext::haveLotsOfMPIRequests()) + SystemContext::runSomething(true); +#endif + } -public: + /// Wether we are inside a generation and may not safely block. - typedef DataObject DataObject_t; - typedef IterateScheduler IterateScheduler_t; + bool inGeneration() const + { + return !generationStack_m.empty(); + } + /// What the current generation is. - /////////////////////////// - // The Constructor for this class takes the IterateScheduler and a - // CPU affinity. CPU affinity has a default value of -1 which means - // it may run on any CPU available. - // - inline Iterate(IterateScheduler & s, int affinity=-1); - - /////////////////////////// - // The dtor is virtual because the subclasses will need to add to it. - // - virtual ~Iterate() {} + int generation() const + { + if (!inGeneration()) + return -1; + return generationStack_m.top(); + } - /////////////////////////// - // The run method does the core work of the Iterate. - // It is supplied by the subclass. - // - virtual void run() = 0; + /// The parser thread calls this method to evaluate the generated + /// graph until all the nodes in the dependence graph has been + /// executed by the scheduler. That is to say, the scheduler + /// executes all the Iterates that has been handed off to it by the + /// parser thread. - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline int affinity() const {return 0;} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline int hintAffinity() const {return 0;} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline void affinity(int) {} - /////////////////////////// - // Stub in all the affinities, because there is no such thing - // in serial. - // - inline void hintAffinity(int) {} + void blockingEvaluate() + { + if (inGeneration()) { + // It's not safe to block inside a generation, so + // just do as much as we can without blocking. + while (SystemContext::runSomething(false)) + ; + + } else { + // Loop as long as there is anything in the queue. + while (SystemContext::workReady()) + SystemContext::runSomething(true); + } + } - /////////////////////////// - // Notify is used to indicate to the Iterate that one of the data - // objects it had requested has been granted. To do this, we dec a - // dependence counter which, if equal to 0, the Iterate is ready for - // execution. 
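For illustration only (this sketch is not part of the patch; the per-patch iterate creation is elided), the generation bracketing above is driven by the evaluator roughly like this:

    Smarts::IterateScheduler<Smarts::SerialAsync> scheduler;

    scheduler.beginGeneration();   // one data-parallel statement begins
    // ... handOff() one Iterate per patch; each carries this generation number ...
    scheduler.endGeneration();     // leaving the outermost generation may run queued
                                   // work until enough MPI request slots are free

    scheduler.blockingEvaluate();  // outside a generation: drain the queues, and
                                   // block in MPI_Waitsome when otherwise idle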
- // - inline void notify(); - - /////////////////////////// - // How many notifications remain? - // - inline - int notifications() const { return notifications_m; } + /// The parser thread calls this method to ask the scheduler to run + /// the given Iterate when the dependence on that Iterate has been + /// satisfied. - inline void addNotification() + void handOff(Iterate* it) { - notifications_m++; + // No action needs to be taken here. Iterates will make their + // own way into the execution queue. + it->generation() = generation(); + it->notify(); } -protected: + void releaseIterates() { } - // What scheduler are we working with? - IterateScheduler &scheduler_m; +private: - // How many notifications should we receive before we can run? - int notifications_m; + typedef std::list Container_t; + typedef Container_t::iterator Iterator_t; -private: - // Set notifications dynamically and automatically every time a - // request is made by the iterate - void incr_notifications() { notifications_m++;} + static std::stack generationStack_m; + int generation_m; }; -//----------------------------------------------------------------------------- - -/*------------------------------------------------------------------------ -CLASS - DataObject_SerialAsync - - Implements a asynchronous scheduler for a data driven execution. -KEYWORDS - Data-parallelism, Native-interface, IterateScheduler. - -DESCRIPTION - The DataObject Class is used introduce a type to represent - a resources (normally) blocks of data) that Iterates contend - for atomic access. Iterates make request for either a read or - write to the DataObjects. DataObjects may grant the request if - the object is currently available. Otherwise, the request is - enqueue in a queue private to the data object until the - DataObject is release by another Iterate. A set of read - requests may be granted all at once if there are no - intervening write request to that DataObject. - DataObject is a specialization of DataObject for - the policy template SerialAsync. -*/ +/** + * Implements a asynchronous scheduler for a data driven execution. + * + * The DataObject Class is used introduce a type to represent + * a resources (normally) blocks of data) that Iterates contend + * for atomic access. Iterates make request for either a read or + * write to the DataObjects. DataObjects may grant the request if + * the object is currently available. Otherwise, the request is + * enqueue in a queue private to the data object until the + * DataObject is release by another Iterate. A set of read + * requests may be granted all at once if there are no + * intervening write request to that DataObject. + * DataObject is a specialization of DataObject for + * the policy template SerialAsync. + * + * There are two ways data can be used: to read or to write. + * Don't change this to give more than two states; + * things inside depend on that. + */ template<> class DataObject @@ -413,54 +510,56 @@ typedef IterateScheduler IterateScheduler_t; typedef Iterate Iterate_t; - // There are two ways data can be used: to read or to write. - // Don't change this to give more than two states: - // things inside depend on that. - - /////////////////////////// - // Construct the data object with an empty set of requests - // and the given affinity. - // - inline DataObject(int affinity=-1); + + /// Construct the data object with an empty set of requests + /// and the given affinity. 
+ + DataObject(int affinity=-1) + : released_m(queue_m.end()), notifications_m(0) + { + // released_m to the end of the queue (which should) also be the + // beginning. notifications_m to zero, since nothing has been + // released yet. + } - /////////////////////////// - // for compatibility with other SMARTS schedulers, accept - // Scheduler arguments (unused) - // - inline - DataObject(int affinity, IterateScheduler&); - - /////////////////////////// - // Stub out affinity because there is no affinity in serial. - // - inline int affinity() const { return 0; } - - /////////////////////////// - // Stub out affinity because there is no affinity in serial. - // - inline void affinity(int) {} + /// for compatibility with other SMARTS schedulers, accept + /// Scheduler arguments (unused) - /////////////////////////// - // An iterate makes a request for a certain action in a certain - // generation. - // - inline - void request(Iterate&, SerialAsync::Action); - - /////////////////////////// - // An iterate finishes and tells the DataObject it no longer needs - // it. If this is the last release for the current set of - // requests, have the IterateScheduler release some more. - // - inline void release(SerialAsync::Action); + inline DataObject(int affinity, IterateScheduler&) + : released_m(queue_m.end()), notifications_m(0) + {} + + /// Stub out affinity because there is no affinity in serial. + + int affinity() const { return 0; } + + /// Stub out affinity because there is no affinity in serial. + + void affinity(int) {} + + /// An iterate makes a request for a certain action in a certain + /// generation. + + inline void request(Iterate&, SerialAsync::Action); + + /// An iterate finishes and tells the DataObject it no longer needs + /// it. If this is the last release for the current set of + /// requests, have the IterateScheduler release some more. + + void release(SerialAsync::Action) + { + if (--notifications_m == 0) + releaseIterates(); + } -protected: private: - // If release needs to let more iterates go, it calls this. + /// If release needs to let more iterates go, it calls this. inline void releaseIterates(); - // The type for a request. + /** + * The type for a request. + */ class Request { public: @@ -475,135 +574,27 @@ SerialAsync::Action act_m; }; - // The type of the queue and iterator. + /// The type of the queue and iterator. typedef std::list Container_t; typedef Container_t::iterator Iterator_t; - // The list of requests from various iterates. - // They're granted in FIFO order. + /// The list of requests from various iterates. + /// They're granted in FIFO order. Container_t queue_m; - // Pointer to the last request that has been granted. + /// Pointer to the last request that has been granted. Iterator_t released_m; - // The number of outstanding notifications. + /// The number of outstanding notifications. int notifications_m; }; -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for -// IterateScheduler -// -////////////////////////////////////////////////////////////////////// - -// -// IterateScheduler::handOff(Iterate*) -// No action needs to be taken here. Iterates will make their -// own way into the execution queue. -// +/// void DataObject::releaseIterates(SerialAsync::Action) +/// When the last released iterate dies, we need to +/// look at the beginning of the queue and tell more iterates +/// that they can access this data. 
inline void -IterateScheduler::handOff(Iterate* it) -{ - it->notify(); -} - -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for Iterate -// -////////////////////////////////////////////////////////////////////// - -// -// Iterate::Iterate -// Construct with the scheduler and the number of notifications. -// Ignore the affinity. -// - -inline -Iterate::Iterate(IterateScheduler& s, int) -: scheduler_m(s), notifications_m(1) -{ -} - -// -// Iterate::notify -// Notify the iterate that a DataObject is ready. -// Decrement the counter, and if it is zero, alert the scheduler. -// - -inline void -Iterate::notify() -{ - if ( --notifications_m == 0 ) - { - add(this); - } -} - -////////////////////////////////////////////////////////////////////// -// -// Inline implementation of the functions for DataObject -// -////////////////////////////////////////////////////////////////////// - -// -// DataObject::DataObject() -// Initialize: -// released_m to the end of the queue (which should) also be the -// beginning. notifications_m to zero, since nothing has been -// released yet. -// - -inline -DataObject::DataObject(int) -: released_m(queue_m.end()), notifications_m(0) -{ -} - -// -// void DataObject::release(Action) -// An iterate has finished and is telling the DataObject that -// it is no longer needed. -// - -inline void -DataObject::release(SerialAsync::Action) -{ - if ( --notifications_m == 0 ) - releaseIterates(); -} - - - -//----------------------------------------------------------------------------- -// -// void IterateScheduler::blockingEvaluate -// Evaluate all the iterates in the queue. -// -//----------------------------------------------------------------------------- -inline -void -IterateScheduler::blockingEvaluate() -{ - // Loop as long as there is anything in the queue. - while (SystemContext::workReady()) - { - SystemContext::runSomething(); - } -} - -//----------------------------------------------------------------------------- -// -// void DataObject::releaseIterates(SerialAsync::Action) -// When the last released iterate dies, we need to -// look at the beginning of the queue and tell more iterates -// that they can access this data. -// -//----------------------------------------------------------------------------- -inline -void DataObject::releaseIterates() { // Get rid of the reservations that have finished. @@ -622,14 +613,17 @@ released_m->iterate().notify(); ++notifications_m; - // Record what action that one will take. + // Record what action that one will take + // and record its generation number SerialAsync::Action act = released_m->act(); + int generation = released_m->iterate().generation(); // Look at the next iterate. ++released_m; // If the first one was a read, release more. if ( act == SerialAsync::Read ) + { // As long as we aren't at the end and we have more reads... while ((released_m != end) && @@ -642,29 +636,30 @@ // And go on to the next. ++released_m; } + + } + } } +/// void DataObject::request(Iterate&, action) +/// An iterate makes a reservation with this DataObject for a given +/// action in a given generation. The request may be granted +/// immediately. -// -// void DataObject::request(Iterate&, action) -// An iterate makes a reservation with this DataObject for a given -// action in a given generation. The request may be granted -// immediately. 
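As a rough sketch (not part of the patch; ReadIterate is a hypothetical Iterate subclass and scheduler an IterateScheduler<SerialAsync>), the reservation protocol implemented by request() and release() is used like this:

    DataObject<SerialAsync> field;          // a resource iterates contend for
    ReadIterate it(scheduler);              // hypothetical Iterate subclass
    field.request(it, SerialAsync::Read);   // queued, or granted at once via it.notify()
    // ... the iterate runs once all of its requests have been granted ...
    field.release(SerialAsync::Read);       // lets the DataObject grant queued requests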
-// -inline -void +inline void DataObject::request(Iterate& it, SerialAsync::Action act) { // The request can be granted immediately if: // The queue is currently empty, or - // The request is a read and everything in the queue is a read. + // the request is a read and everything in the queue is a read, + // or (with relaxed conditions), everything is the same generation. // Set notifications dynamically and automatically // every time a request is made by the iterate - it.incr_notifications(); + it.notifications_m++; bool allReleased = (queue_m.end() == released_m); bool releasable = queue_m.empty() || @@ -691,17 +686,11 @@ } -//---------------------------------------------------------------------- - - -// -// End of Smarts namespace. -// -} +} // namespace Smarts ////////////////////////////////////////////////////////////////////// -#endif // POOMA_PACKAGE_CLASS_H +#endif // _SerialAsync_h_ /*************************************************************************** * $RCSfile: SerialAsync.h,v $ $Author: sa_smith $ From oldham at codesourcery.com Tue Jan 6 20:10:59 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 06 Jan 2004 12:10:59 -0800 Subject: [pooma-dev] [PATCH] MPI support for SerialAsync scheduler In-Reply-To: References: <3FF9D783.5030504@codesourcery.com> <3FFB0027.6080509@codesourcery.com> Message-ID: <3FFB1653.7040302@codesourcery.com> Richard Guenther wrote: > On Tue, 6 Jan 2004, Jeffrey D. Oldham wrote: > > >>Let's move the magic constant into a const variable instead of having >>the constant scattered throughout the code. Then, please commit. Thanks. > > > For the record, this is what I committed. It passes builds for both > --serial and --mpi for me. > > Richard. > > > 2004Jan06 Richard Guenther > > * src/Threads/IterateSchedulers/SerialAsync.h: doxygenifize, > add std::stack for generation tracking, add support for > asyncronous MPI requests. > src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp: define > new static variables. > src/Threads/IterateSchedulers/Runnable.h: declare add(). > src/Pooma/Pooma.cmpl.cpp: use SystemContext::max_requests > constant. > > Index: Threads/IterateSchedulers/SerialAsync.cmpl.cpp > =================================================================== > RCS file: /home/pooma/Repository/r2/src/Threads/IterateSchedulers/SerialAsync.cmpl.cpp,v > retrieving revision 1.3 > diff -u -u -r1.3 SerialAsync.cmpl.cpp > --- Threads/IterateSchedulers/SerialAsync.cmpl.cpp 12 Apr 2000 00:08:06 -0000 1.3 > +++ Threads/IterateSchedulers/SerialAsync.cmpl.cpp 6 Jan 2004 19:52:47 -0000 > @@ -82,6 +82,13 @@ > > std::list SystemContext::workQueueMessages_m; > std::list SystemContext::workQueue_m; > +#if POOMA_MPI > + const int SystemContext::max_requests; > + MPI_Request SystemContext::requests_m[SystemContext::max_requests]; Thank you. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Tue Jan 6 22:50:46 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 06 Jan 2004 14:50:46 -0800 Subject: Patch: Fix Compilation of Messaging.cmpl.cpp Message-ID: <3FFB3BC6.7080403@codesourcery.com> The attached patch, approved by Richard Guenther, ensures src/Tulip/Messaging.cmpl.cpp can be compiled. The source code uses 'Pooma::context', so 'Pooma::Pooma.h' must be included. Tested under LINUX gcc and G++ 3.4. 2004-01-06 Jeffrey D. Oldham * Messaging.cmpl.cpp: Include "Pooma.h" so "Pooma::context" is declared. -- Jeffrey D. 
Oldham oldham at codesourcery.com -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: msg.06Jan.13.9.patch URL: From rguenth at tat.physik.uni-tuebingen.de Wed Jan 7 12:55:32 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 7 Jan 2004 13:55:32 +0100 (CET) Subject: [PATCH] Extend/fix some testcases for MPI Message-ID: Hi! This patch fixes some testcases for MPI operation and extends array_test29 to check for individual guard updates (for the optimization patches still pending). Ok? Richard. 2004Jan07 Richard Guenther * src/Array/tests/array_test29: systematically check (partial) guard updates. src/Field/tests/BasicTest3.cpp: use ReplicatedTag, not DistributedTag. src/Layout/tests/dynamiclayout_test1.cpp: #define BARRIER, as in other tests. src/Layout/tests/dynamiclayout_test2.cpp: likewise. src/Tulip/tests/ReduceOverContextsTest.cpp: #include Pooma/Pooma.h. diff -Nru a/r2/src/Array/tests/array_test29.cpp b/r2/src/Array/tests/array_test29.cpp --- a/r2/src/Array/tests/array_test29.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Array/tests/array_test29.cpp Wed Jan 7 13:47:20 2004 @@ -33,17 +33,168 @@ #include "Pooma/Pooma.h" #include "Utilities/Tester.h" -#include "Domain/Loc.h" -#include "Domain/Interval.h" -#include "Partition/UniformGridPartition.h" -#include "Layout/UniformGridLayout.h" -#include "Engine/BrickEngine.h" -#include "Engine/CompressibleBrick.h" -#include "Engine/MultiPatchEngine.h" -#include "Engine/RemoteEngine.h" -#include "Array/Array.h" +#include "Pooma/Arrays.h" #include "Tiny/Vector.h" + +void checks1(Pooma::Tester& tester) +{ + Interval<2> I(9, 9); + Loc<2> blocks(3, 3); + UniformGridPartition<2> partition(blocks, GuardLayers<2>(1)); + UniformGridLayout<2> layout(I, partition, DistributedTag()); + DomainLayout<2> layout2(I, GuardLayers<2>(1)); + + Array<2, int, MultiPatch > > + am(layout), bm(layout); + Array<2, int, Brick> + al(layout2), bl(layout2); + + am = 2; + al = 2; + am(I) = 1; + al(I) = 1; + bm = am; + bl = al; + + bm(I) += am(I-Loc<2>(1, 0)); + bl(I) += al(I-Loc<2>(1, 0)); + tester.check("left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + am(I) += bm(I-Loc<2>(0, 1)); + al(I) += bl(I-Loc<2>(0, 1)); + tester.check("upper guards", all(am(I) == al(I))); + if (!tester.ok()) { + tester.out() << al << am << std::endl; + return; + } + bm(I) += am(I-Loc<2>(-1, 0)); + bl(I) += al(I-Loc<2>(-1, 0)); + tester.check("right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + am(I) += bm(I-Loc<2>(0, -1)); + al(I) += bl(I-Loc<2>(0, -1)); + tester.check("lower guards", all(am(I) == al(I))); + if (!tester.ok()) { + tester.out() << al << am << std::endl; + return; + } + + bm(I) += am(I-Loc<2>(1, 1)); + bl(I) += al(I-Loc<2>(1, 1)); + tester.check("upper left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + am(I) += bm(I-Loc<2>(1, -1)); + al(I) += bl(I-Loc<2>(1, -1)); + tester.check("lower left guards", all(am(I) == al(I))); + if (!tester.ok()) { + tester.out() << al << am << std::endl; + return; + } + bm(I) += am(I-Loc<2>(-1, 1)); + bl(I) += al(I-Loc<2>(-1, 1)); + tester.check("upper right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + am(I) += bm(I-Loc<2>(-1, -1)); + al(I) += bl(I-Loc<2>(-1, -1)); + tester.check("lower right guards", all(am(I) == al(I))); + if 
(!tester.ok()) { + tester.out() << al << am << std::endl; + return; + } +} + +void checks2(Pooma::Tester& tester) +{ + Interval<2> I(9, 9); + Loc<2> blocks(3, 3); + UniformGridPartition<2> partition(blocks, GuardLayers<2>(1)); + UniformGridLayout<2> layout(I, partition, DistributedTag()); + DomainLayout<2> layout2(I, GuardLayers<2>(1)); + + Array<2, int, MultiPatch > > + am(layout), bm(layout); + Array<2, int, Brick> + al(layout2), bl(layout2); + + am = 2; + al = 2; + am(I) = 1; + al(I) = 1; + bm = am; + bl = al; + + bm(I) = am(I-Loc<2>(1, 0)); + bl(I) = al(I-Loc<2>(1, 0)); + tester.check("left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(0, 1)); + bl(I) = al(I-Loc<2>(0, 1)); + tester.check("left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(-1, 0)); + bl(I) = al(I-Loc<2>(-1, 0)); + tester.check("right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(0, -1)); + bl(I) = al(I-Loc<2>(0, -1)); + tester.check("right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + + bm(I) = am(I-Loc<2>(1, 1)); + bl(I) = al(I-Loc<2>(1, 1)); + tester.check("upper left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(1, -1)); + bl(I) = al(I-Loc<2>(1, -1)); + tester.check("upper left guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(-1, 1)); + bl(I) = al(I-Loc<2>(-1, 1)); + tester.check("upper right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } + bm(I) = am(I-Loc<2>(-1, -1)); + bl(I) = al(I-Loc<2>(-1, -1)); + tester.check("upper right guards", all(bm(I) == bl(I))); + if (!tester.ok()) { + tester.out() << bl << bm << std::endl; + return; + } +} + int main(int argc, char *argv[]) { Pooma::initialize(argc, argv); @@ -72,6 +223,16 @@ a1(I) = (a2(I-1)+a2(I+1))/2; tester.check("Average", all(a1(I) == 1)); + Interval<1> J(1,7); + a1 = 0; + a2 = 1; + a1(J) = (a2(J-1)+a2(J+1))/2; + tester.check("Average", all(a1(J) == 1)); + + checks1(tester); + if (tester.ok()) + checks2(tester); + int ret = tester.results("array_test29"); Pooma::finalize(); return ret; diff -Nru a/r2/src/Field/tests/BasicTest3.cpp b/r2/src/Field/tests/BasicTest3.cpp --- a/r2/src/Field/tests/BasicTest3.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Field/tests/BasicTest3.cpp Wed Jan 7 13:47:20 2004 @@ -127,7 +127,7 @@ // MultiPatch tester.out() << "MultiPatch...\n"; { - GridLayout<2> layout1(Interval<2>(I, J), Loc<2>(1, 1), GuardLayers<2>(1), DistributedTag()); + GridLayout<2> layout1(Interval<2>(I, J), Loc<2>(1, 1), GuardLayers<2>(1), ReplicatedTag()); Field::Mesh_t, int, MultiPatch > f(vert, layout1, origin, spacings); check(tester, f); diff -Nru a/r2/src/Layout/tests/dynamiclayout_test1.cpp b/r2/src/Layout/tests/dynamiclayout_test1.cpp --- a/r2/src/Layout/tests/dynamiclayout_test1.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Layout/tests/dynamiclayout_test1.cpp Wed Jan 7 13:47:20 2004 @@ -40,7 +40,7 @@ #include "Layout/DynamicLayout.h" #include "Partition/GridPartition.h" -//#define BARRIER +#define BARRIER #ifndef BARRIER #if POOMA_CHEETAH diff -Nru a/r2/src/Layout/tests/dynamiclayout_test2.cpp 
b/r2/src/Layout/tests/dynamiclayout_test2.cpp --- a/r2/src/Layout/tests/dynamiclayout_test2.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Layout/tests/dynamiclayout_test2.cpp Wed Jan 7 13:47:20 2004 @@ -44,7 +44,7 @@ #include #endif -//#define BARRIER +#define BARRIER #ifndef BARRIER #if POOMA_CHEETAH diff -Nru a/r2/src/Tulip/tests/ReduceOverContextsTest.cpp b/r2/src/Tulip/tests/ReduceOverContextsTest.cpp --- a/r2/src/Tulip/tests/ReduceOverContextsTest.cpp Wed Jan 7 13:47:20 2004 +++ b/r2/src/Tulip/tests/ReduceOverContextsTest.cpp Wed Jan 7 13:47:20 2004 @@ -32,7 +32,7 @@ // Include files -#include "PETE/PETE.h" // seems like overkill... +#include "Pooma/Pooma.h" #include "Tulip/ReduceOverContexts.h" #include "Tulip/RemoteProxy.h" #include "Utilities/Tester.h" From rguenth at tat.physik.uni-tuebingen.de Wed Jan 7 13:01:16 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 7 Jan 2004 14:01:16 +0100 (CET) Subject: [PATCH] Clean Threads/ Message-ID: Hi! This patch cleans the Threads/ files by doxygenifizing them and using common code, if available, rather than duplicating existing stuff. Ok? Richard. 2004Jan07 Richard Guenther * src/Threads/IterateSchedulers/IterateScheduler.h: Doxygenifize, only declare classes. src/Threads/IterateSchedulers/Runnable.h: Doxygenifize, reorder methods. src/Threads/PoomaSmarts.h: Doxygenifize. src/Threads/Scheduler.h: Doxygenifize, use #error, not CTAssert. src/Threads/SmartsStubs.h: Doxygenifize, use IterateScheduler.h and Runnable.h instead of duplicating code. diff -Nru a/r2/src/Threads/IterateSchedulers/IterateScheduler.h b/r2/src/Threads/IterateSchedulers/IterateScheduler.h --- a/r2/src/Threads/IterateSchedulers/IterateScheduler.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/IterateSchedulers/IterateScheduler.h Wed Jan 7 13:47:20 2004 @@ -40,44 +40,28 @@ #ifndef ITERATE_SCHEDULER_H #define ITERATE_SCHEDULER_H -//---------------------------------------------------------------------- -// Functions: -// template class IterateScheduler -//---------------------------------------------------------------------- - -//---------------------------------------------------------------------- -// The templated classes. -// This is sort of like an abstract base class, since it doesn't -// implement anything and you can't build one of these directly. -// -//---------------------------------------------------------------------- +/** @file + * @ingroup IterateSchedulers + * @brief + * Templates for the scheduler classes. + * + * This is sort of like abstract base classes, since it doesn't + * implement anything and you can't build one of these directly. + * They are implemented by specializing for a tag class like + * Stub or SerialAsync. 
+ */ namespace Smarts { -template -class IterateScheduler -{ -public: - IterateScheduler() {}; - int generation() const {return generation_m;} -private: - int generation_m; -}; template -class Iterate -{ -public: - Iterate() {}; -private: +class IterateScheduler; +template +class Iterate; -}; template -class DataObject -{ -public: - DataObject() {}; -private: -}; +class DataObject; + } // close namespace Smarts -#endif// ITERATE_SCHEDULER_H + +#endif // ITERATE_SCHEDULER_H diff -Nru a/r2/src/Threads/IterateSchedulers/Runnable.h b/r2/src/Threads/IterateSchedulers/Runnable.h --- a/r2/src/Threads/IterateSchedulers/Runnable.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/IterateSchedulers/Runnable.h Wed Jan 7 13:47:20 2004 @@ -29,95 +29,63 @@ #ifndef _Runnable_h_ #define _Runnable_h_ +/** @file + * @ingroup IterateSchedulers + * @brief + * Base class for a schedulable object or function to be executed + * by the scheduler asynchronously. + */ + #include namespace Smarts { -/*------------------------------------------------------------------------ -CLASS - Runnable - - Base class for a schedulable object or function to be executed - by the scheduler asynchronously - -KEYWORDS - Thread, Native_Interface, Task_Parallelism, Data_Parallelism. - -DESCRIPTION - Runnable is the base class for system classes "Thread" and - "Iterate". However, the user may define his/her own - sub-class. Any class derived from Runnable, is an object that - the scheduler understands and therefore is the mechanism to - have something executed in parallel by the scheduler on behalf - of the user. - -COPYRIGHT - This program was prepared by the Regents of the University of - California at Los Alamos National Laboratory (the University) - under Contract No. W-7405-ENG-36 with the U.S. Department of - Energy (DOE). The University has certain rights in the program - pursuant to the contract and the program should not be copied - or distributed outside your organization. All rights in the - program are reserved by the DOE and the University. Neither - the U.S. Government nor the University makes any warranty, - express or implied, or assumes any liability or responsibility - for the use of this software. - - Parts of this software have been authored at the University of - Colorado -- Boulder. Neither University of Colorado nor its - employees makes any warranty, express or implied, or assumes - any liability or responsibility for the use of this SOFTWARE. - - This SOFTWARE may be modified for derivative use, but modified - SOFTWARE should be clearly marked as such, so as not to - confuse it with the versions available from Los Alamos - National Laboratory. -*/ +/** + * Runnable is the base class for system classes "Thread" and + * "Iterate". However, the user may define his/her own + * sub-class. Any class derived from Runnable, is an object that + * the scheduler understands and therefore is the mechanism to + * have something executed in parallel by the scheduler on behalf + * of the user. + */ class Runnable { - friend class Context; public: - /////////////////////////// - // Set priority of this runnable relative to other runnables - // being scheduled. - // - inline int - priority() { return priority_m; }; - - /////////////////////////// - // Accessor function to priority. 
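By way of illustration (not part of the patch), a user-defined Runnable only needs to override run(); the scheduler invokes it through execute() and disposes of it afterwards:

    #include <iostream>

    class PrintRunnable : public Smarts::Runnable
    {
    protected:
      virtual void run() { std::cout << "hello from a runnable\n"; }
    };

    // hand it to the scheduler's work queue
    Smarts::add(new PrintRunnable);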
- // - inline void - priority(int _priority) { priority_m = _priority; }; - - ///////// - virtual ~Runnable(){}; - - ///////// Runnable() { priority_m = 0; } - /////////////////////////// - // The parameter to this constructor is the CPU id for - // hard affinity. - // - Runnable(int ) + /// The parameter to this constructor is the CPU id for + /// hard affinity. + + Runnable(int) { priority_m = 0; } - //////// - virtual void execute(){ - run(); - }; + virtual ~Runnable() {} + + /// Accessor function to priority. + inline int + priority() { return priority_m; } + + /// Set priority of this runnable relative to other runnables + /// being scheduled. + + inline void + priority(int _priority) { priority_m = _priority; } + + virtual void execute() + { + run(); + } protected: - virtual void run(){}; + virtual void run() {} private: int priority_m; @@ -130,5 +98,6 @@ inline void add(RunnablePtr_t); -} +} // namespace Smarts + #endif diff -Nru a/r2/src/Threads/PoomaSmarts.h b/r2/src/Threads/PoomaSmarts.h --- a/r2/src/Threads/PoomaSmarts.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/PoomaSmarts.h Wed Jan 7 13:47:20 2004 @@ -29,25 +29,17 @@ #ifndef POOMA_THREADS_POOMA_SMARTS_H #define POOMA_THREADS_POOMA_SMARTS_H -//----------------------------------------------------------------------------- -// Types: -// Pooma::SmartsTag_t -// Pooma::Scheduler_t -// Pooma::DataObject_t -// Pooma::Iterate_t -// -// Global Data: -// Pooma::schedulerVersion -//----------------------------------------------------------------------------- - -//----------------------------------------------------------------------------- -// Overview: -// The POOMA wrapper around defines, includes, and typedefs for the Smarts -// run-time evaluation system. Based on the settings of POOMA_THREADS and -// the selected scheduler, define several typedefs and include the necessary -// files. If we're compiling only for serial runs, use a stub version of -// the Smarts interface instead. -//----------------------------------------------------------------------------- +/** @file + * @ingroup Threads + * @brief + * The POOMA wrapper around defines, includes, and typedefs for the Smarts + * run-time evaluation system. + * + * Based on the settings of POOMA_THREADS and + * the selected scheduler, define several typedefs and include the necessary + * files. If we're compiling only for serial runs, use a stub version of + * the Smarts interface instead. + */ //----------------------------------------------------------------------------- // Includes: diff -Nru a/r2/src/Threads/Scheduler.h b/r2/src/Threads/Scheduler.h --- a/r2/src/Threads/Scheduler.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/Scheduler.h Wed Jan 7 13:47:20 2004 @@ -34,16 +34,16 @@ #ifndef POOMA_THREADS_SCHEDULER_H #define POOMA_THREADS_SCHEDULER_H -////////////////////////////////////////////////////////////////////// - -//----------------------------------------------------------------------------- -// Overview: -// -// This file exist to wrap the correct includes from Smarts based on the -// scheduler that we've selected. If we're running in serial then we include -// the a stub file. This file defines a single typedef: SmartsTag_t, a policy -// tag which is used to select the appropriate smarts data object etc. -//----------------------------------------------------------------------------- +/** @file + * @ingroup Threads + * @brief + * Scheduler multiplexing based on configuration. 
+ * + * This file exist to wrap the correct includes from Smarts based on the + * scheduler that we've selected. If we're running in serial then we include + * the a stub file. This file defines a single typedef: SmartsTag_t, a policy + * tag which is used to select the appropriate smarts data object etc. + */ //----------------------------------------------------------------------------- // Includes: @@ -82,8 +82,7 @@ # else -# include "Utilities/PAssert.h" -CTAssert(YOU_HAVE_NOT_SELECTED_A_SCHEDULER); +# error "You have not selected a scheduler" # endif diff -Nru a/r2/src/Threads/SmartsStubs.h b/r2/src/Threads/SmartsStubs.h --- a/r2/src/Threads/SmartsStubs.h Wed Jan 7 13:47:20 2004 +++ b/r2/src/Threads/SmartsStubs.h Wed Jan 7 13:47:20 2004 @@ -29,56 +29,21 @@ #ifndef POOMA_THREADS_SMARTSSTUBS_H #define POOMA_THREADS_SMARTSSTUBS_H -//----------------------------------------------------------------------------- -// Functions: -// SimpleSerialScheduler -// SimpleSerialScheduler::Iterate -// SimpleSerialScheduler::DataObject -// template class IterateScheduler -// IterateScheduler -// Iterate -// DataObject -// void concurrency(int) -// int concurrency() -// wait -//----------------------------------------------------------------------------- +/** @file + * @ingroup IterateSchedulers + * @brief + * Stub scheduler for serial in-order evaluation. + */ //----------------------------------------------------------------------------- // Includes: //----------------------------------------------------------------------------- -//---------------------------------------------------------------------- -// The templated classes. -// This is sort of like an abstract base class, since it doesn't -// implement anything and you can't build one of these directly. -//---------------------------------------------------------------------- +#include "Threads/IterateSchedulers/IterateScheduler.h" +#include "Threads/IterateSchedulers/Runnable.h" namespace Smarts { -template -class IterateScheduler -{ -private: - // Private ctor means you can't build one of these. - IterateScheduler() {} -}; - -template -class Iterate -{ -private: - // Private ctor means you can't build one of these. - Iterate() {} -}; - -template -class DataObject -{ -private: - // Private ctor means you can't build one of these. - DataObject() {} -}; - //---------------------------------------------------------------------- // The tag class we'll use for the template parameter. //---------------------------------------------------------------------- @@ -93,6 +58,7 @@ template<> class IterateScheduler; template<> class DataObject; + //////////////////////////////////////////////////////////////////////// // // The specialization of Iterate for Stub @@ -100,7 +66,7 @@ //////////////////////////////////////////////////////////////////////// template<> -class Iterate +class Iterate : public Runnable { public: // Construct the Iterate with: @@ -244,30 +210,9 @@ { } -class Runnable -{ -public: - // Runnable just takes affinity. - inline Runnable(int affinity) - : affinity_m(affinity) - { } - - // The dtor is virtual because the subclasses will need to add to it. 
- virtual ~Runnable() {} - virtual void run() = 0; - - int affinity() { return affinity_m; } - int hintAffinity() { return affinity_m; } - void affinity(int) {} - void hintAffinity(int) {} - -private: - int affinity_m; -}; - inline void add(Runnable *runnable) { - runnable->run(); + runnable->execute(); delete runnable; } From rguenth at tat.physik.uni-tuebingen.de Wed Jan 7 14:38:54 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 7 Jan 2004 15:38:54 +0100 (CET) Subject: [PATCH] Provide -openmp to user apps, too Message-ID: Hi! Last time I missed the user used Makefile in the lib directory. Fixed with this pathch. Ok? Richard. 2004Jan07 Richard Guenther * configure: provide openmp arguments to lib/$SUITE/Makefile. ===== configure 1.20 vs edited ===== --- 1.20/r2/configure Wed Jan 7 12:23:35 2004 +++ edited/configure Wed Jan 7 15:35:57 2004 @@ -2335,14 +2335,14 @@ print MFILE "LD_PARALLEL = 1\n"; print MFILE "\n"; print MFILE "### flags for applications\n"; - print MFILE "POOMA_CXX_OPT_ARGS = \@cppargs\@ $cppopt_app\n"; - print MFILE "POOMA_CXX_DBG_ARGS = \@cppargs\@ $cppdbg_app\n"; + print MFILE "POOMA_CXX_OPT_ARGS = \@cppargs\@ $openmpargs $cppopt_app\n"; + print MFILE "POOMA_CXX_DBG_ARGS = \@cppargs\@ $openmpargs $cppdbg_app\n"; print MFILE "\n"; - print MFILE "POOMA_CC_OPT_ARGS = \@cargs\@ $copt_app\n"; - print MFILE "POOMA_CC_DBG_ARGS = \@cargs\@ $cdbg_app\n"; + print MFILE "POOMA_CC_OPT_ARGS = \@cargs\@ $openmpargs $copt_app\n"; + print MFILE "POOMA_CC_DBG_ARGS = \@cargs\@ $openmpargs $cdbg_app\n"; print MFILE "\n"; - print MFILE "POOMA_F77_OPT_ARGS = $f77args $f77opt_app\n"; - print MFILE "POOMA_F77_DBG_ARGS = $f77args $f77dbg_app\n"; + print MFILE "POOMA_F77_OPT_ARGS = $openmpargs $f77args $f77opt_app\n"; + print MFILE "POOMA_F77_DBG_ARGS = $openmpargs $f77args $f77dbg_app\n"; print MFILE "\n"; print MFILE "POOMA_INCLUDES = $totinclist\n"; print MFILE "\n"; From rguenth at tat.physik.uni-tuebingen.de Wed Jan 7 16:57:46 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 7 Jan 2004 17:57:46 +0100 (CET) Subject: [PATCH] Support hybrid MPI/OpenMP if available Message-ID: Hi! This patch makes sure to correctly initialize MPI according to the standard when using OpenMP. Tested with mpich and Intel icpc where in fact, this mode is not supported appearantly. Ok? Richard. 2004Jan07 Richard Guenther * src/Pooma/Pooma.cmpl.cpp: initialize MPI using MPI_Init_thread if _OPENMP is defined, require at least MPI_THREAD_FUNNELED support. ===== Pooma/Pooma.cmpl.cpp 1.6 vs edited ===== --- 1.6/r2/src/Pooma/Pooma.cmpl.cpp Wed Jan 7 12:23:35 2004 +++ edited/Pooma/Pooma.cmpl.cpp Wed Jan 7 17:54:30 2004 @@ -288,7 +288,13 @@ // the Cheetah options from the Options object. #if POOMA_MPI +# ifdef _OPENMP + int provided; + MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided); + PInsist(provided >= MPI_THREAD_FUNNELED, "No MPI support for OpenMP"); +# else MPI_Init(&argc, &argv); +# endif #elif POOMA_CHEETAH controller_g = new Cheetah::Controller(argc, argv); #endif From oldham at codesourcery.com Thu Jan 8 21:34:55 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:34:55 -0800 Subject: [PATCH] Extend/fix some testcases for MPI In-Reply-To: References: Message-ID: <3FFDCCFF.7090401@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch fixes some testcases for MPI operation and extends array_test29 > to check for individual guard updates (for the optimization patches still > pending). > > Ok? 
> > Richard. > > > 2004Jan07 Richard Guenther > > * src/Array/tests/array_test29: systematically check (partial) > guard updates. > src/Field/tests/BasicTest3.cpp: use ReplicatedTag, not > DistributedTag. > src/Layout/tests/dynamiclayout_test1.cpp: #define BARRIER, as in > other tests. > src/Layout/tests/dynamiclayout_test2.cpp: likewise. > src/Tulip/tests/ReduceOverContextsTest.cpp: #include Pooma/Pooma.h. Yes. Thanks. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 8 21:37:12 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:37:12 -0800 Subject: [PATCH] Clean Threads/ In-Reply-To: References: Message-ID: <3FFDCD88.5000602@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch cleans the Threads/ files by doxygenifizing them and > using common code, if available, rather than duplicating existing stuff. > > Ok? > > Richard. > > > 2004Jan07 Richard Guenther > > * src/Threads/IterateSchedulers/IterateScheduler.h: Doxygenifize, > only declare classes. > src/Threads/IterateSchedulers/Runnable.h: Doxygenifize, reorder > methods. > src/Threads/PoomaSmarts.h: Doxygenifize. > src/Threads/Scheduler.h: Doxygenifize, use #error, not CTAssert. > src/Threads/SmartsStubs.h: Doxygenifize, use IterateScheduler.h > and Runnable.h instead of duplicating code. Yes. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 8 21:37:53 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:37:53 -0800 Subject: [PATCH] Provide -openmp to user apps, too In-Reply-To: References: Message-ID: <3FFDCDB1.6040102@codesourcery.com> Richard Guenther wrote: > Hi! > > Last time I missed the user used Makefile in the lib directory. Fixed with > this pathch. > > Ok? > > Richard. > > > 2004Jan07 Richard Guenther > > * configure: provide openmp arguments to lib/$SUITE/Makefile. > > ===== configure 1.20 vs edited ===== > --- 1.20/r2/configure Wed Jan 7 12:23:35 2004 > +++ edited/configure Wed Jan 7 15:35:57 2004 > @@ -2335,14 +2335,14 @@ > print MFILE "LD_PARALLEL = 1\n"; > print MFILE "\n"; > print MFILE "### flags for applications\n"; > - print MFILE "POOMA_CXX_OPT_ARGS = \@cppargs\@ $cppopt_app\n"; > - print MFILE "POOMA_CXX_DBG_ARGS = \@cppargs\@ $cppdbg_app\n"; > + print MFILE "POOMA_CXX_OPT_ARGS = \@cppargs\@ $openmpargs $cppopt_app\n"; > + print MFILE "POOMA_CXX_DBG_ARGS = \@cppargs\@ $openmpargs $cppdbg_app\n"; > print MFILE "\n"; > - print MFILE "POOMA_CC_OPT_ARGS = \@cargs\@ $copt_app\n"; > - print MFILE "POOMA_CC_DBG_ARGS = \@cargs\@ $cdbg_app\n"; > + print MFILE "POOMA_CC_OPT_ARGS = \@cargs\@ $openmpargs $copt_app\n"; > + print MFILE "POOMA_CC_DBG_ARGS = \@cargs\@ $openmpargs $cdbg_app\n"; > print MFILE "\n"; > - print MFILE "POOMA_F77_OPT_ARGS = $f77args $f77opt_app\n"; > - print MFILE "POOMA_F77_DBG_ARGS = $f77args $f77dbg_app\n"; > + print MFILE "POOMA_F77_OPT_ARGS = $openmpargs $f77args $f77opt_app\n"; > + print MFILE "POOMA_F77_DBG_ARGS = $openmpargs $f77args $f77dbg_app\n"; > print MFILE "\n"; > print MFILE "POOMA_INCLUDES = $totinclist\n"; > print MFILE "\n"; Yes, please commit. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 8 21:43:48 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:43:48 -0800 Subject: [PATCH] Support hybrid MPI/OpenMP if available In-Reply-To: References: Message-ID: <3FFDCF14.70708@codesourcery.com> Richard Guenther wrote: > Hi! 
> > This patch makes sure to correctly initialize MPI according to the > standard when using OpenMP. > > Tested with mpich and Intel icpc where in fact, this mode is not supported > appearantly. > > Ok? > > Richard. > > > 2004Jan07 Richard Guenther > > * src/Pooma/Pooma.cmpl.cpp: initialize MPI using MPI_Init_thread > if _OPENMP is defined, require at least MPI_THREAD_FUNNELED > support. > > ===== Pooma/Pooma.cmpl.cpp 1.6 vs edited ===== > --- 1.6/r2/src/Pooma/Pooma.cmpl.cpp Wed Jan 7 12:23:35 2004 > +++ edited/Pooma/Pooma.cmpl.cpp Wed Jan 7 17:54:30 2004 > @@ -288,7 +288,13 @@ > // the Cheetah options from the Options object. > > #if POOMA_MPI > +# ifdef _OPENMP > + int provided; > + MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided); > + PInsist(provided >= MPI_THREAD_FUNNELED, "No MPI support for OpenMP"); > +# else > MPI_Init(&argc, &argv); > +# endif > #elif POOMA_CHEETAH > controller_g = new Cheetah::Controller(argc, argv); > #endif OpenMP does not support MPI_init? I'd prefer to initialize OpenMP using the same mechanism as for MPI implementations. Also, does finalization also need to change? -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 8 21:54:05 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 8 Jan 2004 22:54:05 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Support hybrid MPI/OpenMP if available In-Reply-To: <3FFDCF14.70708@codesourcery.com> References: <3FFDCF14.70708@codesourcery.com> Message-ID: On Thu, 8 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > > > #if POOMA_MPI > > +# ifdef _OPENMP > > + int provided; > > + MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided); > > + PInsist(provided >= MPI_THREAD_FUNNELED, "No MPI support for OpenMP"); > > +# else > > MPI_Init(&argc, &argv); > > +# endif > > #elif POOMA_CHEETAH > > controller_g = new Cheetah::Controller(argc, argv); > > #endif > > OpenMP does not support MPI_init? I'd prefer to initialize OpenMP using > the same mechanism as for MPI implementations. It's somewhat difficult. The MPI-1 standard does not support any sort of threading and has only MPI_Init. The MPI-2 standard does support many levels of thread support which needs to be specified through MPI_Init_thread - though using MPI_Init is still possible, which is equivalent to initializing with no thread support. Nearly all implementation support the MPI_Init_thread call as part of (usually incomplete) MPI-2 support. To allow using OpenMP if MPI is used, too, we need at least make the MPI library aware of this. So, if no OpenMP is used, MPI_Init suffices and allows for MPI-1 only implementations. For OpenMP support we absolutely need MPI_Init_threads, so we use it. > Also, does finalization also need to change? No. Ok with this explanation? Thanks, Richard. From oldham at codesourcery.com Thu Jan 8 21:59:09 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 08 Jan 2004 13:59:09 -0800 Subject: [pooma-dev] Re: [PATCH] Support hybrid MPI/OpenMP if available In-Reply-To: References: <3FFDCF14.70708@codesourcery.com> Message-ID: <3FFDD2AD.1030805@codesourcery.com> Richard Guenther wrote: > On Thu, 8 Jan 2004, Jeffrey D. 
Oldham wrote: > > >>Richard Guenther wrote: >> >>> #if POOMA_MPI >>>+# ifdef _OPENMP >>>+ int provided; >>>+ MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided); >>>+ PInsist(provided >= MPI_THREAD_FUNNELED, "No MPI support for OpenMP"); >>>+# else >>> MPI_Init(&argc, &argv); >>>+# endif >>> #elif POOMA_CHEETAH >>> controller_g = new Cheetah::Controller(argc, argv); >>> #endif >> >>OpenMP does not support MPI_init? I'd prefer to initialize OpenMP using >>the same mechanism as for MPI implementations. > > > It's somewhat difficult. The MPI-1 standard does not support any sort of > threading and has only MPI_Init. The MPI-2 standard does support many > levels of thread support which needs to be specified through > MPI_Init_thread - though using MPI_Init is still possible, which is > equivalent to initializing with no thread support. > > Nearly all implementation support the MPI_Init_thread call as part of > (usually incomplete) MPI-2 support. To allow using OpenMP if MPI is used, > too, we need at least make the MPI library aware of this. So, if no > OpenMP is used, MPI_Init suffices and allows for MPI-1 only > implementations. For OpenMP support we absolutely need MPI_Init_threads, > so we use it. > > >>Also, does finalization also need to change? > > > No. > > Ok with this explanation? Yes. I appreciate the education. > Thanks, > Richard. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 8 22:13:52 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 8 Jan 2004 23:13:52 +0100 (CET) Subject: [PATCH] Fix deadlocks in MPI reduction evaluators Message-ID: Hi! The following patch is necessary to avoid deadlocks with the MPI implementation and multi-patch setups where one context does not participate in the reduction. Fixes failure of array_test_.. - I don't remember - with MPI. Basically the scenario is that the collective synchronous MPI_Gather is called from ReduceOverContexts<> on the non-participating (and thus not receiving) contexts while the SendIterates are still in the schedulers queue. The calculation participating contexts will wait for the ReceiveIterates and patch reductions to complete using the CSem forever then. So the fix is to make the not participating contexts wait on the CSem, too, by using a fake write iterate queued after the send iterates which will trigger as soon as the send iterates complete. Tested using MPI, Cheetah and serial some time ago. Ok? Richard. 2004Jan08 Richard Guenther * src/Engine/RemoteEngine.h: use a waiting iterate to wait for reduction completion in remote single and multi-patch reduction evaluator. Do begin/endGeneration at the toplevel evaluate. src/Evaluator/Reduction.h: do begin/endGeneration at the toplevel evaluate. 
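As a stand-alone illustration of the deadlock described above (a rough sketch, independent of POOMA's classes and not part of the patch): MPI_Gather is collective, so every rank of the communicator must reach the call before any rank can leave it. A rank that is still busy with other queued work, like the contexts whose send iterates have not yet run, simply stalls all the others:

#include <mpi.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Stand-in for the send iterates still sitting in the scheduler queue.
  if (rank == 1)
    sleep(10);

  int local = rank;
  int *all = (rank == 0) ? new int[size] : 0;

  // Collective: blocks until *all* ranks of MPI_COMM_WORLD have called it.
  MPI_Gather(&local, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

  if (rank == 0)
  {
    for (int i = 0; i < size; ++i)
      std::printf("rank %d contributed %d\n", i, all[i]);
    delete[] all;
  }

  MPI_Finalize();
  return 0;
}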
--- src/Engine/RemoteEngine.h 2004-01-02 12:57:48.000000000 +0100 +++ /home/richard/src/pooma/pooma-mpi3/r2/src/Engine/RemoteEngine.h 2004-01-08 23:00:40.000000000 +0100 @@ -1954,6 +1962,29 @@ } }; + +template +struct WaitingIterate : public Pooma::Iterate_t { + WaitingIterate(const Expr& e, Pooma::CountingSemaphore& csem) + : Pooma::Iterate_t(Pooma::scheduler()), + e_m(e), csem_m(csem) + { + DataObjectRequest writeReq(*this); + engineFunctor(e_m, writeReq); + } + virtual void run() + { + csem_m.incr(); + } + virtual ~WaitingIterate() + { + DataObjectRequest writeRel; + engineFunctor(e_m, writeRel); + } + Expr e_m; + Pooma::CountingSemaphore& csem_m; +}; + //----------------------------------------------------------------------------- // Single-patch Reductions involving remote engines: // @@ -1998,12 +2029,11 @@ Pooma::CountingSemaphore csem; csem.height(1); - Pooma::scheduler().beginGeneration(); - if (Pooma::context() != computationContext) { expressionApply(e, RemoteSend(computationContext)); - csem.incr(); + Pooma::Iterate_t *it = new WaitingIterate(e, csem); + Pooma::scheduler().handOff(it); } else { @@ -2013,8 +2043,7 @@ forEach(e, view, TreeCombine()), csem); } - Pooma::scheduler().endGeneration(); - + // Wait for RemoteSend or Reduction to complete. csem.wait(); RemoteProxy globalRet(ret, computationContext); @@ -2102,8 +2131,6 @@ csem.height(n); T *vals = new T[n]; - Pooma::scheduler().beginGeneration(); - i = inter.begin(); k = 0; for (j = 0; j < inter.size(); j++) @@ -2129,13 +2156,19 @@ else { expressionApply(e(*i), RemoteSend(computationalContext[j])); + // One extra RemoteSend to wait for. Maybe we can combine these + // iterates, but maybe not. Play safe for now. + csem.raise_height(1); + Pooma::Iterate_t *it = new WaitingIterate + >::Type_t>(e(*i), csem); + Pooma::scheduler().handOff(it); } } ++i; } - Pooma::scheduler().endGeneration(); + // Wait for RemoteSends and Reductions to complete. csem.wait(); if (n > 0) --- src/Evaluator/Reduction.h 2003-11-21 22:30:38.000000000 +0100 +++ /home/richard/src/pooma/pooma-mpi3/r2/src/Evaluator/Reduction.h 2004-01-02 00:40:14.000000000 +0100 @@ -128,10 +128,15 @@ void evaluate(T &ret, const Op &op, const Expr &e) const { typedef typename EvaluatorTag1::Evaluator_t Evaluator_t; + + Pooma::scheduler().beginGeneration(); + PAssert(checkValidity(e, WrappedInt())); forEach(e, PerformUpdateTag(), NullCombine()); Reduction().evaluate(ret, op, e()); + Pooma::scheduler().endGeneration(); + POOMA_INCREMENT_STATISTIC(NumReductions) } }; @@ -184,12 +189,8 @@ Pooma::CountingSemaphore csem; csem.height(1); - Pooma::scheduler().beginGeneration(); - evaluate(ret, op, e, csem); - Pooma::scheduler().endGeneration(); - csem.wait(); } }; @@ -237,12 +238,10 @@ expressionApply(e, IntersectorTag(inter)); - const int n = std::distance(inter.begin(), inter.end()); + const int n = inter.size(); Pooma::CountingSemaphore csem; csem.height(n); T *vals = new T[n]; - - Pooma::scheduler().beginGeneration(); typename Inter_t::const_iterator i = inter.begin(); int j = 0; @@ -253,8 +252,6 @@ ++i; ++j; } - Pooma::scheduler().endGeneration(); - csem.wait(); ret = vals[0]; From rguenth at tat.physik.uni-tuebingen.de Fri Jan 9 12:36:29 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 9 Jan 2004 13:36:29 +0100 (CET) Subject: [PATCH] Document OpenMP/MPI Message-ID: Hi! While the documentation beyond docs/ is in a rather bad shape, this adds a paragraph to parallelism.html describing the new --mpi and --openmp modes. Ok? 
How is CodeSourcery planning to do releases in the future? There is a lot of old and inaccurate information in the toplevel INSTALL.* and README files, also VERSION.LOG seems to be not up-to-date, likewise src/Field/ToDo or the toplevel Ideas. If it was my personal package, I'd start ripping out those old files entirely (likewise the scripts/build* files) and provide a README with requirements and pointers to the documentation and a generic INSTALL (of course I'd drop all traces of the word Windows there, too ;)). Maybe someone of you can suggest how to proceed with all of the information in these files? Thanks, Richard. ===== parallelism.html 1.1 vs edited ===== --- 1.1/r2/docs/parallelism.html Mon May 13 17:47:21 2002 +++ edited/parallelism.html Fri Jan 9 13:29:04 2004 @@ -1,4 +1,4 @@ -

POOMA 2.3 parallel model
+POOMA post-2.4 parallel model
CONTEXT: In discussing the parallelism models that POOMA supports, the word `context' has a particular @@ -40,6 +40,15 @@ The 2.4 release should address these issues and permit a parallel model with multithreaded contexts that communicate with each other through messages. + +

After POOMA 2.4 two new models of parallelism are supported. Namely +use of OpenMP thread level parallelization, if supported by the compiler, +and the use of an available MPI library such as MPICH or a vendor provided +implementation. Both models, MPI and OpenMP, may be combined simultaneously +if the MPI implementation supports this kind of operation. This is especially +useful for clusters of SMP workstations. Those new modes of operation can +be specified by the --mpi and --openmp configure switches. +

CHEETAH overview:

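To make the combined --mpi/--openmp mode described above concrete, here is a minimal hybrid MPI/OpenMP program (an illustrative sketch, not POOMA code; compiler flags and MPI wrapper names vary). MPI is initialized with MPI_Init_thread requesting MPI_THREAD_FUNNELED, the OpenMP threads do the local work, and all MPI calls stay on the master thread, which is exactly what the funneled level allows:

#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char **argv)
{
  // Ask for funneled support: only the thread that called MPI_Init_thread
  // (the OpenMP master thread) will make MPI calls.
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
  if (provided < MPI_THREAD_FUNNELED) {
    std::printf("MPI library provides no thread support, aborting\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0)
    std::printf("each context runs up to %d OpenMP threads\n",
                omp_get_max_threads());

  double sum = 0.0;
  // Thread-level parallelism inside one context; no MPI calls in here.
#pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < 1000000; ++i)
    sum += 1.0 / (i + 1.0);

  // Back on the master thread: combine the per-context partial sums.
  double total = 0.0;
  MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0)
    std::printf("global sum = %f\n", total);

  MPI_Finalize();
  return 0;
}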
From rguenth at tat.physik.uni-tuebingen.de Fri Jan 9 13:42:25 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 9 Jan 2004 14:42:25 +0100 (CET) Subject: [PATCH] Fix Cheetah operation Message-ID: Hi! This patch reverts part of a previous patch to restore unpacking of compressible brick views in Cheetah mode. Ok? Richard. 2004Jan09 Richard Guenther * src/Engine/RemoteEngine.h: revert removal of unpack(Engine*, char*) and cleanup(Engine*) method in Cheetah::Serialize > class. ===== RemoteEngine.h 1.5 vs edited ===== --- 1.5/r2/src/Engine/RemoteEngine.h Wed Jan 7 09:54:05 2004 +++ edited/RemoteEngine.h Fri Jan 9 14:42:11 2004 @@ -1616,6 +1616,55 @@ return nBytes; } + static inline int + unpack(Engine_t* &a, char *buffer) + { + Interval *dom; + + int change; + int nBytes=0; + + change = Serialize::unpack(dom, buffer); + buffer += change; + nBytes += change; + + bool *compressed; + + change = Serialize::unpack(compressed, buffer); + buffer += change; + nBytes += change; + + if (*compressed) + { + T *value; + + change = Serialize::unpack(value, buffer); + + Engine foo(*dom, *value); + + a = new Engine_t(foo, *dom); + } + else + { + Engine foo(*dom); + + EngineElemDeSerialize op(buffer); + + change = EngineBlockSerialize::apply(op, foo, *dom); + + a = new Engine_t(foo, *dom); + } + nBytes += change; + + return nBytes; + } + + static inline void + cleanup(Engine_t* a) + { + delete a; + } + // We support a special unpack to avoid an extra copy. static inline int From oldham at codesourcery.com Fri Jan 9 17:26:32 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 09 Jan 2004 09:26:32 -0800 Subject: [PATCH] Fix Cheetah operation In-Reply-To: References: Message-ID: <3FFEE448.8030801@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch reverts part of a previous patch to restore unpacking of > compressible brick views in Cheetah mode. > > Ok? Yes. > Richard. > > > 2004Jan09 Richard Guenther > > * src/Engine/RemoteEngine.h: revert removal of unpack(Engine*, > char*) and cleanup(Engine*) method in Cheetah::Serialize Engine > class. > > ===== RemoteEngine.h 1.5 vs edited ===== > --- 1.5/r2/src/Engine/RemoteEngine.h Wed Jan 7 09:54:05 2004 > +++ edited/RemoteEngine.h Fri Jan 9 14:42:11 2004 > @@ -1616,6 +1616,55 @@ > return nBytes; > } > > + static inline int > + unpack(Engine_t* &a, char *buffer) > + { > + Interval *dom; > + > + int change; > + int nBytes=0; > + > + change = Serialize::unpack(dom, buffer); > + buffer += change; > + nBytes += change; > + > + bool *compressed; > + > + change = Serialize::unpack(compressed, buffer); > + buffer += change; > + nBytes += change; > + > + if (*compressed) > + { > + T *value; > + > + change = Serialize::unpack(value, buffer); > + > + Engine foo(*dom, *value); > + > + a = new Engine_t(foo, *dom); > + } > + else > + { > + Engine foo(*dom); > + > + EngineElemDeSerialize op(buffer); > + > + change = EngineBlockSerialize::apply(op, foo, *dom); > + > + a = new Engine_t(foo, *dom); > + } > + nBytes += change; > + > + return nBytes; > + } > + > + static inline void > + cleanup(Engine_t* a) > + { > + delete a; > + } > + > // We support a special unpack to avoid an extra copy. > > static inline int -- Jeffrey D. 
Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Sun Jan 11 14:21:12 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Sun, 11 Jan 2004 15:21:12 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Optimize guard update copy In-Reply-To: References: <3FF45420.3090106@codesourcery.com> Message-ID: On Thu, 1 Jan 2004, Richard Guenther wrote: > On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: > > > Richard Guenther wrote: > > > Hi! > > > > > > This patch removes number four of the copies done for guard update. > > > Basically, additionally to the three copies I mentioned in the previous > > > mail, we're doing one extra during the RemoteView expressionApply of the > > > data-parallel assignment we're doing for the guard domains. Ugh. Fixed by > > > manually sending/receiving from/to the views. Doesn't work for Cheetah, > > > so conditionalized on POOMA_MPI. > > > > What breaks for Cheetah? > > I don't remember... I can try again next week. I tried again, and with Cheetah this is really a mess, because Cheetah cannot receive into a View, so we need to pass a Brick as IncomingView and a BrickView as View to SendReceive::receive() and this breaks all over the place in the Cheetah library... So, unfortunately, no - this doesn't work for Cheetah - at least not with major surgery inside the Cheetah library (which I would rather drop than fix). So, is the patch ok as it is (affecting only POOMA_MPI)? Thanks, Richard. From oldham at codesourcery.com Tue Jan 13 19:06:55 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 13 Jan 2004 11:06:55 -0800 Subject: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: References: Message-ID: <400441CF.1030007@codesourcery.com> Richard Guenther wrote: > Hi! > > The following patch is necessary to avoid deadlocks with the MPI > implementation and multi-patch setups where one context does not > participate in the reduction. > > Fixes failure of array_test_.. - I don't remember - with MPI. > > Basically the scenario is that the collective synchronous MPI_Gather is > called from ReduceOverContexts<> on the non-participating (and thus > not receiving) contexts while the SendIterates are still in the > schedulers queue. The calculation participating contexts will wait for > the ReceiveIterates and patch reductions to complete using the CSem > forever then. > > So the fix is to make the not participating contexts wait on the CSem, > too, by using a fake write iterate queued after the send iterates which > will trigger as soon as the send iterates complete. Instead of adding fake write iterate can we adjust the MPI_Gather so non-participating contexts do not participate? > Tested using MPI, Cheetah and serial some time ago. > > Ok? > > Richard. > > > 2004Jan08 Richard Guenther > > * src/Engine/RemoteEngine.h: use a waiting iterate to wait for > reduction completion in remote single and multi-patch reduction > evaluator. > Do begin/endGeneration at the toplevel evaluate. > src/Evaluator/Reduction.h: do begin/endGeneration at the toplevel > evaluate. 
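Restricting the gather to the participating contexts, as asked above, would mean giving them a communicator of their own first. A rough sketch (not POOMA code; names are illustrative) of what that involves, and why it is not free: MPI_Comm_split is itself a collective over the parent communicator and allocates a new one.

#include <mpi.h>

void gatherFromParticipants(bool participates, int localValue,
                            int *results /* significant on root only */)
{
  // Participants get color 0, everybody else opts out with MPI_UNDEFINED
  // and receives MPI_COMM_NULL.  Note that every rank of MPI_COMM_WORLD
  // still has to enter MPI_Comm_split.
  MPI_Comm subComm;
  int color = participates ? 0 : MPI_UNDEFINED;
  MPI_Comm_split(MPI_COMM_WORLD, color, /* key = */ 0, &subComm);

  if (participates) {
    // Collective only over the participants' communicator.
    MPI_Gather(&localValue, 1, MPI_INT, results, 1, MPI_INT,
               /* root = */ 0, subComm);
    MPI_Comm_free(&subComm);
  }
  // Non-participating ranks skip the gather entirely.
}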
> > --- src/Engine/RemoteEngine.h 2004-01-02 12:57:48.000000000 +0100 > +++ /home/richard/src/pooma/pooma-mpi3/r2/src/Engine/RemoteEngine.h 2004-01-08 23:00:40.000000000 +0100 > @@ -1954,6 +1962,29 @@ > } > }; > > + > +template > +struct WaitingIterate : public Pooma::Iterate_t { > + WaitingIterate(const Expr& e, Pooma::CountingSemaphore& csem) > + : Pooma::Iterate_t(Pooma::scheduler()), > + e_m(e), csem_m(csem) > + { > + DataObjectRequest writeReq(*this); > + engineFunctor(e_m, writeReq); > + } > + virtual void run() > + { > + csem_m.incr(); > + } > + virtual ~WaitingIterate() > + { > + DataObjectRequest writeRel; > + engineFunctor(e_m, writeRel); > + } > + Expr e_m; > + Pooma::CountingSemaphore& csem_m; > +}; > + > //----------------------------------------------------------------------------- > // Single-patch Reductions involving remote engines: > // > @@ -1998,12 +2029,11 @@ > Pooma::CountingSemaphore csem; > csem.height(1); > > - Pooma::scheduler().beginGeneration(); > - > if (Pooma::context() != computationContext) > { > expressionApply(e, RemoteSend(computationContext)); > - csem.incr(); > + Pooma::Iterate_t *it = new WaitingIterate(e, csem); > + Pooma::scheduler().handOff(it); > } > else > { > @@ -2013,8 +2043,7 @@ > forEach(e, view, TreeCombine()), csem); > } > > - Pooma::scheduler().endGeneration(); > - > + // Wait for RemoteSend or Reduction to complete. > csem.wait(); > > RemoteProxy globalRet(ret, computationContext); > @@ -2102,8 +2131,6 @@ > csem.height(n); > T *vals = new T[n]; > > - Pooma::scheduler().beginGeneration(); > - > i = inter.begin(); > k = 0; > for (j = 0; j < inter.size(); j++) > @@ -2129,13 +2156,19 @@ > else > { > expressionApply(e(*i), RemoteSend(computationalContext[j])); > + // One extra RemoteSend to wait for. Maybe we can combine these > + // iterates, but maybe not. Play safe for now. > + csem.raise_height(1); > + Pooma::Iterate_t *it = new WaitingIterate > + >::Type_t>(e(*i), csem); > + Pooma::scheduler().handOff(it); > } > } > > ++i; > } > > - Pooma::scheduler().endGeneration(); > + // Wait for RemoteSends and Reductions to complete. > csem.wait(); > > if (n > 0) > --- src/Evaluator/Reduction.h 2003-11-21 22:30:38.000000000 +0100 > +++ /home/richard/src/pooma/pooma-mpi3/r2/src/Evaluator/Reduction.h 2004-01-02 00:40:14.000000000 +0100 > @@ -128,10 +128,15 @@ > void evaluate(T &ret, const Op &op, const Expr &e) const > { > typedef typename EvaluatorTag1::Evaluator_t Evaluator_t; > + > + Pooma::scheduler().beginGeneration(); > + > PAssert(checkValidity(e, WrappedInt())); > forEach(e, PerformUpdateTag(), NullCombine()); > Reduction().evaluate(ret, op, e()); > > + Pooma::scheduler().endGeneration(); > + > POOMA_INCREMENT_STATISTIC(NumReductions) > } > }; > @@ -184,12 +189,8 @@ > Pooma::CountingSemaphore csem; > csem.height(1); > > - Pooma::scheduler().beginGeneration(); > - > evaluate(ret, op, e, csem); > > - Pooma::scheduler().endGeneration(); > - > csem.wait(); > } > }; > @@ -237,12 +238,10 @@ > > expressionApply(e, IntersectorTag(inter)); > > - const int n = std::distance(inter.begin(), inter.end()); > + const int n = inter.size(); > Pooma::CountingSemaphore csem; > csem.height(n); > T *vals = new T[n]; > - > - Pooma::scheduler().beginGeneration(); > > typename Inter_t::const_iterator i = inter.begin(); > int j = 0; > @@ -253,8 +252,6 @@ > ++i; ++j; > } > > - Pooma::scheduler().endGeneration(); > - > csem.wait(); > > ret = vals[0]; -- Jeffrey D. 
Oldham oldham at codesourcery.com From oldham at codesourcery.com Tue Jan 13 19:07:35 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Tue, 13 Jan 2004 11:07:35 -0800 Subject: [PATCH] Document OpenMP/MPI In-Reply-To: References: Message-ID: <400441F7.8000507@codesourcery.com> Richard Guenther wrote: > Hi! > > While the documentation beyond docs/ is in a rather bad shape, this adds a > paragraph to parallelism.html describing the new --mpi and --openmp modes. > > Ok? Yes. > Thanks, > > Richard. > > > ===== parallelism.html 1.1 vs edited ===== > --- 1.1/r2/docs/parallelism.html Mon May 13 17:47:21 2002 > +++ edited/parallelism.html Fri Jan 9 13:29:04 2004 > @@ -1,4 +1,4 @@ > -

POOMA 2.3 parallel model
> +POOMA post-2.4 parallel model
> >

CONTEXT: In discussing the parallelism models that > POOMA supports, the word `context' has a particular > @@ -40,6 +40,15 @@ > The 2.4 release should address these issues and permit > a parallel model with multithreaded contexts that communicate > with each other through messages. > + > +

After POOMA 2.4 two new models of parallelism are supported. Namely > +use of OpenMP thread level parallelization, if supported by the compiler, > +and the use of an available MPI library such as MPICH or a vendor provided > +implementation. Both models, MPI and OpenMP, may be combined simultaneously > +if the MPI implementation supports this kind of operation. This is especially > +useful for clusters of SMP workstations. Those new modes of operation can > +be specified by the --mpi and --openmp configure switches. > + > >

CHEETAH overview:

> -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Tue Jan 13 19:43:46 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Tue, 13 Jan 2004 20:43:46 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: <400441CF.1030007@codesourcery.com> References: <400441CF.1030007@codesourcery.com> Message-ID: On Tue, 13 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > The following patch is necessary to avoid deadlocks with the MPI > > implementation and multi-patch setups where one context does not > > participate in the reduction. > > > > Fixes failure of array_test_.. - I don't remember - with MPI. > > > > Basically the scenario is that the collective synchronous MPI_Gather is > > called from ReduceOverContexts<> on the non-participating (and thus > > not receiving) contexts while the SendIterates are still in the > > schedulers queue. The calculation participating contexts will wait for > > the ReceiveIterates and patch reductions to complete using the CSem > > forever then. > > > > So the fix is to make the not participating contexts wait on the CSem, > > too, by using a fake write iterate queued after the send iterates which > > will trigger as soon as the send iterates complete. > > Instead of adding fake write iterate can we adjust the MPI_Gather so > non-participating contexts do not participate? The problem is not easy to tackle in MPI_Gather, as collective communication primitives involve all contexts and this can be overcome only by creating a new MPI communicator, which is costly. Also I'm not sure that this will solve the problem at all. The problem is that contexts participating only via sending their data to a remote context (i.e. are participating, but not computing) don't have the counting semaphore to block on (its height is zero for them). So after queuing the send iterates they go straight to the final reduction which is not done via an extra iterate and block there, not firing off the send iterate in the first place. Ugh. Same of course for completely non participating contexts, and even this may be a problem because of old unrun iterates. So in first I thought of creating a DataObject to hold the reduction result, so we can do usual data-flow evaluation on it, and not ignore dependencies on it, as we do now. But this turned out to be more invasive and I didn't have time to complete this. So the fake writing iterate solves the problem (only partly, because, I could imagine for completely non-participating contexts the problem is still there) for me. But anyway, I'm not pushing this very hard now, but it's guaranteed to deadlock at reductions otherwise for MPI for me (so there's a race even in the case of all-participating contexts, or the intersector is doing something strange). Richard. From rguenth at tat.physik.uni-tuebingen.de Wed Jan 14 11:11:56 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 14 Jan 2004 12:11:56 +0100 (CET) Subject: [PATCH] Fix PrintField wrt expressions Message-ID: Hi! The following patch allows to print Fields with expression engines. PrintField uses applyRelations() while it should use a tree-walk with PerformUpdateTag. Ok? Richard. 2004Jan14 Richard Guenther * src/Field/PrintField.h: use forEach(,PerformUpdateTag(),) rather than applyRelations(). 
===== PrintField.h 1.3 vs edited ===== --- 1.3/r2/src/Field/PrintField.h Wed Dec 3 12:30:41 2003 +++ edited/PrintField.h Wed Jan 14 12:01:09 2004 @@ -231,7 +231,7 @@ template void print(S &s, const A &a) const { - a.applyRelations(); + forEach(a, PerformUpdateTag(), NullCombine()); Pooma::blockAndEvaluate(); for (int m = 0; m < a.numMaterials(); m++) From rguenth at tat.physik.uni-tuebingen.de Wed Jan 14 19:00:01 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 14 Jan 2004 20:00:01 +0100 (CET) Subject: [PATCH] Return references in LayoutBase Message-ID: Hi! This patch makes internalGuards(), externalGuards() and blocks() methods of LayoutBase return const references rather than copies. Tested with no regressions on MPI intel linux. Ok? Richard. 2004Jan14 Richard Guenther * src/Layout/LayoutBase.h: return const references in guard and block accessors. ===== LayoutBase.h 1.10 vs edited ===== --- 1.10/r2/src/Layout/LayoutBase.h Wed Jan 7 12:17:55 2004 +++ edited/LayoutBase.h Tue Jan 13 23:37:13 2004 @@ -204,12 +204,12 @@ return all_m[i]->allocated(); } - inline GuardLayers_t internalGuards() const + inline const GuardLayers_t& internalGuards() const { return internalGuards_m; } - inline GuardLayers_t externalGuards() const + inline const GuardLayers_t& externalGuards() const { return externalGuards_m; } @@ -243,7 +243,7 @@ /// number of blocks along each axis. - inline Loc blocks() const { return blocks_m; } + inline const Loc& blocks() const { return blocks_m; } ///@name Guard-cell related functions. /// Iterators into the fill list. These are MultiPatch's interface to From rguenth at tat.physik.uni-tuebingen.de Wed Jan 14 20:50:00 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 14 Jan 2004 21:50:00 +0100 (CET) Subject: [PATCH] Canonicalize makeOwnCopy of multipatch engine Message-ID: Hi! This makes the PAssert(valid) to an if as is done in all other Engines. Fixes a problem noted by a guy that likes to mail me privately in german rather than to the list ;) Ok? Richard. 2004Jan14 Richard Guenther * src/Engine/MultiPatchEngine.cpp: don't assert validity in makeOwnCopy(), but rather ignore the request in the invalid case. Index: src/Engine/MultiPatchEngine.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Engine/MultiPatchEngine.cpp,v retrieving revision 1.53 diff -u -u -r1.53 MultiPatchEngine.cpp --- src/Engine/MultiPatchEngine.cpp 6 May 2003 20:50:39 -0000 1.53 +++ src/Engine/MultiPatchEngine.cpp 14 Jan 2004 20:44:24 -0000 @@ -244,8 +244,7 @@ Engine >:: makeOwnCopy() { - PAssert(data_m.isValid()); - if (data_m.isShared()) { + if (data_m.isValid() && data_m.isShared()) { data_m.makeOwnCopy(); pDirty_m = new bool(*pDirty_m); } From rguenth at tat.physik.uni-tuebingen.de Wed Jan 14 20:56:51 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Wed, 14 Jan 2004 21:56:51 +0100 (CET) Subject: [PATCH] Speed up guard update. Message-ID: Hi! This is a refined (aka shorter) patch which unifies the tracking of up-to-date faces and the special optimized copy for MPI. Tested on serial ia32 linux with gcc3.4 with no regression. Ok? Richard. 2004Jan14 Richard Guenther * src/Engine/Intersector.h: track used guard faces. src/Engine/MultiPatchEngine.h: track up-to-dateness per face using a bitmask. src/Engine/Stencil.h: track used guard faces. src/Field/DiffOps/FieldStencil.h: track used guard faces. src/Layout/GridLayout.cpp: record face of guard update. 
src/Layout/LayoutBase.h: add face_m member to guard update struct. src/Layout/UniformGridLayout.cpp: record face of guard update. src/Engine/MultiPatchEngine.cpp: update only not up-to-date and needed faces during fillGuards(). Do manual Send/Receive of the inner guards domain for MPI. --- cvs/r2/src/Engine/Intersector.h 2004-01-14 20:08:06.000000000 +0100 +++ pooma-mpi3/r2/src/Engine/Intersector.h 2004-01-14 20:13:32.000000000 +0100 @@ -129,7 +129,8 @@ } template - bool intersect(const Engine &engine, const GuardLayers &guard) + bool intersect(const Engine &engine, const GuardLayers &guard, + GuardLayers &usedGuards) { CTAssert(Engine::dimensions == Dim); @@ -145,9 +146,7 @@ // If we've seen this ID before, we're done. if (ids_m[i] == layout.ID()) - { return false; - } // If we've seen the base ID before and the base domain is the same // we're done. @@ -157,10 +156,27 @@ { shared(layout.ID(),ids_m[i]); - // In this case we are using the guard cells unless this domain - // is exactly the same as one we've seen before. + // was: return (!sameBaseDomain(i,layout.baseDomain())); - return (!sameBaseDomain(i,layout.baseDomain())); + // We should be able to find out the actual shape of the + // used internal guards here, rather than just returning bool. + // Something like: + + // But what do, if Dim2 > baseDims_m[i]!? + if (baseDims_m[i] < Dim2) + return true; + + bool used = false; + for (int j = 0; j < Dim2; j++) + { + usedGuards.lower(j) = std::max(0, baseDomains_m[i][j].first() - layout.baseDomain()[j].first()); + if (usedGuards.lower(j) != 0) + used = true; + usedGuards.upper(j) = std::max(0, layout.baseDomain()[j].last() - baseDomains_m[i][j].last()); + if (usedGuards.upper(j) != 0) + used = true; + } + return used; } } @@ -437,9 +453,9 @@ template inline - bool intersect(const Engine &l, const GuardLayers &guard) + bool intersect(const Engine &l, const GuardLayers &guard, GuardLayers &usedGuards) { - return (data()->intersect(l,guard)); + return (data()->intersect(l,guard,usedGuards)); } private: --- cvs/r2/src/Engine/MultiPatchEngine.h 2004-01-14 20:11:36.000000000 +0100 +++ pooma-mpi3/r2/src/Engine/MultiPatchEngine.h 2004-01-14 20:13:32.000000000 +0100 @@ -628,13 +628,18 @@ //--------------------------------------------------------------------------- /// Fill the internal guard cells. - inline void fillGuards() const + inline void fillGuards(const GuardLayers& g) const { - fillGuardsHandler(WrappedInt()); + fillGuardsHandler(g, WrappedInt()); + } + + inline void fillGuards() const + { + fillGuards(layout().internalGuards()); } - inline void fillGuardsHandler(const WrappedInt&) const { }; - void fillGuardsHandler(const WrappedInt&) const ; + inline void fillGuardsHandler(const GuardLayers&, const WrappedInt&) const { }; + void fillGuardsHandler(const GuardLayers&, const WrappedInt&) const ; //--------------------------------------------------------------------------- /// Set the internal guard cells to a particular value. @@ -650,14 +655,31 @@ /// Set and get the dirty flag (fillGuards is a no-op unless the /// dirty flag is true). 
+ inline int dirty() const { return *pDirty_m; } + inline void setDirty() const { - *pDirty_m = true; + *pDirty_m = (1<<(Dim*2))-1; + } + + inline void clearDirty(int face = -1) const + { + if (face == -1) + *pDirty_m = 0; + else { + PAssert(face >= 0 && face <= Dim*2-1); + *pDirty_m &= ~(1<= 0 && face <= Dim*2-1); + return *pDirty_m & (1<& g) const + { + baseEngine_m.fillGuards(g); + } + //--------------------------------------------------------------------------- /// Set the internal guard cells to a particular value (default zero) @@ -1217,10 +1244,15 @@ { baseEngine_m.setDirty(); } + + inline void clearDirty(int face=-1) const + { + baseEngine_m.clearDirty(face); + } - inline bool isDirty() const + inline bool isDirty(int face=-1) const { - return baseEngine_m.isDirty(); + return baseEngine_m.isDirty(face); } //--------------------------------------------------------------------------- @@ -1694,12 +1726,13 @@ apply(const Engine > &engine, const ExpressionApply > &tag) { + GuardLayers usedGuards; bool useGuards = tag.tag().intersector_m.intersect(engine, - engine.layout().internalGuards()); + engine.layout().internalGuards(), usedGuards); if (useGuards) - engine.fillGuards(); + engine.fillGuards(usedGuards); return 0; } @@ -1725,13 +1758,14 @@ const ExpressionApply > &tag, const WrappedInt &) { + GuardLayers usedGuards; bool useGuards = tag.tag().intersector_m. intersect(engine, - engine.layout().baseLayout().internalGuards()); + engine.layout().baseLayout().internalGuards(), usedGuards); if (useGuards) - engine.fillGuards(); + engine.fillGuards(usedGuards); return 0; } @@ -1741,7 +1775,7 @@ const ExpressionApply > &tag, const WrappedInt &) { - tag.tag().intersector_m.intersect(engine, GuardLayers()); + tag.tag().intersector_m.intersect(engine); return 0; } }; --- cvs/r2/src/Engine/Stencil.h 2004-01-14 20:08:07.000000000 +0100 +++ pooma-mpi3/r2/src/Engine/Stencil.h 2004-01-14 20:13:32.000000000 +0100 @@ -752,11 +752,14 @@ StencilIntersector(const This_t &model) : domain_m(model.domain_m), + stencilExtent_m(model.stencilExtent_m), intersector_m(model.intersector_m) { } - StencilIntersector(const Interval &domain, const Intersect &intersect) + StencilIntersector(const Interval &domain, const Intersect &intersect, + const GuardLayers &stencilExtent) : domain_m(domain), + stencilExtent_m(stencilExtent), intersector_m(intersect) { } @@ -766,6 +769,7 @@ { intersector_m = model.intersector_m; domain_m = model.domain_m; + stencilExtent_m = model.stencilExtent_m; } return *this; } @@ -807,14 +811,19 @@ template inline - bool intersect(const Engine &engine, const GuardLayers &) + bool intersect(const Engine &engine, const GuardLayers &g, + GuardLayers &usedGuards) { intersect(engine); + // FIXME: accumulate used guards from intersect above and + // stencil extent? I.e. allow Stencil<>(a(i-1)+a(i+1))? 
+ usedGuards = stencilExtent_m; return true; } private: Interval domain_m; + GuardLayers stencilExtent_m; Intersect intersector_m; }; @@ -833,8 +842,14 @@ const ExpressionApply > &tag) { typedef StencilIntersector NewIntersector_t; + GuardLayers stencilExtent; + for (int i=0; i(newIntersector)); --- cvs/r2/src/Field/DiffOps/FieldStencil.h 2004-01-14 20:08:09.000000000 +0100 +++ pooma-mpi3/r2/src/Field/DiffOps/FieldStencil.h 2004-01-14 20:13:32.000000000 +0100 @@ -614,11 +617,13 @@ // Constructors FieldStencilIntersector(const This_t &model) - : domain_m(model.domain_m), intersector_m(model.intersector_m) + : domain_m(model.domain_m), stencilExtent_m(model.stencilExtent_m), + intersector_m(model.intersector_m) { } - FieldStencilIntersector(const Domain_t &dom, const Intersect &intersect) - : domain_m(dom), intersector_m(intersect) + FieldStencilIntersector(const Domain_t &dom, const Intersect &intersect, + const GuardLayers &stencilExtent) + : domain_m(dom), stencilExtent_m(stencilExtent), intersector_m(intersect) { } This_t &operator=(const This_t &model) @@ -626,6 +631,7 @@ if (this != &model) { domain_m = model.domain_m; + stencilExtent_m = model.stencilExtent_m; intersector_m = model.intersector_m; } return *this; @@ -662,9 +668,13 @@ } template - inline bool intersect(const Engine &engine, const GuardLayers &) + inline bool intersect(const Engine &engine, const GuardLayers &, + GuardLayers &usedGuards) { intersect(engine); + // FIXME: accumulate used guards from intersect above and + // stencil extent? I.e. allow Stencil<>(a(i-1)+a(i+1))? + usedGuards = stencilExtent_m; return true; } @@ -672,6 +682,7 @@ Interval domain_m; + GuardLayers stencilExtent_m; Intersect intersector_m; }; @@ -699,8 +710,14 @@ // cells results in an error in the multipatch inode view.) typedef FieldStencilIntersector NewIntersector_t; + GuardLayers stencilExtent; + for (int i=0; i(newIntersector)); --- cvs/r2/src/Layout/GridLayout.cpp 2004-01-14 20:08:10.000000000 +0100 +++ pooma-mpi3/r2/src/Layout/GridLayout.cpp 2004-01-14 20:13:32.000000000 +0100 @@ -429,7 +436,7 @@ // Now, push IDs and source into cache... - this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID)); + this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID, d*2)); } } } @@ -481,7 +488,7 @@ // Now, push IDs and source into cache... - this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID)); + this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID, d*2+1)); } } } --- cvs/r2/src/Layout/LayoutBase.h 2004-01-14 20:08:12.000000000 +0100 +++ pooma-mpi3/r2/src/Layout/LayoutBase.h 2004-01-14 20:13:32.000000000 +0100 @@ -119,8 +121,8 @@ struct GCFillInfo { - GCFillInfo(const Domain_t &dom, int ownedID, int guardID) - : domain_m(dom), ownedID_m(ownedID), guardID_m(guardID) { } + GCFillInfo(const Domain_t &dom, int ownedID, int guardID, int face=-1) + : domain_m(dom), ownedID_m(ownedID), guardID_m(guardID), face_m(face) { } // Get a CW warning about this not having a default constructor // when we instantiate the vector below. 
This never @@ -131,6 +133,7 @@ Domain_t domain_m; // guard layer domain int ownedID_m; // node ID for which domain_m is owned int guardID_m; // node ID for which domain_m is in the guards + int face_m; // destination face of the guard layer (or -1, if unknown) Domain_t & domain() { return domain_m;} int & ownedID() { return ownedID_m;} --- cvs/r2/src/Layout/UniformGridLayout.cpp 2004-01-14 20:08:13.000000000 +0100 +++ pooma-mpi3/r2/src/Layout/UniformGridLayout.cpp 2004-01-14 20:13:32.000000000 +0100 @@ -279,7 +279,7 @@ //----------------------------------------------------------------------------- // // template -// void UniformGridLayout::calcGCFillList() +// void UniformGridLayoutData::calcGCFillList() // // Calculates the cached information needed by MultiPatch Engine to // fill the guard cells. @@ -370,7 +370,7 @@ this->all_m[sourceID]->context() == Pooma::context() || this->all_m[destID]->context() == Pooma::context() ) - this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID)); + this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID,d*2)); } } @@ -417,7 +417,7 @@ this->all_m[sourceID]->context() == Pooma::context() || this->all_m[destID]->context() == Pooma::context() ) - this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID)); + this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID,d*2+1)); } } } --- cvs/r2/src/Engine/MultiPatchEngine.cpp 2004-01-14 20:11:34.000000000 +0100 +++ pooma-mpi3/r2/src/Engine/MultiPatchEngine.cpp 2004-01-14 20:23:23.000000000 +0100 @@ -34,6 +34,7 @@ #include "Engine/CompressedFraction.h" #include "Array/Array.h" #include "Tulip/ReduceOverContexts.h" +#include "Tulip/SendReceive.h" #include "Threads/PoomaCSem.h" #include "Domain/IteratorPairDomain.h" @@ -77,16 +78,18 @@ Engine(const Layout_t &layout) : layout_m(layout), data_m(layout.sizeGlobal()), - pDirty_m(new bool(true)) + pDirty_m(new int) { typedef typename Layout_t::Value_t Node_t; + setDirty(); + // check for correct match of PatchTag and the mapper used to make the // layout. // THIS IS A HACK! we test on the context of the first patch, and if it // is -1, we have a Layout made with the LocalMapper. -#if POOMA_CHEETAH +#if POOMA_MESSAGING if( layout_m.nodeListGlobal().size() > 0) { @@ -247,7 +250,7 @@ PAssert(data_m.isValid()); if (data_m.isShared()) { data_m.makeOwnCopy(); - pDirty_m = new bool(*pDirty_m); + pDirty_m = new int(*pDirty_m); } return *this; @@ -261,45 +264,88 @@ // //----------------------------------------------------------------------------- +/// Guard layer assign between non-remote engines, just use the +/// ET mechanisms + +template +static inline +void simpleAssign(const Array& lhs, + const Array& rhs, + const Interval& domain) +{ + lhs(domain) = rhs(domain); +} + +/// Guard layer assign between remote engines, use Send/Receive directly +/// to avoid one extra copy of the data. 
+ +template +static inline +void simpleAssign(const Array >& lhs, + const Array >& rhs, + const Interval& domain) +{ + if (lhs.engine().owningContext() == rhs.engine().owningContext()) + lhs(domain) = rhs(domain); + else { + typedef typename NewEngine, Interval >::Type_t ViewEngine_t; + if (lhs.engine().engineIsLocal()) + Receive::receive(ViewEngine_t(lhs.engine().localEngine(), domain), + rhs.engine().owningContext()); + else if (rhs.engine().engineIsLocal()) + SendReceive::send(ViewEngine_t(rhs.engine().localEngine(), domain), + lhs.engine().owningContext()); + } +} + template void Engine >:: -fillGuardsHandler(const WrappedInt &) const +fillGuardsHandler(const GuardLayers& g, const WrappedInt &) const { if (!isDirty()) return; - -#if POOMA_PURIFY - - // This is here to remove spurious UMRs that result when un-initialized - // guards are copied in the following loop. All of the unitialized data - // is ultimately overwritten with good data, so I don't see why purify - // calls these UMRs in stead of unitialized memory copies, but it does. - // I don't do this in general since it would be slow and since T(0) is - // not generally valid. This does mean that fillGuards() will fail - // with purify for types that do not know what to do with T(0). - - setGuards(T(0)); - -#endif + int updated = 0; typename Layout_t::FillIterator_t p = layout_m.beginFillList(); - + while (p != layout_m.endFillList()) { int src = p->ownedID_m; int dest = p->guardID_m; - // Create patch arrays that see the entire patch: + // Skip face, if not dirty. + + if (isDirty(p->face_m)) { + + // Check, if the p->domain_m is a guard which matches the + // needed guard g. + + int d = p->face_m/2; + int guardSizeNeeded = p->face_m & 1 ? g.upper(d) : g.lower(d); + if (!(p->face_m != -1 + && guardSizeNeeded == 0)) { + + // Create patch arrays that see the entire patch: - Array lhs(data()[dest]), rhs(data()[src]); + Array lhs(data()[dest]), rhs(data()[src]); - // Now do assignment from the subdomains. + // Now do assignment from the subdomains. +#if POOMA_MPI + simpleAssign(lhs, rhs, p->domain_m); +#else + lhs(p->domain_m) = rhs(p->domain_m); +#endif + + // Mark up-to-date. + updated |= 1<face_m; + + } + + } - lhs(p->domain_m) = rhs(p->domain_m); - ++p; } - - *pDirty_m = false; + + *pDirty_m &= ~updated; } @@ -331,7 +377,7 @@ ++p; } - *pDirty_m = true; + setDirty(); } @@ -366,7 +412,7 @@ ++p; } - *pDirty_m = true; + setDirty(); } From oldham at codesourcery.com Thu Jan 15 21:16:11 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 13:16:11 -0800 Subject: [PATCH] Return references in LayoutBase In-Reply-To: References: Message-ID: <4007031B.2040205@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch makes internalGuards(), externalGuards() and blocks() methods > of LayoutBase return const references rather than copies. > > Tested with no regressions on MPI intel linux. > > Ok? > Yes. > Richard. > > > 2004Jan14 Richard Guenther > > * src/Layout/LayoutBase.h: return const references in > guard and block accessors. 
> > ===== LayoutBase.h 1.10 vs edited ===== > --- 1.10/r2/src/Layout/LayoutBase.h Wed Jan 7 12:17:55 2004 > +++ edited/LayoutBase.h Tue Jan 13 23:37:13 2004 > @@ -204,12 +204,12 @@ > return all_m[i]->allocated(); > } > > - inline GuardLayers_t internalGuards() const > + inline const GuardLayers_t& internalGuards() const > { > return internalGuards_m; > } > > - inline GuardLayers_t externalGuards() const > + inline const GuardLayers_t& externalGuards() const > { > return externalGuards_m; > } > @@ -243,7 +243,7 @@ > > /// number of blocks along each axis. > > - inline Loc blocks() const { return blocks_m; } > + inline const Loc& blocks() const { return blocks_m; } > > ///@name Guard-cell related functions. > /// Iterators into the fill list. These are MultiPatch's interface to -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Thu Jan 15 21:17:51 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 13:17:51 -0800 Subject: [PATCH] Fix PrintField wrt expressions In-Reply-To: References: Message-ID: <4007037F.3080103@codesourcery.com> Richard Guenther wrote: > Hi! > > The following patch allows to print Fields with expression engines. > PrintField uses applyRelations() while it should use a tree-walk with > PerformUpdateTag. So, with this change, the field will be guaranteed to be updated by any relations that can change the field? > Ok? > > Richard. > > > 2004Jan14 Richard Guenther > > * src/Field/PrintField.h: use forEach(,PerformUpdateTag(),) rather > than applyRelations(). > > ===== PrintField.h 1.3 vs edited ===== > --- 1.3/r2/src/Field/PrintField.h Wed Dec 3 12:30:41 2003 > +++ edited/PrintField.h Wed Jan 14 12:01:09 2004 > @@ -231,7 +231,7 @@ > template > void print(S &s, const A &a) const > { > - a.applyRelations(); > + forEach(a, PerformUpdateTag(), NullCombine()); > Pooma::blockAndEvaluate(); > > for (int m = 0; m < a.numMaterials(); m++) -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 02:55:08 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 18:55:08 -0800 Subject: [PATCH] Canonicalize makeOwnCopy of multipatch engine In-Reply-To: References: Message-ID: <4007528C.3030701@codesourcery.com> Richard Guenther wrote: > Hi! > > This makes the PAssert(valid) to an if as is done in all other Engines. > Fixes a problem noted by a guy that likes to mail me privately in german > rather than to the list ;) > > Ok? ja. ;) > Richard. > > > 2004Jan14 Richard Guenther > > * src/Engine/MultiPatchEngine.cpp: don't assert validity > in makeOwnCopy(), but rather ignore the request in the > invalid case. > > Index: src/Engine/MultiPatchEngine.cpp > =================================================================== > RCS file: /home/pooma/Repository/r2/src/Engine/MultiPatchEngine.cpp,v > retrieving revision 1.53 > diff -u -u -r1.53 MultiPatchEngine.cpp > --- src/Engine/MultiPatchEngine.cpp 6 May 2003 20:50:39 -0000 1.53 > +++ src/Engine/MultiPatchEngine.cpp 14 Jan 2004 20:44:24 -0000 > @@ -244,8 +244,7 @@ > Engine >:: > makeOwnCopy() > { > - PAssert(data_m.isValid()); > - if (data_m.isShared()) { > + if (data_m.isValid() && data_m.isShared()) { > data_m.makeOwnCopy(); > pDirty_m = new bool(*pDirty_m); > } -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 02:58:21 2004 From: oldham at codesourcery.com (Jeffrey D. 
Oldham) Date: Thu, 15 Jan 2004 18:58:21 -0800 Subject: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: References: <400441CF.1030007@codesourcery.com> Message-ID: <4007534D.90101@codesourcery.com> Richard Guenther wrote: > On Tue, 13 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>The following patch is necessary to avoid deadlocks with the MPI >>>implementation and multi-patch setups where one context does not >>>participate in the reduction. >>> >>>Fixes failure of array_test_.. - I don't remember - with MPI. >>> >>>Basically the scenario is that the collective synchronous MPI_Gather is >>>called from ReduceOverContexts<> on the non-participating (and thus >>>not receiving) contexts while the SendIterates are still in the >>>schedulers queue. The calculation participating contexts will wait for >>>the ReceiveIterates and patch reductions to complete using the CSem >>>forever then. >>> >>>So the fix is to make the not participating contexts wait on the CSem, >>>too, by using a fake write iterate queued after the send iterates which >>>will trigger as soon as the send iterates complete. >> >>Instead of adding fake write iterate can we adjust the MPI_Gather so >>non-participating contexts do not participate? > > > The problem is not easy to tackle in MPI_Gather, as collective > communication primitives involve all contexts and this can be overcome > only by creating a new MPI communicator, which is costly. Also I'm not > sure that this will solve the problem at all. > > The problem is that contexts participating only via sending their data to > a remote context (i.e. are participating, but not computing) don't have > the counting semaphore to block on (its height is zero for them). So > after queuing the send iterates they go straight to the final reduction > which is not done via an extra iterate and block there, not firing off the > send iterate in the first place. Ugh. Same of course for completely non > participating contexts, and even this may be a problem because of old > unrun iterates. > > So in first I thought of creating a DataObject to hold the reduction > result, so we can do usual data-flow evaluation on it, and not ignore > dependencies on it, as we do now. But this turned out to be more invasive > and I didn't have time to complete this. > > So the fake writing iterate solves the problem (only partly, because, I > could imagine for completely non-participating contexts the problem is > still there) for me. > > But anyway, I'm not pushing this very hard now, but it's guaranteed to > deadlock at reductions otherwise for MPI for me (so there's a race even > in the case of all-participating contexts, or the intersector is doing > something strange). > > Richard. I appreciate your finding the difficulty and your taking the time to explain the problem. I am reluctant to add code that is known to be broken for some situations. Is there a way to mark the code so 1) the known brokenness is marked and 2) the program asks sensibly when the brokenness is experienced? -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 03:04:14 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 19:04:14 -0800 Subject: [PATCH] Speed up guard update. In-Reply-To: References: Message-ID: <400754AE.1070102@codesourcery.com> Richard Guenther wrote: > Hi! 
> > This is a refined (aka shorter) patch which unifies the tracking of > up-to-date faces and the special optimized copy for MPI. > > Tested on serial ia32 linux with gcc3.4 with no regression. > > Ok? Yes, assuming the user interface did not change. It looks like GCFillInfo's interface changed but existing code will still run because a parameter with a default argument was added. > Richard. > > > 2004Jan14 Richard Guenther > > * src/Engine/Intersector.h: track used guard faces. > src/Engine/MultiPatchEngine.h: track up-to-dateness per > face using a bitmask. > src/Engine/Stencil.h: track used guard faces. > src/Field/DiffOps/FieldStencil.h: track used guard faces. > src/Layout/GridLayout.cpp: record face of guard update. > src/Layout/LayoutBase.h: add face_m member to guard update > struct. > src/Layout/UniformGridLayout.cpp: record face of guard update. > src/Engine/MultiPatchEngine.cpp: update only not up-to-date > and needed faces during fillGuards(). Do manual Send/Receive > of the inner guards domain for MPI. > > --- cvs/r2/src/Engine/Intersector.h 2004-01-14 20:08:06.000000000 +0100 > +++ pooma-mpi3/r2/src/Engine/Intersector.h 2004-01-14 20:13:32.000000000 +0100 > @@ -129,7 +129,8 @@ > } > > template > - bool intersect(const Engine &engine, const GuardLayers &guard) > + bool intersect(const Engine &engine, const GuardLayers &guard, > + GuardLayers &usedGuards) > { > CTAssert(Engine::dimensions == Dim); > > @@ -145,9 +146,7 @@ > // If we've seen this ID before, we're done. > > if (ids_m[i] == layout.ID()) > - { > return false; > - } > > // If we've seen the base ID before and the base domain is the same > // we're done. > @@ -157,10 +156,27 @@ > { > shared(layout.ID(),ids_m[i]); > > - // In this case we are using the guard cells unless this domain > - // is exactly the same as one we've seen before. > + // was: return (!sameBaseDomain(i,layout.baseDomain())); > > - return (!sameBaseDomain(i,layout.baseDomain())); > + // We should be able to find out the actual shape of the > + // used internal guards here, rather than just returning bool. > + // Something like: > + > + // But what do, if Dim2 > baseDims_m[i]!? > + if (baseDims_m[i] < Dim2) > + return true; > + > + bool used = false; > + for (int j = 0; j < Dim2; j++) > + { > + usedGuards.lower(j) = std::max(0, baseDomains_m[i][j].first() - layout.baseDomain()[j].first()); > + if (usedGuards.lower(j) != 0) > + used = true; > + usedGuards.upper(j) = std::max(0, layout.baseDomain()[j].last() - baseDomains_m[i][j].last()); > + if (usedGuards.upper(j) != 0) > + used = true; > + } > + return used; > } > } > > @@ -437,9 +453,9 @@ > > template > inline > - bool intersect(const Engine &l, const GuardLayers &guard) > + bool intersect(const Engine &l, const GuardLayers &guard, GuardLayers &usedGuards) > { > - return (data()->intersect(l,guard)); > + return (data()->intersect(l,guard,usedGuards)); > } > > private: > --- cvs/r2/src/Engine/MultiPatchEngine.h 2004-01-14 20:11:36.000000000 +0100 > +++ pooma-mpi3/r2/src/Engine/MultiPatchEngine.h 2004-01-14 20:13:32.000000000 +0100 > @@ -628,13 +628,18 @@ > //--------------------------------------------------------------------------- > /// Fill the internal guard cells. 
> > - inline void fillGuards() const > + inline void fillGuards(const GuardLayers& g) const > { > - fillGuardsHandler(WrappedInt()); > + fillGuardsHandler(g, WrappedInt()); > + } > + > + inline void fillGuards() const > + { > + fillGuards(layout().internalGuards()); > } > > - inline void fillGuardsHandler(const WrappedInt&) const { }; > - void fillGuardsHandler(const WrappedInt&) const ; > + inline void fillGuardsHandler(const GuardLayers&, const WrappedInt&) const { }; > + void fillGuardsHandler(const GuardLayers&, const WrappedInt&) const ; > > //--------------------------------------------------------------------------- > /// Set the internal guard cells to a particular value. > @@ -650,14 +655,31 @@ > /// Set and get the dirty flag (fillGuards is a no-op unless the > /// dirty flag is true). > > + inline int dirty() const { return *pDirty_m; } > + > inline void setDirty() const > { > - *pDirty_m = true; > + *pDirty_m = (1<<(Dim*2))-1; > + } > + > + inline void clearDirty(int face = -1) const > + { > + if (face == -1) > + *pDirty_m = 0; > + else { > + PAssert(face >= 0 && face <= Dim*2-1); > + *pDirty_m &= ~(1< + } > } > > - inline bool isDirty() const > + inline bool isDirty(int face = -1) const > { > - return *pDirty_m; > + if (face == -1) > + return *pDirty_m != 0; > + else { > + PAssert(face >= 0 && face <= Dim*2-1); > + return *pDirty_m & (1< + } > } > > //============================================================ > @@ -874,7 +896,7 @@ > /// must share the same flag. We use the reference count in > /// data_m to decide whether to clean this up. > > - bool *pDirty_m; > + int *pDirty_m; > }; > > > @@ -1193,6 +1215,11 @@ > baseEngine_m.fillGuards(); > } > > + inline void fillGuards(const GuardLayers& g) const > + { > + baseEngine_m.fillGuards(g); > + } > + > //--------------------------------------------------------------------------- > /// Set the internal guard cells to a particular value (default zero) > > @@ -1217,10 +1244,15 @@ > { > baseEngine_m.setDirty(); > } > + > + inline void clearDirty(int face=-1) const > + { > + baseEngine_m.clearDirty(face); > + } > > - inline bool isDirty() const > + inline bool isDirty(int face=-1) const > { > - return baseEngine_m.isDirty(); > + return baseEngine_m.isDirty(face); > } > > //--------------------------------------------------------------------------- > @@ -1694,12 +1726,13 @@ > apply(const Engine > &engine, > const ExpressionApply > &tag) > { > + GuardLayers usedGuards; > bool useGuards = > tag.tag().intersector_m.intersect(engine, > - engine.layout().internalGuards()); > + engine.layout().internalGuards(), usedGuards); > > if (useGuards) > - engine.fillGuards(); > + engine.fillGuards(usedGuards); > > return 0; > } > @@ -1725,13 +1758,14 @@ > const ExpressionApply > &tag, > const WrappedInt &) > { > + GuardLayers usedGuards; > bool useGuards = > tag.tag().intersector_m. 
> intersect(engine, > - engine.layout().baseLayout().internalGuards()); > + engine.layout().baseLayout().internalGuards(), usedGuards); > > if (useGuards) > - engine.fillGuards(); > + engine.fillGuards(usedGuards); > > return 0; > } > @@ -1741,7 +1775,7 @@ > const ExpressionApply > &tag, > const WrappedInt &) > { > - tag.tag().intersector_m.intersect(engine, GuardLayers()); > + tag.tag().intersector_m.intersect(engine); > return 0; > } > }; > --- cvs/r2/src/Engine/Stencil.h 2004-01-14 20:08:07.000000000 +0100 > +++ pooma-mpi3/r2/src/Engine/Stencil.h 2004-01-14 20:13:32.000000000 +0100 > @@ -752,11 +752,14 @@ > > StencilIntersector(const This_t &model) > : domain_m(model.domain_m), > + stencilExtent_m(model.stencilExtent_m), > intersector_m(model.intersector_m) > { } > > - StencilIntersector(const Interval &domain, const Intersect &intersect) > + StencilIntersector(const Interval &domain, const Intersect &intersect, > + const GuardLayers &stencilExtent) > : domain_m(domain), > + stencilExtent_m(stencilExtent), > intersector_m(intersect) > { } > > @@ -766,6 +769,7 @@ > { > intersector_m = model.intersector_m; > domain_m = model.domain_m; > + stencilExtent_m = model.stencilExtent_m; > } > return *this; > } > @@ -807,14 +811,19 @@ > > template > inline > - bool intersect(const Engine &engine, const GuardLayers &) > + bool intersect(const Engine &engine, const GuardLayers &g, > + GuardLayers &usedGuards) > { > intersect(engine); > + // FIXME: accumulate used guards from intersect above and > + // stencil extent? I.e. allow Stencil<>(a(i-1)+a(i+1))? > + usedGuards = stencilExtent_m; > return true; > } > > private: > Interval domain_m; > + GuardLayers stencilExtent_m; > Intersect intersector_m; > }; > > @@ -833,8 +842,14 @@ > const ExpressionApply > &tag) > { > typedef StencilIntersector NewIntersector_t; > + GuardLayers stencilExtent; > + for (int i=0; i + stencilExtent.lower(i) = engine.function().lowerExtent(i); > + stencilExtent.upper(i) = engine.function().upperExtent(i); > + } > NewIntersector_t newIntersector(engine.intersectDomain(), > - tag.tag().intersector_m); > + tag.tag().intersector_m, > + stencilExtent); > > expressionApply(engine.expression(), > IntersectorTag(newIntersector)); > --- cvs/r2/src/Field/DiffOps/FieldStencil.h 2004-01-14 20:08:09.000000000 +0100 > +++ pooma-mpi3/r2/src/Field/DiffOps/FieldStencil.h 2004-01-14 20:13:32.000000000 +0100 > @@ -614,11 +617,13 @@ > // Constructors > > FieldStencilIntersector(const This_t &model) > - : domain_m(model.domain_m), intersector_m(model.intersector_m) > + : domain_m(model.domain_m), stencilExtent_m(model.stencilExtent_m), > + intersector_m(model.intersector_m) > { } > > - FieldStencilIntersector(const Domain_t &dom, const Intersect &intersect) > - : domain_m(dom), intersector_m(intersect) > + FieldStencilIntersector(const Domain_t &dom, const Intersect &intersect, > + const GuardLayers &stencilExtent) > + : domain_m(dom), stencilExtent_m(stencilExtent), intersector_m(intersect) > { } > > This_t &operator=(const This_t &model) > @@ -626,6 +631,7 @@ > if (this != &model) > { > domain_m = model.domain_m; > + stencilExtent_m = model.stencilExtent_m; > intersector_m = model.intersector_m; > } > return *this; > @@ -662,9 +668,13 @@ > } > > template > - inline bool intersect(const Engine &engine, const GuardLayers &) > + inline bool intersect(const Engine &engine, const GuardLayers &, > + GuardLayers &usedGuards) > { > intersect(engine); > + // FIXME: accumulate used guards from intersect above and > + // stencil extent? I.e. 
allow Stencil<>(a(i-1)+a(i+1))? > + usedGuards = stencilExtent_m; > return true; > } > > @@ -672,6 +682,7 @@ > > > Interval domain_m; > + GuardLayers stencilExtent_m; > Intersect intersector_m; > }; > > @@ -699,8 +710,14 @@ > // cells results in an error in the multipatch inode view.) > > typedef FieldStencilIntersector NewIntersector_t; > + GuardLayers stencilExtent; > + for (int i=0; i + stencilExtent.lower(i) = engine.functor().lowerExtent(i); > + stencilExtent.upper(i) = engine.functor().upperExtent(i); > + } > NewIntersector_t newIntersector(engine.intersectDomain(), > - tag.tag().intersector_m); > + tag.tag().intersector_m, > + stencilExtent); > > expressionApply(engine.field(), > IntersectorTag(newIntersector)); > --- cvs/r2/src/Layout/GridLayout.cpp 2004-01-14 20:08:10.000000000 +0100 > +++ pooma-mpi3/r2/src/Layout/GridLayout.cpp 2004-01-14 20:13:32.000000000 +0100 > @@ -429,7 +436,7 @@ > > // Now, push IDs and source into cache... > > - this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID)); > + this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID, d*2)); > } > } > } > @@ -481,7 +488,7 @@ > > // Now, push IDs and source into cache... > > - this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID)); > + this->gcFillList_m.push_back(GCFillInfo_t(gcdom, sourceID, destID, d*2+1)); > } > } > } > --- cvs/r2/src/Layout/LayoutBase.h 2004-01-14 20:08:12.000000000 +0100 > +++ pooma-mpi3/r2/src/Layout/LayoutBase.h 2004-01-14 20:13:32.000000000 +0100 > @@ -119,8 +121,8 @@ > > struct GCFillInfo > { > - GCFillInfo(const Domain_t &dom, int ownedID, int guardID) > - : domain_m(dom), ownedID_m(ownedID), guardID_m(guardID) { } > + GCFillInfo(const Domain_t &dom, int ownedID, int guardID, int face=-1) > + : domain_m(dom), ownedID_m(ownedID), guardID_m(guardID), face_m(face) { } > > // Get a CW warning about this not having a default constructor > // when we instantiate the vector below. This never > @@ -131,6 +133,7 @@ > Domain_t domain_m; // guard layer domain > int ownedID_m; // node ID for which domain_m is owned > int guardID_m; // node ID for which domain_m is in the guards > + int face_m; // destination face of the guard layer (or -1, if unknown) > > Domain_t & domain() { return domain_m;} > int & ownedID() { return ownedID_m;} > --- cvs/r2/src/Layout/UniformGridLayout.cpp 2004-01-14 20:08:13.000000000 +0100 > +++ pooma-mpi3/r2/src/Layout/UniformGridLayout.cpp 2004-01-14 20:13:32.000000000 +0100 > @@ -279,7 +279,7 @@ > //----------------------------------------------------------------------------- > // > // template > -// void UniformGridLayout::calcGCFillList() > +// void UniformGridLayoutData::calcGCFillList() > // > // Calculates the cached information needed by MultiPatch Engine to > // fill the guard cells. 
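(Aside, not part of the quoted patch: the face numbering used throughout this change is 2*d for the lower face and 2*d+1 for the upper face of dimension d, which is why the GridLayout.cpp and UniformGridLayout.cpp hunks record d*2 and d*2+1 in GCFillInfo, and the engine then keeps one dirty bit per face. A minimal stand-alone sketch of that bookkeeping, with illustrative names only, not the actual MultiPatchEngine code:)

    #include <cassert>
    #include <iostream>

    // Illustrative sketch of per-face dirty tracking with a bitmask.
    template <int Dim>
    struct FaceDirtySketch
    {
      int bits;                                          // one bit per face

      FaceDirtySketch() : bits((1 << (Dim * 2)) - 1) {}  // all faces start dirty

      // face index: 2*d for the lower face of dimension d, 2*d+1 for the upper
      static int face(int d, bool upper) { return 2 * d + (upper ? 1 : 0); }

      bool isDirty(int f) const
      {
        assert(f >= 0 && f <= Dim * 2 - 1);
        return (bits & (1 << f)) != 0;
      }

      void markClean(int f)
      {
        assert(f >= 0 && f <= Dim * 2 - 1);
        bits &= ~(1 << f);
      }
    };

    int main()
    {
      FaceDirtySketch<2> s;                            // 2D: faces 0..3, all dirty
      s.markClean(FaceDirtySketch<2>::face(1, true));  // clean the upper y face (index 3)
      std::cout << s.isDirty(3) << " " << s.isDirty(0) << "\n";  // prints "0 1"
      return 0;
    }

(End of aside; the quoted patch continues below.)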
> @@ -370,7 +370,7 @@ > this->all_m[sourceID]->context() == Pooma::context() || > this->all_m[destID]->context() == Pooma::context() > ) > - this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID)); > + this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID,d*2)); > } > } > > @@ -417,7 +417,7 @@ > this->all_m[sourceID]->context() == Pooma::context() || > this->all_m[destID]->context() == Pooma::context() > ) > - this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID)); > + this->gcFillList_m.push_back(GCFillInfo_t(gcdom,sourceID,destID,d*2+1)); > } > } > } > --- cvs/r2/src/Engine/MultiPatchEngine.cpp 2004-01-14 20:11:34.000000000 +0100 > +++ pooma-mpi3/r2/src/Engine/MultiPatchEngine.cpp 2004-01-14 20:23:23.000000000 +0100 > @@ -34,6 +34,7 @@ > #include "Engine/CompressedFraction.h" > #include "Array/Array.h" > #include "Tulip/ReduceOverContexts.h" > +#include "Tulip/SendReceive.h" > #include "Threads/PoomaCSem.h" > #include "Domain/IteratorPairDomain.h" > > @@ -77,16 +78,18 @@ > Engine(const Layout_t &layout) > : layout_m(layout), > data_m(layout.sizeGlobal()), > - pDirty_m(new bool(true)) > + pDirty_m(new int) > { > typedef typename Layout_t::Value_t Node_t; > > + setDirty(); > + > // check for correct match of PatchTag and the mapper used to make the > // layout. > // THIS IS A HACK! we test on the context of the first patch, and if it > // is -1, we have a Layout made with the LocalMapper. > > -#if POOMA_CHEETAH > +#if POOMA_MESSAGING > > if( layout_m.nodeListGlobal().size() > 0) > { > @@ -247,7 +250,7 @@ > PAssert(data_m.isValid()); > if (data_m.isShared()) { > data_m.makeOwnCopy(); > - pDirty_m = new bool(*pDirty_m); > + pDirty_m = new int(*pDirty_m); > } > > return *this; > @@ -261,45 +264,88 @@ > // > //----------------------------------------------------------------------------- > > +/// Guard layer assign between non-remote engines, just use the > +/// ET mechanisms > + > +template > +static inline > +void simpleAssign(const Array& lhs, > + const Array& rhs, > + const Interval& domain) > +{ > + lhs(domain) = rhs(domain); > +} > + > +/// Guard layer assign between remote engines, use Send/Receive directly > +/// to avoid one extra copy of the data. > + > +template > +static inline > +void simpleAssign(const Array >& lhs, > + const Array >& rhs, > + const Interval& domain) > +{ > + if (lhs.engine().owningContext() == rhs.engine().owningContext()) > + lhs(domain) = rhs(domain); > + else { > + typedef typename NewEngine, Interval >::Type_t ViewEngine_t; > + if (lhs.engine().engineIsLocal()) > + Receive::receive(ViewEngine_t(lhs.engine().localEngine(), domain), > + rhs.engine().owningContext()); > + else if (rhs.engine().engineIsLocal()) > + SendReceive::send(ViewEngine_t(rhs.engine().localEngine(), domain), > + lhs.engine().owningContext()); > + } > +} > + > template > void Engine >:: > -fillGuardsHandler(const WrappedInt &) const > +fillGuardsHandler(const GuardLayers& g, const WrappedInt &) const > { > if (!isDirty()) return; > - > -#if POOMA_PURIFY > - > - // This is here to remove spurious UMRs that result when un-initialized > - // guards are copied in the following loop. All of the unitialized data > - // is ultimately overwritten with good data, so I don't see why purify > - // calls these UMRs in stead of unitialized memory copies, but it does. > - // I don't do this in general since it would be slow and since T(0) is > - // not generally valid. 
This does mean that fillGuards() will fail > - // with purify for types that do not know what to do with T(0). > - > - setGuards(T(0)); > - > -#endif > > + int updated = 0; > typename Layout_t::FillIterator_t p = layout_m.beginFillList(); > - > + > while (p != layout_m.endFillList()) > { > int src = p->ownedID_m; > int dest = p->guardID_m; > > - // Create patch arrays that see the entire patch: > + // Skip face, if not dirty. > + > + if (isDirty(p->face_m)) { > + > + // Check, if the p->domain_m is a guard which matches the > + // needed guard g. > + > + int d = p->face_m/2; > + int guardSizeNeeded = p->face_m & 1 ? g.upper(d) : g.lower(d); > + if (!(p->face_m != -1 > + && guardSizeNeeded == 0)) { > + > + // Create patch arrays that see the entire patch: > > - Array lhs(data()[dest]), rhs(data()[src]); > + Array lhs(data()[dest]), rhs(data()[src]); > > - // Now do assignment from the subdomains. > + // Now do assignment from the subdomains. > +#if POOMA_MPI > + simpleAssign(lhs, rhs, p->domain_m); > +#else > + lhs(p->domain_m) = rhs(p->domain_m); > +#endif > + > + // Mark up-to-date. > + updated |= 1<face_m; > + > + } > + > + } > > - lhs(p->domain_m) = rhs(p->domain_m); > - > ++p; > } > - > - *pDirty_m = false; > + > + *pDirty_m &= ~updated; > } > > > @@ -331,7 +377,7 @@ > ++p; > } > > - *pDirty_m = true; > + setDirty(); > } > > > @@ -366,7 +412,7 @@ > ++p; > } > > - *pDirty_m = true; > + setDirty(); > } > > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 03:04:54 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 15 Jan 2004 19:04:54 -0800 Subject: [pooma-dev] Re: [PATCH] Optimize guard update copy In-Reply-To: References: <3FF45420.3090106@codesourcery.com> Message-ID: <400754D6.9070606@codesourcery.com> Richard Guenther wrote: > On Thu, 1 Jan 2004, Richard Guenther wrote: > > >>On Thu, 1 Jan 2004, Jeffrey D. Oldham wrote: >> >> >>>Richard Guenther wrote: >>> >>>>Hi! >>>> >>>>This patch removes number four of the copies done for guard update. >>>>Basically, additionally to the three copies I mentioned in the previous >>>>mail, we're doing one extra during the RemoteView expressionApply of the >>>>data-parallel assignment we're doing for the guard domains. Ugh. Fixed by >>>>manually sending/receiving from/to the views. Doesn't work for Cheetah, >>>>so conditionalized on POOMA_MPI. >>> >>>What breaks for Cheetah? >> >>I don't remember... I can try again next week. > > > I tried again, and with Cheetah this is really a mess, because Cheetah > cannot receive into a View, so we need to pass a Brick as IncomingView and > a BrickView as View to SendReceive::receive() and this > breaks all over the place in the Cheetah library... > > So, unfortunately, no - this doesn't work for Cheetah - at least not with > major surgery inside the Cheetah library (which I would rather drop than > fix). > > So, is the patch ok as it is (affecting only POOMA_MPI)? OK. > Thanks, > > Richard. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Fri Jan 16 08:55:04 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 16 Jan 2004 09:55:04 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Fix PrintField wrt expressions In-Reply-To: <4007037F.3080103@codesourcery.com> References: <4007037F.3080103@codesourcery.com> Message-ID: On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > The following patch allows to print Fields with expression engines. 
> > PrintField uses applyRelations() while it should use a tree-walk with > > PerformUpdateTag. > > So, with this change, the field will be guaranteed to be updated by any > relations that can change the field? If the field is dirty, yes. Behavoir is exactly the same as before, just the case tester.out() << f + 1.0 << std::endl; didn't work before, because the FieldEngine doesn't have the data() method applyRelations is trying to access. The forEach() magically skips the ExpressionEngines and applies to the leafs only. Richard. > > 2004Jan14 Richard Guenther > > > > * src/Field/PrintField.h: use forEach(,PerformUpdateTag(),) rather > > than applyRelations(). > > > > ===== PrintField.h 1.3 vs edited ===== > > --- 1.3/r2/src/Field/PrintField.h Wed Dec 3 12:30:41 2003 > > +++ edited/PrintField.h Wed Jan 14 12:01:09 2004 > > @@ -231,7 +231,7 @@ > > template > > void print(S &s, const A &a) const > > { > > - a.applyRelations(); > > + forEach(a, PerformUpdateTag(), NullCombine()); > > Pooma::blockAndEvaluate(); > > > > for (int m = 0; m < a.numMaterials(); m++) > > > -- > Jeffrey D. Oldham > oldham at codesourcery.com > -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ From rguenth at tat.physik.uni-tuebingen.de Fri Jan 16 09:00:40 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 16 Jan 2004 10:00:40 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Speed up guard update. In-Reply-To: <400754AE.1070102@codesourcery.com> References: <400754AE.1070102@codesourcery.com> Message-ID: On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This is a refined (aka shorter) patch which unifies the tracking of > > up-to-date faces and the special optimized copy for MPI. > > > > Tested on serial ia32 linux with gcc3.4 with no regression. > > > > Ok? > > Yes, assuming the user interface did not change. It looks like > GCFillInfo's interface changed but existing code will still run because > a parameter with a default argument was added. It depends on where you draw the line between "user" interface and internal interface. I'd consider GCFillInfo's internal interface, only the fillGuards(), setDirty(), etc. methods from the MultiPatchEngine I'd consider "user" interface - and these ones will still work with old code. I also changed the various intersectors to take an additional parameter - old code will break if it used this interface (but I consider this not user interface either). Richard. -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ From rguenth at tat.physik.uni-tuebingen.de Fri Jan 16 09:07:04 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Fri, 16 Jan 2004 10:07:04 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: <4007534D.90101@codesourcery.com> References: <400441CF.1030007@codesourcery.com> <4007534D.90101@codesourcery.com> Message-ID: On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > I appreciate your finding the difficulty and your taking the time to > explain the problem. I am reluctant to add code that is known to be > broken for some situations. Is there a way to mark the code so 1) the > known brokenness is marked and 2) the program asks sensibly when the > brokenness is experienced? I understand your concerns. I can add a comment describing the brokenness (which boils down to "we don't have a collective barrier abstraction" - blockAndEvaluate() is _not_ collective). 
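To make the failure mode concrete: a collective such as MPI_Gather completes only once every rank in the communicator has entered it, so a rank that is still blocked on point-to-point traffic which will only be produced after the collective hangs the whole job. A minimal stand-alone illustration (plain MPI, exactly two ranks assumed, illustrative only, not the code from the patch):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);

      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int value = rank;
      int gathered[2] = { 0, 0 };

      // Rank 1 first waits for a point-to-point message that rank 0 will only
      // send after the collective, so rank 1 never reaches MPI_Gather ...
      if (rank == 1)
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      // ... and rank 0 blocks here waiting for rank 1's contribution: deadlock.
      MPI_Gather(&value, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);

      // The send that would have released rank 1 comes too late.
      if (rank == 0)
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

      MPI_Finalize();
      return 0;
    }

Queuing the fake write iterate behind the send iterates amounts to making sure the pending sends have fired before a context falls through to the collective gather.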
Detecting it would be possible by asserting that each context at least does either Send or Compute. I'll think some more about the collective barrier -- maybe extend the generation tracking to allow blocking (collectively) for completion of the generation at endGeneration() time. Richard. -- Richard Guenther WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ From oldham at codesourcery.com Fri Jan 16 17:06:00 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 16 Jan 2004 09:06:00 -0800 Subject: [pooma-dev] Re: [PATCH] Fix PrintField wrt expressions In-Reply-To: References: <4007037F.3080103@codesourcery.com> Message-ID: <400819F8.2060507@codesourcery.com> Richard Guenther wrote: > On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>The following patch allows to print Fields with expression engines. >>>PrintField uses applyRelations() while it should use a tree-walk with >>>PerformUpdateTag. >> >>So, with this change, the field will be guaranteed to be updated by any >>relations that can change the field? > > > If the field is dirty, yes. Behavoir is exactly the same as before, just > the case > > tester.out() << f + 1.0 << std::endl; > > didn't work before, because the FieldEngine doesn't have > the data() method applyRelations is trying to access. The forEach() > magically skips the ExpressionEngines and applies to the leafs only. > > Richard. > > >>>2004Jan14 Richard Guenther >>> >>> * src/Field/PrintField.h: use forEach(,PerformUpdateTag(),) rather >>> than applyRelations(). >>> >>>===== PrintField.h 1.3 vs edited ===== >>>--- 1.3/r2/src/Field/PrintField.h Wed Dec 3 12:30:41 2003 >>>+++ edited/PrintField.h Wed Jan 14 12:01:09 2004 >>>@@ -231,7 +231,7 @@ >>> template >>> void print(S &s, const A &a) const >>> { >>>- a.applyRelations(); >>>+ forEach(a, PerformUpdateTag(), NullCombine()); >>> Pooma::blockAndEvaluate(); >>> >>> for (int m = 0; m < a.numMaterials(); m++) Great! That's a good improvement. Will you please commit the patch? Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 17:11:48 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 16 Jan 2004 09:11:48 -0800 Subject: [pooma-dev] Re: [PATCH] Speed up guard update. In-Reply-To: References: <400754AE.1070102@codesourcery.com> Message-ID: <40081B54.2000501@codesourcery.com> Richard Guenther wrote: > On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>This is a refined (aka shorter) patch which unifies the tracking of >>>up-to-date faces and the special optimized copy for MPI. >>> >>>Tested on serial ia32 linux with gcc3.4 with no regression. >>> >>>Ok? >> >>Yes, assuming the user interface did not change. It looks like >>GCFillInfo's interface changed but existing code will still run because >>a parameter with a default argument was added. > > > It depends on where you draw the line between "user" interface and > internal interface. I'd consider GCFillInfo's internal interface, only > the fillGuards(), setDirty(), etc. methods from the MultiPatchEngine I'd > consider "user" interface - and these ones will still work with old code. > I also changed the various intersectors to take an additional parameter - old code will > break if it used this interface (but I consider this not user interface > either). Great! I interpret this a "no change to the user interface". Would you please commit the patch? -- Jeffrey D. 
Oldham oldham at codesourcery.com From oldham at codesourcery.com Fri Jan 16 17:12:27 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Fri, 16 Jan 2004 09:12:27 -0800 Subject: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators In-Reply-To: References: <400441CF.1030007@codesourcery.com> <4007534D.90101@codesourcery.com> Message-ID: <40081B7B.5060205@codesourcery.com> Richard Guenther wrote: > On Thu, 15 Jan 2004, Jeffrey D. Oldham wrote: > > >>I appreciate your finding the difficulty and your taking the time to >>explain the problem. I am reluctant to add code that is known to be >>broken for some situations. Is there a way to mark the code so 1) the >>known brokenness is marked and 2) the program asks sensibly when the >>brokenness is experienced? > > > I understand your concerns. I can add a comment describing the brokenness > (which boils down to "we don't have a collective barrier abstraction" - > blockAndEvaluate() is _not_ collective). Detecting it would be possible > by asserting that each context at least does either Send or Compute. > > I'll think some more about the collective barrier -- maybe extend the > generation tracking to allow blocking (collectively) for completion of the > generation at endGeneration() time. Thank you. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Sat Jan 17 19:21:26 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Sat, 17 Jan 2004 20:21:26 +0100 (CET) Subject: [PATCH] Kill Unwrap<> Message-ID: Hi! This patch kills previously introduced Unwrap<> and instead provides a fallback in OpMask<>. This way we don't forget places to update (as I did with PartialReduction in case of OpenMP). Tested on serial ia64 linux with no regressions. Ok? Richard. 2004Jan17 Richard Guenther * src/Engine/RemoteEngine.h: kill use of Unwrap<>. src/Evaluator/Reduction.h: likewise. src/Tulip/ReduceOverContexts.h: likewise. src/Evaluator/OpMask.h: likewise, provide fallback in OpMask instead. diff -Nru a/r2/src/Engine/RemoteEngine.h b/r2/src/Engine/RemoteEngine.h --- a/r2/src/Engine/RemoteEngine.h Sat Jan 17 20:16:00 2004 +++ b/r2/src/Engine/RemoteEngine.h Sat Jan 17 20:16:00 2004 @@ -2193,12 +2193,12 @@ { ret = vals[0]; for (j = 1; j < n; j++) - Unwrap::unwrap(op)(ret, vals[j]); + op(ret, vals[j]); } delete [] vals; - ReduceOverContexts::Op_t> finalReduction(ret, 0, n > 0); + ReduceOverContexts finalReduction(ret, 0, n > 0); if (Pooma::context() == 0) ret = finalReduction; diff -Nru a/r2/src/Evaluator/OpMask.h b/r2/src/Evaluator/OpMask.h --- a/r2/src/Evaluator/OpMask.h Sat Jan 17 20:16:00 2004 +++ b/r2/src/Evaluator/OpMask.h Sat Jan 17 20:16:00 2004 @@ -150,16 +150,25 @@ OpMask(const Op &op) : op_m(op) { } ~OpMask() { } + /// WhereProxy Op, embed a conditional operation. template - inline T1& - operator()(T1 &a, const T2 &b) const + inline void + operator()(T1 &a, const MaskAssign &b) const { if (b.defined()) { op_m(a, b.value()); } - return a; } + + /// Fall back to native operation. 
+ template + inline void + operator()(T1 &a, const T2 &b) const + { + op_m(a, b); + } + Op op_m; }; @@ -167,18 +176,6 @@ struct BinaryReturn > { typedef T1 &Type_t; -}; - -template -struct Unwrap { - typedef Op Op_t; - static inline const Op_t& unwrap(const Op &op) { return op; } -}; - -template -struct Unwrap > { - typedef typename Unwrap::Op_t Op_t; - static inline const Op_t& unwrap(const OpMask &op) { return Unwrap::unwrap(op.op_m); } }; template diff -Nru a/r2/src/Evaluator/Reduction.h b/r2/src/Evaluator/Reduction.h --- a/r2/src/Evaluator/Reduction.h Sat Jan 17 20:16:00 2004 +++ b/r2/src/Evaluator/Reduction.h Sat Jan 17 20:16:00 2004 @@ -259,7 +259,7 @@ ret = vals[0]; for (j = 1; j < n; j++) - Unwrap::unwrap(op)(ret, vals[j]); + op(ret, vals[j]); delete [] vals; } }; diff -Nru a/r2/src/Tulip/ReduceOverContexts.h b/r2/src/Tulip/ReduceOverContexts.h --- a/r2/src/Tulip/ReduceOverContexts.h Sat Jan 17 20:16:00 2004 +++ b/r2/src/Tulip/ReduceOverContexts.h Sat Jan 17 20:16:00 2004 @@ -274,7 +274,7 @@ if (!v.valid()) v = *v2; else - Unwrap::Op_t()(v.value(), v2->value()); + ReductionOp()(v.value(), v2->value()); } Serialize_t::cleanup(v2); } @@ -325,7 +325,7 @@ } else { - Unwrap::Op_t()(me->value_m, v.value()); + ReductionOp()(me->value_m, v.value()); } } From rguenth at tat.physik.uni-tuebingen.de Sun Jan 18 14:07:27 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Sun, 18 Jan 2004 15:07:27 +0100 (CET) Subject: [PATCH] Warn about MPI not supporting cross-context particles Message-ID: Hi! This patch aborts on use of cross-context particles with MPI, which is not supported. Ok? Richard. 2004Jan18 Richard Guenther * src/Particles/PatchSwapLayout.cpp: abort on cross-context particles with MPI. Index: src/Particles/PatchSwapLayout.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Particles/PatchSwapLayout.cpp,v retrieving revision 1.15 diff -u -u -r1.15 PatchSwapLayout.cpp --- src/Particles/PatchSwapLayout.cpp 8 Jun 2000 22:16:23 -0000 1.15 +++ src/Particles/PatchSwapLayout.cpp 18 Jan 2004 13:38:28 -0000 @@ -545,6 +545,8 @@ Pooma::particleSwapHandler()->send(toContext, tag, buf); } +#elif POOMA_MPI + PInsist(false, "Cross-context particles not supported for MPI"); #endif // POOMA_CHEETAH } @@ -621,6 +623,8 @@ while (layout_m.patchInfo(lid).msgReceived() < remotePatches) Pooma::poll(); +#elif POOMA_MPI + PInsist(false, "Cross-context particles not supported for MPI"); #endif // POOMA_CHEETAH } From oldham at codesourcery.com Mon Jan 19 19:00:39 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 19 Jan 2004 11:00:39 -0800 Subject: [PATCH] Kill Unwrap<> In-Reply-To: References: Message-ID: <400C2957.6090706@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch kills previously introduced Unwrap<> and instead provides a > fallback in OpMask<>. This way we don't forget places to update (as I did > with PartialReduction in case of OpenMP). > > Tested on serial ia64 linux with no regressions. > > Ok? Thanks for fixing this. I agree it's a good idea to have the default values do the right thing. Are there any other operators that need modification as well? Is it OK to have an additional if in the Where operator? Will this impact performance? OK to commit. > Richard. > > > 2004Jan17 Richard Guenther > > * src/Engine/RemoteEngine.h: kill use of Unwrap<>. > src/Evaluator/Reduction.h: likewise. > src/Tulip/ReduceOverContexts.h: likewise. 
> src/Evaluator/OpMask.h: likewise, provide fallback in > OpMask instead. > > diff -Nru a/r2/src/Engine/RemoteEngine.h b/r2/src/Engine/RemoteEngine.h > --- a/r2/src/Engine/RemoteEngine.h Sat Jan 17 20:16:00 2004 > +++ b/r2/src/Engine/RemoteEngine.h Sat Jan 17 20:16:00 2004 > @@ -2193,12 +2193,12 @@ > { > ret = vals[0]; > for (j = 1; j < n; j++) > - Unwrap::unwrap(op)(ret, vals[j]); > + op(ret, vals[j]); > } > > delete [] vals; > > - ReduceOverContexts::Op_t> finalReduction(ret, 0, n > 0); > + ReduceOverContexts finalReduction(ret, 0, n > 0); > if (Pooma::context() == 0) > ret = finalReduction; > > diff -Nru a/r2/src/Evaluator/OpMask.h b/r2/src/Evaluator/OpMask.h > --- a/r2/src/Evaluator/OpMask.h Sat Jan 17 20:16:00 2004 > +++ b/r2/src/Evaluator/OpMask.h Sat Jan 17 20:16:00 2004 > @@ -150,16 +150,25 @@ > OpMask(const Op &op) : op_m(op) { } > ~OpMask() { } > > + /// WhereProxy Op, embed a conditional operation. > template > - inline T1& > - operator()(T1 &a, const T2 &b) const > + inline void > + operator()(T1 &a, const MaskAssign &b) const > { > if (b.defined()) > { > op_m(a, b.value()); > } > - return a; > } > + > + /// Fall back to native operation. > + template > + inline void > + operator()(T1 &a, const T2 &b) const > + { > + op_m(a, b); > + } > + > Op op_m; > }; > > @@ -167,18 +176,6 @@ > struct BinaryReturn > > { > typedef T1 &Type_t; > -}; > - > -template > -struct Unwrap { > - typedef Op Op_t; > - static inline const Op_t& unwrap(const Op &op) { return op; } > -}; > - > -template > -struct Unwrap > { > - typedef typename Unwrap::Op_t Op_t; > - static inline const Op_t& unwrap(const OpMask &op) { return Unwrap::unwrap(op.op_m); } > }; > > template > diff -Nru a/r2/src/Evaluator/Reduction.h b/r2/src/Evaluator/Reduction.h > --- a/r2/src/Evaluator/Reduction.h Sat Jan 17 20:16:00 2004 > +++ b/r2/src/Evaluator/Reduction.h Sat Jan 17 20:16:00 2004 > @@ -259,7 +259,7 @@ > > ret = vals[0]; > for (j = 1; j < n; j++) > - Unwrap::unwrap(op)(ret, vals[j]); > + op(ret, vals[j]); > delete [] vals; > } > }; > diff -Nru a/r2/src/Tulip/ReduceOverContexts.h b/r2/src/Tulip/ReduceOverContexts.h > --- a/r2/src/Tulip/ReduceOverContexts.h Sat Jan 17 20:16:00 2004 > +++ b/r2/src/Tulip/ReduceOverContexts.h Sat Jan 17 20:16:00 2004 > @@ -274,7 +274,7 @@ > if (!v.valid()) > v = *v2; > else > - Unwrap::Op_t()(v.value(), v2->value()); > + ReductionOp()(v.value(), v2->value()); > } > Serialize_t::cleanup(v2); > } > @@ -325,7 +325,7 @@ > } > else > { > - Unwrap::Op_t()(me->value_m, v.value()); > + ReductionOp()(me->value_m, v.value()); > } > } > -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Mon Jan 19 19:04:31 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 19 Jan 2004 11:04:31 -0800 Subject: [PATCH] Warn about MPI not supporting cross-context particles In-Reply-To: References: Message-ID: <400C2A3F.5060902@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch aborts on use of cross-context particles with MPI, which is not > supported. > > Ok? > > Richard. > > > 2004Jan18 Richard Guenther > > * src/Particles/PatchSwapLayout.cpp: abort on cross-context > particles with MPI. 
> > Index: src/Particles/PatchSwapLayout.cpp > =================================================================== > RCS file: /home/pooma/Repository/r2/src/Particles/PatchSwapLayout.cpp,v > retrieving revision 1.15 > diff -u -u -r1.15 PatchSwapLayout.cpp > --- src/Particles/PatchSwapLayout.cpp 8 Jun 2000 22:16:23 -0000 1.15 > +++ src/Particles/PatchSwapLayout.cpp 18 Jan 2004 13:38:28 -0000 > @@ -545,6 +545,8 @@ > Pooma::particleSwapHandler()->send(toContext, tag, buf); > } > > +#elif POOMA_MPI > + PInsist(false, "Cross-context particles not supported for MPI"); > #endif // POOMA_CHEETAH Is the POOMA_CHEETAH comment correct? I think it is probably correct since the first #if is probably for Cheetah, but is it misleading? Is there a better comment? > } > > @@ -621,6 +623,8 @@ > while (layout_m.patchInfo(lid).msgReceived() < remotePatches) > Pooma::poll(); > > +#elif POOMA_MPI > + PInsist(false, "Cross-context particles not supported for MPI"); > #endif // POOMA_CHEETAH Likewise here. > } > Thanks for tightening the code. OK to commit. -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Mon Jan 19 22:07:55 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 19 Jan 2004 23:07:55 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Kill Unwrap<> In-Reply-To: <400C2957.6090706@codesourcery.com> References: <400C2957.6090706@codesourcery.com> Message-ID: On Mon, 19 Jan 2004, Jeffrey D. Oldham wrote: > Richard Guenther wrote: > > Hi! > > > > This patch kills previously introduced Unwrap<> and instead provides a > > fallback in OpMask<>. This way we don't forget places to update (as I did > > with PartialReduction in case of OpenMP). > > > > Tested on serial ia64 linux with no regressions. > > > > Ok? > > Thanks for fixing this. I agree it's a good idea to have the default > values do the right thing. Are there any other operators that need > modification as well? Is it OK to have an additional if in the Where > operator? Will this impact performance? It's really only the WhereProxy and its OpMask wrapping operator that is this special. Performance should be unaffected. Richard. From rguenth at tat.physik.uni-tuebingen.de Mon Jan 19 22:08:58 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Mon, 19 Jan 2004 23:08:58 +0100 (CET) Subject: [pooma-dev] Re: [PATCH] Warn about MPI not supporting cross-context particles In-Reply-To: <400C2A3F.5060902@codesourcery.com> References: <400C2A3F.5060902@codesourcery.com> Message-ID: On Mon, 19 Jan 2004, Jeffrey D. Oldham wrote: > > > > +#elif POOMA_MPI > > + PInsist(false, "Cross-context particles not supported for MPI"); > > #endif // POOMA_CHEETAH > > Is the POOMA_CHEETAH comment correct? I think it is probably correct > since the first #if is probably for Cheetah, but is it misleading? Is > there a better comment? I don't know of one - the existing one is probably better than no one. Richard. From oldham at codesourcery.com Tue Jan 20 00:52:34 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 19 Jan 2004 16:52:34 -0800 Subject: [pooma-dev] Re: [PATCH] Kill Unwrap<> In-Reply-To: References: <400C2957.6090706@codesourcery.com> Message-ID: <400C7BD2.8000605@codesourcery.com> Richard Guenther wrote: > On Mon, 19 Jan 2004, Jeffrey D. Oldham wrote: > > >>Richard Guenther wrote: >> >>>Hi! >>> >>>This patch kills previously introduced Unwrap<> and instead provides a >>>fallback in OpMask<>. 
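The fallback in question can be sketched stand-alone; the names below (MaskedValue, MaskedOp, AddAssign) are illustrative stand-ins for MaskAssign, OpMask and the wrapped operator, not the actual POOMA types:

    #include <iostream>

    // Stand-in for MaskAssign<T>: a value plus a flag recording whether the
    // where() condition held at this point.
    template <class T>
    struct MaskedValue
    {
      bool enabled;
      T    value;
    };

    // Stand-in for OpMask<Op>: one overload for masked right-hand sides, and a
    // generic fallback that simply forwards to the wrapped operation.
    template <class Op>
    struct MaskedOp
    {
      Op op;

      template <class T1, class T2>
      void operator()(T1 &a, const MaskedValue<T2> &b) const
      {
        if (b.enabled)
          op(a, b.value);
      }

      template <class T1, class T2>
      void operator()(T1 &a, const T2 &b) const
      {
        op(a, b);
      }
    };

    struct AddAssign
    {
      template <class T1, class T2>
      void operator()(T1 &a, const T2 &b) const { a += b; }
    };

    int main()
    {
      MaskedOp<AddAssign> op = { AddAssign() };
      double sum = 0.0;
      op(sum, 1.0);                               // fallback path: plain value
      MaskedValue<double> skipped = { false, 5.0 };
      MaskedValue<double> taken   = { true,  2.0 };
      op(sum, skipped);                           // masked out, sum unchanged
      op(sum, taken);                             // applied, sum == 3
      std::cout << sum << "\n";                   // prints 3
      return 0;
    }

The point of the second overload is that callers which only ever see plain values, such as the cross-context reduction, can apply the operator directly instead of first unwrapping it.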
This way we don't forget places to update (as I did >>>with PartialReduction in case of OpenMP). >>> >>>Tested on serial ia64 linux with no regressions. >>> >>>Ok? >> >>Thanks for fixing this. I agree it's a good idea to have the default >>values do the right thing. Are there any other operators that need >>modification as well? Is it OK to have an additional if in the Where >>operator? Will this impact performance? > > > It's really only the WhereProxy and its OpMask wrapping operator that is > this special. Performance should be unaffected. Good. Thanks. Please commit the patch. -- Jeffrey D. Oldham oldham at codesourcery.com From oldham at codesourcery.com Tue Jan 20 00:53:05 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Mon, 19 Jan 2004 16:53:05 -0800 Subject: [pooma-dev] Re: [PATCH] Warn about MPI not supporting cross-context particles In-Reply-To: References: <400C2A3F.5060902@codesourcery.com> Message-ID: <400C7BF1.6010701@codesourcery.com> Richard Guenther wrote: > On Mon, 19 Jan 2004, Jeffrey D. Oldham wrote: > > >>>+#elif POOMA_MPI >>>+ PInsist(false, "Cross-context particles not supported for MPI"); >>> #endif // POOMA_CHEETAH >> >>Is the POOMA_CHEETAH comment correct? I think it is probably correct >>since the first #if is probably for Cheetah, but is it misleading? Is >>there a better comment? > > > I don't know of one - the existing one is probably better than no one. Thanks for making these changes. Will you please commit them? -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 29 10:44:27 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 29 Jan 2004 11:44:27 +0100 (CET) Subject: [PATCH] Fix dependency generation Message-ID: Hi! This patch fixes long standing possibility of ending up with recursive makefile inclusion... basically, if the dependency making failed somehow, we ended up including it forever on the next invocation of make. Ugh. It also brings dependencies for the testsuite. Ok? Richard. 2004Jan29 Richard Guenther * config/Shared/rules.mk: don't repeat the toplevel makefile in the initial depend.mk. Ignore SCCS and CVS directories for depend files, add testsuite files for depend. ===== rules.mk 1.3 vs edited ===== *** /tmp/rules.mk-1.3-9804 Mon Jun 23 14:50:41 2003 --- edited/rules.mk Thu Jan 29 11:39:35 2004 *************** *** 3,14 **** .PHONY : showtimes showenv clean cleansuite realclean realcleansuite realrealcleansuite tar makestuff dirs .PHONY : showalias echopoomaroot suiteinfo ! depend: @echo "Making Dependencies for suite=$(SUITE)."; \ cd $(PROJECT_ROOT);\ ! filelist=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cc" -o -name "*.C" -o -name "*.cpp" | grep -v tests`;\ ! cp $(PROJECT_ROOT)/makefile $(LIBRARY_ROOT)/depend.mk ; \ ! $(MAKEDEPEND) -f $(LIBRARY_ROOT)/depend.mk $(shell echo $(SUITE_DEFINES) | $(TR) ' ' ',' ) $(SUITE_INCLUDES) $$filelist 2> $(LIBRARY_ROOT)/depend.err;\ $(PERL) $(SHARED_ROOT)/dependo.pl $(LIBRARY_ROOT)/depend.mk $(PROJECT_ROOT) $$filelist;\ rm -f $(LIBRARY_ROOT)/depend.mk.bak --- 3,15 ---- .PHONY : showtimes showenv clean cleansuite realclean realcleansuite realrealcleansuite tar makestuff dirs .PHONY : showalias echopoomaroot suiteinfo ! depend: cleandepend @echo "Making Dependencies for suite=$(SUITE)."; \ cd $(PROJECT_ROOT);\ ! filelist=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cmpl.cpp" -o -name "*.inst.cpp" | grep -v "tests\|FileTemplates\|SCCS\|CVS"`;\ ! 
filelisttests=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cpp" | grep -v "SCCS\|CVS" | grep "tests/.*\.cpp"`;\ ! touch $(LIBRARY_ROOT)/depend.mk ; \ ! $(MAKEDEPEND) -f $(LIBRARY_ROOT)/depend.mk $(shell echo $(SUITE_DEFINES) | $(TR) ' ' ',' ) $(SUITE_INCLUDES) $$filelist $$filelisttests 2> $(LIBRARY_ROOT)/depend.err;\ $(PERL) $(SHARED_ROOT)/dependo.pl $(LIBRARY_ROOT)/depend.mk $(PROJECT_ROOT) $$filelist;\ rm -f $(LIBRARY_ROOT)/depend.mk.bak From oldham at codesourcery.com Thu Jan 29 16:22:19 2004 From: oldham at codesourcery.com (Jeffrey D. Oldham) Date: Thu, 29 Jan 2004 08:22:19 -0800 Subject: [pooma-dev] [PATCH] Fix dependency generation In-Reply-To: References: Message-ID: <4019333B.2000202@codesourcery.com> Richard Guenther wrote: > Hi! > > This patch fixes long standing possibility of ending up with recursive > makefile inclusion... basically, if the dependency making failed somehow, > we ended up including it forever on the next invocation of make. Ugh. > It also brings dependencies for the testsuite. > > Ok? OK. > Richard. > > > 2004Jan29 Richard Guenther > > * config/Shared/rules.mk: don't repeat the toplevel makefile in > the initial depend.mk. Ignore SCCS and CVS directories for depend > files, add testsuite files for depend. > > ===== rules.mk 1.3 vs edited ===== > *** /tmp/rules.mk-1.3-9804 Mon Jun 23 14:50:41 2003 > --- edited/rules.mk Thu Jan 29 11:39:35 2004 > *************** > *** 3,14 **** > .PHONY : showtimes showenv clean cleansuite realclean realcleansuite realrealcleansuite tar makestuff dirs > .PHONY : showalias echopoomaroot suiteinfo > > ! depend: > @echo "Making Dependencies for suite=$(SUITE)."; \ > cd $(PROJECT_ROOT);\ > ! filelist=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cc" -o -name "*.C" -o -name "*.cpp" | grep -v tests`;\ > ! cp $(PROJECT_ROOT)/makefile $(LIBRARY_ROOT)/depend.mk ; \ > ! $(MAKEDEPEND) -f $(LIBRARY_ROOT)/depend.mk $(shell echo $(SUITE_DEFINES) | $(TR) ' ' ',' ) $(SUITE_INCLUDES) $$filelist 2> $(LIBRARY_ROOT)/depend.err;\ > $(PERL) $(SHARED_ROOT)/dependo.pl $(LIBRARY_ROOT)/depend.mk $(PROJECT_ROOT) $$filelist;\ > rm -f $(LIBRARY_ROOT)/depend.mk.bak > > --- 3,15 ---- > .PHONY : showtimes showenv clean cleansuite realclean realcleansuite realrealcleansuite tar makestuff dirs > .PHONY : showalias echopoomaroot suiteinfo > > ! depend: cleandepend > @echo "Making Dependencies for suite=$(SUITE)."; \ > cd $(PROJECT_ROOT);\ > ! filelist=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cmpl.cpp" -o -name "*.inst.cpp" | grep -v "tests\|FileTemplates\|SCCS\|CVS"`;\ > ! filelisttests=`$(FIND) $(PROJECT_ROOT)/src -type f -name "*.cpp" | grep -v "SCCS\|CVS" | grep "tests/.*\.cpp"`;\ > ! touch $(LIBRARY_ROOT)/depend.mk ; \ > ! $(MAKEDEPEND) -f $(LIBRARY_ROOT)/depend.mk $(shell echo $(SUITE_DEFINES) | $(TR) ' ' ',' ) $(SUITE_INCLUDES) $$filelist $$filelisttests 2> $(LIBRARY_ROOT)/depend.err;\ > $(PERL) $(SHARED_ROOT)/dependo.pl $(LIBRARY_ROOT)/depend.mk $(PROJECT_ROOT) $$filelist;\ > rm -f $(LIBRARY_ROOT)/depend.mk.bak > -- Jeffrey D. Oldham oldham at codesourcery.com From rguenth at tat.physik.uni-tuebingen.de Thu Jan 29 21:49:50 2004 From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther) Date: Thu, 29 Jan 2004 22:49:50 +0100 (CET) Subject: Cheetah license Message-ID: Hi! Did anyone had progress regarding the "missing" Cheetah license? Thanks, Richard.