[vsipl++] [patch] Profile_event class

Jules Bergmann jules at codesourcery.com
Mon Jul 17 20:50:34 UTC 2006


Don McCoy wrote:
 > This patch integrates the functions needed for the impl_performance()
 > interface along with some new functions needed for handling events more
 > efficiently.

Don,

This is looking good.  I have several comments below.

Also, we should start thinking about how this should be configured
and controlled.

Here's a strawman:

Controllable at configure-time:

  - Type of timer (including no timer)
    Current --with-profile-timer=XXX configure option.

    Perhaps we should change the option name to avoid confusion when
    timer is enabled but profiling is not.
    How about --with-timer=XXX ?

    If timer is disabled, then both profiling and impl_performance
    are also disabled.

  - Whether profiling is enabled (but not whether it is trace or
    accumulate):
    New option: --with-profile={no,all}

  - What types of events are profiled (by broad categories).
    Categories:
     - objects: signal processing objects and linear algebra solvers
     - matvec: linear algebra
     - expr: element-wise expressions
     - comm: communications
     - user events

    New option: --with-profile-cat={no,objects,matvec,expr,comm,user,all}

    Default is {no}.

  - Whether performance API (impl_performance) is enabled:
    New option --[enable,disable]-performance-api
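
One way the per-category option could reach the code is as a bit mask
in a generated config header, so that events in disabled categories
compile to nothing.  The following is only a sketch: the macro
SKETCH_PROFILE_MASK, the Category enum and Event_counter are all
hypothetical names, not part of the library or of the patch.

```cpp
#include <cassert>

// Hypothetical category bits, one per category in the list above.
enum Category : unsigned
{
  cat_objects = 1u << 0,
  cat_matvec  = 1u << 1,
  cat_expr    = 1u << 2,
  cat_comm    = 1u << 3,
  cat_user    = 1u << 4,
  cat_all     = (1u << 5) - 1
};

// Hypothetical macro that configure would write into a config header.
// The fallback stands in for "--with-profile-cat=objects,user".
#ifndef SKETCH_PROFILE_MASK
#  define SKETCH_PROFILE_MASK (cat_objects | cat_user)
#endif

constexpr bool category_enabled(unsigned cat)
{
  return (SKETCH_PROFILE_MASK & cat) != 0;
}

// Enabled categories count their events ...
template <unsigned Cat, bool Enabled = category_enabled(Cat)>
struct Event_counter
{
  static unsigned& slot() { static unsigned n = 0; return n; }
  static void record() { ++slot(); }
  static unsigned count() { return slot(); }
};

// ... while disabled categories reduce to empty inline functions.
template <unsigned Cat>
struct Event_counter<Cat, false>
{
  static void record() {}
  static unsigned count() { return 0; }
};
```

With the fallback mask, Event_counter<cat_objects>::record() increments
a counter, while Event_counter<cat_expr>::record() does nothing.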


Controllable at run-time:

  - If profiling configured: profiling mode (trace vs accumulate),
    profiling duration and profile output (via Profile_in_scope)
    controlled via API.  If profiling is disabled, this API is still
    valid but has no effect.  This is currently how we have it.

    We could potentially add command line options that are recognized
    by the 'vsipl' object to control profiling mode so that a user
    program built with a profile-configured library can be profiled
    without changing the app:

	--vsipl++-profile-mode={accum,trace,off}
	--vsipl++-profile-output={filename}

    Probably most useful for tracing very small programs and accumulating
    larger programs.
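
A rough sketch of how the two options could be picked out of the
argument list follows.  Only the option spellings come from the
proposal above.  Profile_options, Profile_mode and
parse_profile_options are hypothetical names, and treating an
unrecognized mode value as "off" is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum Profile_mode { mode_off, mode_accum, mode_trace };

struct Profile_options
{
  Profile_options() : mode(mode_off) {}
  Profile_mode mode;
  std::string  output;   // empty means "no output file requested"
};

// Removes the recognized --vsipl++-profile-* options from 'args' and
// returns their values; all other arguments are left for the application.
inline Profile_options
parse_profile_options(std::vector<std::string>& args)
{
  std::string const mode_key = "--vsipl++-profile-mode=";
  std::string const out_key  = "--vsipl++-profile-output=";

  Profile_options opts;
  std::vector<std::string> rest;

  for (std::size_t i = 0; i < args.size(); ++i)
  {
    std::string const& arg = args[i];
    if (arg.compare(0, mode_key.size(), mode_key) == 0)
    {
      std::string const value = arg.substr(mode_key.size());
      if      (value == "accum") opts.mode = mode_accum;
      else if (value == "trace") opts.mode = mode_trace;
      else                       opts.mode = mode_off;  // "off" or unknown
    }
    else if (arg.compare(0, out_key.size(), out_key) == 0)
      opts.output = arg.substr(out_key.size());
    else
      rest.push_back(arg);
  }

  args.swap(rest);
  return opts;
}
```

Stripping the recognized options keeps the application's own argument
parsing unaware of them, which is the point of profiling without
changing the app.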

				-- Jules

 >
 > # mode: pm_accum
 > # timer: x86_64_tsc_time
 > # clocks_per_sec: 3591371008
 > #
 > # tag : total ticks : num calls : op count : mflops
 > fwd fft, cplx-cplx, rbv : 102141 : 1 : 51200 : 1800.24
 > inv fft, cplx-cplx, rbv : 95490 : 1 : 51200 : 1925.63
 >
 > Note: rbv = return by value.  The others should be readable.  Full
 > documentation will follow soon.

Can you add the FFT size to the tag?
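
For illustration, the size could simply be appended to the existing
tag.  The op count of 51200 above is consistent with a 1024-point FFT
under the 5 * N * log2(N) formula used in the patch.  make_fft_tag is
a hypothetical helper and the ", <size>" suffix is only one possible
format.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: appends the FFT size to an existing tag, e.g.
// "fwd fft, cplx-cplx, rbv" becomes "fwd fft, cplx-cplx, rbv, 1024".
inline std::string
make_fft_tag(std::string const& base, std::size_t size)
{
  assert(!base.empty());
  std::ostringstream tag;
  tag << base << ", " << size;
  return tag.str();
}
```

A comma keeps the size inside the tag field, since the output format
above separates fields with " : ".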


 > Index: src/vsip/impl/fft.hpp
 > ===================================================================
 > --- src/vsip/impl/fft.hpp	(revision 145051)
 > +++ src/vsip/impl/fft.hpp	(working copy)
 > @@ -73,7 +73,7 @@
 >    typedef typename impl::Scalar_of<I>::type scalar_type;
 >
 >    length_type
 > -  op_count(length_type len)
 > +  op_count(length_type len) const
 >    {
 >      length_type ops =
 >        static_cast<length_type>(5 * len * std::log((float)len) / std::log(2.f));
 > @@ -81,11 +81,14 @@
 >      return ops;
 >    }
 >
 > -  base_interface(Domain<D> const &dom, scalar_type scale)
 > +  base_interface(Domain<D> const &dom, scalar_type scale, std::string event_tag)

Does base_interface have enough context to figure out the event name
by itself?  If not, it might be worth passing the extra info (i.e.
adding a template parameter for by_value vs by_reference).  That would
make it easier to limit the impact of the profiling when it is turned
off.
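
A sketch of the template-parameter idea: the return mechanism becomes a
tag type and the event name is derived from it, so no string has to be
passed in.  All names here (by_value_tag, by_reference_tag,
return_abbrev, fft_event_name) are hypothetical, and "rbr" is an
assumed abbreviation by analogy with "rbv".

```cpp
#include <cassert>
#include <string>

// Hypothetical tag types for the return mechanism.
struct by_value_tag {};
struct by_reference_tag {};

// Maps a tag type to the abbreviation used in the event name.
template <typename R> struct return_abbrev;

template <> struct return_abbrev<by_value_tag>
{ static char const* value() { return "rbv"; } };

template <> struct return_abbrev<by_reference_tag>
{ static char const* value() { return "rbr"; } };  // assumed abbreviation

// Builds a name such as "fwd fft, cplx-cplx, rbv" from information
// the FFT object already has.
template <typename R>
std::string
fft_event_name(bool forward, char const* types)
{
  std::string name = forward ? "fwd fft" : "inv fft";
  name += ", ";
  name += types;
  name += ", ";
  name += return_abbrev<R>::value();
  return name;
}
```

Because the name is only built where the profiling code asks for it,
a build with profiling disabled never has to construct the string.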

 >      : input_size_(io_size<D, I, O, A>::size(dom)),
 >        output_size_(io_size<D, O, I, A>::size(dom)),
 > -      scale_(scale)
 > -  {}
 > +      scale_(scale), event_(event_tag)
 > +  {
 > +    // Pre-compute the FLOP count.  Used for event profiling (if enabled).
 > +    event_.ops(op_count(this->input_size_.size()));

Why not pass the op count as an argument to the 'Profile_event'
constructor?
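
Something along these lines, as a simplified sketch.  Event and
Fft_base are cut-down stand-ins for Profile_event and base_interface,
and fft_op_count is a hypothetical free function.  It uses std::log2
with rounding rather than the log(len)/log(2) expression in the patch.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Simplified stand-in for Profile_event.
class Event
{
public:
  Event(std::string const& name, unsigned ops_count = 0)
    : name_(name), ops_(ops_count)
  {}

  unsigned ops() const { return ops_; }

private:
  std::string name_;
  unsigned    ops_;
};

// The op count depends only on the length, so it can be a free (or
// static) function that is safe to call from a member initializer.
inline unsigned
fft_op_count(unsigned len)
{
  assert(len > 0);
  double const ops = 5.0 * len * std::log2(static_cast<double>(len));
  return static_cast<unsigned>(std::lround(ops));
}

// Simplified stand-in for base_interface.
class Fft_base
{
public:
  Fft_base(unsigned len, std::string const& event_tag)
    : size_(len),
      event_(event_tag, fft_op_count(len))  // op count set at construction
  {}

  Event const& event() const { return event_; }

private:
  unsigned size_;
  Event    event_;
};
```

This removes the need for the ops() setter and leaves the constructor
body empty again.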

 > +  }


 > +class Profile_event
 > +{
 > +  typedef DefaultTime    TP;
 > +
 > +public:
 > +  Profile_event(std::string name, unsigned int ops_count = 0)
 > +    : name_(name), ops_(ops_count)
 > +  {}
 > +
 > +  ~Profile_event() {}
 > +
 > +  void ops(unsigned int ops_count) { ops_ = ops_count; }
 > +
 > +  const char* name() const { return name_.c_str(); }
 > +  unsigned int ops() const { return ops_; }
 > +  float total() const { return TP::seconds(prof->raw_total(this->name_.c_str())); }
 > +  int count() const { return prof->count(this->name_.c_str()); }
 > +  float mflops() const { return (prof->count(this->name_.c_str()) * this->ops_) /
 > +                         (1e6 * this->total()); }
 > +
 > +private:
 > +  std::string name_;
 > +  unsigned int ops_;
 > +};

Profile_event should keep track of its accumulated time.  The above
approach has two problems:
  - Profile_event (and hence impl_performance) will only work when
    profiling is turned on in the pm_accum mode.
  - Objects with the same tag will confound each other's impl_performance
    results.

One way to have Profile_event keep track of its own time and use the
same timestamp for profiling is to have Profile_event call
TP::sample() and then call a Profile::raw_event() function that is
similar to Profile::event() but takes a time sample.
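
A minimal sketch of that arrangement, under several assumptions:
Manual_timer stands in for the timer policy (TP) with a clock that is
set by hand, Profiler stands in for the real Profile class, and the
start()/stop() interface is invented for the example.  Only the idea of
sampling once and forwarding the stamp through raw_event() comes from
the paragraph above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical timer policy; the clock is advanced by the caller.
struct Manual_timer
{
  typedef std::uint64_t stamp_type;
  static stamp_type& clock() { static stamp_type t = 0; return t; }
  static void sample(stamp_type& t) { t = clock(); }
};

// Hypothetical profiler.  raw_event() records a caller-supplied stamp
// instead of reading the timer itself.
template <typename TP>
class Profiler
{
public:
  struct Record { std::string name; typename TP::stamp_type stamp; };

  Profiler() : enabled(false) {}

  void raw_event(std::string const& name, typename TP::stamp_type stamp)
  {
    if (enabled) { Record r = { name, stamp }; records.push_back(r); }
  }

  bool                enabled;
  std::vector<Record> records;
};

template <typename TP>
class Profile_event
{
  typedef typename TP::stamp_type stamp_type;

public:
  Profile_event(std::string const& name, Profiler<TP>& prof, unsigned ops = 0)
    : name_(name), prof_(prof), ops_(ops),
      start_(0), total_(0), count_(0), running_(false)
  {}

  void start()
  {
    TP::sample(start_);              // one sample ...
    prof_.raw_event(name_, start_);  // ... shared with the profiler
    running_ = true;
  }

  void stop()
  {
    assert(running_);
    stamp_type end;
    TP::sample(end);
    prof_.raw_event(name_, end);
    total_ += end - start_;          // accumulated per object, not per tag
    ++count_;
    running_ = false;
  }

  stamp_type total() const { return total_; }
  unsigned   count() const { return count_; }

  double mflops(double clocks_per_sec) const
  {
    if (total_ == 0) return 0.0;
    double const seconds = total_ / clocks_per_sec;
    return (static_cast<double>(count_) * ops_) / (1e6 * seconds);
  }

private:
  std::string   name_;
  Profiler<TP>& prof_;
  unsigned      ops_;
  stamp_type    start_;
  stamp_type    total_;
  unsigned      count_;
  bool          running_;
};
```

In this sketch two events with the same tag keep separate totals, and
the totals are available even when the profiler is disabled, which
addresses both problems listed above.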



-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


