[vsipl++] [patch] Profile_event class

Mon Jul 17 21:38:10 UTC 2006

Jules Bergmann wrote:
...

> Also, we should start thinking about how this should be configured
> and controlled.
> 
I like the configuration suggestions so far, but I would like to put the 
one below off until the basics are implemented.  For the record, though, 
I see only a very minor benefit in being able to selectively turn these 
categories on and off.

>  - What types of events are profiled (by broad categories).
>    Categories:
>     - objects: signal processing objects and linear algebra solvers
>     - matvec: linear algebra
>     - expr: element-wise expressions
>     - comm: communications
>     - user events
> 
>    New option: --with-profile-cat={no,objects,matvec,expr,user,all}
> 
>    Default is {no}.
> 

>  - Whether performance API (impl_performance) is enabled:
>    New option --[enable,disable]-performance-api
> 
Same here, the fewer options, the better...

>     --vsipl++-profile-mode={accum,trace,off}
>     --vsipl++-profile-output={filename}
> 
>    Probably most useful for tracing very small programs and accumulating
>    larger programs.

Ok.  Some benefit here.  Sounds like you're already willing to put this 
off as a future enhancement.

>  > # mode: pm_accum
>  > # timer: x86_64_tsc_time
>  > # clocks_per_sec: 3591371008
>  > #
>  > # tag : total ticks : num calls : op count : mflops
>  > fwd fft, cplx-cplx, rbv : 102141 : 1 : 51200 : 1800.24
>  > inv fft, cplx-cplx, rbv : 95490 : 1 : 51200 : 1925.63
>  >
>  > Note: rbv = return by value.  The others should be readable.  Full
>  > documentation will follow soon.
> 
> Can you add the FFT size to the tag?
> 
I was going to propose adding a second 'value' field.  We had one that 
we kind of took over for the op count.  Why not just add one or two more 
fields and make them general-purpose?  FFTM could put rows and cols, 
etc.  Other routines could put whatever was most relevant...

In any case, I don't want to add it to the tag because it is a useful 
numerical value, so we should give it first-class status so a 
post-processing program can access it more easily.  Plus, I wanted to 
keep the tags as short as possible as we use them for searching (maybe 
this doesn't matter so much though).  I also considered removing the 
spaces and making it more compact, yet more cryptic.  What's the right 
balance here between short-and-cryptic and long-but-readable?
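
To make the extra-fields idea concrete, here is a minimal sketch, not the 
actual profiler code.  The names Accum_record, value1, value2, mflops() 
and format_record() are hypothetical, and the value 1024 used below is 
only illustrative.  The mflops formula is an assumption (op count times 
calls, divided by elapsed seconds, divided by 1e6), though it does 
reproduce the 1800.24 shown in the sample output.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical accumulate-mode record: the existing fields plus two
// general-purpose values whose meaning is left to the caller.
struct Accum_record
{
  std::string   tag;
  std::uint64_t ticks;   // total timer ticks
  std::uint64_t calls;   // number of calls
  std::uint64_t ops;     // op count (assumed here to be per call)
  std::uint64_t value1;  // e.g. FFT size, or FFTM rows
  std::uint64_t value2;  // e.g. FFTM cols; 0 if unused
};

// Assumed formula: total ops / elapsed seconds / 1e6.
inline double mflops(Accum_record const& r, std::uint64_t clocks_per_sec)
{
  assert(r.ticks > 0 && clocks_per_sec > 0);
  double seconds = static_cast<double>(r.ticks)
                 / static_cast<double>(clocks_per_sec);
  return static_cast<double>(r.ops) * static_cast<double>(r.calls)
         / seconds / 1e6;
}

// Writes one line in the same ' : '-separated style as the sample
// output, with the two extra values placed before the mflops column.
inline std::string format_record(Accum_record const& r,
                                 std::uint64_t clocks_per_sec)
{
  std::ostringstream out;
  out << r.tag << " : " << r.ticks << " : " << r.calls << " : " << r.ops
      << " : " << r.value1 << " : " << r.value2 << " : "
      << std::fixed << std::setprecision(2) << mflops(r, clocks_per_sec);
  return out.str();
}
```

With this layout a post-processing program can split each line on ' : ' 
and read the size as a number, rather than parsing it out of the tag.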

> Does base_interface have enough context to figure out the event name
> by itself?  If not, it might be worth passing the extra info (i.e.
> adding a template parameter for by_value vs by_reference).  That would
> make it easier to limit the impact of the profiling when it is turned
> off.
> 
I'll look at this again; that may be the case.  But I'm not sure how it 
affects performance.  Can you explain?  Can we afford the slight 
increase in cost for this, given that it takes place when the Fft object 
is constructed?
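
For reference, one possible reading of the template-parameter suggestion 
is sketched below.  Everything here is hypothetical: the tag types 
by_value and by_reference, the Return_mode_name trait, the class 
Base_interface_sketch, and the abbreviation "rbr" (only "rbv" appears in 
the output above).  The point is just that the return mode becomes part 
of the type, so the base class can build the event name without being 
handed a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag types standing in for the proposed template parameter.
struct by_value {};
struct by_reference {};

// Maps each tag type to the abbreviation used in the event name.
template <typename ReturnMode> struct Return_mode_name;

template <> struct Return_mode_name<by_value>
{
  static char const* name() { return "rbv"; }
};

template <> struct Return_mode_name<by_reference>
{
  static char const* name() { return "rbr"; }  // assumed abbreviation
};

// Stand-in for base_interface: it composes the tag from what it
// already knows (direction) plus the compile-time return mode.
template <typename ReturnMode>
class Base_interface_sketch
{
public:
  explicit Base_interface_sketch(bool forward)
    : tag_(std::string(forward ? "fwd" : "inv")
           + " fft, cplx-cplx, "
           + Return_mode_name<ReturnMode>::name())
  {}

  std::string const& tag() const { return tag_; }

private:
  std::string tag_;
};
```

In this sketch the string is still built once per object, at 
construction time, so the cost question above applies to it as well.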

>  >      : input_size_(io_size<D, I, O, A>::size(dom)),
>  >        output_size_(io_size<D, O, I, A>::size(dom)),
>  > -      scale_(scale)
>  > -  {}
>  > +      scale_(scale), event_(event_tag)
>  > +  {
>  > +    // Pre-compute the FLOP count.  Used for event profiling (if enabled).
>  > +    event_.ops(op_count(this->input_size_.size()));
> 
> Why not pass the op count as an argument to the 'Profile_event'
> constructor?
> 
There was a problem with that at the time.  I'll need to try it again to 
see what the exact error was.
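
As a sketch of what the constructor-argument version could look like 
(hypothetical names throughout: Profile_event_sketch, Fft_sketch, 
op_count_sketch), see below.  The op count formula is an assumption: it 
uses the conventional 5*N*log2(N) estimate for a complex FFT, which 
would give 51200 for N = 1024, but the real op_count() is not shown in 
this thread.  One thing to watch for with this approach, which may or 
may not be related to the earlier error, is that C++ initializes members 
in declaration order, so the size member has to be declared before the 
event member.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical event class that takes the op count up front.
class Profile_event_sketch
{
public:
  Profile_event_sketch(std::string tag, std::size_t ops)
    : tag_(std::move(tag)), ops_(ops)
  {}

  std::string const& tag() const { return tag_; }
  std::size_t        ops() const { return ops_; }

private:
  std::string tag_;
  std::size_t ops_;
};

// Assumed estimate: 5 * n * floor(log2(n)).  Returns 0 for n < 2.
inline std::size_t op_count_sketch(std::size_t n)
{
  std::size_t log2n = 0;
  while ((std::size_t(1) << (log2n + 1)) <= n)
    ++log2n;
  return 5 * n * log2n;
}

class Fft_sketch
{
public:
  Fft_sketch(std::size_t size, std::string tag)
    : input_size_(size),
      // Safe only because input_size_ is declared before event_ below.
      event_(std::move(tag), op_count_sketch(input_size_))
  {}

  Profile_event_sketch const& event() const { return event_; }

private:
  std::size_t          input_size_;  // must precede event_
  Profile_event_sketch event_;
};
```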

> 
> Profile_event should keep track of its accumulated time.  The above
> approach has two problems:
>  - Profile_event (and hence impl_performance) will only work when
>    profiling is turned on in the pm_accum mode.
>  - Objects with the same tag will confound each other's impl_performance
>    results.
> 
Both can be perceived as benefits -- at least I viewed them that way!

My argument would be, first, that it should not do any profiling, or at 
least should minimize the effects of the profiling code, when profiling 
is not enabled.  Secondly, having the same underlying interface to both 
is good because it is simpler.  Finally, objects having the same tag are 
doing the same kind of work.  If the user desires, however, they may 
profile each separately by using different log files, or (if in the same 
scope) by using the profiler in trace mode.
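
To illustrate what accumulating by tag means in practice, here is a 
minimal sketch.  Accum_entry and Accum_table_sketch are hypothetical 
names, not the existing profiler classes, and the tick values are simply 
borrowed from the sample output.  Two objects that report under the same 
tag end up in a single entry, which is the sharing being discussed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Totals kept per tag in accumulate mode.
struct Accum_entry
{
  std::uint64_t ticks = 0;
  std::uint64_t calls = 0;
};

// Hypothetical accumulate-mode table keyed by event tag.
class Accum_table_sketch
{
public:
  // Adds one timed call to the entry for 'tag', creating it if needed.
  void record(std::string const& tag, std::uint64_t ticks)
  {
    Accum_entry& e = table_[tag];
    e.ticks += ticks;
    e.calls += 1;
  }

  // Returns the totals for 'tag', or a zeroed entry if it is unknown.
  Accum_entry lookup(std::string const& tag) const
  {
    auto it = table_.find(tag);
    return it == table_.end() ? Accum_entry() : it->second;
  }

private:
  std::map<std::string, Accum_entry> table_;
};
```

If two Fft objects each call record("fwd fft, cplx-cplx, rbv", ...), 
lookup() returns their combined ticks and a call count of 2.  Keeping 
them apart would need either distinct tags or a per-object total held 
alongside the shared table.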

I'm not overly attached to the above implementation.  Either way is 
good, and now is a good time to decide.  Maybe I've missed something 
about how you see impl_performance() being used.  I'm looking at it as 
a different interface into the same basic set of profiling features.  
Do others have thoughts on this?

Thanks for the feedback!

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712