[vsipl++] [patch] Profile_event class
Don McCoy
don at codesourcery.com
Mon Jul 17 21:38:10 UTC 2006
Jules Bergmann wrote:
...
> Also, we should start thinking about how this should be configured
> and controlled.
>
I like the configuration suggestions so far, but would like to put the
one below off until the basic stuff is implemented. For the record
though, I see only a very minor benefit to being able to selectively
turn these on and off.
> - What types of events are profiled (by broad categories).
> Categories:
> - objects: signal processing objects and linear algebra solvers
> - matvec: linear algebra
> - expr: element-wise expressions
> - comm: communications
> - user events
>
> New option: --with-profile-cat={no,objects,matvec,expr,user,all}
>
> Default is {no}.
>
> - Whether performance API (impl_performance) is enabled:
> New option --[enable,disable]-performance-api
>
Same here, the fewer options, the better...
> --vsipl++-profile-mode={accum,trace,off}
> --vsipl++-profile-output={filename}
>
> Probably most useful for tracing very small programs and accumulating
> larger programs.
Ok. Some benefit here. Sounds like you're already willing to put this
off as a future enhancement.
> > # mode: pm_accum
> > # timer: x86_64_tsc_time
> > # clocks_per_sec: 3591371008
> > #
> > # tag : total ticks : num calls : op count : mflops
> > fwd fft, cplx-cplx, rbv : 102141 : 1 : 51200 : 1800.24
> > inv fft, cplx-cplx, rbv : 95490 : 1 : 51200 : 1925.63
> >
> > Note: rbv = return by value. The others should be readable. Full
> > documentation will follow soon.
>
> Can you add the FFT size to the tag?
>
I was going to propose adding a second 'value' field. We had one that
we kind of took over for the op count. Why not just add one or two more
fields and make them general-purpose? FFTM could put rows and cols,
etc. Other routines could put whatever was most relevant...
In any case, I don't want to add it to the tag because it is a useful
numerical value, so we should give it first-class status so a
post-processing program can access it more easily. Plus, I wanted to
keep the tags as short as possible as we use them for searching (maybe
this doesn't matter so much though). I also considered removing the
spaces and making it more compact, yet more cryptic. What's the right
balance here between short-and-cryptic and long-but-readable?
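To make the post-processing point concrete, here is a rough sketch of
how a separate program might read one record of the accum output shown
above and recompute the mflops column. Profile_record, trim,
parse_record and mflops are hypothetical names made up for this
example; none of them are part of the library. The sketch assumes
C++11, assumes the tag never contains a ':' itself, and assumes the op
count covers all of the recorded ticks (the sample only has one call
per tag, so it cannot tell per-call from total).

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the fields follow the column header in the
// sample output, not any actual VSIPL++ class.
struct Profile_record
{
  std::string        tag;
  unsigned long long ticks;
  unsigned long      calls;
  unsigned long long ops;
};

// Strip leading and trailing blanks from one field.
inline std::string trim(std::string const& s)
{
  std::string::size_type b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return std::string();
  std::string::size_type e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Split "tag : ticks : calls : ops : mflops" on ':' and convert the
// numeric fields. Assumption: the tag does not contain ':'.
inline Profile_record parse_record(std::string const& line)
{
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, ':'))
    fields.push_back(trim(field));
  assert(fields.size() >= 4);

  Profile_record r;
  r.tag   = fields[0];
  r.ticks = std::strtoull(fields[1].c_str(), nullptr, 10);
  r.calls = std::strtoul(fields[2].c_str(), nullptr, 10);
  r.ops   = std::strtoull(fields[3].c_str(), nullptr, 10);
  return r;
}

// mflops = ops / seconds / 1e6, where seconds = ticks / clocks_per_sec.
inline double mflops(Profile_record const& r, double clocks_per_sec)
{
  assert(r.ticks > 0);
  double seconds = static_cast<double>(r.ticks) / clocks_per_sec;
  return static_cast<double>(r.ops) / seconds / 1e6;
}
```

A size that is buried in the tag would need extra string parsing at
this step, whereas an additional numeric field would just be one more
strtoull call.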
> Does base_interface have enough context to figure out the event name
> by itself? If not, it might be worth passing the extra info (i.e.
> adding a template parameter for by_value vs by_reference). That would
> make it easier to limit the impact of the profiling when it is turned
> off.
>
I'll look at this again; that may be the case. But I'm not sure how it
affects performance. Can you explain? Can we afford the slight
increase in cost for this, since it takes place when the Fft object
is constructed?
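For discussion, here is one possible reading of the template-parameter
suggestion, as a sketch only. Everything in it is hypothetical: the
by_value / by_reference tag types, Tag_suffix, Base_interface_sketch,
and the "rbr" suffix are stand-ins invented for illustration (only
"rbv" appears in the current output). The idea is that the return
mechanism selects the tag suffix at compile time, and a second
parameter lets the disabled case build no string at all.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag types for the two return mechanisms.
struct by_value {};
struct by_reference {};

// Map each mechanism to the suffix used in the event tag.
// "rbr" is an assumed name; only "rbv" exists in the current output.
template <typename ReturnMechanism> struct Tag_suffix;

template <> struct Tag_suffix<by_value>
{ static char const* value() { return "rbv"; } };

template <> struct Tag_suffix<by_reference>
{ static char const* value() { return "rbr"; } };

// Stand-in for the real base class; only the naming logic is shown.
template <typename ReturnMechanism, bool ProfilingEnabled>
struct Base_interface_sketch
{
  static std::string event_name(char const* direction)
  {
    return std::string(direction) + " fft, cplx-cplx, "
           + Tag_suffix<ReturnMechanism>::value();
  }
};

// With profiling disabled, no tag string is ever assembled.
template <typename ReturnMechanism>
struct Base_interface_sketch<ReturnMechanism, false>
{
  static std::string event_name(char const*) { return std::string(); }
};
```

In this form the only cost left in the enabled case is building the
string once per object, and the disabled specialization leaves nothing
for the constructor to do.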
> > : input_size_(io_size<D, I, O, A>::size(dom)),
> > output_size_(io_size<D, O, I, A>::size(dom)),
> > - scale_(scale)
> > - {}
> > + scale_(scale), event_(event_tag)
> > + {
> > + // Pre-compute the FLOP count. Used for event profiling (if enabled).
> > + event_.ops(op_count(this->input_size_.size()));
>
> Why not pass the op count as an argument to the 'Profile_event'
> constructor?
>
There was a problem with that at the time. I'll need to try it again to
see what the exact error was.
>
> Profile_event should keep track of its accumulated time. The above
> approach has two problems:
> - Profile_event (and hence impl_performance) will only work when
> profiling is turned on in the pm_accum mode.
> - Objects with the same tag will confound each other's impl_performance
> results.
>
Both can be perceived as benefits -- at least I viewed them that way!
My argument would be that it should not do any profiling, or at least
should minimize the effects of the profiling code, when profiling is not
enabled. Secondly, having the same underlying interface to both is good
because it is simpler. Finally, objects having the same tag are doing
the same kind of work. If, however, the user desires, they may profile
each separately by using different log files, or (if in the same scope)
using the profiler in trace mode.
I'm not overly attached to the above implementation. Either way is
good, and now is a good time to decide. Maybe I've missed something
about how you see impl_performance() being used. I'm
looking at it as a different interface into the same basic set of
profiling features. Do others have thoughts on this?
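To make the two options easier to compare, here is a small sketch that
keeps both totals side by side. Tag_table and Event_sketch are
hypothetical names, not the classes in the patch: the table stands in
for the accumulate-by-tag behaviour, and own_total() stands in for an
event that tracks its own time.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef unsigned long long ticks_type;

// Shared table keyed by tag (stand-in for accumulate mode).
struct Tag_table
{
  std::map<std::string, ticks_type> total;
  void add(std::string const& tag, ticks_type t) { total[tag] += t; }
};

// Hypothetical event that records into the shared table and also
// keeps a running total of its own.
class Event_sketch
{
public:
  Event_sketch(std::string const& tag, Tag_table& table)
    : tag_(tag), table_(table), own_total_(0) {}

  void record(ticks_type t)
  {
    own_total_ += t;       // per-object accumulation
    table_.add(tag_, t);   // per-tag accumulation
  }

  // Time spent in this object only.
  ticks_type own_total() const { return own_total_; }

  // Time spent in every object that shares this tag.
  ticks_type shared_total() const
  {
    std::map<std::string, ticks_type>::const_iterator it =
      table_.total.find(tag_);
    return it == table_.total.end() ? 0 : it->second;
  }

private:
  std::string tag_;
  Tag_table&  table_;
  ticks_type  own_total_;
};
```

With two objects that share a tag, shared_total() reports the sum for
both while own_total() stays separate, which is the difference under
discussion.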
Thanks for the feedback!
Regards,
--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712