[patch] Serial Expression Profiling
Don McCoy
don at codesourcery.com
Wed Aug 9 20:26:16 UTC 2006
The attached patch extends the profiling further by handling some of the
dispatched expression evaluations. The three specific cases covered are:
* Loop fusion - collapsing multiple loops into one when doing
element-wise operations on views.
* Dense expressions - converting tightly-packed 2-D and 3-D views
into 1-D views that are then evaluated normally.
* Matrix transpose - transposing matrices with possibly different
storage formats (row/col)
This can conceivably be extended to cover cases where we are dispatching
to IPP and SAL as well.
All expressions are tagged in the profiler output with "Expr[/type/]",
where type is LF, Dense or Trans. Following that is the dimensionality
(1D, 2D or 3D), a compact representation of the expression and finally
the size(s). For example, the following expression (where all are the
same size and of type Vector<T>):
r = v1 * v2;
Gets logged as:
Expr[LF] 1D *SS 262144 : 66929535 : 1 : 262144 : 14.0664
The expression is represented as "*SS", meaning "the binary multiply
operator applied to two single-precision real values" (again using the
BLAS/LAPACK convention of S/D/C/Z for operand types).
In general, operators are designated with a 'u', 'b' or 't' for unary,
binary and ternary operators respectively, with the exception of the
common binary operators, shown in their more familiar +-*/ form.
Multiple operators are evaluated in order, therefore
v1 * T(4) + v2 / v3
is tagged as:
Expr[LF] 1D *SS/SS+SS 2048 : 1527534 : 1 : 6144 : 14.4451
Changing it to
(v1 * T(4) + v2) / v3
yields:
Expr[LF] 1D *SS+SS/SS 2048 : 1536309 : 1 : 6144 : 14.3626
Dense expressions will appear twice in the profiler output -- once when
it is converted from a 2- or 3-D view and once when evaluated as a 1-D
expression. They do, in fact, refer to the same expression. For example:
Expr[Dense] 3D *SS 64x64x64 : 67455693 : 1 : 262144 : 13.9567
Expr[LF] 1D *SS 262144 : 66991743 : 1 : 262144 : 14.0533
Note that the dense evaluation includes the time it takes to perform the
loop fusion evaluation, hence the slightly longer amount of time spent
there. However, the time difference is probably dominated by the amount
of time it takes to generate the tag itself. Note also that the sizes
are reported differently, but are equivalent as 64x64x64=262144
Finally, please note that not all the operation counts are done at this
point. Missing ones should probably be counted in some fashion.
Currently, if an operator is not handled, it defaults to adding zero ops
to the total count.
Regards,
--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: se1.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060809/207e9608/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: se1.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060809/207e9608/attachment-0001.ksh>
More information about the vsipl++
mailing list