From don at codesourcery.com  Sun Sep  3 00:48:39 2006
From: don at codesourcery.com (Don McCoy)
Date: Sat, 02 Sep 2006 18:48:39 -0600
Subject: [patch] Profiling configuration change
Message-ID: <44FA2667.50404@codesourcery.com>

This patch changes the way users will compile their programs when using 
profiling.  Before, configuration options were needed.  Now user 
programs will be compiled with a macro that serves the same purpose.

Use -DVSIP_IMPL_PROFILER=15 to enable profiling of all operations.  The 
value 15 (0x0F) is a mask composed of bits now defined in 
impl/profile.hpp.  Each bit corresponds to a set of operations as before.
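
A minimal sketch of how such a mask is put together and queried. It assumes the per-area bit values given in the profiling readme (signal = 1, matvec = 2, fns = 4, user = 8). The enumerator and function names, and the default of 0 for an undefined macro, are hypothetical; the real definitions live in impl/profile.hpp and may differ.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical names; only the bit values follow the readme's table.
enum Profile_area : unsigned
{
  area_signal = 1u << 0,  // signal processing (FFT, FIR, ...)
  area_matvec = 1u << 1,  // linear algebra
  area_fns    = 1u << 2,  // element-wise functions
  area_user   = 1u << 3   // user-defined events
};

// Assumption for this sketch: no -DVSIP_IMPL_PROFILER means profiling is off.
#ifndef VSIP_IMPL_PROFILER
#  define VSIP_IMPL_PROFILER 0
#endif

constexpr unsigned active_profile_mask = VSIP_IMPL_PROFILER;

// Combine areas into the value to pass as -DVSIP_IMPL_PROFILER=<mask>.
// OR-ing distinct single bits gives the same result as summing them,
// and it also tolerates an area being listed twice.
inline unsigned make_mask(std::initializer_list<Profile_area> areas)
{
  unsigned mask = 0;
  for (Profile_area a : areas)
  {
    assert(a != 0 && (a & (a - 1)) == 0);  // each area is exactly one bit
    mask |= a;
  }
  return mask;
}

// True when 'area' is selected in 'mask'.
inline bool area_enabled(unsigned mask, Profile_area area)
{
  return (mask & area) != 0;
}
```

For example, make_mask({area_user, area_signal}) yields 9, so compiling with -DVSIP_IMPL_PROFILER=9 would select only the signal and user areas.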

It is not necessary to define anything when building the library (other 
than a timer), so both the debug and release binary builds will be 
suitable for use with profiling.

Regards,

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pe.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060902/2e4491c4/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pe.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060902/2e4491c4/attachment-0001.ksh>

From jules at codesourcery.com  Sun Sep  3 00:54:45 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sat, 02 Sep 2006 20:54:45 -0400
Subject: [vsipl++] [patch] Profiling configuration change
In-Reply-To: <44FA2667.50404@codesourcery.com>
References: <44FA2667.50404@codesourcery.com>
Message-ID: <44FA27D5.6050708@codesourcery.com>

Don McCoy wrote:
> This patch changes the way users will compile their programs when using 
> profiling.  Before, configuration options were needed.  Now user 
> programs will be compiled with a macro that serves the same purpose.
> 
> Use -DVSIP_IMPL_PROFILER=15 to enable profiling of all operations.  The 
> value 15 (0x0F) is a mask composed of bits now defined in 
> impl/profile.hpp.  Each bit corresponds to a set of operations as before.
> 
> It is not necessary to define anything when building the library (other 
> than a timer), so both the debug and release binary builds will be 
> suitable for use with profiling.
> 
> Regards,

Don, this looks good, please check it in.  thanks, -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Sun Sep  3 19:53:17 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 03 Sep 2006 13:53:17 -0600
Subject: [vsipl++] Readme for Profiling
In-Reply-To: <44EB5004.1090409@codesourcery.com>
References: <44EB5004.1090409@codesourcery.com>
Message-ID: <44FB32AD.8050707@codesourcery.com>

This has been updated to reflect recent changes in how we enable profiling.

Don McCoy wrote:
> This 'readme' file is referred to in the tutorial section on 
> profiling, meant to reside in the top-level directory of the source 
> distribution. It serves as a place to put implementation details that 
> would otherwise clutter the tutorial.  It also makes a nice handy 
> mini-reference.
>
> In the near future I'd like to add some more details regarding each of 
> the objects or events we profile internally.  This may make it more 
> clear how to determine which events are "nested" (i.e. listed by more 
> than one expression evaluator).
>

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pr2.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060903/6378cd97/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pr2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060903/6378cd97/attachment-0001.ksh>

From don at codesourcery.com  Tue Sep  5 19:00:35 2006
From: don at codesourcery.com (Don McCoy)
Date: Tue, 05 Sep 2006 13:00:35 -0600
Subject: [patch] SIMD 'rscvmul' evaluators
Message-ID: <44FDC953.4030509@codesourcery.com>

This patch corrects a problem where two of the SIMD evaluators were not 
handling re-dimensioned (2-D --> 1-D) views correctly.  This only 
affects the case where an element-wise multiplication is being performed 
with a real scalar and a complex view (and the view was re-dim'd).

It is worth mentioning also that this defect was uncovered using the 
profiling features recently added.   In general, the evaluators were 
doing the right thing, but when they did not, it fell back to using loop 
fusion, thereby still getting the correct answer (just not taking 
advantage of SIMD instructions). 
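
To make the operation concrete: multiplying a complex view by a real scalar only scales each real and imaginary component, so on interleaved storage it reduces to a real multiply over twice as many floats. A dense 2-D view can also be walked as a single 1-D run of rows*cols elements. The sketch below assumes contiguous std::complex<float> storage. rscvmul and rscvmul_dense_2d are hypothetical names, not the library's SIMD evaluators.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical sketch: z[i] = s * x[i] for a real scalar s and complex x.
// Since s * (a + bi) == (s*a) + (s*b)i, the complex array can be walked as
// 2*n floats; std::complex<float> is guaranteed to be laid out as
// {real, imag}, so this reinterpretation is well-defined.
inline void rscvmul(float s, std::complex<float> const* x,
                    std::complex<float>* z, std::size_t n)
{
  float const* xr = reinterpret_cast<float const*>(x);
  float*       zr = reinterpret_cast<float*>(z);
  for (std::size_t i = 0; i < 2 * n; ++i)  // plain loop, easy to vectorize
    zr[i] = s * xr[i];
}

// A dense rows x cols matrix stored contiguously is processed as one
// 1-D run of rows*cols elements (the "re-dimensioned" 2-D -> 1-D case).
inline void rscvmul_dense_2d(float s,
                             std::vector<std::complex<float>> const& x,
                             std::vector<std::complex<float>>& z,
                             std::size_t rows, std::size_t cols)
{
  assert(x.size() == rows * cols && z.size() == rows * cols);
  rscvmul(s, x.data(), z.data(), rows * cols);
}
```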

Regards,

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sl.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060905/244891b9/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sl.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060905/244891b9/attachment-0001.ksh>

From mark at codesourcery.com  Tue Sep  5 20:08:43 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Tue, 05 Sep 2006 13:08:43 -0700
Subject: [vsipl++] [patch] SIMD 'rscvmul' evaluators
In-Reply-To: <44FDC953.4030509@codesourcery.com>
References: <44FDC953.4030509@codesourcery.com>
Message-ID: <44FDD94B.4070900@codesourcery.com>

Don McCoy wrote:
> This patch corrects a problem where two of the SIMD evaluators were not 
> handling re-dimensioned (2-D --> 1-D) views correctly.  This only 
> affects the case where an element-wise multiplication is being performed 
> with a real scalar and a complex view (and the view was re-dim'd).
> 
> It is worth mentioning also that this defect was uncovered using the 
> profiling features recently added.   In general, the evaluators were 
> doing the right thing, but when they did not, it fell back to using loop 
> fusion, thereby still getting the correct answer (just not taking 
> advantage of SIMD instructions).

This is very nice on several levels: nice that the profilers found the 
problem, nice that the bug was performance degradation rather than 
failure, and nice that you were able to easily fix it!

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From jules at codesourcery.com  Tue Sep  5 20:38:25 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 05 Sep 2006 16:38:25 -0400
Subject: [vsipl++] [patch] SIMD 'rscvmul' evaluators
In-Reply-To: <44FDC953.4030509@codesourcery.com>
References: <44FDC953.4030509@codesourcery.com>
Message-ID: <44FDE041.3000802@codesourcery.com>

Don McCoy wrote:
> This patch corrects a problem where two of the SIMD evaluators were not 
> handling re-dimensioned (2-D --> 1-D) views correctly.  This only 
> affects the case where an element-wise multiplication is being performed 
> with a real scalar and a complex view (and the view was re-dim'd).

Don, this looks good, please check it in.  thanks -- Jules

> 
> It is worth mentioning also that this defect was uncovered using the 
> profiling features recently added.   In general, the evaluators were 
> doing the right thing, but when they did not, it fell back to using loop 
> fusion, thereby still getting the correct answer (just not taking 
> advantage of SIMD instructions).

This is good stuff!

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Tue Sep  5 21:54:35 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 05 Sep 2006 17:54:35 -0400
Subject: PAS support for split-complex
Message-ID: <44FDF21B.2040000@codesourcery.com>

This patch adds split-complex support (and some tests) to the PAS
parallel services.

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pas-split.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060905/1eed0285/attachment.ksh>

From jules at codesourcery.com  Wed Sep  6 15:19:58 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 06 Sep 2006 11:19:58 -0400
Subject: [vsipl++] Readme for Profiling
In-Reply-To: <44FB32AD.8050707@codesourcery.com>
References: <44EB5004.1090409@codesourcery.com> <44FB32AD.8050707@codesourcery.com>
Message-ID: <44FEE71E.8070906@codesourcery.com>

Don McCoy wrote:
 > This has been updated to reflect recent changes in how we enable profiling.

Don,

This looks good.

I originally envisioned this file as primarily holding the event
descriptions in section 5 ("Event Tags"), which will fluctuate enough
that a text file is the best place to keep them.

However, the additional material looks good.  It just increases the
pressure to get it into the (soon to be created) reference section of
the user's guide!

I have some suggestions below.  I think the document looks good overall;
once you're happy with it, please check it in as a text file.

In the meantime, I will rename the "Tutorial" document to be a "User's
Guide" with tutorial and reference sections.  When this is done, you can
create a chapter with the stable bits from this document.

				-- Jules

 >
 > ------------------------------------------------------------------------
 >
 > 2006-09-03  Don McCoy  <don at codesourcery.com>
 >
 > 	* profiling.txt: New file.  Readme for built-in profiling.
 >
 >
 > ------------------------------------------------------------------------
 >
 > Index: profiling.txt
 > ===================================================================
 > --- profiling.txt	(revision 0)
 > +++ profiling.txt	(revision 0)
 > @@ -0,0 +1,256 @@
 > +-------------------------------------------------------------------------
 > +  Sourcery VSIPL++ Profiling API
 > +-------------------------------------------------------------------------
 > +Copyright (c) 2006 by CodeSourcery.  All rights reserved.
 > +
 > +
 > +Contents
 > +-------------------------------------------------------------------------
 > +1) Compiling with Profiling Enabled
 > +2) Command Line Options
 > +3) Profiling Functions
 > +4) Profile Log Files
 > +5) Event Tags
 > +
 > +
 > +
 > +1) Compiling with Profiling Enabled

I would call this section "Configure and Compile Options for Profiling"
or just "Configure and Compile Options" (since this is the profiling
reference chapter).

 > +-------------------------------------------------------------------------

"There are no configure options for profiling; instead, it is enabled
via compile-time options.  However, to use profiling it is necessary
to configure the library with a suitable high-resolution timer (cross
reference to the '--enable-timer' option in quickstart).  For example,"

 > +If building from source, enable a suitable high-resolution timer
 > +when configuring the library.  For example,
 > +
 > +  --enable-timer=x86_64_tsc
 > +
 > +Pre-built versions of the library enable a suitable timer for your
 > +system.
 > +
 > +
 > +To enable profiling, define VSIP_IMPL_PROFILER=<mask> on the command
 > +line when compiling your program.  On many systems, this option may be
 > +added to the CXXFLAGS variable in the project makefile.
 > +
 > +This macro enables profiling operations in several different areas
 > +of the library, depending on the value of <mask>:
 > +
 > +	  Profiling Configuration Mask
 > +
 > +	Section	Description		Value
 > +        -------------------------------------
 > +	signal	Signal Processing         1	
 > +	matvec	Linear Algebra		  2
 > +	fns	Elementwise Functions	  4
 > +	user	User-defined Operations	  8
 > +
 > +Determine the mask value by summing the values listed in the table
 > +for the areas you wish to profile.  For example, if you wish to
 > +gather performance data on your own code as well as for FFTs,
 > +you would enable 'user' and 'signal' from the table above.  The
 > +value you would choose would be 1 + 8 = 9.
 > +
 > +
 > +
 > +2) Command Line Options
 > +-------------------------------------------------------------------------

I would emphasize that this is contingent on enabling profiling when
compiling.

"For programs that have been compiled with profiling enabled, the profiling
mode and output file can be controlled from the command line."

 > +You may profile programs without inserting any code by specifying the
 > +options on the command line.  Use this to choose the profiler mode:
 > +
 > +  --vsipl++-profile-mode={accum, trace}
 > +

These paragraphs on trace and accumulate mode could go into a separate
section, "Profiling Modes", or into the section on the log file format.
This point in the file is the first place they are used, so it is the
logical place to define them, but since this is reference text, it may
not be read in a linear fashion.  I.e. if a user wants to refresh
their memory on what the modes are ("what are the profiling modes
again?"), it would not be readily apparent from the table of contents
that their definitions are in this section.

 > +In 'trace' mode, the times at which each event begins and ends
 > +are stored as profile data.  The log will present these events in
 > +chronological order.  This mode is preferred when a highly detailed
 > +view of program execution is desired.
 > +
 > +In 'accumulate' mode, the start and stop times are subtracted to
 > +compute the duration of an event and the cumulative sum of these
 > +durations is stored as profile data.  The log will indicate the
 > +total amount of time spent in each event.  This mode is desirable
 > +when investigating a specific function's average performance.
 > +
 > +
 > +Specify the path to the log file for profile output using:
 > +
 > +  --vsipl++-profile-output=/path/to/logfile
 > +
 > +The output option defaults to the standard output on most
 > +systems, so it may be omitted.
 > +
 > +
 > +

 > +3) Profiling Functions

These are all objects, so this should be "Profiling Objects"

 > +-------------------------------------------------------------------------

It would be good to clarify that manually creating a Profile object
is an alternative to controlling profiling from the command line.

Maybe end the previous section with a transition paragraph:

"The profiling command line options control profiling for the entire
program execution.  For finer grain control, such as enabling profiling
during a specific portion of the program, or to mix different profiling
modes, explicit Profile objects can be created."

Also, I would mention the arguments (object creation) before mentioning
what happens when the object is destroyed:

"The 'Profile' object is used to enable profiling during the lifetime
of the object.  When created, it takes arguments to indicate the
output file and the profiling mode (trace or accumulate).  When
destroyed (i.e. when it goes out of scope or is explicitly deleted),
the profile data is written to the specified output file.  For example:"
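
The lifetime rule described here is the usual C++ RAII pattern: timing starts when the object is constructed and is recorded when it is destroyed. The self-contained sketch below imitates accumulate mode with std::chrono. Scoped_timer, Accum_entry and Accum_log are hypothetical names rather than the Sourcery VSIPL++ classes, and nanoseconds stand in for processor ticks.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical accumulate-mode record for one event tag.
struct Accum_entry
{
  long long total_ticks = 0;  // summed durations (nanoseconds in this sketch)
  unsigned  num_calls   = 0;  // number of times the event occurred
  unsigned  op_count    = 0;  // operations per event (0 if not given)
};

using Accum_log = std::map<std::string, Accum_entry>;

// Hypothetical RAII timer: construction starts the clock, destruction
// (leaving the enclosing scope) adds the elapsed time to the log.
class Scoped_timer
{
public:
  Scoped_timer(Accum_log& log, std::string tag, unsigned op_count = 0)
    : log_(log), tag_(std::move(tag)), op_count_(op_count),
      start_(std::chrono::steady_clock::now())
  {}

  ~Scoped_timer()
  {
    auto elapsed = std::chrono::steady_clock::now() - start_;
    assert(elapsed.count() >= 0);  // steady_clock never runs backwards
    Accum_entry& e = log_[tag_];
    e.total_ticks +=
      std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    e.num_calls += 1;
    e.op_count   = op_count_;
  }

  Scoped_timer(Scoped_timer const&) = delete;
  Scoped_timer& operator=(Scoped_timer const&) = delete;

private:
  Accum_log&  log_;
  std::string tag_;
  unsigned    op_count_;
  std::chrono::steady_clock::time_point start_;
};
```

Each pass through a block that declares a Scoped_timer adds one call to that tag's entry, which corresponds to the 'num calls' and 'total ticks' columns of the accumulate-mode log.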

 > +The 'Profile' object is created to gather timing data for the
 > +duration of its existence.   When it is destroyed (i.e. goes
 > +out of scope or is explicitly deleted) the profile data is written
 > +to the specified output file.  The first parameter specifies the
 > +logfile and the second, the profiling mode.  For example:
 > +
 > +  impl::profile::Profile profile("profile.txt", impl::profile::accum);

Let's not overwrite this file with profiling output!

It would be good to clarify or hint that a user only needs to create
Scope_event objects for user-defined events.  The library already
defines a host of Scope_events for internal events.

 > +
 > +The 'Scope_event' object is used to insert a profiler event
 > +into the log.

"'Scope_event' is only necessary in user code for user-defined
events."


 >  This object should be created at the point where
 > +you wish to begin timing and destroyed when the event is over
 > +(such as a computation).  For example:
 > +
 > +  impl::profile::Scope_event event("User Event", op_count);
                                       ^^^^^^^^^^^^

"Event Tag" would tie this to the use of 'tag' in the log file
description.

 > +
 > +The first parameter is the tag that will be used to display the
 > +event's performance data in the log file.  The second parameter is
                                            ^
"(Section 5 "Event Tags" describes the event tags used internally by
the library)"

 > +optional.  If used, 'op_count' should be an unsigned integer specifying
 > +an estimate of the total number of operations (floating point or
 > +otherwise) performed.  This is used by the profiler to compute
 > +the rate of computation.  Without it, the profiler will still
 > +yield useful timing data.
 > +
 > +Creating a Scope_event object on the stack is the easiest way
 > +to control the region it will profile.  For example, from within
 > +the body of a function (or as the entire function), use
 > +this to define a region of interest:
 > +
 > +  {
 > +    impl::profile::Scope_event event("Main computation:");
 > +
 > +    // perform main computation
 > +    //
 > +      ...
 > +  }
 > +
 > +The closing brace causes 'event' to go out of scope, logging
 > +the amount of time spent doing the computation.
 > +
 > +
 > +
 > +4) Profile Log Files
 > +-------------------------------------------------------------------------
 > +The profiler outputs a small header at the beginning of each log file.
 > +The headers differ slightly for accumulate and trace modes.

4a) Log file header

	# mode: pm_accum
	# timer: x86_64_tsc_time
	# clocks_per_sec: 3591375104

The log file header has separate lines that describe:
  - the profiling mode used,
  - the low-level timer used to measure clock ticks,
  - the number of clock ticks per second.

 > +
 > +4a) Accumulate mode
 > +
 > +# mode: pm_accum
 > +# timer: x86_64_tsc_time
 > +# clocks_per_sec: 3591375104
 > +#
 > +# tag : total ticks : num calls : op count : mops
 > +
 > +The respective columns that follow this header are:
 > +
 > +  tag		A descriptive name of the operation.  This is either
 > +		a name used internally or specified by the user.
 > +
 > +  total ticks	The total duration of all occurrences of the event,
 > +		in processor ticks.
 > +
 > +  num calls	The number of times the event occurred.
 > +
 > +  op count	The number of operations performed per event.
 > +
 > +  mops		The calculated performance figure in millions
 > +		of operations per second.

You could describe how mops is computed:


                    num_calls * op_count
                   ----------------------
                            10^6
    mops =      ----------------------------
                        total_ticks
                      ----------------
                       clocks_per_sec
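
The same formula as a small C++ function, with a worked example. The function name is illustrative; the parameter names follow the accumulate-mode columns and the 'clocks_per_sec' header field described above.

```cpp
#include <cassert>

// mops = (num_calls * op_count / 10^6) / (total_ticks / clocks_per_sec),
// i.e. millions of operations divided by elapsed seconds.
inline double mops(double total_ticks, double num_calls,
                   double op_count, double clocks_per_sec)
{
  assert(total_ticks > 0 && clocks_per_sec > 0);
  double seconds  = total_ticks / clocks_per_sec;
  double mega_ops = num_calls * op_count / 1e6;
  return mega_ops / seconds;
}
```

With clocks_per_sec = 3591375104, an event called 10 times at 1,000,000 operations each that accumulates 3591375104 ticks (one second) gives mops(3591375104.0, 10, 1000000, 3591375104.0) == 10.0; half as many ticks gives 20.0.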

 > +
 > +
 > +4b) Trace mode
 > +
 > +# mode: pm_trace
 > +# timer: x86_64_tsc_time
 > +# clocks_per_sec: 3591375104
 > +#
 > +# index : tag : ticks : open id : op count
 > +
 > +The respective columns that follow this header are:
 > +
 > +  index 	The entry number, beginning at one.
 > +
 > +  tag		A descriptive name of the operation.  This is either
 > +		a name used internally or specified by the user.
 > +
 > +  ticks		The current reading from the processor clock.
 > +
 > +  open id	Zero to indicate that an event has started.
 > +		An event index to indicate the end of an event.

                 "If zero, indicates the start of an event.
	        If non-zero, this indicates the end of an event and
	        refers to the index of corresponding start of the
	        event"
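
Under this reading of 'open id' (zero on a start entry, the start entry's index on an end entry), start and end entries can be paired to recover per-event durations. The sketch below assumes the trace records have already been read into memory in log order; Trace_entry, Event_time and event_durations are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One trace-mode record: index : tag : ticks : open id : op count
struct Trace_entry
{
  std::size_t index;     // entry number, starting at 1
  std::string tag;
  long long   ticks;     // clock reading
  std::size_t open_id;   // 0 = start of event, else index of its start entry
  unsigned    op_count;
};

struct Event_time
{
  std::string tag;
  long long   ticks;     // end ticks minus start ticks
};

// Pair each end entry with its start entry and report the duration.
// Assumes entries are in log order, so entry k sits at position k - 1.
inline std::vector<Event_time> event_durations(std::vector<Trace_entry> const& trace)
{
  std::vector<Event_time> result;
  for (Trace_entry const& e : trace)
  {
    if (e.open_id == 0)
      continue;                         // a start entry; its end comes later
    assert(e.open_id < e.index);        // an end refers back to an earlier start
    Trace_entry const& start = trace[e.open_id - 1];
    assert(start.index == e.open_id && start.open_id == 0);
    result.push_back({start.tag, e.ticks - start.ticks});
  }
  return result;
}
```

Dividing each duration by clocks_per_sec from the header converts it to seconds.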

 > +
 > +  op count	The number of operations performed per event, or
 > +		zero to indicate the end of an event.
 > +
 > +
 > +Note that the timings expressed in 'ticks' may be converted to seconds
 > +by dividing by the 'clocks_per_sec' constant in the header.
 > +
 > +
 > +
 > +5) Event Tags
 > +-------------------------------------------------------------------------
 > +Sourcery VSIPL++ uses the following tags for profiling objects/functions
 > +within the library.  These tags are readable text containing information
 > +that varies depending on the event, but generally it tells you:
                                                         ^^^^^^^^^
"but generally it describes:"

 > +
 > +  * The object/function name
 > +  * The number of dimensions
 > +  * Information about the data types involved
 > +  * The size of each dimension
 > +
 > +In all cases, data types (<T>, <I> and <O> below) are expressed using
 > +the BLAS/LAPACK convention of
 > +
 > +    S - float
 > +    C - complex<float>
 > +    D - double
 > +    Z - complex<double>
 > +
 > +Expressions on views (vectors, matrices) are shown using prefix
 > +notation, i.e.
 > +
 > +    operator(operand, ...)
 > +
 > +Each operand may itself be the result of another computation, so
 > +expressions can be nested, with the parentheses determining the
 > +order of evaluation.
 > +When the operand types are views, the usual S/D/C/Z are used to
 > +indicate the type.  When operands are scalars, lower-case letters
 > +are used instead (s/d/c/z).
 > +
 > +
 > +Current Tag List:
 > +
 > +     --signal--
 > +     Convolution [1D|2D] <T> <row_size>x<col_size>
 > +     Correlation [1D|2D] <T> <row_size>x<col_size>
 > +     Fft 1D [Inv|Fwd] <I>-<O> [by_ref|by_val] <size>x1

What about 2D and 3D Ffts?  Perhaps this should be:

         Fft [1D|2D|3D] [Inv|Fwd] <I>-<O> [by_ref|by_val] <size>

 > +     Fftm 2D [Inv|Fwd] <I>-<O> [by_ref|by_val] <row_size>x<col_size>

All Fftms are 2D.  However, they can be either row-wise or column-wise.
Perhaps this could be:

         Fftm [row|col] [Inv|Fwd] <I>-<O> [by_ref|by_val] <row_size>x<col_size>

 > +     Fir <T> <size>
 > +     Iir <T> <size>
 > +
 > +     --matvec--
 > +     dot <T> <size>x1
 > +     cvjdot <T> <size>x1
 > +     trans <T> <row_size>x<col_size>
 > +     herm <T> <row_size>x<col_size>
 > +     kron <T> <row_size_a>x<col_size_a> <row_size_b>x<col_size_b>
 > +     outer <T> <size_a>x1 <size_b>x1
 > +     gemp <T> <row_size_a>x<col_size_a> <row_size_b>x<col_size_b>
 > +     gems <T> <row_size>x<col_size>
 > +     cumsum <T> <row_size>x<col_size>
 > +     modulate <T> <row_size>x1
 > +
 > +     --fns--

"--Element-wise expressions--" would be more descriptive to the user.

Also (although some of this is redundant with above)

"For element-wise expressions, event tags have the following format:

	EVALUATOR DIM EXPR SIZE

The EVALUATOR indicates which VSIPL++ evaluator was dispatched to
compute the expression.

DIM indicates the dimensionality of the expression.

EXPR is a mnemonic for the expression.

SIZE is ..."

Also, a brief description of the evaluators would be useful:

"The following evaluators are provided (dispatch to vendor math
libraries, such as SAL and IPP, is implemented with multiple
evaluators that share the same prefix):

  - Expr_Loop      - generic loop-fusion evaluator.
  - Expr_SIMD_Loop - SIMD loop-fusion evaluator.
  - Expr_Copy      - optimized data-copy evaluator.
  - Expr_Trans     - optimized matrix transpose evaluator.
  - Expr_Dense     - evaluator for dense, multi-dimensional expressions.
                     Converts them into corresponding 1-dim expressions
                     that are re-dispatched.
  - Expr_SAL_*     - evaluators for dispatch to the SAL vendor math library.
  - Expr_IPP_*     - evaluators for dispatch to the IPP vendor math library.
  - Expr_SIMD_*    - evaluators for dispatch to the builtin SIMD routines
                     (with the exception of Expr_SIMD_Loop, see above).

A complete listing of the evaluators is useful, but I would be OK with
leaving it out in favor of a condensed list (Expr_SAL_* instead of a
complete listing of the SAL evaluators).  The complete list is going
to fluctuate, and it's going to have extraneous detail that we won't
document here (for example, this isn't the place to describe the
difference between Expr_IPP_SV-<func> and Expr_IPP_SV_FO-<func> or
between Expr_SAL_VVV and Expr_SAL_fVVV).

The condensed list should be enough for the user to determine whether
their function was dispatched to a math library (i.e. Expr_IPP_*),
handled internally in an optimized fashion, or just handled with loop
fusion.
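
As an illustration of how a user might apply the condensed list, here is a helper that buckets an event tag by its evaluator prefix. It is a hypothetical prefix heuristic, not part of the library, and the category strings are made up for the sketch. Expr_SIMD_Loop is checked before the general Expr_SIMD_ prefix because it is the SIMD loop-fusion evaluator rather than a builtin SIMD routine.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: classify an element-wise event tag by evaluator prefix.
inline std::string dispatch_kind(std::string const& tag)
{
  auto starts_with = [&tag](char const* prefix) {
    std::string p(prefix);
    return tag.compare(0, p.size(), p) == 0;
  };

  if (starts_with("Expr_SAL_") || starts_with("Expr_IPP_"))
    return "vendor library";
  if (starts_with("Expr_SIMD_Loop"))          // must precede "Expr_SIMD_"
    return "SIMD loop fusion";
  if (starts_with("Expr_SIMD_"))
    return "builtin SIMD routine";
  if (starts_with("Expr_Loop"))
    return "generic loop fusion";
  if (starts_with("Expr_Copy") || starts_with("Expr_Trans") ||
      starts_with("Expr_Dense"))
    return "optimized internal evaluator";
  return "unknown";
}
```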


 > +     Expr_Loop [1D|2D|3D] <expr> <size>
 > +     Expr_Copy      "       "              (all have dim/expr/size)
 > +     Expr_Trans
 > +     Expr_Dense
 > +     Expr_SAL_COPY
 > +     Expr_SAL_V
 > +     Expr_SAL_VV
 > +     Expr_SAL_VVV
 > +     Expr_SAL_fVVV
 > +     Expr_SAL_VV_V
 > +     Expr_SAL_V_VV
 > +     Expr_SAL_fVV_V
 > +     Expr_Loop_Vmmul
 > +     Expr_IPP_V-<func>
 > +     Expr_IPP_VV-<func>
 > +     Expr_IPP_SV-<func>
 > +     Expr_IPP_SV_FO-<func>
 > +     Expr_IPP_VS-<func>
 > +     Expr_IPP_VS_AS_SV-<func>
 > +     Expr_SIMD_V-<func>
 > +     Expr_SIMD_VV-<func>
 > +     Expr_SIMD_Loop
 > +
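
For reading tags by hand, the data-type letters can be decoded as follows. Upper case denotes a view and lower case a scalar, and C and Z are the single- and double-precision complex types under the usual BLAS/LAPACK convention. describe_type is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: describe one type letter from an event tag.
// Upper case (S, D, C, Z) denotes a view; lower case denotes a scalar.
inline std::string describe_type(char code)
{
  unsigned char c = static_cast<unsigned char>(code);
  std::string kind = std::isupper(c) ? "view" : "scalar";
  switch (std::toupper(c))
  {
    case 'S': return "float " + kind;
    case 'D': return "double " + kind;
    case 'C': return "complex<float> " + kind;
    case 'Z': return "complex<double> " + kind;
    default:  return "unknown";
  }
}
```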


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Thu Sep  7 04:19:34 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 07 Sep 2006 00:19:34 -0400
Subject: patch: Disable exceptions when compiler doesn't support them.
Message-ID: <44FF9DD6.9090707@codesourcery.com>

The attached patch allows the library to detect when icl (the Intel
compiler on Windows) is used without exception support (i.e. when -GX
or equivalent is not given), and makes it use vsip::impl::fatal_exception() instead.
Additionally, the latter now also reports the call-site.

I have tested with 'icl -GX' as well as with 'icl' and got the
desired results.

OK to check in ?

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
A non-text attachment was scrubbed...
Name: no-exception.patch
Type: text/x-patch
Size: 2674 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060907/e1b74f1d/attachment.bin>

From mark at codesourcery.com  Thu Sep  7 04:28:49 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Wed, 06 Sep 2006 21:28:49 -0700
Subject: [vsipl++] patch: Disable exceptions when compiler doesn't support
 them.
In-Reply-To: <44FF9DD6.9090707@codesourcery.com>
References: <44FF9DD6.9090707@codesourcery.com>
Message-ID: <44FFA001.6040108@codesourcery.com>

Stefan Seefeld wrote:

> +// If the Intel compiler on windows is used without exception handling (-GX)
> +#  if defined(__ICL) && __EXCEPTIONS != 1

Picking nits: it's usually best to say "&& !__EXCEPTIONS" for things 
like this, since they might set __EXCEPTIONS to 2 in the future to 
indicate that they have a superset of what we currently think of as 
exceptions.
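
A sketch of the guard written the way Mark suggests, combined with a fallback that reports the call site when exceptions are unavailable, in the spirit of Stefan's patch. EXAMPLE_HAS_EXCEPTIONS, EXAMPLE_THROW and example_fatal are hypothetical names (the patch itself routes to vsip::impl::fatal_exception()), and only the Intel compiler case from the patch is handled.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>

// Intel C++ on Windows without -GX: exceptions are unavailable.
// Test '!__EXCEPTIONS' rather than '__EXCEPTIONS != 1', so any
// non-zero value still counts as "exceptions supported".
#if defined(__ICL) && !__EXCEPTIONS
#  define EXAMPLE_HAS_EXCEPTIONS 0
#else
#  define EXAMPLE_HAS_EXCEPTIONS 1
#endif

// Hypothetical fatal handler: report the call site, then terminate.
[[noreturn]] inline void example_fatal(char const* what, char const* file, int line)
{
  std::fprintf(stderr, "%s:%d: fatal error: %s\n", file, line, what);
  std::abort();
}

// Throw when exceptions are available; otherwise stop at the call site.
#if EXAMPLE_HAS_EXCEPTIONS
#  define EXAMPLE_THROW(exc) throw exc
#else
#  define EXAMPLE_THROW(exc) example_fatal((exc).what(), __FILE__, __LINE__)
#endif
```

Because the macro expands at the point of use, __FILE__ and __LINE__ identify the caller rather than the handler.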

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From jules at codesourcery.com  Thu Sep  7 11:04:37 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 07 Sep 2006 07:04:37 -0400
Subject: [vsipl++] patch: Disable exceptions when compiler doesn't support
 them.
In-Reply-To: <44FF9DD6.9090707@codesourcery.com>
References: <44FF9DD6.9090707@codesourcery.com>
Message-ID: <44FFFCC5.6060705@codesourcery.com>

Stefan Seefeld wrote:

> 
> OK to check in ?
> 

Stefan, yes this looks good (with Mark's suggestion).  thanks, -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From joseph_sacco at comcast.net  Fri Sep  8 16:45:36 2006
From: joseph_sacco at comcast.net (Joseph E. Sacco, Ph.D.)
Date: Fri, 08 Sep 2006 12:45:36 -0400
Subject: configure fails to recognize mpich
Message-ID: <1157733936.25286.22.camel@plantain.jesacco.com>

System:
* G4-based PPC running YDL-4.1 [FC4 clone]
* gcc-4.0.2
* mpich-1.2.7p1  [installed in /opt/mpich]
* mpich2-1.0.4pl [installed in /opt/mpich2]

==========================================================================

Problem:

        configure test for mpich fails.
        

Discussion
----------
Running

    ./configure --prefix=/opt/vsipl++ --with-mpi-prefix=/opt/mpich

fails:

                                   ...
        checking for mpi.h... yes
        checking whether MPICH_NAME is declared... yes
        checking for MPI build instructions... configure: error: Unable
        to compile / link test MPI application.


The same result is obtained with mpich2:

        ./configure --prefix=/opt/vsipl++ --with-mpi-prefix=/opt/mpich2


The test for mpi within configure appears rather innocuous:

        #include VSIP_IMPL_MPI_H       <<===  #include <mpi.h>
        int
        main ()
        {
        MPI_Init(0, 0);
          ;
          return 0;
        }

and it does compile and link when run outside of configure.


Thoughts???


-Joseph


-- 
joseph_sacco [at] comcast [dot] net


From jules at codesourcery.com  Fri Sep  8 17:24:26 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 08 Sep 2006 13:24:26 -0400
Subject: [vsipl++] configure fails to recognize mpich
In-Reply-To: <1157733936.25286.22.camel@plantain.jesacco.com>
References: <1157733936.25286.22.camel@plantain.jesacco.com>
Message-ID: <4501A74A.9060700@codesourcery.com>

Joseph,

We've tested with MPICH in the past, but unfortunately much of our 
recent work has been with LAM/MPI.  We would like to fix this though.

Would you mind sending the config.log file?

			thanks,
			-- Jules

Joseph E. Sacco, Ph.D. wrote:

> 
> Problem:
> 
>         configure test for mpich fails.
>         
>


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Fri Sep  8 20:59:41 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Fri, 08 Sep 2006 16:59:41 -0400
Subject: [vsipl++] configure fails to recognize mpich
In-Reply-To: <1157733936.25286.22.camel@plantain.jesacco.com>
References: <1157733936.25286.22.camel@plantain.jesacco.com>
Message-ID: <4501D9BD.3040004@codesourcery.com>

Joseph,

I believe I have found the cause of the error. Our configuration script
assumes that running 'mpicxx -show -c' will generate a command string in which
the last token is '-c', which we then filter out using sed.

However, it appears in your case mpicxx generates a command string where
the '-c' option is in between other options, and so our attempt to filter
it out fails.

The attached patch makes sed filter out the '-c' option no matter where in the
command it appears. Please confirm that this fixes the error for you.
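
The fix amounts to removing the '-c' token wherever it occurs, not just at the end of the string. The configure script does this with sed; purely to illustrate the same token-level logic, here is a C++ sketch (strip_compile_flag is a hypothetical name). Matching whole tokens leaves options that merely begin with "-c" intact.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Remove every standalone "-c" token from a compiler command line,
// wherever it appears, and rejoin the remaining tokens with single spaces.
inline std::string strip_compile_flag(std::string const& command)
{
  std::istringstream in(command);
  std::string token, result;
  while (in >> token)
  {
    if (token == "-c")
      continue;                 // drop only the exact "-c" token
    if (!result.empty())
      result += ' ';
    result += token;
  }
  return result;
}
```

Splitting on whitespace is a simplification: arguments containing quoted spaces would need real shell-style parsing.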

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.diff
Type: text/x-patch
Size: 604 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060908/530ab337/attachment.bin>

From joseph_sacco at comcast.net  Sat Sep  9 17:26:30 2006
From: joseph_sacco at comcast.net (Joseph E. Sacco, Ph.D.)
Date: Sat, 09 Sep 2006 13:26:30 -0400
Subject: [vsipl++] configure fails to recognize mpich
In-Reply-To: <4501D9BD.3040004@codesourcery.com>
References: <1157733936.25286.22.camel@plantain.jesacco.com>
	 <4501D9BD.3040004@codesourcery.com>
Message-ID: <1157822790.2513.10.camel@plantain.jesacco.com>

Stefan,

I can confirm that your patch works:

[patch applied to configure]

                          ...
        with mpi enabled:                        yes
        With parallel service:                   mpich
                              ...
        
There are many ways to resolve this issue. mpich1 supports command line
arguments:

$ mpicxx -compile-info
g++ -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
-DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -DHAVE_MPI_CPP
-I/opt/mpich/include/mpi2c++ -fexceptions -c -I/opt/mpich/include

$ mpicxx -link-info
g++ -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
-DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -L/opt/mpich/lib
-lpmpich++ -lmpich -lpthread -lrt


Maybe it would be cleaner to use these directly rather than using
"-show". 

Be well,

-Joseph

=========================================================================
On Fri, 2006-09-08 at 16:59 -0400, Stefan Seefeld wrote:
> Joseph,
> 
> I believe I have found the cause of the error. Our configuration script
> assumes that running 'mpicxx -show -c' will generate a command string in which
> the last token is '-c', which we then filter out using sed.
> 
> However, it appears in your case mpicxx generates a command string where
> the '-c' option is in between other options, and so our attempt to filter
> it out fails.
> 
> The attached patch makes sed filter out the '-c' option no matter where in the
> command it appears. Please confirm that this fixes the error for you.
> 
> Thanks,
> 		Stefan
> 
-- 
joseph_sacco [at] comcast [dot] net


From stefan at codesourcery.com  Sat Sep  9 18:04:09 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Sat, 09 Sep 2006 14:04:09 -0400
Subject: [vsipl++] configure fails to recognize mpich
In-Reply-To: <1157822790.2513.10.camel@plantain.jesacco.com>
References: <1157733936.25286.22.camel@plantain.jesacco.com>	 <4501D9BD.3040004@codesourcery.com> <1157822790.2513.10.camel@plantain.jesacco.com>
Message-ID: <45030219.9030604@codesourcery.com>

Joseph E. Sacco, Ph.D. wrote:
> Stefan,
> 
> I can confirm that your patch works:
> 
> [patch applied to configure]
> 
>                           ...
>         with mpi enabled:                        yes
>         With parallel service:                   mpich
>                               ...

Excellent!

> There are many ways to resolve this issue. mpich1 supports command line
> arguments:
> 
> $ mpicxx -compile-info
> g++ -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
> -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -DHAVE_MPI_CPP
> -I/opt/mpich/include/mpi2c++ -fexceptions -c -I/opt/mpich/include
> 
> $ mpicxx -link-info
> g++ -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
> -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -L/opt/mpich/lib
> -lpmpich++ -lmpich -lpthread -lrt
> 
> 
> Maybe it would be cleaner to use these directly rather than using
> "-show". 

I agree. However, as we have to deal with different versions of that
wrapper, we are aiming for a mechanism that is supported by all of them.
The '-show' / '-showme' option seems to be the least common denominator.

Thanks,
		Stefan		

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From don at codesourcery.com  Sun Sep 10 01:13:32 2006
From: don at codesourcery.com (Don McCoy)
Date: Sat, 09 Sep 2006 19:13:32 -0600
Subject: [patch] Fixes for building benchmarks with IPP (and MPI)
Message-ID: <450366BC.70904@codesourcery.com>

This patch does the following:

    * adds missing includes for several of the IPP benchmarks
    * removes two dependencies on headers in tests/ for the benchmarks
      (by moving them to vsip_csl/)
    * changes the standalone benchmarks makefile to exclude MPI-specific
      benchmarks by default
    * adds a definition, missing from the benchmarks' makefile, that is
      needed for detecting whether or not MPI is used


Regards,

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ib.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060909/436f5e37/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ib.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060909/436f5e37/attachment-0001.ksh>

From don at codesourcery.com  Sun Sep 10 21:24:25 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 10 Sep 2006 15:24:25 -0600
Subject: [patch] CFAR benchmark storage order
Message-ID: <45048289.4000700@codesourcery.com>

This change reverts the storage order of the tensor back to 'tuple<0, 1, 
2>' for the Vector and Hybrid methods.  The Slice method explicitly uses 
'tuple<2, 1, 0>' in order to get the best performance.

This was tested in the 'builtin' configuration on both 32-bit and 64-bit 
platforms, using GCC 4.1 and 3.4 respectively.

Regards,

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cb.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060910/4d6ddfcf/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cb.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060910/4d6ddfcf/attachment-0001.ksh>

From stefan at codesourcery.com  Mon Sep 11 14:28:22 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Mon, 11 Sep 2006 10:28:22 -0400
Subject: patch: Some more adjustments for intel-win
Message-ID: <45057286.3090408@codesourcery.com>

The attached patch contains some more (very minor) adjustments
for compiling with intel-win, as well as a fix for a bug in
our MPI detection, as reported by Joseph E. Sacco.

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060911/983e8ce7/attachment.ksh>

From jules at codesourcery.com  Mon Sep 11 15:04:09 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 11 Sep 2006 11:04:09 -0400
Subject: [patch] Tutorial updates
Message-ID: <45057AE9.3040906@codesourcery.com>

This patch makes some of the tutorial updates we have discussed:

  - Focuses tutorial on fast convolution by splitting the parallel fast
    convolution chapter into separate chapters for serial and parallel.

  - Makes the tutorial a user's guide with two parts: tutorial (Part I)
    and reference (Part II).

Also attached is a PDF.

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: doc.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060911/089f1bb5/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tutorial.pdf
Type: application/pdf
Size: 173784 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060911/089f1bb5/attachment.pdf>

From stefan at codesourcery.com  Tue Sep 12 03:40:36 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Mon, 11 Sep 2006 23:40:36 -0400
Subject: [vsipl++] patch: Some more adjustments for intel-win
In-Reply-To: <45057286.3090408@codesourcery.com>
References: <45057286.3090408@codesourcery.com>
Message-ID: <45062C34.3050000@codesourcery.com>

Here is an enhanced and extended version of the previous patch.
New additions include a new vsip/impl/inttypes.hpp header
providing fixed-size integer types such as int8_type, which
makes the vsip_csl::matlab code work even on Windows (where
there is neither <stdint.h> nor <inttypes.h>),
and a fix for a bug related to the handling of Rt_ext_data.

OK to check in?

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
A non-text attachment was scrubbed...
Name: intel-win.patch
Type: text/x-patch
Size: 28618 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060911/21e63de0/attachment.bin>

From mark at codesourcery.com  Tue Sep 12 03:50:12 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Mon, 11 Sep 2006 20:50:12 -0700
Subject: [vsipl++] patch: Some more adjustments for intel-win
In-Reply-To: <45062C34.3050000@codesourcery.com>
References: <45057286.3090408@codesourcery.com> <45062C34.3050000@codesourcery.com>
Message-ID: <45062E74.7090201@codesourcery.com>

Stefan Seefeld wrote:

> +# if SIZEOF_CHAR == 1
> +  typedef signed char int8_type;
> +  typedef unsigned char uint8_type;
> +# else
> +#  error "No 8-bit integer type"
> +# endif
> +
> +# if SIZEOF_SHORT == 2
> +  typedef short int16_type;
> +  typedef unsigned short uint16_type;

Just for the record:

1. sizeof (char) is required to be 1 in C++.

2. However, char is not required to be an 8-bit type.

So, in theory, these checks (which you added at my suggestion) are not 
fully robust.  For example, on a machine for which char is a 32-bit 
type, the above code will not work as intended.

However, I would not worry about this -- not even a little bit.  There
are very few such machines, none of them in mainstream use, and if we find
one, we can always fix this at that point.  (One relatively portable way
might be to use UCHAR_MAX, or CHAR_BIT from <climits>, to tell us how many
bits are actually in a char.)
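
Here is a minimal sketch of the check Mark describes. It tests the value range through <climits> instead of relying on sizeof. It assumes only a standard <climits>. It is an illustration, not the contents of vsip/impl/inttypes.hpp.

```cpp
#include <cassert>
#include <climits>

// Pick signed/unsigned char as the 8-bit types only when a char really
// has 8 bits. sizeof(char) is always 1, so it cannot tell us this.
#if CHAR_BIT == 8 && UCHAR_MAX == 255
typedef signed char   int8_type;
typedef unsigned char uint8_type;
#else
# error "No 8-bit integer type"
#endif

// Same idea for 16 bits: check the range of unsigned short, not its sizeof.
#if USHRT_MAX == 65535
typedef short          int16_type;
typedef unsigned short uint16_type;
#else
# error "No 16-bit integer type"
#endif
```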

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From jules at codesourcery.com  Tue Sep 12 12:41:23 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Sep 2006 08:41:23 -0400
Subject: [vsipl++] patch: Some more adjustments for intel-win
In-Reply-To: <45062C34.3050000@codesourcery.com>
References: <45057286.3090408@codesourcery.com> <45062C34.3050000@codesourcery.com>
Message-ID: <4506AAF3.7060304@codesourcery.com>

Stefan Seefeld wrote:
> Here is an enhanced and extended version of the previous patch.
> New additions include a new vsip/impl/inttypes.hpp header
> providing fixed-size integer types such as int8_type, which
> makes the vsip_csl::matlab code work even on windows (where
> there is neither <stdint.h> nor <inttypes.h>),
> and a fix to a bug related to the handling of Rt_ext_data.
> 
> OK to check in ?
> 
> Thanks,
> 		Stefan

Stefan, this looks good, please commit.  thanks, -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Tue Sep 12 14:12:29 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Sep 2006 10:12:29 -0400
Subject: [patch] Fix SIMD loop fusion to handle re-dimensioned expressions
Message-ID: <4506C04D.9070207@codesourcery.com>

This patch uses Adjust_layout_dim so that SIMD loop fusion Ext_data 
access works with re-dimensioned expressions (i.e. those generated by 
Eval_dense_expr).  It also adds a regression test for the case, and 
extends coverage_binary as well.

This was causing the fft test to fail for the builtin binary packages.

Patch applied.

				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fix.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/b11bccd3/attachment.ksh>

From jules at codesourcery.com  Tue Sep 12 16:10:17 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Sep 2006 12:10:17 -0400
Subject: [patch] Changes for merged packages
Message-ID: <4506DBE9.1040307@codesourcery.com>

This patch makes the changes necessary to build merged packages.

Major changes:
  - configure.ac: Move macros for parallel services, FFT, and ATLAS
    from acconfig.hpp to the command line, so that the different library
    variants in a merged package will have *similar* acconfig.hpp files.

    I say similar because configure places some macros in acconfig.hpp
    that are included only in some variants (SIZEOF_DOUBLE,
    SIZEOF_LONG_DOUBLE) or that differ between variants
    (SIZEOF_LONG_DOUBLE).  Since we only use these values during
    configure, and not in the library, the differences don't
    affect the merged package.

    However, to be safe, I've undefined those in config.hpp.

  - package.py and scripts/config: Change to build merged packages.
    Primarily use --libdir to distinguish between variants instead of
    suffixes (although suffixes are still used to save away acconfig.hpp
    and results.qmr files for later inspection).

This patch also includes:
  - adds -lvsip_csl to context.in and vsipl++.pc.in so that tests using
    vsip_csl pass.

  - adds some verbose macros to fft.cpp to make failures easier to debug.

Ok to commit?

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mondo.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/877e4144/attachment.ksh>

From jules at codesourcery.com  Tue Sep 12 17:26:21 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Sep 2006 13:26:21 -0400
Subject: [vsipl++] [patch] CFAR benchmark storage order
In-Reply-To: <45048289.4000700@codesourcery.com>
References: <45048289.4000700@codesourcery.com>
Message-ID: <4506EDBD.7020602@codesourcery.com>

Don McCoy wrote:
> This change reverts the storage order of the tensor back to 'tuple<0, 1, 
> 2>' for the Vector and Hybrid methods.  The Slice method explicitly uses 
> 'tuple<2, 1, 0>' in order to get the best performance.
> 
> This was tested in the 'builtin' configuration on both 32-bit and 64-bit 
> platforms, using GCC 4.1 and 3.4 respectively.

Don, this looks good, please commit.  thanks, -- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Tue Sep 12 17:38:47 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Sep 2006 13:38:47 -0400
Subject: [vsipl++] [patch] Fixes for building benchmarks with IPP (and
 MPI)
In-Reply-To: <450366BC.70904@codesourcery.com>
References: <450366BC.70904@codesourcery.com>
Message-ID: <4506F0A7.1040800@codesourcery.com>

Don McCoy wrote:
> This patch does the following:
> 
>    * adds missing includes for several of the IPP benchmarks
>    * removes two dependencies on headers in tests/ for the benchmarks
>      (by moving them to vsip_csl/)
>    * changes the standalone benchmarks makefile to exclude MPI-specific
>      benchmarks by default
>    * corrects a missing definition needed in the benchmark's makefile
>      for detecting whether or not MPI is used

Don,

This looks good.  I have one comment below, please check it in once that 
is addressed.

				thanks,
				-- Jules

> Index: GNUmakefile.in
> ===================================================================
> --- GNUmakefile.in	(revision 148805)
> +++ GNUmakefile.in	(working copy)
> @@ -116,8 +116,8 @@
>  VSIP_IMPL_SAL_FFT := @VSIP_IMPL_SAL_FFT@
>  VSIP_IMPL_IPP_FFT := @VSIP_IMPL_IPP_FFT@
>  VSIP_IMPL_FFTW3 := @VSIP_IMPL_FFTW3@
> +VSIP_IMPL_MPI_H := @VSIP_IMPL_MPI_H@

Since VSIP_IMPL_MPI_H is used here as a boolean (1 if MPI is present), 
and is used elsewhere as the name of the MPI header file, can you call 
it something else to avoid confusion -- for example VSIP_IMPL_MPI or 
VSIP_IMPL_HAVE_MPI?


> Index: configure.ac
> ===================================================================
> --- configure.ac	(revision 148805)
> +++ configure.ac	(working copy)
> @@ -858,7 +858,8 @@
>        vsipl_par_service=0
>        CPPFLAGS="$save_CPPFLAGS"
>      fi
> -  else
> +  else 
> +    AC_SUBST(VSIP_IMPL_MPI_H, 1)
>      AC_DEFINE_UNQUOTED([VSIP_IMPL_MPI_H], $vsipl_mpi_h_name,
>      [The name of the header to include for the MPI interface, with <> quotes.])


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Tue Sep 12 19:39:22 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 12 Sep 2006 15:39:22 -0400
Subject: patch: Fix issues with hypotf.
Message-ID: <45070CEA.3070300@codesourcery.com>

The attached patch properly forward-declares hypotf as extern "C",
and falls back to ::hypot(double, double) if HAVE_HYPOTF is not defined.
The patch is checked in.
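
Here is a sketch of the fallback described above. It assumes HAVE_HYPOTF is the autoconf-style macro that configure defines when hypotf is available. The function name hypot_float is hypothetical. The sketch takes its declarations from <math.h> and does not reproduce the patch's own extern "C" forward declaration.

```cpp
#include <cassert>
#include <math.h>

// Float hypotenuse with a fallback. Whether HAVE_HYPOTF is defined is
// assumed to be decided by configure.
inline float hypot_float(float x, float y)
{
#if defined(HAVE_HYPOTF)
  return ::hypotf(x, y);
#else
  // Compute in double precision and narrow the result back to float.
  return static_cast<float>(::hypot(static_cast<double>(x),
                                    static_cast<double>(y)));
#endif
}
```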

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/81ad0a38/attachment.ksh>

From don at codesourcery.com  Tue Sep 12 19:49:46 2006
From: don at codesourcery.com (Don McCoy)
Date: Tue, 12 Sep 2006 13:49:46 -0600
Subject: [vsipl++] [patch] Fixes for building benchmarks with IPP (and
 MPI)
In-Reply-To: <4506F0A7.1040800@codesourcery.com>
References: <450366BC.70904@codesourcery.com> <4506F0A7.1040800@codesourcery.com>
Message-ID: <45070F5A.4050303@codesourcery.com>

Jules Bergmann wrote:
> Since VSIP_IMPL_MPI_H is used here as a boolean (1 if MPI is present), 
> and is used elsewhere as the name of the MPI header file, can you call 
> it something else to avoid confusion -- for example VSIP_IMPL_MPI or 
> VSIP_IMPL_HAVE_MPI?
Done.  Checked in.

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712


From don at codesourcery.com  Wed Sep 13 00:16:08 2006
From: don at codesourcery.com (Don McCoy)
Date: Tue, 12 Sep 2006 18:16:08 -0600
Subject: [vsipl++] [patch] Tutorial updates
In-Reply-To: <45057AE9.3040906@codesourcery.com>
References: <45057AE9.3040906@codesourcery.com>
Message-ID: <45074DC8.6000608@codesourcery.com>

Jules Bergmann wrote:
> This patch makes some of the tutorial updates we have discussed:
>
>  - Focuses tutorial on fast convolution by splitting the parallel fast
>    convolution chapter into separate chapters for serial and parallel.
>
>  - Makes the tutorial a user's guide with two parts: tutorial (Part I)
>    and reference (Part II).
>
This patch extends these changes further by:

  - Rewriting the performance chapter (about profiling) in Part I to use
    fast convolution as the example.

  - Adding a profiling section to the reference in Part II.

Regards,

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pt3.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/d392c2ac/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pt3.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/d392c2ac/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tutorial.pdf
Type: application/pdf
Size: 220169 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/d392c2ac/attachment.pdf>

From jules at codesourcery.com  Wed Sep 13 02:23:52 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Sep 2006 22:23:52 -0400
Subject: [patch] Fast path for FFT
Message-ID: <45076BB8.60005@codesourcery.com>

This patch adds a fast path for 1-dim, CC FFTs with unit-stride data.
The fast path uses compile-time Ext_data instead of Rt_ext_data for a
marginal performance improvement.

To determine whether the backend will work with the fast path (in
particular, whether it supports split or interleaved complex), the
backend is queried when the workspace is created.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fp.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/dda0be89/attachment.ksh>

From don at codesourcery.com  Wed Sep 13 03:55:48 2006
From: don at codesourcery.com (Don McCoy)
Date: Tue, 12 Sep 2006 21:55:48 -0600
Subject: [vsipl++] [patch] CFAR benchmark storage order
In-Reply-To: <4506EDBD.7020602@codesourcery.com>
References: <45048289.4000700@codesourcery.com> <4506EDBD.7020602@codesourcery.com>
Message-ID: <45078144.1020009@codesourcery.com>

Jules Bergmann wrote:
> Don McCoy wrote:
>> This change reverts the storage order of the tensor back to 'tuple<0, 
>> 1, 2>' for the Vector and Hybrid methods.  The Slice method 
>> explicitly uses 'tuple<2, 1, 0>' in order to get the best performance.
>>
>
This patch corrects an error with the previous patch (tuple<2,1,0> 
should have been tuple<2,0,1>).
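
This sketch makes the tuple notation concrete. It assumes the usual VSIPL++ convention that the first entry of tuple<> names the slowest-varying dimension and the last entry the fastest. The hypothetical helper below is not part of the VSIPL++ API. It computes where an element of a dense 3-D cube lands in memory for a given dimension order. With tuple<2,0,1>, consecutive elements along dimension 1 are adjacent in memory. The labels 2-0-1, 0-2-1, 0-1-2 and 1-0-2 used in the table that follows name such orders.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the VSIPL++ API): linear offset of element
// idx = {i0, i1, i2} in a dense 3-D array of extents size = {n0, n1, n2},
// stored in dimension order `order`. order[0] is the slowest-varying
// dimension and order[2] the fastest, so {0,1,2} is row-major.
std::size_t offset(std::array<std::size_t, 3> const& idx,
                   std::array<std::size_t, 3> const& size,
                   std::array<int, 3> const& order)
{
  std::size_t off = 0;
  for (int d : order)            // walk from the slowest to the fastest dimension
    off = off * size[d] + idx[d];
  return off;
}
```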

Mea culpa,

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cb2.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/3da922c5/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cb2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060912/3da922c5/attachment-0001.ksh>

From don at codesourcery.com  Wed Sep 13 07:54:24 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 13 Sep 2006 01:54:24 -0600
Subject: [vsipl++] [patch] CFAR benchmark storage order
In-Reply-To: <45078144.1020009@codesourcery.com>
References: <45048289.4000700@codesourcery.com> <4506EDBD.7020602@codesourcery.com> <45078144.1020009@codesourcery.com>
Message-ID: <4507B930.6070403@codesourcery.com>

Please disregard the previous version(s) of this patch.  The attached 
version has been checked more thoroughly than before.  This time I ran 
all the sets with varying storage orders for the CFAR data cube, then I 
compared results at the points specified by the HPEC Challenge (in terms 
of the number of range gates, RG). 

This retesting led to a change in the storage order for the "by-vector"
algorithm, giving about a 5% performance improvement.  See the table
below, produced from data taken from the Xeon cluster at GTRI.

                   Slice
        RG      2-0-1   0-2-1
Set 1   64      293     210
Set 2   3500    186     147
Set 3   1909    187     145
Set 4   9900    202     150

                   Vector          Hybrid
        RG      0-1-2   1-0-2   0-1-2   1-0-2
Set 1   64      96      97      384     415
Set 2   3500    125     133     693     666
Set 3   1909    123     130     692     650
Set 4   9900    124     132     697     670

Regards,

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cb3.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060913/0b9e354c/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cb3.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060913/0b9e354c/attachment-0001.ksh>

From assem at codesourcery.com  Wed Sep 13 10:36:27 2006
From: assem at codesourcery.com (Assem Salama)
Date: Wed, 13 Sep 2006 06:36:27 -0400
Subject: Matlab IO docbook
Message-ID: <4507DF2B.4040705@codesourcery.com>

Everyone,
  I sent this patch out a while back but didn't get any replies
about it, so I'm resending it in case it is useful now. It is the
DocBook section that I wrote for Matlab IO.

Thanks,
Assem
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: svn.diff.09132006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060913/c3308e6c/attachment.ksh>

From jules at codesourcery.com  Wed Sep 13 13:09:24 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 13 Sep 2006 09:09:24 -0400
Subject: [vsipl++] [patch] Tutorial updates
In-Reply-To: <45074DC8.6000608@codesourcery.com>
References: <45057AE9.3040906@codesourcery.com> <45074DC8.6000608@codesourcery.com>
Message-ID: <45080304.6010704@codesourcery.com>

Don,

This looks good.  I have several suggestions below on the tutorial chapter.
Use them as you please :)  Once you're happy please check it in.  We can
continue to incorporate edits as we review the whole document.

I haven't had a chance to read the reference chapter yet, I will send
comments on that later.

				thanks,
				-- Jules

 > +    <para>
 > +      In addition to the accumulate and trace modes, which have pre-defined
 > +      output formats, Sourcery VSIPL++ exposes a profiling API that you can
                                                     ^^ Performance API
 > +      use to gather data directly on individual objects, such as FFTs.
 > +      If you need finer control of what operations are profiled, or if you
 > +      want to record the profiling data in a custom format, you may wish to
 > +      use this API directly.  See <xref linkend="performance_api"/> for
 > +      more details.
 > +    </para>
 > +<table xml:id="supported_ops" frame="none" rowsep="0">
 > +  <title>Operations Supporting Profiling</title>


 >      <para>
 >        See the file <filename>profiling.txt</filename> for a detailed
 >        explanation of the profiler output for each of the functions above.
 > +      See the file <filename>profiling.txt</filename> for a detailed

Isn't profiling.txt now Chapter 5?

 > +      explanation of the profiler output for each of the functions above.
 > +      For information about how to configure the library for profiling,
 > +      see the Quickstart also.
 >      </para>

 > +    <para>
 > +      This macro enables profiling operations in several different areas
 > +      of the library, depending on the value of
 > +      <replaceable>mask</replaceable>.  To profile all operations, use
 > +      the value <literal>15</literal>.  See <xref linkend="mask-values"/>
 > +      for other possible values.

I would mention the motivation for having a mask:

   "Since profiling can introduce overhead, especially for element-wise
   expressions, this macro allows you to choose which operations in the
   library are profiled.  To profile all operations, use the value
   <literal>15</literal>.  See ..."
 > +    </para>
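
This sketch shows how a mask like VSIP_IMPL_PROFILER might gate profiling. It assumes each bit of the mask selects one area of the library, as described above. The bit names below are hypothetical placeholders. The real definitions are in impl/profile.hpp, which is not quoted here. Because the mask is a compile-time constant, a test such as profiled(area_a) is a constant expression that the compiler can fold away. That is one way unselected areas could avoid the profiling overhead.

```cpp
#include <cassert>

#ifndef VSIP_IMPL_PROFILER
#  define VSIP_IMPL_PROFILER 0   // default: no areas profiled
#endif

// Hypothetical area bits; 15 (0x0F) selects all four.
enum profile_area : unsigned
{
  area_a = 0x1,
  area_b = 0x2,
  area_c = 0x4,
  area_d = 0x8
};

static_assert((area_a | area_b | area_c | area_d) == 15,
              "the four area bits together form the mask 15");

// True if `area` is selected by `mask`.
constexpr bool profiling_enabled(unsigned mask, unsigned area)
{
  return (mask & area) != 0;
}

// Answer for the mask this translation unit was compiled with.
constexpr bool profiled(unsigned area)
{
  return profiling_enabled(VSIP_IMPL_PROFILER, area);
}
```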

 > +    <note>
 > +      <para>
 > +	Profiling support requires that you link with a version of Sourcery
 > +	VSIPL++ that supports profiling.  If you have received a binary
 > +	distribution of Sourcery VSIPL++ from CodeSourcery, you probably
 > +	already have an appropriate version of the library.  If you are
 > +	building Sourcery VSIPL++ yourself, see the Quickstart guide for
 > +	more information about the requirements for building Sourcery
 > +	VSIPL++ with profiling enabled.

We've changed things so that all libraries support profiling, if a timer
is provided:

   "Profiling requires that the library be configured with a high-resolution
   timer.  Binary distributions of Sourcery VSIPL++ from CodeSourcery are
   already configured this way.  If you are building Sourcery VSIPL++ from
   source, see the Quickstart guide for more information about configuring
   high-resolution timers."

 > +      </para>
 > +    </note>


 > +    <section><title>Setup</title>
 > +    <para>
 > +      The only computation performed in the setup phase is a forward FFT
 > +      that maps the pulse replica into the frequency domain.  This
 > +      computation corresponds to the following line of the profiling
 > +      data:
 > +<screen>Fft Fwd C-C by_ref 256 : 142119 : 1 : 10240 : 258.767
 > +</screen>
 > +      The "Fft Fwd C-C by_ref 256" tag indicates that this computation
 > +      is a 256-element forward FFT with complex, single-precision inputs
 > +      and outputs, returning its result by reference.  The notation used
 > +      for data types (e.g., "C-C" in this example) is given in
                                                          ^^ described
 > +      <xref linkend="data-type-names"/>.
 > +    </para>
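
The profile lines quoted in this chapter can be checked by hand. The sketch below assumes the five fields are the tag, the tick count, the number of calls, the operation count per call, and the resulting MFLOP/s. That reading is inferred from the text, not from the profiler source, and the names are hypothetical. The sketch assumes a tick rate of 3.6e9 per second, taken from the 3.6 GHz Xeon mentioned in the chapter. At that rate, the recomputed rates agree with the reported ones to within about 1% for the lines quoted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One line of profiler output, e.g.
//   "Fft Fwd C-C by_ref 256 : 142119 : 1 : 10240 : 258.767"
// Field meanings are inferred, not taken from the profiler source.
struct Profile_entry
{
  std::string tag;
  double ticks;
  double calls;
  double ops_per_call;
  double mflops;
};

// Split on " : " rather than ':' because tags such as
// "Expr_SIMD_VV-simd::vmul ..." contain colons themselves.
std::vector<std::string> split_fields(const std::string& line)
{
  const std::string sep = " : ";
  std::vector<std::string> parts;
  std::string::size_type start = 0, pos;
  while ((pos = line.find(sep, start)) != std::string::npos)
  {
    parts.push_back(line.substr(start, pos - start));
    start = pos + sep.size();
  }
  parts.push_back(line.substr(start));
  return parts;
}

Profile_entry parse_profile_line(const std::string& line)
{
  std::vector<std::string> f = split_fields(line);
  return Profile_entry{f.at(0), std::stod(f.at(1)), std::stod(f.at(2)),
                       std::stod(f.at(3)), std::stod(f.at(4))};
}

// Recompute MFLOP/s from the raw fields, given an assumed tick rate in Hz.
double recompute_mflops(const Profile_entry& e, double ticks_per_second)
{
  double seconds = e.ticks / ticks_per_second;
  return e.ops_per_call * e.calls / seconds / 1e6;
}
```
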
 > +
 >      </section>
 > -    <section id="trace-profile-data"><title>Trace Profile Data</title>
 > +    <section><title>Convert to frequency domain</title>
 >      <para>
 > -      This mode is used similarly to accumulate mode, except that an
 > -      extra parameter is passed to the creation of the <code>Profile</code>
 > -      object.
 > -      <screen>Profile profile("/dev/stdout", pm_trace);</screen>
 > +      The next step of the computation is to convert from the time domain
 > +      to the frequency domain.  In particular, an FFT is applied to a data
 > +      cube of 64 pulses, each containing 256 range cells:

   "In particular, an FFT is applied to each pulse of a data cube, which
   consists of 64 pulses each containing 256 range cells:"

 > +<screen>Fftm Fwd row_type C-C by_ref 64x256 : 1188144 : 1 : 1146880 : 3466.65
 > +</screen>
 > +      For this FFT, the size is reported differently (rows x columns)
 > +      because this is a two-dimensional FFT.

It's not a 2-D FFT, it's a "multiple 1-D FFT":

   "For this operation, an Fftm object was used to perform multiple FFTs,
    one on each row of the data cube."

 > +      The operation count (1.1 million) far outweighs that of
 > +      any other step, except the inverse FFT.
 > +      The performance measured was 3.5 GFLOP/s on a 3.6 GHz Xeon.
 > +      Since the theoretical peak performance on such
 > +      a machine is about 14.4 GFLOP/s, the program has achieved
 > +      a very good 24% of peak.
 > +      Other example programs measure in-cache FFT performance on vectors
 > +      of the same size at 4.9 GFLOP/s.  Therefore, considering that the
 > +      3.5 GFLOP/s includes cache overheads, the result is still good.

I would move the first sentence into a new paragraph after this one, to
give it more context:

   "Since the operation count (1.1 million) of the FFT (and of the inverse
   FFT) outweighs the rest of the computation, the overall performance
   will be very close to the FFT performance."
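
The operation counts in the quoted lines match the usual conventions, which is a quick way to see why the FFTs dominate. The helper names below are hypothetical. By convention, a length-n complex FFT counts as 5 n log2(n) flops, a complex-by-complex multiply as 6 flops, and a complex-by-real multiply as 2 flops.

One detail in the Fftm line deserves attention. Its figure of 1146880 equals 5 n log2(n) for a single transform of n = 64 x 256 = 16384 points. By contrast, 64 separate 256-point FFTs would count 64 x 10240 = 655360. The per-row accounting may therefore be worth double-checking before the 3.5 GFLOP/s figure is quoted.

```cpp
#include <cassert>
#include <cmath>

// Conventional flop count for a complex FFT of length n: 5 n log2(n).
double fft_ops(double n) { return 5.0 * n * std::log2(n); }

// A complex-by-complex multiply costs 6 real flops (4 multiplies, 2 adds).
double cmul_ops(double elements) { return 6.0 * elements; }

// A complex-by-real-scalar multiply costs 2 real flops.
double cscale_ops(double elements) { return 2.0 * elements; }
```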

 > +    </para>
 > +    </section>
 > +    <section><title>Convolution</title>
 > +    <para>
 > +      The actual convolution consists of a vector-matrix multiplication.
 > +      The corresponding profiling output is:
 > +<screen>Expr_Loop_Vmmul 2D vmmul(C,C) 64x256 : 1539531 : 1 : 98304 : 229.321
 > +</screen>
 > +      Sourcery VSIPL++ chose to evaluate this expression by performing a
 > +      row-wise vector-vector multiplication on each of the rows of the
 > +      matrix.  Therefore, there is a second line:
 > +<screen>Expr_SIMD_VV-simd::vmul 1D *(C,C) 256 : 316674 : 64 : 1536 : 1114.86
 > +</screen>
 > +      The tag used for this expression is "*(C,C)".  The profiling tag for
 > +      many operations is shown using a prefix notation; the operation
 > +      performed is followed by the types of the arguments.  The "simd" tag
 > +      indicates that VSIPL++ used the Single Instruction Multiple Data (SIMD)
 > +      facilities on the Xeon architecture for maximum performance.
 > +    </para>
 > +    <para>
 > +      The tick count for the vector-matrix multiplication (vmmul) includes
 > +      the time spent in the multiple row-wise vector-vector multiplications.
 > +      Therefore the total time used by the program is *not* the
 > +      sum of the tick counts given for each line.

We should mention why vmmul performance is less than that of the
constituent vector-vector multiplies:

   "You should notice the performance difference between the vmmul event and
   the individual vector-vector multiplications.  Some of this is due to the
   extra work vmmul does to set up each individual multiplication: loop
   overhead and subview creation.  However, most of this is due to the
   overhead of profiling: the cost of accessing timers and the cost of
   maintaining profile data structures.

   In general, profiling overhead only slows the program execution but
   does not affect the measurements taken.  However, when an operation
   being profiled (such as vmmul) consists of many invocations of other
   profiled operations (such as vector-vector multiplication), measurements
   may be affected.

   When profiling is disabled, the performance of vmmul will be very close
   to the performance measured for the individual vector-vector
   multiplications."
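
A vmmul evaluated row by row may help explain the difference. The sketch below is plain C++, not VSIPL++ code, and the function name vmmul_rows is hypothetical. Its outer loop issues one vector-vector multiply per row. Any fixed per-row cost, such as loop setup, subview creation, or a profiler timer call, is therefore paid 64 times inside a single vmmul event.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> cfloat;

// Multiply each row of the rows x cols matrix m (stored row-major)
// element-wise by the vector v. Each pass of the outer loop is one
// vector-vector multiply, like the *(C,C) events in the profile.
std::vector<cfloat> vmmul_rows(std::vector<cfloat> const& v,
                               std::vector<cfloat> const& m,
                               std::size_t rows, std::size_t cols)
{
  std::vector<cfloat> result(rows * cols);
  for (std::size_t r = 0; r < rows; ++r)      // one multiply per row
    for (std::size_t c = 0; c < cols; ++c)    // element-wise within the row
      result[r * cols + c] = v[c] * m[r * cols + c];
  return result;
}
```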

 > +    </para>
 > +    </section>
 > +    <section><title>Convert back to time domain</title>
 > +    <para>
 > +      The last step of the algorithm is to convert back to the time domain
 > +      by using an inverse FFT.  An inverse FFT is computationally
 > +      equivalent to a forward FFT, except that an additional multiplication
 > +      is performed to handle scaling.  The lines corresponding to the
 > +      inverse FFT are:

When scaling is done is a choice left up to the user, so instead of
saying "An inverse FFT is computationally equiv to a forward FFT,
except ...", which implies this is true of all FFTs, you might say
"The inverse FFT is computationally equiv to the forward FFT, except
...", which implies this is true for the FFTs in the example.

 > +<screen>Expr_Dense 2D *(C,s) 64x256 : 687285 : 1 : 32768 : 171.228
 > +Expr_Loop 1D *(C,s) 16384 : 653265 : 1 : 32768 : 180.145
 > +Fftm Inv row_type C-C by_ref 64x256 : 1559304 : 1 : 1146880 : 2641.48
 > +</screen>
 > +      The first line describes a evaluation of a "dense" two-
 > +      dimensional multiplication between a single-precision complex
 > +      view (a matrix) and a single-precision scalar.  Note that
 > +      scalars are represented using lower-case equivalents for
 > +      the data types in the table above.
 > +    </para>
 > +    <para>
 > +      A "dense" matrix is one in which the values are packed
 > +      tightly in memory with no intervening space between the rows
 > +      or columns.  Therefore, the two-dimensional multiplication can
 > +      be thought of as a 1-dimensional multiplication of a long vector.
 > +      The evaluation of the 2-D operation includes the time required for
 > +      the 1-D operation, together with a small amount of overhead.
 > +      You can tell that this is the case as the time shown on the
 > +      first line is slightly greater than the time shown on the second.
 > +      Both show the same number of operations because they are
 > +      referring to the same calculation.
 > +    </para>
 > +    <para>
 > +      Similarly, the time required for the inverse FFT includes both the
 > +      time spent actually computing the FFT as well as the time required
 > +      for the scaling multiplication.  Because the multiplication is not
 > +      included in the theoretical operation count, the MOP/s count shown
 > +      is somewhat smaller than than for the forward FFT.

I believe the theoretical operation count is intended to include this
scaling cost, but it requires extra effort on the part of the
implementation:

   "For FFTs, Sourcery VSIPL++ uses the commonly accepted theoretical
    operation count of 5 N log2(N).  This includes the cost of scaling,
    which may be folded in with final twiddle factors.  However, as this
    example illustrates, not all FFT backends have this capability; as a
    result, scaled FFTs often have a lower MOP/s rate than non-scaled
    FFTs."
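
The C++ sketch below makes the MOP/s discussion concrete. It shows the 5 N log2(N) count and how the MOP/s column relates to the other columns of an accumulate-mode line ("tag : ticks : count : ops : mops"). The mops() formula assumes that ops is per call and ticks is the total over all calls. That formula is inferred from, and consistent with, the sample lines quoted in this thread; it is not taken from the library source. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Commonly accepted theoretical flop count for a complex FFT of length n.
double fft_ops(std::uint64_t n)
{
  double const N = static_cast<double>(n);
  return 5.0 * N * std::log2(N);
}

// MOP/s for one accumulate-mode line (assumed formula, see above):
// ops per call times number of calls, divided by elapsed seconds, in millions.
double mops(double ticks, double count, double ops, double clocks_per_sec)
{
  double const seconds = ticks / clocks_per_sec;
  return ops * count / seconds / 1e6;
}

// Back out the clocks_per_sec header value from a line whose MOP/s is known.
double implied_clocks_per_sec(double ticks, double count, double ops,
                              double mops_value)
{
  return mops_value * 1e6 * ticks / (ops * count);
}
```

In the checks below, clocks_per_sec is backed out of the Expr_Dense line, which gives about 3.59e9. With that value, the same formula reproduces the Fftm Inv figure (2641.48) and the vmul figure (1114.86) to within rounding. A backend that scales in a separate pass adds ticks without adding ops, so its MOP/s figure drops.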

 > +    </para>
 > +    </section>
 > +  </section>
 > +  <para>
 > +    The analysis presented in this section is only a portion of what
 > +    one would do to verify an algorithm is performing as desired.
 > +    Core routines utilizing techniques such as the fast convolution
 > +    method comprise only a portion of larger programs whose
 > +    performance is also of interest.
 > +    The profiling capabilities utilized here can be extended to cover
 > +    those areas of the application as well.
 > +    See <xref linkend="application_profiling"/> for more details.
 > +  </para>
 > +  </section>
 > +    <section><title>Trace Profile Data</title>
 > +    <para>

Flow suggestion: describe what trace mode is, then give details on how
to enable it:

   "In trace mode, the profiler records each library call as a pair
   of events, allowing you to see where each call was made and when it
   returned. This provides two time stamps per call, showing not only
   which functions were executed, but how they were nested with respect
   to one another.  This mode is useful for investigating the execution
   sequence of your program.

   To enable trace mode, construct the 'Profile' object with a 'pm_trace'
   flag, as in this line:

     <screen>Profile profile("profile.txt", pm_trace);</screen>

   Long traces can result when profiling in this mode, so be sure to
   avoid gathering more data than you have memory to store (and have
   time to process later).  The output is very similar to the output in
   accumulate mode."
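
The following C++ sketch is a rough model of what trace mode records, and it assumes nothing about the library's internals. A hypothetical Trace_scope guard writes one record when a call starts and one when it returns, so nested calls end up as nested pairs of events. The record fields follow the description of the trace output in this patch: event number, tag and timestamp, then either zero and an op count, or the number of the matching call event and zero. Trace_record, Trace_log and Trace_scope are all illustrative names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <vector>

// One trace record (hypothetical layout mirroring the trace-mode fields).
struct Trace_record
{
  int           number;  // event number, starting at 1
  std::string   tag;     // identifying tag
  std::uint64_t stamp;   // timestamp in (steady_clock) ticks
  int           opened;  // 0 at a call; number of the call event at a return
  std::uint64_t ops;     // op estimate at a call; 0 at a return
};

// Hypothetical in-memory trace log.
class Trace_log
{
public:
  int enter(std::string const& tag, std::uint64_t ops)
  {
    int n = static_cast<int>(records.size()) + 1;
    records.push_back({n, tag, now(), 0, ops});
    return n;
  }

  void leave(int entry)
  {
    int n = static_cast<int>(records.size()) + 1;
    records.push_back({n, records[entry - 1].tag, now(), entry, 0});
  }

  std::vector<Trace_record> records;

private:
  static std::uint64_t now()
  {
    return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
  }
};

// RAII guard: the constructor logs the call, the destructor logs the return.
class Trace_scope
{
public:
  Trace_scope(Trace_log& log, std::string const& tag, std::uint64_t ops = 0)
    : log_(log), entry_(log.enter(tag, ops)) {}
  ~Trace_scope() { log_.leave(entry_); }

  Trace_scope(Trace_scope const&) = delete;
  Trace_scope& operator=(Trace_scope const&) = delete;

private:
  Trace_log& log_;
  int        entry_;
};
```

The guard's destructor runs when its scope closes. For an outer call wrapping two inner calls, the inner returns therefore appear as events 3 and 5, before the outer return at event 6.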


 > +      By passing an additional parameter to the 'Profile' constructor,
 > +      you can switch from "accumulate" mode to "trace" mode.  This line:
 > +<screen>Profile profile("profile.txt", pm_trace);</screen>
 > +      will cause Sourcery VSIPL++ to enter trace profiling mode.
 > +      All computations performed by your program while
 > +      <code>profile</code> is in scope will be traced.
 >        This mode is useful for investigating the execution sequence
 >        of your program.
 > -      The profiler simply records each library call as a pair of events,
 > -      allowing you to see where it entered and exited scope in each case.
 > +      The profiler records each library call as a pair of events,
 > +      allowing you to see where each call was made and when it returned.
 > +      This provides two time stamps per call, showing not only which
 > +      functions were executed, but how they were nested with respect
 > +      to one another.
 > +      Long traces can result when profiling in this mode, so
 > +      be sure to avoid gathering more data than you have memory to
 > +      store (and have time to process later).  The output is very
 > +      similar to the output in accumulate mode.
 >      </para>
 >      <para>
 > -      Long traces can result when profiling in this mode, so be sure to
 > -      avoid taking more data than you have memory to store (and have time
 > -      to process later).  The output is very similar to the output in
 > -      accumulate mode.
 > +      Here is a sample of the output obtained by running the fast
 > +      convolution example in trace mode, which can also be run with
 > +      the options
 > +<screen>--vsipl++-profile-mode=trace --vsipl++-profile-output=profile.txt
 > +</screen>
 >      </para>
 >      <programlisting><xi:include href="src/profile_trace.txt" parse="text"/>
 >      </programlisting>
 >      <para>
 > -      For each event, the profiler outputs an event number, an indentifying
 > -      tag, and the current timestamp (in "ticks").  The next two fields
 > -      differ depending on whether the event is coming into scope or out of
 > -      scope.  When coming into scope, a zero is shown followed by the
 > -      estimated count of floating point operations for that function.
 > -      When exiting scope, the profiler displays the event number being
 > -      closed followed by a zero.  In all cases, the timestamp (and
 > -      intervals) may be converted to seconds by dividing by the
 > -      'clocks_per_second' constant in the log file header.
 > +      For each event, the Sourcery VSIPL++ outputs an event number,
 > +      an indentifying tag, and the current timestamp (in "ticks").
 > +      The next two fields differ depending on whether the event
 > +      marks the entry point of a library function or its return.
 > +      At the start of a call, a zero is shown followed by the estimated
 > +      count of floating point operations for that function.  When
 > +      returning from a call, the profiler displays the event number
 > +      created when the function was called, followed by a zero.
 > +      In all cases, the timestamp (and intervals) may be converted to
 > +      seconds by dividing by the 'clocks_per_second' constant in the
 > +      log file header.
 >      </para>
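
Going the other way, here is a C++ sketch that turns trace records into per-call durations in seconds. It uses the matching rule described above, in which a return record names the event number of its call, and it divides tick differences by the clocks-per-second header value. The Trace_record and Call types and the pair_events function are hypothetical. The sketch also assumes event numbers start at 1, so that 0 unambiguously marks a call. Parsing the text of profile.txt is not shown, because its exact delimiters are not given here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Trace_record
{
  int           number;  // event number
  std::string   tag;     // identifying tag
  std::uint64_t stamp;   // timestamp in ticks
  int           opened;  // 0 at a call; number of the call event at a return
  std::uint64_t ops;     // op estimate at a call; 0 at a return
};

struct Call
{
  std::string   tag;
  double        seconds;
  std::uint64_t ops;
};

// Pair each return with its call; results are listed in return order,
// so inner (nested) calls come before the call that contains them.
std::vector<Call> pair_events(std::vector<Trace_record> const& recs,
                              double clocks_per_sec)
{
  std::map<int, Trace_record const*> open;  // calls that have not returned yet
  std::vector<Call> calls;
  for (Trace_record const& r : recs)
  {
    if (r.opened == 0)
    {
      open[r.number] = &r;  // call event
      continue;
    }
    auto it = open.find(r.opened);
    if (it == open.end())
      continue;             // return without a recorded call: skip it
    Trace_record const& call = *it->second;
    double ticks = static_cast<double>(r.stamp - call.stamp);
    calls.push_back({call.tag, ticks / clocks_per_sec, call.ops});
    open.erase(it);
  }
  return calls;
}
```
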
 > +    <para>
 > +      In the break shown by the ellipses, the program is in the middle of
 > +      performing the vector-matrix multiply, which has been broken down
 > +      into 64 separate vector-multiplies.  The first two FFT's are
 > +      completed, as shown by the fact that each have two entries,
 > +      one for where the computation began and one for where it ended.
 > +      The Vmmul function has started, but not yet finished, so it only
 > +      has one entry as of yet.

The output includes the end event of the vmmul.  How about

   "For brevity, events for some of the 64 scalar-vector multiplies performed
    in the vmmul operation have been replaced with an ellipsis."

 > +    </para>
 >      </section>
 > -    <section id="performance-api"><title>Performance API</title>
 > +    <section xml:id="performance_api"><title>Performance API</title>
 >      <para>
 >        An additional interface is provided for getting run-time profile data.
 >        This allows you to selectively monitor the performance of a
 > @@ -166,19 +331,19 @@
 >      </para>
 >      <para>
 >        Classes with the Performance API provide a function called
 > -      <code>impl_performance</code> that takes a string parameter and returns
 > -      single-precision floating point number.
 > +      <code>impl_performance</code> that takes a std::string parameter
 > +      and returns a single-precision floating point number.

Doesn't impl_performance take a 'char const*' parameter?
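
The distinction matters to callers. With a char const* parameter, a caller holding a std::string has to pass .c_str(). The later performance.xml revision in this archive describes the parameter as "a pointer to a constant character string". Below is a hypothetical C++ mock showing only that shape. The Mock_fft class, its constructor, and the fallback of 0 for unrecognized keywords are assumptions. Only the "mops" keyword comes from the tutorial.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>
#include <string>

// Hypothetical class with a Performance-API-style query function.
class Mock_fft
{
public:
  Mock_fft(double ops_per_call, unsigned calls, double seconds)
    : ops_per_call_(ops_per_call), calls_(calls), seconds_(seconds) {}

  // Keyword (C-style string) in, single-precision value out.
  float impl_performance(char const* what) const
  {
    if (std::strcmp(what, "mops") == 0 && seconds_ > 0.0)
      return static_cast<float>(ops_per_call_ * calls_ / seconds_ / 1e6);
    return 0.0f;  // assumption: unknown keyword (or no timing yet) yields 0
  }

private:
  double   ops_per_call_;
  unsigned calls_;
  double   seconds_;
};
```
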

 >      </para>
 >      <para>
 >        The following call shows how to obtain an estimate of the performance
 >        in number of operations per second:
 > -
 > -      <screen>float mops = fwd_fft.impl_performance("mops");</screen>
 > -
 > -      An "operation" will vary depending on the object and type of data
 > -      being processed.  For example, a single-precison Fft object will
 > -      return the number of single-precison floating-point operations
 > -      performed per second.
 > +<screen>float mops = fwd_fft.impl_performance("mops");</screen>
 > +      The definition of "operation" varies depending on the object
 > +      and type of data being processed.  For example, a single-precison
 > +      Fft object will return the number of single-precison
 > +      floating-point operations performed per second while a complex
 > +      double-precision FFT object will return the number of double-
 > +      precision floating-point operations performed per second.
 >      </para>
 >      <para>
 >        The table below lists the current types of information available.
 > @@ -219,37 +384,59 @@
 >  </table>
 >      </section>
 >    </section>
 > -  <section id="application-profiling"><title>Application Profiling</title>
 > +  <section xml:id="application_profiling">
 > +    <title>Application Profiling</title>
 >      <para>
 > -      The profiling mode provides an API that allows you to instrument
 > -      your own code.  Here we introduce a new object, the
 > -      <code>Scope_event</code> class, and show you how to use it in your
 > -      application.
 > +      Sourcery VSIPL++ provides an interface that allows you to instrument
 > +      your own code through the <code>Scope_event</code> class.

For avoidance of doubt, you could mention that these events get included
in the profiling output:

   "Sourcery VSIPL++ provides an interface that allows you to instrument
   your own code with profiling events that will be included in the
   accumulate mode and trace mode output."

   "Profiling events are recorded by constructing a 'Scope_event' object.
   ... MERGE WITH NEXT PARAGRAPH"
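
To illustrate the mechanism being described, here is a standalone C++ imitation of a scope-based profiling event. My_scope_event and profile_table are hypothetical and are not the library's Scope_event. The tag and optional op count are supplied to the constructor, and the elapsed time is added to a per-tag total when the object is destroyed.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Per-tag totals, roughly one line of accumulate-mode output.
struct Totals
{
  std::uint64_t ticks     = 0;  // summed elapsed steady_clock ticks
  unsigned      count     = 0;  // number of times the scope was entered
  std::uint64_t total_ops = 0;  // summed op counts passed to the constructor
};

// Hypothetical global table standing in for the profiler's data.
inline std::map<std::string, Totals>& profile_table()
{
  static std::map<std::string, Totals> table;
  return table;
}

class My_scope_event
{
public:
  explicit My_scope_event(std::string tag, std::uint64_t ops = 0)
    : tag_(std::move(tag)), ops_(ops), start_(std::chrono::steady_clock::now()) {}

  ~My_scope_event()
  {
    auto const elapsed = std::chrono::steady_clock::now() - start_;
    Totals& t = profile_table()[tag_];
    t.ticks     += static_cast<std::uint64_t>(elapsed.count());
    t.count     += 1;
    t.total_ops += ops_;
  }

  My_scope_event(My_scope_event const&) = delete;
  My_scope_event& operator=(My_scope_event const&) = delete;

private:
  std::string                           tag_;
  std::uint64_t                         ops_;
  std::chrono::steady_clock::time_point start_;
};
```

This sketch sums op counts across calls. The library's accumulate output appears instead to list the per-call count, for example 1536 for each of the 64 vmul calls.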

 >      </para>
 >      <para>
 > -      To create a <code>Scope_event</code>, simply call the constructor, passing
 > -      it the string that will become the event tag and, optionally, an integer
 > -      value expressing the number of floating point operations that will
 > -      be performed by the time the <code>Scope_event</code> object is destroyed.
 > -      For example, to  measure the time taken to compute a simple running sum
 > -      of squares over a C array:
 > +      To create a <code>Scope_event</code>, call the constructor, passing
 > +      it a std::string that will become the event tag and, optionally, an
 > +      integer value expressing the number of floating point operations
 > +      that will be performed by the time the <code>Scope_event</code>
 > +      object is destroyed.  For example, to measure the time taken to
 > +      compute the main portion in the fast convolution example,
 > +      modify the source as follows:
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Wed Sep 13 18:32:55 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Wed, 13 Sep 2006 14:32:55 -0400
Subject: patch: Add support for IPP on windows.
Message-ID: <45084ED7.8000003@codesourcery.com>

The attached patch makes the library compile with IPP on windows.
OK to commit ?

Regards,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
A non-text attachment was scrubbed...
Name: IPP.patch
Type: text/x-patch
Size: 4575 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060913/22f9938a/attachment.bin>

From mark at codesourcery.com  Wed Sep 13 18:37:20 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Wed, 13 Sep 2006 11:37:20 -0700
Subject: [vsipl++] patch: Add support for IPP on windows.
In-Reply-To: <45084ED7.8000003@codesourcery.com>
References: <45084ED7.8000003@codesourcery.com>
Message-ID: <45084FE0.40608@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch makes the library compile with IPP on windows.
> OK to commit ?

Looks OK to me.

> Index: src/vsip/impl/config.hpp
> ===================================================================
> --- src/vsip/impl/config.hpp	(revision 149109)
> +++ src/vsip/impl/config.hpp	(working copy)
> @@ -29,6 +29,13 @@
>  # define VSIP_IMPL_PI 3.14159265358979323846
>  #endif
>  
> +#if defined(_WIN32) && VSIP_IMPL_HAVE_IPP
> +// IPP on Windows uses __stdcall for all functions.
> +# define VSIP_IMPL_IPP_CALL __stdcall
> +#else
> +# define VSIP_IMPL_IPP_CALL
> +#endif

Here, you probably don't really need the HAVE_IPP test, since the macro 
is only used with IPP.  It's not harmful; just seems redundant.
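
Without the VSIP_IMPL_HAVE_IPP test, the block reduces to a platform check alone, as in the C++ sketch below. The function and the function-pointer typedef are hypothetical stand-ins for an IPP-style entry point; how the macro is actually used in the patch is not visible in this message. On non-Windows targets the macro expands to nothing, so declarations are unaffected.

```cpp
#include <cassert>

// Platform check only: the macro is only ever applied to IPP-related
// declarations, so a separate VSIP_IMPL_HAVE_IPP test is redundant.
#if defined(_WIN32)
# define VSIP_IMPL_IPP_CALL __stdcall
#else
# define VSIP_IMPL_IPP_CALL
#endif

// Hypothetical function standing in for an IPP-style entry point.
extern "C" int VSIP_IMPL_IPP_CALL demo_add(int a, int b)
{
  return a + b;
}

// Function pointer types that refer to such functions need the same
// convention, or the types will not match on 32-bit Windows.
typedef int (VSIP_IMPL_IPP_CALL *demo_add_fn)(int, int);
```
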

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From jules at codesourcery.com  Wed Sep 13 21:11:23 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 13 Sep 2006 17:11:23 -0400
Subject: [vsipl++] patch: Add support for IPP on windows.
In-Reply-To: <45084FE0.40608@codesourcery.com>
References: <45084ED7.8000003@codesourcery.com> <45084FE0.40608@codesourcery.com>
Message-ID: <450873FB.7030909@codesourcery.com>

Mark Mitchell wrote:
> Stefan Seefeld wrote:
>> The attached patch makes the library compile with IPP on windows.
>> OK to commit ?
> 
> Looks OK to me.

Looks good here too.  Same comment as Mark about the VSIP_IMPL_HAVE_IPP 
test.  :)  Please check it in.

				thanks,
				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Thu Sep 14 07:52:28 2006
From: don at codesourcery.com (Don McCoy)
Date: Thu, 14 Sep 2006 01:52:28 -0600
Subject: [vsipl++] [patch] Tutorial updates
In-Reply-To: <45080304.6010704@codesourcery.com>
References: <45057AE9.3040906@codesourcery.com> <45074DC8.6000608@codesourcery.com> <45080304.6010704@codesourcery.com>
Message-ID: <45090A3C.4040403@codesourcery.com>

Jules Bergmann wrote:
> Don,
>
> This looks good.  I have several suggestions below on the tutorial 
> chapter.
> Use them as you please :)  Once you're happy please check it in.  We can
> continue to incorporate edits as we review at the whole document.
>
Thanks again for the comments.  It is now checked in with these edits 
and a few others, as attached.

-- 
Don McCoy
don (at) CodeSourcery 
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pt4.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060914/e69550c1/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pt4.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060914/e69550c1/attachment-0001.ksh>

From jules at codesourcery.com  Thu Sep 14 21:00:58 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 14 Sep 2006 17:00:58 -0400
Subject: [patch] work around for icl transpose bug
Message-ID: <4509C30A.2010507@codesourcery.com>

This patch attempts to work around the icl bug with complex transpose.

It has been tested with Intel C++ 9.1 for Windows ia32 against a 
simplified test case that triggered the bug (I will post that test later 
today).  It has not been tested with the original solver failures.

Patch applied.

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: trans.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060914/eebc0f87/attachment.ksh>

From mark at codesourcery.com  Thu Sep 14 22:18:22 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Thu, 14 Sep 2006 15:18:22 -0700
Subject: [vsipl++] [patch] work around for icl transpose bug
In-Reply-To: <4509C30A.2010507@codesourcery.com>
References: <4509C30A.2010507@codesourcery.com>
Message-ID: <4509D52E.9000004@codesourcery.com>

Jules Bergmann wrote:
> This patch attempts to work around the icl bug with complex transpose.
> 
> It has been tested with Intel C++ 9.1 for Windows ia32 against a 
> simplified test case that triggered the bug (I will post that test later 
> today).  It has not been tested with the original solver failures.

How horribly awfully sad. :-(  Looking at the test case you posted, I 
don't spot a coding bug.  It's always possible, but I didn't see it. 
So, it does seem most likely to be a coding bug.

In any case, given the schedule, I definitely agree that a work-around 
is in order.

I'm finishing up minor edits to the tutorial this afternoon/evening.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From mark at codesourcery.com  Thu Sep 14 22:21:33 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Thu, 14 Sep 2006 15:21:33 -0700
Subject: [vsipl++] [patch] work around for icl transpose bug
In-Reply-To: <4509D52E.9000004@codesourcery.com>
References: <4509C30A.2010507@codesourcery.com> <4509D52E.9000004@codesourcery.com>
Message-ID: <4509D5ED.1070908@codesourcery.com>

Mark Mitchell wrote:
> Jules Bergmann wrote:
>> This patch attempts to work around the icl bug with complex transpose.
>>
>> It has been tested with Intel C++ 9.1 for Windows ia32 against a 
>> simplified test case that triggered the bug (I will post that test 
>> later today).  It has not been tested with the original solver failures.
> 
> How horribly awfully sad. :-(  Looking at the test case you posted, I 
> don't spot a coding bug.  It's always possible, but I didn't see it. So, 
> it does seem most likely to be a coding bug.

compiler bug, I mean.

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From mark at codesourcery.com  Fri Sep 15 01:11:54 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Thu, 14 Sep 2006 18:11:54 -0700
Subject: PATCH: Updates to tutorial
Message-ID: <200609150111.k8F1Bsm2013570@sethra.codesourcery.com>


This patch fixes some typos/grammar/etc. in the tutorial.  There's
clearly more we could do to improve the documentation, but this will
do for the upcoming release.

Jules, Don, I noticed that there's no performance graph for the
temporal-locality version of the parallel fast convolution.  Is that
graph available?

Thanks,

--
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713

2006-09-14  Mark Mitchell  <mark at codesourcery.com>

	* doc/tutorial/tutorial.xml: Add references to API reference and
	specification.
	* doc/tutorial/performance.xml: Edit.
	* doc/tutorial/parallel.xml: Likewise.
	* doc/tutorial/serial.xml: Likewise.

Index: performance.xml
===================================================================
--- performance.xml	(revision 149238)
+++ performance.xml	(working copy)
@@ -12,8 +12,27 @@
 ]>
 <chapter id="chap-performance"
          xmlns:xi="http://www.w3.org/2003/XInclude">
-  <title>Performance</title>
+  <title>Profiling</title>
 
+  <chapterinfo>
+   <abstract>
+    <para>
+     This chapter explains how to use the profiling features of Sourcery
+     VSIPL++ to improve the performance of your application.
+    </para>
+   </abstract>
+  </chapterinfo>
+
+  <para>
+   This chapter explains how to use the profiling features of Sourcery
+   VSIPL++ to improve the performance of your application.  Sourcery
+   VSIPL++ provides two profiling modes.  The <firstterm>library
+   profiling</firstterm> mode allows you to gather data about the
+   time used for computations performed through the VSIPL++ API.  The
+   <firstterm>application profiling</firstterm> mode allows you to
+   instrument blocks of application code to gather data at a higher
+   level.
+  </para>
 
   <section><title>Library Profiling</title>
     <para>
@@ -90,16 +109,15 @@
     <para>
       To enable profiling, define 
       <option>-DVSIP_IMPL_PROFILER=<replaceable>mask</replaceable></option>
-      on the command line when compiling your program.  
-      On many systems, this option may be added to the CXXFLAGS variable 
-      in the project makefile.  
-    </para>
-    <para>
-      Since profiling can introduce overhead, especially for element-wise
-      expressions, this macro allows you to choose which operations in the
-      library are profiled.  To profile all operations, use 
+      on the command line when compiling your program.  (If you are
+      using <command>make</command> to build your program, you might
+      want to add this command-line option to the
+      <varname>CXXFLAGS</varname> variable.)  To profile all operations, use 
       <option>-DVSIP_IMPL_PROFILER=15</option>.  
       See <xref linkend="mask-values"/> for other possible values.
+      Since profiling introduces some overhead, especially for element-wise
+      expressions, you may wish to limit the set of operations that are
+      are profiled.  
     </para>
     <note>
       <para>
@@ -115,27 +133,31 @@
 
     <section><title>Accumulating Profile Data</title>
     <para>
-      To use the accumulate mode, you must declare a <code>Profile</code>
+      To use the accumulate mode, you must declare a
+      <classname>Profile</classname>
       object.  Sourcery VSIPL++ will collect profiling data throughout 
-      its lifetime.  When the object goes out of scope, the data 
-      collected by profiling will be written to a log file.  For 
+      the lifetime of this object.  When the object goes out of scope,
+      the data collected by profiling will be written to a log file.  For 
       example, to profile your entire program, with all data written 
       to the file <filename>profile.txt</filename>, you would add 
       this line:
 
 <screen>Profile profile("profile.txt", pm_accum);</screen>
 
-      to the beginning of your <code>main</code> function, after 
+      to the beginning of your <function>main</function> function, after 
       initializing Sourcery VSIPL++.  Then, when the program exits, 
       this object will go out of scope and profiling data will be 
       written to the output file.  For this reason, only one object 
       of this type may be in scope at any given time.
     </para>
     <para>
-      If you are profiling your entire program, you may specify options
-      on the command line that perform the equivalent of the above two steps:
-
+      If you want to profile your entire program, you may invoke your
+      program with the following command-line options:
 <screen>--vsipl++-profile-mode=accum --vsipl++-profile-output=profile.txt</screen>
+      These options will be processed during the call to
+      <function>vsip::init</function>, and are equivalent to declaring
+      the profiling object in <function>maine</function>, as described
+      above.
     </para>
     <para>
       Using this technique on the example program <filename>fce-serial.cpp
@@ -149,7 +171,8 @@
       (or "event").  The first column gives a name for the event.  The 
       second column is the total amount of time spent in this operation 
       in "ticks". (You can convert ticks to seconds by dividing by the 
-      value given by the "clocks_per_sec" value in the profiling header.)  
+      value given by the <varname>clocks_per_sec</varname> value in
+      the profiling header.)  
       The third column indicates the number of times this operation was 
       performed.  The fourth column indicates the number of mathematical 
       operations performed during the computation.  (This is the number of 
@@ -369,32 +392,34 @@ Fftm Inv C-C by_ref 64x256 : 1559304 : 1
 
     <section xml:id="performance_api"><title>Performance API</title>
     <para>
-      An additional interface is provided for getting run-time profile data.
-      This allows you to selectively monitor the performance of a 
-      particular instance of a VSIPL class such as Fft, Convolution or
-      Correlation.
-    </para>
-    <para>
-      Classes instrumented the Performance API provide a function 
-      called <code>impl_performance</code> that takes a pointer to a 
-      constant character string and returns a single-precision floating 
-      point number.
+      Sourcery VSIPL++ provides an additional, low-level interface for
+      accessing profile data.  This interface allows you to
+      selectively monitor the performance of a particular instance of
+      classes that implement the Performance API.  Classes
+      instrumented the Performance API provide a function called
+      <methodname>impl_performance</methodname>.  This function maps
+      keywords (provided as C-style strings) to floating-point values.
+      The <classname>Fft</classname>,
+      <classname>Convolution</classname>, and
+      <classname>Correlation</classname> classes all implement the
+      performance API.
     </para>
     <para>
       The following call shows how to obtain an estimate of the performance
-      in number of operations per second:
+      in number of operations per second from a particular FFT object:
 
 <screen>float mops = fwd_fft.impl_performance("mops");</screen>
 
-      The definition of "operation" varies depending on the object 
-      and type of data being processed.  For example, a single-precison 
-      Fft object will return the number of single-precison 
-      floating-point operations performed per second while a complex 
-      double-precision FFT object will return the number of double-
-      precision floating-point operations performed per second.
+      The definition of &quot;operation&quot; varies depending on the
+      object and type of data being processed.  For example, a
+      single-precison FFT object will return the number of
+      single-precison floating-point operations performed per second
+      while a complex double-precision FFT object will return the
+      number of double-precision floating-point operations performed
+      per second.
     </para>
     <para>
-      The table below lists the current types of information available.
+      The table below lists the information available.
     </para>
 <table frame="none" rowsep="0"><title>Performance API Metrics</title>
 <tgroup cols="2">
@@ -442,28 +467,28 @@ Fftm Inv C-C by_ref 64x256 : 1559304 : 1
       included in the accumulate mode and trace mode output.
     </para>
     <para>
-      Profiling events are recorded by constructing a <code>Scope_event
-      </code>  object.  To create a <code>Scope_event</code>, call the 
-      constructor, passing it a <code>std::string</code> that will 
+      Profiling events are recorded by constructing a <classname>Scope_event
+      </classname> object.  To create a
+    <classname>Scope_event</classname>, call the 
+      constructor, passing it a <classname>std::string</classname> that will 
       become the event tag and, optionally, an integer value expressing 
       the number of floating point operations that will be performed by 
-      the time the object is destroyed.  
-      For example, to measure the time taken to compute the main portion 
-      in the fast convolution example, modify the source as follows:
+      the time the object is destroyed.  The following example shows
+      how to use this facility:
     </para>
 <programlisting><xi:include href="src/profile_example.cpp" parse="text"/>
 </programlisting>
-    <para>
-      The operation count passed as the second parameter is the 
-      sum of the two FFT's and the vector-matrix multiply.  
-      This resulting profile data is identical in format to that used for
-      profiling library functions.
-    </para>
+   <para>
+     The operation count passed as the second parameter is the 
+     sum of the two FFT's and the vector-matrix multiply.  
+     The resulting profile data is identical in format to that
+     obtained using the library API:
+   </para>
 <programlisting><xi:include href="src/profile_output.txt" parse="text"/>
 </programlisting>
     <para>
       Now the output has a new line that represents the time that
-      the <code>Scope_event</code> object exists, i.e. only while the
+      the <classname>Scope_event</classname> object exists, i.e. only while the
       program executes the three main steps of the fast convolution.
 
 <screen>Fast Convolution : 4256109 : 1 : 2424832 : 2046.11</screen>
Index: tutorial.xml
===================================================================
--- tutorial.xml	(revision 149238)
+++ tutorial.xml	(working copy)
@@ -61,7 +61,11 @@
     <title>Reference</title>
     <partintro>
       <para>
-        The sections in Part II form a reference manual for Sourcery VSIPL++.
+        The sections in Part II provide reference information about
+        Sourcery VSIPL++.  You should also refer to the VSIPL++ API
+	Specification and Sourcery VSIPL++ API Reference, both of
+	which are available at <ulink
+	url="http://www.codesourcery.com/vsiplplusplus"/>.
       </para>
 
       <literallayout>
Index: parallel.xml
===================================================================
--- parallel.xml	(revision 149238)
+++ parallel.xml	(working copy)
@@ -28,7 +28,7 @@
   <para>
    The first fast convolution program in the previous chapter makes
    use of two implicitly parallel operators: <function>Fftm</function> and
-   <function>vmmul</function>.  These operators are implicity parallel
+   <function>vmmul</function>.  These operators are implicitly parallel
    in the sense that they process each row of the matrix
    independently.  If you had enough processors, you could put each
    row on a separate processor and then perform the entire
@@ -38,19 +38,20 @@
   <para>
    In the VSIPL++ API, you have explicit control of the number of
    processors used for a computation.  Since the default is to use
-   just a single processor, the program above will not run in
-   parallel, even on a multi-processor system.  This section will show
-   you how to use <firstterm>maps</firstterm> to take advantage of
-   multiple processors.  Using a map tells Sourcery VSIPL++ to
-   distribute a single block of data across multiple processors.
-   Then, Sourcery VSIPL++ will automatically move data between
-   processors as necessary.
+   just a single processor, the program in <xref
+   linkend="sec-serial-fastconv"/> will not run in parallel, even on a
+   multi-processor system.  This section will show you how to use
+   <firstterm>maps</firstterm> to take advantage of multiple
+   processors.  Using a map tells Sourcery VSIPL++ to distribute a
+   single block of data across multiple processors.  Then, Sourcery
+   VSIPL++ will automatically move data between processors as
+   necessary.
   </para>
 
   <para>
    The VSIPL++ API uses the Single-Program Multiple-Data (SPMD) model
    for parallelism.  In this model, every processor runs the same
-   program, but operates on different sets of data.  For instance, in
+   program, but operates on different sets of data.  For example, in
    the fast convolution example, multiple processors perform FFTs at
    the same time, but each processor handles different rows in the
    matrix.
@@ -218,12 +219,12 @@
    <title>Implicit Parallelism: Parallel Foreach</title>
 
    <para>
-    You may feel that the original formulation was simpler and more
+    You may feel that the original formulation using implicitly
+    parallel operators was simpler and more
     intuitive than the more-efficient variant using explicit loops.
     Sourcery VSIPL++ provides an extension to the VSIPL++ API that
     allows you to retain the elegance of that formulation while still
-    obtaining the temporal locality obtained with the style shown in
-    the previous two sections.
+    obtaining good temporal locality.
    </para>
 
    <para>
@@ -373,11 +374,11 @@
 
   <para>
    Because the data will be arriving via DMA, you must explicitly
-   manage the memory used by Sourcery VSIPL++.  Each processor must allocate
-   the memory for its local portion of
-   <varname>data_in_block</varname>.  (All processors except the
-   actual input processor will allocate zero bytes, since the input
-   data is located on a single processor.)  The code required to
+   manage the memory used by Sourcery VSIPL++.  Because VSIPL++ uses the
+   SPMD model, each processor must allocate
+   the memory for its local portion of the input block, even though all
+   processors except the actual input processor will allocate zero
+   bytes.  The code required to
    set up the views is:
   </para>
 
Index: serial.xml
===================================================================
--- serial.xml	(revision 149238)
+++ serial.xml	(working copy)
@@ -151,7 +151,7 @@
    Before performing the actual convolution, you must convert the 
    replica to the frequency domain using the FFT created above.  Because
    the replica data is a property of the chirp, we only need to do
-   this once; even if our radar system runs for a long time, the
+   this once; even if the radar system runs for a long time, the
    converted replica will always be the same.  VSIPL++ FFT
    objects behave like functions, so you can just &quot;call&quot; the
    FFT object:
@@ -165,7 +165,7 @@
    objects you've already created to go into and out of the frequency
    domain.  While in the frequency domain, you will use the
    <function>vmmul</function> operator to perform a 
-   vector-matrix multiply.  This will multiply each row
+   vector-matrix multiply.  This operator multiplies each row
    (dimension zero) of the frequency-domain matrix by the replica.
    The <function>vmmul</function> operator is a template taking a
    single parameter which indicates whether the multiplication should
@@ -284,7 +284,7 @@
   }]]></programlisting>
 
   <para>
-   The following graph shows that the new &quot;interleaves&quot;
+   The following graph shows that the new &quot;interleaved&quot;
    formulation is faster than the original &quot;phased&quot; approach
    for large data sets.  For smaller data sets (where all of the data
    fits in the cache anyhow), the original method is faster because
@@ -309,12 +309,12 @@
   </para>
 
   <para>
-   To perform I/O with external routines (such as posix
-   <function>read</function> and <function>write</function>
-   it is necessary to obtain a pointer to data.
-   Sourcery VSIPL++ provides multiple ways to do this:
-   using <firstterm>user-defined storage</firstterm>, and
-   using <firstterm>external data access</firstterm>.
+   To perform I/O with external routines (such as the POSIX
+   <function>read</function> and <function>write</function> functions)
+   it is necessary to obtain a pointer to the raw data used by
+   Sourcery VSIPL++. Sourcery VSIPL++ provides two ways to do this:
+   you may use either <firstterm>user-defined storage</firstterm> or
+   <firstterm>external data access</firstterm>.
    In this section you will use user-defined storage to
    perform I/O.  Later, in <xref linkend="sec-io-extdata"/> you
    will see how to use external data access for I/O.
@@ -385,7 +385,7 @@
    The <varname>true</varname> argument indicates that the data
    values sould be preserved by the admit.  In cases where the
    values do not need to preserved (such as admitting a block
-   after outout I/O has been performed and before the block will be
+   after output I/O has been performed and before the block will be
    overwritten by new values in VSIPL++) you can use
    <varname>false</varname> instead.
   </para>
@@ -417,14 +417,13 @@
 
   <para>
    In this section, you will use <firstterm>External Data
-   Access</firstterm> to get pointer to a block's data.
-   External data access allows a pointer to any block's
-   data to be taken, even if the block was not created with
-   user-specified storage (or if the block is not a <varname>Dense</varname>
-   block at all!)  This capability is useful in context where you
-   cannot control how a block is created.  To illustrate
-   this, you will create a utility routine for I/O that works
-   with any view passed as a parameter.
+   Access</firstterm> to get a pointer to a block's data.
+   You can use this method with any block, even if the block does not
+   use user-specified storage.  The external data access method is
+   useful in contexts where you cannot control how the block is
+   allocated.  For example, in this section, you will create a utility
+   routine for I/O that works with any matrix or vector, even if it
+   was not created with user-defined storage.
   </para>
 
   <para>
@@ -440,30 +439,30 @@
    <varname>block_type</varname> and the requested layout
    <varname>layout_type</varname>.  The constructor takes
    two parameters: the block being accessed, and the type of
-   syncing necessary.
+   synchronization necessary.
   </para>
 
   <para>
-   The <varname>layout_type</varname> parameter is an
-   specialized <varname>Layout</varname> class template that
+   The <varname>layout_type</varname> parameter is a
+   specialized <classname>Layout</classname> class template that
    determines the layout of data that <function>Ext_data</function>
    provides.  If no type is given,
    the natural layout of the block is used.  However, in some
-   cases it is necessary to access the data in a certain way,
-   such as dense or row-major.
+   cases you may wish to specify row-major or column-major layout. 
   </para>
 
   <para>
-   <varname>Layout</varname> class template takes 4 parameters to
+   The <classname>Layout</classname> class template takes 4 parameters to
    indicate dimensionality, dimension-ordering, packing format,
    and complex storage format (if complex).  In the example below
    you will use the layout_type to request the data access to be dense,
-   row-major, with interleaved real and imaginar values if complex.
-   This will allow you to read data sequentially from a file.
+   row-major, with interleaved real and imaginary values.  This layout
+   corresponds to a common storage format used for binary files
+   storing complex data.
   </para>
 
   <para>
-   The sync type is analgous to the update flags for
+   The synchronization type is analogous to the update flags for
    <function>admit()</function> and <function>release()</function>.
 
    <varname>SYNC_IN</varname> indicates that the block and pointer
@@ -486,15 +485,18 @@
   <programlisting><![CDATA[  value_type* ptr = ext.data();]]></programlisting>
 
   <para>
-   The pointer provided is valid only during the life of the object.
-   Moreover, the block being accessed should not be used during that time.
+   The pointer provided is valid only during the life of the
+   <classname>Ext_data</classname> object.
+   Moreover, the block referred to by the
+   <classname>Ext_data</classname> object must not be used during this
+   period.
   </para>
 
 
   <para>
-   Putting this together, you can create a routine to perform
+   Using these capabilities together, you can create a routine to perform
    I/O into a block.  This routine will take two arguments:
-   a filename to read, and a view to put the data into.
+   a filename to read, and a view in which to store the data.
    The amount of data read from the file will be determined by
    the view's size.
   </para>
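To make the dense, row-major, interleaved-complex layout in the serial.xml text more concrete, here is a minimal sketch in standard C++ of the file-reading step. It is not the VSIPL++ API and does not use Ext_data. The names `cbuffer` and `read_interleaved` are hypothetical. A `std::vector<std::complex<float>>` stands in for the dense storage whose pointer `ext.data()` would return, and, as in the routine described above, the destination's size determines how much data is read.

```cpp
#include <complex>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a view's dense, row-major storage.
// std::complex<float> is stored as two consecutive floats
// (real, imaginary), i.e. interleaved complex format.
using cbuffer = std::vector<std::complex<float>>;

// Hypothetical helper: read exactly data.size() interleaved complex
// values from 'filename' into 'data'.  The amount of data read is
// determined by the size of the destination, not by the file.
void read_interleaved(std::string const& filename, cbuffer& data)
{
  std::ifstream in(filename, std::ios::binary);
  if (!in)
    throw std::runtime_error("cannot open " + filename);

  std::streamsize const bytes = static_cast<std::streamsize>(
    data.size() * sizeof(std::complex<float>));

  // Raw bytes go straight into the buffer; this is only valid because
  // the in-memory layout already matches the file layout.
  in.read(reinterpret_cast<char*>(data.data()), bytes);
  if (in.gcount() != bytes)
    throw std::runtime_error("short read from " + filename);
}
```

Because `std::complex<float>` is guaranteed to be layout-compatible with `float[2]`, no conversion is needed here. A column-major or split-complex destination would require a reordering step instead. That is why the text above requests a specific layout rather than relying on the block's natural one.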


From jules at codesourcery.com  Fri Sep 15 02:11:15 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 14 Sep 2006 22:11:15 -0400
Subject: [patch] Shared builtin libdirs for merged package
Message-ID: <450A0BC3.3080903@codesourcery.com>

This patch installs builtin libraries such as ATLAS and FFTW3 into 
'builtin_libdir', which can be set from configure.  By default it is the 
same as libdir, so it only makes a difference when explicitly used.

package.py and scripts/config are updated to use this so that builtin 
libraries are shared amongst different library variants when possible.

This patch includes a small bug-fix to simd.hpp, and some test updates 
made while debugging the solver-lu failures on Windows.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060914/b229a3d2/attachment.ksh>

From jules at codesourcery.com  Fri Sep 15 02:19:12 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 14 Sep 2006 22:19:12 -0400
Subject: [vsipl++] [patch] Shared builtin libdirs for merged package
In-Reply-To: <450A0BC3.3080903@codesourcery.com>
References: <450A0BC3.3080903@codesourcery.com>
Message-ID: <450A0DA0.1030407@codesourcery.com>

Oops!  Here's the right patch. -- Jules

Jules Bergmann wrote:
> This patch installs builtin libraries such as ATLAS and FFTW3 into 
> 'builtin_libdir', which can be set from configure.  By default it is the 
> same as libdir, so it only makes a difference when explicitly used.
> 
> packpage.py and scripts/config is updated to use this so that builtin 
> libraries are shared amongst different library variants when possible.
> 
> This patch includes a small bug-fix to simd.hpp, and some test updates 
> made for debugging the windows' solver-lu failures.
> 
>                 -- Jules
> 
> 
> ------------------------------------------------------------------------
> 
> 
> configure options for gannon
>  --disable-mpi
>  --with-lapack=builtin
>  --with-atlas-tarball=/home/jules/csl/atlas/atlas3.7.11_SunOS_SunUS2.tar.gz
>  --enable-fft=builtin --disable-fft-long-double
>  --enable-profile-timer=posix


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060914/7f7e99c9/attachment.ksh>

From jules at codesourcery.com  Fri Sep 15 03:30:07 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 14 Sep 2006 23:30:07 -0400
Subject: [patch] Regression test for icc-windows bug with transpose.
Message-ID: <450A1E3F.2070503@codesourcery.com>

Patch applied.

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ta.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060914/af411164/attachment.ksh>

From jules at codesourcery.com  Fri Sep 15 05:52:10 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 15 Sep 2006 01:52:10 -0400
Subject: [patch] IPP and MKL configuration for windows
Message-ID: <450A3F8A.3040102@codesourcery.com>

This patch makes it slightly easier to configure for using IPP and MKL 
on Windows.

First, since configure doesn't like paths with spaces, it is necessary 
to put the paths for IPP and MKL into the LIB and INCLUDE environment 
variables.

Once this is done, use the configure options:

	--enable-ipp=win

and

	--with-lapack=mkl_win

In the future we can clean these up to have configure "do the right 
thing" on Windows without the "win" hints, but that can wait for now.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: wincfg.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060915/3831368c/attachment.ksh>

From jules at codesourcery.com  Fri Sep 15 18:02:10 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 15 Sep 2006 14:02:10 -0400
Subject: [patch] benchmark updates
Message-ID: <450AEAA2.6060807@codesourcery.com>

NOTE: This patch does not affect the core library or the contents of 
binary packages.

This patch adds an Impl_pop case to the Fftm benchmark, which measures 
the performance of an out-of-place Fftm implemented as a loop of 
out-of-place Ffts.

Besides illustrating the advantages of using Fftm over Fft for some 
backends, this can be used to measure the performance of Fft when its 
data is not guaranteed to start in cache.  For example, in the FIR bank 
benchmark, when processing is done one row at a time, the forward Ffts 
are performing a disjoint Fftm.  Depending on the problem size vs cache 
size, their data may not be in cache.  This benchmark case partially 
models that.

Patch applied.

				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bm.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060915/1e908f82/attachment.ksh>

From jules at codesourcery.com  Fri Sep 15 21:14:36 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 15 Sep 2006 17:14:36 -0400
Subject: [patch] Updated Qr benchmark
Message-ID: <450B17BC.2020503@codesourcery.com>

This updates the benchmark to cover the various Q save options (no Q, 
thin Q, full Q).  It also adds coverage for row-major and column-major 
source data.

Patch applied.

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Tue Sep 19 02:22:49 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 18 Sep 2006 22:22:49 -0400
Subject: Sourcery VSIPL++ 1.2 Available
Message-ID: <450F5479.4060503@codesourcery.com>

CodeSourcery is pleased to announce the availability of Sourcery VSIPL++
1.2.  This new version of Sourcery VSIPL++, a toolkit for developing
high-performance signal- and image-processing applications, has a number 
of improvements and new features.  Highlights include greater 
portability with support for the Windows platform and the Intel C++ 
compiler, improved performance with SIMD loop fusion to make greater use 
of the PowerPC AltiVec and Intel SSE instruction sets, improved 
parallelism with support for Mercury's Parallel Acceleration System 
(PAS) library, and increased productivity with an integrated profiling 
capability to gather application performance data.

Sourcery VSIPL++ is a full implementation of the VSIPL++ API, an open
standard for platform-independent signal- and image-processing
developed by the DOD High Performance Embedded Computing Software
Initiative (HPEC-SI) and the VSIPL Forum. Sourcery VSIPL++ provides
many high-level routines used in SIP computing, such as FFTs, FIR
filters, SVD and QR decomposition, and linear algebra.

For more information about Sourcery VSIPL++, including information about
receiving a free 30-day evaluation, please visit our website:

    http://www.codesourcery.com/vsiplplusplus

For more information on the new features in this release, please visit:

    http://www.codesourcery.com/vsiplplusplus/1.2/news.html
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705