From jules at codesourcery.com  Thu Jun  1 01:27:13 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 31 May 2006 21:27:13 -0400
Subject: [vsipl++] ATLAS undefines
In-Reply-To: <447DFE3F.4030508@codesourcery.com>
References: <447DE91D.6030903@codesourcery.com>
	<447DFE3F.4030508@codesourcery.com>
Message-ID: <447E4271.9000807@codesourcery.com>

Assem,

To follow up on our IRC discussion, I was mistaken in my earlier email
below.

The BLAS provided in clapack/BLAS/SRC is a C implementation of the
Fortran BLAS API.  I.e. the cdotu it provides looks like this from C/C++:

	void cdotu(complex* return_value,
		int* n,
		complex* cx, int* incx,
		complex* cy, int* incy);

versus cblas_cdotu_sub, which would look like:

	void cblas_cdotu_sub(
		int n,
		complex* cx, int incx,
		complex* cy, int incy,
		complex* return_value);

(or more accurately, void* instead of complex*, because C did not have a
complex type when the API was created).

What we want to do is add a new way to configure VSIPL++ so that it
uses the Fortran BLAS provided in clapack/BLAS/SRC and the Lapack
provided in clapack/SRC.  The easiest way to do this is to have
configure define VSIP_IMPL_USE_CBLAS to 0 when using CLAPACK's BLAS.

We don't want to break the way VSIPL++ works when it gets configured to
use ATLAS (or MKL or ACML for that matter).  I.e. when using ATLAS, we
should continue to have configure define VSIP_IMPL_USE_CBLAS to 1.

Moreover, we don't want to try to use clapack/BLAS/SRC's blaswrap.h to
abstract the difference between the Fortran and C BLAS APIs.

				-- Jules

Jules Bergmann wrote:
> Assem,
>
> Thanks for posting this.
>
> It looks like we're trying to use the CBLAS bindings for
> CLAPACK/SRC/BLAS.  Unfortunately, looking at the source, it is a Fortran
> API, with a few variances (the complex dot-product Fortran functions
> have been converted to C "subroutines" that return the result by
> reference).
> I suspect if you tried to build other tests you would see
> linker errors for functions like cblas_trsm, etc.
>
> For this, we should take an approach similar to how we handled ACML:
>
>  - Have configure define VSIP_IMPL_USE_CBLAS = 4 when using
>    CLAPACK/SRC/BLAS
>
>  - In lapack.hpp, when VSIP_IMPL_USE_CBLAS == 4,
>     - wrap the dot-product functions to have a CBLAS interface and
>       define VSIP_IMPL_USE_CBLAS_DOT = 1.
>
>       This should be done in a separate header file, similar to
>       acml_cblas.hpp.
>
>     - Use Fortran API for other BLAS functions
>       (VSIP_IMPL_USE_CBLAS_OTHER = 0).
>
> Does that sound OK?
>
> 				-- Jules
>
> Assem Salama wrote:
>> Everyone,
>>  As per Jules's request, this is the output of make when trying to
>> compile convolution.cpp in the tests dir. The BLAS that I got with
>> CLAPACK has functions similar to these but without the cblas prepended
>> and without _sub.
>>
>> Thanks,
>> Assem Salama
>>
>>
>> ------------------------------------------------------------------------
>>
>> g++ -g -O2 -I../src -I/drive2/assem/work/checkout/vpp/tests/../src
>> -I/include/atlas -I/include/fftw3
>> -I/drive2/assem/work/checkout/vpp/vendor/atlas/include
>> -I/drive2/assem/work/build/vpp_temp2/vendor/fftw/include -o
>> convolution.exe convolution.o -L/lib/atlas -L/lib/fftw3
>> -L/drive2/assem/work/build/vpp_temp2/vendor/atlas/lib
>> -L/drive2/assem/work/build/vpp_temp2/vendor/fftw/lib
>> -L/drive2/assem/work/build/vpp_temp2/vendor/clapack
>> -L/drive2/assem/work/build/vpp_temp2/lib -L../src/vsip -lvsip -llapack
>> -lF77 -lcblas -lfftw3f -lfftw3 -lfftw3l || rm -f convolution.exe
>> convolution.o: In function `dot':
>> /drive2/assem/work/checkout/vpp/tests/../src/vsip/impl/lapack.hpp:180:
>> undefined reference to `cblas_ddot'
>> /drive2/assem/work/checkout/vpp/tests/../src/vsip/impl/lapack.hpp:217:
>> undefined reference to `cblas_zdotu_sub'
>> /drive2/assem/work/checkout/vpp/tests/../src/vsip/impl/lapack.hpp:179:
>> undefined reference to `cblas_sdot'
>> /drive2/assem/work/checkout/vpp/tests/../src/vsip/impl/lapack.hpp:216:
>> undefined reference to `cblas_cdotu_sub'
>> collect2: ld returned 1 exit status
>
>

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From stefan at codesourcery.com  Thu Jun  1 23:35:33 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 01 Jun 2006 19:35:33 -0400
Subject: patch: Fix tests/fft.cpp for long double
Message-ID: <447F79C5.1080703@codesourcery.com>

The attached patch fixes compilation errors I get for tests/fft.cpp
when configuring with --enable-fft=sal,ipp,builtin.
It simply adds some missing evaluators that explicitly disable
long double versions.

The patch is checked in.

Regards,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch

From assem at codesourcery.com  Sat Jun  3 12:25:20 2006
From: assem at codesourcery.com (Assem Salama)
Date: Sat, 03 Jun 2006 08:25:20 -0400
Subject: ATLAS Patch
Message-ID: <44817FB0.10207@codesourcery.com>

Everyone,
 This patch uses the BLAS that comes with LAPACK. This allows us to not
have to deal with ATLAS at all.

Thanks,
Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ChangeLog.06032006
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.06032006.1.log

From jules at codesourcery.com  Mon Jun  5 16:18:48 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 05 Jun 2006 12:18:48 -0400
Subject: [vsipl++] ATLAS Patch
In-Reply-To: <44817FB0.10207@codesourcery.com>
References: <44817FB0.10207@codesourcery.com>
Message-ID: <44845968.10103@codesourcery.com>

Assem Salama wrote:
> Everyone,
>  This patch uses the BLAS that comes with LAPACK. This allows us to not
> have to deal with ATLAS at all.

Assem,

I'm reviewing this patch, but I had several high-level questions:

 - have you tested it through installation?

	configure
	make install
	... set your PKG_CONFIG_PATH appropriately
	cd tests
	make -f make.standalone

 - have you tested that it doesn't break the existing configurations
   using ATLAS?

				-- Jules

>
>
> ------------------------------------------------------------------------
>
> 2006-06-03  Assem Salama
>
> 	* configure.ac: Added a new lapack option. The user can now say
> 	  --with-lapack=simple. This will build VSIPL++ with the BLAS that
> 	  comes with CLAPACK.
> 	* vendor/GNUmakefile.inc.in: Added an option to compile the BLAS
> 	  library that comes with CLAPACK.
> 	* vendor/clapack/SRC/make.inc.in: Changed library names to liblapack.a
> 	  and libcblas.a. That way, the user can use -llapack and -lcblas.
> 	* vendor/clapack/blas/SRC/GNUmakefile.in: New file. This file used to
> 	  be Makefile. This file uses configure variable srcdir.
> 	* vendor/clapack/blas/blaswrap.h: Added a define at the top to not
> 	  redefine blas functions to f2c functions.
> 	* examples/GNUmakefile.inc.in: Changed typo that prevented VSIPL++
> 	  from finishing a complete build.
>
>
> ------------------------------------------------------------------------
>
> Index: configure.ac
> ===================================================================
> RCS file: /home/cvs/Repository/vpp/configure.ac,v
> retrieving revision 1.105
> diff -u -r1.105 configure.ac
> --- configure.ac	14 May 2006 20:57:05 -0000	1.105
> +++ configure.ac	3 Jun 2006 10:40:47 -0000
> @@ -175,8 +175,9 @@
>                   Library), acml (AMD Core Math Library), atlas (system
>                   ATLAS/LAPACK installation), generic (system generic
>                   LAPACK installation), builtin (Sourcery VSIPL++'s
> -                 builtin ATLAS/C-LAPACK), and fortran-builtin (Sourcery
> -                 VSIPL++'s builtin ATLAS/Fortran-LAPACK).
> +                 builtin ATLAS/C-LAPACK), fortran-builtin (Sourcery
> +                 VSIPL++'s builtin ATLAS/Fortran-LAPACK, and a simple (Lapack
> +                 that doesn't require atlas).).
> Specifying 'no' disables search for a LAPACK library.]),, > [with_lapack=probe]) > > @@ -492,6 +493,9 @@ > #endif]) > vsip_impl_avoid_posix_memalign= > > +AC_CHECK_HEADERS([png.h], > + [AC_SUBST(HAVE_PNG_H, 1)], > + [], [// no prerequisites]) > > # > # Find the FFT backends. > @@ -1275,6 +1279,8 @@ > lapack_packages="atlas generic1 generic2 builtin" > elif test "$with_lapack" == "generic"; then > lapack_packages="generic1 generic2" > + elif test "$with_lapack" == "simple"; then > + lapack_packages="simple"; > else > lapack_packages="$with_lapack" > fi > @@ -1515,6 +1521,19 @@ > AC_MSG_RESULT([not present]) > continue > fi > + elif test "$trypkg" == "simple"; then > + > + curdir=`pwd` > + CPPFLAGS="$keep_CPPFLAGS -I$srcdir/vendor/clapack/SRC" > + LDFLAGS="$keep_LDFLAGS -L$curdir/vendor/clapack" > + LIBS="$keep_LIBS -llapack -lcblas" > + > + AC_SUBST(USE_SIMPLE_LAPACK, 1) > + > + lapack_use_ilaenv=0 > + lapack_found="simple" > + break > fi > > Index: vendor/GNUmakefile.inc.in > =================================================================== > RCS file: /home/cvs/Repository/vpp/vendor/GNUmakefile.inc.in,v > retrieving revision 1.15 > diff -u -r1.15 GNUmakefile.inc.in > --- vendor/GNUmakefile.inc.in 11 May 2006 11:29:04 -0000 1.15 > +++ vendor/GNUmakefile.inc.in 3 Jun 2006 10:41:15 -0000 > @@ -12,6 +12,7 @@ > # Variables > ######################################################################## > > +USE_SIMPLE_LAPACK := @USE_SIMPLE_LAPACK@ > USE_BUILTIN_ATLAS := @USE_BUILTIN_ATLAS@ > USE_FORTRAN_LAPACK := @USE_FORTRAN_LAPACK@ > USE_BUILTIN_LIBF77 := @USE_BUILTIN_LIBF77@ > @@ -20,7 +21,7 @@ > USE_BUILTIN_FFTW_DOUBLE := @USE_BUILTIN_FFTW_DOUBLE@ > USE_BUILTIN_FFTW_LONG_DOUBLE := @USE_BUILTIN_FFTW_LONG_DOUBLE@ > > -vendor_CLAPACK = vendor/clapack/lapack.a > +vendor_CLAPACK = vendor/clapack/liblapack.a > vendor_FLAPACK = vendor/lapack/lapack.a > vendor_PRE_LAPACK = vendor/atlas/lib/libprelapack.a > vendor_USE_LAPACK = vendor/atlas/lib/liblapack.a > @@ -33,6 +34,7 @@ 
> endif > > vendor_LIBF77 = vendor/clapack/F2CLIBS/libF77/libF77.a > +vendor_SIMPLE_BLAS = vendor/clapack/libcblas.a > > > vendor_ATLAS_LIBS := \ > @@ -104,7 +106,6 @@ > @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 clean > libF77.clean.log 2>&1 > endif > > - > clean:: > @echo "Cleaning ATLAS (see atlas.clean.log)" > @$(MAKE) -C vendor/atlas clean > atlas.clean.log 2>&1 > @@ -123,6 +124,53 @@ > endif # USE_FORTRAN_LAPACK > > endif # USE_BUILTIN_ATLAS > +################################################################################ > + > +ifdef USE_SIMPLE_LAPACK > +all:: $(vendor_SIMPLE_BLAS) $(vendor_REF_LAPACK) > + > +libs += $(vendor_F77BLAS) $(vendor_REF_LAPACK) > + > +$(vendor_SIMPLE_BLAS): > + @echo "Building simple BLAS (see simpleBLAS.build.log)" > + @$(MAKE) -C vendor/clapack/blas/SRC all > simpleBLAS.build.log 2>&1 > + > +ifdef USE_FORTRAN_LAPACK > +$(vendor_FLAPACK): > + @echo "Building LAPACK (see lapack.build.log)" > + @$(MAKE) -C vendor/lapack/SRC all > lapack.build.log 2>&1 > + > +clean:: > + @echo "Cleaning LAPACK (see lapack.clean.log)" > + @$(MAKE) -C vendor/lapack/SRC clean > lapack.clean.log 2>&1 > +else > +$(vendor_CLAPACK): > + @echo "Building CLAPACK (see clapack.build.log)" > + @$(MAKE) -C vendor/clapack/SRC all > clapack.build.log 2>&1 > + > +clean:: > + @echo "Cleaning CLAPACK (see clapack.clean.log)" > + @$(MAKE) -C vendor/clapack/SRC clean > clapack.clean.log 2>&1 > +endif # USE_FORTRAN_LAPACK > + > +ifdef USE_BUILTIN_LIBF77 > +all:: $(vendor_LIBF77) > + > +libs += $(vendor_LIBF77) > + > +$(vendor_LIBF77): > + @echo "Building libF77 (see libF77.build.log)" > + @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 all > libF77.build.log 2>&1 > + > +install:: $(vendor_LIBF77) > + $(INSTALL_DATA) $(vendor_LIBF77) $(DESTDIR)$(libdir) > + > +clean:: > + @echo "Cleaning libF77 (see libF77.clean.log)" > + @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 clean > libF77.clean.log 2>&1 > +endif # USE_BUILTIN_LIBF77 > + > +endif # USE_SIMPLE_LAPACK > > > > Index: 
vendor/clapack/blas/SRC/GNUmakefile.in > =================================================================== > RCS file: vendor/clapack/blas/SRC/GNUmakefile.in > diff -N vendor/clapack/blas/SRC/GNUmakefile.in > --- /dev/null 1 Jan 1970 00:00:00 -0000 > +++ vendor/clapack/blas/SRC/GNUmakefile.in 3 Jun 2006 10:41:20 -0000 > @@ -0,0 +1,164 @@ > +include ../../SRC/make.inc > + > +srcdir = @srcdir@ > +OBJEXT = @OBJEXT@ > + > +VPATH = $(srcdir) > + > + > +####################################################################### > +# This is the makefile to create a library for the BLAS. > +# The files are grouped as follows: > +# > +# SBLAS1 -- Single precision real BLAS routines > +# CBLAS1 -- Single precision complex BLAS routines > +# DBLAS1 -- Double precision real BLAS routines > +# ZBLAS1 -- Double precision complex BLAS routines > +# > +# CB1AUX -- Real BLAS routines called by complex routines > +# ZB1AUX -- D.P. real BLAS routines called by d.p. complex > +# routines > +# > +# ALLBLAS -- Auxiliary routines for Level 2 and 3 BLAS > +# > +# SBLAS2 -- Single precision real BLAS2 routines > +# CBLAS2 -- Single precision complex BLAS2 routines > +# DBLAS2 -- Double precision real BLAS2 routines > +# ZBLAS2 -- Double precision complex BLAS2 routines > +# > +# SBLAS3 -- Single precision real BLAS3 routines > +# CBLAS3 -- Single precision complex BLAS3 routines > +# DBLAS3 -- Double precision real BLAS3 routines > +# ZBLAS3 -- Double precision complex BLAS3 routines > +# > +# The library can be set up to include routines for any combination > +# of the four precisions. To create or add to the library, enter make > +# followed by one or more of the precisions desired. Some examples: > +# make single > +# make single complex > +# make single double complex complex16 > +# Alternatively, the command > +# make > +# without any arguments creates a library of all four precisions. 
> +# The library is called > +# blas.a > +# > +# To remove the object files after the library is created, enter > +# make clean > +# To force the source files to be recompiled, enter, for example, > +# make single FRC=FRC > +# > +#--------------------------------------------------------------------- > +# > +# Edward Anderson, University of Tennessee > +# March 26, 1990 > +# Susan Ostrouchov, Last updated September 30, 1994 > +# > +####################################################################### > + > +all: single double complex complex16 > + > +#--------------------------------------------------------- > +# Comment out the next 6 definitions if you already have > +# the Level 1 BLAS. > +#--------------------------------------------------------- > +SBLAS1 = isamax.o sasum.o saxpy.o scopy.o sdot.o snrm2.o \ > + srot.o srotg.o sscal.o sswap.o > +$(SBLAS1): $(FRC) > + > +CBLAS1 = scasum.o scnrm2.o icamax.o caxpy.o ccopy.o \ > + cdotc.o cdotu.o csscal.o crotg.o cscal.o cswap.o > +$(CBLAS1): $(FRC) > + > +DBLAS1 = idamax.o dasum.o daxpy.o dcopy.o ddot.o dnrm2.o \ > + drot.o drotg.o dscal.o dswap.o > +$(DBLAS1): $(FRC) > + > +ZBLAS1 = dcabs1.o dzasum.o dznrm2.o izamax.o zaxpy.o zcopy.o \ > + zdotc.o zdotu.o zdscal.o zrotg.o zscal.o zswap.o > +$(ZBLAS1): $(FRC) > + > +CB1AUX = isamax.o sasum.o saxpy.o scopy.o snrm2.o sscal.o > +$(CB1AUX): $(FRC) > + > +ZB1AUX = idamax.o dasum.o daxpy.o dcopy.o dnrm2.o dscal.o > +$(ZB1AUX): $(FRC) > + > +#--------------------------------------------------------------------- > +# The following line defines auxiliary routines needed by both the > +# Level 2 and Level 3 BLAS. Comment it out only if you already have > +# both the Level 2 and 3 BLAS. > +#--------------------------------------------------------------------- > +ALLBLAS = lsame.o xerbla.o > +$(ALLBLAS) : $(FRC) > + > +#--------------------------------------------------------- > +# Comment out the next 4 definitions if you already have > +# the Level 2 BLAS. 
> +#--------------------------------------------------------- > +SBLAS2 = sgemv.o sgbmv.o ssymv.o ssbmv.o sspmv.o \ > + strmv.o stbmv.o stpmv.o strsv.o stbsv.o stpsv.o \ > + sger.o ssyr.o sspr.o ssyr2.o sspr2.o > +$(SBLAS2): $(FRC) > + > +CBLAS2 = cgemv.o cgbmv.o chemv.o chbmv.o chpmv.o \ > + ctrmv.o ctbmv.o ctpmv.o ctrsv.o ctbsv.o ctpsv.o \ > + cgerc.o cgeru.o cher.o chpr.o cher2.o chpr2.o > +$(CBLAS2): $(FRC) > + > +DBLAS2 = dgemv.o dgbmv.o dsymv.o dsbmv.o dspmv.o \ > + dtrmv.o dtbmv.o dtpmv.o dtrsv.o dtbsv.o dtpsv.o \ > + dger.o dsyr.o dspr.o dsyr2.o dspr2.o > +$(DBLAS2): $(FRC) > + > +ZBLAS2 = zgemv.o zgbmv.o zhemv.o zhbmv.o zhpmv.o \ > + ztrmv.o ztbmv.o ztpmv.o ztrsv.o ztbsv.o ztpsv.o \ > + zgerc.o zgeru.o zher.o zhpr.o zher2.o zhpr2.o > +$(ZBLAS2): $(FRC) > + > +#--------------------------------------------------------- > +# Comment out the next 4 definitions if you already have > +# the Level 3 BLAS. > +#--------------------------------------------------------- > +SBLAS3 = sgemm.o ssymm.o ssyrk.o ssyr2k.o strmm.o strsm.o > +$(SBLAS3): $(FRC) > + > +CBLAS3 = cgemm.o csymm.o csyrk.o csyr2k.o ctrmm.o ctrsm.o \ > + chemm.o cherk.o cher2k.o > +$(CBLAS3): $(FRC) > + > +DBLAS3 = dgemm.o dsymm.o dsyrk.o dsyr2k.o dtrmm.o dtrsm.o > +$(DBLAS3): $(FRC) > + > +ZBLAS3 = zgemm.o zsymm.o zsyrk.o zsyr2k.o ztrmm.o ztrsm.o \ > + zhemm.o zherk.o zher2k.o > +$(ZBLAS3): $(FRC) > + > + > +single: $(SBLAS1) $(ALLBLAS) $(SBLAS2) $(SBLAS3) > + $(ARCH) $(ARCHFLAGS) $(BLASLIB) $(SBLAS1) $(ALLBLAS) \ > + $(SBLAS2) $(SBLAS3) > + $(RANLIB) $(BLASLIB) > + > +double: $(DBLAS1) $(ALLBLAS) $(DBLAS2) $(DBLAS3) > + $(ARCH) $(ARCHFLAGS) $(BLASLIB) $(DBLAS1) $(ALLBLAS) \ > + $(DBLAS2) $(DBLAS3) > + $(RANLIB) $(BLASLIB) > + > +complex: $(CBLAS1) $(CB1AUX) $(ALLBLAS) $(CBLAS2) $(CBLAS3) > + $(ARCH) $(ARCHFLAGS) $(BLASLIB) $(CBLAS1) $(CB1AUX) \ > + $(ALLBLAS) $(CBLAS2) $(CBLAS3) > + $(RANLIB) $(BLASLIB) > + > +complex16: $(ZBLAS1) $(ZB1AUX) $(ALLBLAS) $(ZBLAS2) $(ZBLAS3) > + $(ARCH) $(ARCHFLAGS) 
$(BLASLIB) $(ZBLAS1) $(ZB1AUX) \ > + $(ALLBLAS) $(ZBLAS2) $(ZBLAS3) > + $(RANLIB) $(BLASLIB) > + > +FRC: > + @FRC=$(FRC) > + > +clean: > + rm -f *.o > + > + > Index: vendor/clapack/blas/SRC/Makefile > =================================================================== > RCS file: vendor/clapack/blas/SRC/Makefile > diff -N vendor/clapack/blas/SRC/Makefile > --- vendor/clapack/blas/SRC/Makefile 16 Mar 2006 23:11:40 -0000 1.1.1.1 > +++ /dev/null 1 Jan 1970 00:00:00 -0000 > @@ -1,160 +0,0 @@ > -include ../../make.inc > - > -####################################################################### > -# This is the makefile to create a library for the BLAS. > -# The files are grouped as follows: > -# > -# SBLAS1 -- Single precision real BLAS routines > -# CBLAS1 -- Single precision complex BLAS routines > -# DBLAS1 -- Double precision real BLAS routines > -# ZBLAS1 -- Double precision complex BLAS routines > -# > -# CB1AUX -- Real BLAS routines called by complex routines > -# ZB1AUX -- D.P. real BLAS routines called by d.p. complex > -# routines > -# > -# ALLBLAS -- Auxiliary routines for Level 2 and 3 BLAS > -# > -# SBLAS2 -- Single precision real BLAS2 routines > -# CBLAS2 -- Single precision complex BLAS2 routines > -# DBLAS2 -- Double precision real BLAS2 routines > -# ZBLAS2 -- Double precision complex BLAS2 routines > -# > -# SBLAS3 -- Single precision real BLAS3 routines > -# CBLAS3 -- Single precision complex BLAS3 routines > -# DBLAS3 -- Double precision real BLAS3 routines > -# ZBLAS3 -- Double precision complex BLAS3 routines > -# > -# The library can be set up to include routines for any combination > -# of the four precisions. To create or add to the library, enter make > -# followed by one or more of the precisions desired. Some examples: > -# make single > -# make single complex > -# make single double complex complex16 > -# Alternatively, the command > -# make > -# without any arguments creates a library of all four precisions. 
> -# The library is called > -# blas.a > -# > -# To remove the object files after the library is created, enter > -# make clean > -# To force the source files to be recompiled, enter, for example, > -# make single FRC=FRC > -# > -#--------------------------------------------------------------------- > -# > -# Edward Anderson, University of Tennessee > -# March 26, 1990 > -# Susan Ostrouchov, Last updated September 30, 1994 > -# > -####################################################################### > - > -all: single double complex complex16 > - > -#--------------------------------------------------------- > -# Comment out the next 6 definitions if you already have > -# the Level 1 BLAS. > -#--------------------------------------------------------- > -SBLAS1 = isamax.o sasum.o saxpy.o scopy.o sdot.o snrm2.o \ > - srot.o srotg.o sscal.o sswap.o > -$(SBLAS1): $(FRC) > - > -CBLAS1 = scasum.o scnrm2.o icamax.o caxpy.o ccopy.o \ > - cdotc.o cdotu.o csscal.o crotg.o cscal.o cswap.o > -$(CBLAS1): $(FRC) > - > -DBLAS1 = idamax.o dasum.o daxpy.o dcopy.o ddot.o dnrm2.o \ > - drot.o drotg.o dscal.o dswap.o > -$(DBLAS1): $(FRC) > - > -ZBLAS1 = dcabs1.o dzasum.o dznrm2.o izamax.o zaxpy.o zcopy.o \ > - zdotc.o zdotu.o zdscal.o zrotg.o zscal.o zswap.o > -$(ZBLAS1): $(FRC) > - > -CB1AUX = isamax.o sasum.o saxpy.o scopy.o snrm2.o sscal.o > -$(CB1AUX): $(FRC) > - > -ZB1AUX = idamax.o dasum.o daxpy.o dcopy.o dnrm2.o dscal.o > -$(ZB1AUX): $(FRC) > - > -#--------------------------------------------------------------------- > -# The following line defines auxiliary routines needed by both the > -# Level 2 and Level 3 BLAS. Comment it out only if you already have > -# both the Level 2 and 3 BLAS. > -#--------------------------------------------------------------------- > -ALLBLAS = lsame.o xerbla.o > -$(ALLBLAS) : $(FRC) > - > -#--------------------------------------------------------- > -# Comment out the next 4 definitions if you already have > -# the Level 2 BLAS. 
> -#--------------------------------------------------------- > -SBLAS2 = sgemv.o sgbmv.o ssymv.o ssbmv.o sspmv.o \ > - strmv.o stbmv.o stpmv.o strsv.o stbsv.o stpsv.o \ > - sger.o ssyr.o sspr.o ssyr2.o sspr2.o > -$(SBLAS2): $(FRC) > - > -CBLAS2 = cgemv.o cgbmv.o chemv.o chbmv.o chpmv.o \ > - ctrmv.o ctbmv.o ctpmv.o ctrsv.o ctbsv.o ctpsv.o \ > - cgerc.o cgeru.o cher.o chpr.o cher2.o chpr2.o > -$(CBLAS2): $(FRC) > - > -DBLAS2 = dgemv.o dgbmv.o dsymv.o dsbmv.o dspmv.o \ > - dtrmv.o dtbmv.o dtpmv.o dtrsv.o dtbsv.o dtpsv.o \ > - dger.o dsyr.o dspr.o dsyr2.o dspr2.o > -$(DBLAS2): $(FRC) > - > -ZBLAS2 = zgemv.o zgbmv.o zhemv.o zhbmv.o zhpmv.o \ > - ztrmv.o ztbmv.o ztpmv.o ztrsv.o ztbsv.o ztpsv.o \ > - zgerc.o zgeru.o zher.o zhpr.o zher2.o zhpr2.o > -$(ZBLAS2): $(FRC) > - > -#--------------------------------------------------------- > -# Comment out the next 4 definitions if you already have > -# the Level 3 BLAS. > -#--------------------------------------------------------- > -SBLAS3 = sgemm.o ssymm.o ssyrk.o ssyr2k.o strmm.o strsm.o > -$(SBLAS3): $(FRC) > - > -CBLAS3 = cgemm.o csymm.o csyrk.o csyr2k.o ctrmm.o ctrsm.o \ > - chemm.o cherk.o cher2k.o > -$(CBLAS3): $(FRC) > - > -DBLAS3 = dgemm.o dsymm.o dsyrk.o dsyr2k.o dtrmm.o dtrsm.o > -$(DBLAS3): $(FRC) > - > -ZBLAS3 = zgemm.o zsymm.o zsyrk.o zsyr2k.o ztrmm.o ztrsm.o \ > - zhemm.o zherk.o zher2k.o > -$(ZBLAS3): $(FRC) > - > - > -single: $(SBLAS1) $(ALLBLAS) $(SBLAS2) $(SBLAS3) > - $(ARCH) $(ARCHFLAGS) $(BLASLIB) $(SBLAS1) $(ALLBLAS) \ > - $(SBLAS2) $(SBLAS3) > - $(RANLIB) $(BLASLIB) > - > -double: $(DBLAS1) $(ALLBLAS) $(DBLAS2) $(DBLAS3) > - $(ARCH) $(ARCHFLAGS) $(BLASLIB) $(DBLAS1) $(ALLBLAS) \ > - $(DBLAS2) $(DBLAS3) > - $(RANLIB) $(BLASLIB) > - > -complex: $(CBLAS1) $(CB1AUX) $(ALLBLAS) $(CBLAS2) $(CBLAS3) > - $(ARCH) $(ARCHFLAGS) $(BLASLIB) $(CBLAS1) $(CB1AUX) \ > - $(ALLBLAS) $(CBLAS2) $(CBLAS3) > - $(RANLIB) $(BLASLIB) > - > -complex16: $(ZBLAS1) $(ZB1AUX) $(ALLBLAS) $(ZBLAS2) $(ZBLAS3) > - $(ARCH) $(ARCHFLAGS) 
$(BLASLIB) $(ZBLAS1) $(ZB1AUX) \ > - $(ALLBLAS) $(ZBLAS2) $(ZBLAS3) > - $(RANLIB) $(BLASLIB) > - > -FRC: > - @FRC=$(FRC) > - > -clean: > - rm -f *.o > - > -.c.o: > - $(CC) $(CFLAGS) -c $*.c > - > Index: vendor/clapack/blas/SRC/blaswrap.h > =================================================================== > RCS file: /home/cvs/Repository/clapack/BLAS/SRC/blaswrap.h,v > retrieving revision 1.1.1.1 > diff -u -r1.1.1.1 blaswrap.h > --- vendor/clapack/blas/SRC/blaswrap.h 16 Mar 2006 23:11:40 -0000 1.1.1.1 > +++ vendor/clapack/blas/SRC/blaswrap.h 3 Jun 2006 10:41:20 -0000 > @@ -5,6 +5,8 @@ > #ifndef __BLASWRAP_H > #define __BLASWRAP_H > > +#define NO_BLAS_WRAP > + > #ifndef NO_BLAS_WRAP > > /* BLAS1 routines */ > ? examples/png.cpp > Index: examples/GNUmakefile.inc.in > =================================================================== > RCS file: /home/cvs/Repository/vpp/examples/GNUmakefile.inc.in,v > retrieving revision 1.9 > diff -u -r1.9 GNUmakefile.inc.in > --- examples/GNUmakefile.inc.in 1 May 2006 19:36:25 -0000 1.9 > +++ examples/GNUmakefile.inc.in 3 Jun 2006 12:13:25 -0000 > @@ -20,17 +20,22 @@ > $(patsubst $(srcdir)/%.cpp, %.$(OBJEXT), $(examples_cxx_sources)) > cxx_sources += $(examples_cxx_sources) > > +examples_targets := examples/example1 examples/png > + > ######################################################################## > # Rules > ######################################################################## > > all:: examples/example1$(EXEEXT) > > -examples/example1$(EXEEXT): examples/example1.$(OBJEXT) $(libs) > - $(CXX) $(LDFLAGS) -o $@ $< -Llib -lvsip $(LIBS) > +examples/png: override LIBS += -lvsip_csl -lpng > > install:: > $(INSTALL) -d $(DESTDIR)$(pkgdatadir) > $(INSTALL_DATA) $(examples_cxx_sources) $(DESTDIR)$(pkgdatadir) > $(INSTALL_DATA) examples/makefile.standalone \ > $(DESTDIR)$(pkgdatadir)/Makefile > + > +$(examples_targets): %$(EXEEXT): %.$(OBJEXT) $(libs) > + $(CXX) $(LDFLAGS) -o $@ $< -Llib -lvsip $(LIBS) > + > Index: 
vendor/clapack/SRC/make.inc.in
> ===================================================================
> RCS file: /home/cvs/Repository/clapack/SRC/make.inc.in,v
> retrieving revision 1.4
> diff -u -r1.4 make.inc.in
> --- vendor/clapack/SRC/make.inc.in	29 Mar 2006 16:07:54 -0000	1.4
> +++ vendor/clapack/SRC/make.inc.in	3 Jun 2006 12:23:42 -0000
> @@ -45,8 +45,8 @@
>  # machine-specific, optimized BLAS library should be used whenever
>  # possible.)
>  #
> -BLASLIB = ../../blas$(PLAT).a
> -LAPACKLIB = lapack$(PLAT).a
> +BLASLIB = ../../libcblas$(PLAT).a
> +LAPACKLIB = liblapack$(PLAT).a
>  F2CLIB = ../../F2CLIBS/libF77.a ../../F2CLIBS/libI77.a
>  TMGLIB = tmglib$(PLAT).a
>  EIGSRCLIB = eigsrc$(PLAT).a

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From jules at codesourcery.com  Mon Jun  5 17:08:31 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 05 Jun 2006 13:08:31 -0400
Subject: [vsipl++] ATLAS Patch
In-Reply-To: <44817FB0.10207@codesourcery.com>
References: <44817FB0.10207@codesourcery.com>
Message-ID: <4484650F.7050101@codesourcery.com>

Assem Salama wrote:
> Everyone,
>  This patch uses the BLAS that comes with LAPACK. This allows us to not
> have to deal with ATLAS at all.
>
> Thanks,
> Assem Salama
> -------------------------------------------------------------------
>
> Index: configure.ac
> ===================================================================
> RCS file: /home/cvs/Repository/vpp/configure.ac,v
> retrieving revision 1.105
> diff -u -r1.105 configure.ac
> --- configure.ac	14 May 2006 20:57:05 -0000	1.105
> +++ configure.ac	3 Jun 2006 10:40:47 -0000
> @@ -175,8 +175,9 @@
>                   Library), acml (AMD Core Math Library), atlas (system
>                   ATLAS/LAPACK installation), generic (system generic
>                   LAPACK installation), builtin (Sourcery VSIPL++'s
> -                 builtin ATLAS/C-LAPACK), and fortran-builtin (Sourcery
> -                 VSIPL++'s builtin ATLAS/Fortran-LAPACK).
> +                 builtin ATLAS/C-LAPACK), fortran-builtin (Sourcery
> +                 VSIPL++'s builtin ATLAS/Fortran-LAPACK, and a simple (Lapack
> +                 that doesn't require atlas).).
>                   Specifying 'no' disables search for a LAPACK library.]),,
>    [with_lapack=probe])

Instead of "simple", let's call this "simple-builtin" to be consistent
with the other builtin options.

>
> @@ -492,6 +493,9 @@
>  #endif])
>  vsip_impl_avoid_posix_memalign=
>
> +AC_CHECK_HEADERS([png.h],
> +                 [AC_SUBST(HAVE_PNG_H, 1)],
> +                 [], [// no prerequisites])

What is this doing here?

>
>  #
>  # Find the FFT backends.
> @@ -1275,6 +1279,8 @@
>      lapack_packages="atlas generic1 generic2 builtin"
>    elif test "$with_lapack" == "generic"; then
>      lapack_packages="generic1 generic2"
> +  elif test "$with_lapack" == "simple"; then
> +    lapack_packages="simple";
>    else
>      lapack_packages="$with_lapack"
>    fi
> @@ -1515,6 +1521,19 @@
>          AC_MSG_RESULT([not present])
>          continue
>        fi
> +    elif test "$trypkg" == "simple"; then
> +
> +      curdir=`pwd`

Because this library is builtin, we need to handle CPPFLAGS and LDFLAGS
differently than normal.

For a normal library, such as a math library that is already installed
on the system (MKL, for example), we would add -I and -L options to
CPPFLAGS and LDFLAGS.  The CPPFLAGS/LDFLAGS would be used both for
building VSIPL++ and in the .pc file, so that applications built with
VSIPL++ would know where to find MKL.

For a builtin library, such as LAPACK and BLAS in this case, the library
is not already installed on the system (we install it as part of making
VSIPL++).  This creates a problem.  When building the VSIPL++ library
proper (i.e. doing a 'make' or 'make check'), we need to refer to the
builtin library in its source tree location (it won't be installed in
its final location until 'make install').  However, the -I and -L
options that go into the .pc file should reflect its installed location,
not its source tree location.
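
To make the mismatch concrete, here is a minimal sketch of the two sets
of flags a builtin library ends up needing.  The directories, the prefix
and the variable names are made up for illustration; they are not taken
from the VSIPL++ configure script.

```shell
#!/bin/sh
# Illustrative sketch only: every path below is hypothetical.
srcdir=/home/user/checkout/vpp      # where the sources live
builddir=/home/user/build/vpp       # where 'make' runs
prefix=/usr/local                   # where 'make install' copies files

# Valid while building VSIPL++ itself ('make', 'make check'):
# the builtin library exists only in the source/build trees.
build_cppflags="-I$srcdir/vendor/clapack/SRC"
build_ldflags="-L$builddir/vendor/clapack"

# Valid for applications after 'make install': these are the
# options the .pc file should advertise.
installed_cppflags="-I$prefix/include/lapack"
installed_ldflags="-L$prefix/lib/lapack"

echo "while building: $build_cppflags $build_ldflags"
echo "after install:  $installed_cppflags $installed_ldflags"
```

Neither set can stand in for the other: the build-tree paths mean
nothing on a user's machine, and the installed paths do not exist until
'make install' has run.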

To handle this, we do the following for builtin libraries:

 - -I and -L options that are to be used while building VSIPL++
   go into INT_CPPFLAGS and INT_LDFLAGS.

 - -I and -L options that are to be used by applications once VSIPL++
   has been installed should go into CPPFLAGS and LDFLAGS.

 - libraries that will be built go into LATE_LIBS.  Putting them
   into LIBS will break subsequent AC_LINK_IFELSE's in the configure
   file.

So you should do:

	INT_CPPFLAGS="$INT_CPPFLAGS -I$srcdir/vendor/clapack/SRC"
	INT_LDFLAGS="$INT_LDFLAGS -L$curdir/vendor/clapack"
	LATE_LIBS="$LATE_LIBS -llapack -lcblas"

	CPPFLAGS="$keep_CPPFLAGS -I$includedir/lapack"
	LDFLAGS="$keep_LDFLAGS -L$libdir/lapack"

> +      CPPFLAGS="$keep_CPPFLAGS -I$srcdir/vendor/clapack/SRC"

Are there include files in clapack/SRC that are necessary for building
other files in the library?

> +      LDFLAGS="$keep_LDFLAGS -L$curdir/vendor/clapack"
> +      LIBS="$keep_LIBS -llapack -lcblas"
> +
> +      AC_SUBST(USE_SIMPLE_LAPACK, 1)
> +
> +      lapack_use_ilaenv=0
> +      lapack_found="simple"
> +      break
>      fi
>
> Index: vendor/GNUmakefile.inc.in
> ===================================================================
> RCS file: /home/cvs/Repository/vpp/vendor/GNUmakefile.inc.in,v
> retrieving revision 1.15
> diff -u -r1.15 GNUmakefile.inc.in
> --- vendor/GNUmakefile.inc.in	11 May 2006 11:29:04 -0000	1.15
> +++ vendor/GNUmakefile.inc.in	3 Jun 2006 10:41:15 -0000
> @@ -12,6 +12,7 @@
>  # Variables
>  ########################################################################
>
> +USE_SIMPLE_LAPACK := @USE_SIMPLE_LAPACK@
>  USE_BUILTIN_ATLAS := @USE_BUILTIN_ATLAS@
>  USE_FORTRAN_LAPACK := @USE_FORTRAN_LAPACK@
>  USE_BUILTIN_LIBF77 := @USE_BUILTIN_LIBF77@
> @@ -20,7 +21,7 @@
>  USE_BUILTIN_FFTW_DOUBLE := @USE_BUILTIN_FFTW_DOUBLE@
>  USE_BUILTIN_FFTW_LONG_DOUBLE := @USE_BUILTIN_FFTW_LONG_DOUBLE@
>
> -vendor_CLAPACK = vendor/clapack/lapack.a
> +vendor_CLAPACK = vendor/clapack/liblapack.a

Let's keep the name as lapack.a, so that it is consistent with the
Fortran lapack.a.
> vendor_FLAPACK = vendor/lapack/lapack.a > vendor_PRE_LAPACK = vendor/atlas/lib/libprelapack.a > vendor_USE_LAPACK = vendor/atlas/lib/liblapack.a > @@ -33,6 +34,7 @@ > endif > > vendor_LIBF77 = vendor/clapack/F2CLIBS/libF77/libF77.a > +vendor_SIMPLE_BLAS = vendor/clapack/libcblas.a > > > vendor_ATLAS_LIBS := \ > @@ -104,7 +106,6 @@ > @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 clean > libF77.clean.log 2>&1 > endif > > - > clean:: > @echo "Cleaning ATLAS (see atlas.clean.log)" > @$(MAKE) -C vendor/atlas clean > atlas.clean.log 2>&1 > @@ -123,6 +124,53 @@ > endif # USE_FORTRAN_LAPACK > > endif # USE_BUILTIN_ATLAS > +################################################################################ > + > +ifdef USE_SIMPLE_LAPACK > +all:: $(vendor_SIMPLE_BLAS) $(vendor_REF_LAPACK) > + > +libs += $(vendor_F77BLAS) $(vendor_REF_LAPACK) > + > +$(vendor_SIMPLE_BLAS): > + @echo "Building simple BLAS (see simpleBLAS.build.log)" > + @$(MAKE) -C vendor/clapack/blas/SRC all > simpleBLAS.build.log 2>&1 > + > +ifdef USE_FORTRAN_LAPACK > +$(vendor_FLAPACK): > + @echo "Building LAPACK (see lapack.build.log)" > + @$(MAKE) -C vendor/lapack/SRC all > lapack.build.log 2>&1 > + > +clean:: > + @echo "Cleaning LAPACK (see lapack.clean.log)" > + @$(MAKE) -C vendor/lapack/SRC clean > lapack.clean.log 2>&1 > +else > +$(vendor_CLAPACK): > + @echo "Building CLAPACK (see clapack.build.log)" > + @$(MAKE) -C vendor/clapack/SRC all > clapack.build.log 2>&1 > + > +clean:: > + @echo "Cleaning CLAPACK (see clapack.clean.log)" > + @$(MAKE) -C vendor/clapack/SRC clean > clapack.clean.log 2>&1 > +endif # USE_FORTRAN_LAPACK > + > +ifdef USE_BUILTIN_LIBF77 > +all:: $(vendor_LIBF77) > + > +libs += $(vendor_LIBF77) > + > +$(vendor_LIBF77): > + @echo "Building libF77 (see libF77.build.log)" > + @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 all > libF77.build.log 2>&1 > + > +install:: $(vendor_LIBF77) > + $(INSTALL_DATA) $(vendor_LIBF77) $(DESTDIR)$(libdir) > + > +clean:: > + @echo "Cleaning libF77 (see 
libF77.clean.log)" > + @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 clean > libF77.clean.log 2>&1 > +endif # USE_BUILTIN_LIBF77 > + > +endif # USE_SIMPLE_LAPACK We should be able to reorganize USE_BUILTIN_ATLAS and USE_SIMPLE_LAPACK so that they share common rules (such as the rules for building LAPACK, LIBF77, etc.). However, let's get this working and checked in first, then we can fix this later. > > > > Index: vendor/clapack/blas/SRC/GNUmakefile.in > =================================================================== Looks OK. > Index: vendor/clapack/blas/SRC/Makefile > =================================================================== Looks OK. > Index: vendor/clapack/SRC/make.inc.in > =================================================================== > RCS file: /home/cvs/Repository/clapack/SRC/make.inc.in,v > retrieving revision 1.4 > diff -u -r1.4 make.inc.in > --- vendor/clapack/SRC/make.inc.in 29 Mar 2006 16:07:54 -0000 1.4 > +++ vendor/clapack/SRC/make.inc.in 3 Jun 2006 12:23:42 -0000 > @@ -45,8 +45,8 @@ > # machine-specific, optimized BLAS library should be used whenever > # possible.) > # > -BLASLIB = ../../blas$(PLAT).a > -LAPACKLIB = lapack$(PLAT).a > +BLASLIB = ../../libcblas$(PLAT).a Let's call this libblas because it is a Fortran BLAS API, not a CBLAS API. > +LAPACKLIB = liblapack$(PLAT).a Let's leave this name unchanged so that it stays consistent with Fortran Lapack.
> F2CLIB = ../../F2CLIBS/libF77.a ../../F2CLIBS/libI77.a > TMGLIB = tmglib$(PLAT).a > EIGSRCLIB = eigsrc$(PLAT).a -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From assem at codesourcery.com Mon Jun 5 17:31:20 2006 From: assem at codesourcery.com (Assem Salama) Date: Mon, 05 Jun 2006 13:31:20 -0400 Subject: [vsipl++] ATLAS Patch In-Reply-To: <4484650F.7050101@codesourcery.com> References: <44817FB0.10207@codesourcery.com> <4484650F.7050101@codesourcery.com> Message-ID: <44846A68.3010005@codesourcery.com> Jules, The reason that I had to change the lib names to lib... is because when I do a make check and try link with -llapack and -lcblas, it can't find them because they are called lapack.a and cblas.a instead of liblapack.a and libcblas.a. Assem Jules Bergmann wrote: > Assem Salama wrote: >> Everyone, >> This patch use the BLAS that comes with LAPACK. This allows us to >> not have to deal with ATLAS at all. >> >> Thanks, >> Assem Salama >> > ------------------------------------------------------------------- >> >> Index: configure.ac >> =================================================================== >> RCS file: /home/cvs/Repository/vpp/configure.ac,v >> retrieving revision 1.105 >> diff -u -r1.105 configure.ac >> --- configure.ac 14 May 2006 20:57:05 -0000 1.105 >> +++ configure.ac 3 Jun 2006 10:40:47 -0000 >> @@ -175,8 +175,9 @@ >> Library), acml (AMD Core Math Library), atlas (system >> ATLAS/LAPACK installation), generic (system generic >> LAPACK installation), builtin (Sourcery VSIPL++'s >> - builtin ATLAS/C-LAPACK), and fortran-builtin (Sourcery >> - VSIPL++'s builtin ATLAS/Fortran-LAPACK). + >> builtin ATLAS/C-LAPACK), fortran-builtin (Sourcery >> + VSIPL++'s builtin ATLAS/Fortran-LAPACK, and a simple (Lapack >> + that doesn't require atlas).). 
>> Specifying 'no' disables search for a LAPACK library.]),, >> [with_lapack=probe]) > > Instead of "simple", let's call this "simple-builtin" to be consistent > with the other builtin options. > >> >> @@ -492,6 +493,9 @@ >> #endif]) >> vsip_impl_avoid_posix_memalign= >> >> +AC_CHECK_HEADERS([png.h], + [AC_SUBST(HAVE_PNG_H, >> 1)], + [], [// no prerequisites]) > > What is this doing here? > >> >> # >> # Find the FFT backends. >> @@ -1275,6 +1279,8 @@ >> lapack_packages="atlas generic1 generic2 builtin" >> elif test "$with_lapack" == "generic"; then >> lapack_packages="generic1 generic2" >> + elif test "$with_lapack" == "simple"; then >> + lapack_packages="simple"; >> else >> lapack_packages="$with_lapack" >> fi >> @@ -1515,6 +1521,19 @@ >> AC_MSG_RESULT([not present]) >> continue >> fi >> + elif test "$trypkg" == "simple"; then >> + >> + curdir=`pwd` > > Because this library is builtin, we need to handle CPPFLAGS and > LDFLAGS differently than normal. > > For a normal library, such a math library that is already installed on > the system, for example MKL, we would add -I and -L options to > CPPFLAGS and LDFLAGS. The CPPFLAGS/LDFLAGS would get used both for > building VSIPL++ and they would get put into the .pc file so that > applications built with VSIPL++ would know where to find MKL. > > For a builtin library, such as LAPACK and BLAS in this case, the > library is not already installed on the system (we are doing that as > part of making VSIPL++). This creates a problem. When building the > VSIPL++ library proper (i.e. doing a 'make' or 'make check'), we need > to refer to the builtin library in its source tree location (it won't > be installed in its final location until 'make install'). However, > the -I and -L options that go into the .pc file should reflect its > installed location, not its source tree location. 
> > To handle this, we do the following for builtin libraries: > - -I and -L options that are to be used while building VSIPL++ go into > INT_CPPFLAGS and INT_LDFLAGS. > - -I and -L options that are to be used by applications once VSIPL++ > has been installed should go into CPPFLAGS and LDFLAGS. > - libraries that will be built go into LATE_LIBS. Putting them > into LIBS will break subsequent AC_LINK_IFELSE's in the > configure file. > > So you should do: > > INT_CPPFLAGS="$INT_CPPFLAGS -I$srcdir/vendor/clapack/SRC" > INT_LDFLAGS="$INT_LDFLAGS -L$curdir/vendor/clapack" > LATE_LIBS="$LATE_LIBS -llapack -lcblas" > CPPFLAGS="$keep_CPPFLAGS -I$includedir/lapack" > LDFLAGS="$keep_LDFLAGS -L$libdir/lapack" > > >> + CPPFLAGS="$keep_CPPFLAGS -I$srcdir/vendor/clapack/SRC" > > Are there include files in clapack/SRC that are necessary for building > other files in the library? > >> + LDFLAGS="$keep_LDFLAGS -L$curdir/vendor/clapack" >> + LIBS="$keep_LIBS -llapack -lcblas" >> + >> + AC_SUBST(USE_SIMPLE_LAPACK, 1) >> + + lapack_use_ilaenv=0 >> + lapack_found="simple" >> + break >> fi >> >> Index: vendor/GNUmakefile.inc.in >> =================================================================== >> RCS file: /home/cvs/Repository/vpp/vendor/GNUmakefile.inc.in,v >> retrieving revision 1.15 >> diff -u -r1.15 GNUmakefile.inc.in >> --- vendor/GNUmakefile.inc.in 11 May 2006 11:29:04 -0000 1.15 >> +++ vendor/GNUmakefile.inc.in 3 Jun 2006 10:41:15 -0000 >> @@ -12,6 +12,7 @@ >> # Variables >> ######################################################################## >> >> >> +USE_SIMPLE_LAPACK := @USE_SIMPLE_LAPACK@ >> USE_BUILTIN_ATLAS := @USE_BUILTIN_ATLAS@ >> USE_FORTRAN_LAPACK := @USE_FORTRAN_LAPACK@ >> USE_BUILTIN_LIBF77 := @USE_BUILTIN_LIBF77@ >> @@ -20,7 +21,7 @@ >> USE_BUILTIN_FFTW_DOUBLE := @USE_BUILTIN_FFTW_DOUBLE@ >> USE_BUILTIN_FFTW_LONG_DOUBLE := @USE_BUILTIN_FFTW_LONG_DOUBLE@ >> >> -vendor_CLAPACK = vendor/clapack/lapack.a >> +vendor_CLAPACK = vendor/clapack/liblapack.a > > Let's 
keep the name as lapack.a, so that it is consitent with the > Fortran lapack.a. > >> vendor_FLAPACK = vendor/lapack/lapack.a >> vendor_PRE_LAPACK = vendor/atlas/lib/libprelapack.a >> vendor_USE_LAPACK = vendor/atlas/lib/liblapack.a >> @@ -33,6 +34,7 @@ >> endif >> >> vendor_LIBF77 = vendor/clapack/F2CLIBS/libF77/libF77.a >> +vendor_SIMPLE_BLAS = vendor/clapack/libcblas.a >> >> >> vendor_ATLAS_LIBS := \ >> @@ -104,7 +106,6 @@ >> @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 clean > >> libF77.clean.log 2>&1 >> endif >> >> - >> clean:: >> @echo "Cleaning ATLAS (see atlas.clean.log)" >> @$(MAKE) -C vendor/atlas clean > atlas.clean.log 2>&1 >> @@ -123,6 +124,53 @@ >> endif # USE_FORTRAN_LAPACK >> >> endif # USE_BUILTIN_ATLAS >> +################################################################################ >> >> + >> +ifdef USE_SIMPLE_LAPACK >> +all:: $(vendor_SIMPLE_BLAS) $(vendor_REF_LAPACK) >> + >> +libs += $(vendor_F77BLAS) $(vendor_REF_LAPACK) >> + >> +$(vendor_SIMPLE_BLAS): >> + @echo "Building simple BLAS (see simpleBLAS.build.log)" >> + @$(MAKE) -C vendor/clapack/blas/SRC all > simpleBLAS.build.log 2>&1 >> + >> +ifdef USE_FORTRAN_LAPACK >> +$(vendor_FLAPACK): >> + @echo "Building LAPACK (see lapack.build.log)" >> + @$(MAKE) -C vendor/lapack/SRC all > lapack.build.log 2>&1 >> + >> +clean:: >> + @echo "Cleaning LAPACK (see lapack.clean.log)" >> + @$(MAKE) -C vendor/lapack/SRC clean > lapack.clean.log 2>&1 >> +else >> +$(vendor_CLAPACK): >> + @echo "Building CLAPACK (see clapack.build.log)" >> + @$(MAKE) -C vendor/clapack/SRC all > clapack.build.log 2>&1 >> + >> +clean:: >> + @echo "Cleaning CLAPACK (see clapack.clean.log)" >> + @$(MAKE) -C vendor/clapack/SRC clean > clapack.clean.log 2>&1 >> +endif # USE_FORTRAN_LAPACK >> + >> +ifdef USE_BUILTIN_LIBF77 >> +all:: $(vendor_LIBF77) >> + >> +libs += $(vendor_LIBF77) >> + >> +$(vendor_LIBF77): >> + @echo "Building libF77 (see libF77.build.log)" >> + @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 all > libF77.build.log >> 2>&1 
>> + >> +install:: $(vendor_LIBF77) >> + $(INSTALL_DATA) $(vendor_LIBF77) $(DESTDIR)$(libdir) >> + >> +clean:: >> + @echo "Cleaning libF77 (see libF77.clean.log)" >> + @$(MAKE) -C vendor/clapack/F2CLIBS/libF77 clean > >> libF77.clean.log 2>&1 >> +endif # USE_BUILTIN_LIBF77 >> + >> +endif # USE_SIMPLE_LAPACK > > We should be able to reorganize USE_BUILTIN_ATLAS and USE_SIMPL_LAPACK > so that they share common rules (such as the rules for building > LAPACK, LIBF77, etc). However, let's get this working and checked in > first, then we can fix this later. >> >> >> >> Index: vendor/clapack/blas/SRC/GNUmakefile.in >> =================================================================== > > Looks OK. > >> Index: vendor/clapack/blas/SRC/Makefile >> =================================================================== > > Looks OK. > >> Index: vendor/clapack/SRC/make.inc.in >> =================================================================== >> RCS file: /home/cvs/Repository/clapack/SRC/make.inc.in,v >> retrieving revision 1.4 >> diff -u -r1.4 make.inc.in >> --- vendor/clapack/SRC/make.inc.in 29 Mar 2006 16:07:54 -0000 1.4 >> +++ vendor/clapack/SRC/make.inc.in 3 Jun 2006 12:23:42 -0000 >> @@ -45,8 +45,8 @@ >> # machine-specific, optimized BLAS library should be used whenever >> # possible.) >> # >> -BLASLIB = ../../blas$(PLAT).a >> -LAPACKLIB = lapack$(PLAT).a >> +BLASLIB = ../../libcblas$(PLAT).a > > Let's call this libblas because it is a Fortran BLAS API, not a CBLAS > API. > >> +LAPACKLIB = liblapack$(PLAT).a > > Let's leave this name unchanged so that it stays consistent with > Fortran Lapack. 
> >> F2CLIB = ../../F2CLIBS/libF77.a ../../F2CLIBS/libI77.a >> TMGLIB = tmglib$(PLAT).a >> EIGSRCLIB = eigsrc$(PLAT).a > > From assem at codesourcery.com Mon Jun 5 18:10:08 2006 From: assem at codesourcery.com (Assem Salama) Date: Mon, 05 Jun 2006 14:10:08 -0400 Subject: Matlab IO Patch Message-ID: <44847380.5020401@codesourcery.com> Everyone, New Matlab IO patch with Jule's suggestions. Thanks, Assem Salama -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cvs.diff.06052006.1.log URL: From jules at codesourcery.com Mon Jun 5 18:22:20 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 05 Jun 2006 14:22:20 -0400 Subject: [vsipl++] ATLAS Patch In-Reply-To: <44846A68.3010005@codesourcery.com> References: <44817FB0.10207@codesourcery.com> <4484650F.7050101@codesourcery.com> <44846A68.3010005@codesourcery.com> Message-ID: <4484765C.1050205@codesourcery.com> Assem Salama wrote: > > Jules, > The reason that I had to change the lib names to lib... is because when > I do a make check and try link with -llapack and -lcblas, it can't find > them because they are called lapack.a and cblas.a instead of liblapack.a > and libcblas.a. > Assem, That sounds fine. I had some idea about renaming it when copying it, but we don't do that copy right now. Even if we did, it makes sense to name it 'liblapack' from the start anyway. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From mark at codesourcery.com Mon Jun 5 18:26:19 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Mon, 05 Jun 2006 11:26:19 -0700 Subject: [vsipl++] Matlab IO Patch In-Reply-To: <44847380.5020401@codesourcery.com> References: <44847380.5020401@codesourcery.com> Message-ID: <4484774B.6050604@codesourcery.com> Assem Salama wrote: > + // is this the same class? 
> + if(!(m_view.array_flags[0] == > + (matlab::Matlab_header_traits + std::numeric_limits::is_signed, > + std::numeric_limits::is_integer>::class_type))) > + VSIP_IMPL_THROW(vsip::impl::unimplemented( > + "Trying to read a matrix of a different class")); > + // do dimensions agree? > + if(v_dim == 1) m_view.dim_header.size -= 4; // special case for vectors > + if(v_dim != (m_view.dim_header.size/4)) > + VSIP_IMPL_THROW(vsip::impl::unimplemented( > + "Trying to read a matrix of different dimensions")); "unimplemented" should only be used for things that we plan to implement, but haven't. Do we really ever expect to read a matrix of the wrong size? I think most of these things should just be errors, not unimplemented. > + /* > + strncpy(mbf.view_name.data(), > + reinterpret_cast(&m_view.array_name_header.size), > + length); > + mbf.view_name[length] = 0; > + */ No commented-out code. Assem, I know this has been pointed out before; please check your patches for this before submission. > + // Because we don't know how the data was stored, we need to instantiate > + // generic_reader which can read a type and cast into a different one > + if(temp_data_element.type == matlab::miINT8) > + { > + if(i==0)matlab::read(is,subview::real(mbf.v)); > + else matlab::read(is,subview::imag(mbf.v)); > + } > + else if(temp_data_element.type == matlab::miUINT8) > + { > + if(i==0)matlab::read(is,subview::real(mbf.v)); > + else matlab::read(is,subview::imag(mbf.v)); > + } This cascade of if's could be a case statement, something like this: istream (*fn)(istream&, subview::subview_type); switch (temp_data_element.type) { case matlab::miINT8: fn = matlab::read; break; case matlab::miUINT8: fn = matlab::read; break; ... } if (i == 0) fn(is,subview::real(mbf.v)); else fn(is,subview::imag(mbf.v)); I don't know if that's better; just suggesting it as possibly tidier.
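The switch-based dispatch suggested above can be sketched in a self-contained form. This is only an illustration under stated assumptions: Element_type, read_as, select_reader and read_complex are hypothetical stand-ins, not the matlab:: API from the patch, and a std::vector<double> takes the place of the real/imaginary subviews.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical stand-ins for the matlab::miINT8 / matlab::miUINT8 tags.
enum Element_type { miINT8 = 1, miUINT8 = 2 };

// Read dest.size() elements stored as StoredT and convert each to double.
template <typename StoredT>
std::istream& read_as(std::istream& is, std::vector<double>& dest)
{
  for (std::size_t i = 0; i < dest.size(); ++i)
  {
    StoredT raw;
    is.read(reinterpret_cast<char*>(&raw), sizeof(raw));
    dest[i] = static_cast<double>(raw);
  }
  return is;
}

// All readers share one signature, so a single pointer type can hold any of them.
typedef std::istream& (*Reader_fn)(std::istream&, std::vector<double>&);

// Choose the reader once; returns a null pointer for an unsupported tag.
Reader_fn select_reader(Element_type type)
{
  switch (type)
  {
    case miINT8:  return &read_as<signed char>;
    case miUINT8: return &read_as<unsigned char>;
  }
  return 0;
}

// Mirrors the i == 0 / i == 1 loop in the patch: real part first, then imaginary.
void read_complex(std::istream& is, Element_type type,
                  std::vector<double>& real, std::vector<double>& imag)
{
  Reader_fn fn = select_reader(type);
  assert(fn != 0);
  fn(is, real);
  fn(is, imag);
}
```

The stored type is chosen in exactly one place, so the real and imaginary passes cannot drift apart the way duplicated if/else branches can.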
> + /// This struct is just used as a wrapper so that we can overload the > + /// << operator > + template > + struct Matlab_text_formatter > + { > + Matlab_text_formatter(ViewT v) : v_(v), view_name_("a") {} > + Matlab_text_formatter(ViewT v,std::string name) : > + v_(v), view_name_(name) {} > + > + ViewT v_; > + std::string view_name_; > + }; Another approach is to add a "write" function to Matlab_text_formatter: void Matlab_text_formatter::write(ostream& os) { // Whatever is currently in operator<< } inline std::ostream& operator<<(std::ostream& os, Matlab_text_formatter mf) { mf.write(os); return os; } This is somewhat more "object-oriented". One advantage is that you then have a useful comment for Matlab_text_formatter: // A Matlab_text_formatter writes the contents of a view to a stream, // using the Matlab file format. -- Mark Mitchell CodeSourcery mark at codesourcery.com (650) 331-3385 x713 From assem at codesourcery.com Mon Jun 5 18:59:27 2006 From: assem at codesourcery.com (Assem Salama) Date: Mon, 05 Jun 2006 14:59:27 -0400 Subject: [vsipl++] Matlab IO Patch In-Reply-To: <4484774B.6050604@codesourcery.com> References: <44847380.5020401@codesourcery.com> <4484774B.6050604@codesourcery.com> Message-ID: <44847F0F.40800@codesourcery.com> The reason that I have the comment is because I was planning on reading the array name into view_name_. I didn't know how to do that at the time so I just read it into a temporary array and commented out the part that didn't work. By leaving the comment in there for now, I will not forget that I was planning on fixing that part. Thanks, Assem Salama Mark Mitchell wrote: > Assem Salama wrote: > > >> + // is this the same class? >> + if(!(m_view.array_flags[0] == >> + (matlab::Matlab_header_traits> + std::numeric_limits::is_signed, >> + std::numeric_limits::is_integer>::class_type))) >> + VSIP_IMPL_THROW(vsip::impl::unimplemented( >> + "Trying to read a matrix of a different class")); >> > > >> + // do dimensions agree?
>> + if(v_dim == 1) m_view.dim_header.size -= 4; // special case for vectors >> + if(v_dim != (m_view.dim_header.size/4)) >> + VSIP_IMPL_THROW(vsip::impl::unimplemented( >> + "Trying to read a matrix of different dimensions")); >> > > "unimplemented" should only be use for things that we plan to implement, > but haven't. Do we really ever expect to read a matrix of the wrong > size? I think most of these things should just be errors, not > unimplemented. > > >> + /* >> + strncpy(mbf.view_name.data(), >> + reinterpret_cast(&m_view.array_name_header.size), >> + length); >> + mbf.view_name[length] = 0; >> + */ >> > > No commented-out code. Assem, I know this has been pointed out before; > please check your patches for this before submission. > > > >> + // Because we don't know how the data was stored, we need to instantiate >> + // generic_reader which can read a type and cast into a different one >> + if(temp_data_element.type == matlab::miINT8) >> + { >> + if(i==0)matlab::read(is,subview::real(mbf.v)); >> + else matlab::read(is,subview::imag(mbf.v)); >> + } >> + else if(temp_data_element.type == matlab::miUINT8) >> + { >> + if(i==0)matlab::read(is,subview::real(mbf.v)); >> + else matlab::read(is,subview::imag(mbf.v)); >> + } >> > > This cascase of if's could be a case statement, something like this: > > istream (*fn)(istream&, subview::subview_type); > switch (temp_data_element.type) { > case matlab::miINT8: > fn = matlab::read; > break; > case matlab::miUINT8: > fn = matlab::read; > break; > ... > } > if (i == 0) > fn(is,subview::real(mbf.v)); > else > fn(is,subview::imag(mbf.v)); > > I don't know if that's better; just suggesting it as possibly tidier. 
> > >> + /// This struct is just used as a wrapper so that we can overload the >> + /// << operator >> + template >> + struct Matlab_text_formatter >> + { >> + Matlab_text_formatter(ViewT v) : v_(v), view_name_("a") {} >> + Matlab_text_formatter(ViewT v,std::string name) : >> + v_(v), view_name_(name) {} >> + >> + ViewT v_; >> + std::string view_name_; >> + }; >> > > Another approach, is to add a "write" function to Matlab_text_formatter: > > void > Matlab_text_formatter::write(ostream& os) { > // Whatever is currently in operator<< > } > > inline void > std::ostream& operator<<(std::ostream& os, Matlab_text_formater mf) { > mf.write(os); > return os; > } > > This is somewhat more "object-oriented". One advantage is that you then > have a useful comment for Matlab_text_formatter: > > // A Matlab_text_formatter writes the contents of a view to a stream, > // using the Matlab file format. > > From mark at codesourcery.com Mon Jun 5 19:22:29 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Mon, 05 Jun 2006 12:22:29 -0700 Subject: [vsipl++] Matlab IO Patch In-Reply-To: <44847F0F.40800@codesourcery.com> References: <44847380.5020401@codesourcery.com> <4484774B.6050604@codesourcery.com> <44847F0F.40800@codesourcery.com> Message-ID: <44848475.7060000@codesourcery.com> Assem Salama wrote: > The reason that I have the comment is because I was planning on reading > the array name into view_name_. I didn't know how to do that at the time > so I just read it into a temporary array and commented out the part that > didn't work. By leaving the comment in there for now, I will not forget > that I was planning on fixing that part. We have an issue-tracker for that. :-) It's bad if code gets shipped to a customer with FIXMEs or commented-out code; that makes it look like we gave them code we didn't find satisfactory. If you don't have a VSIPL++ tracker account, Stefan will be happy to help you get set up with that. 
Thanks, -- Mark Mitchell CodeSourcery mark at codesourcery.com (650) 331-3385 x713 From jules at codesourcery.com Mon Jun 5 21:50:28 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 05 Jun 2006 17:50:28 -0400 Subject: [patch] Fix for issue #117 Message-ID: <4484A724.8010207@codesourcery.com> This patch fixes issue #117, C += a*X not being dispatched to SAL. It also fixes a broken ifdef in coverage.hpp, and adds complex scalar-vector multiply benchmarks to vmul_sal. (Neither change is related to issue #117.) Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fix117.diff URL: From jules at codesourcery.com Wed Jun 7 15:45:55 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 07 Jun 2006 11:45:55 -0400 Subject: [patch] Evaluate dense matrix/tensor expressions as vector expressions Message-ID: <4486F4B3.6000006@codesourcery.com> This patch evaluates expressions of dense matrices and tensors as if they were expressions of vectors. This allows our existing dispatch machinery for SAL, IPP, etc. (which primarily applies to vectors) to be used where applicable. This patch has a positive performance impact on the CFAR u-benchmarks (email yesterday). However, this patch does not include additional IPP dispatch (I need to clean that up further) that went into those graphs. All tests pass for gcc-3.4, using IPP/MKL on cugel. I'm very impressed with the coverage our test suite provides. I've spent a half day fixing incomplete/broken bits in my patch that were identified by the suite. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: eval_dense_expr.diff URL: From don at codesourcery.com Wed Jun 7 18:22:45 2006 From: don at codesourcery.com (Don McCoy) Date: Wed, 07 Jun 2006 12:22:45 -0600 Subject: [patch] CFAR benchmark update Message-ID: <44871975.4080307@codesourcery.com> The attached patch enhances the CFAR benchmark by providing a second algorithm that processes the data by range vector instead of computing the values for a single range cell over all vectors. Some other minor changes help increase the stability of the algorithm with respect to avoiding false hits and outright misses in terms of finding targets. As the rework was extensive by the looks of the changes, the original algorithm did not change substantially (although the dimension ordering did change in the explicit declaration of the data cube, it can be adjusted easily in order to take advantage of recent dispatch additions). For ease of review, I'm including a patched copy of the benchmark as well. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cf3.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cf3.diff URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cfar.cpp Type: text/x-c++src Size: 20983 bytes Desc: not available URL: From assem at codesourcery.com Wed Jun 7 21:07:43 2006 From: assem at codesourcery.com (Assem Salama) Date: Wed, 07 Jun 2006 17:07:43 -0400 Subject: Matlab IO Message-ID: <4487401F.1020406@codesourcery.com> Everyone, This patch changes the names of mbf.v to mbf.view and mbf.view_name to mbf.name. Also changed unimplemented throws to errors. Thanks, Assem -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: cvs.diff.06072006.1.log URL: From don at codesourcery.com Thu Jun 8 08:33:47 2006 From: don at codesourcery.com (Don McCoy) Date: Thu, 08 Jun 2006 02:33:47 -0600 Subject: [patch] Firbank memory allocation Message-ID: <4487E0EB.20802@codesourcery.com> The attached patch corrects a memory allocation bug in the HPEC FIR Filter Bank benchmark. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fb3.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fb3.diff URL: From jules at codesourcery.com Thu Jun 8 17:50:58 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 08 Jun 2006 13:50:58 -0400 Subject: [vsipl++] [patch] CFAR benchmark update In-Reply-To: <44871975.4080307@codesourcery.com> References: <44871975.4080307@codesourcery.com> Message-ID: <44886382.9070603@codesourcery.com> Don McCoy wrote: > The attached patch enhances the CFAR benchmark by providing a second > algorithm that processes the data by range vector instead of computing > the values for a single range cell over all vectors. Some other minor > changes help increase the stability of the algorithm with respect to > avoiding false hits and outright misses in terms of finding targets. > > As the rework was extensive by the looks of the changes, the original > algorithm did not change substantially (although the dimension ordering > did change in the explicit declaration of the data cube, it can be > adjusted easily in order to take advantage of recent dispatch > additions). For ease of review, I'm including a patched copy of the > benchmark as well. > > Regards, > > Don, This looks good. Can you please check it in ASAP? That will make it easier to merge in the SIMD changes. 
thanks, -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Thu Jun 8 18:55:54 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 08 Jun 2006 14:55:54 -0400 Subject: [patch] Fix bug in how dimension-ordering is determined for Sliced_block. Message-ID: <448872BA.3050507@codesourcery.com> Patch applied. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: subblock.diff URL: From jules at codesourcery.com Thu Jun 8 21:47:28 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 08 Jun 2006 17:47:28 -0400 Subject: [patch] Minor CFAR changes Message-ID: <44889AF0.2020201@codesourcery.com> This patch makes some minor changes to CFAR. For the slice version, it changes expressions to be more amenable to math library dispatch and changes the dimension-ordering to make subviews dense. For the vector version, it fixes a bug with sum (should be reset to 0 for each vector), reduces the temporary footprint used, and uses get() instead of (). Attached graphs show original (cfar-orig) and new (cfar) performance, for GCC 3.4 and GCC 4.1 on Pastec. The changes for the slice version have a larger impact. Using 4.1 is a win! I am cleaning up the C, C-simd and VSIPL++ SIMD versions. I'm planning to put the C and C-simd versions in a separate source file (cfar_c.cpp) and the VSIPL++ SIMD version in cfar.cpp (as t_cfar_hybrid). I "fixed" gnuplot to avoid using both red and green, although the color choices still don't seem ideal. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cfar-1.diff URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cfar-all-23.png Type: image/png Size: 4404 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cfar-all-3.png Type: image/png Size: 4515 bytes Desc: not available URL: From don at codesourcery.com Thu Jun 8 22:20:59 2006 From: don at codesourcery.com (Don McCoy) Date: Thu, 08 Jun 2006 16:20:59 -0600 Subject: [vsipl++] [patch] Minor CFAR changes In-Reply-To: <44889AF0.2020201@codesourcery.com> References: <44889AF0.2020201@codesourcery.com> Message-ID: <4488A2CB.4030609@codesourcery.com> Jules Bergmann wrote: > Attached graphs show original (cfar-orig) and new (cfar) performance, > for GCC 3.4 and GCC 4.1 on Pastec. The changes for the slice version > have a larger impact. Using 4.1 is a win! Wow! Those are some nice results. Thanks for finding these issues and in general for helping me to better understand what was going on. I think this was a very instructive example. -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 From mark at codesourcery.com Thu Jun 8 22:26:32 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Thu, 08 Jun 2006 15:26:32 -0700 Subject: [vsipl++] [patch] Minor CFAR changes In-Reply-To: <4488A2CB.4030609@codesourcery.com> References: <44889AF0.2020201@codesourcery.com> <4488A2CB.4030609@codesourcery.com> Message-ID: <4488A418.5010406@codesourcery.com> Don McCoy wrote: > Jules Bergmann wrote: >> Attached graphs show original (cfar-orig) and new (cfar) performance, >> for GCC 3.4 and GCC 4.1 on Pastec. The changes for the slice version >> have a larger impact. Using 4.1 is a win! What's Pastec? It's nice to know GCC 4.1 is good for something! But, from what you said this morning, don't those results still fall short, relative to the C code? 
-- Mark Mitchell CodeSourcery mark at codesourcery.com (650) 331-3385 x713 From jules at codesourcery.com Thu Jun 8 22:49:21 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 08 Jun 2006 18:49:21 -0400 Subject: [vsipl++] [patch] Minor CFAR changes In-Reply-To: <4488A418.5010406@codesourcery.com> References: <44889AF0.2020201@codesourcery.com> <4488A2CB.4030609@codesourcery.com> <4488A418.5010406@codesourcery.com> Message-ID: <4488A971.6050709@codesourcery.com> Mark Mitchell wrote: > Don McCoy wrote: >> Jules Bergmann wrote: >>> Attached graphs show original (cfar-orig) and new (cfar) performance, >>> for GCC 3.4 and GCC 4.1 on Pastec. The changes for the slice version >>> have a larger impact. Using 4.1 is a win! > > What's Pastec? Pastec is another name for the GTRI cluster, aka durip (some acronym or such). > > It's nice to know GCC 4.1 is good for something! Good job! > But, from what you > said this morning, don't those results still fall short, relative to the > C code? Yes, that's right. I'm producing results for those cases now. However, it looks like 4.1 boosted our "slice" version, while at the same time pessimizing the plain C "vector" version. For a particular dataset size (dataset #3 at 200 gates):

  Variation              MFLOPS
  3.4 VSIPL++ slice         136
  3.4 VSIPL++ vector         60
  3.4 C vector              141
  3.4 C+SIMD vector         470
  4.1 VSIPL++ slice         226
  4.1 VSIPL++ vector        100
  4.1 C vector              128
  4.1 C+SIMD vector         830

(I need to repackage/rerun the VSIPL++ + SIMD approach.) Question on SIMD: For the C+SIMD version, I used the intrinsics from xmmintrin.h (__m128, _mm_add_ps(), etc.). This works with both 3.4 and 4.1. For the VSIPL++ SIMD version, I used the GCC vector extensions (typedef float v4sf __attribute__ ((vector_size(16))), '+' operator). The typedefs work with 3.4 and 4.1, but the operators (+, *, etc.) only work with 4.1. Is there any difference in code generated from these two approaches?
In particular, would it be worthwhile at all to recode the C+SIMD version to use the vector extensions? thanks, -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Fri Jun 9 00:56:23 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 08 Jun 2006 20:56:23 -0400 Subject: [patch] C impl of CFAR, VSIPL++ SIMD impl Message-ID: <4488C737.1040407@codesourcery.com> -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cfar-2.diff URL: From mark at codesourcery.com Fri Jun 9 17:02:07 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Fri, 09 Jun 2006 10:02:07 -0700 Subject: [vsipl++] [patch] Minor CFAR changes In-Reply-To: <4488A971.6050709@codesourcery.com> References: <44889AF0.2020201@codesourcery.com> <4488A2CB.4030609@codesourcery.com> <4488A418.5010406@codesourcery.com> <4488A971.6050709@codesourcery.com> Message-ID: <4489A98F.6000805@codesourcery.com> Jules Bergmann wrote: > Question on SIMD: For the C+SIMD version, I used the intrinsics from > xmmintrin.h (__m128, _mm_add_ps(), etc). This works with both 3.4 and > 4.1. For the VSIPL++ SIMD version, I used the GCC vector extensions > (typedef float v4sf __attribute++ ((vector_size(16))), '+' operator). > The typedefs work with 3.4 and 4.1, but the operators (+, *, etc) only > work with 4.1. Is there any difference in code generated from these two > approaches? I would not think so. However, if there *is* a difference, I would expect the xmmintrin.h to be better; that's mapping directly to the underlying instructions, with no compiler cleverness. 
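For reference, the two approaches being compared can be put side by side in a small sketch. It is not taken from the CFAR benchmark: the function names add4_vector_ext, add4_intrinsics and results_agree are hypothetical, and it assumes a GCC-compatible compiler recent enough to support arithmetic operators on vector types. It makes no claim about which form generates better code; compiling both with -S and comparing the assembly would be one way to check.

```cpp
#include <cassert>
#include <cstring>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// GCC vector extension: four packed floats in 16 bytes.
typedef float v4sf __attribute__((vector_size(16)));

// Add four floats using the vector extension and the built-in '+' operator.
void add4_vector_ext(float const* a, float const* b, float* out)
{
  v4sf va, vb;
  std::memcpy(&va, a, sizeof(va));   // memcpy avoids any alignment assumption
  std::memcpy(&vb, b, sizeof(vb));
  v4sf vc = va + vb;
  std::memcpy(out, &vc, sizeof(vc));
}

// Add four floats using the SSE intrinsics from xmmintrin.h.
void add4_intrinsics(float const* a, float const* b, float* out)
{
#if defined(__SSE__)
  __m128 va = _mm_loadu_ps(a);       // unaligned loads
  __m128 vb = _mm_loadu_ps(b);
  _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
  // Scalar fallback so the sketch still builds on targets without SSE.
  for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}

// Both paths should produce bit-identical sums for the same inputs.
bool results_agree(float const* a, float const* b)
{
  float x[4], y[4];
  add4_vector_ext(a, b, x);
  add4_intrinsics(a, b, y);
  return std::memcmp(x, y, sizeof(x)) == 0;
}
```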
--
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713

From jules at codesourcery.com Fri Jun 9 21:31:06 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 09 Jun 2006 17:31:06 -0400
Subject: [patch] Fixes to run CFAR benchmark in parallel.
Message-ID: <4489E89A.7060904@codesourcery.com>

Patch applied.

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cfar-3.diff
URL:

From jules at codesourcery.com Tue Jun 13 02:15:49 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 12 Jun 2006 22:15:49 -0400
Subject: [vsipl++] Matlab IO
In-Reply-To: <4487401F.1020406@codesourcery.com>
References: <4487401F.1020406@codesourcery.com>
Message-ID: <448E1FD5.8040801@codesourcery.com>

Assem Salama wrote:
> Everyone,
> This patch changes the names of mbf.v to mbf.view and mbf.view_name to
> mbf.name. Also changed unimplemented throws to errors.

Assem,

In general, there are a lot of comparisons being done between signed
and unsigned values below. Can you recompile with the '-W -Wall'
options? That will help catch these. It is a good practice to use
'-W -Wall' when developing.

I have a few more comments below, please take a look. I think this is
starting to converge.

I also have an action item for myself after reviewing this patch:

 - define a column-major next()

I'll post something for this shortly.

It would also be nice to do the following, but it is not critical.

 - move get_real_ptr/get_image_ptr functionality into Allocated_storage
 - move Subview_helper functionality into view class

I'll capture these as issues.

-- Jules

>
> Thanks,
> Assem
>
>
> ------------------------------------------------------------------------
>
> ? .matlab.hpp.swp
> ? generic_reader.hpp
> ? matlab_temp
> ? png.cpp
> ? png.hpp
> Index: GNUmakefile.inc.in
> ===================================================================
> RCS file: /home/cvs/Repository/vpp/src/vsip_csl/GNUmakefile.inc.in,v
> retrieving revision 1.1
> diff -u -r1.1 GNUmakefile.inc.in
> --- GNUmakefile.inc.in 8 May 2006 03:49:44 -0000 1.1
> +++ GNUmakefile.inc.in 7 Jun 2006 21:06:53 -0000
> @@ -12,13 +12,36 @@
> # Variables
> ########################################################################
>
> +VSIP_CSL_HAVE_PNG := @HAVE_PNG_H@
> +
> +src_vsip_csl_CXXINCLUDES := -I$(srcdir)/src
> +src_vsip_csl_CXXFLAGS := $(src_vsip_csl_CXXINCLUDES)
> +
> +ifdef VSIP_CSL_HAVE_PNG
> +src_vsip_csl_cxx_sources += $(srcdir)/src/vsip_csl/png.cpp
> +endif

If you're not including png.cpp as part of your patch, why are you
adding it to the makefile?

> +src_vsip_csl_cxx_objects := $(patsubst $(srcdir)/%.cpp, %.$(OBJEXT),\
> + $(src_vsip_csl_cxx_sources))
> +cxx_sources += $(src_vsip_csl_cxx_sources)
> +
> +libs += lib/libvsip_csl.a
>
> ########################################################################
> # Rules
> ########################################################################
>
> +all:: lib/libvsip_csl.a
> +
> +clean::
> + rm -f lib/libvsip_csl.a
> +
> +lib/libvsip_csl.a: $(src_vsip_csl_cxx_objects)
> + $(AR) rc $@ $^ || rm -f $@
> +
> # Install the extensions library and its header files.
> install::
> + $(INSTALL) -d $(DESTDIR)$(libdir)
> + $(INSTALL_DATA) lib/libvsip_csl.a $(DESTDIR)$(libdir)/libvsip_csl$(suffix).a
> $(INSTALL) -d $(DESTDIR)$(includedir)/vsip_csl
> for header in $(wildcard $(srcdir)/src/vsip_csl/*.hpp); do \
> $(INSTALL_DATA) $$header $(DESTDIR)$(includedir)/vsip_csl; \
> Index: matlab.hpp
> ===================================================================
> RCS file: matlab.hpp
> diff -N matlab.hpp
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ matlab.hpp 7 Jun 2006 21:06:54 -0000
> @@ -0,0 +1,277 @@
> +#ifndef VSIP_CSL_MATLAB_HPP
> +#define VSIP_CSL_MATLAB_HPP
> +

Which header defines the types int32_t, etc.?

> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +namespace vsip_csl
> +{
> +
> +namespace matlab
> +{
> +  struct data_element
> +  {
> +    int32_t type;
> +    int32_t size;
> +  };
> +
> +  template
> +  struct view_header
> +  {
> +    data_element header;
> +    data_element array_flags_header;
> +    char array_flags[8];
> +    data_element dim_header;
> +    int32_t dim[Dim + Dim%2]; //the dim has to be aligned to an 8 byte boundary
> +    data_element array_name_header;
> +  };
> +
> +  // helper struct to get the imaginary part of a view.
> +  template <typename ViewT,
> +            bool IsComplex =
> +              vsip::impl::Is_complex::value>
> +  struct Subview_helper;
> +
> +  template
> +  struct Subview_helper
> +  {
> +    typedef typename ViewT::realview_type realview_type;
> +    typedef typename ViewT::imagview_type imagview_type;
> +
> +    static realview_type real(ViewT v) { return v.real(); }
> +    static imagview_type imag(ViewT v) { return v.imag(); }
> +  };
> +
> +  template
> +  struct Subview_helper
> +  {
> +    typedef ViewT realview_type;
> +    typedef ViewT imagview_type;
> +
> +    static realview_type real(ViewT v) { return v; }
> +    static imagview_type imag(ViewT v) { return v; }
> +  };
> +
> +
> +  // generic reader that allows us to read a generic type and cast to another
> +
> +  // the read function for real or complex depending of the view that was
> +  // passed in
> +  template <typename T1,
> +            typename T2,
> +            typename ViewT>
> +  void read(std::istream& is,ViewT v)
> +  {
> +    vsip::dimension_type const View_dim = ViewT::dim;
> +    vsip::Index my_index;
> +    vsip::impl::Length v_extent = extent(v);
> +    typedef typename vsip::impl::Scalar_of::type scalar_type;
> +    T1 data;
> +
> +    // get num_points
> +    vsip::length_type num_points = v.size();
> +
> +    // read all the points
> +    for(int i=0;i
> +      is.read(reinterpret_cast(&data),sizeof(data));

Is 'sizeof(data)' the correct size to read here?

Moreover, should 'data' really be of type 'T1'? If this is reading in
part of a complex array, 'v' will be either the real or imag subview,
which would make 'scalar_type' the correct type for 'data'. That is
how write() below appears to work.

If 'data' should be scalar_type, then instead of changing it here, it
would be more natural to have 'operator>>' call read() with
'Scalar_type::type' as a parameter.

> +      put(v,my_index,scalar_type(data));
> +
> +      // increment index
> +      my_index = vsip::impl::next(v_extent,my_index);
> +    }
> +
> +  }
> +
> +  // a write function to output a view to a matlab file.
> +  template <typename T,
> +            typename ViewT>
> +  void write(std::ostream& os,ViewT v)
> +  {
> +    vsip::dimension_type const View_dim = ViewT::dim;
> +    vsip::Index my_index;
> +    vsip::impl::Length v_extent = extent(v);
> +    typedef typename vsip::impl::Scalar_of::type scalar_type;

Passing T as a template parameter, but then only using Scalar_of,
seems unintuitive. Let's perform the Scalar_of at the caller of
write().

> +    scalar_type data;
> +
> +    // get num_points
> +    vsip::length_type num_points = v.size();
> +
> +    // write all the points
> +    for(int i=0;i
> +      data = get(v,my_index);
> +      os.write(reinterpret_cast(&data),sizeof(data));
> +
> +      // increment index
> +      my_index = vsip::impl::next(v_extent,my_index);
> +    }
> +
> +  }
> +
> +  struct header
> +  {
> +    char description[116];
> +    char subsyt_data[8];
> +    char version[2];
> +    char endian[2];
> +  };
> +
> +  // constants for matlab binary format
> +
> +  // data types
> +  static int const miINT8 = 1;
> +  static int const miUINT8 = 2;
> +  static int const miINT16 = 3;
> +  static int const miUINT16 = 4;
> +  static int const miINT32 = 5;
> +  static int const miUINT32 = 6;
> +  static int const miSINGLE = 7;
> +  static int const miDOUBLE = 9;
> +  static int const miINT64 = 12;
> +  static int const miUINT64 = 13;
> +  static int const miMATRIX = 14;
> +  static int const miCOMPRESSED = 15;
> +  static int const miUTF8 = 16;
> +  static int const miUTF16 = 17;
> +  static int const miUTF32 = 18;
> +
> +  // class types
> +  static int const mxCELL_CLASS = 1;
> +  static int const mxSTRUCT_CLASS = 2;
> +  static int const mxOBJECT_CLASS = 3;
> +  static int const mxCHAR_CLASS = 4;
> +  static int const mxSPARSE_CLASS = 5;
> +  static int const mxDOUBLE_CLASS = 6;
> +  static int const mxSINGLE_CLASS = 7;
> +  static int const mxINT8_CLASS = 8;
> +  static int const mxUINT8_CLASS = 9;
> +  static int const mxINT16_CLASS = 10;
> +  static int const mxUINT16_CLASS = 11;
> +  static int const mxINT32_CLASS = 12;
> +  static int const mxUINT32_CLASS = 13;
> +
> +  // matlab header traits
> +  template
> +  struct Matlab_header_traits;
> +
> +  template <>
> +  struct Matlab_header_traits<1, true, true> // char
> +  {
> +    static int const value_type = miINT8;
> +    static int const class_type = mxINT8_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<1, false, true> // unsigned char
> +  {
> +    static int const value_type = miUINT8;
> +    static int const class_type = mxUINT8_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<2, true, true> // short
> +  {
> +    static int const value_type = miINT16;
> +    static int const class_type = mxINT16_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<2, false, true> // unsigned short
> +  {
> +    static int const value_type = miUINT16;
> +    static int const class_type = mxUINT16_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<4, true, true> // int
> +  {
> +    static int const value_type= miINT32;
> +    static int const class_type= mxINT32_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<4, false, true> // unsigned int
> +  {
> +    static int const value_type= miUINT32;
> +    static int const class_type= mxUINT32_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<4, true, false> // float
> +  {
> +    static int const value_type= miSINGLE;
> +    static int const class_type= mxSINGLE_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<8, true, false> // double
> +  {
> +    static int const value_type= miDOUBLE;
> +    static int const class_type= mxDOUBLE_CLASS;
> +  };
> +

VSIPL++ has template classes View_of_dim<> and Col_major<> that can
help us out here. Let's define Matlab_desired_LP as:

> +  // matlab desired layouts
> +  template