[vsipl++] Matlab IO

Wed Jul 12 13:57:16 UTC 2006

Assem Salama wrote:
 > Everyone,
 >  This patch is the new Matlab_IO using iterators.

Assem, Thanks, I have several (8) comments below.  -- Jules

[1] What directory are these files going in to?

In general, if you do an 'svn diff' in the top level of your checkout,
it should include the path as part of the filename.

If you want to only diff a subdirectory of your checkout, instead of
going into that directory (and losing the path name), you can do this
from the top level by giving 'svn diff' a path name, i.e.

	svn diff vsip_csl

 >
 >
 > ------------------------------------------------------------------------
 >
 > Index: matlab_file.cpp
 > ===================================================================
 > --- matlab_file.cpp	(revision 0)
 > +++ matlab_file.cpp	(revision 0)
 > @@ -0,0 +1,39 @@
 > +/* Copyright (c) 2005, 2006 by CodeSourcery.  All rights reserved. */

[2] The copyright shouldn't include 2005 (unless this code was started
back then).

 > +
 > +/** @file    vsip_csl/matlab_file.cpp
 > +    @author  Assem Salama
 > +    @date    2006-06-21
 > +    @brief   VSIPL++ CodeSourcery Library: Matlab_file class functions
 > +*/
 > +#include "vsip_csl/matlab_bin_formatter.hpp"
 > +#include "vsip_csl/matlab_file.hpp"

[3] In general, these should be angle bracket includes "#include
<...>"

 > +
 > +namespace vsip_csl
 > +{
 > +
 > +Matlab_file::Matlab_file(std::string fname) :
 > +  is_(fname.c_str()),
 > +  begin_iterator_(&is_,false),
 > +  end_iterator_(&is_,true)
 > +
 > +{
 > +  // read header to make sure it is matlab file
 > +  is_ >> matlab_header_;
 > +
 > +  // get length of file
 > +  {
 > +    std::istream::off_type temp_offset = 0;
 > +    std::istream::pos_type temp_pos = is_.tellg();
 > +    is_.seekg(temp_offset,std::ios::end);
 > +    begin_iterator_.set_length(static_cast<uint32_t>(is_.tellg()));
 > +    is_.seekg(temp_pos);
 > +  }
 > +  begin_iterator_.set_endian(matlab_header_.endian == ('I' << 8|'M'));
 > +
 > +  // read first header
 > +  begin_iterator_.read_header();
 > +
 > +}
 > +
 > +}

 > Index: matlab_file.hpp
 > ===================================================================
 > --- matlab_file.hpp	(revision 0)
 > +++ matlab_file.hpp	(revision 0)
 > @@ -0,0 +1,217 @@
 > +/* Copyright (c) 2005, 2006 by CodeSourcery.  All rights reserved. */
 > +
 > +/** @file    vsip_csl/matlab_file.hpp
 > +    @author  Assem Salama
 > +    @date    2006-06-21
 > +    @brief   VSIPL++ CodeSourcery Library: Matlab file class that 
handles
 > +             Matlab files using an iterator.
 > +*/
 > +
 > +#ifndef VSIP_CSL_MATLAB_FILE_HPP
 > +#define VSIP_CSL_MATLAB_FILE_HPP
 > +
 > +#include <iostream>
 > +#include <fstream>
 > +#include "vsip_csl/matlab_bin_formatter.hpp"
 > +
 > +namespace vsip_csl
 > +{
 > +
 > +class Matlab_file
 > +{
 > +  public:
 > +    // Constructors
 > +    Matlab_file(std::string fname);

[4] When creating a class, you need to consider

  - what constructors are necessary?

  - How will copies and assignment will be handled?
  - should copy be shallow, deep, or disallowed?

    Since you haven't defined a copy constructor, C++ will create a
    default copy constructor that does a shallow copy:

    	Matlab_file(Matlab_file const& obj)
	 : matlab_header_ (obj.matlab_header_),
	   is_            (obj.is_),
	   begin_iterator_(obj.begin_iterator_),
	   end_iterator_  (obj.end_iterator_)
	{}

    Shallow copy is OK if the object only has data that does not need
    to be deallocated (i.e. an 'int' member variable), refers to data
    not owned by the object (such as a pointer to external data that
    will not be deallocated when the object is destroyed), or that
    manages its own deallocation (such as a reference counted object).

    For Matlab_file, whether this is OK depends on std::ifstream.
    What does its copy constructor do, and how does this interact
    with the destructor?  For example, if std::ifstream uses a
    reference count to determine how many copies have been made and
    only deallocates its resources when the last copy is destroyed,
    then a shallow copy is alright.  On the other hand, if
    std::ifstream does a shallow copy and destroying an ifstream
    deallocates resources, then having Matlab_file also do a
    shallow copy is dangerous.

    I suspect that std::ifstream does the "right thing", making
    a shallow copy for Matlab_file OK, but you need to check.
    If the default copy is OK, you should include a comment
    "default copy constructor is OK."

    Another consideration is what is the behavior of a copy? What does
    it mean if two Matlab_file objects refer to the same file?  Can
    iterators from the two objects be used simultaneous, or does
    incrementing one iterator affect the state of the other?

    One way to prevent changes in one object from affecting the other
    is with a deep copy (instead of having both objects refer to the
    same file object, have the copy refer to a clone of the original).

    Another way is to prevent copies from being made by making the
    copy constructor private.

  - what should assignment do?

    Similar to copy, if you do not provide an assignment operator,
    C++ provides one for you that does a shallow assignment.  Is this
    the right thing?

  - Is a destructor necessary?

    Is there any cleanup that needs to be done when an object is destroyed?

    If you don't specify the destructor, C++ will create a default
    destructor that just destroys the object's members:

    	~Matlab_file() {} // default destructor

    Similar to shallow copy, this is OK for non-allocated data,
    pointers to external data, and data that cleans itself up when
    destroyed.

    Is this OK for Matlab_file?  In particular what happens to an ifstream
    object when destroyed?

    If it is OK, you should put a comment like: "default destructor OK",
    or explicitly define the default constructor.

 > +
 > +  // classes
 > +  public:
 > +    class iterator
 > +    {
 > +      public:

 > +	  ifs_(ifs), end_of_file_(end_of_file),
 > +	  read_data_(false) {}

[5] You need to answer the same questions for 'iterator'.  Are the default
copy-constructor, assignment operator, and destructor OK?

Unlike Matlab_file, we can't make the copy-constructor and assignment
operator private, because they're needed to use the iterator, i.e.:

	Matlab_file mfile("xyz.mat");
	Matlab_file::iterator cur = mfile.begin();
	Matlab_file::iterator end = mfile.end();

	while (cur != end)
	{
	  // process data at 'cur'
	  ++cur;
	}

Some questions:

  - what does it mean if you call 'mfile.begin()' a second time?

  - what are the semantics if you copy an iteraror?  How does using
    one affect the other

    	Matlab_file iterator cur  = mfile.begin();
	Matlab_file iterator copy = begin;

	++cur;

    Is 'copy' unchanged, updated to reflect the new state of the file, or
    invalidated?  This needs to be documented.

    I suspect the answer is "invalidated", which may be unintuitive to the
    user.  For example, if the iterator is inadvertently copied to call
    a function:

    	void
	process(Matlab_file::iterator cur)
	{
	  cur.read(...);
         }

	while (cur != end)
	{
	  process(cur);
	  ++cur;
	}

    The changes to iterator state made inside of 'process' (reading the
    file, which sets the 'read_data_' flag) will invalidate 'cur'.

    While this can be avoided by requiring the user to pass the
    iterator by reference:

    	void
	process(Matlab_file::iterator& cur)
	{
	  cur.read(...);
         }

    It would be better avoid this altogether.

    One way to do this is to have the Matlab_file object maintain the
    state, and have the iterators refer to the Matlab_file object, along
    with an indication of whether they are the special 'end' iterator.

 > +
 > +	bool is_eof() { return end_of_file_; }

 > +	void set_endian(bool swap_bytes)
 > +	  { view_header_.swap_bytes = swap_bytes;}

[6] This shouldn't be called 'set_endian' because it is not actually
setting the endianness.  'set_swap_bytes' would be a better name.

 > +	void set_length(uint32_t length) { length_ = length; }
 > +	void read_header() { (*ifs_) >> view_header_; }
 > +	std::istream *get_stream() { return ifs_; }
 > +
 > +      // operators
 > +      public:
 > +        iterator& operator++()
 > +	{
 > +	  if(!read_data_)
 > +	  {
 > +	    // advance file pointer to next header
 > +	    // make sure that we don't go beyond the end of file!
 > +	    if(view_header_.next_header >= length_)
 > +	      end_of_file_ = true;
 > +	    else
 > +	      ifs_->seekg(view_header_.next_header);
 > +
 > +	  }
 > +	  if(!end_of_file_) // read next header
 > +	    (*ifs_) >> view_header_;
 > +
 > +	  read_data_ = false;
 > +	  return *this;
 > +	}
 > +
 > +	bool operator==(iterator &i1)
 > +	{

[7] Since this is a member function of class iterator, it can see the
members of both 'this' and 'i1'.  Hence you can do:

	return i1.end_of_file == this->end_of_file_;

Which removes the need for the 'is_eof()' member function.

 > +	  return i1.is_eof() == end_of_file_;
 > +	}
 > +
 > +	bool operator!=(iterator &i1)
 > +	{
 > +	  return i1.is_eof() != end_of_file_;
 > +	}
 > +
 > +	Matlab_view_header*
 > +	operator*()
 > +	{
 > +	  return &view_header_;
 > +	}
 > +
 > +
 > +      private:
 > +        Matlab_view_header view_header_;
 > +	std::ifstream *ifs_;
 > +	bool end_of_file_;
 > +	bool read_data_;
 > +	uint32_t length_;
 > +    };
 > +
 > +  public:
 > +    // iterator functions
 > +    iterator begin() { return begin_iterator_; };
 > +    iterator end() { return end_iterator_; };
 > +

[8] Should this be a member of 'Matlab_file' or of 'Matlab_file::iterator'?

Since the view header is accessed by dereferencing the iterator, making
the read function an iterator member would be more intuitive:

	while (cur != end)
	{
	  if (*cur.header->num_dims == 1 &&
	      *cur.header->class_type == ... float...)
	  {
	    length_type size = *cur.header->dims[0];
	    Vector<float> vec(size);
	    cur.read(vec);
	    // process data in 'vec'
	  }
	  ++cur;
	}

 > +    // read a view from a matlab file after reading the header
 > +    template <typename T,
 > +	      typename Block0,
 > +	      template <typename,typename> class View>
 > +    void read_view(View<T,Block0> view, iterator  &iter);
 > +
 > +  private:
 > +    Matlab_bin_hdr                    matlab_header_;
 > +    std::ifstream                     is_;
 > +    iterator                          begin_iterator_;
 > +    iterator                          end_iterator_;
 > +
 > +
 > +};
 > +
 > +

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705