[vsipl++] Matlab IO
Jules Bergmann
jules at codesourcery.com
Wed Jul 12 13:57:16 UTC 2006
Assem Salama wrote:
> Everyone,
> This patch is the new Matlab_IO using iterators.
Assem, Thanks, I have several (8) comments below. -- Jules
[1] What directory are these files going in to?
In general, if you do an 'svn diff' in the top level of your checkout,
it should include the path as part of the filename.
If you want to only diff a subdirectory of your checkout, instead of
going into that directory (and losing the path name), you can do this
from the top level by giving 'svn diff' a path name, i.e.
svn diff vsip_csl
>
>
> ------------------------------------------------------------------------
>
> Index: matlab_file.cpp
> ===================================================================
> --- matlab_file.cpp (revision 0)
> +++ matlab_file.cpp (revision 0)
> @@ -0,0 +1,39 @@
> +/* Copyright (c) 2005, 2006 by CodeSourcery. All rights reserved. */
[2] The copyright shouldn't include 2005 (unless this code was started
back then).
> +
> +/** @file vsip_csl/matlab_file.cpp
> + @author Assem Salama
> + @date 2006-06-21
> + @brief VSIPL++ CodeSourcery Library: Matlab_file class functions
> +*/
> +#include "vsip_csl/matlab_bin_formatter.hpp"
> +#include "vsip_csl/matlab_file.hpp"
[3] In general, these should be angle bracket includes "#include
<...>"
> +
> +namespace vsip_csl
> +{
> +
> +Matlab_file::Matlab_file(std::string fname) :
> + is_(fname.c_str()),
> + begin_iterator_(&is_,false),
> + end_iterator_(&is_,true)
> +
> +{
> + // read header to make sure it is matlab file
> + is_ >> matlab_header_;
> +
> + // get length of file
> + {
> + std::istream::off_type temp_offset = 0;
> + std::istream::pos_type temp_pos = is_.tellg();
> + is_.seekg(temp_offset,std::ios::end);
> + begin_iterator_.set_length(static_cast<uint32_t>(is_.tellg()));
> + is_.seekg(temp_pos);
> + }
> + begin_iterator_.set_endian(matlab_header_.endian == ('I' << 8|'M'));
> +
> + // read first header
> + begin_iterator_.read_header();
> +
> +}
> +
> +}
> Index: matlab_file.hpp
> ===================================================================
> --- matlab_file.hpp (revision 0)
> +++ matlab_file.hpp (revision 0)
> @@ -0,0 +1,217 @@
> +/* Copyright (c) 2005, 2006 by CodeSourcery. All rights reserved. */
> +
> +/** @file vsip_csl/matlab_file.hpp
> + @author Assem Salama
> + @date 2006-06-21
> + @brief VSIPL++ CodeSourcery Library: Matlab file class that
handles
> + Matlab files using an iterator.
> +*/
> +
> +#ifndef VSIP_CSL_MATLAB_FILE_HPP
> +#define VSIP_CSL_MATLAB_FILE_HPP
> +
> +#include <iostream>
> +#include <fstream>
> +#include "vsip_csl/matlab_bin_formatter.hpp"
> +
> +namespace vsip_csl
> +{
> +
> +class Matlab_file
> +{
> + public:
> + // Constructors
> + Matlab_file(std::string fname);
[4] When creating a class, you need to consider
- what constructors are necessary?
- How will copies and assignment will be handled?
- should copy be shallow, deep, or disallowed?
Since you haven't defined a copy constructor, C++ will create a
default copy constructor that does a shallow copy:
Matlab_file(Matlab_file const& obj)
: matlab_header_ (obj.matlab_header_),
is_ (obj.is_),
begin_iterator_(obj.begin_iterator_),
end_iterator_ (obj.end_iterator_)
{}
Shallow copy is OK if the object only has data that does not need
to be deallocated (i.e. an 'int' member variable), refers to data
not owned by the object (such as a pointer to external data that
will not be deallocated when the object is destroyed), or that
manages its own deallocation (such as a reference counted object).
For Matlab_file, whether this is OK depends on std::ifstream.
What does its copy constructor do, and how does this interact
with the destructor? For example, if std::ifstream uses a
reference count to determine how many copies have been made and
only deallocates its resources when the last copy is destroyed,
then a shallow copy is alright. On the other hand, if
std::ifstream does a shallow copy and destroying an ifstream
deallocates resources, then having Matlab_file also do a
shallow copy is dangerous.
I suspect that std::ifstream does the "right thing", making
a shallow copy for Matlab_file OK, but you need to check.
If the default copy is OK, you should include a comment
"default copy constructor is OK."
Another consideration is what is the behavior of a copy? What does
it mean if two Matlab_file objects refer to the same file? Can
iterators from the two objects be used simultaneous, or does
incrementing one iterator affect the state of the other?
One way to prevent changes in one object from affecting the other
is with a deep copy (instead of having both objects refer to the
same file object, have the copy refer to a clone of the original).
Another way is to prevent copies from being made by making the
copy constructor private.
- what should assignment do?
Similar to copy, if you do not provide an assignment operator,
C++ provides one for you that does a shallow assignment. Is this
the right thing?
- Is a destructor necessary?
Is there any cleanup that needs to be done when an object is destroyed?
If you don't specify the destructor, C++ will create a default
destructor that just destroys the object's members:
~Matlab_file() {} // default destructor
Similar to shallow copy, this is OK for non-allocated data,
pointers to external data, and data that cleans itself up when
destroyed.
Is this OK for Matlab_file? In particular what happens to an ifstream
object when destroyed?
If it is OK, you should put a comment like: "default destructor OK",
or explicitly define the default constructor.
> +
> + // classes
> + public:
> + class iterator
> + {
> + public:
> + ifs_(ifs), end_of_file_(end_of_file),
> + read_data_(false) {}
[5] You need to answer the same questions for 'iterator'. Are the default
copy-constructor, assignment operator, and destructor OK?
Unlike Matlab_file, we can't make the copy-constructor and assignment
operator private, because they're needed to use the iterator, i.e.:
Matlab_file mfile("xyz.mat");
Matlab_file::iterator cur = mfile.begin();
Matlab_file::iterator end = mfile.end();
while (cur != end)
{
// process data at 'cur'
++cur;
}
Some questions:
- what does it mean if you call 'mfile.begin()' a second time?
- what are the semantics if you copy an iteraror? How does using
one affect the other
Matlab_file iterator cur = mfile.begin();
Matlab_file iterator copy = begin;
++cur;
Is 'copy' unchanged, updated to reflect the new state of the file, or
invalidated? This needs to be documented.
I suspect the answer is "invalidated", which may be unintuitive to the
user. For example, if the iterator is inadvertently copied to call
a function:
void
process(Matlab_file::iterator cur)
{
cur.read(...);
}
while (cur != end)
{
process(cur);
++cur;
}
The changes to iterator state made inside of 'process' (reading the
file, which sets the 'read_data_' flag) will invalidate 'cur'.
While this can be avoided by requiring the user to pass the
iterator by reference:
void
process(Matlab_file::iterator& cur)
{
cur.read(...);
}
It would be better avoid this altogether.
One way to do this is to have the Matlab_file object maintain the
state, and have the iterators refer to the Matlab_file object, along
with an indication of whether they are the special 'end' iterator.
> +
> + bool is_eof() { return end_of_file_; }
> + void set_endian(bool swap_bytes)
> + { view_header_.swap_bytes = swap_bytes;}
[6] This shouldn't be called 'set_endian' because it is not actually
setting the endianness. 'set_swap_bytes' would be a better name.
> + void set_length(uint32_t length) { length_ = length; }
> + void read_header() { (*ifs_) >> view_header_; }
> + std::istream *get_stream() { return ifs_; }
> +
> + // operators
> + public:
> + iterator& operator++()
> + {
> + if(!read_data_)
> + {
> + // advance file pointer to next header
> + // make sure that we don't go beyond the end of file!
> + if(view_header_.next_header >= length_)
> + end_of_file_ = true;
> + else
> + ifs_->seekg(view_header_.next_header);
> +
> + }
> + if(!end_of_file_) // read next header
> + (*ifs_) >> view_header_;
> +
> + read_data_ = false;
> + return *this;
> + }
> +
> + bool operator==(iterator &i1)
> + {
[7] Since this is a member function of class iterator, it can see the
members of both 'this' and 'i1'. Hence you can do:
return i1.end_of_file == this->end_of_file_;
Which removes the need for the 'is_eof()' member function.
> + return i1.is_eof() == end_of_file_;
> + }
> +
> + bool operator!=(iterator &i1)
> + {
> + return i1.is_eof() != end_of_file_;
> + }
> +
> + Matlab_view_header*
> + operator*()
> + {
> + return &view_header_;
> + }
> +
> +
> + private:
> + Matlab_view_header view_header_;
> + std::ifstream *ifs_;
> + bool end_of_file_;
> + bool read_data_;
> + uint32_t length_;
> + };
> +
> + public:
> + // iterator functions
> + iterator begin() { return begin_iterator_; };
> + iterator end() { return end_iterator_; };
> +
[8] Should this be a member of 'Matlab_file' or of 'Matlab_file::iterator'?
Since the view header is accessed by dereferencing the iterator, making
the read function an iterator member would be more intuitive:
while (cur != end)
{
if (*cur.header->num_dims == 1 &&
*cur.header->class_type == ... float...)
{
length_type size = *cur.header->dims[0];
Vector<float> vec(size);
cur.read(vec);
// process data in 'vec'
}
++cur;
}
> + // read a view from a matlab file after reading the header
> + template <typename T,
> + typename Block0,
> + template <typename,typename> class View>
> + void read_view(View<T,Block0> view, iterator &iter);
> +
> + private:
> + Matlab_bin_hdr matlab_header_;
> + std::ifstream is_;
> + iterator begin_iterator_;
> + iterator end_iterator_;
> +
> +
> +};
> +
> +
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
More information about the vsipl++
mailing list