Wednesday, May 14, 2014

Building custom iterators

Once one realizes that STL algorithms open the doors for expression, readability, maintenance and reusability, he wants to have it for the classes he creates.

Of course, you shouldn't expose the internals of your class, so you need a way to offer exploration of your object's data, without exposing implementation details, like the underlying data structures:

The quick-and-dirty way is to  provide algorithms for your class.

class Polygon {
        vector<Point> vertices;
    public:
        template <class Func>
        Func for_each_vertex ( Func f );  // access by vertex

        template <class Func>
        Func for_each_edge ( Func f );  // access by edge
};


However, this gives you limited expression. You soon have to write versions for find(), find_if(), copy(), copy_if(), transform(), count_if(), all_of(), any_of(), none_of(), and you end up replicating part of STL algorithms for each class, for each access type (by vertex, by edge).

Read this article, to see how you can implement this interface:

class Polygon {
        vector<Point> vertices;
    public:
        vertex_iterator begin_vertex();
        vertex_iterator end_vertex();

        edge_iterator begin_edge();
        edge_iterator end_edge();
};


Offering a vertex_iterator is very easy, because it is the same as giving an iterator to the underlying structure, vector<Point>.

To do this, you simply use typedef :

class Polygon {
        vector<Point> vertices;
    public:
        typedef vector<Point>::iterator vertex_iterator;

        vertex_iterator begin_vertex() { return vertices.begin(); }
        vertex_iterator end_vertex()   { return vertices.end(); }
        ...
};


To export edge_iterator though, you have to provide a custom iterator. This is because there is no direct representation of edges inside Polygon.

How do I create a custom iterator?

First, you have to decide how to define the state of your iterator, and how to define begin and end conditions.
In our case, it looks like we can always get an edge if we have an iterator to the first point of the edge. Thus, the state of the edge iterator could be an iterator to the vector<Point>.
Another peculiarity of the edge_iterator, is that the last edge should be composed of the last point and the first point of the vector.
In order for the edge_iterator to have access to first and last point, we need a pointer to the container, too:

class edge_iterator {
    vector<Point>::iterator m_curr_iter; // the state
    vector<Point> *m_container;
};

Then, you have to decide what the contents of the iterator will be. In other words, you have to decide on the value type, when the iterator is dereferenced

In our case, the dereference operator of the iterator should return an Edge. Through that Edge, we should be able to read and write the Points of the polygon. This is why Edge is composed of references to the Points of the Polygon.

struct Edge {
    Point &first;
    Point &second;

    Edge (Point &a, Point &b) : first(a), second(b) { } // convenience constructor
};

The dereference operator is thus written like:

Edge edge_iterator::operator * () {
    if (std::next(m_curr_iter) == m_container->end()) {
        return Edge (*m_curr_iter, *m_container->begin());  // wrap around
    } else {
        return Edge (*m_curr_iter, *std::next(m_curr_iter)); // normal case
    }
}

How many functions must a custom iterator implement ?

  • iterators should be constructed
  • iterators should be copyable
  • iterators should be assignable
  • iterators should increment (and decrement if possible)
  • iterators should dereference
You can also see a complete list here.

This requirements translate to this interface:

class edge_iterator {
    vector<Point>::iterator m_curr_iter; // the state
    vector<Point> *m_container;
public:
    edge_iterator (vector<Point>::iterator pos, vector<Point> &container);    // ctor
    edge_iterator (const edge_iterator &); // copy ctor

    edge_iterator& operator= (const edge_iterator& other);  // assignment

    edge_iterator& operator++ ();   // pre-increment
    edge_iterator operator++ (int); // post-increment

    Edge    operator* ();  // dereference eg: (*it).first
    EdgePtr operator-> (); // dereference eg: it->first
};

I have ommitted the const counterparts of operator* and operator-> , for clarity.

Let's start the implementation:

edge_iterator::edge_iterator (vector<Point>::iterator pos,
                              vector<Point> &container) 
        : m_curr_iter(pos), 
          m_container(&container)
{
}

Αlthough I need a reference to the container to construct the edge_iterator, I store the pointer of the container
  • I need a reference as the constructor parameter, because I want to make sure that I have a container.
  • I need to store a pointer, otherwise the assignment operator would not compile.

edge_iterator& edge_iterator::operator= (const edge_iterator& rhs)
{
    m_curr_iter = rhs.m_curr_iter;
    m_container = rhs.m_container; // pointer, because ref cannot be copied
    return *this;
}

The increment operators should be distinguished from each other. Beware that the post increment returns a copy of itself, and then increments itself.

edge_iterator& edge_iterator::operator++ ()    {  // pre-increment
{
    ++m_curr_iter;
    return *this;
}

edge_iterator  edge_iterator::operator++ (int) { // post-increment
{
    vector<Point>::iterator ret = m_curr_iter;
    ++(*this);
    return ret;
}


A tricky piece was the arrow-dereference operator:

EdgePtr edge_iterator::operator-> () {
    if (std::next(m_curr_iter) == m_container->end()) {
        return EdgePtr (*m_curr_iter, *m_container->begin());  // wrap around
    } else {
        return EdgePtr (*m_curr_iter, *std::next(m_curr_iter)); // normal case
    }
}

Arrow-dereference operators generally require that you pass a pointer to the data. But there is no Edge stored anywhere, so what pointer should I return? Edge is just a temporary object.
I therefore created a fake pointer, or rather an object that will behave like a pointer. Here it is:

struct EdgePtr {
    Edge data;

    EdgePtr (Point &a, Point &b) : data(a, b) { }

    Edge &operator* () { return e; }
    Edge *operator->() { return &e; }
};

This object acts like a pointer, but it is not. It is just the same as the data it points to!

Is that all ?


The interface I have provided only allows the iterator to increment by one.
I could have provided decrement operators so that you can iterate in a reverse manner.
I could have also provided operator+= and operator-= so that it behaves like a random access iterator.

You can also provide iterator_traits, which are just a bunch of typedefs (no memory overhead), so that some algorithms can make use of the extra information of the traits.

Finally


To be complete, the interface could be extended to look like so:

class edge_iterator : std::iterator<random_access_iterator_tag,
                                    Edge,     // `value_type`
                                    vector<Point>::iterator:: difference_type,
                                    EdgePtr,  // `pointer`
                                    Edge>     // `reference`
{
    vector<Point>::iterator m_curr_iter; // the state
    vector<Point> *m_container;

public:
    edge_iterator (vector<Point>::iterator pos, vector<Point> &container);    // ctor
    edge_iterator (const edge_iterator &); // copy ctor

    edge_iterator& operator= (const edge_iterator& other);  // assignment
    edge_iterator& operator++ ();   // pre-increment
    edge_iterator operator++ (int); // post-increment

    reference operator* ();  // dereference eg: (*it).first
    pointer   operator-> (); // dereference eg: it->first

    edge_iterator& operator-- ();   // pre-decrement
    edge_iterator operator-- (int); // post-decrement

    edge_iterator& operator+= (const difference_type &);
    edge_iterator& operator-= (
const difference_type &);

    edge_iterator operator+ (const difference_type &);
    edge_iterator operator- (
const difference_type &);

    friend bool operator<  (const edge_iterator& rhs);
    friend bool operator>  (const edge_iterator& rhs);
    friend bool operator<= (const edge_iterator& rhs);
    friend bool operator>= (const edge_iterator& rhs);

};

The extra lines are necessary to allow the iterator to behave like a random_access_iterator.
reference, pointer, and difference_type 
are implicitly defined by inheriting from std::iterator.

No comments:

Post a Comment