Iterators in PHP5

Marco Tabini, head of PHP|architecture magazine, was kind enough to allow me to publish my article from the January issue, although the contract buys all publishing rights. The article is about iterators in PHP5, I thought it’s a topic that should be discussed in a little more detail, since first I like SPL so much, second, I get a lot of questions about it. Oh, and by the way, if you haven’t subscribed to php|a yet, you should put it on your top-priority list.

Well, after proof-reading the article, and (hopefully) getting rid of all the typos and grammatical mistakes, here’s my latest and greatest article; I hope you find it useful.

Blurb (as seen on php|a)

When PHP5 came out, a buzz started about Design Patterns and their application, how to use them, what exactly are “patterns”, and why they are so important. Today, we take a look at one of the famous patterns, the Iterator.

Overview

So what exactly are design patterns? Most of us may have already heard of words like Singleton, Observer, Factory, and others. These are only fancy names for widely accepted solutions to problems that face developers over and over again. Design patterns only outline solutions, they don’t discuss implementation details, or language features, they don’t even have to be object-oriented (although most of them are).

The Iterator Pattern, which is the subject of this article, is one of the most widely used patterns. In theory—and this is where all the intimidating confusion comes from—it’s a standard mechanism to access elements of an object sequentially, without having to expose the underlying representation.

Definitions like the one above can be difficult to grasp at first, it cost me some long and tedious conversations to get the idea behind certain patterns; but enough theory, let’s jump into practice, shall we?

The Iterator

Loops are so common that most of us don’t think about it, we loop over arrays, over a database result, over files in a directory, but no two of those them look the same (see Listing 1). We need to have an agreement on how to loop over collections and data structures; this is where the Iterator Pattern comes to rescue.

Listing 1

<?php

/* Looping over an array */
foreach ( $array as $item ) {
    // Do something with $item
}

/* Looping over a MySQL result set */
while ( $row = mysql_fetch_array($result) ) {
    // Do something with $row
}

/* Looping over files in a directory */
while ( false !== ($file = readdir($dir)) ) {
    // Do something with $file
}

?>

An Iterator is an object that knows how to traverse data structures, from simple arrays to complex N-trees. It hides the implementation details and provides a standard interface for us to use, typically having at least one method to retrieve the next element (that is, moving in one direction). Other iterators might provide methods to retrieve elements in more than one direction, or have different ordering schemes.

Listing 2 shows an example interface that can be used as our iterator. Having this interface available, we can now create classes that implement SimpleIterator, like a SimpleDirectoryIterator (Listing 3) which we can use to loop over files in a directory.

Listing 2

<?php
interface SimpleIterator {    
    public function getNext();    
    public function reset();
}
?>

Listing 3

<?php
class SimpleDirectoryIterator implements SimpleIterator {
    
    private $handle;
    
    public function __construct($path) {
        $this->dir = opendir($path);
    }
    
    public function getNext() {
        return readdir($this->dir);
    }
    
    public function reset() {
        rewinddir($this->dir);
    }
    
    public function __destruct() {
        closedir($this->dir);
    }
    
}

/* Usage: */
$dir = new SimpleDirectoryIterator('/tmp');
while ( $file = $dir->getNext() ) {
    // Do something with $file
}
?>

"But why the hassle?" you may ask, it isn’t as if we "invented" something ground-breakin. Well, the idea is to make all loops look the same, no matter what we are looping over, be it an array, an XML collection, or LDAP listings.

The only problem is that if we, as developers, have to implement iterators ourselves (like what we just did), we’ll lose the whole idea of having an agreement. Different iterators shouldn’t look different. Fortunately, PHP5 already comes with a great extension that saves us the trouble, called Standard PHP Library (or SPL for the acronym-oriented).

SPL (Standard PHP Library)

The SPL extension didn’t get as much buzz as PHP5’s new XML and OOP features, but that didn’t stop it from making it to the final distribution, so you can almost always rely on SPL being available.

This collection of interfaces and classes gives PHP developers a chance to finally agree on something, and stop reinventing the wheel every other project. It’s currently focused on solving one problem—Iterators. Only SPL does it with a twist by letting us use iterators in plain ol’ foreach loops, so we can stop worrying about what goes where.

The code in Listing 3 shows how we would regularly code to loop over a directory using SimpleIterator. It might look OK in other languages, but to those of us with a strong PHP background, the code isn’t really natural.

With SPL’s DirectoryIterator class we can loop over the same directory as if it was an array, plain ol’ foreach (see Listing 4). Essentially, a directory is an array of files names, so why not make it act like one? And if we could make DirectoryIterator act like an array, why not make all iterators act the same? That’s the idea behind SPL.

Listing 4

<?php
$dir = new DirectoryIterator('/tmp');
foreach ( $dir as $file ) {
    // Do something with $file
}
?>

SPL’s abilities don’t stop at making iterators available for foreach constructs, they go as far as chaining multiple iterators together, limiting results, filtering unwanted ones, iterate over recursive structures, even implement our own iterators (see the reference at the end for a complete overview).

Implementing iterators isn’t a difficult task at all, there’s a limited set of methods that SPL interfaces define: implement those and you end up with an iterator that you can plug into foreach. Listing 5 is an example of how we would implement SmartDirectoryIterator which tries to be a little smarter (as the name implies) and stores more information about files in each iteration, such as its size, last modification date, etc. The implementation isn’t perfect though, it’s only an example of SPL’s possibilities and how you can make it work for you.

Listing 5

<?php

class SmartDirectoryIterator implements Iterator {
    
    protected $key;
    protected $current;
    protected $valid;
    
    protected $path;
    protected $handle;

    public function __construct($path) {
        $this->handle = opendir($path);
        $this->path = $path;
    }

    private function getFile() {
        if ( false !== ($file = readdir($this->handle)) ) {
            $path = $this->path . $file;
            $this->current['name'] = $file;
            $this->current['size'] = filesize($path);
            $this->current['modified'] = filemtime($path);
            return true;
        } else {
            return false;
        }
    }

    public function next() {
        $this->valid = $this->getFile();
        $this->key++;
    }

    public function rewind() {
        $this->key = 0;
        rewinddir($this->handle);
        $this->valid = $this->getFile();
    }

    public function valid() {
        return $this->valid;
    }

    public function key() {
        return $this->key;
    }

    public function current() {
        return $this->current;
    }

    public function __destruct( ) {
        closedir($this->handle);
    }

}

/* Usage: */

$dir = new SmartDirectoryIterator('/tmp/');
foreach ( $dir as $file ) {
    echo "$file[name] ($file[size] bytes)\n";
}

?>

Feeling adventurous yet? Let’s take it one step further then.

Filtering the Unwanted

One of SPL’s important features is the ability to filter results based on chosen set of criteria. Of course, we don’t write rules to filter such results, but we create a class that extends the built-in FilterIterator, and define only one method named accept(), now it takes care of "making" the rules and deciding whether to accept a certain element or not.

Lets say we want our directory iterator to drop out all hidden files (files starting with a dot in *NIX), or maybe drop out certain extensions, or how about files with obscene language. With SPL, it’s as simple as an if check.

Listing 6 illustrates how this can be done. ExtensionFilter is a class that accepts two arguments when instantiated, a DirectoryIterator object and the extension to be filtered. It calls its parent’s constructor (FilterIterator::__construct) to make sure everything works fine, and then implements the accept() method which decides which files are to be dropped out.

Listing 6

<?php

class ExtensionFilter extends FilterIterator {
    
    private $ext;
    private $it;
    
    public function __construct(DirectoryIterator $it, $ext) {
        parent::__construct($it);
        $this->it = $it;
        $this->ext = $ext;
    }
    
    public function accept() {
        if ( ! $this->it->isDir() ) {
            $ext = array_pop(explode('.', $this->current()));
            return $ext != $this->ext;
        }
        return true;
    }
}

$filtered = new ExtensionFilter(
                new DirectoryIterator('/tmp'), 'bz2');

foreach ( $filtered as $file ) {
    // Do something with $file
}

?>

On every iteration, accept() checks for unwanted file extensions, and tells the iterator not to accept a file with a match, otherwise the file is passes through. Now wasn’t that easy?

Limiting Results

So now we got rid of unwanted files, but what if there’s just too many of them? If you ever wished for an SQL-like LIMIT in PHP loops, you’d be quite happy with LimitIterator. You don’t need to extend it in order to use it, just pass it an iterator, a starting position and a number of elements to return, that’s it; LimitIterator takes care of the rest. Of course you can also "pipe" it through another a FilterIterator that’s passed through a FilterIterator that’s passed through… you get the idea. Listing 7 is an example of such usage: it gets the first 10 files and then filters ones with bz2 extension.

Listing 7

<?php

$iterator = new ExtensionFilter(
                new LimitIterator(
                    new DirectoryIterator('/tmp'),
                    0, 10),
                'bz2'
            );

foreach ( $iterator as $file ) {
    // Do something with file
}

?>

Recurse Thyself

Not everyone likes recursion, it can be cumbersome to code recursive functions without extra care for terminating conditions. For instance, a common recursive function that every developer has somewhere in his library is a reader that lists all files in a directory and in its subdirectories. Without SPL we would have to write code similar to that in Listing 8: As you can see, it’s not the most intuitive code you can come across, and it takes a good eye to spot the recursive call in the middle.

Listing 8

<?php

function recurse_dir($path) {
    static $depth = 0;
    
    $dir = opendir($path);
    while ( false !== ($file = readdir($dir)) ) {
        if ( $file == '.' || $file == '..' ) {
            continue;
        }
        echo str_repeat("--", $depth) . " $file\n";
        
        if ( is_dir("$path/$file") ) {
            $depth++;
            recurse_dir("$path/$file");
            $depth--;
        }            
    }
    closedir($dir);
}

recurse_dir('/tmp');

?>

Luckily for us, SPL comes with built-in recursive iterators to simplify this kind of tasks. The RecursiveDirectoryIterator can be used to accomplish the same task, with less and better-looking code. Of course, it can’t operate on its own (technically it can, but not recursively), that’s why there’s a RecursiveIteratorIterator, you’ve read it right, it’s really called that.

RecursiveIteratorIterator is essentially an Iterator except it knows how to iterate over classes that implement RecursiveIterator, like our RecursiveDirectoryIterator.

The code in Listing 9 outputs the same listing as the non-Iterator code in Listing 8, except that it’s reduced to only three lines. I can already hear a "Wow. But hey, how does the magic happen?". Let’s a take a look under the hood.

Listing 9

<?php

$dir = new RecursiveIteratorIterator(
           new RecursiveDirectoryIterator('/tmp'), true);

foreach ( $dir as $file ) {
    echo str_repeat("--", $dir->getDepth()) . " $file\n";
}

?>

Implementing Recursive Iterators

At first glance, it might seem difficult to implement the RecursiveIterator interface. I can’t exactly say it’s a no-brainer, it does need some careful coding, but in practice, there’s really not much to it.

By definition, RecursiveIterator is an Iterator, it only adds two additional methods, hasChildren() and getChildren(). These methods are used by the RecursiveIteratorIterator to figure out what to do next. If an $object->hasChildren() then the iterator loops over them and calls the same method for each of those children, moving deeper until it has reached a child with no children (a leaf), then it moves to the next child on the previous level (or depth), and goes on like this.

So we’re going to implement a recursive directory iterator. In order to do that we need a flat directory iterator first, one that doesn’t go into subdirectories. Let’s call it DirIterator to avoid name-clashing with DirectoryIterator. There are five methods that we should implement in order to make it a "real" iterator. rewind() which resets the iterator to the beginning, next() which moves the iterator to the next item, key() returns the key (usually just an index) of the current item, current() which returns the actual item, and valid() which tells whether there are any left items in the list. Listing 10 is an implementation of DirIterator, which is used like all our iterators before, inside a foreach loop; but, it still doesn’t quite do what we want it to do.

Listing 10

<?php

class DirIterator implements Iterator {
    
    protected $dir;
    
    protected $key;
    protected $file;
    protected $valid;
        
    public function __construct($path) {
        $this->dir = opendir($path);
    }
    
    public function __destruct() {
        closedir($this->dir);        
    }
    
    public function rewind() {
        $this->key = 0;
        rewinddir($this->dir);
        $this->next();
    }
    
    public function next() {
        $this->key++;
        $this->valid = false !== ($this->file = readdir($this->dir));
    }
    
    public function key() {
        return $this->key;
    }
    
    public function current() {
        return $this->file;
    }
    
    public function valid() {
        return $this->valid;
    }
    
}

/* Usage: */
$dir = new DirIterator('/tmp');
foreach ( $dir as $file ) {
    // Do something with file
}

?>

Now it’s time for us to make it recurse. Since a RecursiveIterator is already an Iterator we don’t need to redo the work, by simply extending DirIterator and adding hasChildren() and getChildren() methods, we’ll get our own RecursiveDirIterator. In Listing 11, RecursiveDirIterator::hasChildren() returns true only if a file is a directory and isn’t a dot ("." or ".."), otherwise PHP will get upset and tries to open the current directory times and times until it dies with an stack-overflow error; endless recursion. Now when RecusriveDirIterator says it hasChildren(), getChildren() is called and it returns a new instance of RecusriveDirIterator, a child iterator, if you want, only this iterator’s path is a subdirectory. Think of it as calling a function within itself, except this time it’s much easier to code.

Listing 11

<?php

class RecursiveDirIterator extends DirIterator
                            implements RecursiveIterator {
    
    protected $path;
    
    public function __construct($path) {
        parent::__construct($path);
        $this->path = $path;
    }
    
    public function hasChildren() {
        return ( is_dir($this->path . '/' . $this->file) &&
                !($this->file == '.' || $this->file == '..'));        
    }
    
    public function getChildren() {
        return new RecursiveDirIterator($this->path . '/'. $this->file);
    }

}

/* Usage: */
$dir = new RecursiveIteratorIterator(
           new RecursiveDirIterator('/tmp'), true);
foreach ( $dir as $file ) {
    echo str_repeat("--", $dir->getDepth()) . " $file\n";
}

?>

Note that unlike recursive functions, we don’t need to keep track of the current path, what we have read already, and so on. RecursiveIteratorIterator takes care of that by itself.

Turning Objects into Iterators

One interesting interface in SPL is IteratorAggregate. It only has one method to implement, getIterator(), and once it’s implemented, any object can be "iteratable".

But why is this useful? After all, we can directly implement Iterator and so far, we seem to have been able to get along just fine with it. The problem is, in OOP role separation is important, not every object is an iterator, but most objects can provide iterators for their internal data. Take for example a class that represents a student’s profile with some personal information (first name, last name, etc.). A student isn’t an iterator (that is, not related with an is-a relationship), but the information inside it can be iterated through, so we should be able to get a list of a students information without literally looping over the student himself (Listing 12).

Listing 12

<?php

class Student implements IteratorAggregate {
    
    protected $info = array(
        'first_name' => 'John',
        'last_name' => 'Smith',
        'email' => 'john@smith.com'
    );
    
    public function getIterator() {
        return new ArrayObject($this->info);
    }    
    
}

$student = new Student;
foreach ( $student as $info => $value ) {
    echo "$info: $value\n";
}

?>

Suppose that this information is stored in an associative array. With IteratorAggregate available, all we need is to add one method to the student class and let it return an iterator.

Now we can use an instance of Student in a foreach loop without having to implement all five methods of Iterator, implementing those is both logically and practically incorrect.

You’re probably wondering why did we use an ArrayObject to wrap the the $info array in getIterator(). By definition, SPL (and thus PHP) expects getIterator() to return an Iterator (or any class that implements it), it doesn’t care how it represents its data, whether in an array, an XML files, or an SQL database, it only cares what data is provided and it knows how to ask for it. Therefore, SPL defines an ArrayObject class (that implements Iterator) which wraps around an array and is used in getIterator().

However, there’s a subtle difference here. On one hand, looping over an array looks exactly the same as looping over an ArrayObject. On the other hand, we’re technically looping over ArrayObject’s properties and their values rather than array’s keys and values. Yes, that’s right, SPL can take any object and loop over its properties in a foreach loop.

An Iterator… Without an Iterator!

PHP5’s default behavior is to list an object’s properties when you stick it into foreach, that’s great stuff, but when PHP can’t access a property because it isn’t public (private or protected) it just skips it.

This doesn’t really mean that we can’t turn that object into an iterator, except we’ll have to do it from the inside. Take the code in Listing 13, the method showInfo() uses $this to loop over the object, naturally $this has access to all the properties so it doesn’t skip any of them.

Listing 13

<?php

class Car {
    protected $color = 'red';
    protected $wheels = 4;
    protected $brand = 'Audi';
    
    public function showInfo() {
        foreach ( $this as $property => $value ) {
            echo "$property: $value\n";
        }
    }
            
}

$car = new Car;
$car->showInfo();

?>

One Big List

Suppose you have a list of different iterators for different things, like a QueryIterator, DirectoryIterator and a SimpleXMLIterator—and maybe 5 or 10 more—and we really need them all in one go. Now this doesn’t mean that we need to write a loop foreach and everyone of them (no pun intended), SPL provides us with an AppendIterator (a.k.a CombinedIterator).

AppendIterator takes a list of iterators (or classes that implement any Iterator) and loops over them as if they are one huge list, just call append(), pass it an iterator and place an AppendIterator in foreach, it takes care of the rest.

How does it work? SPL has an interface called OuterIterator, it’s used to wrap iterators, think RecursiveIteratorIterator or LimitIterator, these classes wrap around existing iterators and present certain functionality over them (like recursive iteration or limiting results). We too can implement OuterIterators, however we would rarely need that since SPL already does a fine job in filling us with iterators. AppendIterator is one of those OuterIterators, it has all the basic methods (key(), current(), next()…) in addition to getInnerIterator() which returns the iterator that’s currently in use.

What really makes OuterIterator special is that each time it moves from an inner iterator to another it starts acting like it, we can call methods from the inner iterator and expect an overloaded __call() to invoke the method corresponding to the inner iterator.

A little warning though—AppendIterator is still not released with the latest PHP5 distribution (5.0.3), so you might have to wait for a while until the next release, or you can checkout the latest SPL from CVS.

SimpleXML Iteration

PHP5 comes with SimpleXML, a library that simplifies almost all XML-related operations, from loading, to parsing, to iteration. However, SimpleXML’s native iteration isn’t handled by SPL, it’s closer to the way our first SimpleIterator worked, where a SimpleXML object simply provides methods for iteration.

SPL makes SimpleXML even simpler by having SimpleXMLIterator, note that this iterator doesn’t come with SimpleXML, so if you compile PHP with SimpleXML but without SPL you won’t be able to use it.

SimpleXMLIterator is a recursive iterator, although you don’t have to use it recursively (as in use it with a RecursiveIteratorIterator) if you have a flat-listed XML file. Take for example the XML cookbook in Listing 14, a list of recipes with their names, ratings and ingredients. We can load the whole file and pass it to an instance of SimpleXMLIterator and it will take care of the rest (see Listing 15). There’s no need for hair-pulling sessions trying to figure out how the SAX parser works, what DOM is, and dealing with other fancy acronyms. This iterator returns a SimpleXML object on each loop, making it more convenient to access XML elements. As with all iterators, you can filter the results, limit them by offset and length, display only parents, etc.

Listing 14

<?xml version="1.0"?>

<cookbook>

  <recipe>
    <name>Chocolate Muffins</name>
    <rating>Yummy</rating>
    <ingredients>Flour, Cocao, Sugar</ingredients>
  </recipe>
  
  <recipe>
    <name>Caramel Cake</name>
    <rating>Sweet</rating>
    <ingredients>Caramel, Flour, lots of love</ingredients>
  </recipe>

</cookbook>

Listing 15

<?php

$data = file_get_contents('cookbook.xml');

foreach ( new SimpleXMLIterator($data) as $recipe ) {
    echo $recipe->name . "\n";
}

?>

Of course, you can pass a SimpleXMLIterator to a RecursiveIteratorIterator, but in most PHP applications I came across, XML was used to store simple data like configuration files, so recursive iteration is probably not what you’re looking for.

Reference

Here’s a list of the most important interfaces and classes available with SPL

Interfaces:

  • Traversable: An internal interface that should not be implemented, it’s the base interface for all iterators.
  • Iterator: Defines how to iterator forward over a collection.
  • SeekableIterator: Same as Iterator except it can seek to a position in a collection. It can be used with LimitIterator to efficiently move to an offset.
  • RecursiveIterator: Defines how to traverse tree-like collections.
  • IteratorAggregate: Creates an external iterator for an object.

Outer iterators: These classes wrap around iterators and traverse them in different manners.

  • RecursiveIteratorIterator: Takes a class the implements RecursiveIterator and loops over it going deeper with each child.
  • FilterIterator: Filters results based on accept() criteria.
  • LimitIterator: Limits results based on an offset and a length.
  • InfiniteIterator: Takes and iterator and traverses it forever!
  • AppendIterator: Combines a number of iterators and and converts them to one big list.
  • CachingIterator: Pre-fetches the next element in order to know if an iterator has more elements.
  • ParentIterator: Filters out leafs (elements with no children) so that only parents are fetched.
  • ArrayObject: Converts an array into an iterator object.

Useful Iterators: These classes are either built-in or come with SPL as examples in external files.

  • DirectoryIterator: Used to loop over files in a directory and provides some useful methods like isDot() and isFile().
  • RecursiveDirectoryIterator: Same as above only traverses subdirectories.
  • DirectoryTree: An iterator that doesn’t show "." and "..".
  • FindFile: Base class for searching files in a certain path.
  • RegexFindFile: Same as above except it takes a regular expression rather than a file name to search for.
  • SimpleXMLIterator: As shown above, iterates over an XML tree returning SimpleXML objects.

Final Words

I’ve spent a couple of months making actual use of SPL, and I have to say I was impressed. There’s are a lot of libraries that can be converted into easier, understandable, and more importantly, maintainable classes using iterators. Knowing that I’m not limited with SPL, that I can create my own iterators and my colleagues don’t have to worry about it was a big advantage in terms of productivity.

I think SPL is a step forward for PHP, it makes real use of interfaces, defines certain standards that everyone should follow, and it states that PHP canbe a language for the enterprise. Thanks to Marcus Boerger for making it happen.



16 Responses (Add Your Comment)

  1. Wow! Excellent article! Very useful; thank you for the introduction. Am I alone in not knowing the SPL existed? :)

    Reply ↵
  2. I don’t think so, I read a lot of PHP5 code that can be converted to a much more readable code using SPL.

    Thanks for reading this, I’m really glad you liked it.

    Reply ↵
  3. SPL is one of the main reasons that I finnaly installed PHP5, so this article was perfect for me.

    Reply ↵
  4. I think listing 7 has errors:

    $iterator = new ExtensionFilter(
    new LimitIterator(
    new DirectoryIterator(‘/tmp’),
    0, 10),
    ‘bz2′
    );
    should be:
    $iterator = new LimitIterator(
    new ExtensionFilter(
    new DirectoryIterator(‘/tmp’),
    ‘bz2′),
    0, 10);

    Reply ↵
  5. Very nice article. I’ll add a link on the SPL pages as soon as i regenerate the docs & thanks.

    Reply ↵
  6. Thank you Marcus, I’m very glad you liked it.

    Reply ↵
  7. God Article, was looking for somthing like this for a couple of months now.

    Thanks !!

    Reply ↵
  8. Joseph Crawford Jun 11, 2005
    at 12:41am

    great article, very clear and concise.

    thanks ;)

    Reply ↵
  9. Thank you thery match!!! This is a very very usefull article.

    Reply ↵
  10. Great article !

    Reply ↵
  11. nice article …. thanks, was looking for this
    type from many days

    Reply ↵
  12. THANKS! I’m just getting into PHP – working with a bunch of
    experienced PHP people. I stumbled on SPL while working on
    some little utilities of my own. One of my teammates saw my
    code and asked what’s a DirectoryIteratorIterator? When I
    told him about SPL, it sure made me look good!

    Reply ↵
  13. Excellent. This article is a fine departure from the usual SPL articles which assume the reader is familiar with the concept of iterators (for instance from experience with the C++ STL) and really breaks it down with real world examples of usage.

    Reply ↵
  14. Greg Alexander Dec 3, 2006
    at 5:01am

    This is great, I also found an excellent tutorial on SPL at http://phpro.org/tutorials/Introduction-to-SPL.html
    These have been most helpful

    Reply ↵
  15. Great stuff! While I was familiar with most of the things you explained here, it was nice to see an article that discussed most facets of SPL at once, and in depth.

    Reply ↵
  16. A very very good article.

    With the lack of documentation on that, I found this resource to be extremly helpful.

    You should link to this on PHP.net, really!

    So, more people can find your great work!

    Reply ↵

Leave a Reply

Formatting: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Other Entries

Tweets from