Iterators in PHP5
February 25th, 2005 • PHP
Marco Tabini, head of PHP|architecture magazine, was kind enough to allow me to publish my article from the January issue, although the contract buys all publishing rights. The article is about iterators in PHP5, I thought it’s a topic that should be discussed in a little more detail, since first I like SPL so much, second, I get a lot of questions about it. Oh, and by the way, if you haven’t subscribed to php|a yet, you should put it on your top-priority list.
Well, after proof-reading the article, and (hopefully) getting rid of all the typos and grammatical mistakes, here’s my latest and greatest article; I hope you find it useful.
Blurb (as seen on php|a)
When PHP5 came out, a buzz started about Design Patterns and their application, how to use them, what exactly are “patterns”, and why they are so important. Today, we take a look at one of the famous patterns, the Iterator.
Overview
So what exactly are design patterns? Most of us may have already heard of words like Singleton, Observer, Factory, and others. These are only fancy names for widely accepted solutions to problems that face developers over and over again. Design patterns only outline solutions, they don’t discuss implementation details, or language features, they don’t even have to be object-oriented (although most of them are).
The Iterator Pattern, which is the subject of this article, is one of the most widely used patterns. In theory—and this is where all the intimidating confusion comes from—it’s a standard mechanism to access elements of an object sequentially, without having to expose the underlying representation.
Definitions like the one above can be difficult to grasp at first, it cost me some long and tedious conversations to get the idea behind certain patterns; but enough theory, let’s jump into practice, shall we?
The Iterator
Loops are so common that most of us don’t think about it, we loop over arrays, over a database result, over files in a directory, but no two of those them look the same (see Listing 1). We need to have an agreement on how to loop over collections and data structures; this is where the Iterator Pattern comes to rescue.
<?php
/* Looping over an array */
foreach ( $array as $item ) {
// Do something with $item
}
/* Looping over a MySQL result set */
while ( $row = mysql_fetch_array($result) ) {
// Do something with $row
}
/* Looping over files in a directory */
while ( false !== ($file = readdir($dir)) ) {
// Do something with $file
}
?>
An Iterator is an object that knows how to traverse data structures, from simple arrays to complex N-trees. It hides the implementation details and provides a standard interface for us to use, typically having at least one method to retrieve the next element (that is, moving in one direction). Other iterators might provide methods to retrieve elements in more than one direction, or have different ordering schemes.
Listing 2 shows an example interface that can be used as our iterator.
Having this interface available, we can now create classes that implement
SimpleIterator,
like a SimpleDirectoryIterator
(Listing 3) which we can use to loop over files in a directory.
<?php
interface SimpleIterator {
public function getNext();
public function reset();
}
?>
<?php
class SimpleDirectoryIterator implements SimpleIterator {
private $handle;
public function __construct($path) {
$this->dir = opendir($path);
}
public function getNext() {
return readdir($this->dir);
}
public function reset() {
rewinddir($this->dir);
}
public function __destruct() {
closedir($this->dir);
}
}
/* Usage: */
$dir = new SimpleDirectoryIterator('/tmp');
while ( $file = $dir->getNext() ) {
// Do something with $file
}
?>
"But why the hassle?" you may ask, it isn’t as if we "invented" something ground-breakin. Well, the idea is to make all loops look the same, no matter what we are looping over, be it an array, an XML collection, or LDAP listings.
The only problem is that if we, as developers, have to implement iterators ourselves (like what we just did), we’ll lose the whole idea of having an agreement. Different iterators shouldn’t look different. Fortunately, PHP5 already comes with a great extension that saves us the trouble, called Standard PHP Library (or SPL for the acronym-oriented).
SPL (Standard PHP Library)
The SPL extension didn’t get as much buzz as PHP5’s new XML and OOP features, but that didn’t stop it from making it to the final distribution, so you can almost always rely on SPL being available.
This collection of interfaces and classes gives PHP developers a chance
to finally agree on something, and stop reinventing the wheel every
other project. It’s currently focused on solving one problem—Iterators.
Only SPL does it with a twist by letting us use iterators
in plain ol’ foreach
loops, so we can stop worrying about what goes where.
The code in Listing 3 shows how we would regularly code to loop over a
directory using SimpleIterator.
It might look OK in other languages, but to those of us with a strong
PHP background, the code isn’t really natural.
With SPL’s DirectoryIterator
class we can loop over the same directory as if it was an array,
plain ol’ foreach (see Listing 4). Essentially, a directory
is an array of files names, so why not make it act like one?
And if we could make DirectoryIterator
act like an array, why not make all iterators act the same?
That’s the idea behind SPL.
<?php
$dir = new DirectoryIterator('/tmp');
foreach ( $dir as $file ) {
// Do something with $file
}
?>
SPL’s abilities don’t stop at making iterators available for
foreach constructs, they go as far as chaining multiple
iterators together, limiting results, filtering unwanted ones, iterate
over recursive structures, even implement our own iterators (see the
reference at the end for a complete overview).
Implementing iterators isn’t a difficult task at all, there’s a
limited set of methods that SPL interfaces define: implement those
and you end up with an iterator that you can plug into foreach.
Listing 5 is an example of how we would implement
SmartDirectoryIterator
which tries to be a little smarter (as the name implies) and stores
more information about files in each iteration, such as its size,
last modification date, etc. The implementation isn’t perfect though, it’s
only an example of SPL’s possibilities and how you can make it work
for you.
<?php
class SmartDirectoryIterator implements Iterator {
protected $key;
protected $current;
protected $valid;
protected $path;
protected $handle;
public function __construct($path) {
$this->handle = opendir($path);
$this->path = $path;
}
private function getFile() {
if ( false !== ($file = readdir($this->handle)) ) {
$path = $this->path . $file;
$this->current['name'] = $file;
$this->current['size'] = filesize($path);
$this->current['modified'] = filemtime($path);
return true;
} else {
return false;
}
}
public function next() {
$this->valid = $this->getFile();
$this->key++;
}
public function rewind() {
$this->key = 0;
rewinddir($this->handle);
$this->valid = $this->getFile();
}
public function valid() {
return $this->valid;
}
public function key() {
return $this->key;
}
public function current() {
return $this->current;
}
public function __destruct( ) {
closedir($this->handle);
}
}
/* Usage: */
$dir = new SmartDirectoryIterator('/tmp/');
foreach ( $dir as $file ) {
echo "$file[name] ($file[size] bytes)\n";
}
?>
Feeling adventurous yet? Let’s take it one step further then.
Filtering the Unwanted
One of SPL’s important
features is the ability to filter results based on chosen set of criteria.
Of course, we don’t write rules to filter such results, but we create
a class that extends the built-in FilterIterator,
and define only one method named accept(),
now it takes care of "making" the rules and deciding
whether to accept a certain element or not.
Lets say we want our directory iterator to drop out all hidden files
(files starting with a dot in *NIX), or maybe drop out certain
extensions, or how about files with obscene language. With SPL,
it’s as simple as an if check.
Listing 6 illustrates how this can be done. ExtensionFilter
is a class that accepts two arguments when instantiated, a
DirectoryIterator
object and the extension to be filtered. It calls its parent’s
constructor (FilterIterator::__construct)
to make sure everything works fine, and then implements the accept()
method which decides which files are to be dropped out.
<?php
class ExtensionFilter extends FilterIterator {
private $ext;
private $it;
public function __construct(DirectoryIterator $it, $ext) {
parent::__construct($it);
$this->it = $it;
$this->ext = $ext;
}
public function accept() {
if ( ! $this->it->isDir() ) {
$ext = array_pop(explode('.', $this->current()));
return $ext != $this->ext;
}
return true;
}
}
$filtered = new ExtensionFilter(
new DirectoryIterator('/tmp'), 'bz2');
foreach ( $filtered as $file ) {
// Do something with $file
}
?>
On every iteration, accept() checks for unwanted file
extensions, and tells the iterator not to accept a file with a match,
otherwise the file is passes through. Now wasn’t that easy?
Limiting Results
So now we got rid of
unwanted files, but what if there’s just too many of them? If you
ever wished for an SQL-like LIMIT
in PHP loops, you’d be quite happy with LimitIterator.
You don’t need to extend it in order to use it, just pass it an
iterator, a starting position and a number of elements to return,
that’s it; LimitIterator
takes care of the rest. Of course you can also "pipe" it
through another a FilterIterator
that’s passed through a FilterIterator
that’s passed through… you get the idea. Listing 7 is an example of
such usage: it gets the first 10 files and then filters ones with bz2
extension.
<?php
$iterator = new ExtensionFilter(
new LimitIterator(
new DirectoryIterator('/tmp'),
0, 10),
'bz2'
);
foreach ( $iterator as $file ) {
// Do something with file
}
?>
Recurse Thyself
Not everyone likes recursion, it can be cumbersome to code recursive functions without extra care for terminating conditions. For instance, a common recursive function that every developer has somewhere in his library is a reader that lists all files in a directory and in its subdirectories. Without SPL we would have to write code similar to that in Listing 8: As you can see, it’s not the most intuitive code you can come across, and it takes a good eye to spot the recursive call in the middle.
<?php
function recurse_dir($path) {
static $depth = 0;
$dir = opendir($path);
while ( false !== ($file = readdir($dir)) ) {
if ( $file == '.' || $file == '..' ) {
continue;
}
echo str_repeat("--", $depth) . " $file\n";
if ( is_dir("$path/$file") ) {
$depth++;
recurse_dir("$path/$file");
$depth--;
}
}
closedir($dir);
}
recurse_dir('/tmp');
?>
Luckily for us, SPL comes with built-in recursive iterators to simplify this
kind of tasks. The RecursiveDirectoryIterator
can be used to accomplish the same task, with less and better-looking
code. Of course, it can’t operate on its own (technically it can, but
not recursively), that’s why there’s a RecursiveIteratorIterator,
you’ve read it right, it’s really called that.
RecursiveIteratorIterator
is essentially an Iterator
except it knows how to iterate over classes that implement
RecursiveIterator,
like our RecursiveDirectoryIterator.
The code in Listing 9 outputs the same listing as the non-Iterator code in Listing 8, except that it’s reduced to only three lines. I can already hear a "Wow. But hey, how does the magic happen?". Let’s a take a look under the hood.
<?php
$dir = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator('/tmp'), true);
foreach ( $dir as $file ) {
echo str_repeat("--", $dir->getDepth()) . " $file\n";
}
?>
Implementing Recursive Iterators
At first glance, it might
seem difficult to implement the RecursiveIterator
interface. I can’t exactly say it’s a no-brainer, it does need some
careful coding, but in practice, there’s really not much to it.
By definition, RecursiveIterator
is an Iterator,
it only adds two additional methods, hasChildren()
and getChildren().
These methods are used by the RecursiveIteratorIterator
to figure out what to do next. If an $object->hasChildren()
then the iterator loops over them and calls the same method for each
of those children, moving deeper until it has reached a child with no
children (a leaf), then it moves to the next child on the previous
level (or depth), and goes on like this.
So we’re going to implement a recursive directory iterator. In order to
do that we need a flat directory iterator first, one that doesn’t go
into subdirectories. Let’s call it DirIterator
to avoid name-clashing with DirectoryIterator.
There are five methods that we should implement in order to make it a
"real" iterator. rewind()
which resets the iterator to the beginning, next()
which moves the iterator to the next item, key()
returns the key (usually just an index) of the current item,
current()
which returns the actual item, and valid()
which tells whether there are any left items in the list. Listing 10
is an implementation of DirIterator,
which is used like all our iterators before, inside a foreach
loop; but, it still doesn’t quite do what we want it to do.
<?php
class DirIterator implements Iterator {
protected $dir;
protected $key;
protected $file;
protected $valid;
public function __construct($path) {
$this->dir = opendir($path);
}
public function __destruct() {
closedir($this->dir);
}
public function rewind() {
$this->key = 0;
rewinddir($this->dir);
$this->next();
}
public function next() {
$this->key++;
$this->valid = false !== ($this->file = readdir($this->dir));
}
public function key() {
return $this->key;
}
public function current() {
return $this->file;
}
public function valid() {
return $this->valid;
}
}
/* Usage: */
$dir = new DirIterator('/tmp');
foreach ( $dir as $file ) {
// Do something with file
}
?>
Now it’s time for us to make it recurse. Since a RecursiveIterator
is already an Iterator we
don’t need to redo the work, by simply extending DirIterator
and adding hasChildren()
and getChildren()
methods, we’ll get our own RecursiveDirIterator.
In Listing 11, RecursiveDirIterator::hasChildren()
returns true only if a file is a directory and isn’t a dot ("."
or ".."), otherwise PHP will get upset and tries to open
the current directory times and times until it dies with an stack-overflow error;
endless recursion. Now when RecusriveDirIterator
says it hasChildren(),
getChildren()
is called and it returns a new instance of RecusriveDirIterator,
a child iterator, if you want, only this iterator’s path is a
subdirectory. Think of it as calling a function within itself, except
this time it’s much easier to code.
<?php
class RecursiveDirIterator extends DirIterator
implements RecursiveIterator {
protected $path;
public function __construct($path) {
parent::__construct($path);
$this->path = $path;
}
public function hasChildren() {
return ( is_dir($this->path . '/' . $this->file) &&
!($this->file == '.' || $this->file == '..'));
}
public function getChildren() {
return new RecursiveDirIterator($this->path . '/'. $this->file);
}
}
/* Usage: */
$dir = new RecursiveIteratorIterator(
new RecursiveDirIterator('/tmp'), true);
foreach ( $dir as $file ) {
echo str_repeat("--", $dir->getDepth()) . " $file\n";
}
?>
Note that unlike recursive functions, we don’t need to keep track of the
current path, what we have read already, and so on.
RecursiveIteratorIterator takes care of that by itself.
Turning Objects into Iterators
One interesting interface in SPL is IteratorAggregate.
It only has one method to implement, getIterator(),
and once it’s implemented, any object can be "iteratable".
But why is this useful? After all, we can directly implement
Iterator and so far, we seem to have been able to get
along just fine with it. The problem is, in OOP role separation is important, not every object
is an iterator, but most objects can provide iterators for their
internal data. Take for example a class that represents a student’s profile
with some personal information (first name, last name, etc.). A student
isn’t an iterator (that is, not related with an is-a
relationship), but the information inside it can be iterated through, so we
should be able to get a list of a students information without literally
looping over the student himself (Listing 12).
<?php
class Student implements IteratorAggregate {
protected $info = array(
'first_name' => 'John',
'last_name' => 'Smith',
'email' => 'john@smith.com'
);
public function getIterator() {
return new ArrayObject($this->info);
}
}
$student = new Student;
foreach ( $student as $info => $value ) {
echo "$info: $value\n";
}
?>
Suppose that this information is stored in an associative array.
With IteratorAggregate available, all we need is to add
one method to the student class and let it return an iterator.
Now we can use an instance of Student in a foreach
loop without having to implement all five methods of Iterator,
implementing those is both logically and practically incorrect.
You’re probably wondering why did we use an ArrayObject
to wrap the the $info
array in getIterator().
By definition, SPL (and thus PHP) expects getIterator()
to return an Iterator
(or any class that implements it), it doesn’t care how it represents
its data, whether in an array, an XML files, or an SQL database, it
only cares what data is provided and it knows how to ask for it. Therefore,
SPL defines an ArrayObject
class (that implements Iterator)
which wraps around an array and is used in getIterator().
However, there’s a subtle difference here. On one hand, looping over an array
looks exactly the same as looping over an ArrayObject.
On the other hand, we’re technically looping over ArrayObject’s
properties and their values rather than array’s keys and values. Yes,
that’s right, SPL can take any object and loop over its properties in
a foreach loop.
An Iterator… Without an Iterator!
PHP5’s default behavior is
to list an object’s properties when you stick it into foreach,
that’s great stuff, but when PHP can’t access a property because it
isn’t public (private or protected) it just skips it.
This doesn’t really mean that we can’t turn that object into an iterator,
except we’ll have to do it from the inside. Take the code in Listing
13, the method showInfo()
uses $this
to loop over the object, naturally $this
has access to all the properties so it doesn’t skip any of them.
<?php
class Car {
protected $color = 'red';
protected $wheels = 4;
protected $brand = 'Audi';
public function showInfo() {
foreach ( $this as $property => $value ) {
echo "$property: $value\n";
}
}
}
$car = new Car;
$car->showInfo();
?>
One Big List
Suppose you have a list of
different iterators for different things, like a QueryIterator,
DirectoryIterator
and a SimpleXMLIterator—and maybe 5 or 10 more—and
we really need them all in one go. Now
this doesn’t mean that we need to write a loop foreach
and everyone of them (no pun intended), SPL provides us with an
AppendIterator
(a.k.a CombinedIterator).
AppendIterator
takes a list of iterators (or classes that implement any
Iterator) and loops over them as if they are one huge list, just call
append(), pass it an iterator and place an
AppendIterator in foreach,
it takes care of the rest.
How does it work? SPL has an interface called OuterIterator,
it’s used to wrap iterators, think RecursiveIteratorIterator
or LimitIterator, these classes wrap around existing iterators
and present certain functionality over them (like recursive iteration or limiting
results). We too can implement OuterIterators,
however we would rarely need that since SPL already does a fine job
in filling us with iterators. AppendIterator
is one of those OuterIterators,
it has all the basic methods (key(),
current(),
next()…)
in addition to getInnerIterator()
which returns the iterator that’s currently in use.
What really makes OuterIterator special is that each time
it moves from an inner iterator to another it starts acting like it, we
can call methods from the inner iterator and expect an overloaded
__call() to invoke the method corresponding to the inner
iterator.
A little warning though—AppendIterator
is still not released with the latest PHP5 distribution (5.0.3), so
you might have to wait for a while until the next release, or you can
checkout the latest SPL from CVS.
SimpleXML Iteration
PHP5 comes with SimpleXML, a library that simplifies almost all
XML-related operations, from loading, to parsing, to iteration.
However, SimpleXML’s native iteration isn’t handled by SPL, it’s
closer to the way our first SimpleIterator worked,
where a SimpleXML object simply provides methods for iteration.
SPL makes SimpleXML even simpler by having SimpleXMLIterator,
note that this iterator doesn’t come with SimpleXML, so if you
compile PHP with SimpleXML but without SPL you won’t be able to use
it.
SimpleXMLIterator is a recursive iterator, although you
don’t have to use it recursively (as in use it with a
RecursiveIteratorIterator) if you have a flat-listed XML
file. Take for example the XML cookbook in Listing 14, a list of recipes
with their names, ratings and ingredients. We can load the whole file
and pass it to an instance of SimpleXMLIterator
and it will take care of the rest (see Listing 15). There’s no need
for hair-pulling sessions trying to figure out how the SAX parser
works, what DOM is, and dealing with other fancy acronyms. This iterator
returns a SimpleXML object on each loop, making it more convenient to access
XML elements. As with all iterators, you can filter the results,
limit them by offset and length, display only parents, etc.
<?xml version="1.0"?>
<cookbook>
<recipe>
<name>Chocolate Muffins</name>
<rating>Yummy</rating>
<ingredients>Flour, Cocao, Sugar</ingredients>
</recipe>
<recipe>
<name>Caramel Cake</name>
<rating>Sweet</rating>
<ingredients>Caramel, Flour, lots of love</ingredients>
</recipe>
</cookbook>
<?php
$data = file_get_contents('cookbook.xml');
foreach ( new SimpleXMLIterator($data) as $recipe ) {
echo $recipe->name . "\n";
}
?>
Of course, you can pass a SimpleXMLIterator
to a RecursiveIteratorIterator,
but in most PHP applications I came across, XML was used to store simple
data like configuration files, so recursive iteration is probably not
what you’re looking for.
Reference
Here’s a list of the most important interfaces and classes available with SPL
Interfaces:
- Traversable: An internal interface that should not be implemented, it’s the base interface for all iterators.
- Iterator: Defines how to iterator forward over a collection.
- SeekableIterator: Same as Iterator except it can seek to a position in a collection. It can be used with LimitIterator to efficiently move to an offset.
- RecursiveIterator: Defines how to traverse tree-like collections.
- IteratorAggregate: Creates an external iterator for an object.
Outer iterators: These classes wrap around iterators and traverse them in different manners.
- RecursiveIteratorIterator: Takes a class the implements RecursiveIterator and loops over it going deeper with each child.
- FilterIterator:
Filters results based on
accept()criteria. - LimitIterator: Limits results based on an offset and a length.
- InfiniteIterator: Takes and iterator and traverses it forever!
- AppendIterator: Combines a number of iterators and and converts them to one big list.
- CachingIterator: Pre-fetches the next element in order to know if an iterator has more elements.
- ParentIterator: Filters out leafs (elements with no children) so that only parents are fetched.
- ArrayObject: Converts an array into an iterator object.
Useful Iterators: These classes are either built-in or come with SPL as examples in external files.
- DirectoryIterator:
Used to loop over files in a directory and provides some useful
methods like
isDot()andisFile(). - RecursiveDirectoryIterator: Same as above only traverses subdirectories.
- DirectoryTree: An iterator that doesn’t show "." and "..".
- FindFile: Base class for searching files in a certain path.
- RegexFindFile: Same as above except it takes a regular expression rather than a file name to search for.
- SimpleXMLIterator: As shown above, iterates over an XML tree returning SimpleXML objects.
Final Words
I’ve spent a couple of months making actual use of SPL, and I have to say I was impressed. There’s are a lot of libraries that can be converted into easier, understandable, and more importantly, maintainable classes using iterators. Knowing that I’m not limited with SPL, that I can create my own iterators and my colleagues don’t have to worry about it was a big advantage in terms of productivity.
I think SPL is a step forward for PHP, it makes real use of interfaces, defines certain standards that everyone should follow, and it states that PHP canbe a language for the enterprise. Thanks to Marcus Boerger for making it happen.
16 Responses (Add Your Comment)
-
-
I don’t think so, I read a lot of PHP5 code that can be converted to a much more readable code using SPL.
Thanks for reading this, I’m really glad you liked it.
Reply ↵ -
Teddy Apr 17, 2005at 10:43pm
SPL is one of the main reasons that I finnaly installed PHP5, so this article was perfect for me.
Reply ↵ -
stancell Apr 23, 2005at 5:31am
I think listing 7 has errors:
$iterator = new ExtensionFilter(
new LimitIterator(
new DirectoryIterator(‘/tmp’),
0, 10),
‘bz2′
);
should be:
$iterator = new LimitIterator(
new ExtensionFilter(
new DirectoryIterator(‘/tmp’),
‘bz2′),
0, 10);Reply ↵ -
Very nice article. I’ll add a link on the SPL pages as soon as i regenerate the docs & thanks.
Reply ↵ -
Thank you Marcus, I’m very glad you liked it.
Reply ↵ -
Maciek May 14, 2005at 11:56am
God Article, was looking for somthing like this for a couple of months now.
Thanks !!
Reply ↵ -
Joseph Crawford Jun 11, 2005at 12:41am
great article, very clear and concise.
thanks ;)
Reply ↵ -
Thank you thery match!!! This is a very very usefull article.
Reply ↵ -
Great article !
Reply ↵ -
nice article …. thanks, was looking for this
type from many daysReply ↵ -
Grant Parks Jun 21, 2006at 7:30am
THANKS! I’m just getting into PHP – working with a bunch of
experienced PHP people. I stumbled on SPL while working on
some little utilities of my own. One of my teammates saw my
code and asked what’s a DirectoryIteratorIterator? When I
told him about SPL, it sure made me look good!Reply ↵ -
Excellent. This article is a fine departure from the usual SPL articles which assume the reader is familiar with the concept of iterators (for instance from experience with the C++ STL) and really breaks it down with real world examples of usage.
Reply ↵ -
Greg Alexander Dec 3, 2006at 5:01am
This is great, I also found an excellent tutorial on SPL at http://phpro.org/tutorials/Introduction-to-SPL.html
These have been most helpfulReply ↵ -
Great stuff! While I was familiar with most of the things you explained here, it was nice to see an article that discussed most facets of SPL at once, and in depth.
Reply ↵ -
dr.colossos Sep 15, 2008at 11:26pm
A very very good article.
With the lack of documentation on that, I found this resource to be extremly helpful.
You should link to this on PHP.net, really!
So, more people can find your great work!
Reply ↵
Wow! Excellent article! Very useful; thank you for the introduction. Am I alone in not knowing the SPL existed? :)