2012-07-31

What Iterators Can Do For You

Basically, iterators provide a list interface for an object. Like all interfaces, they are a contract for how something can be used. If you use an interface, it is not relevant how it is implemented: the implementation logic is encapsulated.

It is of course relevant on the integration level. A bad implementation can hurt the performance of your application. Even a good implementation may need special resources (like a database). But none of this changes how you use it. Your code using the object with the Iterator interface stays the same.

Let's start with a simple example that outputs a list.

$lines = array(
  'line one', 'line two'
);

foreach ($lines as $key => $value) {
  echo $key.': '.$value."\n";
}


If we transfer this into an object, it would look like this:

class MyProjectLineOutput {
  private $_lines = NULL;

  public function __construct($lines) {
    $this->_lines = $lines;
  }

  public function __invoke() {
    foreach ($this->_lines as $key => $value) {
      echo $key.': '.$value."\n";
    }
  }
}

$output = new MyProjectLineOutput(
  array(
    'line one', 'line two'
  )
);
$output();


At first glance that looks like a lot more work, but it isn't. The code contains two tasks: getting the lines and outputting them. The class encapsulates the output task and makes it reusable. In this simple example that may look superfluous, but think a little larger, for example of outputting a select field or CSV.

Encapsulate file()


So far we have not used an Iterator, we have only encapsulated the output. PHP provides several default iterators, and one of them is the ArrayIterator.

$output = new MyProjectLineOutput(
  new ArrayIterator(
    array(
      'line one', 'line two'
    )
  )
);
$output();


The ArrayIterator simply takes an array and exposes it as an Iterator. It is mostly used to implement another interface, IteratorAggregate. Both the "Iterator" and the "IteratorAggregate" interfaces extend a common parent interface named "Traversable". You cannot implement "Traversable" directly, but you can use it to check whether an object is traversable, in other words whether it can be used with foreach.
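
As a minimal sketch of such a check: the helper collectLines() below is hypothetical and not part of the classes in this article. It accepts an array or anything that is Traversable and rejects everything else.

```php
<?php
// Hypothetical helper: accepts anything foreach can handle, rejects the rest.
function collectLines($lines) {
  if (!is_array($lines) && !($lines instanceof Traversable)) {
    throw new InvalidArgumentException('Expected an array or a Traversable.');
  }
  $result = array();
  foreach ($lines as $key => $value) {
    $result[$key] = $value;
  }
  return $result;
}

$data = array('line one', 'line two');

// An array, an Iterator and an IteratorAggregate all pass the check.
$fromArray = collectLines($data);
$fromIterator = collectLines(new ArrayIterator($data));
// ArrayObject is a built-in class that implements IteratorAggregate.
$fromAggregate = collectLines(new ArrayObject($data));

// A plain string is not traversable and is rejected.
try {
  collectLines('not a list');
  $rejected = FALSE;
} catch (InvalidArgumentException $e) {
  $rejected = TRUE;
}
```

The consumer never needs to know which of the three it received, which is the point of checking against Traversable instead of a concrete class.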

Now let's load the lines from a file.

class MyProjectFile implements IteratorAggregate {
  private $_file;

  public function __construct($file) {
    $this->_file = $file;
  }

  public function getIterator() {
    return new ArrayIterator(file($this->_file));
  }
}

$output = new MyProjectLineOutput(
  new MyProjectFile('sample.txt')
);
$output();


The main difference from calling file() directly is that the file is read later in the process. The foreach inside the MyProjectLineOutput::__invoke() method calls MyProjectFile::getIterator(), and only then is file() executed. Until then we can pass the instance of MyProjectFile around without loading the file into memory. Unlike a direct call to file(), we don't pass the concrete data around but the information about how it can be obtained.
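
The deferred loading can be made visible with a small sketch. The class LazyFileSketch is a hypothetical stand-in for MyProjectFile that counts how often file() runs; the temporary file and the return type declaration (PHP 7 or later) are assumptions of this example.

```php
<?php
// Hypothetical stand-in for MyProjectFile that records when file() runs.
class LazyFileSketch implements IteratorAggregate {
  public $loadCount = 0;
  private $_file;

  public function __construct($file) {
    $this->_file = $file;
  }

  // The return type needs PHP 7+; drop it on older versions.
  public function getIterator(): Traversable {
    $this->loadCount++;
    return new ArrayIterator(file($this->_file));
  }
}

// tempnam() creates an empty file and returns its path.
$path = tempnam(sys_get_temp_dir(), 'lines');
$lines = new LazyFileSketch($path);
$loadsBeforeLoop = $lines->loadCount; // nothing has been read so far

// The content is written after the object was created ...
file_put_contents($path, "line one\nline two\n");

// ... and is still picked up, because file() only runs once foreach starts.
$seen = array();
foreach ($lines as $key => $value) {
  $seen[$key] = $value;
}
unlink($path);
```

Note that every new foreach calls getIterator() again, so the file is read once per loop.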

Iterating A Text File


By implementing Iterator directly we can read the file line by line, so only a small part of it is in memory at any time.

class MyProjectFileUnbuffered implements Iterator {
  private $_file;
  private $_handle = NULL;
  private $_key = -1;
  private $_current = NULL;

  public function __construct($file) {
    $this->_file = $file;
  }

  public function __destruct() {
    if (is_resource($this->_handle)) {
      fclose($this->_handle);
    }
  }

  public function rewind() {
    if (!is_resource($this->_handle)) {
      $this->_handle = fopen($this->_file, 'r');
    } else {
      fseek($this->_handle, 0);
    }
    $this->_key = -1;
    $this->next();
  }

  public function next() {
    // Read after a rewind (key is -1) or while the last read succeeded.
    if ($this->_key < 0 or $this->_current !== FALSE) {
      $this->_current = fgets($this->_handle);
      $this->_key++;
    }
  }

  public function key() {
    return $this->_key;
  }

  public function current() {
    return $this->_current;
  }

  public function valid() {
    return $this->_current !== FALSE;
  }
}


This is more source than implementing it directly, mostly because of the class and function declarations. But you have to write this only once, and it can be improved or replaced without affecting the usage.
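
One possible replacement is the SplFileObject class that ships with PHP and is suggested in the comments below. It iterates a file line by line in much the same way. The following is a minimal sketch, not a drop-in replacement for the class above; the temporary file and the chosen flags are assumptions made for the example.

```php
<?php
// Temporary sample file, only here so the sketch runs on its own.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "line one\nline two\n");

$file = new SplFileObject($path, 'r');
// Drop the line endings and skip the empty pseudo-line after the last newline.
$file->setFlags(
  SplFileObject::DROP_NEW_LINE |
  SplFileObject::READ_AHEAD |
  SplFileObject::SKIP_EMPTY
);

$lines = array();
foreach ($file as $line) {
  $lines[] = $line;
}

$file = NULL; // release the file handle before deleting the file
unlink($path);
```

Because SplFileObject is an Iterator, it can be passed to any code that expects one.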

Map Elements

Using iterators you can encapsulate mapping actions, like using array_map() on an array, but the callback runs only for elements that are actually read. Let's say you need to chop all trailing whitespace from the lines.

Step One: Map Iterator


class MyProjectMapIterator implements OuterIterator {

  private $_innerIterator = NULL;
  private $_callback = NULL;

  public function __construct(Iterator $innerIterator, $callback) {
    $this->_innerIterator = $innerIterator;
    $this->_callback = $callback;
  }

  public function map($current, $key) {
    return call_user_func($this->_callback, $current, $key);
  }

  public function getInnerIterator() {
    return $this->_innerIterator;
  }

  public function rewind() {
    $this->getInnerIterator()->rewind();
  }

  public function next() {
    $this->getInnerIterator()->next();
  }

  public function key() {
    return $this->getInnerIterator()->key();
  }

  public function current() {
    return $this->map(
      $this->getInnerIterator()->current(),
      $this->getInnerIterator()->key()
    );
  }

  public function valid() {
    return $this->getInnerIterator()->valid();
  }
}


OuterIterator is an interface for iterators that wrap other iterators. It is defined in the SPL and extends the Iterator interface. The MapIterator is an iterator of that kind, so it is cleaner to implement it that way.

Step Two: Using The Map Iterator

Using the map iterator is not unlike using array_map.

$output = new MyProjectLineOutput(
  new MyProjectMapIterator(
    // The constructor requires an Iterator; MyProjectFile is an IteratorAggregate.
    new MyProjectFileUnbuffered('sample.txt'),
    function($current, $key) {
      return chop($current);
    }
  )
);
$output();


The file() function has an option for a similar task (FILE_IGNORE_NEW_LINES removes the line endings), but it is limited to exactly this task. With MapIterator you get a lot more flexibility. You could even extend the MapIterator to get reusable mappings.
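
To illustrate the difference, here is a small sketch that compares the file() flag with a chop() mapping on plain arrays. The temporary file and its content are made up for the example.

```php
<?php
// Temporary sample file; the first line has two trailing spaces.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "line one  \nline two\n");

// FILE_IGNORE_NEW_LINES removes only the line ending ...
$viaFlag = file($path, FILE_IGNORE_NEW_LINES);

// ... while chop() (an alias of rtrim()) removes all trailing whitespace.
$viaChop = array_map('chop', file($path));

unlink($path);
```

Here $viaFlag still contains the two trailing spaces of the first line, while $viaChop does not.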

Step Three: Extending the Map Iterator

class MyProjectMapIteratorUpper extends MyProjectMapIterator {

  public function __construct(Iterator $innerIterator) {
    parent::__construct(
      $innerIterator,
      function($current, $key) {
        return strToUpper($current);
      }
    );
  }
}

Filter Elements

file() has another option that skips empty lines. This is a filter task, and there is already a superclass for that in the SPL.

class MyProjectFilterIteratorSkipEmptyLines extends FilterIterator {

  public function accept() {
    return trim($this->getInnerIterator()->current()) !== '';
  }
}
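
A usage sketch for such a filter: the class is repeated under the hypothetical name SkipEmptyLinesSketch so that the example runs on its own, and an ArrayIterator with made-up lines stands in for the file. The return type declaration assumes PHP 7 or later.

```php
<?php
// Same accept() logic as above, under a hypothetical name.
class SkipEmptyLinesSketch extends FilterIterator {
  // The bool return type needs PHP 7+; drop it on older versions.
  public function accept(): bool {
    return trim($this->getInnerIterator()->current()) !== '';
  }
}

// Made-up input: two real lines, one empty line, one whitespace-only line.
$lines = new ArrayIterator(
  array("line one\n", "\n", "   \n", "line two\n")
);

$kept = array();
foreach (new SkipEmptyLinesSketch($lines) as $key => $value) {
  $kept[$key] = $value;
}
```

The filter keeps the original keys, so $kept holds the elements with the keys 0 and 3. Because accept() uses trim(), the whitespace-only line is skipped as well.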

Conclusion

Iterators are not about writing less code right away. But they help you to write source that is encapsulated, easy to test and reusable. Because of this over time you will end up with less code.

3 comments:

  1. If you're interested in working with text files (creating/reading), consider the splfileobject, http://php.net/manual/en/class.splfileobject.php.

    Good overview of Iterators.

  2. Nice post.
    How about typehinting the callback of MyProjectMapIterator with the "callable" keyword? It would be cleaner I think and more "respectful" with the idea of "contract" that interface provides.
    "Because of this over time you will end up with less code."
    true enough.

  3. @Jake: Good hint. Thanks.

    @artragis: You're right, but callable is a PHP 5.4 feature. Not everyone has updated yet, and I didn't want to blow up the article with hints about the new features. And there are always the comments for additional hints. :-)
