2017-10-03

FluentDOM 7.0, The Next Step

FluentDOM 7.0 is out, so what has changed? Well, the FluentDOM namespace got a little crowded, so I moved all the classes that extend the native DOM classes into FluentDOM\DOM, made the Creator a top-level class and grouped the utility classes together. If you're upgrading, you might need to change some imports. FluentDOM now requires PHP 7 and uses scalar type hints. In other words, lots of cleanup.

FluentDOM\XMLReader\SiblingIterator

Large XML files usually consist of a list element with many record elements as its children. The whole list is too large to load into memory, but each record is small enough to load on its own.

The SiblingIterator takes an XMLReader, a tag name and an optional filter callback. For each element with a matching tag name it executes the filter callback, and if the callback returns TRUE (or no callback was given), it expands the node into DOM. After the first match, it only considers the following siblings, so it does not have to read through the descendants of each record. This improves read performance.
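The core idea can be sketched with nothing but PHP's bundled XMLReader and DOM extensions. This is not FluentDOM's implementation: the function name iterateSiblings() and the filter signature (the callback receives the expanded DOMElement) are assumptions made for illustration, and the sketch matches on the local name only, ignoring namespaces. It reads node by node until the first match and from then on moves with XMLReader::next(), which jumps to the next sibling with the given name without visiting the nodes inside the current record.

```php
<?php
// Sketch of the sibling-iteration technique with PHP's bundled extensions.
// NOT FluentDOM's implementation: iterateSiblings() and the filter signature
// (callback receives the expanded DOMElement) are hypothetical.

/**
 * Yields every matching record element as a DOMElement in its own document.
 */
function iterateSiblings(XMLReader $reader, string $localName, ?callable $filter = null): Generator
{
    // Phase 1: read node by node until the first matching element.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $localName) {
            $found = true;
            break;
        }
    }
    if (!$found) {
        return;
    }
    // Phase 2: only visit following siblings. XMLReader::next() skips the
    // subtree of the current record instead of reading every node inside it.
    do {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        // Expand the current record and import it into a fresh, small document.
        $document = new DOMDocument();
        $record = $document->appendChild($document->importNode($reader->expand(), true));
        if ($filter === null || $filter($record)) {
            yield $record;
        }
    } while ($reader->next($localName));
}

// Usage: keep every record except the one with id="2".
$reader = new XMLReader();
$reader->XML(
    '<list><record id="1"><name>Alice</name></record>'
    . '<record id="2"><name>Bob</name></record>'
    . '<record id="3"><name>Carol</name></record></list>'
);
$skipSecond = function (DOMElement $record): bool {
    return $record->getAttribute('id') !== '2';
};
$names = [];
foreach (iterateSiblings($reader, 'record', $skipSecond) as $record) {
    $names[] = $record->textContent;
}
$reader->close();
print_r($names);
```

With the sample list, the filter drops the record with id 2, so $names ends up holding Alice and Carol. Importing every record into its own small DOMDocument keeps memory usage bounded by the size of a single record rather than the whole file.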

Here is an example that reads an XML sitemap including video information.

$reader = new FluentDOM\XMLReader();
$reader->open($sitemapFile);
$reader->registerNamespace(
  's', 'http://www.sitemaps.org/schemas/sitemap/0.9'
);
$reader->registerNamespace(
  'v', 'http://www.google.com/schemas/sitemap-video/1.1'
);

foreach (new FluentDOM\XMLReader\SiblingIterator($reader, 's:url') as $url) {
  /** @var FluentDOM\DOM\Element $url */
  var_dump(
    [
      $url('string(v:video/v:title)'),
      $url('string(s:loc)')
    ]
  );
}

FluentDOM\XMLWriter::collapse()

FluentDOM 7.0 adds a collapse() method to XMLWriter. It is the missing counterpart of XMLReader::expand(). Together, the two methods make it really easy to work with large XML files.

The collapse() method takes any DOM node or node list and writes it to the output stream. You can use the extended DOM classes, FluentDOM\Creator or FluentDOM\Query to create the record node.
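To show what collapse() has to take care of, here is a rough equivalent for PHP's native XMLWriter. It is a sketch under assumptions, not FluentDOM's code: writeNode() and its $inScope parameter (a prefix => namespace URI map of declarations already written by the surrounding output) are hypothetical, and only elements, attributes, text, CDATA sections and comments are handled. It walks the DOM subtree and adds a namespace declaration only where that namespace is not yet in scope, so descendants do not repeat it. The namespace URIs urn:s and urn:v are placeholders.

```php
<?php
// Sketch of a collapse()-style helper for PHP's native XMLWriter.
// NOT FluentDOM's implementation: writeNode() and $inScope
// (prefix => namespace URI already declared by the output) are hypothetical.

function writeNode(XMLWriter $writer, DOMNode $node, array $inScope = []): void
{
    switch ($node->nodeType) {
        case XML_DOCUMENT_NODE:
            foreach ($node->childNodes as $child) {
                writeNode($writer, $child, $inScope);
            }
            break;
        case XML_ELEMENT_NODE:
            $writer->startElement($node->nodeName);
            // Namespaces this element and its prefixed attributes depend on.
            $needed = [(string)$node->prefix => (string)$node->namespaceURI];
            foreach ($node->attributes as $attribute) {
                $prefix = (string)$attribute->prefix;
                if ($prefix !== '' && $prefix !== 'xml') {
                    $needed[$prefix] = (string)$attribute->namespaceURI;
                }
            }
            // Declare only namespaces that are not in scope yet.
            foreach ($needed as $prefix => $uri) {
                if (($inScope[$prefix] ?? '') !== $uri) {
                    $writer->writeAttribute($prefix === '' ? 'xmlns' : 'xmlns:' . $prefix, $uri);
                    $inScope[$prefix] = $uri;
                }
            }
            foreach ($node->attributes as $attribute) {
                $writer->writeAttribute($attribute->nodeName, $attribute->value);
            }
            // Children inherit the updated scope, so they don't repeat declarations.
            foreach ($node->childNodes as $child) {
                writeNode($writer, $child, $inScope);
            }
            $writer->endElement();
            break;
        case XML_TEXT_NODE:
            $writer->text($node->nodeValue);
            break;
        case XML_CDATA_SECTION_NODE:
            $writer->writeCdata($node->nodeValue);
            break;
        case XML_COMMENT_NODE:
            $writer->writeComment($node->nodeValue);
            break;
    }
}

// Usage: stream one record into an outer element that already declares urn:s.
$document = new DOMDocument();
$document->loadXML(
    '<url xmlns="urn:s" xmlns:v="urn:v"><loc>a</loc><v:video><v:title>T</v:title></v:video></url>'
);
$writer = new XMLWriter();
$writer->openMemory();
$writer->startElement('urlset');
$writer->writeAttribute('xmlns', 'urn:s');
writeNode($writer, $document->documentElement, ['' => 'urn:s']);
$writer->endElement();
$output = $writer->outputMemory();
echo $output, "\n";
```

Because the helper is told that urlset already declares the default namespace, only the v:video element gets an xmlns:v declaration in the output.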

$writer = new FluentDOM\XMLWriter();
$writer->openURI('php://stdout');
$writer->registerNamespace(
  '', 'http://www.sitemaps.org/schemas/sitemap/0.9'
);
$writer->registerNamespace(
  'video', 'http://www.google.com/schemas/sitemap-video/1.1'
);

$writer->setIndent(2);
$writer->startDocument();
$writer->startElement('urlset');
$writer->writeAttribute(
  'xmlns:video', 'http://www.google.com/schemas/sitemap-video/1.1'
);

$_ = FluentDOM::create();
$_->registerNamespace(
  '', 'http://www.sitemaps.org/schemas/sitemap/0.9'
);
$_->registerNamespace(
  'video', 'http://www.google.com/schemas/sitemap-video/1.1'
);

foreach ($videos as $video) {
  $writer->collapse(
    $_(
      'url',
      $_('loc', $video['url']),
      $_(
        'video:video',
        $_('video:title', $video['title'])
      )
    )
  );
}
$writer->endElement();
$writer->endDocument();

XMLWriter::writeAttribute() recognizes when you write a namespace definition, so the namespace will not be declared again on descendant nodes.

Put Together

If you combine the SiblingIterator with collapse(), you can easily write mappers that consume large XML files. Basically, you can treat each record as a separate DOM document.

For example, you can use this to merge several XML documents and change their namespaces:

$writer = new \FluentDOM\XMLWriter();
$writer->openURI('php://stdout');
$writer->registerNamespace('p', 'urn:persons');
$writer->setIndent(2);
$writer->startDocument();
$writer->startElement('p:persons');

// iterate the example sources
foreach ($data as $sourceFile) {
  // load the source into a reader
  $reader = new \FluentDOM\XMLReader();
  $reader->open($sourceFile);

  // iterate the person elements
  $persons = new \FluentDOM\XMLReader\SiblingIterator($reader, 'person');
  foreach ($persons as $person) {
    // use the transformer to move the nodes into the namespace
    $writer->collapse(
      new \FluentDOM\Transformer\Namespaces\Replace(
        $person,
        // namespaces to replace
        ['' => 'urn:persons', 'urn:example' => 'urn:persons'],
        // prefix for target namespace
        ['urn:persons' => 'p']
      )
    );
  }
}

$writer->endElement();
$writer->endDocument();
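In the example above, the namespace mapping itself is done by FluentDOM\Transformer\Namespaces\Replace. As an illustration of the idea, not of the transformer's actual code, here is a sketch with PHP's DOM extension: the hypothetical copyReplacingNamespaces() copies a record into a target node, moves each element into the mapped namespace and gives it the configured prefix. The two array parameters mirror the shape of the arrays in the example, but that correspondence is an assumption, and attributes are copied unchanged as a simplification.

```php
<?php
// Sketch of a namespace-replacing copy using only PHP's DOM extension.
// NOT FluentDOM\Transformer\Namespaces\Replace: copyReplacingNamespaces()
// and its parameters are hypothetical. Attributes are copied unchanged.

/**
 * Appends a copy of $node to $parent. $map maps old namespace URIs to new
 * ones ('' means "no namespace"), $prefixes maps a URI to the prefix to use.
 */
function copyReplacingNamespaces(DOMNode $parent, DOMNode $node, array $map, array $prefixes): void
{
    $document = $parent instanceof DOMDocument ? $parent : $parent->ownerDocument;
    if (!$node instanceof DOMElement) {
        // Text, comments, CDATA sections etc. are imported as they are.
        $parent->appendChild($document->importNode($node, true));
        return;
    }
    $uri = (string)$node->namespaceURI;
    $uri = $map[$uri] ?? $uri;
    $prefix = $prefixes[$uri] ?? '';
    $name = $prefix === '' ? $node->localName : $prefix . ':' . $node->localName;
    $copy = $uri === ''
        ? $document->createElement($node->localName)
        : $document->createElementNS($uri, $name);
    // Attach the copy before filling it, so it is already part of the target tree.
    $parent->appendChild($copy);
    foreach ($node->attributes as $attribute) {
        if ($attribute->namespaceURI === null) {
            $copy->setAttribute($attribute->nodeName, $attribute->value);
        } else {
            $copy->setAttributeNS($attribute->namespaceURI, $attribute->nodeName, $attribute->value);
        }
    }
    foreach ($node->childNodes as $child) {
        copyReplacingNamespaces($copy, $child, $map, $prefixes);
    }
}

// Usage: move a person record from urn:example into urn:persons with prefix "p".
$source = new DOMDocument();
$source->loadXML('<person xmlns="urn:example" id="42"><name>Alice</name></person>');
$target = new DOMDocument();
copyReplacingNamespaces(
    $target,
    $source->documentElement,
    ['' => 'urn:persons', 'urn:example' => 'urn:persons'],
    ['urn:persons' => 'p']
);
echo $target->saveXML($target->documentElement), "\n";
```

Copying instead of renaming in place keeps the source record untouched, which matters little here because each record is discarded after it has been written.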