2017-07-02

FluentDOM 6.1 released - Improvements

Release: FluentDOM 6.1.0

MultiByte HTML

Thanks to some issues reported by Kyle Tse, the multibyte handling for HTML was improved. It should now work properly. The HTML loader can read the encoding/charset from meta tags, or you can specify it as a loader option. The default is UTF-8. FluentDOM\Document::saveHTML() got some additional logic as well.

XMLReader/XMLWriter

If you need to handle huge XML files, the XMLReader and XMLWriter APIs are the way to do it. Well, you could try using SAX, but believe me, THAT is no fun. XMLReader and XMLWriter are nice APIs on their own, so FluentDOM adds only slight changes for namespace handling.

XMLReader::read()/XMLReader::next()

Of the two traversing methods, only next() allows you to specify a local name as a condition. FluentDOM extends the signature of both methods to accept a tag name and a namespace URI. As a result, code that reads XML with namespaces can be simplified:

$sitemapUri = 'http://www.sitemaps.org/schemas/sitemap/0.9';
$reader = new FluentDOM\XMLReader();
$reader->open($file);
if ($reader->read('url', $sitemapUri)) {
  do {
    //...
  } while ($reader->next('url', $sitemapUri));
}
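
For comparison, here is a minimal sketch of the same traversal with plain ext/xmlreader, without FluentDOM. It only illustrates the manual name and namespace checks that the extended signatures replace. The inline sample XML and the $found counter are assumptions made for the example and stand in for $file and the elided loop body.

```php
<?php
// Plain ext/xmlreader: node type, local name and namespace are checked by hand.
$sitemapUri = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Hypothetical sample data (stands in for the file opened in the text).
$xml = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$found = 0;
while ($reader->read()) {
  if (
    $reader->nodeType === XMLReader::ELEMENT &&
    $reader->localName === 'url' &&
    $reader->namespaceURI === $sitemapUri
  ) {
    // ... handle one <url> element
    $found++;
  }
}
$reader->close();
```

Every condition in the if statement has to be repeated wherever the reader moves on, which is the boilerplate the two-argument read()/next() calls fold away.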

XMLReader::registerNamespace()

Additionally, you can register namespaces on the XMLReader object itself. This allows it to resolve namespace prefixes in tag name arguments.

Namespace definitions will be propagated to a FluentDOM\Document instance created by FluentDOM\XMLReader::expand().

$reader = new FluentDOM\XMLReader();
$reader->open($file);
$reader->registerNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');
if ($reader->read('s:url')) {
  do {
    $url = $reader->expand();
    var_dump(
      $url('string(s:loc)')
    );
  } while ($reader->next('s:url'));
}
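
To show what the propagation saves, here is a rough plain-PHP sketch of the same lookup using only ext/xmlreader and ext/dom. The expanded node is imported into a DOMDocument, and the prefix is registered again on a DOMXPath object. The sample XML and the $locations array are assumptions made for the example.

```php
<?php
$ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Hypothetical sample data (stands in for the file opened in the text).
$xml = '<urlset xmlns="' . $ns . '"><url><loc>http://example.com/</loc></url></urlset>';

$reader = new XMLReader();
$reader->XML($xml);

$locations = [];
while ($reader->read()) {
  if (
    $reader->nodeType === XMLReader::ELEMENT &&
    $reader->localName === 'url' &&
    $reader->namespaceURI === $ns
  ) {
    // Copy the expanded node into a document of its own.
    $document = new DOMDocument();
    $node = $document->appendChild(
      $document->importNode($reader->expand(), true)
    );
    // The prefix has to be registered a second time, on the XPath object.
    $xpath = new DOMXPath($document);
    $xpath->registerNamespace('s', $ns);
    $locations[] = $xpath->evaluate('string(s:loc)', $node);
  }
}
$reader->close();

var_dump($locations);
```

With FluentDOM, the single registerNamespace() call on the reader covers both the traversal and the XPath expression on the expanded node.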

XMLWriter::registerNamespace()

The same registration is possible on a FluentDOM\XMLWriter. It keeps track of the namespaces defined in the current context and avoids adding unnecessary definitions to the output (PHP Bug).

XMLWriter has many methods that have a tag name argument and this change allows all of them to become namespace aware.

$writer = new FluentDOM\XMLWriter();
$writer->openURI('php://stdout');
$writer->registerNamespace('', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$writer->setIndent(true);
$writer->startDocument();
$writer->startElement('urlset');

foreach ($urls as $url) {
  $writer->startElement('url');
  $writer->writeElement('loc', $url['href']);
  // ...
  $writer->endElement();
}

$writer->endElement();
$writer->endDocument();
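
For contrast, a sketch of roughly the same document written with plain ext/xmlwriter, where the namespace has to be passed to every startElementNs()/writeElementNs() call. The $urls sample data is an assumption made for the example. The output may carry an xmlns declaration on each element, and printing it shows whether that happens on a given PHP version.

```php
<?php
$ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Hypothetical sample data for the loop.
$urls = [
  ['href' => 'http://example.com/'],
];

$writer = new XMLWriter();
$writer->openMemory();
$writer->setIndent(true);
$writer->startDocument();
// null prefix: the elements go into the default namespace.
$writer->startElementNs(null, 'urlset', $ns);

foreach ($urls as $url) {
  $writer->startElementNs(null, 'url', $ns);
  $writer->writeElementNs(null, 'loc', $ns, $url['href']);
  $writer->endElement();
}

$writer->endElement();
$writer->endDocument();

$output = $writer->outputMemory();
echo $output;
```

The FluentDOM version registers the namespace once and then uses the ordinary startElement()/writeElement() calls.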