2014-09-20

FluentDOM 5.1 - New Features

FluentDOM 5.1 is now available. Here are some of the highlights:

Functors 

The node classes can now be called as functions to navigate in a DOM with XPath expressions. The following example fetches all link href attributes from an HTML page:
$dom = new \FluentDOM\Document();
$dom->loadHTMLFile('http://fluentdom.org/');

$links = [];
foreach ($dom('//a[@href]/@href') as $href) {
  $links[] = (string)$href;
}


This works for most of the nodes in a DOM. 
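For comparison, here is a rough sketch of the same lookup with nothing but PHP's DOM extension. It assumes that the functor call is a shortcut for evaluating the XPath expression in the context of the node. The inline HTML string is made up for this example so it runs without network access.

```php
<?php
// Sample markup, invented for this illustration
$html = '<html><body><a href="/one">One</a><a name="x">No href</a><a href="/two">Two</a></body></html>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$links = [];
// Select the href attribute nodes of all links that have one
foreach ($xpath->evaluate('//a[@href]/@href') as $href) {
  $links[] = $href->value;
}
```

The functor syntax saves the explicit DOMXPath object and the separate evaluate() call.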

Creator

The new Creator class provides short syntax to create DOM nodes. More detailed information can be found in the wiki.
$_ = FluentDOM::create();
echo $_(
  'ul',
  ['class' => 'navigation'],
  $_('li', 'FluentDOM')
);

XML To JSON

Several serializers/loaders for JSON were added. JsonML, Rayfish, BadgerFish and RabbitFish are supported.
echo "XML -> JsonML\n\n";
$json = json_encode(
  new FluentDOM\Serializer\Json\JsonML($dom), 
  JSON_PRETTY_PRINT);
echo $json;

echo "\n\nJsonML -> XML\n\n";
echo FluentDOM(
  $json, 'application/jsonml+json')->formatOutput();

The Release

2014-08-17

FluentDOM + HTML5

HTML5 is not directly supported by PHP's DOM extension, so FluentDOM cannot read it either. But there is a solution: HTML5-PHP is a library that can parse HTML5 into a DOM document.

Both libraries use Composer:
"require": {
  "fluentdom/fluentdom": "5.*",
  "masterminds/html5": "2.*"
}

Read HTML5 into FluentDOM:
$html5 = new Masterminds\HTML5();
$fd = FluentDOM($html5->loadHTML($html));

Or write it:
echo $html5->saveHTML($fd->document);

HTML5-PHP puts the elements into the XHTML namespace. To use XPath expressions, you will need to register a prefix for it:
$html5 = new Masterminds\HTML5();
$fd = FluentDOM($html5->loadHTML($html));
$fd->registerNamespace(
  'xhtml', 'http://www.w3.org/1999/xhtml'
);
echo $fd->find('//xhtml:p')->text();
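Why the prefix is needed can be shown with the plain DOM extension. This is a minimal sketch: the XML string stands in for a document parsed by HTML5-PHP and is an assumption made for the example.

```php
<?php
$document = new DOMDocument();
// Elements in the XHTML namespace, like the ones HTML5-PHP creates
$document->loadXML(
  '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello</p></body></html>'
);
$xpath = new DOMXPath($document);

// An unprefixed name only matches elements without a namespace
$withoutPrefix = $xpath->evaluate('count(//p)');

$xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
// With a registered prefix the element is found
$withPrefix = $xpath->evaluate('count(//xhtml:p)');
$text = $xpath->evaluate('string(//xhtml:p)');
```

The prefix name is up to you. Only the namespace URI has to match the one used in the document.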

2014-08-09

Xpath 1.0 - Quoting Strings

Strings in Xpath 1.0 can be enclosed in single or double quotes. The following expressions are equivalent.

//div[@id = 'foo']
//div[@id = "foo"]


This is nice because you can use the variant that requires less or no escaping. I prefer single quotes for PHP because they need less escaping (only single quote and backslash). I usually end up with something like this:

$xpath->evaluate('//div[@id = "foo"]');

However, a problem comes up if the value is dynamic.

$xpath->evaluate('//div[@id = "'.$_GET['foo'].'"]');

If $_GET['foo'] contains a double quote, it will break the expression. This is comparable to an SQL injection and should be avoided, don't you think?
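Here is a small sketch of how such an injection can change the result. The sample document and the function name findById are made up for this illustration, and a plain variable takes the place of $_GET['foo'].

```php
<?php
$document = new DOMDocument();
$document->loadXML(
  '<root><div id="foo">one</div><div id="bar">two</div></root>'
);
$xpath = new DOMXPath($document);

// Hypothetical helper that builds the expression the unsafe way
function findById(DOMXPath $xpath, $id) {
  return $xpath->evaluate('//div[@id = "'.$id.'"]');
}

// A harmless value matches exactly one element
$harmless = findById($xpath, 'foo')->length;

// This value turns the condition into: @id = "" or "1" = "1"
$injected = findById($xpath, '" or "1" = "1')->length;
```

The injected value closes the literal early and appends a condition that is always true, so every div matches.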

The Xpath 1.0 specification for a literal is:

Literal   ::=   '"' [^"]* '"'
              | "'" [^']* "'"

It disallows the use of the enclosing quote in the literal itself; there is no way to escape it.

Hint: This is different in Xpath 2.0. You can duplicate the quotes to escape them.

Deciding Which Quote To Use

The first and obvious step is to check the value for quotes and use the kind that it does not contain:

function quote($value) {
  $char = strpos($value, '"') === FALSE ? '"' : "'";
  return $char.$value.$char;
}


But a value could contain both kinds of quotes. This would still break the expression.
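A quick check makes the limitation visible. The function is the quote() from above; the two sample values are made up.

```php
<?php
function quote($value) {
  $char = strpos($value, '"') === FALSE ? '"' : "'";
  return $char.$value.$char;
}

// Only a single quote inside: double quotes enclose it safely
$safe = quote("O'Reilly");

// Both kinds inside: the single quote in the value ends the literal early
$broken = quote('5" or 7\'');
```

$safe is a valid Xpath literal. $broken is enclosed in single quotes but still contains one, so it is not.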

Divide And Conquer

If it is not possible to quote the whole value because it contains both kinds of quotes, you need to divide it into parts that can be quoted. You can then use the Xpath function concat() to rebuild the original value:

//div[. = concat("Single Quote: '", 'Double Quote: "')]

Matching text structures is the domain of regular expressions. So let's use them:

preg_match_all('("[^\']*|[^"]+)', 'Double Quote ", Single Quote \'', $matches);
var_dump($matches);


Output:

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(13) "Double Quote "
    [1]=>
    string(16) "", Single Quote "
    [2]=>
    string(1) "'"
  }
}

The pattern matches any string that starts with a double quote and contains no single quote, or any string that does not contain a double quote.

All that is left is quoting the parts and joining them back together:

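$result = '';
foreach ($matches[0] as $part) {
  // A part starting with a double quote gets single quotes, any other part double quotes
  $quoteChar = (substr($part, 0, 1) === '"') ? "'" : '"';
  $result .= ", ".$quoteChar.$part.$quoteChar;
}
// Strip the leading ", " before wrapping the arguments
return 'concat('.substr($result, 2).')';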

Put Together

It does not make sense to create the function call for a single argument. So the check is still needed:

  1. If the value contains no single quote, use single quotes
  2. If the value contains no double quote, use double quotes
  3. Otherwise divide the string and use concat()

A complete implementation can be found in FluentDOM\Xpath::quote().
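For reference, here is a minimal standalone sketch of the three steps. It is not the FluentDOM implementation. The function name quoteXpathLiteral is made up for this example, and the round trip through DOMXPath is only there to check that the literal evaluates back to the original value.

```php
<?php
// Hypothetical helper, named for illustration only
function quoteXpathLiteral($value) {
  // 1. No single quote inside: single quotes are safe
  if (strpos($value, "'") === FALSE) {
    return "'".$value."'";
  }
  // 2. No double quote inside: double quotes are safe
  if (strpos($value, '"') === FALSE) {
    return '"'.$value.'"';
  }
  // 3. Both kinds inside: divide the value and rebuild it with concat()
  preg_match_all('("[^\']*|[^"]+)', $value, $matches);
  $parts = [];
  foreach ($matches[0] as $part) {
    $quoteChar = (substr($part, 0, 1) === '"') ? "'" : '"';
    $parts[] = $quoteChar.$part.$quoteChar;
  }
  return 'concat('.implode(', ', $parts).')';
}

$value = 'Double Quote ", Single Quote \'';
$literal = quoteXpathLiteral($value);

// Round trip: evaluating the literal should return the original value
$document = new DOMDocument();
$document->loadXML('<root/>');
$xpath = new DOMXPath($document);
$roundTrip = $xpath->evaluate('string('.$literal.')');
```

For the sample value, $literal is concat("Double Quote ", '", Single Quote ', "'"). Collecting the parts in an array and using implode() avoids the substr() call needed to strip the leading separator.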