2014-08-09

Xpath 1.0 - Quoting Strings

Strings in Xpath 1.0 can be enclosed in single or double quotes. The following expressions are equivalent.

//div[@id = 'foo']
//div[@id = "foo"]


This is nice because you can use the variant that requires less or no escaping. I prefer single quotes for PHP strings because they need less escaping (only the single quote and the backslash). I usually end up with something like this:

$xpath->evaluate('//div[@id = "foo"]');

However, a problem comes up if the value is dynamic:

$xpath->evaluate('//div[@id = "'.$_GET['foo'].'"]');

If $_GET['foo'] contains a double quote, it will break the expression. This is comparable to an SQL injection and should be avoided, don't you think?
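To make the risk concrete, here is a minimal sketch. The document and the $input value are made up for illustration, and $input stands in for $_GET['foo']. It shows how a double quote in the value ends the literal early and changes the meaning of the predicate.

```php
<?php
// Hypothetical document, for illustration only.
$document = new DOMDocument();
$document->loadXML('<root><div id="foo">one</div><div id="bar">two</div></root>');
$xpath = new DOMXPath($document);

// Stand-in for $_GET['foo']: the first double quote closes the literal.
$input = 'foo" or "1" = "1';
$expression = '//div[@id = "'.$input.'"]';

echo $expression, "\n"; // //div[@id = "foo" or "1" = "1"]

// The predicate is now always true, so both div elements match.
var_dump($xpath->evaluate($expression)->length); // int(2)
```

A value with an unbalanced quote would instead produce an invalid expression, so the result is either a wrong match or an error.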

The Xpath 1.0 specification for a literal is:

Literal   ::=   '"' [^"]* '"'
              | "'" [^']* "'"

It disallows the use of the enclosing quote in the literal itself; there is no way to escape it.

Hint: This is different in Xpath 2.0. You can duplicate the quotes to escape them.
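As a rough sketch of the Xpath 2.0 rule: the helper name quoteXpath2() is hypothetical, and its result is only valid for an Xpath 2.0 (or later) processor. PHP's DOMXPath implements Xpath 1.0, so it cannot evaluate such a literal.

```php
<?php
// Hypothetical helper for Xpath 2.0 processors only: an enclosing quote
// inside the literal is escaped by doubling it.
function quoteXpath2(string $value): string {
  return '"'.str_replace('"', '""', $value).'"';
}

echo quoteXpath2('He said "hi"'), "\n"; // "He said ""hi"""
```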

Deciding Which Quote To Use

The first and obvious step is to check the value for quotes and use the kind that it does not contain:

function quote($value) {
  $char = strpos($value, '"') === FALSE ? '"' : "'";
  return $char.$value.$char;
}


But a value could contain both kinds of quotes. This would still break the expression.
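A short check of that limitation, with made-up sample values. The quote() function is repeated here only so the snippet runs on its own.

```php
<?php
// Same logic as the quote() function shown earlier.
function quote($value) {
  $char = strpos($value, '"') === FALSE ? '"' : "'";
  return $char.$value.$char;
}

echo quote('foo'), "\n";          // "foo"
echo quote('6" pipe'), "\n";      // '6" pipe'
// Both kinds of quotes: the literal already ends after the 6.
echo quote('6\' 2" tall'), "\n";  // '6' 2" tall'
```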

Divide And Conquer

If it is not possible to quote the whole value because it contains both kinds of quotes, you need to divide it into parts that can be quoted. You can then use the Xpath function concat() to rebuild the original value:

//div[. = concat("Single Quote: '", 'Double Quote: "')]
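To see that concat() really rebuilds a value with both kinds of quotes, the following sketch builds a throwaway document (an assumption for this example) whose div contains exactly that text and evaluates the expression with DOMXPath.

```php
<?php
// Hypothetical one-element document whose text contains both quote characters.
$document = new DOMDocument();
$div = $document->appendChild($document->createElement('div'));
$div->appendChild($document->createTextNode('Single Quote: \'Double Quote: "'));
$xpath = new DOMXPath($document);

// Each concat() argument is quoted with the character it does not contain.
$expression = '//div[. = concat("Single Quote: \'", \'Double Quote: "\')]';

var_dump($xpath->evaluate($expression)->length); // int(1)
```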

Matching text structures is the domain of regular expressions. So let's use them:

preg_match_all('("[^\']*|[^"]+)', 'Double Quote ", Single Quote \'', $matches);
var_dump($matches);


Output:

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(13) "Double Quote "
    [1]=>
    string(16) "", Single Quote "
    [2]=>
    string(1) "'"
  }
}

The pattern matches either a string that starts with a double quote and contains no single quote, or a string that does not contain any double quote.

All that is left is quoting the parts and joining them back together:

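$result = '';
foreach ($matches[0] as $part) {
  // A part that starts with a double quote contains no single quote, and vice versa.
  $quoteChar = (substr($part, 0, 1) == '"') ? "'" : '"';
  $result .= ", ".$quoteChar.$part.$quoteChar;
}
// Strip the leading ", " and wrap the arguments in the function call.
return 'concat('.substr($result, 2).')';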

Put Together

It does not make sense to create the function call for a single argument, and concat() requires at least two arguments anyway. So the check is still needed:

  1. If the value contains no single quote, use single quotes
  2. If the value contains no double quote, use double quotes
  3. Otherwise divide the string and use concat()
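The three steps can be sketched as one function. This is an illustration assembled from the snippets in this post, not the FluentDOM code, and the name quoteXpathLiteral() is hypothetical.

```php
<?php
// A sketch only; quoteXpathLiteral() is a hypothetical name.
function quoteXpathLiteral(string $value): string {
  // 1. No single quote in the value: single quotes are safe.
  if (strpos($value, "'") === false) {
    return "'".$value."'";
  }
  // 2. No double quote in the value: double quotes are safe.
  if (strpos($value, '"') === false) {
    return '"'.$value.'"';
  }
  // 3. Both kinds: split into parts that each contain only one kind of quote.
  preg_match_all('("[^\']*|[^"]+)', $value, $matches);
  $parts = [];
  foreach ($matches[0] as $part) {
    // Quote each part with the character it does not contain.
    $quoteChar = (substr($part, 0, 1) === '"') ? "'" : '"';
    $parts[] = $quoteChar.$part.$quoteChar;
  }
  return 'concat('.implode(', ', $parts).')';
}

echo quoteXpathLiteral('foo'), "\n";   // 'foo'
echo quoteXpathLiteral("it's"), "\n";  // "it's"
echo quoteXpathLiteral('Double Quote ", Single Quote \''), "\n";
// concat("Double Quote ", '", Single Quote ', "'")
```

Because step 3 is only reached when the value contains both kinds of quotes, the pattern always yields at least two parts, so concat() never receives a single argument.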

A complete implementation can be found in FluentDOM\Xpath::quote().