2009-09-12

Scraping Links From HTML

Have you tried to scrape content from an HTML document using regular expressions? That is a bad idea (read here why!).

With FluentDOM it is easy:

Get all links

Just create a FluentDOM object from the HTML string, find all links using XPath, and map the nodes to an array.

<?php
require('FluentDOM/FluentDOM.php');
$html = file_get_contents('http://www.papaya-cms.com/');
$links = FluentDOM($html, 'html')->find('//a[@href]')->map(
  function ($node) {
    return $node->getAttribute('href');
  }
);
var_dump($links);
?>
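
For comparison, here is roughly what that chain corresponds to with the plain ext/dom classes. It is only a sketch: the sample markup is made up so the snippet runs without a network connection, and it says nothing about how FluentDOM works internally.

```php
<?php
// Made-up sample markup standing in for the downloaded page (assumption).
$html = '<html><body>'
  .'<a href="/a.html">A</a>'
  .'<a name="anchor">no href here</a>'
  .'<a href="http://example.com/">B</a>'
  .'</body></html>';

$document = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

// Same XPath expression as in the FluentDOM example: all a elements with an href attribute.
$xpath = new DOMXPath($document);
$links = array();
foreach ($xpath->query('//a[@href]') as $node) {
  $links[] = $node->getAttribute('href');
}
var_dump($links);
```

The anchor without an href attribute is skipped by the XPath predicate, so only the two real links end up in the array.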

Extend local URLs

Need to edit the links? Pretty much the same:

<?php
require('FluentDOM/FluentDOM.php');
$url = 'http://www.papaya-cms.com/';
$html = file_get_contents($url);
$fd = FluentDOM($html, 'html')->find('//a[@href]')->each(
  function ($node) use ($url) {
    $item = FluentDOM($node);
    if (!preg_match('(^[a-zA-Z]+://)', $item->attr('href'))) {
      $item->attr('href', $url.$item->attr('href'));
    }
  }
);
$fd->contentType = 'xml';
header('Content-type: text/xml');
echo $fd;
?>
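
Simply concatenating the base URL and the href can produce a double slash, and it also prefixes links like "mailto:" or "#top". A slightly more careful join could be sketched like this. The function absolutizeHref() is a hypothetical helper written for this post, not part of FluentDOM, and like the example above it assumes every local link is relative to the site root.

```php
<?php
/**
 * Hypothetical helper: prefix site-local hrefs with a base URL.
 * Assumes local links are relative to the site root, not to the current page.
 */
function absolutizeHref($baseUrl, $href) {
  // Leave URLs with a scheme (http:, mailto:), protocol-relative URLs (//host)
  // and fragment-only links (#top) untouched.
  if (preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $href)) {
    return $href;
  }
  // Join base and path with exactly one slash.
  return rtrim($baseUrl, '/').'/'.ltrim($href, '/');
}

$url = 'http://www.papaya-cms.com/';
foreach (array('/index.html', 'news.html', 'mailto:info@example.com', '#top') as $href) {
  echo absolutizeHref($url, $href), "\n";
}
```

Inside the each() callback you would then pass the current href through the helper instead of concatenating the strings directly.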

2009-09-10

Speaking at the PHPNW09

I will speak at the PHPNW09 in Manchester.

Optimizing Your Frontend Performance: a look at web application performance from the user's side. The session starts in the browser, showing tools to measure and analyze performance, and then takes you to the server, explaining headers and possible solutions.

I am looking forward to answering your questions and hearing about your experiences and solutions.

2009-07-14

FluentDOM Loaders

We are still improving and experimenting with FluentDOM. We removed the constructor and added a load() method. The reason was to allow the creation of new documents with FluentDOM.


Now this is possible:

$fd = new FluentDOM();
$fd->append($fd->document->createElement('html'))
   ->append($fd->document->createElement('body'))
   ->append('<h1>Hello World</h1>');
echo $fd;

FluentDOM uses loader objects (thanks for the idea, Toby) and supports different types of sources. You can load HTML or XML, from files or strings, or define your own custom loaders. To load an HTML file you can use either the FluentDOM function or the load() method:

$fd = FluentDOM($fileName, 'text/html');

$fd = new FluentDOM();
$fd->load($fileName, 'text/html');

Or you define your own loader object:

$fd = new FluentDOM();
$fd->setLoaders(array(new MyFluentDOMLoader()));
$fd->load($source, $contentType);

You can find an example for INI files in ~/examples/iniloader/.

2009-06-24

Solar Power

The Vaio P has a very low power consumption: about 6 watts without the wireless stuff (8 watts with it). So I thought I would try to power my P with the sun. It works, but I think I need a slightly larger solar panel. The current one delivers only 6 watts.

2009-06-17

FluentDOM 1.0 Release

We just released FluentDOM 1.0. The package contains the two classes (FluentDOM and FluentDOMStyle) and a lot of examples.

FluentDOM is a test-driven project. The tests are included in the package, of course.

We decided to use the MIT License for the project. Test it, use it, and please give us some feedback.

2009-06-13

FluentDOM.org

FluentDOM now has its own website at http://fluentdom.org.

You can now find nightly builds at http://nightly.fluentdom.org. A PHPUnit log file and the code coverage report for the latest nightly build are provided, too.

Status Update

We added a FluentDOMStyle class. This class extends FluentDOM and adds support for manipulation of the style attributes.

$items = FluentDOMStyle($xhtml)->find('//div');
$items->css(
  array(
    'text-align' => 'center',
    'color' => 'black'
  )
);

2009-06-11

FluentDOM

Today I would like to present a new project: FluentDOM

It provides an easy-to-use, jQuery-like, fluent interface for DOMDocument.

The idea was born in a workshop of Tobias Schlitt about the PHP XML extensions at the IPC Spring in Berlin.

Over the last few days Bastian Feder and I implemented it. This is how it looks in action:

require_once('../FluentDOM.php');
echo FluentDOM($xml)
 ->node(
   FluentDOM($samples)
   ->find('//b[@id = "first"]')
   ->removeAttr('id')
   ->addClass('imported')
 )
 ->replaceAll('//p');

We are aware that there are some other projects with similar concepts. But none of them matched our requirements (XML targeted, XPath selectors, namespace support, ...).

You can take a look at the current version in Bastian's public SVN.

svn://fluentdom.org/trunk/FluentDOM

2009-05-08

papaya CMS Nightly Builds

You can now download nightly builds from the papaya CMS website. They are uploaded every night and include the system, base and free modules, the new default-xhtml template set and the matching theme.

2009-02-26

New Gadget: Vaio P

I'm in love with my little new toy: the Vaio P19. It is small, light and perfectly quiet. The high resolution display is sharp and easy to read.

The Atom Z530 is fast enough. I don't notice a real difference compared to the TX1 I had before. Video could be faster; I hope for a driver update.

The pictures in this post were taken in RAW with a Canon EOS 350D (some using a remote software on the Vaio P) and edited in Photoshop Elements on it. The P performed well.

Large applications like Open Office start really fast, maybe the SSD is the reason.

The AC adapter is small, not much longer than an AA battery. SonyStyle Japan includes an additional piece with the plug, like the AC adapter for the MacBooks. Does anybody know how I can get one of the plug adapters? (Other than ordering a complete AC adapter from Japan.) :-)

2009-02-05

Multi Language XSLT: Language Texts

Currently I am refactoring the default templates for the upcoming papaya CMS 5 release. I will show you some of the concepts in this blog. As you probably know papaya CMS uses XSLT for its templates, which is imho a perfect choice for web applications.

You get a strict split between application logic and layout. But XSLT can do more. How about translating layout texts, like the caption of a "more" link, or formatting numbers and dates? Sounds nice, doesn't it?

In the first step you need to separate the layout texts from the XSLT and create language files for easier management.

The template for this is quite small:

<xsl:template name="language-text">
  <xsl:param name="text"></xsl:param>
  <xsl:choose>
    <xsl:when
      test="$LANGUAGE_TEXTS_CURRENT/text[@ident = $text]">
      <xsl:value-of
        select="$LANGUAGE_TEXTS_CURRENT/text[@ident = $text]"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

This checks the variable $LANGUAGE_TEXTS_CURRENT for a text element whose ident attribute has the same value as the parameter $text and outputs its content. If there is no such element, it outputs the identifier itself.

To fill the variable, create an XML file with your texts.

<texts>
  <text ident="SAMPLE">Sample text</text>
</texts>

At the top of the XSL file define a global parameter and load an XML file into it. The XSLT function document() loads XML data from a URI. By default, a relative URI is resolved against the location of the current XSL file.

<xsl:param name="LANGUAGE_TEXTS_CURRENT"
  select="document('./de-DE.xml')/texts"/>

Of course this would be only a single fixed language file, so you need a variable for the file name. The XPath function concat() accepts any number of arguments, so there is no need for nesting.

<xsl:param name="LANGUAGE_TEXTS_CURRENT"
  select="document(concat('./', $PAGE_LANGUAGE, '.xml'))/texts"/>

Now you can call the template to get the language specific text.

<xsl:call-template name="language-text">
  <xsl:with-param name="text">SAMPLE</xsl:with-param>
</xsl:call-template>

This is still a little noisy, but in standard XSLT you cannot avoid it. However, if your processor supports the EXSLT extensions, you can: EXSLT lets you convert a template into a function. The result would look like this:

<xsl:value-of select="language:text('SAMPLE')" />

Less source and easier to read. You can use it in the attribute of an output element, too.

<img src="sample.png" alt="{language:text('SAMPLE')}" />

The PHP 5 ext/xsl, which uses the libxslt library, supports EXSLT. To convert the template into a function, declare the EXSLT functions namespace, change the declaration from "xsl:template" to "func:function" and wrap the output in "func:result":

<func:function name="language:text">
  <xsl:param name="text"/>
  <func:result>
    <xsl:choose>
      <xsl:when
        test="$LANGUAGE_TEXTS_CURRENT/text[@ident = $text]">
        <xsl:value-of
          select="$LANGUAGE_TEXTS_CURRENT/text[@ident = $text]"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </func:result>
</func:function>

There is still one little problem: if the language XML file has not been translated into the current language, or you missed a phrase, the result is the text identifier. It would be nice to fall back to a default language. So declare an additional parameter and add a condition to the function.

<xsl:param name="LANGUAGE_TEXTS_FALLBACK"
  select="document('./en-US.xml')/texts"/>
<func:function name="language:text">
  <xsl:param name="text"/>
  <func:result>
    <xsl:choose>
      <xsl:when
        test="$LANGUAGE_TEXTS_CURRENT/text[@ident = $text]">
        <xsl:value-of
          select="$LANGUAGE_TEXTS_CURRENT/text[@ident = $text]"/>
      </xsl:when>
      <xsl:when
        test="$LANGUAGE_TEXTS_FALLBACK/text[@ident = $text]">
        <xsl:value-of
          select="$LANGUAGE_TEXTS_FALLBACK/text[@ident = $text]"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </func:result>
</func:function>
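
If you want to try the whole thing from PHP, a minimal sketch could look like this. Several things in it are assumptions made for the demo: the temporary directory and file names, the sample texts, the placeholder namespace URI for the "language" prefix, and the way $PAGE_LANGUAGE is passed in with setParameter(). It writes two language files and the stylesheet to disk so that document() can resolve them relative to the XSL file, then looks up a translated text, a text that only exists in the fallback file, and an unknown identifier.

```php
<?php
// Throwaway directory for the demo files (names are assumptions).
$dir = sys_get_temp_dir().'/language-text-'.uniqid();
mkdir($dir);
file_put_contents(
  $dir.'/de-DE.xml',
  '<texts><text ident="SAMPLE">Beispieltext</text></texts>'
);
file_put_contents(
  $dir.'/en-US.xml',
  '<texts><text ident="SAMPLE">Sample text</text><text ident="MORE">more</text></texts>'
);

// The "language" namespace URI is a placeholder chosen for this sketch.
$xslt = <<<'XSLT'
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:func="http://exslt.org/functions"
  xmlns:language="http://example.org/language"
  extension-element-prefixes="func"
  exclude-result-prefixes="language">
  <xsl:output method="text"/>
  <xsl:param name="PAGE_LANGUAGE" select="'en-US'"/>
  <xsl:param name="LANGUAGE_TEXTS_CURRENT"
    select="document(concat('./', $PAGE_LANGUAGE, '.xml'))/texts"/>
  <xsl:param name="LANGUAGE_TEXTS_FALLBACK"
    select="document('./en-US.xml')/texts"/>
  <func:function name="language:text">
    <xsl:param name="text"/>
    <func:result>
      <xsl:choose>
        <xsl:when test="$LANGUAGE_TEXTS_CURRENT/text[@ident = $text]">
          <xsl:value-of select="$LANGUAGE_TEXTS_CURRENT/text[@ident = $text]"/>
        </xsl:when>
        <xsl:when test="$LANGUAGE_TEXTS_FALLBACK/text[@ident = $text]">
          <xsl:value-of select="$LANGUAGE_TEXTS_FALLBACK/text[@ident = $text]"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$text"/>
        </xsl:otherwise>
      </xsl:choose>
    </func:result>
  </func:function>
  <xsl:template match="/">
    <xsl:value-of select="language:text('SAMPLE')"/>
    <xsl:text>|</xsl:text>
    <xsl:value-of select="language:text('MORE')"/>
    <xsl:text>|</xsl:text>
    <xsl:value-of select="language:text('UNKNOWN')"/>
  </xsl:template>
</xsl:stylesheet>
XSLT;
file_put_contents($dir.'/template.xsl', $xslt);

// Load the stylesheet from its file so relative document() URIs resolve next to it.
$stylesheet = new DOMDocument();
$stylesheet->load($dir.'/template.xsl');

$processor = new XSLTProcessor();
if (!$processor->hasExsltSupport()) {
  die('EXSLT support is required for func:function.');
}
$processor->importStylesheet($stylesheet);
// Select the language file: de-DE.xml becomes $LANGUAGE_TEXTS_CURRENT.
$processor->setParameter('', 'PAGE_LANGUAGE', 'de-DE');

// The input document does not matter here, the template only calls the function.
$input = new DOMDocument();
$input->loadXML('<page/>');
$result = $processor->transformToXml($input);
echo $result, "\n";
```

With these sample files the three lookups should come from de-DE.xml, from the en-US.xml fallback, and from the identifier itself, in that order.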

You can extend this idea and have default and project specific language files.

Have fun experimenting.

2009-01-13

x