Searching and filtering with XPath

array xpath ( string path )

The standard way to search through XML documents for particular nodes is called XPath, and Sterling Hughes (the creator of the SimpleXML extension) described it as being "as important to XML as regular expressions are to plain text".

Fortunately for us, XPath is a darn sight easier than regular expressions for basic usage. That said, it might take a little while to get your head around all the possibilities it opens up to you!

Using the same employees.xml file, give this script a try:

<?php
    $xml = simplexml_load_file('employees.xml');

    echo "<strong>Using direct method...</strong><br />";
    $names = $xml->xpath('/employees/employee/name');
    foreach($names as $name) {
        echo "Found $name<br />";
    }
    echo "<br />";

    echo "<strong>Using indirect method...</strong><br />";
    $employees = $xml->xpath('/employees/employee');
    foreach($employees as $employee) {
        echo "Found {$employee->name}<br />";
    }
    echo "<br />";

    echo "<strong>Using wildcard method...</strong><br />";
    $names = $xml->xpath('//name');
    foreach($names as $name) {
        echo "Found $name<br />";
    }
?>

What that does is pull out the names of employees in three different ways. The real work is done in the call to the xpath() method. As you can see in the prototype, xpath() takes a query as its only parameter, and returns an array of the elements that match that query.
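To make the return value a little more concrete, here's a minimal sketch of what comes back from xpath(). It uses simplexml_load_string() on an inline copy of the data so it runs on its own; the first employee's name is a stand-in, because the contents of employees.xml aren't shown on this page. As far as I know, a malformed query makes xpath() raise a warning and return false, so that case is checked too.

```php
<?php
// Inline stand-in for employees.xml; the first name is assumed for illustration.
$xml = simplexml_load_string(<<<'XML'
<employees>
    <employee>
        <name>Anthony Clarke</name>
        <title>Chief Information Officer</title>
        <age>48</age>
    </employee>
    <employee>
        <name>Laura Pollard</name>
        <title>Chief Executive Officer</title>
        <age>54</age>
    </employee>
</employees>
XML
);

$names = $xml->xpath('/employees/employee/name');

// Each match is a SimpleXMLElement, so cast it when you need a plain string.
echo count($names), " matches\n";
echo get_class($names[0]), "\n";
echo (string) $names[0], "\n";

// A deliberately malformed query; @ hides the warning so we can look at the result.
$bad = @$xml->xpath('//employee[');
if ($bad === false) {
    echo "Query failed\n";
}
```

Checking for false before looping is a cheap safeguard if the query string is built from user input or configuration.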

The query itself has specialised syntax, but it's very easy. The first example says "Look in all the employees elements, find any employee elements in there, and retrieve all the names of them." It's very specific because only employees/employee/name is matched.

The second query matches all employee elements inside employees, but doesn't go specifically for the name of the employees. As a result, we get the full employee back, and need to print $employee->name to get the name.

The last one just looks for name elements, but note that it starts with "//" - this is the signal to do a global search for all name elements, regardless of where they are or how deeply nested they are in the document.
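The difference between the direct path and the "//" search only shows up when name elements live in more than one place. Here's a small sketch of that, using a made-up document where the employee also has a manager element (that element, and the name inside it, are invented purely for this example):

```php
<?php
// Hypothetical document: <manager> and "Sam Example" are invented for this sketch.
$xml = simplexml_load_string(<<<'XML'
<employees>
    <employee>
        <name>Laura Pollard</name>
        <manager>
            <name>Sam Example</name>
        </manager>
    </employee>
</employees>
XML
);

// Only name elements that sit directly inside /employees/employee.
$direct = $xml->xpath('/employees/employee/name');

// Every name element, however deeply nested.
$global = $xml->xpath('//name');

echo count($direct), " direct match(es)\n";
echo count($global), " global match(es)\n";

foreach ($global as $name) {
    echo "Found $name\n";
}
```

The direct query finds one name and the global one finds two, so "//" is handy for exploring but can pick up more than you meant in a bigger document.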

So, what we have here is the ability to grab specific parts of a document very easily, but that's really only the start of XPath's coolness. You see, you can also use it to filter your results according to any values you want. Try this script out:

<?php
    $xml = simplexml_load_file('employees.xml');

    echo "<strong>Matching employees with name 'Laura Pollard'</strong><br />";
    $employees = $xml->xpath('/employees/employee[name="Laura Pollard"]');

    foreach($employees as $employee) {
        echo "Found {$employee->name}<br />";
    }

    echo "<br />";

    echo "<strong>Matching employees younger than 54</strong><br />";
    $employees = $xml->xpath('/employees/employee[age<54]');

    foreach($employees as $employee) {
        echo "Found {$employee->name}<br />";
    }

    echo "<br />";

    echo "<strong>Matching employees as old or older than 48</strong><br />";
    $employees = $xml->xpath('//employee[age>=48]');

    foreach($employees as $employee) {
        echo "Found {$employee->name}<br />";
    }

    echo "<br />";

?>

Let's break that down to see how the querying actually works. The key part is, of course, between the square brackets, [ and ]. The first query grabs the employees element, then all employee elements inside it, but then filters them so that only those with a name matching Laura Pollard are returned. Once you get that, the other two are quite obvious: <, >, <=, and >= all work as you'd expect in PHP (the one difference being that equality is a single =, as in the first query). Note that I slipped in a double slash in the last example to show you that the global search notation works here too.
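One thing that can trip up PHP programmers is that XPath tests equality with a single = rather than ==, while != means "not equal". Here's a quick sketch of both, assuming the two-employee data used on this page (inlined so the script runs on its own; the first employee's name is a stand-in):

```php
<?php
// Inline stand-in for employees.xml; the first name is assumed for illustration.
$xml = simplexml_load_string(<<<'XML'
<employees>
    <employee>
        <name>Anthony Clarke</name>
        <title>Chief Information Officer</title>
        <age>48</age>
    </employee>
    <employee>
        <name>Laura Pollard</name>
        <title>Chief Executive Officer</title>
        <age>54</age>
    </employee>
</employees>
XML
);

// Everyone whose name is NOT Laura Pollard.
$notLaura = $xml->xpath('/employees/employee[name!="Laura Pollard"]');

// Everyone whose age is exactly 48: a single =, not ==.
$exactly48 = $xml->xpath('/employees/employee[age=48]');

foreach ($notLaura as $employee) {
    echo "Not Laura: {$employee->name}\n";
}

foreach ($exactly48 as $employee) {
    echo "Aged 48: {$employee->name}\n";
}
```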

You can grab only part of a query result by continuing on as normal afterwards, like this:

$ages = $xml->xpath('//employee[age>=48]/age');

foreach($ages as $age) {
    echo "Found $age ";
}

You can even run queries on queries, with an XPath search like this:

$employees = $xml->xpath('//employee[age>=49][name="Laura Pollard"]');
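For simple value filters like these, stacking two sets of square brackets should select the same elements as joining the conditions with XPath's "and" keyword. A minimal sketch comparing the two, again with the data inlined and the first employee's name assumed:

```php
<?php
// Inline stand-in for employees.xml; the first name is assumed for illustration.
$xml = simplexml_load_string(<<<'XML'
<employees>
    <employee>
        <name>Anthony Clarke</name>
        <title>Chief Information Officer</title>
        <age>48</age>
    </employee>
    <employee>
        <name>Laura Pollard</name>
        <title>Chief Executive Officer</title>
        <age>54</age>
    </employee>
</employees>
XML
);

// Two filters applied one after the other.
$chained = $xml->xpath('//employee[age>=49][name="Laura Pollard"]');

// The same two conditions joined with "and" inside one filter.
$combined = $xml->xpath('//employee[age>=49 and name="Laura Pollard"]');

echo count($chained), " match(es) from chained filters\n";
echo count($combined), " match(es) from the combined filter\n";
```

Which form you use is mostly a matter of taste; the "and" version keeps the whole condition in one place.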

Going back to selecting various types of elements, you can use the | symbol (a union, which works like OR) to select more than one type of element, like this:

echo " Retrieving all titles and ages ";
$results = $xml->xpath('//employee/title|//employee/age');

foreach($results as $result) {
    echo "Found $result ";
}

That will output the following:

Found Chief Information Officer
Found 48
Found Chief Executive Officer
Found 54

You can, of course, combine all of this together to search on more than one value, like this:

$names = $xml->xpath('//employee[age<40]/name|//employee[age>50]/name');

foreach($names as $name) {
    echo "Found $name ";
}
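When both halves of the query look at the same kind of element, the "or" keyword inside a single filter is a shorter way to write it. Here's a sketch comparing the two approaches, with the data inlined and the first employee's name assumed:

```php
<?php
// Inline stand-in for employees.xml; the first name is assumed for illustration.
$xml = simplexml_load_string(<<<'XML'
<employees>
    <employee>
        <name>Anthony Clarke</name>
        <title>Chief Information Officer</title>
        <age>48</age>
    </employee>
    <employee>
        <name>Laura Pollard</name>
        <title>Chief Executive Officer</title>
        <age>54</age>
    </employee>
</employees>
XML
);

// Two separate queries merged with |.
$union = $xml->xpath('//employee[age<40]/name|//employee[age>50]/name');

// One query with both conditions inside the filter.
$either = $xml->xpath('//employee[age<40 or age>50]/name');

foreach ($either as $name) {
    echo "Found $name\n";
}
```

With ages of 48 and 54, both versions find only Laura Pollard.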

For maximum insanity, you can actually run calculations using XPath in order to get tighter control over your queries. For example, if you only wanted the names of employees who have an odd age (that is, cannot be divided by two without leaving a remainder) you would use an XPath query like this:

$names = $xml->xpath('//employee[age mod 2 = 1]/name');

Along with "mod" (equivalent to % in PHP) there's also "div" for division, * for multiplication, + and - (same as PHP, except that - must always have whitespace either side of it, otherwise it may be read as part of an element name), and ceiling() and floor() (equivalent to ceil() and floor() in PHP). These are quite advanced and don't really get that much use in practice.
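Here's a small sketch that tries a few of those operators against the two-employee data (inlined so it runs on its own; the first employee's name is a stand-in). Note that with ages of 48 and 54, the odd-age query shown earlier would match nobody, so this one looks for even ages instead.

```php
<?php
// Inline stand-in for employees.xml; the first name is assumed for illustration.
$xml = simplexml_load_string(<<<'XML'
<employees>
    <employee>
        <name>Anthony Clarke</name>
        <title>Chief Information Officer</title>
        <age>48</age>
    </employee>
    <employee>
        <name>Laura Pollard</name>
        <title>Chief Executive Officer</title>
        <age>54</age>
    </employee>
</employees>
XML
);

// mod: employees with an even age (48 and 54 both qualify).
$even = $xml->xpath('//employee[age mod 2 = 0]/name');

// div and floor(): employees in their forties (48 div 10 is 4.8, floored to 4).
$forties = $xml->xpath('//employee[floor(age div 10) = 4]/name');

// Subtraction: note the spaces around the minus sign.
$sixYearsAgo = $xml->xpath('//employee[age - 6 = 48]/name');

echo count($even), " employee(s) with an even age\n";
echo "In their forties: {$forties[0]}\n";
echo "Was 48 six years ago: {$sixYearsAgo[0]}\n";
```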

However, there is a lot, lot more you can do with XPath, and for that I suggest you check out the Further Reading section!


Copyright ©2015 Paul Hudson. Follow me: @twostraws.