Hudzilla.org - the homepage of Paul Hudson

12.2.4     Event-based XML parsing, at last!


Parsing XML takes two steps. The second, harder part, we've already covered - working through the data with Expat. The first step is much easier, which is why I left it till last - you just need to read the contents of your selected XML file into a string, then pass the string to the parser.

Before you begin, one last note: a special option in PHP's XML parsing implementation, "Case Folding", is enabled by default. Case folding is the process of automatically converting the names of elements and the names of attributes to uppercase characters. You'd do well to remember this when you are wondering why all the element names passed to your element handler function are uppercased!
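As a quick aside, here is a minimal sketch of what case folding does to element names, and how to switch it off with xml_parser_set_option(). The function name collect_element_names() and the sample XML are made up for illustration; only the xml_* calls are PHP's own.

```php
<?php
// Hypothetical helper (not part of PHP): returns the element names that the
// start-element handler receives, with case folding switched on (1) or off (0).
function collect_element_names(string $xml, int $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();

    // XML_OPTION_CASE_FOLDING is on by default; 0 disables it
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name; // record each opening element name as reported
        },
        function ($parser, $name) {
            // closing elements are ignored in this sketch
        }
    );

    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<message><to>Joe</to></message>';
print implode(', ', collect_element_names($xml, 1)) . "\n"; // MESSAGE, TO
print implode(', ', collect_element_names($xml, 0)) . "\n"; // message, to
```

If your handler functions compare element names against lowercase strings, either uppercase those strings or turn case folding off as shown.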

Onto the code itself...

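<?php
    $file = '/path/to/somexmlfile.xml';

    // Stop early if the file is not there
    if (!file_exists($file)) {
        print "Error loading XML file - please check the file exists and that you have access to it.";
        exit;
    } else {
        print "XML file loaded successfully!<BR /><BR />";
    }

    // Read the whole file into a string
    $data = file_get_contents($file);

    // In a full script, create the parser and register your handlers before this point
    $parser = xml_parser_create();

    // true as the third parameter means "this is all the XML to come"
    if (!xml_parse($parser, $data, true)) {
        print "<BR /><BR />Parse error!";
    } else {
        print "<BR /><BR />Parsing complete.";
    }
?>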

In the code, we first check that the XML file exists using file_exists($file), then read its contents entirely into a string, $data, using file_get_contents($file). Note that you will need to alter the $file variable to point to an XML file that is stored somewhere PHP can access it.

With the XML document read into a string, there is no file handle to close, because file_get_contents() opens and closes the file for us. As the data is now stored entirely in the string $data, we can parse it using xml_parse(). xml_parse() takes three parameters: a reference to the XML parser being used (created earlier with xml_parser_create()), the XML to parse, and a third parameter, "is this all the XML to come?" The third parameter is useful in situations where you want to feed XML chunk-by-chunk into the parser: pass false for every chunk except the last. xml_parse() returns 1 if the data was parsed successfully and 0 if it fails, so the result can be tested as true or false.
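To show the chunk-by-chunk approach, here is a rough sketch that reads the file in pieces with fopen() and fread() instead of loading it all at once. The function name parse_xml_file_in_chunks(), the chunk size, and the temporary test file are my own assumptions for illustration; a real script would register its own handlers on the parser.

```php
<?php
// Hypothetical helper (not part of PHP): feeds $file to an existing parser in
// fixed-size chunks. Returns true if the whole document parsed, false otherwise.
function parse_xml_file_in_chunks($parser, string $file, int $chunkSize = 4096): bool
{
    if (!is_readable($file)) {
        return false;
    }

    $fp = fopen($file, 'r');
    if ($fp === false) {
        return false;
    }

    $ok = true;
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false) {
            $ok = false;
            break;
        }

        // feof() turns true once the end of the file has been reached, so it
        // answers the "is this all the XML to come?" question for us
        if (!xml_parse($parser, $chunk, feof($fp))) {
            $ok = false;
            break;
        }
    }

    fclose($fp);
    return $ok;
}

// Usage sketch: write a small sample file, then count its elements
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($file, '<list><item>one</item><item>two</item></list>');

$count = 0;
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$count) {
        $count++; // one more opening element seen
    },
    function ($parser, $name) {
        // closing elements are ignored in this sketch
    }
);

// A tiny chunk size forces several calls to xml_parse()
if (parse_xml_file_in_chunks($parser, $file, 8)) {
    print "Parsing complete: $count elements seen.\n";
} else {
    print "Parse error!\n";
}

unlink($file);
```

The handlers fire as each chunk is parsed, so only one chunk of the file needs to be in memory at a time, which is one of the main attractions of event-based parsing.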

In the code above, if the result from xml_parse() is false - that is, if the XML failed to parse - we print out a failure message, otherwise we print out a congratulatory status message.
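A bare "Parse error!" is not much help when tracking down a problem. As a sketch, the parser can be asked what went wrong using xml_get_error_code(), xml_error_string() and xml_get_current_line_number(). The function name describe_parse_error() and the broken sample XML are invented for this example, and the exact wording of the message depends on the XML library PHP was built against.

```php
<?php
// Hypothetical helper (not part of PHP): parses a string and returns null on
// success, or a readable description of the first parse error.
function describe_parse_error(string $data): ?string
{
    $parser = xml_parser_create();

    if (xml_parse($parser, $data, true)) {
        return null; // parsed cleanly, nothing to report
    }

    return sprintf(
        'XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// The <to> element is never closed, so this document is not well-formed
$error = describe_parse_error("<note>\n<to>Joe</note>");

if ($error === null) {
    print "Parsing complete.\n";
} else {
    print $error . "\n";
}
```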

If you have made it this far, you have all the necessary ingredients to create a full XML parsing script. You have already seen all the necessary code; it is just a matter of Bringing Everything Together.






Comments from other readers
Romack L. Natividad / romacknatividad@yahoo.com - 07 Sep 2008

$data = fread($fp, filesize($file)); // not $fp

A PHP User - 07 Sep 2008

Since one large advantage of the event-driven model is that you do not need all of the XML in memory at once, would it not make more sense to show an example that reads the file in pieces?

A PHP User - 07 Sep 2008

is set to true by default ... and it is enabled by default

singpolyma AT gmail.com - 07 Sep 2008

file_get_contents($file); was used in the example, not fread


