Hudzilla.org - the homepage of Paul Hudson

12.2.5     Bringing Everything Together


Firstly, here's a new XML document for you to parse - it has the same format as the prior document, except it has more elements to make the output more exciting:

<?xml version="1.0"?>
<newsitems>
    <news type="programming">
    PHP 6.0 has been released!
    </news>

    <news type="programming">
    Larry Wall switches to PHP!
    </news>

    <news type="sci-tech">
    Woman lands on Mars!
    </news>

    <news type="programming">
    XML takes over world!
    </news>
</newsitems>

Save the new document over the old one, as we will be using it from now on. Notice that it now starts with the standard XML declaration, for the sake of compatibility.

Now, onto the PHP code itself. I am going to run over the complete code for an event-based XML-parsing script, and at the same time I will be introducing a couple of new functions to add extra functionality to your parsing scripts. You'll recognise a lot of the code from what you have just read, but there are a few new bits in there to keep you on your toes...

<?php
    $parser = xml_parser_create();

    // Called each time the parser meets an opening tag
    function startElement($parser, $el_name, $attributes) {
        $line = xml_get_current_line_number($parser);
        // Not every element has a TYPE attribute (<newsitems> does not),
        // so fall back to an empty string rather than reading a missing key
        $attribute_type = $attributes['TYPE'] ?? '';

        switch ($attribute_type) {
            case "programming":
                print "Programming headline found on line $line<BR />";
                break;
            case "sci-tech":
                print "Sci/tech headline found on line $line<BR />";
                break;
        }
    }

    // Called each time the parser meets a closing tag
    function endElement($parser, $el_name) {
        print "Closed element $el_name.<BR />";
    }

    xml_set_element_handler($parser, "startElement", "endElement");

    // Called for the text found between tags
    function charData($parser, $chardata) {
        $line = xml_get_current_line_number($parser);
        $chardata = trim($chardata);
        if ($chardata == "") return;

        print "Character data found on line $line. The data was $chardata<BR />";
    }

    xml_set_character_data_handler($parser, "charData");

    $file = '/path/to/somexmlfile.xml';

    if (!file_exists($file)) {
        print "Error loading XML file - please check the file exists and that you have access to it.";
        exit;
    } else {
        print "XML file loaded successfully!<BR /><BR />";
    }

    $data = file_get_contents($file);

    if (!xml_parse($parser, $data, true)) {
        print "<H1>Unrecoverable XML error encountered! </H1>";
        printf("<P> The error report was %s at line %d</P>",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
    } else {
        print "<BR /><BR />Parsing complete.";
    }

    xml_parser_free($parser);
?>

All being well, you should be able to recognise the majority of that code, despite the new pieces being in there. In startElement(), a new variable, $attribute_type, is set to the TYPE attribute of the element being passed in. This is then used in a switch/case statement to select the correct output for each type of news item. Notice that the attribute name is "TYPE" and not "type" because, as mentioned already, case-folding (automatic uppercasing) is enabled by default.
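To see case-folding in action, here is a minimal sketch that parses one news item twice, once with the default setting and once with XML_OPTION_CASE_FOLDING switched off, and lists the attribute names the start handler receives. The helper name collectAttributeKeys() is made up for this illustration and is not part of the script above.

```php
<?php
// Sketch: collect the attribute names passed to the start-element handler.
// collectAttributeKeys() is a hypothetical helper used only for this example.
function collectAttributeKeys(string $xml, int $caseFolding): array {
    $keys = [];
    $parser = xml_parser_create();
    // 1 = uppercase element and attribute names (the default), 0 = leave them alone
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attributes) use (&$keys) {
            foreach (array_keys($attributes) as $key) {
                $keys[] = $key;
            }
        },
        function ($parser, $name) {
            // Nothing to do when an element closes.
        }
    );
    xml_parse($parser, $xml, true);
    return $keys;
}

$xml = '<newsitems><news type="sci-tech">Woman lands on Mars!</news></newsitems>';
print implode(',', collectAttributeKeys($xml, 1)) . "\n"; // TYPE
print implode(',', collectAttributeKeys($xml, 0)) . "\n"; // type
```

If you would rather look attributes up by their original lowercase names, turning case-folding off before parsing is one way to do it.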

Also in startElement(), a new function appears, xml_get_current_line_number(). This takes a parser reference as its only parameter, and returns the number of the line currently being parsed as an integer. This is one of the advantages of event-based parsing - your callback functions are called when the appropriate XML is matched, which means you can get information about the line number, errors, and more.
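The line number is not the only position information on offer: xml_get_current_column_number() and xml_get_current_byte_index() work the same way. The sketch below records all three each time an element opens. The recordPositions() helper is hypothetical, and the exact column and byte values can vary with the underlying XML library, so treat the output as illustrative.

```php
<?php
// Sketch: record where the parser is each time an element opens.
// recordPositions() is a hypothetical helper, not part of the chapter's script.
function recordPositions(string $xml): array {
    $positions = [];
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attributes) use (&$positions) {
            $positions[] = [
                'element' => $name,
                'line'    => xml_get_current_line_number($parser),
                'column'  => xml_get_current_column_number($parser),
                'byte'    => xml_get_current_byte_index($parser),
            ];
        },
        function ($parser, $name) {
            // Nothing to do when an element closes.
        }
    );
    xml_parse($parser, $xml, true);
    return $positions;
}

$xml = "<newsitems>\n    <news type=\"sci-tech\">Woman lands on Mars!</news>\n</newsitems>";
foreach (recordPositions($xml) as $p) {
    print "{$p['element']}: line {$p['line']}, column {$p['column']}, byte {$p['byte']}\n";
}
```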

In the charData() function, we now run the character data passed in through the trim() function. This is generally a good idea because very often you will find the character data contains spaces or line-breaks at the beginning and/or end, and this helps clean it up.
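As a quick illustration of why the trim() call matters, here is a small sketch using the kind of strings the character data handler is likely to receive for the document above. The sample strings are assumptions based on how the XML file is indented, not captured parser output.

```php
<?php
// Sketch: what trim() does to typical character data.
// $headline mimics the text inside a <news> element, indentation included.
$headline = "\n    PHP 6.0 has been released!\n    ";
// $gap mimics the whitespace between two elements.
$gap = "\n\n    ";

print "[" . trim($headline) . "]\n"; // [PHP 6.0 has been released!]

if (trim($gap) == "") {
    // This is the case charData() skips with its early return.
    print "Whitespace-only data ignored\n";
}
```

Without the early return, the script would print a "Character data found" line for every run of whitespace between tags.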

Finally, there is a new line of code to be run when XML parsing fails, and it introduces two more functions - xml_get_error_code(), and xml_error_string().
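The two functions work as a pair: xml_get_error_code() returns a numeric code for the last error on a parser, and xml_error_string() turns that code into a readable message. A minimal sketch, assuming a hypothetical helper called describeXmlError() and a deliberately broken document:

```php
<?php
// Sketch: turn a failed parse into a readable message.
// describeXmlError() is a hypothetical helper; it returns null when the XML parses cleanly.
function describeXmlError(string $xml): ?string {
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    $code = xml_get_error_code($parser);   // numeric error code
    $message = xml_error_string($code);    // human-readable description of that code
    return sprintf("%s at line %d", $message, xml_get_current_line_number($parser));
}

// The <news> element is never closed, so parsing should fail.
$broken = "<newsitems>\n    <news>XML takes over world!\n</newsitems>";
$report = describeXmlError($broken);
print ($report === null) ? "Parsed without errors\n" : "Error: $report\n";
```

The exact wording of the message depends on the underlying XML library, so it is safer to branch on the error code than to compare message strings.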






Comments from other readers
A PHP User - 06 Sep 2008

You could always do...

if (!file_get_contents($file)) {
print "Error loading XML file - please check the file exists and that you have access to it.";
exit;
} else {
print "XML file loaded successfully!<BR /><BR />";
}

seeing as file_get_contents() returns false if something goes wrong, but I suppose that could also reduce readability to some extent.

A PHP User - 06 Sep 2008

function endElement($parser, $el_name) {
print "Closed element $el_name.<BR />";
}

And what is the first parameter ($parser) for?

Rodrigo - 06 Sep 2008

I guess so:

is_readable() returns TRUE if the file or directory specified by filename exists and is readable.

A PHP User - 06 Sep 2008

Wouldn't it be a smart move to use is_readable() instead of file_exists()?


