Hudzilla.org - the homepage of Paul Hudson

19.11     Output style

This is NOT the latest copy of this book.

Because PHP generates its output dynamically, it is actually rather easy to produce messy output that is hard to read. While this is not a problem in itself, it does not reflect well on you and your website, and it also makes the generated HTML source hard to follow when you have debugging to do.

However, one of the newer extensions available in PHP is Tidy, which, amongst other things, can clean up and repair poorly written HTML. More advanced users may want to use it to traverse their HTML documents in PHP, but, let's face it, it's called Tidy for a reason: tidying is what it does best.
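The examples in this section read a file with the tidy class, but Tidy works just as well on HTML that is already held in a PHP string. Here is a minimal sketch, assuming the Tidy extension is installed. It is optional, so the script checks for it first. The $messy markup is an illustrative sample, and tidy_repair_string() parses and repairs it in a single call:

```php
<?php
// Bail out early if the Tidy extension isn't loaded; many PHP builds omit it.
if (!extension_loaded('tidy')) {
    exit("The Tidy extension is not available.\n");
}

// Illustrative messy markup: mixed-case tags and no <html>/<head> wrapper.
$messy = "<TITLE>Quick test</title>\n<BODY>\nHello<BR>world\n";

// Parse and repair in one call, using Tidy's default options.
// Returns the cleaned document as a string, or false on failure.
$clean = tidy_repair_string($messy);

echo $clean;
```

This is handy when the HTML comes from a database or a template rather than from a file on disk.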

Here's an example HTML document, saved as lame.html:

<TITLE>This is bad HTML</title>

<BODY>
This would get rejected as XHTML for a number of reasons.
First, the <FOO> tag doesn't exist.<BR>Second, the tags aren't the same case.
Third, tags that don't end, like <HR>, aren't allowed.<BR>
Tidy should fix all this for us!

As you can see, it's quite messy. Let's put it through Tidy with no particular options set:

<?php
    // Load and parse lame.html, then repair it using Tidy's default options
    $tidy = new tidy("lame.html");
    $tidy->cleanRepair();
    echo $tidy;
?>

That will output the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>This is bad HTML</title>
</head>
<body>
This would get rejected as XHTML for a number of reasons. First,
the tag doesn't exist.<br>
Second, the tags aren't the same case. Third, tags that don't end,
like
<hr>
, aren't allowed.<br>
Tidy should fix all this for us!
</body>
</html>

As you can see, Tidy has made several changes. First, it has added a doctype and the missing <html>, <head>, </body> and </html> tags to make the overall document compliant, and it has normalised the case of the elements. Second, it has removed the FOO tag because it's invalid. Third, it has wrapped the lines so they aren't too long. Finally, it has started a new line after many of the tags, such as <br> and <hr>.
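Those are the fixes Tidy applied. If you also want to know what it objected to, the tidy object keeps a log of the warnings and errors it found while parsing in its errorBuffer property. A minimal sketch, assuming the Tidy extension is loaded. It parses an illustrative string with parseString() instead of reading lame.html:

```php
<?php
// Illustrative input: no doctype and no <html> or <head> tags.
$html = "<TITLE>Report test</title>\n<BODY>\nHello<BR>world\n";

// Create an empty tidy object, then parse the string into it.
$tidy = new tidy();
$tidy->parseString($html);

// errorBuffer holds Tidy's diagnostics from parsing, one per line
// (for this input, for example, a warning about the missing doctype).
echo $tidy->errorBuffer, "\n";

// Repairing and printing work exactly as with a file.
$tidy->cleanRepair();
echo $tidy;
```

The procedural tidy_get_error_buffer() function returns the same information.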

I don't know about you, but I find line-wrapping at a fixed width an alien concept. Even when I'm programming on a command-line Linux box (which I do quite regularly!), I still type very long lines of code; I make them as long as they need to be! Fortunately, we can stop Tidy from wrapping lines by passing it configuration options. Tidy accepts quite a variety of options, and we'll go over some of the popular ones in a moment. First things first, though: let's blast line wrapping and make the output actually look tidy!

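<?php
    // Indent nested elements and push the wrap margin out to 1000 characters
    $tidyoptions = array("indent" => true,
                         "wrap" => 1000);

    // The options array is passed as the constructor's second parameter
    $tidy = new tidy("lame.html", $tidyoptions);
    $tidy->cleanRepair();
    echo $tidy;
?>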

This time we store the options in an array and pass it as the second parameter to the tidy constructor, enabling indent mode and raising the line-wrap limit to 1000 characters. Here's how that looks:
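A limit of 1000 simply makes wrapping unlikely. In HTML Tidy, setting wrap to 0 turns line wrapping off altogether. The sketch below assumes the Tidy extension is loaded. It passes the same kind of options array while parsing an illustrative string with parseString(), then uses getOpt() to read back the value Tidy is actually using:

```php
<?php
// As in the text, but wrap is 0, which disables line wrapping entirely
// instead of wrapping at a very wide margin.
$tidyoptions = array("indent" => true,
                     "wrap"   => 0);

// Illustrative input; the text reads lame.html from disk instead.
$html = "<TITLE>Options test</title>\n<BODY>\nHello<BR>world\n";

$tidy = new tidy();
$tidy->parseString($html, $tidyoptions);
$tidy->cleanRepair();

// getOpt() reports the current value of a single configuration option.
var_dump($tidy->getOpt("wrap"));
echo $tidy;
```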

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
  <head>
    <title>
      This is bad HTML
    </title>
  </head>
  <body>
    This would get rejected as XHTML for a number of reasons. First, the tag doesn't exist.<br>
    Second, the tags aren't the same case. Third, tags that don't end, like
    <hr>
    , aren't allowed.<br>
    Tidy should fix all this for us!
  </body>
</html>

Much better, but not yet perfect: it's valid HTML 3.2 now, but I'd much rather we went the whole hog and made it valid XHTML. Does that involve rewriting the HTML? Of course not, thanks to Tidy!

<?php
    // Same options as before, plus output-xhtml to produce XHTML instead of HTML
    $tidyoptions = array("indent" => true,
                         "wrap" => 1000,
                         "output-xhtml" => true);

    $tidy = new tidy("lame.html", $tidyoptions);
    $tidy->cleanRepair();
    echo $tidy;
?>

That extra option makes a world of difference. Take a look at the output now:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      This is bad HTML
    </title>
  </head>
  <body>
    This would get rejected as XHTML for a number of reasons. First, the tag doesn't exist.<br />
    Second, the tags aren't the same case. Third, tags that don't end, like
    <hr />
    , aren't allowed.<br />
    Tidy should fix all this for us!
  </body>
</html>

Now we get the works: a full XHTML doctype, the XHTML namespace on the html element, indented tags, and every tag properly closed. This is what we should be aiming for as standard.
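If you want every page your scripts produce to come out like this, one approach is to collect the script's output with PHP's output buffering and pass it through Tidy just before it is sent. A minimal sketch, assuming the Tidy extension is loaded. tidy_output_callback is a hypothetical helper name, and the options are the ones used above:

```php
<?php
// Hypothetical helper: called by PHP with the buffered page output,
// and whatever it returns is what actually gets sent.
function tidy_output_callback(string $buffer): string
{
    $tidyoptions = array("indent"       => true,
                         "wrap"         => 1000,
                         "output-xhtml" => true);

    $repaired = tidy_repair_string($buffer, $tidyoptions);

    // If Tidy returns nothing usable, fall back to the original markup.
    if ($repaired === false || $repaired === '') {
        return $buffer;
    }
    return $repaired;
}

// Start buffering; everything echoed from here on goes through the callback.
ob_start("tidy_output_callback");

// Deliberately sloppy markup, as a dynamic script might produce it.
echo "<TITLE>Generated page</title>\n";
echo "<BODY>\nFirst line<BR>\n<HR>\nLast line\n";

// Flush the buffer: the callback repairs it and the result is sent.
ob_end_flush();
```

Tidying the whole buffer once is simpler than tidying each fragment, at the cost of holding the full page in memory until the script finishes.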




