Parsing XML With PHP
by Marc Robards
Last week, we presented a simple framework (named XMLCast) for distributing
content to a variety of devices using XML. That application was built with
Microsoft's Active Server Pages (ASP) technology, but we realize that many of
you aren't using ASP (we aren't either). This article introduces XML parsing
with the PHP scripting language. In the coming weeks, we will follow this
example with an expansion of XMLCast using other tools such as XSLT and Cocoon.
Recently, the Wireless Developer Network began offering our daily news in
a variety of formats for people who want their news delivered in ways other
than standard HTML. Among the formats we offer is Rich Site Summary (RSS),
an XML format that splits items (news headlines, in this case) into easily
extractable elements. This lets other sites grab our latest news headlines,
format them as they wish, and list them on their own pages, all with the
convenience of XML data exchange.
RSS version 0.91 was developed by Netscape for its "My Netscape Network". It
lets a site create an XML file that contains basic information about the
site, plus "items" that can have "title", "link", and "description" nodes.
That's great, you say, but now that we've got the RSS XML document, how do
we extract the information and serve it up as HTML? Each language has its
own way to deal with XML; for this example, we're using PHP and its
included XML parser. PHP uses James Clark's expat library, which you already
have if you are using Apache 1.3.9 or later. To parse XML with PHP, you must
configure PHP with the --with-xml option before running make and
make install.
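On more recent PHP releases the XML parser extension is typically compiled in by default, so a rebuild may not be necessary. A script can check for it at run time before relying on it. The sketch below uses the standard extension_loaded() and function_exists() calls; the helper name xmlParserAvailable() is a hypothetical name used only for illustration.

```php
<?php
// Check at run time whether PHP's XML parser functions are available.
// xmlParserAvailable() is a hypothetical helper name for illustration.
function xmlParserAvailable(): bool
{
    return extension_loaded('xml') && function_exists('xml_parser_create');
}

if (xmlParserAvailable()) {
    echo "XML parser support is available\n";
} else {
    echo "XML parser support is missing; PHP needs to be built with XML support\n";
}
```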
We've written a simple PHP script that parses the RSS file, extracts the
pertinent information, formats it, and serves it up as regular HTML. Besides
showing how to parse an RSS file with PHP, the script can be added to any PHP
page to display automatically updated news headlines straight from our site.
The complete source code for the script is available for download.
The first thing we do is create a class to hold our headlines:
class xItem {
    var $xTitle;        // headline text
    var $xLink;         // URL of the full story
    var $xDescription;  // short summary of the story
}
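Then, we define a few global variables for the general site information,
an array to hold the headline objects, and a counter for that array. We also
initialize $curTag, which the parsing functions use to track the current tag.
$curTag = "";
$sTitle = "";
$sLink = "";
$sDescription = "";
$arItems = array();
$itemCount = 0;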
The meat of the XML parsing is in the next three functions: startElement,
endElement, and characterData. We've used a nice trick by David Medinets from
his book PHP3 - Programming Browser-Based Applications for extracting the XML
data in PHP. With PHP's implementation of XML, there's no easy way to avoid
global variables, but David's approach is one of the most straightforward
PHP-XML implementations we've found. Here are the first two functions:
function startElement($parser, $name, $attrs) {
    global $curTag;
    // push: append the new tag name to the path
    $curTag .= "^$name";
}
function endElement($parser, $name) {
    global $curTag;
    // pop: cut the path back to just before its last caret
    $caret_pos = strrpos($curTag,'^');
    $curTag = substr($curTag,0,$caret_pos);
}
To parse XML in PHP, you define functions to handle three events:
a) when the parser encounters the start tag of an element
b) when the parser encounters the end tag of an element
c) when the parser encounters the character data between the start and end tags
We handle these events by keeping a global variable ($curTag) set to a string
containing all the currently open tags, separated by carets (^). For example,
an XML structure that looks like:
<rss>
  <channel>
    <item>
    </item>
  </channel>
</rss>
would translate to a $curTag of:
^RSS^CHANNEL^ITEM
when the parser has found the <item> tag (PHP's parser converts tag names to
uppercase by default). All we have to do is check when $curTag matches the
path we want and extract the data accordingly.
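To watch the path being built, the sketch below reuses the same push/pop logic as startElement and endElement and records $curTag each time a tag opens. The short RSS string and the $paths array are made up for illustration; the xml_* calls are PHP's standard XML parser functions.

```php
<?php
// Record the caret-separated path ($curTag) each time a tag opens,
// using the same push/pop technique as startElement/endElement.
$curTag = "";
$paths = array();

function startElement($parser, $name, $attrs) {
    global $curTag, $paths;
    $curTag .= "^$name";   // push the tag onto the path
    $paths[] = $curTag;    // keep a copy for inspection
}

function endElement($parser, $name) {
    global $curTag;
    // pop the last tag by cutting the path at its final caret
    $curTag = substr($curTag, 0, strrpos($curTag, '^'));
}

// Hypothetical sample feed, reduced to a few tags
$xml = '<rss version="0.91"><channel><title>News</title>'
     . '<item><title>Headline</title></item></channel></rss>';

$parser = xml_parser_create();   // tag names are upper-cased by default
xml_set_element_handler($parser, "startElement", "endElement");
xml_parse($parser, $xml, true);  // whole document in one call

foreach ($paths as $path) {
    echo $path, "\n";
}
```

Running it records five paths, from ^RSS down to ^RSS^CHANNEL^ITEM^TITLE, and $curTag is back to an empty string once </rss> closes.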
Checking $curTag and extracting the data are both done in the characterData function. Here it is:
function characterData($parser, $data) {
    global $curTag;
    // get the Channel information first
    global $sTitle, $sLink, $sDescription;
    $titleKey = "^RSS^CHANNEL^TITLE";
    $linkKey = "^RSS^CHANNEL^LINK";
    $descKey = "^RSS^CHANNEL^DESCRIPTION";
    if ($curTag == $titleKey) {
        $sTitle = $data;
    }
    elseif ($curTag == $linkKey) {
        $sLink = $data;
    }
    elseif ($curTag == $descKey) {
        $sDescription = $data;
    }
    // now get the items
    global $arItems, $itemCount;
    $itemTitleKey = "^RSS^CHANNEL^ITEM^TITLE";
    $itemLinkKey = "^RSS^CHANNEL^ITEM^LINK";
    $itemDescKey = "^RSS^CHANNEL^ITEM^DESCRIPTION";
    if ($curTag == $itemTitleKey) {
        // make new xItem
        $arItems[$itemCount] = new xItem();
        // set new item object's properties
        $arItems[$itemCount]->xTitle = $data;
    }
    elseif ($curTag == $itemLinkKey) {
        $arItems[$itemCount]->xLink = $data;
    }
    elseif ($curTag == $itemDescKey) {
        $arItems[$itemCount]->xDescription = $data;
        // increment item counter
        $itemCount++;
    }
}
The characterData function checks whether $curTag is a path we want to
extract and, if it is, assigns the data to the matching variable. The first
chunk extracts the general information about the site from the <CHANNEL>
element. The second handles elements inside an <ITEM>: when it sees an item's
title, it creates a new xItem, stores it in our $arItems array, and sets its
title; the item's link and description fill in the remaining properties, and
$itemCount moves on to the next item after the description.
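One caveat: PHP's parser may call the character data handler several times for a single element's text (entity references such as &amp; commonly cause a split), and each plain = assignment in characterData keeps only the last fragment. The code above also assumes every item has a description, since that is where $itemCount advances. A minimal variant, sketched under those assumptions, appends text with .= and creates each xItem when the <ITEM> start tag opens. It keeps the article's global-variable style; the sample feed is made up.

```php
<?php
// Variant of the article's handlers: text is appended with .= so data
// delivered in several chunks is kept whole, and a new xItem is created
// when <ITEM> opens instead of when its title arrives.
class xItem {
    public $xTitle = "";
    public $xLink = "";
    public $xDescription = "";
}

$curTag = "";
$sTitle = "";
$sLink = "";
$sDescription = "";
$arItems = array();

function startElement($parser, $name, $attrs) {
    global $curTag, $arItems;
    $curTag .= "^$name";
    if ($curTag == "^RSS^CHANNEL^ITEM") {
        $arItems[] = new xItem();   // one object per <item>
    }
}

function endElement($parser, $name) {
    global $curTag;
    $curTag = substr($curTag, 0, strrpos($curTag, '^'));
}

function characterData($parser, $data) {
    global $curTag, $sTitle, $sLink, $sDescription, $arItems;
    $last = count($arItems) - 1;    // index of the item being filled
    switch ($curTag) {
        case "^RSS^CHANNEL^TITLE":       $sTitle .= $data; break;
        case "^RSS^CHANNEL^LINK":        $sLink .= $data; break;
        case "^RSS^CHANNEL^DESCRIPTION": $sDescription .= $data; break;
        case "^RSS^CHANNEL^ITEM^TITLE":  $arItems[$last]->xTitle .= $data; break;
        case "^RSS^CHANNEL^ITEM^LINK":   $arItems[$last]->xLink .= $data; break;
        case "^RSS^CHANNEL^ITEM^DESCRIPTION":
            $arItems[$last]->xDescription .= $data;
            break;
    }
}

// Hypothetical sample feed; the second item has no description
$rss = <<<XML
<rss version="0.91">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>Tom &amp; Jerry</title>
      <link>http://www.example.com/1</link>
      <description>First item</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>
XML;

$parser = xml_parser_create();
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");
if (!xml_parse($parser, $rss, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

foreach ($arItems as $item) {
    echo $item->xTitle, " -> ", $item->xLink, "\n";
}
```

Creating the object at the start tag also means an item without a description is no longer overwritten by the next item, which is what the second sample item exercises.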
Now that the functions are defined, we use PHP's standard way of assigning our
functions to the XML parser:
// main loop
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($uFile,"r"))) {
    die ("could not open RSS for input");
}
// feed the file to the parser in 4096-byte chunks
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
Everything in the code above that starts with "xml_" is a standard PHP XML
function. We tell PHP's XML parser to run our functions when it comes across
a start tag, an end tag, or character data; then we open the RSS file ($uFile,
set to our RSS document), feed it to the parser (xml_parse) in 4096-byte
chunks, and stop with an error message and line number if the XML is
malformed.
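Two details of this loop are worth seeing in isolation. xml_parse() can also take an entire document as one string when its third argument (is_final) is true, and the XML_OPTION_CASE_FOLDING parser option, enabled by default, is what turns lowercase RSS tags into the uppercase names used in the $curTag keys. The sketch below parses a small string twice, with folding on and off; the function names recordStart, ignoreEnd, and namesFor are hypothetical.

```php
<?php
// Collect element names with case folding switched on or off.
$seen = array();

function recordStart($parser, $name, $attrs) {
    global $seen;
    $seen[] = $name;
}

function ignoreEnd($parser, $name) {
    // end tags are not needed for this demonstration
}

function namesFor(string $xml, bool $fold): array {
    global $seen;
    $seen = array();
    $parser = xml_parser_create();
    // 1 = upper-case element names (the default), 0 = keep them as written
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $fold ? 1 : 0);
    xml_set_element_handler($parser, "recordStart", "ignoreEnd");
    xml_parse($parser, $xml, true);   // whole string, marked as final
    return $seen;
}

$doc = '<rss><channel><title>News</title></channel></rss>';
echo implode(", ", namesFor($doc, true)), "\n";    // RSS, CHANNEL, TITLE
echo implode(", ", namesFor($doc, false)), "\n";   // rss, channel, title
```

If you turn folding off in your own script, the $curTag keys in characterData would need to be written in lowercase to match.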
Now that we have the data in nice little objects and variables, formatting it
and serving it up is simple: the script loops through our array of items and
echoes each one in a basic format. We've also added a few user-defined
variables to set the font, font size, and whether or not you want the
descriptions along with the headlines (see the source code for details).
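A minimal sketch of such an output loop follows. It assumes the parsed values are passed in rather than read from globals, and the function name renderHeadlines() and its $showDescriptions flag are hypothetical stand-ins for the script's own output code and options. Because the headlines come from another site's feed, it escapes every value with htmlspecialchars() before writing HTML.

```php
<?php
// Build simple HTML from the channel info and an array of xItem objects.
class xItem {
    public $xTitle = "";
    public $xLink = "";
    public $xDescription = "";
}

function renderHeadlines(string $sTitle, string $sLink, array $arItems,
                         bool $showDescriptions): string {
    // escape feed text so it cannot inject markup into our page
    $h = function (string $s): string {
        return htmlspecialchars(trim($s), ENT_QUOTES);
    };
    $html = '<a href="' . $h($sLink) . '"><b>' . $h($sTitle) . "</b></a><br>\n";
    foreach ($arItems as $item) {
        $html .= '<a href="' . $h($item->xLink) . '">'
               . $h($item->xTitle) . "</a><br>\n";
        if ($showDescriptions && $item->xDescription !== "") {
            $html .= $h($item->xDescription) . "<br>\n";
        }
    }
    return $html;
}

// Demo with a hand-built item (hypothetical data)
$item = new xItem();
$item->xTitle = "Tom & Jerry";
$item->xLink = "http://www.example.com/1";
$item->xDescription = "First item";

echo renderHeadlines("Example News", "http://www.example.com/", array($item), true);
```

Passing the values in as arguments, rather than reading globals, keeps the function easy to reuse on any page.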
When it comes to exchanging data, XML is hard to beat. Defining an XML format
that many people can use (like RSS) is just one of the benefits of this
sophisticated, yet elegant, technology. Parsing XML in PHP may not be
straightforward at first, but once you get a handle on it, the possibilities
for exchanging data (especially over the Internet) are endless.
About The Author: Marc Robards is a Microsoft
Certified Solutions Developer who is searching for the perfect balance
between Windows and Linux. Marc can be reached at marc@wirelessdevnet.com.