The Mobile Developer
by Eric Giguère
XHTML and XHTML Basic: the Future of HTML
In my last column I referred you to XHTML as a way to
transition from an HTML-based website to an XML-based website.
Since XHTML is the future of HTML, I thought it was worth a
closer look.
XHTML Defined
XHTML is an XML encoding of HTML 4. XHTML is defined by the
World Wide Web Consortium (W3C for short), which also defines
the various XML and HTML standards.
XHTML Basic is a subset of XHTML geared specifically for small
devices that can't support the full functionality of XHTML --
more on that later.
Why was XHTML developed? Although both XML and HTML are derivatives
of SGML -- Standard Generalized Markup Language -- there are
important differences between the two markup languages.
HTML is quite forgiving when it comes to tag placement,
missing tags, and attribute values. A web browser, for example,
will correctly interpret an HTML page even if you leave out
the <HTML> and </HTML> tags. Nor does it require
that you use the </P> tag to mark the end of a paragraph.
XML, however, has strict rules about the format and use of
tags. These rules make it easier to parse XML documents.
And more importantly, different XML parsers will interpret a
given XML document in the same way. HTML parsers are not
as consistent because the rules aren't as strict, which leads
to differences in the way web browsers display pages.
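To see how strict an XML parser is compared to a browser, here is a small Python sketch. It uses the standard library's xml.etree.ElementTree module; the helper name is_well_formed and the two sample strings are my own, chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: <P> and <HR> are never closed, which browsers tolerate.
sloppy = "<HTML><BODY><P>Hello world!<HR><P>Hello again!</BODY></HTML>"

# The same content with every element explicitly closed.
strict = "<html><body><p>Hello world!</p><hr/><p>Hello again!</p></body></html>"

print(is_well_formed(sloppy))  # False
print(is_well_formed(strict))  # True
```

The check only tests well-formedness. It does not validate the markup against a DTD, so it says nothing about whether the elements are used correctly.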
From HTML to XHTML
XHTML is the bridge between HTML and XML. It imposes XML's
strict syntax on HTML and defines XML document type definitions
(DTDs) covering HTML 4.01. Take, for example, this simple
HTML document:
<HTML>
<HEAD><TITLE>Hello World!</TITLE></HEAD>
<BODY>
Hello world!
<HR>
Hello again!
</BODY>
</HTML>
When converted to XHTML it looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Hello World!</title></head>
<body>
<p>Hello world!</p>
<hr/>
<p>Hello again!</p>
</body>
</html>
Not too different, is it? To convert your HTML to XHTML,
there's a tool called HTML Tidy that does the job very well.
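To give a feel for the mechanical part of such a conversion, here is a rough Python sketch built on the standard html.parser module. It is not how HTML Tidy works, and it is far less capable: it only lowercases names, quotes attribute values, and closes empty elements. It does not supply missing end tags, wrap loose text in paragraphs, or add the XML declaration and DOCTYPE. The names XhtmlWriter, to_xhtml_fragment, and EMPTY are hypothetical.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that HTML 4 declares as EMPTY (they never take an end tag).
EMPTY = {"area", "base", "br", "col", "hr", "img",
         "input", "link", "meta", "param"}


class XhtmlWriter(HTMLParser):
    """Re-emit parsed HTML using XML-style syntax (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased tag and attribute names.
        # A minimized attribute such as "checked" arrives with value None,
        # so expand it to checked="checked".
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(name if value is None else value))
            for name, value in attrs
        )
        close = " /" if tag in EMPTY else ""
        self.out.append("<%s%s%s>" % (tag, attr_text, close))

    def handle_endtag(self, tag):
        # Empty elements were already closed in handle_starttag.
        if tag not in EMPTY:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Escape &, < and > so the text is legal XML character data.
        self.out.append(escape(data))


def to_xhtml_fragment(html_text):
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    return "".join(writer.out)


print(to_xhtml_fragment("<BODY>Hello world!<HR>Hello again!</BODY>"))
# <body>Hello world!<hr />Hello again!</body>
```

For real pages, use a dedicated tool. The sketch only shows that much of the conversion is a matter of applying XML's syntax rules consistently.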
But of course, today's browsers don't understand XHTML.
For example, XML constructs like the shorthand notation
for empty elements -- where instead of using a start
tag and an end tag (like <abc> and </abc>)
you use a single tag ending in a slash (like <abc/>) --
will be ignored or misinterpreted by web browsers.
We can take advantage of a web browser's forgiving
nature, however, and adapt our XHTML so that it
can be read correctly by a web browser while
still maintaining correct XML syntax.
Appendix C of the XHTML specification shows you how to do this.
For example, place a space between the tag name and
the trailing slash when specifying an empty
element. In other words, use <hr /> instead
of <hr/>.
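The space-before-the-slash guideline is easy to apply mechanically. Here is a minimal Python sketch that does so with a regular expression. The function name add_space_before_slash is my own, and the pattern is deliberately naive: it would also touch a "/>" sequence that happens to appear inside text or an attribute value.

```python
import re
import xml.etree.ElementTree as ET


def add_space_before_slash(markup):
    """Insert a space before '/>' wherever one is not already present."""
    # (?<!\s) means: the slash is not preceded by whitespace.
    return re.sub(r"(?<!\s)/>", " />", markup)


before = "<body><hr/><br /></body>"
after = add_space_before_slash(before)

print(after)  # <body><hr /><br /></body>

# The result is still well-formed XML, so this parses without error.
ET.fromstring(after)
```

Tags that already have the space are left alone, so running the function twice gives the same result as running it once.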
XHTML Basic
From a small device perspective, however, XHTML is
too large to support, because HTML 4 is itself very
large. In order to encourage the adoption of XHTML
for all devices, the W3C is currently defining a
standard called XHTML Basic.
It defines a standardized minimal subset of XHTML
aimed specifically at "small information appliances".
It's really a pre-emptive strike by the W3C to avoid
the fragmentation of XHTML by third parties, which is
what happened with HTML.
Does WML Disappear?
There will inevitably be a push to replace languages
like HDML and WML with XHTML Basic. This only makes
sense since the industry is moving towards adopting
XML-based formats for almost every kind of data
interchange. In fact, since WML is itself an XML
language the transition to XHTML doesn't seem
that hard at first glance. However, you have to
remember that WML and HDML define
actions as well as content. These actions currently
have no equivalent in XHTML. So, in the short
term at least, WML and HDML aren't going to disappear.
It will be interesting to see who wins out in the
end, though. Plan on supporting all
three markup languages at some point!
Eric Giguère is the author of
Palm Database Programming: The Complete Developer's Guide
and an upcoming book on the Java 2 Micro Edition. He works
as a developer in Sybase's Mobile & Embedded Computing division.
Visit his website at www.ericgiguere.com
or send him mail at ericgiguere@ericgiguere.com.