About XML
Like Hypertext Markup Language
(HTML), Extensible Markup Language (XML) is derived from Standard
Generalized Markup Language (SGML) and has been designed specifically
for use on the Web. XML is defined in a W3C Recommendation published
by the World Wide Web Consortium. The latest version of that Recommendation
is available on the W3C website.
XML is more complete and disciplined than HTML, and it is
also a framework for creating markup languages: it allows
you to define your own application-oriented markup tags.
XML provides a set of rules for structuring data. Like HTML,
XML uses tags and attributes, but the tags are used to delimit pieces
of data, allowing the application that receives the data to interpret
the meaning of each tag. These properties make XML particularly
suitable for data interchange across applications, platforms, enterprises,
and the Web. The data can be structured in a hierarchy that includes
nesting.
An XML document is made up of declarations, elements, comments,
character references, and processing instructions, indicated in
the document by explicit markup.
The simple XML document that follows contains an XML declaration followed
by the start tag of the root element, <d_dept_list>,
nested row and column elements, and finally the end tag of the root
element. The root element is the starting point for the XML processor.
    <?xml version="1.0"?>
    <d_dept_list>
      <d_dept_list_row>
        <dept_id>100</dept_id>
        <dept_name>R &amp; D</dept_name>
        <dept_head_id>501</dept_head_id>
      </d_dept_list_row>
      ...
    </d_dept_list>
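As an illustration of how a receiving application might interpret the tags in this document, the following is a minimal sketch that assumes Python's standard `xml.etree.ElementTree` module as the XML processor (this is not how PowerBuilder itself reads XML). The function name `read_departments` is hypothetical, and the `...` placeholder from the listing is left out so that the text parses.

```python
import xml.etree.ElementTree as ET

# The sample document, without the "..." placeholder from the listing.
SAMPLE = """<?xml version="1.0"?>
<d_dept_list>
  <d_dept_list_row>
    <dept_id>100</dept_id>
    <dept_name>R &amp; D</dept_name>
    <dept_head_id>501</dept_head_id>
  </d_dept_list_row>
</d_dept_list>"""


def read_departments(xml_text):
    """Return one dict per row element, mapping each column tag to its text."""
    root = ET.fromstring(xml_text)  # root is the <d_dept_list> element
    return [{column.tag: column.text for column in row} for row in root]


departments = read_departments(SAMPLE)
print(departments)
# [{'dept_id': '100', 'dept_name': 'R & D', 'dept_head_id': '501'}]
```

The parser resolves the `&amp;` reference, so the application sees the text `R & D`.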
This section contains a brief overview of XML rules and syntax.
For a good introduction to XML, see XML in 10 points. For more
detailed information, see the W3C XML page, the O'Reilly page
XML from the inside out, the XML Cover Pages, or one of the many
books about XML.
Valid and well-formed XML documents
Every XML document
must be well-formed; a document can also be valid.
Valid documents
To define a set of tags for use in a particular application,
XML uses a separate document named a document type definition (DTD).
A DTD states what tags are allowed in an XML document and defines
rules for how those tags can be used in relation to each other.
It defines the elements that are allowed in the language, the attributes
each element can have, and the type of information each element
can hold. Documents can be verified against a DTD to ensure that they
follow all the rules of the language. A document that satisfies
a DTD is said to be valid.
If a document uses a DTD, the document type declaration must follow
the XML declaration and precede the root element.
XML Schema provides an alternative mechanism for describing
and validating XML data. It provides a richer set of datatypes than
a DTD, as well as support for namespaces, including the ability
to use prefixes in instance documents and accept unknown elements
and attributes from known or unknown namespaces. For more information,
see the W3C XML Schema page.
Well-formed documents
The second way to check XML syntax is to skip validation and assume that
a document is using its language properly. XML provides a set of generic syntax
rules that must be satisfied, and as long as a document satisfies
these rules, it is said to be well-formed. All valid documents
must be well-formed.
Processing well-formed documents is faster than processing
valid documents because the parser does not have to verify against
the DTD or XML schema. When valid documents are transmitted, the
DTD or XML schema must also be transmitted if the receiver does
not already possess it. Well-formed documents can be sent
without other information.
XML documents should conform to a DTD or XML schema if they
are going to be used by more than one application. If they are not
valid, there is no way to guarantee that various applications will
be able to understand each other.
XML syntax
There are a few more restrictions
on XML than on HTML; they make parsing of XML simpler.
Tags cannot be omitted
Unlike HTML, XML does not allow you to omit tags. This guarantees
that parsers know where elements end.
The following example is acceptable HTML, but not XML:
    <table>
      <tr>
        <td>Dog</td>
        <td>Cat
        <td>Mouse
    </table>
To change this into well-formed XML, you need to add all the
missing end tags:
    <table>
      <tr>
        <td>Dog</td>
        <td>Cat</td>
        <td>Mouse</td>
      </tr>
    </table>
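The difference between the two listings can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module stands in for the XML parser; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The HTML-style table with omitted end tags, and the corrected version.
HTML_STYLE = "<table><tr><td>Dog</td><td>Cat<td>Mouse</table>"
XML_STYLE = "<table><tr><td>Dog</td><td>Cat</td><td>Mouse</td></tr></table>"

print(is_well_formed(HTML_STYLE))  # False
print(is_well_formed(XML_STYLE))   # True
```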
Representing empty elements
Empty elements cannot be represented in XML in the same way
they are in HTML. An empty element is one that is not used to mark
up data, so in HTML, there is no end tag. There are two ways to
handle empty elements:
- Place a dummy end tag immediately after the start tag. For example:
      <img href="picture.jpg"></img>
- Use a slash character at the end of the initial tag:
      <img href="picture.jpg"/>
  This tells a parser that the element consists only of one tag.
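Both forms should look the same to an application once parsed. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module as the parser; the helper name `describe` is hypothetical.

```python
import xml.etree.ElementTree as ET

with_end_tag = ET.fromstring('<img href="picture.jpg"></img>')
self_closing = ET.fromstring('<img href="picture.jpg"/>')


def describe(element):
    """Summarize an element as (tag, attributes, text, number of children)."""
    return (element.tag, element.attrib, element.text, len(element))


same = describe(with_end_tag) == describe(self_closing)
print(same)  # True
```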
XML is case sensitive
XML is case sensitive, which allows it to be used with non-Latin
alphabets. You must ensure that letter case matches in start and
end tags: <MyTag> and </Mytag> belong
to two different elements.
White space
White space within element content is passed to the application unchanged by parsers.
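White-space preservation can be observed directly. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module as the parser; the element name is borrowed from the earlier sample document.

```python
import xml.etree.ElementTree as ET

# Leading spaces, a newline, and trailing spaces surround the data.
element = ET.fromstring("<dept_name>  R &amp; D\n  </dept_name>")

print(repr(element.text))          # '  R & D\n  '
print(repr(element.text.strip()))  # 'R & D'
```

If surrounding white space is not significant, the receiving application has to remove it itself, as the second line does.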
All elements must be nested
All XML elements must be properly nested. All child elements
must be closed before their parent elements close.
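A parser rejects documents that break the case and nesting rules. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (which is built on the Expat parser, not Xerces); the helper name `parse_error` is hypothetical, and the exact message text may differ between parsers.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the parser's error message, or None if xml_text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


case_mismatch = parse_error("<MyTag>data</Mytag>")  # letter case differs
bad_nesting = parse_error("<b><i>data</b></i>")     # parent closed before child
good_nesting = parse_error("<b><i>data</i></b>")    # child closed first

print(case_mismatch)
print(bad_nesting)
print(good_nesting)  # None
```

In both failing cases the parser reports a mismatched tag, because the end tag it encounters does not match the most recently opened element.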
XML parsing
There are two major types
of application programming interfaces (APIs) that can be used to
parse XML:
- Tree-based APIs map the XML document
  to a tree structure. The major tree-based API is the Document Object
  Model (DOM) maintained by W3C. A DOM parser is particularly useful
  if you are working with a deeply nested document that must be traversed
  multiple times.
  For more information about the DOM parser, see the W3C Document Object Model page.
  PowerBuilder provides the PowerBuilder Document Object Model (PBDOM)
  extension to enable you to manipulate complex XML documents. For
  more information about PBDOM, see Application Techniques and
  the PowerBuilder Extension Reference.
- Event-based APIs use callbacks to report events,
  such as the start and end of elements, to the calling application,
  and the application handles those events. These APIs provide faster,
  lower-level access to the XML and are most efficient when extracting
  data from an XML document in a single traversal.
  For more information about the best-known event-driven parser,
  SAX (Simple API for XML), see the SAX page.
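The two styles can be compared on the sample department document. The following is a minimal sketch, assuming Python's standard `xml.dom.minidom` and `xml.sax` modules as stand-ins for a DOM parser and a SAX parser; it does not use PBDOM or Xerces, and the names `dept_names_dom`, `dept_names_sax`, and `DeptNameHandler` are hypothetical.

```python
import xml.dom.minidom
import xml.sax

SAMPLE = b"""<?xml version="1.0"?>
<d_dept_list>
  <d_dept_list_row>
    <dept_id>100</dept_id>
    <dept_name>R &amp; D</dept_name>
    <dept_head_id>501</dept_head_id>
  </d_dept_list_row>
</d_dept_list>"""


def dept_names_dom(xml_bytes):
    """Tree-based: build the whole document in memory, then query it."""
    document = xml.dom.minidom.parseString(xml_bytes)
    names = []
    for node in document.getElementsByTagName("dept_name"):
        # Join the text nodes that are direct children of the element.
        names.append("".join(child.data for child in node.childNodes
                             if child.nodeType == child.TEXT_NODE))
    return names


class DeptNameHandler(xml.sax.ContentHandler):
    """Event-based: collect text only while inside a dept_name element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # list of text chunks while inside dept_name

    def startElement(self, name, attrs):
        if name == "dept_name":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one element's text in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "dept_name":
            self.names.append("".join(self._buffer))
            self._buffer = None


def dept_names_sax(xml_bytes):
    """Run the event-based handler over the document in a single pass."""
    handler = DeptNameHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names


print(dept_names_dom(SAMPLE))  # ['R & D']
print(dept_names_sax(SAMPLE))  # ['R & D']
```

The DOM version keeps the whole tree available for repeated traversal, while the SAX version retains only the data it chooses to collect as the events arrive.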
Xerces parser
PowerBuilder includes software developed by the Apache Software Foundation
(http://www.apache.org/).
The XML services for DataWindow objects are built on the Apache Xerces-C++ parser,
which conforms to both DOM and SAX specifications and is portable
across Windows and UNIX platforms. For more information about the Xerces parser,
see the Xerces C++ Parser page.