The eXtensible Markup Language (XML) was created to define and store
complex, hierarchically structured data for exchange and storage.
An XML document's hierarchy begins at a single root node and branches
out from this document root.
The Document Type Definition (DTD) is optional and defines the structure of
the data allowed in an XML document. It is often used to validate the data for
completeness and adherence to rules.
XML Schema (XSD) is a newer and more complete data definition language with
definable data types. XSD competes with DTD as the format for data definition,
especially when defining complex relationships and data types.
XML parsers fall into three major categories:
DOM: Import/parse all data into a data structure in memory for query.
The data is held as nodes in a data tree which can be traversed.
While this is often easier to program than SAX invocations, it uses
more memory and runs slower.
SAX: Parse on the fly to look for the data requested.
This is event driven: callbacks are invoked as elements are
encountered during parsing. The programmer writes the callbacks, so a
custom handler is written for each document type. This is considered to
be the fastest way to parse a file.
XPath: (XML Path) Search data with a path expression. Very easy to use.
Usage is similar to a query: the expression describes a path through the
document tree and a list of the matching nodes is returned. It is usually
implemented as an extension to DOM.
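The event-driven idea behind SAX can be sketched without any XML library. The following is a toy scanner, not a real XML parser and not the libxml2 SAX API: the names toy_handlers, toy_scan, toy_stats, count_start and count_end are all hypothetical, and comments, CDATA sections, entities and error handling are ignored. It only shows how the programmer supplies callbacks and the scanner invokes them as tags are encountered, without building a tree in memory.

```c
#include <string.h>

/* Hypothetical handler table: one callback per event type. */
typedef struct {
    void (*start_element)(void *user, const char *name);
    void (*end_element)(void *user, const char *name);
} toy_handlers;

/* Scan xml once, left to right, firing callbacks as tags are seen.
 * Toy only: assumes well-formed input and no '>' inside attribute values. */
void toy_scan(const char *xml, const toy_handlers *h, void *user)
{
    const char *p = xml;
    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');
        if (end == NULL)
            break;                                /* unterminated tag: stop */
        if (p[1] != '?' && p[1] != '!') {         /* skip <?xml ...?> and <!...> */
            int closing = (p[1] == '/');
            int empty = (!closing && end[-1] == '/');   /* <name/> */
            const char *start = p + 1 + closing;
            size_t len = strcspn(start, " \t\r\n/>");   /* name ends here */
            char name[64];
            if (len >= sizeof name)
                len = sizeof name - 1;
            memcpy(name, start, len);
            name[len] = '\0';
            if (!closing && h->start_element)
                h->start_element(user, name);
            if ((closing || empty) && h->end_element)
                h->end_element(user, name);
        }
        p = end + 1;
    }
}

/* Example callbacks: count elements and track nesting depth. */
typedef struct { int starts; int depth; int max_depth; } toy_stats;

void count_start(void *user, const char *name)
{
    toy_stats *s = user;
    (void)name;
    s->starts++;
    if (++s->depth > s->max_depth)
        s->max_depth = s->depth;
}

void count_end(void *user, const char *name)
{
    toy_stats *s = user;
    (void)name;
    s->depth--;
}
```

To use it, fill a toy_handlers with { count_start, count_end }, zero a toy_stats, and pass both to toy_scan() with the document text. A DOM parser would instead build the whole tree first and let the program walk it afterwards.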
DTD:
Number of children:
? Zero or one element permitted (optional).
* Zero or more elements permitted, e.g.: <!ELEMENT name (first, middle*, last?)>
+ One or more elements permitted.
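The occurrence indicators can be read as simple count rules. The sketch below is only an illustration of those rules, not part of any DTD validator; the function name occurrence_ok is hypothetical, and '\0' is used here as an assumed stand-in for "no indicator", which in a DTD means exactly one.

```c
#include <stdbool.h>

/* Return true if `count` occurrences of a child element satisfy the
 * DTD occurrence indicator. Pass '\0' when the child has no indicator. */
bool occurrence_ok(char indicator, int count)
{
    switch (indicator) {
    case '?':  return count == 0 || count == 1;   /* optional        */
    case '*':  return count >= 0;                 /* zero or more    */
    case '+':  return count >= 1;                 /* one or more     */
    case '\0': return count == 1;                 /* exactly one     */
    default:   return false;                      /* unknown symbol  */
    }
}
```

For the declaration (first, middle*, last?), a name with one first, three middle and no last element passes: occurrence_ok('\0', 1), occurrence_ok('*', 3) and occurrence_ok('?', 0) are all true.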
Attributes:
CDATA #REQUIRED
CDATA #IMPLIED
CDATA
Character Data
PCDATA
Parsed character Data
NMTOKEN
A single name token. No whitespace allowed.
NMTOKENS
One or more name tokens separated by white space
ENUMERATION
e.g.
<date month="January" day="27" year="2004"/>
ENTITY
ENTITIES
ID
XML name specified: <!ATTLIST xml_name1 xml_name2 ID #REQUIRED>
xml_name2 is required.
IDREF
attribute refers to an ID
IDREFS
NOTATION
XML names may include the characters _ (underscore), - (hyphen) and . (period).
When HTML text is included use &lt;, &amp;, &gt; and &quot; to represent <, &, >, and " respectively.
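A minimal sketch of this substitution in plain C follows. The function name xml_escape is hypothetical and only the four characters <, &, > and " are handled, replaced by the entity references &lt;, &amp;, &gt; and &quot;. The caller owns the returned buffer and must free() it.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `in` with <, >, & and " replaced by
 * their entity references, or NULL if memory could not be allocated. */
char *xml_escape(const char *in)
{
    /* Worst case: every input character expands to the 6-byte "&quot;". */
    char *out = malloc(strlen(in) * 6 + 1);
    if (out == NULL)
        return NULL;

    char *w = out;
    for (; *in != '\0'; in++) {
        const char *rep = NULL;
        switch (*in) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        default:  break;
        }
        if (rep != NULL) {
            size_t k = strlen(rep);
            memcpy(w, rep, k);
            w += k;
        } else {
            *w++ = *in;           /* ordinary character: copy as-is */
        }
    }
    *w = '\0';
    return out;
}
```

The & character has to be escaped along with the others, otherwise text that already contains something like &lt; would be read back as < by the parser.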
// --------------------------------------------------------------------------
// Open XML document
// --------------------------------------------------------------------------
if (doc == NULL) printf("error: could not parse file file.xml\n");

// --------------------------------------------------------------------------
// XML root.
// --------------------------------------------------------------------------
/* Get the root element node */
xmlNode *root = NULL;
root = xmlDocGetRootElement(doc);

// --------------------------------------------------------------------------
// Must have root element, a name and the name must be "AppConfigData"
// --------------------------------------------------------------------------

// --------------------------------------------------------------------------
// AppConfigData children: For each DisplayX
// --------------------------------------------------------------------------
[Potential Pitfall]: The order of the include directory
paths on the compile line matters. Reference the libxml2 include path
directories before the gnome directory paths. The following will result in a
compilation error:
This is due to different structure definitions of xmlDocPtr
(struct _xmlDoc) and xmlNodePtr (struct _xmlNode) in libxml/tree.h. The reference to the subdirectory libxml/ should have
differentiated the two versions of the include file but that is not the case
with the GNU compiler. The proper file is /usr/include/libxml2/libxml/tree.h and not the file /usr/include/gnome-xml/tree.h.
Components:
LibXML: xml2-config --cflags --libs
(Reference this first.)
Gtk: pkg-config --cflags --libs gtk+-2.0
Gnome: gnome-config --cflags --libs gnome gnomeui xml
XSL family: has various subsets to describe XML encoded data.
W3C: XSL family
XSL: (Extensible Stylesheet Language) describes XML encoded data.
W3C: XSL
XSLT: (XSL Transformations) transforms an XML document from one form to another.
XSLT stylesheets are declarative rather than procedural and are built from
templates which define the output.
W3C: XSLT
XSL-FO: (XSL Formatting Objects) defines the visual formatting of an XML document.
XML.com: Using XSL-FO
XPath: (XML Path Language) non-XML language used to find data (XML query) within an XML document.
i.e.