XML Namespaces
In order to “play well with others” in the XML world, you need to understand and use XML Namespaces. The good news is that namespaces are easy to use. XML Namespaces is among the shortest XML-related specifications and can be read in about fifteen minutes. I only recommend reading most of the XML-related specifications, however, if you are having trouble sleeping.
Namespaces have a reputation for being confusing and frustrating. If you use namespaces incorrectly, you will indeed be signing yourself up for some frustration. And there are some esoteric points with namespaces that are difficult to understand. Most of the frustration comes from the fact that you can use many XML technologies successfully without using namespaces, but once namespaces are on the scene, some of your coding needs to be adjusted to account for them.
You don’t have to do much to support namespaces, so it’s best to use namespaces from the start and avoid code changes later. In this essay we’ll learn the basics of namespaces and key programming areas affected by their use. In particular, we’ll look at the namespace impact on XPath because it is central to both DOM and XSL processing.
The following XML sample has no namespace defined.
<List name="Fruit List">
  <Item>Apple</Item>
  <Item>Banana</Item>
</List>
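Here is the same sample with a default namespace set. The List and Item elements in this document are now all in the same namespace. The unprefixed name attribute is not: a default namespace declaration applies to element names only, so name remains in no namespace.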
<List xmlns="http://liquidhub.com/SimpleList" name="Fruit List">
  <Item>Apple</Item>
  <Item>Banana</Item>
</List>
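A quick way to see which names a default namespace declaration covers is to parse the sample and print the expanded names. The sketch below uses Python's standard-library ElementTree purely as an illustration; the variable names are our own, and any namespace-aware parser should report the same thing.

```python
import xml.etree.ElementTree as ET

DOC = """<List xmlns="http://liquidhub.com/SimpleList" name="Fruit List">
  <Item>Apple</Item>
  <Item>Banana</Item>
</List>"""

root = ET.fromstring(DOC)

# ElementTree writes a namespaced name as "{namespace-uri}local-name".
element_names = [el.tag for el in root.iter()]

# Attribute names without a prefix carry no "{...}" part: they are in no namespace.
attribute_names = list(root.attrib)

print(element_names)
print(attribute_names)
```

Every element name comes back qualified with the namespace URI, while the name attribute comes back as a bare name.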
IRI, URI, We’re All Al-RI-ght for Namespaces
The value of the xmlns namespace declaration attribute is a uniform resource identifier (URI). The URI should be unique and persistent for all time. A uniform resource locator (URL) is often used as the base for namespace URIs because the domain name registration system provides a recognized central authority for unique name ownership. A URL is a URI that identifies a resource by its location, as in HTTP and other location-dependent Internet protocols.
For namespaces, the location aspect of URIs is not significant. No document has to reside at the location specified—though sometimes a DTD, XML Schema or other information can be found at the location. It’s not bad practice to serve up some helpful documentation or links from the URL associated with a namespace.
A uniform resource name (URN) is also commonly used for namespace URIs. URN was developed as a more constrained form of URI that you can use to define name-based, location-independent identifiers. Some examples of URN use are the ISBN for books (urn:isbn:0-345-39180-2), ISSN for serial publications (urn:issn:1075-2838), and the IETF requests for comments documents (urn:ietf:rfc:3986).
The internationalized resource identifier (IRI) is simply the evolution of the URI to an international character set. URIs allow only a subset of US-ASCII characters. IRIs extend URIs to allow characters from the universal character set (Unicode/ISO 10646); when an IRI is mapped to a URI, those characters are encoded as UTF-8 and percent-escaped. Recently updated XML specifications favor IRIs over URIs.
The IETF RFC 3986 is a good place to start finding out more about URIs. IRIs are specified in IETF RFC 3987. Tim Berners-Lee wrote an excellent article on the design of URIs called “Cool URIs Don’t Change”, definitely worth a quick read.
Namespace Prefixes
Prefixes are simply shorthand for the full namespace URI. The following sample declares the namespace prefix lh and prefixes every element name in the “http://liquidhub.com/SimpleList” namespace with lh:.
<lh:List xmlns:lh="http://liquidhub.com/SimpleList"
    name="Fruit List">
  <lh:Item>Apple</lh:Item>
  <lh:Item>Banana</lh:Item>
</lh:List>
Every element that belongs to the namespace must carry the lh: prefix. The declaration on the List element is in scope for all contained elements, so lh: can be used anywhere inside List without being declared again, but the prefix itself does not cascade: an unprefixed Item inside lh:List would be in no namespace at all. Only a default namespace declaration (the plain xmlns attribute) cascades to unprefixed contained elements. Choosing to prefix all elements or to rely on a cascading default namespace is a style decision; both methods accomplish the same result. The web log sample below gives some insight into namespace prefix usage tradeoffs.
Most namespaces have a generally accepted prefix associated with them, such as xsl: for style sheet transforms and xlink: for XML linking language, but it is not necessary to always use the same prefix. The URI must remain constant, but the same document with pig: as the prefix instead of lh: would carry the same namespace information.
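To make the point about prefixes concrete, here is a small sketch, again using Python's ElementTree as an assumed stand-in for any namespace-aware parser. It compares the expanded names produced by an lh: document and a pig: document, and also checks what happens to an unprefixed child of a prefixed element. The helper expanded_names and the trimmed-down documents are our own inventions for illustration.

```python
import xml.etree.ElementTree as ET

URI = "http://liquidhub.com/SimpleList"

WITH_LH = f"""<lh:List xmlns:lh="{URI}">
  <lh:Item>Apple</lh:Item>
</lh:List>"""

WITH_PIG = f"""<pig:List xmlns:pig="{URI}">
  <pig:Item>Apple</pig:Item>
</pig:List>"""

# Same declaration, but the child element is left unprefixed.
UNPREFIXED_CHILD = f"""<lh:List xmlns:lh="{URI}">
  <Item>Apple</Item>
</lh:List>"""


def expanded_names(xml_text):
    """Return the "{uri}local" name of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


lh_names = expanded_names(WITH_LH)
pig_names = expanded_names(WITH_PIG)
mixed_names = expanded_names(UNPREFIXED_CHILD)

print(lh_names == pig_names)  # the prefix spelling makes no difference
print(mixed_names)            # the unprefixed Item has no namespace part
```

The parser discards the prefix once it has resolved it to the URI, which is why lh: and pig: are interchangeable and why a missing prefix changes the element's name.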
Namespaces in Use
If you were designing an XML application to store a web log, parts of your markup would be dedicated to metadata for each web log post and parts for the post content itself. Metadata elements like Author, Category and PostedDate would likely belong in the same namespace. For the post content of each web log entry you may choose to leverage XHTML formatting elements from the XHTML namespace within your markup. Namespaces enable multiple markup languages to be assembled into new markup languages without name collisions. Namespaces enable modular XML.
The following sample shows elements from two namespaces combined in our hypothetical web log XML format:
<blog:Blog xmlns:blog="http://liquidhub.com/Blog">
  <blog:Entry>
    <blog:Author>Sam Page</blog:Author>
    <blog:Category>General</blog:Category>
    <blog:PostedDate>20050301</blog:PostedDate>
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>Today is a <b>good</b> day.</p>
      <ul>
        <li>Read a book</li>
        <li>Listen to music</li>
      </ul>
    </body>
  </blog:Entry>
</blog:Blog>
The sample avoids prefixing every XHTML element within each post by declaring XHTML as the default namespace on the body element, which cascades to the unprefixed elements inside it. (Declaring an xhtml: prefix on body instead would not have this effect; the unprefixed p, b, ul and li elements would then be in no namespace.) Because other namespaces are not going to be mixed into the XHTML content, cascading is a reasonable approach.
The Dublin Core metadata element set is a common set of elements for providing bibliographical information about a document. Title, Creator, Subject, Date, and Language are some of the Dublin Core metadata element names. Without namespaces, you can see how Dublin Core would be a lot less useful because of inevitable name collisions with common element names like Title. Dublin Core demonstrates how namespaces enable modular XML.
All the elements in the blog namespace are prefixed in the sample because it’s clearer when mixing namespaces among child elements. In this case, it’s not entirely necessary to prefix, but if we were to add additional namespaces, say Dublin Core metadata elements, the clarity advantage of prefixing would be more apparent.
Another reason to favor prefixes is that all the blog elements, whether explicitly prefixed or covered by a cascading default namespace, belong to the blog namespace and must therefore be referenced with the namespace in code. The prefix in the XML document serves as a reminder to include the namespace in code references.
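As a rough check of how a mixed-namespace document sorts out, the sketch below parses a version of the web log sample in which the XHTML content sits under a default namespace declaration on body, then groups the element names by namespace URI. It uses Python's ElementTree as an assumed illustration; names_by_namespace is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

BLOG = """<blog:Blog xmlns:blog="http://liquidhub.com/Blog">
  <blog:Entry>
    <blog:Author>Sam Page</blog:Author>
    <blog:Category>General</blog:Category>
    <blog:PostedDate>20050301</blog:PostedDate>
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>Today is a <b>good</b> day.</p>
      <ul>
        <li>Read a book</li>
        <li>Listen to music</li>
      </ul>
    </body>
  </blog:Entry>
</blog:Blog>"""


def names_by_namespace(xml_text):
    """Map each namespace URI to the distinct local names found in it."""
    groups = defaultdict(list)
    for el in ET.fromstring(xml_text).iter():
        # Tags look like "{uri}local"; a tag without "{...}" is in no namespace.
        if el.tag.startswith("{"):
            uri, local = el.tag[1:].split("}", 1)
        else:
            uri, local = "", el.tag
        if local not in groups[uri]:
            groups[uri].append(local)
    return dict(groups)


groups = names_by_namespace(BLOG)
for uri, names in groups.items():
    print(uri, names)
```

Code that processes the document sees two cleanly separated vocabularies, regardless of whether a given element was prefixed or picked up its namespace from a default declaration.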
Namespaces and XPath
XPath is the basis for querying and manipulating XML DOM trees and also the pattern language used in XSL transforms. Because these technologies are fundamental to working with XML documents, every developer should be familiar with them.
Namespaces add some minor complications to how XPath expressions work because of the many valid ways that namespaces may be declared or prefixed. All XPath implementations that support namespaces have a method for managing namespaces similar to the one described below for the Microsoft .NET platform.
XPath Namespace Handling in Microsoft .NET
Every DOM and XPath implementation requires a method to communicate the namespace URI and prefix mapping for XPath expressions. In Microsoft’s .NET implementation, the XmlNamespaceManager provides this mapping. The following two samples show typical namespace uses in .NET.
1 |// using XmlNode.SelectNodes method with XmlDocument
2 |XmlNamespaceManager nsmgr =
3 |    new XmlNamespaceManager( doc.NameTable );
4 |nsmgr.AddNamespace( "lh", "http://liquidhub.com/SimpleList" );
5 |XmlNodeList nodeList = doc.SelectNodes( "//lh:Item", nsmgr );
Note that at the end of line five, the namespace prefix lh: is used in the XPath expression “//lh:Item”. This only works because we associated the prefix lh: with the namespace URI in line four. The source document could have used a default namespace or a different prefix than the one used in our code, but we avoid having to code for the many possible legal namespace declarations by mapping a single prefix that will be consistently used in our code. This leaves source documents free to use whatever method they want to declare the namespaces.
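The same idea can be sketched outside .NET. The example below uses Python's standard-library ElementTree, whose findall method accepts a prefix-to-URI dictionary that plays a role comparable to the namespace manager. This is an illustration under that assumption, not a port of the .NET code, and the DOCS dictionary and item_texts helper are made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# The prefix our code uses; source documents need not use it.
NS = {"lh": "http://liquidhub.com/SimpleList"}

DOCS = {
    "default namespace": (
        '<List xmlns="http://liquidhub.com/SimpleList">'
        "<Item>Apple</Item><Item>Banana</Item></List>"
    ),
    "lh prefix": (
        '<lh:List xmlns:lh="http://liquidhub.com/SimpleList">'
        "<lh:Item>Apple</lh:Item><lh:Item>Banana</lh:Item></lh:List>"
    ),
    "pig prefix": (
        '<pig:List xmlns:pig="http://liquidhub.com/SimpleList">'
        "<pig:Item>Apple</pig:Item><pig:Item>Banana</pig:Item></pig:List>"
    ),
}


def item_texts(xml_text):
    """Select every Item in the SimpleList namespace using our own lh: prefix."""
    root = ET.fromstring(xml_text)
    return [item.text for item in root.findall(".//lh:Item", NS)]


results = {label: item_texts(xml_text) for label, xml_text in DOCS.items()}
for label, texts in results.items():
    print(label, texts)
```

One query with one code-side prefix handles all three declaration styles, because the match is made on the namespace URI rather than on the prefix text in the document.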
The next code sample accomplishes the same prefix mapping with the more granular interfaces in the Microsoft .NET XML services.
// using XPathNavigator and XPathExpression
XmlNamespaceManager nsmgr =
    new XmlNamespaceManager( nav.NameTable );
nsmgr.AddNamespace( "lh", "http://liquidhub.com/SimpleList" );
XPathExpression expr;
expr = nav.Compile( "//lh:Item" );
expr.SetContext( nsmgr );
XPathNodeIterator iterator = nav.Select( expr );
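Because the .NET XML classes provide versions of the SelectSingleNode and SelectNodes methods that don’t require a namespace manager, it’s easy to write XPath expressions and code that are not namespace aware. Such code works only until a document declares a namespace: an unprefixed name in an XPath expression matches only elements that are in no namespace, so “//Item” selects nothing from a document that puts Item in a default namespace. If you don’t account for namespaces in your XPath expressions from the start, you’ll be faced with the painful task of re-coding to use the namespace manager and prefixes in all of your XPath expressions later.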
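The pitfall is easy to reproduce. The following sketch, once more using Python's ElementTree as an assumed illustration rather than the .NET classes, runs a namespace-unaware query and a namespace-aware query against the default-namespace sample.

```python
import xml.etree.ElementTree as ET

DOC = """<List xmlns="http://liquidhub.com/SimpleList" name="Fruit List">
  <Item>Apple</Item>
  <Item>Banana</Item>
</List>"""

NS = {"lh": "http://liquidhub.com/SimpleList"}

root = ET.fromstring(DOC)

# Namespace-unaware: looks for Item elements that are in no namespace.
unqualified = root.findall(".//Item")

# Namespace-aware: looks for Item elements in the SimpleList namespace.
qualified = root.findall(".//lh:Item", NS)

print(len(unqualified), len(qualified))
```

The unqualified query silently returns nothing rather than raising an error, which is what makes this mistake hard to spot until a namespaced document shows up.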
Namespaces can no doubt look ugly in an XML document. Many texts teaching basic XML leave namespaces out of examples to avoid complicating things. From a programming perspective, it’s better to account for namespaces from the beginning of a project to avoid messy problems later. Though you can get pretty far without dealing with namespaces, not being comfortable with namespaces is a distinct disadvantage. Namespaces are easy, so get on board!
References
- Namespaces in XML 1.1, http://www.w3.org/TR/xml-names11
- Uniform Resource Identifier (URI): Generic Syntax, http://www.ietf.org/rfc/rfc3986.txt
- Internationalized Resource Identifiers (IRI), http://www.ietf.org/rfc/rfc3987.txt
- Hypertext Style: Cool URIs Don’t Change, http://www.w3.org/Provider/Style/URI.html
- Dublin Core Metadata Elements Set, Version 1.1, http://dublincore.org/documents/dces
- XML Resource Directory Description Language, http://www.rddl.org/
