XSL Identity Transforms
Many developers have a hard time getting started with XSL. One difficulty lies in the fact that XSL favors a recursive processing style. XML well-formedness guarantees that an XML document can be represented as a tree structure, and recursion is ideal for working with tree structures. Recursive thinking doesn’t come naturally to most people. You have to work hard to “get it.” Perhaps the same is true for XSL.
XML is an increasingly fundamental part of the technology landscape. XSL is a powerful way to manipulate XML, and developers should be familiar with such a useful tool. Confidence in transforming XML documents with XSL is as important to a developer’s career as confidence in querying relational databases with SQL. This essay aims to show how powerful and elegant XSL’s recursive approach to transforming XML documents can be. The variations on simple identity transforms presented here embrace recursion and may give you a new way of thinking about XSL.
Pull vs. Push
In the simplest XSL transforms, a single template like the one below pulls content from an XML document into the transform output:
1 |<NameTag xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
2 |         xsl:version="1.0">
3 |  My name is:
4 |  <b><xsl:value-of select="/Customer/Name" /></b>
5 |</NameTag>
Any literal content items within the template, like the “My name is:” text on line three and the <b> tags on line four, are simply copied to the output. The xsl:value-of instruction within the <b> tags on line four pulls content from the source XML document into the result:
<NameTag>
  My name is:
  <b>Sam Page</b>
</NameTag>
A pull approach is very straightforward, but when pull-style templates get large, they are a mess to maintain. The pull approach is not suited at all to document-style XML input, where text and markup are mixed freely and you cannot know in advance which elements will appear where.
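As a rough comparison outside XSLT, the sketch below mimics the pull template in Python with the standard library’s xml.etree.ElementTree. It is an analogue, not an XSL processor, and the function name name_tag_pull is hypothetical. The whole output is built in one place, and the name is fetched by a fixed path from the document element, much as the select="/Customer/Name" expression does.

```python
import xml.etree.ElementTree as ET

def name_tag_pull(xml_text: str) -> str:
    """Pull style: one fixed template that reaches into the input by path."""
    customer = ET.fromstring(xml_text)        # document element, <Customer>
    tag = ET.Element("NameTag")
    tag.text = "My name is: "                 # literal text is emitted as-is
    bold = ET.SubElement(tag, "b")            # literal <b> element
    bold.text = customer.findtext("Name")     # roughly /Customer/Name
    return ET.tostring(tag, encoding="unicode")

print(name_tag_pull("<Customer><Name>Sam Page</Name></Customer>"))
# <NameTag>My name is: <b>Sam Page</b></NameTag>
```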
The following style sheet demonstrates a push approach:
1 |<xsl:stylesheet version="1.0"
2 |    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
3 |
4 |  <!-- Customer template -->
5 |  <xsl:template match="/Customer">
6 |    <NameTag>
7 |      <xsl:apply-templates select="Name" />
8 |    </NameTag>
9 |  </xsl:template>
10|
11|  <!-- Name template -->
12|  <xsl:template match="Name">
13|    My name is:
14|    <b><xsl:value-of select="." /></b>
15|  </xsl:template>
16|
17|</xsl:stylesheet>
xsl:apply-templates throws each node selected by its select expression up into the air, and the best-matching template catches it, establishing a new context. Once the catching templates are finished and all thrown nodes have been caught, context returns to the caller and processing continues.
Processing begins at the root node, where XSLT’s built-in root template applies templates to the document element, so the Customer template is the first of ours to run. The XSL processor establishes a context within the source document at the Customer document element. The xsl:apply-templates instruction on line seven causes the Name template to match, and the processor establishes a new context at the Customer element’s Name child element. Expressions within a template always evaluate relative to the current context. The context-changing instructions, xsl:apply-templates and xsl:for-each, push the context around the source XML document. When a template is finished processing, control returns to the calling template (pop!) and context reverts to its previous state.
The xsl:call-template instruction does not change context and is used chiefly to encapsulate code in named templates, much like a subroutine.
The benefits of the push approach are not necessarily apparent with such a simple example, but as transformation requirements get more complex, the push approach shines.
With pull processing, context stands in one place and you must reach throughout the XML input document for content. With push processing, context jumps around the XML input document at your direction, allowing simple, local content selection. Understanding context is the key to understanding XSL.
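The push style sheet can be approximated in Python too. In this hedged sketch, built on xml.etree.ElementTree, a hypothetical TEMPLATES dictionary maps element names to template functions, and a hypothetical apply_templates helper throws each selected node to the function registered for it, passing the node in as the new context. It ignores XSLT’s priority rules and built-in templates, so it illustrates context switching rather than how a real processor works.

```python
import xml.etree.ElementTree as ET

def append_text(parent, text):
    """Add text after the last child of parent (or as its leading text)."""
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def apply_templates(nodes, out):
    """Throw each node to the template registered for its tag."""
    for node in nodes:
        template = TEMPLATES.get(node.tag)
        if template is not None:           # unmatched nodes are skipped here;
            template(node, out)            # real XSLT would use built-in rules

def customer_template(context, out):       # like match="/Customer" (no root check)
    name_tag = ET.SubElement(out, "NameTag")
    # select="Name" is evaluated relative to the current context node
    apply_templates(context.findall("Name"), name_tag)

def name_template(context, out):           # like match="Name"
    append_text(out, "My name is: ")
    bold = ET.SubElement(out, "b")
    bold.text = context.text               # like value-of select="."

TEMPLATES = {"Customer": customer_template, "Name": name_template}

def transform(xml_text):
    holder = ET.Element("holder")          # scratch parent for the result
    apply_templates([ET.fromstring(xml_text)], holder)
    return ET.tostring(holder[0], encoding="unicode")

print(transform("<Customer><Name>Sam Page</Name></Customer>"))
# <NameTag>My name is: <b>Sam Page</b></NameTag>
```

Because each function only looks at the node it was handed, selections like findall("Name") stay short and local, which is the advantage the push style relies on.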
The Identity Transform
An identity transform is a push processing style sheet that reproduces its input as its output. On the face of it, that doesn’t sound like a very useful transformation. But identity transforms provide the basis of a whole class of useful transformations.
The style sheet below shows a typical recursive implementation of an identity transform using XSL’s shallow-copy instruction, xsl:copy:
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- IdentityTransform -->
  <xsl:template match="/ | @* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
This identity transform makes a depth-first traversal of the entire XML document, copying elements, attributes, text, comments, and processing instructions as it goes. It’s a slick little piece of code!
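For comparison, the same depth-first copy can be sketched in Python with xml.etree.ElementTree. This is an analogue, not XSLT: ElementTree stores character data in each element’s text and tail fields instead of separate text nodes, and its default parser discards comments and processing instructions, so the sketch covers only elements, attributes, and text. The function name identity is hypothetical.

```python
import xml.etree.ElementTree as ET

def identity(node):
    """Copy an element's name and attributes, then recurse into its children."""
    copy = ET.Element(node.tag, dict(node.attrib))   # like xsl:copy plus @*
    copy.text = node.text                            # text before the first child
    copy.tail = node.tail                            # text after this element
    for child in node:                               # like the node() selection
        copy.append(identity(child))                 # depth-first recursion
    return copy

source = '<doc id="1"><p>Hello <b>world</b>!</p></doc>'
result = ET.tostring(identity(ET.fromstring(source)), encoding="unicode")
print(result == source)   # True
```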
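The following list summarizes the special XPath node tests frequently used when writing identity transforms:
- @* – all attributes
- node() – all child nodes
- * – all elements
- comment() – all comments
- processing-instruction() – all processing instructions
- text() – all text nodes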
The node() test matches all the node types below it in the list above; that’s why our identity template is so spare.
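To see what these node tests select, Python’s xml.dom.minidom is a handy stand-in, because unlike ElementTree’s default parser it keeps comments and processing instructions as child nodes. The select_children helper below is hypothetical and models only node() and *; it lists the kinds of child nodes each test would pick up.

```python
from xml.dom import minidom, Node

# Child node kinds that node() can select; attributes are not children.
KIND_NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "text",   # XPath sees CDATA sections as text
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}

def select_children(element, node_test):
    """List the kinds of child nodes that 'node()' or '*' would select."""
    kinds = [KIND_NAMES[child.nodeType] for child in element.childNodes]
    if node_test == "*":
        return [kind for kind in kinds if kind == "element"]
    if node_test == "node()":
        return kinds
    raise ValueError("only 'node()' and '*' are modeled here")

doc = minidom.parseString('<a x="1">hi<!-- note --><?pi data?><b/></a>')
root = doc.documentElement
print(select_children(root, "node()"))
# ['text', 'comment', 'processing-instruction', 'element']
print(select_children(root, "*"))
# ['element']
print(root.getAttribute("x"))   # 1 (reachable through @*, not node())
```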
Sometimes you’ll see identity templates written with the match pattern @*|* instead of @*|node(). Such templates drop comments and processing instructions but copy all elements and attributes. Text nodes happen to get picked up by the XSL processor’s built-in template for text nodes, which copies them to the output. Ever noticed how a style sheet gone astray tends to dump all the text to output? That’s the built-in text template at work.
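The effect of the @*|* variant can be sketched with minidom as well. In this assumed model (hypothetical function names, not an XSLT API), elements and attributes are copied by the “identity” rule, text falls through to a stand-in for the built-in text template, and comments and processing instructions hit built-in templates that output nothing.

```python
from xml.dom import minidom, Node

def copy_star_pattern(source, out_doc, out_parent):
    """Mimic an identity template whose match pattern is @*|*."""
    for child in source.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:             # matched by *
            copy = out_doc.createElement(child.tagName)
            for name, value in child.attributes.items():   # matched by @*
                copy.setAttribute(name, value)
            out_parent.appendChild(copy)
            copy_star_pattern(child, out_doc, copy)         # recurse
        elif child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            # stand-in for the built-in text template: copy the text through
            out_parent.appendChild(out_doc.createTextNode(child.data))
        # comments and processing instructions fall to built-in templates
        # that output nothing, so they are silently dropped

def transform_with_star(xml_text):
    source = minidom.parseString(xml_text)
    result = minidom.Document()
    copy_star_pattern(source, result, result)
    return result.documentElement.toxml()

print(transform_with_star('<a x="1">hi<!-- note --><?pi data?><b/></a>'))
# <a x="1">hi<b/></a>
```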
The identity transform does not produce a byte-exact copy of its XML input. For example, depending on your XML toolset, it may expand an empty-element tag such as <br/> into a start-tag and end-tag pair, or change white space. It could also change the encoding of the document and expand entity references. But identity transforms do produce a copy that is semantically equivalent to its XML input.
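Python’s xml.etree.ElementTree shows the same effect on a small scale: a parse-and-serialize round trip changes the bytes, while comparing C14N forms with ET.canonicalize (available since Python 3.8) confirms the two versions are equivalent. This only illustrates the toolset point; it is not what any particular XSL processor emits.

```python
import xml.etree.ElementTree as ET

original = "<doc><br></br><p a='1'>text</p></doc>"
copy = ET.tostring(ET.fromstring(original), encoding="unicode")

print(copy)              # <doc><br /><p a="1">text</p></doc>
print(copy == original)  # False: not a byte-exact copy
# Canonical XML normalizes tag and quoting differences away.
print(ET.canonicalize(original) == ET.canonicalize(copy))  # True
```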
Variations
Where the identity transform gets interesting is when you create additional special-case templates in your style sheet. During XSL processing, every template pattern has a match priority, and when more than one template matches a node, the processor uses the one with the highest priority, which is normally the most specific. The @* and node() patterns in the identity template have a low default priority (-0.5 in XSLT 1.0), while a pattern that simply names an element, such as header, has a default priority of 0. So if a more specific template matches during the transform’s recursive walk of the XML document, that template is used instead of the identity template.
You can remove a single element or prune an entire branch from an XML document with a single-line empty template added to your style sheet:
<xsl:template match="header" />
Empty XSL templates eat XML content during an identity transform. Because the template above contains no further xsl:apply-templates calls, the recursion that led to the header element rewinds without continuing to the header element’s child nodes.
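Here is a rough Python analogue of the empty-template trick, again using xml.etree.ElementTree rather than XSLT; identity_pruning and its pruned parameter are hypothetical names. Because ElementTree stores the text that follows an element in that element’s tail, the sketch moves the tail onto the preceding output so that only the header element and its descendants disappear, as they would in XSLT.

```python
import xml.etree.ElementTree as ET

def identity_pruning(node, pruned=frozenset({"header"})):
    """Identity copy that drops elements named in `pruned` with their subtrees."""
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text, copy.tail = node.text, node.tail
    for child in node:
        if child.tag in pruned:
            # Like an empty template: no copy, no recursion into descendants.
            # The text after the element (its tail) is a separate sibling
            # node in XPath terms, so it is kept.
            if child.tail:
                if len(copy):
                    copy[-1].tail = (copy[-1].tail or "") + child.tail
                else:
                    copy.text = (copy.text or "") + child.tail
            continue
        copy.append(identity_pruning(child, pruned))
    return copy

source = "<doc><header><title>Draft</title></header>Body <p>text</p></doc>"
print(ET.tostring(identity_pruning(ET.fromstring(source)), encoding="unicode"))
# <doc>Body <p>text</p></doc>
```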
Renaming is fun with an identity transform. Renaming all the para elements to p elements in an XML document can be accomplished with an identity transform and the following template:
<!-- Rename para to p -->
<xsl:template match="para">
  <p>
    <xsl:apply-templates select="@* | node()" />
  </p>
</xsl:template>
Note the similarity to the base identity transform template. In this case, the xsl:copy instruction has been replaced with a literal p element. The xsl:apply-templates call continues the recursive walk through the para element’s attributes and child nodes, and whatever they produce lands inside the new p element. Renaming elements with an identity transform is a lot less work than the equivalent operation with the DOM API.
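In a Python analogue (xml.etree.ElementTree standing in for an XSL processor, with a hypothetical RENAMES mapping), the special-case template becomes a lookup that supplies a different tag while the rest of the recursive copy stays the same.

```python
import xml.etree.ElementTree as ET

RENAMES = {"para": "p"}   # hypothetical mapping for this example

def identity_renaming(node):
    """Identity copy that swaps in a new name for selected elements."""
    new_tag = RENAMES.get(node.tag, node.tag)    # the special-case "template"
    copy = ET.Element(new_tag, dict(node.attrib))
    copy.text, copy.tail = node.text, node.tail
    for child in node:
        copy.append(identity_renaming(child))   # keep walking the children
    return copy

source = '<body><para class="intro">Hi <em>there</em></para></body>'
print(ET.tostring(identity_renaming(ET.fromstring(source)), encoding="unicode"))
# <body><p class="intro">Hi <em>there</em></p></body>
```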
You can strip all attributes from a particular element by simply not including the @* in the recursive xsl:apply-templates call, as line four shows below:
1 |<!-- Strip all attributes from Product elements -->
2 |<xsl:template match="Product">
3 |  <xsl:copy>
4 |    <xsl:apply-templates select="node()" />
5 |  </xsl:copy>
6 |</xsl:template>
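The same idea in a hedged Python sketch: identity_stripping and STRIP_ATTRIBUTES are hypothetical names, and xml.etree.ElementTree stands in for an XSL processor. Only the matched element loses its attributes; its descendants still go through the ordinary identity rule, just as they do when xsl:apply-templates hands them back to the identity template.

```python
import xml.etree.ElementTree as ET

STRIP_ATTRIBUTES = {"Product"}   # hypothetical: elements whose attributes go

def identity_stripping(node):
    """Identity copy that drops the attributes of selected elements only."""
    attrs = {} if node.tag in STRIP_ATTRIBUTES else dict(node.attrib)
    copy = ET.Element(node.tag, attrs)
    copy.text, copy.tail = node.text, node.tail
    for child in node:                          # children get the identity rule,
        copy.append(identity_stripping(child))  # so their attributes survive
    return copy

source = '<Catalog id="c1"><Product sku="42"><Name lang="en">Widget</Name></Product></Catalog>'
print(ET.tostring(identity_stripping(ET.fromstring(source)), encoding="unicode"))
# <Catalog id="c1"><Product><Name lang="en">Widget</Name></Product></Catalog>
```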
Identity transforms can help you quickly build subsets of XML documents. In the following example, only customers in a certain zip code are copied to output:
<!-- Copy only customers in the 90210 zip code -->
<xsl:template match="Customer">
  <xsl:if test="Address/ZipCode='90210'">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:if>
</xsl:template>
There are many ways to achieve this filtering without an identity transform, but as the filtering criteria become more complex, this approach has clear advantages. By using xsl:choose constructs in place of the xsl:if above, you can express much more complicated filtering, and express it more clearly, than with equivalent XPath expressions.
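A Python approximation of this filter, using xml.etree.ElementTree, is sketched below; identity_filtering and keep_tail are hypothetical helpers. One simplification is flagged in the comments: findtext checks only the first matching ZipCode, whereas the XPath = comparison is true if any Address/ZipCode node has that value.

```python
import xml.etree.ElementTree as ET

def keep_tail(parent_copy, tail):
    """Preserve text that followed a dropped element (it is a sibling node)."""
    if not tail:
        return
    if len(parent_copy):
        parent_copy[-1].tail = (parent_copy[-1].tail or "") + tail
    else:
        parent_copy.text = (parent_copy.text or "") + tail

def identity_filtering(node, zip_code="90210"):
    """Identity copy that keeps only Customer elements in one zip code."""
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text, copy.tail = node.text, node.tail
    for child in node:
        # Simplification: only the first Address/ZipCode is checked.
        if child.tag == "Customer" and child.findtext("Address/ZipCode") != zip_code:
            keep_tail(copy, child.tail)   # drop the customer, keep following text
            continue
        copy.append(identity_filtering(child, zip_code))
    return copy

source = ('<Customers>'
          '<Customer id="1"><Address><ZipCode>90210</ZipCode></Address></Customer>'
          '<Customer id="2"><Address><ZipCode>10001</ZipCode></Address></Customer>'
          '</Customers>')
print(ET.tostring(identity_filtering(ET.fromstring(source)), encoding="unicode"))
# <Customers><Customer id="1"><Address><ZipCode>90210</ZipCode></Address></Customer></Customers>
```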
Creative variations on identity transforms allow XML documents to be entirely re-structured, sorted, grouped, flattened, expanded, filtered, decorated and otherwise transmogrified!
Attributes to Elements
A useful transform that’s very similar to the identity transform is one that converts attributes to elements. This simple transform has applications in documentation, validation, and code generation.
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements -->
  <xsl:template match="/ | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Attributes -->
  <xsl:template match="@*">
    <xsl:element name="attribute">
      <xsl:attribute name="name">
        <xsl:value-of select="local-name()" />
      </xsl:attribute>
      <xsl:value-of select="." />
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
If you were to expand the style sheet above to include special rules for all the node types, you could create a style sheet that reveals the structure of an XML document as viewed from the XSL processor’s perspective. This is a good learning exercise.
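A Python version of the attributes-to-elements transform is sketched below with xml.etree.ElementTree as a stand-in for an XSL processor; the function names are hypothetical. Attribute nodes come before an element’s children in XPath document order, so the sketch emits the new attribute elements first and then places the element’s original leading text after them.

```python
import xml.etree.ElementTree as ET

def local_name(name):
    """Strip an ElementTree '{namespace-uri}' prefix, like local-name()."""
    return name.rsplit("}", 1)[-1]

def attributes_to_elements(node):
    """Identity-style copy that turns each attribute into a child element."""
    copy = ET.Element(node.tag)                     # no attributes copied
    copy.tail = node.tail
    for name, value in node.attrib.items():        # @* comes first in document order
        attr = ET.SubElement(copy, "attribute", {"name": local_name(name)})
        attr.text = value
    if len(copy):                                   # the element's leading text
        copy[-1].tail = node.text                   # follows the new elements
    else:
        copy.text = node.text
    for child in node:                              # then node(), recursively
        copy.append(attributes_to_elements(child))
    return copy

source = '<Item id="7">Lamp <b lang="en">on</b></Item>'
print(ET.tostring(attributes_to_elements(ET.fromstring(source)), encoding="unicode"))
# <Item><attribute name="id">7</attribute>Lamp <b><attribute name="lang">en</attribute>on</b></Item>
```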
Some other good learning exercises worth attempting are creating an XML representation of the Infoset of an XML document, pretty printing an XML document, or producing the C14N representation of an XML document. All of these transforms can be based on an identity transform. I’ll warn you that perfect results may not be attainable, but you can get close, and you will gain a better understanding of XSL and your XML toolset along the way. In the future, I’ll post my attempts at these transforms.
Developers must embrace recursion in XSL transformations in order to really “get” what XSL is all about. The identity transform is the King of recursive XSL transforms. In just three statements, the basic identity transform can copy any XML document. With simple variations on the identity transform, you can make quite complex transformations of XML documents. This style of XSL programming is also integral to creating XSL transformation pipelines—the subject of the next essay.
References
- XSL Transformations (XSLT) Version 1.0, http://www.w3.org/TR/xslt/
