
XSL Techniques

This essay is a hodge-podge of XSL techniques. All of these samples come directly from my experience working with a variety of XML documents.

Variables

A fundamental xsl:variable technique is to use variables to cache the value of an expensive XPath expression or lookup in order to avoid multiple evaluations.

Variables also help with clarity when using an XPath expression with multiple axes or a function like current(). The following demonstrates the difference:

<!-- Example with Variable -->
<xsl:variable name="ManagerId" select="Manager/@empId" />
<xsl:value-of select="/Managers[@empId=$ManagerId]" />

<!-- Example with current() -->
<xsl:value-of select="/Managers[@empId=current()/Manager/@empId]" />

The current() function is frequently used in xsl:for-each statements where it almost always causes confusion for developers. When a variable’s value is going to be used multiple times within an xsl:for-each loop, eliminating current() from the code helps with clarity.
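
As a minimal sketch of that clean-up, the style sheet below assumes a hypothetical input with /Company/Department elements carrying name and managerId attributes, and /Company/Managers/Manager elements carrying an empId attribute and a Name child. None of these names come from the samples above; they are only for illustration.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- hypothetical input: /Company/Department and /Company/Managers/Manager -->
   <xsl:template match="/Company">
      <xsl:for-each select="Department">
         <!-- capture the value from the loop's context node once -->
         <xsl:variable name="ManagerId" select="@managerId"/>

         <xsl:value-of select="@name"/>
         <xsl:text>: </xsl:text>
         <!-- inside the predicate, @empId belongs to the Manager being
              tested, while $ManagerId still belongs to the Department -->
         <xsl:value-of
            select="/Company/Managers/Manager[@empId=$ManagerId]/Name"/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

The same lookup written with current() would read Manager[@empId=current()/@managerId], which works but forces the reader to track two context nodes in one expression.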

Using variables as pointers can improve performance. If you have to perform a pivot-table or cross-tabulation style transform, variables save a lot of lookups.

<!-- Cross-Tabulation -->
<xsl:variable name="Vend1" select="/All/Vendor[1]" />
<xsl:variable name="Vend2" select="/All/Vendor[2]" />
...
<table>
   <tr>
      <td>Vendor</td>
      <td><xsl:value-of select="$Vend1/Name" /></td>
      <td><xsl:value-of select="$Vend2/Name" /></td>
   </tr>
   <tr>
      <td>Q1 Sales</td>
      <td><xsl:value-of select="$Vend1/Sales/Q1" /></td>
      <td><xsl:value-of select="$Vend2/Sales/Q1" /></td>
   </tr>
...

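The variables act as pointers in the select expressions above: each one is bound once to a Vendor node, so later expressions start from that node instead of re-evaluating the path from the document root. In a large XML document, such pointers can be much more efficient than repeating the full lookup.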

Choose When Otherwise

When I first encountered the xsl:choose statement, I didn’t like it at all. But over time it’s become one of my favorite XSL elements to experiment with.

Evaluation of the xsl:choose statement short-circuits at the first matching test condition. This means you can often use less expensive XPath test conditions first to weed out simple cases before resorting to an unavoidably expensive XPath test condition. When processing many elements in a loop, such an approach can yield big performance gains.
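
A rough sketch of that ordering follows. It assumes a hypothetical document in which Order elements have id and status attributes and Item children, and Invoice elements elsewhere in the document point back to orders through an orderId attribute. These names are invented for the example.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- hypothetical input: Order elements plus Invoice elements
        that refer to them by orderId -->
   <xsl:template match="Order">
      <xsl:variable name="OrderId" select="@id"/>
      <xsl:value-of select="concat($OrderId, ': ')"/>
      <xsl:choose>
         <!-- cheap: one attribute test on the context node -->
         <xsl:when test="@status='cancelled'">cancelled</xsl:when>
         <!-- cheap: existence test on a child element -->
         <xsl:when test="not(Item)">empty</xsl:when>
         <!-- expensive: scans every Invoice in the document, but is
              only reached when both cheap tests have failed -->
         <xsl:when test="//Invoice[@orderId=$OrderId]">invoiced</xsl:when>
         <xsl:otherwise>open</xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
   </xsl:template>
</xsl:stylesheet>
```

Reordering the tests changes the result only when more than one condition can be true for the same element, so it is worth checking that the cheap cases really should take precedence.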

Another advantage of short-circuit evaluation is the ability to capture complex expressions as a series of empty xsl:when statements with logically negative test conditions followed by an xsl:otherwise statement that does the work.

<!-- Animal CheckList -->
<xsl:choose>
   <xsl:when test="Pig[@snout='true']" />
   <xsl:when test="Duck[@feet='webbed']" />
   <xsl:when test="Chicken[number(@legs)=2]" />
   <xsl:otherwise>
      <xsl:call-template name="vet911" />
   </xsl:otherwise>
</xsl:choose>

This style of xsl:choose statement can be considerably easier to read and understand. Elaborate logical constructions are not uncommon with complex XSL transforms involving farm animals.

Grouping

Grouping, not groping, is a common reporting requirement. If your XML data is already grouped in a hierarchy, straightforward XSL processing will do. However, when you have to build the hierarchy from a flat structure, you’ll need a method of extracting all the unique elements to drive the grouping. Two common techniques for doing this are Muenchian grouping and a technique I’ll refer to as ‘Not Preceding’ grouping:

<!-- Muenchian Grouping -->
<xsl:key name="empKey" match="Employee" use="@empId"/>

<xsl:for-each select="Employee[
   generate-id()=generate-id(key('empKey', @empId)[1])]">
   ...
</xsl:for-each>

<!-- not(preceding::Grouping) -->
<xsl:for-each select="Employee[
   not(preceding::Employee/@empId = @empId)]">
   ...
</xsl:for-each>

I almost always use the ‘Not Preceding’ grouping because it seems more intuitive to me. The Muenchian grouping will likely perform better for large XML documents. Assuming, reasonably, that each key() lookup is O(log n), the Muenchian grouping technique performs on the order of O(n log n). The ‘Not Preceding’ method is O(n²), because every candidate element is compared against all the elements before it. However, depending on the size of your XML, the performance difference is probably irrelevant.
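
For context, here is a sketch of a complete Muenchian grouping style sheet. It assumes a hypothetical flat input of the form /Employees/Employee, where each Employee has a dept attribute and a Name child, and it groups by dept rather than empId. The element and key names are illustrative only.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <!-- index every Employee by its dept attribute (hypothetical input) -->
   <xsl:key name="deptKey" match="Employee" use="@dept"/>

   <xsl:template match="/Employees">
      <Departments>
         <!-- outer loop: only the first Employee of each distinct dept -->
         <xsl:for-each select="Employee[
            generate-id()=generate-id(key('deptKey', @dept)[1])]">
            <xsl:sort select="@dept"/>
            <Department name="{@dept}">
               <!-- inner loop: every Employee sharing that dept -->
               <xsl:for-each select="key('deptKey', @dept)">
                  <Employee><xsl:value-of select="Name"/></Employee>
               </xsl:for-each>
            </Department>
         </xsl:for-each>
      </Departments>
   </xsl:template>
</xsl:stylesheet>
```

The outer loop finds one representative per group, and the inner loop reuses the same key to fetch the group's members, so the flat list is never rescanned.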

Advanced Sorting

The xsl:sort element allows both ascending and descending sort orders for the results selected by the xsl:apply-templates or xsl:for-each calls. Sorts can also convert text for numeric sorting:

<xsl:sort select="EmployeeNum" data-type="number" />

Multiple sort orders are specified with additional xsl:sort elements:

<xsl:apply-templates select="Customer">
   <xsl:sort select="LastName" />
   <xsl:sort select="FirstName" />
</xsl:apply-templates>

It’s often overlooked that the sort select expression can include any XPath functions. Using XPath functions, you can create artificial keys for sorting. A compound key made from the first letter of the last name followed by the first digit group of an SSN could be built as follows:

<xsl:sort select="concat( substring( LastName, 1, 1 ),
                          substring-before( SSN, '-') )" />

The example may not be pretty, but the unusual technique of creating an artificial sort key is helpful occasionally.
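
Another sketch along the same lines combines an artificial key with a descending numeric sort. It assumes a hypothetical /Customers/Customer input where each Customer has LastName and Balance children. The translate() call folds the last name to upper case so that the primary sort ignores case.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
   <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

   <!-- hypothetical input: /Customers/Customer with LastName and Balance -->
   <xsl:template match="/Customers">
      <xsl:for-each select="Customer">
         <!-- primary: artificial key, last name folded to upper case -->
         <xsl:sort select="translate(LastName, $lower, $upper)"/>
         <!-- secondary: numeric comparison, largest balance first -->
         <xsl:sort select="Balance" data-type="number"
            order="descending"/>
         <xsl:value-of select="concat(LastName, ' ', Balance, '&#10;')"/>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

The folding here only covers unaccented Latin letters; other alphabets would need longer translation strings or a processor-specific collation.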

Recursive String Parsing

Many languages have better facilities for string manipulation than XSL. But several of the string functions, concat(), substring-before() and substring-after(), lend themselves to recursive string processing. A basic recursive string tokenization procedure is shown below:

<!-- recursively splits string into <Token> elements -->
<xsl:template name="tokenize">
   <xsl:param name="src"/>

   <xsl:choose>
      <xsl:when test="contains($src,' ')">

         <!-- build first token element -->
         <xsl:element name="Token">
            <xsl:value-of
               select="substring-before($src,' ')"/>
         </xsl:element>

         <!-- recurse -->
         <xsl:call-template name="tokenize">
            <xsl:with-param name="src"
               select="substring-after($src,' ')"/>
         </xsl:call-template>

      </xsl:when>
      <xsl:otherwise>

         <!-- last token, end recursion -->
         <xsl:element name="Token">
            <xsl:value-of select="$src"/>
         </xsl:element>

      </xsl:otherwise>
   </xsl:choose>
</xsl:template>

Here’s a simple driver that captures the Token elements into a variable then processes the result tree fragment with the node-set() extension function:

<!-- recursive string parsing demo driver -->
<!-- the msxsl prefix must be bound to urn:schemas-microsoft-com:xslt
     on the xsl:stylesheet element -->
<xsl:template name="string-parse">
   <xsl:param name="src"/>

   <!-- split input $src into tokens xml fragment -->
   <xsl:variable name="tokens">
      <xsl:call-template name="tokenize">
         <xsl:with-param name="src"
            select="normalize-space($src)"/>
      </xsl:call-template>
   </xsl:variable>

   <!-- iterate over each token -->
   <xsl:for-each select="msxsl:node-set($tokens)/Token">
      <xsl:value-of select="."/>
   </xsl:for-each>

</xsl:template>

Variations on the recursive tokenizing template above are useful in a variety of circumstances. The recursion and driver could be combined into a single template, but were left separate here for clarity.
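
One possible combined variation is sketched below. It processes each token as soon as it is split off, so no intermediate fragment or node-set() call is needed. The template name process-tokens, the /Doc/Keywords input path, and the li output are assumptions made for the example.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <!-- hypothetical input: <Doc><Keywords>red green blue</Keywords></Doc> -->
   <xsl:template match="/">
      <ul>
         <xsl:call-template name="process-tokens">
            <xsl:with-param name="src" select="string(/Doc/Keywords)"/>
         </xsl:call-template>
      </ul>
   </xsl:template>

   <!-- splits and processes in one pass; stops on an empty string -->
   <xsl:template name="process-tokens">
      <xsl:param name="src"/>
      <!-- drop leading, trailing, and doubled spaces -->
      <xsl:variable name="clean" select="normalize-space($src)"/>
      <xsl:if test="$clean">
         <!-- appending a space guarantees substring-before finds one,
              even when only a single token is left -->
         <li>
            <xsl:value-of
               select="substring-before(concat($clean, ' '), ' ')"/>
         </li>
         <xsl:call-template name="process-tokens">
            <xsl:with-param name="src"
               select="substring-after($clean, ' ')"/>
         </xsl:call-template>
      </xsl:if>
   </xsl:template>
</xsl:stylesheet>
```

The trade-off is that the tokens never exist as nodes, so they cannot be sorted, counted, or revisited later in the transform.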

The Document Function

The document() function allows you to load an XML document from within a style sheet and make queries against it. External XML documents make for great lookup tables, separating your table data from your style sheet. They can also be used in a content management scenario to pull in common page template components when building documents.
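
A minimal lookup-table sketch follows. It assumes a hypothetical file named states.xml, stored next to the style sheet and shaped like &lt;States&gt;&lt;State code="PA"&gt;Pennsylvania&lt;/State&gt;...&lt;/States&gt;, and an input document containing Address elements with a state attribute. Both the file and the element names are invented for illustration.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- load the external table once; a relative URI given as a string
        is resolved against the style sheet's own location -->
   <xsl:variable name="States" select="document('states.xml')/States"/>

   <!-- hypothetical input: Address elements with a state attribute -->
   <xsl:template match="Address">
      <xsl:variable name="Code" select="@state"/>
      <!-- query the external document like any other node set -->
      <xsl:value-of select="$States/State[@code=$Code]"/>
      <xsl:text>&#10;</xsl:text>
   </xsl:template>
</xsl:stylesheet>
```

Holding the loaded table in a top-level variable keeps the lookup expression short and makes it easy to swap in a different table file.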

A simple program that produces a nested XML representation of a folder and file structure by performing a recursive directory walk is a useful tool to have around. It’s the kind of thing you can whip together in Perl or .NET in a few minutes. Used as input to a document() function, a simple XSL-based driver program can be built for transforming a slew of XML files. Here’s the basic structure in XSL:

<!-- Folder Driver -->
<xsl:template match="/ | Folder">
   <xsl:apply-templates select="Folder | File"/>
</xsl:template>

<!-- File Driver -->
<xsl:template match="File">
   <xsl:apply-templates select="document(@name)/Monkey"/>
</xsl:template>

<!-- Filter -->
<xsl:template match="Monkey">
   <xsl:copy-of select="Love"/>
</xsl:template>

If your XSL implementation provides extension functions for writing multiple result documents, then you’ve got the basis for a simple content management system.

Another use for the XSL-based driver program is as an XSL-grep tool. You simply write an XSL filter that pulls out just the information you need from all the external XML documents and combines them into a single output document. The sample above extracts all the Love from the Monkeys.

Extension Objects

Extension objects are very implementation specific, but they can be especially useful. With extension objects, you get to pass Java or .NET class instances into your transforms. Your classes can expose native Java or .NET methods to the XSL processor that you use just like any XPath function.

EXSLT

The EXSLT library is a collection of common XSL extension functions. The functions offer math, date, set operation, string manipulation and other extensions. The library is increasingly available for a number of XSL processors, making style sheets that use it somewhat portable.
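
As a rough sketch of what calling an EXSLT function looks like, the style sheet below uses str:tokenize from the EXSLT strings module, which returns a set of token elements. The Keywords input element is hypothetical, and whether the function is available depends on the processor, so the call is guarded with function-available().

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:str="http://exslt.org/strings"
   exclude-result-prefixes="str">
   <xsl:output method="xml" indent="yes"/>

   <!-- hypothetical input: a Keywords element holding space-separated words -->
   <xsl:template match="Keywords">
      <Tokens>
         <xsl:choose>
            <xsl:when test="function-available('str:tokenize')">
               <!-- str:tokenize returns one token element per word -->
               <xsl:for-each
                  select="str:tokenize(normalize-space(.), ' ')">
                  <Token><xsl:value-of select="."/></Token>
               </xsl:for-each>
            </xsl:when>
            <xsl:otherwise>
               <xsl:message>str:tokenize is not available</xsl:message>
            </xsl:otherwise>
         </xsl:choose>
      </Tokens>
   </xsl:template>
</xsl:stylesheet>
```

The function-available() guard lets the same style sheet load on processors without EXSLT support and fall back gracefully instead of failing.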

Common uses for extension objects include generating unique serial numbers, encryption, mathematical calculations, and string tokenizing. Extension objects are easy to implement and open up the powerful libraries of Java and the .NET framework to your XSL.

Here’s a brief sample using C# and .NET:

// XSL Extension Object Sample
public class TestExtension
{
   // Callable from the style sheet as lh:Howdy()
   public string Howdy( )
   {
      return "Hello";
   }
}

// Requires: using System.Xml.Xsl;
// Create an argument list object
XsltArgumentList args = new XsltArgumentList();

// Create an instance of TestExtension
TestExtension test = new TestExtension();

// Associate the instance with a URI in the arguments
args.AddExtensionObject( "http://liquidhub.com/Ext", test );

// Perform the transform
// (xsl is the already-loaded transform, doc the input document)
xsl.Transform( doc, args );

<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:lh="http://liquidhub.com/Ext">

   <!-- Sample extension object function call -->
   <xsl:variable name="x" select="lh:Howdy()" />
   ...
</xsl:stylesheet>

The mapping of the extension object to a namespace URI is the important part to get correct. The URI in the code MUST match the URI in the style sheet. Extension functions can take any parameters the built-in XPath functions take, including node sets.

I wouldn’t use extension objects for something I could accomplish without too much trouble in XSL. Transform parameters passed in from your code might suffice in many situations. But extension objects are nice to know about because they can really bail you out when you need them for performance or exotic functionality.
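
For comparison, here is a sketch of the transform-parameter alternative. The parameter name greeting and the /Person/Name input are hypothetical. The calling code supplies the value through its processor's parameter API (in .NET, XsltArgumentList.AddParam), and the select attribute only provides a default.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- top-level parameter; used as-is unless the caller overrides it -->
   <xsl:param name="greeting" select="'Hello'"/>

   <!-- hypothetical input: <Person><Name>...</Name></Person> -->
   <xsl:template match="/">
      <xsl:value-of select="concat($greeting, ', ', /Person/Name)"/>
   </xsl:template>
</xsl:stylesheet>
```

A parameter is enough when the code only needs to hand a value to the transform; an extension object earns its keep when the style sheet needs to call back into code during the transform.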


Future

XSL 2.0 is undoubtedly going to be an improvement to the already powerful XSL 1.0 language. XSL 1.0’s wide adoption really helped focus the XSL 2.0 efforts on important enhancements. Expect moving to XSL 2.0 to be like upgrading from the kind of hand tools you buy at a hardware mega store to the kind of hand tools a furniture maker works with. Both get the job done, but in a craftsman’s hands the finer tool will produce better results more easily.

The XSL 2.0 specification is being developed in coordination with the XPath 2.0 and XQuery 1.0 specifications. They will all probably become W3C recommendations in the winter or spring of 2006. XQuery is a complementary technology to XSL, best suited to handling the querying of XML data in a role similar to SQL with relational databases. XSL will remain the primary XML transformation language, though both languages overlap into each other’s functional territory. Both specifications share XPath 2.0 as a common expression language. Improvements to XPath will enable cleaner solutions to difficult XSL 1.0 operations like Muenchian grouping.
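
As an illustration of the grouping improvement, the XSLT 2.0 specification defines an xsl:for-each-group instruction. The sketch below assumes a hypothetical flat /Employees/Employee input where each Employee has a dept attribute and a Name child, and it needs an XSLT 2.0 processor such as Saxon to run.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <!-- hypothetical input: /Employees/Employee with @dept and Name -->
   <xsl:template match="/Employees">
      <Departments>
         <!-- one iteration per distinct dept value; no key needed -->
         <xsl:for-each-group select="Employee" group-by="@dept">
            <xsl:sort select="current-grouping-key()"/>
            <Department name="{current-grouping-key()}">
               <!-- current-group() holds the members of this group -->
               <xsl:for-each select="current-group()">
                  <Employee><xsl:value-of select="Name"/></Employee>
               </xsl:for-each>
            </Department>
         </xsl:for-each-group>
      </Departments>
   </xsl:template>
</xsl:stylesheet>
```

The key declaration and the generate-id() comparison of the Muenchian technique disappear entirely; the grouping intent is stated directly in the group-by attribute.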

Microsoft will certainly support these new core XML technologies, but based on their past experience implementing XSL 1.0, they will probably wait until the ink is dry on the XSL 2.0 specification. If you’re eager to try these new technologies, Dr. Michael Kay’s Saxon toolset supports the latest draft specifications.

References

XML Path Language (XPath) 2.0
http://www.w3.org/TR/xpath20/
XSL Transformations (XSLT) Version 2.0
http://www.w3.org/TR/xslt20/
XQuery 1.0 : An XML Query Language
http://www.w3.org/TR/xquery/
Saxonica : SAXON
http://www.saxonica.com/
EXSLT : An Extension Library for XSL
http://exslt.org/