Using XSLT to Assist Regression Testing

    April 22, 2003

Regression testing is an important software-testing technique in which the output of a program before a change was made is compared to the output after the change in order to determine whether the change introduced bugs.

This sort of testing is useful after refactoring, when code is changed to improve its structure or performance without altering its behavior. It is also useful when new features are added to software, as a way to test whether or not the old features have been affected.

Recently, colleagues of mine asked if I knew of a tool that could help them regression-test some code which outputs XML. The problem was, they explained, that their changes may affect the order of elements. However, for their application, the order did not matter as long as the hierarchical structure and element content remained the same. Example 1 demonstrates just such a case.

Example 1: Equivalent documents ignoring order
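
As a rough illustration of the situation described above, consider two documents like the following. The element names and values are invented for this sketch. The first document lists sku before qty:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<order>
  <item>
    <sku>A-100</sku>
    <qty>2</qty>
  </item>
</order>
```

The second lists the same children in the opposite order:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<order>
  <item>
    <qty>2</qty>
    <sku>A-100</sku>
  </item>
</order>
```

A line-by-line diff reports these as different, even though the hierarchy and the element content are the same.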


I did not know of a tool that did this sort of comparison explicitly; however, I quickly told them that they did not need such a tool. “All you need to do is to normalize the output XML using a tiny bit of XSLT,” I said. “Then you can simply use a standard Unix diff to check for differences.”

Before exploring the solution motivated by my colleagues’ problem, it’s worth investigating solutions to other common normalization problems. The simplest case arises when the only expected differences between two XML documents are whitespace differences, that is, when you need to normalize whitespace between elements. Example 2 shows two documents that are equivalent, irrespective of whitespace. Example 3 shows a simple stylesheet for normalizing such documents. The idea is to copy the input to the output, stripping all of the document’s original whitespace-only nodes and inserting new whitespace using indent="yes".

Example 2: Equivalent documents ignoring whitespace
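
A hypothetical pair of this kind might look as follows (the element names are invented for illustration). The first document is indented:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<order>
  <item>
    <sku>A-100</sku>
    <qty>2</qty>
  </item>
</order>
```

The second carries the same elements and content with no whitespace between tags:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<order><item><sku>A-100</sku><qty>2</qty></item></order>
```

The only differences are whitespace-only text nodes between elements. The text inside sku and qty is identical.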


Example 3: Normalizing whitespace-only nodes
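
A minimal sketch of such a stylesheet, assuming XSLT 1.0 and a bulk copy with xsl:copy-of, could look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes from every element of the input -->
  <xsl:strip-space elements="*"/>

  <!-- Ask the processor to add its own, consistent indentation on output -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Copy the whole (already stripped) input tree to the output -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Because the exact indentation produced by indent="yes" is left to the processor, it is safest to run both documents through the same processor before comparing them.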


Although the solution in Example 3 efficiently performs the desired normalization, it’s worth mentioning an alternative implementation using the identity transformation, as shown in Example 4.

Example 4: An alternate whitespace normalizer using the identity transformation
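
The following sketch shows one way to write this, combining the standard identity template with the same xsl:strip-space and xsl:output settings described for Example 3:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes from every element of the input -->
  <xsl:strip-space elements="*"/>

  <!-- Re-indent consistently on output -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity transformation: copy each attribute and node,
       then recurse into its attributes and children -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Unlike a single bulk copy, this version visits every node through a template rule, which gives an importing stylesheet a place to intervene.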


The identity transformation is one of the most useful XSLT idioms. Why would a transformation that simply copies its input to its output be so useful? Because of XSLT’s ability to override template rules via xsl:import. Consider the stylesheet in Example 5, which imports the stylesheet in Example 4.

Example 5: Extending the functionality of Example 4 using xsl:import
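
A sketch of such an importing stylesheet follows. The file name normalize-space.xsl and the parameter name ignore are assumptions made for this illustration, and the list normalization shown (removing spaces and wrapping the list in commas) is one possible approach among several:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file name for the identity-based normalizer (Example 4) -->
  <xsl:import href="normalize-space.xsl"/>

  <!-- Comma-separated element names to drop, e.g. "timestamp, id" -->
  <xsl:param name="ignore" select="''"/>

  <!-- Normalize the list to the form ",a,b,c," so that a test for
       ",name," cannot match part of a longer element name -->
  <xsl:variable name="ignore-list"
      select="concat(',', translate($ignore, ' ', ''), ',')"/>

  <!-- Override the rule for elements only; other nodes still
       fall through to the imported identity template -->
  <xsl:template match="*">
    <xsl:if test="not(contains($ignore-list, concat(',', name(), ',')))">
      <!-- Not in the list: let the imported template copy it -->
      <xsl:apply-imports/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

An element named in the list is dropped together with its entire subtree, since the override never recurses into it.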


This stylesheet extends the functionality of the space-normalizing stylesheet to include the ability to strip certain elements from the input document. This is useful for comparing two documents that are identical, when both whitespace and certain specified elements are ignored. The elements are specified in a parameter as a comma-separated list. We normalize the list for easy element-membership checking. If we detect an element node that is not in the list, we copy it by invoking the template in the imported stylesheet using xsl:apply-imports. In the XSLT Cookbook, I explore several ways to exploit the identity transformation.

Turning back to the original problem: to solve it, we have to transform documents whose elements may come in any order into some normalized form for the purpose of comparison. One obvious normalization technique is to sort the elements by name within each level of the document hierarchy. We also want to retain the whitespace normalization features of the solution in Example 3, which leads to the XSLT in Example 6.

Example 6: A simple normalizer using xsl:sort
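
One way to sketch this is to take the identity template and add an xsl:sort to the step that processes child nodes. This is an illustration under that assumption. Attributes are copied but play no part in the ordering:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Same whitespace normalization as before -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <!-- Attributes must be emitted before any child nodes -->
      <xsl:apply-templates select="@*"/>
      <!-- Process children in name order rather than document order -->
      <xsl:apply-templates select="node()">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes have an empty name, so in mixed content they would sort ahead of sibling elements. The sketch is intended for data-oriented documents in which an element contains either text or child elements, not both.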


This solution is adequate for some cases, but we can generalize it further. The first improvement we can make is to address the case of duplicate elements occurring at some level in the hierarchy, as shown in Example 7. These can be addressed by extending the sort criteria to include the element content, as shown in Example 8.

Example 7: Documents that will not normalize correctly due to duplicate element names
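
A hypothetical pair that shows the difficulty (element names invented for this sketch): the first document lists two sku elements in one order,

```xml
<?xml version="1.0" encoding="UTF-8"?>
<order>
  <sku>A-100</sku>
  <sku>B-200</sku>
</order>
```

and the second lists them in the other:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<order>
  <sku>B-200</sku>
  <sku>A-100</sku>
</order>
```

When the only sort key is the element name, both sku elements have equal keys, and xsl:sort leaves nodes with equal keys in document order. The two normalized outputs therefore still differ.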


Example 8: Normalization via sort using node name and node content
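
A sketch of the extended sort follows, assuming the same identity-style template as the name-only normalizer with a second xsl:sort key on each node's string value:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <!-- Attributes first, in whatever order the processor supplies -->
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()">
        <!-- Primary key: node name -->
        <xsl:sort select="name()"/>
        <!-- Secondary key: string value, used to order same-named siblings -->
        <xsl:sort select="."/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The second key is the concatenated text of the element and all of its descendants, taken from the source tree in its original order.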


Alas, Example 8 is not a completely general solution. Can you think of a counterexample that will not necessarily normalize correctly? Example 9 provides one.

Example 9: A counterexample that won’t compare under our sort-based normalization strategy
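
One possible counterexample, with invented element names and values, is a pair of documents in which same-named siblings contain children in differing orders. The first document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<team>
  <member><first>Ann</first><last>Zed</last></member>
  <member><last>Bob</last><first>Cy</first></member>
</team>
```

The second document holds the same two members with the children of each swapped:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<team>
  <member><last>Zed</last><first>Ann</first></member>
  <member><first>Cy</first><last>Bob</last></member>
</team>
```

When the member elements are sorted by string value, the keys are computed from the source tree. In the first document they are "AnnZed" and "BobCy", so Ann's entry comes first. In the second they are "ZedAnn" and "CyBob", so Cy's entry comes first. The children inside each member are normalized correctly, but the members themselves end up in different orders.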


The problem is that if duplicate elements contain structure, then our sort will not necessarily succeed in placing them in a normalized order. Another problem is that we have not considered the presence of attributes. Both of these problems can be overcome at the cost of added complexity. Space does not permit me to explore this topic further here; however, I will revisit it in a future article.

As it turns out, the solution in Example 8 works just fine for many cases, including the ones of interest to my colleagues. However, it would be useful to add the capability to ignore specific elements that we introduced in Example 5. In Example 10, we use the same strategy of importing and overriding the template rule for element nodes.

Example 10: Importing and overriding the template rule for element nodes
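
A sketch along these lines follows. It assumes the sorting normalizer is saved under the hypothetical file name sort-normalize.xsl, and the parameter name ignore is likewise invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file name for the sorting normalizer (Example 8) -->
  <xsl:import href="sort-normalize.xsl"/>

  <!-- Comma-separated names of elements to leave out of the comparison -->
  <xsl:param name="ignore" select="''"/>

  <!-- ",a,b,c," form allows exact membership tests with contains() -->
  <xsl:variable name="ignore-list"
      select="concat(',', translate($ignore, ' ', ''), ',')"/>

  <!-- Elements not in the list are handed to the imported sorting
       template; listed elements are dropped with their subtrees -->
  <xsl:template match="*">
    <xsl:if test="not(contains($ignore-list, concat(',', name(), ',')))">
      <xsl:apply-imports/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The imported template still performs the sort when it applies templates to child nodes, so ordering and filtering compose without either stylesheet needing to know about the other.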


This article discussed some important XSLT features and techniques that extend beyond the immediate problem of normalizing documents for comparison. These include the handling of document whitespace, the overriding of template rules via xsl:import, the identity transformation, and the use of xsl:sort.

There are many excellent resources, both online and in print, which will enhance your understanding of these facilities. I provide a few of them below. And if you’ve enjoyed the approach taken in this article, then you may also enjoy my XSLT Cookbook.

Recommended resources:

  1. XSL FAQ.
  2. Robert DuCharme’s Transforming XML column.
  3. XSLT, by Doug Tidwell (O’Reilly, 2002).
  4. XSLT and XPath on the Edge, by Jeni Tennison (M&T Books, 2001).

Originally published at

Sal Mangano has been developing software for over 12 years and has worked on many mission-critical applications, especially in the area of financial-trading applications. Unlike many XML/XSLT developers, he did not approach the technology from the standpoint of the Internet and Web development but rather from the broader need for a general-purpose, data-transformation framework. This experience has given him a unique perspective that has influenced many of the recipes in his book, the XSLT Cookbook. Sal has a Master’s degree in Computer Science from Polytechnic University.

XSLT Cookbook – Critical for converting XML documents, and extremely versatile, the XSLT language nevertheless has complexities that can be daunting. The XSLT Cookbook is a collection of hundreds of solutions to problems that Extensible Stylesheet Language Transformations (XSLT) developers regularly face. The recipes range from simple string-manipulation and mathematical processing to more complex topics like extending XSLT, testing and debugging XSLT stylesheets, and graphics creation with SVG. Recipes can be run directly or tweaked to fit your particular application’s needs more precisely.