Pulling a list of Unique Values from XML
When you work with HTML controls in a web-based application, all of which are populated and configured by XSL transformations generated dynamically at runtime, you learn to be pretty inventive. One of the first things you learn is how NOT to “reinvent the wheel”. That is to say, if there is an example somewhere, or some existing code that you can borrow or re-use, there is “no shame and no blame” in doing so.
Recently I needed to grab some IDs and descriptions from an existing XML Data Island in order to dynamically create and populate a listbox. The only problem was that I couldn’t just do it with XPath expressions in script, “walking the DOM”, because there was more than one of each item in the document. What I needed was some way to do the XSL equivalent of “Select Distinct XYZID, XYZDESCRIPTION from MYXMLDOC”.
My first impulse was to crack open Michael Kay or search my hard drive (where I have built a veritable “Driveopedia”), but no luck. Next I turned to my superior search capabilities, which consist mainly of going to HotBot and typing plus (+) signs in front of all the words I know must be present to find what I want.
Anyway, to make it short, I found a couple of examples, and after playing around with the syntax for a while I came up with a stylesheet that should work for most cases. By the way, I can’t remember where I found the best solution, but I do remember the guy’s name: Martin Rowlinson. So Martin, thanks, yours did the trick for me!
First, let’s take a look at some sample XML data that we might need to get distinct items from:
View XML Document
After reviewing the document, you can see that it has repeating “ProdBasic” nodes.
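For readers without the original file at hand, here is a minimal hypothetical input of the kind described. Only the element names ProdInfo, ProdBasic, MktSegID and MktSegCode and the value BASIC come from the article; the nesting, the ProdID element and all other values are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sample data: structure and values are assumed, not the original file -->
<ProdInfo>
  <ProdBasic>
    <ProdID>1001</ProdID>            <!-- ProdID is an invented element -->
    <MktSegID>1</MktSegID>
    <MktSegCode>BASIC</MktSegCode>
  </ProdBasic>
  <ProdBasic>
    <ProdID>1002</ProdID>
    <MktSegID>1</MktSegID>
    <MktSegCode>BASIC</MktSegCode>   <!-- duplicate segment -->
  </ProdBasic>
  <ProdBasic>
    <ProdID>1003</ProdID>
    <MktSegID>2</MktSegID>
    <MktSegCode>PREMIUM</MktSegCode>
  </ProdBasic>
  <ProdBasic>
    <ProdID>1004</ProdID>
    <MktSegID>1</MktSegID>
    <MktSegCode>BASIC</MktSegCode>   <!-- duplicate segment -->
  </ProdBasic>
</ProdInfo>
```

A listbox built naively from this data would show BASIC three times, which is exactly the problem to solve.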
What I needed to do is extremely common, which is why I decided it would be useful to summarize a solution here. I needed to get the MktSegID values (to populate the “value” attributes of my listbox OPTION elements) and the MktSegCode values (to put in the innerText property of the corresponding OPTION elements), but I needed ONLY the UNIQUE ones. In other words, if there were three “BASIC” MktSegCode elements, I needed to return only ONE of them from the XML document, or my listbox would end up with duplicate entries and look very sloppy indeed.
Now that we see the structure of this particular XML, let’s take a look at the XSL stylesheet and analyze it. It’s so simple, in fact, that I’m reproducing it inline directly below:
<xsl:stylesheet version="1.0" xmlns:xsl=
<xsl:output method="xml" />
The first thing that should jump out at you here is that we start by declaring a key with the xsl:key element. This is an extremely powerful part of XSL. It lets us use the key() function to find the nodes in a document that possess a given value for a named key. For example, given the key definition <xsl:key name="distinct-segcode" match="MktSegCode" use="."/>, the expression key('distinct-segcode', 'BASIC') returns a node-set containing every MktSegCode element whose value is BASIC, that is, every <MktSegCode>BASIC</MktSegCode> in the document. Note that the match attribute says which nodes are indexed, and the use="." portion says which value each node is indexed under: in this case, the value of the MktSegCode element itself (".").
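To see the key on its own, here is a minimal sketch of a stylesheet that declares it and does a single lookup. The key name and match pattern are the ones quoted in this article; the template body and the text output are assumptions added purely for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every MktSegCode element under its own string value -->
  <xsl:key name="distinct-segcode" match="MktSegCode" use="."/>

  <xsl:template match="/">
    <!-- key() returns ALL MktSegCode elements whose value is 'BASIC' -->
    <xsl:text>MktSegCode elements with value BASIC: </xsl:text>
    <xsl:value-of select="count(key('distinct-segcode', 'BASIC'))"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against a document with repeated segment codes, the count is greater than one, which shows that the key by itself groups nodes but does not remove duplicates.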
Next, we do a for-each over every MktSegCode and MktSegID element in our document and apply an XPath predicate filter “[ … ]”. The predicate compares generate-id() for the current node with generate-id() for the node-set that key() returns when it is looked up with the current node’s own value.
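The following is one way such a stylesheet could look, as a sketch rather than the original listing. It assumes that MktSegID and MktSegCode are siblings under the same parent, so a single loop over the distinct MktSegCode elements can pick up the matching ID; the SELECT and OPTION output markup is likewise an assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- Index every MktSegCode element under its own string value -->
  <xsl:key name="distinct-segcode" match="MktSegCode" use="."/>

  <xsl:template match="/">
    <SELECT>
      <!-- Keep a MktSegCode only if it is the first node in its key group -->
      <xsl:for-each select="//MktSegCode[generate-id() =
                            generate-id(key('distinct-segcode', .))]">
        <!-- Assumption: the matching MktSegID is a sibling of this element -->
        <OPTION value="{../MktSegID}">
          <xsl:value-of select="."/>
        </OPTION>
      </xsl:for-each>
    </SELECT>
  </xsl:template>
</xsl:stylesheet>
```

Each distinct segment code then yields exactly one OPTION, with the ID in the value attribute and the code as the visible text.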
Remember that our goal, simply stated, is “only select the current node if it is the first occurrence among all the nodes that have the same value”. The generate-id() function returns a unique id for any given node. No matter how the node is selected, it will always have the same generated id within a given transformation. generate-id() generates the id for the node passed as its argument, or for the current context node if no argument is given.
Notice that the first generate-id() call in each predicate has no argument, so it defaults to the current context node. In the second call we pass a node-set (in this case key('distinct-segcode', .), that is, every MktSegCode element with the same value as the current node), in which case generate-id() returns the id of the node in that set that comes first in document order. So, in essence, the predicate says “only select this node if it has the same unique id as the first of all the nodes having the same value as this node”.
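The “first node of the set” behaviour can be made explicit with a [1] step, and the same test is often written with count() and a union instead. The sketch below shows both forms side by side; the plain-text output is an assumption for illustration, and both loops should list the same distinct codes.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="distinct-segcode" match="MktSegCode" use="."/>

  <xsl:template match="/">
    <!-- Form 1: compare ids, with the first node of the group spelled out as [1] -->
    <xsl:for-each select="//MktSegCode[generate-id() =
                          generate-id(key('distinct-segcode', .)[1])]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- Form 2: the union of this node and the first node of its group
         has one member only when they are the same node -->
    <xsl:for-each select="//MktSegCode[count(. | key('distinct-segcode', .)[1]) = 1]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Which form to use is mostly a matter of taste; the generate-id() version reads closer to the explanation above.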
The result is about the same as the SQL “Select distinct MktSegCode from ProdInfo” statement that we’re all so familiar with. Different paradigm, eh?
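For the two-column case, the equivalent of “Select Distinct XYZID, XYZDESCRIPTION”, one possible approach is a compound key built with concat(). This is a sketch under assumptions, not the stylesheet from this article: it assumes each ProdBasic element contains one MktSegID and one MktSegCode, and that the '|' separator never appears in either value. The key name distinct-seg is made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- Hypothetical compound key: index each ProdBasic by its ID/code pair -->
  <xsl:key name="distinct-seg" match="ProdBasic"
           use="concat(MktSegID, '|', MktSegCode)"/>

  <xsl:template match="/">
    <SELECT>
      <!-- Keep only the first ProdBasic for each distinct ID/code pair -->
      <xsl:for-each select="//ProdBasic[generate-id() =
                            generate-id(key('distinct-seg',
                              concat(MktSegID, '|', MktSegCode))[1])]">
        <OPTION value="{MktSegID}">
          <xsl:value-of select="MktSegCode"/>
        </OPTION>
      </xsl:for-each>
    </SELECT>
  </xsl:template>
</xsl:stylesheet>
```

Keying on the pair rather than on the code alone matters only if the same code can appear with different IDs; otherwise the single-column key is enough.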
Download the code that accompanies this article: articles/20010508.zip
Dr. Peter Bromberg is a Senior Programmer/Analyst at Fiserv, Inc. in Orlando and a co-developer of the http://EggheadCafe.com developer