Web Analytics and Content Group Management

    January 9, 2007

I’ve been re-reading Stephen Jay Gould’s “Full House” (a book that should be of interest to any analyst) and I was struck by the following passage:

“classifications are not passive ordering devices in a world objectively divided into obvious categories. Taxonomies are human decisions imposed upon nature – theories about the causes of nature’s order. The chronicle of historical changes in classification provides our finest insight into conceptual revolutions in human thought.”

It seems particularly relevant to me right now as, in the past few Tools posts, I have been dealing with the importance of hierarchies in web analytics. And a hierarchy is, of course, a taxonomy of how content on a web site is related.

What the passage above really brought home to me is how fundamental to analytics is the view we take of how content is related across a web site. The Functionalist methodology that we use provides one kind of taxonomy, but it encourages a taxonomy by page type, not by page content. That’s extremely useful for many kinds of analysis since, in our experience, methods of analysis are driven more by page type than by page content.

But there are many times (and visitor segmentation is high among these) when a taxonomy of pages needs to be based on the content of the page, not its function. As a fact about a visitor, it is generally much less interesting that he viewed three Router pages than that he viewed three ‘Product X’ pages (though the first fact is not without interest).

What makes problems in taxonomy especially challenging is the fact that no single taxonomy is likely to support a very wide range of analytic problems. If, for example, I want to know what visitors who trade Stocks are interested in, I might need a taxonomy that classifies pages based on tool-type (portfolio analysis, stock finder, research, historical performance, etc.). But I might also want a taxonomy that classifies page content based on equity type or market (all pages about IBM or MSFT, all pages about large-cap or small-cap stocks, or all pages classified by relevant exchange). There is no single “correct” taxonomy for a page about, for example, IBM’s performance.

Similarly, if I’m interested in the way visitors use on-site customer support, I may want to classify all support pages by Customer Support function (and Support sub-functions or contact mechanisms) while leaving all other pages in one great bucket.

For a navigational analysis I might want to classify all my router pages in a single bucket, all my search usage in another. Then I’d have a wonderful way of comparing a visitor’s search usage with his navigation usage – an analysis that’s really hard to do on a large site without this kind of capability.

On the SEMphonic website, I might want to classify pages within our web analytics space by their topic – Functionalism, SEM Analytics, Conversion Analysis, etc. But I might also want to classify them by their purpose – education, sales or information.

The gist of all this is that there is no one single correct taxonomy for a web site – there are only taxonomies appropriate for more or fewer analytic problems.

Unfortunately, many web analytics solutions (indeed, the vast majority) don’t provide a really nice way to build taxonomies on the fly. A fair number of systems force the taxonomy of the site to be set in the single most inconvenient and stupid way possible – the tag. Trying to get a good taxonomy through a tag is a nearly hopeless task unless the IT gods have smiled on you and your site happens to have a fairly coherent directory structure.

A fair number of organizations get around the difficulties of building a taxonomy in a tag by integrating with their content management system. This is, obviously, vastly better than trying to enter and maintain a taxonomy in your JavaScript. But while the Content Management System is not a terrible place to construct a taxonomy, neither is it the best. First, because handling taxonomy in this fashion will guarantee that you are locked into a single view of the site that will be inappropriate for many analyses. Second, because the taxonomy created by content managers will often, even as an uber-taxonomy of general interest, fall short of what an analyst would prefer. Its organizing principle is almost always going to be navigational, and while a navigational taxonomy is at least sensible, it isn’t always the best taxonomy for analysis.

It is also here that Web 2.0 concerns (and, in fact, any kind of dynamic site serving) will raise their collective head, because dynamic content almost never lends itself to using a directory structure as an organizing principle. In addition, Web 2.0-type widgets often cry out for cross-taxonomy analysis. I want to understand a widget’s actions within a particular widget scope (user filtered inside portfolio analysis) and I also want to understand them as a type of activity this user performs (user filtered in any view).

There really is only one logical place to construct a taxonomy – and that is in the web analytics application. It would be nice if the web analytics application could be passed a general purpose taxonomy (via tag or CMS as is now the case). But it’s even more important that the application be able to construct multiple “point” taxonomies that can be used for specific analytic purposes.

The application is the logical place for this because it contains all of the necessary information (there’s nothing necessary for classification except the page/event name). In addition, it is really only the analyst that needs this capability. There is no reason to down-stream it because no one else can or will take advantage of it. Finally, a GUI application is a great place to actually construct taxonomies.

The combination of a graphical drag-and-drop interface, the ability to apply regex rules and the ability to create analysis-specific taxonomies on the fly would be a formidable boon to web analytics practice. These capabilities would make it relatively easy to “manufacture” a taxonomy for analysis – a task that would be excruciating in a CMS or using simple assignment statements.
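To make the regex idea concrete, here is a minimal sketch of two “point” taxonomies applied to the same page names. Everything in it is an assumption for illustration: the page paths, the patterns, the category labels and the `classify` helper are hypothetical and are not taken from any particular analytics product.

```python
import re
from typing import List, Tuple

# A taxonomy is an ordered list of (regex pattern, category) rules.
# All patterns, page names and labels below are hypothetical.
Taxonomy = List[Tuple[str, str]]

TOPIC_TAXONOMY: Taxonomy = [
    (r"^/analytics/functionalism", "Functionalism"),
    (r"^/analytics/sem", "SEM Analytics"),
    (r"^/analytics/conversion", "Conversion Analysis"),
]

PURPOSE_TAXONOMY: Taxonomy = [
    (r"/whitepaper|/blog", "Education"),
    (r"/contact|/pricing", "Sales"),
]


def classify(page: str, taxonomy: Taxonomy, default: str = "Other") -> str:
    """Return the category of the first rule whose pattern matches the page name."""
    for pattern, category in taxonomy:
        if re.search(pattern, page):
            return category
    return default


page = "/analytics/sem/blog/keywords"
topic = classify(page, TOPIC_TAXONOMY)      # classified by topic
purpose = classify(page, PURPOSE_TAXONOMY)  # same page, classified by purpose
```

The same page lands in a different bucket under each taxonomy, and a new taxonomy is nothing more than another short list of rules that can be thrown away when the analysis is done.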

I think this ability might also go some way toward providing a rational mechanism for handling Web 2.0 constructs. Instead of having to decide which events constitute a “Page View” and which don’t, the ability to quickly create, use and drop hierarchies would allow the analyst to pick and choose which events to group in which ways. For one analysis, a Filter Operation might be included in the hierarchy of interest. For another, it might not. This ability to promote and demote elements up and down and in and out of a hierarchy of interest would allow Web 2.0 objects to be tracked (at implementation) in a fairly uniform low-level manner. The analyst wouldn’t have to worry about encapsulating the “best” set of business logic into the implementation. Instead, every action would be tracked and the analyst could promote the actions as each new analysis warranted.
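As a rough sketch of what promoting and demoting events could look like, the example below tracks every action uniformly and lets each analysis decide which event types count and how to group them. The event records, field names and the `group_counts` helper are all made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Every action is tracked in the same low-level shape (hypothetical data).
EVENTS = [
    {"visitor": "a", "type": "page_load", "scope": "portfolio_analysis"},
    {"visitor": "a", "type": "filter", "scope": "portfolio_analysis"},
    {"visitor": "a", "type": "filter", "scope": "stock_finder"},
    {"visitor": "b", "type": "page_load", "scope": "stock_finder"},
]


def group_counts(events: Iterable[Dict[str, str]], key: str,
                 promoted: Set[str]) -> Dict[str, int]:
    """Count only the promoted event types, grouped by the chosen field."""
    return dict(Counter(e[key] for e in events if e["type"] in promoted))


# Analysis 1: only page loads count as views.
views_strict = group_counts(EVENTS, "scope", {"page_load"})

# Analysis 2: filter operations are promoted into the hierarchy as well.
views_with_filters = group_counts(EVENTS, "scope", {"page_load", "filter"})

# Cross-taxonomy view: filtering as an activity, regardless of widget scope.
filters_by_type = group_counts(EVENTS, "type", {"filter"})
```

Nothing about the tracking changes between the three analyses; only the set of promoted event types and the grouping key do.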

Why haven’t vendors implemented this form of taxonomic flexibility? The reasons hearken back to my post on the limitations of OLAP: when you create a hierarchy on the fly you need to be able to de-duplicate critical numbers like visits and visitors. With the number of possible hierarchies being essentially infinite, that is quite difficult to accomplish in any pre-packaged data form.
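The de-duplication problem is easy to see with a toy example. The sketch below uses made-up page-view rows and hypothetical helper names; it shows that visit counts for a parent category cannot be obtained by adding up its children and have to be recomputed from the visit-level data for each new hierarchy.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

# Each row is (visit_id, page_name). The data is invented for illustration.
PAGE_VIEWS = [
    ("v1", "/support/email"),
    ("v1", "/support/phone"),
    ("v2", "/support/email"),
    ("v2", "/products/x"),
    ("v3", "/products/x"),
]


def visits_by_category(rows: Iterable[Tuple[str, str]],
                       classify: Callable[[str], str]) -> Dict[str, int]:
    """Count distinct visits per category under the given classification."""
    visits: Dict[str, Set[str]] = defaultdict(set)
    for visit_id, page in rows:
        visits[classify(page)].add(visit_id)
    return {category: len(ids) for category, ids in visits.items()}


def leaf(page: str) -> str:
    return page                 # one bucket per page


def top_level(page: str) -> str:
    return page.split("/")[1]   # first directory, e.g. "support"


per_page = visits_by_category(PAGE_VIEWS, leaf)
per_section = visits_by_category(PAGE_VIEWS, top_level)

# Adding the two support pages gives 3 visits, but only 2 distinct visits
# touched the support section, because visit v1 saw both pages.
summed_support = per_page["/support/email"] + per_page["/support/phone"]
```

Because the correct number depends on which pages are grouped together, a pre-aggregated table built for one hierarchy cannot simply be rolled up to answer questions about another.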

But just because analysts haven’t usually had access to good systems of hierarchical classification and analysis doesn’t mean it isn’t important. While great progress has been made in the area of visitor segmentation capabilities, there is still great room for improvement in hierarchical classification. And I remain convinced that once an analyst has this capability, it will never be viewed as secondary again. Taxonomy, as Stephen Jay Gould rightly observed, is more than a dry exercise in classification. It is the very stuff of which our view of the world is made – and no analyst can solve any measurement problem without, at the very least, a basic taxonomy in mind.

Indeed, simply understanding how important taxonomy is to web analytics can help clarify many an analytic task. And even if your tool forces you to jump through hoops and take wild detours, at least you will be moving in the right direction.



Gary Angel is the author of the “SEMAngel” blog – Web Analytics and Search Engine Marketing practices and perspectives from a 10-year experienced guru.