Web Analytic Tools

    November 6, 2006

Step 1. Destroy your Checklist… Check.

There is nothing more misleading when thinking about tools than the notorious “checklist” of features used by analysts and companies. Checklists get created by people running down a list of features in a product: Report on Geography. Check. Report on Screen Resolution. Check. Show Average Page Time. Check. Visitor Segmentation. Check. Dashboards. Check.

For fully commoditized goods, the old checklist approach is indeed pretty useful. But it depends for its usefulness on two huge assumptions that just don’t apply in web measurement. The first is that you know exactly what you need and want so you can decipher the relative importance of checks. The second is that a checkmark is all there is to say about a feature (a megapixel is a megapixel – but Visitor Segmentation is not one single thing).

The first assumption (you know what you need) is rarely true in web analytics; the second (a function is a true commodity) almost never.

Let’s start with one of the biggest items on your checklist – a core capability to all serious web analytics – visitor segmentation. Almost every tool now offers some form of visitor segmentation but they differ wildly in flavor and capabilities. Visitor Segmentation, you see, isn’t one capability. It’s a whole matrix of capabilities necessary to solving a variety of fundamental web analytics tasks.

So here’s a more detailed subset of aspects of Visitor Segmentation:

Segment Creation

  • Can Segments be created without tags
  • Segmentation Logic: can full logical operators be used to define segments
  • Segmentation Variables: what data can be used to define segments
  • Can external data be used natively and combined with web data in segment creation
  • Can Segments be created via data-driven techniques like neural networks
  • Can segments focus on visit or visitor behavior
  • Can segments be defined based on time and event sequences
  • Can distributions be produced on key behaviors to assist in segment creation

Segment Methodology

  • Are segments built from samples or against all data
  • Are segments created in real-time or delayed

Segment Reporting

  • Are all data cuts available from segmentation or just some – and which ones
  • Can Segments be cross-tabulated

Segment Usage

  • Can segments “lock-down” a specific set of visitors that can then be tracked regardless of any segment criteria
  • Can visitors from a segment be exported to other systems

I’m going to talk about each of these four items in separate posts – beginning today – because if I tried to tackle all four of them at once your Monday morning would be shot!

Segment Creation in Web Analytics

The ability to create Segments without tags is a fundamental capability. Its importance cannot be overstated. Any tool lacking this capability should not be on your short-list. Why? Segments are the analyst’s primary tool. But they change for almost every single analysis. You won’t use the same segments over and over – you’ll need new segments every time with highly specific behavioral definitions. And if you think an Analyst can get a tag changed on a site to explore an analysis – think again. Even if this were possible, imagine the cycle time: identify a segment, change the tag, roll it out, wait for results (no back-history here), see it isn’t what you need, try again. Impossible. Tag-based segmentation is garbage and fortunately is no longer the standard.

Segmentation logic has less obvious impacts than tagless segmentation. It is, nevertheless, the area where most of even the best analytic tools have serious shortcomings. In an effort to protect you from your presumed stupidity (or their performance problems) most analytic solutions offer only a simple way to build segmentation logic – the set of behaviors that defines a visitor segment. By far the most common method is this: you can define multiple conditions; the conditions are joined by AND, so every one of them must be met for the visit/visitor to be included; within each condition you can set multiple values that meet the criteria. These values are implicitly OR values – if the visit/visitor meets any one of them then the condition is satisfied. This sounds pretty powerful. You can build logic like this: I want any visitor who viewed 5 or more pages in the Product section of the site AND started an order AND didn’t place an order.

But as powerful as this seems, there are many, many segments that can’t be created. Here’s one common one: I’d like to create a Visit based Prospect segment defined as anyone who has 0 Purchases OR who became a customer in this session. Oops. That OR is between two types of conditions. Can’t be done with this type of criteria definition. Or suppose I’d like to say that my Engaged Prospects are any visitor with 5 or more Page Views OR 2+ Visits. That’s an OR across conditions. Again, can’t be done. There is a range of different criteria builders out there, but there’s only one that would really make sense – full logical control including AND and OR plus parenthetical grouping. And I’ll venture to suggest that no single deficiency in web analytic tools is as consistently frustrating as this seemingly small feature that isn’t even on most checklists out there.
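To make the difference concrete, here is a minimal Python sketch of the two styles of criteria builder. Everything in it is an assumption for illustration: the `Visitor` fields and the helpers `restricted_segment`, `all_of` and `any_of` are hypothetical names, not taken from any real tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Visitor:
    # Hypothetical fields, for illustration only.
    page_views: int
    visits: int


Predicate = Callable[[Visitor], bool]
ValueTest = Callable[[int], bool]


def restricted_segment(conditions: Dict[str, List[ValueTest]]) -> Predicate:
    """Typical builder: conditions are ANDed, tests within one condition are ORed."""
    def matches(v: Visitor) -> bool:
        return all(
            any(test(getattr(v, field)) for test in tests)
            for field, tests in conditions.items()
        )
    return matches


def all_of(*preds: Predicate) -> Predicate:
    """AND node; nesting these nodes plays the role of parentheses."""
    return lambda v: all(p(v) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """OR node; unlike the restricted builder, it can span different variables."""
    return lambda v: any(p(v) for p in preds)


# Engaged Prospects: 5 or more Page Views OR 2+ Visits.
engaged = any_of(lambda v: v.page_views >= 5, lambda v: v.visits >= 2)

# The closest the restricted builder gets is the AND of the same two conditions.
restricted = restricted_segment({
    "page_views": [lambda n: n >= 5],
    "visits": [lambda n: n >= 2],
})

one_long_visit = Visitor(page_views=7, visits=1)
print(engaged(one_long_visit))     # True
print(restricted(one_long_visit))  # False
```

A visitor with one long visit belongs in the Engaged Prospects segment, but the AND-only builder drops them. Because `all_of` and `any_of` accept each other's output, any nesting of AND, OR and grouping can be expressed.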

A related but less ubiquitous source of frustration is the types of data that can be used for segmentation. The truth is, you never really know what type of data you’re going to need to create segments on. In just the last thirty days I’ve had reason to want to create segments based on screen-resolution (for an online video site) and on average page time (for targeting live chat). Most tools that provide segmentation do a fairly reasonable job of opening up variables. But I’ll be darned if most of them don’t have peculiar limitations that drive you crazy. There is really only one right answer here: every variable should be available for filtering segments. But unlike segmentation logic, I’m willing to live with a fairly generous set of alternatives – I just don’t understand why I have to.

The use of external data (especially when combined with native web analytics data) to drive segments is a growing trend and underlies many absolutely essential methods of analysis. Some systems let you create segments based on External Data, but then you can’t combine those segments with additional behaviors. Some systems make you do this differently – create a variable based on the external data – then segment on that variable along with whatever additional criteria you need. That’s actually fine – but you may run into a second limitation here, since many systems place severe limits on how you can subset variables – limiting you to string operators, for example. This won’t work at all if you are importing numeric data and need to be able to use it as such. All in all, this is a capability that not every company needs – but when you do, it’s often absolutely essential.
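The string-operator limitation is easy to underestimate, so here is a small Python sketch of what goes wrong. The data is invented for illustration: `crm_lifetime_value` stands in for an imported external variable stored as text, and `web_page_views` for native web data.

```python
# Hypothetical external data keyed by visitor id, imported as text.
crm_lifetime_value = {"v1": "1200.50", "v2": "900", "v3": "85"}

# Hypothetical native web data: page views per visitor.
web_page_views = {"v1": 12, "v2": 3, "v3": 20}

# String operators compare character by character, so "900" and "85"
# both sort above "1000" and wrongly land in the high-value segment.
string_match = sorted(
    vid for vid, ltv in crm_lifetime_value.items() if ltv >= "1000"
)

# Treating the imported variable as a number gives the intended segment.
numeric_match = sorted(
    vid for vid, ltv in crm_lifetime_value.items() if float(ltv) >= 1000
)

# Combining the external variable with web behavior in one definition.
high_value_engaged = sorted(
    vid for vid in numeric_match if web_page_views.get(vid, 0) >= 10
)

print(string_match)        # ['v1', 'v2', 'v3']
print(numeric_match)       # ['v1']
print(high_value_engaged)  # ['v1']
```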

As far as I know, not a single tool in circulation today lets you build data-driven visitor segments using techniques like neural segmentation. That’s a shame, because the truth is that any rule-based segmentation is deeply flawed when it comes to combining complex constellations of behaviors. Someday, this will be an important capability. But for now, just assume that this is an empty checkbox on every tool in the marketplace.

Web behavioral analysis is somewhat unusual because it often focuses on two very different levels of analysis – the visitor and the visit. Of these, visitor segmentation is probably more important. Segmentation over time across sessions is absolutely essential to many, many types of analysis – and any tool that doesn’t support this capability shouldn’t be on your short-list. That being said, the visit is often an interesting unit of analysis in its own right. Segmentation criteria should be definable for both Visitors and Visits – and ideally should be able to isolate segments like this: I want all visits without a purchase by Visitors who have made a purchase.
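That last segment mixes the two levels: a visitor-level criterion picks the people, and a visit-level criterion picks which of their sessions to keep. A rough Python sketch, using an invented `Visit` record and made-up sample data:

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical visit record, for illustration only.
    visitor_id: str
    visit_id: int
    purchased: bool


visits = [
    Visit("a", 1, False), Visit("a", 2, True), Visit("a", 3, False),
    Visit("b", 1, False), Visit("b", 2, False),
]

# Visitor-level criterion: anyone with a purchase in any of their visits.
buyers = {v.visitor_id for v in visits if v.purchased}

# Visit-level criterion applied within that visitor segment.
non_purchase_visits_by_buyers = [
    v for v in visits if v.visitor_id in buyers and not v.purchased
]

for v in non_purchase_visits_by_buyers:
    print(v.visitor_id, v.visit_id)
```

Visitor "b" never purchased, so none of their visits qualify. Visitor "a" contributes only the two visits that did not include the purchase.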

Which brings me to my last point – and a real pet peeve of mine. Virtually every single web analysis you might actually need involves a time component. What did visitors do in the week after they signed up for a class? What did visitors do in between registration and purchase? What did visitors do in the three days after downloading a trial? What did visitors do in the two sessions prior to buying a video? What was all of the behavior of a Prospect up to and including a Purchase but not after? Do all of these questions seem basic, simple and obvious? They do to me. But answering them is nearly impossible unless a web analytic tool lets you define segments based on a time component that allows you to specify criteria relative to dates and/or specific actions on the web. How many tools do this? Do any? How vendors have been so negligent in handling time-based analysis is utterly beyond me. This capability is the closest thing to a truly crippling lack in today’s toolset – and remedying it would vault a tool firmly into best in class in my view.
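As an illustration of what "criteria relative to a specific action" means, here is a minimal Python sketch of one such question: what did visitors do in the three days after downloading a trial? The `Event` record, the action names and the function `events_after_anchor` are all hypothetical, and anchoring on each visitor's first occurrence of the action is just one possible design choice.

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    # Hypothetical event record, for illustration only.
    visitor_id: str
    when: datetime
    action: str


def events_after_anchor(
    events: List[Event], anchor_action: str, window: timedelta
) -> List[Event]:
    """Return each visitor's events inside `window` after their first anchor action."""
    anchors: Dict[str, datetime] = {}
    for e in sorted(events, key=lambda e: e.when):
        if e.action == anchor_action and e.visitor_id not in anchors:
            anchors[e.visitor_id] = e.when
    return [
        e for e in events
        if e.visitor_id in anchors
        and anchors[e.visitor_id] < e.when <= anchors[e.visitor_id] + window
    ]


events = [
    Event("a", datetime(2006, 1, 1), "view_home"),
    Event("a", datetime(2006, 1, 2), "download_trial"),
    Event("a", datetime(2006, 1, 3), "view_pricing"),
    Event("a", datetime(2006, 1, 10), "purchase"),
    Event("b", datetime(2006, 1, 5), "view_pricing"),
]

after_trial = events_after_anchor(events, "download_trial", timedelta(days=3))
print([(e.visitor_id, e.action) for e in after_trial])  # [('a', 'view_pricing')]
```

The window is measured from each visitor's own anchor event rather than from a calendar date, which is the part a fixed date-range filter cannot express.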

I couldn’t end on such a downer, so I went back and added this little gem – the ability to view distributions of behavior (as in how many visitors viewed this page 1 time, how many 2 times, how many 3 times, etc.). Why is this a segmentation capability? In one sense it isn’t. But precisely because there are no data-driven segments, the analyst is stuck coming up with segment definitions. Take an operations site with a key page – submit a form. What’s a heavy user of the form? You can look at the average page views and see that it’s 3.5 views per visitor. But what’s the distribution? It makes a huge difference. On many customer operations sites, the distribution may be a radical U – with high spikes at 0 and 1 and a shocking spike somewhere out in the 100+ range. But if you don’t know what the distribution is like, you have no way to build appropriate segment definitions. So you have to experiment – how many visitors do I get if I say 5+ form submits? Too many? Let’s try 8. And so on. The distribution of visitors (and visits too for that matter) around a behavior is an essential descriptive view for an analyst. And unlike time-based segmentation criteria it is not totally without support in the tool world!
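Here is a small Python sketch of why the average hides what the distribution shows. The per-visitor counts are invented to mimic the U shape described above, and `visitors_at_or_above` is a hypothetical helper for trying out thresholds.

```python
from collections import Counter
from typing import List

# Hypothetical form-submit counts, one entry per visitor.
submits_per_visitor = [0, 0, 0, 1, 1, 1, 1, 2, 115, 130]

# The average alone suggests a "typical" visitor who does not exist.
mean_submits = sum(submits_per_visitor) / len(submits_per_visitor)

# The distribution: how many visitors submitted 0 times, 1 time, and so on.
distribution = Counter(submits_per_visitor)


def visitors_at_or_above(counts: List[int], threshold: int) -> int:
    """Size of the segment defined as `threshold` or more submits."""
    return sum(1 for c in counts if c >= threshold)


print(mean_submits)  # 25.1
for submits, visitors in sorted(distribution.items()):
    print(submits, visitors)
print(visitors_at_or_above(submits_per_visitor, 5))  # 2
```

With the distribution in hand, the gap between 2 and 115 submits makes the heavy-user threshold obvious; any cutoff between those values yields the same two visitors, so there is no need to guess at 5 versus 8.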

It’s getting on toward the end of the afternoon and I’m about ready to wrap up work for the day – so thankfully I’m pretty much finished with this first section. Here is the recap – you can’t buy a web analytics solution like a digital camera – there is too much variation in function and need. It is, I’m afraid, a process of matching your real needs with actual tool capabilities. That’s a lot harder, but it’s the only right way to do the job. And it’s an answer well illustrated by Visitor Segmentation.

I listed more than a dozen important aspects of a visitor segmentation implementation – and I probably left off a bunch of others. After all, I pretty much write this stuff off the top of my head! Within Segment Creation capabilities I listed eight different factors to consider – some of which are widely but not universally adopted, some of which have not yet seen the light of day, and most of which are in various states of play. How important these are to you is going to vary – but chances are that most of them should at least be considered if you’re evaluating a product.

So how do products in the marketplace actually stack up? That’s another very long discussion – one I’ll take up at the end of this series. Meanwhile, in the next post, I’ll take up Segment Methodology – an interesting and, on the whole, less depressing subject!




Gary Angel is the author of the “SEMAngel” blog – Web Analytics and Search Engine Marketing practices and perspectives from a guru with 10 years of experience.