Web Analytics Tool Evaluation

    November 29, 2006

Tracking People not Behaviors

The language of web analytics is driven, to a large extent, by the tool vendors. When you use a tool, you are naturally going to adopt the language of the tool – to do anything else would be cumbersome and probably ridiculous. But that language hasn’t always served the community well. In the post on Segment Reporting, for example, I noted how misleading and confusing the concept of time-period uniques (daily uniques, weekly uniques, etc.) is. There is a similar, though slightly less damaging, confusion around visitor segmentation – one that obscures a serious deficiency in most analytic tools.

In the common language of web analytic tools, a visitor segment is taken to be the set of visitors who meet certain behavioral criteria (the segment definition). But this isn’t quite the way visitor segments are actually implemented. It would be more precise to say that a visitor segment in a web analytics tool is a set of filter criteria that can be attached to a report (possibly any report) to remove any visitors not matching the filter set. Is there a difference between these two definitions?

I think there is – and here’s why.

Suppose you want to track the behavior on the SEMphonic web site of visitors who viewed the webinar I did with Paul Bruemmer on SEO measurement and Process. Let’s say the link to the webinar was posted on the site in November and, presumably, will remain there for at least a few months. In January of next year, you sit down and build a segment of visitors who clicked on the link to view the webinar. You want to see their behavior before they viewed the webinar and you want to see their behavior after. Your main interest is in seeing whether the webinar drove online lead generation; your secondary interest is in understanding how viewing the webinar changed visitors’ behavior.

This seems like a classic web analysis problem. It’s the type of problem web analysts are asked to study over and over again – with the only real changes being the qualifying behavior and the conversion behavior. And it seems like the kind of problem that visitor segmentation is going to solve.

Seemingly, all you have to do is set up a visitor segment where the qualifying behavior is “viewed the webinar.” Start your segment in October (so you get behavior prior to the webinar) and either run it forward to whenever you’re doing the analysis or pick some “close” date like the end of January (so you get behavior after the webinar). With that segment in hand, you can run a report of site usage for the segment. That gives you a profile of “webinar user” behavior. You compare that to overall site usage and you’ve got your comparison of “webinar” vs. “non-webinar” users. If lots of visitors view the webinar, you can create a negative segment of visitors with the qualifying behavior “did not view the webinar” to get a cleaner contrast. You can also compare how many of the webinar visitors generated a lead (a conversion event) vs. either all users or the “non-viewed” segment. That gives you a measure of how effective the webinar was in driving lead conversion. So it seems like you’re done.

Except that none of that is really right. It might work. Indeed, we regularly do exactly this sort of analysis. But we also regularly have to discount it because it contains serious problems. You can count on any behavior you look at in October as prior to the webinar – since the link wasn’t available then. But once the link is posted, you have a mish-mash of behaviors. Some of the behavior is prior to viewing the webinar – even if it’s behavior from yesterday – since the visitor may have only just viewed the webinar. Some of the behavior is after the webinar. And, here’s the crux of the matter – you have no way in any specific report using the segment to tell which is which. Indeed, you can’t even tell (without some extra work) how much of the conversion came before the webinar and how much came after. It might be that lead generation is driving webinar views and not the other way around. The problem is that the segment isn’t “locked” down to a specific group of visitors in a specific time.
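To make the before/after ambiguity concrete, here is a minimal sketch in Python. It assumes you can export a raw event log as (visitor, timestamp, action) rows, which many tools will not let you do directly; the visitor IDs and the action names "view_webinar" and "lead" are invented for illustration.

```python
from datetime import datetime

# Hypothetical exported event log: (visitor_id, timestamp, action)
events = [
    ("v1", datetime(2006, 11, 3), "view_webinar"),
    ("v1", datetime(2006, 11, 10), "lead"),
    ("v2", datetime(2006, 11, 2), "lead"),
    ("v2", datetime(2006, 11, 20), "view_webinar"),
    ("v3", datetime(2006, 11, 5), "lead"),  # never viewed the webinar
]

def split_leads_by_webinar(events):
    """Classify each lead from a webinar viewer as before or after the first view."""
    # Pass 1: find each visitor's first webinar view.
    first_view = {}
    for visitor, ts, action in events:
        if action == "view_webinar":
            if visitor not in first_view or ts < first_view[visitor]:
                first_view[visitor] = ts

    # Pass 2: sort the viewers' leads into "before" and "after".
    before, after = [], []
    for visitor, ts, action in events:
        if action != "lead" or visitor not in first_view:
            continue
        if ts < first_view[visitor]:
            before.append(visitor)
        else:
            after.append(visitor)
    return before, after

leads_before, leads_after = split_leads_by_webinar(events)
```

A standard segment report would credit the “viewed the webinar” segment with two leads here. Only one of them (v1) actually followed the webinar; the other (v2) came first, which is exactly the distinction the segment report cannot show.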

This problem becomes especially acute for the many situations where you want to track long-term behavior after a key event. Suppose, for example, that you want to understand the behavior of new customers. You need to “lock-down” a segment of new customers and then track them over an extended time-period – almost like an online focus group. You want to know how they behave in month 2 and month 3 and month 4. But the more extended the time-period, the greater is the mishmash of before and after events – and the harder it is to separate out what you care about from the noise.

From a tool perspective, it’s not as if the way vendors have usually implemented visitor segments is completely off-base. There are plenty of times when an open segment – one that constantly introduces new members when they meet the qualifying behavior – is ideal. But you really need to be able to “lock-down” a segment as well – to say that this segment is no longer a set of criteria, now it’s a set of visitors. And you want to be able to see the behavior of just those visitors for any time-period – whether it be a year ago or two years from now.
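The difference between a set of criteria and a set of visitors can be sketched in a few lines of Python. This is only an illustration under assumed data: the event rows, the "purchase" action and the function names are all hypothetical, and no vendor's implementation is being described.

```python
from datetime import date

# Hypothetical event log: (visitor_id, day, action)
events = [
    ("v1", date(2006, 11, 6), "purchase"),
    ("v1", date(2007, 1, 9), "page_view"),
    ("v2", date(2007, 1, 15), "purchase"),
    ("v2", date(2007, 1, 16), "page_view"),
]

def open_segment(events, qualifying_action, start, end):
    """Criteria-based segment: membership depends on the date range being reported."""
    return {v for v, d, a in events if a == qualifying_action and start <= d <= end}

def lock_segment(events, qualifying_action, start, end):
    """Locked segment: evaluate the criteria once and freeze the resulting visitor IDs."""
    return frozenset(open_segment(events, qualifying_action, start, end))

def behavior(events, members, start, end):
    """All events by the given visitors in any date range, past or future."""
    return [(v, d, a) for v, d, a in events if v in members and start <= d <= end]

# Lock down November's new customers, then look at them in January.
november_buyers = lock_segment(events, "purchase", date(2006, 11, 1), date(2006, 11, 30))
january_behavior = behavior(events, november_buyers, date(2007, 1, 1), date(2007, 1, 31))

# The open segment over the wider range has picked up a new member (v2).
open_members = open_segment(events, "purchase", date(2006, 11, 1), date(2007, 1, 31))
```

The locked segment still contains only v1 when you report on January, while the open segment has absorbed v2 and mixes that visitor's first purchase into the “new customer” behavior.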

Once you see the difference between “locking” down a visitor segment and the standard implementation in web analytics, I think you’ll immediately grasp how important this feature actually is. Having it allows an analyst to do clean analysis against many of the most common real-world web analytics problems. Without it, you’re constantly going to be trying to adjust for all the noise in your data. That makes for lots of work and lots of very uncertain – and therefore pointless – analysis. Organizations don’t like to act on analysis when all you can say is “well, it looks like it might be this way though I can’t really tell for sure.”

What if your tool doesn’t support this kind of analysis? There are some work-arounds we’ve developed. And even though this series is ostensibly about tool evaluation, it’s worth discussing our two favorite solutions to the “lock-down” problem.

Solution number one has a single big drawback – it’s a tagging solution. Here’s the idea: you set up cookie logic that tags a visitor with a date when some key action is performed for the first time. It might be registration. It might be a PPC referral. It might be a purchase. It might be a class. Regardless, you write a permanent, non-expiring first-party cookie that says the event occurred. You use that cookie to write (once or every time) a custom variable in your analysis solution that contains the event date – formatted consistently and logically. Now, you can build visitor segments using the custom variable. You could, for instance, select all visitors who took the webinar in a specific week. But you could select a much wider date range for analysis – the month before or the month after, for example. This will provide a classic before and after analysis of the segment. And the only noise is the behavior within the qualifying week itself.
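As a rough sketch of that cookie logic, here is a server-side version in Python using the standard library's http.cookies module. Real tagging would more likely live in the page's JavaScript tag; the cookie name, the ten-year lifetime standing in for “non-expiring”, and the function name are all assumptions made for this example.

```python
from datetime import date
from http.cookies import SimpleCookie

COOKIE_NAME = "first_webinar_date"        # hypothetical cookie name
TEN_YEARS = 10 * 365 * 24 * 60 * 60       # long max-age standing in for "non-expiring"

def tag_first_event(cookie_header, today):
    """Return (event_date, set_cookie_value_or_None) for a visitor performing the key action.

    event_date is the value you would pass to the analytics custom variable.
    A Set-Cookie value is produced only the first time the action is seen.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # Already tagged: keep the original date, write no new cookie.
        return jar[COOKIE_NAME].value, None

    # ISO format (YYYY-MM-DD) sorts correctly and is easy to filter by range.
    event_date = today.strftime("%Y-%m-%d")
    out = SimpleCookie()
    out[COOKIE_NAME] = event_date
    out[COOKIE_NAME]["max-age"] = TEN_YEARS
    out[COOKIE_NAME]["path"] = "/"
    return event_date, out[COOKIE_NAME].OutputString()

# First webinar view: the cookie is written.
first_date, set_cookie = tag_first_event("", date(2006, 11, 3))

# A later repeat view: the original date is preserved.
later_date, later_cookie = tag_first_event("first_webinar_date=2006-11-03", date(2007, 1, 9))
```

The important design point is that the date is written once and never overwritten, so the custom variable always identifies when the visitor first qualified, no matter when the report is run.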

For some events, that’s still too much noise. But particularly for solving long period analysis problems like how new customers behave over time, this method is quite powerful.

But there’s no denying the tagging aspect is daunting. Getting a tagging change is too big a deal to realistically expect it to happen for something like tracking a webinar. So this technique is mostly useful for tracking behaviors that are of obvious and enormous importance to the business (like buying for the first time or signing up for an account).

Fortunately, we have workaround number two for situations like the webinar. This technique depends on the fact that vendors have to support online campaigns (which change often) – so they frequently provide good ways to define campaigns based on web site behaviors without having to fuss with the tag. You can take advantage of this capability to build a campaign that is only in place for a specific period of time. Suppose, for instance, you created a campaign for the first week in November – and made clicking on the webinar link the defining behavior. Then you did another campaign for the next week. And so on. Now, you can construct a segment of visitors that responded to the campaign in week one of November. And, just as with the tagging solution, you can study the behavior of responders before the week and after. You can even study the behavior during the crucial week – though, of course, you’ll have the old problem of not knowing whether they took the webinar on Monday or Friday.
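The effect of the weekly campaigns is to bucket responders into one cohort per week. A minimal Python sketch of that bucketing follows, assuming a hypothetical list of (visitor, day) clicks on the webinar link. It assigns each visitor to the week of their first click only, which is a simplification: with real campaigns, a visitor who clicks in two different weeks would count as a responder to both.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical clicks on the webinar link: (visitor_id, day)
clicks = [
    ("v1", date(2006, 11, 1)),
    ("v2", date(2006, 11, 3)),
    ("v1", date(2006, 11, 8)),   # repeat click, ignored for cohort assignment
    ("v3", date(2006, 11, 9)),
]

def weekly_cohorts(clicks):
    """Group visitors by the Monday that starts the week of their first click."""
    first_click = {}
    for visitor, day in clicks:
        if visitor not in first_click or day < first_click[visitor]:
            first_click[visitor] = day

    cohorts = defaultdict(set)
    for visitor, day in first_click.items():
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        cohorts[week_start].add(visitor)
    return dict(cohorts)

cohorts = weekly_cohorts(clicks)
week_one = cohorts[date(2006, 10, 30)]   # week containing November 1, 2006
```

Each cohort is then a fixed group of visitors whose behavior can be compared before and after its own week, which is what the one-campaign-per-week setup buys you inside the tool.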

The “campaign” method has the very considerable advantage of not requiring a tag – and it’s the method we use most often to solve the “lock-down” problem. But there’s no denying that it’s a lot of work – since the campaigns have to be created on an ongoing basis.

The existence of these work-arounds can certainly help. But both methods have serious drawbacks, not the least of which is that you have to plan for them. That means you can’t back-engineer an ad hoc analysis with them. Unfortunately, back-engineering is the way most analysis actually happens. It’s heavy weather to tell your boss or your client they’ll have to wait three months while you put the analysis infrastructure in place and then wait for enough before and after behavior to accumulate. Clients (and bosses) expect better – and they are right to do so. Which is why I put this ability to “lock-down” segments quite high on the list of features to look for when evaluating the visitor segmentation capabilities of a web analytics tool!




Gary Angel is the author of the “SEMAngel” blog – Web Analytics and Search Engine Marketing practices and perspectives from a 10-year experienced guru.