comScore – The Great Debate

    April 24, 2007

By now, nearly everyone in the web analytics community is abuzz over the recent release by comScore of a study on cookie deletion rates. comScore tracked a specific web site (advertised as a portal) and one 3rd Party Ad Serving network against a panel of 400,000 users. During the study, comScore measured how often the 1st Party cookies (issued by the portal) and the 3rd Party cookies (issued by an Ad Serving Network) were deleted and replaced. The results can fairly be described as startling in several respects.

First, comScore pegged 1st Party cookie deletion rates during an average month at 31%. Second, comScore measured the number of cookies and found that there was a significant percentage (about 7%) of serial deleters – who ended up having more than 12 1st Party cookies from the Portal site during a month. Taken together, these numbers imply a dramatic error rate in the total unique visitors a site measures, even over a one-month timeframe. Perhaps as surprising, or even more so, comScore found relatively little difference between 1st & 3rd Party cookie deletion. 3rd Party cookie deletion was higher (as almost everyone would have expected) but by a margin best described as small.
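
To get a feel for what these figures could do to a unique visitor count, here is a rough back-of-the-envelope sketch. Only the 31% and 7% figures and the dozen-plus cookies for serial deleters come from the study as described above. The assumption that the remaining deleters leave two cookies each is purely illustrative, as are the function and variable names.

```python
def cookies_per_true_visitor(segments):
    """Return the average number of cookies counted per real visitor.

    segments: list of (share_of_visitors, cookies_per_visitor) pairs
    whose shares must sum to 1.0.
    """
    total_share = sum(share for share, _ in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1.0")
    return sum(share * cookies for share, cookies in segments)


segments = [
    (0.69, 1),   # never delete: one cookie each (100% minus the 31% who delete)
    (0.24, 2),   # ASSUMPTION: the other deleters (31% - 7%) leave two cookies each
    (0.07, 13),  # serial deleters: a dozen-plus cookies, 13 used as a round figure
]

inflation = cookies_per_true_visitor(segments)
print(f"cookies counted per real visitor: {inflation:.2f}")
```

Under these assumptions the tool would count about 2.08 cookies for every real visitor, so reported uniques would be roughly double the true audience. Change the middle segment and the answer moves, which is exactly why the detailed distribution in the study matters.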

Some of the reaction to this is predictable. First, there is considerable skepticism about the findings themselves. Perhaps that’s justified. Lord knows there are plenty of flawed studies done, not least when the studying party has a strong self-interested stake in the outcome. In addition, the frequency of deletion coupled with the lack of disparity between 1st & 3rd Party cookies (where automated Spyware tools might provide an explanation) makes the results hard for many people to understand. Are internet users really this consistent about deleting cookies when they have to do it manually? It does seem hard to believe. Keep in mind, as well, that we are talking about cookie deletion – not rejection. So though 3rd Party cookies may share a similar lifespan to 1st Party cookies, they are considerably less likely to find their way onto a computer in the first place. And while the comScore study wasn’t especially clear about this, it doesn’t seem to have focused on cookie rejection at all.

There is a second common reaction, summed up in the idea that this level of error in reporting doesn’t much matter because analysts are concerned with trends, not absolute numbers. Anil Batra of Zaaz, for instance, argues (here) that it would make no (or little) difference whether your site got 5K visitors or 3K visitors – the important thing is how your site is trending in response to your actions.

I’m afraid I can’t agree with this thinking. At first glance it’s not necessarily wrong-headed, though I’d argue that in fact it’s important to know your actual audience size on the web – that real numbers do matter and not just trends. Imagine a conference organizer who wouldn’t tell sponsors how many visitors the conference draws – only that it is trending upwards. I doubt I’d sponsor a booth!

But this is really just the tip of the iceberg when it comes to analysis. After all, total visitors IS pretty far down on the list of statistics of interest to the analyst. Unfortunately, almost every single statistic that matters is going to be affected – and sometimes devastatingly – by this level of error in visitor tracking. Perhaps Zaaz doesn’t track "New" visitors to the site vs. returning ones. But I do. And when half of my "new" visitors are really heavy repeat visitors I can hardly hope that my analysis will be crisp. Then too, I’ve always believed that tracking repeat customers on a site is central to most eCommerce analysis. But with this level of error I’m misclassifying a big chunk of that behavior.

Or again, perhaps people aren’t worried about what tools or pages drove repeat visits or cross-session sales behaviors. But it seems important to me. For a portal site like the one tested by comScore, the bellwether analysis is content impact on engagement. I’d like to say that this analysis may still be possible with this level of error, but I’m not sure that it would be.

Not concerned about these little things? Well, how about the fact that every one of your campaigns might be significantly miscounted in terms of conversion? That seems pretty consequential.

In short, just because comScore isn’t focused on analytics but only on traffic reporting doesn’t mean the impact of their findings is limited to traffic. And the impact for analysis of this level of error in 1st Party cookies would be very bad indeed. That’s probably not an admission you’d expect to hear from a web analytics consultancy, and I’m sorry if this constitutes breaking ranks, but our first commitment ought to be to the truth of the numbers.

On the other hand, I too have some caveats about the general applicability of these results to every site. First, it’s important to understand the difference between each statistic. The 31% 1st Party cookie deletion rate is, in my view, probably the most damning statistic in the study. Why? Because unless the behavior is somehow related to the specific site in question, you’d expect it to hold up for every site. Or then again, maybe you wouldn’t. If we accept that this statistic represents manual deletion of cookies, then there are two alternatives. The first is that people simply erase all cookies. That’s certainly possible – maybe even likely since it’s so easy to do. But a user may also scan the cookies on the target system, and since cookies are often quite identifiable, it’s possible that some cookies are much more likely to be deleted than others. It may be that an Amazon or Charles Schwab cookie is much less likely to be deleted than an AOL one. That’s one of the reasons it would be very interesting to understand the qualitative part of the comScore study – namely, what deleters (especially "serial" deleters) were actually doing. In addition, I’d like to understand if this problem is especially severe for certain browser populations (Firefox, for instance). And are users simply setting their browser to delete all cookies every time they close? This might explain the behavior of deleters – they aren’t doing any manual work at all. It would also imply that for this segment even same-day uniques are overstated.

As comScore positions the study, however, the most damning finding is probably the one about the potential traffic impact of serial cookie "deleters." And for their particular chosen site, the impact would indeed be considerable. But for many of our sites, this group would probably have much less of an impact. Why? Portal sites are unusual in attracting a very high volume of repeat visits. comScore reported an average of nearly 13 1st Party cookies during a month for the 7% of visitors with 4 or more cookies detected. For most of our client sites (that aren’t portals), the overwhelming majority of visitors will come much less often than this. And if only a very small percentage of visitors are multiple repeaters and only a small percentage (7%) of those are multiple deleters, then the impact is likely to be considerably less (in terms of percentage of total visitors) than for a portal site. It’s likely true that no set of sites has a higher percentage of frequent return visitors than a true portal.
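
The visit-frequency argument can be made concrete with a toy model. The sketch below assumes a deleter's cookies are wiped on a fixed schedule (every `reset_every_days` days, a hypothetical parameter) and that a new cookie is only issued when the visitor actually comes back. Neither assumption comes from the comScore study. The point is only that a visitor cannot leave more cookies than visits.

```python
def cookies_generated(visit_days, reset_every_days):
    """Count the distinct cookies one visitor leaves during a month.

    visit_days: day numbers (0-29) on which the visitor came to the site.
    reset_every_days: the visitor clears cookies at the start of every
        block of this many days (ASSUMPTION: a fixed schedule).

    A new cookie is issued only on the first visit after each reset, so
    the count is the number of reset periods containing at least one visit.
    """
    return len({day // reset_every_days for day in visit_days})


# Portal-style visitor: comes every day and clears cookies every 2 days.
portal_visitor = cookies_generated(range(30), reset_every_days=2)

# Occasional visitor with the same deletion habit but only two visits.
occasional_visitor = cookies_generated([3, 20], reset_every_days=2)

print(portal_visitor, occasional_visitor)
```

The same deletion habit produces 15 cookies for the daily visitor but only 2 for the occasional one. On a site where most people show up once or twice a month, even a dedicated deleter has little room to inflate the count.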

I’ll be honest – I find the study worrisome. It’s not that I haven’t realized that cookie deletion is a significant issue or that the potential impact of frequent return visitors on visitor traffic eluded me. But every analyst knows there is a line between data that is fuzzy but useful and data that is too messy to analyze. And I’ll be honest as well in saying that we see many sites where the number of visitors flagged as "New" would be better explained by the comScore view of the world than by the one prevalent (including among us) in the web analytics world. Depending on how the comScore study shakes out, web analysts may be facing a significant rethinking of where we are relative to that "fuzzy" line for almost every kind of interesting analysis. For some types of sites, at least, this might drive our short-term tool kit back to the pathetic same-session analysis we were stuck with in tools a couple of years back.

comScore has their own agenda here, obviously. But to me, the issue is ultimately about much more than unique visitor counts. Perhaps Anil is right to discount the importance of that statistic. But in doing so, he’s surely missing the larger point. In my view, the basic analytic toolkit of visitor segmentation and cross-session tracking is at risk in light of these findings. What’s the point of visitor segmentation when your visitors are lost every couple of days? This is an issue the measurement community just can’t ignore. I remember when 3rd Party cookie rejection rates began to climb and the ostrich-like mentality that insisted this wasn’t an issue. Till suddenly the vendors made it possible to use tagged 1st Party cookies and everyone began implementing 1st Party cookies. Sadly, we may not have solved our problem.

Are there short-term solutions? Possibly. And they begin with getting a handle on the scope of the problem for your site and understanding how to protect your analysis from these problems. You can begin to get a handle on the issue for your site by examining the trends in "New" visitors – especially if you can measure from significant events like a measurement start, a new cookie, or major traffic spikes. You might also want to look at the percentage of New/Returning visitors by Browser type. If you have strong behavioral cues on your site (like Login), you can measure the degree to which these behaviors are used by "New" visitors. This percentage compared to total visitor usage is an excellent way to get a reasonable read on how much or little of a problem you have. Handling bias is trickier, but if you want to analyze, for example, whether types of tools or content drive repeat visits, then you’ll be wise to start with a universe of visitors that you can prove has remained cookie constant. How might you do this? By ensuring that you segment for behavior in both the first and last month of your study. This will narrow your analysis, but ensure that you’ve ruled out cookie issues.
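
As a rough illustration of the two checks described above, the sketch below works on a hypothetical list of visit records. The field names (`cookie_id`, `month`, `is_new`, `logged_in`) and the sample data are assumptions made up for this example and do not correspond to any particular analytics tool's export.

```python
from collections import defaultdict


def login_rate(visits, new_only):
    """Share of visits that include a login, among "New" visits or all visits."""
    pool = [v for v in visits if v["is_new"]] if new_only else list(visits)
    if not pool:
        return 0.0
    return sum(1 for v in pool if v["logged_in"]) / len(pool)


def cookie_constant_ids(visits, first_month, last_month):
    """Cookie IDs seen in both the first and the last month of the study."""
    months_seen = defaultdict(set)
    for v in visits:
        months_seen[v["cookie_id"]].add(v["month"])
    return {
        cid for cid, months in months_seen.items()
        if first_month in months and last_month in months
    }


# Made-up sample data, for illustration only.
visits = [
    {"cookie_id": "a", "month": 1, "is_new": True,  "logged_in": False},
    {"cookie_id": "a", "month": 3, "is_new": False, "logged_in": True},
    {"cookie_id": "b", "month": 1, "is_new": True,  "logged_in": True},
    {"cookie_id": "c", "month": 3, "is_new": True,  "logged_in": True},
]

new_login_rate = login_rate(visits, new_only=True)
all_login_rate = login_rate(visits, new_only=False)
stable = cookie_constant_ids(visits, first_month=1, last_month=3)
print(new_login_rate, all_login_rate, sorted(stable))
```

A login during a visit flagged as "New" suggests a returning visitor who has picked up a fresh cookie, so a "New" login rate close to the overall rate would be a warning sign. The second function keeps only cookies present at both ends of the study window (here just `a`), which is one simple way to build the cookie-constant universe.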

In the end, every problem in data quality simply complicates the life of the analyst more and makes doing good analysis just that much harder. I don’t much care who wins the great traffic reporting war. But I care very much that web analytics have a mechanism for tracking, with some degree of reliability, the behavior of visitors over time. Where there is a will, I’m confident our truly fine tool vendors will find a way. It may be incumbent on analysts to provide that will.