
Yahoo SiteExplorer Web vs. API


In response to my post about the Yahoo API giving the “wrong” results, I got an email from a Yahoo! rep, and we’ve emailed back and forth a few times since. I showed him the difference between the numbers given through the API and through the Web interface for css3.info (I’ve updated my domain-info tool to both scrape the web interface and get the numbers through the API) and said “they can’t both be accurate”. He explained the difference this way:


Nope, not going to claim they are accurate, merely an estimate taken from either the raw (for the scraped pages) or semi-analyzed (for the API data) for the server cluster you hit at the time of request.

If it were possible to return accurate numbers, I’m willing to bet they’d do that. Unfortunately, it’s usually not, due to stuff like scaling issues, crawl vs. report lag and other factors.

Followed by another nice quote at the end of that email:

Again, I’m not claiming that these number are the best possible (even as estimates, that’s why the engineers are trying to improve them), but they do serve as a guide. Likewise, I’d definitely make sure to grab numbers from Google, Ask and MSN since decision making off of one data point seldom makes for good decisions.

Now I think this point would be great, were it not that the data these other three engines give is either stupidly off (in the case of Google and Ask) or non-existent at the moment (MSN).

He says something else too:

To be honest, the only person that can accurately measure real inbound link counts are the folks that control the access logs and can scan and report those. Anything outside of those numbers is never going to be as accurate.

Now this would be true if all links gave me clickthroughs… Yet they don’t. So I think I can get a nice sample of links that truly have value from access logs, but it wouldn’t show me any DMOZ links, for instance. Another problem is, of course, that your competitor probably won’t give you access to his access logs… So we need interfaces like these. Given these answers, the two different numbers each have their own inherent value, so for now I’m going to keep both using the API and scraping the web interface.
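To make the access-log idea concrete, here is a minimal sketch in Python of counting external referring hosts. It assumes the server writes the common "combined" log format; the function name `referring_hosts` and the sample log lines are made up for illustration. As noted above, it only sees links that somebody actually clicked.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Tail of a combined-format log line:
# "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')


def referring_hosts(lines, own_host):
    """Count hits per external referring host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue  # not in the assumed format
        host = urlparse(match.group("referer")).hostname
        if not host:
            continue  # empty referer or "-"
        if host == own_host or host.endswith("." + own_host):
            continue  # internal navigation, not an inbound link
        counts[host] += 1
    return counts


# Invented sample data, for illustration only.
sample = [
    '1.2.3.4 - - [10/Oct/2007:13:55:36 +0200] "GET / HTTP/1.1" 200 2326 "http://example.org/links.html" "Mozilla/5.0"',
    '1.2.3.5 - - [10/Oct/2007:13:56:01 +0200] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '1.2.3.6 - - [10/Oct/2007:13:57:12 +0200] "GET /preview/ HTTP/1.1" 200 512 "http://www.css3.info/" "Mozilla/5.0"',
    '1.2.3.7 - - [10/Oct/2007:13:58:40 +0200] "GET / HTTP/1.1" 200 2326 "http://example.org/other.html" "Mozilla/5.0"',
    '1.2.3.8 - - [10/Oct/2007:13:59:03 +0200] "GET / HTTP/1.1" 200 2326 "http://blog.example.com/post" "Mozilla/5.0"',
]

counts = referring_hosts(sample, "css3.info")
for host, hits in counts.most_common():
    print(host, hits)
```

A link that never sends a visitor, such as a directory listing nobody clicks, never shows up in this count, which is why the search engines' estimates remain useful alongside it.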

I must say, though, that it’s awesome to be able to email a rep from Yahoo! about this and discuss it so openly, and that they have no problem at all with me blogging about it.

About Joost de Valk
Joost de Valk is a Dutch SEO who works as Search Strategist at Onetomarket, a European online marketing agency. Among the clients he advises about search engine marketing are the airline KLM; Wegener, one of the biggest Dutch publishers; and Amadeus, a leading ICT company in the travel industry.

In his spare time, Joost is a committer in the WebKit open source project, maintains css3.info, the best online resource about CSS3, blogs about SEO, WordPress and web development on his SEO blog, and develops WordPress themes and plugins in his company AlthA Webdesign.

WebProNews Writer