Search Engine Spider, Index, and Ranking
Reader question: I think my site has been banned in Google. For the past three years it held top positions, and then it suddenly disappeared. Can you help?
Answer: Search engine optimization (SEO) is full of confusing terms. For example, many firms that specialize in search engine advertising claim to be search engine marketers, even though search engine marketing encompasses a much wider variety of skills: search engine optimization, directory paid inclusion, search engine paid inclusion, vertical search, and search engine advertising.
In SEO, the terms spider/crawler, index, and rank have three completely different meanings. This article will address the different meanings, and why each term is important in the optimization process.
Search engine spiders
Search engine spiders are also known as crawlers. Search engine spiders find and fetch Web pages, and build a list of words found on each Web page.
So if a search engine “spiders” your site, the spider arrives at your Web server by following a link to your site, requests that URL from your server, and then builds the list of words found on that URL.
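The fetch-then-build-a-word-list process described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real spider works: real spiders also follow links, obey robots.txt, and parse HTML far more carefully than the crude tag-stripping below.

```python
# Toy sketch of a spider's core job: fetch a URL, then build the
# list of words found on that page. Illustrative only.
import re
from urllib.request import urlopen

def extract_words(html):
    """Build the list of words found in a page's HTML."""
    text = re.sub(r"<[^>]+>", " ", html)      # crude tag stripping
    return re.findall(r"[a-z0-9]+", text.lower())

def spider_page(url):
    """Request a URL from the server and list the words found there."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    return extract_words(html)
```

For example, `extract_words("<p>Hello, World!</p>")` yields the word list `["hello", "world"]`; it is word lists like this, not the pages themselves, that feed the index discussed below.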
If a search engine is unable to spider a page, then it cannot include the information about that page in the search engine results pages (SERPs). Common reasons that search engines are unable to spider a site include:
Problematic site navigation schemes
Problematic URL structures
Web server issues
Interestingly, your Web analytics software (WebTrends, ClickTracks, Omniture, Urchin, Hitbox, etc.) will tell you how often your site is crawled by search engine spiders. If you change your content frequently, search engine spiders will revisit your site more frequently because search engine companies want to deliver the most accurate, current, and relevant search results.
However, a page will not rank higher just because a spider visits your site more frequently. Spidering and ranking are quite different processes.
The search engine index
After a search engine is able to spider a page and records the words and phrases on it, it places that data into an index. Whenever you perform a search on Google, for example, the results are retrieved from the data stored in the search engine index. Some search engine marketers call the index a “search engine database.”
A page can be included in a search engine index but not rank. However, a page cannot rank unless it is in the search engine index.
Web site owners should monitor their sites’ index count every month. An index count is the number of pages that are in the search engine indices. Each search engine has its own way of determining the index count. Below are current ways of determining the index count of your site:
Google and Teoma
site:yourdomain.com or domain:yourdomain.com
Take the time to perform these types of searches. Do not rely on link popularity checkers and other software that performs automated queries to the search engines. Unless the software owners have permission from Google, Yahoo, MSN Search, Teoma and other search engines to perform these queries (none of them do), the results might not be accurate.
If your pages are not included in a search engine index, the site might have problems with its server, its navigation scheme, or its URL structure. In addition, a site can be spidered by a search engine but still not be included in the search engine index.
Search engine rankings
Okay everyone, repeat after me: No SEO knows what the search engine algorithm is. Say it five times. Say it 10 times. Say it 100 times or say it more until it sinks in.
Some search engine marketers claim to know exactly what a search engine’s algorithm is. It is a nice sales pitch, but one that is just not true.
No one knows exactly how a search engine ranks Web pages. What ethical search engine marketers do is follow the terms and guidelines set forth by the search engines. Unethical search engine marketers try to exploit the loopholes in the terms and guidelines. It is often the unethical search engine marketers who claim to know Google’s search engine algorithm.
Search engine rankings tend to rely on three components:
1. Keyword-rich text – A page must contain the words and/or phrases that people type into search queries.
2. Search engines must be given easy access to that text.
3. Other people must consider a page’s content to be valuable. If a person thinks content is valuable, that person tends to link to the page that contains the content.
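To make the three components concrete, here is a toy scoring sketch in Python. To be clear: this is not any engine's real formula (no one outside the search engines knows those), and the weights and the link cap are arbitrary values chosen purely for illustration.

```python
# Toy illustration of how the three ranking components might combine:
# keyword presence, spider accessibility, and inbound links as a
# proxy for perceived value. NOT any real engine's formula; the
# weights and cap below are arbitrary.
def toy_rank_score(page_words, query_words, spiderable, inbound_links):
    if not spiderable:
        return 0.0                                 # text a spider cannot reach cannot rank
    matches = sum(1 for w in query_words if w in page_words)
    keyword_score = matches / max(len(query_words), 1)
    link_score = min(inbound_links / 100.0, 1.0)   # arbitrary cap at 100 links
    return 0.6 * keyword_score + 0.4 * link_score  # arbitrary weights
```

Note that the sketch returns zero the moment the page is not spiderable, no matter how good the text or the links are: component 2 is a gatekeeper for the other two.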
Of course, no two search engines measure these components in the same manner. Consider stop words (also called filter words): common words such as a, an, of, or, and, and the that the search engines ignore during a search. Search engines filter out these words because using them in a search query can slow down the retrieval of search results without improving accuracy.
Google’s stop word list is not going to be the same as Yahoo’s stop word list. Additionally, no keyword density checker (from any search engine marketer) is going to have the same stop word list as Google. Likewise, each search engine measures link popularity in a different way.
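Stop-word filtering itself is simple to illustrate. The list below is purely illustrative; as just noted, each engine maintains its own list, and none of them publish it.

```python
# Sketch of stop-word filtering during query processing.
# The stop word list is illustrative only; every engine's real
# list is different and unpublished.
STOP_WORDS = {"a", "an", "of", "or", "and", "the"}

def filter_query(query):
    """Drop stop words from a search query before retrieval."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]
```

For example, `filter_query("the history of the Internet")` keeps only `["history", "internet"]`, which is why a query with or without those little words typically retrieves much the same results.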
Does that mean I advise people to create different pages for different search engines? Of course not. A well-written, user-friendly, and spider-friendly site can get plenty of traffic from all of the search engines without resorting to search engine spam.
So remember, being ranked in a search engine does not have the same meaning as being indexed in a search engine.
Shari Thurow is Marketing Director at Grantastic Designs, Inc., a full-service search engine marketing, web and graphic design firm. This article is excerpted from her book, Search Engine Visibility (http://www.searchenginesbook.com) published in January 2003 by New Riders Publishing Co. Shari can be reached at email@example.com.