Indexing Articles

Newspapers Propose New Indexing Standards

This should come as no surprise from the people who’d sue you to stop you from linking to them: at a publishers’ consortium today, after complaining about the limited nature of robots.txt, the newspaper industry has proposed new standards to prevent search engines (and other sites) from indexing their sites willy-nilly.

SiloMatic – Latent Semantic Indexing

The days of keyword stuffing, single phrase optimization and concentrating only on incoming links to gain traffic are slowly being phased out as a more holistic approach to judging website content comes online. This new concept has many webmasters hopping, and it should. Latent semantic indexing is quickly becoming the wave of now.

Don’t Hold Your Breath For GDrive

From a purely connotative standpoint, "indexing the world’s information" sounds less ominous than the goal of caching "100% of user data." Even so, the long-rumored GDrive may never surface to store all of it.

Don’t Try Crawling Google News Comments

The newest feature on Google News – comments solicited from participants in a news story – won’t be part of anyone else’s news.

Unavailable After: Google Plans New Meta Tag

The unavailable_after meta tag on a web page tells a spider when it should stop indexing a particular page.
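
As a rough illustration, here is a minimal Python sketch that renders such a tag for a given expiry time. The `googlebot` meta name and the date layout are assumptions based on how the tag was described when it was announced, and `unavailable_after_tag` is a hypothetical helper; check Google's current documentation before relying on either.

```python
from datetime import datetime, timezone


def unavailable_after_tag(expires: datetime) -> str:
    """Render a meta tag asking the crawler to drop the page after `expires`.

    Assumption: the meta name ("googlebot") and the date layout used here
    follow the announcement-era examples; they are not taken from the article.
    """
    # Normalise to UTC so the timestamp is unambiguous.
    stamp = expires.astimezone(timezone.utc).strftime("%d-%b-%Y %H:%M:%S GMT")
    return f'<meta name="googlebot" content="unavailable_after: {stamp}">'


# Hypothetical expiry date, for illustration only.
tag = unavailable_after_tag(datetime(2007, 8, 25, 15, 0, 0, tzinfo=timezone.utc))
print(tag)
```

The tag would go in the page's head section; a page with no expiry simply omits it.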

MSN Loses (Badly) In 301 Indexing Race

Moving a website can be intimidating, especially if it’s a bigger site. There will almost certainly be a loss of traffic, and much of that is due to the time it takes the search engines to recognize the new site. There are ways to dull the pain, but don’t count on Microsoft for it. 

SMX: Cutts on Themes and Latent Semantic Indexing

Having worked closely with latent semantic indexing during my time at FI, I’ve become a big advocate of making sure you have structured themes in your content, and that you include a supporting cast of semantically connected keywords.

In this clip from SMX Advanced, Matt Cutts shares how Google is continually testing the use of LSI and keyword themes.

Telegraph Cluelessly Attacks Google News Indexing

This time it is the UK-based Daily Telegraph that is complaining about a need to protect its content from search engines.

Book Publishers Half-way Getting it

They may not be all the way there yet, but at least HarperCollins (owned by Rupert Murdoch’s News Corp.) and Random House (owned by Bertelsmann) are trying to bring books into the Web era with their new “browsing” widgets.

Controlling How Your Site Is Indexed

Over at the Google Blog, Dan Crow is publishing a series of how-to’s about controlling what the Googlebot says about your site through the Robots Exclusion Protocol. This is the cheap-and-easy version.

Don’t Index That Link!

Yahoo and Microsoft Indexing Blogs Poorly

I just finished reading an interview with Larry Cornett, Director of User Experience Design at Yahoo, and Kathryn Kelly, Director of PR for Yahoo! Search, over at Search Engine Land.

Using Robots.txt To Prevent Search Indexing

Sometimes there are parts of your website you don’t want accessed by the search engines – for any number of reasons, like sensitive private data, articles that require subscriptions – whatever.
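
To make that concrete, here is a small Python sketch that uses the standard library's `urllib.robotparser` to check which URLs a set of rules would block. The rules and the `example.com` paths are made up for illustration and do not come from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /subscribers/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before fetching each URL.
blocked = parser.can_fetch("Googlebot", "https://example.com/private/report.html")
allowed = parser.can_fetch("Googlebot", "https://example.com/articles/today.html")
print(blocked, allowed)
```

Keep in mind that robots.txt is only a request to well-behaved crawlers. It does not stop anyone from fetching a URL directly, so truly sensitive data still needs real access control.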

Latent Semantic Indexing – The Future of SEO

The major search engine Google has now incorporated a new element into its search algorithms called Latent Semantic Indexing.

Search Engine X Too Powerful. Just Open Source It?

Skrenta explains what many of you already grasped about Google’s growing power.

Adversarial Indexing

I told a client today about a phenomenon known as “adversarial indexing.” Actually, I said it was “known as adversarial indexing” or that “information retrieval scientists call it adversarial indexing,” when I should have said “information retrieval scientists call it something like adversarial indexing.”

Open Source SQL Full Text Search Engine – Sphinx

I came across Sphinx via the MySQL Performance Blog (which has some good entries you might want to check out). It is an Open Source Full Text SQL Search Engine.

Google Bestows Much Love Upon Webmasters

The revamped Google Webmaster Central has enough changes in place that it merits a second look for our readers.

Bigdaddy Timeline, Courtesy of Matt Cutts

Sometime around January or February, a number of webmasters began to notice that Google had somehow “lost” huge portions of their websites.

Google Blasted For Indexing Nazi Article

Google News has come under fire in England from anti-fascist organizations for picking up a news article published by the far-right, “Nazi” and “openly racist” British National Party (BNP), reports Times Online.

Latent Semantic Indexing (LSI): What it is, How it Works and What it Means

LSI is a methodology for automatic document classification. It examines all the words in all the documents of a corpus and calculates similarity measurements for each document or for individual terms.
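
The starting point can be sketched in a few lines of Python. Note that this is not full LSI: it only counts the words in each document and compares documents by cosine similarity, and it leaves out the dimensionality-reduction step (a truncated singular value decomposition) that LSI adds on top. The three tiny documents are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical three-document corpus.
DOCS = {
    "cars": "car engine wheel road car",
    "autos": "automobile engine wheel road",
    "cooking": "recipe oven flour sugar",
}


def term_counts(text: str) -> Counter:
    """Count how often each word occurs in one document."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = {name: term_counts(text) for name, text in DOCS.items()}
sim_related = cosine(vectors["cars"], vectors["autos"])
sim_unrelated = cosine(vectors["cars"], vectors["cooking"])
print(round(sim_related, 3), sim_unrelated)
```

Raw counts only reward shared words, so "car" and "automobile" contribute nothing to the score here. Picking up that kind of relationship from co-occurrence patterns across a large corpus is what the SVD step in LSI is meant to do.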

Google Introduces Sidebar Plug-ins

It seems as if Internet companies, especially search engines, are releasing utilities that allow users to “mod” them (develop third-party add-on widgets) by using software development kits (SDKs) and application programming interfaces (APIs).