Google Webmaster Tools Gets Updated Robots.txt Testing Tool

By Chris Crum July 16, 2014

Google has released an updated robots.txt testing tool in Webmaster Tools. The tool can be found in the Crawl section. The aim of the new version of the tool is to make it easier to make and maintain a “correct” …

Developer Shares Story of Being Threatened by Facebook for Crawling

Pete Warden, a former Apple software engineer now working on his own start-up, posted an interesting story about how Facebook threatened to sue him for crawling the social network. I reached out to both Warden and Facebook for more details, but so far have only received a response from Facebook, which calls the incident a "violation of our terms."

Google States Case for Online News in WSJ
· 1 Comment

Update: The Wall Street Journal is running a piece from Google CEO Eric Schmidt on how Google can help newspapers. It’s an interesting read.

Why Your Robots.txt Blocked URLs May Show up in Google

Matt Cutts has appeared in yet another Google Webmaster Video, and this time he has a whiteboard with him so he can illustrate what he’s talking about. What he’s talking about this time are uncrawled URLs in search results.

Cutts says Google gets a lot of complaints from webmasters who say the search engine is violating their robots.txt files, with which they intend to keep Google from crawling certain pages. Sometimes those URLs still end up in search results.
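The distinction Cutts draws can be sketched with a minimal robots.txt (the path here is a hypothetical example):

```
# robots.txt — asks crawlers not to FETCH these pages.
# The bare URL can still appear in search results if other
# sites link to it, because Google never crawled the page
# and so never saw any instruction not to index it.
User-agent: *
Disallow: /private/
```

To keep a page out of results entirely, the page generally has to be crawlable so the engine can see a `<meta name="robots" content="noindex">` tag on it.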

White House Unblocks Google

If the outgoing Bush Administration was thought to run a secretive, bubble-icious type of White House, the Obama Administration so far is proving to be the opposite. The Whitehouse.gov redesign for greater transparency has already been widely noted—Presidential blog and all—but the website is now much more open to a new kind of visitor: the search engine spider.

Search Engines Indexing Google Profiles

Google Profiles are shaping up to be the building blocks of Google's effort to map the social web.

"A Google Profile is simply how you represent yourself on Google products — it lets you tell others a bit more about who you are and what you’re all about. You control what goes into your Google Profile, sharing as much (or as little) as you’d like." – Google

Google Indexing Sites in 1 Day Again
· 1 Comment

I created a new site on Friday, and exactly 24 hours later, on Saturday, it was in Google's index. I posted about this just over a month ago in my post, 7 Steps to Get Your New Site Indexed in 24 Hours.

I had a lot of comments about whether or not AdWords was necessary, so I thought I'd try it again without running AdWords this time. Here's how it all played out:

Experts Say ACAP Specs Not Up To Snuff

Publishers have come together to develop the Automated Content Access Protocol (ACAP), a technical standard intended to dictate to search engines what they can index and what they can’t. One problem, though, as more technical minds have noted, is that the standard isn’t quite technical enough.

Google Explains Meta Tags
· 1 Comment

Search engines pay attention to some tags, and none to others. Remember when the ‘keywords’ meta tag mattered? Ah, the good old days.
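For illustration, a typical head section might mix tags the major engines still read with the now-ignored keywords tag — the values here are placeholders:

```html
<head>
  <!-- Still used for result snippets -->
  <meta name="description" content="A short summary of the page.">
  <!-- Still honored: controls indexing and link-following -->
  <meta name="robots" content="noindex, nofollow">
  <!-- Largely ignored by the major engines these days -->
  <meta name="keywords" content="widgets, gadgets">
</head>
```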

Publishers Push ACAP As Robots.txt Improvement

The Automated Content Access Protocol (ACAP) debuted today as a set of improvements to deficiencies seen in the robots.txt protocol currently observed by search crawlers.

Unvalidated Robots.txt Risks Google Banishment
· 3 Comments

The web crawling Googlebot may find a forgotten line in robots.txt that causes it to de-index a site from the search engine.

Google Teaches Robots Tool About Sitemaps

The robots.txt analysis tool at Google Webmaster Central received some much-needed updating, and should be more effective for webmasters today.

Telegraph Cluelessly Attacks Google News Indexing
· 1 Comment

This time it is the UK-based Daily Telegraph that is complaining about a need to protect their content from search engines.

Get Your Sitemap Discovered Automatically With Your Robots.txt File

At Search Engine Strategies New York it was announced that you can now have your sitemap automatically discovered by referencing it in your robots.txt file. It's simple to do; you'll just need the URL of your sitemap.
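A minimal sketch of what that looks like — the sitemap URL below is a placeholder for your own:

```
# robots.txt
Sitemap: http://www.example.com/sitemap.xml

User-agent: *
Disallow:
```

The Sitemap line is independent of any User-agent block and can appear anywhere in the file.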

SES: I, Robots.txt

Danny Sullivan led the Robots.txt Summit session during SES New York, where representatives from the major search engines discussed the future of the humble file used to manage crawler behavior.

Controlling How Your Site Is Indexed

Over at the Google Blog, Dan Crow is publishing a series of how-tos about controlling what the Googlebot says about your site through the Robots Exclusion Protocol. This is the cheap-and-easy version.

Don’t Index That Link!

Using Robots.txt To Prevent Search Indexing

Sometimes there are parts of your website you don’t want accessed by the search engines – for any number of reasons, like sensitive private data, articles that require subscriptions – whatever.
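In those cases, a few Disallow lines in robots.txt ask well-behaved crawlers to stay out — the paths below are hypothetical examples:

```
User-agent: *
Disallow: /admin/
Disallow: /subscriber-articles/
```

Keep in mind that robots.txt is only a request to crawlers, not access control; genuinely sensitive data should sit behind authentication rather than rely on a Disallow line.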