<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>WebProNews &#187; Spider</title>
	<atom:link href="http://www.webpronews.com/tag/spider/feed" rel="self" type="application/rss+xml" />
	<link>http://www.webpronews.com</link>
	<description>Breaking News in Tech, Search, Social, &#38; Business</description>
	<lastBuildDate>Mon, 13 Feb 2012 03:20:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Unvalidated Robots.Txt Risks Google Banishment</title>
		<link>http://www.webpronews.com/unvalidated-robots-txt-risks-google-banishment-2007-11</link>
		<comments>http://www.webpronews.com/unvalidated-robots-txt-risks-google-banishment-2007-11#comments</comments>
		<pubDate>Wed, 21 Nov 2007 11:53:35 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[Robots.txt]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=42095</guid>
		<description><![CDATA[The web crawling Googlebot may find a forgotten line in robots.txt that causes it to de-index a site from the search engine.
]]></description>
			<content:encoded><![CDATA[<p>The web crawling Googlebot may find a forgotten line in robots.txt that causes it to de-index a site from the search engine.<br />
<span id="more-42095"></span></p>
<p>Webmasters welcome being dropped out of Google about as much as they enjoy flossing with barbed wire. Making it easier for Google to do that would be anathema to being a webmaster. Why willingly exclude one&#8217;s site from Google?</p>
<p>
That could happen with an unvalidated robots.txt file. Robots.txt allows webmasters to provide standing instructions to visiting spiders, which contributes to having a site indexed faster and more accurately.</p>
<p>
Google has been <a href="http://www.webpronews.com/topnews/2007/07/13/some-new-tags-to-play-with">considering new syntax</a> to recognize within robots.txt. The <a href="http://sebastians-pamphlets.com/validate-your-robots-txt-or-google-might-deindex-your-site/">Sebastians-Pamphlets</a> blog said Google confirmed recognizing experimental syntax like Noindex in the robots.txt file.</p>
<p>
This poses a danger to webmasters who have not validated their robots.txt. A line reading <tt>Noindex: /</tt> could lead to one&#8217;s site being completely de-indexed.</p>
<p>
The surname-less Sebastian recommended Google&#8217;s <a href="https://www.google.com/webmasters/tools/robots?siteUrl=">robots.txt analyzer</a>, part of Google&#8217;s Webmaster Tools, and using only the <tt>Disallow</tt>, <tt>Allow</tt>, and <tt>Sitemap</tt> crawler directives in the Googlebot section of robots.txt.</p>
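<p>As a rough illustration of that advice, the sketch below scans a robots.txt file and flags any directive outside a short allow-list. The function name, the allow-list, and the sample file are assumptions made for this example; this is not Google&#8217;s analyzer and does not reproduce how Googlebot parses the file.</p>

```python
# Directives treated as expected in this sketch. The article's advice covers
# Disallow, Allow and Sitemap; User-agent is included because it opens a section.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}


def find_unexpected_directives(robots_txt):
    """Return (line_number, directive) pairs for directives not in KNOWN_DIRECTIVES."""
    unexpected = []
    for number, raw_line in enumerate(robots_txt.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        directive = line.split(":", 1)[0].strip().lower()
        if directive not in KNOWN_DIRECTIVES:
            unexpected.append((number, directive))
    return unexpected


# Hypothetical robots.txt containing a forgotten Noindex line.
sample = """User-agent: Googlebot
Disallow: /private/
Noindex: /
Sitemap: http://www.example.com/sitemap.xml
"""

print(find_unexpected_directives(sample))  # [(3, 'noindex')]
```

<p>A check like this only surfaces lines worth a second look; whether a flagged directive is actually honored is up to each crawler.</p>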
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/unvalidated-robots-txt-risks-google-banishment-2007-11/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Review: Trackback Spider</title>
		<link>http://www.webpronews.com/review-trackback-spider-2007-05</link>
		<comments>http://www.webpronews.com/review-trackback-spider-2007-05#comments</comments>
		<pubDate>Tue, 29 May 2007 19:51:51 +0000</pubDate>
		<dc:creator>Andy Beard</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Dashboard]]></category>
		<category><![CDATA[domain]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Trackback Spider]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=38014</guid>
		<description><![CDATA[<h3>What Does Trackback Spider Do?</h3>
<p>From what I have read on the pre-release materials, it sends trackback notifications to create one-way links to your website.</p>
<p>That means:</p>
]]></description>
			<content:encoded><![CDATA[<h3>What Does Trackback Spider Do?</h3>
<p>From what I have read on the pre-release materials, it sends trackback notifications to create one-way links to your website.</p>
<p>That means:</p>
<p><span id="more-38014"></span></p>
<ul>
<li>You are not being referenced on the site sending the trackback</li>
<li>You are not being linked to by the site sending the trackback</li>
</ul>
<p>This is black hat. Lots of tools have been available for this kind of thing in the past, and this kind of software is actually fairly unpopular even among most shady webmasters.</p>
<p>Fortunately it is not very effective on active blogs for a number of reasons.</p>
<ul>
<li>It is becoming much harder to monetise sites which are using trackback spam. Bloggers tend to be quite vocal and keen to report such sites to Google.</li>
<li>Blogspot uses linkbacks, so even splogs have to reciprocate a link, though Google should really provide a way to moderate linkbacks.</li>
<li>WordPress has a number of good solutions to handle trackback spam &#8211; my favorite is Spam Karma, which always checks for a reciprocal link, and my installation also has quite a growing blacklist.</li>
<li>Most bloggers who receive a trackback actually check it.</li>
</ul>
<p>There might be some minor success on old abandoned WordPress blogs, and maybe abandoned blogs on other formats, but those generally are low quality, and fairly low traffic.</p>
<p>So even if you want to wear a black hat, this type of tool is becoming more and more useless daily.</p>
<p>Disclosure: I do have some past history with the developer, as I purchased Domain Dashboard, his previous product. Whilst many people were satisfied, I have never managed to get <a href="http://andybeard.eu/2006/10/domain-dashboard.html">Domain Dashboard</a> working consistently for all my domains, the product hasn&#8217;t been updated for months, and support was unresponsive.</p>
<p>I didn&#8217;t ask for a refund &#8211; if the product had worked with my various hosting plans, it would have saved me a huge amount of time daily. Maybe in the future it will, because I am thinking about moving a lot of my hosting.</p>
<p>I don&#8217;t think there is much of a legitimate market for this new product. Most people I know who dabble on the dark side already have similar solutions, and those who I would look on as &quot;grey hat&quot; don&#8217;t dabble with this kind of trackback spam.</p>
<p><strong><a href="http://andybeard.eu/2007/05/trackback-spider-review.html" title="Andy Beard">* Originally published at AndyBeard.eu</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/review-trackback-spider-2007-05/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Webmaster Claims Spider Entered Contract In Suit</title>
		<link>http://www.webpronews.com/webmaster-claims-spider-entered-contract-in-suit-2007-03</link>
		<comments>http://www.webpronews.com/webmaster-claims-spider-entered-contract-in-suit-2007-03#comments</comments>
		<pubDate>Fri, 16 Mar 2007 22:37:34 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright law]]></category>
		<category><![CDATA[Eric Goldman]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Suzanne Shell]]></category>
		<category><![CDATA[Wayback Machine]]></category>
		<category><![CDATA[Web Crawlers]]></category>
		<category><![CDATA[Webmaster]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=36221</guid>
		<description><![CDATA[<p>The Web and artificial intelligence have brought about some surreal, science fiction like questions. The most recent mind-bending concept is whether or not robots can enter into contracts &#8211; that is, is a Web crawler implicitly entering a contract posted on a website announcing copyright conditions? <br />
]]></description>
			<content:encoded><![CDATA[<p>The Web and artificial intelligence have brought about some surreal, science fiction like questions. The most recent mind-bending concept is whether or not robots can enter into contracts &ndash; that is, is a Web crawler implicitly entering a contract posted on a website announcing copyright conditions? </p>
<p>A little while back, we explored the idea that RSS, as an automatic distribution agent, could <a href="http://www.webpronews.com/insiderreports/2006/11/03/does-rss-imply-permission-to-reuse-content">imply permission</a> to republish. But that involves two human parties, essentially, with a technical agent in between. </p>
<p>A court battle in Colorado, however, focuses on claims brought by Suzanne Shell against the Internet Archive&#8217;s Wayback Machine, which holds in searchable perpetuity pages that appear on the Web, for future historical reference. </p>
<p>Shell owns the website www.profane-justice.org, devoted to providing information and support for people who feel they&#8217;ve been unlawfully targeted by state agents (like police or child services organizations) or unfairly accused of child abuse.</p>
<p>A notice appears on the site stating that users copying or distributing the content on the site automatically agree to the terms of a contract. Failure to abide carries a fee of $5,000 per page copied, $250,000 per occurrence of unauthorized use, and a charge of $50,000 for each occurrence of failure to pay, plus costs and triple damages.</p>
<p>There was no mechanism in place on the site (such as in the <a href="http://www.webpronews.com/topnews/2007/02/26/controlling-how-your-site-is-indexed">robots.txt file</a>) to prevent Internet Archive&#8217;s robot from scanning, copying, and storing the pages on Shell&#8217;s website. She discovered that the Wayback Machine had reproduced the contents of her website about 87 times in five years, &quot;and displayed her entire website to the public daily during that period.&quot;</p>
<p>Shell sued the company for conversion, civil theft, breach of contract, and violations of the Racketeer Influenced and Corrupt Organizations Act (RICO) and the Colorado Organized Crime Control Act (COCCA).</p>
<p>As might be guessed, most of these claims were dismissed. But the breach of contract claim is still under consideration, awaiting more information. The question that will be decided, ultimately, is whether a web crawler that is not blocked by a website can legally be bound by a contract posted on the site. </p>
<p>The outcome of that question could also have an important impact on Web crawling and Internet copyright law itself. Search engines like Google have leaned on Fair Use principles when scouring the Web (and off-line libraries) for information. The Google Book Search project defenders have claimed that publishers can &quot;<a href="http://www.webpronews.com/topnews/2007/02/26/controlling-how-your-site-is-indexed">opt out</a>&quot; of their network by letting Google know their desire to do so.</p>
<p>In a similar vein, then, is Web crawling. Webmasters must opt out of indexing via the robots.txt file, preventing the spider from crawling the site. If the courts continue to back search engines&#8217; and other Internet companies&#8217; right to copy and index at will, then the whole copyright system, by default, it would seem, goes opt-out, too.</p>
<p>Copyright law blogger John Ottaviani, on <a href="http://blog.ericgoldman.org/archives/2007/03/can_a_spider_en.htm">Eric Goldman</a>&#8217;s blog, goes into the <a href="http://blog.ericgoldman.org/archives/waybackshell.pdf">Internet Archive v. Shell</a> case and its implications in greater detail.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/webmaster-claims-spider-entered-contract-in-suit-2007-03/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Cloaking Is Bad&#8230; Unless It&#8217;s Good</title>
		<link>http://www.webpronews.com/cloaking-is-bad-unless-its-good-2006-12</link>
		<comments>http://www.webpronews.com/cloaking-is-bad-unless-its-good-2006-12#comments</comments>
		<pubDate>Mon, 18 Dec 2006 17:30:40 +0000</pubDate>
		<dc:creator>Chris Richardson</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Addresses]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[Browser]]></category>
		<category><![CDATA[Cloaking]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[Engine]]></category>
		<category><![CDATA[GrayWOlf]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=33791</guid>
		<description><![CDATA[The concept of page cloaking has come under fire; again, because the idea is being used by a number of legitimate sites in order to protect or hide their content from users and/or search engine bots.  The fact that these sites do not get punished for using cloaking techniques has become a sore spot with some bloggers.
]]></description>
			<content:encoded><![CDATA[<p>The concept of page cloaking has come under fire; again, because the idea is being used by a number of legitimate sites in order to protect or hide their content from users and/or search engine bots.  The fact that these sites do not get punished for using cloaking techniques has become a sore spot with some bloggers.</p>
<p>Wikipedia <a href="http://en.wikipedia.org/wiki/Cloaking" class="bluelink">defines</a> cloaking as:</p>
<p><i>Cloaking is a <a href="http://en.wikipedia.org/wiki/Black_hat" class="bluelink">black hat</a> <a href="http://en.wikipedia.org/wiki/Search_engine_optimization" class="bluelink">search engine optimization</a> (SEO) technique in which the content presented to the <a href="http://en.wikipedia.org/wiki/Search_engine_spider" class="bluelink">search engine spider</a> is different from that presented to the users&#8217; <a href="http://en.wikipedia.org/wiki/Browser" class="bluelink">browser</a>. This is done by delivering content based on the <a href="http://en.wikipedia.org/wiki/IP_address" class="bluelink">IP addresses</a> or the User-Agent <a href="http://en.wikipedia.org/wiki/HTTP" class="bluelink">HTTP</a> header of the user requesting the page. When a user is identified as a search engine spider, a server-side <a href="http://en.wikipedia.org/wiki/Scripting_language" class="bluelink">script</a> delivers a different version of the <a href="http://en.wikipedia.org/wiki/Web_page" class="bluelink">web page</a>, one that contains content not present on the visible page. The purpose of cloaking is to deceive <a href="http://en.wikipedia.org/wiki/Search_engine" class="bluelink">search engines</a> so they display the page when it would not otherwise be displayed.</i></p>
<p>Basically, you are presenting search engine bots with a certain kind of content while delivering different content to the site visitor.  Normally, the cloaked pages are created to fool search engines in order to get better result rankings.  However, what if you are using cloaking procedures for legitimate reasons like protecting paid content or serving different content based on the visitor&#8217;s IP address?  Should sites doing this be subject to the same penalties?  It depends on whom you ask.</p>
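<p>To make the mechanism concrete, here is a minimal sketch of User-Agent based cloaking and a naive check for it. Both handlers, the User-Agent strings, and the page text are hypothetical stand-ins for a server-side script; real detection is far more involved, since sites may also vary content by IP address.</p>

```python
# Example User-Agent strings, assumed for illustration only.
CRAWLER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1) Firefox/10.0"


def cloaking_handler(user_agent):
    """Hypothetical server-side script that cloaks based on the User-Agent header."""
    if "googlebot" in user_agent.lower():
        return "Full article text for the index"
    return "Please log in to read this article"


def honest_handler(user_agent):
    """Hypothetical handler that serves every visitor the same page."""
    return "Full article text for everyone"


def looks_cloaked(handler):
    """True when the crawler and the browser receive different content."""
    return handler(CRAWLER_UA) != handler(BROWSER_UA)


print(looks_cloaked(cloaking_handler))  # True
print(looks_cloaked(honest_handler))    # False
```

<p>Note that the check cannot tell intent apart: a paywall and a spam page both produce the same mismatch, which is the crux of the dispute described here.</p>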
<p>On the <a href="http://www.wolf-howl.com/tools/how-do-you-save-pages/" class="bluelink">Graywolf SEO blog</a>, readers are asked how they save articles from the New York Times because they are only available to the public for a limited amount of time.  Once an article gets to a certain age (2 weeks), the NYT hides it unless the Google crawler (or other search engine bot) requests it &#8211; fitting the definition of cloaking, something Graywolf takes the search engines (and the NYT) to task over.</p>
<p>Philipp Lenssen of Google Blogoscoped also <a href="http://blog.outer-court.com/archive/2006-12-13-n85.html" class="bluelink">has some issues</a> with Google seemingly allowing WebmasterWorld to cloak their pages, which goes against the search engine&#8217;s <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35769" class="bluelink">webmaster guidelines</a>.  For his post, Philipp conducted a search <a href="http://www.google.com/search?hl=en&#038;lr=&#038;safe=off&#038;q=php-based+cms&#038;btnG=Search" class="bluelink">related to CMS and PHP</a> and a WebmasterWorld post held the first position.  However, when Lenssen tried to access the page from the search results, he was taken to a login page &#8211; another example of cloaking in action (unfortunately, when I try to duplicate the search, I am taken directly to the content).</p>
<p>Both Lenssen and Graywolf wonder how these otherwise legitimate sites get away with these cloaking exercises when Google and the rest are explicitly against the act.  However, the examples given by both bloggers represent the &#8220;white-hat&#8221; side of cloaking in the sense they are not trying to game the search engines.  These sites and companies are merely trying to protect their content.  </p>
<p>However, this does not matter to either Lenssen or Graywolf.  Because Google has actually addressed this issue in their guidelines, both believe there should be no quarter when it comes to punishing the guilty parties, whether the sites have a legitimate reason for cloaking or not.  They also feel Google&#8217;s Matt Cutts should address the situation so there will be no more confusion.</p>
<p>At the Chicago SES, while it was never explicitly stated (at least in the sessions I attended), there seems to be a growing sentiment that as long as the webmaster isn&#8217;t trying to be deceptive, search engines will tolerate some cloaking.  The Wikipedia page discusses delivering content based on a visitor&#8217;s IP location (<a href="http://en.wikipedia.org/wiki/Cloaking#Cloaking_versus_IP_Delivery" class="bluelink">IP Delivery</a>) as one of the instances where cloaking is indeed accepted.  However, the explanation also points out that IP delivery isn&#8217;t the best example of cloaking because the content in question is not being hidden from search engines or users; it&#8217;s just being manipulated based on the visitor&#8217;s location.  </p>
<p>The question remains, however &#8211; should the search engines punish pages being cloaked for content protection reasons? If you follow the two bloggers cited in this article, then yes, all sites doing so should be punished.  If they are not going to punish these sites, then the search engine spokesmen and women should speak up and address the confusion.</p>
<p>Chris Richardson is a search engine writer and editor for <a href="http://www.WebProNews.com">WebProNews</a>. Visit WebProNews for the <a href="http://www.WebProNews.com">latest search news</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/cloaking-is-bad-unless-its-good-2006-12/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The ROR Advantage: No Spider Discrimination!</title>
		<link>http://www.webpronews.com/the-ror-advantage-no-spider-discrimination-2006-09</link>
		<comments>http://www.webpronews.com/the-ror-advantage-no-spider-discrimination-2006-09#comments</comments>
		<pubDate>Wed, 20 Sep 2006 17:14:01 +0000</pubDate>
		<dc:creator>Philip Nicosia</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Discrimination]]></category>
		<category><![CDATA[sitemap]]></category>
		<category><![CDATA[Spider]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=31595</guid>
		<description><![CDATA[Search engine optimization is a very complex science, but at its heart is the simple rule: to format your website in such a way that spiders can immediately recognize and index its content.
]]></description>
			<content:encoded><![CDATA[<p>Search engine optimization is a very complex science, but at its heart is the simple rule: to format your website in such a way that spiders can immediately recognize and index its content.</p>
<p>If they can&#8217;t &#8220;see&#8221; you, you might as well not exist&#8211;and if they can&#8217;t understand your code, no amount of keywords can get you in the Golden Top 20.</p>
<p>The problem that many website developers used to encounter was that search engines worked differently; so you could end up with a high ranking in Lycos but languish at the bottom of Google. How exactly should you optimize your site so you perform well in all search engines?</p>
<p>Enter ROR (short for Resources for a Resource), an independent XML format that translates your content in a way that all search engines can understand.</p>
<p>Think of it as a web spider&#8217;s Cliff&#8217;s Notes. It describes all the objects, services, discounts, images, podcasts, etc. If it&#8217;s on the site, it&#8217;s on the ROR feed, but in a format that&#8217;s easy to process and removes any risk of skipping or ignoring a link. </p>
<p>ROR calls its &#8220;magic file&#8221; structured feeds, which guide search engines as they scan the text. Unlike Google Sitemaps, it&#8217;s universally understood&#8211;and very easy to process. It&#8217;s also more detailed. It doesn&#8217;t just give a map or &#8220;table of contents&#8221;, it actually summarizes what&#8217;s inside. It&#8217;s also been in existence far longer than Google, so its reliability has been proven by time. </p>
<p>Though it&#8217;s been around for a long time, ROR is by no means outdated. The majority of the file formats are already available in ROR, although it is currently being updated to keep up with the growing number of website innovations. But to avoid being too unwieldy, the ROR system tries to re-use existing data structures. It boasts of being very streamlined, a strength that makes it one of the more efficient ways of indexing a site. </p>
<p>Usually the ROR feed is located in the site&#8217;s root directory, and is named ror.xml by default. It is possible to rename the file, and the search engines will still find it. The only thing it needs is a &lt;link&gt; tag in your main page (between the &lt;head&gt; and &lt;/head&gt; tags). Another alternative is to create a smaller ror.xml file which will direct the search engines to the ROR feed.</p>
<p>You can create this file in the <a href="http://www.xml-sitemaps.com/">ROR sitemap generator</a>.</p>
<p>XML-Sitemaps.com has an online <a href="http://www.xml-sitemaps.com/">sitemap generator</a> that creates XML, HTML, text and <a href="http://www.xml-sitemaps.com/forum/index.php/topic,418.0.html">ROR sitemaps</a> and also provides some useful <a href="http://www.xml-sitemaps.com/seo-tools.html">SEO tools</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/the-ror-advantage-no-spider-discrimination-2006-09/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Let the Spider Crawl</title>
		<link>http://www.webpronews.com/let-the-spider-crawl-2006-05</link>
		<comments>http://www.webpronews.com/let-the-spider-crawl-2006-05#comments</comments>
		<pubDate>Thu, 25 May 2006 20:23:10 +0000</pubDate>
		<dc:creator>Lee Odden</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Guidelines]]></category>
		<category><![CDATA[Marketing]]></category>
		<category><![CDATA[Network]]></category>
		<category><![CDATA[Online]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=29511</guid>
		<description><![CDATA[For most sites, one of the first things we check is to make sure the site crawler friendly. "Crawler friendly" you say? What the heck does that mean?
]]></description>
			<content:encoded><![CDATA[<p>For most sites, one of the first things we check is to make sure the site crawler friendly. &#8220;Crawler friendly&#8221; you say? What the heck does that mean?</p>
<p>Search engines find sites mostly by following links from sites that are already known to find new sites and pages. The sofware programs that search engines use to perform this task are often called &#8220;Bots&#8221; or &#8220;Spiders&#8221;. You get the analogy right? &#8220;Web&#8221;, &#8220;Spiders&#8221;, &#8220;Crawl&#8221;.</p>
<p>If you don&#8217;t make sure your site is crawable and indexed, then you&#8217;re putting your web site at a gross disadvantage. For example, if you have a 1,500 page web site and only 700 pages are getting indexed, that&#8217;s like showing up to a baseball game with only 5 of your players. You need the whole team to win, so make sure your site is crawlable and getting indexed properly.</p>
<p>As search engine spiders crawl the links of your site, they make copies of the pages and then peform other functions that strip away the code, interpret the remaining text as well as other analysis that ultimately leads to a score for the page and association of the page to certain words. All of this along with links into your site from other web sites, influence your rankings. On the PPT <a href="http://blog.searchenginewatch.com/blog/060510-123802" class="bluelink">slides from the recent Google Press Day</a>, it says there are over 200 &#8220;signals&#8221; used to rank web pages on Google.</p>
<p>Here&#8217;s an animation of how <a href="http://drunkmenworkhere.org/219.php?a=yahoo2hour" class="bluelink">Yahoo&#8217;s SLURP crawls a network</a> of pages.</p>
<p>If a search engine has difficulty &#8220;crawling&#8221; the links within your site, then the pages either won&#8217;t get indexed at all or will only get partially indexed &#8211; neither of which will help your site&#8217;s rankings.</p>
<p>OK, now I know why, but what about the how? Search engine friendly URLs are simple. As in, short and simple. For example, the url of <a href="http://www.toprankblog.com/2006/05/seo-tips-let-the-spider-crawl/" class="bluelink">this web page</a> is: http://www.toprankblog.com/2006/05/seo-tips-let-the-spider-crawl/</p>
<p>It could be something like http://www.toprankblog.com/?pageid=234234&#038;articleid=5tips&#038;postid=435345 or something similar. The second url is still crawlable, but if you got to pick, which one would you prefer to index? Which one would you be more likely to remember as a user?</p>
<p>Most crawling problems with links and the URLs they point to involve shopping cart software or content management systems that place a lot of extra information in the web page URL. If references to &#8220;?sid=&#8221; or a large number of variables are included in the URL, it can cause issues. Search engine bots are leery of &#8220;spider traps&#8221;, such as calendars or situations where an infinite number of URL versions display the exact same web page. This often occurs with the use of session ids.<br />
Simple and short URLs are typically the easiest to crawl, so try to use a content management system that produces short, clean URLs.</p>
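<p>One way to picture the session id problem is a small helper that removes session parameters so that every visitor (and every bot) sees the same URL for the same page. This is only a sketch: the parameter names in the set and the example URL are assumptions, and a real site would fix this in the shopping cart or CMS configuration rather than after the fact.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameter names treated as session identifiers; an assumed list.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_ids(url):
    """Return the URL with the assumed session-id query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL with only the remaining parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_ids("http://www.example.com/cart?item=42&sid=abc123"))
# http://www.example.com/cart?item=42
```

<p>With the session id gone, the many URL variants that would otherwise look like distinct pages collapse into one.</p>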
<p>You can also use programs like <a href="https://www.google.com/webmasters/sitemaps/docs/en/about.html" class="bluelink" title="google sitemaps">Google Sitemaps</a> to submit your site URLs for inclusion. There is no guarantee it will work, but it&#8217;s been pretty effective for many web sites. Google Sitemaps works in conjunction with a normal &#8220;crawl&#8221; of your web site. Plus there are many useful troubleshooting features and information available with Google Sitemaps. You can also submit an RSS feed or plain text file of your site&#8217;s URLs <a href="http://submit.search.yahoo.com/free/request" class="bluelink" title="yahoo">to Yahoo</a>.</p>
<p>There&#8217;s actually quite a bit more involved with making your site crawlable, but I&#8217;ll leave it at this for now.</p>
<p>Resources on crawler-friendly web sites:</p>
<ul>
<li><a href="http://www.google.com/webmasters/guidelines.html" class="bluelink" title="Google Webmaster Guidelines">Google Webmaster Guidelines </a></li>
<li><a href="http://help.yahoo.com/help/us/ysearch/deletions/deletions-05.html" class="bluelink" title="Yahoo! Search Content Quality Guidelines">Yahoo! Search Content Quality Guidelines</a> </li>
<li><a href="http://search.msn.com.my/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_GuidelinesforOptimizingSite.htm&#038;FORM=WGDD" class="bluelink">MSN guidelines</a> </li>
<li><a href="http://sp.ask.com/docs/about/aj/teoma.htm" class="bluelink">Ask/Teoma Crawler Information</a> </li>
<li><a href="http://www.smart-it-consulting.com/article.htm?node=148" class="bluelink">Smart IT Consulting Weblog</a> </li>
<li><a href="http://www.seomoz.org/articles/bg2.php#2a" class="bluelink">Speed Bumps and Walls &#8211; SEOMoz</a> </li>
<li><a href="http://www.seroundtable.com/archives/003418.html" class="bluelink">&#8220;Meet the Crawlers&#8221;</a> SES NYC 2006 </li>
</ul>
<p>Lee Odden is President and Founder of<br />
<a href="http://www.toprankresults.com/">TopRank Online Marketing</a>, specializing in organic SEO, blog<br />
marketing and online public relations. He&#8217;s been cited as a search<br />
marketing expert by publications including U.S. News &#038; World Report and<br />
The Economist and has implemented successful search marketing programs<br />
with top BtoB companies of all sizes. Odden shares his marketing<br />
expertise at  <a href="http://www.toprankblog.com">Online Marketing Blog</a> offering<br />
daily news, interviews and best practices.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/let-the-spider-crawl-2006-05/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GoogleBot the &#8220;Spider of Doom&#8221;</title>
		<link>http://www.webpronews.com/googlebot-the-spider-of-doom-2006-03</link>
		<comments>http://www.webpronews.com/googlebot-the-spider-of-doom-2006-03#comments</comments>
		<pubDate>Thu, 30 Mar 2006 16:47:10 +0000</pubDate>
		<dc:creator>Jim Hedger</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Googlebot]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=28095</guid>
		<description><![CDATA[A <a href="http://www.thedailywtf.com/forums/65974/ShowPost.aspx" class="bluelink">funny story</a> is circulating in tech circles about how Googlebot inadvertently destroyed the database of a content management system (CMS) based site that took months of work to build.
]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.thedailywtf.com/forums/65974/ShowPost.aspx" class="bluelink">funny story</a> is circulating in tech circles about how Googlebot inadvertently destroyed the database of a content management system (CMS) based site that took months of work to build.</p>
<p>As the story goes, a web development firm was given a contract to rebuild an existing site using a CMS. As the client already had a site with a significant amount of content, they took it slow and fully populated the site with all the content from the previous site. When they had finally uploaded everything, they took the site live.</p>
<p>&#8220;Things went pretty well for a few days after going live. But, on day six, things went not-so-well: all of the content on the website had completely vanished and all pages led to the default &#8220;please enter content&#8221; page. Whoops.&#8221;</p>
<p>After painstaking investigation, Googlebot, the spider Google uses to find information on the web, was found to be the cause.</p>
<p>When one of the users entered information into the CMS (using copy and paste), he or she included an EDIT hyperlink that had been left in a multi-user document. A human error like this wouldn&#8217;t normally be a problem, because users are required to log in with a password before they can make changes.</p>
<p>&#8220;But, the CMS authentication subsystem didn&#8217;t take into account the sophisticated hacking techniques of Google&#8217;s spider. As it turns out, Google&#8217;s spider doesn&#8217;t use cookies, which means that it can easily bypass a check for the &#8220;isLoggedOn&#8221; cookie to be &#8220;false&#8221;. It also doesn&#8217;t pay attention to Javascript, which would normally prompt and redirect users who are not logged on. It does, however, follow every hyperlink on every page it finds, including those with &#8220;Delete Page&#8221; in the title.&#8221;</p>
<p>In short, Googlebot muscled its way into the CMS through the edit link and followed every link it found there, including the ones that deleted pages. The rest was history, or at least that&#8217;s what became of months of work. Fortunately, a recent backup of the full site was available for uploading.</p>
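<p>The failure described in the story can be sketched in a few lines. This is only an illustration under stated assumptions: the <code>Request</code> class, the <code>session_id</code> cookie and the <code>handle_delete_page</code> function are hypothetical names, not the code of the CMS in question. The sketch shows two general safeguards: a destructive action is refused unless it arrives as a POST, and a client that sends no cookie at all is treated as logged out.</p>

```python
# Hypothetical sketch: all names here are illustrative, not the CMS from the story.
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str                                   # HTTP method, e.g. "GET" or "POST"
    cookies: dict = field(default_factory=dict)   # cookies sent by the client


def is_authenticated(request: Request, valid_sessions: set) -> bool:
    # Deny by default: a client that sends no cookie (such as a crawler)
    # has no session id, so it is never treated as logged in.
    return request.cookies.get("session_id") in valid_sessions


def handle_delete_page(request: Request, valid_sessions: set) -> int:
    """Return the HTTP status code for a request to delete a page."""
    if request.method != "POST":
        return 405  # crawlers follow links with GET, so GET must never delete
    if not is_authenticated(request, valid_sessions):
        return 403  # the check runs on the server, not in JavaScript
    # ... the actual deletion would happen here ...
    return 200
```

<p>With these checks in place, a spider that follows a &#8220;Delete Page&#8221; link with a plain GET request and no cookies receives a 405 response and nothing is deleted.</p>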
<p>Jim Hedger is the SEO Manager of <a href="http://www.Stepforth.com/">StepForth Search Engine Placement Inc.</a> Based in Victoria, BC, Canada, StepForth is the result of the consolidation of BraveArt Website Management, Promotion Experts, and Phoenix Creative Works, and has provided professional search engine placement and management services since 1997. http://www.stepforth.com/  Tel &#8211; 250-385-1190  Toll Free &#8211; 877-385-5526  Fax &#8211; 250-385-1198</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/googlebot-the-spider-of-doom-2006-03/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Could the New Google Spider be Causing Issues with Websites?</title>
		<link>http://www.webpronews.com/could-the-new-google-spider-be-causing-issues-with-websites-2006-03</link>
		<comments>http://www.webpronews.com/could-the-new-google-spider-be-causing-issues-with-websites-2006-03#comments</comments>
		<pubDate>Fri, 17 Mar 2006 13:57:14 +0000</pubDate>
		<dc:creator>Rob Sullivan </dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Websites]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=27721</guid>
		<description><![CDATA[Around the time Google announced "Big Daddy," there was a new Googlebot roaming the web.  Since then I've heard stories from clients of websites and servers going down and previously unindexed content getting indexed.
]]></description>
			<content:encoded><![CDATA[<p>Around the time Google announced &#8220;Big Daddy,&#8221; there was a new Googlebot roaming the web.  Since then I&#8217;ve heard stories from clients of websites and servers going down and previously unindexed content getting indexed.</p>
<p>I started digging into this and you&#8217;d be surprised at what I found out. </p>
<p>First, let&#8217;s look at the timeline of events: </p>
<p>In late September, some astute spider watchers over at WebmasterWorld spotted unusual Googlebot activity.  In fact, it was in this thread that the bot was first reported on. It concerned some posters, who thought that perhaps regular users were masquerading as the famous bot. </p>
<p>Early on, it also appeared that the new bot wasn&#8217;t obeying the robots.txt file, the file a site uses to allow or deny crawlers access to parts of the website. </p>
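<p>For readers who have not worked with robots.txt, here is a minimal sketch of how a well-behaved crawler consults it, using the <code>urllib.robotparser</code> module from Python&#8217;s standard library. The rules and the example.com URLs below are made up for illustration; they are not taken from any site mentioned in this article.</p>

```python
from urllib.robotparser import RobotFileParser

# Example rules, invented for this sketch: every crawler is asked
# to stay out of the /private/ section of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("Googlebot", "http://www.example.com/index.html"))
print(may_crawl("Googlebot", "http://www.example.com/private/page.html"))
```

<p>A crawler that honors the file skips any URL for which the check fails. Requests for disallowed paths in your logs are the kind of evidence that would suggest a bot is ignoring the rules.</p>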
<p>Speculation grew about what the new crawler was until Matt Cutts mentioned a new Google test data center.  For those who don&#8217;t know, Matt Cutts is a senior engineer with Google and one of the few Google employees talking to us &#8220;regular folk.&#8221; This mention happened in November. </p>
<p>There wasn&#8217;t much mention of Big Daddy until early January of this year when Matt again blogged about it asking for feedback. </p>
<p>Much feedback was given on the accuracy of the results.  There were also those that asked if the Mozilla Googlebot (known as &#8220;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&#8221; in your visitor logs) and Big Daddy were related, but no response was made. </p>
<p>Now I&#8217;m going to begin some of my own speculation: </p>
<p>I do in fact believe the two are related.  In fact, I think this new crawler will eventually replace the old crawlers just as Big Daddy will replace the current data infrastructure. </p>
<p><b>Why is this important? </b></p>
<p>Based on my observations, this crawler may be able to do so much more than the old crawler. </p>
<p>For one, it emulates a newer browser.  The old bot was based on the Lynx text-based browser.  While I&#8217;m sure Google added features as time went on, the basic Lynx browser is just that &#8211; basic. </p>
<p>That explains why Google couldn&#8217;t deal with things like JavaScript, CSS and Flash. </p>
<p>However, with the new spider, built on the Mozilla engine, there are so many possibilities. </p>
<p>Just look at what your Mozilla or Firefox browser can do itself &#8211; render CSS, read and execute JavaScript and other scripting languages, even emulate other browsers. </p>
<p><b>But that&#8217;s not all. </b></p>
<p>I&#8217;ve talked to a few of my clients and their sites are getting hammered by this new spider.  It has gotten so bad that some of their servers have gone down because of the volume of traffic from this one spider! </p>
<p>On the plus side, I have clients who went from a few hundred thousand indexed pages to over 10 million in just a few weeks!  For one of them, indexed pages have increased 3500% over an 8-week period since December 2005!  Just so you know, this is also the client whose site went down because of the huge volume of crawling. </p>
<p><b>But that&#8217;s still not all. </b></p>
<p>I have another client that uses IP recognition to serve content based on a person&#8217;s geographic location.  If you live in the US you get American content and pricing; if you live in the UK you get UK content and pricing.  As you may imagine, the UK, US, Canadian and Australian content is all very similar.  In fact, about the only noticeable difference is the pricing. </p>
<p>This is my concern &#8211; if the duplicate content gets indexed by Google what will they do?  There&#8217;s a good chance that the site would be penalized or even banned for violation of the webmaster quality guidelines set forth by Google. </p>
<p>This is why we implemented IP recognition &#8211; so that Googlebot, which crawls from US IP addresses, only sees one version of the site. </p>
<p>However, a review of the server logs shows that this new Googlebot has been visiting not only the US content but also the content of the other sections of the site.  Naturally, I wanted to verify that the IP recognition was working.  It is.  This leads me to wonder then; can this browser spoof its location and/or use a proxy? </p>
<p>Imagine that &#8211; the browser is smart enough to do some of its own testing by viewing the site from multiple IP addresses.  If that&#8217;s the case then those who cloak sites are going to have problems. </p>
<p>In any case, from the limited observations I&#8217;ve made, this new Google &#8211; both the data center and the spider &#8211; is going to change the way we do things. </p>
<p>If you have experienced anything similar in the past few months to do with Google, be sure to add it to our comments section below. </p>
<p>Rob Sullivan is an SEO Consultant and Writer for <a href="http://www.textlinkbrokers.com">http://www.textlinkbrokers.com</a>.  Textlinkbrokers is the trusted leader in building long term rankings through safe and effective <a href="http://www.textlinkbrokers.com" target="_blank">link building</a>.  Please provide a link directly to Textlinkbrokers when syndicating this article. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/could-the-new-google-spider-be-causing-issues-with-websites-2006-03/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Assertivenet is Gigablast Spider (Gigabot)</title>
		<link>http://www.webpronews.com/assertivenet-is-gigablast-spider-gigabot-2006-03</link>
		<comments>http://www.webpronews.com/assertivenet-is-gigablast-spider-gigabot-2006-03#comments</comments>
		<pubDate>Mon, 13 Mar 2006 19:31:29 +0000</pubDate>
		<dc:creator>Adam Senour</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Assertivenet]]></category>
		<category><![CDATA[Forums]]></category>
		<category><![CDATA[Gigablast]]></category>
		<category><![CDATA[Gigabot]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Support]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=27580</guid>
		<description><![CDATA[The purpose of this article is to provide evidence and information to counteract the suggestion that Assertivenet is potentially used for malicious purposes.
]]></description>
			<content:encoded><![CDATA[<p>The purpose of this article is to provide evidence and information to counteract the suggestion that Assertivenet is potentially used for malicious purposes.</p>
<p><b>Initial Research</b></p>
<p>On Saturday, March 11, 2006, I received a somewhat urgent telephone call from a client of mine, <a href="http://www.hibiscusflorals.com/" class="bluelink">Hibiscus Florals</a>. The owner, Mark Morkowski, was concerned because he had been reviewing his website traffic statistics and had noticed that at numerous points throughout the day, a user or spider from &#8220;ASSERTIVENET&#8221; (IP 66.154.103.125) had visited the Hibiscus website.</p>
<p>Since this was rather unusual, Mark elected to investigate further by searching for more information on &#8220;Assertivenet&#8221; via the Google search engine. The first three results that he found appear below:</p>
<ul>
<li><a href="http://www.webmasterworld.com/forum10/11080.htm" class="bluelink">http://www.webmasterworld.com/forum10/11080.htm</a> </li>
<li><a href="http://www.powerbasic.com/support/forums/Forum12/HTML/002979.html" class="bluelink">http://www.powerbasic.com/support/forums/Forum12/HTML/002979.html</a> </li>
<li><a href="http://www.completewhois.com/cgi-bin/whois.cgi?query=66.154.111.40" class="bluelink">http://www.completewhois.com/cgi-bin/whois.cgi?query=66.154.111.40</a> </li>
</ul>
<p>From this information, Mark and I gathered that the owner of the spider in question appeared to be a company called Assertive Networks, hosted through a company called &#8220;BC Hosting.&#8221; More information was not immediately available.</p>
<p>It is this lack of information that likely led some of the members of the PowerBASIC forums to block the IP range 66.154.* from accessing their various websites, and justifiably so. But this same lack of information led to additional questions:</p>
<p><b>1. What files was the Assertivenet spider accessing/trying to access?</b> Was the spider crawling pages or, like some bots, was it looking for specific files that could be used for malicious purposes (e.g. files and scripts that could be manipulated for website attacks?) </p>
<p><b>2. Why is the apparent owner of the Assertivenet spider a web hosting company (BC Hosting)?</b> </p>
<p><b>3. What is the intended purpose of the Assertivenet spider? </b></p>
<p><b>Additional Research &#8211; All Is Not As It Appears</b></p>
<p>At this point, I decided to look beyond what the website traffic statistics revealed, as well as the information that Mark&#8217;s initial search revealed. I needed to start by answering the questions I posed earlier, and in order to do so, I needed to access the raw log files for the Hibiscus website.</p>
<p>I opened up the log files, searched for the particular IPs in question, and found a series of entries such as these:</p>
<p><code>2006-03-11 03:47:34 66.154.103.125 - 216.89.218.168 80 GET /robots.txt - 200 0 400 285 78 HTTP/1.0 www.hibiscusflorals.com <b>Gigabot/2.0/gigablast.com/spider.html </b>-<br />
2006-03-11 03:47:34 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=215 200 0 0 299 125 HTTP/1.0 www.hibiscusflorals.com <b>Gigabot/2.0/gigablast.com/spider.html</b> -<br />
2006-03-11 03:50:37 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=195 200 0 0 299 31 HTTP/1.0 www.hibiscusflorals.com <b>Gigabot/2.0/gigablast.com/spider.html </b>-<br />
2006-03-11 07:47:05 66.154.103.125 - 216.89.218.168 80 GET /robots.txt - 200 0 400 285 78 HTTP/1.0 www.hibiscusflorals.com <b>Gigabot/2.0/gigablast.com/spider.html </b>-<br />
2006-03-11 07:47:05 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=219 200 0 0 299 109 HTTP/1.0 www.hibiscusflorals.com <b>Gigabot/2.0/gigablast.com/spider.html </b>-</code></p>
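<p>The same check can be scripted rather than done by eye. The sketch below is a rough illustration, not the procedure I actually followed: it splits each log line on whitespace and assumes the field positions seen in the entries above (client IP third, request path eighth, user agent second from last). Other servers may log a different set of fields, so the positions would need adjusting.</p>

```python
from collections import Counter

# Two of the entries quoted above, used here as sample input.
SAMPLE_LOG = """\
2006-03-11 03:47:34 66.154.103.125 - 216.89.218.168 80 GET /robots.txt - 200 0 400 285 78 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 03:47:34 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=215 200 0 0 299 125 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
"""


def parse_line(line):
    """Split one log line into the fields this sketch cares about.

    Field positions are an assumption based on the sample entries.
    """
    fields = line.split()
    return {
        "client_ip": fields[2],
        "method": fields[6],
        "path": fields[7],
        "query": fields[8],
        "status": int(fields[9]),
        "user_agent": fields[-2],
    }


def user_agents_for_prefix(log_text, ip_prefix):
    """Count the user agents seen on requests from IPs starting with ip_prefix."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        entry = parse_line(line)
        if entry["client_ip"].startswith(ip_prefix):
            counts[entry["user_agent"]] += 1
    return counts


print(user_agents_for_prefix(SAMPLE_LOG, "66.154."))
```

<p>Run against a full log file, a tally like this shows at a glance whether every request from the suspicious range identifies itself as the same spider.</p>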
<p>The spider in this case actually belongs to a search engine called <a href="http://www.gigablast.com/" class="bluelink">Gigablast</a>, and is appropriately named the Gigabot. The Gigabot only crawled pages and files as other search engines have, and made no attempts whatsoever to access files and scripts of a known malicious nature.</p>
<p>Gigablast is a &#8220;Tier 2&#8221; search engine that has over 1,000,000,000 pages indexed as of the date of this article (March 13, 2006). While it is not on the same level of popularity as the Big 3 of Yahoo!, MSN, and Google, it has indexed a significant portion of the web, and can be useful for some searches. In particular, Gigablast has implemented a &#8220;Giga bits&#8221; feature that suggests alternate searches based on the user&#8217;s original query, in order to help narrow the query down and provide greater relevancy.</p>
<p>I conducted additional research and discovered that some IP addresses from the 66.154.* IP block do resolve to gigablast.com, for example:</p>
<ul>
<li>66.154.102.46 </li>
<li>66.154.102.10 </li>
<li>66.154.103.50 </li>
</ul>
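<p>A reverse DNS lookup of this kind can be scripted with <code>socket.gethostbyaddr</code> from Python&#8217;s standard library. The sketch below is an illustration only: the function name <code>resolves_to_domain</code> is hypothetical, and what any of these addresses resolves to today may differ from what I saw when this article was written.</p>

```python
import socket


def resolves_to_domain(ip, domain, lookup=socket.gethostbyaddr):
    """Return True if the reverse DNS name of ip is domain or a host under it.

    The lookup parameter exists so the function can be exercised without
    network access; by default it performs a real reverse DNS query.
    """
    try:
        hostname = lookup(ip)[0]
    except OSError:
        # No reverse DNS record, or the lookup failed.
        return False
    hostname = hostname.lower().rstrip(".")
    # Require an exact match or a dot boundary, so that a name such as
    # "notgigablast.com" is not mistaken for a gigablast.com host.
    return hostname == domain or hostname.endswith("." + domain)


if __name__ == "__main__":
    for ip in ("66.154.102.46", "66.154.102.10", "66.154.103.50"):
        print(ip, resolves_to_domain(ip, "gigablast.com"))
```

<p>A reverse lookup alone can be set to anything by whoever controls the IP block, so a stricter check would also resolve the returned hostname forward and confirm that it points back to the same address.</p>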
<p><b>Conclusion &#8211; The Gigabot is Safe</b></p>
<p>As you may well have gathered by now, the Gigabot is a perfectly safe spider that operates in the same manner as other search engine spiders. There is no reason at this time to block the 66.154.* IP range that the bot uses; if anything, webmasters would gain from the potential free traffic that Gigablast would generate for their websites as the result of the Gigabot&#8217;s efforts.</p>
<p>*Previously published at <a href="http://www.searchenginefriendlylayouts.com" class="bluelink">www.searchenginefriendlylayouts.com</a></p>
<p>Adam Senour is the owner of ADAM Web Design, a leading web design and development company in Toronto, Ontario, Canada.  Visit http://www.adamwebdesign.ca for more information on ADAM Web Design products and services.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/assertivenet-is-gigablast-spider-gigabot-2006-03/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LookSmart Builds Spider Bait</title>
		<link>http://www.webpronews.com/looksmart-builds-spider-bait-2005-10</link>
		<comments>http://www.webpronews.com/looksmart-builds-spider-bait-2005-10#comments</comments>
		<pubDate>Sat, 29 Oct 2005 15:17:06 +0000</pubDate>
		<dc:creator>Andrew Goodman</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[LookSmart]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[Searches]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Vertical]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=24195</guid>
		<description><![CDATA[So LookSmart's creating content now, to ensure itself a higher-quality distribution network for the pay-per-click ads it sells.
]]></description>
			<content:encoded><![CDATA[<p>So LookSmart&#8217;s creating content now, to ensure itself a higher-quality distribution network for the pay-per-click ads it sells.</p>
<p>Makes sense. It&#8217;s the type of on-topic, informative material that the original investors in the original highly granular directory thought would find an audience in the first place.</p>
<p>The next logical step would be to partner with Yahoo, Google, Quigo, etc. to actually serve (at least some of) the ads on those pages. LookSmart built a pretty robust platform for paid search, but the leaders in the field may well have larger advertiser networks and more importantly, advertisers willing and able to take the trouble to fund and maintain their contextual ad accounts.</p>
<p>I&#8217;d rather tweak my Google AdWords account to take account of (e.g.) LookSmart&#8217;s new inventory than have to open up a LookSmart account to get at it.</p>
<p><a href="http://www.clickz.com/news/article.php/3559766" class="bluelink">LookSmart Searches for Vertical Comeback</a></p>
<p><a name="andrew"></a> <a href="http://www.traffick.com/"> Andrew Goodman</a> is Principal of <a href="http://www.page-zero.com/">Page Zero Media</a>, a marketing consultancy which focuses on maximizing clients&#8217; paid search marketing campaigns.</p>
<p>In 1999 Andrew co-founded <a href="http://www.traffick.com/">Traffick.com</a>, an acclaimed &#8220;guide to portals&#8221; which foresaw the rise of trends such as paid search and semantic analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/looksmart-builds-spider-bait-2005-10/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

