<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>WebProNews &#187; Crawler</title>
	<atom:link href="http://www.webpronews.com/tag/crawler/feed" rel="self" type="application/rss+xml" />
	<link>http://www.webpronews.com</link>
	<description>Breaking News in Tech, Search, Social, &#38; Business</description>
	<lastBuildDate>Mon, 13 Feb 2012 04:32:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Yahoo Slurp 3.0: Crawler Revised And Rolling Out</title>
		<link>http://www.webpronews.com/yahoo-slurp-30-crawler-revised-and-rolling-out-2008-04</link>
		<comments>http://www.webpronews.com/yahoo-slurp-30-crawler-revised-and-rolling-out-2008-04#comments</comments>
		<pubDate>Mon, 14 Apr 2008 23:37:19 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Slurp]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=45022</guid>
		<description><![CDATA[Site publishers will see a new Yahoo bot crawling through their pages, and webmasters should be aware of a couple of little issues that may affect them.
]]></description>
			<content:encoded><![CDATA[<p>Site publishers will see a new Yahoo bot crawling through their pages, and webmasters should be aware of a couple of little issues that may affect them.<br />
<span id="more-45022"></span>
<p>
Yahoo&#8217;s search team needed the past few weeks to prep its latest crawler for its sojourns onto the World Wide Web. This new multi-legged beast has begun its official rollout, as noted on the <a href="http://www.ysearchblog.com/archives/000531.html">Yahoo Search blog</a>.</p>
<p>
&#8220;The new Yahoo! Slurp 3.0 recognizes the same user-agent and all robots.txt directives for &#8216;Yahoo! Slurp,&#8217; though it&#8217;ll identify itself as Slurp 3.0 in your web logs,&#8221; the blog said. A handful of webmasters may notice a couple of changes.</p>
<p>
&#8220;Please note that if you&#8217;re using IP-based recognition of our crawlers, you might see a drop in crawl/coverage from Yahoo! We strongly recommend that you move to reverse DNS-based identification of Yahoo! Slurp if you&#8217;re using any other method to avoid this problem,&#8221; they said. The IP ranges for Yahoo&#8217;s crawlers will change to a smaller set, too.</p>
<p>
The other change applies to robots.txt files and directives. Yahoo Slurp 3.0 will recognize existing entries for &#8216;Slurp&#8217; or &#8216;Yahoo Slurp&#8217;, but not a reference to &#8216;Slurp/2.0.&#8217; Yahoo thinks those entries are rare, but considered it prudent to mention them just in case.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/yahoo-slurp-30-crawler-revised-and-rolling-out-2008-04/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Starts Controversial Form Crawling Program</title>
		<link>http://www.webpronews.com/google-starts-controversial-form-crawling-program-2008-04</link>
		<comments>http://www.webpronews.com/google-starts-controversial-form-crawling-program-2008-04#comments</comments>
		<pubDate>Sat, 12 Apr 2008 13:37:33 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Forms]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=44996</guid>
		<description><![CDATA[Googlebot received an update that permits it to complete certain forms, and learn more about the site hosting them.
]]></description>
			<content:encoded><![CDATA[<p>Googlebot received an update that permits it to complete certain forms, and learn more about the site hosting them.<br />
<span id="more-44996"></span>
<p>
Websites place content behind forms to collect information from visitors who request access to it. The site publisher might want those details for demographic analysis to improve marketing campaigns, for example.</p>
<p>
<a name="more"></a>Google thinks it can present better results to searchers by having access to the URLs behind forms, improving the site&#8217;s exposure in the process. The <a href="http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html">Google Webmaster Central blog</a> promised its crawls will be well-behaved:</p>
<blockquote style=background-color:#c2dfff;><p><i>Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won&#8217;t crawl any of the URLs that a form would generate.  Similarly, we only retrieve GET forms and avoid forms that require any kind of user information.</i></p></blockquote>
<p>
However, concerns have been raised about Google crawling forms not marked as forbidden. Kevin Heisler complained at <a href="http://blog.searchenginewatch.com/blog/080411-182955">Search Engine Watch</a> that the practice could violate the privacy of corporate data.</p>
<p>
Though confident in Google&#8217;s intentions, Heisler thinks potential backlash from corporate interests could be a problem. &#8220;The costs to CEOs, CIOs and CTOs at corporations far outweigh the benefits to consumers,&#8221; he said.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/google-starts-controversial-form-crawling-program-2008-04/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Unvalidated Robots.Txt Risks Google Banishment</title>
		<link>http://www.webpronews.com/unvalidated-robots-txt-risks-google-banishment-2007-11</link>
		<comments>http://www.webpronews.com/unvalidated-robots-txt-risks-google-banishment-2007-11#comments</comments>
		<pubDate>Wed, 21 Nov 2007 11:53:35 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[Robots.txt]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=42095</guid>
		<description><![CDATA[The web crawling Googlebot may find a forgotten line in robots.txt that causes it to de-index a site from the search engine.
]]></description>
			<content:encoded><![CDATA[<p>The web crawling Googlebot may find a forgotten line in robots.txt that causes it to de-index a site from the search engine.<br />
<span id="more-42095"></span></p>
<table width="400" cellspacing="0" cellpadding="2" border="0">
<tr>
<td align="center"><img width="400" height="200" border="0" src="http://images.ientrymail.com/webpronews/article_pics/unvalidated_robots_risks_google_banishment.jpg" title="Unvalidated Robots.Txt Risks Google Banishment" alt="Unvalidated Robots.Txt Risks Google Banishment" class="irImage" /></td>
</tr>
<tr>
<td class="caption" style="padding-bottom: 10px; padding-left: 45px; padding-right: 45px;" align="right">Unvalidated Robots.Txt Risks Google Banishment</td>
</tr>
<tr>
<td class="caption" style="padding-bottom: 0px;" align="center"><img width="334" height="21" src="http://images.ientrymail.com/webpronews/salon/complete.gif" alt="" /></td>
</tr>
</table>
<p>Webmasters welcome being dropped out of Google about as much as they enjoy flossing with barbed wire. Making it easier for Google to do that would be anathema to being a webmaster. Why willingly exclude one&#8217;s site from Google?</p>
<p>
That could happen with an unvalidated robots.txt file. Robots.txt allows webmasters to provide standing instructions to visiting spiders, which contributes to having a site indexed faster and more accurately.</p>
<p>
Google has been <a href=http://www.webpronews.com/topnews/2007/07/13/some-new-tags-to-play-with>considering new syntax</a> to recognize within robots.txt. The <a href=http://sebastians-pamphlets.com/validate-your-robots-txt-or-google-might-deindex-your-site/>Sebastians-Pamphlets</a> blog said Google confirmed recognizing experimental syntax like Noindex in the robots.txt file.</p>
<p>
This poses a danger to webmasters who have not validated their robots.txt. A line reading <tt>Noindex: /</tt> could lead to one&#8217;s site being completely de-indexed.</p>
<p>
The surname-less Sebastian recommended Google&#8217;s <a href=https://www.google.com/webmasters/tools/robots?siteUrl=>robots.txt analyzer</a>, part of Google&#8217;s Webmaster Tools, and using only the <tt>Disallow</tt>, <tt>Allow</tt>, and <tt>Sitemap</tt> crawler directives in the Googlebot section of robots.txt.</p>
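<p>As a rough illustration of that advice, the sketch below scans a robots.txt file for any directive outside a short allow-list, so that a stray line such as <tt>Noindex: /</tt> gets flagged before a crawler acts on it. The function name and the allow-list are assumptions made for this example; it is not Google&#8217;s analyzer and does not replace it.</p>

```python
# Assumption: the only directives treated as safe here are the ones the
# article recommends, plus User-agent, which every record needs.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}


def find_unrecognized_directives(robots_txt):
    """Return (line_number, directive) pairs for directives outside KNOWN_DIRECTIVES."""
    flagged = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue  # blank line or something that is not a directive
        directive = line.split(":", 1)[0].strip().lower()
        if directive not in KNOWN_DIRECTIVES:
            flagged.append((number, directive))
    return flagged
```

<p>Running it over a file before upload gives a list of line numbers to review by hand. Whether a flagged directive is actually harmful still depends on how each search engine treats it.</p>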
<p>
<a href="http://twitter.com/dutter/">follow me on Twitter</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/unvalidated-robots-txt-risks-google-banishment-2007-11/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Avenue A / Razorfish Clarifies Crawler</title>
		<link>http://www.webpronews.com/avenue-a-razorfish-clarifies-crawler-2007-08</link>
		<comments>http://www.webpronews.com/avenue-a-razorfish-clarifies-crawler-2007-08#comments</comments>
		<pubDate>Wed, 01 Aug 2007 20:37:41 +0000</pubDate>
		<dc:creator>Mike Sachoff</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Audio]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Razorfish]]></category>
		<category><![CDATA[search engine optimization]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=39504</guid>
		<description><![CDATA[<p>Rob Aronson, Vice President of Search Engine Marketing for Avenue A <a title="Search Engine Optimization" href="http://www.avenuea-razorfish.com/">Razorfish</a>, spoke to WebProNews in an interview about the company&rsquo;s Super Intelligent Link Crawler (SiLC).</p>
]]></description>
			<content:encoded><![CDATA[<p>Rob Aronson, Vice President of Search Engine Marketing for Avenue A <a title="Search Engine Optimization" href="http://www.avenuea-razorfish.com/">Razorfish</a>, spoke to WebProNews in an interview about the company&rsquo;s Super Intelligent Link Crawler (SiLC).</p>
<p><span id="more-39504"></span></p>
<table width="400" cellspacing="0" cellpadding="5" background="http://images.ientrymail.com/webpronews/listen_audio.jpg" style="background-repeat: no-repeat; font-family: verdana, arial; font-size: 11px; border-bottom: solid 1px #000000;">
<tr>
<td><img src="http://images.ientrymail.com/webpronews/headphones.gif" align="absbottom"> <a href="http://videos.webpronews.com/video/2007/08/01/robaronson080107.mp3" style="color: #ff0000; text-decoration: none; font-size: 13px;" target="_blank"><b>Listen to the audio</b></a><br />I spoke to Rob Aronson, Vice President of Search Engine Marketing for Avenue A/ Razorfish about their proprietary tool Super Intelligent Link Crawler (SiLC) <a href="http://videos.webpronews.com/video/2007/08/01/robaronson080107.mp3">Listen to the audio here</a>.</td>
</tr>
</table>
<p>
He described the tool as a simulated spider, essentially a Web crawler, that helps the company identify how search engines will interact with its clients&rsquo; Web sites and also improves users&rsquo; experience on those sites.</p>
<p>The tool is proprietary and reserved for clients that want to optimize their sites for search results. It is not a standalone product for public use, and there is no public link to it.</p>
<p>When asked if it would be offered to the public, he said not at this point but that there was the potential for it to be made public in the future.</p>
<p>As for the cost to use the tool, it depends on the size of the site, how much work is being done and how much Web development needs to be done. Those factors determine a wide range of pricing.</p>
<p>He said the tool is aimed mainly at larger Web sites such as U.S. News &amp; World Report, and the larger the site, the more benefit the technology will have.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/avenue-a-razorfish-clarifies-crawler-2007-08/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Avenue A &#124; Razorfish Launches Link Crawler</title>
		<link>http://www.webpronews.com/avenue-a-razorfish-launches-link-crawler-2007-07</link>
		<comments>http://www.webpronews.com/avenue-a-razorfish-launches-link-crawler-2007-07#comments</comments>
		<pubDate>Tue, 31 Jul 2007 14:47:30 +0000</pubDate>
		<dc:creator>Mike Sachoff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Free]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Link]]></category>
		<category><![CDATA[Link Crawler]]></category>
		<category><![CDATA[links]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Razorfish]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Tool]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=39463</guid>
		<description><![CDATA[<p>Interactive marketing and technology company Avenue A &#124; Razorfish has launched Super-intelligent Link Crawler (SiLC), a tool that crawls Web sites to find errors such as broken links and &#34;404 Error Messages&#34; that inform users the Web page is not loading.</p>
]]></description>
			<content:encoded><![CDATA[<p>Interactive marketing and technology company Avenue A | Razorfish has launched Super-intelligent Link Crawler (SiLC), a tool that crawls Web sites to find errors such as broken links and &quot;404 Error Messages&quot; that inform users the Web page is not loading.</p>
<p><span id="more-39463"></span></p>
<table width="400" cellspacing="0" cellpadding="2" border="0">
<tr>
<td align="center"><img width="400" height="200" border="0" class="irImage" alt="Avenue A | Razorfish Launches Link Crawler" title="Avenue A | Razorfish Launches Link Crawler" src="http://images.ientrymail.com/webpronews/article_pics/avenue_a_razorfish_launches_link_crawler.jpg" /></td>
</tr>
<tr>
<td align="right" class="caption" style="padding-bottom: 10px; padding-left: 45px; padding-right: 45px;">Avenue A | Razorfish Launches Link Crawler</td>
</tr>
<tr>
<td align="center" class="caption" style="padding-bottom: 0px;"><img width="334" height="21" src="http://images.ientrymail.com/webpronews/salon/complete.gif" alt="" /></td>
</tr>
</table>
<p>The tool examines why errors occur and can measure a Web site&#8217;s performance against competing sites. The company has combined the tool with its search engine optimization and Web design to improve search rankings for brands like U.S. News &amp; World Report.</p>
<p>U.S. News &amp; World Report recently redesigned its Web site to improve traffic and search capability for the site. After using the SiLC tool, Avenue A | Razorfish found search engines were not ranking many of its Web pages because they were flagging pages as duplicate content.</p>
<p>U.S. News &amp; World <a title="Online Search" href="http://www.usnews.com/">Report</a> had published both a main version and a &quot;printer friendly&quot; version of a health article. Web crawlers were tracking both versions and ranking neither in search engine results. After the relaunch, organic visits increased 24 percent and organic visits from <a title="Google" href="http://labs.google.com/">Google</a> increased 45 percent.</p>
<p>William Flaiz, vice president, Search Engine Optimization (SEO) and Web Analytics for Avenue A | <a title="Search Engine Optimization" href="http://www.avenuea-razorfish.com/">Razorfish</a> said, &quot;SiLC is like a Swiss Army Knife that serves many purposes beyond web crawling.&quot;</p>
<p>&quot;It is different than other web crawlers because it can intelligently process why certain web pages aren&#8217;t performing well and integrate with other SEO tools to improve efficiencies not only within a web site, but to other linked Web sites, portals and blogs.&quot;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/avenue-a-razorfish-launches-link-crawler-2007-07/feed</wfw:commentRss>
		<slash:comments>31</slash:comments>
		</item>
		<item>
		<title>Wikia Acquires Grub from LookSmart</title>
		<link>http://www.webpronews.com/wikia-acquires-grub-from-looksmart-2007-07</link>
		<comments>http://www.webpronews.com/wikia-acquires-grub-from-looksmart-2007-07#comments</comments>
		<pubDate>Mon, 30 Jul 2007 16:24:48 +0000</pubDate>
		<dc:creator>Andy Beal</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Grub]]></category>
		<category><![CDATA[Link]]></category>
		<category><![CDATA[LookSmart]]></category>
		<category><![CDATA[Wikia]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=39441</guid>
		<description><![CDATA[<p>Wikipedia founder Jimmy Wales would like you to help him build&#160;the&#160;revenue&#160;for his new &#8220;for profit&#8221; venture Wikia. Wikia has <a title="Wikia has acquired the distributed crawler Grub" href="http://arstechnica.com/news.ars/post/20070730-wikia-acquires-grub-distributed-search-indexing-system.html">acquired the distributed crawler Grub</a> from LookSmart and Wales plans to make it open source. <a href="http://gigaom.com/2007/07/30/google-vs-jimmy-wales-and-open-source-search/">He&#8217;d like to invite the community to line his coffers</a>.</p>]]></description>
			<content:encoded><![CDATA[<p>Wikipedia founder Jimmy Wales would like you to help him build&nbsp;the&nbsp;revenue&nbsp;for his new &ldquo;for profit&rdquo; venture Wikia. Wikia has <a title="Wikia has acquired the distributed crawler Grub" href="http://arstechnica.com/news.ars/post/20070730-wikia-acquires-grub-distributed-search-indexing-system.html">acquired the distributed crawler Grub</a> from LookSmart and Wales plans to make it open source. <a href="http://gigaom.com/2007/07/30/google-vs-jimmy-wales-and-open-source-search/">He&rsquo;d like to invite the community to line his coffers</a>.</p>
<p>In case you need a better understanding of Wales&rsquo; plans, he explains his motives:</p>
<blockquote>
<p>&ldquo;Search is part of the fundamental infrastructure of the Internet. And, it is currently broken,&rdquo; Wales said back in December 2006, when Wikia launched Search Wikia effort. &ldquo;Why is it broken? It is broken for the same reason that proprietary software is always broken: lack of freedom, lack of community, lack of accountability, lack of transparency.&rdquo;</p>
</blockquote>
<p>Of course, Wales forgets to mention that Wikipedia wouldn&rsquo;t have anywhere near the success if its content didn&rsquo;t appear in the top 5 of Google for just about every generic keyword you can imagine. It also slipped his mind that he has a track record of not exactly giving back to those that helped make it a success &#8211; they&rsquo;ll happily &ldquo;assimilate&rdquo; your web content, <a href="http://www.marketingpilgrim.com/2007/01/wikipedia-links-no-longer-passing-pagerank.html">just don&rsquo;t expect a link back in the process</a>.</p>
<p>I wish Jimmy Wales the best of luck, but if&nbsp;he really wants to change the world, why not improve the non-profit Wikipedia? If&nbsp;his plans involve Wikia &#8211; which means&nbsp;he&rsquo;s in this for the money &#8211; spare us the sanctimonious spin.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/wikia-acquires-grub-from-looksmart-2007-07/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Name Is Crawl. Yahoo Crawl</title>
		<link>http://www.webpronews.com/the-name-is-crawl-yahoo-crawl-2007-06</link>
		<comments>http://www.webpronews.com/the-name-is-crawl-yahoo-crawl-2007-06#comments</comments>
		<pubDate>Tue, 05 Jun 2007 22:37:46 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Search Engine]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[Yahoo Slurp]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=38222</guid>
		<description><![CDATA[Yahoo Slurp, the portal company's search crawler, has been migrated to its new domain at crawl.yahoo.net and is coming to a web server near you.
]]></description>
			<content:encoded><![CDATA[<p>Yahoo Slurp, the portal company&#8217;s search crawler, has been migrated to its new domain at crawl.yahoo.net and is coming to a web server near you.<br />
<span id="more-38222"></span><br />
There&#8217;s a new crawler in town from Yahoo. Sort of. It&#8217;s the same crawler webmasters know and love already, but it will look different in their access logs, according to the <a href=http://www.ysearchblog.com/archives/000460.html>Yahoo Search</a> crew:</p>
<blockquote><p><i>As of today, the transition is complete and all machines crawling as Slurp are now in crawl.yahoo.net. You can see this change in your web server logs, where the page accesses from inktomisearch.com are being fully replaced by crawl.yahoo.net contacts. Note that this does not cover other Yahoo! crawlers, such Yahoo! China, and other verticals, like Yahoo! Shopping, Yahoo! Travel, etc., which have their own user-agent. </i></p></blockquote>
<p>Webmasters won&#8217;t have to tweak their robots.txt files or any IP-based filtering they perform, since the agent is still called Yahoo Slurp and the IP addresses originating the crawl will remain the same. </p>
<p>
However, security administrators will want to permit crawl.yahoo.net to access pertinent systems within their networks, or risk blocking the crawlers and incurring the screaming wrath of the webmaster.</p>
<p>
Priyank Garg and Yoram Arnon of Yahoo recommend webmasters set up reverse DNS-based authentication of the crawler for security reasons. This keeps rogue bots from masquerading as Yahoo Slurp.</p>
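<p>A minimal sketch of that reverse DNS check follows, assuming genuine Slurp hosts resolve to names under crawl.yahoo.net as described above. The function name and the injectable lookup parameters are hypothetical, added so the logic can be tried without network access; Yahoo&#8217;s own guidance should take precedence over this example.</p>

```python
import socket

# Assumption: genuine Slurp hosts have names ending in this suffix,
# as the article describes; adjust if Yahoo documents others.
TRUSTED_SUFFIX = ".crawl.yahoo.net"


def is_verified_slurp(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a trusted name that resolves back to ip."""
    # The lookups are injectable so the check can be exercised offline.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False  # no reverse (PTR) record for this address
    if not hostname.lower().rstrip(".").endswith(TRUSTED_SUFFIX):
        return False  # the name is outside the crawler domain
    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False  # the claimed name does not resolve
    # Forward confirmation: the name must map back to the original address,
    # otherwise anyone controlling their own PTR record could spoof it.
    return ip in addresses
```

<p>The forward lookup is the step that stops a rogue bot: reverse records are set by whoever controls the address block, so the claimed hostname only counts if it also resolves back to the visiting address.</p>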
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/the-name-is-crawl-yahoo-crawl-2007-06/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Webmaster Settles With IA, Goes After Teenager</title>
		<link>http://www.webpronews.com/webmaster-settles-with-ia-goes-after-teenager-2007-04</link>
		<comments>http://www.webpronews.com/webmaster-settles-with-ia-goes-after-teenager-2007-04#comments</comments>
		<pubDate>Wed, 25 Apr 2007 22:42:34 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Archive]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Internet Archive]]></category>
		<category><![CDATA[Jeff Veillette]]></category>
		<category><![CDATA[Reply]]></category>
		<category><![CDATA[Suzanne Shell]]></category>
		<category><![CDATA[Time]]></category>
		<category><![CDATA[WAS]]></category>
		<category><![CDATA[Wayback Machine]]></category>
		<category><![CDATA[Webmaster]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=37268</guid>
		<description><![CDATA[<p>One of the most bizarre Internet stories this year gets more bizarre. Internet Archive and Suzanne Shell have resolved their lawsuits against each other &#34;amicably,&#34; a word that follows words like &#34;theft&#34; and &#34;racketeering&#34; about as well as toothpaste follows orange juice. Next on her list: a 15-year-old Canadian jokester.&#160; <br />
]]></description>
			<content:encoded><![CDATA[<p>One of the most bizarre Internet stories this year gets more bizarre. Internet Archive and Suzanne Shell have resolved their lawsuits against each other &quot;amicably,&quot; a word that follows words like &quot;theft&quot; and &quot;racketeering&quot; about as well as toothpaste follows orange juice. Next on her list: a 15-year-old Canadian jokester.&nbsp; <br />
<span id="more-37268"></span> <br />
The case came to light last month when IA sought a court ruling that its WayBack Machine, which crawls and indexes copies of websites for historical purposes, wasn&#8217;t violating Shell&#8217;s copyrights. </p>
<p>Shell, who claimed IA&#8217;s web <a href="http://www.webpronews.com/topnews/2007/03/16/webmaster-claims-spider-entered-contract-in-suit">crawler entered into a contract</a> while indexing, responded (quite reasonably, I might add) with a countersuit accusing IA of conversion, civil theft, breach of contract, and violations of both federal racketeering and Colorado organized crime legislation. </p>
<p>It was an important case, not necessarily because of its overall merit (all but one claim was immediately dismissed), but because the judge would have to consider whether or not spiders could enter a contract, and the world was listening. </p>
<p>Seems the judge won&#8217;t be able to make that declaration any time soon, as IA and Shell seem to have kissed and made up, issuing a joint statement about the settlement. </p>
<p>&ldquo;Internet Archive has no interest in including materials in the Wayback Machine of persons who do not wish to have their Web content archived,&rdquo; said an IA spokesperson in a statement. </p>
<p>&ldquo;We recognize that Ms. Shell has a valid and enforceable copyright in her Web site and we regret that the inclusion of her Web site in the Wayback Machine resulted in this litigation. We are happy to have this case behind us.&rdquo;</p>
<p>Shell seems to have gotten over it, too. &ldquo;I respect the historical value of Internet Archive&rsquo;s goal,&rdquo; she said. &ldquo;I never intended to interfere with that goal nor cause it any harm.&rdquo; </p>
<p>That&#8217;s interesting&hellip;could have sworn she sued them for theft, racketeering, and organized criminal activity. But if both say they settled it amicably, then who am I to question? Maybe IA just didn&#8217;t want to give the judge a chance to rule that web crawlers could actually enter a contract.</p>
<p>The statement ends with this gem:</p>
<blockquote><p><em>Both parties sincerely regret any turmoil that the lawsuit may have caused for the other. Neither Internet Archive nor Ms. Shell condones any conduct which may have caused harm to either party arising out of the public attention to this lawsuit. The parties have not engaged in such conduct and request that the public response to the amicable resolution of this litigation be consistent with their wishes that no further harm or turmoil be caused to either party. </em></p></blockquote>
<p>The turmoil they&#8217;re talking about may include 15-year-old Canadian and Digg.com user <a href="http://www.jeffler.com/">Jeff Veillette</a>, who issued a kind of online challenge to Shell by <a href="http://www.webpronews.com/topnews/2007/03/19/internet-archive-v-shell-the-publicity-aftermath">framing her entire site</a>, daring her to take action against him. While brazen, Shell claims it went well beyond framing. </p>
<p>If you visit her website, profane-justice.org (careful, you&#8217;re entering into a contract with an anti-child protective services activist), Shell posts Veillette&#8217;s name, home address, emails, and phone number after accusing him of hacking her website and setting up a pop-up spam campaign, viewable by 25 million people and defaming her. Shell says he expressly admitted to the hacking and spam campaign.</p>
<p>She includes blogger <a href="http://thetruthistold.com/">Billy Wiseman</a> (a guy she really, really doesn&#8217;t like) in the accusation, listing his address as well. The FBI Computer Crime Center was contacted. </p>
<p>However, in an interview with WebProNews, hacking and spamming came as news to Veillette, who says his personal information listed is outdated (remember, at 15, he&#8217;s a minor&nbsp; &#8212; is it legal to post a minor&#8217;s information online? Maybe the FBI needs to give her a call). Jeff denies hacking her site or spamming anyone, but does admit to framing her site elsewhere, an act we reported when it happened. </p>
<p>As for Wiseman, Veillette says he had nothing to do with any of it. &quot;Billy Wiseman wasn&#8217;t part of the attacks,&quot; he said. &quot;He was a blogger that was watching the Shell case and interviewed me.&quot;</p>
<p>&quot;What happened was I put the site in a frame. How that&#8217;s hacking I&#8217;ll never know.&quot; </p>
<p>So Shell never lost control of her site? </p>
<p>&quot;Nope,&quot; said Veillette. &quot;She ran out of bandwidth for her hosting.&quot;</p>
<p>So, has the FBI been in touch? </p>
<p>&quot;Nope. They probably laughed at this, too. What I&#8217;m wondering overall, though, is if she&#8217;s for supporting kids, why is she going out of her way to try and ruin a 15-year-old&#8217;s life?&quot; </p>
<p>Good question, Jeff. I&#8217;m sure, though, she never meant any harm.&nbsp; </p>
<p>Financial terms of the settlement between Shell and Internet Archive were not disclosed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/webmaster-settles-with-ia-goes-after-teenager-2007-04/feed</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>The Search Engines Agree on an XML Sitemap Standard</title>
		<link>http://www.webpronews.com/the-search-engines-agree-on-an-xml-sitemap-standard-2006-11</link>
		<comments>http://www.webpronews.com/the-search-engines-agree-on-an-xml-sitemap-standard-2006-11#comments</comments>
		<pubDate>Thu, 16 Nov 2006 16:30:01 +0000</pubDate>
		<dc:creator>Thomas McMahon</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[sitemap]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=32936</guid>
		<description><![CDATA[In great news, Google, Yahoo and MSN have agreed upon an <a href="http://googleblog.blogspot.com/2006/11/search-engines-united.html" class="bluelink">XML sitemap standard</a>.
]]></description>
			<content:encoded><![CDATA[<p>In great news, Google, Yahoo and MSN have agreed upon an <a href="http://googleblog.blogspot.com/2006/11/search-engines-united.html" class="bluelink">XML sitemap standard</a>.</p>
<p><img src="http://images.ientrymail.com/webpronews/gym1116.jpg"></p>
<p>What&#8217;s this mean to you? </p>
<p>Well, you just need to create one XML sitemap and you can submit it to all three engines. </p>
<p>I&#8217;ve been doing that for a while and didn&#8217;t seem to have any problem but now it&#8217;s official.</p>
<p>Why should you create an XML sitemap? An XML sitemap outlines all the URLs on a site so that search engines can find them all easily. </p>
<p>Much like a normal sitemap, it lists all the pages. However, an XML sitemap is optimized for search engines, not for people. By giving search engines a list of URLs on your site, you are increasing the number of pages they know about and can refer traffic to.</p>
<p>If you posted once a day for a year, you&#8217;d have 365 post pages. That doesn&#8217;t include categories, archives and any other blog-related pages. </p>
<p>Since the numbers add up so quickly, it&#8217;s hard for a search engine to keep up and even harder for a user to keep them organized so that an engine can find them all easily. </p>
<p>An XML sitemap gives engines one file full of all the URLs on your site, optimized just for them.</p>
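<p>To make the idea concrete, here is a minimal sketch of such a file being generated with Python and its standard library. It is not taken from any of the tools listed in this post. The <code>build_sitemap</code> helper and the example.com URLs are made up for illustration, and the element names and namespace are assumed to follow the sitemaps.org 0.9 protocol.</p>

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document, as a string, listing the given URLs.

    Hypothetical helper for illustration only: one url/loc entry per page,
    with no optional fields such as lastmod or priority.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters in the text on output.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder pages; a real blog would list its posts, categories and archives.
    pages = [
        "http://example.com/",
        "http://example.com/2006/11/first-post/",
        "http://example.com/category/search/",
    ]
    print(build_sitemap(pages))
```

<p>The resulting text would be saved as a file on your site and submitted to each engine. The tools listed next do the same job and also discover the URLs for you.</p>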
<p>So, how do you create an XML sitemap? There are a variety of ways, but here are my top three choices:</p>
<blockquote><p><a href="http://www.arnebrachhold.de/2005/06/05/google-sitemaps-generator-v2-final" class="bluelink">1. Google Sitemap Generator Plugin</a> for WordPress. Once installed, it automatically updates and pings Google when you have new content. Since all engines now use the same XML formatting, the name is a little outdated. It should be updated to Search Engine Sitemap Generator. </p>
<p><a href="http://gsitecrawler.com/" class="bluelink">2. GSite Crawler.</a> This is a free desktop application for PCs that will crawl your site and create the XML sitemap for you to upload. It doesn&#8217;t update automatically, but it is easy to use and you could regenerate the XML sitemap every month or so. If anyone knows of a free Mac app like this, I&#8217;d love to hear about it. </p>
<p><a href="http://www.xml-sitemaps.com/" class="bluelink">3. XML-Sitemaps.com</a>. This is a web-based XML sitemap creation tool that is free for small sites. Again, it&#8217;s not automatic, but it&#8217;s free and easy to use. Plus, it works on any platform. </p></blockquote>
<p>If you have a blog, you should be creating an XML sitemap. Even if you do it only once every six months, it&#8217;s better than none at all.</p>
<p>Do you use a different tool for creating XML sitemaps? If so, do share. <img src='http://www.webpronews.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Thomas McMahon is an SEO Designer for <a href="http://toprankresults.com/">TopRank Online Marketing</a> in Minneapolis, MN. His specialties include technical optimization of existing web sites, creating search engine friendly web designs, and blog optimization. He has also created a number of blog marketing tools, WordPress plug-ins and Firefox add-ons.</p>
<p>Blog: http://bloggerdesign.com</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/the-search-engines-agree-on-an-xml-sitemap-standard-2006-11/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Yahoo Unleashes New Search Crawler</title>
		<link>http://www.webpronews.com/yahoo-unleashes-new-search-crawler-2006-07</link>
		<comments>http://www.webpronews.com/yahoo-unleashes-new-search-crawler-2006-07#comments</comments>
		<pubDate>Fri, 28 Jul 2006 18:56:49 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[Crawler]]></category>
		<category><![CDATA[Delicious]]></category>
		<category><![CDATA[Digg]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[Yahoo Search]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=30618</guid>
		<description><![CDATA[Yahoo Slurp may be showing up heavily in access logs around the web, as the company gave the new spider a workout alongside the old spider before retiring the previous Slurp.
]]></description>
			<content:encoded><![CDATA[<p>Yahoo Slurp may be showing up heavily in access logs around the web, as the company gave the new spider a workout alongside the old spider before retiring the previous Slurp.</p>
<table width="400" border="0" cellpadding="2" cellspacing="0">
<tr>
<td align="center"><img src="http://images1.ientrymail.com/webpronews/072806YahooCrawler.gif" alt="Yahoo Unleashes New Search Crawler" width="400" height="200" border="0" class="irImage" title="Yahoo Unleashes New Search Crawler"></td>
</tr>
<tr>
<td align="right" class="caption" style="padding-bottom: 10px; padding-left: 45px; padding-right: 45px;">Yahoo Unleashes New Search Crawler</td>
</tr>
</table>
<p>Yahoo spent a few weeks making sure its new version of <a href=http://help.yahoo.com/help/us/ysearch/slurp/ class=bluelink>Slurp</a> works properly, which explains why site owners have probably seen many more Yahoo requests in their logs.</p>
<p>In a Weather Report post on the <a href=http://www.ysearchblog.com/archives/000334.html class=bluelink>Yahoo! Search blog</a>, David Simpson and Priyank Garg discussed the switch to the new Slurp crawler:</p>
<div style="margin-left:10px; margin-right:10px">
<p><i>In addition to crawling the Internet faster, our new crawler is more efficient at visiting websites. As a result, site owners will notice as much as a 25% reduction in the number of requests and bandwidth consumed by the crawler.</i></p>
<p><i>Now, with the new crawler in full production, we have turned off the old crawler and site owners should see a much lower crawl load without a loss in content coverage.</i></p>
</div>
<p>As always, rankings may have shifted as pages were shuffled around the index, so publishers will want to check where they now show up in Yahoo! Search.</p>
<p>The efficiency gains reported by Yahoo should be welcomed by publishers when their next bandwidth bills arrive.</p>
<p>David Utter is a staff writer for WebProNews covering technology and business. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/yahoo-unleashes-new-search-crawler-2006-07/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

