<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>WebProNews &#187; Spidering</title>
	<atom:link href="http://www.webpronews.com/tag/spidering/feed" rel="self" type="application/rss+xml" />
	<link>http://www.webpronews.com</link>
	<description>Breaking News in Tech, Search, Social, &#38; Business</description>
	<lastBuildDate>Mon, 13 Feb 2012 02:05:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Google Explains Meta Tags</title>
		<link>http://www.webpronews.com/google-explains-meta-tags-2007-12</link>
		<comments>http://www.webpronews.com/google-explains-meta-tags-2007-12#comments</comments>
		<pubDate>Wed, 05 Dec 2007 17:33:45 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[Central]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[meta tags]]></category>
		<category><![CDATA[Robots.txt]]></category>
		<category><![CDATA[sitemaps]]></category>
		<category><![CDATA[Spidering]]></category>
		<category><![CDATA[Tags]]></category>
		<category><![CDATA[Webmaster]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=42427</guid>
		<description><![CDATA[<p>Search engines pay attention to some tags, and none to others. Remember when the 'keywords' meta tag mattered? Ah, the good old days.</p>
]]></description>
			<content:encoded><![CDATA[<p>Search engines pay attention to some tags, and none to others. Remember when the &#8216;keywords&#8217; meta tag mattered? Ah, the good old days.</p>
<p><span id="more-42427"></span></p>
<p>John Mueller placed a useful post from Zurich on the <a href="http://googlewebmastercentral.blogspot.com/2007/12/answering-more-popular-picks-meta-tags.html">Webmaster Central blog</a> at Google, where he delves into the issue of meta tags. Back in the day, meta tags like &#8216;keywords&#8217; helped webmasters get their sites indexed appropriately.</p>
<p>It took next to no time for spammers to start clogging the &#8216;keywords&#8217; tag to the point where it became useless. As Mueller reminded everyone in an answer to a comment, Google isn&#8217;t looking at it for indexing purposes:</p>
<blockquote><p><em>(W)e generally ignore the contents of the &quot;keywords&quot; meta tag. As with other possible meta tags, feel free to place it on your pages if you can use it for other purposes &#8211; it won&#8217;t count against you.</em></p></blockquote>
<p>Plenty of tags do work favorably for pages, as do <a href="http://www.sitemaps.org/">Sitemaps</a>, which enjoy support from Google, Yahoo, and Microsoft. Meta tags can control the behavior of all robots through meta name=robots, or direct a single crawler such as Google&#8217;s Googlebot through meta name=googlebot:</p>
<blockquote><p><em>Google understands the following values (when specifying multiple values, separate them with a comma):</em></p>
<blockquote>
<p><tt>noindex: prevents the page from being indexed (see &quot;Block or remove pages using meta tags&quot;)</tt></p>
<p><tt>nofollow: don't follow links from this page when looking for new pages to crawl (also see &quot;Block or remove pages using meta tags&quot;)</tt></p>
<p><tt>nosnippet: don't show a snippet of this page when displaying it in the search results (see &quot;Prevent or remove snippets&quot;)</tt></p>
<p><tt>noodp: don't use text from ODP (The Open Directory Project a.k.a. dmoz.org) to generate a title or snippet for this page (see &quot;How do I change my site's title and description?&quot;)</tt></p>
<p><tt>noarchive: don't display a &quot;Cached&quot; link for this page in the search results (see &quot;Prevent or remove cached pages&quot;)</tt></p>
<p><tt>unavailable_after:[date]: remove this page from the search results after the specified date and time (see &quot;Robots Exclusion Protocol: now with even more flexibility&quot;)</tt></p>
</blockquote>
</blockquote>
<p>Mueller noted the default rule is &quot;index, follow&quot; when this meta tag is not in place on a page, or if it is not targeted to the visiting spider when it arrives.</p>
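<p>As a rough illustration of the rules described above, the following Python sketch collects the comma-separated values from the robots and googlebot meta tags and falls back to &quot;index, follow&quot; when no applicable tag is present. The function name and the dictionary input are assumptions made for this example; it is not how Googlebot itself is implemented, and it does not attempt to resolve conflicting values.</p>

```python
def effective_directives(meta_tags, crawler="googlebot"):
    """Return the set of directives that apply to `crawler`.

    `meta_tags` maps a meta name (for example "robots" or "googlebot")
    to its content string, as already extracted from the page head.
    This helper is hypothetical and only mirrors the rules quoted above.
    """
    directives = set()
    for name, content in meta_tags.items():
        # Only the generic "robots" tag and the crawler-specific tag apply.
        if name.lower() not in ("robots", crawler):
            continue
        # Multiple values are separated by commas.
        for value in content.split(","):
            value = value.strip().lower()
            if value:
                directives.add(value)
    # No applicable tag on the page: the default rule is "index, follow".
    if not directives:
        return {"index", "follow"}
    return directives


print(effective_directives({"robots": "noindex, nofollow"}))
```

<p>A tag aimed at a different crawler is ignored, so a page carrying only such a tag is treated as &quot;index, follow&quot; for Googlebot.</p>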
<p><a href="http://twitter.com/dutter/">follow me on Twitter</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/google-explains-meta-tags-2007-12/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Senate Considers Better Spidering As Law</title>
		<link>http://www.webpronews.com/senate-considers-better-spidering-as-law-2007-11</link>
		<comments>http://www.webpronews.com/senate-considers-better-spidering-as-law-2007-11#comments</comments>
		<pubDate>Wed, 14 Nov 2007 14:08:21 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[law]]></category>
		<category><![CDATA[Policy]]></category>
		<category><![CDATA[Senate]]></category>
		<category><![CDATA[Spidering]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=41893</guid>
		<description><![CDATA[Federal agencies may have the Office of Management and Budget (OMB) helping guide their efforts to make web information more accessible to search engines.
]]></description>
			<content:encoded><![CDATA[<p>Federal agencies may have the Office of Management and Budget (OMB) helping guide their efforts to make web information more accessible to search engines.<br />
<span id="more-41893"></span><br />
A Senate Committee reviewing the E-Government Act of 2002 may end up recommending the Sitemap Protocol to federal agencies. Improving the availability of online government information means making it more available to the place where people tend to look for it: search engines.</p>
<p>
The <a href="http://googlepublicpolicy.blogspot.com/2007/11/senate-helping-make-govt-more.html">Google Public Policy blog</a> noted that the Senate Homeland Security and Governmental Affairs Committee will consider S. 2321, a reauthorization of the <a href="http://thomas.loc.gov/cgi-bin/query/z?c110:S.2321:">E-Government Act</a>. This extension will have OMB draw up guidance and best practices for agencies to enable search crawlers to index their websites effectively.</p>
<p>
Google recommended Sitemaps as an easy way for agencies to comply with the Act&#8217;s requirements. The search company has helped several federal and state agencies, and the Library of Congress, implement Sitemaps to improve their sites&#8217; search visibility.</p>
<p>
Website architecture at government sites could be hindering people from finding the information they need, in Google&#8217;s view. Google estimates about half of government websites suffer from these architecture issues, and Sitemaps can help alleviate the problem.</p>
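<p>For readers unfamiliar with the Sitemap Protocol, the following Python sketch shows roughly what producing a minimal Sitemap involves: a urlset element containing one url entry per page, each with a loc and an optional lastmod. The function name and the example.gov addresses are placeholders invented for this illustration, not anything taken from the bill or from Google&#8217;s recommendation.</p>

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org), version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap document as a string.

    `urls` is an iterable of (loc, lastmod) pairs; lastmod may be None.
    The function name and input shape are assumptions for this sketch.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical agency pages, used only to demonstrate the output.
print(build_sitemap([
    ("http://www.example.gov/", "2007-11-14"),
    ("http://www.example.gov/reports", None),
]))
```

<p>The resulting file lists pages explicitly, so a crawler can find them even when the site&#8217;s own navigation does not lead there.</p>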
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/senate-considers-better-spidering-as-law-2007-11/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paid Search Engine Inclusion Programs &#8211; The Details (1)  Inktomi + AltaVista</title>
		<link>http://www.webpronews.com/paid-search-engine-inclusion-programs-the-details-inktomi-altavista-2004-03</link>
		<comments>http://www.webpronews.com/paid-search-engine-inclusion-programs-the-details-inktomi-altavista-2004-03#comments</comments>
		<pubDate>Mon, 15 Mar 2004 14:02:34 +0000</pubDate>
		<dc:creator>Dirk Brockhausen</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Engine]]></category>
		<category><![CDATA[Spider]]></category>
		<category><![CDATA[Spidering]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=9233</guid>
		<description><![CDATA[Following our overview of current paid search engine inclusion (PFI) programs, we will now take a closer look at the technical procedures involved, spidering activity, quality of service, etc. This first instalment deals with Inktomi and AltaVista; more will follow in a later article.
]]></description>
			<content:encoded><![CDATA[<p>Following our overview of current paid search engine inclusion (PFI) programs, we will now take a closer look at the technical procedures involved, spidering activity, quality of service, etc. This first instalment deals with Inktomi and AltaVista; more will follow in a later article.</p>
<p><b>Inktomi</b></p>
<p>Inktomi&#8217;s current submission partners are: ineedhits.com, Outrider, Position Technologies and VeriSign.</p>
<p>Outrider is PositionTech&#8217;s London-based partner. The two companies are connected through Detlef Johnson, formerly Supervisor of Search Engine Optimization Services at Outrider and now VP of Technology at PositionTech.</p>
<p> We registered new domains with all four partners and tracked the ensuing spider visits.</p>
<p> <b>ineedhits.com</b></p>
<p>The Inktomi spider arrived the day following page submission:</p>
<p>#UA Mozilla/3.0 (Slurp/si; slurp@inktomi.com;<br />
 <a href="http://www.inktomi.com/slurp.html">http://www.inktomi.com/slurp.html</a>)<br />
j6002.inktomi.com<br />
209.185.141.249</p>
<p>Normally, this spider will crawl your pages every second day. However, we detected exceptions to the rule: following the turn of the month, for example, the spider took 4 days to come back.</p>
<p> Beyond the pages registered for paid inclusion, the domain URL (e.g. http://www.domain.com/ ) was spidered.</p>
<p> This will normally occur every 5 days, with the site&#8217;s robots.txt file being crawled as well.</p>
<p> However, the domain URL hasn&#8217;t to date been included in the Inktomi index.</p>
<p> About two weeks following paid submission, this Inktomi spider started to fetch the pages at irregular intervals:</p>
<p>#UA Mozilla/5.0 (Slurp/cat; slurp@inktomi.com;<br />
 <a href="http://www.inktomi.com/slurp.html">http://www.inktomi.com/slurp.html</a>)<br />
wm3018.inktomi.com<br />
216.35.116.103</p>
<p><b>Beyond Spidering</b></p>
<p>Spider visits apart, what&#8217;s important for the submitting webmaster is, of course, the PFI partners&#8217; Management Area or control center.</p>
<p> In ineedhits.com&#8217;s &#8220;Subscription Management Area&#8221; you will see your URLs&#8217; current status.</p>
<p> The &#8220;Click-thru hits report&#8221; displays the number of hits and the search terms which generated them. You can click on the links to either AOL, HotBot or MSN to see the pertinent search results page. According to ineedhits.com: &#8220;Click-Thru reports currently only reflect a small percentage of the entire click-thrus Inktomi Search Partners refer to your URL.&#8221;</p>
<p> This is no exaggeration: in our case, only about 75% of hits and keywords actually generated were displayed in this report. Obviously, this calls for some serious improvement.</p>
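<p>A comparison of this kind can be sketched in a few lines of Python: count the hits in your own access log whose referrer points to one of the partner search sites, then divide the number shown in the click-thru report by that count. The function name and the referrer host names below are assumptions chosen for illustration; the hosts that actually appear in your logs may differ.</p>

```python
from urllib.parse import urlparse

# Hypothetical referrer domains for the partner sites named above
# (AOL, HotBot, MSN); check your own logs for the real host names.
PARTNER_HOSTS = ("aol.com", "hotbot.com", "msn.com")


def report_coverage(referrers, reported_clicks, partner_hosts=PARTNER_HOSTS):
    """Return reported clicks as a fraction of clicks found in the log.

    `referrers` is an iterable of referrer URLs taken from the access
    log. Returns None when the log contains no partner referrals.
    """
    actual = 0
    for referrer in referrers:
        host = urlparse(referrer).hostname or ""
        # Match the partner domain itself or any of its subdomains.
        if any(host == p or host.endswith("." + p) for p in partner_hosts):
            actual += 1
    if actual == 0:
        return None
    return reported_clicks / actual


# Made-up sample data: four partner referrals in the log, three reported.
sample = [
    "http://search.aol.com/results?q=widgets",
    "http://www.hotbot.com/?q=widgets",
    "http://search.msn.com/results?q=widgets",
    "http://example.com/links.html",
    "http://search.msn.com/results?q=gadgets",
]
print(report_coverage(sample, reported_clicks=3))
```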
<p> <b>Outrider</b></p>
<p> About 1 to 2 hours after submission, our site was visited by the PositionTech spider:</p>
<p>#UA Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)<br />
169.207.238.180</p>
<p>It was a one-hit affair only: we haven&#8217;t seen this spider visit our site again since.</p>
<p> Another two hours later, the Inktomi crawling cycle commenced with a visit from this spider:</p>
<p>#UA Mozilla/3.0 (Slurp/si; slurp@inktomi.com;<br />
 <a href="http://www.inktomi.com/slurp.html">http://www.inktomi.com/slurp.html</a>)<br />
j6002.inktomi.com<br />
209.185.141.249</p>
<p>Following this, the spider started visiting the site in the same rhythm as with ineedhits.com.</p>
<p> <b>Beyond Spidering</b></p>
<p> In Outrider&#8217;s &#8220;Account Service&#8221; section you will see your URLs&#8217; current status and the number of generated clicks. According to Outrider: &#8220;Click-Thru reports reflect less then half of all traffic generated by Inktomi.&#8221;</p>
<p> This can be confirmed on our part: on average, less than 50% of generated clicks were reported.</p>
<p> <b>PositionTech</b></p>
<p> The PositionTech spider arrived one hour after submission:</p>
<p>#UA Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)<br />
169.207.238.180</p>
<p>Another hour later, the Inktomi spider cropped up:</p>
<p>#UA Mozilla/3.0 (Slurp/si; slurp@inktomi.com;<br />
 <a href="http://www.inktomi.com/slurp.html">http://www.inktomi.com/slurp.html</a>)<br />
j6002.inktomi.com<br />
209.185.141.249</p>
<p>Ensuing spider behavior was identical to submissions via ineedhits.com and Outrider.</p>
<p> <b>Beyond Spidering</b></p>
<p> In PositionTech&#8217;s subscribers area you will see your URLs&#8217; current status and the number of generated clicks as well as the keywords entered by visitors. In our case, only 40% of actually generated hits and keywords were displayed.</p>
<p> Here, there is no pointer to the fact that not all clicks will be correctly displayed. This calls for improvement.</p>
<p> Click-thru data can also be checked in a graphical view.</p>
<p> A very nifty feature exclusively offered by PositionTech is their &#8220;Pure Web Search&#8221; function. Under URL < http://169.207.238.189/search/ > a search form for pure Inktomi is offered. Search results are uncluttered by &#8220;enhancements&#8221; such as Overture PPC links, ODP stuff, ads, etc. This URL is accessible to the general public (at least, currently).</p>
<p><b> VeriSign</b></p>
<p> One day after submission, the Inktomi spider arrived:</p>
<p>#UA Mozilla/3.0 (Slurp/si; slurp@inktomi.com;<br />
 <a href="http://www.inktomi.com/slurp.html">http://www.inktomi.com/slurp.html</a>)<br />
j6002.inktomi.com<br />
209.185.141.249</p>
<p>Ensuing spider behavior was identical to submissions via the other partners.</p>
<p> <b>Beyond Spidering</b></p>
<p> In VeriSign&#8217;s subscribers area you will see your URLs&#8217; current status and the number of generated clicks as well as the keywords entered by visitors. However, this did not work in our case. Although our access log files registered several hits, VeriSign did not display so much as a single one of them.</p>
<p><b> Inktomi PFI Summary</b></p>
<p> * Pricing is identical, regardless of submission partner.</p>
<p> * Submitting your pages to any of the partners will trigger the Inktomi spider, which will then crawl your pages following its own rhythm, independent of chosen submission partner. Partners do seem to hold a certain influence on commencement of the spidering cycle, though.</p>
<p> * With PositionTech and Outrider it took a mere 1-2 hours for the first Inktomi spider to arrive. They are also the only submission partners checking sites with their own (shared) spider.</p>
<p> * Inktomi seems to have dedicated a single spider to crawling pages under the PFI program.</p>
<p> * Crawled pages will normally show up in the Inktomi index about one day after spidering.</p>
<p> * PositionTech&#8217;s subscribers area control center offers the most comprehensive functionality, making it the superior service.</p>
<p> However, seeing that no submission partner currently reports 100% of generated hits, a systematic analysis of your own server traffic logs cannot be dispensed with.</p>
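<p>As one possible starting point for such an analysis, the Python sketch below groups access log hits by the spider markers quoted in this article (Slurp/si and Slurp/cat) and reports the number of days between consecutive visits. It assumes the log has already been parsed into timestamp and user-agent pairs; the function name and the sample dates are invented for the example.</p>

```python
from collections import defaultdict
from datetime import datetime


def visit_gaps(hits, markers=("Slurp/si", "Slurp/cat")):
    """Return, per spider marker, the gaps in days between visits.

    `hits` is an iterable of (timestamp, user_agent) pairs. A hit is
    attributed to the first marker found in its user-agent string;
    hits matching no marker are ignored.
    """
    visits = defaultdict(list)
    for timestamp, user_agent in hits:
        for marker in markers:
            if marker in user_agent:
                visits[marker].append(timestamp)
                break
    gaps = {}
    for marker, stamps in visits.items():
        stamps.sort()
        gaps[marker] = [
            (later - earlier).days
            for earlier, later in zip(stamps, stamps[1:])
        ]
    return gaps


# Made-up sample hits; the user-agent string is the one quoted above.
SLURP_SI = ("Mozilla/3.0 (Slurp/si; slurp@inktomi.com; "
            "http://www.inktomi.com/slurp.html)")
sample_hits = [
    (datetime(2004, 3, 1, 6, 0), SLURP_SI),
    (datetime(2004, 3, 3, 6, 0), SLURP_SI),
    (datetime(2004, 3, 2, 9, 30), "Mozilla/4.0 (compatible; MSIE 6.0)"),
    (datetime(2004, 3, 7, 6, 0), SLURP_SI),
]
print(visit_gaps(sample_hits))
```

<p>Further markers can be passed through the markers parameter to track other crawlers in the same way.</p>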
<p><b>AltaVista&#8217;s infoSpider</b></p>
<p> Paid inclusion for AltaVista is handled by their independent partner infoSpider.</p>
<p> Shortly after submission, a spider will check the submitted URL:</p>
<p>#UA infoSpider URL Checker<br />
208.185.243.149.proboost.com<br />
208.185.243.149</p>
<p>Four days later, the first AltaVista spider arrived on site:</p>
<p>#UA Scooter-3.0.3<br />
scooter2.sv.av.com<br />
209.73.162.172</p>
<p>Pages were included in AltaVista&#8217;s index the following day.</p>
<p> The spidering cycle is determined by AltaVista&#8217;s index runs for paid inclusions, which take place about once per week.</p>
<p> The AltaVista spider(s) will visit submitted pages one, two or three days prior to this index run. There is currently no regular, predictable rhythm discernible as with Inktomi.</p>
<p> AltaVista has devoted at least one other spider to crawling PFI pages:</p>
<p>#UA Scooter-3.2<br />
scooter3.sv.av.com<br />
209.73.162.143	</p>
<p><b>Beyond Spidering</b></p>
<p>In infoSpider&#8217;s Service section, you can check on your URLs&#8217; current status. No hit or keyword data is offered.</p>
<p> Compared to Inktomi&#8217;s PFI program, AltaVista&#8217;s refresh cycles are considerably longer, and the service area offered features only very basic functionality.</p>
<p>Dirk Brockhausen is the co-founder and principal of<br />
fantomaster.com GmbH (Belgium), < http://fantomaster.com/ >,<br />
a company specializing in webmasters software development,<br />
industrial-strength cloaking and search engine positioning<br />
services.</p>
<p>He holds a doctorate in physics and has worked as an SAP<br />
consultant and software developer since 1994. He is<br />
also Technical Editor of fantomNews, a free newsletter<br />
focusing on search engine optimization, available at:<br />
< <a href="http://Fantomaster.com/fantomnews-sub.html">http://Fantomaster.com/fantomnews-sub.html</a> ><br />
You can contact him at<br />
<a href="mailto:fntecheditor@fantomaster.com">mailto:fntecheditor@fantomaster.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/paid-search-engine-inclusion-programs-the-details-inktomi-altavista-2004-03/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

