<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>WebProNews &#187; Crawling</title>
	<atom:link href="http://www.webpronews.com/tag/crawling/feed" rel="self" type="application/rss+xml" />
	<link>http://www.webpronews.com</link>
	<description>Breaking News in Tech, Search, Social, &#38; Business</description>
	<lastBuildDate>Mon, 13 Feb 2012 04:32:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Google Rolls Out Breadcrumb Display in SERPs</title>
		<link>http://www.webpronews.com/google-testing-breadcrumb-display-in-serps-2009-12</link>
		<comments>http://www.webpronews.com/google-testing-breadcrumb-display-in-serps-2009-12#comments</comments>
		<pubDate>Wed, 23 Dec 2009 16:13:32 +0000</pubDate>
		<dc:creator>Chris Crum</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[breadcrumbs]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[search engine optimization]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Site Architecture]]></category>
		<category><![CDATA[usability]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=52518</guid>
		<description><![CDATA[<p><strong>Update:&#160;</strong>Google announced today that it will now be rolling out the use of breadcrumbs in search results on a global basis. They will only be used in place of some URLs - mainly the ones that don't give the added context of a link the way that the breadcrumbs do. Google <a href="http://googleblog.blogspot.com/2009/11/new-site-hierarchies-display-in-search.html">says</a>:<br />
]]></description>
			<content:encoded><![CDATA[<p><strong>Update:&nbsp;</strong>Google announced today that it will now be rolling out the use of breadcrumbs in search results on a global basis. They will only be used in place of some URLs &#8211; mainly the ones that don&#8217;t give the added context of a link the way that the breadcrumbs do. Google <a href="http://googleblog.blogspot.com/2009/11/new-site-hierarchies-display-in-search.html">says</a>:</p>
<p><em>Some web addresses help you understand the structure of the site and how the specific page fits into the site hierarchy. For example, consider a </em><a href="http://www.google.com/search?q=vint+cerf+biography"><em>search</em></a><em> for the biography of Vint Cerf (Google&#8217;s Internet Evangelist). The URL for one result, &quot;www.google.com/corporate/execs.html,&quot; shows that the page is located in a page about &quot;execs,&quot; under &quot;corporate,&quot; which is on the &quot;google.com&quot; site. This can provide valuable context when deciding whether to click on the result.</em></p>
<p><em>Often, however, URLs are too long, too short, or too obscure to add useful information.</em></p>
<p>That&#8217;s where the breadcrumbs come in. Google says the feature should be available globally within the next few days.</p>
<p><strong>Original Article:</strong>&nbsp;Google appears to be testing breadcrumbs in some search results, at least in some areas. If you are unfamiliar with the term <a href="http://en.wikipedia.org/wiki/Breadcrumb_%28navigation%29">breadcrumbs</a>, it refers to the hierarchical display commonly used in site navigation. For example: <strong>Home Page&gt;Product Page&gt;Product A Page</strong>. </p>
<p>Several bloggers have noticed Google displaying these types of breadcrumbs in various places in seemingly random results to some queries. For example, Rob Hammond <a href="http://robbiehammond.com/google-using-breadcrumbs-in-serps">provides</a> the following screen shot:</p>
<p><center><a href="http://robbiehammond.com/google-using-breadcrumbs-in-serps"><img alt="Breadcrumbs in Google Search Results" title="Breadcrumbs in Google Search Results" src="http://images.ientrymail.com/webpronews/article_pics/google-breadcrumbs1.jpg" /></a></center></p>
<p>Leo Fogarty <a href="http://www.leofogarty.com/google/google-using-breadcrumbs-in-the-serps.html">provides</a> another, which shows the breadcrumbs displayed in a different position within the search result:</p>
<p><center><a href="http://www.leofogarty.com/google/google-using-breadcrumbs-in-the-serps.html"><img alt="Breadcrumbs in Google Search Results" title="Breadcrumbs in Google Search Results" src="http://images.ientrymail.com/webpronews/article_pics/google-breadcrumbs2.jpg" /></a></center></p>
<p>Google&#8217;s use of breadcrumbs appears to be only a test, and a limited one at that. Google has repeatedly stressed the value of good site architecture, which allows it to crawl sites more easily and quickly.</p>
<p>Bing acknowledges this too. Rick DeJarnette of Bing Webmaster Center recently said, &quot;You can have great content and a plethora of high quality inbound links from authority sites, but if your site&rsquo;s structure is flawed or broken, then it will still not achieve the optimal page rank you desire from search engines.&quot;</p>
<p>Here are some <a href="http://www.webpronews.com/topnews/2009/07/02/google-and-bing-tips-for-site-architecture-issues">tips from both Google and Bing</a> regarding site architecture issues. In addition, Google recently provided <a href="http://www.webpronews.com/topnews/2009/08/10/tips-for-getting-crawled-faster-by-google">this related information</a> on getting your site crawled faster. </p>
<p>If Google begins incorporating the breadcrumbs display as in the above tests, on a mainstream level, that will be all the more reason to clean your site architecture up, at least in the navigation area. Site architecture certainly <a href="http://en.wikipedia.org/wiki/Website_architecture">goes beyond this</a>, but it is a key part of usability anyway.</p>
<p><em><strong>Have you seen breadcrumbs show up in Google results?&nbsp;What do you think about the idea? <a href="http://www.webpronews.com/node/51472/talk">Share your thoughts</a>.</strong></em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/google-testing-breadcrumb-display-in-serps-2009-12/feed</wfw:commentRss>
		<slash:comments>66</slash:comments>
		</item>
		<item>
		<title>Google to Start Crawling Google Docs Documents</title>
		<link>http://www.webpronews.com/google-to-start-crawling-google-docs-documents-2009-09</link>
		<comments>http://www.webpronews.com/google-to-start-crawling-google-docs-documents-2009-09#comments</comments>
		<pubDate>Mon, 21 Sep 2009 19:46:23 +0000</pubDate>
		<dc:creator>Chris Crum</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Apps]]></category>
		<category><![CDATA[Google Docs]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[search engines]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=51490</guid>
		<description><![CDATA[<p>Google has quietly <a href="http://www.google.com/support/forum/p/Google+Docs/thread?tid=0ca72389c9b26ef4&#38;hl=en">announced</a> that Google Docs documents that are published will soon be crawlable. This means if you have published documents as web pages, or used the publish/embed option for a document, and it has been linked to on the web, it can be indexed by Google and other search engines. <br />
]]></description>
			<content:encoded><![CDATA[<p>Google has quietly <a href="http://www.google.com/support/forum/p/Google+Docs/thread?tid=0ca72389c9b26ef4&amp;hl=en">announced</a> that Google Docs documents that are published will soon be crawlable. This means if you have published documents as web pages, or used the publish/embed option for a document, and it has been linked to on the web, it can be indexed by Google and other search engines. </p>
<p><img align="right" style="margin: 10px;" title="Publish as Web Page" alt="Publish as Web Page" src="http://images.ientrymail.com/webpronews/article_pics/publish-as-web-page.jpg" /> &quot;This is a very exciting change as your published docs linked to from public websites will reach a much wider audience of people,&quot; says Google Employee &quot;Marie F,&quot; on the Google Docs Help Forum.</p>
<p>The change does not apply to documents that are set to &quot;allow anyone with the link to view (no sign-in required).&quot; Concerned users who do not want their published documents indexed can un-publish them by:</p>
<blockquote><p>- Going to the &quot;share tab&quot;.</p>
<p>- For documents and spreadsheets, choosing &quot;publish as web page&quot;. For presentations, choosing &quot;publish/embed&quot;.</p>
<p>- Clicking the button that says &quot;stop publishing&quot;.</p></blockquote>
<p>Google notes that Google Apps users may find that they&#8217;re unable to publish documents to the world if the admin of the domain has disallowed publishing outside the domain.</p>
<p>The company said it will launch the change in a couple of weeks. Keep an eye on <a href="http://www.google.com/support/forum/p/Google+Docs/thread?tid=0ca72389c9b26ef4&amp;hl=en">the help forum</a> for an update on when the change goes live.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/google-to-start-crawling-google-docs-documents-2009-09/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Tips for Getting Crawled Faster by Google</title>
		<link>http://www.webpronews.com/tips-for-getting-crawled-faster-by-google-2009-08</link>
		<comments>http://www.webpronews.com/tips-for-getting-crawled-faster-by-google-2009-08#comments</comments>
		<pubDate>Fri, 28 Aug 2009 08:00:19 +0000</pubDate>
		<dc:creator>Chris Crum</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Googlebot]]></category>
		<category><![CDATA[presentations]]></category>
		<category><![CDATA[search engine indexing]]></category>
		<category><![CDATA[search engines]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=51224</guid>
		<description><![CDATA[<p>Probably the most important step in getting your site found in a search engine is the one in which the search engine crawls it. There are things that can be done and things that can be avoided to make this process as painless as possible for the search engine, which will, in turn, make it as painless as possible for the webmaster.<br />
]]></description>
			<content:encoded><![CDATA[<p>Probably the most important step in getting your site found in a search engine is the one in which the search engine crawls it. There are things that can be done and things that can be avoided to make this process as painless as possible for the search engine, which will, in turn, make it as painless as possible for the webmaster.</p>
<p>Since Google dominates the search market by such a large margin, it is always a good idea to listen to what they have to say about such matters. So when they post a presentation with tips on optimizing crawling and indexing, you&#8217;ll probably want to pay attention.</p>
<p>Google has <a href="http://googlewebmastercentral.blogspot.com/2009/08/optimize-your-crawling-indexing.html">done just that</a>, highlighting things to stay away from, and things you can do to enhance your site&#8217;s crawlability. Here is that presentation with specific examples of URLs.</p>
<center>
<table>
<tbody>
<tr>
<td><iframe height="342" frameborder="0" width="410" src="http://docs.google.com/present/embed?id=dgk2ft62_18cvjx4nk4"></iframe></td>
</tr>
</tbody>
</table>
</center>
<p>&quot;<a href="http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html">The Internet is a big place</a>; new content is being created all the time,&quot; says Google Webmaster Trends Analyst Susan Moskwa. &quot;Google has a finite number of resources, so when faced with the nearly-infinite quantity of content that&#8217;s available online, Googlebot is only able to find and crawl a percentage of that content. Then, of the content we&#8217;ve crawled, we&#8217;re only able to index a portion.&quot;</p>
<p>&quot;URLs are like the bridges between your website and a search engine&#8217;s crawler: crawlers need to be able to find and cross those bridges (i.e., find and crawl your URLs) in order to get to your site&#8217;s content,&quot; continues Moskwa. &quot;If your URLs are complicated or redundant, crawlers are going to spend time tracing and retracing their steps; if your URLs are organized and lead directly to distinct content, crawlers can spend their time accessing your content rather than crawling through empty pages, or crawling the same content over and over via different URLs.&quot;</p>
<p>If you want to get crawled faster by Google, you should <strong>remove user-specific details from URLs.</strong> The specifics can be viewed in the slideshow. Basically, URL parameters that don&#8217;t change the content of the page should be removed and put into a cookie. This will reduce the number of URLs that point to the same content, and speed up crawling.</p>
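<p>As a rough illustration of the idea, here is a minimal Python sketch that drops user-specific query parameters from a URL. The parameter names and the example.com URLs are hypothetical and do not come from Google&#8217;s presentation; a real site would list its own parameters and move their values into a cookie.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of parameters that identify the visitor
# rather than the content; a real site would list its own.
USER_SPECIFIC_PARAMS = {"sessionid", "sid", "affiliateid"}


def strip_user_params(url):
    """Return url without the query parameters listed above."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in USER_SPECIFIC_PARAMS
    ]
    # Rebuild the URL from its parts, using only the parameters we kept.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


clean_url = strip_user_params("http://www.example.com/product/42?sessionid=abc123")
print(clean_url)
```

<p>Every visitor then shares one URL per page, so a crawler sees each piece of content once instead of once per session.</p>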
<p>Google says <strong>infinite spaces are a waste of time and bandwidth for all</strong>, which is why you should consider taking action when you have calendars that link to infinite numbers of past/future dates with unique URLs, or other paginated data. </p>
<p><strong>Tell Google to ignore pages it can&#8217;t crawl</strong>. This includes things like log-in pages, contact forms, shopping carts, and other pages that require users to perform actions that crawlers can&#8217;t perform themselves. You can do this with the robots.txt file. </p>
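<p>To make that concrete, the sketch below feeds a small robots.txt to Python&#8217;s standard urllib.robotparser module and asks which paths a crawler may fetch. The rules, paths, and example.com host are assumptions chosen for illustration, not something Google prescribes.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers away from
# pages they cannot use; the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /cart
Disallow: /contact
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(path, agent="Googlebot"):
    """Return True if the rules above let the agent fetch the path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/login", "/cart/checkout", "/articles/crawling"):
    print(path, may_crawl(path))
```

<p>Disallow rules match by path prefix, so blocking /cart in this sketch also covers /cart/checkout.</p>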
<p>Finally, <strong>avoid duplicate content</strong> when possible. Google likes to have one URL for each piece of content. They do recognize that this is not always possible (content management systems, for example, can get in the way), which is why the <a href="http://videos.webpronews.com/2009/02/13/breaking-news-matt-cutts-explains-canonical-tag-from-google-yahoo-microsoft/">canonical link element</a> exists to let you specify the preferred URL for a particular piece of content.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/tips-for-getting-crawled-faster-by-google-2009-08/feed</wfw:commentRss>
		<slash:comments>28</slash:comments>
		</item>
		<item>
		<title>Google Improves Flash Indexing Capabilities</title>
		<link>http://www.webpronews.com/google-improves-flash-indexing-capabilities-2009-06</link>
		<comments>http://www.webpronews.com/google-improves-flash-indexing-capabilities-2009-06#comments</comments>
		<pubDate>Sun, 21 Jun 2009 13:56:43 +0000</pubDate>
		<dc:creator>Chris Crum</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[adobe flash]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Flash]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[search indexing]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=50382</guid>
		<description><![CDATA[<p>Last year, Google began crawling and indexing Flash content, but now Google has announced that it can also index external resource loading. In other words, Google can index external content that loads within an SWF file, and associate it with that file, so that it will appear in search results.<br />
<br />
For example, a site that loads something like this in Flash:</p>
<center><img title="Transporter Flash file" alt="Transporter Flash file" src="http://images.ientrymail.com/webpronews/article_pics/transporter.jpg" /></center>
<p>..might appear in a Google SERP like this:</p>]]></description>
			<content:encoded><![CDATA[<p>Last year, Google began crawling and indexing Flash content, but now Google has announced that it can also index external resource loading. In other words, Google can index external content that loads within an SWF file, and associate it with that file, so that it will appear in search results.</p>
<p>For example, a site that loads something like this in Flash:</p>
<p><center><img title="Transporter Flash file" alt="Transporter Flash file" src="http://images.ientrymail.com/webpronews/article_pics/transporter.jpg" /></center></p>
<p>..might appear in a Google SERP like this:</p>
<div style="text-align: center;"><a href="http://3.bp.blogspot.com/_o5Na_9269nA/Sjssweuwi-I/AAAAAAAACrQ/T5HkhLbSuCU/s1600-h/t4-tranformations-search-result.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img border="0" id="BLOGGER_PHOTO_ID_5348918193827122146" alt="" src="http://3.bp.blogspot.com/_o5Na_9269nA/Sjssweuwi-I/AAAAAAAACrQ/T5HkhLbSuCU/s400/t4-tranformations-search-result.png" style="cursor: pointer; width: 400px; height: 84px;" /></a></div>
<p>This is new to Google&#8217;s Flash-crawling abilities. On Google&#8217;s Webmaster Central Blog, Software Engineer Janis Stipins <a href="http://googlewebmastercentral.blogspot.com/2009/06/flash-indexing-with-external-resource.html">breaks down</a> just what Google can do when it encounters SWF files:</p>
<blockquote><p>- Index textual content displayed as a user interacts with the file. We click buttons and enter input, just like a user would.</p>
<p>- Discover links within Flash files.</p>
<p>- Load external resources and associate the content with the parent file.</p>
<p>- Support common JavaScript techniques for embedding Flash, such as SWFObject and SWFObject2.</p>
<p>- Index sites scripted with AS1, AS2, and AS3 even if the ActionScript is obfuscated.</p></blockquote>
<p>With regard to AS3, Stipins says, &quot;The ActionScript version isn&#8217;t particularly relevant in our Indexing process, so we support older versions of AS in addition to the latest.&quot;</p>
<p>Webmasters who don&#8217;t want their SWF files, or any of the external resources those files load, to be crawled can block them with their robots.txt file.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/google-improves-flash-indexing-capabilities-2009-06/feed</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Google Counts First Link, Not Second, Says SEO</title>
		<link>http://www.webpronews.com/google-counts-first-link-not-second-says-seo-2008-07</link>
		<comments>http://www.webpronews.com/google-counts-first-link-not-second-says-seo-2008-07#comments</comments>
		<pubDate>Mon, 07 Jul 2008 18:00:05 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[NoFollow]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=46135</guid>
		<description><![CDATA[Got a couple of links on one page to another page? Google only has love for the first one, no matter what you do with it.
]]></description>
			<content:encoded><![CDATA[<p>Got a couple of links on one page to another page? Google only has love for the first one, no matter what you do with it.</p>
<span id="more-46135"></span>
<p>
Swap its anchor text, nofollow it, Google doesn&#8217;t care, it seems. That first link on a page with multiple links to a second page receives Google&#8217;s notice. The rest, not so much.</p>
<p>
Branko Rihtman at <a href="http://www.seo-scientist.com/first-link-counted-rebunked.html">SEO Scientist</a> noted the first link phenomenon isn&#8217;t a new concept. Rand Fishkin made that point at <a href="http://www.seomoz.org/blog/results-of-google-experimentation-only-the-first-anchor-text-counts">SEOMoz</a> in March 2008.</p>
<p>
Rihtman wanted to delve into the idea a little further, so he set up a test with one page, two links, and one destination page to see if Google really did appear to index the first, and only the first, link:</p>
<blockquote style="background-color:#c2dfff;"><p>So, in order to test the theory, I picked two sites that sometimes double as my furry lab animals and set them up so that site A links to site B with two links using different anchor texts. The phrases appeared only on a site A, they were not to be found anywhere on site B and if that site was to rank for any of those two phrases, it would be only due to the anchor text of the links pointing from the site A.</p></blockquote>
<p>
Based on his testing, Rihtman found Google only picked up on the first link. Then he dropped a nofollow attribute on the first link, to see if Google would skip it and rank the site for the second link. It didn&#8217;t happen.</p>
<p>
&#8220;Besides the fact that nofollowing the top link will not get you around the Google ignoring the second link, it is interesting to see that Google will still count the top link even though it is nofollowed,&#8221; he said.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/google-counts-first-link-not-second-says-seo-2008-07/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Much ado about Delicious robots</title>
		<link>http://www.webpronews.com/much-ado-about-delicious-robots-2008-02</link>
		<comments>http://www.webpronews.com/much-ado-about-delicious-robots-2008-02#comments</comments>
		<pubDate>Mon, 18 Feb 2008 17:11:45 +0000</pubDate>
		<dc:creator>WebProNews Staff</dc:creator>
				<category><![CDATA[bots]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Delicious]]></category>
		<category><![CDATA[robots.txt]]></category>

		<guid isPermaLink="false">http://blogs.webpronews.com/2008/02/18/much-ado-about-delicious-robots/</guid>
		<description><![CDATA[Bots arriving from Google, Ask.com, and MSN to sample pages on the bookmarking site Delicious hit a robots.txt block. So do Yahoo&#8217;s Slurp bots. It&#8217;s no big deal. There is a little confusion about Delicious and its handling of &#8230;]]></description>
			<content:encoded><![CDATA[<p>Bots arriving from Google, Ask.com, and MSN to sample pages on the bookmarking site Delicious hit a robots.txt block. So do Yahoo&#8217;s Slurp bots. It&#8217;s no big deal.</p>
<p><span id="more-66811"></span></p>
<p>There is a little confusion about Delicious and its handling of visiting robots. <a href="http://www.searchenginejournal.com/yahoo-blocking-bots-from-spideing-delicious-bookmarks/6387/">Search Engine Journal</a> cited a blogger who claimed Delicious blocks bots from the big search engines:</p>
<blockquote><p><em>Colin Cochrane found this out the other day, saying that ‘This isn’t a simple robots.txt exclusion, but rather a 404 response that is now being served based on the requesting User-Agent.’</em></p></blockquote>
<p>Look a little closer at the <a href="http://del.icio.us/robots.txt">robots.txt</a> file, and you see something different happening. The bots from the four big search sites have been disallowed from certain subdirectories at Delicious, and not the bookmarks. The top line of the robots.txt file is a broad go-away to all bots, but from what we can tell from the <a href="http://www.robotstxt.org/orig.html">Robots.txt standard</a>, the lines aimed at the four specific bots allow them to go anywhere on Delicious that has not been expressly disallowed to them.</p>
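<p>That reading can be checked with a small Python sketch using the standard urllib.robotparser module. The file below is an illustrative stand-in with made-up paths, not the actual Delicious robots.txt, and the example.com host is likewise a placeholder. It pairs a catch-all record that blocks everything with a record for one named bot.</p>

```python
from urllib.robotparser import RobotFileParser

# Illustrative stand-in, NOT the real Delicious file: a blanket block
# for unnamed bots plus a narrower record for one named bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /inbox
Disallow: /subscriptions
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the rules above let the agent fetch the path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


print("Googlebot /popular:", allowed("Googlebot", "/popular"))
print("Googlebot /inbox:", allowed("Googlebot", "/inbox"))
print("SomeOtherBot /popular:", allowed("SomeOtherBot", "/popular"))
```

<p>In this sketch the named bot is governed only by its own record, so it can fetch anything not listed there, while a bot with no record of its own falls back to the catch-all block.</p>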
<p>Nothing to see here folks, move along, move along.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/much-ado-about-delicious-robots-2008-02/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

