Google Details How They Crawl Images, Offers Tips
By: Zach Walton - April 25, 2012
It’s pretty clear now how Google crawls Web sites to get the results you search for. Heck, we can’t go a week without Google updating us on how they’re changing the algorithm that determines search results. Unfortunately, there hasn’t been much on how Google crawls for images – until now.
In a lengthy post on the Webmaster Central Blog, Google details how they crawl for images and what they’re looking for when it comes to choosing the most relevant images. Considering how often I and everybody else use Google Images, the information should be extremely relevant for Webmasters who want their images to show up in search.
To confirm what you may have already known, Google crawls images from pretty much every source on the Web, from bloggers to stock photo sites. They also crawl images in pretty much every standard format, including BMP, GIF, JPEG, PNG, WebP and SVG.
As an example, Google shows how the image algorithm knows the difference between a search for coffee and a search for tea. The feat is accomplished by looking “at the textual content on the page the image was found on to learn more about the image.” They also “look at the page’s title and its body; we might also learn more from the image’s filename, anchor text that points to it, and its ‘alt text’.”
After all this, you may be wondering how you can get your images to show up better in the results. It’s quite easy, really. For your image to show up in results, make sure that Google can crawl both the HTML page that the image is embedded in and the image itself. Also, make sure that your image is in one of the supported formats listed above.
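The two requirements above can be checked offline before a page goes live. What follows is only a rough sketch, not anything Google published: it assumes crawlability is governed by your site's robots.txt, and the extension list is our own mapping of the formats named earlier. The function names (`has_supported_format`, `is_crawlable`, `image_can_appear`) are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Formats named in the article; mapping them to extensions is our assumption.
SUPPORTED_EXTENSIONS = {".bmp", ".gif", ".jpg", ".jpeg", ".png", ".webp", ".svg"}


def has_supported_format(image_url):
    """Return True if the URL path ends in one of the listed extensions."""
    suffix = PurePosixPath(urlparse(image_url).path).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def image_can_appear(robots_txt, page_url, image_url):
    """Both the embedding page and the image must be crawlable,
    and the image must be in a supported format."""
    return (
        is_crawlable(robots_txt, page_url)
        and is_crawlable(robots_txt, image_url)
        and has_supported_format(image_url)
    )
```

Pass in the text of your own robots.txt along with a page URL and an image URL. A `False` result here means the image is blocked or in an unlisted format; a `True` result only means nothing in this simple check rules it out.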
The next tips aren’t required, but they are recommended for Web sites hoping to get their images crawled. First, make sure the image filename is related to the image’s content. Don’t post an image of a giraffe and call the image “africa.jpg,” for instance. If your image has an alt attribute, make sure it describes the image in a “human-friendly” way. That means using full words and maybe even sentences for your alt attributes. Finally, it really helps if the HTML page’s “textual contents as well as the text near the image are related to the image.”
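The alt text advice is the easiest of these tips to audit automatically. The sketch below uses Python's standard `html.parser` to flag images with missing or very short alt text. The two-word threshold (`MIN_ALT_WORDS`) and the names `ImageAudit` and `audit_images` are our own assumptions, not anything from Google's post, and a script like this can't judge whether a filename actually matches what the picture shows.

```python
from html.parser import HTMLParser

# Hypothetical threshold: alt text shorter than this is flagged for review.
MIN_ALT_WORDS = 2


class ImageAudit(HTMLParser):
    """Collect (src, problem) pairs for <img> tags that look under-described."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = (attributes.get("alt") or "").strip()
        if not alt:
            self.problems.append((src, "missing or empty alt text"))
        elif len(alt.split()) < MIN_ALT_WORDS:
            self.problems.append((src, "alt text is very short"))


def audit_images(html):
    """Return a list of (src, problem) pairs found in the given HTML string."""
    audit = ImageAudit()
    audit.feed(html)
    audit.close()
    return audit.problems
```

Feeding it a page's HTML returns one entry per image worth a second look, which you can then review by hand.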
One of the more recent updates to the Google Images algorithm was the addition of words like “sex” to SafeSearch. This was because Google updated their algorithm to handle requests like this without returning obscene content. Google talks about this in more detail now, saying they encourage Webmasters who publish adult content on their pages to mark such content with a metatag that tells Google about said adult content. If you don’t use these metatags, Google can still do a pretty good job of filtering adult content using computer vision or contextual clues.
Finally, the company answered some burning questions regarding image search. Some of the details include why Googlebot crawls images instead of Googlebot-Image and whether there is a file size limit for images that can be crawled (spoiler: there’s not). Google also suggests that Webmasters implement an Image Sitemap to tell them more about new images and what they’re about.
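For sites with a lot of images, an Image Sitemap is easier to generate than to write by hand. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It relies on the sitemaps.org namespace plus Google's image extension namespace, and it emits only the image location tag; the function name `build_image_sitemap` and the shape of its input are our own choices for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Build sitemap XML from a dict mapping each page URL to the list
    of image URLs embedded in that page."""
    # Register prefixes so the output uses xmlns and xmlns:image.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")
```

Calling it with something like `{"https://example.com/zoo.html": ["https://example.com/img/giraffe.jpg"]}` returns the XML as a string, which you would save to a file and submit through Webmaster Tools. Check Google's current Image Sitemap documentation for any optional tags before relying on this.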
If you took anything away from this, it should be that image crawling is just as involved as regular Web crawling. It takes constant vigilance and work to get your image to the top of the heap in terms of relevance. These are just some beginner tips for getting your images crawled more often. If you need more in-depth instruction on image crawling, Google has a handy Image publishing guidelines page set up for such an occasion.