Chapter 9: Search Engine Tools and Services
Topics covered in this snack-sized chapter:
- Sitemaps
- Robots.txt
- The Meta Robots Tag
- Rel="Nofollow"
- Rel="Canonical"
- Search Engine Tools (Google Webmaster Tools and Bing Webmaster Tools)
Sitemaps help search engines find and classify content on a site that they may not have found on their own.
Sitemaps also come in a variety of formats and can highlight many different types of content, including video, images, news and mobile.
Sitemaps are of three varieties:
XML: Extensible Markup Language (Recommended Format)
- This is the most widely accepted format for sitemaps.
- It allows for the most granular control of page parameters.
- It produces relatively large file sizes; since XML requires an open tag and a close tag around each element, file sizes can get very large.
RSS: Really Simple Syndication or Rich Site Summary
- Easy to maintain; RSS sitemaps can easily be coded to automatically update when new content is added.
- Harder to manage; although RSS is a dialect of XML, it is actually much harder to manage due to its updating properties.
Txt: Text File
- Extremely easy; the text sitemap format is one URL per line, up to 50,000 lines.
- Does not provide the ability to add meta data to pages.
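For reference, a minimal sketch of the recommended XML format is shown below; the URL and date are hypothetical, and only the <loc> element is required by the sitemap protocol:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; only <loc> is required. -->
  <url>
    <loc>http://www.example.com/page.html</loc>
    <lastmod>2012-01-01</lastmod>       <!-- optional: last modification date -->
    <changefreq>monthly</changefreq>    <!-- optional: expected change frequency -->
    <priority>0.8</priority>            <!-- optional: relative priority, 0.0 to 1.0 -->
  </url>
</urlset>
```

The optional elements are the "granular control of page parameters" mentioned above; a text sitemap, by contrast, would carry only the URL line.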
The robots.txt file, a product of the Robots Exclusion Protocol, is a file stored on a website's root directory.
The robots.txt file gives instructions to automated web crawlers visiting your site, including search spiders.
By using robots.txt, webmasters can indicate to search engines which areas of a site they would like to disallow bots from crawling as well as indicate the locations of sitemap files and crawl-delay parameters.
The following commands are available:
- Disallow: Prevents compliant robots from accessing specific pages or folders.
- Sitemap: Indicates the location of a website’s sitemap or sitemaps.
- Crawl Delay: Indicates the delay (in seconds) that a compliant robot should wait between successive requests to a server.
An example of a robots.txt file is shown below:
# Don't allow spambot to crawl any pages
User-agent: spambot
Disallow: /
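How a compliant crawler interprets these rules can be sketched with Python's standard-library robots.txt parser; the user-agent names and the URL here are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks "spambot" from the whole site
# while leaving all other crawlers unrestricted.
robots_txt = """\
# Don't allow spambot to crawl any pages
User-agent: spambot
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant crawler checks before fetching each URL.
print(rp.can_fetch("spambot", "http://example.com/page.html"))    # False
print(rp.can_fetch("Googlebot", "http://example.com/page.html"))  # True
```

Note that this enforcement is voluntary: as the section says, robots.txt only gives instructions, and non-compliant bots may ignore them.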
The meta robots tag creates page-level instructions for search engine bots.
The meta robots tag should be included in the head section of the HTML document.
An example of the meta robots tag is shown below:
<title>The Best Website on the Internet</title>
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
The rel="nofollow" attribute allows you to link to a resource while telling search engines not to pass ranking credit through the link.
Literally, "nofollow" tells search engines not to follow the link, but some engines still follow such links in order to discover new pages.
An example of nofollow is shown below:
<a href="http://www.image.com" title="Image" rel="nofollow">Image Link</a>
Often, two or more copies of the exact same content appear on your website under different URLs.
For example, the following URLs can all refer to a single homepage:
- http://wagmob.com/default.asp
- http://wagmob.com/Default.asp
To search engines, these appear as separate pages. Because the content is identical on each page, this can cause the search engines to devalue the content and its potential rankings.
The canonical tag solves this problem by telling search robots which page is the singular "authoritative" version that should count in web results.
An example of rel="canonical" for the URL http://wagmob.com/default.asp is shown below:
<title>The Best Webpage on the Internet</title>
<link rel="canonical" href="http://www.wagmob.com">
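How a crawler might read this tag can be sketched with Python's standard-library HTML parser; the page snippet mirrors the example above:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "link" and self.canonical is None:
            attr_map = dict(attrs)
            if attr_map.get("rel", "").lower() == "canonical":
                self.canonical = attr_map.get("href")

# The same snippet as the example above.
page = ('<head><title>The Best Webpage on the Internet</title>'
        '<link rel="canonical" href="http://www.wagmob.com"></head>')

finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)  # http://www.wagmob.com
```

Whichever duplicate URL serves this page, the crawler credits the single canonical version.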
Two types of search engine tools are Google Webmaster Tools and Bing Webmaster Tools.
Google Webmaster Tools offers the following settings and diagnostics:
Geographic Target - If a given site targets users in a particular location, webmasters can provide Google with information that helps determine how the site appears in country-specific search results, and also improves Google's results for geographic queries.
Preferred Domain - The preferred domain is the one that a webmaster would like Google to use when indexing the site's pages.
URL Parameters - You can indicate information about each URL parameter on your site, which helps Google crawl the site more efficiently.
Crawl Rate - The crawl rate affects the speed of Googlebot's requests during the crawl process. It has no effect on how often Googlebot crawls a given site. Google determines the recommended rate based on the number of pages on a website.
Malware - Malware is not only bad for users, but will have a severely negative effect on rankings. Helpfully, Google will inform you if it has found any malware on your site.
Crawl Errors - If Googlebot encounters significant errors while crawling, such as 404s, it will report and identify where Googlebot found the link to the inaccessible URL.
HTML Suggestions - This analysis identifies search engine unfriendly HTML elements. Specifically, it lists meta description issues, title tag issues and non-indexable content issues.
Your Site on the Web
These statistics offer unique insight to SEOs in particular, as they report keyword impressions, click-through rates, top pages delivered in search results, and linking statistics.
It also allows you to submit sitemaps, test robots.txt files, adjust sitelinks, and submit change-of-address requests when you move your website from one domain to another.
Bing Webmaster Tools offers the following:
Sites Overview - This interface provides a single overview of the performance of all your websites in Bing-powered search results.
Crawl Stats - Here you can view reports on how many pages of your site Bing has crawled and discover any errors it encountered. As in Google Webmaster Tools, you can also submit sitemaps to help Bing discover and prioritize your content.
Index - This section allows webmasters to view and help control how Bing indexes their web pages. Again, similar to the settings in Google Webmaster Tools, webmasters can explore how their content is organized within Bing, submit URLs, remove URLs from search results, explore inbound links, and adjust parameter settings.
Traffic - The traffic summary in Bing Webmaster Tools reports impressions and click-through data by combining data from both Bing and Yahoo! search results. Reports here show average position as well as cost estimates if you were to buy ads targeting each keyword.