How Search Engines Work

Chapter 2: How Search Engines Work

Crawling and Indexing

Finding information by crawling

  • Search engines use software known as “web crawlers” to discover publicly available webpages.
  • Google’s most well known crawler is called “Googlebot”.
  • Crawlers look at webpages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google’s servers.
  • The crawl process begins with a list of web addresses from past crawls and sitemaps provided by website owners.
  • As the search engine crawlers visit these websites, they look for links to other pages to visit. The software pays special attention to new sites, changes to existing sites, and dead links.
  • Computer programs determine which sites to crawl, how often, and how many pages to fetch from each site.
  • Google doesn't accept payment to crawl a site more frequently for their web search results. This ensures they have the best possible results because in the long run that’s what’s best for users and, therefore, Google’s business.
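
The crawl loop described above can be sketched in a few lines. This is a minimal illustration, not how Googlebot is implemented: the `WEB` dictionary is a hypothetical stand-in for the real web (a real crawler would fetch pages over HTTP and parse the links out of the HTML), and the `example.com` URLs are made up.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A URL that is linked to but missing from this dict plays the role of a dead link.
WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/gone"],
    "https://example.com/b": ["https://example.com/"],
}

def crawl(seeds, max_pages=100):
    """Breadth-first crawl from the seed URLs.

    Returns (visited pages in crawl order, dead links that were found).
    """
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so nothing is visited twice
    visited, dead = [], []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        links = WEB.get(url)
        if links is None:     # page does not exist: record it as a dead link
            dead.append(url)
            continue
        visited.append(url)
        for link in links:    # follow links from page to page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited, dead

visited, dead = crawl(["https://example.com/"])
print(visited)
print(dead)
```

The seed list corresponds to the "list of web addresses from past crawls and sitemaps", and `max_pages` stands in for the decision about how many pages to fetch.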

Organizing information by indexing

  • The web is like an ever-growing public library with billions of books and no central filing system.
  • Search engines essentially gather the pages during the crawl process and then create an index, so that they know exactly how to look things up.
  • For example: Much like the index in the back of a book, the Google index includes information about words and their locations. When you search, at the most basic level, their algorithms look up your search terms in the index to find the appropriate pages.
  • The search process gets much more complex from there. When you search for “roses” you don’t want a page with the word “roses” on it hundreds of times. You probably want pictures, videos or a list of varieties.
  • Google’s indexing systems note many different aspects of pages, such as when they were published, whether they contain pictures and videos, and much more.
  • With the “Knowledge Graph”, they are continuing to go beyond keyword matching to better understand the people, places and things you care about.
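
The "index in the back of a book" idea is usually called an inverted index. The sketch below is a simplified illustration under assumed data: the three pages in `PAGES` are invented, and a real search index stores far more than words and positions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to {page_id: [positions of the word on that page]}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(page_id, []).append(position)
    return index

def lookup(index, query):
    """Return the ids of pages that contain every word in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], {}))
    for term in terms[1:]:
        results &= set(index.get(term, {}))
    return results

# Hypothetical pages gathered during a crawl.
PAGES = {
    "p1": "Roses are red",
    "p2": "Pictures of red roses and tulips",
    "p3": "Tulips bloom in spring",
}
index = build_index(PAGES)
print(index["roses"])
print(lookup(index, "red roses"))
```

Looking up a word is then a dictionary access rather than a scan of every page, which is why the index is built once, ahead of the search.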

Providing Answers

  • Search engines are answer machines.
  • When a person looks for something online, it requires the search engines to scour their corpus of billions of documents and do two things:
    • First, return only those results that are relevant or useful to the searcher’s query, and
    • Second, rank those results in order of perceived usefulness.
  • It is both “relevance” and “importance” that the process of SEO is meant to influence.
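
The two steps above (keep only relevant results, then order them) can be sketched as follows. The documents and popularity numbers are hypothetical, and matching every query word is a deliberately crude stand-in for relevance.

```python
def search(query, documents, popularity):
    """Step 1: keep documents that contain every query word.
    Step 2: rank the survivors by a popularity score, highest first."""
    terms = set(query.lower().split())
    relevant = [
        doc_id
        for doc_id, text in documents.items()
        if terms <= set(text.lower().split())
    ]
    return sorted(relevant, key=lambda doc_id: popularity.get(doc_id, 0), reverse=True)

# Hypothetical corpus and popularity scores, invented for illustration.
DOCUMENTS = {
    "a": "growing roses at home",
    "b": "roses care guide for roses lovers",
    "c": "tulip care guide",
}
POPULARITY = {"a": 3, "b": 9, "c": 20}

results = search("roses", DOCUMENTS, POPULARITY)
print(results)
```

Note that document `c` has the highest popularity score but never appears in the results, because it fails the relevance step first.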

Relevance

  • To a search engine, relevance means more than simply finding a page with the right words.
  • In the early days of the web, search engines didn’t go much further than this simplistic step, and their results suffered as a consequence.
  • Over time, engineers devised better ways to find valuable results that searchers would appreciate and enjoy. Today, hundreds of factors influence relevance.

How do search engines determine importance?

  • Currently, the major engines typically interpret importance as popularity: the more popular a site, page or document, the more valuable the information contained therein must be.
  • This assumption has proven fairly successful in practice, as the engines have continued to increase users’ satisfaction by using metrics that interpret popularity.
  • Popularity and relevance aren’t determined manually. Instead, the engines craft careful, mathematical equations – algorithms – to sort the wheat from the chaff and to then rank the wheat in order of tastiness (or however it is that farmers determine wheat’s value).
  • These algorithms are often composed of hundreds of components. In the “search marketing” field, we often refer to them as “ranking factors”.
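
One simple way to picture an algorithm built from ranking factors is a weighted sum. The factor names, weights, and page values below are all hypothetical; real engines do not publish their factors or weights, and their formulas are far more involved.

```python
# Hypothetical ranking factors and weights, invented for illustration only.
WEIGHTS = {"relevance": 0.5, "popularity": 0.3, "freshness": 0.2}

def score(factors):
    """Combine per-factor values (each between 0 and 1) into a single score."""
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())

# Hypothetical factor values for two pages.
PAGE_FACTORS = {
    "page_a": {"relevance": 0.9, "popularity": 0.2, "freshness": 0.5},
    "page_b": {"relevance": 0.6, "popularity": 0.9, "freshness": 0.1},
}

ranking = sorted(PAGE_FACTORS, key=lambda page: score(PAGE_FACTORS[page]), reverse=True)
print(ranking)
```

With these made-up numbers, `page_a` narrowly outranks `page_b` because relevance carries the largest weight; changing the weights changes the order, with no manual judgment involved.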

What is PageRank?

  • PageRank is a numeric value that represents how important a page is on the web.
  • Google figures that when one page links to another page, it is effectively casting a vote for the other page. The more votes that are cast for a page, the more important the page must be.
  • Also, the importance of the page that is casting the vote determines how important the vote itself is.
  • Google calculates a page's importance from the votes cast for it. The importance of each vote is taken into account when a page's PageRank is calculated.
  • PageRank is Google's way of deciding a page's importance. It matters because it is one of the factors that determine a page's ranking in the search results. It isn't the only factor that Google uses to rank pages, but it is an important one.
  • Figure (not reproduced here): a cartoon illustrating the basic principle of PageRank. The size of each face is proportional to the total size of the other faces pointing to it.
  • On the public 0 to 10 PageRank scale, 10 is the maximum value a page can have. This value can never be exceeded.
  • is the only webpage having a PageRank of 10/10.
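
The voting idea can be sketched with the power-iteration method from the original PageRank paper. This is a simplified illustration, not Google's production system: the four-page link graph is hypothetical, the damping factor of 0.85 is the value suggested in that paper, and the raw scores it produces are fractions that sum to 1 rather than values on the 0 to 10 scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeatedly passing each page's score along its links.

    `links` maps every page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of incoming links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its vote over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # a vote weighted by the voter's rank
        rank = new_rank
    return rank

# Hypothetical link graph: C receives votes from A, B and D; D receives none.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page A ends up ranked second even though only one page links to it, because that one vote comes from C, the most important page in the graph.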

SEO Guidelines from Google

  • Google recommends the following to get better rankings in its search engine:
    • Make pages primarily for users, not for search engines. Don’t deceive your users or present different content to search engines than you display to users, which is commonly referred to as cloaking.
    • Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link.
    • Create a useful, information-rich site, and write pages that clearly and accurately describe your content. Make sure that your <title> elements and ALT attributes are descriptive and accurate.
    • Use keywords to create descriptive, human-friendly URLs. Provide one version of a URL to reach a document, using 301 redirects or the rel="canonical" link element to address duplicate content.
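
The "one version of a URL" guideline can be sketched as a small normalization helper. The rules below (force `https`, lowercase the host, drop a trailing slash, strip a few tracking parameters) are assumptions chosen for illustration, not rules published by Google; each site has to decide its own canonical form. The function names are hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of query parameters that do not change the page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}

def canonical_url(url):
    """Reduce URL variants of the same document to one canonical form."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", parts.netloc.lower(), path, urlencode(kept), ""))

def redirect_for(url):
    """Return (301, target) if the URL should redirect, or None if it is canonical."""
    target = canonical_url(url)
    return None if url == target else (301, target)

def canonical_link_tag(url):
    """Build the rel="canonical" link element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'

print(canonical_url("http://Example.com/red-roses/?utm_source=news"))
print(redirect_for("http://Example.com/red-roses/"))
print(canonical_link_tag("https://example.com/red-roses?color=red&utm_medium=email"))
```

A 301 redirect suits variants that should never be shown to users, while the link element suits variants that must stay reachable, such as URLs carrying tracking parameters.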

SEO Guidelines from MSFT

  • Bing engineers at Microsoft recommend the following to get better rankings in their search engine:
    • Ensure a clean, keyword rich URL structure is in place.
    • Make sure content is not buried inside rich media (Adobe Flash Player, JavaScript, Ajax) and verify that rich media doesn’t hide links from crawlers.
    • Create keyword-rich content based on research to match what users are searching for. Produce fresh content regularly.
    • Don’t put the text that you want indexed inside images. For example, if you want your company name or address to be indexed, make sure it is not displayed inside a company logo.
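
To check whether content is hidden from crawlers, it can help to look at a page the way a simple crawler that does not run scripts or read images would. The sketch below uses Python's standard `html.parser`; the `CrawlerView` class, the sample page, and the company name "Acme Flowers" are all hypothetical, and real crawlers (some of which do execute JavaScript) are more capable than this.

```python
from html.parser import HTMLParser

class CrawlerView(HTMLParser):
    """Collects the text and links that are present in the static HTML."""

    def __init__(self):
        super().__init__()
        self.text = []                # visible text and image alt text
        self.links = []               # href values of plain <a> links
        self.images_missing_alt = []  # images that offer a crawler no text
        self._skip_depth = 0          # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img":
            if attrs.get("alt"):
                self.text.append(attrs["alt"])
            else:
                self.images_missing_alt.append(attrs.get("src") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())

# Hypothetical page: the company name "Acme Flowers" exists only inside logo.png,
# and one link is added by a script instead of being present in the HTML.
PAGE = """
<html><body>
<img src="logo.png">
<a href="/roses">Roses</a>
<script>addLink("/hidden-page")</script>
<p>Fresh flowers daily</p>
</body></html>
"""

view = CrawlerView()
view.feed(PAGE)
view.close()
print(view.text)
print(view.links)
print(view.images_missing_alt)
```

In this sample, neither the company name nor the script-generated link shows up in the output, which is exactly the situation the guidelines above warn about.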

Google Algorithm 2014 Updates

  • Pirate 2.0 — October 21, 2014: More than two years after the original DMCA/"Pirate" update, Google launched another update to combat software and digital media piracy. This update was highly targeted, causing dramatic drops in ranking to a relatively small group of sites.
  • Penguin 3.0 — October 17, 2014: More than a year after the previous Penguin update (2.1), Google launched a Penguin refresh. This update appeared to be smaller than expected (<1% of US/English queries affected) and was probably data-only (not a new Penguin algorithm). The timing of the update was unclear, especially internationally, and Google claimed it was spread out over "weeks".
  • "In The News" Box — October 1, 2014: Google made what looked like a display change to News-box results, but later announced that they had expanded news links to a much larger set of potential sites. The presence of news results in SERPs also spiked, and major news sites reported substantial traffic changes.
  • Panda 4.1 (#27) — September 23, 2014: Google announced a significant Panda update, which included an algorithmic component. They estimated the impact at 3-5% of queries affected. Given the "slow rollout," the exact timing was unclear.
  • Authorship Removed — August 28, 2014: Following up on the June 28th drop of authorship photos, Google announced that they would be completely removing authorship markup (and would no longer process it). By the next morning, authorship bylines had disappeared from all SERPs.
  • HTTPS/SSL Update — August 6, 2014: After months of speculation, Google announced that they would be giving preference to secure sites, and that adding encryption would provide a "lightweight" rankings boost. They stressed that this boost would start out small, but implied it might increase if the changes proved to be positive.
  • Pigeon — July 24, 2014: Google shook the local SEO world with an update that dramatically altered some local results and modified how they handle and interpret location cues. Google claimed that Pigeon created closer ties between the local algorithm and core algorithm(s).
  • Authorship Photo Drop — June 28, 2014: John Mueller made a surprise announcement (on June 25th) that Google would be dropping all authorship photos from SERPs (after heavily promoting authorship as a connection to Google+). The drop was complete around June 28th.
  • Payday Loan 3.0 — June 12, 2014: Less than a month after the Payday Loan 2.0 anti-spam update, Google launched another major iteration. Official statements suggested that 2.0 targeted specific sites, while 3.0 targeted spammy queries.
  • Panda 4.0 (#26) — May 19, 2014: Google confirmed a major Panda update that likely included both an algorithm update and a data refresh. Officially, about 7.5% of English-language queries were affected.
  • Payday Loan 2.0 — May 16, 2014: Just prior to Panda 4.0, Google updated its "payday loan" algorithm, which targets especially spammy queries.
  • Unnamed Update — March 24, 2014: Major algorithm flux trackers and webmaster chatter spiked around 3/24-3/25, and some speculated that the new, "softer" Panda update had arrived. Many sites reported ranking changes, but this update was never confirmed by Google.
  • Page Layout #3 — February 6, 2014: Google "refreshed" their page layout algorithm, also known as "top heavy". Originally launched in January 2012, the page layout algorithm penalizes sites with too many ads above the fold.


    Kimavi - A Video Learning Library { Learning is Earning }
