Googlebot is the web crawler Google uses to gather the information needed to build a searchable index of the web. Googlebot has mobile and desktop crawlers, as well as specialized crawlers for news, images, and videos.
Google also uses additional crawlers for specific tasks, and each crawler identifies itself with a different string of text called a “user agent.” Googlebot is evergreen, meaning it sees websites as users would in the latest version of the Chrome browser.
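Because each crawler announces itself with a user agent, a server can tell Googlebot traffic apart from ordinary visitors. Here is a minimal sketch of that check; the desktop user-agent string follows the format Google documents for its crawlers (exact version numbers vary over time), and the helper function is an illustration, not part of any official API.

```python
# Sketch: classify a request's user agent as one of Google's crawlers.
# The token list and example strings are illustrative; real deployments
# should consult Google's current crawler documentation.

GOOGLEBOT_TOKENS = ("Googlebot", "Googlebot-Image", "Googlebot-News", "Googlebot-Video")

def is_googlebot(user_agent: str) -> bool:
    """Return True if the user agent identifies itself as a Googlebot crawler."""
    return any(token in user_agent for token in GOOGLEBOT_TOKENS)

desktop_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(is_googlebot(desktop_ua))                       # True
print(is_googlebot("Mozilla/5.0 (Windows NT 10.0)"))  # False
```

Note that a user-agent string alone can be spoofed by anyone; Google recommends verifying real Googlebot requests with a reverse DNS lookup on the requesting IP.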
Googlebot runs on thousands of machines, which determine how fast to crawl and what to crawl on each website. They will slow their crawling down so as not to overwhelm a site's servers.
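That slow-down behavior can be pictured as an adaptive delay between fetches. The sketch below is purely illustrative, not Google's actual algorithm; the class name, thresholds, and backoff factors are all invented for the example.

```python
# Illustrative sketch (not Google's real logic): a polite crawler that
# backs off when the server responds slowly or with errors, and cautiously
# speeds up when the server looks healthy.
class PoliteCrawler:
    def __init__(self, base_delay: float = 1.0):
        self.delay = base_delay  # seconds to wait between fetches

    def record_response(self, status_code: int, response_time: float) -> None:
        if status_code >= 500 or response_time > 2.0:
            self.delay = min(self.delay * 2, 60.0)   # exponential backoff, capped
        else:
            self.delay = max(self.delay * 0.9, 0.5)  # gentle recovery, floored

crawler = PoliteCrawler()
crawler.record_response(503, 0.4)  # server error: delay doubles to 2.0
crawler.record_response(200, 0.1)  # healthy response: delay eases to 1.8
print(round(crawler.delay, 2))     # 1.8
```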
Let’s look at their process for building an index of the web.
How Googlebot crawls and indexes the web
Google has shared a few versions of its pipeline in the past. The version below is the most recent.
Google starts with a list of URLs it collects from various sources, such as pages, sitemaps, RSS feeds, and URLs submitted in Google Search Console or the Indexing API. It prioritizes what it wants to crawl, fetches the pages, and stores copies of the pages.
It then processes the pages again, looking for any changes or new links…
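The crawl-and-reprocess loop described above can be sketched in a few lines. This is a toy model under stated assumptions: `fetch()` is a stand-in for a real HTTP client, the two-page "web" is invented, and a simple FIFO queue replaces Google's prioritization.

```python
# Minimal sketch of the pipeline: start from seed URLs, fetch pages,
# store copies, and feed newly discovered links back into the queue.
import hashlib

PAGES = {  # pretend web: url -> (html, outgoing links); invented for illustration
    "https://example.com/": ("<html>home</html>", ["https://example.com/a"]),
    "https://example.com/a": ("<html>page a</html>", []),
}

def fetch(url):
    """Stand-in for an HTTP fetch; returns (html, links)."""
    return PAGES.get(url, ("", []))

def crawl(seed_urls):
    frontier = list(seed_urls)  # URLs collected from sitemaps, feeds, etc.
    store = {}                  # stored copies, keyed by URL
    while frontier:
        url = frontier.pop(0)   # real crawlers prioritize; plain FIFO here
        html, links = fetch(url)
        fingerprint = hashlib.sha256(html.encode()).hexdigest()
        if store.get(url) == fingerprint:
            continue            # page unchanged since the last crawl
        store[url] = fingerprint
        for link in links:      # newly discovered links rejoin the queue
            if link not in store:
                frontier.append(link)
    return store

index = crawl(["https://example.com/"])
print(sorted(index))  # both pages end up stored
```

Storing a fingerprint of each copy is what lets the reprocessing pass detect changes cheaply: a page whose hash matches the stored one can be skipped.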