The robot starts crawling the site from the main page and follows links deeper into the website. It doesn’t scan the sitemap (unless the sitemap has been preloaded in the audit settings), nor does it scan pages that use canonical, noindex, or nofollow directives. This algorithm mimics the way search engine robots work as closely as possible.
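The crawl order described above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory site (the `SITE` dictionary maps each URL to the URLs its page links to) and a hypothetical `crawl` function; it is not the audit robot’s actual implementation. It shows a breadth-first walk from the main page, and how a page that is only listed in the sitemap is reached only when that sitemap is preloaded as extra seeds.

```python
from __future__ import annotations

from collections import deque

# Hypothetical in-memory site: each URL maps to the URLs its page links to.
SITE = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/"],
    "/about": [],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
    "/orphan": [],  # listed only in the sitemap; no page links to it
}


def crawl(start: str, links: dict[str, list[str]], sitemap: list[str] | None = None) -> list[str]:
    """Breadth-first crawl from the start page; sitemap URLs are extra seeds only if preloaded."""
    seeds = [start] + list(sitemap or [])
    visited: list[str] = []
    seen: set[str] = set()
    queue: deque[str] = deque()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            visited.append(url)
            queue.append(url)
    while queue:
        url = queue.popleft()
        # Follow each outgoing link one level deeper, skipping pages already queued.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited


if __name__ == "__main__":
    print(crawl("/", SITE))                       # the orphan page is never reached
    print(crawl("/", SITE, sitemap=["/orphan"]))  # a preloaded sitemap adds it as a seed
```

Breadth-first order means pages one click from the main page are visited before pages two clicks away, which is what “following the links deeper into the website” amounts to in this sketch.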
Pages that don’t take part in ranking and aren’t indexed by search engines are not audited.
There are three types of such pages:
1. Non-canonical – pages whose canonical tag points to another page. When a page has such a canonical tag, we don’t scan it; instead, we follow the URL specified in the tag and scan that page. You can find such pages in the Custom Overview.
2. Closed from indexing – pages with the noindex meta tag. Search engines don’t index such pages, so we don’t crawl them either. You can find them in the “Pages closed from indexing using the noindex meta tag” check or in the Custom Overview.
3. Closed from following – pages linked with the nofollow attribute. When we find a link marked nofollow, we don’t follow it, as the attribute instructs, so the page it points to isn’t crawled through that link. You can find such links in the “The attribute rel="nofollow" is used for internal links” check or in the Custom Overview.
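To make the three rules concrete, here is a minimal Python sketch of how a crawler might read these signals from a page’s HTML with the standard library’s `html.parser`. The `PageSignals` class, the `classify` function, and the status labels are hypothetical illustrations, not the audit tool’s actual code. The sketch also assumes that links on a noindex page are not followed, which matches “we won’t crawl such pages” but may differ from the real robot.

```python
from __future__ import annotations

from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collects the canonical URL, the robots noindex flag, and links split by rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: str | None = None
        self.noindex = False
        self.follow_links: list[str] = []
        self.nofollow_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        rel = a.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            directives = [d.strip() for d in a.get("content", "").lower().split(",")]
            self.noindex = self.noindex or "noindex" in directives
        elif tag == "a" and a.get("href"):
            # Sort each link by whether it carries rel="nofollow".
            target = self.nofollow_links if "nofollow" in rel else self.follow_links
            target.append(a["href"])


def classify(url: str, html: str) -> tuple[str, list[str]]:
    """Return a page type and the URLs the crawler would queue next (hypothetical rules)."""
    signals = PageSignals()
    signals.feed(html)
    signals.close()
    # 1. Non-canonical: skip this page, but queue the canonical target instead.
    if signals.canonical and signals.canonical != url:
        return "non-canonical", [signals.canonical]
    # 2. Closed from indexing: skip the page and (by assumption) its links.
    if signals.noindex:
        return "closed from indexing", []
    # 3. Audit the page, following only links without rel="nofollow".
    return "audited", signals.follow_links


if __name__ == "__main__":
    page = '<a href="/a">A</a> <a href="/b" rel="nofollow">B</a>'
    print(classify("/", page))
```

A page whose canonical tag points to itself is treated as canonical here and is audited normally; only a canonical pointing elsewhere redirects the crawl.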