Less is More. Take Control of Your Crawled Pages
More input doesn’t always result in more output. This statement doesn’t apply in all disciplines of Digital Marketing, but it certainly does in SEO.
The way you enable search engines to access your website determines their overall perception of its quality. Inefficient use of crawl budget keeps your overall SEO performance below its potential.
In your IA (Information Architecture), find the right balance between page granularity and crawl efficiency. Make it a priority to redirect error pages that are being returned to Google, and avoid broken links and redirect chains. Refrain from using parameter URLs where possible, as they tend to multiply the number of URLs, depending on the technical setup and area of implementation. Set up redirect rules for the most common types of duplicated content, such as uppercase vs. lowercase URLs, http vs. https, and URLs with vs. without www, to name a few.
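As a rough illustration of such redirect rules, the sketch below computes one canonical variant for a URL. It assumes that https, the non-www host and lowercase paths are the preferred forms; these are choices made for the example, not recommendations for every site, and the function names are hypothetical. In practice the same logic usually lives in the web server or CMS configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the one URL variant that all duplicates should redirect to.

    Assumptions for this sketch: https is preferred over http, the
    non-www host is preferred, and paths are case-insensitive.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.lower() or "/"
    # The query string is kept as-is; the fragment never reaches the server.
    return urlunsplit(("https", host, path, parts.query, ""))


def redirect_target(url: str):
    """Return the URL to 301-redirect to, or None if url is already canonical."""
    target = canonical_url(url)
    return target if target != url else None


print(redirect_target("http://WWW.Example.com/Shoes/Red"))
print(redirect_target("https://example.com/shoes/red"))
```

Resolving all three variations in a single step sends each duplicate straight to its final destination, which avoids creating a redirect chain.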
What is Crawl Budget?
Crawl budget is the amount of resources search engines spend to crawl a website. The budget allocated by search engines to a particular site is largely influenced by the number and quality of external links (backlinks) from other websites.
Common Crawling Issues
A common misconception is that it is best practice to provide search engines with as many pages as possible. As a result, one of the KPIs often reported on is the number of pages that are indexed and displayed in Google’s search results when doing a site query for a particular domain (e.g. site:resolutiondigital.com.au). Even assuming that this number reflects the actual indexation status (which it rarely does for bigger websites), it is a bare figure without any context. It reflects quantity, not quality, and the missing context makes it unusable.
An issue often experienced in online shops is a vast number of URLs indexed that are generated by product filters on category pages.
Filters are great for users, as they allow them to narrow down their search using attributes like colour, size, price, brand, or product specifics like areas of application.
Search engines’ ability to crawl the website is affected when the filters can be applied in every possible combination, producing a practically unlimited number of URLs that search engines may attempt to crawl. Because there are so many possible combinations, each single one is so rarely requested that interest in it is very low, and it does not provide enough value to be indexed by Google. Similar observations can be made with internal search result pages, or tag pages on content-heavy websites such as blogs.
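To get a feel for the scale, the following sketch counts the filter URLs a single category page can produce. The filter names and value counts are hypothetical, and the model is deliberately simple: each filter is either unset or set to exactly one value.

```python
from math import prod

# Hypothetical filters on one category page and the number of values each offers.
filter_values = {"colour": 12, "size": 8, "price_range": 5, "brand": 40}


def filter_url_count(values_per_filter: dict) -> int:
    """Count distinct filtered URLs for one category page.

    Each filter contributes (values + 1) options: one per value, plus "not set".
    The single case where no filter is set is the category page itself, so it
    is subtracted.
    """
    return prod(n + 1 for n in values_per_filter.values()) - 1


print(filter_url_count(filter_values))  # 13 * 9 * 6 * 41 - 1 = 28781
```

Under these assumptions, four modest filters already generate 28,781 additional URLs for one category page. Allowing several values per filter, or parameters in varying order, increases the number further.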
Crawl budget is also taken up by issues like duplicated content (the same page under different URLs), 404 error pages (pages that Google attempted to crawl but got a ‘Page not available’ error returned), and outdated content that no longer provides value.
Put simply, Google spreads its crawl budget across all of the pages / URLs on a website. If 50% of the URLs are pages of very little SEO value, Google is only able to spend roughly 50% of the overall crawl budget on pages that do provide enough value to be listed in organic search.
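A quick way to estimate this share for your own site is to classify a list of crawled URLs, for example from a crawler export or server logs. The sketch below is only an approximation: the rule for what counts as low value (any query parameter, or a path under /search) and the sample URLs are assumptions made for illustration and need adapting to the site in question.

```python
from urllib.parse import urlsplit


def is_low_value(url: str) -> bool:
    """Hypothetical rule: parameter URLs and internal search results are low value."""
    parts = urlsplit(url)
    return bool(parts.query) or parts.path.startswith("/search")


def low_value_share(urls) -> float:
    """Return the fraction of URLs that match the low-value rule."""
    urls = list(urls)
    if not urls:
        return 0.0
    return sum(is_low_value(u) for u in urls) / len(urls)


# Made-up sample of crawled URLs.
crawled = [
    "https://example.com/shoes/",
    "https://example.com/shoes/?colour=red&size=42",
    "https://example.com/search/boots",
    "https://example.com/about/",
]

print(f"{low_value_share(crawled):.0%} of crawled URLs are low value")
```

For this sample, two of the four URLs match the rule, so half of the crawl activity would go to pages that are unlikely to rank.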
How to Fix Crawling Issues
Many of these issues can be attributed to (mis)configurations in the server or CMS / shop system setup and can only be fixed by adjusting these. This applies to filters, internal search results, most cases of duplicated content, and 404 error pages. However, the way these issues are solved depends largely on the underlying SEO strategy and the technical capabilities of the servers and systems used.
IA-related issues usually require an audit of content structure and usage. Depending on the size of the gaps the audit reveals, a restructure of the IA should be considered.
Inefficient use of crawl budget usually represents an underlying structural problem that – once the root cause is fixed – causes fewer problems going forward. Given its effects on rankings and traffic, the crawlability of a website should be analysed as a matter of priority, and time and resources should be made available to fix identified issues.
SEO is an ongoing process. Fixing structural issues at the start will pave the way for better and faster SEO results in the long run.