Googlebot crawls web pages to keep Google’s index updated with new and fresh content. The speed at which it does so is the crawl rate, which is measured in URLs crawled per second.
When Googlebot visits your website, the crawl rate should be high enough for your new content to be discovered, but not so high that it overloads your server.
You can find the number of pages crawled per day in your Search Console reports.
Understanding Crawl Reports
Before we learn how to optimize our site’s crawl rate, we need to be able to read Google’s crawl reports.
Pages Crawled Per Day – Pages crawled per day is the most important of all crawl metrics. You want Google to crawl a large number of pages on your site every day. When this graph shows dips and spikes, you should be able to explain what caused them.
Kilobytes Downloaded Per Day – This metric tells you the total amount of data Google downloaded from your website on a particular day. This graph should correlate closely with Pages Crawled Per Day. If it does not, some of your pages may have grown much heavier.
Time Spent Downloading a Page – This is the time your web server takes to fulfill an HTTP request from Google. Ideally, you want this number to be as low as possible. If the time spent on each of these requests is high, Google might lower your site’s crawl rate.
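One way to check whether the two graphs still correlate is to divide kilobytes downloaded by pages crawled for each day and look for days that stand out. The sketch below assumes you have copied the daily figures out of the reports by hand; the function names, the sample numbers, and the 1.5x threshold are all illustrative choices, not anything Search Console provides.

```python
from statistics import median


def kb_per_page(pages_per_day, kb_per_day):
    """Average kilobytes downloaded per crawled page, one value per day."""
    return [kb / pages if pages else 0.0
            for pages, kb in zip(pages_per_day, kb_per_day)]


def flag_heavy_days(pages_per_day, kb_per_day, factor=1.5):
    """Return the indexes of days whose KB-per-page exceeds factor x the median.

    The factor is an arbitrary starting point; tune it to your own site.
    """
    ratios = kb_per_page(pages_per_day, kb_per_day)
    nonzero = [r for r in ratios if r > 0]
    if not nonzero:
        return []
    baseline = median(nonzero)
    return [day for day, r in enumerate(ratios) if r > factor * baseline]


# Hypothetical figures for four consecutive days.
pages = [1000, 1100, 900, 1000]
kilobytes = [20000, 22000, 18000, 60000]
print(flag_heavy_days(pages, kilobytes))
```

A flagged day only tells you that the average page got heavier; you still need to look at which pages were crawled that day to find the cause.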
Understanding Dips in crawl rate
There are several reasons your crawl reports may show dips in crawl rate. Here are a few of the most common ones.
- Broad rules in robots.txt – When you update your robots.txt file to block certain files and folders, search engines stop crawling those pages. If a blocked folder contains many URLs, none of them will be crawled, and this shows up as a dip in crawl rate. Recheck that the rule you have written matches only what you intended to block. If it is blocking a lot of junk URLs, then the dip is a good sign: crawling of your site has become more efficient.
- Uncrawlable content – If some of your pages are broken, or use a new layout or technology that is not search engine friendly, Google may no longer be able to crawl those URLs, leading to a dip in the graph.
- Time spent on an HTTP request – The longer your server takes to fulfill an HTTP request, the fewer pages are crawled per day. This is why you must work on improving your server response time.
- Google perceives your site as low quality – If your site is of low quality, search engines will tend to avoid it. Search engines prefer original content that adds value for users over scraped or low-quality content.
- You don’t add content regularly enough – Googlebot is always on the hunt for fresh and original content. If your pages are not updated frequently, Google visits your site less often and reduces its crawl rate.
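To recheck a robots.txt rule before it causes a dip, you can test it against a sample of URLs you care about. The sketch below uses Python's standard urllib.robotparser module; the helper name blocked_urls, the example.com URLs, and the sample rules are assumptions made for illustration. Note that this parser does plain prefix matching and does not implement Google's wildcard extensions, so treat the result as a rough check rather than a statement of how Googlebot will behave.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs.
robots_txt = """\
User-agent: *
Disallow: /search/
"""

sample = [
    "https://example.com/search/shoes",
    "https://example.com/blog/crawl-rate",
]

for url in blocked_urls(robots_txt, sample):
    print("blocked:", url)
```

If a URL you want crawled shows up in the output, the rule is broader than intended.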
How to improve your site crawl rate?
The crawl rate of your website is decided by Google. However, you can influence it by getting a few things right.
- Current pages in Google’s index – If Google already has a large number of your pages in its index, it may want to recrawl those pages to check for updated content. This results in a higher crawl rate.
- Links from other web pages – When your pages are linked from other web pages, both within your site and on external sites, Google tends to assign your site a higher crawl rate.
- Sitemaps – Provide both an HTML and an XML sitemap to make it easy for Google to reach your URLs. When Google receives a sitemap, it downloads it and crawls the listed URLs one by one. Listing all valid site URLs in your sitemap can lead to a higher crawl rate.
- Add content frequently – By keeping the content on your site updated and adding new pages regularly, you can make Googlebot visit your pages more frequently.
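As an illustration of the XML sitemap mentioned above, here is a minimal sketch that builds one with Python's standard library (Python 3.8 or later). The urlset, url, loc, and lastmod elements and the namespace come from the sitemaps.org protocol; the build_sitemap helper, the example.com URLs, and the dates are placeholders, and a real site would generate the list of pages from its own database or CMS.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as UTF-8 bytes) from (url, lastmod) pairs.

    lastmod is a 'YYYY-MM-DD' string, or None to omit the element.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Placeholder pages; list only valid, canonical URLs.
xml_bytes = build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/blog/", None),
])
print(xml_bytes.decode("utf-8"))
```

Keeping lastmod accurate matters more than including it everywhere, which is why the sketch allows it to be omitted.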
Understanding and solving Spikes in crawl rate
Spikes in your crawl rate can be good news if you have launched a batch of new pages and Google has taken note of them. At the same time, you don’t want to expose filters, search results, and other pages that offer no value to users or search engines.
Although growth in crawl rate can be good news for your search engine ranking, spikes can cause problems if they overload your server.
Here are a few measures you can take to keep crawl spikes at a healthy level.
- Use a 503 HTTP status code – If an increase in Google’s crawl rate is overloading your server, then your users might have a sluggish experience on your website. This may lead to a loss of traffic and revenue. In such a case, you can turn Googlebot away temporarily by serving it a 503 status code until your servers can take the load again.
- Set preferred max crawl rate – You can set the maximum Google crawl rate for your site in the site settings section of Search Console.
- Block junk folders and URLs using robots.txt – Check your log files to see whether Googlebot has started to crawl any junk folders or URLs. For example, you might have launched a new internal site search, and Google has started crawling its result URLs. Block all such folders using the robots.txt file.
- Special request form – If Google is crawling your site to the point where it hampers server performance and user experience, even after you have limited the maximum crawl rate, you can file a special request with Google.
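The 503 measure from the list above can be sketched as a small WSGI wrapper. This is only an illustration under several assumptions: overload_middleware and is_googlebot are made-up names, is_overloaded stands in for whatever load check your own stack offers, and matching on the User-Agent header is a simplification because that header can be spoofed. The Retry-After header is standard HTTP and hints at when the crawler may come back. Keep such a block short-lived, since serving 503 for a long stretch can itself hurt how your pages are crawled and indexed.

```python
def is_googlebot(user_agent):
    """Naive check based on the User-Agent header (which can be spoofed)."""
    return "googlebot" in (user_agent or "").lower()


def overload_middleware(app, is_overloaded, retry_after_seconds=3600):
    """Wrap a WSGI app so Googlebot gets a 503 while is_overloaded() is true.

    `is_overloaded` is a hypothetical callable you supply, for example one
    that compares current CPU load or queue depth against a threshold.
    """
    def wrapped(environ, start_response):
        if is_overloaded() and is_googlebot(environ.get("HTTP_USER_AGENT")):
            body = b"Service temporarily unavailable"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(retry_after_seconds)),
            ])
            return [body]
        # Regular visitors, and Googlebot under normal load, pass through.
        return app(environ, start_response)
    return wrapped
```

Any WSGI application can be wrapped this way, for example `app = overload_middleware(app, my_load_check)`, where my_load_check is your own function.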