If content on your site is being misunderstood or ignored by web robots, it’s not going to rank in the SERPs. Unfortunately, you can’t just accuse the bots of being rude or dumb. You have to actually do something to fix it.
Not sure how to check your robots.txt file? We’ve got you.
Follow these simple steps for checking your robots.txt file to ensure important content on your website is being correctly crawled, indexed and enjoyed* by web bots.
*The web bots in question may or may not have the sentient capabilities to enjoy your content.
What is a robots.txt file?
Your robots.txt file tells web robots, or search engine spiders, which parts of your site they can and can’t crawl. That, in turn, shapes what content can show up in the search engine results.
A typical robots.txt file looks something like this:
Or, if you’re a cool cat like Taylor Swift, your robots.txt file might look like this instead:
Reviewing the robots.txt file is crucial for ensuring important information on your website isn’t being ignored by web robots. If your robots.txt file is telling bots not to crawl a page, they can’t read what’s on it, and search engine users are unlikely to find it.
How to find a site’s robots.txt file
- Put the path /robots.txt directly after the root domain.
- Example: The robots.txt file of example.com can be found at example.com/robots.txt.
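If you’d rather not type the address by hand, the rule above can be sketched in a few lines of Python. This is a minimal sketch, not an official tool: the function name robots_txt_url is made up for illustration, and it assumes https when you pass in a bare domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    # Assumption: a bare domain such as "example.com" is served over https.
    if "://" not in page_url:
        page_url = "https://" + page_url
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap in /robots.txt, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("example.com"))
print(robots_txt_url("https://example.com/blog/jan-2017/amazing-fake-blog-post/"))
```

Whatever page you start from, the result points at the root of the domain, because that is the only place bots look for the file.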
What to look for in a robots.txt file
- Pages or resources that are currently blocked from robots (statement = Disallow) or explicitly unblocked (statement = Allow). Check that nothing is blocked that shouldn’t be, and vice versa.
- What robots these rules apply to (statement = User-agent)
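For a long file, it can help to list which rules belong to which robot. The sketch below is one rough way to do that in Python. It is a simplified reader written for this post, not a full parser: the function name summarise_robots and the BadBot entry in the sample are hypothetical, and it only looks at User-agent, Allow and Disallow lines.

```python
def summarise_robots(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each user-agent to the (directive, path) rules that apply to it."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []   # user-agents of the group being read
    reading_agents = False    # True while consecutive User-agent lines continue
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current = []  # a new group of rules starts here
            current.append(value)
            groups.setdefault(value, [])
            reading_agents = True
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current:
                groups[agent].append((field, value))
    return groups


# Hypothetical sample file, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /blog/
Allow: /blog/jan-2017/amazing-fake-blog-post/

User-agent: BadBot
Disallow: /
"""

for agent, rules in summarise_robots(SAMPLE).items():
    print(agent, rules)
```

Reading the output group by group makes it easier to spot a disallow that applies to every bot when you only meant it for one.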
What a robots.txt file usually contains
- User agents: this names any robots that are referenced in the robots.txt file.
When you’re allowing or disallowing pages from being crawled, you’ll typically want to do so for all bots. In a robots.txt file, ‘all bots’ is represented by an asterisk (*). So to apply blocks to all bots, begin your chain of commands with user-agent: *.
If you have a vendetta against a particular bot (maybe it was mean to you in high school), you can start your chain of commands with user-agent: the name of that specific bot. But there aren’t too many situations where you would need to do this.
- Allow/disallow commands: These tell the bots which pages/folders they should or shouldn’t crawl. You can apply these commands to a specific page (e.g. /blog/jan-2017/amazing-fake-blog-post/), to a folder (e.g. /blog/), or even to your whole site (just use ‘/’, though you probably don’t want to disallow that!).
Quick hint: Wondering why you would need to ‘allow’ a page when all pages are crawlable by default? Well, you might have a whole folder that you want to block but one specific page in that folder you want bots to crawl. Using the ‘allow’ command means you can override the ‘disallow’ for that one special page that you love so much for some reason.
One more tip: You can disallow a whole topic from being crawled by using an asterisk (multi-talented little punctuation mark, isn’t it?). For example, the command disallow: /*seo would tell bots that understand wildcards (the major search engines do) to skip any URL that contains the term ‘seo’. This can be quite handy if you’ve got a lot of themed pages you need to block but not enough time to track them down one by one.
- Sitemap: At the end of your robots.txt file, you should tell the bots where to find your website’s sitemap (e.g. http://www.example.com/sitemap.xml).
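To see how allow, disallow and the asterisk interact, here is a small Python sketch. It assumes the longest-match precedence described in RFC 9309 and in Google’s documentation (the most specific matching rule wins, and allow wins a tie); other bots may behave differently. The function names and the sample rules are made up for illustration.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # a trailing $ pins the end of the URL
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then turn every * into "any characters".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching rule wins; allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty rule matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical rules: block the blog, rescue one post, block anything with "seo".
RULES = [
    ("disallow", "/blog/"),
    ("allow", "/blog/jan-2017/amazing-fake-blog-post/"),
    ("disallow", "/*seo"),
]

for path in ["/blog/jan-2017/amazing-fake-blog-post/", "/blog/feb-2017/other-post/",
             "/services/seo-audit/", "/about/"]:
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

In this sketch the one special blog post stays crawlable because its allow rule is longer, and therefore more specific, than the disallow on the folder around it.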
Here’s an example we’ve zoomed in on and lovingly illustrated in Paint:
How to test a robots.txt file
Some robots.txt files are simple enough to check by hand: read through the file line by line and look for potential issues.
For a longer and more complicated robots.txt file, use a testing tool. A couple of our favourite tools for this include:
- Google Search Console Robots.txt Checker
- Technical SEO Robots.txt Testing Tool
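If you’re comfortable with a little code, Python’s standard library also ships a basic checker, urllib.robotparser. The sketch below is one possible way to use it, with a made-up sample file and a made-up helper name (check_urls). Two caveats: this parser applies the first matching rule in file order rather than the most specific one, and it doesn’t expand wildcards, so its answers can differ from what Googlebot would do. That is why the allow line comes first in the sample.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Allow: /blog/jan-2017/amazing-fake-blog-post/
Disallow: /blog/

Sitemap: http://www.example.com/sitemap.xml
"""


def check_urls(robots_text: str, user_agent: str, urls: list[str]) -> dict[str, bool]:
    """Report, for each URL, whether user_agent may crawl it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())  # parse text instead of fetching
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_urls(SAMPLE, "Googlebot", [
    "http://www.example.com/blog/",
    "http://www.example.com/blog/jan-2017/amazing-fake-blog-post/",
    "http://www.example.com/about/",
])
for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```

To check a live site instead of a pasted sample, the same class can fetch the file itself: call set_url() with the robots.txt address, then read(), before asking can_fetch().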
While the thought of taking on a cluster of robot spiders may sound intimidating, checking your robots.txt file is actually pretty simple. It can also have a huge impact on your chances of ranking if you’ve currently got some disallows in place that you don’t actually want.
Take a look at our other quick SEO wins to boost your website even further.