If you are a website owner or developer, you will want your website to appear as high in Google's search results as possible.
An extremely important part of this process is allowing your website pages to be crawled and indexed by search engine bots (robots).
Behind the scenes, there are two different files that help to give these bots the information they need to quickly and effectively read your website:
- Robots.txt file
- XML sitemap
What is a robots.txt file?
A robots.txt file is a simple text file located in the root directory of your website. Its contents tell search engine robots which pages to crawl (and which pages not to crawl), and also which search engines have permission to crawl your site. This file matters because when a search engine bot enters your site, it will look for your robots.txt before doing anything else. Even if you want a bot to crawl all of your pages, it is still good practice to have a default robots.txt file.
What content is included within a robots.txt file?
User-agent: [user-agent name e.g. ‘Googlebot’]
Disallow: [URL path not to be crawled e.g. /non-public/]
The two simple lines above represent a complete robots.txt file. However, many more user-agent lines and directives can be added to give specific instructions to each bot.
If you want your robots file to allow all user-agents to search all pages, your file would look like this:
User-agent: *
Disallow:
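To see how a bot might interpret these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `/non-public/` path and the `is_allowed` helper are illustrative assumptions, not part of any real site, and real crawlers may apply additional rules of their own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block every bot from /non-public/ only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /non-public/
"""

# The "allow everything" file described above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked_page = "http://www.example.com/non-public/page.html"
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/"))  # True
print(is_allowed(ROBOTS_TXT, "Googlebot", blocked_page))               # False
print(is_allowed(ALLOW_ALL, "Googlebot", blocked_page))                # True
```

An empty `Disallow:` value blocks nothing, which is why the last check passes.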
What is an XML sitemap?
A sitemap is an XML file that lists all of the webpages on your site along with metadata (information that relates to each URL). Like a robots.txt file, a sitemap helps search engines read your site: it gives them an index of all your webpages in one place.
How to create a robots.txt file with sitemap location
Creating a robots.txt file which includes your sitemap location can be achieved in three steps.
Step 1: Locate your sitemap URL
If you or your developer have already created a sitemap then it is likely that it will be located at http://www.example.com/sitemap.xml, where ‘example’ is replaced by your domain name.
To check if your sitemap is located here, simply type that URL into a browser and you will either see the sitemap or a 404 error which means it does not exist in that location.
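If you would rather script this check than use a browser, the sketch below shows one possible approach with Python's standard library. The `sitemap_url` and `url_exists` helpers are hypothetical names, and the code assumes the server answers `HEAD` requests (some servers do not, in which case a normal `GET` would be needed).

```python
import urllib.error
import urllib.request


def sitemap_url(domain: str) -> str:
    """Build the conventional sitemap location for a domain."""
    return f"http://{domain}/sitemap.xml"


def url_exists(url: str, opener=urllib.request.urlopen) -> bool:
    """Return True on a 2xx response, False on a 404 or other HTTP error.

    Network failures (DNS errors, timeouts) are not caught and will raise.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=10) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        return False


# Example call (performs a real network request, so it is left commented out):
# print(url_exists(sitemap_url("www.example.com")))
```

The `opener` parameter exists only so the function can be exercised without a network connection.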
Alternatively, you can use Google to locate your sitemap using search operators. Simply type site:example.com filetype:xml in Google’s search bar to see if Google finds it.
If you can’t find your sitemap, it may not exist. In this case, you can generate a sitemap yourself or ask a developer to produce one for you.
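As a rough illustration of generating a sitemap yourself, the sketch below builds a minimal one with Python's standard library. The page URLs and dates are made up, `build_sitemap` is a hypothetical helper, and `lastmod` is just one example of the optional per-URL metadata.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Return sitemap XML for (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; replace with the real URLs of your site.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2024-01-01"),
    ("http://www.example.com/about", "2024-01-15"),
])
print(sitemap_xml)
```

Saving the returned string as `sitemap.xml` in your site's root directory would put it at the location described in Step 1.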
Step 2: Locate your robots.txt file
As with your sitemap, you can check whether a robots.txt file already exists on your website by typing http://www.example.com/robots.txt into a browser, where ‘example’ is replaced by your domain name.
If you don’t have a robots.txt file then you will need to create one and ensure that it has been added to the top-level directory (root) of your web server.
Simply create a .txt file and include the following text:
User-agent: *
Disallow:
The above text allows all bots to crawl all your content.
Step 3: Add sitemap location to robots.txt file
Finally, you need to add your sitemap location to your robots.txt file.
To do so, you need to edit your robots.txt file and add a directive with the URL of your sitemap, as shown below:
Sitemap: http://www.example.com/sitemap.xml
And now your robots file should look like this:
Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Disallow:
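If you prefer to generate the file rather than edit it by hand, the three steps can be sketched in Python as follows. `build_robots_txt` is a hypothetical helper, and the output keeps the allow-all rules from Step 2; adjust them if you need to block any paths.

```python
def build_robots_txt(sitemap_urls: list[str]) -> str:
    """Return robots.txt content that lists the sitemaps and allows all bots."""
    lines = [f"Sitemap: {url}" for url in sitemap_urls]
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


content = build_robots_txt(["http://www.example.com/sitemap.xml"])
print(content)

# To publish it, write the text to robots.txt in your site's root directory:
# with open("robots.txt", "w", encoding="utf-8") as f:
#     f.write(content)
```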
How to create a robots.txt file with multiple sitemap locations
Some larger websites have more than one sitemap to index all of their pages, or a site may have multiple sub-sections and group its pages into several sitemaps to make things more manageable. In this case, you can create a “sitemap of sitemaps”, known as a sitemap index file.
The formatting of this file is similar to a standard XML sitemap file.
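To make the similarity concrete, here is a minimal sketch that builds a sitemap index with Python's standard library. The element names follow the sitemaps.org protocol; the sitemap URLs and the `build_sitemap_index` helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Return sitemap index XML that points at each individual sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


index_xml = build_sitemap_index([
    "http://www.example.com/sitemap_1.xml",
    "http://www.example.com/sitemap_2.xml",
])
print(index_xml)
```

Where a standard sitemap wraps `url` entries in a `urlset`, the index wraps `sitemap` entries in a `sitemapindex`.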
When you have multiple sitemaps, you can either specify your sitemap index file URL within your robots.txt file:
Sitemap: http://www.example.com/sitemap_index.xml
User-agent: *
Disallow:
Alternatively, you can specify each individual sitemap file URL as a list:
Sitemap: http://www.example.com/sitemap_1.xml
Sitemap: http://www.example.com/sitemap_2.xml
User-agent: *
Disallow:
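As a quick sanity check that every sitemap line is picked up, the sketch below reads them back with Python's standard-library `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later). `listed_sitemaps` is a hypothetical helper, and real search engines may parse the file differently.

```python
from urllib.robotparser import RobotFileParser

# The multi-sitemap robots.txt from the example above.
ROBOTS_TXT = """\
Sitemap: http://www.example.com/sitemap_1.xml
Sitemap: http://www.example.com/sitemap_2.xml
User-agent: *
Disallow:
"""


def listed_sitemaps(robots_txt: str) -> list[str]:
    """Return every sitemap URL declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


print(listed_sitemaps(ROBOTS_TXT))
# ['http://www.example.com/sitemap_1.xml', 'http://www.example.com/sitemap_2.xml']
```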
We hope that this article has given you insight into how both the sitemap and robots files are used by search engines to crawl the pages of your website. We urge anyone looking to seriously improve their SEO to implement both of these files on their website. Without them, you risk lagging behind your competitors.