08.21.2019

How to add a sitemap to your robots.txt file

As a web developer and webmaster, you want your site to be seen in search results. To achieve that, you need your website and its pages to be crawled and indexed by search engine bots (robots). Two files help bots find what they need: robots.txt and the sitemap.

What is a robots.txt file?

Robots.txt is a plain text file placed in your site’s root directory. It tells search engine robots what to crawl and what not to crawl on your site. It also contains directives that specify which search engine robots are allowed to crawl and which are not.
Search bots usually look for the robots.txt file as soon as they arrive at a website, so it is important to have one in the first place. Even if you want all search robots to crawl every page on your site, it is worth having a default robots.txt that allows this. Please read our beginner’s guide on robots.txt if you want to learn more.
Robots.txt can also carry one more important piece of information: the location of your sitemaps. This post elaborates on that feature. But first, let’s see what a sitemap is and why it is important.

What is a sitemap?

A sitemap is an XML file that lists the webpages on your site. It may also contain additional information about each URL in the form of metadata. Just like robots.txt, a sitemap is a must-have: it helps search engine bots discover, crawl and index all the webpages on a site.

Learn more about the basics of XML sitemaps in one of our previous posts.

How To Create Robots.txt File With Sitemap Location?

Step 1: Locate Your Sitemap URL
If your website was built by a third-party developer, first check whether they provided your site with a sitemap. The sitemap URL usually looks like this: http://www.example.com/sitemap.xml
Type this URL into your browser with your own domain in place of "example".
You can also locate your sitemap through Google search by using search operators, as in the examples below:
site:example.com filetype:xml
OR
filetype:xml site:example.com inurl:sitemap
But this will only work if your site is already crawled and indexed by Google.
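As a rough illustration of this step, the Python sketch below builds the sitemap URLs worth trying for a given site. The function name candidate_sitemap_urls and the second file name, sitemap_index.xml, are assumptions made for illustration; only /sitemap.xml comes from the text above.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: a short list of file names to try. Only /sitemap.xml is taken
# from the article; /sitemap_index.xml is a hypothetical extra candidate.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemap_urls(site_url):
    """Return likely sitemap URLs at the root of the given site."""
    # Accept bare domains such as "www.example.com" by assuming http://.
    if "://" not in site_url:
        site_url = "http://" + site_url
    parts = urlsplit(site_url)
    # Keep only the scheme and host; any page path or query is discarded.
    return [
        urlunsplit((parts.scheme, parts.netloc, path, "", ""))
        for path in COMMON_SITEMAP_PATHS
    ]
```

Calling candidate_sitemap_urls("www.example.com") gives a list whose first entry is http://www.example.com/sitemap.xml, which you can then open in your browser.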
Step 2: Locate Your Robots.txt File
You can check whether your site has a robots.txt file by typing domain.com/robots.txt into your browser, with your own domain in place of "domain.com".
If you do not have a robots.txt file, you will have to create one and add it to the top-level directory (root directory) of your web server, which requires access to the server. It usually goes in the same place as your site’s main “index.html”. The exact location depends on the web server software you use. Ask a web developer for help if you are not familiar with these files.
Use all lower case for the file name: robots.txt. Do not use Robots.TXT or Robots.Txt.
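If you have file access to the server, creating the file can be sketched in Python as follows. This is a minimal sketch that assumes you already know the path to your web root; the function name ensure_robots_txt and the constant DEFAULT_ROBOTS are hypothetical names, not part of any tool.

```python
from pathlib import Path

# An allow-all default: every bot may crawl everything.
DEFAULT_ROBOTS = "User-agent: *\nDisallow:\n"


def ensure_robots_txt(web_root):
    """Create an allow-all robots.txt in web_root if none exists.

    Returns the path to the file. An existing file is left unchanged.
    """
    # The file name is written in all lower case, as crawlers expect.
    path = Path(web_root) / "robots.txt"
    if not path.exists():
        path.write_text(DEFAULT_ROBOTS, encoding="utf-8")
    return path
```

The function deliberately never overwrites an existing file, so rules that someone else has already written are not lost.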
Step 3: Add Sitemap Location To Robots.txt File
Now, open robots.txt at the root of your site. Again, you need access to your web server to do so. If you do not know how to locate and open your site’s robots.txt file, ask a web developer to do it for you.
To facilitate auto-discovery of your sitemap file through your robots.txt, all you have to do is place a directive with the URL in your robots.txt, as shown in the sample below:
Sitemap: http://www.example.com/sitemap.xml
So, the robots.txt file looks like this:
Sitemap: http://www.example.com/sitemap.xml
User-agent:*
Disallow:

NOTE: The directive containing the sitemap location can be placed anywhere in the robots.txt file. It is independent of the user-agent line, so it does not matter where it is placed.
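The edit described in this step can also be scripted. The sketch below is one possible approach, not an official tool: add_sitemap_directive and listed_sitemaps are hypothetical helper names, and the check relies on site_maps() from Python's standard urllib.robotparser module, which requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def add_sitemap_directive(robots_text, sitemap_url):
    """Return robots_text with a 'Sitemap:' line for sitemap_url.

    The text is returned unchanged if the URL is already listed.
    """
    for line in robots_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_text  # already present, nothing to do
    # Place the directive first, with a blank line before the existing rules.
    return "Sitemap: " + sitemap_url + "\n\n" + robots_text


def listed_sitemaps(robots_text):
    """Return the sitemap URLs that Python's robots.txt parser finds."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when empty
```

For example, passing "User-agent: *\nDisallow:\n" and http://www.example.com/sitemap.xml to add_sitemap_directive yields a file that starts with the Sitemap line, and listed_sitemaps on the result returns a list containing that one URL.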

What if you have more than one sitemap (multiple sitemaps)?

A single sitemap can contain no more than 50,000 URLs, so a larger site with many URLs can split them across multiple sitemap files. You can list the locations of these files in a sitemap index file. The XML format of the sitemap index file is similar to that of a sitemap file; in effect, it is a sitemap of sitemaps.
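To make the idea of a sitemap index concrete, here is a minimal Python sketch that splits a URL list into groups of at most 50,000 and writes an index pointing at the resulting sitemap files. The helper names chunk_urls and build_sitemap_index are hypothetical; the element names and namespace follow the sitemaps.org protocol. Optional fields such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # per-file limit mentioned in the article


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of page URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML listing each sitemap file location."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")
```

Each group returned by chunk_urls would become one sitemap file, and the URLs of those files are what you pass to build_sitemap_index.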
When you have multiple sitemaps, you can either specify your sitemap index file URL in your robots.txt file as shown in the example below:
Sitemap: http://www.example.com/sitemap_index.xml
User-agent:*
Disallow:

Or, you can specify individual URLs of your multiple sitemap files, as shown in the example below:
Sitemap: http://www.example.com/sitemap_host1.xml
Sitemap: http://www.example.com/sitemap_host2.xml
User-agent:*
Disallow:

Finally, there is one thing to watch for when adding the Sitemap directive to the robots.txt file.
Generally, the ‘Sitemap’ directive and its sitemap URL can be added anywhere in the robots.txt file. In some cases, however, this has been known to cause parsing errors. About a week after you update your robots.txt file with your sitemap location, check Google Search Console (formerly Google Webmaster Tools) for any such errors.
To avoid these errors, it is recommended that you leave a blank line after the sitemap URL.
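Before uploading the file, you can also run a quick local check for obvious formatting slips. The sketch below is a simplified checker written for this article, not the parser Google uses: the function name lint_robots_txt and the small set of recognized field names are assumptions, and it only flags a missing colon, an unknown field, or a sitemap URL that is not absolute.

```python
from urllib.parse import urlsplit

# Assumption: a minimal set of field names; real crawlers may accept more.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap"}


def lint_robots_txt(text):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and padding
        if not line:
            continue  # blank lines are fine
        field, sep, value = line.partition(":")
        name = field.strip().lower()
        if not sep:
            problems.append((number, "missing ':' after field name"))
        elif name not in KNOWN_FIELDS:
            problems.append((number, "unrecognized field: " + field.strip()))
        elif name == "sitemap":
            parts = urlsplit(value.strip())
            if parts.scheme not in ("http", "https") or not parts.netloc:
                problems.append((number, "sitemap URL should be absolute"))
    return problems
```

An empty list means none of these particular problems were found; it does not guarantee that every crawler will read the file the same way.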
I hope this article can help your website.
Source WooRank
