Can someone tell me what the robots.txt file actually does and how it is used?
To stop search engines and other crawlers from crawling URLs you do not want crawled. (You can also, for example, set a Crawl-delay to prevent well-behaved crawlers from overloading your web server.)
You can also use it to point crawlers to an XML sitemap.
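For instance, a robots.txt combining both ideas might look like the sketch below (the /private/ path and the sitemap URL are placeholders; Crawl-delay is a non-standard directive that some crawlers such as Bingbot honour, while Googlebot ignores it):
User-agent: *
Disallow: /private/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml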
The robots.txt file is used to tell search engines where they may crawl on your site. Of course, only well-behaved search engines follow it; malicious crawlers will ignore your robots.txt and crawl whatever they want.
In my view, robots.txt files are used to tell search engines whether they have permission to crawl a page; there are several directives for this, as sketched below.
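A rough sketch of those directives (the paths are placeholders; Allow is an extension honoured by the major crawlers rather than part of the original robots.txt standard):
User-agent: *
Disallow: /private/
Allow: /private/public-page.html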
Robots.txt is a text file that tells search engine spiders, also known as search engine robots, which parts of your website they can enter and which parts they can't.
A robots.txt file on a website functions as a request that the specified robots ignore the specified files or directories when crawling. It is divided into sections by each crawler's User-agent name. For example, the following section applies to all robots and, because its Disallow directive is empty, allows them to crawl everything:
User-agent: *
Disallow:
Simply put, robots.txt is a very simple text file placed in the root directory of the site, for example www.example.com/robots.txt. This file tells search engines and other robots which areas of the site they are allowed to visit and index.
For Google's page crawler specifically, you address the Googlebot user agent.
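A minimal sketch, assuming a hypothetical /admin/ folder you want to keep Googlebot out of:
User-agent: Googlebot
Disallow: /admin/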
The robots.txt file is used to instruct search engine robots about which pages on your website should be crawled and consequently indexed. Most websites have files and folders that are not relevant to search engines (such as images or admin files), so creating a robots.txt file can actually improve how your website is indexed.
A robots.txt is a simple text file that can be created with Notepad. If you are using WordPress, a sample robots.txt file would be:
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
If you are not using WordPress, just replace the Disallow lines with the files or folders on your website that should not be crawled, for instance:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any other folder to be excluded/
Robots.txt lets you specify which pages should not be crawled. Note, however, that pages blocked from crawling can still rank for keywords and show up in search results if other pages link to them; to keep a page out of the index entirely, a noindex meta tag is the usual tool.
Robots.txt is a file used to exclude content from the crawling process of search engine spiders/bots. Search engines index web pages, but there may be some content that we don't want crawled and indexed; the main aim is to keep that content out of the index.
Robots.txt is a very useful and helpful file. When a crawler visits a website, it reads robots.txt first. Robots.txt is widely used to instruct the crawler which pages should be crawled and which should not.