What is a robots.txt file and what is it used for?

Started by anandiphone, September 02, 2011, 08:39:39 AM

anandiphone

Kindly tell me what a robots.txt file is and what it is actually used for?

Webhelpforums

It limits search engines and other crawlers from crawling URLs you do not want crawled. (You can also, for example, set a Crawl-delay to prevent well-behaved crawler robots from overloading your web server.)

You can also use it to point to an XML sitemap.
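A minimal sketch combining both ideas (the domain and sitemap path are placeholders; note that Crawl-delay is a nonstandard directive that only some crawlers honor):

User-agent: *
Crawl-delay: 10
Sitemap: https://www.example.com/sitemap.xml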
TechSEO360 | MicrosysTools.com  | A1 Sitemap Generator, A1 Website Analyzer etc.

evelyn8411

Robots.txt is used to direct search engines about where they can crawl on your site. Of course, only well-behaved search engines follow it; malicious crawlers will ignore your robots.txt and crawl whatever they want.

John23

In my point of view, robots.txt files are used to tell search engines whether to index or follow a page. There are several instructions available for this.

alvypetersen

Robots.txt is a text file that tells search engine spiders, also known as search engine robots, which parts of your website they can enter and which parts they can't.

dpfocanil

A robots.txt file on a website functions as a request that specified robots ignore specified files or directories when crawling. It is divided into sections by the crawler's User-agent name.

User-agent: *
Disallow:
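For example, a file with a separate section per crawler might look like this (the paths are purely illustrative):

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:

Each crawler uses the section whose User-agent line matches it best; the * section applies to everyone else.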

hannahraasch

Simply put, robots.txt is a very simple text file placed in the root directory of your site, for example www.example.com/robots.txt. This file tells search engines and other robots which areas of your site they are allowed to visit and index.


Nichole Green

The robots.txt file is used to instruct search engine robots about which pages on your website should be crawled and consequently indexed. Most websites have files and folders that are not relevant for search engines (like images or admin files), so creating a robots.txt file can actually improve how your website is indexed.
A robots.txt is a simple text file that can be created with Notepad. If you are using WordPress a sample robots.txt file would be:
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
If you are not using WordPress just substitute the Disallow lines with files or folders on your website that should not be crawled, for instance:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any other folder to be excluded/
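If you want to check what such a file actually blocks, Python's standard library includes urllib.robotparser, which tests URLs against a set of rules. A small sketch using the WordPress example above (example.com is a placeholder domain):

```python
from urllib import robotparser

# Rules matching the WordPress example above.
rules = """User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Paths starting with a disallowed prefix are blocked...
print(parser.can_fetch("*", "https://example.com/wp-admin/"))  # False
# ...while ordinary content pages remain crawlable.
print(parser.can_fetch("*", "https://example.com/about/"))     # True
```

This is handy for sanity-checking a robots.txt before uploading it, since a typo in a Disallow line can accidentally block your whole site.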

Black Ninja

Robots.txt allows you to specify which pages should not be crawled. Pages that don't get crawled can still rank for keywords and show up in search results.

infoanil

Robots.txt is a file used to exclude content from the crawling process of search engine spiders/bots. Web pages are indexed by search engines, but there may be some content we don't want crawled and indexed, and robots.txt is how we ask crawlers to skip it.

LucieJason

Robots.txt is a very useful and helpful file. When a crawler crawls a website, it reads robots.txt first. Robots.txt is widely used to instruct the crawler which pages should be crawled and which should not.
