Robots.txt

Robots.txt is a text file placed in the root directory of a website. It is used to control which pages are visited and indexed by a robot or web spider.

Only robots which comply with the Robots Exclusion Standard will follow the instructions contained in this file.

The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention that asks co-operating web spiders and other web robots not to access all or part of a website which is otherwise publicly viewable.

Robots are often used by search engines to categorise and archive websites, or by webmasters to proofread source code. A robots.txt file on a website functions as a request that the specified robots ignore the specified files or directories when they crawl the site.

A site owner might do this, for example, to keep certain pages out of search engine results, or because the content of the selected directories might be misleading or irrelevant to the categorisation of the website as a whole.
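Because robots.txt is only a request, the check has to happen inside the robot itself. The following is a minimal sketch of how a co-operating robot might consult the rules before fetching a page, using the urllib.robotparser module from the Python standard library. The robot names ExampleBot and OtherBot, the /drafts/ directory and the helper is_allowed are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks one named robot to stay out of /drafts/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /drafts/
"""


def is_allowed(robots_txt, agent, path):
    """Return True if the rules in robots_txt permit agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is made in this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# A polite robot runs this check itself; nothing on the server enforces it.
for agent in ("ExampleBot", "OtherBot"):
    for path in ("/drafts/plan.html", "/index.html"):
        verdict = "may fetch" if is_allowed(ROBOTS_TXT, agent, path) else "should skip"
        print(agent, verdict, path)
```

A robot that never performs this check is not stopped by the file, which is why only compliant robots are affected.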

Example robots.txt files 

This example allows all robots to visit all files: the wildcard "*" matches every robot, and the empty Disallow line excludes nothing:

User-agent: *
Disallow:

This example keeps all robots out:

User-agent: *
Disallow: /
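The two files above differ only in the value after Disallow. As a rough check of how a compliant parser reads them, the sketch below feeds both to Python's standard urllib.robotparser module. The robot name ExampleBot, the path /index.html and the helper can_visit are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow: nothing is excluded
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" is a prefix of every path


def can_visit(robots_txt, path, agent="ExampleBot"):
    """Return True if a compliant robot named agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(can_visit(ALLOW_ALL, "/index.html"))  # True
print(can_visit(BLOCK_ALL, "/index.html"))  # False
```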

This example tells all crawlers to stay out of four directories of a website:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
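To see which pages the four-directory file above would exclude, a list of candidate paths can be sorted into allowed and blocked groups. This is a sketch only, again using urllib.robotparser from the Python standard library; the sample paths, the robot name ExampleBot and the helper split_by_access are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""


def split_by_access(robots_txt, paths, agent="ExampleBot"):
    """Split paths into (allowed, blocked) lists according to robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = [p for p in paths if parser.can_fetch(agent, p)]
    blocked = [p for p in paths if not parser.can_fetch(agent, p)]
    return allowed, blocked


# Hypothetical pages on the site.
paths = [
    "/index.html",
    "/images/logo.png",
    "/private/notes.html",
    "/about/team.html",
]
allowed, blocked = split_by_access(ROBOTS_TXT, paths)
print("allowed:", allowed)
print("blocked:", blocked)
```

Each Disallow value is matched as a prefix of the requested path, so everything beneath a listed directory is excluded while the rest of the site stays open to crawlers.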

©1997-2008 EQ MEDIA Limited. All Rights Reserved. Website Design, Development and Marketing.
Designated trademarks and brands appearing in this website are the sole property of their respective owners.
Use of the EQ MEDIA website www.eqmedia.co.uk constitutes acceptance of the EQ MEDIA Limited Terms of Use and Privacy Policy.