Robots.txt
A text file placed in the root directory of a website, used to control which pages a robot or web spider may visit. Only robots that comply with the Robots Exclusion Standard will follow the instructions contained in this file. The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention for asking co-operating web spiders and other web robots not to access all or part of a website that is otherwise publicly viewable.
Robots are often used by search engines to categorise and archive web sites, or by webmasters to proofread source code. A robots.txt file on a website functions as a request that specified robots ignore specified files or directories in their search. This might be, for example, for privacy from search engine results, or because of a belief that the content of the selected directories might be misleading or irrelevant to the categorisation of the website as a whole.

Example robots.txt files

This example allows all robots to visit all files because the wildcard "*" specifies all robots:

    User-agent: *
    Disallow:

This example keeps all robots out:

    User-agent: *
    Disallow: /

The next example tells all crawlers not to enter four directories of a website:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: /tmp/
    Disallow: /private/
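
To see how a compliant robot might interpret rules like these, the following is a minimal sketch using Python's standard-library urllib.robotparser module. The robot name "ExampleBot", the host www.example.com, and the helper build_parser are assumptions made for illustration only; they do not come from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# The four-directory example, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up robot name; the "*" record applies to it.
for path in ("/index.html", "/private/notes.html", "/images/logo.png"):
    allowed = parser.can_fetch("ExampleBot", "https://www.example.com" + path)
    print(path, "allowed" if allowed else "disallowed")
```

A well-behaved robot would run a check of this kind before requesting each page. Nothing enforces the result: a robot that ignores the file can still fetch every publicly viewable URL.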