By definition, a robots.txt file is a text file placed in the root directory of a website that tells search engine crawlers which files and directories not to crawl.
In theory, a search engine robot (bot, spider, etc.) wants to visit a URL such as http://www.s2rsolutions.com/free-instant-website-report/. Before it does so, it first checks for http://www.s2rsolutions.com/robots.txt, and finds:
In that file, a “User-agent: *” line means the section that follows applies to all robots. Each “Disallow:” line is followed by a URL path within the same website (http://www.s2rsolutions.com) that the robot should NOT crawl.
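The check described above can be sketched in Python with the standard library's urllib.robotparser module. The rules in ROBOTS_TXT, the /client-login/ and /private/ paths, and the "ExampleBot" user agent are all hypothetical and are not the site's real configuration, so treat this as an illustration under those assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the site's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /client-login/
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "http://www.s2rsolutions.com"

# "ExampleBot" is an assumed crawler name; it falls under the "*" section.
allowed_report = parser.can_fetch("ExampleBot", base + "/free-instant-website-report/")
allowed_login = parser.can_fetch("ExampleBot", base + "/client-login/")

print("report page allowed:", allowed_report)
print("login page allowed:", allowed_login)
```

A well-behaved crawler would run a check like can_fetch before requesting each URL and skip any URL for which it returns False.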
If necessary, a site can disallow all of its pages. The “Disallow: /” line tells the robot that it should not visit any pages on the site.
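As a small sketch of the difference a single slash makes, the Python snippet below compares “Disallow: /” with an empty “Disallow:” line using the standard library's urllib.robotparser. The "ExampleBot" user agent and the /products/ path are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

BASE = "http://www.s2rsolutions.com"
# "/products/" is a hypothetical path used only for this comparison.
PATHS = ["/", "/products/", "/free-instant-website-report/"]


def check_all(rules):
    """Return {path: allowed} for the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(rules)
    return {path: parser.can_fetch("ExampleBot", BASE + path) for path in PATHS}


# "Disallow: /" matches every path, so nothing may be crawled.
block_everything = check_all(["User-agent: *", "Disallow: /"])

# An empty "Disallow:" value matches nothing, so everything may be crawled.
block_nothing = check_all(["User-agent: *", "Disallow:"])

print(block_everything)
print(block_nothing)
```

Because the two files differ by one character, it is worth double-checking which form is deployed before publishing a robots.txt file.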
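Two important notes when using /robots.txt:
Robots can ignore your robots.txt file. Following it is voluntary, and malicious crawlers often do not.
The robots.txt file is publicly readable, so it should not be used to “hide” information or pages.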
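One reason for this caution is that anyone can download a site's robots.txt file and read the paths listed in it. The Python sketch below pulls the Disallow paths out of some robots.txt text; the sample rules and the function name listed_paths are hypothetical and exist only for this illustration.

```python
from typing import List

# Hypothetical robots.txt content; the paths are made up for this example.
SAMPLE = """\
User-agent: *
Disallow: /client-login/
Disallow: /secret-reports/  # listing a path here announces that it exists
Disallow:
"""


def listed_paths(robots_txt: str) -> List[str]:
    """Collect every non-empty Disallow value, ignoring comments."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


exposed = listed_paths(SAMPLE)
print(exposed)
```

Anything that truly needs to stay private is better protected with a login or other access control than with a robots.txt entry.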
Why might you want search engines to NOT crawl a certain URL within your site? Of course you want your website to be indexed and to rank well for the products and services you offer, but maybe you also have a client login page that only customers should access. You may not want these pages to be easily found by search users. Adding these URLs to the robots.txt file tells the major search engines, such as Google, Bing, and Yahoo, not to crawl them, which makes them much less likely to show up in search results.
Take a look at this example of how CNN.com builds its robots.txt file:
Sitemap: http://www.cnn.com/sitemap_index.xml
Sitemap: http://www.cnn.com/sitemap_news.xml
Sitemap: http://www.cnn.com/video_sitemap_index.xml
Sitemap: http://www.cnn.com/sitemap_election_2010.xml
User-agent: *
Disallow: /.element
Disallow: /editionssi
Disallow: /ads
Disallow: /aol
Disallow: /audio
Disallow: /audioselect
Disallow: /beta
Disallow: /browsers
Disallow: /cl
Disallow: /cnews
Disallow: /cnn_adspaces
Disallow: /cnnbeta
Disallow: /cnnintl_adspaces
Disallow: /development
Disallow: /NewsPass
Disallow: /NOKIA
Disallow: /partners
Disallow: /pipeline
Disallow: /pointroll
Disallow: /POLLSERVER
Disallow: /pr
Disallow: /PV
Disallow: /quickcast
Disallow: /Quickcast
Disallow: /QUICKNEWS
Disallow: /test
Disallow: /virtual
Disallow: /WEB-INF
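To see how a parser might read a file like this, the Python sketch below feeds a shortened excerpt of the CNN.com listing to the standard library's urllib.robotparser (the site_maps method needs Python 3.8 or later). The "ExampleBot" user agent and the page paths being tested, such as /ads/banner.html and /world/, are assumptions made for the example and are not taken from CNN.com.

```python
from urllib.robotparser import RobotFileParser

# A shortened excerpt of the CNN.com listing above.
CNN_EXCERPT = """\
Sitemap: http://www.cnn.com/sitemap_index.xml
Sitemap: http://www.cnn.com/sitemap_news.xml
User-agent: *
Disallow: /ads
Disallow: /NOKIA
Disallow: /quickcast
Disallow: /Quickcast
Disallow: /test
"""

parser = RobotFileParser()
parser.parse(CNN_EXCERPT.splitlines())


def allowed(path: str) -> bool:
    """Check a path on www.cnn.com for the assumed "ExampleBot" crawler."""
    return parser.can_fetch("ExampleBot", "http://www.cnn.com" + path)


sitemaps = parser.site_maps()                 # Sitemap URLs declared in the file
ads_allowed = allowed("/ads/banner.html")     # hypothetical page under /ads
world_allowed = allowed("/world/")            # hypothetical page, not listed
upper_allowed = allowed("/NOKIA/index.html")  # matches "Disallow: /NOKIA"
lower_allowed = allowed("/nokia/index.html")  # different case, so no match here

print(sitemaps)
print(ads_allowed, world_allowed, upper_allowed, lower_allowed)
```

This parser matches paths by case-sensitive prefix, which is probably why the listing includes both /quickcast and /Quickcast as separate entries.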
A well-thought-out and well-formatted robots.txt file should be part of every good search engine optimization campaign.
If you would like help with your robots.txt file or any alternative marketing needs, contact S2R Solutions.