
What is a Robots.txt File

A robots.txt file is a plain text file in the root directory of a website that tells search engine crawlers which files or directories they should not crawl.

In practice, suppose a search engine robot (also called a bot or spider) wants to visit a URL such as http://www.s2rsolutions.com/free-instant-website-report/. Before it does so, it first checks http://www.s2rsolutions.com/robots.txt and finds:

User-agent: *
Disallow:

The “User-agent: *” line means this section applies to all robots. Each “Disallow:” line names a URL path within the same website (http://www.s2rsolutions.com) that the robot should NOT crawl. Here the “Disallow:” value is empty, so nothing is blocked and the robot may visit every page.

You can also disallow all pages if necessary. “Disallow: /” tells the robot that it should not visit any pages on the site.
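
The difference between the two rule sets above can be checked with Python's standard-library urllib.robotparser module. This is only a minimal sketch: the helper name is_allowed and the user-agent string ExampleBot are made up for illustration, and the rules are parsed from strings rather than fetched from the live site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent crawl url."""
    parser = RobotFileParser()
    # parse() takes the lines of the file, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path on the site

page = "http://www.s2rsolutions.com/free-instant-website-report/"
print(is_allowed(ALLOW_ALL, "ExampleBot", page))  # True
print(is_allowed(BLOCK_ALL, "ExampleBot", page))  # False
```

A well-behaved crawler runs a check like this before each request; nothing forces a robot to do so.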

Two important notes when using /robots.txt:

  • Robots can ignore your /robots.txt, especially malware robots and email address harvesters.
  • The /robots.txt file is a public file. Anyone can see it, and therefore see what you don't want the search engines to crawl.

A robots.txt file should not be used to “hide” information or pages.

Why might you want search engines to NOT crawl a certain URL within your site? You want your website to be indexed and to rank well for the products and services you offer, of course, but maybe you have a client login page that only customers should access. You may not want these pages to be easily found by search users. Adding these URLs to the robots.txt file asks the major search engines, such as Google, Bing, and Yahoo, not to crawl them. Note that a blocked URL can still appear in search results if other pages link to it, so robots.txt limits crawling rather than guaranteeing removal from the index.
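
As a rough illustration of the client login case, the Python sketch below builds a small robots.txt and checks it with the standard-library parser. The path /client-login/ and the helper name build_robots_txt are hypothetical; substitute the real location of your own login page.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths):
    """Build a robots.txt body whose rules apply to all robots."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Hypothetical path, used here only for illustration.
robots_txt = build_robots_txt(["/client-login/"])
print(robots_txt)

# Confirm the rules behave as intended before uploading the file.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

site = "http://www.s2rsolutions.com"
login_blocked = not parser.can_fetch("*", site + "/client-login/")
report_allowed = parser.can_fetch("*", site + "/free-instant-website-report/")
print(login_blocked, report_allowed)
```

The generated text would be saved as robots.txt in the root directory of the site. Because the file is public, it only discourages crawling; the login page itself still needs real access control.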

Take a look at this example of how CNN.com builds its robots.txt file:

Sitemap: http://www.cnn.com/sitemap_index.xml
Sitemap: http://www.cnn.com/sitemap_news.xml
Sitemap: http://www.cnn.com/video_sitemap_index.xml
Sitemap: http://www.cnn.com/sitemap_election_2010.xml
User-agent: *
Disallow: /.element
Disallow: /editionssi
Disallow: /ads
Disallow: /aol
Disallow: /audio
Disallow: /audioselect
Disallow: /beta
Disallow: /browsers
Disallow: /cl
Disallow: /cnews
Disallow: /cnn_adspaces
Disallow: /cnnbeta
Disallow: /cnnintl_adspaces
Disallow: /development
Disallow: /NewsPass
Disallow: /NOKIA
Disallow: /partners
Disallow: /pipeline
Disallow: /pointroll
Disallow: /POLLSERVER
Disallow: /pr
Disallow: /PV
Disallow: /quickcast
Disallow: /Quickcast
Disallow: /QUICKNEWS
Disallow: /test
Disallow: /virtual
Disallow: /WEB-INF
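
To see how rules like these are applied, the sketch below parses a shortened excerpt of the file above with Python's urllib.robotparser (site_maps() requires Python 3.8 or later). The test paths such as /adsense and /world/ are invented for illustration and are not taken from CNN's site.

```python
from urllib.robotparser import RobotFileParser

# A shortened excerpt of the file shown above.
CNN_EXCERPT = """\
Sitemap: http://www.cnn.com/sitemap_index.xml
User-agent: *
Disallow: /ads
Disallow: /quickcast
Disallow: /Quickcast
Disallow: /test
"""

parser = RobotFileParser()
parser.parse(CNN_EXCERPT.splitlines())


def crawlable(path: str) -> bool:
    """Return True if any robot ("*") may crawl this path on cnn.com."""
    return parser.can_fetch("*", "http://www.cnn.com" + path)


# Illustrative paths only.
paths = [
    "/ads/banner.html",   # under /ads, so blocked
    "/adsense",           # also starts with /ads: rules match by prefix
    "/Quickcast/today",   # listed with this exact capitalization
    "/QuickCast/today",   # different capitalization, so not matched
    "/world/",            # no rule applies
]
results = {path: crawlable(path) for path in paths}
for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")

print(parser.site_maps())  # the Sitemap URLs declared in the file
```

Rules are matched as case-sensitive path prefixes, which is likely why the file lists both /quickcast and /Quickcast.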

A well-thought-out and well-formatted robots.txt file should be part of every good search engine optimization campaign.

If you would like help with your robots.txt file or any alternative marketing needs, contact S2R Solutions.
