Tuesday, March 31, 2015

What is Robots.txt?

Robots.txt is a plain-text file, defined by the Robots Exclusion Protocol (REP), that webmasters place at the root of their site (e.g., http://www.example.com/robots.txt) to tell robots (typically search engine crawlers) which pages they may crawl and index. The cheat sheet below covers the most common directives.

Cheat Sheet

Block all web crawlers from all content
User-agent: * 
Disallow: /
Block a specific web crawler from a specific folder
User-agent: Googlebot 
Disallow: /no-google/
Block a specific web crawler from a specific web page
User-agent: Googlebot 
Disallow: /no-google/blocked-page.html
Allow all crawlers and specify a sitemap location
User-agent: * 
Disallow: 
Sitemap: http://www.example.com/non-standard-location/sitemap.xml
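To see how a crawler would interpret directives like these, here is a minimal Python sketch using the standard library's urllib.robotparser. The domain, paths, and user-agent strings are placeholders taken from the examples above, not a real site's rules.

from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt (placeholder domain).
parser = RobotFileParser("http://www.example.com/robots.txt")
parser.read()  # fetch and parse the file

# Would Googlebot be allowed to fetch this page?
# With the "Disallow: /no-google/" rule above, this would return False.
print(parser.can_fetch("Googlebot", "http://www.example.com/no-google/blocked-page.html"))

# Other crawlers with no matching Disallow rule would get True.
print(parser.can_fetch("*", "http://www.example.com/some-page.html"))

# On Python 3.8+, any Sitemap entries can be read back as well.
print(parser.site_maps())

This is how most well-behaved crawlers work: they fetch robots.txt first, then check each URL against the rules for their own user-agent before requesting it.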
