Add a robots.txt file to your blog

If you are on free hosting such as WordPress.com or Blogspot, you don’t need to do this, since they have already made this file for you.

However, if you are on paid hosting, you need to make a robots.txt file.

What is this robots.txt file?

I have limited knowledge about this, so I’ll let Wikipedia define what robots.txt is.

The robots exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website which is, otherwise, publicly viewable. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. (Wikipedia)

Is this important?

Yes. Search engines use robots to crawl your site. That means they fetch the files on your hosting that they can reach. Unless you tell it otherwise, a robot will crawl every bit of your site it can find, public and private files alike. That poses a big security risk, since confidential data picked up by the robots can end up in the search engines. Imagine your password files being found in Google searches!

People visit your website through your web pages. What they see is the result of HTML code that has been parsed and processed by their browser. Robots don’t use a browser, so they just crawl the files one by one. You have to restrict their access to your files: robots should be limited to only the files that you allow them to crawl.

So how do you limit these robots? By creating a robots.txt file.
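
As a rough sketch of what this looks like in practice, the Python snippet below feeds a small, made-up robots.txt to the standard library's urllib.robotparser and asks which URLs a cooperating robot would be allowed to crawl. The rules, the /private/ and /wp-admin/ paths, and example.com are assumptions for illustration only; they are not the file I use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /wp-admin/
"""


def allowed(url, agent="*"):
    """Return True if a cooperating robot may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/",
        "http://example.com/private/passwords.txt",
    ):
        print(url, "->", "allowed" if allowed(url) else "blocked")
```

Keep in mind that this only models robots that choose to cooperate, as the Wikipedia definition says. The file is a request to the robot, not a lock on your files.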

How to create a robots.txt file?

  • You can write the contents of the robots.txt file yourself if you know how to code it.
  • You can use one of these websites to generate the contents for you.
    • http://www.mcanerin.com/EN/search-engine/robots-txt.asp
    • http://www.1-hit.com/all-in-one/tool-robots.txt-generator.htm
  • You can use an external program to create the file.
    • http://www.softsland.com/oven_fresh_robots_txt_maker.html
  • If you are using WordPress for your blog, these posts might prove useful.
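
If you would rather write the contents yourself, the file is just a few lines of plain text. Here is a minimal Python sketch that builds those lines from a list of paths you want to keep robots out of. The function name build_robots_txt and the example paths are my own assumptions for illustration, not part of any of the tools above.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text that blocks the given paths for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths; replace them with the ones on your own site.
    print(build_robots_txt(["/wp-admin/", "/trackback/"]))
```

Save the printed text as robots.txt and upload it to your site.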

The robots.txt file I’m using is one I copied from enblogopedia: http://www.enblogopedia.com/robots.txt

9 thoughts on “Add a robots.txt file to your blog”

  1. I’ll remember this when I have my own. Thanks for the tips.

  2. Just a question here: where would you put the robots.txt file?

  3. @Tina – no problem

    @fionixe – You put the robots.txt in the root directory. Usually under public_html. =)

  4. Googlebot is blocked by my robots.txt. I want to update the file, but I don’t know how to upload the robots.txt to my Blogspot account.

    Can you please help me FTP the file to my http://maalamat.blogspot.com

    Thanks.

  5. @Ice – I’m not sure if you can upload a robots.txt file when you are hosted on Blogger. But as far as I know, Blogger won’t block Googlebot, since they are from the same company. 🙂

  6. Nice post, good work. Thanks.

  7. What if my hosting provider adds some crappy comments inside my robots.txt, which Google Webmaster Tools reports as parse errors?

  8. I was wondering if someone could post their robots.txt file or give me an idea of the best robots.txt file to use for WordPress blogs.

    This is what I use, and I wanted to know if someone could give me any tips or suggestions.

    Here is what I used:

    User-agent: *
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /rss/
    Disallow: /comments/feed/
    Disallow: /comments/
    Disallow: /category/*
    Disallow: /tag/*

    But when I do site:domain.com in Google, it shows both my tags and categories, and I don’t want that. There is really no reason for those to be indexed, so please let me know what to do, or share any other useful tips or suggestions.

    It also indexed my archives, like domain.com/2011/03. Any idea how to prevent this?

  9. I’m not sure if you can upload a robots.
