
Wednesday, 27 April 2011

Configuring a Robots.txt File: WordPress






One of the most important things many WordPress bloggers overlook is setting up a robots.txt file. WordPress, by its very nature, creates a substantial risk of duplicate content. If you don’t handle it appropriately, it can hurt your search engine rankings and cost you traffic.


When a search engine crawls your blog, it follows every link it can find and indexes the content so it knows what your blog is about. This is a good thing, because it allows you to show up in that search engine’s results.

The problem is that while the search engine’s spider is crawling your site, it follows the links from each post to its categories and tags. It can also easily find the various archive pages. Add the home page of your blog and the post itself, and one piece of content could show up a dozen times or more. The search engine then has to decide which copy is the most relevant, and it won’t always choose the one you want.

What is a Robots.txt File?


A robots.txt file is simply a list of “do’s and don’ts” you provide for the search engines that crawl your site. By specifying these rules, you tell the bots which pages you don’t want them to crawl. Why do this? Because you don’t want them crawling multiple pages that all display the same content.

A robots.txt file is a plain text file (created using Notepad, WordPad, or a similar program) that is placed in the root directory of your website, so it can be reached at /robots.txt on your domain. That is the only place visiting spiders look for it.
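Because crawlers only request robots.txt from the root of a host, a copy placed in a subdirectory is never read. The short Python sketch below uses only the standard library’s urljoin to show where a crawler would look for any given page. The post URL in it is a hypothetical example, not a real page on this site.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler checks for the site hosting page_url.

    Joining with the absolute path "/robots.txt" keeps the scheme and host
    but discards the page's own path and query string.
    """
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    # Hypothetical post URL, used only for illustration.
    print(robots_url("http://www.techhive.in/2011/04/some-post.html?replytocom=5"))
    # A robots.txt inside a subdirectory (e.g. /blog/robots.txt) is never consulted:
    print(robots_url("http://www.techhive.in/blog/"))
```

Both calls resolve to the same root-level file, which is why the file has to live at the top of your site rather than inside your blog’s folder.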

A good robots.txt file helps you avoid duplicate content problems by telling Google (and other search engines) what they should and should not bother crawling. For example, you can tell search engines to ignore category archives or tag pages. By reducing the number of pages the search engine can crawl, you increase the likelihood that the only place your content is indexed is the actual post page itself (which is ideal).
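To make that reasoning concrete, here is a minimal Python sketch that treats each Disallow line as a simple path prefix (enough for rules like the ones in this post, which use no wildcards) and filters a list of places where one post might appear. The URLs are hypothetical and assume a permalink layout that matches these rules; the prefixes are a subset of the Disallow lines from the robots.txt file shown in this post. Real crawlers also handle Allow lines, per-bot groups, and wildcard extensions, which this sketch ignores.

```python
# Hypothetical URLs under which a single WordPress post might be reachable.
POST_URLS = [
    "/",                                # home page listing
    "/page/2/",                         # older-posts pagination
    "/category/seo/",                   # category archive
    "/tag/robots-txt/",                 # tag archive
    "/date/2011/04/",                   # date archive
    "/configuring-a-robots-txt-file/",  # the post itself
]

# A subset of the Disallow prefixes from the robots.txt file in this post.
DISALLOWED = ["/page/", "/date/", "/tag/", "/archives/", "/category/"]


def crawlable(path, disallowed):
    """Simplified check: a path is blocked if it starts with any Disallow prefix."""
    return not any(path.startswith(prefix) for prefix in disallowed)


def remaining(urls, disallowed):
    """Return only the URLs a crawler following these rules may still visit."""
    return [url for url in urls if crawlable(url, disallowed)]


if __name__ == "__main__":
    for url in remaining(POST_URLS, DISALLOWED):
        print(url)
```

Running it prints only / and the post’s own URL. The home page is not blocked by these rules, so it can still show the post’s content alongside the post page itself.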

This is what my robots.txt file looks like:
User-agent:*
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
Disallow: /cgi-bin/
Disallow: /tag/
Disallow: /archives/
Disallow: /category/

Sitemap: http://www.techhive.in/sitemap.xml
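If you want to check what a file like this actually blocks before uploading it, Python’s standard library includes urllib.robotparser.RobotFileParser. The sketch below parses the rules above verbatim and asks whether a crawler may fetch a few sample URLs. The sample paths are hypothetical, and site_maps() requires Python 3.8 or later. RobotFileParser uses plain prefix matching, which is sufficient here because none of the rules use wildcards.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from this post, copied verbatim.
ROBOTS_TXT = """\
User-agent:*
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
Disallow: /cgi-bin/
Disallow: /tag/
Disallow: /archives/
Disallow: /category/

Sitemap: http://www.techhive.in/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical sample URLs. "Googlebot" has no group of its own,
    # so the "User-agent:*" rules apply to it.
    for url in [
        "http://www.techhive.in/",
        "http://www.techhive.in/configuring-a-robots-txt-file/",
        "http://www.techhive.in/tag/wordpress/",
        "http://www.techhive.in/wp-admin/",
    ]:
        print(url, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
    print("Sitemaps:", rp.site_maps())
```

Because matching is by prefix, Disallow: /wp- also covers /wp-content/, the folder where WordPress stores uploaded images and theme files, so those are blocked by this file as well.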

Hopefully this post helps you set up a robots.txt file for your blog. Post any feedback or questions in the comments!

6 comments:

aluminium composite panels said...

Nice website, I think I recognize this design from somewhere, is it a template? It suits your website anyway.

Aneesh Bhatnagar said...

This is a theme called iPadPress from NewWPThemes.com. Head over to their website to check it out!

Antoine Erspamer said...

Respect to the website author, some fantastic selective information.

Aneesh Bhatnagar said...

Thanks for that!

Milo Tsuzuki said...

Some truly interesting info, well written and broadly user friendly.

fat cow coupons said...

I like the valuable info you provide in your articles. I will bookmark your weblog and check again here frequently. I'm quite sure I’ll learn a lot of new stuff right here! Best of luck for the next!
