One thing many WordPress bloggers overlook is setting up a robots.txt file. WordPress, by its very nature, creates a substantial risk of duplicate content. If you aren’t prepared to handle it appropriately, it can hurt your search engine rankings and cost you traffic.

When a search engine crawls your blog it looks at every link it can find and indexes the content so it knows what your blog is all about. This is a good thing, because it allows you to show up in that search engine’s results.
The problem is that while the search engine’s spider is crawling your site, it’s going to follow the links that each post lists to its categories and tags. The spider can just as easily find your archive pages. Add all of that to the home page of your blog and the actual post itself and you could have one piece of content popping up a dozen times or more. The search engine then has to decide which copy is the most relevant, and it may not choose the one you want.
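To make the multiplication concrete, here is a minimal sketch that lists the extra pages a single post can appear on. The post slug, category and tag names, and the /category/, /tag/, and /date/ URL layout are all hypothetical; your own permalink settings may differ.

```python
def listing_urls(categories, tags, year, month):
    """Return the listing pages that would also show a post.

    The URL layout here is an assumption for illustration only.
    """
    urls = ["/"]  # the blog home page
    urls += [f"/category/{name}/" for name in categories]
    urls += [f"/tag/{name}/" for name in tags]
    urls.append(f"/date/{year}/{month:02d}/")  # monthly archive
    return urls


post_url = "/my-first-post/"  # hypothetical post
duplicates = listing_urls(["wordpress", "seo"], ["robots", "crawling", "tips"], 2012, 5)

print(f"{post_url} also appears on {len(duplicates)} other pages:")
for url in duplicates:
    print(" ", url)
```

Two categories and three tags are already enough to put the same post on seven pages besides its own.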
What is a Robots.txt File?
A robots.txt file is simply a list of “do’s and don’ts” you provide for the search engines that crawl your site. By specifying these rules, you tell each bot which pages you don’t want it to crawl. Why do this? Because you don’t want the engine to crawl multiple pages that all display the same content.
A robots.txt file is a plain text file (created using Notepad, WordPad, or a similar program) that is placed in the root of your website, where visiting spiders look for it.
A good robots.txt file helps prevent duplicate content problems by telling Google (and other search engines) what they should and should not bother crawling. You can tell the search engines to skip category archives or tag pages, for example. By narrowing down what the search engine can crawl, you increase the likelihood that the only place your content is indexed is the actual post page itself (which is ideal).
This is what my robots.txt file looks like:
User-agent:*
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
Disallow: /cgi-bin/
Disallow: /tag/
Disallow: /archives/
Disallow: /category/
Sitemap: http://www.techhive.in/sitemap.xml
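Before uploading a file like this, it can help to check which URLs the rules actually block. The sketch below feeds the rules above to Python's standard `urllib.robotparser` module and tests a few paths. The sample paths and the `is_crawlable` helper are made up for illustration, and "Googlebot" is just one example user agent.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt file shown in this post.
RULES = """\
User-agent:*
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
Disallow: /cgi-bin/
Disallow: /tag/
Disallow: /archives/
Disallow: /category/
Sitemap: http://www.techhive.in/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_crawlable(path, agent="Googlebot"):
    """Return True if the rules let the given agent fetch this path."""
    return parser.can_fetch(agent, "http://www.techhive.in" + path)


# Hypothetical paths, chosen only to exercise the rules.
sample_paths = [
    "/my-first-post/",
    "/category/seo/",
    "/tag/robots/",
    "/wp-admin/",
    "/wp-content/uploads/photo.jpg",
]
for path in sample_paths:
    print(path, "->", "crawl" if is_crawlable(path) else "blocked")
```

Each Disallow line is a path prefix, so `Disallow: /wp-` also covers everything under /wp-content/, including uploaded images. If you want those crawled, that rule would need to be narrower.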
Hopefully this post helps you get a robots.txt file set up for your blog… post any feedback or questions in the comments!