SEO: Duplicate content caused by pagination/tag/index pages
Possible Duplicate:
What is duplicate content and how can I avoid being penalized for it on my site?
I read that I should use a NoIndex tag on transitional pages like index, pagination, or tag pages. Is this true? I have a Tumblr blog, and I am considering putting NoIndex on its index, search, tag, pagination, and date pages.
Is NoIndex enough, or are there other methods? Should the index page of a site be marked as NoFollow? That doesn't really sound like a good idea.
What are the pages you would put NoIndex on?
No, you should not stop search engines from indexing pagination or tag pages. (And definitely not your index page!) For blogs or sites without a clear menu structure, that is the main way they will find your content.
More often than not, search engines can work out those pages just fine and discover your most important content, i.e. the blog posts themselves.
However, if you notice these list pages being indexed more than the blog posts themselves, I would recommend blocking indexing of "infinite configurations", for example sorting by popularity, or anything where the same items are listed in different orders: allow indexing of one sensible order and block the rest.
If multiple page URLs produce the same content, that is exactly what <link rel="canonical"> is for. It tells search engines that the content at several URLs is the same and which URL to treat as the primary one. That avoids duplication issues altogether and is very simple to do.
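For example, here is a minimal sketch (with hypothetical URLs) of how a duplicate view, such as a re-sorted listing, could point back at the URL you want indexed:

    <!-- served on a hypothetical duplicate view, e.g. http://example.com/posts?sort=popular -->
    <head>
      <link rel="canonical" href="http://example.com/posts">
    </head>

Search engines that support rel="canonical" will then generally consolidate the duplicate views onto http://example.com/posts.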
Nothing can stop every robot from spidering any given page on your site unless access is forcibly prevented.
That being said, you can always encourage robots to follow and index what you want (and skip what you don't). Some of these methods include:
Creating a robots.txt file and placing it in your root directory (see the robots.txt sketch after this list).
Setting all the response header cache options properly for each resource.
Creating a sitemap.xml document with only the pages you want to have spidered.
Consistent capitalization. Keeping URLs lowercase prevents duplicates that differ only in upper/lower case.
Avoiding $_GET variables in the URL unless they truly create unique data (for example, abc.com/index.php?session=21389271893219; use $_POST for this kind of thing instead).
Not having duplicate content in the first place (try mod_rewrite and/or redirects to prevent this; see the sketch at the end of this answer).
Using bot detection to send a 404 Not Found for those pages to crawlers, while serving a 200 or a redirect to live users (301s are debatable).
Using Google Webmaster Tools to prevent the pages from being displayed in search results (although this is usually a last-resort solution).
Avoiding UTF-8 encoded/encodable characters in URLs, since percent-encoded and raw variants of the same URL can end up indexed as duplicates.
Using proper session management to prevent secure information from being accessed directly.
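As a sketch of the robots.txt and sitemap.xml points above (the paths and sitemap URL here are hypothetical, so adjust them to your own URL structure), you could block only the genuinely duplicate views, such as internal search results and alternate sort orders, while leaving posts, tag and pagination pages crawlable:

    # robots.txt (hypothetical example)
    User-agent: *
    # block internal search results and duplicate sort orders
    Disallow: /search/
    Disallow: /*?sort=

    # point crawlers at a sitemap listing only the pages you want spidered
    Sitemap: http://example.com/sitemap.xml

Note that wildcard rules like /*?sort= are understood by Google and Bing but not by every crawler, and hosted platforms such as Tumblr may not let you edit robots.txt at all.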
There are more, but this covers 99% of cases. The trick is good initial URL and directory design.
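For the mod_rewrite/redirect point above, here is a minimal sketch (assuming Apache and a hypothetical example.com domain) that collapses the www and non-www variants of every URL into a single indexable form:

    # .htaccess (hypothetical example)
    RewriteEngine On

    # 301-redirect www.example.com/... to example.com/... so only one host gets indexed
    RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
    RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

The same idea works for any other duplicate variant (trailing slashes, old URLs, alternate paths) as long as every variant ends up 301-redirected to exactly one canonical URL.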