
SEO: Duplicate content caused by pagination/tag/index pages

@Si4351233

Posted in: #Indexing #Noindex #Seo

Possible Duplicate:
What is duplicate content and how can I avoid being penalized for it on my site?




I read that I should use a NoIndex tag for transitional pages like index, pagination, or tag pages. Is this true? I have a Tumblr blog, and I am considering putting NoIndex on its index, search, tag, pagination, and date pages.
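For context, the tag I mean is the standard robots meta tag in the <head> of each page template. As far as I understand it, it would look something like this (assuming I can edit the theme's HTML on Tumblr):

    <!-- ask crawlers not to index this page, but still follow its links -->
    <meta name="robots" content="noindex, follow">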

Is NoIndex enough, or are there other methods? Should the index page of a site be marked NoFollow? That doesn't sound like a good idea to me.

What are the pages you would put NoIndex on?


3 Comments


 

@Cofer257

No, you should not stop search engines from indexing pagination or tag pages. (And definitely not your index page!) For blogs or sites without a clear menu structure, those listing pages are the main way search engines will find your content.

More often than not, search engines can work out those pages just fine and discover your most important content, i.e. the blog entries themselves.

However, if you notice these list pages being indexed more than the blog posts, I would recommend blocking indexing of "infinite configurations", for example sorting by popularity, or anything else where the same items are listed in different orders: allow indexing of one sensible order and ignore the rest.
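As a rough sketch, assuming the sort order is exposed through a query parameter (the ?sort= name here is hypothetical), you could keep crawlers out of those variants with robots.txt:

    User-agent: *
    # block the alternate sort orders, leave the default listing crawlable
    Disallow: /*?sort=
    Disallow: /*&sort=

Note that robots.txt only stops crawling; for variants that are already indexed, a noindex robots meta tag on those pages does the same job more directly.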



 

@Kevin317

If multiple page URLs produce the same content, that's exactly what <link rel="canonical"> is for. It tells search engines that several URLs carry the same content and which one to treat as the primary version. That avoids duplication issues altogether and is very simple to do.
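For example, a page that is reachable at several URLs can point at the one you prefer from its <head>; the href below is just a placeholder:

    <!-- placed on every duplicate variant of the page -->
    <link rel="canonical" href="https://example.com/blog/">

Search engines then consolidate the duplicates onto that URL instead of treating each variant as separate content.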



 

@Vandalay111

Nothing short of forcibly blocking access can stop every robot from spidering ANY page on your site.

That being said, you can always encourage robots to follow and index what you want them to (and skip what you don't). Some of these methods include:


Creating a robots.txt file and placing it in your root directory.
Setting the response header cache options properly for each resource.
Creating a sitemap.xml document listing only the pages you want crawled.
Using consistent capitalization. Keeping everything lowercase prevents many duplicates caused by mixed upper/lowercase URLs.
Avoiding $_GET variables in the URL unless they truly select unique content (for something like abc.com/index.php?session=21389271893219, use $_POST instead).
Not having duplicate content (try mod_rewrite and/or redirects to prevent this; see the sketch after this list).
Using bot detection to serve a 404 NOT FOUND to robots for those pages, while live users get a 200 or a redirect (301s are debatable).
Using Google Webmaster Tools to prevent the pages from being displayed in search results (although this is usually a last-resort solution).
Avoiding UTF-8 encoded/encodable URLs; the encoded and unencoded forms can be treated as different URLs and cause canonicalization problems.
Using proper session management to prevent secure information from being accessed directly.


There are more, but these cover 99% of cases. The trick is good initial URL and directory design.
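As a rough sketch of the mod_rewrite point above (Apache .htaccess, hypothetical example.com domain, assuming www/non-www duplication is the problem):

    # 301-redirect every request to the single canonical hostname
    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
    RewriteRule ^ https://example.com%{REQUEST_URI} [R=301,L]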


