Google keeps indexing /comment/reply URLs
With the new update to Google's algorithm, called Penguin, I think my site is being penalized for webspam. But of course I don't create posts that look like spam to Google; I think it is just how Google indexes my site.
I found that Google indexes URLs on my site like:
www.example.com/comment/reply/3866/26556
So there are many comment/reply URLs indexed by Google. I have already added:
Disallow: /comment/reply/
Disallow: /?q=comment/reply/
but Google still indexes these URLs.
Any idea how to prevent Google from indexing comments?
4 Comments
Using Disallow in your robots.txt file will not stop Google from indexing those links or pages; it only tells Google not to crawl them.
If those pages are linked to from other pages on your domain, Google may still index them.
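A more reliable way to keep already-linked pages out of the results is to serve a noindex directive. A minimal sketch, assuming an Apache server with mod_headers enabled (the /comment/reply/ pattern matches this site's URLs; adjust it for your own scheme):

```
# Apache sketch: send an X-Robots-Tag header for comment reply URLs
# (assumes mod_headers is enabled)
<LocationMatch "^/comment/reply/">
    Header set X-Robots-Tag "noindex"
</LocationMatch>
```

Note that for this to work, the URLs must not also be disallowed in robots.txt: Googlebot has to be able to fetch the page in order to see the header.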
You haven't mentioned how long ago you added those Disallow rules. The effect isn't instantaneous: at the very least you have to wait until your site is crawled again, and even then it may take a while longer for the pages to actually be removed from the index/results.
If you use Webmaster Tools, are they showing up in your "Crawler access" screen (under Site Configuration)? That'll at least give you an idea of when the robots.txt file was last fetched.
You can use Google Webmaster Tools (Site configuration -> Sitelinks) to demote links on your website. You can also use robots.txt as suggested by Ilmari Karonen, or configure .htaccess (or httpd.conf) to perform a 301 redirect.
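A minimal .htaccess sketch of such a redirect, assuming mod_rewrite is enabled; redirecting to the site root is just a placeholder here, and the right target depends on your site's structure:

```
# .htaccess sketch (assumes mod_rewrite; the redirect target is hypothetical)
RewriteEngine On
# Send any /comment/reply/... request to the homepage with a permanent redirect
RewriteRule ^comment/reply/ / [R=301,L]
```

Keep in mind a 301 removes the pages for visitors too, so only do this if the reply URLs serve no purpose on their own.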
Have you made sure that your robots.txt syntax is correct? If you've signed up for Google's Webmaster Tools, you can use their robots.txt testing tool to see how Googlebot interprets it, but there are also several third-party robots.txt syntax checkers on the web.
You can also add robots meta tags to your reply pages to stop search engines from indexing them. One reason to do this, even if you have the pages disallowed in robots.txt, is that not all bots necessarily understand the fancier robots.txt syntax extensions such as * wildcards, or at least may not understand them the same way.
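A minimal version of such a tag, placed in the head of each reply page, might look like:

```
<!-- Tell compliant crawlers not to index this page, but still follow its links -->
<meta name="robots" content="noindex, follow">
```

As with the X-Robots-Tag header, the page must remain crawlable (not disallowed in robots.txt) for search engines to see the tag.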