Blocking pages from being crawled with specific URL-encoded parameters
In the Google Webmaster Tools report, we are seeing certain URL-encoded characters such as %5c and %22 appearing in the site's URLs.
We investigated and found that two links on the site were malformed and contained a forward slash and a double quote ("). These links have now been corrected.
Although we have removed or corrected these links, Google still appears to be crawling some pages whose URLs contain a forward slash and %22, and these are being reported in Webmaster Tools as duplicate URLs.
Is there a way, via robots.txt, Google Webmaster Tools, or .htaccess, to tell Google not to follow links to, or crawl, pages whose URLs contain a forward slash or %22? The site is built on the Joomla CMS.
More posts by @Jessie594
Googlebot supports pattern matching in robots.txt, but other search engine crawlers may not.
You can add such patterns to your robots.txt file. See the "Pattern matching" section of Google's help page "Google Webmaster Tools: Block or remove pages using a robots.txt file".
As described there, you can use an asterisk (*) to match any sequence of characters, which lets you block URLs that contain a particular character or string.
Their example blocks all URLs that contain a question mark (?):
User-agent: Googlebot
Disallow: /*?
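To make the wildcard behaviour concrete, here is a minimal Python sketch that translates a Disallow pattern into a regular expression and tests paths against it. It is a simplified model, assuming only the documented rules that `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a literal prefix match. It does not model Allow rules, rule precedence, or any percent-encoding normalization Googlebot may perform. The function names `pattern_to_regex` and `is_disallowed` and the sample paths are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Simplified model: '*' matches any run of characters (including none),
    a trailing '$' anchors the end of the URL, and everything else is a
    literal prefix. Allow rules and precedence are not modelled.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' wherever
    # the pattern had a '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(pattern: str, path: str) -> bool:
    """Return True if the path (including any query string) matches the pattern."""
    # re.match anchors only at the start, which gives prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    # Google's documented example: block any URL containing '?'.
    print(is_disallowed("/*?", "/index.php?option=com_content"))  # True
    print(is_disallowed("/*?", "/about-us"))  # False
    # Hypothetical rule for URLs containing a percent-encoded double quote,
    # compared literally against the encoded path.
    print(is_disallowed("/*%22", "/products/%22widget"))  # True
    print(is_disallowed("/*%22", "/products/widget"))  # False
```

The same prefix semantics explain why a bare `/` pattern is dangerous: every path starts with `/`, so `is_disallowed("/", path)` is true for any path.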
You can then test your robots.txt file using the steps at the bottom of that same help page.
It's unclear from your question whether you mean a forward slash (/) or a backslash (\). Note that %5c decodes to a backslash, not a forward slash. If you do mean a forward slash, be careful about blocking it: every URL path begins with /, so a rule that is too broad can block every directory. For example, this blocks the entire site:
User-agent: Googlebot
Disallow: /
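A short Python sketch can confirm what the two escapes decode to and generate narrower rules that target only those characters instead of every slash. It assumes the crawler compares rules against the percent-encoded URL it requested. Python's `quote()` emits uppercase hex (`%5C`) while the report shows `%5c`; this sketch does not settle whether Googlebot treats the two spellings as equivalent, so verify the final rules with Google's robots.txt tester. The helper `narrow_disallow_rules` and the sample paths are hypothetical.

```python
from urllib.parse import quote, unquote

# Decode the two escapes reported in Webmaster Tools:
# %5C is a backslash and %22 is a double quote; neither is a forward slash.
print(repr(unquote("%5C")))  # '\\'
print(repr(unquote("%22")))  # '"'


def narrow_disallow_rules(chars: str) -> list[str]:
    """Build one wildcard Disallow line per character (hypothetical helper).

    Each character is written in percent-encoded form, on the assumption
    that the crawler matches rules against the encoded URL.
    """
    return [f"Disallow: /*{quote(c, safe='')}" for c in chars]


# Every URL path starts with '/', so the bare rule 'Disallow: /' is a
# prefix of all of them and blocks the whole site.
sample_paths = ["/", "/index.php", "/products/widget"]
print(all(path.startswith("/") for path in sample_paths))  # True

# Emit a candidate robots.txt group that blocks only backslash and quote.
print("User-agent: Googlebot")
for rule in narrow_disallow_rules('\\"'):
    print(rule)
```

Run as-is, the last loop prints `Disallow: /*%5C` followed by `Disallow: /*%22` under the `User-agent: Googlebot` line.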
© vmapp.org 2024. All rights reserved.