
Blocking pages from being crawled with specific URL-encoded parameters

@Jessie594

Posted in: #GoogleSearchConsole #Htaccess #RobotsTxt

While reviewing the Google Webmaster Tools report, we noticed URL-encoded characters such as %5c and %22 appearing in the site's URLs.

We investigated and found that two incorrectly structured links on the site contained \" (forward slash and "). These links have now been corrected.

Although we have removed and corrected them, Google still appears to be crawling some pages with a forward slash and %22 in the URL, and these show up in Webmaster Tools as duplicate URLs.

Is there any way, through robots.txt, Google Webmaster Tools, or .htaccess, to tell Google not to follow links to or crawl pages with a forward slash or %22 in the URL? The site is built on the Joomla CMS platform.


1 Comment


@Nimeshi995

Googlebot does support pattern matching in robots.txt, but other search engine bots may not.

You can add these patterns to your robots.txt file. See the section named "Pattern matching" here: Google Webmaster Tools: Block or remove pages using a robots.txt file

As stated there, you can use an asterisk (*) to match any sequence of characters in the URLs you'd like to block.

The example given there blocks all URLs that contain a ?:

User-agent: Googlebot
Disallow: /*?


You can then test your robots.txt file using the steps at the bottom of the above link.

I'm unclear from your question whether you meant a forward slash / or a backslash \. If you do in fact mean a forward slash /, be careful when trying to block it, because that could result in every directory being blocked. For example, this will block the entire site:

User-agent: Googlebot
Disallow: /
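
For the characters mentioned in the question, a narrower set of rules might look something like the sketch below. It assumes the unwanted URLs contain the literal sequences %22 (an encoded double quote) or %5C (an encoded backslash), and that Googlebot compares these sequences as they appear in the crawled URL. Both hex cases are listed as a precaution, since I'm not certain how the case of the hex digits is treated during matching.

```
# Sketch only: assumes the unwanted URLs contain %22 or %5C / %5c literally
User-agent: Googlebot
# Block any URL containing an encoded double quote
Disallow: /*%22
# Block any URL containing an encoded backslash (both hex cases, as a precaution)
Disallow: /*%5C
Disallow: /*%5c
```

Note that once a group for Googlebot exists, Googlebot follows only that group and ignores any rules under User-agent: *, so rules you still need from the general group would have to be repeated here. It is worth checking a few of the affected URLs in the robots.txt tester before relying on these rules.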
