Mobile app version of vmapp.org

Google Webmaster Tools crawl errors due to Google decoding links that were urlencoded by CodeIgniter

@Lee4591628

Posted in: #Codeigniter #CrawlErrors #Googlebot #GoogleSearchConsole #UrlEncoding

On our pages we have internal links that are urlencoded because the URL may sometimes contain special characters such as () or !.

So in our source code we have a link: example.com/American_Dad%21-245

In Google Webmaster Tools I have an error for this same link: example.com/American_Dad!-245

So basically Google has decoded the link in my source code and tried to access the decoded URL. However, since this decoded URL contains the special character !, my CodeIgniter setup returns a 400 status code.
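
A rough way to see why only the decoded URL fails: CodeIgniter rejects URIs containing characters outside its permitted_uri_chars whitelist with a "disallowed characters" 400 error. The sketch below is an illustration in Python, not CodeIgniter's code. It assumes the stock value $config['permitted_uri_chars'] = 'a-z 0-9~%.:_\-'; and assumes the check sees the path exactly as requested, before any percent-decoding.

```python
import re
from urllib.parse import unquote

# Assumption: mirrors the stock CodeIgniter setting
#   $config['permitted_uri_chars'] = 'a-z 0-9~%.:_\-';
# rewritten as a Python character class. Illustrative only.
PERMITTED_URI_CHARS = r"a-z 0-9~%.:_\-"
_ALLOWED = re.compile("[" + PERMITTED_URI_CHARS + "]+", re.IGNORECASE)


def passes_whitelist(segment: str) -> bool:
    """Return True if every character of the segment is whitelisted."""
    return _ALLOWED.fullmatch(segment) is not None


encoded = "American_Dad%21-245"   # what the page links to
decoded = unquote(encoded)        # what Googlebot ended up requesting

print(decoded)                    # American_Dad!-245
print(passes_whitelist(encoded))  # True: '%' and digits are whitelisted
print(passes_whitelist(decoded))  # False: '!' is not, hence the 400
```

The encoded form only gets through because % is in the whitelist. The ! it stands for is never checked.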

Is it possible to fix this issue without allowing special characters in my CodeIgniter setup? I think those restrictions are important for the security of the site.

Should I double-urlencode all my links, as this post seems to suggest?
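
For reference, here is what double encoding does mechanically, sketched with Python's urllib.parse. Encoding twice turns the % of %21 into %25. A single decode of the result gives back the once-encoded string, not the original slug. This suggests the application would then have to decode twice to find the page.

```python
from urllib.parse import quote, unquote

slug = "American_Dad!-245"

once = quote(slug)   # 'American_Dad%21-245'
twice = quote(once)  # 'American_Dad%2521-245' ('%' itself becomes %25)

# Decoding the double-encoded link once yields the single-encoded form,
# so the URL the server receives is no longer the one the site routes on.
print(unquote(twice))          # American_Dad%21-245
print(unquote(twice) == once)  # True
```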


@Pierce454

I think that the problem is that you have non-encoded forms of the links somewhere in your site. Googlebot has found those links and is trying to access those.

So, try to double-check that all the links on your pages are actually URL encoded.
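
One way to do that check, sketched with Python's standard html.parser: collect every <a href> whose path still contains a raw character from a chosen set. The MUST_BE_ENCODED set and the RawLinkFinder class are hypothetical and should be adjusted to the site's own whitelist. HTMLParser unescapes entities in attribute values, so a link written as &#33; is reported the same way a crawler would see it.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical: characters that should only ever appear percent-encoded
# in our internal links. Adjust to match the site's whitelist.
MUST_BE_ENCODED = set("!()")


class RawLinkFinder(HTMLParser):
    """Collect href values whose path contains an unencoded character."""

    def __init__(self) -> None:
        super().__init__()
        self.offenders: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                path = urlsplit(value).path
                if MUST_BE_ENCODED & set(path):
                    self.offenders.append(value)


page = """
<a href="/American_Dad%21-245">encoded</a>
<a href="/American_Dad!-245">raw</a>
"""
finder = RawLinkFinder()
finder.feed(page)
print(finder.offenders)  # ['/American_Dad!-245']
```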

The post you were referring to talks about double-encoding URLs in Google sitemaps, which is a separate issue.


@Heady270

Section 2.2 of RFC 3986 addresses reserved "sub delimiters" such as !:

It says that when you produce URLs containing these characters, you should percent-encode them:


URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component.
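
On the producing side, this is what Python's urllib.parse.quote does by default. It percent-encodes reserved characters such as !, ( and ), and leaves unreserved ones (letters, digits, _, -, ., ~) alone. The build_link helper and the "<title>-<id>" slug format are hypothetical, modeled on the example URL in the question.

```python
from urllib.parse import quote


def build_link(base: str, title: str, item_id: int) -> str:
    """Build an internal link with a percent-encoded slug (hypothetical format)."""
    slug = f"{title.replace(' ', '_')}-{item_id}"
    # safe='' so that even '/' would be encoded inside this single segment.
    return f"{base}/{quote(slug, safe='')}"


print(build_link("https://example.com", "American Dad!", 245))
# https://example.com/American_Dad%21-245
```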


However, it also says that you should accept the characters either encoded or unencoded, unless the unescaped version of the character is used as a delimiter in that particular portion of the URL:


If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.


! has no delimiting role in the path portion of an HTTP URL. You should not be returning a 400 status when you encounter it unencoded there. You should treat %21 and ! the same and show the same page in either case.
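
A minimal sketch of that behavior, assuming pages are looked up by their decoded slug. The PAGES dictionary and the resolve function are hypothetical stand-ins for the site's real routing. Because the path is decoded before lookup, both spellings reach the same entry. Note that unquote also decodes %2F to /. That is harmless for a dictionary lookup but would need more care if the slug were ever used to build a file path.

```python
from urllib.parse import unquote

# Hypothetical in-memory store keyed by the *decoded* slug.
PAGES = {"American_Dad!-245": "American Dad! (id 245)"}


def resolve(path: str) -> tuple[int, str]:
    """Map a request path to (status, body), treating %21 and ! alike."""
    slug = unquote(path.lstrip("/"))
    if slug in PAGES:
        return 200, PAGES[slug]
    return 404, "Not Found"


for p in ("/American_Dad%21-245", "/American_Dad!-245"):
    print(p, resolve(p))  # both report status 200 and the same page
```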
