How do search engines handle protocol-relative links?

@Hamaas447

Posted in: #Http #Https #Links #RelativeUrls #SearchEngines

With many sites supporting but not requiring HTTPS now, there's an increase in protocol-relative links.

A protocol-relative link is one where the scheme is omitted, so the browser resolves it to HTTPS if the page containing the link is viewed over HTTPS, and to HTTP if the page containing the link is viewed over HTTP. For example, a link written as href="//example.com/page" is protocol-relative; it takes on whatever protocol the page it appears on was loaded with.
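
You can see the resolution behaviour in a quick sketch using Python's standard urllib.parse module; urljoin resolves references the same way browsers do, and the domains below are placeholders:

```python
from urllib.parse import urljoin

# A protocol-relative (scheme-relative) reference inherits the
# scheme of the page it appears on.
link = "//cdn.example.com/script.js"

# Viewed over HTTP, the reference resolves to HTTP...
print(urljoin("http://example.com/page.html", link))
# -> http://cdn.example.com/script.js

# ...and viewed over HTTPS, the same markup resolves to HTTPS.
print(urljoin("https://example.com/page.html", link))
# -> https://cdn.example.com/script.js
```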

How do search engines parse protocol-relative links? If the Googlebot is crawling a page via HTTP, will it remain in HTTP when following protocol-relative links, or will it know to look at both the HTTP and HTTPS versions of the target link?


@Samaraweera270

How do search engines parse protocol-relative links? 


Web crawlers follow the same conventions browsers do when parsing URIs, as described in RFC 3986, including performing URL normalization in order to avoid crawling the same resource more than once.
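
As a rough illustration, here's a simplified normalization sketch in Python; real crawlers apply more rules (percent-encoding case, trailing slashes, and so on), so treat this as an approximation of the RFC 3986 steps, not a complete implementation:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Lowercase the scheme and host, drop default ports, collapse
    dot segments, and discard the fragment -- a subset of the
    normalization rules in RFC 3986."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(scheme)
    netloc = host if parts.port in (None, default_port) else f"{host}:{parts.port}"
    # Remove "." and ".." path segments (RFC 3986 section 5.2.4).
    segments = []
    for seg in parts.path.split("/"):
        if seg == ".":
            continue
        if seg == ".." and segments and segments[-1]:
            segments.pop()
        elif seg != "..":
            segments.append(seg)
    path = "/".join(segments) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))

print(normalize("HTTP://Example.COM:80/a/./b/../c?x=1"))
# -> http://example.com/a/c?x=1
```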

Some crawlers, like the Googlebot, even render webpages, so protocol-relative and relative URIs appear the same way to them as they do to users with modern browsers: resolved against the base URL of the page they were found on.

Google's HTTPS guidance also states that using relative URLs for resources that live on the same secure domain helps ensure your links and resources always use HTTPS.



You can test this with the Googlebot by using the Fetch and Render mode in Fetch as Google, in which:


Googlebot gets all the resources referenced by your URL, such as picture, CSS, and JavaScript files, running any code to render or capture the visual layout of your page as an image. You can use the rendered image to detect differences between how Googlebot sees your page and how your browser renders it.


By adding an image whose source is a protocol-relative URI (for example, src="//static.example.com/test.png", a placeholder), you'll be able to see whether the Googlebot renders the page with that image or not.



Protocol-relative and relative URIs can, however, cause errors with some crawlers that are less sophisticated than the Googlebot, since these often use parallel architectures in which URLs are parsed from the source code, stored in a database, and then crawled in parallel. Unless the base path of the URL under which a relative URI was found is appended to it, the crawler won't be able to resolve it later (see the sketch below).
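
A robust crawler avoids this by resolving every reference against the URL of the page it was found on before storing it. Here is a hypothetical sketch using Python's standard html.parser module; the class name and URLs are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects href/src values, resolving each against the URL of
    the page it was found on *before* it is stored, so the scheme
    and base path are never lost."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # urljoin handles absolute, relative, and
                # protocol-relative references uniformly.
                self.urls.append(urljoin(self.page_url, value))

parser = LinkExtractor("https://example.com/blog/post.html")
parser.feed('<a href="//cdn.example.com/a.js">x</a><img src="../logo.png">')
print(parser.urls)
# ['https://cdn.example.com/a.js', 'https://example.com/logo.png']
```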

Another problematic area is sitemaps created automatically by sitemap tools, since these often just parse the relative URIs out of the source code and list them in the sitemap as-is, resulting in the same issue as above; the sitemap protocol requires fully qualified URLs, so such entries are simply invalid. A resolving helper is sketched below.
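
A sitemap generator can avoid the same trap by resolving each discovered reference to an absolute URL before writing it out. This hypothetical helper (the function name and URLs are made up) sketches the idea:

```python
from urllib.parse import urljoin, urlsplit

def sitemap_entries(page_url, hrefs):
    """Yield sitemap <url> entries, resolving every reference to a
    fully qualified URL as the sitemap protocol requires."""
    for href in hrefs:
        absolute = urljoin(page_url, href)
        # Guard against anything that still lacks a scheme.
        assert urlsplit(absolute).scheme in ("http", "https")
        yield f"<url><loc>{absolute}</loc></url>"

for entry in sitemap_entries("https://example.com/docs/",
                             ["intro.html", "//example.com/faq", "/contact"]):
    print(entry)
# <url><loc>https://example.com/docs/intro.html</loc></url>
# <url><loc>https://example.com/faq</loc></url>
# <url><loc>https://example.com/contact</loc></url>
```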

You can possibly circumvent these issues by setting a base element, which instructs browsers and bots how to resolve relative URIs found on that page, including which protocol to use. Even so, it's highly recommended to use absolute URLs whenever possible to avoid these issues altogether.
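
In resolution terms, a base element simply substitutes the declared base URL for the page's own URL; a minimal sketch with placeholder URLs:

```python
from urllib.parse import urljoin

page_url = "http://example.com/blog/post.html"  # where the page was fetched from
base_href = "https://example.com/blog/"         # declared in <base href="...">

# Without a base element, relative references resolve against the
# page's own URL and inherit its (here insecure) scheme.
print(urljoin(page_url, "images/logo.png"))
# -> http://example.com/blog/images/logo.png

# With the base element, the same reference resolves against the
# declared base instead.
print(urljoin(base_href, "images/logo.png"))
# -> https://example.com/blog/images/logo.png
```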
