
Should I disallow crawling of HTTP after moving to HTTPS?

@Sherry384

Posted in: #301Redirect #Https #Migration #Seo #WebCrawlers

I am migrating my well indexed website from HTTP to HTTPS only.

Steps planned:


All HTML pages' canonical URLs will point to HTTPS
All HTTP pages will 301 redirect to HTTPS with the same URL (see the sketch after this list)
All sitemaps will have HTTPS URLs
All links on HTML pages will be HTTPS
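
For reference, step 2 might be implemented with a rule like the following (a minimal sketch assuming Apache with mod_rewrite; nginx and other servers have equivalents):

    # .htaccess sketch: send every plain-HTTP request to its HTTPS counterpart
    RewriteEngine On
    # match requests that did not arrive over HTTPS...
    RewriteCond %{HTTPS} off
    # ...and 301 redirect to the same host and path over HTTPS
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]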


Issue 1:

Currently the robots.txt served over HTTPS disallows everything, because we did not want crawlers to crawl the HTTPS pages.

What should our approach be now?
Allowing HTTPS for all links is obvious.

But what about allowing or disallowing HTTP? If the crawler cannot crawl HTTP, how will it learn that nothing new has happened and that HTTP has simply been migrated to HTTPS? The 301 will indicate that, but only when the crawler actually requests the same page over HTTP.
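
To make the robots.txt question concrete, the change on the HTTPS side would look roughly like this (www.example.com standing in for the real host):

    # Old robots.txt at https://www.example.com/robots.txt (blocked everything)
    User-agent: *
    Disallow: /

    # New robots.txt (an empty Disallow blocks nothing, so crawling is allowed)
    User-agent: *
    Disallow:

Once the 301s are in place, a request for the HTTP robots.txt is itself redirected to this HTTPS file, so there is only one robots.txt left to maintain.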

Issue 2:

If I put all the HTTPS links in the sitemap, it lands me at the same question as above. The crawler will start crawling the HTTPS links and index them, but we want it to know that each one is the same page as its HTTP version, so that rankings are passed on.

So, ideally, should we have both HTTP and HTTPS URLs in the sitemaps for some span of time, until all our links are indexed as HTTPS?

As per my current understanding, the plan is as follows:

If Google stops crawling HTTP, it won't pass rankings. So I am planning to allow both HTTP and HTTPS as far as robots.txt is concerned.

But I don't know how they will crawl it, since on our web server we are 301 redirecting every HTTP link to its HTTPS counterpart. So even if a crawler fetches the root of my site, say http://www.example.com, it will be redirected to https://www.example.com, where it will find only HTTPS links and follow those.

Maybe it will also try to crawl the links it has already indexed (all HTTP links) and find that they are redirected, but that depends on crawler behavior and frequency. In the span between the crawler discovering the 301s and it also having crawled the HTTPS versions, this could create a duplicate content issue and hence affect our rankings.


3 Comments


@Karen161

If you're already 301 redirecting HTTP traffic to HTTPS, then you've got the biggest step done already. Any links to HTTP pages will get redirected to the relevant HTTPS page, and search engines will follow those just as real users do. As long as your sitemap refers to the HTTPS versions as well, you should be fine.

As for more niggly bits, I'd also suggest checking the following:

Sitemap

You mentioned updating this, but some people use plugins to regenerate this regularly. Make sure whatever script you use doesn't accidentally replace HTTPS with HTTP by some automated process.
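
For reference, each entry in the regenerated sitemap should carry the HTTPS URL, roughly like this (www.example.com is a placeholder; the xmlns value is a fixed namespace identifier and correctly stays http://):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <!-- every <loc> should now point at the HTTPS version -->
        <loc>https://www.example.com/some-page/</loc>
      </url>
    </urlset>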

Canonicals

If you have canonical links in place across your site, make sure they point to HTTPS. If you use a WordPress plugin it might not automatically pick up the new "Site Address", so check your SEO plugins specifically. If you have a custom site, just check the protocol you include.
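
After the migration, a correct canonical tag would look like this (www.example.com again being a placeholder):

    <link rel="canonical" href="https://www.example.com/some-page/">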

Robots.txt

The HTTP version of your robots.txt won't even be readable anymore if it's being redirected to the HTTPS version. Just make sure the version you serve over HTTPS doesn't block pages you actually want crawled.

Internal Links

Linking to pages within your site should always use HTTPS now. Global menus may be easy to check, but checking in-page links gets more difficult. phpMyAdmin has a decent search facility to find any, so if you have it, search for www.example.com and update from there. Other DB tools should have similar facilities. WordPress has plugins which even let you do in-place search/replace.
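
As a sketch of that database approach, assuming a standard WordPress schema (take a backup first):

    -- find posts still containing HTTP links to the site
    SELECT ID, post_title
    FROM wp_posts
    WHERE post_content LIKE '%http://www.example.com%';

    -- rewrite them to HTTPS; note that a plain REPLACE like this will corrupt
    -- PHP-serialized data (e.g. in options/meta tables), so use a dedicated
    -- search/replace tool for those
    UPDATE wp_posts
    SET post_content = REPLACE(post_content,
        'http://www.example.com', 'https://www.example.com');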

External Links

You won't be able to control all external links coming into your website (oh how nice that would be) but you likely have control over more than you realise. Update all of your social media profiles (Facebook/Twitter/&c.) to link back to the HTTPS version of your website. Check links from email signatures as well, just to cover your bases.

HSTS

HTTP Strict Transport Security is a way to tell browsers to only use HTTPS when coming back to your website. Even if a user clicks an HTTP link, if their browser knows your HSTS policy it will automatically request the HTTPS version without having to wait for a redirect. You can even submit your website to the STS preload list, so browsers ship with your domain included and request resources over HTTPS by default. Add a Strict-Transport-Security: max-age=10886400; includeSubDomains; preload header and submit to hstspreload.appspot.com/ or read www.owasp.org/index.php/HTTP_Strict_Transport_Security for more info.
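
As a sketch, on Apache with mod_headers the header could be set like this; it belongs in the HTTPS virtual host, since browsers ignore an STS header received over plain HTTP:

    # send HSTS on HTTPS responses (values from the advice above)
    <IfModule mod_headers.c>
        Header always set Strict-Transport-Security "max-age=10886400; includeSubDomains; preload"
    </IfModule>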


@Michele947

No. Simply perform the shift from HTTP to HTTPS. I don't know your business model, but depending on the authority of your website you may see some disruption in your Google rankings. An equilibrium will be established with Google within about 4 weeks, provided you have 301 redirected the old pages to the new ones. The simple answer is: don't block HTTP in the interim.


@Samaraweera270

Issue 1

No, there's no advantage to blocking crawling of HTTP, so there's no reason to do it. Further, and this is a bit speculative, it may interfere with the flow of value from external links that reference your old HTTP URLs.

Issue 2

Again, no benefit in this. The 301 redirects will do the job of passing value to the new URLs.
