Should I disallow crawling of HTTP after moving to HTTPS?
I am migrating my well-indexed website from HTTP to HTTPS only.
Steps planned:
The canonical URL on all HTML pages will now be HTTPS
All HTTP pages will 301 redirect to the HTTPS version of the same URL
All sitemaps will have HTTPS URLs
All links on HTML pages will be HTTPS
Issue 1:
Currently the robots.txt served over HTTPS disallows everything, as we did not want crawlers to crawl the HTTPS pages.
What should our approach be now?
Allowing HTTPS for all links is the obvious part.
But what about allowing or disallowing HTTP? If the crawler does not crawl HTTP, how would it know that nothing new has happened and that HTTP has simply been migrated to HTTPS? The 301 will indicate that, but only when the crawler requests the same page over HTTP.
Issue 2:
If I add all the HTTPS links to the sitemap, it brings me back to the same question as above. The crawler will start crawling the HTTPS links and index them, but we want it to know that these are the same pages as the HTTP versions so that it passes on the rankings.
So, ideally, for some span of time until all our links are indexed as HTTPS, should we have both HTTP and HTTPS URLs in the sitemaps?
As per my current understanding, the plan is as follows:
If Google stops crawling HTTP, it won't pass on the rankings, so I am planning to allow both HTTP and HTTPS as far as robots.txt is concerned.
But I don't know how crawlers will handle this, because on our web server we are 301 redirecting every HTTP URL to its HTTPS counterpart. So even if a crawler requests the root of my site over HTTP, say http://www.example.com, it will be redirected to https://www.example.com, where it will find only HTTPS links and will follow those.
Maybe it will also try to crawl the links it has already indexed (all HTTP links) and find that they are redirected, but that depends on crawler behavior and frequency. In the span between the crawler recognizing the 301s and it also crawling the HTTPS versions, this could create a duplicate content issue and hence affect our rankings.
If you're already 301 redirecting HTTP traffic to HTTPS then you've got the biggest step already done. Any links to HTTP pages will get redirected to the relevant HTTPS page, and search engines will follow those the same as real users. As long as your sitemap refers to the HTTPS version as well you should be fine.
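For reference, a minimal redirect rule, assuming an Apache server with mod_rewrite enabled (nginx and other servers have their own equivalents, so treat this as a sketch rather than your exact config):
    RewriteEngine On
    RewriteCond %{HTTPS} off
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
The R=301 flag is what marks the move as permanent, which is the signal search engines use to transfer the old URL's value to the HTTPS one.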
As for more niggly bits, I'd also suggest checking the following:
Sitemap
You mentioned updating this, but some people use plugins to regenerate this regularly. Make sure whatever script you use doesn't accidentally replace HTTPS with HTTP by some automated process.
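Whatever generates it, every <loc> entry should come out with the https:// scheme, along these lines (the URLs are placeholders):
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/</loc>
      </url>
      <url>
        <loc>https://www.example.com/some-page/</loc>
      </url>
    </urlset>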
Canonicals
If you have canonical links in place across your site, make sure they point to HTTPS. If you use a WordPress plugin it might not automatically pick up the new "Site Address", so check your SEO plugins specifically. If you have a custom site, just check the protocol you include.
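Each page should end up with a canonical tag along these lines (the page URL here is just a placeholder):
    <link rel="canonical" href="https://www.example.com/some-page/">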
Robots.txt
The HTTP version of your robots.txt won't even be readable anymore if it's being redirected to the HTTPS version. Just make sure the version you serve over HTTPS doesn't block pages you actually want crawled.
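A minimal robots.txt served at the HTTPS root that allows crawling and points to the HTTPS sitemap could look like this (the sitemap path is an assumption, adjust it to wherever yours lives):
    User-agent: *
    Disallow:

    Sitemap: https://www.example.com/sitemap.xml
An empty Disallow line means nothing is blocked.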
Internal Links
Linking to pages within your site should always use HTTPS now. It may be easy to check in global menus, but checking in-page links gets more difficult. phpMyAdmin has a decent search facility to find any, so if you have that, search for http://www.example.com and update from there. Other DB tools have similar facilities. WordPress has plugins which even let you do in-place search/replacements.
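As a sketch, assuming a standard WordPress schema (wp_posts table, post_content column; adjust the names for your own database and test against a backup first):
    UPDATE wp_posts
    SET post_content = REPLACE(post_content, 'http://www.example.com', 'https://www.example.com');
Note that a plain SQL REPLACE can break serialized data in some tables (wp_options, for example), which is one reason the dedicated search/replace plugins are often the safer route.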
External Links
You won't be able to control all external links coming into your website (oh how nice that would be) but you likely have control over more than you realise. Update all of your social media profiles (Facebook/Twitter/&c.) to link back to the HTTPS version of your website. Check links from email signatures as well, just to cover your bases.
HSTS
HTTP Strict Transport Security is a way to inform browsers to only use HTTPS when coming back to your website. Even if a user clicks an HTTP link, if their browser knows your HSTS policy it will automatically request the HTTPS version without having to wait for a redirect. You can even submit your website to the HSTS preload list so browsers ship with your domain included and will request resources over HTTPS by default. Add a Strict-Transport-Security: max-age=10886400; includeSubDomains; preload header and submit to hstspreload.appspot.com/ or read www.owasp.org/index.php/HTTP_Strict_Transport_Security for more info.
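In Apache, assuming mod_headers is enabled, that header could be set like this (only add the preload directive once you're confident you won't need to roll back, since getting removed from the preload list is slow):
    Header always set Strict-Transport-Security "max-age=10886400; includeSubDomains; preload"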
No. Simply perform the shift from HTTP to HTTPS. I don't know your business model, but depending on the authority of your website you may see a massive disruption in your Google rankings. An equilibrium with Google will be established within about 4 weeks, provided you have redirected the old pages to the new ones. The simple answer is: don't block the intermediate HTTP link.
Issue 1
No, there's no advantage to blocking crawling of HTTP so no reason to do it. Further, and this is a bit speculative, it may interfere with the flow of value from external links referencing your old HTTP versions.
Issue 2
Again, no benefit in this. The 301 redirects will do the job of passing value to the new URLs.