How to allow Google to index protected content?
I'm working on my site, and since it requires users to log in, it will be hard for Google to index it: to see 90% of the content you have to be logged in.
So I've made a script that checks the IP address of every visitor to the site. If the IP falls between 66.249.66.1 and 66.249.71.206, I mark the visitor as logged in (treating it as the Google bot), so it can see all the pages a normal user would see.
Is this a good idea? Are there more ip ranges? Can I trust these ip ranges?
3 Comments
What you are doing is not a good idea and can be penalized as cloaking.
Until 1 October 2017 the best practice was First Click Free, as mentioned in a previous answer. Since October 2017, however, this has changed.
Google now uses Flexible Sampling for paywalled or otherwise restricted content.
Basically, Google lets publishers decide how much content they will offer without restrictions, but they should mark up their content accordingly, so that Google understands which content is protected and doesn't penalize the site for cloaking. A publisher can decide to offer a limited number of pages, or just portions of pages, for free and keep the rest restricted.
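For example, Google's paywalled-content structured data uses the isAccessibleForFree property, with a cssSelector pointing at the gated part of the page. A minimal sketch (the .paywall class name and the headline are placeholders, not anything prescribed by Google):

```html
<!-- The gated section carries the class that cssSelector points at. -->
<div class="paywall">Subscriber-only text goes here.</div>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example headline",
  "isAccessibleForFree": "False",
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": "False",
    "cssSelector": ".paywall"
  }
}
</script>
```

With this markup Googlebot can crawl the full text while Google still knows the content is gated, which is what keeps the setup from being treated as cloaking.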
Google indexes all restricted pages as long as its crawlers can see them. However, the fact that they are protected may affect their ranking in ways only Google knows.
If you want to give Google access to restricted content, you can use First Click Free by Google.
First Click Free is designed to protect your content while allowing you to include it in Google's search index. To implement First Click Free, you must allow all users who find your page through Google search to see the full text of the document that the user found in Google's search results and that Google's crawler found on the web, without requiring them to register or subscribe to see that content. The user's first click to your content is free and does not require logging in. You may, however, block the user with a login, payment, or registration request when they try to click away from that page to another section of your content site.
This is not a good idea, and no, you cannot trust these IP ranges. The IP addresses used by Google are not published. However, most search engine crawlers can be identified by doing a reverse DNS lookup on the IP address.
A Googlebot example: 66.249.64.0 has a PTR record pointing to crawl-66-249-64-0.googlebot.com, and any IP whose PTR record resolves to a subdomain of googlebot.com is an IP address used by Googlebot.
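Here is a minimal sketch of that check in Python, assuming you only care about Googlebot proper. The two-step reverse-then-forward lookup matters, because a PTR record alone can be faked:

```python
import socket

def is_googlebot(ip: str) -> bool:
    """Verify a claimed Googlebot IP: reverse DNS, then a
    forward lookup to confirm the PTR record isn't spoofed."""
    try:
        # Step 1: reverse lookup; the hostname must be under googlebot.com.
        host, _aliases, _ips = socket.gethostbyaddr(ip)
        if not host.endswith(".googlebot.com"):
            return False
        # Step 2: forward-confirm; the hostname must resolve back to
        # the same IP, otherwise anyone could fake the PTR record.
        _name, _aliases, addrs = socket.gethostbyname_ex(host)
        return ip in addrs
    except OSError:  # herror and gaierror both subclass OSError
        return False

print(is_googlebot("66.249.66.1"))
```

Since DNS lookups are slow, you would normally do this once per new IP and cache the result rather than repeating it on every request.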
What you are doing is showing one set of content to Google and another to the actual user. This is heavily frowned upon and is called cloaking.
You should watch Matt Cutts' definitive cloaking video.
The best option is to take the subset of your content that you are willing to make publicly visible, expose that portion of the site to search engines and users alike, and require a login for anything more, as in the sketch below.
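As a rough sketch of that pattern (Flask and the in-memory ARTICLES store are stand-ins for whatever stack and database the site actually uses):

```python
from flask import Flask, abort, session

app = Flask(__name__)
app.secret_key = "change-me"  # required for sessions

# Stand-in content store: a public teaser plus the gated full text.
ARTICLES = {"intro": {"teaser": "First paragraph...", "full": "Whole article..."}}

@app.route("/article/<slug>")
def article(slug):
    doc = ARTICLES.get(slug) or abort(404)
    if session.get("user_id"):   # logged-in user sees everything
        return doc["full"]
    return doc["teaser"]         # guests and Googlebot get the same teaser
```

Because the decision hinges only on the session and never on the visitor's IP or user agent, Googlebot sees exactly what an anonymous visitor sees, so there is nothing to penalize.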