How do I comply with Google's First Click Policy with PHP?
I recently password protected my site and made it a requirement for users to sign in or register in order to view the content and post on the forums. After I did this, Google dropped most of my pages from its search results, since GoogleBot could no longer index them due to the password protection. This means that even though my site contains really relevant information, Google can no longer access it.
I've researched Google's First Click Policy, which seems like a feasible option for my site. It basically states:
...we will crawl and index your site to the extent that you allow Googlebot to access it. In order to provide the best possible user experience and help more users discover your content, we encourage you to try First Click Free. If you prefer to limit access to your site to subscribers only, we will respect your decision and show a “subscription” label next to your links.
In this article, Google also states some guidelines for implementing this policy:
To implement First Click Free, you must allow all users who find your page through Google search to see the full text of the document that the user found in Google's search results and that Google's crawler found on the web without requiring them to register or subscribe to see that content. The user's first click to your content is free and does not require logging in. You may, however, block the user with a login or payment or registration request when he tries to click away from that page to another section of your content site.
I don't want to open my site up for public viewing, but how can I let Google index my site again while still maintaining a level of content protection and required membership? Using PHP, what steps should be taken to ensure that GoogleBot can read and index the content of my site, while restricting users to viewing just the article that Google refers them to?
As stated in the question, the best option for this type of issue is to allow GoogleBot to index the site while restricting what regular visitors can view.
To do this, you need to be able to differentiate between three types of viewers:
GoogleBot
A visitor being referred by Google search (as stated in the First Click Policy)
Any other visitor, either referred by a site other than Google or arriving with no referrer at all
In order to differentiate between these three types, you need to perform two checks: one on HTTP_USER_AGENT and another on HTTP_REFERER.
Through the use of $_SERVER['HTTP_USER_AGENT'], detect if GoogleBot is accessing the website. If true, allow GoogleBot to access it without requiring a login.
Similarly, use $_SERVER['HTTP_REFERER'] to detect where the user came from. If the user was referred by another site, this variable will contain that site's URL. If the referrer is a Google search domain (www.google.com or a country-specific variant such as www.google.co.uk), you should allow the viewer to access the page.
If the viewer isn't GoogleBot and wasn't referred by Google, you should force them to log in. This ensures that even if they were brought to your site by Google, their second click will require them to log in. If they came from a site other than Google, or typed the URL directly, they will also be forced to log in.
Here is a piece of code that performs these basic checks:
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
$referer   = isset($_SERVER['HTTP_REFERER'])    ? strtolower($_SERVER['HTTP_REFERER'])    : '';
//check if the viewer is GoogleBot; if so, allow
if (strpos($userAgent, 'googlebot') !== false)
{
    //allow access
}
//check if the viewer was referred by Google (any country domain); if so, allow
elseif (strpos($referer, 'www.google.') !== false)
{
    //allow access
}
//if the viewer isn't from Google, block visitor viewing
else
{
    //redirect to the login page (adjust the URL to match your site)
    header('Location: /login.php');
    exit;
}
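Because the header() redirect only works before any output has been sent, these checks should run at the very top of every protected page, before any HTML. A rough sketch of one way to do that (the file names access_check.php and article.php are just examples):
<?php
// article.php -- a protected article page (file names are hypothetical examples)
require __DIR__ . '/access_check.php'; // runs the GoogleBot/referrer checks above and
                                       // redirects to the login page if neither matches
?>
<h1>Article title</h1>
<p>Full article text, visible to GoogleBot and to first-click visitors from Google...</p>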
For news sites, it is important to note that Google now requires 3 free articles per day if complying with the First Click Policy.
It is possible to limit the number of free articles that a Google News reader can access via First Click Free. A user coming from a host matching [www.google.] or [news.google.] must be able to see a minimum of 3 articles per day. This practice is described as "metering" the user: when the user has clicked on too many of a publisher’s articles from Google News, the meter for freely accessible articles on that site is exhausted.
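One way to implement that metering, sketched below, is to count Google-referred clicks per day in a cookie and only grant free access while the counter is under three. This is just one possible approach rather than anything Google prescribes; the cookie name fcf_count and its date:count format are assumptions, and a determined user can of course clear the cookie.
<?php
// metering sketch: allow up to 3 free Google/Google News referrals per day
$referer = isset($_SERVER['HTTP_REFERER']) ? strtolower($_SERVER['HTTP_REFERER']) : '';
$fromGoogle = (strpos($referer, 'www.google.') !== false
            || strpos($referer, 'news.google.') !== false);

if ($fromGoogle) {
    $today = date('Y-m-d');
    // cookie value format: "YYYY-MM-DD:count" (hypothetical format)
    $parts = isset($_COOKIE['fcf_count']) ? explode(':', $_COOKIE['fcf_count']) : array($today, '0');
    $count = (isset($parts[1]) && $parts[0] === $today) ? (int) $parts[1] : 0;

    if ($count < 3) {
        // still within the free quota: record this click and show the full article
        setcookie('fcf_count', $today . ':' . ($count + 1), time() + 86400, '/');
        // ...render the full content...
    } else {
        // quota exhausted: fall back to the normal login wall
        header('Location: /login.php'); // hypothetical login URL
        exit;
    }
}
// visitors who did not come from Google are handled by the checks shown earlier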
It is also extremely important to note that it is very easy to "spoof" an HTTP_USER_AGENT value. There are numerous browser plugins and applications that allow a viewer to send a user agent string pretending to be GoogleBot. This would allow a spoofer to view every crawlable and indexable page on the site, regardless of their membership status. Secondary steps to ensure that it really is the real GoogleBot may be necessary, but a basic check like this one should work in many cases.
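One such secondary step, which Google documents for verifying its crawler, is a reverse-and-forward DNS check: the hostname behind the requesting IP should end in googlebot.com or google.com, and that hostname should resolve back to the same IP. A rough sketch (the function name verify_googlebot is just an example):
<?php
// verify that a request claiming to be GoogleBot really comes from Google
// (reverse DNS lookup, then forward lookup to confirm)
function verify_googlebot($ip)
{
    $host = gethostbyaddr($ip); // reverse lookup: IP -> hostname
    if ($host === false || $host === $ip) {
        return false; // no usable reverse record
    }
    // the hostname must belong to Google
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    // forward lookup: the hostname must resolve back to the original IP
    return gethostbyname($host) === $ip;
}

// usage: only trust a "googlebot" user agent string if the IP checks out
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
if (strpos($ua, 'googlebot') !== false && verify_googlebot($_SERVER['REMOTE_ADDR'])) {
    // genuine GoogleBot: allow access
}
Because DNS lookups are slow, it is worth caching the result per IP address instead of repeating the lookup on every request.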