
How to identify whether the client is a search robot?

@Pierce454

Posted in: #Ajax #Apache

I have built my entire site using AJAX (with GWT, in fact). I have also implemented the AJAX crawling scheme proposed by Google. However, after implementing it, I found that neither Yahoo, Bing, nor Baidu supports that scheme!

I'm wondering if there is a way to identify whether the web client is a search robot. If it is, it will be shown the HTML snapshot I created.

It would be best if I could identify robots at the Apache level, so that I can just use mod_rewrite. But it's still OK if I can do that in PHP or GWT.


2 Comments


@Lee4591628

You can check the User-Agent HTTP header. www.user-agents.org/ is a good place to find out which user agents are crawlers.

You can also read more about logging in Apache. For example, you can generate a separate log for a list of user agents (bots).
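
As a rough illustration of the User-Agent check, here is a minimal Python sketch. The same substring test translates directly to PHP or to a mod_rewrite condition on the User-Agent header. The token list and the names BOT_TOKENS and is_search_bot are assumptions made for this example, not a complete or authoritative list of crawlers.

```python
import re

# Illustrative, incomplete set of crawler tokens (an assumption for this sketch).
# Extend it from a maintained source of crawler user agents.
BOT_TOKENS = ("Googlebot", "bingbot", "Slurp", "Baiduspider")

# One case-insensitive pattern that matches any of the tokens as a substring.
BOT_PATTERN = re.compile("|".join(re.escape(t) for t in BOT_TOKENS), re.IGNORECASE)


def is_search_bot(user_agent):
    """Return True if the User-Agent header contains a known crawler token."""
    if not user_agent:
        # A missing or empty header is treated as a regular visitor.
        return False
    return BOT_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(is_search_bot(ua))
```

Keep in mind that the header is supplied by the client, so any visitor can send a crawler's user agent string; this check only tells you what the client claims to be.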


@Si4351233

Search engine robots are, at the HTTP level, no different from any other user agent. Indeed, it is worth noting that many search engines (Google in particular) can get unhappy if their robots are served different content than regular visitors. This means that they tend to use generic user agent strings (e.g. Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)), usually with some identifying detail buried inside, such as Googlebot/2.1 in that example.

The best way of detecting such robots is to use an IP filter. You'll need to either compile your own list or rely on one like this.

Using such a list should enable you to handle all major search engine robots. Adding rewrite rules based on IP is also fairly simple, so it should meet your requirements. Just be sure to update the list every once in a while.
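
To show what the IP filter amounts to, here is a small Python sketch using the standard ipaddress module. The networks below are reserved documentation ranges used purely as placeholders, not real crawler addresses, and the names CRAWLER_NETWORKS and is_crawler_ip are made up for this example. In practice you would load the ranges from whichever list you maintain.

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), NOT real crawler addresses.
# Replace them with the ranges from the list you compile or rely on.
CRAWLER_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_crawler_ip(remote_addr):
    """Return True if the client address falls inside a listed crawler network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address, so treat the client as a regular visitor.
        return False
    # Membership is simply False when the IP versions differ (IPv4 vs IPv6).
    return any(addr in network for network in CRAWLER_NETWORKS)


if __name__ == "__main__":
    print(is_crawler_ip("192.0.2.15"))
    print(is_crawler_ip("203.0.113.9"))
```

Matching on whole networks rather than single addresses keeps the list short, which also makes the periodic update less of a chore.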
