How to identify whether the client is a search robot?
I have built my entire site using AJAX (it's GWT, in fact). I have also implemented the AJAX crawling scheme proposed by Google. However, after the implementation, I found that neither Yahoo, Bing, nor Baidu implements that scheme!
I'm wondering if there is a way to identify whether a web client is a search robot. If it is, it will be shown the HTML snapshot I created.
It would be best if I could identify them at the Apache level, so I could just use mod_rewrite. But it's still OK if I can do it in PHP or GWT.
You can check the User-Agent HTTP header. www.user-agents.org/ is a good place to find out which user agents are crawlers.
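If you want to act on that header at the Apache level, a minimal mod_rewrite sketch could look like the following. The crawler tokens and the /snapshots/ path are illustrative assumptions, not a complete list:

    # Sketch for httpd.conf / .htaccess: serve the pre-rendered snapshot
    # when the User-Agent contains a known crawler token.
    # The token list and the /snapshots/ path are assumptions.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Slurp|Baiduspider) [NC]
    # Avoid a rewrite loop once the request already points at a snapshot.
    RewriteCond %{REQUEST_URI} !^/snapshots/
    RewriteRule ^/?(.*)$ /snapshots/$1 [L]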
You can also read more about logging in Apache; for example, you can write requests from a list of user agents (bots) to a separate log.
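A sketch of that idea, using mod_setenvif to tag bot requests and a conditional CustomLog (the tokens and log path are assumptions):

    # Tag requests whose User-Agent matches a crawler token, then write
    # only those requests to a dedicated log file.
    SetEnvIf User-Agent "(Googlebot|bingbot|Slurp|Baiduspider)" is_bot
    CustomLog /var/log/apache2/bots.log combined env=is_bot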
Search engine robots are, as far as the server is concerned, no different from any other user agent. It is worth noting that many search engines (Google in particular) can get unhappy if their robots are served different content than regular visitors, so they tend to use generic, browser-like user agent strings, but usually with an identifying detail buried deeper, as in Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html).
The best way of detecting such robots is to use an IP filter. You'll need to either compile your own list of crawler IP ranges or rely on a publicly maintained one.
Using such a list should let you handle all the major search engine robots. Adding rewrite rules based on IP is also fairly simple, so it should meet your requirement; just be sure to update the list every once in a while.
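As a rough sketch of an IP-based rule: the 66.249.64.0/19 block is one range Google has published for Googlebot, but a real deployment needs the full, current list for every crawler you care about. The /snapshots/ path is again an assumption:

    # Rewrite requests from a known Googlebot range (66.249.64.0/19,
    # i.e. third octet 64-95) to the snapshot tree.
    RewriteEngine On
    RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.
    RewriteCond %{REQUEST_URI} !^/snapshots/
    RewriteRule ^/?(.*)$ /snapshots/$1 [L]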