Rule out third-party scraping, but allow Google crawling
How can I make scraping of my own content with wget, HTTrack, etc. impossible, while still allowing crawling by Googlebot?
This should be done without showing Googlebot different content than other user agents.
And please avoid IP recognition in your advice, if that is possible at all!
The current setup already does this based on IP recognition, and the server periodically goes down. The setup is:
first layer: nginx as a cache,
second layer: Apache with mod_security, which does the IP recognition and manages traffic,
third layer: Tomcat with the CMS.
The main bottleneck is currently mod_security and, partly, the path from mod_security to Tomcat. Changing the setup is not among the viable solutions.
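
To make the requirement concrete: one check that does not rely on maintaining IP lists is forward-confirmed reverse DNS, which Google documents as a way to verify Googlebot. It still starts from the client address, so it may only partly satisfy the "no IP recognition" wish. Below is a minimal Python sketch of that check, not a drop-in for the setup described above. The function name `is_verified_googlebot`, the injectable lookup parameters, and the suffix list are assumptions made for illustration.

```python
import socket

# Hostname suffixes commonly documented for Google crawlers.
# Assumption: this list may be incomplete or change over time.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a client IP (hypothetical helper).

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    can be injected for testing; by default the system resolver is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        # Step 1: the PTR record must point into a Google crawler domain.
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the hostname must resolve back to the same IP,
        # otherwise anyone controlling their own PTR record could pass.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror / socket.gaierror (no PTR record, lookup failure).
        return False
```

Such a check would only need to run for requests whose User-Agent claims to be Googlebot, and the result could be cached per address so that DNS lookups do not add load to every request.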
More posts by @Eichhorn148