


@Nickens628

Check your server logs and look for multiple entries from the same IP and see what is being requested.

In a standard Apache setup, you will want to look at the access_log file. Each entry starts with the IP address, followed by the date and time of the request in square brackets. Between "GET" and "HTTP" you will see the resource being requested, and after that a number for the status code. If it is 200, the request was served successfully, so that entry should be taken into consideration.

Scrapers generally request data at a high rate, so look at the date and time along with the IP address. If the same IP address appears many times within a short period and the requested resource is different each time, a temporary solution is to block that IP address to prevent it from continuing to scrape your content.
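As a rough illustration of that check, here is a minimal Python sketch. It assumes the log uses the common or combined log format; the function name `find_scrapers` and the threshold of 30 distinct resources per minute are made up for the example, so adjust them to your own traffic.

```python
import re
from collections import defaultdict

# Assumed layout (common/combined log format):
# IP ident user [timestamp] "METHOD path PROTOCOL" status ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def find_scrapers(lines, min_distinct_per_minute=30):
    """Return {ip: most distinct resources fetched in any one minute}
    for IPs that reach the (hypothetical) threshold."""
    seen = defaultdict(set)  # (ip, minute) -> set of requested paths
    for line in lines:
        m = LINE_RE.match(line)
        # Skip lines that do not parse and requests that were not 200.
        if m is None or m.group("status") != "200":
            continue
        # "10/Oct/2023:13:55:36 +0000" -> "10/Oct/2023:13:55"
        minute = m.group("time")[:17]
        seen[(m.group("ip"), minute)].add(m.group("path"))

    flagged = {}
    for (ip, _minute), paths in seen.items():
        if len(paths) >= min_distinct_per_minute:
            flagged[ip] = max(flagged.get(ip, 0), len(paths))
    return flagged
```

You would pass it the lines of your access_log (for example an open file object) and then review the returned IP addresses by hand before blocking anything, since a busy proxy or a legitimate crawler can look similar.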

A good way to make scraped copies of your site harder to pass off is to code your HTML so that the links to your assets are absolute URLs pointing directly to your server.

For example, instead of having this:

<img src="someimage.jpg" width="100" height="100" alt="picture">


You would have this:

<img src="http://example.com/someimage.jpg" width="100" height="100" alt="picture">


That way, if scrapers copy all your HTML and then try to host your website themselves, the assets will still be loaded from your server. They will have a harder time doing so, especially if you apply some form of hotlink protection to your images.
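If you have a lot of existing markup, the rewrite can be scripted. The following is only a sketch: the `absolutize` helper and the `BASE_URL` value are hypothetical, it handles only double-quoted `src` attributes, and a regular expression is not a full HTML parser, so check the result before publishing.

```python
import re
from urllib.parse import urljoin

# Hypothetical base address; replace with your own site's URL.
BASE_URL = "http://example.com/"

# Matches double-quoted src="..." attributes only.
SRC_RE = re.compile(r'\bsrc="([^"]*)"')

def absolutize(html, base_url=BASE_URL):
    """Rewrite relative src values as absolute URLs under base_url.
    Values that are already absolute are left unchanged by urljoin."""
    def replace(match):
        return 'src="%s"' % urljoin(base_url, match.group(1))
    return SRC_RE.sub(replace, html)
```

Running it on the first `<img>` tag above should produce the second one.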
