
Why do people crawl sites without downloading pictures?

@Kimberly868

Posted in: #WebCrawlers

Let me show you what I mean:

IP              Pages  Hits  Bandwidth
85.xx.xx.xxx      236   236  735.00 KB
195.xx.xxx.xx     164   164  533.74 KB
95.xxx.xxx.xxx     90    90  293.47 KB


It's very clear that these people are crawling my site with bots. There's no way that you could visit my site and use less than 1 MB of bandwidth. You might say that they could be browsing the site with a browser or plug-in that does not download images, JS/CSS files, etc., but the simple fact of the matter is that there are not 90-236 pages linked from the home page (outside of WP files), even if you visited every page twice.
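To put numbers on that reasoning, here is a minimal Python sketch that works out the bandwidth per hit for the three rows in the table. It assumes the table comes from an AWStats-style report, where "Hits" counts every request and "Pages" counts only page views, so hits equal to pages means no images, CSS, or JS were requested. The 10 KB cutoff and the function names are hypothetical, chosen only for illustration.

```python
# Traffic rows copied from the table above: (masked IP, pages, hits, bandwidth in KB).
ROWS = [
    ("85.xx.xx.xxx", 236, 236, 735.00),
    ("195.xx.xxx.xx", 164, 164, 533.74),
    ("95.xxx.xxx.xxx", 90, 90, 293.47),
]

# Hypothetical cutoff: below this many KB per hit, assume only markup was fetched.
MAX_KB_PER_HIT = 10.0


def looks_like_html_only(pages, hits, kb, max_kb_per_hit=MAX_KB_PER_HIT):
    """Return True when every hit was a page view and each hit was tiny."""
    if hits == 0:
        return False
    return hits == pages and kb / hits < max_kb_per_hit


def summarize(rows):
    """Return (ip, KB per hit rounded to 2 places, html-only flag) for each row."""
    return [
        (
            ip,
            round(kb / hits, 2) if hits else 0.0,
            looks_like_html_only(pages, hits, kb),
        )
        for ip, pages, hits, kb in rows
    ]


for ip, kb_per_hit, flagged in summarize(ROWS):
    print(f"{ip:<16} {kb_per_hit:>6.2f} KB/hit  html-only: {flagged}")
```

All three clients come out at roughly 3 KB per hit with hits equal to pages, which is consistent with something that requests the HTML and nothing else.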

I could understand if these people were crawling the site for pictures, but once again, the bandwidth indicates that this isn't what is happening. Why, then, would they crawl the site to simply view the HTML/txt/js/etc. files?

The only thing that I can come up with is that they are scanning for outdated versions of WordPress, SQL injection vulnerabilities, etc., which makes me inclined to outright ban the IPs. But I'm curious: is it possible that these are legitimate users, or at the very least, not intending to be harmful?
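One way to tell a scanner from a plain content crawler is to look at which paths each IP requested, not just how many. The following is a rough Python sketch under stated assumptions: the pattern list, the 20% threshold, and the sample paths are all hypothetical and would need tuning against real access logs. It is not a complete detector.

```python
import re

# Hypothetical probe patterns, for illustration only; tune them to your own logs.
PROBE_PATTERNS = [
    re.compile(r"/wp-login\.php", re.I),
    re.compile(r"/xmlrpc\.php", re.I),
    re.compile(r"/wp-admin/", re.I),
    re.compile(r"union(\s|\+|%20)+select", re.I),  # crude SQL injection marker
    re.compile(r"\.\./"),                          # path traversal attempt
]


def probe_ratio(paths):
    """Fraction of requested paths that match at least one probe pattern."""
    if not paths:
        return 0.0
    matched = sum(1 for p in paths if any(rx.search(p) for rx in PROBE_PATTERNS))
    return matched / len(paths)


def classify(paths, threshold=0.2):
    """Label a client 'probing' when enough of its requests look like probes."""
    return "probing" if probe_ratio(paths) >= threshold else "crawling"


# Made-up request paths for two imaginary clients.
crawler_paths = ["/", "/about/", "/2012/05/some-post/", "/category/news/"]
scanner_paths = ["/wp-login.php", "/xmlrpc.php", "/?id=1+UNION+SELECT+1,2", "/"]

print(classify(crawler_paths))
print(classify(scanner_paths))
```

A client that only walks ordinary post and category URLs is more likely an indexer. One that keeps hitting login and admin endpoints or sends odd query strings is a better candidate for a ban.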


2 Comments


@Debbie626

I've noticed several IPs spidering my company's main web sites in their entirety over the past 12 hours; it's nothing unusual. At least one of them appears to be a regular residential cable broadband connection here in the UK, so I suspect they may well be people performing SEO-related indexing to gauge keyword effectiveness before trying to game a keyword in Google (with our sites being just a few of the many hits from their machine as part of that).

Either that or they're looking for vulns!


@Ogunnowo487

Some bots simply crawl sites to index content. They're trying to do something like Google's crawler, and your site is one of the ones they try it on. Or they download the page with

file_get_contents("yoursite");


and similar methods, which do not retrieve images/CSS/JS.
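To show why a fetch like that uses so little bandwidth, here is a rough Python counterpart of the PHP call, written as a sketch, not as what any particular bot runs. `fetch_html_only` makes a single GET for the page. `list_assets` then lists the images, scripts, and stylesheets the markup refers to, which a browser would request next and a plain fetch never does. The function names and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class AssetLister(HTMLParser):
    """Collect URLs of sub-resources that a browser would fetch after the HTML."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only stylesheet links count as assets here.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.assets.append(attrs["href"])


def list_assets(html):
    """Return the asset URLs referenced by the given HTML, in document order."""
    parser = AssetLister()
    parser.feed(html)
    parser.close()
    return parser.assets


def fetch_html_only(url):
    """One GET for the page itself; nothing returned by list_assets() is requested."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Made-up markup so the example runs without touching the network.
SAMPLE = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<script src="/app.js"></script></head>'
    '<body><img src="/photo.jpg"></body></html>'
)
print(list_assets(SAMPLE))
```

For the sample markup, the three listed assets are exactly what never shows up in the logs for such a client: each visit is one hit of a few kilobytes.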
