
2 Comments


@Pope3001725

Most bots don't use wget but rather a crawler, and you can ask them to stay away by publishing a robots.txt file on your site. Some rogue bots won't honor your robots.txt, and those have to be blocked explicitly. You can identify them by reviewing your web server's logs, and also by applying publicly available blacklists.
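As a minimal sketch, a robots.txt placed at the root of your site might look like this (well-behaved crawlers honor it; rogue bots simply ignore it, which is why log review is still needed):

```
# Allow major search engines but keep them out of a private area
User-agent: Googlebot
Disallow: /private/

# Ask all other crawlers to stay away entirely
User-agent: *
Disallow: /
```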


@Ann8826881

Yes and No. Let me explain.

Any user can change the User-Agent string that Wget sends (for example with its --user-agent option). If the string is not changed, then Wget can easily be blocked using the following in your .htaccess file.

# Return 403 Forbidden to any request whose User-Agent contains "wget"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} wget [NC]
RewriteRule .* - [F,L]


However, if the User Agent string is changed, then you may never know that it is Wget.

That being said, one thing is clear to any webmaster who has been around for a while: the log file must be examined often for abusive activity. You can block bad actors as they come along, but it is impossible to block all of them ahead of time, though published blacklists will catch a lot of them.

You will be able to stop most abuse if you watch your site access logs often enough and know how to use .htaccess and regular expressions. It is not a difficult process, and it should be well understood by any webmaster.
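As a rough sketch of that log review, the short Python script below scans access-log lines in Apache's combined log format and collects the client IPs whose User-Agent matches a pattern such as "wget". The sample log lines and the function name are made up for illustration; in practice you would read your real log file (e.g. /var/log/apache2/access.log) instead.

```python
import re

# Hypothetical sample lines in Apache combined log format; in practice,
# read these from your real access log instead.
SAMPLE_LOG = """\
203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Wget/1.21.2"
198.51.100.4 - - [10/Oct/2023:13:55:40 +0000] "GET /page HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"
203.0.113.7 - - [10/Oct/2023:13:55:41 +0000] "GET /a.zip HTTP/1.1" 200 9000 "-" "Wget/1.21.2"
"""

# In the combined format, the User-Agent is the last double-quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

def suspicious_ips(log_text, pattern="wget"):
    """Return the set of client IPs whose User-Agent contains pattern."""
    hits = set()
    for line in log_text.splitlines():
        m = UA_RE.search(line)
        if m and pattern.lower() in m.group(1).lower():
            hits.add(line.split()[0])  # first field is the client IP
    return hits

print(suspicious_ips(SAMPLE_LOG))
```

The resulting IPs could then be denied in .htaccess or at the firewall. Note that this only catches clients that keep the default Wget User-Agent, which is exactly the limitation described above.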

While some will argue purely from a philosophical point of view, the fact is that Wget should be blocked in most cases. In all the years I have dealt with the web (and that is a lot more than almost anyone), Wget has served no purpose for an ordinary visitor except to scrape resources from a website. While some sites do open themselves up to this form of activity and actually invite it, every Wget access I have experienced has been a form of abuse or theft.
