What are some potential issues in blocking all incoming requests from the Amazon cloud?
Recently I, along with the rest of the world, have seen a significant increase in what appears to be scraping from Amazon AWS-related sources.
So simply put, I blocked all incoming requests from the Amazon cloud for our hosted application.
I know that some legitimate services/bots are now hosted in the cloud, and I'm wondering whether certain IP addresses should be allowed through, since they may gather data that would ultimately benefit our site's SEO rankings.
-- UPDATE --
I added a feature to block requests from the following hosts:
Amazon
Softlayer
ServerDeals
GigAvenue
Since then, I have seen my network traffic decrease (monitored via network-out bytes); normal operation averages around 10,000,000 bytes.
In the traffic graph you can see where I was not blocking last week and then started blocking. I've since removed the blocks and will see what the outcome is. A web-server-level sketch of this kind of block is shown below.
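For illustration only, this kind of block can also be applied at the web-server level. The sketch below is a minimal .htaccess example, assuming an Apache 2.4 server; the CIDR ranges shown are hypothetical placeholders (documentation ranges), not real provider ranges. AWS publishes its current ranges at https://ip-ranges.amazonaws.com/ip-ranges.json, which you would need to consult to build a real list.

    # Hypothetical example: deny placeholder CIDR ranges attributed to cloud providers.
    # Replace the ranges below with the provider ranges you actually want to block.
    <RequireAll>
        Require all granted
        Require not ip 192.0.2.0/24
        Require not ip 198.51.100.0/24
    </RequireAll>

Note that a list built this way goes stale quickly, since cloud providers add and reassign ranges frequently.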
It is possible that some sites that link to you may run regular checks on their outbound links to make sure that they still work; if you block them there is a chance their script will see your link as dead and remove it, hurting your SEO.
It is also possible that smaller search engines may use 'the cloud' to gather data, and blocking them would hurt your ability to get potential hits from them.
Finally, it is possible that any given IP will be assigned to a different source in the future that could send links and/or traffic your way.
These risks must be weighed against the costs of not blocking: A) the extra bandwidth use and load times caused by bots that don't send you the links and traffic described above, and B) the competitive cost of bots gathering SEO data on your site for services you don't use but your competitors might.
In any individual case the risk on either side may be small, so you may want to play it safe and not block cloud IPs unless you have a strong case for doing so. You can also get some of the benefits of blocking without the costs by examining your server logs and using htaccess to block specific user agents that belong to organizations you don't use and that don't send you any links or traffic; a minimal sketch of such a block is shown below. (Of course, this will not stop the least useful bots, which masquerade as common browsers, without more advanced techniques.)
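As an illustration, here is a minimal .htaccess sketch, again assuming Apache 2.4, that denies requests whose User-Agent matches certain patterns. The names ExampleScraperBot and AnotherBot are hypothetical and would need to be replaced with the agents you actually see in your logs.

    # Hypothetical user-agent block; replace the patterns with bots seen in your logs
    SetEnvIfNoCase User-Agent "ExampleScraperBot" block_bot
    SetEnvIfNoCase User-Agent "AnotherBot" block_bot
    <RequireAll>
        # Allow everyone except requests flagged above
        Require all granted
        Require not env block_bot
    </RequireAll>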
I have also found myself in this position and wanted to give you my feedback:
I've spent hours researching and refining my iptables rules to get the best of both worlds, but in reality it's not possible to achieve any real accuracy, so I have decided to take the scraping hit and allow full access to AWS.
As you have stated, AWS is a very popular platform for many systems, and since I don't want to take any risks, I have taken the safe path and removed my block.
I'm sure that in the near future AWS will host even more useful bots and services, so blocking it would not be an option.
This problem is like trying to tackle spam email from Gmail, for example: 90% of the email sent to our company address from @gmail.com addresses is spam, but we wouldn't want to block all of Gmail's servers, so instead we have to block each address manually.
I'm sure you agree this approach is best but not the most practical.
Regards,
William