
Crawling other websites - Can I do so?

@Holmes151

Posted in: #WebCrawlers

I want to build a meta search engine, so I want to crawl other websites, filter and organize the information I get, and present it to the user.

My questions are:


Can I do that (crawl) without permission from the other websites? Does it make a difference where the other website is hosted? I mean, Google does basically the same thing.
If I'm allowed to crawl the other websites, can I sell premium accounts on my website, for example to get the newest data? I'm unsure, since I'm working with data from others.


@LarsenBagley505

There are several issues you will encounter here, and they get more and more complex as you go.

As @closetnoc states in his comment, actually crawling a site is very expensive resource-wise: you are downloading entire web pages, often in parallel, and you need a crawler written to extract the specific data you are after. To crawl the internet and maintain a reasonably up-to-date index, Google runs a mind-boggling number of servers in data centers all over the world. Google does not advertise how many servers it uses, but conservative estimates based on power-usage figures released by Google put the number somewhere between 900,000 and well over a million servers.
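To make the "download whole pages in parallel, then extract what you need" point concrete, here is a minimal single-machine sketch using only the Python standard library. It is an illustration under assumptions, not a production crawler: the function names (`fetch`, `extract_title`, `crawl`) and the User-Agent string are hypothetical, and pulling out the page `<title>` stands in for "the given data you are after". Note that every page is downloaded in full even though only the title is kept, which is exactly the cost described above, multiplied by however many pages you crawl.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, Optional
from urllib.request import Request, urlopen


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only start capturing at the first <title>; later ones are ignored.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> Optional[str]:
    """Extraction step: keep only the title, discard the rest of the page."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download step: the whole page body is transferred, even if we keep little of it."""
    # Hypothetical User-Agent; a real crawler should identify itself properly.
    req = Request(url, headers={"User-Agent": "example-meta-search-bot/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(
    urls: Iterable[str],
    fetcher: Callable[[str], str] = fetch,
    max_workers: int = 8,
) -> Dict[str, Optional[str]]:
    """Fetch pages in parallel threads and map each URL to its extracted title."""
    url_list = list(urls)

    def fetch_and_extract(url: str) -> Optional[str]:
        try:
            return extract_title(fetcher(url))
        except (OSError, LookupError):
            # Network errors (URLError/HTTPError/timeouts are OSErrors) or an
            # unknown charset: record the page as having no usable data.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(url_list, pool.map(fetch_and_extract, url_list)))
```

For example, `crawl(["https://example.com/"])` would return a dict mapping that URL to its page title (or `None` on failure). The `fetcher` parameter exists so the download step can be swapped out, e.g. for testing without network access. Scaling this from a handful of threads on one machine to an index of the whole web is where the server counts above come from.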

You will also encounter copyright issues. While an argument could be made that the content was extracted under fair use, this wouldn't prevent someone from suing you for copyright infringement, leaving you to mount an expensive legal defence. This would be further complicated if you intend to sell premium subscriptions that give paying users access to data sooner than free users, as you would effectively be profiting off someone else's copyrighted works.

Managing such an endeavour would require a very large team of database engineers, database administrators, server administrators, server technicians, network engineers, programmers, designers, data analysts, etc.

These are just some of the problems you could run into in the early stages of your project, and they would grow considerably as the project continued.


