2 Comments


@Phylliss660

An internal web crawler can be used for the following purposes:


Creating a Localized Search Engine.
Scanning Pages to Detect Dead Links, where the target page was deleted or unpublished from a CMS, but links to that page still exist on other pages (a minimal sketch follows this list).
Finding Breadcrumb Click-Trail Paths in massive websites.
Finding Orphaned Pages in CMS-generated websites.
Load Testing from multiple server locations & different countries.
Unit Testing Pages, which show Jasmine-like green dots at the bottom of the page.
Detecting SEO Problems on Various Pages, like missing Meta tags.
Generating Customized Reports, which log-file analysis tools might not produce.
Spell-Checking Pages when working on large sites, where editors create content in a CMS and may not spell-check their articles.
Automatic Foreign-Language Page Translation (internationalization, AKA "i18n").
Showing Off Your Programming Smarts to Your I.T. Co-Workers.
Anything else that you'd like your bot to test on massive websites, where database queries might not be as beneficial as an internal web crawler.
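
For the dead-link case, here is a minimal sketch of what such a crawler could look like in Python, using only the standard library. The staging URL is a made-up placeholder, and a real crawler would also want politeness delays, robots.txt handling, & nicer reporting:

    import urllib.error
    import urllib.parse
    import urllib.request
    from collections import deque
    from html.parser import HTMLParser

    class LinkExtractor(HTMLParser):
        """Collects href values from every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def find_dead_links(start_url, max_pages=500):
        """Breadth-first crawl of one site; returns (page, broken link) pairs."""
        site = urllib.parse.urlparse(start_url).netloc
        queue, seen, dead = deque([start_url]), {start_url}, []
        while queue and len(seen) <= max_pages:
            page = queue.popleft()
            try:
                with urllib.request.urlopen(page, timeout=10) as resp:
                    body = resp.read().decode("utf-8", errors="replace")
            except Exception:
                continue  # the page itself failed to load; skip it
            parser = LinkExtractor()
            parser.feed(body)
            for href in parser.links:
                url = urllib.parse.urljoin(page, href)
                if urllib.parse.urlparse(url).netloc != site:
                    continue  # internal crawler: stay on our own site
                try:
                    head = urllib.request.Request(url, method="HEAD")
                    urllib.request.urlopen(head, timeout=10)
                except urllib.error.HTTPError as err:
                    if err.code == 404:
                        dead.append((page, url))  # deleted/unpublished target
                    continue
                except Exception:
                    continue
                if url not in seen:
                    seen.add(url)
                    queue.append(url)
        return dead

    for page, target in find_dead_links("https://staging.example.com/"):
        print(page, "->", target, "(404)")

Using a HEAD request for the link check means you can spot a 404 without downloading the whole target page.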


You don't have to run a web crawler on your production box. You can run it on a sandbox, like a staging or test server, to see how the site would perform under load. Then you could spin up 10,000 virtual users & see what the resulting metrics look like. You can look at & analyze anything you want in the public HTML data that a website exposes.
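
Here is a minimal sketch of that virtual-user idea in Python, again with a made-up staging URL. One thread per user won't realistically scale to 10,000, so a dedicated load-testing tool is the better fit at that size, but this shows the basic shape:

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    TARGET = "https://staging.example.com/"  # sandbox, never production
    USERS = 100  # start small; dedicated tools scale far higher

    def virtual_user(url):
        """One simulated visitor: fetch the page and return the elapsed time."""
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                resp.read()
            return time.monotonic() - start
        except Exception:
            return None  # treat any failure as a failed request

    with ThreadPoolExecutor(max_workers=USERS) as pool:
        timings = list(pool.map(virtual_user, [TARGET] * USERS))

    ok = [t for t in timings if t is not None]
    if ok:
        print(f"{len(ok)}/{USERS} succeeded, average {sum(ok)/len(ok):.2f}s")
    else:
        print("all requests failed")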

Most of the time, external web crawlers are used by search engines to find content on other people's websites. So unless you're looking to build a search bot & a search engine, see if any of those internal purposes might be of benefit to you. They can be useful on massive 1,000+ page websites, but won't be beneficial on small 10-page websites. It takes a lot of time to write & debug web crawlers, so plan wisely before deciding to build one from scratch!


@Si4351233

Using a crawler on your own site can be good if you have a lot of pages that don't necessarily have an entry in a database to search; this is effectively a way to implement site search. It can also be used where you need a single, unified search across a number of separate web properties. I implemented this for a university which had over 100 separate sites under it, some on subdomains and some on completely separate domains, all on different servers with different databases. It provided a single unified search tool covering the entire university's online presence.
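
As a rough illustration of the idea, here is a minimal sketch in Python that builds one inverted index across several properties. All the domain names are hypothetical, and a real implementation would follow links from each seed & use a proper search backend rather than an in-memory dict:

    import re
    import urllib.request
    from collections import defaultdict

    # Hypothetical seed pages standing in for the separate properties:
    # some on subdomains, some on entirely different domains.
    SEEDS = [
        "https://www.example-university.edu/",
        "https://physics.example-university.edu/",
        "https://example-alumni.org/",
    ]

    index = defaultdict(set)  # word -> set of URLs containing that word

    def index_page(url):
        """Fetch one page, strip the markup, and add its words to the index."""
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            return
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)

    for seed in SEEDS:
        index_page(seed)  # a real crawler would also follow links from each seed

    def search(query):
        """Return the URLs whose text contains every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return set()
        results = set(index[words[0]])
        for w in words[1:]:
            results &= index[w]
        return results

    print(search("admissions deadline"))

Because every site feeds the same index, a single query intersects results across all of the properties at once, which is exactly what a per-site database search can't do.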
