Mobile app version of vmapp.org
Jessie594

: How to crawl a web page with dynamic content added by JavaScript

@Jessie594

Posted in: #Javascript #WebCrawlers

I have read news that Google's bots can now understand our JavaScript code, which means it should be possible to fully crawl a web page that has lazy loading enabled. I am using Apache Nutch to crawl websites, but I don't think it can fetch the URLs that JavaScript injects into the HTML page when the page is scrolled down. I see a lot of websites using lazy loading for performance reasons. So can somebody please explain how I can crawl the data that appears in the HTML page on lazy load (i.e., on scrolling the page down)?
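One common workaround, when a site lazy-loads by fetching pages of results as you scroll, is to call that paging endpoint directly instead of simulating scrolling. The sketch below illustrates the idea with a stubbed fetch function; the endpoint shape (`/api/items?page=N`) and the empty-page stop condition are assumptions, not anything from the question. On a real site you would find the actual endpoint in your browser's network tab while scrolling.

```python
# Sketch: crawl a lazy-loading page by paging its AJAX endpoint directly.
# fetch_json is a stub standing in for a real HTTP fetch (e.g. urllib.request);
# the URLs and responses below are invented for illustration.
def fetch_json(url):
    fake_api = {
        "/api/items?page=1": ["a.html", "b.html"],
        "/api/items?page=2": ["c.html"],
        "/api/items?page=3": [],  # empty page => no more content
    }
    return fake_api.get(url, [])

def crawl_lazy_pages():
    urls, page = [], 1
    while True:
        batch = fetch_json(f"/api/items?page={page}")
        if not batch:  # stop when the endpoint runs out of results
            break
        urls.extend(batch)
        page += 1
    return urls

print(crawl_lazy_pages())  # ['a.html', 'b.html', 'c.html']
```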





2 Comments


 

@Ravi8258870

You can use a JavaScript parser in your server-side crawler code to parse the scripts, extract all the AJAX request URLs, and then crawl those URLs as well.
One such parser is google-caja.

Try it; it may serve your purpose.
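A lightweight version of this idea, without a full JavaScript parser, is to scan inline scripts for URLs passed to common AJAX calls. This is a minimal sketch, not the google-caja approach the answer names; the sample page and endpoint paths are made up, and a simple regex will miss URLs that are built up dynamically.

```python
import re

# Match a quoted URL passed to common AJAX entry points:
# fetch(...), XMLHttpRequest.open(...), jQuery $.get/$.post/$.ajax(...).
AJAX_URL_RE = re.compile(
    r"""(?:fetch|open|get|post|ajax)\s*\(\s*['"](?P<url>[^'"]+)['"]""",
    re.IGNORECASE,
)

def extract_ajax_urls(html: str) -> list:
    """Return candidate AJAX endpoint URLs found in inline <script> blocks."""
    urls = []
    for script in re.findall(r"<script[^>]*>(.*?)</script>", html, re.DOTALL):
        for match in AJAX_URL_RE.finditer(script):
            urls.append(match.group("url"))
    return urls

page = """
<html><body>
<script>
  window.addEventListener('scroll', function () {
      fetch('/api/items?page=2').then(render);
  });
  $.get('/api/comments', showComments);
</script>
</body></html>
"""
print(extract_ajax_urls(page))  # ['/api/items?page=2', '/api/comments']
```

The extracted URLs can then be fed back into the crawler's fetch queue like any other outlinks.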



 

@Phylliss660

Googlebot understands links "hidden" behind JavaScript. Lazy loading just makes the browser render the content after the initial page load; the HTML is still there in the source, so your bot should have no trouble scanning it, since the JavaScript runs client-side.

If you are having trouble with heavily JavaScript-driven links, check them with Nutch's parsechecker tool to see how the parse filters handle them, and adjust your configuration accordingly.
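One concrete reason the HTML is often "still there": many lazy-load scripts keep the real URL in a `data-*` attribute in the initial markup and only swap it into `src` at render time. A static crawler can read those attributes directly. This is a sketch using Python's standard-library parser; the attribute names (`data-src`, `data-href`) are common conventions, not a standard, and the sample markup is invented.

```python
from html.parser import HTMLParser

class LazyLinkExtractor(HTMLParser):
    """Collect URLs from href/src and common lazy-load data-* attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "data-src", "data-href") and value:
                self.urls.append(value)

html = """
<a href="/page/2">next</a>
<img class="lazy" src="placeholder.gif" data-src="/images/photo1.jpg">
"""
parser = LazyLinkExtractor()
parser.feed(html)
print(parser.urls)  # ['/page/2', 'placeholder.gif', '/images/photo1.jpg']
```

If the content genuinely is not in the initial HTML, you would need the AJAX-endpoint approach from the other answer instead.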


