How to crawl a webpage with dynamic content added by JavaScript
I have heard news that Googlebot now has the capability to understand JavaScript code, which means it may be possible to fully crawl a webpage that has lazy loading enabled. I am using Apache Nutch to crawl websites, but I don't think it can fetch the URLs that JavaScript injects into the HTML page when the page is scrolled down. I see a lot of websites using lazy loading for performance reasons. Can somebody please explain how I can crawl the data that appears in the HTML page on lazy load (on scrolling the page down)?
2 Comments
You can use a JavaScript parser in your server-side crawler code: parse the page's scripts to find all the Ajax requests they make, and then crawl those URLs as well.

One such parser is google-caja.

Give it a try; it may solve your problem.
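A minimal sketch of that idea in Python, assuming the Ajax URLs appear as string literals in the page's inline scripts (the page URL, endpoint layout, and regex here are hypothetical; a real JS parser such as google-caja would also catch URLs that the scripts build dynamically):

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://example.com/articles"  # hypothetical lazy-loading page

# Fetch the raw HTML exactly as a simple crawler would (no JS execution).
html = requests.get(PAGE_URL, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Scan inline script bodies for URL literals that the page's JavaScript
# would request when the user scrolls down (the pattern is an assumption).
url_pattern = re.compile(r'["\'](/(?:api|ajax)/[^"\']+)["\']')
ajax_urls = set()
for script in soup.find_all("script"):
    for path in url_pattern.findall(script.get_text()):
        ajax_urls.add(urljoin(PAGE_URL, path))

# Crawl the discovered endpoints too; their responses hold the content
# that would otherwise only appear after scrolling.
for url in sorted(ajax_urls):
    response = requests.get(url, timeout=10)
    print(url, response.status_code, len(response.text))
```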
Googlebot understands links "hidden" behind JS. But lazy loading often just makes the browser render content after the initial page load; the HTML is still there. In that case your bot should have no issue scanning it, since the JS only runs client-side.
If you are having trouble with heavily JS'd links, check them with Nutch's parsechecker tool to see how your parse filters handle them, and adjust your configuration accordingly.
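A quick way to tell which case you are in is to compare the raw HTML against what the browser renders. A minimal sketch in Python (the page URL and marker text are hypothetical):

```python
import requests

PAGE_URL = "https://example.com/articles"   # hypothetical page with lazy loading
BELOW_FOLD_TEXT = "Older articles"          # text that only renders after scrolling

# Fetch the page the way a simple crawler does: one request, no JS.
html = requests.get(PAGE_URL, timeout=10).text

# Present in the raw response: the content is merely *rendered* lazily, and
# any crawler can see it. Absent: it is injected by JavaScript after load,
# so you need to crawl the Ajax endpoints directly (see the answer above).
if BELOW_FOLD_TEXT in html:
    print("content is in the initial HTML; the crawler can see it")
else:
    print("content is injected by JS; crawl the Ajax endpoints instead")
```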