
Understanding Ajax crawling of search site

@Shakeerah822

Posted in: #Ajax #Google #Seo #Url #WebCrawlers

I have a couple of questions about Ajax crawling of a website that is itself a kind of search engine.
The base article explains the mechanism for making AJAX applications crawlable. All the HTML-snapshot stuff is clear and easy to implement, but I can't understand where Googlebot "finds the pretty AJAX URLs" (i.e. example.com/ajax.html#key=value) to index them.
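
For context, my understanding of the snapshot side of the scheme is roughly this (a minimal Node.js/TypeScript sketch, not production code; renderSnapshotHtml() is just a placeholder): the crawler rewrites a pretty URL like example.com/ajax.html#!key=value into example.com/ajax.html?_escaped_fragment_=key=value and expects an HTML snapshot back.

    // Minimal sketch of the snapshot side of the scheme: a crawler that sees
    // example.com/ajax.html#!key=value requests
    // example.com/ajax.html?_escaped_fragment_=key=value instead, and the
    // server answers with an HTML snapshot of what the Ajax page would show.
    import * as http from "http";

    // Placeholder: a real implementation would run the same query the Ajax
    // code runs and serialize the results to plain HTML.
    function renderSnapshotHtml(fragment: string): string {
      return `<html><body><h1>Snapshot for ${fragment}</h1></body></html>`;
    }

    http.createServer((req, res) => {
      const url = new URL(req.url ?? "/", "http://localhost");
      const fragment = url.searchParams.get("_escaped_fragment_");
      if (fragment !== null) {
        // Crawler asking for the HTML snapshot of a pretty #! URL.
        res.writeHead(200, { "Content-Type": "text/html" });
        res.end(renderSnapshotHtml(fragment));
      } else {
        // Normal visitors get the regular Ajax page.
        res.writeHead(200, { "Content-Type": "text/html" });
        res.end('<html><body><div id="results"></div><script src="/app.js"></script></body></html>');
      }
    }).listen(8080);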

The first thing that came to mind was breadcrumbs: in a sitemap we can list pages that carry breadcrumbs/links, so the search engine bots may crawl those pages and get the HTML snapshots from there.
But I'm sure there are other ways to give the bots these "pretty AJAX URLs".
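
For instance, as far as I can tell from the scheme's documentation, the sitemap can simply list the pretty (#!) URLs themselves; a hypothetical entry (example.com and key=value are placeholders) might look like:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://example.com/ajax.html#!key=value</loc>
      </url>
    </urlset>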

In our case we have a simple search site: the user enters keywords and presses "Find", JavaScript executes the Ajax request, receives a JSON response and fills the page with results (without any refresh, of course).
In this case, how can we make sure that Googlebot crawls all of the search results, beyond adding a sitemap with links to every result?
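
Concretely, the flow looks roughly like this (a stripped-down TypeScript sketch; /search.json, the q parameter and the #results element are made-up names, and the last line is the one extra step of recording the query in a #! fragment so each result set at least has a "pretty" URL):

    // Hypothetical client-side search: runs when the user presses "Find",
    // fetches JSON, renders the results, and records the query in a #!
    // fragment so this result set has a "pretty" crawlable URL.
    async function runSearch(keywords: string): Promise<void> {
      const response = await fetch(`/search.json?q=${encodeURIComponent(keywords)}`);
      const results: { title: string; url: string }[] = await response.json();

      const container = document.getElementById("results");
      if (container) {
        container.innerHTML = results
          .map(r => `<a href="${r.url}">${r.title}</a>`)
          .join("<br>");
      }

      // e.g. /ajax.html#!q=some+keywords
      window.location.hash = `!q=${encodeURIComponent(keywords)}`;
    }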

Is there an example somewhere of the solution described in the article above?


2 Comments


@Carla537

Just create an XML sitemap of your actual content pages (the job vacancies), submit it to Google and use a script to keep it up to date. That's all you need to do.
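
A rough sketch of the "script to keep it up to date" part might look like this (Node.js/TypeScript; loadContentPages() and the example.com URLs are placeholders for whatever actually backs your vacancy pages):

    // Hypothetical nightly job: rebuild sitemap.xml from the current list of
    // content pages so Google always sees fresh vacancy URLs.
    import { writeFileSync } from "fs";

    interface ContentPage {
      url: string;          // e.g. "http://example.com/vacancy/123"
      lastModified: string; // e.g. "2013-05-01"
    }

    // Placeholder: a real version would read these from your database.
    function loadContentPages(): ContentPage[] {
      return [
        { url: "http://example.com/vacancy/123", lastModified: "2013-05-01" },
        { url: "http://example.com/vacancy/124", lastModified: "2013-05-02" },
      ];
    }

    const entries = loadContentPages()
      .map(p => `  <url>\n    <loc>${p.url}</loc>\n    <lastmod>${p.lastModified}</lastmod>\n  </url>`)
      .join("\n");

    writeFileSync(
      "sitemap.xml",
      `<?xml version="1.0" encoding="UTF-8"?>\n` +
      `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
      `${entries}\n` +
      `</urlset>\n`
    );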

Generally, trying to get Google to index site search result pages is pointless. Google is a search engine, and is perfectly capable of indexing your content pages directly. Letting one search engine (Google) index the result pages of another search engine (your site search) is just silly and adds a needless layer of indirection. At worst, it could even lead to significant problems:


If your site search results were crawlable, Googlebot might decide to spend all its time crawling the nearly infinite space of possible search results, leaving it very little time to index your actual content pages.
Also, since the bot would have so many pages to crawl, it would recrawl each of them only rarely. If your site wasn't completely static, this would mean that most pages from your site in Google's index would be stale. This could lead to a lousy user experience and a drop in conversions, as Google would be sending users to pages on your site that no longer have the content they searched for.

(You can see this happening with some badly designed sites that e.g. let Google crawl their "most recent posts about X" lists. You'll see the page in Google's results, the snippet shows the keywords you were looking for, but when you actually click through, the content you saw in the snippet is no longer anywhere to be seen.)


For these reasons, it is actually often recommended that you deliberately forbid external search engines like Google from crawling your site search results using robots.txt, even if they could technically do so. For example, Google's Webmaster Tools Help pages say:


"Consider using a robots.txt file to block Googlebot's access to problematic URLs. Typically, you should consider blocking dynamic URLs, such as URLs that generate search results, or URLs that can create infinite spaces, such as calendars."


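In practice that can be as small as something like this (assuming, purely for illustration, that your site search lives under /search; adjust the path to whatever your result URLs actually look like):

    User-agent: *
    Disallow: /search
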
The one exception is where you have a relatively small set of keywords (or "categories" or "tags", like these ones) that associate related pages into lists that users might want to browse. (Here, "relatively small" means a finite number, preferably less than the actual number of content pages on your site, as opposed to "anything a user might think to search for".) In that case, you may want to include those list pages in your sitemap (perhaps with a lower priority than your content pages) so that Google will find and index them too.

You should also consider adding non-AJAX links to those pages from your front page (and from any other page where they might be useful), so that even users who have JavaScript disabled (or whose browser doesn't support it) can find them. If the user does have JS enabled, you can use it to replace those links with the corresponding AJAX interface.
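
One way to do that swap, just as a sketch (TypeScript; the a.category-link selector, the data-query attribute and runSearch() are all made-up names for whatever your Ajax code actually uses):

    // Hypothetical progressive enhancement: every category link works as a
    // plain <a href="...">, but when JS is available the click is intercepted
    // and the same list is loaded via Ajax instead.
    declare function runSearch(keywords: string): void; // your existing Ajax search

    document.querySelectorAll<HTMLAnchorElement>("a.category-link").forEach(link => {
      link.addEventListener("click", event => {
        event.preventDefault();                 // skip the full page load
        runSearch(link.dataset.query ?? "");    // e.g. <a data-query="php" ...>
      });
    });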

In fact, more generally, it's a good idea to design your site so that it works (as far as technically possible) even without JavaScript. This not only makes your site friendly to users with odd browsers (like blind users using an audio browser or lynx with a Braille terminal) or who surf with JS turned off by default for security or performance reasons, but it also automatically makes it accessible to search engines, which see your site much like a user with a JS-less text-only browser would.


@Goswami781

Well...

This is a site search you're talking about indexing, and as such it would require human interaction to generate/input the search terms.

When Google spiders your site, whether Ajax or plain HTML, it won't hit your site with search queries. It will only hit URLs that you specify in anchors/markup.

To this end, the only way I can see to get Google to index these pages would be for you to somehow capture users' queries and then render them into your markup somewhere for the bots to eventually spider.
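
Something along these lines, purely as a sketch (TypeScript; recordQuery() and popularSearchesHtml() are made-up names): remember what users search for, then print the popular queries as ordinary links somewhere in your markup so the bots have anchors to follow.

    // Hypothetical "capture and republish" approach: count what users search
    // for, then render the popular queries as ordinary links that crawlers can
    // follow (each link pointing at a crawlable results URL).
    const queryCounts = new Map<string, number>();

    function recordQuery(q: string): void {
      queryCounts.set(q, (queryCounts.get(q) ?? 0) + 1);
    }

    function popularSearchesHtml(limit = 10): string {
      return [...queryCounts.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, limit)
        .map(([q]) => `<a href="/search?q=${encodeURIComponent(q)}">${q}</a>`)
        .join("\n");
    }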

I've put this in an answer rather than a comment because I do feel it answers the question, though not in a very positive way, I guess.

Good Luck...!
