How should my robots.txt look for a single-page app?

@Lee4591628

Posted in: #Ajax #RobotsTxt

I understand how to disallow bots from crawling some pages/folders in a normal application. For example, for Googlebot it is nicely described here.

But what should I do if I have a single-page application (one that uses only AJAX to load new content and does routing and page generation on the client)? How to make it crawlable is described here and here, but what if I do not want a bot to follow some of the links on my starting page? By this I mean the following:

When the SPA is loaded for the first time, it loads some basic HTML. This HTML can contain specific links like:


home (#!home/)
about (#!about/)
news (#!news/)


but I do not want a bot to crawl the #!about link.
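For context, under Google's (now-deprecated) AJAX crawling scheme, a hash-bang URL is rewritten by the crawler into an `_escaped_fragment_` query parameter before fetching, which is what makes it addressable in robots.txt at all. The sketch below illustrates that rewriting; the `to_escaped_fragment` helper is hypothetical, and the real scheme also URL-encodes the fragment value, which is omitted here for clarity.

```python
def to_escaped_fragment(url: str) -> str:
    """Rewrite a #! URL into the form the crawler actually fetches
    (a sketch of the AJAX crawling scheme's mapping)."""
    if "#!" in url:
        base, fragment = url.split("#!", 1)
        # Append with & if the base URL already has a query string.
        sep = "&" if "?" in base else "?"
        return base + sep + "_escaped_fragment_=" + fragment
    return url

print(to_escaped_fragment("http://example.com/#!about/"))
# http://example.com/?_escaped_fragment_=about/
```

Because the crawler sees the rewritten URL, any robots.txt rule has to target the `_escaped_fragment_=` form rather than the `#!` form.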


1 Answer

@Chiappetta492

I have found a way to do exactly what I want. It is nicely documented by Google:


When your site adopts the AJAX crawling scheme, the Google crawler
will crawl every hash fragment URL it encounters. If you have hash
fragment URLs that should not be crawled, we suggest that you add a
regular expression directive to your robots.txt file. For example, you
can use a convention in your hash fragments that should not be crawled
and then exclude all URLs that match it in your robots.txt file.
Suppose all your non-indexable states are of the form
#DONOTCRAWLmyfragment. Then you could prevent Googlebot from crawling these pages by adding the following to your robots.txt:

Disallow: /*_escaped_fragment_=DONOTCRAWL
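To see how that rule applies, note that a link like #!DONOTCRAWLabout/ is fetched as /?_escaped_fragment_=DONOTCRAWLabout/, which the wildcard pattern then matches. The following is a rough sketch of that matching using Python's `fnmatchcase`; the trailing `*` in the pattern models robots.txt prefix matching, and this is only an approximation of Googlebot's actual wildcard handling.

```python
from fnmatch import fnmatchcase

# Approximation of the rule `Disallow: /*_escaped_fragment_=DONOTCRAWL`:
# `*` matches any run of characters, and a Disallow rule matches as a
# prefix, which the trailing `*` stands in for here.
RULE = "/*_escaped_fragment_=DONOTCRAWL*"

def is_disallowed(path: str) -> bool:
    """Return True if the crawler-visible path matches the rule."""
    return fnmatchcase(path, RULE)

print(is_disallowed("/?_escaped_fragment_=DONOTCRAWLabout/"))  # True
print(is_disallowed("/?_escaped_fragment_=news/"))             # False
```

So with this convention, a fragment that should stay out of the index just needs the DONOTCRAWL prefix, and a single robots.txt line covers all of them.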


