
How should my robots.txt look for a single page app?
I understand how to disallow bots from crawling certain pages/folders in a normal application. For example, for Googlebot it is nicely described here.
But what should I do if I have a single page application (one that uses only AJAX to load new content and does routing and page generation on the client)? How to make it crawlable is described here and here, but what if I do not want a bot to follow some of the links on my starting page? By this I mean the following:
When the SPA is loaded for the first time, it loads some basic HTML. This HTML can contain links like:
home (#!home/)
about (#!about/)
news (#!news/)
but I do not want a bot to crawl the #!about link.
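For concreteness, the starting page's markup might contain links along these lines (a sketch; the question only names the fragments, so the anchor markup itself is an assumption):

<a href="#!home/">home</a>
<a href="#!about/">about</a>   <!-- should stay out of the index -->
<a href="#!news/">news</a>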
I have found a way to do exactly what I want. It is nicely documented by Google:
When your site adopts the AJAX crawling scheme, the Google crawler will crawl every hash fragment URL it encounters. If you have hash fragment URLs that should not be crawled, we suggest that you add a regular expression directive to your robots.txt file. For example, you can use a convention in your hash fragments that should not be crawled and then exclude all URLs that match it in your robots.txt file. Suppose all your non-indexable states are of the form #DONOTCRAWLmyfragment. Then you could prevent Googlebot from crawling these pages by adding the following to your robots.txt:
Disallow: /*_escaped_fragment_=DONOTCRAWL
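To see why this works: under the AJAX crawling scheme, Googlebot requests a hashbang URL such as example.com/#!DONOTCRAWLabout/ as example.com/?_escaped_fragment_=DONOTCRAWLabout/, and the Disallow pattern matches that rewritten URL (example.com is a placeholder domain). Applied to the question, a minimal robots.txt could look like this, assuming the DONOTCRAWL naming convention from the quote:

User-agent: Googlebot
Disallow: /*_escaped_fragment_=DONOTCRAWL

The about link on the starting page would then be written as #!DONOTCRAWLabout/ instead of #!about/, while #!home/ and #!news/ remain crawlable.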