Allow bots to read dynamically injected content

@Hamm4606531

Posted in: #Ajax #AngularJs #RobotsTxt #Seo #WebCrawlers

I've got a pretty large Angular SPA, and I currently use ?_escaped_fragment_ to serve up static versions of all our pages. I've discovered, however, that this often has issues with newly deployed/updated pages (prerendered pages still have cached references to the old CSS which, since we name our CSS files according to the deployment version, no longer exists ... so Google then sees pages with no styling, which makes them look like link-heavy garbage).

We could implement some work-arounds to get the prerendering to work, but I'd love to see if Google can just crawl our AJAX.

Here's my issue...

We currently have "Disallow: /api/" in our robots.txt because we don't want our API to be public. But our dynamically injected content depends on info from our API, so in our "Fetch and Render" test, Googlebot gets a 404 because any time it tries to pull info from the API, it gets blocked.
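In other words, the relevant robots.txt rule boils down to this (simplified sketch; only the rule that matters here):

User-agent: *
Disallow: /api/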


The browser user-agent (the right pane on Fetch and Render) renders it fine, but the Googlebot user-agent just shows a 404.

Any ideas on how to get around this? Do I have some basic misunderstanding of crawlers? I'm really stumped here...


2 Comments


 

@Caterina187

Perhaps you should consider something like this code (for Apache) from the HTML5 Boilerplate:

# ----------------------------------------------------------------------
# | Filename-based cache busting                                       |
# ----------------------------------------------------------------------

# If you're not using a build process to manage your filename version
# revving, you might want to consider enabling the following directives
# to route all requests such as `/style.12345.css` to `/style.css`.
#
# To understand why this is important and even a better solution than
# using something like `*.css?v231`, please see:
# www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring/

# <IfModule mod_rewrite.c>
#     RewriteEngine On
#     RewriteCond %{REQUEST_FILENAME} !-f
#     RewriteRule ^(.+)\.(\d+)\.(bmp|css|cur|gif|ico|jpe?g|js|png|svgz?|webp|webmanifest)$ $1.$3 [L]
# </IfModule>


That code routes a request for a revved filename such as /style.12345.css to the file that actually exists on disk (/style.css), so developers don't have to manually rename files or worry about 404s.

That way Google would see HTML, CSS and JS.
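For example (the filenames here are purely illustrative), the page can keep referencing the revved name while the un-revved file is what actually lives on disk:

<!-- index.html references the versioned name for cache busting -->
<link rel="stylesheet" href="/css/app.20170412.css">
<!-- With the rewrite above enabled, Apache serves /css/app.css whenever
     the revved file doesn't exist on disk, so prerendered pages that still
     reference an older build number get valid CSS instead of a 404. -->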



 

@Si4351233

The thing you have to be aware of here is that, while Google is able to crawl and render AJAX-injected content, it needs access to all of the stylesheets, JavaScript files and AJAX-requested resources used in the process in order to render the page correctly. As long as there are no hyperlinks from your site or other sites into the API directory, and it is only ever accessed through AJAX requests for content, Google won't show the API directory in search results.

As for the missing-CSS issue, best practice is to retain the previous versions of all static content files until all caches have expired and the new files are the ones being used. For all my sites I keep the last three versions of every static file on the server, so that anything that still holds a reference to an old file (a cached or prerendered page, for instance) can still load it.

But the main thing with your question is that you need to allow Googlebot access to the /api/ directory, so that it can fetch the content being requested by AJAX and crawl and index your pages properly.
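A minimal sketch of what that could look like (the /api/private/ path is hypothetical, standing in for any endpoints that really must stay out of the crawl):

User-agent: *
Allow: /api/
Disallow: /api/private/

If the concern is the raw JSON showing up in search results rather than Googlebot fetching it, you can leave the API crawlable and instead return an X-Robots-Tag: noindex header on the API responses.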
