Should we modify our Joomla robots.txt after Google's announcement on crawling of CSS and JavaScript?

@Becky754

Posted in: #GoogleSearchConsole #Joomla #RobotsTxt #Seo

I came across an announcement from Google: googlewebmastercentral.blogspot.in/2014/10/updating-our-technical-webmaster.html
It states:


For optimal rendering and indexing, our new guideline specifies that you should allow Googlebot access to the JavaScript, CSS, and image files that your pages use. This provides you optimal rendering and indexing for your site. Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings.


By default, Joomla’s robots.txt file disallows the following:

Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/


Should we remove the following lines from the robots.txt file based on Google’s announcement?

Disallow: /components/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/


Is this what the announcement recommends for Joomla-based sites?


@Connie744

The most recent versions of Joomla no longer block the /media/ and /templates/ folders:

User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/


Not all extensions follow the guidelines on where to place CSS and JS files, so a good workaround is to let Google access these files wherever they are found.

You can achieve this by inserting a few lines at the start of your robots.txt file, like this:

#Googlebot
User-agent: Googlebot
Allow: *.css
Allow: *.js

User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
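
One catch with this layout: Google documents that a crawler obeys only the single group whose User-agent line matches it most specifically, and falls back to `User-agent: *` only when no named group matches. So Googlebot would read just the two Allow lines and none of the Disallow lines. Below is a rough Python sketch of that group selection. `parse_groups` and `rules_for` are hypothetical helpers, the robots.txt is abbreviated, and exact token matching is a simplification of how real crawlers match their user-agent names.

```python
# Rough sketch of robots.txt group selection (not any crawler's real parser):
# a crawler obeys only the group naming it, else the "*" group.

def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each user-agent token (lowercased) to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    in_rules = False  # becomes True once the current group starts listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def rules_for(robots_txt: str, crawler: str) -> list[tuple[str, str]]:
    """Return the only rule set the named crawler would obey."""
    groups = parse_groups(robots_txt)
    # Simplification: exact, case-insensitive token match stands in for
    # "most specific user-agent match".
    return groups.get(crawler.lower(), groups.get("*", []))


ROBOTS = """\
#Googlebot
User-agent: Googlebot
Allow: *.css
Allow: *.js

User-agent: *
Disallow: /administrator/
Disallow: /components/
"""

print(rules_for(ROBOTS, "Googlebot"))  # [('allow', '*.css'), ('allow', '*.js')]
print(rules_for(ROBOTS, "Bingbot"))    # [('disallow', '/administrator/'), ('disallow', '/components/')]
```

In other words, with a dedicated Googlebot group, any Disallow lines you still want Googlebot to respect have to be repeated inside that group.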


EDIT:

Thanks @w3dk and @Stephen Ostermiller for the feedback! You are quite right. It is better to do something like this:

User-agent: *
Allow: *.css
Allow: *.js
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/


Unfortunately, this does not work as intended. Google applies the most specific (longest) matching rule, so for a file under /components/ the 12-character Disallow: /components/ outranks the 5-character Allow: *.css, and the Allow lines are effectively ignored. It makes no difference whether the Allow lines come before or after the Disallow lines.

The only way I have found around this is to pad the Allow patterns with wildcards so that they are longer than any of the Disallow paths, which seems to work when I test it in Webmaster Tools:

User-agent: *
Allow: /************************************************************.css
Allow: /************************************************************.js
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
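
A rough Python sketch of the precedence rule Google documents (the longest matching path pattern wins, Allow wins a tie, and file order does not matter) shows why the plain `*.css` rule loses and the padded one wins. `pattern_to_regex` and `is_allowed` are hypothetical helpers, not Google's parser; the CSS path is a made-up example, and treating a pattern without a leading slash literally is an assumption.

```python
import re

# Rough sketch (not Google's parser) of the documented precedence rule:
# every rule whose pattern matches the path is a candidate; the longest
# pattern wins and Allow wins a tie. File order is irrelevant.

def pattern_to_regex(pattern: str) -> re.Pattern:
    """'*' matches any run of characters; a trailing '$' anchors the end;
    otherwise the pattern is a prefix match."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    body = re.sub(r"\*+", "*", body)  # '***' means '*'; collapsing avoids slow backtracking
    regex = ".*".join(re.escape(chunk) for chunk in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Return True if `path` may be crawled under (directive, pattern) rules."""
    best_len, allowed = -1, True  # no matching rule at all means "allowed"
    for directive, pattern in rules:
        # Assumption: patterns such as '*.css' without a leading '/' are
        # matched literally from the start of the path.
        if not pattern or not pattern_to_regex(pattern).match(path):
            continue
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


css_path = "/components/com_example/assets/style.css"  # hypothetical file
block_components = [("disallow", "/components/")]
plain = block_components + [("allow", "*.css")]
padded = block_components + [("allow", "/" + "*" * 60 + ".css")]

print(is_allowed(plain, css_path))                   # False: 12-char Disallow beats 5-char Allow
print(is_allowed(padded, css_path))                  # True: 65-char Allow beats 12-char Disallow
print(is_allowed(list(reversed(padded)), css_path))  # True: order does not matter
```

The padding changes only the rule's length, not what it matches, which is why the trick works under longest-match precedence.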


EDIT 2 - BEST SOLUTION:

OK, so I did a little more research and found the answer at stackoverflow.com/a/30362942/1983389
The most correct solution, and the one most widely supported across web crawlers, appears to be something like the following, where each folder that may contain CSS or JS gets its own Allow lines ahead of its Disallow line. (Allowing *.css and *.js in the /bin, /cache, /installation, /language, /logs, and /tmp folders, and possibly some of the others, makes little sense, so those folders are simply blocked.)

User-agent: *
Allow: /administrator/*.css
Allow: /administrator/*.js
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Allow: /cli/*.css
Allow: /cli/*.js
Disallow: /cli/
Allow: /components/*.css
Allow: /components/*.js
Disallow: /components/
Allow: /includes/*.css
Allow: /includes/*.js
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Allow: /layouts/*.css
Allow: /layouts/*.js
Disallow: /layouts/
Allow: /libraries/*.css
Allow: /libraries/*.js
Disallow: /libraries/
Disallow: /logs/
Allow: /modules/*.css
Allow: /modules/*.js
Disallow: /modules/
Allow: /plugins/*.css
Allow: /plugins/*.js
Disallow: /plugins/
Disallow: /tmp/
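
This file repeats one pattern per folder: Allow lines for *.css and *.js, then the folder's Disallow. Under Google's longest-match rule, /components/*.css (17 characters) is longer than /components/ (12), so the Allow wins regardless of order; putting the Allow lines first should also help crawlers that simply take the first matching rule. To keep the file consistent if the folder list changes, a small generator is one option. `build_robots`, `ALLOW_ASSETS`, and `BLOCK_ONLY` are hypothetical names, and the folder split just copies the list above.

```python
# Hypothetical helper that emits the "Allow CSS/JS, then Disallow the folder"
# pattern. Which folders get Allow lines is a per-site judgment call; this
# split copies the robots.txt above.

ALLOW_ASSETS = ["administrator", "cli", "components", "includes",
                "layouts", "libraries", "modules", "plugins"]
BLOCK_ONLY = ["bin", "cache", "installation", "language", "logs", "tmp"]


def build_robots(allow_assets: list[str], block_only: list[str],
                 extensions: tuple[str, ...] = ("css", "js")) -> str:
    lines = ["User-agent: *"]
    for folder in sorted(allow_assets + block_only):  # alphabetical, as above
        if folder in allow_assets:
            # Allow lines first, so first-match parsers also see them first.
            lines += [f"Allow: /{folder}/*.{ext}" for ext in extensions]
        lines.append(f"Disallow: /{folder}/")
    return "\n".join(lines) + "\n"


print(build_robots(ALLOW_ASSETS, BLOCK_ONLY), end="")
```

With these lists the output reproduces the 31 lines above in the same order.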


@Megan663

In Joomla 3.3, these lines were removed from the robots.txt file:

Disallow: /templates/
Disallow: /media/


More info here: www.energizethemes.com/blog/joomla/have-you-updated-the-joomla-robots-txt-file.html


@Cofer257

Setting aside whether a well-managed Joomla site needs robots.txt at all, with "good" third-party extensions the only places that should contain CSS, JS, or images are:

/images
/media
/templates


and, of course, their subdirectories.

So you could simply remove any Disallow lines for those folders from robots.txt.


@Alves908

If your pages render without errors when you use Fetch as Google in WMT, you're probably fine. In the future, however, you might add content to your website that needs scripts or CSS from one of the blocked folders. So I think you are better off allowing search engines to crawl all the folders that contain CSS or JavaScript.


@Cofer257

Honestly, you are better off removing everything from your robots.txt. As far as I can see, all PHP files in Joomla contain the line

defined('_JEXEC') or die;


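The _JEXEC constant is defined only when a request goes through Joomla's own entry point (index.php), so if you load any other PHP file directly in the browser, all you get is a blank page, which search engines will ignore. (They should never come across these files anyway unless you link to them directly.)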

The problem with leaving some of these directories blocked is that some components and modules keep their CSS/JS files inside those respective directories and not in the preferred media or images folders.

So there is no reason to block any Joomla files from Google.
