Bryan171

Disallowed images in the robots.txt of my Joomla site can't be displayed when shared on Facebook


Posted in: #RobotsTxt

I have noticed that since I disallowed images in the robots.txt of my Joomla site, the image is no longer displayed when I share an article on Facebook. Why is that? Are the two indeed related?

My robots.txt file:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/



@Jamie184

As moobot suggests, it would certainly make sense that the Facebook bot/crawler obeys robots.txt (it has to crawl your page to build the link preview, including its image), and your robots.txt currently blocks all bots from crawling your images.

If the FB bot didn't obey robots.txt, it could be considered a bad bot and would likely be blocked by many sites.
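To see the effect described above, here is a minimal sketch that evaluates the robots.txt from the question with Python's standard-library urllib.robotparser. It assumes that parser's group-matching logic is a fair stand-in for Facebook's crawler, whose own parser is not public; the image and article paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, unchanged.
QUESTION_ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(QUESTION_ROBOTS_TXT.splitlines())  # parse from a string, no network access

# There is no facebookexternalhit group, so the crawler falls back to User-agent: *,
# which blocks /images/ while leaving the (hypothetical) article URL crawlable.
print(parser.can_fetch("facebookexternalhit", "/images/article-photo.jpg"))  # False
print(parser.can_fetch("facebookexternalhit", "/index.php/my-article"))  # True
```

So the article page itself can be fetched, but the image the preview needs cannot.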

You can try adding an exception to allow only FB to crawl your images:

# All user agents not matched elsewhere
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/

# Facebook exception - same as above, except allow images
User-agent: facebookexternalhit
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
# Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/


Note that I have commented out Disallow: /images/ in the facebookexternalhit group, so images are not blocked for that crawler.

Any one robot/crawler will obey only one group in the robots.txt file: the most specific group that matches its user agent. Only if no specific group matches will it fall back to the generic User-agent: * group (the order of the groups does not matter).
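The group-matching rule can be checked with a small Python sketch, again assuming urllib.robotparser behaves like Facebook's crawler here. It parses the proposed file in both group orders and queries it as facebookexternalhit and as another crawler (Googlebot is just an example name). Python's parser applies the first matching rule within a group rather than the longest match, but no rules in this file overlap, so that difference does not affect the result. The image path is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Generic group from the answer: applies to any crawler without its own group.
GENERIC_GROUP = """\
# All user agents not matched elsewhere
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
"""

# Facebook group from the answer: same rules, but /images/ is commented out.
FACEBOOK_GROUP = """\
# Facebook exception - same as above, except allow images
User-agent: facebookexternalhit
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
# Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
"""

# A blank line separates the groups; build the file in both orders.
PROPOSED = GENERIC_GROUP + "\n" + FACEBOOK_GROUP
REVERSED = FACEBOOK_GROUP + "\n" + GENERIC_GROUP


def can_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network access
    return parser.can_fetch(user_agent, path)


IMAGE = "/images/article-photo.jpg"  # hypothetical image path

for name, robots_txt in [("proposed", PROPOSED), ("reversed", REVERSED)]:
    print(
        name,
        can_fetch(robots_txt, "facebookexternalhit", IMAGE),  # own group: images allowed
        can_fetch(robots_txt, "Googlebot", IMAGE),  # falls back to User-agent: *
        can_fetch(robots_txt, "facebookexternalhit", "/administrator/"),  # still blocked
    )
# prints:
# proposed True False False
# reversed True False False
```

Both orders give the same answers, which is the "order does not matter" point above.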

FYI, I believe the full user-agent string for the FB crawler is as follows (the URL in it can be either http or https):

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
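The group name in robots.txt only needs to match the product token at the start of that string, not the whole string. A short sketch with urllib.robotparser illustrates this, assuming its matching (it checks whether the group's User-agent value appears in the part of the user-agent string before the first "/", case-insensitively) resembles what Facebook's crawler does. The two-group robots.txt below is a hypothetical, trimmed-down file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, trimmed-down robots.txt keeping only the rules that matter here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/

User-agent: facebookexternalhit
Disallow: /administrator/
"""

# The full user-agent string quoted above.
FULL_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string, no network access

# Only "facebookexternalhit" (before the first "/") is compared against group names,
# so the full string selects the facebookexternalhit group.
print(parser.can_fetch(FULL_UA, "/images/article-photo.jpg"))  # True
print(parser.can_fetch(FULL_UA, "/administrator/"))  # False
```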
