
Is it possible to exclude Data-URIs (data:image/xxx) through robots.txt?

@Kimberly868

Posted in: #DataUri #RobotsTxt

At the moment Google does not index Data-URIs (I cannot find similar information for Bing, Yahoo!, etc.).

But until it does, I would like to know whether Data-URIs can be excluded from being crawled. For example:

<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" alt="trans-gif" />


or

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />


Do I need to add /data: or data: to my robots.txt, or isn't this supported at all?


2 Comments


@Jamie184

We are seeing requests from both BingBot and YahooSlurp for urls like: /folder/path/data:image/gif;base64,...


If you are actually getting HTTP requests for these resources, then you should be able to block them in robots.txt, provided robots.txt is honoured for these "strange" requests (Bing and Yahoo! should honour it), with something like:

User-agent: *
Disallow: /folder/path/data


Or, more generically, using a * wildcard (an extension to the original robots.txt "standard", but reportedly supported by Bing and Yahoo!):

User-agent: *
Disallow: /*data:image/
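To check what the two patterns above would and would not block, here is a minimal Python sketch of wildcard-style path matching. It assumes the commonly documented behaviour of the * and $ extensions (a rule is a prefix match, * matches any run of characters, a trailing $ anchors the end). The helper names rule_to_regex and is_disallowed are made up for this illustration, and real crawlers also handle percent-encoding and Allow/Disallow precedence, which this ignores.

```python
import re
from typing import Iterable


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate one robots.txt path rule ('*' and trailing '$' only) to a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, rules: Iterable[str]) -> bool:
    """True if any non-empty Disallow rule matches the path from its start."""
    # re.match anchors at the start of the path, which gives prefix semantics.
    return any(rule_to_regex(rule).match(path) for rule in rules if rule)


odd_request = "/folder/path/data:image/gif;base64,R0lGODlh"  # payload shortened

print(is_disallowed(odd_request, ["/*data:image/"]))                   # True
print(is_disallowed(odd_request, ["/folder/path/data"]))               # True
print(is_disallowed("/images/logo.gif", ["/*data:image/"]))            # False
print(is_disallowed("/folder/path/database.html", ["/folder/path/data"]))  # True
```

The last line shows a side effect of the first, non-wildcard rule: because rules are prefix matches, /folder/path/data would also block a legitimate URL such as /folder/path/database.html, so the more specific data:image/ pattern is probably the safer choice.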


Although, as John suggests in his answer, there really shouldn't be an HTTP request at all. Nothing is required from the server in order to get the resource (the bot already has it, embedded in the page). I can't imagine a server ever returning a valid response for such a URL (at least not intentionally).

Is this perhaps just a bot "glitch"?
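One way such a glitch could arise, purely as a guess, is a crawler that does not recognise the data: scheme and resolves the src value as a relative path. The Python sketch below contrasts a standards-aware resolver (urllib.parse.urljoin) with a deliberately naive one. The page URL and the naive_resolve function are hypothetical and are not taken from any real bot.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

PAGE = "http://example.com/folder/path/page.html"  # hypothetical page URL
SRC = "data:image/gif;base64,R0lGODlh"             # payload shortened


def naive_resolve(base: str, ref: str) -> str:
    """Hypothetical buggy resolver: treats every reference as a relative path."""
    parts = urlsplit(base)
    directory = posixpath.dirname(parts.path)
    return f"{parts.scheme}://{parts.netloc}{posixpath.join(directory, ref)}"


# A scheme-aware resolver leaves the data: URI alone, so nothing is requested.
correct = urljoin(PAGE, SRC)
# The naive resolver produces a URL of the kind seen in the logs.
glitched = naive_resolve(PAGE, SRC)

print(correct)
print(glitched)
```

If this is what is happening, the requests are harmless apart from the log noise and the 404 responses they generate.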


@Sarah324

It's not supported at all, since a Data-URI is not fetched with an HTTP call. It is embedded in the document and is essentially part of the HTML.
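As a small illustration of this point, the following Python sketch takes the first src value from the question and recovers the image directly from the string, without contacting any server. It uses only the standard library; the variable names are chosen for this example.

```python
import base64
import struct
from urllib.parse import urlsplit

# The src value from the question's first <img> tag.
SRC = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"

parts = urlsplit(SRC)
# The scheme is "data" and there is no host, so there is nothing to request.
header, payload = parts.path.split(",", 1)

# Tolerate missing base64 padding before decoding.
payload += "=" * (-len(payload) % 4)
image = base64.b64decode(payload)

signature = image[:6]
# GIF stores width and height as little-endian 16-bit values after the signature.
width, height = struct.unpack("<HH", image[6:10])

print(parts.scheme, repr(parts.netloc), header)
print(signature, width, height)
```

The decoded bytes start with the GIF89a signature and describe a 1x1 image, which is all the information a crawler would need and it is already in the HTML.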
