Is it possible to exclude Data-URIs (data:image/xxx) through robots.txt?
At the moment, Google (I cannot find similar information for Bing, Yahoo!, etc.) does not index Data-URIs.
But in case that changes, I would like to know whether it is possible to exclude Data-URIs from being crawled. For example:
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" alt="trans-gif" />
or
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
Do I need to add /data: or data: to my robots.txt, or is this not supported at all?
More posts by @Kimberly868
We are seeing requests from both BingBot and YahooSlurp for URLs like: /folder/path/data:image/gif;base64,...
If you are actually getting HTTP requests for these resources, then you should be able to block them in robots.txt with something like the following, provided robots.txt is honoured for these "strange" requests (Bing and Yahoo! should honour it):
User-agent: *
Disallow: /folder/path/data
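Under the original robots.txt convention, a Disallow value is compared as a plain prefix of the requested URL path, so the rule above covers every request whose path begins with /folder/path/data. A minimal Python sketch of that check, assuming plain prefix matching with no wildcards, no Allow lines and no percent-encoding normalisation; the function name is_disallowed is hypothetical:

```python
def is_disallowed(path: str, disallow_prefixes: list[str]) -> bool:
    """Return True if `path` starts with any Disallow value (plain prefix match).

    Assumption: original robots.txt semantics, no wildcards, no Allow lines,
    and no percent-encoding normalisation.
    """
    # An empty Disallow value means "allow everything", so it never matches.
    return any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)


rules = ["/folder/path/data"]

# The odd requests reported for BingBot/YahooSlurp are caught by the prefix...
print(is_disallowed("/folder/path/data:image/gif;base64,R0lGODlh", rules))  # True
# ...but so is any other path that merely starts with the same characters.
print(is_disallowed("/folder/path/database.html", rules))  # True
print(is_disallowed("/folder/other/page.html", rules))  # False
```

Because the match is a plain prefix, this rule would also block an unrelated page such as /folder/path/database.html; ending the value with data: would narrow it to the reported requests, assuming the crawler compares the colon literally.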
Or, more generically, using a * wildcard (an extension to the original robots.txt "standard", but reportedly supported by Bing and Yahoo!):
User-agent: *
Disallow: /*data:image/
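Crawlers that support this extension (it is also described in RFC 9309) treat * as "any sequence of characters" and a trailing $ as an end-of-path anchor, while still matching from the start of the path. A minimal Python sketch of that matching, assuming exactly those semantics and ignoring Allow lines and percent-encoding; pattern_to_regex and is_disallowed are hypothetical names:

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the path, everything else is literal.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored_end else ""))


def is_disallowed(path: str, patterns: list[str]) -> bool:
    # re.match anchors at the start of the path, mirroring robots.txt prefix matching.
    return any(p and pattern_to_regex(p).match(path) for p in patterns)


rules = ["/*data:image/"]
print(is_disallowed("/folder/path/data:image/gif;base64,R0lGODlh", rules))  # True
print(is_disallowed("/other/dir/data:image/png;base64,iVBORw0K", rules))  # True
print(is_disallowed("/folder/path/database.html", rules))  # False
```

Unlike the plain prefix rule, the wildcard version catches the pattern under any directory and leaves paths that merely start with "data" alone.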
However, as John suggests in his answer, there really shouldn't be an HTTP request at all. Nothing is required from the server to get the resource, because the data is already embedded in the page. I can't imagine a server ever returning a valid response to such a request (at least not intentionally).
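To see why no request should be needed, note that the whole image travels inside the attribute value itself. A minimal Python sketch that decodes the first example from the question locally with only the standard library; decode_data_uri is a hypothetical helper that handles only the data:<mediatype>;base64,<payload> form:

```python
import base64


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data: URI into (media type, raw bytes) without any network access.

    Assumption: the URI has the form data:<mediatype>;base64,<payload>.
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, _, payload = uri[len("data:"):].partition(",")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data: URIs are handled here")
    media_type = header[: -len(";base64")]
    return media_type, base64.b64decode(payload)


uri = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
media_type, data = decode_data_uri(uri)
print(media_type)  # image/gif
print(data[:6])  # b'GIF89a'
```

The complete GIF file (42 bytes) comes out of the string, so a client has nothing left to fetch from the server.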
Is this perhaps just a bot "glitch"?
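One way such requests could arise (an assumption, not something confirmed for BingBot or YahooSlurp) is a crawler that fails to recognise data: as a URI scheme and resolves the src value as a relative path. The sketch below contrasts Python's standard urllib.parse.urljoin with a deliberately naive joiner; naive_join and the example.com page URL are hypothetical:

```python
from urllib.parse import urljoin

page = "http://www.example.com/folder/path/page.html"
src = "data:image/gif;base64,R0lGODlh"

# Correct resolution: 'data:' is an absolute URI with its own scheme,
# so it is returned unchanged and never becomes a path on the server.
print(urljoin(page, src))  # data:image/gif;base64,R0lGODlh


def naive_join(base: str, ref: str) -> str:
    """Hypothetical buggy resolver: always treats `ref` as relative to the base directory."""
    return base.rsplit("/", 1)[0] + "/" + ref


# The buggy result has the same shape as the logged /folder/path/data:image/gif;base64,... requests.
print(naive_join(page, src))  # http://www.example.com/folder/path/data:image/gif;base64,R0lGODlh
```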
© vmapp.org 2024. All rights reserved.