Is it possible to exclude Data-URIs (data:image/xxx) through robots.txt?

At the moment Google does not index data URIs (I cannot find similar information for Bing, Yahoo!, etc.).

In case that changes, I would like to know whether it is possible to exclude data URIs from being crawled. For example:

<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" alt="trans-gif" />

or

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />

Do I need to add /data: or data: to my robots.txt or isn't it supported at all?


@Jamie184

We are seeing requests from both BingBot and YahooSlurp for URLs like: /folder/path/data:image/gif;base64,...

If you are actually getting an HTTP request for these resources then you should be able to block them in robots.txt, providing robots.txt is being honoured for these "strange" requests (Bing and Yahoo! should), with something like:

User-agent: *
Disallow: /folder/path/data

Or, more generically, using a * wildcard (an extension to the original robots.txt "standard", but one that is reportedly supported by Bing and Yahoo!):

User-agent: *
Disallow: /*data:image/
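
To sanity-check a wildcard rule like the one above before deploying it, here is a minimal Python sketch of how such a pattern might be matched against a request path. It assumes the common extension semantics (* matches any run of characters, a trailing $ anchors the end of the path); the helper names are hypothetical, and real crawlers may normalise or percent-encode paths differently.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (wildcard extension, not the original standard):
    '*' matches any sequence of characters, a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path, disallow_patterns):
    """Return True if any non-empty Disallow pattern matches the path prefix."""
    return any(
        robots_pattern_to_regex(p).match(path) is not None
        for p in disallow_patterns
        if p  # an empty Disallow value blocks nothing
    )

rules = ["/*data:image/"]
print(is_disallowed("/folder/path/data:image/gif;base64,R0lGOD", rules))
print(is_disallowed("/folder/path/index.html", rules))
```

The first path is the kind of "strange" request described above and should be reported as disallowed; the second is an ordinary page and should not be.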

Although, as John suggests in his answer, there really shouldn't be an HTTP request. Nothing is required from the server in order to get the resource (the client already has it). I can't imagine a server ever returning a valid response for such a URL, at least not intentionally.

Is this perhaps just a bot "glitch"?


@Sarah324

It's not supported at all, since it is not an HTTP call. The data is embedded in the document and is essentially part of the HTML.
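
To illustrate why no request is involved, here is a rough Python sketch that recovers the image bytes from a data URI using nothing but the string itself. The function name is hypothetical and the parsing is simplified (it assumes a well-formed URI); the sample value is the transparent GIF from the question.

```python
import base64
from urllib.parse import unquote_to_bytes

def decode_data_uri(uri):
    """Split a data URI into (mime_type, raw_bytes) without any network access."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is metadata, everything after is payload.
    header, _, payload = uri[len("data:"):].partition(",")
    parts = header.split(";")
    mime_type = parts[0] or "text/plain"  # default media type when omitted
    if "base64" in parts[1:]:
        data = base64.b64decode(payload)
    else:
        data = unquote_to_bytes(payload)  # non-base64 payloads are percent-encoded
    return mime_type, data

uri = ("data:image/gif;base64,"
       "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
mime_type, data = decode_data_uri(uri)
print(mime_type)   # media type taken from the URI header
print(data[:6])    # the GIF signature, read straight from the decoded payload
```

Everything a browser or crawler needs is already in the src attribute, which is why a robots.txt rule normally has nothing to act on.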
