Determine Original Domain from CDN URL

@Murray432

Posted in: #Cdn #Dns

Given a URL with a CDN domain (e.g. a1856.g2.akamai.net), is there a way to find out the original webpage/service that was requested when it was delivered via that CDN?

I am monitoring my internet usage and the monitoring software spits out URLs visited. Some of these are obvious while others delivered via CDNs are not.

e.g.

xkcd.com - the user obviously visited the XKCD website
a1856.g2.akamai.net - I have no idea; this is what I would like to know more about.


@Sherry384

Short answer

You can't, in general. Many "original domains" can map to the same CDN host. If all you have is the information in your question, it is not enough, and it is "too late" to do anything about it.

Longer answer

There is some hope if the traffic is in the clear. The point is that a CDN is often used to serve "heavy" traffic, while the lightweight "initiating" traffic goes through a regular domain, and that traffic mentions the CDN URLs.

For example, for browsing activity (in the clear or decryptable), you might get away with a tool that parses sniffed web pages, extracts their links, and does the correlation. The principle is: if a CDN URL was requested, some web page served by the original domain probably mentioned that exact URL. This will miss URLs computed by JavaScript and the like unless the parser executes JavaScript, but when it does find a correlation, that correlation is almost certainly correct.
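A minimal sketch of that correlation in Python, assuming you have already captured and decoded the HTML of each page (the capture itself is out of scope here). The names LinkExtractor and correlate, and the example hostnames, are hypothetical. It only looks at src and href attributes, so URLs built by JavaScript are missed, exactly as described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from src/href attributes of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative links against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def correlate(pages, cdn_hosts):
    """Map each CDN host to the set of origin hosts whose pages mention it.

    pages: dict of page URL -> HTML text (assumed already captured).
    cdn_hosts: set of CDN hostnames seen by the monitoring software.
    """
    mapping = {host: set() for host in cdn_hosts}
    for page_url, html in pages.items():
        parser = LinkExtractor(page_url)
        parser.feed(html)
        parser.close()
        origin = urlsplit(page_url).hostname
        for url in parser.urls:
            host = urlsplit(url).hostname
            if host in mapping:
                mapping[host].add(origin)
    return mapping
```

A CDN host that maps to an empty set simply had no captured page mentioning it; that is the "no correlation found" case, not proof that nothing referred to it.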

This may not always work. For example, some software has been downloading its updates through a CDN for at least ten years. In that case there may be no prior web page to correlate with (though the software may call its home domain to get the CDN URL, in which case the approach above may catch it).

You can relax the web-page assumption by scanning the traffic for anything that looks like a URL instead of expecting well-formed HTML pages. That way, TCP traffic that is not web traffic but is still in the clear can also let you link a CDN URL back to the original domain.
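A rough sketch of that looser scan, assuming you can get each cleartext flow as (server hostname, reassembled payload bytes). The regular expression is deliberately loose and will produce some false hits; hosts_mentioned and link_flows are hypothetical names.

```python
import re
from urllib.parse import urlsplit

# Loose pattern for http(s) URLs embedded anywhere in raw bytes.
URL_RE = re.compile(rb"https?://[A-Za-z0-9.-]+(?::\d+)?(?:/[^\s\"'<>]*)?")


def hosts_mentioned(payload):
    """Return the hostnames of all URL-like strings found in a raw payload."""
    hosts = set()
    for match in URL_RE.finditer(payload):
        url = match.group().decode("ascii", errors="replace")
        host = urlsplit(url).hostname  # lowercased by urlsplit
        if host:
            hosts.add(host)
    return hosts


def link_flows(flows, cdn_hosts):
    """flows: iterable of (server_host, payload_bytes) pairs.

    Returns CDN host -> set of server hosts whose traffic mentioned it.
    """
    links = {}
    for server_host, payload in flows:
        for host in hosts_mentioned(payload) & cdn_hosts:
            links.setdefault(host, set()).add(server_host)
    return links
```

This also covers the software-update case above when the "call home" response carries the CDN URL in the clear.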

If the traffic is encrypted, that kind of information is hidden. Some tools might still find a weaker correlation, such as "which domain did the same client IP request just before the CDN domain", but such a correlation is never certain.
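A sketch of that weaker, time-based heuristic, assuming your monitoring tool gives you a timestamp, client IP and hostname for each request (as the one in the question seems to). The 5-second window and the is_cdn predicate are arbitrary assumptions, and every result is only a guess.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float  # seconds since epoch
    client_ip: str
    host: str         # hostname as reported by the monitoring tool


def guess_origins(requests, is_cdn, window=5.0):
    """For each CDN request, guess the origin as the most recent non-CDN
    host requested by the same client within `window` seconds.

    Returns a list of (cdn_host, guessed_origin_or_None). Heuristic only.
    """
    last_non_cdn = {}  # client_ip -> (timestamp, host)
    guesses = []
    for req in sorted(requests, key=lambda r: r.timestamp):
        if not is_cdn(req.host):
            last_non_cdn[req.client_ip] = (req.timestamp, req.host)
            continue
        prev = last_non_cdn.get(req.client_ip)
        if prev and req.timestamp - prev[0] <= window:
            guesses.append((req.host, prev[1]))
        else:
            guesses.append((req.host, None))
    return guesses
```

With several tabs open or background software running, the "most recent" host is easily the wrong one, which is why this can never be more than a hint.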

All in all, you need more information than just "URLs visited".

Tools? Try Justniffer

I haven't used any such tool, but one might start with Justniffer.

For HTTP traffic it produces an Apache-like log with a referer field (see Examples). Here, a Google search results page on www.google.it is the referer of a request to a URL starting with /csi:

192.168.2.2 - - [15/Apr/2009:17:20:18 +0200] "GET /csi?v=3&s=web&action=&tran=undefined&ei=MvvlSdjOEciRsAbY0rGpCw&e=19592,20292&rt=prt.175,xjs.557,ol.558 HTTP/1.1" 204 0 "http://www.google.it/search?q=subversion+tagging&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8 Gecko/2009032711 Ubuntu/8.10 (intrepid) Firefox/3.0.8)"


So this tool can correlate two requests. It looks like it has the potential to do what you need.
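A small sketch of pulling that correlation out of such a log, assuming lines in the Apache "combined" layout shown above. Note that the example line records only the requested path, not the requested host, so to link CDN hostnames you would also need the Host header in the log; check Justniffer's documentation for how to configure that. referer_link is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

# Apache "combined" format:
# client ident user [time] "method path proto" status size "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_link(line):
    """Return (referring host, requested path) for one log line, or None
    if the line does not parse or carries no referer."""
    m = COMBINED_RE.match(line)
    if not m or m.group("referer") in ("", "-"):
        return None
    return urlsplit(m.group("referer")).hostname, m.group("path")
```

Fed the example line above, it should return "www.google.it" as the referring host and the /csi?... URL as the requested path.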

Moreover, Justniffer:


It is extensible

Can be extended by external scripts. A Python script has been developed to recover all files sent via HTTP (images, text, HTML, JavaScript, etc.).


Look at justniffer-grab-http-traffic


An example written in Python is http_parser.py. It stores the transferred contents in an output directory, separated by domain.


To me, this strongly suggests that, starting from the sample scripts bundled with Justniffer, it should be relatively easy to write an extension script implementing the suggestions in the "longer answer" above.

That would probably address your need. Otherwise, look for phrases or keywords in its description that hint at what to type into a search engine to find similar tools.


@BetL925

What is the problem with the second URL? It belongs to www.akamai.com, a well-known CDN service.
If you really want to determine which CDN a URL belongs to, use this CDN recognition service. They offer a bookmarklet too.
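Recognizing the CDN operator from a hostname mostly comes down to suffix matching. A minimal sketch, where the suffix table is an illustrative assumption and far from exhaustive. Note that this tells you which CDN served the content, not which site it was for.

```python
# Illustrative, non-exhaustive table of CDN hostname suffixes (assumption).
CDN_SUFFIXES = {
    "akamai.net": "Akamai",
    "akamaiedge.net": "Akamai",
    "akamaihd.net": "Akamai",
    "cloudfront.net": "Amazon CloudFront",
    "fastly.net": "Fastly",
}


def identify_cdn(hostname):
    """Return the CDN operator for a hostname, or None if no suffix matches."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try whole-label suffixes from longest to shortest,
    # e.g. a1856.g2.akamai.net, g2.akamai.net, akamai.net, net.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in CDN_SUFFIXES:
            return CDN_SUFFIXES[suffix]
    return None
```

Matching on whole labels rather than a plain string endswith avoids false hits such as notakamai.net.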
