Mobile app version of vmapp.org
Cofer257

Interpretation of empty User-agent

@Cofer257

Posted in: #Apache #Http #UserAgent

How should I interpret an empty User-agent? I have some custom analytics code, and that code has to analyze only human traffic. I have a working list of User-agents denoting human traffic and bot traffic, but the empty User-agent is proving problematic: about 10% of my traffic arrives with an empty user agent.
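One way to measure that share is to scan the raw Apache access log. The sketch below assumes the default "combined" LogFormat; the regex and the function name `empty_ua_share` are illustrative, not part of any library. Apache typically logs a missing header as `-`, so both `-` and an empty quoted string are counted as empty.

```python
import re

# Apache "combined" LogFormat (assumed):
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Quoted fields may contain backslash-escaped quotes, which the pattern allows.
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"$'
)


def empty_ua_share(lines):
    """Return (empty_count, total_parsed) for an iterable of log lines."""
    empty = total = 0
    for line in lines:
        m = COMBINED_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # skip lines written in some other format
        total += 1
        # "-" usually means the header was absent; "" means it was sent empty.
        if m.group("agent").strip() in ("", "-"):
            empty += 1
    return empty, total
```

Dividing the first value by the second gives the fraction of parsed requests without a usable User-agent.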

Additionally, I built the human-versus-bot user agent list by analyzing my current logs, so I might be missing a lot of entries. Is there a well-maintained list of user agents denoting bot traffic, or, conversely, a list of user agents denoting human traffic?
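A list-based classifier like the one described can give the empty User-agent its own bucket instead of forcing it into "human" or "bot". This is a minimal sketch: `BOT_MARKERS`, `HUMAN_MARKERS` and `classify_user_agent` are hypothetical names, and the marker substrings are placeholders for the lists built from your own logs. Because the header is entirely client-controlled, the result is a heuristic, not proof.

```python
# Hypothetical marker lists: replace with the ones derived from your logs.
BOT_MARKERS = ("bot", "crawler", "spider", "wget", "curl", "python-requests")
HUMAN_MARKERS = ("mozilla/", "opera/")


def classify_user_agent(ua, bot_markers=BOT_MARKERS, human_markers=HUMAN_MARKERS):
    """Return "empty", "bot", "human" or "unknown" for a User-agent value."""
    if ua is None or ua.strip() in ("", "-"):
        return "empty"
    lowered = ua.lower()
    # Bot markers are checked first because many crawlers
    # (e.g. Googlebot) also include "Mozilla/" in their UA string.
    if any(m in lowered for m in bot_markers):
        return "bot"
    if any(m in lowered for m in human_markers):
        return "human"
    # Unmatched strings stay visible instead of being silently assigned.
    return "unknown"
```

Keeping an explicit "unknown" bucket makes it easy to spot UA strings that your lists do not cover yet.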


@Ogunnowo487

I work for a security company and among other things we monitor Bad Bot traffic.

In my experience, visits with blank user-agent data indicate scraping or spamming attempts (usually scraping) made by "headless browser" bots.

These visitors can sometimes execute JavaScript, so they will appear in Google Analytics; still, that does not make them human :)

Apologies for the "plug", but please know that, if needed, we offer free Bad Bot protection services, coupled with CDN acceleration and other goodies.

In this specific case, our system would flag this visit as "suspicious", verify it against known attack vectors and, if still unsure, perform further tests and challenges. These challenges run seamlessly, without adding any delay to the session.


@Phylliss660

If you want to analyze only "human traffic", I would not count requests with an empty or missing user agent string. In my experience, almost every browser sends one. Even most privacy plugins and extensions fake the UA string (substituting another OS or client name), "normalize" it (e.g. dropping release numbers) or randomize it (e.g. sometimes a Firefox string, sometimes an IE string) rather than removing it completely, because a missing UA might cause problems with sites that rely on it, even though relying on it is not a good idea.
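Applied to raw request headers, that rule means separating a missing header from one that is present but blank, and excluding both from the human count. A minimal sketch, assuming each request's headers arrive as a plain dict; `ua_status` and `count_human_candidates` are hypothetical helpers, not part of any framework. HTTP header names are case-insensitive, so the lookup is too.

```python
def ua_status(headers):
    """Classify a request's headers as "missing", "empty" or "present"."""
    for name, value in headers.items():
        if name.lower() == "user-agent":
            return "present" if value.strip() else "empty"
    return "missing"


def count_human_candidates(requests):
    """Count requests that could be human, skipping missing or empty UAs."""
    return sum(1 for headers in requests if ua_status(headers) == "present")
```

Tracking "missing" and "empty" separately can hint at the tool involved, since some clients omit the header while others send it with no value.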

A simple request with no UA can be made like this:

wget --user-agent="" example.com

As you can see, the client can put anything it wants in that header. Sites that collect and publish UAs found "in the wild" are of limited use, because they pick up a lot of junk.
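For comparison, Python's low-level `http.client` module does not add a User-Agent header on its own (unlike `urllib.request`, which sends `Python-urllib/<version>`), so a plain request from it arrives with no UA at all. The sketch below starts a throwaway server on the loopback interface to show what the server sees; `RecordingHandler`, `received` and `fetch_without_ua` are illustrative names.

```python
import http.client
import http.server
import threading

received = []  # User-Agent values seen by the server (None = header absent)


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        received.append(self.headers.get("User-Agent"))
        self.send_response(204)  # No Content
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_without_ua():
    # Port 0 lets the OS pick a free port on 127.0.0.1.
    server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        # For a GET, http.client only adds Host and Accept-Encoding itself,
        # so this request carries no User-Agent header.
        conn.request("GET", "/")
        status = conn.getresponse().status
        conn.close()
        return status
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_without_ua(), received)
```

Any script written this way would show up in your logs exactly like the empty-UA traffic described in the question.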

Maybe someone just fetched your content recursively, or used an SEO tool to analyze your site (some let users change the header manually, and some users do so to get around a robots.txt rule), or something similar. In such situations the UA header is often faked to hide the client and its purpose.

If these requests keep showing up, it might help to analyze the remaining headers further (proxies?) or the IPs (a certain block? a privacy-conscious company or proxy?).
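A first pass at that analysis could group the empty-UA requests by network block and count how many carry proxy-related headers. This is a sketch under stated assumptions: requests are given as `(client_ip, headers_dict)` pairs, the /24 grouping assumes IPv4, and the `PROXY_HEADERS` list and `summarize_empty_ua` function are illustrative, not exhaustive or standard.

```python
import ipaddress
from collections import Counter

# Headers that commonly indicate a proxy hop (illustrative, not exhaustive).
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded")


def summarize_empty_ua(requests, prefix=24):
    """Group empty-UA requests by IP block and count proxied ones.

    Returns (Counter mapping network -> hits, number with proxy headers).
    """
    blocks = Counter()
    proxied = 0
    for ip, headers in requests:
        lowered = {k.lower(): v for k, v in headers.items()}
        if lowered.get("user-agent", "").strip():
            continue  # has a real UA, so it is outside this analysis
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        blocks[str(net)] += 1
        if any(h in lowered for h in PROXY_HEADERS):
            proxied += 1
    return blocks, proxied
```

`blocks.most_common(10)` then lists the networks sending the most empty-UA requests, which is a reasonable starting point for a WHOIS lookup.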
