How to determine if a user agent string has proper syntax or might be a hacking attempt?

Posted in: #Hacking #Security #UserAgent

I was checking my server via awstats to see who was visiting my site and I have a user-agent of the following value:

}__test|O:21:

Some research has led me to believe that someone was trying to hack my server.

To prevent such things from happening, what are the syntax rules for determining that a user agent string is 100% legitimate and not some hacker-crafted string? For example, which characters are and are not allowed in a proper user-agent string, and do some characters need to be in a certain order?

@Harper822

@Looi9037786

Because the user agent is completely client-controlled, it is worth paying attention to: it can be used as a vector for various attacks.

Allowed Characters in User Agent

what characters are/are not allowed in a proper user-agent string and do some characters need to be in a certain order?

@Stephen Ostermiller already linked to RFC2616. It was updated in RFC7231, but nothing really changed:

User-Agent = product *( RWS ( product / comment ) )
[...]
product = token ["/" product-version]
product-version = token

It does however link to RFC7230 to specify how comments may look:

comment = "(" *( ctext / quoted-pair / comment ) ")"
ctext = HTAB / SP / %x21-27 / %x2A-5B / %x5D-7E / obs-text
[...]
quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )

This is a fancy way of saying that pretty much all characters are allowed in the comment part of the user agent. Only "(", ")" and "\" cannot be placed freely: parentheses delimit (possibly nested) comments, and a backslash introduces a quoted-pair.

token is a bit more restrictive, as can be seen in RFC7230. It doesn't allow whitespace, the double quote, or any of the delimiters (),/:;<=>?@[\]{}.
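
The grammar quoted above can be turned into a small syntax checker. The following is only a sketch in Python, assuming the header value has already been decoded to a string (for example as Latin-1); the function name is_rfc7231_user_agent and its helpers are made up for illustration and are not part of any library. It checks syntax only and says nothing about intent.

```python
import re

# One or more tchar characters, as listed in RFC 7230 section 3.2.6.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# product = token ["/" product-version]
_PRODUCT = re.compile(_TOKEN + r"(?:/" + _TOKEN + r")?")
# RWS = 1*( SP / HTAB )
_RWS = re.compile(r"[ \t]+")


def _is_ctext(ch):
    """ctext = HTAB / SP / %x21-27 / %x2A-5B / %x5D-7E / obs-text"""
    o = ord(ch)
    return (ch in "\t " or 0x21 <= o <= 0x27 or 0x2A <= o <= 0x5B
            or 0x5D <= o <= 0x7E or 0x80 <= o <= 0xFF)


def _is_quotable(ch):
    """Characters allowed after the backslash of a quoted-pair."""
    o = ord(ch)
    return ch in "\t " or 0x21 <= o <= 0x7E or 0x80 <= o <= 0xFF


def _skip_comment(s, i):
    """Return the index just past the comment opening at s[i], or -1."""
    depth = 0
    while i < len(s):
        ch = s[i]
        if ch == "(":
            depth += 1              # comments may nest
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        elif ch == "\\":
            i += 1                  # quoted-pair: check the escaped character
            if i >= len(s) or not _is_quotable(s[i]):
                return -1
        elif not _is_ctext(ch):
            return -1
        i += 1
    return -1                       # ran out of input before the closing ")"


def is_rfc7231_user_agent(value):
    """True if value matches: product *( RWS ( product / comment ) )"""
    m = _PRODUCT.match(value)
    if not m:
        return False
    i = m.end()
    while i < len(value):
        ws = _RWS.match(value, i)
        if not ws:
            return False
        i = ws.end()
        if i < len(value) and value[i] == "(":
            i = _skip_comment(value, i)
            if i < 0:
                return False
        else:
            m = _PRODUCT.match(value, i)
            if not m:
                return False
            i = m.end()
    return True


print(is_rfc7231_user_agent("}__test|O:21:"))                  # False
print(is_rfc7231_user_agent("x (<script>alert(1)</script>)"))  # True
```

The string from the question fails because "}" is not allowed in a token, but the second example shows that markup inside a comment is syntactically valid, so a check like this cannot stand in for the defenses discussed below.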

How to filter user agents

what are the syntax rules in determining that a user agent string is 100% legitimate and not some hacker-crafted string?

As user agents can contain pretty much any character, filtering on syntax alone cannot reliably separate legitimate values from hostile ones. On top of that, not all clients follow the RFC, so a filter must not be very restrictive or it will block real users.

Filtering user input is a good first line of defense, but it should never be your only one, as it is extremely difficult to prevent all attacks with input filtering.

You need secure coding practices, and you need to implement proper defenses against common attacks. So if the user agent is echoed, you need to encode it to prevent XSS. If the user agent is stored in the database, you need to use prepared statements to defend against SQL injection. If you pass something to the PHP function unserialize, you need to keep object injection in mind (I'm mentioning it because the O:21 looks as if it might have been a test for exactly that). And so on.
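
As an illustration of the first two defenses, here is a minimal sketch. It is written in Python rather than PHP, and the table name visits and the helper names render_user_agent and store_user_agent are made up for the example. The idea is the same in any language: encode on output, and bind values as parameters instead of building SQL by string concatenation.

```python
import html
import sqlite3


def render_user_agent(user_agent):
    """Encode the value before echoing it into HTML so any markup stays inert."""
    return "<p>Your browser: " + html.escape(user_agent, quote=True) + "</p>"


def store_user_agent(conn, user_agent):
    """Bind the value as a parameter; it never becomes part of the SQL text."""
    conn.execute("INSERT INTO visits (user_agent) VALUES (?)", (user_agent,))
    conn.commit()


# In-memory database purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_agent TEXT)")

hostile = "x'); DROP TABLE visits; -- <script>alert(1)</script>"
store_user_agent(conn, hostile)

# The value is stored verbatim and the table still exists.
stored = conn.execute("SELECT user_agent FROM visits").fetchone()[0]
print(stored == hostile)

# The echoed version contains only escaped markup.
print(render_user_agent(hostile))
```

Note that neither helper inspects or rejects the user agent. The value is treated as opaque data at every point where it could otherwise be interpreted as code.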

If you want an additional line of defense, you might think about using a WAF such as mod_security.

@Megan663

The User-Agent header is defined in RFC2616 (HTTP/1.1), which superseded RFC1945 (HTTP/1.0). RFC2616 states:

The User-Agent request-header field contains information about the
user agent originating the request. This is for statistical purposes,
the tracing of protocol violations, and automated recognition of user
agents for the sake of tailoring responses to avoid particular user
agent limitations. User agents SHOULD include this field with
requests. The field can contain multiple product tokens (section 3.8)
and comments identifying the agent and any subproducts which form a
significant part of the user agent. By convention, the product tokens
are listed in order of their significance for identifying the
application.

User-Agent = "User-Agent" ":" 1*( product | comment )

Where product is defined as:

product = token ["/" product-version]
product-version = token
token = 1*<any CHAR except CTLs or separators>

And comment as:

comment = "(" *( ctext | quoted-pair | comment ) ")"
ctext = <any TEXT excluding "(" and ")">

Source: Paulo Santos's answer to What is the standard format for a browser's User-Agent string?

@Pope3001725

There are no rules. A user agent can be anything.

There's no reasonable way to whitelist user agents as there are a lot of legitimate ones and you do not want to accidentally block a legitimate user. There's also no way to block bad user agents because, once again, there is no standard way to determine if a user agent represents a bad user.

If you want to try to block bad bots you can compare a user agent against this database and see who it is and then make a semi-educated decision about whether you should block it or not. There are also some attempts to maintain lists of bad user agents but I don't know how current they are and new user agents appear every day.
