How can you diagnose the cause of slowness for some users but not all?
We are getting reports from a handful of users that our web application is loading very slowly for them. So slowly that many of their page requests are timing out.
The majority of our users are not experiencing this.
The users who are getting the slowness don't have problems accessing any other site.
We can log in to their instance of the application and from our connection everything is fast.
We have set up screen-sharing with the affected users and have seen first-hand the slowness they are experiencing.
These customers are getting very upset and blaming us, even though we believe the fault lies somewhere else. We're about to lose customers.
tracert doesn't return anything. Every request times out.
I'm really out of my realm of expertise here and have no idea where to begin hunting this issue down. Any help would really be appreciated.
3 Comments
These issues are often hard to trace back to their origin. Here are a few things you can do to figure out exactly where the problem is.
One simple trick is to create a fairly sizable but not huge static HTML file and put it in the web space. You can optionally include a couple of sizable but not huge images in it. Ask the users who are having trouble to access this page several times over a period of time and check that it loads as it should. If it does, that tells you a few things: the issue is not a network issue, it is not (generally) a browser issue, and it is not a web server issue.
If this works okay, another thing you can do is change your application/page to load the JavaScript last, as a test to see whether a JavaScript load or execution issue is stopping the page from loading. Not all code errors result in an exception of some sort. It is possible that there is a JavaScript incompatibility with certain browser brands and versions. Not all browsers execute JavaScript the same way, so code can run fine in most places and not in others. Loading the JavaScript last lets the page render even though the error remains; if the page then appears promptly, that points to a possible JavaScript problem.
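If you don't have a suitable file lying around, the static test page can be generated. This is only a rough sketch: the function names, the default size of about 200 KB, and the filler text are my own assumptions, not anything specific to your application.

```python
from pathlib import Path


def build_test_page(target_bytes=200_000):
    """Return a static HTML page of at least target_bytes characters, with no scripts."""
    header = (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        "<title>Static load test</title></head><body>\n"
    )
    footer = "</body></html>\n"
    paragraph = "<p>" + "Static filler text for a load test. " * 20 + "</p>\n"

    parts = []
    size = len(header) + len(footer)
    # Keep appending filler paragraphs until the page reaches the target size.
    while size < target_bytes:
        parts.append(paragraph)
        size += len(paragraph)
    return header + "".join(parts) + footer


def write_test_page(path, target_bytes=200_000):
    """Write the page to path and return the number of bytes written."""
    html = build_test_page(target_bytes)
    Path(path).write_text(html, encoding="utf-8")
    return len(html.encode("utf-8"))
```

Calling write_test_page("loadtest.html") and uploading the result next to the application gives the affected users a page that involves no application code, database, or JavaScript.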
If these simple tests go well, then I would be looking at two other things.
It may not tell you much, but I would use Chrome or some other browser that provides a waterfall-style analysis of load times. You may have to navigate around the application/site or refresh pages until the problem reappears. Chrome's developer tools will let you see whether there is a particular issue with downloading or rendering any portion of your page/site. They will not tell you whether there is latency in queries or anything else on the server that builds the page, but they will show that the page is slow to arrive and render. They will also tell you if images, JavaScript, or other resources are not loading as quickly as they should.
If your page renders slowly, you will want to look at what causes this. If, for example, a database query is occasionally slow, you can look to your database server for answers. I no longer know SQL Server well, but I can tell you that MySQL has a slow query log that can be enabled. Some installs have this feature enabled during install. Older versions required a server restart to enable it; since MySQL 5.1 the slow_query_log system variable can also be switched on at runtime. Either way, it is a good idea to turn it on from time to time to tune slower queries, so it is something to consider when changes are made.
Lastly, there may always be a network issue. It can be obscure. I use Wireshark when I have issues to analyze. It is too big and technical a topic to discuss here, but once the problem occurs, you should be able to go through the network traffic to determine whether there are any errors along the way. For example, I used Wireshark to determine that my new firewall was fragmenting DNS queries, which would occasionally fail, causing problems and being quite frustrating. I was able to view the DNS query traffic to determine that an authority required bit was set (which is not normally bad) and the query was fragmented, not allowing an authority look-up to succeed without specifying a trace. Without the authority required bit set, the DNS query would return a non-authoritative response if possible even if the request was fragmented. But DNS queries for authoritative responses require that the DNS query not be fragmented. Hence the problem. You may have an issue that is that obscure. It may be difficult to find. Let's hope not!
It can be very frustrating to have a customer experience these issues. I know! It is like having egg on your face, but as we in the IT world know all too well, these things happen from time to time. The key is to analyze the issue to know where to look and resolve the problem quickly. Duh! But it is also important that your customer see you go to extraordinary lengths to solve the problem and exactly how technical and obscure it can be. They may not understand it, but they will appreciate the sweat equity in solving the problem and your clear expertise in tracking the problem down. Heck! They may bake you a cake or buy you a beer! Now wouldn't that be nice? As the great and illustrious Homer said (Simpson that is), "Hummmmm cake." In fact, I think he also said "Hummmmm beer."
Are all requests from those users slow, or only some of them? Does it happen to them in all browsers? Can you write a test program that does the same queries without a browser? Is anti-virus software enabled? Firefox also has Safe Browsing, which checks the pages you want to visit against lists provided by Google.
The best thing you can get is a full trace with Wireshark, so you can analyze it and see whether it is a DNS problem, dropped packets, or something else. Installation is not that bad either: if you have a remote session open to them, you can do everything except entering the admin password.
Are you using SSL (https) requests? It might be that their browser is negotiating a different protocol version or cipher suite.
It can be any part of the hardware involved. Is it possible to use another computer in their network? Or maybe take one of their computers/laptops to a network where everything works? Maybe a mobile phone with Wi-Fi?
If not, can you create a Linux live CD for them and let them run Firefox there?
There are tons of things to test before giving up. Just make sure the users understand that and see you are doing everything possible to help them. Often we only see the technical side, but this is a lot about trust too.
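As a rough idea of what such a browser-free test program could look like, here is a sketch that fetches one URL over a raw socket and reports how long each phase takes. The function name and phase labels are made up for illustration, it only tries the first address the resolver returns, and it makes no attempt to behave like a real browser (no cookies, no redirects, no keep-alive).

```python
import socket
import ssl
import time
from urllib.parse import urlsplit


def time_request(url, timeout=30.0):
    """Fetch url once over a raw socket and return seconds spent in each phase."""
    parts = urlsplit(url)
    host = parts.hostname
    use_tls = parts.scheme == "https"
    port = parts.port or (443 if use_tls else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if parts.port is None else f"{host}:{port}"
    timings = {}

    # Phase 1: name resolution (only the first returned address is used).
    start = time.perf_counter()
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    timings["dns"] = time.perf_counter() - start

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Phase 2: TCP handshake.
        start = time.perf_counter()
        sock.connect(address)
        timings["connect"] = time.perf_counter() - start

        # Phase 3: TLS handshake (https only).
        if use_tls:
            start = time.perf_counter()
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)
            timings["tls"] = time.perf_counter() - start

        # Phase 4: send the request and wait for the first response byte.
        request = (
            f"GET {path} HTTP/1.1\r\nHost: {host_header}\r\n"
            "User-Agent: timing-probe\r\nConnection: close\r\n\r\n"
        )
        start = time.perf_counter()
        sock.sendall(request.encode("ascii"))
        received = len(sock.recv(1))
        timings["first_byte"] = time.perf_counter() - start

        # Phase 5: read the rest of the response until the server closes.
        start = time.perf_counter()
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            received += len(chunk)
        timings["download"] = time.perf_counter() - start
        timings["bytes"] = received
    finally:
        sock.close()
    return timings
```

Run from an affected user's machine and from your own, the two sets of numbers should at least show whether the time is going into DNS, the connection, the TLS handshake, the wait for the server, or the download itself.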
You could hire a PC support company located close to them and let them help you analyze the problem there with extra hardware.
Of course it would help if you can post more details about your server and maybe a link with a demo account so people can take a look and you get some more ideas.
Oh, there was also something with IPv6, where the client first tried to resolve an AAAA record before falling back to IPv4 A records. I don't remember which software did that, but you can see those queries in a Wireshark capture. Windows name resolution with discovery mode was also a problem once.
There are two main approaches to figuring this type of thing out. Either determine what common factor the affected users share, or what changed recently so that these users are suddenly affected. Your best bet is to approach it from both directions, as the answer often ends up being a mixture of both.
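If you suspect the AAAA lookup, a quick check without a full capture is to time the two lookups separately. A minimal sketch follows; the function name and result layout are assumptions of mine, and keep in mind that the operating system's resolver cache can make a repeated lookup look faster than the first one was.

```python
import socket
import time


def resolve_timings(host, port=443):
    """Time IPv4 and IPv6 lookups for host separately and return the results."""
    results = {}
    for label, family in (("A", socket.AF_INET), ("AAAA", socket.AF_INET6)):
        start = time.perf_counter()
        try:
            infos = socket.getaddrinfo(
                host, port, family=family, type=socket.SOCK_STREAM
            )
            addresses = sorted({info[4][0] for info in infos})
            error = None
        except socket.gaierror as exc:
            # No record of this type, or the lookup itself failed.
            addresses, error = [], str(exc)
        results[label] = {
            "seconds": time.perf_counter() - start,
            "addresses": addresses,
            "error": error,
        }
    return results
```

A long "seconds" value for AAAA combined with an error would suggest the client is waiting on an IPv6 lookup before it falls back to IPv4.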
What Ties These Users Together?
If they don't have problems with other sites, that largely rules out the users' own network (but it could still be network related at your site's end).
How is your site load balanced? Are all of the affected users hitting a particularly slow instance or server? (You should have some sort of server performance monitoring tool in place to detect this.)
Are these users all using a particular browser or version of a browser? Your site might have a bug when accessed by that version.
Do these users share a common privilege or access level on the site? There may be a bug involved with that privilege.
Are these users in the same geographical area? It might be a local ISP. (Although if other sites are fine, chances are the ISP has nothing to do with it.)
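If you can pull request records out of your access logs, the questions above can be turned into a quick tally. The sketch below assumes each record is a dict with a duration_s field plus whatever attributes you want to compare (browser, server, region and so on); those field names and the 5 second threshold are hypothetical and would need to match your own logging.

```python
from collections import defaultdict


def slow_rate_by_attribute(records, attributes, threshold_s=5.0):
    """For each attribute, list (value, slow_count, total, slow_share), worst first."""
    # stats[attribute][value] holds [slow_count, total_count].
    stats = {attr: defaultdict(lambda: [0, 0]) for attr in attributes}
    for record in records:
        is_slow = record["duration_s"] >= threshold_s
        for attr in attributes:
            counts = stats[attr][record.get(attr, "unknown")]
            counts[0] += int(is_slow)
            counts[1] += 1

    report = {}
    for attr, values in stats.items():
        rows = [
            (value, slow, total, slow / total)
            for value, (slow, total) in values.items()
        ]
        # Highest share of slow requests first.
        report[attr] = sorted(rows, key=lambda row: row[3], reverse=True)
    return report
```

An attribute where one value has a slow share near 1.0 while the others sit near 0.0 is a good candidate for the common factor.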
What Changed Recently?
Assuming that these users were able to access the site normally before, something changed to cause this issue.
Have there been any server configuration changes recently?
Have there been any code or database changes recently?
Are there new traffic patterns? (A surge or lull in traffic may affect how your site is being cached.)
Dealing with tracert (all credit to closetnoc for correcting me)
Tracert sends the same kind of ICMP echo packets that ping uses, with an increasing TTL, and relies on each router along a proposed route sending an ICMP reply. Those replies can be disabled at any router along the way, so to say that a tracert that fails means... anything at all... is incorrect. It just means that a response from some point of the proposed route did not come back. Nothing more. Keep in mind that tracert often shows a proposed route, which does NOT always end up being the actual route. It is an important tool, but not always an indicator of a problem. Ping requests can be dropped by rule while normal traffic passes.
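Since ICMP may simply be filtered, a more telling check is whether a plain TCP connection to the port the site actually uses succeeds from the affected machine. A small sketch, with a function name and return shape of my own choosing:

```python
import socket
import time


def tcp_check(host, port=443, timeout=10.0):
    """Try one TCP connection and return (succeeded, seconds, error_text)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.perf_counter() - start, None
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution failures.
        return False, time.perf_counter() - start, str(exc)
```

If this connects quickly while tracert times out at every hop, the tracert result says more about ICMP filtering than about the route to your site.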
tl;dr
I realize that answering a question with many more questions is not that great of a practice. The idea here is that you need to ask yourself these types of questions until you figure out what ties the users together (and then something related to that commonality is why they are all affected), or what changed recently (and then something related to the change is what broke), or a combination of the two (e.g., we recently launched a new feature that causes the site to hang in IE 6, so only the IE 6 users are affected and it started happening because of the new feature).