Troubleshooting an IIS Crash/Hang
This is an experience of mine at a client site. The project integrated a third-party application with the client’s web application, so that the client could provide better service to its customers.
I had designed and implemented this ASP integration application, and the project was deployed successfully. Everyone was very impressed with my work and I was happy.
I was thinking how good my design was and that all implementations should be as good as mine. How nice would it be if every system that is designed and implemented ran smoothly after a successful QA, without ever crashing or needing any monitoring.
Phone rings !!!!
Wake up Caveman!!! Wake up !!! Ohhhhhhhh Shoot!! it was only a dreammmm….. damn-it!!
I woke up from my dream to the annoying ring of the phone. It was a call from the support group:
Me: Hello, Me here
Support Guy(SG): Hi this is SG from the Support group, how are you?
Me: Good, How about you?
Me: What’s up?
SG: The website is hanging intermittently and we are having to do an IISReset to bring back the site.
Me: hmm… Is there a pattern to this issue?
SG: The CSR Manager said that this happens randomly and that the business is getting affected because of the outage.
Me: Okay I will take a look at this and get back to you.
SG: This is a priority 1 issue.
Me: (I thought of shouting at him: so what!!! hold on !!! I will get to it when I can get to it) Thanks for letting me know about this…. and the call ends.
This is when I started scratching my head about what could possibly have gone wrong to make the website hang. My first instinct told me that the third-party component might be the culprit (as it later turned out to be), coz I did not design/code it, heheheheeeee.
Several things can cause an application failure.
I usually start by checking the health of the web application, which includes (but is not limited to) the following:
- No. of Database Connections.
- CPU on the Web Server and Database Servers.
- Event viewers on all servers.
- Memory consumed by Dllhosts or Worker (w3wp.exe) processes.
- Web Service call durations.
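The worker-process memory check from the list above can be scripted. Here is a minimal sketch, assuming you have captured the text output of the Windows command `tasklist /FI "IMAGENAME eq w3wp.exe" /FO CSV /NH`. The function names, the sample rows and the 500 MB limit are my own illustrations, not something from the project.

```python
import csv
import io


def worker_memory_kb(tasklist_csv):
    """Map PID -> memory usage in KB from `tasklist /FO CSV /NH` output."""
    usage = {}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 5:
            continue  # blank line or an "INFO: No tasks..." message
        pid, mem = row[1], row[4]
        # The memory column looks like "612,340 K"; keep only the digits.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if pid.isdigit() and digits:
            usage[int(pid)] = int(digits)
    return usage


def over_limit(usage, limit_kb=500 * 1024):
    """Return the PIDs whose memory usage exceeds limit_kb (assumed limit)."""
    return sorted(pid for pid, kb in usage.items() if kb > limit_kb)


# Hypothetical sample rows, for illustration only.
SAMPLE = (
    '"w3wp.exe","4321","Services","0","612,340 K"\n'
    '"w3wp.exe","5010","Services","0","88,112 K"\n'
)
```

The PIDs this returns are also the ones you would hand to a diagnostic tool when attaching to a specific worker process.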
At times this might not help either, and all you notice is a blip on the radar that does not tell you much, for example that an automated iisreset has run and that a crash has occurred. How would you know the cause of the crash/hang?
IISState is a command-line utility, part of the IIS 6.0 Resource toolkit, that is very handy for diagnosing IIS-related issues. To attach IISState to a particular w3wp.exe process, execute the following command (where <PID> is the process ID). This does an immediate dump of the current process.
iisstate -p <PID>
IISState also accepts the following options:
- -sc (waits for a “soft crash” such as an ASP 0115 Trappable Error Occurred in an External Object)
- -hc (waits for a “hard crash” where the process terminates unexpectedly)
- -d (writes out a dump file, which can be used for further analysis, e.g. by WinDbg)
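If you attach IISState often, a small wrapper saves typing. This is only a sketch: it assumes `iisstate` is on the PATH and that the options listed above can be combined with `-p` on one command line. The helper name `iisstate_args` is hypothetical.

```python
def iisstate_args(pid, soft_crash=False, hard_crash=False, dump=False):
    """Build an argument list for iisstate from the options described above."""
    if pid <= 0:
        raise ValueError("pid must be a positive process ID")
    args = ["iisstate", "-p", str(pid)]
    if soft_crash:
        args.append("-sc")  # wait for a soft crash
    if hard_crash:
        args.append("-hc")  # wait for a hard crash
    if dump:
        args.append("-d")   # also write a dump file
    return args
```

On the web server you could pass the result to `subprocess.run(...)`, for example `subprocess.run(iisstate_args(4321, hard_crash=True, dump=True))`.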
IISState outputs a log file containing the stacks of all the threads in the process. I used the IISState utility to get a dump file and a log file by hooking it to the only w3wp.exe process. I got lucky and the crash happened after only a little while. Upon examining the dump, I noticed that one of the threads was waiting on another thread to get its job done, and that thread in turn was waiting on yet another thread. I traversed a bunch of threads and finally arrived at the thread that was the culprit. The thread info showed that this thread belonged to a third-party DLL. I checked with the third-party software company and found out that they had released a newer version that took care of the thread issue.
Another way of diagnosing the issue is to analyze the dump file further with a utility such as DebugDiag or WinDbg. For this method of diagnosis you will need the .pdb files of the application and the necessary symbols.
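The thread-by-thread traversal I did by hand is really just following a "waits on" chain to its end. Here is a minimal sketch of that idea. It does not parse the IISState log; it assumes you have already read off which thread waits on which and written it down as a dictionary. The thread numbers are made up for illustration.

```python
def find_blocker(waits_on, start):
    """Follow the wait chain from `start`.

    waits_on maps a thread ID to the thread ID it is waiting on.
    Returns (chain, is_cycle): the threads visited in order, and whether
    the chain looped back on itself (which would indicate a deadlock).
    """
    chain = [start]
    seen = {start}
    current = start
    while current in waits_on:
        current = waits_on[current]
        chain.append(current)
        if current in seen:
            return chain, True  # came back to a thread already visited
        seen.add(current)
    return chain, False  # the last thread in the chain waits on nobody


# Hypothetical example: thread 12 waits on 7, 7 on 19, 19 on 23.
WAITS_ON = {12: 7, 7: 19, 19: 23}
chain, deadlocked = find_blocker(WAITS_ON, 12)
```

The last entry in `chain` is the thread whose stack deserves a close look. In my case that stack pointed into the third-party DLL.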