Exception woes and the dreaded clr20r3 error
Posted on February 13th, 2009
Well, we got bit by this obscure bug as well. As most of you who have run into it have found out, there is not a lot of information on the web that really explains what the heck is going on. All we knew was that it killed our app and gave us no real information about why.
So hopefully my information will shed some extra light on the subject. Each case is different, but from my research they are all very similar in their underlying problems.
My scenario
Windows service running on Windows Server 2003
Multi-threaded app (roughly 70 threads)
Lots of real-time messaging using TIBCO EMS
NHibernate database layer
.NET Framework 2.0
The problem
Without reason or warning, the Windows service would crash without sending any notifications, leaving our users dead in the water.
What we tried:
- First, we tried adding an AppDomain.CurrentDomain.UnhandledException handler. Bam – NO good.
- Next, we tried adding the .NET legacy exception handling setting (legacyUnhandledExceptionPolicy) to our app config file. Bam – NO good. Not only that, but our service would no longer even start properly.
- Next, we called Microsoft, opened a case, and got some tools to capture memory dumps in case we could replicate the server failure in our dev environment. This might have worked, but we fixed the problem before we could find out.
- Lastly, we reviewed lots of our code to find any leaks, holes, etc. that could cause a critical thread to fail and bring the service down. Honestly, we got lucky. It just so happened that we found an area that did not look right, fixed the code, and Bam – no more exceptions.
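For anyone trying the first item above, here is a minimal C# sketch of wiring up the AppDomain handler. `CrashLogger` and its in-memory `Entries` list are hypothetical names standing in for your real logging. Keep in mind that, as far as we understand it, this event is a notification only: when `IsTerminating` is true, the runtime still tears the process down after the handler returns, so it can help you find out why the service died but will not keep it alive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: records unhandled exceptions before the process dies.
public static class CrashLogger
{
    // Stand-in log sink for this sketch; a real service would write to the
    // event log, a file, or a database instead.
    public static readonly List<string> Entries = new List<string>();

    // Call once at service start-up, before any worker threads are created.
    public static void Register()
    {
        AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;
    }

    // Notification only: when IsTerminating is true the runtime still
    // terminates the process after this handler returns.
    public static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        Exception ex = e.ExceptionObject as Exception;
        string text = ex != null ? ex.ToString() : Convert.ToString(e.ExceptionObject);
        Entries.Add((e.IsTerminating ? "FATAL: " : "ERROR: ") + text);
    }
}
```

In a service you would call `CrashLogger.Register()` once, for example from `OnStart`, and replace the list with a real log writer.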
What actually caused the problem
In as few lines as possible, here is what caused the problem. The method below is a shortened, modified version that just showcases the issue; it is the method we handed to the thread to run once the thread was started.
What we had in our code
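    using System.Collections.Generic;
    using System.Threading;

    public void RunMe()
    {
        try
        {
            List<SomeObject> data = new List<SomeObject>();
            data.Add(new SomeObject());
            data.Add(new SomeObject());
            data.Add(new SomeObject());

            foreach (SomeObject o in data)
            {
                DoSomethingToObject(o);
                data.Remove(o);   // BUG: modifies the list while it is being enumerated
            }
        }
        catch (ThreadAbortException ex)   // only catches thread aborts, nothing else
        {
            Log(ex);
        }
    }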
So basically we had a bit of logic holding data in a List collection. We enumerated over it and removed each item once it was processed. We also had a Try..Catch block to catch a ThreadAbortException if one occurred.
Why it blew up
Well, looking at the code you'd think, um, that should not blow up. However, if you stop and think about it for a second you will see what happened. If you guessed that it throws an InvalidOperationException… here's your Kewpie doll. 🙂 We were throwing an exception because we were removing items from the collection while we were enumerating it. It does not matter whether you have a lock or anything else; this is just a no-no. The List's enumerator notices that the collection was modified and throws on its next MoveNext() call ("Collection was modified; enumeration operation may not execute."). Had we used a for loop instead of a foreach and iterated in reverse, that would have been fine. However, the rules around IEnumerable don't allow what we were doing.
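If you want to see the failure for yourself, this small C# sketch reproduces it outside of any service or thread. `EnumerationDemo` is a hypothetical name, and plain strings stand in for `SomeObject`.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationDemo
{
    // Reproduces the bug: removing from a List<T> inside a foreach over it.
    // Returns the exception the enumerator throws, or null if none is thrown.
    public static InvalidOperationException RemoveInsideForeach()
    {
        List<string> data = new List<string> { "a", "b", "c" };
        try
        {
            foreach (string item in data)
            {
                // Changes the list's internal version number...
                data.Remove(item);
            }
        }
        catch (InvalidOperationException ex)
        {
            // ...so the enumerator's next MoveNext() call throws.
            return ex;
        }
        return null;
    }
}
```

Calling `EnumerationDemo.RemoveInsideForeach()` returns a non-null exception: the first item is removed on the first pass, and the enumerator fails as soon as it tries to move to the second.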
So we threw the InvalidOperationException, and since we were on a worker thread and our Try..Catch handler was not catching generic exceptions, it became an unhandled exception on that thread. Starting with .NET 2.0, an unhandled exception on any thread terminates the whole process. So even though we had Try..Catch handlers at the service layers, it did not matter: those handlers run on other threads and never see the exception, and this type of unhandled exception will just shut you down. As far as we could tell, it didn't even trigger our AppDomain UnhandledException handler.
How we fixed it
Well, obviously we had to fix the foreach loop. However, the biggest thing we did to fix the problem was to actually catch the exceptions and handle them. Once we handled the exception, it would still shut down that thread (until we fixed the underlying issue), but our server stayed up and there were no more clr20r3 errors.
From everything I have found, the crux of clr20r3 is exception handling. Make sure your threads have a generic exception handler that logs the exception to a log file, the event log, a database, or wherever else you need, so you can actually get the answer you need and go fix the underlying problem.
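Here is a minimal C# sketch of that advice, assuming a hypothetical `SafeThreads.Start` helper and a `log` callback that stands in for your event log, file, or database writer. It uses lambdas and the parameterless `Action` delegate, so it needs a newer compiler and framework than the .NET 2.0 setup described above.

```csharp
using System;
using System.Threading;

public static class SafeThreads
{
    // Hypothetical helper: starts a thread whose body can never let an
    // exception escape. 'log' stands in for your real logging sink and
    // should itself be written so that it cannot throw.
    public static Thread Start(Action work, Action<Exception> log)
    {
        Thread t = new Thread(() =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // Last line of defence: record the failure and let only this
                // thread end, instead of the runtime terminating the process.
                log(ex);
            }
        });
        t.IsBackground = true;
        t.Start();
        return t;
    }
}
```

The design choice is to catch at the very top of each thread's entry point, because that is the last place the exception can be observed before the runtime treats it as unhandled.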
The final solution
In case you wanted to see the code that fixed the problem, here it is:
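    public void RunMe()
    {
        try
        {
            List<SomeObject> data = new List<SomeObject>();
            data.Add(new SomeObject());
            data.Add(new SomeObject());
            data.Add(new SomeObject());

            // Walk backwards by index: there is no enumerator to invalidate,
            // and i >= 0 ensures the element at index 0 is processed too.
            for (int i = data.Count - 1; i >= 0; i--)
            {
                DoSomethingToObject(data[i]);
                data.RemoveAt(i);
            }
        }
        catch (Exception ex)   // generic handler: nothing escapes the thread
        {
            LogToEventLog(ex);
        }
    }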
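And here is a self-contained, runnable C# sketch of the same reverse loop, with hypothetical stand-ins for `SomeObject` and `DoSomethingToObject` so it can be compiled and checked on its own. It returns the processed items so you can confirm that all three, including the one at index 0, are handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the post's SomeObject.
public class SomeObject
{
    public bool Processed;
}

public static class Worker
{
    // Hypothetical stand-in for DoSomethingToObject.
    static void DoSomethingToObject(SomeObject o)
    {
        o.Processed = true;
    }

    // Processes and removes every item, walking the list backwards.
    // Returns the processed items so the caller can check them.
    public static List<SomeObject> RunMe(List<SomeObject> data)
    {
        List<SomeObject> done = new List<SomeObject>();
        // i >= 0 (not i != 0) so the element at index 0 is handled as well.
        for (int i = data.Count - 1; i >= 0; i--)
        {
            SomeObject o = data[i];
            DoSomethingToObject(o);
            done.Add(o);
            // Removing by index avoids the extra search that Remove(o) does.
            data.RemoveAt(i);
        }
        return done;
    }
}
```

Because removing index i only shifts elements after it, and those have already been visited, the items still waiting to be processed never move.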
Thanks for the great information!
Your post was very helpful. While our problem was not related to iterating a collection, your description about error handling led me to the root cause of our problem.
I had a similar issue with multiple apps – finally got them worked out, and thought I’d post the solution I had.
– Make sure no code changes control properties during form loading (this can include things like changing button color via mouse hover)
– If you have a single-instance app, make sure you aren't starting it from multiple places (multiple Start menu entries or Run keys) at the same time.
Good luck!