Crashing my computer with Nvlddmkm Event ID 4101

Since allowing myself to be convinced to upgrade to Windows 10 (from Windows 7), I’ve been getting the dreaded “nvlddmkm Event ID 4101” error, perhaps also known as The Black Screen Of Death. It took a bit of digging, but I believe I’ve uncovered a chain of issues that combined to cause the problem, along with a fix that appears to have worked.

A Disclaimer

But first, a disclaimer that shouldn’t be necessary, but: I’m relating this account solely on the off-chance that it is useful to someone else in debugging their own issues, which are probably entirely different. Taking any of these steps is likely to result in your computer on fire, your pets kidnapped, and infection with foot herpes. DONT DO IT! … and if you do, don’t come crying to me, because I’ll just point at the “DONT DO IT!” and shrug.

The Scene

My SL/gaming computer was a moderately high-end machine four years ago. Intel i7-2700K, NVIDIA GTX 580, 16 GB RAM, primary disk a 256 GB SSD, blah blah blah, running Windows 7 Pro. When I finally took the plunge and allowed Windows to update itself to Windows 10, there were some initial graphics driver issues [As an aside, it was totally freaky seeing a modern computer booting into 640×480 grayscale graphics. :P] but it didn’t take too long to get things running. In fact, the system felt slightly more responsive, so I was pretty happy… For about 24 hours.

The Crash

Picture this (which should be easy if you’ve experienced the same): you are happily looking for hunt items or chatting with friends or whatever in SL and then without any warning, everything goes black and you are suddenly staring at your BIOS screen as your computer restarts. Let me reiterate: no warning!  No obvious sudden lag, CPU and GPU monitors not alarming, temperatures under control, etc. etc.  Lather, rinse, repeat.  Furthermore, it only happened while in SL.  This continued on and off for a few weeks.  Sometimes things were great, but then one evening I crashed in this way a half dozen times. I googled for answers, I poked at BIOS and registry settings, all to no avail.  I even started shopping for a new computer! But then I got a glimmer of hope and followed it (knocking on wood) to a happy conclusion.

The Debugging

First off, you’ll notice from the previous paragraph that I was crashing to BIOS and didn’t say anything about the nvlddmkm event. This is because whatever was crashing didn’t have time to write anything to the event log, so I was completely in the dark. The event log would have standard info and sometimes warnings leading up to the crash and then… hiatus… and an error complaining that the computer wasn’t gracefully shut down. Grr.

Guessing that there was video card involvement, I tried a few different drivers, none of which helped until I tried a hot-off-the-presses update. And the symptoms changed! Now instead of crashing to a BIOS restart, my screens went black but things still seemed to be running: no restart, and banging on the keyboard would occasionally result in one of several dings and crashy sounds. Rebooting would bring things back to life, but the first time this happened, my initial thought was that this was even worse. But no: NOW it left some intriguing events in the system log: the dreaded nvlddmkm Event ID 4101! [As an aside, software people, you really really ought to stop with non-descriptive errors. Yes, I understand that you think “that will never happen” but it will, and it will piss off your customers that you couldn’t be bothered to describe the error in any sort of useful way.]
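If you would rather not scroll through Event Viewer to find these entries, here is a rough sketch of pulling the most recent ones with Windows' built-in wevtutil tool, driven from Python. The helper names (build_query, build_command) are mine, and filtering the System log purely by Event ID 4101 is an assumption about where and how your machine records the event; adjust to taste.

```python
import subprocess
import sys


def build_query(event_id=4101):
    """XPath filter matching one event ID (assumption: filtering by ID alone is enough)."""
    return f"*[System[(EventID={event_id})]]"


def build_command(max_events=5):
    """wevtutil command line: newest matching events from the System log, as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query()}",
        "/f:text",           # human-readable output
        f"/c:{max_events}",  # limit the number of events returned
        "/rd:true",          # read newest events first
    ]


if __name__ == "__main__":
    if sys.platform == "win32":
        result = subprocess.run(build_command(), capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        # Not on Windows: just show the command that would be run.
        print(" ".join(build_command()))
```

Run it right after a black-screen episode; if nothing comes back, the crash probably happened too fast to be logged, which was exactly my situation before the driver update.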

It turns out that nvlddmkm Event ID 4101 isn’t actually a real problem. Well, it isn’t supposed to be: it is merely an indication that Windows has decided that your graphics driver has locked up and should be restarted, a mechanism called Timeout Detection and Recovery (TDR). There’s an excellent summary of the issues here. Of course, most of the articles read somewhere between “hey, not our fault!” and “First, try reinstalling your OS. If that doesn’t work, replace your computer.” Far from satisfying. A number of people recommended changing various TDR keys in the registry to various values, some of which actually disable the feature altogether. Of course, most official sources include dire warnings (see DONT DO IT! above). But I decided to throw caution to the wind and sat down to Read The F…abulous Manual, the Microsoft Hardware Dev Center article on TDR Registry Keys. I ended up adding a TdrLimitCount entry, which raises the number of TDRs allowed within 60 seconds before Windows crashes the system from the default of 5 to 30… and the behavior of my system changed again!
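For the curious, the TdrLimitCount change can be written down as a small .reg fragment. The sketch below only generates the text and prints it; it does not touch the registry. The key path is the GraphicsDrivers key documented in the Microsoft TDR article, the value 30 is simply what I picked, and the function name tdr_reg_file is made up for this example. All the earlier warnings still apply.

```python
# Registry key that holds the TDR settings (per Microsoft's TDR Registry Keys article).
GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)


def tdr_reg_file(limit_count=30):
    """Return .reg file text that sets TdrLimitCount as a REG_DWORD.

    limit_count=30 is the value used in this post, not a recommendation.
    """
    if not 0 <= limit_count <= 0xFFFFFFFF:
        raise ValueError("TdrLimitCount must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        # .reg files store DWORDs as 8 hex digits: 30 becomes 0000001e
        f'"TdrLimitCount"=dword:{limit_count:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(tdr_reg_file())
```

Printing rather than writing is deliberate: you get to read exactly what would change before deciding whether to import it with regedit, and a reboot is needed before Windows picks up new TDR values.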

Now instead of triggering the GPU reset, which left my screens blank and my system unusable, I got periodic freezes of my SL window. Other windows were fine, as it turns out, and there wasn’t anything else obviously going wrong with the system (heat fine, neither CPU nor GPU pegged). I then checked out the event log while a set of freezes was going on, and saw VSS errors. VSS is the Volume Shadow Copy Service, the Windows service that lets backup software copy files that are in use. AH HA! My initial thought was that my online backup system was kicking in and trying to do something like back up the SL texture cache directories. In fact, it was, but merely excluding them didn’t do the trick, so I suspect a more subtle interaction. While I was there, I realized that when I upgraded to Windows 10, I had to reinstall Crash Plan, my online backup system (highly recommended, btw, and no I don’t get kickbacks), and neglected to tell the new install that it should use extra memory. Anyway, the combination of the new NVIDIA driver, the TdrLimitCount change, and turning off my online backup system seems to have solved my problem in the short term.

The Future

Of course, it would be nice to not have to disable my backup system when I want to run SL. I’m trying out telling Crash Plan not to run unless there’s basically nothing else running. The simplest approach is a setting that only lets it run if the CPU load is under 20%. Better would be to figure out a) why VSS is conflicting with anything SL-related, especially since the whole point of VSS is to keep backups from conflicting with running applications, and b) what to do about it. Stay tuned for further updates!
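The “only run when the CPU is quiet” idea boils down to a threshold check. This is just a toy illustration of that rule, not how Crash Plan implements it: the function name, the averaging of recent samples, and the choice to stay idle when there is no data are all my own assumptions.

```python
from statistics import mean

CPU_LIMIT_PERCENT = 20.0  # the threshold mentioned above


def backup_may_run(recent_cpu_percent, limit=CPU_LIMIT_PERCENT):
    """Return True only if the average of recent CPU load samples is under the limit.

    recent_cpu_percent: iterable of load readings in percent (0-100).
    With no samples at all we play it safe and keep the backup paused.
    """
    samples = list(recent_cpu_percent)
    if not samples:
        return False
    return mean(samples) < limit


if __name__ == "__main__":
    print(backup_may_run([4.0, 9.5, 12.0]))   # quiet desktop
    print(backup_may_run([35.0, 60.0, 48.0])) # SL running
```

Averaging a few samples rather than reacting to a single reading keeps a brief CPU spike from pausing the backup, and a brief lull mid-session from starting it.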