Win Crash on resume from hibernate
#1
I have my Win10 Intel NUC box set to hibernate.

I have not caught it in a log yet, but I am seeing an exception popup for the main loop (twice so far). When I close the Windows modal dialog, Kodi basically hangs and I have to kill it from Task Manager. I also saw one error (once so far) where on resume it said the box was out of memory, which is not likely with 16 GB and Kodi being the only app running ;)

Anyone else seeing a crash like that?

In the only log I have so far, I am seeing this:

02:32:19 T:2464 ERROR: Unable to activate the previous window
02:35:57 T:1656 ERROR: CImageLoader::DoWork - Direct texture file loading failed for resource://resource.images.studios.white/
02:56:06 T:2464 ERROR: Previous line repeats 4 times.
02:56:06 T:2464 ERROR: DXGI_ERROR_INVALID_CALL
02:56:14 T:2436 ERROR: Previous line repeats 26 times.
02:56:14 T:2436 ERROR: CAESinkDirectSound::GetDefaultDevice: Retrieval of audio endpoint enumeration failed.
02:56:14 T:2436 NOTICE: No Devices found - retry: 4
02:56:14 T:2464 ERROR: DXGI_ERROR_INVALID_CALL
02:56:15 T:2436 ERROR: Previous line repeats 4 times.
..... snip
02:56:20 T:2436 NOTICE: Found 0 Lists of Devices
02:56:20 T:5776 ERROR: CAESinkDirectSound::GetDefaultDevice: Retrieval of audio endpoint enumeration failed.
02:56:20 T:5776 ERROR: CAESinkDirectSound::Initialize: Failed to create the DirectSound device with error DSERR_NODRIVER, trying the default device.
02:56:20 T:5776 ERROR: CAESinkDirectSound::Initialize: Failed to create the default DirectSound device with error DSERR_NODRIVER.
02:56:20 T:2464 ERROR: DXGI_ERROR_INVALID_CALL
.... snip getdefaultdevice calls again
02:56:26 T:2436 NOTICE: Found 0 Lists of Devices
02:56:26 T:5776 ERROR: CAESinkDirectSound::GetDefaultDevice: Retrieval of audio endpoint enumeration failed.
02:56:26 T:5776 ERROR: CAESinkDirectSound::Initialize: Failed to create the DirectSound device with error DSERR_NODRIVER, trying the default device.
02:56:26 T:5776 ERROR: CAESinkDirectSound::Initialize: Failed to create the default DirectSound device with error DSERR_NODRIVER.
02:56:26 T:2464 ERROR: DXGI_ERROR_INVALID_CALL
02:56:39 T:2464 ERROR: Previous line repeats 46 times.
02:56:39 T:2464 ERROR: DXGI_ERROR_DEVICE_REMOVED
02:56:40 T:2464 ERROR: DXGI_ERROR_INVALID_CALL
02:56:40 T:2464 ERROR: CRenderSystemDX::SetFullScreenInternal - Failed switch full screen state: 887A0005 - DXGI_ERROR_DEVICE_REMOVED (The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.

It looks like it is freaking out on suspend and the DirectX device has disappeared out from under it; 3 AM is about when the box would have hibernated itself. I wonder if DirectX surfaces disappear on suspend (I have to dig into the docs and see).
#2
Driver issue. Stop Kodi before suspend and start it again after resume.
First decide what functions/features you expect from a system, then choose the hardware. Don't waste your money on crap.
#3
Do you recommend a particular Intel driver to work around the issue? Also, did something else change? I ask because this started happening when I upgraded from 16.0 to 16.1. Sorry, I should have added that info to the description. I have been working around the startup crash in the manner you prescribe, so I can continue with that (if I remember). However, the idea is not to play with Windows if I don't have to. :)
#4
So I figured I would see if I could catch it in a debugger. I grabbed the latest from Git and compiled it on another box with VS2015. Ran straight into another bug :(

Exception thrown at 0x675BE681 (d3d11.dll) in Kodi.exe: 0xC0000005: Access violation reading location 0x00000038.

m_pImdContext->Flush(); in CRenderSystemDX::PresentRenderImpl. I commented out the line just to get past the issue (obviously not the right fix). It then did the same thing on another flush on exit.

I will probably back it down to the 16.1 release instead of mainline and see if I can get the bug I am looking for to happen again.

I did get it to do the same thing a different way, except this time when the screen went to sleep instead of hibernating. Unfortunately it did not repeat the next 3 times I tried to trigger it that way. Grrr. I also upgraded to the latest Intel drivers on the box where it is happening. That did not help at all.
#5
So I figured out that something is wrong when using the NVIDIA chip in my laptop. Not sure what yet. The laptop has both an Intel and an NVIDIA chipset, and that is causing the flush command to fail for some reason. It also does it on 16.0 and 16.1, but since it is at exit, the error is probably just being caught, or some race condition is 'ok' when I am not debugging. I can at least debug using the Intel chipset instead of the NVIDIA one for now, which is probably what I want anyway since the NUC is Intel. I am seeing the flush issue in 16 all the way through to the current master (17).

That at least lets me try to debug why it is crashing otherwise. I found a script on the net that lets me suspend the screen, which I suspect triggers this issue (the 'main loop' one). This is realllllly acting like a race condition, as I have tried about 5 times to reproduce it. I did get a *very* odd memory deallocation error on startup in 16.1 but have not reproduced it yet. It looked like a memory overrun, since it was the memory checks that run on stack returns that fired. I will post more if I can cause it again.

Hopefully I can reproduce it on my laptop. I *really* do not want to install a Visual Studio setup on my NUC. I want to keep that thing fairly minimal.

I did a diff between the 16.0 and 16.1 release code and am not really seeing anything that could cause this. Though if it is a race condition, the new code may have changed the timing 'just enough'.

What was 1 issue is probably 3. *sigh*
#6
Grabbed a full memory dump from a box that was doing this. Not much info yet.

Did look inside the exception that is going off at this point in the code.

mWhere is 0x722781b6
According to the debugger that corresponds to d3d11.dll!SCmdSetBlendState:SCmdSetBlendState

It looks like it is trying to read address 0xfffffff8, which is probably invalid since it is at the very end of the address space :)
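Both fault addresses that show up in this thread sit within a few bytes of zero once you read them as signed 32-bit values: 0x00000038 from the flush crash is 0 + 0x38, and 0xfffffff8 is -8. That pattern usually means a field was read through a null pointer (at a positive or negative offset) rather than through a random wild address. A minimal sketch of that arithmetic, assuming a 32-bit process; the helper name and the 4 KiB "small offset" window are illustrative assumptions, not anything from Kodi or Windows:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interprets a 32-bit access-violation address.
struct FaultGuess {
    bool likelyNullBase;   // true if the address looks like "null plus a small offset"
    std::int64_t offset;   // that offset (negative for addresses like 0xfffffff8)
};

FaultGuess GuessFromFaultAddress(std::uint32_t addr) {
    // Assumed window: treat anything within 4 KiB of zero as a small offset.
    constexpr std::int64_t kWindow = 0x1000;
    // Reinterpret the address as a signed 32-bit value (0xfffffff8 -> -8).
    // This relies on two's-complement conversion, which all mainstream
    // compilers use and C++20 guarantees.
    const std::int64_t asSigned = static_cast<std::int32_t>(addr);
    if (asSigned > -kWindow && asSigned < kWindow) {
        return {true, asSigned};
    }
    return {false, 0};
}
```

For example, GuessFromFaultAddress(0xfffffff8) reports a likely null base with offset -8, while the code address 0x722781b6 is not flagged. It is only a heuristic: a dangling pointer to freed memory can produce any address.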

I used this article to help debug it. The WinDbg commands work inside the Immediate window in Visual Studio. https://blogs.msdn.microsoft.com/jmstall...came-from/

The context record was at 0x0018ed44, so I ran the command .cxr 0x0018ed44 to switch to that context.

After a bit of thread-context switching in the Visual Studio Immediate window, it ended up dropping me into CGUIControlGroup::Render(), line 124. Not sure this is right, as the other information indicates it is in d3d11.
#7
Well, I fixed one of my issues. I reinstalled the NVIDIA drivers on my laptop using the 'clean install' option in the installation dialog, and it is no longer crashing on the flush command. Now I have to see if I can force a reinstall of the Intel drivers and whether that clears up my NUC 'main loop' crash. If not, I will see if I can set up an exception catcher and get a better memory dump. You can use the built-in Windows Performance Monitor to catch the error sooner; I may have to go that way if the reinstall does not work. If that doesn't work either, I will see if I can get remote debugging working. I have not figured out how to get the out of memory error to happen consistently.

Possibility 1: the Win7/8.1 -> Win10 upgrade messed something up.
Possibility 2: reinstalling newer drivers over older drivers messed something up.
Possibility 3: something else.

I did manage to get the whole app to freak out on something else while trying to find a fix. If you limit the CPUs to 4 and switch from windowed to fullscreen a couple of times while it is starting up, the whole Python stack seems to start misbehaving. If you let it sit, it is fine. The Python code seems to keep working (kind of surprising, considering the exceptions I was getting), but it gets hung up in the SWIG bindings somehow and then seems to fix itself.
#8
Well, I got it to do something slightly different: I got the low memory error. Yesterday I uninstalled the existing Intel driver from my NUC via the Add/Remove Programs dialog (not sure if that 'really' removed it), then installed the latest again.

Several hundred of these showed in the log this time.

05:32:18 T:1984 ERROR: DXGI_ERROR_INVALID_CALL
05:32:19 T:1984 ERROR: CRenderSystemDX::SetFullScreenInternal - Failed switch full screen state: 887A0005 - DXGI_ERROR_DEVICE_REMOVED (The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.

No dump file was created, as it looks like Windows told the app to close and it did so. I saved off the log if anyone is interested.

Luckily, the only spot in the code that could have caused this error is the line hr = m_pSwapChain->SetFullscreenState(true, m_pOutput); I also have true fullscreen set, so I know it is passing in true.

I may need to recompile it myself and add a call to GetDeviceRemovedReason to the error log; the MSDN docs say to call it to get a more specific reason. With my luck it will be DXGI_ERROR_DEVICE_REMOVED, which does not tell me much more than what I already know.
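A minimal sketch of adding the removal reason to that log line. To stay portable it uses a plain 32-bit alias instead of the Windows HRESULT type, and a callback in place of the real ID3D11Device::GetDeviceRemovedReason(); the function names and the exact log format are hypothetical, not Kodi's code. The DXGI error values are the ones defined in the Windows SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>

using HResult = std::uint32_t;  // portable stand-in for the Windows HRESULT

// DXGI error codes as defined in the Windows SDK.
constexpr HResult kDxgiInvalidCall         = 0x887A0001;
constexpr HResult kDxgiDeviceRemoved       = 0x887A0005;
constexpr HResult kDxgiDeviceHung          = 0x887A0006;
constexpr HResult kDxgiDeviceReset         = 0x887A0007;
constexpr HResult kDxgiDriverInternalError = 0x887A0020;

std::string DxgiErrorName(HResult hr) {
    switch (hr) {
        case 0x00000000:               return "S_OK";
        case kDxgiInvalidCall:         return "DXGI_ERROR_INVALID_CALL";
        case kDxgiDeviceRemoved:       return "DXGI_ERROR_DEVICE_REMOVED";
        case kDxgiDeviceHung:          return "DXGI_ERROR_DEVICE_HUNG";
        case kDxgiDeviceReset:         return "DXGI_ERROR_DEVICE_RESET";
        case kDxgiDriverInternalError: return "DXGI_ERROR_DRIVER_INTERNAL_ERROR";
        default:                       return "unknown";
    }
}

// Builds the error line. `queryRemovedReason` stands in for calling
// GetDeviceRemovedReason() on the device; it is only invoked when the failed
// call reported a removed or reset device, since that is when it is meaningful.
std::string FormatFullscreenFailure(HResult hr,
                                    const std::function<HResult()>& queryRemovedReason) {
    char hex[9];  // 8 hex digits + terminator
    std::snprintf(hex, sizeof hex, "%08X", static_cast<unsigned>(hr));
    std::string line = "Failed switch full screen state: " + std::string(hex) +
                       " - " + DxgiErrorName(hr);
    if (hr == kDxgiDeviceRemoved || hr == kDxgiDeviceReset) {
        line += " (removed reason: " + DxgiErrorName(queryRemovedReason()) + ")";
    }
    return line;
}
```

Only querying the reason after DXGI_ERROR_DEVICE_REMOVED or DXGI_ERROR_DEVICE_RESET keeps a plain DXGI_ERROR_INVALID_CALL from being misreported as a lost device.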

I turned on debug logging, but it seems to take a couple of hours for this to show up, if it does at all. I have yet to get this to happen consistently.
#9
Same problem with my NUC (DXGI_ERROR_INVALID_CALL).
Intel graphics driver problem?
Intel NUC 5YPPH - 8 GB - 120 GB SSD - 1 TB external drive
Windows 10 Pro - Kodi Krypton 17 - IconMix
#10
I don't rule out a driver issue; it is possible. However, I think there may be a bug here, as the NVIDIA flavor is doing something sort of similar (but with different errors) in this thread: http://forum.kodi.tv/showthread.php?tid=272906 I have been meaning to try the fix they came up with there (not using true fullscreen) and see if that helps.

I suspect DX9 and DX11 handle context switches differently in some subtle way. I have not really figured out the docs yet, but it seems that when this happens you are supposed to discard the DirectX device and its context and just recreate them. That would cause a flicker, but at these two points (screensaver resume and hibernate resume) it would not matter much.
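The discard-and-recreate idea can be sketched as follows. RenderDevice, FakeDevice and Renderer are hypothetical stand-ins: in a real DX11 renderer the factory would create a new device, immediate context and swap chain and reload every texture and shader, since resources belonging to a removed device cannot be reused. The error values are the DXGI ones from the Windows SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

using HResult = std::uint32_t;               // portable stand-in for HRESULT
constexpr HResult kOk                = 0x00000000;
constexpr HResult kDxgiDeviceRemoved = 0x887A0005;
constexpr HResult kDxgiDeviceReset   = 0x887A0007;

// Hypothetical interface standing in for the device, context and swap chain.
struct RenderDevice {
    virtual ~RenderDevice() = default;
    virtual HResult Present() = 0;
};

// Test double: reports removal on its Nth Present(), as a driver might after
// resume. failOnCall == 0 means it never fails.
class FakeDevice : public RenderDevice {
public:
    explicit FakeDevice(int failOnCall) : failOnCall_(failOnCall) {}
    HResult Present() override {
        return ++calls_ == failOnCall_ ? kDxgiDeviceRemoved : kOk;
    }
private:
    int failOnCall_;
    int calls_ = 0;
};

class Renderer {
public:
    using Factory = std::function<std::unique_ptr<RenderDevice>()>;

    explicit Renderer(Factory factory)
        : factory_(std::move(factory)), device_(factory_()) {}

    // Presents one frame; on device loss, drops the old device entirely and
    // builds a fresh one instead of reusing any of its objects.
    HResult PresentFrame() {
        const HResult hr = device_->Present();
        if (hr == kDxgiDeviceRemoved || hr == kDxgiDeviceReset) {
            device_.reset();        // release the dead device first
            device_ = factory_();   // recreate from scratch (may flicker)
            ++recreations_;
        }
        return hr;
    }

    int recreations() const { return recreations_; }

private:
    Factory factory_;
    std::unique_ptr<RenderDevice> device_;
    int recreations_ = 0;
};
```

The design choice is that nothing from the old device survives the failure; the next frame runs entirely on the freshly created device.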

I can see in my logs that just before it hibernates, the context is switching away from Kodi to Windows Explorer. Then it is as if the driver disposes of the DirectX context, so it is no longer any good. In regular use, just switching between applications does not seem to cause this bug (at least I have not got it to happen). However, I am seeing some people complaining about dual-screen issues, which could be similar.

Also, DXGI_ERROR_INVALID_CALL may just mean that whatever it is calling is not supported. But you seem to have a fairly new NUC, so that should not be the case. I have the Intel NUC D54250WYK1, which is specced to have the 5000 series GPU.
#11
I switched the fullscreen behavior, and it seems to have helped a bit. Now my crash is on startup (sigh), but nothing consistent. Of course, now that I have said something it will probably start crashing again :(
#12
I wouldn't completely rule out Jarvis as the culprit (or at least a contributor). There are pages and pages of people having problems with crashes, black screens, and freezes with Jarvis and Windows (7, 8, and 10).

I have had a similar issue since upgrading to 16.0, and it continues in 16.1. When resuming from sleep I would get a solid black screen and no response to remote/mouse/keyboard. I would have to either Alt+Tab to Windows or Ctrl+Alt+Del. Sometimes Kodi would be completely unresponsive; other times I could Alt+Tab back to Kodi and it worked normally. This happened after every resume from sleep. I upgraded from Win 7 Pro to Win 10 hoping to resolve it. It didn't. Now I get the same problem just by turning my TV on/off without the PC going to sleep. I resolved it by switching from fullscreen to windowed mode in the Kodi settings.

edit: forgot to add, I never had a problem with my system and Kodi in 3 years before the Jarvis upgrade.

i7 920
12gb ram
660Ti latest driver
Win 10 (same problem on 7)
#13
There was a major switch from DX9 to DX11 in Jarvis. In my opinion that is exposing two things. First, not all DX11 drivers are ready for prime time: on one of my computers (the NVIDIA laptop) I had to reinstall the drivers to stop it crashing pretty much all the time. Second, the basically 'new' drivers are exposing some interesting issues in the code: edge cases in driver behavior that the code does not handle yet. It is also terribly inconsistent. In my case (the Intel NUC) it acts like the whole display surface is 'gone' and then the code tries to use it, and the DX11 driver code goes on to use the invalid handle, causing a segfault.

Thank you for doing the upgrade! I was going to try downgrading from 10. That info does help and saves me a bunch of time: it means the OS version does not matter, which is good news for finding the issue.

The fullscreen change helps on mine too. I have not dug into the code yet to see what the difference is. I tried power-cycling my TV and NUC, but that did not seem to trigger the issue. If I could trigger it reliably, I could catch it in a remote debug session, but before going through the trouble of setting that up I want a more reliable way to make it happen. I just wish I could get it to happen on my other computer, which already has Visual Studio set up.

The exception I was tracing with the .dmp file earlier puts me squarely in the middle of rendering the controls, about 20 calls away from where the exception is thrown. That does not make sense unless the control list is garbage. There is also deferred rendering in DX11, so that code could just be the victim.
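A toy model of the "victim" theory. Direct3D 11 buffers commands and submits them to the GPU later (for example on Flush() or Present()), so a bad pointer recorded while rendering the controls may only blow up afterwards, inside the driver. The SCmdSetBlendState frame in the dump looks consistent with a queued blend-state command being replayed, though that is a guess. DeferredContext below is hypothetical and is not how d3d11.dll works internally; it only shows how the faulting frame and the code that recorded the bad state can end up far apart:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

struct BlendState { int mode; };

// Hypothetical deferred context: calls are recorded now and executed later.
class DeferredContext {
public:
    // Recording just captures the pointer; nothing is dereferenced yet,
    // so a null or stale pointer goes unnoticed at this point.
    void SetBlendState(const BlendState* state, const std::string& recordedBy) {
        commands_.push_back([this, state, recordedBy] {
            if (state == nullptr) {
                // Stands in for the access violation a real driver would hit.
                throw std::runtime_error("fault in SetBlendState replay, recorded by " +
                                         recordedBy);
            }
            appliedMode_ = state->mode;
        });
    }

    // Replays everything queued so far; this is where the fault surfaces.
    void Flush() {
        std::vector<std::function<void()>> pending;
        pending.swap(commands_);
        for (auto& command : pending) command();
    }

    int appliedMode() const { return appliedMode_; }

private:
    std::vector<std::function<void()>> commands_;
    int appliedMode_ = 0;
};
```

Recording the null state succeeds silently; the exception only appears in Flush(), and its message has to carry the recording site explicitly because the call stack at that point no longer does.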
#14
I get round this by using a script that lets my remote close and open Kodi when I'm finished with it, which allows sleep or hibernate modes without incident. Msg me and I'll explain in more detail if you find it useful; otherwise ignore :) Ta, Dek
#15
(2016-05-27, 12:57) Derek Wrote: I get round this by using a script that enables my remote to close and open kodi when I'm finished with it hence allowing sleep or hibernate modes without incident, msg me and I'll explain in more detail if you find it useful otherwise ignore :) ta Dek

Would you mind sharing this script? I'm having the same problem with 16.1 running on Radeon. My HTPC is set to never go to sleep, but it happens when I turn off my TV and leave Kodi running for some time. When I turn my AV receiver and TV back on, Kodi is unresponsive and needs to be killed, and the log shows "ERROR: DXGI_ERROR_INVALID_CALL". I'm using a Harmony One with an RC6 receiver.