OK, I found a bit of time today and... I have solved the bug in the MPEG DXVA code!
Exactly as I suggested, the hardware decoder is decoding fewer slices than the software decoder. The difference comes from the hardware decoder path capping the maximum number of slices. The cap is defined as follows:
#define MAX_SLICES (SLICE_MAX_START_CODE - SLICE_MIN_START_CODE + 1)
#define SLICE_MIN_START_CODE 0x00000101
#define SLICE_MAX_START_CODE 0x000001af
0x1af - 0x101 + 1 = 175 slices maximum.
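To double-check the arithmetic, here is a minimal sketch that reuses the three defines quoted above and lets the compiler confirm the result. The helper name max_slices() is made up for this illustration and is not part of the FFmpeg source.

```c
#include <assert.h>

#define SLICE_MIN_START_CODE 0x00000101
#define SLICE_MAX_START_CODE 0x000001af
#define MAX_SLICES (SLICE_MAX_START_CODE - SLICE_MIN_START_CODE + 1)

/* 0x1af = 431 and 0x101 = 257, so the range holds 431 - 257 + 1 = 175 codes.
 * The build fails if that arithmetic is wrong (needs a C11 compiler). */
static_assert(MAX_SLICES == 175, "MAX_SLICES should be 175");

/* Hypothetical helper, only here so the value can be read at run time. */
static int max_slices(void)
{
    return MAX_SLICES;
}
```

Printing max_slices() from a small main() should give the same 175 as the hand calculation.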
The hardware decoder just stops there. The software decoder goes beyond that, because there ARE more than 175 slices.
You can easily verify this by having both decoders print the number of slices they decoded: the hardware decoder will always report 175, the software decoder will report more than 175.
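To make the effect concrete, here is a rough sketch of how a fixed-size slice table behaves once it is full. This is NOT the actual code from dxva2_mpeg2.c: struct slice_table, slice_table_add() and feed_slices() are made-up names, and the real decoder may handle the overflow differently. It only shows why a count fed through such a cap can never exceed 175.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLICES 175 /* cap derived from the start-code range */

/* Hypothetical stand-in for a per-picture slice table. */
struct slice_table {
    unsigned count;
    unsigned offsets[MAX_SLICES];
};

/* Append one slice. Returns 0 on success, -1 once the table is full. */
static int slice_table_add(struct slice_table *t, unsigned offset)
{
    assert(t != NULL);
    if (t->count >= MAX_SLICES)
        return -1;
    t->offsets[t->count++] = offset;
    return 0;
}

/* Feed n slices into an emptied table and return how many were kept.
 * The slice index stands in for a real bitstream offset. */
static unsigned feed_slices(struct slice_table *t, unsigned n)
{
    unsigned i;

    t->count = 0;
    for (i = 0; i < n; i++) {
        if (slice_table_add(t, i) < 0)
            break; /* everything after the cap is dropped */
    }
    return t->count;
}
```

With this sketch, a picture with 200 slices ends up with 175 entries, while a picture with 100 slices keeps all 100, which matches the counts described above.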
If you simply set the MAX_SLICES constant in dxva2_mpeg2.c to, for example, 220, the test videos posted below decode correctly.
I haven't investigated whether simply increasing this value might have negative consequences (e.g. cause problems) in other situations. I guess that should be looked into before submitting the patch. But I'm calling it a day for now.