Kodi Community Forum

Plans for h.264 decode acceleration?
Hi. This isn't a feature request or anything, and I've seen the wiki page outlining some of the options for extending the hardware acceleration of video playback, but I was just wondering if there are any current plans for implementing some sort of acceleration for h.264?

Also there's mention in the changelog of support for using pixel shaders for decode acceleration. I presume that's for regular MPEG somewhat along the lines of xvmc rather than anything to do with h.264/VC1? Does it (pixel shader method) do more than xvmc?

Sorry if this has been covered before, I have searched around for the info but couldn't find it. Just curious really :)

Thanks.
I suppose if the video drivers or ffmpeg supported it, we'd have it. I *think* the devs are working toward feature-complete right now rather than adding things like hardware accel, although I'm quite sure hardware accel would make just about everyone happy. Maybe bitch at the likes of NVIDIA?

P.S. Some of the answers in this thread might help you understand the situation.
A few days ago one of the Google Summer of Code proposals was h264 hardware acceleration, though I don't know which platform.
There's little chance of seeing it in the nVidia drivers any time soon, unfortunately, which is why I'm interested in whether XBMC might implement it independently. AMD are making noises about making the next generation of their PureVideo equivalent more open-source friendly, but again that looks to be a way off yet.

I'm aware that XBMC doesn't support it now, and I've already read the wiki page linked in that thread, thanks for the response though. :)

Interesting news about the GSOC proposal though. I guess that means there aren't any concrete plans yet.

So can anyone tell me what the pixel shader acceleration does? Is it similar to the xvmc extension?
Uhm... it can't be implemented in XBMC until it's implemented in the driver. XBMC can't directly access the hardware. AFAIK no gfx driver has h.264 accel in Linux.
GPU offloading is probably a lost cause for Linux. Not because it's impossible so much as that CPUs keep getting faster and cheaper, and it just won't matter all that much in the very near future.
althekiller: You don't necessarily need direct access to the hardware. Things like shader languages can allow you to run generic code on the GPU.

http://www.gpgpu.org/

Also, I believe the next gen of nVidia and ATI cards is expected to have more support for running arbitrary code; I forget the names for them. GPUs are still better *suited* to performing certain kinds of operations, of course.

I think GPU offloading will continue to be relevant. There are higher resolutions than 1080p on their way, and they will need exponentially more cycles to decode.
There is a specific accelerator on the GPU for h.264 decoding. It is a separate block in the silicon. It has nothing to do with gpgpu.

On a side note, NVIDIA supports GPGPU on all GeForce 8 series and later cards through its CUDA library.
slight Wrote:There are higher resolutions than 1080p on their way, and they will need exponentially more cycles to decode.
I wouldn't expect video to be widely available in resolutions higher than 1080p in the next 5 years at least. Probably longer. By that time entire computers on a chip will be available with sufficient horsepower to decode 1080p; small and cheap enough to fit inside cellphones.
You could still use CUDA (thanks for reminding me of the acronym) or possibly HLSL/GLSL for decoding h.264.

According to the XBMC wiki, the CABAC part of the h.264 decoding process probably can't be implemented in pixel shaders, but that still potentially leaves CUDA, and the other parts of the decoding process may still be candidates for running in HLSL/GLSL.

Fingers crossed this gets accepted:

http://wiki.xbmc.org/?title=GSoC_-_Hardw...oded_video

Anyway I'd be interested to hear from any of the devs on this.
My guess is that bitstream processing (CAVLC and CABAC entropy decoding) could probably not be done efficiently using CUDA either, same as for pixel shaders.
http://wiki.xbmc.org/?title=Hardware_Acc...o_Decoding
Quote:* CABAC entropy decoding is probably not possible to offload on GPU via pixel shader.
* NVIDIA and ATI/AMD GPUs use dedicated hardware blocks for entropy decoding.
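To make that concrete, here's a toy range decoder in plain C (nothing to do with the real CABAC context models and tables in the spec, just an illustration of the dependency chain). Every decoded bin updates the range/offset state that the very next bin needs, and renormalisation pulls bits off the stream one at a time, so there are no independent work items to hand out to hundreds of shader units:
Code:
/* toy_arith.c -- toy binary arithmetic decoder (NOT the real CABAC).
 * Illustrates why entropy decoding is hard to parallelise: each
 * decoded bin depends on the range/offset state left by the previous one. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buf;    /* compressed bitstream */
    size_t         pos;    /* next bit to read */
    uint32_t       range;  /* current interval width */
    uint32_t       offset; /* value read so far within the interval */
} ArithDec;

static int read_bit(ArithDec *d)
{
    int bit = (d->buf[d->pos >> 3] >> (7 - (d->pos & 7))) & 1;
    d->pos++;
    return bit;
}

static void init(ArithDec *d, const uint8_t *buf)
{
    d->buf = buf;
    d->pos = 0;
    d->range = 0xFF;
    d->offset = 0;
    for (int i = 0; i < 8; i++)
        d->offset = (d->offset << 1) | read_bit(d);
}

/* decode one binary symbol with probability p_zero/256 of being 0 */
static int decode_bin(ArithDec *d, int p_zero)
{
    uint32_t split = (d->range * (uint32_t)p_zero) >> 8;
    int bin;
    if (d->offset < split) {            /* symbol 0 */
        d->range = split;
        bin = 0;
    } else {                            /* symbol 1 */
        d->offset -= split;
        d->range  -= split;
        bin = 1;
    }
    while (d->range < 0x80) {           /* renormalise: serial again */
        d->range  <<= 1;
        d->offset   = (d->offset << 1) | read_bit(d);
    }
    return bin;
}

int main(void)
{
    const uint8_t stream[] = { 0x5A, 0xC3, 0x9F, 0x21, 0x77, 0x0E };
    ArithDec d;
    init(&d, stream);
    for (int i = 0; i < 16; i++)        /* bin i cannot start before bin i-1 ends */
        printf("%d", decode_bin(&d, 160));
    printf("\n");
    return 0;
}
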
I hope to be proven wrong, but I think the things that could be looked at first are (a tiny example of one of them follows the list):
Quote:* Motion compensation (mo comp)
* Inverse Discrete Cosine Transform (iDCT)
** Inverse Telecine 3:2 and 2:2 pull-down correction
* Inverse modified discrete cosine transform (iMDCT)
* In-loop deblocking filter
* Intra-frame prediction
* Inverse quantization (IQ)
* Variable-Length Decoding (VLD), more commonly known as slice level acceleration
* Spatial-Temporal De-Interlacing, (plus automatic interlace/progressive source detection)
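By contrast with the entropy decoding above, something like inverse quantization is a pure per-coefficient multiply with no dependency between coefficients, which is exactly the data-parallel shape a shader or CUDA kernel likes. A toy sketch in plain C just to show the loop structure (real codecs use per-position scaling matrices and rounding rules):
Code:
/* toy inverse quantization: every coefficient is independent, so each one
 * could map to its own shader/CUDA thread. (Illustrative only -- real codecs
 * use per-position scaling matrices and specific rounding rules.) */
#include <stdio.h>
#include <stdint.h>

static void inverse_quant_block(const int16_t *level, int16_t *coeff,
                                int qscale, int n)
{
    for (int i = 0; i < n; i++)          /* no loop-carried dependency */
        coeff[i] = (int16_t)(level[i] * qscale);
}

int main(void)
{
    int16_t level[64] = { 12, -3, 0, 5, 1, 0, 0, -1 }; /* rest zero */
    int16_t coeff[64];
    inverse_quant_block(level, coeff, 26, 64);
    for (int i = 0; i < 8; i++)
        printf("%d ", coeff[i]);
    printf("\n");
    return 0;
}
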
I spent some time a year ago thinking about how to implement some GLSL acceleration using ffmpeg's DSP API. The DSP API is there to provide SSE2/MMX implementations for e.g. IDCT etc.

I came to the conclusion that it's not useful to use it that way, as the CPU cycles needed to prepare and switch the required OpenGL context far outweigh the gain. Additionally, the CPU is blocked while waiting for the GPU to finish.

From my understanding, first of all one needs a completely restructured, maybe multithreaded, h.264 decoder that is able to offload huge tasks to the GPU, e.g. execution of the deblocking filter for a whole frame. Unfortunately my understanding of h.264 internals is limited, and my spare time didn't allow me to dig into it. As an exercise I've implemented a GLSL-based YUV->RGB converter for ffmpeg, and got a video scaler nearly for free as well :-).
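For reference, the per-pixel math such a YUV->RGB shader evaluates is just three dot products plus a clamp. Here it is in plain C, assuming BT.601 coefficients and full-range 8-bit values (a GLSL fragment shader does the same arithmetic once per output pixel, which is why colour conversion and scaling are essentially free on the GPU):
Code:
/* per-pixel YUV -> RGB conversion (BT.601, full-range 8-bit assumed). */
#include <stdio.h>
#include <stdint.h>

static uint8_t clamp8(double v)
{
    if (v < 0.0)   return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    double fy = y, fu = u - 128.0, fv = v - 128.0;
    *r = clamp8(fy + 1.402    * fv);
    *g = clamp8(fy - 0.344136 * fu - 0.714136 * fv);
    *b = clamp8(fy + 1.772    * fu);
}

int main(void)
{
    uint8_t r, g, b;
    yuv_to_rgb(76, 85, 255, &r, &g, &b);   /* full-range encoding of pure red */
    printf("R=%u G=%u B=%u\n", r, g, b);   /* roughly (254, 0, 0) */
    return 0;
}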

I'd like to open a technical discussion on how "h.264 on GPU" can be realized. Even if GSoC didn't accept the project, I think it's worth starting :-)

BTW: some of the steps listed above are nearly "for free" when using a GPU, e.g. "Motion compensation" is basically "texture mapping" (maybe with a post-processing shader).
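To illustrate: whole-pel motion compensation is essentially "fetch a block from the reference frame at (x+mvx, y+mvy)", i.e. a clamped texture lookup per pixel. A toy scalar version in plain C, just to show the shape of it (no sub-pel interpolation, no real edge handling beyond clamping):
Code:
/* toy whole-pel motion compensation: predict a 16x16 block by copying from
 * the reference frame at a motion-vector offset. On a GPU this is just a
 * clamped texture fetch per pixel; sub-pel MC adds bilinear filtering,
 * which texture units also do in hardware. */
#include <stdio.h>
#include <stdint.h>

#define W 64
#define H 64

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static void mc_block_16x16(uint8_t ref[H][W], uint8_t pred[16][16],
                           int bx, int by, int mvx, int mvy)
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int sx = clampi(bx + x + mvx, 0, W - 1);
            int sy = clampi(by + y + mvy, 0, H - 1);
            pred[y][x] = ref[sy][sx];    /* the "texture lookup" */
        }
}

int main(void)
{
    static uint8_t ref[H][W];
    uint8_t pred[16][16];
    for (int y = 0; y < H; y++)          /* fill reference with a gradient */
        for (int x = 0; x < W; x++)
            ref[y][x] = (uint8_t)(x + y);
    mc_block_16x16(ref, pred, 16, 16, 3, -2);
    printf("pred[0][0] = %u (should equal ref[14][19] = %u)\n",
           pred[0][0], ref[14][19]);
    return 0;
}
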
we already do csc (colour space conversion) and scaling in hw, even on xbox.

problem with most of these is that they are extremely branchy, which makes them very ill suited for vectorization, which is where the punch of a gpu really is. i've seen some tests claiming a 10x increase on mocomp (mpeg2), which consists of roughly 30% of the decoding processing needed. that still yields rather mediocre overall speed improvements (by Amdahl's law, 1 / (0.7 + 0.3/10) ≈ 1.37x), certainly nothing that approaches running hi-res vids on significantly weaker hw. h.264 is even worse, in particular due to stuff like cabac, which takes extreme amounts of processing and, as it's a bit-based compression, is extremely branchy (if my understanding is correct; take that last part with a grain of salt). this is why vendors include specific hw to accelerate these things. with hw vendors being the paranoid d*icks they are, they won't let the FOSS community tap into those resources :/

also integrating things with ffmpeg is rather hard, as you point out, since the hooks are too low level, i.e. very small operations, which means the context-switch overhead kills any gains
I wonder if better usage of SSE3 and SSE4.1 might help quite a bit and be more portable?

-elan