[imx6] Deinterlacing feature
#16
https://server.vijge.net/static/cubox/ir...31211.html <- Here is what jnettlet says about the performance. But as always, this does not mean that if the GC2000 can do 50 fps, the GC880 will only do 12.5.
First decide what functions / features you expect from a system. Then decide for the hardware. Don't waste your money on crap.
#17
I tested my Cubox-i2 (GC880 GPU) with the video clip 1080i59.94_mpeg2.ts. Without deinterlacing turned on I get 29.97 frames per second. With deinterlacing turned on I get 18 to 19 frames per second and a lot of skipped frames. If that clip works OK on an i4/GC2000 then I'd say performance is about half.
#18
Any update on deinterlacing work or is performance where it needs to be now for 1080i60 (at least on the cubox4/TV)?
#19
Read the code!
#20
Hi all

Only a very quick update following Zaphog24's message.

We are preparing a detailed status report about what can be achieved.
Jan has already implemented a first Kodi integration (but it is not ready for anyone except developers).

To keep it short until we share figures and explanations: yes, the i.MX6Q is able to deinterlace 1080i60.
And maybe the Solo would achieve it too!
But please be patient unless you are ready to dig into the code, as this is a dev-only thread...

Regards
#21
Wolfgar is right and results are looking quite promising so far. But all of that needs more testing and clean-ups. As I now have a HummingBoard-i1, I can test with the lower end of the iMX6 specification (GC880), although the GPU should not be the important factor anymore. So stay tuned and be sure that we are eager to get that integrated into Kodi. I spent (too) much time during the last weeks on testing and hacking to just stop it now ...
#22
Thanks for the update! I was following the 5805 thread on github but there hasn't been anything new there in the last 3 weeks. I thought this forum was where the discussion moved. Is there another place to check the day-to-day progress/discussion?
#23
No, there is no other place. You have to wait until we present our solution to the issue. For the time being I would like to focus on development ... feel free to follow our Github accounts and read code.
#24
* On behalf of wolfgar and smallint *

Abstract

The new proposed iMX implementation improves rendering in terms of used memory bandwidth and hardware utilization. User reports and our tests over the last months showed that we are currently hitting the hard limit with de-interlacing and HD video streams. We identified buffer copies and GPU utilization as the main bottlenecks. The new implementation addresses these limitations.

Description

The old implementation uses the following path to render a de-interlaced picture:

1. Decode data to buffer1
2. De-interlace buffer1 into buffer2
3. Render buffer2 with GPU to back buffer
4. Copy back buffer to fb0
5. Send fb0 buffer to IPU for display

This involves color conversions and buffer copies at any stage. Since CSC is very fast, this is most likely not the primary performance penalty. According to our tests, the used memory bandwidth and the high utilization of the GPU are the main drawbacks of this method:

Note that "memory bandwidth" may be an improper name and that "bus load" may be more appropriate. Indeed, one could measure considerably greater memory bandwidth when using only the GPU, for instance. But that does not mean that we have big margins on the memory bus. The GPU is able to perform long read/write bursts (64 bytes) from/to memory, whereas some other components (in particular the VDIC) are structurally limited to smaller burst sizes and intrinsically use the memory bus in a non-optimal way, triggering contention while the theoretical maximum bandwidth is far from being reached.

We are unable to deinterlace using the VDIC and render with the GPU in time for demanding video streams.

The Freescale profiling tool mmdc (named after the multi-mode DDR controller that provides the debug functions) shows that these cases exhibit a bus load (ratio between busy cycles and total cycles) greater than 97%, which means that almost all cycles are busy. Unfortunately, not all of these busy cycles are used efficiently to transfer data, hence the bandwidth is not so high while the bus is definitely unable to deliver more...
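
To make the distinction between bus load and bandwidth concrete, here is a minimal sketch. The function names and all counter values are made up for illustration and are not mmdc output; it only shows how a bus can be almost fully busy while moving little data per busy cycle.

```python
def bus_load(busy_cycles: int, total_cycles: int) -> float:
    """Ratio between busy cycles and total cycles, as a fraction in [0, 1]."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return busy_cycles / total_cycles


def bytes_per_busy_cycle(bytes_transferred: int, busy_cycles: int) -> float:
    """Crude efficiency measure: how much data each busy cycle actually moved."""
    if busy_cycles <= 0:
        raise ValueError("busy_cycles must be positive")
    return bytes_transferred / busy_cycles


# Hypothetical counters, chosen only to illustrate the effect.
long_bursts = {"busy": 600_000, "total": 1_000_000, "bytes": 4_800_000}
short_bursts = {"busy": 970_000, "total": 1_000_000, "bytes": 1_940_000}

for name, c in (("long bursts", long_bursts), ("short bursts", short_bursts)):
    load = bus_load(c["busy"], c["total"])
    efficiency = bytes_per_busy_cycle(c["bytes"], c["busy"])
    print(f"{name}: load={load:.0%}, bytes per busy cycle={efficiency:.1f}")
```

In this made-up example the short-burst case keeps the bus busy far more of the time yet transfers less data in total, which is the situation described above.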

The high bus load sometimes shows up as a black screen: the HDMI display loses sync with the box because the DP could not transfer the memory fast enough. That happens regularly with my (smallint) installation when de-interlacing 1080i50 with the render method to be replaced.

The new implementation aims to avoid buffer copies and to limit bus usage as much as possible. It draws as follows:

1. Decode data to buffer1
2. De-interlace buffer1 into buffer2
3. Show buffer2 (framebuffer panning, no copy)
4. Send fb0 and fb1 to IPU for display, combining them on the fly (thanks to DP capability)

Note that steps 1, 2 and 3 are performed in separate threads with each method.
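
The idea of running the first three steps in separate threads can be sketched as a simple queue-based pipeline. This is only an illustration under stated assumptions: the stage functions are placeholders that tag strings, and none of the names come from the actual Kodi code.

```python
import queue
import threading

END = object()  # sentinel marking the end of the stream


def stage(work, inbox, outbox):
    """Apply work() to each item from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(frames):
    """Push frames through decode -> de-interlace -> show, one thread per step."""
    steps = [
        lambda f: f"decoded({f})",       # step 1: placeholder for the decoder
        lambda f: f"deinterlaced({f})",  # step 2: placeholder for the de-interlacer
        lambda f: f"shown({f})",         # step 3: placeholder for framebuffer panning
    ]
    # Unbounded FIFO queues connect the stages, so frame order is preserved.
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(steps)
    ]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(END)
    for t in threads:
        t.join()

    results = []
    while True:
        item = queues[-1].get()
        if item is END:
            return results
        results.append(item)


print(run_pipeline(["frame0", "frame1"]))
```

Because each stage only waits on its own input queue, a slow step delays the following frames without blocking the steps that come before it.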

We avoid two buffer copies (to GPU and from GPU) by rendering directly into another framebuffer (fb1) which is composed with fb0 at each sync by the display processor (DP).
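
As a rough back-of-the-envelope estimate of what two avoided copies can mean for the bus, the following sketch assumes full 1920x1080 frames at 4 bytes per pixel and that each copy reads and writes the whole frame once. These are assumptions for illustration only; the real buffer formats and GPU access patterns may differ.

```python
def copy_traffic_bytes_per_second(width, height, bytes_per_pixel, fps, copies):
    """Bus traffic of full-frame copies, assuming one read and one write per copy."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * 2 * copies * fps


# Assumed values: 1080p frame, 32 bpp, 50 frames per second, two copies avoided.
saved = copy_traffic_bytes_per_second(1920, 1080, 4, 50, copies=2)
print(f"{saved / 1e6:.0f} MB/s of copy traffic avoided under these assumptions")
```

Under these assumptions the two copies alone would account for roughly 1.66 GB/s of bus traffic at 50 frames per second.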

This allows for very fast de-interlacing of HD streams and even enables double rate rendering with 50fps.
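
For orientation, the time available per output frame follows directly from the output rate. The helper below is a hypothetical illustration, not part of the implementation: an interlaced 50 Hz stream de-interlaced at normal rate yields 25 frames per second, while double rate yields 50.

```python
def frame_budget_ms(frames_per_second: float) -> float:
    """Time available to produce one output frame, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 / frames_per_second


normal_rate = frame_budget_ms(25)  # one output frame per pair of fields
double_rate = frame_budget_ms(50)  # one output frame per field
print(f"normal rate: {normal_rate:.0f} ms, double rate: {double_rate:.0f} ms")
```

Double rate rendering therefore halves the per-frame budget from 40 ms to 20 ms.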

It causes less hardware utilization since the GPU is not used at all during fullscreen playback which leads to less power consumption, less thermal dissipation and more memory bandwidth left for other tasks.

There are still possible improvements by playing with fb0, fb1 usage but current results are impressive already.

As far as Android is concerned, it should work as well but needs to be tested.

What does change?

Since we are now rendering into another framebuffer, the current GLES code does not have access to that framebuffer anymore. 3D rotations, color correction, gamma control and the like no longer work with GL, but the IPU implements corresponding functionality. As a consequence, the screenshot feature is currently broken and does not save the video content. This can be fixed quite easily (not yet done).

Furthermore, the framebuffer needs to be set up with 32 bpp to make compositing of fb0 and fb1 work with transparency. To also address lower-end hardware based on the i.MX6 Solo, like the HummingBoard, 16 bpp can work too, but with limitations. The implementation checks the current number of bits per pixel and switches to alpha blending (bpp == 32) or color keying (bpp == 16) of the GUI overlay.
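
The selection logic described above can be sketched as follows. The names `OverlayMode` and `select_overlay_mode` are hypothetical, and rejecting other depths with an error is an assumption of this sketch; the post does not say how the implementation handles them.

```python
from enum import Enum


class OverlayMode(Enum):
    ALPHA_BLENDING = "alpha blending"
    COLOR_KEYING = "color keying"


def select_overlay_mode(bits_per_pixel: int) -> OverlayMode:
    """Pick how the GUI overlay is combined with the video framebuffer."""
    if bits_per_pixel == 32:
        return OverlayMode.ALPHA_BLENDING  # per-pixel transparency is available
    if bits_per_pixel == 16:
        return OverlayMode.COLOR_KEYING    # no alpha channel, use a key color
    # Assumption: other depths are treated as unsupported in this sketch.
    raise ValueError(f"unsupported framebuffer depth: {bits_per_pixel} bpp")


for bpp in (32, 16):
    print(bpp, "->", select_overlay_mode(bpp).value)
```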

Figures

Code:
File                        Progressive  De-interlacing  Double rate
--------------------------  -----------  --------------  -----------
1080i50_h264_stream                   -            29ms         16ms
1080i50_h264_mbaff_stream             -            16ms         11ms
burosch1_stream                       -             7ms          7ms


Those measurements were taken on wolfgar's Cubox-i4 with VPU@352MHz and tweaked VPU priority on the AXI bus (devmem 0x00c49100 w 7 && devmem 0x00c49104 w 7).

The following numbers were gathered on my (smallint) box with a vanilla ArchLinux installation on a Wandboard Quad running kernel 3.10.17.

Code:
File                        Progressive  De-interlacing  Double rate
--------------------------  -----------  --------------  -----------
1080i50_h264_mbaff                 26ms            22ms         19ms
1080i50_h264_mbaff  (GPU)          26ms            35ms         28ms  
1080i50                            28ms            44ms         39ms
1080i50             (GPU)          45ms            53ms         42ms


Conclusion

After running this code for some time now on my box, which is used on a daily basis, we don't want to switch back to GPU rendering anymore. The playback feels much smoother, and the full HD double rate feature in combination with Stéphan's kernel fix is something one doesn't want to miss anymore.

There are most likely additional issues that need to be addressed with that approach, but those can be fixed in software, and we don't have to deal with hardware limitations to such an extent as with the old implementation.

NOTE: This implementation requires a merge of PR 6090.

The implementation is available at [1].

wolfgar & smallint

[1] https://github.com/smallint/xbmc/tree/thread
#25
wow great work guys. I assume rendercapture would need to be fixed too (boblight support - hint hint...).
#26
Hi memphiz

Thanks a lot,
You are very right, we will fix rendercapture so that boblight users don't get mad ;-)
It was only about sharing explanations/results coupled with a first implementation for testing...

Stéphan
#27
Thank you very much for this awesome work! Concerning the PR to mainline - we are ready when you are :-)

Could you link to the kernel fix or is that already somewhere upstream?

Thanks very much.
#28
The kernel fix is managed by wolfgar. The PR to mainline first needs PR 6090, and there are still some things to fix that I know of:

1. figure out sync issues with 1080i50 hd+ sample
2. screenshot
3. rendercapture

Tests and feedback are always welcome ...
#29
Btw, there is still a small design issue to solve: I am currently using a static instance of a specific iMX context, similar to gRBP. Where would be the best place for such a class? Currently it is in the codec itself, but it needs to be called later from other places as well (e.g. screenshot). xbmc/linux/IMX.h could work, but that class would be used with Android as well (if it is working).
#30
It's something for FernetMenta, as he is the architect in charge of this part of Kodi. Just drop him an email?