libstagefright - Experimental hardware video decoding builds
Hi,
Today I tried the new 19/02 build on my MK802III (Rockchip), and here is my report:
H.264 videos in SD resolution now work fine (they did not before),
but some HD MKV files are very laggy (only 10 fps), for example this one:
media info report:
https://www.box.com/s/qhed9xl6in076eg29i0o
xbmc log:
http://www.xbmclogs.com/show.php?id=1072
So I hope one of the next builds solves the MKV playback issues.
Thanks for your hard work.
About the non-MOD16 video issue on RK3066:

Did you check the stride of the decoded frame from libstagefright?

I guess modifying the input parameters (such as width/height) is NOT necessary for decoding, but the output buffer may need to be MOD16 (enlarged).

I have no idea whether the decoded picture is scaled to fit the buffer or just padded.
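
To make the stride question concrete, here is a minimal C++ sketch. It assumes the decoder only pads each row (it does not scale) and that the plane is 8 bits per pixel. The name copy_visible_plane and its parameters are hypothetical and not part of libstagefright; the sketch only shows how the visible region would be read out of a buffer whose row stride is larger than the visible width.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not a libstagefright API): copy the visible
// width x height region out of a plane whose rows are `stride` bytes apart.
// Assumes 1 byte per pixel (e.g. the Y plane of a YUV 4:2:0 frame),
// stride >= width, and width > 0.
std::vector<uint8_t> copy_visible_plane(const uint8_t* src, size_t stride,
                                        size_t width, size_t height) {
    std::vector<uint8_t> dst(width * height);
    for (size_t row = 0; row < height; ++row) {
        // Copy only the visible bytes; the padding at the end of each
        // source row is skipped by stepping with `stride`.
        std::memcpy(dst.data() + row * width, src + row * stride, width);
    }
    return dst;
}
```

If the decoder scaled the picture instead of padding it, a copy like this would not recover the original frame, which is why the stride and the padded height matter.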

----
By the way, this code is interesting:
https://github.com/koying/xbmc/commit/12...fe#L16R257

(I'm sure OMX.SEC has no relation to RK's decoder. I just wanted to point out something I find interesting.)
Hi all, I am a libstagefright developer from Rockchip. My name is Chen Hengming; mail: [email protected]

The Rockchip OMX component output buffer is normally MOD16. That is correct. We do not use the published width/height of a stream.

But due to a hardware limit of the RK3066's GPU, the RK3066 OMX component output buffer is aligned to 32 bytes on width and 16 bytes on height.
Sorry for the inconvenience.
Please set the output buffer to MOD32 and test whether that fixes it.

BTW, the OMX component uses memcpy to transfer pictures, which is not efficient. In Rockchip's libstagefright we use zero copy from the decoder to the renderer, which introduces some private interfaces that make it difficult to use libstagefright from the native layer.

If you need more information about that, feel free to contact me.
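
The alignment rule above can be sketched in C++ as follows. This is only an illustration under stated assumptions: "32 bytes on width" is treated as 32 pixels, which holds for an 8-bit luma plane, and the names align_up, BufferSize and rk3066_output_size are hypothetical, not Rockchip or libstagefright APIs.

```cpp
#include <cstdint>

// Round `value` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two (16 and 32 both are).
constexpr uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

struct BufferSize {
    uint32_t width;
    uint32_t height;
};

// Hypothetical helper: output buffer dimensions under the RK3066 rule
// described in this post (width aligned to 32, height aligned to 16).
// Assumes 1 byte per pixel so that byte alignment equals pixel alignment.
constexpr BufferSize rk3066_output_size(uint32_t width, uint32_t height) {
    return BufferSize{align_up(width, 32), align_up(height, 16)};
}
```

For a 1920x1080 stream this gives a 1920x1088 buffer, while a 720x576 stream would need a 736x576 buffer because 720 is a multiple of 16 but not of 32.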
(2013-02-22, 08:46)Herman.Chen Wrote: The Rockchip OMX component output buffer is normally MOD16. That is correct. We do not use the published width/height of a stream.

But due to a hardware limit of the RK3066's GPU, the RK3066 OMX component output buffer is aligned to 32 bytes on width and 16 bytes on height.
Sorry for the inconvenience.
Please set the output buffer to MOD32 and test whether that fixes it.

I'm not an XBMC developer, but I want to say thank you for your contribution to the open community :) I hope it will help development.

(2013-02-22, 08:46)Herman.Chen Wrote: BTW, the OMX component uses memcpy to transfer pictures, which is not efficient. In Rockchip's libstagefright we use zero copy from the decoder to the renderer, which introduces some private interfaces that make it difficult to use libstagefright from the native layer.

If you need more information about that, feel free to contact me.

I guess the RK3066 uses a 2D accelerator (such as an overlay and scaler) to show decoded frames on screen, correct?
I can see some references to /dev/rga and /dev/rk29-ipp in some libraries.
Hi Chen, having this interface for XBMC, like in RKPlayer, with direct-to-display output instead of memcpy would indeed be much more efficient. I hope we can get to that point while still rendering subtitles and the GUI on top :)
I guess the strange rendering and kernel crashes at 1080p that we are currently experiencing in XBMC come from the GPU driver then. But why does disabling one core give more stable playback (without any rendering issues, though still with occasional crashes)?

I guess nothing has changed for the newly released RK3188 in terms of this GPU output buffer limitation?

PS: It's nice to see Rockchip interested in helping to reach a broader audience with their hardware and in meeting demand :)
Koying, I hope you don't mind that I contacted Herman via email yesterday after I read that you were reluctant to do so.

I also want to thank you personally, Herman, for bothering to answer my email and for joining the thread here. I'm hoping that we can now solve the issue of hardware acceleration for the RK3066 chipset.

Herman's response to my mail:

Hi Peter,
Thanks for your letter.
The Rockchip OMX component output buffer is normally MOD16. That is correct. We do not use the published width/height of a stream.
 
But due to a hardware limit of the RK3066's GPU, the RK3066 OMX component output buffer is aligned to 32 bytes on width and 16 bytes on height.
Sorry for the inconvenience.
You can set the buffer to MOD32 and test whether that fixes it.
 
Sorry again for the inconvenience.
You can call me Herman : )
Hi

I tested the latest release (19/02), and XBMC crashes when I try to play one of my videos.
It happens every time I try to play it.
Logs: http://www.xbmclogs.com/show.php?id=1083

By the way, I also tested the TED Talks plugin and I get only audio without video.
Should I provide some logs for that, or is this a known issue?

Best regards
Hi

I tested the latest build (02-19) on my ODROID-U2.
720p is perfectly fine on the official Android ICS (2012-12-23), where four CPUs are working, but
720p is choppy on the official Android ICS (2013-01-03 and 2013-02-08), where only two or three
CPUs are working.
I could not find my log files.
Sorry, my English is poor.
(2013-02-22, 18:19)keithhuang Wrote: Hi

I tested the latest build (02-19) on my ODROID-U2.
720p is perfectly fine on the official Android ICS (2012-12-23), where four CPUs are working, but
720p is choppy on the official Android ICS (2013-01-03 and 2013-02-08), where only two or three
CPUs are working.
I could not find my log files.
Sorry, my English is poor.

Keith, if possible, try out the published JB 1.3.1 release from ODROID. The 2/19 XBMC release is working well for me on that OS release.
(2013-02-22, 08:46)Herman.Chen Wrote: Hi all, I am a libstagefright developer from Rockchip. My name is Chen Hengming; mail: [email protected]

The Rockchip OMX component output buffer is normally MOD16. That is correct. We do not use the published width/height of a stream.

But due to a hardware limit of the RK3066's GPU, the RK3066 OMX component output buffer is aligned to 32 bytes on width and 16 bytes on height.
Sorry for the inconvenience.
Please set the output buffer to MOD32 and test whether that fixes it.

BTW, the OMX component uses memcpy to transfer pictures, which is not efficient. In Rockchip's libstagefright we use zero copy from the decoder to the renderer, which introduces some private interfaces that make it difficult to use libstagefright from the native layer.

If you need more information about that, feel free to contact me.
Hi Herman,

Thanks for joining the thread.

We are passing a native window to OMXCodec::Create, so rendering is done entirely inside libstagefright and we don't have any way to correct the frame size.

Related to this, it looks like the frame size passed in the metadata after a read is not correct, i.e. it is not MOD16/MOD32 but the published one.

Could it be that solving the metadata bug would also solve the rendering bug?
(2013-02-22, 09:43)fun_ Wrote: I guess the RK3066 uses a 2D accelerator (such as an overlay and scaler) to show decoded frames on screen, correct?
I can see some references to /dev/rga and /dev/rk29-ipp in some libraries.

Yes, you are right. We use the hardware RGA to composite directly to the final framebuffer, and the IPP is used for deinterlacing.


(2013-02-22, 09:50)CruNcher Wrote: Hi Chen, having this interface for XBMC, like in RKPlayer, with direct-to-display output instead of memcpy would indeed be much more efficient. I hope we can get to that point while still rendering subtitles and the GUI on top :)
I guess the strange rendering and kernel crashes at 1080p that we are currently experiencing in XBMC come from the GPU driver then. But why does disabling one core give more stable playback (without any rendering issues, though still with occasional crashes)?

I guess nothing has changed for the newly released RK3188 in terms of this GPU output buffer limitation?

PS: It's nice to see Rockchip interested in helping to reach a broader audience with their hardware and in meeting demand :)

Mutual help and cooperation are good for everyone :) Thanks for the reply.
In RK libstagefright, when subtitles are required, we use the RGA to make a copy of the video frame and draw the subtitles on it with the CPU. Then the RGA composites the video frame directly to the Linux framebuffer. This part of the code is in the hardware composer (hwc). The GUI is another surface layer.
About the GPU driver: we made some changes to support GPU rendering of video formats. I am not familiar with this part, sorry.
RK3188 and RK3066 use the same GPU, so they have the same output buffer limitation.
(2013-02-23, 05:43)Koying Wrote: We are passing a native window to OMXCodec::Create, so rendering is done entirely inside libstagefright and we don't have any way to correct the frame size.

Related to this, it looks like the frame size passed in the metadata after a read is not correct, i.e. it is not MOD16/MOD32 but the published one.

Could it be that solving the metadata bug would also solve the rendering bug?

Is it impossible to use a local native window that has a MOD32 surface instead of the native window returned from g_xbmcapp.GetAndroidVideoWindow()?

I think the metadata of the input stream is correct and there is no need to *fix* it. Only a MOD32 surface is required to output frames from the hardware.
(2013-02-23, 06:11)Herman.Chen Wrote:
(2013-02-22, 09:43)fun_ Wrote: I guess the RK3066 uses a 2D accelerator (such as an overlay and scaler) to show decoded frames on screen, correct?
I can see some references to /dev/rga and /dev/rk29-ipp in some libraries.

Yes, you are right. We use the hardware RGA to composite directly to the final framebuffer, and the IPP is used for deinterlacing.

Thank you.

I briefly checked Rockchip's ICS source distributed at http://service.i-onik.de/a09_source_1.5/ics/.
It seems SurfaceFlinger::composeSurfaces() calls DisplayHardware::RenderVPUBuffToLayerBuff, which uses the RGA on Rockchip devices. So all normal apps that use SurfaceFlinger for composition can benefit from the hardware.

I guess other SoCs can behave similarly if they can do hardware composition (it may be done by a 3D accelerator or by a 2D accelerator).
(2013-02-23, 09:00)fun_ Wrote: Is it impossible to use a local native window that has a MOD32 surface instead of the native window returned from g_xbmcapp.GetAndroidVideoWindow()?

I think the metadata of the input stream is correct and there is no need to *fix* it. Only a MOD32 surface is required to output frames from the hardware.

We don't use a "real" surface, but a SurfaceTexture, which wraps a GL texture.
Behind the scenes, GraphicBuffers are used, which are HW buffers that the HW decoder fills with a decoded frame. On top of those, one EGLImageKHR is mapped to each native window buffer.
"SurfaceTexture::updateTexImage" just maps the latest EGLImageKHR to the GL texture.

The fact is that on the other platforms, if there is a MODxx tweak, the frame size metadata reflects it and all is well.
On RK3066, the returned frame size is not MODxx, and I assume this (wrong) frame size is used to build the EGLImage, which leads to crashes.
So, really, for this to work on RK3066, the HW buffer size should be MODxx.
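
As a rough illustration of why a mismatched frame size can matter, the C++ sketch below compares the byte size of an 8-bit YUV 4:2:0 frame computed from the published size with the size computed from the padded size (width aligned to 32, height aligned to 16, as Herman described). The pixel format and the helper names yuv420_bytes and shortfall_bytes are assumptions made for this example; the thread does not state the actual buffer format, and the sketch does not prove that this is what triggers the crashes.

```cpp
#include <cstddef>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr size_t align_up(size_t value, size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Bytes in an 8-bit YUV 4:2:0 frame: a full-resolution Y plane plus
// chroma at a quarter of the resolution for each of U and V.
// Assumes even width and height.
constexpr size_t yuv420_bytes(size_t width, size_t height) {
    return width * height * 3 / 2;
}

// Hypothetical helper: how many bytes are missing if a consumer sizes its
// buffer from the published dimensions while the decoder writes a frame
// padded to MOD32 width and MOD16 height.
constexpr size_t shortfall_bytes(size_t width, size_t height) {
    return yuv420_bytes(align_up(width, 32), align_up(height, 16)) -
           yuv420_bytes(width, height);
}
```

Under these assumptions a 1920x1080 stream is written as 1920x1088, so a buffer sized from the published dimensions would be 23040 bytes too small, while a 1280x720 stream needs no padding at all.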

Note that using the SW renderer, i.e. libstagefright thumbnail mode, just crashes the mediaserver/VPU:
Code:
F/libc    (   90): Fatal signal 11 (SIGSEGV) at 0x40112000 (code=2), thread 1960 (mediaserver)
I/DEBUG   (   85): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG   (   85): Build fingerprint: 'rk30sdk/rk30sdk/rk30sdk:4.1.1/JRO03H/eng.root.20130116.110927:eng/test-keys'
I/DEBUG   (   85): pid: 90, tid: 1960, name: mediaserver  >>> /system/bin/mediaserver <<<
I/DEBUG   (   85): signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 40112000
I/DEBUG   (   85):     r0 40112000  r1 42a75020  r2 001fcfe0  r3 40c37000
I/DEBUG   (   85):     r4 41b89a48  r5 41b9b420  r6 00000440  r7 00000780
I/DEBUG   (   85):     r8 42678000  r9 41b900e0  sl 41b9b690  fp 41b8d428
I/DEBUG   (   85):     ip 40111000  sp 42877ca8  lr 40f600d7  pc 4013f798  cpsr 20000030
I/DEBUG   (   85):     d0  1010101010101010  d1  1010101010101010
I/DEBUG   (   85):     d2  1010101010101010  d3  1010101010101010
I/DEBUG   (   85):     d4  0000000000000000  d5  0000002800000000
I/DEBUG   (   85):     d6  4220000041300000  d7  3f8000003debc8c1
I/DEBUG   (   85):     d8  0000000000000000  d9  0000000000000000
I/DEBUG   (   85):     d10 0000000000000000  d11 0000000000000000
I/DEBUG   (   85):     d12 0000000000000000  d13 0000000000000000
I/DEBUG   (   85):     d14 0000000000000000  d15 0000000000000000
I/DEBUG   (   85):     d16 00000000000099cf  d17 7e37e43c8800759c
I/DEBUG   (   85):     d18 4000000000000000  d19 bf66c16be38d5283
I/DEBUG   (   85):     d20 3fc5555533bce6df  d21 3e66376972bea4d0
I/DEBUG   (   85):     d22 3ff0000000000000  d23 bf6376c7f8038f6c
I/DEBUG   (   85):     d24 3ff009bb63fc01c8  d25 0000000000000000
I/DEBUG   (   85):     d26 0000000000000000  d27 0000000000000000
I/DEBUG   (   85):     d28 0000000000000000  d29 0000000000000000
I/DEBUG   (   85):     d30 0000000000000000  d31 0000000000000000
I/DEBUG   (   85):     scr 60000010
I/DEBUG   (   85):
I/DEBUG   (   85): backtrace:
I/DEBUG   (   85):     #00  pc 0000c798  /system/lib/libc.so
I/DEBUG   (   85):     #01  pc 000fece4  <unknown>
I/DEBUG   (   85):
I/DEBUG   (   85): stack:
I/DEBUG   (   85):          42877c68  ec7a2d80
I/DEBUG   (   85):          42877c6c  42a74000  anon_inode:ion_share_fd
I/DEBUG   (   85):          42877c70  40c37000  /system/lib/libvpu.so
I/DEBUG   (   85):          42877c74  00000000
I/DEBUG   (   85):          42877c78  00000000
I/DEBUG   (   85):          42877c7c  40c37000  /system/lib/libvpu.so
I/DEBUG   (   85):          42877c80  00000000
I/DEBUG   (   85):          42877c84  40c2f514  /system/lib/libvpu.so (VPUMemInvalidate+200)
I/DEBUG   (   85):          42877c88  42877cdc
I/DEBUG   (   85):          42877c8c  42877c90
I/DEBUG   (   85):          42877c90  00000000
I/DEBUG   (   85):          42877c94  41b89a48  [heap]
I/DEBUG   (   85):          42877c98  41b9b420  [heap]
I/DEBUG   (   85):          42877c9c  00000440
I/DEBUG   (   85):          42877ca0  df0027ad
I/DEBUG   (   85):          42877ca4  00000000
I/DEBUG   (   85):     #00  42877ca8  42877cdc
I/DEBUG   (   85):          42877cac  42877ce8
I/DEBUG   (   85):     #01  42877cb0  00000000
I/DEBUG   (   85):          42877cb4  00000000
I/DEBUG   (   85):          42877cb8  00000000
I/DEBUG   (   85):          42877cbc  00000000
I/DEBUG   (   85):          42877cc0  41b9b6a0  [heap]
I/DEBUG   (   85):          42877cc4  00000000
(2013-02-23, 09:34)Koying Wrote: The fact is that on the other platforms, if there is a MODxx tweak, the frame size metadata reflects it and all is well.
On RK3066, the returned frame size is not MODxx, and I assume this (wrong) frame size is used to build the EGLImage, which leads to crashes.
So, really, for this to work on RK3066, the HW buffer size should be MODxx.

I'm sorry, I don't have enough knowledge to discuss EGL things...

I thought the buffer was prepared in main memory by software (your code). In that case you could prepare a tweaked buffer with MODxx width/height for RK (you can calculate it from the metadata) and pass it as the output buffer for the hardware decoder.

But the fact is that this is impossible, right?