Still faster using central MySQL DB vs local SQLite?
#1
When the Pi was first introduced, a central MySQL database was usually suggested to improve library browsing performance. It's been a while since I've researched the topic, and Google hasn't really turned up anything covering the post-Pi 2B era. Is there still a performance benefit over contemporary MicroSD cards?

My Raspberry Pi 2 Model B boards are all using Samsung EVO 16GB MicroSD cards. The central MySQL database would live on a modern SSD (e.g. Samsung 850 EVO) in a NAS (CPU Passmark ~1,000). The eventual library size would be on the order of thousands of movies, hundreds of TV shows and thousands of music albums.
#2
Just playing with the numbers, it looks like the major bottlenecks of the Pi are:

* SD card max. link speed --> ~20MB/s

* MicroSD card performance (random R/W being key). My Samsung EVO 16GB Class 10 UHS-1 benchmark on a stock Pi 2B is listed below. The fastest cards I've seen are only slightly faster. For some reason, random r/w takes a noticeable hit (~40%) on the Pi compared to a PC. Overclocking/tuning may pay large dividends here.

Code:
Creating test file flash-bench.tmp of size 512MB.
Benchmark: Sequential read  Limits: Total size: 512MB Duration: 60s
512.0MB processed in 29.48s (17.37MB/s), CPU: user 0.03%, sys 1.40%
Benchmark: Sequential write  Limits: Total size: 512MB Duration: 60s
512.0MB processed in 53.83s (9.51MB/s), CPU: user 0.02%, sys 1.35%
Benchmark: Random read  Limits: Total size: 512MB Duration: 60s
278.8MB processed in 60.02s (4.64MB/s), CPU: user 0.03%, sys 1.54%
Benchmark: Random write  Limits: Total size: 512MB Duration: 60s
171.9MB processed in 154.74s (1.11MB/s), CPU: user 0.00%, sys 0.32%

* USB bus max link speed --> ~30MB/s

* Fast Ethernet max. speed --> ~12MB/s
#3
Maximizing the library browsing experience seems to hinge on three key metrics: fanart resolution, GUI speed and loading time. The Pi 2 currently scales artwork to 720p for background images and 540p for thumbnails. In my experience, there is a noticeable quality decline as artwork gets cached, but also an increase in GUI speed. When images are cached, loading and browsing my small library is perfectly speedy. I don't know if this would scale to huge library sizes, though.

Artwork is cached locally, while the database can be stored in a central MySQL server. Full resolution artwork (i.e. 1920x1080 fanart; 720x1080 thumbnails) seems to average ~250KB for thumbnails and ~1MB for background fanart. Loading these assets would presumably occur at 512K or sequential read speeds, meaning the Pi's SD card link should be about saturated near 20MB/s. The Pi 2 has enough excess RAM to preload hundreds of thumbnail/fanart files. At ~20 images loaded per second, only fast library scrolling would cause artwork loading to lag.
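
The back-of-envelope arithmetic here can be sketched in a one-liner (all values are the estimates quoted above, not measurements):

```shell
# Rough image-loading throughput at the Pi 2's ~20MB/s SD link
# (average sizes assumed from this post: fanart ~1MB, thumbnail ~250KB)
awk 'BEGIN {
  link = 20        # MB/s, approximate SD card link speed
  fanart = 1.0     # MB per background fanart image
  thumb = 0.25     # MB per thumbnail
  printf "fanart/s: %.0f  thumbs/s: %.0f\n", link/fanart, link/thumb
}'
```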

A central MySQL database stored on a modern SSD would provide random r/w speeds that would easily saturate the Pi 2's fast ethernet link, providing ~2.5x speed increase. I don't know peak USB stick r/w performance, but storing the database on a USB stick could theoretically double the bandwidth again. In short, a central MySQL db offers a potentially significant speed boost for huge libraries (10,000s of entries; 10s of megabytes in size).

Open questions are how processing power and memory bandwidth play a role. Does the Pi 2B still have bottlenecks here that a central server could overcome?
#4
With the right SD card and sdhost overclocking options, 40MB/s sequential read from SD card is possible:

http://forum.kodi.tv/showthread.php?tid=...pid2005396
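
For reference, the sdhost overclock is switched on with a dtoverlay line in config.txt; a minimal sketch (the 100MHz figure comes from later posts in this thread, and not every card is stable at it):

```shell
# /boot/config.txt (fragment) -- run the SD interface at 100MHz
# instead of the stock 50MHz; test your own card for stability first
dtoverlay=sdhost,overclock_50=100
```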
Texture Cache Maintenance Utility: Preload your texture cache for optimal UI performance. Remotely manage media libraries. Purge unused artwork to free up space. Find missing media. Configurable QA check to highlight metadata issues. Aid in diagnosis of library and cache related problems.
#5
Ah, thanks. Upgrading to a nice 2A power supply and overclocking/tuning my Pi 2B would probably pay the highest dividends at the moment. I need to do some reading...

If anyone uses high resolution artwork, I'd be interested in hearing about your experience and performance optimizations. The same goes for MySQL users.
#6
It's perhaps difficult to say with any certainty without comparing like with like, but for your typical library sizes (ie. small to medium) I wouldn't expect there to be much difference between SQLite and MySQL. If anything SQLite may be faster due to the reduced latency. With a large library, MySQL on reasonably capable hardware might be expected to outperform SQLite (depending on the complexity of the query, size of resultset etc. etc.)

In general the only reason to use MySQL would be the benefits of a shared/centralised media library - any performance losses/gains are likely to be minimal.
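
For anyone weighing that shared-library benefit, the client side is a few lines of advancedsettings.xml; a sketch, with the host and credentials as placeholder assumptions:

```xml
<!-- advancedsettings.xml (sketch) -- point Kodi's video and music
     libraries at a central MySQL server; host/user/pass are examples -->
<advancedsettings>
  <videodatabase>
    <type>mysql</type>
    <host>192.168.1.100</host>
    <port>3306</port>
    <user>kodi</user>
    <pass>kodi</pass>
  </videodatabase>
  <musicdatabase>
    <type>mysql</type>
    <host>192.168.1.100</host>
    <port>3306</port>
    <user>kodi</user>
    <pass>kodi</pass>
  </musicdatabase>
</advancedsettings>
```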

Maybe someone needs to run an exhaustive set of benchmarks comparing different sized SQLite and MySQL libraries? :)
#7
(2016-02-02, 20:13)Milhouse Wrote: It's perhaps difficult to say with any certainty without comparing like with like, but for your typical library sizes (ie. small to medium) I wouldn't expect there to be much difference between SQLite and MySQL. If anything SQLite may be faster due to the reduced latency. With a large library, MySQL on reasonably capable hardware might be expected to outperform SQLite (depending on the complexity of the query, size of resultset etc. etc.)

In general the only reason to use MySQL would be the benefits of a shared/centralised media library - any performance losses/gains are likely to be minimal.

Maybe someone needs to run an exhaustive set of benchmarks comparing different sized SQLite and MySQL libraries? :)

Thanks. If I do end up using a central MySQL server I'll try to post my results. However, I think I'm going to spend my effort recapturing as much of the native SD card performance as I can. 12MB/s random read speeds would be very nice. :D


Do you know why your write speeds didn't scale up with the mmc or sdhost overclocks? Do you remember if random read/write scaled as well or was sequential read the only improvement?
#8
Write speed is most likely dictated by the speed of the SD card controller and how quickly it can erase the flash blocks etc. whereas sucking data off the SD card is going to be pretty much limited only by the bus speed. Overclocking the bus speed improves read performance but does nothing for write performance.

I must admit I haven't really tested random read as it's not so easy to benchmark as sequential and at the time I just wanted something indicative. You could try benchmarking more thoroughly with Bonnie++.

If you need Bonnie++ for OpenELEC, install this file:
Code:
wget -q http://nmacleod.com/public/oebuild/bonnie++ -O ~/bonnie++ && chmod +x ~/bonnie++

Bonnie++ man page: https://manned.org/bonnie++/7978d2d5

OpenELEC RPi2 with "NOOBS" SD card @ 50MHz:
Code:
rpi22:~ # systemctl stop kodi
rpi22:~ # mkdir /storage/bonnie
rpi22:~ # /storage/bonnie++ -d /storage/bonnie -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
rpi22         1344M   148  99  8026   3  6157   3   719  99 26503   5  1394  42
Latency             57991us    2760ms    2048ms   13669us   12907us   17045us
Version  1.97       ------Sequential Create------ --------Random Create--------
rpi22               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 11567  80 +++++ +++ 18795  71   458  97 +++++ +++ 17785  69
Latency               842us    2505us    2756us   10534us      55us    1143us

OpenELEC RPi2 with "NOOBS" SD card @ 100MHz:
Code:
rpi22:~ # systemctl stop kodi
rpi22:~ # mkdir /storage/bonnie
rpi22:~ # /storage/bonnie++ -d /storage/bonnie -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
rpi22         1344M   142  95  8960   4  7412   4   721  99 46547  10  1407  42
Latency             55850us    2195ms    1858ms   12791us    4359us    9321us
Version  1.97       ------Sequential Create------ --------Random Create--------
rpi22               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  8861  61 +++++ +++ 11823  45   475  99 +++++ +++ 19912  77
Latency               674us    2454us    2698us   13318us      83us    1622us

100MHz is better, certainly the sequential read speed (single file) bears out my earlier testing. Unfortunately the sequential and random reads from the second test (multiple files) complete too quickly for Bonnie++ to consider them reliable tests... :(
#9
iozone is a better choice for benchmarking an SD card on the Pi.
#10
1) I was able to overclock my Pi to "turbo settings" (1000MHz ARM, 500MHz core, 500MHz SDRAM, overvolt=2).

2) I increased my resolution to 1080 for both <fanartres> and <imageres> using a custom advancedsettings.xml. I also increased artwork bit-depth in the settings menu. I noticed in another thread that you recommended increasing gpu_mem to 320MB from the default 256MB. Is this still valid?
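
The resolution override lives in advancedsettings.xml; a sketch of the change described (tag names per Kodi's advancedsettings, values from this post):

```xml
<!-- advancedsettings.xml (sketch) -- cache artwork at 1080 instead of
     the defaults (720 fanart / 540 thumbnails mentioned earlier) -->
<advancedsettings>
  <fanartres>1080</fanartres>
  <imageres>1080</imageres>
</advancedsettings>
```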

3) I've been benchmarking my 16GB Samsung EVO card on the Pi using a utility created by user hglm from the Raspberry Pi forums.

Code:
Overlay config                      core_freq   turbo    SEQ. READ      SEQ. WRITE   RDM. READ     RDM. WRITE
Default w/ “Turbo Overclock”           500        0      17.37 MB/s     9.51 MB/s    4.64 MB/s     1.11 MB/s
dtoverlay=mmc,overclock_50=63          500        0      14.06 MB/s     8.94 MB/s    5.21 MB/s     1.19 MB/s
dtoverlay=sdhost                       500        0      15.10 MB/s     9.54 MB/s    4.89 MB/s     1.08 MB/s
dtoverlay=sdhost,overclock_50=63       500        0      17.89 MB/s     Err. MB/s    5.07 MB/s     1.05 MB/s
dtoverlay=sdhost,overclock_50=84       500        0      18.49 MB/s     9.45 MB/s    5.38 MB/s     1.06 MB/s
dtoverlay=sdhost,overclock_50=100      500        0      21.95 MB/s     9.84 MB/s    5.50 MB/s     1.06 MB/s
dtoverlay=sdhost,overclock_50=100      500        1      20.90 MB/s     9.36 MB/s    5.47 MB/s     1.05 MB/s

In short, I saw a max. ~27% improvement in sequential read speeds and ~18.5% improvement in random read speeds from default. Each dtoverlay setting was confirmed with:
Code:
cat /sys/kernel/debug/mmc0/ios

Giving an output like the following, in this instance for dtoverlay=sdhost,overclock_50=100 (force_turbo=0):
Code:
clock:        50000000 Hz
actual clock:    100000000 Hz
vdd:        21 (3.3 ~ 3.4 V)
bus mode:    2 (push-pull)
chip select:    0 (don't care)
power mode:    2 (on)
bus width:    2 (4 bits)
timing spec:    2 (sd high-speed)
signal voltage:    0 (3.30 V)

Turbo mode on/off was verified with bcmstat for each run, even though an addendum to your post says turbo mode currently defaults to "on" for sdhost reads/writes anyway.
#11
Performance is perfectly fine with high resolution fanart, except for some slight stuttering in Apple-style "Cover Flow" library views. Just for reference, my testing has specifically been with the ReFocus 2.0 skin using the "slide" and "gallery" views.

This stuttering persists no matter how long you wait, so it isn't a slow loading problem. If I wait and let the cpu slow down to 600MHz while watching bcmstat, constant scrolling doesn't cause it to ramp back up despite the slight stuttering. I assume then that the CPU isn't the limiting factor, unless forcing turbo mode would help? Are there any adjustments I can make to improve performance here?
#12
(2016-02-03, 00:10)ZwartePiet Wrote: I noticed in another thread that you recommended increasing gpu_mem size to 320MB from the default 256MB. Is this still valid?

I think so - when using higher resolution artwork, and higher colour depth, I found it was possible to exhaust GPU memory when scrolling rapidly through artwork ("insufficient resources" etc.). It won't happen often at 256MB, but is still possible, which is why I recommend 320MB - you've got the RAM to spare, so use it. You're extremely unlikely to see "insufficient resources" errors with gpu_mem=320.
#13
(2016-02-02, 23:31)popcornmix Wrote: iozone is a better choice for benchmarking an sdcard on Pi.

For OpenELEC (and maybe anyone else that doesn't want to build iozone):
Code:
wget -q http://nmacleod.com/public/oebuild/iozone -O ~/iozone && chmod +x ~/iozone

This thread has several examples of iozone results - for comparison purposes I suggest running it as follows:
Code:
~/iozone -e -I -a -s 50M -r 4k -r 512k -r 16M -i 0 -i 1 -i 2

My iozone results for NOOBS SD card, RPi2, OpenELEC (arm=1000, core=500, gpu=400, sdram=600, force_turbo=1):

sdhost@50MHz:
Code:
.                                                             random    random
              kB  reclen    write  rewrite    read    reread    read     write
           51200       4     2419     2414     7874     7866     7397      840
           51200     512     9644    11057    21695    21696    21683     4782
           51200   16384    11492    11109    22011    22010    22010    12914

sdhost@100MHz:
Code:
.                                                             random    random
              kB  reclen    write  rewrite    read    reread    read     write
           51200       4     2722     2661     9425     9425     9416     1018
           51200     512    13430    10026    39985    39975    39952     5192
           51200   16384    13675    11458    41005    41012    41018    17290

100MHz relative to 50MHz:
Code:
.                                                             random    random
              kB  reclen    write  rewrite    read    reread    read     write
           51200       4   +12.5%   +10.2%   +19.7%   +19.8%   +27.3%   +21.2%
           51200     512   +39.3%    -9.3%   +84.3%   +84.3%   +84.3%    +8.6%
           51200   16384   +19.0%    +3.2%   +86.3%   +86.3%   +86.4%   +33.9%
#14
(2016-02-03, 00:27)ZwartePiet Wrote: This stuttering persists no matter how long you wait, so it isn't a slow loading problem. If I wait and let the cpu slow down to 600MHz while watching bcmstat, constant scrolling doesn't cause it to ramp back up despite the slight stuttering. I assume then that the CPU isn't the limiting factor, unless forcing turbo mode would help? Are there any adjustments I can make to improve performance here?

What version (OS, Kodi) are you using? Recent OpenELEC/Kodi 17 builds (and possibly other Kodi 17 distributions) will now use both GPU and ARM for image decoding (two GPU queues plus 4 ARM cores, rather than just the two GPU queues while the ARM cores sat idle). This can make the GUI a little smoother (able to decode more images at the same time) and should also see the ARM cores spin up more often.

I'd suggest gpu_freq=400 which will help decode images more quickly (those that are decoded by the GPU, anyway).
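
Taken together with the gpu_mem recommendation from earlier in the thread, the config.txt additions would look like this (a sketch; values are the ones suggested in this thread):

```shell
# /boot/config.txt (fragment) -- suggestions from this thread
gpu_mem=320    # extra texture headroom for high-resolution artwork
gpu_freq=400   # faster GPU-side image decoding
```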
#15
(2016-02-03, 02:39)Milhouse Wrote: I'd suggest gpu_freq=400 which will help decode images more quickly (those that are decoded by the GPU, anyway).
Interesting - does this result in a measurable improvement (will test myself tonight)?
Also, is there a more specific overclock setting? I ask because after reboot I noticed that h264 is always 250 MHz even when not in use. I'm wondering if there's a way to keep previous behavior (0 when not in use, 250 when in use, 300 - or 400 in this case - when needed).
