Improving movie db update performance.
#1
Hi,

I've just updated to the nightlies for XBMC (although I suspect the issue is present in 12.x) and thought I'd run the artwork downloader to update clearart, banners etc etc.

I found that it seemed to be taking a large amount of time, so I've started digging into why. Note that a factor here is that I'm using mysql on a NAS, so that I can share the db across machines. This will be exaggerating the round trip time for each sql call.

There appear to be a few issues at play when using artwork downloader (however, I'm not sure it's purely to blame, as the json-rpc api for doing an update does look to imply that internally it's just one element being updated/set):
  • artwork downloader makes 5 individual JSON-RPC calls to VideoLibrary.SetMovieDetails, setting just one element for each artwork.
  • the json rpc api fetches the movie information, and merges the changes in, it then calls CVideoDatabase::SetDetailsForMovie
  • CVideoDatabase::SetDetailsForMovie then proceeds to delete the movie from the database, and re-adds the whole movie. This causes a number of SQL calls to happen, which are remarkably slow.

Total time for updating one film is 15s, a chunk of which is the database deleting and recreating the movie, see pastebin:
http://pastebin.com/Tq28NS3K

The time to do one SetMovieDetails call is ~3s. (and it also looks like a ffmpeg is fired up to examine the mkv on pretty much each call)

Overall it seems inefficient to delete everything and reinsert things, as it's not just the movie object it's all the side items, eg actors, genres, that are also unlinked and recreated.

So I'm wondering how to improve this. I can think of a couple of ideas:
  • add a new API to Videolibrary that can be passed a movie id, and a map/table of changes - it can then be smarter and just update items that need to be updated. If it doesn't know how to do a requested update (IE it's too complex) it can error and do a full object refresh (either internally, or ask the caller to do so) I'm thinking that this allows for db/api changes, eg new fields come in, if the code doesn't know how to handle it then it falls back to the full refresh.
  • Extend the SetDetailsForMovie API, and add the ability to diff the new movie details against the current movie details, and only update what's changed - while this would catch more call paths it would lose the ability to pass in the knowledge of what exactly has changed (which the json-rpc api has when called) However, it does mean that deeper understanding of how to compare movies is needed (and maintained)

Relevant code from VideoLibrary:
https://github.com/xbmc/xbmc/blob/master....cpp#L1989
and json-rpc:
https://github.com/xbmc/xbmc/blob/master...y.cpp#L473

Any thoughts? Have I missed something obvious that the json-rpc/VideoLibrary could be doing instead? Or should I throw some code together and see what works?

Thanks,
Chris
Reply
#2
The main thought would be that database updates with "set of changes to perform" would be fine most likely for trivial changes (i.e. change elements X,Y,Z) but is likely to be quite a bit more complex if you'd want to change just some of the linked item information (e.g. update a single cast member). Thus, I'd limit any partial updating to just handling top-level fields (if you want a cast change, then flush the cast and drop it back in, similarly with genre etc.)

i.e. keep it as simple as you can.

Note that the GetDirectory() fetches are likely what are causing the ffmpeg stuff to run - probably as you're sitting on the movie->titles screen while doing this, which is getting updates on items all the time? Drop back to Home or any other view pretty much should avoid this.
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#3
I've done an initial version. I figured it was worth sharing to get initial comments. (Note my c++ is somewhat rusty, so I maybe missing some c++ tricks)

I've ended up handling the more complex cases, but I'm comfortable that it's doing the right thing.

I think I've also found a bug in the current code, it appears that the writers in writerlinkmovie aren't deleted when DeleteMovie is called. So if the credits change I suspect movies will end up with the old and new writers. (Do you want a bug for this?)

The results are here:
https://github.com/cg110/xbmc/commit/674...b63cc777d3

I'm not going to repeat the description from there here.

Overall effect is that the same kind of update from the artwork downloader addon is down to 1-2s for updating all the art, from 17s. Note this is with a debug build, so I suspect retail might be quicker.

Initial comments would be appreciated. Am I headed in the right direction with how XBMC does things?

Thanks,
Chris
Reply
#4
(2014-02-17, 02:30)cg110 Wrote: I think I've also found a bug in the current code, it appears that the writers in writerlinkmovie aren't deleted when DeleteMovie is called. So if the credits change I suspect movies will end up with the old and new writers. (Do you want a bug for this?)

I have noticed that the c06 and c15 fields in the movie table sometimes contain duplicate credits, could this be related to this bug?
Reply
#5
Awesome Smile

Yup, we want a bug report for the writelinkmovie issue. It should really be using triggers to drop 'em. Even better would be a pull request to fix it Wink

I've taken a brief look over the changes, and the idea looks sound. One thing is that it possibly doesn't matter if you update the entire row in the movie table versus updating one element of it, so you may be able to simplify things down significantly by just replacing the entire row all the time, with the special cases then being primarily the link tables. Timing on this might be interesting (it may make a small difference due to rebuilding of indices in some cases).

Assuming just the linking variables need handling, the amount of code drops quite a bit I think?

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#6
I'm trying to build your patch on tip of master (4af3b544366b93e3b0a5958a5504318611ae4d32), but I'm getting a number of build compilation errors (and warnings) - see pastebin (ignore the ref to 73371a9, the source being compiled really is 4af3b54).

Any ideas?
Texture Cache Maintenance Utility: Preload your texture cache for optimal UI performance. Remotely manage media libraries. Purge unused artwork to free up space. Find missing media. Configurable QA check to highlight metadata issues. Aid in diagnosis of library and cache related problems.
Reply
#7
The cbegin/cend are c++11 IIRC. Just switch to begin()/end().

The warnings about the switch are OK - it probably could do with a default branch though.

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#8
Looks good overall. I've added a few comments. A PR would certainly be welcome.
Always read the online manual (wiki), FAQ (wiki) and search the forum before posting.
Do not e-mail Team Kodi members directly asking for support. Read/follow the forum rules (wiki).
Please read the pages on troubleshooting (wiki) and bug reporting (wiki) before reporting issues.
Reply
#9
Thanks for the comments so far.

I'll take a closer look this evening. Yes, the cbegin/cend are my fault, visual studio says here's some constant iterator things, I thought awesome they're what I needed, although I didn't remember them from c++ last time I used c++ (some years ago). I'll fix that.

I'll also get bug in and a pull request this evening for the writers things, I didn't include the fix in this change so I could do a separate targeted patch.

I'm not sure reducing the logic to not mess about with making a reduced update line would be that much less change, as the code still has to work out what the update has done, and it seemed easier to just tag all the changes. Certainly my thinking was the less SQL run the less gratuitous index updates (or triggers) then get run (my SQL knowledge being from SQL Server, it can do a lot of work to just set columns back to the same values)

I guess it depends if there'd ever be a trigger on a update that required the movie row to be touched, even if a list item changed.
Reply
#10
I've pushed a couple more commits to:
https://github.com/cg110/xbmc/commits/master

MilhouseVH: in theory it should compile on openelec/linux now, as I've fixed the switch warning and removed the cbegin/cend.

I've also tidied up VideoLibrary changes based on the comments on github.

I've still some tidy up to do in VideoDatabase, but out of time for this evening.

Took me longer than I thought to work out the magic runes to create the patch and pull request in for the bugreport (http://trac.xbmc.org/ticket/14940 )
Reply
#11
(2014-02-17, 23:47)cg110 Wrote: I've pushed a couple more commits to:
https://github.com/cg110/xbmc/commits/master

MilhouseVH: in theory it should compile on openelec/linux now, as I've fixed the switch warning and removed the cbegin/cend.

I've also tidied up VideoLibrary changes based on the comments on github.

I've still some tidy up to do in VideoDatabase, but out of time for this evening.

Took me longer than I thought to work out the magic runes to create the patch and pull request in for the bugreport (http://trac.xbmc.org/ticket/14940 )

Excellent stuff, it builds and works very well (I'll upload a test build shortly to the OpenELEC/R-Pi test thread).

Here's the results when updating three movies (Argo, Kung Fu Panda 2 and Avatar) in a MySQL setup with imdb votes and ratings (using JSON, ie. the imdb option of the script in my sig) - all times in seconds:

Code:
.         Argo      K-F Panda 2     Avatar
Before: 32.069271      5.011988   22.674556
After:   0.256215      0.670386    0.146922

Note that the votes and ratings where bumped for the second (after) run to ensure the updates weren't optimised away.

SQL debug for the "after" updates:
Code:
22:08:19 136.565018 T:2837443664   DEBUG: JSONRPC: Incoming request: {"jsonrpc": "2.0", "params": {"rating": 7.9, "votes": "298,101", "movieid": 19}, "method": "VideoLibrary.SetMovieDetails", "id": "libSetDetails"}
22:08:19 136.759491 T:2837443664   DEBUG: Mysql Start transaction
22:08:19 136.776886 T:2837443664   DEBUG: Mysql execute: update movie set c05='7.900000',c04='298,101' where idMovie=19
22:08:19 136.810257 T:2837443664   DEBUG: Mysql commit transaction
22:08:19 136.820389 T:2837443664   DEBUG: JSONRPC: Incoming request: {"jsonrpc": "2.0", "params": {"votes": "116,469", "movieid": 127}, "method": "VideoLibrary.SetMovieDetails", "id": "libSetDetails"}
22:08:19 136.931686 T:2837443664   DEBUG: Mysql Start transaction
22:08:19 136.942627 T:2837443664   DEBUG: Mysql execute: update movie set c04='116,469' where idMovie=127
22:08:20 137.483810 T:2837443664   DEBUG: Mysql commit transaction
22:08:20 137.490738 T:2837443664   DEBUG: JSONRPC: Incoming request: {"jsonrpc": "2.0", "params": {"votes": "643,358", "movieid": 751}, "method": "VideoLibrary.SetMovieDetails", "id": "libSetDetails"}
22:08:20 137.599854 T:2837443664   DEBUG: Mysql Start transaction
22:08:20 137.610779 T:2837443664   DEBUG: Mysql execute: update movie set c04='643,358' where idMovie=751
22:08:20 137.629288 T:2837443664   DEBUG: Mysql commit transaction
22:08:20 137.640137 T:2837443664    INFO: JSONRPC Server: Disconnection detected

Absolutely outstanding! Smile
Texture Cache Maintenance Utility: Preload your texture cache for optimal UI performance. Remotely manage media libraries. Purge unused artwork to free up space. Find missing media. Configurable QA check to highlight metadata issues. Aid in diagnosis of library and cache related problems.
Reply
#12
Awesome, that's what I hoped it would do Smile

That's heck of a drop for some of those films...
Reply
#13
(2014-02-18, 00:47)cg110 Wrote: That's heck of a drop for some of those films...

Yeah, it's all the cast members that are the killer (218, 32 and 167 respectively).
Texture Cache Maintenance Utility: Preload your texture cache for optimal UI performance. Remotely manage media libraries. Purge unused artwork to free up space. Find missing media. Configurable QA check to highlight metadata issues. Aid in diagnosis of library and cache related problems.
Reply
#14
While I'm comfortable that the code does work for the simple cases it's not fully tested, so be aware of that before pushing it into your elec/rpi build.

Certainly I'm wondering if the list manipulations will work correctly (eg writers) as the transaction starts by deleting everything, and then adds the missing elements.

I expect it to work, but the "old code" transacted the deletes, then did the updates as another transaction, I'm trying to do it all in one. My experience of SQL is that just cos it should work doesn't mean it does Smile

Also your post made me spot a missing short cut. If the list of updatedDetails is empty then don't transact nothing to the sql db. This is possible in the case that it's a refresh that doesn't change anything. You did hint in your testing, that you had to force a change in the rating/votes count for it to do anything.

Anyway I've done an update that makes the Update code just return if there ends up being an empty updatedDetails set.
Reply
#15
I've made a test build available that includes the changes being discussed in this thread. It's based on OpenELEC, and for Raspberry Pi: details here. I've included a warning and flagged it as experimental, testers beware!

I've done further testing (still while updating only imdb votes/rating) and the 32 second and 22 second timings appear to be outliers - maybe the MySQL database wasn't "warmed up", as repeating the same test with a regular non-optimised build results in timings of between 6 seconds (Avatar) and 8 seconds (Argo). Even so, the optimised build is still at least one order of magnitude faster.
Texture Cache Maintenance Utility: Preload your texture cache for optimal UI performance. Remotely manage media libraries. Purge unused artwork to free up space. Find missing media. Configurable QA check to highlight metadata issues. Aid in diagnosis of library and cache related problems.
Reply

Logout Mark Read Team Forum Stats Members Help
Improving movie db update performance.0