Kodi Community Forum

Full Version: Media importing and library integration (UPnP, Emby, Plex, ...)
(2021-03-09, 17:56)Supersilver Wrote: [ -> ]Still can't import large TV show libraries; every time it gets to retrieving episodes from the media provider, it errors out. I guess it will probably never work with large libraries?
What is the error you get? Do you get it immediately when it tries to retrieve episodes or after a while?

It happened to me once or twice that I got a Python exception when trying to retrieve a chunk of media items from an Emby server because something was wrong with the gzip-compressed data. I don't know exactly where these errors come from, since it works 99.9% of the time for me, regardless of whether I import 100, 1000 or 2000 episodes. Maybe I should add retry logic to the import code in the add-on that tries again 3 to 5 times before giving up completely. It certainly has nothing to do with the general import logic, because this happens in the add-on.
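A minimal sketch of the retry idea, assuming the add-on wraps each chunk request in a callable. `with_retries` and `flaky_fetch` are hypothetical names, not part of the media importer add-ons, and which exception types should be retried (here `OSError`, which `gzip.BadGzipFile` subclasses) depends on the HTTP client actually in use.

```python
import time

def with_retries(fetch, attempts=3, delay=0.5, retry_on=(Exception,)):
    """Call fetch() until it succeeds or `attempts` tries have failed.

    The pause between tries doubles after each failure (exponential backoff).
    The last exception is re-raised once all attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay * 2 ** (attempt - 1))
    raise last_error


# Usage: a stand-in for "retrieve a chunk of episodes" that fails twice
# with a corrupt-data style error and then succeeds.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("corrupt gzip data")  # gzip.BadGzipFile subclasses OSError
    return ["episode 1", "episode 2"]

items = with_retries(flaky_fetch, attempts=5, delay=0.01, retry_on=(OSError,))
```

Retrying only on specific exception types keeps genuine bugs (e.g. a `KeyError` in the parsing code) from being silently retried.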
(2021-03-08, 23:40)Montellese Wrote: [ -> ]After spending several hours trying to improve the code responsible for adding / writing media items retrieved from a media provider into Kodi's database, I realized that I only had to adjust the SQL transaction handling to improve performance by a factor of up to 50. I only tested this on SQLite (so I have no idea what the performance with MySQL is), and the improvement varies depending on whether the SQLite database is located on an HDD or an SSD, but either way it should be significant. On my dev computer, importing 60 movies took ~80 s on an HDD and ~9 s on an SSD. With the improvements this went down to ~1.4 s. Now when importing a large library, most of the time is spent retrieving the media items from the media provider; adding the retrieved items to the database is very fast.

And last but not least these new test builds are based on Kodi v19 (final).

Downloads: https://github.com/Montellese/xbmc/wiki/...#downloads
Changelog: https://github.com/Montellese/xbmc/wiki/...erformance

I found the same thing with regard to transactions, specifically commits. A commit after each record insert (or similar) is very slow, and as you observed, how severe the slowness is depends on the storage media. I batch up inserts and then do periodic commits. It made a huge difference in performance, similar to what you've seen. On my Intel NUC with an SSD I can get around 200 inserts a second: 13K records in just over a minute.


Jeff
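The batched-commit approach described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed names (`insert_movies`, a simplified `movie` table), not Kodi's schema or either author's code, and an in-memory database cannot reproduce the HDD/SSD timing differences reported in this thread.

```python
import sqlite3

def insert_movies(conn, titles, batch_size=500):
    """Insert one row per title, committing once per batch instead of per row.

    Returns how many commits were issued."""
    commits = 0
    cur = conn.cursor()
    for i, title in enumerate(titles, start=1):
        # Python's sqlite3 implicitly opens a transaction before an INSERT
        # when none is open, so rows accumulate until the next commit().
        cur.execute("INSERT INTO movie (title) VALUES (?)", (title,))
        if i % batch_size == 0:
            conn.commit()
            commits += 1
    if len(titles) % batch_size:  # flush the final, partial batch
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT)")
commits = insert_movies(conn, [f"Movie {n}" for n in range(1200)], batch_size=500)
rows = conn.execute("SELECT COUNT(*) FROM movie").fetchone()[0]
```

With 1200 rows and a batch size of 500 this issues 3 commits instead of 1200; each commit forces the journal to be flushed to storage, which is why the per-row variant is so sensitive to the disk type.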
(2021-03-09, 20:29)jbinkley60 Wrote: [ -> ]
I found the same thing with regard to transactions, specifically commits. A commit after each record insert (or similar) is very slow, and as you observed, how severe the slowness is depends on the storage media. I batch up inserts and then do periodic commits. It made a huge difference in performance, similar to what you've seen. On my Intel NUC with an SSD I can get around 200 inserts a second: 13K records in just over a minute.


Jeff
I'm not there yet. Initially I also tried to improve performance by using batch inserts, but in the current state of the code it's very difficult because before every INSERT statement the code first performs a SELECT statement to check whether the item already exists. This can be refactored up to a certain point, but it's not possible to get rid of these checks completely because a lot of the relationship data (e.g. actors / cast or genres) requires them to avoid unique constraint violations and to get the ID of the item so that it can be used as a foreign key.

But from my performance tests with the latest release the limiting factor is by far the time it takes to retrieve all the metadata from the media provider. I just imported ~9K movies from a Plex media server and it took 95 minutes of which 85 minutes were spent on retrieving the items and 10 minutes were spent on writing the items to the database.
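To illustrate why the lookup before or after an INSERT cannot simply disappear for relationship data, here is a minimal SQLite sketch of a "get or create" helper for genres. The table and function names are hypothetical and the schema is simplified, not Kodi's. `INSERT OR IGNORE` avoids the unique constraint violation, but a SELECT is still needed to obtain the ID used as a foreign key in the link table.

```python
import sqlite3

def get_or_create_genre(conn, name):
    """Return the ID of genre `name`, inserting it first if it does not exist.

    INSERT OR IGNORE (SQLite syntax; MySQL spells it INSERT IGNORE) skips the
    row instead of raising when the UNIQUE constraint on name would be violated.
    The SELECT that follows is still needed to obtain the ID for the foreign key."""
    conn.execute("INSERT OR IGNORE INTO genre (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT genre_id FROM genre WHERE name = ?", (name,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE genre_link (genre_id INTEGER, movie_id INTEGER);
""")
for movie_id, genres in {1: ["Drama", "Comedy"], 2: ["Drama"]}.items():
    for name in genres:
        genre_id = get_or_create_genre(conn, name)  # ID reused as foreign key
        conn.execute("INSERT INTO genre_link VALUES (?, ?)", (genre_id, movie_id))
conn.commit()
```

This does not reduce the number of statements per item; it only shows the dependency that makes plain multi-row batch inserts hard here.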
(2021-03-10, 19:13)Montellese Wrote: [ -> ]
I just imported ~9K movies from a Plex media server and it took 95 minutes, of which 85 minutes were spent retrieving the items and 10 minutes writing them to the database.
I'd think that PlexKodiConnect is a couple of orders of magnitude faster than that, though I can't test on a large library. It took months to improve... Have you looked at its code, specifically the MySQL and Plex metadata retrieval parts?
(2021-03-10, 22:47)Croneter Wrote: [ -> ]
(2021-03-10, 19:13)Montellese Wrote: [ -> ]
I just imported ~9K movies from a Plex media server and it took 95 minutes, of which 85 minutes were spent retrieving the items and 10 minutes writing them to the database.
I'd think that PlexKodiConnect is a couple of orders of magnitude faster than that, though I can't test on a large library. It took months to improve... Have you looked at its code, specifically the MySQL and Plex metadata retrieval parts?
I don't doubt it. Both the Emby and Plex media importer add-ons I wrote are just there to showcase what is possible with my media import work. I'm not a Python programmer, and I'm neither an Emby nor a Plex specialist, or even a user. So I just went with the simplest way to get to the data, which in the case of Plex means using python-plexapi. So when it comes to the performance of retrieving media items from a Plex server, the add-on is completely dependent on the performance of python-plexapi. My hope is that people with a lot more expertise will jump in and either take over the media importer add-ons I wrote or write completely new ones.
(2021-03-10, 19:13)Montellese Wrote: [ -> ]
I'm not there yet. Initially I also tried to improve performance by using batch inserts, but in the current state of the code it's very difficult because before every INSERT statement the code first performs a SELECT statement to check whether the item already exists. This can be refactored up to a certain point, but it's not possible to get rid of these checks completely because a lot of the relationship data (e.g. actors / cast or genres) requires them to avoid unique constraint violations and to get the ID of the item so that it can be used as a foreign key.

But from my performance tests with the latest release the limiting factor is by far the time it takes to retrieve all the metadata from the media provider. I just imported ~9K movies from a Plex media server and it took 95 minutes of which 85 minutes were spent on retrieving the items and 10 minutes were spent on writing the items to the database.

I do all of that in my code and still do batch commits. I'll do an insert, then a select to see if it was inserted, and grab a value for the next piece. I do a path table check/insert, a file table check/insert, then an episode/movie table check/insert, then genre, then actor and artwork tables, then stream details, etc. With Kodi 19 I am getting around 30 records at a time from Mezzmo; in Kodi 18 I could get well over 1,000 in a pull. All of that gets written to the database and then committed. Like you, I saw that doing a commit after each insert was terribly slow. The same code running on a Raspberry Pi 4 gets around 14 inserts a second, and around 42 a second on a Vero 4K+. All of these times include the fetch time from Mezzmo. One thing I did was ask the Mezzmo developers to create a special query where I can pull any or all (by index) records from Mezzmo in a single request, sorted from newest to oldest.

To optimize my code I started with commits after everything. Then, starting at the innermost loop, I removed the commits, ensured the data was still written properly, moved on to the next loop, etc. I am happy to help with the SQL stuff.


Jeff
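The check/insert chain described above (path, then file, then movie, each step using the ID from the previous one, with a single commit per chunk) can be sketched as follows. The schema and the helper names `check_insert` and `write_chunk` are simplified assumptions for illustration, not the Mezzmo add-on's code or Kodi's real tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE path  (path_id  INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE file  (file_id  INTEGER PRIMARY KEY, path_id INTEGER, filename TEXT,
                    UNIQUE (path_id, filename));
CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, file_id INTEGER, title TEXT);
"""

def check_insert(cur, select_sql, insert_sql, params):
    """Return the ID of an existing row, or insert a new row and return its ID."""
    row = cur.execute(select_sql, params).fetchone()
    if row is not None:
        return row[0]
    cur.execute(insert_sql, params)
    return cur.lastrowid

def write_chunk(conn, records):
    """Write one chunk of (path, filename, title) records in a single transaction."""
    with conn:  # commits once when the block ends, rolls back on an exception
        cur = conn.cursor()
        for path, filename, title in records:
            path_id = check_insert(
                cur,
                "SELECT path_id FROM path WHERE path = ?",
                "INSERT INTO path (path) VALUES (?)",
                (path,))
            file_id = check_insert(
                cur,
                "SELECT file_id FROM file WHERE path_id = ? AND filename = ?",
                "INSERT INTO file (path_id, filename) VALUES (?, ?)",
                (path_id, filename))
            cur.execute("INSERT INTO movie (file_id, title) VALUES (?, ?)",
                        (file_id, title))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
write_chunk(conn, [("/movies/", "a.mkv", "A"), ("/movies/", "b.mkv", "B")])
```

Using the connection as a context manager keeps each chunk atomic: either all rows of the chunk are committed or none are, which also matches the "commit after each pull" pattern.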
@jbinkley60,
It would be quite illuminating to have a comparison between the current Mezzmo addon and a Mezzmo mediaimporter addon. Currently we can only compare highly optimized addons for Plex and Emby with what are basically their proof-of-concept counterparts for MediaImport. From my testing, retrieving items from the media server takes significantly more time than writing them to the database. This suggests that there would be a much greater payoff in optimizing retrieval rather than further optimizing writing to the database.

This does not mean that the code for writing to the database could not be optimized further, just that the data so far suggests looking at the interaction with the server rather than with Kodi. Since you have intimate knowledge of the current Mezzmo addon, it would be more obvious to you where the bottleneck in the import process is if a Mezzmo mediaimporter does not perform as well as the current one. At present we don't know. Hopefully you can find some time to have a go at it.
(2021-03-11, 19:45)LongMan Wrote: [ -> ]@jbinkley60,
It would be quite illuminating to have a comparison between the current Mezzmo addon and a Mezzmo mediaimporter addon. Currently we can only compare highly optimized addons for Plex and Emby with what are basically their proof-of-concept counterparts for MediaImport. From my testing, retrieving items from the media server takes significantly more time than writing them to the database. This suggests that there would be a much greater payoff in optimizing retrieval rather than further optimizing writing to the database.

This does not mean that the code for writing to the database could not be optimized further, just that the data so far suggests looking at the interaction with the server rather than with Kodi. Since you have intimate knowledge of the current Mezzmo addon, it would be more obvious to you where the bottleneck in the import process is if a Mezzmo mediaimporter does not perform as well as the current one. At present we don't know. Hopefully you can find some time to have a go at it.

My goal is to eventually find the time to work on a MediaImporter interface for the Mezzmo addon. My "part time" consulting gig has recently become well over full time. I am trying to periodically spend some time looking at the code in the Emby MediaImporter addon. I find it a bit surprising that retrieving data from the UPnP servers is slow, and especially that it is slower than writing to the database. I am seeing the opposite with the Mezzmo addon: it behaves like traditional databases, where reading is faster than writing. I have a 150-item Mezzmo playlist I use for testing. It loads in about 0.3 s on a NUC and on Kodi on a Windows PC, and in less than 4 seconds on a Raspberry Pi 4 and a Vero 4K+. I am still working my way through the Emby code, so I plan to look at the retrieval functions. I agree that comparing against highly optimized addons may not be the best comparison, but it will give an indication of the potential.


Jeff
Fingers crossed
I may have found the culprit for the remaining time it takes to write to the database. For ~1k movies it currently takes ~250 s to retrieve them and ~90 s to write them to the database. If I remove the SQL code which writes the movies' actors to the database, this goes down to ~5 s. I tried to break it down further, but it seems like the impact comes primarily from the INSERT INTO actor queries, which have to be executed a lot because every movie has around 20 to 40 actors, resulting in ~30k INSERT INTO actor statements.
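One possible way to reduce the number of actor statements, sketched under assumptions: keep an in-memory name → ID cache so an actor who appears in many movies only hits the `actor` table once, and write the link rows with a single `executemany`. The tables below are simplified and only loosely named after Kodi's `actor` / `actor_link`; this is not how Kodi's C++ video database code works, and how much it helps depends on how many actors repeat across movies.

```python
import sqlite3

def write_cast(conn, movies, actor_cache):
    """Link each movie to its actors, touching the actor table once per new name.

    `movies` maps movie_id -> list of actor names in cast order; `actor_cache`
    maps name -> actor_id and must be preloaded with the actors already stored,
    otherwise the INSERT of an existing name would hit the UNIQUE constraint."""
    cur = conn.cursor()
    links = []
    for movie_id, names in movies.items():
        for order, name in enumerate(names):
            actor_id = actor_cache.get(name)
            if actor_id is None:  # only unseen actors cost an INSERT
                cur.execute("INSERT INTO actor (name) VALUES (?)", (name,))
                actor_id = cur.lastrowid
                actor_cache[name] = actor_id
            links.append((actor_id, movie_id, order))
    cur.executemany(
        "INSERT INTO actor_link (actor_id, media_id, cast_order) VALUES (?, ?, ?)",
        links)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE actor_link (actor_id INTEGER, media_id INTEGER, cast_order INTEGER);
""")
cache = dict(conn.execute("SELECT name, actor_id FROM actor"))  # preload existing
write_cast(conn, {1: ["Actor A", "Actor B"], 2: ["Actor B", "Actor C"]}, cache)
```

The cache trades memory for round trips; for a library with tens of thousands of distinct actors it still stays small compared to the metadata itself.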

Concerning retrieval I have to be honest that I haven't done any profiling there yet. There are several things which could be slow:
  • Retrieving the data via REST API from Emby (JSON) / Plex (XML) servers
  • Converting the data into ListItems
  • Passing the ListItems from the add-on to Kodi core.
So I'll try to find a way to profile these different steps to see which parts are taking long.
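A simple way to attribute time to the three steps listed above is to wrap each one in a timing context manager. This is a minimal sketch: the `stage` helper is hypothetical, and the list comprehensions only stand in for the REST call, the ListItem construction and the hand-off to Kodi core. For function-level detail, Python's standard `cProfile` module is an alternative.

```python
import time
from contextlib import contextmanager

@contextmanager
def stage(timings, name):
    """Accumulate the wall-clock time spent inside the block under timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


timings = {}
total = 0
for chunk in range(3):
    with stage(timings, "retrieve"):
        raw = [{"Name": f"Movie {chunk}-{i}"} for i in range(1000)]  # stands in for the REST request
    with stage(timings, "convert"):
        items = [{"title": r["Name"]} for r in raw]  # stands in for building ListItems
    with stage(timings, "hand off"):
        total += len(items)  # stands in for passing the items to Kodi core

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s")
```

Because the times are accumulated per stage across all chunks, the printed totals directly show which of the three steps dominates a full import.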
OK, I spent some time profiling the add-on code and the interface to Kodi. For mediaimporter.emby, retrieving all items from the Emby server using the REST API is very fast, and ~90% of the time is spent putting together ListItems. For mediaimporter.plex, retrieving all items from the Plex Media Server (using python-plexapi) is rather slow.

So I'm currently focusing on the python interface to Kodi because it benefits all media importer add-ons.
  • With some basic modifications to the code which generates the python bindings I can improve the performance of putting together ListItems by ~25%. As a bonus these changes should also speed up any other python add-on / plugin which operates on ListItems.
  • By introducing more specific interface methods (instead of using methods like ListItem.setInfo(), which takes a huge dict and has to take it apart again), I managed to gain another 25%. The problem is that they are not safe to use for Python plugins and other add-ons that directly modify ListItems retrieved from Kodi core. So now the question is whether I should introduce new classes / structures to represent items coming from media importer add-ons so that I can freely improve performance. If I did this it would be a lot more difficult for existing add-ons already using ListItems to integrate media import. Or maybe there is a third way which I haven't figured out yet.
Quote:If I did this it would be a lot more difficult for existing add-ons already using ListItems to integrate media import. Or maybe there is a third way which I haven't figured out yet.

If this path is chosen would something like the mediaimport helper that I suggested remove that limitation by providing an interface to the new methods?
(2021-03-12, 16:12)Montellese Wrote: [ -> ]OK, I spent some time profiling the add-on code and the interface to Kodi. For mediaimporter.emby, retrieving all items from the Emby server using the REST API is very fast, and ~90% of the time is spent putting together ListItems. For mediaimporter.plex, retrieving all items from the Plex Media Server (using python-plexapi) is rather slow.

So I'm currently focusing on the python interface to Kodi because it benefits all media importer add-ons.
  • With some basic modifications to the code which generates the python bindings I can improve the performance of putting together ListItems by ~25%. As a bonus these changes should also speed up any other python add-on / plugin which operates on ListItems.
  • By introducing more specific interface methods (instead of using methods like ListItem.setInfo(), which takes a huge dict and has to take it apart again), I managed to gain another 25%. The problem is that they are not safe to use for Python plugins and other add-ons that directly modify ListItems retrieved from Kodi core. So now the question is whether I should introduce new classes / structures to represent items coming from media importer add-ons so that I can freely improve performance. If I did this it would be a lot more difficult for existing add-ons already using ListItems to integrate media import. Or maybe there is a third way which I haven't figured out yet.

Hi, I have been working on add-ons for Emby->Kodi for a few years now with my current active dev going into (https://github.com/faush01/plugin.video.embycon), this comment here:

"With some basic modifications to the code which generates the python bindings I can improve the performance of putting together ListItems by ~25%. As a bonus these changes should also speed up any other python add-on / plugin which operates on ListItems."

Is WOW. If this helps and speeds up dynamic ListItem generation for displaying items, that is great. The add-on I created (EmbyCon) just uses dynamic ListItem creation to show items; it does not sync to the local DB at all, so if this speeds up displaying lists of ListItems by 25%, that is very welcome :-)

We had a big jump in performance when we could create ListItems in offscreen mode (xbmcgui.ListItem(list_item_name, offscreen=True)), which helped a great deal as it did not need to sync the display refreshes.

"So now the question is whether I should introduce new classes / structures to represent items coming from media importer add-ons to be able to freely improve performance or not."

Whatever changes you make for performance, please also make them for dynamic ListItem video add-ons; populating ListItems has always been the bottleneck for EmbyCon when displaying large movie lists.

On the REST API front, are you retrieving ALL the data in one request or are you having to make multiple calls for each movie? Multiple calls is going to be very slow.
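The difference between one request per item and paged requests can be sketched as below. `fetch_all` and `fake_fetch_page` are hypothetical: the fake server simply slices a list and returns the total count, standing in for a paged REST endpoint, and no real Emby or Plex parameters are shown.

```python
def fetch_all(fetch_page, page_size=500):
    """Retrieve every item with one request per page instead of one per item.

    `fetch_page(start, limit)` is a hypothetical callable returning
    (items, total_count) for the requested window."""
    items, start = [], 0
    while True:
        page, total = fetch_page(start, page_size)
        items.extend(page)
        start += len(page)
        if not page or start >= total:  # stop on an empty page or when done
            return items


LIBRARY = [f"Movie {n}" for n in range(1234)]
requests_made = 0

def fake_fetch_page(start, limit):
    """Stand-in for one paged REST request against the media server."""
    global requests_made
    requests_made += 1
    return LIBRARY[start:start + limit], len(LIBRARY)

movies = fetch_all(fake_fetch_page, page_size=500)
```

Here 1234 items arrive in 3 requests; fetching them one by one would take 1234 round trips, which is where per-item calls become very slow.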
(2021-03-13, 03:20)null_pointer Wrote: [ -> ]Hi, I have been working on add-ons for Emby->Kodi for a few years now with my current active dev going into (https://github.com/faush01/plugin.video.embycon), this comment here:

"With some basic modifications to the code which generates the python bindings I can improve the performance of putting together ListItems by ~25%. As a bonus these changes should also speed up any other python add-on / plugin which operates on ListItems."

Is WOW. If this helps and speeds up dynamic ListItem generation for displaying items, that is great. The add-on I created (EmbyCon) just uses dynamic ListItem creation to show items; it does not sync to the local DB at all, so if this speeds up displaying lists of ListItems by 25%, that is very welcome :-)
Well I can't make any promises but it mainly benefits API methods which take a list or dict as a parameter. I saw a major improvement for ListItem.setCast(). I'll PR it to Kodi mainline ASAP.

(2021-03-13, 03:20)null_pointer Wrote: [ -> ]We had a big jump in performance when we could create ListItems in offscreen mode (xbmcgui.ListItem(list_item_name, offscreen=True)), which helped a great deal as it did not need to sync the display refreshes.
I realized that right after writing my previous post. It provides another improvement.

(2021-03-13, 03:20)null_pointer Wrote: [ -> ]"So now the question is whether I should introduce new classes / structures to represent items coming from media importer add-ons to be able to freely improve performance or not."

Whatever changes you make for performance, please also make them for dynamic ListItem video add-ons; populating ListItems has always been the bottleneck for EmbyCon when displaying large movie lists.
That's my intention but if all the onscreen / offscreen handling is causing too much overhead I'll have to come up with another approach. But I'm still investigating.

(2021-03-13, 03:20)null_pointer Wrote: [ -> ]On the REST API front, are you retrieving ALL the data in one request or are you having to make multiple calls for each movie? Multiple calls is going to be very slow.
With mediaimporter.emby REST API retrieval is not an issue and is very fast. Only matching movies to collections is slow because I have to retrieve the children / movies of every collection separately.

But for mediaimporter.plex retrieval is currently very slow so I'll have to look into that...
(2021-03-12, 02:04)LongMan Wrote: [ -> ]Fingers crossed

In order to help with future comparisons when I am able to add the MediaImporter interface, I added a few lines of performance logging to the Mezzmo addon and made it a setting which enables logging to the Kodi logfile. Here are a few sample outputs from browsing on Kodi running on a NUC, on Windows on a PC, and on a Raspberry Pi 4, respectively. It was just a few lines of code, and I had always wanted to do it vs. my old method of dropping into debug and looking at timestamps.


 Kodi on NUC LibreElec
2021-03-13 14:56:09.725 T:140169445357312  NOTICE: Mezzmo stats: Playlist name is Recently Added
2021-03-13 14:56:09.725 T:140169445357312  NOTICE: Mezzmo stats: 0.71s server time  0.62s parsing time
2021-03-13 14:56:09.725 T:140169445357312  NOTICE: Mezzmo stats: 150 items displayed in 1.33s = 112.87 items/sec
 
Kodi on a Windows PC
2021-03-13 03:43:34.312 T:7556  NOTICE: Mezzmo stats: Playlist name is: Recently Added
2021-03-13 03:43:34.312 T:7556  NOTICE: Mezzmo stats: 0.71s server time  0.29s parsing time
2021-03-13 03:43:34.312 T:7556  NOTICE: Mezzmo stats: 150 items displayed in: 1.00s = 150.00 items/sec
 
Kodi on Raspberry Pi 4 LibreElec
2021-03-13 10:08:40.784 T:1921807216  NOTICE: Mezzmo stats: Playlist name is Recently Added
2021-03-13 10:08:40.784 T:1921807216  NOTICE: Mezzmo stats: 1.90s server time  2.26s parsing time
2021-03-13 10:08:40.784 T:1921807216  NOTICE: Mezzmo stats: 150 items displayed in 4.16s = 36.08 items/sec



Jeff
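A minimal sketch of how stats lines in the style of the log excerpt above could be produced. `format_stats` is a hypothetical helper, not the Mezzmo addon's actual code; it returns strings instead of calling Kodi's logging API so it runs outside Kodi. The rate is computed from the unrounded timings, which is presumably why the logged 112.87 items/sec is not exactly 150 / 1.33 s.

```python
import time

def format_stats(playlist, server_s, parse_s, count):
    """Build the three stats lines in the style of the log excerpt above."""
    total = server_s + parse_s
    rate = count / total if total > 0 else 0.0  # avoid dividing by zero
    return [
        f"Mezzmo stats: Playlist name is {playlist}",
        f"Mezzmo stats: {server_s:.2f}s server time  {parse_s:.2f}s parsing time",
        f"Mezzmo stats: {count} items displayed in {total:.2f}s = {rate:.2f} items/sec",
    ]


# Timing sketch: measure each phase with a monotonic clock.
t0 = time.perf_counter()
payload = ["item"] * 150                 # stands in for the server request
t1 = time.perf_counter()
parsed = [p.upper() for p in payload]    # stands in for parsing / ListItem creation
t2 = time.perf_counter()
lines = format_stats("Recently Added", t1 - t0, t2 - t1, len(parsed))
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and has higher resolution, which matters for sub-second measurements like these.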