Best method to cache / save listitem object
#1
I'm working on updating my addon to be compatible with both Python 2 and Python 3.  In addition to that, I'm trying to make it a little more efficient in returning listitem results.

This addon is different than most in that it can return a lot of results (on the order of thousands) for the user depending on how they use the addon.  My goal is to try and make my plugin as efficient as possible in returning listitems.  Currently, the addon works like this:

1)  User selects a listitem to pull up a larger list of items
2)  Plugin parses an xml file into a dict
3)  The dict is saved (using Python pickle) to the plugin userdata folder so it doesn't have to be re-parsed if the user comes back to the same list
4)  The dict is massaged to populate listitem objects and then add them to the directory
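
Steps 2 and 3 above can be sketched as a small load-or-parse helper. This is only a minimal sketch, not the addon's actual code: `load_or_parse`, `fake_parse` and the cache file name are hypothetical, and the real XML parser would be passed in as the callable.

```python
import os
import pickle
import tempfile


def load_or_parse(cache_path, parse_func):
    """Return the cached data if present, otherwise parse and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as handle:
            return pickle.load(handle)
    data = parse_func()
    with open(cache_path, 'wb') as handle:
        # Protocol 2 is binary and understood by both Python 2 and Python 3.
        pickle.dump(data, handle, protocol=2)
    return data


# Stand-in parser (hypothetical); a real addon would parse its XML file here.
calls = []


def fake_parse():
    calls.append(1)
    return {'Mario': {'label': 'Mario', 'genre': 'Platform'}}


cache_file = os.path.join(tempfile.mkdtemp(), 'games.pickle')
first = load_or_parse(cache_file, fake_parse)   # parses and writes the cache
second = load_or_parse(cache_file, fake_parse)  # served from the cache file
```

The second call never touches the parser, so only the listitem population in step 4 remains as per-visit work.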

I've found in timing tests during debugging that step 4 above can take quite a long time if the list is long.  Is it possible to change this so that the listitem objects themselves are saved (to disk or to RAM), so I can speed things up even more?  As far as I can tell, pickle will not work because the object is not serializable (at least in Python 2; maybe it is possible with Python 3?).

Has anyone else come up with an efficient method to save listitem objects?

Thanks!
Reply
#2
Just to add to this.   I've played around with this a little more and found cloudpickle.  A simple test appears to work using it, but unfortunately the results saved / returned are empty:
python:

import resources.lib.cloudpickle as cloudpickle
try:
    import cPickle as pickle  # faster C implementation on Python 2
except ImportError:
    import pickle  # Python 3
...
# Build the ListItems, then serialize the whole list with cloudpickle
test1 = get_games_as_listitems()
test1_cloudpickle = cloudpickle.dumps(test1)
print('test1')
print(test1)
print(len(test1))
print(test1[0].getLabel())

# Deserialize and inspect what came back
test2_cloudpickle = pickle.loads(test1_cloudpickle)
print('test2')
print(test2_cloudpickle)
print(len(test2_cloudpickle))
print(test2_cloudpickle[0].getLabel())

Returns:
xml:

DEBUG: test1
DEBUG: [<kodi_six.utils.wrapped_class_ListItem object at 0x7ffbb81dea70>, ...  <kodi_six.utils.wrapped_class_ListItem object at 0x7ffbb7e4f5f0>]
DEBUG: 50
DEBUG: Mario

DEBUG: test2
DEBUG: [<kodi_six.utils.wrapped_class_ListItem object at 0x7ffbb81dea70>, ...  <kodi_six.utils.wrapped_class_ListItem object at 0x7ffbb7e4f5f0>]
DEBUG: 50

Reply
#3
Alas, miracles aren't possible. You cannot pickle a Python object that does not support the pickling protocol, especially C++ classes exposed via the Python C API, because they cannot even be introspected. No magic library can change that. You are using the wrong approach. Instead of saving the resulting ListItems (which is not possible), you should save the data that you create your ListItems from.
Reply
#4
Thanks for the input.  Sounds like I'm doing mostly all that can be done then to make listitem displaying from a plugin as efficient as possible.
Reply
#5
In Kodi 18 there is a new offscreen param when creating ListItems that can dramatically speed up ListItem creation and population.

list_item = xbmcgui.ListItem(listItemName, offscreen=True)
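
If the same addon also has to run on Kodi 17, one possible pattern is a fallback around the constructor. This is a minimal sketch under an assumption: that the older constructor raises `TypeError` for the unknown keyword. `make_item` and both stand-in classes are hypothetical and exist only so the sketch runs outside Kodi; inside Kodi, `xbmcgui.ListItem` would be passed as `factory`.

```python
def make_item(factory, label):
    """Create an item with offscreen=True, falling back if unsupported."""
    try:
        return factory(label, offscreen=True)
    except TypeError:
        # Assumption: an older constructor rejects the unknown keyword.
        return factory(label)


class NewStyleItem(object):
    """Stand-in for a ListItem that accepts offscreen (hypothetical)."""

    def __init__(self, label='', offscreen=False):
        self.label = label
        self.offscreen = offscreen


class OldStyleItem(object):
    """Stand-in for a ListItem without the offscreen keyword (hypothetical)."""

    def __init__(self, label=''):
        self.label = label
        self.offscreen = False


new_item = make_item(NewStyleItem, 'Mario')
old_item = make_item(OldStyleItem, 'Mario')
```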
Reply
#6
Hi Zach,

I think your addon and mine have a lot in common. Here are some lessons learned from AEL/AML:

1) Use JSON instead of XML. JSON files load as much as 10 times faster than XML!

2) Precompute everything: you can still keep your main database in XML. However, create a JSON cache for every listitem in IARL every time some XML changes. This JSON cache will hold all the precomputed information required to render each item in the list (avoid tests when generating the listitem elements, for example file existence checks and the like; always precompute them in a cache).

3) Minimise the number of calls to the Kodi API: use constructions like this

python:
listitem.setArt({'title' : machine_assets['title'], 'snap' : machine_assets['snap'],
     'boxfront' : machine_assets['cabinet'], 'boxback' : machine_assets['cpanel'],
     'cartridge' : machine_assets['PCB'], 'flyer' : machine_assets['flyer'] })
# >> Kodi official artwork fields
listitem.setArt({'icon' : icon_path, 'fanart' : fanart_path, 'banner' : banner_path, 'clearlogo' : clearlogo_path, 'poster' : poster_path })

instead of calling the API multiple times.

4) Think carefully about the Python data types you need for every variable in your addon. For example, if you need to check many times whether a string is in a list of strings, using a list is slow. Convert the list to a dictionary or a set (StackOverflow is always your friend for these kinds of questions).
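
Point 4 can be checked with a quick measurement. The sketch below uses made-up ROM names and only the standard library; the absolute timings depend on the machine, so no numbers are claimed here.

```python
import timeit

names = ['rom_%05d' % i for i in range(20000)]
names_set = set(names)   # built once, then reused for every lookup
target = 'rom_19999'     # worst case for the list: the last element

# A list is scanned element by element; a set uses a hash lookup.
list_time = timeit.timeit(lambda: target in names, number=200)
set_time = timeit.timeit(lambda: target in names_set, number=200)
print('list: %.5f s, set: %.5f s' % (list_time, set_time))
```

The conversion itself costs one pass over the list, so it pays off when the same collection is queried many times.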

-----------------------------

Following all these guidelines, lists of 1,000 to 2,000 items render almost instantly in AEL/AML. Lists with 30,000 items (MAME Software List Amstrad CPC floppies, for example) render in about 8 seconds on an Intel NUC i5 that is about 4 years old. I believe these are acceptable loading times.
Reply
#7
(2018-03-09, 10:06)Wintermute0110 Wrote: 1) Use JSON instead of XML. JSON files load as much as 10 times faster than XML!
If you need to store your Python data locally, JSON makes sense only if you want your files to be human-readable and/or to exchange them with other programs. Otherwise, pickling with protocol v.2 is a better choice. By default, Python 2 uses protocol v.0, an ASCII-only format, while v.2 is a pure binary format that you have to set explicitly.
And for structured data you should consider using SQLite.
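
For reference, the protocol is chosen with the `protocol` argument of `pickle.dumps` / `pickle.dump`. A minimal sketch with made-up data:

```python
import pickle

data = {'label': 'Mario', 'genre': 'Platform', 'year': 1985}

ascii_blob = pickle.dumps(data, protocol=0)   # ASCII-based protocol
binary_blob = pickle.dumps(data, protocol=2)  # binary, requested explicitly

restored = pickle.loads(binary_blob)
print('protocol 0: %d bytes, protocol 2: %d bytes'
      % (len(ascii_blob), len(binary_blob)))
```

`pickle.loads` detects the protocol on its own, so only the writing side needs the argument.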
(2018-03-09, 10:06)Wintermute0110 Wrote: 4) Think carefully about the Python data types you need for every variable in your addon. For example, if you need to check many times whether a string is in a list of strings, using a list is slow. Convert the list to a dictionary or a set (StackOverflow is always your friend for these kinds of questions).

Lists and dictionaries (hash maps) are not interchangeable and each data structure has its own purpose. The only advice that I can give here is to avoid building lists dynamically (which is a common rookie mistake in Python) at all costs and use Python generators instead:
Bad:
python:
def get_items():
    listing = []
    for item in some_module.get_raw_items():
        listing.append(item)
    return listing
Good:
python:
def get_items():
    for item in some_module.get_raw_items():
        yield item

The only disadvantage of generators is that they are not serializable/picklable, so collections of data still need to be saved as lists.
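
A short illustration of that limitation, using made-up item names: pickling a generator object fails with `TypeError`, so the items have to be materialized into a list before they are saved.

```python
import pickle


def get_items():
    for item in ['Mario', 'Zelda', 'Metroid']:
        yield item


try:
    pickle.dumps(get_items())
    generator_pickled = True
except TypeError:
    # Generator objects do not support pickling.
    generator_pickled = False

# Materialize into a list first, then pickle the list.
blob = pickle.dumps(list(get_items()), protocol=2)
restored = pickle.loads(blob)
```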
Reply
#8
(2018-03-09, 12:44)Roman_V_M Wrote:
(2018-03-09, 10:06)Wintermute0110 Wrote: 1) Use JSON instead of XML. JSON files load as much as 10 times faster than XML!
If you need to store your Python data locally, JSON makes sense only if you want your files to be human-readable and/or to exchange them with other programs. Otherwise, pickling with protocol v.2 is a better choice. By default, Python 2 uses protocol v.0, an ASCII-only format, while v.2 is a pure binary format that you have to set explicitly.
And for structured data you should consider using SQLite.  
ROM Collector Browser uses SQLite and anyone can see the results (for big collections, more than 1000 ROMs). Advanced Launcher used XML and again anyone can see the results. When doing the initial AL to AEL code refactoring I did a lot of research and JSON for ROM database storage is the solution for fast loading.

Also, it could be fine to have a SQLite database for ROM storage. But then, after the database changes (after a ROM scan or scrape), precompute the possible views of the plugin and save them as JSON: for example, a listitem with all ROMs, a listitem with only the parent ROMs, etc., each in a different JSON file.
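
The idea of one precomputed JSON file per view could look roughly like this. It is a sketch, not AEL/AML code: `write_views`, `load_view`, the view names and the sample records are all hypothetical.

```python
import json
import os
import tempfile

# Stand-in for the main database (SQLite or XML in a real addon).
roms = [
    {'name': 'Mario', 'parent': None},
    {'name': 'Mario (Rev A)', 'parent': 'Mario'},
    {'name': 'Zelda', 'parent': None},
]


def write_views(roms, cache_dir):
    """Precompute one JSON file per view after the database changes."""
    views = {
        'all_roms': roms,
        'parents_only': [r for r in roms if r['parent'] is None],
    }
    for view_name, items in views.items():
        path = os.path.join(cache_dir, view_name + '.json')
        with open(path, 'w') as handle:
            json.dump(items, handle)


def load_view(view_name, cache_dir):
    """Load a precomputed view; no filtering happens at render time."""
    with open(os.path.join(cache_dir, view_name + '.json')) as handle:
        return json.load(handle)


cache_dir = tempfile.mkdtemp()
write_views(roms, cache_dir)
parents = load_view('parents_only', cache_dir)
```

The filtering cost is paid once per database change instead of once per directory listing.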
 
(2018-03-09, 12:44)Roman_V_M Wrote:
(2018-03-09, 10:06)Wintermute0110 Wrote: 4) Think carefully about the Python data types you need for every variable in your addon. For example, if you need to check many times whether a string is in a list of strings, using a list is slow. Convert the list to a dictionary or a set (StackOverflow is always your friend for these kinds of questions).

Lists and dictionaries (hash maps) are not interchangeable and each data structure has its own purpose. The only advice that I can give here is to avoid building lists dynamically (which is a common rookie mistake in Python) at all costs and use Python generators instead:
Bad:
python:
def get_items():
    listing = []
    for item in some_module.get_raw_items():
        listing.append(item)
    return listing
Good:
python:
def get_items():
    for item in some_module.get_raw_items():
        yield item

The only disadvantage of generators is that they are not serializable/picklable, so collections of data still need to be saved as lists.
That's a good tip! I originally come from the C/C++ world and although I understand generators I am not 100% comfortable with them, and use them rarely. I will do some tests to see if I can improve performance of AML/AEL further.
Reply
#9
(2018-03-09, 14:07)Wintermute0110 Wrote: ROM Collector Browser uses SQLite and anyone can see the results (for big collections, more than 1000 ROMs). Advanced Launcher used XML and again anyone can see the results. When doing the initial AL to AEL code refactoring I did a lot of research and JSON for ROM database storage is the solution for fast loading.
I don't think it is SQLite vs. JSON or XML. Reading large sets of data from SQLite with a single query takes a few milliseconds. I think I am doing something inefficient while populating the list in RCB. RCB needs around 20-30 seconds to populate 5000 items. That's OK for me but far from your numbers. Maybe I will dig into this once more. (Although I think this is up to the user. I don't see much sense in having such large result sets.)
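
To illustrate the single-query point, here is a self-contained sketch using an in-memory database; the table layout and the data are made up and are not RCB's schema.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE roms (name TEXT, genre TEXT)')
conn.executemany(
    'INSERT INTO roms VALUES (?, ?)',
    [('rom_%04d' % i, 'Platform') for i in range(5000)],
)
conn.commit()

# One query fetches every row; avoid issuing a separate query per item.
rows = conn.execute('SELECT name, genre FROM roms ORDER BY name').fetchall()
conn.close()
```

If the read itself is fast, the remaining time is more likely spent turning rows into listitems than in the database.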
Reply
#10
(2018-03-09, 08:51)null_pointer Wrote: In Kodi 18 there is a new offscreen param when creating ListItems that can dramatically speed up ListItem creation and population.

list_item = xbmcgui.ListItem(listItemName, offscreen=True)

BTW, as I understand it, this parameter is intended for ListItems that are not going to be displayed on the screen, like those in subtitle plugins (they are plugins from an implementation POV, despite being classified as "services") and future Python scraper plugins.
Reply
#11
Thanks all for these comments.  I've started modifying some of my code and can already see improvement.

With Kodi v17 or earlier, I agree that minimization of listitem API calls is key, as that seems to take up 80% of the processing time.  My method is slightly tweaked now roughly as follows:
1)  User selects action to call for a return of large number of listitems
2)  Plugin parses an xml file into a dict for those listitems
3)  The dict is massaged into three parts to allow the minimum number of API calls on xbmcgui.ListItem.  Now I have a dict that looks like:
dict['values'] #Values for the listitem like Label, Label2, path
dict['info'] #Info properties for the listitem like genre, date, etc
dict['art'] #Art properties for the listitem art like thumb, poster, etc
4)  The dict is saved (using Python pickle) to the plugin userdata folder so it doesn't have to be re-parsed if the user comes back to the same list
5)  The listitem is populated using 3 xbmcgui.ListItem calls (initial listitem object creation, setInfo, setArt)

1-4 Alternative) User comes back to the same action to return the same listitems, now the plugin loads the pickled dict that's ready to populate the listitems
5) The listitem is populated just as it was before
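
The three-part record and the three calls per item can be sketched as follows. `StubListItem` is a hypothetical stand-in so the example runs outside Kodi, and `build_item` is an illustrative name; inside Kodi, `xbmcgui.ListItem` would be passed as `factory`. The sample values are made up.

```python
class StubListItem(object):
    """Stand-in for xbmcgui.ListItem so the sketch runs outside Kodi."""

    def __init__(self, label='', label2='', path=''):
        self.values = {'label': label, 'label2': label2, 'path': path}
        self.info = {}
        self.art = {}
        self.api_calls = 1  # the constructor counts as one call

    def setInfo(self, media_type, info):
        self.info = dict(info)
        self.api_calls += 1

    def setArt(self, art):
        self.art = dict(art)
        self.api_calls += 1


def build_item(record, factory, media_type='video'):
    """Populate one item from a precomputed record using three calls."""
    item = factory(**record['values'])
    item.setInfo(media_type, record['info'])
    item.setArt(record['art'])
    return item


record = {
    'values': {'label': 'Mario', 'label2': 'Platform', 'path': ''},
    'info': {'genre': 'Platform', 'year': 1985},
    'art': {'thumb': 'mario_thumb.png', 'poster': 'mario_poster.png'},
}
item = build_item(record, StubListItem)
```

Because each part of the record is already shaped like the argument of the call that consumes it, no per-item massaging is left for render time.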

As for the offscreen option, I found some discussion on that here.  Sounds like a valid option to generate listitems that aren't currently displayed to speed things up.  I don't see any current documentation on it though.  Say I generate my listitems with offscreen = True:
python:

for items in my_listitem_dict:
    # offscreen goes before **kwargs: Python 2 rejects keyword arguments after **
    list_to_eventually_display.append(xbmcgui.ListItem(offscreen=True, **items['values']))

...

for items in list_to_eventually_display:
    xbmcplugin.addDirectoryItem(plugin.handle, '', items, True)
xbmcplugin.endOfDirectory(plugin.handle)

At what point do I set offscreen = False?
Reply
#12
OK, just replying to myself again.  I've played with this some more and I'm officially amazed at the difference.

Here are two tests on a list with 2677 items in it, using a Kodi v18 alpha.

Test 1:
python:

import time

t = time.time()
for game_item in current_page:
    # offscreen=False: same behaviour as before Kodi 18
    game_list.append(xbmcgui.ListItem(label=game_item['values']['label'], label2=game_item['values']['label2'], offscreen=False))
    game_list[-1].setInfo(self.media_type, game_item['info'])
    game_list[-1].setArt(game_item['art'])
print('Zachs Time Test: ' + str(time.time() - t))


...

for list_item in get_game_lists_as_listitems():
    xbmcplugin.addDirectoryItem(plugin.handle, '',list_item, True)
xbmcplugin.endOfDirectory(plugin.handle)

Log Result:
xml:

DEBUG: Zachs Time Test: 25.3677198887
...
DEBUG:   -- items: 2677, sort method: 0, ascending: false


Test 2:
python:

import time

t = time.time()
for game_item in current_page:
    # offscreen=True: the only change compared with Test 1
    game_list.append(xbmcgui.ListItem(label=game_item['values']['label'], label2=game_item['values']['label2'], offscreen=True))
    game_list[-1].setInfo(self.media_type, game_item['info'])
    game_list[-1].setArt(game_item['art'])
print('Zachs Time Test: ' + str(time.time() - t))


...

for list_item in get_game_lists_as_listitems():
    xbmcplugin.addDirectoryItem(plugin.handle, '',list_item, True)
xbmcplugin.endOfDirectory(plugin.handle)

Log Result:
xml:

DEBUG: Zachs Time Test: 0.769374132156
...
DEBUG:   -- items: 2677, sort method: 0, ascending: false

That is really an amazing speedup.
Reply
#13
(2018-03-09, 16:27)Roman_V_M Wrote:
(2018-03-09, 08:51)null_pointer Wrote: In Kodi 18 there is a new offscreen param when creating ListItems that can dramatically speed up ListItem creation and population.

list_item = xbmcgui.ListItem(listItemName, offscreen=True)

BTW, as I understand it, this parameter is intended for ListItems that are not going to be displayed on the screen, like those in subtitle plugins (they are plugins from an implementation POV, despite being classified as "services") and future Python scraper plugins.
 That is not what I was told.

https://forum.kodi.tv/showthread.php?tid...pid2531524
Reply
#14
(2018-03-12, 06:24)null_pointer Wrote:  That is not what I was told.

Thanks. Then my assumption was incorrect.
Reply
#15
@zachmorris Yesterday I tested the latest version of IARL on Krypton and I think it's much faster now. Well done!

I'm writing now because I found this article, which claims that "JSON is 25 times faster in reading (loads) and 15 times faster in writing (dumps)." Furthermore, "If you’re sticking to Pickling objects and have the freedom to use C compiled libraries, then go ahead with cPickle instead of pickle, although that still lacks behind JSON (twice in loading and dumping)." This could be of interest to anyone reading this thread.
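
Claims like these are easy to check on your own data. The sketch below times loading the same made-up list with `json` and with `pickle` (protocol 2); the outcome depends on the Python version, the protocol and the shape of the data, so it is a measuring aid rather than a verdict.

```python
import json
import pickle
import timeit

data = [{'label': 'rom_%05d' % i, 'genre': 'Platform', 'year': 1990}
        for i in range(5000)]

json_blob = json.dumps(data)
pickle_blob = pickle.dumps(data, protocol=2)

# Time 20 loads of each serialized form.
json_load = timeit.timeit(lambda: json.loads(json_blob), number=20)
pickle_load = timeit.timeit(lambda: pickle.loads(pickle_blob), number=20)
print('json.loads: %.4f s, pickle.loads: %.4f s' % (json_load, pickle_load))
```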
Reply
