Best method to cache / save listitem object
#16
Thanks @Wintermute0110.  I had looked at that exact same article, as well as this one.

I'm just finishing up a new release of my addon for v18 and have come up with the following scheme:

1) Parse archived data from an XML file into a dict, in a format that can be passed directly to xbmcgui.ListItem
2) Minimize the calls to xbmcgui.ListItem to 3: one to create the ListItem, one for setInfo, one for setArt
3) Save the parsed XML lists to JSON files for faster loading once the list has been parsed
4) Keep a copy of the current list/dict in memory as JSON (for faster loading when the user is just browsing the list in various ways)
5) Send them to Kodi using xbmcgui.ListItem(... offscreen=True)

This seems to be as good as I can get it. For step 3 I tried both pickle.load/pickle.dump and json.load/json.dump; with a large list, json did indeed speed things up even more, although the difference is only milliseconds for a list of ~5000 items.
Steps 4 and 5 have sped up the display of a large list dramatically, over 20x faster. Even on an RPi it's almost snappy. In my opinion the offscreen=True option should almost be the default, or at least be noted in the documentation as a best practice.
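
To make steps 2 and 5 concrete, here is a minimal sketch; the entry layout, URL, and art paths are hypothetical stand-ins for whatever your addon actually parses:
python:

import sys
import xbmcgui
import xbmcplugin

# Hypothetical pre-parsed entries (step 1): each dict already matches
# what setInfo()/setArt() expect, so no per-item massaging is needed
entries = [
    {'label': 'Some Game',
     'info': {'title': 'Some Game', 'year': 1993},
     'art': {'thumb': '/path/to/thumb.png', 'fanart': '/path/to/fanart.png'},
     'url': 'plugin://plugin.program.example/?id=1'},
]

handle = int(sys.argv[1])  # plugin handle passed in by Kodi
items = []
for entry in entries:
    # offscreen=True skips GUI locking while the item is built (step 5)
    li = xbmcgui.ListItem(entry['label'], offscreen=True)
    li.setInfo('video', entry['info'])  # one setInfo call (step 2)
    li.setArt(entry['art'])             # one setArt call (step 2)
    items.append((entry['url'], li, False))

# add everything in one call instead of one addDirectoryItem per row
xbmcplugin.addDirectoryItems(handle, items, len(items))
xbmcplugin.endOfDirectory(handle)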

In the future, I might play around with some of those other data serialization methods. MessagePack seems promising, but someone would have to package it up to work with Kodi.
#17
@zachmorris Thanks a lot for your input! I definitely have to investigate the offscreen=True parameter when I port the Advanced Launchers to Leia. You also gave me some ideas for further optimization of the Advanced Launchers (right now I generate the context menu list for every ListItem row, but it can actually be generated once and reused, among other things).
#18
I'm with @Roman_V_M on the use of generators (where applicable). I'd also suggest using a mixture of multiprocessing and multithreading (I'm surprised I didn't see it mentioned). As for caching, I'd use SQLite for anything over 10k items; otherwise maybe try a CSV file as your cache.
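
In case it helps, a minimal sketch of an SQLite-backed cache along those lines (the ListCache name and table layout are my own invention, not from any particular addon):
python:

import json
import sqlite3

class ListCache:
    # Tiny key/value cache backed by SQLite; values are stored as JSON text
    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)')

    def put(self, key, obj):
        self.conn.execute('REPLACE INTO cache (key, value) VALUES (?, ?)',
                          (key, json.dumps(obj)))
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute('SELECT value FROM cache WHERE key = ?',
                                (key,)).fetchone()
        return json.loads(row[0]) if row else None

# usage:
# cache = ListCache('/path/to/addon_data/cache.db')
# cache.put('movies', big_list)
# movies = cache.get('movies')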
#19
Just bumping this discussion while doing some testing on an update to my addon for v19. My goal was to speed up the read/write/serialization of data that my addon uses as much as possible, so I was investigating json, msgpack, xmltodict, etc.

Strangely enough, just with v19 and Python 3, there is a huge performance increase.

As a benchmark, I tested reading and writing a 12MB xml file that my addon uses.

Here's my quick and dirty script:
python:

from . import xmltodict
from . import umsgpack
import json
import time
import xml.etree.ElementTree as ET

# etree_to_dict() is a helper defined elsewhere in the addon that
# converts an ElementTree element into a plain dict

print('Start test')
last_time = time.time()
with open('FBN_ZachMorris.xml', 'rb') as fn:
    test_dict = xmltodict.parse(fn)
now = time.time()
diff = now - last_time
print('xmltodict read took %(value)s' % {'value': diff})

last_time = time.time()
with open('test.xml', 'w') as fn:
    fn.write(xmltodict.unparse(test_dict, pretty=True))
now = time.time()
diff = now - last_time
print('xmltodict write took %(value)s' % {'value': diff})

last_time = time.time()
with open('FBN_ZachMorris.xml', 'rb') as fn:
    test_et = ET.parse(fn)
now = time.time()
diff = now - last_time
print('ET read took %(value)s' % {'value': diff})

last_time = time.time()
dict_out = etree_to_dict(test_et.getroot())
now = time.time()
diff = now - last_time
print('ET parse took %(value)s' % {'value': diff})

last_time = time.time()
with open('test.json', 'w') as fn:
    json.dump(test_dict, fn)
now = time.time()
diff = now - last_time
print('json write took %(value)s' % {'value': diff})

last_time = time.time()
with open('test.msgpack', 'wb') as fn:
    fn.write(umsgpack.packb(test_dict))
now = time.time()
diff = now - last_time
print('umsgpack write took %(value)s' % {'value': diff})

last_time = time.time()
with open('test.msgpack', 'rb') as fn:
    test = umsgpack.unpackb(fn.read())
now = time.time()
diff = now - last_time
print('umsgpack read took %(value)s' % {'value': diff})

last_time = time.time()
with open('test.json', 'rb') as fn:
    test = json.load(fn)
now = time.time()
diff = now - last_time
print('json read took %(value)s' % {'value': diff})
print('End test')

With v18:
xml:

DEBUG: xmltodict read took 15.1663999557
DEBUG: xmltodict write took 28.6266071796
DEBUG: json write took 1.92459106445
DEBUG: umsgpack write took 4.8748281002
DEBUG: umsgpack read took 4.04222583771
DEBUG: json read took 1.10773396492

With v19:
xml:

DEBUG <general>: xmltodict read took 2.019379138946533
DEBUG <general>: xmltodict write took 2.5575900077819824
DEBUG <general>: ET read took 0.8522157669067383
DEBUG <general>: ET parse took 0.5609760284423828
DEBUG <general>: json write took 2.088736057281494
DEBUG <general>: umsgpack write took 1.2496559619903564
DEBUG <general>: umsgpack read took 1.4147701263427734
DEBUG <general>: json read took 2.420412063598633

I'm not exactly sure why the difference is so large, but it's a welcome one. It will significantly simplify my code, since I don't think I really have to worry about caching XML data anymore now that load times are comparable (and fast).
#20
@zachmorris Hi Zach, thanks a lot for your benchmarks. I am now porting AML to Python 3 and using Kodi Matrix Alpha 1. I have noticed a considerable increase in the speed of ElementTree. I think the reason is that in Python 3 ElementTree uses a highly optimized C parser. If I only need to handle a simple XML file, it is very good to know that xmltodict is on par with JSON. However, in AML I need an iterative ElementTree XML parser to deal with the MAME XML and discard the large amount of information in the file that the addon doesn't need. If I used xmltodict to load MAME.xml it would probably take several GBs of RAM, and I am not sure it would work with XML files with a very complex structure.
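
Not AML's actual code, but a minimal sketch of that iterative approach with ET.iterparse; the <machine> tag follows the MAME XML layout, and the fields kept are just examples:
python:

import xml.etree.ElementTree as ET

def iter_machines(path):
    # Stream <machine> elements from a huge XML file, keeping only the
    # fields we care about and freeing processed subtrees as we go
    context = ET.iterparse(path, events=('start', 'end'))
    _, root = next(context)  # grab the root element
    for event, elem in context:
        if event == 'end' and elem.tag == 'machine':
            yield {'name': elem.get('name'),
                   'description': elem.findtext('description')}
            root.clear()  # discard finished subtrees so RAM stays flat

# usage: for machine in iter_machines('MAME.xml'): ...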

Also, thanks for letting me know about MessagePack; it looks like binary JSON with faster writing/loading times. I think I will keep using JSON for my addons because the data is human readable and the write/load speed is adequate.

Finally, in general I feel that addons run smoother in Kodi Matrix compared with Kodi Leia. The Python 3 interpreter probably has a lot of optimizations and improvements compared with its Python 2 counterpart; Python 3 has been actively developed, while Python 2 has been in maintenance mode for a long time.
#21
(2020-09-10, 07:17)Wintermute0110 Wrote: @zachmorris Hi Zach, thanks a lot for your benchmarks. I am now porting AML to Python 3 and using Kodi Matrix Alpha 1. I have noticed a considerable increase in the speed of ElementTree. I think the reason is that in Python 3 ElementTree uses a highly optimized C parser. If I only need to handle a simple XML file, it is very good to know that xmltodict is on par with JSON. However, in AML I need an iterative ElementTree XML parser to deal with the MAME XML and discard the large amount of information in the file that the addon doesn't need. If I used xmltodict to load MAME.xml it would probably take several GBs of RAM, and I am not sure it would work with XML files with a very complex structure.

Also, thanks for letting me know about MessagePack; it looks like binary JSON with faster writing/loading times. I think I will keep using JSON for my addons because the data is human readable and the write/load speed is adequate.

Finally, in general I feel that addons run smoother in Kodi Matrix compared with Kodi Leia. The Python 3 interpreter probably has a lot of optimizations and improvements compared with its Python 2 counterpart; Python 3 has been actively developed, while Python 2 has been in maintenance mode for a long time.

I didn't include ET, but I could look at that as well. FWIW, xmltodict can stream data via a callback for a smaller memory footprint.
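
For anyone curious, a minimal sketch of that streaming mode; item_depth=2 assumes the interesting elements sit two levels deep, and process() is a hypothetical per-item handler:
python:

from . import xmltodict

def handle_item(path, item):
    # called once per element at item_depth; path is the list of
    # (tag, attributes) pairs leading down to this item
    process(item)  # hypothetical handler: update a dict, a db row, etc.
    return True    # returning True tells xmltodict to keep parsing

with open('FBN_ZachMorris.xml', 'rb') as fn:
    # items are handed to the callback as soon as they close, so the
    # whole document is never held in memory at once
    xmltodict.parse(fn, item_depth=2, item_callback=handle_item)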
#22
I checked xml.etree.ElementTree, and it appears to be the winner: over 2x as fast as json or xmltodict at reading, and still faster than the others at reading+parsing combined (which is still pretty darn amazing; I updated the timings above).

I'd love to utilize lxml, which according to the internet is even faster, but there is no lxml binary addon for Kodi.
#23
(2020-09-11, 04:52)zachmorris Wrote: I checked xml.etree.ElementTree, and it appears to be the winner: over 2x as fast as json or xmltodict at reading, and still faster than the others at reading+parsing combined (which is still pretty darn amazing; I updated the timings above).

I'd love to utilize lxml, which according to the internet is even faster, but there is no lxml binary addon for Kodi.

Thanks again, Zach. As I said, I noticed a considerable increase in ET's speed but I never measured it; the actual results are astonishing.
#24
@zachmorris what platform do you use?
#25
(2020-09-16, 11:29)Fuzzard Wrote: @zachmorris what platform do you use?

I was using a 2015 MacBook, which isn't super fancy compared to what's out there now:
xml:

INFO <general>: Starting Kodi (19.0-ALPHA1 (18.9.701) Git:20200830-7c5ab082d3). Platform: OS X x86 64-bit
INFO <general>: Using Debug Kodi x64 build

In more testing, ET seems to perform better than json in every case I've tried (large or small files, many or few elements). Even more interestingly, with some more tests I've found that reading directly from disk (an SSD, mind you) is faster with ET than reading/writing the result to RAM (using xbmcgui.Window(WINDOW_ID).setProperty/getProperty(key,value)); with json the two are pretty much the same:

xml:

DEBUG <general>: Large file test
DEBUG <general>: File size 21332476
DEBUG <general>: ET parse from disk (etree_to_dict(ET.parse(file)) took 0.5243639945983887
DEBUG <general>: json write to disk (json.dump(value,file)) took 1.7911739349365234
DEBUG <general>: json read from disk (json.load(file)) took 1.805438756942749
DEBUG <general>: json write to RAM (xbmcgui.Window(WINDOW_ID).setProperty('test_dict',json.dumps(value)) took 1.3773140907287598
DEBUG <general>: json read from RAM (json.loads(xbmcgui.Window(WINDOW_ID).getProperty('test_dict'))) took 1.794213056564331
DEBUG <general>: write string to RAM took 0.05406999588012695
DEBUG <general>: ET parse string from RAM (etree_to_dict(ET.fromstring(xbmcgui.Window(WINDOW_ID).getProperty('test_string')))) took 1.331468105316162

This is very surprising to me, but I'm assuming the overhead resides in the getProperty / setProperty functions.
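
For reference, the RAM round-trip measured above looks roughly like this; WINDOW_ID = 10000 (the home window) is an assumption, use whatever window your addon targets:
python:

import json
import xbmcgui

WINDOW_ID = 10000  # home window (assumption)
window = xbmcgui.Window(WINDOW_ID)
test_dict = {'example': [1, 2, 3]}  # stand-in for the real parsed data

# write: serialize the dict and park it in a window property
window.setProperty('test_dict', json.dumps(test_dict))

# read: pull the string back across the Python/C++ boundary and parse it
cached = json.loads(window.getProperty('test_dict'))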

With all of this, a difference of half a second is pretty small compared to the generation and display of the ListItems themselves now, so I think that regardless of what you use, it won't be noticeable.
#26
My question was more to see if I could assist with an lxml implementation to test with on the platform you use.
I was engaged in doing that for Apple/Android platforms not too long ago. I'll see if I can dig up a working OSX version of lxml over the next week or so.
#27
(2020-09-17, 04:34)zachmorris Wrote: I was using a 2015 MacBook, which isn't super fancy compared to what's out there now. [...] This is very surprising to me, but I'm assuming the overhead resides in the getProperty / setProperty functions.

With all of this, a difference of half a second is pretty small compared to the generation and display of the ListItems themselves now, so I think that regardless of what you use, it won't be noticeable.

I think you are correct. Data marshaling between Python and C++ takes time. In general, calls to Kodi functions from Python must be minimized in addons where performance is critical. For example, with a big list of 4,000 ListItems, AML measures the loading time (JSON from disk to memory) and the rendering time (calling all the Kodi API functions); these are about 0.7 + 0.7 seconds, so even if you get a 0.2/0.4 second improvement in data loading, the actual difference will not be very noticeable. That's why I will keep using JSON for the time being (although in my addons it is very easy to change the storage format: all I/O goes through load_data()/write_data() functions that can be easily swapped. On the other hand, JSON is very convenient because it is human readable, which simplifies development a lot).

By the way, I use both Windows and Linux and I have noticed the ET performance improvements on both OSes.
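
A minimal sketch of that indirection, with JSON as the current backend (the function names follow the post; the bodies are only one possible implementation):
python:

import json

def write_data(path, data):
    # single choke point for serialization; swap this body for
    # msgpack/pickle/etc. without touching the rest of the addon
    with open(path, 'w') as fn:
        json.dump(data, fn)

def load_data(path):
    with open(path, 'r') as fn:
        return json.load(fn)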
#28
(2020-09-17, 05:20)Fuzzard Wrote: My question was more to see if I could assist with an lxml implementation to test with on the platform you use.
I was engaged in doing that for Apple/Android platforms not too long ago. I'll see if I can dig up a working OSX version of lxml over the next week or so.

That would be awesome. Does this fall under the binary addon umbrella, or would it have to be baked into Kodi directly?
Speaking of which, there are also several faster implementations of JSON parsers: ujson, orjson, and rapidjson, all of which are similar to lxml in that they have a binary component / external requirements. I think that to get any faster parsing of either XML or JSON in Kodi, libraries like these would have to be included (directly, or as binary addon options if that's possible).
#29
It's a binary add-on.

At this time I don't think I would look into the others you have listed, purely because I now know some of the limitations of the Python cross-compile process and our build process. It takes a considerable amount of patching to really do correctly, and it's effectively manual for each new module.
I've been looking into a new way to build the modules that, in theory, should get around the pitfalls and make the creation of further binary modules a simple process others can potentially do.
#30
(2020-09-17, 22:30)Fuzzard Wrote: It's a binary add-on.

At this time I don't think I would look into the others you have listed, purely because I now know some of the limitations of the Python cross-compile process and our build process. It takes a considerable amount of patching to really do correctly, and it's effectively manual for each new module.
I've been looking into a new way to build the modules that, in theory, should get around the pitfalls and make the creation of further binary modules a simple process others can potentially do.

Yes, that'd be great. Coincidentally, I asked that exact question about such templates a while ago. In a similar vein to this caching method, using a data management package like pandas is on my wishlist, and I think it would go a long way in the development of Kodi addons.