Kodi Community Forum

Full Version: Issue with MySQL & large music libraries resulting in duplicate artists
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
With Kodi 17.6/stable, and using MySQL backend for mid-to-large music database (3K albums, 40K songs), I started having problems with a Kodi remote app, suddenly I started seeing artist entries duplicated.

I asked the app dev and he claims that there is a known kodi bug with the Music library when using MySQL backend, claims that his app downloads via JSON chunks of the DB, and that when doing that, the bug in Kodi causes the the same artist to be present in more than one chunk and that causes the problem

    > number of artists Mysql will returns the artists in different order making Kodi return duplicates.

@DaveBlake I was wondering if you are familiar with this issue. I tried to find if there is a bug reported somewhere but couldn't find it in github nor trak. I may not be looking for the bug the right way (maybe it was reported in some other context, perhaps not specific to Music or something like that).

Also any pointer on how to work-around the problem would be great even if temporary, pointers welcome. I tried many different things but couldn't find a way to get rid of the problem
Duplicated artists returned when trying to fetch artists via JSON API in batches is not an issue I am specifically familiar with or I would have fxed it already, and I don't think it has been reported before,  but I can immediately imagine why that is happening especially with a MySQL database. Also I do vaguely remember something similar happening with songs, also MySQL related, that was fixed.

Both Github and Trac are very hard to search for bugs (and fixes),  so no surprize it was unfruitful for you, and I am not going to bother. Having this discussion on thr forum will at least produce something everyone can find afterwards, and one reason why I am happy for users to use the forum for bug reporting. It lacks tracking and management or course, but then reports can sit on Trac forever since there is no manual triage procedure and not enough active "owners" of any component.

I think the issue is because many of the db queries historically do not have ORDER BY statements. Since all sorting is done internally to Kodi on  lists, not in db itself, the original authors didn't bother. This is fine with SQLite it order by default, but strictly order is undefined and MySQL can give problems by returning the results in an unexpected order.

The only work around for v17 is to use a SQLite database and abandon client server. But I will look into a solution for v18. I am currently reworking the way JSON fetches music data to make it faster, that includes sorting at the db sor the issue will vanish.
This is great news actually! Thanks Dave for the fast response. The important part is knowing that there is a json rework coming that will also eventually make the bug vanish.

I am guessing that v18 is a year away. In the interim I would be glad to volunteer testing time and effort whenever there is a branch that I can build. I have a Kodi install just for Music and can cover a broad set of music-centric use cases, library-related and playback (HD-audio) related. Please post somewhere around this sub-forum once there is a branch that needs testing and exposure.
@htpcero

Great that you are able to test a wide variety of the music functions.

For a start, could you subscribe to this thread. @DaveBlake posts in the thread whenever he needs testers. You will receive an email each time there is a new post...

https://forum.kodi.tv/showthread.php?tid=330948
Found it a few minutes ago and subscribed. Also to the Roadmap thread. Great to see so much activity on Music! I had no idea there was so much going on. Looks like Leia is going to be a great release for us music fans. Looking forward to help out
Yes, an immense amount of work by @DaveBlake. You should consider downloading one of the music test builds and playing. Lots of work on Artwork, Artist folders and Library Filtering by Source.
@htpcero great to hear you are up for testing new work etc., and maybe v18 not so far away (but no promises). At the moment the most recent completed music changes are in the current nightly, so perhaps try that in portable mode (so nothing lost in your stable system)

Reading back I realise that I am assuming quite a lot, and that can always come back and bite me! So if you are able it would be good to get some debug information from v17, just to be sure that my guesses are correct. Restart Kodi with debug logging turned on, and then from the remote app request a music library sync. No need to do anything else with Kodi via GUI or remote app (to keep the Kodi log as small as possible). Then post a link to the debug log here, note it is the Kodi debug I am interested in not the remote app which is just a tool for issuing JSON commands in this situation.

The further step could be using JSON more manually, or looking at your MySQL db how are your computer skills?

See wiki link in my colleagues signature if you need more instructions on debug logging
I will try and reproduce the problem with 17.6 from a test virtual machine, and then try the same with 18.0 nightly and compare.

Should be easy to reproduce, at least in my 'production' raspberry, just issuing two partial JSON queries of the artists table, 1-750, 751-1490, and verifying that the same artist record appears in both results. I will capture all the debug logs while doing that and post the results, and upload the logs. 

I will connect the test VM to the same mariaDB that the raspberry uses.
Quote:> The further step could be using JSON more manually, or looking at your MySQL db how are your computer skills?

That's exactly what I plan to do, use curl to trigger the same JSON queries that the app does manually, and test this in a) the raspberry, b) in an ubuntu VM with 17.6, and finally c) in the same VM upgraded to Leia nightly, to see if the issue goes away, and also to capture all the Kodi logs.

Also, I do have some basic mariaDB skills if you would need/like to try any change in the DB, do let me know. I have backups of the DB so I can modify as needed and later on restore my backup, so no risk.

In my setup the database runs in the raspberry, and is used by kodi in the raspberry itself and also by another kodi running on an ubuntu-based HTPC

Thought I would also add, my entire collection is based on single-file flac files + cuesheets, although I don't think this has any connection to the problem.
@DaveBlake - I captured kodi logs and JSON output showing the issue, first with 17.6 and then with the latest Leia 18.0 nightly

Unfortunately, Leia still shows the exact same problem as 17.6.

Both 17.6 and 18.0 are sending to MariaDB the exact same SQL query.

The resulting files are here:   https://filebin.net/9qd3lxnzwmiqaorg

There are two folders, one with 17.6 results and the other one with 18.0. Each test set has the JSON output (3 files, one per JSON call / table chunk, and the kodi log corresonding to the last two calls (since the duplicates always happen in the last two chunks, this is consistent)


I do the JSON queries like so:
Quote:/usr/bin/curl --user kodi:kodi --header 'Content-Type: application/json' --data-binary '{"id":609,"jsonrpc":"2.0","method":"AudioLibrary.GetArtists","params":{"albumartistsonly":false,"properties":["style","description","born","died","thumbnail","instrument","genre","fanart","songgenres","isalbumartist"],"limits":{"end":750,"start":0}}}' http://192.168.122.119:8080/jsonrpc  > kodi-17.6-artistsPart1.json

Part1: {"end":750,"start":0}
Part2: {"end":1500,"start":750}
Part3: {"end":2250,"start":1500}

Where Part1/2/3 are the 3 chunks

Sample repeated artists always show in the 2nd and 3rd chunks, there are about 30 of them, examples: "The Velvet Underground", "Santullo", "Rosario": they can all be found in both chunks (#2, #3)



17.6, for Part2, JSON request and corresponding SQL query
Quote:00:05:42.942 T:139754102224640   DEBUG: JSONRPC: Incoming request: {"id":609,"jsonrpc":"2.0","method":"AudioLibrary.GetArtists","params":{"albumartistsonly":false,"properties":["style","description","born","died","thumbnail","instrument","genre","fanart","songgenres","isalbumartist"],"limits":{"end":1500,"start":750}}}
00:05:42.982 T:139754102224640   DEBUG: GetArtistsByWhere query: SELECT artistview.* FROM artistview  WHERE ((EXISTS (SELECT 1 FROM song_artist WHERE song_artist.idArtist = artistview.idArtist AND song_artist.idRole = 1) OR EXISTS (SELECT 1 FROM album_artist WHERE album_artist.idArtist = artistview.idArtist)) AND (artistview.strArtist != '')) AND (artistview.strArtist <> 'Various artists') LIMIT 750,750


18.0 Leia, for Part2, JSON request and corresponding SQL query
 
Quote:00:25:16.716 T:140208452392704   DEBUG: JSONRPC: Incoming request: {"id":609,"jsonrpc":"2.0","method":"AudioLibrary.GetArtists","params":{"albumartistsonly":false,"properties":["style","description","born","died","thumbnail","instrument","genre","fanart","songgenres","isalbumartist"],"limits":{"end":1500,"start":750}}}
00:25:16.759 T:140208452392704   DEBUG: GetArtistsByWhere query: SELECT artistview.* FROM artistview  WHERE ((EXISTS (SELECT 1 FROM song_artist WHERE song_artist.idArtist = artistview.idArtist AND song_artist.idRole = 1) OR EXISTS (SELECT 1 FROM album_artist WHERE album_artist.idArtist = artistview.idArtist)) AND (artistview.strArtist != '')) AND (artistview.strArtist <> 'Various artists') LIMIT 750,750


Could this help as a fix:  when a table segment is requested (as opposed to the entire table), would it be possible for Kodi to do some kind of SORT of the table before splitting it into chunks. That would guarantee that there will never be repetitions with the same artist showing in different chunks, I think.
It has been pointed out to be that I have encountered this behaviour before (see https://trac.kodi.tv/ticket/15735 for interest). Ticket was raised before I joined the team, but I did try to investigate back in 2016, only to have the user say the issue vanished. I have no memory of it, but what I thought was intuitive insight into the cause was clearly informed by previous discussion!!  Clearly I am well on the road to madness!!!

Back then I would not attempt to solve what I could not reproduce. Now I understand much more about how at least some parts of Kodi are implemented and I am pretty sure that it is placing a LIMIT clause on a query without and ORDER BY that is sometimes resulting in undefined results. Of course it would be nice to see MySQL/MariaDB do this to be sure.

The current v18 nightly will not behave any differently from v17.6 in this regard, so save the exhaustive test efforts. But you might like to see what new features v18 has in other areas.

However it would be interesting to see what using the JSON calls from CURL produces (in 17.6 is fine) and put the evidence on record. Do you need help with the JSON syntax?  I use a tool called Postman to test JSON calls, but CURL is good enough too

Another thing would be to see what issuing the same SQL directly within a MariaDB browser does. The Kodi debug log from during the JSON call will show the SQL, but I can help you if that is digging a bit too deep.

I don't know of any change to the db itself that will avoid this issue. In ANSII SQL the order of results in a dataset is undefined if there is no ORDER BY, and so it is within spec if MariaDB does something unexpected. Now you mention the RDBMS is MariaDB, it could be there is some different behaviour between MariaDB andMySQl, or different  versions, I really don't know. Regardless Kodi needs to use different SQL. It is something that I will fix for v18 one way or another.

Meanwhile I look forwards to the debug log (from 17.6), and those JSON/SQL test results, let me know if you need better instructions.
Ah, cross posted
(2018-06-22, 06:48)htpcero Wrote: [ -> ]Unfortunately, Leia still shows the exact same problem as 17.6.
Yes, as expected. Sorry if something I said led you to think otherwise, changes releated to this are not in the nightly yet.

Now let's look at what you have sent Smile
OK, very useful data @hpcertpro  thank you. I can see the records in part1 are not in id order, hence the next chunck could well return duplicates. However it looks like you have accidentally duplicated the part3 JSON file as part2. I'm sure you have the part2 as you describe ({"end":1500,"start":750}) but it would be nice to have it too for completeness. Just the 17.6 file is all I need.
 
Quote:Could this help as a fix:  when a table segment is requested (as opposed to the entire table), would it be possible for Kodi to do some kind of SORT of the table before splitting it into chunks. That would guarantee that there will never be repetitions with the same artist showing in different chunks, I think.
Well that could be one way to do things, but not the best way. Fetching all the artists into local memory each time, sorting them there and then discarding those not wanted to be returned as JSON results is very ineffecient and slow. The faster and more accurate solution is to do both sorting and limits at the DB, with the correct SQL the db can do it all.

You can prove I am right using SQL directly if you like. Compare the results of
Code:
SELECT artistview.* FROM artistview 
WHERE ((
EXISTS (SELECT 1 FROM song_artist
WHERE song_artist.idArtist = artistview.idArtist AND song_artist.idRole = 1)
OR EXISTS (SELECT 1 FROM album_artist WHERE album_artist.idArtist = artistview.idArtist))
AND (artistview.strArtist != ''))
AND (artistview.strArtist <> 'Various artists')
LIMIT 1500,750
with
Code:
SELECT artistview.* FROM artistview
WHERE ((
EXISTS (SELECT 1 FROM song_artist
WHERE song_artist.idArtist = artistview.idArtist AND song_artist.idRole = 1)
OR EXISTS (SELECT 1 FROM album_artist WHERE album_artist.idArtist = artistview.idArtist))
AND (artistview.strArtist != ''))
AND (artistview.strArtist <> 'Various artists')
LIMIT 1500,750
ORDER BY artistview.idArtist
(2018-06-22, 07:59)DaveBlake Wrote: [ -> ]OK, very useful data @hpcertpro  thank you. I can see the records in part1 are not in id order, hence the next chunck could well return duplicates. However it looks like you have accidentally duplicated the part3 JSON file as part2. I'm sure you have the part2 as you describe ({"end":1500,"start":750}) but it would be nice to have it too for completeness. Just the 17.6 file is all I need.
 Done: downgraded to 17.6 &  re-downloaded the second chunk, sorry for the mess-up.

Here are the full logs again for reference: with part2 corrected: https://filebin.net/9v3kprt15ulh88s9

One thing I notice is that the query responses from mysql (and hence JSON), even if they have the duplicates, at least they are consistent. They don't change from run to run. Also as you mentioned, 17.6 and 18.0 responses are exactly the same.

I tried the SQL queries. The first one works (I think it's the same that Kodi uses today), the second one errors out though:
 
Quote:SQL Error (1064): You have an error in your SQL syntax; check the manual that corresponds to your MaridaDB server version for the right syntax to use near 'ORDER BY artistview.idArtist' at line 9

Line 9 is the 'order' line.

For reference, this is the version of mariadb in the raspberry (OS = OSMC which is raspbian = Debian Stretch)
Quote:$ apt show mariadb-server
Package: mariadb-server
Source: mariadb-10.1
Version: 10.1.26-0+deb9u1
Installed-Size: 62
Maintainer: Debian MySQL Maintainers <[email protected]>
Architecture: all
Depends: mariadb-server-10.1 (>= 10.1.26-0+deb9u1)

So I tweaked the query a bit like so:
Quote:SELECT artistview.* FROM artistview
WHERE ((
EXISTS (SELECT 1 FROM song_artist
WHERE song_artist.idArtist = artistview.idArtist AND song_artist.idRole = 1)
OR EXISTS (SELECT 1 FROM album_artist WHERE album_artist.idArtist = artistview.idArtist))
AND (artistview.strArtist != ''))
AND (artistview.strArtist <> 'Various artists')
ORDER BY artistview.idArtist
LIMIT 750 OFFSET 1500

This one actually actually works and solves the problem!

Part1: LIMIT 750 OFFSET 0
Part2: LIMIT 750 OFFSET 750
Part3: LIMIT 750 OFFSET 1500

(or: LIMIT {[offset,] row_count)

Without the ORDER BY, I could confirm repeated artists in the SQL query output between Part2 and Part3, and ALSO between Part1 and Part3.
Adding the ORDER BY statement solves the issue.

Let me me know if there is a chance to get the change into a nightly so I can test end-to-end (JSON queries and ultimately the app)
Thanks for the corrected file. Sorry for the typo in the SQL, I was in a rush and put the ORDER BY in the wrong place, but you got to the correct SQL statement and showed that having an ORDER BY it is the solution. I was pretty certain but always good to test.

For your interest the limit clause can be written either
LIMIT 750 OFFSET 1500
or
LIMIT 1500, 750
Quote:One thing I notice is that the query responses from mysql (and hence JSON), even if they have the duplicates, at least they are consistent. They don't change from run to run.
Yes, records are probably returned in some phyiscal order not a totally random one, but I have never managed to make my MySQL db do it.
Quote:Let me me know if there is a chance to get the change into a nightly so I can test end-to-end (JSON queries and ultimately the app)
I am including the fix for this with some other related changes so it may be a few more days before something is available for you to test. I'll post here (and you will here via my test thread too) when that is the case.

Thanks for your help indentifying and testing this issue
Had the chance to retest with the latest nightly last weekend, I know the fix wasn't announced here nor in the dev thread - but thought I'd give it one shot.

As expected the issue is still there. In case there is a pull request for this I'm able to build from source and test if that helps.

@DaveBlake  one thing that did get a fix after the issue report last month is music library browsing via the web interface, they now support the new JSON API and the music UX is no longer broken. Nice.
Pages: 1 2