Top 100 Albums fails (SQL error) on MySQL
#1
[Split from https://forum.kodi.tv/showthread.php?tid=360791 for context]
I tried to reproduce the "kodi hangs at the end of the rescan" problem capturing debug logs, with the Mac build that includes the patch, however I couldn't reproduce the problem; I suspect the original issue I ran into is related to my Mac being a laptop and having gone to sleep at some point during the 2-3 hour long re-scan process and this impacted kodi. I did notice while testing the release that Kodi has all kinds of issues when the mac wakes up and needs to be restarted. So let's assume this is a non-issue until it can be reproduced.
I will test the Linux 20.0.alpha build with the path on a VM in a server where laptop suspend is not supported and report back if I find issues.

@DaveBlake  RE the top-100-album list, I captured the debug log and sent you a link via PM. It is clearly a problem with the mysql statement specific to matrix; I use an external MySQL (maria db, hosted in debian); a Leia client that hosts MyMusic72 on the SAME external MySQL server, that uses the same dataset that got converted by Matrix to version 82, has no issues with Top-100-albums (just tested it)

The error that the query returns (which I got from the debug log) is:

ERROR 1406 (22001): Data too long for column 'sortString' at row 1

After some investigation I found the problem in the SELECT statement, in the final part:

Code:

... ORDER BY albumview.iTimesPlayed DESC, udfNaturalSortFormat(titlesortname, 8, '.') DESC, albumview.idAlbum DESC LIMIT 100;

and this is the solution that seems to work:

Code:

... ORDER BY albumview.iTimesPlayed DESC, udfNaturalSortFormat(LEFT(titlesortname,128), 8, '.') DESC, albumview.idAlbum DESC LIMIT 100;

I noticed in Kodi's code for the udfNaturalSortFormat that they expect 256 bytes for the title. However it seems that titlesortname in the MySQL DB accepts 256 chars of Unicode, 2 bytes per character. So using the leftmost 128 chars guarantees that the title will fit, and I could confirm the SELECT query no longer errors out, the SELECT query works. It seems that anyone having long album titles and using external MySQL for the music DB will run into this problem.


Let me know if you prefer that I open a Github issue or another thread there, for the top-100-album issue or we can continue discussing here, it would be nice to have the fix included in 19.1.
I am able to test variations of the SELECT command if needed. Note that posting an entire SELECT query here (I tried) triggered a cloudflare threat alarm, some sort of protection of this forum against sql injection (I guess)
Reply
#2
Thanks @htpcero for the log and for investigating the SQL problem. Yes v19 has different queries from previous versions - it does the sorting for limited queries efficiently in the database (allowing for articles, natural number order and collation) rather than grab every thing into memory and do it there. Sometimes I can post SQL in threads, sometimes it triggers a Cloudflare alert as a SQl injection protection.

I had expected any SQL error to appear in the log, there were changes to the logging system in v19 so perhaps that is why it didn't. Humm.... something to watch for.

No need for Github issue, I'm the dev to fix it and I'm here Smile
Reply
#3
Something wieird happing in MySQL...
I don't see how it can be a Unicode issue, the function parameter and the table column are both defined as VARCHAR (256)

I'm having difficulty reproducing so @htpcero can I have a list of all your album titles exported from album table, I don't need any other data.
Reply
#4
(2021-04-06, 11:50)DaveBlake Wrote: Something wieird happing in MySQL...
I don't see how it can be a Unicode issue, the function parameter and the table column are both defined as VARCHAR (256)

I'm having difficulty reproducing so @htpcero can I have a list of all your album titles exported from album table, I don't need any other data.

Sorry I missed this, some issue with notifications. I have an SQL dump, let me zip it and share it, it's very small when compressed, I will send you the link via PM

Thanks!
Reply
#5
Link via PM not working @htpcero , can you try again please.
Reply
#6
Data received, thanks @htpcero.

If you think you can write simple user instructions to allow other users to produce a db export from MySQL/MariaDB like that it would be very useful. So many have a server running but have no idea how to get a useable export, and sometimes looking at data is the quickest way to analyse an issue (especially issues with scraping of TV Shows).
Reply
#7
(2021-04-12, 19:43)DaveBlake Wrote: Data received, thanks @htpcero.

If you think you can write simple user instructions to allow other users to produce a db export from MySQL/MariaDB like that it would be very useful. So many have a server running but have no idea how to get a useable export, and sometimes looking at data is the quickest way to analyse an issue (especially issues with scraping of TV Shows).

Awesome, thanks.

Backing up a database in MySQL (or maria DB) is super simple. I have been using a script for this that runs weekly, I just checked the code and it's been running (in different machines) for an amazing 11 years, that's the beauty of linux I guess. Breaking it down into simple steps:
  • In the machine that is running the mysql daemon (server), Install the "mysql-client" package, e.g. in a linux distro such as ubuntu or debian
    • sudo apt install mysql-client
    • # this will install an utility called "mysqldump" together the mysql/mariadb client
  • Then you can backup any of your databases or all of them, for example:
    •  mysqldump -u [username] -p [password] [databasename] > [backupfile.sql]
      • Say I want my Kodi Music DB, and that my MySQL username and password (as used by Kodi) are "kodi" and "kodi".
        •  mysqldump -u kodi -p kodi MyMusic82 > backupOfMyMusicDB.sql
      • Say I want to backup all databases accessible by MySQL user "kodi"
        • mysqldump -u kodi -p kodi --all-databases --single-transaction > mysqldump-all-databases.sql

That's it! you can easily script this and add a call to the script in a cron job, or create a systemd timer.
Reply
#8
Oddly @htpcero the version of MySQL I am using doesn't produce any SQL error even on your data, but thankfully I still think I understand what is happening.

I think only one album title is causing the error, and if that is renamed then I expect the Top 100 albums to work again for you. I still need to implement a fix, but if you could rename that album (editing the strAlbum field of the album table for idAlbum = 2904 to be something short and simple and not "Complete Beethoven Edition Vol 15 - Wind Music - CD2 - Sextett op 71, 3 Equale WoO 30, Märsche WoO 29, 18, Polonanise WoO 21, Ecossaise WoO 22, Marsch-trio WoO 20, Märsche WoO 19, 24") and test then that would prove my hypothesis.

It isn't the 182 chars of that title that is the issue or unicode, but how many digits there are in the title and the way the natural number sorting (e.g. 99 before 100) is implemented is overflowing a function variable size. At least that is what I think is happening, but you tell me if editing avoids the issue.

BTW that is one Hell of an album title! Would be much nicer to use separate files with embedded metadata and make use of the disc set functionality in v19 Smile
Reply
#9
(2021-04-12, 23:54)DaveBlake Wrote: Oddly @htpcero the version of MySQL I am using doesn't produce any SQL error even on your data, but thankfully I still think I understand what is happening.

I think only one album title is causing the error, and if that is renamed then I expect the Top 100 albums to work again for you. I still need to implement a fix, but if you could rename that album (editing the strAlbum field of the album table for idAlbum = 2904 to be something short and simple and not "Complete Beethoven Edition Vol 15 - Wind Music - CD2 - Sextett op 71, 3 Equale WoO 30, Märsche WoO 29, 18, Polonanise WoO 21, Ecossaise WoO 22, Marsch-trio WoO 20, Märsche WoO 19, 24") and test then that would prove my hypothesis.

It isn't the 182 chars of that title that is the issue or unicode, but how many digits there are in the title and the way the natural number sorting (e.g. 99 before 100) is implemented is overflowing a function variable size. At least that is what I think is happening, but you tell me if editing avoids the issue.

BTW that is one Hell of an album title! Would be much nicer to use separate files with embedded metadata and make use of the disc set functionality in v19 Smile


Thanks Dave for following up! I got this CD box set some 20+ years ago and my tools at the time when I went fully "digital" were pretty limited.

I tested shortening and rescanning (with the Mac version of 20.0alpha that you provided in the other thread), removing numbers and text progressively from the title, 4 times all the way down to "Complete Beethoven Edition Vol 15 - Wind Music - CD2" but it didn't fix the problem, still the same issue. Could it be some other album (that box set has quite a few that have similarly complex titles).

Testing directly on the DB with different queries of the form:

Code:
...ORDER BY albumview.iTimesPlayed DESC, udfNaturalSortFormat(LEFT(titlesortname,173), 8, '.') DESC, albumview.idAlbum DESC LIMIT 100;

... I see that with Length up to 173 chars in the LEFT operator, it works (I get the list of 100 albums on the command line). with 174 or above, it errors out with

ERROR 1406 (22001): Data too long for column 'sortString' at row 1

My server is on a raspberry Pi running raspbian,

rpi ~> mysql --version
mysql  Ver 15.1 Distrib 10.3.27-MariaDB, for debian-linux-gnueabihf (armv7l) using readline 5.2

Notice that when you install MySQLd in a debian/ubuntu etc server, you actually get MariaDB; MariaDB 10.3 is supposed to be a "drop-in replacement" of MySQL 5.7 (in case you are testing with MySQL). More details here
Reply
#10
Yes MariaDB 10.3 is supposed to be a "drop-in replacement" of MySQL 5.7 (that I have) but there can be different behaviour, and I am running on Win10 not Linux which again makes for variations. Let alone all the possible config settings too. The diversity makes full platform testing almost impossible, hence why using Kodi client-server is refered to as "experimental" - a great many users have it working just fine, but we can't guaranty it.

I think you are  right there are some other albums that are breaking the Top 100 query. Since there are only 6 albums with more than 173 characters it makes them easier to find. The additional culprets are idAlbum = 2926 and 3281

The data from the following query should show you what is happening.
Code:
SELECT idAlbum, strAlbum, CHAR_LENGTH(strAlbum) AS len_strAlbum,
LENGTH(udfNaturalSortFormat(strAlbum, 5, '.')) AS len_sort,
udfNaturalSortFormat(strAlbum, 5, '.') AS titlesort
FROM album
WHERE CHAR_LENGTH(strAlbum) > 173
ORDER BY LENGTH(udfNaturalSortFormat(strAlbum, 5, '.')) DESC;
In particular look at the calculated field titlesort, it has all numbers in the album title left padded with "0" to produce natural number storting. This uses fewer "0" than the real query so does not build a titlesort field that is greater than 255 (which is the source of the error). The real query, padding numbers to length 8,  does  produce a string of 256 characters or more for those 3 original album titles.

If you can confirm that you can get Top 100 working with all 3 offending album titles reduced to something with fewer numbers it would be useful. I'm pretty sure I have nailed the issue, but always good to test.

The fix is to modify function udfNaturalSortFormat to only process so many digits in the title and avoid building a silly long sort string for bonkers titles. Meanwhile I also need to think of a nice way to update that function on existing MySQL/MariaDB installs without having to do a db bump. You on client/server and with your odd album titles  could be the only user to ever hit this, the whole world doesn't need a db migration because of it.
Reply
#11
Thanks @DaveBlake once again! I can see the problem now.
Confirmed!!: I changed the remaining 5 long album titles (in addition to the first one discussed earlier, 6 in total) to short values, and the Top-100 list is working OK

Assuming that new Matrix installations will get the new code for udfNaturalSortFormat, and seeing that updating the SQL function code is painful on existing Matrix installations, I propose: what if you just leave it as-is on existing installs. Anyone with problems such as myself, can always do a re-migration from Music72 to get a new Music82 db built with the new function code and get the issue corrected, it's not a big deal.

And in the future, whenever Kodi moves to Music83 or beyond, that DB migration opportunity could be used to get the new function code in place and everyone will get the fix.

If you have a patched 19.x or 20.x version for Mac, I can revert my cuesheet changes, delete the Music82 db, and retest the flow.

On the plus side of all of this, the Kodi music DB handling code is becoming ultra bullet proof, no matter what we throw at it, it can handle Smile. Thanks Dave again for this outstanding piece of software
Reply
#12
Great, thanks for confirming. Your data has some wild and wacky artist and album "names", but does make for interesting testing. Thanks for motivating me to make a more robust effort at the stored functions that the music db has on MySQL/MariaDB to simulate the original in local memory sorting approach.

Yeap, re-migration from v18.X to v19.1 (skipping v19.0) would fix, also I can let the few (if you are not totally unique) with v19.0 on client/server and wacky album names have the SQL to run that will update the stored functions as needed. So I think that is the way to go rather than migrate everyone to a new DB that has exactly the same schema as before. I'll get a PR done and ping you when there is a test build to try, thanks in advance.

Meanwhile looking at your data....
I understand that you are working with historic cuesheets etc., but your artists node must look horrible and unusable. I did wonder if adding "," as an artist separator in advancedsettings.xml before scanning might clean it up a bit. There are a few artists with comma in the name e.g. "Earth, Wind & Fire" that would be mis-interpreted so not a standard or good choice of separator, but things like "Orchestra del Teatro dell'Opera di Roma, Leontyne Price (Aida), Jon Vickers (Radamès), Robert Merril (Amonasro), Plinio Clabassi (Il Re), Rita Gorr (Amneris), Giorgio Tozzi (Ramfis, capo dei sacerdoti), Franco Ricciardi (un messagero), Mietta Sighele (Gran" and "English Baroque Soloists and Monteverdi Choir, Furio Zanasi (Ulisse), Lucile Richardot (Penelope), Francisco Fernández-Rueda (Eumete), Hana Blažíková (Minerva/Fortuna), Krystian Adam (Telemaco), Gianluca Buratto (Tempo/Nettuno/Antinoo), Michał Czerniawski" as single records just make me cry. Then again you have used <lastname>, <initial> a few times and that would take the initial as an individual artist. Maybe bulk edit the cuesheets replacing "," with ";"? I'll leave both as suggestions.

Embedded metadata could do so much better. Devil
Reply
#13
(2021-04-13, 18:44)DaveBlake Wrote: Great, thanks for confirming. Your data has some wild and wacky artist and album "names", but does make for interesting testing. Thanks for motivating me to make a more robust effort at the stored functions that the music db has on MySQL/MariaDB to simulate the original in local memory sorting approach.

Yeap, re-migration from v18.X to v19.1 (skipping v19.0) would fix, also I can let the few (if you are not totally unique) with v19.0 on client/server and wacky album names have the SQL to run that will update the stored functions as needed. So I think that is the way to go rather than migrate everyone to a new DB that has exactly the same schema as before. I'll get a PR done and ping you when there is a test build to try, thanks in advance.

Meanwhile looking at your data....
I understand that you are working with historic cuesheets etc., but your artists node must look horrible and unusable. I did wonder if adding "," as an artist separator in advancedsettings.xml before scanning might clean it up a bit. There are a few artists with comma in the name e.g. "Earth, Wind & Fire" that would be mis-interpreted so not a standard or good choice of separator, but things like "Orchestra del Teatro dell'Opera di Roma, Leontyne Price (Aida), Jon Vickers (Radamès), Robert Merril (Amonasro), Plinio Clabassi (Il Re), Rita Gorr (Amneris), Giorgio Tozzi (Ramfis, capo dei sacerdoti), Franco Ricciardi (un messagero), Mietta Sighele (Gran" and "English Baroque Soloists and Monteverdi Choir, Furio Zanasi (Ulisse), Lucile Richardot (Penelope), Francisco Fernández-Rueda (Eumete), Hana Blažíková (Minerva/Fortuna), Krystian Adam (Telemaco), Gianluca Buratto (Tempo/Nettuno/Antinoo), Michał Czerniawski" as single records just make me cry. Then again you have used <lastname>, <initial> a few times and that would take the initial as an individual artist. Maybe bulk edit the cuesheets replacing "," with ";"? I'll leave both as suggestions.

Embedded metadata could do so much better. Devil

Big Grin Big Grin Big Grin you are right, what can I say
I do have a plan to migrate to embedded metadata at one point, I even have a working prototype script to cut the flacs into songs based on cuesheet and semi-automate the tagging. I estimate will take 2-3 days to run over the entire collection. If you notice, I tried to use some level of consistency within the Genres. For classical, I "overloaded" the PERFORMER field like so: COMPOSER; ORCHESTRA, SOLOIST/s; CONDUCTOR and this works OK 80% of the time. For non-classical, it's a lot simpler. The extreme cases tend to happen for Operas where I have COMPOSER; ORCHESTRA, CHOIR, CAST; CONDUCTOR. 

So instead of mass replacement of ',' with ';' my plan is to use a python script and code different tagging logic based on Genre and use it at the same time I cut each song from the album. What's keeping me from starting this is the daunting challenge that comes after - to fix the 20% of the albums that I expect will go not-so-well with the tagging logic and will need me to manually edit the tags. 20% of a lot of albums, it's still going to be a lot of albums to retouch Smile

In the meantime, for Classical and Opera, I mainly use "advanced playlists" to slice my collection. I have advanced playlists per composer (that match the composer name in the performer "field" and also for top conductors and top soloists.

In the meantime thanks to the pandemic and I have had the chance to listen to my collection almost entirely in 15 months, again using advanced album playlists that select albums that are below a playcount threshold. All in-between work meetings Smile. This had made the pandemic isolation much easier to handle.
Reply
#14
Finally got a PR done @htpcero https://github.com/xbmc/xbmc/pull/19635, will kick off a v20 test build soonest
Reply
#15
Test builds of v20 with this fix osx-PR19635 and Win64-PR19635
Reply

Logout Mark Read Team Forum Stats Members Help
Top 100 Albums fails (SQL error) on MySQL0