Kodi Community Forum

Full Version: Isengard Beta 1 MySQL Upgrade issue
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2 3
There appears to be a problem with Beta1 and MySQL upgrades

It tries to update the database (doing that hang thing for several minutes with zero progress report...) ... and apparently then has an issue:

Code:
12:50:43 T:12012   DEBUG: Mysql execute: CREATE UNIQUE INDEX ix_actor_1 ON actor (name(255))
12:50:44 T:12012   ERROR: SQL: Undefined MySQL error: Code (1062)
                                            Query: CREATE UNIQUE INDEX ix_actor_1 ON actor (name(255))

Full debug log: https://dl.dropboxusercontent.com/u/1088...di.old.log

mysql error 1062 appears to be a duplicate key error, perhaps because you are inserting '' into a key column instead of null (just a very quick google!)

Note the beta then starts as if nothing has happened and uses the partially upgraded tables.

Close the beta, open it again - it then appears to get confused and tries to re-upgrade the DB but can't as the new one already exists...the rollback has not deleted them...

Quick log snippet of the second run -
Code:
12:54:45 T:5820   DEBUG: Mysql Start transaction
12:54:45 T:5820   ERROR: SQL: The table does not exist
                                            Query: SELECT TRIM(strActor) as strActor FROM actors GROUP BY TRIM(strActor) HAVING COUNT(1) > 1
12:54:45 T:5820   ERROR: Exception updating database xbmcvideo92 from version 90 to 92
12:54:45 T:5820   ERROR: Error updating database xbmcvideo92 from version 90 to 92
12:54:45 T:5820   DEBUG: Mysql rollback transaction
12:54:45 T:5820   ERROR: Unable to open database: xbmcvideo91 [1049](Unknown database 'xbmcvideo91')
12:54:45 T:5820  NOTICE: Old database found - updating from version 90 to 92
12:54:45 T:5820   ERROR: SQL: Can't create database for copy: 'xbmcvideo90' (1007)
12:54:45 T:5820   ERROR: Unable to copy old database xbmcvideo90 to new version xbmcvideo92

Full debug log: https://dl.dropboxusercontent.com/u/1088...C/kodi.log
I haven't actually seen an *error* per se, but the database upgrade is taking forever.

Quote:12:08:25 T:139752794212288 NOTICE: Running database version Addons18
12:08:25 T:139752794212288 NOTICE: Running database version ViewModes6
12:08:25 T:139752794212288 NOTICE: Running database version Textures13
12:08:25 T:139752794212288 ERROR: Unable to open database: MyMusic52 [1049](Unknown database 'MyMusic52')
12:08:25 T:139752794212288 ERROR: Unable to open database: MyMusic51 [1049](Unknown database 'MyMusic51')
12:08:25 T:139752794212288 ERROR: Unable to open database: MyMusic50 [1049](Unknown database 'MyMusic50')
12:08:25 T:139752794212288 NOTICE: Old database found - updating from version 49 to 52
12:08:29 T:139752794212288 NOTICE: Attempting to update the database MyMusic52 from version 49 to 52
12:08:38 T:139752794212288 ERROR: Unable to open database: MyVideos92 [1049](Unknown database 'MyVideos92')
12:08:38 T:139752794212288 ERROR: Unable to open database: MyVideos91 [1049](Unknown database 'MyVideos91')
12:08:38 T:139752794212288 NOTICE: Old database found - updating from version 90 to 92
12:09:14 T:139752794212288 NOTICE: Attempting to update the database MyVideos92 from version 90 to 92

It was stuck at that exact point for hours before I tried to restart, delete the new database, and try again. I can tell from the mysql console that it is very busily doing *something*, but I have no idea what the actual progress is.
Yes, it's extremely slow to upgrade on a good sized DB, much slower than and previous version since at least Camelot.

But look at your log carefully - does it report success - mine looked ok, started up ok, I tested a few things, then only worked this out when I went back in later and realised it had onyl halfway got there (but left the half upgraded versions behind).
You probably want to look at PR6020, which is needed in order to fix the problem you are reporting that is itself caused by PR5653.

Obviously there is still a problem, even after PR6020. Someone with this problem (or access to the data) needs to analyse the original v90 database and determine where these duplicate actor entries are coming from (they're probably not exact duplicates, maybe one will have a leading or trailing space, but MySQL will ignore the leading/trailing spaces and consider them all equivalent when adding a unique index constraint), and more importantly why they're not being removed by PR6020.

The exact source of all these "duplicate" actors is itself unclear (ie. nobody is entirely sure how the actors are being added multiple times to the database with different leading/trailing space variations). This is fixed with brute force in v92 by trimming the actors before attempting to add them to the database, thus avoiding but not entirely eliminating the problem.

The slow upgrade is probably due in part to PR6020, which is just a one-off hack to overcome the original problem, plus maybe a slow/misconfigured/underpowered MySQL server (the migration from v90 to v91/v92 is quite significant), or a huge database.
The DB is about 60MB and running on a Win7 64 bit hefty i7 16GB SSD machine (here being called over gigabit ethernet) - so I doubt that server power is really the issue here as such (although across the network is always a lot slower than on the same machine obviously)... but I'd imagine few will have a dedicated server much more powerful in home use I imagine.

I have loaded some DB backups here:
https://dl.dropboxusercontent.com/u/1088...90.Sat.sql
https://dl.dropboxusercontent.com/u/1088...48.Sat.sql

I can also manually try any queries etc you might like etc.


One of the main things that would be good would be some sort of progress indicator when the DB upgrade is happening. I'm sure sqllite is greased lightning but syncing is very very handy so I'm sticking with mysql Smile. But it should definitely rollback *and then delete* the new databases if there is failure right? (....and ideally give an alert).

I don't run nightlies using my master DB, or really use many external tools, but I do use NFO files a lot....
(the 60Mb figure comes from an SQL browser - as you can see the video backups is ~25mb and the music just ~8mb).
(2015-05-03, 01:24)bossanova808 Wrote: [ -> ]Yes, it's extremely slow to upgrade on a good sized DB, much slower than and previous version since at least Camelot.

But look at your log carefully - does it report success - mine looked ok, started up ok, I tested a few things, then only worked this out when I went back in later and realised it had onyl halfway got there (but left the half upgraded versions behind).

OK, I left the house and let it run and run and run, and eventually it did actually successfully upgrade the DB from version 90 to 92. I am not sure how to determine the size of each DB, but I can tell you there are 25,282 TV episodes, 1,539 movies, 12,482 MP3s, plus an additional couple of thousand files that my script for counting such things does not count. I'm not sure if that counts as a large DB or not and if the time it took is reasonable.

I second the request for a progress indicator for the DB update.
(2015-05-03, 02:29)bossanova808 Wrote: [ -> ]The DB is about 60MB and running on a Win7 64 bit hefty i7 16GB SSD machine (here being called over gigabit ethernet) - so I doubt that server power is really the issue here as such (although across the network is always a lot slower than on the same machine obviously)... but I'd imagine few will have a dedicated server much more powerful in home use I imagine.

I have loaded some DB backups here:
https://dl.dropboxusercontent.com/u/1088...90.Sat.sql
https://dl.dropboxusercontent.com/u/1088...48.Sat.sql

Thanks for the databases. I've restored xbmcvideo90 into my MySQL 5.5.21 server (on FreeBSD) and run an upgrade to v92 against it. As expected, I see the same error - it takes 246 seconds to fail with a RPi2 as client (my server is only an HP N36L with 8GB RAM).

In your xbmcvideo90 database, out of a total number of 42560 actor rows, you have 200 unique actors generating a a total of 407 actor variations - that is, you have 207 duplicate actor rows:
Code:
Unique Actors query:
  select strActor from xbmcvideo90.actors group by trim(strActor) having count(*) > 1

All actor variations query:
  select a.idActor, a.strActor from xbmcvideo90.actors a
      join (select trim(b.strActor) as strActor
            from xbmcvideo90.actors b
            group by trim(b.strActor) having count(*)>1) as c on trim(a.strActor) = c.strActor
  order by trim(a.strActor)

For example, the unique actor "Wayne Brady" has a total of 3 variations (1 unique, with 2 duplicates) in the v90 actors table:
Code:
'27542', 'Wayne Brady'
'41587', ' Wayne Brady'
'41589', 'Wayne Brady '

These and all other duplicate actors ARE being successfully cleaned by PR6020 (this is where about half the upgrade time is spent - deduplication is not an efficient process).

However, and this is the really weird part, in your newly created xbmcvideo92 database, your new actor table has no obvious duplicate rows, yet the unique index constraint still fails to apply.

Code:
QUERY: select count(*) from xbmcvideo92.actor
RESULT: 42353

42560 - 42353 => 207 rows removed, this is exactly the number of duplicate actor rows previously identified, so that's all OK.

Code:
QUERY: select name from xbmcvideo92.actor group by trim(name) having count(*) > 1
RESULT:  0 row(s) returned

QUERY: select name from xbmcvideo92.actor group by name having count(*) > 1
RESULT:  0 row(s) returned

MySQL doesn't consider the new actor table to have any duplicate entries.

Exporting the data and using various Linux utilities to sort and process the data concurs - there are no obvious duplicates in the data.

Code:
QUERY: CREATE UNIQUE INDEX ix_actor_1 ON actor (name(255))
RESULT: Error Code: 1062. Duplicate entry '' for key 'ix_actor_1'

Yet MySQL won't apply a unique constraint on the name column for the actor table!

There *is* one row which has a blank name, row id 40971, where name == ''.

Deleting row 40971 still results in the same error when the unique index is applied:
Code:
Error Code: 1062. Duplicate entry '' for key 'ix_actor_1'

There are no other blank name rows in the actor table.

So this isn't something I've seen before, or know how to fix... it doesn't make sense.

My advice... this is likely to be a "corner case" (unless a MySQL guru has any ideas) so save your watched statuses (see script in sig), dump the v90/v92 databases and rescan your library start from scratch...
I can certainly do that (and probably will). I sync with trakt so it's not much of an issue at all really.

But - in the interests of an easier life for others...what about my original comment though - are there some empty string entries '', rather than nulls in there? Could that be the issue?
After all the deduplication, there is one row with a blank name in the v92 actor table, but that's totally fine - as long as there is only one.

And there are no rows in the v92 actor table that have a NULL name column.

Even with the blank name row removed, MySQL still fails with the same error when applying the unique index, even though there are now no rows at all with a blank (or NULL) value for the name column!

I'm... stumped.
Here we go...

I created a temporary table (actorX) in xbmcvideo92, added the unique index on the empty table and then populated it from the actor table:
Code:
CREATE TABLE xbmcvideo92.actorY (
  `actor_id` int(11) NOT NULL AUTO_INCREMENT,
  `name` text,
  `art_urls` text,
  PRIMARY KEY (`actor_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

CREATE UNIQUE INDEX ix_actor_1 ON xbmcvideo92.actorX (name(255))

insert into xbmcvideo92.actorX(name, art_urls)
    select name, art_urls from xbmcvideo92.actor

The result:

Code:
Error Code: 1062. Duplicate entry 'Deborah Moore (Alfidia),  Sara Kestelman (Old Woman),  Kathryn H' for key 'ix_actor_1'

Querying for this value:
Code:
select actor_id, name, art_urls from xbmcvideo92.actor where name like 'Deborah Moore (Alfidia),  Sara Kestelman (Old Woman),  Kathryn H%'
Result:
Code:
'24898', 'Deborah Moore (Alfidia),  Sara Kestelman (Old Woman),  Kathryn Hunter (Charmian),  Alice Henley (Livia),  Matthew Eggleton (Egyptian Courtier),  Richard Dillane (Centurion Varro),  Stefan Brown (Lucius (Older)),  Max Baldry (Caesarion (Older)),  Lydia Bio (Guest Star)', ''
'45410', 'Deborah Moore (Alfidia),  Sara Kestelman (Old Woman),  Kathryn Hunter (Charmian),  Alice Henley (Livia),  Matthew Eggleton (Egyptian Courtier),  Richard Dillane (Centurion Varro),  Stefan Brown (Lucius (Older)),  Max Baldry (Caesarion (Older)),  Lydia Bio', ''

So they're not strictly duplicates.... the first value is 268 characters in length, the second value is 255 characters... but because the keysize of the unique index is only 255 characters they're seen as duplicates, as the first 255 characters of both values are the same.

Populating actorX with the following query succeeds:
Code:
insert into actorX(name, art_urls)
    select name, art_urls from actor where actor_id != 24898

The problem, then, is that the UNIQUE INDEX is only on the first 255 characters of the "name" column yet the name column - type text - can (and does) contain more than 255 characters. When the first 255 characters are the same on different rows (even when the remaining characters are not the same), then the unique index will fail.

A quick & dirty solution would be to limit the length of name values to 255 characters when inserting into the actor table (it's not possible to specify a key size greater than 255 - the maximum MySQL key size is 767, but for utf-8 MySQL assumes 3 bytes per character, and 256 * 3 = 768).

I'd probably start by truncating the name value of every row to 255 characters *before deduplication* (to minimize impact on subsequent dedupe code), then ensure that all future inserted actors don't exceed 255 characters (same place we currently trim values).
By the looks of the PR update you have found a solve? And indeed submitted a PR I see now.

You rock, sir! Thanks!
nice!
Where do these long actor names containing multiple actors come from? Sounds to me like the main problem is somewhere else as those multiple actors could be stored separately.
(2015-05-03, 16:41)Montellese Wrote: [ -> ]Where do these long actor names containing multiple actors come from? Sounds to me like the main problem is somewhere else as those multiple actors could be stored separately.

Agreed, the problem is certainly somewhere else, just as the leading and trailing spaces that cause duplicates is a problem somewhere else, but we don't know where this data is coming from so for now we have to deal with it during the db migration and then henceforth deal with it the best we can until we do know where it's coming from and can finally fix it there.
Pages: 1 2 3