Login at Kodi Home

Milhouse · (This post was last modified: 2015-05-13, 04:22 by Milhouse.)

(2015-05-12, 15:09)bamzero Wrote: I'm experiencing this also going from v90 to v92. Have 33 dupe actors, though nothing over 255 characters and the upgrade appears to be removing them ok.
Obviously something there though as running the query manually returns #1062 - Duplicate entry '34730-movie-1586' for key 'ix_writer_link_1'
I can provide a dump and debug log if you like.

In your v90 db, you have actor "Ralph Wright\t" present 16 times in your actors table (he's a writer).

MyVideos90.actors

Code:
idActor        strName

34730        Ralph Wright\t

34729        Ralph Wright\t

34728        Ralph Wright\t

16743        Ralph Wright\t

16744        Ralph Wright\t

16745        Ralph Wright\t

34727        Ralph Wright\t

34726        Ralph Wright\t

34722        Ralph Wright\t

34721        Ralph Wright\t

34723        Ralph Wright\t

34725        Ralph Wright\t

65026        Ralph Wright\t

65023        Ralph Wright\t

65024        Ralph Wright\t

65025        Ralph Wright\t

Which is to say, you have 15 duplicates that need to be removed - the "chosen" actor (writer) to be preserved after de-duplication is 34730.

"Ralph Wright\t" has writer credits on 3 movies (movie id's: 1586, 1594, and 1596). Thanks to all the duplicates, this appears as follows in writerlinkmovie:

MyVideos90.writerlinkmovie

Code:
idWriter    idMovie

16743        1586

34721        1586

34722        1586

34723        1586

65023        1586

65024        1586

16744        1594

34725        1594

34726        1594

34727        1594

65025        1594

16745        1596

34728        1596

34729        1596

34730        1596

65026        1596

During the de-duplication process, after the call to "CleanupActorLinkTablePre91("writerlinkmovie", ..);", MyVideos92.writerlinkmovie (an intermediate migration table) becomes:

Code:
idWriter    idMovie

16743        1586

34721        1586

34722        1586

34723        1586

65023        1586

65024        1586

16744        1594

34725        1594

34726        1594

34727        1594

65025        1594

34730        1596

Note that the last movie, 1596, now has only one entry, which is correct. This is because it originally had 34730 as a writer, and the extraneous duplicate writer/movie rows have been removed by the loop at line 4725.

Unfortunately the other two movies, 1586 and 1594, do not have a matching row for writer 34730, and consequently the duplicate rows are not identified and removed by the loop at line 4725.

A subsequent de-dupe step (which by now assumes that all rows which needed to be removed have been removed by the loop at 4725) then updates the writerlinkmovie table so that all duplicate "Ralph Wright\t" entries are assigned the same idWriter of 34730, this causes MyVideos92.writerlinkmovie to become:

Code:
idWriter    idMovie

34730        1586

34730        1586

34730        1586

34730        1586

34730        1586

34730        1586

34730        1594

34730        1594

34730        1594

34730        1594

34730        1594

34730        1596

Hence the error when the unique index is eventually applied to writer_link.

MyVideos92.writerlinkmovie needs to be as follows before the data is transferred to the new writer_link table:

Code:
idWriter    idMovie

34730        1586

34730        1594

34730        1596

This is such a mess...but it's possible a simple DISTINCT() might be sufficient when transferring the data from writerlinkmovie to writer_link, prior to the addition of the unique index, in lines 4646-4647, so that:

Code:
m_pDS->exec("INSERT INTO writer_link(actor_id, media_id, media_type) SELECT idWriter,idMovie,'movie' FROM writerlinkmovie");

m_pDS->exec("INSERT INTO writer_link(actor_id, media_id, media_type) SELECT idWriter,idEpisode,'episode' FROM writerlinkepisode");

becomes:

Code:
m_pDS->exec("INSERT INTO writer_link(actor_id, media_id, media_type) SELECT DISTINCT idWriter,idMovie,'movie' FROM writerlinkmovie");

m_pDS->exec("INSERT INTO writer_link(actor_id, media_id, media_type) SELECT DISTINCT idWriter,idEpisode,'episode' FROM writerlinkepisode");

Obviously the equivalent transfers for directors, music artist, studios, genres, country etc. would also need the DISTINCT added.

With the addition of the DISTINCT it's possible the calls to CleanupActorLinkTablePre91() can be eliminated (needs further testing, so may be "safer" to leave them in).

Milhouse · (This post was last modified: 2015-05-13, 04:51 by Milhouse.)

@bamzero: You've also got several duplicate studios, which aren't currently being addressed by any existing code, but have the same leading/trailing space problem that afflicts actors...

Code:
idStudio    strStudio

917        " Columbia Pictures"

913        " Studio Canal"

251        "Columbia Pictures"

914        "Splendid Film "

609        "Splendid Film"

179        "Studio Canal"

This blows up a different unique index:

Code:
03:37:09 T:139957943474368   ERROR: SQL: Undefined MySQL error: Code (1062)

                                            Query: CREATE UNIQUE INDEX ix_studio_1 ON studio (name(255))

bamzero · 2015-05-13, 04:59

I'm glad you understand it Smile

Hopefully this will help future releases anyway.

Yeah, I went through v90 and manually removed up the extra Ralph Wrights' and duped movies, assigning the remaining to the one actor id.
Ran conversion and hit the error on the studio index as you mention.
I removed the duped studio and assigned both movies to the one remaining studio id and was able to successfully migrate to v92.

Thanks for all your help. Seems dupe checking might need attention in a few more areas during conversions.

bamzero · 2015-05-13, 05:01

Ah I see you found a few more non-identical ones.. Only the identical Splendid Film ones caused a problem in this case.