2015-05-13, 04:03
(2015-05-12, 15:09)bamzero Wrote: I'm experiencing this also going from v90 to v92. Have 33 dupe actors, though nothing over 255 characters and the upgrade appears to be removing them ok.
Obviously something there though as running the query manually returns #1062 - Duplicate entry '34730-movie-1586' for key 'ix_writer_link_1'
I can provide a dump and debug log if you like.
In your v90 db, you have actor "Ralph Wright\t" present 16 times in your actors table (he's a writer).
MyVideos90.actors
Code:
idActor strName
34730 Ralph Wright\t
34729 Ralph Wright\t
34728 Ralph Wright\t
16743 Ralph Wright\t
16744 Ralph Wright\t
16745 Ralph Wright\t
34727 Ralph Wright\t
34726 Ralph Wright\t
34722 Ralph Wright\t
34721 Ralph Wright\t
34723 Ralph Wright\t
34725 Ralph Wright\t
65026 Ralph Wright\t
65023 Ralph Wright\t
65024 Ralph Wright\t
65025 Ralph Wright\t
Which is to say, you have 15 duplicates that need to be removed - the "chosen" actor (writer) to be preserved after de-duplication is 34730.
"Ralph Wright\t" has writer credits on 3 movies (movie id's: 1586, 1594, and 1596). Thanks to all the duplicates, this appears as follows in writerlinkmovie:
MyVideos90.writerlinkmovie
Code:
idWriter idMovie
16743 1586
34721 1586
34722 1586
34723 1586
65023 1586
65024 1586
16744 1594
34725 1594
34726 1594
34727 1594
65025 1594
16745 1596
34728 1596
34729 1596
34730 1596
65026 1596
During the de-duplication process, after the call to "CleanupActorLinkTablePre91("writerlinkmovie", ..);", MyVideos92.writerlinkmovie (an intermediate migration table) becomes:
Code:
idWriter idMovie
16743 1586
34721 1586
34722 1586
34723 1586
65023 1586
65024 1586
16744 1594
34725 1594
34726 1594
34727 1594
65025 1594
34730 1596
Note that the last movie, 1596, now has only one entry, which is correct. This is because it originally had 34730 as a writer, and the extraneous duplicate writer/movie rows have been removed by the loop at line 4725.
Unfortunately the other two movies, 1586 and 1594, do not have a matching row for writer 34730, and consequently the duplicate rows are not identified and removed by the loop at line 4725.
A subsequent de-dupe step (which by now assumes that all rows which needed to be removed have been removed by the loop at 4725) then updates the writerlinkmovie table so that all duplicate "Ralph Wright\t" entries are assigned the same idWriter of 34730, this causes MyVideos92.writerlinkmovie to become:
Code:
idWriter idMovie
34730 1586
34730 1586
34730 1586
34730 1586
34730 1586
34730 1586
34730 1594
34730 1594
34730 1594
34730 1594
34730 1594
34730 1596
Hence the error when the unique index is eventually applied to writer_link.
MyVideos92.writerlinkmovie needs to be as follows before the data is transferred to the new writer_link table:
Code:
idWriter idMovie
34730 1586
34730 1594
34730 1596
This is such a mess...but it's possible a simple DISTINCT() might be sufficient when transferring the data from writerlinkmovie to writer_link, prior to the addition of the unique index, in lines 4646-4647, so that:
Code:
m_pDS->exec("INSERT INTO writer_link(actor_id, media_id, media_type) SELECT idWriter,idMovie,'movie' FROM writerlinkmovie");
m_pDS->exec("INSERT INTO writer_link(actor_id, media_id, media_type) SELECT idWriter,idEpisode,'episode' FROM writerlinkepisode");
Code:
m_pDS->exec("INSERT INTO writer_link(actor_id, media_id, media_type) SELECT DISTINCT idWriter,idMovie,'movie' FROM writerlinkmovie");
m_pDS->exec("INSERT INTO writer_link(actor_id, media_id, media_type) SELECT DISTINCT idWriter,idEpisode,'episode' FROM writerlinkepisode");
Obviously the equivalent transfers for directors, music artist, studios, genres, country etc. would also need the DISTINCT added.
With the addition of the DISTINCT it's possible the calls to CleanupActorLinkTablePre91() can be eliminated (needs further testing, so may be "safer" to leave them in).