2015-09-29, 13:18
Starting a new music thread to discuss album information, in particular how externally scraped (or NFO) data is merged with the data already in the music library as scanned from tags in music files.
The current implementation does not do what people expect or want; see the first page of the thread "Music Album Info - Should displaying it delete tag data".
There is also "BUG - Album information search endless loop", about the longstanding horrible dialog loop you can get stuck in when the album isn't found, especially if you don't have a keyboard. There is a PR to fix this bug, but it is not complete yet; I hope Evilhamster will be able to get back to it.
I feel we need to improve the way that scraping this additional data is initiated. On the one hand, most users would just want it to happen automatically, and it doesn't. On the other, control of which scraper is used and when it scrapes is awkward for those who do want to specify and control it. This issue is covered by "Follow up to PR8069 - Adding music to the library", so I don't want to include it here.
As a reminder, the basic functionality is to have a scraper gather information about albums from external sources, in addition to the data gathered from the tags on the song files, store it in the library and then display it. The additional information can also be gathered locally from NFO files instead of online, giving the user control. There is also a setting for overriding the tag data with scraped data, which should determine data priority.
That is all well and good, but there are design issues around merging data that need consideration.
The album scraper (or NFO) gathers some data that may also have been gleaned from the tags.
a) If we are going to override the tag-sourced data with this, then it needs to propagate to all the data tables, e.g. album_genre, not just the genre string of album.
b) Equally, if we do not want to override, just fill in the blanks, then data must not be changed by mistake, e.g. artists getting deleted and replaced or added.
Do we agree? If so, I will attempt to change CAlbum::MergeScrapedAlbum to do that, because the current implementation has bugs.
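To make a) and b) concrete, here is a rough sketch of the merge rule I have in mind. It is not the real CAlbum code: AlbumData, MergeField and MergeScraped are hypothetical names invented for illustration, and the actual class has many more fields. The idea is that each field is either taken whole from the scraper or left whole from the tags, and that the genre string and the album_genre rows would both be written from the same merged list so they cannot drift apart.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an album record (not Kodi's CAlbum).
struct AlbumData
{
  std::string title;
  std::vector<std::string> genres;  // assumed source for both the genre string and album_genre rows
  std::vector<std::string> artists; // assumed source for the album artist links
  std::string review;               // scraper-only data, never present in tags
};

// Merge one field. Works for strings and lists alike because both have empty().
// - Nothing scraped: keep what we have.
// - Override enabled, or nothing from tags: take the scraped value whole.
// - Otherwise: leave the tag value untouched (no partial mixing).
template <typename T>
void MergeField(T& target, const T& scraped, bool overrideTags)
{
  if (scraped.empty())
    return;
  if (overrideTags || target.empty())
    target = scraped;
}

// Returns the merged record; the inputs are not modified.
AlbumData MergeScraped(AlbumData fromTags, const AlbumData& scraped, bool overrideTags)
{
  MergeField(fromTags.title, scraped.title, overrideTags);
  MergeField(fromTags.genres, scraped.genres, overrideTags);
  MergeField(fromTags.artists, scraped.artists, overrideTags);
  MergeField(fromTags.review, scraped.review, overrideTags);
  return fromTags;
}
```

With override off, an album tagged with one artist keeps exactly that artist even if the scraper returns a different spelling, while an empty review still gets filled in. With override on, the scraped lists replace the tag lists completely, and whatever writes the database would then have to rewrite the link tables from those lists.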
Now the fiddly details, and less obvious merge decisions.
1) Say we have an album but no MusicBrainz IDs for albums or artists from the tagged music files. Scraping finds the album and returns the MB IDs too. Do we store the MB IDs even if we have said do not override tags? They are useful to uniquely identify tracks, albums and artists, and it would be adding info since they were missing from the tags. So I think we should, or is someone going to object to having MB IDs appear in their library?
2) What if scraping misidentifies the album or, if offered a choice, we select the wrong one? This may not happen much with popular music, but with classical it is quite common: either the scraper finds only one match, or the list of possibles shows only composer name, album title and year, not conductor and orchestra, making it hard to spot the right one even if it is there (showing the album cover would make the choice simpler, but that is a scraper issue). Too late, you see the info dialog showing the wrong data.
Not so bad, I guess, if b) is working, but it really could do with an "undo" or "revert to tag data" button.
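One possible shape for the "revert to tag data" idea, purely as a sketch: keep a copy of the tag-sourced values the first time scraped data is applied, and restore that copy on request. AlbumFields and AlbumEntry are hypothetical names, not existing Kodi classes, and I am assuming here that the snapshot would be held somewhere alongside the album rather than re-read from the files.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal set of album fields, for illustration only.
struct AlbumFields
{
  std::string title;
  std::string artist;
  std::string genre;
};

// Hypothetical wrapper that remembers what the tags said before any scrape.
class AlbumEntry
{
public:
  explicit AlbumEntry(AlbumFields fromTags)
    : m_current(std::move(fromTags)), m_hasSnapshot(false) {}

  // Store merged data. The tag data is snapshotted only on the first call,
  // so repeated scrapes never overwrite the original tag values.
  void ApplyScraped(const AlbumFields& merged)
  {
    if (!m_hasSnapshot)
    {
      m_tagSnapshot = m_current;
      m_hasSnapshot = true;
    }
    m_current = merged;
  }

  // Restore the tag-sourced data. Returns false if nothing was ever scraped.
  bool RevertToTags()
  {
    if (!m_hasSnapshot)
      return false;
    m_current = m_tagSnapshot;
    m_hasSnapshot = false;
    return true;
  }

  const AlbumFields& Current() const { return m_current; }

private:
  AlbumFields m_current;
  AlbumFields m_tagSnapshot;
  bool m_hasSnapshot;
};
```

The alternative would be to rescan the tags from the music files when the button is pressed, which avoids storing a second copy but depends on the files still being reachable. I have no strong view on which is better.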
Thoughts please!