Kodi Community Forum

Full Version: Request - Artist discography - more sophisticated matching of albums
The Artist information window shows some information about the artist and also lists all of the artist's albums.
If an album is in the music library, it is also shown in this list (with the cover), i.e. the albums in my own library are matched against the albums from the scraper. This matching seems to be very simple, and often there is no match. To me it looks as if the album title must match on every character and the comparison is case-sensitive.



For example, my album title "Songs from a Room" does not match "Songs From a Room". The only difference is the lower-case 'f' against the upper-case 'F'.


Other albums do not match because a different apostrophe is used. I use the typewriter apostrophe (') but the scraped album titles seem to use the typographic apostrophe (’).



I see these mismatches with other artists/albums as well. I often use EAC to rip my audio CDs, and I do the tagging with EAC only (using the data provided by the EAC data source). I don't use MusicBrainz very often (and even that might not be a complete solution for this problem).
I use the default scrapers (Universal Artist Scraper and Universal Album Scraper) with the default settings. Only the preferred language is set to German.

In my opinion the matching should not be case-sensitive.
The problem of different punctuation marks is also very common, especially when you consider the different (typographic) punctuation marks used in different languages.

I think the matching of own album titles to scraped album titles should be more sophisticated. Some improvements may be very easy to implement (e.g. case-insensitive comparison), other changes might be a little more difficult (e.g. normalizing [or removing] punctuation marks).
I think the source for the Artist information window is DialogMusicInfo.xml.
According to the Wiki page there are two names (SongInformation and MusicInformation) and two window IDs (10135 and 12001) for this source, but I don't know which name/ID is used. I accessed it from the Music Library view, Artists, and used Information from the context menu. I'm not sure whether WINDOW_DIALOG_SONG_INFO or WINDOW_DIALOG_MUSIC_INFO is used here.
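To illustrate the kind of comparison suggested above (case-insensitive, with apostrophe-like characters unified), here is a minimal Python sketch. It is not Kodi's code: `normalize_title`, `titles_match` and the `APOSTROPHE_LIKE` table are hypothetical names, and the choice of which characters to fold is an assumption, not a Unicode rule.

```python
import unicodedata

# Hypothetical table: apostrophe-like characters folded to the typewriter
# apostrophe (U+0027). Which characters belong here is a judgment call.
APOSTROPHE_LIKE = {
    "\u2019": "'",  # right single quotation mark (typographic apostrophe)
    "\u2018": "'",  # left single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe
    "\u2032": "'",  # prime
    "\u00b4": "'",  # acute accent
    "\u0060": "'",  # grave accent
}
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def normalize_title(title: str) -> str:
    """Build a comparison key; the displayed title is left untouched."""
    # Fold apostrophes first: NFKC would otherwise decompose U+00B4.
    key = title.translate(_TABLE)
    # NFKC unifies compatibility forms such as full-width letters.
    key = unicodedata.normalize("NFKC", key)
    # Collapse runs of whitespace and drop leading/trailing spaces.
    key = " ".join(key.split())
    # casefold() is a more thorough lower() meant for caseless comparison.
    return key.casefold()


def titles_match(library_title: str, scraped_title: str) -> bool:
    return normalize_title(library_title) == normalize_title(scraped_title)


print(titles_match("Songs from a Room", "Songs From a Room"))
print(titles_match("Death of a Ladies' Man", "Death of a Ladies\u2019 Man"))
```

Only the comparison key is normalized here, so the tags and the scraped titles keep whatever characters they were written with.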
At the present time your best bet is to add musicbrainz IDs for artist, release and album to avoid this kind of result. Please read in the music section where there are numerous threads about similar topics.
Moved to music (where I'm more likely to notice it), and adjusted title for same.

I understand your issue with the artist discography @"NeroRome", but fixing it is not quite so trivial. The issue is what the scraper brings back, and how to compare that to what is in your library. But I'll see what can be done for v19.

The Artist info dialog is implemented in GUIDialogMusicInfo.cpp if that is of interest to you.
As for the "apostrophe" U+0027, I don't believe Unicode specifies any normalization for it, and it can be controversial. For example, people often use U+0027 instead of the proper ʻokina U+02BB for Hawaiian words and names, which some consider insulting. I spent a lot of time cleaning up Musicbrainz. If you take your strings from Musicbrainz you shouldn't have these kinds of problems. Sorry for being a bit pedantic on this.

Normalization might be useful for half-width and full-width forms of ASCII Latin letters and punctuation, but I doubt many users have a need for that, and I do support using normalization on Unicode strings for searching and sorting/collation.

scott s.
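As a small illustration of the point about normalization: Python's standard `unicodedata.normalize` with the NFKC form folds full-width forms to their ASCII counterparts, but it leaves the typographic apostrophe (U+2019) alone. The helper name `fold_width` is made up for this sketch.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Apply Unicode NFKC normalization (compatibility composition)."""
    return unicodedata.normalize("NFKC", text)


# "Songs" written with full-width Latin letters (U+FF33, U+FF4F, ...).
FULLWIDTH_SONGS = "\uff33\uff4f\uff4e\uff47\uff53"

print(fold_width(FULLWIDTH_SONGS) == "Songs")  # True: full-width letters fold to ASCII
print(fold_width("\uff07") == "'")             # True: full-width apostrophe folds to U+0027
print(fold_width("\u2019") == "'")             # False: typographic apostrophe is unchanged
```

So standard normalization alone would not make the two apostrophes compare equal; that would need an explicit, application-defined mapping.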
On MusicBrainz, too, different symbols are used for different releases, as you can see in the releases of the release group Death of a Ladies’ Man.


I took a look at the source code, but my knowledge of and experience with C/C++ are quite poor. Unfortunately the changes may not be as easy as I thought.


Indeed, normalization of the apostrophe (typographic and typewriter, but also other symbols often wrongly used in its place, like the prime symbol, single quotation mark, acute accent or grave accent) and of other symbols/punctuation marks might be controversial. I know.

Of course, this is not a big issue.
Usually I'm not sure whether I should use the correct typographic, language-specific form or just the plain character from the default keyboard. This applies to many symbols, not only apostrophes. And I often don't like the automatic substitution in word processors or other programs.
I also don't know whether I should use the correct symbols in the tags but simple symbols in file names.
Life was so easy with plain 7-bit ASCII.


In some cases the correct symbol might be a big problem, in other cases only a small (cosmetic) issue. Often it's very subjective, but it's always an issue, even if only a small one.
However, I would prefer a fuzzier matching here.
Be warned: the last time I looked at this I began to wish that Kodi didn't scrape discography at all.

Matching on name in itself can be an issue even without Unicode chars - some users have separate albums by an artist with the same name, for example the original release and then a "deluxe" edition with more tracks. They have different mbids so Kodi can hold them, and Musicbrainz returns them to the scraper as separate discography items. But I don't think that the scraper returns mbids for the discography items; maybe it could, and that would at least make matching exact for those that have tagged with mbids.
Here is a thought: DON'T use Kodi to scrape your Music. Instead, have Kodi read the .NFO files that you created for your Music.

But how, I hear you ask? Easy: use MediaElch to scrape your Music (it's the only scraper that currently does Music as well as Movies and TV Shows). The best part is, you will see all the issues you listed above (and more), but you have the power to fix them much more easily than DaveBlake can. Not that I don't want DaveBlake to fix some issues with how Kodi works with Music. But some issues are bigger and actually require him to fix them, whereas scraping is one that you as a user can easily fix.
I can see a use for approximate (fuzzy) string searching.  But it has both technical and implementation considerations and would need someone ideally with experience in the field to do this.  A quick web search didn't turn up any obvious candidates for a plug and play library.

For its part, Musicbrainz query API v2 implements Lucene search, which can do a fuzzy search based on Damerau-Levenshtein distance using the tilde as a flag character in queries.

In Musicbrainz, there is a relationship model of work to recording to track.  It can be that a recording is released as tracks with different name strings, which could be "mistakes" (i.e., what some graphics guy put on a cover or label) or could have some artist intent.

Naturally, if everyone were an American or at least an en-GB user it would be easy (though should we still have a visualization window and favorites menu?)

scott s.
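For anyone curious what a Damerau-Levenshtein style comparison involves, here is a rough Python sketch of the restricted variant (optimal string alignment), which counts insertions, deletions, substitutions and swaps of adjacent characters. This is only an illustration, not what Lucene or Kodi actually runs; `osa_distance`, `fuzzy_match` and the `max_edits` default are names and values assumed for the example.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def fuzzy_match(a: str, b: str, max_edits: int = 1) -> bool:
    """True if the case-folded titles differ by at most max_edits edits."""
    return osa_distance(a.casefold(), b.casefold()) <= max_edits


print(fuzzy_match("Death of a Ladies' Man", "Death of a Ladies\u2019 Man"))
print(fuzzy_match("Songs from a Room", "Songs of Love and Hate"))
```

One caveat with any edit-distance threshold: titles such as "Volume 1" and "Volume 2" are also only one edit apart, so fuzzy matching can produce false positives that exact matching would not.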
(2019-02-27, 10:45) DaveBlake wrote: Matching on name in itself can be an issue even without Unicode chars - some users have separate albums by an artist with the same name, for example the original release and then a "deluxe" edition with more tracks. They have different mbids so Kodi can hold them, and Musicbrainz returns them to the scraper as separate discography items. But I don't think that the scraper returns mbids for the discography items; maybe it could, and that would at least make matching exact for those that have tagged with mbids.

I think we are now straying too far from my issue.

How to deal with different releases of an album, e.g. Deluxe Edition, Remastered Edition, etc., is a different issue.
IMHO the album title (tag) itself should not hold this extra information, but this doesn't follow the MusicBrainz guideline. To support this guideline well in Kodi, the logic that matches the artist's discography against the own library should also take this into account (maybe in a second step).
AFAIK the scraper (default settings) returns only an artist's studio albums, and only the first, original release. Even in the ordinary case of an album released before CDs were invented (as in my example), there are usually at least 3 releases: the original release, a release on CD and a release as digital media. Often there are many other releases of an album, e.g. for different countries or on different labels. And each release has its own release ID at MusicBrainz.
Given that scrapers don't provide MBIDs (as I believe), MBIDs cannot be used for matching. Even if a scraper provided a release ID, there is no guarantee that you have the same release. Nevertheless, if you have a different release (e.g. from a different country) you would still want your album release matched in the artist's discography. So a release group ID rather than a release ID might be a possible solution for more or less proper matching, but many other aspects have to be considered here.
Again, I'm not the biggest fan of MusicBrainz and I would prefer Kodi not to be tied too closely to MusicBrainz.

And again: I think we are moving too far from my suggestion, which is just to do case-insensitive matching (and to normalize the string before matching).
(2019-02-27, 20:04) Powerhouse wrote: Here is a thought: DON'T use Kodi to scrape your Music. Instead, have Kodi read the .NFO files that you created for your Music.

Just to be clear: it's not about scraping the music (albums) itself, but about scraping artist information and the automatic matching of own albums with the artist discography.

At album level I just use local covers and tag data only.
For artists I usually use the scraper data and sometimes I use local "fanart" and "thumb".

But even if you use an artist NFO, you have to be very accurate: a difference in the case of a single character would break the matching.
Actually, scraping with MediaElch does everything you want, and more. Not only does it do Artist information (different from the Artist nfo you posted above, mostly with more information), Album information, Discography information and Genre (Artist, Album, Song levels), it also includes lots of Artwork (posters, fanart, Albums, logo, disc, etc). I believe the only things missing from the current builds of MediaElch are Spine Artwork and Back Artwork (but those are to be added in future releases).

Oh, and did I mention that MediaElch also does all the MBIDs as well? The catch is, you have to know which MBID to pick for the Album (is it the Deluxe version, Released by Target, from Japan, etc.). Easy to figure this out in MusicBrainz Picard, but not as easy through MediaElch. But if you do an Artist's music at the same time (both in Picard and MediaElch) then you can link them together easily.
Kodi 18 can store MBID "release group".  If "release group" is populated that could potentially solve some album title issues.

scott s.
(2019-02-28, 22:13) scott967 wrote: Kodi 18 can store MBID "release group".  If "release group" is populated that could potentially solve some album title issues.
Unfortunately no.
MBID's are not used for discography.
sql:
CREATE TABLE discography (idArtist integer, strAlbum text, strYear text);
It is not so much that "mbids are not used for the discography", as that the scraper does not provide them. I think the issue is that Musicbrainz does not return them, or that it would take extra requests and thus make scraping unacceptably slow. But if the scraper could efficiently get the release group ids for the albums by an artist as part of scraping the artist, then I would certainly make the core changes necessary to make use of them.

@"NeroRome" to my memory you seem to be the only user with a real interest in artist discography - I await others to get excited about this.

EDIT: June 2021
It seems I did take an interest after all. Improvements to matching up the albums in the user library with the list retrieved by scraping the artist were implemented for v19. When available, the Musicbrainz release group id is used to match, and the default artist scraper has been updated to fetch these for the artist discography when scraping it from Musicbrainz. See https://github.com/xbmc/xbmc/pull/18079
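
For readers who want a picture of what "use the release group id when available" could mean in practice, here is a hypothetical Python sketch. It is not the code from the pull request above: the `Album` class, the `find_in_library` function and the fallback to a case-insensitive title comparison are all assumptions made for illustration, and the ids in the usage lines are placeholders, not real MBIDs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Album:
    title: str
    # MusicBrainz release group id, if the tags or the scraper provided one.
    release_group_id: Optional[str] = None


def find_in_library(entry: Album, library: List[Album]) -> Optional[Album]:
    """Match one scraped discography entry against the library albums."""
    # 1) Prefer an exact release group id match when the entry has an id.
    if entry.release_group_id:
        for album in library:
            if album.release_group_id == entry.release_group_id:
                return album
    # 2) Assumed fallback: compare titles without regard to case.
    wanted = entry.title.casefold()
    for album in library:
        if album.title.casefold() == wanted:
            return album
    return None


library = [
    Album("Songs from a Room", release_group_id="rg-placeholder-1"),
    Album("Death of a Ladies' Man"),  # tagged without MBIDs
]
scraped = Album("Songs From a Room", release_group_id="rg-placeholder-1")
print(find_in_library(scraped, library))
```

Matching on the release group rather than a single release means a CD reissue and the original vinyl release of the same album would still line up, which was the concern raised earlier in the thread.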