Kodi Community Forum

Full Version: How does library update scanning and scraping work
I think there’s a huge user base using Kodi music. Most don’t have the special requirements I’d say. Personally I only use Kodi for music playback and I’ve introduced quite a few users to Kodi for music.

The new disc feature is going to solve some of my ‘like to haves’, I think, but I can do all I want without MusicBrainz IDs. It’s pretty flexible once you work out how it works.

The biggest pain for new users is understanding how Sources and scanning work. There are quite a few JRiver users over on QuadraphonicQuad, and I’ve been helping them ease the pain by tweaking music prior to adding it to JRiver’s database. One nice feature is that it auto-monitors its Source folders and scans new media in automatically.

I know Kodi has an option to scan on startup. I’ve never tried this as I assume it does a full scan, which would take a while. Please tell me this is only a scan for new items (if not, maybe a new Preference?). That would be more useful than the full scan. Maybe the long scans for me are because not all files have MusicBrainz IDs, so it goes looking. Unsure...
Quote: It’s pretty flexible once you work out how it works.
Yes, that was always the aim: give power users as much functionality to use creatively as we can (now @black_eagle and me) within the inevitably limited time we can volunteer. Backing it up with clear instructions and user guides is just not the best use of my time. I do get sucked into long explanations on the forum sometimes, which is not efficient either, so I am always happy that you expert users help each other out and can support the rest of the community. If anyone wants to turn my longer posts into a guide, please do.
 
(2021-05-08, 02:16)HomerJau Wrote: [ -> ]One nice feature is that it [JRiver] auto-monitors its Source folders and scans new media in automatically.

I know Kodi has an option to scan on startup. I’ve never tried this as I assume it does a full scan, which would take a while. Please tell me this is only a scan for new items (if not, maybe a new Preference?). That would be more useful than the full scan. Maybe the long scans for me are because not all files have MusicBrainz IDs, so it goes looking. Unsure...
Scanning in Kodi has two phases, with optional scraping of the resulting artists and albums as a third.
Phase #1 of library update is to look through the folder tree of every source previously added to the library, working hierarchically and checking a hash of size and date timestamp against the value stored previously to detect any changes. If the hash for a folder differs (which will catch both new files and edits), then the metadata embedded in the files is read. That is Phase #2. Finally, Phase #3: if you have "fetch additional info on library update" enabled, and artist and album scrapers set, then the scraper attempts to fetch information and art for any previously unscraped album or artist.

Now phase #1 is relatively quick as it only looks at hashes, but it does require disk and LAN access to wherever your media files are located, and the speed of that will vary depending on your set-up and how much music you have.
So yes, scanning does take a quick look at all the folders, but how else is it to find the changed or new stuff? If JRiver, or anything else, has a different approach to auto-monitoring I would like to know what that is.

Phase #2 is slower as it is reading metadata and making db entries, but it only does so for music files in folders that have changed. So say you add or edit a music file, Kodi will rescan all the files in that folder. This makes sense as the db entries for albums and artists are a combination of data from multiple files taken together. It does mean that if you had a totally flat file arrangement (all files in one folder) Kodi would still use the tagging to make library entries for artists, albums etc., but rescanning would be slow because any change means every file is read again. Most users have some kind of folder structure to their music files, so the speed of phase #2 depends on how many folders have a changed hash.
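
To make the folder-level change detection concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Kodi's actual code: the function names are hypothetical, and the exact hash inputs (file name, size and modification time, combined with SHA-256) are assumptions, since the post only says the hash covers size and date timestamp.

```python
import hashlib
import os


def folder_hash(path):
    """Hash the name, size and modification time of each file directly in `path`.

    Assumption: the real hash inputs and algorithm may differ.
    """
    digest = hashlib.sha256()
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        if entry.is_file():
            info = entry.stat()
            digest.update(
                f"{entry.name}|{info.st_size}|{info.st_mtime_ns}".encode()
            )
    return digest.hexdigest()


def folders_to_rescan(root, stored_hashes):
    """Phase #1 sketch: walk the tree and collect folders whose hash changed.

    `stored_hashes` maps folder path -> hash from the previous scan and is
    updated in place. Every file in a returned folder would then have its
    tags re-read in phase #2.
    """
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        current = folder_hash(dirpath)
        if stored_hashes.get(dirpath) != current:
            changed.append(dirpath)
            stored_hashes[dirpath] = current
    return changed
```

With one folder per album, adding a track to a single album marks only that folder for rescanning. With everything in one flat folder, any change marks the whole collection.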

Finally phase #3. Unless you have all local nfo and art (or the scraper set to be local only), scraping requires internet access to various servers, which takes time. All the current scrapers use MusicBrainz as a primary source, and access to their server is throttled to one attempt per second (or they block you). The first thing that happens if there is no mbid is that it does a look-up by name, before making a second call to get the data. Not having MusicBrainz ID values in your tags means that remote scraping makes twice as many requests, which takes time and places more load on the free MusicBrainz service. Scraping will try every time to fetch data (first locally and then remotely) for any existing artists and albums that have not been successfully scraped. So if you have an odd artist without mbid that the scraper can't look up, it will keep trying every time.
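
As a rough back-of-envelope illustration of that cost, the sketch below counts MusicBrainz requests only: one per item that has an mbid, two per item that needs a name look-up first. The function names are hypothetical, and requests to other servers (art, biographies and so on) are ignored, so treat the result as a lower bound rather than a prediction.

```python
def musicbrainz_requests(items_with_mbid, items_without_mbid):
    """One data request per item, plus a name look-up when there is no mbid."""
    return items_with_mbid + 2 * items_without_mbid


def minimum_seconds(requests, requests_per_second=1.0):
    """Lower bound on elapsed time under a fixed request throttle."""
    return requests / requests_per_second


# 200 unscraped albums, none tagged with mbids
untagged = musicbrainz_requests(0, 200)
print(untagged, minimum_seconds(untagged))  # 400 requests, at least 400.0 seconds

# The same 200 albums, all tagged with mbids
tagged = musicbrainz_requests(200, 0)
print(tagged, minimum_seconds(tagged))  # 200 requests, at least 200.0 seconds
```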

If I want to get an item into the library quickly then I use "scan to library" from the context menu in file view, having navigated to the artist or album folder where I know I have added music. This reduces the time taken for even phase #1. I also don't scrape by default, but prefer to do that on my desktop (dev system), export to nfo files and local art, tweak anything I want to, add those nfo files and images to my media folders and Artist Information Folder, and only then update my family system. I prefer that kind of hands-on curation, but then my music collection is pretty mature now and I rarely add more than 10 albums at a time.

@HomerJau hope that clarifies things. It may well be that "update library on startup" is what you want, and yes, it only deep scans new or changed things. It runs in the background so it does not prevent library use (just cleaning or scraping). But unless you are adding things to your music collection all the time I can't really see the need. Just click "update library" on the side blade when you know you have added stuff.

Oh, and if you want to scrape remotely too, maybe be kind to our friends at MusicBrainz and make nfo files for those unidentifiable albums and artists, set the scraper to local for some folders (scraper settings can be varied at that level), or tag your music with mbids (adding to MusicBrainz for any unknowns).

Might split this scanning/scraping mini-tutorial off into own thread.
Thanks Dave for your excellent write up!

I’m mindful of MusicBrainz hits so will investigate the local NFOs for albums. I use NFOs for my movies which makes scanning incredibly fast, same for Music Videos (but NFO local info is the only option).

I appreciate that NFOs give a user full control over the metadata, which many of us like. However, I don’t use NFOs for music albums at the moment, only for artists, and not for all artists (I’ve gotten slack over the last couple of years).

Can you please confirm when music NFOs are used? I think I read here on the forum that they were only used if a new album folder was found during a scan, and that after that file tagging is used. Sounds like that’s not correct based on your post.

EDIT:

I’ve just read the wiki on artist and album nfo files for the first time in a few years:

The NFOs are not used to create an artist or album during a scan.

But this from the album page:
The following table lists the available tags that can be used in the album.nfo file. Some tags are used when scanned into the library, other tags are ignored.

The table that follows has an ‘Overwrites’ column (yes/no). Does this indicate the tags that overwrite data in the database after the file tag scan?

When does a scan read the NFOs? Immediately after an initial tag scan for new artists and albums?

Does a future music scan always read changed NFO files? Or is this only in ‘local information’ mode scanning?

THX
Garry
@HomerJau have split off from other thread as this conversation is its own subject.

(2021-05-09, 00:33)HomerJau Wrote: [ -> ]Can you please confirm when music NFOs are used?
When does a scan read the NFOs? Immediately after an initial tag scan for new artists and albums?
NFO files are read as part of scraping (Phase #3 in my description) albums or artists, whatever route you take to invoking scraping.

There are 3 routes that result in information for an artist or album being scraped:
  1. Hitting "Refresh" button on the artist or album information dialog.
  2. Selecting "Query for all" on the context menu from a filtered or full artists or albums node.
  3. As the last step in library update (all library) or scan item into library (just the folder you are on) when "Fetch additional information on library update" is enabled, for those artists and albums involved in the library update that have not been scraped successfully before.
Scraping looks for nfo files first and uses them, it only fetches from remote sites when the scraper is set to do so AND no NFO file is found.
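
The order of preference just described could be expressed as the following minimal Python sketch. The function and parameter names are hypothetical and are not taken from Kodi's code.

```python
from typing import Optional


def choose_scrape_source(nfo_found: bool, remote_enabled: bool) -> Optional[str]:
    """Pick where additional artist/album information comes from.

    An NFO file always wins. Remote sites are used only when the scraper
    is set to fetch remotely AND no NFO file was found.
    """
    if nfo_found:
        return "nfo"
    if remote_enabled:
        return "remote"
    return None  # nothing to scrape from; the item stays unscraped


print(choose_scrape_source(nfo_found=True, remote_enabled=True))    # nfo
print(choose_scrape_source(nfo_found=False, remote_enabled=True))   # remote
print(choose_scrape_source(nfo_found=False, remote_enabled=False))  # None
```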


(2021-05-09, 00:33)HomerJau Wrote: [ -> ]Does a future music scan always read changed NFO files? Or is this only in ‘local information’ mode scanning?
Once additional information for an artist or album has been successfully scraped, no matter if the information was from NFO or remote sites, then it does not get scraped again. Editing or adding an nfo file does not cause rescraping. Only the "refresh" button on the information dialog causes something to be scraped again, and then the user is prompted when an nfo file is found over whether to use it or look remotely.
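
Put as a small Python sketch (hypothetical names, not Kodi's real data model), the rule is that only the "already scraped" state and an explicit refresh matter. A changed NFO file is deliberately not consulted.

```python
from dataclasses import dataclass


@dataclass
class LibraryItem:
    name: str
    scraped: bool = False     # True once NFO or remote scraping succeeded
    nfo_edited: bool = False  # NFO added or changed since the last scrape


def should_scrape(item: LibraryItem, refresh: bool = False) -> bool:
    """Scrape items never scraped successfully, or any item on explicit refresh.

    `item.nfo_edited` is intentionally ignored: editing or adding an NFO
    file does not trigger rescraping on its own.
    """
    return refresh or not item.scraped


album = LibraryItem("Some Album", scraped=True, nfo_edited=True)
print(should_scrape(album))                # False
print(should_scrape(album, refresh=True))  # True
```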

Like the wiki says, NFO files are not used to create an artist or album during a scan; they are used to add additional information about artists and albums that are created by processing the tags embedded in music files.

(2021-05-09, 00:33)HomerJau Wrote: [ -> ]But this from the album page:
The following table lists the available tags that can be used in the album.nfo file. Some tags are used when scanned into the library, other tags are ignored.

The table that follows has an ‘Overwrites’ column (yes/no). Does this indicate the tags that overwrite data in the database after the file tag scan?
Hum, not as clear as it could be. Using "tag" for both xml entries in NFO files and embedded metadata in music files is totally confusing.

The primary source of information is the tags embedded in music files, but there are some things that could be provided via NFO too, and other things that can only come from scraping (NFO or remote sites). There is also a setting "Prefer online information" which controls some of what happens.

Historically the idea was to let users that had messy tags clear it up via scraping, but this really was just giving users rope to hang themselves with, once you realise that look-up by name alone is insufficient with music. Then there was a never completed idea to have Kodi optionally pick up community data changes direct from MusicBrainz. That was OK if you had mbids, but it got applied to just name-based look-up too, and mis-identified items and only partial data updates caused havoc. I stopped that for v16 onwards and "overwriting" became much more limited in scope: things now have to match up on mbid, and not have the data from tags.

My recommendation to anyone with decent tagging (and that should be all users now!) is leave "Prefer online information" disabled.

But, assuming that you are curious, what could the scraper fetch and apply that the music library may already have from tags?
....
Well I will cover that in a subsequent edit, my dinner is ready!
Thanks again Dave for another great write-up.

I’m planning to sit down and spend some time on this tomorrow to review and check this out on Kodi to clear any queries that come up.

THX
Garry