Efficient music artist scraping
#16
The main problem with this is that it needs a way to remind the user, after a fresh install, that he may have done this before :(
#17
Maybe a user prompt when a media source is added, at least for the first source when the library is first populated? Something like: "Since you are about to populate your library for the first time, do you have a local location where Kodi can look for additional artist and album data?"

But the key is to have one location for all music, not lots of folders.
#18
Yes, this sounds good. I'm just worried about some previous "discussions" on GitHub about video and audio needing to have the same onboarding screens and all that :(
#19
What information from the scraped data do you guys actually use? If I'm honest with myself, I don't use much besides artist pictures (fanart, thumbs). I generally find the biographies much too long to read. Plus, for the artists I listen to more often I already know their biography, and for active artists the biographies go out of date pretty fast. I'm way too lazy to update them in Kodi for the rare case that they were updated in the online source. Styles, moods? I could generate smart playlists with those, I believe. However, I find that the last.fm playlist generators for music and music videos and the pretty new Amazon Prime Music addon do a much better job at that.

I have to confess that the information I use on a daily basis comes from my tags, not from scraped data.

My personal Kodi experience would not suffer much if no scraping occurred. What would be cool is if Kodi showed me news headlines (Twitter?), upcoming gigs and similar stuff for my artists, and recommended new music based on my listening behaviour, like last.fm, Amazon and Google do, in a widget.
#20
My personal music scraping/scanning requests:

- Option to NOT scrape artists that only appear on compilations (we already have an option in settings for displaying them; no point in scanning them if they are not displayed)
- Option to "Exclude path from library updates" for music sources, just like we already have in the video section. (This means I can scan in my music archive, then turn off the scanner/scraper for subsequent library updates, vastly speeding things up.)
- Maybe a much harder one: have the Kodi music scanner skip file scanning for unchanged files (Kodi currently seems to read the ID3 tags on every single library update)

The artist or album data scraped from TADB, like styles, moods etc., is actually very easy to get. I see no benefit in restricting this as it's all in the single JSON response.
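To illustrate the "single JSON response" point, here is a minimal sketch. The sample payload is hand-written, and the field names (`strStyle`, `strMood`, `strGenre`, `strArtistThumb`) and the `extract_artist_info` helper are assumptions for illustration only, not a statement of the real TADB schema.

```python
import json

# Hand-written sample payload; field names are assumed for illustration.
SAMPLE_RESPONSE = """
{"artists": [{"strArtist": "Example Artist",
              "strStyle": "Rock/Pop",
              "strMood": "Happy",
              "strGenre": "Indie",
              "strArtistThumb": "http://example.invalid/thumb.jpg"}]}
"""

def extract_artist_info(payload, wanted=("strStyle", "strMood", "strGenre", "strArtistThumb")):
    """Pick the wanted fields out of one already-fetched response.

    Nothing here makes a network request: every field comes from the
    same payload, so taking styles and moods as well as artwork costs
    no extra round trips.
    """
    data = json.loads(payload)
    artists = data.get("artists") or []
    if not artists:
        return {}
    first = artists[0]
    # Skip fields that are missing or empty in the response.
    return {key: first[key] for key in wanted if first.get(key)}

info = extract_artist_info(SAMPLE_RESPONSE)
```

Whatever the real field names are, the point stands: once the response is in hand, dropping fields saves nothing on the network side.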

An "Artist NFO Cache" in userdata may be useful, but it may also add complications, so I'm not massively keen.

@Powerhouse, the TADB music video scraper is fixed now, but only available in the forum. The site was down this morning for a MySQL update that I borked; fixed now.
@DarkHelmet, those last requests are a bit off topic, but I am working on a trending artists API (see the TADB front page already), and on supplying concerts etc. from the Songkick API. I think the extended info script already does some of that; it just needs skin support.

Those who say scraping is not needed are completely missing the point of a media center. The idea is to use crowd effort to show your music nicely, not to have to tag and find artwork for everything yourself ;)
#21
(2016-03-18, 17:44) zag wrote: - Maybe a much harder one: have the Kodi music scanner skip file scanning for unchanged files (Kodi currently seems to read the ID3 tags on every single library update)

A detour into scanning: extracting data from the music files themselves, which is used to create the music library's songs, albums and artists, as opposed to scraping.

Kodi does already skip scanning tags on non-changed files, but of course it has to look to see what files have changed.

When you click library update, or have it happen automatically on start-up, you see Kodi loop through all your music sources and every subfolder. But it does not read the ID3 tags every time, which would be silly. Instead it loops through the file structure looking for a changed path hash, made from file names, file sizes and file dates.

If a song file's hash has changed (meaning the file has been edited and its tags may have changed), Kodi removes all the song data for songs in the same folder, i.e. that album (presuming an album folder structure), then rescans their tags and recreates the library entries. Effectively it rescans the whole album, because the album data relies upon the tags of all its songs, so one song change could mean an album change; new artists also get picked up. If you have song files in a flat structure, this will mean a lot of rescanning when one file changes or a new one is added.
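The check described above could be sketched roughly like this. This is not Kodi's actual code; `folder_hash`, `folders_to_rescan` and the choice of MD5 are my own assumptions, and the real scanner will differ in detail.

```python
import hashlib
import os

def folder_hash(folder):
    """Hash the names, sizes and modification times of the files in one folder.

    Only directory metadata is read; no file is opened and no tag is parsed.
    """
    digest = hashlib.md5()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # subfolders get their own hash
        info = os.stat(path)
        digest.update(f"{name}|{info.st_size}|{info.st_mtime_ns}".encode("utf-8"))
    return digest.hexdigest()

def folders_to_rescan(root, stored_hashes):
    """Walk every subfolder of a source and return those whose hash changed.

    stored_hashes maps folder path -> hash from the previous update and is
    refreshed in place. A folder never seen before counts as changed.
    """
    changed = []
    for folder, _subdirs, _files in os.walk(root):
        current = folder_hash(folder)
        if stored_hashes.get(folder) != current:
            changed.append(folder)
            stored_hashes[folder] = current
    return changed
```

Only the folders returned would then have their tags re-read, which is why one edited song costs a rescan of its whole album folder, and why a flat structure rescans everything.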

Do we want Kodi to even skip checking certain folders for a changed file hash? I guess we could, if even just a check of the file hash is too slow. Some timings for library update checking (no changes, no scraping) on large libraries would be interesting. But notice it is only update library that does this full hash check across all sources and subfolders. From file view you can trigger the rescan of a single media source, or the scan of a new one. If you want to scan just new stuff, add a new source and scan that, and don't click library update so often.

Do we as users really want to split our music into "archived" and newer stuff? Personally I structure by category (family member, classical, soundtracks, pop etc.), then (primary) artist > album or composer > album. New stuff gets added under category > artist, mixed in with the stuff I already have. The only way to locate my new stuff is to have Kodi check all the hashes.

I guess what could be useful for the more expert user is the ability to check (and rescan) a subfolder of a music source rather than the whole source. For example, I know I have added more Beethoven albums, but I have one music source for all my classical music, and I want to avoid either checking all of it or adding a separate source for each composer. The ability to check just the subfolder would be efficient. But I could just split my music into more sources if I was that worried about it.

Remember all that ^^ is about scanning, not scraping.
#22
Some tests here

http://forum.kodi.tv/showthread.php?tid=265666

Seems to be working well already, for simple scanning.

Will test with online info next. I may be some time.... :)
#23
Some observations on current scraping behaviour when getting additional artist info from online sources:

If you initiate scraping from "Query Info for all artists" item on the context menu of the artists node or artists filtered by genre, then it does heed both the albumartistonly flag and the genre filter. In effect it only scrapes the artists that are listed. This provides an immediate way to focus scraping onto selected artists.

"Query Info for all artists" item on the context menu for an "artists" playlist does not currently do anything. That is a shame, as it would be useful to use more than just genre as a filter on what artists get scraped. But it should be fairly simple to extend this.

Similarly "Query Info for all artists" from a list of artists filtered by role, accessed by drilling down from a role node, does not currently do anything. This avoids the waste of trying to fetch data for credited producers, engineers, instrumentalists etc. that are not also a named title artist on the track and thus are less likely to be in the online sites. But it would be nice to be able to choose to try and scrape info for these people.

Using "Query Info for all artists" without having checked for library updates at some point since the last power-up does not scrape anything; it just does a library cleanup (deleting orphans). Even if there are no library changes, update library has to have been clicked at some point if you want to scrape artists.

In stark contrast, if artist scraping happens automatically on adding a new music source because the "Fetch additional information during updates" setting is enabled, then all song and album artist credits get scraped, or re-scraped if scraping was unsuccessful before.

With "Fetch additional information during updates" enabled, on a library update (whether set to run on start-up or started manually from the side blade) just those song files that have changed (hash) or are new have their song and album artists scraped. So if you add items to the library with "Fetch additional information during updates" disabled, later enable it and then call library update, nothing gets scraped.
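As a rough model of the first observation (the context menu query only scrapes the artists that are actually listed), something like the sketch below. The `Artist` class and `artists_to_query` function are hypothetical names for illustration and do not mirror Kodi's internals.

```python
from dataclasses import dataclass, field

@dataclass
class Artist:
    name: str
    is_album_artist: bool
    genres: set = field(default_factory=set)

def artists_to_query(artists, album_artists_only, genre=None):
    """Return the artists the current view lists, i.e. the ones that get scraped.

    album_artists_only stands in for the albumartistonly flag; genre is the
    optional genre filter of the node the query was started from.
    """
    selected = []
    for artist in artists:
        if album_artists_only and not artist.is_album_artist:
            continue  # song-only artists are hidden, so not queried
        if genre is not None and genre not in artist.genres:
            continue  # outside the genre being browsed
        selected.append(artist)
    return selected

library = [
    Artist("Album Artist", True, {"Rock"}),
    Artist("Guest Singer", False, {"Rock"}),
    Artist("Jazz Trio", True, {"Jazz"}),
]
rock_album_artists = artists_to_query(library, album_artists_only=True, genre="Rock")
```

Extending the same idea to playlists or role nodes would just mean passing a different filter into the selection step.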
#24
https://github.com/xbmc/xbmc/pull/9461 raised to fix some of the things listed above. There is more to do, but hopefully I can complete a few uncontroversial ones first.
#25
Not that I'm against it, as it's not used much, but remember that the GUI setting to hide compilation artists might not be applied by all other consumers, like the web interface and remotes ;)
#26
(2016-03-27, 00:10) Tolriq wrote: Not that I'm against as not very used, but remember that GUI setting hide compilation artist might not be applied by all other consumers like WebInterface and remotes ;)

No idea if the API can trigger Kodi to do scraping, and I probably don't want to know either. API work is so frustrating and depressing :(

I'm not sure what consequences you envisage? As I understand it, the API gets to see the data that Kodi has, so nothing changes there. With the Kodi GUI showing only album artists, the other artists don't get automatically scraped (although the user can elect to do so via a smart playlist or custom node, or by temporarily changing the albumartistsonly setting). The API consumer, unlike the GUI, may be showing all artists, but will only see additional info for those that have been scraped. So what would be the problem?

BTW I have a personal mission to get rid of the term "compilation artist" because it is misleading; it is so much clearer to talk of album artists and song artists. The setting name and descriptions badly need amending, but that is waiting for the settings refactoring to happen.
#27
Well, it can scan: https://github.com/xbmc/xbmc/blob/master...y.cpp#L634 so I suppose it can scrape too if it's configured for it.

As I said, I'm not saying this is important, but yes, remotes / web interfaces may want to display all artists with metadata. So one day it would be cool to be able to trigger the scrape of those if they are now removed.

Since there's a holy war against advanced settings, I suppose this will have to be done via the API, and I have understood you no longer want to hear about it ;)

I don't know if you could already make it a parameter to the function, forced to false for the moment; then one day, when you come back to the API, you add something to call it with true.
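To show what I mean by a parameter forced to false, a minimal sketch. The names `select_artists_for_scrape` and `include_hidden` are made up for illustration, and the real function in Kodi is C++ and will look different.

```python
def select_artists_for_scrape(artists, gui_album_artists_only, include_hidden=False):
    """Choose which artists to scrape.

    include_hidden defaults to False, so existing callers keep the new
    GUI-driven behaviour. A future API entry point could pass True to
    scrape every artist regardless of the GUI setting.
    """
    if include_hidden or not gui_album_artists_only:
        return list(artists)
    return [a for a in artists if a["is_album_artist"]]

artists = [
    {"name": "Album Artist", "is_album_artist": True},
    {"name": "Compilation Only", "is_album_artist": False},
]
gui_result = select_artists_for_scrape(artists, gui_album_artists_only=True)
api_result = select_artists_for_scrape(artists, gui_album_artists_only=True, include_hidden=True)
```

The default keeps today's callers untouched, so adding the parameter now costs nothing and leaves the door open for the API later.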
#28
I think it's OK to say that if an API consumer wants full data for all artists, then the user should make sure a full scrape of all artist data has been done on the Kodi install. It would certainly be desirable to one day be able to trigger all the scraping options from the API, as that will be necessary for headless use to become official.
#29
Looks good to me, but be careful not to go off topic with PRs.

Extra API feature requests should be handled in separate PRs (if you decide to code them).

For me, that's not important at the moment. This is about quicker, more efficient scraping.
#30
Well, you missed the point of my remark.

This is removing something that can be used at the moment :) (and that I personally use).

The API change would be to bring back what is removed by this PR: a way to scrape all artists, even if they are hidden from the GUI, so that the web interface and remotes can use them.

So sorry, but I'm far from off topic. As I said, maybe I'm a rare use case, hiding the compilation artists in Kodi for my wife but still wanting to have all the data when I browse from Yatse.
And I like to have the fanart and thumbs for all the displayed artists; it's way more beautiful.

So sorry if you do not understand the impact of this PR: yes, it does scrape quicker, but this comes at a cost.

Now, as I already said, I'm not against the change, but the impact of such a change must be understood. When there's an API that can consume the data, changing behaviour based only on the GUI can negatively impact this API.