Scraping from NFO files with new Python music scrapers

Scraping from NFO files with new Python music scrapers - scott967 - 2019-03-25

Running the test build with the Python scrapers 0.0.2.

What I'm seeing is that when I select reload on the artist info dialog, the progress dialog opens and then the integrated artist scraper starts up. When the scraper exits, the yes/no dialog opens, and I can select "no" (don't ignore NFO).
The integrated artist scraper then starts up again, with the entire contents of the NFO file passed as action=NfoUrl&nfo=
After the Python scraper exits, it logs:

Code:
DEBUG: CAddonSettings[metadata.artists.universal]: loading setting definitions
DEBUG: CAddonSettings[metadata.artists.universal]: trying to load setting definitions from old format...
DEBUG: CAddonSettings[metadata.artists.universal]: loading setting values
DEBUG: MUSIC_INFO::CMusicInfoScanner::DownloadArtistInfo Got details from nfo
DEBUG: CAnnouncementManager - Announcement: OnUpdate from xbmc

Is that what's intended?

scott s.
.

RE: PR15791 Prompt to use nfo files - DaveBlake - 2019-03-25

Scott, that does not sound right, but I am not able to reproduce it. Does the artist have an mbid from tags (it could be in the table from previous scraping; check bScrapedMBID if unsure)? Is the new Python scraper set as the default artist scraper, or have you picked it just for this artist?

A look at the log might be useful if you have it, and the album.nfo file too.

EDIT: And finally, how many album scrapers do you have installed and enabled, and what are they?

I think this could be related to CNfoFile and the fact that it loops through scrapers looking for one that can actually read the album.nfo file; see https://github.com/xbmc/xbmc/blob/eb161cdb7c4c97e9f054192c9ea9e8d00839c305/xbmc/NfoFile.cpp#L78

RE: PR15791 Prompt to use nfo files - scott967 - 2019-03-26

1. I just tested artist reload, not album.
2. I did a new clean install with
-- music "fetch add'l" on
-- artist info folder set to empty folder
-- installed the integral artist and album scrapers ver 0.0.2
-- set default artist and album scrapers to the integrated scrapers.
3. Added a new music source and scanned/scraped.
4. Exported all artists/albums to the artist info folder
5. Restarted Kodi. Navigated to artist, opened artist info and selected reload.
6. On select dialog, selected "no" for ignore local.

Debug log: https://pastebin.com/jPnw8JEU

Got same results. It works, but I don't see why it calls the integral artist scraper twice (and loads progress dialog in the background) and then does something with UAS.

scott s.
.

RE: PR15791 Prompt to use nfo files - Karellen - 2019-03-26

@scott967

The discussion here touches on what you have encountered... https://github.com/xbmc/xbmc/pull/15791#issuecomment-475482540

RE: PR15791 Prompt to use nfo files - DaveBlake - 2019-03-26

Yes @Karellen also noticed this behaviour while testing PR15791 (optionally ignore NFO on refresh) with the beta Python scrapers enabled. In his case it resulted in an error because the NFO file was created with available art URL data fetched using a previous scraper version and that produced an NFO that the xml scraper could not parse.

But none of this behaviour is caused by PR15791; it relates to the use of Python scrapers, hence I am going to edit the thread title.

(2019-03-26, 02:53)scott967 Wrote: Got same results. It works, but I don't see why it calls the integral artist scraper twice (and loads progress dialog in the background) and then does something with UAS.

It does seem odd, doesn't it? But it is also how scraping from NFO files has worked for years; the logging that comes with calling Python scripts just makes the multiple scraper calls more obvious.

The first thing the scraper (core code) always does is resolve the MusicBrainz ID (if it has one) and create a "URL":
- for the xml scrapers this is actually the URL needed to fetch data for that item from MusicBrainz:
`http://musicbrainz.org/ws/2/artist/e119e5ff-0de0-421c-a630-0516c6acede8?inc=url-rels`
- for Python scrapers it looks like {"mbid": "e119e5ff-0de0-421c-a630-0516c6acede8", "artist": ""}
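
As a rough illustration of the two "URL" forms above, here is a minimal Python sketch. It is not the Kodi core code (which is C++); the function names are hypothetical, and only the URL template and the JSON shape are taken from the examples in this post.

```python
import json

# Template copied from the xml scraper example above.
MB_ARTIST_URL = "http://musicbrainz.org/ws/2/artist/{mbid}?inc=url-rels"


def xml_scraper_url(mbid):
    """Hypothetical helper: the real web service URL an xml scraper would fetch."""
    return MB_ARTIST_URL.format(mbid=mbid)


def python_scraper_url(mbid, artist=""):
    """Hypothetical helper: the JSON 'URL' a Python scraper is handed instead."""
    return json.dumps({"mbid": mbid, "artist": artist})


mbid = "e119e5ff-0de0-421c-a630-0516c6acede8"
print(xml_scraper_url(mbid))
print(python_scraper_url(mbid))
```

The point is only that the same MusicBrainz ID ends up in two differently shaped strings, depending on the kind of scraper.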

Then it checks for an NFO file, and if one exists (and is not ignored) the process passes to the NFO reader. The NFO reader loops through all the enabled scrapers of that kind, trying each one on the NFO file until it finds one that can successfully scrape it. In that list of scrapers will be metadata.local, and that reads the NFO. The order of scrapers in the internal list will affect which other scrapers get tried before it.
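
The loop described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual CNfoFile implementation: `read_nfo` and the two stand-in scraper callables are hypothetical, and only the add-on ids come from this thread.

```python
def read_nfo(nfo_text, scrapers):
    """Try each enabled scraper in list order until one accepts the NFO.

    Returns the id of the scraper that succeeded (or None) together with
    the ids of every scraper that was tried along the way.
    """
    tried = []
    for scraper_id, can_read in scrapers:
        tried.append(scraper_id)
        if can_read(nfo_text):
            return scraper_id, tried
    return None, tried


def online_scraper(nfo_text):
    # Stand-in for an online scraper that finds nothing usable in the file.
    return False


def local_scraper(nfo_text):
    # Stand-in for metadata.local, which reads the xml tags directly.
    return "<artist>" in nfo_text


scrapers = [
    ("metadata.artists.universal", online_scraper),
    ("metadata.local", local_scraper),
]

winner, tried = read_nfo("<artist><name>Example</name></artist>", scrapers)
print(winner, tried)
```

With this ordering the online scraper is called first even though metadata.local ends up reading the file, which matches the extra scraper calls seen in the log.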

So Scott, what you observe makes sense given how the core code works, but I have doubts about how sensible an approach it is. Why resolve the MusicBrainz ID when it is not used? Why loop through scrapers when metadata.local is what is wanted? Something to be looked at more fully for v19.

RE: PR15791 Prompt to use nfo files - scott967 - 2019-03-27

Thanks for the overview of the program flow. I didn't want to pollute the PR conversation with non-code Qs.

scott s.
.

RE: PR15791 Prompt to use nfo files - ronie - 2019-03-27

(2019-03-26, 09:52)DaveBlake Wrote: The NFO reader loops through all the enabled scrapers of that kind, trying each one on the NFO file until it finds one that can successfully scrape it. In that list of scrapers will be metadata.local, and that reads the NFO. The order of scrapers in the internal list will affect which other scrapers get tried before it.

So Scott, what you observe makes sense given how the core code works, but I have doubts about how sensible an approach it is. Why resolve the MusicBrainz ID when it is not used? Why loop through scrapers when metadata.local is what is wanted? Something to be looked at more fully for v19.

not sure if you're aware or not, so thought i'd mention it anyway...
.nfo files are not passed to xml/python scrapers in order to parse the metadata in those files, but to check if the .nfo file contains a url.
see:
- https://kodi.wiki/view/NFO_files#Parsing_nfo
- https://kodi.wiki/view/NFO_files#Combination_nfo
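
To make the "check for a url" step concrete, here is a minimal Python sketch of how a scraper might handle such a request. It assumes the request arrives as a query string shaped like the action=NfoUrl&nfo= call reported earlier in the thread; the function name, the regular expression, and the idea that the URL is a musicbrainz.org artist link are illustrative assumptions, not the behaviour of any particular scraper.

```python
import re
from urllib.parse import parse_qs, quote_plus

# Assumed pattern: a MusicBrainz artist page link followed by a 36-character MBID.
MB_ARTIST_LINK = re.compile(
    r"musicbrainz\.org/artist/"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def handle_request(query):
    """Return the MBID found in the NFO text, or None if there is no usable URL."""
    params = parse_qs(query.lstrip("?"))
    if params.get("action", [""])[0] != "NfoUrl":
        return None
    nfo_text = params.get("nfo", [""])[0]
    match = MB_ARTIST_LINK.search(nfo_text)
    return match.group(1) if match else None


# A combination NFO: xml tags followed by a URL.
combo_nfo = (
    "<artist><name>Example</name></artist>\n"
    "https://musicbrainz.org/artist/e119e5ff-0de0-421c-a630-0516c6acede8"
)
print(handle_request("?action=NfoUrl&nfo=" + quote_plus(combo_nfo)))

# A plain xml NFO with no URL: nothing for an online scraper to use.
plain_nfo = "<artist><name>Example</name></artist>"
print(handle_request("?action=NfoUrl&nfo=" + quote_plus(plain_nfo)))
```

In the second case the scraper reports nothing, and reading the xml tags is left to the local NFO handling.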

RE: Scraping from NFO files with new Python music scrapers - DaveBlake - 2019-03-27

Bless me, there is a wiki. I never think to look, thanks @ronie. :)

I was aware that NFO files could contain a URL but had not put all the pieces together.

I was thinking that it could be a matter of music needs not being quite the same as video ones, and things being done unnecessarily. It still seems a little odd to resolve the MusicBrainz ID first; was that so it can take precedence over any other URL found in the NFO? Then it loops through all scrapers to see if any of them finds a URL and can use it to get data, and metadata.local will read the xml tags.

So having Python scrapers just gives potentially more scrapers to loop through and is more verbose in the log about it, but it is expected for the scraping process to access more than just the default scraper.

RE: Scraping from NFO files with new Python music scrapers - scott967 - 2019-03-27

Never used url nfos for any media so obviously I wasn't thinking it through.

scott s.
.

RE: Scraping from NFO files with new Python music scrapers - ronie - 2019-03-29

(2019-03-27, 09:16)DaveBlake Wrote: It still seems a little odd to resolve the MusicBrainz ID first; was that so it can take precedence over any other URL found in the NFO?

i'm afraid i have no idea about that.
seems odd indeed, can't think of a valid use-case for it.