Kodi Community Forum
Release Universal Scraper for Music Artists - Printable Version


RE: [Release] Universal Scraper for Music Artists - deh2k7 - 2012-07-20

(2012-07-13, 08:12)deh2k7 Wrote: Liking the new music scrapers, fine work! I am seeing some weird, intermittent issues when scraping, however.

For things like genre and years active, I have the artist scraper set to allmusic. For many artists this seems to work fine, but for many others I'm not getting the information scraped. I verified on the allmusic site that the fields were populated, but the scraper doesn't seem to retrieve the information consistently.

I'm seeing a ton of these messages while scraping:

Code:
00:14:55 T:5848   ERROR: ADDON::CScraper::Run: Unable to parse web site
00:14:56 T:668 WARNING: XFILE::CFileCurl::CReadState::FillBuffer: curl failed with code 22
00:14:56 T:668   ERROR: CFileCurl::CReadState::Open, didn't get any data from stream.

When I say a ton, I mean hundreds to thousands in the log. What does the above mean?

Olympia - I'm still seeing the error messages above, and they seem to correspond to failed attempts to scrape an artist. I enabled debugging, wiped my music DB, rescraped, and was able to reproduce the issue when scraping with the Universal Artist Scraper. So far I have not noticed any issues with albums, but I've been focused on artists. The issue is intermittent: it occurs regularly, but not on the same artists each time, and as far as I can tell the affected artists seem random.

Here's an excerpt from the debug log (sorry, pastebin was acting up on me tonight):
Code:
00:31:54 T:4288   DEBUG: MUSIC_INFO::CMusicInfoScanner::DoScan Scanning dir 'M:\Billy Joel\' as not in the database
00:31:54 T:4288   DEBUG: MUSIC_INFO::CMusicInfoScanner::DoScan Scanning dir 'M:\Billy Joel\extrafanart\' as not in the database
00:31:54 T:4288   DEBUG: MUSIC_INFO::CMusicInfoScanner::DoScan Scanning dir 'M:\Billy Joel\Greatest Hits, Vols. 1 & 2 (1973-1985)\' as not in the database
00:31:54 T:4288    INFO: Creating album thumb from memory: special://masterprofile/Thumbnails/Music/e/e6528036.tbn

00:31:55 T:4804   DEBUG: Thread MUSIC_GRABBER::CMusicInfoScraper start, auto delete: 0
00:31:55 T:4804   DEBUG: ADDON::CScraper::FindArtist: Searching for 'Billy Joel' using Universal Artist Scraper scraper (file: 'C:\Users\XBMC\AppData\Roaming\XBMC\addons\metadata.artists.universal', content: 'artists', version: '2.1.1')
00:31:55 T:4804   DEBUG: scraper: CreateArtistSearchUrl returned <url>http://search.musicbrainz.org/ws/2/artist/?fmt=xml&query=artist:"Billy%20Joel"&limit=100</url>
00:31:55 T:4804   DEBUG: FileCurl::Open(0EC8DCDC) http://search.musicbrainz.org/ws/2/artist/?fmt=xml&query=artist:"Billy%20Joel"&limit=100
00:31:55 T:4804 WARNING: XFILE::CFileCurl::CReadState::FillBuffer: curl failed with code 22
00:31:55 T:4804   ERROR: CFileCurl::CReadState::Open, didn't get any data from stream.
00:31:55 T:4804   ERROR: ADDON::CScraper::Run: Unable to parse web site
00:31:55 T:4804   DEBUG: Thread MUSIC_GRABBER::CMusicInfoScraper 4804 terminating
00:31:55 T:888   DEBUG: Thread MUSIC_GRABBER::CMusicInfoScraper start, auto delete: 0
00:31:55 T:888   DEBUG: ADDON::CScraper::FindAlbum: Searching for 'Billy Joel - Greatest Hits, Vols. 1 & 2 (1973-1985)' using Universal Album Scraper scraper (path: 'C:\Users\XBMC\AppData\Roaming\XBMC\addons\metadata.album.universal', content: 'albums', version: '1.3.1')
00:31:55 T:888   DEBUG: scraper: CreateAlbumSearchUrl returned <url>http://search.musicbrainz.org/ws/2/release/?fmt=xml&query=release:"Greatest%20Hits%2c%20Vols%2e%201%20%26%202%20%281973%2d1985%29"%20AND%20artist:"Billy%20Joel"</url>
00:31:55 T:888   DEBUG: FileCurl::Open(0EC8DDE0) http://search.musicbrainz.org/ws/2/release/?fmt=xml&query=release:"Greatest%20Hits%2c%20Vols%2e%201%20%26%202%20%281973%2d1985%29"%20AND%20artist:"Billy%20Joel"
00:31:56 T:888 WARNING: XFILE::CFileCurl::CReadState::FillBuffer: curl failed with code 22
00:31:56 T:888   ERROR: CFileCurl::CReadState::Open, didn't get any data from stream.
00:31:56 T:888   ERROR: ADDON::CScraper::Run: Unable to parse web site
00:31:56 T:888   DEBUG: Thread MUSIC_GRABBER::CMusicInfoScraper 888 terminating

I had a large number of artists come up like this in my most recent attempt, but again it doesn't seem to always be the same artists.

Thanks in advance for all of the help (and a great scraper)!




Re: [Release] Universal Scraper for Music Artists - olympia - 2012-07-20

What happens if you individually refresh an artist that failed to scrape during mass scraping?


RE: [Release] Universal Scraper for Music Artists - deh2k7 - 2012-07-20

Refreshing manually after the scrape does indeed work for most artists; however, I found a few that didn't search properly. Here are a couple of examples:

The Red Hot Chili Peppers (searched if I omitted the 'The')
The Pussycat Dolls (searched if I omitted the 'The')
Motley Crue (couldn't get it to search, but it has the umlauts above the 'o' and the 'u' in my library. It returned no results, but also did not give me a chance to correct it)
DJ Keoki (scraped if I just searched for Keoki)

Many artists rescraped just fine when I performed a manual refresh. I noticed that the artists that did not scrape properly seemed to come in clusters. I'm guessing maybe the scraper is overloading the request source and we're getting timed out or something?


RE: [Release] Universal Scraper for Music Artists - olympia - 2012-07-20

(2012-07-20, 22:23)deh2k7 Wrote: The Red Hot Chili Peppers (searched if I omitted the 'The')
The Pussycat Dolls (searched if I omitted the 'The')
Motley Crue (couldn't get it to search, but it has the umlauts above the 'o' and the 'u' in my library. It returned no results, but also did not give me a chance to correct it)
DJ Keoki (scraped if I just searched for Keoki)
Sorry to say, but did you actually look these up on MB before wasting my time with them?
How would the scraper find these if they are named differently there (with the only exception of Motley Crue)?

(2012-07-20, 22:23)deh2k7 Wrote: I'm guessing maybe the scraper is overloading the request source and we're getting timed out or something?
Yes, this is a known issue with MusicBrainz. XBMC is scraping faster than MB would allow.
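For context on the curl errors quoted earlier: libcurl's code 22 is CURLE_HTTP_RETURNED_ERROR, which it reports when a request is configured to fail on HTTP responses of 400 or higher. MusicBrainz's web service documentation asks clients to stay at roughly one request per second and rejects excess requests with HTTP 503, so the two fit together. The usual client-side remedy is a throttle. The following is a minimal sketch under that assumption; `RequestThrottle` is a hypothetical helper, not part of XBMC or the Universal scrapers, and its clock and sleep functions are injectable so it can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class RequestThrottle:
    """Keep consecutive requests at least `min_interval` seconds apart.

    Hypothetical helper for illustration only; not thread-safe.
    """

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

Calling `throttle.wait()` immediately before each request (for example before `urllib.request.urlopen(url)`) keeps a single client under that rate. The log shows the artist and album lookups running on separate threads, so a real implementation would also need a lock around `wait()`.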


RE: [Release] Universal Scraper for Music Artists - Martijn - 2012-07-20

Official name is "Red Hot Chili Peppers", without "The".


RE: [Release] Universal Scraper for Music Artists - olympia - 2012-07-20

Yeah, and DJ Keoki is 'Keoki'; however, 'The Pussycat Dolls' really is 'The Pussycat Dolls'. This one is actually named incorrectly on MB.


RE: [Release] Universal Scraper for Music Artists - deh2k7 - 2012-07-21

(2012-07-20, 22:45)olympia Wrote:
(2012-07-20, 22:23)deh2k7 Wrote: The Red Hot Chili Peppers (searched if I omitted the 'The')
The Pussycat Dolls (searched if I omitted the 'The')
Motley Crue (couldn't get it to search, but it has the umlauts above the 'o' and the 'u' in my library. It returned no results, but also did not give me a chance to correct it)
DJ Keoki (scraped if I just searched for Keoki)
Sorry to say, but did you actually look these up on MB before wasting my time with them?
How would the scraper find these if they are named differently there (with the only exception of Motley Crue)?

(2012-07-20, 22:23)deh2k7 Wrote: I'm guessing maybe the scraper is overloading the request source and we're getting timed out or something?
Yes, this is a known issue with MusicBrainz. XBMC is scraping faster than MB would allow.

Olympia - I appreciate the help and all of the work you have done to date, but your response seems a bit harsh. I have been seeing some legitimate issues, and it's hard to tell which issues are known, which are not, and which are non-issues that simply reflect how the scraper works. In each case above, with or without the 'The', MusicBrainz produces a list of search results with the artist as the first entry AND with a match score of 100. I would think the scraper would select these automatically, as it seems to do in most cases.

The next closest score was below 50 in ALL cases (21, 33, 43); it's not as if it were 98 vs. 100. With the Chili Peppers, searching with or without the leading 'The' still produces a list of results with the Chili Peppers at #1 with a score of 100. I'm not sure why or how it wastes your time to raise a relatively common case where the scraper fails to match an artist without manual intervention, when it could easily accept the highest-scoring result (or the first one in the list in the case of a near match). This would also have addressed the Motley Crue scraping issue, as it matches in the first position with a score of 100 with or without the umlauts.
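The selection rule proposed here (accept the top result when it scores 100 and the runner-up is far behind) could look like the following sketch. It assumes the search results have already been reduced to `(name, score)` pairs with scores on the 0-100 scale shown in MusicBrainz search results; the XML parsing is not shown, and `pick_best_match`, `min_score` and `min_margin` are hypothetical names and thresholds chosen to fit the scores quoted above (100 versus 21, 33 and 43).

```python
from typing import List, Optional, Tuple


def pick_best_match(results: List[Tuple[str, int]],
                    min_score: int = 100,
                    min_margin: int = 50) -> Optional[str]:
    """Return the top result's name if it clearly beats the runner-up, else None.

    `results` holds (artist_name, score) pairs; thresholds are hypothetical.
    """
    if not results:
        return None
    # Highest score first; do not rely on the order the server returned.
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    name, score = ranked[0]
    if score < min_score:
        return None
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if score - runner_up < min_margin:
        return None  # too close to call; leave it for manual selection
    return name
```

Returning `None` for close calls (such as 100 versus 98) keeps those for the user to pick manually instead of guessing.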

I was also under the assumption that the scraper would be a bit less sensitive to non-specific leading articles, like 'the' or 'a' (thinking A Perfect Circle as an example), just in case.
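Being less sensitive to leading articles and accented letters is usually done by normalizing both the local tag and each candidate name before comparing them, while leaving the search string itself unchanged. A minimal sketch, assuming a hypothetical English-only article list (`normalize_artist` and `LEADING_ARTICLES` are illustrative names, not scraper code). Folding names this way does loosen matching, which is the trade-off olympia raises later in the thread.

```python
import re
import unicodedata

# Hypothetical, English-only list of articles to ignore when comparing names.
LEADING_ARTICLES = ("the", "a", "an")


def normalize_artist(name: str) -> str:
    """Fold diacritics, case, punctuation and one leading article (comparison only)."""
    # NFKD splits "ö" into "o" plus a combining diaeresis; drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    text = stripped.casefold()
    # Treat hyphens and other punctuation as spaces: "blink-182" -> "blink 182".
    text = re.sub(r"[^\w\s]", " ", text)
    words = text.split()
    # Drop a leading article, but never reduce the name to nothing.
    if len(words) > 1 and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words)
```

With this, `normalize_artist("The Red Hot Chili Peppers") == normalize_artist("Red Hot Chili Peppers")` and "Mötley Crüe" compares equal to "Motley Crue".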

I would respectfully ask that you consider at least the MusicBrainz match score or result list when matching artists, to facilitate a more streamlined automated scraping process.



RE: [Release] Universal Scraper for Music Artists - deh2k7 - 2012-07-21

I also found 3 others that are problematic - blink-182, SR-71, and Ill Niño. I'm guessing these 3 share the same issue as Mötley Crüe, with non-English or non-alphanumeric characters in the title?


RE: [Release] Universal Scraper for Music Artists - olympia - 2012-07-21

Right! Search results loosened up. Let's see if there will be other issues due to this...
Hopefully there won't be and then everyone can be happy.


RE: [Release] Universal Scraper for Music Artists - deh2k7 - 2012-07-21

I wasn't asking for looser search results, just that you use the scoring mechanism MusicBrainz already provides instead of being inflexible and failing on anything less than a 100% character-for-character exact match. Not everyone has perfectly tagged music that matches MusicBrainz, and MusicBrainz, being a user-submitted site, is not always correct, as one of my examples demonstrates. I also provided 4 other examples where your scraper fails to find any matches, even though the exact, character-for-character search on MusicBrainz works.

Instead of considering that there could be some room for improvement (keep in mind I asked for consideration, not a guaranteed change), my constructive feedback was met with flippant sarcasm. Nice. Thanks for your hard work anyway. A simple "No, it would probably cause more problems than it solves..." would have sufficed.


RE: [Release] Universal Scraper for Music Artists - deh2k7 - 2012-07-21

(2012-07-20, 22:55)olympia Wrote: Yeah, and DJ Keoki is 'Keoki'; however, 'The Pussycat Dolls' really is 'The Pussycat Dolls'. This one is actually named incorrectly on MB.

So how does one deal with incorrect data on MB? Clearly, 'The Pussycat Dolls' and 'Pussycat Dolls' are the same band, and both come up with the same match. Same goes for Chili Peppers.

Keoki is officially known by 3 variations of the same name, depending on the album he released: Keoki, Superstar DJ Keoki, and DJ Keoki.

Doing a simple search on 'keoki' on Amazon will clearly show this: Disco Death Race 2000 is by 'Superstar DJ Keoki' and Journeys is by 'DJ Keoki'. It's just another example of how using the match score on Musicbrainz will facilitate matching on less than 100% character-for-character matches. In all search cases, the best search result has Keoki at the top of the list, with a 100 score and with the next closest match a long way away.

You can choose to ignore this and choose not to incorporate these recommendations, as is your prerogative as the author of this great scraper, but it's just rude to meet constructive feedback with sarcasm and snarky remarks.


RE: [Release] Universal Scraper for Music Artists - olympia - 2012-07-21

I was pissed off by your original post of the problem, because you hadn't done your homework (analysing the problem), and I gave you straight feedback on this. However, I didn't really mean to be sarcastic by saying "Search results loosened up".

...and yes, you were asking for looser search results. Any deviation from the original search string means loosening to me.

In any case, I solved your problem, no? Did you test with the updated scrapers? I know I am not the nicest guy you ever met, but you can't say I am not responsive to the problems...


RE: [Release] Universal Scraper for Music Artists - deh2k7 - 2012-07-23

(2012-07-21, 18:35)olympia Wrote: I was pissed off by your original post of the problem, because you hadn't done your homework (analysing the problem), and I gave you straight feedback on this. However, I didn't really mean to be sarcastic by saying "Search results loosened up".

...and yes, you were asking for looser search results. Any deviation from the original search string means loosening to me.

In any case, I solved your problem, no? Did you test with the updated scrapers? I know I am not the nicest guy you ever met, but you can't say I am not responsive to the problems...

Olympia - you have always been responsive and helpful, so I appreciate that.

I did test with the updated scrapers, and it seems like Motley Crue and Ill Nino are working, but MB did not have allmusic (AM) links for them, so I added those. I will retest when the changes apply to make sure everything comes through.

Still having problems with names like blink-182 and SR-71. I'm not able to get a match at all for SR-71, but blink-182 works if I manually search for 'blink 182', even though both artists appear with the '-' on MB.


RE: [Release] Universal Scraper for Music Artists - olympia - 2012-07-23

I know, I am aware of this issue. The root cause is this:
http://search.musicbrainz.org/ws/2/artist/?fmt=xml&query=artist:blink-182
Seems to be an encoding issue on the MusicBrainz site.
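Olympia attributes the hyphen failures to MusicBrainz itself, so the following is not offered as the fix. It is only a client-side precaution: MusicBrainz search accepts Lucene query syntax, in which `-` is one of the reserved characters (along with `+ & | ! ( ) { } [ ] ^ " ~ * ? : \ /`). A query builder can make a name read literally by backslash-escaping those characters in a single word, or by phrase-quoting a multi-word name, where only `\` and `"` need escaping. `lucene_value` and `artist_search_url` are hypothetical helpers; the endpoint is the one quoted in this thread, and whether escaping changes the blink-182 or SR-71 results depends on MusicBrainz's server-side analysis, which this sketch does not control.

```python
from urllib.parse import quote

# Characters the Lucene classic query parser treats as syntax outside quotes.
LUCENE_SPECIAL = frozenset('+-&|!(){}[]^"~*?:\\/')


def lucene_value(name: str) -> str:
    """Return `name` as a Lucene query value that the parser reads literally."""
    if any(ch.isspace() for ch in name):
        # Multi-word name: use a phrase; inside quotes only \ and " are special.
        inner = name.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + inner + '"'
    # Single word: backslash-escape every reserved character.
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in name)


def artist_search_url(name: str, limit: int = 100) -> str:
    """Build an artist search URL against the endpoint quoted in this thread."""
    query = "artist:" + lucene_value(name)
    # Percent-encode the whole query value (spaces, quotes, colons, backslashes).
    return ("http://search.musicbrainz.org/ws/2/artist/?fmt=xml&query="
            + quote(query, safe="") + "&limit=" + str(limit))
```

For "SR-71" this sends `artist:SR\-71` (percent-encoded), and for "Billy Joel" it sends the phrase `artist:"Billy Joel"`, matching how the scraper already quotes names in the debug log.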

FYI - allmusic links you add become effective immediately.


RE: [Release] Universal Scraper for Music Artists - deh2k7 - 2012-07-24

(2012-07-23, 20:24)olympia Wrote: I know, I am aware of this issue. The root cause is this:
http://search.musicbrainz.org/ws/2/artist/?fmt=xml&query=artist:blink-182
Seems to be an encoding issue on the MusicBrainz site.

FYI - allmusic links you add become effective immediately.

OK - got it on the artists with a hyphen in the name. I validated and verified both Motley Crue and Ill Nino (on all three sites), and a manual refresh in XBMC does not seem to get me all of the information, even though the allmusic links are active. My scraper settings are Last.FM first, with an allmusic fallback. Both artists have bios on both sites, and only the bio seems not to be scraping; I am getting the allmusic info such as genres and years active. I'm scratching my head on these two. Need a debug log?