@ronie I've done some album scraping with v002 and got a crop of script errors for you to check. The log is a monster 15MB, so it is zipped here. 1435 albums out of 1542 scraped successfully, taking 2h 15m.
I have yet to examine the successes more carefully to validate the data etc. The SQL errors I understand: they are down to duplicate albums in my test data, one with an mbid and one without (so seen by Kodi as different). The scraper fetches the mbid, and so we get an error on trying to save it. I'll give some thought to what is best to do in that case, but it is a core issue, not an addon one.
Edit:
Multiple Scraper runs needed
Re-scraping twice more, which only retries the albums that were not scraped successfully before, took the unscraped albums down from 107 to 64. Not sure why additional attempts were needed: there were no errors or timeouts in the log for these albums, and some already had mbids.
Also, I think that the progress bar vanished during the initial scraping, causing me to think it had finished when it had not. This could be related perhaps? Either way, I have been unable to repeat the progress bar issue, and see nothing in the log. All I can say is that it is odd and something to watch for.
Look up by name identification issues for collaboration albums
Of the remaining 64 albums, some failures are expected because the albums are not in the MusicBrainz or Discogs databases, and some cause the db error that I will fix.
The rest fail look up by name because they have multiple album artists. Many of my examples of collaborations are classical music, which I generally tag with mbids anyway, but this does affect other genres, just to a lesser extent. We could just shrug and tell users to ensure collaborations have mbids, or maybe we could try to pass and use the artist names differently?
An example from my test data is "Neck and Neck" by "Chet Atkins / Mark Knopfler"
3f831051-75f4-3085-8d0f-8b94adbb3c5e
The scraper is passed the artist display text e.g. "Chet Atkins / Mark Knopfler", but requests like
Code:
http://musicbrainz.org/ws/2/release/?fmt=json&query=release:"Neck and neck" AND (artistname:"Chet Atkins / Mark Knopfler" OR artist:"Chet Atkins / Mark Knopfler")
do not get a result from MusicBrainz.
However
Code:
http://musicbrainz.org/ws/2/release/?fmt=json&query=release:"Neck and neck" AND (artistname:"Chet Atkins" OR artist:"Chet Atkins")
or
http://musicbrainz.org/ws/2/release/?fmt=json&query=release:"Neck and neck" AND (artistname:"Mark Knopfler" OR artist:"Mark Knopfler")
or
http://musicbrainz.org/ws/2/release/?fmt=json&query=release:"Neck and neck" AND (artistname:"Chet Atkins & Mark Knopfler" OR artist:"Chet Atkins & Mark Knopfler")
or
http://musicbrainz.org/ws/2/release/?fmt=json&query=release:"Neck and neck" AND (artistname:"Mark Knopfler" OR artist:"Mark Knopfler" OR artistname:"Chet Atkins" OR artist:"Chet Atkins")
will find the album successfully.
Note that it was having " / " in the display name rather than " & " that mattered in this case; in others it could be the use of " and " etc. Getting a matching combined name is going to be tricky. Perhaps MusicBrainz have some advice for queries using multiple artist names?
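As a rough sketch of the last working query form above (each artist name ORed on its own), something like the following could build the request from a list of individual names. The function names and the phrase escaping are my assumptions for illustration, not what the scraper currently does, and it assumes the scraper is handed the individual names rather than the display text.

```python
from urllib.parse import urlencode

# Same endpoint as used in the example requests above
MB_RELEASE_URL = "http://musicbrainz.org/ws/2/release/"


def quote_phrase(term):
    """Wrap a term in double quotes, escaping backslashes and quotes (assumed Lucene-style escaping)."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_release_query(album, artists):
    """Build a query that matches the album title and ANY one of the individual artist names."""
    clauses = []
    for name in artists:
        phrase = quote_phrase(name)
        clauses.append(f"artistname:{phrase}")
        clauses.append(f"artist:{phrase}")
    query = f"release:{quote_phrase(album)}"
    if clauses:
        query += " AND (" + " OR ".join(clauses) + ")"
    return query


def build_release_url(album, artists):
    """Return the full request URL with the query string percent-encoded."""
    params = {"fmt": "json", "query": build_release_query(album, artists)}
    return MB_RELEASE_URL + "?" + urlencode(params)


print(build_release_url("Neck and neck", ["Mark Knopfler", "Chet Atkins"]))
```

Because the names are ORed, this may return more candidates than a combined-name match would, so the scraper would still need to pick the best result from the list.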
An example from classical music is
"An Irish Symphony / A Comedy Overture" by "Hamilton Harty, Ulster Orchestra, Bryden Thomson"
4c2c013d-d901-4690-97d3-cc99a770b469
As the only release in the db of those works combined, it can be found using any one of the three album artists, or with a precisely punctuated combination of all three:
"Hamilton Harty" - yes
"Bryden Thomson" - yes
"Ulster Orchestra" - yes
"Hamilton Harty; Ulster Orchestra, Bryden Thomson" - yes
"Hamilton Harty, Ulster Orchestra, Bryden Thomson" - no
"Hamilton Harty / Ulster Orchestra / Bryden Thomson" - no
"Hamilton Harty; Bryden Thomson" - no
Not sure if we can easily pass an array of names to the scraper, or if we just pass a string with separators that the scraper can split. The artist display name itself cannot be reliably split, so it is best to let the core tell the scraper the individual artists it knows.
I will add that for much classical music name lookup is useless, e.g. "Symphony No. 4" by "Ludwig van Beethoven" matches hundreds of releases, and adding conductor and orchestra is often still not enough to identify the album with any accuracy.
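If we went the separator route, the scraper side could be as small as the sketch below. The separator character here is purely hypothetical (nothing in core passes it today); the point is only that it should be something that cannot appear in a display name, unlike " / ", "," or ";".

```python
# Hypothetical separator the core might use when passing individual artist names;
# the ASCII "unit separator" is chosen only because it will not occur in a real name.
ARTIST_SEPARATOR = "\x1f"


def split_artists(value, separator=ARTIST_SEPARATOR):
    """Split a separator-joined string into individual artist names.

    Empty entries are dropped and duplicates (ignoring case) are removed,
    keeping the original order.
    """
    seen = set()
    names = []
    for part in value.split(separator):
        name = part.strip()
        key = name.casefold()
        if name and key not in seen:
            seen.add(key)
            names.append(name)
    return names


joined = ARTIST_SEPARATOR.join(["Hamilton Harty", "Ulster Orchestra", "Bryden Thomson"])
print(split_artists(joined))
```

A display string such as "Hamilton Harty, Ulster Orchestra, Bryden Thomson" contains no such separator, so it would come back as a single name and the scraper would fall back to the current behaviour.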