Release Universal Scraper for Music Albums
#1

Sites used
http://www.theaudiodb.com [default album description and review]
http://www.musicbrainz.com [default album rating]
http://www.allmusic.com [default styles, moods, themes]
http://www.fanart.tv [default album thumb]

This scraper collects information from the following supported sites: MusicBrainz, last.fm, theaudiodb.com and allmusic.com, and grabs artwork from fanart.tv, last.fm, theaudiodb.com and allmusic.com.

The initial search is always done on MusicBrainz. If an allmusic link has not been added to the release on the MusicBrainz site, fields from allmusic.com cannot be fetched (it is very easy to add those missing links, though).

You can find and download it from the Team-XBMC official addon repository.
Reply
#2
Thank you very much for this. It mostly works great, however there is one problem. When an album has more than one release listed on MusicBrainz then the scraper seems to pick one at random and more often than not it picks the wrong one. Album releases do vary greatly between different regions with different tracks, track ordering, number of CDs etc. Therefore the scraper invariably downloads an entirely wrong track listing and publisher for the album. Please can you add an option to manually select the release when MusicBrainz has more than one listed. Thank you for your time.
Reply
#3
I am aware of this, but it is currently not possible to enable this at the addon level. The scraper itself is already capable of handling it.
However, the current behaviour of XBMC is that if there is a match >99%, it picks the first hit automatically, regardless of whether there are multiple 100% matches. I have already requested a change, and odds are high we will come up with something for this in Frodo.

There are two workarounds; unfortunately both are uncomfortable (as workarounds most often are).
1. create an album.nfo in the album folder with the MusicBrainz link of the release you want
2. there is another one which involves modifying the scraper code, but if you do this, you will need to select ALL albums manually, which is not really comfortable if you have 100s or 1000s of albums. Let me know if you are interested in this.
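
For the first workaround, here is a minimal Python sketch of creating the file. The helper name `write_album_nfo`, the `http://musicbrainz.org/release/<MBID>` link form and the idea that the file holds nothing but the link are assumptions for illustration; the MBID used in the example call is a placeholder, not a real release.

```python
from pathlib import Path


def write_album_nfo(album_dir, release_mbid):
    """Write an album.nfo holding only the MusicBrainz release link.

    Assumption: a bare release URL on a single line is enough for the
    scraper to pick the intended release. Adjust if your setup differs.
    """
    nfo_path = Path(album_dir) / "album.nfo"
    link = f"http://musicbrainz.org/release/{release_mbid}"
    nfo_path.write_text(link + "\n", encoding="utf-8")
    return nfo_path
```

Calling `write_album_nfo("/music/Artist/Album", "<MBID of the release you want>")` would place the file next to the tracks; copy the MBID from the release page you actually want matched.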
Reply
#4
Thank you for your reply. Yes I would like to know more about the second workaround, thanks.

Also, I have found a problem with the scraping of moods from AllMusic. No matter what moods are listed on the AllMusic page (e.g this album has "Autumnal", "Literate", "Sardonic", "Sensual" etc. listed) the scraper always downloads the same moods for every album: "Romantic", "Bittersweet", "Druggy", "Melancholy", "Hypnotic", "Aggressive" and "Sexy". The themes and styles for each album are correctly downloaded.
Reply
#5
Thanks for noticing and reporting this. It was due to a bug that is fixed in scraper v1.0.1 (do a force refresh to speed up the update).
...and thank you on behalf of the community for adding allmusic links to MusicBrainz. Keep up the spirit! (You can actually easily enable the scraper to fetch album reviews from allmusic if you want; just look at resources/settings.xml in the scraper folder.)

For the second workaround we spoke about above, open up 'albumuniversal.xml' and change the following lines:
line 15:
Code:
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\5-\4-T#\6&lt;/year&gt;&lt;artist&gt; \3 &lt;/artist&gt;&lt;title&gt; \2 &lt;/title&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;http://musicbrainz.org/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5">

and line 18:
Code:
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\5-\4-T#\6&lt;/year&gt;&lt;artist&gt; \3 &lt;/artist&gt;&lt;title&gt; \2 &lt;/title&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;http://musicbrainz.org/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5+">

or in short, just insert a space character in both lines before and after both '\3' and '\2'.
If you do this, the scraper will NOT find anything automatically, but if you do a manual refresh, it will show you a list of releases to select from.
Reply
#6
Great stuff! I confirm that the moods are working properly now in v1.0.1, thanks for fixing it so quickly. I also have the second workaround working, as well as the AllMusic reviews. Thanks for all your hard work.
Reply
#7
I have found another problem, and after a little digging I think I know the cause. The scraper downloads the duration of the 6th track on this album as 08:45, whereas the MusicBrainz page lists it as 00:53. The problem seems to be that the scraper retrieves the track duration in milliseconds, takes the first 3 digits and assumes that they are seconds. Using this track as an example, the time in milliseconds retrieved by the scraper is 52506 (which equals 00:53 when rounded). It then takes the first 3 digits, 525, and wrongly treats them as whole seconds, which gives 525 / 60 = 8.75 minutes, or 08:45.
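
The arithmetic above can be reproduced with a short Python sketch. The function names are made up for illustration and this is not the scraper's actual code (the scraper is XML/regex based); it only mirrors the described behaviour.

```python
def buggy_seconds(ms_text):
    """Mimic the reported behaviour: treat the first 3 digits as seconds."""
    return int(ms_text[:3])


def correct_seconds(ms_text):
    """Convert milliseconds to whole seconds, rounded to the nearest second."""
    return round(int(ms_text) / 1000)


def mmss(seconds):
    """Format a number of seconds as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


length_ms = "52506"  # value reported for the 6th track
print(mmss(buggy_seconds(length_ms)))    # 08:45
print(mmss(correct_seconds(length_ms)))  # 00:53
```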
Reply
#8
Yes, your findings are correct. I need to think about whether I can find some logic to work around this (I am not even sure there is a solution).
The XBMC scraper addon only scrapes data and is not able to manipulate it.
Reply
#9
Nah, there is no solution for this until we can make XBMC understand durations in milliseconds...
Will raise this internally.
Reply
#10
Okay I understand. Thanks for taking a look.
Reply
#11
Can't you just reverse the logic in the scraper? Instead of taking the first 3 digits, capture everything but the last 3 digits?
i.e. rather than <length>(\d{3})\d*?</length> have <length>(\d*?)\d{3}</length>
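
A quick way to compare the two patterns is to run them side by side. The sketch below uses Python's `re` module rather than XBMC's own regex engine, on the assumption that both treat these simple patterns the same way; the helper name `capture` is made up for illustration.

```python
import re

# Original pattern: captures the FIRST three digits of the millisecond value.
OLD = re.compile(r"<length>(\d{3})\d*?</length>")
# Suggested pattern: captures everything EXCEPT the last three digits.
NEW = re.compile(r"<length>(\d*?)\d{3}</length>")


def capture(pattern, xml):
    """Return the captured 'seconds' string, or None if nothing matches."""
    match = pattern.search(xml)
    return match.group(1) if match else None


short_track = "<length>52506</length>"   # 52.506 s, the track from post #7
long_track = "<length>225000</length>"   # 225 s, a six-digit example value

print(capture(OLD, short_track), capture(NEW, short_track))  # 525 52
print(capture(OLD, long_track), capture(NEW, long_track))    # 225 225
```

Both patterns agree for six-digit values, which would explain why only unusually short (or very long) tracks showed the wrong duration. Note that dropping the last three digits truncates rather than rounds, so this track comes out as 52 seconds instead of 53.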
Reply
#12
Whoops, where did I lose my head today?
Hell yeah, this will surely do it. Will fix this when I get home.

Thanks for refreshing my mind.
Reply
#13
hey olympia.

this seems to overrun the musicbrainz query/sec limit for me on a semi-regular basis. the problem is, once the qps limit kicks in, musicbrainz starts serving up a really simple, fast reject page, which means the queries go even faster and the rate limit just stays in force. so i have to stop scanning and restart.

i think unfortunately the real solution is a rate-limit per domain, and nothing we can do in the scraper. :-(
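
To illustrate what a per-domain rate limit could look like, here is a minimal Python sketch. It is not XBMC code: the class name `DomainRateLimiter`, the one-second interval and the injectable `clock`/`sleep` parameters are all assumptions made for the example.

```python
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = {}  # host -> earliest time the next request may start

    def wait(self, url):
        """Block until a request to url's host is allowed; return the delay used."""
        host = urlparse(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the slot after this request, even if we did not have to wait.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay


# Illustrative value only: allow one request per second per host.
limiter = DomainRateLimiter(min_interval=1.0)
```

A caller would invoke `limiter.wait(url)` before each HTTP request. Because the limit is keyed on the host, fast reject pages from one site would no longer speed up the request loop against that site, while requests to other sites stay unthrottled.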
Reply
#14
^^ that's too bad. I haven't experienced this yet, as I haven't run a mass scrape on my side.
Reply
#15
(2012-06-10, 10:58)Zippy79 Wrote: I have found another problem, and after a little digging I think I know the cause. The scraper downloads the duration of the 6th track on this album as 08:45, whereas the MusicBrainz page lists it as 00:53. The problem seems to be that the scraper retrieves the track duration in milliseconds, takes the first 3 digits and assumes that they are seconds. Using this track as an example, the time in milliseconds retrieved by the scraper is 52506 (which equals 00:53 when rounded). It then takes the first 3 digits, 525, and wrongly treats them as whole seconds, which gives 525 / 60 = 8.75 minutes, or 08:45.

Fixed in v1.0.2
Credits to scudlee (see changelog), cheers!
Reply