Kodi Community Forum

Full Version: Universal Scraper for Music Albums
Is there a problem with some characters in album titles? I only ask because whatever I do I can't pull down the review and art for "NME: C86" - I even used MP3tag to tag the tracks from MusicBrainz, so I know it's not a tagging problem. If I search for "NME" I get a bunch of results (none of them being C86), but if I search for "NME:" I get nothing.

I know it's on TheAudioDB: http://www.theaudiodb.com/album/2159559
The scraper is putting the artist name in the genre field. On the current Gotham nightly build, I created a new library from scratch with the option to download additional data turned on. I then exported the library, creating an NFO file for each album. When I checked the files, I found the same error for every album: the artist field is blank and the genre field contains the artist name.

Example:

Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<album>
    <title>Plays Metallica By Four Cellos</title>
    <musicBrainzAlbumID></musicBrainzAlbumID>
    <artist></artist>
    <genre>Apocalyptica</genre>
    <style>Heavy Metal</style>
    <style>Classical Crossover</style>
    <style>Neo-Classical Metal</style>
    <style>Progressive Metal</style>
    <mood>Atmospheric</mood>
    <mood>Complex</mood>
    <mood>Dramatic</mood>
    <mood>Eccentric</mood>
    <mood>Menacing</mood>
    <mood>Sophisticated</mood>
    <mood>Stylish</mood>
    <mood>Theatrical</mood>
    <mood>Visceral</mood>
    <mood>Volatile</mood>
    <mood>Elaborate</mood>
    <theme>Late Night</theme>
    <compilation>false</compilation>
    <review>The premise of a cello-playing four piece (from Finland, no less) performing the music of Metallica may seem outrageous at first, but much of the metal gods&apos; repertoire translates surprisingly well into this &quot;classical&quot; reading. And while Apocalyptica chooses to concentrate on more recent hits like &quot;Enter Sandman,&quot; &quot;Sad But True&quot; and &quot;Wherever I May Roam&quot; for obvious commercial purposes, it is Metallica&apos;s earlier, more complex compositions, like &quot;Master of Puppets,&quot; &quot;Creeping Death,&quot; and &quot;Welcome Home (Sanitarium),&quot; which actually work best in this scenario. The quartet&apos;s love and understanding of the music is the real key here, and their attention to detail and technically flawless performance will no doubt satisfy even the most skeptical of fans. And while it is unlikely that this album will interest non-Metallica fans, it obviously inspired the band themselves to explore the classical direction on their own terms with 1999&apos;s S&amp;M album.</review>
    <type></type>
    <releasedate></releasedate>
    <label>Mercury</label>
    <type></type>
    <thumb>http://userserve-ak.last.fm/serve/300x300/44621003.jpg</thumb>
    <thumb>http://userserve-ak.last.fm/serve/174s/44621003.jpg</thumb>
    <thumb>http://image.allmusic.com/00/amg/cov120/drc800/c809/c809303317d.jpg</thumb>
    <path>G:\Musicas\Apocalyptica\(1996) Plays Metallica\</path>
    <rating>5</rating>
    <year>1996</year>
    <track>
        <musicBrainzTrackID></musicBrainzTrackID>
        <title>Enter Sandman</title>
        <position>1</position>
        <duration>03:42</duration>
    </track>
    <track>
        <musicBrainzTrackID></musicBrainzTrackID>
        <title>Master of Puppets</title>
        <position>2</position>
        <duration>07:15</duration>
    </track>
    <track>
        <musicBrainzTrackID></musicBrainzTrackID>
        <title>Harvester of Sorrow</title>
        <position>3</position>
        <duration>06:14</duration>
    </track>
    <track>
        <musicBrainzTrackID></musicBrainzTrackID>
        <title>The Unforgiven</title>
        <position>4</position>
        <duration>05:21</duration>
    </track>
    <track>
        <musicBrainzTrackID></musicBrainzTrackID>
        <title>Sad But True</title>
        <position>5</position>
        <duration>04:48</duration>
    </track>
    <track>
        <musicBrainzTrackID></musicBrainzTrackID>
        <title>Creeping Death</title>
        <position>6</position>
        <duration>05:06</duration>
    </track>
    <track>
        <musicBrainzTrackID></musicBrainzTrackID>
        <title>Wherever I May Roam</title>
        <position>7</position>
        <duration>05:08</duration>
    </track>
    <track>
        <musicBrainzTrackID></musicBrainzTrackID>
        <title>Welcome Home (Sanitarium)</title>
        <position>8</position>
        <duration>05:50</duration>
    </track>
</album>
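
For anyone who wants to check their own export for the same symptom, here is a minimal sketch in Python. It assumes the NFO layout shown above (an `<album>` root with `<artist>` and `<genre>` children); the helper names are made up for illustration and are not part of XBMC or the scraper.

```python
import xml.etree.ElementTree as ET

def artist_missing(album):
    """True when <artist> is absent or empty in an <album> element."""
    return not (album.findtext("artist") or "").strip()

def genres(album):
    """All non-empty <genre> values of an <album> element."""
    return [g.text.strip() for g in album.findall("genre") if g.text and g.text.strip()]

# Cut-down copy of the NFO above, just enough to show the symptom.
SAMPLE = """<album>
    <title>Plays Metallica By Four Cellos</title>
    <artist></artist>
    <genre>Apocalyptica</genre>
</album>"""

album = ET.fromstring(SAMPLE)
if artist_missing(album):
    print("blank artist; genre field holds:", genres(album))
```

For real files, `ET.parse(path).getroot()` can replace `ET.fromstring(SAMPLE)`.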

Tags album

Image

Thanks, Olympia, for the great work. Very good add-ons.
(2013-05-01, 08:58)Onan Wrote: [ -> ]Is there a problem with some characters in album titles? I only ask because whatever I do I can't pull down the review and art for "NME: C86" - I even used MP3tag to tag the tracks from MusicBrainz, so I know it's not a tagging problem. If I search for "NME" I get a bunch of results (none of them being C86), but if I search for "NME:" I get nothing.

I know it's on TheAudioDB: http://www.theaudiodb.com/album/2159559

The TADB API is working fine, so it must be something to do with the initial MusicBrainz search, XBMC, or the scraper.

Code:
http://www.theaudiodb.com/api/v1/json/1/searchalbum.php?s=Various%20Artists&a=NME:%20C86
@olympia, is this maybe a problem with the MusicBrainz search?

This does not work:
Code:
http://www.musicbrainz.org/ws/2/release-group/?query=releasegroup:NME:%20C86

But removing the ":" works:
Code:
http://www.musicbrainz.org/ws/2/release-group/?query=releasegroup:NME%20C86
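
A likely explanation: the MusicBrainz search uses Lucene query syntax, where ":" separates a field name from its term, so the second colon in "releasegroup:NME: C86" is read as syntax rather than as part of the title. The sketch below (Python, not the scraper's own XML) escapes the Lucene special characters with a backslash instead of dropping them. The function names are made up for illustration, and I have not checked how the server responds to the escaped URL.

```python
import re
from urllib.parse import quote

# Characters (and the two operators && and ||) that Lucene treats as syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_lucene(text):
    """Prefix every Lucene special character with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)

def release_group_url(title):
    """Build a release-group search URL with the title escaped (hypothetical helper)."""
    query = "releasegroup:" + escape_lucene(title)
    # Keep ':' readable; everything else unsafe is percent-encoded.
    return "http://www.musicbrainz.org/ws/2/release-group/?query=" + quote(query, safe=":")

print(escape_lucene("NME: C86"))
print(release_group_url("NME: C86"))
```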
So I have HUNDREDS of “Billboard Top 100” playlists, which means I have THOUSANDS of MP3s. They are in folders and tagged like so:

Code:
File name    
Artist    Album    Title    Year

001 Gotye - Somebody That I Used to Know (feat. Kimbra).mp3
    Gotye    Billboard Top 100 of 2012    Somebody That I Used to Know (feat. Kimbra)    2012

002 Carly Rae Jepsen - Call Me Maybe.mp3
    Carly Rae Jepsen    Billboard Top 100 of 2012    Call Me Maybe    2012

003 Fun. - We Are Young (feat. Janelle Monáe).mp3
    Fun.    Billboard Top 100 of 2012    We Are Young (feat. Janelle Monáe)    2012

004 Maroon 5 - Payphone (feat. Wiz Khalifa).mp3
    Maroon 5    Billboard Top 100 of 2012    Payphone (feat. Wiz Khalifa)    2012

005 Ellie Goulding - Lights (Single Version).mp3
    Ellie Goulding    Billboard Top 100 of 2012    Lights (Single Version)    2012

Here is a full list as an HTML example:
https://dl.dropboxusercontent.com/u/1247..._2012.html

Will this scraper:
1. Find the correct album art for the above examples?
2. I know the album name is incorrect, so do I need to delete it for this scrape (leaving the album field blank), or can I just leave it there?
3. Will that album art be embedded in the ID3v2 tag? (so it works with all my music software/phones/iTunes...)
4. Are there any tweaks needed to make it scrape these types of VA albums?

Thanks for all your help!!!!
Nathan
All ID3v2 tags need to be correct for the files to scrape correctly.
XBMC will not modify your MP3 files, so it won't embed the art in them. You need to do that yourself.
Thanks for your quick response!

My main goal is gathering album art for my MP3s for all my devices, including but not limited to XBMC.
Which I assume would be ideal for everyone?!?


The ID3v2 tags are 100% correct for Artist & Title. I can delete the data for Album (only if needed).

1. Will it scrape the data correctly with an incorrect or missing album name?
2. After I scrape, is there an easy way to sync the scraped data to the ID3 tags?

Thanks again!
Nathan
1 no
2 no
Since wanilton's post was not answered, I'll add some more :)

I don't know yet whether the problem is in the scraper or in XBMC itself, but in recent Gotham builds there's a problem with scraping album genres.

My collection is perfectly tagged. In one of the recent builds the strArtists field was dropped (I think for MusicBrainz), and XBMC asked for a rescan for art
(maybe change this message box, since the rescan is needed for way more than just art).

When I accept the rescan, XBMC drops all my album genres from strGenres and puts the artists there instead.

I've attached a screenshot of the database during the process.

You can see the dropped artists with the correct genre; then, during the rescan, the artist field is populated and the genre is replaced with the artist.

Image
Would someone be able to take a look at this debug log here?

Basically, I'm trying to find album info for Tape Deck Heart by Frank Turner. All the info is on MusicBrainz (some of which I entered myself), but I do not get any search results when trying to scrape from XBMC.

Any help would be greatly appreciated.
(2013-05-27, 12:52)prawnee Wrote: [ -> ]Would someone be able to take a look at this debug log here?

Basically, I'm trying to find album info for Tape Deck Heart by Frank Turner. All the info is on MusicBrainz (some of which I entered myself), but I do not get any search results when trying to scrape from XBMC.

Any help would be greatly appreciated.

The MusicBrainz API search seems to return a result OK:

http://search.musicbrainz.org/ws/2/relea...0Turner%22
Yeah, that's what I can't understand. I'll do a bit more investigation when I get home.
It looks like MusicBrainz can't handle "/" in searches. Would it be possible to remove them? Example:

http://search.musicbrainz.org/ws/2/relea...D%20artist:"Pet%20Shop%20Boys"
Returns 400

http://search.musicbrainz.org/ws/2/relea...D%20artist:"Pet%20Shop%20Boys"
Works fine
prawnee, the problem is that the scraper can't handle the <disambiguation> element between </status> and <text-representation>.

I also noticed a problem with the handling of <script> (I think I caused it in my previous diff)... It currently looks like:
(?:<script>[^<]*<)*/script>
But it should look like:
(?:<script>[^<]*</script>)*
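
To show why the grouping matters, here is a small standalone check in Python's `re` module. It tests only the two fragments quoted above, not the scraper's full expression, and the sample strings are made up. In the current form the closing `/script>` sits outside the optional group, so it is required even when no <script> element is present.

```python
import re

# Fragment as it currently stands: "/script>" is outside the optional group.
BROKEN = r"(?:<script>[^<]*<)*/script>"
# Corrected fragment: the whole element is optional and repeatable.
FIXED = r"(?:<script>[^<]*</script>)*"

with_script = "<script>Latn</script>"   # element present (made-up sample)
without_script = ""                     # element absent

def matches(pattern, text):
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

for label, pattern in (("broken", BROKEN), ("fixed", FIXED)):
    print(label, matches(pattern, with_script), matches(pattern, without_script))
```

Both fragments accept the input that contains the element, but only the corrected one also accepts the input where it is missing.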

Attaching diff that fixes the problem (and my previous error)

http://www.xbmclogs.com/show.php?id=22844
(2013-05-28, 21:21)crankylemur Wrote: [ -> ]It looks like MusicBrainz can't handle "/" in searches. Would it be possible to remove them? Example:

http://search.musicbrainz.org/ws/2/relea...D%20artist:"Pet%20Shop%20Boys"
Returns 400

http://search.musicbrainz.org/ws/2/relea...D%20artist:"Pet%20Shop%20Boys"
Works fine

Yup, I can confirm this. It looks like the scraper's CreateAlbumSearchUrl is missing the "..." enclosure around the release name.

If you include it in your example, it fixes the result:
Code:
http://search.musicbrainz.org/ws/2/release/?fmt=xml&query=release:"Bilingual%20%2f%20Further%20Listening%201995-1997%20"AND%20artist:"Pet%20Shop%20Boys"
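
As a rough sketch of the quoting idea in Python (not the scraper's actual XML): the function names are hypothetical, and I am assuming that inside a quoted Lucene phrase only the backslash and the double quote still need escaping.

```python
from urllib.parse import quote

def quote_phrase(text):
    """Wrap text in double quotes for a Lucene phrase query.

    Assumption: inside the quotes only backslash and double quote need escaping.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def album_search_url(album, artist):
    """Build a release search URL with both values quoted (hypothetical helper)."""
    query = f"release:{quote_phrase(album)} AND artist:{quote_phrase(artist)}"
    # safe="" percent-encodes everything reserved, including '/', ':' and '"'.
    return "http://search.musicbrainz.org/ws/2/release/?fmt=xml&query=" + quote(query, safe="")

print(album_search_url("Bilingual / Further Listening 1995-1997", "Pet Shop Boys"))
```

With the release name quoted, the "/" is treated as part of the phrase instead of query syntax.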

However, the result is still empty if you have multiple artists in the Album Artist tag separated with " / " (which is the default separator for multiple artists in the XBMC music DB).
I think the scraper's CreateAlbumSearchUrl should use multiple artist parameters in that case.

For example, the Album Artist tag "Artist1 / Artist2" should create this search URL:
Code:
release:"Albumname" AND artist: "Artist1" AND artist: "Artist2"
instead of
Code:
release:"Albumname" AND artist: "Artist1 / Artist2"
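
A minimal Python sketch of that splitting, with made-up function names. It assumes the separator is exactly " / " (with the spaces), so a name like "AC/DC" is left alone, and it writes `artist:"..."` with no space after the colon.

```python
def split_artists(album_artist, separator=" / "):
    """Split a combined Album Artist tag into individual artist names."""
    return [name.strip() for name in album_artist.split(separator) if name.strip()]

def build_query(album, album_artist):
    """Build one release clause plus one artist clause per artist, joined with AND."""
    clauses = [f'release:"{album}"']
    clauses += [f'artist:"{name}"' for name in split_artists(album_artist)]
    return " AND ".join(clauses)

print(build_query("Albumname", "Artist1 / Artist2"))
```

Names that themselves contain double quotes would still need escaping before being placed inside the quoted clauses.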