Improved allmusic.com scraper (plus a few questions)
#1
Hey gang,

Longtime user, (almost) first-time poster. ;) Anyhoo, I've been searching for the best music scraper for a while now, and haven't really been happy with any of them. allmusic.com has fantastic artist information/reviews but awful photos (and the scraper never really seemed to get all the proper info), discogs has decent photos but really limited artist information, and last.fm has mediocre everything. So I started mucking around with the existing allmusic scraper (from r22528), fixed a few problems with it, and improved it a bit (for my own needs, anyhow). Here's what I've done:

- The artist information (aside from the bio) wasn't being parsed properly; the ParseAMGArtist function was being passed the value "test" instead of the actual URL. Fixed.
- The album information wasn't being parsed properly either; the ParseAMGAlbum function was being passed the value "placeholder" instead of the actual URL. Fixed.
- The caching was glitchy. The cache file should be unique to each artist, but the scraper was set to use the same cache file for every artist, so subsequent lookups would often show duplicate or incorrect information left over from the previous artist. Fixed.
- The scraper was set to only get thumbs from htbackdrops (same with all the other music scrapers now?), which is fine and may be preferable to some, but htbackdrops barely has any thumbnails for the artists in my library. I noticed that discogs generally has decent photos for its artists, and its collection is quite extensive, so I've changed the scraper to also check discogs for thumbs (though it will still use the thumb from htbackdrops as the primary if one is available).
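
To make the last two fixes a bit more concrete, here's a rough Python sketch of the idea. This is NOT the scraper XML itself, just an illustration: `cache_path` and `order_thumbs` are hypothetical helper names I made up for this example, and hashing the URL is only one assumed way of getting a cache file name that is unique per artist or album.

```python
import hashlib
from pathlib import Path


def cache_path(cache_dir, kind, url):
    """Build a cache file path that is unique to one artist/album URL.

    Hypothetical helper: hashing the URL means two different artists can
    never share (and overwrite) the same cache file.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{kind}-{digest}.html"


def order_thumbs(htbackdrops_thumbs, discogs_thumbs):
    """Return thumbs with htbackdrops first (primary), then discogs.

    Duplicate URLs are dropped while the original order is preserved.
    """
    seen = set()
    ordered = []
    for url in list(htbackdrops_thumbs) + list(discogs_thumbs):
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```

With something like this, an artist with no htbackdrops thumb simply ends up with the first discogs thumb as its primary, and each artist's lookup reads and writes only its own cache file.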

With these changes, pretty much every artist and album in my library has a proper thumb, as well as full bio/reviews/discography/etc. Definitely a HUGE improvement.

I've posted it here: EDIT: my changes are now in the latest SVN builds. Just download that instead!

Hopefully it helps someone else as well. :) Maybe someone can merge these fixes into SVN.

1st question for the scraper gurus: is it possible to nest URL fetches or functions? I couldn't figure out how to fetch a URL, parse it, and then use the resulting string to fetch another URL and parse that.
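
To show what I mean by nesting, here's the flow written out as a small Python sketch. It says nothing about whether or how the scraper engine supports this; `chained_fetch`, the patterns, and the `fetcher` parameter are all hypothetical names used only to illustrate "fetch, parse, then fetch again with the result".

```python
import re
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def chained_fetch(first_url, link_pattern, detail_pattern, fetcher=fetch):
    """Fetch a page, pull a URL out of it, then fetch and parse that URL.

    Both patterns must contain one capture group. Returns the captured
    detail string, or None if either pattern fails to match.
    """
    first_page = fetcher(first_url)
    link = re.search(link_pattern, first_page)
    if link is None:
        return None
    # The result of the first parse becomes the URL of the second fetch.
    second_page = fetcher(link.group(1))
    detail = re.search(detail_pattern, second_page)
    return detail.group(1) if detail else None
```

The `fetcher` argument is only there so the chain can be tried against canned pages without touching the network.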

2nd question: Is there a way to ensure a variable/buffer is URL encoded properly for use in a GET string?
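
For comparison, this is the behaviour I'm after, shown with Python's standard library rather than the scraper XML. `build_search_url`, the base URL, and the `q` parameter name are made up for the example; `urlencode` is the real `urllib.parse` function.

```python
from urllib.parse import urlencode


def build_search_url(base_url, artist):
    """Append the artist name to base_url as a properly encoded GET parameter.

    The parameter name "q" is a placeholder, not any real site's API.
    """
    return base_url + "?" + urlencode({"q": artist})
```

Characters like `/`, `&`, and spaces in an artist name are exactly the ones that break a hand-built GET string, so they are the ones that need escaping before the name goes into the URL.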

Thanks!


Messages In This Thread
Improved allmusic.com scraper (plus a few questions) - by talisto - 2009-09-06, 08:03