Developing an Amazon Movie Scraper
I have recently started using XBMC (on a Mac) and found that while the IMDB scraper works well enough, there are many DVDs not on IMDB that are on Amazon.

[Note: Although the examples below use the film title "Soylent Green", I have manually searched IMDB using a browser to confirm that other titles are definitely not listed.]

Surprisingly, there is no existing Amazon scraper. As part of an effort to make one myself, I started by looking at how the existing scrapers work. Following on from this, I made some initial efforts to convert the current FilmAffinity scraper to return English results rather than Spanish ones (if you are interested, you can download a copy here: http://homepage.mac.com/jelockwood/.Publ...nityen.zip).

While I have not yet got an Amazon scraper even partially working, I have found some important information about the format of the various URLs that Amazon uses.

1. Amazon itself normally replaces spaces in title searches with a plus (+) symbol; however, a space (or %20) also seems to work.

Search URLs like the following all work when entered in a web browser:

Code:
http://www.amazon.com/s/ref=nb_ss_d?url=search-alias=dvd&field-keywords=soylent+green&x=0&y=0
Code:
http://www.amazon.com/s/ref=nb_ss_d?url=search-alias=dvd&field-keywords=soylent green&x=0&y=0
Code:
http://www.amazon.com/s/ref=nb_ss_d?url=search-alias=dvd&field-keywords=soylent%20green&x=0&y=0

and indeed also the slightly shorter

Code:
http://www.amazon.com/s/ref=nb_ss_d?url=search-alias=dvd&field-keywords=soylent%20green
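
For anyone who wants to experiment outside XBMC, here is a minimal Python sketch of building the shorter search URL above. The function name build_search_url and the constant SEARCH_BASE are my own illustrative names, not part of any scraper; only the URL layout comes from the examples in this post.

```python
from urllib.parse import quote, quote_plus

# Base of the shorter search URL shown above (DVD search alias).
SEARCH_BASE = (
    "http://www.amazon.com/s/ref=nb_ss_d"
    "?url=search-alias=dvd&field-keywords="
)


def build_search_url(title, use_plus=True):
    """Return a DVD search URL for the given title.

    use_plus=True encodes spaces as '+', as Amazon itself does;
    use_plus=False encodes them as '%20', which also seemed to work.
    """
    encode = quote_plus if use_plus else quote
    return SEARCH_BASE + encode(title)


if __name__ == "__main__":
    print(build_search_url("soylent green"))
    print(build_search_url("soylent green", use_plus=False))
```

Only the keywords are encoded here, so the rest of the URL stays exactly as it appears in the examples.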

2. The URL of a result normally has a rather messy and complicated format like this:

Code:
http://www.amazon.com/Soylent-Green-John-Barclay/dp/B0016I0AJG/ref=sr_1_1?ie=UTF8&s=dvd&qid=1217077050&sr=1-1

As you can see, there appear to be two different ID numbers plus a text field. However, I have been able to determine that the following much simpler form of the URL also works.

Code:
http://www.amazon.com/dp/B0016I0AJG/

Therefore we just need to extract the ID number beginning with a B (they all seem to begin with a B).
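
As a rough illustration of that extraction, the Python sketch below pulls the ID out of a result URL and rebuilds the short form. The pattern assumes the ID is a "B" followed by nine upper-case letters or digits, which is only a guess based on the single example above; the function names are hypothetical.

```python
import re

# Assumption: the ID follows "/dp/" and is "B" plus nine upper-case
# alphanumerics, as in the example B0016I0AJG. Other shapes are not handled.
PRODUCT_ID_RE = re.compile(r"/dp/(B[0-9A-Z]{9})(?:[/?]|$)")


def extract_product_id(url):
    """Return the ID found after /dp/ in a result URL, or None."""
    match = PRODUCT_ID_RE.search(url)
    return match.group(1) if match else None


def short_product_url(product_id):
    """Build the simpler /dp/<ID>/ form of the product URL."""
    return "http://www.amazon.com/dp/{}/".format(product_id)


if __name__ == "__main__":
    long_url = (
        "http://www.amazon.com/Soylent-Green-John-Barclay/dp/B0016I0AJG/"
        "ref=sr_1_1?ie=UTF8&s=dvd&qid=1217077050&sr=1-1"
    )
    product_id = extract_product_id(long_url)
    print(product_id)
    print(short_product_url(product_id))
```

A real scraper would apply an equivalent regular expression to the search results page rather than to a single URL, but the capture group would be much the same.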

3. The thumbnail image normally has a URL of the form

Code:
http://ecx.images-amazon.com/images/I/51bU-puSlkL._SL500_AA240_.jpg

and the large image a URL of the form

Code:
http://ecx.images-amazon.com/images/I/51bU-puSlkL._SS500_.jpg

As you can see, the ID number is totally different from anything previously used. However, I have also found that the following URL produces the same large image and uses the main ID number from the original URL:

Code:
http://ecx.images-amazon.com/images/P/B0016I0AJG.01.L.jpg

or the older alternative host name

Code:
http://images.amazon.com/images/P/B0016I0AJG.01.L.jpg

Note that these forms of the URL must use a P rather than an I.
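
To show how the large image URL can be derived from the main ID alone, here is a small Python sketch. The function name large_image_url is hypothetical, and the ".01.L.jpg" suffix is simply copied from the two working examples above; I have not tested other suffixes.

```python
# Host names taken from the two working examples above.
DEFAULT_IMAGE_HOST = "ecx.images-amazon.com"
OLDER_IMAGE_HOST = "images.amazon.com"


def large_image_url(product_id, host=DEFAULT_IMAGE_HOST):
    """Build the large image URL from the main product ID.

    Uses the /images/P/ path (P, not I) with the ".01.L.jpg"
    suffix seen in the examples.
    """
    return "http://{}/images/P/{}.01.L.jpg".format(host, product_id)


if __name__ == "__main__":
    print(large_image_url("B0016I0AJG"))
    print(large_image_url("B0016I0AJG", host=OLDER_IMAGE_HOST))
```

This means the scraper would not need to find the separate image ID (such as 51bU-puSlkL) on the product page at all.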

Based on all the above, would anyone care to assist by coming up with an initial Scraper by coding up the CreateSearchUrl and GetSearchResults sections? I will then try scraping the info fields.

PS. On a different topic: if a VIDEO_TS folder sits inside a folder named after the film, that folder name can be used for IMDB scraping. However, as mentioned, not all DVDs are listed on IMDB. I can see it should be possible to use an NFO file to provide at least some metadata, but I am unsure of the correct naming and placement in this scenario.

e.g. /DVDs/Soylent Green/VIDEO_TS/

What should the NFO file be called and in which of the three possible folders (DVDs, Soylent Green, or VIDEO_TS) should it be placed?
Messages In This Thread
Developing an Amazon Movie Scraper - by jelockwood - 2008-07-26, 15:41