Developing an Amazon Movie Scraper
spiff wrote: Remember, the scraper is an XML file, so any special characters need to be XML-escaped for the regexp to function properly in the scraper (or rather, for the scraper XML to load correctly in the first place). This is why there are all these &lt; &amp; etc. entities in the other scrapers (which I assume you use for reference).
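To check what that escaping does, here is a small Python sketch (not scraper XML itself). It assumes the regex sits as element text inside an element such as `<expression>`; that element name and the sample pattern are illustrative assumptions. `xml.sax.saxutils.escape` turns `&`, `<` and `>` into entities, and parsing the escaped text back with ElementTree shows that the regex engine would receive the original pattern.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_scraper_expression(pattern: str) -> str:
    """XML-escape a regex (&, <, >) so it can sit inside a scraper XML element."""
    return escape(pattern)

# Sample regex matching a link; the markup is made up for illustration.
html_regex = r'<a href="([^"]*)">([^<]*)</a>'
xml_text = to_scraper_expression(html_regex)
print(xml_text)  # &lt;a href="([^"]*)"&gt;([^&lt;]*)&lt;/a&gt;

# Loading the XML decodes the entities again, so the regex engine
# sees the original pattern. "expression" is a hypothetical element name here.
doc = ET.fromstring("<expression>" + xml_text + "</expression>")
print(doc.text == html_regex)  # True
```

Double quotes only need escaping (`&quot;`) if the regex goes into a double-quoted attribute value rather than element text.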

Yep, I spotted that on the Wiki and have taken it into consideration when trying RegExhibit. The difficulty is that in some places [a-zA-Z0-9] works while in others I have had to use .*, which makes it hard to know whether my regex will be right. Escaping characters such as ~ (tilde), ? (question mark), the space character, and ( ) (literal parentheses, not regex grouping) is also confusing, and these cases are not covered in the Wiki.
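For the escaping question, a small Python sketch showing which of those characters are special in a regex. This uses Python's `re`; I am assuming XBMC's regex engine treats these characters the same way, which holds for PCRE-style engines but is worth checking.

```python
import re

folder = "Soylent Green (1973)"

# Bare ( ) form a capture group, so this matches only "1973",
# without the surrounding parenthesis characters.
group_only = re.search(r"(\d{4})", folder).group(0)

# \( and \) match literal parentheses; the inner ( ) still captures the year.
literal = re.search(r"\((\d{4})\)", folder)
with_parens, year = literal.group(0), literal.group(1)

# ? is a quantifier ("previous item is optional"); \? matches a real question mark.
optional_n = bool(re.fullmatch(r"Green?", "Gree"))    # True: the "n" is optional
literal_q = bool(re.fullmatch(r"Green\?", "Green?"))  # True

# In Python's re (and PCRE), ~ and a plain space are ordinary literals
# and need no backslash outside verbose (re.VERBOSE / x) mode.
tilde_space = bool(re.search(r"~ Director's Cut", "Alien ~ Director's Cut"))

print(group_only, with_parens, year, optional_n, literal_q, tilde_space)
```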

While I am at it, here are some questions I feel the Wiki does not adequately answer.

1. I am using folder names as the search criteria. The folder names include the year of the film, e.g. "Soylent Green (1973)". While IMDb handles the full string well as a search term, Amazon does not cope with the year being included (with or without parentheses) when I type it into a web browser, which suggests it would fare no better when the same string is sent by a scraper.

When working from folder names like this, does a typical scraper include the year when creating the search URL, or does it strip it off? If it strips it, how?
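One way to strip the year, sketched in Python rather than scraper XML: split off a trailing "(YYYY)" and URL-encode only the title. The search base URL is a hypothetical placeholder, not Amazon's real search endpoint, and the pattern assumes folder names always put the year in parentheses at the end, as in "Soylent Green (1973)".

```python
import re
from urllib.parse import quote_plus

# Hypothetical base URL, for illustration only.
SEARCH_BASE = "https://www.example.com/search?keywords="

# A trailing " (YYYY)" as in "Soylent Green (1973)".
YEAR_SUFFIX = re.compile(r"\s*\((\d{4})\)\s*$")

def split_title_year(folder_name):
    """Return (title, year); year is None if no trailing "(YYYY)" is present."""
    m = YEAR_SUFFIX.search(folder_name)
    if m is None:
        return folder_name.strip(), None
    return folder_name[:m.start()].strip(), m.group(1)

def create_search_url(folder_name):
    """Build a search URL from the title alone, dropping the year."""
    title, _year = split_title_year(folder_name)
    return SEARCH_BASE + quote_plus(title)

print(create_search_url("Soylent Green (1973)"))
# https://www.example.com/search?keywords=Soylent+Green
```

The year is kept rather than thrown away, so it could still be used later to pick the right entry from the results. The same `\((\d{4})\)` idea would need translating into the scraper's own RegExp, and I have not checked whether XBMC already strips the year before passing the name in.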

2. When a scraper parses the search results, XBMC displays the list of titles found, but the scraper has to use the ID number to build the URL for the selected result. The Wiki does not make clear to me where these two steps happen or how the title and the ID are linked. As you saw in my last post, I have found the relevant HTML returned by Amazon and have regex code that more or less extracts either the ID or the title.
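One way to picture the link between the two steps is a single regex with two capture groups, so each match yields the ID and the title together; each pair then becomes one result entry holding the title to display and the URL to fetch if it is chosen. This Python sketch assumes a result format of `<results>` containing `<entity>` elements with `<title>`, `<id>` and `<url>` children, modelled loosely on existing scrapers. That format, the sample HTML, the `/dp/` link shape and the `example.com` base URL are all assumptions, not Amazon's real markup.

```python
import re
import xml.etree.ElementTree as ET

# Made-up search-result markup; Amazon's real HTML will differ.
SAMPLE_HTML = """
<div class="result"><a href="/dp/B0000ABCDE">Soylent Green</a></div>
<div class="result"><a href="/dp/B0000FGHIJ">Soylent Green (Special Edition)</a></div>
"""

# Group 1 = 10-character product ID, group 2 = displayed title.
RESULT_RE = re.compile(r'<a href="/dp/([A-Z0-9]{10})">([^<]+)</a>')

def build_results(html, detail_base):
    """Pair each ID with its title and emit one <entity> per search hit."""
    results = ET.Element("results")
    for product_id, title in RESULT_RE.findall(html):
        entity = ET.SubElement(results, "entity")
        ET.SubElement(entity, "title").text = title  # what the user picks from
        ET.SubElement(entity, "id").text = product_id
        ET.SubElement(entity, "url").text = detail_base + product_id  # fetched next
    return results

results = build_results(SAMPLE_HTML, "https://www.example.com/dp/")
print(ET.tostring(results, encoding="unicode"))
```

Because the title and ID come from the same match, they cannot get out of step, which can happen when two separate regexes are run over the page and the two lists are expected to line up.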

(Oops, I thought I had already posted this reply, but came back later and found it still open for editing in a tab in my web browser.)