Filmweb scraper is broken
#1
I have just found that the Filmweb scraper is broken. There were some changes on filmweb.pl. I did some digging and was able to fix it. I don't have time to submit a patch, but if someone wants to fix this, here is what needs to be changed.

Lines 43 and 44 in filmweb.xml should now look like this:

Code:
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;year&gt;\3&lt;/year&gt;&lt;url cache=&quot;filmweb-\1&quot;&gt;http://www.filmweb.pl\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes" fixchars="2">searchResult.*?href=&quot;(.*?)&quot;&gt;.*?searchResultTitle[^&gt;]+&gt;(.*?)&lt;/a&gt;&lt;/h3&gt;.*?searchResultDetails[^0-9]+([0-9]*)</expression>
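To check the expression outside the media center, it can be tried against a saved search page. Below is a minimal Python sketch of that idea. The sample markup is invented for the test (the real filmweb.pl page may differ), and `re.DOTALL` is an assumption about how the scraper engine treats newlines.

```python
import html
import re

# The expression exactly as it is written in filmweb.xml (XML-escaped).
XML_EXPRESSION = (
    r"searchResult.*?href=&quot;(.*?)&quot;&gt;.*?searchResultTitle[^&gt;]+&gt;"
    r"(.*?)&lt;/a&gt;&lt;/h3&gt;.*?searchResultDetails[^0-9]+([0-9]*)"
)

# Undo the XML escaping to get the pattern the regex engine actually sees.
# re.DOTALL is an assumption: it lets ".*?" run across line breaks.
PATTERN = re.compile(html.unescape(XML_EXPRESSION), re.DOTALL)

# Hypothetical markup, invented only to exercise the pattern.
SAMPLE_HTML = """
<li class="searchResult"><a class="searchResultImage" href="/Film.One">
<img src="one.jpg"></a>
<h3><a class="searchResultTitle" href="/Film.One">Film One</a></h3>
<span class="searchResultDetails">film, 1999</span></li>
<li class="searchResult"><a class="searchResultImage" href="/Film.Two">
<img src="two.jpg"></a>
<h3><a class="searchResultTitle" href="/Film.Two">Film Two</a></h3>
<span class="searchResultDetails">film, 2003</span></li>
"""


def parse_results(page):
    """Return one dict per search hit, mirroring the <entity> output above."""
    return [
        {"url": "http://www.filmweb.pl" + path, "title": title, "year": year}
        for path, title, year in PATTERN.findall(page)
    ]


if __name__ == "__main__":
    for hit in parse_results(SAMPLE_HTML):
        print(hit)
```

If the list comes back empty for a page saved from the real site, the markup has probably changed again and the class names in the expression need another look.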

Regards
#2
Yes, thanks. It looks fixed now. However, I have another problem: the scraper doesn't get the correct poster. The movie is found correctly, and the description and title are downloaded, but instead of the poster I see something like a screenshot from the movie, just like in the picture below. Please help.

Image
#3
Quote: Yes, thanks. It looks fixed now. However, I have another problem: the scraper doesn't get the correct poster. The movie is found correctly, and the description and title are downloaded, but instead of the poster I see something like a screenshot from the movie, just like in the picture below. Please help.

Hi, I made a few changes and it looks like it works now. You can change these two lines (204 and 205, but I'm not sure of the numbers because I made a few more changes):
Code:
            <RegExp input="$$1" output="&lt;thumb preview=&quot;http://gfx.filmweb.pl/po\12\2&quot;&gt;http://gfx.filmweb.pl/po\13\2&lt;/thumb&gt;" dest="8+">
                <expression noclean="1" repeat="yes">href=&quot;http://gfx.filmweb.pl/po(?:([^"]*\.)[^"]*(\.jpg))</expression>
to something like this:
Code:
            <RegExp input="$$1" output="&lt;thumb preview=&quot;\1&quot;&gt;\23\3&lt;/thumb&gt;" dest="8+">
                <expression noclean="1" repeat="yes">&lt;span class=&quot;poster&quot;&gt;[\s]*&lt;img src=&quot;((.*?\.)2(\.jpg))\?</expression>

Please make a backup of your filmweb.xml file before making any changes!
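For anyone who wants to see what the new poster expression captures before editing the file, here is a rough Python sketch. The sample HTML and the image URL in it are made up for illustration. Python would read `\23` in a replacement string as group 23, so the full-size URL is put together by hand.

```python
import html
import re

# The new expression exactly as it is written in filmweb.xml (XML-escaped).
XML_EXPRESSION = (
    r"&lt;span class=&quot;poster&quot;&gt;[\s]*&lt;img src=&quot;"
    r"((.*?\.)2(\.jpg))\?"
)

# Undo the XML escaping to get the pattern the regex engine actually sees.
PATTERN = re.compile(html.unescape(XML_EXPRESSION))

# Hypothetical markup and image URL, invented only to exercise the pattern.
SAMPLE_HTML = """
<span class="poster">
  <img src="http://gfx.filmweb.pl/po/00/01/1/7000001.2.jpg?l=1" alt="Film One">
</span>
"""


def poster_urls(page):
    """Return (preview, full) URL pairs, mirroring the <thumb> output above."""
    pairs = []
    for match in PATTERN.finditer(page):
        preview = match.group(1)                  # \1: the ".2.jpg" URL as found
        full = match.group(2) + "3" + match.group(3)  # \2, literal 3, then \3
        pairs.append((preview, full))
    return pairs


if __name__ == "__main__":
    for preview, full in poster_urls(SAMPLE_HTML):
        print("preview:", preview)
        print("full:   ", full)
```

The only difference between the two URLs is the digit before `.jpg`: the page links the `2` variant, and the scraper output swaps it for `3` to request the larger image.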
#4
Thanks for the fixes, works great.
luki Wrote: two lines (204 and 205, but I'm not sure because I made a few more changes)
For me it was lines 203 and 204 ;)
