Filmweb scraper is broken
#1
I have just found that the Filmweb scraper is broken. There were some changes on filmweb.pl. I did some digging and was able to fix it. I don't have time to submit a patch, but if someone wants to fix this, here is what needs to be changed.

Lines 43 and 44 in filmweb.xml should now look as follows:

Code:
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;year&gt;\3&lt;/year&gt;&lt;url cache=&quot;filmweb-\1&quot;&gt;http://www.filmweb.pl\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes" fixchars="2">searchResult.*?href=&quot;(.*?)&quot;&gt;.*?searchResultTitle[^&gt;]+&gt;(.*?)&lt;/a&gt;&lt;/h3&gt;.*?searchResultDetails[^0-9]+([0-9]*)</expression>
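For anyone who wants to check the search expression before editing filmweb.xml, here is a rough Python sketch. It unescapes the expression from the line above and runs it against a small HTML snippet. The snippet is made up for illustration and only mimics the class names the expression looks for; it is not a copy of the real filmweb.pl markup. The `re.DOTALL` and `re.IGNORECASE` flags are an assumption about how the scraper engine matches, and the `parse_search_results` helper is a hypothetical name.

```python
import html
import re

# The expression exactly as it appears in filmweb.xml (XML-escaped).
ESCAPED = (
    "searchResult.*?href=&quot;(.*?)&quot;&gt;.*?searchResultTitle[^&gt;]+&gt;"
    "(.*?)&lt;/a&gt;&lt;/h3&gt;.*?searchResultDetails[^0-9]+([0-9]*)"
)

# Assumption: the scraper engine lets "." match newlines and ignores case.
SEARCH_RE = re.compile(html.unescape(ESCAPED), re.DOTALL | re.IGNORECASE)

# Hypothetical sample page, NOT real filmweb.pl output.
SAMPLE_HTML = """
<li class="searchResult">
  <a class="searchResultImage" href="/Matrix"><img src="m.jpg"></a>
  <h3><a class="searchResultTitle" href="/Matrix">Matrix</a></h3>
  <span class="searchResultDetails">film | 1999</span>
</li>
<li class="searchResult">
  <a class="searchResultImage" href="/Matrix.Reaktywacja"><img src="r.jpg"></a>
  <h3><a class="searchResultTitle" href="/Matrix.Reaktywacja">Matrix Reaktywacja</a></h3>
  <span class="searchResultDetails">film | 2003</span>
</li>
"""


def parse_search_results(page):
    """Return one dict per match: \\1 is the path, \\2 the title, \\3 the year."""
    return [
        {"title": title, "year": year, "url": "http://www.filmweb.pl" + path}
        for path, title, year in SEARCH_RE.findall(page)
    ]


if __name__ == "__main__":
    for entry in parse_search_results(SAMPLE_HTML):
        print(entry)
```

If the site changes again, pasting a saved search page into `SAMPLE_HTML` should show quickly whether the expression still matches.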

Regards
#2
Yes, thanks. Now it looks like it's fixed. However, I have another problem: the scraper doesn't get the correct poster. The movie is found correctly, and the description and title are downloaded, but instead of the poster I see something like a screenshot from the movie, just like in the picture below. Please help.

Image
#3
Quote:Yes, thanks. Now it looks like it's fixed. However, I have another problem: the scraper doesn't get the correct poster. The movie is found correctly, and the description and title are downloaded, but instead of the poster I see something like a screenshot from the movie, just like in the picture below. Please help.

Hi, I made a few changes and it looks like it works now. You can change these two lines (204 and 205, but I'm not sure, because I made a few more changes):
Code:
            <RegExp input="$$1" output="&lt;thumb preview=&quot;http://gfx.filmweb.pl/po\12\2&quot;&gt;http://gfx.filmweb.pl/po\13\2&lt;/thumb&gt;" dest="8+">
                <expression noclean="1" repeat="yes">href=&quot;http://gfx.filmweb.pl/po(?:([^"]*\.)[^"]*(\.jpg))</expression>
to something like this:
Code:
            <RegExp input="$$1" output="&lt;thumb preview=&quot;\1&quot;&gt;\23\3&lt;/thumb&gt;" dest="8+">
                <expression noclean="1" repeat="yes">&lt;span class=&quot;poster&quot;&gt;[\s]*&lt;img src=&quot;((.*?\.)2(\.jpg))\?</expression>
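To see what the new poster expression does, here is a rough Python sketch. It relies on my reading of the output template, where `\23\3` means group 2, then a literal `3`, then group 3, so the `.2.jpg` preview URL becomes a `.3.jpg` full-size URL. The sample markup and the image URL are made up for illustration, and `poster_thumbs` is a hypothetical helper name. The `re.DOTALL` and `re.IGNORECASE` flags are an assumption about the scraper engine.

```python
import html
import re

# The expression exactly as it appears in filmweb.xml (XML-escaped).
ESCAPED = (
    "&lt;span class=&quot;poster&quot;&gt;[\\s]*"
    "&lt;img src=&quot;((.*?\\.)2(\\.jpg))\\?"
)

# Assumption: the scraper engine lets "." match newlines and ignores case.
POSTER_RE = re.compile(html.unescape(ESCAPED), re.DOTALL | re.IGNORECASE)

# Hypothetical markup and URL, NOT real filmweb.pl output.
SAMPLE_HTML = """
<span class="poster">
  <img src="http://gfx.filmweb.pl/po/12/34/1234/7000000.2.jpg?l=123" alt="poster">
</span>
"""


def poster_thumbs(page):
    """Return (preview_url, full_url) pairs, mirroring the <thumb> output."""
    thumbs = []
    for match in POSTER_RE.finditer(page):
        preview = match.group(1)                     # \1: the URL as found
        full = match.group(2) + "3" + match.group(3)  # \2, literal 3, \3
        thumbs.append((preview, full))
    return thumbs


if __name__ == "__main__":
    for preview, full in poster_thumbs(SAMPLE_HTML):
        print("preview:", preview)
        print("full:   ", full)
```

Note that Python would read `\23` in a replacement string as group 23, which is why the sketch builds the full-size URL by hand instead of reusing the template.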

Please make a backup of your filmweb.xml file before making any changes!
#4
Thanks for the fixes, works great.
luki Wrote:two lines (204 and 205, but I'm not sure, because I made a few more changes)
For me it was lines 203 and 204 ;)


