Lovefilm.se (Swedish) scraper - search uses javascript, can that be bypassed?
#9
spiff wrote: nope. what i mean by search using google is something ala

http://www.google.com/search?hl=en&site=...tnG=Search

Yeah, that's what I meant; I just gave the search string instead of the full URL. :P

I've gotten a bit further now: I have a scraper that works inside the editor, but not inside XBMC.

If I test the scraper in the editor, it asks me for a search string, builds a URL, gives me a list of search results, and then fetches info for the one I choose. All is well. But inside XBMC I get no results when scanning, and no results when adding manually (hitting 'I' on a movie). Other scrapers work fine.
The XBMC log says this:
Code:
23:33:01 T:3860 M:450981888   DEBUG: SDLKeyboard: scancode: 23, sym: 105, unicode: 105, modifier: 0
23:33:01 T:3860 M:450981888   DEBUG: CApplication::OnKey: 61513 pressed, action is 11
23:33:01 T:3860 M:450969600   DEBUG: CVideoDatabase::GetMovieId (D:\Documents and Settings\Administrator\Skrivbord\filmer\the dark knight.iso), query = select idMovie from movie where idFile=2
23:33:01 T:3860 M:450945024   DEBUG: No NFO file found. Using title search for 'D:\Documents and Settings\Administrator\Skrivbord\filmer\the dark knight.iso'
23:33:01 T:3860 M:450945024    INFO: Loading skin file: DialogProgress.xml
23:33:01 T:3860 M:450940928   DEBUG: Load DialogProgress.xml: 3.23ms
23:33:01 T:3860 M:450940928   DEBUG: ------ Window Init (DialogProgress.xml) ------
23:33:01 T:3860 M:450940928   DEBUG: Alloc resources: 0.08ms (0.00 ms skin load)
23:33:01 T:2192 M:450609152   DEBUG: thread start, auto delete: 0
23:33:01 T:2192 M:450588672   DEBUG: CIMDB::InternalFindMovie: Searching for 'filmer' using Lovefilm.se scraper (file: 'lovefilm.xml', content: 'movies', language: 'sv', date: '2010-02-04', framework: '1,1')
23:33:01 T:2192 M:450506752   DEBUG: FileCurl::Open(0012D770) http://www.google.com/search?hl=en&q=intitle:filmer+site:lovefilm.se/film&num=100
23:33:01 T:2192 M:450465792    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://www.google.com
23:33:01 T:2192 M:450408448   DEBUG: FileCurl::Close(0012D770) http://www.google.com/search?hl=en&q=intitle:filmer+site:lovefilm.se/film&num=100
23:33:01 T:2192 M:450404352   DEBUG: scraper: GetSearchResults returned <results></results>
23:33:01 T:2192 M:450404352   ERROR: CIMDB::Process: Error looking up movie filmer
23:33:01 T:2192 M:450404352   DEBUG: Thread 2192 terminating
23:33:01 T:3860 M:450523136    INFO: Loading skin file: DialogKeyboard.xml
23:33:01 T:3860 M:450916352   DEBUG: Load DialogKeyboard.xml: 21.32ms

The results, according to the editor, are:
Code:
<results><entity><url>http://www.lovefilm.se/film/48044-The+Dark+Knight.do</url><title>The Dark Knight DVD</title></entity><entity><url>http://www.lovefilm.se/film/52631-The+Dark+Knight+(Blu-ray)+-+Extramaterial.do</url><title>The Dark Knight (Blu-ray) - Extramaterial</title></entity><entity><url>http://www.lovefilm.se/film/51628-The+Dark+Knight+(Blu-ray).do;jsessionid=DDC3B8E739F803541C84096C18C90991</url><title>The Dark Knight (Blu-ray)</title></entity></results>

My XML code:
PHP Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="1,1" date="2010-02-04" name="Lovefilm.se" content="movies" thumb="lovefilm.png" language="sv">
    <CreateSearchUrl dest="4">
        <!-- I've used both <url>...</url> and like this here. Using <url>...</url> gives an error in the editor, but neither works in XBMC. -->
        <RegExp input="$$1" output="http://www.google.com/search?hl=en&amp;q=intitle:\1+site:lovefilm.se/film&amp;num=100" dest="4">
            <expression></expression>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="6">
        <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="6">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;url&gt;\1&lt;/url&gt;&lt;title&gt;\2&lt;/title&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">&lt;a href=&quot;(http:\/\/www.lovefilm.se\/film\/.*?)&quot;.*?\)&quot;&gt;(.*?) - Hyr</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="8">
        <RegExp input="$$7" output="&lt;details&gt;\1&lt;/details&gt;" dest="8">
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;&lt;year&gt;\2&lt;/year&gt;" dest="7+">
                <expression>&lt;h1&gt;.*?&gt;(.*?)&lt;\/span&gt;.*?([0-9]+)\)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;rating&gt;\1&lt;/rating&gt;&lt;votes&gt;\2&lt;/votes&gt;" dest="7+">
                <expression>\(([0-9],[0-9])\) \(([0-9]+) röster\)&lt;\/p&gt;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;originaltitle&gt;\1&lt;/originaltitle&gt;" dest="7+">
                <expression>Originaltitel:&lt;\/div&gt;.*?&lt;div class=&quot;mainInfoRowRight&quot;&gt;.*?&lt;strong&gt;(.*?)&lt;\/strong&gt;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;director&gt;\1&lt;/director&gt;" dest="7+">
                <expression>REGISSÖR&lt;\/li&gt;.*?&lt;ul&gt;.*?&lt;li&gt;.*?&gt;(.*?)&lt;\/a&gt;&lt;\/li&gt;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="7+">
                <expression trim="1">&lt;div id=&quot;description&quot;&gt;(.*?)&lt;\/div&gt;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;genre&gt;\1&lt;/genre&gt;" dest="7+">
                <expression cs="true" repeat="yes" trim="1">&lt;li class=&quot;header&quot;&gt;GENRE&lt;/li&gt;.*?&lt;a href=&quot;/category/.*?&gt;(.*?)&lt;/a&gt;&lt;/li&gt;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;runtime&gt;\1&lt;/runtime&gt;" dest="7+">
                <expression>&lt;span&gt;.[^ ]*DVD.*?Speltid:.*?&lt;strong&gt;(.*?)\.&lt;\/strong&gt;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="7+">
                <expression>&lt;img src=&quot;(http://static.lovefilm.se/img/cover/movie/huge/.*?)&quot;</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetDetails>
<!--Created with ScraperXml Editor, Author: filigran-->
</scraper> 
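
For reference, here is a sketch of the <url>-wrapped variant mentioned in the comment inside CreateSearchUrl, with the angle brackets escaped so they are legal inside the output attribute. It assumes the editor error came from unescaped brackets, which is only a guess. Whether the ampersands in the query string need a second level of escaping once they sit inside the <url> element has not been verified either.

```xml
<CreateSearchUrl dest="4">
    <!-- Sketch only: same Google search URL as above, wrapped in an escaped <url> element -->
    <RegExp input="$$1" output="&lt;url&gt;http://www.google.com/search?hl=en&amp;q=intitle:\1+site:lovefilm.se/film&amp;num=100&lt;/url&gt;" dest="4">
        <expression></expression>
    </RegExp>
</CreateSearchUrl>
```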

My regexes probably suck, but they at least yield some results in the editor.
Is anything missing, such as a required section? Do NfoUrl and the like have to be there?
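
To make the question concrete, an NfoUrl section for this scraper might look roughly like the sketch below. It is modelled on the general shape of the other sections and is untested; the dest buffer number and the expression are assumptions. It says nothing about whether XBMC actually requires the section for a title search, which is exactly the open question.

```xml
<NfoUrl dest="3">
    <!-- Hypothetical: pick a lovefilm.se film link out of the .nfo text passed in $$1 -->
    <RegExp input="$$1" output="&lt;url&gt;http://www.lovefilm.se/film/\1&lt;/url&gt;" dest="3">
        <expression>lovefilm\.se/film/([^\s&quot;&lt;]+)</expression>
    </RegExp>
</NfoUrl>
```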

Thanks for your help so far! :)

