[RELEASE] FilmAffinity (Spanish) scraper
#61
Should be %HOMEPATH%\Application Data\XBMC\xbmc.log
or C:\Program Files\XBMC\xbmc.log

BTW: The German site http://www.regex-tester.de/regex.html (translated to Spanish at http://tinyurl.com/5gfxx9) helps a lot.
Reply
#62
One question

I have some functions:
Code:
    <GetMoviePosterDB clearbuffers="no" dest="12">
        <RegExp input="$$1" output="&lt;thumb&gt;\1l_\2&lt;/thumb&gt;" dest="13+">
            <expression clear="yes" repeat="yes" noclean="1,2">&quot;poster&quot;.*?src=&quot;(.*?)[a-z]_(.*?)&quot;</expression>
        </RegExp>
    </GetMoviePosterDB>


    <GetIMDBPoster dest="5">
        <RegExp input="$$16$$17$$13$$15$$18" output="&lt;details&gt;&lt;thumbs&gt;\1&lt;/thumbs&gt;&lt;/details&gt;" dest="5">
            <RegExp input="$$6" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="15">
                <RegExp input="$$1" output="\1_SX$INFO[imdbscale]_SY$INFO[imdbscale]_\2" dest="6">
                    <expression noclean="1,2">&lt;a name=&quot;poster&quot;.*?src=&quot;(.*?)_S.*?(.jpg)&quot;.*?&lt;/a&gt;</expression>
                </RegExp>
                <expression clear="yes" noclean="1">(.*?_SX[0-9]+_SY[0-9]+_.jpg)</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetIMDBPoster>

The first function has a dest="13" inside.
The second function has a dest="15" inside.

$$13 and $$15 are among the inputs of the second function.

This works, and as a result I get the list of thumbs from both pages.

My problem is the following:

If I insert this code in the <GetDetails> section:

Code:
          <RegExp input="$$1" output="&lt;url function=&quot;GetFilmAffinityPoster&quot;&gt;http://www.filmaffinity.com/es/film214384.html&lt;/url&gt;" dest="16">
              <expression noclean="1"></expression>
          </RegExp>

As you can see, $$16 is also an input of the GetIMDBPoster function, but it doesn't work. Why?

Thanks
HectorziN
Reply
#63
my suspicion is that you are bitten by the fact that order matters.

remember, clearbuffers="no" says not to clear the buffers after a function call.
so the point here is to
1) fill buffer 13
2) fill buffer 15
3) fill buffer 16
4) call GetIMDBPoster

not
1) fill buffer 13
2) fill buffer 15
3) call GetIMDBPoster
4) fill buffer 16

and i assume GetFilmAffinityPoster has clearbuffers="no"
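
The ordering above could be sketched roughly like this in <GetDetails>. This is only an illustration and rests on assumptions: that chained functions are called in the order their url tags appear in the result, that the calls are appended to buffer 5, and that GetFilmAffinityPoster (whose body is not shown in this thread) fills buffer 16. The FilmAffinity URL is the one from the post above; IMDB_URL_PLACEHOLDER is a made-up stand-in.

Code:
          <!-- Sketch: the function that fills buffer 16 is requested first... -->
          <RegExp input="$$1" output="&lt;url function=&quot;GetFilmAffinityPoster&quot;&gt;http://www.filmaffinity.com/es/film214384.html&lt;/url&gt;" dest="5+">
              <expression noclean="1"></expression>
          </RegExp>
          <!-- ...and GetIMDBPoster, which reads $$16, is requested last (placeholder URL) -->
          <RegExp input="$$1" output="&lt;url function=&quot;GetIMDBPoster&quot;&gt;IMDB_URL_PLACEHOLDER&lt;/url&gt;" dest="5+">
              <expression noclean="1"></expression>
          </RegExp>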
Reply
#64
Thanks, I got it. The problem was the order of functions.

I am testing with the PC version, and searching for cariño works there.
It is on the Xbox that it doesn't work. Is it possible that SearchStringEncoding is not implemented in the Xbox version?

thanks
HectorziN
Reply
#65
I am searching for actor thumbs on IMDb.
I have a problem with animated movies: on the FilmAffinity site the actor listed for them is "Animation".

My scraper then searches IMDb for "Animation" and always finds Chuck Jones.

How can I avoid searching for actors when the actor is Animation? I suppose I should remove the Animation entry from the buffer so that the scraper doesn't find it...

Thanks
HectorziN
Reply
#66
use an expression that clears a buffer IF you find animation. let this buffer hold the function call. clear if expression matches. append the buffer. problem solved.
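
A rough sketch of that idea, under several assumptions: buffer 9 is free and empty at this point, the result is collected in buffer 5, the "Animation" pattern is a placeholder rather than the real FilmAffinity markup, and a matching expression with an empty output and a non-appending dest overwrites the buffer with nothing. The cast expression and URL are the ones HectorziN quotes in this thread.

Code:
        <!-- 1) let scratch buffer 9 hold the function calls -->
        <RegExp input="$$1" output="&lt;url function=&quot;SearchCastThumb&quot;&gt;http://spanish.imdb.com/find?s=nm&amp;amp;q=\1&lt;/url&gt;" dest="9+">
            <expression repeat="yes" noclean="1" trim="1">&lt;a href="search\.php.stype=cast.stext=([^&quot;]*)[^&gt;]*&gt;([^&lt;]*)</expression>
        </RegExp>
        <!-- 2) if the page lists "Animation" as cast, replace buffer 9 with nothing (placeholder pattern) -->
        <RegExp input="$$1" output="" dest="9">
            <expression>stype=cast.stext=Animation</expression>
        </RegExp>
        <!-- 3) append whatever is left in buffer 9 to the result; noclean keeps the tags -->
        <RegExp input="$$9" output="\1" dest="5+">
            <expression noclean="1"></expression>
        </RegExp>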
Reply
#67
spiff Wrote:use an expression that clears a buffer IF you find animation. let this buffer hold the function call. clear if expression matches. append the buffer. problem solved.

I have tried it but....
If I write this:
Code:
        <RegExp conditional="SearchCastThumb" input="$$1" output="&lt;url function=&quot;SearchCastThumb&quot;&gt;http://spanish.imdb.com/find?s=nm&amp;amp;q=\1&lt;/url&gt;" dest="5+">
            <expression repeat="yes" noclean="1" trim="1">&lt;a href="search\.php.stype=cast.stext=([^&quot;]*)[^&gt;]*&gt;([^&lt;]*)</expression>
        </RegExp>

It works, but if I change dest="5+" to dest="9+" the scraper doesn't call the function. I know because with dest="5+" the log has this:

Get URL: http://spanish.imdb.com/find?s=nm&q=Mar%...E9+Baus%E1

but with 9+ it doesn't.

Why?

thanks
HectorziN
Reply
#68
i assume buffer 9 is never transferred to the one containing the return value from the scraper function..
Reply
#69
spiff Wrote:i assume buffer 9 is never transferred to the one containing the return value from the scraper function..

Yes, I use this:

Code:
      <RegExp input="$$9" output="\1" dest="5+">
         <expression></expression>
      </RegExp>

I tried this code after the function call, and also with the function call nested inside it:

Code:
      <RegExp input="$$9" output="\1" dest="5+">
        <RegExp conditional="SearchCastThumb" input="$$1" output="&lt;url function=&quot;SearchCastThumb&quot;&gt;http://spanish.imdb.com/find?s=nm&amp;amp;q=\1&lt;/url&gt;" dest="9+">
            <expression repeat="yes" noclean="1" trim="1">&lt;a href="search\.php.stype=cast.stext=([^&quot;]*)[^&gt;]*&gt;([^&lt;]*)</expression>
        </RegExp>
         <expression></expression>
      </RegExp>


I also tested it with buffer 20, because buffer 9 is already used in the scraper and buffer 20 is never used. But it doesn't work in either case.

Thanks
HectorziN
Reply
#70
you need noclean on the outermost expression or all tags will be stripped off
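
Applied to the nested snippet from the previous post, that would presumably look like the following. The only change is noclean="1" on the outer, empty expression, so the url tags held in buffer 9 survive the copy into buffer 5; everything else is taken unchanged from the post.

Code:
      <RegExp input="$$9" output="\1" dest="5+">
        <RegExp conditional="SearchCastThumb" input="$$1" output="&lt;url function=&quot;SearchCastThumb&quot;&gt;http://spanish.imdb.com/find?s=nm&amp;amp;q=\1&lt;/url&gt;" dest="9+">
            <expression repeat="yes" noclean="1" trim="1">&lt;a href="search\.php.stype=cast.stext=([^&quot;]*)[^&gt;]*&gt;([^&lt;]*)</expression>
        </RegExp>
        <!-- noclean="1" keeps the tags in capture group 1 from being stripped -->
        <expression noclean="1"></expression>
      </RegExp>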
Reply
#71
spiff Wrote:you need noclean on the outermost expression or all tags will be stripped off

Thanks! Solved!
HectorziN
Reply
#72
By the way, with the Atlantis version SearchStringEncoding works!
HectorziN
Reply
#73
It is not working on the current version (9.04 beta).
Reply
#74
Any updates on this one?
Reply
#75
yes, i committed a fix at r19978
Reply