Quick Scraper Question (Hope so:))
#16
you need to set the SearchStringEncoding on the CreateSearchUrl function
#17
spiff Wrote:you need to set the SearchStringEncoding on the CreateSearchUrl function

As always thanks for this one, Spiff !!!
#18
Is there something equivalent for the GetDetails section? The plot is displayed with the HTML tags for the umlauts.
#19
My <GetThumbnailLink dest="5"> outputs as many URLs as there are covers, but my <GetThumbnail dest="5"> only outputs the thumb from the first URL. How do I get all the URLs processed, i.e. get all the thumbs?

Thanks in advance, and sorry for my poor English :)

Schenk
#20
<details><thumbs><thumb>..</thumb><thumb>..</thumb></thumbs></details>

also see http://forum.xbmc.org/showthread.php?tid=48643
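
To spell that out a bit: going by the snippet above, every thumb should end up inside a single <thumbs> element in one <details> result. A sketch of what the final output could look like for two posters (the file names poster_1.jpg and poster_2.jpg are made up for illustration, they are not taken from the site):

```xml
<details>
    <thumbs>
        <!-- one <thumb> per cover; both file names are hypothetical -->
        <thumb>http://www.cinefacts.de/kino/plakat/poster_1.jpg</thumb>
        <thumb>http://www.cinefacts.de/kino/plakat/poster_2.jpg</thumb>
    </thumbs>
</details>
```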
#21
Thanks spiff, but I can't follow you; maybe I have a mental block right now :)

This is how it looks right now:

Code:
            <RegExp input="$$1" output="&lt;url function=&quot;GetThumbnailLink&quot;&gt;http://www.cinefacts.de/kino/film/\1/\2/plakate.html&lt;/url&gt;" dest="5+">
                <expression repeat="yes">&lt;a href=&quot;/kino/film/([0-9]*)/([^\/]*)/plakate.html&quot;&gt;</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetDetails>

    <!--Thumbnail-->
    <GetThumbnailLink clearbuffers="no" dest="6">
        <RegExp input="$$1" output="&lt;details&gt;&lt;url function=&quot;GetThumbnail&quot;&gt;http://www.cinefacts.de/kino/film/\1&lt;/url&gt;&lt;/details&gt;" dest="6">
            <expression repeat="yes" noclean="1">&lt;a href=&quot;/kino/film/([^&quot;]+)&quot;&gt;[^&lt;]*&lt;img</expression>
        </RegExp>
    </GetThumbnailLink>

    <GetThumbnail dest="5">
        <RegExp input="$$2" output="&lt;details&gt;&lt;thumbs&gt;\1&lt;/thumbs&gt;&lt;/details&gt;" dest="5+">
            <RegExp input="$$1" output="&lt;thumb&gt;http://www.cinefacts.de/kino/plakat/\1&lt;/thumb&gt;" dest="2">
                <expression>=&quot;/kino/plakat/([^&quot;]*)&quot;</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetThumbnail>

I can't find out where to change this.

Cheers

Schenk
#22
why are you repeating in getthumbnaillink? i don't get what you are trying to achieve and hence it is impossible to help you
#23
spiff Wrote:why are you repeating in getthumbnaillink? i don't get what you are trying to achieve and hence it is impossible to help you

Thanks for the answer, spiff. Let me try to explain.

In GetDetails I output one URL like ...posters.html

In GetThumbnailLink there are as many output URLs as there are cover pages, like ...poster_1.html, poster_2.html, etc.

I want every one of those URLs to be parsed for its cover, so that I get them all.

I hope that makes things clearer for you, and I really appreciate your help!

Thanx

Schenk
#24
aha.

then you want
Code:
<RegExp input="$$1" output="&lt;url function=&quot;GetThumbnailLink&quot; cache=&quot;some.xml&quot; &gt;http://www.cinefacts.de/kino/film/\1/\2/plakate.html&lt;/url&gt;" dest="5+">
                       <expression repeat ="yes">&lt;a href=&quot;/kino/film/([0-9]*)/([^\/]*)/plakate.html&quot;&gt;</expression>
                </RegExp>
                        <expression noclean="1"/>
                </RegExp>
        </GetDetails>

    <!--Thumbnail-->
        <GetThumbnailLink clearbuffers="no" dest="6">
             <RegExp input="$$7" output="&lt;details&gt\1&gt;&lt;/details&gt;" dest="6">
                    <RegExp input="$$1" output=";&lt;url function=&quot;GetThumbnail&quot;&gt;http://www.cinefacts.de/kino/film/\1&lt;/url&gt;" dest="7+">
            <expression repeat="yes" noclean="1">&lt;a href=&quot;/kino/film/([^&quot;]+)&quot;&gt;[^&lt;]*&lt;img</expression>
                    <RegExp input="" output="&lt;url function=&quot;CollectThumbnails&gt;&quot></url>
                       <expression/>
                    </RegExp>
        </RegExp>
        </GetThumbnailLink>

    <GetThumbnail  clearbuffers="no" dest="5">
        <RegExp input="$$1" output="&lt;thumb&gt;http://www.cinefacts.de/kino/plakat/\1&lt;/thumb&gt;" dest="8">
                <expression>=&quot;/kino/plakat/([^&quot;]*)&quot;</expression>
        </RegExp>
                <RegExp input="" output="&lt;details&gt;&lt;/details&gt; dest="5">
            <expression noclean="1"/>
        </RegExp>
    </GetThumbnail>
        
        <CollectThumbnails dest="2">
           <RegExp input="$$8" output="&lt;details&gt;&lt;thumbs&gt;\1&lt;/thumbs&gt;&lt;/details&gt;" dest="">
             <expression noclean="1"/>
           </RegExp>
        </CollectThumbnails>

i'm sure it's full of typos but i'm at work. shows the idea anyways
#25
Thanks spiff,

A few questions from me, the stupid boy:

cache="some.xml": what's that?

A few typos are okay, I think I found them. But the empty buffers and the empty dest: are these for real, or are they typos too?

Thanks again. I hope I'll soon be done questioning and bothering you :)

Schenk
#26
okay, the cache thing is actually to hack around a limitation i will lift soonish. you can only run a scraper function on a valid url.

if you set the cache property on a url, we cache to a local file with that name. usually it is used to run several functions on the same page. in this case we just need *some* valid url to run the last function on, and to avoid fetching anything we use the same url as the one we set the cache property on.
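
as a rough sketch, this is what the two <url> elements look like once the escaped output strings are decoded (in the scraper itself they sit inside output="..." with &lt; and &quot; escapes). \1 and \2 are the regexp captures, and http://doesnt.matter is just a placeholder address that, as far as i can tell, should never actually be requested because the cached copy is used instead:

```xml
<!-- first call: fetches the page and stores it locally under the name some.xml -->
<url function="GetThumbnailLink" cache="some.xml">http://www.cinefacts.de/kino/film/\1/\2/plakate.html</url>

<!-- later call: same cache name, so the placeholder address only has to look like a url -->
<url function="CollectThumbnails" cache="some.xml">http://doesnt.matter</url>
```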

the expressions with the empty inputs are there on purpose. we need to return a valid xml from each function call, or the process stops. empty dest is a typo, it should be 2
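
to illustrate the empty-input trick, here is a sketch of one function that parks its real result in a side buffer and still returns valid xml. buffer numbers follow the code earlier in the thread, and i'm assuming "8+" appends to buffer 8 instead of overwriting it; treat this as an idea, not a tested scraper:

```xml
<GetThumbnail clearbuffers="no" dest="5">
    <!-- append the thumb found on this page to buffer 8 (assumed: "8+" appends) -->
    <RegExp input="$$1" output="&lt;thumb&gt;http://www.cinefacts.de/kino/plakat/\1&lt;/thumb&gt;" dest="8+">
        <expression>=&quot;/kino/plakat/([^&quot;]*)&quot;</expression>
    </RegExp>
    <!-- empty input: nothing to match, but buffer 5 still gets a valid, empty result to return -->
    <RegExp input="" output="&lt;details&gt;&lt;/details&gt;" dest="5">
        <expression noclean="1"/>
    </RegExp>
</GetThumbnail>
```

a final function can then wrap whatever has piled up in buffer 8 in <details><thumbs>...</thumbs></details>.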
#27
Hey spiff,

Here's what I've got now. It didn't work, and I don't know if I understood the cache thing right; there are probably many mistakes in it, I guess. Maybe you can see what's wrong:

Code:
            <RegExp input="$$1" output="&lt;url function=&quot;GetThumbnailLink&quot; cache=&quot;http://www.google.de&quot; &gt;http://www.cinefacts.de/kino/film/\1/\2/plakate.html&lt;/url&gt;" dest="5+">
                       <expression repeat ="yes">&lt;a href=&quot;/kino/film/([0-9]*)/([^\/]*)/plakate.html&quot;&gt;</expression>
                </RegExp>
                        <expression noclean="1"/>
                </RegExp>
        </GetDetails>

    <!--Thumbnail-->
        <GetThumbnailLink clearbuffers="no" dest="6">
               <RegExp input="$$7" output="&lt;details&gt;\1&lt;/details&gt;" dest="6">
                 <RegExp input="$$1" output="&lt;url function=&quot;GetThumbnail&quot;&gt;http://www.cinefacts.de/kino/film/\1&lt;/url&gt;" dest="7+">
            <expression repeat="yes" noclean="1">&lt;a href=&quot;/kino/film/([^&quot;]+)&quot;&gt;[^&lt;]*&lt;img</expression>
                    <RegExp input="" output="&lt;url function=&quot;CollectThumbnails&quot;&gt;&lt;/url&gt;"
                        <expression/>
            </RegExp>
               </RegExp>
        </GetThumbnailLink>

    <GetThumbnail clearbuffers="no" dest="5">
        <RegExp input="$$1" output="&lt;thumb&gt;http://www.cinefacts.de/kino/plakat/\1&lt;/thumb&gt;" dest="8">
                <expression>=&quot;/kino/plakat/([^&quot;]*)&quot;</expression>
                </RegExp>
                 <RegExp input="" output="&lt;details&gt;&lt;/details&gt;" dest="5">
          <expression noclean="1"/>
        </RegExp>
    </GetThumbnail>

        <CollectThumbnails dest="2">
           <RegExp input="$$8" output="&lt;details&gt;&lt;thumbs&gt;\1&lt;/thumbs&gt;&lt;/details&gt;" dest="2">
             <expression noclean="1"/>
           </RegExp>
        </CollectThumbnails>
</scraper>

Thanx Schenk
#28
Code:
            <RegExp input="$$1" output="&lt;url function=&quot;GetThumbnailLink&quot; cache=&quot;some.xml&quot;&gt;http://www.cinefacts.de/kino/film/\1/\2/plakate.html&lt;/url&gt;" dest="5+">
                <expression repeat="yes">&lt;a href=&quot;/kino/film/([0-9]*)/([^\/]*)/plakate.html&quot;&gt;</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetDetails>

    <!--Thumbnail-->
    <GetThumbnailLink clearbuffers="no" dest="6">
        <RegExp input="$$7" output="&lt;details&gt;\1&lt;/details&gt;" dest="6">
            <RegExp input="$$1" output="&lt;url function=&quot;GetThumbnail&quot;&gt;http://www.cinefacts.de/kino/film/\1&lt;/url&gt;" dest="7">
                <expression repeat="yes" noclean="1">&lt;a href=&quot;/kino/film/([^&quot;]+)&quot;&gt;[^&lt;]*&lt;img</expression>
            </RegExp>
            <RegExp input="" output="&lt;url function=&quot;CollectThumbnails&quot; cache=&quot;some.xml&quot; &gt;http://doesnt.matter&lt;/url&gt;" dest="7+">
                <expression/>
            </RegExp>
        </RegExp>
    </GetThumbnailLink>

    <GetThumbnail clearbuffers="no" dest="5">
        <RegExp input="$$1" output="&lt;thumb&gt;http://www.cinefacts.de/kino/plakat/\1&lt;/thumb&gt;" dest="8+">
            <expression>=&quot;/kino/plakat/([^&quot;]*)&quot;</expression>
        </RegExp>
        <RegExp input="" output="&lt;details&gt;&lt;/details&gt;" dest="5">
            <expression noclean="1"/>
        </RegExp>
    </GetThumbnail>

    <CollectThumbnails dest="2">
        <RegExp input="$$8" output="&lt;details&gt;&lt;thumbs&gt;\1&lt;/thumbs&gt;&lt;/details&gt;" dest="2">
            <expression noclean="1"/>
        </RegExp>
    </CollectThumbnails>
</scraper>
#29
spiff Wrote: (code from post #28)


Error: Unable to parse GetThumbnailLink.xml
#30
Hey spiff, me again :)

I couldn't get this working and my head's gonna explode.

I found a new way to parse, so I changed my code. I think it does the same thing but is a lot simpler.

Code:
            <!--Poster URL-->
            <RegExp input="$$1" output="&lt;url function=&quot;GetPosters&quot;&gt;http://www.cinefacts.de/kino/film/\1/\2/\3/\4/plakat.html&lt;/url&gt;" dest="5+">
                <expression repeat="yes">&quot;/kino/film/([0-9]*)/([^\/]*)/([^\/]*)/([^\/]*)/plakat.html&quot;\)</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetDetails>

    <!--Poster-->
    <GetPosters clearbuffers="no" dest="5">
        <RegExp input="$$2" output="&lt;?xml version=&lt;details&gt;&lt;thumbs&gt;\1&lt;/thumbs&gt;&lt;/details&gt;" dest="5+">
            <RegExp input="$$1" output="&lt;thumb&gt;http://www.cinefacts.de/kino/plakat/\1&lt;/thumb&gt;" dest="2">
                <expression repeat="yes">href=&quot;/kino/plakat/([^&quot;]*)&quot;</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetPosters>
</scraper>

Poster URL gives me (for example) two valid HTML pages.

GetPosters gives me two URLs but outputs them separately, so only one is available in XBMC. Could you describe once more, for a dumbass like me, what I need to do to get all the covers downloaded?