How to use 2 variables in the create search string?
#1
Hi guys,

I need some help to fix/upgrade the turkcealtyazı.org search string. The website added a new search system: if I add the year to the search URL, it matches the correct movie.
My URL:
Code:
http://www.turkcealtyazi.org/filtre.php?tur=&yil=XXXX&ulke=&sira=3&altyazi=3&fragman=3&tip=3&plimit=0&olimit=0&find=YYYY
\1 ----> XXXX
\2 ----> YYYY
When I use both, as in "<url>http://www.....\1......\2.....</url>", in the create search string, the scraper does not work; it cannot parse the website.
How can I solve that?
Reply
#2
I need to see the whole code. I suspect it can't parse the XML.
Reply
#3
OK, I played with it a bit last night and it now parses the XML, but my regex cannot correctly extract the name and year.
<CreateSearchUrl dest="3" SearchStringEncoding="iso-8859-9">
<RegExp input="$$4" output="&lt;url&gt;\1&lt;\url&gt;" dest="3">
<RegExp input="$$1" output="http://www.turkcealtyazi.org/filtre.php?tur=&amp;yil=\2&amp;ulke=&amp;sira=4&amp;altyazi=3&amp;fragman=3&amp;tip=3&amp;plimit=0&amp;olimit=0&amp;find=\1" dest="4">
<expression>(.[a-z]+)[^0-9](\d+)</expression>
</RegExp>
<expression/>
</RegExp>
</CreateSearchUrl>

In the scraper editor, (.[a-z]+)[^0-9](\d+) gives me \1 --> name and \2 --> year. For "the.phantom.2009", XBMC changes it to "the phantom 2009", and in the editor I saw:
\1 --> the+phantom
\2 --> 2009
But the XBMC log says \1 --> "the%" and \2 --> 20. What is the correct regex to separate the name and the date?

Also, when I try to enter the name manually, XBMC still searches with the old name (the folder name). Is that a bug?

Here is the create search and get results part.
Code:
<CreateSearchUrl dest="3" SearchStringEncoding="iso-8859-9">
    <RegExp input="$$4" output="&lt;url&gt;\1&lt;\url&gt;" dest="3">
        <RegExp input="$$1" output="http://www.turkcealtyazi.org/filtre.php?tur=&amp;yil=\2&amp;ulke=&amp;sira=4&amp;altyazi=3&amp;fragman=3&amp;tip=3&amp;plimit=0&amp;olimit=0&amp;find=\1" dest="4">
            <expression>(.[a-z]+)[^0-9](\d+)</expression>
        </RegExp>
        <expression/>
    </RegExp>
</CreateSearchUrl>

<GetSearchResults dest="8">
    <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-9&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
        <!--search results page-->
        <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3-\5&lt;/title&gt;&lt;url&gt;http://www.turkcealtyazi.org/mov/\1/\2.html&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;year&gt;\4&lt;/year&gt;&lt;/entity&gt; \n" dest="5">
            <expression cs="true" repeat="yes">&lt;a href=&quot;/mov/(.[0-9]*)/(.*?).html&quot; title=&quot;(.*?)&quot;&gt;.[^\(]*\((.*?)\)&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;b&gt;(.*?)&lt;</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</GetSearchResults>
Reply
#4
Not sure how you get "the phantom 2009" out of the file/folder name "the.phantom.2009". XBMC should extract the suspected movie title (only the title) to $$1 and year to $$2.

So try it like this:
Code:
<?xml version="1.0" encoding="utf-8"?><scraper framework="1,1" date="2010-11-08"  content="movies" thumb="icon.png">
    <CreateSearchUrl SearchStringEncoding="iso-8859-9" dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://www.turkcealtyazi.org/filtre.php?tur=&amp;yil=$$2&amp;ulke=&amp;sira=4&amp;altyazi=3&amp;fragman=3&amp;tip=3&amp;plimit=0&amp;olimit=0&amp;find=\1&lt;/url&gt;" dest="3">
            <expression />
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-9&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <!--search results page-->
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3-\5&lt;/title&gt;&lt;url&gt;http://www.turkcealtyazi.org/mov/\1/\2.html&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;year&gt;\4&lt;/year&gt;&lt;/entity&gt; \n" dest="5">
                <expression cs="true" repeat="yes">&lt;a href="/mov/(.[0-9]*)/(.*?).html" title="(.*?)"&gt;.[^\(]*\((.*?)\)&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;b&gt;(.*?)&lt;</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetSearchResults>
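
As an aside, here is a rough way to see why the expression from post #3 gives such a short title and year. This is only a sketch in Python, not XBMC code: it assumes the search string is already URL-encoded (spaces as %20) when the expression runs, which is what the log output suggests. The helper names split_title_year and build_search_url are made up for illustration.

```python
import re
from urllib.parse import quote

# The expression from post #3, unchanged.
PATTERN = re.compile(r"(.[a-z]+)[^0-9](\d+)")


def split_title_year(search_string):
    # Return the two capture groups of the first match, or None.
    match = PATTERN.search(search_string)
    return match.groups() if match else None


def build_search_url(title, year):
    # Hypothetical helper: title and year arrive separately, the way
    # $$1 and $$2 do, so no regex is needed to split them.
    query = ("tur=&yil={year}&ulke=&sira=4&altyazi=3&fragman=3"
             "&tip=3&plimit=0&olimit=0&find={title}")
    encoded_title = quote(title, encoding="iso-8859-9")
    return ("http://www.turkcealtyazi.org/filtre.php?"
            + query.format(year=year, title=encoded_title))


# Assumption: the string is encoded as 'the%20phantom%202009'.
encoded = quote("the phantom 2009")
print(split_title_year(encoded))
print(build_search_url("the phantom", "2009"))
```

Under that assumption the first match stops at the "%" after "the", and \d+ then picks up the "20" of "%20" as the year, which is close to what the log reported. Passing the year in separately avoids the problem.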



...and could you please make the very tiny cosmetic modifications I requested on the mailing list so we can include your scrapers in the official repo?

Also I am not sure about the icon for beyazperde you included. There was already an icon for it in the repo. So is the change on purpose?
Reply
#5
Yes, using $$2 for the year and \1 for the name worked. Thank you.

I'll continue to improve the turkcealtyazi.org scraper, and v.1.0.6 will be on SVN in a few seconds. Please put this scraper in the repo. I also fixed the things you asked for.
I don't plan to improve the beyazperde.com scraper any more because the website's database isn't large enough. Please remove it from the repo.
Thanks again.
Reply
#6
Can you please:

Change the first line from:
Code:
<?xml version="1.0" encoding="utf-8"?><scraper framework="1,1" date="2010-11-13"  content="movies" thumb="icon.png">

to:
Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="1.1" date="2010-11-13">

And remove this part from the CreateSearchUrl:
Code:
<RegExp input="$$2" output="\1" dest="3">
    <expression>\d+</expression>
</RegExp>
Reply
