How to use "if then else": URL1 = empty > fetch infos from URL2 for
#1
Hello,

IMDB is a cool DB, but sadly most older movies don't have a German translation. So I would sometimes like to use another URL for the plot and plot summary tags. How can this be done in the scraper? Could you please give me a hint?

Regards,

Eisbahn
#2
For this you use a chain: you create a function to parse the other page:

Code:
<ParseOtherPage dest="3">
  <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
   ... stuff things into $$5
  </RegExp>
</ParseOtherPage>
Then, in the GetDetails function (or whichever function you want to chain from):

Code:
<RegExp input="$$1" output="&lt;url function=&quot;ParseOtherPage&quot;&gt;someurlor\1orwhatever&lt;/url&gt;" dest="5+">
  ...
</RegExp>
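Putting the two pieces together, a minimal sketch could look like this. The function name GetOtherPlot, the URL http://www.example.com/plot/ and both regular expressions are hypothetical placeholders; only the structure matters: the calling function appends a <url function="..."> element to $$5, and the chained function receives the fetched page in $$1 and returns its own <details> block.

Code:
<!-- in GetDetails, next to the other RegExps that fill $$5 (hypothetical id regexp and URL) -->
<RegExp input="$$1" output="&lt;url function=&quot;GetOtherPlot&quot;&gt;http://www.example.com/plot/\1&lt;/url&gt;" dest="5+">
  <expression>otherid=([0-9]+)</expression>
</RegExp>
<!-- hypothetical chained function, a sibling of GetDetails: $$1 holds the page fetched from the URL above -->
<GetOtherPlot dest="3">
  <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
    <RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5">
      <!-- hypothetical markup around the plot on the other site -->
      <expression>&lt;div class=&quot;plot&quot;&gt;([^&lt;]+)</expression>
    </RegExp>
    <!-- noclean keeps the plot tags collected in $$5 from being stripped when wrapped -->
    <expression noclean="1"/>
  </RegExp>
</GetOtherPlot>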
#3
Hi spiff,

I don't know if we were talking about different things. What I want:
- always scrape the German IMDB and search it for the plot
- if the plot is missing and the German IMDB shows a text like "no plot available, please translate it and add it on our homepage", and the user wants to use another URL (only when both conditions hold), scrape that URL for the plot

=> OK, asking the user whether to scrape another URL is no problem; that can be done with the "conditional" flag
=> scraping another URL with a function is no problem either (I have just done this for other info)

My real question is how to make the decision "the info scraped from IMDB is not good, so use the alternative (if the user wants to)".

Eisbahn
#4
Just grab the plot into a buffer and check whether it's bad; if it is, chain.
#5
Yes, but how can this be done?
Here is an example of the scraper code:
Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="1.1" date="2010-06-07" name="DE_IMDb" content="movies" thumb="imdb.png" language="de">
    <include>common/fetch_other_url.xml</include>
    <GetSettings dest="3">
        <RegExp input="$$5" output="&lt;settings&gt;\1&lt;/settings&gt;" dest="3">
            <RegExp input="$$1" output="&lt;setting label=&quot;if plot is empty, fetch other URL&quot; type=&quot;bool&quot; id=&quot;fetchurl&quot; default=&quot;true&quot;&gt;&lt;/setting&gt;" dest="5">
                <expression/>
            </RegExp>
        </RegExp>
    </GetSettings>
    <NfoUrl dest="3">
        [...] some RegEx [...]
    </NfoUrl>
    <CreateSearchUrl SearchStringEncoding="iso-8859-1" dest="3">
        [...] some RegEx [...]
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        [...] some RegEx [...]
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5">
                <expression/>
            </RegExp>
            <RegExp conditional="fetchurl" input="$$2" output="&lt;url function=&quot;FetchPlotFromOtherURL&quot;&gt;$$3&lt;/url&gt;" dest="5+">
                <expression/>
            </RegExp>
            [...] some RegEx [...]
        </RegExp>
    </GetDetails>
</scraper>
But this fetches the other URL whenever the user sets "fetchurl" to true. How can I check that the info from the first URL is not good, so that the conditional RegExp with the url function only runs in that case?

Eisbahn
#6
Code:
<RegExp input="$$1" output="\1" dest="6">
  <expression clear="yes">somethingthatgrabstheplot</expression>
</RegExp>
<RegExp input="$$6" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
  <expression>(.+)</expression>
</RegExp>
<RegExp input="$$6" output="&lt;url function=&quot;theotherone&quot;&gt;theotherurl&lt;/url&gt;" dest="5+">
  <expression>^$</expression>
</RegExp>

1) grab the plot into a buffer
2) if the buffer is non-empty, use it as the plot
3) if the buffer is empty, do the chain.

Elementary, Dr. Watson.
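Applied to the case from #3 and #5, where German IMDB shows a placeholder text rather than an empty plot, the buffer also has to be emptied when it only holds that placeholder, and the chain can additionally be restricted to users who enabled the fetchurl setting. This is a sketch: NO_PLOT_MARKER is a hypothetical stand-in for the exact text German IMDB prints, somethingthatgrabstheplot is the same placeholder as above, and FetchPlotFromOtherURL and $$3 are taken from #5. Step 2 assumes that a matching RegExp whose dest has no "+" replaces that buffer with its output, here an empty string.

Code:
<!-- inside GetDetails, within the RegExp that wraps $$5 in <details> -->
<!-- 1) grab the plot into $$6; clear="yes" empties $$6 first, so it stays empty if nothing matches -->
<RegExp input="$$1" output="\1" dest="6">
  <expression clear="yes">somethingthatgrabstheplot</expression>
</RegExp>
<!-- 2) hypothetical marker: if $$6 holds the "no plot available" text, replace it with the empty output -->
<RegExp input="$$6" output="" dest="6">
  <expression>NO_PLOT_MARKER</expression>
</RegExp>
<!-- 3) a real plot is left: use it -->
<RegExp input="$$6" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
  <expression>(.+)</expression>
</RegExp>
<!-- 4) no usable plot and the user enabled "fetchurl": chain to the other URL ($$3 as used in #5) -->
<RegExp conditional="fetchurl" input="$$6" output="&lt;url function=&quot;FetchPlotFromOtherURL&quot;&gt;$$3&lt;/url&gt;" dest="5+">
  <expression>^$</expression>
</RegExp>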
#7
@Eisbahn, some of the German scrapers already use something like this to determine whether there's an IMDb id or whether we have to use Google.
#8
Hi Spiff,

not so easy for me, ouch...

In the GetDetails function of the scraper:
Code:
        <RegExp input="$$2" output="&lt;url function=&quot;GetIMDBPlot&quot;&gt;$$3plotsummary&lt;/url&gt;" dest="5+">
            <expression/>
        </RegExp>

This runs fine and calls my IMDB function (in the common directory):
Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="1.1" date="2010-06-12" name="IMDB Functions" content="movies" language="de">
    <include>ofdb_de.xml</include>
    <GetIMDBPlot dest="5">
        <RegExp input="$$3" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
            <RegExp input="$$1" output="\1" dest="2">
                <expression clear="yes">&lt;div id=&quot;swiki.2.1&quot;&gt;\n\n([^\n]+)</expression>
            </RegExp>
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="3">
                <expression>(.+)</expression>
            </RegExp>
            <RegExp conditional="getofdbplot" input="$$1" output="\1" dest="4">
                <expression>&lt;link rel=&quot;canonical&quot; href=&quot;http://www.imdb.de/title/([t0-9]*)</expression>
            </RegExp>
            <RegExp conditional="getofdbplot" input="$$2" output="&lt;url function=&quot;GetOFDBURL&quot;&gt;http://www.imdb.de/title/$$4/&lt;/url&gt;" dest="3">
                <expression>^$</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetIMDBPlot>
</scraper>

OK, if we do not find any plot, we look at the OFDB site:
Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="1.1" date="2010-06-12" name="OFDB Functions" content="movies" language="de">
    <GetOFDBURL dest="5">
        <!--<url function="GetOFDBLink">http://www.ofdb.de/view.php?SText=\1&Kat=IMDb&page=suchergebnis</url>-->
        <RegExp input="$$1" output="&lt;plot&gt;OFDB Function&lt;/plot&gt;" dest="5">
            <expression>&lt;link rel=&quot;canonical&quot; href=&quot;http://www.imdb.de/title/([t0-9]*)</expression>
        </RegExp>
    </GetOFDBURL>
    <GetOFDBLink dest="5">
        <RegExp input="$$1" output="&lt;url function=&quot;GetOFDBOutTagline&quot;&gt;http://www.ofdb.de/\1&lt;/url&gt;" dest="5">
            <expression>&lt;br&gt;1. &lt;a href=&quot;.*?([^&quot;]+)</expression>
        </RegExp>
    </GetOFDBLink>
    <GetOFDBOutTagline dest="5">
        <RegExp input="$$1" output="&lt;details&gt;&lt;outline&gt;\1&lt;/outline&gt;&lt;tagline&gt;\1&lt;/tagline&gt;&lt;plot&gt;\1&lt;/plot&gt;&lt;/details&gt;" dest="5">
            <expression>&lt;b&gt;Inhalt:&lt;/b&gt;([^&lt;]+)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url function=&quot;GetOFDBPlot&quot;&gt;http://www.ofdb.de/plot/\1&lt;/url&gt;" dest="5+">
            <expression>&lt;a href=&quot;plot/([^&quot;]+)</expression>
        </RegExp>
    </GetOFDBOutTagline>
    <GetOFDBPlot dest="5">
        <RegExp input="$$3" output="&lt;details&gt;\1&lt;/details&gt;" dest="5+">
            <RegExp input="$$1" output="\1" dest="2">
                <expression noclean="1">Eine Inhaltsangabe von(.*)Zur &amp;Uuml;bersichtsseite des Films</expression>
            </RegExp>
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="3">
                <expression noclean="1">&lt;br&gt;([^&lt;]+)(?:&lt;/font&gt;)</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetOFDBPlot>
</scraper>

As far as I can see, the OFDB feature has a problem or is never used. It's not a typo in the conditional flags: if I delete them, the problem still exists.
On paper (and in my head) it works: one URL after the other is fetched and checked by the scraper.
What went wrong?

Regards,

Eisbahn
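For comparison with the functions in #8, here is a sketch of the OFDB chain in which every function returns exactly one <details> root (with an s) and each follow-up <url function="..."> element sits inside it rather than after </details>. This assumes, as far as I understand framework 1.1, that the result of a chained function is parsed as XML with <details> as its root and that only <url> elements directly inside that root are followed. It also re-enables the commented-out OFDB search URL from GetOFDBURL; because that URL is written inside an output attribute, each & becomes &amp;amp; so that the returned XML contains a valid &amp;. The regular expressions are copied from #8 and untested.

Code:
<GetOFDBURL dest="5">
  <!-- $$1 = IMDB page, \1 = tt id; the chained search URL stays inside <details> -->
  <RegExp input="$$1" output="&lt;details&gt;&lt;url function=&quot;GetOFDBLink&quot;&gt;http://www.ofdb.de/view.php?SText=\1&amp;amp;Kat=IMDb&amp;amp;page=suchergebnis&lt;/url&gt;&lt;/details&gt;" dest="5">
    <expression>&lt;link rel=&quot;canonical&quot; href=&quot;http://www.imdb.de/title/([t0-9]*)</expression>
  </RegExp>
</GetOFDBURL>
<GetOFDBLink dest="5">
  <RegExp input="$$1" output="&lt;details&gt;&lt;url function=&quot;GetOFDBOutTagline&quot;&gt;http://www.ofdb.de/\1&lt;/url&gt;&lt;/details&gt;" dest="5">
    <expression>&lt;br&gt;1. &lt;a href=&quot;.*?([^&quot;]+)</expression>
  </RegExp>
</GetOFDBLink>
<GetOFDBOutTagline dest="5">
  <!-- collect outline/tagline/plot and the next chain link in $$6, then wrap $$6 once -->
  <RegExp input="$$6" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
    <RegExp input="$$1" output="&lt;outline&gt;\1&lt;/outline&gt;&lt;tagline&gt;\1&lt;/tagline&gt;&lt;plot&gt;\1&lt;/plot&gt;" dest="6">
      <expression>&lt;b&gt;Inhalt:&lt;/b&gt;([^&lt;]+)</expression>
    </RegExp>
    <RegExp input="$$1" output="&lt;url function=&quot;GetOFDBPlot&quot;&gt;http://www.ofdb.de/plot/\1&lt;/url&gt;" dest="6+">
      <expression>&lt;a href=&quot;plot/([^&quot;]+)</expression>
    </RegExp>
    <expression noclean="1"/>
  </RegExp>
</GetOFDBOutTagline>

GetOFDBPlot already wraps its result in <details> and can stay as it is.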
#9
The error message in the log is always this (with different IMDB tt IDs):
Code:
CIMDB::InternalGetDetails: Unable to parse web site [http://www.imdb.de/title/tt0499549/]
What I have tried: putting all returns in <detail> tags => the problem still exists
