How to use "if then else": URL1 = empty > fetch infos from URL2 for

Eisbahn
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #1
Hello,

IMDB is a cool DB, but sadly most older movies have no German translation. So I would sometimes like to use another URL for the plot and plot summary tags. How can this be done in the scraper? Could you please give me a hint?

Regards,

Eisbahn
spiff
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #2
for this you use a chain. you create a function to parse the other page:

Code:
<ParseOtherPage dest="3">
  <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
   ... stuff things into $$5
  </RegExp>
</ParseOtherPage>
then, in GetDetails (or whichever function you want to chain from), output a url element that names that function:

Code:
<RegExp input="$$1" output="&lt;url function=&quot;ParseOtherPage&quot;&gt;someurlor\1orwhatever&lt;/url&gt;" dest="5+">
  ...
</RegExp>
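
The chaining idea above can be modelled outside the scraper engine. The following is a minimal Python sketch, not scraper code: it assumes a function returns its parsed details plus a list of (function, URL) requests, and that the engine runs each request and merges the results. The page contents, the example.invalid URLs and the names `get_details`, `parse_other_page` and `run` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for downloading pages; a real engine fetches the URL.
PAGES: Dict[str, str] = {
    "http://example.invalid/main": "TITLE=Some Film",
    "http://example.invalid/other": "PLOT=A plot from the second site.",
}

Result = Tuple[Dict[str, str], List[Tuple[str, str]]]


def get_details(page: str) -> Result:
    """Parse the first page and request a chain to a second function."""
    details = {"title": page.split("=", 1)[1]}
    chained = [("parse_other_page", "http://example.invalid/other")]
    return details, chained


def parse_other_page(page: str) -> Result:
    """Parse the second page; it requests no further chaining."""
    details = {"plot": page.split("=", 1)[1]}
    return details, []


FUNCTIONS: Dict[str, Callable[[str], Result]] = {
    "get_details": get_details,
    "parse_other_page": parse_other_page,
}


def run(function_name: str, url: str) -> Dict[str, str]:
    """Run one function, then follow each chained request and merge the results."""
    details, chained = FUNCTIONS[function_name](PAGES[url])
    for next_function, next_url in chained:
        details.update(run(next_function, next_url))
    return details


result = run("get_details", "http://example.invalid/main")
```

Here `result` ends up holding both the title from the first page and the plot from the chained page, which is the effect the `<url function="...">` element is meant to have in the scraper.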
Eisbahn
Post: #3
Hi spiff,

I am not sure we are talking about the same thing. What I want:
- always scrape the German IMDB and search for the plot
- only if the plot is missing (the German IMDB then shows a text like "no plot available, please translate and insert it in our HP") and the user wants to use another URL, scrape that URL for the plot

=> OK, asking the user whether to scrape another URL is no problem and can be done with the "conditional" flag
=> scraping another URL with a function is no problem either (I have already done this for other infos)

My real question is: how do I make the decision "the infos scraped from IMDB are not good, use the alternative (if the user wants to)"?

Eisbahn
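
The decision described in this post combines two conditions. As a rough illustration in Python (a model of the logic, not scraper code), it could be written as below. The marker string is a hypothetical placeholder, since the thread does not give the exact German wording, and the function names are made up for this sketch.

```python
# Hypothetical marker; the real wording on the German IMDB page is not given in the thread.
PLACEHOLDER_MARKERS = ("no plot available",)


def plot_is_usable(plot: str) -> bool:
    """A plot counts as usable if it is non-empty and is not the placeholder text."""
    text = plot.strip()
    if not text:
        return False
    lowered = text.lower()
    return not any(marker in lowered for marker in PLACEHOLDER_MARKERS)


def should_fetch_alternative(plot: str, user_allows_fallback: bool) -> bool:
    """Fetch the second URL only if the user allows it AND the first plot is unusable."""
    return user_allows_fallback and not plot_is_usable(plot)
```

The user setting alone is not enough to trigger the second fetch; the quality check on the first result has to hold as well.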
spiff
Post: #4
just grab the plot to a buffer and check if it's bad. if it is, chain.
Eisbahn
Post: #5
Yes, but how can this be done?
The scraper code is, for example:
Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="11" date="2010-06-07" name="DE_IMDb" content="movies" thumb="imdb.png" language="de">
    <include>common/fetch_other_url.xml</include>
    <GetSettings dest="3">
        <RegExp input="$$5" output="&lt;settings&gt;\1&lt;/settings&gt;" dest="3">
            <RegExp input="$$1" output="&lt;setting label=&quot;if plot is empty, fetch other URL&quot; type=&quot;bool&quot; id=&quot;fetchurl&quot; default=&quot;true&quot;&gt;&lt;/setting&gt;" dest="5">
                <expression/>
            </RegExp>
        </RegExp>
    </GetSettings>
    <NfoUrl dest="3">
        [...] some RegEx [...]
    </NfoUrl>
    <CreateSearchUrl SearchStringEncoding="iso-8859-1" dest="3">
        [...] some RegEx [...]
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        [...] some RegEx [...]
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5">
                <expression/>
            </RegExp>
            <RegExp conditional="fetchurl" input="$$2" output="&lt;url function=&quot;FetchPlotFromOtherURL&quot;&gt;$$3&lt;/url&gt;" dest="5+">
                <expression/>
            </RegExp>
            [...] some RegEx [...]
        </RegExp>
    </GetDetails>
</scraper>
But this fetches the other URL whenever the user sets "fetchurl" to true. How can I add the check, so that the conditional RegExp with the url function only runs when the info from the first URL is not good?

Eisbahn
spiff
Post: #6
Code:
<RegExp input="$$1" output="\1" dest="6">
  <expression clear="yes">somethingthatgrabstheplot</expression>
</RegExp>
<RegExp input="$$6" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
  <expression>(.+)</expression>
</RegExp>
<RegExp input="$$6" output="&lt;url function=&quot;theotherone&quot;&gt;theotherurl&lt;/url&gt;" dest="5+">
  <expression>^$</expression>
</RegExp>

1) grab plot to a buffer
2) if buffer is nonempty, use as plot
3) if buffer is empty, do the chain.

elementary, dr watson.
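
The three steps above work as an if/else because the two expressions are complementary: `(.+)` needs at least one character, while `^$` only matches an empty buffer. A small Python sketch of that behaviour follows. It uses Python's `re` module, which is only assumed to behave like the scraper's regex engine for these two simple patterns, and `branch_on_buffer` is a hypothetical name.

```python
import re


def branch_on_buffer(buffer: str) -> str:
    """Return what the two RegExp elements would append for a given buffer."""
    output = ""
    # Step 2: (.+) matches only when the buffer holds at least one character.
    match = re.search(r"(.+)", buffer)
    if match:
        output += "<plot>" + match.group(1) + "</plot>"
    # Step 3: ^$ matches only when the buffer is empty.
    if re.search(r"^$", buffer):
        output += '<url function="theotherone">theotherurl</url>'
    return output
```

For any single-line buffer exactly one of the two branches fires, so no explicit "else" is needed.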
mkortstiege
Team-Kodi Developer
Posts: 2,964
Joined: Jan 2008
Reputation: 8
Location: Germany
Post: #7
@Eisbahn, some of the German scrapers already use something like this to determine whether there's an IMDB ID or whether we have to use Google.
Eisbahn
Post: #8
Hi Spiff,

not so easy for me, ouch...

In the GetDetails function of the scraper:
Code:
        <RegExp input="$$2" output="&lt;url function=&quot;GetIMDBPlot&quot;&gt;$$3plotsummary&lt;/url&gt;" dest="5+">
            <expression/>
        </RegExp>

This runs fine and calls my IMDB function (in the common directory):
Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="1,1" date="2010-06-12" name="IMDB Functions" content="movies" language="de">
    <include>ofdb_de.xml</include>
    <GetIMDBPlot dest="5">
        <RegExp input="$$3" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
            <RegExp input="$$1" output="\1" dest="2">
                <expression clear="yes">&lt;div id=&quot;swiki.2.1&quot;&gt;\n\n([^\n]+)</expression>
            </RegExp>
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="3">
                <expression>(.+)</expression>
            </RegExp>
            <RegExp conditional="getofdbplot" input="$$1" output="\1" dest="4">
                <expression>&lt;link rel=&quot;canonical&quot; href=&quot;http://www.imdb.de/title/([t0-9]*)</expression>
            </RegExp>
            <RegExp conditional="getofdbplot" input="$$2" output="&lt;url function=&quot;GetOFDBURL&quot;&gt;http://www.imdb.de/title/$$4/&lt;/url&gt;" dest="3">
                <expression>^$</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetIMDBPlot>
</scraper>

OK, if no plot is found, it should have a look at the OFDB site:
Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="11" date="2010-06-12" name="OFDB Functions" content="movies" language="de">
    <GetOFDBURL dest="5">
        <!--<url function="GetOFDBLink">http://www.ofdb.de/view.php?SText=\1&Kat=IMDb&page=suchergebnis</url>-->
        <RegExp input="$$1" output="&lt;plot&gt;OFDB Function&lt;/plot&gt;" dest="5">
            <expression>&lt;link rel=&quot;canonical&quot; href=&quot;http://www.imdb.de/title/([t0-9]*)</expression>
        </RegExp>
    </GetOFDBURL>
    <GetOFDBLink dest="5">
        <RegExp input="$$1" output="&lt;url function=&quot;GetOFDBOutTagline&quot;&gt;http://www.ofdb.de/\1&lt;/url&gt;" dest="5">
            <expression>&lt;br&gt;1. &lt;a href=&quot;.*?([^&quot;]+)</expression>
        </RegExp>
    </GetOFDBLink>
    <GetOFDBOutTagline dest="5">
        <RegExp input="$$1" output="&lt;details&gt;&lt;outline&gt;\1&lt;/outline&gt;&lt;tagline&gt;\1&lt;/tagline&gt;&lt;plot&gt;\1&lt;/plot&gt;&lt;/details&gt;" dest="5">
            <expression>&lt;b&gt;Inhalt:&lt;/b&gt;([^&lt;]+)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url function=&quot;GetOFDBPlot&quot;&gt;http://www.ofdb.de/plot/\1&lt;/url&gt;" dest="5+">
            <expression>&lt;a href=&quot;plot/([^&quot;]+)</expression>
        </RegExp>
    </GetOFDBOutTagline>
    <GetOFDBPlot dest="5">
        <RegExp input="$$3" output="&lt;details&gt;\1&lt;/details&gt;" dest="5+">
            <RegExp input="$$1" output="\1" dest="2">
                <expression noclean="1">Eine Inhaltsangabe von(.*)Zur &amp;Uuml;bersichtsseite des Films</expression>
            </RegExp>
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="3">
                <expression noclean="1">&lt;br&gt;([^&lt;]+)(?:&lt;/font&gt;)</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetOFDBPlot>
</scraper>

As far as I can see, the OFDB part has a problem or is never used. It is not a typo in the conditional flags: if I delete them, the problem still exists.
On paper and in my head it works, and one URL after the other is fetched and checked by the scraper.
What went wrong?

Regards,

Eisbahn
Eisbahn
Post: #9
The error message in the log is always the following (with different IMDB tt-IDs):
Code:
CIMDB::InternalGetDetails: Unable to parse web site [http://www.imdb.de/title/tt0499549/]
What I have tried: putting all return values in <detail> tags => the problem still exists
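
The thread does not say what caused this error. One way to narrow it down, assuming (and this is only an assumption) that the engine expects every function to return a single well-formed XML document with a `<details>` root, is to check each function's output string separately. The helper below is a hypothetical Python sketch for doing that by hand; it is not part of XBMC.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def check_function_output(xml_text: str, expected_root: str = "details") -> Tuple[bool, str]:
    """Report whether a function's output is one well-formed document with the expected root."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # Covers empty output, unclosed tags, and several top-level elements.
        return False, f"not well-formed: {error}"
    if root.tag != expected_root:
        return False, f"root is <{root.tag}>, expected <{expected_root}>"
    return True, "ok"
```

Feeding it the literal output strings from the functions above (with the entities decoded) would show, for example, whether a function returns a bare `<plot>` element or two sibling elements without a common root.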