Kodi Community Forum

Full Version: German IMDB scraper, please test it and give feedback
Pages: 1 2 3 4 5 6 7 8
Eisbahn Wrote:What about <certification>? Is it deprecated and only MPAA is used instead?
Because of different DVD releases, I've got more than one mpaa tag, e.g. 12 years (heavily cut), 16 years (cut), 18 years (uncut) for "The Rock" (IMDB ID tt0117500), and it's not a single instance.

Certification is still in there, sorry I left it out of my info

Eisbahn Wrote:What about the function GetIMDBThumbs? Does it fetch all pics from IMDB, or only the posters (and maybe product)? What are the constants SX, SY, SX$INFO and SY$INFO (or what are they)? Why is the function not repeated (I think users want more than one thumbnail)? I don't know exactly what this function should do. Does it point to <http://www.imdb.de/title/tt0499549/mediaindex?refine=poster>? Any help?

GetIMDBThumbs only grabs the posters. The actor thumbs are grabbed with the rest of the actor info

SX$INFO on its own is nothing; it is the $INFO part that has meaning. What you left out was [imdbscale]. Taken in its entirety, $INFO[imdbscale] is a placeholder for whatever value the user has selected in the settings for the size of the images to be downloaded (the setting with the id "imdbscale"). $INFO[<settingid>] simply tells the scraper: "Replace this placeholder with the text selected in the setting with the id <settingid>."
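The placeholder substitution described above can be sketched in a few lines of Python. This is only an illustration, not XBMC's actual code: the `settings` dictionary, the `expand_info` helper and the sample template string are assumptions made up for the example.

```python
import re

# Hypothetical settings map: setting id -> value the user selected.
settings = {"imdbscale": "512"}

def expand_info(template: str, settings: dict) -> str:
    """Replace every $INFO[<settingid>] with the matching setting value."""
    def lookup(match):
        setting_id = match.group(1)
        # In this sketch, unknown ids are simply left untouched.
        return settings.get(setting_id, match.group(0))
    return re.sub(r"\$INFO\[([^\]]+)\]", lookup, template)

# Assumed output template using the placeholder for both width and height.
template = r"\1_SX$INFO[imdbscale]_SY$INFO[imdbscale]_\2"
expanded = expand_info(template, settings)
print(expanded)
```

With "imdbscale" set to 512, the template expands to the same `\1_SX512_SY512_\2` form that appears hard-coded later in this thread.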

Eisbahn Wrote:How can I call a URL without the "&" being cleaned to "&amp;"? At the moment I use a function which removes the &amp; and puts an & into the links :=( The "no HTML clean" tag does not work at all...

Ampersands should be cleaned up by default (if you're looking at the source code of XBMC, see ScraperParser::ParseExpression, where it is commented "nasty hack #1")

double the ampersand

example http://foo.com/search.php?q=foo&amp;amp;s=foo2

the effect being that &amp;amp; becomes &amp;
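The double-escaping idea can be seen with any single pass of entity decoding. A minimal Python sketch, using the standard library's html.unescape purely as a stand-in for the scraper's cleanup step (an assumption; this is not what XBMC runs internally):

```python
import html

# The doubled form from the example above.
url = "http://foo.com/search.php?q=foo&amp;amp;s=foo2"

# One decoding pass removes exactly one level of escaping.
cleaned = html.unescape(url)
print(cleaned)

# A second pass would leave a bare ampersand.
print(html.unescape(cleaned))
```

One pass leaves &amp; in the URL, which is the effect described above; only a second pass would reduce it to a plain &.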

Eisbahn Wrote:What format should <premiered> have? String with month written out, or date?

Premiered is simply imported/exported as a string, so it has no localization and/or globalization format, and the format doesn't really matter.
(But as of now I have no idea IF it's stored in the database, and if it is, no idea WHERE, because looking in video34.db the premiered value seems to be nowhere.)
Nicezia Wrote:@vdrfan, I also noticed there are a few other tags not mentioned anywhere else (country, sorttitle, epbookmark, originaltitle), and that premiered, though taken from the nfo/scraper, doesn't seem to be stored in the database at all (at least in the last version I'm basing this on, which is from before the add-on merge), so when importing the file this info is lost, if it's even provided.

Are these extra tags deprecated tags that haven't been removed from the code, or added tags (I'm only just now getting to a point where I can read C++ code as well as CSharp)? And is premiered getting lost from the database an oversight?

Added. premiered is only used in relation to TV shows. country and sorttitle should be self-explanatory; epbookmark is the episode bookmark in multi-episode files (i.e. where does episode 2 start).
Nearly everything works now; only the thumbs from IMDB are not working at all...
In the main scraper I use the following RegEx:
        <RegExp input="$$2" output="&lt;url cache=&quot;$$2-posters.html&quot; function=&quot;GetIMDBThumbs&quot;&gt;$$3mediaindex?refine=poster&lt;/url&gt;" dest="5+">
        <RegExp input="$$2" output="&lt;url cache=&quot;$$2-product.html&quot; function=&quot;GetIMDBThumbs&quot;&gt;$$3mediaindex?refine=product&lt;/url&gt;" dest="5+">
Resulting in the URLs

The function is:
<GetIMDBThumbs dest="5">
    <RegExp input="$$6" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
        <RegExp input="$$1" output="\1_SX512_SY512_\2" dest="4">
            <expression repeat="yes" noclean="1,2">&lt;img alt=&quot;&quot; height=&quot;100&quot; width=&quot;100&quot;  src=(.*?)_S.*?(.jpg)&quot;</expression>
        </RegExp>
        <RegExp input="$$4" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="6">
            <expression repeat="yes" noclean="1">(.*?_SX[0-9]+_SY[0-9]+_.jpg)</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</GetIMDBThumbs>
If I do it by hand, I can see nice pics (why the hell should they be crippled to "square format"), e.g. <http://ia.media-imdb.com/images/M/[email protected]@._V1._CR0,0,388,388_SX512_SY512_.jpg>. But if I have a look in XBMC, I see only placeholders (white "Polaroid" with black square). What went wrong?
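One way to see which URL the function would generate is to run the two inner expressions outside XBMC. The following is a minimal Python sketch, assuming the poster page contains an img tag shaped the way the first expression expects (with a quoted src attribute). The sample URL and all variable names are invented for illustration, and Python's re module only approximates the regex engine XBMC uses.

```python
import re

# Hypothetical page fragment; the URL is made up for this example.
html_fragment = (
    '<img alt="" height="100" width="100"  '
    'src="http://example.invalid/images/M/POSTER._V1._CR0,0,388,388_SS100_.jpg">'
)

# First expression from the post, with the XML entities decoded.
first = re.compile(r'<img alt="" height="100" width="100"  src=(.*?)_S.*?(.jpg)"')
# Equivalent of output="\1_SX512_SY512_\2" written to buffer 4.
buffer4 = "".join(m.expand(r"\1_SX512_SY512_\2") for m in first.finditer(html_fragment))

# Second expression, wrapping each match in <thumb> (buffer 6).
second = re.compile(r"(.*?_SX[0-9]+_SY[0-9]+_.jpg)")
thumbs = ["<thumb>" + m.group(1) + "</thumb>" for m in second.finditer(buffer4)]

print(buffer4)
print(thumbs)
```

Under these assumptions, group 1 starts directly after src= and therefore picks up the opening quote, which then ends up inside the thumb element. Whether that matches what XBMC actually received from IMDB is not confirmed in this thread.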

Eisbahn Wrote:(why the hell should they be crippled to "square format"), e.g. <http://ia.media-imdb.com/images/M/[email protected]@._V1._CR0,0,388,388_SX512_SY512_.jpg>.

it really isn't "crippled" to square, the image is scaled by imdb in relation to the width.
Nicezia Wrote:it really isn't "crippled" to square, the image is scaled by imdb in relation to the width.

Hmmm, the original image is <http://www.imdb.de/media/rm3073674240/tt0499549>, but all thumbs are cropped to squares, e.g. <http://ia.media-imdb.com/images/M/[email protected]@._V1._CR0,0,388,388_SX512_SY512_.jpg>. But that's not a problem of XBMC or the scraper, it's IMDB.
But the main problem still exists: the images are not shown in XBMC. Is there any chance to check which URL is generated by the scraper and used for the pic in XBMC?
However, I think I could release v1.0 this weekend; it gathers nearly all info in a nice format from IMDB and (depending on user preference) covers and plot from partner sites.

v1.0.0 available, see first post
- corrected some RegEx to get all tags working again
- images/thumbs from IMDB are working now (only a typo)
- alternative plot from OFDB still not working
Is the latest version available for download anywhere? Would love to test it. Thanks!
latest Version 2.0.0 can be found here:

What is _not_ working
<premiered>premiere date</premiered> not im-/exported to XBMC
<aired>???</aired> only for TV shows/series?
<set>???</set> don't know what this is
<artist>???</artist> difference to actor?
<status>???</status> don't know what this is
<certification>age rating for all countries except Germany</certification> not im-/exported to XBMC
<sorttitle>alternative film titles</sorttitle> only the first title is im-/exported to XBMC
<code>???</code> don't know what this is, I think it's the codec => no sense importing anything into this field
<trailer>Trailer</trailer> pointless for me because the whole DVD is present in XBMC

Any hints for the corrupted tags are highly welcome.



I can't include the scraper in the addons dir. Can you attach an addon.xml file please?
Ah... I wrote it myself: I copied another addon.xml and edited it.
Thx for the scraper.
But I got no covers ;(
@Eisbahn, mind posting an add-on ready version of the scraper? Otherwise users with newer builds won't be able to test and give feedback. Thanks.

At the moment the scraper is only ready for v9.11, not the upcoming v10 with the new structure. Sadly I don't have any info on how v10 should be implemented; the wiki is empty and I couldn't find anything in the forum either... At <http://wiki.xbmc.org/index.php?title=Add..._Extension > there is no info for scrapers => no info, no scraper :=(
For me with v9.11, unzipping and copying the two files into the video scraper dir works fine; I can test v10 in a VM in a few hours. But I expect it won't work out of the box with v10, as olympia wrote in another thread.
I don't know how often I've asked: which tags are supported by XBMC v9.11 and v10, and what is the meaning of each? Any info about the structure in v10? I think it should be no problem (ok, a little) to get it working with v10.

All tags you've listed are obsolete because they are for shows, not needed for German stuff, or handled internally. I am completely with you on the trailer stuff, but I bet some users will request it as soon as the scraper hits the official repository :p

The only difference for the upcoming dharma release is how scrapers and settings are handled. I'd recommend you to have a look at the other scrapers that are already add-ons to get an overview until the wiki is updated.
Hi vdrfan,

just found <http://xbmc.git.sourceforge.net/git/gitweb.cgi?p=xbmc/scrapers;a=commit;h=5b59dec81b4e5046a3a515bc0cc6fd68ba408201>. I hope these are current and proper XML examples; I will try it this evening at home.
Are any docs out right now? I know this situation from real life: no docs ready, but the client wants an implementation of feature X. No problem, but if the client does not say what he really wants, it won't be a cheap solution and both sides are frustrated at the end... => Normally I do not accept any contracts without clear rules, or I adapt the price a bit ;=)
