Kodi Community Forum
Scraping inconsistency scrap.exe/xbmc?


Scraping inconsistency scrap.exe/xbmc? - ezd - 2007-06-16

I'm making a scraper for AsianDB.com. It seems to work flawlessly under scrap.exe, but XBMC misses a lot of info it retrieves. Here's an example details XML output:

Code:
<details>
    <title>Violent Cop</title>
    <year>1989</year>
    <director>Takeshi Kitano</director>
    <runtime>103mins</runtime>
    <thumb>http://www.asiandb.com/data/title/mini/4141.jpg</thumb>
    <rating>7</rating>
    <votes>3</votes>
    <genre>Action</genre>
    <genre>Crime</genre>
    <credits>Takeshi Kitano</credits>
    <credits>Hisashi Nozawa</credits>
    <actor>
        <name>Takeshi Kitano</name>
    </actor>
</details>

XBMC doesn't extract the director, genre, credits (correct way to enter writers?) and actors, but does get all other items.

Is there a bug in my XML output? (Note: pretty-printed for readability, no extra whitespace in actual XML)
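
One way to rule out malformed output is to parse the details XML outside XBMC. Below is a minimal Python sketch, assuming the output above is pasted in as a string. The helper name `summarize_details` is made up for illustration, and this only checks that the XML is well-formed and which fields it carries; it says nothing about how XBMC itself reads the result.

```python
import xml.etree.ElementTree as ET

# The details output from the post above, pasted verbatim.
DETAILS_XML = """<details>
    <title>Violent Cop</title>
    <year>1989</year>
    <director>Takeshi Kitano</director>
    <runtime>103mins</runtime>
    <thumb>http://www.asiandb.com/data/title/mini/4141.jpg</thumb>
    <rating>7</rating>
    <votes>3</votes>
    <genre>Action</genre>
    <genre>Crime</genre>
    <credits>Takeshi Kitano</credits>
    <credits>Hisashi Nozawa</credits>
    <actor>
        <name>Takeshi Kitano</name>
    </actor>
</details>"""


def summarize_details(xml_text):
    """Parse scraper output and group the text values by tag name.

    Hypothetical helper, not part of XBMC or scrap.exe.
    """
    # fromstring raises xml.etree.ElementTree.ParseError on malformed XML.
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        if child.tag == "actor":
            # Actors are nested: <actor><name>...</name></actor>
            value = child.findtext("name", default="").strip()
        else:
            value = (child.text or "").strip()
        fields.setdefault(child.tag, []).append(value)
    return fields


if __name__ == "__main__":
    for tag, values in summarize_details(DETAILS_XML).items():
        print(tag, values)
```

If this parses cleanly and lists every field, the XML itself is unlikely to be the problem, and the difference is more likely in how the two programs run the scraper.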

Also, pressing X+Y during boot did get me into debug mode, but it didn't tell me much about the scraping process. Is there a method (like in the old days :D) to set the debug level to 'insane' or similar?

Thanks for any help you can give,

ezd


- ezd - 2007-06-16

For reference, I've upped the current asiandb.xml to pastebin.


- spiff - 2007-06-18

i'm at a conference this week, so this post is only to say that i can't see anything wrong at first glance. i hardly have any internet access, so i'll have to wait until i get back home to investigate.


- ezd - 2007-06-18

Thanks for the heads up, no hurry here, I mostly did this for the Greater XBMC Good :)

Enjoy your conference!


- spiff - 2007-06-26

before each of those fields (director, genre, credits, actors) you have regexps that grab the relevant pieces of the html. on those you don't specify 'noclean="1"', and hence all html tags are stripped off what they capture. i guess scrap.exe doesn't honor this.
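
The effect described here can be reproduced outside XBMC. Below is a rough Python sketch, not XBMC's actual code: the HTML snippet, the `strip_tags` helper and the `extract_genres` function are all invented for illustration and do not reflect AsianDB's real markup. It shows a two-stage extraction where the second regexp relies on tags that the cleaning step has already removed.

```python
import re

# Hypothetical HTML fragment; the real AsianDB markup will differ.
HTML = (
    '<tr><td>Genre</td><td>'
    '<a href="/g/1">Action</a>, <a href="/g/2">Crime</a>'
    '</td></tr>'
)


def strip_tags(text):
    """Crude stand-in for the cleaning step: drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", text)


def extract_genres(html, noclean):
    """Two-stage extraction; `noclean` mimics the idea of noclean="1"."""
    # Stage 1: grab the chunk of HTML that holds the genre links.
    outer = re.search(r"<td>Genre</td><td>(.*?)</td>", html)
    if outer is None:
        return []
    chunk = outer.group(1)
    if not noclean:
        # Without noclean, the captured chunk loses its tags here.
        chunk = strip_tags(chunk)
    # Stage 2: this pattern depends on the <a> tags still being present.
    return re.findall(r"<a [^>]*>([^<]+)</a>", chunk)


if __name__ == "__main__":
    print(extract_genres(HTML, noclean=True))
    print(extract_genres(HTML, noclean=False))
```

With cleaning skipped, the second stage finds both genres. With cleaning applied, the chunk is reduced to plain text and the tag-based pattern matches nothing, which would explain fields silently going missing in one program but not in another that never cleans.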


- blaize - 2007-07-08

any progress on this scraper?
i really need this one. :lol:


- spiff - 2007-07-09

then i suggest you finish it


- blaize - 2007-07-09

wise-ass... if i could, don't you think i would?
some people have taught themselves programming skills, others artistic skills.


- spiff - 2007-07-09

it doesn't take programming skills. that's the whole reason i created the scraper system. it only takes some logic and reading a 10 min regexp guide.


- blaize - 2007-07-09

if you think it's that easy for everyone, then why is 'ezd' having problems with it?
I'm pretty much code-blind, but if you (or anyone else) could give a little help, i might give it (another) try.


- spiff - 2007-07-09

ezd had made a simple screwup, which i explained.

i'll answer specific questions.


- blaize - 2007-07-10

sorry for the triple post, editing posts doesn't seem to work for me for some reason.
a mod can combine/delete the posts if they feel the need.

the changes i made were pretty much only adding noclean="1" in the right places.
i also tried that with the stuff that's still not working (tagline, plot, cast, MPAA rating), but that didn't change anything.
so either i edited those wrong, or something else is wrong that i'm missing.

-blaize


- pike - 2007-07-10

how the heck did you manage that? the 2 posts are 10 (TEN) minutes apart!


- blaize - 2007-07-10

i know, i went back a page (history), but because i'm walking back and forth between my PC and box i got confused and thought i pressed edit (still can't find that button >_>)
that's how i reposted it.

Actors are working now, it was a stupid typo :S


- ezd - 2007-07-22

Sorry for the slow reply, I've been away for a while. Thanks for your reply, spiff. I still had a strange problem with XBMC hanging when I enabled plot extraction, but it doesn't occur in the new build, so I've uploaded the scraper to SourceForge for inclusion.

For those in a hurry to enable AsianDB:

https://sourceforge.net/tracker/?func=detail&aid=1758452&group_id=87054&atid=581840

Cheers,

ezd