
smuto · 2008-02-23, 18:38

can someone review and then commit this to the SVN?

smuto · 2008-05-25, 21:58

i have a script before every location url

ex.
details.1.html

how can i force the scraper to skip this?

smuto

smuto · 2008-05-26, 19:06

i added "spoof" to the url, maybe this helps
you can test my wip scraper

filmweb.xml_test

**spiff** · 2008-05-26, 22:46

spoof is for setting the referer. it probably does the trick indeed. sorry for the late response

smuto · 2008-06-10, 15:21

maybe it's not an xbmc problem, but maybe you can help

Recently, in movie info from the filmweb scraper, accented characters are shown as entities

ex.
latin small letter o with acute
ó -> &oacute;

is there a way to fix this?
smuto

**spiff** · 2008-06-10, 15:39

hmm, it should convert those tags when you load the xml?
if not, make sure cleaning is performed on the field. latter would remove them though

smuto · 2008-06-11, 21:35

with or without "noclean" i still get the same result

ex
xbmc shows
który -> ktoacute;ry

in .xml from scrap.exe
który -> kt&oacute;ry

i really don't know what to do

i need to update SVN (small url function link fix)
but for now, this one is good for testing entities
filmweb.xml_test

good for test is "Kingdom of the Crystal Skull"
tag title is OK
tags outline & plot are wrong

smuto · 2008-07-08, 21:51

for my own use i edited the source file HTMLUtil.cpp
edited HTMLUtil.cpp

Code:
strReturn.Replace("&ndash;", "-");
strReturn.Replace("&oacute;", "ó");

it's working, but i hope you can help fix this for all Polish users

smuto
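For readers who want to try the idea outside the XBMC tree, the two Replace calls in the post above can be sketched with plain std::string. This is only an illustration under assumptions: replaceAll and fixPolishEntities are hypothetical names, not functions from HTMLUtil.cpp, "&ndash;" is mapped to a plain hyphen, and the output is assumed to be UTF-8.

```cpp
#include <string>

// Replace every occurrence of `from` in `text` with `to`.
// Hypothetical stand-in for the Replace() calls in the post, using std::string.
std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
    if (from.empty()) return text;  // nothing to search for
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}

// Apply the two replacements discussed in the post (assumes UTF-8 output).
std::string fixPolishEntities(std::string text) {
    text = replaceAll(text, "&ndash;", "-");
    text = replaceAll(text, "&oacute;", "\xC3\xB3");  // U+00F3 (ó) as UTF-8 bytes
    return text;
}
```

Calling fixPolishEntities("kt&oacute;ry") should give "który" encoded as UTF-8. Each new entity needs another call, so this approach only covers the entities that are listed explicitly.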

smuto · 2008-09-14, 21:23

i added fanart to the filmweb scraper

i use the polish wikipedia to map filmweb.id to imdb.id

we still have a problem with entities, hope spiff finds time to help us

you can test the new scraper from here
filmweb.xml_test_scraper

smuto

**spiff** · 2008-09-18, 00:28

hi.

i see nothing wrong, nor any other way to handle this so i just committed your replaces along with the new scraper. please use trac in the future :)

spiff

**haken** · 2008-09-24, 18:22

@smuto: There are some problems with titles that start with numbers eg. "1410" or "27 dresses" - numbers are cut off from them. Fanart support is really great.
I hope that an xbmc build with the edited HTMLUtil.cpp will be ready soon. In the meantime, could you put your compiled xbmc default.xbe at smuto.w.interia.pl? (It would be great for me, because I want to rescan my movie library, and Polish plots with no entity problems are what I am looking for...)

**haken** · 2008-10-01, 19:47

Even though entities were fixed in changeset 15625, it seems that the "oacute problem" still exists (I checked the filmweb scraper on xbmc builds 15640 and 15728). Smuto, do you agree with me?

smuto · 2008-10-03, 19:09

@haken - u just need to update scraper
filmweb.xml

@spiff

Quote: i see nothing wrong, nor any other way to handle this so i just committed your replaces

but this is not a good idea - "oacute" & "ndash" are just the most common ones
this means i would have to add every entity to the replaces
next in my queue are

Code:
strReturn.Replace("&nbsp;", " ");
strReturn.Replace("&rsquo;", "'");

smuto
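The concern in the post above, that every entity would need its own replace, is the usual argument for a table lookup instead of a chain of calls. A minimal sketch of that alternative follows. It is not what was committed to XBMC: entityTable and decodeEntities are hypothetical names, the table is a small illustrative subset, and the output is assumed to be UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Illustrative subset of named entities mapped to UTF-8; not a complete table.
static const std::unordered_map<std::string, std::string>& entityTable() {
    static const std::unordered_map<std::string, std::string> table = {
        {"amp", "&"},
        {"nbsp", " "},
        {"ndash", "-"},
        {"rsquo", "'"},
        {"oacute", "\xC3\xB3"},  // ó
        {"Oacute", "\xC3\x93"},  // Ó
    };
    return table;
}

// Decode known "&name;" entities in one pass; unknown ones are left untouched.
std::string decodeEntities(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '&') {
            std::size_t end = in.find(';', i + 1);
            // Only treat short runs as entity names to avoid matching a distant ';'.
            if (end != std::string::npos && end - i <= 10) {
                auto it = entityTable().find(in.substr(i + 1, end - i - 1));
                if (it != entityTable().end()) {
                    out += it->second;  // emit the replacement text
                    i = end + 1;        // skip past the whole entity
                    continue;
                }
            }
        }
        out += in[i++];  // ordinary character, or an '&' that is not a known entity
    }
    return out;
}
```

Adding an entity then means adding one table row. Because the input is scanned once, text such as "&amp;oacute;" decodes to the literal "&oacute;" rather than being decoded twice.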

smuto · 2008-10-07, 09:04

i don't know why, but sometimes the wikipedia search doesn't work

i changed the way the link is scraped after the search - please test
filmweb.xml

is there a way to show a custom label in the skin?

i need something like this for testing
ListItem.IMDbID or ListItem.FilmwebID

smuto

**haken** · (This post was last modified: 2008-11-01, 20:16 by haken.)

@smuto: I think that there are some changes on the filmweb.pl website - descriptions cannot be scraped, and neither can high-res posters. I looked inside the scraper, but it is too complicated for me ;)

Update: Scraper is ok! It was something else - now everything works perfectly. I was surprised because until now the scraper either worked or didn't work at all... Sorry!