Kodi Community Forum - Filmweb scraper


can someone review and then commit this to the SVN?

i have a script before every location url

ex.
details.1.html

how can i force the scraper to skip this?

smuto

i added "spoof" to the url, maybe this helps
u can test my wip scraper

filmweb.xml_test

spoof is for setting the referer. it probably does the trick indeed. sorry for the late response
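To make "setting the referer" concrete: the Referer is just one more header line in the HTTP request. Below is a minimal sketch, not XBMC's actual code; `BuildGetRequest` and the example.com host are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical helper (not part of XBMC): builds a raw HTTP/1.1 GET request.
// If `referer` is non-empty, a Referer header is added, which is what a
// "spoofed" referer amounts to on the wire.
std::string BuildGetRequest(const std::string& host,
                            const std::string& path,
                            const std::string& referer)
{
  std::string req = "GET " + path + " HTTP/1.1\r\n";
  req += "Host: " + host + "\r\n";
  if (!referer.empty())
    req += "Referer: " + referer + "\r\n";
  req += "Connection: close\r\n\r\n";
  return req;
}
```

Calling `BuildGetRequest("www.example.com", "/details.1.html", "http://www.example.com/")` yields a request that claims to come from the site's own front page, which is the kind of check a site may apply before serving a details page.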

maybe it's not xbmc problem, but maybe u can help

Recently, in movie info from the filmweb scraper, accented characters are shown as entities

ex.
latin small letter o with acute
ó -> &oacute;

is there a way to fix this?
smuto

hmm, it should convert those tags when you load the xml?
if not, make sure cleaning is performed on the field. latter would remove them though

with or without "noclean" i still get the same result

ex
xbmc shows
który -> ktoacute;ry

in .xml from scrap.exe
który -> który

i really don't know what to do

i need to update SVN (small url function link fix)
but for now, this one is good for testing entities
filmweb.xml_test

good for test is "Kingdom of the Crystal Skull"
tag title is OK
tags outline & plot are wrong

for my own use i edited the source file HTMLUtil.cpp
edited HTMLUtil.cpp

Code:
strReturn.Replace("&ndash;", "-");
strReturn.Replace("&oacute;", "ó");

it works, but i hope u can help fix this for all polish users

smuto

i added fanart to the filmweb scraper

i use polish wikipedia to map filmweb.id to imdb.id

we still have a problem with entities, hope spiff finds time to help us

u can test new scraper from here
filmweb.xml_test_scraper

smuto
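For readers wondering what the filmweb-to-IMDb step might look like: a minimal sketch, assuming the Wikipedia article for the movie contains an external link of the form `imdb.com/title/tt…`. This is not the scraper's actual regular expression; `FindImdbId` is a hypothetical helper written only to illustrate the idea.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first IMDb title id ("tt" followed by at
// least seven digits) found in an imdb.com/title/ link, or an empty string
// if the page text contains no such link.
std::string FindImdbId(const std::string& html)
{
  static const std::regex kImdbLink(R"(imdb\.com/title/(tt[0-9]{7,}))");
  std::smatch match;
  if (std::regex_search(html, match, kImdbLink))
    return match[1].str();
  return "";
}
```

An empty return value is the case to watch for: it means the page had no IMDb link, so fanart lookup by imdb.id would have nothing to work with.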

hi.

i see nothing wrong, nor any other way to handle this, so i just committed your replaces along with the new scraper. please use trac in the future :)

spiff

@smuto: There are some problems with titles that start with numbers, e.g. "1410" or "27 dresses": the numbers are cut off. Fanart support is really great.
I hope that an xbmc build with the edited HTMLUtil.cpp will be ready soon. In the meantime, could you put your compiled xbmc default.xbe at smuto.w.interia.pl? (That would be great for me, because I want to rescan my movie library, and Polish plots with no entity problems are exactly what I'm looking for...)

Even though entities were fixed in changeset 15625, it seems that the "oacute problem" still exists (I checked the filmweb scraper on xbmc builds 15640 and 15728). Smuto - do you agree with me?

@haken - u just need to update scraper
filmweb.xml

@spiff

Quote:i see nothing wrong, nor any other way to handle this so i just commited your replaces

but this is not a good idea - "oacute" & "ndash" are just the most common ones
this means i would have to add every entity to the replaces
next in my queue are

Code:
strReturn.Replace("&nbsp;", " ");
strReturn.Replace("&rsquo;", "'");

smuto
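The worry about adding one `Replace` call per entity could be addressed with a lookup table and a single pass over the string. Below is a minimal sketch, not the code in HTMLUtil.cpp: `DecodeEntities` is a hypothetical helper, it uses `std::string` rather than XBMC's string class, the table is only an illustrative subset, and it assumes the output should be UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of XBMC): decodes a small set of named HTML
// entities in one pass. Unknown entities are copied through unchanged.
std::string DecodeEntities(const std::string& in)
{
  // Illustrative subset only; a real table would list every named entity.
  static const std::unordered_map<std::string, std::string> kEntities = {
    {"amp", "&"}, {"lt", "<"}, {"gt", ">"}, {"quot", "\""},
    {"nbsp", " "}, {"ndash", "-"}, {"rsquo", "'"},
    {"oacute", "\xC3\xB3"},  // o with acute, encoded as UTF-8 (assumption)
  };

  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    if (in[i] == '&')
    {
      const std::size_t end = in.find(';', i + 1);
      // Entity names are short, so ignore a ';' that is far away.
      if (end != std::string::npos && end - i <= 10)
      {
        const auto it = kEntities.find(in.substr(i + 1, end - i - 1));
        if (it != kEntities.end())
        {
          out += it->second;
          i = end;  // continue after the closing ';'
          continue;
        }
      }
    }
    out += in[i];
  }
  return out;
}
```

With this shape, supporting another entity means adding one table row instead of another full scan of the string, and because the input is read only once, text such as `&amp;oacute;` is decoded a single time and comes out as `&oacute;` rather than being decoded twice.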

i don't know why, but sometimes the wikipedia search doesn't work

i changed the way the link is scraped after the search - please test
filmweb.xml

is there a way to show a custom label in the skin?

something like this i need for testing
ListItem.IMDbID or ListItem.FilmwebID

smuto

@smuto: I think there have been some changes on the filmweb.pl website - descriptions cannot be scraped, and neither can high-res posters. I looked inside the scraper, but it is too complicated for me ;)

Update: Scraper is ok! It was something else - now everything works perfectly. I was surprised because previously the scraper either worked or didn't work at all... Sorry!
