Filmweb scraper
#31
Can someone review this and then commit it to SVN?
#32
I get a script in front of every location URL,

e.g.
details.1.html

How can I force the scraper to skip this?

smuto
#33
I added "spoof" to the URL; maybe this helps.
You can test my WIP scraper:

filmweb.xml_test
#34
spoof is for setting the referer. It probably does the trick indeed. Sorry for the late response.
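For readers unfamiliar with the term: the referer is the Referer header of the HTTP request, i.e. the page the client claims it came from. A minimal sketch of a plain HTTP/1.1 GET request carrying one; BuildGetRequest and the example host/path are hypothetical, and this is not how XBMC's scraper code actually builds its requests.

```cpp
#include <string>

// Hypothetical helper (not XBMC code): builds a minimal HTTP/1.1 GET
// request. When referer is non-empty it adds a Referer header, which is
// the value a "spoofed" referer would replace.
std::string BuildGetRequest(const std::string& host, const std::string& path,
                            const std::string& referer)
{
  std::string req = "GET " + path + " HTTP/1.1\r\n";
  req += "Host: " + host + "\r\n";
  if (!referer.empty())
    req += "Referer: " + referer + "\r\n";  // the page we claim to come from
  req += "Connection: close\r\n\r\n";       // blank line ends the headers
  return req;
}
```

A site that checks this header can serve the real page only when the Referer points at one of its own pages, which would explain why setting it helps here.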
#35
Maybe it's not an XBMC problem, but maybe you can help.

Recently, in the movie info from the filmweb scraper, accented characters are shown as entities.

e.g.
latin small letter o with acute
ó -> &oacute;

Is there a way to fix this?
smuto
#36
Hmm, shouldn't those be converted when you load the XML?
If not, make sure cleaning is performed on the field. The latter would remove them, though.
#37
With or without "noclean" I still get the same result.

e.g.
XBMC shows:
który -> ktoacute;ry

In the .xml from scrap.exe:
który -> kt&oacute;ry

I really don't know what to do.

I need to update SVN (a small fix to the URL function link),
but for now this one is good for testing entities:
filmweb.xml_test

A good test case is "Kingdom of the Crystal Skull":
the title tag is OK,
the outline and plot tags are wrong.
#38
For my own use, I edited the source file HTMLUtil.cpp:
edited HTMLUtil.cpp
Code:
strReturn.Replace("&ndash;", "-");   // en dash -> plain hyphen
strReturn.Replace("&oacute;", "ó");  // o with acute -> literal character

It's working, but I hope you can help fix this for all Polish users.

smuto
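The same replacements can be driven by a lookup table, so each new entity becomes one table row instead of another Replace call. This is only a sketch: it uses std::string rather than XBMC's own string class, ReplaceNamedEntities is a hypothetical name (not a function in HTMLUtil.cpp), and mapping &ndash; to a plain hyphen is an assumption.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of XBMC's HTMLUtil.cpp): replaces each
// named entity in the table with its UTF-8 text.
std::string ReplaceNamedEntities(std::string text)
{
  // Illustrative subset only; a complete table would list every entity.
  static const std::vector<std::pair<std::string, std::string>> kEntities = {
    {"&ndash;", "-"},          // en dash mapped to a plain hyphen (assumed)
    {"&oacute;", "\xC3\xB3"},  // U+00F3 (o with acute) encoded as UTF-8
  };
  for (const auto& entry : kEntities)
  {
    const std::string& entity = entry.first;
    const std::string& replacement = entry.second;
    std::string::size_type pos = 0;
    while ((pos = text.find(entity, pos)) != std::string::npos)
    {
      text.replace(pos, entity.size(), replacement);
      pos += replacement.size();  // continue after the inserted text
    }
  }
  return text;
}
```

The loop stays the same no matter how many entities are added; the cost is that every named entity still has to appear in the table somewhere.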
#39
I added fanart to the filmweb scraper.

I use the Polish Wikipedia to map filmweb.id to imdb.id.

We still have a problem with entities; I hope spiff finds time to help us.

You can test the new scraper from here:
filmweb.xml_test_scraper

smuto
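The thread doesn't show how the scraper pulls the IMDb ID out of the Polish Wikipedia article. As a rough sketch, assuming the article's HTML contains an ordinary imdb.com/title/tt… link, a regular expression can pick the ID out; ExtractImdbId and the exact link pattern are assumptions, not the scraper's actual expression.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the first IMDb title ID found in a page's
// HTML, assuming it appears in a link of the form imdb.com/title/tt1234567.
std::optional<std::string> ExtractImdbId(const std::string& html)
{
  // IMDb title IDs are "tt" followed by 7 digits (8 for some newer titles).
  static const std::regex kImdbLink(R"(imdb\.com/title/(tt\d{7,8}))");
  std::smatch match;
  if (std::regex_search(html, match, kImdbLink))
    return match[1].str();  // capture group 1 holds just the ID
  return std::nullopt;      // no IMDb link on the page
}
```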
#40
Hi.

I see nothing wrong, nor any other way to handle this, so I just committed your replaces along with the new scraper. Please use Trac in the future.

spiff
#41
@smuto: There are some problems with titles that start with numbers, e.g. "1410" or "27 Dresses": the numbers are cut off. Fanart support is really great.
I hope an XBMC build with the edited HTMLUtil.cpp will be ready soon. In the meantime, could you put your compiled XBMC default.xbe on smuto.w.interia.pl? That would be great for me, because I want to rescan my movie library, and Polish plots without entity problems are exactly what I'm looking for.
#42
Even though entities were fixed in changeset 15625, it seems the "oacute problem" still exists (I checked the filmweb scraper on XBMC builds 15640 and 15728). Smuto, do you agree?
#43
@haken: you just need to update the scraper:
filmweb.xml

@spiff
Quote: I see nothing wrong, nor any other way to handle this, so I just committed your replaces
But this is not a good idea: "oacute" and "ndash" are only the most common ones,
which means I would have to add every entity to the replaces.
Next in my queue are:
Code:
strReturn.Replace("&nbsp;", " ");   // non-breaking space -> plain space
strReturn.Replace("&rsquo;", "'");  // right single quotation mark -> apostrophe
smuto
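One class of entities doesn't need a per-entity list at all: numeric character references such as &#243; or &#xF3;, where the number is the Unicode code point. A minimal sketch, assuming text is stored as UTF-8 in a std::string; DecodeNumericEntities is a hypothetical helper, and named entities like &oacute; or &nbsp; would still need a table.

```cpp
#include <string>

// Appends the UTF-8 encoding of Unicode code point cp to out.
static void AppendUtf8(std::string& out, unsigned long cp)
{
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: decodes "&#243;" (decimal) and "&#xF3;" (hex)
// references into UTF-8. Malformed or out-of-range references are
// copied through unchanged.
std::string DecodeNumericEntities(const std::string& in)
{
  std::string out;
  std::string::size_type i = 0;
  while (i < in.size()) {
    if (in.compare(i, 2, "&#") == 0) {
      const bool hex = i + 2 < in.size() && (in[i + 2] == 'x' || in[i + 2] == 'X');
      const std::string::size_type start = i + (hex ? 3 : 2);
      const std::string::size_type semi = in.find(';', start);
      unsigned long cp = 0;
      bool ok = semi != std::string::npos && semi > start;  // need >= 1 digit
      for (std::string::size_type k = start; ok && k < semi; ++k) {
        const char c = in[k];
        int digit = 0;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (hex && c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (hex && c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else { ok = false; break; }
        cp = cp * (hex ? 16 : 10) + digit;
        if (cp > 0x10FFFF) ok = false;  // beyond the Unicode range
      }
      // Also reject NUL and UTF-16 surrogate code points.
      if (ok && cp != 0 && (cp < 0xD800 || cp > 0xDFFF)) {
        AppendUtf8(out, cp);
        i = semi + 1;  // skip past the ';'
        continue;
      }
    }
    out += in[i++];
  }
  return out;
}
```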
#44
I don't know why, but sometimes the Wikipedia search doesn't work.

I changed the way the link is scraped after the search; please test:
filmweb.xml

Is there a way to show a custom label in the skin?

Something like this is what I need for testing:
ListItem.IMDbID or ListItem.FilmwebID

smuto
#45
@smuto: I think something has changed on the filmweb.pl website: descriptions and high-res posters can no longer be scraped. I looked inside the scraper, but it is too complicated for me.

Update: The scraper is OK! It was something else; now everything works perfectly. I was surprised because until now the scraper had either worked completely or not at all... Sorry!