Kodi Community Forum
ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work... - Printable Version



- xyber - 2009-07-09 18:03

Was just saying that your verify function might cause problems for your log function. But I did not check how you use it internally. I don't make calls to it so does not matter to me.

Great lib btw :) It has saved me a ton of work so far. I'll announce the media manager I am working on soon. Still deciding if I want to complete the TV episodes section first.


- Nicezia - 2009-07-09 22:24

Thanks. It'll be nice to see my work implemented in something other than my test programs.

I have the TV show stuff working now, but I need a little input from spiff about the GetEpisodeList function, as I don't think I understand it as well as I thought.

@spiff: are there values passed to buffers during GetEpisodeList?

I'm guessing it's the same as GetEpisodeDetails, because I see it (the tvdb scraper) looking for a value for cache, but I want to be sure...


- xyber - 2009-07-10 16:28

I noticed an "Application error (Rails)" page at http://www.themoviedb.org/movie today,
and that causes ((VideoScraper)scraper).GetDetails(...); to fail when I'm using IMDB with Fanart selected.

To a user this would look like the scraper failed to get info from IMDB, while really it was just the fanart setting he needed to turn off. I wonder if there is a way you could allow us to query your lib for more info on errors occurring inside it. Then, if I ask your lib why it did not return data from GetDetails, I could see that it failed while retrieving the list of fanart and at least prompt the user to turn fanart off in settings, or do that in code and run the query again... or we can just hope that kind of problem with themoviedb doesn't happen too often ;)


- Nicezia - 2009-07-10 19:55

Will edit it to log scraper return values.

However, what exactly is the error/failure? I haven't seen this error and can't fix it if I don't know the full details of it. It retrieves fanart just fine for me.


- smeehrrr - 2009-07-10 21:12

I had problems with fanart last night, the server was returning HTTP 500 errors. I don't believe it had anything to do with your library - it would have to either be a server side problem or a problem with the scraper itself. And it only happened on some titles. I haven't tried yet today to see if the same thing is happening.


- Nicezia - 2009-07-10 22:49

A question for users of this: would you rather get the urlencoded trailer link, or should I change it to a plain link when it's urlencoded?


- smeehrrr - 2009-07-10 23:33

Nicezia Wrote: A question for users of this: would you rather get the urlencoded trailer link, or should I change it to a plain link when it's urlencoded?
I don't understand this question.


- Nicezia - 2009-07-11 04:49

Well, some trailers come urlencoded (e.g. http%3A%2F%2Fwww.foo.com%2Furl.flv).

Would you rather I leave it like that before sending it to the final details, or would you like me to decode the urlencoding? (Seeing as how I'm creating a charset converter.)


- smeehrrr - 2009-07-11 07:06

It would make more sense for your API to return a decoded URL in the MovieTag object, I think. At any rate it should be consistent, so if some scrapers return an encoded url and others return a decoded one, you should pick one and normalize to that form always.
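
A minimal sketch of the "pick one form and normalize" idea, assuming the decoded form is the one chosen. `TrailerUrl` and `Normalize` are hypothetical names for illustration and are not part of ScraperXML; only `Uri.UnescapeDataString` is a real .NET call. The check for an encoded scheme separator is a simple heuristic, not a full URL validator.

```csharp
using System;

// Hypothetical helper, not part of ScraperXML.
public static class TrailerUrl
{
    // Returns the decoded form of a trailer link. Links that already
    // contain a literal "://" are assumed to be plain and are returned
    // unchanged, so escapes inside a normal URL are not decoded twice.
    public static string Normalize(string link)
    {
        if (string.IsNullOrEmpty(link))
            return link;

        // A fully urlencoded URL has "://" encoded as "%3A%2F%2F".
        bool looksEncoded =
            link.IndexOf("://", StringComparison.Ordinal) < 0 &&
            link.IndexOf("%3A%2F%2F", StringComparison.OrdinalIgnoreCase) >= 0;

        return looksEncoded ? Uri.UnescapeDataString(link) : link;
    }
}
```

Calling `TrailerUrl.Normalize("http%3A%2F%2Fwww.foo.com%2Furl.flv")` gives back the plain link, while a link that is already plain passes through untouched.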


- xyber - 2009-07-11 13:23

Nicezia Wrote: Will edit it to log scraper return values.

However, what exactly is the error/failure? I haven't seen this error and can't fix it if I don't know the full details of it. It retrieves fanart just fine for me.

It was http://www.themoviedb.org/ that was returning a server 500 error, as someone else mentioned. Not an error in your lib. What I was thinking, though, is that it would be nice if you could create a class one could access to query errors that occurred.

For example,
Code:
public string GetDetails(string strResultsEntity)
dropped to the catch
Code:
//Exception handling Web
catch (System.Net.WebException wex)
{
    if (LoggingEnabled)
    {
        IO.Log(ScraperLogFile, "WebException :" + wex.Message);
    }
    return null;
}
which caused GetDetails to return null. The problem, though, is that it actually got far enough to return the details it got from IMDB; it was only when it was looking for fanart URLs at themoviedb that it failed. So, if you were to save an error code somewhere my code could look at, I could see that it was fanart (or something else) that caused the failure and ask the user to turn off that setting if he doesn't mind going without fanart.

It was one of the XElement SecondPass = CustomFunctionParse(item2); calls that caused the exception. Can't remember if it was the first or second pass one.

In any case, now that I think about it, it might be hard to actually tell what went wrong, as it could be a query to a server for trailers, one for fanart, or another server for posters. So it won't be easy to know what to tell the user to turn off in the scraper settings.
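
One way the idea above could look, as a rough sketch only: record which step failed and against which URL instead of returning null for the whole call, so details already fetched survive a failure in an optional step. Every type and member name here (`ScrapeError`, `ScrapeResult`, `SafeStep.Run`) is hypothetical and does not exist in ScraperXML; only `WebException` is taken from the catch block quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical types for illustration; not part of ScraperXML.
public sealed class ScrapeError
{
    public string FunctionName { get; private set; }
    public string Url { get; private set; }
    public string Message { get; private set; }

    public ScrapeError(string functionName, string url, string message)
    {
        FunctionName = functionName;
        Url = url;
        Message = message;
    }
}

public sealed class ScrapeResult
{
    private readonly List<ScrapeError> errors = new List<ScrapeError>();

    // Details gathered so far (for example, what IMDB returned).
    public string Details { get; set; }

    public IList<ScrapeError> Errors { get { return errors; } }
    public bool HasErrors { get { return errors.Count > 0; } }
}

public static class SafeStep
{
    // Runs one optional step (fanart, trailers, posters). On a web failure
    // the error is recorded on the result and null is returned for that
    // step only, so the caller can keep what it already has.
    public static string Run(string functionName, string url,
                             Func<string> step, ScrapeResult result)
    {
        try
        {
            return step();
        }
        catch (WebException wex)
        {
            result.Errors.Add(new ScrapeError(functionName, url, wex.Message));
            return null;
        }
    }
}
```

With something like this, a calling app could inspect `result.Errors` after the call, see the function name and URL that failed, and decide whether to suggest turning off fanart, trailers, or posters.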


- xyber - 2009-07-11 13:25

Nicezia Wrote: Well, some trailers come urlencoded (e.g. http%3A%2F%2Fwww.foo.com%2Furl.flv).

Would you rather I leave it like that before sending it to the final details, or would you like me to decode the urlencoding? (Seeing as how I'm creating a charset converter.)

Don't really care, as long as I can use it to download the trailer via my app when I finally get around to coding that part. I guess it would be better to show a decoded version if an app were to show the URL to the user.


- xyber - 2009-07-11 16:28

Does the TV Show section of the lib work? I get very strange results or none at all using TvScraper.CreateSearch and TvScraper.GetDetails

this is TvScraper.GetDetails for Heroes on IMDB
Code:
<tvshow>
  <plot>tt0813715</plot>

and the rest of it is only episodeguide, thumbs, backdrops and actor tags.
I also just spotted this in scraper.log:
Code:
2009/07/11 - 16:37:59 : XML Exception: There are multiple root elements. Line 1, position 88. - unable to parse <url function="GetSeriesPremiered">http://akas.imdb.com/title/tt0813715/episodes</url><url function="GetSeriesPlot">http://akas.imdb.com/title/tt0813715/plotsummary</url><url cache="tt0813715-credits.html" function="GetSeriesCast">http://akas.imdb.com/title/tt0813715/</url><url cache="tt0813715-posters.html" function="GetIMPALink">http://akas.imdb.com/title/tt0813715/posters</url><url cache="tt0813715-posters.html" function="GetIMDBPoster">http://akas.imdb.com/title/tt0813715/posters</url><episodeguide><url>http://www.imdb.com/title/tt0813715/episodes</url></episodeguide>
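
The log line above shows the general cause of a "multiple root elements" error: the text being parsed is a sequence of sibling elements rather than a single XML document. A minimal sketch of one common workaround, wrapping the fragment in a temporary root before parsing, follows. `FragmentParser` and the `<root>` wrapper name are assumptions for illustration; this is not necessarily how ScraperXML handles it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of ScraperXML.
public static class FragmentParser
{
    // XElement.Parse requires exactly one root element, so a fragment such
    // as "<url>..</url><url>..</url>" is wrapped in a throwaway root and
    // its children are returned as separate elements.
    public static List<XElement> ParseFragment(string fragment)
    {
        XElement wrapper = XElement.Parse("<root>" + fragment + "</root>");
        return wrapper.Elements().ToList();
    }
}
```

Each returned element can then be handled on its own, for example by reading its `function` attribute with `element.Attribute("function")`.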



- ultrabrutal - 2009-07-11 19:12

When will we see some of these fixes hitting svn and a new build? (I don't have a C# compiler installed.) I still get crashes in GetDetails, and they are not solely because of fanart.


- Nicezia - 2009-07-12 02:56

Well, I'm in the process of testing a new release now. I suppose I'll be submitting it to svn somewhere between midweek and the weekend.

No, TV shows aren't working yet in the version that's in svn. However, I have them working fully in the version I'm testing, and I hope to upload it really soon.


- Nicezia - 2009-07-12 02:59

If anyone knows anything about ISO charsets and Windows code pages (for conversion purposes), I need someone to help with a charset conversion utility from non-Latin character sets to Unicode.