Kodi Community Forum
ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work... - Printable Version

+- Kodi Community Forum (http://forum.kodi.tv)
+-- Forum: Development (/forumdisplay.php?fid=32)
+--- Forum: Scraper Development (/forumdisplay.php?fid=60)
+--- Thread: ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work... (/showthread.php?tid=50055)

Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22


ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work... - Nicezia - 2009-04-30 20:44

Just to make sure I'm getting this right (don't blast me, I'm just trying to verify my work):

Code:
scraper $$(20)          // array of 20 string buffers
   |
   Function             // 9 string fields (the info is compiled to XML
   |                    // format and sent as a single string back to one
   |                    // of the 20 buffers)
   |
   Regular Expression   // sends the info back to the function's fields
      |
      Expression        // makes matches for each field and sends fields
                        // 1-9 back to the RegExp, each as an array

Am I understanding right? From the Expression we get a results-determined number of string arrays, up to 9 (var[?][8]), which the RegExp compresses into a single string (var[8]); that is sent to the function in one of 9 possible string variables (var = a single string) collected from each RegExp; the function then compresses these 9 fields into a single string which is sent to one of the twenty scraper buffers; and at the end of a function, clearbuffers (if set) clears the 9 function fields?
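In scraper-XML terms the chain being described looks roughly like this (purely a sketch: the function name, expression, and buffer numbers here are invented, but the element layout follows the XBMC scraper format):

Code:
<GetDetails dest="3">
    <RegExp input="$$1" output="&lt;details&gt;&lt;title&gt;\1&lt;/title&gt;&lt;/details&gt;" dest="3">
        <expression>&lt;h1&gt;(.*?)&lt;/h1&gt;</expression>
    </RegExp>
</GetDetails>

The <expression> captures up to nine backreferences (\1-\9), the RegExp's output template assembles them into a single string that lands in the buffer named by its dest (1-20), and the function hands back whatever ends up in the buffer named by its own dest.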


- Nicezia - 2009-05-01 05:27

Never mind, I just figured the whole thing out. It seems I was thinking about it in the wrong way, but I have it figured out now...

However, is the override option what tells the regexp to ignore culture specifics?


- spiff - 2009-05-02 01:37

i do not understand what you mean.


- Nicezia - 2009-05-02 22:16

I suppose it would be easier just to ask what the override option does, because I haven't got a clue. I'm guessing it has something to do with the regular expression engine, but I'm not quite clear on what it sets the engine to do.


- spiff - 2009-05-03 00:15

there is no override option?


- Nicezia - 2009-05-03 13:36

From the Scraper XML wiki:

Quote:conditional="<condition>": A condition that must resolve to TRUE for the particular RegExp to be run. Currently the only available condition is "override", which is set based on the Language Override setting in the scraper.

Can you point me to the code that handles this function in XBMC?
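For what it's worth, the attribute from that wiki quote sits directly on a RegExp element; a sketch (the expression and buffer numbers are invented):

Code:
<RegExp conditional="override" input="$$1" output="\1" dest="4">
    <expression>(.*)</expression>
</RegExp>

so a RegExp like the one above only runs when the scraper's Language Override setting is enabled.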


- spiff - 2009-05-03 13:54

aha.

those are scraper settings. they are the stuff returned from the <GetSettings> scraper function.

see PluginSettings.cpp (CBasicSettings) and ScraperSettings.cpp.

they are also used with the $INFO[settingname] construct
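As a sketch of how those pieces fit together (the setting label, id, and URL below are invented; only the element names and the $INFO[...] construct come from the scraper format): <GetSettings> returns a settings document, and each setting's value is then substituted wherever $INFO[settingname] appears:

Code:
<!-- the kind of document the <GetSettings> scraper function returns -->
<settings>
    <setting label="Language Override" type="bool" id="override" default="false"/>
</settings>

<!-- elsewhere in the scraper, the saved value is substituted in place -->
<url>http://example.com/api?lang=$INFO[language]</url>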


- Nicezia - 2009-05-03 14:21

Oh, ok, perhaps that's why it didn't make sense to me; I haven't tackled the whole custom function thing yet.


What's the protocol on get settings - Nicezia - 2009-05-05 00:58

BTW, I have it running most scrapers; I ported it to MonoDevelop and compiled it, and it works the same on both Mono and .NET. Currently it's console only. I separated it from the UI, because the UI was actually distracting me from coding the damn thing (I'd code a bit, and then slip over to the UI to consider how to integrate that code into it).

I had asked a question, but once I looked into the plugin settings it was answered for me Smile


- spiff - 2009-05-05 13:22

great! Smile

i would put a lot of effort into having your parser work as a library if i were you. it will make it a lot more useful; in particular i sincerely hope that stuff like MIP will pick it up.


- Gamester17 - 2009-05-05 17:10

If you open source this code/library and could pitch this concept in a better way to MediaPortal and MeediOS developers, then we could all use the same scrapers and share the development of them; check out:
http://forum.team-mediaportal.com/improvement-suggestions-46/suggestion-use-xbmcs-xml-scrapers-http-scraping-35312/
and:
http://www.meedios.com/forum/viewtopic.php?t=2238

Just the same, all other open source media managers that are .NET based could use this as well:
http://forum.xbmc.org/tags.php?tag=media+manager

Possibly making this scraper method (as well as XBMC's NFO format) the open standard for all open source media center and media management applications Cool


- Nicezia - 2009-05-06 06:14

spiff Wrote:great! Smile

i would put a lot of effort into having your parser work as a library if i were you. it will make it a lot more useful; in particular i sincerely hope that stuff like MIP will pick it up.



I think that's going to be the only way, seeing as how MonoDevelop has absolutely no support for creating a GUI with .NET, save for C#, which I don't know the slightest bit about.

As far as the console module goes, all I have to finish up at the moment is my handling of custom functions (which I'm working on right now), and then go back and add error handling and streamline the code.


- Nicezia - 2009-05-06 06:28

Gamester17 Wrote:If you open source this code/library and could pitch this concept in a better way to MediaPortal and MeediOS developers, then we could all use the same scrapers and share the development of them; check out:
http://forum.team-mediaportal.com/improvement-suggestions-46/suggestion-use-xbmcs-xml-scrapers-http-scraping-35312/
and:
http://www.meedios.com/forum/viewtopic.php?t=2238

Just the same, all other open source media managers that are .NET based could use this as well:
http://forum.xbmc.org/tags.php?tag=media+manager

Possibly making this scraper method (as well as XBMC's NFO format) the open standard for all open source media center and media management applications Cool

Not a bad idea at all; I hadn't even considered that. I was considering integrating this into a catalog manager I wrote that handles books, comics, movies, and TV shows. The only thing is that so far all info has to be entered manually (except for movies, which uses theMovieDB API).

But it would definitely be a good idea for everyone to have a unified and forward-thinking scraper library.

Since my end goal is making an editor for the scraper XMLs (and now that I have a better understanding of how these things work, that goal seems simpler than it did last week), maybe that would calm the people who say it's too difficult to program for... (and I actually disagree with that now, considering that I have had no formal programming training and have only been teaching myself for about 11 months).


- Nicezia - 2009-05-06 14:20

Ummm, I kinda coded it this way without thinking about it, but just to make sure... the Function's dest only tells XBMC where to look for the information that was gathered by its root RegExp element, right?

I was just looking back through my code checking for errors when I realized there was a skip in the chain between the RegExp and the function; I was using the Function's dest to pull the info to send, either to get a page or to pull results (based on which function is in play).

And the last thing I need to verify: while a custom function is running, does it use a separate buffer space from the main one, with a completely fresh set of buffers $$1-$$20?


- spiff - 2009-05-06 14:33

yes on the first one.

as for the second one: it depends on whether or not the calling function has clearbuffers="no" set. if it isn't set, the buffers are cleared after function execution; if it is set, they are not, and hence the next function is called with the previous buffer state (excluding the first buffer, which holds the data, of course).
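Sketched in scraper XML (the function name and expression are invented; clearbuffers is the attribute being described here):

Code:
<!-- because of clearbuffers="no", buffers $$2-$$20 keep their contents
     after this function returns, instead of being wiped -->
<GetExtraDetails clearbuffers="no" dest="3">
    <RegExp input="$$1" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
        <expression>(.*)</expression>
    </RegExp>
</GetExtraDetails>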