Kodi Community Forum

Full Version: HOW-TO write Media Info Scrapers - Scraper creation for dummies
Well, you can just google for the XML escape characters; it is general XML.
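For reference, a minimal Python sketch of what escaping for XML looks like in practice. It uses `xml.sax.saxutils.escape` from the standard library; the sample regular expression is made up for illustration and is not taken from any real scraper.

```python
from xml.sax.saxutils import escape

# Quote characters are not escaped by default, so pass them as extra entities.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def xml_escape(text: str) -> str:
    """Escape the five characters that have predefined XML entities."""
    # escape() always handles &, < and >; the dict adds the two quote characters.
    return escape(text, QUOTE_ENTITIES)


# Hypothetical expression of the kind you might paste into a scraper XML file.
pattern = '<a href="/title/(tt[0-9]+)/">([^<]*)</a>'
escaped = xml_escape(pattern)
print(escaped)
```

The printed string is the form that can sit safely inside an XML attribute or element.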
hi guys,

I'm sorry that I have to do this to all of you, but it was necessary, as the current state was just embarrassing.

please mind http://trac.xbmc.org/changeset/21882

I opted not to keep the old loading code, as it would never die otherwise. One clean, painful cut.
Nicezia Wrote: For the record, when I scrape using scrap.exe the URL comes out as 'Blah+blah+blah'; when I use XBMC, the log reports that it's scraping for 'Blah%20blah%20blah'.
Well, both of the URIs mean the same thing, as both %20 and the plus sign are evaluated to a space character...
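To make the two spellings concrete, here is a small Python sketch using the standard `urllib.parse` module. It only illustrates the encoding side; how scrap.exe or XBMC build their URLs internally is not shown here, and the title string is just the placeholder from the post above.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

title = "Blah blah blah"

# Generic percent-encoding: a space becomes %20.
percent_form = quote(title)
# Form-style (query string) encoding: a space becomes +.
plus_form = quote_plus(title)

print(percent_form)  # Blah%20blah%20blah
print(plus_form)     # Blah+blah+blah

# A form-style decoder maps both spellings back to the same text.
same_when_form_decoded = unquote_plus(percent_form) == unquote_plus(plus_form)

# A plain percent-decoder leaves the + untouched.
plain_decoded = unquote(plus_form)
print(same_when_form_decoded, plain_decoded)
```

Whether a given site treats the two forms alike depends on which kind of decoding its server applies.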

"Now something totally different..." Big Grin

Is there a chapter 3 planned, about scraping TV shows (TV series in particular)?
All the documentation deals with scraping movies, and I did not find any TV-show-related info.
UsagiYojimbo Wrote: Well, both of the URIs mean the same thing, as both %20 and the plus sign are evaluated to a space character...

No, some sites don't interpret + and %20 as meaning the same thing. As I found out when dealing with some scrapers I was writing, a + means to some sites that this exact word MUST exist as written, while %20 (space) allows for a fuzzy search.
Nicezia Wrote: No, some sites don't interpret + and %20 as meaning the same thing. As I found out when dealing with some scrapers I was writing, a + means to some sites that this exact word MUST exist as written, while %20 (space) allows for a fuzzy search.
Well, see for yourself: HTML URL Encoding @ W3Schools
On the other hand, some scripts do not handle it properly... However, the URL/URI is correct.

BTW, about what you mention, the plus sign having extra meaning: that is possible if the plus sign is encoded as %2B. If that happens, it means your URL/URI is being encoded more than once... Try removing the encoding steps until only one remains. (Well, in the case of a scraper, you could remove the encode attribute.)
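A rough Python sketch of the %2B point, again with the standard `urllib.parse`. The search term is hypothetical, and the single decode pass on the server is an assumption; real sites may behave differently.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical search term that really contains a plus sign.
term = "+blah"

encoded_once = quote_plus(term)           # the literal + becomes %2B
encoded_twice = quote_plus(encoded_once)  # the % itself becomes %25

print(encoded_once)   # %2Bblah
print(encoded_twice)  # %252Bblah

# Assumption: the server decodes exactly once.
seen_after_single_encoding = unquote_plus(encoded_once)
seen_after_double_encoding = unquote_plus(encoded_twice)

# A bare + in the query is decoded as a space, not as a plus sign.
seen_for_bare_plus = unquote_plus("Blah+blah")

print(seen_after_single_encoding, seen_after_double_encoding, seen_for_bare_plus)
```

So a search routine can only ever see a literal plus if the request carried %2B, and a doubly encoded request leaves the escape sequence itself in the search term.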
UsagiYojimbo Wrote: Well, see for yourself: HTML URL Encoding @ W3Schools
On the other hand, some scripts do not handle it properly... However, the URL/URI is correct.

BTW, about what you mention, the plus sign having extra meaning: that is possible if the plus sign is encoded as %2B. If that happens, it means your URL/URI is being encoded more than once... Try removing the encoding steps until only one remains. (Well, in the case of a scraper, you could remove the encode attribute.)

Um... sure, if it were a simple matter of encoding, yes, they would mean the same. But what we're dealing with is the site's search routine, not simply the encoding. Apparently the site I am working with interprets a + as MUST exist and not as a space. It's the server-side interpretation, not the W3C standard encoding, that matters. I understand what you're saying, but the interpretation of symbols is sometimes different in a website's search routine.
Nicezia Wrote: No, some sites don't interpret + and %20 as meaning the same thing

That's because they don't mean the same thing everywhere.

%20 means space in the HTTP standard, while + means space in the CGI standard. So, as part of a key=value pair after the ? in the URL, both %20 and + mean space. But before the ?, only %20 means space.
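A small Python sketch of that path-versus-query distinction, using `urllib.parse` from the standard library. The URL is a made-up example.com address, and the code only shows how Python's own parsers treat each part, not how any particular web server does.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical URL with + and %20 on both sides of the ?.
url = "http://example.com/some+dir/some%20file?q=Blah+blah&r=Blah%20blah"
parts = urlsplit(url)

# Path: plain percent-decoding, so + stays a literal plus sign.
decoded_path = unquote(parts.path)

# Query: parse_qs applies form-style decoding, so + and %20 both become a space.
decoded_query = parse_qs(parts.query)

print(decoded_path)   # /some+dir/some file
print(decoded_query)  # {'q': ['Blah blah'], 'r': ['Blah blah']}
```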
trondmm Wrote: %20 means space in the HTTP standard, while + means space in the CGI standard. So, as part of a key=value pair after the ? in the URL, both %20 and + mean space. But before the ?, only %20 means space.
That is true, but it does not apply here, as the problem occurs in a search query, thus after the ? mark...
Nicezia Wrote: Um... sure, if it were a simple matter of encoding, yes, they would mean the same. But what we're dealing with is the site's search routine, not simply the encoding. Apparently the site I am working with interprets a + as MUST exist and not as a space. It's the server-side interpretation, not the W3C standard encoding, that matters. I understand what you're saying, but the interpretation of symbols is sometimes different in a website's search routine.
Well, it must be encoding; otherwise the search engine of that site would not come across your + sign, just a space character...
Not sure if I'm in the right forum.

I have a website where I can watch live TV (makolive). Does anybody know how I can watch it on my ATV2?

Thanks -
Hi,

Did something change with Eden? I read something in different threads about unresolved dependencies.

Is there a short way, meaning without having the whole of XBMC installed, to test scrapers on Linux, like what is described for Windows (using scrap.exe and so on)?

Greetz

LastCoder
Can somebody please create the site scraper for this website?
http://www.sakitvs.com/indiantv.htm

Thanks
Is there a replacement for scrap.exe, as it seems to be retired? Anything that makes editing and testing the scraper along the way easier would be extremely helpful... I'm on Windows. Thanks in advance.
Hello


I'm trying to create my own scraper but I'm stuck at the part where I put the scraper in XBMC. Putting it in the "C:\Program Files (x86)\XBMC\system\scrapers\video" folder doesn't make it appear in XBMC. Even if I just copy the 'dummy.xml' from the guide and put it in that folder, it doesn't show up, so it isn't an issue with my own little scraper.

Any ideas here?
The link to your tool for testing scrapers, given on page 1, is dead. As it is perhaps the most useful tool for any would-be scraper developer, can you update it or provide an alternative?