HOW-TO write Media Info Scrapers - Scraper creation for dummies
#16
Well, you can just google for the XML escape characters; they are general XML.
#17
hi guys,

i'm sorry that i have to do this to all of you, but it was necessary, as the current state was just embarrassing.

please mind http://trac.xbmc.org/changeset/21882

i opted not to keep the old loading code, as it would otherwise never die. one clean, painful cut.
#18
Nicezia Wrote: For the record, when I scrape using scrap.exe the URL comes out as 'Blah+blah+blah'; when I use XBMC, the log reports that it is scraping for 'Blah%20blah%20blah'.
Well, both of the URIs mean the same, as both %20 and the plus sign are evaluated to a space character...
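
A minimal Python sketch of where the two spellings come from, assuming the standard library's `urllib.parse` as a stand-in for whatever encoder scrap.exe and XBMC actually use (the thread does not say). `quote` applies plain percent-encoding, while `quote_plus` applies form-style encoding:

```python
from urllib.parse import quote, quote_plus, unquote_plus

# Example search term taken from the post above.
title = "Blah blah blah"

percent_form = quote(title)    # plain percent-encoding: space -> %20
plus_form = quote_plus(title)  # form-style encoding: space -> +

print(percent_form)  # Blah%20blah%20blah
print(plus_form)     # Blah+blah+blah

# Decoded with form-style (query string) rules, both give back the same text.
assert unquote_plus(percent_form) == unquote_plus(plus_form) == title
```

Whether a given site's search routine then treats the two requests identically is a separate question from the encoding itself.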

"Now something totally different..."

Is there a chapter 3 planned about scraping TV shows (TV series in particular)?
All the documentation deals with scraping movies, and I did not find any TV-show related info.
#19
UsagiYojimbo Wrote: Well, both of the URIs mean the same, as both %20 and the plus sign are evaluated to a space character...

No, some sites don't interpret + and %20 as meaning the same thing. As I found out with some scrapers I was writing, a + tells some sites that this exact word MUST exist as written, while %20 (space) allows for a fuzzy search.
ScraperXML Open Source Web Scraper Library compatible with XBMC XML Scrapers


I Suck, and if you act now by sending only $19.95 and a self addressed stamped envelop, so can you!

#20
Nicezia Wrote: No, some sites don't interpret + and %20 as meaning the same thing. As I found out with some scrapers I was writing, a + tells some sites that this exact word MUST exist as written, while %20 (space) allows for a fuzzy search.
Well, see for yourself: HTML URL Encoding @ W3Schools
On the other hand, some scripts do not handle it properly... However, the URL/URI is correct.

BTW, about the plus sign having an extra meaning, as you mention: that is possible if the plus sign is encoded as %2B. If that happens, it means your URL/URI is being encoded more than once... Try removing the encodings until only one remains. (Well, in the case of a scraper, you could remove the encode attribute.)
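
A rough Python sketch of that double-encoding scenario, assuming the standard library's `urllib.parse`; the helper `seen_by_server` and the parameter name `q` are hypothetical and only model a server that decodes the query string once:

```python
from urllib.parse import quote_plus, parse_qs

term = "Blah blah"

once = quote_plus(term)   # 'Blah+blah'   (space -> +)
twice = quote_plus(once)  # 'Blah%2Bblah' (the + itself got encoded)

def seen_by_server(encoded_value):
    """Decode a query string once, as a typical form handler would (hypothetical)."""
    return parse_qs("q=" + encoded_value)["q"][0]

print(seen_by_server(once))   # Blah blah
print(seen_by_server(twice))  # Blah+blah
```

With a single encoding the search routine receives a space; only the doubly encoded value hands it a literal + that it could treat as an operator.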
#21
UsagiYojimbo Wrote: Well, see for yourself: HTML URL Encoding @ W3Schools
On the other hand, some scripts do not handle it properly... However, the URL/URI is correct.

BTW, about the plus sign having an extra meaning, as you mention: that is possible if the plus sign is encoded as %2B. If that happens, it means your URL/URI is being encoded more than once... Try removing the encodings until only one remains. (Well, in the case of a scraper, you could remove the encode attribute.)

Um... sure, if it were a simple matter of encoding, then yes, they would mean the same. But what we're dealing with is the site's search routine, not simply the encoding. Apparently the site I am working with interprets a + as MUST exist and not as a space. It's the server-side interpretation, not the W3 Consortium standard encoding, that matters here. I understand what you're saying, but the interpretation of symbols is sometimes different in a website's search routine.
#22
Nicezia Wrote: No, some sites don't interpret + and %20 as meaning the same thing

That's because they don't mean the same thing everywhere.

%20 means space in the HTTP standard, while + means space in the CGI standard. So, as part of a key=value pair after the ? in the URL, both %20 and + mean a space. But before the ?, only %20 means a space.
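
The path/query distinction can be sketched in Python, assuming the standard library's `urllib.parse`; the URL, host and the `title` parameter are made up for illustration. `unquote` applies plain percent-decoding (appropriate before the ?), while `parse_qs` applies form-style decoding (after the ?):

```python
from urllib.parse import urlsplit, unquote, parse_qs

# Hypothetical URL with a + and a %20 on each side of the ?.
url = "http://example.com/my+file%20name?title=Blah+blah%20blah"
parts = urlsplit(url)

# Before the ?: only %20 is turned into a space, the + stays a literal plus.
path = unquote(parts.path)

# After the ?: both + and %20 are turned into a space.
query = parse_qs(parts.query)

print(path)            # /my+file name
print(query["title"])  # ['Blah blah blah']
```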
#23
trondmm Wrote: %20 means space in the HTTP standard, while + means space in the CGI standard. So, as part of a key=value pair after the ? in the URL, both %20 and + mean a space. But before the ?, only %20 means a space.
That is true, but it does not apply here, as the problem occurs in a search query, thus after the ? mark...
#24
Nicezia Wrote: Um... sure, if it were a simple matter of encoding, then yes, they would mean the same. But what we're dealing with is the site's search routine, not simply the encoding. Apparently the site I am working with interprets a + as MUST exist and not as a space. It's the server-side interpretation, not the W3 Consortium standard encoding, that matters here. I understand what you're saying, but the interpretation of symbols is sometimes different in a website's search routine.
Well, it must be encoding; otherwise the search engine of that site would not come across your + sign, just a space character...
#25
Not sure if I'm in the right forum.

I have a website where I can watch live TV (makolive). Does anybody know how I can watch it on my ATV2?

Thanks -
#26
Hi,

Did something change with Eden? I read something in different threads about unresolved dependencies.

Is there a short way, meaning without having the whole of XBMC installed, to test scrapers on Linux like it's described for Windows (using scrap.exe and so on)?

Greetz

LastCoder
Ubuntu 14.04 LTS 64Bit Server, Xfce, KODI Isengard, Skin Arctic Zephyr, tvheadend tv backend
ASUS P8H61-M LE/USB3, Celeron G530, Geforce 210, 4 GB DDR3 RAM , 1 TB Samsung 2,5" HDD
iHOS104 BluRay Drive, WinTV-HVR-1200 & TT DVBS2-1600
Silverstone GD05B Case, Sony PS3 BD Remote control, Logitech Cordless Mediaboard Pro for PS3
#27
Can somebody please create the site scraper for this website?
http://www.sakitvs.com/indiantv.htm

Thanks
Raspberry Pi-B with Raspbmc
i5 iMac 21.5"
Macbook Pro
LG PA75U Projector with 113" DIY screen
iPhone 5S/Kodi
Denon AVR S500BT with Klipsch Reference series 5.1 setup
Amazon Fire TV with SPMC
Asus M004U (Coming Soon)

#28
Is there a replacement for scrap.exe, as it seems to be retired? Anything that makes editing and testing the scraper along the way easier would be extremely helpful... I'm on Windows. Thanks in advance.
#29
Hello


I'm trying to create my own scraper but I'm stuck at the part where I put the scraper in XBMC. Putting it in the "C:\Program Files (x86)\XBMC\system\scrapers\video" folder doesn't make it appear in XBMC. Even if I just copy the 'dummy.xml' from the guide and put it in that folder, it doesn't show up, so it isn't an issue with my own little scraper.

Any ideas here?
#30
The link to your tool for testing scrapers, given on page 1, is dead. As it is perhaps the most useful tool for any would-be scraper developer, can you update it or provide an alternative?