2009-04-22, 08:30
Well, you can just google for the XML escape characters; it is general XML.
Nicezia Wrote: for the record, when I scrape using scrap.exe the URL returns 'Blah+blah+blah'; when I use XBMC the log reports that it's scraping for 'Blah%20blah%20blah'
Well, both of the URIs mean the same, as both the %20 and the plus sign are evaluated to a space character...
UsagiYojimbo Wrote: Well, both of the URIs mean the same, as both the %20 and the plus sign are evaluated to a space character... ?
Nicezia Wrote: No, some sites don't interpret + and %20 as meaning the same thing... As I found out when dealing with some scrapers I was writing, a + will mean to some sites that this exact word MUST exist as written, while %20 (space) allows for fuzzy search.
Well, see for yourself: HTML URL Encoding @ W3Schools
UsagiYojimbo Wrote: Well, see for yourself: HTML URL Encoding @ W3Schools
On the other hand, some scripts do not handle it properly... However, the URL/URI is correct.
BTW, about what you mention, the plus sign having extra meaning: that is possible if the plus sign arrives encoded as %2B. If that happens, it means your URL/URI is being encoded more than once... Try removing the encoding steps until only one remains. (Well, in the case of a scraper, you could remove the encode attribute.)
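To illustrate the double-encoding case, here is a minimal sketch using Python's standard urllib.parse. The search term is made up, and this is not how XBMC or scrap.exe build their URLs; it only shows how a space can turn into a literal plus sign when a query value is encoded twice but decoded once.

```python
from urllib.parse import quote_plus, unquote_plus

term = "blah blah"          # hypothetical search term

once = quote_plus(term)     # the space becomes '+'
twice = quote_plus(once)    # the '+' itself is escaped to '%2B'

print(once)                 # blah+blah
print(twice)                # blah%2Bblah

# A server decodes the query value exactly once:
seen_once = unquote_plus(once)     # 'blah blah'  (a space, as intended)
seen_twice = unquote_plus(twice)   # 'blah+blah'  (a literal plus sign)
print(seen_once)
print(seen_twice)
```

If the site's search engine really receives a literal +, something like the second case is a plausible cause, and dropping one of the encoding steps should bring back the plain space.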
Nicezia Wrote: No, some sites don't interpret + and %20 as meaning the same thing
trondmm Wrote: %20 means space in the HTTP standard, while + means space in the CGI standard. So, as part of a key=value pair, after the ? in the URL, both %20 and + mean space. But before the ? only %20 will mean space.
That is true, but it does not apply here, as the problem occurs in a search query, thus after the ? mark...
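The path versus query distinction can be checked with a small sketch, again using Python's urllib.parse. The URL is invented for the example; the point is only that the same + is kept literally in the path but becomes a space when the query string is parsed as form data.

```python
from urllib.parse import urlsplit, unquote, parse_qs

# Hypothetical URL with '+' and '%20' on both sides of the '?'
url = "http://example.com/my+file%20name?q=blah+blah%20blah"
parts = urlsplit(url)

# Before the '?': only %20 is a space, '+' stays a plus sign
path = unquote(parts.path)

# After the '?': form-style decoding treats both '+' and %20 as a space
query = parse_qs(parts.query)

print(path)    # /my+file name
print(query)   # {'q': ['blah blah blah']}
```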
Nicezia Wrote: um... sure, if it was a simple matter of encoding, yes, they would mean the same, but what we're dealing with is the site's search routine and not simply the encoding... Apparently the site I am working with interprets a + as MUST exist and not as a space. It's server-side interpretation, not W3 Consortium standard encoding, that is the matter. I understand what you're saying, but the interpretation of symbols is sometimes different in a website's search routine.
Well, it must be encoding, otherwise the search engine of that site would not come across your + sign, just a space character...