2009-09-21, 23:05
Just to let you know: in ScraperXML I added a feature to the expression tag:
fixamp="1,2...."
This fixes stray ampersands in the listed match groups, the same way encode, noclean and trim work. It needs a new function in ScraperParser, FixAmp. I'm still not sure how this would be done in C++, but I think this is about right (I'm sure you'll be able to spot any errors I've made in it).
Code:
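// Escapes stray '&' characters in strXML while leaving existing entities alone.
void ScraperParser::FixAmp(CStdString &strXML)
{
  // Numeric character reference at the start of the text, e.g. "&#233;" or "&#xE9;".
  // Assumes XBMC's CRegExp class (utils/RegExp.h) and its RegComp/RegFind calls;
  // adjust if the regex engine's API differs.
  CRegExp reNumeric;
  reNumeric.RegComp("^&#([0-9]+|[xX][0-9a-fA-F]+);");

  CStdString strFixed = "";
  for (size_t i = 0; i < strXML.length(); i++)
  {
    if (strXML[i] == '&')
    {
      if (strXML.substr(i, 5) == "&amp;"
       || strXML.substr(i, 4) == "&lt;"
       || strXML.substr(i, 4) == "&gt;"
       || strXML.substr(i, 6) == "&apos;"
       || strXML.substr(i, 6) == "&quot;"
       || reNumeric.RegFind(strXML.c_str() + i) >= 0)
      {
        // Already the start of a valid entity: copy the '&' unchanged.
        strFixed += '&';
      }
      else
      {
        // Stray ampersand: escape it so the result stays parseable.
        strFixed += "&amp;";
      }
    }
    else
    {
      strFixed += strXML[i];
    }
  }
  strXML = strFixed;
}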
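The FixAmp proposal depends on XBMC types, so to sanity-check the logic outside XBMC, here is a standalone sketch of the same idea. It swaps in std::string and C++11 std::regex as stand-ins for CStdString and XBMC's regex engine, and the name FixStrayAmpersands is hypothetical, not anything in XBMC.

```cpp
#include <regex>
#include <string>

// Standalone sketch: std::string and C++11 std::regex stand in for
// CStdString and XBMC's regex engine. Returns a copy of `in` in which
// every '&' that does not begin a recognised entity becomes "&amp;".
std::string FixStrayAmpersands(const std::string &in)
{
  // Recognised entities: the five predefined XML entities plus decimal
  // ("&#233;") and hexadecimal ("&#xE9;") character references.
  static const std::regex entity(
      "&(amp|lt|gt|apos|quot|#[0-9]+|#[xX][0-9a-fA-F]+);");

  std::string out;
  out.reserve(in.size());
  for (std::string::size_type i = 0; i < in.size(); ++i)
  {
    if (in[i] != '&')
    {
      out += in[i];
      continue;
    }
    // match_continuous: the entity must start exactly at position i.
    bool isEntity = std::regex_search(in.begin() + i, in.end(), entity,
                                      std::regex_constants::match_continuous);
    out += isEntity ? "&" : "&amp;";
  }
  return out;
}
```

For example, "Tom & Jerry" becomes "Tom &amp; Jerry", while "Fish &amp; Chips" and "caf&#233;" are left alone. Named entities such as &eacute; get escaped by this sketch, since XML only predefines the five listed; adding an alternative like [a-zA-Z]+ to the pattern would keep them if the scraper output is expected to contain them.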
Also, the code in ScraperParser::ParseExpression would have to be modified to call this function. Since that's the one function in ScraperParser that confounds me, I can't actually show how it would be done, but it would follow the same model as noclean, trim and encode. Also, the regex test in the last condition of the if needs to use the XBMC regular expression engine so that ISO-8859 entities are taken into account.
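As a rough idea of that noclean/trim/encode model, here is a standalone sketch of how a comma-separated attribute such as fixamp="1,3" might be turned into a set of match-group numbers and applied to the regex captures. ParseGroupList and ApplyToGroups are hypothetical helpers; the real ParseExpression code is not shown here and may be structured quite differently.

```cpp
#include <cstdlib>
#include <functional>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse an attribute value like "1,3" into the set
// of match-group numbers it names. Non-numeric or non-positive entries
// are ignored.
std::set<int> ParseGroupList(const std::string &attr)
{
  std::set<int> groups;
  std::stringstream ss(attr);
  std::string item;
  while (std::getline(ss, item, ','))
  {
    int n = std::atoi(item.c_str());
    if (n > 0)
      groups.insert(n);
  }
  return groups;
}

// Hypothetical helper: apply `fix` to each captured group whose 1-based
// number appears in `groups`. captures[0] is assumed to be the whole
// match, so group n is captures[n]; out-of-range numbers are skipped.
void ApplyToGroups(std::vector<std::string> &captures,
                   const std::set<int> &groups,
                   const std::function<void(std::string &)> &fix)
{
  for (int n : groups)
  {
    if (n < static_cast<int>(captures.size()))
      fix(captures[n]);
  }
}
```

In ParseExpression the `fix` callback would presumably wrap FixAmp, and the captures would come from whatever the XBMC regex engine returns for the match; both of those are assumptions.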
Anyway, I did this because there seem to be a lot of problems with fields like biographies and descriptions: they contain ampersands that make the result unparsable, or force the scraper writer to play games with the output to fix it, when one simple function can handle it.