Hello everyone,
I have a problem with this scraper and non-ASCII characters, such as the accents used in French.
Basically, when there is an accent, it is replaced by the URL-encoded UTF-8 code of this character. For instance, é becomes %C3%A9.
Not every movie field is affected by this bug, and which fields are affected depends on the site used to fetch the data.
For example:
- With the scraper configured for IMDb, the main title is affected.
- With the scraper configured for themoviedb.org, the main title is not affected, but the original title and the description are.
But an example is worth a thousand words, so here is one:
The movie "La cité de la peur" becomes "La cit%C3%A9 de la peur".
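To illustrate what I think is happening, here is a small Python sketch. It assumes the scraper is percent-encoding the UTF-8 bytes of the title and never decoding them again (so é would show up as %C3%A9); that is my guess from the output, not something I have confirmed in the scraper code.

```python
from urllib.parse import quote, unquote

title = "La cité de la peur"

# Percent-encode the UTF-8 bytes of every non-ASCII character.
# Spaces are kept as-is (safe=" ") so only the accent is escaped.
encoded = quote(title, safe=" ")

# Decoding the escapes as UTF-8 restores the original title.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

If that assumption holds, the stored titles are recoverable, since decoding the escapes gives back the original text.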
You can find a complete debug log, captured while fetching this movie's data (with the script configured to use themoviedb.org), here:
http://pastebin.com/hFXQ3LSj
Any help would be really appreciated.
PS: My whole database is full of these badly encoded characters. I was thinking about writing a script to clean it up (read, replace, write on the SQLite db). Do you know if someone has made something remotely like that, so that I can build on it, or is it a totally bad idea?
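Here is a rough sketch of the kind of cleanup script I have in mind, in case it helps the discussion. The table name `movie` and column name `title` are placeholders I made up for the demo, not the real XBMC schema, and `fix_text` and `clean_column` are my own hypothetical helpers. It only decodes escapes for non-ASCII bytes (%80 to %FF) that form valid UTF-8, so something like a literal "%25" is left alone. I would obviously run it on a backup copy of the database first.

```python
import re
import sqlite3
from urllib.parse import unquote

# Runs of percent-escapes whose byte value is >= 0x80 (non-ASCII only).
_ESCAPED_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def fix_text(value):
    """Decode percent-encoded UTF-8 sequences, leaving everything else alone."""
    def decode(match):
        try:
            return unquote(match.group(0), encoding="utf-8", errors="strict")
        except UnicodeDecodeError:
            # Not valid UTF-8: keep the original text untouched.
            return match.group(0)
    return _ESCAPED_RUN.sub(decode, value)


def clean_column(conn, table, column):
    """Read, fix, and write back one text column; return the number of rows changed.

    table/column are interpolated into the SQL, so only pass trusted names.
    Assumes an ordinary rowid table.
    """
    rows = conn.execute(f'SELECT rowid, "{column}" FROM "{table}"').fetchall()
    changed = 0
    for rowid, value in rows:
        if not isinstance(value, str):
            continue
        fixed = fix_text(value)
        if fixed != value:
            conn.execute(
                f'UPDATE "{table}" SET "{column}" = ? WHERE rowid = ?',
                (fixed, rowid),
            )
            changed += 1
    conn.commit()
    return changed


# Demo on an in-memory database with a made-up schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE movie (title TEXT)")
demo.executemany(
    "INSERT INTO movie (title) VALUES (?)",
    [("La cit%C3%A9 de la peur",), ("Am%C3%A9lie",), ("Alien",)],
)
rows_changed = clean_column(demo, "movie", "title")
titles = [row[0] for row in demo.execute("SELECT title FROM movie ORDER BY rowid")]
print(rows_changed, titles)
```

Restricting the decoding to non-ASCII escapes is a deliberate choice: it should avoid mangling fields that legitimately contain percent signs, at the cost of not touching ASCII escapes such as %20 if those are also present.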
PS2: I noticed the same behaviour with the now-deprecated IMDb scraper (http://wiki.xbmc.org/index.php?title=Add-on:IMDb), so it might not be specific to this scraper and could require a global fix.
PS3: In case you ask: yes, it used to work perfectly, and no, I can't remember when it stopped working.