2013-12-11, 18:50
In the latest nightlies (for at least a week now), the themoviedb scraper isn't able to scrape any movie that has non-ASCII chars in the file/folder name. This wasn't an issue before.
Last week it was causing a "No connection" pop-up (the same one shown when the scraper can't connect to the server) and the library update was paused. This week the library update no longer pauses, but no results are found for those special files.
UPDATE: The issue seems to be Windows-specific, since Linux clients can update the library correctly from the same Samba share.
Reading other threads and trac issues (I haven't found anything relevant for this particular scraper), I think it might be related to the upgrade of the PCRE lib and the requirement to specify correct encoding information in XML files or web responses.
Here is a debug log of an xbmc run trying to update the library for a bunch of movies with non-ASCII chars in the folder/file name.
http://xbmclogs.com/show.php?id=96038
Apparently something somewhere is doing a lower-case conversion of the file name, and the conversion is incorrectly changing UTF-8 bytes into other, invalid ones.
Taking one example from that log:
Code:
16:45:56 T:1788 DEBUG: ADDON::CScraper::FindMovie: Searching for 'Astérix y Obélix Al servicio de su majestad' using The Movie Database scraper (path: 'C:\Users\jurrabi\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.4')
16:45:56 T:1788 DEBUG: scraper: CreateSearchUrl returned <url>http://api.themoviedb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=ast%e3%a9rix%20y%20ob%e3%a9lix%20al%20servicio%20de%20su%20majestad&year=2012&language=es</url>
16:45:56 T:1788 DEBUG: CurlFile::Open(08D0C5A8) http://api.themoviedb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=ast%e3%a9rix%20y%20ob%e3%a9lix%20al%20servicio%20de%20su%20majestad&year=2012&language=es
16:45:57 T:1788 DEBUG: scraper: GetSearchResults returned <results></results>
You can see that the "Astérix" part of the file name (the Spanish adapted title) has been changed to lower case in "query=ast%e3%a9rix".
But \xe3\xa9 is not the correct code for "é"; it should be \xc3\xa9.
You can see how the byte \xc3 (character Ã in the Latin charset) has been incorrectly converted to \xe3 (ã in Latin), causing the lack of results. If I manually change the API call to:
Code:
http://api.themoviedb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=ast%C3%A9rix%20y%20ob%C3%A9lix%20al%20servicio%20de%20su%20majestad&year=2012&language=es
I get the correct result:
Code:
{"page":1,"results":[{"adult":false,"backdrop_path":"/cvbqQyF4KIWQUViD2UwDCFe4hvq.jpg","id":99770,"original_title":"Astérix & Obélix - Au service de sa Majesté","release_date":"2012-10-17","poster_path":"/3tQslKo8oNG6mcDz4pF6YyjhDGq.jpg","popularity":3.608053125,"title":"Astérix y Obélix: Al servicio de su majestad","vote_average":5.9,"vote_count":20}],"total_pages":1,"total_results":1}
I'm gonna try and find where the issue lies.
And since a call with upper case A like:
Code:
http://api.themoviedb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=Ast%C3%A9rix%20y%20ob%C3%A9lix%20al%20servicio%20de%20su%20majestad&year=2012&language=es
works fine, I guess I'll try commenting out the part that incorrectly converts to lowercase.
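For anyone who wants to reproduce the byte change outside XBMC, here is a minimal Python sketch. It assumes (I haven't confirmed this in the XBMC source) that the UTF-8 bytes of the name are being lower-cased one byte at a time, as if each byte were a Latin-1 character. The function names are made up for illustration and are not part of XBMC or the scraper.

```python
from urllib.parse import quote


def lower_text_then_encode(title: str) -> str:
    """Lower-case the decoded text first, then UTF-8 encode and percent-encode."""
    return quote(title.lower().encode("utf-8"))


def lower_bytes_as_latin1(title: str) -> str:
    """Assumed bug: treat each UTF-8 byte as a Latin-1 character and lower-case it."""
    utf8_bytes = title.encode("utf-8")             # "é" becomes b"\xc3\xa9"
    as_latin1 = utf8_bytes.decode("latin-1")       # \xc3 is now read as "Ã"
    mangled = as_latin1.lower().encode("latin-1")  # "Ã" -> "ã", i.e. \xc3 -> \xe3
    return quote(mangled)


good = lower_text_then_encode("Astérix")
bad = lower_bytes_as_latin1("Astérix")
print(good)  # matches the query that returns a result
print(bad)   # matches the query seen in the debug log, apart from hex digit case
```

Note that quote() emits upper-case hex digits, so the second value differs from the logged URL only in the case of the hex digits.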