2014-08-02, 15:22
Hi! I'm using Universal Movie Scraper and recently I've encountered this problem:
Some movies have incomplete title or genre information.
The title ends with </title> and the genre ends with </genre>.
I've checked the log and found that data from TMDb in English is parsed correctly, while some data in Chinese is not.
Code:
15:40:03 T:4677595136 DEBUG: CurlFile::Open(0x112c2a950) http://api.tmdb.org/3/movie/tt0896872?api_key=***omitted***&language=zh
15:40:04 T:4677595136 DEBUG: Get: Using "UTF-8" charset for "http://api.tmdb.org/3/movie/tt0896872?api_key=***omitted***&language=zh"
15:40:04 T:4677595136 DEBUG: scraper: ParseTMDBTitle returned <details><title>告密?/title></details>
*** irrelevant log content omitted ***
15:40:06 T:4677595136 DEBUG: scraper: GetTMDBLangGenresByIdChain returned <details><url function="ParseTMDBGenres" cache="tmdb-zh-tt0896872.json">http://api.tmdb.org/3/movie/tt0896872?api_key=***omitted***&language=zh</url></details>
15:40:06 T:4677595136 DEBUG: scraper: ParseTMDBGenres returned <details><title>告密?/title></details>
I believe the question marks in the information returned by the scraper cause the problem: in each affected element a "?" appears where the "<" of the closing tag should be, so the tag is never closed properly.
I manually downloaded the JSON file via the same URL:
Code:
http://api.tmdb.org/3/movie/tt0896872?api_key=***omitted***&language=zh
The file is correctly encoded and can be successfully parsed by various JSON viewers.
So I assume there is a bug in how Universal Movie Scraper handles Chinese characters.
Please look into this issue.
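In case it helps with reproducing this, here is a minimal Python sketch of the check I described for the downloaded file: decode the bytes strictly as UTF-8, then parse the result as JSON. The function name `check_utf8_json` and the sample payload are my own placeholders (the sample reuses two genre names from the log above). They are not taken from the scraper or from the real TMDb response.

```python
import json


def check_utf8_json(raw: bytes) -> dict:
    """Decode raw bytes strictly as UTF-8 and parse the result as JSON.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8 and
    json.JSONDecodeError if the decoded text is not valid JSON.
    """
    text = raw.decode("utf-8", errors="strict")
    return json.loads(text)


# Hypothetical sample payload, NOT the real TMDb response.
sample = '{"genres": [{"name": "犯罪"}, {"name": "惊悚"}]}'.encode("utf-8")
data = check_utf8_json(sample)
print([g["name"] for g in data["genres"]])

# For comparison: bytes that end in the middle of a multi-byte character
# fail the strict decode instead of silently turning into "?".
broken = "惊悚".encode("utf-8")[:-1]
try:
    check_utf8_json(broken)
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

To test a real download, read the saved file in binary mode and pass the bytes to the function. I don't know whether a decoding step like this is where the scraper goes wrong. The sketch only shows how to rule out the file itself as the cause.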