cyrillic filenames in python scraper - Kodi Community Forum (https://forum.kodi.tv), Forum: Development, Scrapers, Thread: /showthread.php?tid=364911
cyrillic filenames in python scraper - alanhk - 2021-10-12

Hi! I am writing a Python scraper for Russian films. It's almost complete. One last problem: when I get the movie filename from Kodi, the movie title arrives in the wrong encoding, something like this:

    (Pdb) print(sys.argv[2])
    ?action=find&pathSettings=%7b%7d&title=%d0%9a%d0%be%d0%bd%d1%82%d0%b0%d0%ba%d1%82&year=2012
    (Pdb) print(sys.argv[2].encode())
    b'?action=find&pathSettings=%7b%7d&title=%d0%9a%d0%be%d0%bd%d1%82%d0%b0%d0%ba%d1%82&year=2012'

As a result I can't pass the correct search name to my scraper. Can anyone help me solve it?

RE: cyrillic filenames in python scraper - alanhk - 2021-10-13

Everything was solved this way:

    def decode_str(s: str):
        # Drop the '%' signs, then read the remaining hex digits as UTF-8 bytes.
        return bytes.fromhex(s.replace('%', '')).decode(encoding='utf-8')

Topic closed.

RE: cyrillic filenames in python scraper - Karellen - 2021-10-13

@alanhk I've moved you to another forum. Hopefully a developer will spot your post and comment. Maybe @Roman_V_M or @pkscout might know.

RE: cyrillic filenames in python scraper - pkscout - 2021-10-13

Sorry, no help here. Python string decoding still feels like something of a black art to me.

RE: cyrillic filenames in python scraper - Roman_V_M - 2021-10-14

The topic starter said that he/she resolved the problem with the code above, which is somewhat weird IMO. This is a simple URL encoding:
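A minimal sketch of decoding this kind of URL-encoded string with Python 3's standard `urllib.parse` module, assuming the query string arrives in `sys.argv[2]` exactly as shown in the first post. The names `raw` and `params` are illustrative only and are not taken from any post in this thread.

```python
from urllib.parse import parse_qsl, unquote

# The query string from the first post (what sys.argv[2] contained).
raw = "?action=find&pathSettings=%7b%7d&title=%d0%9a%d0%be%d0%bd%d1%82%d0%b0%d0%ba%d1%82&year=2012"

# Drop the leading "?" and split into percent-decoded key/value pairs.
# parse_qsl decodes the %XX escapes as UTF-8 by default.
params = dict(parse_qsl(raw.lstrip("?")))

print(params["title"])         # Контакт
print(params["pathSettings"])  # {}
print(params["year"])          # 2012

# A single value can also be decoded on its own with unquote().
title_only = unquote("%d0%9a%d0%be%d0%bd%d1%82%d0%b0%d0%ba%d1%82")
print(title_only)              # Контакт
```

Unlike stripping the `%` signs and parsing hex by hand, this also handles values that mix plain ASCII characters with escaped bytes, such as a title containing Latin letters or digits.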
@pkscout There is nothing complex about encoding/decoding once you understand the difference between abstract text (unicode in Python 2 and str in Python 3) and its binary representation (str in Python 2 and bytes in Python 3). A text encoding is just a set of rules for converting text to and from its binary representation. An encode error happens when the text encoding you are using cannot represent a specific character as a sequence of bytes, and a decode error happens when a text encoding cannot interpret some sequence of bytes. For example, ASCII (a fixed-length 1-byte encoding) includes only English characters plus some punctuation and control symbols. UTF-8 (a variable-length encoding using 1 to 4 bytes per character), on the other hand, can represent every character described in the Unicode standard.

rant mode on

When the Python core developers introduced the unicode type in Python 2.0 for storing abstract text, they IMO committed a cardinal sin by allowing unicode and str to be used in the same context through implicit decoding from ASCII, thus breaking Python's strong typing paradigm. Fortunately, this was finally fixed in Python 3.

rant mode off

BTW, text encodings were used long before computers were invented. For example, Morse code was a variable-length binary encoding for transmitting text over telegraph wires.

RE: cyrillic filenames in python scraper - alanhk - 2021-10-14

(2021-10-14, 10:33) Roman_V_M wrote: The topic starter said that he resolved his/her problem with the mentioned code that is somewhat weird IMO. This is a simple URL encoding:

Thank you very much!!!
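To make the encode and decode errors described by Roman_V_M concrete, here is a small Python 3 sketch. The sample word and the variable names are illustrative assumptions, not code from any post in the thread.

```python
# Abstract text (str in Python 3): the film title from the first post.
text = "Контакт"

# UTF-8 can represent these characters; each Cyrillic letter takes 2 bytes.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 7 14

# Encode error: ASCII has no byte sequence for Cyrillic letters.
try:
    text.encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Decode error: these bytes are outside the range ASCII can interpret.
try:
    utf8_bytes.decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the same encoding that produced the bytes restores the text.
round_trip = utf8_bytes.decode("utf-8")
print(encode_failed, decode_failed, round_trip == text)  # True True True
```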