2017-01-23, 22:14
OK, so it's "correct", but can it unzip files zipped by a Windows system?
update:
I copied the zip file created by Windows 7 over to an Android tablet and viewed it using the system app "File Manager". This one provides an option to set the text encoding. By default it selects UTF-8. In this encoding, apparently the character U+00B0 ( ° ) as encoded in the zip is invalid UTF-8 - the Android app shows it as U+FFFD ( � ).
So I switched the encoding to ISO-8859-1. Doing this I get the same results as seen in Kodi, namely it is displayed as U+00F8 ( ø ). I don't know how to look at a zip file using a hex editor to see what actually is set as the filename, but it appears that Windows does do something different from Android when it zips filenames.
Since Windows 7 itself and 7-Zip correctly unzip the filename, I assume that it's some sort of Windows VB or VC function or Win32 API that is invoked during zip/unzip and where the problem is introduced.
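One reading that fits the symptoms above, offered as an assumption and not verified against the actual archive: the degree sign is stored as the single byte 0xF8, which is its value in the DOS/OEM code page CP437. That byte is not valid UTF-8 by itself, and in ISO-8859-1 it maps to ø. A minimal sketch of that round trip:

```python
# Assumption: the archive stores U+00B0 as the single byte 0xF8,
# i.e. its CP437 (DOS/OEM code page) value. Not checked against the real zip.
raw = "\u00b0".encode("cp437")

# Interpret the same byte the way each viewer setting would.
as_latin1 = raw.decode("iso-8859-1")             # ISO-8859-1 setting
as_utf8 = raw.decode("utf-8", errors="replace")  # default UTF-8 setting
as_cp437 = raw.decode("cp437")                   # OEM code page

print(raw.hex(), repr(as_latin1), repr(as_utf8), repr(as_cp437))
```

If the hex dump of the real zip shows a different byte for the degree sign, this explanation does not apply.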
update2
Played around with some zip files in a hex editor. I see the filenames are uncompressed in these zips so I could look at what is happening. Results are strange/inconsistent (IMHO).
1. It appears the archiver built into Windows (File) Explorer can only handle filenames with characters in the 0x00-0xFF range. However, Windows doesn't encode these filenames using CP-1252, ISO-8859-1 or any other 8-bit encoding that I could recognize. 7-Zip produces identical filenames in the zip file.
2. Filenames in Windows with characters at 0x0100 and above in the Unicode BMP are not zipped by Windows (File) Explorer at all; instead it raises a popup stating that the filename characters couldn't be zipped. 7-Zip does zip these files and seems to use UTF-8 encoding, but this is only done correctly for characters 0x0100 and above. Characters in the range 0x80-0xFF are not properly encoded as UTF-8.
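For anyone who wants to repeat the hex editor check without a hex editor, here is a rough sketch that reads the first local file header of a zip and reports the raw filename bytes together with general-purpose flag bit 11, which the ZIP format uses to mark a name as UTF-8. The helper name `first_entry_name` is made up for illustration, it only looks at the first entry, and the demo archive is built by Python's `zipfile` in memory, not by Windows or 7-Zip.

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x0800  # general-purpose bit 11: name is stored as UTF-8


def first_entry_name(data: bytes):
    """Return (raw filename bytes, utf8_flag_set) for the first zip entry.

    Hypothetical helper: parses only the 30-byte local file header at offset 0.
    """
    header = struct.unpack("<4s5H3I2H", data[:30])
    if header[0] != b"PK\x03\x04":
        raise ValueError("no local file header at offset 0")
    flags, name_len = header[2], header[9]
    return data[30:30 + name_len], bool(flags & UTF8_FLAG)


# Demo archive created in memory; a real test would read the Windows-made zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("temp_\u00b0.txt", "x")

raw_name, is_utf8 = first_entry_name(buf.getvalue())
print(raw_name, is_utf8)
```

To check a real archive, pass `open("some.zip", "rb").read()` instead of the in-memory buffer. If the flag turns out to be set only on the 7-Zip archives containing characters at 0x0100 and above, that would fit the mixed results described above, but that is a guess until someone looks at the actual files.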
scott s.