XBMC, no German umlauts on locale de_DE.UTF-8 -ISO8859-15- filenames.
#16
Thanks, wsnipex

Could you please provide some information about your setup:

a) Can you please confirm that the encoding of the filenames in your /tmp directory is ISO8859?
E.g. provide the output of

% cd /tmp
% env - sh
% ls *.mkv | od -x

b) Which version of Linux, and which version of XBMC?

c) What is the <locale> setting in guisettings.xml?

Thanks!
Reply
#17
My PM is enabled and open. :)

All XBMC code is designed to process all strings as UTF-8 internally, converting from the input encoding and to the output encoding when necessary. All GUI texts are converted to wide strings; on Win32, input is converted from wchar_t.

XBMC expects to get UTF-8 from any POSIX system (FreeBSD, Darwin (OSX and iOS), Linux (including Android)).
The mount command has special options like "utf8", "iocharset", "codepage" and "locale" for FAT, NTFS, CD-ROMs and some other filesystems; try experimenting with them.

Most (if not all) recent Linux/Unix distributions use UTF-8 for filenames. XBMC is not the only application that expects only UTF-8 on a POSIX system.
I'd suggest you think about migrating to UTF-8, or try a workaround such as exporting your media library over NFS/SMB/CIFS/WebDAV/FTP and connecting to it from the OS or from XBMC. In that case you will have more options.

Can the PVR plugin show umlauts in locally stored files? Did you check those files from the OS?
I think they are stored in UTF-8 (as they were created by XBMC).
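
One way to check whether a given filename already is UTF-8 is to test its raw bytes for well-formedness. The following is only a minimal sketch; the function name isValidUtf8 is made up for illustration and is not part of XBMC.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an XBMC function): returns true if the raw bytes
// of `name` form well-formed UTF-8. Overlong forms, UTF-16 surrogates and
// code points above U+10FFFF are rejected.
bool isValidUtf8(const std::string& name)
{
  // Smallest code point that legitimately needs 0, 1, 2 or 3 continuation bytes.
  static const unsigned long minCodePoint[] = {0x0, 0x80, 0x800, 0x10000};

  const std::size_t n = name.size();
  std::size_t i = 0;
  while (i < n)
  {
    const unsigned char lead = static_cast<unsigned char>(name[i]);
    std::size_t extra = 0;  // number of continuation bytes expected
    unsigned long cp = 0;   // decoded code point

    if (lead < 0x80)                { extra = 0; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
    else return false;              // stray continuation byte or invalid lead byte

    if (extra > n - i - 1)
      return false;                 // sequence is cut off at the end of the string

    for (std::size_t k = 1; k <= extra; ++k)
    {
      const unsigned char cont = static_cast<unsigned char>(name[i + k]);
      if ((cont & 0xC0) != 0x80)
        return false;               // not a continuation byte
      cp = (cp << 6) | (cont & 0x3F);
    }

    if (cp < minCodePoint[extra]) return false;      // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
    if (cp > 0x10FFFF) return false;                 // beyond Unicode

    i += extra + 1;
  }
  return true;
}
```

A name such as "Märchen.mkv" stored as ISO8859-15 contains the single byte 0xE4 for the umlaut, which this check rejects, while the UTF-8 form (0xC3 0xA4) passes.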
Reply
#18
(2014-01-18, 11:36)Karlson2k Wrote: My PM is enabled and open. :)

I'll try again, but I gave it a couple of tries after uNiversal told me to, and the forum software said your name is not known... *sigh*

Quote:All XBMC code is designed to process all strings as UTF-8 internally, converting from input and to output encoding when necessary. All GUI texts are converted to widestrings, on Win32 input is converted from wchar_t.
...more good explanation...

Thanks for the explanation. That's what I thought I had figured out after reading a bit of the code, but for the life of me I could not find any of what you wrote in any XBMC user documentation. That, IMHO, is the first thing that needs to be fixed.

Secondly, I think XBMC could easily alert the user that it will not display local filenames correctly as soon as it encounters a non-UTF-8 charset setting in the system, e.g. via the usual system settings (LC_*, LANG, LANGUAGE, ...).

This POSIX selection of charsets is highly confusing to me (even outside of XBMC ;-). I always thought that something like de_DE or en_US indicates ISO8859 in POSIX, and that de_DE.UTF8 is needed to have UTF-8 characters in the local file system. Therefore it is highly confusing when the XBMC source drops the ending after de_DE from the charset.

It's also confusing to see choices like 'Western Europe (ISO)' or 'Western Europe (Windows)' listed in the character set selection of the internationalization menu. What the heck are those doing?

I have seen some conversions to UTF-8 in the filesystem interface, e.g. I think for FTP. Why is there no such conversion for the local filesystem? As long as all file/directory access for local files/directories in XBMC goes through that module, it shouldn't be difficult to add it there, right? Ideally of course on a per-mountpoint basis, but alas nobody seems to do that in Linux, so there is no good standard around to store the charset encoding of a Linux filesystem (other than on Mac, if I remember correctly).

Quote:XBMC is expecting to get UTF-8 from any POSIX system (FreeBSD, Darwin (OSX and iOS), Linux (including Android)). mount command has special options like "utf8", "iocharset", "codepage", "locale" for FAT, NTFS, cdroms and some other, try to play with it.
Yep, I was checking that out, and stupidly enough, this does not exist for the preferred local Linux filesystems like ext2/3/4. It also does not exist for -rebinding *sigh*.

Quote:PVR plugin can show umlauts in locally stored files? Did you check those files from OS?
I think they are stored in UTF-8 (as they were created by XBMC).

I've got VDR and XBMC running on the same system via the VNSI plugin. Works fine with the umlauts there. Pretty sure that the VDR plugin side converts to UTF-8, if not basic VDR itself. So, given how the OS itself doesn't do a good job converting to UTF-8 for local filesystems, it would really be lovely if XBMC could do it as well as other apps/plugins ;-))

But better documentation of the existing functionality would definitely be more important. And some alert when encountering a non-UTF-8 Linux system when accessing local files with non-ASCII (beyond 7-bit) characters would be lovely too. It took me quite a while to get to the point of this email, so it would be good if others didn't have to run through this exercise.
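
The alert idea can be sketched as a small parser for the codeset part of a POSIX locale name (language[_territory][.codeset][@modifier]). This is only an illustration: localeCodeset and codesetIsUtf8 are hypothetical names, not XBMC functions, and a missing codeset does not prove that the filenames on disk are not UTF-8, so the result is a hint at best.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: extracts the codeset from a locale name such as
// "de_DE.UTF-8" or "de_DE.ISO-8859-15@euro". Returns "" if none is given.
std::string localeCodeset(const std::string& locale)
{
  const std::string::size_type dot = locale.find('.');
  if (dot == std::string::npos)
    return std::string();              // e.g. "de_DE" or "de_DE@euro"

  const std::string::size_type at = locale.find('@', dot);
  if (at == std::string::npos)
    return locale.substr(dot + 1);     // everything after the dot
  return locale.substr(dot + 1, at - dot - 1);  // stop before "@modifier"
}

// Hypothetical helper: true for spellings like "UTF-8", "utf8" or "Utf_8".
bool codesetIsUtf8(const std::string& codeset)
{
  std::string normalized;
  for (char ch : codeset)
  {
    if (ch == '-' || ch == '_')
      continue;                        // ignore separators
    normalized += static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
  }
  return normalized == "UTF8";
}
```

A caller could read LC_ALL, LC_CTYPE and LANG in that order, pass the first non-empty value to localeCodeset, and only warn when a codeset is present and is not UTF-8.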
Reply
#19
(2014-01-18, 13:22)te36 Wrote:
(2014-01-18, 11:36)Karlson2k Wrote: My PM is enabled and open. :)
I'll try again, but I gave it a couple of tries after uNiversal told me to, and the forum software said your name is not known... *sigh*
Press "Members" at the top of the page and search for the user. Once the user is found, press "Send private message".
This search is case-insensitive. The address in a PM is case-sensitive.

Or just press the "PM" button on any post by the user.
(2014-01-18, 13:22)te36 Wrote: Thanks for the explanation. That's what I thought I had figured out after reading a bit of the code, but for the life of me I could not find any of what you wrote in any XBMC user documentation. That, IMHO, is the first thing that needs to be fixed.
XBMC is an open source project, driven by enthusiasts. You can add this information to the Wiki if you feel that it is important.

(2014-01-18, 13:22)te36 Wrote: Secondly, I think XBMC could easily alert the user that it will not display local filenames correctly as soon as it encounters a non-UTF-8 charset setting in the system, e.g. via the usual system settings (LC_*, LANG, LANGUAGE, ...).
Not all systems with UTF-8 filenames use the ".UTF-8" suffix in the locale setting, so this could lead to many false alarms. Moreover, I think in most cases they would be false alarms.
(2014-01-18, 13:22)te36 Wrote: This POSIX selection of charsets is highly confusing to me (even outside of XBMC ;-). I always thought that something like de_DE or en_US indicates ISO8859 in POSIX, and that de_DE.UTF8 is needed to have UTF-8 characters in the local file system. Therefore it is highly confusing when the XBMC source drops the ending after de_DE from the charset.
Linux uses filenames as is, without any conversion. Locale information is only for applications. The OS does nothing with it.

(2014-01-18, 13:22)te36 Wrote: It's also confusing to see choices like 'Western Europe (ISO)' or 'Western Europe (Windows)' listed in the character set selection of the internationalization menu. What the heck are those doing?
This is defined in the language file. It is used as the fallback encoding for subtitles, for parsing HTML from web sites, etc.

(2014-01-18, 13:22)te36 Wrote: I have seen some conversions to UTF-8 in the filesystem interface, e.g. I think for FTP. Why is there no such conversion for the local filesystem? As long as all file/directory access for local files/directories in XBMC goes through that module, it shouldn't be difficult to add it there, right? Ideally of course on a per-mountpoint basis, but alas nobody seems to do that in Linux, so there is no good standard around to store the charset encoding of a Linux filesystem (other than on Mac, if I remember correctly).
It seems that there is very little or no demand for it (everybody uses UTF-8), so nobody wants to implement it.
And it's really difficult: even after years, it's still not ideal on win32 (where conversion is forced).
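
Because the kernel hands back filename bytes unchanged, the same visible name is a different byte sequence under each encoding. A minimal sketch of the conversion an application would have to do itself, written by hand for ISO-8859-15 so it needs no library (latin9ToUtf8 and appendUtf8 are hypothetical names, not XBMC code):

```cpp
#include <string>

// Appends code point `cp` (assumed to be below U+10000) to `out` as UTF-8.
void appendUtf8(std::string& out, unsigned cp)
{
  if (cp < 0x80)
  {
    out += static_cast<char>(cp);
  }
  else if (cp < 0x800)
  {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else
  {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: converts ISO-8859-15 (Latin-9) bytes to UTF-8.
// Latin-9 matches Unicode U+0000..U+00FF except for the eight positions below.
std::string latin9ToUtf8(const std::string& in)
{
  std::string out;
  for (char ch : in)
  {
    unsigned cp = static_cast<unsigned char>(ch);
    switch (cp)
    {
      case 0xA4: cp = 0x20AC; break;  // EURO SIGN
      case 0xA6: cp = 0x0160; break;  // S WITH CARON (capital)
      case 0xA8: cp = 0x0161; break;  // s with caron (small)
      case 0xB4: cp = 0x017D; break;  // Z WITH CARON (capital)
      case 0xB8: cp = 0x017E; break;  // z with caron (small)
      case 0xBC: cp = 0x0152; break;  // LIGATURE OE (capital)
      case 0xBD: cp = 0x0153; break;  // ligature oe (small)
      case 0xBE: cp = 0x0178; break;  // Y WITH DIAERESIS (capital)
      default: break;                 // identical to Latin-1 / Unicode
    }
    appendUtf8(out, cp);
  }
  return out;
}
```

For example, the umlaut in "Märchen" is the single byte 0xE4 in ISO-8859-15 and becomes the two bytes 0xC3 0xA4 in UTF-8.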
Reply
#20
Thanks Karlsson.

I just went through HDFile.cpp and HDDirectory.cpp and added code to convert between SystemCharset and UTF8. Piece of cake to find the right places in the code BECAUSE THOSE ARE ALL THE PLACES WHERE A SIMILAR CONVERSION IS DONE FOR WINDOWS ;-))).

Still wasn't doing anything. Turns out that XBMC is calling iconv and sets SystemCharset to "" in CharsetConverter.cpp:CConverterType::ResolveSpecialCharset, and any conversion from/to "" doesn't seem to do anything. So it seems to me that any CharsetConverter call from/to System was so far a NOP. But there didn't seem to be much use of SystemCharset in the source anyhow. Are you aware of any actual functionality of SystemCharset on any system (maybe non-Linux, non-Windows)?

I think the right solution would be to have a GUI config option to set the System Charset, but I didn't want to hack that into the code, so I just take SystemCharset from the environment for now.

On my Linux system, with the diff below, nothing changes unless you set the environment variable XBMC_SYSTEM_CHARSET to e.g. ISO-8859-15 before starting XBMC, and with that my German umlauts work fine now.

Any chance to get a diff like this committed? Given how it seems to be a NOP unless you explicitly set the environment variable, I'd hope the risk is fairly low.

Cheers
-----------
diff -cr ../../../../my-portage/media-tv/xbmc-9999/work/xbmc/filesystem/HDDirectory.cpp ./xbmc/filesystem/HDDirectory.cpp
*** ../../../../my-portage/media-tv/xbmc-9999/work/xbmc/filesystem/HDDirectory.cpp 2014-01-19 18:54:22.000000000 +0100
--- ./xbmc/filesystem/HDDirectory.cpp 2014-01-20 22:06:08.315295000 +0100
***************
*** 20,25 ****
--- 20,26 ----

#include "HDDirectory.h"
#include "Util.h"
+ #include "utils/CharsetConverter.h"
#include "iso9660.h"
#include "URL.h"
#include "FileItem.h"
***************
*** 81,86 ****
--- 82,88 ----
strSearchMask += L"*.*";
#else
CStdString strSearchMask(strRoot);
+ g_charsetConverter.utf8ToSystem(strSearchMask);
#endif

FILETIME localTime;
***************
*** 100,106 ****
#ifdef TARGET_WINDOWS
g_charsetConverter.wToUTF8(wfd.cFileName,strLabel, true);
#else
! strLabel = wfd.cFileName;
#endif
if ( (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) )
{
--- 102,109 ----
#ifdef TARGET_WINDOWS
g_charsetConverter.wToUTF8(wfd.cFileName,strLabel, true);
#else
! // strLabel = wfd.cFileName;
! g_charsetConverter.systemToUtf8(wfd.cFileName, strLabel);
#endif
if ( (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) )
{
***************
*** 152,157 ****
--- 155,161 ----
return Exists(strPath); // A drive - we can't "create" a drive
if(::CreateDirectoryW(CWIN32Util::ConvertPathToWin32Form(strPath1).c_str(), NULL))
#else
+ g_charsetConverter.utf8ToSystem(strPath1);
if(::CreateDirectory(strPath1.c_str(), NULL))
#endif
return true;
***************
*** 163,185 ****

bool CHDDirectory::Remove(const char* strPath)
{
if (!strPath || !*strPath)
return false;
#ifdef TARGET_WINDOWS
return (::RemoveDirectoryW(CWIN32Util::ConvertPathToWin32Form(strPath).c_str()) || GetLastError() == ERROR_PATH_NOT_FOUND) ? true : false;
#else
! return ::RemoveDirectory(strPath) ? true : false;
#endif
}

bool CHDDirectory::Exists(const char* strPath)
{
if (!strPath || !*strPath)
return false;
#ifdef TARGET_WINDOWS
DWORD attributes = GetFileAttributesW(CWIN32Util::ConvertPathToWin32Form(strPath).c_str());
#else
! DWORD attributes = GetFileAttributes(strPath);
#endif
if(attributes == INVALID_FILE_ATTRIBUTES)
return false;
--- 167,195 ----

bool CHDDirectory::Remove(const char* strPath)
{
+ CStdString strPath1 = strPath;
+
if (!strPath || !*strPath)
return false;
#ifdef TARGET_WINDOWS
return (::RemoveDirectoryW(CWIN32Util::ConvertPathToWin32Form(strPath).c_str()) || GetLastError() == ERROR_PATH_NOT_FOUND) ? true : false;
#else
! g_charsetConverter.utf8ToSystem(strPath1);
! return ::RemoveDirectory(strPath1) ? true : false;
#endif
}

bool CHDDirectory::Exists(const char* strPath)
{
+ CStdString strPath1 = strPath;
+
if (!strPath || !*strPath)
return false;
#ifdef TARGET_WINDOWS
DWORD attributes = GetFileAttributesW(CWIN32Util::ConvertPathToWin32Form(strPath).c_str());
#else
! g_charsetConverter.utf8ToSystem(strPath1);
! DWORD attributes = GetFileAttributes(strPath1);
#endif
if(attributes == INVALID_FILE_ATTRIBUTES)
return false;
diff -cr ../../../../my-portage/media-tv/xbmc-9999/work/xbmc/filesystem/HDFile.cpp ./xbmc/filesystem/HDFile.cpp
*** ../../../../my-portage/media-tv/xbmc-9999/work/xbmc/filesystem/HDFile.cpp 2014-01-19 18:54:22.000000000 +0100
--- ./xbmc/filesystem/HDFile.cpp 2014-01-20 23:59:40.923295000 +0100
***************
*** 23,28 ****
--- 23,29 ----
#include "system.h"
#include "HDFile.h"
#include "Util.h"
+ #include "utils/CharsetConverter.h"
#include "URL.h"
#include "utils/AliasShortcutUtils.h"
#ifdef TARGET_POSIX
***************
*** 94,99 ****
--- 95,101 ----
#ifdef TARGET_WINDOWS
m_hFile.attach(CreateFileW(CWIN32Util::ConvertPathToWin32Form(strFile).c_str(), GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL));
#else
+ g_charsetConverter.utf8ToSystem(strFile);
m_hFile.attach(CreateFile(strFile.c_str(), GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL));
#endif
if (!m_hFile.isValid()) return false;
***************
*** 117,122 ****
--- 119,125 ----
return true;
#else
struct __stat64 buffer;
+ g_charsetConverter.utf8ToSystem(strFile);
return (_stat64(strFile.c_str(), &buffer)==0);
#endif
}
***************
*** 164,169 ****
--- 167,173 ----
strWFile.pop_back();
return _wstat64(strWFile.c_str(), buffer);
#else
+ g_charsetConverter.utf8ToSystem(strFile);
return _stat64(strFile.c_str(), buffer);
#endif
}
***************
*** 187,192 ****
--- 191,197 ----
#ifdef TARGET_WINDOWS
m_hFile.attach(CreateFileW(CWIN32Util::ConvertPathToWin32Form(strPath).c_str(), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, bOverWrite ? CREATE_ALWAYS : OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL));
#else
+ g_charsetConverter.utf8ToSystem(strPath);
m_hFile.attach(CreateFile(strPath.c_str(), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, bOverWrite ? CREATE_ALWAYS : OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL));
#endif
if (!m_hFile.isValid())
***************
*** 310,315 ****
--- 315,321 ----
#ifdef TARGET_WINDOWS
return ::DeleteFileW(CWIN32Util::ConvertPathToWin32Form(strFile).c_str()) ? true : false;
#else
+ g_charsetConverter.utf8ToSystem(strFile);
return ::DeleteFile(strFile.c_str()) ? true : false;
#endif
}
***************
*** 322,327 ****
--- 328,335 ----
#ifdef TARGET_WINDOWS
return ::MoveFileW(CWIN32Util::ConvertPathToWin32Form(strFile).c_str(), CWIN32Util::ConvertPathToWin32Form(strNewFile).c_str()) ? true : false;
#else
+ g_charsetConverter.utf8ToSystem(strFile);
+ g_charsetConverter.utf8ToSystem(strNewFile);
return ::MoveFile(strFile.c_str(), strNewFile.c_str()) ? true : false;
#endif
}
diff -cr ../../../../my-portage/media-tv/xbmc-9999/work/xbmc/utils/CharsetConverter.cpp ./xbmc/utils/CharsetConverter.cpp
*** ../../../../my-portage/media-tv/xbmc-9999/work/xbmc/utils/CharsetConverter.cpp 2014-01-19 18:54:22.000000000 +0100
--- ./xbmc/utils/CharsetConverter.cpp 2014-01-21 00:00:50.203540000 +0100
***************
*** 244,250 ****
switch (charset)
{
case SystemCharset:
! return "";
case UserCharset:
return g_langInfo.GetGuiCharSet();
case SubtitleCharset:
--- 244,254 ----
switch (charset)
{
case SystemCharset:
! {
! CStdString locale;
! locale = getenv("XBMC_SYSTEM_CHARSET");
! return locale ? locale : "";
! }
case UserCharset:
return g_langInfo.GetGuiCharSet();
case SubtitleCharset:
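
For readers who want to try the environment-variable lookup outside of XBMC, here is a minimal sketch in plain standard C++. It assumes std::string rather than XBMC's CStdString, and charsetFromEnv is a hypothetical name. With std::string, assigning a null pointer returned by getenv is undefined behavior, so the sketch keeps the raw pointer and tests it first.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the charset named by environment variable
// `var`, or "" when the variable is not set at all.
std::string charsetFromEnv(const char* var)
{
  // getenv() returns a null pointer for an unset variable; check it before
  // constructing a std::string from it.
  const char* value = std::getenv(var);
  return value ? std::string(value) : std::string();
}
```

Starting the program with XBMC_SYSTEM_CHARSET=ISO-8859-15 in the environment would make charsetFromEnv("XBMC_SYSTEM_CHARSET") return "ISO-8859-15"; without the variable it returns the empty string.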
Reply
#21
(2014-01-21, 01:39)te36 Wrote: I just went through HDFile.cpp and HDDirectory.cpp and added code to convert between SystemCharset and UTF8. Piece of cake to find the right places in the code BECAUSE THOSE ARE ALL THE PLACES WHERE A SIMILAR CONVERSION IS DONE FOR WINDOWS ;-))).
It's not the only place where conversion is required. A lot of code uses the filesystem directly, without XBMC's VFS. There is some POSIX-specific code with filesystem access (without a Win32 counterpart).
Moreover, you'll need to convert environment variables to UTF-8 in this case. And take care of the environment variables for Python; some of them may not need such conversion.
I'm sure that other aspects must be covered too.
(2014-01-21, 01:39)te36 Wrote: Still wasn't doing anything. Turns out that XBMC is calling iconv and sets SystemCharset to "" in CharsetConverter.cpp:CConverterType::ResolveSpecialCharset, and any conversion from/to "" doesn't seem to do anything. So it seems to me that any CharsetConverter call from/to System was so far a NOP. But there didn't seem to be much use of SystemCharset in the source anyhow. Are you aware of any actual functionality of SystemCharset on any system (maybe non-Linux, non-Windows)?
It's very simple: the empty charset equals the "char" charset (at least for GNU libiconv, which is used by XBMC). The "char" charset means "current system charset". Check the libiconv documentation.
(2014-01-21, 01:39)te36 Wrote: I think the right solution would be to have a GUI config option to set the System Charset, but I didn't want to hack that into the code, so I just take SystemCharset from the environment for now.
It should already depend on the environment LANG/LC_ variables. See the code in libiconv sources/libcharset/lib/localcharset.c
(2014-01-21, 01:39)te36 Wrote: On my Linux system, with the diff below, nothing changes unless you set the environment variable XBMC_SYSTEM_CHARSET to e.g. ISO-8859-15 before starting XBMC, and with that my German umlauts work fine now.

Any chance to get a diff like this committed? Given how it seems to be a NOP unless you explicitly set the environment variable, I'd hope the risk is fairly low.
No chance for Gotham, as we are in the Feature Freeze stage.
Using a custom environment variable as the charset source is fine for tests, but I don't think it will be accepted for master. For such a feature you need to implement the GUI part, or it will not be accessible to end users.
I'd suggest you read the Wiki page about submitting patches: HOW-TO_submit_a_patch (wiki)
Reply
#22
(2014-01-21, 10:27)Karlson2k Wrote: It's not the only place where conversion is required. A lot of code uses the filesystem directly, without XBMC's VFS. There is some POSIX-specific code with filesystem access (without a Win32 counterpart).
Any example? Do you think there are other places where GUI-visible filenames are affected? In other words: if, for example, XBMC has its internal library somewhere on disk and uses filenames there derived from actual media filenames, then I think it's not necessary to convert those library filenames. They can happily be on the disk in UTF-8. After all, you told me that the charset semantics of filenames on disk are up to the application.
Quote:Moreover, you'll need to convert environment variables to UTF-8 in this case. And take care of the environment variables for Python; some of them may not need such conversion.
I'm sure that other aspects must be covered too.
Not quite persuaded. As I said, some example would be good. I think one only needs to support locale character sets for user-visible media filenames on local disk. Ideally, as I said, this should even be a property of just the particular XBMC local disk mountpoint, but I'm not sure yet how to do that.

Quote:It's very simple: the empty charset equals the "char" charset (at least for GNU libiconv, which is used by XBMC). The "char" charset means "current system charset". Check the libiconv documentation.
You'd think. Indeed, if I call "iconv -f '' -t 'UTF-8' <file>" it does work. It's just that XBMC didn't do anything in this case *sigh*.

Quote:It should already depend on the environment LANG/LC_ variables. See the code in libiconv sources/libcharset/lib/localcharset.c

Not really. iconv(3) is in libc, and that directory there looks totally different. But both the iconv(1) program and XBMC use the same library, and XBMC doesn't work with "". Go figure *sigh*.

Quote:No chance for Gotham, as we are in the Feature Freeze stage.
Sure, I wasn't thinking of Gotham.
Quote:Using a custom environment variable as the charset source is fine for tests, but I don't think it will be accepted for master. For such a feature you need to implement the GUI part, or it will not be accessible to end users.
Definitely. If it's just adding a GUI parameter, that wouldn't be too bad, but if one has to go through skin code, that would suck. And given that you expect "" to do something, it's probably better not to rely on SystemCharset as the parameter, so that there really is no risk of it doing something unexpected.

Quote:I'd suggest you read the Wiki page about submitting patches: HOW-TO_submit_a_patch (wiki)

Sure. Probably better to first justify it by finding other folks who'd like to have this; otherwise I can always run it as a local patch. Is there a feature request tracking mechanism?

Thanks
Reply
#23
(2014-01-21, 12:08)te36 Wrote: Any example? Do you think there are other places where GUI-visible filenames are affected? In other words: if, for example, XBMC has its internal library somewhere on disk and uses filenames there derived from actual media filenames, then I think it's not necessary to convert those library filenames. They can happily be on the disk in UTF-8. After all, you told me that the charset semantics of filenames on disk are up to the application.
String processing expects UTF-8.
You can't have some strings in ISO encoding and others in UTF-8. This will produce a lot of errors for sure. It doesn't matter whether the names are visible in the GUI or not.
(2014-01-21, 12:08)te36 Wrote: Not quite persuaded. As I said, some example would be good. I think one only needs to support locale character sets for user-visible media filenames on local disk. Ideally, as I said, this should even be a property of just the particular XBMC local disk mountpoint, but I'm not sure yet how to do that.
The simplest example is the XBMC_HOME variable. Similarly, the PYTHONPATH and PYTHONHOME variables.
Don't ask me for more, I'm mostly a win32 developer. ;)
(2014-01-21, 12:08)te36 Wrote: You'd think. Indeed, if I call "iconv -f '' -t 'UTF-8' <file>" it does work. It's just that XBMC didn't do anything in this case *sigh*.

Not really. iconv(3) is in libc, and that directory there looks totally different. But both the iconv(1) program and XBMC use the same library, and XBMC doesn't work with "". Go figure *sigh*.
XBMC uses GNU libiconv, not libc iconv.
You can build the depends with the included CMake files.

(2014-01-21, 12:08)te36 Wrote: Sure. Probably better to first justify it by finding other folks who'd like to have this; otherwise I can always run it as a local patch. Is there a feature request tracking mechanism?

Thanks
The forum itself is the request tracking. ;)
Seriously, the forum is the right place for feature requests. You can see how popular any particular request is.
Reply
#24
(2014-01-21, 19:29)Karlson2k Wrote: String processing expects UTF-8.
You can't have some strings in ISO encoding and others in UTF-8. This will produce a lot of errors for sure. It doesn't matter whether the names are visible in the GUI or not.
Sure, all strings in XBMC are UTF-8. I/O modules convert that UTF-8 to some external format if necessary. Some filesystem modules like Rar etc. were already doing this. With my patch, the HD filesystem also does this.

The only confusion that could happen is if one piece of software in XBMC accesses a file on disk via the HDFile/HDDirectory interface, another piece of software in XBMC accesses the same file via a different interface, and both expect to see the same file/filename. Then there will be confusion in XBMC. I think it would be a bug if two interfaces were used to access the same files.

If some interface other than HDFile/HDDirectory is used to access other files, e.g. internal databases, that's IMHO fine and will not lead to confusion.

Quote:The simplest example is the XBMC_HOME variable. Similarly, the PYTHONPATH and PYTHONHOME variables.
Don't ask me for more, I'm mostly a win32 developer. ;)

Well, if those pathnames in XBMC use '/' as the separator and UTF-8, then XBMC needs to convert them from Windows file notation too, right? And in that case it should be easy to find those places and make sure the same conversion happens on POSIX systems.

I just think that there is a fairly clear distinction between media filenames and software pathnames. XBMC_HOME and PYTHONHOME belong to the software pathnames, so if the software expects those pathnames on disk to be UTF-8, I think that's an acceptable limitation.

Quote:XBMC uses GNU libiconv, not libc iconv.
You can build the depends with the included CMake files.

Yeah. Not sure if I want to figure it out. I am just thinking that the safest way to introduce the patch is to change it such that a SystemCharset of "" causes no charset conversion to be invoked at all, and the conversion routines are only called when you explicitly set the SystemCharset. This would not change the behavior as I see it, but if there really are systems where XBMC with "" does something useful in iconv, then those systems would still need to configure the SystemCharset explicitly in the GUI.
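
The "no conversion unless explicitly configured" idea could look roughly like the sketch below. It is not the XBMC CharsetConverter code: utf8ToCharset is a hypothetical name, and the sketch assumes an iconv implementation whose iconv() takes a char** input pointer (as in glibc) and that knows the requested charset name.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 text to `systemCharset`.
// An empty charset means "not configured": the input is returned untouched
// and iconv is never called, so default behavior stays the same.
std::string utf8ToCharset(const std::string& utf8, const std::string& systemCharset)
{
  if (systemCharset.empty())
    return utf8;

  iconv_t cd = iconv_open(systemCharset.c_str(), "UTF-8");
  if (cd == (iconv_t)-1)
    throw std::runtime_error("unsupported charset: " + systemCharset);

  std::string out;
  char buffer[256];
  char* in = const_cast<char*>(utf8.data());  // iconv does not modify the input bytes
  std::size_t inLeft = utf8.size();

  while (inLeft > 0)
  {
    char* dst = buffer;
    std::size_t dstLeft = sizeof(buffer);
    const std::size_t rc = iconv(cd, &in, &inLeft, &dst, &dstLeft);
    const int err = errno;                    // save before anything can change it
    out.append(buffer, sizeof(buffer) - dstLeft);

    // E2BIG only means the output buffer was full; loop to convert the rest.
    if (rc == (std::size_t)-1 && err != E2BIG)
    {
      iconv_close(cd);
      throw std::runtime_error("cannot convert text to " + systemCharset);
    }
  }

  // Stateful target encodings would additionally need a final flush call;
  // single-byte charsets such as ISO-8859-15 do not.
  iconv_close(cd);
  return out;
}
```

With an empty charset the function is a strict pass-through, which matches the goal of keeping the risk low for everyone who never sets the option.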

Quote:The forum itself is the request tracking. ;)
Seriously, the forum is the right place for feature requests. You can see how popular any particular request is.

Go you. Yeah, I think that non-US characters are primarily an issue for folks who may not be all that well tuned to English forums, but I admit, I'm probably a dying breed sticking to the ISO8859 encoding of filenames ;-))

Thanks for all your answers.
Reply
#25
(2014-01-21, 20:06)te36 Wrote:
(2014-01-21, 19:29)Karlson2k Wrote: String processing expects UTF-8.
You can't have some strings in ISO encoding and others in UTF-8. This will produce a lot of errors for sure. It doesn't matter whether the names are visible in the GUI or not.
Sure, all strings in XBMC are UTF-8. I/O modules convert that UTF-8 to some external format if necessary. Some filesystem modules like Rar etc. were already doing this. With my patch, the HD filesystem also does this.

The only confusion that could happen is if one piece of software in XBMC accesses a file on disk via the HDFile/HDDirectory interface, another piece of software in XBMC accesses the same file via a different interface, and both expect to see the same file/filename. Then there will be confusion in XBMC. I think it would be a bug if two interfaces were used to access the same files.

If some interface other than HDFile/HDDirectory is used to access other files, e.g. internal databases, that's IMHO fine and will not lead to confusion.
The same piece of code can be used to display a GUI picture and a photo for the library. Just a simple example.
All XBMC code must use UTF-8 for strings.
All file access should go through the VFS, but for historical reasons, lack of volunteers and willful optimization, not every file read/write/check goes through the VFS.

If you wish to implement support for non-UTF-8 filenames, you'll need to implement it properly, for every filesystem access. Otherwise it will introduce new bugs that are hard to detect. I don't think that a halfway solution will be accepted.

(2014-01-21, 20:06)te36 Wrote:
Quote:The simplest example is the XBMC_HOME variable. Similarly, the PYTHONPATH and PYTHONHOME variables.
Don't ask me for more, I'm mostly a win32 developer. ;)
Well, if those pathnames in XBMC use '/' as the separator and UTF-8, then XBMC needs to convert them from Windows file notation too, right? And in that case it should be easy to find those places and make sure the same conversion happens on POSIX systems.
Not exactly. As I said, there is POSIX-only code, so you'll need to check it manually.
(2014-01-21, 20:06)te36 Wrote:
Quote:XBMC uses GNU libiconv, not libc iconv.
You can build the depends with the included CMake files.
Yeah. Not sure if I want to figure it out. I am just thinking that the safest way to introduce the patch is to change it such that a SystemCharset of "" causes no charset conversion to be invoked at all, and the conversion routines are only called when you explicitly set the SystemCharset. This would not change the behavior as I see it, but if there really are systems where XBMC with "" does something useful in iconv, then those systems would still need to configure the SystemCharset explicitly in the GUI.
Just try replacing "" with "char" in CharsetConverter.
And remember to set the correct GUI language and region, as XBMC resets the locale according to the GUI settings.
(2014-01-21, 20:06)te36 Wrote:
Quote:The forum itself is the request tracking. ;)
Seriously, the forum is the right place for feature requests. You can see how popular any particular request is.
Go you. Yeah, I think that non-US characters are primarily an issue for folks who may not be all that well tuned to English forums, but I admit, I'm probably a dying breed sticking to the ISO8859 encoding of filenames ;-))
I haven't seen a single request for non-UTF-8 filenames on the Russian XBMC forum.
Isn't it simpler to migrate to UTF-8 and get rid of all this headache? ;)
Reply
