Umlaut parsing problem (Tag reader bug?)
#1
Sorry for crossposting this, but after posting it in the scraper thread yesterday I've started to think that perhaps this isn't a scraper issue after all but a tag reader issue:

I'm having a curious problem with umlauts. Certain umlauts are not parsed correctly while others are, and in some cases the very same umlaut is parsed correctly in one place and not in another.

Code:
01:09:58 T:4684   DEBUG: ADDON::CScraper::FindAlbum: Searching for 'Motörhead - March �r Die' using Universal Album Scraper scraper (path: 'D:\Static\_HTPC\XBMC_SVN\portable_data\addons\metadata.album.universal', content: 'albums', version: '1.3.3')
01:09:58 T:4684   DEBUG: scraper: CreateAlbumSearchUrl returned <url>http://search.musicbrainz.org/ws/2/release/?fmt=xml&query=release:March%20%f6r%20Die%20AND%20artist:Mot%c3%b6rhead</url>
01:09:58 T:4684   DEBUG: CurlFile::Open(08068978) http://search.musicbrainz.org/ws/2/release/?fmt=xml&query=release:March%20%f6r%20Die%20AND%20artist:Mot%c3%b6rhead

'Motörhead - March ör Die' is corrupted into 'Motörhead - March �r Die'. Notice how the umlaut 'ö' in the band name is correct, but the same umlaut gets corrupted in the album name. The incorrect strings get stored in the database. I'm using a nightly build and I know there have been some changes to the tag readers in the last few days so maybe this is related to those changes?
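
The search URL in the log already hints at what is going on: the artist is percent-encoded as `%c3%b6` while the album has a bare `%f6`. A minimal Python sketch that decodes the two query fragments copied from the log line above (the variable names are only illustrative):

```python
from urllib.parse import unquote_to_bytes

# Query fragments copied from the CreateAlbumSearchUrl log line.
release = unquote_to_bytes("March%20%f6r%20Die")
artist = unquote_to_bytes("Mot%c3%b6rhead")

# The artist bytes are valid UTF-8 and decode cleanly.
artist_text = artist.decode("utf-8")

# The album bytes contain a lone 0xF6, which is not valid UTF-8,
# so a UTF-8 decoder substitutes U+FFFD (the black diamond).
release_as_utf8 = release.decode("utf-8", errors="replace")

# Read as Latin-1 instead, the same byte is a plain 'ö'.
release_as_latin1 = release.decode("latin-1")

print(artist_text)
print(release_as_utf8)
print(release_as_latin1)
```

So the album string appears to be carried as Latin-1 bytes while the artist string is carried as UTF-8, which would explain why the same umlaut behaves differently in the two fields.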
#2
It seems the problem is limited to FLAC files with FLAC tags. Converting the above file to .mp3 made the umlauts parse correctly.
#3
See the notes here:

https://github.com/xbmc/xbmc/pull/1122

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


#4
Just a user, but my impression is that a black diamond with a question mark typically points to a Unicode encoding problem. My understanding is that for MP3 tags, ID3v2.3 allows either ISO-8859-1 ("ANSI") or UTF-16 text, while ID3v2.4 also allows UTF-8. I believe each text frame carries an encoding byte that says which one is in use, at least in v2.3. For FLAC Vorbis comments I believe UTF-8 encoding is mandatory.

I would probably look at those tags with a hex editor, or copy and paste them into something like BabelPad, and verify that the proper Unicode encoding has been used in your tags.

I guess jmarshall's link gives some insight into how XBMC is doing the UTF reading.

I don't know German, but I'm guessing the character you want is U+00F6, known as LATIN SMALL LETTER O WITH DIAERESIS in Unicode-speak.

I'm pretty sure that in an MP3 file the bytes for that character would look like F6 00 (UTF-16 LE), but in the FLAC they would be C3 B6 (UTF-8).
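
For anyone who wants to check those byte values before opening a hex editor, here is a small Python sketch that prints the byte sequences for U+00F6 in the encodings mentioned. The `hex_bytes` helper is just a formatting convenience made up for this example:

```python
import unicodedata


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


ch = "\u00f6"  # the character in question

name = unicodedata.name(ch)
utf16_le = hex_bytes(ch.encode("utf-16-le"))
utf8 = hex_bytes(ch.encode("utf-8"))
latin1 = hex_bytes(ch.encode("latin-1"))

print(name)
print("UTF-16 LE:", utf16_le)
print("UTF-8:    ", utf8)
print("Latin-1:  ", latin1)
```

If the tag bytes seen in the hex editor match the UTF-8 sequence in the FLAC file, the tag itself is fine and the problem lies in how it is read.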

scott s.
#5
My link gives the proposed fix. There's nothing wrong with the tags. XBMC is specifically requesting some tags (basically any that aren't in a list) as Latin1 when they should be requested as UTF-8.
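
To illustrate the kind of mismatch described here, a rough Python sketch follows. It is not XBMC's code: `export_tag` is a hypothetical stand-in for asking a tag library for an 8-bit string in a given encoding, and it assumes the rest of the application treats every such string as UTF-8.

```python
def export_tag(value: str, encoding: str) -> bytes:
    """Hypothetical stand-in: hand a tag value out as 8-bit bytes."""
    return value.encode(encoding)


album = "March \u00f6r Die"  # the tag as stored, correctly, in the file

# Requesting Latin-1 yields a single 0xF6 byte for the umlaut.
as_latin1 = export_tag(album, "latin-1")
# Requesting UTF-8 yields the two-byte sequence 0xC3 0xB6.
as_utf8 = export_tag(album, "utf-8")

# Assumption: downstream code interprets all strings as UTF-8.
broken = as_latin1.decode("utf-8", errors="replace")
fixed = as_utf8.decode("utf-8")

print(broken)
print(fixed)
```

Under those assumptions the Latin-1 request produces the replacement character seen in the log, while the UTF-8 request round-trips cleanly.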

I'd fix it myself if I didn't have a bunch of other stuff to do - needless to say, it'll be fixed as soon as night199uk has a spare few minutes.
#6
(2012-10-02, 03:32)jmarshall Wrote: My link gives the proposed fix. There's nothing wrong with the tags. XBMC is specifically requesting some tags (any that aren't in a list basically) as Latin1 when they should be being requested as utf8.

I'd fix it myself if I didn't have a bunch of other stuff to do - needless to say, it'll be fixed as soon as night199uk has a spare few minutes.

Got it. (Amazing turnaround time!)

scott s.
#7
night199uk expected to have a fix Saturday morning. I hope he hasn't forgotten, because it's still broken.
#8
I believe he's moving house. It's certainly not forgotten :)
#9
Guys let that man live...
AppleTV4/iPhone/iPod/iPad: HowTo find debug logs and everything else which the devs like so much: click here
HowTo setup NFS for Kodi: NFS (wiki)
HowTo configure avahi (zeroconf): Avahi_Zeroconf (wiki)
READ THE IOS FAQ!: iOS FAQ (wiki)
#10
For how long? :)
#11
https://github.com/xbmc/xbmc/commit/0272...87f45b5789
#12
Much appreciated, my music library has been a mess for a few weeks.
#13
Does that fix cover music files only? I get a similar problem with current nightlies when rescanning movies into the library from a previous Eden export. In some cases umlauts work; in others they don't and show up as foreign symbols, whether in the movie title or the plot.

I guess it has to do with the NFO files containing some ASCII art from the "scene", but that was no problem when importing in Eden.