State of the translation

alanwww1 Offline
Team-Kodi Member
Posts: 1,364
Joined: Nov 2008
Reputation: 33
Location: Hungary
Post: #46
viljoviitanen Wrote:And I'd be willing to try to go the "native transifex" way, i.e. using the api directly. Actually I'm just gonna try it, cos I like tinkering with apis :)

Please let me know your results with the GUI. I had some glitches with both versions I tried. I am sure we need to tweak the installation and settings, but the documentation is not very detailed, I guess because of their paid support service :-) I am sure it can be set up correctly, as transifex.net, the paid service, is running version 1.2.1 with no problems: https://www.transifex.net/projects/p/100...l/hu/view/

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
viljoviitanen Offline
Member
Posts: 57
Joined: Apr 2010
Reputation: 1
Location: Jyväskylä, Finland
Post: #47
I too have some problems with the installation, there are some missing files. Also I have other problems with it.

I'll email you the details of the installation... let's continue in private for a while.
(This post was last modified: 2012-01-30 00:03 by viljoviitanen.)
viljoviitanen Offline
Member
Posts: 57
Joined: Apr 2010
Reputation: 1
Location: Jyväskylä, Finland
Post: #48
Ok, I've been working with the "hosted" transifex at

https://www.transifex.net/projects/p/mytest55/

It's now looking quite good, I could add all translations with the api but I only did a few just to show what's possible.

The translation strings in that test project are not from current git, but from some old version I had on my hard disk.

The tool to convert XBMC strings.xml to "transifex compliant" xliff is at https://github.com/viljoviitanen/Simple-...texliff.py

The "catch" was to realise that transifex internally uses a hash of the source string and the "context". To see how that works, search e.g. for "programs" in any translation.
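To illustrate why the context matters, here is a minimal sketch of keying a string by a hash of its source text plus context. The function name `string_key`, the ":" separator and the choice of MD5 are assumptions made for illustration; the actual hashing inside transifex may differ.

```python
import hashlib


def string_key(source, context=""):
    """Return a hex digest identifying a (source, context) pair.

    Assumption: joining with ':' and hashing with MD5 is illustrative only,
    not necessarily what transifex does internally.
    """
    joined = ":".join([source, context])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# The same source string with the same context always gives the same key...
key_a = string_key("Programs", "id: 0")
key_b = string_key("Programs", "id: 0")
# ...while the same source string under a different context gives a new key,
# so it is treated as a separate entry.
key_other = string_key("Programs", "id: 350")
```

With a scheme like this, two strings.xml entries that share the same English text stay distinct as long as their ids are part of the context.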

Updating/uploading a translation can easily be done with curl; here I uploaded the Swedish core translation:

Code:
curl -F file=@core-se.xlf -i -L --user username:password -X PUT https://www.transifex.net/api/2/project/mytest55/resource/core_3/translation/sv/
This could be bound to a github webhook that checks whether a translation has been updated there and pushes the change immediately to transifex. For this we'll need a mapping of all possible xbmc languages to transifex language codes, e.g. Swedish->sv.

Also, a change in transifex can be detected via a webhook and then the updated file could be pushed immediately to git (we'll need a reverse of the previous map).
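A rough sketch of the two-way language map and of building the upload URL used in the curl example. Only Swedish->sv comes from this thread; the other entries, the names `XBMC_TO_TRANSIFEX`, `TRANSIFEX_TO_XBMC` and `translation_url` are made up for illustration, and the real list would need to cover every xbmc language directory.

```python
# Hypothetical mapping from XBMC language names to transifex language codes.
# Only "Swedish" -> "sv" is taken from the thread; the rest are assumed
# to follow the usual two-letter ISO 639-1 codes.
XBMC_TO_TRANSIFEX = {
    "Swedish": "sv",
    "Finnish": "fi",
    "Hungarian": "hu",
}

# Reverse map, needed when a change detected in transifex is pushed back to git.
TRANSIFEX_TO_XBMC = {code: name for name, code in XBMC_TO_TRANSIFEX.items()}

API_BASE = "https://www.transifex.net/api/2"


def translation_url(project, resource, xbmc_language):
    """Build the translation endpoint URL for one xbmc language."""
    code = XBMC_TO_TRANSIFEX[xbmc_language]
    return f"{API_BASE}/project/{project}/resource/{resource}/translation/{code}/"


url = translation_url("mytest55", "core_3", "Swedish")
```

A webhook handler could look up the language of the changed file in the map and PUT the converted xliff to the resulting URL, the same way the curl command does.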

I still don't know how the translation memory thing in transifex works... and there should also be support for a "glossary" where xbmc specific terms can be collected from the core translation to benefit addon translations.

http://help.transifex.net/intro/translat...ion-memory
takoi Offline
Team-Kodi Member
Posts: 788
Joined: Oct 2009
Reputation: 12
Location: Norway
Post: #49
I think you need to "own" the translations. Then you can choose to share, but only among the projects you own. It seems very limited..

(go to your dashboard->translation memory and you can see it there)
(This post was last modified: 2012-01-31 01:26 by takoi.)
queeup Offline
Posting Freak
Posts: 763
Joined: Feb 2009
Reputation: 16
Post: #50
Very promising news. Good work.

I wonder whether any of the XBMC developers have tried contacting the Transifex developers. Maybe they can offer help, write some code for us, or even offer XBMC free project space? Good things happen between one open source community and another :)
(This post was last modified: 2012-01-31 02:17 by queeup.)
viljoviitanen Offline
Member
Posts: 57
Joined: Apr 2010
Reputation: 1
Location: Jyväskylä, Finland
Post: #51
It seems transifex is already free for open source projects (with public translations). Also, I think my current idea of doing updates both ways with xliff is "good enough" too, so no need for further coding at transifex. Xliff files map 1:1 with xbmc strings.xml files (even when the real context hints are added) so no problem there. And if/when xbmc updates its localization system, changing transifex to use those files won't be a problem.

I'll check out xbmc git and make an "initial import" of all core and confluence language strings to transifex later today so we'll see what it would look like for real... Maybe I'll add some addons which are in the xbmc git repo as well, so hopefully the translation memory can be tested too.

In transifex dashboard it says "Be aware that if you have only one project, the strings of its resources are already shared among them." - so it should "just work". But I have no idea how...
viljoviitanen Offline
Member
Posts: 57
Joined: Apr 2010
Reputation: 1
Location: Jyväskylä, Finland
Post: #52
Well, in the end I only put some languages there, but it was all done via the api with a quick and dirty shell script, so adding other languages and addons is pretty simple:

https://github.com/viljoviitanen/Simple-...otstrap.sh

(and as you can see, I didn't check out git, but fetched the language strings directly..)

I also put "Sample Context" in there as the "context hint", anticipating the future context hint strings, so you can see what it'll look like when the thing is finished.
alanwww1 Offline
Team-Kodi Member
Posts: 1,364
Joined: Nov 2008
Reputation: 33
Location: Hungary
Post: #53
Wow ! This is getting there :-)

A few things I noticed:

- We have 1351 strings in Transifex for Hungarian and Finnish. In the XML we have many more. The ids are only there up to 16022. Is it a limitation of the API or curl, or why did the rest not go in?

- When we initially create the language, I think we should always use the base English strings from the version against which the translation was made. This way the translator can see not only the missing strings, but also the ones that changed compared to the fresh git version. It will be a PITA at first, but I think it's worth it. I will of course help to figure out the git versions for each language.

- I still believe (and that is the wish of the XBMC team) that the native language team members should have some kind of supervisory control over what goes into trunk, what should be revised, etc. Therefore I think it is necessary to maintain one git repository just for all the language files. After review by other team members, I will manually pull the language files in from there. This repo is also useful for controlling what should actually go to public translation. I mean, there are some development tasks where the strings change frequently until they reach a more or less stable state. That is when I wish to push these strings into the intermediate git repo. This repo is also quite useful for maintaining the addon strings coming from various sources, from git and svn to plain zip files. I undertake to maintain this and cooperate with addon developers. As I read, Transifex has an automatic pull function from git, so it might be that we don't need to use the API for language file updates, only maybe for fetching the translated files.

Could you please upload the xliff files you used for review?

Anyway, things are looking very promising. Thanks for the help!

ps.: Do you know where the GUI language of Transifex itself can be set? When I tested it running it myself it had an option to change it, but on transifex.net I don't see it.
(This post was last modified: 2012-02-01 00:53 by alanwww1.)
viljoviitanen Offline
Member
Posts: 57
Joined: Apr 2010
Reputation: 1
Location: Jyväskylä, Finland
Post: #54
I figured out the missing strings thing: transifex stops processing when the source string is empty.

Last string in transifex core translation is Bob (inverted), and:

<string id="16022">Bob (inverted)</string>
<string id="16023"></string>

You can easily recreate the xliff files with the shell script on any unix machine which has bash, wget and python.. so almost any :)

Anyway I fixed the xliff generator not to put empty strings in the xliff files, now all the non-empty strings are there. I deleted the previous translations from transifex and uploaded new strings there with the script. I also updated my github repo with the changes.
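The idea of the fix can be sketched roughly as below. This is not the code from the github repo, just a minimal illustration of skipping empty source strings while building trans-unit elements shaped like the ones in this thread. The function name `strings_to_xliff`, the attributes on the file element and the entry with id 16024 are assumptions for the example, and the XLIFF namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET


def strings_to_xliff(strings_xml, context_hint="Sample Context"):
    """Convert strings.xml text to a bare-bones xliff, skipping empty strings."""
    root = ET.fromstring(strings_xml)
    xliff = ET.Element("xliff", version="1.2")
    file_el = ET.SubElement(
        xliff,
        "file",
        {"original": "strings.xml", "source-language": "en", "datatype": "plaintext"},
    )
    body = ET.SubElement(file_el, "body")
    for string_el in root.iter("string"):
        text = string_el.text or ""
        if not text:
            # An empty source string would stop the import, so leave it out.
            continue
        string_id = string_el.get("id")
        unit = ET.SubElement(body, "trans-unit", id=string_id)
        group = ET.SubElement(unit, "context-group")
        ET.SubElement(group, "context", {"context-type": "id"}).text = string_id
        ET.SubElement(group, "context", {"context-type": "context"}).text = context_hint
        ET.SubElement(unit, "source").text = text
    return ET.tostring(xliff, encoding="unicode")


# ids 16022 and 16023 are from the thread; 16024 is a made-up entry.
sample = """<strings>
  <string id="16022">Bob (inverted)</string>
  <string id="16023"></string>
  <string id="16024">Placeholder</string>
</strings>"""
xliff_text = strings_to_xliff(sample)
unit_ids = [u.get("id") for u in ET.fromstring(xliff_text).iter("trans-unit")]
```

Here the empty id 16023 is dropped, and the entry after it still makes it into the output instead of being cut off.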

...

About control and intermediate repos:

You're absolutely right, an intermediate repo is probably the best thing to do.

The automatic pull by transifex can be done if we host xliff files in the "intermediate" repo, but even then it's only for source files. So it's really not much use; I think it's better to do the integration via the api anyway and store xbmc strings.xml files in the intermediate repo. We'll want to keep the integration between the intermediate repo and other repos simple, and do the converting between transifex and the intermediate repo.

Actually an option would be to keep "control" in transifex, but when addon developers (hopefully) start using this thing, an intermediate repo probably is the best thing to do, with direct two-way transifex integration and then from there controlled pulls to main repos.

Perhaps the intermediate repo could also automatically pull any changes from main repos, so any changes there would be automatically pushed to transifex, as probably a portion of the translation work will still be done the old way. And at least the source files would need automatic updates to the intermediate repo anyway.
alanwww1 Offline
Team-Kodi Member
Posts: 1,364
Joined: Nov 2008
Reputation: 33
Location: Hungary
Post: #55
Looks ok now. A few thoughts:

- It would be good to note in the translated/returned/converted strings.xml file the revision or date of the original English file it is based on. For this we could use a header entry in the source xliff file, e.g. like this:

Code:
<header>
  <reference>3978ce61ee897fab5daf83c345f3a8183aae0b22</reference>
</header>

(I think "reference" is an existing field in the xliff format)

Don't know if that would work, but I read the export function is using the source file as a template replacing the target fields and updating the updateable fields. This way we might preserve the git revision the language file is based on.

- I realized that the trans-unit id and context-type="id" store the same number. I guess without context-type="id" we cannot handle the duplicated strings, right?

- I realized that at the details tab, Transifex shows the actual English translatable strings at two places. The second one is at string ID:

Code:
String ID: Vorbis
Description: id\: 34001:context\: Sample Context
Comment: Click here to add a comment
Occurrences: -

I was not able to find another open-to-edit project to check if this is normal.

- We will of course need a back-convert utility from xliff to xml. In this case I think we should use the latest English strings file as a template, with the comments and context hints removed of course, but preserving the empty strings, empty lines, line feeds, etc., so the format of the language strings.xml is always in sync with the English one. Should I try to make this back-convert util in C++?

- I will make a lot of tests now, how things are working in different scenarios. Eg. the source file changes, the language strings.xml file changes at GIT AND the Transifex stored one also changes, how this would merge or overwrite. Things like this.

Thanks for the update, cheers Attila
(This post was last modified: 2012-02-02 00:24 by alanwww1.)
viljoviitanen Offline
Member
Posts: 57
Joined: Apr 2010
Reputation: 1
Location: Jyväskylä, Finland
Post: #56
"reference": sure, it can be stored in transifex that way, but in the intermediate repo we'll need some other way. Normally it would always be the latest commit on the xbmc git master branch (English and other languages), and if git is updated, the intermediate repo and transifex are updated in a matter of seconds by the web hooks. Of course we'll need some kind of periodic check in case the web hooks fail for some reason, and having the reference in the intermediate repo probably helps detect failed webhooks from the github direction, as they're bound to fail occasionally. This depends on 6 things: the availability and bugs of 3 systems, namely the github api, the transifex api and our own web hooks (wherever they are hosted; I'm thinking Google App Engine for now: it's free and quite reliable, and you get https connections for free). Both git and transifex store history, so any problems with the updates are probably easily fixed.

transifex "duplicate" source string: yeah, it seems that's the way it works, at least with xliff files (and having read the xliff import code, it IS that way: the only info going from the xliff into transifex is source, context and target, and target is obviously ignored in the source language import). If this bothers you, you probably just need to learn not to mind it :)

convert back to xml: I don't think it's worth doing in C++. I have almost-ready python code from the conversion the other way; it's very little work, actually trivial, to do the reverse, depending on how clever it needs to be... So I have to ask: why would we need to preserve the empty strings? As far as I can see, they can be left out of translations, as xbmc then falls back to English, where the empty string exists. So it would be very easy to just do a "dumb" convert; loading a template and replacing the strings is much more work and more error prone. By all means, if you want to do it that way, then do it. But at least consider carefully whether you want to spend hours there when the gain is zero, as far as I can see.

Merging etc: you can't test it before it's done. And how it's going to work depends on how it's done :) Source string changes are of course "basic" stuff for transifex, so I'm quite sure there are no problems with that. But note: if the context changes, I believe transifex treats the string as new (cos the internal hash changes), and it will need to be retranslated. I also think it could be automatically translated from translation memory (there's a setting for that), but this is something that will need to be tested.

I suggest this: the intermediate repository is always just overwritten from both transifex and xbmc git updates, and we notify manually (email) if there were conflicts (but I think detecting conflicts for normal updates in translations may be a bit difficult). The old strings are always preserved in the intermediate git history anyway. And we'll have to assume conflicts don't happen very often anyway because of the (normally) instant changes in both directions.
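One possible way to spot the conflicts worth an email, sketched under the assumption that each version of a language file has already been loaded into a plain dict of string id to text. The function name `find_conflicts` and the placeholder values are made up; this only covers the simple case where both sides changed the same string differently since the last synced state.

```python
def find_conflicts(base, from_git, from_transifex):
    """Return {id: (git_value, transifex_value)} for strings changed on both
    sides since `base`, where the two new values disagree."""
    conflicts = {}
    for string_id in set(from_git) | set(from_transifex):
        old = base.get(string_id)
        git_value = from_git.get(string_id)
        tx_value = from_transifex.get(string_id)
        changed_in_git = git_value != old
        changed_in_tx = tx_value != old
        if changed_in_git and changed_in_tx and git_value != tx_value:
            conflicts[string_id] = (git_value, tx_value)
    return conflicts


# Placeholder data: id 0 changed only in git, id 2 only in transifex,
# id 1 changed differently on both sides.
base = {"0": "old 0", "1": "old 1", "2": "old 2"}
from_git = {"0": "git edit", "1": "git edit", "2": "old 2"}
from_transifex = {"0": "old 0", "1": "transifex edit", "2": "transifex edit"}
conflicts = find_conflicts(base, from_git, from_transifex)
```

The overwrite itself would still go ahead as suggested; a check like this would only decide whether a notification is sent.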
alanwww1 Offline
Team-Kodi Member
Posts: 1,364
Joined: Nov 2008
Reputation: 33
Location: Hungary
Post: #57
viljoviitanen Wrote:transifex "duplicate" source string: yeah, it seems that's the way it works, at least with xliff files (and well, having read the xliff import code, it IS that way, only info going in from the xliff to transifex is source, context and target, and target obviously is ignored in the source language import). If this bothers, you just probably need to learn to not mind about it Smile

It is not a big issue for me, I just asked whether it needs to be like that, but sure, it can stay like that, no problem.

viljoviitanen Wrote:convert back to xml: I don't think it's worth it doing in c++. I have almost ready python code from the other way convert, it's very little work, actually trivial, to do it the other way - depending on how clever it needs to be... so I'll have to ask: Why would we need to preserve the empty strings? As far as I see, they can be non-existing in translations as xbmc then falls back to English, where the empty string exists. So it would be very easy to just do a "dumb" convert, it's much more work loading a template and replacing the strings and it's more error prone that way. By all means if you want to do it that way, then do it. But at least consider carefully if you want to spend hours there, where the gain is zero as far is I see.

Yeah, I know we won't have any problem if we just leave out the formatting and even the empty strings in the translated files. I just know that the developers are a little picky about these formatting things and about keeping things like they are in the English file, at least the native-speaking developers who also maintain the translated files. I think as a first step we should see how a back-converted language file would look. I will show them a diff between the Transifex-imported file and the existing git language file. I think in the end it could stay like that with no problem. What might make sense to put into the back-converted language XML file is the date/git revision of the source file, the translation team members and maybe the number of untranslated strings. I think we can get this info pretty easily using the Transifex API.

viljoviitanen Wrote:I suggest this: the intermediate repository is always just overwritten from both transifex and xbmc git updates, and we notify manually (email) if there were conflicts (but I think detecting conflicts for normal updates in translations may be a bit difficult). The old strings are always preserved in the intermediate git history anyway. And we'll have to assume conflicts don't happen very often anyway because of the (normally) instant changes in both directions.

I agree. The intermediate repo should always be as much in sync with the originals as possible, so all master git repositories can be updated anytime the maintainer wants and can use the most recent translation version.

So please if you do have the xliff->XML convert util ready let's make the test that we can see how the original let's say Finnish strings.xml changes with the back-converted strings.xml.

Thanks, Cheers
viljoviitanen Offline
Member
Posts: 57
Joined: Apr 2010
Reputation: 1
Location: Jyväskylä, Finland
Post: #58
alanwww1 Wrote:Yeah I know we won't have any problem if we just leave out the formatting and even the empty strings in the translated files. I just know that the developers are a little picky on these formatting things and keeping things like in the English one. At least with the native language speaking developers who also maintain the translated files as well. I think for first step we should see how a back-converted language file would look. I will show them a diff between the Transifex imported and the existing git language file. I think at the end it could stay like that with no problem. What maybe would make sense to put into the back converted language XML file is the date/git revision of the source file, the translation team members and maybe the number of untranslated strings. I think we can get this info pretty easy from using the Transifex API.

Hm. It's a difficult problem (to me at least) if we want to preserve the formatting of the translated xml files.

It's ... well, just not the thing to do with xml. If that's what's required, we're making things unnecessarily hard for ourselves. I don't even want to think about what's needed to preserve target formatting.

The other things should be easy to fetch from transifex api, yes (and the original English file git things, if we put them in the xliff files.)

alanwww1 Wrote:So please if you do have the xliff->XML convert util ready let's make the test that we can see how the original let's say Finnish strings.xml changes with the back-converted strings.xml.

Thanks, Cheers

Here, done the easy way, just throwing together an xml file "dumb":

https://github.com/viljoviitanen/Simple-...fftoxml.py

Finnish file generated with the tool from downloaded xliff:

http://dl.dropbox.com/u/25581711/test.xml

...

By the way: I noticed the xliff files we get back from transifex are kind of broken. They are like this:

Code:
<trans-unit id="0">
    <context-group><context context-type="id">0</context><context context-type="context">Sample Context</context></context-group>
    <source>Programs</source>
    <target>Ohjelmat</target>
    <target>Programs</target>
   </trans-unit>

with multiple target elements... So it seems the source file we upload in transifex should not have the target elements at all, or something. But luckily this does not matter, because nobody but us is using the xliff files.
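For reference, a "dumb" back-conversion that copes with the doubled target elements could look roughly like this. It is a sketch, not the code in the linked repo: the function name `xliff_to_strings` is made up, it assumes the first target is the real translation (as in the snippet above), and it assumes the xliff has no XML namespace.

```python
import xml.etree.ElementTree as ET


def xliff_to_strings(xliff_text):
    """Build a flat strings.xml from xliff, using only the first <target>."""
    root = ET.fromstring(xliff_text)
    strings = ET.Element("strings")
    for unit in root.iter("trans-unit"):
        target = unit.find("target")  # find() returns the first match only
        if target is None or not target.text:
            # No translation: leave the string out so xbmc falls back to English.
            continue
        ET.SubElement(strings, "string", id=unit.get("id")).text = target.text
    return ET.tostring(strings, encoding="unicode")


# The trans-unit from the thread, wrapped in a <body> so it parses on its own.
downloaded = """<body>
<trans-unit id="0">
    <context-group><context context-type="id">0</context><context context-type="context">Sample Context</context></context-group>
    <source>Programs</source>
    <target>Ohjelmat</target>
    <target>Programs</target>
</trans-unit>
</body>"""
result = xliff_to_strings(downloaded)
```

For the sample above this yields a single string element with id 0 and the text Ohjelmat; the second, English target is simply ignored.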
alanwww1 Offline
Team-Kodi Member
Posts: 1,364
Joined: Nov 2008
Reputation: 33
Location: Hungary
Post: #59
I made a diff with the back-converted xml file.

http://pastebin.com/2GkhGY0W

Seems ok, but I see one problem: the missing translations are filled in with the English strings. This could cause problems, because if we upload the language file back to Transifex, they could appear as translated although they only contain English.

Can you check what the problem could be? Is it in the conversion or in Transifex?
alanwww1 Offline
Team-Kodi Member
Posts: 1,364
Joined: Nov 2008
Reputation: 33
Location: Hungary
Post: #60
I think the problem of the English strings remaining in the untranslated fields also comes from the wrong source file, which contains English strings in the "target" field. If we remove all the target fields from the source file, I think both problems are solved :-)
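Stripping the target elements from a source xliff before upload could be done along these lines. This is only a sketch: the function name `strip_targets` is made up, it assumes the file has no XML namespace, and whether this really cures both symptoms would still need to be tested against transifex.

```python
import xml.etree.ElementTree as ET


def strip_targets(xliff_text):
    """Return the xliff with every <target> removed from every trans-unit."""
    root = ET.fromstring(xliff_text)
    # Collect the units first so the tree is not modified while being walked.
    for unit in list(root.iter("trans-unit")):
        for target in unit.findall("target"):
            unit.remove(target)
    return ET.tostring(root, encoding="unicode")


# A source-language unit that wrongly carries an English target.
source_file = """<body>
<trans-unit id="0">
    <source>Programs</source>
    <target>Programs</target>
</trans-unit>
</body>"""
cleaned = strip_targets(source_file)
```

The source text is left untouched, so the cleaned file can be uploaded as the English resource in the same way as before.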