alanwww1 Wrote: Looks ok now. A few thoughts:
- It would be good to note in the translated/returned/converted strings.xml file the revision or date of the original English file it is based on. For this we could use a header entry in the source xliff file, e.g. like this:
Code:
<header>
<reference>3978ce61ee897fab5daf83c345f3a8183aae0b22</reference>
</header>
(I think "reference" is an existing possible field in the xliff format.)
I don't know if that would work, but I read that the export function uses the source file as a template, replacing the target fields and updating the updateable fields. This way we might preserve the git revision the language file is based on.
- I realized that trans-unit id and context-type="id" have the same number stored. I guess without context-type="id" we cannot handle the duplicated strings, right?
- I realized that at the details tab, Transifex shows the actual English translatable strings at two places. The second one is at string ID:
Code:
String ID: Vorbis
Description: id\: 34001:context\: Sample Context
Comment: Click here to add a comment
Occurrences: -
I was not able to find another open-to-edit project to check if this is normal.
- We will of course need a re-convert utility from xliff to xml. In this case I think we should use the latest English strings file as a template, with the comments and context hints removed of course, but preserving the empty strings, empty lines, line feeds, etc. That way the format of each language's strings.xml is always in sync with the English one. Should I try to make this back-convert util in C++?
- I will now run a lot of tests to see how things work in different scenarios. E.g. the source file changes, the language strings.xml file changes in git AND the one stored in Transifex also changes: how would this merge or overwrite? Things like this.
Thanks for the update, cheers Attila
"reference": sure, it can be stored in transifex that way, but in intermediate repo we'll need some other way. Normally it would always be xbmc git master branch latest commit (English and other languages), and if git is updated, it's updated in intermediate repo and transifex in a matter of seconds with the web hooks. Of course we'll need some kind of periodic check in case the web hooks fail for some reason, and having the reference in intermediate repo probably helps detect failed webhooks from github direction, as it's bound to fail occassionally. This depends on 6 things: availability and bugs of 3 systems, github api, transifex api and our own web hooks (wherever they are stored. I'm thinking google appengine for now.. it's free and quite reliable and you get https connections for free). Both git and transifex store history, so any problems with the updates are probably easily fixed.
transifex "duplicate" source string: yeah, it seems that's the way it works, at least with xliff files (and well, having read the xliff import code, it IS that way: the only info going from the xliff into transifex is source, context and target, and target is obviously ignored in the source language import). If this bothers you, you probably just need to learn not to mind it.
convert back to xml: I don't think it's worth doing in c++. I have almost-ready python code for the conversion in the other direction, and it's very little work, actually trivial, to do it the other way, depending on how clever it needs to be... so I'll have to ask: why would we need to preserve the empty strings? As far as I see, they can simply be absent from translations, as xbmc then falls back to English, where the empty string exists. So it would be very easy to just do a "dumb" convert; loading a template and replacing the strings is much more work and more error prone. By all means, if you want to do it that way, then do it. But at least consider carefully whether you want to spend hours there, when the gain is zero as far as I see.
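A "dumb" convert along those lines could look roughly like this. It's a minimal sketch, not my actual converter: the function names are made up for illustration, it assumes each trans-unit id is the xbmc string id, and it simply skips units with an empty or missing target so that xbmc falls back to English.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{urn:...}trans-unit'."""
    return tag.rsplit("}", 1)[-1]


def xliff_to_strings_xml(xliff_text):
    """Build a flat <strings> document from the targets of an xliff file.

    Hypothetical helper: no template is loaded, nothing from the English
    layout is preserved, untranslated units are left out entirely.
    """
    root = ET.fromstring(xliff_text)
    strings = ET.Element("strings")
    for unit in root.iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        string_id = unit.get("id")
        target = next((c for c in unit if local_name(c.tag) == "target"), None)
        # No id or no translation: skip it, xbmc uses the English string.
        if string_id is None or target is None or not (target.text or "").strip():
            continue
        entry = ET.SubElement(strings, "string", id=string_id)
        entry.text = target.text
    return ET.tostring(strings, encoding="unicode")
```

The template approach would need all of this plus parsing the English file and matching ids against it, which is where the extra work and the extra bugs would come from.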
Merging etc: you can't test it before it's done. And how it's going to work depends on how it's done.
Source string changes are of course "basic" stuff for transifex, so I'm quite sure there are no problems with that. But note: if the context changes, I believe transifex treats the string as new (because the internal hash changes), and it will need to be retranslated. I also think it could be automatically translated from translation memory (there's a setting for that), but this is something that will need to be tested.
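To illustrate why a context change makes a string look new: if the key is a hash over source plus context, touching either one gives a different key. This is only a toy sketch; `string_key` and the md5-over-"source:context" recipe are my assumptions for illustration, not transifex's actual code.

```python
import hashlib


def string_key(source, context):
    """Hypothetical identity key: a hash over source text and context."""
    return hashlib.md5(f"{source}:{context}".encode("utf-8")).hexdigest()


# Same source, different context: the keys differ, so an existing
# translation stored under the old key is no longer found.
old_key = string_key("Vorbis", "id: 34001")
new_key = string_key("Vorbis", "id: 34002")
print(old_key == new_key)
```

So anything we put into the context field should be stable, otherwise we throw away translations every time it changes.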
I suggest this: the intermediate repository is always just overwritten from both transifex and xbmc git updates, and we notify manually (email) if there were conflicts (but I think detecting conflicts for normal updates in translations may be a bit difficult). The old strings are always preserved in the intermediate git history anyway. And we'll have to assume conflicts don't happen very often anyway because of the (normally) instant changes in both directions.
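If we do want to try detecting conflicts, one possible way is a three-way compare against the last synced state. This is just a sketch under assumptions: `find_conflicts` is a made-up name, each argument is a plain dict of string id to translated text, and we would have to keep the last synced snapshot (`base`) around somewhere, e.g. from the intermediate git history.

```python
def find_conflicts(base, from_git, from_transifex):
    """Return the ids changed on both sides since `base` to different values.

    A string changed on one side only, or changed to the same text on
    both sides, is not reported.
    """
    conflicts = []
    for string_id in sorted(set(from_git) | set(from_transifex)):
        old = base.get(string_id)
        git_value = from_git.get(string_id)
        tx_value = from_transifex.get(string_id)
        if git_value != old and tx_value != old and git_value != tx_value:
            conflicts.append(string_id)
    return conflicts
```

The overwrite itself would still happen as described; the returned ids would only feed the notification email.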