Req Clean unused strings.
#1
Hello,

I think it's time for a big cleanup of unused and duplicate language strings.

I know it's a very hard and big job, but it must be done some day. Maybe a good coder could write a script that checks for unused and duplicate strings, so they can then be removed manually?
#2
No one interested? There are really too many duplicate and unused strings for translators. :(
#3
I'm interested. The Wiki will be using strings to help automate descriptions there, including translations (long-term goal), so I'll have my nose in there in the near future. I don't know if I will be of any help in reducing duplicates, but I'll keep an eye out for them :)
#4
Thank you. I will also try to help if I can.
#5
Please note that even though some strings seem to be duplicates, they are used in different contexts and thus might translate differently in certain languages. So while getting rid of orphaned strings and duplicates is of course very welcome, we have to be careful with merging labels. I do my best to bug our devs to add contextual notes to strings so that they can be reused more easily, and I also try to avoid new duplicates if contexts match, but I can't check every single pull request on GitHub, so things might slip through. Having a second pair of eyes checking our PRs for this would be very welcome.
#6
Keep in mind that certain strings may not be used by core or Confluence, but they might be widely used by other skins.
Removing strings from the language file is not as easy as it seems ;-)
#7
(2015-04-19, 19:04)ronie Wrote: keep in mind that certain strings may not be used by core or confluence, but they might be widely used by other skins.
removing strings from the language file is not so easy as it seems ;-)
Good reason to remove them then. Random strings with no context are basically untranslatable.

I actually made a script for checking for unused strings like this some years ago; if only I could find it. It should work well enough to get a draft. The hard ones are the string IDs with four digits or fewer, which will usually just give too many false positives. And IIRC there is some funky arithmetic going on for the weather strings to be wary of.
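
To illustrate the false-positive problem with short IDs, here is a minimal sketch. The source lines are invented for the example and are not taken from the Kodi code base; only the g_localizeStrings.Get(...) call form comes from this thread.

```python
import re

# Hypothetical source lines, made up for this illustration only.
SOURCE_LINES = [
    'label = g_localizeStrings.Get(183);',
    'int width = 183;',
    'if (code == 1183) return;',
]


def lines_mentioning(string_id, lines):
    """Return the lines where string_id appears as a standalone number."""
    # The lookarounds stop 183 from matching inside 1183 or 1830.
    pattern = re.compile(r'(?<!\d)' + str(string_id) + r'(?!\d)')
    return [line for line in lines if pattern.search(line)]


hits = lines_mentioning(183, SOURCE_LINES)
for line in hits:
    print(line)
```

Even with the digit boundaries, the unrelated `int width = 183;` line is reported, so a bare number search on short IDs needs a manual review pass afterwards.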
#8
(2015-04-19, 22:00)takoi Wrote: The hard ones are the string ids with 4 digit and less, which will usually just give too many false positives.

For that, I think searching for something like str("g_localizeStrings.Get(183)") across the whole xbmc code directory is fine.

I think we should just keep the core strings in and take the skin and other strings out of Kodi core. I remember, though, that a few years ago it was decided to add some of the official skin (Confluence) strings to the Kodi core strings file.

Tomorrow I will share some code with all of you, and you will see how many duplicated strings we have at the moment. (The code is not perfect. Actually it's very bad, but as an example it's a start, I guess.)

I am not sure, but I guess almost 1000 strings are not used by Kodi core at the moment. (The number is not official.)

What a big job I asked for. Sorry, everybody.
#9
(2015-04-19, 22:55)queeup Wrote:
(2015-04-19, 22:00)takoi Wrote: The hard ones are the string ids with 4 digit and less, which will usually just give too many false positives.

For that I think searching like str("g_localizeStrings.Get(183)") on the whole xbmc code directory is fine.
Absolutely not! Strings are called in various ways.
You must go through the entire code searching for that exact string id number.
Duplicate strings are sometimes defined because they are used in different contexts.
There are also hundreds of strings that are used by skins and add-ons which are not used in core code.

I will not accept a blunt removal of a thousand strings without proof that they are not used in any way.

I think we already had this discussion once. I will say it again: first tag the strings that are 100% certain to be used with the name of the file where they are used. Anything that is left will be reviewed.
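
A rough sketch of that tagging step, under stated assumptions: it only recognises the literal g_localizeStrings.Get(<number>) form mentioned earlier in this thread, and the function names are made up for the example. Since strings are called in various ways, anything it does not tag is only a candidate for manual review, not proof of being unused.

```python
import os
import re
from collections import defaultdict

# Only the literal call form quoted in this thread; other call styles are not covered.
GET_CALL = re.compile(r'g_localizeStrings\.Get\((\d+)\)')


def tag_ids_with_files(root):
    """Map each string id found in a literal Get(<id>) call to the files using it."""
    usage = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding='utf-8', errors='ignore') as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file, skip it
            for match in GET_CALL.finditer(text):
                usage[int(match.group(1))].add(os.path.relpath(path, root))
    return usage


def untagged_ids(all_ids, usage):
    """Ids with no literal hit; these still need a manual review."""
    return sorted(set(all_ids) - set(usage))
```

Calling `tag_ids_with_files('xbmc')` on a source checkout would give the "id, file name" tags, and `untagged_ids(...)` the remainder to go through by hand.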
#10
I am a translator, and what you said about some strings being used by skins and add-ons may be true, but what I am seeing while translating is people defining the same strings again and again in their own strings files.
#11
I agree, a big benefit would be if all known strings would get contextual and usage comments. But that's the one huuuuuge task everybody is scared of, because you have to check for every string where and how it's used in the code.

Quote: For that I think searching like str("g_localizeStrings.Get(183)") on the whole xbmc code directory is fine.
As Martijn said - this won't work reliably. It'll probably find 50% of the strings used. Some string IDs are generated programmatically (like some stereoscopic ones) or are passed as a variable to some other function (dialogs, ...) which then does the translation.
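
A small sketch of why a literal search misses programmatically generated IDs. The base id, the constant name, and the snippet are hypothetical and only stand in for the kind of code described above.

```python
import re

# Hypothetical snippet: the id is computed, so no number appears inside Get(...).
SOURCE = '''
static const int BASE_LABEL_ID = 36500;
label = g_localizeStrings.Get(BASE_LABEL_ID + mode);
'''


def literal_ids(text):
    """Ids that appear as plain numbers inside Get(...)."""
    return {int(m) for m in re.findall(r'g_localizeStrings\.Get\((\d+)\)', text)}


def ids_used_at_runtime(base, modes):
    """Ids the hypothetical snippet would actually request."""
    return {base + mode for mode in modes}


found = literal_ids(SOURCE)
needed = ids_used_at_runtime(36500, range(3))
missed = needed - found
print(sorted(missed))
```

Every id the snippet requests at runtime ends up in `missed`, which is why the result of such a search can only be a draft.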
#12
(2015-04-20, 10:53)queeup Wrote: I am translater and what you said about some string used by skins and add-ons can be true but what i am seeing while translating, people defining again and again same strings in their own strings file.
There is no doubt that we have duplicates that could be removed - but as already mentioned, we can't merge them blindly because they could be used in a different context and thus could require different translations. So research has to be done.
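
For that research, one possible starting point is to list which ids share the same text, so each group can be checked for context by hand. This is only a sketch; the ids and labels below are invented, and real input would come from the msgctxt and msgid fields of strings.po.

```python
from collections import defaultdict

# Hypothetical (msgctxt, msgid) pairs, not real Kodi ids.
ENTRIES = [
    ('#100', 'Play'),
    ('#208', 'Play'),
    ('#222', 'Cancel'),
]


def group_ids_by_text(entries):
    """Return {text: [ids]} for every text that is defined under more than one id."""
    groups = defaultdict(list)
    for ctxt, msgid in entries:
        groups[msgid].append(ctxt)
    return {msgid: ids for msgid, ids in groups.items() if len(ids) > 1}


candidates = group_ids_by_text(ENTRIES)
for text, ids in candidates.items():
    print(text, ids)
```

The output is a review list, not a merge list: two ids with the same English text may still need different translations.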
#13
EDIT:
Wrong: For checking duplicates:
Correct: Check whether the same string is used more than once, and count how often:
Code:
import polib

LANGUAGE_FILE = 'strings.po'

PO_DICT = []
po = polib.pofile(LANGUAGE_FILE)
for entry in po:
    PO_ARRAY = {'msgctxt': entry.msgctxt.replace('#', ''), 'msgid': entry.msgid}
    PO_DICT.append(entry.msgid)


def has_duplicates(values):
    # For each element, check all following elements for a duplicate.
    for i in range(0, len(values)):
        count = 1
        dup_str = ''
        for x in range(1, len(values)):
            if values[i] == values[x]:
                count += 1
                dup_str = values[i]
        if count is not 1:
            print(dup_str, count)

# Test the has_duplicates method.
print(has_duplicates(PO_DICT))

Output: http://xbmclogs.com/pcmn4pbuv
#14
I don't really see a few duplicated strings as a problem. Transifex has translation memory, so it's one click to duplicate it. Incorrectly reusing strings is a much bigger problem.

I think there's an error in your code though, queeup. The list of duplicates itself contains duplicates :P Try collections.Counter instead. Btw, how did you load the po file? I just get "OSError: Syntax error in po file strings.po" from polib with that code.
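
Something along these lines, as a minimal sketch of the collections.Counter idea. The sample list is made up; with polib the input could be built as `[entry.msgid for entry in polib.pofile('strings.po')]`, assuming the file parses.

```python
from collections import Counter


def count_repeats(msgids):
    """Return {text: count} for every text that occurs more than once."""
    counts = Counter(msgids)
    return {text: n for text, n in counts.items() if n > 1}


# Made-up sample data standing in for the msgid values of strings.po.
sample = ['Play', 'Stop', 'Play', 'Cancel', 'Play', 'Stop']
repeats = count_repeats(sample)
for text, n in repeats.items():
    print(text, n)
```

Each repeated text shows up exactly once in the result, together with its count.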
#15
There are two things that give syntax errors in the po file. You have to correct them all first.
1- A space is needed after the comment marker (like: # empty string)
2- Some strings use ("). You need to change it to (')

My code is shit :) But it shows most of the dups correctly.

EDIT: Actually, my code shows whether a string is used more than once and, if so, how many times it is used.