Release CleanSubs - (Clean subtitles from the ads and other rubbish)
(2017-02-02, 23:21)misa Wrote:
(2017-02-02, 12:11)DaLanik Wrote: v4.4
- Improve heuristic search (thanx Peter!)

Yes, I am stupid :) But what does "heuristic search" mean?

It means it uses some logic to find ads and unwanted lines in subtitles instead of definitions (similar to how antivirus software works, but not that complex) :)
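
A rough sketch of the difference, not the add-on's actual code: a definition lookup only matches known phrases, while a heuristic check scores generic signals such as URLs or "sync by" credits. The patterns, the score threshold and the function names are all assumptions made for illustration.

```python
import re

# Definition-based: exact phrases known to be ads (sample entry only).
DEFINITIONS = {"torrent downloaded by rarbg"}

# Heuristic signals (assumed rules, not the add-on's real ones).
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
CREDIT_RE = re.compile(
    r"\b(sync|synced|corrected|subtitles|corrections)\b.*\bby\b", re.IGNORECASE
)


def strip_tags(line):
    """Remove simple markup such as <font ...> or <i> before matching."""
    return re.sub(r"<[^>]+>", "", line).strip()


def is_ad_by_definition(line):
    """True only if the whole line equals a known phrase."""
    return strip_tags(line).lower() in DEFINITIONS


def is_ad_by_heuristic(line, threshold=2):
    """True if the generic ad signals add up to the threshold."""
    text = strip_tags(line)
    score = 0
    if URL_RE.search(text):
        score += 2
    if CREDIT_RE.search(text):
        score += 2
    return score >= threshold
```

A heuristic like this catches lines that no definition lists yet, at the cost of occasional false positives on ordinary dialogue.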
Reply
v4.5
- Change all xml settings reading from xml.dom to json
- Added Dutch translation (thanx Peter!) :)
Reply
Hello,

I can't download the new version as the repo seems to be down. Where to download the latest version?

T.I.A.
Reply
Just checked, it is online...
Reply
v4.6
- json paths bugfix when scanning library

Sorry for the bug :)
Reply
v4.7
- Improve logic
Reply
Hi there,

reporting back after my NFS request. Unfortunately, CleanSubs has never worked for me, neither through SMB nor NFS. I'm adding the relevant part of a debug log, maybe you find something...

https://hastebin.com/xakeziqapi.tex
Reply
Is that with the latest version? I have tested with a local NFS server and it worked for me. I'll examine the log anyway.
Reply
v4.8
- Uses language(s) selected in Kodi subtitle settings

With this setting ON (default), CleanSubs will use only the definitions for the languages defined in Kodi subtitle settings. Serbian/Serbian Cyrillic/Serbo-Croatian/Croatian/Bosnian are treated as one (Serbo-Croatian) language, which of course they are. Custom subscriptions from the web portal (http://cleansub.heliohost.org) no longer work.
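
As a rough illustration of the grouping described above, the sketch below maps language names onto a shared definition group. The dictionary, the exact spelling of the language names and the function name are hypothetical and not taken from the add-on.

```python
# Hypothetical mapping: several language names share one definition set.
LANGUAGE_GROUPS = {
    "serbian": "serbo-croatian",
    "serbian cyrillic": "serbo-croatian",
    "serbo-croatian": "serbo-croatian",
    "croatian": "serbo-croatian",
    "bosnian": "serbo-croatian",
}


def definition_groups(kodi_languages):
    """Return the unique definition groups for the selected languages, in order."""
    groups = []
    for name in kodi_languages:
        key = name.strip().lower()
        # Languages without a special group simply use their own name.
        group = LANGUAGE_GROUPS.get(key, key)
        if group and group not in groups:
            groups.append(group)
    return groups
```

With this mapping, selecting both Croatian and Bosnian loads a single definition set instead of two.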
Reply
(2017-02-10, 23:39)HeresJohnny Wrote: Hi there,

reporting back after my NFS request. Unfortunately, CleanSubs has never worked for me, neither through SMB nor NFS. I'm adding the relevant part of a debug log, maybe you find something...

https://hastebin.com/xakeziqapi.tex

OK, here's a fix that should work for you. Let me know:



v4.9
- Bug fix
Reply
v5.0
- Optimize stats collection
Reply
fresh kodi v17 install (using mysql backend / sources.xml), installed repo, installed cleansubs.. went to configure it.. seeing:
Code:
16:14:32.445 T:7796   ERROR: EXCEPTION Thrown (PythonToCppException) : -->Python callback/script returned the following error<--
                                             - NOTE: IGNORING THIS CAN LEAD TO MEMORY LEAKS!
                                            Error Type: <type 'exceptions.IndexError'>
                                            Error Contents: list index out of range
                                            Traceback (most recent call last):
                                              File "C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs\default.py", line 41, in onSettingsChanged
                                                GetSettings()
                                              File "C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs\default.py", line 91, in GetSettings
                                                Lang[i] = Lang[i].upper()
                                            IndexError: list index out of range
                                            -->End of Python script error report<--
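
The traceback points at an index-based loop over Lang. The add-on's source is not shown here, so the following is only a guess at the pattern: if the loop bound comes from somewhere other than the list itself (for example a fixed number of language slots), it runs past the end when fewer languages are set. Iterating over the list directly avoids that. The function name and the input format are assumptions.

```python
def normalize_languages(raw_langs):
    """Upper-case language entries and drop blanks, without indexing by position."""
    return [lang.strip().upper() for lang in raw_langs if lang and lang.strip()]
```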

restarted kodi, it hung on exiting due to cleansubs db being locked.. eventually exited. went back in, configured cleansubs manual folder.. put two .srt files in the folder to test.
then went to addons > cleansubs to run it.. clicked once.. nothing happen visually in kodi.. looking at logs:

Code:
16:16:15.288 T:4864   DEBUG: CPythonInvoker(11, C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): start processing
16:16:15.300 T:4864   DEBUG: -->Python Interpreter Initialized<--
16:16:15.300 T:4864   DEBUG: CPythonInvoker(11, C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): the source file to load is "C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py"
16:16:15.300 T:4864   DEBUG: CPythonInvoker(11, C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): setting the Python path to C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs;C:\Program Files (x86)\Kodi\addons\script.module.pil\lib;C:\Users\zoggy\AppData\Roaming\Kodi\addons\script.module.beautifulsoup\lib;C:\Users\zoggy\AppData\Roaming\Kodi\addons\script.module.myconnpy\lib;C:\Program Files (x86)\Kodi\system\python\DLLs;C:\Program Files (x86)\Kodi\system\python\Lib;C:\Program Files (x86)\Kodi\python27.zip;C:\Program Files (x86)\Kodi\system\python\lib\plat-win;C:\Program Files (x86)\Kodi\system\python\lib\lib-tk;C:\Program Files (x86)\Kodi;C:\Program Files (x86)\Kodi\system\python;C:\Program Files (x86)\Kodi\system\python\lib\site-packages
16:16:15.300 T:4864   DEBUG: CPythonInvoker(11, C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): entering source directory C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs
16:16:15.300 T:4864   DEBUG: CPythonInvoker(11, C:\Users\zoggy\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): instantiating addon using automatically obtained id of "service.cleansubs" dependent on version 2.1.0 of the xbmc.python api
16:16:15.768 T:4864   DEBUG: CLEANSUBS >> DEFINITIONS >> NO NEW DEFINITIONS (L:21244 == R:21244)
16:16:15.852 T:4864   DEBUG: CLEANSUBS >> DELETED AND CREATED NEW DEF DB

after a delay, finally came back with a popup asking what to clean...

Code:
16:19:00.185 T:4864   DEBUG: CLEANSUBS STANDALONE >> FILE: >>test-RARBG_track4_eng.srt<<
16:19:00.186 T:4864   DEBUG: CLEANSUBS >> SUB STATS WILL BE ADDED TO LOCAL DATABASE
16:19:00.186 T:4864   DEBUG: CLEANSUBS >> ENC >> OPENED WITH ENCODING: utf-8
16:19:00.213 T:4864   DEBUG: CLEANSUBS >> PROCESSED IN 0.03 SECONDS, REMOVED 78 LINES

so it cleaned the sub.. but causes cosmetic artifacts and didnt cleanup the rarbg advertising...

this went from:
Code:
00:10:30,230 --> 00:10:34,265
but [Groans] this --

to: (double space after but)
Code:
00:10:30,230 --> 00:10:34,265
but  this --
this is too much.

thus may want to run a replace of '  ' (double space) with ' ' (single space) after cleanup is done to reduce the cosmetic stuff..
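
A minimal sketch of that post-cleanup pass, assuming it is applied to each text line after tags like [Groans] have been removed. The function name is made up for the example.

```python
import re


def tidy_line(line):
    """Collapse runs of spaces/tabs left behind by removed tags and trim the ends."""
    return re.sub(r"[ \t]{2,}", " ", line).strip()
```

A regex is used rather than a single str.replace so that three or more spaces in a row also collapse to one.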

then with the CC cleanup, its not cleaning up the 'music' stuff like:

Code:
00:14:38,544 --> 00:14:41,646
¶¶

then finally, looks like you need to add cleanup string for rarbg:

Code:
00:18:02,838 --> 00:18:04,838
Torrent downloaded by RARBG


also noticed that when it did cleanup the file it caused this dupe:

Code:
1
00:00:01,434 --> 00:00:03,602
1
<i> Previously on "marvel's</i>
<i> agents of S.H.I.E.L.D."...</i>
Reply
Hi there, reporting back for the path clean function. I see several problems in my debug log, of which I'm posting a sample:
Code:
20:14:32.231 T:9388   DEBUG: Thread LanguageInvoker start, auto delete: false
20:14:32.232 T:9388    INFO: initializing python engine.
20:14:32.232 T:9388   DEBUG: CPythonInvoker(61, C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): start processing
20:14:32.298 T:9388   DEBUG: -->Python Interpreter Initialized<--
20:14:32.298 T:9388   DEBUG: CPythonInvoker(61, C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): the source file to load is "C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py"
20:14:32.298 T:9388   DEBUG: CPythonInvoker(61, C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): setting the Python path to C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs;C:\Program Files (x86)\Kodi\addons\script.module.pil\lib;C:\Users\JoScha\AppData\Roaming\Kodi\addons\script.module.beautifulsoup\lib;C:\Users\JoScha\AppData\Roaming\Kodi\addons\script.module.myconnpy\lib;C:\Program Files (x86)\Kodi\system\python\DLLs;C:\Program Files (x86)\Kodi\system\python\Lib;C:\Program Files (x86)\Kodi\python27.zip;C:\Program Files (x86)\Kodi\system\python\lib\plat-win;C:\Program Files (x86)\Kodi\system\python\lib\lib-tk;C:\Program Files (x86)\Kodi;C:\Program Files (x86)\Kodi\system\python;C:\Program Files (x86)\Kodi\system\python\lib\site-packages
20:14:32.298 T:9388   DEBUG: CPythonInvoker(61, C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): entering source directory C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs
20:14:32.298 T:9388   DEBUG: CPythonInvoker(61, C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py): instantiating addon using automatically obtained id of "service.cleansubs" dependent on version 2.1.0 of the xbmc.python api
20:14:32.634 T:9388   DEBUG: CLEANSUBS >> DEFINITIONS >> NO NEW DEFINITIONS (L:21244 == R:21244)
20:14:32.693 T:9388   DEBUG: CLEANSUBS >> DELETED AND CREATED NEW DEF DB
20:14:43.619 T:9388   DEBUG: CLEANSUBS >> READ TOTAL DEFINITIONS: 0 elements
20:14:43.619 T:9388   DEBUG: CLEANSUBS STANDALONE >> STARTED VERSION 5.0
20:14:43.620 T:9388   DEBUG: JSONRPC: Incoming request: {
                                                "jsonrpc": "2.0",
                                                "id": 1,
                                                "method": "Files.GetSources",
                                                "params": {
                                                    "media": "video"
                                                    }
                                                }
20:14:43.620 T:9388   DEBUG: CLEANSUBS >> VIDEO PATHS >> multipath://nfs%3a%2f%2f192.168.1.185%2fd%2fTV-Movie%2f_Movie%2fAction-Adventure-Western%2f/nfs%3a%2f%2f192.168.1.185%2fd%2fTV-Movie%2f_Movie%2fAsian%2f/nfs%3a%2f%2f192.168.1.185%2fd%2fTV-Movie%2f_Movie%2fComedy-Family-Romance%2f/nfs%3a%2f%2f192.168.1.185%2fd%2fTV-Movie%2f_Movie%2fCrime-Suspense-Mystery%2f/nfs%3a%2f%2f192.168.1.185%2fd%2fTV-Movie%2f_Movie%2fDrama-War%2f/nfs%3a%2f%2f192.168.1.185%2fd%2fTV-Movie%2f_Movie%2fHorror%2f/nfs%3a%2f%2f192.168.1.185%2fd%2fTV-Movie%2f_Movie%2fSf-Fantasy%2f/nfs%3a%2f%2f192.168.1.185%2ft%2fTV-Movie2%2f_Doku%2f/nfs%3a%2f%2f192.168.1.185%2ft%2fTV-Movie2%2f_Anime%2f_Movies%2f/
20:14:43.620 T:9388   DEBUG: CLEANSUBS >> VIDEO PATHS >> nfs://192.168.1.185/t/TV-Movie2/_Anime/_Series/
20:14:43.620 T:9388   DEBUG: CLEANSUBS >> VIDEO PATHS >> nfs://192.168.1.185/t/TV-Movie2/_tv/
20:14:43.620 T:9388   DEBUG: CLEANSUBS >> VIDEO PATHS >> nfs://192.168.1.185/q/Music/_dvd-V/
20:14:43.620 T:9388   DEBUG: CLEANSUBS >> VIDEO PATHS >> nfs://192.168.1.185/d/TV-Movie/_Movie/Animation/
20:14:48.486 T:9388   DEBUG: DialogProgress::Open called
20:14:48.486 T:9388   DEBUG: ------ Window Init (DialogConfirm.xml) ------
20:16:11.077 T:9388   DEBUG: CLEANSUBS STANDALONE >> BEGIN PATH: >>\\POSTMAN\TV-Movie2\_tv\<< FOLDERS IN PATH: >>362<<
...
20:16:51.647 T:9388   DEBUG: CLEANSUBS STANDALONE >> FILE: >>American.Horror.Story.S02E12.en.srt<<
20:16:51.677 T:9388   DEBUG: CLEANSUBS >> SQL ERROR IN CheckDatabase
20:16:51.677 T:9388   DEBUG: CLEANSUBS >> SUB STATS WILL BE ADDED TO LOCAL DATABASE
20:16:51.692 T:9388   DEBUG: CLEANSUBS >> ENC >> OPENED WITH ENCODING: utf-8
20:16:51.713 T:9388   DEBUG: CLEANSUBS >> SQL ERROR IN AddtoDatabase : no such table: stats
20:16:51.736 T:9388   DEBUG: Previous line repeats 1 times.
20:16:51.736 T:9388   DEBUG: CLEANSUBS >> PROCESSED IN 0.09 SECONDS, NO LINES REMOVED
20:16:51.736 T:9388   DEBUG: CLEANSUBS STANDALONE >> FILE: >>American.Horror.Story.S02E12.ja.srt<<
20:16:51.766 T:9388   DEBUG: CLEANSUBS >> SQL ERROR IN CheckDatabase
20:16:51.767 T:9388   DEBUG: CLEANSUBS >> SUB STATS WILL BE ADDED TO LOCAL DATABASE
20:16:51.777 T:9388   DEBUG: CLEANSUBS >> ENC >> TRYING ENCODING utf-8
20:16:51.782 T:9388   DEBUG: CLEANSUBS >> ENC >> TRYING ENCODING cp1250
20:16:51.792 T:9388   DEBUG: CLEANSUBS >> ENC >> TRYING ENCODING cp1251
20:16:51.797 T:9388   DEBUG: CLEANSUBS >> ENC >> TRYING ENCODING cp1252
20:16:51.807 T:9388   DEBUG: CLEANSUBS >> ENC >> TRYING ENCODING cp1253
20:16:51.817 T:9388   DEBUG: CLEANSUBS >> ENC >> TRYING ENCODING cp1254
20:16:51.827 T:9388   DEBUG: CLEANSUBS >> ENC >> TRYING ENCODING cp1257
20:16:51.827 T:9388   DEBUG: CLEANSUBS >> ENC >> OPENED WITH KODI ENCODING:
20:16:51.848 T:9388   ERROR: EXCEPTION Thrown (PythonToCppException) : -->Python callback/script returned the following error<--
                                             - NOTE: IGNORING THIS CAN LEAD TO MEMORY LEAKS!
                                            Error Type: <type 'exceptions.LookupError'>
                                            Error Contents: unknown encoding:
                                            Traceback (most recent call last):
                                              File "C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py", line 290, in <module>
                                                intCancel = scanPaths(manFolder, 1, 1, 3)
                                              File "C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\standalone.py", line 176, in scanPaths
                                                process_subs(os.path.join(path, basePath, name), 1)
                                              File "C:\Users\JoScha\AppData\Roaming\Kodi\addons\service.cleansubs\default.py", line 300, in process_subs
                                                file_input = codecs.open(fileName, 'r', SubCharset, errors='ignore')
                                              File "C:\Program Files (x86)\Kodi\system\python\Lib\codecs.py", line 899, in open
                                                info = lookup(encoding)
                                            LookupError: unknown encoding:
                                            -->End of Python script error report<--
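
Side note on the multipath:// entry in the log above: judging by that line, each member path is URL-encoded and followed by a slash. A sketch of how such a source could be expanded into its individual paths before scanning is below. Whether CleanSubs already does this is not visible in the log, and the function name is an assumption. The import is Python 3 syntax; on Kodi 17's Python 2.7 it would be `from urllib import unquote`.

```python
from urllib.parse import unquote

MULTIPATH_PREFIX = "multipath://"


def expand_multipath(source):
    """Split a multipath:// source into its individual, decoded paths."""
    if not source.startswith(MULTIPATH_PREFIX):
        return [source]
    encoded = source[len(MULTIPATH_PREFIX):]
    # Each member is URL-encoded, so a literal "/" only separates members.
    return [unquote(part) for part in encoded.split("/") if part]
```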

1. It seems like there's a problem with a non-existing table "stats".
2. It seems like there's a problem with double-byte encoded subs, in my case codepage 932 ANSI/OEM Japanese (Shift JIS). That's the part where CleanSubs tries a few different codepages and finally errors out, because the Kodi encoding it falls back to is empty. Maybe this could be made more resilient by skipping files that cannot be decoded.

3. The cleaning itself seems to fail partially. Example:
American.Horror.Story.S02E12.en.srt vs. American.Horror.Story.S02E12.en.srt_ORIGINAL

Cleansubs manages to clean the last lines of the sub which are
Code:
728
00:42:42,598 --> 00:42:52,817
<font color="#ec14bd">Sync & corrections by honeybunny</font>
<font color="#ec14bd">www.addic7ed.com</font>

However, it fails to clean stuff from the top which still has
Code:
1
00:00:48,917 --> 00:00:51,152
Daddy?

2
00:00:51,220 --> 00:00:53,788
Daddy'll be there in a minute.

3
00:01:48,608 --> 00:01:58,632
<font color="#ec14bd">Sync & corrections by honeybunny</font>
<font color="#ec14bd">www.addic7ed.com</font>

I'll be back with some more tests about NFS.
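
For point 2, a sketch of what "more resilient" could look like: try each candidate strictly, only use the Kodi charset if it is non-empty and known to Python, and report failure instead of raising so the caller can skip the file. The candidate list, the added cp932 entry and the function name are assumptions. Trial decoding is also only a rough guess at the real encoding, since single-byte code pages accept many byte sequences.

```python
import codecs

# Assumed candidate list; cp932 (Shift JIS) added for Japanese subtitles.
CANDIDATE_ENCODINGS = ["utf-8", "cp1250", "cp1251", "cp1252", "cp932"]


def decode_subtitle(raw, kodi_charset="", candidates=CANDIDATE_ENCODINGS):
    """Return (text, encoding), or (None, None) if no encoding decodes the bytes."""
    to_try = list(candidates)
    if kodi_charset:  # an empty setting is skipped instead of passed to codecs
        to_try.append(kodi_charset)
    for name in to_try:
        try:
            codecs.lookup(name)  # raises LookupError for unknown names
            return raw.decode(name), name
        except (LookupError, UnicodeDecodeError):
            continue
    return None, None
```

The caller would then log and skip any file for which the result is (None, None) rather than aborting the whole scan.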
Reply
(2017-02-23, 00:18)thezoggy Wrote: then with the CC cleanup, its not cleaning up the 'music' stuff like:

Code:
00:14:38,544 --> 00:14:41,646
¶¶

Please leave those in there as it gives a hint which parts are lyrics and which part is spoken.
Reply
(2017-02-23, 21:56)HeresJohnny Wrote:
(2017-02-23, 00:18)thezoggy Wrote: then with the CC cleanup, its not cleaning up the 'music' stuff like:

Code:
00:14:38,544 --> 00:14:41,646
¶¶

Please leave those in there as it gives a hint which parts are lyrics and which part is spoken.

(2017-02-23, 00:18)thezoggy Wrote: then finally, looks like you need to add cleanup string for rarbg:

Code:
00:18:02,838 --> 00:18:04,838
Torrent downloaded by RARBG

Pretty sure you shouldn't post stuff like that here.

It's a valid string to clean up. I'm not posting release names (like you are) or how to get them, which is the part you shouldn't be doing.
And if you don't want it cleaning up CC-related entries, don't use that option.

For better context on the ¶¶ entries, you can see that they are just fillers for a montage: no lyrics or anything being used. If you're cleaning up sounds like "[ door creaks ]", you might as well clean up the visual sound cue too.
Code:
00:02:45,633 --> 00:02:47,633
Where?
Nome, Alaska.

69
00:02:47,635 --> 00:02:51,403
¶¶

70
00:02:53,039 --> 00:02:54,873
[ door creaks ]

71
00:02:54,875 --> 00:02:57,709
[ Wind howling ]

72
00:02:57,711 --> 00:03:01,246
[ Door closes ]

73
00:03:09,556 --> 00:03:11,456
Mack: Coulson.

74
00:03:19,799 --> 00:03:22,467
What the hell is this?

75
00:03:29,542 --> 00:03:31,843
It's...You.

76
00:03:31,845 --> 00:03:33,979
¶¶

77
00:04:12,619 --> 00:04:15,487
¶¶

78
00:04:30,703 --> 00:04:31,803
doctor.
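
A sketch of how both wishes in this exchange could be served: treat marker-only lines such as ¶¶ as CC filler, but behind a switch so they can be kept. The regexes, the set of music symbols and the drop_music flag are assumptions for illustration, not existing CleanSubs options.

```python
import re

# A line that is only a bracketed sound cue, e.g. "[ door creaks ]".
SOUND_CUE_RE = re.compile(r"^\s*\[[^\]]*\]\s*$")
# A line made up only of music markers (the symbol set is an assumption).
MUSIC_ONLY_RE = re.compile(r"^\s*[¶♪♫][\s¶♪♫]*$")


def is_cc_filler(line, drop_music=True):
    """True if the line carries no dialogue, only a sound cue or music markers."""
    if SOUND_CUE_RE.match(line):
        return True
    return drop_music and bool(MUSIC_ONLY_RE.match(line))
```

Lines that mix dialogue with a cue, such as "but [Groans] this --", are not treated as filler here. Only the bracketed part would be stripped from them by a separate step.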
Reply