Posts: 14,226
Joined: Nov 2009
Reputation:
704
Klojum
Lost connection
1) Files named "évasion" and "evasion" are considered two different files in Kodi, in Linux, and most likely also in Windows file storage. It is not uncommon for the scraper to react differently as well when diacritic characters are involved.
2) I don't think the scraper is up to that level of artificial intelligence yet, but others probably have more info on that.
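To illustrate the point about diacritics, here is a minimal Python sketch of why the two names compare as different, and how an accent-insensitive comparison could look. The helper names `strip_diacritics` and `same_ignoring_accents` are made up for this example; this is not what Kodi or the scraper actually does.

```python
import unicodedata

def strip_diacritics(text):
    """Return text with combining accent marks removed (é -> e)."""
    # NFKD splits an accented letter into base letter + combining mark
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def same_ignoring_accents(a, b):
    """Compare two names case-insensitively, ignoring diacritics."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()

print("évasion" == "evasion")                       # False: different strings
print(same_ignoring_accents("évasion", "evasion"))  # True
```

A plain string comparison treats "é" and "e" as unrelated characters, which is why the two spellings can lead to different search results.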
Posts: 45
Joined: Aug 2020
Reputation:
0
2022-05-20, 22:05
(This post was last modified: 2022-05-20, 22:29 by michelb2.)
Thanks for the answer.
First, as I already said, all my Kodi parameters are set to French, including the search in the scraper.
A) I already searched on TMDB with "evasion y:2013" and "évasion y:2013", and yes, I get multiple answers in a different order.
evasion y:2013 =>
1 Evasion fiscale - Le hold-up du siècle 2013
2 La Grande Évasion 1963
3 Évasion 2013
évasion y:2013 =>
1 La Grande Évasion 1963
2 Evasion fiscale - Le hold-up du siècle 2013
3 Évasion 2013
So the proposal from the TMDB Python scraper is not coherent with the order of the TMDB answers.
It seems more natural for the scraper to choose the exact name first, then the date, and only last a partial name with a different date.
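The ordering suggested above (exact name first, then matching date, partial names last) could be sketched in Python as follows. This is only an illustration under my own assumptions: `fold` and `rank_results` are hypothetical names, the results are the three titles quoted above, and none of this is taken from the scraper's code.

```python
import unicodedata

def fold(text):
    """Lowercase and strip accents so 'Évasion' and 'evasion' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def rank_results(results, query_title, query_year):
    """Sort (title, year) pairs: exact title first, then matching year."""
    wanted = fold(query_title)

    def key(item):
        title, year = item
        exact = fold(title) == wanted
        same_year = year == query_year
        # False sorts before True, so negate to put matches first
        return (not exact, not same_year)

    return sorted(results, key=key)

results = [
    ("Evasion fiscale - Le hold-up du siècle", 2013),
    ("La Grande Évasion", 1963),
    ("Évasion", 2013),
]
for title, year in rank_results(results, "evasion", 2013):
    print(title, year)
# Évasion 2013
# Evasion fiscale - Le hold-up du siècle 2013
# La Grande Évasion 1963
```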
B) I tried the manual search and saw that tt1211956 gives the right answer.
But my question was:
Is it possible to automate the process, in advancedsettings.xml with <cleandatetime> and/or <cleanstrings>,
or directly in the code of the scraper: metadata.themoviedb.org.python\scraper.py ?
PS
I just looked in addons\metadata.themoviedb.org.python\python\lib\tmdbscraper\tmdb.py
and it seems to me that this function (line 134 and following)
    def _parse_media_id(title):
        if title.startswith('tt') and title[2:].isdigit():
            return {'type': 'imdb', 'id':title} # IMDB ID works alone because it is clear
        .....
is used to check whether an IMDb number or TMDB number exists in the title, BUT only if it is the only thing in the file name (as you can see in the manual search).
My knowledge of Python is not good enough to change the code, but all you have to do is return the type and IMDb number as soon as "tt\d+" exists in the file name.
Can someone help me?
Posts: 45
Joined: Aug 2020
Reputation:
0
2022-05-20, 22:31
(This post was last modified: 2022-05-20, 22:33 by michelb2.)
Sorry,
I edited my previous post while you were answering me.
If I can export an image to pastebin, I can show you my results from TMDB if you don't believe me.
Posts: 612
Joined: Sep 2010
Reputation:
148
Ya, the scraper code can likely be changed to match the IMDB number from anywhere in the string - with a precise enough pattern we're unlikely to pull up false positives, like a movie with 'tt12345' legitimately in the title.
For this filename pattern specifically you'd also need to change "cleandatetime" to not match the year in the middle - that happens in Kodi before the name is passed to scrapers.
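As a rough sketch of what a "precise enough pattern" might look like: the example below searches for an IMDb number anywhere in the title but only accepts "tt" followed by at least 7 digits and not glued to other letters or digits. The 7-digit minimum and the names `IMDB_ID` and `find_imdb_id` are my assumptions for illustration, not the scraper's actual code; the returned dict shape follows the function quoted earlier in the thread.

```python
import re

# Assumption: IMDb title IDs are "tt" followed by at least 7 digits.
# The lookarounds reject matches embedded in a longer word or number.
IMDB_ID = re.compile(r"(?<![A-Za-z0-9])tt\d{7,}(?![A-Za-z0-9])")

def find_imdb_id(title):
    """Return an IMDb id dict if one appears anywhere in title, else None."""
    m = IMDB_ID.search(title)
    if m:
        return {'type': 'imdb', 'id': m.group()}
    return None

print(find_imdb_id("nimporteqoui [tt6063090]"))  # {'type': 'imdb', 'id': 'tt6063090'}
print(find_imdb_id("tt1211956"))                 # {'type': 'imdb', 'id': 'tt1211956'}
print(find_imdb_id("some movie tt12345"))        # None (too few digits)
```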
Posts: 45
Joined: Aug 2020
Reputation:
0
2022-05-22, 20:18
(This post was last modified: 2022-05-22, 21:11 by michelb2.)
Thanks for the answer.
Could you tell me exactly where to make the change?
I spent a lot of time this weekend searching in
metadata.themoviedb.org.python\python\scraper.py,
metadata.themoviedb.org.python\python\lib\tmdbscraper\tmdb.py
but didn't find where to act.
Everywhere in the functions I find the title, but never the filename.
It seems to me that I have to act earlier, but I don't know where.
I don't understand why you want to change "cleandatetime", because if you can replace "name_of_file_or_not (date) [tt123465].extension" with "tt123465.extension" before applying the search, the result should be correct?
And if you clean the date, you can miss the right movie if there is no tt number in the filename.
Posts: 612
Joined: Sep 2010
Reputation:
148
Filename cleaning is done by Kodi before sending the title and year to the scraper. Scrapers have no access to the filename.
To make this work with your particular naming scheme, you would have to edit the line in the scraper that you have identified, and also change the "cleandatetime" setting as noted.
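To make the effect of the title/year split easier to picture, here is a small Python simulation. The pattern is a simplified, hypothetical stand-in that I wrote for this example; it is NOT Kodi's actual default "cleandatetime" expression and it does not model "cleanstrings" at all. It only shows the general idea: whatever comes after the year never reaches the scraper.

```python
import re

# Hypothetical, simplified stand-in for a cleandatetime-style pattern.
# Group 1 = title (everything before the year), group 2 = year.
TITLE_YEAR = re.compile(r"^(.*?)[ ._(\[]+((?:19|20)\d{2})(?!\d)")

def split_title_year(name):
    """Return (title, year); text after the year is dropped."""
    m = TITLE_YEAR.search(name)
    if m:
        return m.group(1).strip(), m.group(2)
    return name, None

# IMDb number after the year: it is cut off with the rest of the name
print(split_title_year("nimporteqoui (2019) [tt6063090] (4k,vostfr)"))
# ('nimporteqoui', '2019')

# IMDb number before the year: it stays in the title passed on
print(split_title_year("nimporteqoui [tt6063090] (2019) (4k,vostfr)"))
# ('nimporteqoui [tt6063090]', '2019')
```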
Posts: 45
Joined: Aug 2020
Reputation:
0
2022-05-28, 18:24
(This post was last modified: 2022-05-29, 17:51 by michelb2.)
Just for the record:
in addons\metadata.themoviedb.org.python\python\lib\tmdbscraper\tmdb.py
I added:
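    import re  # assumed to be needed at the top of tmdb.py if not already imported there
    def _parse_media_id(title):
        # look for an IMDb number (tt followed by digits) anywhere in the title
        m = re.search(r"(tt\d+)", title)
        if m:
            return {'type': 'imdb', 'id': m.group()}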
WITHOUT ANYTHING ELSE in the function,
and if the IMDb number is BEFORE the date, the IMDb number alone is enough to scrape:
"nimporteqoui (2019) [tt6063090] (4k,vostfr).iso" not OK
"nimporteqoui (2019) (4k,vostfr) [tt6063090].iso" not OK
"nimporteqoui [tt6063090] (2019) (4k,vostfr).iso" OK
As you say, sadly Kodi removes the tt number before I can do anything else (except maybe with cleandatetime!! but I did not try a movie without an IMDb number in that case).
My mistake:
"nimporteqoui [tt6063090] (4k,vostfr) (2019).iso" is not OK, but I don't understand why.
OK, I found it. With
<cleanstrings> <regexp>\(.*\)?</regexp> </cleanstrings>
"nimporteqoui [tt6063090] (4k,vostfr) (2019).iso" is OK
So both
title [imdb] (date) (misc).ext
title [imdb] (misc) (date).ext are OK.