
**markg85** · 2019-12-17, 22:48

Hi,

My database is plagued with false entries. That is because many of my folder structures look like this:

Movie
- Video file <actual movie>
- Video file <trailer>
- ..

This causes (or so I think, but it's a fairly educated guess) the movie to appear multiple times.
I can clean this up, but I need to do that directly in the Kodi SQLite database, as I don't want to change the folder contents. There might be other ways, but this is the only one I know of.
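For the manual clean-up route, a first step could be listing which IMDb IDs are attached to more than one movie entry. This is a minimal sketch, not Kodi code: it builds an in-memory database with a simplified, hypothetical `uniqueid` table (columns `media_id`, `media_type`, `value`, `type`), and the IDs are made up. Kodi's real database schema may differ, so check it and back up the database before running anything against it.

```python
import sqlite3

# Simplified, hypothetical stand-in for Kodi's "uniqueid" table.
SCHEMA = """
CREATE TABLE uniqueid (
    media_id   INTEGER,
    media_type TEXT,
    value      TEXT,
    type       TEXT
)
"""


def find_duplicate_ids(conn):
    """Return [(imdb_id, [media_id, ...])] for IMDb IDs used by more than one movie."""
    rows = conn.execute(
        """
        SELECT value, GROUP_CONCAT(media_id)
        FROM uniqueid
        WHERE media_type = 'movie' AND type = 'imdb'
        GROUP BY value
        HAVING COUNT(*) > 1
        ORDER BY value
        """
    ).fetchall()
    # GROUP_CONCAT order is not guaranteed, so sort the IDs explicitly.
    return [(value, sorted(int(i) for i in ids.split(","))) for value, ids in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO uniqueid VALUES (?, ?, ?, ?)",
        [
            (1, "movie", "tt0000001", "imdb"),  # the actual movie (made-up ID)
            (2, "movie", "tt0000001", "imdb"),  # a trailer scraped as the same movie
            (3, "movie", "tt0000002", "imdb"),
        ],
    )
    print(find_duplicate_ids(conn))
```

The query only reports duplicates; deciding which entry is the real movie (and deleting the others) is left to the user.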

A clean, nice solution, IMHO, would be to make the indexer a little more intelligent.
The indexer knows (or should know) the movie's length from IMDb (it knows the IMDb ID, as it's present in my "uniqueid" table).
If it used the exact runtime that IMDb gives as a check against the files to index, it probably wouldn't index much, since the length can deviate a bit.

So here's the algorithm I would apply:

- Get the information from the scraper.
- Get the video length of all the potential files.
- Index the file whose length is within 10% of the length IMDb reports.
- Disregard the rest.
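The steps above can be sketched in Python. This is a minimal sketch, not Kodi code: `pick_movie_files` is a hypothetical name, and it assumes the file durations (in minutes) have already been measured by some external tool, which the sketch does not do itself. The durations in the example are made up.

```python
def pick_movie_files(durations, reference_minutes, tolerance=0.10):
    """Return the files whose duration is within `tolerance` of the reference runtime.

    durations: mapping of file name -> measured duration in minutes (assumed input)
    reference_minutes: runtime reported by the scraper (e.g. IMDb/TMDb)
    tolerance: allowed relative deviation (0.10 = 10%)
    """
    if reference_minutes <= 0:
        raise ValueError("reference runtime must be positive")
    margin = reference_minutes * tolerance
    # Everything outside the margin is simply left out ("disregard the rest").
    return sorted(
        name
        for name, minutes in durations.items()
        if abs(minutes - reference_minutes) <= margin
    )


if __name__ == "__main__":
    # Hypothetical folder contents with made-up durations.
    folder = {"movie.mkv": 118.0, "trailer.mkv": 2.5, "extras.mkv": 14.0}
    print(pick_movie_files(folder, reference_minutes=120))
```

If more than one file survives the check, that is the case where the folder itself needs cleaning up.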

You could still end up with multiple files indexed for the same movie, but only if your folder really contains multiple files close to the movie length reported by IMDb. In that case, you really do need to clean up your folder.

I think I can implement this, but I don't know where the code for these scrapers lives. In my case, "themoviedb" in particular.
All of the above applies to TV series too.

Cheers,
Mark

**Karellen** · 2019-12-17, 23:09

Sorry, but I am not really following what you are trying to do.

For a start, your folder structure description is vague, so post a screenshot of an actual movie folder.

(2019-12-17, 22:48)markg85 Wrote: This causes (or so I think, but it's a fairly educated guess) the movie to appear multiple times.

Probably because you named your trailers incorrectly. Read here... Trailers (wiki)

(2019-12-17, 22:48)markg85 Wrote: The indexer knows (or should know) the movie's length from IMDb (it knows the IMDb ID, as it's present in my "uniqueid" table).
If it used the exact runtime that IMDb gives as a check against the files to index, it probably wouldn't index much, since the length can deviate a bit.

No idea what you are trying to do. Why do you need to do this?

**markg85** · 2019-12-18, 02:12

(2019-12-17, 23:09)Karellen Wrote: For a start, your folder structure description is vague, so post a screenshot of an actual movie folder.

Really?
Then how do you write down a folder structure? I tried to write a simple tree structure, which I think is quite clear.
Anyhow, here's an image if that helps.

The indexer indexes this as two entries of "A Test Movie": one plays the trailer and the other plays the actual movie.
That's annoying; I want to get rid of the non-movie entries!

In case it helps, these are the indexer settings:

<settings version="2">
  <setting id="certprefix" default="true">Rated </setting>
  <setting id="fanart">true</setting>
  <setting id="imdbanyway" default="true">false</setting>
  <setting id="keeporiginaltitle" default="true">false</setting>
  <setting id="language" default="true">en</setting>
  <setting id="RatingS" default="true">TMDb</setting>
  <setting id="tmdbcertcountry" default="true">us</setting>
  <setting id="trailer">true</setting>
</settings>

Note the "<setting id="trailer">true</setting>" line: it has absolutely nothing to do with the trailer I mentioned before.

It's an indexer setting that lets you play a YouTube trailer for the movie.
Also, and this is important: the trailer is just an example! It could be bonus video material too. I neither want nor need the trailers to be indexed; it's fine if they come from YouTube (as I've set in the indexer settings).
To make it as clear as I possibly can: I want the indexer to index only the movie in a given folder and ignore other video files.

Quote:
Quote: The indexer knows (or should know) the movie's length from IMDb (it knows the IMDb ID, as it's present in my "uniqueid" table).
If it used the exact runtime that IMDb gives as a check against the files to index, it probably wouldn't index much, since the length can deviate a bit.
No idea what you are trying to do. Why do you need to do this?

That might not have been my clearest description ever. Sorry about that.
To rephrase: the indexer gets data from IMDb (or TMDb), and that data should contain a runtime for the movie it's scraping.
If you check that runtime against the runtimes of your video files, you should easily be able to identify which file is the movie. The lengths can still deviate a bit, since different releases of a movie have different running times, so some margin should be used (I'd say within 10% of the runtime that comes from IMDb). To complete the example: with this logic, the file "A Test Movie (2019).mkv" would be identified as the movie file. "trailer.mkv" is most likely only a few minutes long, so it would fail the check and stay out of the library.
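The 10% margin can be written as a check on a file's duration $d$ against the scraper-reported runtime $r$. The 120-minute runtime below is a made-up example, not a figure from this thread:

$$
|d - r| \le 0.1\,r \iff 0.9\,r \le d \le 1.1\,r,
\qquad r = 120\ \text{min} \implies 108\ \text{min} \le d \le 132\ \text{min}
$$

A trailer of a few minutes lands far outside that window.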

Hope that clears it up a bit :)

**Karellen** · 2019-12-18, 02:30

(2019-12-18, 02:12)markg85 Wrote: Really?
Then how do you write down a folder structure? I tried to write a simple tree structure, which I think is quite clear.

Yep, to you. But there is more than one way to interpret what you wrote and what you intended, and I don't bother guessing anymore. Just give me the real thing... that being a screenshot of the actual files, so there is no misunderstanding and no time wasted :)

Even the word "Indexer" forces me to interpret your intent... I assume "Indexer" = "Scraper".

You are naming your trailers incorrectly. If you name your trailer (as explained on the wiki page I linked and as clearly shown in its screenshots) as:
A Test Movie-trailer.mkv, then you won't scrape duplicate movies.
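To see why the folder's plain "trailer.mkv" gets picked up while "A Test Movie-trailer.mkv" would not, here is a sketch of the naming convention quoted above. It is a hypothetical helper that only mirrors the `<movie name>-trailer.<ext>` pattern from this thread; Kodi's actual scanner rules may be broader or different.

```python
from pathlib import PurePath


def matches_trailer_naming(path: str, movie_name: str) -> bool:
    """Return True if `path` is named '<movie_name>-trailer.<ext>'.

    Hypothetical helper mirroring the convention quoted in this thread,
    not Kodi's scanner logic.
    """
    p = PurePath(path)
    # The extension (.mkv, .mp4, ...) is ignored; only the stem is compared.
    return p.suffix != "" and p.stem == f"{movie_name}-trailer"


if __name__ == "__main__":
    for name in ["A Test Movie-trailer.mkv", "trailer.mkv"]:
        print(name, matches_trailer_naming(name, "A Test Movie"))
```

A bare "trailer.mkv" carries no link to the movie's name, so it fails the check and would be treated as just another video file.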

As for runtimes... no, we would not modify scrapers to determine scraping based on runtime. Name your files correctly.

**markg85** · 2019-12-18, 03:05

(2019-12-18, 02:30)Karellen Wrote: You are naming your trailers incorrectly. If you name your trailer (as explained on the wiki page I linked and as clearly shown in its screenshots) as:
A Test Movie-trailer.mkv, then you won't scrape duplicate movies.

No, I'm not. I'm trying to ignore them!
As I said before, the trailer is just an example of files that "could" be with a movie. Sometimes I have other video files in there too, and they all cause false movie entries!

(2019-12-18, 02:30)Karellen Wrote: As for runtimes... no, we would not modify scrapers to determine scraping based on runtime. Name your files correctly.

Would you be so kind as to point me to the source of these scrapers? Or to how I can make my own?
I did see https://kodi.wiki/view/HOW-TO:Write_media_scrapers, and if that's still the way, then I'm totally amazed that everything is done in XML!
I didn't even know that you could go that far in XML!

Is there a way to proxy that to an actual programming language? C++? JavaScript? Python? I want to run checks.

**Karellen** · 2019-12-18, 03:50

(2019-12-18, 03:05)markg85 Wrote: No, I'm not. I'm trying to ignore them!

Hmmm. Then use one of these... Extras (wiki)
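If the extras can be moved out of the main movie folder, a scanner only has to skip one subfolder. This sketch assumes a hypothetical layout in which extras live in a subfolder named `Extras`; check the Extras wiki page for the layouts Kodi actually supports. It only filters path strings and does not interact with Kodi.

```python
from pathlib import PurePosixPath


def is_in_extras_folder(path: str, folder_name: str = "Extras") -> bool:
    """True if any parent directory of `path` is named `folder_name` (case-insensitive).

    Hypothetical helper: the folder name is an assumption, not Kodi's rule.
    """
    parents = PurePosixPath(path).parent.parts
    return any(part.lower() == folder_name.lower() for part in parents)


def scan_candidates(paths):
    """Keep only the paths that are not inside an extras folder."""
    return [p for p in paths if not is_in_extras_folder(p)]


if __name__ == "__main__":
    folder = [
        "/movies/A Test Movie (2019)/A Test Movie (2019).mkv",
        "/movies/A Test Movie (2019)/Extras/trailer.mkv",
        "/movies/A Test Movie (2019)/Extras/making-of.mkv",
    ]
    print(scan_candidates(folder))
```

Only parent directories are checked, so a file merely named `extras.mkv` is not skipped; a movie whose own folder happens to be called "Extras" would be, which is one reason a real implementation would need a narrower rule.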

(2019-12-18, 03:05)markg85 Wrote: Would you be so kind as to point me to the source of these scrapers? Or to how I can make my own?

https://github.com/xbmc/repo-scrapers

(2019-12-18, 03:05)markg85 Wrote: Is there a way to proxy that to an actual programming language? C++? JavaScript? Python? I want to run checks.

Python v3... written from scratch.

**Karellen** · 2019-12-18, 03:51

Moved to metadata scrapers forum