[Req] Track assets by checksum
#1
In my exercises with xbmc and files on disk, one thing that has started to annoy me is xbmc's inability to track files as they move on disk. This bothers me because each time I move a file, xbmc does another scrape - it appears to have no idea that the information already in its library could be reused.

For example, if I have H:\dir1\movie.mkv and I move it to H:\dir2\movie.mkv, I end up with two copies of "movie.mkv" in the Library.

If I turn on a specific feature I can have xbmc delete the old entry, but that's not the right fix.

What has happened is that I have an object in my library, called "movie", and the path to it is just one property of that object. When I add the file to the library the path property has a value of X, but at some later time I may want to give it a value of Y.

Thus what I'd like to propose is that xbmc index video assets by a checksum that is unique for each file - maybe a SHA512 of the first 1k of data. This would be used as a primary key for multimedia assets, so that when a new file is found, the first 1k of data is read and hashed. If there is a matching hash, and the old path for that asset no longer points to an existing file, then the path is updated to the new one.
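
To make that concrete, here is a minimal sketch in Python; the 1 KiB head size and the function name are just illustrative assumptions, not anything xbmc does today:

```python
import hashlib

def asset_fingerprint(path, head_bytes=1024):
    """SHA-512 of the first 1 KiB of the file.

    head_bytes is a tunable guess; a larger window would make collisions
    between files that share identical headers less likely.
    """
    with open(path, "rb") as f:
        head = f.read(head_bytes)
    return hashlib.sha512(head).hexdigest()
```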

Various heuristics could be layered on top, such as requiring the filename to be the same, to increase the likelihood that it really is the same asset in a new location.
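
Putting the fingerprint and the filename heuristic together might look roughly like this (the "library" here is just a hypothetical dict of fingerprint to last-known-path, standing in for the real database):

```python
import os

def reconcile(library, new_path, require_same_name=True):
    """library: hypothetical dict of fingerprint -> last known path."""
    fp = asset_fingerprint(new_path)   # from the sketch above
    old_path = library.get(fp)
    if old_path is None:
        library[fp] = new_path         # genuinely new asset: scrape it
        return "new"
    if os.path.exists(old_path):
        return "duplicate"             # same content present in two places
    if require_same_name and os.path.basename(old_path) != os.path.basename(new_path):
        return "ambiguous"             # same head bytes but a different name: be cautious
    library[fp] = new_path             # old path is gone: treat it as a move
    return "moved"
```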

Another way to avoid scraping for information that is already in the library would be to teach the scraper to search the internal database properly before going external. That would stay consistent with today's behaviour of keeping two entries for a moved asset (the old one that is no longer present and the new one that is).
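
A rough sketch of that local-first lookup; the filename parsing and the external_scrape callback are placeholders I made up, not the real scraper interface:

```python
import os
import re

def local_first_scrape(library, new_path, external_scrape):
    """Check the library for a matching title/year before going online.

    library: hypothetical dict of entries like {"title": ..., "year": ..., "path": ...};
    external_scrape: callable standing in for the real online scraper.
    """
    name = os.path.splitext(os.path.basename(new_path))[0]
    m = re.search(r"(19|20)\d{2}", name)                  # very rough year detection
    year = m.group(0) if m else None
    title = re.sub(r"[._]+", " ", name[: m.start()] if m else name).strip().lower()
    for entry in library.values():
        if entry["title"].lower() == title and entry.get("year") == year:
            return dict(entry, path=new_path)             # reuse metadata already on hand
    return external_scrape(title, year)                   # only now go out to the web
```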

Now that I've written all of that, I realise it presupposes that I don't want duplicate copies of an asset in my library. Would preventing duplicate copies in the library be a problem? Is there anything, or anyone, that requires two paths to movie.mkv?
Reply
#2
+1 I was thinking the same thing, but I gave up since I thought you would need to checksum the entire file, which takes a lot of processing power and time, especially for large files.
Your solution of hashing just the first 1k of data is great.
Reply
#3
Not a bad idea. I also wonder if there's ever a use case for having the same file in two different paths. I know I never do, but I guess it's possible someone out there does.

It might also be better to use a pre-existing standard for the hashes, such as http://trac.opensubtitles.org/projects/o...ourceCodes . That could help with subtitle lookups too if it were used.
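
For reference, the OpenSubtitles hash described at that link is, as far as I understand it, just the file size plus a 64-bit sum of the first and last 64 KiB, so it would be about as cheap to compute as hashing the first 1k; a sketch:

```python
import os
import struct

def opensubtitles_style_hash(path):
    """File size plus the little-endian uint64 sums of the first and
    last 64 KiB, truncated to 64 bits (my reading of the linked spec)."""
    chunk = 64 * 1024
    size = os.path.getsize(path)
    if size < chunk * 2:
        raise ValueError("file too small for this hash")
    h = size
    with open(path, "rb") as f:
        for offset in (0, size - chunk):
            f.seek(offset)
            buf = f.read(chunk)
            h = (h + sum(struct.unpack("<%dQ" % (chunk // 8), buf))) & 0xFFFFFFFFFFFFFFFF
    return "%016x" % h
```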
Reply
#4
Why re-invent the wheel? If something has moved, you clean the library to remove the duplicates; for me that takes seconds. Besides, I am willing to bet that most of us don't move files around often enough to require the scraper to play bounty hunter.
Reply
#5
mythtv does this, but it is annoying in the opposite scenario: if you rename a file to make it scrape better, mythtv doesn't try to scrape again, because the file's hash is already in the database (with the wrong info).
Reply
#6
Doesn't sound like mythtv implemented it well. It does speak to how sticking with a more straightforward method produces fewer problems for users. There would be so many things to consider. I have over 400 movies and currently over 5000 files associated with them. In addition, I am re-ripping every day, so my library is constantly changing from multiple files to a single file. Kodi handles this with no problems as is. Why change?
Reply
