2014-03-06, 16:17
In my experiments with xbmc and files on disk, one thing that is starting to annoy me is xbmc's inability to track files as they move on disk. This bothers me because each time I move a file, xbmc does another scrape; it appears to have no idea that the information already in its library can be reused.
For example, if I have H:\dir1\movie.mkv and I move it to H:\dir2\movie.mkv, I end up with two copies of "movie.mkv" in the Library.
If I turn on a specific feature then I can have xbmc delete the old one but that's not the right thing to do.
What has really happened is that I have an object in my library, called "movie", and the path to it is just one of its properties. When I add the file to the library, the path property has a value of X, but at some later time I may want to give it a value of Y.
Thus what I'd like to propose is that xbmc index video assets by a checksum that is, for practical purposes, unique to each file. Maybe a SHA512 of the first 1k of data. This would then be used as a primary key for multimedia assets, such that when a new file is found, the first 1k of data is read and hashed. If there is a matching hash and the old path no longer points to an existing file, then the path of the asset is updated to the new one.
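To make the idea concrete, here is a rough sketch in Python of the lookup I have in mind. The `library` dict and the `head_hash` and `register` functions are made up for illustration; they are not xbmc's actual database or API, and the 1k size is just the number suggested above.

```python
import hashlib
import os

def head_hash(path, nbytes=1024):
    """Return the SHA-512 hex digest of the first nbytes of the file."""
    with open(path, "rb") as f:
        return hashlib.sha512(f.read(nbytes)).hexdigest()

def register(library, path):
    """Add a file to a hypothetical library keyed by head hash.

    library maps hash -> {"path": ..., "metadata": ...} and stands in
    for the real database. Returns what happened to the entry.
    """
    key = head_hash(path)
    entry = library.get(key)
    if entry is None:
        # Never seen before: this is where a scrape would be needed.
        library[key] = {"path": path, "metadata": None}
        return "added"
    if entry["path"] == path:
        return "unchanged"
    if not os.path.exists(entry["path"]):
        # Same content, old location gone: treat it as a move and
        # keep the existing metadata instead of scraping again.
        entry["path"] = path
        return "moved"
    # Same hash and the old file still exists: a real second copy.
    return "duplicate"
```

One caveat with this sketch: two different files may share the same first 1k (identical container headers, for instance), so a real implementation would probably want more than the head hash alone before deciding two files are the same asset.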
Various heuristics could be layered on top of this, such as requiring the filename to be the same, to increase the likelihood that it really is the same asset in a new location.
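The filename heuristic could look something like the sketch below. The function names are hypothetical, and I am assuming Windows-style paths like the ones in my example, hence `ntpath` and the case-insensitive comparison.

```python
import ntpath  # Windows path rules, importable on any platform

def same_filename(old_path, new_path):
    """True if both paths end in the same file name, ignoring case."""
    old_name = ntpath.basename(old_path).lower()
    new_name = ntpath.basename(new_path).lower()
    return old_name == new_name

def likely_moved(old_path, new_path, old_exists):
    """Heuristic: the old file is gone and the new one has the same name.

    old_exists is passed in by the caller so the check can be combined
    with whatever existence test the library already performs.
    """
    return (not old_exists) and same_filename(old_path, new_path)
```

This would be used in addition to the hash match, not instead of it: a matching hash plus a matching name is stronger evidence of a move than either alone.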
Another approach to the problem of scraping for information that is already in the library would be to teach the scraper to search the internal database first before going external. That would remain consistent with today's behaviour of having two entries in the library for a moved asset (the old one that is no longer present and the new one that is).
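In code terms, this second approach is just a local lookup in front of the external scrape. A minimal sketch, where `local_db`, `scrape` and the choice of `key` (a hash, a title, whatever the library can match on) are all placeholders and not real xbmc interfaces:

```python
def lookup_metadata(key, local_db, scrape):
    """Return (metadata, source), preferring what the library already has.

    local_db: dict-like store standing in for the internal database.
    scrape:   callable that fetches metadata externally for a key.
    """
    cached = local_db.get(key)
    if cached is not None:
        # Already known locally, so no external request is made.
        return cached, "library"
    fetched = scrape(key)
    local_db[key] = fetched
    return fetched, "scraper"
```

With this in place the moved file would still get a second library entry, but filling it in would cost a database read rather than another trip to the scraper.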
Now that I've written all of that, I realise it presupposes that I don't want duplicate copies of an asset in my library. Would preventing duplicate copies of an asset in the library be a problem? Is there anything or anyone that requires two paths to movie.mkv?