Thank you so much for the detailed explanation, Jonathan. I can see why the system was designed this way, and I'm glad to hear you're looking into ways of improving it so this situation isn't repeated. There really should be some safeguard that prevents "dead URL" from ever implying "local file no longer valid" (the remote URL shouldn't matter once the data is safely on disk). Then again, maybe this won't recur once everyone has fully migrated to the new Texture database format: even if a URL dies, the database will still contain a cached URL-to-local-file mapping and continue using the same image.
I'm especially grateful that you pointed me to the right source files. That let me understand the old hashing method, and I've hastily written a small export tool (I didn't want to bother re-installing an older XBMC version and using its export, since that, as far as I know, doesn't produce human-readable .tbn filenames either).
It was one of those "okay, I don't know Python and I don't really care about doing this perfectly, but I'm constructing this as I go" things, but it works. It queries the latest database for all movies, grabs their path and filename data, applies the old hashing logic (supporting both single-file and stacked-file movies), and copies the cover and fanart (if present) to nicely named files in an output folder, where they're easy to look up.
Example:
The Godfather: Part II would be output as "The Godfather Part II_cover.tbn" and "The Godfather Part II_fanart.tbn"
I didn't implement the other ideas (such as sending an HTTP HEAD request to check whether the image URL is dead) because I couldn't find where the *web* URL of the currently selected cover/fanart is stored in the database, and because what I have now is good enough.
It prints status along the way, such as:
[One Flew Over the Cuckoo's Nest] COV:True FAN:True
[24 Hour Party People] COV:True FAN:True
[Glengarry Glen Ross] COV:True FAN:True
The output-name sanitizer (basically just a regexp I wrote) supports unicode such as до свидания, so it's safe to run on foreign-language titles as well.
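To show what the sanitizer does, here's the same pattern the script uses, standalone:

```python
import re

# Keep unicode word characters and spaces; strip everything else.
# (In Python 3, str patterns are unicode-aware by default, so re.UNICODE
# is redundant but harmless.)
pattern = re.compile(r"[^\w ]+", re.UNICODE)

def sanitize(name):
    return pattern.sub("", name)

print(sanitize("The Godfather: Part II"))  # -> The Godfather Part II
print(sanitize("до свидания!"))            # -> до свидания
```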
Anyway, the basic script idea (which is how I used it):
* Edit the script to set your proper user data and output folders
* Execute it to export all found movie covers/fanart (it only deals with the movie library)
* Launch the latest version of XBMC and scroll through the movies
* Anything with a missing cover or fanart: do a Refresh from the menu to grab the latest URLs, look at the exported thumb and fanart to see which one was used before, and pick the same one. If the fanart was custom-imported from the local disk, just point XBMC directly at the exported .tbn file to re-import the previous image. Alternatively, you could skip Refresh altogether and simply import all the old thumbnails from disk.
* After a little work (far less than without this script) you'll have reinstated the exact covers/thumbnails that were lost.
It's a very crude method (since I don't know the ins and outs of the XBMC database structure), but it works. I would have liked to implement the proper check ("HEAD request: is this URL still valid? Yes? Do nothing, XBMC will find it again. No? Export the old cover/fanart.") so that only the required ("orphaned") covers/fanarts are exported, and to support more than just the movie library, but honestly I don't have time to learn the remaining pieces of the database to do those things.
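For whoever does find where the web URL is stored: the HEAD check itself is only a few lines. A minimal sketch (the function name is mine, and any real URL would come from the database):

```python
import urllib.request
import urllib.error

def url_is_alive(url, timeout=10):
    """Send an HTTP HEAD request; True if the server answers 2xx/3xx."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError):
        # DNS failure, connection refused, HTTP error, malformed URL, ...
        return False
```

A dead URL would then mean "export the old cached cover/fanart"; a live one can be left for XBMC to re-fetch.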
I still hope someone will find this script useful in the future, or even adapt it to do more of the intended features.
Variables/strings to change:
xbmcuserdata = your own userdata path, with trailing slash (used to locate the database and thumbnails)
outputpath = desired output folder, with trailing slash, must have write permissions
"MyVideos64.db" = can be changed to a lower database version; just use the highest one you have
Code:
#!/usr/bin/env python3
import os.path
import re
import shutil
import sqlite3 as lite
import sys

def get_crc32(string):
    # XBMC's legacy thumbnail hash: CRC32 with polynomial 0x04C11DB7,
    # no reflection and no final XOR, over the lowercased string.
    crc = 0xffffffff
    for b in bytearray(string.lower().encode()):
        crc ^= b << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = (crc << 1) ^ 0x04C11DB7
            else:
                crc <<= 1
        crc &= 0xFFFFFFFF
    return '%08x' % crc

def get_thumbnails(moviepath, moviefiles):
    # Covers were hashed on the folder path; fanart on the full file path
    # (or on the entire stack:// URL for multi-file movies).
    coverid = get_crc32(moviepath)
    coverpath = "Video/%s/%s.tbn" % (coverid[:1], coverid)
    fanartid = get_crc32(moviefiles if moviefiles[:8] == "stack://" else moviepath + moviefiles)
    fanartpath = "Video/Fanart/%s.tbn" % fanartid
    return (coverpath, fanartpath)

def copy_thumb(userdatapath, tbn, outputfile):
    # Copy one cached thumbnail to its export location; returns True if the
    # exported file exists afterwards.
    thumbpath = userdatapath + "Thumbnails/" + tbn
    if not os.path.isfile(thumbpath):
        return False
    if not os.path.isfile(outputfile):  # don't overwrite an earlier export
        try:
            shutil.copyfile(thumbpath, outputfile)
        except IOError as e:
            print("Unable to copy file: %s" % e)
            sys.exit(1)
    return True

# both paths must already exist, and the output path needs write permissions!
xbmcuserdata = "/home/mediacenter/.xbmc/userdata/"
outputpath = "/storage/incoming/xbmcexport/"

con = None
try:
    # note: it's fine to use the latest database version, as it contains all required info as well
    con = lite.connect(xbmcuserdata + "Database/MyVideos64.db")
    cur = con.cursor()
    cur.execute("SELECT movie.c00, movie.c22, files.strFilename "
                "FROM movie "
                "JOIN path ON path.strPath = movie.c22 "
                "JOIN files ON files.idPath = path.idPath")
    pattern = re.compile(r"[^\w ]+", re.UNICODE)
    for (moviename, moviepath, moviefiles) in cur.fetchall():
        coverpath, fanartpath = get_thumbnails(moviepath, moviefiles)
        safemoviename = pattern.sub("", moviename)
        coversuccess = copy_thumb(xbmcuserdata, coverpath, outputpath + safemoviename + "_cover.tbn")
        fanartsuccess = copy_thumb(xbmcuserdata, fanartpath, outputpath + safemoviename + "_fanart.tbn")
        print("[%s] COV:%s FAN:%s" % (moviename, coversuccess, fanartsuccess))
except lite.Error as e:
    print("Error: %s" % e.args[0])
    sys.exit(1)
finally:
    if con:
        con.close()
I can also understand why you didn't want to bloat the XBMC source by replicating the old hashing method (it was pretty complex, with many special cases for different library types) just to do an in-place migration of broken URLs.
However, here's an idea I just had: if one knew where the currently chosen cover/fanart image URLs are stored in the database, one could verify whether the files still exist online, and if not:
* Run the legacy hashing code on the paths + files to find the old local files for cover + fanart (such as "Video/b/b1ae7b4a.tbn")
* Then compute crc32(oldbrokenURL) to get the URL-based hash for that broken URL
* Copy "Video/b/b1ae7b4a.tbn" to "Video/<first hex digit>/<newURLhash>.tbn"
* Insert an entry in the Textures database mapping the old broken URL to its hashed filename.
Voila, in-place upgrades of the locally cached versions of old broken URLs.
This script is a good idea because there will be a swarm of people with "OMG NEW XBMC SUCKS MISSING HALF MY IMAGES IN MY 5000 MOVIE LIBRARY".
Could even be run automatically ONCE and then delete itself (the script).
The script would basically replicate the exact same thing that can happen in the future anyway:
* The CURRENT (valid) URLs can ALSO become invalid someday, causing the user to have invalidURL->cachedThumb mappings in Textures.db
* So why not import the old, now-dead URLs in the same way; it doesn't matter anyway. All URLs are bound to break some day and an import would help users.
Of course, the best method of all (as you say) would be to not store direct URLs at all, but instead some form of siteName + movieID + fanartID mapping, so URLs can't break in the future: sites would be free to move images around as much as they want (short of the site dying completely or the art being permanently deleted) without affecting our Textures database/local cache. In the same vein, it would be nice to move from crc32(URL)-named local files to something like "Thumbnails/sitename/movieid_imageid.tbn", to get away from the URL-centric system.
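A toy version of what such a URL-independent naming scheme could look like (the function name and the exact layout are just my illustration, not anything XBMC actually does):

```python
import re

def art_cache_path(site, movie_id, image_id, ext="tbn"):
    # Build a stable cache path of the form
    # Thumbnails/sitename/movieid_imageid.tbn -- no URL involved, so the
    # site can shuffle its image URLs around without invalidating the cache.
    safe_site = re.sub(r"[^\w-]+", "", site.lower())
    return "Thumbnails/%s/%s_%s.%s" % (safe_site, movie_id, image_id, ext)

print(art_cache_path("themoviedb.org", "tt0071562", "fanart3"))
# -> Thumbnails/themoviedborg/tt0071562_fanart3.tbn
```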
PS: I've now broken my Python virginity. Next on my list: The real life one!