dteirney Wrote: Does it make sense to further tweak the title matching in the VideoInfoScanner? That fallback mechanism helped to match some of my library since some shows have a subtitle, but nothing else, e.g. no original air date or no season and episode information.
This was my next thought.
I was thinking of doing it the same way movies are done, but then realized that is all server-side matching.
My thought was:
Strip off punctuation and all leading articles (a, an, the) from both sources, then use something like Contains() to see whether the title I have is a substring of the tvdb string. This heuristic would work well for the set of programs I have looked at so far, but I'm not sure it is a great general-purpose answer. It also may not work as well in languages other than English.
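Roughly what I have in mind, sketched in Python just to show the idea. The helper names (normalize_title, title_matches) are made up for illustration, the article list is English-only, and only ASCII punctuation is stripped, so treat it as a starting point rather than the actual scanner code:

```python
import string

# Assumption: English-only list of leading articles.
LEADING_ARTICLES = ("a ", "an ", "the ")

def normalize_title(title):
    """Lowercase, drop ASCII punctuation, collapse whitespace, strip one leading article."""
    cleaned = title.lower().translate(str.maketrans("", "", string.punctuation))
    cleaned = " ".join(cleaned.split())
    for article in LEADING_ARTICLES:
        if cleaned.startswith(article):
            cleaned = cleaned[len(article):]
            break
    return cleaned

def title_matches(local_title, tvdb_title):
    """True if the normalized local title is a substring of the normalized tvdb title."""
    local = normalize_title(local_title)
    remote = normalize_title(tvdb_title)
    # An empty local title would be a substring of anything, so reject it.
    return bool(local) and local in remote
```

For example, title_matches("Pilot", "The Pilot (Part 1)") comes out true, while an empty local title never matches.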
But something like 'similar_text' looks quite promising. It looks like it is pretty slow, but I don't imagine we're doing enough compares or long enough ones to make that an issue.
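To get a feel for fuzzy scoring, here is a small Python sketch that uses difflib.SequenceMatcher from the standard library as a stand-in. It is not the same function as similar_text, and the function names and the 0.8 threshold are my own guesses that would need tuning against real data:

```python
from difflib import SequenceMatcher

def title_similarity(a, b):
    """Return a case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_title_match(local_title, candidates, threshold=0.8):
    """Return the candidate with the highest score, or None if none reaches the threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = title_similarity(local_title, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None
```

With only a few dozen short episode titles per show, the cost of comparing every candidate should not matter much.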
Assuming we can find a string-compare function that works, I'd like to adapt it to the case where multiple episodes share the same air date as well.
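A sketch of how the same-air-date case might work, again in Python with difflib standing in for whatever compare function we end up with. The dictionary keys ("aired", "title") and the function name are hypothetical and do not reflect the real tvdb or myth schema:

```python
from difflib import SequenceMatcher

def pick_episode(recording_title, air_date, episodes):
    """Pick the episode aired on air_date, using title similarity to break ties."""
    same_day = [ep for ep in episodes if ep["aired"] == air_date]
    if not same_day:
        return None
    if len(same_day) == 1:
        return same_day[0]
    # Several episodes share the date: keep the one whose title is closest.
    wanted = recording_title.lower()
    return max(
        same_day,
        key=lambda ep: SequenceMatcher(None, wanted, ep["title"].lower()).ratio(),
    )
```

The date still does most of the work here; the title comparison only decides between episodes that aired on the same day.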
FYI, I found this, which describes lots of methods:
http://staffwww.dcs.shef.ac.uk/people/S....trics.html
And from there this library:
http://sourceforge.net/projects/simmetrics/
I think I'll put together a test case and run it against my db to see what I can get.
dteirney, I don't think it makes sense to hold up the existing work before figuring this out. Looks like spiff is ok with what you've got so far, so let's try to get that in, then work on further heuristics to improve things. At least in North America, the patches as-is have a very high success rate on anything recorded with SD.
Edit:
The other thing to remember is that when I'm done, all myth stuff should be available in the library regardless of whether there was a DB match, so I am more focused on episodes than on tv-shows, where myth's info can be almost as good as what you get from thetvdb.