Finally I got everything working the way I want. Many thanks to olympia. Without you I would never have sorted this out.
I'm posting my results here in case somebody finds them useful for organizing their own movie collection.
File names should contain the movie title and the year. I tried 2 formats, and both work perfectly:
1. Sidney Lumet - Dog Day Afternoon (1975).part1.avi
2. Sidney Lumet - Dog Day Afternoon part1 (1975).avi
The third is the obvious one, for when the movie is not split into parts:
3. Sidney Lumet - Dog Day Afternoon (1975).avi
There are 3 very important points I've learned about XBMC scraping:
1. It automatically cuts the year and the file extension from the file name before the scraper even starts working:
Sidney Lumet - Dog Day Afternoon (1975).avi
Sidney Lumet - Dog Day Afternoon - this is what reaches the scraper in buffer $$1 (well, not exactly this, see item 3)
The buffer $$2 in this case will contain "1975" - the year with its parentheses stripped, even before the scraper starts.
2. It automatically recognizes words like "part[1-9]" and "cd[1-9]", cuts them off, and displays the parts as one item in the movie library. No further action is required from the scraper. Thus
Sidney Lumet - Dog Day Afternoon (1975).part1.avi
Sidney Lumet - Dog Day Afternoon (1975).part2.avi
are scraped as one item, which is
Sidney Lumet - Dog Day Afternoon (well, not exactly this, see item 3)
in buffer $$1, before the scraper applies its regular expressions.
3. Items in $$1 reach the scraper URL-encoded and lower-cased. Thus in our example $$1 will actually contain
sidney%20lumet%20%2d%20dog%20day%20afternoon
All spaces are replaced with %20 and the dash is replaced with %2d
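The three points above can be summed up in a small Python sketch. This is only an approximation of the behaviour I observed, not XBMC's actual code: the function name simulate_buffers, the helper percent_encode_lower, and the exact patterns are my own assumptions for illustration.

```python
import re
import string

# Characters left as-is; everything else is percent-encoded with lower-case hex.
SAFE = set(string.ascii_lowercase + string.digits)


def percent_encode_lower(text):
    """Lower-case the text, then percent-encode every byte outside [a-z0-9]."""
    out = []
    for byte in text.lower().encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "%{:02x}".format(byte))
    return "".join(out)


def simulate_buffers(filename):
    """Return an approximation of ($$1, $$2) for a movie file name."""
    # Drop the file extension, e.g. ".avi".
    stem = re.sub(r"\.[A-Za-z0-9]{2,4}$", "", filename)
    # Drop stacking markers such as ".part1" or " cd2".
    stem = re.sub(r"[ .]*(?:part|cd)[1-9]", "", stem, flags=re.IGNORECASE)
    # Take the year out of its parentheses.
    year_match = re.search(r"\((\d{4})\)", stem)
    year = year_match.group(1) if year_match else ""
    title = re.sub(r"\s*\(\d{4}\)", "", stem).strip()
    return percent_encode_lower(title), year


for name in (
    "Sidney Lumet - Dog Day Afternoon (1975).part1.avi",
    "Sidney Lumet - Dog Day Afternoon part1 (1975).avi",
    "Sidney Lumet - Dog Day Afternoon (1975).avi",
):
    print(simulate_buffers(name))
```

All three file names give the same pair: the encoded title for $$1 and "1975" for $$2.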
I had to slightly modify the default scrapers so that they work with my file naming. Here is what I have for now:
1. TMDB scraper (on Ubuntu: ~/.xbmc/addons/metadata.themoviedb.org/tmdb.xml)
<RegExp input="$$1" output="<url>http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1+$$2</url>" dest="3">
There was an inner regexp and I removed it, because it does absolutely nothing. I added "+$$2" to the URL so that it also searches by year. This is supported functionality of the TMDB public API, so I don't know why it was not used in the default scraper. I also used my own regexp to parse file names. I tried ".+?" instead of ".+", as suggested by bambi73, but it appears too "lazy": in my tests it may take just "D" from "Dog Day Afternoon". I'm sorry if I'm wrong here, because I'm not as strong in regexps as bambi73.
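The "too lazy" effect is easy to reproduce outside XBMC. Here is a minimal sketch using Python's re module, assuming the scraper's regex engine treats lazy quantifiers the same way. The helper first_capture and the sample string are mine, not part of any scraper.

```python
import re


def first_capture(pattern, text):
    """Return what the first capture group matches, or None if there is no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


title = "dog%20day%20afternoon"

# A lazy group with nothing after it stops as early as it can: one character.
print(first_capture(r"(.+?)", title))

# A greedy group takes as much as it can: the whole string.
print(first_capture(r"(.+)", title))

# A lazy group becomes useful only when something after it forces it to grow,
# for example an end-of-string anchor.
print(first_capture(r"(.+?)$", title))
```

So ".+?" on its own is not wrong, it just needs an anchor or a following pattern to tell it where to stop.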
Unfortunately, TMDB appears not to contain information about some of my movies (Woody Allen - Manhatten - what the heck, is it so rare?). That's why I also used another scraper - IMDB.
Edit: That was my typing mistake. "Manhatten" should be "Manhattan". And of course TMDB could find a misspelled word too, but anyway I'm happy it was found at all.
2. IMDB scraper (on Ubuntu: ~/.xbmc/addons/metadata.imdb.org/imdb.xml)
<CreateSearchUrl dest="3" SearchStringEncoding="iso-8859-1">
<RegExp input="$$1" output="<url>http://akas.imdb.com/find?s=tt;q=\1$$4</url>" dest="3">
<RegExp input="$$2" output="%20(\1)" dest="4">
Again, the same regexp, but now there is also an inner one for the year. It was already there and I didn't touch it, though I find it strange to use an inner regexp that just prefixes the year with %20 and wraps it in parentheses in buffer $$4, when you could add "%20($$2)" to the URL directly. Anyway, the scraper works 95% of the time for me, so please consider making that change yourself if you need to.
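To compare the two ways of adding the year, here is a rough Python sketch. It assumes the inner regexp uses an expression like "(.+)" (the expression is not shown above, so this is a guess) and that a regexp which does not match leaves its destination buffer empty. The helper apply_regexp and the variable names are hypothetical.

```python
import re

SEARCH_PREFIX = "http://akas.imdb.com/find?s=tt;q="


def apply_regexp(expression, output_template, value):
    """Mimic one RegExp step: expand the output only if the expression matches."""
    match = re.search(expression, value)
    return match.expand(output_template) if match else ""


def url_with_inner_regexp(title, year):
    # Inner step: input $$2, output "%20(\1)", dest 4 (expression assumed to be "(.+)").
    buffer4 = apply_regexp(r"(.+)", r"%20(\1)", year)
    return SEARCH_PREFIX + title + buffer4


def url_with_direct_append(title, year):
    # The simpler variant: put "%20($$2)" straight into the URL.
    return SEARCH_PREFIX + title + "%20(" + year + ")"


title = "sidney%20lumet%20%2d%20dog%20day%20afternoon"

# With a year in the file name both variants build the same URL.
print(url_with_inner_regexp(title, "1975") == url_with_direct_append(title, "1975"))

# Without a year they differ: the direct variant appends an empty "%20()".
print(url_with_inner_regexp(title, ""))
print(url_with_direct_append(title, ""))
```

If these assumptions hold, the inner regexp only matters for file names that have no year, so for a collection named like mine the direct form should behave the same.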
If this information is useful and somebody can point me to the corresponding wiki page, I can add it there. Or somebody can do it for me if I cannot access that wiki.