HOW-TO: Scrape anime metadata without renaming files
#1
So after much time spent googling, I've found that the answer to the question "How do I import my 2TB anime folder into XBMC?" is "spend a week of your life renaming all your files!" Instead of doing that, I spent an hour writing some regexes that match pretty much every common anime naming convention, including some pretty messy names such as: [Coldlight]_Mahou_Shoujo_Lyrical_Nanoha_StrikerS_10v2_DVD[H264][E4A98905].mkv

Caveats:
- You must use the TheTVDB scraper. It's actually pretty good for anime, though, and other scrapers do some weird things with season naming.
- For shows with multiple seasons, you must put the files into Season 1, Season 2, etc. subfolders, instead of having K-ON! and K-ON!! as separate folders in your Anime folder. If you have everything in one folder with absolute numbering, you can right-click that show in the library > change content > tvdb scraper settings > use absolute ordering, then right-click > tv show info > reload. (TheTVDB scraper limitation)
- Any specials must be put into a "Specials" subfolder, and they will not be shown in the XBMC library. You can still view them from the Files menu if you like. (another TheTVDB limitation)

Despite these drawbacks, this is still significantly less effort than renaming everything. All you need to do is add this to your advancedsettings.xml:
Code:
<advancedsettings>
    <video> <!-- Stop XBMC indexing some unwanted common items -->        
        <excludetvshowsfromscan action="prepend">
            <regexp>(?i)sample</regexp> <!-- Greedy, whole path, case insensitive ignore -->
            <regexp>(?i)uTorrentPartFile</regexp>  <!-- Ignore common scrap files -->
            <regexp>[\/\\][sS]pecials?[\/\\]</regexp>
            <regexp>[\/\\][Ee]xtras?[\/\\]</regexp>
            <regexp>(?i)[\. \-_](?:nc)?(?:op|ed|sp|pv)[\. \-_\(\[\d]</regexp>
        </excludetvshowsfromscan>
    </video>

    <tvshowmatching action="prepend"> <!-- if not using smb:// or nfs://, remove the [\\/][\\/].*? prefix from each pattern -->
        <regexp>(?i)[\\/][\\/].*?[\/\\].*?Season (\d+).*?[\\/].*[\. \-_\[](?:ep)?(\d\d)(?:[_\-\. ]?v\d)?[\. \-_\(\[].*[\]\)].*$</regexp>
        <regexp>(?i)[\\/][\\/].*?[\/\\].*?[\. \-_\[](?:ep)?(\d\d\d?)(?:[_\-\. ]?v\d)?[\. \-_\(\[].*[\]\)].*$</regexp>
        <regexp>(?i)[\\/][\\/].*?[\/\\].*?[\. \-_](\d\d?)x(\d\d?)[\. \-_].*?\.(?:mkv|mp4|avi)$</regexp>
        <regexp>(?i)[\\/][\\/].*?[\/\\].*?S(\d\d)E(\d\d).*</regexp>
    </tvshowmatching>

    <tvshowmatching action="append">

        <!-- I didn't write this block, but it's useful -->
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\]\D+[\. _\-](\d{1,2})[\. _\-]\D+</regexp>
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\].*?\D\1(\d\d)(?!.*])</regexp>
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\](\d{1,2})\W([^/\\]*)</regexp>
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\].*?\Wep?\.?(\d{1,2})\W([^/\\]*)</regexp>
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\].*?\W?episode\W?(\d{1,2})\W([^/\\]*)</regexp>
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\].*?\Wpart\W?(\d{1,2})\W([^/\\]*)</regexp>
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\].*?\Wchapter\W?(\d{1,2})\W([^/\\]*)</regexp>
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\].*?\1\W?x\W?(\d{1,2})([^/\\]*)</regexp>
        <regexp>(?i)[/\\](?:s|season)\W?(\d{1,2})\D*[/\\].*?s0?\1[ex.]{0,2}(\d{1,2})([^/\\]*)</regexp>
        
        <regexp>(?i)[\\/][\\/].*?[\\/].*?Season (\d+).*?[\\/].*?[\. \-_\[(?:ep)](\d\d?)v?\d?.*?\.(?:mkv|mp4|avi)$</regexp><!-- last resort -->
        <regexp>(?i)[\\/][\\/].*?[\\/].*?[\. \-_\[(?:ep)](\d\d?)v?\d?.*?\.(?:mkv|mp4|avi)$</regexp><!-- last resort -->
    </tvshowmatching>
</advancedsettings>
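Before restarting Kodi, it can save time to check the patterns against your own paths offline. Below is a minimal Python sketch, assuming Python's `re` treats these particular patterns the way Kodi's PCRE engine does (none of them use PCRE-only syntax), that each exclusion is searched anywhere in the full path, and that the prepended rules are tried in order with the first hit winning. The `smb://server/...` paths and the helper names `is_excluded` and `first_match` are hypothetical.

```python
import re

# <excludetvshowsfromscan> patterns from the config above, copied unchanged.
EXCLUDE = [
    r"(?i)sample",
    r"(?i)uTorrentPartFile",
    r"[\/\\][sS]pecials?[\/\\]",
    r"[\/\\][Ee]xtras?[\/\\]",
    r"(?i)[\. \-_](?:nc)?(?:op|ed|sp|pv)[\. \-_\(\[\d]",
]

# <tvshowmatching action="prepend"> patterns, copied unchanged.
PREPEND = [
    r"(?i)[\\/][\\/].*?[\/\\].*?Season (\d+).*?[\\/].*[\. \-_\[](?:ep)?(\d\d)(?:[_\-\. ]?v\d)?[\. \-_\(\[].*[\]\)].*$",
    r"(?i)[\\/][\\/].*?[\/\\].*?[\. \-_\[](?:ep)?(\d\d\d?)(?:[_\-\. ]?v\d)?[\. \-_\(\[].*[\]\)].*$",
    r"(?i)[\\/][\\/].*?[\/\\].*?[\. \-_](\d\d?)x(\d\d?)[\. \-_].*?\.(?:mkv|mp4|avi)$",
    r"(?i)[\\/][\\/].*?[\/\\].*?S(\d\d)E(\d\d).*",
]


def is_excluded(path):
    """True if any exclusion pattern is found anywhere in the path."""
    return any(re.search(pattern, path) for pattern in EXCLUDE)


def first_match(path):
    """Return (index, groups) for the first prepend pattern that matches, else None."""
    for index, pattern in enumerate(PREPEND):
        match = re.search(pattern, path)
        if match:
            return index, match.groups()
    return None


if __name__ == "__main__":
    # Hypothetical SMB paths; substitute your own share and folders.
    paths = [
        "smb://server/Anime/Nanoha StrikerS/Season 1/"
        "[Coldlight]_Mahou_Shoujo_Lyrical_Nanoha_StrikerS_10v2_DVD[H264][E4A98905].mkv",
        "smb://server/Anime/Nanoha StrikerS/Specials/Nanoha StrikerS SP01.mkv",
        "smb://server/TV/Show/Show.S02E07.720p.mkv",
    ]
    for path in paths:
        print("excluded" if is_excluded(path) else first_match(path), path)
```

With these paths, the messy Coldlight file is picked up by the first rule as season 1, episode 10 (the "v2" suffix is skipped), the Specials file is excluded, and the plain S02E07 file falls through to the last rule.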

If you aren't using NFS or SMB shares, remove every instance of this from the code:
Code:
[\\/][\\/].*?
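To see why that prefix matters, here is a small Python sketch (the helper name `strip_share_prefix` and the local path are hypothetical): the network-share version of the S##E## rule needs two slashes in a row, as in `smb://server`, so it never matches a local path until the prefix is removed.

```python
import re

# Prefix that only matches network paths such as smb://server/ or nfs://server/.
SHARE_PREFIX = r"[\\/][\\/].*?"

# The S##E## rule from the prepend block, as written for smb:// or nfs:// shares.
NETWORK_RULE = r"(?i)[\\/][\\/].*?[\/\\].*?S(\d\d)E(\d\d).*"


def strip_share_prefix(pattern):
    """Remove every occurrence of the network-share prefix from a pattern."""
    return pattern.replace(SHARE_PREFIX, "")


if __name__ == "__main__":
    local_rule = strip_share_prefix(NETWORK_RULE)
    path = "/media/tv/Show/Show.S01E02.mkv"  # hypothetical local path
    print(local_rule)                            # (?i)[\/\\].*?S(\d\d)E(\d\d).*
    print(re.search(NETWORK_RULE, path))         # None: no two slashes in a row
    print(re.search(local_rule, path).groups())  # ('01', '02')
```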

That should be it! Enjoy your plot spoilers and lewd fanart!
#2
So what do you recommend if TheTVDB has incorrect episode information? (For instance, the Japanese Dragon Ball Z episodes do not match up with the US ones.)
#3
Hello,

Sorry to revive this old thread.

First of all, many thanks for the scripts; they were a great help! Thanks for sharing!

I have come across a small problem with certain file names that contain extra numbers: they are not recognized correctly.

For example: Bubblegum Crisis Tokyo 2040 - 01 - Can`t Buy a Thrill [anime-takeover].ogm

Is there a solution to this?
#4
(2015-04-25, 12:17)splatterpop Wrote: For example: Bubblegum Crisis Tokyo 2040 - 01 - Can`t Buy a Thrill [anime-takeover].ogm

Is there a solution to this?

I actually have that series, so here is a screenshot showing how I did it:
[Screenshot]
I do, however, use a media manager (MediaElch) to scrape the data first, but as long as you follow the naming scheme any TV show needs and the episode order TheTVDB uses, it works like a dream.

Kodi is rather "stupid" in the way it reads files. It needs to know the season number and episode number to identify an episode. When you have "Bubblegum Crisis Tokyo 2040 - 01 - Can`t Buy a Thrill", it can't tell whether 2040 is the episode or the season. That's why I use the S##E## scheme.
[Screenshot]
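A quick illustration of that ambiguity, as a Python sketch: the "take the first number" rule below is a deliberately naive stand-in, not Kodi's actual matching logic, and the renamed file name is hypothetical.

```python
import re

ORIGINAL = "Bubblegum Crisis Tokyo 2040 - 01 - Can`t Buy a Thrill [anime-takeover].ogm"
# Hypothetical S##E## rename of the same file.
RENAMED = "Bubblegum Crisis Tokyo 2040 - S01E01 - Can`t Buy a Thrill.ogm"


def first_number(name):
    """Naive rule: treat the first run of digits as the episode number."""
    match = re.search(r"\d+", name)
    return match.group() if match else None


def season_episode(name):
    """Read an explicit SxxEyy marker, if there is one."""
    match = re.search(r"(?i)S(\d+)E(\d+)", name)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    print(first_number(ORIGINAL))    # 2040: part of the show title, not the episode
    print(season_episode(ORIGINAL))  # None: nothing says which number is which
    print(season_episode(RENAMED))   # (1, 1)
```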
#5
There are also some notes on Anime (wiki) that might help.
#6
The TVDB solution in combination with a media manager/renamer works.

AniDB does not support seasons, and TVDB does not support hashes? All I found was name matching, which can lead to ambiguities when prequels or sequels are involved. However, it's goodbye to AOM for now.

Thanks to all of you!
#7
Sorry, but I don't quite understand whether Kodi can handle something like this:

Tasogare Otome x Amnesia [BD] [1080p]\[ANK-Raws] Tasogare Otome x Amnesia - 01 (BDrip 1920x1080 x264 FLAC).MKV

without any external media scraper/renamer?

And if it can, what regex should I add to make it work?
#8
(2015-04-26, 00:00)isamu.dragon Wrote: Kodi is rather "stupid" in the way it reads files. It needs to know the season number and episode number to identify an episode. When you have "Bubblegum Crisis Tokyo 2040 - 01 - Can`t Buy a Thrill", it can't tell whether 2040 is the episode or the season. That's why I use the S##E## scheme.

Actually, you can solve this with regex. I looked at all the anime regexes out there and didn't find anything that could handle my files the way they're currently named: without seasons and using absolute order. So I decided to learn how to do it myself.

Without any real programming background and after a grueling 2 days banging my head against a wall, I think I've come up with something that handles most of the naming conventions that were giving other regexes trouble.

The key to solving that show's problem is to have the regex essentially work back to front: by backtracking, it picks the last number in the string that doesn't have brackets {[( around it. I also made one that picks up double episodes in the format XXX-XXX. I'm still trying to decide how to deal with specials. I'll probably make a Specials folder and number them according to TheTVDB, since I can't for the life of me understand how AniDB deals with specials and movies.
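The back-to-front idea can be shown in isolation. In this Python sketch, a lazy `.*?` stops at the first number that isn't preceded by a digit, while a greedy `.*` runs to the end of the name and backtracks to the last one. These two toy patterns are illustrations only, not the full regex from this post.

```python
import re

NAME = "Bubblegum Crisis Tokyo 2040 - 01 - Can`t Buy a Thrill [anime-takeover].ogm"

# Lazy: expand as little as possible, so the first number wins.
FIRST = re.compile(r".*?(?<!\d)(\d+)")
# Greedy: consume everything, then backtrack until a number fits.
LAST = re.compile(r".*(?<!\d)(\d+)")

if __name__ == "__main__":
    print(FIRST.search(NAME).group(1))  # 2040
    print(LAST.search(NAME).group(1))   # 01
```

The `(?<!\d)` lookbehind does a job similar to the `(?<![a-z0-9])` check in the posted regex: without it, the greedy version backtracks only as far as the final digit and captures just "1".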

Anyhow, this will pick that show up correctly. I have some optional checks that I use for my convenience, given how my files are named, and they can be taken out without affecting the match. For example, I have all my anime in a folder called (you guessed it) Anime, so I make sure I match that, to avoid ruining my other TV shows. Without that check, something named TVShow 101 would be read as episode 101 (of season 1, since I'm using absolute order) rather than season 1 episode 1, so be careful if you want to remove it. I also check that the file is .mkv, .avi, .mp4 or .ogg, because in some folders I have some weird .url or other files with the same name that were getting picked up by previous regexes. You can safely remove that as well.

So here it is:

Code:
<regexp>(?i)()Anime(?:(?=.*\.mkv$)|(?=.*\.mp4$)|(?=.*\.avi$)|(?=.*\.ogg$)).+(?=\w)(?&lt;![a-z0-9])(\d+)(?&lt;=\d)(?![a-uw-z0-9\])}px])</regexp>

This fails on double episodes like 01-02 on purpose. I'll post the one that catches those in a little while, along with an explanation of what it's actually doing. In the meantime, feel free to test it out and let me know if it fails on anything (other than the double episodes and specials mentioned above). In my testing it has caught 100% of the stuff I've thrown at it so far.
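To try the posted regex outside Kodi, here is a Python sketch. It assumes Python's `re` handles this pattern the way Kodi's PCRE does (both accept the single-character lookbehinds), and it writes the XML escape `&lt;` back as `<`. The episode number is capture group 2 because the empty `()` at the start takes group 1. The folder paths are hypothetical, and so is `WITH_OGM`: as posted, the extension check lists .ogg, so the .ogm example from this thread is skipped unless ogm is added.

```python
import re

# Regex from the post, with the XML escape &lt; written back as "<".
AUTHOR_PATTERN = (
    r"(?i)()Anime(?:(?=.*\.mkv$)|(?=.*\.mp4$)|(?=.*\.avi$)|(?=.*\.ogg$))"
    r".+(?=\w)(?<![a-z0-9])(\d+)(?<=\d)(?![a-uw-z0-9\])}px])"
)

# Hypothetical variant that also accepts .ogm files.
WITH_OGM = AUTHOR_PATTERN.replace(r"|(?=.*\.ogg$)", r"|(?=.*\.ogg$)|(?=.*\.ogm$)")

# Hypothetical paths built from file names mentioned in this thread.
TASOGARE = ("/storage/Anime/Tasogare Otome x Amnesia [BD] [1080p]/"
            "[ANK-Raws] Tasogare Otome x Amnesia - 01 (BDrip 1920x1080 x264 FLAC).MKV")
COLDLIGHT = ("/storage/Anime/Nanoha StrikerS/"
             "[Coldlight]_Mahou_Shoujo_Lyrical_Nanoha_StrikerS_10v2_DVD[H264][E4A98905].mkv")
BUBBLEGUM = ("/storage/Anime/Bubblegum Crisis Tokyo 2040/"
             "Bubblegum Crisis Tokyo 2040 - 01 - Can`t Buy a Thrill [anime-takeover].ogm")


def episode(path, pattern=AUTHOR_PATTERN):
    """Return the episode digits (capture group 2), or None if nothing matches."""
    match = re.search(pattern, path)
    return match.group(2) if match else None


if __name__ == "__main__":
    print(episode(TASOGARE))             # 01: 1920, 1080 and 264 are rejected
    print(episode(COLDLIGHT))            # 10: a "v" may follow the number
    print(episode(BUBBLEGUM))            # None: .ogm is not in the extension list
    print(episode(BUBBLEGUM, WITH_OGM))  # 01
```

The same check also shows the Anime-folder guard at work: a path such as `/storage/TV/TVShow/TVShow 101.mkv` returns None instead of being read as episode 101.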
