Extra REGEX for TV Show Episode matching

xexe
Post: #1
In simple terms, adding this to your XBMC configuration will match more TV show episodes than XBMC matches by default.

After several months of assisting users via IRC I decided, for fun, to create a generic set of additional REGEX expressions to catch TV episodes that XBMC does not and will not match by default (e.g. Topaz etc). Quite a bit of offline testing has been completed, and whilst I am confident in the results I cannot guarantee this REGEX will not produce some false positives.

The method used in this REGEX differs from XBMC's default methodology in SOME places by extracting the season number from folder names rather than the file name. By doing this we can match files that otherwise could never be matched.
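To illustrate the idea, here is a minimal sketch in Python. It is not one of the expressions from the actual REGEX set: the pattern and the function name are made up for this example, and Python's re module only stands in for the regex engine XBMC uses. The season number comes from the "season N" folder and the episode number from the digits at the start of the file name:

```python
import re

# Hypothetical pattern, for illustration only: season number from the
# "season N" folder, episode number from the digits that start the file name.
SEASON_FOLDER_EP = re.compile(
    r"[\\/]season\s*(\d{1,2})[\\/](\d{1,2})\D[^\\/]*$",
    re.IGNORECASE,
)

def season_and_episode(path):
    """Return (season, episode) as ints, or None when the path does not fit."""
    match = SEASON_FOLDER_EP.search(path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(season_and_episode("MyShow/season 2/13.show.dvd.avi"))  # (2, 13)
print(season_and_episode("13.show.dvd.avi"))                  # None
```

The bare file name gives nothing to work with, while the same file inside a season folder yields both numbers. This sketch does not try to handle file names that start with a longer run of digits.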


Examples:

"13.show.dvd.avi" only has the episode number, but "MyShow/season 2/13.show.dvd.avi" has both the season and the episode.
"MyShow/season 5/4400513.show.dvd.avi" has a show name that includes numbers, making matching very difficult.

Installation

Adding additional TV episode matching is simply a matter of inserting the code listed later into a file called advancedsettings.xml.
To locate and understand this XML file, read the first part of this link: http://wiki.xbmc.org/?title=AdvancedSett...e_settings
Remember that by default advancedsettings.xml will NOT exist. Also note that the name of this file IS CASE SENSITIVE and that XBMC must be restarted for changes to be applied.
End to end installation should take no more than 2 minutes.

Required Folder Structure

Approximately 50% of this REGEX requires you to have a sensible folder structure for your TV shows as follows:

/showname/season x/episodes e.g. The Unit/season 2/the.unit.203.avi

Note: Case is irrelevant
Note: This is "Season 1" and NOT "Season 01"

If you do not have this structure, 50% of these REGEX will NOT work for you.

I have had some requests to support different structures. Whilst I am happy to accommodate some slight differences I cannot support multiple languages or weird ass structures. In the end trying to support this would make the REGEX ridiculously complicated for the majority of users whilst only helping a minority.

The chosen structure was decided on after months of seeing what users had developed on their own. Most came to this structure independently and I am happy with it.

Feedback

You are welcome to experiment with this REGEX and report back if you need help or have some suggestions.

I will maintain these first few posts. I will happily add new REGEX under a few small conditions:

1. The format you are trying to match allows for a good chance of no false positives, i.e. it's not my intention to try and deal with absolute rubbish naming dredged from the bowels of the internet.
2. The format you are trying to match will be useful to other people. If the matches will only ever be of use to you alone then this set is not the place for it.
3. If you are suggesting REGEX, please supply a couple of examples of the full path you are matching against so I/we can test it.
4. I won't be adding REGEX or updating the existing ones with über complex/1337 REGEX just because it can be done more cleverly. Or put another way, we need to keep these simple so normal users can get to grips with them.


Support

If you wish support please do the following (NONE OF WHICH ARE OPTIONAL):

1. Update to the very latest SVN version (or, in the case of a feature freeze, the latest Alpha/Beta/RC). This is the version I stick with and will be testing against. Old versions may not even recognize this REGEX.
2. Post a COMPLETE debug log to pastebin.com (nowhere else) and link it here. Make sure this log catches the update library procedure.
3. The DEBUG log should contain lines with "DEBUG: could not enumerate file" or entries which are matched incorrectly. If it does not contain these elements then you do not need help.
4. The DEBUG log should show that you are using this complete REGEX set and not small parts of it. I cannot easily identify which set you are running, and more importantly the order in which these REGEX run is very important.

If you don't do these 4 simple things I cannot help and won't even bother. Sorry, but time is too short to help those that won't help themselves :)

And again... DO NOT post hand-written examples of problem file names or things you wish to match. I NEED the "could not enumerate file" lines to see what XBMC is seeing, not what you think it is seeing.
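If you want to check whether your own log contains the relevant lines before posting it, something along these lines should do. It is only a sketch in Python: the function name is made up, and the sample lines are invented to show the idea, so a real log will look different around the marker text.

```python
MARKER = "could not enumerate file"  # text quoted in the support steps above

def failed_enumerations(log_lines):
    """Return the log lines that mention a file XBMC could not enumerate."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]

# Made-up sample lines; a real debug log carries more detail on each line.
sample = [
    "DEBUG: some unrelated line\n",
    "DEBUG: could not enumerate file /tv/MyShow/season 2/13.show.dvd.avi\n",
]
for line in failed_enumerations(sample):
    print(line)
```

For a real log, pass an open file object instead of the sample list, for example `failed_enumerations(open("xbmc.log", errors="replace"))`, adjusting the path to wherever your log lives. This is only a pre-check; the COMPLETE log still needs to be posted.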

Please, this thread is for discussions on THIS REGEX compilation only. It is not for random REGEX support, "how do I set up advancedsettings", "why doesn't my library work" or anything else. For all other topics create a new thread.

Happy hunting :)


Having problems getting your TV shows recognized?

Try my extra TV show matching REGEX here
(This post was last modified: 2012-05-08 12:29 by zag.)

xexe
Post: #2
Current stable: V2.3
V2.4 - 05/09/2011
http://pastebin.com/UPPrk7VU
Added more movie stacking REGEX. Read the WARNING. Don't run these if you have a one-pile movie folder (all movie files in a single folder).
Note there are some weird movie stacking fails even though the REGEX is correct. Will debug with Eden later.
Updated DIRFIX handling based on a bug posted by SoWErA. Cheers
Lastly this is the last release that I will be testing with Dharma since I am moving to Eden.

V2.3 - 18/01/2011
http://pastebin.com/N5mjtBxk
WARNING: Big changes with little to no testing. Released due to demand. Use at your own risk.
Non critical typos, formatting and spelling.
Anime matches now happen before XBMC. By request.
Tighter CRC test in anime
Added movie stacking. Don't run these if you have a one-pile movie folder (all movie files in a single folder). (This is the reason XBMC doesn't do it natively)
Added OpenELEC default CPU and GPU temp settings
Added Samba timeout value
Explicit abc123 was catching pipes as well
Can move to \D instead of [^\d] etc. We have had proper PCRE for a long while now.
Stick on \d{1,2} rather than a mix with \d\d?
Now handles S1, S01, S 1, S 01 as well as Season1, Season01, Season 1, Season 01 directories

V2.2 - 02/11/2010
http://pastebin.com/ehjt8aVh
Minimum required version Dharma Beta 4
Anime REGEX not capturing full ep number causing weird duplicates
Anime REGEX now handles {[( for CRC encapsulation
Anime REGEX - more tweaks
Added dimonscreensave
Removed all URL encoded REGEX as nothing to match against is URL encoded anymore
Trivial comment typos
Removed a dupe REGEX that crept in somehow
Removed a bunch of useless trailing spaces
Reordered. Should prove slightly faster now.

V2.1 - 29/09/2010
http://pastebin.com/XDDx3Thy
Yet more silly typos. Apologies to all.
Non critical typos, formatting and spelling.
Added first Anime match attempt. Tread carefully, anime naming is as oddball as anime itself.
Shortened comment separators. Was starting to take up too many lines.
Stripped out intro words and added them to change log.
Increased recently added from 250 to 300.

V2.0 - 20/09/2010
http://pastebin.com/EvL65F34
Added <backgroundupdate> set to false for video library
Added music library settings placeholder
Added <backgroundupdate> set to false for music library
Added <flattentvshows> set to never for video library
Added first attempt at handling single episode DIRFIX re-releases
Fixed lame copy and paste [ error
Preparing for deprecated URL encoding requirement for RAR containers
Added first attempt at handling multi episode DIRFIX re-releases
Enabled GPU accelerated dds fanart <useddsfanart>. I debated setting this one but I suspect more users will want it than not. Please report back.
Exclude REGEX is now VERY greedy. Anything with "extras" in it anywhere is excluded. In almost all instances this will be fine but YMMV.

V1.9 - 08/09/2010
http://pastebin.com/jCqDF7hk
Changed order. As feared the change to prepend caused false positives.
Fixed bug with exclude REGEX and double //
Exclude REGEX is now VERY greedy. Anything with "sample" in it anywhere is excluded. In almost all instances this will be fine but YMMV.
Ignore Torrent client part files
Tested against Dharma. Consider BETA quality.

V1.8 - 18/01/2010
http://pastebin.com/f4a5aa918
Formatting
Ignore case set.
Ignore files named sample.*
Minimal required SVN to operate r26522.

V1.7 - 21/12/2009
http://pastebin.com/f656dd4f2
Changed to mostly inline comments for REGEX. Added a custom sort token to Ignore " when sorting (Thanks cptspiff for the fix).
Minimal required SVN to operate r25845.

V1.6 - 16/12/2009
http://pastebin.com/f195c0368
This is a significant update. Consider it ALPHA. In order to fix the broken REGEX I had to change almost all the custom REGEX to prepend.
This increases the chances of false positives significantly although in testing they still perform well.
This version requires SVN 25638+ to be fully compatible. Most will work with older versions.
If you have a TV show called Extras rename it to Extras (2005) for the EXCLUDE REGEX to be compatible.
Also a big thanks to Grum in IRC for the RAR REGEX. This will natively handle most SCENE RAR packs very accurately.
Use at your own risk.

V1.5 - 04/12/2009
Seems like EXCLUDE matching is case sensitive. Quick fix for testing.

V1.4 - 30/11/2009
http://pastebin.com/f74eb50d9
Tweaked Episode match to be a little less strict. Should catch /Shows/Mad Men/Season 1/Episode 12.avi etc

V1.3 - 20/11/2009
Added setting to turn off auto thumbs

V1.2 - 16/11/2009
http://pastebin.com/f6edebc7a
Split folder exclusions "extras" into two REGEX one for movies and one for TV.
As of r24405 video stacking regular expressions must contain exactly four (4) capture expressions. Removed old stacking REGEX will add back in as required.

V1.1 - 11/11/2009
http://pastebin.com/f62eee83b
General cleanup in preparation for pastebin.

V1.0 - 30/10/2009
Replaced some of the stacking REGEX removed in commit 24060. WARNING this may break serials support.
In general I am not happy with this new REGEX and it needs more work.
This file also includes some general XBMC settings I use.
It would be better if I didn't include these settings but doing so makes it easier for me.
Delete them if they are not to your taste.

V0.9 - 28/06/2009
http://pastebin.com/f544c8deb
Default XBMC REGEX producing false positives with TPZ.
To deal with this we now have both prepend and append REGEX.

V0.8 - 10/06/2009
http://pastebin.com/f5f4fae52
After an IRC discussion with cptspiff and mgc I am releasing this version to cater for
TOPAZ releases but with NO REQUIRED FOLDER STRUCTURE.
This should also handle Topaz which are still in RAR format.
Please report back on success as I am working only from data scraped from Google.

V0.7 - 08/06/2009
Added excludefromscan section. Do not catalogue anything in a folder called extras.
Using the expected TV folder naming structure still allows the TV show "Extras".
Note: This does not work for me but does for other users. Please report back your experiences.

V0.6 - 06/06/2009
http://pastebin.com/f48cec53d
New component. Commonly missed movie stacking REGEX.
Big caveat, will NOT fix movies already in the library.
To fix completely remove the multiple movie entries and rescan.

V0.5 - 03/06/2009
Added REGEX to match some awful TV naming that has no season.
This release marks 99% completion rate of Google scraped XBMC missed episodes (10,000+ ).
This is the last REGEX in the list and may produce false positives. Use with caution.

V0.4 - 28/05/2009
Cater for cross platform difference in paths i.e. \/

V0.3 - 16/05/2009
Support for /season 5/Lost - 5 x 05.mkv

V0.2 - 08/05/2009
TPZ matches now require season folder. Fixes some false positives.

V0.1 - 05/05/2009
Initial Upload

################################################################################
This REGEX is UNOFFICIAL/EXPERIMENTAL and may require a strict folder structure.

*Use at your own risk*

We use multiple REGEX rather than try to build one REGEX to rule them all.
This wastes CPU cycles but allows easier bug finding, refining and end user understanding.
The order they run is important. It will never catch all episodes.
Since we're trying to deal with bad naming it could result in false positives.
Comments and submissions welcomed but try to keep it simple. If in doubt use two simple REGEX rather than one complex one.

To install see: http://www.xbmc.org/wiki/?title=AdvancedSettings.xml

Tested against Dharma onwards only but MAY be backwards compatible.
################################################################################
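
The "several simple REGEX tried in order" approach described in the header above can be sketched like this in Python. The three patterns are simplified stand-ins written for this example, not expressions copied from the set, and the sketch assumes that the first expression to match decides the result:

```python
import re

# Hypothetical, simplified patterns; the real set lives in the pastebin links.
# Order matters: patterns nearer the top get the first chance to match.
PATTERNS = [
    re.compile(r"s(\d{1,2})e(\d{1,2})", re.IGNORECASE),       # S01E05
    re.compile(r"(\d{1,2})\s*x\s*(\d{1,2})", re.IGNORECASE),  # 5 x 05
    re.compile(r"[\\/]season\s*(\d{1,2})[\\/]episode\s*(\d{1,2})", re.IGNORECASE),
]

def first_match(path):
    """Try each pattern in order and stop at the first one that matches."""
    for index, pattern in enumerate(PATTERNS):
        match = pattern.search(path)
        if match:
            return index, int(match.group(1)), int(match.group(2))
    return None

print(first_match("/season 5/Lost - 5 x 05.mkv"))             # (1, 5, 5)
print(first_match("/Shows/Mad Men/Season 1/Episode 12.avi"))  # (2, 1, 12)
```

Each result reports which pattern fired, which is what makes a list of small expressions easier to debug than one large one: a false positive can be traced to a single entry and that entry can be tightened or moved.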


(This post was last modified: 2011-09-05 11:23 by xexe.)

xexe
Post: #3
Reserved


(This post was last modified: 2009-12-21 11:02 by xexe.)

xexe
Post: #4
Reserved


(This post was last modified: 2010-09-08 15:08 by xexe.)

xexe
Post: #5
Reserved


(This post was last modified: 2010-09-08 15:08 by xexe.)

havix
Post: #6
Are these regex's built into XBMC now? And if they aren't why not?

jmarshall
Team-XBMC Developer
Post: #7
Some of them (as clearly commented in the file) would cause too many false positives, others are specific to scene groups such as tpz which we clearly don't want by default.

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


xexe
Post: #8
jmarshall is obviously spot on. It's fine for users with the skill and motivation to add these, but if they were default the false positives would eat up valuable dev support time, especially since users wouldn't have a clue why it was happening.

I only did this for fun and to save time answering the same question over and over via the support IRC.

Now don't get me wrong, it IS surprisingly accurate and in my simulations we are talking about only 1 ep in 1000 being matched wrongly, BUT a user with a naming scheme that triggers one of these errors will likely trip hundreds of them.

Lastly the TPZ naming scheme matching will NEVER be 100% accurate as their naming is quite simply completely useless.


j3ff
Post: #9
Thanks for this - before I found this thread I was beating my head against the wall wondering why XBMC was not finding my stuff. This made it all 100% perfect in 2 minutes.

dc_williamson
Post: #10
Most of my TV shows are stored in the folder structure:
show name\season n\show name Snn Enn.extn
(where nn=two digits 0-9).
None of the above regex work for me. However I use this in my advancedsettings.xml file:

<tvshowmatching action="prepend">
<regexp>[\\/]*S([0-9]+) E([0-9]+)[^\\/]*</regexp>
</tvshowmatching>

Which works fine :)

Only issue comes when I have shows with no season number (as if they were all season one) in the format show name\show name Enn.extn

I'm still trying to work out a regex that'll work for this - for now I'm fudging it by renaming all the files in a Snn Enn format :) Anyway, feel free to use or add this to the above if it's useful.
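
For anyone who wants to see what the expression above captures before adding it, here is a quick sketch in Python. Python's re module only stands in for XBMC's regex engine here, the case-insensitive flag is an assumption, and both sample paths are made up to follow the layouts described in this post:

```python
import re

# The expression from the advancedsettings.xml snippet above, unchanged.
SNN_ENN = re.compile(r"[\\/]*S([0-9]+) E([0-9]+)[^\\/]*", re.IGNORECASE)

# Made-up path following the "show name\season n\show name Snn Enn.extn" layout.
with_season = r"My Show\season 1\My Show S01 E02.avi"
# Made-up path for the episode-only case, "show name\show name Enn.extn".
episode_only = r"My Show\My Show E07.avi"

print(SNN_ENN.search(with_season).groups())  # ('01', '02')
print(SNN_ENN.search(episode_only))          # None
```

The second result shows why the episode-only files are missed: the expression insists on an "Snn " block in front of the "Enn" block, so a name with no season number never matches.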

locust
Post: #11
Boom!

thanks so much for your hard work.

I have lots of -TOPAZ releases and this was pissing me off to no end how it wouldn't scan in their episodes

Nicely done!

locust
Post: #12
Actually I am not having luck with -MEDiEVAL rips though, for example

Arrested.Development.S01E05.WS.DVDRip.XviD-MEDiEVAL has an archive name of med-ad105.rar and it's not picking it up (and I can't scan it into my library)

any chance of getting a solution for that?

thanks!

EvilMatt666
Post: #13
I'm kinda new here but I've been messing about with XBMC for a while now. I have just installed the newest version and thought maybe it might sort my library problem for the tv show rips on my system but it hasn't and looking at the REGEX code I would doubt it would sort my naming system properly.

Basically the way I rename all my files (because I'm anal) is as follows:-

Tv Hard drive/tv show title/season 1/tv.show.-.101.-.episode.title.(DVD).avi/mkv/etc

"101" would mean season 1 episode 01 and then "2024" would be season 20 episode 24. It just doesn't seem to pick up the episode details at all. In fact this last pass with the REGEX file utilised has just given me files numbered 1-30 and nothing inside in Library view. I can and have been using the file view to go through my TV shows but it's not perfect. Anyone got any ideas?

Thanks in advance.
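
The numbering described above can be split mechanically: the last two digits are the episode and whatever is left in front is the season. A minimal sketch in Python follows. The pattern and function name are made up for this example, it assumes the number always sits between ".-." separators as in the sample file name, and it is not an expression from the REGEX set:

```python
import re

# Hypothetical pattern: a 3 or 4 digit block between ".-." separators,
# where the last two digits are the episode and the rest is the season.
COMBINED = re.compile(r"\.-\.(\d{1,2})(\d{2})\.-\.")

def split_combined(filename):
    """Return (season, episode) from names like tv.show.-.101.-.title.avi."""
    match = COMBINED.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(split_combined("tv.show.-.101.-.episode.title.(DVD).avi"))   # (1, 1)
print(split_combined("tv.show.-.2024.-.episode.title.(DVD).avi"))  # (20, 24)
```

Whether XBMC itself picks these files up is a separate question, and only the debug log can answer it.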

jmarshall
Team-XBMC Developer
Post: #14
@EvilMatt666. Do a debug log while refreshing a show. It'll tell you right off whether it's detecting them properly or not.

Don't just guess that things aren't working due to your file naming!

xexe
Post: #15
Can users confirm that the "Extras" removal regex is working for them? I have had reports that it works but I cannot get it working for myself, which is curious.

If it does work, can you confirm whether you use the tvshow.nfo and movie.nfo URL tag method?


(This post was last modified: 2009-11-10 15:41 by xexe.)