scraping: ignore everything in brackets
#1
Hello,

I use additional info in my folder names, but when (re)scraping my folders, this info makes auto-matching fail.
My folder structure is 
     [year] Title   [additional info]
How can I get EMM to ignore everything in brackets (or maybe recognize the leading bracket as the year and ignore everything else in brackets...)?

thanks,
vonson
#2
(2020-11-23, 20:12)vonson Wrote: I use additional info in my folder names, but when (re)scraping my folders, this info makes auto-matching fail.
What do you mean by that? Usually you run a DB update, Ember adds the new movies and tries to clean the file or folder name of all unnecessary information to get a clean title for the initial search. There are some filters to clean the file/folder name under Settings => Movies => General => Path/File Name Filters.
How they work: if one of the filters matches a part of the file name, that part and all following characters are removed. Usually only one filter has to match to get the clean title, because normally the name starts with the movie title and all additional information comes after it. I think the problem with your naming style is that you start with the year, so the "year" filter removes the complete file name. If all characters are removed by the filters, Ember falls back to the full file/folder name to get at least some title.
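
To make the rule above concrete, here is a rough Python sketch of it. This is only a model of the behaviour as described in this post, not Ember's actual code: the function name apply_filters and the YEAR_FILTER pattern are made up for illustration and are not Ember's default filters.

```python
import re

# Hypothetical stand-in for a "year" filter; NOT Ember's real default pattern.
YEAR_FILTER = r"\[?(?:19|20)\d\d\]?"

def apply_filters(name, filters):
    """Model of the described rule: a matching filter removes the matched
    part and every character after it."""
    result = name
    for pattern in filters:
        match = re.search(pattern, result)
        if match:
            # Keep only the text in front of the match.
            result = result[:match.start()]
    result = result.strip()
    # Fallback: if nothing is left, use the full name again.
    return result if result else name

# Title first, extra info after it: the filter leaves a clean title.
print(apply_filters("Avatar 2009 1080p BluRay", [YEAR_FILTER]))
# Year first: the filter matches at position 0, nothing is left,
# so the full name is used as the fallback.
print(apply_filters("[2020] Avatar [Bluray, HDR]", [YEAR_FILTER]))
```

The second call shows why a leading year is a problem under this rule: the search title ends up being the whole folder name again.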

So if all of your movies have exactly that scheme, you have to change the filters. You can remove ALL filters and add these two, which should only remove the year at the beginning and the additional information at the end:
Code:
\[\d\d\d\d]
\[.*\]

The order is important because the first one searches for "4 digits inside [ ]" and the second one for "anything inside [ ]". If the second one is executed first, everything will be removed because everything is between [ ], e.g. "[year] Title   [additional info]".

As an example:
[2020] Avatar [Bluray, HDR, 4K, TeamX]
\[\d\d\d\d] removes [2020], Result: Avatar [Bluray, HDR, 4K, TeamX]
\[.*\] removes [Bluray, HDR, 4K, TeamX], Result: Avatar
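
If you want to try the two expressions outside of Ember, the example above can be replayed with a small Python sketch. It follows the worked example literally (each filter removes only the text it matches) and is an assumption about the behaviour, not Ember's real implementation; clean_title is a made-up helper name.

```python
import re

# The two filters from this post, in the recommended order.
FILTERS = [r"\[\d\d\d\d]", r"\[.*\]"]

def clean_title(name, filters=FILTERS):
    """Hypothetical helper: apply each filter in turn and remove its match."""
    result = name
    for pattern in filters:
        result = re.sub(pattern, "", result)
    # If nothing is left, fall back to the full name.
    return result.strip() or name

name = "[2020] Avatar [Bluray, HDR, 4K, TeamX]"

# Recommended order: the year goes first, then the trailing info.
print(clean_title(name))
# Reversed order: the greedy \[.*\] swallows everything from the first "["
# to the last "]", so only the fallback to the full name remains.
print(clean_title(name, list(reversed(FILTERS))))
```

The first call prints Avatar, the second one prints the unchanged folder name, which is the ordering problem described above.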

It's simple regex and you can test or try other expressions on https://regex101.com/.
#3
Hey, thanks for the quick reply, I did not expect that.

Your description seems plausible, I have to do some experiments with that. Sounds promising.
Is there any way to have EMM use the year info that I remove with that expression as the movie's year for scraping?

Anyways, I think I have something to go with. Thank you, I will report back.
#4
The year is determined independently of the filters. Depending on your source settings the whole path or only the file name will be checked for a year between 1900 and 2099.
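
Roughly, that kind of check could look like the Python sketch below. It is only an illustration under assumptions: the function name find_year, the file_name_only switch, and the choice to take the first match are all made up here, and Ember's real logic may differ.

```python
import re
from pathlib import PurePosixPath

# Four digits between 1900 and 2099 that are not part of a longer number.
YEAR = re.compile(r"(?<!\d)(19\d\d|20\d\d)(?!\d)")

def find_year(path, file_name_only=False):
    """Hypothetical helper: return the first year found, or None."""
    # Check either the whole path or only the last path component.
    text = PurePosixPath(path).name if file_name_only else path
    match = YEAR.search(text)
    return int(match.group(1)) if match else None

path = "Movies/[2020] Avatar [Bluray, HDR, 4K, TeamX]/avatar.mkv"
print(find_year(path))                       # whole path is checked
print(find_year(path, file_name_only=True))  # only "avatar.mkv" is checked
```

With the whole path checked, the year in the folder name is found; with only the file name checked, nothing is found in this example.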
#5
It seems to work perfectly. All I was hoping for.

Thank you for your help. Much appreciated.

Best regards,
vonson