Unable to properly modify a scraper
#1
Hi!

I have a problem with the naming conventions for movie folders and files, which are clearly documented here:

http://wiki.xbmc.org/index.php?title=Vid...les/Movies

Unfortunately, I already have a large collection in place and would rather not rename it. I would prefer to bend XBMC to understand my naming conventions, which are not all that dissimilar from what is requested. For instance, instead of having a movie in

/movie/Balada triste de trompeta (2010)/BALADA_TRISTE.ISO

which would work fine, I have it in

/movie/Balada triste de trompeta (by Álex de la Iglesia, 2010; 107'; 6.5)/BALADA_TRISTE.ISO

As I understand it, what I need to change is in

...\AppData\Roaming\XBMC\addons\metadata.themoviedb.org\tmdb.xml

and is this part:

Code:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="&lt;url&gt;http://api.themoviedb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
        <RegExp input="$$2" output="\1" dest="4">
            <expression clear="yes">(.+)</expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</CreateSearchUrl>

I managed to parse the year correctly (into $$4) with the following small modification (I changed the input of the second RegExp to $$1 and changed the expression itself):

Code:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="&lt;url&gt;http://api.themoviedb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
        <RegExp input="$$1" output="\1" dest="4">
            <expression clear="yes">%20(19[0-9][0-9]|20[0-1][0-9])%3b</expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</CreateSearchUrl>
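
To see what that expression captures, here is a minimal Python sketch that runs the same pattern outside XBMC. The encoded string is an assumption: the exact escaping XBMC applies to the folder name (hex case, which characters get escaped) is guessed from the %20 and %3b in the expression, and the names ENCODED_TITLE and extract_year are purely illustrative.

```python
import re

# The expression from the modified scraper: a year preceded by an encoded
# space (%20) and followed by an encoded semicolon (%3b).
YEAR_RE = re.compile(r"%20(19[0-9][0-9]|20[0-1][0-9])%3b")

# ASSUMPTION: a plausible URL-encoded form of the folder name. The real
# string XBMC hands to the scraper may be escaped differently.
ENCODED_TITLE = (
    "Balada%20triste%20de%20trompeta%20"
    "(by%20%c3%81lex%20de%20la%20Iglesia%2c%202010%3b%20107'%3b%206.5)"
)


def extract_year(encoded_title):
    """Return the first captured year as a string, or None if nothing matches."""
    match = YEAR_RE.search(encoded_title)
    return match.group(1) if match else None


print(extract_year(ENCODED_TITLE))
```

Two things follow from the pattern itself: it only accepts years from 1900 to 2019, and it requires the year to be followed by a semicolon, so a folder named in the standard "Title (2010)" style would no longer yield a year.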

Yet now I am stuck. I see that $$1 is the input of the first RegExp and that \1 is the match (the complete string, perhaps because the expression is empty?), but I do not see where I can write the (fairly trivial) regexp that completely removes the parenthesised part at the end of $$1. I tried to replace

Code:
<expression noclean="1" />

with the (equivalent?)

Code:
<expression noclean="1"></expression>

but then, if I put just about anything in there, things break in ways that I do not understand.
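
For reference, the "fairly trivial" regexp in question can be sketched in Python, outside the scraper. This only illustrates the pattern on the plain folder name; it says nothing about where such an expression belongs in tmdb.xml. The names TRAILING_PARENS_RE and strip_trailing_parens are hypothetical, and since $$1 appears to be URL-encoded (judging by the %20 and %3b above), the pattern would likely need adapting before use in the scraper.

```python
import re

# Hypothetical pattern: a single parenthesised group (with no nested
# parentheses) at the very end of the name, plus surrounding whitespace.
TRAILING_PARENS_RE = re.compile(r"\s*\([^()]*\)\s*$")


def strip_trailing_parens(name):
    """Remove one trailing '(...)' group from a plain, unencoded name."""
    return TRAILING_PARENS_RE.sub("", name)


folder = "Balada triste de trompeta (by Álex de la Iglesia, 2010; 107'; 6.5)"
print(strip_trailing_parens(folder))
```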

Does anyone have a clue? Is there a place where I can find more documentation? For instance, I do not understand how regexps nested inside other regexps, or placed in parallel, relate to each other.

Thanks!

p.
#2
I won't even pretend to fully understand what you want to do or why.

I think you should link this to the themoviedb scraper; the people who made that scraper should be able to help you, if what you want to do is doable.

#3
Thanks, but I am not sure I understand your suggestion: (1) I *am* doing this within the tmdb scraper, as the code above shows, and (2) I do not see why it would not be doable. At this point it is only a question of inserting a fairly trivial regexp; I just do not understand enough of the scraper syntax to put it in the right place. It is almost certainly a triviality, but I do not get it.

Certainly, those who wrote the scraper can help.

Best,

p.
#4
I am sorry my English is rubbish.

(2) I did not say that it is not possible, just that I do not know whether it is.

What I wanted to say is that maybe this http://forum.xbmc.org/showthread.php?tid...RELEASE%5D
is the best place to get answers. All addons have their own release thread, and the people who made an addon tend to watch it.

Hopefully they should help you.

Regards

burke

#5
Messing with the scraper is not the best way to achieve what you want.

Revert your edits and try adding this to your advancedsettings.xml (wiki):
Code:
<advancedsettings>
  <video>
    <cleandatetime>(.+[^ _\,\.\(\)\[\]\-])[ _\,\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-1][0-9])([ _\;\,\.\(\)\[\]\-][^0-9]|$)</cleandatetime>
    <cleanstrings action="append">
      <regexp>\(by .+</regexp>
    </cleanstrings>
  </video>
</advancedsettings>

The cleandatetime has been adjusted slightly from the default, so it should capture the year correctly for your naming scheme.
That will then leave the title as "Balada triste de trompeta (by Álex de la Iglesia"
The cleanstrings will then remove the "(by etc." part, leaving you with just the title.
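
The two steps described above can be checked with a small Python sketch that applies the same two expressions to the folder name. This models only the behaviour described in this post, not XBMC's actual implementation: the function clean_name, the final whitespace trim, and the use of Python's regex engine in place of XBMC's are all assumptions.

```python
import re

# The cleandatetime expression from the settings above, split over three
# lines for readability: title, separators, year, trailing context.
CLEAN_DATETIME = re.compile(
    r"(.+[^ _\,\.\(\)\[\]\-])[ _\,\.\(\)\[\]\-]+"
    r"(19[0-9][0-9]|20[0-1][0-9])"
    r"([ _\;\,\.\(\)\[\]\-][^0-9]|$)"
)

# The appended cleanstrings expression.
CLEAN_STRINGS = [re.compile(r"\(by .+")]


def clean_name(name):
    """Return (title, year); year is None when cleandatetime does not match."""
    year = None
    match = CLEAN_DATETIME.search(name)
    if match:
        # Step 1: keep the text before the year, remember the year itself.
        name, year = match.group(1), match.group(2)
    for pattern in CLEAN_STRINGS:
        # Step 2: drop whatever each cleanstrings expression matches.
        name = pattern.sub("", name)
    # ASSUMPTION: leftover whitespace is trimmed at the end.
    return name.strip(), year


folder = "Balada triste de trompeta (by Álex de la Iglesia, 2010; 107'; 6.5)"
print(clean_name(folder))
```

After step 1 the name is "Balada triste de trompeta (by Álex de la Iglesia", matching the intermediate title quoted above; step 2 then leaves the bare title, with 2010 kept as the year.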