TMDB scraper fix for question mark and dash
#1
Hi,

I have run into some issues with the TMDB scraper when the title used for searching contains certain characters.

1. The TMDB API does not return a valid search result XML when the title contains the URL-encoded '?' character (%3F). As a result, XBMC reports that it cannot connect to the remote server and does not even offer the option to add the item manually. This also interrupts the automatic library update.

2. There is another problem with the '-' character. In this case TMDB at least responds with a valid result XML saying "nothing found", so you can add the item manually. Doubling the '-' character or omitting it completely from the search string fixes the problem.

I have prepared a patch for the current version 1.4.5 of the scraper that removes both (URL-encoded) characters from the title.

Code:
--- a/metadata.themoviedb.org/tmdb.xml    2011-11-21 20:32:50.366929036 +0100
+++ b/metadata.themoviedb.org/tmdb.xml    2011-11-21 21:08:40.000000000 +0100
@@ -1,10 +1,13 @@
<?xml version="1.0" encoding="UTF-8"?>
<scraper framework="1.1" date="2011-04-25">
        <CreateSearchUrl dest="3">
-               <RegExp input="$$1" output="&lt;url&gt;http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1$$4&lt;/url&gt;" dest="3">
+               <RegExp input="$$5" output="&lt;url&gt;http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1$$4&lt;/url&gt;" dest="3">
                        <RegExp input="$$2" output="+\1" dest="4">
                                <expression clear="yes">(.+)</expression>
                        </RegExp>
+                       <RegExp input="$$1" output="\1\2" dest="5">
+                               <expression noclean="1" repeat="yes">%3f|%2d|(%..)|([a-zA-Z0-9]*)</expression>
+                       </RegExp>
                        <expression noclean="1"/>
                </RegExp>
        </CreateSearchUrl>
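
For anyone who wants to check what the new expression does without running XBMC, here is a rough Python sketch of the same filtering idea. The function name `strip_problem_chars` is made up for illustration, and the case-insensitive matching and the way repeated matches are concatenated are assumptions about how the scraper engine behaves, so treat it as an approximation rather than a description of XBMC's internals.

```python
import re

# Same alternation as in the patch. IGNORECASE is an assumption here so that
# both %3f and %3F are caught; the scraper engine may handle case differently.
_FILTER = re.compile(r"%3f|%2d|(%..)|([a-zA-Z0-9]*)", re.IGNORECASE)


def strip_problem_chars(encoded_title: str) -> str:
    """Drop URL-encoded '?' (%3f) and '-' (%2d) from an encoded title.

    Hypothetical helper that mimics output="\\1\\2" with repeat="yes":
    for every match, keep group 1 (any other %XX escape) and group 2
    (a run of letters/digits). The %3f and %2d alternatives capture
    nothing, so they vanish from the result. Any other unencoded
    character is skipped as well.
    """
    parts = []
    for match in _FILTER.finditer(encoded_title):
        parts.append((match.group(1) or "") + (match.group(2) or ""))
    return "".join(parts)


if __name__ == "__main__":
    print(strip_problem_chars("X%2dMen"))
    # prints: XMen
    print(strip_problem_chars("Who%20Framed%20Roger%20Rabbit%3F"))
    # prints: Who%20Framed%20Roger%20Rabbit
```

Note that other escapes such as %20 pass through untouched; only the two problematic ones are removed.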

I am relatively new to regular expressions, so there may be a better solution. Anyway, it works well for me, so I wanted to share it.

Kind regards!
#2
Or... remove those characters from the file name?
#3
Of course, but IMHO it's a questionable workaround.

The problem occurred for me when using Opdenkamp's PVR extension. There the title is delivered by the PVR server. At first I fixed it in the virtual filesystem created in the PVR code by removing the question mark character there. But I realized that it's not a clean solution at all, because it's not a filesystem issue but a scraper issue, which has nothing to do with PVR.

Even apart from PVR, renaming a large number of video files would be annoying at the very least.

Furthermore, other scrapers or a future TMDB API might support (or, less probably, need) those characters.
#4
I see this thread is over a year old and this improvement hasn't been integrated into the XBMC builds yet. Is there a technical reason for this or did this thread just get lost in the noise? I did a scrape of my media library and noticed that the #1 reason for an unidentified media file was the presence of a dash (-) character in the title. For example, "X-Men".
 