Scraping recorded TV-Shows - extend TVDB Scraper - get function calls right
#1
I have already spent days trying to get this working. I have a local PVR that records movies and TV shows. For recorded movie files I can already scrape them successfully with the information the PVR returns from its XML API. With TV shows I still fail, and I hope somebody can help me.

I thought the task would be simple: make an HTTP call to the PVR API, get the TVDB ID for the file to be scraped, and then continue with the regular TVDB scraper using this ID. So it should mainly be a matter of modifying the function "CreateSearchUrl" or maybe also "GetSearchResults".
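
The intended flow can be modeled outside Kodi as a minimal Python sketch. This is not scraper XML, only a way to check the regular expressions and URLs step by step. The function names and the sample XML wrapper element are hypothetical; the patterns, host and URL formats are taken from the scraper code and logs in this thread (the stray "]" in the character class is left out here).

```python
import re

# Pattern used in CreateSearchUrl to pull the PVR recording ID from the name.
RECORDING_ID = re.compile(r"(?:%20| |_)([0-9]+)(?:\.ts|$)")
# Pattern used to read the TVDB ID out of the PVR's XML answer.
TVDB_ID = re.compile(r"<t_thetvdbid>([0-9]+)</t_thetvdbid>")


def recording_id(name):
    """Return the trailing numeric recording ID, or None if there is none."""
    match = RECORDING_ID.search(name)
    return match.group(1) if match else None


def pvr_url(rec_id, host="10.0.0.1:8081"):
    """Build the PVR API URL for one recording (host as used in this thread)."""
    return f"http://{host}/record.onexml?id={rec_id}"


def tvdb_id_from_pvr_xml(xml_text):
    """Extract the TVDB ID from the PVR response text, or None."""
    match = TVDB_ID.search(xml_text)
    return match.group(1) if match else None


def tvdb_episode_url(tvdb_id, language):
    """Build the TVDB lookup URL that the regular scraper should continue with."""
    return f"http://thetvdb.com/api/GetEpisode.php?id={tvdb_id}&language={language}"


if __name__ == "__main__":
    rec = recording_id("20160830 rtl Bones - Die Knochenjaegerin 2026")
    print(pvr_url(rec))
    # Hypothetical PVR answer; only the <t_thetvdbid> element is from the thread.
    sample = "<record><t_thetvdbid>4818866</t_thetvdbid></record>"
    print(tvdb_episode_url(tvdb_id_from_pvr_xml(sample), "de"))
```

The point of the sketch is that two separate fetches are needed: the second URL can only be built after the first response has been read, which is exactly the step the scraper has to chain.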

The main problem is calling a function correctly to do this TVDB-ID lookup. I tried many ways; two of them are shown below.

My main questions are:
  1. When a function is supposed to fetch an XML file from another site, what triggers the actual GET request? At what point during the execution of the scraper are URLs really evaluated? My log files suggest that simply calling a function does not trigger it: the functions mostly just extend the URLs and enrich them with more markup.
  2. Do I need to enclose the result of a function in XML tags? Almost all code samples put the result between <details> and </details> - but why? Why "details", and could I also use "url", for example? I don't understand how this is used. I only noticed that if I use no enclosing tags at all, the function simply doesn't show up as executed in the log file.

So this is my code ...

First try - extend CreateSearchUrl to call the PVR API to get the TVDB ID and then pass it on as usual to GetSearchResults:
Code:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="<chain function="GetTVDBIdFromEpisode">\1</chain>" dest="5">
        <expression >(?:%20| |_)([]0-9]+)(?:\.ts|$)</expression>
    </RegExp>
    <RegExp input="$$5" output="<url>http://thetvdb.com/api/GetEpisode.php?id=\1&amp;language=$INFO[language]</url>" dest="3">
        <expression noclean="1" />
    </RegExp>
</CreateSearchUrl>

<GetTVDBIdFromEpisode dest="3">
    <RegExp input="$$4" output="<details>\1</details>" dest="3">
        <RegExp input="$$1" output="<url function="ParseTVDBIdFromEpisode">http://10.0.0.1:8081/record.onexml?id=\1</url>" dest="4">
            <expression />
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</GetTVDBIdFromEpisode>

<ParseTVDBIdFromEpisode dest="5">
    <RegExp input="$$1" output="<details>\1</details>" dest="5">
        <expression><t_thetvdbid>([0-9]+)</t_thetvdbid></expression>
    </RegExp>
</ParseTVDBIdFromEpisode>

This does not work: both the API call and the function ParseTVDBIdFromEpisode run correctly (the log shows <details>4818866</details>), but the result is not passed back up to the caller. So the ID used in CreateSearchUrl ends up empty, and the TVDB URL is requested with "id=" and no value.

Code:
23:15:30 T:1826356112   DEBUG: std::vector<CScraperUrl> ADDON::CScraper::FindMovie(XFILE::CCurlFile&, const string&, bool): Searching for '20160830 rtl Bones - Die Knochenjaegerin 2026' using IPTV PVR TV Series Scraper scraper (path: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')
23:15:30 T:1826356112   DEBUG: scraper: CreateSearchUrl returned <url>http://thetvdb.com/api/GetEpisode.php?id=<chain function="GetTVDBIdFromEpisode">2026</chain>&language=de</url>
23:15:30 T:1826356112   DEBUG: scraper: GetTVDBIdFromEpisode returned <details><url function="ParseTVDBIdFromEpisode">http://10.0.0.1:8081/record.onexml?id=2026</url></details>
23:15:30 T:1826356112   DEBUG: CurlFile::Open(0x6e1624b0) http://10.0.0.1:8081/record.onexml?id=2026
23:15:30 T:1826356112    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://10.0.0.1
23:15:30 T:1826356112   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://10.0.0.1:8081/record.onexml?id=2026"
23:15:30 T:1826356112   DEBUG: scraper: ParseTVDBIdFromEpisode returned <details>4818866</details>
23:15:30 T:1826356112   DEBUG: CurlFile::Open(0x6e1624b0) http://thetvdb.com/api/GetEpisode.php?id=
23:15:30 T:1826356112   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://thetvdb.com/api/GetEpisode.php?id="
23:15:30 T:1826356112   DEBUG: scraper: GetSearchResults returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><results></results>

Second try - extend the function GetSearchResults to make the API-call:

Code:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="<url>http://10.0.0.1:8081/record.onexml?id=\1</url>" dest="3">
        <expression noclean="1">(?:%20| |_)([]0-9]+)(?:\.ts|$)</expression>
    </RegExp>
</CreateSearchUrl>

<GetSearchResults dest="8">
    <RegExp input="$$5" output="<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results><entity>\1</entity></results>" dest="8">
        <RegExp input="$$1" output="<title>\1</title>" dest="5">
            <expression><t_caption>([^<]*)</t_caption></expression>
        </RegExp>
        <RegExp input="$$1" output="<url cache="tt\1.xml" function="GetTVDBIdFromEpisode">http://thetvdb.com/api/GetEpisode.php?id=\1&amp;language=$INFO[language]</url>" dest="5+">            
            <expression><t_thetvdbid>([0-9]+)</t_thetvdbid></expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</GetSearchResults>

<GetTVDBIdFromEpisode dest="3">
    <RegExp input="$$1" output="<url cache="\1-$INFO[language].xml">http://thetvdb.com/api/1D62F2F90030C444/series/\1/all/$INFO[language].zip</url>" dest="3">
        <expression><seriesid>([0-9]+)</seriesid></expression>
    </RegExp>
</GetTVDBIdFromEpisode>

This still does not work: the API function call is not made while GetSearchResults executes. Instead, the whole markup for calling the function is passed on to the function "GetDetails", where it does not work either.

Code:
23:11:04 T:1825258144   DEBUG: std::vector<CScraperUrl> ADDON::CScraper::FindMovie(XFILE::CCurlFile&, const string&, bool): Searching for '20160830 rtl Bones - Die Knochenjaegerin 2026' using IPTV PVR TV Series Scraper scraper (path: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')
23:11:04 T:1825258144   DEBUG: scraper: CreateSearchUrl returned <url>http://10.0.0.1:8081/record.onexml?id=2026</url>
23:11:04 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://10.0.0.1:8081/record.onexml?id=2026
23:11:04 T:1825258144    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://10.0.0.1
23:11:04 T:1825258144   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://10.0.0.1:8081/record.onexml?id=2026"
23:11:04 T:1825258144   DEBUG: scraper: GetSearchResults returned <?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results><entity><title>Bones - Die Knochenj&#xE4;gerin</title><url cache="tt4818866.xml" function="GetTVDBIdFromEpisode">http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de</url></entity></results>
23:11:04 T:1825258144   DEBUG: bool ADDON::CScraper::GetVideoDetails(XFILE::CCurlFile&, const CScraperUrl&, bool, CVideoInfoTag&): Reading movie 'http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de' using IPTV PVR TV Series Scraper scraper (file: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')
23:11:04 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de
23:11:04 T:1825258144    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://thetvdb.com
23:11:05 T:1825258144   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de"
23:11:05 T:1825258144   DEBUG: scraper: GetDetails returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><details><id></id><chain function="GetArt"></chain><episodeguide><url cache="-.xml">http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de</url></episodeguide></details>
23:11:05 T:1825258144   DEBUG: scraper: GetArt returned <details><url function="ParseArt" cache="-de.xml">http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml</url></details>
23:11:05 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml
23:11:05 T:1825258144   ERROR: CCurlFile::Open failed with code 404 for http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml

Thank you very much for your help or your ideas - highly appreciated!
Gerald
#2
Okay, so maybe my question was too complex. To make it simpler:

Is it possible for a scraper to call a custom function in the body of <CreateSearchUrl> and <GetSearchResults>?
Or can a custom function only be used within <GetDetails>?

If it is possible, what is the correct way to make this custom call look up and use data from another website?
#3
Okay, so as this seems to be my own lonesome blog :-) I will try to document some findings and my first success.

I dug into the Kodi source code and think I can now answer my questions. This is the scraper library where I found it:
https://github.com/xbmc/xbmc/blob/master...craper.cpp

For <CreateSearchUrl> I think it is not possible to have a custom function working within it. Only the result of the function itself is taken as a URL - custom functions are run but not taken into account for the result (because only the first element of the resulting vector is used).

For <GetSearchResults> it is possible, though. As with custom functions in <GetDetails>, where you use the enclosing <details> tag, you need to enclose the output in a common tag. It works with the <results> tag in the following way:

Code:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="<url cache="pvr\1.xml">http://10.0.0.1:8081/record.onexml?id=\1</url>" dest="3">
        <expression noclean="1">(?:%20| |_)([]0-9]+)(?:\.ts|$)</expression>
    </RegExp>
</CreateSearchUrl>

<GetSearchResults dest="1">
    <RegExp input="$$4" output="<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results>\1</results>" dest="1">
        <RegExp input="$$1" output="<chain function="GetTVDBIdFromEpisode">\1</chain>" dest="4">            
            <expression><t_thetvdbid>([0-9]+)</t_thetvdbid></expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</GetSearchResults>

<GetTVDBIdFromEpisode dest="3">
    <RegExp input="$$4" output="<results>\1</results>" dest="3">
        <RegExp input="$$1" output="<url cache="\1-$INFO[language].xml" function="ParseTVDBIdFromEpisode">http://thetvdb.com/api/GetEpisode.php?id=\1&amp;language=$INFO[language]</url>" dest="4">
            <expression />
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</GetTVDBIdFromEpisode>

<ParseTVDBIdFromEpisode dest="5">
    <RegExp input="$$4" output="<results><entity>\1</entity></results>" dest="5">
        <RegExp input="$$1" output="<title>\1</title>" dest="4">
            <expression><EpisodeName>(.*)</EpisodeName></expression>
        </RegExp>
        <RegExp input="$$1" output="<url cache="\1-$INFO[language].xml">http://thetvdb.com/api/1D62F2F90030C444/series/\1/all/$INFO[language].zip</url><id>\1</id>" dest="4+">
            <expression><seriesid>([0-9]+)</seriesid></expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</ParseTVDBIdFromEpisode>

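To check what the last step should emit, the logic of ParseTVDBIdFromEpisode can be modeled as a small Python sketch. The function name and the sample response are hypothetical; the two patterns, the API key and the URL layout are copied from the scraper code above. Unlike the scraper, the sketch returns None when no <seriesid> is found.

```python
import re

# API key exactly as it appears in the scraper code in this thread.
API_KEY = "1D62F2F90030C444"


def search_result_from_episode(xml_text, language):
    """Build the <results> string from a GetEpisode.php-style response."""
    title = re.search(r"<EpisodeName>(.*)</EpisodeName>", xml_text)
    series = re.search(r"<seriesid>([0-9]+)</seriesid>", xml_text)
    if series is None:
        # Without a series ID there is no URL to hand on to GetDetails.
        return None
    sid = series.group(1)
    parts = []
    if title is not None:
        parts.append(f"<title>{title.group(1)}</title>")
    parts.append(
        f'<url cache="{sid}-{language}.xml">'
        f"http://thetvdb.com/api/{API_KEY}/series/{sid}/all/{language}.zip</url>"
    )
    parts.append(f"<id>{sid}</id>")
    return "<results><entity>" + "".join(parts) + "</entity></results>"


if __name__ == "__main__":
    # Hypothetical response; only the two element names come from the thread.
    sample = (
        "<Data><Episode><EpisodeName>Example Episode</EpisodeName>"
        "<seriesid>12345</seriesid></Episode></Data>"
    )
    print(search_result_from_episode(sample, "de"))
```

Comparing this string with the "GetSearchResults returned" line in the debug log is a quick way to see whether the chained functions produced a complete <entity>.
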
According to the log, <GetDetails> now works! Really cool! It actually downloads the ZIP file with all the necessary TV show information and all the fanart ... BUT ...

... nothing shows up under TV Shows in Kodi. Here are the last lines of the log - there is no sign of a lookup of the episode details that I am scraping:

Code:
23:49:11 T:1799146144   DEBUG: VideoInfoScanner: Adding new item to tvshows:smb://SHARECENTER/record/Test/20160830_rtl_Bones_-_Die_Knochenjaegerin_2026.ts
23:49:12 T:1799146144   DEBUG: CAnnouncementManager - Announcement: OnUpdate from xbmc
23:49:12 T:1799146144   DEBUG: GOT ANNOUNCEMENT, type: 16, from xbmc, message OnUpdate
23:49:12 T:1799146144   DEBUG: bool VIDEO::CVideoInfoScanner::ProcessItemByVideoInfoTag(const CFileItem*, VIDEO::EPISODELIST&) - found match for: 'smb://SHARECENTER/record/Test/20160830_rtl_Bones_-_Die_Knochenjaegerin_2026.ts', title: 'Bones'
23:49:12 T:1799146144   DEBUG: int CVideoDatabase::GetEpisodeId(const string&, int, int) (smb://SHARECENTER/record/Test/20160830_rtl_Bones_-_Die_Knochenjaegerin_2026.ts), query = select idEpisode from episode where idFile=731

So this is my next challenge: since all the files are actually in ONE folder, I don't yet know how to call the function <GetEpisodeDetails> and tell it to take the right episode ...
  • How and when is the function <GetEpisodeDetails> in the scraper called?
  • Is it possible to have several TV shows and their episodes ALL IN ONE SINGLE FILE FOLDER?

For all my regular TV-shows I have a folder structure like:
Seriesname > Season XX > Seriesname-SxxExx-Episodename

But my PVR puts all the recorded files into one single folder - how can I handle that?
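
To see what the flat folder gives me to work with, here is a minimal Python sketch that splits one PVR file name into its parts. The field meanings (date, channel, title, recording ID) are my assumption based on the single file name in the log above, and the names Recording and parse_recording are made up for illustration. Note that the file name carries no season or episode number, so those still have to come from the PVR API or TVDB.

```python
import re
from typing import NamedTuple, Optional


class Recording(NamedTuple):
    date: str          # assumed: recording date as YYYYMMDD
    channel: str       # assumed: channel short name
    title: str         # show title with underscores turned back into spaces
    recording_id: str  # trailing number, used as the PVR API "id" parameter


# Assumed layout: <date>_<channel>_<title with underscores>_<id>.ts
PATTERN = re.compile(r"^(\d{8})_([^_]+)_(.+)_(\d+)\.ts$")


def parse_recording(filename: str) -> Optional[Recording]:
    """Split a PVR file name into its parts, or return None if it does not fit."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    date, channel, title, rec_id = match.groups()
    return Recording(date, channel, title.replace("_", " "), rec_id)


if __name__ == "__main__":
    print(parse_recording("20160830_rtl_Bones_-_Die_Knochenjaegerin_2026.ts"))
```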