Kodi Community Forum

Full Version: need help with SageTV recordings scraper
I am working with a SageTV developer trying to get a functioning scraper for SageTV recorded shows. SageTV has an add-on that provides the necessary JSON response to scrape show information into XBMC. Currently we are having issues parsing the results of the webpage.

Below are the results given from the webpage:
Quote:{"MediaFile":{"Airing":{"Channel":{"ChannelLogoCount":2,"IsChannelViewable":true,"StationID":35714,"ChannelNetwork":"PBS Affiliate","IsChannelObject":true,"ChannelDescription":"WKOPDT (WKOP-DT) Knoxville","ChannelNumber":"20-1","ChannelName":"WKOPDT"},"IsShowFirstRun":false,"ParentalRating":"TVY","RealWatchedStartTime":0,"AiringID":2546814,"AiringChannelNumber":"20-1","ScheduleDuration":1800000,"AiringChannelName":"WKOPDT","TrackNumber":0,"ScheduleEndTime":1343737800000,"AiringRatings":["TVY"],"IsNotManualOrFavorite":false,"RealWatchedEndTime":0,"IsAiringHDTV":true,"WatchedEndTime":0,"IsWatched":false,"ExtraAiringDetails":"Closed Captioned, Stereo, HDTV","AiringDuration":1800000,"IsWatchedCompletely":false,"ScheduleStartTime":1343736000000,"LatestWatchedTime":1343736000000,"IsFavorite":true,"RecordingName":"","IsAiringObject":true,"IsManualRecord":false,"Show":{"PeopleListInShow":["William H. Macy","Frank Welker"],"ShowCategoriesList":["Children","Educational","Science","Animated"],"ShowSubCategory":"Educational","ShowRated":"","ShowExternalID":"EP8466780129","ShowCategory":"Children","ShowSeasonNumber":0,"IsShowObject":true,"ShowTitle":"Curious George","ShowCategoriesString":"Children / Educational / Science / Animated","ShowYear":"","ShowEpisode":"Hamster Cam; The Great Monkey Detective","IsShowEPGDataUnique":true,"ShowEpisodeNumber":0,"ShowExpandedRatings":"","OriginalAiringDate":1315440000000,"ShowDuration":0,"PeopleAndCharacterListInShow":["William H. Macy","Frank Welker as George"],"ShowParentalRating":"","ShowLanguage":"","RolesInShow":["Narrator","Actor"],"ShowMisc":"","ShowDescription":"Steve's pet hamster takes off in the city; chef Pisghetti's cookbook vanishes.","PeopleInShow":"William H. Macy, Frank Welker"},"WatchedStartTime":0,"AiringTotalParts":1,"ScheduleRecordingRecurrence":"","AiringEndTime":1343737800000,"AiringPartNumber":1,"RecordingQuality":"","WatchedDuration":0,"IsDontLike":false,"AiringStartTime":1343736000000,"AiringPremiereFinaleInfo":"","AiringTitle":"Curious George","IsShowReRun":true,"AiringAttributeList":["HDTV","Stereo","CC"]},"FileEndTime":1343737800000,"IsPictureFile":false,"MediaFileRelativePath":"","ParentDirectory":"E:\\Recorded TV","IsShowFirstRun":false,"MediaFileFormatDescription":"MPEG2-PS[MPEG2-Video 16:9 [email protected], Dolby Digital/[email protected] Stereo eng]","MediaFileEncoding":"AverMedia M780 PCIe Digital Video Capture 2 WKOPDT","IsVideoFile":true,"NumberOfSegments":1,"IsMusicFile":false,"IsTVFile":true,"IsThumbnailLoaded":false,"FileDuration":1799999,"IsLibraryFile":false,"Size":2009589135,"IsBluRay":false,"IsLocalFile":true,"IsCompleteRecording":true,"SegmentFiles":["E:\\Recorded TV\\CuriousGeorge-HamsterCamTheGreatMonkeyDetective-2546814-0.mpg"],"MediaFileID":2583924,"GetMediaFileMetadataProperties":{"Item":"Frank Welker"},"IsDVDDrive":false,"IsMediaFileObject":true,"IsFileCurrentlyRecording":false,"IsDVD":false,"IsShowReRun":true,"FileStartTime":1343736000001,"MediaTitle":"Curious George"}}

Below is the current, simple scraper code for testing:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<scraper name="SageTV-BMT" content="tvshows" thumb="icon.png" framework="1.1">
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://192.168.0.4:8080/sagex/api?c=plex:GetMediaFileForName&amp;1=\1&amp;encoder=json&lt;/url&gt;" dest="3">
            <expression noclean="1" clear="yes" />
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        <RegExp input="$$3" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\4&lt;/title&gt;&lt;url&gt;http://192.168.04:8080/sagex/api?c=plex:GetMediaFileForName&amp;1=\1&amp;encoder=json&lt;/url&gt;&lt;/entity&gt;" dest="3">
                <expression repeat="yes">&quot;MediaTitle&quot;:&quot;(.+)&quot;</expression>
            </RegExp>
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="\1" dest="9">
                <expression fixchars="1">&quot;MediaTitle&quot;:&quot;([^&quot;]*)</expression>
            </RegExp>
        </RegExp>
    </GetDetails>
</scraper>

Currently the URL is hard-coded into the scraper; that will change later. It appears that the JSON response cannot be parsed correctly. The XBMC log states:
Code:
00:17:11 T:4436   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'smb://PVR/Media/Recorded TV/AmericanPickers-S01E09-KnowWhentoFold-2581802-0.mpg'
00:17:11 T:4436   DEBUG: ADDON::CScraper::FindMovie: Searching for 'AmericanPickers-S01E09-KnowWhentoFold-2581802-0' using SageTV scraper (path: 'D:\HTPC\Xbmc\portable_data\addons\metadata.sagetv.com', content: 'tvshows', version: '0.0.1')
00:17:11 T:4436   DEBUG: scraper: CreateSearchUrl returned <url>http://192.168.0.4:8080/sagex/api?c=plex:GetMediaFileForName&1=americanpickers-s01e09-knowwhentofold-2581802-0&encoder=json</url>
00:17:11 T:4436   DEBUG: CurlFile::Open(097CE0D8) http://192.168.0.4:8080/sagex/api?c=plex:GetMediaFileForName&1=americanpickers-s01e09-knowwhentofold-2581802-0&encoder=json
00:17:11 T:4436    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://192.168.0.4
00:17:11 T:4436   ERROR: ADDON::CScraper::Run: Unable to parse web site

The object of this scraper is to use the JSON or XML output of the SageTV webserver plugin to get information about the recorded shows and import it into the XBMC TV library. The SageTV webserver can offer up all information about the show that is in the SageTV database. In theory you would add your SageTV recording directory as a source and then set the content to TV and set the scraper to this SageTV scraper.
So after speaking with Olympia, it appears this approach will not work. Back to the drawing board :(
Can't you get SageTV to save the recordings in a layout XBMC likes, so that you can simply scrape the shows with the standard tvdb scraper?
Not really. Plus, there are lots of shows that are not on theTVdb: many of my son's cartoons, and shows on HGTV, Food Network, DIY and more. I'd spend months adding these shows and episodes to theTVdb just to scrape them into XBMC, watch them, and then delete them. There are quite a few Sage users who are looking for something like this. They have made a Plex scraper to do this, but like me, many other users are not happy with Plex.
Just out of curiosity, why won't that work? Other than the dates being in Unix time, it looks pretty standard.
Why won't what work?
If anyone has any ideas at all what we could do to get SageTV recordings into XBMC, I'm all ears. Above you can see the JSON data we have access to. We need to be able to parse that and get it into the XBMC TV library somehow.
OK, here are my two cents. In pneumatic I have made an option to import NZBs into the XBMC library. The trick is to create .strm files and make sure XBMC indexes the .strm files in the right context.
A .strm file is treated just like a .mkv or .avi, so adding a .nfo alongside it will give XBMC the index information. In the case of pneumatic, the .strm file contains a plugin:// URL back to the pneumatic addon with all the information needed to play the media.
So parse the data, create .strm files, and tell XBMC to index the .strm files...
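As a rough illustration of the .nfo side of this, here is what an episode .nfo for the Curious George recording quoted at the top of the thread might look like. The values are copied from that JSON (ShowEpisode, ShowTitle, ShowDescription, ParentalRating, PeopleAndCharacterListInShow). The element names follow the XBMC episode NFO layout as I understand it, and the mapping from SageTV fields to those elements is my assumption, not something confirmed in this thread. The aired date assumes OriginalAiringDate (1315440000000) is milliseconds since the Unix epoch, which works out to 2011-09-08 UTC.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: would sit next to the .strm (or .mpg) with the same base name. -->
<episodedetails>
    <!-- ShowEpisode -->
    <title>Hamster Cam; The Great Monkey Detective</title>
    <!-- ShowTitle -->
    <showtitle>Curious George</showtitle>
    <!-- ShowSeasonNumber / ShowEpisodeNumber: the guide data reports 0 for both -->
    <season>0</season>
    <episode>0</episode>
    <!-- ShowDescription -->
    <plot>Steve's pet hamster takes off in the city; chef Pisghetti's cookbook vanishes.</plot>
    <!-- OriginalAiringDate, assumed to be epoch milliseconds -->
    <aired>2011-09-08</aired>
    <!-- ParentalRating -->
    <mpaa>TVY</mpaa>
    <!-- PeopleAndCharacterListInShow: "Frank Welker as George" -->
    <actor>
        <name>Frank Welker</name>
        <role>George</role>
    </actor>
</episodedetails>
```

Since the JSON gives 0 for both season and episode, real numbering would probably have to come from somewhere else (the file name, for example) before XBMC could file the episode sensibly.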
Quote:<GetDetails dest="3">
<RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
<RegExp input="$$1" output="\1" dest="9">

Obvious now?
Since I can't code... nope, not obvious. But I'll look more in depth later today when I don't have clients breathing down my neck.
Then maybe you should read up on the basics of writing a scraper before doing it. Basically, buffer 1 contains the HTML and the others contain nothing. You move your result to buffer 9, then you move buffer 5, which is empty, to buffer 3. You should replace 5 with 9.
Thanks for the tip. I have read it multiple times, but that doesn't mean it makes any sense at all to me ;) I'll pass the info along to the Sage developer who is actually writing the code.
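For anyone following along, the change described in that reply would look roughly like this. The only edit taken from the reply is input="$$5" becoming input="$$9" on the outer RegExp. The empty expression on the outer RegExp is my own assumption, mirroring the one already used in CreateSearchUrl, and the result is untested against the SageTV server.

```xml
<GetDetails dest="3">
    <!-- Outer step: read buffer 9 (filled by the inner RegExp) instead of the empty buffer 5. -->
    <RegExp input="$$9" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
        <!-- Inner step runs first: pull MediaTitle out of the page in buffer 1 into buffer 9. -->
        <RegExp input="$$1" output="\1" dest="9">
            <expression fixchars="1">&quot;MediaTitle&quot;:&quot;([^&quot;]*)</expression>
        </RegExp>
        <!-- Assumption: empty expression, as in CreateSearchUrl, so \1 is the whole of buffer 9. -->
        <expression noclean="1" />
    </RegExp>
</GetDetails>
```

With this, the details block would contain just the bare title text. It would presumably also need wrapping in a title element before XBMC could use it, but that goes beyond what the reply covers.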