need help scraping metadata text files
#1
I am switching from my tired TiVo Series 3 to Kodi. I have been struggling to create a scraper that will read metadata stored in text files for each of the TV shows I have archived from the TiVo. For some background, the text file is generated by a program called KMTTG that downloads from the TiVo and uses a third program called pyTivo to generate the metadata file with the same filename but ending in .txt. Unfortunately, although the data in it has a lot in common with an NFO file, it is not in the NFO format, but rather is a list of name/value pairs separated by a colon (see below).

title : Sample Title
seriesTitle : Sample Title
description : As something happens deep down in the Midnight Zone, something else happens somewhere else..
time : 2015-02-18T00:02:00Z
mpaaRating : G1
isEpisode : true
iso_duration : PT16M
episodeTitle : The Episode title goes here.
isEpisodic : true
showingBits : 1
tvRating : x3
displayMajorNumber : 22
callsign : ABC2/ABC4
seriesId : SH0000262601
programId : EP0000262601-0004454490
vProgramGenre : Animation
vSeriesGenre : Animation
vSeriesGenre : Interests
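
For readers who want to inspect these files outside Kodi, the name/value layout above can be read with a few lines of Python. This is only a sketch based on the sample shown; the function name parse_pytivo_metadata is made up for illustration, and it assumes every line has the form "name : value" and that a name such as vSeriesGenre may repeat.

```python
from collections import defaultdict


def parse_pytivo_metadata(text):
    """Return a dict mapping each field name to the list of its values."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (such as the timestamp in "time") are kept whole.
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # skip blank or malformed lines
        fields[name.strip()].append(value.strip())
    return dict(fields)


sample = """title : Sample Title
time : 2015-02-18T00:02:00Z
vSeriesGenre : Animation
vSeriesGenre : Interests
"""

meta = parse_pytivo_metadata(sample)
print(meta["time"])
print(meta["vSeriesGenre"])
```

Values are kept as lists because some names appear more than once in the file.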

I have been reading the scraper wiki at "http://kodi.wiki/view/HOW-TO_Write_Media_Info_Scrapers" and other posts on this forum, such as from PKO66 (http://forum.kodi.tv/showthread.php?tid=36422). It looks to me like it shouldn't be too difficult to put together a scraper to parse the metadata text file and populate the library with at least the title and plot. Maybe it isn't too hard, but I can't get it to work. My reasoning was that it is much like scraping an NFO file. This is what I have tried to do:
1. Change the extension of the text file to .NFO
2. Make a scraper XML with an NFO section that simply sends everything to the GetDetails section.
<NfoUrl dest="3">
<RegExp input="$$1" output="\1" dest="3">
<expression noclean="1"/>
</RegExp>
</NfoUrl>
3. In the GetDetails section, use regexes to convert the name : value pairs into the listing of detailed information, in the format described in section 1.3 of http://kodi.wiki/view/HOW-TO_Write_Media_Info_Scrapers
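
The transformation that step 3 is aiming for can be checked outside Kodi with a short Python sketch. It is not scraper code and does not show how Kodi itself evaluates the regexes; it only illustrates the intended mapping from name : value lines to a <details> listing. The function name build_details is hypothetical, the title, mpaa and plot tags mirror the ones used in the scraper regexes, and the genre tag is an assumption.

```python
import re
import xml.etree.ElementTree as ET


def build_details(text):
    """Build a <details> XML string from pyTivo-style 'name : value' text."""

    def first(name):
        # Find the first line of the form "name : value".
        match = re.search(r"^%s : (.+)$" % re.escape(name), text, re.MULTILINE)
        return match.group(1).strip() if match else None

    details = ET.Element("details")
    mapping = (("title", "episodeTitle"), ("mpaa", "mpaaRating"), ("plot", "description"))
    for tag, field in mapping:
        value = first(field)
        if value is not None:
            ET.SubElement(details, tag).text = value
    # vSeriesGenre may repeat, so emit one element per occurrence.
    # The "genre" tag name is an assumption, not taken from the scraper.
    for genre in re.findall(r"^vSeriesGenre : (.+)$", text, re.MULTILINE):
        ET.SubElement(details, "genre").text = genre.strip()
    return ET.tostring(details, encoding="unicode")


sample = (
    "episodeTitle : The Episode title goes here.\n"
    "mpaaRating : G1\n"
    "description : Something happens.\n"
    "vSeriesGenre : Animation\n"
    "vSeriesGenre : Interests\n"
)
xml_out = build_details(sample)
print(xml_out)
```

Unlike the plot regex in the scraper, this sketch does not append the callsign to the plot.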

I have used ScraperXMLEditor3_6_9_5 to test the regex, and when I feed in the appropriate test data it looks like I am getting good output from the GetDetails section, so I think my regex there is OK. What I suspect is going wrong is that the NFO section of the scraper does not like that the content of the file is not in the correct NFO format and is not passing it to the GetDetails section.

This looked like it would be an elegant solution, but it doesn't appear to be working.

Any thoughts or recommendations?

Regards,
David
#2
OK. I have messed around with this a lot and just can't get it to work. I think that the <NfoUrl> section will only pass a URL to the <GetDetails> section and not the contents of the NFO file itself. Oh well.

Since my original idea doesn't seem to work, I have an alternative that perhaps has more promise. I can generate the name of the text file easily enough as that is the same as the name of the mpg files, with a .txt added. Since the folder/file names are quite uniform, I should be able to manually construct the full path to the file. Luckily for me, the folder name is repeated as the first part of the filename. I know this sounds an odd thing to have done, but it occurs as part of my extraction from the TiVo and helps prevent the creation of filenames starting with a dot where there is no episode title. The following is an example of the metadata text filename.
Octonauts.The Octonauts and the Yeti Crabs.2015-02-18.mp4.txt

My new concept is to construct the full path to the text file and use it as the URL. I capture it twice in the <CreateSearchURL> section; once into buffer 3 and again into buffer 7. Setting clearbuffers=no should allow me to carry buffer 7 over to the <GetSearchResults> section. This will allow me to run the <GetSearchResults> section, but still have the full URL available from buffer 7 in order to provide it to the next section, <GetDetails>. The URL being the full path of the text file, which I can construct because of the uniform nature of my folder\file structure.
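
The path construction described above can be tried out in Python before encoding it in the scraper. This is a sketch under stated assumptions: the base directory /media/BlueSATA/tivo is taken from the output attribute in the XML, the video name is assumed to be Show.Episode title.YYYY-MM-DD.mp4 with no dot inside the show name, and the names metadata_path and NAME_RE are made up for illustration. The regex is a tidied variant of the one in the scraper, not an exact copy.

```python
import posixpath
import re

BASE_DIR = "/media/BlueSATA/tivo"  # base path as used in the scraper XML

# stem = show + "." + episode title + "." + date; the show doubles as folder name.
NAME_RE = re.compile(
    r"^(?P<stem>(?P<show>[^.]+)\.(?P<episode>.*)\.(?P<date>(?:19|20)\d{2}-[01]\d-[0-3]\d))"
)


def metadata_path(video_name):
    """Return the expected metadata text file path, or None if the name does not match."""
    match = NAME_RE.match(video_name)
    if match is None:
        return None
    file_name = match.group("stem") + ".mp4.txt"
    return posixpath.join(BASE_DIR, match.group("show"), file_name)


print(metadata_path("Octonauts.The Octonauts and the Yeti Crabs.2015-02-18.mp4"))
```

Running the same names through this function and through the scraper regex in a tester is one way to confirm that both capture the folder and the filename stem as intended.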

It sounds good to me, but there may be a flaw in my logic, as it doesn't work. When I scan the content, Kodi says "Server Not Found, Do you want to continue?"

I am posting the xml below. If someone can take a look at my logic and my code and give some pointers for where to go next, I would much appreciate it.

Thank you.

<?xml version="1.0" encoding="utf-8"?>
<scraper framework="1.1" date="2015-11-25" name="pytivotv" content="generic" thumb="icon.png" language="en">
<NfoUrl dest="3">
<RegExp input="$$1" output="\1" dest="3">
<expression noclean="1" />
</RegExp>
</NfoUrl>
<CreateSearchUrl clearbuffers="no" dest="3">
<RegExp input="$$1" output="&lt;url&gt;/media/BlueSATA/tivo/\2/\1.mp4.txt&lt;/url&gt;" dest="3">
<expression noclean="1">((.[^\.]*)\.(.*[^\.]*)\.((?:19|20)[\d]{2,2}-[0-1][0-9]-[0-3][0-9]))(.*)</expression>
</RegExp>
<RegExp input="$$1" output="&lt;url&gt;/media/BlueSATA/tivo/\2/\1.mp4.txt&lt;/url&gt;" dest="7">
<expression noclean="1,7">((.[^\.]*)\.(.*[^\.]*)\.((?:19|20)[\d]{2,2}-[0-1][0-9]-[0-3][0-9]))(.*)</expression>
</RegExp>
</CreateSearchUrl>
<GetSearchResults clearbuffers="no" dest="8">
<RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
<RegExp input="$$6" output="&lt;entity&gt;\1&lt;/entity&gt;" dest="5">
<RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="6">
<expression noclean="1,2,3,4,5,6">(?:episodeTitle : )(.[^\r\n]*)</expression>
</RegExp>
<RegExp input="$$7" output="\1" dest="6+">
<expression noclean="1,2,3,4,5,6,7" />
</RegExp>
<expression noclean="1,2,3,4,5,6,7" />
</RegExp>
<expression noclean="1,2,3,4,5,6,7,8" />
</RegExp>
</GetSearchResults>
<!-- returns: results in xml format <details><writer>*</writer><director>*</director><cast>*</cast><rating>*</rating><rank>*</rank><plot>*</plot> -->
<GetDetails dest="3">
<RegExp input="$$4" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;details&gt;\1&lt;/details&gt;" dest="7">
<RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="4">
<expression noclean="1,2,3,4,5">(?:episodeTitle : )(.[^\r\n]*)</expression>
</RegExp>
<RegExp input="$$1" output="&lt;mpaa&gt;\1&lt;/mpaa&gt;" dest="4+">
<expression>(?:mpaaRating : )(.[^\r\n]*)</expression>
</RegExp>
<RegExp input="$$1" output="&lt;plot&gt;\1 \n\2&lt;/plot&gt;" dest="4+">
<expression>(?:description : )(.[^\r\n]*).*(?:callsign : )(.[^\r\n]*)</expression>
</RegExp>
<RegExp input="$$1" output="" dest="4+">
<expression repeat="yes">(?:vSeriesGenre : )(.[^\r\n]*)</expression>
</RegExp>
<expression noclean="1,2,3,4,5" />
</RegExp>
</GetDetails>
</scraper>