"scraper" for embedded tags
#1
After several years of using (and doing a little development for) Media Portal, I recently discovered that XBMC has built-in features (like DLNA) which are very important to me. Nevertheless, I still have a problem with metadata for movies. I have a very well tagged library of mkv files and really don't need to access any web site to get metadata. In fact, running different scrapers leaves lots of movies that cannot be found, because either the German titles are not maintained or the movies simply do not exist on the web pages (e.g. private videos). I followed different discussions here - some people seem to have a similar need.
In short, I would need a scraper for tags that are embedded in the movie files. (I know, scraper is not the correct word here, but this plug-in type describes the functionality.)
From what I have seen in the different discussions, it is not possible to start a script from an XBMC scraper.
For my mkv files, the tool mkvextract delivers nicely formatted XML output with all the needed data in it (see attached sample) when I call 'mkvextract tags /path/moviefile > targetfile.xml'. Of course I could create a web site that, when called with the movie file name, runs mkvextract and returns the XML data. But this is a little like taking a sledgehammer to crack a nut. It would be more efficient to call mkvextract directly from within a scraper and get the XML result immediately. The good thing is that this could work on all OSes that MKVToolnix supports (my favorite is Linux ... another PRO for XBMC).
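To make the direct call concrete, here is a minimal Python sketch of running the command quoted above and capturing the XML instead of redirecting it to a file. The function names are made up for illustration, and it assumes the installed mkvextract prints the tags to stdout the way the redirect in the command implies; other mkvextract versions may expect a different argument order or an output file name.

```python
import subprocess


def build_command(movie_path):
    """Return the argument list for the mkvextract call quoted in the post."""
    return ["mkvextract", "tags", movie_path]


def extract_tags_xml(movie_path):
    """Run mkvextract and return whatever it prints on stdout as text.

    Assumption: this mkvextract version writes the tag XML to stdout.
    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        build_command(movie_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # fail loudly on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or umlauts.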
I am aware of the argument that, unfortunately, there is no real standard for metadata tags in movies, but a little matching table could solve that issue, e.g. 'plot=summary'.
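Such a matching table could be as small as a dictionary. The sketch below is only an illustration: the 'plot=summary' pair comes from the sentence above, while every other target field name is a placeholder I made up and would have to be checked against whatever the importing side really expects.

```python
# Hypothetical matching table: embedded tag name -> target field name.
# Only SUMMARY -> plot is taken from the post; the rest are placeholders.
TAG_TO_FIELD = {
    "TITLE": "title",
    "ORIGINAL_TITLE": "originaltitle",
    "SUMMARY": "plot",
    "DIRECTOR": "director",
    "GENRE": "genre",
    "DATE_RELEASED": "year",
}


def map_tags(tags, table=TAG_TO_FIELD):
    """Rename the tags listed in the table and drop all others."""
    return {table[name]: value for name, value in tags.items() if name in table}
```

Tags without an entry in the table are dropped on purpose, so a differently tagged library only needs a different table, not different code.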
In general, similar possibilities could be created for other movie containers that can have embedded tags (e.g. m4v). For example, the tool mediainfo (not to be confused with the XBMC plug-in of the same name) can be used for different containers.
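For the mediainfo route, a rough Python sketch could look like the following. It assumes a MediaInfo command line build that supports `--Output=JSON` and that the JSON has a `media` object with a `track` list whose entries carry an `@type` key; both should be checked against the installed version. The function names are hypothetical, and which keys the general track contains depends on the container and on how the file was tagged.

```python
import json
import subprocess


def run_mediainfo(movie_path):
    """Return mediainfo's JSON report for one file as text.

    Assumption: the installed mediainfo CLI accepts --Output=JSON.
    """
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", movie_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def general_track(json_text):
    """Pick the container-level ("General") track out of the report.

    Returns an empty dict if the report has no such track.
    """
    data = json.loads(json_text)
    tracks = (data.get("media") or {}).get("track") or []
    for track in tracks:
        if track.get("@type") == "General":
            return track
    return {}
```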
Maybe I have overlooked some discussions that point to possible solutions. I would really appreciate any hint about how to transfer embedded metadata into the XBMC database.
For Media Portal I use Ant AMC and the plug-in MyFilms (which is also not what I really wanted), so it would also be possible to create a MySQL export from AMC and somehow convert it to the XBMC format. But I would really prefer to have the embedded tags as a source.
Another problem seems to be getting the real file name into a scraper buffer (just my impression from scanning the comments).
Mhhh... I did not figure out how to attach files here, so below the output of mkvextract.
Code:
<?xml version="1.0"?>
<!-- <!DOCTYPE Tags SYSTEM "matroskatags.dtd"> -->
<Tags>
  <Tag>
    <Targets>
      <TargetTypeValue>50</TargetTypeValue>
    </Targets>
    <Simple>
      <Name>TITLE</Name>
      <String>10.000 B.C.</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>ORIGINAL_TITLE</Name>
      <String>10,000 BC.</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>SUMMARY</Name>
      <String>Ein junger Mann wird dank seines Geschicks bei der Mammut-Jagd... (truncated)</String>
      <TagLanguage>de</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>ACTOR</Name>
      <String>Steven Strait, Camilla Belle, Cliff Curtis</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>DIRECTOR</Name>
      <String>Roland Emmerich</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>IMDB</Name>
      <String>tt0443649</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>RATING</Name>
      <String>5.8</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>COUNTRY</Name>
      <String>USA</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>GENRE</Name>
      <String>Abenteuer, Drama</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>DATE_RELEASED</Name>
      <String>2008</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>LANGUAGES</Name>
      <String>Deutsch, Englisch</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>LAW_RATING</Name>
      <String>12</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>LENGTH</Name>
      <String>104</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
  </Tag>
</Tags>
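Reading this output back is straightforward. Here is a minimal Python sketch, assuming the structure shown above (`Simple` elements with a `Name` and a `String` child); the function name and the shortened sample are only for illustration. If a name occurs more than once, the last value wins in this simple version.

```python
import xml.etree.ElementTree as ET


def parse_tags(xml_text):
    """Turn mkvextract tag XML into a {NAME: value} dictionary."""
    root = ET.fromstring(xml_text)
    tags = {}
    for simple in root.iter("Simple"):
        name = simple.findtext("Name")
        value = simple.findtext("String")
        # Skip incomplete entries that lack a name or a value.
        if name is not None and value is not None:
            tags[name] = value
    return tags


# Shortened copy of the sample above.
SAMPLE = """<?xml version="1.0"?>
<Tags>
  <Tag>
    <Targets>
      <TargetTypeValue>50</TargetTypeValue>
    </Targets>
    <Simple>
      <Name>TITLE</Name>
      <String>10.000 B.C.</String>
    </Simple>
    <Simple>
      <Name>DIRECTOR</Name>
      <String>Roland Emmerich</String>
    </Simple>
  </Tag>
</Tags>
"""

print(parse_tags(SAMPLE))
```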
#2
This is EXACTLY what I was looking for, but my container is .mp4 files...... I've taken a lot of time to ensure all my embedded meta data is correct and exactly the way I wish them to be and, for me as well, I did all this work prior to discovering Kodi... Does anyone know of a program or a way to use the embedded meta-data and extract for Kodi use? ANY help would be most appreciated!!! Thanks
#3
I'm glad it isn't just me. I have a large library of MP4 files (actually M4V and M4A but they are all really just MP4s) that are very well tagged and 'all' I want to do is have Kodi pull the tags, artwork etc. out of those files. No need for external lookups, web scrapers etc. I am still amazed that it does not do that 'out of the box'.

I too started down the 'own web site / scraper' route, but it seems (unless I missed something) that the only thing Kodi passes to the scraper is the 'title' (file name minus path and extension), which of course is not at all sufficient, since it is quite possible to have multiple files in a large media library that happen to have the same name part.

If only there were some way to get Kodi to pass the full pathname of the file to the scraper in the web request, then at least the 'sledgehammer' approach could work for me (since my CGI script could then figure out exactly which file this should be) until (if) someone on the Kodi team decides to support local metadata plugins (or even provides an MP4 'scraper' as standard).

So near, and yet so far :(
