GSOC 2013: Improved Database Layout

  Thread Rating:
  • 1 Vote(s) - 5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Post Reply
Martijn Offline
Team Kodi
Posts: 16,925
Joined: Jul 2011
Reputation: 310
Location: Dawn of time
Post: #31
(2013-07-28 11:28)da-anda Wrote:  I'm not sure if XML storage is a good idea, especially because you can't really use this information to search/filter your database efficiently which might be an issue on embedded systems and systems like the PI with weak CPU.
not a database expert (far from it) but why use XML when you already have a database structure in place?

Read/follow the forum rules.
For troubleshooting and bug reporting, read this first
Interested in seeing some YouTube videos about Kodi? Go here and subscribe
(This post was last modified: 2013-07-30 13:35 by Martijn.)
find quote
da-anda Offline
Team-Kodi Member
Posts: 5,779
Joined: Jun 2009
Reputation: 77
Location: germany
Post: #32
Martijn, you can't create a DB field for every EXIF tag of an image f.e. so you store it as XML, same with some other meta properties. So it's not bad in general, but I wouldn't use it for "important" meta data that might be used in query constraints
find quote
Martijn Offline
Team Kodi
Posts: 16,925
Joined: Jul 2011
Reputation: 310
Location: Dawn of time
Post: #33
(2013-07-30 13:42)da-anda Wrote:  Martijn, you can't create a DB field for every EXIF tag of an image f.e. so you store it as XML, same with some other meta properties. So it's not bad in general, but I wouldn't use it for "important" meta data that might be used in query constraints

understood Smile

Read/follow the forum rules.
For troubleshooting and bug reporting, read this first
Interested in seeing some YouTube videos about Kodi? Go here and subscribe
find quote
Robotica Offline
Banned
Posts: 1,484
Joined: Aug 2010
Post: #34
(2013-07-30 13:42)da-anda Wrote:  Martijn, you can't create a DB field for every EXIF tag of an image f.e. so you store it as XML, same with some other meta properties. So it's not bad in general, but I wouldn't use it for "important" meta data that might be used in query constraints

Yes this is called "sidecars". Similar to .NFO for video. Lots of software use those sidecar files for pictures.
find quote
Montellese Offline
Team Kodi Developer
Posts: 4,836
Joined: Jan 2009
Reputation: 72
Location: Switzerland
Post: #35
First of all I want to say that it sounds impressive what you have achieved so far.

(2013-07-25 16:51)adamsey Wrote:  My biggest project over the last week has been working on the table storing directory entries, which holds the metadata linking objects to the actual data in the file system (if applicable). In the current video database, there is a settings table and a streamdetails table. Settings stores things like zoom amount, selected audio stream, etc; streamdetails stores all the available video, audio, and subtitle streams. However, these tables could be very sparse. Now, the directory entry table has two columns, "settings" and "streams", which instead hold an XML string with these same details. I just finished writing methods which parse this XML and convert it into the same VideoSettings and StreamDetails objects as used before, and vice versa. Since this is all done in XML rather than a separate DB table, this also has the added benefit that a separate type of schema could be used if something like a picture needed file settings as well, perhaps storing changes made to saturation, contrast, things like that.
Have you done any performance testing on this XML serialization/deserialization yet? I have tried to achieve something similar in the current video database with artwork types and paths being encoded in JSON in a single table column. While retrieval of a media item with its art object (instead of seperate queries for the media item and the art items) was a lot faster, the deserialization of the JSON object was so slow that it completely countered the performance gain from the single database query. I'll see if I can find the performance tests that I ran back then. Not sure if XML serialization/deserialization is faster than JSON and by how much.

(2013-07-25 16:51)adamsey Wrote:  
  • We, of course, still have moviesets, but, now, we also easily have one movie in multiple sets.
We had that in the past as well but decided against it for several reasons, mainly that the default scraper (TMDB) has a one-set-per-movie policy and that it caused endless headaches in the video library views where sets and movies would be displayed in a mixed list. If you e.g. search for a specific movie which is part of multiple sets, which one do you point at and why.

(2013-07-25 16:51)adamsey Wrote:  
  • When relating two objects, the relationship has a sequence index. This is used for sorting when fetching all related objects, but also for keeping track of things that should be numbered without adding an unnecessary attribute.
  • So, things like episode number in a season, season number in a series, or track number on an album are managed through the sequence index rather than having to check the attribute list and then sort.

Do you happen to have a diagram of the current database layou? I'm not sure I fully understood the whole segmentation e.g. why specific artwork types should be any different than general attributes etc.

Always read the online manual (wiki), FAQ (wiki) and search the forum before posting.
Do not e-mail Team Kodi members directly asking for support. Read/follow the forum rules (wiki).
Please read the pages on troubleshooting (wiki) and bug reporting (wiki) before reporting issues.
find quote
adamsey Offline
Junior Member
Posts: 14
Joined: Jan 2013
Reputation: 2
Post: #36
(2013-07-30 13:32)DiMag Wrote:  
1. In your schema, would genres, tags and sets be attributes? And would attributes (in general, not just the aforesaid ones) be freely linkable to any object type, or would there be a restriction based on semantical relationships? Example: genre is inappropriate for documentaries or (should they come to be defined as a distinct object) sports shows. Tags can be in principle be used across all objects. Sets only serve as a binder for a single collection.

Genres, tags, and sets are all actually object types. They are child types of the Grouping type, which also includes things like Season, Album, and Playlist. So, there are instead relationships defined, like "movie_has_genre", "object_has_tag", "movieset_has_movie"; this is where the binding between a movie and its genres comes from. So, if documentary is a child type of Movie, the "movie_has_genre" relationship need not be inheritable so that, in your example, we won't be trying to fetch genres in a case where it may be inappropriate.

Also, like the object types, we have attribute types, which have a defined object type they can be applied to and a defined inheritability, so, for example, "filename" is an attribute for the Content object type, but it is inheritable for all of the children so that videos, songs, pictures, etc, can have this attribute as well. Consequently, any given object can not have all possible attributes; this prevents us from trying to fetch the plot of a Band, for example.

(2013-07-30 13:32)DiMag Wrote:  
2. If you plan to---and I submit that you should do so--- drop the segmentation of the video database between unitary (movie) and episodic (tvshow) objects, shall you then retain movie as a template for objects with that content but allow it to have episodic content? (Presently the only practical need for this is the ability to define DVD Extras in the same manner as TV shows define special seasons. It is not a negligible need.)

I'm not entirely certain I'm understanding this question, so please feel free to correct me if this response is not helpful. I'm trying, at least, to treat video files which represent movies and video files which represent episodes of a show similarly. There is a Video object type, and Movie, TV Show, and Episode all inherit from it. (Now that I'm writing this out, it seems like TV Show should be a grouping, and only have Movie and Episode be Videos). Movie (currently) has one grouping relationship, "movieset_has_movie," so you can put a movie into arbitrary sets. Two other relationships are "tvshow_has_season" and "season_has_episode." Through these two, relationship between a tv series and all of its episodes is conducted. Beyond this grouping, episodes are videos just like movies.

As for the DVD extra example: with the new schema of objects connected by relationships, it would be rather trivial to have added an object type, say "DVD Extra", and a relationship "movie_has_extra" which would link these two together. Then, some sort of GUI would need to be added to the movie info page (this part isn't my forte) to access this linked content.

(2013-07-30 13:32)DiMag Wrote:  
3. In defining new objects, do you take into account the availability of scrapable metadata material at present, or do you reason "once I make the structure available, at least one site will rise to the challenge and make metadata available"?

One of the goals of this new schema is
Quote:Everything gets added to the library, whether we can find online information or not.
So, I think the idea would be, we have all of these attributes, more can be added when wanted/necessary, and either allow the user to fill them or a scraper will rise to the occasion. However, I'm just designing the database to make it easy to do, so I can't speak for certain how this will go.
find quote
adamsey Offline
Junior Member
Posts: 14
Joined: Jan 2013
Reputation: 2
Post: #37
(2013-07-30 15:34)Montellese Wrote:  Have you done any performance testing on this XML serialization/deserialization yet? I have tried to achieve something similar in the current video database with artwork types and paths being encoded in JSON in a single table column. While retrieval of a media item with its art object (instead of seperate queries for the media item and the art items) was a lot faster, the deserialization of the JSON object was so slow that it completely countered the performance gain from the single database query. I'll see if I can find the performance tests that I ran back then. Not sure if XML serialization/deserialization is faster than JSON and by how much.

I haven't gotten to the point of performance testing with this. I've just done some tests to make sure the mapping from XML to settings/streams object works. While it seemed pretty fast going both ways, I'll make sure to do some testing performance-wise.

(2013-07-30 15:34)Montellese Wrote:  If you e.g. search for a specific movie which is part of multiple sets, which one do you point at and why.

Yeah, I guess that makes sense. There's nothing forcing the multiple sets thing, but the ability is there if someone wanted to tackle the display problem in the future.

(2013-07-30 15:34)Montellese Wrote:  Do you happen to have a diagram of the current database layout? I'm not sure I fully understood the whole segmentation e.g. why specific artwork types should be any different than general attributes etc.

If you look up the pages for "Database Schema 4.0" and "Database Schema 4.0/a" on the wiki, the 4.0 page has a good listing of the classes and how they work together. I took the idea of the artwork table from 4.0/a. I had to play with this in my head for a while, because, if you look at the 4.0 page, it seems to say that things like fanart, thumbnails, logos, etc. should be stored alongside the users pictures from the picture database, and then we'd need things like "movie_has_fanart", "album_has_cdart" to link them. To me, this seemed counterintuitive. Then I thought, why not make fanart an attribute? But, the thing is, we would probably want more attributes on fanart than just a url or location. Looking at Schema 4.0/a, there was a table called image, which stored this artwork, along with a height and width, as well as a BLOB column which could be used as a cache. I thought this seemed like a good idea, especially in support of the proposed client/server architecture. The server could maintain the cache, and the client could just lookup the BLOB and render it to a texture, rather than having to lookup the url and then fetch the image from there. So, I then renamed this Image table to "artwork" and further added an "artworkType" table because 1) this fit in with the scheme for the rest of the tables and 2) this allowed us to finely specify which types of artwork could go where.
find quote
griffore Offline
Member
Posts: 76
Joined: Oct 2007
Reputation: 0
Location: Ontario, Canada
Post: #38
(2013-07-30 13:42)da-anda Wrote:  Martijn, you can't create a DB field for every EXIF tag of an image f.e. so you store it as XML, same with some other meta properties. So it's not bad in general, but I wouldn't use it for "important" meta data that might be used in query constraints

You could use row modelling or an Entity–attribute–value type data model. In these cases, the data is pivoted to be stored in rows rather than across columns. So you can expand the data that is stored in the table dynamically.

An EAV design would probably have larger impacts to the XBMC data model, but you would be able to store data for an infinite amount of objects and their details and characteristics.

Either way, good progress. Looking forward to working with it.
(This post was last modified: 2013-07-31 05:23 by griffore.)
find quote
da-anda Offline
Team-Kodi Member
Posts: 5,779
Joined: Jun 2009
Reputation: 77
Location: germany
Post: #39
(2013-07-31 05:22)griffore Wrote:  
(2013-07-30 13:42)da-anda Wrote:  Martijn, you can't create a DB field for every EXIF tag of an image f.e. so you store it as XML, same with some other meta properties. So it's not bad in general, but I wouldn't use it for "important" meta data that might be used in query constraints

You could use row modelling or an Entity–attribute–value type data model. In these cases, the data is pivoted to be stored in rows rather than across columns. So you can expand the data that is stored in the table dynamically.

An EAV design would probably have larger impacts to the XBMC data model, but you would be able to store data for an infinite amount of objects and their details and characteristics.
sure, but it'll be slow
(This post was last modified: 2013-07-31 09:49 by da-anda.)
find quote
adamsey Offline
Junior Member
Posts: 14
Joined: Jan 2013
Reputation: 2
Post: #40
(2013-07-31 05:22)griffore Wrote:  You could use row modelling or an Entity–attribute–value type data model. In these cases, the data is pivoted to be stored in rows rather than across columns. So you can expand the data that is stored in the table dynamically.

An EAV design would probably have larger impacts to the XBMC data model, but you would be able to store data for an infinite amount of objects and their details and characteristics.

Either way, good progress. Looking forward to working with it.

The attributes table is essentially EAV, but the object types and attribute types tables help define/limit which types of attributes can be given to which objects; this should make filtering and retrieval better.
find quote
topfs2 Offline
Team-Kodi Developer
Posts: 4,548
Joined: Dec 2007
Reputation: 17
Post: #41
Regarding XML store, may I suggest looking into subject, predicate and object triplet structure. This is what sparql stores with and is very dynamic.
Basically you have 3 columns, subject, predicate and object.

movie113 hasSubtitle http://foo.com/23123123
movie113 hasSubtitle http://foo.com/45554
movie113 hasTitle X-Men

I'm far from a database expert so I'll leave the performance questions to someone who knows better, just something worth considering which is searchable in compared to xml/json store.

If you have problems please read this before posting

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

[Image: badge.gif]

"Well Im gonna download the code and look at it a bit but I'm certainly not a really good C/C++ programer but I'd help as much as I can, I mostly write in C#."
find quote
DiMag Offline
Senior Member
Posts: 209
Joined: Nov 2012
Reputation: 0
Post: #42

@ adamsey:

1.Everything semantic is an object. You write "Genres, tags, and sets are all actually object types." In other words, everything semantic is an object. Makes sense. The best way to make the definition of relationships flexible is to define all relating parts as independent objects, then link them per separate expressions. It is a perfect mirror of Dr. Codd's logic "everything is a table, and every table can be linked to any table per foreign keys --- you need no more logical definition than that".

2. Attributes distinguished from objects. Does it follow that attributes are not objects because they in contrast describe something inherent to the file? Stated otherwise, is the the distinction between objects and attributes coterminous with the distinction between content and property?.

3. Episodic content: is it an attribute? If that is so I ask myself whether episodic content (and all other format-related distinctions of which the episodic content is a part, say, mini-series = finite episodic content) isn't in reality an attribute. (But if it is easier to treat it as an object for database management purposes, then so be it.)

4. Show schema of new video file types. At some point you must state which new video file types you propose to add to and tvshows (documentatirs for sure, but what else beside them?), how the new additions' basic semantic schema (nfo tags/db fields) will look like, and also which additions to the semantic schemas of movies and tvshows you propose. (We have no entries for Awards and Reviews, for instance.)

5. Full-text search: a matter for the database? What is missing from XBMC is the ability to list all files which are tagged as referring to, say, the M-16 rifle. The GlobalSearch script searches titles only; it would not pick a file named "Assault_Rifles.ext". Is it something for the said script to fix, or would it be more appropriate to include the full-text-search capability into the new database design?

find quote
pecinko Offline
Donor
Posts: 3,954
Joined: Dec 2008
Reputation: 61
Location: Prague / Belgrade
Post: #43
(2013-07-28 11:28)da-anda Wrote:  I'm not sure if XML storage is a good idea, especially because you can't really use this information to search/filter your database efficiently which might be an issue on embedded systems and systems like the PI with weak CPU.

For devices with weaker CPU centralised library with nice set of pre build filters would be more efficient solution than fine-tuning whatever DB. I'm not trying to say we need slow DB just that there might be more efficient solution for the problem.

Furthermore, with centralised library populated on a server it would be interesting to dump some of the directory listings to XMLs and refresh them after library updates. So when you click on Documentaries entry on Home page and choose "Unwatched only" filter, server would give already prepared XML to the client. RPi would then only need to do parsing.

As we can see centralized DB (MySql) helps improve RPi performance and above example is similar to how iTunes works. Now me may or may not like iTunes but described model works nicely performance wise and have other benefits as well.

My skins:

Amber
Quartz

find quote
pecinko Offline
Donor
Posts: 3,954
Joined: Dec 2008
Reputation: 61
Location: Prague / Belgrade
Post: #44
(2013-08-04 20:07)pecinko Wrote:  
(2013-07-28 11:28)da-anda Wrote:  I'm not sure if XML storage is a good idea, especially because you can't really use this information to search/filter your database efficiently which might be an issue on embedded systems and systems like the PI with weak CPU.

For devices with weaker CPU centralised library with nice set of pre build filters would be more efficient solution than fine-tuning whatever DB. I'm not trying to say we need slow DB just that there might be more efficient solution for the problem.

Furthermore, with centralised library populated on a server it would be interesting to dump some of the directory listings to XMLs and refresh them after library updates. So when you click on Documentaries entry on Home page and choose "Unwatched only" filter, server would give already prepared XML to the client. RPi would then only need to do parsing.

As we can see centralized DB (MySql) helps improve RPi performance and above example is similar to how iTunes works. Now me may or may not like iTunes but described model works nicely performance wise and have other benefits as well.

Additional benefit - dump "recently added" and "in progress" directories to XMLs. Use skin widgets (or core functionality) to parse first X items in order to show them on home page.

My skins:

Amber
Quartz

find quote
DiMag Offline
Senior Member
Posts: 209
Joined: Nov 2012
Reputation: 0
Post: #45

Quote:pecinko wrote:
For devices with weaker CPU centralised library with nice set of pre build filters would be more efficient solution than fine-tuning whatever DB. I'm not trying to say we need slow DB just that there might be more efficient solution for the problem.

Totally agree.

find quote