[RFC] Database Schema 4.0 - Request for Comments (Developers and DB Admins only!)
#31
CrystalP Wrote:Jmarshall, Firnsy, I didn't mention a 'source' table and sourceid xref from 'path' earlier, as it didn't seem required to port current functionnality.

It would be helpful to easily and safely implement certain improvements though, such as ticket 8556 and a smarter 'Clean Library' wrt NAS (test the top level path of the source - if not available, don't clean paths of the source). That's just off the top of my head. I don't mean to request features in this thread, so I'll stop here Smile

When we're talking about source are we referring to a source entry similar to that found in sources.xml? If so, are we not duplicating that information? And if so, do we want that?

If we just want to identify a path table entry as a top root node, are we better off just adding a field for it (eg. "rootNode") and then we can grab all rootNodes via "WHERE rootNode=1"

Just want to get it right in my head, this would save building a pseudo path table for just source paths as well as a corresponding link table.
Reply
#32
ph77 Wrote:I too think that perhaps the collections should be separated in a different table, if only for database performance issues.
The metadata varies a lot depending on the type of collection. In music content, the "Album" is a collection but it can have a quite rich set of metadata. So perhaps as you might have videocontent, musiccontent, etc you can have videocollections, musiccollections, etc so that content specific metadata can be specified.

I am growing on this suggestion of having dedicated *collection and *content tables ensures we can optimise the metadata to content and still have the tables coexist under one DB. My only reservation is that it breaks is the intercontent relations such as Soundtrack to Movie which is to be provided by contentlinkcontent.

ph77 Wrote:One different issue: Is there is a bit of overlap between "classifications" and "collections", as you can have nested collections? Might be a good idea to merge these concepts.

What is the likelihood of mixed content collections (ie collection consisting of both music and video)? This follows on from the aforementioned issue of intercontent relations and the most efficient storage method for these links.

Is there a need for nested collections? Examples?
Reply
#33
firnsy Wrote:I am growing on this suggestion of having dedicated *collection and *content tables ensures we can optimise the metadata to content and still have the tables coexist under one DB. My only reservation is that it breaks is the intercontent relations such as Soundtrack to Movie which is to be provided by contentlinkcontent.
Not necessarily. See last paragraph.

firnsy Wrote:What is the likelihood of mixed content collections (ie collection consisting of both music and video)? This follows on from the aforementioned issue of intercontent relations and the most efficient storage method for these links.
Not sure if it is feasible on the ui side, but for example I would surely like to have grouped together the photos and the video clips from my camera.

firnsy Wrote:Is there a need for nested collections? Examples?
TVSeries > Season > Episodes.
"TVSeries" collection has nested multiple "Season" collections.
Also similarly in music you can have an Album (or BoxSet) > Discs > Tracks.


If I analyze it correctly, there are two types of collections, let me name them as Logical and Natural.
The Logical collections are user defined as you described them in classifications table. They don't hold much of metadata, apart from a title and a description maybe. They can contain any content or collection even of mixed type (like photos and videos). Movie Sets can fit here as well.

The Natural (or structural/technical?) ones are natural content collections based on the content type like Music Albums, TvSeries, etc. These can have quite rich metadata, they can be nested as well, but should be restricted to only one content type.

So I suggest that the classification table to be used only for the user defined collections and that we define separate content and collection tables for each content type. Each table (also the collection ones) has a contentId as primary key that is unique over the whole content and collection tables; it is derived from a global sequence. The parentId on each table will be a reference to a contentId. The contentlinkcontent table can be used for inter-content relations as it is now, with the addition of a "targettable" field which will hold the name of the table that contains the target contentId (not sure that this is optimal though). EDIT: Similar change of course has to be applied in classificationlinkcontent table.
Reply
#34
@ph77: That's exactly how I see it as well - classifications are simply user-defined top-level groupings or content and collections that contain limited metadata (title, description, and some artwork) whereas collections are natural groupings that contain rich metadata. It's a subtle difference, but I think one that's strong enough to warrant separation.

Nested collections are definitely needed (see Show/Season/Episode). I was initially tempted to think that the "immediate parent" collection of a piece of content was unique, but movie sets break this rule, as do compilations of songs in some respects (though the user normally has unique versions of the songs per album). Thus, I think we'd definitely need a link table describing the heirarchy of collections and their children. In fact, I think it's specific enough to warrant 2 or 3 separate tables:

1. collectionslinkcollections. Describes the parent/child relationships with collections, which we hit recursively to grab all the collections under a collection.
2. collectionslinkcontent. Describes the parent/child relationships between collections and content - only needs hitting once as there's no recursion required, so can be used as a join to the content table(s) to get the actual content in a collection.
3. contentlinkcontent. Describes relationships between content that isn't described by a collection, such as movies and soundtracks.

Obviously there'd also be links from classifications to content and collections, though I'm not sure quite how to best represent that - after all, the classifications are really helping to describe the users filesystem in some respects, so could be described by queries rather than links.

As to the question of separate metadata tables per "content type" or "collection type", I guess it depends in some respects as to how we construct queries for the various "grouping" and "filtering" capabilities: If we have separate metadata tables, then we need to query each one of these in turn which may have implications if we wanted to select 100 pieces of content randomly for instance.

Ofcourse, one presumes that such filtering and grouping would have to be done on information that was common to each content type (or defined individually for each content type) anyway, so this possibly wouldn't be an issue. Perhaps, then, we could have the "content" table take all the common things that apply to all content types, and then have specific tables if we deem there's a significant enough differentation to warrant it over a generic key/value pair table?

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#35
jmarshall Wrote:Nested collections are definitely needed (see Show/Season/Episode). I was initially tempted to think that the "immediate parent" collection of a piece of content was unique, but movie sets break this rule, as do compilations of songs in some respects (though the user normally has unique versions of the songs per album).
That's why I made the differentiation earlier. Movie Sets and Playlists/Compilations are user defined collections and they can stored in the respective table (currently "classifications").
For example movie sets could be under a standard classification entry like "Movie Sets", music playlists under a standard classification entry like "Music Playlists", etc. Photo albums, Music Video playlists can be there as well. All these don't need any metadata other than name, description and artwork.

Then the collection table can ensure a unique parent per item and the extra tables collectionslinkcollections and collectionslinkcontent are not needed.
Reply
#36
The complication with sets that I was thinking of was that there's perhaps the possibility for more metadata information for the set provided by themoviedb or similar. Perhaps this is not something to worry about - it certainly simplifies things as we can just have a parentID in the collections and content tables rather than worry about link tables. The link table then deals only with non-parent-child relationships.
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#37
firnsy Wrote:When we're talking about source are we referring to a source entry similar to that found in sources.xml? If so, are we not duplicating that information? And if so, do we want that?

Yes, I was thinking of a source entity similar in content to sources.xml, with the idea to actually replace sources.xml or reduce it to something like "<sources><database>127.0.0.1</database></sources>"
XBMC has the data either way, but third-party tools such as media managers could benefit from being able to find all the data in one place, instead of having to locate and parse another file to make sense of the db.
I'll agree that it's a nice to have and not a requirement. It also depends on your plans wrt interfaces and third-party tools.

firnsy Wrote:If we just want to identify a path table entry as a top root node, are we better off just adding a field for it (eg. "rootNode") and then we can grab all rootNodes via "WHERE rootNode=1"

Just want to get it right in my head, this would save building a pseudo path table for just source paths as well as a corresponding link table.

I'm not sure why we would want to do that if sources.xml is available with the rootNode path information.

My point was in favor of a way to identify all paths belonging to a source. It would enable janitorial functionnality that's missing today, such as removing all paths of a source when deleting a source or a smarter and more thorough 'Clean library'.

My beef is that right now, the only way to find the paths that belong to a source (or navigate the hierarchy) is to use the path in sources.xml as the root and scan all paths in the path table, looking for a similar beginning. That's not efficient, and is it reliable? Are there guarantees that paths are perfectly hierarchical for all source types (disk, upnp, etc...), that the path component separators are the same, ...
Can we end up with the same path in the db through two different sources? Eek
Always read the Kodi online-manual, the FAQ and search the forum before posting.
Do not e-mail Kodi Team members directly asking for support. Read/follow the forum rules (wiki).
For troubleshooting and bug reporting please make sure you read this first.
Reply
#38
We definitely want/need a way to bind paths to a particular "node" in some way or another to allow efficient cleaning/scanning etc. We definitely cannot rely upon a hierarchical filesystem in terms of the URLs given at each level. However, multiple paths could be in multiple "nodes" though IMO this is not good practice. We do have to take care of it though as it will likely occur.

Whether or not we stick with sources.xml or move that to the database as well is a separate issue, though it certainly makes sense to move it to the database and not have the dupe'd information, given we'd need some way to relate the two otherwise.

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#39
Firstly l've moved the schema to the Wiki to ease reading as opposed to being broken up over multiple posts.

CrystalP Wrote:... benefit from being able to find all the data in one place, instead of having to locate and parse another file to make sense of the db.
I'll agree that it's a nice to have and not a requirement. It also depends on your plans wrt interfaces and third-party tools.

A nice to have and something we should definitely consider/plan for.

CrystalP Wrote:I'm not sure why we would want to do that if sources.xml is available with the rootNode path information.

My point was in favor of a way to identify all paths belonging to a source. It would enable janitorial functionnality that's missing today, such as removing all paths of a source when deleting a source or a smarter and more thorough 'Clean library'.

Right. So for source paths (ie root paths/nodes) with the draft schema we could identify them by looking for a parentID=0 and then recurse to find all linked children via idPath/idParent within the paths table.

The recursion is not ideal but, for the low frequency janitorial tasks we are referring to, this could be within limits.

CrystalP Wrote:Can we end up with the same path in the db through two different sources? Eek

I'm sure there exists a user that could achieve this. However, whilst its necessary to identify potential conflicts I don't believe it's necessary to support them.
Reply
#40
This is my interpretation (from recent discussion) of how the classification, collection and content linkages are being described.

Image

I figured I get it mapped out first before massaging the schema.
Reply
#41
Actually, I'd be tempted to have the collectionlinkcontent type a 1:many relationship only - i.e. each content type can belong to one and only one collection. A collection then is simply a naturally occuring collection of content - albums, seasons, shows, artists, rather than an "unnaturally" occuring collection of content that has many:many relationships (sets, genre, keywords, actors, directors etc. etc.). These natural collections could then use triggers to handle the typical summaries of the data underneath them - number of subcollections, number of subcontent, watched and unwatched subcontent (this might be tricky with multiple profiles?)

In this setup the collectionlinkcontent table is replaced directly with a collectionID field in the content and collection tables. Grabbing all content under a collection is then simply a recursive query on the collections table to grab all subcollections, then a single query on the content table.

Classificationlinkcontent and classificationlinkcollection could then possibly be combined with a flag for the table you're linking to, though there's no particular reason for that.

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#42
jmarshall Wrote:In this setup the collectionlinkcontent table is replaced directly with a collectionID field in the content and collection tables. Grabbing all content under a collection is then simply a recursive query on the collections table to grab all subcollections, then a single query on the content table.

I think I've captured what's been described and have pushed the concept also upto the collections table. If content can only belong to one collections in that a collection to sub-collection is also a 1:N relationship.

Image

Going one step further, my initial plans for the contentlinkcontent construct may be made redundant by the new collections. Can anyone see benefits (examples) of this additional construct that can't be solved by, or better suited to, collections?

jmarshall Wrote:Classificationlinkcontent and classificationlinkcollection could then possibly be combined with a flag for the table you're linking to, though there's no particular reason for that.

I don't think we need to combine these either.
Reply
#43
Yup, I think that diagram best describes how I see things working at least Smile

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#44
ph77 Wrote:TVSeries > Season > Episodes.
"TVSeries" collection has nested multiple "Season" collections.
Also similarly in music you can have an Album (or BoxSet) > Discs > Tracks.

i still miss the possibility to link "Music Videos" to albums. I have a huge music CD/DVD collection with more than 160 music DVDs (most of them concerts).
at the moment i have 200+ iso files on my HDs cause there's still no way to have the single tracks appear under an album just like music. also many of my CDs have multimedia tracks (music videos, interviews ....) that can't be linked to the CD.

why not treat music like this:

Album/Boxed Set > Disc (CD or DVD) > Tracks (audio or video)
Kodi 18.7 on Windows 10 (2x)/FireTV Stick (5x)/Android (2x) | MariaDB 5 on Synology Diskstation DS115/DS216+II
Reply
#45
dt2510 Wrote:why not treat music like this:

Album/Boxed Set > Disc (CD or DVD) > Tracks (audio or video)
I think that goes to the request section. The new schema will be able to support content relations.


Firnsy, what about the content (attributes etc) of a collection item? Where it will be stored?
And what about the other content types apart from video? Will they have their own content/collection tables or not?
Reply

Logout Mark Read Team Forum Stats Members Help
[RFC] Database Schema 4.0 - Request for Comments (Developers and DB Admins only!)1