Help to rewrite the whole SQL Database?
#1
Hi,

I have since a couple of years used the XBMC database directly (through ODBC) to
  • get nice formatted lists of movies and TV series
  • finding doubles
  • moving whole TV-series to other locations without loosing all the user data (watched, time when watched, etc)
  • and much more.

I have noticed a couple of things with it.
  • the database is not normalized so it has to be some problems for you to use it when programming, compared to a normailzed database.
  • It has structure problems. For instance it does have the information about the id of the movie connected to the users data (watched/unwatched, when it was imported and watched). Then it deletes the private information when moving or deleting and then take the movie back after a while.
  • a lot of "Cxx" field names
  • a lot of wrong types for fields (char when it is supposed to be numeric and such, for instance the season and episode field (C11,C12?)
  • and also here, much more Smile

I have been waiting for Eden to see if you were going to change the structure of the database but now I can see that it is not done any major changes.

My question is simply if you are interested in me rewriting the whole database and (among a lot of other things) make it normalized? That is having the right tables, the right connections between the tables and the right fields in the right tables.
I will not go into much details if you are not interested but some of the differences would be:
  • to separate movie/series data with the users information about movies/series. That would make it possible to move movies/series without loosing the users information. Show that movies are offline (maybe make profiles obsolete with a toggle for showing/not showing offline movies/series because then you can have sources for all the places you are with your laptop (if not using a centralized database)). Delete a whole TV-series and after some time you want to see it again and then put it back (maybe in another format) and still be able to have it marked as watched and also see when it was watched that last time. And even if you mark it unwatched (because you want to see it again) the information of when it was last seen will still be there.
  • when it is normalized it is much easier and more natural to use the database when coding.
  • better naming of fields
  • all fields will be with the right type so they don't have to be converted before use.

(I think) there is a smart way of creating new versions of a database in XBMC. That is one of the reason why a total rewrite wouldn't be that hard to implement. And it is often impossible to do all the coding changes in the same time but that is also easy to use. For all the reading and hopefully a lot of the writing (depending on how SQLlite/MySQL is handling that, and that I don't know) of the database it is possible to use SQL views that looks exactly as the old database looks like, making the transformations easy and the code-changing a process that can be done whenever all the different programmers have the time to do it.

This is not any criticism of how the database looks right now. It is very common that the database gets like this after some years of using when building a lot of new features onto the first version.

In my profession I have been working with MS SQL Server for over 15 years.

/Mats
Reply
#2
You're allowed to critique the current video database - we don't get offended easily, and likely know the faults better than anyone Wink

Compare though, with the other databases - the video database is somewhat unusual in this respect - it is how it is primarily due to faults with the ancient version of sqlite that we used back in the xbox days that couldn't alter tables (thus the c## columns were used to allow simple extension, particularly while the library was under rapid development).

We're not interested in just replacing the existing layout with another layout.

We are interested in replacing the layout with something that allows generality (quite possibly at the cost of a certain amount of unnormalised data - after all, normalisation is not always appropriate.) There's been several aborted attempts at this. I believe the best last version was this: http://wiki.xbmc.org/index.php?title=Dat...chema_4.0a. Related thread is here: http://forum.xbmc.org/showthread.php?tid...tabase+4.0

In particular, what we're after is allowing the support of arbitrary types of content, arbitrary collections of that content (with attached metadata) and arbitrary (where it makes sense) metadata fields for content or collections. Further, all layers in the library hierarchy (filtered lists and the like) must allow retrieval using ideally a single query to get all information required for initial display of list and sorting. Additional information may then be retrieved with individual queries if necessary.

Take a look through the thread/wiki I linked above as a starting point. Once you've absorbed that, if you're still interested, let us know!

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#3
I am glad that you are not easily offended Smile

And I agree, there are always reasons why a database looks like it does and the C##-columns isa perfect example of that now that you explained it.

And my interest is of course also to make a layout that works with all of your current and future ideas of what XBMC should hold.

I have read what you asked me to (although one of the links didn't work but a search fixed that).

In my opinion, what you are trying to do is kind of the opposite of what an SQL database are made for. The database structure you are planning creates:
  • unnormalized (is that even a word Smile ) table structure with a few tables (and I don't think that is anything to strive for) with a lot of different not logical information in the, also not logical, tables.
  • a lot of rather complex, also not logical, queries/views that has to be handled.
  • a database that will be slower (which probably won´t give any problems at all because the databases are really small even with a large amount of movies/series).
  • a much longer start time for new programmers/developers to get into the structure.

Instead what I think you should do is to make a structured normalized database (that of course can take care of everyhing you need regarding your current and future ideas) and then make the complex queries in the few cases where it is needed. It will both be a faster, more general way to do it than your plan and also a lot more easy to program using that database structure.
And if you for some reason that I don't see right now want/need that not logical structure then it can be created with views using Union Select.

Disclaimer: Obviously you know a lot more about XBMC and what your current and future ideas are so I can just answer on what i have seen so far and maybe there is a showstopper somewhere for my idea that I just don't know about right now.

If you still don't agree with me and you think it is worth the time, please explain what your idea of the structure will solve in a better way (or maybe is unsolvable with my suggestion), because I have looked at this for a couple of hours now and I don´t see that at all. Not when I look at the database structure and definitely not when looking at the examples in the other thread .
Or maybe I´m just old fashioned ;-)

/Mats
Reply
#4
This part disappeared from the above post for the simple reason that I forgot to paste it in before posting Smile

I thought EAV/CR was mainly for when you have a lot of different columns and most of them were never used. I also thought that EAV/CR more or less isn't used anymore and that XML-fields were used instead (where it was needed).

This is from Wikipedia and about when EAV is better to use than standard :
  • The data types of individual attributes varies (as seen with clinical findings).
  • The categories of data are numerous, growing or fluctuating, but the number of instances (records/rows) within each category is very small. Here, with conventional modeling, the database’s entity–relationship diagram might have hundreds of tables: the tables that contain thousands/ millions of rows/instances are emphasized visually to the same extent as those with very few rows. The latter are candidates for conversion to an EAV representation.

In my opinion neither of them applies to XBMCs database.
Reply
#5
I suspect you hit the wrong page. We're not wanting EAV/CR at all - that was basically an attempt at something even more general which was thankfully dialed back quite a bit!

Here's the correct link:

http://wiki.xbmc.org/index.php?title=Dat...chema_4.0a

The reason we're looking at genericity (i.e. general "content" table that applies to all video types, and a general "collections" table that applies to natural groupings of that content (tvshows/documentaries/anime/seasons etc.) is that we want the database to be as flexible as possible - there's no points having an episode-specific table and a movie-specific table along with an anime-episodes or documentaries or mini-series or.... - when most of the fields remain the same.

Further, we don't want to restrict the many:many mappings for attributes (genres, actors, directors, tags, keywords etc.) too much, as we almost guarantee there'll be another one we need. Whilst we could use a separate table+linktable(s) per attribute mapping, the tables will essentially be identical, and thus we'd need a further map (or set of functions) to generate the queries. Mind you, this is IMO a minor point compared to the basic idea of content-genericity. (Note that currently we have neither: we have 4 different content tables, and about 7 attribute tables with thus 28 link tables - as you can appreciate, things quickly get out of hand...)

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#6
I tried the same way (with your first links) again, starting with your link that didnt work, and did a search. Yesterday 4.0 came up (without me noticing the lack of "a") and this time 4.0a came first...
But your link to the forum was for 4.0. I have tried to search for 4.0a but cant find any thread for that. Is there any?

Reply
#7
I had some time today so I started to look at 4.0a, although I don’t know what has been said of the database structure and the other things I wrote down, maybe everything is gone through already.... Then I got a bit carried away. What I am trying to say is that I don’t know if you even are interested in what I have to say, and if you aren’t then it is no idea for me to put down any more hours in this. When I realized that, I stopped working, and I am not finished.

I liked the 4.0a a lot more than the one I looked at yesterday Smile

Here are some suggestions, both small and large, in no particular order. I just wrote down all of the things I saw while looking through it. There should be a lot of “maybe”, IMO and “for example” that I didn’t wrote in the text below:
  1. [All tables] Put DateCreated (with a default of getdate()) and DateChanged in every table (date and time down to at least seconds). You never know what it can be good for, both for information and debugging. And with the small size of the databases even a large collection takes it won’t be any noticeable extra space at all.

  2. [Collection] Take away the link between content and collection for the primary collection. Instead make a new field in collection “TopCollection” that always show the top collection number whatever level it is on (if needed).

  3. [File(s)] 2 new fields, Offline (true/false) and LastChangedOL. This needs to have some kind of fast checking if the files are there or not. I don´t think it needs any hash-check or anything, just to check if the file is there or not.
    It also needs some kind of activation from within XBMC because the user when he sees all the Offline media that are not supposed to be there he checks and fix what was wrong (external HDD was off or forgot to put in the network cable etc.) and then he wants all the media to be online again without restarting XBMC.
    Maybe also be using some higher level of checks first (if 70% is in C:\Movies and 30% is in D:\Movies then maybe a start to check if there is a C:\Movies and if it is not then the 70% of the movies that were there can be tagged online).
    A toggle inside XBMC would be needed to not see the offline files.
    For those that just have a local database and uses profiles, maybe profiles is not needed for them then.

  4. [File or Content new field]. I actually don’t know for sure where to put the field Watched, but I lean to the table Content. I don’t think it should be with playCount > 0 or count(*) from settings. There are lots of reasons when you want to put a movie or TV-show to unwatched and still want to have the information on all the times the media was looked at.

  5. [Settings] I am not sure what you need playCount for, isnt it easier to just do a count()

  6. [Content] I am not sure how you in an easy way get “Top Level”, with that I mean The TV-Series or the Music Album. Maybe just put in a field TopLevel (true/false).
  7. [File(s)]. Have the field FileName just to be the actual filenamem without any username, password and/or tree from the path even if it is something contained in a zip-file.
  8. [New table Sources] I miss the table “Sources” for all places to find media and the link to Scraper. Maybe you plan to still have it in an xml-file but I don’t know why.
  9. [StreamDetails] Split that into 3 tables named FileStreamVideo, FileStreamAudio and FileStreamSubtitle for the 3 different groups (or is it more that I don’t know)?
  10. [FileSettings] Put all fields into table File (it is 1:1).
  11. {StackTimes] I don’t know what it is and it is also 1:1 to the File table.
  12. [All tables] Change the naming of fields wherever the same field-name is in more than one table and doesn’t store the same information.
    Example: Scraper, Collection and profile has the field “name”, but the information is not the same so they should be called ScraperName, CollectionName and ProfileName.
  13. [Content] New field “IsParent” to know that it is the parent (The actual TV-show or album) and not the episodes or tracks. I think it is the easiest way to just filter out the Tv-show/album "headers".
  14. [Server/Client] One way of solving a lot of problems, for example that XBMC should be able to use different databases (MySQL, SQLite and one more I don’t know where I saw) and for those that want to use a central database is to install XBMC in 2 parts (from the same install of course). A server and client installation, a little bit like P L E x but everything in one. Then you always can use the SQL database that is most suited for you, and really use the whole potential of the database, because the server part can send the media information to the client. It will also be simple to make another installation and connect to the server and from there get all information about the media.
    Another example is that you have a server installation and then when installing XBMC to a laptop that you want to move, then you can install both server and client on the laptop. Then you don’t have to have profiles for that reason.
  15. [Server/Client] You can then tell XBMC that the client is movable, a question at the installation, and then have features in XBMC like choose media to copy (with all information) from the server to the laptop and then maybe get a question of deleting the locally stored media next time XBMC connects to that server (of course with moving the user data (watched and when and so on) to the server if the user wants to.

  16. [Sources] Explain to the (inexperienced) user that an easy way of doing subcollections is to have TV-shows in one directory (D:\Media\TvShows) and then for every folder beneath XBMC will automatically create subcollections in XBMC. So if there is D:\Media\TvShows\Documentaries and D:\Media\TvShows\Mini-Series then XBMC automatically create the new (sub-)collections to TV-Shows in XBMC.
    Of course that is changeable both at the Info-page of the TV-show and at the Source-settings.
  17. [Files/Path] I can’t see why table Path is connected to table File. It is just 1:1 if I understand it corrrectly. In that case, change File.idPath to File.FilePath and take away the link between the tables.
    Every Source can have one Scraper and the Source (not the Scraper) has some Paths. So the name of the table Path should be SourcePath(s) and the link should be between Source and SourcePaths.

A diagram with small changes (mostly it is your 4.0a).

Image

So tell me what, if anything, we do next about this.

/Mats
Reply
#8
I don't know why the diagram doesnt show up. But here it is as a link.
Reply
#9
Wow - nice work. Lots to get through, but you put the effort in, so I can at least do the same! Smile

1. Will keep it in mind - a nice to have for debugging, I can see - not sure it needs to be in absolutely every table, but definitely useful in the content table and perhaps files/path.
2. Not sure what you mean here?
3. Yes, we definitely want online/offline status to be apparent. Storing it in the db is fine as theoretically at least our file-finders should pick up if they're available or absent. However, we do need to be careful about how we handle files that are deleted - we should be able to tell this pretty easy though (folder is present but file isn't -> flag as deleted). Perhaps a status field?
4. Part of the reason it's in the file table is because then it works in the current file view (even when not scanned) - in the future we can assume everything will be scanned, so in this case it's user-content information (per-user, so it would have to be linked to the profile in some way)
5. We don't store when things were played (play history) we just store a number of times played. This could be changed if we had a play history (again tied to a profile).
6. I think we probably need a type field in the content table? That way we can pick out any shows/albums etc. simply by SELECT FROM content WHERE type="movie" (suitably normalized).
7. Agreed - it does get a little tricky with our VFS though as it's not necessarily heirarchical, so combining file + path to get actual playable item might be tricky.
8. Yup - agreed. One thing you've dropped (which personally I agree with) is that with this layout you can really only assign a scraper to a particular source, rather than a path. Currently you can assign a scraper to any path (and switch scrapers half way down the tree). Personally I don't see a problem with restricting this - my guess is only "power users" even know about this feature.
9. Could do, yes.
10. I guess potentially this could be per-profile (it stores client-specific things with respect to a shared library)
11. It could be in the file table, yup.
12. I dunno what consensus is here, but I have no issue renaming stuff so they're global names wrt to the db.
13. See 6.
14/15/16 Kinda tangential to this. The new DB will be developed independently from XBMC (for ease and sanity) as much as possible. I doubt we'll ever completely separate them out into separate executables, however, though there may well be a separate server build available.
17. I guess this is normalisation - paths typically have more than one file in them, and we'd want to be able to quickly grab all the files by path (or technically, all the content by path). Actually, this brings up quite a separate issue. It would be useful to be able to define virtual file items (typically a folder) that represent content. i.e. a better mapping from a nicely cleaned up/stacked down etc. file list to content. We currently do this via the basePath/parentPath members of the movie table (likely some of the c## fields).

I think the next step forward is refining the schema, and ensuring that everything we want to be able to do is easily queried, plus that it gives us the flexibility we need. Then get up a sample db running that allows queries to be tested out and further refinement. I'm quite happy to start coding something up at that point.

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#10
Thank you for that Smile

  1. The problem is, you don´t know where you would miss it until you do actually miss it Smile. But I agree with you, in some places they wont be necessary, we just don´t know which ones it is going to be.
  2. The person(s) who did the excellent job on 4.0a wrote in one place "2.7. Content is associated to a particular collection via content.idCollection (primary) and the mapping table "collectionlinkcontent" (secondary)."I think it is an easier solution to do as I proposed.
  3. I am not sure you have to delete anything in the database ever when media-files have been deleted. I think you just leave it as is and use On/Offline and deleted flags exactly as you proposed. And for those that are moved it is really easy to just change status and path. If they are moved and the scanner first comes to the place where it is deleted then it marks them as deleted and when the scanner hits the new place it just set the deleted flag off and change the paths, and it is done. If the scanner goes the other way and finds the new place first it obviously cant change the paths until it looks in the old place to see if it is there or not, and if it is in the old place also then you just create new File-records..
  4. It is nowhere right now, the field doesn't exist. But you can always link it back to the Content table from the File table if you will put it there so it works with your example also..
    But the (only) question that should decide in what table the field should be in (IMO) is:If the user has (or afterwards get) multiple copies of the exact same movie or episode, should they both be marked as Watched after you see one or should just the one you have seen be marked as watched and the other be unwatched. My personal opinion as an XBMC user is that I would like to have both marked as watched because I have seen that episode, whatever reason it is that I have double files of the same episode. But other users may look at it differently. But in consequence, if you set the Watched flag in File then you should move the link to Settings from Content to File?! As the structure is now both bookmark and Settings links to Content and not File.
  5. If you put a date-field in table Settings then you have it.
  6. If you put the field in that i suggested in #2, then you have that (through an INNER JOIN): SELECT FROM Content WHERE Collection.localization='TV Shows' (and one INNER JOIN to Collection, of course). If you in the same select add "AND content.TopLevel (=TRUE)" then you get all TV-Series without any considerations if they are in any Subcollections (Documentaries etc.) or not. And if you want Documentaries just change 'TV Shows' to 'Documentaries' above.
  7. I understand. As long as you agree it´s good, then if it cant be done anyway, that is ok Wink
  8. I agree, and the power-users can move their media and again get what they want, for all others it will be much easier. And if you want, you can have it the other way also, it is just when scraping from a higher level the scraping has to check if it is a lower level path that has another scraper. Personally I think it is to over-complicate things.
  9. Ok
  10. Yes, and if that is the case just put the idProfile there instead of moving it.
  11. OK.
  12. The reason I proposed this is when the database gets bigger and the views gets more, you don´t have to rename the fields to something else when you have two tables in a view with the same field-name but with different information in it. You just keep using the original name of the field. And that you can´t do if you have it like it is right now.
  13. Hehe, I did this on and off during the whole day yesterday. Funny that I did the same suggestions in 2 different places with 2 different field-names.
  14. Ok. I just got those ideas while looking at it and wanted to write them down.
  15. This was more of a way to give the inexperienced user an easy way of creating sub-collections.
  16. If we think of the same then it should be a new table ContentPath with links to content, not file. It has to be 1:N because some users may actually save the same TV-Show in more than one place. It should be just for the content that doesnt have any file (TV-Shows and albums).

Now I have seen that you have done a really nice job with the new structure and have done a lot more than I could see when I started this thread. But I still think I can be of help, the question is do you need or want it? If I can be of any assistance in that project, just let me know.

/Mats
Reply
#11
That was quick Smile

2. Do you mean dropping idCollection from the content table and instead just using the collectionlinkcontent link table? I'm not sure under which circumstances a piece of content can be in more than one collection (assuming collection here means something like "TV Show") - I guess technically some episodes can be in 2 "different" seasons (the season they air during, and "specials").

3. Agreed - the only reason for more than just Online/Offline is that in some cases we'll know that not only the item is offline, but it's actually most likely deleted (we'd mark this differently in the UI). Either way, it's the same basic idea.

4. The current db (i.e. in Eden) has playCount in the files table. Personally I think this needs to be a profile-level setting. Possibly also we may wish to consider storing play history (i.e. a list of dates when the file was played - atm we just have playcount and lastPlayed), though I suspect this is getting a little fine-grained for most users. We definitely need the profiles in the database and most things associated with the profile (user rating, playcount, settings possibly tied to that rather than just to the file/content).

5. What I meant is we don't store the history - we do store lastPlayed in the Eden database (in the files table).

6. Hmm, still not sure I get this one. Documentaries would be a sub-collection of TV Shows, but would there be links between the collection "TV Shows" and the content? eg. if we have 2 tv shows "Planet Earth" and "Mad Men" then one is a documentary and one is just a TV Show (no subcategory). How do you see the link tables being setup between the "TV Shows" and "Documentaries" items, the shows, and the episodes in the show?

17. Actually, this is referring to playable items pretty much (episodes, movies, music videos, clips etc) - often movies for instance are actually a folder on disk (or a virtual folder such as a rar or zip) or could be a folder of folders (such as a dvd or bluray folder layout). When the user is browsing their files, they're not really interested in the layout of the files for that movie beyond the fact that the folder item represents it. They don't want to have to click into the folder then click on the movie, they just want it to play when the click on it immediately. Thus, we need something similar to the files table representing these virtual items on the disk - it would have idPath and idContent, possibly idFile and other information (eg hash info so we can detect when things inside that folder changed).

As for where to next, this is currently in limbo due to no-one pushing it forward, so if you'd like to be involved, then great - as I say, it seems to me that the first step is refining the schema to the point we think it has everything we need (we'll likely not know for sure until it's been implemented) and then get a test framework up and running.

Cheers,
Jonathan
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#12
sorry - I haven't read the 4 huge posts of you guys yet, I just wanted to mention that I'd really love to work on a new DB layout. Developing domain driven (DDD) for some years now, I have a pretty good sense for abstracting and generalizing data. I also know about the requirements of initially simple queries for basic media listing + on demand queries. The php framework I'm coding with has a nice concept for this - it's called "lazy loading". Basically you have a "repository" (a class that returns objects of a certain type, abstracting the sql stuff from the application layer) that fetches raw DB data from the primary object table and returns it as ready to use objects. Object properties that are relations to other tables can either be loaded eager (so instantly on creating the object by triggering subqueries) or lazy, which means that the object property itself only holds a proxy class that will load the according data/object on demand/first use. This is pretty nice to work with and I think that's also how it should work in XBMC on the long run, because it would easily allow the desired generalized DB schema.
Reply
#13
(2012-04-09, 15:03)da-anda Wrote: sorry - I haven't read the 4 huge posts of you guys yet, I just wanted to mention that I'd really love to work on a new DB layout. Developing domain driven (DDD) for some years now, I have a pretty good sense for abstracting and generalizing data. I also know about the requirements of initially simple queries for basic media listing + on demand queries. The php framework I'm coding with has a nice concept for this - it's called "lazy loading". Basically you have a "repository" (a class that returns objects of a certain type, abstracting the sql stuff from the application layer) that fetches raw DB data from the primary object table and returns it as ready to use objects. Object properties that are relations to other tables can either be loaded eager (so instantly on creating the object by triggering subqueries) or lazy, which means that the object property itself only holds a proxy class that will load the according data/object on demand/first use. This is pretty nice to work with and I think that's also how it should work in XBMC on the long run, because it would easily allow the desired generalized DB schema.

Must say that having a way to handle linking to other databases is a great thing, that would allow us to for example link a fileitem to a movie on themoviedb and thus just have a form of cache which acts themoviedb, that way we could schedule updates from themoviedb as needed but also if a fileitem is wrongly classed as a movie we can just remove that link.
If you have problems please read this before posting

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

Image

"Well Im gonna download the code and look at it a bit but I'm certainly not a really good C/C++ programer but I'd help as much as I can, I mostly write in C#."
Reply
#14
Another idea is just dropping SQL and using a document store (eg Kyoto Cabinet with a JSON document for value). Though we'd lose the experience of those that deal with SQL, we might gain with a c++ style interface (map:reduce implementation, cursors etc.)
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Image
Reply
#15
I have experience with creating a data abstraction layer within VB6 in a manner similar to that described by da-anda. Just this weekend I was telling some old co-workers about how I felt xbmc could benefit from such a framework. Although I know very little c++, this topic interests me and gives me something to do to in c++.
Reply

Logout Mark Read Team Forum Stats Members Help
Help to rewrite the whole SQL Database?2