GSOC 2013: Improved Database Layout

  Thread Rating:
  • 1 Vote(s) - 5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Post Reply
Milhouse Offline
Team-Kodi Member
Posts: 12,001
Joined: Jan 2011
Reputation: 583
Post: #16
(2013-05-30 12:40)Tolriq Wrote:  According to some of my tests, the main problem of rpi and all flash based distribution is much more the slowness of the disk than cpu for database.

The more random access the slower it will be and such a complex multi row link thing will be slow Smile But the benefits are so good over that.

I guess for those platform the only performance solution would be to find some kind of cache system.

Certainly, there have been reports of SD card performance issues relating to the sqlite3 database, but not using the SD card is a fairly popular solution for Pi users (USB storage and MySQL are common solutions for improved performance). In fact, I'd say that MySQL (and whatever replaces it in future) is likely to be essential for larger libraries, however it's the processing required to display just 10 items at a time that is often the Achilles heel for the Pi so while the overall database performance is of course critical, ensuring that clients only retrieve from the database the data they need to display and no more, is paramount for a scalable solution.

Texture Cache Maintenance Utility: Preload your texture cache for optimal UI performance. Remotely manage media libraries. Purge unused artwork to free up space. Find missing media. Configurable QA check to highlight metadata issues. Aid in diagnosis of library and cache related problems.
find quote
RockerC Offline
Posting Freak
Posts: 1,380
Joined: May 2011
Reputation: 24
Post: #17
Curious to hear and see how this new database will work with pieh's proposal of merging addons and plugins sources into your library?

http://forum.xbmc.org/showthread.php?tid=165024

"Aggregate media from multiple sources and present them in unified library view", should this media data go into the database too?

(2013-05-29 18:49)adamsey Wrote:  
(2013-05-29 18:22)don.corleone Wrote:  My 2 cents: Is there need for separate video and music databases with separate schemas? Wouldn't it be better to have a sinlge media database to allow arbitrary user-configurable relations between entries?

I might want to look up a movie's theme song from its database entry (browsing the movie library) as well as from the album's (which might just be a collection of theme songs) entry in the music library. I might also want to access movie stills (pictures) from the movie's information page or based on actors/characters shown on the picture. The actor may also be a music artist at the same time, so I might want to find that same movie still based on a song I'm listening to. If XBMC should support an ebook library in the future, I might want to find the book a movie was based on. I'd possibly want to find prequels/sequels to a movie or tv-show spinoffs and movies that feature a certain song or songs by an artist I have just another song of, and so on...

I think it's impossible to anticipate what cross-relations a user will expect from the database in the future, so it's best not to introduce unnecessary barriers.

Truly, thank you so much for this comment. I seem to have been thinking about this all wrong, and, in fact, misreading some of the proposals for the new schema. What you have suggested will be exactly the case: all of the content, collections, etc. for any types of media will be all together in the same schema. It should then be able to support arbitrary links between content, like we can now link a TV show and a movie, for example.
Also, please don't forget about existing PVR addon clients, as well as the work being done to make a Games Library for RetroPlayer.
find quote
topfs2 Offline
Team-Kodi Developer
Posts: 4,548
Joined: Dec 2007
Reputation: 17
Post: #18
(2013-05-30 12:40)Tolriq Wrote:  According to some of my tests, the main problem of rpi and all flash based distribution is much more the slowness of the disk than cpu for database.

The more random access the slower it will be and such a complex multi row link thing will be slow Smile But the benefits are so good over that.

I guess for those platform the only performance solution would be to find some kind of cache system.

So if we move the RAW database into RAM, which afaik sqlite supports I wonder how much memory is actually needed for those operations, i.e. how much RAM is needed to make the iterators _not_ hit the disk.
If the pi is even capable of doing the needed operations without needing the disk.

While paginating would probably help quite a bit it only helps if there isn't a table scan in the final query Smile Not sure if there is a simple way to calculate or test this but it would for sure be valuable data.

If you have problems please read this before posting

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

[Image: badge.gif]

"Well Im gonna download the code and look at it a bit but I'm certainly not a really good C/C++ programer but I'd help as much as I can, I mostly write in C#."
find quote
Tolriq Offline
Donor
Posts: 3,016
Joined: Jun 2009
Reputation: 97
Location: France
Post: #19
Well this is out of my comp for the coding part (and well you know my mind on coding now Tongue).

But I can do tests and validations on rpi and lot's of hardware / VM and can do profiling on Win platform too.
find quote
arucard Offline
Junior Member
Posts: 25
Joined: Feb 2011
Reputation: 1
Post: #20
This might be a good time to try moving the XBMC database layout towards a semantic data model. This would allow you to create a database that could be strongly interrelated, as don.corleone suggested.

As an example, you could think of a movie that would be entered in the database. The information about the movie itself, such as cast, crew, plot summary, etc., would be entered for that movie, but would have any relation with the actual file for that movie. The database entries for the video file would then contain all the media information, such as resolution, fps, filesize, runtime, etc., and would also specify which movie it refers to. In this way, the information for the movie and the file would be completely separated. This could be useful if the video file for that movie has been moved (which is seen as a removal at the old location and a new file at the new location) or has been replaced (e.g. with a better quality or director's cut). You would then only have to change the attribute in the video file's information and not have to retrieve and store the movie's information again.

Of course, the biggest strength of semantic data is its ability to allow searching through the information more meaningfully. This could become more important as the XBMC library becomes more and more customizable. There is obviously still quite an overhead to storing data semantically, but this could pay off big in the long run. And this seems like the appropriate time to at least make a start with trying this, keeping in mind that you can create the semantic database alongside an existing relational database, with a mapping in between.

Either way, I'm really happy to hear about this project.

Edit: Just read about the unified library and realized that the example I provided would also allow easy storage of duplicate movies, since you'd set each video file (or other low-level URI, like upnp location or webstream location) to the same movie. When you then look up that movie, you would immediately get all duplicates of that movie and these could be handled appropriately (e.g. by showing the highest quality source by default).
(This post was last modified: 2013-05-31 13:18 by arucard.)
find quote
Milhouse Offline
Team-Kodi Member
Posts: 12,001
Joined: Jan 2011
Reputation: 583
Post: #21
It's been mentioned before, but this is all starting to look ripe for a data warehouse/ETL type solution.

Getting data out of the database is by far the most typical use case, in which case the data should be stored fully denormalised. Fairly frequent updates will occur to specific fields - modifications of playcount, watched status etc. - with fairly infrequent mass updates (library scans with new content etc.).

While a transactional database that solves all of XBMC's meta storage problems is a good thing, we should be mindful of the fact that most of the time users just want to get data out of the database as quickly as possible. Any complex transactional database design will almost certainly be compromised by this retrieval requirement, in which case a second fully denormalised warehouse-type solution could be worth considering. The warehouse wouldn't need to be that large or encompass the whole transactions database, it would just need to satisfy the most common GUI retrieval requirements (at least initially).

Texture Cache Maintenance Utility: Preload your texture cache for optimal UI performance. Remotely manage media libraries. Purge unused artwork to free up space. Find missing media. Configurable QA check to highlight metadata issues. Aid in diagnosis of library and cache related problems.
find quote
Vankwish Offline
Donor
Posts: 68
Joined: Sep 2008
Reputation: 0
Location: Wales UK
Post: #22
First of all allow me to thank all developers past, present and future for their excellence! Secondly let me apologise for the length of this post.

As more and more low power devices are being supported by xbmc (either under linux or natively under windows or android or ios . . .) I think that the appeal of storing media on portable usb hdd is strong with the 'average' user who does not know/want/need a dedicated NAS solution. I also understand efforts are being made to source content not just locally but also on line (a la boxee).

It is for this reason I would like to see support for removable media sources (usb or online) supported at the database level. I didn't know whether it concerns this thread or either thread here or over here, and I admit I am certainly not knowledgeable enough to even begin to suggest how this can be implemented but I can attempt to describe an ideal outcome ...

At start up or when a device is connected that has previously been added to the library as a source, XBMC would recognise this device (unique identification or by volume label or possibly pivos userdata style with a 'named' file/folder residing on the drive?) and this would be reflected in the library listings (with the specific content being displayed in moves/TV/music/home vids. . . ). When this drive is unmounted the listings are removed (or hidden I guess) until the device is plugged back in. Once a source has been scanned and added to the library it need not be scanned in again until there are any addition/deletions etc. The benefit of this would be that xbmc would remember the watched statuses of the media across different drives and the library could be updated easily independently of which drive was connected at the time. If media was unavailable xbmc could still display it in the library and present the user with the information of the location of the media (i.e. "plug the 'movies' drive in to watch"\"on WD elements drive" etc)

I understand that this may be accomplished already in several different ways, most popularly by mobile device users making separate profiles reflecting changes in their sources\local network. I have also achieved a similar effect for a family members pivos xios where the userdata dir (xbmc-data) is stored on each of the hard drives with the settings specific to the media on that hard drive, but it is restricted to a cold boot start up, one drive at a time and no overall 'library'.

Personally I have a NAS solution for both my media and database and understand that it may conflict with the idea of a central database implementation shared across multiple xbmc instances but I believe that the direction that xbmc is headed rapidly, if implemented transparently, this may be a valuable feature for those who may not frequent these boards as 'enthusiats' and use the low power/mobile versions of xbmc.

Thank you for taking the time to read this admittedly lengthy post, and irrespective of whether ‘I’m talking out of my arse’ would like to wish you all the best of luck!

Playing with XBMC on all platforms :)
(This post was last modified: 2013-06-02 15:35 by Vankwish.)
find quote
Fice Offline
Member
Posts: 98
Joined: Jan 2010
Reputation: 4
Post: #23
@Vankwish: i answered your post in my thread, because i didn't want to hijack this one: http://forum.xbmc.org/showthread.php?tid...pid1435059
find quote
adamsey Offline
Junior Member
Posts: 14
Joined: Jan 2013
Reputation: 2
Post: #24
Hello all,

With a second week of coding finished, I wanted to (finally) post an update here to let you know what I've been doing and where I'm going with the project.

Last week (Week 1), my main goal was to extract all of the classes for interacting with the database from the XBMC source. I also rewrote some of them (such as the logger) to be a little more dumbed down to suit my purposes. Once this was done, and I could write a small test program to connect to an SQLite database, I began implementing the changes proposed in Database Schema 4.0 and Schema 4.0a. By last Friday, I had completed a big file of SQL commands which would create all of the tables/views and insert some of the default object (e.g. Movies, TV Shows, Addons), attribute (e.g. title, filename, plot) and relationship (e.g. movieset_has_movie, object_has_tag) types.

I've been sticking for to Schema 4.0 (rather than 4.0a) for two reasons. I like the OOP techniques it applies to treating the data. For example, each object type can have a parent object type defined for it, and these children can inherit attribute types from their parent. In the current database, for example, "Movies" would be a child object type of "Video". In this way, we can easily define, for example, if we want each object to have a "Date Added" attribute, we can give that attribute to the "Object" datatype and make it inheritable for all of Object's children.

The second reason is because it gives everything one wants to manage with XBMC equal rights. There is only one database, which going with the theme of MyVideos, MyMusic, etc, I'm calling MyObjects. There is then an object type defined for the the types that XBMC has now by default Movies, Music Videos, Albums, Songs, Pictures, Addons, Actors; the list goes on. However, if one wanted to use XBMC for an ebook library (for example) a plugin could add a type "Ebook" to the objectTypes table, add some attributeTypes, maybe a relationship type (say ebook_has_film_adaptation), and that's about it. No need to create a new database file; everything gets treated equally.

As such, what I am primarily working on writing now is a CObjectDatabase class, named similarly to the existing CVideoDatabase, CMusicDatabase, etc. This class will support all of the basic calls to the the objects database. Then, hopefully if the API is good enough, all of the methods in the other database classes that contain SQL queries, can instead be mapped to calls to methods in CObjectDatabase.

Going forward, this is what I'm working on. I'd love some feedback from the community if any of this sounds inherently bad.
find quote
zag Offline
Retired Team-Kodi Member
Posts: 4,006
Joined: Oct 2007
Reputation: 75
Location: UK
Post: #25
Sounds like you've made good progress.

Thanks for the update.
find quote
topfs2 Offline
Team-Kodi Developer
Posts: 4,548
Joined: Dec 2007
Reputation: 17
Post: #26
(2013-06-28 15:34)adamsey Wrote:  I've been sticking for to Schema 4.0 (rather than 4.0a) for two reasons. I like the OOP techniques it applies to treating the data. For example, each object type can have a parent object type defined for it, and these children can inherit attribute types from their parent. In the current database, for example, "Movies" would be a child object type of "Video". In this way, we can easily define, for example, if we want each object to have a "Date Added" attribute, we can give that attribute to the "Object" datatype and make it inheritable for all of Object's children.

The second reason is because it gives everything one wants to manage with XBMC equal rights. There is only one database, which going with the theme of MyVideos, MyMusic, etc, I'm calling MyObjects. There is then an object type defined for the the types that XBMC has now by default Movies, Music Videos, Albums, Songs, Pictures, Addons, Actors; the list goes on. However, if one wanted to use XBMC for an ebook library (for example) a plugin could add a type "Ebook" to the objectTypes table, add some attributeTypes, maybe a relationship type (say ebook_has_film_adaptation), and that's about it. No need to create a new database file; everything gets treated equally.

I think this is a great idea. While I suspect your not on the topic of defining all the objects right now I would suggest looking at uPnP as they do behave, specification wise, as you suggest here. Everything is an object, there are two main types of Objects, containers and items. There are a few items, VideoItem, AudioItem and so forth down to Movies, Episodes. I suspect you could share the base properties with uPnP to ease the mapping.

EDIT: You probably don't need to look at uPnP spec too much, just something to consider that uPnP behaves the same. which is a great thing!

I think this also goes straight into line as how I wanted heimdall to scrape the information, one starts with atleast an Object and we continue to upgrade it and add properties.

Very excited about the ability to use such a database outside of xbmc Smile

If you have problems please read this before posting

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

[Image: badge.gif]

"Well Im gonna download the code and look at it a bit but I'm certainly not a really good C/C++ programer but I'd help as much as I can, I mostly write in C#."
(This post was last modified: 2013-06-28 20:37 by topfs2.)
find quote
adamsey Offline
Junior Member
Posts: 14
Joined: Jan 2013
Reputation: 2
Post: #27
Hey there,

It's been quite a while again since my last update, but I was waiting until something more interesting got finished than writing the same types of methods over for each DB table.

At this point, I believe I have all of the basic CRUD operations for the different tables finished. We can create new object types (video, tv show, documentary, ROM, album), attribute types (file name, plot, review, rating), relationship types (movie_has_actor, season_has_episode, album_has_song), artwork types (cdart, thumbnail, fanart, clearlogo), and create, update, and delete corresponding objects, attributes, relationships, and artwork. [Note: because I'm not working off of a full fork of XBMC, there is a column in the artwork table for storing the actual image file as a BLOB that I'm not yet implementing since it will be easier once my classes have access to some of the Texture utilities]

I have also completed methods to manage the bookmark, profile, settings, and stacktimes tables. One bit of particularly good news is that I have been able to use the exact same method signatures for things like adding artwork, setting bookmarks, and fetching stacktimes, so these things should act nearly the same as in previous versions.

My biggest project over the last week has been working on the table storing directory entries, which holds the metadata linking objects to the actual data in the file system (if applicable). In the current video database, there is a settings table and a streamdetails table. Settings stores things like zoom amount, selected audio stream, etc; streamdetails stores all the available video, audio, and subtitle streams. However, these tables could be very sparse. Now, the directory entry table has two columns, "settings" and "streams", which instead hold an XML string with these same details. I just finished writing methods which parse this XML and convert it into the same VideoSettings and StreamDetails objects as used before, and vice versa. Since this is all done in XML rather than a separate DB table, this also has the added benefit that a separate type of schema could be used if something like a picture needed file settings as well, perhaps storing changes made to saturation, contrast, things like that.

Among creating methods for CRUD operations, I have been scouring over the existing APIs for the video, music, pictures, and addon databases, trying to create all of the general methods I might need to facilitate changing the existing methods from using SQL to calling down to the generic object database. Furthermore, I finally moved from building my database from sqlite on the command line to implementing a the CreateTables method, which not only builds the database, but initializes it with "default" object, attribute, and relationship types so that, on start-up, XBMC would at least have all of the functionality it currently does.

Going forward from here, I'm working on two things. The first is creating something like the existing VideoInfoTag, PictureInfoTag, etc; more or less an ObjectInfoTag, that could fetch an object, a list of its associated attributes and (at least) IDs of other related objects. These leads me to the second, which is more work on the relationship front. I'm finding this to be the workhorse of this new set-up and I want it to be as efficient as possible. No more do we have all of these blank-link-blank tables; it's all done through relationships. So, we need to still be able to efficiently gather all actors in a movie or songs on an album. So, now that I think I can start adding a lot of test objects to the database and seeing how well my relationship methods work now and what more may be needed.

To wrap this up, I wanted to write a bit more about some exciting things with the way these relationships work.
  • We, of course, still have moviesets, but, now, we also easily have one movie in multiple sets.
  • When relating two objects, the relationship has a sequence index. This is used for sorting when fetching all related objects, but also for keeping track of things that should be numbered without adding an unnecessary attribute.
  • So, things like episode number in a season, season number in a series, or track number on an album are managed through the sequence index rather than having to check the attribute list and then sort.
  • I have, though, added a column called "link" to the relationship as well (brought over from 4.0a). The original purpose was defining actor roles. So, "movie_has_actor" and then the link value is the name of the role. But, I was thinking this could be used for other things, too. Like, "tvshow_has_episode". Sure Season 1 is the sequence index #1, but if the link is not null, we could give the season that name. I'm thinking of things like Avatar: The Last Airbender, where you might want to call Season One "Book One: Water", for example.

Sorry if this is a bit long-winded, but I'm excited about where this is going, now that most of the underlying functionality is almost finished.

Again, feedback is always welcome.
find quote
edrikk Offline
Fan
Posts: 327
Joined: Jul 2011
Reputation: 7
Post: #28
Truly great work so far!
find quote
da-anda Offline
Team-Kodi Member
Posts: 5,716
Joined: Jun 2009
Reputation: 74
Location: germany
Post: #29
I'm not sure if XML storage is a good idea, especially because you can't really use this information to search/filter your database efficiently which might be an issue on embedded systems and systems like the PI with weak CPU.

@db relations and objects
Have you thought about lazy loading? So when you create an object and you know that things like f.e. the related actors or stream details are not needed in every usecase, you mark them as lazy property and replace their storage container object/vector/whatever with a lazyLoading proxy which will automatically query the DB and build all actor objects on first use. Not sure if this will work easily in C++, I know this concept from a php framework.
find quote
DiMag Offline
Senior Member
Posts: 209
Joined: Nov 2012
Reputation: 0
Post: #30

@ adamsey:

1. In your schema, would genres, tags and sets be attributes? And would attributes (in general, not just the aforesaid ones) be freely linkable to any object type, or would there be a restriction based on semantical relationships? Example: genre is inappropriate for documentaries or (should they come to be defined as a distinct object) sports shows. Tags can be in principle be used across all objects. Sets only serve as a binder for a single collection.

2. If you plan to---and I submit that you should do so--- drop the segmentation of the video database between unitary (movie) and episodic (tvshow) objects, shall you then retain movie as a template for objects with that content but allow it to have episodic content? (Presently the only practical need for this is the ability to define DVD Extras in the same manner as TV shows define special seasons. It is not a negligible need.)

3. In defining new objects, do you take into account the availability of scrapable metadata material at present, or do you reason "once I make the structure available, at least one site will rise to the challenge and make metadata available"?

(This post was last modified: 2013-07-30 13:32 by DiMag.)
find quote