If you're really curious, a little history might help clear up what we're talking about.
When we first tackled the problem of matching what you said to what's in your library, we planned to use only custom slots, which are basically variables that take the value of a portion of what you said. To do this, we created the slot generators, which ask Kodi for all of its library titles and push them into the various custom slots you still see.
The idea was to have Alexa match against these and spit out the right answer; however, we found several problems with this:
- For some odd names, such as 2pac, Hawaii-50, etc., it wouldn't match,
- It would require the user to keep the slots up-to-date,
- The skill became large enough that it was no longer possible to store the entire library contents in the slots.
To solve these issues, we now use fuzzy matching in the skill to match what was heard with what's available in the media library. This is very accurate, particularly when we know the media type, as when you say "ask Kodi to play the movie Ghostbusters" as opposed to "ask Kodi to play Ghostbusters." We do support the latter -- and it works very well -- but there obviously can be collisions if the terms are fairly generic.
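To give a feel for what fuzzy matching means here, a minimal sketch follows. It is not the skill's actual matcher; it uses Python's standard `difflib` as a stand-in, and the `best_match` name and the 0.6 cutoff are assumptions made for illustration.

```python
import difflib

def best_match(heard, titles, cutoff=0.6):
    """Return the library title closest to what was heard, or None.

    Hypothetical helper: difflib stands in for whatever fuzzy matcher
    the skill really uses, and the cutoff is an arbitrary choice.
    """
    # Compare case-insensitively, but return the title as stored.
    lowered = {title.lower(): title for title in titles}
    hits = difflib.get_close_matches(
        heard.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None

movies = ["Ghostbusters", "Ghost in the Shell", "The Ghost Writer"]
print(best_match("ghost busters", movies))
```

Knowing the media type simply means the `titles` list is shorter and more specific (movies only, say), which is why collisions are less likely in that case.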
This all works fine provided there are enough samples in the slots. Alexa's propensity to send us the literal string the user said (that is, an unmatched string, transmitted literally as it was transcribed) goes up as the number of samples in the slots increases. Conversely, it goes down as the number of samples decreases. To solve all of the problems listed above, we depend on her to send the literal strings.
For users with large libraries and a good distribution of samples (varying string/word lengths and titles), everything works as expected; however, if a user's library is deficient in an area (such as bill-orange's 60 music artists), there is the possibility that she might not send the literal string to the skill, so we would have nothing to fuzzy match on.
As an example, if bill-orange had '2pac' in his MUSICARTISTS slot, but MUSICARTISTS only contains 60 items, the likelihood that she'll pass through a transcribed string like "two pack" is slim. And we already know she won't match directly on '2pac,' so we require the literal to even attempt to match further.
There are a number of possible solutions to this dilemma.
We could massage the data in the slot generators to account for all possible oddities, but obviously achieving 100% coverage here is not practical. We do currently filter out some things such as invalid characters, but that's the extent we'd like to go to at this stage.
We could try variants of certain substrings in the skill, such as translating number digits to number words and vice versa. We actually do this currently. It's helpful, but doesn't cover all of the issues laid out above.
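A rough sketch of the digit/word idea, to show what kind of variants get tried. This is only an illustration under simplifying assumptions: it handles single digits only, and the `variants` function and its lookup tables are hypothetical names, not the skill's code.

```python
import re

# Single digits only; a simplification for the example.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}
WORD_DIGITS = {word: digit for digit, word in DIGIT_WORDS.items()}

def variants(heard):
    """Return the heard string plus digit->word and word->digit rewrites."""
    text = heard.lower()
    # "2pac" -> "two pac": spell out each digit as its own word.
    spelled = re.sub(r"\d", lambda m: " " + DIGIT_WORDS[m.group()] + " ", text)
    spelled = " ".join(spelled.split())
    # "two pack" -> "2 pack": replace whole number words with digits.
    digits = " ".join(WORD_DIGITS.get(word, word) for word in text.split())
    unique = []
    for candidate in (text, spelled, digits):
        if candidate not in unique:
            unique.append(candidate)
    return unique

print(variants("two pack"))
print(variants("2pac"))
```

Each variant would then be fed to the fuzzy matcher, so "two pack" gets a second chance as "2 pack" against '2pac'.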
We could ship the skill with known-to-work-well slot samples. Since we fuzzy match in the skill code, it actually doesn't really matter what's in the slot samples, provided they're well-distributed to cover the types of things the user might say. This boils down to providing an evenly distributed variety of words, word lengths, and character lengths. The problems with this approach are as I described in a post above regarding language choice. As an aside, you may have noticed that the slot generators we have distribute the strings by word-length -- this is why that is.
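For the curious, here is one way "distributing by word-length" could look. It is a sketch of the general idea, not the slot generators' actual code; `spread_by_word_count` is a made-up name, and bucketing by word count with a round-robin pick is just one reasonable approach.

```python
from collections import defaultdict
from itertools import zip_longest

def spread_by_word_count(titles, limit=100):
    """Pick up to `limit` titles, rotating across word-count buckets."""
    buckets = defaultdict(list)
    for title in titles:
        buckets[len(title.split())].append(title)
    picked = []
    # Take one title from each bucket in turn, shortest word count first,
    # so no single title length dominates the slot samples.
    for row in zip_longest(*(buckets[count] for count in sorted(buckets))):
        for title in row:
            if title is not None and len(picked) < limit:
                picked.append(title)
    return picked

titles = ["Alien", "Heat", "Up", "Die Hard", "The Big Lebowski"]
print(spread_by_word_count(titles, limit=3))
```

With a limit of 3, the example picks one one-word, one two-word, and one three-word title rather than three one-word titles.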
Lastly (as far as I know), we could utilize Amazon's Library Built-ins. These are essentially custom slots that are populated with Amazon's own media catalog, which guarantees they're large enough that Alexa will spit out the literal string in the end if no exact match can be found. Further, they could increase the number of 'simple' (string equality) matches because Amazon could (and probably does) store variations of the titles; even if not, it's highly likely they have other information available to them to aid Alexa in making the match.
Library Built-ins are more of a code change than I think they should be, but it's something we would definitely utilize if Amazon would get off their asses and release them in the UK and Germany. So far, it's been 8 months since release in the US, and Amazon has given us no indication of when they'll be available in other regions.
Library Built-ins would both simplify the skill setup and improve the matching for users with smaller libraries, making it a win-win, but because the code change is so drastic, it'd likely mean maintaining a fork until Amazon releases them in the UK and DE. This is possible, but since this is a hobby project for us, it's not ideal.
TL;DR for those that just want an improved experience now:
In the interim, users can benefit from padding sparsely populated slots with samples that aren't in their libraries. An easy way to do this would be to get a friend with Kodi to allow you to point the slot generator at their library so you can combine lists. If that's not possible, you can simply enter more samples manually. The samples can be anything, but variety is helpful and the lack of it can be detrimental, so to make it simple I'd suggest you look up actual media titles.
There is no real magic number for the slots -- it depends more on variation than anything. We currently default to 100 items in the slot generators and this seems to work out well in the end, but you can pad them as far as the builder will allow if you like. I wouldn't get too close to the ceiling though, because if we add more Intents and/or Utterances that make the skill larger, you'll be constantly trimming your slots to get it to save.
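If you end up combining lists by hand, something like the following could do the merge. It's a small sketch, not part of the skill or the slot generators: `pad_slot` is a hypothetical helper, and the target of 100 just mirrors the generator default mentioned above.

```python
def pad_slot(own, extras, target=100):
    """Keep all of your own titles, then add extras until `target` is reached."""
    seen = set()
    merged = []

    def add(title):
        # De-duplicate case-insensitively and skip blank entries.
        key = title.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(title.strip())

    for title in own:
        add(title)
    for title in extras:
        if len(merged) >= target:
            break
        add(title)
    return merged

mine = ["2pac", "Nirvana"]
borrowed = ["nirvana", "Queen", "Muse"]
print(pad_slot(mine, borrowed, target=3))
```

Your own titles always come first, so the padding never pushes out anything that's actually in your library.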