WIP script.module.hay A distributed cloud cache mechanism based on dispersy
#1
The Kodi community has a serious problem with zombie users consuming terabytes of data from free metadata providers such as tmdb, tvdb, anidb and musicbrainz, through their scrapers and 3rd party addons.

Most of those services have already come to the point of deciding to shut down because of this problem. Luckily they are still running, but they keep deleting the API keys of some of their users. They also still need donations to keep going, so please donate :)

Another aspect of the problem is that most end users cause this traffic without having any idea they are doing so, and there is a bill to pay at the end of the day.

I have been developing this addon as a mid-term project to solve this specific problem.

script.module.hay is a key-value storage system: you throw a needle into your db (the hay) and find it again when you need it, so you don't have to query the remote service (let's say tmdb) again. In short, a simple cacher. The needle can be any Python object; the hay module will serialize it and store it anyway, and whether the serializer is json or pickle does not matter. This part is already implemented and there is not much interesting about it.
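
To make the local part concrete, here is a minimal sketch of such a key-value cache on top of sqlite3. The class name `Hay`, the methods `throw` and `find`, and the table layout are assumptions made up for illustration; they are not the addon's actual API.

```python
import json
import pickle
import sqlite3
import time


class Hay:
    """Minimal local key-value cache (hypothetical names, not the addon's real API)."""

    def __init__(self, path=":memory:", serializer="json"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS hay "
            "(key TEXT PRIMARY KEY, value BLOB, stored REAL)"
        )
        self.serializer = serializer

    def _dumps(self, obj):
        if self.serializer == "json":
            return json.dumps(obj).encode("utf-8")
        return pickle.dumps(obj)

    def _loads(self, blob):
        if self.serializer == "json":
            return json.loads(blob.decode("utf-8"))
        # pickle is only safe for data this machine wrote itself
        return pickle.loads(blob)

    def throw(self, key, needle):
        """Serialize the needle and store it under key, replacing any old entry."""
        self.db.execute(
            "INSERT OR REPLACE INTO hay (key, value, stored) VALUES (?, ?, ?)",
            (key, self._dumps(needle), time.time()),
        )
        self.db.commit()

    def find(self, key, default=None):
        """Return the stored needle, or default when the key is not cached."""
        row = self.db.execute(
            "SELECT value FROM hay WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else self._loads(row[0])
```

Usage would look like `hay.throw("movie:1", {"title": "Example"})` followed later by `hay.find("movie:1")`, with a remote query only when `find` returns nothing.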

Here comes the twist: when there is no cached entry in the local database, the "hay" module asks the distributed community for that specific needle. If someone has it, "hay" downloads the data from that peer instead of from the service provider. The magic here is that there is no need for port forwarding and no need for a p2p backend like torrent. Dispersy is just a message-based protocol; it has nothing to do with torrent and has been shown to scale to thousands or even millions of peers without issues. The shared files will be at most 100 KB or so of json or pickled ascii / binary data. If nothing can be found in the distributed community, "hay" downloads from the service, and next time none of the neighbours in the community will need to.
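
The lookup order described above (local cache, then peers, then the service) can be sketched as follows. The function `lookup` and the callables `ask_peers` and `fetch_from_service` are hypothetical placeholders; the real peer query would go through dispersy, which is not shown here.

```python
def lookup(key, local, ask_peers, fetch_from_service):
    """Resolve a key: local cache first, then peers, then the origin service.

    local is any dict-like cache; ask_peers and fetch_from_service are
    hypothetical callables taking the key. ask_peers returns None when no
    peer has the data.
    """
    value = local.get(key)
    if value is not None:
        return value, "local"

    value = ask_peers(key)
    source = "peer"
    if value is None:
        # Nobody in the community has it yet, so hit the service once.
        value = fetch_from_service(key)
        source = "service"

    # Keep a copy so this node can answer neighbours next time.
    local[key] = value
    return value, source
```

The second element of the returned tuple is only there to make the path visible; with it one could count how many requests actually reach the service.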

With this implementation, service providers like tmdb will save bandwidth, at least 80% by my guess, and Kodi users will receive good, continuous service in exchange for a small bandwidth contribution (without even knowing). The creators of dispersy at Delft University will get the wildest environment on the internet (the Kodi community :)) for scientific purposes. Win-win-win.

This is all my own interpretation, though the Delft University guys are quite warm towards Kodi addons. For a successful implementation, all scrapers would have to move to a dispersy community with the "hay" addon, so I would really like to know what the community's reaction to such a solution would be.

further reading:
https://dispersy.readthedocs.io/en/devel/
#2
This sounds pretty cool, kudos! However, this poses a serious security issue if abused.

Image Lunatixz - Kodi / Beta repository
Image PseudoTV - Forum | Website | Youtube | Help?
#3
Yes exactly. Expanding on security:
Security risk to identity: Dispersy has a cryptographic backend, much like TOR, for cases where the online identity must be protected. That is not the case here, though; the shared files are just plain ascii files.
Security risk of shared data: Dispersy messages are customized at the application level. Basically, at the application and communication level you can specify what kind of transfer is permitted. For simplicity, it can be guaranteed that the transferred files are at most 100 KB (or whatever the size is) and that the data must be serializable ascii data, or an instance of some class derived from a given base (something like an ABC), etc.
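
As an illustration of that kind of application-level rule, here is a rough sketch of a receiver-side check. It assumes the payload arrives as raw bytes and is restricted to ASCII JSON; `accept_payload` and `MAX_PAYLOAD_BYTES` are made-up names, and this is plain Python, not dispersy's own message validation.

```python
import json

# The "max 100 KB" figure from the post, taken here as an assumed constant.
MAX_PAYLOAD_BYTES = 100 * 1024


def accept_payload(raw):
    """Return the decoded object if the payload passes the checks, else None."""
    if not isinstance(raw, bytes) or len(raw) > MAX_PAYLOAD_BYTES:
        return None
    try:
        text = raw.decode("ascii")   # reject anything that is not plain ASCII
        return json.loads(text)      # JSON only: never unpickle data from a peer
    except ValueError:               # covers UnicodeDecodeError and JSON errors
        return None
```

Restricting peer data to JSON matters because unpickling untrusted bytes can execute arbitrary code, while pickle remains fine for entries the local machine wrote itself.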

Also, for the last 2-3 years this security issue has been the hot topic of the team, and they finally came up with a solution: self-healing of the community based on reputation, using blockchain (the base tech of bitcoin). As far as I know, at least 2 cryptography professors are involved, directly or indirectly, I am not sure which.

So security-wise there is a wide variety of solutions in dispersy's bag, but I don't want to go into those, because the combination of this tech with my hype sounds a little bit like science fiction :) Still, this is a very achievable, real solution.

Could you please point out some abuse cases based on Team Kodi's experience?
#4
Heck, I think this is a pretty nifty idea... unfortunately I place this in the "Utopian" bin. One of those things that ideally is fantastic, but real world... not so much.

A lot of damage can be done in under 100 KB ;)

Not speaking for Kodi, I would recommend you add a nag prompt that must be disabled by the user. Prompt the user every time a share is opened that "Blah, Blah" plugin is about to share content. Allow or Block.

IMO 99.9% of Kodi users are unaware of the modules running in the background, and any plugin can install modules without consent.

As for abuse... about a billion ideas pop into my mind, avoiding the obvious pitfalls of open communication between devices. Kodi's Python environment is open, so the content of what's being exchanged is the issue, e.g. user passwords, top of my list.
#5
Ah, you are talking about the issue of addons changing other addons' code, the "this machine does not have a brain, so use yours" problem. I remember that from past experience. So I'm thinking out loud.

Each community is represented by a hash key, and each node has an identity. One or more users can hold administrative rights cryptographically, and they cannot be impersonated as long as the private key is not known. Dispersy actually stands for distributed permission system, so nodes exchange their permissions based on trust.

The key difference here is that dispersy has a root admin that can block a misbehaving node.

Compare this to a central system, like downloading directly from tvdb. With direct download, the database and the authority are the same identity: the server. To make it more obvious, someone can collect all the passwords and upload them to a server. The only identity that can prevent this behaviour is the server itself, the root authority, which is the same identity as the database.

When it comes to a distributed system, the databases are spread over several parts of the community, and the authority is cryptographically unique. So if there is a misbehaving node, at the system level the passwords will be copied to its closest neighbours, as long as the admin does not block the node. So in a way it is harder for the bad guy to collect the passwords for himself. I don't know whether this is a bad thing or worse, but I like chaos, it sounds more fun to me :)

At the end of the day, what I want to say is that the security mechanism, communication-wise, is fundamentally the same as with HTTP, and anonymity is much better. Those are 2 different concepts though, and I totally understand your concerns about the openness of Python.

Sandboxing access is one way to outsource the trust to the user and keep away from being blamed; like you said, 99% of people have no idea what is going on behind the scenes. One more measure I can think of is:

Consider this implemented only as an HTTP cache mechanism. The data can then be validated against the Content-Length reported by the central server, using a HEAD (header-only) request. This will bring a small bandwidth load and some transmission delay, but it gives data validity. If the service can provide a hash of the data, that can be used instead of the content length to validate.
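
A small sketch of that validation step, assuming the expected length or hash has already been obtained from the service (for instance from the Content-Length header of a HEAD request, or from a hash field the service would have to publish). The function name `is_valid` and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib


def is_valid(data, expected_length=None, expected_sha256=None):
    """Check peer-supplied bytes against what the origin server reports."""
    if expected_sha256 is not None:
        # A hash is the stronger check, so prefer it when the service offers one.
        return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
    if expected_length is not None:
        # Content-Length only catches a wrong size, not same-size tampering.
        return len(data) == expected_length
    return False  # nothing to compare against: treat the data as unverified
```

Note that the length check is weak on its own: a peer can swap in different content of the same size, which is why a server-provided hash would be the better option whenever it is available.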
#6
Neat! I've been contemplating this problem recently, too, but haven't actually DONE anything yet.

The big bandwidth killer is images, with one image generally being 2-4 orders of magnitude larger than all available text data about a given item gzipped for transmission. Images have to be included or this will barely make a dent in the Terabytes of data being served; before Trakt.tv stopped serving images, they consumed over 85% of Trakt.tv traffic, and their main focus isn't even on providing data that stays cached on each client but a fairly consistent transfer of data back and forth. Add nearly another order of magnitude for the number of images that will be grabbed.

Any add-on that gathers a user password or any other sensitive data can already share it with anyone they like (Ex. post it on pastebin for the world to see), so there's nothing new here. An HTTP request is surely simpler to fire off than implementing Dispersy, or mucking about with an existing add-on that uses Dispersy (urllib/2 is built into Python, after all), so any nefarious add-on will certainly use that instead, no reason to nag users about Dispersy passing around a bit of data.

I might even suggest that only certain trusted members are allowed to add new data to the community, rather than any member, then there is no worry about bad data being added, shared, and acted upon by other members. Trusted members could be set up by the services themselves, or a trusted third-party (perhaps even Delft University). Non-trusted members would then need a way to request a trusted member add new data, but that seems pretty straightforward. Trusted members could also be responsible for updating data in the community (API results change often, with new translations, updated plots, and so on), rather than every member needing to keep an eye out for updates to what is already in the community.

Bandwidth and transfer limits are a must, though, and should be set extremely tight by default, so that this doesn't saturate anyone's pipes or gobble up their quota.
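
One simple way to enforce such a limit is a fixed-window byte budget, sketched below. The class name `TransferBudget` and the default figures (5 MB per hour) are arbitrary assumptions, not values proposed anywhere in this thread.

```python
import time


class TransferBudget:
    """Cap the bytes shared with peers per time window (defaults are assumed)."""

    def __init__(self, max_bytes=5 * 1024 * 1024, window_seconds=3600,
                 clock=time.monotonic):
        self.max_bytes = max_bytes
        self.window_seconds = window_seconds
        self.clock = clock            # injectable so the logic can be tested
        self.window_start = clock()
        self.used = 0

    def allow(self, size):
        """Return True and record the transfer if it fits in the current window."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now   # a new window begins; reset the counter
            self.used = 0
        if self.used + size > self.max_bytes:
            return False
        self.used += size
        return True
```

A node would call `allow(len(payload))` before answering a peer and simply stay silent when it returns False, so other peers or the service pick up the request.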

Storage will also have to be tightly controlled, as Kodi is sometimes installed with very little disk space. I've recently seen a suggestion that 4GB is enough, but that can't even hold my Kodi installation's thumbnail cache, never mind a second copy that Dispersy can share from.
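
A storage cap could work along these lines: keep a running byte count and evict the least recently used entries once the cap is exceeded. This is an in-memory sketch with a made-up class name `BoundedStore`; a real version would apply the same bookkeeping to files on disk.

```python
from collections import OrderedDict


class BoundedStore:
    """Keep at most max_bytes of cached blobs, evicting the least recently used."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.size = 0
        self.items = OrderedDict()

    def put(self, key, blob):
        if len(blob) > self.max_bytes:
            return False                      # would never fit; do not store
        if key in self.items:
            self.size -= len(self.items.pop(key))
        self.items[key] = blob
        self.size += len(blob)
        while self.size > self.max_bytes:     # evict least recently used first
            _, old = self.items.popitem(last=False)
            self.size -= len(old)
        return True

    def get(self, key):
        blob = self.items.get(key)
        if blob is not None:
            self.items.move_to_end(key)       # mark as recently used
        return blob
```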
#7
Allowing only authorized members to add content should be possible. AFAIK there is such a feature, but I am not sure whether it is part of tribler or dispersy; in either case there is already a base implementation.

Data / service specific authorization seems to be a better approach than the generic cache mechanism I first had in mind. Considering lunatixz's feedback, the more generic the system is, the more complications there are.

I have also confirmed that a cryptographic backend is definitely necessary. Currently dispersy uses binary addons like libnacl + m2crypto. To make it broadly accessible and official, those backends would have to be replaced with pure Python replicas. This will cause some latency CPU-wise, considering cryptography is a CPU-heavy operation, but I think it is achievable, and the data to be processed is small and non-critical.

The dispersy community can be managed by the service provider (e.g. trakt), or maybe by the Delft guys for supervision.

Bandwidth management and transmission rules can be implemented, I see no problem there, and they can be configurable by the service provider or the module user.
#8
IMO this has too many vulnerabilities... local cache is king.
#9
(2017-05-04, 22:49)boogiepop Wrote: Allowing only authorized members to add content should be possible. AFAIK there is such a feature, but I am not sure whether it is part of tribler or dispersy; in either case there is already a base implementation.

Data / service specific authorization seems to be a better approach than the generic cache mechanism I first had in mind. Considering lunatixz's feedback, the more generic the system is, the more complications there are.

I have also confirmed that a cryptographic backend is definitely necessary. Currently dispersy uses binary addons like libnacl + m2crypto. To make it broadly accessible and official, those backends would have to be replaced with pure Python replicas. This will cause some latency CPU-wise, considering cryptography is a CPU-heavy operation, but I think it is achievable, and the data to be processed is small and non-critical.

The dispersy community can be managed by the service provider (e.g. trakt), or maybe by the Delft guys for supervision.

Bandwidth management and transmission rules can be implemented, I see no problem there, and they can be configurable by the service provider or the module user.

Authorized users? never gonna happen, not officially that is.

All the cryptographics in the world won't solve the major issue, there is no way to enforce which content is being "cached"...

I don't mean to rain on anyone's parade, this is an interesting module, just not a practical one.
#10
I don't understand, sorry, did I say something dangerous?

Local cache does not solve the problem, not at all; every major data requester is already doing it. By authorized user I mean something like a cryptographic authority, something like a CA in SSL, not some specific if user==root thing.

Also, by official I mean the official repo, so devs can use the service provider's (e.g. tvdb's) dispersy community.

Can you elaborate please? I seriously don't understand.
#11
(2017-05-04, 23:05)boogiepop Wrote: I don't understand, sorry, did I say something dangerous?

Local cache does not solve the problem, not at all; every major data requester is already doing it. By authorized user I mean something like a cryptographic authority, something like a CA in SSL, not some specific if user==root thing.

Also, by official I mean the official repo, so devs can use the service provider's (e.g. tvdb's) dispersy community.

Can you elaborate please? I seriously don't understand.

I'm just one guy with an opinion, please don't take offense... it's not personal.

Local cache does solve the problem, and safely. If this were a module that standardized api requests to several popular apis and locally stored the cache... it would have zero risk, but this IMO is a hornet's nest.

Major trick is getting developers to follow a standard, ie all use the same local caching module.

I created a local cache module a few years ago based off of pyfscache, and currently really like the work @marcelvedlt has been offering the community.

Again, all this is my opinion, I don't downplay any work or development done. I love contributions like this... gets you thinking... but unfortunately it's exploitable.
#12
Since distributed networks are a bit of a gray area and a sensitive topic in the Kodi community, I thought I was misunderstood.

If you think that local caching is good enough to solve the problem, then "puff", dump this thread and the code in the trash bin anytime, I am totally cool with that. But to my understanding, you can try any kind of local caching method known:

try an HTTP/1.1 cache, try an LRU cache, try a time-based cache (as in simplecache), you name it. I think the situation is more complex than that. Maybe if there is an official trakt, tvdb or other metadata provider representative in the forums, they can easily point out the uniqueness of the requests by examining their HTTP request logs.
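
For reference, a time-based local cache of the kind mentioned here boils down to something like the sketch below. `TimedCache` is a made-up name and this is not how simplecache is implemented. However long the expiry is, each installation still has to fetch every item from the service at least once, and that is the part a purely local cache cannot remove.

```python
import time


class TimedCache:
    """Entries expire ttl seconds after they were stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock            # injectable so expiry can be tested
        self.entries = {}

    def set(self, key, value):
        self.entries[key] = (self.clock() + self.ttl, value)

    def get(self, key, default=None):
        entry = self.entries.get(key)
        if entry is None:
            return default
        expires, value = entry
        if self.clock() >= expires:
            del self.entries[key]     # stale: drop it and report a miss
            return default
        return value
```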

I am pretty sure about that. If not, then of course why bother with cryptography and rocket science, you are 100% correct.