Clean scraping API
#51
(2013-04-03, 11:27) garbear Wrote: Just had some beers with a buddy and went into techno-babble mode about heimdall's AI aspects :) He's not a programmer, so I was explaining to him that you start with a fact (such as "the movie's filename is /home/Avatar.mkv") and try to answer questions (what is the movie's format? what is the movie's genre?). You have tasks which operate on the subject, and their purpose is to generate new facts, assuming that their required premises are true.

I realized that the theory behind its operation is, word for word, that of an inference engine. Subject tasks are treated like inference rules (If P, then Q: "If subject has filename, then title is filename minus extension" or "If subject has title, then genre is <movie["genre"] from api.tmdb.org/title>"). Heimdall schedules the rule when its premises are fulfilled, and the outcome is the possible establishment of its conclusions. Starting with a fact or two, heimdall applies rules to infer new facts about the subject, adding each new fact to its knowledge base, and the cool thing is once a fact has been established it can be used to infer other facts as well (forward-chaining).
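
The forward-chaining loop described above can be sketched in a few lines of Python. This is a minimal illustration, not heimdall's actual code: the names `Rule`, `forward_chain`, and `RULES` are made up for the example, and `FAKE_GENRES` is a hard-coded stand-in for the web lookup so the sketch runs offline.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Dict, FrozenSet, List, Optional

Facts = Dict[str, str]

# Hypothetical stand-in for an online genre lookup, so the sketch runs offline.
FAKE_GENRES = {"Avatar": "Science Fiction"}


@dataclass(frozen=True)
class Rule:
    requires: FrozenSet[str]                 # premises (P)
    provides: str                            # conclusion (Q)
    infer: Callable[[Facts], Optional[str]]  # returns None if nothing can be concluded


def forward_chain(facts: Facts, rules: List[Rule]) -> Facts:
    """Keep applying rules whose premises are known until no new fact appears."""
    known = dict(facts)
    progress = True
    while progress:
        progress = False
        for rule in rules:
            # Skip rules that are already satisfied or whose premises are missing.
            if rule.provides in known or not rule.requires.issubset(known):
                continue
            value = rule.infer(known)
            if value is not None:
                known[rule.provides] = value
                progress = True
    return known


RULES = [
    # Listed "out of order" on purpose: the order of the rule base does not matter.
    Rule(frozenset({"title"}), "genre", lambda f: FAKE_GENRES.get(f["title"])),
    Rule(frozenset({"filename"}), "title", lambda f: PurePosixPath(f["filename"]).stem),
]

print(forward_chain({"filename": "/home/Avatar.mkv"}, RULES))
```

Starting from the single filename fact, the title rule fires first, and only then does the genre rule become applicable, which is the chaining behaviour the post describes.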

As far as future scraper devs go, I feel that reinforcing the inference engine metaphor is a double-edged sword. On the downside, it's a layer of complexity and frustration (AI isn't the most approachable subject). On the other hand, it gracefully handles a lot of other questions that are going to be asked. It helped me out when my buddy asked me a question about rule priorities: specifically, how does an inference engine choose which rule is "best", and, as malte is wondering, which platform detection algorithm is best?

My buddy can't code but he likes to eat, so I gave an example of an inference engine that inferred the color of a fruit you were eating. If you started with the rule "If it's a lemon, then it's yellow" and you're eating a lemon, the engine will infer your fruit is yellow. How do you attach a priority to the statement "If it's a lemon, then it's yellow"? There's no concept of rule priority here: either the rule is true, or it isn't. If it's not true, it doesn't belong in the rule base, simple as that. This highlights two important points. The first is that Heimdall's rules must not introduce inconsistent information. (In a normal inference engine, you might have a consistency enforcer attempting to maintain a consistent representation of the emerging solution, usually using timestamps of derived facts or the Occam's razor principle.) For example, if you're eating a banana, you can't add a rule that says "If it's a banana, then it's yellow" because it might be green. Similarly, it's valid to have the rule "If it's a .nes file then it's a NES game" but not "If it's a .bin file then it's a PSX game".

The second important point I'll make is: don't ignore data. Let's say a scraper dev comes along and adds the following rules: "If it's unripe, it's green", "If it's ripe, it's yellow". Neither type nor taste is sufficient on its own, but with both you can unambiguously resolve color for lemons and bananas. As long as we don't violate the consistency requirement, rule bases are never-ending pots that we can keep teaching new ideas to.
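
To make the consistency requirement concrete, here is a small sketch. It is not part of heimdall: `apply_rules`, `InconsistentRules`, and both rule lists are invented for illustration. A rule is a set of premises plus one conclusion, and the engine raises as soon as two applicable rules disagree about the same fact.

```python
from typing import Dict, List, Tuple

Facts = Dict[str, str]
# A rule is (premises, (key, value)): if every premise matches, conclude key=value.
Rule = Tuple[Facts, Tuple[str, str]]


class InconsistentRules(Exception):
    """Raised when two applicable rules conclude different values for one fact."""


def apply_rules(facts: Facts, rules: List[Rule]) -> Facts:
    known = dict(facts)
    changed = True
    while changed:
        changed = False
        for premises, (key, value) in rules:
            # The rule applies only if all of its premises match known facts.
            if any(known.get(k) != v for k, v in premises.items()):
                continue
            if key not in known:
                known[key] = value
                changed = True
            elif known[key] != value:
                raise InconsistentRules(f"{key}: {known[key]!r} vs {value!r}")
    return known


# Rules keyed on ripeness never contradict each other.
GOOD_RULES: List[Rule] = [
    ({"ripeness": "unripe"}, ("color", "green")),
    ({"ripeness": "ripe"}, ("color", "yellow")),
]

# "If it's a banana, then it's yellow" is not always true, so it clashes
# with the ripeness rules for an unripe banana.
BAD_RULES: List[Rule] = [({"fruit": "banana"}, ("color", "yellow"))] + GOOD_RULES

print(apply_rules({"fruit": "banana", "ripeness": "unripe"}, GOOD_RULES))
```

Running the same facts through `BAD_RULES` raises `InconsistentRules`, which is the situation the post says must never be allowed into the rule base.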

Outside of a class from prof. M. Dyer, my AI experience is pretty limited. Topfs2, you're the one with the master's degree (though I wouldn't be surprised if prof. Dyer co-authored a course book :)). Task priorities and concerns over data consistency naturally arise from a task-based paradigm, so I'd like to hear your thoughts on reinforcing the inference engine metaphor.

This might be the best written explanation of what I tried to achieve with heimdall :) This is exactly how I had in mind that heimdall would find the scraping pipeline: essentially by stumbling across the end result through inference.

The thing that breaks this is when rules polish an already inferred value, for example the title. There will be several tasks that alter the title (polishing it and making it nicer). So there is a small sense of priority in heimdall to accommodate that, i.e. if a rule depends on title (like the tmdb search), don't run it until all rules that infer title have done so.

e.g.
File URL -> title
File URL -> MediaInfo
MediaInfo -> Audio/Video
Audio/Video -> Duration
Duration -> Movie/TV Show
Movie/TV Show -> (polished) title

title -> tmdb

So in this case we can't infer title -> tmdb until Movie/TV Show -> (polished) title has run.

So the pipeline becomes:

File URL -> title, MediaInfo -> Audio/Video -> Duration -> Movie/TV Show -> (polished) title -> tmdb

This is mostly to make it more approachable. It would be entirely possible to do this by pure inference, by actually naming the property "polished title". Then tmdb couldn't run until that property has been inferred.
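
The deferral rule described above (don't run a task that depends on title until every other task that provides title has run) can be sketched as follows. This is an assumption-laden illustration rather than heimdall's real scheduler: `Task`, `schedule`, the property names, and all the fake task bodies (including the duration-based movie heuristic) are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Dict, List, Set, Tuple

Facts = Dict[str, str]


@dataclass
class Task:
    name: str
    requires: Set[str]
    provides: Set[str]
    run: Callable[[Facts], Facts]


def schedule(facts: Facts, tasks: List[Task]) -> Tuple[Facts, List[str]]:
    """Run tasks, deferring each one until every *other* pending task that
    provides one of its required properties has already run."""
    known = dict(facts)
    pending = list(tasks)
    order: List[str] = []

    def ready(task: Task) -> bool:
        if not task.requires.issubset(known):
            return False
        others = [t for t in pending if t is not task]
        return not any(t.provides & task.requires for t in others)

    while pending:
        runnable = [t for t in pending if ready(t)]
        if not runnable:
            break  # nothing can run: premises missing, or blocked by a stuck provider
        task = runnable[0]
        known.update(task.run(known))
        pending.remove(task)
        order.append(task.name)
    return known, order


# Fake task bodies; tmdb is listed first to show that it still runs last.
TASKS = [
    Task("tmdb", {"title"}, {"tmdb"}, lambda f: {"tmdb": f"search for {f['title']!r}"}),
    Task("title_from_url", {"file_url"}, {"title"},
         lambda f: {"title": PurePosixPath(f["file_url"]).stem}),
    Task("mediainfo", {"file_url"}, {"mediainfo"}, lambda f: {"mediainfo": "fake probe"}),
    Task("audio_video", {"mediainfo"}, {"audio_video"},
         lambda f: {"audio_video": "1 video, 1 audio"}),
    Task("duration", {"audio_video"}, {"duration"}, lambda f: {"duration": "120"}),
    Task("kind", {"duration"}, {"kind"},
         lambda f: {"kind": "movie" if int(f["duration"]) > 60 else "tv show"}),
    Task("polish_title", {"title", "kind"}, {"title"},
         lambda f: {"title": f["title"].split(".")[0]}),
]

facts, order = schedule({"file_url": "/home/Avatar.2009.720p.mkv"}, TASKS)
print(" -> ".join(order))
print(facts["tmdb"])
```

The tmdb task becomes eligible as soon as the raw title exists, but it is held back until `polish_title` has run, so the search sees the polished title. One limitation of this simple version: a provider that can never run will block its consumers forever.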

