Clean scraping API
#48
(2013-04-02, 14:25)malte Wrote: 2. platform detection:
It looks like you do platform detection via file extension. I am afraid this will always be error-prone, at least when you have to deal with multi-platform extensions like .img or .bin. Any plans how to solve this?
Just had some beers with a buddy and went into techno-babble mode about heimdall's AI aspects :) He's not a programmer, so I was explaining to him that you start with a fact (such as "the movie's filename is /home/Avatar.mkv") and try to answer questions (what is the movie's format? what is the movie's genre?). You have tasks which operate on the subject, and their purpose is to generate new facts, assuming that their required premises are true.

I realized that the theory behind its operation is, word for word, that of an inference engine. Subject tasks are treated like inference rules (If P, then Q: "If subject has filename, then title is filename minus extension" or "If subject has title, then genre is <movie["genre"] from api.tmdb.org/title>"). Heimdall schedules the rule when its premises are fulfilled, and the outcome is the possible establishment of its conclusions. Starting with a fact or two, heimdall applies rules to infer new facts about the subject, adding each new fact to its knowledge base, and the cool thing is once a fact has been established it can be used to infer other facts as well (forward-chaining).
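
To make the forward-chaining idea concrete, here is a minimal Python sketch. It is not heimdall's actual API: the names Rule, RULES, STUB_GENRES and forward_chain are made up for illustration, and the genre lookup is a local stub standing in for the web service call.

```python
import os
from typing import Callable, Dict, List, Optional

Facts = Dict[str, str]

# Stand-in for the api.tmdb.org lookup mentioned above; a real rule would
# query the web service. This table is purely illustrative.
STUB_GENRES = {"Avatar": "Science Fiction"}


class Rule:
    """If every premise is a known fact, try to establish the conclusion."""

    def __init__(self, premises: List[str], conclusion: str,
                 infer: Callable[[Facts], Optional[str]]) -> None:
        self.premises = premises
        self.conclusion = conclusion
        self.infer = infer


# Deliberately listed "out of order": the genre rule comes first but cannot
# fire until some other rule has established the title.
RULES = [
    Rule(["title"], "genre", lambda f: STUB_GENRES.get(f["title"])),
    Rule(["filename"], "title",
         lambda f: os.path.splitext(os.path.basename(f["filename"]))[0]),
    Rule(["filename"], "format",
         lambda f: os.path.splitext(f["filename"])[1].lstrip(".").lower() or None),
]


def forward_chain(facts: Facts, rules: List[Rule]) -> Facts:
    """Apply rules until a full pass over them adds no new fact."""
    known = dict(facts)
    progress = True
    while progress:
        progress = False
        for rule in rules:
            if rule.conclusion in known:
                continue  # already established, nothing to infer
            if all(p in known for p in rule.premises):
                value = rule.infer(known)
                if value is not None:  # a rule may fail to conclude anything
                    known[rule.conclusion] = value
                    progress = True
    return known


if __name__ == "__main__":
    print(forward_chain({"filename": "/home/Avatar.mkv"}, RULES))
```

Starting from the single filename fact, the first pass establishes title and format, and the second pass can then use the title to establish the genre. The order of the rules in the list does not matter; only the premises do.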

As far as future scraper devs go, I feel that reinforcing the inference engine metaphor is a double-edged sword. On the downside, it's a layer of complexity and frustration (AI isn't the most approachable subject). On the other hand, it gracefully handles a lot of other questions that are going to be asked. It helped me out when my buddy asked me a question about rule priorities: specifically, how does an inference engine choose which rule is "best", and like malte is wondering, which platform detection algorithm is best?

My buddy can't code but he likes to eat, so I gave an example of an inference engine that inferred the color of a fruit you were eating. If you started with the rule "If it's a lemon, then it's yellow" and you're eating a lemon, the engine will infer your fruit is yellow. How do you attach a priority to the statement "If it's a lemon, then it's yellow"? There's no concept of rule priority here: either the rule is true, or it isn't. If it's not true, it doesn't belong in the rule base, simple as that. This highlights two important points. The first is that heimdall's rules must not introduce inconsistent information. (In a normal inference engine, you might have a consistency enforcer attempting to maintain a consistent representation of the emerging solution, usually using timestamps of derived facts or the Occam's razor principle.) For example, if you're eating a banana, you can't add a rule that says "If it's a banana, then it's yellow" because it might be green. Similarly, it's valid to have the rule "If it's a .nes file then it's a NES game" but not "If it's a .bin file then it's a PSX game".
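
The consistency requirement can be sketched in a few lines of Python. This is an illustration under assumed names (EXTENSION_TO_PLATFORM, platform_from_extension, add_fact and InconsistentFact are all hypothetical, not part of heimdall): an extension rule only exists when it is always true, and the fact store refuses a value that contradicts one it already holds.

```python
import os
from typing import Dict, Optional

# Only extensions that identify exactly one platform get a rule.
# ".bin" and ".img" are deliberately absent: no always-true rule exists for them.
EXTENSION_TO_PLATFORM = {".nes": "NES"}


class InconsistentFact(Exception):
    """Raised when a rule tries to contradict an established fact."""


def platform_from_extension(filename: str) -> Optional[str]:
    """Return a platform only when the extension is unambiguous."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_PLATFORM.get(ext)


def add_fact(facts: Dict[str, str], key: str, value: str) -> None:
    """Store a fact, rejecting any value that conflicts with a known one."""
    if key in facts and facts[key] != value:
        raise InconsistentFact(
            f"{key!r} is already {facts[key]!r}, cannot also be {value!r}")
    facts[key] = value


if __name__ == "__main__":
    print(platform_from_extension("Some Game.nes"))  # a platform is inferred
    print(platform_from_extension("Some Game.bin"))  # nothing is inferred
```

Returning nothing for .bin is the point: the rule base stays silent rather than guessing, and some other rule with better evidence has to establish the platform.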

The second important point I'll make is: don't ignore data. Let's say a scraper dev comes along and adds the following rules: "If it's unripe, it's green", "If it's ripe, it's yellow". Neither type nor ripeness alone is sufficient, but with both you can unambiguously resolve color for lemons and bananas. As long as we don't violate the consistency requirement, rule bases are never-ending pots that we can keep teaching new ideas to.
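
As a rough Python illustration of a rule that needs more than one premise (the COLOR_RULES table and infer_color function are hypothetical names for this example only), color is inferred only once both the fruit type and its ripeness are known:

```python
from typing import Dict, Optional, Tuple

# Each entry reads: "if it's a <type> and it's <ripeness>, then it's <color>".
COLOR_RULES: Dict[Tuple[str, str], str] = {
    ("banana", "unripe"): "green",
    ("banana", "ripe"): "yellow",
    ("lemon", "unripe"): "green",
    ("lemon", "ripe"): "yellow",
}


def infer_color(facts: Dict[str, str]) -> Optional[str]:
    """Infer color only when both premises (type and ripeness) are known."""
    fruit = facts.get("type")
    ripeness = facts.get("ripeness")
    if fruit is None or ripeness is None:
        return None  # not enough data yet, so infer nothing
    return COLOR_RULES.get((fruit, ripeness))


if __name__ == "__main__":
    print(infer_color({"type": "banana"}))
    print(infer_color({"type": "banana", "ripeness": "unripe"}))
```

With only the type known the rule stays silent; adding the ripeness fact is what lets it fire without ever contradicting another rule.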

Outside of a class from Prof. M. Dyer, my AI experience is pretty limited. Topfs2, you're the one with the master's degree (though I wouldn't be surprised if Prof. Dyer co-authored a course book :)). Task priorities and concerns over data consistency naturally arise from a task-based paradigm, so I'd like to hear your thoughts on reinforcing the inference engine metaphor.

(2013-04-03, 07:12)N3MIS15 Wrote: +1 on this. With thegamesdb module I was able to "trick" the title retrieval by using the name "Wonderboy III.sms" and it returned "Wonderboy" (the first in the series). Giving the full/correct name "Wonderboy III The Dragons Trap.sms" returned the correct result. Alternative titles/platforms may also need to be taken into account as, for example, the platform "Playstation" is also commonly named "PSX" or "PS1". On top of that, regions may also need to be considered. Mario Bros. 2 (JP) is a totally different game than Mario Bros. 2 (US)/(EU).
I'll give this more thought, but a Bayesian filter should make an interactive mode virtually unnecessary.
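
One possible reading of the Bayesian idea, sketched in Python with entirely made-up numbers: the candidate platforms, the evidence labels, the priors and the likelihoods below are all assumptions for illustration, and the function name posterior is hypothetical. Each piece of evidence reweights the candidates instead of deciding outright, so ambiguous evidence such as a .bin extension simply leaves the candidates tied.

```python
from typing import Dict, List


def posterior(prior: Dict[str, float],
              likelihoods: Dict[str, Dict[str, float]],
              evidence: List[str]) -> Dict[str, float]:
    """Naive Bayes update: P(h | evidence) is proportional to P(h) * product of P(e | h)."""
    scores = {}
    for hypothesis, p in prior.items():
        score = p
        for e in evidence:
            # Unseen evidence gets a tiny likelihood rather than zero.
            score *= likelihoods[hypothesis].get(e, 1e-6)
        scores[hypothesis] = score
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


# Hypothetical numbers, chosen only to show the mechanics.
PRIOR = {"PSX": 0.5, "Genesis": 0.5}
LIKELIHOODS = {
    "PSX": {"ext=.bin": 0.6, "parent_dir=psx": 0.9},
    "Genesis": {"ext=.bin": 0.6, "parent_dir=psx": 0.1},
}

if __name__ == "__main__":
    # The extension alone cannot separate the candidates...
    print(posterior(PRIOR, LIKELIHOODS, ["ext=.bin"]))
    # ...but a second piece of evidence can.
    print(posterior(PRIOR, LIKELIHOODS, ["ext=.bin", "parent_dir=psx"]))
```

An interactive prompt would then only be needed when no candidate's probability clears some chosen threshold.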



