[RELEASE] [MOD] AniDB.net scrapers for TV shows and Movies
(2013-05-30, 21:38) ZERO <ibis> wrote:
Suggestion for genres:

Create an option that allows the scraper to ignore subcategories.

Example: http://anidb.net/perl-bin/animedb.pl?sho...e&aid=8583
Genres are: Alien, Clubs, Ecchi, Humanoid Alien, Mecha, Pantsu, Piloted Robot, Post-apocalypse

Pantsu, for example, is a sub-level of Ecchi and could be ignored so that only Ecchi is shown instead.

A better approach might be to let us disable particular categories and have them fall back to their master category. That way you could still keep some subcategories, like Ninja, but force others to fall back to their master category, or ignore a category altogether.

There should also be a force-include option, so that any categories under a defined type are always included regardless of weight. Original Work would be an important one, since we could then ensure that users can always filter by the type of source material the anime is based on, which is REALLY useful.

There's definitely room for improvement in the genre handling, and my rewrite for 2.0.0 makes a lot of your suggestions possible (not that I necessarily agree with all of them).

I am very interested in concrete ideas, though, so I'll explain a little about how it works, so you know what we've got to work with.

To start with, this is what the API returns for each category:
Code:
<category id="282" parentid="313" hentai="false" weight="400">
  <name>Ecchi</name>
  <description>
Ecchi, or etchi (エッチ), is a common Japanese word meaning "indecent", "lewd", "frisky" or "sexy"; its usage can be compared to the English word "naughty". Among western anime fans, the term is used when vague sexual content, such as skimpy clothing or nudity, is prominent. In Japanese, however, it is a catch-all term for all things anyhow sexual in nature, mild or otherwise.
  </description>
</category>
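For anyone who wants to experiment with this outside the scraper, here is a minimal Python sketch of reading that element into a plain dictionary. It is not how the scraper itself works (that lives in anidb.xml); the `<categories>` wrapper element, the shortened description and the `parse_categories` name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened sample. The <categories> wrapper is an assumption for this sketch.
SAMPLE = """<categories>
  <category id="282" parentid="313" hentai="false" weight="400">
    <name>Ecchi</name>
    <description>(description text omitted)</description>
  </category>
</categories>"""


def parse_categories(xml_text):
    """Return one dict per <category>, without the description."""
    root = ET.fromstring(xml_text)
    categories = []
    for node in root.iter("category"):
        categories.append({
            "id": int(node.get("id")),
            "parentid": int(node.get("parentid")),
            "hentai": node.get("hentai") == "true",
            "weight": int(node.get("weight")),
            "name": node.findtext("name", default="").strip(),
        })
    return categories


print(parse_categories(SAMPLE))
```

The id, parentid and weight are the only fields the filtering steps need, which is why the description is dropped straight away.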

And the 2.0.0 method is:
1. Sort the categories by weight (and trim off the description in the process).
2. Remove categories below the minimum weight.
3. "Build" an id filter:
3.1 Start with some ids that will always be ignored. (60|128|129|185|242|255|289)
3.2 Add more ids to the filter based on the presence of other ids. (IF id1 THEN IGNORE id2).
4. Apply the id filter and a (fixed) parentid filter (and convert to XBMC format).
5. Trim down to the maximum number of genres.
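The five steps above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the scraper's actual code: the function and parameter names are made up, "sort by weight" is assumed to mean highest weight first, the conditional rules are assumed to look only at categories that survived step 2, and the XBMC output format is reduced to a plain list of names. Only the always-ignored ids and the Ecchi id/parentid come from this post; the other ids are hypothetical.

```python
ALWAYS_IGNORED = {60, 128, 129, 185, 242, 255, 289}  # ids from step 3.1


def parse_genres(categories, min_weight, max_genres,
                 conditional_rules, ignored_parents):
    # 1. Sort by weight (assumed: highest first) and drop the description.
    ranked = sorted(
        ({k: v for k, v in c.items() if k != "description"} for c in categories),
        key=lambda c: c["weight"],
        reverse=True,
    )
    # 2. Remove categories below the minimum weight.
    ranked = [c for c in ranked if c["weight"] >= min_weight]
    # 3. Build the id filter: fixed ids plus "IF id1 THEN IGNORE id2" rules.
    present = {c["id"] for c in ranked}
    ignored = set(ALWAYS_IGNORED)
    for trigger_id, ignored_id in conditional_rules:
        if trigger_id in present:
            ignored.add(ignored_id)
    # 4. Apply the id filter and the fixed parentid filter.
    kept = [c["name"] for c in ranked
            if c["id"] not in ignored and c["parentid"] not in ignored_parents]
    # 5. Trim down to the maximum number of genres.
    return kept[:max_genres]


# Hypothetical data: only Ecchi's id and parentid are taken from the API sample.
cats = [
    {"id": 1001, "parentid": 0, "weight": 600, "name": "Comedy"},
    {"id": 282, "parentid": 313, "weight": 400, "name": "Ecchi"},
    {"id": 1002, "parentid": 282, "weight": 500, "name": "Pantsu"},
    {"id": 60, "parentid": 0, "weight": 600, "name": "Always ignored"},
    {"id": 1003, "parentid": 99, "weight": 100, "name": "Too light"},
]
genres = parse_genres(cats, min_weight=300, max_genres=2,
                      conditional_rules=[(282, 1002)], ignored_parents={99})
print(genres)  # ['Comedy', 'Ecchi']
```

Here the rule (282, 1002) stands in for "IF Ecchi THEN IGNORE Pantsu", so Pantsu is dropped even though it outweighs Ecchi.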


Things that would be doable:

Prioritize certain categories. Before step 1, we could move certain categories to the top regardless of weight (and then skip them in steps 1 and 2).
I'm thinking here mostly of "traditional" genre categories: Comedy, Drama, Action, etc.
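One way the prioritizing idea could look, as a rough Python sketch. Nothing here exists in the scraper: `order_with_priority` and the priority id set are hypothetical, the ids other than Ecchi's are invented, and ordering the pinned categories among themselves by weight is just one possible choice.

```python
def order_with_priority(categories, priority_ids, min_weight):
    """Pin priority categories to the top; weight-filter only the rest."""
    by_weight = lambda c: c["weight"]
    # Pinned categories skip the minimum-weight check entirely.
    pinned = sorted((c for c in categories if c["id"] in priority_ids),
                    key=by_weight, reverse=True)
    # Everything else goes through steps 1 and 2 as usual.
    rest = sorted((c for c in categories if c["id"] not in priority_ids),
                  key=by_weight, reverse=True)
    rest = [c for c in rest if c["weight"] >= min_weight]
    return pinned + rest


# Hypothetical ids and weights, apart from Ecchi (282).
cats = [
    {"id": 1001, "weight": 100, "name": "Comedy"},
    {"id": 282, "weight": 400, "name": "Ecchi"},
    {"id": 1002, "weight": 200, "name": "Mecha"},
    {"id": 1003, "weight": 150, "name": "Drama"},
]
ordered = order_with_priority(cats, priority_ids={1001, 1003}, min_weight=300)
print([c["name"] for c in ordered])  # ['Drama', 'Comedy', 'Ecchi']
```

The id filters and the maximum-genres trim would still apply afterwards, so pinned categories take up slots within the limit rather than being added on top of it.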

Add new id filters. This should go without saying. Look at the ParseGenres function in metadata.common.anidb.net\anidb.xml to see what's currently done; the filters are all commented for readability.

Expand the parentid filtering. Any of the following would also be possible:
IF id1 THEN IGNORE parentid2
IF parentid1 THEN IGNORE id2
IF parentid1 THEN IGNORE parentid2
It would also be possible to blanket-ignore any id as a parentid, i.e. gather all the ids present and use them in the parentid filter, so that only the most general categories remain. Or vice versa: put the parentids in the id filter, to leave only the most specific.
(It would also be possible to exempt certain ids from that, or restrict it to only certain ones).
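The two blanket variants could be sketched like this in Python. The function names, the `exempt_ids` parameter and every id except Ecchi's 282/313 are hypothetical; the Pantsu-under-Ecchi relationship is the one mentioned in the quoted post.

```python
def keep_most_general(categories, exempt_ids=frozenset()):
    """Use all present ids as the parentid filter: children of present parents go."""
    present_ids = {c["id"] for c in categories}
    return [c for c in categories
            if c["parentid"] not in present_ids or c["id"] in exempt_ids]


def keep_most_specific(categories, exempt_ids=frozenset()):
    """Use all present parentids as the id filter: parents of present children go."""
    parent_ids = {c["parentid"] for c in categories}
    return [c for c in categories
            if c["id"] not in parent_ids or c["id"] in exempt_ids]


# Hypothetical ids, apart from Ecchi (id 282, parentid 313).
cats = [
    {"id": 282, "parentid": 313, "name": "Ecchi"},
    {"id": 1002, "parentid": 282, "name": "Pantsu"},
    {"id": 1004, "parentid": 1005, "name": "Mecha"},
]
print([c["name"] for c in keep_most_general(cats)])   # ['Ecchi', 'Mecha']
print([c["name"] for c in keep_most_specific(cats)])  # ['Pantsu', 'Mecha']
```

Because only the direct parent is known, a category whose parent is absent always survives `keep_most_general`, even if a more distant ancestor is present.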

Can't do/won't do:

Force inclusion of certain categories above the set maximum limit. The limit should be absolute.

Grandparent filtering. Only information about the direct parent is available. (This can of course be faked to a degree for specific cases but not in general.)

Include categories that aren't there. No replacing of a child with an absent parent.

Deliberately include categories that aren't even broadly "genres", such as the "original work" and "location" categories. This is the major point of the filtering in the first place.

HOWEVER... I do like the idea of taking those categories for use as tags (to the point where this is almost certainly going to happen). This seems like a better fit for the purposes you mention.
So there would be a setting to allow tags of the form "Original Work: 4-koma" (or whatever, the prefix would likely also be settable since the category itself is just "4-koma").
This would also be independent of the anidb tags setting, so you could have one without the other.
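To make the tag idea a bit more concrete, here is a rough Python sketch of what it might do. None of this exists yet: `build_tags`, the parentid-to-prefix mapping and all the ids except Ecchi's are made up, and it assumes the tag-worthy categories can be recognized by their direct parentid.

```python
def build_tags(categories, tag_prefixes):
    """Turn categories under chosen parents into 'Prefix: Name' tags.

    tag_prefixes maps a parentid to the (user-settable) prefix; an empty
    prefix yields the bare category name.
    """
    tags = []
    for cat in categories:
        if cat["parentid"] in tag_prefixes:
            prefix = tag_prefixes[cat["parentid"]]
            tags.append(f"{prefix}: {cat['name']}" if prefix else cat["name"])
    return tags


# Hypothetical ids: 2000 stands in for the "Original Work" parent category.
cats = [
    {"id": 2001, "parentid": 2000, "name": "4-koma"},
    {"id": 282, "parentid": 313, "name": "Ecchi"},
]
print(build_tags(cats, {2000: "Original Work"}))  # ['Original Work: 4-koma']
print(build_tags(cats, {2000: ""}))               # ['4-koma']
```

Categories picked up this way would bypass the genre list entirely, so they would not count against the maximum number of genres.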


Messages In This Thread
RE: [RELEASE] [MOD] AniDB.net scrapers for TV shows and Movies - by scudlee - 2013-05-31, 20:28
RE: - by scudlee - 2013-10-12, 17:42