Release [MOD] AniDB.net scrapers for TV shows and Movies
AniDB.net Scraper Mods for Anime TV shows and Movies
 

So here finally are my mods of the AniDB.net scraper.

Installation
You can download and install both of them through my new repo:
repository.scudlee

AniDB.net Scraper Mod for Anime TV shows
Current version: 2.5.0
There are several major improvements to the TV show scraper:

New anime-list.xml
The scraper now uses my updated anime-list.xml by default. This list is significantly more complete than the old list, and is being actively maintained. (Help very much welcome!)

Movie fanart support
Using the new anime-list.xml, the scraper can now retrieve movie fanart directly from themoviedb.org. Movies linked to TV shows will fetch both movie and TV fanart, prioritizing the movie fanart.
(Of course, this is intended more for the movie scraper, but if you want to stick with using one scraper, it also works here.)

Search improvements
Several tweaks to the search:
Shows distinguished by a year in brackets (e.g. Bakuman. (2012)) now match correctly.
Shows distinguished by a final apostrophe (e.g. Gintama' and Dog Days') now match correctly.
Some punctuation marks were preventing titles from being scraped correctly in Frodo builds. This has been fixed.
Google search works again, should you want to use it.

Support for OPs/EDs, trailers, etc.
OPs/EDs, Trailers, Parodies and Other are now treated as high-numbered specials:
OPs/EDs "C": Season 0 Episodes 101-199 (e.g. S00E101)
Trailers "T": Season 0 Episodes 201-299
Parodies "P": Season 0 Episodes 301-399
Other "O": Season 0 Episodes 401-499
These "specials" will always be mapped to the end of the episode list, independent of your settings. I can change this if need be.
Note: This is only fully supported in Frodo builds. Eden will be slightly unpredictable.
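To make the numbering above concrete, here's a rough Python sketch of the mapping. This is not the scraper's actual code (the scraper is XML/regexp based); the function names are made up for illustration, and I'm assuming the AniDB episode number arrives as a prefix letter followed by a number (e.g. "C1").

```python
# Offsets taken from the table above: C -> 101-199, T -> 201-299, etc.
SPECIAL_OFFSETS = {"C": 100, "T": 200, "P": 300, "O": 400}


def special_to_season_episode(epno):
    """Map an AniDB-style special number such as "C1" to (season, episode).

    Hypothetical helper: assumes the input is one prefix letter plus a number.
    """
    prefix, number = epno[0].upper(), int(epno[1:])
    if prefix not in SPECIAL_OFFSETS:
        raise ValueError("not an OP/ED, trailer, parody or other: %r" % epno)
    if not 1 <= number <= 99:
        raise ValueError("only 99 slots per type are available: %r" % epno)
    # All of these land in season 0, i.e. the specials season.
    return 0, SPECIAL_OFFSETS[prefix] + number


def format_episode(season, episode):
    """Format as e.g. S00E101."""
    return "S{:02d}E{:02d}".format(season, episode)
```

So the first opening ("C1") ends up as S00E101, and the last available "Other" slot ("O99") as S00E499.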

Cast handling changes
Cast without a picture in AniDB are now included.
(Note: None of the following currently works because of changes in the xml returned by AniDB (episode data is no longer provided for the cast). The old scraper no longer works either, so both scrapers will appear to behave identically in this regard. I've included the change because if there's a way to get it working, they will be different.)
In XBMC, actors can be added to either TV shows or individual episodes. In the tvdb scraper, Main cast are only added to the TV show, and guest stars to the episodes they appear in. In AniDB, the cast is split into three: Main, Secondary, and "appears in" (i.e. guest stars). The old scraper would ignore the "appears in" cast completely, and add the main and secondary cast to both the TV show and the episodes they appear in (with an option to also ignore secondary cast). The Mod scraper follows the tvdb model: Main only to the show, "appears in" only to episodes, with an option to group Secondary with Main or "appears in".
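As a rough illustration of the intended grouping (not the scraper's actual code, and bearing in mind the note above that AniDB no longer supplies the episode data), here is a small Python sketch. The function name, the (name, type) tuples and the secondary_as option are all hypothetical.

```python
def assign_cast(cast, secondary_as="main"):
    """Split cast into (show_cast, episode_cast) following the tvdb model.

    cast: list of (name, cast_type) tuples, where cast_type is one of
          "main", "secondary" or "appears in".
    secondary_as: which group Secondary cast is folded into
                  ("main" or "appears in"); stands in for the scraper setting.
    """
    if secondary_as not in ("main", "appears in"):
        raise ValueError("secondary_as must be 'main' or 'appears in'")
    show_cast, episode_cast = [], []
    for name, cast_type in cast:
        if cast_type == "secondary":
            cast_type = secondary_as
        if cast_type == "main":
            show_cast.append(name)      # added to the TV show only
        else:
            episode_cast.append(name)   # added to episodes only
    return show_cast, episode_cast
```

A real implementation would also need to know which episodes each "appears in" character belongs to, which is exactly the data that is currently missing.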

1.1.0 Features
See Post#10

2.0.0 Features
See Post#270

2.1.0 Features
See Post#292

2.2.0 Features
See Post#663

2.3.0 Features
See Post#689

2.4.0 Features
See Post#724

2.5.0 Features
See Post#973

Planned improvements include: Support for the new thumb aspect feature (added 1.1.0), bugfix for empty plots returning a tag description (I have a fix but it's ugly) (added 1.1.0),... Suggestions welcome.


AniDB.net Scraper Mod for Anime Movies
Current version: 2.3.0
This is a rather crude first pass at a movie scraper. It basically treats any title in AniDB as a movie, ignoring any episode details, so it's not suitable for use with movie series like Break Blade, Kara no Kyoukai, or Mardock Scramble.

There was one slight problem: the movie scraper is essentially the TV scraper with the episode parts chopped off, and without them, it's a little fast. Too fast for AniDB's liking - you'd get banned after only a few titles. To combat this, I've added a delay loop to the scraper, in which the scraper idly runs through some increasingly (and then decreasingly) time-consuming regexps. How much work it does is controlled by a delay parameter in the settings. The default value (125) gave me a, let's say, sedate pace. You can decrease it if you want, but if you get banned, you're on your own! That said, I'd be interested in finding a reasonable sweet spot, so please report your findings if you try lower values.
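For anyone writing their own tools against AniDB, the same idea (pacing requests so you don't get banned) is easier to express outside a scraper. The Python sketch below is not what the scraper does: it swaps the regexp busy loop for a plain sleep, and the class name and the interval you pass in are assumptions for illustration. The scraper's delay value (125) is an amount of busywork, not a number of seconds.

```python
import time


class Throttle:
    """Make sure at least min_interval seconds pass between requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        # clock and sleep are injectable so the pacing can be tested
        # without actually waiting.
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would create one Throttle and call wait() immediately before each request. The first call returns straight away; later calls sleep only for whatever is left of the interval.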

1.1.0 Features
See Post#10

2.0.0 Features
See Post#270

2.1.0 Features
See Post#292

2.2.0 Features
See Post#724

2.3.0 Features
See Post#973

Planned improvements include: Support for scraping more information (Trailers/certification added 2.0.0)/posters from themoviedb.org (added 1.1.0) (currently only the fanart is), support for movie sets (either scraped from themoviedb.org (added 1.1.0) or by adding them to anime-list.xml, or both (added 2.0.0)),... Suggestions welcome.

Changelogs:

metadata.tvshows.anidb.net.mod
2.5.0
Changed: New ratings format
Changed: Plot filtering improved
Changed: Minor genre handling tweaks
Changed: Default search results display anidb id in unused language spot rather than in title
Changed: Use uniqueids to store multiple ids
Changed: updated icon
Changed: Separate language setting from Official name setting
Added: Option to use old rating format
Added: Option to not get ratings
Added: Audience tagging (kodomo, shounen, josei, etc.)
Added: Episode plots fetched in chosen language (English fallback)
Fixed: TVDB details/artwork scraping (using v2 api)

2.4.0
Fixed: Genre parsing
Fixed: Minor plot handling fixes

2.3.0
Added: Ability to use episode offset attributes in anime-list.xml

2.2.0
Changed: Episode titles now try to respect "Official language" setting

2.1.0
Added: Original Work and Location tagging
Added: Ability to use theTVDB.com Absolute order from mapping list

2.0.1
Version bump to update over 2.0.0rc1

2.0.0
Changed: GetDetails broken into separate shared functions
Changed: Genre handling rewritten
Changed: Simplified artwork handling
Changed: Google search rewritten
Added: Delay parameter to slow scraping
Added: Season artwork support
Fixed: Episode titles with an apostrophe followed by a space would lose the space

1.1.1:
Changed: Dropped the www from thetvdb URLs

1.1.0:
Changed: Simplified genre count code
Changed: Prioritised defaulttvdbseason wide banners
Changed: Now uses displayafterseason/displaybeforeseason
Added: Support for tagging
Added: Movie posters from themoviedb.org
Added: Support for thumb aspects
Fixed: Empty plot description would result in a category description used instead

1.0.0:
(Changes from official anidb.net scraper 2.0.0)
Changed: Default locations of anime lists
Changed: Handling of Main/Secondary/"Appears in" cast altered to better match tvdb scraper
Added: Support for treating OPs/EDs, trailers, etc. as specials (requires Frodo)
Added: Support for retrieving fanart from themoviedb.org
Added: Support for shows distinguished by a year in brackets
Fixed: Cast without pictures were being ignored
Fixed: Shows distinguished by a final apostrophe now match correctly (e.g. Gintama')
Fixed: Some punctuation marks are no longer being percent-encoded
Fixed: Google search results parsed correctly again

metadata.movies.anidb.net.mod
2.3.0
Changed: New ratings format
Changed: Plot filtering improved
Changed: Minor genre handling tweaks
Changed: Default search results display anidb id in unused language spot rather than in title
Changed: Use uniqueids to store multiple ids
Changed: updated icon
Added: Option to use old rating format
Added: Option to not get ratings
Added: Audience tagging (kodomo, shounen, josei, etc.)
Fixed: TVDB details/artwork scraping (using v2 api)

2.2.0
Fixed: Genre parsing
Fixed: Minor plot handling fixes

2.1.0
Added: Original Work and Location tagging

2.0.1
Version bump to update over 2.0.0rc1

2.0.0
Changed: GetDetails broken into separate shared functions
Changed: Genre handling rewritten
Changed: Simplified artwork handling
Changed: Google search rewritten
Added: Movie trailers from themoviedb.org
Added: Certification from themoviedb.org
Added: Movie sets from anime-movieset-list.xml

1.1.1:
Changed: Dropped the www from thetvdb URLs

1.1.0:
Changed: Simplified genre count code
Changed: Switched alternate id to imdb/tmdb
Added: Support for movie sets from themoviedb.org
Added: Support for tagging
Added: Movie posters from themoviedb.org
Added: Support for thumb aspects
Fixed: Empty plot description would result in a category description used instead
Removed: No wide banners for movies

1.0.0:
Initial Commit

To complement the new scrapers, I also adapted and expanded the various AniDB Client TagSystem rules that were posted in the previous thread.

These rules will separate movies and one-shot OVAs into a movie folder, to be scraped by the movie scraper. They will also renumber OPs/EDs, trailers, etc. to be picked up by the TV scraper.

AniDB Client TagSystem Rules
Version 1.1.0
(Full version) http://pastebin.com/raw.php?i=MkswMaME
(Unformatted) http://pastebin.com/raw.php?i=9dgzsZKB
The Full version is too large to save in the AniDB client, but contains explanations and alternatives for each step, so only use it to modify the Unformatted version to your preferences.

To use these rules you'll need an account on AniDB. In the AniDB Client options, press "Go Advanced", then enable Filemoving and Filerenaming, using the Tagging System for both. Then edit the Tagging System, deleting the original content and pasting in the contents of the Unformatted link above.

You'll first need to edit the BaseTVShowPath and BaseMoviePath variables to point to their respective folders, and you can also edit the FileInfo variable to however you like it.

Default file names will look like:
Code:
Z:\Anime\TV Shows\Hyouka\Hyouka - 01v2 - The Esteemed Classics Club Has Been Restored [Mazui][HDTV][1280x720][h264][F2BB20F65].mkv
Z:\Anime\TV Shows\Akazukin Chacha\Akazukin Chacha - S101 - Opening 1 [GCP][DVD][640x480][h264][7AA0FBDB].avi
Z:\Anime\Movies\Gekijouban Macross F Itsuwari no Utahime\Gekijouban Macross F Itsuwari no Utahime [Doki][BluRay][1920x1080][h264][4C96D537].mkv

In order to use the default episode numbering, I recommend the following tvshowmatching rules in your advancedsettings.xml (wiki):
Code:
<advancedsettings>
<tvshowmatching action="prepend">
<regexp> - ()(\d+)((?:-\d+)*)(?:v\d+)? - [^\\/]*$</regexp>
<regexp defaultseason="0"> - ()s(\d+)((?:-\d+)*)(?:v\d+)? - [^\\/]*$</regexp>
</tvshowmatching>
</advancedsettings>
The defaultseason regexp will only work in Frodo builds. If you're still on Eden, you'll need to remove it and also un-comment one of the alternate "Special" variables in the rules.
The " - " in the regexps is intended to match the Separator variable in the rules. If you're running into conflicts on non-anime files, you can change it to something unique (in both the rules and the regexps).
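If you want to check how the two regexps treat your own file names before rescanning, the Python sketch below approximates the matching. It rests on a few assumptions: that matching is case-insensitive (the lower-case "s" in the second regexp has to match "S101"), that a rule without defaultseason falls back to season 1, and that rules are tried in the order listed. The match_episode function is made up for this example and is not part of XBMC.

```python
import re

# The two regexps from the advancedsettings.xml above, each paired with the
# season to use when the (deliberately empty) season group matches nothing.
# Assumption: a rule without defaultseason behaves as season 1.
RULES = [
    (re.compile(r" - ()(\d+)((?:-\d+)*)(?:v\d+)? - [^\\/]*$", re.IGNORECASE), 1),
    (re.compile(r" - ()s(\d+)((?:-\d+)*)(?:v\d+)? - [^\\/]*$", re.IGNORECASE), 0),
]


def match_episode(path):
    """Return (season, [episodes]) for the first rule that matches, else None."""
    for pattern, default_season in RULES:
        m = pattern.search(path)
        if m is None:
            continue
        # Group 1 is always empty here, so the default season applies.
        season = int(m.group(1)) if m.group(1) else default_season
        episodes = [int(m.group(2))]
        # Group 3 holds any extra episodes of a multi-episode file, e.g. "-02-03".
        episodes += [int(n) for n in re.findall(r"\d+", m.group(3))]
        return season, episodes
    return None
```

With the default file names shown earlier, the Hyouka file comes out as season 1, episode 1 (the "v2" is ignored), the Akazukin Chacha opening as season 0, episode 101, and the Macross F movie doesn't match at all, since it has no " - " separator.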

Known issues:
Shows only distinguished by an added question mark in the title will be placed in separate folders (the "?" is replaced by an "_" ), but will still be scraped as the same show. You'll need to manually refresh the second folder.
...It might only be Shinryaku!? Ika Musume this applies to, so if you don't have that, you can probably safely comment out the line that replaces the question marks, if you'd rather they were simply removed.

Suzumiya Haruhi no Yuuutsu (2009) episode numbering is... funky. It includes the 2006 episode numbers, and the filenames will need to be manually fixed.


Messages In This Thread
[MOD] AniDB.net scrapers for TV shows and Movies - by scudlee - 2012-10-15, 19:34
RE: - by scudlee - 2013-10-12, 17:42