It looks like eldon really has no time to update his scraper, so I started making some small tweaks myself. Here is the result; it ended up a bit more tweaked than I expected at the start.
First, I must apologize for the wall of text I produced and for my English; I hope it's understandable.
Features:
Because this scraper is based on eldon's version, you should read his post above first. Almost all of the information there applies here too, so I will describe (almost) only the changed behaviour.
1/
Searching for anime
- Google search - almost the same as in eldon's version; I don't intend to tweak it further because I don't use it
- anidb.xml search - my personal favorite. Because I use AniDB utilities to rename anime and add them to MyList, I get an almost 100% hit rate (using the main anime name as the directory name); it only misses when I mess up the directory naming.
-- commented out the original download of anidb.xml because it never worked for me (it downloaded only ~0.5MB of the 2.3MB file; I don't know if it's some curl setting or an anidb.net limitation), so I only use a cached file which I download manually from anidb.net (the current version is included at the end of the post). Personally, I set the file timestamp of anidb.xml 2 months forward so it doesn't expire from the cache so soon, and when it finally expires, an XBMC scraper error message warns me that I should update it.
-- added some name filtering because <cleanstrings> from advancedsettings.xml doesn't work correctly
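For anyone who wants to reproduce the anidb.xml lookup outside XBMC, here is a minimal Python sketch. It assumes the cached file uses the layout of AniDB's anime titles dump (<anime aid="..."> elements with <title> children), which may not match the 2010 anidb.xml exactly. clean_name(), find_anime_id() and their patterns are hypothetical stand-ins for the name filtering mentioned above, not the scraper's actual rules, and cache_is_fresh() only illustrates why a timestamp set in the future delays expiry, assuming expiry is based on the file's modification time.

```python
import os
import re
import time
import xml.etree.ElementTree as ET

# Hypothetical clean-up rules (the scraper's real filters are not shown in this post).
_TAGS = re.compile(r"\[[^\]]*\]|\([^)]*\)")  # [Group], (720p), ...
_SEPARATORS = re.compile(r"[._]+")            # dots/underscores used instead of spaces
_SPACES = re.compile(r"\s+")


def clean_name(name: str) -> str:
    """Normalize a directory name or title so the two can be compared."""
    name = _TAGS.sub(" ", name)
    name = _SEPARATORS.sub(" ", name)
    return _SPACES.sub(" ", name).strip().lower()


def cache_is_fresh(path: str, max_age_days: float) -> bool:
    """True while the file's mtime is younger than max_age_days.

    A modification time set in the future gives a negative age,
    so the cached file stays "fresh" for longer.
    """
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds < max_age_days * 86400


def find_anime_id(xml_text: str, directory_name: str):
    """Return the aid of the first anime with a title matching the cleaned name."""
    wanted = clean_name(directory_name)
    root = ET.fromstring(xml_text)
    for anime in root.iter("anime"):
        for title in anime.iter("title"):
            if title.text and clean_name(title.text) == wanted:
                return anime.get("aid")
    return None
```

Because both the directory name and every title go through the same clean_name(), "Hayate no Gotoku!" and "Hayate no Gotoku!!" still stay distinct, which matters for sequels named like that.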
2/
Processing anime details
- added fallback to the temporary rating when no permanent rating is present (e.g. when the series isn't finished yet)
- added filtering out of some generic (Asia, Japan, Game, Novel) or annoying (Sudden Girlfriend Appearance, Boing) anime genres
- added filtering of the plot summary (unfortunately there is no common pattern for how plot summaries are written on anidb.net, so it will never be 100% :-S)
-- removing '*' from the start
-- removing empty lines (no need for them in the cramped plot summary space in XBMC) - sometimes this doesn't work for some reason (TODO)
-- collapsing duplicate whitespace characters (spaces, tabs, etc.)
-- replacing http links to other anidb.net articles with their plain text (e.g. http://anidb.net/ch4051 [Chika] --> Chika)
-- removing the source of the summary information (like [Source: ANN] or (taken from Animenfo))
- added filtering down to a single studio (can be turned off in the settings). I'm using a skin (Alaska) which shows studio logos as part of the media flags, so I need only one studio in the database to make it work. Additionally, this feature prefers studios for which I have a logo (I posted my anime studio logo collection here)
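The anime detail clean-up steps above could look roughly like this in Python. This is only a sketch: the real scraper is an XBMC XML scraper, not Python, and the function names, the exact regular expressions and the logo_names set are assumptions for illustration. The genre blacklist simply reuses the genres named above.

```python
import re

# Genres named in the post as generic or annoying.
BLOCKED_GENRES = {"Asia", "Japan", "Game", "Novel",
                  "Sudden Girlfriend Appearance", "Boing"}

# Hypothetical patterns: "http://anidb.net/ch4051 [Chika]" and source notes.
_LINK = re.compile(r"https?://anidb\.net/\S+\s*\[([^\]]+)\]")
_SOURCE = re.compile(r"\[Source:[^\]]*\]|\((?:taken from|source:)[^)]*\)",
                     re.IGNORECASE)


def pick_rating(permanent, temporary):
    """Use the temporary rating when no permanent one exists yet."""
    return permanent if permanent is not None else temporary


def filter_genres(genres):
    """Drop blacklisted genres, keeping the original order."""
    return [g for g in genres if g not in BLOCKED_GENRES]


def clean_plot(text: str) -> str:
    """Apply the plot summary clean-up steps listed above."""
    text = _LINK.sub(r"\1", text)         # link + [Name] -> Name
    text = _SOURCE.sub("", text)          # drop [Source: ...] / (taken from ...)
    text = text.strip().lstrip("*")       # remove leading '*'
    # Collapse runs of spaces/tabs inside each line and drop empty lines.
    lines = (re.sub(r"[ \t]+", " ", ln).strip() for ln in text.splitlines())
    return "\n".join(ln for ln in lines if ln)


def pick_studio(studios, logo_names):
    """Return one studio, preferring the first one that has a logo available."""
    for studio in studios:
        if studio in logo_names:
            return studio
    return studios[0] if studios else None
```

For example, clean_plot("* The story of http://anidb.net/ch4051 [Chika]  and\tfriends.\n\n\nMore text. [Source: ANN]") returns "The story of Chika and friends.\nMore text.".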
3/
How fanart & posters are looked up (this is generally the thetvdb.com lookup; it's also used later for episode extra details)
- first, thetvdb.com is searched for the following names (in this order):
-- anime main name
-- anime x-jat synonym name
-- anime english official name
-- anime english synonym name
- if no match is found, the scraper tries to find the anime's prequel, primarily by the "Prequel" link and secondarily (if no prequel link exists) by the "Parent Story" link (this link type can be changed in the settings)
- if a prequel is found, the scraper returns to the first step of this cycle (searching thetvdb.com for the prequel's names)
- it ends when a match is found on thetvdb.com or there is no further prequel (i.e. at the series' "root" anime)
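The lookup cycle above can be sketched in Python as a loop that walks the prequel chain. The data layout (a dict of anidb entries with "names" and "relations"), the name-kind keys and the search_tvdb callable are all hypothetical; a real implementation would query thetvdb.com instead of a dict. The visited set is an extra guard against relation cycles that the post does not mention.

```python
from typing import Callable, Dict, Optional, Tuple

# Name kinds in the order listed above (the key strings are made up for this sketch).
NAME_ORDER = ("main", "x-jat synonym", "english official", "english synonym")


def find_tvdb_series(aid: int,
                     anidb: Dict[int, dict],
                     search_tvdb: Callable[[str], Optional[int]],
                     alt_link: str = "Parent Story",
                     follow_prequels: bool = True) -> Optional[Tuple[int, int]]:
    """Return (thetvdb series id, number of prequel hops taken) or None."""
    visited = set()
    depth = 0
    current: Optional[int] = aid
    while current is not None and current not in visited:
        visited.add(current)
        entry = anidb[current]
        # Step 1: try each known name of the current anime on thetvdb.com.
        for kind in NAME_ORDER:
            name = entry["names"].get(kind)
            if name:
                series_id = search_tvdb(name)
                if series_id is not None:
                    return series_id, depth
        if not follow_prequels:
            break
        # Step 2: move to the prequel ("Prequel" link first, then the alternative link).
        links = entry.get("relations", {})
        current = links.get("Prequel", links.get(alt_link))
        depth += 1
    # No match and no further prequel (or a relation cycle was detected).
    return None
```

Returning the number of hops together with the series id is a design choice made here so the same result can also serve the season counting described in section 5/.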
Why this complicated recursive algorithm? It's because anidb.net and thetvdb.com approach anime series differently. Generally anime are not grouped into series; if there is a sequel to some anime, it has a different name (like Hayate no Gotoku! and Hayate no Gotoku!!). anidb.net follows this style and has a unique entry for each anime. thetvdb.com, on the other hand, follows the more western style of TV series with seasons, so sequels are added as new seasons of the first anime in the row. As a result, the scraper basically needs to find that first anime in the row, because it contains the fanart, thumbs and banners for all seasons. Indeed, you will get the same pictures for all sequels/seasons, but that is how thetvdb.com works; you can choose the right one in XBMC because you get a list of all of them.
Of course, sometimes this process fails horribly. I have some ideas how to improve it (meaning a complete rewrite), but right now the results are satisfying and I don't have much time to spare....
4/
Processing anime episode details (from anidb.net)
- added filling of the runtime from the anime details - anidb.net has no runtime for individual episodes, but most episodes of one anime have the same runtime
- added filling of the director from the anime details - right now it fills in the first director the scraper finds; filling in episode-specific directors is a TODO
- the rest is the same as in eldon's original scraper
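Filling episode fields from the anime details is a simple fallback. Here is a Python sketch; the dictionary keys ("runtime", "directors", "director") are hypothetical and do not reflect the scraper's actual field names.

```python
def fill_episode(episode: dict, anime: dict) -> dict:
    """Copy runtime and director from the anime when the episode lacks them."""
    filled = dict(episode)  # don't modify the caller's dict
    if not filled.get("runtime"):
        # anidb.net has no per-episode runtime, so reuse the anime's runtime.
        filled["runtime"] = anime.get("runtime")
    if not filled.get("director"):
        # Only the first director found is used (episode-specific directors are a TODO).
        directors = anime.get("directors") or []
        filled["director"] = directors[0] if directors else None
    return filled
```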
5/
Processing anime episode extra details (from thetvdb.com)
- because anidb.net doesn't contain a plot summary for every episode, eldon added a thetvdb.com episode lookup to his scraper to fill in the plot summary and episode thumb. I only tweaked the search for a specific episode, because it sometimes misfired on multi-season anime.
-- what is new here is the way the anime is searched for on thetvdb.com: it uses exactly the same recursive algorithm as in section 3/. The advantage is that the scraper ends up on the same thetvdb.com entry as for the fanart; the disadvantage is that it can be pretty slow, because the whole recursive lookup is done again for each episode. Luckily, all results from the fanart lookup are cached, so everything is done "locally".
-- unfortunately, unlike the fanart lookup, where the scraper doesn't need to care about the season (because pictures aren't divided by season), for episodes the scraper must find the right season. Basically, the scraper counts +1 for each prequel lookup: if the anime itself is found on thetvdb.com, it's considered season 1; if the anime's prequel is found, it's considered season 2; if the prequel of the prequel is found, it's considered season 3, and so on. Sometimes it works fine, sometimes not (mostly when the relation graph is too complicated or some OVA/OAVs are mixed in); live with it.
-- as stated above, OVA/OAVs are a real PITA because they aren't treated consistently on thetvdb.com: sometimes they are entered as a regular season, sometimes as specials. So there is a setting to turn extra details off for OVA/OAVs.
-- added presets for season and episode offset for the episode extra details lookup (see settings). This is a workaround for the issues above (mismatched seasons and OVA/OAVs mapped to specials). When the season preset is used, the scraper uses this "hardcoded" value instead of the computed one. Additionally, for specials, an episode offset can be set in case more than one OVA/OAV is listed in the specials.
-- unfortunately, I'm still not able to teach regular expressions and the scraper engine to foretell, so it requires some checking of extra detail matches and/or manual work with the presets setting. If you don't want to be bothered by manual work, simply turn extra details off in the settings. You will lose the episode plot summary (if it's filled in on thetvdb.com at all), but at least for me it's not a big deal because I check it only rarely. Thumbs will be generated from your video files by XBMC (if enabled).
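The season counting, the OVA/OAV switch and the presets described in this section can be sketched in Python as follows. The settings field names are hypothetical mirrors of the settings listed below, and prequel_depth stands for the number of prequel hops taken during the section 3/ lookup (0 when the anime itself was found on thetvdb.com). This is an illustration, not the scraper's code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExtraDetailSettings:
    # Hypothetical names mirroring the thetvdb extra episode details settings.
    enabled: bool = True
    enabled_for_ova: bool = False
    presets_enabled: bool = False
    preset_season: int = 1
    preset_episode_offset: int = 0  # only applied when preset_season == 0 (specials)


def tvdb_episode_key(prequel_depth: int, anidb_episode: int, is_ova: bool,
                     s: ExtraDetailSettings) -> Optional[Tuple[int, int]]:
    """Return the (season, episode) pair to request from thetvdb.com, or None to skip."""
    if not s.enabled or (is_ova and not s.enabled_for_ova):
        return None  # extra details switched off (globally or for OVA/OAVs)
    if s.presets_enabled:
        # The "hardcoded" season replaces the computed one; offset only for specials.
        offset = s.preset_episode_offset if s.preset_season == 0 else 0
        return s.preset_season, anidb_episode + offset
    # Computed season: +1 for every prequel hop (found directly -> season 1).
    return prequel_depth + 1, anidb_episode
```

For example, with default settings an anime found two prequel hops away maps its episode 3 to thetvdb.com season 3, episode 3; with the season preset set to 0 (specials) and an offset of 2, OVA episode 1 maps to special 3.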
Settings:
- Use Google Search ... disabled by default, allows using Google as the search engine instead of searching in anidb.xml
- Enable anidb.net prequel lookup ... enabled by default, self-explanatory, see section 3/
- Alternative anidb.net prequel link type ... "Parent Story" by default, allows selecting an alternative prequel link type from "Parent Story", "Alternative Setting" and "Side Story"
- Enable only single Animation studio return ... enabled by default, self-explanatory, see section 2/
- Enable thetvdb.org fanart/posters ... enabled by default, self-explanatory
- Enable thetvdb.org banners ... disabled by default; banners are wide posters, and I added this setting because I don't use them in XBMC
- Enable thetvdb.org extra episode details ... enabled by default, self-explanatory, see section 5/
- Enable thetvdb.org extra episode details for OVA/OAVs ... disabled by default, tells the scraper whether it should look up extra details on thetvdb.com for OVA/OAV episodes, see section 5/
- Enable presets for thetvdb.org extra episode details ... disabled by default, enables the following two settings
- Preset season number ... 1 as default, see description in section 5/
- Preset episode number offset ... +0 as default, see description in section 5/, enabled only when Preset season number is set to 0 (specials on thetvdb.com)
TODO:
- cleanup ... I started by modifying eldon's original scraper, so I'm still passing function parameters through the actual download URLs, which isn't clean. I should use clearbuffers="no" instead.
- some tweaks to plot summary cleaning
- some tweaks to the handling of multiple directors
- add the possibility to look up thetvdb.com episodes by name; the current solution, based on episode number lookup, doesn't work correctly with a single (long-running) anime divided into multiple seasons on thetvdb.com (e.g. One Piece)
Scraper (use the Download link in the upper right corner):
http://pastebin.com/749rzV3R
Current anidb.xml (Created: Tue Apr 6 02:00:18 2010 (5773 anime, 31702 titles)):
http://www.megaupload.com/?d=0TSMZMX1
Let me know if you find any bugs.