[WIP] AniDB.net Anime Video Scraper
Do you have "use google search" enabled in your AniDB scraper settings? When I scrape my HxH 1999, my debug log looks different. It might help to reset your scraper settings to default.

Your naming is almost identical to mine. With this naming, REGEX 2.4 should work great too.

For "HxH 1999" I had to use an NFO because of the digits 1999 in the name. But if I understand you correctly, you have problems with all your series.
Reply
Looks like the Google search is the problem.

Here's a quick fix if you want to still use Google search:
Open up the scraper xml (addons\metadata.anidb.net\anidb.xml)

Change line 32
From
Code:
<expression clear="yes" repeat="yes">(?i)&lt;a href=&quot;http://anidb\.net/perl-bin/animedb\.pl\?show=anime&amp;amp;aid=(\d+)&quot;</expression>
To
Code:
<expression clear="yes" repeat="yes">(?i)&lt;cite&gt;anidb\.net/perl-bin/animedb\.pl\?show=anime&amp;amp;aid=(\d+)&lt;/cite&gt;</expression>

And change line 35
From
Code:
<expression repeat="yes">(?i)&lt;a href=&quot;http://anidb\.net/a(\d+)&quot;</expression>
To
Code:
<expression repeat="yes">(?i)&lt;cite&gt;anidb\.net/a(\d+)&lt;/cite&gt;</expression>

Seems to work based on a quick test.
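To see why the change helps, here is a minimal Python sketch using the line 35 expressions. It assumes one pass of XML unescaping gives the regex the scraper engine sees. The two sample result snippets are hypothetical, not captured from a real Google page; they only illustrate a result page that shows the anidb address inside a cite tag instead of a direct link.

```python
import html
import re

# Expressions as written in anidb.xml (XML-escaped), line 35 before and after.
OLD_XML = r'(?i)&lt;a href=&quot;http://anidb\.net/a(\d+)&quot;'
NEW_XML = r'(?i)&lt;cite&gt;anidb\.net/a(\d+)&lt;/cite&gt;'

# Hypothetical result snippets for illustration only.
OLD_STYLE_PAGE = '<a href="http://anidb.net/a53">Trigun</a>'
NEW_STYLE_PAGE = '<a href="/url?q=http://anidb.net/a53"><cite>anidb.net/a53</cite></a>'


def find_aids(xml_expr, page):
    """Unescape the XML-escaped expression and return every captured aid."""
    pattern = re.compile(html.unescape(xml_expr))
    return pattern.findall(page)


if __name__ == "__main__":
    print(find_aids(OLD_XML, NEW_STYLE_PAGE))  # old expression, new-style page
    print(find_aids(NEW_XML, NEW_STYLE_PAGE))  # new expression, new-style page
```

In this sketch the old expression finds nothing once the result link is no longer a plain href to anidb.net, while the cite-based expression still captures the aid.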
Reply
@Vaneska and @scudlee Google search was the culprit. Thank you both for your help.

I don't recall turning it on, but I was fiddling with a lot of settings. It was enabled after I set the scraper to AniDB.net, and clicking Defaults disabled it. I just tried again and it wasn't enabled by default this time, so it must have been operator error.

@scudlee, is there any benefit to using Google search? I'll try your changes in the meantime.
Reply
The only place I can think of where Google might do better is inexact matches. There's a little wriggle room in the default method, but not much: you can omit words and still get a match, but you can't add words or put them in the wrong order. For example, with "Book of Bantorra, The", Google should have no problems, while the default search will probably fail. If you're using an anidb client to move/rename files with precise names, there shouldn't be any issues.
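A rough Python sketch of that behaviour, assuming the default search turns the name into its words, each followed by a [^<]* wildcard, and runs that against the title list. The function names and the sample title are made up for illustration; this is not the scraper's own code.

```python
import re

# Sample title for illustration; the real list holds many titles per show.
TITLE = "The Book of Bantorra"


def build_pattern(name):
    """Assumed behaviour: every word in order, each followed by a wildcard."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return re.compile("".join(w + "[^<]*" for w in words), re.IGNORECASE)


def matches(name, title=TITLE):
    return build_pattern(name).search(title) is not None


if __name__ == "__main__":
    for name in ("Book Bantorra", "Book of Bantorra, The", "The Book of Bantorra TV"):
        print(name, "->", matches(name))
```

Omitting words still matches because the wildcards absorb whatever sits between the remaining words. An extra or reordered word has nothing left in the title to match against, so the whole pattern fails.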

There's also a possibility of being out-of-date with the default search. If you look in the settings you'll see the anime list that the search uses is located on the same site as the mapping list. What this means is, like the mapping list, you're reliant on Bambi73 updating the list in order for newer entries to be scrapable. This is rarely an issue though as shows get added to anidb almost as soon as they are announced, so the list has to be massively out of date for something currently airing to not be listed. To give you an idea, the list was only recently updated (1st September), prior to that everyone was using a list from May with no complaints.

Unlike the mapping list, the anime list isn't manually created, it's just copied from a list available directly from anidb, and it's very easy to find the url for the original list and use that instead, if you want to keep absolutely up-to-date. I'm a little reluctant to give the url directly just in case there's a good reason why the scraper doesn't use it by default - like not wanting to annoy anidb, or something. Suffice to say, you can find it on the anidb wiki fairly easily, and it does work.
Reply
Hey, I'm having some problems with the scraper. It used to work 11/10, but now I'm just crying.

When I try to update the library, I get:

Could not download information
Unable to connect to remote server
Would you like to continue scanning? yes / no

If I click yes, I get the same messages. If I click no, the scan stops.

All my anime is on an external hard drive (H:).

LOG FILE http://www.mediafire.com/?voqzp19dtujqfo4
Reply
I think I found a bug. It's a pretty interesting one. :)

When attempting to scrape "Trigun" and "Trigun ~ Badlands Rumble" with anidb plugin regardless of whether the tvshow.nfo file exists or not, (when it exists, it points to http://anidb.net/perl-bin/animedb.pl?show=anime&aid=53 and http://anidb.net/perl-bin/animedb.pl?sho...e&aid=7208 respectively) the scraper locks up after selecting the series and will not proceed to scanning individual episodes. Upon digging through the debug logs, I've been able to isolate the cause of failure: AniDB API dies due to internal server error on those particular two shows and no others. To be specific, the http request to http://api.anidb.net:9001/httpapi?reques...1&aid=7208 returns "<error>Internal Server Error</error>"

That, obviously, is an AniDB problem, which I reported to the appropriate forum.

However, what constitutes a scraper bug is the fact that upon receiving that response it attempts to proceed anyway, and creates an empty TV show with no name in the database, which then requires going directly to the database to manually get rid of.
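As a sketch of the kind of guard that would avoid the empty entry (in Python, not the scraper's regex-based XML): check the response before using it. The shape of the successful response below is an assumption for illustration; only the error body is taken from the log.

```python
import xml.etree.ElementTree as ET


def parse_anime_response(body):
    """Return the parsed root element, or None if the response is unusable."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not XML at all
    if root.tag == "error":
        return None  # e.g. <error>Internal Server Error</error>
    return root


if __name__ == "__main__":
    bad = "<error>Internal Server Error</error>"
    # Hypothetical success body, trimmed down for the example.
    good = '<anime id="53"><titles><title>Trigun</title></titles></anime>'
    for body in (bad, good):
        root = parse_anime_response(body)
        print("skip show" if root is None else "create show from <%s>" % root.tag)
```

The point is only that a caller can stop when it gets None instead of writing a nameless show to the database.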
Reply
***Alternate anime-list.xml available***

For a while now, I've been expanding the anime-list.xml, that maps between anidb and tvdb, initially for the anime in my collection that was missing from the list, but I soon went beyond that. In April I sent my additions to bambi73 and they were added to the list. In May I sent another batch (along with a suggested modification to the scraper), but so far they haven't been added. With a new season fast approaching and my list even more expanded, I've decided not to wait and offer it directly as an alternative.

Actually, there are two lists:

anime-list.xml
Code:
https://raw.github.com/ScudLee/anime-lists/master/anime-list.xml

anime-list-full.xml
Code:
https://raw.github.com/ScudLee/anime-lists/master/anime-list-full.xml


The difference between them is that the "full" list also contains shows I've checked that are missing from tvdb, marked with tvdbid="unknown". Titles which fall outside the scope of tvdb (e.g. standalone movies, one-shot OVAs, hentai, etc.) are included in both lists, and are marked accordingly (e.g. tvdbid="movie").

Marking an entry with a non-numerical tvdbid stops the scraper from searching tvdb for a match in order to get fanart. For titles that will never be on tvdb, this is fine. For shows that might be added at some point, there's a trade-off: with the marker, you won't get the fanart until the list is updated; without it, there's a small chance the search will fetch the wrong fanart in the meantime (say, if there's a live-action version; this happened to me with Taiho Shichau zo (You're Under Arrest) until I added it to the list). The choice is yours. Most of the unknowns have in fact never been subbed according to anidb, so the risks are negligible on both sides.
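The three cases can be sketched in Python like this. The element name and the anidbid attribute are assumptions about the list layout, and the ids are made-up placeholders; tvdbid and defaulttvdbseason are the attributes mentioned in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the spirit of anime-list.xml (ids are placeholders).
SAMPLE = """<anime-list>
  <anime anidbid="1" tvdbid="12345" defaulttvdbseason="1"/>
  <anime anidbid="2" tvdbid="movie"/>
  <anime anidbid="3" tvdbid="unknown"/>
</anime-list>"""


def load_tvdbids(xml_text):
    """Map anidbid -> tvdbid for every entry in the list."""
    root = ET.fromstring(xml_text)
    return {a.get("anidbid"): a.get("tvdbid", "") for a in root.iter("anime")}


def fanart_plan(tvdbids, anidbid):
    tvdbid = tvdbids.get(anidbid)
    if tvdbid is None:
        return "search tvdb by title"  # not in the list at all
    if tvdbid.isdigit():
        return "fetch fanart for tvdbid " + tvdbid
    return "skip tvdb"  # marked movie, unknown, etc.


if __name__ == "__main__":
    ids = load_tvdbids(SAMPLE)
    for aid in ("1", "2", "3", "4"):
        print(aid, "->", fanart_plan(ids, aid))
```

Only an entry that is absent from the list falls through to the title search, which is where the wrong-fanart risk comes from.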

Some stats:
Old anime-list.xml: 2057 titles, 1906 with valid tvdbids
My anime-list.xml (currently): 5146 titles, 3264 with valid tvdbids
anime-list-full.xml: 6065 titles, 3264 with valid tvdbids

To give a perspective, the anidb currently has 7346 anime (as of this morning)

To switch to my list, go to your anime source folder in XBMC in file view, bring up the context menu, select "change content", and then select the settings button. The setting to change is the Anime mapping URL.

The downside
As I mentioned at the start, my second batch of titles I sent came with a suggested mod. This mod allows the scraper to pull fanart directly from themoviedb.org if a tmdbid or imdbid is included in the anime-list entry. This list is designed for that mod. The old list includes (for 74 titles) hard-coded links to fanart on themoviedb.org (and elsewhere), these have been removed in my list (mostly). Without my mod, you won't get any fanart for those 74 titles. With my mod you'd get substantially more artwork over more titles. For example, the entry for Akira on the old list has three fanart listed, with my list and mod you'd currently have twelve to choose from. My list currently has 1156 valid tmdbid/imdbids.

I am planning on releasing a mod (with some other fixes and additions) soon, but if you're adventurous, you can view the diff here:
http://pastebin.com/zSrAnSWi and maybe apply it yourself.

For the less adventurous, here's the full code: http://pastebin.com/0dVZ3GN5 (Thanks to Vaneska for testing.)
Note: this is not my full mod, it's just the fanart part, plus the fix below.

If you'd like to help, you can view the titles that still need checking at
https://github.com/ScudLee/anime-lists/b...t-todo.xml
This list is preformatted, so in many cases just the tvdbid and defaulttvdbseason is needed. (The early titles in the list were a bit of a mess on tvdb when I checked them, best to skip them.) I am trying to actively maintain the list, so any changes you suggest will likely go straight in once checked. (Allow 48 hours for your own cache to clear.)
Reply
Thanks a lot for the mod and the anime list!

Will the modifications work with Frodo builds? I tried the modifications and the list with a clean install, but the anime don't get any posters or artwork at all. I can provide some logs if you're interested.

----------------------------------------------------------------

I also wonder why anime with " . ", " ! " or " ) " symbols in the middle of their names aren't scraped anymore. If the symbol is at the end of the name, it still works.

Example:
A.D. Police 1999
Evangelion 1.0 You Are (Not) Alone
M.D. Geist

I could of course use an NFO, but a few days or weeks ago it worked.

Code:
T:1028  NOTICE: VideoInfoScanner: Starting scan ..
T:1028   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'C:\Animes\M.D. Geist\'
T:1028   DEBUG: ADDON::CScraper::FindMovie: Searching for 'M.D. Geist' using AniDB.net scraper (path: 'C:\Users\Stranger\AppData\Roaming\XBMC\addons\metadata.anidb.net', content: 'tvshows', version: '2.0.0')
T:1028   DEBUG: scraper: CreateSearchUrl returned <url gzip="yes" cache="anidb.xml">http://sites.google.com/site/anidblist/anidb.xml</url>
T:1028   DEBUG: scraper: GetSearchResults returned <results></results>
T:1028   DEBUG: ADDON::CScraper::FindMovie: Searching for 'M.D. Geist' using AniDB.net scraper (path: 'C:\Users\Stranger\AppData\Roaming\XBMC\addons\metadata.anidb.net', content: 'tvshows', version: '2.0.0')
T:1028   DEBUG: scraper: CreateSearchUrl returned <url gzip="yes" cache="anidb.xml">http://sites.google.com/site/anidblist/anidb.xml</url>
T:1028   DEBUG: scraper: GetSearchResults returned <results></results>
Reply
This is interesting, and not (I think) my fault. I can confirm the bug, but it's happening before any of my code changes, and before the list is used. I've just checked with the standard scraper to be sure, and gotten the same results.

My thinking is this: The scraper expects the show name being passed to it to be percent-encoded, so "M.D. Geist" would look like "m%2ed%2e%20geist". In getsearchresults in the scraper (on line 41) the percent-encoded parts are stripped out and replaced by [^<]* forming a regexp m[^<]*d[^<]*geist[^<]* which is then run against the anidb.xml to find matches.

From what I can tell the name is not being fully percent-encoded any more, it's ending up as m.d.%20geist which is then getting cleaned to m[^<]*d[^<]*20geist[^<]* which won't match anything. This must have been a fairly recent change I think, I've never noticed it before. From what I can tell by a quick test the characters .!-() are no longer being encoded.

So, line 41... Change from:
Code:
<expression clear="yes" repeat="yes">(?i)([a-z0-9]+)(?:%[a-f0-9]{2})*</expression>
to:
Code:
<expression clear="yes" repeat="yes">(?i)([a-z0-9]+)(?:%[a-f0-9]{2}|[!\.\-\(\)])*</expression>

Should work.
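Here's a small Python sketch of the cleaning step, to show the difference. It assumes each match is replaced by its first group plus [^<]* and that unmatched text is dropped (which is how I read clear="yes" repeat="yes"). The title line is a made-up stand-in for an entry in anidb.xml.

```python
import re

# Line 41 before and after the change.
OLD_EXPR = r"(?i)([a-z0-9]+)(?:%[a-f0-9]{2})*"
NEW_EXPR = r"(?i)([a-z0-9]+)(?:%[a-f0-9]{2}|[!\.\-\(\)])*"

# Hypothetical line from the title list.
TITLE_LINE = "<title>M.D. Geist</title>"


def build_search_regex(expr, encoded_name):
    """Keep each run of letters/digits and put [^<]* in place of what follows."""
    return "".join(m.group(1) + "[^<]*" for m in re.finditer(expr, encoded_name))


def finds(expr, encoded_name, title_line=TITLE_LINE):
    pattern = build_search_regex(expr, encoded_name)
    return re.search(pattern, title_line, re.IGNORECASE) is not None


if __name__ == "__main__":
    for expr in (OLD_EXPR, NEW_EXPR):
        for name in ("m%2ed%2e%20geist", "m.d.%20geist"):
            print(name, "->", build_search_regex(expr, name), finds(expr, name))
```

With the old expression, the partially encoded name leaves the 20 from %20 glued to "geist", so nothing matches. The new expression treats the unencoded punctuation the same way as a percent-encoded character.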

Reply
Thanks, it works again. :) And definitely not your fault. I should have made that clearer in my post, since I already had the problem some days ago. It would be great if you could add this fix to your upcoming mod.

Reply
@Vaneska
Sorry, I was so distracted by the second part of your post that I didn't notice the first part was a separate issue. Is it still an issue? I'm a little confused.

If it is then, yes, I would be interested in a debug log. Also, if this is after applying the mod, it would help to check your modded anidb.xml to make sure it was done correctly (I mean no offense by this, just trying to cover the bases - it definitely works on my end.)
Reply
No problem. :)

The issue with the symbols in the middle of names was fixed by the "code" fix you posted.

The artwork problem is fixed too!
Reply
I just came across this thread and I have a few questions.

First off, here is my setup:

XBMC: Eden current release
OS: Win7
Running Sabnzbd+, Sickbeard (Anime version)

XBMC configuration:

TV shows and anime are in separate smart playlists
Anime is in its own folder on my server
Naming format of files:
t:\anime\high school dxd\season 0\high school dxd - 0x02 - special 1.mkv
t:\anime\high school dxd\season 1\high school dxd - 1x01 - 001 - i got a girlfriend!.mkv

Initially everything was scraped using TVDB, and files were renamed by Sickbeard.

What I have done to test the anidb scraper:

In High School DxD (library view), I removed (deleted) all the special episodes from the library.
Went into Videos and changed to the anidb scraper (default settings).
Ran a scan for new content. The scan lasted 1 second, and the files I had removed weren't added.

Here is the log: http://xbmclogs.com/show.php?id=10444 (you can see the scan at line 1773)

So what could be the problem?

Second, do I need the new anime list posted on page 49?
Third, do I need to add anything to my advancedsettings.xml to get the scraping right for anime? (I plan on using the sabtoanidb.py script in sab.)
Lastly, do I need to change my naming format for this to work? The 001 above is the absolute number.

Thank you in advance for all the help.
Reply
Code:
20:08:16 T:4316   DEBUG: ADDON::CScraper::GetEpisodeList: Searching 'http://www.thetvdb.com/api/1D62F2F90030C444/series/254653/all/en.zip' using AniDB.net scraper (file: 'C:\Users\htpc\AppData\Roaming\XBMC\addons\metadata.anidb.net', content: 'tvshows', version: '2.0.0')
20:08:16 T:4316   DEBUG: scraper: GetEpisodeList returned <episodeguide></episodeguide>
20:08:16 T:4316   DEBUG: VideoInfoScanner: No (new) information was found in dir T:\Anime\High School DxD\
This seems to be the source of your issue. The episodeguide URL is still pointing to the tvdb when it should be pointing to the anidb.

My guess would be that when you switched scrapers, you would have been given the option to refresh information for all items, and you chose 'no' (because you only needed a few things fixed.) This would have also left the TV show information (where the episodeguide URL is stored) unchanged.

I'd suggest going to the High School DxD folder (in library or file view) and refreshing the information for the whole show (press 'i' while on the folder, select refresh). That should put in the right episodeguide URL. You can choose not to refresh the episode info if you want to keep the tvdb-scraped episode info, but you might then have to manually refresh the special episodes to get them to use anidb.

If you only changed to the anidb scraper on that one show, that should be it. If you changed on the root folder, then all your shows will have the wrong episodeguide URL.

(You won't need to switch to the newer anime-list.xml to get episode information for High School DxD, but it's still a good idea. It's a lot more up to date.)

On the third issue, you will likely need to add a new tvshowmatching regexp if you plan on dropping the SxEE from the filenames (or hope that they'll be ignored in favor of the absolute number).
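As a rough idea of what such a pattern has to do, here is a Python sketch that pulls the absolute number out of the naming format posted above, with or without the SxEE part. It is an illustration only: the pattern and group names are made up, and this is not an actual tvshowmatching entry (XBMC's regexps have their own capture-group conventions).

```python
import re

# Hypothetical pattern: show name, optional SxEE block, then a 2-4 digit absolute number.
ABSOLUTE = re.compile(
    r"^(?P<show>.+?) - (?:\d+x\d+ - )?(?P<abs>\d{2,4}) - ",
    re.IGNORECASE,
)


def absolute_number(filename):
    """Return the absolute episode number as an int, or None if there isn't one."""
    m = ABSOLUTE.match(filename)
    return int(m.group("abs")) if m else None


if __name__ == "__main__":
    names = (
        "high school dxd - 1x01 - 001 - i got a girlfriend!.mkv",
        "high school dxd - 001 - i got a girlfriend!.mkv",
        "high school dxd - 0x02 - special 1.mkv",
    )
    for name in names:
        print(name, "->", absolute_number(name))
```

The special has no absolute number in this naming scheme, so it needs to be handled by a separate rule.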
Reply
(2012-10-12, 09:09)scudlee Wrote:

You're right, I did hit No, so I should just refresh the show and it should work? As for the filenames, I would prefer to keep the format I'm using (so I don't have to rename all my anime). I hope sabtoanidb will permit that. For the switch, I guess I have the choice of pointing directly to the xml location or downloading the file to my PC. Should I use the full list?
Reply