Kodi Community Forum
[WIP] AniDB.net Anime Video Scraper



- Armitage - 2011-03-28

Hello people, new to both XBMC and this forum. I used MediaPortal before but I could never get bitstreaming to my receiver to work, whereas with XBMC it just... worked without any problems. Enough about that though.

Is there any way to get the scraper to use the English names for my anime instead? I haven't found an option for it, and I have tried both the forum search and Google, but with no luck. I was checking why it didn't find Eureka Seven, and it turns out it did, but named it Koujyou-something Eureka Seven. Same with Sailor Moon. I'm just so used to the English names.


- asdzxcman - 2011-03-28

Works great here. Too bad AniDB doesn't have movie posters, only actual screenshots, so my Break Blade movies look pretty crappy compared to the rest of my collection. The only drawback is that it takes A LOT of time to discover my 14+ out of 23 movies, but hey, it's still WIP. Good job so far!


- salival - 2011-03-28

Armitage,

the easiest way to rename a series entry is to select the series you want to rename, open the XBMC context menu and choose "edit title".

A more roundabout way is to use NFO files or an external library manager. Those also let you change the other info on the series.


- Armitage - 2011-03-28

Hmmmm... The "easy" part can be debated; changing over 100 entries manually sounds like tedious work :). I'll try to Google a bit for an external library manager. Thanks for the tip though.

If someone else has a good idea on how to do it, or if it is possible to add an option to the scraper where you can choose what name you want, please feel free to tell me.


- salival - 2011-03-28

I did say easiest, didn't I? ;)

For the library managers a good place to start is the Supplemental Tools section on here.


- pathw - 2011-03-28

Armitage Wrote:Hmmmm... That easy-part can be debated, changing over 100 entries manually sounds like tedious work Smile. I'll try and google a bit for an external library manager. Thanks for the tip though.

If someone else has a good idea on how to do it, or if it is possible to add an option to the scraper where you can choose what name you want, please feel free to tell me.

This can be done. I can post a suggested change to the scraper files here :)
EDIT: Actually, I should have learnt my lesson from my previous attempt at working on this. I THINK this can be done. If so, I'll post a patch. I'll have to actually check though.


- pathw - 2011-03-29

bambi73 Wrote:Turning cache off anywhere is bad idea, you will get banned on AniDB.net in no time. It's enough to clear $$1 at the end of each function, so new content from site/cache is appended to empty string.
If you want to test my solution you can try v1.0.1b1. I'll welcome to see your test results.


I have several changes in my personal copy, so I merged in your changes. It works great, thanks :). Btw you've done an amazing job working in this awful environment. I really hope someone makes a patch that makes it possible to write scrapers in Python :)

Anyway, these changes work well. I have some changes on my end. Do you want me to send them over to you?


- bambi73 - 2011-03-29

Armitage Wrote:Hmmmm... That easy-part can be debated, changing over 100 entries manually sounds like tedious work Smile. I'll try and google a bit for an external library manager. Thanks for the tip though.

If someone else has a good idea on how to do it, or if it is possible to add an option to the scraper where you can choose what name you want, please feel free to tell me.
I'll add an option to the scraper to select the "official" title instead of the "main" one, and to choose the language of the official title. Of course, when AniDB has no official title in the selected language, it will fall back to the main title.

For example, this is the set of titles for Fullmetal Alchemist: Brotherhood; you can see what I mean by main and official titles:
Code:
<titles>
<title xml:lang="x-jat" type="main">Hagane no Renkinjutsushi (2009)</title>
<title xml:lang="en" type="synonym">Full Metal Alchemist: Brotherhood</title>
<title xml:lang="en" type="synonym">Fullmetal Alchemist 2</title>
<title xml:lang="en" type="synonym">Fullmetal Alchemist (2009)</title>
<title xml:lang="en" type="synonym">Full Metal Alchemist 2</title>
<title xml:lang="ru" type="synonym">Стальной Алхимик: Братство</title>
<title xml:lang="ko" type="synonym">강철의 연금술사 리메이크</title>
<title xml:lang="pl" type="synonym">Stalowy Alchemik</title>
<title xml:lang="sv" type="synonym">Stålalkemisten</title>
<title xml:lang="ar" type="synonym">الكيميائي المعدني الكامل : الأخوة</title>
<title xml:lang="ar" type="synonym">2 الكيميائي المعدني الكامل</title>
<title xml:lang="he" type="synonym">אלכימאי המתכת 2009</title>
<title xml:lang="lt" type="synonym">Metalinis Alchemikas 2</title>
<title xml:lang="tr" type="synonym">Metal Simyager 2: Kardeşlik</title>
<title xml:lang="tr" type="synonym">Metal Simyacı: Kardeşlik</title>
<title xml:lang="bg" type="synonym">Металния алхимик</title>
<title xml:lang="zh-Hans" type="synonym">钢之炼金术师 2009</title>
<title xml:lang="es-LA" type="synonym">Fullmetal Alchemist: Shintetsu</title>
<title xml:lang="x-unk" type="short">FMA (2009)</title>
<title xml:lang="en" type="short">FMA2</title>
<title xml:lang="en" type="short">FMAB</title>
<title xml:lang="x-jat" type="short">HagaRen (2009)</title>
<title xml:lang="x-jat" type="short">hagaren2</title>
<title xml:lang="ja" type="official">鋼の錬金術師 FULLMETAL ALCHEMIST (2009)</title>
<title xml:lang="en" type="official">Fullmetal Alchemist: Brotherhood</title>
<title xml:lang="fr" type="official">Fullmetal Alchemist: Brotherhood</title>
<title xml:lang="it" type="official">Fullmetal Alchemist: Brotherhood</title>
<title xml:lang="es" type="official">Fullmetal Alchemist: Brotherhood</title>
<title xml:lang="cs" type="official">Fullmetal Alchemist - Bratrství</title>
<title xml:lang="hu" type="official">Fullmetal Alchemist: Testvériség</title>
<title xml:lang="ro" type="official">Full Metal Alchemist: Fraternitate</title>
<title xml:lang="mn" type="official">Цулметалан Алхимич: Ахан дүүсийн барилдлага</title>
</titles>
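
The "official title with fallback to main" rule described above could be sketched roughly like this in Python. This is only an illustration of the rule, not the scraper's actual regexp-based implementation; `pick_title` and `TITLES_XML` are made-up names, and the XML is a trimmed copy of the list above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the title list quoted above.
TITLES_XML = """
<titles>
<title xml:lang="x-jat" type="main">Hagane no Renkinjutsushi (2009)</title>
<title xml:lang="en" type="synonym">Fullmetal Alchemist 2</title>
<title xml:lang="en" type="short">FMAB</title>
<title xml:lang="ja" type="official">鋼の錬金術師 FULLMETAL ALCHEMIST (2009)</title>
<title xml:lang="en" type="official">Fullmetal Alchemist: Brotherhood</title>
<title xml:lang="cs" type="official">Fullmetal Alchemist - Bratrství</title>
</titles>
"""


def pick_title(titles_xml, lang):
    """Return the official title in `lang`, or the main title if there is none."""
    root = ET.fromstring(titles_xml)
    main_title = None
    for title in root.findall("title"):
        kind = title.get("type")
        if kind == "official" and title.get(XML_LANG) == lang:
            return title.text
        if kind == "main" and main_title is None:
            # Remember the main title in case no official one matches.
            main_title = title.text
    return main_title


print(pick_title(TITLES_XML, "en"))  # official English title
print(pick_title(TITLES_XML, "sv"))  # no official Swedish title, so the main title
```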



- bambi73 - 2011-03-29

pathw Wrote:I have several changes to my personal copy, so I merged in your changes. It works great, thanks Smile. Btw you've done an amazing job working in this awful environment. I really hope someone makes a patch that makes it possible to write scrapers in python Smile

Anyway, these changes work well. I have some changes on my end. Do you want me to send them over to you?
Yes, I welcome any reasonable improvements ... that don't conflict with my ideas ;).
Please add a note about what you changed, because comparing it with my changes will be a PITA.

And I don't think the scraper parser environment is so bad. It does a good job for regular usage; maybe the AniDB scraper is just a bit too big a monster for it :D
Back when I was actively developing this scraper I thought about some Python support and already did some preliminary work on it, but it always ended like "Too much work with unsure results" :).
Right now the AniDB scraper works fine and fulfills all my needs, so it's enough for me.


- bambi73 - 2011-03-29

My plans for version 1.0.1:

- add workaround for ticket #11377 - it causes scraper freezing or wrong parses in some cases ... done
- split settings to categories (General, AniDB, "TheTVDB") ... done
- add possibility to select different source for anidb.xml and anime-list.xml files ... 30%
- add possibility to select official title (+language) over main title ... 0%

Any other ideas? I'm too lazy to go through almost a year of posts in this thread :p


- Armitage - 2011-03-29

You guys are awesome. Thanks for the fast response and for jumping straight onto the problem :).


- pathw - 2011-03-29

bambi73 Wrote:Yes, I welcome any reasonable improvements ... which doesn't conflict with my ideas Wink.
Please add some note what did you changed because comparing it with my changes will be PITA.

And i don't think scraper parser environment is so bad, it does good job for regular usage, only maybe AniDB scraper is bit too big monster for it Big Grin
Back when i actively developed this scraper i thought about some Python support and already did some preliminary work on it, but it always ended like "Too much work with unsure results" Smile.
Right now AniDB scraper works fine and fullfil all my needs so it's enough for me.

Well, I mean, parsing XML/HTML with regex is badass in itself. There were changes I wanted to make that were quite difficult. I'll list an example of a change I want but don't know how to do.

How did you do the Python work? I didn't know the scraper system in XBMC exposed any hooks for Python. (I don't have XBMC building on my machine, so I haven't really seen how the plugin system works.) If it's really possible, I don't mind working on the port. I'm sure it will be way better than struggling with regex and these continuous transformations.

To merge the 2 XML files I used this tool: http://tools.decisionsoft.com/xmldiff.html. It seems pretty good.

In the anidb.xml you listed to me yesterday, I saw this regexp
(?i)&lt;a href=&quot;http://anidb\.net/perl-bin/animedb\.pl\?show=anime&amp\;aid=(\d+)&quot;[^&gt;]*&gt;(.*?)&lt;/a&gt;

after show=anime you have &amp\; instead of &amp;. Any idea why?


I have made 2 changes.
1) For the fanart lookups, in addition to the main title, official English, synonym x-jat and first synonym en, I have added the short en title. (I needed this for Madoka. There is a better solution, so let me talk about that.)

2) The real change I made was to be able to import the other special episodes into XBMC: trailers, parodies, other, and OPs and EDs, which are all recognized by AniDB. But the solution I have isn't very nice.

XBMC stores season and episode numbers as integers, so I have mapped episodes like T01 and P01 to very high season numbers. This works for me. Sadly, XBMC then displays those episodes as being in season 115 and so on. Tell me if you are interested in these changes.
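
The kind of mapping described here could look roughly like the sketch below. The season numbers are invented for illustration (the post only says they are "very high", e.g. 115), `map_epno` is a made-up helper, and the letter prefixes are AniDB's episode-type prefixes as I understand them (S special, C credits/OP/ED, T trailer, P parody, O other).

```python
import re

# Hypothetical season numbers for AniDB's lettered episode types. The actual
# values used in the modified scraper are not given in this thread.
SPECIAL_SEASONS = {
    "S": 0,    # specials (season 0 is where XBMC keeps specials)
    "C": 101,  # credits (OPs and EDs)
    "T": 102,  # trailers
    "P": 103,  # parodies
    "O": 104,  # other
}

# Optional type letter followed by the episode number, e.g. "12", "T01", "P3".
EPNO_PATTERN = re.compile(r"([SCTPO]?)(\d+)")


def map_epno(epno, regular_season=1):
    """Turn an AniDB episode number into an integer (season, episode) pair."""
    match = EPNO_PATTERN.fullmatch(epno.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised episode number: {epno!r}")
    prefix, number = match.groups()
    season = SPECIAL_SEASONS[prefix] if prefix else regular_season
    return season, int(number)


print(map_epno("12"))   # regular episode stays in the regular season
print(map_epno("T01"))  # trailer lands in its own high-numbered season
```

The drawback mentioned in the post follows directly from this design: the library only sees integers, so a trailer ends up displayed as an episode of a season with a three-digit number.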

There are 5 changes I want to make, but couldn't quite figure out.
1) For some reason the AniDB image that shows up doesn't scale properly in the "tv show properties" window. I'm wondering how this logic works, especially because I'd also like to use this with the mediaInfo 2 view of XBMC.

2) I'd like to use the AniDB image, scaled as if it were fanart, when there is no fanart on TVDB. Is this possible?

3) Some searches on TVDB return the right result but in the wrong order. Again, this is where a general-purpose language like Python would be better. I have a show called a.li.ce. The TVDB search returns alice first and a.li.ce further down, but the scraper takes the fanart of the first show, which is wrong. Maybe ranking the options by best match would help.

4) The search for the TVDB URL runs through some options, but not all of them. Specifically, I think it should run through all the English synonym titles, not just the first one. For example, for the current show Mahou Shoujo Madoka Magika, the API returns pmagi as the first English synonym, which is not on TVDB. But some of the other synonyms are. Is it possible to select all synonyms? I had to add the option of using the short English title because I could not figure out how to use all synonyms.

5) The last piece is about anime movies. Sadly, TVDB doesn't have any fanart for them. Do you have any ideas for how we might find some fanart for them? :)
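
For points 3) and 4), "rank by best match over all known titles" might be sketched like this in Python. It uses `difflib` from the standard library as the similarity measure, which is my own choice and not something the scraper does; `best_match` is a made-up helper and both lists are assumed to be non-empty.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between two titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_match(anidb_titles, search_results):
    """Return (score, result) for the search result closest to any known title.

    Every title known for the show (main, official, all synonyms) is compared
    against every search result, instead of trusting the order of the results.
    """
    scored = [
        (max(similarity(title, result) for title in anidb_titles), result)
        for result in search_results
    ]
    return max(scored, key=lambda pair: pair[0])


# The a.li.ce case from point 3: the wrong show is listed first.
score, name = best_match(["a.li.ce"], ["Alice", "A.Li.Ce"])
print(name)  # prints "A.Li.Ce"
```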


- bambi73 - 2011-03-29

pathw Wrote:well I mean, parsing xml/html with a regex itself is badass. There were changes I wanted to do, that were quite difficult. I'll list an example of a change I want, I don't know how to do.
Yeah, you're right, regexps are not suitable for complicated parsing and conditional execution, but they work (when you know how to write the right regexp :)) without any additional work around them. In Python you would need to build some framework around it. I know there are libraries, but still ....

pathw Wrote:How did you do the python work? I didn't know the scraper system in xbmc exposed had any hooks for python. (I don't have xbmc building on my machine, so I havent really seen how the plugin system works). If it's really possible, I dont mind working on the port. I'm sure it will be way better than struggling with regex and these continuous transformations.
There is no support for Python in the current scraper engine. I added some preliminary support for calling Python functions instead of scraper ones. It sort of worked, but there were a lot of problems around it, so I finished the scraper with normal regexps instead :)

pathw Wrote:so to merge the 2 xmls I used this tool http://tools.decisionsoft.com/xmldiff.html. It seems pretty good.

In the anidb.xml you listed to me yesterday, I saw this regexp
(?i)&lt;a href=&quot;http://anidb\.net/perl-bin/animedb\.pl\?show=anime&amp\;aid=(\d+)&quot;[^&gt;]*&gt;(.*?)&lt;/a&gt;

after show=anime you have &amp\; instead of &amp;. Any idea why?
Not sure about this right now, but I guess it's because I'm looking for the text "&amp;" and not for "&". Maybe I'll change it to \Q&amp;\E, it looks better :)
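
A quick way to check what the escaped semicolon does is to try the pattern outside XBMC. The sketch below uses Python's `re` module rather than the regexp engine the scraper runs on, so it only shows how the pattern itself behaves; the sample link (including its aid and title) is made up.

```python
import re

# The regexp quoted above with its XML entities (&lt; &quot; &gt;) decoded.
# The "&amp\;" part is left exactly as quoted.
LINK_PATTERN = re.compile(
    r'(?i)<a href="http://anidb\.net/perl-bin/animedb\.pl\?show=anime&amp\;aid=(\d+)"[^>]*>(.*?)</a>'
)

# Made-up sample of raw page source, in which "&" is written as "&amp;".
SAMPLE = (
    '<a href="http://anidb.net/perl-bin/animedb.pl?show=anime&amp;aid=1208" '
    'class="x">Some Title</a>'
)

match = LINK_PATTERN.search(SAMPLE)
print(match.groups())  # ('1208', 'Some Title')

# "\;" is just an escaped literal semicolon, so both spellings match "&amp;".
print(bool(re.fullmatch(r"&amp\;", "&amp;")))  # True
print(bool(re.fullmatch(r"&amp;", "&amp;")))   # True

# Python's re has no \Q...\E quoting; re.escape() is the closest equivalent.
print(bool(re.fullmatch(re.escape("&amp;"), "&amp;")))  # True
```

So as far as the regexp itself goes, `&amp\;` and `&amp;` match the same text. Any difference would have to come from how the scraper XML file is read before the pattern reaches the regexp engine, which this sketch does not cover.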

I'll answer the rest of the scraper-related questions when I get home this evening; I need my sources for that.


- bambi73 - 2011-03-30

pathw Wrote:I have 2 changes I have made.
1) for the fanart lookups in addition to the main title, official english, synonym x-jat and first synonym en, I have added the short en. (I needed this for madoka. There is a better solution, so let me talk about that)

There are 5 changes I want to make, but couldnt quite figure out.
3) Some searches on tvdb have the right result but in the wrong order. Again here is where a general purpose language like python would be better. I have a show called a.li.ce. The tvdb result returns alice as the first and a.li.ce further down. but it gets the fanart of the first show which is wrong. Maybe ranking the options by the best match would help.

4) the search for tvdb url runs through some options, but not all options. Specifically, I think it should run through all the english synonym titles, not just the first one. For example the current show Mahou Shoujo Madoka Magika, the api returns the first english synonym as pmagi, which is not on tvdb. But some of the other synonyms are. Is it possible to select all synonyms? I had to add the option of using the short english title because I could not figure out how to use all synonyms.

5) The last piece is about anime movies. Sadly tvdb doesn't have any fanart for them. Do you have any ideas of how we might find some fanart for them? Smile
Maybe you noticed the anime-list.xml file in your scraper cache; it contains the mapping between AniDB and TheTVDB. When this mapping exists for the scraped anime, there is no need to do any lookups by name, because the exact ID is used (solution for your problems #1, #3, #4). In addition, there can be a mapping for episode numbers when they don't match between the two sites. Finally, there can be overrides for any values returned by the scraper (solution for your problem #5; look, for example, at anidbid="1208").
This file is normally hosted on Google Sites and maintained by me (the last 10 months are missing :o), but I plan to add support for a second, personal version of this file that everyone can maintain themselves. Of course you need to host it somewhere, because the scraper can't load local files, but at least XBMC runs a simple web server, so you can put it there (not nice, but it works).
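
The idea of "use the exact ID when a mapping exists, otherwise look up by name", plus a personal file layered over the shared one, might be sketched like this in Python. Only the `anidbid` attribute appears in this thread; the `anime-list` root element, the `tvdbid` attribute and all ID values below are assumptions made for illustration, and the function names are made up.

```python
import xml.etree.ElementTree as ET

# Made-up samples. Only "anidbid" is taken from the thread; the element
# layout, "tvdbid" and every ID value are placeholders.
SHARED_LIST = """
<anime-list>
  <anime anidbid="1208" tvdbid="11111"/>
  <anime anidbid="2000" tvdbid="22222"/>
</anime-list>
"""

PERSONAL_LIST = """
<anime-list>
  <anime anidbid="2000" tvdbid="33333"/>
</anime-list>
"""


def load_mapping(xml_text):
    """Build a dict of AniDB id -> TheTVDB id from a mapping file."""
    mapping = {}
    for entry in ET.fromstring(xml_text).iter("anime"):
        anidb_id, tvdb_id = entry.get("anidbid"), entry.get("tvdbid")
        if anidb_id and tvdb_id:
            mapping[anidb_id] = tvdb_id
    return mapping


def resolve_tvdb_id(mapping, anidb_id, search_by_name):
    """Use the mapped id when there is one, else fall back to a name search."""
    tvdb_id = mapping.get(str(anidb_id))
    return tvdb_id if tvdb_id is not None else search_by_name()


# Entries from the personal file win over the shared one.
merged = {**load_mapping(SHARED_LIST), **load_mapping(PERSONAL_LIST)}
print(resolve_tvdb_id(merged, 1208, lambda: None))  # mapped, no search needed
print(resolve_tvdb_id(merged, 4242, lambda: "found-by-name"))  # falls back
```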

pathw Wrote:I have 2 changes I have made.
2) the real change I made was to be able to import the other special episodes into xbmc. Trailers, Parodies, Other, and Ops and Eds that are all recognized by anidb. But the solution I have isnt very nice.

xbmc stores season and episode number as integers. So I have mapped episodes like T01 and P01 to very high season numbers. This works for me. Sadly the episodes get displayed by xbmc as being season 115 and so on. Tell me if you are interested in these changes.
I was never that interested in episode types other than regular and special ones, so I never supported them. Maybe I'll try to add some support if I can find a way to sort them reasonably together with the rest.

pathw Wrote:There are 5 changes I want to make, but couldnt quite figure out.
1) For some reason the anidb image that shows up doesnt scale properly in the "tv show properties" window. I'm wondering how this logic works, especially because I'd also like to use this with the mediaInfo 2 view of xbmc.

2) I'd like to use the anidb image scaled as if it was fanart, when there is no fanart on tvdb. Is this possible?
The scraper doesn't download/process any images, it only tells XBMC where they can be found -> scaling is done by XBMC. Of course you can return the AniDB thumb as fanart, but the aspect ratio is completely different, so it'll scale horribly. I don't think I'll ever support this functionality.


- pathw - 2011-03-31

bambi73 Wrote:Maybe you noticed anime-list.xml file in your scrape cache, it contains mapping between AniDB and TheTVDB. When this mapping exist for scraped anime then there is no need to make any lookups by name because exact Id is used (solution for your problems #1,#3,#4). In addition there can be mapping for episode numbers when they don't match between both sites. Finally there can be overrides for any values returned by scraper (solution for your problem #5, look for example for anidbid="1208")
This file is normaly hosted on google sites and maitaned by me (last 10 months is missing :o), but i plan add support for second, personal, version of this file which everyone can maintain by himself. Of course you need to host is somewhere because scraper can't load local files, but at least XBMC is running simple web server so you can add it there (not nice, but it's working).

I think a shared copy that's kept updated would be better. But maybe instead of an XML file, we could make a simple website where you give it the anidbid and it gives you the tvdbid, and we let anyone edit it :)? Crowdsourcing and all that.

I noticed, though, that the fanart section, which looks up alternate names and prequels, does not consult this file?


bambi73 Wrote:I was never so much interested in other types of episodes besides regular and special ones, so i never supported them. Maybe i'll try to add some support if i'm able to find some way how to sort them reasonably together with rest.

I have added the support. You could just merge it in if you want.

bambi73 Wrote:Scraper doesn't download/process any images, it only tells to XBMC where they can be found -> scaling is done by XBMC. Of course you can return AniDB thumb as fanart, but their aspect ration is completely different, so it'll scale horribly. I don't think i'll ever support this functionality.

Hmm, but I don't get this. The art on the image is always of different aspect ratios. Why does XBMC scale it so badly?