Getting http banned on anidb.net when scraping anime
#1
Question 
Hey,
Yesterday I tried several scrapers/managers for Kodi, and at some point I was banned from using the anidb.net API. I assumed it was because of all the tools I had used. Today I wanted to finish up my library with TMM, as it is by far the best manager I found after hours of trying different stuff. After around 5-8 anime I got banned again (anime data such as metadata and pictures wouldn't load with the anidb.net scraper anymore).
Sooo, is it a problem on my end or is it the scraper's behaviour?
I was doing the following for each anime:

- Select ONE anime
- Search and scrape selected TV show(s)
- Enter the name (if not detected correctly) and select the right show on the scraper window
- At this point metadata and poster were usually loaded
- Some shows wouldn't detect the files inside their folder as season 1, so I had to -> right click the show -> bulk editing -> set season to "1" -> click the checkmark
- As the season was only now detected correctly, I had to scrape the episode/TV show data again. Is this where I f***** up? Basically I processed the show twice in this case: once without the season being detected and once with everything in place. I had this with like 2-4 anime, I think.

My fault? Anidb.net scraper fault? TMM fault?
I have no idea what went wrong here. I would love to finish my library already. Thank you very much for any help in advance.

Greetings,
Departet
#2
Yes, there is a limitation on anidb's side (see https://wiki.anidb.net/w/API#Anime_Titles), but we do the best we can to buffer this file (we write it to disk with a retention time of 2 days, so this file should not be requested more than once every two days, IF nothing else tampers with our cache!)

So it is kind of hard to say what exactly caused your IP to get banned (but AFAIK this ban is only temporary). I've just tried accessing the anidb API several times in a row (even after a restart of tmm) and the cache works as it should. Did you clear the HTTP cache while playing around with tmm?
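
To illustrate the idea of a disk cache with a retention time, here is a minimal Python sketch. It is not tmm's actual code (tmm is written in Java); the names `cached_fetch`, `RETENTION_SECONDS` and the `download` callback are made up for this example.

```python
import time
from pathlib import Path
from typing import Callable, Optional

# Two days, matching the retention time mentioned above.
RETENTION_SECONDS = 2 * 24 * 60 * 60


def cached_fetch(cache_file: Path,
                 download: Callable[[], bytes],
                 now: Optional[float] = None) -> bytes:
    """Return the cached copy if it is younger than the retention time,
    otherwise call `download` once and overwrite the cached file."""
    now = time.time() if now is None else now
    if cache_file.exists():
        age = now - cache_file.stat().st_mtime
        if age < RETENTION_SECONDS:
            return cache_file.read_bytes()
    data = download()  # hypothetical callback that performs the real HTTP request
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(data)
    return data
```

With a cache like this, deleting the cache folder (or unpacking a fresh copy of the program) forces a new download on the next start, which is the kind of "tampering" meant above.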
tinyMediaManager - THE media manager of your choice - available for Windows, macOS and Linux
Help us translate tinyMediaManager at Weblate
Found a bug or want to submit a feature request? Contact us at GitLab
#3
Thank you for the fast reply. When I first set up TMM I imported 2 folders: one with anime series and one with anime movies. I got the movies perfectly named and set up with the tvdb scraper. Chances are high that I did certain actions too often with the anidb scraper yesterday, which probably got me banned. But today I only did the steps I mentioned above. If the ban were like 30 min it would be kinda "okay", because I would get my anime done eventually, but because of the increasing ban times and the different tools I used yesterday I'm still not unbanned at this second, and following bans would probably be even longer. To prepare for my next try once I'm unbanned, I already put all my files into "Season 1" folders inside their "main" folder. Example: Air/Files.mkv -> Air/Season 1/Files.mkv. That way I only have to scrape each anime once, if this works as expected.
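
In case someone wants to do the same folder restructuring without moving files by hand, here is a rough Python sketch. The function name and the list of video extensions are my own assumptions, so adjust them to your library and try it on a copy first.

```python
import shutil
from pathlib import Path
from typing import List

# Assumed set of video extensions; extend it for your own files.
VIDEO_EXTENSIONS = {".mkv", ".mp4", ".avi"}


def move_into_season_folder(show_dir: Path,
                            season_name: str = "Season 1") -> List[Path]:
    """Move video files that sit directly in show_dir into show_dir/season_name."""
    season_dir = show_dir / season_name
    season_dir.mkdir(exist_ok=True)
    moved = []
    # sorted() takes a snapshot of the listing before any file is moved
    for entry in sorted(show_dir.iterdir()):
        if entry.is_file() and entry.suffix.lower() in VIDEO_EXTENSIONS:
            target = season_dir / entry.name
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved
```

Calling `move_into_season_folder(Path("Air"))` would turn Air/Files.mkv into Air/Season 1/Files.mkv and leave non-video files such as posters where they are.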

What do you mean by clearing the HTTP cache? I unpacked TMM yesterday and have been using it as a scraper since. I didn't change any main settings or stuff like that. Should I maybe delete my TMM and start from scratch the next time I'm unbanned?
#4
No, creating a new instance increases the chance of getting banned (because the static file must be fetched once more). As you may see in the thread https://anidb.net/forum/thread/86659, even the anidb mods do not know _how_ the banning logic works and can only suggest that you wait.
I've just reviewed the code of the anidb scraper and I can say that we

- do all necessary throttling (one request per 2 secs)
- cache the static files

but I've seen the following sentence in their docs:
Quote: All users of this API should employ heavy local caching. Requesting the same dataset multiple times on a single day can get you banned. The same goes for request flooding. You should not request more than one page every two seconds.
So it could be that you called the same URL multiple times (but I've done that too in my tests here and there was no ban...)

So I cannot find any "problem" in tmm and I cannot tell you more than "just wait until you get unbanned" (without trying too often, because every new try may increase the ban time)
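
For reference, the "one request per 2 secs" throttling mentioned above can be sketched roughly like this in Python. This is only an illustration of the technique, not the code tmm uses; the class name `Throttle` and its parameters are assumptions for this example.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Ensure that consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self,
                 min_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A client would call `throttle.wait()` right before every request to the API. Note that throttling alone does not help against the "same dataset multiple times on a single day" rule from the quote; that part needs caching.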
#5
I was already kinda sure it isn't a TMM bug. I was hoping for some fault on my end which is easy to explain (it kind of is my fault, but I don't understand it yet haha).
Anyways, thank you very much for troubleshooting for me! Have a good day.


EDIT: After checking the logs I saw that the scraper is scraping twice even though I chose just one anime. Is that because I clicked another title in the "scraper-select-anime-name" window?
I pasted the relevant stuff here: https://paste.kodi.tv/odafanuwil.kodi
#6
This is anidb's answer to my problem. Is there a way to cache more HTTP data on my end? Like an option?
https://anidb.net/forum/thread/101403#c462354

After further investigation (I really want to have this done) I found out that the anime names are always downloaded again. Could I stop this for the moment? Maybe the repeated anime name download is the problem because I scrape every anime one by one?
2020-07-20 23:17:42,676 DEBUG [SwingWorker-pool-4-thread-10] org.tinymediamanager.scraper.http.Url:288 - getting http://anidb.net/api/anime-titles.dat.gz
I can manually download that file, so I don't think that's a problem. Was just a thought.
#7
We do cache here as much as we can (or as the HTTP standard _allows_ us): if the same URL on anidb is accessed twice within one tmm session, the second result is delivered from our in-memory cache. If you restart tmm, the cache is lost and the same request is sent to anidb again.
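
As a rough illustration of that session behaviour (a Python sketch, not tmm's real implementation; `SessionCache` and the `fetch` callback are hypothetical names):

```python
from typing import Callable, Dict


class SessionCache:
    """In-memory cache keyed by URL; its content lives only as long as the object."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # hypothetical function doing the real HTTP request
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Only the first access per URL reaches the server in this session.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]
```

Creating a new `SessionCache` corresponds to restarting the program: the store is empty again and every URL is fetched once more.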

We could also create a disk cache, BUT

a) there is NO guideline from anidb how LONG we should cache
b) according to the code in OkHttp (the HTTP library we use) there was no way to force a disk-cache hit if there are query parameters in the URL (at least that was the case the last time I analyzed their code). I can have a look whether something has changed since then, but I doubt OkHttp will accept anidb's responses for caching, because their webserver has sent _crappy_ HTTP responses in the past and I don't think anything has changed since then :/

Stay tuned

EDIT: lol, I just saw that I've already built a "force disk cache" because of the _problems_ with HTTP-conformant caching (which I've mentioned above) - https://gitlab.com/tinyMediaManager/tiny...edUrl.java
I will switch the caching method for the AniDB scraper to it