v18 [split] Testing scraping changes and feedback
#1
EDIT: Split from "Music Library Development Roadmap - One Man's Plan" during tidy-up.

Scanning right now; at first look, I don't get any additional info.

I'll let it finish and report back.
Reply
#2
(2017-07-02, 12:36)meowmoo Wrote: Scanning right now; at first look, I don't get any additional info.
The combined scan (tag) + scrape (additional info) sequence has changed.
To populate the library fastest, so you can play stuff, all the scanning happens first. Then, once that is complete, it scrapes info and art for the added artists and albums. This is the slower part - 1 s per album (or 2 s if looking up by name because there are no MBID tags) and 1 s per artist (or 2 s if there is no MBID) - but it runs as a background task.

The key thing is to check for 503 errors in the log (or confirm that you have info for all the artists and albums that you expect, with no gaps).
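If you want to check the log programmatically rather than by eye, a minimal sketch is below. The log location varies by platform (e.g. ~/.kodi/temp/kodi.log on Linux), so the path is passed in explicitly rather than assumed; the sample log line in the usage note is made up for illustration.

```python
import re

def find_503_errors(log_path):
    """Return (line_number, line) pairs for log lines mentioning HTTP 503.

    The Kodi log location varies by platform (e.g. ~/.kodi/temp/kodi.log
    on Linux), so the path is an argument rather than hard-coded.
    """
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for n, line in enumerate(f, 1):
            # \b503\b avoids matching numbers that merely contain 503
            if re.search(r"\b503\b", line):
                hits.append((n, line.rstrip()))
    return hits
```

Calling `find_503_errors("/path/to/kodi.log")` returns an empty list on a clean scrape; any hits point you at the exact log lines to inspect.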

Quote: I'll let it finish and report back.
Great. Thanks for testing this, always good to have real world feedback.
Reply
#3
All good at the end: no 503 errors in the log, and all the information is there.
Reply
#4
Dave,

did you introduce delays in the scraping process?

If so, is there a way to turn them off?

The reason for asking is that I have put together a simple web application with corresponding scrapers for artists and albums. Now this process is an order of magnitude slower than before. It used to be pretty fast compared to MusicBrainz etc. OK, I have to update the database manually, but for some reason, with the music I normally listen to, I have to do some manual work anyway. And MusicBrainz is not that user friendly; at least I have a web GUI to handle the content with cut-and-paste etc. :-)

- Janne
Reply
#5
(2017-07-08, 09:39)JanneT Wrote: Dave,

did you introduce delays in the scraping process?
yes

Quote:If so, is there a way to turn them off?
nope.
It's all or nothing, and we decided to throttle online info fetching to reduce the load on MusicBrainz and get it working reliably.
Read/follow the forum rules.
For troubleshooting and bug reporting, read this first
Interested in seeing some YouTube videos about Kodi? Go here and subscribe
Reply
#6
(2017-07-08, 09:55)Martijn Wrote:
(2017-07-08, 09:39)JanneT Wrote: Dave,

did you introduce delays in the scraping process?
yes

Quote:If so, is there a way to turn them off?
nope.
It's all or nothing, and we decided to throttle online info fetching to reduce the load on MusicBrainz and get it working reliably.

OK, I have to accept that. I could scrape 600-700 albums in less than a minute from my local server. :-(
Reply
#7
There's a difference between scraping and scanning.
Also, it's a one-time job, so it doesn't matter that it takes longer the first time.
Reply
#8
(2017-07-08, 11:14)Martijn Wrote: There's a difference between scraping and scanning.
Also, it's a one-time job, so it doesn't matter that it takes longer the first time.

In this case, it's the scraping I'm talking about - from my local web server.
Reply
#9
As Martijn said, the initial library scrape is a one-time process. How many times are you scraping your library, and why?
Reply
#10
Any thoughts on below? Cross-posting here as it is very music related.

https://forum.kodi.tv/showthread.php?tid=317925
Reply
#11
(2017-07-08, 09:39)JanneT Wrote: did you introduce delays in the scraping process?

If so, is there a way to turn them off?

The reason for asking is that I have put together a simple web application with corresponding scrapers for artists and albums. Now this process is an order of magnitude slower than before. It used to be pretty fast compared to MusicBrainz etc. OK, I have to update the database manually, but for some reason, with the music I normally listen to, I have to do some manual work anyway. And MusicBrainz is not that user friendly; at least I have a web GUI to handle the content with cut-and-paste etc. :-)

(2017-07-08, 10:53)JanneT Wrote: OK, I have to accept that. I could scrape 600-700 albums in less than 1 minute from my local server. :-(
@JanneT I don't understand what you are doing.

The Universal Artist Scraper and the Universal Album Scraper make requests to MusicBrainz. Unless those requests are throttled to less than one per second, the MusicBrainz server returns 503 errors and blocks requests from your IP address until they arrive at a slow enough rate. Previous attempts at throttling were flawed, and users got lots of scraper failures (experienced as gaps in artwork for well-known music). I have fixed this, so scraping is slower (correctly throttled) but gets results!

I have no idea what you have been doing that managed 700 albums a minute - your own XML scraper? You need to explain more about what you are trying to achieve if you want help.
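For anyone writing their own scraper backend, the throttling being described is simple to sketch. This is a hypothetical helper, not Kodi's actual implementation; the one-request-per-second figure comes from MusicBrainz's published rate limit.

```python
import time

class RateLimiter:
    """Enforce a minimum interval between calls.

    MusicBrainz asks clients to make no more than one request per
    second; exceeding that earns 503 responses and an IP block.
    """

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous call

    def wait(self):
        # Sleep just long enough that calls are at least
        # min_interval seconds apart, then record this call.
        delay = self.min_interval - (time.monotonic() - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

A scraping loop would then call `limiter.wait()` immediately before each request, so the pacing is enforced in one place no matter how many albums or artists are queued.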
Reply
#12
(2017-07-09, 07:15)DaveBlake Wrote:
(2017-07-08, 09:39)JanneT Wrote: did you introduce delays in the scraping process?

If so, is there a way to turn them off?

The reason for asking is that I have put together a simple web application with corresponding scrapers for artists and albums. Now this process is an order of magnitude slower than before. It used to be pretty fast compared to MusicBrainz etc. OK, I have to update the database manually, but for some reason, with the music I normally listen to, I have to do some manual work anyway. And MusicBrainz is not that user friendly; at least I have a web GUI to handle the content with cut-and-paste etc. :-)

(2017-07-08, 10:53)JanneT Wrote: OK, I have to accept that. I could scrape 600-700 albums in less than 1 minute from my local server. :-(
@JanneT I don't understand what you are doing.

The Universal Artist Scraper and the Universal Album Scraper make requests to MusicBrainz. Unless those requests are throttled to less than one per second, the MusicBrainz server returns 503 errors and blocks requests from your IP address until they arrive at a slow enough rate. Previous attempts at throttling were flawed, and users got lots of scraper failures (experienced as gaps in artwork for well-known music). I have fixed this, so scraping is slower (correctly throttled) but gets results!

I have no idea what you have been doing that managed 700 albums a minute - your own XML scraper? You need to explain more about what you are trying to achieve if you want help.

Yes, that was with my own scraper, which uses an application on a web server at home. The idea was to become independent of MusicBrainz, as a lot of the music I listen to requires some manual work with MB, TADB etc. anyway. So I thought I would be better off with a private server to maintain the metadata. It works quite well for my needs, even for composers.

But as others have pointed out, it's no big deal except right now, while I'm testing my web app. Otherwise I think that throttling the requests is a good idea.
Reply
#13
(2017-07-09, 08:46)JanneT Wrote: Yes, that was with my own scraper, which uses an application on a web server at home. The idea was to become independent of MusicBrainz, as a lot of the music I listen to requires some manual work with MB, TADB etc. anyway. So I thought I would be better off with a private server to maintain the metadata. It works quite well for my needs, even for composers.

But as others have pointed out, it's no big deal except right now, while I'm testing my web app. Otherwise I think that throttling the requests is a good idea.
Creating your own private XML scraper is unusual. Perhaps get the web app to create NFO files instead; Kodi will scrape those without delay.
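For anyone taking that route, an NFO is just an XML file (e.g. album.nfo) placed alongside the music. A minimal sketch follows; the element names and values here are illustrative only - check the Kodi wiki's NFO files pages for the exact schema your version expects:

```xml
<album>
    <title>Example Album</title>
    <artist>Example Artist</artist>
    <genre>Jazz</genre>
    <year>2017</year>
    <review>Placeholder review text.</review>
</album>
```

A web app that already holds the metadata could emit one such file per album folder, and Kodi would pick them up locally with no online lookups and so no throttling.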
Reply
#14
(2017-07-09, 10:49)DaveBlake Wrote:
(2017-07-09, 08:46)JanneT Wrote: Yes, that was with my own scraper, which uses an application on a web server at home. The idea was to become independent of MusicBrainz, as a lot of the music I listen to requires some manual work with MB, TADB etc. anyway. So I thought I would be better off with a private server to maintain the metadata. It works quite well for my needs, even for composers.

But as others have pointed out, it's no big deal except right now, while I'm testing my web app. Otherwise I think that throttling the requests is a good idea.
Creating your own private XML scraper is unusual. Perhaps get the web app to create NFO files instead; Kodi will scrape those without delay.

Yes, that is an idea I've been thinking about. But what about albums with multiple artists?
Reply
#15
Collaboration albums (multiple album artists) are a problem for scraping NFOs and artwork, and something I hope to address.

Meanwhile, creating a single import file and importing it could work - play with exporting to a single file and see what you get. Remember that for music, export/import only contains the scraped album and artist data, not the things derived from tags.
Reply
