Kodi Community Forum
Release Universal Scraper for Music Albums - Printable Version

Thread: Release Universal Scraper for Music Albums (https://forum.kodi.tv/showthread.php?tid=133547)



RE: [Release] Universal Scraper for Music Albums - olympia - 2017-06-25

What help would you need from my side?


RE: [Release] Universal Scraper for Music Albums - DaveBlake - 2017-06-26

@olympia I am finally looking at core changes to make any re-scraping more efficient, such as storing scraped Musicbrainz ids etc. as we discussed long ago (lost the thread). To support those improvements a couple of backwards compatible scraper changes are needed:
  • Return the Musicbrainz release group id in album details (along with the album mbid and title that it currently returns)
  • Return "relevance" (from the Musicbrainz score) for album search results

I had a go at doing this myself over the weekend, and got brain melt with regex of xml, but something like this (whole section shown):

metadata.common.musicbrainz.org
Code:
<ParseMBAlbumTitle dest="5">
        <RegExp input="$$2" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
            <RegExp input="$$1" output="&lt;musicBrainzAlbumID&gt;\1&lt;/musicBrainzAlbumID&gt;&lt;title&gt;\2&lt;/title&gt;" dest="2">
                <expression noclean="1">&lt;release id=&quot;([^&quot;]*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;</expression>
            </RegExp>
            <!--MBID - release group-->            
            <RegExp input="$$1" output="&lt;musicbrainzreleasegroupid&gt;\1&lt;/musicbrainzreleasegroupid&gt;" dest="2+">
                <expression noclean="1">&lt;release-group\stype=&quot;[^&quot;]*&quot;\sid=&quot;([^&quot;]*)&quot;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;musicbrainzreleasegroupid&gt;\1&lt;/musicbrainzreleasegroupid&gt;" dest="2+">
                <expression noclean="1">&lt;release-group\stype=&quot;[^&quot;]*&quot;\stype-id=&quot;[^&quot;]*&quot;\sid=&quot;([^&quot;]*)&quot;</expression>
            </RegExp>            
            <RegExp input="$$1" output="&lt;musicbrainzreleasegroupid&gt;\1&lt;/musicbrainzreleasegroupid&gt;" dest="2+">
                <expression noclean="1">&lt;release-group\sid=&quot;([^&quot;]*)&quot;\stype=&quot;[^&quot;]*&quot;</expression>
            </RegExp>            
            <RegExp input="$$1" output="&lt;musicbrainzreleasegroupid&gt;\1&lt;/musicbrainzreleasegroupid&gt;" dest="2+">
                <expression noclean="1">&lt;release-group\sid=&quot;([^&quot;]*)&quot;\stype-id=&quot;[^&quot;]*&quot;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;musicbrainzreleasegroupid&gt;\1&lt;/musicbrainzreleasegroupid&gt;" dest="2+">
                <expression noclean="1">&lt;release-group\stype-id=&quot;[^&quot;]*&quot;\sid=&quot;([^&quot;]*)&quot;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;musicbrainzreleasegroupid&gt;\1&lt;/musicbrainzreleasegroupid&gt;" dest="2+">
                <expression noclean="1">&lt;release-group\stype-id=&quot;[^&quot;]*&quot;\stype=&quot;[^&quot;]*&quot;\sid=&quot;([^&quot;]*)&quot;</expression>
            </RegExp>            
            <expression noclean="1">(.+)</expression>
        </RegExp>
    </ParseMBAlbumTitle>
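
The six near-identical expressions above exist only to cope with the different attribute orders on the release-group element. As an illustration of the same idea outside the scraper engine, here is a minimal Python sketch that skips any number of leading attributes before `id`. The function name and the sample strings are made up for this example; it is not the scraper's XML and has not been checked against Kodi's regexp handling.

```python
import re

# Skip zero or more leading attributes, then capture the value of id="...".
# Requiring whitespace before "id" stops type-id="..." from matching.
RELEASE_GROUP_ID = re.compile(
    r'<release-group(?:\s+[\w:-]+="[^"]*")*?\s+id="([^"]*)"'
)


def release_group_id(xml_text):
    """Return the release group id, or None when no release-group has one."""
    match = RELEASE_GROUP_ID.search(xml_text)
    return match.group(1) if match else None


# Hypothetical fragments covering the attribute orders handled above.
SAMPLES = [
    '<release-group type="Album" id="rg-1">',
    '<release-group type="Album" type-id="t-9" id="rg-1">',
    '<release-group id="rg-1" type="Album">',
    '<release-group id="rg-1" type-id="t-9">',
    '<release-group type-id="t-9" id="rg-1">',
    '<release-group type-id="t-9" type="Album" id="rg-1">',
]

for sample in SAMPLES:
    print(release_group_id(sample))
```

Every sample yields the same id, so one expression covers all six orderings in this sketch.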

metadata.album.universal
Code:
<GetAlbumSearchResults dest="8">
        <RegExp input="$$5" output="&lt;results sorted=&quot;yes&quot;&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\5-\4-T#\6&lt;/year&gt;&lt;artist&gt;\3&lt;/artist&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;$INFO[mbsite]/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">id=&quot;([^&quot;]*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;/title&gt;&lt;status&gt;Official&lt;/status&gt;(?:&lt;packaging&gt;[^&lt;]*&lt;/packaging&gt;)?&lt;text-representation&gt;(?:&lt;language&gt;[^&lt;]*&lt;/language&gt;)*(?:&lt;script&gt;[^&lt;]*&lt;/script&gt;)*&lt;/text-representation&gt;&lt;artist-credit&gt;&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;([^&lt;]*)&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;(?:&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;[^&lt;]*&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;)*&lt;/artist-credit&gt;&lt;release-group\stype=&quot;Album&quot;\sid=&quot;[^&quot;]*&quot;&gt;(?:&lt;primary-type&gt;[^&lt;]*&lt;/primary-type&gt;)*(?:&lt;secondary-type-list&gt;(?:&lt;secondary-type&gt;[^&lt;]*&lt;/secondary-type&gt;)+&lt;/secondary-type-list&gt;)*&lt;/release-group&gt;&lt;date&gt;(\d{4})[^&lt;]*&lt;/date&gt;(?:&lt;country&gt;)*([^&lt;]*)?.*?&lt;track-list\scount=&quot;(\d+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\5-\4-T#\6&lt;/year&gt;&lt;artist&gt;\3&lt;/artist&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;$INFO[mbsite]/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="yes">id=&quot;([^&quot;]*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;/title&gt;&lt;status&gt;Official&lt;/status&gt;(?:&lt;packaging&gt;[^&lt;]*&lt;/packaging&gt;)?&lt;text-representation&gt;(?:&lt;language&gt;[^&lt;]*&lt;/language&gt;)*(?:&lt;script&gt;[^&lt;]*&lt;/script&gt;)*&lt;/text-representation&gt;&lt;artist-credit&gt;&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;([^&lt;]*)&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;(?:&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;[^&lt;]*&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;)*&lt;/artist-credit&gt;&lt;release-group(?:\stype=&quot;[^&quot;]*&quot;)*\sid=&quot;[^&quot;]*&quot;&gt;(?:&lt;primary-type&gt;[^&lt;]*&lt;/primary-type&gt;)*(?:&lt;secondary-type-list&gt;(?:&lt;secondary-type&gt;[^&lt;]*&lt;/secondary-type&gt;)+&lt;/secondary-type-list&gt;)*&lt;/release-group&gt;&lt;date&gt;(\d{4})[^&lt;]*&lt;/date&gt;(?:&lt;country&gt;)*([^&lt;]*)?.*?&lt;track-list\scount=&quot;(\d+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\6-\5-T#\7&lt;/year&gt;&lt;artist&gt;\4&lt;/artist&gt;&lt;title&gt;\3&lt;/title&gt;&lt;relevance scale=&quot;100&quot;&gt;\2&lt;/relevance&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;$INFO[mbsite]/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="yes">id=&quot;([^&quot;]*)&quot; ext:score=&quot;(\d*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;/title&gt;&lt;status&gt;Official&lt;/status&gt;(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;packaging&gt;[^&lt;]*&lt;/packaging&gt;)?&lt;text-representation&gt;(?:&lt;language&gt;[^&lt;]*&lt;/language&gt;)*(?:&lt;script&gt;[^&lt;]*&lt;/script&gt;)*&lt;/text-representation&gt;&lt;artist-credit&gt;&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;([^&lt;]*)&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;(?:&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;[^&lt;]*&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;)*&lt;/artist-credit&gt;&lt;release-group\sid=&quot;[^&quot;]*&quot;\stype=&quot;Album&quot;&gt;(?:&lt;primary-type&gt;[^&lt;]*&lt;/primary-type&gt;)*(?:&lt;secondary-type-list&gt;(?:&lt;secondary-type&gt;[^&lt;]*&lt;/secondary-type&gt;)+&lt;/secondary-type-list&gt;)*&lt;/release-group&gt;&lt;date&gt;(\d{4})[^&lt;]*&lt;/date&gt;(?:&lt;country&gt;)*([^&lt;]*)?.*?&lt;track-list\scount=&quot;(\d+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\6-\5-T#\7&lt;/year&gt;&lt;artist&gt;\4&lt;/artist&gt;&lt;title&gt;\3&lt;/title&gt;&lt;relevance scale=&quot;100&quot;&gt;\2&lt;/relevance&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;$INFO[mbsite]/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="yes">id=&quot;([^&quot;]*)&quot; ext:score=&quot;(\d*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;/title&gt;&lt;status&gt;Official&lt;/status&gt;(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;packaging&gt;[^&lt;]*&lt;/packaging&gt;)?&lt;text-representation&gt;(?:&lt;language&gt;[^&lt;]*&lt;/language&gt;)*(?:&lt;script&gt;[^&lt;]*&lt;/script&gt;)*&lt;/text-representation&gt;&lt;artist-credit&gt;&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;([^&lt;]*)&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;(?:&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;[^&lt;]*&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;)*&lt;/artist-credit&gt;&lt;release-group\sid=&quot;[^&quot;]*&quot;(?:\stype=&quot;[^&quot;]*&quot;)*&gt;(?:&lt;primary-type&gt;[^&lt;]*&lt;/primary-type&gt;)*(?:&lt;secondary-type-list&gt;(?:&lt;secondary-type&gt;[^&lt;]*&lt;/secondary-type&gt;)+&lt;/secondary-type-list&gt;)*&lt;/release-group&gt;&lt;date&gt;(\d{4})[^&lt;]*&lt;/date&gt;(?:&lt;country&gt;)*([^&lt;]*)?.*?&lt;track-list\scount=&quot;(\d+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\5-\4-T#\6&lt;/year&gt;&lt;artist&gt;\3&lt;/artist&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;$INFO[mbsite]/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="yes">id=&quot;([^&quot;]*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;/title&gt;&lt;status&gt;(?!Official)[^&lt;]*&lt;/status&gt;(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;packaging&gt;[^&lt;]*&lt;/packaging&gt;)?&lt;text-representation&gt;(?:&lt;language&gt;[^&lt;]*&lt;/language&gt;)*(?:&lt;script&gt;[^&lt;]*&lt;/script&gt;)*&lt;/text-representation&gt;&lt;artist-credit&gt;&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;([^&lt;]*)&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;(?:&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;[^&lt;]*&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;)*&lt;/artist-credit&gt;&lt;release-group\stype=&quot;Album&quot;\sid=&quot;[^&quot;]*&quot;&gt;(?:&lt;primary-type&gt;[^&lt;]*&lt;/primary-type&gt;)*(?:&lt;secondary-type-list&gt;(?:&lt;secondary-type&gt;[^&lt;]*&lt;/secondary-type&gt;)+&lt;/secondary-type-list&gt;)*&lt;/release-group&gt;&lt;date&gt;(\d{4})[^&lt;]*&lt;/date&gt;(?:&lt;country&gt;)*([^&lt;]*)?.*?&lt;track-list\scount=&quot;(\d+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\5-\4-T#\6&lt;/year&gt;&lt;artist&gt;\3&lt;/artist&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;$INFO[mbsite]/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="yes">id=&quot;([^&quot;]*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;/title&gt;&lt;status&gt;(?!Official)[^&lt;]*&lt;/status&gt;(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;packaging&gt;[^&lt;]*&lt;/packaging&gt;)?&lt;text-representation&gt;(?:&lt;language&gt;[^&lt;]*&lt;/language&gt;)*(?:&lt;script&gt;[^&lt;]*&lt;/script&gt;)*&lt;/text-representation&gt;&lt;artist-credit&gt;&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;([^&lt;]*)&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;(?:&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;[^&lt;]*&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;)*&lt;/artist-credit&gt;&lt;release-group(?:\stype=&quot;[^&quot;]*&quot;)*\sid=&quot;[^&quot;]*&quot;&gt;(?:&lt;primary-type&gt;[^&lt;]*&lt;/primary-type&gt;)*(?:&lt;secondary-type-list&gt;(?:&lt;secondary-type&gt;[^&lt;]*&lt;/secondary-type&gt;)+&lt;/secondary-type-list&gt;)*&lt;/release-group&gt;&lt;date&gt;(\d{4})[^&lt;]*&lt;/date&gt;(?:&lt;country&gt;)*([^&lt;]*)?.*?&lt;track-list\scount=&quot;(\d+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\6-\5-T#\7&lt;/year&gt;&lt;artist&gt;\4&lt;/artist&gt;&lt;title&gt;\3&lt;/title&gt;&lt;relevance scale=&quot;100&quot;&gt;\2&lt;/relevance&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;$INFO[mbsite]/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="yes">id=&quot;([^&quot;]*)&quot; ext:score=&quot;(\d*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;/title&gt;&lt;status&gt;(?!Official)[^&lt;]*&lt;/status&gt;(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;packaging&gt;[^&lt;]*&lt;/packaging&gt;)?&lt;text-representation&gt;(?:&lt;language&gt;[^&lt;]*&lt;/language&gt;)*(?:&lt;script&gt;[^&lt;]*&lt;/script&gt;)*&lt;/text-representation&gt;&lt;artist-credit&gt;&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;([^&lt;]*)&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;(?:&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;[^&lt;]*&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;)*&lt;/artist-credit&gt;&lt;release-group\sid=&quot;[^&quot;]*&quot;\stype=&quot;Album&quot;&gt;(?:&lt;primary-type&gt;[^&lt;]*&lt;/primary-type&gt;)*(?:&lt;secondary-type-list&gt;(?:&lt;secondary-type&gt;[^&lt;]*&lt;/secondary-type&gt;)+&lt;/secondary-type-list&gt;)*&lt;/release-group&gt;&lt;date&gt;(\d{4})[^&lt;]*&lt;/date&gt;(?:&lt;country&gt;)*([^&lt;]*)?.*?&lt;track-list\scount=&quot;(\d+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;\6-\5-T#\7&lt;/year&gt;&lt;artist&gt;\4&lt;/artist&gt;&lt;title&gt;\3&lt;/title&gt;&lt;relevance scale=&quot;100&quot;&gt;\2&lt;/relevance&gt;&lt;url cache=&quot;mb-\1-album.xml&quot;&gt;$INFO[mbsite]/ws/2/release/\1?inc=recordings+release-groups+artists+labels+ratings&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="yes">id=&quot;([^&quot;]*)&quot; ext:score=&quot;(\d*)&quot;&gt;&lt;title&gt;([^&lt;]*)&lt;/title&gt;&lt;status&gt;(?!Official)[^&lt;]*&lt;/status&gt;(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;packaging&gt;[^&lt;]*&lt;/packaging&gt;)?&lt;text-representation&gt;(?:&lt;language&gt;[^&lt;]*&lt;/language&gt;)*(?:&lt;script&gt;[^&lt;]*&lt;/script&gt;)*&lt;/text-representation&gt;&lt;artist-credit&gt;&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;([^&lt;]*)&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;(?:&lt;name-credit(?:&gt;)*(?:\sjoinphrase=&quot;[^&quot;]*&quot;&gt;)*(?:&lt;name&gt;[^&lt;]*&lt;/name)*(?:&gt;)*&lt;artist\sid=&quot;[^&quot;]*&quot;&gt;&lt;name&gt;[^&lt;]*&lt;/name&gt;(?:&lt;sort-name&gt;[^&lt;]*&lt;/sort-name&gt;)*(?:&lt;disambiguation&gt;[^&lt;]*&lt;/disambiguation&gt;)*(?:&lt;alias-list&gt;(?:&lt;alias[^&lt;]*&lt;/alias&gt;)*&lt;/alias-list&gt;)*&lt;/artist&gt;&lt;/name-credit&gt;)*&lt;/artist-credit&gt;&lt;release-group\sid=&quot;[^&quot;]*&quot;(?:\stype=&quot;[^&quot;]*&quot;)*&gt;(?:&lt;primary-type&gt;[^&lt;]*&lt;/primary-type&gt;)*(?:&lt;secondary-type-list&gt;(?:&lt;secondary-type&gt;[^&lt;]*&lt;/secondary-type&gt;)+&lt;/secondary-type-list&gt;)*&lt;/release-group&gt;&lt;date&gt;(\d{4})[^&lt;]*&lt;/date&gt;(?:&lt;country&gt;)*([^&lt;]*)?.*?&lt;track-list\scount=&quot;(\d+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetAlbumSearchResults>

I can try and PR if you like, although you will be quicker I'm sure.
I have raised a PR https://github.com/xbmc/repo-scrapers/pull/51, it may be easier to see the changes that way.

Also, before this I was never a big scraper user, but testing those changes has allowed me to look more closely at the library items that fail to scrape. Some are in the MB database but the XML parsing fails. Would you have time to look at what could be happening if I give you some examples?


RE: [Release] Universal Scraper for Music Albums - olympia - 2017-06-26

I am without my dev PC until the second half of the week, so I can only look at it in more detail then, but the changes make sense to me structurally (I certainly cannot validate the regexps, though).

Sure, just tell me the examples and I will look at those too.


RE: [Release] Universal Scraper for Music Albums - DaveBlake - 2017-06-26

Thanks @olympia

Examples of albums that are in the MB database (I tested the query url manually) but for which the scraper fails to parse the search results:
Don’t Get Me Wrong, Frances Black
The Best of Frances Black, Frances Black
The Sky Road, Frances Black
The Smile on Your Face, Frances Black
Spirits Colliding, Paul Brady
Treasure the Questions, Martyn Joseph
Last Look, Torcuato Mariano

Here are the links:
Code:
http://musicbrainz.org/ws/2/release/?fmt=xml&query=release:%22Don%27t%20Get%20Me%20Wrong%22%20AND%20(artistname:%22Frances%20Black%22%20OR%20artist:%22Frances%20Black%22)
http://musicbrainz.org/ws/2/release/?fmt=xml&query=release:%22The%20Best%20of%20Frances%20Black%22%20AND%20(artistname:%22Frances%20Black%22%20OR%20artist:%22Frances%20Black%22)
http://musicbrainz.org/ws/2/release/?fmt=xml&query=release:%22The%20Sky%20Road%22%20AND%20(artistname:%22Frances%20Black%22%20OR%20artist:%22Frances%20Black%22)
http://musicbrainz.org/ws/2/release/?fmt=xml&query=release:%22The%20Smile%20on%20Your%20Face%22%20AND%20(artistname:%22Frances%20Black%22%20OR%20artist:%22Frances%20Black%22)
http://musicbrainz.org/ws/2/release/?fmt=xml&query=release:%22Spirits%20Colliding%22%20AND%20(artistname:%22Paul%20Brady%22%20OR%20artist:%22Paul%20Brady%22)
http://musicbrainz.org/ws/2/release/?fmt=xml&query=release:%22Treasure%20the%20Questions%22%20AND%20(artistname:%22Martyn%20Joseph%22%20OR%20artist:%22Martyn%20Joseph%22)
http://musicbrainz.org/ws/2/release/?fmt=xml&query=release:%22Last%20Look%22%20AND%20(artistname:%22Torcuato%20Mariano%22%20OR%20artist:%22Torcuato%20Mariano%22)



RE: [Release] Universal Scraper for Music Albums - DaveBlake - 2017-06-29

Musicbrainz server 503 errors (throttling issues)
(2017-02-24, 21:52)Merwenus Wrote: Musicbrainz:
For example: if your requests are coming in at 4 requests per second, we don't honour 25% of them and decline the other 75% - we decline 100% of them, until the rate drops to 1 per second or lower.

And the scraper is an idiot, try to download 10/seconds instead of 1/1,5 seconds....
I replied:
(2017-02-25, 00:12)DaveBlake Wrote: I don't think it is the scraper doing more than 1 per sec. Probably an addon that is hammering Musicbrainz.
and
(2017-02-26, 10:02)DaveBlake Wrote: The scraper has a 1s sleep built into it, if you are sure there are more requests than that happening then it is from an addon.
Well, I was wrong. Sorry about that Merwenus, and everyone else.

Musicbrainz throttling (server 503 errors) does happen because of global server load, some other addons do make Musicbrainz requests, and during "Query info for all" Kodi does go rapidly through those albums that have already been scraped. But there is a longstanding flaw in the current scraper that means in some circumstances it also can make more than 1 request per sec to the Musicbrainz server.

One of those circumstances is having allmusic.com as one of the sources chosen in settings. An additional request is made to Musicbrainz to fetch the allmusic.com link for the album, and this request happens almost immediately after any lookup by album title and artist name, with no delay. The current scraper's 1s sleep only applies when doing a lookup by mbid (provided by music file tags).

Bottom line: when using allmusic.com, or when music files don't have Musicbrainz id tags, you will get IP address related 503 errors when scraping either from "Query info for all" or when scanning many new items with "Fetch online info on update" enabled. It could take many repeat attempts to fill the resulting gaps in artwork and information. Kodi running on a faster processor, e.g. an i7 rather than an RPi, will probably get more throttling (more 503 failures) because requests will be even closer together than on a slower processor.

It is something that I will attempt to fix, as it involves core changes (not just an XML tweak) to catch every time the scraper attempts to make a request to Musicbrainz.
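
To make the idea concrete, here is a minimal Python sketch of a throttle that enforces a minimum gap between consecutive requests, whatever code path triggers them. It is not the Kodi core change itself; the class name, the 1 second default and the injectable clock and sleep functions are assumptions for illustration only.

```python
import time


class MinIntervalThrottle:
    """Block so that consecutive wait() calls return at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the behaviour can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call wait() immediately before every request to the same server.
throttle = MinIntervalThrottle(min_interval=1.0)
throttle.wait()   # first call returns at once
```

The point of the design is that every request goes through the same gate, so an extra lookup (such as the allmusic.com link fetch) cannot slip in without a delay.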


RE: [Release] Universal Scraper for Music Albums - docwra - 2017-06-30

Cool, I always thought that was the case!

For info, MusicBrainz just upped the capacity of their API service a fair bit with new hardware, so it should be much more reliable globally now. In the past it wasn't always the rate limiter that caused those 503 errors.


RE: [Release] Universal Scraper for Music Albums - DaveBlake - 2017-06-30

Good news about the MB hardware Zag, and yes, the global limiter (just too much traffic) also leads to 503 errors.

Just found that the album scraper has also been making duplicate calls to Musicbrainz when the album mbid is known. It is a simple fix to the XML: getting ResolveIDToUrl to return a url cache filename like CreateAlbumSearchUrl does. It will halve the number of requests made when scraping collections with mbids.
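
As a rough illustration of why a shared cache filename removes the duplicate call, here is a small Python sketch. The class, the fake fetch function and the placeholder mbid are hypothetical; only the `mb-<id>-album.xml` naming pattern is taken from the scraper XML earlier in the thread.

```python
class CachedFetcher:
    """Fetch a URL once per cache name; requests without a name always hit the server."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable taking a URL and returning its body
        self._cache = {}

    def get(self, url, cache_name=None):
        if cache_name is None:
            return self._fetch(url)
        if cache_name not in self._cache:
            self._cache[cache_name] = self._fetch(url)
        return self._cache[cache_name]


calls = []


def fake_fetch(url):
    # Stand-in for a real HTTP request; records each call instead.
    calls.append(url)
    return "<metadata/>"


fetcher = CachedFetcher(fake_fetch)
url = "https://musicbrainz.org/ws/2/release/MBID"   # MBID is a placeholder

# Both lookups name the same cache file, so only one request is made.
fetcher.get(url, cache_name="mb-MBID-album.xml")
fetcher.get(url, cache_name="mb-MBID-album.xml")
print(len(calls))
```

Without the cache name on the first lookup, the same two calls would produce two requests.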

@olympia I have added this to https://github.com/xbmc/repo-scrapers/pull/51, hope that you can find some time to check it.


RE: [Release] Universal Scraper for Music Albums - olympia - 2017-06-30

(2017-06-26, 17:11)DaveBlake Wrote: Examples of albums in MB database (tested the query url manually) that the scraper doesn't parse the search results: [...]

These albums are without a year. The scraper currently expects the year, which is why they fail. I will need to see how to resolve this most efficiently.
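
A simplified Python sketch of the problem, assuming a much reduced result format: a pattern that requires a date silently drops releases without one, while wrapping the date in an optional group keeps them. The sample XML and both patterns are invented for this example and are far shorter than the real scraper expressions.

```python
import re

# Made-up, heavily simplified search results: one with a date, one without.
SAMPLE = (
    '<release id="a1"><title>With Year</title><date>1994-05-02</date></release>'
    '<release id="b2"><title>No Year</title></release>'
)

# Requires a <date> element, so the second release never matches.
STRICT = re.compile(
    r'<release id="([^"]*)"><title>([^<]*)</title><date>(\d{4})[^<]*</date>'
)

# Same pattern with the date wrapped in an optional non-capturing group.
TOLERANT = re.compile(
    r'<release id="([^"]*)"><title>([^<]*)</title>(?:<date>(\d{4})[^<]*</date>)?'
)

print(STRICT.findall(SAMPLE))
print(TOLERANT.findall(SAMPLE))
```

With the tolerant pattern the release without a date is still returned, with an empty string in the year position.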


RE: [Release] Universal Scraper for Music Albums - DaveBlake - 2017-07-01

(2017-06-30, 22:23)olympia Wrote: These albums are without year - currently the scraper expects the year, that's why - I will need to see how to resolve this the most efficiently
Well spotted @olympia :)
I stared at the data for ages (until regexp of xml melted my brain) and I couldn't see what the common element was.

Thanks also for checking and merging the scraper PR, now I just need to sort out the throttling in core.


RE: [Release] Universal Scraper for Music Albums - DaveBlake - 2017-07-01

Fix for Musicbrainz throttling raised https://github.com/xbmc/xbmc/pull/12402, and a Leia test build available here de3fa22ed7-TestScraping for anyone interested. It will avoid all those 503 errors, manual retries and unnecessary gaps in album and artist artwork and info.

I hope to backport the throttling fix to v17.4 too.


RE: [Release] Universal Scraper for Music Albums - olympia - 2017-07-02

(2017-07-01, 00:12)DaveBlake Wrote: I stared at the data for ages (until regexp of xml melted my brain) and I couldn't see what the common element was.

Try v2.7.3 - hopefully I've got that fixed.


RE: [Release] Universal Scraper for Music Albums - DaveBlake - 2017-07-02

Thanks @olympia, v2.7.3 fixes the albums that were being missed because the Musicbrainz entry did not have a year.


RE: [Release] Universal Scraper for Music Albums - Boulder - 2017-09-17

I have two issues with the addon:

1) If I use allmusic.com to fetch ratings, the points are between 1 and 5, so the number of stars shown is at most 2.5 in Krypton.

2) I'm unable to change the rating to be fetched from Musicbrainz. I tried removing my music source, cleaned the library, changed the addon settings so that Musicbrainz is used, and re-added the music source to scan. Allmusic.com was still set for all albums. If I try to change the information provider in the album list, it only affects the selected album and not all of my albums (even though I confirmed that I want to change all).

Can anyone throw some ideas on what to try to fix things?


RE: [Release] Universal Scraper for Music Albums - DaveBlake - 2018-01-10

1) I think the scraper needs to add the max rating of the data source (e.g. 5 for Allmusic) to the interim xml it creates, otherwise Kodi assumes the default max of 10. This is something that we need @olympia to deal with.

2) There is a bug in "Change all" that is fixed for Leia (actually the whole way music scraper settings are selectively applied has been reworked, and it is now easier to set the default). But I would have expected changing the addon settings, adding the source again etc. to have worked. I'll see if I can repeat this.

Also, Boulder, could you check what version of the addon you have installed?
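
To illustrate point 1), here is a small Python sketch of the arithmetic, assuming stars are shown on a five star display in proportion to rating divided by max rating. The function name and parameters are hypothetical and this is not Kodi's actual code.

```python
def stars_shown(rating, max_rating=10, star_count=5):
    """Convert a rating on a 0..max_rating scale to a number of stars."""
    return rating / max_rating * star_count


# A top Allmusic rating of 5 read against the assumed default max of 10:
print(stars_shown(5))                 # 2.5
# The same rating once the source's own max of 5 is supplied:
print(stars_shown(5, max_rating=5))   # 5.0
```

This matches the symptom reported above: without the source's max rating, a full score can never show more than half the stars.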


RE: [Release] Universal Scraper for Music Albums - Epg1 - 2018-01-11

Hi,
    I'm using Universal Scraper for Music Albums and I have a problem scraping a couple of *.flac + *.cue files. The scraper works but doesn't show the correct tracklist in the Kodi information window. Digging in the kodi.log, I see that the scraper finds the correct Artist and Album but doesn't pick the right edition of that album: same artist name, same album title but a different number of tracks. To be clear, the Musicbrainz release id it finds is f3d8c303-ab62-40e0-b4bd-e26d09052b02 (a vinyl edition) instead of the right one (a CD edition), which should be https://musicbrainz.org/release/df2d616a-8fd2-3ceb-b4df-05b3c199787c. The scraper doesn't show me any list to choose from and picks the wrong album edition automatically.

The cuesheet file (Perigeo - La Valle dei Templi.cue) content is:
Quote:
REM GENRE Jazz / Prog
REM DATE 1989
PERFORMER "Perigeo"
TITLE "La Valle dei Templi"
FILE "Perigeo - La Valle dei Templi.flac" WAVE
  TRACK 01 AUDIO
    TITLE "Tamale"
    PERFORMER "Perigeo"
    INDEX 00 00:00:00
    INDEX 01 00:00:30
  TRACK 02 AUDIO
    TITLE "La Valle dei Templi"
    PERFORMER "Perigeo"
    INDEX 00 04:31:30
    INDEX 01 04:35:60
  TRACK 03 AUDIO
    TITLE "Looping"
    PERFORMER "Perigeo"
    INDEX 00 10:47:60
    INDEX 01 10:53:30
  TRACK 04 AUDIO
    TITLE "Mistero della Firefly"
    PERFORMER "Perigeo"
    INDEX 00 13:57:46
    INDEX 01 14:00:55
  TRACK 05 AUDIO
    TITLE "Pensieri"
    PERFORMER "Perigeo"
    INDEX 00 19:57:15
    INDEX 01 20:01:25
  TRACK 06 AUDIO
    TITLE "Periplo"
    PERFORMER "Perigeo"
    INDEX 00 22:15:70
    INDEX 01 22:18:40
  TRACK 07 AUDIO
    TITLE "Eucalyptus"
    PERFORMER "Perigeo"
    INDEX 00 27:22:60
    INDEX 01 27:27:40
  TRACK 08 AUDIO
    TITLE "Alba di un Mondo"
    PERFORMER "Perigeo"
    INDEX 01 28:26:40
  TRACK 09 AUDIO
    TITLE "Cantilena"
    PERFORMER "Perigeo"
    INDEX 00 31:20:00
    INDEX 01 31:23:30
  TRACK 10 AUDIO
    TITLE "2000 e due Notti"
    PERFORMER "Perigeo"
    INDEX 00 35:19:45
    INDEX 01 35:24:15
  TRACK 11 AUDIO
    TITLE "Un cerchio Giallo"
    PERFORMER "Perigeo"
    INDEX 00 40:59:35
    INDEX 01 41:03:65
The flac file (Perigeo - La Valle dei Templi.flac) contains the following tags:
Quote:
Artist Name : Perigeo
Track Title :
Album Title : La Valle dei Templi
Date : 1989
Genre :
Composer :
Performer :
Album Artist :
Track Number :
Total Tracks : 11
Disc Number :
Total Discs : 1
Comment :
<MUSICBRAINZ_ALBUMID> : df2d616a-8fd2-3ceb-b4df-05b3c199787c
<ORIGINALDATE> : 1975
An excerpt from the kodi.log file (debugging enabled): kodi.log

In short, the Universal Album Scraper does not process the date tag in either the flac file or the cuesheet, so it does not pick the right album edition.

My hardware: Raspberry Pi 2 / OSMC / Kodi 17.6. Files are stored on a locally attached USB hard disk (I have the same problem using Windows 7 64 bit and Kodi 17.6).

What can I do to solve this problem, apart from creating a *.nfo file?
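
As a sketch of the kind of selection being asked for, the Python below prefers a release whose id matches the MUSICBRAINZ_ALBUMID tag and otherwise ranks candidates by matching track count and year. This is a hypothetical illustration, not how the scraper works today; the function, the dictionary keys and the candidate values (placeholder ids and a made-up vinyl track count) are all invented.

```python
def pick_release(candidates, album_mbid=None, total_tracks=None, year=None):
    """Pick the candidate release that best matches the file's tags."""
    # A tagged album mbid identifies the edition exactly, so it wins outright.
    if album_mbid:
        for candidate in candidates:
            if candidate["id"] == album_mbid:
                return candidate

    # Otherwise prefer a matching track count, then a matching year.
    def score(candidate):
        return (
            candidate.get("track_count") == total_tracks,
            candidate.get("year") == year,
        )

    return max(candidates, key=score) if candidates else None


# Placeholder candidates; the vinyl track count is made up for the example.
candidates = [
    {"id": "vinyl-edition", "track_count": 8, "year": 1975},
    {"id": "cd-edition", "track_count": 11, "year": 1989},
]

print(pick_release(candidates, total_tracks=11, year=1989)["id"])
```

With the tags from the files above (11 tracks, date 1989), this sketch would select the CD edition rather than the first result.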

Thank you in advance :)

Epg1

