chninkel · 2012-04-08, 23:45

Hi,

I am currently using the cu.lyrics script with lyricsmode and I am very happy with it. However, it sometimes fails to find lyrics that I can find on the website.

I slightly modified the scraper code so that it falls back to the lyricsmode search box when the direct URL guess doesn't work.
That often happens when the song or artist is not written exactly as lyricsmode stores it in its database (the cranberries vs cranberries), or because the name contains special characters (k's choice).
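
To illustrate why the direct guess is fragile, here is a minimal sketch of the two URLs involved. It mirrors the format strings used in the patch, but it is written for Python 3 (`urllib.parse`) rather than the Python 2 the add-on uses, and the helper names `guess_url` and `search_url` are hypothetical.

```python
from urllib.parse import quote_plus

BASE = "http://www.lyricsmode.com"


def guess_url(artist, title):
    """Build the direct song URL the same way the scraper's format string does."""
    a = artist.lower()
    return "%s/lyrics/%s/%s/%s.html" % (
        BASE,
        a[:1],                             # first letter of the artist
        a.replace(" ", "_"),               # artist with spaces as underscores
        title.lower().replace(" ", "_"),   # title with spaces as underscores
    )


def search_url(title):
    """Build the fallback search URL from the song title only."""
    return BASE + "/search.php?what=songs&s=" + quote_plus(title.lower())


# The same song yields two different guesses depending on the tag spelling.
print(guess_url("The Cranberries", "Zombie"))
print(guess_url("Cranberries", "Zombie"))
print(search_url("Not an Addict"))
```

Only one of the two guesses can match the path the site actually uses, which is why a search fallback helps.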

The modifications I applied are shown at the end of this post (I couldn't find a way to attach a file). Would it be possible to commit them to the cu.lyrics code?

Thanks in advance,

Yann

Code:
diff -ur script.cu.lyrics.orig/resources/lib/scrapers/lyricsmode/lyricsScraper.py script.cu.lyrics/resources/lib/scrapers/lyricsmode/lyricsScraper.py
--- script.cu.lyrics.orig/resources/lib/scrapers/lyricsmode/lyricsScraper.py    2012-04-01 20:13:54.106691515 +0200
+++ script.cu.lyrics/resources/lib/scrapers/lyricsmode/lyricsScraper.py    2012-04-08 23:33:00.699950122 +0200
@@ -139,6 +139,8 @@
         self.clean_lyrics_regex = re.compile( "<.+?>" )
         self.normalize_lyrics_regex = re.compile( "&#[x]*(?P<name>[0-9]+);*" )
         self.clean_br_regex = re.compile( "<br[ /]*>[\s]*", re.IGNORECASE )
+        self.search_results_regex = re.compile("<a href=\"[^\"]+\">([^<]+)</a></td>[^<]+<td><a href=\"([^\"]+)\" class=\"b\">[^<]+</a></td>", re.IGNORECASE)
+        self.next_results_regex = re.compile("<A href=\"([^\"]+)\" class=\"pages\">next .</A>", re.IGNORECASE)
     def get_lyrics_start(self, *args):
         lyricThread = threading.Thread(target=self.get_lyrics_thread, args=args)
@@ -151,8 +154,36 @@
         l.song = song
         try: # below is borowed from XBMC Lyrics
             url = "http://www.lyricsmode.com/lyrics/%s/%s/%s.html" % (song.artist.lower()[:1],song.artist.lower().replace(" ","_"), song.title.lower().replace(" ","_"), )
-            print "Search url: %s" % (url)
-            song_search = urllib.urlopen(url).read()
+
+            while True:
+                print "Search url: %s" % (url)
+                song_search = urllib.urlopen(url).read()
+                if song_search.find("<div id='songlyrics_h' class='dn'>") >= 0:
+                        break
+
+                # Let's try to use the research box if we didn't yet
+                if not 'search' in url:
+                    url = "http://www.lyricsmode.com/search.php?what=songs&s=" + urllib.quote_plus(song.title.lower())
+                else:
+                    # the search gave several results, let's try to find our song
+                    url = ""
+                    start = song_search.find('<!--output-->')
+                    end = song_search.find('<!--/output-->', start)
+                    results = self.search_results_regex.findall(song_search, start, end)
+
+                    for result in results:
+                        if result[0].lower() in song.artist.lower():
+                            url = "http://www.lyricsmode.com" + result[1]
+                            break
+
+                    if not url:
+                        # Is there a next page of results ?
+                        match = self.next_results_regex.search(song_search[end:])
+                        if match:
+                            url = "http://www.lyricsmode.com/search.php" + match.group(1)
+                        else:
+                            return None, "No lyrics found"
+
             lyr = song_search.split("<div id='songlyrics_h' class='dn'>")[1].split('<!-- /SONG LYRICS -->')[0]
             lyr = self.clean_br_regex.sub( "\n", lyr ).strip()
             lyr = self.clean_lyrics_regex.sub( "", lyr ).strip()
diff -ur script.cu.lyrics.orig/resources/lib/song.py script.cu.lyrics/resources/lib/song.py
--- script.cu.lyrics.orig/resources/lib/song.py    2012-04-01 20:13:54.158691515 +0200
+++ script.cu.lyrics/resources/lib/song.py    2012-04-08 16:56:32.617536591 +0200
@@ -30,7 +30,9 @@
     def current():
         song = Song()
         song.title = xbmc.getInfoLabel( "MusicPlayer.Title" )
+        song.title = utilities.deAccent(song.title)
         song.artist = xbmc.getInfoLabel( "MusicPlayer.Artist")
+        song.artist = utilities.deAccent(song.artist)
         print "Current Song: %s:%s" % (song.artist, song.title)
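
The patch to song.py calls `utilities.deAccent`, whose body is not shown in this thread. As a rough idea of what such a helper can do, here is a minimal Python 3 sketch that strips accents via Unicode decomposition. The name `de_accent` is hypothetical and this is not necessarily how the add-on implements it.

```python
import unicodedata


def de_accent(text):
    """Return text with combining accent marks removed.

    NFKD normalization splits an accented letter into its base letter
    plus combining marks; the marks are then filtered out.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(de_accent("Beyoncé"))
print(de_accent("Sigur Rós"))
```

Letters that have no decomposition are left unchanged by this approach, so it only helps with characters that are a base letter plus an accent.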

DDDamian · 2012-04-09, 00:29

I'll make sure the right guys see it. Thanks for helping out :)

amet · 2012-04-09, 07:51

Please check that this is fine: https://github.com/amet/script.cu.lyrics...d9b70f81e0 before it goes out. I had to apply the patch manually.

@chninkel
If you want to, next time just submit a pull request on GitHub and I'll get it in.

DDDamian · 2012-04-09, 09:31

Amet's the right guy :)

I'll test and ping you in IRC for a blessing and update the thread here for chninkel.

chninkel · 2012-04-09, 11:25

Yes, it needs a little testing to be sure there is no problem.
I already have a small modification that avoids a problem when the song page exists but, for some reason, doesn't contain the lyrics.

Code:
--- lyricsScraper.py.orig    2012-04-09 11:18:15.668261004 +0200
+++ lyricsScraper.py    2012-04-09 11:18:59.944261331 +0200
@@ -154,12 +154,18 @@
         l.song = song
         try: # below is borowed from XBMC Lyrics
             url = "http://www.lyricsmode.com/lyrics/%s/%s/%s.html" % (song.artist.lower()[:1],song.artist.lower().replace(" ","_"), song.title.lower().replace(" ","_"), )
+            lyrics_found = False
             while True:
                 print "Search url: %s" % (url)
                 song_search = urllib.urlopen(url).read()
                 if song_search.find("<div id='songlyrics_h' class='dn'>") >= 0:
-                        break
+                    break
+                if lyrics_found:
+                    # if we're here, we found the lyrics page but it didn't
+                    # contains the lyrics part (licensing issue or some bug)
+                    return None, "No lyrics found"
+
                 # Let's try to use the research box if we didn't yet
                 if not 'search' in url:
                     url = "http://www.lyricsmode.com/search.php?what=songs&s=" + urllib.quote_plus(song.title.lower())
@@ -173,6 +179,7 @@
                     for result in results:
                         if result[0].lower() in song.artist.lower():
                             url = "http://www.lyricsmode.com" + result[1]
+                            lyrics_found = True
                             break
                     if not url:
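
The control flow of the patched loop can be hard to follow in diff form, so here is a simplified Python 3 sketch of it. It is not the add-on's code: the page fetcher is passed in as a function so the example runs offline, `RESULT_RE` is a simplified stand-in for `search_results_regex`, pagination of search results is left out, and `find_lyrics_page` and `followed_result` are hypothetical names (the latter plays the role of `lyrics_found`).

```python
import re

LYRICS_MARKER = "<div id='songlyrics_h' class='dn'>"
BASE = "http://www.lyricsmode.com"
# Simplified stand-in for search_results_regex: (artist, song path) pairs.
RESULT_RE = re.compile(r'<td>([^<]+)</td><td><a href="([^"]+)">')


def find_lyrics_page(fetch, artist, direct_url, search_url):
    """Return the HTML of a page that contains lyrics, or None."""
    url = direct_url
    followed_result = False
    while True:
        page = fetch(url)
        if LYRICS_MARKER in page:
            return page
        if followed_result:
            # We reached the song page picked from the search results,
            # but it has no lyrics block: stop instead of searching again.
            return None
        if url != search_url:
            url = search_url  # direct guess failed, fall back to the search
            continue
        for result_artist, path in RESULT_RE.findall(page):
            if result_artist.lower() in artist.lower():
                url = BASE + path
                followed_result = True
                break
        else:
            return None  # no search result matches the artist


# Offline demo: the song page exists but carries no lyrics block.
song_url = BASE + "/lyrics/c/cranberries/zombie.html"
pages = {
    "direct": "not found",
    "search": '<td>Cranberries</td><td><a href="/lyrics/c/cranberries/zombie.html">Zombie</a></td>',
    song_url: "<p>Lyrics unavailable</p>",
}
result = find_lyrics_page(lambda u: pages.get(u, ""), "The Cranberries", "direct", "search")
print(result)  # None
```

Without the flag, a song page lacking the lyrics block would send the loop back to the search step, which is the situation the second patch guards against.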

@amet: I will try to use git if I have more modifications to submit. I would also like to propose some changes so that the lyrics code works with radio streams, where the song title field often contains both the artist and the title.
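
A minimal sketch of that idea, assuming the stream puts "Artist - Title" in the title field with " - " as the separator. Both the separator and the helper name `split_radio_title` are assumptions made for illustration, and real streams may format this differently.

```python
def split_radio_title(title, separator=" - "):
    """Split 'Artist - Title' into (artist, title).

    Returns (None, title) when the separator is missing or one side is
    empty, so the caller can keep using the regular artist label.
    """
    artist, sep, song = title.partition(separator)
    if not sep or not artist.strip() or not song.strip():
        return None, title.strip()
    return artist.strip(), song.strip()


print(split_radio_title("The Cranberries - Zombie"))
print(split_radio_title("Zombie"))
```

Splitting on the first separator only keeps titles that themselves contain " - " intact on the title side.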

amet · (This post was last modified: 2012-04-09, 11:56 by amet.)

Yeah, please use GitHub and submit a pull request there; it makes changes much easier to review and incorporate. This patch doesn't apply cleanly on my side, so it has to be applied manually.

Thanks for the fixes and help :)

DDDamian · 2012-04-10, 17:04

@chninkel - amet will push to git. Thanks for your work! As he mentions, go through the pull-request system on GitHub, and be sure to test all you can before submitting a PR.

You've now officially contributed to XBMC :)