Some improvements for lyricsmode
#1
Hi,

I am currently using the script cu.lyrics with lyricsmode and I am very happy with it. However, it sometimes fails to find lyrics that I can find on the website.

I slightly modified the scraper code so that it falls back to the lyricsmode search box when the direct URL guess doesn't work.
The guess often fails when the song or artist is not written exactly the way lyricsmode stores it in its database (the cranberries vs cranberries), or because the name contains special characters (k's choice).
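
To illustrate the two lookups, here is a minimal Python 3 sketch (not the scraper's actual code). The URL patterns are the ones used in the patch at the end of this post and may no longer match the site; the names `direct_url` and `search_url` are made up for the example.

```python
from urllib.parse import quote_plus

BASE = "http://www.lyricsmode.com"  # site root as used in the patch


def direct_url(artist, title):
    """Guess the song page URL from the tags, as the scraper does first."""
    artist = artist.lower()
    title = title.lower()
    return "%s/lyrics/%s/%s/%s.html" % (
        BASE,
        artist[:1],                # first letter of the artist name
        artist.replace(" ", "_"),  # spaces become underscores
        title.replace(" ", "_"),
    )


def search_url(title):
    """Build the search-box URL used as a fallback (title only)."""
    return BASE + "/search.php?what=songs&s=" + quote_plus(title.lower())


print(direct_url("The Cranberries", "Zombie"))
print(search_url("Zombie"))
```

The guessed URL depends on the exact spelling of the artist tag, so "The Cranberries" and "Cranberries" produce different paths, while the search URL depends only on the title.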

The modifications I applied are shown at the end of this post (I couldn't find a way to attach a file). Would it be possible to commit them to the cu.lyrics code?

Thanks in advance,

Yann


Code:
diff -ur script.cu.lyrics.orig/resources/lib/scrapers/lyricsmode/lyricsScraper.py script.cu.lyrics/resources/lib/scrapers/lyricsmode/lyricsScraper.py
--- script.cu.lyrics.orig/resources/lib/scrapers/lyricsmode/lyricsScraper.py    2012-04-01 20:13:54.106691515 +0200
+++ script.cu.lyrics/resources/lib/scrapers/lyricsmode/lyricsScraper.py    2012-04-08 23:33:00.699950122 +0200
@@ -139,6 +139,8 @@
         self.clean_lyrics_regex = re.compile( "<.+?>" )
         self.normalize_lyrics_regex = re.compile( "&#[x]*(?P<name>[0-9]+);*" )
         self.clean_br_regex = re.compile( "<br[ /]*>[\s]*", re.IGNORECASE )
+        self.search_results_regex = re.compile("<a href=\"[^\"]+\">([^<]+)</a></td>[^<]+<td><a href=\"([^\"]+)\" class=\"b\">[^<]+</a></td>", re.IGNORECASE)
+        self.next_results_regex = re.compile("<A href=\"([^\"]+)\" class=\"pages\">next .</A>", re.IGNORECASE)
    
     def get_lyrics_start(self, *args):
         lyricThread = threading.Thread(target=self.get_lyrics_thread, args=args)
@@ -151,8 +154,36 @@
         l.song = song
         try: # below is borowed from XBMC Lyrics
             url = "http://www.lyricsmode.com/lyrics/%s/%s/%s.html" % (song.artist.lower()[:1],song.artist.lower().replace(" ","_"), song.title.lower().replace(" ","_"), )
-            print "Search url: %s" % (url)
-            song_search = urllib.urlopen(url).read()
+
+            while True:
+                print "Search url: %s" % (url)
+                song_search = urllib.urlopen(url).read()
+                if song_search.find("<div id='songlyrics_h' class='dn'>") >= 0:
+                        break
+
+                # Let's try to use the research box if we didn't yet
+                if not 'search' in url:
+                    url = "http://www.lyricsmode.com/search.php?what=songs&s=" + urllib.quote_plus(song.title.lower())
+                else:
+                    # the search gave several results, let's try to find our song
+                    url = ""
+                    start = song_search.find('<!--output-->')
+                    end = song_search.find('<!--/output-->', start)
+                    results = self.search_results_regex.findall(song_search, start, end)
+
+                    for result in results:
+                        if result[0].lower() in song.artist.lower():
+                            url = "http://www.lyricsmode.com" + result[1]
+                            break
+
+                    if not url:
+                        # Is there a next page of results ?
+                        match = self.next_results_regex.search(song_search[end:])
+                        if match:
+                            url = "http://www.lyricsmode.com/search.php" + match.group(1)
+                        else:
+                            return None, "No lyrics found"
+
             lyr = song_search.split("<div id='songlyrics_h' class='dn'>")[1].split('<!-- /SONG LYRICS -->')[0]
             lyr = self.clean_br_regex.sub( "\n", lyr ).strip()
             lyr = self.clean_lyrics_regex.sub( "", lyr ).strip()
diff -ur script.cu.lyrics.orig/resources/lib/song.py script.cu.lyrics/resources/lib/song.py
--- script.cu.lyrics.orig/resources/lib/song.py    2012-04-01 20:13:54.158691515 +0200
+++ script.cu.lyrics/resources/lib/song.py    2012-04-08 16:56:32.617536591 +0200
@@ -30,7 +30,9 @@
     def current():
         song = Song()
         song.title = xbmc.getInfoLabel( "MusicPlayer.Title" )
+        song.title = utilities.deAccent(song.title)
         song.artist = xbmc.getInfoLabel( "MusicPlayer.Artist")
+        song.artist = utilities.deAccent(song.artist)
        
         print "Current Song: %s:%s" % (song.artist, song.title)
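
The artist check in the patch above is a plain substring test, and the accent handling relies on `utilities.deAccent`, whose implementation is not shown in this thread. The Python 3 sketch below mimics both under stated assumptions: `strip_accents` is a hypothetical stand-in for `deAccent`, `pick_result` is an illustrative name, and the sample search results are invented.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical stand-in for utilities.deAccent: drop accent marks."""
    # NFKD splits e.g. "é" into "e" plus a combining accent character.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def pick_result(results, artist):
    """Return the path of the first result whose artist is contained in `artist`.

    results -- (artist_name, path) pairs, the shape search_results_regex captures
    """
    wanted = artist.lower()
    for result_artist, path in results:
        # Same direction as the patch: result name must be inside the tag.
        if result_artist.lower() in wanted:
            return path
    return None


# Invented sample data, for illustration only.
results = [
    ("Bad Wolves", "/lyrics/b/bad_wolves/zombie.html"),
    ("Cranberries", "/lyrics/c/cranberries/zombie.html"),
]
print(pick_result(results, strip_accents("The Cranberries")))
```

Note that the test only works in one direction: a result listed as "The Cranberries" would not match a player tag of "Cranberries".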
#2
I'll make sure the right guys see it. Thanks for helping out :)
System: XBMC HTPC with HDMI WASAPI & AudioEngine - Denon  AVR-3808CI  - Denon DVD-5900 Universal Player  - Denon DCM-27 CD-Changer
- Sony BDP-S580 Blu-Ray  - X-Box 360  - Android tablet wireless remote - 7.1 Streem/Axiom/Velodyne Surround System
If I have been able to help feel free to add to my reputation +/- below - thanks!
#3
Please check that https://github.com/amet/script.cu.lyrics...d9b70f81e0 is fine before it goes out; I had to apply the patch manually.


@chninkel
If you want to, next time just submit a pull request on GitHub and I'll get it in.
#4
Amet's the right guy :)

I'll test and ping you in IRC for a blessing and update the thread here for chninkel.
#5
Yes, it needs a little testing to make sure there is no problem.
I already have a small follow-up change that handles the case where the song page exists but, for some reason, doesn't contain the lyrics.

Code:
--- lyricsScraper.py.orig    2012-04-09 11:18:15.668261004 +0200
+++ lyricsScraper.py    2012-04-09 11:18:59.944261331 +0200
@@ -154,12 +154,18 @@
         l.song = song
         try: # below is borowed from XBMC Lyrics
             url = "http://www.lyricsmode.com/lyrics/%s/%s/%s.html" % (song.artist.lower()[:1],song.artist.lower().replace(" ","_"), song.title.lower().replace(" ","_"), )
+            lyrics_found = False
             while True:
                 print "Search url: %s" % (url)
                 song_search = urllib.urlopen(url).read()
                 if song_search.find("<div id='songlyrics_h' class='dn'>") >= 0:
-                        break
+                    break

+                if lyrics_found:
+                    # if we're here, we found the lyrics page but it didn't
+                    # contain the lyrics part (licensing issue or some bug)
+                    return None, "No lyrics found"
+                    
                 # Let's try to use the research box if we didn't yet
                 if not 'search' in url:
                     url = "http://www.lyricsmode.com/search.php?what=songs&s=" + urllib.quote_plus(song.title.lower())
@@ -173,6 +179,7 @@
                     for result in results:
                         if result[0].lower() in song.artist.lower():
                             url = "http://www.lyricsmode.com" + result[1]
+                            lyrics_found = True
                             break

                     if not url:
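
Reading the first patch, a matched song page that lacks the lyrics block appears to send the loop back to the search box again, which is what the `lyrics_found` flag prevents. The Python 3 sketch below gets the same guarantee a different way, by bounding the number of fetches instead of using a flag. It is not the scraper's code: `find_lyrics_page` and the injected `fetch` callable are hypothetical names, and only the marker string is taken from the patch.

```python
LYRICS_MARKER = "<div id='songlyrics_h' class='dn'>"  # marker used in the patch


def find_lyrics_page(fetch, direct_url, search_hit_url):
    """Return the HTML of a page that contains lyrics, or None.

    fetch          -- hypothetical callable: URL in, HTML text out
    direct_url     -- the guessed song URL
    search_hit_url -- URL picked from the search results, or None if no match
    """
    for url in (direct_url, search_hit_url):
        if url is None:
            continue
        page = fetch(url)
        if LYRICS_MARKER in page:
            return page
    # Nothing matched, or the matched page had no lyrics block
    # (licensing issue or some bug): give up instead of searching again.
    return None


# Usage with a fake fetcher so the sketch runs without network access.
pages = {
    "direct": "<html>not found</html>",
    "hit": "<html>page exists, lyrics removed</html>",
}
print(find_lyrics_page(pages.get, "direct", "hit"))
```

Each URL is fetched at most once, so a song page without lyrics ends the lookup rather than restarting it.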

@amet: I will try to use git if I have more modifications to submit. I would like to propose some changes so that the lyrics code works with radio streams, where the title tag often contains both the artist and the song title.
#6
Yeah, please use GitHub and submit the pull request there; it makes it much easier to review and incorporate. This patch doesn't apply cleanly on my side, so it has to be done manually.

Thanks for the fixes and help :)
#7
@chninkel: amet will push to git. Thanks for your work! As he mentions, go through the pull-request system on GitHub, and be sure to test all you can before submitting a PR.

You've now officially contributed to XBMC :)