Find on Page - mikey1234 - 2012-08-10
How do I find something in a page? Basically I'm scraping an .nfoview for easynews, which has all the information about language, DTS, and so on.
But all the nfoviews are different. What I would like to do is:
search the page
find English, German, French, and so on
if found, print the results. Basically it's just so people don't click on an MKV and find it's in the wrong language.
The info view looks similar to this:
Code: ÞÛÛ²² ° ÜÛÛ²²ßÜÛÛ²ÜÛÛÛÛÛ²° ÛÛÛÛ ÞÛÛÝÛÛ²²Ý ÛÛÛÛ ÞÛÛݲÛÛ²± ÜÜÞÜÜþ °°°°°
ÛÛÛ²² ßÜÞÛܱ ÞÛÛ²²Ý±²ÛßÛÛÛßÛÛÛ²± ÛÛÛÛÜÛ²ß ÛÛÛ²Ý ÛÛÛÛÜÛÛß ÛÛÛ²Ü ßßÞ²²° ²²²²²
ÛÛÛ²² ÝÛß ß² ÞÛÛ²²Ý²ÛÛ ßßß ÞÛÛ²±°ÛÛ²²° Ü ÞÛÛ²² ÛÛ²²ÛÛÛÛÛÜÜßÛÛ²²ÜÜ ßß ÛÛÛÛß
ÛÛÛ²²Ü ßßܲ²Ý ÛÛÛ²² ßß ÜÜÜ ßßßß ßßßß ÞÛ² ßß ßßßß ßßÛÛÛÛÛßßÛ²²²²ÛÜÜÛÛÛÝ
ßÛÛÛ²²ÜÜ ßß ÛÛÛ²² °²ß ßÛÛÛÛÜÜÜÝÜÜÜÛ²²±ÛÜÜÜÜÜÛÛÛÜÜÜÜÜ ßßÛ Ü ßßß²²²²²²²Ü
ßßÛÛ²²²²²ÜÜÜßÛÛÛ²² Ûß ßßßÝß ßß ° ß ßß²ÛßÛÜÜÝ Ü²²ÜÜÞܱ
Ü þßßßÛÛ²²²²²ÛÛ²²ÝÝ ° ß ± °Ü
Ü °²ÜÜÜ ßßßÛÛ²²ÝÝ OBEY THE EMPiRE, UNDERLING. ° ° Ü
Ü²ß ÛßÞÜÛ²° ÛÛÛÛÛÝ ° ß²Ü
ÞÛÝÜ Ü ² Û°° ßßß ÜÞÛÝ
ßÛÝ ß²ß ° Û Real Steel (2011) ÞÛß
ܲÜÛß Ü ² Ü ßÛܲÜ
ß ÛÛÜ þ ° þ ÜÛÛ ß
ÛßÛ Ü Release date .............: 27.03.2012 Ü ÛßÛ
Û°ÛÛÝ BluRay date ..............: 12.04.2012 ÞÛÛ°Û
Û±Ûß²Ü Cinema date ..............: 03.11.2011 ܲßÛ±Û
Û²Û Ü ß Runtime ..................: 127 minutes ß Ü Û²Û
Û²Ûß²ß Genre ....................: Action ß²ßÛ²Û
ÛÛÛÜ Subtitles.................: German, English VOB ÜÛÛÛ
ÛÛÛ ÜÜ Source ...................: BluRay ÜÜ ÛÛÛ
ÛÛÛ ß²²Ü Format ...................: x264 ܲ²ß ÛÛÛ
ÛÛÛ ß²²Ü Video ....................: [ ] Untouched ܲ²ß ÛÛÛ
ÛÛÛ ß²²Ü [X] Reencoded ܲ²ß ÛÛÛ
ÛÛÛ ßÜÞÛÛ Language .................: [ ] German DD 5.1 ÛÛÝÜß ÛÛÛ
ÛÛÛ Þ²Ûß [X] German DTS ÛÛÝÜß ÛÛÛ
ÛÛÛ Þ²Ûß [ ] Englisch DD 5.1 ÛÛÝÜß ÛÛÛ
ÛÛÛ Þ²Ûß [X] Englisch DTS ÛÛÝÜß ÛÛÛ
ÛÛÛ Þ²Ûß Extras ...................: [ ] Untouched ßÛ²Ý ÛÛÛ
ÛÛÛ Þ²Ûß [X] None ßÛ²Ý ÛÛÛ
ÛÜÛ þ Disks ....................: 53 * 100 MB þ ÛÜÛ
ÛÛßÜß ßÜßÛÛ
ÛÛÛÝ Ü iMDB......................: 7.2/10 (81549) Ü ÞÛÛÛ
ÛÛÛ Ü http://www.imdb.com/title/tt0433035/ Ü ÛÛÛ
ÞÛÛÜÛß Ü Ü ßÛÜÛÛÝ
ß²²Ý ²Ý Þ² Þ²²ß
ßÛÜßÛÜ Ü Ü ÜÛßÜÛß
ÜßÞÛÝþ ²ßÜ Ü °° Ü °° Ü Üß² þÞÛÝßÜ
ÞÝ ß²Ü ° Ü²ß ß ²² ÜßÜ ß±ß ÜßÜ ²² ß ß²Ü ° Ü²ß ÞÝ
ÛÜ ß ÜÜ° ÞÛÝÜ Ü²Ü ÞÛÝ ßÜ Ü Üß ÞÛÝ Ü²Ü ÜÞÛÝ °ÜÜ ß ÜÛ
² ß þ ßßßÞß²Ý ß Ü ß²ÜÜ ÞÛÜÛÛ²ÜÛÝ ÜÜ²ß Ü ß Þ²ßÝßßß þ ß ²
± Ü Þ ß Ü ßß²²Û ÛÛ²²²²Û Û²²ßß Ü ß Ý Ü ±
° Ü ÜÜ ß þ ²ÜÜÜÜÜÜÜÜܲ þ ß ÜÜ Ü °
Ü Ü²²ß ß²²Ü Ü
Ü²ß ÞÛÛÝÜß ßÜÞÛÛÝ ß²Ü
ÞÛÝÜ ßÛ²Ý M O V I E P L O T Þ²Ûß ÜÞÛÝ
ÞßÛÝ °° ß Ü Ü ß °° ÞÛßÝ
ÞÜ°ß Ü Ü ß°ÜÝ
ßÛß²Ü þ þ ܲßÛß
Ü Ü² Ü Ü ²Ü Ü
ÛÛÛÛ Ü ß http://www.cinefacts.de/blu-ray-film/ ß Ü ÛÛÛÛ
ÛÛÛÛß²Ü 69522-real-steel.html ܲßÛÛÛÛ
ÛÛÛÛ Ü ß ß Ü ÛÛÛÛ
ÛÛÛÛß²ß ß²ßÛÛÛÛ
ÛÛÛÛ ÛÛÛÛ
ÛÛÛÛ ÛÛÛÛ
ÛÛÛÛ I.N.F.O ÛÛÛÛ
ÛÛÛÛ ÛÛÛÛ
ÛÛÛÛ 1280x544 @ crf20 (2533kbps) ÛÛÛÛ
ÞÛÛÜ Ü German DTS @ 1509 kbps Ü ÜÛÛÝ
ß²²Ý Ü English DTS @ 1509 kbps Ü Þ²²ß
also like this
Code: *******************************************************************************
Real Steel
*******************************************************************************
-------------------------------------------------------------------------------
General Information
-------------------------------------------------------------------------------
Type.................: Movie
Platform.............: windows vista
Part Size............: 200,000,000 bytes
Compression Format...: RAR
File Validation......: SFV
Year.................: 12
Type.................: German
Duration.............: 122
Cover(s) Included....: Yes
Audio Format.........: Dolby Digital
Encoder..............: AC3 5.1
Bitrate..............: 256
Hz...................: 48,000
Channels.............: 5,1
Source...............: DVDRip
Video Format.........: MKV Xvid
Video Bitrate........: 2500Kbps
Resolution...........: 1280x720
FPS..................: 29,97
Source...............: DVD 16x9
Original Format......: PAL
Genre................: Action/Abenteuer
IMDb Rating..........: 9.5
-------------------------------------------------------------------------------
Post Information
-------------------------------------------------------------------------------
Posted by............: Gollum fuer usenet
Posted on............: 22.04.2012
-------------------------------------------------------------------------------
Generated with Cool NFO Creator - http://fly.to/coolbeans
-------------------------------------------------------------------------------
RE: Find on Page - sphere - 2012-08-10
There are different ways; the best would be to use regular expressions, because then you can search case-insensitively.
Code: import re
text = ' fooo asasa germAn baar' # replace with your nfoview content
if re.search('german', text, re.IGNORECASE):
    print 'german found'
if re.search('english', text, re.IGNORECASE):
    print 'english found'
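The same check can also be written as a loop over a list of keywords, so the `if` does not have to be repeated for each language. This is only a minimal sketch: the helper name `find_keywords` and the sample text are made up for illustration, and it uses `print()` so it runs on newer Python versions too.

```python
import re

def find_keywords(text, keywords):
    """Return the keywords that occur in text, ignoring case."""
    found = []
    for word in keywords:
        # re.escape keeps characters such as '.' literal
        if re.search(re.escape(word), text, re.IGNORECASE):
            found.append(word)
    return found

sample = ' fooo asasa germAn baar'  # stand-in for the nfoview content
print(find_keywords(sample, ['english', 'german', 'french']))
```

Adding another language then only means adding another entry to the list.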
regards,
sphere
RE: Find on Page - mikey1234 - 2012-08-10
If I do this:
Code: import re
text = ' fooo asasa germAn baar'
try:
    if re.search('english', text, re.IGNORECASE):
        re.search= 'english found'
except:
    re.search='hello'
print re.search
I get this:
Code: <function search at 0x01DAFC70>
Obviously that's because it hasn't found it, but shouldn't it print 'hello'?
Because if I do this:
Code: import re
text = ' fooo asasa germAn baar'
try:
    if re.search('german', text, re.IGNORECASE):
        re.search= 'german found'
except:
    re.search='hello'
print re.search
it prints german found
RE: Find on Page - mikey1234 - 2012-08-10
done it
Code: import re
text = ' fooo asasa english baar'
if re.search('german', text, re.IGNORECASE):
    print 'german found'
if not re.search('german', text, re.IGNORECASE):
    print 'hello'
RE: Find on Page - mikey1234 - 2012-08-10
if i do this everything is ok
Code: import re
link = ' fooo asasa english baar french 5.1 DtS'
if re.search('Engli', link, re.IGNORECASE):
    print 'English Audio'
if not re.search('Engli', link, re.IGNORECASE):
    print ''
if re.search('german', link, re.IGNORECASE):
    print 'German Audio'
if not re.search('german', link, re.IGNORECASE):
    print ''
if re.search('Deuts', link, re.IGNORECASE):
    print 'German Audio'
if not re.search('Deuts', link, re.IGNORECASE):
    print ''
if re.search('french', link, re.IGNORECASE):
    print 'French Audio'
if not re.search('french', link, re.IGNORECASE):
    print ''
if re.search('turk', link, re.IGNORECASE):
    print 'Turkish Audio'
if not re.search('turk', link, re.IGNORECASE):
    print ''
if re.search('DTS', link, re.IGNORECASE):
    print 'DTS'
if not re.search('DTS', link, re.IGNORECASE):
    print ''
if re.search('DD', link,):
    print 'DD'
if not re.search('DD', link,):
    print ''
if re.search('AC3', link, re.IGNORECASE):
    print 'AC3'
if not re.search('AC3', link, re.IGNORECASE):
    print ''
if re.search('5.1', link, re.IGNORECASE):
    print '5.1 Surround Sound'
if not re.search('5.1', link, re.IGNORECASE):
    print ''
but when doing this
Code: import re
link = ' fooo asasa english baar french 5.1 DtS'
if re.search('Engli', link, re.IGNORECASE):
    re.search= 'English Audio'
if not re.search('Engli', link, re.IGNORECASE):
    re.search= ''
if re.search('german', link, re.IGNORECASE):
    re.search= 'German Audio'
if not re.search('german', link, re.IGNORECASE):
    re.search= ''
if re.search('Deuts', link, re.IGNORECASE):
    re.search= 'German Audio'
if not re.search('Deuts', link, re.IGNORECASE):
    re.search= ''
if re.search('french', link, re.IGNORECASE):
    re.search= 'French Audio'
if not re.search('french', link, re.IGNORECASE):
    re.search= ''
if re.search('turk', link, re.IGNORECASE):
    re.search= 'Turkish Audio'
if not re.search('turk', link, re.IGNORECASE):
    re.search= ''
if re.search('DTS', link, re.IGNORECASE):
    re.search= 'DTS'
if not re.search('DTS', link, re.IGNORECASE):
    re.search= ''
if re.search('DD', link,):
    re.search= 'DD'
if not re.search('DD', link,):
    re.search= ''
if re.search('AC3', link, re.IGNORECASE):
    re.search= 'AC3'
if not re.search('AC3', link, re.IGNORECASE):
    re.search= ''
if re.search('5.1', link, re.IGNORECASE):
    re.search= '5.1 Surround Sound'
if not re.search('5.1', link, re.IGNORECASE):
    re.search= ''
it gives this error
Code: Traceback (most recent call last):
  File "C:\Users\Mike\Desktop\link.py", line 7, in <module>
    if not re.search('Engli', link, re.IGNORECASE):
TypeError: 'str' object is not callable
RE: Find on Page - giftie - 2012-08-10
You can't assign a string to the regex function (re.search).
You need to store the results in separate string variables.
Code: import re
audio_language = ""
audio_codec = ""
audio_channels = ""
link = ' fooo asasa english baar french 5.1 DtS'
if re.search('Engli', link, re.IGNORECASE):
    audio_language = 'English Audio'
if re.search('german', link, re.IGNORECASE):
    audio_language = 'German Audio'
if re.search('Deuts', link, re.IGNORECASE):
    audio_language = 'German Audio'
if re.search('french', link, re.IGNORECASE):
    audio_language = 'French Audio'
if re.search('turk', link, re.IGNORECASE):
    audio_language = 'Turkish Audio'
if re.search('DTS', link, re.IGNORECASE):
    audio_codec = 'DTS'
if re.search('DD', link,):
    audio_codec = 'DD'
if re.search('AC3', link, re.IGNORECASE):
    audio_codec = 'AC3'
if re.search('5.1', link, re.IGNORECASE):
    audio_channels = '5.1 Surround Sound'
print audio_language
print audio_codec
print audio_channels
The problem you will have is with multiple language tracks: with what you have, each match overwrites the previous one. You might need to set up a dict for storing each track.
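A rough sketch of that idea: collect every match into lists inside a dict instead of overwriting a single variable. The keyword tables and the name `detect_audio` are assumptions for illustration only, and note that this still does not tie a codec to a particular language track; it just stops matches from being lost.

```python
import re

# Hypothetical keyword tables: regex fragment -> label to report
LANGUAGES = [('Engli', 'English'), ('german', 'German'), ('Deuts', 'German'),
             ('french', 'French'), ('turk', 'Turkish')]
CODECS = [('DTS', 'DTS'), ('AC3', 'AC3')]

def detect_audio(text):
    """Return a dict with every language and codec label found in text."""
    result = {'languages': [], 'codecs': []}
    for key, table in (('languages', LANGUAGES), ('codecs', CODECS)):
        for pattern, label in table:
            # keep each label once, even if two patterns map to it
            if re.search(pattern, text, re.IGNORECASE) and label not in result[key]:
                result[key].append(label)
    return result

link = ' fooo asasa english baar french 5.1 DtS'
info = detect_audio(link)
print(info['languages'])
print(info['codecs'])
```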
RE: Find on Page - Bstrdsmkr - 2012-08-11
Just for posterity's sake, I'll mirror my solution posted at xbmchub.com:
Your error is because you're trying to set re.search (a function) equal to 'English Audio' (a string), then trying to use it as a function again in the next if statement. You'll want to make a new variable to hold the result. Something like this:
Code: if re.search('Engli', link, re.IGNORECASE):
    lang = 'English Audio'
A couple of things to think about, though. In the first .nfo, the audio is listed as check boxes. Your current method would trigger on those options even though they're not "checked".
I think what I would do is create an array of regexes to match for each language, then loop through them (you'll probably want to store them in a separate file and import it for easy maintenance):
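Code: import re
all_languages = {
    "English": [r"\[X\] Englisch DD 5\.1", r"\[X\] Englisch DTS", r"Type(?:.)+?: English"],
    "German": [r"\[X\] German DD 5\.1", r"\[X\] German DTS", r"Type(?:.)+?: German"]
}
available_languages = []
# link holds the nfo text, as in the earlier examples
for language, regexes in all_languages.items():
    # loop over the regex list for this language, not over the language name itself
    for regex in regexes:
        if re.search(regex, link, re.IGNORECASE):
            available_languages.append(language)
            break  # one match is enough for this language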
That should give you back a list of all languages that are indicated in the nfo. When you find a new nfo format that doesn't match any of the existing regexes, just add a new regex to the list for that language.
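If keeping one regex per layout gets tedious, the check-box style could also be handled more generically by capturing whatever word follows a ticked box. This is only a sketch under assumptions: the pattern expects the `[X] Language Codec` layout of the first nfo above, and `checked_languages` and the `KNOWN` spelling table are hypothetical helpers, not part of any library.

```python
import re

# Spellings seen in nfos mapped to one display name (assumed, extend as needed)
KNOWN = {'german': 'German', 'deutsch': 'German',
         'english': 'English', 'englisch': 'English'}

# A ticked box, optional spaces, then the first word after it
CHECKED = re.compile(r'\[X\]\s*(\w+)', re.IGNORECASE)

def checked_languages(nfo_text):
    """Return the languages whose check box is ticked, in order of appearance."""
    found = []
    for word in CHECKED.findall(nfo_text):
        name = KNOWN.get(word.lower())
        # skip ticked options that are not languages, e.g. "[X] Reencoded"
        if name and name not in found:
            found.append(name)
    return found

nfo = """
Video ....: [ ] Untouched
            [X] Reencoded
Language .: [ ] German DD 5.1
            [X] German DTS
            [ ] Englisch DD 5.1
            [X] Englisch DTS
"""
print(checked_languages(nfo))
```

Unticked `[ ]` entries never match the pattern, so they are ignored without any extra check.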
Find on Page - mikey1234 - 2012-08-11
Lol
RE: Find on Page - mikey1234 - 2012-08-13
I'm trying to do this, because all I want is a quick dialog to come up:
Code: def EasySearch(name,iconimage,fanart):
    search_entered = str(name).replace(' ','+') .replace(':','') .replace(', ','+').replace(',','+').replace('[','').replace(']',' ').replace('The','').replace('(','') .replace(')','') .replace('-','+')
    theurl = 'http://members.easynews.com/global5/index.html?gps='+search_entered+'&sbj=&
    print theurl
    username = ADDON.getSetting('easy_user')
    password = ADDON.getSetting('easy_pass')
    passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, theurl, username, password)
    authhandler = urllib2.HTTPBasicAuthHandler(passman)
    opener = urllib2.build_opener(authhandler)
    urllib2.install_opener(opener)
    pagehandle = urllib2.urlopen(theurl)
    link= pagehandle.read()
    match=re.compile('<a href="(.+?)" target="subjTarget".+?<span class="autounrarlink">(.+?)</span></a>.+?class="fSize" nowrap>(.+?)</td>').findall(link)
    class MyClass():
        if re.search('alt="English"', link, re.IGNORECASE):
            eng= 'Found English Audio'
        if not re.search('alt="English"', link, re.IGNORECASE):
            eng= ''
        if re.search('alt="German"', link, re.IGNORECASE):
            ger= 'Found German Audio'
        if not re.search('alt="German"', link, re.IGNORECASE):
            ger= ''
        if re.search('alt="French"', link, re.IGNORECASE):
            fre= 'Found French Audio'
        if not re.search('alt="French"', link, re.IGNORECASE):
            fre= ''
        if re.search('alt="Turkish"', link, re.IGNORECASE):
            tur= 'Found Turkish Audio'
        if not re.search('alt="Turkish"', link, re.IGNORECASE):
            tur= ''
        dialog = xbmcgui.Dialog()
        dialog.ok= (MyClass())
but i get this error
Code: NameError: free variable 'MyClass' referenced before assignment in enclosing scope
RE: Find on Page - mikey1234 - 2012-08-13
OK, even if I do this:
Code: eng=''
ger=''
fre=''
tur=''
if re.search('alt="English"', link, re.IGNORECASE):
    eng= 'Found English Audio'
if not re.search('alt="English"', link, re.IGNORECASE):
    eng= ''
if re.search('alt="German"', link, re.IGNORECASE):
    ger= 'Found German Audio'
if not re.search('alt="German"', link, re.IGNORECASE):
    ger= ''
if re.search('alt="French"', link, re.IGNORECASE):
    fre= 'Found French Audio'
if not re.search('alt="French"', link, re.IGNORECASE):
    fre= ''
if re.search('alt="Turkish"', link, re.IGNORECASE):
    tur= 'Found Turkish Audio'
if not re.search('alt="Turkish"', link, re.IGNORECASE):
    tur= ''
dialog= xbmcgui.Dialog()
dialog.ok= (eng,ger,fre,tur)
the error is
Code: AttributeError: 'xbmcgui.Dialog' object attribute 'ok' is read-only
RE: Find on Page - mikey1234 - 2012-08-13
thanks i have fixed it
Code: eng=''
ger=''
fre=''
tur=''
if re.search('gb.png" alt="English"', link, re.IGNORECASE):
    eng= 'Found English Audio'
if not re.search('gb.png" alt="English"', link, re.IGNORECASE):
    eng= ''
if re.search('alt="German"', link, re.IGNORECASE):
    ger= 'Found German Audio'
if not re.search('alt="German"', link, re.IGNORECASE):
    ger= ''
if re.search('alt="French"', link, re.IGNORECASE):
    fre= 'Found French Audio'
if not re.search('alt="French"', link, re.IGNORECASE):
    fre= ''
if re.search('alt="Turkish"', link, re.IGNORECASE):
    tur= 'Found Turkish Audio'
if not re.search('alt="Turkish"', link, re.IGNORECASE):
    tur= ''
xbmcgui.Dialog().ok('Found These Audios',eng,ger,fre)
The only problem with this dialog is that it only takes 4 arguments (a heading plus three lines). What other dialog or notification can I use to display more?
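One possible way around the line limit, sketched under assumptions: build a list of the found audios first, then hand the whole list to the dialog. `xbmcgui.Dialog().select(heading, list)` shows a list of any length; the XBMC calls are left as comments here so the snippet also runs outside XBMC. The helper name `found_audios` and the sample HTML are made up for illustration.

```python
import re

LANGUAGES = ['English', 'German', 'French', 'Turkish']

def found_audios(page):
    """Return one 'Found <language> Audio' line per language flag on the page."""
    lines = []
    for language in LANGUAGES:
        if re.search('alt="%s"' % language, page, re.IGNORECASE):
            lines.append('Found %s Audio' % language)
    return lines

# Made-up sample standing in for the page content held in `link`
page = '<img src="gb.png" alt="English"> <img src="tr.png" alt="Turkish">'
lines = found_audios(page)
print(lines)

# Inside XBMC, something like this should list every entry:
# xbmcgui.Dialog().select('Found These Audios', lines)
# Or squeeze everything into a single line of the ok dialog:
# xbmcgui.Dialog().ok('Found These Audios', ', '.join(lines))
```

Because the result is a list, languages that were not found simply do not appear, so there is no need for the empty-string placeholders.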