Handling a parsed Hebrew string
#1
Hi.

In this code segment for parsing http://www.mako.co.il/mako-vod-index:

Code:
soup = BeautifulStoneSoup(link, convertEntities=BeautifulStoneSoup.XML_ENTITIES)
programs = soup('ul')
for i, prog in enumerate(programs):
    if i == (4 + getLetterValue(name)):
        j = 0
        while j < len(prog('li')):
            li = prog('li')[j]
            link = li('a')[0]
            url = link['href']
            text = link.contents
            print ''.join(text)
            j = j + 1

The result is Unicode escape codes and not Hebrew characters. When reading the log file, or when adding a string I generated with join to a directory, I see

[u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9']

and not the Hebrew text I want.

How can I fix this?

Thanks
#2
yotama9 Wrote: The result is Unicode escape codes and not Hebrew characters. [...] How can I fix this?

It is already a unicode string. You may need to encode it to UTF-8:
Code:
>>> print u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'.encode('utf-8')
תאמין לי
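
A minimal sketch of the same idea applied to the parsed value, in case it helps. It assumes, as the `[u'...']` output suggests, that the value being logged is the list from `link.contents` itself rather than the joined string. The `contents` list here is a hypothetical stand-in for `link.contents`, and the names `as_list`, `title` and `encoded` are made up for illustration:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical stand-in for link.contents: a list holding one unicode string.
contents = [u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9']

# On Python 2, repr() of a list escapes non-ASCII characters; this is the
# [u'\u05ea...'] form that shows up when the list itself is printed or logged.
as_list = repr(contents)

# Joining the pieces gives a single unicode string (8 characters here).
title = u''.join(contents)

# Encoding turns the text into a UTF-8 byte string (15 bytes here), which is
# the form to hand to a log file or any other byte-oriented interface.
encoded = title.encode('utf-8')

print(len(title), len(encoded))
```

The point is to join first and encode last: keep the text as unicode while working with it, and call `.encode('utf-8')` only at the moment it is printed, logged or passed on.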
#3
Once again, t0mm0 to the rescue.

Thanks.