Help with a Unicode string surrounded by whitespace
#1
I have a problem matching a Unicode (Korean) string that is surrounded by lots of tabs and spaces.


Code:
<strong>등급</strong></dt>
<dd>
                
                                                                                                                                         청소년관람불가(한국)            </dd>

What I am trying to get is the text between <dd> and </dd>.

Code:
<RegExp input="$$7" output="&lt;mpaa&gt;\1&lt;/mpaa&gt;" dest="8+">
     <RegExp input="$$1" output="\1" dest="7">
               <expression noclean="1">&lt;strong&gt;등급&lt;/strong&gt;&lt;/dt&gt;[^&gt;]*&gt;(.[^&lt;]*)&lt;/dd&gt;</expression>
     </RegExp>
     <expression trim="1"></expression>
</RegExp>

With this, I can capture everything between <dd> and </dd>.
The problem is that I cannot get rid of the whitespace around the words.

I tried it without "noclean", with "trim", and with \s and \t, none of which helped.
If I use \b, the whole string disappears. The regex engine does not seem to support \p. I looked at the PCRE documentation, which says that \p support is an optional feature.

Please guide me on this.
#2
Never mind. I solved the problem.
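
The thread does not record how the problem was solved. As a hedged sketch only, one common approach is to keep the surrounding whitespace outside the capture group: put \s* before the group, make the group lazy, and put \s* before the closing </dd>. The example below uses Python's `re` module rather than the scraper's regex engine, and the `html` sample, `pattern`, and `rating` names are illustrative assumptions, not taken from the original scraper.

```python
import re

# Sample input modelled on the HTML in post #1 (the whitespace is illustrative).
html = (
    "<strong>등급</strong></dt>\n"
    "<dd>\n"
    "                \n"
    "\t\t\t\t청소년관람불가(한국)            </dd>\n"
)

# \s* before the group consumes the leading tabs, spaces and newlines.
# The lazy [^<]*? stops as soon as the rest of the pattern can match,
# so the trailing whitespace is consumed by the final \s* instead of
# ending up inside the capture group.
pattern = re.compile(r"<strong>등급</strong></dt>\s*<dd>\s*([^<]*?)\s*</dd>")

match = pattern.search(html)
rating = match.group(1) if match else None
print(rating)
```

Inside the scraper XML the same idea would need the angle brackets escaped, for example &lt;dd&gt;\s*([^&lt;]*?)\s*&lt;/dd&gt;. This assumes the engine supports \s and lazy quantifiers as PCRE does, which has not been verified against the scraper itself.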