Kodi Community Forum
Solved Regex are greedy? / extracting content from multiple lines




Regex are greedy? / extracting content from multiple lines - programmkino - 2019-01-01

Hi everyone.

I've started to build a movie scraper, and extracting content from a single line like this works:
html:
<h1 class="text-serif">LORD OF THE RINGS</h1>

However, extraction of content from multiple lines does not work as expected.
html:
        <a href="/person/filme/2358">
            Ralph Bakshi
        </a>
Corresponding regex, not escaped:
Code:
<a href="/person/filme/.*">(.*)</a>

That regex does not stop at the first closing </a> tag but includes much more text, up to some later closing </a> tag.
So I guess the regex is greedy. Is there a way I can change that?
Alternatively, is there a way I can match whitespace and line breaks in the regex? I wasn't successful when I tried.

Thank you,
Ben
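
On the whitespace question, a minimal sketch of one possible approach, written in Python's re module rather than in the scraper's own regex engine (so the exact behaviour there is an assumption): \s matches spaces, tabs and line breaks, and a negated class such as [^<] also crosses line breaks while never running past the next tag. The second link in the sample, including its id 9999 and the name "Another Person", is made up for illustration.

```python
import re

# Sample markup modelled on the snippet in the post.
# The second <a> element is hypothetical, added to show that
# each match stays inside its own tag.
html = """
        <a href="/person/filme/2358">
            Ralph Bakshi
        </a>
        <a href="/person/filme/9999">
            Another Person
        </a>
"""

# [^"]*      : the id inside href, cannot run past the closing quote
# \s*        : any whitespace, including line breaks, around the name
# [^<]*[^<\s]: the name itself, ending on a non-whitespace character
#              and never crossing a '<'
pattern = re.compile(r'<a href="/person/filme/[^"]*">\s*([^<]*[^<\s])\s*</a>')

names = pattern.findall(html)
print(names)
```

Because the character classes cannot cross a quote or a '<', this variant does not depend on whether the quantifiers are greedy.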


RE: Regex are greedy? / extracting content from multiple lines - spiff - 2019-01-02

Use .*? instead of .* so the match stops at the first closing tag (.*? is the non-greedy form).
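
To illustrate the difference, here is a small sketch in Python's re module, not in the scraper's own regex engine. It assumes that "." is allowed to match line breaks (re.DOTALL here), which is what the behaviour described in the first post suggests; the second link in the sample (id 9999, "Another Person") is invented for the demonstration.

```python
import re

# Sample markup modelled on the first post; the second link is hypothetical.
html = """
        <a href="/person/filme/2358">
            Ralph Bakshi
        </a>
        <a href="/person/filme/9999">
            Another Person
        </a>
"""

# Greedy: .* takes as much as it can, so the match runs to the LAST </a>.
greedy = re.compile(r'<a href="/person/filme/.*">(.*)</a>', re.DOTALL)

# Non-greedy: .*? takes as little as it can, so each match ends
# at the FIRST </a> that follows.
lazy = re.compile(r'<a href="/person/filme/.*?">(.*?)</a>', re.DOTALL)

# Number of closing tags swallowed by the single greedy match.
greedy_span = greedy.search(html).group(0)
closing_tags_in_greedy_match = greedy_span.count("</a>")

# One capture per link; strip() removes the surrounding whitespace and newlines.
lazy_names = [name.strip() for name in lazy.findall(html)]

print(closing_tags_in_greedy_match)
print(lazy_names)
```

In this sketch the greedy pattern produces a single match that spans both links, while the non-greedy one yields one name per link.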


RE: Regex are greedy? / extracting content from multiple lines - programmkino - 2019-01-02

Great tip! Thanks a lot!
(I'll open another thread for my next question)


Regex are greedy? / extracting content from multiple lines - Karellen - 2019-01-02

Thread marked solved.