Solved - Regex are greedy? / extracting content from multiple lines


Regex are greedy? / extracting content from multiple lines - programmkino - 2019-01-01

Hi everyone.

I've started building a movie scraper, and extracting content that sits on a single line, like this, works:

html:
<h1 class="text-serif">LORD OF THE RINGS</h1>

However, extraction of content from multiple lines does not work as expected.

html:
        <a href="/person/filme/2358">

            Ralph Bakshi

        </a>

Corresponding regex, not escaped:

Code:
<a href="/person/filme/.*">(.*)</a>

That regex does not stop at the first closing </a> tag but includes much more text, up to some later closing </a> tag.
So I guess the regex is greedy. Is there a way I can change that?
Alternatively, is there a way I can match whitespace and line breaks in the regex? I wasn't successful when I tried.

Thank you,
Ben

RE: Regex are greedy? / extracting content from multiple lines - spiff - 2019-01-02

Use .*? instead of .* to make the match non-greedy, so it stops at the first possible match.
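
To illustrate the difference, here is a minimal sketch in Python's re module. This is an assumption for demonstration only: Kodi scrapers use their own regex engine and XML escaping, so flags and syntax may differ there. The second link in the sample HTML is made up, and re.DOTALL is used so that . also matches line breaks in Python. The last pattern shows one possible way to handle the surrounding whitespace and line breaks with \s*.

```python
import re

# Sample HTML modelled on the first post; the second link is invented for the demo.
html = """
        <a href="/person/filme/2358">

            Ralph Bakshi

        </a>
        <a href="/person/filme/9999">

            Another Person

        </a>
"""

# Greedy: .* takes as much as it can, so the match runs on to the LAST </a>.
greedy = re.search(r'<a href="/person/filme/.*">(.*)</a>', html, re.DOTALL)

# Lazy: .*? takes as little as it can, so the match stops at the FIRST </a>.
lazy = re.search(r'<a href="/person/filme/.*?">(.*?)</a>', html, re.DOTALL)

# Whitespace-aware variant: \s* swallows the spaces and line breaks around the
# name, so the capture group holds only the text itself (no DOTALL needed here,
# because \s already matches newlines and the name sits on one line).
names = re.findall(r'<a href="/person/filme/[^"]*">\s*(.*?)\s*</a>', html)

print(repr(greedy.group(0)))        # spans both links
print(repr(lazy.group(1).strip()))  # only the first name
print(names)
```

The lazy version still captures the whitespace around the name, which is why the sketch calls strip() on it; the \s* variant avoids that step inside the pattern itself.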

RE: Regex are greedy? / extracting content from multiple lines - programmkino - 2019-01-02

Great tip! Thanks a lot!
(I'll open another thread for my next question)

Regex are greedy? / extracting content from multiple lines - Karellen - 2019-01-02

Thread marked solved.