2019-01-01, 18:54
Hi everyone.
I've started building a movie scraper, and extracting single-line content like this works:
html:
<h1 class="text-serif">LORD OF THE RINGS</h1>
However, extracting content that spans multiple lines does not work as expected.
html:
<a href="/person/filme/2358">
Ralph Bakshi
</a>
Corresponding regex, not escaped:
Code:
<a href="/person/filme/.*">(.*)</a>
That regex does not stop at the first closing </a> tag; it matches much more text, up to some later closing </a> tag.
So I guess the regex is greedy. Is there a way I can change that?
Alternatively, is there a way I can match whitespace and line breaks in the regex? I wasn't successful when I tried.
Thank you,
Ben
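For reference, here is a minimal sketch of the two ideas raised above, non-greedy matching and matching across line breaks. The post does not say which regex engine the scraper uses, so this assumes Python's re module purely for illustration; the second link (id 0000, "Example Person") is made-up sample data, and the names html, greedy, lazy and names are hypothetical. Many engines (PCRE, Java, .NET) accept the same .*? syntax and have an equivalent "dot matches newline" option, but that would need checking for the scraper in question.

```python
import re

# Sample input: the first link is from the post, the second is invented
# so that there is a later </a> for a greedy pattern to run into.
html = """<a href="/person/filme/2358">
Ralph Bakshi
</a>
<a href="/person/filme/0000">
Example Person
</a>"""

# Original pattern plus DOTALL (so "." also matches line breaks).
# Both ".*" are greedy, so the match runs on to the LAST </a>.
greedy = re.compile(r'<a href="/person/filme/.*">(.*)</a>', re.DOTALL)

# Adjusted pattern:
#   [^"]*   keeps the href value from running past its closing quote
#   \s*     skips whitespace and line breaks around the name
#   (.*?)   is non-greedy, so it stops at the FIRST following </a>
lazy = re.compile(r'<a href="/person/filme/[^"]*">\s*(.*?)\s*</a>', re.DOTALL)

# The greedy pattern swallows both links in a single match.
print(len(greedy.findall(html)))

# The lazy pattern yields one clean name per link.
names = lazy.findall(html)
print(names)
```

The [^"]* in the href is a design choice rather than a requirement: making that part non-greedy as well (.*?) would also work here, but a negated character class cannot overshoot the closing quote at all.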