I just compared "universal.xml" between versions of Universal Movie Scraper. The following block of XML was removed between version 5.4.6, published in late December, and 5.5.0, published on June 15th.
Those dates are consistent with "runtime" scraping stopping for me a few days ago. Until the IMDb changes in June and last week's UMS update, those details were scraped just fine.
```xml
<RegExp input="$$1" output="&lt;runtime&gt;\1&lt;/runtime&gt;" dest="5+">
    <expression trim="1">&lt;h4[^&gt;]*?&gt;Runtime:&lt;/h4&gt;[^0-9]*([0-9]+)</expression>
</RegExp>
```
I have added that XML fragment to UMS 5.5.2 in the right place (after the current "runtime" regex, as in UMS 5.4.6). After scanning a handful of movies, "runtime" information seems to be scraped correctly again. Good!
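For anyone wondering what the restored expression actually matches, here is a minimal sketch in Python. It uses Python's `re` module as a stand-in for Kodi's regex engine (they should agree on a pattern this simple), and both the HTML snippet and the `extract_runtime` helper are hypothetical, for illustration only; real IMDb markup may differ.

```python
import re
from typing import Optional

# Same pattern as the <expression> element, with &lt;/&gt; already decoded
# to < and > (that is the string the regex engine receives).
RUNTIME_PATTERN = re.compile(r"<h4[^>]*?>Runtime:</h4>[^0-9]*([0-9]+)")


def extract_runtime(html: str) -> Optional[str]:
    """Return the <runtime> tag built from the first match, or None."""
    match = RUNTIME_PATTERN.search(html)
    if match is None:
        return None
    # Mimics output="<runtime>\1</runtime>": group 1 is the run of digits.
    return "<runtime>" + match.group(1) + "</runtime>"


# Hypothetical page fragment; the real IMDb markup may differ.
sample = '<div class="txt-block">\n  <h4 class="inline">Runtime:</h4>\n  <time>118 min</time>\n</div>'
print(extract_runtime(sample))  # <runtime>118</runtime>
```

The lazy `[^>]*?` skips any attributes on the `<h4>` tag, and `[^0-9]*` skips whatever markup sits between the heading and the first digit.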
Please consider releasing UMS 5.5.3 with that fix, or an improved version of it.
Thanks for your time and effort, @olympia.
PS: I think the fragment "[^&gt;]*?" doesn't do what you think: it will not consume everything up to the next ">" in the HTML. That is my understanding, but I could be wrong.

EDIT: My mistake, you are right. The XML parser decodes the entities, so "[^&gt;]*?" becomes the regex that is actually executed: "[^>]*?". That is what I would expect. I beg your pardon; that code is fine.
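The point of the EDIT can be checked with Python's standard `xml.etree.ElementTree`, used here only as a stand-in for the XML parser inside Kodi (assumption: any conforming XML parser decodes `&lt;` and `&gt;` the same way). The sketch parses the fragment in its escaped form, as Kodi scraper files normally store it, prints the decoded regex and output template, and contrasts the two possible readings of the character class.

```python
import re
import xml.etree.ElementTree as ET

# The RegExp block as written in the scraper file, with < and > escaped.
FRAGMENT = (
    '<RegExp input="$$1" output="&lt;runtime&gt;\\1&lt;/runtime&gt;" dest="5+">'
    '<expression trim="1">&lt;h4[^&gt;]*?&gt;Runtime:&lt;/h4&gt;[^0-9]*([0-9]+)</expression>'
    "</RegExp>"
)

root = ET.fromstring(FRAGMENT)
# The parser decodes &lt; and &gt;, so these are the strings the scraper uses.
expression = root.find("expression").text
output = root.get("output")
print(expression)  # <h4[^>]*?>Runtime:</h4>[^0-9]*([0-9]+)
print(output)      # <runtime>\1</runtime>

# If the entities were NOT decoded, "[^&gt;]" would be a character class that
# excludes '&', 'g', 't' and ';' (not '>'), so it would run straight past '>'.
undecoded = re.match(r"[^&gt;]*", "abc>def").group()
decoded = re.match(r"[^>]*", "abc>def").group()
print(undecoded)  # abc>def
print(decoded)    # abc
```

The original worry in the PS only applies to the undecoded reading, which never reaches the regex engine.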