Fix for IMDB ParseIMDBOutline function
#1
The v3.0.7 IMDB metadata scraper function ParseIMDBOutline does not return a clean result for web pages whose outline contains a "See full summary..." link: the link text is left in the returned outline.

Example: for the listing of "A Fork in the Road (2009)" (https://www.imdb.com/title/tt1117386/?ref_=nv_sr_1), the function returns:
12:22:17.193 T:524   DEBUG: scraper: ParseIMDBOutline returned <details><outline>Will, an escaped convict, inadvertently takes refuge in a barn the same night the owners, April and Martin, get into a terrible fight. A gun shot goes off inside the house. April drags ...
                                                                See full summary »</outline></details>

Adding a new regex to the function, so that it looks like this:
 
Code:
<ParseIMDBOutline dest="5">
	<RegExp input="$$2" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
		<RegExp input="$$1" output="&lt;outline&gt;\1&lt;/outline&gt;" dest="2">
			<expression fixchars="1" trim="1">&lt;div class=&quot;summary_text&quot;&gt;(.+?)&lt;div\sclass</expression>
		</RegExp>
		<RegExp input="$$1" output="&lt;outline&gt;\1&lt;/outline&gt;" dest="2">
			<expression fixchars="1" trim="1">&lt;div class=&quot;summary_text&quot;&gt;(.+?)&lt;a\shref=&quot;[^&quot;]*&quot;\s*&gt;Add\sa\sPlot</expression>
		</RegExp>
		<!-- New: end the capture at the link whose href ends in =tt_ov_pl (the "See full summary" link) -->
		<RegExp input="$$1" output="&lt;outline&gt;\1&lt;/outline&gt;" dest="2">
			<expression fixchars="1" trim="1">&lt;div class=&quot;summary_text&quot;&gt;(.+?)&lt;a\shref=&quot;(.+?)=tt_ov_pl&quot;</expression>
		</RegExp>
		<expression noclean="1" />
	</RegExp>
</ParseIMDBOutline>

returns a more readable result:
12:31:42.499 T:4648 DEBUG: scraper: ParseIMDBOutline returned <details><outline>Will, an escaped convict, inadvertently takes refuge in a barn the same night the owners, April and Martin, get into a terrible fight. A gun shot goes off inside the house. April drags ...</outline></details>
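
For anyone who wants to try the new expression outside the scraper, here is a minimal Python sketch. The HTML fragment is a hypothetical sample modelled on the log output above, not the actual IMDb markup, and the helper name outline() is made up for this illustration. Using re.DOTALL is also an assumption, made so that "." can span the line break visible in the log.

```python
import re

# Hypothetical fragment modelled on the log output; the real IMDb page may differ.
SAMPLE = (
    '<div class="summary_text">\n'
    '    Will, an escaped convict, takes refuge in a barn. April drags ...\n'
    '    <a href="/title/tt1117386/plotsummary?ref_=tt_ov_pl">See full summary</a>\n'
    '</div>\n'
    '<div class="credit_summary_item">'
)

# Un-escaped forms of the first and the last expression from the scraper XML.
OLD = re.compile(r'<div class="summary_text">(.+?)<div\sclass', re.DOTALL)
NEW = re.compile(r'<div class="summary_text">(.+?)<a\shref="(.+?)=tt_ov_pl"', re.DOTALL)


def outline(pattern, html):
    """Return the trimmed first capture group, or None if the pattern does not match."""
    match = pattern.search(html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(repr(outline(OLD, SAMPLE)))  # capture still includes the link markup and text
    print(repr(outline(NEW, SAMPLE)))  # capture stops before the link
```

With this sample, the old expression runs on until the next "<div class", so the link is part of the capture, while the new one stops at the opening of the link. If a page has no =tt_ov_pl link at all, the new expression simply does not match and outline() returns None.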

Regards,
AT2010
#2
Nice one, let's hope tt_ov_pl does not pop up elsewhere on the pages for other movies... :)

Added in v3.0.8.