2013-06-23, 22:45
I was finally able to get the scraper to create a search URL, but now I'm stuck at the next spot: getting results.
Here is a section of the log:
Code:
16:29:21 T:140634451166976 DEBUG: GetMovieId (/home/gagarin/Videos/Grindhouse/Emmanuelle.avi), query = select idMovie from movie where idFile=13
16:29:21 T:140634451166976 DEBUG: VideoInfoScanner: No NFO file found. Using title search for '/home/gagarin/Videos/Grindhouse/Emmanuelle.avi'
16:29:21 T:140634451166976 DEBUG: FindMovie: Searching for 'Emmanuelle' using Grindhouse Database scraper (path: '/home/gagarin/.xbmc/addons/metadata.grindhousedatabase.com', content: 'movies', version: '0.0.2')
16:29:21 T:140634451166976 DEBUG: scraper: CreateSearchUrl returned <url>http://www.grindhousedatabase.com/index.php/Special:Search?search=emmanuelle&fulltext=Search</url>
16:29:21 T:140634451166976 DEBUG: CurlFile::Open(0x7fe7f4027e50) http://www.grindhousedatabase.com/index.php/Special:Search?search=emmanuelle&fulltext=Search
16:29:22 T:140635504355200 INFO: LIRC Initialize: using: /dev/lircd
16:29:22 T:140635504355200 DEBUG: Failed to connect to LIRC. Giving up.
16:29:22 T:140634451166976 DEBUG: scraper: GetSearchResults returned <results></results>
16:29:22 T:140634451166976 DEBUG: FindMovie: Searching for 'Emmanuelle' using Grindhouse Database scraper (path: '/home/gagarin/.xbmc/addons/metadata.grindhousedatabase.com', content: 'movies', version: '0.0.2')
16:29:22 T:140634451166976 DEBUG: scraper: CreateSearchUrl returned <url>http://www.grindhousedatabase.com/index.php/Special:Search?search=emmanuelle&fulltext=Search</url>
16:29:22 T:140634451166976 DEBUG: CurlFile::Open(0x7fe7f4027e50) http://www.grindhousedatabase.com/index.php/Special:Search?search=emmanuelle&fulltext=Search
16:29:23 T:140635504355200 DEBUG: ------ Window Deinit (Pointer.xml) ------
16:29:24 T:140634451166976 DEBUG: scraper: GetSearchResults returned <results></results>
16:29:24 T:140634451166976 WARNING: No information found for item '/home/gagarin/Videos/Grindhouse/Emmanuelle.avi', it won't be added to the library.
And here is the scraper:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<scraper date="2013-06-17" framework="1.1">
  <NfoUrl dest="3">
    <RegExp input="$$1" output="<url>http://www.grindhousedatabase.com/index.php/\1</url>" dest="3">
      <expression noclean="1">grindhousedatabase.com/index.php/([a-zA-Z0-9\s\p{P}]*)</expression>
    </RegExp>
  </NfoUrl>
  <CreateSearchUrl dest="3">
    <RegExp input="$$1" output="<url>http://www.grindhousedatabase.com/index.php/Special:Search?search=\1&amp;fulltext=Search</url>" dest="3">
      <expression noclean="1" />
    </RegExp>
  </CreateSearchUrl>
  <GetSearchResults dest="8">
    <RegExp input="$$3" output="<results>\1</results>" dest="8">
      <RegExp input="$$1" output="<entity><title>\2</title><url>http://www.grindhousedatabase.com/index.php/\1</url></entity>" dest="5">
        <expression repeat="yes"><div class='mw-search-result-heading'><a href="/index.php/([^"]*)" title="([^"]*)"</expression>
      </RegExp>
      <expression clear="yes" noclean="1" />
    </RegExp>
  </GetSearchResults>
  <GetDetails dest="3">
    <RegExp input="$$5" output="<details>\1</details>" dest="3">
      <!-- TITLE -->
      <RegExp input="$$1" output="<title>\1</title>" dest="5">
        <expression><h1 class="firstHeading">([^<]*)</expression>
      </RegExp>
      <!-- YEAR -->
      <RegExp input="$$1" output="<year>\1</year>" dest="5+">
        <expression><a href="/index.php/Category:([0-9]{4})</expression>
      </RegExp>
      <!-- DIRECTOR -->
      <RegExp input="$$1" output="<director>\1</director>" dest="5+">
        <expression>Directed by ([a-z,A-Z, ]*$)</expression>
      </RegExp>
      <!-- TOP250 -->
      <!-- Grindhouse Database Top 20 -->
      <!-- MPAA -->
      <!-- GHDB doesn't really do this, since most will have multiple ratings -->
      <!-- TAGLINE -->
      <!-- RUNTIME -->
      <RegExp input="$$1" output="<runtime>\1</runtime>" dest="5+">
        <expression>Running Time: ([0-9]{2,3}) min</expression>
      </RegExp>
      <!-- THUMB-->
      <RegExp input="$$1" output="<thumb>\1</thumb>" dest="5+">
        <expression>src="/images/thumb/([a-zA-Z0-9\s\p{P}]*)" width</expression>
      </RegExp>
      <!-- CREDITS -->
      <!-- GHDB doesn't do full credits -->
      <!-- RATING -->
      <!-- GHDB doesn't do this -->
      <!-- VOTES -->
      <!-- GHDB doesn't do this -->
      <!-- GENRE -->
      <!-- Use GHDB categories, excluding year released -->
      <RegExp input="$$1" output="<genre>\1</genre>" dest="5+">
        <expression><li><a href="/index.php/Category:([a-zA-Z0-9\s\p{P}]*)" title</expression>
      </RegExp>
      <!-- ACTOR -->
      <!-- NAME -->
      <!-- ROLE -->
      <!-- GHDB doesn't do this -->
      <!-- OUTLINE -->
      <!-- GHDB doesn't do this -->
      <!-- PLOT -->
      <!-- GHDB doesn't do this -->
      <expression clear="yes" noclean="1" />
    </RegExp>
  </GetDetails>
</scraper>
I've been messing with the regex in GetSearchResults, but I keep getting the same result: none! I've been assuming it's the regex, but I'm not positive. The .xml file opens in Firefox with no problems, so it shouldn't be an XML syntax error. The film is listed on the website, so it's not a matter of having nothing to return.
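To rule the regex in or out on its own, it could be tried outside XBMC. Below is a rough sketch of that idea in Python. Note the assumptions: Python's `re` module is only a stand-in for whatever regex engine XBMC uses, both sample strings are hypothetical markup I typed by hand (not copied from the site), and `find_results` is just a helper name made up for this test.

```python
import re

# The GetSearchResults expression, copied from the scraper above.
PATTERN = re.compile(
    r"<div class='mw-search-result-heading'>"
    r'<a href="/index.php/([^"]*)" title="([^"]*)"'
)


def find_results(html):
    """Return every (page, title) pair, roughly like repeat="yes"."""
    return PATTERN.findall(html)


# Hypothetical markup in exactly the shape the expression expects.
SAMPLE_SINGLE = (
    "<div class='mw-search-result-heading'>"
    '<a href="/index.php/Emmanuelle" title="Emmanuelle">Emmanuelle</a></div>'
)

# Same hypothetical markup, but with class="..." in double quotes.
SAMPLE_DOUBLE = (
    '<div class="mw-search-result-heading">'
    '<a href="/index.php/Emmanuelle" title="Emmanuelle">Emmanuelle</a></div>'
)

print(find_results(SAMPLE_SINGLE))  # [('Emmanuelle', 'Emmanuelle')]
print(find_results(SAMPLE_DOUBLE))  # []
```

If the expression matches the real page source when tested this way, the regex itself is probably fine and the problem is more likely in how the scraper feeds the page into it or passes the result along.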
Any suggestions?