(2012-12-07, 07:27) DarkKnight Wrote: Even in 1308, some movies scrape fine on auto and some choose the wrong match. Although Dan has changed the HTTP request to IMDb somewhat, I'm not seeing a difference in the end result. Resident Evil scrapes fine, but Ice-Age chose "Projector" Ice-Age.
Also, I don't really see why someone would upload their release to a "freemium" host when GitHub hosts files for free without issue. I don't like dealing with ad-clogged pay-per-click services, so I uploaded Dan's version to my GitHub repo: https://github.com/downloads/Darkknight3...7_fix1.rar
Maybe when Dan posts his source, we can make further progress tuning the auto movie scrape process.
For me it works perfectly (tested with autoscraping).
ice-age.avi = Ice Age (2002)
Race to Witch Mountain.avi = Race to Witch Mountain (2009)
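For reference, the SearchMovie code further down attaches to every candidate a StringUtils.ComputeLevenshtein distance between the year-stripped, lower-cased search term and the year-stripped, lower-cased IMDb title. The Python sketch below reproduces that scoring under two assumptions: filter_year is a hypothetical stand-in for StringUtils.FilterYear that only strips a trailing "(YYYY)", and levenshtein is the textbook edit distance, not necessarily Ember's exact implementation.

```python
import re


def filter_year(title: str) -> str:
    # Hypothetical stand-in for StringUtils.FilterYear: assumes it only strips
    # a trailing "(YYYY)" year from the title.
    return re.sub(r"\s*\(\d{4}\)\s*$", "", title).strip()


def levenshtein(a: str, b: str) -> int:
    # Textbook dynamic-programming edit distance; every insertion, deletion
    # and substitution costs 1. Only two rows of the table are kept.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or keep) ca
        prev = cur
    return prev[-1]


def score(search: str, candidate: str) -> int:
    # Mirrors ComputeLevenshtein(FilterYear(x).ToLower, FilterYear(y).ToLower).
    return levenshtein(filter_year(search).lower(), filter_year(candidate).lower())


candidates = ["Ice Age (2002)", '"Projector" Ice-Age']
for c in sorted(candidates, key=lambda c: score("ice-age", c)):
    print(score("ice-age", c), c)
```

Under these assumptions "ice-age" is 1 edit away from "Ice Age (2002)" and 12 edits away from "Projector" Ice-Age.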
I didn't know you could also host files on GitHub.
(2012-12-07, 10:00) rodercot Wrote: 2nd - Not scraping TV shows properly. I mentioned this with the 1308 fix as well.
At least with the 1307 fix version I can see and change the TV sources under Settings - TV Shows, but it does not scrape any TV show properly. I am still using 1307 and adding the IMDb selection manually. I just tested the new version as a fresh install and let it rebuild the library from scratch.
THX,
Dave
TV show scraping also works without problems.
If you've reinstalled Ember, you need to rescrape ANY TV show once again so that Ember rebuilds its database.
I made the following changes to the source code:
Code:
Private Const TABLE_PATTERN As String = "<table.*?>\n?(.*?)</table>"
.........
Private Function SearchMovie(ByVal sMovie As String) As MovieSearchResults
    Try
        Dim D, W As Integer
        Dim R As New MovieSearchResults
        Dim sHTTP As New HTTP
        'Encode the search term once (ISO-8859-1) and reuse it for all three queries
        Dim sQuery As String = Web.HttpUtility.UrlEncode(sMovie, System.Text.Encoding.GetEncoding("ISO-8859-1"))
        'Popular titles: search across all categories
        Dim HTML As String = sHTTP.DownloadData(String.Concat("http://", IMDBURL, "/find?q=", sQuery, "&s=all"))
        'Read the response URI right after the first request, since the later requests reuse sHTTP
        Dim rUri As String = sHTTP.ResponseUri
        'More/partial matches: titles only, feature films only
        Dim HTMLm As String = sHTTP.DownloadData(String.Concat("http://", IMDBURL, "/find?q=", sQuery, "&s=tt&ttype=ft&ref_=fn_ft"))
        'Exact matches: titles only, feature films only, exact title
        Dim HTMLe As String = sHTTP.DownloadData(String.Concat("http://", IMDBURL, "/find?q=", sQuery, "&s=tt&ttype=ft&exact=true&ref_=fn_tt_ex"))
        sHTTP = Nothing
        'Check if we've been redirected straight to the movie page
        If Regex.IsMatch(rUri, IMDB_ID_REGEX) Then
            Dim lNewMovie As MediaContainers.Movie = New MediaContainers.Movie(Regex.Match(rUri, IMDB_ID_REGEX).ToString, _
                StringUtils.ProperCase(sMovie), Regex.Match(Regex.Match(HTML, MOVIE_TITLE_PATTERN).ToString, "(?<=\()\d+(?=.*\))").ToString, 0)
            R.ExactMatches.Add(lNewMovie)
            Return R
        End If
        'D = HTML.IndexOf("<b>Popular Titles</b>")
        D = HTML.IndexOf("</a>Titles</h3>")
        If D <= 0 Then GoTo mPartial
        W = HTML.IndexOf("</table>", D) + 8
        Dim Table As String = Regex.Match(HTML.Substring(D, W - D), TABLE_PATTERN).ToString
        'Skip matches whose "name" group contains an <img> tag, and video games (type "VG")
        Dim qPopular = From Mtr In Regex.Matches(Table, TITLE_PATTERN) _
            Where Not DirectCast(Mtr, Match).Groups("name").ToString.Contains("<img") AndAlso Not DirectCast(Mtr, Match).Groups("type").ToString.Contains("VG") _
            Select New MediaContainers.Movie(GetMovieID(DirectCast(Mtr, Match).Groups("url").ToString), _
            Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("name").ToString), Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("year").ToString), StringUtils.ComputeLevenshtein(StringUtils.FilterYear(sMovie).ToLower, StringUtils.FilterYear(Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("name").ToString)).ToLower))
        R.PopularTitles = qPopular.ToList
mPartial:
        'D = HTML.IndexOf("Titles (Partial Matches)")
        D = HTMLm.IndexOf("</a>Titles</h3>")
        If D <= 0 Then GoTo mApprox
        W = HTMLm.IndexOf("</table>", D) + 8
        Table = Regex.Match(HTMLm.Substring(D, W - D), TABLE_PATTERN).ToString
        Dim qpartial = From Mtr In Regex.Matches(Table, TITLE_PATTERN) _
            Where Not DirectCast(Mtr, Match).Groups("name").ToString.Contains("<img") AndAlso Not DirectCast(Mtr, Match).Groups("type").ToString.Contains("VG") _
            Select New MediaContainers.Movie(GetMovieID(DirectCast(Mtr, Match).Groups("url").ToString), _
            Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("name").ToString), Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("year").ToString), StringUtils.ComputeLevenshtein(StringUtils.FilterYear(sMovie).ToLower, StringUtils.FilterYear(Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("name").ToString)).ToLower))
        R.PartialMatches = qpartial.ToList
mApprox:
        'Now process "Approx Matches" and merge both Partial and Approx matches
        D = HTML.IndexOf("Titles (Approx Matches)")
        If D <= 0 Then GoTo mExact
        W = HTML.IndexOf("</table>", D) + 8
        Table = Regex.Match(HTML.Substring(D, W - D), TABLE_PATTERN).ToString
        Dim qApprox = From Mtr In Regex.Matches(Table, TITLE_PATTERN) _
            Where Not DirectCast(Mtr, Match).Groups("name").ToString.Contains("<img") AndAlso Not DirectCast(Mtr, Match).Groups("type").ToString.Contains("VG") _
            Select New MediaContainers.Movie(GetMovieID(DirectCast(Mtr, Match).Groups("url").ToString), _
            Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("name").ToString), Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("year").ToString), StringUtils.ComputeLevenshtein(StringUtils.FilterYear(sMovie).ToLower, StringUtils.FilterYear(Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("name").ToString)).ToLower))
        If Not IsNothing(R.PartialMatches) Then
            R.PartialMatches = R.PartialMatches.Union(qApprox.ToList).ToList
        Else
            R.PartialMatches = qApprox.ToList
        End If
mExact:
        'D = HTML.IndexOf("Titles (Exact Matches)")
        D = HTMLe.IndexOf("</a>Titles</h3>")
        If D <= 0 Then GoTo mResult
        W = HTMLe.IndexOf("</table>", D) + 8
        Table = Regex.Match(HTMLe.Substring(D, W - D), TABLE_PATTERN).ToString
        Dim qExact = From Mtr In Regex.Matches(Table, TITLE_PATTERN) _
            Where Not DirectCast(Mtr, Match).Groups("name").ToString.Contains("<img") AndAlso Not DirectCast(Mtr, Match).Groups("type").ToString.Contains("VG") _
            Select New MediaContainers.Movie(GetMovieID(DirectCast(Mtr, Match).Groups("url").ToString), _
            Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("name").ToString), Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("year").ToString), StringUtils.ComputeLevenshtein(StringUtils.FilterYear(sMovie).ToLower, StringUtils.FilterYear(Web.HttpUtility.HtmlDecode(DirectCast(Mtr, Match).Groups("name").ToString)).ToLower))
        R.ExactMatches = qExact.ToList
mResult:
        Return R
    Catch ex As Exception
        Master.eLog.WriteToErrorLog(ex.Message, ex.StackTrace, "Error")
        Return Nothing
    End Try
End Function
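Each section of SearchMovie uses the same extraction step: find the section anchor with IndexOf, cut up to the next "</table>", then apply TABLE_PATTERN to that slice. A minimal Python sketch of that step, assuming the same anchor string and pattern as the VB code; extract_titles_table is a hypothetical helper name, and the sample HTML is made up for illustration, not real IMDb markup.

```python
import re
from typing import Optional

# Same pattern and anchor as TABLE_PATTERN and the IndexOf calls in SearchMovie.
# No DOTALL flag, so "." does not match "\n" (like .NET without Singleline).
TABLE_PATTERN = re.compile(r"<table.*?>\n?(.*?)</table>")
SECTION_ANCHOR = "</a>Titles</h3>"


def extract_titles_table(html: str, anchor: str = SECTION_ANCHOR) -> Optional[str]:
    """Hypothetical helper: return the first <table>...</table> after `anchor`.

    Mirrors:
        D = HTML.IndexOf(anchor)
        W = HTML.IndexOf("</table>", D) + 8
        Table = Regex.Match(HTML.Substring(D, W - D), TABLE_PATTERN).ToString
    """
    start = html.find(anchor)
    if start <= 0:  # same "If D <= 0" test as the VB code
        return None
    end = html.find("</table>", start)
    if end < 0:  # the VB code would usually throw here (negative Substring length)
        return None
    section = html[start:end + len("</table>")]
    m = TABLE_PATTERN.search(section)
    # Match.ToString in .NET returns the whole match, i.e. group(0).
    return m.group(0) if m else None


page = ('<div><h3><a name="tt"></a>Titles</h3>'
        '<table class="findList"><tr><td>Ice Age (2002)</td></tr></table></div>')
print(extract_titles_table(page))
```

Because neither version uses a single-line (DOTALL) option, "." does not cross newlines: apart from the one optional newline after the opening tag, a table whose rows are split over several lines is not matched at all.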
For popular matches:
http://akas.imdb.com/find?q=ice+age&s=all
For more/partial matches:
http://akas.imdb.com/find?q=ice+age&s=tt...ref_=fn_ft
For exact matches:
http://akas.imdb.com/find?q=ice+age&s=tt..._=fn_tt_ex
Currently, this is the only way to get useful results from IMDb.
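The three query URLs differ only in their suffixes, which are copied here from the DownloadData calls in SearchMovie. A small Python sketch that builds all three for a title; the host akas.imdb.com is taken from the example links and only assumed to be what IMDBURL holds, and build_search_urls is a hypothetical helper. urllib's quote_plus stands in for HttpUtility.UrlEncode with ISO-8859-1: both turn spaces into "+", but they may differ in minor details such as hex-digit case or which punctuation gets escaped.

```python
from urllib.parse import quote_plus

# Assumed value of IMDBURL, taken from the example links in this post.
IMDB_HOST = "akas.imdb.com"

# Query-string suffixes copied from the three DownloadData calls in SearchMovie.
SEARCH_SUFFIXES = {
    "popular": "&s=all",
    "partial": "&s=tt&ttype=ft&ref_=fn_ft",
    "exact": "&s=tt&ttype=ft&exact=true&ref_=fn_tt_ex",
}


def build_search_urls(title: str, host: str = IMDB_HOST) -> dict:
    """Hypothetical helper: build the popular/partial/exact find URLs for a title."""
    # Stand-in for HttpUtility.UrlEncode(title, ISO-8859-1): spaces become "+",
    # other characters are percent-encoded from their Latin-1 bytes.
    query = quote_plus(title, encoding="iso-8859-1")
    return {kind: "http://" + host + "/find?q=" + query + suffix
            for kind, suffix in SEARCH_SUFFIXES.items()}


for kind, url in build_search_urls("ice age").items():
    print(kind, url)
```

For "ice age", the popular URL comes out identical to the first link above.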