Clean scraping API
#76
Sorry to interrupt. I haven't read the whole topic, but maybe you guys want to check this out for some new ideas.
https://github.com/wackou/guessit
#77
(2013-04-11, 16:58)queeup Wrote: Sorry to interrupt. I haven't read the whole topic, but maybe you guys want to check this out for some new ideas.
https://github.com/wackou/guessit

Nice find! I bet there's tons we can borrow from that!
If you have problems please read this before posting

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


"Well I'm gonna download the code and look at it a bit, but I'm certainly not a really good C/C++ programmer. I'd help as much as I can, though; I mostly write in C#."
#78
Good, then I will add one more for video metadata.
https://github.com/Diaoul/enzyme
#79
stop making our lives easier
#80
Believe me, I've been waiting for this Python scraper thing for almost two years, and finally it's happening. Well done. The bad thing is that I only saw this topic today. Shame on me :(
#81
Kickass links ;)
#82
Since this thread has gotten so much attention lately, I want to start a discussion on something I could really use some input on :)

The discussion concerns issues #7 and #9, and, semi-relatedly, #8.

The problem is not really the scheduling algorithms (they need some love, but in essence they should work); it's more how to reorganize the supplies and demands API.

Basically, what we arrive at, IMO, is a subgraph matching and alteration problem. In essence we had that before too, just restricted to a single node (the subject) and its edge.

So what I envision is something along the lines of:
demands: find A where edge(A, owl.sameAs, B) and (B is URL or edge(B, dc.identifier))

This would allow for an owl.sameAs structure like the following:
Code:
{
  "owl.sameAs": [
    "http://themoviedb.org/movie/544",
    {
      "dc.identifier": [ "http://www.imdb.com/title/tt0372784" ],
      "foaf.thumbnail": [ "http://www.imdb.com/media/rm955554048/tt0372784?ref_=tt_ov_i" ]
    }
  ]
}

But I can't find a nice, Pythonic way to express the above query in Python.
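One possible Pythonic shape for that query is a handful of small predicate builders that compose like ordinary functions. This is a minimal sketch under a hypothetical data model (a node is a dict mapping predicate names to lists of values, each value a URL string or a nested node dict, as in the owl.sameAs example); `is_url`, `has_edge`, `any_of` and `find` are illustrative names, not heimdall's actual API.

```python
from collections.abc import Callable, Iterator
from typing import Any

# Hypothetical data model: a node is a dict mapping predicate names to lists
# of values; each value is either a URL string or another nested node dict.
Predicate = Callable[[Any], bool]


def is_url(value: Any) -> bool:
    """Crude check: does the value look like an http(s) URL string?"""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def has_edge(predicate: str, where: Predicate = lambda _: True) -> Predicate:
    """Build a test matching any node with at least one `predicate` edge whose
    target satisfies `where` (by default, any target at all)."""
    def match(node: Any) -> bool:
        return isinstance(node, dict) and any(where(v) for v in node.get(predicate, []))
    return match


def any_of(*preds: Predicate) -> Predicate:
    """Combine tests with a logical OR."""
    return lambda value: any(p(value) for p in preds)


def find(node: dict, predicate: str, where: Predicate) -> Iterator[Any]:
    """Yield every target B of an edge (node, predicate, B) for which where(B) holds."""
    for target in node.get(predicate, []):
        if where(target):
            yield target


# demands: find A where edge(A, owl.sameAs, B) and (B is URL or edge(B, dc.identifier))
same_as_target = any_of(is_url, has_edge("dc.identifier"))
same_as_demand = has_edge("owl.sameAs", same_as_target)

subject = {
    "owl.sameAs": [
        "http://themoviedb.org/movie/544",
        {
            "dc.identifier": ["http://www.imdb.com/title/tt0372784"],
            "foaf.thumbnail": ["http://www.imdb.com/media/rm955554048/tt0372784?ref_=tt_ov_i"],
        },
    ]
}

print(same_as_demand(subject))                                  # True
print(len(list(find(subject, "owl.sameAs", same_as_target))))  # 2
```

Because the builders are plain callables, a supply declaration could reuse the same vocabulary, which would go some way toward making the demand and supply APIs look alike.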

I'd also love it if the demand and supply APIs were similar, and if they provided some validation of the output.

At the moment, a task can declare that it outputs a certain edge and nothing else, but when run it can output anything :) This could potentially break scheduling. So if a task misbehaves, I'd love it if heimdall were able to detect that and simply throw away the result :)
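The output check described above could be as simple as comparing the predicates a task actually emitted against the ones it declared, and dropping the whole result on any mismatch. This is a minimal sketch assuming a hypothetical `Task` record with a `supplies` set and a `run` callable that returns a predicate-to-values dict; it is not how heimdall currently models tasks.

```python
import logging
from collections.abc import Callable
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("scheduler")


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: a name, the predicates it declares it will
    supply, and the function that actually does the work."""
    name: str
    supplies: frozenset
    run: Callable[[dict], dict]


def run_validated(task: Task, subject: dict) -> Optional[dict]:
    """Run a task and keep its output only if it emitted nothing beyond the
    predicates it declared; otherwise discard the whole result."""
    result = task.run(subject)
    undeclared = set(result) - task.supplies
    if undeclared:
        log.warning("task %s emitted undeclared edges %s; discarding result",
                    task.name, sorted(undeclared))
        return None
    return result


good = Task("tmdb", frozenset({"owl.sameAs"}),
            lambda s: {"owl.sameAs": ["http://themoviedb.org/movie/544"]})
rogue = Task("rogue", frozenset({"owl.sameAs"}),
             lambda s: {"owl.sameAs": ["http://themoviedb.org/movie/544"],
                        "foaf.thumbnail": ["http://example.com/poster.jpg"]})

print(run_validated(good, {}))   # {'owl.sameAs': ['http://themoviedb.org/movie/544']}
print(run_validated(rogue, {}))  # None (and a warning is logged)
```

Discarding the entire result, rather than filtering out only the undeclared edges, follows the "just throw away the result" idea: a task that lies about its outputs arguably can't be trusted for the edges it did declare either.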

Cheers,
Tobias
#83
(2013-05-09, 19:49)The Movie Database Wrote: Searching is an important tool for a project like TMDb. Without a good search we end up with duplicates, frustrated users and quite frankly a less than stellar experience. Over the past few years we've had a lot of things change, especially with the amount of non-English content that has been added to our database. We've also grown a lot and our old search infrastructure simply wasn't up for the task.

Starting yesterday, we rolled out a completely brand new, built from scratch search that we feel very proud of. We're not saying it's going to be perfect but it's a foundation we can feel confident growing into.

Along with these improvements behind the scenes, we also added two new options to search with. 'primary_release_year' and 'search_type' are new. You can read about how these work by visiting our search documentation.

http://docs.themoviedb.apiary.io/#search

As always, if you notice any specific issues make sure to head over to our support area and let us know.

One last thing, we also released more than just a new search, as we have brought the idea behind our 2.1 "Movie.browse" method into v3 but made it considerably better. We've renamed it "discover" and it's pretty awesome. You can read more about it by visiting our API documentation.

http://docs.themoviedb.apiary.io/#discover

From their Facebook page: https://www.facebook.com/themoviedb

It looks like they've been working hard on the search issue as well. With a search engine on their end so heavily optimized for the movie domain, I'm wondering how much thought we'll need to put in to contribute anything statistically significant to their results.
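On the scraper side, taking advantage of the new year filter mostly means passing it along with the query. This minimal sketch only builds the request URL (no network call); `search_movie_url` is a hypothetical helper, the parameter names come from the announcement quoted above, and the accepted values for `search_type` are deliberately not guessed here. TMDb's search documentation is the authority on both.

```python
from typing import Optional
from urllib.parse import urlencode

# TMDb v3 base URL; parameter names below are the ones named in the
# announcement. Check TMDb's current docs for their exact semantics.
API_BASE = "https://api.themoviedb.org/3"


def search_movie_url(api_key: str, query: str,
                     primary_release_year: Optional[int] = None,
                     search_type: Optional[str] = None) -> str:
    """Build a /search/movie URL, adding the new optional filters only when given."""
    params = {"api_key": api_key, "query": query}
    if primary_release_year is not None:
        params["primary_release_year"] = primary_release_year
    if search_type is not None:
        params["search_type"] = search_type  # allowed values: see TMDb's search docs
    return f"{API_BASE}/search/movie?{urlencode(params)}"


print(search_movie_url("YOUR_API_KEY", "Batman Begins", primary_release_year=2005))
# https://api.themoviedb.org/3/search/movie?api_key=YOUR_API_KEY&query=Batman+Begins&primary_release_year=2005
```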
#84
Error results on our end. While they have developed the definitions on their end, we need the application of those terms. Basically, we want to sort out false results and possibly fix an erroneous result so that it is correct from now on and the corrected ID is remembered. We also need to know when to search and when not to.

(2013-05-09, 23:05)garbear Wrote: It looks like they've been working heavily on the search issue as well. With a search engine on their end so heavily optimized in the domain of movies, I'm imagining how much thinking we're going to need to put in to actually contribute anything statistically significant to their results.
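The idea of fixing a wrong match once, remembering the corrected ID, and knowing when not to search can be pictured as a small override table consulted before any search happens. This is a minimal sketch assuming an in-memory dict, a hypothetical `search` callable that returns ranked candidate IDs, and made-up IDs; persistence and how the user supplies a correction are left out.

```python
from collections.abc import Callable
from typing import Optional


class MatchMemory:
    """Hypothetical helper: remembers user-corrected IDs per title, so a bad
    match is fixed once and the search is skipped from then on."""

    def __init__(self, search: Callable[[str], list]):
        self._search = search                  # title -> ranked list of candidate IDs
        self._corrections: dict[str, str] = {}

    @staticmethod
    def _key(title: str) -> str:
        # Normalise case and whitespace so trivial variations hit the same entry.
        return " ".join(title.lower().split())

    def correct(self, title: str, movie_id: str) -> None:
        """Record the right ID for a title whose search result was wrong."""
        self._corrections[self._key(title)] = movie_id

    def lookup(self, title: str) -> Optional[str]:
        key = self._key(title)
        if key in self._corrections:
            return self._corrections[key]      # known answer: don't search at all
        candidates = self._search(title)
        return candidates[0] if candidates else None


calls = []


def fake_search(title: str) -> list:
    calls.append(title)
    return ["tt0000001"]                       # pretend this top hit is wrong


memory = MatchMemory(fake_search)
print(memory.lookup("Some Movie"))             # tt0000001 (searched)
memory.correct("Some Movie", "tt0000002")
print(memory.lookup("some  movie"))            # tt0000002 (from memory)
print(calls)                                   # ['Some Movie']
```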