spiff Wrote:2) i don't get what you mean here? nfourl ultimately returns an url?
there are some that return a plain string,
some that return a <url> element,
and some that return a <url> element and an <id> element (mind you, not enclosed in a parent tag either, just as siblings, which makes them difficult to handle with the .NET XML handlers).
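For example, depending on the scraper, NfoUrl's output can look like any of these (the IMDb id here is just for illustration); the last form has no root element, so something like .NET's XmlDocument.LoadXml() rejects it outright:
Code:
tt0110912
<url>http://www.imdb.com/title/tt0110912/</url>
<url>http://www.imdb.com/title/tt0110912/</url><id>tt0110912</id>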
spiff Wrote:1) we fill up the buffers starting with 1, second url is stuffed in 2 and so on. this semantic is valid everywhere, for those functions which are passed e.g. title in buffer 2, it should be "the buffer after the last url"
Every time I think I've got this stuff pegged there's something new that I have to completely redesign for... well, not completely redesign, in fact; with the way I have things now, there's only a small section of each Content handler that needs to be redesigned.
However, with multiple URLs, what do you use for the url value if there needs to be a url value in the buffer?
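If I'm reading that rule right, a GetDetails run over two URLs would see its buffers laid out something like this (contents invented for illustration):
Code:
$$1 = HTML of the first <url>
$$2 = HTML of the second <url>
$$3 = title (i.e. "the buffer after the last url")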
There are a whole lot of things that are too often variable... I kinda think things could have a more rigid standard and still be flexible (and this would make the code more portable as well), but that would be too easy and I would have been done with this thing ages ago... then what would I do?
Ideas to consider:
1) For the multiple URLs, I think it would be a better idea to run the GetDetails function against each one (i.e. download one page and run GetDetails, download the next page and run GetDetails again, and so forth) so that the only buffer needed for HTML is $$1, then run the custom functions after all the referenced URLs are finished.
Edit: I just thought of a few complications this might cause if the different pages have matches for something they shouldn't match. Perhaps have the extra pages run as custom functions? In fact, that's what I thought custom functions were for.
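If I've understood the custom-function mechanism right, that would mean GetDetails emits a <url function="..."> element, the page behind it is downloaded into $$1 on its own, and the named function's results get merged into the details, so two pages never share a buffer. A rough sketch (GetCast and the URLs are made up):
Code:
<GetDetails dest="3">
    <RegExp input="$$1" output="&lt;details&gt;&lt;title&gt;\1&lt;/title&gt;&lt;url function=&quot;GetCast&quot;&gt;http://www.example.com/title/castpage&lt;/url&gt;&lt;/details&gt;" dest="3">
        <expression noclean="1">&lt;h1&gt;([^&lt;]+)&lt;/h1&gt;</expression>
    </RegExp>
</GetDetails>
<!-- the cast page arrives in $$1 by itself, so its matches can't collide with the main page -->
<GetCast dest="5">
    <RegExp input="$$1" output="&lt;details&gt;&lt;actor&gt;&lt;name&gt;\1&lt;/name&gt;&lt;/actor&gt;&lt;/details&gt;" dest="5">
        <expression repeat="yes" noclean="1">class="cast"&gt;([^&lt;]+)&lt;</expression>
    </RegExp>
</GetCast>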
2) To make the scraper more flexible: use buffers $$1 and $$2 for storing values only, with buffer $$2 holding the item to retrieve from the last (standard, not custom) function.
i.e. CreateSearchUrl's results are passed to $$2 (or, in light of what you just told me about the chaining to buffers, the next buffer after the last HTML) for use with GetSearchResults; the <entity> to retrieve from GetSearchResults is passed to $$2 on selection, to be used with GetDetails.
This way the user can gather whatever info they want in the GetSearchResults function and reuse it in the GetDetails function; all necessary URLs, IDs and titles will be there too, ready to use.
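As a sketch of what that could look like (the <entity> contents are invented), GetDetails would mine $$2 directly instead of juggling per-scraper buffers:
Code:
<!-- assume selection left this in $$2:
     <entity><title>Some Movie</title><id>tt0000000</id></entity> -->
<GetDetails dest="3">
    <RegExp input="$$2" output="&lt;details&gt;&lt;title&gt;\1&lt;/title&gt;&lt;/details&gt;" dest="3">
        <expression noclean="1">&lt;title&gt;([^&lt;]+)&lt;/title&gt;</expression>
    </RegExp>
</GetDetails>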
3) Add a urlencode option to RegExp or expression so that it will be possible to pull things like an artist name or title and URL-encode them for fanart and thumbnail searches.
Code:
<RegExp input="$$2" output="\1" urlencode="yes" dest="3">
<expression clear="yes"><artist>(.+)</artist></expression>
</RegExp>
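The encoded value in $$3 could then be dropped straight into a thumbnail search URL; the endpoint here is made up:
Code:
<RegExp input="$$3" output="&lt;url&gt;http://thumbs.example.com/search?artist=\1&lt;/url&gt;" dest="4">
    <expression noclean="1">(.+)</expression>
</RegExp>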
4) Limit NfoUrl returns to either an <entity> element (containing things like id, title and url elements) or a <url> element, since its value is used directly in GetDetails.
Code:
<NfoUrl dest="4">
<RegExp input = "$$3" output=""<entity>\1"</entity> dest = $$4>
<RegExp input="$$1" output=<url>http://www.\1/title/tt\2/</url><id>tt\2</id>" dest="3">
<expression clear="yes" noclean="1">(imdb.com/)Title\?([0-9]*)</expression>
</RegExp>
<expression noclean="1">(.+)</expression>
</RegExp>
</NfoUrl >
or
Code:
<NfoUrl dest="4">
<RegExp input="$$1" output=<url>http://www.\1/title/tt\2/</url>" dest="4">
<expression clear="yes" noclean="1">(imdb.com/)Title\?([0-9]*)</expression>
</RegExp>
</NfoUrl >
Most of the problems I'm running into in scraperXML come from there not being an across-the-board standard: each scraper type has different options that need to be set into buffers. By using just $$2 to hold the values each scraper needs, as a standard, you can use the same code for all scrapers, except in CreateSearchUrl, where it will still be necessary to pass values specific to each scraper. The urlencode option can handle setting things that need to be URL-encoded for custom functions.
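So the standard I'm picturing boils down to something like:
Code:
$$1        = HTML of the page just downloaded (same for every function)
$$2        = the hand-off value from the previous standard function:
             CreateSearchUrl  -> GetSearchResults : the search <url>
             GetSearchResults -> GetDetails       : the selected <entity>
$$3 and up = free for the scraper writer (urlencoded values, custom-function inputs, etc.)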