New to scraper development, help please
#1
Hi

I have installed Apache/MySQL/PHP on my media server (192.168.0.10), and I have a script that responds to, for example,

http://192.168.0.10/search.php?videoID=tt4638525

with XML like this:

Code:
<?xml version="1.0" encoding="UTF-8"?>
<movie>
    <details>
        <title></title>
        <year></year>
        <director></director>
        <top250></top250>
        <mpaa></mpaa>
        <tagline></tagline>
        <runtime></runtime>
        <thumb></thumb>
        <credits></credits>
        <rating></rating>
        <votes></votes>
        <genre></genre>
        <actor>
            <name></name>
            <role></role>
        </actor>
        <outline></outline>
        <plot></plot>
    </details>
</movie>

This is how far I have got with the scraper, but I don't think it's right. I can't get my head around regexes at all.

Code:
<scraper name="LocalMedia" content="movies" thumb="LocalMedia.gif">

  <NfoUrl dest="3">
    <RegExp input="$$1" output="http://192.168.0.10/search.php?videoID=/1"  dest="3">
      <expression noclean="1">192.168.0.10/(.*)</expression>
    </RegExp>
  </NfoUrl>

  <CreateSearchUrl>
    <RegExp>
      <expression></expression>
    </RegExp>
  </CreateSearchUrl>

   <GetSearchResults>
      <RegExp>
         <expression></expression>
      </RegExp>
   </GetSearchResults>

   <GetDetails>
      <RegExp>
         <expression></expression>
      </RegExp>
   </GetDetails>

</scraper>

Any help appreciated, please.
#2
I don't know why you want to do it this way, and I am not really good at regex either.
It should be really simple to do, though.


Code:
<GetDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="5">
                <expression noclean="1">&lt;title&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="5+">
                <expression noclean="1">&lt;year&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
</GetDetails>

Just copy these and change the rest of the tags in the same way.
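As a rough sketch of that "copy and change" step, here is the same block extended with two more fields. The choice of director and plot is only an example taken from the response in the first post, and this has not been run against the actual search.php output. Each inner RegExp reads the fetched page from $$1; the first one writes to buffer 5 and the later ones append to it with dest="5+".

```xml
<GetDetails dest="3">
        <!-- Outer RegExp: wraps whatever the inner ones collected in buffer 5 -->
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <!-- First field writes to buffer 5 -->
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="5">
                <expression noclean="1">&lt;title&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <!-- Every later field appends to buffer 5 (note the "+") -->
            <RegExp input="$$1" output="&lt;director&gt;\1&lt;/director&gt;" dest="5+">
                <expression noclean="1">&lt;director&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
                <expression noclean="1">&lt;plot&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
</GetDetails>
```

One thing to watch: the leading "." in (.[^&lt;]*) needs at least one character, so on an empty element such as &lt;plot&gt;&lt;/plot&gt; it would swallow the "&lt;" and capture the closing tag instead. If search.php can return empty elements, ([^&lt;]+) is probably the safer capture, since it simply fails to match when there is nothing between the tags.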


I don't know how your search.php outputs search results, so I can't help with that.
#3
AFAICT the response is almost the XBMC format. If so:

Code:
<GetDetails dest="3">
  <RegExp input="$$1" output="\1\2" dest="3">
    <expression noclean="1">(.*)&lt;movie&gt;(.*)&lt;/movie&gt;</expression>
  </RegExp>
</GetDetails>
does the job. If you ditch the movie tag in the output, you can do
Code:
<GetDetails dest="3">
  <RegExp input="$$1" output="\1" dest="3">
    <expression noclean="1"/>
  </RegExp>
</GetDetails>
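For illustration, a search.php response without the movie wrapper might look like the sketch below. The fields are copied from the response in the first post and the empty values are placeholders, not real output. With this shape, the empty expression matches the whole page and \1 passes it through unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<details>
    <title></title>
    <year></year>
    <director></director>
    <!-- remaining fields exactly as in the original response -->
    <actor>
        <name></name>
        <role></role>
    </actor>
    <outline></outline>
    <plot></plot>
</details>
```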

As for NfoUrl, that is for recognizing a URL in a .nfo file. Something like
Code:
<NfoUrl dest="3">
  <RegExp input="$$1" output="\1" dest="3">
    <expression>(http://192.168.0.10/.*)</expression>
  </RegExp>
</NfoUrl>
does the job.
#4
Wow, way better and simpler ;P
#5
Wow, thanks for the quick responses.

The reason I want to do it this way is to keep all my Xboxes synchronised.

I've got a scraper that I made in VB, which adds the scraped details to MySQL. My reasoning is that my 100 Mbps home network is WAAAAAAAAY faster than my broadband (0.5 Mbps), so my scraper examines my media folders and then scrapes anything that is needed.

Now the files and data are stored locally on my high-speed network and available to all the Xboxes when they run their update, and hopefully they will all update much quicker.