Solved How to scrape 2 pages in one scraper/Merging two titles into customized <title>
#1
Hello,

I am trying to write code that merges two titles into a single <title> tag. I search for the original title first; if it exists, I then search for the local name and append it as a suffix: "original name (local name)". So far so good, and no problem with the dest+ syntax.

But the site I am scraping does not always list the original name. In that case I would like to end up with just "local name", and that is the catch. I cannot even produce "local name (local name)" when the original name was not found; the only result I get in that case is " (local name)".

I tried:

Code:
<GetDetails clearbuffers="no" dest="3">
  <RegExp input="$$8" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">

    <!-- local name as dummy record -->
    <RegExp input="$$1" output="\1" dest="7">
      <expression noclean="1" trim="1">regex_localname</expression>
    </RegExp>

    <!-- try the original name: should not overwrite if there is no match, but in fact it does -->
    <RegExp input="$$1" output="\1" dest="7">
      <expression clear="no" noclean="1" trim="1">regex_originalname</expression>
    </RegExp>

    <!-- local name suffix -->
    <RegExp input="$$1" output=" (\1)" dest="7+">
      <expression noclean="1" trim="1">regex_localname</expression>
    </RegExp>

    <!-- merge to output -->
    <RegExp input="$$7" output="&lt;title&gt;\1&lt;/title&gt;" dest="8">
      <expression noclean="1"></expression>
    </RegExp>    

...
...

I also tried calling a function from another scraper that looks up the original name (GetTMDBTitleByIdChain@TMDB), but I can't get it to work. It either returns the whole <details><title>original name</title></details> string into dest=8, or is not called at all.

Does anybody know what I am doing wrong? It doesn't matter whether I use another scraper to find the original name or find some way to do an if-else on the result of my own original-name scraping (in case it is unsuccessful).

Thanks in advance

P.S.: I want both titles because I would like to see both names in the Confluence edit skin. If I use <originaltitle> and <title>, I see only the title. Besides, if originaltitle is empty, I think I will hit the same problem I described, only in a different place.
#2
Basically, it would be enough to have a working <url function> inside the existing scraper that lets me parse the IMDb site, for example, and put only the title into one of the buffers so I can work with it. Still no luck: the IMDb site isn't parsed at all unless I wrap it inside two regex blocks with <details> etc.
#3
Functions can have a clearbuffers="no" attribute on their tag. If it is present (and its value is no), the contents of the buffers will not be cleared before the function is evaluated. This way you can pass info from the previous function in any buffer. Just make sure you don't use $$1, since that buffer holds the function's own input.
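
A minimal, untested sketch of how that might look. The function name ParseOriginalTitle, the example.com URL, the regex_id placeholder and the choice of buffer 7 are all hypothetical; regex_localname and regex_originalname are the placeholders from the first post. The first function stores the local name in buffer 7 and hands off to a second function via <url function>. Because the second function carries clearbuffers="no", buffer 7 should still be filled when it runs, while $$1 holds the newly fetched page.

```xml
<!-- Sketch only: function name, URL, regex_id and buffer numbers are illustrative. -->
<GetDetails dest="3">
  <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
    <!-- keep the local name in buffer 7 for the next function -->
    <RegExp input="$$1" output="\1" dest="7">
      <expression noclean="1" trim="1">regex_localname</expression>
    </RegExp>
    <!-- request a second page and have ParseOriginalTitle evaluate it -->
    <RegExp input="$$1" output="&lt;url function=&quot;ParseOriginalTitle&quot;&gt;http://example.com/title/\1&lt;/url&gt;" dest="5">
      <expression>regex_id</expression>
    </RegExp>
    <expression noclean="1"></expression>
  </RegExp>
</GetDetails>

<!-- clearbuffers="no": buffer 7 survives; $$1 is the newly fetched page -->
<ParseOriginalTitle clearbuffers="no" dest="3">
  <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
    <!-- original name from the new page, local name from buffer 7 -->
    <RegExp input="$$1" output="&lt;title&gt;\1 ($$7)&lt;/title&gt;" dest="5">
      <expression noclean="1" trim="1">regex_originalname</expression>
    </RegExp>
    <expression noclean="1"></expression>
  </RegExp>
</ParseOriginalTitle>
```

The point of the split is that only the second function needs clearbuffers="no"; the first one can start from clean buffers as usual.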
#4
Thank you very much! That parameter saved me. I can pass the title from the original buffer as a parameter to a function that calls the TMDB function gettitlebyID, so it always returns "TMDBTitle (localtitle)" output. Thank you!
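
For anyone who still needs the if-else from the first post inside a single function (local name only when no original name is found), here is one possible sketch, untested. It assumes that buffer references such as $$7 are expanded inside the output attribute and that a RegExp leaves its dest untouched when its expression does not match; buffers 6 and 9 are an arbitrary, hypothetical choice.

```xml
<!-- Sketch only: buffer numbers are arbitrary; regex_* are the placeholders from post #1. -->

<!-- buffer 7: local name -->
<RegExp input="$$1" output="\1" dest="7">
  <expression noclean="1" trim="1">regex_localname</expression>
</RegExp>

<!-- buffer 9: default result, just the local name -->
<RegExp input="$$7" output="\1" dest="9">
  <expression noclean="1"></expression>
</RegExp>

<!-- buffer 6: original name; clear="yes" empties it when there is no match -->
<RegExp input="$$1" output="\1" dest="6">
  <expression clear="yes" noclean="1" trim="1">regex_originalname</expression>
</RegExp>

<!-- matches only when buffer 6 is non-empty: replace the default with "original (local)" -->
<RegExp input="$$6" output="\1 ($$7)" dest="9">
  <expression noclean="1">(.+)</expression>
</RegExp>

<!-- merge to output -->
<RegExp input="$$9" output="&lt;title&gt;\1&lt;/title&gt;" dest="8">
  <expression noclean="1"></expression>
</RegExp>
```

The idea is that (.+) cannot match an empty buffer, so the last overwrite acts as the "if original name exists" branch and the default in buffer 9 acts as the "else".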
#5
Post edited because it was hurting my eyes too much.