2008-08-17, 22:18
since i never touched the filmaffinity scraper, all of this might not be 100% correct. however, since you want to use it as a reference...
\1 is NOT the contents of variable one; it is the first selection (capture group) in the regexp. big difference. $$1 is the contents of variable (buffer) 1. remember that we evaluate expressions in LIFO order, meaning we evaluate the innermost expressions first.
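to make the \1 versus $$1 difference concrete, here is a small python sketch. it is a toy model and not the actual scraper engine: the function name fill_output, the buffers dict and the sample strings are all made up for illustration.

```python
import re

def fill_output(template, match, buffers):
    """Toy model (hypothetical helper): expand \\N and $$N in an output template."""
    def repl(m):
        if m.group(1) is not None:
            # \N: the Nth selection (capture group) of the regexp that just matched
            return match.group(int(m.group(1))) or ""
        # $$N: the current contents of buffer N, unrelated to the regexp
        return buffers.get(int(m.group(2)), "")
    # a single pass, so substituted text is never expanded a second time
    return re.sub(r"\\(\d+)|\$\$(\d+)", repl, template)

# made-up sample data
buffers = {1: "page html", 7: "123456"}
match = re.search(r"id=(\d+)", "film.php?id=987")
result = fill_output(r"<id>\1</id><from>$$7</from>", match, buffers)
print(result)
```

here \1 comes from the match on the sample string, while $$7 comes from whatever was stored in buffer 7 earlier.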
"the second expression" (which will be the first one that gets evaluated) grabs the film id and sticks it in buffer 7. however, this one will only match IF we were redirected to a perfect match, i.e. we are on a film info page. the third expression will fill buffer 5 with this match IF we have one (the $$7 is used to indicate the contents of buffer 7, just like you do input="$$1" for the contents of buffer 1).
as i have always said, the wiki is NOT an authoritative documentation source. the <id> tag is an optional tag that entities may or may not fill. it is really handy on sites that use ids to form urls etc.
the fourth expression is irrelevant, it does not help anything.
the fifth expression is the real meat if we are on a search page.
when that is done we return to the first expression and evaluate it. it takes the contents of buffer 5, selects all of it and sticks a <results> tag around our matches. an empty <expression> tag means select everything in the input.
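the evaluation order described above can be sketched in python like this. again a toy model under stated assumptions, not the real engine: the dict layout, the names evaluate and expand, the "append" flag and the use of "(.*)" for an empty expression are all my own choices for illustration.

```python
import re

def expand(template, match, buffers):
    # \N -> capture group N of the current match, $$N -> contents of buffer N
    def repl(m):
        if m.group(1) is not None:
            return match.group(int(m.group(1))) or ""
        return buffers.get(int(m.group(2)), "")
    return re.sub(r"\\(\d+)|\$\$(\d+)", repl, template)

def evaluate(node, buffers, order):
    # inner expressions run first, the enclosing one runs last
    for child in node.get("children", []):
        evaluate(child, buffers, order)
    order.append(node["name"])
    # assumption: an empty expression selects the whole input
    pattern = node["expression"] or "(.*)"
    match = re.search(pattern, buffers.get(node["input"], ""), re.DOTALL)
    if match is None:
        return  # no match: the destination buffer is left alone
    value = expand(node["output"], match, buffers)
    if node.get("append"):
        value = buffers.get(node["dest"], "") + value
    buffers[node["dest"]] = value

# hypothetical tree loosely following the post: grab an id, build an entity, wrap it
tree = {
    "name": "wrap results", "input": 5, "expression": "",
    "output": r"<results>\1</results>", "dest": 3,
    "children": [
        {"name": "grab id", "input": 1, "expression": r"id=(\d+)",
         "output": r"\1", "dest": 7},
        {"name": "build entity", "input": 7, "expression": r"(\d+)",
         "output": r"<entity><id>$$7</id></entity>", "dest": 5, "append": True},
    ],
}

buffers = {1: "film.php?id=987"}  # made-up page content in buffer 1
order = []
evaluate(tree, buffers, order)
print(order)
print(buffers[3])
```

if the "grab id" expression does not match (we are not on a film info page), buffer 7 stays empty, "build entity" does not match either, and the outer expression just wraps whatever else ended up in buffer 5.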