
UsagiYojimbo · (This post was last modified: 2013-03-05, 00:40 by UsagiYojimbo.)

22 months after Nicezia's last response on the Scraper Editor (Based on ScraperXML open source C# Library) thread, I put together a similar editor in Java.
It runs under Java 1.6; that is its only requirement.

Project page on SourceForge, with description, screen-shots, and download links. Also available on SoftPedia.

I thought it needed a standalone topic, so I opened one.

Below is the list of changes:

Code:
v 0.1.2.66 usagi @ 2013-03-04

- XML syntax highlight in output fields

  (The colors used: <tags>     : cyan

                    attribute  : magenta

                    attr value : blue

                    CDATA      : gray

                    &entity;   : green

                    &inv.entity: red)

v 0.1.2.65 usagi @ 2013-03-03

- Number of buffers changed to 20

- Added comments to the engine, and other classes

- Added a new scraping method to the engine

- Changed FunctionList to not replace existing functions (EG:<include>)

- Added setting for Scraper folder

- Fixed duplicate logging when the same debugger window is reused

- Minor changes and fixes

v 0.1.2.60 usagi @ 2013-02-06

- Message box on open/save errors

- <include> tags:

  - (Re)loading on scraping/debugging

  - Dropping on closing the debugger window

  - TODO editing includes

  - TODO mapping include folders

- URL encoding input text

- Separate input field for year

v 0.1.2.56 usagi @ 2013-02-03

- Issues with the "cs" attribute

v 0.1.2.55 usagi @ 2013-01-30

- Minor logging error that broke the debugging process

- JRE 1.6 vs 1.7 (JAXB 2.0/2.1 vs 2.2) issue

v 0.1.2.51 usagi @ 2012-12-19

- GUI Threading fix(?)

- Some minor changes

v 0.1.2.50 usagi @ 2012-11-28

- Added Library support: <scraperfunctions>

v 0.1.2.49 usagi @ 2012-11-27

- Implemented debugger feature

- Some minor improvements

v 0.1.2.45 usagi @ 2012-11-18

- Added debugger interface

- Showing execution progress

- Added $$variable to output, too

- Added configurable values

- Added $INFO to access values

- Added caseSensitive attribute

v 0.1.2.42 usagi @ 2012-11-13

- Added Move Up/Down to RegExp's context menu

- Added Rename to Function's context menu

- Added click on matches event in RegExp tester

v 0.1.2.39 usagi @ 2012-11-12

- Some bug-fixes

v 0.1.2.38 usagi @ 2012-11-11

- Dropped Log4J, returned to java.util.logging

- Grouping of undo edits

- Some minor improvements

- Dropped global Edit menu

v 0.1.2.29 usagi @ 2012-11-08

- Added some editing functions to the text components:

  - Editing actions:     cut, copy, paste, select all, delete

  - Simple undo actions: undo, redo

  - Context menu and key bindings to those actions

- Some minor improvements

- On my TODO list:

  + Grouping of undo edits

  - Implementing global Edit menu

  - Exporting as an Add-On

v 0.1.2.28 usagi @ 2012-10-28

- Added Drag'n'Drop support:

  - Normal Drag: RegExp exchanged with another RegExp

  - Shift + Drag: RegExp copied over another RegExp

- Added Duplicate menu-item to RegExp's context menu

v 0.1.1.25 usagi @ 2012-10-26

- Added RegExp numbering by execution order

- Removed change tracking attempts

- Some minor improvements

- Added CHANGELOG file

v 0.1.1.22 usagi @ 2012-10-22

- Some minor improvements

v 0.1.1.18 usagi @ 2012-10-16

- Added tester

- Some minor improvements

v 0.1.1.7 usagi @ 2012-10-11

- Added Hungarian translation

- Some minor changes

v 0.1.1.6 usagi @ 2012-10-10

- Created project on SF.net

- Initial SVN commit

- Some minor changes

Known issues / planned features:

Code:
- Due to the lack of documentation, many features are missing.

- The engine does not work the same way the real XBMC does, as some features of XBMC contradict the documentation.

  - Some Scrapers do not return valid XML (EG: tmdb.xml)

  - Some Scrapers use variable numbers above 10, while the documentation names only 9 variables (1-9).

- Currently the result of the scraping does not get processed, just displayed. So no functions get called.

UsagiYojimbo · (This post was last modified: 2012-11-17, 00:26 by UsagiYojimbo.)

(2012-11-10, 13:35)takoi Wrote: - When using the "check scraper" menu to create new functions, the tree list is not updated until you create another one via right-click

Fixed.

(2012-11-10, 13:35)takoi Wrote: - Removing functions does not work

Fixed.

(2012-11-10, 13:35)takoi Wrote: - The "expression" box is cleared if you click on a regexp with empty expression

And why is that a problem?

(2012-11-10, 13:35)takoi Wrote: - When browsing for files, there's no way to enter a hidden folder

Fixed.

(2012-11-10, 13:35)takoi Wrote: - $INFO[language] etc, $$n and %20 in the output attribute are not substituted

The wiki pages Scrapers and HOW-TO: Write media scrapers do not mention that $$variables should be substituted in output. $INFO and %20 are not mentioned either.
If you can give me a specification that describes these, I will implement them.

(2012-11-14, 21:33)spiff Wrote: $INFO[foo] reads the string setting from resources/settings.xml (or more likely, the user data equivalent). it is replaced by the string value prior to a regexp execution, as well as in an output string.

i'm not entirely sure what @takoi refers to when he mentions %20 (i.e. url encoding) as there is no decoding going on in the scraper parser. the closest is the cleaning operation which is applied by default (unless noclean is specified for a buffer) which strips any html tags and trims whitespace. you also have the 'fixchars' attribute which replaces html entities by their unicode equivalent, e.g. &amp; -> &

$$n should be replaced in both expression and output strings. it's the *content* of buffer n at the time of evaluation. i mention time of evaluation because if an expression is run with a repeat, it is applied *before* the loop so you should not go and recursively replace (in case the output buffer is the same..)

I will look into them...

$$n is added.
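
A minimal sketch of the substitution rules spiff describes above, in Java since that is what ScraperEdit is written in. It is not ScraperEdit's or XBMC's actual code: the class `BufferSubstitution`, its method, the choice to read at most two digits after `$$`, and the choice to replace unknown references with an empty string are all assumptions made for illustration. All references are resolved in one pass, so substituted content is never scanned again (no recursive replacement).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of ScraperEdit or XBMC. */
final class BufferSubstitution {

    // Group 1: buffer number of a $$n reference; group 2: name of a $INFO[name] reference.
    private static final Pattern REFERENCE =
            Pattern.compile("\\$\\$(\\d{1,2})|\\$INFO\\[([^\\]]+)\\]");

    /**
     * Replaces $$n with the current content of buffers[n] and $INFO[name]
     * with the matching setting. Index 0 of the array is unused.
     */
    static String substitute(String text, String[] buffers, Map<String, String> settings) {
        Matcher m = REFERENCE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value;
            if (m.group(1) != null) {
                int n = Integer.parseInt(m.group(1));
                value = (n < buffers.length) ? buffers[n] : null;
            } else {
                value = settings.get(m.group(2));
            }
            if (value == null) {
                value = ""; // assumption: unknown references become empty
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] buffers = new String[21];
        buffers[1] = "alien";
        buffers[4] = "1979";
        Map<String, String> settings = new HashMap<String, String>();
        settings.put("language", "en");
        System.out.println(substitute(
                "query=$$1&year=$$4&language=$INFO[language]", buffers, settings));
    }
}
```

With the values in `main`, the template becomes `query=alien&year=1979&language=en`. A buffer whose content is itself `$$1` is inserted literally, which matches the "do not recursively replace" remark.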

UsagiYojimbo · 2012-11-19, 07:30

New version is out...

UsagiYojimbo · 2012-11-28, 00:05

New release is out: Added debugger....

Daniel Malmgren · 2012-11-28, 21:35

I don't know how much is finished and supposed to work (didn't find any list of known issues), but I've got some things to report:

Not much of the scraper details are fetched from my scraper. It shows framework version and date, the rest of the fields shows up empty.

The Check Scraper function doesn't work at all, guess it's wip?

The Scraper Tester doesn't work, when trying to test anything at all I get [SEVERE] hu.yvs.xbmc.xml.addon.scraper.Function cannot be cast to javax.xml.bind.JAXBElement

The condition for regexp's are always greyed out.

With that said, this is a nice project. I like editing my stuff in text editor, but it's easier to look at it in a gui like this without all the html encoding :-)

/Daniel

flobbes · 2012-11-28, 22:39

I like it a lot as well.

Finally a scraper editor that is running under linux.

The scraper tester isn't working for me either.

Keep up the good work and thanks for the program so far!

UsagiYojimbo · (This post was last modified: 2012-11-29, 01:00 by UsagiYojimbo.)

(2012-11-28, 21:35)Daniel Malmgren Wrote: I don't know how much is finished and supposed to work (didn't find any list of known issues), but I've got some things to report:

It is in Alpha/Pre-Beta state. There is no list of known issues.

(2012-11-28, 21:35)Daniel Malmgren Wrote: Not much of the scraper details are fetched from my scraper. It shows framework version and date, the rest of the fields shows up empty.

Could You provide me with such a scraper? My Scrapers and the XBMC core Scrapers work well.

(2012-11-28, 21:35)Daniel Malmgren Wrote: The Check Scraper function doesn't work at all, guess it's wip?

It had some bugs, I corrected them (all, I hope).

(2012-11-28, 21:35)Daniel Malmgren Wrote: The Scraper Tester doesn't work, when trying to test anything at all I get [SEVERE] hu.yvs.xbmc.xml.addon.scraper.Function cannot be cast to javax.xml.bind.JAXBElement

This error is not related to the Tester. It happened because You opened a Library instead of a Scraper. From the next release on (v 0.1.2-50), ScraperEdit supports Libraries, too.

(2012-11-28, 21:35)Daniel Malmgren Wrote: The condition for regexp's are always greyed out.

Yes, this is because I did not find any documentation about its function or use. I just saw it in ScraperEditor (the .Net/Mono app).

(2012-11-28, 21:35)Daniel Malmgren Wrote: With that said, this is a nice project. I like editing my stuff in text editor, but it's easier to look at it in a gui like this without all the html encoding :-)

Thank You!

opperpanter · (This post was last modified: 2012-12-15, 22:19 by opperpanter.)

Looks like a nice tool.

I was trying to open imdb.xml from xbmc scraper. But nothing is shown in ScraperEdit.

Is this because the imdb.xml uses other functions/names compared to the default ones like <NfoUrl> etc?

Here's the file I am talking about: https://github.com/akuiraz/xbmc-official...m/imdb.xml

EDIT: never mind. I hadn't downloaded the file correctly from github (save as doesn't work when browsing source).

daytooner · 2012-12-17, 04:44

I'm using this on a linux box (Fedora 17 - 64 bit). Some of the gui stuff doesn't seem to be working quite right. In particular, in the tester/debugger, if I click on any of the number buttons (I assume they are for the variables), then the entire app freezes. In the java terminal output, there is this message:

Quote: at hu.yvs.xbmc.scraper.tester.ScraperDebugger.scrape(ScraperDebugger.java:122)
at hu.yvs.xbmc.scraper.tester.ScraperDebugger.run(ScraperDebugger.java:64)

2012-12-16 18:40:44 [INFO] hu.yvs.xbmc.scraper.tester.ScraperTesterDlg actionPerformed: src = javax.swing.JButton[,2,0,36x50,alignmentX=0.0,alignmentY=0.5,border=javax.swing.plaf.synth.SynthBorder@5473b9e,flags=288,maximumSize=,minimumSize=,preferredSize=,defaultIcon=,disabledIcon=,disabledSelectedIcon=,margin=javax.swing.plaf.InsetsUIResource[top=0,left=0,bottom=0,right=0],paintBorder=true,paintFocus=true,pressedIcon=,rolloverEnabled=true,rolloverIcon=,rolloverSelectedIcon=,selectedIcon=,text=1,defaultCapable=true]
2012-12-16 18:40:44 [FINE] hu.yvs.xbmc.scraper.tester.ScraperTesterDlg actionPerformed: null

and a gray box in the center of the screen is left there. The only way to get rid of it, and the frozen app, is to kill the java process.

I don't know if the problem is with the version of java I am using - it is Iced-Tea, not the official version from Oracle. If needed, I can install that version of java and try it.

This is a great utility. Makes life just that much easier. Thx.

ken

UsagiYojimbo · (This post was last modified: 2012-12-21, 19:57 by UsagiYojimbo.)

(2012-12-17, 04:44)daytooner Wrote: In particular, in the tester/debugger, if I click on any of the number buttons (I assume they are for the variables), then the entire app freezes. In the java terminal output, there is this message: ...
and a gray box in the center of the screen is left there. The only way to get rid of it, and the frozen app, is to kill the java process.

I don't know if the problem is with the version of java I am using - it is Iced-Tea, not the official version from Oracle. If needed, I can install that version of java and try it.

This is a great utility. Makes life just that much easier. Thx.

Thank You!
Yes, the numbered buttons should display the content of the variables.

The problem seems to be in how the virtual machines of the different Java distros handle threading... (I use the Sun/Oracle JDK on 32-bit Windows.)

I put in a "hack" that I think should correct this problem. Try v0.1.2-51 and please report back whether it helped or not.

PS:
Use [code] instead of [quote] for long listings, please.

beamer145 · (This post was last modified: 2013-01-18, 01:48 by beamer145.)

(2012-11-29, 00:59)UsagiYojimbo Wrote:
(2012-11-28, 21:35)Daniel Malmgren Wrote: The Scraper Tester doesn't work, when trying to test anything at all I get [SEVERE] hu.yvs.xbmc.xml.addon.scraper.Function cannot be cast to javax.xml.bind.JAXBElement
This error is not related to the Tester. It happened because You opened a Library instead of a Scraper. From the next release on (v 0.1.2-50), ScraperEdit supports Libraries, too.

I just tried opening and debugging tmdb.xml with version 0.1.2-51 and I still have this problem (windows 7 x64)

But I have not played with this tool before so maybe I am doing something wrong:
[1] I just open tmdb.xml
[2] Select Scraper in the tree (or should I select an individual function ?)
[3] I select Tools/Scraper debugger
[4] I enter something in the 'title' field (not really sure what this field is meant for, eg is this what will be passed in $$1 for CreateSearchUrl ? (but what about $$2 ?), should it be a filename, a complete path, something else ? )
[5] I select debug
[6] I get Initializing/Debugger Setup/ Some checks and then the error.

Edit : I also do not know what you mean by the concept of 'library' here

UsagiYojimbo · 2013-01-21, 17:15

(2013-01-18, 01:47)beamer145 Wrote: I just tried opening and debugging tmdb.xml with version 0.1.2-51 and I still have this problem (windows 7 x64)
...
Edit : I also do not know what you mean by the concept of 'library' here

A Library is a scraper file that has no <scraper> tag, but a <scraperfunctions> tag instead. This tag has no attributes, but can contain any functions, just like the <scraper> tag can.
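
The distinction can be checked by looking at the root element of the file. The following is only a sketch using the standard JAXP DOM parser; the class `ScraperFileKind` and its method are made up for this example and are not how ScraperEdit (which uses JAXB) decides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper, not part of ScraperEdit. */
final class ScraperFileKind {

    /** Returns "scraper", "library" or "unknown" depending on the root tag. */
    static String kindOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            String root = doc.getDocumentElement().getNodeName();
            if ("scraper".equals(root)) {
                return "scraper";
            }
            if ("scraperfunctions".equals(root)) {
                return "library";
            }
            return "unknown";
        } catch (Exception e) {
            // Covers malformed XML as well as parser configuration problems.
            throw new IllegalArgumentException("Not a readable XML document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(kindOf("<scraperfunctions><GetDetails dest=\"3\"/></scraperfunctions>"));
        System.out.println(kindOf("<scraper><CreateSearchUrl dest=\"3\"/></scraper>"));
    }
}
```

Note that a strict XML parser like this one rejects files that are not well-formed, so it would also fail on scrapers that only a more lenient parser accepts.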

Which "tmdb.xml" did you use? Is it "metadata.common.themoviedb.org/tmdb.xml" (library) or "metadata.themoviedb.org/tmdb.xml" (scraper)?

What Java version are you using?

beamer145 · 2013-01-21, 21:30

It was the scraper ( "metadata.themoviedb.org/tmdb.xml" ).

After upgrading to jdk-7u11 the problem seems solved.

Thanks for the tip !

Other remarks/questions after a quick try:

- Unfortunately, at the moment XBMC does a ToLower on the filename string before passing it to the scraper in $$1 for CreateSearchUrl (cs only influences the regexp, not the input string). Your app leaves the case intact.

- When I select my CreateSearchUrl and run Scrape or Debug, it seems to stop after my second nested regexp. No errors are reported and the dest buffer of this second regexp is not yet filled in (button remains grayed out). I am not sure what is going on; I suppose it should have continued all the way?

If you want to try for yourself, replace the CreateSearchUrl in the tmdb.xml scraper by the thing below. It stops after <RegExp input="$$1" output="\1\2" dest="5">, and 5 is not filled in....

<CreateSearchUrl dest="3">
  <RegExp input="$$9" output="<url>http://api.themoviedb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=\1&amp;year=$$4&amp;language=$INFO[language]</url>" dest="3">
    <RegExp input="$$1" output="\1" dest="4">
      <expression noclean="1" clear="yes">.*_([0-9][0-9][0-9][0-9])</expression>
    </RegExp>
    <RegExp input="$$1" output="\1\2" dest="5">
      <expression cs="yes" noclean="1" clear="yes">(.*)_[0-9][0-9][0-9][0-9]|(.*)</expression>
    </RegExp>

    <RegExp input="$$5" output="\1%20" dest="6">
      <expression cs="yes" noclean="1,2,3,4,5,6,7,8,9" repeat="yes" clear="yes">([^_]*)_*</expression>
    </RegExp>

    <RegExp input="$$6" output="\1 \2 \3_\4 \5_\6 \7_" dest="7">
      <expression cs="yes" noclean="1,2,3,4,5,6,7,8,9" repeat="yes" clear="yes">([a-z]*)([A-Z]*)([A-Z][a-z])|([a-z]+)([A-Z]+)|([A-Za-z]*)([^A-Za-z]*)</expression>
    </RegExp>

    <RegExp input="$$7" output="\1\2" dest="8">
      <expression cs="yes" noclean="1,2,3,4,5,6,7,8,9" repeat="yes" clear="yes">([^_]*) _|([^_]*)_</expression>
    </RegExp>

    <RegExp input="$$8" output="\1%20" dest="9">
      <expression cs="yes" noclean="8" repeat="yes" clear="yes">[ ]*([^ ]+)</expression>
    </RegExp>

    <expression noclean="9" />
  </RegExp>
</CreateSearchUrl>

(This one converts MovieNameInCamelCase_YEAR formatted folders to %20 separated words. It works, but unfortunately you need to rebuild XBMC with the unwanted ToLower operation on the file name commented out for it to work. That is not a problem for your scraper debugger, though.)
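
For comparison, the intended transformation (MovieNameInCamelCase_YEAR to %20 separated words plus a year) can be sketched in plain Java. This is a simplified illustration of the goal, not a translation of the scraper above: the class `FolderNameQuery`, its methods, and the exact word-splitting rules are assumptions, and edge cases may be handled differently than by the nested RegExp chain.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the CamelCase_YEAR conversion. */
final class FolderNameQuery {

    // Everything before the last "_YYYY" is the name, the four digits are the year.
    private static final Pattern NAME_YEAR = Pattern.compile("(.*)_([0-9]{4})");

    /** Returns {name, year}; year is "" when the folder has no _YEAR suffix. */
    static String[] splitNameAndYear(String folder) {
        Matcher m = NAME_YEAR.matcher(folder);
        if (m.matches()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return new String[] { folder, "" };
    }

    /** Splits CamelCase words and joins them with %20. */
    static String toQuery(String name) {
        String spaced = name
                .replaceAll("(?<=[a-z])(?=[A-Z])", " ")       // "movieName" -> "movie Name"
                .replaceAll("(?<=[A-Z])(?=[A-Z][a-z])", " ")  // "XMLParser" -> "XML Parser"
                .replace('_', ' ')
                .trim();
        return spaced.replaceAll("\\s+", "%20");
    }

    public static void main(String[] args) {
        String[] parts = splitNameAndYear("MovieNameInCamelCase_2012");
        System.out.println(toQuery(parts[0]) + " / " + parts[1]);
    }
}
```

For the folder name in `main` this gives `Movie%20Name%20In%20Camel%20Case` and the year `2012`. The splitting depends on the capital letters, which is why a lower-cased input defeats it.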

UsagiYojimbo · 2013-01-23, 10:46

(2013-01-21, 21:30)beamer145 Wrote: After upgrading to jdk-7u11 the problem seems solved.
...
Your app leaves the case intact.

What Java version did you use before the upgrade?

Yes, XBMC cleans up the media filename to create a search string from it. (It removes words like DVD, DC, Rip, DivX, etc., and converts to lower case.) ScraperEdit deliberately does nothing like this.
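
To see why the lower-casing matters for a case-sensitive expression, here is a small sketch. It uses java.util.regex, which is not the regexp engine XBMC uses, and the class `CaseHandling` with its two methods is invented for the example; only the lower-casing step and the meaning of "cs" are taken from this thread.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of ScraperEdit or XBMC. */
final class CaseHandling {

    /** Mimics only the lower-casing step of the filename cleanup, nothing else. */
    static String lowerCasedInput(String fileName) {
        return fileName.toLowerCase(Locale.ROOT);
    }

    /** The cs flag changes how the expression matches, never the input string. */
    static boolean finds(String expression, String input, boolean caseSensitive) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        return Pattern.compile(expression, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String original = "MovieName_2012";
        String lowered = lowerCasedInput(original);
        // Case-sensitive search for a capitalized word.
        System.out.println(finds("[A-Z][a-z]+", original, true));
        System.out.println(finds("[A-Z][a-z]+", lowered, true));
    }
}
```

The first call finds a match and the second does not: once the input is lower-cased, an expression that relies on capital letters has nothing left to match, whatever the cs setting is.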

UsagiYojimbo · (This post was last modified: 2013-01-30, 01:21 by UsagiYojimbo.)

(2013-01-23, 10:46)UsagiYojimbo Wrote:
(2013-01-21, 21:30)beamer145 Wrote: After upgrading to jdk-7u11 the problem seems solved.
What Java version did you use before the upgrade?

I just ran a test on the JRE of JDK 1.6 (Windows, 64-bit) and reproduced the problem.

Code:
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
Java HotSpot(TM) Client VM (build 17.0-b17, mixed mode, sharing)

I will look into this incompatibility issue between JRE 1.6 and 1.7...
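
For anyone reporting a similar problem, the running Java version can also be read from inside a program. This sketch only reports the version; it does not detect or fix the 1.6 vs 1.7 incompatibility. The class `JavaVersionCheck` and its parsing rules are assumptions for illustration, while `java.version` and `java.vendor` are standard system properties.

```java
/** Hypothetical helper for reporting the running Java version. */
final class JavaVersionCheck {

    /**
     * Extracts the major version from a java.version string:
     * "1.6.0_21" gives 6, "1.7.0_11" gives 7, "17.0.2" gives 17.
     */
    static int majorVersion(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        // Old-style versions start with "1." and carry the major number second.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        String vendor = System.getProperty("java.vendor");
        System.out.println("Java " + version + " (" + vendor + "), major version "
                + majorVersion(version));
    }
}
```

Including this line in a bug report saves a round of questions, since the vendor also distinguishes the Oracle runtime from others.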