ScraperEdit for XBMC (Java)
#1
22 months after Nicezia's last response on the Scraper Editor (Based on ScraperXML open source C# Library) thread, I put together a similar editor in Java.
It runs under Java 1.6; that is the only requirement.

Project page on SourceForge, with description, screen-shots, and download links. Also available on SoftPedia.

I thought it needed a standalone topic, so I opened one.

Below is a list of changes:
Code:
v 0.1.2.66 usagi @ 2013-03-04
- XML syntax highlight in output fields
  (The colors used: <tags>     : cyan
                    attribute  : magenta
                    attr value : blue
                    CDATA      : gray
                    &entity;   : green
                    &inv.entity: red)

v 0.1.2.65 usagi @ 2013-03-03
- Number of buffers changed to 20
- Added comments to the engine, and other classes
- Added a new scraping method to the engine
- Changed FunctionList to not replace existing functions (EG:<include>)
- Adding setting for Scraper folder
- Corrected multiple logging if reused same debugger window
- Minor changes and fixes

v 0.1.2.60 usagi @ 2013-02-06
- Message box on open/save errors
- <include> tags:
  - (Re)loading on scraping/debugging
  - Dropping on closing the debugger window
  - TODO editing includes
  - TODO mapping include folders
- URL encoding input text
- Separate input field for year

v 0.1.2.56 usagi @ 2013-02-03
- Issues with the "cs" attribute

v 0.1.2.55 usagi @ 2013-01-30
- Minor logging error, that broke the debugging process
- JRE 1.6 vs 1.7 (JAXB 2.0/2.1 vs 2.2) issue

v 0.1.2.51 usagi @ 2012-12-19
- GUI Threading fix(?)
- Some minor changes

v 0.1.2.50 usagi @ 2012-11-28
- Added Library support: <scraperfunctions>

v 0.1.2.49 usagi @ 2012-11-27
- Implemented debugger feature
- Some minor improvements

v 0.1.2.45 usagi @ 2012-11-18
- Added debugger interface
- Showing execution progress
- Added $$variable to output, too
- Added configurable values
- Added $INFO to access values
- Added caseSensitive attribute

v 0.1.2.42 usagi @ 2012-11-13
- Added Move Up/Down to RegExp's context menu
- Added Rename to Function's context menu
- Added click on matches event in RegExp tester

v 0.1.2.39 usagi @ 2012-11-12
- Some bug-fixes

v 0.1.2.38 usagi @ 2012-11-11
- Dropped Log4J, returned to java.util.logging
- Grouping of undo edits
- Some minor improvements
- Dropped global Edit menu

v 0.1.2.29 usagi @ 2012-11-08
- Added some editing functions to the text components:
  - Editing actions:     cut, copy, paste, select all, delete
  - Simple undo actions: undo, redo
  - Context menu and key bindings to those actions
- Some minor improvements
- On my TODO list:
  + Grouping of undo edits
  - Implementing global Edit menu
  - Exporting as an Add-On

v 0.1.2.28 usagi @ 2012-10-28
- Added Drag'n'Drop support:
  - Normal Drag: RegExp exchanged with another RegExp
  - Shift + Drag: RegExp copied over another RegExp
- Added Duplicate menu-item to RegExp's context menu

v 0.1.1.25 usagi @ 2012-10-26
- Added RegExp numbering by execution order
- Removed change tracking attempts
- Some minor improvements
- Added CHANGELOG file

v 0.1.1.22 usagi @ 2012-10-22
- Some minor improvements

v 0.1.1.18 usagi @ 2012-10-16
- Added tester
- Some minor improvements

v 0.1.1.7 usagi @ 2012-10-11
- Added Hungarian translation
- Some minor changes

v 0.1.1.6 usagi @ 2012-10-10
- Created project on SF.net
- Initial SVN commit
- Some minor changes

Known issues / planned features:
Code:
- Due to lack of documentation, many features are missing.
- The engine does not work the same way real XBMC does, as there are features of XBMC that contradict the documentation.
  - Some Scrapers do not return valid XML (EG: tmdb.xml)
  - Some Scrapers use variable numbers above 10, while the documentation names only 9 variables (1-9).
- Currently the result of the scraping does not get processed, just displayed. So no functions get called.
#2
(2012-11-10, 13:35)takoi Wrote: - When using the "check scraper" menu to create new functions, the tree list is not updated until you create another one via right-click
Fixed.
(2012-11-10, 13:35)takoi Wrote: - Removing functions does not work
Fixed.
(2012-11-10, 13:35)takoi Wrote: - The "expression" box is cleared if you click on a regexp with empty expression
And why is that a problem?
(2012-11-10, 13:35)takoi Wrote: - When browsing for files, there's no way to enter a hidden folder
Fixed.
(2012-11-10, 13:35)takoi Wrote: - $INFO[language] etc, $$n and %20 in the output attribute are not substituted
The wiki pages Scrapers and HOW-TO: Write media scrapers do not mention that $$variables should be substituted in output. Neither are $INFO and %20 mentioned.
If you can give me a specification that describes these, I will implement them.
(2012-11-14, 21:33)spiff Wrote: $INFO[foo] reads the string setting from resources/settings.xml (or more likely, the user data equivalent). it is replaced by the string value prior to a regexp execution, as well as in an output string.

i'm not entirely sure what @takoi refers to when he mentions %20 (i.e. url encoding) as there is no decoding going on in the scraper parser. the closest is the cleaning operation which is applied by default (unless noclean is specified for a buffer) which strips any html tags and trims whitespace. you also have the 'fixchars' attribute which replaces html entities by their unicode equivalent, e.g. &amp; -> &

$$n should be replaced in both expression and output strings. it's the *content* of buffer n at the time of evaluation. i mention time of evaluation because if an expression is run with a repeat, the substitution is applied *before* the loop so you should not go and recursively replace (in case the output buffer is the same..)
I will look into them...
$$n is added.
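
For anyone following along, the $$n rule spiff describes could be sketched in Java roughly like this. It is only an illustration, not the code used in ScraperEdit or XBMC: the class name BufferSubstitution, the method substitute, and the layout of the buffer array are assumptions made for the example. Each $$n is replaced in a single pass with the buffer content at the moment of the call, so text that was just inserted is never scanned again.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from ScraperEdit or XBMC.
final class BufferSubstitution {
    // Matches $$ followed by one or more digits, e.g. $$1 or $$12.
    private static final Pattern BUFFER_REF = Pattern.compile("\\$\\$(\\d+)");

    /**
     * Replaces every $$n in text with the current content of buffer n.
     * buffers[0] holds buffer 1, buffers[1] holds buffer 2, and so on
     * (an assumption of this sketch). Unknown or empty buffers become "".
     */
    static String substitute(String text, String[] buffers) {
        Matcher m = BUFFER_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            String value = (n >= 1 && n <= buffers.length && buffers[n - 1] != null)
                    ? buffers[n - 1]
                    : "";
            // quoteReplacement keeps '$' and '\' in the buffer content literal,
            // so a buffer that itself contains "$$1" is not expanded again.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With buffers {"abc", "$$1"}, the call substitute("x=$$1;y=$$2", buffers) yields "x=abc;y=$$1": the content of buffer 2 is copied as-is and not expanded a second time.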
#3
New version is out...
#4
New release is out: Added debugger....
#5
I don't know how much is finished and supposed to work (didn't find any list of known issues), but I've got some things to report:

Not much of the scraper details are fetched from my scraper. It shows the framework version and date; the rest of the fields show up empty.

The Check Scraper function doesn't work at all, guess it's wip?

The Scraper Tester doesn't work, when trying to test anything at all I get [SEVERE] hu.yvs.xbmc.xml.addon.scraper.Function cannot be cast to javax.xml.bind.JAXBElement

The condition field for regexps is always greyed out.

With that said, this is a nice project. I like editing my stuff in text editor, but it's easier to look at it in a gui like this without all the html encoding :-)

/Daniel
#6
I like it a lot as well.

Finally a scraper editor that is running under linux.

The scraper tester isn't working for me either.

Keep up the good work, and thanks for the program so far!
#7
(2012-11-28, 21:35)Daniel Malmgren Wrote: I don't know how much is finished and supposed to work (didn't find any list of known issues), but I've got some things to report:
It is in Alpha/Pre-Beta state. There is no list of known issues.

(2012-11-28, 21:35)Daniel Malmgren Wrote: Not much of the scraper details are fetched from my scraper. It shows the framework version and date; the rest of the fields show up empty.
Could you provide me with such a scraper? My Scrapers and the XBMC core Scrapers work well.

(2012-11-28, 21:35)Daniel Malmgren Wrote: The Check Scraper function doesn't work at all, guess it's wip?
It had some bugs, I corrected them (all, I hope).

(2012-11-28, 21:35)Daniel Malmgren Wrote: The Scraper Tester doesn't work, when trying to test anything at all I get [SEVERE] hu.yvs.xbmc.xml.addon.scraper.Function cannot be cast to javax.xml.bind.JAXBElement
This error is not related to the Tester. It happened because you opened a Library instead of a Scraper. From the next release (v 0.1.2-50) on, ScraperEdit supports Libraries, too.

(2012-11-28, 21:35)Daniel Malmgren Wrote: The condition field for regexps is always greyed out.
Yes, this is because I did not find any documentation about its function or use. I just saw it in ScraperEditor (the .Net/Mono app).

(2012-11-28, 21:35)Daniel Malmgren Wrote: With that said, this is a nice project. I like editing my stuff in text editor, but it's easier to look at it in a gui like this without all the html encoding :-)
Thank You!

#8
Looks like a nice tool.

I was trying to open imdb.xml from xbmc scraper. But nothing is shown in ScraperEdit.

Is this because the imdb.xml uses other functions/names compared to the default ones like <NfoUrl> etc?

Here's the file I am talking about: https://github.com/akuiraz/xbmc-official...m/imdb.xml

EDIT: never mind. I hadn't downloaded the file correctly from github (save as doesn't work when browsing source).
#9
I'm using this on a linux box (Fedora 17 - 64 bit). Some of the gui stuff doesn't seem to be working quite right. In particular, in the tester/debugger, if I click on any of the number buttons (I assume they are for the variables), then the entire app freezes. In the java terminal output, there is this message:
Quote:
at hu.yvs.xbmc.scraper.tester.ScraperDebugger.scrape(ScraperDebugger.java:122)
at hu.yvs.xbmc.scraper.tester.ScraperDebugger.run(ScraperDebugger.java:64)

2012-12-16 18:40:44 [INFO] hu.yvs.xbmc.scraper.tester.ScraperTesterDlg actionPerformed: src = javax.swing.JButton[,2,0,36x50,alignmentX=0.0,alignmentY=0.5,border=javax.swing.plaf.synth.SynthBorder@5473b9e,flags=288,maximumSize=,minimumSize=,preferredSize=,defaultIcon=,disabledIcon=,disabledSelectedIcon=,margin=javax.swing.plaf.InsetsUIResource[top=0,left=0,bottom=0,right=0],paintBorder=true,paintFocus=true,pressedIcon=,rolloverEnabled=true,rolloverIcon=,rolloverSelectedIcon=,selectedIcon=,text=1,defaultCapable=true]
2012-12-16 18:40:44 [FINE] hu.yvs.xbmc.scraper.tester.ScraperTesterDlg actionPerformed: null
and a gray box in the center of the screen is left there. The only way to get rid of it, and the frozen app, is to kill the java process.

I don't know if the problem is with the version of java I am using - it is Iced-Tea, not the official version from Oracle. If needed, I can install that version of java and try it.

This is a great utility. Makes life just that much easier. Thx.

ken
#10
(2012-12-17, 04:44)daytooner Wrote: In particular, in the tester/debugger, if I click on any of the number buttons (I assume they are for the variables), then the entire app freezes. In the java terminal output, there is this message: ...
and a gray box in the center of the screen is left there. The only way to get rid of it, and the frozen app, is to kill the java process.

I don't know if the problem is with the version of java I am using - it is Iced-Tea, not the official version from Oracle. If needed, I can install that version of java and try it.

This is a great utility. Makes life just that much easier. Thx.

Thank You!
Yes, the numbered buttons should display the content of the variables.

The problem seems to lie in how the virtual machines of the different Java distributions handle threading. (I use the Sun/Oracle JDK on 32-bit Windows.)

I put in a "hack" that I think should correct this problem. Try v0.1.2-51 and please report back whether it helped.

PS:
Use [code] instead of [quote] for long listings, please.
#11
(2012-11-29, 00:59)UsagiYojimbo Wrote:
(2012-11-28, 21:35)Daniel Malmgren Wrote: The Scraper Tester doesn't work, when trying to test anything at all I get [SEVERE] hu.yvs.xbmc.xml.addon.scraper.Function cannot be cast to javax.xml.bind.JAXBElement
This error is not related to the Tester. It happened because you opened a Library instead of a Scraper. From the next release (v 0.1.2-50) on, ScraperEdit supports Libraries, too.


I just tried opening and debugging tmdb.xml with version 0.1.2-51 and I still have this problem (windows 7 x64)

But I have not played with this tool before so maybe I am doing something wrong:
[1] I just open tmdb.xml
[2] Select Scraper in the tree (or should I select an individual function ?)
[3] I select Tools/Scraper debugger
[4] I enter something in the 'title' field (not really sure what this field is meant for, e.g. is this what will be passed in $$1 for CreateSearchUrl? But what about $$2? Should it be a filename, a complete path, something else?)
[5] I select debug
[6] I get Initializing/Debugger Setup/ Some checks and then the error.


Edit : I also do not know what you mean by the concept of 'library' here
#12
(2013-01-18, 01:47)beamer145 Wrote: I just tried opening and debugging tmdb.xml with version 0.1.2-51 and I still have this problem (windows 7 x64)
...
Edit : I also do not know what you mean by the concept of 'library' here

A Library is a scraper file that has no <scraper> tag but a <scraperfunctions> tag instead. This tag has no attributes, but it can contain any functions, just like the <scraper> tag can.
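
To illustrate the distinction, here is a minimal Java sketch that tells the two file kinds apart by looking at the document element. It is not how ScraperEdit does it internally (that goes through JAXB); the class name ScraperFileKind and the method classify are made up for the example, and it assumes that <scraper> or <scraperfunctions> is the root element of the file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of ScraperEdit.
final class ScraperFileKind {
    /**
     * Returns "scraper" for a <scraper> root, "library" for a
     * <scraperfunctions> root, and "unknown" for anything else.
     * Assumes the tag in question is the root element of the document.
     */
    static String classify(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            String root = doc.getDocumentElement().getTagName();
            if ("scraper".equals(root)) {
                return "scraper";
            }
            if ("scraperfunctions".equals(root)) {
                return "library";
            }
            return "unknown";
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("Not a readable XML document", e);
        }
    }
}
```

A check like this before unmarshalling would let a tool report "this is a Library" instead of failing with a ClassCastException.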

Which "tmdb.xml" did you use? Is it "metadata.common.themoviedb.org/tmdb.xml" (library) or "metadata.themoviedb.org/tmdb.xml" (scraper)?

What Java version are you using?
#13
It was the scraper ( "metadata.themoviedb.org/tmdb.xml" ).

After upgrading to jdk-7u11 the problem seems solved.

Thanks for the tip !

Other remarks/questions after a quick try:

- Unfortunately, at the moment XBMC does a ToLower on the filename string before passing it to the scraper in $$1 for CreateSearchUrl (cs only influences the regexp, not the input string). Your app leaves the case intact.

- When I select my CreateSearchUrl and run Scrape or Debug, it seems to stop after my second nested regexp. No errors are reported and the dest buffer of this second regexp is not yet filled in (the button remains grayed out). I am not sure what is going on; I suppose it should have continued all the way?

If you want to try for yourself, replace the CreateSearchUrl in the tmdb.xml scraper with the thing below. It stops after <RegExp input="$$1" output="\1\2" dest="5">, and 5 is not filled in.

<CreateSearchUrl dest="3">
  <RegExp input="$$9" output="&lt;url&gt;http://api.themoviedb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
    <RegExp input="$$1" output="\1" dest="4">
      <expression noclean="1" clear="yes">.*_([0-9][0-9][0-9][0-9])</expression>
    </RegExp>
    <RegExp input="$$1" output="\1\2" dest="5">
      <expression cs="yes" noclean="1" clear="yes">(.*)_[0-9][0-9][0-9][0-9]|(.*)</expression>
    </RegExp>
    <!-- Replace underscores by spaces -->
    <RegExp input="$$5" output="\1%20" dest="6">
      <expression cs="yes" noclean="1,2,3,4,5,6,7,8,9" repeat="yes" clear="yes">([^_]*)_*</expression>
    </RegExp>
    <!-- Split, e.g. Bx,DF,900,Conqu,F,500,Est, ,Of,PQR,Qaradi,PPP,Pse,BFG,900. Remark: the extra _ is explained in the next step -->
    <RegExp input="$$6" output="\1 \2 \3_\4 \5_\6 \7_" dest="7">
      <expression cs="yes" noclean="1,2,3,4,5,6,7,8,9" repeat="yes" clear="yes">([a-z]*)([A-Z]*)([A-Z][a-z])|([a-z]+)([A-Z]+)|([A-Za-z]*)([^A-Za-z]*)</expression>
    </RegExp>

    <!-- In the previous step, only one of the subparts (x, y and z in x|y|z) of the regexp matched;
         the 2 others remained empty and introduced " _" after the new string.
         With the extra _ we can reliably detect the unwanted spaces and remove them, because the
         results of each run should be glued together and not separated by spaces resulting from
         the subparts that did not match.
         Note that there were no _ when we started because we stripped all of them already, so it
         is safe to reintroduce them.
         Note: aarrgg, each of the submatches (e.g. \1) is also stripped of leading/trailing
         whitespace before it is put in the output buffer (spaces which we need to preserve here),
         but luckily this can be disabled with noclean. -->
    <RegExp input="$$7" output="\1\2" dest="8">
      <expression cs="yes" noclean="1,2,3,4,5,6,7,8,9" repeat="yes" clear="yes">([^_]*) _|([^_]*)_</expression>
    </RegExp>

    <!-- Replace remaining spaces by %20 for url compatibility -->
    <RegExp input="$$8" output="\1%20" dest="9">
      <expression cs="yes" noclean="8" repeat="yes" clear="yes">[ ]*([^ ]+)</expression>
    </RegExp>

    <expression noclean="9" />
  </RegExp>
</CreateSearchUrl>

(This one converts MovieNameInCamelCase_YEAR formatted folder names to %20-separated words. It works, but unfortunately you need to rebuild XBMC with the unwanted ToLower operation on the file name commented out. That is not a problem for your scraper debugger, though.)
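
For comparison, the intended transformation can be written outside the scraper engine as a short Java sketch. This is a simplified equivalent, not a faithful port of the regexp chain above: the class name CamelCaseTitle and the method split are hypothetical, the year is only recognized as a _YYYY suffix at the very end of the name, and the word-splitting rules are an approximation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; a simplified equivalent of the scraper above.
final class CamelCaseTitle {
    // Title followed by an underscore and a four-digit year at the end of the name.
    private static final Pattern NAME_YEAR = Pattern.compile("(.*)_([0-9]{4})");

    /** Returns {query, year}; year is "" when the name has no _YYYY suffix. */
    static String[] split(String folderName) {
        String title = folderName;
        String year = "";
        Matcher m = NAME_YEAR.matcher(folderName);
        if (m.matches()) {
            title = m.group(1);
            year = m.group(2);
        }
        // Underscores become spaces.
        title = title.replace('_', ' ');
        // Lower case followed by upper case: "nameIn" -> "name In".
        title = title.replaceAll("(?<=[a-z])(?=[A-Z])", " ");
        // End of an acronym before a capitalized word: "XMLParser" -> "XML Parser".
        title = title.replaceAll("(?<=[A-Z])(?=[A-Z][a-z])", " ");
        // Boundary between letters and digits: "BFG900" -> "BFG 900".
        title = title.replaceAll("(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", " ");
        // Collapse runs of spaces into a single %20 for use in a URL query.
        title = title.trim().replaceAll(" +", "%20");
        return new String[] { title, year };
    }
}
```

Calling split("MovieNameInCamelCase_2010") gives the query "Movie%20Name%20In%20Camel%20Case" and the year "2010". As in the scraper, this only works if the input still has its original case.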
#14
(2013-01-21, 21:30)beamer145 Wrote: After upgrading to jdk-7u11 the problem seems solved.
...
Your app leaves the case intact.

What Java version did you use before the upgrade?

Yes, XBMC cleans up the media filename to create a search string from it. (It removes words like DVD, DC, Rip, DivX, etc., and converts the rest to lower case.) ScraperEdit deliberately does nothing like this.
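
If you want to feed the tester an input closer to what XBMC would pass, you can pre-process the title yourself. The following Java sketch shows the idea only; it is not XBMC's actual cleanup code and not part of ScraperEdit. The class name SearchStringCleanup is hypothetical, and the word list contains just the examples mentioned above, not XBMC's real list.

```java
import java.util.Locale;

// Hypothetical helper for illustration; XBMC's real cleanup rules are not reproduced here.
final class SearchStringCleanup {
    // Example noise words taken from the post above (an assumption, not XBMC's list).
    private static final String[] NOISE = { "dvd", "dc", "rip", "divx" };

    /** Lower-cases the name and drops the example noise words when they stand alone. */
    static String clean(String fileName) {
        String s = fileName.toLowerCase(Locale.ROOT);
        for (String word : NOISE) {
            // \b only matches whole words, so "dvdrip" or "movie_dvd" stay untouched.
            s = s.replaceAll("\\b" + word + "\\b", " ");
        }
        // Collapse the gaps left behind by removed words.
        return s.trim().replaceAll("\\s+", " ");
    }
}
```

For example, clean("Some Movie DVD Rip") returns "some movie".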
#15
(2013-01-23, 10:46)UsagiYojimbo Wrote:
(2013-01-21, 21:30)beamer145 Wrote: After upgrading to jdk-7u11 the problem seems solved.
What Java version did you use before the upgrade?

Just ran a test on the JRE of JDK 1.6 (Win, 64 bits), and got the problem.
Code:
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
Java HotSpot(TM) Client VM (build 17.0-b17, mixed mode, sharing)

I will look into this incompatibility between JRE 1.6 and 1.7...