Release JAV Movie Scraper
#1
In an effort not to hijack the Data18 thread, I figured we can all chime in here on DoctorD's new standalone program, JAV Movie Scraper.

I will start with my first impressions:

WISHLIST
  • I would like to be able to disable downloading of posters and/or fanart
  • I would like to be able to set the name of the nfo and the poster/fanart

Obviously, I have over 1000 JAV movies (down from over 1800 after a data disaster), and all of them already have their posters downloaded and cropped. I used PythonCoverCrop from Akiba Online to crop the posters automatically.

I also prefer the movie.nfo, poster.jpg, folder.jpg and fanart.jpg naming convention because I use XBMC and Plex together.

BUGS/UI QUIRKS

All my movies are in individual folders under a JAV folder. When I highlight a folder and do a single scrape, the files are saved in the JAV folder and not in that title's own folder.

That's all for now...

Good work DoctorD! Thanks a bunch for this!
[4 Kodi Clients + 4 Norco RPC-4224 Media Servers w/376 TB HDD Space]
#2
Hello Pr.Sinister,

Thanks for making the post for the new scraper and for your feedback. I've updated the code (both source and precompiled jar) with some additional code to handle selecting a folder and then doing a single scrape.

I'll look into adding an option of disabling downloading & overwriting of preexisting files and options for file output name. What image typically goes into folder.jpg? The fanart or the poster?

I had never heard of PythonCoverCrop before. The code for it looks pretty interesting and more accurate than the method I'm using to crop covers. I'll probably adapt my code to try to use its method. Do you have a link to the most recent version? This is the only one I could find: http://pastebin.com/rZGHz3QW .

EDIT: I have now added two preferences which control whether poster/fanart files are written and whether to overwrite them if they are already there. Check the new preferences menu to enable/disable these options.
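
To illustrate how two such preferences might interact, here is a minimal sketch in Python. It is not the scraper's actual Java code; the function name and parameters are hypothetical and only mirror the behavior described above (one switch for writing artwork at all, one for overwriting files that already exist).

```python
from pathlib import Path


def should_write_artwork(target: Path, write_enabled: bool, overwrite_existing: bool) -> bool:
    """Decide whether an artwork file (poster or fanart) should be written.

    Hypothetical helper: both flag names are assumptions for illustration.
    """
    if not write_enabled:
        # Writing this kind of artwork is switched off entirely.
        return False
    if target.exists() and not overwrite_existing:
        # Keep the user's existing (for example, hand-cropped) file.
        return False
    return True
```

With writing enabled and overwriting disabled, an existing poster.jpg is left alone while missing files are still created.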
#3
folder.jpg contains the poster.

As for PythonCoverCrop, it is excellent... Just put the jacket in the folder named the same and run. It does 100 posters in 5 seconds! And it knows about the special ones with weird resolution/spine thickness. Such a life saver!

I have the final versions in exe and py format.

PythonCoverCrop - Python Version

PythonCoverCrop - Exe Version

Now another request for JAV Movie Scraper :)

Can you add an option to download the actress thumbs to the .actors folder?
Can you add an option to not save thumb URLs in the NFO?
Can you add an option to not get any of the screenshot thumbs (no download and no save to nfo)?
Can you add an option to add a special predefined genre (JAV) to the nfo?
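
As a rough idea of what the predefined-genre request amounts to, the sketch below adds a genre element to an nfo document using Python's standard XML library. It assumes the nfo has a root element containing zero or more genre elements; the function name is hypothetical and this is not how the scraper itself is implemented.

```python
import xml.etree.ElementTree as ET


def add_genre(nfo_xml: str, genre: str = "JAV") -> str:
    """Return the nfo XML with the given genre added once.

    Assumes genre entries are direct children of the root element.
    """
    root = ET.fromstring(nfo_xml)
    existing = {element.text for element in root.findall("genre")}
    if genre not in existing:
        # Append a new <genre> element only if it is not already present.
        ET.SubElement(root, "genre").text = genre
    return ET.tostring(root, encoding="unicode")
```

Running it twice on the same document does not duplicate the genre, which matters when a movie is rescraped.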

I use the genre to separate my movies, xxx movies and JAV into custom video nodes, like you can see here:

[Image]

I will try the new version as soon as I get home tonight!
#4
Looks like lots of things that need to be added!

Anyways, I'll try to get to your requests as I have time to work on each one. For today, I added in .actors folder support (see new preference in menu) as this was something on my own wish list anyways!

It still needs a bit of work - when moving the file to a folder, I would like it to also copy the files in .actors - but it should be working pretty well when scraping a movie which is already in a folder, at least. As a bonus, it can now also read the files from .actors when selecting an already scraped movie, which should speed up file reads a bit.
#5
I've added initial support now for extrafanart.

For it to work, the item being scraped must be a folder and you must enable the extrafanart option in the preference menu. In addition, the movie must have extra screenshots available to scrape on DMM.co.jp; these are what get treated as extrafanart. Previously these elements were saved as additional poster options in the nfo, but it made more sense for them to be extrafanart, since they're more like previews of the file than an actual poster. After scraping a movie, clicking write file will then write these extrafanart elements to disk, assuming you've met all the criteria.
#6
I've posted a new build which uses the resizing method from pythoncovercrop.py.
#7
New features since I last posted:

1. Typing within the filelist now lets you jump around the list based on the letter you type
2. Program can create folder.jpg files
3. Option to use poster.jpg and fanart.jpg instead of moviename-poster.jpg and moviename-fanart.jpg
4. Extrafanart will also now be written to folder when moving the file to a new folder if the appropriate menu preference is selected
5. Fixed a bug scraping plot from DMM due to a new change DMM made on their site
6. Updated readme with some links on how to use the scraper with Plex as well as XBMC.
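
The naming option in item 3 can be pictured with a small Python sketch. This is only an illustration of the two naming schemes named above; the function and its parameter are hypothetical and not taken from the scraper's source.

```python
from pathlib import Path


def artwork_paths(movie_file: Path, use_short_names: bool) -> dict:
    """Return the poster and fanart paths for a movie file.

    use_short_names=True  -> poster.jpg / fanart.jpg
    use_short_names=False -> moviename-poster.jpg / moviename-fanart.jpg
    """
    folder = movie_file.parent
    if use_short_names:
        return {"poster": folder / "poster.jpg", "fanart": folder / "fanart.jpg"}
    stem = movie_file.stem  # file name without its extension
    return {
        "poster": folder / f"{stem}-poster.jpg",
        "fanart": folder / f"{stem}-fanart.jpg",
    }
```

The short names only make sense when each movie sits in its own folder; otherwise every movie in a shared folder would map to the same poster.jpg.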
#8
New features and bug fixes in today's commit:

1. Caribbeancom Premium Scraper Support.
2. Added Trailer Support in Nfos.
3. DMM.co.jp trailer scraping added.
4. Added Cache for Icons of Folders in the file list (should make scrolling the file list smoother, especially on network shares)
5. Can Now Write Trailers To File (Controlled by new item in the preference menu).
6. Bugfix: Clicking a previously scraped folder and then clicking write file data without scraping first no longer messes up the poster art.
#9
OK, so I have been super busy with work and only just tested this new version. I have a few suggestions and some bug reports...

Suggestions:
  • A setting to save the NFO as movie.nfo instead of <filename>.nfo
  • A Setting to choose saving thumbs as Extrafanarts or Extrathumbs (or both)
  • Batch Scraping (unless I am doing it wrong)

Bugs:
  • fanart.jpg is actually the Full Jacket and not a Fanart
  • Some DISM and DANDY posters do not display cropped correctly in the GUI unless you highlight another title and come back to it
  • Even when selecting manually, some translated titles are completely wrong

Some DANDY/DISM titles improperly displayed : DANDY-298, DANDY-383

For the wrong titles, you have IENE-420, IENE-430, RCT-136

IENE-420

Japanese Title: 勃起不全を3回発射するまで回復させる回春エステ
JavLibrary Translated Title: Rejuvenated Este To Recover Until You Shoot Three Times Erectile Dysfunction
JavScraper Translated Title: Locked On to Me Like a Crab 2
Why it's wrong: Nowhere does it say it's a massage/esthetics title plus in JPN there is a 3 and in english it's a 2

IENE-430

Japanese Title: 一般人に18cmメガチ○ポを素股してもらったらこんなヤラしい事になりました。3
JavLibrary Translated Title: It Is Now That They Have The Yarra This Not You Have To Intercrural Sex A 18cm Megachi ○ Port To The General Public.Three
JavScraper Translated Title: During Absence of Wife 4 - Yui Hatano
Why it's wrong: JPN 3 becomes an english 4 and there is no missing wife in JPN

RCT-136

Japanese Title: クルクル回る好きな女にぶっかけ放題 顔射できる回転寿司 2号店
JavLibrary Translated Title: No. 2 Sushi Shop Can Bukkake Facials Unlimited Woman Likes To Go Round And Round
JavScraper Translated Title: Pervert X Ray Glasses
Why it's wrong: I just noticed that not only the title is wrong but the plot is also wrong. The images are right.

We are getting there slowly but surely... Thanks for all your hard work!

-Pr.
#10
Hey Pr.Sinister,

Lots of feedback here, so I'll try to respond to each of your points in a new paragraph to keep things organized.

I see the problem that is causing the titles/plots to be wrong, and I've already fixed the title bug in my local copy. It should be in the next commit, which may take a bit to push out as I'm adding support for other scrapers (American movies from data18.com, because, hey, why not), and that code is in a buggy state right now.

I need to rewrite one spaghetti-code function called amalgamateMovie(), though, to really get a proper fix for the plot and other bugs like it. What was happening is that the scraper attempts to guess what the best element is from a variety of different websites. Sometimes it finds an item from one of the websites that is useless: either the information is wrong or missing, or it scraped the wrong movie for whatever reason. I have an extremely long block of code (~250 lines) that handles a variety of cases where the data comes in different ways, and I've just added to it and added to it as the project grew. However, after working on this scraper for a while, I've realized I have a much cleaner and more concise way of writing this, which should get better results.
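
To make the idea of picking the best element per field concrete, here is a minimal Python sketch. It is not the real amalgamateMovie() (which is Java and not shown in this thread); it assumes each site's result is a dictionary carrying the movie ID it matched, that results are listed in priority order, and that an empty field counts as useless. All names are hypothetical.

```python
from typing import Optional


def is_usable(value: Optional[str]) -> bool:
    """Treat missing or blank values as useless."""
    return value is not None and value.strip() != ""


def amalgamate(expected_id: str, results: list, fields: list) -> dict:
    """Merge per-site results, taking each field from the first usable source.

    Results whose "id" does not match expected_id are ignored, on the
    assumption that the site returned the wrong movie.
    """
    trusted = [
        result for result in results
        if (result.get("id") or "").upper() == expected_id.upper()
    ]
    merged = {}
    for field in fields:
        for result in trusted:
            value = result.get(field)
            if is_usable(value):
                merged[field] = value
                break  # the highest-priority usable value wins
    return merged
```

Checking the ID before merging means a wrong-movie result is discarded as a whole, so its title and plot cannot leak into an otherwise correct entry.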

For the fanart being the jacket, this is not really a bug, but a feature, I guess? There's no other place to display the back cover art of a movie in XBMC, so I like having the full jacket showing so I can see that too somewhere. There isn't any actual fanart made for random JAV movies, so this is really the best we can do anyways for a source of a picture for fanart, unless you know of a better place/idea for this image? Maybe I can make a preference to disable scraping of fanart since different people might not like it.

What DISM and DANDY IDs are the ones that do not work in the GUI for the cropped poster art?

For the extrathumbs, are these just the same images the scraper is currently saving as extrafanart or did you have something else in mind? If it's the exact same image, just with a different filename, then that seems doable. I probably won't write anything at this point which actually goes through the actual movie file and takes screenshots at certain intervals and makes thumbnails from it as that's a bit more than I want to chew at the moment. There might be another program like Ember Media Manager that does that already, too.

I'll try to add a setting for naming the nfo file movie.nfo. The preferences menu is starting to get long! I may have to start doing some grouping into submenus there.
#11
Thanks for the prompt reply!

The DANDY titles that display wrong are DANDY-298 and DANDY-383, just to name a few. I am testing on just 25 movies, since I have over 1000, so I will wait until there is a faster way than scraping one by one...

The fanart, I guess, can be one of the thumbs you put in the extrafanart folder. It's low-res but still better than the jacket for some people. I personally batch rename the fanart to jacket.jpg, and I will modify my current skin to display that when I press the info button.

Yes, the extrathumbs are basically just the extrafanart files. Again, I just rename the folder.
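
Since the two folders hold the same images, writing both could be as simple as copying one into the other. The Python sketch below shows that under stated assumptions: the folders are named extrafanart and extrathumbs, the images are .jpg files, and the file names are kept as they are (a skin may expect a different naming scheme). The function name is hypothetical.

```python
import shutil
from pathlib import Path


def mirror_extrafanart(movie_folder: Path) -> int:
    """Copy every .jpg from extrafanart into extrathumbs; return the count."""
    source = movie_folder / "extrafanart"
    target = movie_folder / "extrathumbs"
    if not source.is_dir():
        return 0  # nothing was scraped as extrafanart for this movie
    target.mkdir(exist_ok=True)
    copied = 0
    for image in sorted(source.glob("*.jpg")):
        shutil.copy2(image, target / image.name)
        copied += 1
    return copied
```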

I am in the process of building a webapp similar to TheTVDB but for Adult Webisodes (kinda like data18 but with better search capabilities), and I think if I can get it working, I will also make one for JAV stuff. It would be nice to have a community-driven JAV site where we could have all that metadata easily scrape-able, with extra fields for things like human-interpreted Titles and Plots.

Is there a way to batch scrape with JavScraper right now, or are you working on it?

-Pr.
#12
Yeah a community controlled website for adult content would be great. What would be really cool is if we could have a list of file hashes (need a list because files are usually released in several formats/resolutions for a given scene from each website) with each database entry so we could automatically tag our file no matter what it is named (I think this is similar to what Mp3 taggers like Musicbrainz can do?). Data18 is good in that it has a lot of data, but it's frustrating not having a standardized API for scrapers that doesn't change and break stuff. In addition, they are just plain missing who knows how many sites - likely due to a lack of business relationship with random 3rd party sites. I suspect they get affiliate fees for membership signups and so without an affiliate fee they probably just don't list the data on their site. Keep me updated if you need help testing!
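
The file-hash idea could look roughly like the Python sketch below. It assumes the community database can be represented as a mapping from a file's SHA-256 digest to an entry name; no such database exists in this thread, so the index, the choice of SHA-256 and both function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large videos are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify(path: Path, hash_index: dict) -> Optional[str]:
    """Look up a file by content hash, ignoring its name entirely."""
    return hash_index.get(file_sha256(path))
```

Because one scene is released in several formats and resolutions, a real index would need several hashes per entry, which this mapping allows by pointing multiple digests at the same entry.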

There was no way to do batch scraping before, but my newest commit has both batch scraping (hold shift or control to select multiple items in the list, click scrape, and then after scraping is finished click write file) and support for Data18 Movies. Data18 content will probably come in a later commit, but for now people can continue to use the XBMC XML scraper I wrote to scrape the Data18 content.

Here's a complete summary of new stuff added since my last post:

1. Support for Data18 Movies (Full Dvds)
2. Added program icon.
3. Batch scraping
4. Misc bug fixes

I had to make some extensive changes to get batch scraping working, so hopefully I didn't add too many bugs in the process! I've tested it quite a bit myself and fixed what I have found so far, but just let me know if you notice any bugs.

By the way, if you start the java program from a console, you should be able to see a lot of debug messages if you're ever wondering what's going on with the program while it is running. The command to do so is:

Code:
java -jar JavMovieScraper.jar

Run it from the directory that contains JavMovieScraper.jar. Downloading extrathumbs can mean downloading 100+ files, so it can be kind of slow while this happens. The program probably needs some kind of message console, but for now you should at least be able to tell the program hasn't crashed by examining the console output.
#13
Thanks DoctorD - following this with interest and its looking good.
+1 for a community site
#14
How do I save the nfo and other files for WebContent? It finds the scene, but I do not know how to save it.
#15
Hello AiWaBR,

Are you using the newest build? I fixed some issues with WebContent this morning. In general, webcontent scraping is a bit of a mess because data18 has 4 or 5 different page formats they use (that I know of!). If you or anyone else has a specific file that is not working or is missing data and you KNOW it has a page on data18.com, please reply to this thread with the URL on data18.com that is not scraping correctly.

Anyways, the steps to use the program are:
1. Navigate to the directory where your file is by clicking "Open Directory".
2. Select one or more files in the file list on the left.
3. Click the "Scrape Data18 WebContent" button for webcontent. If it's a DVD movie, click "Scrape Data18 Movie". If it's a Japanese movie, click "Scrape JAV".
4. Wait a while for the info to scrape. Sometimes you'll need to pick a fanart or movie from the popup dialog while the movie is scraping.
5. You should see some info pop up in the center and right panels such as movie name and a poster image. Review the information to make sure the scraper scraped the correct scene.
6. Without changing your selection in the file list, click "Write File Data". This last step is important: without it, the data won't be written. If you change your file list selection before writing the data, the scraped data will be lost and you'll have to rescrape.