PyRomInfo - extract data from ROMs using python
#1
Hey guys,

I started a python project called PyRomInfo, https://github.com/garbear/pyrominfo

Quote:PyRomInfo is a convenient, unified way to get data about a file originating from a read-only memory chip, often from a video game cartridge, a computer's firmware, or from an arcade game's main board.

I've been hacking on it for about a fortnight now, and it currently supports Game Boy/Color, Game Boy Advance, Sega Master System, Genesis, NES, SNES, and N64. The goal is both completeness and research, so each platform should support every known format (e.g. both iNES 2.0 and UNIF for NES ROMs, and the custom SDSC homebrew tag created in 2001 for SMS ROMs), and every Python source file should cite as many online references and emulator source-code files as necessary.

Current task: I got distracted a few weeks ago (a common occurrence :) ), but when I get a chance I'll either take Hachoir-parser and finish the ISO9660 parser (my progress), or write a custom one for PyRomInfo. Once ISO9660 support is in, a parser for PSX ROMs is only a couple of lines of code away, and we're one step closer to solving the .bin/.iso conundrum.

So, I invite you all to give PyRomInfo a test spin. If anything is screwy I'll try to clarify it in the Readme. N3MIS15 has been really helpful in testing out a lot of the platforms, and I'm sure there are more problems to be weeded out.

Cheers,
Garrett
#2
Since any ROM could be zipped, how do you want to handle .zip and other archives? I can think of two options: a "dummy" parser that reads the files in the zip and recursively calls the other parsers to find a match, or every parser reporting .zip as a valid extension and handling the unzip internally.
Also, do you have a list of types that still need to be done?
#3
I had zips in mind from the beginning. The RomInfo class has parse() [by filename] and parseBuffer(), so any number of abstract file systems/storage methods are possible as long as they can get the data to that second function. I managed to hunt down magic words, or somewhat-passable heuristics, for the formats above.
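
As a rough illustration of the buffer-based approach described above, a zip wrapper only has to hand each member's bytes to the buffer entry point. This is a minimal sketch, not PyRomInfo's actual code: `parse_zip`, the `(filename, data)` callback signature and `demo_parse_buffer` are assumptions made up for the example, and the real parseBuffer() signature may differ.

```python
import zipfile


def parse_zip(zip_source, parse_buffer):
    """Try every file inside a zip archive until one of them parses.

    `parse_buffer` is a stand-in for something like RomInfo.parseBuffer().
    It is assumed (hypothetically) to take (filename, data) and to return a
    dict of properties, or an empty dict when nothing matched.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            data = archive.read(member)  # whole member is loaded into memory
            props = parse_buffer(member.filename, data)
            if props:
                return props
    return {}


def demo_parse_buffer(filename, data):
    """Toy parser (hypothetical): recognizes only the iNES magic word."""
    if data[:4] == b"NES\x1a":
        return {"platform": "NES", "file": filename}
    return {}
```

Calling `parse_zip("game.zip", demo_parse_buffer)` would return the properties of the first member that matches. Because each member is read fully into memory, this shares the large-image limitation mentioned in the same post.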

Limitations are that parseBuffer() won't work with a 700MB ISO (Python chews through memory), and there's no statistical output -- statistical output would let us posit that there's an 89% chance it's a GBC image given the presence of a valid publisher byte and checksum short int but a missing 48-byte Nintendo logo.
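
To make the "statistical output" idea concrete, here is a minimal sketch that scores a buffer against a few Game Boy header checks and reports a percentage. The header offsets follow the commonly documented Game Boy cartridge header layout; the weights, the `gameboy_confidence` name and the choice of checks are invented for illustration and are not PyRomInfo's method. The publisher-byte check is left out to keep the example short.

```python
# 48-byte logo expected at 0x104-0x133 in a Game Boy ROM header.
NINTENDO_LOGO = bytes.fromhex(
    "CEED6666CC0D000B03730083000C000D"
    "0008111F8889000EDCCC6EE6DDDDD999"
    "BBBB67636E0EECCCDDDC999FBBB9333E"
)

# Illustrative weights in percent (an assumption, not measured data).
WEIGHTS = {"logo": 40, "header_checksum": 30, "global_checksum": 30}


def header_checksum_ok(rom):
    """8-bit checksum over 0x134-0x14C, stored at 0x14D."""
    x = 0
    for byte in rom[0x134:0x14D]:
        x = (x - byte - 1) & 0xFF
    return x == rom[0x14D]


def global_checksum_ok(rom):
    """16-bit big-endian sum of all bytes except the checksum itself."""
    total = (sum(rom) - rom[0x14E] - rom[0x14F]) & 0xFFFF
    return total == int.from_bytes(rom[0x14E:0x150], "big")


def gameboy_confidence(rom):
    """Return a 0-100 score for how Game Boy-like the buffer looks."""
    if len(rom) < 0x150:
        return 0
    checks = {
        "logo": rom[0x104:0x134] == NINTENDO_LOGO,
        "header_checksum": header_checksum_ok(rom),
        "global_checksum": global_checksum_ok(rom),
    }
    return sum(WEIGHTS[name] for name, passed in checks.items() if passed)
```

With these made-up weights, an image with valid checksums but a missing logo would score 60 rather than being rejected outright. Real weights would have to be derived from a corpus of known-good and known-bad dumps.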

Help me with a platform roadmap!!! I'd like to see every popular format scraped into RCB supported and, in the distant future, XBMC's game library as well. Priority-wise I want ambiguous extensions first, like .iso and .bin. Another factor is popularity, both as a fraction of games played in RetroPlayer's emulators (primarily) and games played by all emulators.

First up is PS1; I was making really good progress on that before I got distracted some weeks ago. What formats should I target next?
#4
What about split ROM sets, like the ones higan (formerly known as bsnes) and MAME (or any other arcade emulator like FBA or Kawaks) use?

How does it know where to find the corresponding data?

Or "torrentzipped" sets, like some old sets that contain several copies of the same file, some of them overdumped, bad or just changed?

But great project!
#5
(2013-05-14, 06:08)garbear Wrote: Hey guys,

I started a python project called PyRomInfo, https://github.com/garbear/pyrominfo

Really cool!! I'm looking forward to testing our old GB and NES games and putting them "live" on the family XBMC box :D This will indeed be great fun. Cheers, Lars.
#6
(2013-05-14, 19:53)garbear Wrote: First up is PS1; I was making really good progress on that before I got distracted some weeks ago. What formats should I target next?

If your end goal is to supply this info to heimdall for scraping purposes, I would focus on platforms that libretro supports.
For disc-based parsing you could look at Sega CD, Dreamcast, Saturn, GameCube, Wii, CD-i, 3DO... AFAIK, out of those only Sega CD is supported by libretro.
#7
(2013-05-15, 10:34)N3MIS15 Wrote:
If your end goal is to supply this info to heimdall for scraping purposes, I would focus on platforms that libretro supports.
For disc-based parsing you could look at Sega CD, Dreamcast, Saturn, GameCube, Wii, CD-i, 3DO... AFAIK, out of those only Sega CD is supported by libretro.

There's a Sega Saturn core based on Yabause. I haven't tested it in RetroPlayer, but it seems to have reasonable support through RetroArch.

There should also be TurboGrafx-CD support through one of the Mednafen cores.
#8
Hi all,

Let me start by introducing myself. I'm a C programmer and have no experience developing for Kodi.
I have no experience with GitHub either (I used SVN).
I wanted to start on something small to help the RetroPlayer project,
and I am interested in learning something new, like Python.

So I downloaded PyCharm and started running it.
I got the unit tests working, and the code is nicely written, so I understand what it does. :)
I would like to help implement the PSX parsers.

I saw that you worked on the hachoir project for ISO9660 parsing.
I have been reading, reading and searching and also found this tool:
https://github.com/CaptainCPS/PS_ISO_Tool/
It is a C console program that extracts game titles from PS1/PS2/PS3/PSP image files.

Is there other information in the ISO9660 structure that is interesting, or is the title extraction enough?
Would it be interesting for me to port this code to the PyRomInfo project as a RomInfoParser?
Or would it be better to work on the hachoir project and add the parser there?
#9
Hi woerd88,

I've done some work on an ISO9660 parser. My strategy was to implement ISO9660 parsing in Hachoir, then use Hachoir in PyRomInfo. If Hachoir can't be integrated into Kodi easily, then I was just going to write a basic standalone ISO9660 parser. I've mostly been referencing the raw ISO9660 specification, and you might find some helpful references in comments throughout the source code.

My Hachoir work is here: branch iso9660 commits

My standalone PyRomInfo iso9660 parser progress is here: branch playstation commits

ISO9660 parsing in PyRomInfo would be awesome, because multiple systems can benefit from knowing this header information. I started down both the Hachoir and standalone paths simultaneously. I would suggest studying the Hachoir code and trying to finish that project, then building a stripped-down standalone parser using the Hachoir parser as a reference.
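
For a sense of scale, a stripped-down standalone reader for the ISO9660 header information could look roughly like the sketch below. It is not the code from either branch: the function name and the returned keys are made up for illustration, and it assumes a plain image with 2048-byte sectors (a raw 2352-byte-per-sector .bin would need its sector headers skipped first). The field offsets are those of the ISO9660 primary volume descriptor.

```python
SECTOR_SIZE = 2048      # assumption: plain .iso layout, not a raw .bin
FIRST_DESCRIPTOR = 16   # the volume descriptor set starts at sector 16
MAX_DESCRIPTORS = 32    # arbitrary safety limit for this sketch


def read_primary_volume_descriptor(stream):
    """Return system/volume identifiers from a seekable binary stream.

    Returns None if no primary volume descriptor is found.
    """
    for index in range(MAX_DESCRIPTORS):
        stream.seek((FIRST_DESCRIPTOR + index) * SECTOR_SIZE)
        sector = stream.read(SECTOR_SIZE)
        # Every volume descriptor carries the "CD001" standard identifier.
        if len(sector) < SECTOR_SIZE or sector[1:6] != b"CD001":
            return None
        descriptor_type = sector[0]
        if descriptor_type == 1:  # primary volume descriptor
            return {
                "system_id": sector[8:40].decode("ascii", "replace").strip(),
                "volume_id": sector[40:72].decode("ascii", "replace").strip(),
            }
        if descriptor_type == 255:  # volume descriptor set terminator
            return None
    return None
```

Since it seeks and reads one sector at a time, something like `read_primary_volume_descriptor(open(path, "rb"))` would not need the whole image in memory.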

PS ISO Tool is a good find. I'll add a link to the readme so I remember to check this out when I integrate PyRomInfo in Kodi.
#10
Doesn't Kodi already have libcdio as a dependency, and isn't it capable of reading ISO images through VFS?

http://www.gnu.org/software/libcdio/
https://savannah.gnu.org/projects/libcdio/
http://www.gnu.org/software/libcdio/libcdio.html

Maybe libcdio doesn't support all types of PlayStation disc images, though?
#11
Well, if I understand correctly, the PlayStation game title is stored in a small file in the root directory.
On PS1/PS2 it is stored as a serial code that needs to be translated to the real game title using a big list or database.
On PS3/PSP there is also a serial code, but the title itself is also saved directly in the file (poorly formatted, in caps).
The list of code <> game title pairs is more or less finite, since there aren't any new game releases on the PS1 and PS2 platforms.
There are some gaps / missing serials, but I think these are rare and mostly for Japanese-region games.
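
The serial-to-title step for PS1/PS2 could be sketched as below. This assumes the small root file is SYSTEM.CNF and that its BOOT (PS1) or BOOT2 (PS2) line names the boot executable after the serial, which is the usual layout on those discs. The function names, the `TITLES` table, the serial and the title are placeholders invented for the example; a real table would come from a serial database.

```python
import re

# Matches e.g. "BOOT = cdrom:\SLUS_123.45;1" or "BOOT2 = cdrom0:\SLES_543.21;1"
BOOT_LINE = re.compile(
    r"^\s*BOOT2?\s*=\s*cdrom0?:\\*([A-Z]{4})[_-](\d{3})\.(\d{2})",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical lookup table with a made-up serial and title.
TITLES = {"SLUS-12345": "Placeholder Title"}


def serial_from_system_cnf(text):
    """Extract a normalized serial such as "SLUS-12345", or None."""
    match = BOOT_LINE.search(text)
    if match is None:
        return None
    prefix, major, minor = match.groups()
    return f"{prefix.upper()}-{major}{minor}"


def title_for(text, titles=TITLES):
    """Return (serial, title); title is None when the serial is unknown."""
    serial = serial_from_system_cnf(text)
    return serial, titles.get(serial)
```

Returning the serial even when the title lookup fails keeps the gaps in the list visible, so missing entries can be reported or filled in later.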

I think each PSX file extension basically works like this, or refers to a file with this ISO structure. It may not work with some compressed formats, but that's a concern for later.

On the libcdio topic: I noticed this library in the Kodi folder on Linux and Windows.
But is it also available for other platforms in Kodi (Mac / Android NDK)?

Maybe we could write the detection using this library somewhere in Kodi?
Or should we stick to detection/scraping in an add-on like PyRomInfo?
And maybe use this interface to access libcdio?
(https://pypi.python.org/pypi/pycdio/)

It would be a cool feature to scrape with libcdio when a disc is inserted.
#12
BTW, PS3 and PSP images also have a small icon and small fanart embedded.
These are small .png files, but would be a nice option to scrape.
#13
I forked garbear's hachoir repository and continued his work in the iso9660 branch.
(https://github.com/Woerd88/hachoir/tree/iso9660)

It started a bit slowly since I had no experience with Python or GitHub.
Now my goal is to finish this parser and add more parsers to the hachoir project.
#14
I have no idea what all that means, but I'd just like to say it's good to have another skilled person like yourself on board. :) Thanks!
#15
Great! In a few hours I say goodbye to internet and warm showers for a month... Looking forward to any progress you make until then.