A new crowd-sourced data entry site has been created that will allow us to leverage the MAME community's energy to recover dumps of chips whose lids have been lifted and innards have been imaged. Think of it as a re-branded "Typing Monkeys" project.
Please feel free to create a personal login and try it out! Once you have a login, you will be taken to your own home page, where you will see a list of available chips that you can help out with. Next to that will be instructions on how to enter bits for that chip, and next to that will be a link to see what work you have done and how you stack up against others who have typed in data. The site is evolving, so if you've got any input, please let the administrators know through the "contact" link.
At the moment, the only chip available is one of the TGPs from the SEGA Model 1 hardware. Information about this chip is in a Caps0ff blog post here:
To sweeten the deal for the most Herculean of typists, Caps0ff has offered to grant the individual who types the most (sensical) entries early access to five of the group's decapping posts, before the vast majority of the world sees them. This is similar to the reward monetary donors receive, so feel free to go typing crazy.
Thanks! The typed data will be used to generate dumps for MAME and improve automated methods for detecting the bits.
That's great news that the "Typing Monkeys" project is back, and thanks to everyone involved in that project for their time and work. I know this may be off topic, but I've read on Guru's site that the Operation Wolf C-chip was decapped and the ROM had been extracted by Caps0ff. Is it true? There isn't any post about it on the Caps0ff blog.
Thanks to everyone who made "The typing of 315-5571" a rousing success!
The user who typed the most quality entries goes by the username "Monkey." At the moment, we don't require users to enter an e-mail address when registering. I thought this would simplify things and encourage people who wished to remain anonymous to add their work. This may be the case, but it also means I don't have a way to contact "Monkey." So Monkey, if you could log in to the site again and send us a message using the "contact" link at the top with a valid e-mail address for yourself, we will get you hooked up with your prize.
We got some great suggestions about how to improve the site. All of them were in regard to the widget you type into - automatic carriage returns, better alignment between the font size and the image size, a mono-spaced font (some platforms must not be working like mine?), a way to see which bits are complete as the user types, and even a cool minesweeper-like widget that lets the user tap on which bits are set - have all been noted and added to our list of TODOs.
More die images will be posted shortly, and I will announce them again on the MAMEWorld site. Your old logins will still work, so be sure to remember your password :-).
Some random statistics:
- ~88 users participated
- It took ~15 hours to complete
- A quick estimate of how much work was required comes to ~38 hours
- There were only 4 die images where 3 or more people disagreed about the results

This suggests both that the work is of very high quality and that the die images were very clear. Props to the monkeys and to Caps0ff for the quality stuff!
That's all for now. Look for future posts where we do some more of these. /Andrew
> That's great news that the "Typing Monkeys" project is back, and thanks to everyone involved in that project for their time and work. I know this may be off topic, but I've read on Guru's site that the Operation Wolf C-chip was decapped and the ROM had been extracted by Caps0ff. Is it true? There isn't any post about it on the Caps0ff blog.
The C-Chip has 2 ROM parts: a mask ROM inside the UPD78C11, which is assumed to be the same for all games, and an EPROM, which definitely differs between games.
The MASK rom was imaged and typed up, it's rather interesting. The internal checksum on that passes, so chances are it's a good dump.
EPROMs can't be dumped using this technique as there's nothing to see, so an attempt was made to dump the EPROM by wiring it up directly to the decapped die, but the Caps0ff guys couldn't get one of the address lines to work properly, so half the data is missing. It was very delicate work; the tiniest of slips and you have to start over. The dumped data is interesting, as you can see some of the tables the current Operation Wolf simulation uses in it, but as the dump isn't complete it wasn't possible to switch the emulation over to using it.
The mask ROM actually contains what look like functions that could be exploited to read out the EPROM, so attempts have been made to use them. However, it seems Taito anticipated this, and one of the port writes done in the code blocks any further commands or responses externally. Annoying, otherwise you would have had some rather juicy news already.
> > That's great news that the "Typing Monkeys" project is back ... I've read on Guru's site that the Operation Wolf C-chip was decapped and the ROM had been extracted by Caps0ff. Is it true? There isn't any post about it on the Caps0ff blog.
>
> It's true.
>
> Haze added support for the dumps in
> https://github.com/mamedev/mame/commit/0019c8cbd019f706456f95b82fbcf7ffee641187
>
> However, as he says, it's currently bad.
>
> https://github.com/mamedev/mame/commit/d...10c2da50363a441
>
> This could be why they didn't post news about it.
>
> - Stiletto

> To clarify ... Annoying, otherwise you would have had some rather juicy news already.
Annoying, but not insurmountable, right? Is there a way around that?
> > Annoying, otherwise you would have had some rather juicy news already.
>
> Annoying, but not insurmountable, right? Is there a way around that?
From a software hacking point of view? Maybe not. If you're locked out, you're locked out; it's entirely possible it was done like this specifically to make what I was trying to do impossible on a retail chip. We might just have to go the very risky, very expensive hardware decap route, with no guarantee chips won't just be destroyed (every decap is a risk).
I wish people would stop expecting miracles. These are real, tough problems where, yes, some approaches can be ruled out entirely if the right security measures were taken.
Stiletto and Haze, thank you for your answers. It's good to read news about this game; I've played it at arcades and it's one of my favourites. I think it works right with the simulation now, but soon it's going to be 100% sure. At arcades I never completed the power magazine stage :_( When Caps0ff dumps the other half of the data, maybe a way to dump other C-Chips without decapping can be found. When all C-chips are beaten, I hope Haze writes an article about it.
With the really good / clear images so far it would be really easy to write a little bit of code to work out the average pixel brightness (or the average of the top 50 or whatever brightest pixels) for each region to determine a 0 or 1. From the images I've seen, it should be well over 90% accurate. The "tool" could flag "dirty" pixels - where it isn't definitively a 1 or a 0 - for further human checking.
In fact doing this would be a hell of a lot easier than setting up a web site, UI, etc.
Would probably still need human input for the "dirty" pixels. But from the images I've seen, this should only be a small percentage.
For the current web site, an "AI" button could be added next to the "Submit" button that scans / fills in the 0's / 1's and places ? for "dirty" pixels. All the human has to do is check the results and determine the "dirty" pixels. Should enormously speed things up and involve less work for humans.
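For anyone who wants to noodle on the idea described above, here is a minimal Python sketch: average the brightest pixels in each bit region, threshold to a 0 or 1, and emit "?" for "dirty" regions. The threshold values, the top-k count, and the toy pixel data are all invented for illustration; this is not the actual tool's code.

```python
# Hedged sketch of per-region brightness classification with "dirty" flags.
# All numbers below are made-up illustration values.

def classify_region(pixels, lo=60, hi=180, top_k=50):
    """Classify one bit region from its pixel brightnesses (0-255).

    Returns '1' for bright, '0' for dark, and '?' for "dirty"
    regions whose brightness falls between the two thresholds.
    """
    brightest = sorted(pixels, reverse=True)[:top_k]
    avg = sum(brightest) / len(brightest)
    if avg >= hi:
        return '1'
    if avg <= lo:
        return '0'
    return '?'   # ambiguous: leave this bit for a human to check

# Toy "die": each entry is the pixel list for one bit region.
regions = [
    [250, 240, 235, 230],   # clearly bright  -> '1'
    [10, 15, 5, 20],        # clearly dark    -> '0'
    [120, 130, 110, 100],   # in between      -> '?'
]
bits = ''.join(classify_region(r) for r in regions)
print(bits)  # -> "10?"
```

The "?" output is exactly the "AI button fills in what it can, human resolves the rest" workflow suggested above.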
Oh, and my guess for "Monkey" is "Moogly" - he ate up the tasks last time there was a data entry job on here.
Attached is a ZIP containing my Java source code that analyses an image and automatically works out the 1/0 for each region. Also includes 2 images (screen shots from siprOn).
NOTES:
* The "average brightness of the top 50 pixels in a region" seemed to work better than the "average brightness of all pixels in a region", but I have left the code for both averages in so anyone can explore further.
* This code works for the included screen shots (see "Sample Output" below).
* For other screen shots to work, you will probably need to adjust the size, offsets, etc. below.
* The threshold value - oneZeroThresholdValue - will almost certainly need adjusting for different images / scans.
* Lots of improvements could easily be added (let me know if I can help), e.g. a nice GUI interface, drag and drop, image parameter sensing (where the red lines are, etc), and so on.
Humans are the best at determining if a bit is good or not.
Algorithms can be good. Even 99.999% good, but on an 8K-bit die that still leaves a real chance that some bit is bad. Some algorithms can assign a detection confidence to each bit, but there is a chance this confidence score is wrong as well. We have found that running a very good algorithm yields very good results, but to my knowledge, they have never been 100% correct.
That means that, thus far, a human needs to go in and check the bits that have low confidence, which saves a lot of time; but in my experience, sometimes a bit with high confidence is marked incorrectly too. So, well, now to be sure the dump is good, you need to go in and check every bit, and if that's the case, you're back to where you started - typing every bit in by hand.
Now about the posted code: taking average pixel values can get you 90% on very evenly imaged dies, but what about something like this? (I just looked for some example of a die image - any example - there is a good chance this has never been typed.)
See how the average intensity varies across the surface of the die? That means that, for this die, you need to get the average pixel intensity in a spatially varying fashion. Which will introduce even more errors.
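One hedged sketch of what "spatially varying" thresholding could look like: compare each bit region against the mean of its local neighbourhood instead of one global cutoff. The one-dimensional row of brightness values below is invented to mimic a die whose illumination falls off toward the edge:

```python
# Sketch of adaptive (spatially varying) thresholding. A single global
# threshold over this row (global mean ~101) would misread the dim '1'
# region at the edge (value 70) as a '0'; the local comparison does not.

def adaptive_classify(values, window=1):
    """Classify each region against the mean of its neighbourhood."""
    out = []
    for i, v in enumerate(values):
        lo_i = max(0, i - window)
        hi_i = min(len(values), i + window + 1)
        local_mean = sum(values[lo_i:hi_i]) / (hi_i - lo_i)
        out.append('1' if v > local_mean else '0')
    return ''.join(out)

# Alternating 1/0 pattern, with brightness falling off to the right:
row = [200, 90, 180, 80, 120, 50, 70, 20]
print(adaptive_classify(row))  # -> "10101010"
```

A real implementation would do this in two dimensions over the die image, but the idea is the same; and as noted above, the local estimate itself introduces a new source of error.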
So, yes, there are automated methods, and from modern research, deep learning is getting *really* good at recognizing small images for what they are, but none of them are perfect, and they're all imperfect in different ways.
The only way to get good results for many different types of die image with the highest level of confidence is to use a human. Thus the typing monkey project.
Another day, another die completed by the banana-munching distributed typing machine!
Team Caps0ff wrote a cool script to get some fine-grained statistics. Here they are for the 315-5572.
Estimated total user time: 44:29:42
Median view time: 31 sec
Top user's statistics:
Median view time: 25 sec
Images typed: 400
Estimated time: 2:52:07
Some of team Caps0ff and I have also fixed a few bugs and added a "profile" link to the top of the page which lets you change/add your e-mail address, in case you'd like to get e-mails about the status of the site. So log in again to add your e-mail if you're interested in that sort of thing.
I'm going to go process another die and hopefully get it uploaded before long. Type again with you soon!
> Humans are the best at determining if a bit is good or not.
OK, no worries Andrew, don't want to step on anyone's toes. Just like exploring what computers can do. Been doing that for 37 years, and will probably be doing it till the day I die.
I'd sure like to tackle less optimal images when these become available - just for the fun of seeing how things go and how far I can tweak things. (With the good quality images I have so far, there is a big gap between the "average pixel brightness" for 0 and 1 regions).
You're not stepping on any toes, but I figured I'd respond since we often hear people saying things like "I bet a good algorithm could do this automatically" and "why would you spend time writing a web thing?" We agree - a good algorithm should be able to do it, and we're going to be using the data gathered by humans to validate our tests at making the best algorithm possible.
If you wanna' be cutting edge in your noodling (and directly help us out with some of the things we haven't had time to do yet), check out the machine learning library Keras:
That's a convolutional neural network (and then some) that guesses the ASCII character from an image of a handwritten character. If you play with it enough, you can get the machine to do better than the average human. I'm hoping to get the same thing for images of bits...
If needed, I'll check this out. But, it sure seems like taking a sledgehammer to crack a nut ...
I've added a GUI to my program: drag and drop an image, click "Process" and it produces a histogram. Scroll down through the histogram, look for a step change, select the threshold value accordingly, and click "Process" again. Then click the "Copy Bit" button, paste into SiprOn, do a final check, and click "Submit".
Attached ZIP file includes a Java application and User Guide - which takes you through how to process easy images (which make up the vast majority) and hard images.
For most images, the entire process: Right click Save, drag and drop, Process, scroll down through histogram and check threshold value against the step change, copy, paste, click "Submit" takes about 10-15 seconds.
Processed about 75 images so far. Any errors made are due to me - a bug in an earlier version, and me interpreting a brighter region as a 1 or a duller region as a 0. Some are hard to pick. The histogram makes this a lot easier. But, in the SiprOn results so far, there's one pixel that I still think is a 1 and other(s) think is a 0.
Anyway, someone might find this tool useful. Others may hate it. All I can do is try.
> It might be interesting to put the codebase on github to have others contribute.
Yes, will almost certainly do this in some way. (I talked about this in the User Guide).
Ideally, it would be great if Andrew incorporated this into his SiprOn web site .... and then we can fine tune it there.
I think simple statistical analysis is the way to go (histogram, averages, etc). This is working awesomely so far (any errors made are mine, not the algorithm's). I've got ideas to fine-tune this further: track the largest gap, auto-select the threshold, multi-average analysis (top 50, top 20, average of all pixels, etc) to build better consensus / confidence, and so on. All very easy to do.
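The "track largest gap, auto-select threshold" idea could be sketched like this: sort the per-region brightness averages and put the threshold in the middle of the biggest jump in the sorted list. The values are invented for illustration; this is not the actual tool's code.

```python
# Sketch of largest-gap automatic threshold selection.

def auto_threshold(averages):
    """Return a threshold placed in the largest gap between sorted values."""
    s = sorted(averages)
    gaps = [(s[i + 1] - s[i], i) for i in range(len(s) - 1)]
    biggest, i = max(gaps)          # widest gap wins
    return (s[i] + s[i + 1]) / 2    # threshold sits in the middle of it

# Toy per-region brightness averages: a dark cluster and a bright cluster.
avgs = [12, 18, 25, 20, 230, 245, 210, 15]
t = auto_threshold(avgs)
print(t)                                             # -> 117.5
print(''.join('1' if a > t else '0' for a in avgs))  # -> "00001110"
```

When the histogram really does have a clean step change, this picks the same threshold a human would read off it, with no scrolling required.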
Maybe Andrew will warm to this ? Maybe he wants a sledgehammer ? (Nothing wrong with that).
The source code for v0.001 and v0.002 was released earlier - see above (v0.002 replaced v0.001). It has everything but the histogram and GUI (very simple code).
Happy to make the latest code available - Andrew has first dibs if he wants it. Or, if people want, I'll put it up on git. Just got to tidy up the code, add comments, etc - this was developed very quickly.
> No worries back atcha'.
>
> You're not stepping on any toes, but I figured I'd respond since we often hear people saying things like "I bet a good algorithm could do this automatically" and "why would you spend time writing a web thing?" We agree - a good algorithm should be able to do it, and we're going to be using the data gathered by humans to validate our tests at making the best algorithm possible.
>
> If you wanna' be cutting edge in your noodling (and directly help us out with some of the things we haven't had time to do yet), check out the machine learning library Keras:
>
> https://keras.io/
>
> And specifically, check out the OCR example here:
>
> https://github.com/fchollet/keras/tree/master/examples
>
> That's a convolutional neural network (and then some) that guesses the ASCII character from an image of a handwritten character. If you play with it enough, you can get the machine to do better than the average human. I'm hoping to get the same thing for images of bits...
>
> /Andrew
Once we've gone through a few more chips and the data piles up, I would love to fit some neural nets on this stuff!
My $0.02 as someone who tried a few CV experiments to help validate the project.
I tried some simple thresholding on the dies as well as using some more advanced statistical techniques. 100% automatic recovery is ideal, but it would also be acceptable (and possibly required) to simply flag bits that CV can't confidently resolve.
Unfortunately, although accuracy was pretty good, I wasn't able to successfully flag all bit errors. E.g. the white gunk on the blank areas was triggering as 1's when those bits are actually 0's. IMHO it will take something more intelligent, like a neural network, to recognize whether the correct shape/pattern is present or not. While I think it would be a great long-term project, Andrew offered a short-term solution that seems to be working well.
The current approach is a bit brute force, but it is known to work. In fact, some people are entertained by it. Comments include:
- "MOAR! This is highly addictive. will you publish more die images to work on?"
- "I did a bit more than a hundred images while waiting for a backup to be copied."
- "Please send ... photos. They will amuse ..."
We discussed presenting the CV results for quick human validation, but I was afraid this would bias the results and thus defeat the project.
Yeah, apologies for not responding to *any* suggestions yet. I'm quickly running out of time, and wanted to get new dies processed so they could be typed first.
We'll address the suggestions very soon - especially the ones about people wanting to go back and edit entries they have already made (very good suggestion - we'll make that possible someday for sure)!
Also, that 1000 entries bug is totally annoying - apparently it's a bug in the version of the database we're using - SQLite - and is fixed in a newer version. It's preventing some mega monkeys from being more mega, which isn't cool. Like IDrinkHF says, we'll try to fix that as soon as we can.
There are still two dies in progress, so any time you feel you've got the spare cycles, hop on over and throw down a few bits for the sake of Model 1 emulation. This is almost certainly going to be a marathon rather than a sprint, as there are more dies after these two!
Giving users another way to input data is definitely high on our list of things to add. Widgets that let you click on dots (tap on them in your phone) rather than type, widgets that guess then let you correct the guesses, and even widgets that present histograms for the user to twiddle all sound like good ideas.
That being said, I'd release your code to the masses, Moose, as two hours from now I will have absolutely no time to do anything with it. I'll likely send a few messages, but that will be it.
There are many ways the monkey platform can grow, and we'll definitely consider adding additional features, but please remember that some decaps look like this:
Allowing humans to type what they see will work there just as well as it works with the current die images. A histogram may fall short for that decap, so please (continue to) understand why we're focusing our efforts on making things doable for humans rather than incorporating automated techniques that may not be as general as plain ole' typing.
Really digging this, it's nice to be able to contribute to the great work of preserving data.
Seconding the wish for editing already-submitted entries and the tap-based option - I know I made a mistake at least on Column 30, Row 29 of the 315-5573 xpol and wouldn't mind doing some of this from my phone in the distant future.
Just keep in mind that if multiple people use a tool like this, it destroys the whole "multiple sources entering the data are more likely to highlight questionable bits" side of things, which is why each tile is handed to multiple users. The tool is always going to give the same result.
> Same place, Sega die 315-5573 and Sega die 315-5677 are now up.
>
> I will be engaged in a major life change soon, and will have very little time for this sort of thing for a month or two, so I hope to get a bunch more up before the end of the day.
>
> Wish me as much luck as I wish you, and enjoy the typing!
> /Andrew
To celebrate having two working hands again, I wrapped up the remaining shots. So it's time to do shots.
> The tool is always going to give the same result.
No, not at all.
Some options have already been discussed above: settings could allow the tool to work on the "average of all pixels in a region", or the "average of the brightest 50 (or whatever) pixels in a region". Other settings could allow processing a colour-reduced version of the image (reduced to 256 or 16 or whatever colours), or a greyscaled version of the image, and so on. There's a large number of possible processing strategies, and the tool can give different results (maybe better, maybe worse) depending on the strategy selected.
And that is just using simple statistical analysis techniques. When you add neural nets, genetic algorithms, OCR, advanced AI, etc into the mix (2 of these mentioned above), then the possibilities expand enormously.
Even without the advanced methods, if multiple simple statistical methods are used to process an image and results cross checked, then the tool should give higher accuracy and higher confidence in results.
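A minimal sketch of that cross-checking idea: let several strategies vote on each bit, accept unanimous bits, and flag any split vote for human review. The three "methods" below are hard-coded stand-ins for real strategies like "average of all pixels" or "brightest 50":

```python
# Sketch of multi-method consensus: disagreements become '?' flags.

def consensus(reads):
    """Combine several per-method bit strings into one, flagging splits."""
    out = []
    for votes in zip(*reads):
        out.append(votes[0] if len(set(votes)) == 1 else '?')
    return ''.join(out)

method_a = "1011001"   # e.g. average of all pixels in each region
method_b = "1011101"   # e.g. average of the brightest 50 pixels
method_c = "1011001"   # e.g. greyscale + global threshold
print(consensus([method_a, method_b, method_c]))  # -> "1011?01"
```

This mirrors what the multiple-humans-per-tile design already does, just with algorithms as the voters, so the earlier caveat still applies: the same deterministic method run twice is not an independent second opinion.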
I've now got 700+ images on file, and can re-process these to compare strategies and improve results. It's all just a bit of fun.
What started me thinking along these lines was the MH 370 crash and searching satellite photos for wreckage. Picking out light coloured wreckage in the middle of a blue ocean (or dark jungle) and stripping out the effects of clouds, boats, houses, etc.
Then this "identify the 0/1's in chips" cropped up so that people can repair their old games, and I thought - ooohhhhhhh, same kind of problem.
Messing around and processing data is great fun, even if it goes nowhere and achieves nothing.
We can't promise anything, but this will add direction.
Some fun statistics comparing the monkey crowdsource result (CS) to the Caps0ff (C0) submission:
sega_315-5571_xpol: CS matches C0 submission
sega_315-5572_xpol: 8 conflicts, all ruled in favor of the CS result (i.e. this project fixed 8 errors)
sega_315-5573_xpol: 10 conflicts, all ruled in favor of CS result
sega_315-5677_xpol: 11 conflicts. 7 ruled in favor of the CS result. The other 4 are due to damage on the die causing incorrect interpretation. I suspect that C0 fixed up the ROM before submitting (it's obvious from the die shot). A final manual inspection would probably have caught this.
I wanted to hold off releasing more dies until we could really validate the project was working. I'm pretty happy with the above results and so we've greenlighted more dies to be released.
Above results indicate the C0 submissions are roughly 99.99% accurate. Note: we don't have a C0 submission for all die images. That is, soon we'll be relying on Monkey results only with nothing to compare to.
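For the curious, conflict counts like those above could be produced by a simple element-wise compare of the two dumps. A toy sketch with invented bit strings (not the real 315-xxxx data):

```python
# Sketch of comparing a crowdsourced (CS) dump against a Caps0ff (C0)
# submission and listing the positions where they disagree.

def conflicts(cs, c0):
    """Return the bit positions where the two dumps disagree."""
    assert len(cs) == len(c0)
    return [i for i, (a, b) in enumerate(zip(cs, c0)) if a != b]

cs_dump = "1100101011110000"
c0_dump = "1100111011010000"
diff = conflicts(cs_dump, c0_dump)
print(len(diff), diff)  # -> 2 [5, 10]
```

Each flagged position would then go back to the die shot for a manual ruling, which is how the "ruled in favor of CS" calls above would be made.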
Other: it's great to hear people are interested in the project and are exploring computer vision (CV). However, we ask that users please not submit CV results into the monkey system, as this defeats the purpose of the project. If you have a strong interest in CV, use the "contact us" form and we'll work with you to run some tests. If we didn't get back to you before, ping us again now that we've sorted a lot of things out.
Hopefully at some point in the near future we'll do a writeup on the post processing steps and what we learned about user submissions.
Other fun facts:
One user somehow found that I had posted and already typed about 20 entries before I finished writing this.
The "Expression tree is too large" bug should be fixed now. Please contact us (ideally using the form) if you continue to have issues. Note: this issue only affected users who submitted a large number of tasks.
I'm still getting a pretty extensive error when I try to load up an image. I'm TheMogMiner on the site, can you please shoot me an e-mail when you've got it fixed? I sent a copy/paste of the error page via the contact form, but it probably mangled it all to hell.
Yeah, I see it. Looks like after running for a while with the new fix, it caused something else to break. I've rolled back the code to the older release that has the known "query too large" error (it occurs for users that have submitted a large number of fields).