MAME World Short Bus #2

June 30th, 2009

Here we can see the plight of a guy with the original net handle “Marios”, who is apparently the only person in this world who is at once gifted enough to know the term “FM” and dumb enough to not understand that the problem is entirely on this end.

I’d say the cherry on this Fail Sundae is his most recent post:

“No. The sound is okay in all real games i used (even the oldest tune melody games produce nice results and the chip itself as a chip is dynamic - i don’t mean if the sound mimics exactly recorded sound or not but as a stream). I always thought is because they use strong amplifier since i was using MAME. Maybe some code clean-up or fix will help eliminate the blair or poor-producing FM sound.”

So, the sound is okay in all “real games” that he used.  Perhaps this means that all of the games he’s tried in MAME work fine, and he’s actually talking about some other games, seeing how all of the games in MAME are “real games” and all.

 

Ah, but Marios has the solution!  Maybe some code clean-up or fix will eliminate the blare - sorry, “blair” - that he’s experiencing.  Yes, this is surely the solution for this issue that only he is experiencing.  Yep, Marios has sussed it out for all of us.

 

Here’s to you, Marios, for giving me an unexpected bit of comedy on this rainy afternoon.

MAME World Short Bus

June 11th, 2009

At first I was angry at Twisty (the admin), but then I realized that if he wants to surround himself in his forum with people who post crap like this, well, so be it!

I really have to wonder what line of thought led to that post.  Picture a stereotypically ugly, bespectacled furry nerd, muttering in a Spanish-accented wheedling nerd voice, “Nehhh, I don’t care if I’m from Mexico and am about as Japanese as Bob Hope, hehhhh, my super-genki Japanese knowledge must be defended, nnh!”

Oh Baby

June 1st, 2009

If anyone wants to try it out, I just committed my CPU core and driver for the Manchester Small-Scale Experimental Machine (SSEM), or “Baby”, to the MESS SVN depot.  It currently runs all known SSEM programs bundled with David Sharp’s SSEM simulator, available here.

I am not entirely happy with the fact that it is compatible with all of the programs, though.  Certain programs in particular, e.g. “nightmare.snp”, would not run on the SSEM had it ever been extended to the full 8192 words of storage space of which it was theoretically capable (per some SSEM history sites), as they pad out the unused 8 address bits with pretty patterns.
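To make the address-bit issue concrete, here is a small Python sketch of instruction decoding. It is not the MESS core; it assumes the commonly documented SSEM layout (line address in the low bits of the word, a 3-bit function code in bits 13-15), and the function name decode_ssem is made up for illustration.

```python
def decode_ssem(word, address_bits=5):
    """Split a 32-bit SSEM word into (line address, function code).

    Assumed layout: the address occupies the low bits and the function
    code sits in bits 13-15. address_bits=5 models the 32-word machine
    as built; address_bits=13 models a hypothetical 8192-word machine.
    """
    address = word & ((1 << address_bits) - 1)
    function = (word >> 13) & 0x7
    return address, function


# A word that targets line 7 but fills bits 5-12 with a "pretty pattern".
patterned = (2 << 13) | (0b10110101 << 5) | 7

print(decode_ssem(patterned, address_bits=5))   # pattern bits are ignored
print(decode_ssem(patterned, address_bits=13))  # pattern bits become address
```

With only 5 address bits decoded, the pattern is harmless. With all 13 decoded, the same word points at a completely different store line, which is why such programs would break on an extended machine.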

And now, a pretty picture:

The controls are as follows:
Up / Down: Move the selected store line up/down
Button 1: Halt / un-halt the SSEM
1-8, Q-I, A-K, Z-,: Toggle bits 0-31 of the currently-selected store line

Nomenclature

May 12th, 2009

I think that the developers of PCSX2 - the Playstation 2 emulator - should seriously consider renaming it to “Emulator for Playstation 2 Games That Are Played Largely By Japan-Obsessed Bespectacled Nerds”.  I just popped on over to the PCSX2 site and checked out their screenshots, and it’s really telling: There are almost no western-developed games on the list.

There’s no Ratchet & Clank, Jak & Daxter, Sly Cooper, Madden, NASCAR, NCAA Football, Burnout, Beyond Good & Evil, Splinter Cell, Grand Theft Auto.  The list of games not deemed good enough to have screenshots goes on and on.

Why?  It’s because it’s like I’ve always said: Current-gen emulators are ROM-kiddie magnets.  You don’t try to emulate the PS2 while it’s still going strong out of some sense of “preservation”.  No, you do it because you want accolades from people who are too cheap to buy a real console.  Strangely, the Venn diagram showing the correlation between people being too cheap to buy a real console plus games and people who are obsessed with blue-haired, big-bosomed, giant-eyed women with shrill, piercing voices seems to be almost 1:1.  Imagine that.

Filtration

May 9th, 2009

I’ve been eagerly watching all of the Naomi development that’s been going on in MAME behind the scenes, and I couldn’t help but stick my own grubby mitts into the fray.  It was pretty easy to get bilinear filtering going, and it definitely helps Spikers Battle’s attract mode (compare with the shots on Haze’s blog).

In all cases, click on them for the full-res shot.
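For anyone wondering what bilinear filtering boils down to, here is a minimal Python sketch. It is not the Naomi code from MAME; the texture layout (a list of rows of single-channel floats), the clamp-at-the-edge behaviour, and the name bilinear_sample are all assumptions made for illustration.

```python
import math


def bilinear_sample(texture, u, v):
    """Sample a single-channel texture at fractional texel coords (u, v).

    The four texels surrounding (u, v) are blended by their distance to
    the sample point. Coordinates outside the texture clamp to the edge.
    """
    height = len(texture)
    width = len(texture[0])

    x0 = math.floor(u)
    y0 = math.floor(v)
    fx = u - x0  # horizontal blend weight, 0.0 .. 1.0
    fy = v - y0  # vertical blend weight, 0.0 .. 1.0

    def clamp(value, limit):
        return max(0, min(limit - 1, value))

    xa, xb = clamp(x0, width), clamp(x0 + 1, width)
    ya, yb = clamp(y0, height), clamp(y0 + 1, height)

    top = texture[ya][xa] * (1.0 - fx) + texture[ya][xb] * fx
    bottom = texture[yb][xa] * (1.0 - fx) + texture[yb][xb] * fx
    return top * (1.0 - fy) + bottom * fy


checker = [[0.0, 10.0],
           [20.0, 30.0]]
print(bilinear_sample(checker, 0.5, 0.5))
```

A real renderer does this per colour channel and usually in fixed point, but the blend itself is the same.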

Slashditz

May 1st, 2009

This is the sort of crap that passes for an article on Slashdot these days?

Unlike the synopsis on Slashdot, this “group at Georgia Institute of Technology” isn’t emulating shit.  This effect has pretty much no fundamental basis in reality, it was just a bunch of people throwing full-screen effects at the wall and seeing which ones they liked the look of.  It is no more an “emulation” of CRT effects than the PNG-based filters that you can use with MAME. Besides that, the effect just plain looks too crappy.

Folks, THIS is the only legitimate “emulation” of a TV signal and its output.  NRS and blargg based their work on the hard, documented math equations that involve generating and displaying an NTSC video signal on a television.

I’m sick of people heralding this group at GIT as some sort of geniuses.  They don’t know what the fuck they’re doing, and I hope that they sure as hell didn’t get any kind of grant for what they did.

Diversion

April 21st, 2009

I was browsing through a disc image of Superman Returns for the Xbox 360 in MagicISO yesterday evening, and I found something interesting.  It seems that all of the game’s assets are stored inside huge bundles, just like pretty much every game these days.  The contents were very interesting.

See, almost all games that come out these days make sure to compile their scripts before they’re run.  This is partly done to save CPU time, but it’s also partly done just to make things harder on casual hackers, as typically they’ll just wander off and do something else if you throw a mild barrier in their way.  Not Supes!  No, this game apparently has all of its XML and Lua files just hanging out, waving to everyone.

I have to hand it to the folks over at EA, they certainly know how to write some pretty clear scripts.  If you don’t believe me, grab a backup image of your own legally-owned Superman Returns disc and check it out.

The only problem I’m having so far is that their bundle files seem to be recursive in some cases, and I don’t handle those properly.  For starters, though, here’s a preliminary spec:

0000-0007: BGFA1.05
0008-000B: Checksum?
000C-000F: Number of directory entries
0010-0017: Location of directory start
0018-001F: Directory length, in bytes
0020: Unknown (usually 00)
0021: Unknown (usually 01)
0022: Checksum length in bytes? (Unknown, usually 08)
0023: Byte length of each file offset
0024: Byte length of each file length
0025: (Unknown, usually 00)
0026: File offset alignment (1 << x)
0027: (Unknown, usually 00)
0028-002B: Number of padding words at the beginning of the directory
002C-002F: Filename length
0030-003F: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00-01: (Unknown, usually 10xx)
02-09: (Unknown, varies)
09-0B: File offset, in 16-byte chunks
0C-0E: File length, in bytes
0F-xx: Filename, zero-terminated
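
As a starting point, the 0x40-byte header part of the spec above can be unpacked with a few lines of Python. This is only a sketch of the preliminary layout: the big-endian byte order is an assumption (the spec does not state it), the field names are invented for readability, and the directory entries are left alone since their layout is still unsettled.

```python
import struct
from collections import namedtuple

# Assumed big-endian; 8 + 4 + 4 + 8 + 8 + 8*1 + 4 + 4 + 16 = 64 bytes.
HEADER_FORMAT = ">8sIIQQ8BII16s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

BundleHeader = namedtuple("BundleHeader", [
    "magic", "checksum", "num_entries", "dir_offset", "dir_length",
    "unknown_20", "unknown_21", "checksum_length",
    "offset_size", "length_size", "unknown_25",
    "align_shift", "unknown_27",
    "dir_padding_words", "filename_length", "reserved",
])


def parse_header(data):
    """Unpack the first 0x40 bytes of a bundle into a BundleHeader."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 0x40 bytes of header")
    header = BundleHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.magic != b"BGFA1.05":
        raise ValueError("unexpected magic: %r" % (header.magic,))
    return header


def file_alignment(header):
    """Byte 0x26 is a shift count: alignment is 1 << x."""
    return 1 << header.align_shift
```

Usage would be something like parse_header(open(path, "rb").read(0x40)), where path points at one of the bundle files.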

Go to it!

Yeah, Baby, Yeah

April 15th, 2009

I was rather intrigued by David Link’s project to resurrect the Manchester Mark I, and commenced digging up more info about the Manchester Mark I and its predecessor, the Manchester Small Scale Experimental Machine, or “Baby”.

As it turns out, there’s already a Java-based emulator out there.  Inspiration struck, and I decided to see how easy it would be to emulate the SSEM in MESS.  It was, historically, the first electronic stored-program computer (if I read my sources right), so it seems like a prime candidate for support in MESS.

Without further ado, here is the Manchester Small Scale Experimental Machine displaying the results of Tom Kilburn’s “Highest Common Factor for 989” program, including the correct answer (43).
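
For the curious, the answer is easy to double-check, since 989 = 23 × 43. The Python sketch below finds the highest proper factor using nothing but subtraction and sign tests, in the spirit of a machine with no divide instruction. It is not a transcription of Kilburn's program, and the function name is made up.

```python
def highest_proper_factor(n):
    """Return the largest factor of n that is smaller than n.

    Division is done by repeated subtraction: keep subtracting the
    candidate until the remainder reaches zero (a factor) or goes
    negative (not a factor), then try the next candidate down.
    """
    candidate = n - 1
    while candidate > 1:
        remainder = n
        while remainder > 0:
            remainder -= candidate
        if remainder == 0:
            return candidate
        candidate -= 1
    return 1


print(highest_proper_factor(989))
```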

Chances are good I’ll be able to get clearance to add this to MESS.  Fingers crossed!

USF

March 9th, 2009

On a weekend with nothing better to do thanks to a deathly head cold, I decided to bolt USF support onto MESS.

So far things seem semi-promising. It’s too slow to be listened to in realtime, but that can probably be fixed with an RSP recompiler. Some games work, some games don’t work, some games work but in strange manners.

Here’s a quick rundown on the USF sets I’ve tried so far, and their relative working or unworking status:

  • Banjo-Kazooie: Works fine.
  • Beetle Adventure Racing: Works fine.
  • Blast Corps: Works fine.
  • Bomberman 64: Plays nothing.
  • Buck Bumble: Plays a very short click, then nothing.
  • Donkey Kong 64: Exits MESS almost immediately with an unknown RSP opcode, presumably due to the game running off into the weeds.
  • Dr. Mario 64: Plays garbage for around 2 seconds, then causes MESS to fatalerror.
  • Conker’s Bad Fur Day: Works fine.
  • Diddy Kong Racing: Works fine.
  • Goldeneye: Plays the music at around 10 to 20 times the correct tempo.
  • Jet Force Gemini: Works fine.
  • The Legend of Zelda: Ocarina of Time: Plays nothing.
  • Mario Kart 64: Works fine.
  • The New Tetris: Works fine.
  • Perfect Dark: Works fine.
  • Pokemon Stadium: Plays nothing.
  • Sim City 2000: Works fine.
  • Space Station: Silicon Valley: Plays nothing.
  • Super Mario 64: Plays garbage for around 2 seconds, then causes MESS to fatalerror.
  • Super Smash Brothers: Works fine.
  • Tetrisphere: Works fine.
  • Yoshi’s Story: Plays nothing.

I suppose the next step is to either figure out why some games are playing nothing, or why some games are just running off into the weeds, resulting in MESS fatalerroring!

Performance Anxiety

February 18th, 2009

I’ve decided to take a short break from working on renderer issues, insofar as pretty much every single game that doesn’t run into some sort of bug lurking in machine/n64.c or some sort of MIPS CPU bug has largely correct graphics.  The few games that do run up to a machine/n64.c-related bug or MIPS CPU bug also have largely correct graphics.  Barring a few exceptional cases, these games would be playable if not for the aforementioned bugs and/or performance.

Since I am not quite familiar enough with the N64’s non-graphical functions to be comfortable bug-hunting in those realms, for now I’m going to concentrate on performance.

Using MAME’s built-in profiler to determine CPU load distributions across the main CPU, RSP, and everything else (mainly the RDP), I can break the games down into four categories:

  1. Untestably broken: Games that don’t show a single thing in MESS before running off into the weeds.  These games include Indiana Jones, Battle for Naboo, Conker’s Bad Fur Day, Banjo-Kazooie, Banjo-Tooie, Donkey Kong 64, Mario Party 3, Paper Mario, Perfect Dark, Goldeneye, Yoshi’s Story, Gauntlet Legends, Turok - Rage Wars, and I’m sure plenty of others.
  2. 2D Games: These games largely only use the RSP for audio processing, and limit their use of the RDP to things like Textured Rectangle commands.  As a result, performance data indicates the RDP as being the main bottleneck for them.  These games include Bust-A-Move 2: Arcade Edition and Bust-A-Move ‘99.
  3. 3D Games: These games use the RSP to do a whole bunch of vector calculations, and use the RDP as much as they want.  These are the majority of games, and include Super Mario 64, Mario Kart 64, Army Men: Sarge’s Heroes, Tetrisphere, The Legend of Zelda: Ocarina of Time, Kirby 64: The Crystal Shards, Madden 64, and Aidyn Chronicles: The First Mage.
  4. Namco Museum 64: This game is Namco Museum 64.  It does not use the RSP at all and does not use the RDP at all.  It shoves PCM data out the stereo DAC by way of the main CPU, and it uses the N64’s entire video system for nothing other than a framebuffer.  As a result, it runs at around 160% when unthrottled, compared with 10% unthrottled for most 3D games and 25% unthrottled for most 2D games.  It is the only game of its kind that I know of.

In order to more accurately nail down the performance of 3D games, I’ve run a profile on three games: Castlevania, Tom & Jerry: Fists of Furry, and Super Mario 64.  Unsurprisingly, due to the immensely small number of different microcodes that were ever used on the N64, the code profiles look largely the same.  The percentages listed are the percentage of execution time spent in each function, not including children.

  • Castlevania: RDP = 41.14%, RSP = 53.23%, Other = 5.63%
    • 12.04%: fill_span_buffer_2x2
    • 11.04%: FETCH_TEXEL
    • 8.05%: render_spans_16
    • 5.13%: read_dword_generic
    • 4.99%: handle_vmadn
    • 4.59%: cpu_execute_rsp
    • 3.60%: COLOR_COMBINER
    • 3.36%: write_dword_generic
    • 3.32%: BLENDER2_16
    • 3.11%: SATURATE_ACCUM
    • 3.08%: handle_vmadh
    • 2.01%: handle_vmadm
    • 1.91%: handle_vmulf
    • 1.56%: __divdi3
    • 1.56%: memory_decrypted_read_dword
    • 1.52%: handle_ldv
    • 1.39%: handle_vmudn
    • 1.25%: handle_vmudl
    • 1.23%: handle_vadd
    • 1.18%: handle_lqv
    • 1.05%: handle_vmacu
    • 1.02%: memory_read_byte_32be
    • 0.99%: handle_vector_ops
    • 0.96%: READ8
    • 0.93%: taddr_clamp
    • 0.91%: memory_write_byte_32be
    • 0.87%: handle_vge
    • 0.82%: handle_vmrg
    • 0.80%: WRITE8
    • 0.70%: handle_vmacf
    • 0.66%: handle_vsub
    • 0.62%: handle_sqv
    • 0.62%: debugger_instruction_hook
    • 0.62%: handle_lpv
    • 0.60%: handle_vmudm
    • 0.57%: handle_vmadl
    • 0.53%: calculate_coverage
    • 0.52%: handle_sdv
    • 0.50%: handle_vmudh
    • 0.46%: decompress_z
    • 0.45%: fill_rectangle_16bit
    • 0.43%: handle_luv
    • 0.41%: handle_vcl
    • 0.39%: handle_vmulu
    • 0.38%: handle_lwc2
    • 0.38%: handle_vrcph
    • 0.37%: video_update_n64
    • 0.35%: handle_vand
    • 0.34%: handle_vxnor
    • 0.33%: sp_dma
    • 0.32%: handle_vch
    • 0.31%: handle_vrcpl
    • 0.28%: handle_swc2
    • 0.26%: handle_vlt
    • 0.26%: handle_llv
    • 0.22%: handle_vsaw
    • 0.19%: handle_vor
    • 0.19%: fill_rectangle_32bit
    • 0.16%: rdp_load_block
  • Tom & Jerry: Fists of Furry: RDP = 29.15%, RSP = 64.42%, Other = 6.43%
    • 7.41%: read_dword_generic
    • 7.22%: cpu_execute_rsp
    • 5.52%: handle_vmadn
    • 4.79%: texture_rectangle_16bit
    • 4.38%: write_dword_generic
    • 4.29%: fill_span_buffer_2x2
    • 3.54%: BLENDER1_16
    • 3.38%: FETCH_TEXEL
    • 3.27%: SATURATE_ACCUM
    • 3.04%: handle_vmadh
    • 2.75%: handle_vmadm
    • 2.66%: handle_lqv
    • 2.57%: COLOR_COMBINER
    • 2.40%: memory_decrypted_read_dword
    • 2.30%: handle_vmulf
    • 2.25%: handle_ldv
    • 1.87%: fill_rectangle_16bit
    • 1.79%: handle_vmudl
    • 1.79%: render_spans_16
    • 1.48%: READ8
    • 1.45%: handle_vadd
    • 1.40%: handle_vmudn
    • 1.38%: video_update_n64
    • 1.24%: memory_read_byte_32be
    • 1.19%: memory_write_byte_32be
    • 1.10%: debugger_instruction_hook
    • 0.96%: handle_vector_ops
    • 0.94%: WRITE8
    • 0.92%: handle_vsub
    • 0.88%: handle_vmacf
    • 0.75%: handle_sqv
    • 0.72%: handle_vmudm
    • 0.71%: handle_vsubc
    • 0.70%: handle_sdv
    • 0.68%: calculate_coverage
    • 0.66%: handle_vge
    • 0.65%: sp_dma
    • 0.60%: rdp_load_tile
    • 0.53%: __divdi3
    • 0.52%: mame_rand
    • 0.52%: copyline_rgb32
    • 0.52%: handle_vmudh
    • 0.51%: rand_memory
    • 0.49%: handle_vcl
    • 0.49%: driver_get_name
    • 0.48%: compress_z
    • 0.47%: handle_lwc2
    • 0.47%: handle_vmrg
    • 0.44%: handle_vrcpl
    • 0.37%: handle_vrcph
    • 0.35%: handle_vlt
    • 0.33%: taddr_clamp
    • 0.33%: handle_luv
    • 0.30%: handle_llv
    • 0.30%: region_post_process
    • 0.28%: handle_swc2
    • 0.28%: fill_random
    • 0.27%: handle_vsaw
    • 0.26%: handle_lsv
    • 0.24%: handle_vch
    • 0.23%: handle_vabs
    • 0.22%: handle_ssv
    • 0.19%: handle_vxor
  • Super Mario 64: RDP = 27.33%, RSP = 61.21%, Other = 11.46%
    • 10.73%: fill_span_buffer_2x2
    • 6.56%: handle_vmadn
    • 6.16%: cpu_execute_rsp
    • 5.56%: read_dword_generic
    • 4.63%: render_spans_16
    • 3.61%: SATURATE_ACCUM
    • 3.38%: write_dword_generic
    • 3.20%: handle_vmadm
    • 3.19%: FETCH_TEXEL
    • 2.99%: handle_vmadh
    • 2.74%: BLENDER1_16
    • 2.27%: COLOR_COMBINER
    • 2.10%: memory_decrypted_read_dword
    • 1.97%: handle_vmudl
    • 1.88%: handle_ldv
    • 1.72%: handle_vadd
    • 1.66%: handle_vmudn
    • 1.51%: handle_vmulf
    • 1.26%: handle_vector_ops
    • 1.23%: handle_lqv
    • 1.13%: handle_vsub
    • 1.11%: handle_vge
    • 1.07%: debugger_instruction_hook
    • 1.04%: __divdi3
    • 1.00%: memory_write_byte_32be
    • 0.98%: handle_vsubc
    • 0.98%: READ8
    • 0.92%: memory_read_byte_32be
    • 0.82%: calculate_coverage
    • 0.78%: handle_sdv
    • 0.76%: WRITE8
    • 0.75%: mame_rand
    • 0.74%: handle_vmudm
    • 0.74%: driver_get_name
    • 0.72%: compress_z
    • 0.70%: video_update_n64
    • 0.67%: fill_rectangle_16bit
    • 0.65%: handle_vrcph
    • 0.65%: rand_memory
    • 0.60%: sp_dma
    • 0.59%: handle_vlt
    • 0.54%: decompress_z
    • 0.54%: handle_vmudh
    • 0.49%: handle_vrcpl
    • 0.45%: handle_lwc2
    • 0.43%: region_post_process
    • 0.42%: handle_vmacf
    • 0.40%: handle_vcl
    • 0.39%: handle_sqv
    • 0.38%: handle_vch
    • 0.38%: copyline_rgb32
    • 0.37%: handle_llv
    • 0.37%: handle_vxor
    • 0.36%: handle_vsaw
    • 0.36%: handle_vmrg
    • 0.35%: quark_tables_create
    • 0.35%: fill_random
    • 0.33%: handle_luv
    • 0.32%: taddr_clamp
    • 0.30%: handle_swc2
    • 0.28%: handle_ssv
    • 0.27%: handle_vmadl
    • 0.27%: handle_lsv
    • 0.27%: handle_lpv
    • 0.27%: handle_vaddc
    • 0.26%: handle_vor
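
One way to read these listings is to total the self-time of the generic memory accessors per game. The short Python tally below does that using figures copied from the profiles above; which functions count as "memory accessors" is my own grouping and an assumption, not something the profiler reports.

```python
# Self-time percentages copied from the profiles above.
MEMORY_ACCESSORS = {
    "Castlevania": {
        "read_dword_generic": 5.13,
        "write_dword_generic": 3.36,
        "memory_decrypted_read_dword": 1.56,
        "memory_read_byte_32be": 1.02,
        "memory_write_byte_32be": 0.91,
    },
    "Tom & Jerry: Fists of Furry": {
        "read_dword_generic": 7.41,
        "write_dword_generic": 4.38,
        "memory_decrypted_read_dword": 2.40,
        "memory_read_byte_32be": 1.24,
        "memory_write_byte_32be": 1.19,
    },
    "Super Mario 64": {
        "read_dword_generic": 5.56,
        "write_dword_generic": 3.38,
        "memory_decrypted_read_dword": 2.10,
        "memory_read_byte_32be": 0.92,
        "memory_write_byte_32be": 1.00,
    },
}


def accessor_share(game):
    """Total self-time (in percent) spent in the listed accessors."""
    return round(sum(MEMORY_ACCESSORS[game].values()), 2)


for name in MEMORY_ACCESSORS:
    print("%s: %.2f%%" % (name, accessor_share(name)))
```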

As I see it, the first priority is to convert the RSP core over to use MAME’s DRC system.  Unfortunately, I’m not quite sure what sort of performance increase will be seen by DRC-ifying the RSP.  The VMAC* and VMUD* opcodes have a rather large amount of code associated with them, and not only that, they loop 8 times across 8 elements.  This was probably accomplished in parallel on the real RSP.
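
To show the shape of the problem, here is a deliberately simplified Python sketch of a per-lane multiply-accumulate. It is not the MAME RSP core, and it does not reproduce the exact semantics of any real opcode (those differ in how they shift, round, and sign-extend); it only illustrates the "same work, eight times over" loop that an interpreter ends up running serially.

```python
LANES = 8  # the RSP's vector registers hold eight 16-bit elements


def clamp_s16(value):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def vector_multiply_accumulate(accumulators, vs, vt):
    """Simplified multiply-accumulate across all eight lanes.

    accumulators: list of 8 ints, updated in place.
    vs, vt: lists of 8 signed 16-bit lane values.
    Returns the saturated 16-bit result for each lane.
    """
    results = []
    for lane in range(LANES):
        accumulators[lane] += vs[lane] * vt[lane]
        results.append(clamp_s16(accumulators[lane]))
    return results


acc = [0] * LANES
print(vector_multiply_accumulate(acc, [1000] * LANES, [40] * LANES))
```

Hardware would do all eight lanes at once. A host-side speedup would have to come from SIMD or from generating tighter code per opcode, which is where the uncertainty about DRC gains comes from.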

Another piece of low-hanging fruit is the fact that around 10% of the execution time is taken up by memory accessors thanks to the RSP’s less-than-optimal IMEM and DMEM implementation.  The RSP has to hit the memory system for every single read and write that it does.  However, in reality IMEM and DMEM are accessed far, far less often by the main CPU than they are by the RSP itself.  It therefore makes better performance sense to have two 4kbyte arrays central to the RSP core itself, which it will access directly rather than going through MAME’s core memory accessors.  The main CPU will be able to access these memory spaces by querying the RSP core, and any RSP DMA accesses can be done by simply grabbing a pointer into the RSP’s IMEM or DMEM arrays, just like it works now.
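
Roughly, the idea looks like the Python sketch below. It is an illustration of the layout and not the actual core: the class and method names are invented, accesses are forced to 4-byte alignment for simplicity, and the CPU-side mapping (DMEM at 0x04000000, IMEM at 0x04001000, selected by bit 12) is stated here as an assumption.

```python
import struct


class RspLocalMemory:
    """Two 4 KB arrays owned by the RSP core itself."""

    SIZE = 0x1000

    def __init__(self):
        self.imem = bytearray(self.SIZE)
        self.dmem = bytearray(self.SIZE)

    # Fast path: the RSP touches its own arrays directly.
    def read_dmem_dword(self, address):
        return struct.unpack_from(">I", self.dmem, address & 0xFFC)[0]

    def write_dmem_dword(self, address, value):
        struct.pack_into(">I", self.dmem, address & 0xFFC, value & 0xFFFFFFFF)

    # Slow path: the main CPU asks the RSP core for access.
    def _select(self, cpu_address):
        return self.imem if cpu_address & 0x1000 else self.dmem

    def cpu_read_dword(self, cpu_address):
        bank = self._select(cpu_address)
        return struct.unpack_from(">I", bank, cpu_address & 0xFFC)[0]

    def cpu_write_dword(self, cpu_address, value):
        bank = self._select(cpu_address)
        struct.pack_into(">I", bank, cpu_address & 0xFFC, value & 0xFFFFFFFF)

    # DMA: hand out a writable view into the array, no copies.
    def dma_window(self, cpu_address, length):
        bank = self._select(cpu_address)
        start = cpu_address & 0xFFF
        return memoryview(bank)[start:start + length]
```

The point of the split is that the common case (the RSP reading and writing its own memory) never goes near the generic memory system, while the rare case (the main CPU poking at IMEM or DMEM) pays the extra indirection.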

Lastly, the plan is to wire the RDP emulation up to MAME’s “work unit” system, which will allow it to distribute drawing commands across multiple CPU cores when available.  Unfortunately, with the RDP emulation being as slow as it is, this will likely not have much of a performance impact on my laptop, but it might help more on a quad-core CPU.
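
The general shape of that kind of work distribution can be sketched in Python with a thread pool. This does not use MAME's actual work-queue API; the span tuple layout and function names are made up, and Python threads will not give a real speedup for this sort of loop, so treat it purely as a picture of the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def fill_span(framebuffer, width, span):
    """Fill one horizontal span; span = (y, x_start, x_end, color)."""
    y, x_start, x_end, color = span
    row = y * width
    for x in range(x_start, x_end):
        framebuffer[row + x] = color


def render_spans(framebuffer, width, spans, workers=4):
    """Hand each span to a worker and wait for all of them to finish.

    Spans are assumed not to overlap, so workers never write the same
    pixel and no locking is needed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda s: fill_span(framebuffer, width, s), spans))


fb = [0] * 16  # a 4x4 framebuffer
render_spans(fb, 4, [(0, 0, 4, 1), (1, 1, 3, 2)])
print(fb)
```

The catch with the real RDP is ordering: blending and Z-buffer updates depend on earlier primitives, so only work that touches disjoint pixels can be farmed out this freely.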

Anyway, that’s the main plan.  Here’s hoping I can stick to it.