Posts Tagged ‘n64’

Midas

Wednesday, July 15th, 2009

This is a really lofty goal, but I think in my spare time I’m going to have a try at 100% accurately re-creating the Goldeneye engine in C, based on disassemblies and traces of the actual game.

Well, that’s the end goal, anyway.  For now I’d be satisfied with tracing out the game’s boot process to figure out why, exactly, it fails to boot in MESS currently.

Either way, my current work is here.  That’s after around 4-5 hours’ worth of work.

USF

Monday, March 9th, 2009

On a weekend with nothing better to do thanks to a deathly head cold, I decided to bolt USF support onto MESS.

So far things seem semi-promising. It’s too slow to be listened to in realtime, but that can probably be fixed with an RSP recompiler. Some games work, some games don’t work, some games work but in strange manners.

Here’s a quick rundown on the USF sets I’ve tried so far, and their relative working or unworking status:

  • Banjo-Kazooie: Works fine.
  • Beetle Adventure Racing: Works fine.
  • Blast Corps: Works fine.
  • Bomberman 64: Plays nothing.
  • Buck Bumble: Plays a very short click, then nothing.
  • Donkey Kong 64: Exits MESS almost immediately with an unknown RSP opcode, presumably due to the game running off into the weeds.
  • Dr. Mario 64: Plays garbage for around 2 seconds, then causes MESS to fatalerror.
  • Conker’s Bad Fur Day: Works fine.
  • Diddy Kong Racing: Works fine.
  • Goldeneye: Plays the music at around 10 to 20 times the correct tempo.
  • Jet Force Gemini: Works fine.
  • The Legend of Zelda: Ocarina of Time: Plays nothing.
  • Mario Kart 64: Works fine.
  • The New Tetris: Works fine.
  • Perfect Dark: Works fine.
  • Pokemon Stadium: Plays nothing.
  • Sim City 2000: Works fine.
  • Space Station: Silicon Valley: Plays nothing.
  • Super Mario 64: Plays garbage for around 2 seconds, then causes MESS to fatalerror.
  • Super Smash Brothers: Works fine.
  • Tetrisphere: Works fine.
  • Yoshi’s Story: Plays nothing.

I suppose the next step is to either figure out why some games are playing nothing, or why some games are just running off into the weeds, resulting in MESS fatalerroring!

Performance Anxiety

Wednesday, February 18th, 2009

I’ve decided to take a short break from working on renderer issues, since pretty much every game that doesn’t run into a bug lurking in machine/n64.c or in the MIPS CPU core has largely correct graphics.  The few games that do run into one of those bugs also have largely correct graphics.  Barring a few exceptional cases, these games would be playable if not for the aforementioned bugs and/or performance.

Since I am not quite familiar enough with the N64’s non-graphical functions to be comfortable bug-hunting in those realms, for now I’m going to concentrate on performance.

Using MAME’s built-in profiler to determine CPU load distributions across the main CPU, RSP, and everything else (mainly the RDP), I can break the games down into four categories:

  1. Untestably broken: Games that don’t show a single thing in MESS before running off into the weeds.  These include Indiana Jones, Battle for Naboo, Conker’s Bad Fur Day, Banjo-Kazooie, Banjo-Tooie, Donkey Kong 64, Mario Party 3, Paper Mario, Perfect Dark, Goldeneye, Yoshi’s Story, Gauntlet Legends, Turok - Rage Wars, and I’m sure plenty of others.
  2. 2D Games: These games largely only use the RSP for audio processing, and limit their use of the RDP to things like Textured Rectangle commands.  As a result, performance data indicates the RDP as being the main bottleneck for them.  These games include Bust-A-Move 2: Arcade Edition and Bust-A-Move ‘99.
  3. 3D Games: These games use the RSP to do a whole bunch of vector calculations, and use the RDP as much as they want.  These are the majority of games, and include Super Mario 64, Mario Kart 64, Army Men: Sarge’s Heroes, Tetrisphere, The Legend of Zelda: Ocarina of Time, Kirby 64: The Crystal Shards, Madden 64, and Aidyn Chronicles: The First Mage.
  4. Namco Museum 64: This game is Namco Museum 64.  It does not use the RSP at all and does not use the RDP at all.  It shoves PCM data out the stereo DAC by way of the main CPU, and it uses the N64’s entire video system for nothing other than a framebuffer.  As a result, it runs at around 160% when unthrottled, compared with 10% unthrottled for most 3D games and 25% unthrottled for most 2D games.  It is the only game of its kind that I know of.

In order to more accurately nail down the performance of 3D games, I’ve run a profile on three games: Castlevania, Tom & Jerry: Fists of Furry, and Super Mario 64.  Unsurprisingly, given the very small number of different microcodes that were ever used on the N64, the code profiles look largely the same.  The percentages listed are the percentage of execution time spent in each function, not including children.

  • Castlevania: RDP = 41.14%, RSP = 53.23%, Other = 5.63%
    • 12.04%: fill_span_buffer_2x2
    • 11.04%: FETCH_TEXEL
    • 8.05%: render_spans_16
    • 5.13%: read_dword_generic
    • 4.99%: handle_vmadn
    • 4.59%: cpu_execute_rsp
    • 3.60%: COLOR_COMBINER
    • 3.36%: write_dword_generic
    • 3.32%: BLENDER2_16
    • 3.11%: SATURATE_ACCUM
    • 3.08%: handle_vmadh
    • 2.01%: handle_vmadm
    • 1.91%: handle_vmulf
    • 1.56%: __divdi3
    • 1.56%: memory_decrypted_read_dword
    • 1.52%: handle_ldv
    • 1.39%: handle_vmudn
    • 1.25%: handle_vmudl
    • 1.23%: handle_vadd
    • 1.18%: handle_lqv
    • 1.05%: handle_vmacu
    • 1.02%: memory_read_byte_32be
    • 0.99%: handle_vector_ops
    • 0.96%: READ8
    • 0.93%: taddr_clamp
    • 0.91%: memory_write_byte_32be
    • 0.87%: handle_vge
    • 0.82%: handle_vmrg
    • 0.80%: WRITE8
    • 0.70%: handle_vmacf
    • 0.66%: handle_vsub
    • 0.62%: handle_sqv
    • 0.62%: debugger_instruction_hook
    • 0.62%: handle_lpv
    • 0.60%: handle_vmudm
    • 0.57%: handle_vmadl
    • 0.53%: calculate_coverage
    • 0.52%: handle_sdv
    • 0.50%: handle_vmudh
    • 0.46%: decompress_z
    • 0.45%: fill_rectangle_16bit
    • 0.43%: handle_luv
    • 0.41%: handle_vcl
    • 0.39%: handle_vmulu
    • 0.38%: handle_lwc2
    • 0.38%: handle_vrcph
    • 0.37%: video_update_n64
    • 0.35%: handle_vand
    • 0.34%: handle_vxnor
    • 0.33%: sp_dma
    • 0.32%: handle_vch
    • 0.31%: handle_vrcpl
    • 0.28%: handle_swc2
    • 0.26%: handle_vlt
    • 0.26%: handle_llv
    • 0.22%: handle_vsaw
    • 0.19%: handle_vor
    • 0.19%: fill_rectangle_32bit
    • 0.16%: rdp_load_block
  • Tom & Jerry: Fists of Furry: RDP = 29.15%, RSP = 64.42%, Other = 6.43%
    • 7.41%: read_dword_generic
    • 7.22%: cpu_execute_rsp
    • 5.52%: handle_vmadn
    • 4.79%: texture_rectangle_16bit
    • 4.38%: write_dword_generic
    • 4.29%: fill_span_buffer_2x2
    • 3.54%: BLENDER1_16
    • 3.38%: FETCH_TEXEL
    • 3.27%: SATURATE_ACCUM
    • 3.04%: handle_vmadh
    • 2.75%: handle_vmadm
    • 2.66%: handle_lqv
    • 2.57%: COLOR_COMBINER
    • 2.40%: memory_decrypted_read_dword
    • 2.30%: handle_vmulf
    • 2.25%: handle_ldv
    • 1.87%: fill_rectangle_16bit
    • 1.79%: handle_vmudl
    • 1.79%: render_spans_16
    • 1.48%: READ8
    • 1.45%: handle_vadd
    • 1.40%: handle_vmudn
    • 1.38%: video_update_n64
    • 1.24%: memory_read_byte_32be
    • 1.19%: memory_write_byte_32be
    • 1.10%: debugger_instruction_hook
    • 0.96%: handle_vector_ops
    • 0.94%: WRITE8
    • 0.92%: handle_vsub
    • 0.88%: handle_vmacf
    • 0.75%: handle_sqv
    • 0.72%: handle_vmudm
    • 0.71%: handle_vsubc
    • 0.70%: handle_sdv
    • 0.68%: calculate_coverage
    • 0.66%: handle_vge
    • 0.65%: sp_dma
    • 0.60%: rdp_load_tile
    • 0.53%: __divdi3
    • 0.52%: mame_rand
    • 0.52%: copyline_rgb32
    • 0.52%: handle_vmudh
    • 0.51%: rand_memory
    • 0.49%: handle_vcl
    • 0.49%: driver_get_name
    • 0.48%: compress_z
    • 0.47%: handle_lwc2
    • 0.47%: handle_vmrg
    • 0.44%: handle_vrcpl
    • 0.37%: handle_vrcph
    • 0.35%: handle_vlt
    • 0.33%: taddr_clamp
    • 0.33%: handle_luv
    • 0.30%: handle_llv
    • 0.30%: region_post_process
    • 0.28%: handle_swc2
    • 0.28%: fill_random
    • 0.27%: handle_vsaw
    • 0.26%: handle_lsv
    • 0.24%: handle_vch
    • 0.23%: handle_vabs
    • 0.22%: handle_ssv
    • 0.19%: handle_vxor
  • Super Mario 64: RDP = 27.33%, RSP = 61.21%, Other = 11.46%
    • 10.73%: fill_span_buffer_2x2
    • 6.56%: handle_vmadn
    • 6.16%: cpu_execute_rsp
    • 5.56%: read_dword_generic
    • 4.63%: render_spans_16
    • 3.61%: SATURATE_ACCUM
    • 3.38%: write_dword_generic
    • 3.20%: handle_vmadm
    • 3.19%: FETCH_TEXEL
    • 2.99%: handle_vmadh
    • 2.74%: BLENDER1_16
    • 2.27%: COLOR_COMBINER
    • 2.10%: memory_decrypted_read_dword
    • 1.97%: handle_vmudl
    • 1.88%: handle_ldv
    • 1.72%: handle_vadd
    • 1.66%: handle_vmudn
    • 1.51%: handle_vmulf
    • 1.26%: handle_vector_ops
    • 1.23%: handle_lqv
    • 1.13%: handle_vsub
    • 1.11%: handle_vge
    • 1.07%: debugger_instruction_hook
    • 1.04%: __divdi3
    • 1.00%: memory_write_byte_32be
    • 0.98%: handle_vsubc
    • 0.98%: READ8
    • 0.92%: memory_read_byte_32be
    • 0.82%: calculate_coverage
    • 0.78%: handle_sdv
    • 0.76%: WRITE8
    • 0.75%: mame_rand
    • 0.74%: handle_vmudm
    • 0.74%: driver_get_name
    • 0.72%: compress_z
    • 0.70%: video_update_n64
    • 0.67%: fill_rectangle_16bit
    • 0.65%: handle_vrcph
    • 0.65%: rand_memory
    • 0.60%: sp_dma
    • 0.59%: handle_vlt
    • 0.54%: decompress_z
    • 0.54%: handle_vmudh
    • 0.49%: handle_vrcpl
    • 0.45%: handle_lwc2
    • 0.43%: region_post_process
    • 0.42%: handle_vmacf
    • 0.40%: handle_vcl
    • 0.39%: handle_sqv
    • 0.38%: handle_vch
    • 0.38%: copyline_rgb32
    • 0.37%: handle_llv
    • 0.37%: handle_vxor
    • 0.36%: handle_vsaw
    • 0.36%: handle_vmrg
    • 0.35%: quark_tables_create
    • 0.35%: fill_random
    • 0.33%: handle_luv
    • 0.32%: taddr_clamp
    • 0.30%: handle_swc2
    • 0.28%: handle_ssv
    • 0.27%: handle_vmadl
    • 0.27%: handle_lsv
    • 0.27%: handle_lpv
    • 0.27%: handle_vaddc
    • 0.26%: handle_vor

As I see it, the first priority is to convert the RSP core over to use MAME’s DRC system.  Unfortunately, I’m not quite sure what sort of performance increase will be seen by DRC-ifying the RSP.  The VMAC* and VMUD* opcodes have a rather large amount of code associated with them, and not only that, they loop 8 times across 8 elements.  This was probably accomplished in parallel on the real RSP.

Another piece of low-hanging fruit is the fact that around 10% of the execution time is taken up by memory accessors thanks to the RSP’s less-than-optimal IMEM and DMEM implementation.  The RSP has to hit the memory system for every single read and write that it does.  However, in reality IMEM and DMEM are accessed far, far less often by the main CPU than they are by the RSP itself.  It therefore makes better performance sense to have two 4 KB arrays central to the RSP core itself, which it will access directly rather than going through MAME’s core memory accessors.  The main CPU will be able to access these memory spaces by querying the RSP core, and any RSP DMA accesses can be done by simply grabbing a pointer into the RSP’s IMEM or DMEM arrays, just like it works now.
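As a rough sketch of that layout, the RSP core could own both arrays and hand out pointers for DMA and main-CPU access.  The structure and function names below are made up for illustration and are not the actual MESS code; the only hardware detail assumed is that bit 12 of an SP address selects IMEM over DMEM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RSP_MEM_SIZE 0x1000u  /* 4 KB each for DMEM and IMEM */
#define RSP_MEM_MASK (RSP_MEM_SIZE - 1u)

/* Hypothetical core-local state; not the actual MESS structure. */
typedef struct {
    uint8_t dmem[RSP_MEM_SIZE];
    uint8_t imem[RSP_MEM_SIZE];
} rsp_state;

/* Fast path for the RSP itself: big-endian read straight from the array. */
static uint32_t rsp_dmem_read32(const rsp_state *rsp, uint32_t addr)
{
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value = (value << 8) | rsp->dmem[(addr + (uint32_t)i) & RSP_MEM_MASK];
    return value;
}

/* Fast path for the RSP itself: big-endian write straight into the array. */
static void rsp_dmem_write32(rsp_state *rsp, uint32_t addr, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        rsp->dmem[(addr + (uint32_t)i) & RSP_MEM_MASK] = (uint8_t)(value >> (24 - 8 * i));
}

/* Shared by DMA and the main CPU: bit 12 of the SP address selects IMEM. */
static uint8_t *rsp_mem_pointer(rsp_state *rsp, uint32_t sp_addr)
{
    assert(rsp != NULL);
    uint8_t *base = (sp_addr & RSP_MEM_SIZE) ? rsp->imem : rsp->dmem;
    return base + (sp_addr & RSP_MEM_MASK);
}

/* Slow path for the main CPU: it asks the core instead of owning the memory. */
static uint32_t rsp_host_read32(rsp_state *rsp, uint32_t sp_addr)
{
    const uint8_t *p = rsp_mem_pointer(rsp, sp_addr & ~3u);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}
```

The point of the split is that the hot path (the RSP’s own loads and stores) never leaves the core, while the comparatively rare main-CPU accesses pay for one extra function call.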

Lastly, the plan is to wire the RDP emulation up to MAME’s “work unit” system, which will allow it to distribute drawing commands across multiple CPU cores when available.  Unfortunately, with the RDP being as slow as it is, this will likely not have much of a performance impact on my laptop, but it might help more on a quad-core CPU.

Anyway, that’s the main plan.  Here’s hoping I can stick to it.

Coordination

Saturday, January 31st, 2009

Continuing this weekend’s N64 extravaganza, some more poking around has fixed a long-standing issue with my new coverage implementation, which is that the texture coordinates and gouraud steps were being mangled by up to +/- 1 pixel delta.  This may not seem like a lot, but keep in mind that the S (aka U) texture coordinate can change by anywhere from 8 to 32 texels when traversing across a triangle by only one pixel vertically.

And now, the pretty pictures:

Before:

After:

There’s still some work to be done on rounding the last pixel in a horizontal span, which is causing the remaining issues that are visible in the Zelda screenshots, but still, things are looking considerably better.

Roundabout

Saturday, January 31st, 2009

I finally decided to figure out why the scene geometry in The Legend of Zelda: Ocarina of Time is so screwed up.  As it turns out, RSP DMA transfers should have their length rounded up to the next 8 bytes, not the next 4.
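The fix itself boils down to one rounding step.  A minimal sketch, with a hypothetical helper name and ignoring how the length is actually encoded in the DMA registers:

```c
#include <stdint.h>

/* Round a DMA transfer length in bytes up to the next multiple of 8.
 * Hypothetical helper; the name is not taken from the MESS source. */
static uint32_t rsp_dma_round_length(uint32_t length)
{
    return (length + 7u) & ~7u;
}
```

A 12-byte request shows the difference: rounding to the next 4 bytes leaves it at 12, while rounding to the next 8 transfers 16.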

Since it’s kind of hard to get the dramatic change across with still shots, I broke out the -aviwrite parameter and uploaded a couple of videos to YouTube.

Before: http://www.youtube.com/watch?v=OUwwBc3G1h0

After: http://www.youtube.com/watch?v=7_9L0G7IsRY

The Mother Lode

Saturday, October 11th, 2008

A Play-N-Trade recently opened up in a local mall, and I have to admit that it’s been an absolute godsend as far as N64 games are concerned; they have more unpopular games than I could have hoped for.

My ultimate goal is to amass a complete collection of every Nintendo 64 cartridge released in North America, and I realize it’s going to be tough going, but I think Play-N-Trade is definitely going to help me out in this regard.

Today I collected up all of the unused game consoles and games that I had lying around my apartment that I knew I’d never get around to playing or would never play again, and took them over to the Play-N-Trade.  $303 in in-store credit later, I picked up the following N64 games for only $135 out of that credit pool:

A Bug’s Life
Chopper Attack
Destruction Derby 64
Gauntlet Legends
Gex 64: Enter the Gecko
Gex 3: Deep Cover Gecko
Glover
The Legend of Zelda: Ocarina of Time
The Legend of Zelda: Majora’s Mask
Mario Golf
Mario Kart 64
Monster Truck Madness 64
Pokemon Snap
Re-Volt
Ridge Racer 64
Roadsters
Rugrats Scavenger Hunt
Starfox 64
Super Bowling
Tom & Jerry in: Fists of Furry
Tonic Trouble
WCW vs. NWO: World Tour
WWF: No Mercy

Ultimately, a pretty good haul!

Even and Odd

Wednesday, October 1st, 2008

PIN64 proves its usefulness once again!

I’ve always had some slight suspicions that 1-Cycle mode uses the Color Combiner setup for Cycle 1 rather than Cycle 0, mostly because I’ve observed some games setting different parameters for Cycle 1 versus Cycle 0, even when in 1-Cycle mode.
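In code, the theory amounts to picking the second set of combiner inputs whenever the RDP is in 1-Cycle mode.  This is only a sketch of that selection; the type and function names are invented for illustration, and the fields simply stand for the four input selects of the combiner’s (A - B) * C + D equation.

```c
#include <assert.h>

/* Only the two cycle types relevant here; names are illustrative. */
enum cycle_type { CYCLE_TYPE_1 = 0, CYCLE_TYPE_2 = 1 };

/* Input selects for (A - B) * C + D. */
typedef struct { int a, b, c, d; } combiner_inputs;

/* The Set Combine Mode command carries one setup per cycle. */
typedef struct { combiner_inputs cycle[2]; } combine_mode;

/* Return the setup to use for a given pass (0 or 1) of the pipeline. */
static const combiner_inputs *active_combiner(const combine_mode *mode,
                                              enum cycle_type type, int pass)
{
    if (type == CYCLE_TYPE_1)
        return &mode->cycle[1];   /* 1-Cycle mode: use the Cycle 1 setup */
    assert(pass == 0 || pass == 1);
    return &mode->cycle[pass];    /* 2-Cycle mode: one setup per pass */
}
```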

Sure enough, I finally found one game that proves my theory: F-1 Pole Position 64.  Here’s what its menus originally looked like:

Aaaaand after:

Bicycle Built for Two

Friday, September 26th, 2008

As it turns out, there was a rather significant bug in the way MESS was loading textures using the Load Block command when a line-swap delta was specified.  I’m actually surprised it worked at all.  Now the secondary texture loaded into the CC_TEX1 input is loaded properly.

Before:

After:

Grab Bag

Thursday, September 25th, 2008

A bunch of miscellaneous fixes today:

After making the decision to remove my original alpha-rejection code as it was causing more bugs than it was fixing, I began the arduous process of tracking down the multitude of ways in which the Nintendo 64 can handle textures that need an alpha border to be cut out.

First, I ran across Quake 64, which uses the RDP’s Alpha Compare functionality to cut out its font.  Alpha Compare is relatively simple - it takes place after the Color Combiner stage and before the Blender stage.  If the alpha value of the color output by the Color Combiner is less than or equal to the alpha value specified in the Blend Color register, the RDP will not pass that pixel into the Blender.  If the combined alpha is greater than the alpha value specified in the Blend Color register, the RDP will rasterize the pixel through the Blender.
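Written out, the test described above is a single comparison.  A minimal sketch, assuming 8-bit alpha values and a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the pixel should be passed on to the Blender.
 * combined_alpha: alpha output by the Color Combiner.
 * blend_color_alpha: alpha stored in the Blend Color register. */
static bool alpha_compare_passes(uint8_t combined_alpha, uint8_t blend_color_alpha)
{
    return combined_alpha > blend_color_alpha;
}
```

Note that an equal alpha is rejected, so a threshold of 0 still cuts out fully transparent texels.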

Sure enough, the UI’s text is now fine:

Next, I finally came to the realization that there is no other form of automatic pixel rejection based on an alpha value, thanks to a tip from a fellow who goes by the nickname of ‘Happy’.  He put forth the idea that the N64 only supports Alpha Compare and no other method for automatic pixel rejection, which caused me to look a bit closer at things like the karts in Mario Kart 64 and the trees in Super Mario 64.

Sure enough, it turns out that I had to emulate the effects of the “Coverage Times Alpha” bit in the Set Other Modes command.  I was only emulating the “Use Coverage As Alpha” bit, and since the transparent pixels were fully covered, they were being rasterized due to their coverage being non-zero.  However, the pixel alpha value was, in fact, zero, so multiplying the coverage value by zero resulted in a final blended alpha of zero!  Brilliant!  Mario Kart 64 and Super Mario 64 now look much better:
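To make the interaction between the two bits concrete, here is a rough sketch.  The scaling is an assumption chosen for readability (coverage counted as 0 to 8, alpha as 0 to 255) rather than the RDP’s exact arithmetic, and all names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FULL_COVERAGE 8

typedef struct {
    uint8_t coverage; /* 0..FULL_COVERAGE */
    uint8_t alpha;    /* 0..255 */
} pixel_state;

/* Apply "Coverage Times Alpha" first, then "Use Coverage As Alpha". */
static pixel_state apply_coverage_modes(pixel_state in, bool cvg_times_alpha,
                                        bool alpha_cvg_select)
{
    assert(in.coverage <= FULL_COVERAGE);
    pixel_state out = in;
    if (cvg_times_alpha)   /* scale coverage by the pixel's alpha */
        out.coverage = (uint8_t)((in.coverage * in.alpha) / 255);
    if (alpha_cvg_select)  /* replace alpha with the (scaled) coverage */
        out.alpha = (uint8_t)((out.coverage * 255) / FULL_COVERAGE);
    return out;
}
```

With only “Use Coverage As Alpha” set, a fully covered texel with zero alpha comes out fully opaque, which is the bug described above; with both bits set it comes out with zero alpha.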

I then turned to Space Station: Silicon Valley, wherein the Take 2 logo was completely black.  It turns out that if both alpha values of the second Blend cycle are zero, the pixel does not become zero - in fact, the cycle is then just ignored.  A quick fix later and the logo sprang to life:
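A sketch of that quick fix for a single color channel is below.  It assumes the blender computes a weighted average of two colors, (P * A + M * B) / (A + B), which is a simplification, and the function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Blend one 8-bit channel in the second cycle.
 * p, m: the two input colors; a, b: their alpha weights.
 * previous: the channel value coming out of the first cycle. */
static uint8_t blend_second_cycle(uint8_t p, uint8_t a, uint8_t m, uint8_t b,
                                  uint8_t previous)
{
    unsigned sum = (unsigned)a + b;
    if (sum == 0)
        return previous;  /* both alphas zero: the cycle is ignored */
    return (uint8_t)(((unsigned)p * a + (unsigned)m * b) / sum);
}
```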

Lastly is a personally controversial fix in that it wasn’t until this blog post that I realized that the fix I came up with causes problems in other games.  In Bust-A-Move 2, there is an animated ground plane that is drawn line-by-line down the screen.  The topmost line of the plane actually has a T coordinate of 160.0, and as a result the Set Tile Size command sets the tile’s T base to be 160.0.  However, the S and T mask bits were set to 7, which results in the coordinates mirroring from 0 to 127 and back.  Since the texture coordinates were being adjusted by the values in the Set Tile Size command after mirroring, and the mirroring was affecting the coordinates, they were not remaining at the appropriate position.
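The ordering issue can be shown with whole-texel coordinates.  This is a simplified sketch with invented function names; it ignores fractional bits, clamping, and shifts, and it only models the two orderings being compared here.

```c
/* Mirror a coordinate with a power-of-two period of (1 << mask_bits):
 * with mask_bits = 7 it runs 0..127, then 127..0, and repeats. */
static int mirror_coord(int coord, int mask_bits)
{
    int size = 1 << mask_bits;
    int wrapped = coord & (size - 1);
    return (coord & size) ? (size - 1 - wrapped) : wrapped;
}

/* Old order: mirror first, then subtract the tile base. */
static int tile_coord_mirror_first(int t, int tile_base, int mask_bits)
{
    return mirror_coord(t, mask_bits) - tile_base;
}

/* New order: subtract the tile base first, then mirror. */
static int tile_coord_base_first(int t, int tile_base, int mask_bits)
{
    return mirror_coord(t - tile_base, mask_bits);
}
```

For the ground plane’s top line (T = 160, tile base 160, mask 7), the old order yields 95 - 160 = -65, while the new order yields texel row 0.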

Before my modification to the order of coordinate adjustment, the ground plane looked like this:

After my change, the ground plane looked like this:

Unfortunately, I discovered when writing this blog post that the same change seems to affect the text in some games, so it will need further examination.  Still, I feel I’m on the right track.

Broken Mirrors

Wednesday, September 24th, 2008

A number of N64 games in MESS have suffered from a strange hall-of-mirrors effect, or assorted missing graphics.  Since I’ve had some time off from work, I decided to look into it.

As it turns out, I wasn’t properly emulating the fact that the most significant coverage bit is stored in CPU-visible memory as the least significant bit in a 16-bit RGB555 triplet.  This didn’t have much of an effect on things unless a game was copying the framebuffer contents into RAM, in which case the low bit was used as the alpha bit in a 16-bit RGB555 texture that was used in the following frame.
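A small sketch of that packing, assuming a 3-bit coverage value and the usual 5/5/5/1 layout with red in the top bits; the function names are made up for illustration:

```c
#include <stdint.h>

/* Pack a 16-bit framebuffer pixel as the CPU sees it: 5 bits each of
 * red, green, and blue, with the top coverage bit in bit 0. */
static uint16_t pack_fb_pixel16(uint8_t r5, uint8_t g5, uint8_t b5, uint8_t coverage3)
{
    return (uint16_t)(((r5 & 0x1f) << 11) |
                      ((g5 & 0x1f) << 6)  |
                      ((b5 & 0x1f) << 1)  |
                      ((coverage3 >> 2) & 1));
}

/* When the same memory is later read back as a 16-bit texture,
 * bit 0 acts as the 1-bit alpha. */
static uint8_t texel_alpha_from_pixel16(uint16_t pixel)
{
    return (pixel & 1) ? 0xff : 0x00;
}
```

So a pixel written with coverage below half (top bit clear) turns into a fully transparent texel when a game copies the framebuffer and reuses it as a texture.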

In addition, I’ve re-confirmed that in at least some cases the N64 still blends a pixel even if its alpha value is 0; this, along with the coverage bit fix, has the following effects on the games that I’ve tested:

Mario Kart 64: The full backgrounds in menus now appear, rather than the first 16 lines or so, and the jumbotron in the Luigi Raceway and Wario Speedway venues works:

Dr. Mario 64: The playfield no longer becomes all white and hall-of-mirror-y when playing:

Pokemon Puzzle League: The scrim around the playfield and HUD is now visible.

Tetrisphere: There’s no longer a hall-of-mirrors effect between the gaps in the undulating background tiles.

I have a strong feeling that in some cases the N64 does reject pixels prior to color combination and blending if they have an alpha value of 0; it just may be controlled by a mechanism I do not fully understand (the Force Blend bit in the Set Other Modes command?).