FCSC Quals 2021 - Stars:2ndMix CryptoEdition (reverse)

May 03, 2021

Challenge description

I just happened to stumble uppon a CD that had stayed in my Dreamcast.
FCSC2021 is written on it. I thought it would be useful to you. As sending
the CD through postal service would not be practical, I am giving to you
instead a DiscJuggler image of the CD. Good luck !

For this challenge, we are given a single Stars2ndMix_CryptoEdition.cdi disc image.

The tooling

Now that we have the code, let’s get to reverse-engineering it properly.

The first step is being able to do static analysis. The Dreamcast main CPU is a Hitachi SH7750 using the SuperH4 ISA. This ISA is supported by most common disassemblers such as Ghidra, IDA and r2/Cutter (but not Binary Ninja :( ). I settled on using Ghidra as it provides a decompiler for the architecture out of the box.

The second aspect is dynamic analysis. Although static analysis is fun, when analysing an unfamiliar architecture it is always preferable to be able to query and manipulate the CPU state. Unfortunately, most emulator projects do not seem to provide usable debugging interfaces out of the box, which is a true pain for reverse-engineering. Some of them, such as Redream, do provide a gdb stub for debugging.. but not clearly documented. Others, such as Reicast, do provide a separate debugger … but without some features (the cpu debugging is not implemented in reidbg).

In the end I settled on using Reicast because it is open source and easily buildable. By adding custom code in the CPU emulation loop , we can get complete control over the CPU and its state. It can be used to produce execution traces or dump some memory regions, which can come in handy as you will see later on.

Getting the code

The first step is to extract the game code from the cdi file. To do this I used the GDRom Explorer tool, which can be executed with wine:

GDROM Explorer

The 1ST_READ.BIN file contains the game code. However it cannot be directly analysed. The Dreamcast implements a software protection measure called scrambling. The 1ST_READ.BIN file we extracted is completly scrambled and needs to be unscrambled before analysis (more details can be found on Fabien Sanglard’s blog here). To unscramble the game we can use the scramble tool in the KallistiOS os repository here:

https://github.com/KallistiOS/KallistiOS/tree/master/utils/scramble

Simply give the 1ST_READ.BIN to the scramble program as follows to get an unscrambled version:

$ scramble -d 1ST_READ.BIN unscrambled.bin

Analysis of the game binary

Loading the binary

Now that we have a correct binary we can start analysing it. If we follow the documentation about the Dreamcast boot process, we known that the base address for the binary is 0x8c010000. Let’s load it in Ghidra:

Ghidra Load options

Then we can launch the Ghidra analysis and wait for it to complete (note: you may have to stop it midway as it tends to get stuck on some big function).

Debugging: Ghetto mode

Reicast does not have any CPU debugger. But as it is open source we can quickly build our own. The emulator uses dynamic recompilation by default, the first step is disabling this (as it is not easily hookable). To do this we check “interpreter” for everything in the “Advanced” option tab:

disable_dynarec

The CPU interpreter loop is inside the libswirl/hw/sh4/interpr/sh4_interpreter.cpp file. An easily hookable place is the ExecuteOpcode function. Printing the state for a select range of instruction is as easy as follows:

// Helper code
void dump_context()
{
    std::printf("0x%08x", curr_pc);

    for (unsigned i = 0; i < 16; i++)
        std::printf(" r%u = 0x%08x", i, r[i]);

    std::printf("\n");
}

void ExecuteOpcode(u16 op)
{
	if (sr.FD == 1 && OpDesc[op]->IsFloatingPoint())
		RaiseFPUDisableException();

	// Actual hook
    	if ((curr_pc >= <low bound> && curr_pc < <high bound>))
    	    dump_context();

	OpPtr[op](op);
}

Try to trace as little as possible or you may encounter some issues …

vim_trace

Finding the target

A good way to find the checking code is to look for the classic “good boy” and “bad boy” basic blocks. As per usual, we look for what should be printed on the screen after the check. For this game we look for the “YOU FAILED” string:

you_failed_screen

Ghidra quickly finds a reference.

you_failed_ghidra

This string is referenced in the check_function at address 0x8c010d90.

failed_xref

The checking function is quite big and the decompilation pretty cryptic. By looking around we can find a few interesting pieces of code. The first one starts near the beginning of the functions.

interesting_ops

All around the function, the counter variable is used. The fact that our input is exactly 16 bytes and the operations used by this piece of code make it pretty suspicious. We can check that this manipulates our input by taking an execution trace of the code between addresses 0x8c010dae and 8c010de8. We use as an input AAAABBBBCCCCDDDD to get something recognisable. From the following (simplified) execution trace we retrieve our input in the r5 register:

0x8c010db8 r0 = 0x00000000 [...] r5 = 0x00000041 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000001 [...] r5 = 0x00000041 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000002 [...] r5 = 0x00000041 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000003 [...] r5 = 0x00000041 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000004 [...] r5 = 0x00000042 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000005 [...] r5 = 0x00000042 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000006 [...] r5 = 0x00000042 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000007 [...] r5 = 0x00000042 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000008 [...] r5 = 0x00000043 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x00000009 [...] r5 = 0x00000043 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x0000000a [...] r5 = 0x00000043 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x0000000b [...] r5 = 0x00000043 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x0000000c [...] r5 = 0x00000044 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x0000000d [...] r5 = 0x00000044 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x0000000e [...] r5 = 0x00000044 r6 = 0x8c07e25d [...]
0x8c010db8 r0 = 0x0000000f [...] r5 = 0x00000044 r6 = 0x8c07e25d [...]

The instruction just before this address is:

8c010db6 4c 05         mov.b     @(r0,r4)=>DAT_8c07e128,r5

This corresponds to the line 58 in the previous screenshot. At line 68, the input byte is xored with a combination of other variables. By looking at the registers during the xor operation, we can get the key our input is xored with. We “breakpoint” at the following instruction:

8c010de0 1a 25         xor       r1,r5

And get the following trace:

0x8c010de0 [...] r1 = 0xffffffff [...] r5 = 0x00000041 [...]
0x8c010de0 [...] r1 = 0xffffffd7 [...] r5 = 0x00000041 [...]
0x8c010de0 [...] r1 = 0xffffffa4 [...] r5 = 0x00000041 [...]
0x8c010de0 [...] r1 = 0xffffffda [...] r5 = 0x00000041 [...]
0x8c010de0 [...] r1 = 0xfffffff6 [...] r5 = 0x00000042 [...]
0x8c010de0 [...] r1 = 0xffffffaa [...] r5 = 0x00000042 [...]
0x8c010de0 [...] r1 = 0x00000011 [...] r5 = 0x00000042 [...]
0x8c010de0 [...] r1 = 0x00000037 [...] r5 = 0x00000042 [...]
0x8c010de0 [...] r1 = 0xfffffffd [...] r5 = 0x00000043 [...]
0x8c010de0 [...] r1 = 0xffffffc3 [...] r5 = 0x00000043 [...]
0x8c010de0 [...] r1 = 0xffffffda [...] r5 = 0x00000043 [...]
0x8c010de0 [...] r1 = 0x0000006c [...] r5 = 0x00000043 [...]
0x8c010de0 [...] r1 = 0xffffffa3 [...] r5 = 0x00000044 [...]
0x8c010de0 [...] r1 = 0x00000029 [...] r5 = 0x00000044 [...]
0x8c010de0 [...] r1 = 0xffffffda [...] r5 = 0x00000044 [...]
0x8c010de0 [...] r1 = 0xffffff98 [...] r5 = 0x00000044 [...]

By testing multiple times with different input, we can confirm that the key is constant and does not depend on the input. The extracted xor key is thus:

xor_key = [
        0xff, 0xd7, 0xa4, 0xda,
        0xf6, 0xaa, 0x11, 0x37,
        0xfd, 0xc3, 0xda, 0x6c,
        0xa3, 0x29, 0xda, 0x98
]

By scrolling around the function code, we encounter another very suspicious construct.

call_to_check

The function 8c0110b4 is called with a 0x10 as first argument (flag size) and the xored input array (here Ghidra displays puVar9 because it has trouble resolving the indirect calls in SH4, you can see the function address in the disassembly listing). The return value of this function is then compared against 0.

Let’s take a look at the potential flag checking function:

check_bytes_function

From a very quick glance this function seems extremely weird. The input size is stored in a mysterious structure. The encrypted input is copied into this structure (the do/while loop from line 16 to 21). Then a counter is set to 4 just before looping indefinitely on a seemingly always true condition (????).

At first I was surprised by this function because its behavior seemed nonsensical. From the execution traces, I could notice that after a while the sleep_counter would mysteriously be set to 0 and the function would exit with 0x7f. And as we saw, the hypothetical correct return value for this function should be zero.

There were two possible explanations to this behavior. The first one would be threading. From taking a look at the KallistiOS source code, we see that a threading system is implemented. This could match as the code from lines 24 to 27 closely ressembles a spinlock. I spent a lot of time digging through traces, thinking that somehow my hooks in the emulator were magically broken during IRQs (and thus task switching). In the end I ditched this hypothesis and took a step back.

By digging through the documentation and scrolling through traces I got an idea explaining the curious phenomenon. While the dreamcast relies on the SH4 cpu for common tasks, it also uses specialised hardware for others, such as the NEC PowerVR2 GPU for graphics and the Yamaha AICA for sound processing. The instruction loading the sleep_counter in the spinlock is the following:

8c0110d4 22 51         mov.l     @(r2, 0x8), r1

The register state at this address is:

0x8c0110d4 [...] r1 = 0x00000004 r2 = 0xa080ff00 [...]

The 4 in r1 is expected (as it is the value set at line 24), but the structure address in r2 is more peculiar. If we look for the constant 0xa0800000 in the KallistiOS code we find out that this address is inside the Yamaha AICA SPU’s memory (kernel/arch/dreamcast/sound/snd_iface.c).

What this piece code does is sending commands to the SPU ! From a cursory glance, the main CPU places the command id in the sleep_counter? field of the mysterious structure. The size of the argument in input_size?. Then the command argument in the flag_bytes field. It then waits for the SPU to finish processing the command. When the command reaches completion, the SPU sets the input_size? as the return value of the call, and signals the completion by setting sleep_counter? to 0, making the CPU exit its loop.

It explains why values mysteriously changed and why the CPU trace did not show anything ! All of this is possible as the SPU code/firmware is loaded by the main CPU. Anyone is free to upload whatever code they want to run on it. The SPU’s firmware can easily be dumped for analysis by placing the following code in the ExecuteOpcode function, as we did for getting the execution trace:

if (curr_pc == 0x8c0110d2)
{
    // Let's dump the spu ram
    std::vector<u8> arm_code;

    for (unsigned i = 0; i < 0x10000; i++)
        arm_code.push_back(ReadMem8(0xa0800000 + i));

    FILE* fp = fopen("spu.bin", "wb");
    fwrite(arm_code.data(), 1, arm_code.size(), fp);
    fclose(fp);
}

Analysis of the SPU’s firmware

The firmware can be loaded using the Open with options... of Binary Ninja (I used the SPU memory mapping as the base, but I am unsure about this, oh well …):

spu_open_binja

Now that binja is analysing. We need a way of isolating which piece of code loads the encrypted flag bytes. Remember when I said “don’t trace everything” ? Well this is exactly what I did. In the ARM emulation code inside libswirl/hw/arm7/arm7.cpp, I added the following code to the ARM7Backend::Step function:

u32 DYNACALL ARM7Backend::Step(Arm7Context* ctx) {
	u32 clockTicks = 0;

	if (reg[INTR_PEND].I)
	{
		CPUFiq(ctx);
	}
	
	/// start logging code
    	std::printf("ARM:0x%08x ", reg[15]);

    	for (unsigned i = 0; i < 16; i++)
        	std::printf("r%u = 0x%08x ", i, reg[i]);

    	std::printf("\n");
	/// end logging code

	reg[15].I = armNextPC + 8;

	#include "gpl/vba-arm/arm-new.h"

	return clockTicks;
}

This modification almost makes the emulator grind to a halt. The goal is to reach the “YOU FAILED” as quickly as possible by mashing the touchpad. Once the screen is reached we get the following wonderful trace:

$ ls -lh full_trace.txt
-rw-r--r-- 1 sideway sideway 7.2G May  2 23:05 full_trace.txt

From looking at the last few lines of the trace, we can get a few information. The ARM cpu sees the memory at 0xa0800000 (from the main CPU PoV) as if it were 0. We know that the structure containing the main CPU command lies at 0xa080ff00. As the input buffer is at offset 0x10 from the beginning of this structure, we can guess that from the SPU’s PoV the encrypted flag bytes start at address 0x0000ff10.

Trying to grep the trace by relying solely on this information will give too many results. To narrow down on the wanted piece of code we can use the fact that we know the encrypted bytes for a given input. If the input is all zeroes, we get 0x30 ^ 0xff = 0xcf for the first byte, 0x30 ^ 0xd7 = 0xe7 for the second, so on and so forth.

$ grep "0x0000ff10 " full_trace.txt | grep "cf " # Grep for first flag byte address + encrypted value of first 0
ARM:0x00001fec [...] r2 = 0x0000ff10 [...] r14 = 0x000000cf
ARM:0x00001ff0 [...] r2 = 0x0000ff10 [...] r14 = 0x000000cf
ARM:0x00001ff4 [...] r2 = 0x0000ff10 [...] r14 = 0x000000cf
ARM:0x00001ff8 [...] r2 = 0x0000ff10 [...] r14 = 0x000000cf
ARM:0x00001ffc [...] r2 = 0x0000ff10 [...] r14 = 0x000000cf
ARM:0x00001fe8 [...] r2 = 0x0000ff10 [...] r14 = 0x000000cf

$ grep "0x0000ff11 " full_trace.txt | grep "e7 " # Grep for second flag byte address + encrypted value of second 0
ARM:0x00001fec [...] r2 = 0x0000ff11 [...] r14 = 0x000000e7
ARM:0x00001ff0 [...] r2 = 0x0000ff11 [...] r14 = 0x000000e7
ARM:0x00001ff4 [...] r2 = 0x0000ff11 [...] r14 = 0x000000e7
ARM:0x00001ff8 [...] r2 = 0x0000ff11 [...] r14 = 0x000000e7
ARM:0x00001ffc [...] r2 = 0x0000ff11 [...] r14 = 0x000000e7
ARM:0x00001fe8 [...] r2 = 0x0000ff11 [...] r14 = 0x000000e7

We quickly notice that the first two bytes are loaded from the same piece of code and nowhere else.

flag_checking_arm

We know from the input that r14 contains the encrypted flag byte. It means that the instruction at (1) is the one loading the byte (lr is an alias for r14 in ARM). The instruction (2) loads a byte from somewhere. In the last block of instructions (3), the flag byte is xored with the value from the unknown array and the result is orred with r0. This series of instruction is repeated for each input byte.

r0 in ARM contains the return value of a function. Let’s see what this function returns:

$ grep "ARM:0x00002000" full_trace.txt
ARM:0x00002000 r0 = 0x0000007f [...] 

This function returns 0x7f, which is exactly the return value we got on the main CPU side ! To get 0 as a result we need to make sure that the xored bytes from our flag are equal to the ones coming from the unknown array. Fortunately, we can juste grep for them in the trace:

$ grep "ARM:0x00001ff0" full_trace.txt
ARM:0x00001ff0 [...] r3 = 0x000000cd [...]
ARM:0x00001ff0 [...] r3 = 0x000000e3 [...]
ARM:0x00001ff0 [...] r3 = 0x000000e5 [...]
ARM:0x00001ff0 [...] r3 = 0x000000eb [...]
ARM:0x00001ff0 [...] r3 = 0x000000b7 [...]
ARM:0x00001ff0 [...] r3 = 0x00000092 [...]
ARM:0x00001ff0 [...] r3 = 0x00000025 [...]
ARM:0x00001ff0 [...] r3 = 0x00000005 [...]
ARM:0x00001ff0 [...] r3 = 0x000000cd [...]
ARM:0x00001ff0 [...] r3 = 0x000000f1 [...]
ARM:0x00001ff0 [...] r3 = 0x000000ea [...]
ARM:0x00001ff0 [...] r3 = 0x0000005c [...]
ARM:0x00001ff0 [...] r3 = 0x00000090 [...]
ARM:0x00001ff0 [...] r3 = 0x0000006a [...]
ARM:0x00001ff0 [...] r3 = 0x000000e9 [...]
ARM:0x00001ff0 [...] r3 = 0x000000af [...]

By xoring the expected bytes with the constant xor key we get the flag:

expected_bytes = [
    0xcd, 0xe3, 0xe5, 0xeb,
    0xb7, 0x92, 0x25, 0x05,
    0xcd, 0xf1, 0xea, 0x5c,
    0x90, 0x6a, 0xe9, 0xaf,
]

xor_key = [
    0xff, 0xd7, 0xa4, 0xda,
    0xf6, 0xaa, 0x11, 0x37,
    0xfd, 0xc3, 0xda, 0x6c,
    0xa3, 0x29, 0xda, 0x98
]

print("FCSC{" + "".join(chr(a ^ b) for a, b in zip(expected_bytes, xor_key)) + "}")

This prints FCSC{24A1A84202003C37}. And of course, we can enter it inside the game to get the success message:

success

Afterword

Having done some reverse-engineering on retro consoles previously (GBA), I was really happy to be able to play around with a new architecture. These old platforms really force you to go out of your comfort zone and make you go back to the basics. Not having your usual tools forces you to get creative and explore other reverse-engineering approaches (like using execution traces). It was also a good idea to introduce this asynchronous behavior between the main CPU and SPU. All too often reverse-engineering challenges are too serial in nature. Having some kind of multithreading was really refreshing to see !

Overall, I really liked this challenge (apart from setting up a good debugging environment, which was a PITA) and would like to see more exotic architectures in the future !