Challenge description
Votre mission est de trouver un moyen d'extraire les secrets de plusieurs binaires packés dans un temps imparti.
nc challenges.france-cybersecurity-challenge.fr 2202
The attached file is a python pwntools script, which serves as a skeleton for
the solving script:
#!/usr/bin/env python3
import subprocess
from pwn import *
HOST = args.HOST or "challenges.france-cybersecurity-challenge.fr"
PORT = args.PORT or 2202
def main():
r = remote(HOST, PORT)
_ = r.recvlines(5)
cmdline = r.recvline().strip().decode("utf-8").split(" ")
assert cmdline[0] == "hashcash"
assert cmdline[1] == "-mb26"
assert cmdline[2].isalnum()
log.info(f"Solving PoW")
solution = subprocess.check_output([cmdline[0], cmdline[1], cmdline[2]])
log.success(f"Solved PoW: {solution.decode()}")
r.sendline(solution.strip())
_ = r.recvline()
encoded = r.recvline().strip()
binary = b64d(encoded)
with open("/tmp/hyperpacker.bin", "wb") as fp:
fp.write(binary)
log.success("Binary written at /tmp/hyperpacker.bin")
# Solve the challenge here
# ...
if __name__ == "__main__":
main()
The binaries
Executing the python code downloads a binary, which is a PE file.
$ file /tmp/hyperpacker.bin
hyperpacker.bin: PE32+ executable (console) x86-64, for MS Windows
If we execute it with wine, it takes some time and then a message box pops up
with the secret.
So far so good, easy enough. However, after downloading and executing other
binaries we can notice a few things. The binaries are completely identical
in structure (same addresses for functions or global variables) and only
some thing change, such as the big encrypted buffer or constants in certain
functions. Some binaries also fail to unpack as their execution is way too
long. This looks like an optimization problem !
Code analysis
Let’s load the binary in a disassembler and start at the entrypoint.
The entrypoint of calls a function which returns a pointer to code, and if not
null jumps to it.
This function first calls a function with a pointer to a huge buffer in the data
section. We can assume this is the packed code. The other function takes the
same buffer and seems to return the address of the entrypoint of the unpacked
code. Note that if the unpacking fails, the return value is the address of start.
Let’s take a look at the sanity checks in sub_1400973f0.
Sanity check
The checking function first checks that the header is valid, then the other
functions are used to VirtualProtect the code, compute the entrypoint and
return it.
The header check simply compares the first two bytes of the unpacked code
with the word 0x5a4d, which in ascii iz MZ
. This is the same magic bytes
by which a PE file starts !
Now we know the form of the unpacked code. We can move on to the actual unpacking
algorithm itself.
Unpacking
As we guessed that this challenge seems to be an optimization one, we need to
see where the program spends most of its time. This can be done by using perf.
$ perf record wine /tmp/hyperpacker.bin && perf report
On my test binary, the most called function sits à 12.57% of the CPU time and
is at address 0x140097358.
This function does a simple check on an input byte array, doing a logical and
with an input byte and the inverse of a byte in the second buffer. Let’s keep
this aside for now and look at this function’s parent, which happens to be
the function called before the checking code).
In (1) we have the long function called with a stack buffer and a global byte
array. In (2) we have another long mystery function. In (3) is a function
wrapping the checking code we have seen earlier. Its return value is used as a
check to stop the big loop and enter the actual unpacking process. In (4), the
output area, the original packed area as well as the stack buf are passed to the
function doing the decryption.
It simply computes a xor key using the stack_buf passed as an argument and
decrypts the packed code into the new buffer. The actual key generation uses a lot
of aes intrinsics and you don’t want to reverse that, so let’s just not look into
it.
From here the issue is pretty clear. The computation of the stack buffer in the
big loop of weird_loop_stuff, from which the xor key is derived, is too slow.
We need to find a way to speed it up.
Going fast
The second long function takes the stack buf and iterates from a certain offset,
here 0x7 (it changes from binary to binary). It increments the value at this offset,
and if it is different from 0xff it returns immediately. Otherwise, it sets the
current byte to 0, increments the offset and starts the loop again.
This may seem mystical at first, but what it does is increment some kind of pseudo
bignum inside of stack_buf. Calling the function a few times on a number with an
offset of 2 would look like this:
00 00 fe 02 00 00 00 00
00 00 ff 02 00 00 00 00
00 00 00 03 00 00 00 00
00 00 01 03 00 00 00 00
...
The stack buf is then compared in the other long function to a global buffer.
In the binary used for this test the value in the buffer is:
00 00 00 00 00 00 00 8a 00 00 98 00 00 00 00 00
We can already notice that the offset at which the incrementation starts in
long_function_before is the same as the first non null byte of this
global constant. To sum up, here is the stack_buf generation algorithm in
pseudo code:
while True:
if check_decrypt(packed_code, stack_buf):
break
while True:
increment(stack_buf)
if check_constant_and(stack_buf, constant):
break
compute_xor_key_and_decrypt(..., stack_buf)
The issue with this algorithm is that if the constant bignum is big enough,
the incrementation will take forever. To optimize this I chose to patch the
binary and replace the incrementation function by a function doing the
incrementation only at the same offsets as the non null bytes of the global
constant bignum.
uint8_t next(uint8_t cur_val, uint8_t target)
{
for (uint16_t new_val = cur_val + 1; new_val < 0x100; new_val++)
{
if ((~target & new_val) == 0)
return new_val;
}
return 0;
}
void advance(uint8_t* state, uint8_t* final_state)
{
for (unsigned i = 0; i < 16; i++)
{
if (final_state[i] == 0)
continue;
uint8_t next_val = next(state[i], final_state[i]);
state[i] = next_val;
if (next_val)
return;
// Wrap around, we continue
}
}
I compiled this code in -O3
using clang and overwrote the long_function
code at 0x140097358 with it. Now, as the optimized method only generates
valid pseudo bignums, the checking part in main can be replaced as well.
To do this I compiled and patched the code from 0x1400972b1 with the
following shellcode:
[BITS 64]
section .text
; To inject at 0x1400972b1
_start:
push rdi
push rsi
; pointer to stack_buf
lea rdi, [rbp-0x18]
; pointer to global constant
mov rsi, 0x140096a80
; address of long_function (patched with optimized shellcode)
mov rax, 0x140097358
call rax
pop rdi
pop rsi
; jump back to loop start
mov rax, 0x140097277
jmp rax
With these small patches, all binaries, even the longer ones finished
unpacking in only seconds ! However, in some rare instances where the bignum
was too big I would get a segfault when jumping to the unpacked code.
A few long hours were spent debugging. Somehow the unpacking succeeded, but
the output was complete garbage. I started to doubt my implementation and
spent way too much time starring at the same 5 lines of code, thinking
something was at fault. I even attempted to try my luck with the remote
service, hoping it would only send me a long string of non crashing inputs.
After a while everything clicked.
Let’s put the checking function back into focus.
Notice how it only compares the first two bytes with ‘MZ’ ? On bigger
bignums a lot of xor key would be generated, which also means that it
increases the risk of collision. To only match on a correct unpacking,
I patched this function with the following code, checking the first
8 bytes instead of only 2:
[BITS 64]
section .text
; To inject at 0x1400972b1
_start:
mov rax, 0x0000000300905a4d
cmp rax, [rcx]
jne _fail
xor eax, eax
inc eax
ret
_fail:
xor rax, rax
ret
This fixed all issues and all binaries could be successfuly unpacked ! All
that remains is to programmatically recover the secret.
Recovering the secret
The secret is being displayed in a message box, which is pretty troublesome
for scripting. The exact api call for displaying the message box can be
found through tracing wine:
$ WINEDEBUG=+all wine hyperpacker-1.bin >log.txt 2>&1
$ grep SECRET log.txt
18890.724:00d4:00d8:Call user32.MessageBoxA(00000000,140004018 "SECRET is : QDJQD95V41XV2COIZY4L8WDO5PMGN2F8",140004000 "Take your secret !",00000000) ret=14000158a
MessageBoxA receives as a second argument the string to be printed. I
thus chose to patch the user32.dll in /usr/lib/wine/x86_64-windows
with the following shellcode to print the flag on stdout and exit:
[BITS 64]
section .text
_start:
; write
xor rax, rax
inc rax ; syscall write (1)
mov rdi, rax ; fd = 1 (stdout)
mov rsi, rdx ; the flag is in rdx because of the Windows ABI
mov rdx, 0x2d
syscall
; exit
xor rdi, rdi
mov rax, 0x3c
syscall
The secret can now simply be recovered by reading stdout !
$ wine /tmp/hyperpacker.bin
SECRET is : QDJQD95V41XV2COIZY4L8WDO5PMGN2F8%
Solving
Here is the final solving script:
#!/usr/bin/env python3
import subprocess
import time
from pwn import *
HOST = args.HOST or "challenges.france-cybersecurity-challenge.fr"
PORT = args.PORT or 2202
def solve():
shellcode = [
0x57, 0x56, 0x48, 0x8d, 0x7d, 0xe8, 0x48, 0xbe, 0x80, 0x6a,
0x09, 0x40, 0x01, 0x00, 0x00, 0x00, 0x48, 0xb8, 0x58, 0x73,
0x09, 0x40, 0x01, 0x00, 0x00, 0x00, 0xff, 0xd0, 0x5f, 0x5e,
0x48, 0xb8, 0x77, 0x72, 0x09, 0x40, 0x01, 0x00, 0x00, 0x00,
0xff, 0xe0
]
improved_fn_shellcode = [
0x4c, 0x8d, 0x4f, 0x10, 0xeb, 0x17, 0x66, 0x2e, 0x0f, 0x1f,
0x84, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x83, 0xc7, 0x01,
0x48, 0x83, 0xc6, 0x01, 0x49, 0x39, 0xf9, 0x74, 0x4f, 0x0f,
0xb6, 0x16, 0x84, 0xd2, 0x74, 0xec, 0x0f, 0xb6, 0x07, 0x44,
0x8d, 0x40, 0x01, 0x66, 0x3d, 0xff, 0x00, 0x74, 0x2b, 0xf7,
0xd2, 0x41, 0x0f, 0xb7, 0xc8, 0x85, 0xd1, 0x74, 0x35, 0x83,
0xc0, 0x02, 0x0f, 0xb7, 0xc0, 0xeb, 0x10, 0x0f, 0x1f, 0x44,
0x00, 0x00, 0x89, 0xc1, 0x83, 0xc0, 0x01, 0x21, 0xd1, 0x85,
0xc9, 0x74, 0x1d, 0x41, 0x89, 0xc0, 0x66, 0x3d, 0x00, 0x01,
0x75, 0xec, 0xc6, 0x07, 0x00, 0x48, 0x83, 0xc7, 0x01, 0x48,
0x83, 0xc6, 0x01, 0x49, 0x39, 0xf9, 0x75, 0xb1, 0xc3, 0x0f,
0x1f, 0x00, 0x44, 0x88, 0x07, 0xc3
]
check_shellcode = [
0x48, 0xb8, 0x4d, 0x5a, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,
0x48, 0x3b, 0x01, 0x75, 0x05, 0x31, 0xc0, 0xff, 0xc0, 0xc3,
0x48, 0x31, 0xc0, 0xc3
]
fd = open("/tmp/hyperpacker.bin", "r+b")
fd.seek(0x000492b1)
fd.write(bytearray(shellcode))
fd.seek(0x000499b9)
fd.write(bytearray(check_shellcode))
fd.seek(0x00049358)
fd.write(bytearray(improved_fn_shellcode))
fd.close()
result = subprocess.run(["wine", "/tmp/hyperpacker.bin"], capture_output=True)
assert result.returncode == 0
res = result.stdout.decode("UTF-8").strip()
return res[len("SECRET is : "):].replace("\x00", "")
def main():
r = remote(HOST, PORT)
_ = r.recvlines(5)
cmdline = r.recvline().strip().decode("utf-8").split(" ")
assert cmdline[0] == "hashcash"
assert cmdline[1] == "-mb26"
assert cmdline[2].isalnum()
log.info(f"Solving PoW")
solution = subprocess.check_output([cmdline[0], cmdline[1], cmdline[2]])
log.success(f"Solved PoW: {solution.decode()}")
r.sendline(solution.strip())
while True:
x = r.recvline()
if b"Here is your binary:" not in x:
log.info("Message:")
print(x)
r.interactive()
break
encoded = r.recvline().strip()
binary = b64d(encoded)
with open("/tmp/hyperpacker.bin", "wb") as fp:
fp.write(binary)
with open("/tmp/hyperpacker.bin.bak", "wb") as fp:
fp.write(binary)
log.success("Binary written at /tmp/hyperpacker.bin")
# Solve the challenge here
# ...
_ = r.recv()
log.info("Solving ...")
secret = solve()
r.sendline(secret.encode("UTF-8"))
log.info(f"Solved: {secret}")
_ = r.recvline()
if __name__ == "__main__":
main()
Running the script gives us the following flag:
FCSC{2b60aef3d241d3c37c30373a9a4446017a6d5b6761ae5492e3f824e280cb8ceb}
Afterword
This challenge was one my favourite from this year. It is always really fun to
patch and repurpose binaries for your own benefit :P.