Back
Featured image of post Google CTF 2022 MIXED Writeup

Google CTF 2022 MIXED Writeup

Hello Python 3.11

This challenge gives a .pyc file and a Python 3.11 alpha7 based patch file.

Patch

This patch file mainly patches the opcode.py file. After the patch, opcode.py file first generated two random lists and modifies the opcode according to the values of the lists. Finally, it will output the new opcode to a tmp file.

+import sys
+if "generate_opcode_h" in sys.argv[0]:
+  random = __import__("random")
+  random.shuffle(perm)
+  random.shuffle(perm2)

-def_op('CACHE', 0)
-def_op('POP_TOP', 1)
-def_op('PUSH_NULL', 2)

+def_op('CACHE', perm[0])
+def_op('POP_TOP', perm[1])
+def_op('PUSH_NULL', perm[2])

+with open("/tmp/opcode_map", "w") as f:
+  for x, y in enumerate(opname):
+    f.write("%s %s\n" % (x, y))

The patch file also modifies the ceval.c file, but I don’t understand what this does.

diff --git a/Python/ceval.c b/Python/ceval.c
index 1d2c6432d0..e275158d9d 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -1344,6 +1344,8 @@ eval_frame_handle_pending(PyThreadState *tstate)
 #define OR_DTRACE_LINE
 #endif
 
+#define USE_COMPUTED_GOTOS 0
+
 #ifdef HAVE_COMPUTED_GOTOS
     #ifndef USE_COMPUTED_GOTOS
     #define USE_COMPUTED_GOTOS 1
@@ -1705,6 +1707,7 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
 #if USE_COMPUTED_GOTOS
 /* Import the static jump table */
 #include "opcode_targets.h"
+    asdasdasd, break the compilation...
 #endif
 
 #ifdef Py_STATS

Opcode

Based on the information in the patch file, we came up with two ideas. One is to solve it by cracking the random number generator, and the other is to recover all opcodes manually.

Cracking the random number generator

Since randon.shuffle is based on mt19937, cracking this requires providing 624 32bit numbers which is impossible. So we spent a lot of time on cracking the seed of the random number generator. For example, I found the file generation date of the patch file very suspicious, so I tried to use that time as random seed. We also used the sleeping hours to run the explosion program. Until we find:

random.seed(a=None, version=2)
    Initialize the random number generator.
    If a is omitted or None, the current system time is used. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).

We do not think there is any way to guess os.urandom(), so we focused on the second method.

Recover opcodes by hand

It’s difficult to disasm python 3.11 .pyc files, for the marshal and opcodes changes a lot. The newist tools I know is pycdc which still support python 3.10 only.

I started with some easy functions. For example, it is obvious that __main__ start with import sys and followd with functions defination.

0    RESUME        0
2    LOAD_CONST    0
4    LOAD_CONST    1
6    IMPORT_NAME   0 (sys)
8    STORE_NAME    0             

26   LOAD_CONST    2
28   MAKE_FUNCTION 0
30   STORE_NAME    3 (ks)

68   LOAD_CONST    1
70   RETURN_VALUE  0

Therefore we got:

code op
0x61 RESUME
0x60 LOAD_CONST
0xab IMPORT_NAME
0x96 STORE_NAME
0x31 RETURN
0x63 MAKE_FUNCTION

I also find there were lots of 20 00 in .pyc file. Which is CACHE, and we can ignore it.

Since there were bugs in dis module after I change the opcode.py, I gave it up. But we can still use dis.show_code() to unserialize marshal.

>>> import marshal, dis
>>> fd = open('x.pyc', 'rb')
>>> fd.seek(16)
16
>>> a = marshal.load(fd)
>>> dis.show_code(a)
Name:              <module>
Filename:          /usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py
Argument count:    0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  0
Stack size:        2
Flags:             0x0
Constants:
   0: 0
   1: None
   2: <code object ks at 0x101419470, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 5>
   3: <code object cry at 0x1016003f0, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 11>
   4: <code object fail at 0x101553000, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 18>
   5: <code object game1 at 0x12c704920, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 23>
   6: <code object game2 at 0x12c704cc0, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 68>
   7: <code object game3 at 0x12c7051f0, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 96>
   8: <code object main at 0x12c704f20, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 123>
Names:
   0: sys
   1: random
   2: time
   3: ks
   4: cry
   5: fail
   6: game1
   7: game2
   8: game3
   9: main
>>> dis.show_code(a.co_consts[2])
Name:              ks
Filename:          /usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py
Argument count:    1
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  1
Stack size:        4
Flags:             OPTIMIZED, NEWLOCALS, GENERATOR
Constants:
   0: None
   1: True
   2: 0
   3: 255
   4: 13
   5: 17
   6: 256
Names:
   0: random
   1: seed
   2: randint
Variable names:
   0: seed
>>>

Start with the short function will be easier. For example, it is easy to recognize BINARY_OP and LOAD_FAST in function w() which defined in function game1().

Name:              w
Filename:          /usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py
Argument count:    3
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  3
Stack size:        3
Flags:             OPTIMIZED, NEWLOCALS, NESTED
Constants:
   0: None
   1: 10
   2: 1
Variable names:
   0: m
   1: i
   2: j

0x7a2 0x7c0
0    RESUME       0
2    LOAD_FAST    0 (m)
4    LOAD_FAST    1 (i)
6    LOAD_CONST   1 (10)
8    BINARY_OP    5 (*)
12   LOAD_FAST    2 (j)
14   BINARY_OP    0 (+)
18   BINARY_OP    9 (>>)
22   LOAD_CONST   2 (1)
24   BINARY_OP    1 (&)
28   RETURN_VALUE 0

Some important information of LOAD_* from docs:

LOAD_CONST(consti)

Pushes co_consts[consti] onto the stack.

LOAD_NAME(namei)

LOAD_NAME [namei] Pushes co_names[x // 2] onto the stack.

LOAD_FAST(var_num)

Pushes a reference to the local co_varnames[var_num] onto the stack.

LOAD_METHOD(namei)

LOAD_METHOD x Pushes co_names[x] onto the stack.

By recognize print () in main():

0    RESUME        0
2    PUSH_NULL
4    LOAD_NAME     0 (print)
16   LOAD_CONST    1 ('Pass 3 tests to prove your worth!')
18   PRECALL       1
22   CALL          1
32   POP_TOP       0

We got:

code op
0xB9 LOAD_NAME
0xBF PRECALL
0x8E CALL
0x01 POP_TOP
0x50 PUSH_NULL

After recover all functions, we got:

code op
0x7b JUMP_BACKWARD
0xb6 FOR_ITER
0x1f GET_ITER
0x8d STORE_FAST
0x8f UNPACK_SEQUENCE
0x62 LOAD_FAST
0xc6 BINARY_OP (+, &, //, «, @, *, %, |, **, », -, /, ^, +=)
0x74 BUILD_LIST
0x68 COMPARE_OP (’<’, ‘<=’, ‘==’, ‘!=’, ‘>’, ‘>=’)
0xba POP_JUMP_FORWARD_IF_FALSE
0x5d JUMP_FORWARD
0x92 LOAD_METHOD
0x88 BUILD_SLICE
0xaf POP_JUMP_BACKWARD_IF_TRUE
0x82 BUILD_TUPLE
0x75 CONTAINS_OP
0xb1 BUILD_CONST_KEY_MAP

There’s still some opcode I didn’t recognize, but that doesn’t affect our decompilation anymore.

The decompilation result (by hand) was:

import sys
import time
import random

def ks(seed):
    random.seed(seed)
    while 1:
        yield (random.randint(0, 255) * 13 + 17) % 256

def cry(s, seed):
    r = []
    for x, y in zip(ks(seed), s):
        r.append(x ^ y)
    bytes(r)

def fail(s):
    print (s)
    print ("Thanks for playing!")
    sys.exit(0)

def game1():
    def w(m, i, j):
        return (m >> (i * 10 + j)) & 1
    m = 1267034045110727999721745963007
    fuel = 8
    x, y = (1, 1)
    stops = set([(5, 4), (3, 3)])
    log = ''
    print("Fuels:", fuel)
    for i in range(10):
        s = ''
        for j in range(10):
            if w(m, i, j):
                s += '🧱'
            elif (j, i) == (x, y):
                s += '🚓'
            elif (j, i) == (8, 8):
                s += '🏁'
            elif (j, i) in stops:
                s += '⛽️'
            else:
                s += '  '
        print (s)
    inp = input().strip()
    for c in inp:
        log += c
        if c not in 'wasd':
            fail('Nope!')
        elif fuel == 0:
            fail('Empty...')
        dx, dy = {'w':(0, -1), 's':(0, 1), 'a':(-1, 0), 'd':(1, 0)}[c]
        x += dx
        y += dy
        if w(m, y, x):
            fail('Crash!')
        fuel -= 1
        if (x, y) in stops:
            stops.remove((x, y))
            fuel += 15
        elif (x, y) == (8, 8):
            print ('Nice!')
            return log

def game2():
    print ('Math quiz time!')
    qs = (('sum', 12, 5), ('difference', 45, 14), ('product', 8, 9), ('ratio', 18, 6), ('remainder from division', 23, 7))
    log = '_'
    for q in qs: 
        print ('What is the %s of %d and %d?' % q)
        x, a, b = q
        if x == 'sum':
            r = a + b
        elif x == 'difference':
            r = a - b
        elif x == 'product':
            r = a * b
        elif x == 'ratio':
            r = a // b
        elif x == 'remainder from division':
            r = a % b
        else:
            print ('What?')
        inp = int(input())
        if inp == r:
            print ('Correct!')
            log += str(inp) + '_'
        else:
            print ('Wrong!')
    return log
    #_17_31_72_3_2_

def game3():
    print ('Speed typing game.')
    t = time.time()
    text = '''
  Text: Because of its performance advantage, today many language implementations
  execute a program in two phases, first compiling the source code into bytecode,
  and then passing the bytecode to the virtual machine.
  '''
    words = text.split()
    it = 1
    log = '_'
    while it != len(words):
        print ('%0.2f seconds left.' % (20 - (time.time() - t)))
        print ('\x1b[32m', ' '.join(words[:it]), '\x1b[39m ', words[it])
        inp = input()
        if time.time() > t + 20:
            fail('Too slow!')
        if inp == words[it]:
            log += words[it].upper() + '_'
            it += 1
        else:
            fail('You made a mistake!')
    print ('Nice!')
    return log
#_BECAUSE_OF_ITS_PERFORMANCE_ADVANTAGE,_TODAY_MANY_LANGUAGE_IMPLEMENTATIONS_EXECUTE_A_PROGRAM_IN_TWO_PHASES,_FIRST_COMPILING_THE_SOURCE_CODE_INTO_BYTECODE,_AND_THEN_PASSING_THE_BYTECODE_TO_THE_VIRTUAL_MACHINE._
def main():
    print('Pass 3 tests to prove your worth!')
    seed = 'seed:'
    seed += game1() + ':'
    print (seed)
    seed += game2() + ':'
    print (seed)
    seed += game3()
    print (seed)
    print ()
    print ("You can drive to work, know some maths and can type fast. You're hired!")
    print('Your sign-on bonus:', cry(b'\xa0?n\xa5\x7f)\x1f6Jvh\x95\xcc!\x1e\x95\x996a\x11\xf6OV\x88\xc1\x9f\xde\xb50\x9d\xae\x14\xde\x18YHI\xd8\xd5\x90\x8a\x181l\xb0\x16^O;]', seed).decode())

SOLVE!

import random

def ks(seed):
    random.seed(seed)
    while 1:
        yield (random.randint(0, 255) * 13 + 17) % 256

def cry(s, seed):
    r = []
    for x, y in zip(ks(seed), s):
        r.append(x ^ y)
        bytes(r)
    return r

seed = 'seed:sssddwwddwddsssdssaaawwssaaaassddddddd:_17_31_72_3_2_:_BECAUSE_OF_ITS_PERFORMANCE_ADVANTAGE,_TODAY_MANY_LANGUAGE_IMPLEMENTATIONS_EXECUTE_A_PROGRAM_IN_TWO_PHASES,_FIRST_COMPILING_THE_SOURCE_CODE_INTO_BYTECODE,_AND_THEN_PASSING_THE_BYTECODE_TO_THE_VIRTUAL_MACHINE._'
print('Your sign-on bonus:', cry(b'\xa0?n\xa5\x7f)\x1f6Jvh\x95\xcc!\x1e\x95\x996a\x11\xf6OV\x88\xc1\x9f\xde\xb50\x9d\xae\x14\xde\x18YHI\xd8\xd5\x90\x8a\x181l\xb0\x16^O;]', seed))

Afterword

After failed to solve adamd in DEFCON 30 QUALS, I finally solved a Python 3.11 challenge :D

Built with Hugo
Theme Stack designed by Jimmy