Binary Ninja is a great platform for automating some reverse engineering tasks,
especially with the headless mode available for commercial licenses. In this
post, we will use Binary Ninja to automate extracting Windows syscall numbers
from ntdll.dll.
binaryninja.open_view is a convenient wrapper function provided by Binary
Ninja to open a binary with default analysis options and returns a BinaryView
object.
The functions property on the BinaryView can be used to iterate through all
the functions within the opened binary. As the names of all syscall functions
begin with Zw, checking the name property of each function can be used to
locate all syscall functions within the binary.
import binaryninja
with binaryninja.open_view("ntdll.dll") as bv:
ssn = [
parse_syscall(func)
for func in bv.functions if func.name.startswith("Zw")
]
Syscall functions in a 64-bit ntdll.dll follow a very consistent pattern.
- The value of
RCXis moved toR10. This is because the Windows x64 calling convention passes the first argument in theRCXregister while the syscall calling convention usesR10as thesyscallinstruction clobbersRCX. EAXis set to the syscall number.- The
syscall(orint 0x2e) instruction is executed.
The following image shows the LLIL view of a typical syscall function. The
undefined branch is int 0x2e. The difference between the two branches is
not relavant to the task of extracting syscall numbers. However,
this blog post
contains an excellent explanation for the readers that are interested.
To extract the syscall number, Binary Ninja's LLIL API is used to find the
LLIL_SYSCALL instruction. Once the instruction is found, the get_reg_value
function is used to obtain the value of EAX at that point in the program's
execution as understood by Binary Ninja's data flow analysis.
def parse_syscall_64(func):
for inst in func.llil.instructions:
if inst.operation == LowLevelILOperation.LLIL_SYSCALL:
return (func.name, inst.get_reg_value("eax").value)
Syscall functions in a 32-bit ntdll.dll has a different pattern. EAX is set
to the syscall number.
A stub function is called to actually make the sysenter instruction. This is
because the syscall calling convention on 32-bit Windows expects EDX to point
to the return address for sysenter plus the arguments and the only way to
push EIP onto the stack, which is where sysenter needs to return to, is
through a call instruction.
The code required to account for this pattern is very similar to the 64-bit
version, except that we search for LLIL_CALL instead of LLIL_SYSCALL.
def parse_syscall_32(func):
for inst in func.llil.instructions:
if inst.operation == LowLevelILOperation.LLIL_CALL:
return (func.name, inst.get_reg_value("eax").value)
The arch property on the BinaryView object can be used to decide which
version of the parse function should be called.
arch = bv.arch.name
click.echo("[+] Arch: {}".format(arch))
parse_syscall = parse_syscall_32 if arch == "x86" else parse_syscall_64
The full script, parse_ntdll_ssn, can be found on the
reutils
GitHub repository and produces a JSON file containing a mapping of all
syscalls to their syscall numbers.
parse_ntdll_ssn ntdll_32.dll ssn_32.json
Function at 0x6a20d32f has HLIL condition that exceeds maximum complexity, omitting condition
[+] Arch: x86
[+] Parsing syscall numbers from ntdll_32.dll
[+] Writing results to ssn_32.json