Evolution of Return Oriented Programming

Mihir Shah
16 min read · Jan 22, 2021

Before we dig into Return Oriented Programming (ROP), let's first understand why it was needed.

Featured Image

The Need for Return Oriented Programming

Let's kick off with the most basic buffer overflow technique used to exploit machines.

The main idea in exploit development is to somehow compromise the system, and we use different methods and workarounds to get this done. This paper alternates between an attack type and the mitigation developed against it, so that the reader understands each workaround and the thinking behind every attack.

Vanilla EIP Overwrite

To understand the concept of EIP overwrite, consider the following example -

  • Suppose you are reading a book and the doorbell rings. You pause and make a note of where you have reached by noting the page number, folding the page, or using a bookmark, so that you can continue later. This is basically like saving your return address.
  • You go to the door, attend to the visitor, and then come back to reading. This is the execution of a new subroutine.
  • You then take a look at your bookmark and continue where you had stopped.

This is basically how your program's control flow works in memory. Now let's say that your mischievous little brother enters your room and moves the bookmark from its original page to a random new page (just for fun!). Your brother, in this case, is the attacker gaining control of the “EIP” (the bookmark) in the application.

Now for the technical details -

1. You might be wondering, what on earth is EIP? Well, EIP (the Extended Instruction Pointer) is a 32-bit x86 CPU register that holds the address of the next instruction to be executed by the processor. The processor will innocently execute whatever instructions EIP points to. Hence, if we as attackers can make EIP point to our malicious piece of code, the processor will innocently execute it.

2. Another thing to keep in mind is that the CPU executes machine code: raw bytes that each encode an instruction. Assembly is the low-level, human-readable notation for those instructions, and an assembler translates it to machine code almost one-to-one. Code in high-level languages, by contrast, has to be compiled (or interpreted) before the CPU can run it. So, our malicious code has to be injected as machine code, usually written in assembly first and known as shellcode.

3. Now, when you run a simple hello world program written in C, a stack is set up for it in memory. The program's instructions are not stored there; instead, each function call PUSHES its return address, saved registers, and local variables (including buffers) onto the stack. The stack is basically a data structure in which data is stored and fetched in an organized manner: Last In, First Out (LIFO).
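
As a rough illustration of how a call stack keeps track of return addresses, here is a toy Python model. The addresses and helper names (`call`, `ret`) are made up for this sketch; a real stack also holds saved registers and local variables.

```python
# Toy model of a call stack: a Python list used last-in, first-out.
call_stack = []

def call(return_address):
    """Entering a function: push the address to resume at afterwards."""
    call_stack.append(return_address)

def ret():
    """Leaving a function: pop the most recently saved address."""
    return call_stack.pop()

call(0x401010)   # main() calls a function; the resume point in main() is saved
call(0x402020)   # that function calls a helper; its resume point goes on top

first_return = ret()    # the helper returns into the function that called it
second_return = ret()   # that function returns into main()
print(hex(first_return), hex(second_return))
```

The address saved last is the first one used, which is why corrupting the value on top of the stack changes where the current function returns to.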

Refer to this diagram below to understand the working of the EIP Overwrite technique -

Saved Return Pointer Diagram

In the main function, when do_something() is called, a return pointer is saved on the stack. This pointer holds the address of the next statement in main() (do_something_else). Hence, if we are able to overwrite the Saved Return Pointer (by overflowing the foo buffer) with an address of our choosing, we gain control of that pointer and can direct execution anywhere we like, usually to our malicious payload on the stack.

If you are wondering how to write malicious code in assembly, for now we can use one of the many tools that generate a reverse shell payload (msfvenom, among others). While trying this out in a debugger, we overwrite the Saved Return Pointer with the address of a jump instruction that already exists in the program (for example, JMP ESP). When the function returns, that address is loaded into EIP. In assembly, a jump instruction transfers execution to another address, so the next instruction executed is the jump, which moves EIP to the malicious code on the stack. This malicious code, when executed, is responsible for handing us a shell :)
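
To make the overwrite concrete, here is a small Python simulation of a single stack frame. It is only a sketch: the 8-byte size of `foo`, the frame layout, and every address are assumptions chosen for illustration, and no real memory is touched.

```python
import struct

BUF_SIZE = 8  # assumed size of the local buffer `foo`

def make_frame(return_address):
    """Lay out one frame: [foo buffer][saved EBP][saved return pointer]."""
    saved = struct.pack("<II", 0xBFFFF000, return_address)
    return bytearray(BUF_SIZE) + bytearray(saved)

def unchecked_copy(frame, data):
    """Copy like strcpy(): no bounds check, so long input spills past foo."""
    frame[0:len(data)] = data

def saved_return_pointer(frame):
    """Read the 4-byte little-endian value that a `ret` would load into EIP."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]

frame = make_frame(0x08048456)      # legitimate resume point in main()
unchecked_copy(frame, b"AAAA")      # short input stays inside foo
intact = saved_return_pointer(frame)

payload = b"A" * BUF_SIZE + b"B" * 4 + struct.pack("<I", 0xDEADBEEF)
unchecked_copy(frame, payload)      # long input overruns foo
hijacked = saved_return_pointer(frame)
print(hex(intact), hex(hijacked))
```

The first copy leaves the saved return pointer alone; the second replaces it with a value chosen by whoever supplied the input.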

The mitigation to this technique -

Now, the problem, as we see, is that nothing stopped the processor from using the corrupted Saved Return Pointer. This could be prevented with a checking mechanism that runs before the function returns through the Saved Return Pointer. Hail the concept of a canary, or stack cookie. There's a real-world example in the next part which might help you understand this concept.

Stack cookie

In the context of the above example, think of it like this: before you leave to attend to the visitor at the door, you underline a peculiar word on that page. Now, even if your brother takes out the bookmark and puts it on some other page, you'd still be able to tell that something's wrong: the page does not have that underlined word, so you stop reading any further. In the memory structure, a stack cookie plays the same role as the marked word on the page of the book.

A stack cookie is placed on the stack between the local variables and the saved EBP and saved return pointer (the value later loaded into EIP). When an application starts, a program-wide master cookie (a pseudo-random number) is calculated and saved. In each protected function's prologue, a copy of this cookie is written into the stack frame. During the epilogue, the copy is compared with the program-wide master cookie. If it is different, the program concludes that corruption has occurred and terminates. This made gaining control of the EIP a little more difficult.
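
The check can be sketched in Python along the following lines. The frame layout, buffer size, and function names (`prologue`, `epilogue`) are assumptions for this illustration; real compilers emit the equivalent logic in machine code.

```python
import secrets
import struct

BUF_SIZE = 8                            # assumed size of the local buffer
MASTER_COOKIE = secrets.randbits(32)    # program-wide cookie, chosen at start-up

def prologue(return_address):
    """Build the frame: [buffer][cookie copy][saved EBP][saved return pointer]."""
    tail = struct.pack("<III", MASTER_COOKIE, 0xBFFFF000, return_address)
    return bytearray(BUF_SIZE) + bytearray(tail)

def epilogue(frame):
    """Compare the cookie copy with the master before using the return pointer."""
    cookie, _ebp, return_address = struct.unpack_from("<III", frame, BUF_SIZE)
    if cookie != MASTER_COOKIE:
        raise RuntimeError("stack cookie corrupted: terminating")
    return return_address

frame = prologue(0x08048456)
frame[0:4] = b"AAAA"                    # input that fits inside the buffer
safe_return = epilogue(frame)           # cookie intact, normal return

frame = prologue(0x08048456)
overflow = b"A" * (BUF_SIZE + 8) + struct.pack("<I", 0xDEADBEEF)
frame[0:len(overflow)] = overflow       # overruns buffer, cookie, and saved EBP
try:
    epilogue(frame)
    detected = False
except RuntimeError:
    detected = True
print(hex(safe_return), detected)
```

Because the cookie sits between the buffer and the saved return pointer, a linear overflow cannot reach the pointer without changing the cookie first.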

SEH — A Workaround

Although we can no longer simply overwrite the saved return pointer in do_something() (doing so would also clobber the stack cookie, whose value we do not know), there are other techniques that rely on jumps. The idea is that wherever the program follows a stored pointer to decide where to go next, we make it jump to our malicious shellcode instead of its typical destination.

SEH workflow

Now, in the context of the same example, suppose you are reading your book as usual and come across a word you don't understand, so you open up the dictionary to look it up. Then the doorbell rings, and this time you leave the bookmark in the dictionary. Your brother removes it from there and sets it to an arbitrary location within the book you were reading. This scenario is analogous to SEH-based attacks.

SEH stands for Structured Exception Handling. A structured exception handler is something built into a program by the compiler that will try to handle any kind of error that comes up while the program runs. If an error occurs inside the running program, the system looks at a chain of structured exception handler records (see the picture above), starting at the top. Each record has two parts: a pointer to the next record in the chain, so that the list stays linked, and a pointer to the actual exception handler.

SEH Linked List Structure

When that exception happens, the system looks at the first record in the chain, finds the pointer to the first exception handler it should try, and runs the code at that address. That code is expected to try a couple of different things to handle the error condition. If the error still exists after this code has run, the system follows the pointer to the next SEH record, hops down to that one, and tries the exception handler it points to. If the error condition still exists, it continues down the chain until it gets the whole way through. If the exception still hasn't been handled, the program will typically crash.
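
The chain walk described above behaves like a linked-list traversal. The Python sketch below models it; the class and handler names are invented for the example, and real SEH records hold raw addresses rather than Python functions.

```python
class SehRecord:
    """One record: a pointer to the next record plus a pointer to a handler."""
    def __init__(self, handler, next_record=None):
        self.handler = handler
        self.next_record = next_record

def dispatch(first_record, error):
    """Try each handler in order; return the name of the one that handles it."""
    record = first_record
    while record is not None:
        if record.handler(error):        # handler reports whether it coped
            return record.handler.__name__
        record = record.next_record      # otherwise hop to the next record
    raise RuntimeError("unhandled exception: program crashes")

def handle_divide_error(error):
    return isinstance(error, ZeroDivisionError)

def handle_anything(error):
    return True

chain = SehRecord(handle_divide_error, SehRecord(handle_anything))
print(dispatch(chain, ZeroDivisionError()))
print(dispatch(chain, KeyError()))
```

Whatever sits in the handler slot of a record is what gets run, which is exactly why that slot is an attractive target for an overwrite.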

As we overflow the foo buffer, just as in the previous scenario, the data spills past the buffer and continues down the stack. But in addition to overwriting everything else, it overwrites the handler address in the structured exception handler record, which the program uses to decide what piece of code to jump to. Just overwriting this by itself is not enough to control execution. You have to create an actual error condition to make the program look at one of these SEH records. The most common and easy way of doing this is to just keep writing bad data until you hit the end of the stack. That generates an exception, because you've tried to write past the last address of the memory you're allowed to access, creating an error condition that the structured exception handler chain must immediately try to handle.

When we control this value, it needs to get us to land back in a piece of memory that we control. Typically, in a structured exception handler overwrite, the handler pointer is set to the address of a known instruction sequence called a POP POP RET: a POP, a POP, and a Return. Without getting too into the nitty-gritty, that sequence alters the stack slightly and returns execution right back to where the nSEH (next SEH) field is, which we've also overwritten. So the handler field holds the address of existing code within the program that performs those instructions, and the nSEH field holds a short jump instruction that, depending on how much space we have above or below, jumps either backward or forward past the SEH record. Then, in our buffer overflow, we just put our payload shellcode in that particular area.

It is completely fine if you didn't understand that. Refer to this diagram to get a visual understanding of how it works.

ROP Gadgets workflow

By overwriting the SEH handler and then intentionally creating an error condition, we make the program jump to the POP POP RET sequence, which returns execution to the nSEH field, from which we jump into the shellcode we've placed in memory. Because we do it this way, we alter the execution flow before ever exiting the do_something function, so the stack cookie never gets evaluated. The program never performs the check that would say the stack cookie has been corrupted and something bad is happening. This is how we overcame stack cookies as a protection when writing buffer overflow exploits.
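
The byte layout of such an overwrite can be sketched in Python as follows. Everything specific here is an assumption for illustration: the 64-byte offset, the gadget address, and the placeholder bytes standing in for a payload are not taken from any real target.

```python
import struct

OFFSET_TO_NSEH = 64          # assumed distance from the buffer start to nSEH
POP_POP_RET = 0x10015A2B     # assumed address of a POP POP RET sequence
SHELLCODE = b"\xCC" * 16     # placeholder bytes (INT3), not a real payload

# JMP SHORT +6 (EB 06) plus two NOPs fills the 4-byte nSEH field. The jump
# skips the two NOPs and the 4-byte handler field, landing on the payload.
nseh = b"\xEB\x06\x90\x90"
seh = struct.pack("<I", POP_POP_RET)   # handler field, little-endian

buffer = b"A" * OFFSET_TO_NSEH + nseh + seh + SHELLCODE

jump_from = OFFSET_TO_NSEH + 2    # offset just past the 2-byte jump instruction
landing = jump_from + nseh[1]     # a short jump is relative to that point
print(landing, buffer[landing:landing + 4].hex())
```

The landing offset works out to the first byte after the handler field, which is where the placeholder payload begins.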

Mitigation Technique-

SafeSEH

Windows introduced the SafeSEH protection mechanism in which validated exception handlers are registered and stored in a table. The addresses in this table are checked prior to executing a given exception handler to ensure it is deemed “safe”. As a result, a POP+POP+RET address used to overwrite an SEH record that comes from a module compiled with SafeSEH will not appear in the table and the SEH exploit will fail.

SafeSEH is effective at preventing SEH-based exploits as long as the SEH overwrite address (e.g. POP+POP+RET) comes from a module compiled with SafeSEH. The good news (from an exploitability perspective) is that application modules are not typically compiled with SafeSEH by default. Even if most are, any module loaded by the application that was not compiled with SafeSEH can be used for your SEH overwrite, because the protection is only active for modules that are built with the /SAFESEH linker option.
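
Conceptually, the check is a table lookup performed before a handler is run. The Python sketch below models it with made-up addresses and a hypothetical `dispatch_exception` helper; the real validation is done by the operating system's exception dispatcher.

```python
SAFE_HANDLER_TABLE = {0x00401500, 0x00401620}   # assumed registered handlers

def dispatch_exception(handler_address, module_has_safeseh=True):
    """Refuse handlers from a SafeSEH module unless they are in its table."""
    if module_has_safeseh and handler_address not in SAFE_HANDLER_TABLE:
        return "blocked"
    return "executed"

print(dispatch_exception(0x00401500))                            # registered
print(dispatch_exception(0x10015A2B))                            # not in table
print(dispatch_exception(0x10015A2B, module_has_safeseh=False))  # no table
```

The last call shows the gap described above: an address inside a module built without SafeSEH has no table to be checked against.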

OS Level Protection

DEP — Data Execution Prevention

Think of it this way: you finally get so frustrated with your younger brother that whenever you have to leave your room, you put the book inside the cupboard and lock it. In doing so, you limit the reach of your brother. He has no access to your cupboard, and hence your book remains protected.

This technique is based on the idea of treating the contents of the stack as non-executable, i.e. the processor refuses to run anything stored there as instructions. We can distinguish data from code by thinking of code as the data that gets executed. Hence, similar to file permissions in the Linux operating system (read, write, execute), memory regions such as the stack carry executable or non-executable permissions. If DEP is enabled on a system, data stored on the stack cannot be executed. Even if we somehow manage to inject our shellcode into the stack, it won't matter, because the processor will raise an access violation instead of executing it. This is enforced in hardware by something known as the NX (No-Execute) bit. When it is set for the stack's memory pages, the stack becomes non-executable.
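
A minimal Python model of the permission check might look like this. The `Region` class and `execute` function are invented for the sketch; on real hardware the check happens in the CPU's memory management unit on every instruction fetch.

```python
class Region:
    """A memory region with simplified write and execute permissions."""
    def __init__(self, name, writable, executable):
        self.name = name
        self.writable = writable
        self.executable = executable
        self.data = b""

def execute(region):
    """Model an instruction fetch: allowed only from executable regions."""
    if not region.executable:
        raise PermissionError(f"access violation: {region.name} is not executable")
    return f"running {len(region.data)} bytes from {region.name}"

code = Region(".text", writable=False, executable=True)
stack = Region("stack", writable=True, executable=False)
stack.data = b"\xCC" * 16          # injected placeholder bytes

print(execute(code))
try:
    execute(stack)
except PermissionError as error:
    print(error)
```

Writing to the stack still succeeds, which is why injection remains possible; only the attempt to run those bytes is refused.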

Return Oriented Programming

Now, let's say that your brother somehow gets to know that you keep your book inside the cupboard, and he asks your mom for a spare key for some reason. Your mom innocently gives it to him, which makes the whole process of locking the book inside the cupboard useless. Keeping the same analogy in mind, let's understand how exactly ROP (Return Oriented Programming) works.

With DEP in place, we cannot execute any arbitrary code that we have injected into the stack, so our main concern becomes somehow gaining executable permissions for the stack. To do that, we try to call one of the Windows APIs that can grant those permissions. Windows APIs are the way we interact with the Windows kernel; they are functions exported by certain DLLs. These DLLs (Dynamic-Link Libraries) are loaded by the application while it runs, hence the term dynamic, and they make the Windows API calls available as and when the application requires them. To call any of these APIs, we need to make the EIP point to its address, which can be done by jumping or returning to that address.

This is done by hunting down the address of the API we wish to call inside its DLL. One such Windows API is the VirtualProtect function. It is accessible through kernel32.dll, which is loaded every time any application runs in Windows. The Linux counterpart of this approach is return-to-libc, where library functions are called in a similar way, for example to make the stack executable.

So essentially how this starts to work is that you have your shellcode like normal. But where you replace either your saved return pointer or your SEH record, whatever it happens to be that you're using to control execution, you point to an ROP gadget. In the structured exception handler overwrite section we talked briefly about the POP POP RET sequence, which is actually a set of three instructions; that's an ROP gadget. A gadget is a series of instructions concluded by a ret (return) instruction. And what the return instruction always does is return to whatever pointer is next on the stack (look at the figure represented below).

So what we'll typically do is make our first pointer point to some kind of ROP gadget that adjusts the stack pointer, held in the register ESP; we don't have to worry too much about the specifics. What it basically does is move the place where we're writing stuff on the stack further down, giving us some room to work. Then comes a return, so after that first pointer we need another pointer to another ROP gadget that does something.

In the basic ROP methodology, every set of instructions ends with a “return” that goes to the next pointer, which points to another ROP gadget. What you're actually trying to do is lay out, on the new virtual stack, all of the arguments and then a pointer to the Windows function you need to call, in such a way that once your ROP chain has finished, the final return lands on the function pointer for that Windows function. The arguments you set up on the stack are then passed to VirtualProtect, VirtualAlloc, or one of a number of other functions, which effectively tells it, “Hey, this section of the stack way up here where the rest of our shellcode is, I need you to mark all of that executable now.” When that happens, the shellcode that DEP would have prevented from running is suddenly able to run.

Now, you might be wondering whether it's a tedious task to hunt for all these ROP gadgets (sequences of instructions ending with a “ret”). Well, there are many automated tools to do this. One of them is mona.py, a well-known script that runs inside Immunity Debugger. This makes the task a lot easier.
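
The chaining mechanism can be imitated with a tiny Python interpreter. This is a sketch under heavy simplification: the gadget addresses are made up, the gadgets are Python functions rather than machine code, and `fake_virtual_protect` is only a stand-in that reads four values from the stack (a real stdcall invocation would also have a return address ahead of the arguments).

```python
GADGETS = {}    # address -> gadget function
machine = {"eax": 0, "ecx": 0, "stack_executable": False}

def gadget(address):
    """Register a function as the gadget living at a made-up address."""
    def register(fn):
        GADGETS[address] = fn
        return fn
    return register

@gadget(0x1001)
def pop_eax(stack):             # pop eax ; ret
    machine["eax"] = stack.pop(0)

@gadget(0x1002)
def pop_ecx(stack):             # pop ecx ; ret
    machine["ecx"] = stack.pop(0)

@gadget(0x1003)
def add_eax_ecx(stack):         # add eax, ecx ; ret
    machine["eax"] += machine["ecx"]

@gadget(0x2000)
def fake_virtual_protect(stack):
    """Stand-in for VirtualProtect: takes its four arguments from the stack."""
    address, size, new_protect, old_protect_ptr = (stack.pop(0) for _ in range(4))
    machine["stack_executable"] = (new_protect == 0x40)  # PAGE_EXECUTE_READWRITE

def run(chain):
    """Every gadget ends in ret: pop the next address off the stack and go there."""
    stack = list(chain)
    while stack:
        GADGETS[stack.pop(0)](stack)

run([0x1001, 0x20,              # pop eax  <- 0x20
     0x1002, 0x20,              # pop ecx  <- 0x20
     0x1003,                    # eax = eax + ecx
     0x2000,                    # final "return" lands on the API stand-in
     0x0012F000, 0x200, 0x40, 0x0012E000])   # its four arguments
print(hex(machine["eax"]), machine["stack_executable"])
```

Nothing in the chain is new code: it is only a list of addresses and data, and the existing gadgets do all the work as each return pops the next entry.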

Note: ROP is not limited to Windows machines. The concept grew out of an earlier attack, return-to-libc, which works on similar principles and was first used on Unix-based machines. On Unix, just as on Windows, gadgets are sequences of instructions ending with a “ret”.

Mitigation techniques -

The idea of using ROP gadgets and the whole methodology was first described publicly in 2007, and multiple mitigation techniques have been implemented since then. At first, Microsoft's EMET (Enhanced Mitigation Experience Toolkit) was responsible for handling such attacks, but EMET reached its end of life on July 31, 2018. Its mitigation techniques were then folded into Microsoft's ATP (Advanced Threat Protection).

We as an industry can employ various techniques to detect if ROP is being employed.

1. MemProt (Memory Protection): When MemProt detects a call to VirtualProtect, it looks at how that call is being made and tries to determine whether it's a legitimate use of VirtualProtect or not. If MemProt sees that the call is very obviously being made from inside some shellcode on the stack, it just says, “Nope, not allowed to do that.”

2. LoadLibrary: it prevents a running program from making a call to load a library, one of those DLLs, or sometimes we refer to them as modules, from a file share on a remote network resource.

So say there's a DLL sitting on an SMB file share somewhere on the network, and some shellcode executing in this program says, “Hey, load this library that's on a network resource.” One reason you might do this is that you can't build a reliable ROP chain in the program itself: if you can trick it into loading a library from a network resource, you get ROP gadgets from the new library that you forced it to load.

So all the LoadLibrary check does is say, “Oh, hey, you're calling LoadLibrary on a DLL on a file share or a network resource of some kind. I'm not gonna let you do that either.”

3. ROP Caller Check: This is another guard on VirtualProtect, VirtualAlloc, and any of the other Windows API functions that allow you to change the permissions of a section of memory. It triggers when, say, VirtualProtect is called, and asks, “Oh, VirtualProtect is being called. How did we get here? Stop for a second and look back. Did we reach this call to VirtualProtect from a return instruction in the assembly?” If we did, there's a very good chance that this VirtualProtect call is actually part of an ROP chain. And again, it just says, “Nope, not allowed to do that.” This is obviously ROP, so it is not a legitimate call to VirtualProtect to change those memory permissions.

4. Stack Pivoting: Basically, every time a function is called, a new stack frame is created, and it is tracked with a pointer to the top and a pointer to the bottom of that stack frame. A stack pivot is just a set of instructions that moves the pointer to the top of the stack somewhere where we have more room, so that when our ROP gadgets are executing, all those arguments and function calls can be put safely on the stack without overwriting the rest of our code on the stack.

So, stack pivot protection is just another guard that says, “Hey, if I see an instruction that looks like it's dramatically moving ESP, the pointer to the top of the stack, to a different location, that looks suspicious, and I'm not gonna let that happen either.” This will really badly damage your ability to successfully use ROP chains to carry out exploitation.
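
Three of these checks can be approximated with very small predicates. The Python sketch below is a simplification and not how EMET implemented them: the stack bounds are made-up values, and the caller check only recognizes the direct `CALL rel32` encoding (opcode E8), ignoring indirect calls.

```python
STACK_LIMIT, STACK_BASE = 0x0012C000, 0x00130000   # assumed thread stack bounds

def loads_from_network(path):
    """LoadLibrary check: flag UNC paths, which begin with two backslashes."""
    return path.startswith("\\\\")

def reached_by_call(memory, return_address):
    """Caller check: is a 5-byte CALL rel32 sitting just before the return address?"""
    return return_address >= 5 and memory[return_address - 5] == 0xE8

def stack_pointer_pivoted(esp):
    """Stack pivot check: ESP should stay inside the thread's own stack."""
    return not (STACK_LIMIT <= esp < STACK_BASE)

# Four NOPs, then a CALL at offset 4 (so its return address is 9), then NOPs.
memory = bytes([0x90] * 4 + [0xE8, 0, 0, 0, 0] + [0x90] * 4)

print(loads_from_network(r"\\fileserver\share\helper.dll"))
print(reached_by_call(memory, 9), reached_by_call(memory, 12))
print(stack_pointer_pivoted(0x0012FF00), stack_pointer_pivoted(0x0A0B0000))
```

Each predicate looks at a different side effect of an ROP chain, which is why the mitigations are usually deployed together rather than individually.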

Note: It's essential to harden the system against ROP attacks and their variants, such as Jump Oriented Programming and String Oriented Programming, because these attacks can compromise the complete security of the system. If the system is part of a domain, the attacker may then move laterally or try to escalate privileges after infiltrating the internal network, depending on the attacker's creativity.

So these are just four mitigations. None of them by itself, or even all of them as a group, is necessarily a silver bullet for stopping the use of ROP and buffer overflow exploits, but each one can make it just a little more difficult for an exploit writer, and means that the person writing that exploit has to be a lot more expert and a lot more careful in how they develop it.

References

  • Rapid7

Also a special thanks to Riyaz Walikar for providing deep insights and suggestions for this blog

Feel free to comment below or hit me up on Twitter.
