I recently had the chance to study several code injection techniques in depth, specifically Host-Based Code Injection Attacks (HBCIAs). The term was introduced to distinguish code injection attacks that target the local system from ones that target remote systems, such as SQL injection. I have implemented 22 HBCIA techniques over the last couple of months, and the GhostWriting technique stood out to me in particular.

What is GhostWriting?

GhostWriting is an advanced code injection technique that combines thread hijacking, a write-gadget that writes to an arbitrary memory location, and an endless loop that stalls execution.

Implementation details

The endless loop and write-gadgets in our implementation are located inside ntdll.dll. This ensures that they are available in every process:

; endless loop
jmp 0x0

; 32-bit write-gadget
mov [ecx], edx
ret

; 64-bit write-gadget
mov [rbx], r14
mov rbx, [rsp + 30h]
add rsp, 20h
pop r14
retn

The following discussion of the GhostWriting technique is based on the 32-bit write-gadget. The 64-bit version is implemented analogously but has to account for the side effects of the 64-bit write-gadget. We accept those side effects because we have to avoid registers that are not set reliably when calling SetThreadContext or NtSetContextThread. This happens whenever the thread was suspended during a system call, as Sam Russel describes in his brilliant blog post.

The first step when performing GhostWriting is to write the address of the endless loop onto a fabricated stack. This is achieved with thread hijacking, as described in Algorithm 1.

[Figure: Algorithm 1 (ghostwriting_algorithm)]

When the instruction pointer of the thread equals the address of the endless loop, we know that the thread has written the address of the endless loop to the stack and is now stuck inside the loop. This is the signal that the data has been written and that we can perform the next operation. Afterwards, the ROP chain is written by repeating Algorithm 1 with a different source and destination on every invocation. We can only write 4 bytes in a 32-bit process and 8 bytes in a 64-bit process per invocation, since we are limited by the amount of data a single register can hold.
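The write step can be illustrated with a toy model that never touches a real process. The sketch below emulates only the 32-bit write-gadget on a dictionary that stands in for memory. The addresses, the `ToyThread` class and the helper names are all hypothetical and exist only for illustration; a real implementation would set and read the thread context instead.

```python
import struct

LOOP_ADDR = 0x7FFE0000     # hypothetical address of the endless loop
WRITE_GADGET = 0x7FFE1000  # hypothetical address of "mov [ecx], edx; ret"


def split_words(payload, word_size=4):
    """Zero-pad payload to a multiple of word_size and split it into little-endian words."""
    padded = payload + b"\x00" * (-len(payload) % word_size)
    fmt = "<I" if word_size == 4 else "<Q"
    return [value for (value,) in struct.iter_unpack(fmt, padded)]


class ToyThread:
    """Models only the registers and memory that the 32-bit write-gadget touches."""

    def __init__(self):
        self.regs = {"eip": 0, "esp": 0, "ecx": 0, "edx": 0}
        self.memory = {}

    def run_until_stalled(self):
        # Emulate "mov [ecx], edx; ret"; afterwards the thread would spin in the loop.
        assert self.regs["eip"] == WRITE_GADGET
        self.memory[self.regs["ecx"]] = self.regs["edx"]  # mov [ecx], edx
        self.regs["eip"] = self.memory[self.regs["esp"]]  # ret pops the return address
        self.regs["esp"] += 4


def write_word(thread, dest, value, ret_slot):
    """One invocation of the write step: set the context, resume, wait for the loop."""
    thread.regs.update(eip=WRITE_GADGET, esp=ret_slot, ecx=dest, edx=value)
    thread.run_until_stalled()
    # The instruction pointer resting on the loop address is the completion signal.
    assert thread.regs["eip"] == LOOP_ADDR


def write_buffer(thread, base, payload, ret_slot):
    """Repeat the write step once per register-sized word."""
    for index, word in enumerate(split_words(payload)):
        write_word(thread, base + 4 * index, word, ret_slot)


thread = ToyThread()
# First write: place the loop address into the very slot that "ret" will pop.
write_word(thread, 0x1000, LOOP_ADDR, ret_slot=0x1000)
# Later writes reuse that slot as the return address.
write_buffer(thread, 0x2000, b"ABCDEFG", ret_slot=0x1000)
```

Seven payload bytes need two invocations in this 32-bit model, which is why larger payloads take many suspend and resume cycles.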

Now that we have successfully set up our fake stack, we still need to make the stack executable before we can execute our payload. This is achieved by crafting a ROP chain that calls VirtualProtect to add the execute flag to the corresponding memory pages. In a 64-bit process the function follows the Microsoft x64 calling convention (a fastcall variant), so we set up its four parameters by writing the corresponding values to RCX, RDX, R8 and R9.
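Page protection applies to whole pages, so the protected region is the set of pages that contain the requested range. As a small worked example, the sketch below computes that page-aligned span. The 4 KiB page size and the helper name `covering_pages` are assumptions made for illustration and are not taken from the implementation.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def covering_pages(address, length, page_size=PAGE_SIZE):
    """Return (start, size) of the page-aligned span that contains the byte range."""
    start = address & ~(page_size - 1)                                 # round down
    end = (address + length + page_size - 1) & ~(page_size - 1)        # round up
    return start, end - start


# A 0x100-byte payload that straddles a page boundary touches two pages.
start, size = covering_pages(0x12FF80, 0x100)
```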

One challenge when implementing GhostWriting is executable memory. I saw many implementations that took shortcuts, such as only injecting into the local process or allocating remote executable memory by calling VirtualAllocEx. Our implementation, however, stays true to the concept and creates executable memory on the stack by executing a ROP chain that calls VirtualProtect to add the execute flag to the stack. A ROP chain lets us control the execution flow via the stack: each gadget performs its instructions and finishes with a RET instruction that transfers control to the next gadget in the chain, whose address is on top of the stack. Our ROP chain only uses gadgets that can be found in ntdll.dll, since every user-mode process maps ntdll. Here it is:

pop rdi; ret
; VirtualProtectAddress
pop rcx; ret
; targetAddress
pop rdx; pop r11; ret
; size
; trash r11 (gadget side effect)
pop r8; ret
; newProtection (PAGE_EXECUTE_READWRITE)
pop r9; pop r10; pop r11; ret
; oldProtection (just some pointer to writable memory)
; trash r10 (gadget side effect)
; trash r11 (gadget side effect)
push rdi; ret ; calls VirtualProtect, since we put its address into rdi earlier
; address of the written shellcode on the stack; VirtualProtect returns to this address after its call

Instead of the instructions themselves, we write the virtual addresses of these instructions inside ntdll to our fabricated stack. The values on the commented lines are also filled in at runtime.
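To see how the chain above hands its values to the registers, here is a toy interpreter that walks the same layout as a plain Python list. Every address in it is a made-up placeholder (real gadget addresses would be looked up inside ntdll at runtime), and `build_chain` and `simulate` are hypothetical helper names. Nothing here interacts with a real process.

```python
# Hypothetical gadget addresses mapped to the registers each gadget pops.
GADGETS = {
    0x1000: ["rdi"],               # pop rdi; ret
    0x1010: ["rcx"],               # pop rcx; ret
    0x1020: ["rdx", "r11"],        # pop rdx; pop r11; ret
    0x1030: ["r8"],                # pop r8; ret
    0x1040: ["r9", "r10", "r11"],  # pop r9; pop r10; pop r11; ret
}
PUSH_RDI_RET = 0x1050              # push rdi; ret
VIRTUAL_PROTECT = 0x7000           # placeholder for the function's address
PAGE_EXECUTE_READWRITE = 0x40


def build_chain(target, size, old_protect_ptr, shellcode_addr):
    """Lay out the chain in the same order as the listing above."""
    return [
        0x1000, VIRTUAL_PROTECT,
        0x1010, target,
        0x1020, size, 0,                       # 0 = trash for r11
        0x1030, PAGE_EXECUTE_READWRITE,
        0x1040, old_protect_ptr, 0, 0,         # 0, 0 = trash for r10 and r11
        PUSH_RDI_RET,
        shellcode_addr,
    ]


def simulate(chain):
    """Walk the chain the way the CPU would and report the state at the final call."""
    stack = list(chain)
    regs = {}
    rip = stack.pop(0)                 # the first gadget goes into the instruction pointer
    while rip != PUSH_RDI_RET:
        for reg in GADGETS[rip]:
            regs[reg] = stack.pop(0)   # pop <reg>
        rip = stack.pop(0)             # ret: continue with the next gadget
    # "push rdi; ret" jumps to the address in rdi and leaves the stack pointer
    # where it was, so the next stack entry acts as the callee's return address.
    call_target = regs["rdi"]
    return_address = stack.pop(0)
    return regs, call_target, return_address


regs, call_target, return_address = simulate(
    build_chain(target=0x5000, size=0x1000, old_protect_ptr=0x6000, shellcode_addr=0x5100)
)
```

In this model the four argument registers hold the target address, the size, the new protection and the old-protection pointer when control reaches the placeholder function, and the return address is the payload location.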

Next, the shellcode is written to the stack. The final step is to execute the ROP chain. This is done by suspending the thread, setting its program counter to the first gadget of the ROP chain, and setting its stack pointer to the chain's second entry on the stack (the value that the first gadget pops). When the thread is resumed, it first executes the ROP chain, which marks the stack as executable, and then executes the payload on the stack.

Demo

[Demo animation: ghostwriting]

Notice that the hijacked thread only writes our payload, which spawns a MessageBox, when we hover over the Notepad window.

Prototype implementation

My GhostWriting implementation (tested on Windows 10 Build 19043) is available on my GitHub.