
PE Format

Getting Started

Introduction

When discussing Reverse Engineering, especially on Windows, you cannot avoid the Portable Executable (PE) format. It is the file format used for executables, dynamic-link libraries, Control Panel items, and kernel modules. In order for your favorite game, browser, or even malware to run on Windows, it must use the PE format!

For this section of Reverse Engineering, we will go in depth on the Win32 PE format and why it matters for applications and malware alike!

Overview

The Portable Executable format is what lets all the file types listed above run on the Windows OS. It describes how the Windows loader should map the file into memory and where execution should start. For example, let's say you built a new C++ program, an OpenHandle program just like the one from the previous section. For this executable to run at all, it must first be loaded into memory, and the PE headers tell the loader how to do that.

However, before the final executable can be created, the compiler (MSVC) must first produce object (.obj) files, which use the Common Object File Format (COFF). The compiler needs to store its compiled code somewhere, and COFF lets it store sections such as .text and .data in the object file alongside the symbols and relocation information the linker needs. The linker then combines these object files and libraries into the final PE image, so COFF and PE work together to get an application from compilation to execution.

Also, although COFF describes how the code and data are stored, Windows can only load and run an image that is wrapped in the PE format.

COFF Headers

As with any complicated format created by Microsoft, the COFF File Header defines some basic 'metadata', as I call it, about the file that follows. In the Windows headers it is represented by the _IMAGE_FILE_HEADER structure, which the image below shows.

Looking at MSDN, we can see some underlying information about the COFF header fields, such as:

Machine: specifies the CPU type; for instance, an x64 image would use IMAGE_FILE_MACHINE_AMD64. An image file can only be run on the specified machine or on a system that emulates it.

Number of Sections: how many entries the section table has. For our program we can see a total of 10, which includes common sections such as .text for executable code, .data for initialized global and static values, and many more.

Different information from the sections table

The others are fairly self-explanatory. TimeDateStamp is the number of seconds elapsed since 00:00 on January 1, 1970 (UTC) at the time the file was created. PointerToSymbolTable and NumberOfSymbols give the file offset and entry count of the COFF symbol table (both are generally 0 for images). SizeOfOptionalHeader just does as it says: it holds the size of the Optional Header that follows.

The last field, however, is special: Characteristics is a set of flags that describe the image, such as whether relocations are stripped, whether it is a DLL or an executable, and many more.
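As a rough sketch of how these fields sit on disk, the 20-byte header can be unpacked with Python's struct module. The sample bytes are made up for illustration (they only mirror the "10 sections, x64" example above), and the flag table lists just three of the documented Characteristics bits.

```python
import struct
from datetime import datetime, timezone

# IMAGE_FILE_HEADER is 20 bytes, little-endian:
#   WORD  Machine, NumberOfSections
#   DWORD TimeDateStamp, PointerToSymbolTable, NumberOfSymbols
#   WORD  SizeOfOptionalHeader, Characteristics
FILE_HEADER = struct.Struct("<HHIIIHH")

IMAGE_FILE_MACHINE_AMD64 = 0x8664

# A small subset of the Characteristics flags, enough for the example.
CHARACTERISTICS = {
    0x0002: "EXECUTABLE_IMAGE",
    0x0020: "LARGE_ADDRESS_AWARE",
    0x2000: "DLL",
}


def parse_file_header(raw: bytes) -> dict:
    """Unpack one IMAGE_FILE_HEADER from the start of `raw`."""
    (machine, n_sections, timestamp, sym_ptr,
     n_symbols, opt_size, flags) = FILE_HEADER.unpack_from(raw)
    return {
        "Machine": machine,
        "NumberOfSections": n_sections,
        "TimeDateStamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "PointerToSymbolTable": sym_ptr,
        "NumberOfSymbols": n_symbols,
        "SizeOfOptionalHeader": opt_size,
        "Characteristics": [name for bit, name in CHARACTERISTICS.items()
                            if flags & bit],
    }


# Hypothetical sample values: x64, 10 sections, 240-byte optional header.
sample = FILE_HEADER.pack(IMAGE_FILE_MACHINE_AMD64, 10, 1700000000,
                          0, 0, 240, 0x0022)
header = parse_file_header(sample)
print(header)
```

With a real file, `raw` would be the 20 bytes that immediately follow the 4-byte PE signature.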

PE File Structure

Now that we have covered the basics of COFF, it's time to dive a bit deeper into the Portable Executable format. Like COFF, it has its own layout, the PE File Structure, which consists of the following:

Complete PE File Structure

DOS Header

Starting from the very top, the DOS Header makes the file a valid MS-DOS executable, so that running it on MS-DOS prints a friendly message instead of failing with a generic error. Now you may be wondering who still uses MS-DOS, and hopefully that number is low. Today the header is mostly used to show Windows where the PE part of the file starts: the e_lfanew field holds the file offset of the NT Headers, letting the loader skip all the DOS fluff. Another interesting fact is that the DOS executable format was designed by Mark Zbikowski, and as such the first two bytes of a DOS Header are always 4D 5A, his initials 'MZ' in ASCII (read as a little-endian 16-bit value, that is 0x5A4D).

Structure

The DOS Header is a 64-byte-long structure, and by looking at IMAGE_DOS_HEADER we can see its fields:

Image Dos Header parameters (comments made by 0xrick)

Although it has numerous fields, the most important are e_magic, which marks the file as an MS-DOS executable using 0x5A4D (better known as MZ), and e_lfanew, located at offset 0x3C, which holds the file offset of the new EXE header (the NT Headers).

Looking at the image above we can see the 'magic number' being 5A4D and when following the 0x3C offset we can see the letters PE and the beginning of the NT Headers!
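To make the two fields concrete, here is a minimal sketch that reads e_magic and e_lfanew from a byte buffer. The 64-byte buffer is synthetic and the 0x80 offset is a made-up value; a real file would supply its own.

```python
import struct


def read_e_lfanew(data: bytes) -> int:
    """Validate the MZ magic and return e_lfanew (file offset of the NT Headers)."""
    if len(data) < 64:
        raise ValueError("too short to hold a 64-byte IMAGE_DOS_HEADER")
    (e_magic,) = struct.unpack_from("<H", data, 0x00)   # WORD at offset 0x00
    if e_magic != 0x5A4D:                                # bytes 'M', 'Z' on disk
        raise ValueError("missing MZ magic")
    (e_lfanew,) = struct.unpack_from("<i", data, 0x3C)  # LONG at offset 0x3C
    return e_lfanew


# Synthetic DOS header: 'MZ', 58 zero bytes, then e_lfanew = 0x80 (hypothetical).
dos_header = b"MZ" + bytes(58) + struct.pack("<i", 0x80)
print(hex(read_e_lfanew(dos_header)))
```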

DOS Stub

This is a small MS-DOS program which just prints an error message saying that the executable cannot be run in DOS mode and then exits, not much else to say.

NT Headers

Before fully diving into NT Headers we must first talk about two important concepts, the Virtual Address (VA) and the Relative Virtual Address (RVA). An RVA is an offset from the address where the image was loaded in memory (the image base), while the VA is the actual address in the process's memory, that is, the image base plus the RVA.

Another thing to note is that NT Headers come in two forms, PE32 for 32-bit executables and PE32+ for 64-bit executables (why they didn't make it PE64, I have no idea). The differences between them are minor, the main one being that IMAGE_OPTIONAL_HEADER has a 32-bit and a 64-bit version.
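The relationship between the two is plain addition, which the following sketch spells out. The image base and entry-point RVA are made-up numbers chosen only to show the arithmetic.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """VA = address the image was loaded at + offset within the image."""
    return image_base + rva


def va_to_rva(image_base: int, va: int) -> int:
    """Inverse conversion; the VA must lie at or above the image base."""
    if va < image_base:
        raise ValueError("VA is below the image base")
    return va - image_base


# Hypothetical values for illustration only.
image_base = 0x140000000
entry_rva = 0x1000

entry_va = rva_to_va(image_base, entry_rva)
print(hex(entry_va))  # 0x140001000
```

If the loader places the image at a different base, every RVA stays the same and only the resulting VAs change, which is why the headers store RVAs.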

Signature

Another small note (last one, I promise): the NT Headers, similar to the DOS Header, also start with a fixed 4-byte value representing the letters "PE". In any program you inspect with PE-Bear or any other tool, you will always see the bytes 50 45 00 00, which translate to PE\0\0!
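A quick way to check a buffer for this signature is to follow e_lfanew from the DOS Header and compare four bytes. This is only a sketch: the buffer is synthetic, the 0x80 offset is an assumption for the example, and a real parser would validate much more.

```python
import struct

PE_SIGNATURE = b"PE\x00\x00"  # bytes 50 45 00 00 on disk


def has_pe_signature(data: bytes) -> bool:
    """Return True if `data` starts with MZ and e_lfanew points at 'PE\\0\\0'."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<i", data, 0x3C)
    if e_lfanew < 0:
        return False
    return data[e_lfanew:e_lfanew + 4] == PE_SIGNATURE


# Synthetic file: MZ header whose e_lfanew (0x80, hypothetical) points at the signature.
fake = bytearray(0x84)
fake[0:2] = b"MZ"
struct.pack_into("<i", fake, 0x3C, 0x80)
fake[0x80:0x84] = PE_SIGNATURE

print(has_pe_signature(bytes(fake)))
```

Note that a tool which displays the signature as a little-endian DWORD will show 0x00004550 for the same four bytes.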

Optional Headers

Now, you may be wondering why we are skipping the NT File Header. Well, that's because we already covered it! The NT File Header is the IMAGE_FILE_HEADER that COFF also uses, and for that reason we will be skipping to the last part, the Optional Header.

The Optional Header is the most important header in the NT Headers structure (funny that they named it 'optional'). It gives the PE loader the specific information it needs in order to load an executable. Object files do not need it, but it is required for image files such as executables.

Just like the NT Headers as a whole, the Optional Header comes in both a 32-bit and a 64-bit version. For our sake we will be discussing the 64-bit version (IMAGE_OPTIONAL_HEADER64); the 32-bit version is documented on MSDN as IMAGE_OPTIONAL_HEADER32.

*Magic: This identifies what kind of image it is, 0x10B for a PE32 executable, 0x20B for a PE32+ or 0x107 for a ROM image.

Major/Minor Linker Version: the major and minor version numbers of the linker.

SizeOfCode: the size of the code (.text) section, or the sum of all code sections if there are several.

SizeOfInitializedData: the size of the initialized data section (.data), or the sum of all such sections if there are several.

SizeOfUninitializedData: the size of the uninitialized data section (.bss), or the sum of all such sections if there are several (why not put it all in one, I have no clue).

AddressOfEntryPoint: the RVA of the entry point, whose meaning depends on the type of file. For program images this is the starting address, for device drivers it is the address of the initialization function, and for DLLs an entry point is optional, so the field may be zero.

BaseOfCode: the RVA of the beginning of the code section.

*ImageBase: The preferred address of the first byte of the image when loaded into memory; it must be a multiple of 64K. The default for DLLs is 0x10000000. The default for Windows CE EXEs is 0x00010000. The default for EXEs on Windows NT, Windows 2000, Windows XP, Windows 95, Windows 98, and Windows Me is 0x00400000.

SectionAlignment: The alignment, in bytes, of sections when they are loaded into memory. It must be greater than or equal to FileAlignment and defaults to the page size of the architecture.

FileAlignment: The alignment, in bytes, of the raw data of sections in the image file. It must be a power of 2 between 512 and 64K, with 512 as the default. If SectionAlignment is less than the architecture's page size, FileAlignment must match SectionAlignment.

Major/Minor OS, Image, Subsystem Version: Used to specify the major and minor version number for the required Operating System, Image, and Subsystem.

Win32VersionValue: a reserved field that must be set to 0.

SizeOfImage: The size, in bytes, of the image including all headers as it is loaded into memory. It must be a multiple of SectionAlignment.

SizeOfHeaders: the combined size of the MS-DOS stub, the PE header, and the section headers, rounded up to a multiple of FileAlignment.

CheckSum: the image file checksum. It is validated at load time for all drivers, any DLL loaded at boot time, and any DLL loaded into a critical Windows process.

*DLLCharacteristics: Despite the name, this field applies to executables as well. Its flags describe properties such as whether the image can be relocated at load time (ASLR) and whether it is compatible with data execution prevention (NX).

DLLCharacteristics different flags

Size Of Stack Reserve, Commit, and Heap Reserve and Commit: Specify the size of the stack to reserve and to commit, and the size of the local heap space to reserve and to commit.

LoaderFlags: A reserved field that should be set to 0.

NumberOfRvaAndSizes: the number of entries in the DataDirectory array.

DataDirectory: An array of IMAGE_DATA_DIRECTORY structures.
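To tie a few of these fields together, the sketch below unpacks part of a 64-bit Optional Header, decodes DLLCharacteristics, and checks the alignment rules described above. The offsets follow the IMAGE_OPTIONAL_HEADER64 layout in winnt.h; the 240-byte buffer and every value in it are invented for the example, and only a subset of the flags is listed.

```python
import struct

# First 40 bytes of IMAGE_OPTIONAL_HEADER64: Magic, linker versions, SizeOfCode,
# SizeOfInitializedData, SizeOfUninitializedData, AddressOfEntryPoint,
# BaseOfCode, ImageBase (8 bytes), SectionAlignment, FileAlignment.
OPT64_HEAD = struct.Struct("<HBBIIIIIQII")

DLL_CHARACTERISTICS = {   # subset of the documented flags
    0x0020: "HIGH_ENTROPY_VA",
    0x0040: "DYNAMIC_BASE",
    0x0100: "NX_COMPAT",
}


def parse_optional_header64(raw: bytes) -> dict:
    (magic, _lmaj, _lmin, _code, _init, _uninit, entry_rva, _base_code,
     image_base, sect_align, file_align) = OPT64_HEAD.unpack_from(raw, 0)
    if magic != 0x20B:
        raise ValueError("not a PE32+ optional header")
    # SizeOfImage and SizeOfHeaders sit at offsets 56 and 60.
    size_of_image, size_of_headers = struct.unpack_from("<II", raw, 56)
    # DllCharacteristics is a WORD at offset 70.
    (dll_chars,) = struct.unpack_from("<H", raw, 70)
    return {
        "AddressOfEntryPoint": entry_rva,
        "ImageBase": image_base,
        "SectionAlignment": sect_align,
        "FileAlignment": file_align,
        "SizeOfImage": size_of_image,
        "SizeOfHeaders": size_of_headers,
        "DllCharacteristics": [n for bit, n in DLL_CHARACTERISTICS.items()
                               if dll_chars & bit],
    }


def alignment_problems(h: dict) -> list:
    """Return a list of violated alignment rules (empty if none)."""
    fa, sa = h["FileAlignment"], h["SectionAlignment"]
    problems = []
    if fa & (fa - 1) or not 512 <= fa <= 0x10000:
        problems.append("FileAlignment must be a power of 2 in [512, 64K]")
    if sa < fa:
        problems.append("SectionAlignment must be >= FileAlignment")
    elif h["SizeOfImage"] % sa:
        problems.append("SizeOfImage must be a multiple of SectionAlignment")
    if fa and h["SizeOfHeaders"] % fa:
        problems.append("SizeOfHeaders must be a multiple of FileAlignment")
    return problems


# Hypothetical header contents, for illustration only.
raw = bytearray(240)
OPT64_HEAD.pack_into(raw, 0, 0x20B, 14, 0, 0x1000, 0x800, 0,
                     0x1000, 0x1000, 0x140000000, 0x1000, 0x200)
struct.pack_into("<II", raw, 56, 0x5000, 0x400)
struct.pack_into("<H", raw, 70, 0x0160)

opt = parse_optional_header64(bytes(raw))
print(opt)
print(alignment_problems(opt))
```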

Sections

Sections can be thought of as the books on a library shelf: they contain the information and data for the actual executable. They generally come last in the PE structure, following all the PE headers and the section table that defines them. There are multiple sections, but again, we will only be going over the more important ones; if you want to see them all, you can find them in the MSDN PE Format article!

  • .text: Contains the executable code for the program.

  • .data: contains the initialized data.

  • .bss: contains the uninitialized data.

  • .rdata: contains read-only initialized data.

  • .edata: contains the export tables.

  • .idata: contains the import tables.

  • .reloc: contains image relocation information.

  • .rsrc: contains resources used by the program.

  • .tls: (Thread Local Storage) provides storage for every executing thread.

Lastly, each section has a Section Header, which contains information about that section of the PE file. It is named IMAGE_SECTION_HEADER and is defined in winnt.h.

As we have already done for all the other headers, we cannot let this one feel left out, so again, we will discuss what each of its fields does and then conclude with the PE Format!

Name: An 8-byte, null-padded, UTF-8 encoded string. In plain English, this means the name can be at most 8 characters long, and if it is exactly 8 characters there is no terminating null. Anything longer is stored as a forward slash (/) followed by an ASCII representation of a decimal number that is an offset into the string table (executable images do not use a string table, so they cannot have longer names).

VirtualSize: Total size of the section when loaded into memory.

VirtualAddress: For EXEs, the address of the first byte of the section relative to the image base when loaded, and for OBJ, it holds the address of the first byte of the section before relocation is applied.

SizeOfRawData: The size of the section on disk (for object files) or of the initialized data on disk (for image files). For EXEs it must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment. If it is less than VirtualSize, the remainder of the section is zero-filled. Because SizeOfRawData is rounded up and VirtualSize is not, it can also be greater than VirtualSize. If the section contains only uninitialized data, this field should be zero.

PointerToRawData: File pointer to the first page of the section within the COFF file. For EXEs it must be a multiple of FileAlignment; for OBJs the value should be aligned on a 4-byte boundary for best performance.

PointerToRelocations: File pointer to the beginning of relocation entries for the section (usually 0 for EXEs or if there are no relocations).

PointerToLineNumbers: File pointer to the beginning of COFF line-number entries; if there are none, it is set to 0.

NumberOfRelocations: Number of relocation entries for the section; for EXEs this is set to 0.

NumberOfLinenumbers: Number of COFF line-number entries for the section; if there are none, it is set to 0.

Characteristics: Flags that describe the section, such as whether it contains executable code or initialized/uninitialized data, whether it is shared in memory, etc. The full list of flags is in the Section Flags part of the MSDN PE Format article.
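The fields above can be unpacked in the same way as the other headers, and VirtualAddress together with PointerToRawData is what lets a tool translate an RVA into a file offset. The following is a simplified sketch: the .text values are invented, and the helper only maps RVAs that are backed by raw data on disk.

```python
import struct
from typing import NamedTuple, Optional

# IMAGE_SECTION_HEADER is 40 bytes: Name[8], VirtualSize, VirtualAddress,
# SizeOfRawData, PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations (WORD), NumberOfLinenumbers (WORD), Characteristics.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    characteristics: int


def parse_section_header(raw: bytes, offset: int = 0) -> Section:
    (name, vsize, vaddr, raw_size, raw_ptr,
     _reloc_ptr, _line_ptr, _n_reloc, _n_line, flags) = \
        SECTION_HEADER.unpack_from(raw, offset)
    # The name is null-padded, not necessarily null-terminated.
    return Section(name.rstrip(b"\x00").decode("utf-8"),
                   vsize, vaddr, raw_size, raw_ptr, flags)


def rva_to_file_offset(sections, rva: int) -> Optional[int]:
    """Map an RVA to a file offset, or None if no raw data backs it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.size_of_raw_data:
            return rva - s.virtual_address + s.pointer_to_raw_data
    return None


# Hypothetical .text header: code, executable, readable (0x60000020).
text_raw = SECTION_HEADER.pack(b".text", 0x1234, 0x1000, 0x1400, 0x400,
                               0, 0, 0, 0, 0x60000020)
text = parse_section_header(text_raw)
print(text)
print(hex(rva_to_file_offset([text], 0x1010)))  # 0x410
```

Bytes past SizeOfRawData exist only in memory (they are zero-filled by the loader), which is why the helper returns None for them.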

Conclusion

We have successfully learned about both the COFF and PE file structures and headers! I know this was quite a long article, but it is important to know this when debugging. Now that we have covered both Assembly basics and the PE Format, it's time to have a little fun...

Reversing WannaCry
