PE Format
This article is in conjunction with the Reverse Engineering topic. As such, it is important that you first read Getting Started before continuing your endeavors with the PE Format!
Introduction
When discussing reverse engineering, especially on Windows, the Portable Executable (PE) format is unavoidable: it is the format used for executables, dynamic-link libraries, control panel applets, and kernel modules. In order for your favorite game, browser, or even malware to run on Windows, it must use the PE format!
For this section of Reverse Engineering, we will go into great depth about the Win32 PE format and why it matters for applications and malware alike!
Just as a clarification, I will be using the word 'image' numerous times in this article. The term image refers to a PE file (EXE or DLL), not to be confused with a regular image like a picture!
Overview
The Portable Executable format is what allows all the file types listed above to run on the Windows OS. The Windows loader reads the file, maps it into memory, and then runs it. For example, let's say you built a new C++ program, an OpenHandle program just like the one from the previous section. For this executable to run properly, it must first be loaded into memory, and this is where the Portable Executable format comes in.
However, before the final executable can be created, the compiler must first produce the necessary object (.obj) files, which use the Common Object File Format (COFF). The compiler (MSVC) needs to store its compiled code somewhere, and COFF essentially allows sections such as .text and .data to be stored inside those files alongside the needed symbols. The same format can describe program fragments, libraries, or even full-blown executables. COFF works in conjunction with PE to ensure smooth compilation and execution of applications.
COFF Headers
As with any complicated format created by Microsoft, the COFF header defines some basic 'metadata', as I call it, about the file and what it contains. The COFF File Header is represented in the Win32 API by the _IMAGE_FILE_HEADER structure, shown in the image below.

Looking at MSDN, we can see some underlying information about the COFF header fields, such as:
Machine: specifies the CPU type, for instance an x64 image would use IMAGE_FILE_MACHINE_AMD64. This matters because an image file can only be run on the specified machine or on a system that emulates it.
NumberOfSections: how many entries the section table will have. For our program we can see that we have a total of 10, which consists of common sections such as .text for executable code, .data for global and static values, and many more.

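The others are fairly self-explanatory: TimeDateStamp is the number of seconds elapsed between 00:00 on January 1, 1970 (UTC) and the moment the file was created, PointerToSymbolTable and NumberOfSymbols (which are generally 0 for images) give the file offset and entry count of the COFF symbol table, and SizeOfOptionalHeader just does as it says: it holds the size of the optional header that follows.
The last field, however, is special. Characteristics is a set of flags that defines what the image is: whether it is stripped of certain information, a DLL, an executable, and many more.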
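To make the field list above a bit more concrete, here is a minimal Python sketch that unpacks the 20 bytes of an IMAGE_FILE_HEADER. The helper names (parse_file_header, is_dll) and the sample bytes are my own, made up for illustration; only the field order, sizes, and the constant values come from the documented structure.

```python
import struct
from collections import namedtuple

# Field order of IMAGE_FILE_HEADER (20 bytes, little-endian).
FileHeader = namedtuple(
    "FileHeader",
    "Machine NumberOfSections TimeDateStamp PointerToSymbolTable "
    "NumberOfSymbols SizeOfOptionalHeader Characteristics",
)
FILE_HEADER_FORMAT = "<HHIIIHH"  # WORD, WORD, DWORD, DWORD, DWORD, WORD, WORD

IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_DLL = 0x2000


def parse_file_header(raw: bytes) -> FileHeader:
    """Unpack the first 20 bytes of raw as an IMAGE_FILE_HEADER (hypothetical helper)."""
    return FileHeader._make(struct.unpack_from(FILE_HEADER_FORMAT, raw))


def is_dll(header: FileHeader) -> bool:
    """Characteristics is a bit field, so individual flags are tested with a mask."""
    return bool(header.Characteristics & IMAGE_FILE_DLL)


# Synthetic header, not taken from a real file: x64, 10 sections, no COFF symbols,
# a 240-byte optional header, and the "executable image" flag set.
sample = struct.pack(
    FILE_HEADER_FORMAT,
    IMAGE_FILE_MACHINE_AMD64, 10, 0x5F000000, 0, 0, 240, 0x0022,
)
header = parse_file_header(sample)
print(hex(header.Machine), header.NumberOfSections, is_dll(header))
```

In a real file these 20 bytes sit right after the 4-byte PE signature, so you would pass the bytes starting at e_lfanew + 4 instead of a synthetic sample.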

PE File Structure
Now that we have covered the basics of COFF, it's time to dive a bit deeper into the Portable Executable format. Similarly to COFF, it has its own layout, the PE File Structure, which consists of the following:

DOS Header
Starting from the very top, the DOS Header makes the file a valid MS-DOS executable. Without it, the file would not load on MS-DOS at all and would produce a generic error. Now you may be wondering who still uses MS-DOS, and hopefully that number is low. Today the header is mostly used to show Windows where the PE file starts: the e_lfanew field points to where the actual PE headers begin, skipping all the DOS fluff. Another interesting fact is that the format was created by Mark Zbikowski, and as such the first two bytes of a DOS Header are always 4D 5A (read as the little-endian 16-bit value 0x5A4D), the ASCII codes for his initials, MZ.
Structure
The DOS Header is a 64-byte-long structure, and by looking at IMAGE_DOS_HEADER we can see its fields:

Although it has numerous fields, the most important of them are e_lfanew, located at offset 0x3C, which holds the file offset of the new EXE (PE) header, and e_magic, which marks the file as an MS-DOS executable using 0x5A4D, better known as MZ.

Looking at the image above, we can see the 'magic number' being 5A4D, and when following the offset stored at 0x3C we can see the letters PE and the beginning of the NT Headers!
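The walk from e_magic to e_lfanew described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: read_dos_header is a hypothetical helper and the sample buffer is synthetic, not a real executable.

```python
import struct

DOS_HEADER_SIZE = 64          # size of IMAGE_DOS_HEADER
E_LFANEW_OFFSET = 0x3C        # where the e_lfanew field lives
IMAGE_DOS_SIGNATURE = 0x5A4D  # the bytes "MZ" read as a little-endian WORD


def read_dos_header(data: bytes):
    """Return (e_magic, e_lfanew) after checking the MZ signature (hypothetical helper)."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("file is too small to hold a DOS header")
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("missing MZ signature")
    # e_lfanew is declared as a LONG, hence the signed 32-bit format.
    (e_lfanew,) = struct.unpack_from("<i", data, E_LFANEW_OFFSET)
    return e_magic, e_lfanew


# Synthetic buffer: an MZ header whose e_lfanew points at offset 0x80,
# where the letters "PE" are placed.
sample = bytearray(0x84)
sample[0:2] = b"MZ"
struct.pack_into("<i", sample, E_LFANEW_OFFSET, 0x80)
sample[0x80:0x84] = b"PE\0\0"

e_magic, e_lfanew = read_dos_header(bytes(sample))
print(hex(e_magic), hex(e_lfanew), bytes(sample[e_lfanew:e_lfanew + 2]))
```

For a real file you would replace the synthetic buffer with the contents read via open(path, "rb").read().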
DOS Stub
This is a small MS-DOS program which just prints an error message saying that the executable is not compatible with DOS and then exits. Not much else to say.
NT Headers
Before fully diving into NT Headers, we must first talk about two important concepts, the Virtual Address (VA) and the Relative Virtual Address (RVA). An RVA is just the distance (offset) of something from the address where the image was loaded in memory (the image base), while the VA is the actual address in memory, that is, the image base plus the RVA.
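Since the relationship between the two is plain arithmetic, a tiny Python sketch may help. The function names and the addresses used here are hypothetical values picked for illustration.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """A VA is simply the image base plus the RVA."""
    return image_base + rva


def va_to_rva(image_base: int, va: int) -> int:
    """Going the other way: subtract the image base from the VA."""
    if va < image_base:
        raise ValueError("address lies below the image base")
    return va - image_base


# Hypothetical numbers: an image loaded at 0x140000000 with something at RVA 0x1000.
image_base = 0x140000000
entry_rva = 0x1000
entry_va = rva_to_va(image_base, entry_rva)
print(hex(entry_va))
```

The useful property of an RVA is that it stays the same no matter where the image ends up in memory, while the VA changes with the load address.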

Another thing to note is that NT Headers come in two forms, PE32 for 32-bit executables and PE32+ for 64-bit executables (why they didn't make it PE64, I have no idea). The differences between the two are very minor, the main one being that IMAGE_OPTIONAL_HEADER has a 32-bit and a 64-bit version.

Signature
Another small note (last one, I promise): the NT Headers, similar to the DOS Header, also start with a fixed value, in this case 4 bytes representing the letters "PE". In any program you inspect with PE-Bear or any other tool, you will always see the bytes 50 45 00 00, which translate to PE\0\0!
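One detail that is easy to trip over is byte order. The short Python sketch below (with a hypothetical has_pe_signature helper and a synthetic buffer) compares the raw bytes and also shows the value you get when the same four bytes are read as a little-endian DWORD.

```python
import struct

PE_SIGNATURE_BYTES = b"PE\0\0"   # 50 45 00 00 as the bytes appear in the file
IMAGE_NT_SIGNATURE = 0x00004550  # the same four bytes read as a little-endian DWORD


def has_pe_signature(data: bytes, e_lfanew: int) -> bool:
    """Check for the 4-byte signature at the offset taken from e_lfanew (hypothetical helper)."""
    return data[e_lfanew:e_lfanew + 4] == PE_SIGNATURE_BYTES


as_dword = struct.unpack("<I", PE_SIGNATURE_BYTES)[0]

# Synthetic buffer with the signature placed at offset 0x80.
sample = bytes(0x80) + PE_SIGNATURE_BYTES
print(PE_SIGNATURE_BYTES.hex(), hex(as_dword), has_pe_signature(sample, 0x80))
```

A hex editor shows the bytes in file order (50 45 00 00), while code that reads the field as an integer sees 0x00004550.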

Optional Headers
Now, you may be wondering why we are skipping the NT File Header. Well, that's because we already covered it! The NT File Header is the IMAGE_FILE_HEADER that COFF also uses, and for that reason we will be skipping to the last part, the Optional Header.
The Optional Header is the most important header in the NT Headers structure (funny they put the name as 'optional'). It gives the PE loader the specific information it needs in order to even load an executable. Although not all file types need it (object files can do without it), it is required for executables.
Just like the NT Headers themselves, the Optional Header comes in both a 32-bit and a 64-bit version. For our sake we will be discussing the 64-bit version, but for 32-bit the information can be found here.

We will be going through all of the fields. Knowing every one of them is not necessary, so I'll point out which are important to know with a *
*Magic: This identifies what kind of image it is: 0x10B for a PE32 executable, 0x20B for a PE32+ executable, or 0x107 for a ROM image.
Major/Minor Linker Version: the version number for both major and minor linker.
SizeOfCode: holds the size of the code (.text) section, or the sum of all code sections if there are several.
SizeOfInitializedData: holds the size of the initialized data (.data) section, or the sum of all such sections if there are several.
SizeOfUninitializedData: holds the size of the uninitialized data (.bss) section, or the sum of all such sections if there are several (why not put it all in one, I have no clue).
AddressOfEntryPoint: the address of the entry point, relative to the image base. Its meaning depends on the type of file: for program images this is the starting address, for device drivers this is the address of the initialization function, and for DLLs the entry point is optional, so the field is zero when there is none.
BaseOfCode: the address, relative to the image base, of the beginning of the code section once loaded into memory.
*ImageBase: The preferred address of the first byte of the image when it is loaded into memory. The default for DLLs is 0x10000000. The default for Windows CE EXEs is 0x00010000. The default for Windows NT, Windows 2000, Windows XP, Windows 95, Windows 98, and Windows Me is 0x00400000.
SectionAlignment: the alignment (in bytes) of sections when they are loaded into memory. It cannot be less than the value of FileAlignment.
FileAlignment: the alignment (in bytes) used for the raw data of sections in the image file. It must be a power of 2 between 512 and 64K. If SectionAlignment is less than the architecture's page size, FileAlignment must match SectionAlignment.
Major/Minor OS, Image, Subsystem Version: Used to specify the major and minor version number for the required Operating System, Image, and Subsystem.
Win32VersionValue: a reserved field that must be set to 0.
SizeOfImage: the size (in bytes) of the image, including all headers, as it is loaded into memory. It must be a multiple of SectionAlignment.
SizeOfHeaders: the combined size of the MS-DOS stub, PE header, and section headers, rounded up to a multiple of FileAlignment.
CheckSum: checksum for the image file, used to validate certain images (such as drivers) at load time.
*DLLCharacteristics: Despite the name, this applies to executables as well. It is a set of flags describing things such as the image's compatibility settings and whether it can be relocated at load time.

SizeOfStackReserve, SizeOfStackCommit, SizeOfHeapReserve, and SizeOfHeapCommit: specify the size of the stack to reserve and commit, and the size of the local heap space to reserve and commit.
LoaderFlags: a reserved field that should be set to 0.
NumberOfRvaAndSizes: the number of entries in the DataDirectory array.
DataDirectory: an array of IMAGE_DATA_DIRECTORY structures.
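To tie a few of the starred fields together, here is a hedged Python sketch that reads Magic, AddressOfEntryPoint, ImageBase, SectionAlignment, and FileAlignment out of a raw optional header. The offsets follow the documented PE32 and PE32+ layouts; parse_optional_header, align_up, and the sample bytes are my own illustrative assumptions, and ROM images are deliberately not handled.

```python
import struct

PE32_MAGIC = 0x10B
PE32_PLUS_MAGIC = 0x20B


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def parse_optional_header(raw: bytes) -> dict:
    """Pull a handful of fields out of a PE32 or PE32+ optional header (hypothetical helper)."""
    (magic,) = struct.unpack_from("<H", raw, 0)
    if magic not in (PE32_MAGIC, PE32_PLUS_MAGIC):
        raise ValueError(f"unsupported optional header magic: {magic:#x}")
    (entry_rva,) = struct.unpack_from("<I", raw, 16)
    if magic == PE32_PLUS_MAGIC:
        # PE32+ has no BaseOfData field, and ImageBase is 8 bytes wide at offset 24.
        (image_base,) = struct.unpack_from("<Q", raw, 24)
    else:
        # PE32 keeps BaseOfData at offset 24, followed by a 4-byte ImageBase at 28.
        (image_base,) = struct.unpack_from("<I", raw, 28)
    # Both layouts line up again at offset 32.
    section_alignment, file_alignment = struct.unpack_from("<II", raw, 32)
    return {
        "Magic": magic,
        "AddressOfEntryPoint": entry_rva,
        "ImageBase": image_base,
        "SectionAlignment": section_alignment,
        "FileAlignment": file_alignment,
    }


# Synthetic PE32+ optional header (240 bytes), values chosen for illustration.
sample = bytearray(240)
struct.pack_into("<H", sample, 0, PE32_PLUS_MAGIC)
struct.pack_into("<I", sample, 16, 0x1000)
struct.pack_into("<Q", sample, 24, 0x140000000)
struct.pack_into("<II", sample, 32, 0x1000, 0x200)

fields = parse_optional_header(bytes(sample))
print({name: hex(value) for name, value in fields.items()})
print(hex(align_up(0x3A8, fields["FileAlignment"])))
```

The align_up helper shows the kind of rounding behind fields such as SizeOfHeaders: 0x3A8 bytes of headers with a FileAlignment of 0x200 occupy 0x400 bytes in the file.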
Sections
Sections can be thought of as the books on a library shelf: they contain the actual code and data of the executable. They generally come last in the PE structure, following all the PE headers, and are described by the section headers. There are multiple sections, but again, we will only be going over the more important ones. If you want to see them all, you can find them in the MSDN PE Format article!

.text: contains the executable code for the program.
.data: contains the initialized data.
.bss: contains the uninitialized data.
.rdata: contains read-only initialized data.
.edata: contains the export tables.
.idata: contains the import tables.
.reloc: contains image relocation information.
.rsrc: contains resources used by the program.
.tls: (Thread Local Storage) provides storage for every executing thread.
Lastly, there is also a Section Header for each section, which contains information about that section of the PE file. It is named IMAGE_SECTION_HEADER and is defined in winnt.h.

This is the last part I promise ;)
As we have already done for all the other headers, we cannot let this one feel left out, so again, we will be discussing what all these fields do and then conclude with the PE Format!
Name: An 8-byte, null-padded UTF-8 encoded string. In plain English, this means the name can be at most 8 characters long (with no terminating null if it is exactly 8). For anything longer, the field instead contains a slash (/) followed by an ASCII representation of a decimal number that is an offset into the string table (executable images do not use a string table, so longer names cannot be used in them).
VirtualSize: Total size of the section when loaded into memory.
VirtualAddress: For EXEs, the address of the first byte of the section relative to the image base when loaded, and for OBJ, it holds the address of the first byte of the section before relocation is applied.
SizeOfRawData: the size of the section's initialized data on disk, which for images must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment. If it is less than VirtualSize, the remainder of the section is zero-filled. Because this value is rounded up while VirtualSize is not, it can also be greater than VirtualSize. If the section contains only uninitialized data, it should be 0.
PointerToRawData: File pointer to the first page of the section within the COFF file. For EXEs it must be a multiple of FileAlignment; for OBJs the value should be aligned on a 4-byte boundary.
PointerToRelocations: File pointer to the beginning of relocation entries for the section (usually 0 for EXEs or if there are no relocations).
PointerToLinenumbers: File pointer to the beginning of COFF line-number entries. If there are none, it is set to 0.
NumberOfRelocations: Number of relocation entries for the section. For EXEs this is set to 0.
NumberOfLinenumbers: Number of COFF line-number entries for the section. If there are none, it is set to 0.
Characteristics: Flags that contain information about the section such as executable code, initialized/uninitialized data, shared memory, etc. More information can be found here.
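As a rough illustration of how these fields are used together, the Python sketch below unpacks one 40-byte IMAGE_SECTION_HEADER and uses VirtualAddress and PointerToRawData to turn an RVA into a file offset. The helpers (parse_section_header, rva_to_file_offset) and the sample section are assumptions of mine; the lookup is a simplification that only covers bytes actually present in the file.

```python
import struct
from collections import namedtuple

SectionHeader = namedtuple(
    "SectionHeader",
    "Name VirtualSize VirtualAddress SizeOfRawData PointerToRawData "
    "PointerToRelocations PointerToLinenumbers NumberOfRelocations "
    "NumberOfLinenumbers Characteristics",
)
SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"  # 40 bytes in total


def parse_section_header(raw: bytes, offset: int = 0) -> SectionHeader:
    """Unpack one IMAGE_SECTION_HEADER and strip the null padding from Name."""
    fields = struct.unpack_from(SECTION_HEADER_FORMAT, raw, offset)
    name = fields[0].rstrip(b"\0").decode("utf-8", errors="replace")
    return SectionHeader(name, *fields[1:])


def rva_to_file_offset(rva: int, sections):
    """Map an RVA to a file offset, or None if no section backs it on disk.

    Simplified: only bytes inside both VirtualSize and SizeOfRawData are mapped.
    """
    for section in sections:
        start = section.VirtualAddress
        backed = min(section.VirtualSize, section.SizeOfRawData)
        if start <= rva < start + backed:
            return section.PointerToRawData + (rva - start)
    return None


# Synthetic .text header: loaded at RVA 0x1000, stored at file offset 0x400.
sample = struct.pack(
    SECTION_HEADER_FORMAT,
    b".text", 0x1234, 0x1000, 0x1400, 0x400, 0, 0, 0, 0, 0x60000020,
)
text = parse_section_header(sample)
print(text.Name, hex(rva_to_file_offset(0x1010, [text])))
```

In a real image the section headers start right after the optional header, one 40-byte entry per section, with the count given by NumberOfSections in the file header.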
Conclusion
We have successfully learned about both the COFF and PE file structures and headers! I know this was quite a long article, but it is important to know this when debugging. Now that we have covered both Assembly basics and the PE Format, it's time to have a little fun...