The DOS world’s need for memory grew, and the 64KB available to .COM executables was no longer adequate. A new executable file format was invented, using the .EXE file extension rather than .COM. Each of these files begins with a 2-byte tag that identifies the format, and if you guessed that this was “NE” to denote “new executable”, you’d be incorrect. The first 2 bytes of every .EXE file are “MZ”, famously because the programmer at Microsoft who wrote the code was named Mark Zbikowski.
The EXE file format allowed multiple segments to be defined, and it supported separate compilation. That is, different parts of the program could be compiled into .OBJ files, and then a LINK step assembled them into the resulting EXE file. This enabled many good things, like building varied portions of a program independently and purchasing libraries of code (SDKs) from other developers without them having to provide source code. The format also permitted programs to be “large”, occupying up to all the memory available on the DOS computers.
I intended to write a detailed description of this evolution of file formats here, but there’s no need; it has been done well and in detail by others, and I provide links here.
Cutting to the meat of it, the MZ format executable has these portions:
- Header
- Relocation list
- Code
Which is really:
- Header
- Relocation list
- <Code>
- [Code]
- […]
The Header includes information for allocating a heap and a stack. One grows up, one grows down, and when they collide, the application is out of memory. Notice that this is still DOS, so it isn’t like the operating system is going to do anything when the application’s heap and stack collide. Still, the executable format is starting to grow into a real concept of an operating system, with a loader.
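To make the header a little more concrete, here is a minimal Python sketch that unpacks the classic 28-byte header at the front of an .EXE file. The field names are my own labels for illustration, not official ones; the layout (fourteen little-endian 16-bit words starting with the “MZ” tag) is the commonly documented one. The min/max allocation and SS:SP fields are the ones that relate to the heap and stack described above.

```python
import struct

# Classic DOS .EXE header: "MZ" tag plus thirteen little-endian 16-bit words.
# Field names below are illustrative labels, not official identifiers.
MZ_HEADER = struct.Struct("<2s13H")
FIELDS = (
    "magic",               # b"MZ"
    "bytes_in_last_page",  # bytes used in the final 512-byte page
    "pages",               # number of 512-byte pages in the file
    "reloc_count",         # number of entries in the relocation list
    "header_paragraphs",   # header size in 16-byte paragraphs
    "min_alloc",           # minimum extra paragraphs needed
    "max_alloc",           # maximum extra paragraphs wanted
    "ss",                  # initial stack segment (relative to load segment)
    "sp",                  # initial stack pointer
    "checksum",
    "ip",                  # initial instruction pointer
    "cs",                  # initial code segment (relative to load segment)
    "reloc_offset",        # file offset of the relocation list
    "overlay",
)


def parse_mz_header(data: bytes) -> dict:
    """Return the header fields of a DOS .EXE image as a dict."""
    if len(data) < MZ_HEADER.size:
        raise ValueError("file too short to hold an MZ header")
    header = dict(zip(FIELDS, MZ_HEADER.unpack_from(data)))
    if header["magic"] != b"MZ":
        raise ValueError("not an MZ executable")
    return header
```

Calling `parse_mz_header(open(path, "rb").read())` on a DOS-era .EXE (where `path` is whatever file you have handy) should give you the relocation count and the initial CS:IP and SS:SP values.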
The Code and Relocation list deserve a bit more description, as there can be multiple code regions, each limited to 64KB (the size of a SEGMENT).
The executable is defined in segments, each of which is loaded into memory at a paragraph boundary (a 16-byte boundary). Each paragraph of memory can be addressed through the segment registers using 16:16 segment:offset addressing, which converts to a physical address by shifting the segment left 4 bits and adding the offset. At this time in the life of Intel processors, there was no such thing as virtual addresses. The 8086 CPU is a pretty straightforward machine. Segment:Offset converts straight to physical, and when the CPU addressed memory, the access actually went all the way to the ISA bus, where memory would respond.
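The segment:offset arithmetic is easy to sketch in a few lines of Python. The function name is my own; the mask to 20 bits reflects the 8086 having 20 address lines, so addresses past 1MB wrap around to zero.

```python
def physical_address(segment: int, offset: int) -> int:
    """Convert a 16:16 segment:offset pair to a 20-bit 8086 physical address."""
    # Shift the segment left 4 bits (multiply by 16), add the offset,
    # and keep only the 20 bits the 8086 address bus can carry.
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs can name the same physical byte.
assert physical_address(0x1234, 0x0010) == 0x12350
assert physical_address(0x1235, 0x0000) == 0x12350
```

Note that many different segment:offset pairs alias the same physical byte, which is why a segment can start at any paragraph boundary.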
After loading each code segment into RAM, the DOS loader applies the fixup records from the relocation list so that code calling between segments uses the 16:16 addresses where the program’s segments are actually loaded at runtime. There is NO provision for DLLs or dynamic linking.
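The fixup step can be sketched roughly as follows. This is a simplified model, not the actual DOS loader: it assumes the load module (everything after the header) is already in a `bytearray`, that each relocation entry is an (offset, segment) pair relative to the start of that module, and that `load_segment` is the paragraph where the loader placed it. All names here are hypothetical.

```python
import struct


def apply_relocations(image: bytearray, relocs, load_segment: int) -> None:
    """Patch every segment reference in image for its actual load address."""
    for offset, segment in relocs:
        # Locate the 16-bit word the relocation entry points at.
        pos = (segment << 4) + offset
        (value,) = struct.unpack_from("<H", image, pos)
        # The stored value is a segment relative to the start of the program;
        # adding the load segment makes it a real runtime segment.
        struct.pack_into("<H", image, pos, (value + load_segment) & 0xFFFF)


# Example: a far-call target stored as segment 0x0001 at module offset 0x12.
image = bytearray(32)
struct.pack_into("<H", image, 0x12, 0x0001)
apply_relocations(image, [(0x0002, 0x0001)], load_segment=0x2000)
assert struct.unpack_from("<H", image, 0x12)[0] == 0x2001
```

Only the segment half of each far pointer needs patching; the offset half stays valid because the contents of each segment do not move relative to its start.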
This file format was the primary format for DOS computers through the long life of the DOS operating system, and it is still with us today. The modern PE file format includes an MZ format executable as a “DOS Stub” at the front. This is primarily so that programs intended for Windows 3.11 or OS/2, when started under DOS, can display a message along the lines of “this program is intended to execute under Windows” before the stub terminates. Creative programmers can use the DOS stub to run a DOS version of a program when on DOS and a Windows or OS/2 version of the program when on those operating systems. We’re on a journey here; PC operating systems are starting to look like “real computers”. The next post will take us into modern times of about 1990.