Revisiting the DOS Memory Models

(blogsystem5.substack.com)

154 points | by mooreds 3 days ago

82 comments

  • jmmv 12 hours ago

    Original author here. Thanks for sharing!

    I see various comments below along the lines of “oh, the article is missing so and so”. OK… then please see the other articles in this series! I think they cover most of what you are mentioning :-)

    The first was on EMS, XMS, HMA and the like: https://blogsystem5.substack.com/p/from-0-to-1-mb-in-dos

    The second was on unreal mode: https://blogsystem5.substack.com/p/beyond-the-1-mb-barrier-i...

    The third was on DJGPP: https://blogsystem5.substack.com/p/running-gnu-on-dos-with-d...

    And the last, which follows this one, is on 64 bit memory models: https://blogsystem5.substack.com/p/x86-64-programming-models

    Some of these were previously discussed here too, but composing this on mobile and finding links is rather painful… so excuse me for not providing those links now.

    • bonzini 7 hours ago

      Just one nit: contrary to what the article suggests, as far as I remember the compact model was not so common because using far pointers for all data is slow and wastes memory. Also, the globals and the stack had to fit in 64k anyway so compact only bought you a larger heap.

      However, there were variants of malloc and free that returned or accepted far pointers, or alternatively you could ask DOS for memory in 16-byte units and slice it yourself (e.g. by loading game assets). Therefore many programs used the small and medium models instead of compact and large respectively, and annotated pointers to large data (which is almost always runtime-loaded and dynamically allocated anyway) by hand with the __far modifier. This was the most efficient setup with the only problem that, due to the 64k limit, you could hardly use the heap or recursion.
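      Concretely, the setup looked something like this. A sketch assuming Borland Turbo C's farmalloc/farfree from <alloc.h> (Microsoft C spelled them _fmalloc/_ffree):

      ```c
      /* Small/medium model: default pointers are near (16-bit), so the
         globals and the stack share the single 64K DGROUP. Only the big
         runtime-loaded buffer is hand-annotated as far data. */
      #include <alloc.h>   /* farmalloc, farfree (Borland) */
      #include <stdio.h>

      static char title[32];              /* near data, inside DGROUP */

      int main(void)
      {
          unsigned char far *assets =
              (unsigned char far *)farmalloc(100000L);
          if (assets == NULL) {
              puts("out of memory");
              return 1;
          }
          title[0] = 'A';                 /* near access: cheap */
          assets[100] = 42;               /* far access: slower, but keeps
                                             the 64K data segment free;
                                             indexing past 64K would need
                                             a huge pointer */
          farfree(assets);
          return 0;
      }
      ```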

      • tiahura 4 hours ago

        1. Compact Model Limits: The stack and globals don’t strictly need to fit in 64 KB; far pointers allow larger heaps, but inefficiency made this model unpopular.

        2. Malloc Variants: While farmalloc and farfree existed, developers often used direct DOS memory allocation for better control.

        3. Stack Constraints: Stack and recursion limits were due to 64 KB segments, not specific to compact or small models.

        4. Far Pointers: Using __far for dynamic data was common across models; compact/large automated this but were inefficient.

        5. Heap/Recursion Use: The heap and recursion were constrained, not “hardly usable,” due to far pointer overhead and stack size.

    • Timwi 8 hours ago

      I read through the whole page from the beginning up to the “Discussion about this post” header. At no point was there any mention of a series, or any other blog posts (the inline links all go to Wikipedia).

      I don't blame anyone for not realizing that there are more articles on the topic.

      • klelatti 7 hours ago

        At the very start of the post:

        > At the beginning of the year, I wrote a bunch of articles on the various tricks DOS played to overcome the tight memory limits of x86’s real mode.

        With a link to an article.

        • gibibit 2 hours ago

          Linked in the style where each word links to _a_ _different_ _page_ that doesn't correspond to the hyperlinked word.

          What do you call this pattern? It seems to be popular lately. I haven't been able to find a description of it, but it would be much more helpful to the reader if it was identified.

          Instead of

          > At the beginning of the year, I wrote a _bunch_ _of_ _articles_ on the various tricks

          It's better to write

          > At the beginning of the year, I wrote a bunch of articles (_1_, _2_, _3_) on the various tricks

          or something similar.

          • jmmv an hour ago

            I intentionally wrote it that way because these articles are only loosely related to the one discussed here, not a "series I thought through upfront". Yeah, not a fan _of_ _the_ _pattern_, but I wanted to give it a try and see how it worked. But honestly... the text of the very first sentence talks about these articles, so the curious reader will hopefully realize that "there is something more".

          • marxisttemp 2 hours ago

            It bothers me too, in the same fashion as “click here”. Instead, we should prefer e.g.

            At the beginning of the year, I wrote a bunch of articles on the various tricks (_below 1MB_, _above 1 MB_, and _with GNU JMP_)

            Just describe the content you’re linking to. You know best as the author!

        • lproven 6 hours ago

          Correction to the correction: with three links to the three articles.

    • turol 8 hours ago

      If you click on the domain name next to the main link you get a filtered view of submissions for just that domain. This way you can easily find the related posts. It looks like this is the fifth submission of this article but the others didn't get many comments.

      https://news.ycombinator.com/from?site=blogsystem5.substack....

      • jmmv an hour ago

        That's good, but you need to know what you are looking for. If I click on that link now, I see a bunch of repeated submissions, and due to the nature of this publication, the articles cover very varied topics. So a random person can't easily tell which articles are related to this one and which aren't.

  • WalterBright 10 hours ago

    The Zortech C/C++ compiler had another memory model: handle pointers. When dereferencing a handle pointer, the compiler emitted code that would swap in the necessary page from expanded memory, extended memory, or disk.

    It works like a virtual memory system, except that the compiler emitted the necessary code rather than the CPU doing it in microcode.

    https://www.digitalmars.com/ctg/handle-pointers.html
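    The idea in a toy sketch (not Zortech's actual generated code; frame_for() is a hypothetical pager standing in for the EMS/XMS/disk machinery):

    ```c
    /* A handle pointer is logically a (page, offset) pair. The compiler
       emitted the equivalent of deref_byte() inline at every access, so
       the page was swapped into a conventional-memory frame before the
       raw pointer was formed. */
    typedef struct {
        unsigned page;      /* which virtual page the object lives in */
        unsigned offset;    /* offset within that page */
    } handle_ptr;

    /* Hypothetical pager: makes the page resident, returns its frame. */
    extern unsigned char *frame_for(unsigned page);

    static unsigned char deref_byte(handle_ptr h)
    {
        return *(frame_for(h.page) + h.offset);
    }
    ```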

    Similarly, Zortech C++ had the "VCM" memory model, which worked like virtual memory. Your code pages would be swapped in and out of memory as needed.

    https://digitalmars.com/ctg/vcm.html

    • jmclnx 3 hours ago

      I was a user of Zortech C 1.0. I loved its disp_* functions.

      One program (a .COM) I wrote with it back then is still being used by at least one person. I talked to them a couple of months ago and they said they still use it.

      • WalterBright 13 minutes ago

        Wow! Good to know.

        I used it for Empire, and for my text editor. When moving to Linux, it was easy to convert to using TTY sequences.

    • sitkack 10 hours ago

      That is sort of like inlining the demand-paging code from the OS. When we have exokernels, they exist as a library, so they can be dealt with like regular code.

      This would be trivial (and fun) to implement with Wasm.

      • actionfromafar 4 hours ago

        Are you saying this could be a way to break out of the 32-bit barrier (a bit) on WASM? Sort of like how Windows NT could handle 64 gigs of RAM even though it was a 32-bit operating system?

  • kookamamie 7 hours ago

    There's at least one more "fun" aspect to DOS memory - Borland's Turbo Pascal overlay files: https://secondboyet.com/articles/publishedarticles/theslithy...

  • pjmlp 11 hours ago

    As someone who was already coding in those days, having made the transition from a Timex 2068 to MS-DOS 3.3 and wonderful 5¼-inch floppies, I'd say the article is quite good.

    One thing missing is overlays, which gave us a primitive form of dynamic loading: multiple code segments shared the same memory region, and naturally only one could be active at a time.

    • PennRobotics 9 hours ago

      Some of the early Microprose games used this, and it was clever for two reasons:

      First, more functionality. The minigames and intro/conclusion scenes were their own executables that made use of the original, generated game data. These got loaded into RAM on top of the original executable and then called.

      Second, graphics and sound were also overlays. Rather than having useless-to-most Roland MT-32 code in the binary, it was loaded only if requested. There were overlays for Sound Blaster, PC speaker, and Adlib. If your monitor only supported four colors (CGA), there was an overlay for that.

      A post would be nice, although you basically described most of it. An .OVL file with a non-zero overlay number is loaded into memory with INT 3Fh (although, strangely enough, any interrupt number could be chosen, and the interrupt would also call the desired function after loading it into memory). These overlays are loaded as needed into a shared memory space.
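      For the manual route, without the linker's INT 3Fh stubs, DOS also exposes EXEC subfunction 4B03h ("load overlay"). A sketch assuming real-mode Borland or Microsoft C and <dos.h>, with error handling mostly elided:

      ```c
      #include <dos.h>

      /* EXEC 4B03h parameter block: where to load the file, and what
         relocation factor to apply to the image. */
      struct overlay_params {
          unsigned load_seg;
          unsigned reloc_seg;
      };

      int load_overlay(const char *path, unsigned seg)
      {
          union REGS r;
          struct SREGS s;
          struct overlay_params p;

          segread(&s);                /* start from current segment regs */
          p.load_seg  = seg;
          p.reloc_seg = seg;

          r.x.ax = 0x4B03;            /* INT 21h: EXEC, load overlay */
          s.ds   = FP_SEG((const char far *)path);   /* DS:DX -> name */
          r.x.dx = FP_OFF((const char far *)path);
          s.es   = FP_SEG((void far *)&p);           /* ES:BX -> block */
          r.x.bx = FP_OFF((void far *)&p);
          int86x(0x21, &r, &r, &s);
          return r.x.cflag ? -1 : 0;  /* carry set on failure */
      }
      ```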

      I'd be more curious to see how one would have programmed those overlays in Microsoft C Compiler 3.0. More recent compilers seemed to have better menus and documentation for the memory models, but it seems like Microprose were clairvoyant, squeezing out of version 3.0 every bit of functionality that Watcom/Borland/MS 5.0 later made easier. (Then again, they would have evolved their build system with every successful release and every new hire, plus it was their full-time job to "figure that crap out", and maybe Microsoft improved their approach to overlays in response to Microprose and others calling all the time.)

      The documentation states only one EXE is generated, but Microprose had multiple EXE files. Is it possible those weren't overlays but something very similar? Or did they just change the file extensions? The docs also show the syntax "Object Modules [.OBJ]: a + (b+c) + (e+f) + g + (i)" where everything in parentheses is an overlay. But this isn't elaborated. What are the plus signs? How are these objects grouped? Would their list look like "preload + (cga + mcga + ega + vga) + (nosound + tandy + pcspkr + roland + sb) + (intro) + (newgame) + (maingame) + (minigamea) + (minigameb) + (outro)"? Or would every module be individually parenthesized, with those joined by plus signs being interdependent (i.e., not alternatives)? (One website using BLINK seems to suggest the latter.)

      I know there are a lot of DOS tutorials (FreeDOS YT channel, blog posts) but I haven't found one that does a start-to-finish overlay example.

  • Aardwolf 13 hours ago

    Many things in computing are elegant and beautiful, but this is not one of them imho (the overlapping segments, the multiple pointer types, the usage of 32 bits to only access 1MB, 'medium' having less data than 'compact', ...)

    • akira2501 10 hours ago

      > but this is not one

      It really is, though. Memory was scarce, and thus data _and_ instruction encoding were incredibly important. Physical wires on the circuit board were at a premium then as well. It was an incredibly popular platform because it was highly capable while being stupidly cheap compared to other setups.

      Engineering is all about tradeoffs. "Purity" almost never makes it on the whiteboard.

      • tonyedgecombe 8 hours ago

        The 68000 was from the same era yet it had a 24 bit address bus, enough for 16 MB.

        • actionfromafar 4 hours ago

          And the 68008¹ was developed to overcome this problem of requiring too many data and address lines.

          1: https://en.wikipedia.org/wiki/Motorola_68008

          • gpderetta 4 hours ago

            Sure, but that limitation didn't show up architecturally, other than requiring more cycles to perform a load or store.

        • elzbardico 3 hours ago

          The 68000 was a high-end product; the 8088 was a lot cheaper, in big part because of design decisions like its 8-bit external data bus.

          This design allowed for a smaller chip while keeping (source-level) backwards compatibility with the 8080.

      • Aardwolf 9 hours ago

        But wouldn't allowing plain addition of 1-byte or 2-byte pointer offsets to a current address (just integer addition, no involvement of segments) have been simpler to design and to use, rather than this non-linear system with overlapping segments? It would still allow memory-saving tiny pointers when things are nearby.

        • rep_lodsb 8 hours ago

          The problem is that you can't hold a pointer to more than 64K of address space inside a 16-bit register.

          x86 could easily have had an IP-relative addressing mode for data from the beginning (jumps and calls already had it), but to get a pointer you can pass around and use somewhere other than the current instruction, it has to be either absolute or relative to some other "base" register that stays constant. Like the segment registers.

          • gpderetta 4 hours ago

            Just combining two 16-bit registers into a logical 32-bit address would have been better than the weird partially overlapping address space.

            • wvenable an hour ago

              But then you'd end up wasting memory, because the address space would be divided into 64K blocks. The first PC had only 16KB of RAM, but 128KB was probably more common. With segments set up the way you describe, a 128KB machine could use only 2 segment addresses out of 65,536 -- not very efficient or useful for relocating code and data.

            • rep_lodsb 3 hours ago

              How would you have redesigned the 8086 to do this? And why, other than because of some aesthetic objection to overlapping segments?

              The 286 and 386 in protected mode did allow segments with any base address (24 or 32 bits), so your argument about extending the address space doesn't make sense.

              • gpderetta 3 hours ago

                You explained elsewhere how the overlap is used for relocatability, which is a reasonable justification. But if that were not a concern, non-overlapping segments would have provided a larger address space. I will readily admit that I'm not aware of all the constraints that led to the 8086 design.

                The 386 (I'm not sure how the 286 works) did extend segments to a larger address space by converting them to segment selectors, but that requires a significantly more complex MMU, as it is a form of virtual memory.

    • Joker_vD 12 hours ago

      Yeah, good thing that e.g. RV64 has a RIP-relative addressing mode that can address anywhere in the whole 56 bits of available space with no problems, unlike the silly 8086 that resorted to using a base register to overcome the short size of its immediate fields.

      • akira2501 10 hours ago

        ...and then x86_64 went ahead and added RIP-relative addressing back in, and you get the full 64 bits of address space.

        • Joker_vD 10 hours ago

          ...you know that that's not true, neither for x64 nor RV64, and my comment was sarcastic, right? Both can only straightforwardly address ±2 GiB from the instruction pointer; beyond that, it's "large code model" all over again, with the same inelegant workarounds that have been rediscovered since the late sixties or so. GOT and PLT versus pools of absolute 64-bit addresses, pick the least worst one.

          • akira2501 10 hours ago

            > and my comment was sarcastic, right?

            Pardon me for not realizing and treating it appropriately.

            > with the same inelegant workarounds that have been rediscovered since the late sixties or so

            Short of creating instructions that take 64-bit immediate operands, you're always going to pay the same price: an indirection. It will look different across architectures because each implements it most efficiently in its own way.

            > GOT and PLT versus pools of absolute 64-bit addresses, pick the least worst one.

            Or statically define all those addresses within your binary. That seems more "elegant" to you? You'll have the same problem but your loader will now be inside out or you'll have none of the features the loader can provide for you.

            At that point just statically link all your dependencies and call it an early day.

            • Joker_vD 8 hours ago

              > You're always going to pay the same price: an indirection.

              There is a difference between indirecting through a register and through memory (which in the end also requires a register, in addition to a memory load). On the other hand, I$ is more precious, and the most popular parts of the GOT are likely to be in the voluminous D$ anyhow, so it's hard to tell which is more efficient.

              > Or statically define all those addresses within your binary. That seems more "elegant" to you?

              Of course not. I personally think a directly specifiable 64-bit offset from the base register that holds the start of the data section is more elegant. But dynamic libraries don't mesh too well with this approach, although IIRC it has been tried.

              > you'll have none of the features the loader can provide for you. At that point just statically link all your dependencies and call it an early day.

              This works surprisingly well in practice, actually. Data relocations are still an issue though.

  • nox101 12 hours ago

    I feel like this is missing EMS and XMS memory. Both were well-supported ways of getting more than 640k. EMS worked by page banking: one or two 64k segments of memory would be changed to point to different 64k banks on an add-on memory card. XMS just did a copy instead of a page bank, IIRC. It's been a long time, but I wrote DOS apps that used both standards to support more than 640k of memory.

    https://en.wikipedia.org/wiki/Expanded_memory

    https://en.wikipedia.org/wiki/Extended_memory
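    The EMS page-banking half of that looks roughly like this in real-mode C (LIM EMS services on INT 67h; the AH status returns are left unchecked for brevity):

    ```c
    #include <dos.h>

    /* AH=41h: segment of the 64K page frame in upper memory. */
    unsigned ems_frame_segment(void)
    {
        union REGS r;
        r.h.ah = 0x41;
        int86(0x67, &r, &r);
        return r.x.bx;
    }

    /* AH=43h: allocate n 16K logical pages; returns the EMS handle. */
    unsigned ems_alloc_pages(unsigned n)
    {
        union REGS r;
        r.h.ah = 0x43;
        r.x.bx = n;
        int86(0x67, &r, &r);
        return r.x.dx;
    }

    /* AH=44h: bank-switch a logical page into one of the frame's
       four 16K physical slots. */
    void ems_map_page(unsigned handle, unsigned logical, unsigned char slot)
    {
        union REGS r;
        r.h.ah = 0x44;
        r.h.al = slot;
        r.x.bx = logical;
        r.x.dx = handle;
        int86(0x67, &r, &r);
    }
    ```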

  • globalnode 5 hours ago

    One of the programs I'm most pleased with was a small screensaver .COM program I wrote for DOS (for personal use). Pressing both shift keys at the same time toggled a blank-screen screensaver on/off. There was a similar program released as part of Norton Utilities, but I got my .COM file smaller than theirs using assembly. After relocating the loader code (or was it the PSP? I cannot remember), it was something like 150-ish bytes of code in memory, maybe less :D

  • brudgers 2 days ago

    "DOS Memory Models" brought "QEMM" immediately to mind.

    So possibly related, https://en.wikipedia.org/wiki/QEMM

    • d3Xt3r 7 hours ago

      I was a big fan of JEMM386; it was quite revolutionary when it came out - it used only 192 bytes of memory! A godsend for some demanding DOS games back then.

      And there was also HXRT from the same author, which allowed you to run Win32 apps in DOS. I never really made good use of it, but thought it was still pretty cool.

    • mobilio 8 hours ago

      386MAX user here!

  • PaulHoule 3 days ago

    Today Java has pointer compression, where you use a 32-bit reference but shift it a few places to the left to make a 64-bit address. This saves space on pointers but wastes it on alignment.

    • xxs 11 hours ago

      All allocated objects have their three least significant bits as 0. A Java object cannot be 'too small', as they all have object headers (more if you need a fully blown synchronized/mutex). So with compressed pointers (up to 32GB heaps) all objects are aligned, but then again each pointer is only 4 bytes (instead of 8). Overall it's a massive win.
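      The decode step, illustrated in C (the shape of the trick, not HotSpot's actual source):

      ```c
      #include <stdint.h>

      /* With 8-byte object alignment the low 3 bits of every object
         address are zero, so a 32-bit compressed reference shifted
         left by 3 spans 2^32 * 8 = 32GB of heap. heap_base is zero
         when the whole heap fits in the low 32GB. */
      static uint8_t *heap_base;

      static inline void *decode_ref(uint32_t compressed)
      {
          return heap_base + ((uint64_t)compressed << 3);
      }
      ```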

      • kstrauser 2 hours ago

        Huh, that’s clever! Do you have to choose that at compile or launch time, or does a program start like that and then “grow” when it uses more than 32GB of heap?

        • xxs 28 minutes ago

          In Java you have to set the max heap somehow - either via ergonomics or just the -Xmx command-line option. The max heap is fixed up front (for many reasons; it's set before the main method runs), so if you pick under 32GB it will automatically use compressed pointers (optimizing for size over raw speed). Compressed pointers can of course be switched off via a command-line option as well.

    • o11c 12 hours ago

      It's not wasted on alignment, since that alignment is already required (unless you need a very large heap). Remember that Java's GC heap is only used to allocate Objects, not raw bytes. There are ways to allocate memory outside of the heap, and if you're dealing with that much raw data you should probably be using them.

    • layer8 5 hours ago

      Alignment is required anyway to prevent word tearing, for the atomicity guarantees.

  • GarnetFloride 3 hours ago

    I remember some of that. One of my first jobs was a summer internship where I had to set up the engineering computers. They had AutoCAD and Ventura Publisher, and one used expanded memory and the other extended memory. I set up batch files to copy the right configuration into config.sys and autoexec.bat so they would work. What a nightmare.

  • mycall 7 hours ago

    I recall RBIL [0] having a detailed list of all the interrupts for all the known memory models available. There were many.

    [0] https://en.wikipedia.org/wiki/Ralf_Brown%27s_Interrupt_List

  • stuaxo 2 hours ago

    As a teenage beginner programmer back then, I only had a vague understanding of these (and hadn't even grasped pointers yet); I wish I'd had this article then.

  • ta12653421 9 hours ago

    ah, good ol' REAL computing days :-)

    DJGPP was such an eye-opener back then, and it made things much easier: finally, we were able to have one pointer for linear graphics buffer access; also you could easily save 2MB in memory, and its DPMI was free, compared to the other ones available.

  • o11c 12 hours ago

    It's worth noting that all the memory models have DS=SS, which makes sense for C (where you often take the address of a local variable, though nothing is stopping you from having a separate "data stack" for those) but is a silly restriction for some other languages.

    I'm sure someone took advantage of this, but my knowledge is purely theoretical.

  • dingosity 3 hours ago

    I have such fun memories of x86 real-mode assembly programming. Thx for the stroll down memory lane!

  • wkjagt 5 hours ago

    Precisely the kind of article I love to read. And timely too. I'm just about to fire up an old laptop with MS-DOS and Borland C++ so this will be fun to read alongside that.

  • geon 12 hours ago

    Is this only relevant to real mode, or is it still in use in protected mode and/or x64?

    • Dwedit 12 hours ago

      On 32-bit Windows, segment registers still exist, but they almost always describe segments with base zero: CS (code segment), DS (data segment), ES (extra segment), and SS (stack segment) all use a flat base of zero. But FS and GS are used for other purposes.

      For a 32-bit program, FS is used to point to the Thread Information Block (TIB). GS is used to point to thread-local storage in versions after Windows XP. Programs using GS for thread-local storage won't work on prior versions of Windows (they'll just crash on the first access).
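      A sketch of how user code reaches the TIB through FS (32-bit x86 MSVC assumed; the TIB keeps a flat pointer to itself at FS:[0x18], the NT_TIB.Self field):

      ```c
      #include <windows.h>
      #include <intrin.h>

      /* One FS-relative read of the Self field yields an ordinary
         flat pointer to the structure FS points at. */
      NT_TIB *current_tib(void)
      {
          return (NT_TIB *)__readfsdword(0x18);
      }
      ```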

      x64 made it even more formal: the CS, DS, SS, and ES bases are fixed at zero. 32-bit programs running on a 64-bit OS can't reassign them anymore, but basically no programs actually tried to do that anyway.

      ---

      As for shorter types of pointers being in use: basically, shorter pointers are only used for things relative to the program counter EIP, such as short jumps. With 32-bit protected-mode code, you can use 32-bit pointers and not worry about 64K-sized segments at all.

      ---

      Meanwhile, some x64 programs did adopt a convention to use shorter pointers, 32-bit pointers on a 64-bit operating system. This convention is called x32, but almost nobody used it.

      • rep_lodsb 8 hours ago

        It's quite possible to write a program that uses 32-bit pointers in 64-bit mode: just keep all code and data at addresses below 4G. Such a program will run on any standard x86-64 kernel, because it doesn't use the x32 ABI. x32 is "only" required to support the C library, which expects pointers passed from/to the kernel to be the same size as those in userland.
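        One concrete way to arrange that on Linux, as a sketch: mmap's x86-64 MAP_32BIT flag returns an address in the low 2 GB, so the pointer fits in 32 bits even inside a 64-bit process.

        ```c
        #define _GNU_SOURCE
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/mman.h>

        int main(void)
        {
            /* MAP_32BIT (x86-64 Linux) places the mapping below 2 GB. */
            void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
            if (p == MAP_FAILED)
                return 1;

            uint32_t short_ptr = (uint32_t)(uintptr_t)p;  /* no truncation */
            printf("mapped at %#x\n", short_ptr);
            return 0;
        }
        ```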

        (Things THEY don't want you to know: you can in fact write code in languages which aren't C, don't compile down to C, and don't depend on a C library. Even under Linux.)

        As for reloading segment registers, 64-bit Linux is able to run 32-bit binaries, so there have to be ring 3 code segments for both modes. And there is nothing in the architecture stopping assembly code from jumping between those segments!

        With a 32-bit binary that does this, you get access to all the features of 64-bit mode, with everything in your address space guaranteed to be mapped at an address below 4G. The only point where you need to use 64-bit pointers is in structures passed to syscalls. (for arguments in registers it's done automatically by zero-extension)

      • xxs 11 hours ago

        >some x64 programs did adopt a convention to use shorter pointers, 32-bit pointers on a 64-bit operating system.

        It's doable in managed languages; e.g., Java has compressed pointers by default on sub-32GB heaps. I suppose it's doable even in a C-like setup (including OS calls), but that would require wrappers to bit-shift the pointers on each dereference (and to pass them full-size to the OS and external code).

        • gpderetta 4 hours ago

          Both GCC and the Linux kernel support x32 directly. Distros even shipped system libraries compiled for x32.

          There was no uptake, and I believe it is deprecated today.

          • xxs 26 minutes ago

            With x32 the limit would be 4GB, which is on the low side of things. Having 8-byte alignment (i.e., the last 3 bits zero) allows for 32GB, which is better.

  • pcb-rework 9 hours ago

    Spent many hours in Borland C/C++ 3.1 and Borland Pascal 7, with real mode, unreal mode, and protected mode.

    • zazaulola 7 hours ago

      Yeah. I'd forgotten that Borland's Turbo Vision interfaces had a hamburger menu.

    • mobilio 8 hours ago

      Let's "Make Borland Great Again"!

  • skissane 10 hours ago

    I think it is a pity Intel went with 16 byte paragraphs instead of 256 byte paragraphs for the 8086.

    With 16 byte paragraphs, a 16 bit segment and 16 bit offset can only address 1MiB (ignoring the HMA you can get on 80286+).

    With 256 byte paragraphs, the 8086 would have been able to address 16MiB in real mode (again not counting the HMA, which would have been a bit smaller: 65,280 bytes instead of 65,520 bytes).
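    The arithmetic, spelled out (a real-mode linear address is segment * paragraph_size + offset):

    ```c
    #include <stdio.h>

    int main(void)
    {
        unsigned long seg = 0xFFFFUL, off = 0xFFFFUL;

        /* 16-byte paragraphs: tops out just above 1 MiB (the HMA). */
        printf("seg*16  + off = %#lx\n", seg * 16UL  + off);  /* 0x10FFEF */

        /* 256-byte paragraphs: would have topped out just past 16 MiB. */
        printf("seg*256 + off = %#lx\n", seg * 256UL + off);  /* 0x100FEFF */
        return 0;
    }
    ```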

    • spc476 9 hours ago

      The 8086 was released in '78 (or thereabouts). 64K of RAM was very expensive at the time, and wasting 256 bytes just to align segments would have been extravagant. Also, the 8086 was meant as a stop-gap product until the Intel 432 was released (hint: it never really was, as it was hideously expensive and hideously slow, though bits of it showed up in the 80286 and 80386).

      The 80286 changed how the segment registers worked in protected mode, giving access to 16M of address space, but couldn't change it for real mode, as that would have broken a ton of code. Neither Intel nor IBM thought the IBM PC would take over the market like it did.

      • gpderetta 4 hours ago

        I still do not understand this point: Intel could have used 16 bits from the offset register and 4 bits from the segment register to get non-overlapping segments, leaving the top 12 bits of the segment register unused (either masked out, mirroring the other segments, or trapping). It wouldn't have changed the number of lines needed to address 1M of memory, but it would have made extending the address space later much simpler.

        • rep_lodsb 3 hours ago

          As TFA explains, the purpose of segment registers wasn't just to extend the address space; it was to make code and data relocatable without having to fix up every address referenced.

          They considered 256-byte alignment too wasteful; 64K alignment would have been ridiculous (many business computers at the time didn't even have that much memory).

    • pwg 4 hours ago

      Intel also released both the 8086 and 8088 as 40-pin DIPs.

      Squeezing in four more address pins would have meant multiplexing four more of the pins on the chip. If you exclude power/ground pins, there are only 13 pins that are not multiplexed, and several of those either can't be multiplexed (because they are inputs, i.e., CLK, INTR, NMI) or would have made bus design even more painful than it already is for these chips.

      The 4-bit shift, instead of an 8-bit shift, for the segment registers was likely as big an address bus as they could manage while also fitting the constraint of "fits into a 40-pin DIP".

      https://en.wikipedia.org/wiki/File:Intel_8086_pinout.svg

  • atan2 3 hours ago

    Very good article. Thank you.

  • malthaus 10 hours ago

    this brings back traumatic memories of fiddling for hours with various config files to make games work on DOS back in the day

  • block_dagger 6 hours ago

    Memories of QEMM *shudder*