Facepalm: Microsoft deserves kudos for open-sourcing the MS-DOS 4.00 source code, shedding light on an important milestone in computing history. But the tech giant has bungled the release in a way that may cause needless headaches for historians and archivists eager to study the decades-old code.
The fumble was dumping the source into a Git repository rather than providing a pristine archive. As software curator Michal Necasek of the OS/2 Museum points out, an archive would have been the proper approach: "Historic source code should be released simply as an archive of files, ZIP or tar or 7z or whatever, with all timestamps preserved and every single byte kept the way it was. Git is simply not a suitable tool for this."
By tossing the source into Git, Microsoft may have corrupted the files in multiple ways. Git does not record file modification times, so the original timestamps were lost, stripping away potentially valuable metadata about when each file was last modified. Even worse, a conversion to UTF-8 encoding turned some code into gibberish, breaking the build process.
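To make the timestamp point concrete, here is a minimal Python sketch of how a ZIP archive carries a per-file modification time alongside the bytes, which is the metadata a Git checkout does not retain. The file name `EXAMPLE.ASM`, its contents, and the 1988 date are hypothetical placeholders, not values taken from the actual MS-DOS sources.

```python
import io
import zipfile


def roundtrip_mtime(name, data, mtime):
    """Store one file in an in-memory ZIP, then read back its timestamp and bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ZipInfo lets us set the stored modification time explicitly.
        info = zipfile.ZipInfo(filename=name, date_time=mtime)
        zf.writestr(info, data)
    with zipfile.ZipFile(buf) as zf:
        stored = zf.getinfo(name)
        return stored.date_time, zf.read(name)


# Hypothetical values for illustration only (year, month, day, hour, minute, second).
ILLUSTRATIVE_MTIME = (1988, 6, 17, 12, 0, 0)
stamp, payload = roundtrip_mtime("EXAMPLE.ASM", b"; placeholder\r\n", ILLUSTRATIVE_MTIME)
print(stamp, payload)
```

Both the date and the exact bytes, including the DOS-style line ending, come back unchanged. Note that the classic ZIP format stores times at two-second resolution, so odd seconds would not survive exactly.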
As Necasek emphasizes, decades-old source isn't just text; it's essentially binary data that demands full preservation with no modification whatsoever. Re-encoding it causes breakage, since antiquated tools like MASM 5.10 and Microsoft C 5.1 naturally can't handle Unicode formats like UTF-8 that didn't exist back then.
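A small Python sketch shows why re-encoding is not harmless. It assumes, purely for illustration, that a source line contains bytes above 0x7F from the IBM PC code page 437 (here the box-drawing byte 0xC4); the article does not say which code page or which conversion step was actually involved. The helper names `reencode_as_utf8` and `is_plain_ascii` are hypothetical.

```python
def reencode_as_utf8(raw: bytes, legacy_codec: str = "cp437") -> bytes:
    """Interpret bytes in a legacy code page and write them back out as UTF-8."""
    return raw.decode(legacy_codec).encode("utf-8")


def is_plain_ascii(raw: bytes) -> bool:
    """True if every byte is below 0x80, i.e. a UTF-8 conversion would not change it."""
    return all(b < 0x80 for b in raw)


# A made-up comment line using three code page 437 box-drawing bytes (0xC4).
original = b"; " + bytes([0xC4]) * 3 + b"\r\n"
converted = reencode_as_utf8(original)

print(len(original), len(converted))  # each 0xC4 byte becomes three UTF-8 bytes
print(is_plain_ascii(original))
```

Pure ASCII lines pass through untouched, but every high byte expands into a multi-byte sequence. A tool written before UTF-8 existed reads those as several unrelated characters, which is one plausible way strings or comments end up as gibberish.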
While the availability of MS-DOS 4.00's code is undoubtedly a boon for software historians examining the lineage from MS-DOS to Windows, the GitHubbing approach may have needlessly undermined efforts to build and analyze the code as authentic archival material.
However, one commenter going by the username 'starfrost,' who claims they worked with Microsoft to get this release out, stated under the original piece that they could potentially get the original ZIP file. Timestamps may not be available, though, because "data protection law mandates anonymisation of source files."
Necasek also noted that he was able to build the code in its entirety by copying it to a virtual machine running PC DOS 2000 and running the build process there. So, if you're looking to build, this is the way to go.
Microsoft would still have been wiser to provide the source as a clean ZIP or 7z archive directly from its internal backups, with the original encoding intact and every byte preserved in its original form. Computing's legacy is simply too precious for amateurish antics.
To its credit, Microsoft did go the extra mile by bundling beta binaries from Ray Ozzie's archives, original documentation, and disk images for easy emulation.