Digital archives to last the centuries

Quiz: if you want to write a computer program to run 200 years from now, what programming language should you use?

I’ve got a guess, and while I won’t be around to find out, it’s an experiment I plan to perform. But first some background.

Digital archiving is harder than it looks, even if you ignore hardware deterioration. There’s only one file format I can reasonably say has stood the test of time: 7-bit ASCII (a.k.a. plain text). It’s the one format that all of the computers I’ve used have been able to handle, from my dad’s Heathkit H-89 to my Palm Pilots. In contrast, the letters I wrote in Word 1.0 in junior high were unreadable by Word 5.0 in college.

When I discovered HTML in 1993, I immediately saw it as a credible replacement for ASCII. The main reason was that it was built on top of ASCII: even if all the web browsers went away, you could still understand it by reading it as straight ASCII. What’s more, it had cross-platform mass appeal. (Or at least the potential for it: it hadn’t actually caught on yet.) And it had been designed with both backwards and forwards compatibility in mind: the specification had rules for how old browsers should handle unknown or newer versions of HTML. At the time, most file formats were more like Word 1.0: tied to a specific program version, with no guarantee that newer versions would be able to read them.

I still think HTML is the most future-proof file format after ASCII. HTML is now 20 years old, and the latest browsers can still handle the oldest web pages. These days, HTML is more verbose and is rarely written by hand, but most pages can still be read as plain text by a determined reader. What’s more, it continues to evolve gracefully, so it’s unlikely to get replaced the way MP3 replaced MOD (remember MOD?) or JPEG 2000 could replace JPEG.

Which brings me to the Seventh Generation project. I’m thinking of not just printing out letters, but also making them available online. I don’t think leppik.net will still be around, but someone will have archives. So how do I keep people from reading the letters too early? I plan to encode them and provide a script that will decode them at the appropriate time. (No fancy encryption; that wouldn’t last. ROT-13 should be enough to keep the honest from accidentally seeing too much.)
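For the curious, here is roughly what ROT-13 looks like in JavaScript. This is only a sketch, not necessarily the code I’ll end up shipping; the function name rot13 is my own choice, and I’ve stuck to old-fashioned syntax on the theory that the oldest parts of a language are the least likely to disappear.

```javascript
// ROT-13: rotate each ASCII letter 13 places around the alphabet.
// Digits, punctuation, and non-ASCII characters pass through unchanged.
// Because 13 + 13 = 26, the same function both encodes and decodes.
function rot13(text) {
  return text.replace(/[A-Za-z]/g, function (ch) {
    var code = ch.charCodeAt(0);
    var base = code >= 97 ? 97 : 65; // 97 is 'a', 65 is 'A'
    return String.fromCharCode(base + (code - base + 13) % 26);
  });
}
```

Running the function twice gives back the original text, which is why one button is enough for both directions.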

So what language do you use for scripts in HTML? JavaScript is the only one that every browser supports. And JavaScript is still around not because of any technical merits, but because it’s the universal scripting language for HTML. There are many open source JavaScript interpreters out there, and a standards body ensures backwards compatibility. That’s about as good as you’re going to get. All the C code in the world might get replaced over the next century, but JavaScript is required for a standards-compliant web browser. Indeed, few have ever said JavaScript is a good language, but getting all the browser vendors to switch to a better one proved impossible.

So I’ll put my letters online, encoded to discourage casual reading, and at some point in the future, a simple JavaScript program (with no external dependencies) will add a “decode” button to the page. Will it work? Two hundred years is a long time, and only time will tell.
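Here is a minimal sketch of how such a script might look. Everything specific in it is an assumption made for illustration: the unlock year, the element id "letter", and the idea that the encoded letter sits as plain text inside a single element. It uses only the page’s own clock, so anyone impatient can simply read the source, which is fine for a lock meant only for the honest.

```javascript
// Hypothetical placeholders: pick the real year and element id per letter.
var UNLOCK_YEAR = 2200;
var LETTER_ID = "letter";

// ROT-13 is its own inverse, so this one function encodes and decodes.
function rot13(text) {
  return text.replace(/[A-Za-z]/g, function (ch) {
    var code = ch.charCodeAt(0);
    var base = code >= 97 ? 97 : 65; // 97 is 'a', 65 is 'A'
    return String.fromCharCode(base + (code - base + 13) % 26);
  });
}

// True once the reader's calendar year has reached the unlock year.
function isUnlocked(now, unlockYear) {
  return now.getFullYear() >= unlockYear;
}

// Inserts a "Decode" button right after the letter element.
// Clicking it replaces the element's text with its ROT-13 rotation.
// Returns the button, or null if the letter element is missing.
function addDecodeButton(doc, letterId) {
  var letter = doc.getElementById(letterId);
  if (!letter) {
    return null;
  }
  var button = doc.createElement("button");
  button.appendChild(doc.createTextNode("Decode"));
  button.onclick = function () {
    letter.textContent = rot13(letter.textContent);
  };
  letter.parentNode.insertBefore(button, letter.nextSibling);
  return button;
}

// Only touch the page when running in a browser and the date has arrived.
// The script tag should come after the letter so the element already exists.
if (typeof document !== "undefined" && isUnlocked(new Date(), UNLOCK_YEAR)) {
  addDecodeButton(document, LETTER_ID);
}
```

Passing the document and the date in as arguments keeps the two small functions easy to check today, without waiting two centuries.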
