What will future historians do?
The other day I was looking for something in the Washington Post and, not finding it, went looking for it on the paper’s website. I discovered that there was a good deal more information, gathered by their reporters and by others, than appears in the print edition. This made me wonder what happens to the web version of this and other newspapers with similar services.
I’ve done a lot of work in nineteenth-century NYC newspapers. Sometimes discovering how an event unfolded depends on learning when people became aware of something, at a time when the chief medium for disseminating information was the newspaper. Some of the papers had morning and evening editions. These newspapers still exist, at least on microfilm, and provide the physical traces of the past that historians use in their reconstructions. Often enough, written traces are all that remain. Now, of course, fewer and fewer traces are being left on paper. I once heard the story that Cardinal Spellman, having read John Tracy Ellis’s biography of Cardinal Gibbons, said, “They’ll never be able to do that with me. I do everything by telephone.”
I’ve been wondering what future historians will have to work with. How much of what a newspaper, especially a “paper of record,” is content to put on its website but not in its print edition is being preserved at all? How will it be possible to know, for example, how much information on a particular person or event was publicly available, or when it became available? When a newspaper changes what is available on its website, what happens to the previous version? Is there any record of when things were posted on a website?
The questions could be multiplied, of course, with regard to all those other sources of information now available, trustworthy or not. When a journal of opinion ceases its print edition and goes all-net, what trace of it survives?
It strikes me that future historians may have far fewer traces of the past to work with. I was wondering what historians might have to say about this.



Yes, it’s a big problem and one that archivists are grappling with: how do we “save” the web? The good news, relatively, is email. Even a Spellman would have a massive email correspondence now, and much of this would have been saved. Here the problem will be too much information, not too little.
Another bit of really good news is searchability. I’ve worked with long runs of nineteenth-century periodicals and well-stocked publishers’ archives for a long time, and everyone always complained about the difficulty of gaining control over the chaotic stream of data available even in print or manuscript. But as soon as a periodical becomes available in electronic form, it becomes searchable and controllable in ways we never imagined. The good old days were never like this!
Susan, the searchability is wonderful, and I’ve used it to great advantage many times. My question is: what will future historians have before them to search?
John: why do you think much of e-mail is being saved? How much?
A future historian might find it interesting to search the Commonweal blog. Will it be preserved? That’s a good question for the editors.
Let me press my point by using the present primary elections. Suppose five years from now, heck, two years from now, someone wanted to do a history of the presidential campaigns. This, it seems, is now the second presidential campaign in which the Internet will play a role, perhaps a major one. How will it be possible to investigate that role? Do websites archive their material?
Let’s imagine that a story originates and spreads on the Internet that winds up greatly aiding or greatly undercutting one candidate’s campaign. If the story is true, but even more if it is false, a libel, say, will it be possible to identify its source, ascertain when and by what means and how far it spread? Etc., etc. God knows how many blogs there are–millions, probably. Great material for researching, and there are instruments for searching at least some of it even now. But how long will this be within the grasp of Google’s inquisitive eyes?
I imagine someone must be thinking about it.
Joe,
In terms of long term preservation, newsprint is practically the worst thing available. Leave a newspaper out in the sun for just a day, and it will start to yellow and turn brittle. Perhaps some publishers have started to address this, but I doubt it. Everything from 1850 on will probably be unreadable by the end of this century.
The Library of Congress has long since moved the majority of its preservation efforts to electronic media, preserving searchable databases of texts, and photos where databases are not available. (LC’s attempts to deacidify paper blew up a couple of buildings, so the electronic choice seemed safer.)
Websites like the NY Times’s keep archives, presumably of all that is posted online (even ads?). There are some places devoted to archiving just about everything on the web that they can, e.g., the Wayback Machine. That is a little more iffy than individual archiving.
The major problem with electronic archives is retrieval. I can pick up a hundred year old newspaper and try to read it before it falls apart in my hands. I can pick up a twenty year old floppy disk and might not even know what it is, let alone how to get information from it.
Maybe we should bury urns full of info on parchment in the Egyptian desert. That has preserved some texts for a couple of millennia.
Thanks for the reference to the archiving sites, Jim. The Internet Archive and its associated sites seem to be a worthy effort, though Fr. Komonchak is right about the enormity of the task. Here’s a site that gives a kind of snapshot of the efforts being made by the likes of the American Library Association, the Library of Congress and the Internet Archive. I wonder what kind of funding these efforts have, though.
Sorry! Here’s the site:
http://www.archive.org/about/about.php
I have many of the same questions as Fr. Komonchak does. I can see the problems with long-term preservation of paper, though thankfully we have documents from millennia ago. (Yet who knows how much genius was lost to nature’s elements.) But what about electronic archiving? I know little about it, but Jim McK raises a good point about retrieval. Wouldn’t electronic data have to be re-stored (say, every 20 years) to ensure that it is kept in a form compatible with the retrieval systems of the day? As we know, electronic systems are evolving almost at light speed. Also, wouldn’t it be necessary to store such electronic information in multiple repositories to ensure against loss? Absent multiple repositories, any loss of information could be immense. It would be one thing to lose an archival newspaper for some reason, but perhaps catastrophic to lose hundreds of thousands of newspapers that had been digitized onto a single CD with no backup.
Even if all the online archival efforts at the Times and the Post work technically quite well over time–as I suspect they will–I’m still concerned about a couple of other things:
- What about smaller papers, which often don’t leave articles posted for more than a few weeks or so?
- What about corrections and updates? While papers will often note that articles have been changed, the original text itself, usually unavailable online, can be extremely useful to researchers–especially when the incorrect/incomplete version is itself an interesting story, or sheds light on one. (And it’s hard to imagine the picture of Truman holding up the “Dewey Defeats Truman” copy of the Tribune having the same impact if he were instead holding up his laptop, on which some blogger had managed to capture a screenshot of the faulty headline in the short time before it was forever taken down…)
- washingtonpost.com is not really the site of the Washington Post–it’s technically a separate endeavor, w/ separate staff, etc. So relying on the site as an archive of the printed paper, while in general a useful thing, gets more complicated in the context of questions of editorial accountability, etc.
When people die without leaving printouts of emails or a list of passwords, I guess their emails die with them.
Kinda makes spellcheck seem unnecessary.
Our wonderful Michigan State University librarians digitize print materials that are falling apart. They have a collection of 19th-century Sunday school books and a big collection of cookbooks, as well as newspapers and other records.
The librarian who developed the digitization program gives seminars to public libraries about how to manage similar programs so that local records–including back issues of the local newspaper–can be converted into electronic formats.
The rising interest in genealogical research has helped push public libraries to preserve more local historical documents.
Digitization reduces the amount of handling of the originals and allows them to be kept longer, and also makes them more widely available.
As Jim noted, newspaper falls apart quickly, and it also takes up a lot of space in the library. Digitizing it means libraries can store more stuff in the space available.
What to keep in paper form and what to digitize and pitch has been a big debate among librarians and archivists. There was a fellow four or five years ago freaking out and writing books about how libraries shouldn’t throw anything away. But, practically speaking, the amount of information that exists today is such that you can’t keep it all.
The radio news this morning here in DC reported that forty years of computerized data, including data for court cases, had been erased through human error or some computer glitch.
40 years of court case data in DC has been erased?!
[Note to Self: No need to finish that pardon request to Pres. Bush.]