Prometheus Blogs

The Downside of Documents


Information has become the lifeblood of most organizations. It gets pumped around businesses, agencies, and institutions in the form of spreadsheets, word processor documents, presentations, PDFs, videos, and the like. It’s sent by email, posted on intranets, transferred over Bluetooth, and hand-carried on memory sticks or optical media.

What may be less obvious is the possibility of better methods. Most people and organizations are tied to their documents. They’ve been using documents for years. Documents are “how things work”, and anything else is just about unthinkable. Yet documents have a lot of problems. Many of these problems have solutions (sometimes simple ones) that nevertheless rarely get implemented.

[Image: a person swamped by paper documents]

This problem never really went away. Technology just replaced ink with electrons.

Documents are difficult to control. It’s easy to leak data through them by mistake. I was recently sent a supposedly blank document by a lawyer who had forgotten to remove a client’s name and Social Security number, both in plain view. Less visible confidential information that was thought to have been removed may still be buried in the document’s change history. Meta-information is an even bigger problem, because few people think to check it. The NSA’s suggestions for scrubbing documents are complicated and depend on which version of the software created the document. Think document meta-information isn’t important? Tools like FOCA can extract meta-information, such as user names and software versions, that is useful for planning phishing attacks.
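To see how little effort it takes to read this kind of meta-information, here is a minimal Python sketch. It assumes a Word .docx file, which is a ZIP archive of XML parts; the part docProps/core.xml holds core properties such as the author and the last person to modify the file. The function name `read_core_properties` and the particular fields selected are illustrative choices, not the behavior of FOCA or any other specific tool, and the sketch ignores other places metadata can hide (custom properties, comments, tracked changes).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core-properties part (docProps/core.xml).
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative selection of fields that reveal who worked on a file and when.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "title": "dc:title",
}


def read_core_properties(docx_file):
    """Return selected core metadata from a .docx (path or binary file object)."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for key, path in FIELDS.items():
        element = root.find(path, NAMESPACES)
        # Only report fields that are present and non-empty.
        if element is not None and element.text:
            result[key] = element.text
    return result
```

Calling `read_core_properties("report.docx")` (a hypothetical file name) returns a small dictionary of whatever of those fields the author's software filled in, which is often more than the author realized.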

Documents can be a security risk when their format supports executable code such as macros.

Documents tend to proliferate. They get copied between laptops and desktops, copied between work and home, forwarded to other people, backed up, and so on. Often these copies are modified and copied again, ad nauseam. Can you point to the most correct version of each of your documents?

Documents are rarely version controlled. It’s easy to grab an obsolete version, miss an update, misplace the latest version, and so on. More likely than not, your time has been wasted in meetings where people were working from different versions. It can also be hard to tell who made which changes.

Documents are easily lost. Even compulsive hoarders who never delete an email can have trouble finding the attachments they want. People rarely think ahead about future searches when they write subject lines or message text. Even if the right words are present, a typo can keep you from finding the document you need.

Documents can be difficult to compare. One reason is that most file formats are not “stable”: the smallest change to a document can produce large changes in how the information is laid out inside the file. That problem is largely addressed by format-aware comparison tools (for example, Word has a document comparison feature). These tools are not very helpful, however, when there are many changes, or when the same content is present but organized very differently.
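The idea behind format-aware comparison can be sketched in a few lines of Python: instead of diffing the raw bytes of two files, extract the content and diff that. This minimal sketch assumes .docx input, where the body text lives in word/document.xml as paragraph (`w:p`) elements containing text runs (`w:t`). The function names are hypothetical, and the sketch deliberately ignores formatting, tables, headers, and tracked changes, so it is only an illustration of the approach, not a replacement for Word’s own comparison.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs(docx_file):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # A paragraph's text may be split across several runs; join them back up.
    return ["".join(t.text or "" for t in p.iter(W + "t"))
            for p in root.iter(W + "p")]


def text_diff(old_file, new_file):
    """Unified diff of paragraph text, independent of the files' byte layout."""
    return list(difflib.unified_diff(
        paragraphs(old_file), paragraphs(new_file),
        fromfile="old", tofile="new", lineterm=""))
```

Note that this approach has the same weakness described above: if the same content is reorganized, the paragraph-level diff will still be large.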

Documents often have unwarranted structural differences. Even when a template is provided and authors are asked to adhere to it, they rarely do. People adjust the structure to suit themselves, regardless of how much that complicates working with the documents later.

Documents are difficult to consume electronically, and difficult to mine for information. Sure, anything that’s stored electronically can in principle be read and processed by a machine, but that isn’t practical yet in the vast majority of cases. Documents are organized for use by humans, and artificial intelligence isn’t yet up to reliably extracting meaning from content that’s made for human interpretation.

Admittedly, it’s not technically difficult to set up a version control system, a document control system, or even just a shared repository for documents. But even when that’s done, users may fail to use it, or use it incorrectly.
