As Digital Preservation is part of the agenda of the US Library of Congress, they're holding a workshop on Software Preservation next week, and Mozilla was invited as an expert group. Otto de Voogd and I are in the delegation going there for Mozilla (I'll be roughly in the Washington, DC, area from Saturday until June 2). The text below is a guest post by Otto with questions that we would like some feedback on, so we can represent the Mozilla community as well as possible:
On the 20th and 21st of May the Library of Congress holds a workshop on the topic of preserving software.
Otto de Voogd and Robert Kaiser will be representing Mozilla, putting forward our viewpoint as custodians of a codebase with a significant heritage and importance.
Many questions and thoughts arise. Here's an overview of ours; we look forward to feedback.
- Should archivists keep source code, executables, or both?
Executables and source code are both valuable. Executables are valuable because the source code is sometimes not available, or the build tools are not, and setting up a build environment for older code can be difficult and complex.
Source is valuable to determine how a program works. It also makes it possible to reuse code and algorithms, especially, but not only, in the case of open source software.
- Preserving documentation.
Preserving the documentation that goes with software seems logical.
Would this need to go as far as preserving discussion threads and entries in bug trackers?
- Preserving environments/platforms.
It seems obvious that without preserving an environment in which the software can run, it is going to be impossible to experience the software.
Preserving such an environment should therefore be part of the software preservation effort.
To avoid the physical constraints imposed by preserving old hardware (which would be a preservation effort in its own right), a solution would be to build virtual machines and emulators.
As hardware capacity constantly grows, running virtual versions of older hardware should generally be feasible.
To fully recreate an environment we'd also need to preserve the operating systems and other software tools that the preserved software needs to run.
Since those are software themselves, they would logically already be included in any software preservation effort.
Preserving documentation concerning environments would also be required.
To build virtual machines and emulators, it would be helpful for hardware makers to make technical specifications available. One could envision this becoming a legal requirement, at least for older hardware.
Can we imagine a world where web-based emulators would allow an online digital library to serve users worldwide? Users who would be able to run old software in emulators running in their browsers...
- Is everything worth preserving? If not, how does one go about selecting what is worth preserving?
Does one need to preserve every version of software, just the last version, or all major releases? What about preserving software that has not spread widely? Would there be some threshold, or some other criteria?
- How does one index software and search the library?
There will be a need to gather metadata about software and, as already mentioned, to preserve its documentation. This metadata and documentation could serve to populate an index, enabling, for instance, a search for particular features.
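To make the idea of a feature index a bit more concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions: the record fields, the function names and the sample entries are all made up for this example and do not describe any existing catalogue or any system the Library of Congress uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareRecord:
    """Hypothetical catalogue entry; the field names are illustrative only."""
    name: str
    version: str
    license: str
    features: tuple[str, ...] = ()


def build_feature_index(records):
    """Map each lower-cased feature keyword to the records that list it."""
    index = defaultdict(list)
    for record in records:
        for feature in record.features:
            index[feature.lower()].append(record)
    return index


def search(index, feature):
    """Return the records tagged with the given feature (case-insensitive)."""
    return index.get(feature.lower(), [])


# Made-up sample entries, not real catalogue data.
catalogue = [
    SoftwareRecord("ExampleBrowser", "1.0", "MPL-2.0",
                   ("tabbed browsing", "bookmarks")),
    SoftwareRecord("ExampleMail", "2.3", "proprietary",
                   ("bookmarks", "spam filter")),
]
index = build_feature_index(catalogue)
matches = [record.name for record in search(index, "Bookmarks")]
```

A real library would of course need far richer metadata and full-text search over the documentation, but even a simple keyword-to-record mapping like this shows how gathered metadata could answer a question such as "which preserved programs offered bookmarks?".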
- Can software preservation help in making code reusable?
If there are good ways to actually find relevant and useful code, this could lead to more reuse not only of actual code, but also of algorithms and concepts.
It may also become a valuable source for students who wish to learn about actual implementations of software solutions.
At the very least, a minimum of metadata, such as publication dates, copyright owners and licenses, should be available to determine how certain code can be reused.
In particular for open source software, we believe that software libraries should strive to make it available without restrictions.
- Preserving data formats.
The software preservation effort should also include an effort to preserve data formats, including technical descriptions of those formats and the tools to read, write and edit them.
- Can software preservation help in the discovery of prior art?
We believe it can, and as such preserving old code could be a great tool in preventing the repatenting of existing software concepts.
Of course, we believe that software patents shouldn't exist in the first place, as software is already covered by copyright, but at the very least prior art is a good avenue to prevent some of the worst abuses of software patents.
- How do copyrights affect software libraries?
A lot of software is licensed to be used on a particular piece of hardware or only available via subscription. How does this affect software libraries? Should there be exceptions like there are for traditional libraries?
In the life cycle of software, the commercially exploitable time is limited; anything older than 10 years likely no longer has any commercial value.
Maybe the copyright term on software should be significantly reduced to something like 10 years, which is more than enough to cover the commercially exploitable timeframe of the software life cycle.
Such a limit would greatly enhance the work of software libraries, increasing availability and ease of access as well as removing a lot of the red tape involving requests for permission to keep copies.
- What about software as a service?
And what about software as a service, where neither the source code nor the executables are ever published? How can something like Gmail be preserved, when neither the service's code nor the environment is available to the public?
- Preserving "illegal" or cracked copies?
What if a copy of a piece of software comes from an illegal source? A cracked version with modifications, maybe? Such copies have value in themselves, as they are a form of cultural expression.
What if such an illegal copy is the only copy still available? Would it make sense to preserve that too?