Preserving Software - Feedback Requested!
On the 20th and 21st of May the Library of Congress holds a workshop on the topic of preserving software.
Otto de Voogd and Robert Kaiser will be representing Mozilla, putting forward our viewpoint as custodians of a codebase with a significant heritage and importance.
Many questions and thoughts arise. Here's an overview of ours; we look forward to feedback.
- Should archivists keep source code, executables, or both?
Executables and source code are both valuable. Executables are valuable because the source code is sometimes not available, or perhaps the build tools are not, and setting up a build environment for older code can be difficult and complex.
Source is valuable to determine how a program works. It also makes it possible to reuse code and algorithms, especially, but not only, in the case of open source software.
- Preserving documentation.
Preserving the documentation that goes with software seems logical.
Would this need to go as far as preserving discussion threads and entries in bug trackers?
- Preserving environments/platforms.
It seems obvious that without preserving an environment in which the software can run, it is going to be impossible to experience the software.
Preserving such an environment should therefore be part of the software preservation effort.
To avoid the physical constraints imposed by preserving old hardware (which would be a preservation effort in its own right), a solution would be to build virtual machines and emulators.
As hardware capacity constantly grows, running virtual versions of older hardware should generally be feasible.
To fully recreate an environment we'd also need to preserve the operating systems and other software tools that the preserved software needs to run.
Those, being software themselves, would logically already be included in any software preservation effort.
Preserving documentation concerning environments would also be required.
To build virtual machines and emulators it would be helpful for hardware makers to make technical specifications available. One could envision this becoming a legal requirement, at least for older hardware.
Can we imagine a world where web based emulators would allow an online digital library to serve users worldwide? Users who would be able to run old software in emulators running in their browsers...
- Is everything worth preserving? If not, how does one go about selecting what is worth preserving?
Does one need to preserve every version of software, just the last version or all major releases? What about preserving software that has not spread widely? Would there be some threshold, or some other criteria?
- How does one index software and search the library?
There will be a need to gather metadata about software, in addition to preserving documentation as we already mentioned. This metadata and documentation could serve to populate an index, enabling for instance a search for particular features.
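As a rough illustration of what such an index could look like, here is a minimal Python sketch. The record fields, the `SoftwareRecord` name and the sample entries are all hypothetical, made up for this example; a real library would need a much richer metadata schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SoftwareRecord:
    """Hypothetical minimal metadata for one preserved piece of software."""
    title: str
    version: str
    published: str  # ISO date string, e.g. "2001-01-01"
    license: str
    features: Tuple[str, ...] = ()


def build_feature_index(records):
    """Map each lower-cased feature name to the set of records listing it."""
    index = defaultdict(set)
    for record in records:
        for feature in record.features:
            index[feature.lower()].add(record)
    return index


def search(index, *features):
    """Return records that list every requested feature, sorted by title and version."""
    if not features:
        return []
    candidate_sets = [index.get(f.lower(), set()) for f in features]
    matches = set.intersection(*candidate_sets)
    return sorted(matches, key=lambda r: (r.title, r.version))


# Made-up sample entries, purely for illustration.
records = [
    SoftwareRecord("ExampleBrowser", "1.0", "2001-01-01", "MPL-1.1",
                   ("tabbed browsing", "popup blocking")),
    SoftwareRecord("ExampleBrowser", "2.0", "2003-01-01", "MPL-1.1",
                   ("tabbed browsing", "popup blocking", "feed reader")),
    SoftwareRecord("ExampleMail", "1.0", "2002-01-01", "GPL-2.0",
                   ("feed reader",)),
]
index = build_feature_index(records)
hits = search(index, "Tabbed Browsing", "feed reader")
```

Here `hits` contains only the ExampleBrowser 2.0 record, since it is the only entry listing both features. An inverted index like this is just one simple option; full-text search over the preserved documentation would be another.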
- Can software preservation help in making code reusable?
If there are good ways to actually find relevant and useful code, this could lead to more reuse not only of actual code, but also of algorithms and concepts.
It may also become a valuable source for students who wish to learn about actual implementations of software solutions.
At the very least a minimum of metadata, such as publication dates, copyright owners and licenses, should be available to determine how certain code can be reused.
In particular for open source software, we believe that software libraries should strive to make it available without restrictions.
- Preserving data formats.
The software preservation effort should also include an effort to preserve data formats, including technical descriptions of those formats and the tools to read, write and edit them.
- Can software preservation help in the discovery of prior art?
We believe it can, and as such preserving old code could be a great tool in preventing the repatenting of existing software concepts.
Of course we believe that software patents shouldn't exist in the first place, as software is already covered by copyrights, but at the very least prior art is a good avenue to prevent some of the worst abuse of software patents.
- How do copyrights affect software libraries?
A lot of software is licensed to be used on a particular piece of hardware or only available via subscription. How does this affect software libraries? Should there be exceptions like there are for traditional libraries?
In the life cycle of software, the commercially exploitable time is limited; anything older than 10 years likely no longer has any commercial value.
Maybe copyrights on software should be significantly reduced to something like 10 years, which is more than enough to cover the commercially exploitable timeframe of the software life cycle.
Such a limit would greatly enhance the work of software libraries, increasing availability and ease of access as well as removing a lot of the red tape involving requests for permission to keep copies.
- What about software as a service?
And what about software as a service, where neither the source code nor the executables are ever published? How can something like Gmail be preserved, when neither the service's code nor the environment is available to the public?
- Preserving "illegal" or cracked copies?
What if a copy of a piece of software comes from an illegal source? A cracked version with modifications, maybe? Such copies have value in themselves, as they are a cultural expression.
What if such an illegal copy is the only copy still available? Would it make sense to preserve that too?
Entry written by KaiRo and posted on May 17th, 2013 00:08 | Tags: history, Mozilla, preservation, software
"Would this need to go as far as preserving discussion threads and entries in bug trackers?"
Discussion threads: Probably yes. My initial thought was to just preserve the -devel mailing list (development topics), but a -users mailing list (end user questions and support) might be just as important, and serve as a fallback to look for answers where the real end user documentation (user manual, etc.) doesn't provide a good answer.
Bug tracker entries: Definitely. Imagine a source code comment line that says "We need to reevaluate the query here because of bug 4711." - people studying the source code will have an easier time understanding decisions and motivations if they can read the history of a bug report (and maybe alternative tries at solving the problem and why the alternative solutions weren't chosen).
Ideally all URLs referenced in source code comments should also be mirrored (wayback machine style). Imagine a tricky algorithm that is based on some scientific paper or article somewhere on the web (with link in the source code) - getting access to that paper/article might be very helpful (or even essential) for understanding the algorithm's inner workings.
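Finding those references could be partly automated. The following Python sketch pulls URLs and "bug NNNN" mentions out of a piece of source text, so that an archiver would know which pages and bug reports to mirror. The regular expressions and the `extract_references` name are assumptions for illustration only; for simplicity the sketch scans the whole text rather than parsing out comments, and real code bases reference bugs in many other styles.

```python
import re

# Simplified patterns; real-world URLs and bug references vary widely.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")
BUG_PATTERN = re.compile(r"\bbug\s+#?(\d+)", re.IGNORECASE)


def extract_references(source_text):
    """Collect the distinct URLs and bug numbers mentioned in source_text."""
    # Strip trailing sentence punctuation that the URL pattern may swallow.
    urls = sorted({u.rstrip(".,;") for u in URL_PATTERN.findall(source_text)})
    bugs = sorted({int(n) for n in BUG_PATTERN.findall(source_text)})
    return {"urls": urls, "bugs": bugs}


# Made-up snippet; example.org is a placeholder domain.
sample = """
// We need to reevaluate the query here because of bug 4711.
/* Algorithm based on https://example.org/papers/sorting.html. */
"""
references = extract_references(sample)
```

For the sample above, `references` lists bug 4711 and the example.org URL. The actual mirroring of each URL and the export of each bug report would be separate steps, left out here.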
from San Francisco
This is great. I love your answers and especially making sure that the archive project tries to license the archives in as free and open a manner as possible.
I also agree that bug trackers, forums, and comments will be useful for future research, for people who want to redeploy software on future platforms, or for historians and data scientists. Thanks so much for reporting on this process!