ActivePapers

ActivePapers is a data format for storing scientific data, executable code working on that data, and human-readable documentation in a single file. An ActivePapers file can be archived, shared with collaborators, and submitted as supplementary material for a scientific publication. Ultimately, ActivePapers files could replace PDF files in the electronic publishing process.
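
To make the single-file idea concrete, here is a minimal sketch in Python. It is not the ActivePapers format or its API: a ZIP archive from the standard library serves as a stand-in container, and the function names (`build_bundle`, `run_bundle`) and member paths are hypothetical choices made for this illustration. It only shows data, code, and documentation traveling together in one file, with the stored code run on the stored data.

```python
import io
import json
import zipfile


def build_bundle(data, code, documentation):
    """Pack data, code, and documentation into one in-memory archive.

    The member paths are hypothetical; they are not the ActivePapers layout.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data/measurements.json", json.dumps(data))
        archive.writestr("code/analysis.py", code)
        archive.writestr("documentation/README.txt", documentation)
    return buffer.getvalue()


def run_bundle(bundle_bytes):
    """Read the stored data and run the stored code on it."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        data = json.loads(archive.read("data/measurements.json"))
        code = archive.read("code/analysis.py").decode("utf-8")
    namespace = {"data": data}
    # exec() provides no sandboxing; only run code from a trusted bundle.
    exec(code, namespace)
    return namespace["result"]


# The stored code is expected to leave its output in a variable named "result".
ANALYSIS_CODE = "result = sum(data) / len(data)\n"

bundle = build_bundle(
    [1.0, 2.0, 3.0, 6.0],
    ANALYSIS_CODE,
    "Mean of four sample values.",
)
mean_value = run_bundle(bundle)
print(mean_value)
```

Because the archive is a single byte string, it can be written to disk, archived, or sent to a collaborator as one unit, and the recipient can recompute the result from the data and code it contains.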

The ActivePapers system was submitted to Elsevier’s Executable Paper Grand Challenge and was one of the finalists. A short description has been published in the ICCS 2011 proceedings (free access). Because this description was written before the implementation existed, it is somewhat vague on some points. A report on the use of ActivePapers in a few research projects has been published more recently.

Unlike most other approaches to integrating data and code into scientific research and publication practice, ActivePapers places the emphasis on the data rather than on the tools used to work on it. Tools tend to change rapidly as computer technology advances. Today we are all excited about Web 2.0, but ten years from now we may have moved to peer-to-peer networking of mobile computing devices and be staring at today’s desktop and server hardware in museum showcases. If we want scientific data to survive such changes (at least some of it deserves to), we need to think now about how we store it. The ActivePapers format is designed to be usable in all of today’s computing environments (desktop, server, smartphone, …), and the required support infrastructure is small enough that it can be ported to future computer generations with reasonable effort.

There are currently three implementations of the ActivePapers system. The original implementation, based on the Java Virtual Machine, is best described as a “proof of concept”. The essential elements are there, but the runtime library needs to be more complete before it can be put to practical use. This implementation consists of a command-line tool for working with ActivePapers that should work out of the box under Linux and Mac OS X, and with little effort under Windows. There are also a few simple examples and a tutorial.

The second implementation of ActivePapers is based on the Python language. While it lacks some of the security features of the JVM edition (because the Python platform does not permit their implementation), it has the advantage of supporting the enormous scientific computing ecosystem that has developed around Python, and is thus immediately much more useful. On the other hand, the Python platform is quite fragile, so the long-term reproducibility of ActivePapers based on it should be expected to be weak.

The third and youngest implementation is based on the Pharo system and focuses on the user interface aspects of ActivePapers. It is a work in progress, to the point that Pharo-based ActivePapers are hard to reproduce at the moment.

For more details, see