Provenance and versioning
A description of versioning issues with regard to provenance.
Background
In Kepler 2.0, saving a KAR results in a KAR with KAR-Version: 2.0. The manifest contains an incomplete module-dependencies list. For example, a KAR saved from the kepler suite using the 2.0 branch or released kepler 2.0.0 will have module-dependencies:
core
In Kepler 2.1, saving a KAR results in a KAR with KAR-Version: 2.1. The manifest contains the full module-dependencies list. For example, a KAR saved from the kepler suite on the 2.1 branch will have module-dependencies:
kepler-2.1;outreach-2.1;apple-extensions-2.1;r-2.1;loader-2.1;actors-2.1;directors-2.1;opendap-2.1;dataturbine-2.1;ecogrid-2.1;authentication-gui-2.1;module-manager-gui-2.1;gui-2.1;authentication-2.1;repository-2.1;job-2.1;io-2.1;ssh-2.1;data-handling-2.1;sms-2.1;component-library-2.1;util-2.1;event-state-2.1;core-2.1;common-2.1;module-manager-2.1;configuration-manager-2.1;kepler-tasks-2.1;ptolemy-8.0
and when Kepler 2.1.0 is released, a KAR saved from the kepler suite will have module dependencies:
kepler-2.1.0;outreach-2.1.0;apple-extensions-2.1.0;r-2.1.0;loader-2.1.0;actors-2.1.0;directors-2.1.0;opendap-2.1.0;dataturbine-2.1.0;ecogrid-2.1.0;authentication-gui-2.1.0;module-manager-gui-2.1.0;gui-2.1.0;authentication-2.1.0;repository-2.1.0;job-2.1.0;io-2.1.0;ssh-2.1.0;data-handling-2.1.0;sms-2.1.0;component-library-2.1.0;util-2.1.0;event-state-2.1.0;core-2.1.0;common-2.1;module-manager-2.1.0;configuration-manager-2.1.0;kepler-tasks-2.1.0;ptolemy-8.0.0
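To make the format of these lists concrete, here is a minimal sketch of splitting a module-dependencies value into module names and versions. It is written in Python for illustration only and is not Kepler's own code. The function name `parse_module_dependencies` is hypothetical, and the rule that a version is a trailing hyphen followed by dot-separated digits is an assumption drawn from the examples above.

```python
import re

# Assumption: a version suffix is a final hyphen followed by dot-separated
# digits, e.g. "apple-extensions-2.1" -> ("apple-extensions", "2.1").
_VERSIONED = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)$")


def parse_module_dependencies(value):
    """Split a module-dependencies string into (name, version) pairs, in order.

    Entries without a version suffix (as in the 2.0 example, "core") get
    a version of None.
    """
    modules = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing or doubled separator
        match = _VERSIONED.match(entry)
        if match:
            modules.append((match.group("name"), match.group("version")))
        else:
            modules.append((entry, None))
    return modules


print(parse_module_dependencies("core"))
print(parse_module_dependencies("apple-extensions-2.1;ptolemy-8.0"))
```

Matching the version from the end of the entry keeps hyphenated module names such as module-manager-gui intact.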
Using these module-dependencies lists, the 'Import Module Dependencies' feature was made to work. When attempting to open a KAR, if Kepler isn't currently running the required module-dependencies, the user is prompted to download (if necessary) the missing module(s), and to restart Kepler using them. For example, if user A saves a KAR in the reporting suite, and gives it to user B, who is running the kepler suite, user B will be prompted to download the reporting modules when he attempts to open the KAR.
How often you're prompted is determined by a new user preference within Kepler, 'KAR opening compliance mode', which may be Strict or Relaxed:
* Strict * In order to open a KAR in Strict mode, you must be running Kepler with the exact same modules, in the same order, that the KAR was created with. You will be prompted if you need to change or install modules. This may mean restarting with an older version of a module you're currently using. Strict mode enables maximum compatibility.
* Relaxed * In order to open a KAR in Relaxed mode, you need only be running, in any order, versions equal to or newer than those of the modules used to create it. Maximum compatibility is not guaranteed, however.
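The two modes can be sketched as a single check. This is an illustrative Python sketch under stated assumptions, not Kepler's implementation: the names `split_module` and `can_open` are hypothetical, every entry is assumed to carry a version suffix (as in KARs saved by 2.1), and versions are assumed to compare numerically component by component.

```python
def split_module(entry):
    """Split 'name-version' on the last hyphen (assumed naming convention)."""
    name, _, version = entry.rpartition("-")
    return name, tuple(int(part) for part in version.split("."))


def can_open(kar_modules, running_modules, mode):
    """Return True if a KAR may be opened without prompting the user.

    strict:  the running modules must equal the KAR's list exactly,
             including order.
    relaxed: every module in the KAR must be running at an equal or newer
             version; order is ignored.
    """
    if mode == "strict":
        return list(kar_modules) == list(running_modules)
    if mode == "relaxed":
        running = dict(split_module(m) for m in running_modules)
        for entry in kar_modules:
            name, required = split_module(entry)
            if name not in running or running[name] < required:
                return False
        return True
    raise ValueError("unknown compliance mode: " + mode)


kar = ["kepler-2.1", "core-2.1", "ptolemy-8.0"]
reordered = ["core-2.1", "kepler-2.1", "ptolemy-8.0"]
print(can_open(kar, reordered, "strict"))
print(can_open(kar, reordered, "relaxed"))
```

Note that in this sketch the tuple comparison treats 2.1 as older than 2.1.0. Whether Kepler treats a branch version and a release version that way is not stated here.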
It is assumed that the use of Strict mode will be rare, e.g. a user who is working with another's published KAR and wants to ensure he's running beneath the exact same codebase, or a developer who is debugging. The underlying issue is that the code to open or save a KAR may have changed between versions.
For each execution, provenance records the full module-dependencies list (KARs are not stored in provenance, but all their component parts are). When a run is exported to a KAR, the KAR manifest reflects not the currently running modules, but the modules used for that run's execution.
Runs may be exported together into the same KAR only if they were executed beneath the exact same modules. In this way the module list of the KAR is accurate.
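One way to picture this constraint is to group runs by the module list recorded for each execution, so that each group could go into one KAR. The Python sketch below is illustrative only. The function name `group_runs_for_export` and the (run id, module list) input shape are hypothetical and do not reflect how provenance actually stores runs.

```python
from collections import defaultdict


def group_runs_for_export(runs):
    """Map each distinct ordered module list to the run ids executed under it.

    `runs` is an iterable of (run_id, modules) pairs, where `modules` is the
    ordered module-dependencies list recorded for that run. Runs in the same
    group could share one KAR; runs in different groups could not.
    """
    groups = defaultdict(list)
    for run_id, modules in runs:
        # Order matters, so the ordered tuple itself is the grouping key.
        groups[tuple(modules)].append(run_id)
    return dict(groups)


runs = [
    (1, ["kepler-2.1", "ptolemy-8.0"]),
    (2, ["kepler-2.1.0", "ptolemy-8.0.0"]),
    (3, ["kepler-2.1", "ptolemy-8.0"]),
]
for modules, run_ids in group_runs_for_export(runs).items():
    print(";".join(modules), run_ids)
```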
When the user is set to Strict, they may only export, upload, or open run(s) when running the exact same modules, in the same order, that created the run(s).
It is possible for the provenance schema to change between versions of provenance (probably only for major and minor version changes).
Plans
When provenance changes, we plan to upgrade the user's provenance database, so their history is maintained. We plan to leave the old database in place untouched (for hsql, in a versioned directory), and create the new provenance database with this information imported. (Leaving the old database around will likely be a user option.)
When a user who is set to Strict attempts to export a run executed beneath old modules from the Workflow Run Manager, he is prompted to change to a set of modules with versions older than those he is currently running, and Kepler will restart using provenance code that needs to connect to a store with an old schema. This is why the old database is left in place: the old runs he is trying to work with are present in that store also. If the 'Strict user' is prompted to restart into an old Kepler when trying to operate on a KAR he already has on disk, he may simply attempt to open it again after restarting into the old Kepler.
Provenance may be used with database types that may not easily be kept in separate version directories on the filesystem. We consider these users advanced -- they must set up the database, and configure provenance's configuration file. I believe we've decided it will be up to these users to set the configuration file up so that each version of provenance may connect to a different database. We can help a bit by storing these configuration files in versioned directories, or having different namespaces within one file, or something along these lines.
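As a rough illustration of the "different namespaces within one file" idea, the Python sketch below uses one section per provenance version in an INI-style file. Everything in it is hypothetical: the section naming, the `database` key, its values and the function name `connection_settings` are assumptions made for the example and are not provenance's actual configuration format.

```python
import configparser

# Hypothetical configuration: one section ("namespace") per provenance version.
SAMPLE_CONFIG = """
[provenance-2.0]
database = provenance_2_0

[provenance-2.1]
database = provenance_2_1
"""


def connection_settings(config_text, provenance_version):
    """Return the settings for one provenance version as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = "provenance-" + provenance_version
    if not parser.has_section(section):
        raise KeyError("no configuration for provenance " + provenance_version)
    return dict(parser[section])


print(connection_settings(SAMPLE_CONFIG, "2.0"))
```

Keeping every version in one file means an older provenance module restarted in Strict mode could find its own database without the user editing anything between restarts. Versioned directories would achieve the same with separate files.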