Interfaces to Provenance for REAP
This page describes ongoing work and plans for developing interfaces to, and extensions of, the Kepler Provenance Framework, so that data stored in provenance can be used to help complete REAP usecases. We also plan to help satisfy the emerging provenance-related needs of Kepler Reporting.
REAP has several usecases that would benefit from use of the Kepler Provenance Framework (org.kepler.provenance). Initial development in this area is being driven by the Publication Ready Archive usecase and the needs of Kepler Reporting.
One path to complete the Publication Ready Archive usecase is to make any required additions to the Kepler Provenance Framework (KPF) to store all items associated with a workflow run, so that after the run is complete, a Publication Ready Archive may be created. These items include "the Kepler workflow, all required inputs (data sources like database tables are included), and all produced output."
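To give a rough picture of "all items associated with a workflow run" once they have been gathered, the sketch below packages a workflow's MoML together with named input and output files into one zip archive using the standard java.util.zip classes. The class name RunArchiveSketch, the archive layout (workflow.moml, inputs/, outputs/) and the use of zip as the container are all assumptions for illustration; the actual Publication Ready Archive format is not defined on this page.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: bundles the items of one workflow run into a zip.
// The archive layout (workflow.moml, inputs/, outputs/) is an assumption.
class RunArchiveSketch {

    static byte[] buildArchive(String moml,
                               Map<String, byte[]> inputs,
                               Map<String, byte[]> outputs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // The workflow definition that was executed.
            addEntry(zip, "workflow.moml", moml.getBytes(StandardCharsets.UTF_8));
            // Inputs the run consumed, e.g. data files or database table dumps.
            for (Map.Entry<String, byte[]> e : inputs.entrySet()) {
                addEntry(zip, "inputs/" + e.getKey(), e.getValue());
            }
            // Outputs the run produced.
            for (Map.Entry<String, byte[]> e : outputs.entrySet()) {
                addEntry(zip, "outputs/" + e.getKey(), e.getValue());
            }
        } catch (IOException ex) {
            // Not expected for an in-memory stream, but ZipOutputStream declares it.
            throw new UncheckedIOException(ex);
        }
        // Closing the ZipOutputStream above wrote the zip's central directory.
        return bytes.toByteArray();
    }

    private static void addEntry(ZipOutputStream zip, String name, byte[] data)
            throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }
}
```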
Some background: the KPF ProvenanceRecorder supports two types of storage, File and SQL (MySQL). Many improvements and additions to KPF are underway, including faster writes to SQL databases and more general support across SQL databases, including HSQL, the database bundled with Kepler.
For the Publication Ready Archive usecase we have initially been using the SQL storage mechanism. A few additions have been made to the KPF SQL schema: a) a table to store the workflow MoML associated with a given workflow run, and b) a table to store any files referenced by tokens.
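The two schema additions can be pictured roughly as the DDL below, held as Java string constants, plus one JDBC insert showing how a run's MoML might be saved. Table names, column names and types are hypothetical and chosen only to show the relationships (one MoML document per run, any number of token-referenced files per run); the types follow HSQLDB conventions, and MySQL would need equivalents such as LONGTEXT. The real KPF schema may differ.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical DDL for the two schema additions; names and types are
// illustrative assumptions, written with HSQLDB-style types.
class ProvenanceSchemaAdditions {

    // a) One row per workflow run, holding the MoML that was executed.
    static final String WORKFLOW_RUN_MOML =
        "CREATE TABLE workflow_run_moml ("
        + " exec_id INTEGER NOT NULL PRIMARY KEY,"
        + " moml CLOB NOT NULL)";

    // b) Files referenced by tokens; one run may reference many files.
    static final String TOKEN_FILE =
        "CREATE TABLE token_file ("
        + " token_id INTEGER NOT NULL,"
        + " exec_id INTEGER NOT NULL,"
        + " file_name VARCHAR(255) NOT NULL,"
        + " contents BLOB NOT NULL,"
        + " PRIMARY KEY (token_id, file_name))";

    // Stores the MoML for a run (requires a live JDBC connection).
    static void saveMoml(Connection conn, int execId, String moml) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO workflow_run_moml (exec_id, moml) VALUES (?, ?)")) {
            ps.setInt(1, execId);
            ps.setString(2, moml);
            ps.executeUpdate();
        }
    }
}
```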
To select a given workflow run from provenance for export to a Publication Ready Archive, a Workflow Run Manager is being developed. Some changes to the main Kepler GUI are also planned. The mockups I will begin developing towards are posted below. This is a work in progress; opinions are welcome. Menus for configuring provenance stores are not shown.
----------------
In the mockup below, the Workflow Run Manager is collapsed at the bottom of the window. We also see several changes to Kepler:
1) A new View menu in the toolbar. This switches between the usual Kepler view, which I've called Editor, and other views such as Report Designer (not currently discussed on this page).
2) New workflow tab interface, allowing multiple workflows to be open inside one Kepler window. Here 3 workflows are open.
3) New workflow header pane, with workflow title, description and a tagging interface.
----------------
In the mockup below, the Workflow Run Manager is expanded at the bottom of the window, with one row selected in blue. Each row represents a workflow run (execution) that has either occurred or is occurring. Red indicates a run that has failed, and green a run that is still running (the duration shown for these rows could increment in real time). The workflow stored in provenance for the selected row (HeatIndex2) has been opened. Should a user change this workflow or its tags, they will be prompted to save it locally (i.e. nothing about the selected run changes in provenance).
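The row semantics in this mockup (red for failed runs, green for runs still executing, a duration that keeps growing while a run is active) could be modeled along the lines of the sketch below. The class, enum and field names are hypothetical, and since the mockup does not say how a successfully completed run is colored, that case is simply left unhighlighted here.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of one Workflow Run Manager row.
class RunRow {
    enum Status { RUNNING, FAILED, COMPLETED }
    enum Highlight { NONE, RED, GREEN }

    final String workflowName;
    final Status status;
    final Instant start;
    final Instant end; // null while the run is still executing

    RunRow(String workflowName, Status status, Instant start, Instant end) {
        this.workflowName = workflowName;
        this.status = status;
        this.start = start;
        this.end = end;
    }

    // Running rows are measured against "now", so refreshing the table on a
    // timer makes their duration appear to increment in real time.
    Duration duration(Instant now) {
        return Duration.between(start, end != null ? end : now);
    }

    // Red for failed runs, green for running ones, no highlight otherwise.
    Highlight highlight() {
        switch (status) {
            case FAILED:  return Highlight.RED;
            case RUNNING: return Highlight.GREEN;
            default:      return Highlight.NONE;
        }
    }
}
```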
In this mockup we also see the tagging interface. The user is midway through creating a new tag.
----------------
Here we see the tag menu. The pane expands to show all your previously entered tags, any of which you may add to the current workflow.
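One way to back this tag pane is a registry that remembers every tag the user has entered and lets any of them be attached to the current workflow. The sketch below is only an assumption about how that might be organized (TagRegistry and its methods are hypothetical names); it keeps tags sorted for display and ignores duplicates.

```java
import java.util.Collections;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical backing model for the tag pane.
class TagRegistry {
    // Every tag the user has ever entered, kept sorted for display.
    private final SortedSet<String> knownTags = new TreeSet<>();

    // Records a newly typed tag so it appears in the pane next time.
    void remember(String tag) {
        knownTags.add(tag.trim());
    }

    // What the expanded pane lists (read-only view).
    SortedSet<String> previouslyEntered() {
        return Collections.unmodifiableSortedSet(knownTags);
    }

    // Adds a tag to a workflow's tag set; returns false if it was already there.
    boolean addToWorkflow(Set<String> workflowTags, String tag) {
        remember(tag);
        return workflowTags.add(tag.trim());
    }
}
```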
----------------
Here we see the right-click context menu for a row in the Workflow Run Manager, including the option to Export to Archive (i.e. to a Publication Ready Archive).
----------------
Here we see the context menu for enabling/disabling the visibility of different columns.
At the top of each column are search menus. The user has already entered "Baskett Slough" in the Includes section of the Tags search, which is why only rows with that tag are displayed.
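The Includes filter can be read as "show a run only if its workflow carries the included tags". Whether several included tags combine with AND or OR is not stated for the mockup; the sketch below assumes AND, and its Run class is a hypothetical stand-in for a manager row. With nothing in Includes, every row is shown.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class TagFilter {
    // Hypothetical minimal row: an execution id plus its workflow's tags.
    static class Run {
        final int execId;
        final Set<String> tags;

        Run(int execId, Set<String> tags) {
            this.execId = execId;
            this.tags = tags;
        }
    }

    // Keeps rows whose tags contain every entry of the Includes section
    // (AND semantics is an assumption). An empty Includes set keeps all rows.
    static List<Run> includes(List<Run> rows, Set<String> included) {
        return rows.stream()
                   .filter(r -> r.tags.containsAll(included))
                   .collect(Collectors.toList());
    }
}
```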
----------------
Here the user has clicked in the begin date search box and is presented with a popup menu to ease date selection.
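Once dates are picked, the filter reduces to a range check on each run's start time, the same kind of question getExecutionsForDateRange(Timestamp start, Timestamp stop) in the ProvenanceClient API answers. The sketch below assumes an inclusive range and treats an empty date box as "unbounded"; both choices, and the class name, are assumptions.

```java
import java.sql.Timestamp;

// Hypothetical predicate behind the begin/end date search boxes.
class DateRangeFilter {
    private final Timestamp begin; // null = no lower bound
    private final Timestamp end;   // null = no upper bound

    DateRangeFilter(Timestamp begin, Timestamp end) {
        this.begin = begin;
        this.end = end;
    }

    // Inclusive on both ends (an assumption).
    boolean accepts(Timestamp runStart) {
        if (begin != null && runStart.before(begin)) return false;
        if (end != null && runStart.after(end)) return false;
        return true;
    }
}
```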
----------------
This manager will interact with provenance via a ProvenanceClient API, which is also under development (see below). This API should also be useful to others, e.g. should a web GUI be desired. Initially the ProvenanceClient will support common queries against provenance and a deletion mechanism, for example deleting a particular workflow and all its associated runs, or just a single workflow run.
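The two deletion cases (a whole workflow with all of its runs, or individual runs) differ only in scope. The minimal in-memory sketch below assumes provenance can be viewed as a map from workflow name to execution ids; it is similar in spirit to the delete operations in the API listing below but is not the KPF implementation, and InMemoryProvenance and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for provenance storage, for illustration only.
class InMemoryProvenance {
    // workflow name -> ids of its executions
    private final Map<String, List<Integer>> runsByWorkflow = new LinkedHashMap<>();

    void recordRun(String workflow, int execId) {
        runsByWorkflow.computeIfAbsent(workflow, w -> new ArrayList<>()).add(execId);
    }

    List<Integer> getExecutionsForWorkflow(String workflow) {
        return new ArrayList<>(runsByWorkflow.getOrDefault(workflow, List.of()));
    }

    // Removes individual runs; the workflow entry itself is kept.
    void deleteExecutions(List<Integer> execIds) {
        for (List<Integer> runs : runsByWorkflow.values()) {
            runs.removeAll(execIds);
        }
    }

    // Removes workflows and, with them, every associated run.
    void deleteWorkflows(List<String> workflows) {
        for (String w : workflows) {
            runsByWorkflow.remove(w);
        }
    }
}
```

Using removeAll with a collection of ids, rather than List.remove(int), avoids accidentally removing by index instead of by value.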
----------------
Below is an idea for tabs within Kepler, where the 3 tabs pictured would always be visible. A user would create workflows under the tab currently labeled "Editor". The tabs in this mockup are taken from the Ptolemy Vergil Case actor GUI (but obviously serve a different purpose). Kepler does not currently use this tabbed GUI for the Case actor (I am unsure whether this is intentional).
Another way to introduce/use tabs within Kepler would be to adopt an approach similar to the one used in the Kepler/pPOD customization. In particular, pPOD adds a "Workspace" tab to the left-hand pane of Kepler, which includes a folder-hierarchy of workflows (organized into example/demo and personal workflows) and associated workflow-execution traces. The following screenshot shows the general layout.
The top of the workspace pane contains workflows (here, the Pars_Loop_Consense workflow is shown selected and open on the canvas), and the bottom of the workspace pane contains trace files. Execution traces are currently placed into a folder named after the workflow, and each trace is assigned an id based on the date and time of the workflow run (we plan to modify this slightly to provide additional information; see Kepler bugzilla). Double-clicking on a trace file (or selecting the trace and clicking "open") loads it in the Kepler Provenance Browser, which is shown below. Note that trace files can also be saved within an external provenance database, queried, etc. Our storage approach and implementation are described in our recent EDBT 2009 paper.
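The trace layout described for pPOD (one folder per workflow, one trace per run named from the run's date and time) can be sketched as a simple path builder. The timestamp pattern and the TraceNaming class are assumptions, since the exact id format is not given here. Note that with second resolution alone, two runs of the same workflow started within the same second would receive the same id.

```java
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of the trace layout: one folder per workflow,
// one trace per run, named from the run's date and time.
class TraceNaming {
    // The pattern is an assumption; the real trace id format is not specified.
    private static final DateTimeFormatter ID_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");

    static Path tracePath(Path workspace, String workflowName, LocalDateTime runStart) {
        return workspace.resolve(workflowName).resolve(ID_FORMAT.format(runStart));
    }
}
```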
The provenance browser visualizes data and invocation dependencies, allows users to step through the execution of a run "vcr-style", and also allows users to view the content of data items in a separate data viewer.
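"Vcr-style" stepping amounts to a cursor moving over a run's events in creation order. The sketch below shows that idea over a list of token ids, such as the ordered list an API call like getTokensForExecution(execId, false) might return. The VcrCursor class is hypothetical and says nothing about how the Provenance Browser is actually implemented.

```java
import java.util.List;

// Hypothetical "vcr-style" cursor over an execution's events in creation order.
class VcrCursor {
    private final List<Integer> events;
    private int position = -1; // -1 = before the first event

    VcrCursor(List<Integer> events) {
        this.events = List.copyOf(events);
    }

    boolean hasNext() { return position + 1 < events.size(); }

    boolean hasPrevious() { return position > 0; }

    // Play one step forward and return the event now shown.
    int stepForward() {
        if (!hasNext()) throw new IllegalStateException("at end of run");
        return events.get(++position);
    }

    // Step one event back.
    int stepBack() {
        if (!hasPrevious()) throw new IllegalStateException("at start of run");
        return events.get(--position);
    }

    // Return to before the first event.
    void rewind() { position = -1; }

    // Jump to the final event and return it.
    int fastForward() {
        if (events.isEmpty()) throw new IllegalStateException("empty run");
        position = events.size() - 1;
        return events.get(position);
    }
}
```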
----------------
ProvenanceClient API - used to query and manage provenance storage:
import java.sql.Timestamp;
import java.util.List;
import java.util.Map;

public interface ProvenanceClient {
    /** Get a list of workflow names. */
    List<String> getWorkflows();
    /** Get a list of all executions. */
    List<Integer> getExecutions();
    /** Get a list of executions for a specific workflow. */
    List<Integer> getExecutionsForWorkflow(String workflow);
    /** Get a list of executions for a specific date range. */
    List<Integer> getExecutionsForDateRange(Timestamp start, Timestamp stop);
    /** Get workflow MoML for an execution. */
    String getMoMLForExecution(int execId);
    /** Get the value of a token. */
    String getTokenValue(int tokenId);
    /** Get the type of a token. */
    String getTokenType(int tokenId);
    /** Get an actor's name for a firing. */
    String getActorName(int fireId);
    /** Get an actor's type for a firing. */
    String getActorType(int fireId);
    /** Get the firings of the actor(s) that read or wrote a token. */
    List<Integer> getActorFiringForToken(int tokenId, boolean read);
    /** Get an actor's parameter name/value pairs for a firing. */
    Map<String, String> getParameterNameValues(int fireId);
    /** Get a sequence of tokens for an execution.
     * @param execId the execution id
     * @param last if true, the sequence starts at the last token created
     * and goes backwards to the first; otherwise the sequence starts at
     * the first token.
     */
    List<Integer> getTokensForExecution(int execId, boolean last);
    /** Get the immediate tokens that generated a token. */
    List<Integer> getImmediateDependencies(int tokenId);
    /** Delete a list of executions. */
    void deleteExecutions(List<Integer> execIds);
    /** Delete a list of workflows (and all their executions). */
    // Needs its own name: List<String> and List<Integer> overloads share one erasure.
    void deleteWorkflows(List<String> workflows);
}
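getImmediateDependencies returns only the direct parents of a token; the full lineage of an output (every token that contributed to it) can be obtained by following it transitively. The breadth-first sketch below abstracts that API call as a function so it runs without a provenance store; the Lineage class is hypothetical, and the sketch assumes the dependency graph is acyclic, as token dependencies in a run normally are.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

class Lineage {
    // Collects every token that (transitively) generated tokenId.
    // immediateDeps stands in for a call to getImmediateDependencies.
    static Set<Integer> allDependencies(int tokenId, IntFunction<List<Integer>> immediateDeps) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(tokenId);
        while (!queue.isEmpty()) {
            int current = queue.remove();
            for (int dep : immediateDeps.apply(current)) {
                // The seen set prevents revisiting ancestors shared by several tokens.
                if (seen.add(dep)) {
                    queue.add(dep);
                }
            }
        }
        return seen;
    }
}
```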