REAP Project
Real-time Environment for Analytical Processing (REAP)
The Real-time Environment for Analytical Processing (REAP) project unites oceanographers, marine and terrestrial ecologists, technologists, and computer science researchers with the goal of constructing, testing, and deploying scientific workflow systems that access, monitor, and analyze sensor data.
Investigators are currently developing prototype workflows that use the technologies emerging from the collaboration to address the challenges of accessing and integrating environmental data. In one use case, researchers studying a large-scale invasion of non-native grasses in the western United States are modeling the susceptibility of plants to an aphid-vectored pathogen. REAP investigators are prototyping a wide network (ranging from Canada to Mexico) of real-time, land-based environmental sensors that stream data to a DataTurbine server. DataTurbine is open source streaming-data middleware that provides a robust and generic interface for accessing data from a diverse set of sensors. The Kepler scientific workflow system accesses the server and supplies the data used to answer questions about the pathogen-host community.
In a second prototype, the new technologies are being used to compare and match up existing remotely sensed images of sea surface temperature found in OPeNDAP archives. Users specify the time span, sampling rate, and geographical location of interest, and Kepler actors identify and return matching data sets based on information contained in metadata. The Kepler workflow will reduce the complexity of comparing and integrating data collected by the variety of satellite-borne, ship-board, and other in situ instruments that form today’s bewildering array of sea surface temperature sensors.
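The metadata-based matching described above can be illustrated with a minimal Python sketch. It is not the Kepler actor's implementation and does not use the OPeNDAP API: the DatasetMetadata record, its field names, the find_matches function, and the catalog entries are all hypothetical, chosen only to show how a time span, a sampling rate, and a location might be checked against metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    start: datetime
    end: datetime
    sampling_interval: timedelta
    # Bounding box as (south, west, north, east) in decimal degrees.
    bbox: tuple


def matches(meta, start, end, max_interval, lat, lon):
    """True if the data set overlaps the requested time span, samples at
    least as often as requested, and covers the point of interest."""
    south, west, north, east = meta.bbox
    overlaps_time = meta.start <= end and meta.end >= start
    fine_enough = meta.sampling_interval <= max_interval
    covers_point = south <= lat <= north and west <= lon <= east
    return overlaps_time and fine_enough and covers_point


def find_matches(catalog, start, end, max_interval, lat, lon):
    """Names of all catalog entries that satisfy the request."""
    return [m.name for m in catalog
            if matches(m, start, end, max_interval, lat, lon)]


# Made-up catalog entries, for illustration only.
CATALOG = [
    DatasetMetadata("satellite_sst_daily",
                    datetime(2007, 1, 1), datetime(2008, 12, 31),
                    timedelta(days=1), (-90.0, -180.0, 90.0, 180.0)),
    DatasetMetadata("ship_sst_transect",
                    datetime(2008, 6, 1), datetime(2008, 6, 30),
                    timedelta(hours=1), (18.0, -161.0, 23.0, -154.0)),
    DatasetMetadata("buoy_sst_atlantic",
                    datetime(2008, 1, 1), datetime(2008, 12, 31),
                    timedelta(minutes=10), (30.0, -80.0, 45.0, -60.0)),
]

if __name__ == "__main__":
    # Request: ten days in June 2008, at least daily sampling, near Hawaii.
    print(find_matches(CATALOG,
                       datetime(2008, 6, 10), datetime(2008, 6, 20),
                       timedelta(days=1), 21.3, -157.9))
```

In this sketch the sampling rate is expressed as a maximum acceptable interval between samples, so a data set sampled hourly satisfies a request for daily data. A real catalog would also need to handle bounding boxes that cross the antimeridian, which this example ignores.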
A cabled seafloor project, also under development, accesses, analyzes, and plots data collected at the Kilo Nalu Nearshore Reef Observatory in Hawaii by a Workhorse Acoustic Doppler Current Profiler (ADCP) instrument. These data are sent to a DataTurbine server, and Kepler’s forthcoming DataTurbine actor is used to retrieve requested data from the server, perform an analysis, and plot the results. The DataTurbine actor automatically assigns “nil” values to the missing data points that arise when sensors drop offline or other real-world problems occur, so that time series are well-formed and can be processed more easily. Alternatively, the new Ensemble actor, which converts the ADCP’s binary data format to numeric data, can be used to work directly with the ADCP binary stream, bypassing the conversions that occur at the DataTurbine server. REAP investigators are also looking into the feasibility and suitability of using a Sensor Web Enablement interface to aid in the discovery and accessibility of DataTurbine data.
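The idea of padding missing data points so that a time series stays well-formed can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DataTurbine actor's code: the function names, the dictionary of readings, and the ten-second grid are hypothetical, and Python's None stands in for the “nil” value.

```python
def fill_gaps(samples, start, step, count):
    """Return `count` values on a regular time grid beginning at `start`.

    `samples` maps a timestamp (whole seconds) to a reading. Grid points
    with no reading are filled with None, a stand-in for "nil".
    """
    return [samples.get(start + i * step) for i in range(count)]


def mean_ignoring_nil(series):
    """Average the readings that are present; None if nothing is present."""
    present = [v for v in series if v is not None]
    return sum(present) / len(present) if present else None


# Made-up current-speed readings (m/s); the sensor was offline at t=20 and t=30.
readings = {0: 0.12, 10: 0.15, 40: 0.11}
series = fill_gaps(readings, start=0, step=10, count=5)

if __name__ == "__main__":
    print(series)
    print(mean_ignoring_nil(series))
```

Because every grid point is present in the result, downstream steps can rely on a fixed length and spacing and decide for themselves how to treat the gaps, as mean_ignoring_nil does here by skipping them.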
For more information, please see the REAP project site.