Gillius's Programming

RealDB

Master’s project for the MS in Computer Science at the Rochester Institute of Technology by Jason Winnebeck, completed in 2010.

Committee

Abstract

Embedded sensor monitoring systems deal with large amounts of live, time-sequenced stream data. Such a system requires a lower-overhead data store that can work with limited resources and run reliably and unattended, even in the face of power faults.

Relational database management systems (RDBMS) are a well-understood and powerful solution capable of storing time-sequenced data; however, many have high overhead, are not sufficiently reliable and maintenance-free, or are unable to maintain a hard size limit without adding substantial complexity.

RealDB is a specialized solution that capitalizes on the unique attributes of the data stream storage problem to maintain maximum reliability in an unstable environment while significantly reducing overhead from indexing, space allocation, and inter-process communication when compared to a traditional RDBMS-based solution.

Summary

In summary, this work aims to accomplish the following:

Download

Latest Release: RealDB 1.0, released on August 22, 2010.

Unfortunately, the full dataset used for benchmarking in the final report cannot be published. A very small dataset is provided to demonstrate functionality, but it is too small to be useful for benchmarking.

Javadoc for the realdb-core module (the “library” part) can be browsed online. You may also want to read the Using RealDB section below.

Report

The final report was completed on September 19, 2010.

License

RealDB Project - Copyright (c) 2008-2010 by Jason Winnebeck

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, see <http://www.gnu.org/licenses>.

Additional permission under GNU GPL version 3 section 7

If you modify this Program, or any covered work, by linking or combining it with JUnit (or a modified version of that library), containing parts covered by the terms of Common Public License, 1.0, the licensors of this Program grant you additional permission to convey the resulting work.

Dependencies

The following libraries are used. With the Maven build system, these are downloaded for you automatically:

To run the main library, realdb-core, the only dependency is the ANTLR runtime jar. The browser and concept tools also need JFreeChart. All other dependencies are used only for building or unit testing.

The benchmark program should work with any JDBC driver, provided the database server and driver are configured properly. The benchmark as run in the final report used the following:

Using RealDB

This section is minimal and will be expanded in the future.

Running RealDB

Download the full release of RealDB; no compilation is needed. There are several ways to use RealDB:

  1. Run the realdb-demo.jar in a GUI environment by double-clicking it (if your OS environment has an association for executable jars).
  2. Run realdb-cli.jar on the command line by executing java -jar realdb-cli.jar in the directory where you uncompressed RealDB.
  3. Reference realdb-core.jar in your own Java project and use the API calls to read and write data.
  4. Run the RealDB Browser (DB viewer) in a GUI environment by double-clicking realdb-browser.jar.
  5. Run the RealDB proof-of-concept:
    1. Execute java -cp realdb-concept-1.0.jar org.gillius.realdb.concept.VehicleDataSimulation data/Test.csv test.rdb; note that this will create a 200 MB database file!
    2. Run the concept analysis tool in a GUI environment by double-clicking realdb-concept.jar
  6. Run the RealDB benchmark on a Linux machine:
    1. Follow the instructions in mysql-setup.txt to download and install a “private” copy of MySQL.
    2. Run mkdir -p db_data/realdb to create the DB directories.
    3. Run java -jar realdb-benchmark-1.0.jar realdb_file myisam innodb derby
      1. If you run as “root” and have a raw partition, you can run the “realdb_raw” test:
        1. Modify data/realdb_raw.properties, setting “dataFile” to the device you wish to use. The benchmark will overwrite any data on that partition/device!
        2. Change monitoredDir and monitoredPattern to match.
        3. Include “realdb_raw” as a parameter to the benchmark
    4. If you are on Windows, you are mostly on your own. However, the steps are:
      1. Follow equivalent steps to mysql-setup.txt.
      2. Modify data/innodb.properties and data/myisam.properties to comment out the Linux form of the command line and uncomment the Windows form.
      3. Don’t include “realdb_raw” in the run line, since this mode of RealDB works only on Linux.

Note that the current version of RealDB may be unable to work with databases created by versions of RealDB prior to 1.0.

Compiling RealDB

Apache Maven is used for compilation. To compile RealDB yourself, download a recent version of Maven (2.2 was used at the time), and run mvn install to locally install the RealDB packages. The resulting source, javadoc, and binary jars for each module will be in the respective target directory.

RealDB Definition Language

The RealDB Definition Language (RDL) is used to define RealDB databases. Here is an example RDL file:

SET blockSize     = 2048
SET fileSize      = 204800
SET maxStreams    = 3
SET dataBlockSize = 2

CREATE STREAM Test WITH ID 1 {
value float NULL //will use SampledAlgorithm by default
}

CREATE STREAM CarSnapshots WITH ID 2 {
rpm float WITH CODEC DeadbandAlgorithm PARAMS (deadband=50.0),
speed float WITH CODEC DeadbandAlgorithm PARAMS (deadband=5),
passengers uint8 WITH CODEC StepAlgorithm,
driving boolean WITH CODEC StepAlgorithm
}
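The CarSnapshots stream above assigns DeadbandAlgorithm to rpm and speed. As a rough illustration only, the following sketch shows the classic deadband compression technique, assuming RealDB's codec behaves like it: a new sample is stored only when its value differs from the last stored value by more than the deadband. The names DeadbandSketch, Sample, and compress are hypothetical and are not part of the realdb-core API; the code uses Java records (Java 16+).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of deadband compression; not the realdb-core API. */
public final class DeadbandSketch {

    /** A timestamped sample of one float element (hypothetical type). */
    public record Sample(long time, double value) {}

    /**
     * Keeps the first sample, then every sample whose value differs from the
     * last KEPT value by more than the deadband; all others are dropped.
     */
    public static List<Sample> compress(List<Sample> input, double deadband) {
        List<Sample> kept = new ArrayList<>();
        Sample last = null; // last stored sample
        for (Sample s : input) {
            if (last == null || Math.abs(s.value() - last.value()) > deadband) {
                kept.add(s);
                last = s;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Sample> rpm = List.of(
                new Sample(0, 800.0), new Sample(1, 820.0),
                new Sample(2, 860.0), new Sample(3, 870.0),
                new Sample(4, 1000.0));
        // deadband=50.0, matching the rpm element of CarSnapshots
        for (Sample s : compress(rpm, 50.0)) {
            System.out.println(s.time() + "\t" + s.value());
        }
    }
}
```

With these sample values the sketch keeps the samples at times 0, 2, and 4. Comparing against the last stored value (rather than the previous raw sample) prevents slow drift from going unrecorded.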

Brief descriptions of the elements in this file:

Creating Streams

RealDB Concepts

DataStream

A DataStream is a time-ordered sequence of Records with a fixed format: an ordered list of Elements.

Record

A single entry in a DataStream. Each Record has a timestamp and can be a “discontinuity” marker.

Element

A specific field in the Record with a known name and type.

StreamInterval

A reconstructed continuous interval of a DataStream that comprises one or more original (possibly dropped) records.
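To make the relationship between Records and StreamIntervals concrete, here is a minimal sketch for a single element. It assumes sample-and-hold semantics (a value holds until the next record), merges adjacent records with equal values into one interval, and treats a discontinuity as closing the open interval without opening a new one. These semantics, and the names IntervalSketch, StreamRecord, Interval, and intervals, are assumptions for illustration, not RealDB's actual reconstruction logic or API (Java 16+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical model of RealDB concepts; not the realdb-core API. */
public final class IntervalSketch {

    /** A Record for one Element: timestamp, value (null allowed), discontinuity flag. */
    public record StreamRecord(long time, Double value, boolean discontinuity) {}

    /** A reconstructed interval covering [start, end) with a single value. */
    public record Interval(long start, long end, Double value) {}

    /**
     * Rebuilds intervals from records in ascending time order.
     * endTime closes the final open interval (e.g. the end of a query range).
     */
    public static List<Interval> intervals(List<StreamRecord> records, long endTime) {
        List<Interval> out = new ArrayList<>();
        boolean open = false; // is an interval currently open?
        long start = 0;
        Double current = null;
        for (StreamRecord r : records) {
            if (open) {
                if (!r.discontinuity() && Objects.equals(current, r.value())) {
                    continue; // same value: extend the open interval
                }
                out.add(new Interval(start, r.time(), current));
                open = false;
            }
            if (!r.discontinuity()) {
                open = true;
                start = r.time();
                current = r.value();
            }
        }
        if (open && endTime > start) {
            out.add(new Interval(start, endTime, current));
        }
        return out;
    }

    public static void main(String[] args) {
        List<StreamRecord> records = List.of(
                new StreamRecord(0, 1.0, false),
                new StreamRecord(5, 1.0, false),  // merged into the previous interval
                new StreamRecord(10, 2.0, false),
                new StreamRecord(15, null, true), // discontinuity: no data collected
                new StreamRecord(20, 2.0, false));
        intervals(records, 30).forEach(System.out::println);
    }
}
```

For these records the sketch yields three intervals: [0, 10) with value 1.0, [10, 15) with 2.0, and [20, 30) with 2.0; nothing covers the discontinuity from 15 to 20. Note that a null value still produces an interval, which keeps null distinct from a discontinuity.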

Discontinuities

A stream can be in a “discontinuity” state. This is different from individual elements being null, and even from all elements being null at the same time. The actual meaning is defined by the user. For example, a null could mean “not applicable” or “collected but not present,” whereas a discontinuity could signify ranges where the system was turned off and not collecting data at all. When a discontinuity is written, the stream remains in that state until the next non-discontinuity record is written.
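The rule that a stream stays in the discontinuity state until the next non-discontinuity record can be expressed as a lookup: the state at time t is given by the latest record at or before t. The following is a minimal sketch of that rule; the names DiscontinuitySketch, Entry, and inDiscontinuity are hypothetical, and treating the time before the first record as discontinuous is an assumption of this sketch, not documented RealDB behavior (Java 16+).

```java
import java.util.List;

/** Hypothetical illustration of the discontinuity state; not the realdb-core API. */
public final class DiscontinuitySketch {

    /** A written record: its timestamp and whether it is a discontinuity marker. */
    public record Entry(long time, boolean discontinuity) {}

    /**
     * Returns true if the stream is in the discontinuity state at time t, i.e.
     * the latest record at or before t is a discontinuity marker.
     * Entries must be in ascending time order.
     */
    public static boolean inDiscontinuity(List<Entry> entries, long t) {
        boolean state = true; // assumption: before any data, treat as discontinuous
        for (Entry e : entries) {
            if (e.time() > t) {
                break; // later records do not affect the state at t
            }
            state = e.discontinuity();
        }
        return state;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry(0, false),   // data collection starts
                new Entry(10, true),   // system turned off
                new Entry(20, false)); // collection resumes
        for (long t : new long[] {5, 10, 15, 20}) {
            System.out.println(t + " -> " + inDiscontinuity(entries, t));
        }
    }
}
```

A linear scan keeps the sketch short; since records are time-ordered, a real store could binary-search instead.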

Timestamps

All records in RealDB are timestamped and written in ascending order of time. The timestamp is a 63-bit unsigned integer. RealDB assumes no particular unit or meaning for time other than it being linear (e.g., the distance between 5 and 10 is the same as between 50 and 55), leaving the user to define both the unit (e.g., seconds versus milliseconds) and the epoch (the meaning of time 0).
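A 63-bit unsigned integer maps exactly onto the non-negative range of a Java long (0 to Long.MAX_VALUE, i.e. 2^63 - 1). The sketch below shows one possible user choice of unit and epoch (milliseconds since the Unix epoch, an assumption for illustration) and a check that timestamps never go backwards. TimestampSketch, toTimestamp, and checkAppend are hypothetical names; whether RealDB accepts equal consecutive timestamps is not stated here, so this sketch allows them.

```java
import java.time.Instant;

/**
 * Hypothetical timestamp helper; not the realdb-core API.
 * Unit (milliseconds) and epoch (1970-01-01T00:00:00Z) are assumptions.
 */
public final class TimestampSketch {

    private long last = -1; // timestamp of the last accepted record; -1 = none yet

    /** Converts an Instant to milliseconds since the Unix epoch. */
    public static long toTimestamp(Instant instant) {
        long ms = instant.toEpochMilli();
        if (ms < 0) {
            throw new IllegalArgumentException("Before the chosen epoch: " + instant);
        }
        return ms;
    }

    /** Accepts a timestamp only if it is non-negative and not earlier than the last one. */
    public void checkAppend(long timestamp) {
        if (timestamp < 0) {
            throw new IllegalArgumentException("Timestamps are unsigned: " + timestamp);
        }
        if (timestamp < last) {
            throw new IllegalStateException(timestamp + " is earlier than " + last);
        }
        last = timestamp;
    }

    public static void main(String[] args) {
        TimestampSketch stream = new TimestampSketch();
        stream.checkAppend(toTimestamp(Instant.parse("2010-08-22T00:00:00Z")));
        stream.checkAppend(toTimestamp(Instant.parse("2010-08-22T00:00:01Z")));
        System.out.println("Largest 63-bit timestamp: " + Long.MAX_VALUE);
    }
}
```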

Command Line Tool

The current command line tool supports the following commands. Use the help command to get more information:

  bulkLoad    Bulk loads data into a stream from a tab-separated values file.
  create      Creates a new RealDB database.
  describe    Shows information about a database.
  help        Prints out a list of all commands or detailed help on a single command.
  intervals   Reads intervals in a given time range on an element within a stream.
  read        Reads all raw records or a range of raw records from a single stream and outputs tab-delimited text with the first row as the column headers.
  summary     Reads intervals in a given time range on an element within a stream, and summarizes them into a single report.
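Because the read command emits tab-delimited text with a header row, its output is easy to consume from another program. Below is a minimal sketch of such a consumer; ReadOutputSketch and parse are hypothetical names, and the column names in the sample input (time, rpm, speed) are made up for illustration, since the real headers depend on the stream definition.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser for tab-delimited output with a header row. */
public final class ReadOutputSketch {

    /** Parses each data row into a column-name -> cell map, in column order. */
    public static List<Map<String, String>> parse(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        List<Map<String, String>> rows = new ArrayList<>();
        String headerLine = reader.readLine();
        if (headerLine == null) {
            return rows; // empty output
        }
        String[] headers = headerLine.split("\t", -1); // -1 keeps empty trailing cells
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split("\t", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < headers.length; i++) {
                row.put(headers[i], i < cells.length ? cells[i] : "");
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input; real column names depend on the stream.
        String sample = "time\trpm\tspeed\n0\t800.0\t0.0\n1000\t1500.0\t12.5\n";
        for (Map<String, String> row : parse(new StringReader(sample))) {
            System.out.println(row);
        }
    }
}
```

Empty cells are kept as empty strings rather than dropped, so a null element value does not shift later columns.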

Graphical Browser

There is a graphical browser tool that can view information about a database and the streams within. See the Running RealDB section on how to start it.