RealDB

Master’s project for MS in Computer Science at the Rochester Institute of Technology by Jason Winnebeck, completed in 2010.

Committee

Chairperson: Henry A. Etlinger
Reader: Alan Kaminsky
Observer: T.J. Borrelli

Abstract

Embedded sensor monitoring systems deal with large amounts of live time-sequenced stream data. The embedded system requires a lower-overhead data store that works within limited resources and runs reliably and unattended, even in the face of power faults.

Relational database management systems (RDBMS) are a well-understood and powerful solution capable of storing time-sequenced data; however, many have a high overhead, are not sufficiently reliable and maintenance-free, or are unable to maintain a hard size limit without adding substantial complexity.

RealDB is a specialized solution that capitalizes on the unique attributes of the data stream storage problem to maintain maximum reliability in an unstable environment while significantly reducing overhead from indexing, space allocation, and inter-process communication when compared to a traditional RDBMS-based solution.

Summary

In summary, this work aims to:

Create a storage engine for live data streams that performs better than traditional relational database systems by making assumptions and tradeoffs appropriate to this specific situation.
- Faster: constant-time insertion and deletion of records, with respect to the size of the database
- Faster: logarithmic lookup time for data within a data stream, with linear time to read records
- Smaller: tightly compacted dataset with little or no indexing required, since data is assumed to arrive in order
Represent data streams as a continuous value over time, rather than as a set of discrete samples or rows
- Smaller: allows reduction of data stored by allowing reconstruction of the signal from a reduced data set
- Smarter: allows calculation of averages, integrals, etc., based on the reconstruction over timespans.
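Both performance goals follow from the data arriving in time order. As a rough illustration only, the Java sketch below keeps the start timestamp of each block in a fixed-size circular index: appending a block and dropping the oldest one are O(1), and finding the block that covers a given time is a binary search, O(log n); records within that block would then be read sequentially. The class BlockRing and its methods are hypothetical and do not describe RealDB's actual file layout.

```java
/**
 * Hypothetical sketch: a fixed-capacity circular index of time-ordered blocks.
 * Each slot holds the timestamp of the first record in a block. This only
 * illustrates the idea; it is not RealDB's actual file layout.
 */
class BlockRing {
    private final long[] startTimes; // first timestamp of each block, by physical slot
    private int oldest = 0;          // physical slot of the oldest block
    private int count = 0;           // number of blocks currently stored

    BlockRing(int capacity) {
        startTimes = new long[capacity];
    }

    /** O(1) append; when the ring is full, the oldest block is overwritten (O(1) delete). */
    void append(long firstTimestamp) {
        if (count > 0 && firstTimestamp < startTimes[physical(count - 1)]) {
            throw new IllegalArgumentException("blocks must arrive in time order");
        }
        if (count == startTimes.length) {
            startTimes[oldest] = firstTimestamp;       // reuse the oldest slot
            oldest = (oldest + 1) % startTimes.length; // next-oldest becomes oldest
        } else {
            startTimes[physical(count)] = firstTimestamp;
            count++;
        }
    }

    /** O(log n): logical index (0 = oldest) of the last block starting at or before t, or -1. */
    int findBlock(long t) {
        int lo = 0, hi = count - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (startTimes[physical(mid)] <= t) {
                result = mid; // candidate; a later block might also qualify
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    long startTimeAt(int logicalIndex) {
        return startTimes[physical(logicalIndex)];
    }

    int size() {
        return count;
    }

    /** Maps a logical position (0 = oldest) to a physical array slot. */
    private int physical(int logicalIndex) {
        return (oldest + logicalIndex) % startTimes.length;
    }
}
```

Because the ring never grows, the same structure also enforces a hard size limit: once it is full, each new block simply replaces the oldest one.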

Download

Latest Release: RealDB 1.0, released on August 22, 2010.

realdb-1.0-bin.zip - Includes the binary versions of each RealDB component
realdb-1.0-deps.zip - Includes runtime dependencies for core library and all other components, plus source where required. Extract this archive to the same directory as realdb-1.0-bin (so that all of the jars are in the same directory) to make RealDB runnable. For build/test dependencies, please use Maven.
realdb-1.0-src.tar.gz - Source code for the entire RealDB project

Please note that unfortunately the full dataset used for benchmarking in the final report cannot be published. A very small dataset usable for functionality demonstration is provided but is not suitable for benchmarking due to its size.

Javadoc for the realdb-core module (the “library” part) can be browsed online. To get started, read the Using RealDB section below.

Report

The final report was completed on September 19, 2010.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, see <http://www.gnu.org/licenses>.

Additional permission under GNU GPL version 3 section 7

If you modify this Program, or any covered work, by linking or combining it with JUnit (or a modified version of that library), containing parts covered by the terms of Common Public License, 1.0, the licensors of this Program grant you additional permission to convey the resulting work.

Dependencies

The following libraries are used. Note that with the new Maven build system, these are downloaded for you automatically:

Antlr 3.2 and its stringtemplate dependency - used to generate lexer and parser for RDL (RealDB Definition Language), Modified BSD license
JUnit 4.8.1, CPL licensed
JFreeChart 1.0.13 and its dependency JCommon 1.0.15, LGPL licensed
EasyMock, Apache 2.0 License

For running the main library, realdb-core, the only dependency is the Antlr runtime jar. For the browser and concept tools, JFreeChart is also needed. All other dependencies are used in building or unit tests.

For the benchmark program, it should be able to work with any JDBC driver, assuming the database server and driver configurations are set up properly. The benchmark as run in the final report uses the following:

MySQL Connector/J 5.1.12, GPLv2
Derby 10.5.3.0_1, Apache 2.0 License

Using RealDB

This section is very minimalistic and will be expanded in the future.

Running RealDB

Download the full release of RealDB; no compilation is needed. There are several ways to use RealDB:

Run the realdb-demo.jar in a GUI environment by double-clicking it (if your OS environment has an association for executable jars).
Run realdb-cli.jar on the command line by executing java -jar realdb-cli.jar in the directory where you uncompressed RealDB.
Reference realdb-core.jar in your own Java project and use the API calls to read and write data.
Run the RealDB Browser (DB viewer) in a GUI environment by double-clicking realdb-browser.jar.
Run the RealDB proof-of-concept:
1. Execute java -cp realdb-concept-1.0.jar org.gillius.realdb.concept.VehicleDataSimulation data/Test.csv test.rdb; note that this will create a 200MB database file!
2. Run the concept analysis tool in a GUI environment by double-clicking realdb-concept.jar
Run the RealDB benchmark on a Linux machine:
1. Follow the instructions in mysql-setup.txt to download and install a “private” copy of MySQL.
2. Run mkdir -p db_data/realdb to create the DB directories.
3. Run java -jar realdb-benchmark-1.0.jar realdb_file myisam innodb derby
  1. If you run as “root” and have a raw partition, you can run the “realdb_raw” test:
    1. Modify data/realdb_raw.properties, setting “dataFile” to the device you wish to use; the benchmark will replace any data on that partition/device!
    2. Change monitoredDir and monitoredPattern to match.
    3. Include “realdb_raw” as a parameter to the benchmark
4. If you are on Windows, you are mostly on your own. However, the steps are:
  1. Follow the equivalent of the steps in mysql-setup.txt.
  2. Modify data/innodb.properties and data/myisam.properties to comment out the Linux form of the command line and uncomment the Windows form.
  3. Don’t include “realdb_raw” in the run line, since this mode of RealDB works only on Linux.

Note: the current version of RealDB may be unable to work with databases created by versions of RealDB earlier than 1.0.

Compiling RealDB

Apache Maven is used for compilation. To compile RealDB yourself, download a recent version of Maven (2.2 was used at the time), and run mvn install to locally install the RealDB packages. The resulting source, javadoc, and binary jars for each module will be in the respective target directory.

RealDB Definition Language

The RealDB Definition Language (RDL) is used to define RealDB databases. Here is an example RDL file:

SET blockSize     = 2048
SET fileSize      = 204800
SET maxStreams    = 3
SET dataBlockSize = 2

CREATE STREAM Test WITH ID 1 {
    value float NULL //will use SampledAlgorithm by default
}

CREATE STREAM CarSnapshots WITH ID 2 {
    rpm float WITH CODEC DeadbandAlgorithm PARAMS (deadband=50.0),
    speed float WITH CODEC DeadbandAlgorithm PARAMS (deadband=5),
    passengers uint8 WITH CODEC StepAlgorithm,
    driving boolean WITH CODEC StepAlgorithm
}

Brief descriptions of the elements in this file:

blockSize = the low-level block size of the file. It must be equal to or a whole multiple of the native block size of the storage device to ensure complete reliability.
fileSize = size of the file in bytes. It should be a multiple of blockSize; a partial block at the end would be wasted.
maxStreams = number of data streams to reserve space for in the file. This must be equal to or greater than the number of streams actually created. In the future, RealDB will support adding new streams to an existing database.
dataBlockSize = size of a raw stream data block, expressed as a multiple of blockSize. The larger the dataBlockSize, the more efficient the access, but the more data is lost in a database, power, or I/O fault.
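For the example file above, the arithmetic works out to 204800 / 2048 = 100 blocks, and each raw stream data block spans 2 × 2048 = 4096 bytes. The hypothetical RdlSettingsCheck class below applies the rules from this list; it is not part of RealDB, and the 512-byte native block size is an assumption (real devices vary).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sanity check for the RDL SET values; not part of RealDB.
 * ASSUMED_NATIVE_BLOCK_SIZE is an assumption; real storage devices vary.
 */
class RdlSettingsCheck {
    static final int ASSUMED_NATIVE_BLOCK_SIZE = 512;

    /** Returns a list of problems; an empty list means the settings look consistent. */
    static List<String> check(int blockSize, long fileSize, int maxStreams,
                              int streamsDefined, int dataBlockSize) {
        List<String> problems = new ArrayList<>();
        if (blockSize < ASSUMED_NATIVE_BLOCK_SIZE || blockSize % ASSUMED_NATIVE_BLOCK_SIZE != 0) {
            problems.add("blockSize should be a multiple of the native block size");
        }
        if (fileSize % blockSize != 0) {
            problems.add("fileSize is not a multiple of blockSize; "
                    + (fileSize % blockSize) + " trailing bytes would be wasted");
        }
        if (maxStreams < streamsDefined) {
            problems.add("maxStreams is smaller than the number of streams created");
        }
        if (dataBlockSize < 1) {
            problems.add("dataBlockSize must be at least 1");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Values from the example RDL file, which creates 2 streams.
        int blockSize = 2048;
        long fileSize = 204800;
        int dataBlockSize = 2;
        System.out.println("Blocks in file: " + fileSize / blockSize);             // 100
        System.out.println("Bytes per data block: " + dataBlockSize * blockSize);  // 4096
        System.out.println("Problems: " + check(blockSize, fileSize, 3, 2, dataBlockSize)); // []
    }
}
```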

Creating Streams

CREATE STREAM <stream name> WITH ID <unique integer user ID> { <element>, <element>, ... }
<element> = <element name> <element type> [NULL] [WITH CODEC <codec name>] [PARAMS (a=value, b=value, ...)]
- If NULL is specified, the element is allowed to take on a null value, otherwise it must always have a defined value
<codec name> = one of the following options:
- SampledAlgorithm (the default) - Every record is written
- StepAlgorithm - Record is written if the value changed at all
- DeadbandAlgorithm - works only on numeric types; value written if |nextValue - previousValue| >= deadband; takes deadband as a parameter.
- A fully-qualified class name of a Java class that implements org.gillius.realdb.metadata.ElementCodec (allows user-defined codecs).
<element type>
- sint8, sint16, sint32, sint64, uint8, uint16, uint32 = signed integer types of 8, 16, 32, or 64 bits and unsigned integer types of 8, 16, or 32 bits.
- float, double = 32-bit and 64-bit IEEE-754 floating-point types, supporting the full IEEE-754 ranges and special values, including denormals.
- boolean = element can be true or false
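The write decisions described for the three built-in codecs can be sketched as follows. This is an illustration only: the WriteFilter interface, the class names, and the null handling are hypothetical and are not the org.gillius.realdb.metadata.ElementCodec API. The deadband rule here compares against the last value actually written, which is an assumption about what “previousValue” means; comparing against the last value seen would let a slow drift go unrecorded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical interface; not RealDB's ElementCodec. */
interface WriteFilter {
    /** True if next should be stored, given the last stored value (null if none). */
    boolean shouldWrite(Double lastStored, Double next);
}

/** Like SampledAlgorithm: every record is written. */
class Sampled implements WriteFilter {
    public boolean shouldWrite(Double lastStored, Double next) {
        return true;
    }
}

/** Like StepAlgorithm: written only if the value changed at all. */
class Step implements WriteFilter {
    public boolean shouldWrite(Double lastStored, Double next) {
        return !Objects.equals(lastStored, next);
    }
}

/** Like DeadbandAlgorithm: written if |next - lastStored| >= deadband. */
class Deadband implements WriteFilter {
    private final double deadband;

    Deadband(double deadband) {
        this.deadband = deadband;
    }

    public boolean shouldWrite(Double lastStored, Double next) {
        if (lastStored == null || next == null) {
            return !Objects.equals(lastStored, next); // assumption: null transitions are recorded
        }
        return Math.abs(next - lastStored) >= deadband;
    }
}

class CodecDemo {
    /** Applies a filter to a sequence of samples and returns the values that would be stored. */
    static List<Double> filter(WriteFilter codec, double... samples) {
        List<Double> stored = new ArrayList<>();
        Double last = null; // last value actually written
        for (double sample : samples) {
            if (codec.shouldWrite(last, sample)) {
                stored.add(sample);
                last = sample;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        // Prints [1000.0, 1051.0, 1200.0]
        System.out.println(filter(new Deadband(50.0), 1000, 1020, 1049, 1051, 1060, 1200));
    }
}
```

With deadband=50.0, as used for rpm in the example RDL file, the six samples reduce to three stored values.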

RealDB Concepts

DataStream

A DataStream is a time-ordered sequence of Records with a fixed format: an ordered list of elements.

Record

A single entry in a DataStream. Each Record has a timestamp and can be a “discontinuity” marker.

Element

A specific field in the Record with a known name and type.

StreamInterval

A reconstructed continuous interval of a DataStream, composed of one or more original (possibly dropped) records.
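As a minimal sketch, assuming sample-and-hold reconstruction (each stored value holds until the next stored record), intervals can be rebuilt as below. RealDB's actual reconstruction may depend on the codec, and IntervalBuilder is a hypothetical name. In the example, original samples at times 0, 1, 2, 3 with values 5, 5, 5, 7 would be stored by a step codec as only two records, so the first reconstructed interval stands in for three original records.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: rebuild continuous intervals from stored samples using
 * sample-and-hold, so dropped samples are covered by the previous stored value.
 */
class IntervalBuilder {
    static final class Interval {
        final long start, end; // half-open range [start, end)
        final double value;

        Interval(long start, long end, double value) {
            this.start = start;
            this.end = end;
            this.value = value;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")=" + value;
        }
    }

    /** times must be ascending; the last stored value is held until endTime. */
    static List<Interval> build(long[] times, double[] values, long endTime) {
        List<Interval> out = new ArrayList<>();
        for (int i = 0; i < times.length; i++) {
            long end = (i + 1 < times.length) ? times[i + 1] : endTime;
            if (end > times[i]) { // skip zero-length intervals
                out.add(new Interval(times[i], end, values[i]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Step codec kept only t=0 (value 5) and t=3 (value 7).
        // Prints [[0,3)=5.0, [3,4)=7.0]
        System.out.println(build(new long[] {0, 3}, new double[] {5, 7}, 4));
    }
}
```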

Discontinuities

A stream can be in a “discontinuity” state. This is different from individual elements being null, or even from all elements being null at the same time. The exact meaning is defined by the user. For example, a null could mean “not applicable” or “collected but not present,” whereas a discontinuity could signify ranges where the system was turned off and not collecting data at all. When a discontinuity is written, the stream remains in that state until the next non-discontinuity record is written.
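The rule that a stream stays in the discontinuity state until the next normal record can be expressed as a small lookup. The sketch below is hypothetical (not a RealDB API) and treats times before the first record as a discontinuity, which is an assumption.

```java
/**
 * Hypothetical sketch: a stream stays in the discontinuity state from a
 * discontinuity record until the next non-discontinuity record.
 */
class DiscontinuityState {
    /**
     * @param times           ascending record timestamps
     * @param isDiscontinuity true where the record is a discontinuity marker
     * @return true if the stream is in a discontinuity at time t
     *         (also true before the first record, which is an assumption)
     */
    static boolean inDiscontinuity(long[] times, boolean[] isDiscontinuity, long t) {
        boolean state = true; // no data yet
        for (int i = 0; i < times.length && times[i] <= t; i++) {
            state = isDiscontinuity[i]; // the latest record at or before t decides
        }
        return state;
    }
}
```

For records at 0 (normal), 100 (discontinuity), and 250 (normal), the stream is in a discontinuity from 100 up to, but not including, 250.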

Timestamps

All records in RealDB are timestamped and written in ascending order of time. The timestamp is a 63-bit unsigned integer. RealDB does not assume any particular unit or meaning for time other than it being linear (i.e., the distance between 5 and 10 is the same as between 50 and 55), leaving the user to define both the unit (e.g., seconds versus milliseconds) and the epoch (the meaning of time 0).
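A 63-bit unsigned timestamp fits exactly in the non-negative range of a Java long (0 to Long.MAX_VALUE). The hypothetical TimestampGuard below shows how a writer might enforce the ordering rule before handing records to RealDB, and uses milliseconds since the Unix epoch as one possible choice of unit and epoch. Both the class and the choice of unit are assumptions, not RealDB behavior.

```java
import java.time.Instant;

/**
 * Hypothetical writer-side check of the timestamp rules; not a RealDB class.
 * A 63-bit unsigned value is exactly the non-negative range of a Java long.
 */
class TimestampGuard {
    private long last = -1; // no record accepted yet

    /** Rejects negative timestamps and timestamps that go backwards in time. */
    void accept(long t) {
        if (t < 0) {
            throw new IllegalArgumentException("timestamp must fit in 63 bits (non-negative)");
        }
        if (t < last) {
            throw new IllegalArgumentException("timestamps must be ascending: " + t + " < " + last);
        }
        last = t;
    }

    /** Assumed convention: unit = milliseconds, epoch = 1970-01-01T00:00:00Z. */
    static long toTimestamp(Instant instant) {
        return instant.toEpochMilli();
    }
}
```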

Command Line Tool

The current command line tool supports the following commands. Use the help command to get more information:

bulkLoad	Bulk loads data into a stream from a tab-separated values file.
create	Creates a new RealDB Database
describe	Shows information about a database.
help	Prints out a list of all commands or detailed help on a single command
intervals	Reads intervals in a given time range on an element within a stream.
read	Reads all raw records, or a range of raw records, from a single stream and writes them as tab-delimited output with the column headers in the first row.
summary	Reads intervals in a given time range on an element within a stream, and summarizes them into a single report.
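The summary command is only described briefly above. As a sketch of the kind of report that reconstruction makes possible, the hypothetical IntervalSummary class below computes the minimum, maximum, integral, and time-weighted average of a set of constant-value intervals; the actual contents of RealDB's report may differ.

```java
import java.util.List;

/**
 * Hypothetical sketch of summarizing reconstructed intervals into one report.
 * Each interval holds a constant value over [start, end); not a RealDB class.
 */
class IntervalSummary {
    static final class Interval {
        final long start, end;
        final double value;

        Interval(long start, long end, double value) {
            this.start = start;
            this.end = end;
            this.value = value;
        }
    }

    final double min, max, integral, average;

    IntervalSummary(double min, double max, double integral, double average) {
        this.min = min;
        this.max = max;
        this.integral = integral;
        this.average = average;
    }

    static IntervalSummary of(List<Interval> intervals) {
        if (intervals.isEmpty()) {
            throw new IllegalArgumentException("no intervals to summarize");
        }
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, integral = 0;
        long totalTime = 0;
        for (Interval iv : intervals) {
            long duration = iv.end - iv.start;
            min = Math.min(min, iv.value);
            max = Math.max(max, iv.value);
            integral += iv.value * duration; // area under a constant segment
            totalTime += duration;
        }
        // Time-weighted average: total area divided by total covered time.
        double average = totalTime > 0 ? integral / totalTime : Double.NaN;
        return new IntervalSummary(min, max, integral, average);
    }
}
```

For intervals [0,3) = 5 and [3,4) = 7, the integral is 5 × 3 + 7 × 1 = 22 and the time-weighted average is 22 / 4 = 5.5, whereas a plain average of the two stored values would give 6. The integral's unit is the element's unit multiplied by the user's time unit.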

Graphical Browser

There is a graphical browser tool that can view information about a database and the streams within. See the Running RealDB section on how to start it.