Year

2015

Season

Spring

Paper Type

Master's Thesis

College

College of Computing, Engineering & Construction

Degree Name

Master of Science in Computer and Information Sciences (MS)

Department

Computing

NACO controlled Corporate Body

University of North Florida. School of Computing

Committee Chairperson

Dr. Ching-Hua Chuan

Second Advisor

Dr. Behrooz Seyed-Abbassi

Rights Statement

http://rightsstatements.org/vocab/InC/1.0/

Third Advisor

Dr. Sanjay Ahuja

Fourth Advisor

Dr. Roger Eggen

Department Chair

Dr. Asai Asaithambi

College Dean

Dr. Mark A. Tumeo

Abstract

Time series data are sequences of data points collected at certain time intervals. Advances in mobile and sensor technologies have led to rapid growth in the amount of available time series data. The ability to search large time series data sets can be extremely useful in many applications. In healthcare, a system monitoring vital signs can search past data to identify potentially health-threatening conditions. In engineering, a system can analyze the performance of complex equipment and use historical data to identify likely failures or maintenance needs.

Existing search methods for time series data are limited in many ways. Systems utilizing memory-bound or disk-bound indexes are restricted by the resources of a single machine or hard drive. Systems that do not use indexes must search through the entire database whenever a search is requested.

The proposed system uses a multidimensional index in a distributed storage environment to remove the limits of a single physical machine and allow for high data scalability. Utilizing an index allows the system to locate patterns similar to the query without having to examine the entire dataset, which can significantly reduce the amount of computing resources required. The system uses an Apache HBase distributed key-value database to store the index and time series data across a cluster of machines. Evaluations were conducted to examine the system’s performance using synthesized data sets of up to 30 million data points. The evaluation results showed that, despite some drawbacks inherited from the R-tree data structure, the system can efficiently search and retrieve patterns in large time series datasets.
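
The idea of locating similar patterns through a bounding-box index kept in a key-value store can be illustrated with a minimal sketch. It is not the thesis's implementation: a Python dict stands in for the distributed table, consecutive windows are simply grouped into leaves instead of being inserted and split as in a full R-tree, and every name here (build_index, search, window, leaf_size, the "leaf:n" keys) is hypothetical. Each leaf carries a minimum bounding rectangle, and a leaf is read only when the query could have a match inside it.

```python
import math

def mbr(points):
    """Minimum bounding rectangle (lows, highs) of equal-length points."""
    lows = tuple(min(c) for c in zip(*points))
    highs = tuple(max(c) for c in zip(*points))
    return lows, highs

def min_dist(q, box):
    """Smallest Euclidean distance from point q to any point inside box."""
    lows, highs = box
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, lows, highs)))

def build_index(series, window, leaf_size):
    """Build a two-level index in a dict that stands in for a key-value table.

    Assumption: each sliding window is used directly as a point, and
    consecutive windows are grouped into leaves of leaf_size entries.
    """
    store = {}
    entries = [(i, tuple(series[i:i + window]))
               for i in range(len(series) - window + 1)]
    root = []
    for n, start in enumerate(range(0, len(entries), leaf_size)):
        leaf = entries[start:start + leaf_size]
        key = f"leaf:{n}"  # hypothetical row key
        store[key] = leaf
        root.append((mbr([p for _, p in leaf]), key))
    store["root"] = root
    return store

def search(store, query, radius):
    """Return windows within radius of query and the number of leaves read."""
    q = tuple(query)
    hits, leaves_read = [], 0
    for box, key in store["root"]:
        if min_dist(q, box) > radius:
            continue  # no window in this leaf can be within radius
        leaves_read += 1
        for offset, point in store[key]:
            d = math.dist(q, point)
            if d <= radius:
                hits.append((offset, d))
    return sorted(hits), leaves_read

# Synthetic series: a short rising pattern surrounded by zeros.
series = [0.0] * 8 + [5.0, 6.0, 7.0, 8.0] + [0.0] * 8
store = build_index(series, window=4, leaf_size=4)
hits, leaves_read = search(store, [5.0, 6.0, 7.0, 8.0], radius=0.5)
print(hits, leaves_read)
```

On this toy series the query matches the window at offset 8 and only one of the five leaves is read; the other four are skipped based on their bounding rectangles alone, which is the saving an index offers over scanning every window.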

Suggested Citation

Charapko, Aleksey, "Time Series Similarity Search in Distributed Key-Value Data Stores Using R-Trees" (2015). UNF Graduate Theses and Dissertations. 565.
https://digitalcommons.unf.edu/etd/565

Included in

Databases and Information Systems Commons

UNF Graduate Theses and Dissertations

Time Series Similarity Search in Distributed Key-Value Data Stores Using R-Trees
