Year
2015
Season
Spring
Paper Type
Master's Thesis
College
College of Computing, Engineering & Construction
Degree Name
Master of Science in Computer and Information Sciences (MS)
Department
Computing
NACO controlled Corporate Body
University of North Florida. School of Computing
First Advisor
Dr. Ching-Hua Chuan
Second Advisor
Dr. Behrooz Seyed-Abbassi
Third Advisor
Dr. Sanjay Ahuja
Fourth Advisor
Dr. Roger Eggen
Department Chair
Dr. Asai Asaithambi
College Dean
Dr. Mark A. Tumeo
Abstract
Time series data are sequences of data points collected at certain time intervals. The advance in mobile and sensor technologies has led to rapid growth in the available amount of time series data. The ability to search large time series data sets can be extremely useful in many applications. In healthcare, a system monitoring vital signals can perform a search against the past data and identify possible health threatening conditions. In engineering, a system can analyze performances of complicated equipment and identify possible failure situations or needs of maintenance based on historical data.
Existing search methods for time series data are limited in many ways. Systems utilizing memory-bound or disk-bound indexes are restricted by the resources of a single machine or hard drive. Systems that do not use indexes must search through the entire database whenever a search is requested.
The proposed system uses multidimensional index in the distributed storage environment to break the bound of one physical machine and allow for high data scalability. Utilizing an index allows the system to locate the patterns similar to the query without having to examine the entire dataset, which can significantly reduce the amount of computing resources required. The system uses an Apache HBase distributed key-value database to store the index and time series data across a cluster of machines. Evaluations were conducted to examine the system’s performance using synthesized data up to 30 million data points. The evaluation results showed that, despite some drawbacks inherited from an R-tree data structure, the system can efficiently search and retrieve patterns in large time series datasets.
Suggested Citation
Charapko, Aleksey, "Time Series Similarity Search in Distributed Key-Value Data Stores Using R-Trees" (2015). UNF Graduate Theses and Dissertations. 565.
https://digitalcommons.unf.edu/etd/565