Year
2017
Season
Fall
Paper Type
Master's Thesis
College
College of Computing, Engineering & Construction
Degree Name
Master of Science in Computer and Information Sciences (MS)
Department
Computing
NACO controlled Corporate Body
University of North Florida. School of Computing
First Advisor
Dr. Sanjay P. Ahuja
Second Advisor
Dr. Zornitza Prodanoff
Rights Statement
http://rightsstatements.org/vocab/InC/1.0/
Third Advisor
Dr. Swapnoneel Roy
Department Chair
Dr. Sherif Elfayoumy
College Dean
Dr. Mark A. Tumeo
Abstract
Cloud computing is a computing paradigm where large numbers of devices are connected through networks that provide a dynamically scalable infrastructure for applications, data and storage. Currently, many businesses, from small scale to big companies and industries, are changing their operations to utilize cloud services because cloud platforms could increase company’s growth through process efficiency and reduction in information technology spending [Coles16]. Companies are relying on cloud platforms like Amazon Web Services, Google Compute Engine, and Microsoft Azure, etc., for their business development.
Due to the emergence of new technologies, devices, and communications, the amount of data produced is growing rapidly every day. Big data is a collection of large dataset, typically hundreds of gigabytes, terabytes or petabytes. Big data storage and the analytics of this huge volume of data are a great challenge for companies and new businesses to handle, which is a primary focus of this paper.
This research was conducted on Amazon’s Elastic Compute Cloud (EC2) and Microsoft Azure platforms using the HiBench Hadoop Big Data Benchmark suite [HiBench16]. Processing huge volumes of data is a tedious task that is normally handled through traditional database servers. In contrast, Hadoop is a powerful framework is used to handle applications with big data requirements efficiently by using the MapReduce
algorithm to run them on systems with many commodity hardware nodes. Hadoop’s distributed file system facilitates rapid storage and data transfer rates of big data among the nodes and remains operational even when a node failure has occurred in a cluster. HiBench is a big data benchmarking tool that is used for evaluating the performance of big data applications whose data are handled and controlled by the Hadoop framework cluster. Hadoop cluster environment was enabled and evaluated on two cloud platforms. A quantitative comparison was performed on Amazon EC2 and Microsoft Azure along with a study of their pricing models. Measures are suggested for future studies and research.
Suggested Citation
Muthiah, Karthika Ms., "Performance Evaluation of Hadoop based Big Data Applications with HiBench Benchmarking tool on IaaS Cloud Platforms" (2017). UNF Graduate Theses and Dissertations. 771.
https://digitalcommons.unf.edu/etd/771