Year of Publication

2017

Season of Publication

Fall

Paper Type

Master's Thesis

College

College of Computing, Engineering & Construction

Degree Name

Master of Science in Computer and Information Sciences (MS)

Department

Computing

NACO controlled Corporate Body

University of North Florida. School of Computing

First Advisor

Dr. Sanjay P. Ahuja

Second Advisor

Dr. Zornitza Prodanoff

Third Advisor

Dr. Swapnoneel Roy

Department Chair

Dr. Sherif Elfayoumy

College Dean

Dr. Mark A. Tumeo

Abstract

Cloud computing is a computing paradigm in which large numbers of devices are connected through networks that provide a dynamically scalable infrastructure for applications, data, and storage. Many businesses, from small firms to large companies and industries, are moving their operations to cloud services because cloud platforms can increase a company’s growth through process efficiency and reduced information technology spending [Coles16]. Companies rely on cloud platforms such as Amazon Web Services, Google Compute Engine, and Microsoft Azure for their business development.

Due to the emergence of new technologies, devices, and communications, the amount of data produced is growing rapidly every day. Big data refers to collections of large datasets, typically hundreds of gigabytes, terabytes, or petabytes in size. Storing and analyzing this huge volume of data is a great challenge for companies and new businesses, and it is the primary focus of this paper.

This research was conducted on the Amazon Elastic Compute Cloud (EC2) and Microsoft Azure platforms using the HiBench Hadoop Big Data Benchmark suite [HiBench16]. Processing huge volumes of data is a tedious task that has traditionally been handled by database servers. In contrast, Hadoop is a powerful framework that handles applications with big data requirements efficiently by using the MapReduce programming model to run them on systems with many commodity hardware nodes. Hadoop’s distributed file system provides rapid storage and high data transfer rates among the nodes, and it remains operational even when a node in the cluster fails. HiBench is a big data benchmarking tool used to evaluate the performance of big data applications whose data are handled and controlled by a Hadoop cluster. A Hadoop cluster environment was set up and evaluated on the two cloud platforms. A quantitative comparison of Amazon EC2 and Microsoft Azure was performed, along with a study of their pricing models. Measures are suggested for future studies and research.
