This report describes the initial implementation strategy for the virtualised big data use case of the MIKELANGELO project. This use case will showcase the advancements achieved by MIKELANGELO in the field of big data.
The goal of this use case is to leverage MIKELANGELO’s advancements in the I/O efficiency of virtual machines and in application management to run big data applications on virtual infrastructures. Besides integrating components from other work packages in MIKELANGELO, this use case will extend existing middleware to run big data clusters on demand in a cloud. By the end of the project, the aim is to allow resource providers to deploy an on-demand big data service on their cloud.
This report describes the use case along its two main stages: benchmarking and running mini use cases. The description covers the hardware and software infrastructure, the data sets, the mandatory requirements, and how the use case relates to the project’s key performance indicators (KPIs). Furthermore, we provide an implementation plan for the benchmarking stage and the mini use case stage. The benchmarking stage is described in more detail because the benchmarks will provide the project’s initial baseline measurements.
The work already performed for the big data use case has led to three tangible results. First, this document presents an overview of the state of the art in big data benchmarks, which forms the basis for our first choice of benchmark, HiBench. Second, we have run HiBench on a test bed to establish a baseline for evaluating MIKELANGELO’s new developments. The data generated in these experiments can be reviewed in MIKELANGELO’s open data repository. Third, we have formed an implementation plan for the big data use case covering the remainder of the project. This plan relies on a requirements analysis and on an analysis of the project’s KPIs, and it specifies the required features for the benchmarking part and for the mini use cases.
This report provides the most detailed description of the virtualised big data use case so far. It recaps the current state of the implementation and sets out an implementation plan to be executed during the next project stages. In conclusion, most of the work on the use case so far has been conceptual, and the use case is progressing smoothly into the implementation phases.