MIKELANGELO has adopted and extended snap, Intel’s open-source telemetry framework, to deliver full-stack instrumentation and monitoring across all of the MIKELANGELO use cases.

Snap is a framework that allows data center owners to dynamically instrument cloud-scale data centers. Precise, custom, complex flows of telemetry can be easily constructed and managed. Data can be captured from hardware and software sources, in-band or out-of-band, local or remote, via snap collector plugins. Captured data can be passed through local filters (snap processor plugins) that analyse the data and perform some action on it. The processed data can then be published by snap publisher plugins to arbitrary destinations. Endpoints can include SQL and NoSQL databases, message queues, or big data and analytics engines.
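The collect, process and publish flow described above can be pictured with a small sketch. This is a conceptual illustration only, not snap's plugin API: the `Metric` class, the `run_once` function, the example plugin functions and the `/example/...` metric names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Metric:
    """One telemetry sample (hypothetical structure, not snap's own type)."""
    namespace: str  # e.g. "/example/cpu/utilisation" (made-up name)
    value: float


# Each stage is modelled as a plain function for illustration.
Collector = Callable[[], List[Metric]]
Processor = Callable[[List[Metric]], List[Metric]]
Publisher = Callable[[List[Metric]], None]


def run_once(collector: Collector,
             processors: List[Processor],
             publishers: List[Publisher]) -> List[Metric]:
    """Run one pass: collect, apply each processor in order, then publish."""
    metrics = collector()
    for process in processors:
        metrics = process(metrics)
    for publish in publishers:
        publish(metrics)
    return metrics


def collect_cpu() -> List[Metric]:
    # Stand-in for a collector plugin; returns fixed sample values.
    return [Metric("/example/cpu/utilisation", 0.42),
            Metric("/example/cpu/idle", 0.58)]


def keep_utilisation(metrics: List[Metric]) -> List[Metric]:
    # Stand-in for a processor plugin; filters the metric list.
    return [m for m in metrics if m.namespace.endswith("/utilisation")]


published: List[Metric] = []


def publish_to_list(metrics: List[Metric]) -> None:
    # Stand-in for a publisher plugin; a real one would write to a
    # database or message queue instead of an in-memory list.
    published.extend(metrics)


result = run_once(collect_cpu, [keep_utilisation], [publish_to_list])
```

In this sketch only the utilisation metric reaches the publisher, which mirrors how a processor sits between collection and publication.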

Snap has been developed from the ground up to be trustworthy, performant, dynamic, scalable and highly extensible. Snap includes:

  • a daemon that runs on nodes to collect, process and/or publish data. The data can be collected from the local node or from remote nodes.
  • a dynamic catalogue of metrics, based on currently loaded plugins
  • highly configurable telemetry workflows, known as tasks
  • a command line interface that allows metrics, plugins and tasks to be manipulated
  • a RESTful API for remote management
  • simplified cluster-aware management via tribe

The core components of snap


For a complete introduction to snap in MIKELANGELO see our blog post “Full-stack cloud-scale instrumentation? It’s a snap…”.

Achievements and Results

Snap capturing CPU utilisation from a 500 node cluster

To date, MIKELANGELO has developed and open-sourced plugins to

  • collect data from Libvirt, OSv, MongoDB, SCSI, vRDMA, OpenFOAM, yarn, schedstat and KVM,
  • aggregate Utilisation, Saturation and Errors data from compute, storage, memory and network subsystems,
  • inject meta-data tags to facilitate offline analysis,
  • dynamically reduce telemetry resolution when data is stable, reducing network traffic by a factor of 16 in one deployment without affecting statistical insight,
  • publish telemetry to PostgreSQL.
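To make the idea of reducing telemetry resolution while data is stable more concrete, here is a minimal sketch. It is not the algorithm used by the MIKELANGELO plugin, which is not described here. The function name `adaptive_downsample`, the stability test (range of a sliding window within a tolerance) and all parameter values are assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List


def adaptive_downsample(samples: Iterable[float],
                        window: int = 4,
                        tolerance: float = 0.05,
                        keep_every: int = 16) -> List[float]:
    """Forward every sample while values change; while the last `window`
    values stay within `tolerance` of each other, forward only one sample
    in every `keep_every` (assumed, illustrative policy)."""
    recent: Deque[float] = deque(maxlen=window)
    forwarded: List[float] = []
    skipped = 0  # samples dropped since the last forwarded one
    for value in samples:
        recent.append(value)
        stable = (len(recent) == window
                  and max(recent) - min(recent) <= tolerance)
        if stable and skipped < keep_every - 1:
            skipped += 1  # data is stable: drop this sample
            continue
        forwarded.append(value)  # unstable, or time for a periodic sample
        skipped = 0
    return forwarded


# A flat signal is thinned out, while a sudden spike is still forwarded.
flat = adaptive_downsample([1.0] * 100)
with_spike = adaptive_downsample([1.0] * 10 + [5.0] + [1.0] * 10)
```

With `keep_every=16`, a perfectly stable stream is reduced to roughly one sample in sixteen once the window has filled. The value was picked here only to echo the factor quoted above; the real deployment's settings are not given in this text.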

MIKELANGELO has also demonstrated snap running on a 500 node cluster, and proven that the MIKELANGELO-enhanced ScyllaDB rewrite of Apache Cassandra can be employed as a back-end data-store for snap-gathered telemetry.

Snap can already collect data from OpenStack Cinder, Glance, Keystone, Neutron and Nova.

A plugin that collects data from Open vSwitch and OpenDaylight is currently under development. Utilities to facilitate installation and configuration of snap are also being finalised. In future work, we will investigate techniques to simplify analysis of the data captured by snap.

Here are pointers to key snap resources from both our project and the community that you may find useful. Enjoy!




Blog Posts

Software Releases

Community Resources