snap

MIKELANGELO has adopted and extended snap, Intel’s open-source telemetry framework, to deliver full-stack instrumentation and monitoring across all of the MIKELANGELO use cases.

Snap is a framework that allows data center owners to dynamically instrument cloud-scale data centers. Precise, custom, complex flows of telemetry can be easily constructed and managed. Data can be captured from hardware and software sources, in-band or out-of-band, local or remote, via snap collector plugins. Captured data can be passed through local filters – snap processor plugins – that analyse or transform the data. The processed data can then be published by snap publisher plugins to arbitrary destinations. Endpoints can include SQL and NoSQL databases, message queues, or analytics engines such as Intel’s open source Trusted Analytics Platform.
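The collect, process, publish flow described above can be pictured with a small conceptual sketch. This is not snap's plugin API: the function and class names (`collect_load`, `tag_processor`, `ListPublisher`, `run_task`), the metric namespaces, and the values are all hypothetical, chosen only to show how the three plugin roles chain together.

```python
from typing import Callable, Dict, List

# A metric is modelled as a plain dict; the keys are illustrative only.
Metric = Dict[str, object]


def collect_load() -> List[Metric]:
    """Hypothetical collector: returns fixed readings instead of probing a host."""
    return [
        {"namespace": "/example/cpu/load", "value": 0.42},
        {"namespace": "/example/mem/used_fraction", "value": 0.91},
    ]


def tag_processor(metrics: List[Metric], tags: Dict[str, str]) -> List[Metric]:
    """Hypothetical processor: attaches the same tags to every metric."""
    return [{**m, "tags": dict(tags)} for m in metrics]


class ListPublisher:
    """Hypothetical publisher: appends to an in-memory list instead of a database."""

    def __init__(self) -> None:
        self.sink: List[Metric] = []

    def publish(self, metrics: List[Metric]) -> None:
        self.sink.extend(metrics)


def run_task(
    collector: Callable[[], List[Metric]],
    processors: List[Callable[[List[Metric]], List[Metric]]],
    publisher: ListPublisher,
) -> List[Metric]:
    """Run one collect -> process -> publish cycle and return what was published."""
    metrics = collector()
    for process in processors:
        metrics = process(metrics)
    publisher.publish(metrics)
    return metrics


publisher = ListPublisher()
run_task(
    collect_load,
    [lambda ms: tag_processor(ms, {"experiment": "demo"})],
    publisher,
)
```

In snap itself each stage is a separately loaded plugin and the wiring is declared in a task rather than in code; the sketch only mirrors the order in which data moves through the stages.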

Snap has been developed from the ground up to be trustworthy, performant, dynamic, scalable and highly extensible. Snap includes:

  • a daemon on nodes that collects, processes and/or publishes data. The data can be collected from the local node, or from remote nodes.
  • a dynamic catalogue of metrics, based on currently loaded plugins
  • highly configurable telemetry workflows, known as tasks
  • a command line interface that allows metrics, plugins and tasks to be manipulated
  • a RESTful API for remote management
  • simplified cluster-aware management via tribe

The core components of snap

 

For a complete introduction to snap in MIKELANGELO see our blog post “Full-stack cloud-scale instrumentation? It’s a snap…”.

Achievements and Results

Snap capturing CPU utilisation from a 500 node cluster

To date, MIKELANGELO has developed and open-sourced plugins to

  • collect data from Libvirt, OSv, MongoDB, SCSI, vRDMA, OpenFOAM, yarn, schedstat and KVM,
  • aggregate Utilisation, Saturation and Errors data from compute, storage, memory, network subsystems,
  • inject meta-data tags to facilitate offline analysis,
  • dynamically reduce telemetry resolution when data is stable,
  • publish telemetry to PostgreSQL.
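To make the idea of reducing telemetry resolution when data is stable more concrete, here is a minimal sketch of one possible approach. It is not the algorithm used by the MIKELANGELO plugin: the function name `reduce_resolution` and the parameters `tolerance` and `max_gap` are assumptions made for illustration. A sample is kept when it differs from the last kept value by more than `tolerance`, or when `max_gap` consecutive samples have already been dropped, so that a steady signal still produces an occasional reading.

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp, value); illustrative representation


def reduce_resolution(
    samples: Iterable[Sample],
    tolerance: float = 0.05,
    max_gap: int = 5,
) -> List[Sample]:
    """Drop samples that stay within `tolerance` of the last kept value.

    After `max_gap` consecutive drops the next sample is kept regardless,
    acting as a heartbeat for an otherwise steady metric.
    """
    kept: List[Sample] = []
    last_value: Optional[float] = None
    skipped = 0
    for timestamp, value in samples:
        changed = last_value is None or abs(value - last_value) > tolerance
        if changed or skipped >= max_gap:
            kept.append((timestamp, value))
            last_value = value
            skipped = 0
        else:
            skipped += 1
    return kept


readings = [(0, 1.0), (1, 1.0), (2, 1.01), (3, 1.5), (4, 1.5)]
reduced = reduce_resolution(readings)
```

With the default parameters, only the first reading and the jump at timestamp 3 are kept, so the stable stretches are published at a lower resolution than they were collected.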

MIKELANGELO has also demonstrated snap running on a 500-node cluster, and has shown that ScyllaDB, the MIKELANGELO-enhanced rewrite of Apache Cassandra, can be employed as a back-end data store for snap-gathered telemetry.

Development is nearing completion on plugins to automatically reduce data resolution when metric readings are steady, and to gather utilisation, saturation and error data for common subsystems.

Snap can already collect data from OpenStack Cinder, Glance, Keystone, Neutron and Nova. In future work, we will explore whether integration with OpenStack Ceilometer would be beneficial. We will also investigate techniques to simplify analysis of the data captured by snap.

Here are pointers to key snap resources from both our project and the community that you may find useful. Enjoy!

MIKELANGELO Resources

Reports

Presentations

Blog Posts

Software Releases

Community Resources