This blog post presents the work done on enabling virtual machines inside the Torque resource manager. First, what is Torque?
Torque is an open-source resource manager, a fork of PBS, developed by Adaptive Computing to schedule computing tasks (jobs) on large clusters and HPC installations. It consists of three major parts:
- (at least one) queue
- the queuing system
- and a scheduler
Torque manages resources in cluster environments, such as compute nodes, storage, or licenses, and allocates the resources users request for their jobs. In the HPC world, compute nodes are usually allocated exclusively to a single user, in contrast to clouds, where many users share the same physical resource. For HLRS’ production team, Torque’s scheduling functionality is useful for managing planned maintenance: from a certain point in time onwards, nodes are no longer allocated to end users. As soon as all running jobs have completed, the maintenance can take place, while Torque keeps handling all other computing requests on our systems. The affected nodes can then be shut down safely without interfering with other users.
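For illustration, draining a node for such a maintenance can be done with Torque’s standard pbsnodes command (the node name below is made up):

# Mark the node offline: the scheduler stops placing new jobs on it,
# while jobs already running there continue until completion.
pbsnodes -o node042

# After the maintenance, clear the offline state again.
pbsnodes -c node042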
Let’s now look at the reasons for using VMs inside HPC systems and, consequently, for enabling the management of VMs by Torque. HPC applications usually need to be adapted to the target system’s properties; by system we mean the operating system, the available libraries (down to the specific library version), and so on. This process is rather cumbersome and can consume a lot of effort. A far less tedious approach is a cloud-enabled one: build once, run anywhere. The set-up of the simulation is done in virtual machines, which can then be deployed to an arbitrary HPC system. Having briefly presented the problem and a possible solution, let’s look at the technical approach towards VMs in HPC, using Torque.
How to enable VMs inside Torque
First, let’s see how jobs are normally submitted to Torque:
qsub -l nodes=1:ppn=16 ./jobscript.sh
In this case the Torque server receives a jobscript in combination with a resource request for 1 node with 16 cores, indicated by the '-l' parameter. The job is first put into a queue, its execution is scheduled, and as soon as sufficient resources become available to satisfy the job’s requirements, it gets deployed on the allocated compute resources and executed.
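For reference, the jobscript submitted above can be an ordinary Torque batch script. A minimal sketch (its contents are purely illustrative, not taken from our demonstrator) might look like this:

#!/bin/bash
#PBS -N example_job
#PBS -l walltime=00:30:00

# Torque starts the script in the user's home directory,
# so change into the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Run the actual workload, here a hypothetical MPI simulation.
mpirun -np 16 ./my_simulation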
We implemented a new, additional parameter. Control of the execution is still left to Torque – this process stays intact. The important change concerns the insertion of the job into the queue. This leaves the Torque source code unchanged – the required changes live entirely outside of the Torque system, which eases the management of the changes and increases the flexibility of the system. An example of the job insertion is shown below, followed by an explanation of the new parameters qsub now accepts, prefixed with '-vm':
qsub -l nodes=1:ppn=16 -vm img=/images/pool/ubuntu_bonesV01.51c.img,vcpu_pinning=./cpu_map.txt,vcpus=14 ./jobscript.sh
Let’s have a closer look at the extended VM parameter set used in the command above:
- -vm: indicates that the following are parameters for the VM
- img=[path]: the image that will be booted
- vcpu_pinning=[path]: path to the CPU pinning file (optional); the format follows the KVM parameters inside the libvirt domain XML
- vcpus=[int]: number of virtual CPUs (optional)
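The contents of such a pinning file are not shown in this post. Since the format follows the KVM parameters inside the libvirt domain XML, a cpu_map.txt mapping virtual CPUs to host cores could contain a cputune fragment like the following (the concrete mapping values are made up):

<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <vcpupin vcpu='2' cpuset='4'/>
</cputune>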
To see all of this in action, you can watch our screencast of the demonstrator.
How does it look to Torque?
As you can see here, both cases are supported (Job.001 and Job.002). Job.001 is a VM job that will not be executed on the node itself; instead, it is moved into an environment provided by a VM. This is implemented with a wrapper around the qsub command and a wrapper around the job itself. The wrapper around qsub notices the new parameter and places VM jobs with different prolog and epilog scripts (scripts that run before and after the job).
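A minimal sketch of such a qsub wrapper is shown below. This is not the actual implementation: the install paths and the helper scripts vm_prolog.sh and vm_epilog.sh are assumptions for illustration. It relies on Torque’s standard per-job '-l prologue=...' and '-l epilogue=...' resource options.

#!/bin/bash
# Hypothetical wrapper around qsub: detects the new -vm parameter and
# submits VM jobs with dedicated prolog/epilog scripts, leaving
# regular jobs untouched.

VM_ARGS=""
PASSTHROUGH=()

while [ $# -gt 0 ]; do
  case "$1" in
    -vm)
      VM_ARGS="$2"    # e.g. img=...,vcpu_pinning=...,vcpus=14
      shift 2
      ;;
    *)
      PASSTHROUGH+=("$1")
      shift
      ;;
  esac
done

if [ -n "$VM_ARGS" ]; then
  # VM job: export the VM parameters into the job environment and
  # attach VM-specific prolog/epilog scripts (paths are illustrative).
  exec /usr/bin/qsub -v VM_ARGS="$VM_ARGS" \
    -l prologue=/opt/vm-torque/vm_prolog.sh \
    -l epilogue=/opt/vm-torque/vm_epilog.sh \
    "${PASSTHROUGH[@]}"
else
  # Regular job: pass everything through unchanged.
  exec /usr/bin/qsub "${PASSTHROUGH[@]}"
fi

The wrapper strips the '-vm' option before the real qsub ever sees it, which is what keeps the Torque source code itself unchanged.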
What is the benefit?
Obviously, performance suffers when workloads are run in VMs, and performance is the most crucial aspect of High Performance Computing (HPC). So what’s the point in doing this?
Cloud-like approaches benefit immediately from cloud features such as VM migration, VM suspension, and the general flexibility of VMs. Users get the flexibility of bringing their own software environment. Administrators get the flexibility of migrating or suspending running VMs, for example to schedule maintenance on degrading nodes earlier. And thanks to the suspend feature of VMs, programmers no longer have to write applications that are capable of checkpoint and restart themselves.
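For illustration, on a KVM/libvirt-based setup these operations map onto standard virsh commands (the domain name job001-vm is made up):

# Pause a running VM; its state stays in memory.
virsh suspend job001-vm

# Resume it later.
virsh resume job001-vm

# Live-migrate a running VM to another host over SSH.
virsh migrate --live job001-vm qemu+ssh://node02/system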
What is planned next
Besides enabling interactive jobs in combination with VMs, which will first require a patch to Torque’s source code, we also want to improve performance with the help of our MIKELANGELO partners. There is the optimized hypervisor sKVM, developed by IBM Israel; virtualized RDMA, developed by Huawei Germany; the IO-CoreManager, developed by IBM Israel; and OSv, a tiny cloud OS developed by ScyllaDB Israel.