By Oliveira, V.; Pina, A.; Rocha, A.
Proceedings - 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2012
The use of virtualization in HPC clusters can provide rich software environments, application isolation and efficient workload management mechanisms, but system-level virtualization introduces a software layer on the computing nodes that reduces performance and inhibits the direct use of hardware devices. We propose an unobtrusive user-level platform that allows the execution of virtual machines (VMs) inside batch jobs without limiting the computing cluster’s ability to execute the most demanding applications. The per-user platform offers two modes: a static mode, in which the VMs run entirely within the resources of a single batch job, and a dynamic mode, in which the VMs move at runtime between the node time-slots of continuously allocated jobs. The dynamic mode makes it possible to build complex scenarios with several VMs for personalized HPC environments or for persistent services such as databases and web-service-based applications. Fault-tolerant system agents, integrated using group communication primitives, control the system and execute both user commands and the automatic scheduling decisions made by an optional monitoring function. Compute-intensive applications running on our system suffer negligible overhead compared to the native configuration. The performance of distributed applications depends on their communication patterns, because the user-mode network overlay introduces a significant communication overhead.
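The dynamic mode described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Slot` type, the `plan_moves` function and the rule of moving as soon as the next slot opens are all assumptions made here for illustration. Given the node time-slots granted by successive batch jobs, the sketch computes when a single VM would have to move from one node to another to keep running, and reports an error if the slots leave a gap.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Slot:
    """Hypothetical node time-slot granted by one batch job."""
    node: str
    start: int  # slot start, in seconds from an arbitrary origin
    end: int    # slot end, same unit


def plan_moves(slots: List[Slot]) -> List[Tuple[int, str, str]]:
    """Return (time, from_node, to_node) moves that keep one VM running.

    Assumption: the VM moves as soon as the next useful slot opens.
    Raises ValueError if consecutive slots leave a gap, because the VM
    would then have no allocated node to run on.
    """
    if not slots:
        return []
    ordered = sorted(slots, key=lambda s: s.start)
    moves = []
    current = ordered[0]
    for nxt in ordered[1:]:
        if nxt.end <= current.end:
            continue  # this slot does not extend the VM's running time
        if nxt.start > current.end:
            raise ValueError(f"gap between {current.node} and {nxt.node}")
        if nxt.node != current.node:
            moves.append((nxt.start, current.node, nxt.node))
        current = nxt
    return moves


slots = [Slot("n1", 0, 100), Slot("n2", 90, 200), Slot("n2", 190, 300)]
moves = plan_moves(slots)
```

In the static mode no such plan is needed, since the VM lives and dies within a single job's allocation. In this example the only move is from `n1` to `n2` when the second slot opens; the third slot is on the same node, so it extends the running time without a move.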