Program

Monday, Aug 14, 1:45pm-5pm
Room 3.33

1:45 PM - 2:00 PM

Welcome Address and Opening Remarks

2:00 PM - 3:00 PM

Technical Session 1: Performance

14:00-14:30 Thomas Carroll and Prudence W.H. Wong. An Improved Abstract GPU Model with Data Transfer

GPUs are commonly used as coprocessors to accelerate compute-intensive tasks, thanks to their massively parallel architecture. Various abstract parallel models have been studied, allowing researchers to design and analyse parallel algorithms. However, most work on analysing GPU algorithms has relied on software-based tools for profiling a GPU program. Recently, some abstract GPU models have been proposed, yet they do not capture all elements of a GPU: they miss the data transfer between CPU and GPU, which in practice can cause a bottleneck and reduce performance dramatically. We propose a comprehensive model called Abstract Transferring GPU, which to our knowledge is the first abstract GPU model to capture data transfer between CPU and GPU. We show via experiments that existing models cannot sufficiently model the actual running time in all cases, as they do not capture data transfer. We show that by capturing the data transfer with our model, we obtain more accurate predictions of the actual running time. We expect our model to improve the design and analysis of heterogeneous systems consisting of CPU and GPU, and to allow researchers to make better-informed implementation decisions, as they will be aware of how data transfer affects their programs.
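As a rough illustration of the bottleneck described above (not the authors' Abstract Transferring GPU model; all bandwidth, latency and size figures below are assumed for the example), a back-of-envelope sketch can compare a compute-only time estimate with one that also charges for CPU-GPU copies:

```python
# Back-of-envelope GPU time model: illustrative only, not the paper's
# Abstract Transferring GPU model. All constants are assumed values.

PCIE_BW = 12e9      # assumed effective PCIe bandwidth, bytes/s
PCIE_LAT = 10e-6    # assumed per-transfer latency, seconds

def transfer_time(n_bytes, bw=PCIE_BW, lat=PCIE_LAT):
    """Time to move n_bytes across the CPU-GPU link."""
    return lat + n_bytes / bw

def predicted_time(kernel_time, bytes_in, bytes_out, model_transfers=True):
    """Total time; a compute-only model drops the transfer terms."""
    if not model_transfers:
        return kernel_time
    return transfer_time(bytes_in) + kernel_time + transfer_time(bytes_out)

if __name__ == "__main__":
    # A 5 ms kernel that streams 256 MB in and 256 MB out.
    n = 256 * 1024 * 1024
    t_compute_only = predicted_time(5e-3, n, n, model_transfers=False)
    t_with_copies = predicted_time(5e-3, n, n)
    print(f"compute-only estimate: {t_compute_only*1e3:.1f} ms")
    print(f"with data transfer:    {t_with_copies*1e3:.1f} ms")
```

With these assumed numbers, the two transfer terms dominate the 5 ms kernel, which is exactly the regime where a compute-only model mispredicts the actual running time.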

14:30-15:00 Carlos Reaño and Federico Silla. A Comparative Performance Analysis of Remote GPU Virtualization over Three Generations of GPUs

The use of Graphics Processing Units (GPUs) has become a very popular way to accelerate the execution of many applications. However, GPUs are not exempt from side effects. For instance, GPUs are expensive devices which additionally consume a non-negligible amount of energy even when they are not performing any computation. Furthermore, most applications present low GPU utilization. To address these concerns, the use of GPU virtualization has been proposed. In particular, remote GPU virtualization is a promising technology that allows applications to transparently leverage GPUs installed in any node of the cluster. In this paper, the remote GPU virtualization mechanism is comparatively analyzed across three different generations of GPUs. The goal of this study is to analyze how the performance of the remote GPU virtualization technique is impacted by the underlying hardware. To that end, the Tesla K20, Tesla K40 and Tesla P100 GPUs, along with FDR and EDR InfiniBand fabrics, are used in the study. The analysis is performed in the context of the rCUDA middleware. It is clearly shown that the GPU virtualization middleware requires a comprehensive design of its communication layer, which should be perfectly adapted to every hardware generation in order to avoid a reduction in performance.
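As a back-of-envelope illustration (not results from the paper; the nominal link rates are public figures, while the assumption that copies are pipelined and bounded by the slowest link is ours), the effective copy bandwidth to a remote GPU is capped by the slower of the network and PCIe links, which is one reason the interconnect generation matters:

```python
# Rough bandwidth ceiling for a remote GPU: illustrative assumption that
# the copy path is pipelined and bounded by its slowest link.

GBPS = 1e9 / 8  # one gigabit/s expressed in bytes/s

links = {
    "local PCIe 3.0 x16": 128 * GBPS,   # ~16 GB/s nominal
    "remote over FDR IB": 56 * GBPS,    # 56 Gb/s FDR InfiniBand
    "remote over EDR IB": 100 * GBPS,   # 100 Gb/s EDR InfiniBand
}

PCIE = 128 * GBPS  # the GPU is always reached through PCIe at the far end

for name, bw in links.items():
    eff = min(bw, PCIE)  # slowest link on the path wins
    print(f"{name}: ~{eff / 1e9:.1f} GB/s upper bound")
```

Under these assumptions, EDR narrows the gap to a locally attached GPU considerably compared with FDR, consistent with the paper's point that the communication layer must be adapted to each hardware generation.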

3:00 PM - 3:30 PM

Coffee break

3:30 PM - 5:00 PM

Technical Session 2: Programming and Resource Management

15:30-16:00 Manjunath Gorentla Venkata, Ferrol Aderholdt and Zachary Parchman. SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory

Real-time cloud services and big data analysis applications often require high-performance distributed storage systems. To improve data access performance, large-scale cache systems have become a widely accepted solution. One of the most important technologies in cache systems is cache prefetching, which can directly impact overall system performance by preloading blocks into the cache before they are actually accessed. However, traditional cache prefetching approaches focus on small-scale cache systems with a limited number of applications, and thus cannot satisfy application requirements in cloud and big data environments. Based on the characteristics of distributed storage systems, we propose an application-oriented cache allocation and prefetching method (ACAP) to handle those challenges. Our method combines the advantages of both sequential and correlation-directed prefetching. More specifically, ACAP contains an application-oriented cache allocation approach (AOAC), a long-term effective sequential prefetching approach (CBCDP-SP) and a high-hit-rate correlation-directed prefetching approach (CBCDP-CP). Our experiments, based on 23 public real-world datasets and 6 well-known replacement strategies, verify that ACAP can significantly improve the hit rate of cache systems.
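The following toy sketch (a generic sequential prefetcher over an LRU cache, not the ACAP/AOAC/CBCDP algorithms named above) illustrates the basic mechanism: on a miss for block b, the next few blocks are preloaded so that subsequent sequential accesses hit:

```python
from collections import OrderedDict

class PrefetchingCache:
    """Toy LRU cache with sequential prefetch; illustrative only."""

    def __init__(self, capacity, prefetch_depth=2):
        self.capacity = capacity
        self.depth = prefetch_depth
        self.store = OrderedDict()  # block id -> None, kept in LRU order
        self.hits = self.misses = 0

    def _admit(self, block):
        self.store[block] = None
        self.store.move_to_end(block)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least-recently-used block

    def access(self, block):
        if block in self.store:
            self.hits += 1
            self.store.move_to_end(block)
            return
        self.misses += 1
        self._admit(block)
        for b in range(block + 1, block + 1 + self.depth):
            self._admit(b)  # sequential prefetch of the next blocks

trace = list(range(40)) + list(range(40))  # two sequential scans
cache = PrefetchingCache(capacity=16, prefetch_depth=2)
for b in trace:
    cache.access(b)
print(f"hit rate: {cache.hits / len(trace):.2%}")
```

With depth 2, roughly two of every three sequential accesses become hits in this toy trace; correlation-directed prefetching generalizes the same idea to non-sequential but recurring access patterns.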

16:00-16:30 Huanhuan Xiong and John Morrison. Towards a Scalable and Adaptable Resource Allocation Framework in Cloud Environments

Finding an appropriate resource to host the next application to be deployed in a Cloud environment can be a non-trivial task. To deliver the appropriate level of service, the functional requirements of the application must be met. Ideally, this process involves filtering the best resource from a number of possible candidates, whilst simultaneously satisfying multiple objectives. If timely responses to resource requests are to be maintained, the sophistication of the filtering mechanism and the size of the search space have to be carefully balanced. The quality of the solution will thus not readily scale with growth in cloud resources and filtering complexity. This limitation is becoming more evident with the emergence of hyper-scale clouds and with the increased complexity needed to accommodate the growing heterogeneity in resources. Moreover, meeting non-functional requirements, reflecting the Cloud Service Provider's business objectives, is also becoming increasingly critical, as service utilization and energy efficiency in a typical cloud deployment are extremely low.
This paper reexamines the resource allocation problem by proposing a framework that supports distributed resource allocation decisions and that can be dynamically populated with strategies reflecting the ever-growing number of diverse objectives as they become evident in the evolving cloud infrastructure.
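As a generic sketch of the filter-then-rank step the abstract alludes to (not the proposed framework; the hosts, attributes and weights below are invented for illustration), an allocator can first filter hosts that satisfy the functional requirements and then score the survivors against several non-functional objectives:

```python
# Toy filter-then-rank placement: illustrative only; the attributes,
# weights and hosts are invented, not part of the paper's framework.

hosts = [
    {"name": "h1", "free_cpus": 8,  "free_mem_gb": 32, "util": 0.20, "watts_idle": 180},
    {"name": "h2", "free_cpus": 4,  "free_mem_gb": 16, "util": 0.70, "watts_idle": 120},
    {"name": "h3", "free_cpus": 16, "free_mem_gb": 64, "util": 0.05, "watts_idle": 300},
]

request = {"cpus": 4, "mem_gb": 16}

def feasible(h):
    """Functional requirements: enough free CPU and memory."""
    return h["free_cpus"] >= request["cpus"] and h["free_mem_gb"] >= request["mem_gb"]

def score(h, w_util=0.6, w_energy=0.4):
    """Non-functional objectives: prefer consolidation (higher utilization)
    and lower idle power; weights are arbitrary example values."""
    return w_util * h["util"] - w_energy * h["watts_idle"] / 300

candidates = [h for h in hosts if feasible(h)]  # filter step
best = max(candidates, key=score)               # rank step
print("placing request on", best["name"])
```

The tension the abstract describes is visible even here: richer scoring functions and larger host lists make each decision slower, which is what motivates distributing the allocation decisions and swapping strategies in dynamically.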

16:30-17:00 Javier Prades and Federico Silla. Turning GPUs into Floating Devices over The Cluster: The Beauty of GPU Migration

Virtualization techniques have been shown to benefit data centers and other computing facilities. Not only do virtual machines allow reducing the size of the computing infrastructure while increasing overall resource utilization, but virtualizing individual components of computers can also provide significant benefits. This is the case, for example, of the remote GPU virtualization technique, implemented in several frameworks in recent years. The large degree of flexibility provided by remote GPU virtualization, however, can be further increased by applying the migration mechanism to it, so that the GPU part of an application can be live-migrated to another GPU elsewhere in the cluster during the execution of the application, transparently to the application. In this paper we discuss how the migration mechanism has been applied to different GPU virtualization frameworks. We also give an overview of the possibilities that migrating the GPU part of applications can provide to data centers and other computing facilities. Finally, we present the first results of ongoing work on applying the migration mechanism to the rCUDA remote GPU virtualization framework.
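Conceptually, migrating the GPU part of a running application means checkpointing its device state, restoring it on another GPU, and repointing the application's session, all while the host process keeps running. A toy simulation of those steps (a hypothetical outline, not rCUDA's actual mechanism):

```python
# Toy simulation of live-migrating the GPU part of an application between
# nodes; a conceptual sketch only, not the rCUDA mechanism in the paper.

def make_gpu(name):
    return {"name": name, "memory": {}, "modules": set()}

def migrate(session, src, dst):
    """Checkpoint device state on src, restore it on dst, repoint session."""
    session["paused"] = True                    # 1. quiesce in-flight work
    snapshot = {"memory": dict(src["memory"]),  # 2. checkpoint device state
                "modules": set(src["modules"])}
    dst["memory"].update(snapshot["memory"])    # 3. restore on target GPU
    dst["modules"] |= snapshot["modules"]
    src["memory"].clear()
    session["gpu"] = dst                        # 4. repoint and resume; the
    session["paused"] = False                   #    application never notices

gpu_a, gpu_b = make_gpu("node1:gpu0"), make_gpu("node2:gpu0")
gpu_a["memory"]["buf0"] = b"\x00" * 1024        # a device buffer in use
gpu_a["modules"].add("kernels.cubin")           # loaded GPU code
session = {"gpu": gpu_a, "paused": False}

migrate(session, gpu_a, gpu_b)
print("application's GPU work now runs on", session["gpu"]["name"])
```

In a real framework the hard parts are hidden in steps 1 and 3: draining in-flight kernels consistently and re-creating opaque handles (streams, events, contexts) on the destination GPU.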
