Rui Silva finaliza o seu Doutoramento

Tema da Tese: Locality optimisation techniques for platforms

Autor: Rui António Sabino Castiço da Silva

Orientadores:  João Luís Ferreira Sobral

Data: 29/03/2022

Programa Doutoral: Programa Doutoral em Informática


Scientific applications simulate the real world through mathematical models. The simulation of these
models requires all the computational power available in current architectures. However, to take advantage
of this computational power, the simulation code must be optimised, bringing it closer to the execution
platform. This adaptation introduces a lot of complexity in application development. Traditionally the
code is written according to the platform on which it will be executed. This approach has two problems:
first, the code is dependent on the execution platform, and changing the execution platform requires code
rewriting; second, the optimisation code is mixed with the domain code, making it difficult to understand.
The Object-Oriented Paradigm (OOP) is known for bringing the code closer to the domain. However, its
use typically introduces an overhead, preventing its use in scientific applications. This explains why Java,
one of the most widely used OOP languages is not commonly used to develop scientific applications. On
the other hand, modern OOP languages that rely on dynamic compilation (e.g., Java) can remove many
overheads typical of OOP, when more advanced features are not used (e.g. polymorphism), which is the
case of many scientific applications.
This dissertation introduces an approach that allows developers to perform optimisation in the final
development step. The approach enables multiple data layouts and allows the selection of the best layout
according to the execution platform. On the other hand, the approach supports tiling, packing and sorting
optimisations. Additionally, the approach supports parallel execution, hiding the implementation details
related to optimisation. The approach separates the development of the domain code from its optimisation.
The optimisation step is delayed until the final development step, which allows an easy adaptation to the
execution platform.
The supported optimisations rely on two mechanisms: first, changing the data collection layout and
second, decomposing the problem into subproblems. Both optimisations are introduced in the program by
the developer in a simple way (low development cost) and maintaining the domain concepts in the code.
First, for hiding the data layout, the approach is based on the proxy pattern, creating a temporary object that
accesses the data using the same API. Second, the domain decomposition mechanism enables several
common optimisations: processing data in tiles, packing, parallel execution and thread private data. The
technique was implemented by code annotations avoiding more invasive code changes.
The approach was evaluated with a set of case studies: Sum, daxpy, JECoLi, MD and Matrix multiplication.
This set allowed to verify the approach effectiveness in different cases and its execution overhead.
The adaptation of the domain code to support the approach was simpler than transforming the layout in
the domain code. In all cases, the approach obtained a performance similar to traditional approaches.
In the MD case, the example that supports more optimisations, the use of the approach provided a gain
of 50x in execution time. Other cases studies provided gains from 20x to 40x. The JECoLi case has
the lowest gain (1.6x) since the gain was only due to layout change. These gains show the feasibility of
the approach that delivered efficient optimised codes, adding low additional cost when compared with
traditional approaches.

Keywords: Data layout, Java, Optimisations for memory hierarchy, Parallel execution, Tiling