Optimizing a medical image registration algorithm based on profiling data for real-time performance

Bibliographic Details
Main Author: Carlos A. S. J. Gulo (author)
Other Authors: Antonio C. Sementille (author), João Manuel R. S. Tavares (author)
Format: article
Language: eng
Published: 2022
Online Access: https://hdl.handle.net/10216/139707
Country: Portugal
Oai: oai:repositorio-aberto.up.pt:10216/139707
Description
Summary: Image registration is a common task in medical image analysis. Therefore, a significant number of algorithms have been developed to perform rigid and non-rigid image registration. In particular, the free-form deformation algorithm is frequently used to carry out non-rigid registration tasks; however, it is a very computationally intensive algorithm. In this work, we describe an approach based on profiling data to identify parts of this algorithm for which parallel implementations can be developed. The proposed approach assesses the efficiency of the algorithm by applying performance analysis techniques commonly available in traditional computer operating systems. Hence, this article provides guidelines to support researchers working on medical image processing and analysis in achieving real-time non-rigid image registration applications using common computing systems. According to our experimental findings, significant speedups can be accomplished by parallelizing sequential snippets, i.e., code regions that are executed more than once. For the costly functions previously identified in the studied free-form deformation algorithm, the developed parallelization decreased the runtime by up to seven times relative to the corresponding single-threaded implementation. The implementations were developed based on the Open Multi-Processing application programming interface. In conclusion, this study confirms that, based on the call graph visualization and the detected performance bottlenecks, one can easily find and evaluate snippets that are potential optimization targets, in addition to throughput in memory accesses. (c) 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
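
As a rough illustration of the kind of change the summary describes (parallelizing a repeatedly executed loop with OpenMP), the sketch below shows a serial loop and an OpenMP variant of it. It is not taken from the article: the function names ssd_serial and ssd_parallel, the flat float-array image layout, and the choice of a sum-of-squared-differences loop as the hot spot are all assumptions made for this example, and the article's actual costly functions may look quite different.

```c
#include <stddef.h>

/* Hypothetical hot spot (an assumption, not the authors' code):
   sum of squared differences between two images stored as flat arrays.
   Serial reference version. */
double ssd_serial(const float *fixed, const float *moving, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = (double)fixed[i] - (double)moving[i];
        sum += d * d;
    }
    return sum;
}

/* Same loop with an OpenMP work-sharing directive. Each thread
   accumulates a private partial sum, and the reduction clause combines
   the partial sums when the loop ends. When the compiler is run without
   OpenMP support, the directive is skipped and the loop runs serially. */
double ssd_parallel(const float *fixed, const float *moving, size_t n)
{
    double sum = 0.0;
    long i;
    long count = (long)n;  /* signed loop index, accepted by all OpenMP versions */
#ifdef _OPENMP
    #pragma omp parallel for reduction(+:sum)
#endif
    for (i = 0; i < count; ++i) {
        double d = (double)fixed[i] - (double)moving[i];
        sum += d * d;
    }
    return sum;
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag. Because the reduction adds the partial sums in a different order than the serial loop, the two versions can differ in the last few bits for general floating-point input. Whether such a loop is worth parallelizing at all is what the profiling step is meant to decide.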