Dissertations and habilitations - Laboratoire de Recherche en Informatique

Ph.D de

Ph.D
Group : Parallel Systems

Efficient connected component labelling for high-performance architectures (original title: Étiquetage en composantes connexes efficace pour les architectures hautes performances)

Starts on 01/11/2012
Advisor : LACASSAGNE, Lionel

Funding : doctoral contract (UPS)
Affiliation : Université Paris-Saclay
Laboratory : LRI-ARCHI

Defended on 28/09/2016, committee :

Thesis advisor :
M. Lionel LACASSAGNE, UPMC

Examiners :
-M. Sylvain CONCHON, Université Paris-Sud 11,
-M. Daniel ETIEMBLE, Université Paris-Sud,
-M. Olivier SENTIEYS, INRIA,
-M. Sven SIMON, Universität Stuttgart,

Reviewers :
-Mme Annick MONTANVERT, Université Pierre-Mendès France,
-M. Hugues TALBOT, Université Paris-Est-Marne-la-Vallée / ESIEE,

Research activities :

Abstract :
This doctoral research takes place in the field of
algorithm-architecture matching for computer vision, specifically for
Connected Component Labelling (CCL) for high performance parallel
architectures.
While modern architectures are overwhelmingly multi-core, CCL algorithms
are mostly sequential and irregular, and they use a graph structure to
represent the equivalences between labels. These aspects make their
parallelization quite challenging.
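
As a rough illustration of the label-equivalence structure mentioned above, here is a minimal sketch of a classic two-pass direct labelling that records equivalences in a union-find table. It is not one of the algorithms studied in the thesis; the function names, the list-of-rows image format and the choice of 4-connectivity are assumptions made for the example.

```python
def find(parent, x):
    """Return the root label of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Record that labels a and b are equivalent (the smaller root wins)."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def label_two_pass(image):
    """Label a binary image (list of rows of 0/1), assuming 4-connectivity."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # index 0 is reserved for the background

    # First pass: provisional labels, equivalences stored in `parent`.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))      # create a new label
                labels[y][x] = len(parent) - 1
            else:
                labels[y][x] = min(l for l in (up, left) if l)
                if up and left:
                    union(parent, up, left)     # two labels meet here

    # Second pass: replace each label by its root, renumbered from 1.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(parent, labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels
```

The data-dependent `find`/`union` calls in the first pass are what make this family of algorithms irregular: the memory accesses depend on the image content, and every pixel may touch the shared equivalence table.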

CCL processes a binary image and gathers under the same label all the
connected pixels, and in doing so, CCL is a bridge between low-level
operations like filtering and high-level ones like shape recognition and
decision-making.
It is involved in a large number of processing chains that require
segmented image analysis. The acceleration of this step is therefore an
issue for a variety of algorithms.
At first, the Ph.D. work focused on the comparative performance of the
state-of-the-art algorithms, for CCL as well as for the feature
analysis of the connected components (CCA). This was done in order to
identify a hierarchy and the critical components of the algorithms. For
this, a benchmarking method, reproducible and independent of the
application domain, was proposed and applied to a representative set of
state-of-the-art algorithms. The results show that the fastest
sequential algorithm is the LSL algorithm, which manipulates segments
(runs of pixels), unlike other algorithms that manipulate pixels.
Secondly, a parallelization framework for direct algorithms, based on
OpenMP, was proposed with the main objective of computing the CCA on the
fly and reducing the cost of communication between threads.
For this, the binary image is divided into bands that are processed in
parallel on the cores of the architecture, and a pyramidal fusion step
that processes the generated disjoint sets of labels provides the
fully labeled image without concurrent access to data between threads.
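
The band decomposition and pyramidal fusion described above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis framework: the thesis uses OpenMP, whereas this sketch uses a Python thread pool purely to show the structure (Python threads give no real speed-up here), a simple flood fill stands in for the direct algorithm inside each band, and the names `label_band` and `label_in_bands` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def label_band(image, y0, y1, width):
    """Label rows y0..y1-1 on their own (4-connectivity flood fill).

    Labels start at y0 * width + 1, so every band uses a disjoint range.
    """
    band = [[0] * width for _ in range(y1 - y0)]
    next_label = y0 * width + 1
    for y in range(y0, y1):
        for x in range(width):
            if image[y][x] and not band[y - y0][x]:
                band[y - y0][x] = next_label
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (y0 <= ny < y1 and 0 <= nx < width
                                and image[ny][nx] and not band[ny - y0][nx]):
                            band[ny - y0][nx] = next_label
                            stack.append((ny, nx))
                next_label += 1
    return band


def find(parent, x):
    """Return the root label of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def label_in_bands(image, n_bands):
    height, width = len(image), len(image[0])
    bounds = [height * i // n_bands for i in range(n_bands + 1)]

    # Step 1: each band is labelled independently of the others.
    with ThreadPoolExecutor() as pool:
        bands = list(pool.map(
            lambda i: label_band(image, bounds[i], bounds[i + 1], width),
            range(n_bands)))
    labels = [row for band in bands for row in band]

    # Step 2: pyramidal fusion. At each level, the borders handled join
    # groups of bands that share no labels, so on a real machine they
    # could be merged concurrently (done sequentially in this sketch).
    parent = list(range(height * width + 1))
    step = 1
    while step < n_bands:
        for b in range(step, n_bands, 2 * step):
            y = bounds[b]                      # border between rows y-1 and y
            if 0 < y < height:
                for x in range(width):
                    if labels[y - 1][x] and labels[y][x]:
                        ra = find(parent, labels[y - 1][x])
                        rb = find(parent, labels[y][x])
                        if ra != rb:
                            parent[max(ra, rb)] = min(ra, rb)
        step *= 2

    # Step 3: replace every label by the root of its equivalence class.
    return [[find(parent, l) if l else 0 for l in row] for row in labels]
```

For example, `label_in_bands(image, 4)` labels four horizontal bands separately and then merges the borders between bands 0/1 and 2/3 before the border between bands 1/2. The final labels are class representatives and are not renumbered consecutively.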

The benchmarking procedure was applied to several machines with various
levels of parallelism, showing that the proposed parallelization framework
applies to all the direct algorithms.
The LSL algorithm is once again the fastest, and the only one that remains
suitable as the number of cores increases, thanks to its run-based design.
On a 60-core architecture, the LSL algorithm can process 42.4 billion
pixels per second for images of 8192x8192 pixels, while the fastest
pixel-based algorithm is limited by the memory bandwidth and saturates at 5.8
billion pixels per second.
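
To put these throughput figures in perspective, the short calculation below converts them into a time per 8192x8192 image. It only rearranges the numbers quoted above; the helper name `time_per_image_ms` is made up for the example.

```python
def time_per_image_ms(width, height, pixels_per_second):
    """Time needed to label one image at a given throughput, in milliseconds."""
    return width * height / pixels_per_second * 1e3


lsl_ms = time_per_image_ms(8192, 8192, 42.4e9)    # run-based LSL
pixel_ms = time_per_image_ms(8192, 8192, 5.8e9)   # fastest pixel-based algorithm

print(f"LSL:         {lsl_ms:.2f} ms per image")       # 1.58 ms
print(f"pixel-based: {pixel_ms:.2f} ms per image")     # 11.57 ms
print(f"ratio:       {pixel_ms / lsl_ms:.1f}x")        # 7.3x
```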

After this work, our attention turned to iterative CCL algorithms in
order to develop new algorithms for many-core and GPU architectures.
Iterative algorithms are based on a local propagation mechanism without
any supplementary equivalence structure, which allows a massively
parallel implementation (MPAR). This work then led to the creation of
two new algorithms.
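
The local propagation principle can be illustrated with a minimal sketch, given here as a generic minimum-label propagation and not as the MPAR implementation itself. It assumes 4-connectivity and a list-of-rows binary image; the double buffer mimics a synchronous update in which every pixel could be processed by its own thread.

```python
def label_iterative(image):
    """Iterative labelling by minimum-label propagation (4-connectivity).

    Returns the labelled image and the number of sweeps performed.
    """
    height, width = len(image), len(image[0])
    # Every foreground pixel starts with a unique label: its raster index + 1.
    labels = [[y * width + x + 1 if image[y][x] else 0
               for x in range(width)] for y in range(height)]
    iterations = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        new = [row[:] for row in labels]  # read old state, write new state
        for y in range(height):
            for x in range(width):
                if not labels[y][x]:
                    continue
                best = labels[y][x]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx]):
                        best = min(best, labels[ny][nx])
                if best != labels[y][x]:
                    new[y][x] = best      # adopt the smallest neighbouring label
                    changed = True
        labels = new
    return labels, iterations
```

No equivalence table is needed, but the number of sweeps grows with the length of the longest path a label has to travel inside a component, which is the cost that mechanisms for reducing the iteration count aim to lower.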

The first is an incremental improvement of MPAR that uses a set of
mechanisms such as alternating scans, SIMD instructions and an active tile
mechanism to distribute the load between the different cores while
limiting the processing of the pixels to the active areas of the image
and to their neighbors.
The second is an algorithm that implements the equivalence relation
directly in the image to reduce the number of iterations required for
labeling. A GPU implementation, based on "atomic" instructions with
pre-labeling in local memory, has been realized and has proven
effective even for small images.