João Pedro Borges Araújo Oliveira e Silva completes his PhD
Thesis Title: Development of Human Body Pose Detection Algorithms for In-Car Scenario, and Validation with Suitable Ground-Truth System
Author: João Pedro Borges Araújo Oliveira e Silva
Doctoral Program: Doctoral Program in Electronics and Computer Engineering
Supervisor: Jaime Francisco Cruz Fonseca
Date: 29/05/2020
Abstract: Automated driving cars are emerging, increasing the need for advanced occupant monitoring applications. A transversal need for such systems is the detection of the occupants’ posture. Discriminative approaches have received increased focus in the past decade, due to their automated detection and the growth in Machine Learning (ML) applications and frameworks. One of their downsides is the need for a large training dataset to achieve high accuracy. To allow robust algorithmic training and validation, an algorithmic development pipeline able to generate both real and synthetic datasets in the in-car scenario needs to be established, together with adequate evaluation procedures. This thesis addresses such development. The approach focuses first on two toolchains for in-car human body pose dataset generation: (1) real, and (2) synthetic. The first toolchain uses two types of sensors for data generation: (1) image data is captured through a Time-of-Flight (ToF) sensor, and (2) human body pose data (ground truth) is captured through an inertial suit and an optical system. The inertial suit's inherent sensitivity and accuracy were quantified, and the feasibility of the overall system for human body pose capture in the in-car scenario was demonstrated. Finally, the feasibility of using the system-generated data (which was made publicly available) to train ML algorithms is demonstrated. The second toolchain uses the features and labels from the previous one, but in this case both sensors are synthetically rendered. The toolchain creates a customized synthetic environment, comprising human models, a car, and a camera. Poses are automatically generated for each human, taking into account a per-joint-axis Gaussian or incremental distribution, constrained by anthropometric and Range of Motion measurements. Scene validation is done through collision detection. Rendering is focused on vision data, supporting ToF and RGB cameras and generating synthetic images from these sensors.
The feasibility of using synthetic data (which was made publicly available), combined with real data, to train distinct ML algorithms is demonstrated. Finally, several algorithms were evaluated, and a Deep Learning (DL) based algorithm, namely Part Affinity Fields, was selected, customized, and trained with datasets generated with the previously mentioned toolchains, ultimately aiming to improve accuracy for the in-car scenario.