Real-Fake Medical Data: Computational Geometry to the Rescue

Hauke Bartsch
10:15 - 11:15
Visual Computing Forum

Hauke is a researcher at the MMIV Center at Haukeland Hospital and an adjunct associate professor at the University of Bergen. He plays a key role in the development of the data exploration and hypothesis testing portal (Data Portal) used, for example, in the Pediatric Imaging Neurocognition and Genetics (PING) study, the Alzheimer’s Disease Research Center (ADRC) project, and the Pediatric Longitudinal Imaging Neurocognition and Genetics (PLING) study. He also contributed to the development of the MagickBox (MB) PACS Network Appliance, which is now in routine use at UCSD, UCLA, and MGH. It enables automated analysis of imaging data as part of the clinical workflow, using analysis workflows implemented as virtual appliances (Docker containers).

He holds a PhD in computational neuroscience and a master’s degree in computer science, and has 10 years of experience working in both commercial enterprises and scientific research with a focus on data analysis and visualization. He has been responsible for software development in the areas of diffusion tensor imaging, brain perfusion, whole slide image processing and reconstruction, brain mapping and atlas generation, deformable shape models, shape analysis, and general numerical simulations.

His research focuses on methods for extracting information from medical image data, histology, genetics, and behavioral data, with the goal of understanding developmental processes, disease progression, and pathology. In the context of large-scale clinical studies, he combines these diverse sources of information by leveraging tools for data exploration and statistical hypothesis testing.

Deep learning methods for medical images often rely on out-of-domain training data. A large model such as ResNet-50 is trained on the ImageNet dataset, which contains natural RGB images such as photos of cats and dogs. Such models have been successful at solving medical classification tasks, but it remains to be seen how much smaller (i.e., more energy-efficient) and how much more accurate they could become if trained on domain data. Using domain data for model training poses additional challenges. One of them is participant confidentiality, as each participant’s data leaves a trace in the model. In this talk I revisit the use of computational geometry to generate domain-appropriate data for model training, and use two tasks to explain how such procedures generate infinitely varying data without the need for data augmentation.