Integrating Computational Tools in Interactive and Visual Methods for Enhancing High-dimensional Data and Cluster Analysis
Abstract
With the advance of new data acquisition and generation technologies, our society is becoming increasingly information-driven. Datasets are getting larger and more complex as new technologies emerge, posing new challenges to the analysts who are trying to understand them. Automated computational approaches and interactive visual methods have been widely used to extract and interpret the relevant information in data analysis. However, when these methods are used alone on complex datasets, their effectiveness is limited by several factors. Commonly used computational tools often produce results that are hard to interpret and may not be reliable. This thesis aims to enhance data analysis procedures by integrating computational tools with interactive visual methodologies. Its contributions focus mainly on cluster analysis and on the analysis of (very) high-dimensional data, i.e., data with hundreds or even thousands of dimensions. We introduce the dual analysis approach, which makes it possible to analyze the items and the dimensions of a dataset in parallel in two linked visualization spaces. This methodology provides a basis to visually characterize and investigate dimensions as first-order analysis objects. We describe structure-aware analysis procedures that are facilitated by representative factors. Moreover, we present several mechanisms to achieve outlier-aware analysis routines. We describe the notion of outlyingness for the dimensions of a dataset and discuss how outlying dimensions can be determined and treated properly. We then focus on enhancing the dialogue between the analyst and the computer when computational methods are used interactively. We describe how different human factors come into play in visual analysis applications and propose optimized analytical processes that try to comply with human capabilities.
All of these approaches are demonstrated through various use cases, most of them carried out together with experts from the medical, genetics, and molecular biology domains.
C. Turkay, "Integrating Computational Tools in Interactive and Visual Methods for Enhancing High-dimensional Data and Cluster Analysis," PhD Thesis, 2013.
[BibTeX]
@PHDTHESIS {turkay13thesis,
author = "Cagatay Turkay",
title = "Integrating Computational Tools in Interactive and Visual Methods for Enhancing High-dimensional Data and Cluster Analysis",
school = "Department of Informatics, University of Bergen, Norway",
year = "2013",
month = "November",
abstract = "With the advance of new data acquisition and generation technologies, our society is becoming increasingly information-driven. Datasets are getting larger and more complex as new technologies emerge, posing new challenges to the analysts who are trying to understand them. Automated computational approaches and interactive visual methods have been widely used to extract and interpret the relevant information in data analysis. However, when these methods are used alone on complex datasets, their effectiveness is limited by several factors. Commonly used computational tools often produce results that are hard to interpret and may not be reliable. This thesis aims to enhance data analysis procedures by integrating computational tools with interactive visual methodologies. Its contributions focus mainly on cluster analysis and on the analysis of (very) high-dimensional data, i.e., data with hundreds or even thousands of dimensions. We introduce the dual analysis approach, which makes it possible to analyze the items and the dimensions of a dataset in parallel in two linked visualization spaces. This methodology provides a basis to visually characterize and investigate dimensions as first-order analysis objects. We describe structure-aware analysis procedures that are facilitated by representative factors. Moreover, we present several mechanisms to achieve outlier-aware analysis routines. We describe the notion of outlyingness for the dimensions of a dataset and discuss how outlying dimensions can be determined and treated properly. We then focus on enhancing the dialogue between the analyst and the computer when computational methods are used interactively. We describe how different human factors come into play in visual analysis applications and propose optimized analytical processes that try to comply with human capabilities.
All of these approaches are demonstrated through various use cases, most of them carried out together with experts from the medical, genetics, and molecular biology domains.",
pdf = "pdfs/turkay13thesis.pdf",
images = "images/turkay13thesis.png",
thumbnails = "images/turkay13thesis.png",
isbn = "?? ",
project = "medviz"
}