Towards Integrative Causal Analysis of Heterogeneous Datasets and Studies
Modern data analysis methods, for the most part, concern the analysis of a single dataset. The conclusions of each analysis are published in the scientific literature, and their synthesis is left to a human expert. Integrative Causal Analysis (INCA) aims to automate this process as much as possible. It is a new, causality-based paradigm for inducing models in the context of prior knowledge and by co-analyzing datasets that are heterogeneous in terms of measured variables, experimental conditions, or sampling methodologies. INCA is related to, but fundamentally different from, statistical meta-analysis, multi-task learning, and transfer learning.
In this talk, we illustrate the enabling ideas behind INCA, present INCA algorithms, and give proof-of-concept empirical results. Among other results, we show that the algorithms can predict the existence of conditional and unconditional dependencies (correlations), as well as the strength of the dependence, between two variables Y and Z never measured on the same samples, based solely on prior studies (datasets) that measure either Y or Z, but not both. The algorithms accurately predict thousands of dependencies in a wide range of domains, demonstrating the universality of the INCA idea. The novel inferences are entailed by assumptions inspired by causal and graphical modeling theories, such as the Faithfulness Condition. The results provide ample evidence that these assumptions hold in many real systems. The long-term goal of INCA is to enable the automated, large-scale integration of available data and knowledge to construct causal models involving a significant part of human concepts.
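
As a toy illustration of how a dependence between Y and Z might be predicted without ever observing them together, the sketch below simulates two separate "studies" that share a common variable X. It is not one of the INCA algorithms: it assumes, purely for illustration, a hypothetical linear-Gaussian causal chain Y -> X -> Z with made-up coefficients, in which Y and Z are independent given X, so that corr(Y, Z) = corr(Y, X) * corr(X, Z). The function names and sample sizes are likewise assumptions.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def sample_chain(n, rng):
    """Draw n samples from a hypothetical linear-Gaussian chain Y -> X -> Z."""
    ys, xs, zs = [], [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        x = 0.8 * y + rng.gauss(0.0, 0.6)  # X depends on Y (assumed coefficient)
        z = 0.7 * x + rng.gauss(0.0, 0.7)  # Z depends on X only (assumed coefficient)
        ys.append(y)
        xs.append(x)
        zs.append(z)
    return ys, xs, zs


rng = random.Random(0)

# Study 1 measures Y and X only; study 2 measures X and Z only,
# on a disjoint set of samples.
y1, x1, _ = sample_chain(20000, rng)
_, x2, z2 = sample_chain(20000, rng)

# Prediction that never uses a sample containing both Y and Z.
# Valid here only because the assumed chain makes Y and Z independent given X.
predicted = pearson(y1, x1) * pearson(x2, z2)

# A third sample measuring Y and Z jointly, used only to check the prediction.
y3, _, z3 = sample_chain(20000, rng)
observed = pearson(y3, z3)

print(f"predicted corr(Y, Z) = {predicted:.3f}")
print(f"observed  corr(Y, Z) = {observed:.3f}")
```

The prediction relies entirely on the assumed structure. With a different causal structure among X, Y, and Z, the product rule would not apply, which is why assumptions such as the Faithfulness Condition matter for inferences of this kind.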