Validation in Statistics and Machine Learning - Abstract
(joint work with Vindi Jurinovic)
Gene expression is a dynamic process in which thousands of components interact in complex ways. Microarray data offer only a coarse, cross-sectional view of this dynamic activity. The talk asks what biological meaning a gene interaction network estimated from cross-sectional microarray data can have, and shows that ergodic arguments may support interpreting the measured data as time averages. This calls for caution, since some interactions between dynamic components cannot be inferred once the data are averaged over time. Furthermore, confounding by components not included in the inferred network may be crucial. Based on these considerations, we analyze a data set of lymphoma patients with a translocated or normal Myc gene. Myc (c-Myc) translocations to the immunoglobulin heavy-chain (IGH) or light-chain (IGK, IGL) loci lead to Myc overexpression and are widely believed to be the crucial initiating oncogenic events in the development of Burkitt's lymphoma. There is a rich body of knowledge on the biological implications of the different translocations. The talk analyzes the relationship between this biological knowledge and the results of formal statistical estimates of gene interaction networks. We explore which strategies of biological validation are needed to understand the outcome of formal network estimates.
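The two cautions raised above, that time averaging can hide interactions and that unobserved components can confound them, can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the method used in the talk. The gene pairs, oscillation period, noise levels and the uniform-random-phase model of unsynchronised cells are all hypothetical. In the first part, gene y is driven by gene x with a quarter-period delay. A cross-sectional snapshot of many unsynchronised cells reproduces the time average of a single cell, which is the ergodic argument, yet it shows essentially zero correlation between x and y. In the second part, a hidden regulator z induces a strong correlation between two genes a and b that do not interact. This correlation disappears once z is adjusted for.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Part 1: ergodic averaging hides a lagged interaction (toy model) ---
PERIOD = 24.0                 # hypothetical oscillation period, arbitrary time units
OMEGA = 2 * np.pi / PERIOD


def expression(t, phase):
    """Hypothetical gene pair: x oscillates, y follows x with a quarter-period delay."""
    x = 1.0 + np.sin(OMEGA * t + phase)
    y = 1.0 + np.sin(OMEGA * (t - PERIOD / 4) + phase)
    return x, y


# Time average: a single cell followed over 100 full periods.
t = np.linspace(0.0, 100 * PERIOD, 200_001)
x_t, _ = expression(t, phase=0.0)
time_avg_x = x_t.mean()

# Cross-sectional snapshot: many unsynchronised cells (uniform random phase)
# observed at one instant, mimicking a single microarray measurement.
phases = rng.uniform(0.0, 2 * np.pi, size=100_000)
x_c, y_c = expression(0.0, phases)
cross_avg_x = x_c.mean()

# y is a deterministic function of x's past, yet the snapshot correlation is ~0.
corr_xy = np.corrcoef(x_c, y_c)[0, 1]

# --- Part 2: confounding by a component missing from the network ---
n = 5_000
z = rng.normal(size=n)                            # hidden regulator (not measured)
a = 0.8 * z + rng.normal(scale=0.6, size=n)       # gene a depends only on z
b = 0.8 * z + rng.normal(scale=0.6, size=n)       # gene b depends only on z
corr_ab = np.corrcoef(a, b)[0, 1]                 # population value: 0.64


def residualise(v, covariate):
    """Remove the least-squares linear effect (with intercept) of covariate from v."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


# Partial correlation of a and b given z: close to zero, since a and b do not interact.
partial_ab = np.corrcoef(residualise(a, z), residualise(b, z))[0, 1]

print(f"time average of x:            {time_avg_x:.3f}")
print(f"cross-sectional average of x: {cross_avg_x:.3f}")
print(f"cross-sectional corr(x, y):   {corr_xy:.3f}")
print(f"marginal corr(a, b):          {corr_ab:.3f}")
print(f"partial corr(a, b | z):       {partial_ab:.3f}")
```

The uniform phase distribution is the key assumption here. It is what makes the snapshot average equal the time average. If the cells were synchronised, the cross-sectional data would no longer be a time average, and the ergodic interpretation would fail.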