Cancer: could the background noise of DNA microarrays conceal the true signals?
AI-generated hypothesis · Pre-publication · To be tested experimentally
Table of contents — full brief
- Hypothesis and mechanism: causal chain, key assumptions, residual unknowns
- State of the art: verified references and counter-evidence (DOIs)
- Falsifiable predictions: quantitative bounds, statistical tests, H0
- Experimental protocol: three phases (in silico → minimal → full)
- Impact analysis: novelty, residual gaps, available data
- Panel review: five personas + meta-review
Verified references
5 of 13 references
- Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data (2017). DOI: 10.1080/02664763.2018.1454894
- Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis (2014). DOI: 10.1186/2049-2618-2-15
- Editorial: Compositional data analysis and related methods applied to genomics—a first special issue from NAR Genomics and Bioinformatics (2020). DOI: 10.1093/nargab/lqaa103
- Weighted pivot coordinates for partial least squares‐based marker discovery in high‐throughput compositional data (2021). DOI: 10.1002/sam.11514
- Normalization methods for microbial abundance data strongly affect correlation estimates (2018). DOI: 10.1101/406264
Detailed panel scores
The protocol is remarkably well structured: a logical three-phase progression (in silico, minimal experimental, full validation), with each phase governed by clear GO/NO-GO/PIVOT criteria. This enables objective decision-making and efficient resource allocation before the more costly stages begin.
The hypothesis correctly identifies the fundamental problem of the closure constraint in microarray data, a point frequently overlooked in the cancer bioinformatics literature. Transferring compositional data analysis (CoDA), which is well established in microbial ecology and metabolomics, to cancer classification is a logical and potentially fruitful extension.
The hypothesis addresses a real and underappreciated problem: the impact of the closure constraint on correlations between genes in DNA microarray data, an artefact that standard normalisation procedures do not explicitly treat.
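The closure artefact described above can be illustrated on synthetic data. The following is a minimal sketch, not the brief's protocol: the log-normal "intensities", the three-part composition, and the helper names `close` and `mean_offdiag_corr` are all assumptions introduced for illustration. It draws independent positive values, forces each sample to a constant sum, and compares the average pairwise correlation before and after.

```python
import numpy as np

def close(x):
    """Rescale each row to sum to 1 (the closure / constant-sum operation)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def mean_offdiag_corr(x):
    """Average pairwise Pearson correlation between columns."""
    c = np.corrcoef(x, rowvar=False)
    mask = ~np.eye(c.shape[0], dtype=bool)  # drop the diagonal of ones
    return c[mask].mean()

# Hypothetical data: 5000 samples x 3 parts, independent by construction.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(5000, 3))
closed = close(raw)

raw_corr = mean_offdiag_corr(raw)        # close to zero
closed_corr = mean_offdiag_corr(closed)  # clearly negative
print(f"raw: {raw_corr:+.3f}  closed: {closed_corr:+.3f}")
```

With only three parts the induced negative correlation is large; with thousands of genes the per-pair effect is much smaller, so this toy case should be read as a demonstration of the mechanism rather than an estimate of its size in real microarray data.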
The addressable market is clear and quantifiable: molecular cancer diagnostics (market estimated at $15B by 2028) and subtype classification platforms (e.g., MammaPrint, Oncotype DX) would pay for an 8 percentage-point improvement in accuracy, since it would reduce false positives and unnecessary biopsies and thereby directly affect insurer reimbursement.
The hypothesis is strong and falsifiable, with a clear causal mechanism: the centred log-ratio (CLR) transform removes artefactual correlations arising from the constant-sum constraint. This combination is rare and valued by ANR/ERC reviewers.
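For readers unfamiliar with the CLR transform named in the mechanism, here is a minimal sketch of its standard definition, the log of each part minus the per-sample mean of the logs. The function name `clr`, the synthetic intensity matrix, and the per-sample scale factors are assumptions for illustration; real microarray data would also need a strategy for zeros or background-subtracted negatives, which this sketch does not cover.

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log of each part minus the row mean of the logs.

    Assumes strictly positive entries; zeros would need a pseudocount
    or an imputation step that is not shown here.
    """
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

# Hypothetical data: 4 samples x 6 genes of positive intensities.
rng = np.random.default_rng(1)
intensities = rng.lognormal(mean=5.0, sigma=1.0, size=(4, 6))

# An arbitrary per-sample total, standing in for array-wide scaling.
scale = rng.uniform(0.5, 2.0, size=(4, 1))

z_raw = clr(intensities)
z_scaled = clr(intensities * scale)

# The CLR coordinates do not depend on the per-sample total.
print(np.allclose(z_raw, z_scaled))
```

The sketch only shows the scale invariance that motivates using CLR on constant-sum data. Whether this improves cancer subtype classification is the open question the hypothesis proposes to test, and the code makes no claim about it.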