Using AI, Researchers Launch Database to Predict Cancer DNA Anomalies
[Image: 3D illustration of a cancer cell]
August 7, 2024 | By Dave DeFusco
UNC-Chapel Hill researchers have launched CytoCellDB, a database that addresses a significant gap in cancer research on extrachromosomal DNA. They describe it in their paper, “CytoCellDB: A Resource Database for Classification and Analysis of Extrachromosomal DNA in Cancer,” published in Nucleic Acids Research (NAR) Cancer.
“CytoCellDB provides researchers with an invaluable tool to understand which commonly used cancer cell lines contain ecDNA and other chromosomal aberrations,” said Dr. Elizabeth Brunk, senior and corresponding author of the paper and an assistant professor in the Departments of Pharmacology and Chemistry. “CytoCellDB represents a significant step toward understanding ecDNA, paving the way to more effective treatments and improved outcomes for over 15% of cancers that harbor ecDNA.”
Extrachromosomal DNA (ecDNA) alters how cells divide, express RNA and respond to drug treatments. Without knowing which model systems contain ecDNA, researchers cannot fully understand why some cells respond differently to drugs. Also known as double-minute chromosomes, ecDNA is an established marker of malignancy and genome instability. It plays a crucial role in cancer proliferation, drug resistance and epigenetic remodeling, meaning changes in gene expression that don’t involve alterations in the DNA sequence itself.
“Even though ecDNA is important, fully understanding its biological roles has been challenging due to technological limitations,” said Dr. Brunk, a member of the UNC Lineberger Comprehensive Cancer Center. “This is because there aren’t enough cell-line models with experimental data that clearly show whether ecDNA is present or not.”
EcDNA enables cancer cells to amplify key genes outside of chromosomes, altering how these genes are regulated, replicated and divided. EcDNA is more common than previously thought, occurring in 14% to 20% of tumors from oncology patients. Despite its clinical relevance, the frequency of ecDNA in cancer cell line models remains largely unknown, creating a significant knowledge gap that limits basic cancer research.
While extensive genome sequencing data is available for many cell lines, differentiating ecDNA from chromosomal DNA requires microscopic examination of their nuclei. CytoCellDB bridges this gap by providing both experimental details and computational predictions of ecDNA across hundreds of cell lines with publicly available sequencing and multi-omics data.
“By consolidating these complementary data in one place, CytoCellDB enables researchers to study the impacts of chromosomal aberrations and ecDNA on various cellular processes,” she said. “The database includes detailed information on 577 cell lines, significantly expanding the available data on ecDNA by over 400%.”
In the paper, the researchers explain that CytoCellDB was used to explore differences in gene expression, gene dependency and drug response across hundreds of cell lines with and without ecDNA, using the Broad Institute’s Dependency Map multi-omics data. It also served as the largest ground-truth dataset for algorithms that predict ecDNA from DNA sequencing data, making it possible to assess their accuracy and precision. By combining machine learning, a type of artificial intelligence, with CytoCellDB, the researchers were able to accurately predict the presence of ecDNA in cell lines and tumor samples more than 85% of the time, a higher rate than any other existing algorithm for predicting ecDNA.
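For readers curious what assessing a predictor against a ground-truth dataset involves, the following is a minimal sketch in Python. It is not the researchers' method or the CytoCellDB interface. The function name `evaluate_predictions`, the cell-line names and all labels are hypothetical and made up purely for illustration. The sketch assumes each cell line has an experimentally confirmed ecDNA status (the ground truth) and a status predicted by some algorithm, then reports accuracy (the share of all calls that are correct) and precision (the share of positive calls that are correct).

```python
def evaluate_predictions(predicted, truth):
    """Compare predicted ecDNA status against ground-truth labels.

    Both arguments map a cell-line name to True (ecDNA present) or
    False (ecDNA absent). Only cell lines found in both are scored.
    """
    shared = sorted(set(predicted) & set(truth))
    if not shared:
        raise ValueError("no cell lines in common")

    # Count the four possible outcomes of a yes/no prediction.
    tp = sum(predicted[n] and truth[n] for n in shared)          # true positives
    fp = sum(predicted[n] and not truth[n] for n in shared)      # false positives
    tn = sum(not predicted[n] and not truth[n] for n in shared)  # true negatives
    fn = sum(not predicted[n] and truth[n] for n in shared)      # false negatives

    accuracy = (tp + tn) / len(shared)
    # Precision is undefined when the algorithm makes no positive calls.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"accuracy": accuracy, "precision": precision, "n": len(shared)}


# Hypothetical, made-up labels; these are not real CytoCellDB records.
truth = {"line_A": True, "line_B": False, "line_C": True,
         "line_D": False, "line_E": True}
predicted = {"line_A": True, "line_B": True, "line_C": True,
             "line_D": False, "line_E": False}

result = evaluate_predictions(predicted, truth)
print(result)
```

In this toy example, three of the five calls are correct, so accuracy is 0.6, and two of the three positive calls are correct, so precision is about 0.67. A larger ground-truth set makes such estimates more reliable, which is the role the article describes for CytoCellDB.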
“Understanding the impact of ecDNA is an urgent, unmet need that will change the way we analyze genomics data, develop drug screens and understand drug resistance,” she said. “One of the most exciting aspects of this work is the collaborative effort across diverse disciplines and the range of experience, from undergraduates to rotation students to graduate students to senior scientists.”
Dr. Brunk was joined in the research by co-first author Jacob Fessler, who is an undergraduate student in the UNC Computer Science Department and a member of the Brunk lab. He helped to develop algorithms to process nearly 600 unstructured karyotype records across hundreds of cancer cell lines, and designed and implemented the online database.
Also contributing to this work were undergraduate students Danielle Cannon (Biology), Kohen Goble (Chemistry) and Aarav Mehta (Computer Science), who are members of the Brunk lab. Additional contributors were Brunk lab graduate students and staff Jingting Chen (Biochemistry and Biophysics), Dr. Christina Ford and Dr. Santiago Haase (Integrative Program for Biological and Genome Sciences, IBGS), Stephanie Ting (Chemistry) and Yue Wang (Pharmacology); Brunk lab collaborator Dr. Hong Yi (Renaissance Computing Institute, RENCI); and rotation students Saygin Gulec and Nathan Smyers (Curriculum in Bioinformatics and Computational Biology).
The enhanced understanding of ecDNA provided by CytoCellDB will open new avenues for studying genome function, cancer cell fitness, therapeutic response and drug evasion. By identifying more cell lines with ecDNA, researchers can investigate the molecular impacts of ecDNA on a global scale, aiding in the development of new treatment strategies, biomarkers and targeted approaches for ecDNA-driven cancers.
“CytoCellDB represents the most comprehensive cytogenetic resource for cancer cell lines,” said Dr. Brunk, “providing a foundation for groundbreaking research and potential therapeutic innovations.”