Forensic Science Professors Awarded $431,917 for DNA Simulation and Sequencing Tool
Forensics and National Security Sciences Institute (FNSSI) faculty members Michael Marciano and Jonathan Adelman have been awarded a two-year $431,917 grant from the National Institute of Justice to develop a fully continuous machine learning approach to predict the number of contributors in sequence-based DNA profiles. The project builds upon intellectual property the two developed that is owned and licensed by Syracuse University.
Previously, Marciano and Adelman developed the Probabilistic Assessment for Contributor Estimate (PACE), a computational method that predicts how many individuals contributed DNA to a mixed sample. The patented method is licensed to NicheVision Forensics, which is participating as a grant subcontractor on the new project.
Modern forensic DNA analysis uses the differences in the size of DNA markers to differentiate individuals. “At one location, I might have a fragment or marker that is 15 units long, and you might have one that’s 14 units long. The difference between the 15 and the 14 is what helps identify me from you,” explains Marciano, an FNSSI research assistant professor and molecular biologist. That technique becomes limited, however, when multiple individuals have markers of the same length.
Forensic DNA analysis will soon make the transition to using DNA sequencing as a means of identification. DNA sequence data is typically much more complex than fragment (size) data. Marciano and Adelman’s new method focuses on using artificial intelligence, specifically deep learning, to tease out complex patterns undetectable to a human analyst.
“The sequence data we’re talking about are very complex and hierarchical in nature,” explains Adelman, FNSSI research assistant professor and computer scientist. “There are layers to the data. Some of the layers have connections and patterns specific to that layer, while other connections and patterns transcend layers within the data. One of the things that machine learning is great at is precisely quantifying those patterns which we can in turn leverage to get stronger assessments of the DNA profile.”
Machine learning uses existing data to train computers to solve problems on their own with new data. Marciano and Adelman are going a step further to use deep learning, in which the computer figures out not only the patterns and structure in the data, but also the ideal portions of the data to mine for information content. “It’s been used very successfully in a variety of disciplines in the past decade, but has a drawback in that it requires a massive amount of data to teach the computer,” says Adelman.
Marciano and Adelman will be amassing large data sets from collaborators, including Verogen (formerly Illumina), Promega and the New York City Office of the Chief Medical Examiner, to simulate additional data with all of the different nuances that occur in sequence data.
“The purpose of our simulator is to provide enough data in concert with the real DNA mixtures to adequately train the artificial intelligence,” adds Marciano, who says their work would not be possible without the support of the College of Arts and Sciences Information Technology Group and Syracuse University’s Research Computing Group.
This is not basic science research. The goal is to produce a tool that can be used immediately for DNA sequence analysis. “Forensic DNA analysis is undergoing a paradigm shift and we are creating this tool in preparation for that transition,” says Marciano.