



Speaking DNA
Machine learning
GENA-LM: a family of open-source foundational DNA language models for long sequences
To decode a genome precisely, you need to extract contextual information from sequences that are thousands of base pairs long. Existing AI genomics tools struggle with sequences of that length. We introduce a family of transformer-based DNA language models that can process inputs of up to an unrivalled 36,000 base pairs. The models accurately infer features such as promoters, enhancers and splice sites, matching or surpassing previous models.