Understanding Crop DNA: Gene Mapping Accuracy and Impact on Protein Research

Jim Crocker
20th June, 2024

Key Findings

  • The study by the Technical University of Munich evaluated the performance of gene prediction tools BRAKER2 and Helixer on crop genomes
  • BRAKER2 and Helixer showed varying accuracy depending on the genome's complexity, with high repeat content and polyploidy posing significant challenges
  • The study found that combining predictions from both tools can improve overall accuracy, but each tool has unique strengths and limitations

Plant genomics is a crucial field for enhancing global food security and sustainability. It offers innovative solutions for improving crop yield, disease resistance, and stress tolerance. Although the number of sequenced genomes is growing and genome assemblies are becoming more accurate, the structural annotation of plant genomes remains challenging because of their large size, polyploidy (having multiple sets of chromosomes), and rich repeat content. A recent study conducted by the Technical University of Munich presents an overview of the current landscape in crop genomics research. It focuses on the diversity of genomic characteristics across crop species and assesses how accurately two popular gene prediction tools, BRAKER2 and Helixer, identify genes within crop genomes[1].

The study highlights the significant strides made in plant genomics over the past two decades. Since the sequencing of the first plant genome, Arabidopsis thaliana, and the draft sequencing of the rice genome, over 100 crop genomes have been sequenced. These advancements have expanded plant genome research across multiple fronts, leading to innovations in genome sequencing, genetic mapping, and the integration of biological data across different scales[2]. The 10KP Genome Sequencing Project aims to sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists, contributing to a comprehensive understanding of plant and protist diversity[3].

Despite these advancements, the structural annotation of plant genomes remains challenging. Plant genomes vary widely in size, levels of ploidy, and heterozygosity (genetic variation within an organism). They also contain old and recent bursts of transposable elements, which are DNA sequences that can change their position within the genome, and this makes the genomes difficult to assemble accurately[4]. Recent advances in single molecule sequencing and physical mapping technologies have enabled high-quality, chromosome-scale assemblies of plant species with increasing complexity and size. However, polyploid and heterozygous plant genomes still pose significant challenges[4].

The Technical University of Munich study evaluated the performance of two leading structural genome annotation tools, BRAKER2 and Helixer, in identifying genes within crop genomes. It found that the complexity, fragmentation, and repeat content of a genome significantly affect the performance of these tools. Each tool has its own strengths and limitations, and its accuracy varies with the specific characteristics of the genome being analyzed. For example, genomes with high levels of repeats and polyploidy can be particularly challenging for gene prediction tools.

The study also assessed the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Proteomics is the large-scale study of proteins, and mass spectrometry is a technique used to identify and quantify proteins in a sample. The accuracy of gene prediction tools is crucial for proteomics studies because it determines the reliability of the predicted protein sequences used in these analyses. The findings provide valuable insights for future efforts to refine and advance structural genome annotation, emphasizing the need for continued innovation and improvement in gene prediction tools.

One significant milestone in plant genomics is the recent achievement of a complete telomere-to-telomere (T2T) finished genome of maize. This accomplishment was made possible through the use of deep coverage ultralong Oxford Nanopore Technology (ONT) and PacBio HiFi reads. The T2T Mo17 genome assembly provided a comprehensive understanding of the structural features of all repetitive regions of the maize genome, including the assembly of entire nucleolar organizer regions and centromeres. This represents a major step forward in understanding the complexity of highly repetitive regions in higher plant genomes[5].

In summary, the study by the Technical University of Munich underscores the progress and ongoing challenges in the field of plant genomics. The diversity of genomic characteristics across crop species, the complexity of plant genomes, and the limitations of current gene prediction tools highlight the need for continued advancements in genome sequencing and annotation technologies. By addressing these challenges, researchers can further enhance our understanding of plant biology and contribute to global food security and sustainability.
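
To make the idea of gene prediction accuracy, and of combining two predictors, more concrete, here is a minimal Python sketch. It is not the study's evaluation pipeline: the exact-coordinate matching rule, the names `tool_a` and `tool_b`, and all gene coordinates are assumptions invented for illustration, and they are not real BRAKER2 or Helixer output. The sketch scores each prediction set against a reference annotation by sensitivity (share of reference genes recovered) and precision (share of predictions that are correct), then does the same for the union and the intersection of the two sets.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """A gene model reduced to its location (hypothetical, simplified)."""
    chrom: str
    start: int
    end: int
    strand: str


def score(predicted: set[Gene], reference: set[Gene]) -> tuple[float, float]:
    """Return (sensitivity, precision) using exact coordinate matches.

    Assumption: a prediction counts as correct only if chromosome, start,
    end, and strand all match a reference gene exactly.
    """
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Toy reference annotation and two toy prediction sets (all invented).
reference = {
    Gene("chr1", 100, 900, "+"),
    Gene("chr1", 1500, 2400, "-"),
    Gene("chr2", 50, 700, "+"),
    Gene("chr2", 1200, 1900, "+"),
}
tool_a = {
    Gene("chr1", 100, 900, "+"),
    Gene("chr1", 1500, 2400, "-"),
    Gene("chr2", 60, 700, "+"),      # wrong start coordinate
}
tool_b = {
    Gene("chr1", 100, 900, "+"),
    Gene("chr2", 50, 700, "+"),
    Gene("chr2", 3000, 3500, "-"),   # no matching reference gene
}

results = {
    "tool_a": score(tool_a, reference),
    "tool_b": score(tool_b, reference),
    "union": score(tool_a | tool_b, reference),
    "intersection": score(tool_a & tool_b, reference),
}

for name, (sensitivity, precision) in results.items():
    print(f"{name}: sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

In this toy data the union recovers three of the four reference genes, more than either tool alone, while the intersection keeps a single gene with no false positives. This illustrates in the simplest possible terms why merging predictions can help, and why the choice of merging rule trades sensitivity against precision.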

Biotech, Genetics, Plant Science


Main Study

1) Exploring crop genomes: assembly features, gene prediction accuracy, and implications for proteomics studies. Published 19th June, 2024.

Related Studies

2) Advancing crop genomics from lab to field.
3) 10KP: A phylodiverse genome sequencing plan.
4) Building near-complete plant genomes.
5) A complete telomere-to-telomere assembly of the maize genome.