R&D CENTER

A genome sequence of Coffea arabica var. typica (Rubiaceae) for understanding cultivated coffee tree

Jongsun Park1,2,*, Hong Xi1,2, Yongsung Kim1,2, Deokgyu Lee3, and Jongwook Woo3
Coffee is one of the most popular drinks in the world, and the leading coffee-producing countries are Brazil, Vietnam, and Indonesia. Recently, the genome of Coffea canephora, the paternal species of Coffea arabica, was successfully sequenced. It presents the overall features of a coffee genome; however, C. canephora is not the main cultivated species for coffee beans. With the aid of next-generation sequencing technologies, we sequenced C. arabica var. typica to understand the cultivated coffee genome. C. arabica is a tetraploid derived from C. canephora and C. eugenioides. Its genome size was estimated at around 1.3 Gbp to 1.4 Gbp based on the genome sizes of the two parental species. Around 102 Gbp of raw data (72.8x coverage) were generated from one paired-end library, and additional sequencing is in progress with different paired-end and mate-pair libraries. The current assembly totals 1.02 Gbp (N50 of 2,586 bp), and the longest contig is 114 kbp. The cytochrome P450 (CYP) gene family was identified in the two Coffea genomes, showing that the C. arabica genome contains more than 1.5 times as many CYPs as that of C. canephora (764 vs. 464). Considering the incompleteness of the current C. arabica assembly, this is further indirect evidence for the tetraploidy of C. arabica. As planned, we will improve the current assembly and unravel the characteristics of C. arabica var. typica. To support efficient comparative genomics of coffee genomes, we established the Coffee Genome Database (CGD; http://www.coffeegneome.info/), which archives genomes and analysis results and provides diverse web-based comparative genomics tools.
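
The coverage, N50, and CYP ratio quoted above follow from simple arithmetic, sketched here for readers who want to check them. This is a minimal illustration and not the pipeline used in this work: the function names are hypothetical, the contig lengths passed to `n50` are toy values, and only the totals (102 Gbp of raw data, 1.3 to 1.4 Gbp genome size, 764 and 464 CYPs) are taken from the abstract.

```python
def coverage(total_bases, genome_size):
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Smallest contig length such that contigs of at least that length
    together hold at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


GBP = 10**9

# Depth at the two ends of the estimated genome size range.
depth_at_1_4 = coverage(102 * GBP, 14 * GBP // 10)  # 1.4 Gbp genome
depth_at_1_3 = coverage(102 * GBP, 13 * GBP // 10)  # 1.3 Gbp genome

# Ratio of CYP counts, C. arabica to C. canephora.
cyp_ratio = 764 / 464

# Toy contig lengths (made up) to show how N50 is read off.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])

print(f"depth: {depth_at_1_4:.1f}x to {depth_at_1_3:.1f}x")
print(f"CYP ratio: {cyp_ratio:.2f}")
print(f"toy N50: {toy_n50}")
```

At the upper genome size estimate of 1.4 Gbp the depth comes to roughly 73x, in line with the reported coverage, and the CYP ratio of about 1.65 matches the "more than 1.5 times" statement.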