Genome-wide Intra-specific Comparison of GATA Transcription Factors Among Nineteen Arabidopsis thaliana Genomes

PLOS One 16 (5): e0252181

Mangi Kim, Hong Xi, Jongsun Park*
GATA transcription factors (TFs) are widespread eukaryotic regulators whose DNA-binding domain is a class IV zinc finger motif (CX2CX17–20CX2C) followed by a basic region. We identified GATA TFs in the genomes of 19 A. thaliana ecotypes to understand their intra-specific characteristics. In total, 566 GATA genes (772 GATA TFs) were identified from the 19 genomes and classified into four subfamilies (I to IV) based on a phylogenetic tree of A. thaliana Col0 GATA TFs. Four ecotypes (Hi0, Ler0, Mt0, and Ws0) lack the AtGATA24 gene, which functions in the cryptochrome1-dependent response to excess light. Only the Kn0 ecotype has alternative splicing forms of the AtGATA15 gene, and these forms differ in the start position of the ORF. This difference may subtly affect their functions; however, no experimental evidence is available. Of the 2,195 amino acids in 41 GATA domains, 22 (1.002%) vary across the 19 ecotypes, taking into account that four GATA TFs have heterogeneous nucleotides in their ORFs. Each GATA TF has at most four distinct amino acid sequence forms across the 19 ecotypes. The Rsch4 and Wu0 genomes have completely identical GATA domain amino acid sequences. Compared with Reyes et al. (2004), three GATA genes differ in length, indicating that improved gene prediction has changed the amino acid sequences of GATA TFs. Taken together, our intra-specific comparative analyses will be a cornerstone for understanding intra-specific characteristics of GATA TFs in plant genomes as well as for updating the GATA TFs of Arabidopsis.
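
The class IV zinc finger pattern quoted above (CX2CX17–20CX2C) can be read as a regular expression over a protein sequence. The sketch below is only an illustration of that reading, not the identification pipeline used in the study: it assumes "X" means any residue, the function name `find_gata_zinc_fingers` is hypothetical, and the test sequence is synthetic rather than a real GATA TF. It also does not check for the basic region that follows the motif.

```python
import re

# C-X2-C-X17-20-C-X2-C, with "X" taken to mean any amino acid (an assumption).
GATA_ZNF = re.compile(r"C.{2}C.{17,20}C.{2}C")


def find_gata_zinc_fingers(protein):
    """Return (0-based start, matched text) for each non-overlapping match.

    Hypothetical helper for illustration; real domain annotation normally
    relies on profile-based searches rather than a bare regular expression.
    """
    return [(m.start(), m.group()) for m in GATA_ZNF.finditer(protein.upper())]


# Synthetic sequence built for illustration only: an 18-residue spacer
# between the two Cys pairs, flanked by arbitrary residues.
toy = "MA" + "CAAC" + "G" * 18 + "CAAC" + "KR"

hits = find_gata_zinc_fingers(toy)
for start, motif in hits:
    print(start, len(motif), motif)
```

A plain pattern match like this will also report cysteine arrangements that are not GATA domains, so it is best treated as a first-pass filter.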