In silico Prediction of mtDNA Gene Expression Based on Codon Usage Bias in Ants (Formicidae Latreille, 1802) that Inhabit Limestone Quarry Ecosystems

Codon usage is considered a modulator of gene expression because of the high correlation between codon usage, tRNA abundance and the level of gene expression. Adaptability is primarily manifested at the gene level; therefore, mtDNA gene expression analysis may indicate trends toward the development of adaptive traits for specific environmental conditions. Moreover, modified gene expression patterns may result from such adaptations. Owing to their sensitivity to environmental disturbances, great functional importance and accessibility, ants (Family: Formicidae Latreille, 1802) are excellent model organisms for molecular and bioinformatic genome analysis. This in silico simulation is based on the comparison of codon usage bias and the level of gene expression of currently available mitochondrial protein-coding genes of ant species that were sampled at the Ribnica quarry (Kakanj, Bosnia and Herzegovina). The MILC and MELP algorithms were used for codon usage bias analysis and mitochondrial gene expression prediction, respectively. The analysis included four mtDNA protein-coding genes from eight selected species of ants, totaling 32 protein-coding sequences. The results indicated no statistically significant differences in codon usage bias or in the relative predicted levels of gene expression. Next steps should be directed toward molecular ecology studies, including genome-wide measures of gene expression (RNA-seq; transcriptomics), to capture the molecular response to environmental challenges.


Introduction
The mitochondrial genome of Formicidae encodes the small and large ribosomal subunit RNAs, 13 proteins and up to 22 tRNA molecules. Coding sequences use synonymous codons with unequal frequencies. Highly expressed genes use a subset of 'optimal' codons, which are recognized by the most abundant tRNAs, while in weakly expressed genes codons are recognized by rare tRNAs. Although most proteins are expressed near their optimal levels, abundantly expressed proteins should be highly optimized owing to the high energetic cost of production (Gout et al., 2010; Vishnoi et al., 2010). Given the high correlation between codon usage, tRNA abundance and the level of gene expression, it has been suggested that codon usage is a modulator of gene expression (Holm, 1986). Nonrandom codon usage, or codon bias, is a common phenomenon in a wide variety of organisms. It is generally thought to be determined by a balance among mutation, genetic drift, and natural selection based on translation efficiency and/or accuracy (Ingvarson, 2007).
The aim of this study was to explore the possible influence of the level of ecosystem degradation on changes in gene expression patterns in ants by in silico prediction of mtDNA gene expression based on codon usage bias. However, further studies should include transcriptome analysis in order to reveal particular gene expression patterns and their relation to possible adaptations to new adaptive zones in degraded ecosystems with strong anthropogenic impact. Molecular ecology has moved beyond the use of a relatively small number of markers, and the application of transcriptomics has become imperative in an ecological context (Richards et al., 2009). Alvarez et al. (2015), in "Ten years of transcriptomics in the wild", have made a major contribution towards understanding the significant impacts that the stress response may have on numerous categories of genes, but have also shown that even small changes in the environment may affect transcription (Richards et al., 2012).

In silico analysis
Genomic data were obtained from the public biological database NCBI/GenBank and consist of complete sequences of the available protein-coding genes from the mitochondrial genomes of eight selected formicid species, retrieved in FASTA format. The mitochondrial genes used for codon usage bias analysis and prediction of gene expression are cox1, cox2, cytb and nad6 (Table 1). INCA 2.1 (Interactive Codon Usage Analyzer) software was used for synonymous codon usage analysis of the nucleotide sequences (Supek & Vlahoviček, 2006). Two main algorithms for codon usage calculations, MILC and MELP, were used. The Measure Independent of Length and Composition (MILC) is, mathematically, based on a log-likelihood ratio score used in the statistical G-test for goodness-of-fit. This methodology yields numerically similar results to the more commonly used χ2 test, but may hold theoretical advantages over it in statistical analyses. The MILC-based Expression Level Predictor (MELP) is a method of quantitatively predicting gene expressivity and was computed simply as the ratio of the respective distances of a gene's codon usage from the genomic average and from a predefined reference set.
Fortunately, optimal codon usage in genes seems to coincide with transcription-enhancing factors. Therefore, it is sensible to explore a correlation between codon usage (acting at the translation level) and transcript abundance. The reference set was defined simply by including protein-coding genes that appear to be a valid starting point for expression level predictions in the sampled formicid species.
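The two measures described above can be illustrated with a minimal Python sketch. It is not INCA's implementation: the function names, the two-codon toy genetic code and the example counts are hypothetical, and the correction term (sum of r_a - 1 over the amino acids present, divided by gene length in codons, minus 0.5) is written as it is commonly stated for MILC and should be checked against the INCA documentation. A real analysis of ant mtDNA would need the full invertebrate mitochondrial genetic code and some handling of codons absent from the expected distribution, both of which are omitted here.

```python
import math
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence; trailing partial codons are dropped."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def synonymous_freqs(counts, code):
    """Frequency of each codon within its amino acid's synonymous family."""
    aa_totals = Counter()
    for codon, n in counts.items():
        if codon in code:
            aa_totals[code[codon]] += n
    return {c: counts.get(c, 0) / aa_totals[aa]
            for c, aa in code.items() if aa_totals[aa] > 0}


def milc(gene_counts, expected_freqs, code):
    """G-statistic-based distance of a gene from an expected codon distribution."""
    aa_totals = Counter()
    for codon, n in gene_counts.items():
        if codon in code:
            aa_totals[code[codon]] += n
    length = sum(aa_totals.values())  # gene length in codons covered by `code`
    if length == 0:
        raise ValueError("gene contains no codons present in the genetic code")
    g_sum = 0.0
    for codon, aa in code.items():
        observed = gene_counts.get(codon, 0)
        expected = expected_freqs.get(codon, 0.0) * aa_totals[aa]
        # Simplification: terms with a zero observed or expected count are skipped.
        if observed > 0 and expected > 0:
            g_sum += 2.0 * observed * math.log(observed / expected)
    family_sizes = Counter(code.values())  # r_a: synonymous codons per amino acid
    correction = sum(family_sizes[aa] - 1 for aa in aa_totals) / length - 0.5
    return g_sum / length - correction


def melp(gene_counts, genome_freqs, reference_freqs, code):
    """Ratio of the gene's MILC distance from the genome average to that from the reference set."""
    denominator = milc(gene_counts, reference_freqs, code)
    if denominator == 0:
        raise ValueError("MILC distance to the reference set is zero")
    return milc(gene_counts, genome_freqs, code) / denominator


if __name__ == "__main__":
    toy_code = {"TTA": "L", "TTG": "L"}      # hypothetical two-codon leucine family
    gene = codon_counts("UUA" * 9 + "UUG")   # hypothetical gene fragment
    genome = {"TTA": 0.5, "TTG": 0.5}        # hypothetical genome-average frequencies
    reference = {"TTA": 0.9, "TTG": 0.1}     # hypothetical reference-set frequencies
    print(melp(gene, genome, reference, toy_code))
```

In this sketch a MELP value above 1 means the gene's codon usage lies closer to the reference set than to the genomic average, which is the sense in which larger values are read as higher predicted expression.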

Results and Discussion
Relative values of codon usage frequency (MILC) were used for the calculation of relative protein-coding gene expression level (MELP) values. The variability of MELP values for the 32 gene sequences in the eight selected species of the family Formicidae is shown in Figure 7.
Codon usage analysis of the selected formicid mitochondrial genomes points to small differences in the relative frequencies of codon usage, as well as in the expression levels of genes, among species that inhabit three distinct ecosystems. There was no statistically significant difference between the MELP values of the 12 genes from species that inhabit tertiary ecosystems and the mean MELP values of the same genes from species that inhabit primary and secondary ecosystems. The variability of MELP values for the 12 genes from species found in tertiary ecosystems, relative to the mean MELP value for genes from species of primary ecosystems (MVP) and the mean MELP value for genes from species of secondary ecosystems (MVS), is shown in Figure 8, while the variability of MELP values for the 12 genes from species of the tertiary ecosystem in relation to the MELP values for genes of species that inhabit primary and secondary ecosystems is shown in Figure 9.
The expression of the analyzed protein-coding genes in the species that inhabit the three different ecosystems was >95% identical. The most commonly used codon in the pool of 32 analyzed genes was UUA (Leu), with a number of 122 and a relative synonymous codon usage (RSCU) value of 3.3. CGC (Arg) was the least frequently used codon, with a number of 2.3 and an RSCU value of 0.37.
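For orientation, relative synonymous codon usage is conventionally the observed count of a codon divided by the mean count of all codons that encode the same amino acid, so a value above 1 marks a codon used more often than expected under equal usage. The Python sketch below illustrates that definition only; the function name and the codon counts are hypothetical and are not the study's data, and only the six-codon leucine family is included.

```python
from collections import defaultdict


def rscu(counts, code):
    """Relative synonymous codon usage for every codon in `code`.

    counts: mapping codon -> observed count
    code:   mapping codon -> amino acid (defines the synonymous families)
    """
    families = defaultdict(list)
    for codon, amino_acid in code.items():
        families[amino_acid].append(codon)

    values = {}
    for amino_acid, codons in families.items():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the data; RSCU is undefined
        mean_count = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / mean_count
    return values


# Six-fold leucine family with hypothetical counts (illustration only).
LEU_CODE = {c: "Leu" for c in ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")}
example_counts = {"UUA": 6, "UUG": 2, "CUU": 1, "CUC": 0, "CUA": 3, "CUG": 0}
example_rscu = rscu(example_counts, LEU_CODE)
```

Within one family the RSCU values sum to the number of synonymous codons, so a six-fold family such as leucine can reach a maximum of 6 for a single codon.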

Conclusions
This in silico prediction of mtDNA gene expression based on codon usage bias proved to be uninformative. Further studies should include a wider set of genes/proteins involved in crucial metabolic processes. Furthermore, transcriptome analysis of selected formicid species could help to elucidate possible adaptations to ecosystems with strong anthropogenic impact, considering the variation in gene expression induced by environmental stimuli.

Figure 1. Schematics of the research methodology used

Figure 3. Relative frequencies of codon usage (MILC) and the level of gene expression (MELP) for the cox1 gene of eight selected species from the family Formicidae Latreille, 1802, obtained with INCA 2.1 software

Figure 6. Relative frequencies of codon usage (MILC) and the level of gene expression (MELP) for the nad6 gene of eight selected species from the family Formicidae Latreille, 1802, obtained with INCA 2.1 software

Table 1. Selected ant species and their mtDNA genes used for in silico analysis, with accession numbers in the GenBank database.

The average MELP value is 1.0834 for the cox1 gene, 1.3232 for cox2, 1.0854 for cytb and 1.3652 for nad6. MELP values ranged from 1.0428 to 1.1080 for cox1, from 1.0957 to 1.4047 for cox2, from 0.9672 to 1.1508 for cytb and from 1.0972 to 1.4302 for nad6.