Contemporary population genetics data for 23 Y-STR loci in the general Bosnian-Herzegovinian population

Bosnia and Herzegovina is located in South-Eastern Europe and is characterized by numerous historical influences, massive migration processes and a complex population structure. For that reason, the aim of this study is to provide an accurate and precise update of the population genetics data on allele frequencies at 23 Y-STR loci in Bosnia and Herzegovina using a larger sample size. For this purpose, 480 adult male individuals from the general population were genotyped at the 23 Y-STR loci contained in the PowerPlex Y23 system. Population genetics parameters were calculated, namely allele and haplotype frequencies, gene and haplotype diversity, as well as Rst and P values for the assessment of interpopulation differences. The obtained results are in close agreement with previously published data for the Bosnian-Herzegovinian population, as well as for local subpopulations. This study offers significantly increased resolution and information content, with 454 unique haplotypes. Population comparison reveals no statistically significant differences between the study population and 12 European populations used for comparison, as visualized through an MDS plot and a neighbour-joining phylogenetic tree. This study offers representative data for local Y chromosomes that can be used for forensic applications, paternity and kinship testing, as well as for genealogical studies.


Introduction
At 57 Mb in length, the Y chromosome is one of the smallest chromosomes in the human genome (Skaletsky et al., 2003). Its main advantage for population genetics studies is that a major part of the chromosome is inherited from father to son unchanged, except for occasional mutations, as a lineage marker. There is no recombination between the X and Y chromosomes during meiosis, except for the small defined pseudoautosomal regions (PARs) located at the tips of the sex chromosomes (Jobling and Tyler-Smith, 2003; Sun and Heitman, 2012). This allows Y-chromosomal alleles to be inherited as a haplotype through the male lineage, which can be easily tracked and analysed for newly introduced mutational events (Jobling and Tyler-Smith, 2003; Butler, 2005). Apart from being relatively small, 50% of the Y chromosome consists of repetitive sequences, including single-base substitutions, Alu elements and long interspersed nuclear elements (LINEs). When it comes to more mutable repetitive elements, prominent examples are short tandem repeats (STRs), with an average mutation frequency of ∼0.2% per generation, and the minisatellite locus MSY1, with a mutation frequency of 6%-11% per generation (Marjanovic and Primorac, 2009; Ballantyne et al., 2010). Bosnia and Herzegovina (B&H) is a small multiethnic country located in Southeast Europe on the Balkan Peninsula. According to the results of the 2013 census, the total population size is approximately 3.8 million people (Agency for Statistics of Bosnia and Herzegovina, 2013). Just like the rest of the region, B&H is a very interesting area for population studies, since the country is home to several partially isolated indigenous populations and lies at the crossroads between the Middle East and Western Europe.
For this reason, country-level studies on autosomal and lineage markers (Marjanovic et al., 2004a; Marjanovic et al., 2005; Kovacevic et al., 2013), as well as studies of characteristic subpopulations, such as the Sarajevo Canton (Cenanovic et al., 2010) and the Tuzla Canton (Babic et al., 2017) populations, have gained the attention of the research community in the past. In addition, studies of isolated populations inhabiting high-altitude areas significantly contributed to the knowledge of the molecular diversity of genetic markers in B&H (Marjanovic et al., 2004b). The main aim of this study is to perform a high-resolution molecular characterization of Bosnian-Herzegovinian Y chromosomes through Y-STR marker analysis. The study is conducted on an increased sample size according to the current recommended standards (Carracedo et al., 2014), with the aim of revising and updating previously published data obtained on a smaller study cohort, as well as typing the study chromosomes at 23 Y-STR loci in order to increase the informativeness of the obtained haplotypes.

Material and methods
In this study, buccal swab samples were obtained during 2019 from 480 adult male individuals from different regions of B&H. This is a set of completely new samples that had never previously been used for population genetics studies in B&H. All participants signed an informed consent form. Prior to sample collection, the approval of the Ethical Committee of the Faculty of Engineering and Natural Sciences was obtained. The study was conducted in line with the Declaration of Helsinki. Genomic DNA was isolated using the Qiagen DNeasy™ Blood and Tissue Kit (Hilden, Germany) according to the manufacturer's recommendations. PCR amplification of 23 Y-chromosomal short tandem repeat (STR) loci (DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, DYS481, DYS533, DYS549, DYS570, DYS576, DYS643 and Y-GATA-H4) incorporated in the PowerPlex® Y23 System (Promega Corporation, Madison, WI, USA) was performed according to the manufacturer's instructions using a SimpliAmp™ Thermal Cycler (Applied Biosystems, Foster City, CA, USA). PCR products were detected through capillary electrophoresis on an ABI PRISM® 310 Genetic Analyzer (Applied Biosystems) according to the manufacturer's instructions. Allelic data analysis and haplotype assignment were performed using GeneMapper™ v3.2 (Applied Biosystems). Amplified fragment analysis and Y-STR typing were carried out according to the quality assurance standards recommended by the Scientific Working Group on DNA Analysis Methods (SWGDAM, 2014). The number of alleles and different haplotypes, allele and haplotype frequencies, and gene and haplotype diversity were estimated in order to assess the intrapopulation diversity. Haplotype diversity was calculated using Nei's formula: $HD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $n$ is the sample size and $p_i$ is the frequency of the $i$th haplotype. Gene diversity was calculated as $1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele.
Match probability (MP) was calculated as $\sum_i p_i^2$, where $p_i$ is the frequency of the $i$th haplotype. Discrimination capacity (DC) was calculated by dividing the number of different haplotypes by the number of individuals in the population (Nei, 1987; Nei and Kumar, 2000). Allele and haplotype frequencies, as well as gene and haplotype diversity, were calculated using the STRAF software package v1.0.5 (Gouy and Zieger, 2017). Genetic distances between groups of males and between populations were quantified by Rst using the AMOVA online tool from the Y Chromosome Haplotype Reference Database (YHRD) (Willuweit and Roewer, 2007). In addition, associated probability values (P values) with 10,000 permutations were included for the studied European populations. Genetic distances were used to produce MDS plots for the comparison of population haplotype data, which were automatically generated on YHRD using the data available in this database. European populations selected for comparison with the population of B&H include: B&H (n = 480, present study), previously published data for the general Bosnian-Herzegovinian population (n = 100, Kovacevic et al., 2013) , 1997; Lessig and Edelmann, 1998; Schneider et al., 1998; Anslinger et al., 2000; Hidding et al., 2000; Henke et al., 2001; Schmidt et al., 2003; Immel et al., 2005; Kayser et al., 2005; Hohoff et al., 2007; Rodig et al., 2007), Czech Republic (n = 109, Zastera et al., 2010), Greece (n = 242, Parreira et al., 2002; Robino et al., 2004; Bosch et al., 2006; Kovatsi et al., 2009; Katsaloulis et al., 2013; Martínez et al., 2016), Italy (n = 1860, Grignani et al., 2000; Presciuttini et al., 2001; Ghiani et al., 2002; Cerri et al., 2005; Turrina et al., 2006; Robino et al., 2006; Onofri et al., 2007; Ferri et al., 2008; Ferri et al., 2009; Verzeletti et al., 2009; Rodríguez et al., 2009; Brisighelli et al., 2012; Piglionica et al., 2013; Robino et al., 2015; Rapone et al., 2016; Sarno et al., 2016; Lacerenza et al., 2017), North Macedonia (n = 96, Spiroski et al., 2005; Jakovski et al., 2019; Jankova et al., 2019), and Serbia (n = 379, Veselinovic et al., 2008; Veselinovic et al., 2014; Zgonjanin et al., 2017). These populations were selected because of the availability of high-quality population genetics studies performed on 23 Y-STR loci and on a number of samples large enough to produce meaningful results in population comparisons. In addition, we wanted a group of populations neighbouring B&H, as well as a set of other European populations, in order to check the informativeness of the obtained Y-STR data for interpopulation differentiation. The evolutionary history was inferred using the neighbour-joining (NJ) method of phylogenetic tree construction (Saitou and Nei, 1987) in MEGA X (Kumar et al., 2018), whereby the optimal tree is shown.
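The intrapopulation statistics defined in this section (haplotype diversity, gene diversity, match probability and discrimination capacity) can be sketched in a few lines of Python. This is only an illustration of the formulas as written above, not the STRAF implementation; the function names and the toy haplotypes are hypothetical and are not data from the study.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Compute MP, HD and DC for a list of haplotypes (any hashable values)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies, i.e. the match probability.
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return {
        "match_probability": sum_p2,
        # Nei's haplotype diversity with the n / (n - 1) correction.
        "haplotype_diversity": (1 - sum_p2) * n / (n - 1),
        # Number of different haplotypes divided by the sample size.
        "discrimination_capacity": len(counts) / n,
    }


def gene_diversity(alleles):
    """Gene diversity at one locus, computed as 1 minus the sum of p_i^2."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical toy data: four males typed at two loci (not study data).
toy_haplotypes = [("14", "11"), ("14", "11"), ("15", "11"), ("16", "13")]
stats = haplotype_stats(toy_haplotypes)
first_locus_diversity = gene_diversity([h[0] for h in toy_haplotypes])
```

For this toy input, the match probability is 0.375, the haplotype diversity is about 0.833 and the discrimination capacity is 0.75.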

Results and Discussion
In a sample of 480 participants, a total of 467 different haplotypes were detected, with 454 unique haplotypes and 13 haplotypes appearing twice. In addition, 173 alleles at 23 Y-STR loci were detected (Table S1). Apart from the DYS385a/b double locus, the largest number of alleles was recorded at DYS481, with 16 detected alleles. Two loci had the smallest number of alleles, namely DYS438 and Y-GATA-H4, with five alleles each. Average genetic diversity for the study population was 0.634 across all loci, ranging from 0.344 at the locus DYS392 to 0.884 at DYS481. At the population level, the most common allele is allele 11 at locus DYS392, with a frequency of 0.8021. This was not surprising considering that DYS392 is one of the least polymorphic loci in the population, with six detected alleles and the lowest genetic diversity. By comparing the population from the present study with previously published data for 12 European populations, the lowest genetic differentiation was observed between the currently analysed population of B&H and the previously published Bosnian-Herzegovinian population (Rst = 0.0021, P = 0.2230, Kovacevic et al., 2013), as well as the population of Serbia (Rst = 0.0028, P = 0.0647, Veselinovic et al., 2008; Veselinovic et al., 2014; Zgonjanin et al., 2017).
Füredi et al., 1999; Egyed et al., 2000; Beer et al., 2004; Völgyi et al., 2009; Pamjav et al., 2017), Greece (Rst = 0.0943, P = 0.0000, Parreira et al., 2002; Robino et al., 2004; Bosch et al., 2006; Kovatsi et al., 2009; Katsaloulis et al., 2013; Martínez et al., 2016) Grignani et al., 2000; Presciuttini et al., 2001; Ghiani et al., 2002; Cerri et al., 2005; Robino et al., 2006; Turrina et al., 2006; Onofri et al., 2007; Ferri et al., 2008; Ferri et al., 2009; Rodríguez et al., 2009; Verzeletti et al., 2009; Brisighelli et al., 2012; Piglionica et al., 2013; Robino et al., 2015; Rapone et al., 2016; Sarno et al., 2016; Lacerenza et al., 2017), Germany (Rst = 0.1804, P = 0.0000, Junge et al., 1997; Lessig and Edelmann, 1998; Schneider et al., 1998; Anslinger et al., 2000; Hidding et al., 2000; Henke et al., 2001; Schmidt et al., 2003; Immel et al., 2005; (Table S2). Genetic relationships between the investigated populations are also represented through an MDS plot (Figure 1). The results of this comparison confirm the general trends observed in Table S2. To further investigate molecular evolutionary relationships between the population of B&H and other European populations, an NJ phylogenetic tree was constructed based on Rst values (Figure 2). Overall, population comparisons indicate that geographically closer populations show a higher degree of genetic relatedness. In the NJ tree (Figure 2), the two Bosnian-Herzegovinian populations (present and previously published data), Serbia, Croatia, and North Macedonia cluster together, while the populations of Slovenia, Hungary, Belgium, Italy, Austria, Czech Republic, Greece and Germany form another cluster on the phylogenetic dendrogram. Greece and Slovenia are the most distant in the second cluster, corresponding to their location in Southern Europe and in the geographic neighbourhood of the Balkan Peninsula, respectively.
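How a neighbour-joining tree is built from a matrix of pairwise distances can be illustrated with a minimal sketch in Python. It is not the MEGA X implementation used in the study; the function name `neighbour_joining`, the taxon labels and the toy distance matrix are all hypothetical, and the distances are not the Rst values from Table S2.

```python
def neighbour_joining(names, dist):
    """Return the joins as (node_a, node_b, length_a, length_b, new_node)."""
    # Copy the matrix into a dict of dicts so the input is left untouched.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the joined pair to their new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a},{b})"
        d[new] = {new: 0.0}
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[new][k] = d[k][new] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [new]
        joins.append((a, b, len_a, len_b, new))
    # The last two nodes are connected by a single branch; its whole length
    # is stored in the first length slot.
    a, b = nodes
    joins.append((a, b, d[a][b], 0.0, f"({a},{b})"))
    return joins


# Hypothetical distances between five taxa (not the study's Rst matrix).
toy_names = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins = neighbour_joining(toy_names, toy_dist)
```

In this toy matrix the first pair to be joined is `a` and `b`, with branch lengths 2 and 3 to their common node. Taxa that are joined early and by short branches are the ones that appear as close neighbours in the resulting tree.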
In addition, the population comparisons presented above show very low to no genetic differentiation between the currently analysed population and the Western Balkan populations. When comparing our results with previously published Y-STR data for the Bosnian-Herzegovinian population, it is useful to start by observing individual loci. Loci DYS390, DYS438, DYS437, DYS391 and DYS389I were previously identified as the least polymorphic ones in B&H (Cenanovic et al., 2010; Kovacevic et al., 2013). The current results obtained on 480 participants are in good agreement, since each of these loci has six alleles, except for DYS438 with only five. In addition, genetic diversity values for these loci are rather low, ranging from 0.4325 for DYS389I to 0.6206 for DYS391. Similarly, the DYS391 locus, with six alleles, was found to be the least polymorphic in the Turkish population recently settled in Sarajevo (Dogan et al., 2014). Conversely, DYS481 is the most polymorphic locus in the 480 samples analysed here. This result is in agreement with previous studies on neighbouring populations, as well as with the 100 samples of the Bosnian-Herzegovinian population published in 2013 (Kovacevic et al., 2013; Babic et al., 2016; Zgonjanin et al., 2017; Kacar et al., 2019). Allele 22 at the DYS481 locus was found to be the most frequent allele in a recent study of the Serbian population (Kacar et al., 2019), which is also the case at this locus in our study. Improved data resolution in this study is achieved not only through an increased sample size and sampling from different parts of the country, but also through the use of 23 loci, which significantly improved the information content of the obtained haplotypes. More precisely, in a sample of 480 Y chromosomes, 454 haplotypes were unique.
In a previous study of the local Tuzla Canton (B&H) population, all 100 haplotypes were unique (Babic et al., 2017), while in the local Sarajevo Canton (B&H) population, 98 samples had unique haplotypes and one haplotype was observed twice (Kovacevic et al., 2013), with both studies performed using the same Y-STR loci that were used in our study. For comparison, a study of the general Bosnian-Herzegovinian population on 100 individuals typed at 12 Y-STR loci produced only 81 different haplotypes, including 71 unique ones (Cenanovic et al., 2010). This study aims to provide an update of the current literature regarding Y-STR population data in B&H. Our study was prepared according to the latest guidelines on publishing forensic and population genetics data for Y-STRs (Carracedo et al., 2014), which state that a minimum of 17 Y-STR loci and 200 samples should be used. Autosomal STR data have already been updated in previous years, first by publishing data on 1000 samples analysed for 15 STR loci contained in the PowerPlex 16 system (Promega Corporation; Pilav et al., 2017), followed by data on 22 STR loci contained in the PowerPlex Fusion system (Promega Corporation) on a sample of 600 Bosnian-Herzegovinians (Pilav et al., 2020). The need for an increased sample size and improved data resolution for Y-chromosome studies is addressed in the present study. The Y-chromosomal STR data of the present study were submitted to the Y Chromosome Haplotype Reference Database (YHRD) (http://www.yhrd.org) and accession number YA003787 was assigned.
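As a rough arithmetic check, the haplotype counts reported in this section (454 haplotypes observed once and 13 observed twice among n = 480 samples) can be substituted into the formulas from the Material and methods section. The values below are derived here from those counts alone, as an illustration, and are not figures reported by the study.

```latex
\begin{align*}
MP &= \sum_i p_i^2
    = \frac{454 \cdot 1^2 + 13 \cdot 2^2}{480^2}
    = \frac{506}{230400} \approx 0.0022 \\
HD &= \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)
    = \frac{480}{479}\left(1 - \frac{506}{230400}\right)
    = \frac{229894}{229920} \approx 0.9999 \\
DC &= \frac{467}{480} \approx 0.973
\end{align*}
```

The check also confirms that the counts are internally consistent, since 454 + 2 × 13 = 480 samples and 454 + 13 = 467 different haplotypes.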

Conclusion
Our extended dataset of 480 Y-chromosomal haplotypes, collected from different parts of the country and typed at 23 Y-STR loci, gives a more detailed insight into the allele frequency distribution at these loci. By increasing the sample size from 100 to 480 individuals, we offer more precise estimates of allele and haplotype frequencies, as well as population measures that can be used not only for population genetics studies, but also for forensic applications, paternity testing, kinship analysis, and missing person identification.

Conflict of Interest
The authors declare no conflict of interest.