For decades, the peopling of the Americas has been explored through the analysis of uniparentally inherited genetic systems in Native American populations and the comparison of these genetic data with current linguistic groupings. In northern North America, two language families predominate: Eskimo-Aleut and Na-Dene. Although the genetic evidence from nuclear and mtDNA loci suggests that speakers of these language families share a distinct biological origin, this model has not been examined using data from paternally inherited Y chromosomes. To test this hypothesis and elucidate the migration histories of Eskimoan- and Athapaskan-speaking populations, we analyzed Y-chromosomal data from Inuvialuit, Gwich’in, and Tłįchǫ populations living in the Northwest Territories of Canada. Over 100 biallelic markers and 19 Y-chromosome short tandem repeats (STRs) were genotyped to produce a high-resolution dataset of Y chromosomes from these groups. Among these markers is an SNP discovered in the Inuvialuit that differentiates them from other Aboriginal and Native American populations. The data suggest that Canadian Eskimoan- and Athapaskan-speaking populations are genetically distinct from one another and that the formation of these groups was the result of two population expansions that occurred after the initial movement of people into the Americas. In addition, the population history of Athapaskan speakers is complex, with the Tłįchǫ being distinct from other Athapaskan groups. The high-resolution biallelic data also make clear that Y-chromosomal diversity among the first Native Americans was greater than previously recognized.
The peopling of the Americas is a question of fundamental importance in anthropological and historical disciplines (1, 2). Much research on the issue has focused on testing the hypothesis that several separate migrations entered the New World, with each migration being associated with different linguistic, dental, and presumably, genetic characteristics (3). Under this model, Amerind is the largest, most varied, and oldest language family in the Americas. However, some have questioned the use and/or appropriateness of this linguistic classification (4, 5). Despite this controversy, the designation of the Na-Dene and Eskimo-Aleut language families is well-established, although the inclusion of Haida with Athapaskan, Eyak, and Tlingit (forming the Na-Dene family) has been reconsidered (6, 7).
In addition, there has been debate concerning the validity of a strict separation (or genetic differentiation) between speakers of Athapaskan and Eskimo-Aleut languages. It has been argued based on blood group markers and cranial trait data that some Inuit are more closely related to non-Inuit groups and that certain Athapaskan-speaking populations have greater genetic affinity with non-Athapaskan groups (8, 9). The mtDNA data likewise do not show a complete correlation between genetic and linguistic classifications (5). Even the dental traits used to justify a three-migration hypothesis did not group all Na-Dene speakers into a single category separate from the other two (Amerind and Eskimo-Aleut) groups and furthermore, suggested the inclusion of Aleuts with Athapaskan speakers from northwestern America (3). Thus, although it is generally accepted that the two language families differ from each other, it is not clear whether they have different genetic origins or instead, are the result of separate migrations from the same source.
Not surprisingly, the number and timing of migrations into the Americas are still vigorously debated (10). Previous work focused mostly on mtDNA variation in northern Native American populations (2, 5, 10–15). Work using the Y chromosome to explore these issues, however, used relatively low-resolution haplogroup and haplotype data or did not test the correlation between Y-chromosomal diversity and language use (Athapaskan vs. Eskimoan) in a localized geographic space (16, 17). Here, we rectify this issue by generating the highest resolution nonrecombining region of the Y-chromosome (NRY) dataset to date for the Americas and analyzing populations that fill a major geographic gap between the previously studied Alaskan Inupiat and Greenlandic Inuit.
We characterized the NRY in Athapaskan [Gwich’in (Kutchin) and Tłįchǫ (Dogrib)] and Inuvialuktun (Inuvialuit) speakers from the Canadian Northwest Territories. This analysis led to the more precise identification of indigenous haplogroups and a better understanding of the extent of recent European admixture. In addition, by generating highly resolved Y-chromosome lineages, we were able to confirm the phylogeny of haplogroup Q, providing a detailed basis for future work. We also assessed whether Athapaskan and Eskimoan speakers derived from separate migrations (i.e., whether their genetic variation was structured by language) and examined the relationships of populations within and among these linguistic groups. In doing this assessment, we have expanded our understanding of the migration histories of Aboriginal [the term aboriginal describes indigenous populations in Canada, including First Nations (Indians), Inuit, and Métis] populations from northern North America.
Haplogroup Q Phylogeny.
The structure of the haplogroup Q phylogeny is essentially the same as presented in the work by Dulik et al. (18), but it is enhanced for Native American and Aboriginal Y chromosomes (Fig. 1 and Table S1). The relative position of the Q1a3 branch was verified. A single Chumash haplotype possessed the four markers defining Q1a3 but lacked all nine markers defining Q1a3a, Q1a3b, and Q1a3c. Q1a3a remains defined as described in the work by Dulik et al. (18). Five of the Aboriginal participants had the L54 marker, which defines Q1a3a1*, but lacked any additional derived markers, including the Native American-specific M3. The remaining samples from this branch had the M3 marker but none of the four derived markers common in South America (19, 20).
Fig. 1. A revised phylogeny of Y-chromosome haplogroup Q.
A number of haplogroup Q Y chromosomes did not belong to the Q1a3 branch. Most of these chromosomes had markers defining Q1a but lacked those markers that define Q1a1, Q1a2, Q1a3, Q1a4, and Q1a5. M323, which formerly defined Q1a6 (19), is now positioned as a derived mutation in relation to M346 (21). A marker detected in this analysis, called NWT01, differentiated almost one-half of the Inuvialuit Y chromosomes from all others. We have classified these Y chromosomes as belonging to haplogroup Q1a6. In addition to this haplogroup, two samples had the P89 marker, which defines haplogroup Q1a5. Thus, three of six known Q1a branches are found in the Americas (Q1a3, Q1a5, and Q1a6).
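The marker-based assignments described above can be illustrated with a minimal sketch. It uses only the markers that the text ties to a named branch (M3, L54, NWT01, and P89); the rule table, the function name, and the "unresolved" fallback are illustrative assumptions and do not reproduce the full genotyping hierarchy used in the study.

```python
# Hypothetical rule table built only from markers named in the text.
# Rules are ordered so that the most derived marker on a branch is tested first
# (M3 lies downstream of L54, so an M3-derived sample also carries L54).
MARKER_RULES = [
    ("M3", "Q1a3a1a*"),
    ("L54", "Q1a3a1*"),
    ("NWT01", "Q1a6"),
    ("P89", "Q1a5"),
]


def assign_haplogroup(derived_markers):
    """Return the most derived haplogroup label supported by the derived markers."""
    observed = set(derived_markers)
    for marker, haplogroup in MARKER_RULES:
        if marker in observed:
            return haplogroup
    # None of the listed markers is derived; the sample needs further typing.
    return "unresolved"


if __name__ == "__main__":
    print(assign_haplogroup(["L54", "M3"]))
    print(assign_haplogroup(["NWT01"]))
```

A real assignment would walk the whole haplogroup Q tree; the flat list here only shows why a sample with L54 but without M3 falls into the Q1a3a1* paragroup.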
NRY Haplogroup Variation.
We observed significant differences in NRY haplogroup frequencies among the three Aboriginal groups from northern Canada (Table 1 and Table S1). All populations had high to moderate frequencies of Q1a3a1a*. The Athapaskan-speaking Gwich’in and Tłįchǫ had higher frequencies of C3b than the Inuvialuit, whereas the Inuvialuit had significantly more Q1a6 lineages. Additional haplogroups that seem to be indigenous in origin were found at low frequencies in the Athapaskan groups. Four samples (three Gwich’in and one First Nation member from British Columbia) belonged to paragroup Q1a3a1*. We also identified 10 samples (9 Athapaskan and 1 Inupiat) (22) that clustered with these haplotypes, suggesting a common origin for them. Another Q1a3a1* lineage belonged to a Mi’kmaq from Nova Scotia, but it is not clear that this person’s Y chromosome also shares a recent origin with these other haplotypes. Finally, one Tłįchǫ and one Slave belonged to Q1a5, whereas a similar Y-chromosome short tandem repeat (Y-STR) haplotype was found in one Alaskan Athapaskan (22). This SNP was previously described in the work by Karafet et al. (19), although its geographic distribution was not discussed.
Earlier studies of Aboriginal and Native American Y chromosomes struggled to distinguish indigenous founder haplogroups from those haplogroups that recently came from European and African sources (16, 23–26). Based on our examination of genealogical information and high-resolution genotypes, we were able to distinguish between these sources; 48% of the Gwich’in and 43% of the Inuvialuit Y chromosomes were more typically found in Europeans. By contrast, only 19% of the Tłįchǫ lineages were nonindigenous. Comparisons of these data with data from worldwide populations (www.yhrd.org) showed exact or near matches between the haplotypes of nonindigenous lineages and those haplotypes of Europeans. Hence, although these men are Aboriginal, some of their genetic ancestry traces back to Europe.
Table 1. NRY haplogroup frequencies for northern North American populations
Because variation accumulates with time, a relative chronology can be constructed by assessing haplogroup STR diversity, ρ, and mean pairwise differences (Table 2 and Table S2). We noted that Q1a Y chromosomes had far greater diversity than C3 Y chromosomes, and within Q1a, those chromosomes with M3 (Q1a3a1a*) had the greatest amount. The Q1a3a1a* network exhibited only one definitive haplotype cluster, which was composed entirely of Athapaskan-speakers (mostly Tłįchǫ) and distinctive in having a duplication of the DYS448 locus (Fig. S1). All other Q1a3a1a* lineages were dispersed among longer branches of the network, indicating that they derived from a common source much farther in the past. Tłįchǫ Q1a3a1a* showed the lowest intrapopulation variance (Vp) estimate, whereas values were higher in the Gwich’in and Inuvialuit. The low diversity of the Tłįchǫ is likely caused by a recent founder event. Interestingly, when comparisons were expanded to include populations from southeast Alaska, the Tlingit had far greater diversity within this haplogroup, possibly as a result of their geographic location and increased interaction with other Native Americans (27).
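The diversity measures named above can be sketched as follows. This is a minimal illustration and not the Arlequin implementation: haplotypes are assumed to be tuples of repeat counts, haplotype diversity uses Nei's unbiased formula, and the variance function (an assumed reading of Vp) averages the per-locus sample variance of repeat counts. The function names and example values are hypothetical.

```python
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    counts = {}
    for hap in haplotypes:
        counts[hap] = counts.get(hap, 0) + 1
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def mean_pairwise_differences(haplotypes):
    """Average number of loci at which two haplotypes differ, over all pairs."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def repeat_variance(haplotypes):
    """Sample variance of repeat counts at each locus, averaged across loci."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    n_loci = len(haplotypes[0])
    total = 0.0
    for locus in range(n_loci):
        values = [hap[locus] for hap in haplotypes]
        mean = sum(values) / n
        total += sum((v - mean) ** 2 for v in values) / (n - 1)
    return total / n_loci


if __name__ == "__main__":
    # Made-up two-locus haplotypes, for illustration only.
    sample = [(13, 24), (13, 24), (14, 24), (13, 25)]
    print(haplotype_diversity(sample))
    print(mean_pairwise_differences(sample))
    print(repeat_variance(sample))
```

Lower values of all three measures within a haplogroup are what would be expected after a recent founder event, as proposed for the Tłįchǫ.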
Haplogroups Q1a6 and C3b had roughly one-third of the diversity of Q1a3a1a*, suggesting that they arose more recently and at approximately the same time. The majority of Y chromosomes for each of these haplogroups belonged to single haplotype clusters, suggesting that they likely originated from two different ancestors relatively recently (one for each haplogroup). Q1a6 is especially important, because it is largely confined to the Inuvialuit, Inuit, and Inupiat populations. Although no SNP data were presented in the work by Davis et al. (22), a number of their STR haplotypes were similar to the Q1a6 lineages and in several cases, shared with the Inuvialuit (Fig. S2). Furthermore, Q1a6 may also be present in Greenlandic Inuit and Aleuts, although these populations were characterized with fewer STRs. Diversity estimates indicated the greatest assortment of haplotypes in the Yupik and less assortment in Inupiat and Inuvialuit. This trend of slightly decreasing values from west to east suggests an origin for Q1a6 in the westernmost Arctic and its dispersal through an eastward expansion.
Haplogroup C3b was found at comparatively high frequencies in Athapaskan populations. The Tłįchǫ and Gwich’in C3b lineages were similar to one another, forming a single large cluster (Fig. S3). We reduced our NRY data to 8-STR loci haplotypes to compare them with published data (17, 22, 28) and determine the directionality of movement by bearers of C3b Y chromosomes. Vp estimates for the 8-STR haplotypes showed the greatest diversity in the Tanana Athapaskans and Alaskan Athapaskans of the work by Davis et al. (22), moderate values among the Tłįchǫ and Gwich’in, and the lowest values in the Apache, thus forming a north to south gradient of C3b diversity.
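Reducing haplotypes to a shared panel of loci, as described above, can be sketched in a few lines. The dictionary representation and the function name are assumptions made for illustration; the actual eight loci are not listed in this section, so the panel in the usage example is hypothetical.

```python
def reduce_to_shared_loci(haplotype, shared_loci):
    """Keep only the loci in the shared panel, in panel order.

    `haplotype` maps locus names to repeat counts. Returns None when any
    panel locus is missing, so incomplete profiles can be excluded.
    """
    if any(locus not in haplotype for locus in shared_loci):
        return None
    return tuple(haplotype[locus] for locus in shared_loci)


if __name__ == "__main__":
    # Hypothetical profile and panel; locus names and values are illustrative only.
    profile = {"DYS19": 13, "DYS390": 24, "DYS448": 20}
    print(reduce_to_shared_loci(profile, ["DYS19", "DYS390"]))
```

Returning a tuple keeps the reduced haplotypes hashable, so identical haplotypes from different studies can be counted or compared directly.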
The evolutionary mutation rate (29) was used to calculate times to most recent common ancestor (TMRCAs), because previous estimates using the pedigree-based mutation rate gave values that were too recent and conflicted with nongenetic data (18). In most cases, the estimates from Batwing were greater than those estimates from Network—a difference previously noted (30, 31). Unlike ρ-estimates, the estimates generated with Batwing are useful only when the demographic model that is used is appropriate for the sample set. In this case, the model involves a population at a constant size that expands exponentially at time β. Although generally useful, the 95% confidence intervals of Batwing estimates show that the TMRCAs are not precise. In addition, if the root haplotypes were incorrectly inferred for the ρ-statistics, then the TMRCAs could be skewed (31, 32). Regardless, the relative chronology of these haplogroups should not be affected.
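The ρ-based dating logic can be sketched as follows, under simplifying assumptions. Here ρ is approximated as the mean number of single-repeat steps between each sampled haplotype and an inferred root haplotype; Network computes ρ along the branches of the reconstructed network, so this direct distance is only an approximation. The mutation rate is passed in as a parameter because its value is not stated in this section; the function names are hypothetical.

```python
def rho_statistic(root, haplotypes):
    """Mean number of single-repeat steps separating sampled haplotypes from the root."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    steps = [sum(abs(a - b) for a, b in zip(hap, root)) for hap in haplotypes]
    return sum(steps) / len(steps)


def tmrca_years(rho, n_loci, rate_per_locus, years_per_unit):
    """Convert rho to years.

    `rate_per_locus` is the mutation rate per locus per time unit, and
    `years_per_unit` is the length of that unit in years (for example, a
    generation length). Both are supplied by the caller.
    """
    return rho / (n_loci * rate_per_locus) * years_per_unit


if __name__ == "__main__":
    # Made-up two-locus example; the rate and unit length are illustrative only.
    root = (13, 24)
    sample = [(13, 24), (14, 24), (13, 26)]
    rho = rho_statistic(root, sample)
    print(rho, tmrca_years(rho, n_loci=2, rate_per_locus=0.001, years_per_unit=25))
```

Because ρ is measured from the root, a wrongly inferred root haplotype shifts every step count, which is why the text notes that the resulting TMRCAs could be skewed.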
The TMRCAs for M3-derived Y chromosomes indicated a coalescent event between 13,000 and 22,000 y ago (Table 3). TMRCAs from each population were substantially more recent, although the dates mirror the diversity estimates in showing the varied collection of Q1a3a1a* haplotypes in each population. The TMRCAs for Q1a6 and C3b were comparable. For Q1a6, the TMRCA was between 4,000 and 7,000 y ago. The overall TMRCA estimate of the C3b lineages was 5,000 y ago with ρ-statistics and about two times that value with Batwing. Similar estimates were calculated for each of the ethnic groups, with a range between 4,000 and 6,000 y ago; Alaskan Athapaskans had greater diversity and an older TMRCA.
Table 2. Diversity estimates of major NRY haplogroups
Table 3. TMRCAs of major NRY haplogroups
To make interpopulation comparisons of Y-chromosome variation, we estimated genetic distances (RST values) from the Y-STR (15 loci) haplotypes and visualized them through a multidimensional scaling plot (Fig. 2 and Table S3). The Athapaskan-speaking populations of the Northwest Territories and Alaska appeared on the right side of the plot, whereas the Eskimoan-speaking populations were positioned to the left. Without exception, the Athapaskan and Eskimoan speakers were significantly different from each other. The Inuvialuit and Inupiat were not significantly different from each other, but the Yupik were significantly different from all other populations. Similar patterns of genetic differentiation were also observed for the Athapaskan speakers, where all except the Tłįchǫ were not significantly different from one another. However, when RST values were calculated with only lineages of indigenous origin, the Tłįchǫ were not significantly different from the Gwich’in.
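The idea behind an RST distance can be sketched with a simplified pairwise estimator. It compares the mean squared difference in repeat counts within populations with that in the pooled sample, in the spirit of Slatkin's RST. It is an illustrative assumption and not the procedure implemented in Arlequin, and it includes no permutation test of significance. The function names are hypothetical, and each population is assumed to contain at least two haplotypes.

```python
from itertools import combinations


def _squared_distance(h1, h2):
    """Sum over loci of squared differences in repeat count."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def _mean_squared_distance(haplotypes):
    """Mean squared distance over all pairs of haplotypes in one sample."""
    pairs = list(combinations(haplotypes, 2))
    return sum(_squared_distance(a, b) for a, b in pairs) / len(pairs)


def pairwise_rst(pop_a, pop_b):
    """Simplified RST: (S_total - S_within) / S_total for two populations."""
    s_within = (_mean_squared_distance(pop_a) + _mean_squared_distance(pop_b)) / 2
    s_total = _mean_squared_distance(list(pop_a) + list(pop_b))
    if s_total == 0:
        return 0.0
    return (s_total - s_within) / s_total


if __name__ == "__main__":
    # Made-up single-locus samples, for illustration only.
    print(pairwise_rst([(10,), (11,)], [(11,), (12,)]))
```

A matrix of such pairwise values is the kind of input that a multidimensional scaling plot summarizes in two dimensions.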
Fig. 2. Multidimensional scaling (MDS) plot of RST values estimated from Y-STR haplotypes among northern North American populations. The dotted circles enclose populations whose genetic distances from one another are not significant.
An analysis of molecular variance was conducted to assess whether geography or linguistic affiliation was a better predictor of genetic variation. Using three geographic groups to cluster the data, we noted that the variation among groups was not significant, and the among-population, within-group variation was high (Table S4). When the data were clustered by linguistic affiliation, the among-group variation rose to 12%, whereas the among-population, within-group value fell to about 2.4%. This finding suggested that the paternal genetic history of these populations is structured along linguistic instead of geographic lines.
Much of the genetic research on northern Aboriginal populations has centered on the origins of Inuit peoples, particularly from Greenland, and the possible contributions of historic European (Norse) populations to their genetic makeup. This study has examined Y-chromosome diversity in the Inuvialuit to better understand the origin of their Aboriginal lineages. We accomplished this task effectively by using high-resolution haplotypes. In addition, the samples came from populations living in a region not previously well-described, thus filling a significant geographic void in genetic information about northern Aboriginal history.
From an mtDNA perspective, Inuit populations around the Arctic seemed similar to one another, with low levels of genetic diversity (13–15). The general model for Inuit origins proposes a discontinuity between the earliest inhabitants of Greenland (Paleo-Eskimo) and later (Dorset and Thule) cultures (13, 14, 33). Debate continues as to whether Inuit are wholly descended from Thule tribes that migrated across the Arctic about 1,000 y ago (14) or whether modern Inuit formed out of interactions between the previous Dorset inhabitants and later Thule migrants (13). A complete ancient Paleo-Eskimo mitogenome was sequenced to test these models (33). The sample belonged to haplogroup D2a1, which does not appear in modern Inuit populations and is most similar to Aleut and Siberian Yupik D2a1a mitogenomes (15, 33–35), suggesting a genetic discontinuity between ancient and modern Inuit populations.
From an NRY perspective, many Eskimoan Y chromosomes belonged to Q1a6, which has a TMRCA that predates the Paleo-Eskimo material culture. The Y chromosome of the ancient Paleo-Eskimo man was assigned to paragroup Q1a*, but the NWT01 locus was not sequenced (36). Assignment of the Paleo-Eskimo Y chromosome to Q1a6 does not conflict with these data or the TMRCA of Q1a6. Furthermore, autosomal analysis of the ancient Paleo-Eskimo genome suggested that this man had close affinities with the Nganasan, Koryak, and Chukchi of northeastern Siberia. In fact, four Koryaks also have Q1a* Y chromosomes (37), with the number of repeat differences being within the typical range of confirmed Q1a6 haplotypes. Thus, although a discontinuity in mtDNAs between the Paleo-Eskimo and modern Inuit has been shown, this finding may not be the case for Y chromosomes.
Our Q1a6 TMRCAs were comparable with the estimate that the work by Rasmussen et al. (36) calculated from genomic SNP data for a migration from Siberia to the Americas. Although the work by Rasmussen et al. (36) inferred a Siberian origin for this migration, an origin in northwestern North America with a subsequent back migration across the Bering Strait is equally likely (as seen with M3) (38), given that Q1a6 is found at higher frequencies and with greater diversity in Eskimoan speakers. In addition, NRY lineages common to coastal Siberian populations (i.e., C3c and N1c) were not present in American Arctic groups.
In addition to identifying this Eskimo-Aleut haplogroup, we also noted significant genetic differences between Inuvialuit and Athapaskan speakers. Previous mtDNA studies found haplotype sharing between Inuit and Athapaskan speakers (12, 14). This finding was evident from the high frequency of A2 mtDNAs in both linguistic groups and reduced genetic diversity relative to populations to the south, possibly caused by more recent reexpansions from Beringia or northwestern North America (12). In contrast, their Y-chromosome diversity was structured by language affiliation, which suggested that the distribution of genes and language are consistent with at least two migratory events. This pattern is especially convincing given that some Inuvialuit and Gwich’in live in the same communities (Aklavik and Inuvik). Furthermore, within the Eskimoan-speaking populations, the Inuvialuit and Inupiat (Inuit speakers) and Yupik (Yupik speakers) have genetically diverged from each other.
However, the data for Na-Dene-speaking populations did not show the same correspondence between populations and language. The Tlingit, who speak the most divergent language of those languages in the Athapaskan-Eyak-Tlingit linguistic family (7, 39), are not significantly different from the Gwich’in and Athapaskan Indians from Central Alaska. Because the Tłįchǫ speak an Athapaskan language closer to the language of the Gwich’in, we expected that the Tłįchǫ and Gwich’in would be more similar to one another than to the Tlingit. However, the Tłįchǫ were genetically distinct from other Athapaskan speakers, possibly because of their lower levels of European admixture. This finding, however, does not explain the small genetic distance between Tlingit and Gwich’in and may indicate that the Tlingit sample is too small and unrepresentative to allow for a complete comparison. The genetic differences between these Aboriginal groups do suggest that the spread of Inuit culture across the Arctic was not simply a cultural phenomenon and that Athapaskan Indians living in the interior were not differentiated by their cultural adaptations to local environments alone (at least from a strictly paternal genetic perspective).
Previous studies using Y-chromosomal data have not conclusively determined whether the Americas were peopled by a single migration event or multiple migrations (23–25, 28, 40). Initial data from STRs, Y-chromosome centromeric heteroduplexes, and surveys of the M3 marker suggested a single migratory event (41–45). The presence of haplogroup C in the Americas has been cited as evidence for a second migration and has also been used as evidence of a separate homeland for Native American progenitors (23, 24). Our data are most consistent with the model in the work by Forster et al. (12) that proposes a single migration into the Americas followed by a second subsequent expanding wave. The first wave was associated with the initial peopling of the Americas through Beringia, and the founding populations were likely composed of Q-MEH2 (xL54), Q-L54, and C-M217 Y chromosomes. Other Y chromosomes were likely present in the founding population at low frequencies but subsequently lost over time.
Q-L54 is unmistakably a founding haplogroup and provides a clear connection to southern Siberia, where L54 is also found (18). The designation of Q-L54 as a major founding haplogroup is also supported by the presence of Q-L54 (xM3) Y chromosomes in our sample set. However, the geographic distribution of this paragroup is not yet clear. By contrast, Q-M3 arose (either in Beringia or Alaska) from a Q-L54 founder and has been shown to be ubiquitous and diverse in most indigenous populations, pointing to its primacy in the first expansions of men throughout the Americas.
Furthermore, a handful of C-M217 Y chromosomes without the P39 marker were found in southeast Alaska and Colombia, South America (23, 27, 46, 47). The spread of C-P39 involved mostly Athapaskan speakers and was associated with a second wave of expansion, as was the spread of Q1a6 among Eskimoan speakers. These second wave expansions likely originated from American sources that amalgamated with the first wave inhabitants as they spread throughout the north. Because of the imprecision of TMRCA estimates, it is not possible to determine whether haplogroup C3b originated before Q1a6 or vice versa.
The disparities in language, culture, and Y-chromosome diversity between Athapaskan- and Eskimoan-speaking populations likely reflect the effects of demic expansions after the initial migratory event that brought human groups to the Americas. This second wave likely occurred roughly in parallel, resulting in different migration and population histories, and contributed to the genetic makeup of extant Aboriginal populations of North America. We should emphasize that we are referring to the populations themselves and not the languages that they speak. We can only say that speakers of these two language families have comparatively recent paternal histories. This analysis has allowed us to develop a more detailed picture of the paternal genetic history of Aboriginal and Native Americans, and it shows that the diversity of the founding indigenous American populations was greater than previously acknowledged.
Buccal cells and genealogical data were obtained with informed consent from participants residing in 12 settlements during two expeditions to the Northwest Territories in 2009 and 2010 (Fig. S4). Data and sample collections were approved by the University of Pennsylvania Institutional Review Board, the Aurora Research Institute, the Inuvialuit Regional Corporation, the Gwich’in First Nation, and the Tłįchǫ Government. DNA samples were extracted using QIAamp DNA Mini kits (Qiagen) using the manufacturer’s protocol with minor modifications (SI Methods). Samples were characterized using published methods (48, 49). An SNP was discovered using the primers originally designated for Q4 in the work by Sengupta et al. (32). This SNP, named NWT01, is a C to G transversion at np 65 in the amplicon (Y position 2,888,083 in Build 37.2).
Comparative data came from published literature (Table S2) (22, 27, 48–50). Statistical analysis included estimates of haplotype diversity, mean pairwise differences and analysis of molecular variance using Arlequin v3.11 (51), and Vp estimates assessed as in the work by Kayser et al. (52). Genetic distances (RST values) were calculated using 15 Y-STR loci and visualized with a multidimensional scaling plot in SPSS v.11 (53). Reduced median–median joining networks and ρ-statistics were generated with Network v22.214.171.124 (www.fluxus-engineering.com) (54). TMRCAs were calculated as described elsewhere (SI Methods) (55).
We thank Gwich’in, Inuvialuit, Tłįchǫ, and other First Nations individuals from the Northwest Territories and Nunavut and Alaska Native individuals for their collaboration and participation. We also thank the Gwich’in First Nation, the Inuvialuit Regional Corporation, and the Tłįchǫ First Nation Government for their support of this research (SI Text). We thank Emöke Szathmary for her insightful remarks on the manuscript and Janet Ziegle and Applied Biosystems for providing technical assistance. Funding was provided by the National Geographic Society, IBM, the Waitt Family Foundation, and the University of Pennsylvania.
Author contributions: M.C.D., J.B.G., I.K., S.S., J.M., N.G., T.D.A., and T.G.S. designed research; M.C.D., A.C.O., and T.G.S. performed research; J.B.G., A.A., C.L., M.A.M., R.W., T.G.S., and T.G.C. contributed new reagents/analytic tools; M.C.D. and T.G.S. analyzed data; and M.C.D., A.C.O., J.B.G., M.G.V., and T.G.S. wrote the paper.
The authors declare no conflict of interest.
*This Direct Submission article had a prearranged editor.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1118760109/-/DCSupplemental.
The Sahel, which extends between the Atlantic coast of Africa and the Red Sea plateau, represents one of the least sampled areas and populations in the domain of human genetics. The position of Eritrea adjacent to the Red Sea coast provides opportunities for insights regarding human migrations within and beyond the African landscape.
Worth noting in the current data set is the absence of differentiation of Eritrean populations along geographical and linguistic lines, which may reflect their admixture (42) or a common founding population with subsequent drift. The sharing of derived alleles for E and other deeper Y-chromosome lineages (unpublished data) between Eritreans and other populations from the region makes this part of East Africa a likely scene for some of the earliest demographic episodes within the continent as well as the subsequent expansion out of it, a scenario that seems consistent with paleontological, archeological and genetic evidence (9, 43, 44, 45).
The network cluster associated with the Eritrean Nilo-Saharan Kunama (Figure 1) may represent an expansion event following the out-of-Africa migration (31, 46), possibly close to the origin of the ancestral Y-chromosome clades (47, 48, 49). The expansion, carrying the diversified E-P2 mutation, may be responsible for the migration of male populations to different parts of the continent and hence for the rise and spread of the bearers of the macrohaplogroup (50). These types of population movements, or demic expansions, driven by climatic change and/or the spread of pastoralism and to some extent agriculture (51, 52, 53, 54), are not uncommon in human history. This scenario is further substantiated by the refinement of E-P2 and its two basal clades, E-M2 and E-M329 (Trombetta et al., 35), which are believed to be prevalent exclusively in Western Africa and Eastern Africa, respectively.
Interestingly, this ancestral cluster includes populations such as the Fulani, who have previously been shown to display Eastern African ancestry and a common history with the Hausa, the westernmost Afro-Asiatic speakers in the Sahel, with a large effective size and complex genetic background (23). The Fulani, who currently speak a language classified as Niger-Kordofanian, may have lost their original tongue to associated sedentary groups, as have other cattle herders in Africa, a common tendency among pastoralists. Clearly, cultural trends exemplified by populations such as the Hausa or the Massalit, the latter of whom have no strong tradition in either agriculture or animal husbandry, were established subsequent to the initial differentiation of haplogroup E. For example, the early clusters within the network also include Nilo-Saharan speakers such as the Kunama of Eritrea and the Nilotic of Sudan, who are ardent nomadic pastoralists but speak languages of non-Afro-Asiatic background, Afro-Asiatic being the predominant linguistic family within the macrohaplogroup.
The subclades of the network, some of which are associated with the practice of pastoralism, most likely arose in the Sahara among an early population that spoke an ancestral language common to both Nilo-Saharan and Afro-Asiatic speakers. It is yet to be determined whether pastoralism was original to the culture of Nilo-Saharan speakers or a later cultural acquisition, an interesting question in light of the proposition that pastoralism may be quite ancient in human history (17). Pushing the dates of the event associated with the origin and spread of pastoralism to a proposed 12 000–22 000 YBP, as suggested by the network dating, would resolve the matter, because the language differences would not yet have appeared, and an original pastoralist ancestral group with a common culture and language (50) is a plausible scenario. Such dates would accommodate both the Semitic/pastoralism-associated expansion and the introduction of Bos taurus to Europe from North East Africa or the Middle East (55). The network places North African populations such as the Saharawi, Moroccan Berbers and Arabs in a separate cluster. Given the proposed origin of Maghreb ancestors (56, 57, 58, 59) in North Africa, our network dating suggests a divergence of North Western African populations from Eastern African populations as early as 32 000 YBP, which is close to the estimated dates for the origin of the E-P2 macrohaplogroup (30, 60). It can be further inferred that the high frequency of E-M81 in North Africa and its association with Berber-speaking populations (25, 30, 32, 60, 61) may have arisen after the splitting of that early group, leading to local differentiation and the flow of some markers as far as Southern Europe (30, 60, 62).
A branching in the network may once again represent an episode of human migration that carried haplogroup E-M35 and its subhaplogroups across the Red Sea to Yemen, Oman and Saudi Arabia and concurrently down to Southern Africa, as part of a more recent out-of-Africa migration episode traceable through the Y chromosome.
The PCA and MDS display a similar, interesting grouping of the Afar and Saho populations of Eritrea with Near Eastern Arabian populations, evoking the genetic relationship between the two sides of the Red Sea. The arrival of E-M35 and derived subclades, for example, E-M123/E-M34, in Arabia appears to be strongly linked to expansion into East Africa, North Africa, Europe and Southern Africa, an event that is likely related to pastoralism, hastened by its advent, and amenable to analysis and dating using approaches similar to those proposed for the co-migration of Y chromosomes and disease traits (63).
The archeological (10, 11, 12, 13) and agro-pastoral (9, 14, 16) evidence from this side of the Red Sea and the history of migration of animals across the Red Sea (64), however, call for more molecular dissection of the common haplogroups shared by these coastal populations. As suggested by others, this may give clues not only to the origin of E-M123, J-M267 and K-M70, but also to the origin of Semitic languages (65, 66). Indeed, the trail of such historical movements is detectable through the molecular signatures of markers such as the Y chromosome, which give insights into episodes of an even more regional nature. For example, the high frequency of E-V32 in Eritrea, in concordance with oral history, supports the historical ties between North East Africa (Egypt) and East Africa, including Eritrea, Sudan, Ethiopia and Somalia.