1. Introduction
Key questions in Archaeology and Palaeoanthropology concern the timing and character of human dispersals, with their implications for evolution, culture, and the peopling of the world. Today, these questions are typically answered by combining different threads of evidence, including archaeology, palaeoanthropology, and genetics. This has led to a much deeper understanding of the biological and cultural processes that shaped human history, as shown by the most recent literature on a variety of topics, among others, the early development of our species [1-3], the so-called ‘Out of Africa’ [4-6], and population movements and mixings in Eurasia [7-10]. In addition, it is now clear that these processes are strongly intertwined with many different aspects of human communities including social structure (e.g., [11,12]), languages (see [13,14]), material culture (e.g., [15,16]), and even food choices (e.g., [17]) and oral traditions (e.g., [18]). As Cavalli-Sforza highlighted [19], these lines of research also allow us to recognise the role of such processes on human physiology (e.g., migraine, [20], lactase persistence, [21], and multiple sclerosis [22]).
However, what has become the scientific norm for analytical methods used to address the human past is a fairly recent development introduced and led by Cavalli-Sforza [23]. Luigi Luca Cavalli-Sforza (1922-2018) was a pioneer and driving force in bridging population genetics and statistics with archaeology and other fields in the humanities, while at the same time making this knowledge accessible to a broader audience [23,24]. His fundamental work was synthesised in books such as ‘The Neolithic Transition and the Genetics of Populations in Europe’ (1984) [25], ‘The History and Geography of Human Genes’ (1994) [26], ‘The Great Human Diasporas’ (1996) [27], and others. In these, Cavalli-Sforza and his co-authors offered an exhaustive survey of contemporary human evolution research, translating complex population histories into engaging narratives. These works, like Cavalli-Sforza's career as a whole, demonstrated his expertise and his ability to work across several fields, including genetics, archaeology, linguistics, demography, and anthropology. He did so by embracing the latest advancements in computational and statistical methods and in demographic and mathematical population genetics, and by analysing the cutting-edge markers of the time, from phenotypic variants such as blood groups [28] and even surnames [29], to mitochondrial DNA [30], the Y-chromosome [31], restriction fragment length polymorphisms (RFLPs, [32,33]), microsatellites [34], and single nucleotide polymorphisms [35].
In this article, we consider the legacy of his contributions to what has now become the multidisciplinary field of human origins. We will highlight how Cavalli-Sforza succeeded in merging the diverse fields of archaeology, palaeoanthropology and genetics, with fundamental work that expanded our understanding of the past. We will then see how he opened new directions in the reconstruction of the history of our species. With his colleagues, he used genetic data, and statistical and visual methods (e.g., phylogenetic trees, principal component analyses and maps of such components, as well as of phenotype and allele frequencies) to identify the origins of Homo sapiens and retrace its migrations out of Africa. Such work strongly influenced theoretical conceptions of human evolution and cultural transitions. We will then discuss how his interest in cultural evolution also helped to shape both methods and theories concerned with quantifying cumulative culture for the earliest periods of the prehistory of our species. Then, we will explore the limitations and potential misinterpretations linked to such approaches, an aspect about which Cavalli-Sforza was always very clear (e.g., [36]). Finally, we will give our own suggestions on how to build on Cavalli-Sforza’s path towards interdisciplinarity, using the latest available data and technologies, and in particular the integration of paleoclimatic data, to further deepen our understanding of our species’ evolutionary history.
2. Merging Genetics and Archaeology
2.1 The archaeological debate
The work of Cavalli-Sforza and Ammerman represents one of the first cross-disciplinary studies in genetics and archaeology [37-39]. Cavalli-Sforza’s intuition on the possibility of linking genetic variation to the history of migrations was to open a new frontier in better understanding evolutionary processes in humans and other species. This approach would, for the first time, allow the combination and, ultimately, the reconciliation of different threads of evidence (i.e., languages, climate, material culture), and it sparked considerable criticism. Archaeology in the pre-war era had emphasised diffusionist ideas, which came to be linked to race by some. Gustaf Kossinna’s Siedlungsarchaeologie in particular [40] linked material culture (i.e., corded-ware pottery) with Proto-Indo-Europeans and their migrations, ideas that were heavily incorporated into Nazi ideologies. As a result, archaeology in the post-war period generally rejected migration as a major factor in prehistoric cultural change, and even today, the need to warn against misuses of this concept remains relevant (e.g., Supp. in [41,42]). By the 1970s and 1980s, so-called ‘Processualist’ philosophy focused research on the reconstruction of cultural systems, environmental adaptation, and the recognition of the cultural biases of researchers [43,44]. At the same time, ‘post-processual’ archaeology was also beginning to take root, with its greater emphasis on interpretation, epistemology, and subjectivity, rather than the, at times naive, empiricism of the processualists [45,46]. As a result, the intellectual landscape within archaeology was initially not aligned with the ideas being explored by Cavalli-Sforza.
At the same time, reconciling the different lines of evidence, as Cavalli-Sforza did, was (and still is) challenging, not only in its execution, but also in its interpretation [47]. As an older discipline, archaeology has experienced the consequences and dangers of overly simplistic inferences, both through the misuse of archaeology to promote particular ideologies, as well as through major scientific breakthroughs. For example, prior to the advent of radiocarbon dating, chronologies were relative and based on the stratigraphies of individual sites or through seriation [48-50], a relative dating method in which morphological changes of objects, e.g., pottery, were linked to chronology. The radiocarbon ‘revolution’ broke down many established theories in prehistory, with subsequent revisions undergoing a second ‘revolution’ when calibration methods were introduced [51].
2.2 The genetic background
Cavalli-Sforza’s scientific career lasted from the dawn of molecular genetics through to modern genomics, and he contributed to the development and application of the field to the study of human evolution throughout. Ultimately, all genetic variation comes from errors, or mutations, that occur during DNA replication which create differences in DNA sequences among organisms (reviewed in [52]). These mutations contain information about the history of the populations in which they are found in several ways. First, a mutation that is shared among organisms gives evidence that their most recent common ancestor at that locus was, at oldest, the individual in which that mutation occurred (assuming the mutation occurred just once). This also means that the most recent common ancestor of organisms that do not share the mutation must be older than the individual in which that mutation occurred. Thus, shared-derived genetic variation provides evidence of relatedness. Second, the frequency of mutations in and among populations gives evidence of their demographic history. This is because random mating in constant-sized populations has clear theoretical expectations for the pattern of frequencies of neutrally evolving genetic variants (alleles), and deviations from these expectations reflect population processes such as population substructure, migration, and increases or decreases in population size among others. Natural selection for a beneficial gene also causes changes in expected allele frequencies. These processes can all be inferred by looking at the patterns of allele frequency variation (reviewed in [52]). Cavalli-Sforza and colleagues were instrumental in both applying and developing the methods that use genetic variation to gain insight into the history of our species, and this has led to a fuller understanding of the fossil and archaeological record.
The technology available to characterise genetic variation has developed to the point where we have now fully sequenced the genomes of more than 150,000 people in the UK alone [53], making comparisons of millions of genetic variants from global populations now practical (e.g., [35,54,55]). However, early studies of genetic variation could not be performed on DNA sequences directly. Instead, differences in the DNA had to be inferred by observing variations in their corresponding protein products (the so-called ‘classical markers’). Molecular anthropology took off as a discipline when Sarich and Wilson compared protein differences between humans, apes, and cercopithecoid monkeys and found that the African apes are more closely related to humans than they are to orangutans, and that the last common ancestor of African apes and humans lived only 5 million years ago [56] (subsequently revised to ~7 million years ago [57]). These findings were surprising and disputed by leading paleoanthropologists at the time (e.g., [58]), but have since proven robust and demonstrated the power of genetics to add to our understanding of human evolution.
The study of genetics was transformed by the development of the ‘Sanger’ dideoxy terminator DNA sequencing method in the late 1970s [59] followed by the polymerase chain reaction (PCR) method for targeting and amplifying specific stretches of DNA in the mid 1980s [60]. It suddenly became practical to study DNA sequence variation directly, which Cavalli-Sforza himself described as a revolution “generating great (potentially gigantic) progress” [19]. Studying DNA sequences allowed researchers to reconstruct diagrams of genetic relatedness in the form of bifurcating trees. The root of the tree represents the last common ancestor of the full sample, while each subsequent branching point represents a more recent ancestor common to a subset of the sample. Relationships among the samples can be understood by tracing the branching path between them. Also, because genetic mutations are expected to accumulate regularly through time, the number of DNA sequence differences separating individuals from their last common ancestor gives insight into how long ago that ancestor lived [61,62]. The earliest studies of DNA sequence variation in humans focused on the mitochondrial genome (mtDNA), a small, maternally-transmitted circular chromosome found within the cell's energy-generating organelles, the mitochondria (e.g., [63,64]). Its high copy number facilitated the laboratory work. Moreover, the lack of recombination, strict maternal inheritance, high mutation rate, and smaller effective population size (because it is haploid and only passed on by females) made the locus particularly informative for recent evolution [65]. Attention then shifted to the paternally inherited counterpart to mtDNA: the Y chromosome [31].
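The clock-like reasoning described above, in which sequence differences are converted into time, can be illustrated with a minimal sketch. It assumes a strict molecular clock with a constant substitution rate; the function name and the numerical values are hypothetical and chosen only for illustration, not taken from any of the cited studies.

```python
def time_to_common_ancestor(pairwise_diffs_per_site, subs_per_site_per_year):
    """Years back to the common ancestor of two sequences (strict clock).

    After two lineages split, each accumulates substitutions independently,
    so the pairwise divergence d grows at twice the per-lineage rate:
    d = 2 * mu * t, hence t = d / (2 * mu).
    """
    if subs_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    if pairwise_diffs_per_site < 0:
        raise ValueError("divergence cannot be negative")
    return pairwise_diffs_per_site / (2.0 * subs_per_site_per_year)


# Hypothetical values for illustration only (not literature estimates):
# 0.2% pairwise divergence and 1e-8 substitutions per site per year.
divergence = 0.002
rate = 1e-8
print(round(time_to_common_ancestor(divergence, rate)))
```

Real analyses relax these assumptions, for example by allowing rate variation among lineages and by correcting for multiple substitutions at the same site, so the sketch should be read as the simplest possible instance of the idea.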
Research on mtDNA and Y-chromosome diversity continued to help understand the past history of migrations around the world, including the peopling of Australia [66,67], the Pacific [68,69], the Americas [70,71], and Madagascar [72] among others. However, there was also a growing theoretical realisation that histories inferred from single genetic lineages such as mtDNA or Y-chromosomes were unreliable or potentially misleading because of the inherent stochasticity of the coalescent process. Computer simulations showed that even under highly divergent demographic histories, such as those suggested by Recent African Origins versus multiregional evolution, similar gene trees are expected to occur for any given genetic locus a high proportion of the time. Instead, the patterns from many different genetic loci must be used to reliably distinguish between various possible demographic histories [73].
As a result, attention shifted to ‘genomic’ approaches to understanding our evolutionary history. Genomic approaches look at patterns of variation across the entirety of the genome, rather than focusing on a specific locus such as mtDNA or Y-chromosomes. The first widely applied approach to do so focused on microsatellite DNA (i.e., short repetitive sequences that vary in the number of repeats). These had to be individually amplified with PCR and genotyped using polyacrylamide gel electrophoresis and allowed researchers to look at hundreds of genetic lineages from across the genome [74]. Gene chip, or microarray, technology was then adapted to allow genotyping of known variable sites, and this allowed characterising tens of thousands to millions of single nucleotide polymorphisms (SNPs) across the genome in a single experiment (e.g., [35]). Finally, next generation DNA sequencing technology was introduced that allowed multiple human genomes from modern or ancient DNA sources to be characterised and analysed [75].
These genomic approaches have revolutionised our understanding of human evolution. They increase the confidence we have in evolutionary inferences from genetic data, and make it possible to ask more detailed questions about our evolutionary past. Cavalli-Sforza has contributed to this understanding throughout the development of the field, from the early work on proteins (e.g., [26,28]) to modern genomics (e.g., [35,76]). In the next section we dive into these contributions in more detail.
3. New directions in palaeoanthropology opened by the work of Cavalli-Sforza
3.1 The origins of Homo sapiens
The study of modern human origins largely began in palaeoanthropology where researchers were trying to understand the relationships of the various Pleistocene hominin fossils from around the world to each other and to living people (e.g., [77,78]). These debates were important because the interpretations of the fossil record were used to support various ideas about the antiquity and extent of ‘racial differences’ among living people. Polarization continued through the 1980s and 90s, with the crystallization of the Multiregional versus Out of Africa hypotheses [79]. Cavalli-Sforza’s work began to filter into these debates by providing genetic measures of the extent of relatedness and differentiation among living people [80]. His early research on the topic (summarised in [81]) computed trees using up to 58 alleles in a dozen blood group systems, covering up to 15 populations. The fundamental split in the trees was consistently an east-west one, separating Europeans plus Africans from Asians/Native Americans plus Australo-Melanesians. Africans and Australians were separated by the greatest total difference, and as a whole, the patterns roughly corresponded to geographical distances [28]. Cavalli-Sforza also attempted to calibrate divergences between the studied populations, placing the African/Asian split as the first separation, on the basis of an estimate of the kinship coefficient (ƒ), which measures genetic similarity within and between populations [80,82]. Calibrating by an assumed date of 15,000 years for the migration of Native Americans to the Americas, the split of Africans and Asians was calculated at 35-40,000 years [81]. This calculation assumed that the rate of change was controlled by genetic drift, i.e., random changes in allele frequencies between generations caused by stochasticity in finite populations, the magnitude of which is inversely related to population size.
These studies did not place the original pre-split human population on one continent or another, but he said that “the oldest-known fossil remains that are classified as Homo sapiens sapiens are between 40,000 and 60,000 years […] and indicate that the differentiation among [modern populations] began not long after the appearance of modern man” [81].
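The dependence of genetic drift on population size, which underlies the drift-based calibration described above, can be illustrated with a short simulation. This is a minimal sketch assuming an idealised Wright-Fisher population (random mating, non-overlapping generations, no selection or migration); the function name, replicate count, and population sizes are hypothetical choices made for illustration.

```python
import random


def drift_variance(pop_size, start_freq=0.5, replicates=2000, seed=1):
    """Variance in allele frequency after one Wright-Fisher generation.

    Each of the 2N gene copies in the next generation is drawn at random
    from the parental pool, so the new allele count is binomial.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size  # diploid population: 2N gene copies
    freqs = []
    for _ in range(replicates):
        count = sum(1 for _ in range(copies) if rng.random() < start_freq)
        freqs.append(count / copies)
    mean = sum(freqs) / replicates
    return sum((f - mean) ** 2 for f in freqs) / replicates


for n in (10, 100):
    expected = 0.5 * 0.5 / (2 * n)  # theoretical value p(1 - p) / (2N)
    print(n, drift_variance(n), expected)
```

In this idealised setting the simulated variance should track p(1 - p)/(2N), so a tenfold smaller population drifts roughly ten times faster per generation. This is why assumptions about population size matter for any divergence date obtained from drift alone.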
By 1986, however, Cavalli-Sforza and colleagues had moved to the more direct molecular approach of using newly available DNA markers (restriction fragment length polymorphisms, from lymphoblastoid cell lines) [83]. A total of 120 alleles were studied in 38 populations, which were often more geographically restricted than the larger regional groupings of earlier work. Now a different pattern emerged: in place of the primary Eurafrican - Greater Asia division, there was a sub-Saharan African - non-African split. However, Cavalli-Sforza was still cautious about publicly concluding that there was an African origin for H. sapiens, saying that more data were needed. In person (as one of the authors, C.S., found at a conference in Italy in 1986), however, he thought it was increasingly supported. By 1988 (e.g., [84]) he had fully embraced a Recent African Origin for H. sapiens, and was mapping new connections and correlations with archaeological and linguistic data in ambitious reconstructions of recent human evolution.
Soon after supporting the theory of an African root for modern humans based on mitochondrial DNA (mtDNA) trees, with some inconsistencies with archaeological evidence [85], Cann and coworkers (1987) proposed what became known as the ‘African Eve’ theory [62]. Their groundbreaking study of worldwide variation in restriction enzyme sites in mtDNA showed that (1) the most recent common maternal ancestor of all living people likely lived in Africa, (2) that all non-African populations are closely related to each other, and restricted to a single branch of the mtDNA phylogeny, and (3) that the most recent common maternal ancestor lived only about 200 kya [62]. These findings were later confirmed and refined with better sampling, and complete mtDNA genome sequences [86]. Cavalli-Sforza and colleagues found a very similar pattern to mtDNA evolution in a worldwide sample of Y chromosome variation, with the most recent common paternal ancestor of all humans living relatively recently in Africa, and all non-Africans restricted to a single branch of the Y-chromosome tree [31]. Together these findings were widely interpreted to support the Recent African Origins model of human evolution.
This work had a seismic effect across both palaeoanthropology and archaeology. In palaeoanthropology, it catalysed a fierce debate between those who favoured a recent and localised origin for H. sapiens, and those who supported a much older, more gradual and more widespread process of evolution. In archaeology, this research forced a confrontation between archaeological and genetic data. First, the body of research pointing towards an African origin for our species led to the expectation that the material culture associated with that origin should also be concentrated in Africa, rather than distributed more globally, as other models would predict [79,87,88]. Prior to this time, research focus had primarily been on Europe, where an array of Upper Palaeolithic artefacts, including statuettes, and cave art, had been interpreted as the early manifestations of modern humans. The new insights generated from mtDNA precipitated an African archaeological ‘gold rush’ looking at the hitherto understudied Middle Stone Age (MSA), which was now chronologically and geographically linked to ‘African Eve’ [88]. At the same time, there appeared to be a considerable mismatch between the archaeology and the genetic inferences presented, validating Cavalli-Sforza’s initial caution. First, the concept of an ‘African Eve’ was misinterpreted (particularly by its ‘multiregionalist’ opponents, and in the media [89,90]) to suggest that the originator of the oldest mtDNA haplogroup was the first woman to have lived, rather than one of many women whose mtDNA lineages, instead, became lost due to drift, i.e., leaving fewer descendants by chance. The fact that uniparental (i.e., inherited from only one parent, as mtDNA or Y-chromosome) ancestry represented only a fraction of human ancestry was not at first broadly recognised. More problematically, these early studies suggested to many that modern humans emerged at the same time as this mtDNA lineage. 
As a result, there was a wide-ranging tacit understanding that humans had to have a single centre of endemism, dubbed the ‘Simple Out of Africa’ model [89-91], typically identified as somewhere in eastern Africa or southern Africa (see [88] for debates). As a result of this, research began to be concentrated in regions identified as ‘candidate centres of endemism’, at the expense of other regions [3]. This led to further bias in the archaeological record. Finally, mtDNA haplogroups were seen as real populations rather than groups of single-locus alleles featuring common ancestry, amplifying misunderstandings on the definition of ‘population’ [92,93]. It would take another twenty years of research to move all disciplines concerned towards a consensus closer to what Cavalli-Sforza had argued for. This was a process that has required intensive dialogue and scientific communication, which transformed the field of human origins into a much more integrated and truly interdisciplinary endeavour (see, e.g., [1,3,94]).
3.2 Reconstructing migrations
In Cavalli-Sforza and coworkers’ vision, a way to explore population migrations and expansions (notably addressing questions regarding the Neolithisation of Europe, ‘the Neolithic Transition’, and Bronze Age migrations) was through the use of computer models to reconstruct population histories. In a seminal paper [95], Cavalli-Sforza and co-workers used computer simulations to reproduce the genetic clines observed in Europe on the basis of classical markers [37]. This work set the scene for a whole series of research projects into spatial genetics using simulations which continues to this day (e.g., [96,97]), and not only on humans (e.g., [98,99]). This new field of research not only focused on the Neolithic and Bronze ages (e.g., [100-105], and many more), but was also applied to the earlier history of our species, (e.g., [106]), namely the ‘Out of Africa’ expansions (e.g., [94,107,108]) and the interaction between Homo sapiens and other hominin groups in Eurasia, such as the Neanderthals (e.g., [109-112]). These models catalysed debate within the archaeological community [113], not least because Cavalli-Sforza worked on the problem from the ‘outside’ with Ammerman, a prehistoric archaeologist [25,38,39]. This partnership perhaps precipitated greater consideration of research history and context than was evident within the field of human origins (see, e.g., the edited volume by Ammerman and Biagi [114]).
Reconciling the fossil and archaeological evidence of relatively frequent dispersals out of Africa with the genetic studies indicating that non-Africans today descend from just one major dispersal remains a major pursuit [115-117].
3.3 Cultural evolution
One of the many aspects of the human past that Cavalli-Sforza explored throughout his career was cultural evolution, as for example in [118-120]. In these works, he highlighted the similarities between biological and cultural evolution, suggesting that similar approaches could be used to reconstruct the two. Among other concepts, he identified the cultural trait as the basic unit of cultural evolution, analogous to the gene in genetics. As a side note about the broader societal impact of Cavalli-Sforza, it can be highlighted that this idea was then revisited by Richard Dawkins (1976) under the term ‘meme’ [121], indirectly making Cavalli-Sforza the original proposer of the concept behind the digital phenomenon of ‘internet memes’ peaking in the 2010s [122].
When it comes to the deep human past, stone tools (or lithics) make up the majority of material culture. They are the most ubiquitous evidence for understanding and studying culture change, making them the primary resource (or ‘cultural trait’) for exploring cultural evolution in early humans. While analysing lithics is subject to its own set of problems, including problematic taxonomies [123] and problems of replicability [124], the record itself challenges notions of how culture change should operate. This is due to the muted character of cumulative culture (i.e., the accumulation of modifications over time) within the Middle Stone Age (MSA), the first and longest-lasting material culture phase associated with H. sapiens [88]. While the emergence of the MSA itself coincides with the earliest manifestations of physical modernity (e.g., [125]), and represents a profound re-organisation of material culture, there are significant periods of stasis within it. Indeed, the earliest and final examples of MSA assemblages show very little difference between them, despite being some 300 thousand years apart in time (e.g., [126,127]). A range of innovations in that time are evident, attesting to the presence of modern cognition; yet they appear to become lost, and have to be reinvented [88]. Cumulative culture, therefore, appears to be lacking for demographic reasons, rather than a lack of capability.
The character of the record has been argued to be at least partially explained by population structure, which carries certain expectations for the material culture record [3]. In particular, studies show that larger populations feature greater cultural complexity, as well as greater genetic diversity [128], and that long-term population size differences among Palaeolithic hunter-gatherers are likely to have played an important role in later cultural processes [129,130]. If populations are structured, the overall metapopulation features greater genetic diversity, since each local population is a reservoir of local genetic diversity. However, as an inverse function of connectivity, the individual local populations, while different to each other, will have lower local genetic diversity. The effects on cultural complexity are not the same. Small, local populations have a lower capacity for innovation and high fidelity copying owing to the small population size (see, e.g., [91,128]). In contrast, larger and/or denser populations are more likely both to innovate, and copy innovations in high fidelity [128,131]. As a result, cultural complexity is lower at the local level, when structure is high, while in the same circumstances, overall genetic diversity of the metapopulation is high. These effects may well explain the pattern seen in the archaeological record of the MSA, where local population extinction and/or fragmentation probably repeatedly resulted in the loss of new innovations beyond knowledge of the basic MSA toolkit which all populations inherited. Specifically, the basic traits that define the MSA, including core and flake technology, retouched points, and side and end retouched pieces, appear to continue from their inception until about 20 thousand years ago, and even later in parts of West Africa [132].
More innovative elements, on the other hand, appear and disappear at different times and places in Africa, often with ‘regionally distinctive’ elements like tanged or pedunculated tools in northern Africa, and possibly even bow and arrow technology [133]. The stability of basic MSA elements indicates that these were present among the earliest H. sapiens populations, but that later demes were unable to hold on to the landscape and population-level knowledge required for innovations to persist. Many appear to have been simply extirpated. Given the similar effects of structure on local cultural complexity, but the opposing effects on metapopulation-level genetic and cultural complexity, it may also be possible that structured models are the only way to explain the patterns seen in both the archaeological record and the genetic data [123].
This is just one illustration of how cultural evolution can be integrated with biology, highlighting their complex interplay; more examples exist [16] and can be considered part of Cavalli-Sforza's scientific legacy [134,135]. Cavalli-Sforza and Feldman's work [135] highlighted the need for a more nuanced and interdisciplinary approach in investigating cultural and biological inheritance in the context of human evolution: distancing themselves from genetic determinism (e.g., updating simplistic theories regarding inheritance of traits such as intelligence, see [134] for a complete overview), they recognised culture as a powerful force that interacts with biology to shape evolutionary processes.
Cavalli-Sforza was right once more when he wrote: “I find the similarities, dissimilarities, and interactions of cultural and biological evolution an almost virgin field [...] and one that has great potential not only for the intellectual challenge that it offers, but also for a better understanding of human nature.” [136].
4. The Risk of Misinterpretation
The integration of such different disciplines as archaeology and genetics, while offering enormous potential, as discussed in this work, also carries the risk of oversimplification and misinterpretation. These issues often arise when shared terminology is used across fields but carries (slightly or substantially) different meanings, as we will observe with the discrepancies in population/group labelling based on archaeological and genetic evidence.
Furthermore, each discipline operates within distinct frameworks and methodological assumptions, which, while well-understood by experts within that field, may remain opaque to scholars from other fields. Particularly problematic are instances where interpretations rely too heavily on a single source of evidence (e.g., genetic or archaeological data alone), leading to reductive conclusions about inherently complex processes. Such oversights highlight the necessity of a truly integrative approach that critically addresses multiple lines of evidence to explore the multifaceted nature of human evolution. Cavalli-Sforza and colleagues were pioneers in such multidisciplinarity, proposing a model of demic diffusion based on genetic patterns and archaeological evidence (e.g., [28,37], see [137]). His efforts to bridge genetics and archaeology have inspired innovative interdisciplinary methods, but have also exposed the limits of some common assumptions.
In the next few paragraphs, we will discuss some of the assumptions behind population genetic analyses and how these have clashed with archaeological research. In doing so, we aim to raise awareness of potential misunderstandings and thereby enhance the efficacy of interdisciplinary work.
4.1 Defining and labelling human groups
One important cause of misunderstanding between the two disciplines is the different ways they define human groups (from here on: populations). In population genetics, the term ‘population’ refers to a group of organisms within which mating occurs randomly. The mathematics of population genetics are largely built upon this simplifying assumption (e.g., the famous Hardy-Weinberg equation that describes expected genotype frequencies from allele frequencies). In a practical sense, this means genetic populations are typically groups of individuals from the same species living in the same place under the same environmental conditions, who can mate with each other without specific barriers. The genetic population is a biological unit, whose members are related, and it maintains continuity over time thanks to the reproductive interconnections between generations. This is clearly an oversimplification [138,139]. When Franz Manni [36] asked Cavalli-Sforza how he defined a population (in the context of genetic sampling of contemporary human groups), he replied that “to identify a population, what is required is a clearly identifiable social group where mating is mostly within the population. Old-time genetic theories often assume that mating is random, but this is practically an incongruous assumption and some stratifications - like socioeconomic ones - are key elements that, unfortunately, are difficult to handle”. In brief, he said that “Language is the simplest major common element that defines a social group” [36]. Cavalli-Sforza thus recognised that human populations in the genetic sense will often be living at the same time, in the same place, and in the same environment but kept separate by social rules or linguistic differences that present barriers to random mating.
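The random-mating assumption mentioned above can be made concrete with the Hardy-Weinberg expectation for a single locus with two alleles. The following is a minimal sketch; the function names are hypothetical, and the heterozygote-deficit measure is a standard textbook summary that we add for illustration rather than a method drawn from the cited works.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) under random mating,
    given the frequency p of allele A at a two-allele locus."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def heterozygote_deficit(p, observed_het):
    """1 - H_obs / H_exp: positive values mean fewer heterozygotes than
    random mating predicts, as expected when a sample pools groups that
    rarely intermarry."""
    expected_het = 2 * p * (1.0 - p)
    if expected_het == 0:
        raise ValueError("locus is fixed; no heterozygotes are expected")
    return 1.0 - observed_het / expected_het


print(hardy_weinberg(0.5))
print(heterozygote_deficit(0.5, 0.4))
```

When the observed heterozygosity falls short of the expectation, one possible explanation is precisely the kind of social or linguistic barrier to mating that Cavalli-Sforza described, although other processes can also produce such a deficit.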
In an archaeological framework, the definition of populations is more clearly associated with a cultural dimension, and historically linked to material culture. There is no direct assumption that individuals who shared material culture traits were part of the same biological population: culture can be shared without relatedness (because of convergence or exchange) and relatedness does not imply cultural similarity (divergence) (e.g., [140,141]). Nonetheless, considering both material and genetic evidence, while maintaining a clear distinction between them and avoiding the conflation of archaeological and genetic classifications, is fundamental and encourages further reflection and investigation [142,143]. The same caution applies to language groups, which do not necessarily correspond to biological populations, material assemblages or even social units [138].
These contrasting views have important implications for the scientific interpretation and social meaning of many studies focusing on the human past. Cavalli-Sforza himself drew attention to the fact that the information derived from phylogenetic tree branches is not equivalent to the public’s perspective on ancestry and relatedness: there are multifaceted interpretations of the population-genetic tree (e.g., geographic distance between populations against variability within a population) and its imagery may create racial narratives about human groups and biological relationship, quite contrary to his conclusions about the shared genetic heritage of all humans [27,144,145]. Despite the immense help in reaching a unified perspective on human origins that these approaches offered (i.e., [62,85]), the apprehensions about upholding colonialist and racist ideologies by using tree models of human genetic kinship remain [145,146]. Today, tree models work well for describing relationships among distantly related species. For closely related hominins, who are likely to have admixed frequently, the model has become more problematic (e.g., the possibility of Neanderthal and Denisovan introgression [3,94,147]).
Population genetic methods and analyses commonly employ the simplifying assumption of random mating within a population or its subgroups (demes). This assumption is reasonable for most species; however, it is confounded in humans, where mate choice is influenced by geographic proximity, cultural and ethnic affinity, shared language, and many other factors that make randomly mating subgroups difficult or impossible to define. Studies investigating genetic ancestry have variously categorised populations with discrete labels, such as geopolitical or continental clusterings, ethnicity, traditional racial categories or ‘units’ of culture that attempt to capture the genetic meaning of a population [148]. These various labels reflect the difficulty of defining human groups from a population genetics perspective, and, in most cases, any chosen definition will be a compromise made by researchers. No single discrete grouping will fully capture the multilayered cultural and social complexity inherent in all human societies (see [149] for a detailed analysis of genetic ancestry). Any such grouping risks giving the impression of the existence of genetically ‘pure’ groups akin to outdated ideas about race [150], when in reality human genetic variation is characterised by relatively low diversity, high continuity, and repeated episodes of migration and gene flow [27,151-154].
These insights must be more effectively communicated to the wider public to avoid any misunderstanding in interpreting ancestry. Widespread access to direct-to-consumer (DTC) genetic ancestry testing reframes how we talk about ethnicity, demography, population groups and the genetic profiles under which we label them. Such testing can be seen as providing an apparently ‘objective’, scientific basis for dangerous assertions (tracing individuals’ ‘roots’, ethnic labelling or even justifications for territorial claims) [155], in contrast to the potential of DNA itself to dismantle such biased, and often nationalistic, narratives [156-158]. In the context of possible socio-political implications, Cavalli-Sforza highlighted that “It is difficult to believe that knowledge of genes may help to explain [...] conflict[s]. Although population genetics can address issues of relatedness of populations, mating patterns, migrations and so on, obviously it cannot provide evidence about reasons for conflicts between people” [159].
One potential solution to these issues is to contextualise the results in a multidisciplinary framework, enabling a more comprehensive approximation of a complex and often socially constructed reality [160,161]. Additionally, we bear the responsibility of effectively communicating these complex perspectives to the public, a vision close to the life-long efforts of Cavalli-Sforza.
Cavalli-Sforza popularised genetic concepts through ‘mainstream science’ (e.g., [25-27]), and his position in the dialogue on race, opposing genetic determinism and scientific racism, is well known [162]. This position was not necessarily accompanied by a similar awareness of colonialist approaches to research (as recently highlighted in [162]), which emerges more clearly with the birth of the Human Genome Diversity Project (HGDP) [163]. Although the scientific environment of the time was already becoming increasingly aware of de-colonisation (e.g., property rights of indigenous peoples, the 1990 Native American Graves Protection and Repatriation Act), the HGDP was still operating within a colonialist framework. The HGDP has made fundamental contributions to advancing the understanding of human genetic diversity (e.g., [35,74,164], see recent review [165]), laying the foundation for other large-scale genetic research projects of critical importance (e.g., 1000 Genomes Project [54,55]). However, it also raised some concerns. Among them was the idea of ‘indigeneity’ of the ideal participant, which unrealistically excludes deep histories of migration to favour a notion dangerously close to the idea of ‘purity’ as opposed to population admixture. Sample classification also raised discussion between Cavalli-Sforza, who proposed ethnic groups as the basis of the sample collection approach, and Allan Wilson, who proposed a geographical grid-based approach [165]. In addition, the practical implementation of the project sparked discussions regarding possibly problematic sampling strategies, such as bioethical concerns regarding ‘helicopter science’ (in which researchers ‘parachute’ into lower-income countries to obtain samples, with little or no engagement with local scientists or communities [166]). These concerns highlighted issues about the protection and rights of indigenous people [162].
The discussion is not yet resolved, as we see a broader awareness of ethics and consent in the use of genetic data (e.g., [167,168]), including from ancient sources (e.g., [169]). The issues discussed here are highly pertinent to the contemporary debate on the use of the ancestry concept in different contexts, such as health and medical research [170], and their relevance extends even to political debate on social media, where biased interpretations of historical examples are often used to promote more extreme points of view [171].
4.2 Genetic demography
Similarly to the misinterpretations we discussed regarding the concepts of population and ancestry, demographic estimates in genetics do not correspond to the actual number of individuals in a population [172]. Genetic demography is based on the concept of ‘effective population size’ (Ne), which reflects the number of breeding individuals [173]. It is a fundamental population genetic parameter because it directly relates to the amount of genetic diversity found in a population. Estimates of Ne do relate to the census population size; however, the relationship is not straightforward and demographic histories that include bottlenecks or population structure confound the relationship [174-176]. This is particularly problematic in humans where structure may result from cultural and social factors [177], imbalanced sex ratio and mating/marriage patterns [178], or migration [179-181]. For example, estimates of Ne in African peoples have been calculated to be only 7,500 people [182], far below the census size of more than 1.5 billion. This is because the population has expanded very rapidly in the historical past and the genetic diversity has not had time to increase accordingly [3]. Thus, estimates of effective population size, although informative in highlighting patterns not accessible from other disciplines, may be misleading if assumed to represent census size [139,183].
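Two textbook approximations illustrate why Ne can fall far below census size. The sketch below is a simplification under idealised assumptions (discrete generations, no selection), and the function names are hypothetical. All numbers in the example are invented for illustration; they are not the estimates cited above.

```python
def ne_unequal_sex_ratio(n_males, n_females):
    """Effective size with unequal numbers of breeding males and females.

    Standard approximation: Ne = 4 * Nm * Nf / (Nm + Nf).
    """
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must contribute breeders")
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_fluctuating_size(sizes_per_generation):
    """Long-term effective size when population size varies over generations.

    Standard approximation: the harmonic mean of the per-generation sizes.
    """
    if not sizes_per_generation or min(sizes_per_generation) <= 0:
        raise ValueError("all generation sizes must be positive")
    generations = len(sizes_per_generation)
    return generations / sum(1.0 / n for n in sizes_per_generation)


if __name__ == "__main__":
    # Hypothetical case: 100 breeders, but only 10 of them male.
    print(ne_unequal_sex_ratio(10, 90))
    # Hypothetical case: one bottleneck generation of 10 among generations of 1000.
    print(ne_fluctuating_size([1000, 1000, 10, 1000]))
```

In this toy example, 100 breeders with a 10:90 sex ratio give an Ne of 36, and a single generation of 10 individuals among generations of 1,000 pulls the long-term value below 40. The harmonic mean is dominated by the smallest sizes, which is why a small ancestral population followed by rapid growth leaves Ne far below the present census size.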
Uncritically applying these concepts to archaeology, without considering the assumptions behind the analyses, can seriously mislead interpretation, as discussed below. This has pushed towards the implementation of more complex models that, for example, are able to include elements such as gene flow and branch merging, and that acknowledge the interactions between populations’ shared ancestry, their geographic distribution and spatial dynamics over time [111,184,185]. Genetic concepts such as isolation-by-distance (IBD) have also made their way into archaeology, highlighting how genetic as well as archaeological and fossil differences between sites are directly linked to the geographic distances between them [3,186].
This push towards an integration of more complex scenarios also allows better alignment of genetic evidence with paleoanthropological findings [3,123,187]. The inclusion of ancient DNA also revolutionised these approaches [1,35,188-194]. However, one must acknowledge that this potential is matched by a number of additional biases [183], chief among them the difficulties of obtaining samples from hot, humid, tropical regions of the world, and from the deeper time periods of human evolution. When direct data from these remote times are limited, yet again, models and simulations can come to our aid [2,108].
A fundamental aspect of archaeological models is that they necessarily represent a reduction of complex scenarios that must be simplified to apply genetic methods. This must also be reflected in their interpretation. As we have seen in the previous examples, the complexity of models is increasing, adding new (or relatively young) lines of evidence such as ancient DNA and (palaeo)climate. Along with the growing complexity, the computational possibilities have also improved, so alternative scenarios can be easily tested and must be included in the analysis [2].
Overall, we explored the ideas of genetic demography and population, and showed that (population) genetics can not only support the understanding of evolution, but also offer a new dimension to cultural and anthropological studies. This was already initiated and described in Cavalli-Sforza’s 1973 overview (‘Some Current Problems of Human Population Genetics’) [136]. It should, however, be borne in mind that these approaches may also be prone to dangerous simplifications and misinterpretations.
4.3 Modelling human origins: bridging disciplines
The problems associated with model inference were presciently recognised by Cavalli-Sforza, who emphasised the need to truly combine the extant evidence and data in order to achieve a more realistic framework for understanding the origins of modern humans. Arguably, we are still catching up with this vision. At the time of writing, the debate is no longer divided by the use of different lines of evidence and, therefore, different, segregated fields. Instead, it focuses on how best to integrate and model complex scenarios, and on how to reach the right balance between information and the simplification of reality that these models represent (e.g., [1,3,188,195], but a reader less familiar with modelling may find these core concepts surprisingly - and engagingly - well detailed in [196]). Promoting the integration of insights from various disciplines, i.e., genetics, archaeology, and anthropology, is therefore vital in providing robust (and, hopefully, more accurate and nuanced) models [195].
Cavalli-Sforza was a pioneer in such multidisciplinarity: this gave context to population genetic analyses (at the time, based on limited markers, not always in the form of DNA). His work encouraged consideration of genetic variation between and within population groups, supported by modern data (and databases, e.g., HGDP [153,163,165]), eventually supporting the so-called ‘Out Of Africa’ expansion for the ancestors of non-Africans today. Now, the development of analytical techniques and the amount of available data allow the integration of more complex models, where patterns in genetic data can help infer population structure, and other signatures of demographic processes (e.g., [2,3,197-199]). However, the interpretation and simplification of these models of population histories is still a challenging task, requiring a balance between assumptions and empirical data obtained from different disciplines (e.g., archaeology, genetics, palaeontology, and linguistics) and, ultimately, from ancient material.
In particular, there is an overarching need to better communicate the consensus on the elements that can easily be misunderstood in interdisciplinary analyses, such as how populations are being defined, and the limitations and assumptions of particular models. If, for instance, archaeology was initially hesitant to integrate the ideas proposed by Cavalli-Sforza, the subsequent decades have witnessed a predominance of inferences in archaeology driven by genetic models (see, e.g., [200,201]). This has been the case to the extent that archaeological data has been filtered to match the scenarios proposed by genetic models, even though both fields have comparable gaps in information. For example, as described above, ‘Mitochondrial Eve’ was initially taken to represent the ‘first woman’, requiring archaeologists to explain why the African Palaeolithic record for this period apparently showed nothing striking or heralding a modern mind at the time when she was estimated to have lived in Africa. These doomed attempts at reconciling misinterpreted genetic data at least precipitated conversations regarding the extent to which cultural and biological data should track each other [142,143]. Archaeological data represents the cultural dimension of the non-perishable material culture record and it is, by its very nature, partial. Genetic data represents continuous ancestry, with data ‘lost’ through drift, and, like archaeological data, has patterns that can be explained using different demographic scenarios that cannot easily be distinguished [1]. For example, we only have available limited and fragmentary fossil data, scarce and scattered through space and time. This does not allow a proper quantification of the variation within the groups they belong to, which in turn is needed to correctly quantify variation between them.
All these lines of evidence represent different aspects of the human record, and it is only by finding models that can integrate, standardise and parametrise them, that we can advance a fuller picture of the human past.
5. Looking Towards the Future
In this article we offer a glimpse of the incredible contributions of Cavalli-Sforza's long career in the interdisciplinary exchange between genetics and the broad field of human evolution, and how the groundwork of his research continues to provide new insights for today and tomorrow.
Some aspects have not drastically changed since his extensive interview with Franz Manni in 2010, attesting to Cavalli-Sforza’s foresight yet again [202]. An example is the use of principal component analysis (PCA), of which Cavalli-Sforza was a pioneer [26,37,203] and which is still widely used today [204,205], accompanied by other methods such as clustering approaches [206,207] or the quantification of ancestral admixture [208], to name only a few. Of course, some aspects have become better understood. Cavalli-Sforza and colleagues proposed that one of the two extremities of principal component clines corresponds to the source of a demic expansion [26,37,209,210]. Later, other scholars [211] demonstrated that a PCA cline can also be oriented perpendicular to the axis of expansion. This occurs because PCA maps represent axes of genetic variation, which do not necessarily correspond to genetic clines (i.e., patterns of increasing or decreasing allele frequencies) due to gene surfing, an evolutionary process first proposed in an article co-authored by Cavalli-Sforza himself [212]. This process involves an increase in the frequency of genetic variants that are not under selection as a result of the demographic processes observed during a wave of expansion [212-214]. In other words, their work showed that gene surfing can generate genetic clines along the axis of demographic diffusion, which are then represented as PCA clines perpendicular to this axis (because different alleles are fixed or very frequent in different populations due to surfing).
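For readers less familiar with PCA, the following minimal sketch shows what the first principal component of a table of allele frequencies is. It is a plain power-iteration implementation written for this illustration, not the procedure used to build the synthetic maps in [26,37]; the function name, the seeded random starting vector, and the toy frequencies in the example are all assumptions.

```python
import math
import random


def first_pc_scores(freqs, iterations=200, seed=0):
    """Return first-principal-component scores for each population.

    freqs is a list of rows (populations), each a list of allele
    frequencies at the same loci (columns).
    """
    n_pops, n_loci = len(freqs), len(freqs[0])
    if n_pops < 2:
        raise ValueError("need at least two populations")
    # Centre each locus on its mean frequency across populations.
    means = [sum(row[j] for row in freqs) / n_pops for j in range(n_loci)]
    centred = [[row[j] - means[j] for j in range(n_loci)] for row in freqs]
    # Covariance matrix between loci.
    cov = [[sum(r[a] * r[b] for r in centred) / (n_pops - 1)
            for b in range(n_loci)] for a in range(n_loci)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its leading eigenvector (the first principal axis).
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(n_loci)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(n_loci))
               for a in range(n_loci)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("no variation among populations")
        vec = [x / norm for x in nxt]
    # Project each centred population onto the first principal axis.
    return [sum(r[j] * vec[j] for j in range(n_loci)) for r in centred]


if __name__ == "__main__":
    # Four hypothetical populations along a transect, three loci.
    transect = [[0.1, 0.2, 0.9], [0.3, 0.4, 0.7],
                [0.5, 0.6, 0.5], [0.7, 0.8, 0.3]]
    print(first_pc_scores(transect))
```

For these hypothetical populations, whose allele frequencies change steadily along the transect, the PC1 scores are ordered along the transect (the sign of the axis is arbitrary). An ordering of this kind describes an axis of genetic variation; as discussed above, it does not by itself identify the direction of a demographic expansion.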
Other aspects have changed for the worse, more or less unnoticed. At the end of his career, Cavalli-Sforza looked at the future of this line of research, commenting on the decline of anthropology as a subject among higher education institutions across the USA [202]. This casts a dark shadow on a long-standing discipline, as a similar trend in the humanities, including archaeology, can be seen worldwide (e.g., [215-217] for a perspective about the UK). On the other hand, these disciplines themselves are undergoing changes and are under pressure with the development of interdisciplinarity and the integration of different data types, as we have highlighted.
But there have also been changes in the right direction. The possibility of combining more and more data was yet another of Cavalli-Sforza's dreams. He pondered on how to “keep up with the mounting volume of data” [136], while “[t]here is a mass of information still hidden in the data” [202]. This is where new bioinformatics tools and the integration of ‘big data’ make an entry. Incorporating ‘big data’ is now a reality, and not only through the ever-increasing sequencing data from humans worldwide (pioneered by Cavalli-Sforza’s own HGDP [218]). The last few years have also seen the growth of archaeological (e.g., ROCEEH [219]) and cultural (e.g., D-PLACE [220]) databases, and the development of large-scale continuous paleoclimatic reconstructions extending back to the Out of Africa expansion and before (e.g., [221-226]).
A new direction
In our opinion, the next step in following Cavalli-Sforza’s lead should be to integrate climatic records more fully into the picture. Climatic and environmental fluctuations have long been suggested to have significantly shaped human evolution, both on the biological side (by Cavalli-Sforza and colleagues [76,227-229] and others [230-232]) and on the cultural side (e.g., [233,234]). Despite this, for a long time it was not possible to properly test the role of climate in human evolution.
In the last decades, palaeoenvironmental research has seen two significant developments. The first one is the marine isotope revolution of the second half of the twentieth century, revealing the importance of local variation within broader patterns and trends [235,236]. The second one, more recent, has seen the development of continuous paleoclimatic reconstructions covering up to hundreds of thousands or even millions of years (e.g., [222,223,225,226]). Although linking climate changes to events in hominin evolution remains challenging (e.g., [230,237]), it constitutes a key and advancing frontier in human evolutionary research following the vision of Cavalli-Sforza. For example, we are now able to discuss how climate shaped hominin speciation [226], and the impact of climatic adaptation on phenotypes/physiology such as body and brain sizes in the genus Homo [238], even including the diffusion of migraine as an effect of the expansion out of Africa [239] and the development of brown adipose tissue [240].
It is now also possible to test for the role of climate in driving cultural changes (e.g., changes in lithic assemblages following ecological shifts in Central [241] and Eastern Africa [242,243]), or to explore the interaction between lithics, environmental conditions, and demographic patterns [244]. The integration of a paleoenvironmental perspective is bringing an additional dimension to the spatio-temporal framework of the structuring of the human species (e.g., [245-248]), and is even helping to shed new light on the very same demic diffusions proposed by Cavalli-Sforza himself (e.g., [102,249]).
As we have explored extensively in this article, there has been a call for a better interdisciplinary toolbox, which also requires researchers from very different disciplines to find ways to communicate. While interdisciplinarity is a frequently used word, truly interdisciplinary studies - those that integrate multiple strands of independent evidence - are rare: it is more common for information from one field to be used informally to validate the model results of another discipline (see [250]). This is due in part to computational complexities, the dynamics of diverse data, and the scales and limitations of inference. However, methods are being developed, e.g., informatic tools such as pastclim [251], which provides easy access to and manipulation of palaeoclimatic reconstructions, and tidysdm [252], which can be used to investigate the distribution of species through time using archaeological and palaeoclimatic data. The underlying philosophy is to grant the ability to integrate multiple analyses and algorithms in a standardised format that is easily accessible to experts from different disciplines. For ancient DNA, new methods of extraction continue to be developed that give hope that greater geographical coverage can be achieved (e.g., [253-255]), and palaeoanthropological research is moving beyond ‘birthplace’ concepts of Homo sapiens [195,256], to investigate the African continent more comprehensively (e.g., [127,257-259]).
Future research will need to be able to handle the expansion in such information, find ways to integrate it to test various model scenarios, and find new methods of overcoming the inevitable lacunae in the record. At the same time, it continues to be crucial to recognise, as Cavalli-Sforza did, the limitations of each type of data in its capacity to answer particular research questions. We predict that the future will bring such increasingly integrated research, together with more blended disciplinary boundaries that will foster new respect, cooperation, and perhaps, the ability to bring us closer to achieving Cavalli-Sforza’s goals of reconstructing the human past using all available parameters, and reaching the widest possible audiences for the results of our research.
Declarations
Ethics Statement
Not applicable.
Consent for Publication
Not applicable.
Availability of Data and Material
Not applicable.
Funding
M.C. and E.M.L.S. were funded by the Lise Meitner Pan-African Evolution Research Group. M.L. was funded by the Leverhulme Research Grant RPG-2020-317. The research of C.S. is supported by the Calleva Foundation and the Human Origins Research Fund.
Competing Interests
The authors have declared that no competing interests exist.
Author Contributions
MC and ML conceptualised the paper with inputs from EMLS. All authors wrote the manuscript and revised the text.
Acknowledgement
The authors thank the anonymous reviewers who, with their comments, improved the quality of the manuscript. We are especially grateful to the Editor, Lounès Chikhi, who dedicated a significant amount of time to this paper to offer detailed and constructive comments on the first draft. The authors are also grateful to Andrea Manica for the useful discussions on the topics presented here.
ML would like to thank Guido Barbujani for the many years of guidance and conversations about the concepts considered in this article; and Rino Cella for first introducing the figure of Luca Cavalli-Sforza to her.
MC wishes to thank Mark A. Jobling for the insights on the subject.