Sampling, DNA preparation and sequencing


Samples LOW002, LOW003, LOW006, LOW007, LOW008 and PON012 were processed at the Archaeological Research Laboratory at Stockholm University, Sweden, following methods previously described8. Briefly, this involved extracting DNA by incubating the bone powder for 24 h at 37 °C in 1.5 ml of digestion buffer (0.45 M EDTA (pH 8.0) and 0.25 mg ml⁻¹ proteinase K), concentrating the supernatant on Amicon Ultra-4 (30-kDa molecular weight cut-off (MWCO)) filter columns (MerckMillipore) and purifying on Qiagen MinElute columns. Double-stranded Illumina libraries were prepared using the protocol outlined in ref. 48, with the inclusion of USER enzyme and the modifications described in ref. 49.

Samples 367, PDM100, Taimyr-1 and Yana-1 were processed at the Swedish Museum of Natural History in Stockholm, Sweden, following previously described methods8. Briefly, this involved extracting DNA using a silica-based method with concentration on Vivaspin filters (Sartorius), according to a protocol optimized for recovery of ancient DNA50. Double-stranded Illumina libraries were prepared using the protocol outlined in ref. 48, with the inclusion of USER enzyme.

Samples ALAS_024, VAL_033, ALAS_016, VAL_008, HMNH_007, HMNH_011, VAL_050, VAL_005, DS04, VAL_037, VAL_012, VAL_011, VAL_18A, IN18_016 and IN18_005 were processed at the Swedish Museum of Natural History in Stockholm, Sweden, following previously described methods for permafrost bone and tooth samples51. Briefly, this involved DNA extraction using the method of ref. 52 and double-stranded Illumina library preparation as described in ref. 48, with dual unique indexes and the inclusion of USER enzyme. Between eight and ten separate PCR reactions with unique indexes were carried out for each sample to maximize library complexity. The libraries were sequenced alongside samples HOV4, AL2242, AL2370, AL2893, AL3272 and AL3284 across three Illumina NovaSeq 6000 lanes with an S4 100-bp paired-end set-up at SciLifeLab in Stockholm.


Samples JAL48, JAL65, JAL69, JAL358, AH574, AH575 and AH577 were processed at the University of Potsdam. Pre-amplification steps (DNA extraction and library preparation) were performed in separate laboratory rooms specially equipped for the processing of ancient DNA. Amplification and post-amplification steps were carried out in different laboratory rooms. DNA was extracted from bone powder (29–54 mg) following a protocol specifically adapted to recover short DNA fragments52. Single-stranded double-indexed libraries were built from 20 µl of DNA extract according to the protocol in ref. 53. The libraries were sequenced on a HiSeq X platform at SciLifeLab in Stockholm.


Samples JK2174, JK2175, JK2179, JK2181, JK2183, TU144, TU148, TU839 and TU840 were processed at the University of Tübingen, with DNA extraction and pre-amplification steps undertaken in clean room facilities and post-amplification steps performed in a separate DNA laboratory. Both laboratories fulfil standards for work with ancient DNA54,55. All surfaces of tooth and bone samples were initially UV irradiated for 30 min to minimize the potential risk of modern DNA contamination. Subsequently, DNA was extracted by applying a well-established guanidine silica-based protocol for ancient samples52. Illumina sequencing libraries were prepared using 20 µl of DNA extract per library48; afterwards, dual barcodes (indexes) were chemically added to the prime ends of the libraries56. For the samples from Auneau (TU839 and TU840), five sequencing libraries each were prepared; for all other samples processed in Tübingen, three sequencing libraries each were prepared. To detect potential contamination of the chemicals, negative controls were conducted for extraction and library preparation. After preparation of the sequencing libraries, DNA concentration was measured with qPCR (Roche LightCycler) using corresponding primers48. The DNA concentration was given by the copy number of the DNA fragments in 1 µl of the sample.

Amplification of the indexed sequencing libraries was performed using Herculase II Fusion under the following conditions: 1× Herculase II buffer, 0.4 µM IS5 primer and 0.4 µM IS6 primer48, Herculase II Fusion DNA polymerase (Agilent Technologies), 0.25 mM dNTPs (100 mM; 25 mM each dNTP) and 0.5–4 µl barcoded library as template in a total reaction volume of 100 µl. The amplification thermal profile was as follows: initial denaturation for 2 min at 95 °C; denaturation for 30 s at 95 °C, annealing for 30 s at 60 °C and elongation for 30 s at 72 °C for 3 to 20 cycles; and a final elongation step for 5 min at 72 °C. Thereafter, the amplified DNA was purified using a MinElute purification step and eluted in 20 µl TET. The concentration of the amplified DNA sequencing libraries was measured using a Bioanalyzer (Agilent Technologies) and a DNA1000 lab chip from Agilent Technologies.

The sequencing libraries were sequenced on an Illumina HiSeq 4000 platform at the Max Planck Institute for the Science of Human History in Jena. The samples from Auneau (TU839 and TU840) were paired-end sequenced applying 2 × 50 + 8 + 8 cycles. All other libraries prepared in Tübingen were single-end sequenced using 75 + 8 + 8 cycles.


Samples AL2657, AL2541, AL2741, AL2744, AL3185, AL2350, CH1109, AL2370, AL3272 and AL3284 were processed at the dedicated ancient DNA facility at the PalaeoBARN laboratory at the University of Oxford, following methods described previously8. Briefly, double-stranded libraries were constructed following the protocol in ref. 48. These libraries were sequenced on a HiSeq 2500 (AL2657, AL2541, AL2741, AL2744) or a HiSeq 4000 (AL3185, AL2350, CH1109) instrument at the Danish National Sequencing Center or on a NextSeq 550 instrument (AL2741) at the Natural History Museum of London. For samples AL2370, AL3272 and AL3284, between six and eight separate PCR reactions with unique indexes were carried out on their libraries, and they were sequenced alongside samples HOV4, VAL_18A and IN18_016 on an Illumina NovaSeq 6000 lane with an S4 100-bp paired-end set-up at SciLifeLab in Stockholm.


Samples CGG13, CGG17, CGG19, CGG20, CGG21, CGG25, CGG26, CGG27, CGG28, CGG34, Tumat1 and IRK were processed at the GLOBE Institute, University of Copenhagen. All pre-PCR work was carried out in ancient DNA facilities following ancient DNA guidelines57. The details of extraction, library construction and sequencing for the samples with CGG codes are described in ref. 21, in relation to the publication of mitochondrial data from these specimens. The Tumat1 sample was processed following the exact same protocol. Briefly, DNA extraction was performed using a buffer containing urea, EDTA and proteinase K50, double-stranded libraries were prepared with NEBNext DNA Sample Prep Master Mix Set 2 (E6070S, New England Biolabs) and Illumina-specific adaptors48, and sequencing was performed on an Illumina HiSeq 2500 platform using 100-bp single-read chemistry. For the IRK sample, DNA was extracted from three subsamples and purified as described in ref. 21. The three DNA extracts and the purified pre-digest of one subsample were incorporated into double-stranded libraries following the BEST protocol58, with the modifications described in ref. 59, and sequenced on a BGISEQ-500 platform using 100-bp single-read chemistry.

Santa Cruz

Samples SC19.MCJ017, SC19.MCJ015, SC19.MCJ010 and SC19.MCJ014 were processed at the UCSC Paleogenomics Lab and were provided by the Yukon Government Paleontology program. All pre-PCR work was carried out in a dedicated ancient DNA facility at the University of California, Santa Cruz, following standard ancient DNA methods60. Subsamples (250–350 mg) were sent to the UCI KECK AMS facility for radiocarbon dating, and the remaining amounts were powdered in a Retsch MM400 for extraction. For each sample, ~100 mg of powder was treated with a 0.5% sodium hypochlorite solution before extraction to remove surface contaminants61 and then combined with 1 ml lysis buffer for extraction, following the protocol in ref. 52. Samples were processed in parallel with a negative control. We quantified the extracts using a Qubit 1× dsDNA HS Assay kit (Q33231) before preparing libraries. We prepared single-stranded libraries following the protocol in ref. 62 and amplified the libraries for 9–16 cycles as informed by qPCR. After amplification, we cleaned the libraries using a 1.2× SPRI bead solution and pooled them to an equimolar ratio for in-house shallow quality-control sequencing on a NextSeq 550 paired-end 75-bp run. We then sent the libraries to Fulgent Genetics for deeper sequencing on two paired-end 150-bp lanes on a HiSeq X instrument.


Sample HOV4 was processed at the Department of Anthropology, University of Vienna. The sample is a canine tooth, which after sequencing was determined to derive from a dhole (Cuon alpinus). DNA was extracted from its cementum using the methods described in ref. 63 with a modified incubation time of ~18 h. The library was prepared according to the protocol in ref. 48 with the modifications from ref. 64. Five separate PCR reactions with unique indexes were carried out on the library, and the products were sequenced alongside samples VAL_18A, IN18_016, AL2242, AL2370, AL2893, AL3272 and AL3284 on an Illumina NovaSeq 6000 lane with an S4 100-bp paired-end set-up at SciLifeLab in Stockholm.

An overview of all samples and their associated metadata is available in Supplementary Data 1.

Genome sequence data processing

For paired-end data, read pairs were merged and adaptors were trimmed using SeqPrep, discarding reads that could not be successfully merged. Reads were mapped to the dog reference genome canFam3.1 using BWA aln (v.0.7.17)65 with permissive parameters, including a disabled seed (-l 16500 -n 0.01 -o 2). Duplicates were removed by keeping only one read from any set of reads that had the same orientation, length and start and end coordinates. For sample Taimyr-1, previously published data13 were merged with newly generated data. Data from samples processed in Copenhagen were processed as described previously66 except that they were also mapped to canFam3.1. Post-mortem damage was quantified using PMDtools (v0.60)67 with the ‘--first’ and ‘--CpG’ arguments.
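The duplicate-removal rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Read` record and the `remove_duplicates` function are hypothetical names, and the sketch assumes that reads are compared within a chromosome (length is implied by the start and end coordinates).

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for one mapped (merged) read."""
    name: str
    chrom: str
    start: int
    end: int
    reverse: bool  # True if mapped to the reverse strand


def remove_duplicates(reads: List[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, start, end, orientation) key.

    Reads sharing start and end coordinates necessarily share length,
    so length does not need to be part of the key.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.end, read.reverse)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

Which read of a duplicate set is retained is not specified in the text; the sketch simply keeps the first one encountered.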

Genotyping and integration with previously published genomes

To construct a comparative dataset for population genetic analyses, we started from a published variant call set compiling 722 modern dog, wolf and other canid genomes from multiple previous studies (NCBI BioProject accession PRJNA448733)40. To this, we added additional modern whole genomes from other studies: 4 African golden wolves and 15 Nigerian village dogs (Genome Sequence Archive accession PRJCA000335)68, 12 Scandinavian wolves (European Nucleotide Archive accession PRJEB20635)69, 9 North American wolves and coyotes (European Nucleotide Archive accession PRJNA496590)25 and 8 other canids (African hunting dog, dhole, Ethiopian wolf, golden jackal, Middle Eastern grey wolves) (European Nucleotide Archive accession PRJNA494815)22. Reads from these genomes were mapped to the dog reference genome using bwa mem (version 0.7.15)70, marked for duplicates using Picard Tools (v2.21.4), genotyped at the sites present in the above dataset using GATK HaplotypeCaller (v3.6)71 with the ‘-gt_mode GENOTYPE_GIVEN_ALLELES’ argument and then merged into the dataset using bcftools merge. The following filters were then applied to sites and genotypes across the full dataset: sites with excess heterozygosity (bcftools fill-tags ‘ExcHet’ P value < 1 × 10⁻⁶) were removed; indel alleles were removed by setting the genotype of any individual carrying such an allele to missing; genotypes at sites with a depth (taken as the sum of the ‘AD’ VCF fields) less than a third of or more than twice the genome-wide average for the given genome, or lower than five, were set to missing; genotypes containing any allele other than the two highest-frequency alleles at the site were set to missing; allele representation was normalized using bcftools norm; and, finally, sites at which 130 or more individuals had a missing genotype were removed.
This resulted in a final dataset of 67.8 million biallelic SNPs. In ancestry analyses (that is, those involving f-statistics), modern wolves were treated as individuals, whereas for modern dogs up to four individuals with the highest sequencing coverage from a given breed were used and combined into populations. A list of the modern genomes used in analyses and their associated metadata is included in Supplementary Data 2.
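The per-genotype depth filter and the per-site missingness filter can be written out as a short sketch. The function names and argument layout are hypothetical and chosen only for illustration; the thresholds are those stated above.

```python
from typing import Sequence


def genotype_passes_depth(ad: Sequence[int], genome_mean_depth: float,
                          min_depth: int = 5) -> bool:
    """Return False if the genotype should be set to missing.

    Depth is taken as the sum of the per-allele 'AD' counts. A genotype
    fails if depth is below min_depth, below a third of the genome-wide
    average for that genome, or above twice that average.
    """
    depth = sum(ad)
    if depth < min_depth:
        return False
    if depth < genome_mean_depth / 3:
        return False
    if depth > 2 * genome_mean_depth:
        return False
    return True


def site_passes_missingness(n_missing: int, max_missing: int = 129) -> bool:
    """Sites with 130 or more missing genotypes are removed."""
    return n_missing <= max_missing
```

For example, a genotype with `AD = [10, 8]` in a genome with mean depth 20 is retained, whereas one with `AD = [30, 20]` in the same genome exceeds twice the average and would be set to missing.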

All ancient genomes were assigned pseudo-haploid genotypes at the variant sites in the above dataset using htsbox pileup r345, requiring a minimum read length of 35 bp (‘-l 35’), mapping quality of 20 (‘-q 20’) and base quality of 30 (‘-Q 30’). If an ancient genome carried an allele not present in the dataset, its genotype was set to missing. Previously generated ancient and historical wolf and dog genomes mapped to the dog reference were obtained from the respective publications3,7,8,13,17,66,72,73 (European Nucleotide Archive study accessions PRJEB7788, PRJEB13070, PRJNA319283, PRJEB22026, PRJNA608847, PRJEB38079, PRJEB39580, PRJEB41490) and genotyped in the same way. A list of the ancient genomes used in analyses and their associated metadata is included in Supplementary Data 2.

Mitochondrial genome phylogenetic analysis and evolutionary dating

We extracted reads mapped to the mitochondrial genome for the ancient wolf samples using samtools (v1.9)74. We called consensus sequences using a 75% threshold, calling any sites with coverage less than 3 as ‘N’, using Geneious (v9.0.5) and removed any samples with greater than 10% missing data. We included a set of previously published mitochondrial genomes from ancient and modern wolves5,9,13,21,75,76,77,78,79,80, which led to a final dataset of 183 individuals (62 ¹⁴C-dated ancient individuals, 24 undated ancient individuals of which 7 had infinite ¹⁴C dates, and 90 modern individuals). We also included three coyote-like sequences as outgroups (from one modern coyote and two ancient wolves with coyote-like mitochondrial sequences: SC19.MCJ015, ¹⁴C dated, and SC19.MCJ017, with an infinite ¹⁴C date). We aligned all sequences using Clustal Omega (v1.2.4)81. A Bayesian phylogeny was constructed using BEAST (v1.10.1)82, with an HKY + I + G substitution model selected by JModelTest2 (v2.1.10)83, an uncorrelated relaxed log-normal clock and a coalescent constant size tree prior. We combined 20 MCMC chains (each run for 200 million iterations), after excluding the first 25% of values as burn-in. For ¹⁴C-dated samples, we included tip date priors that corresponded to a normal distribution with the same mean and 95% confidence interval as the ¹⁴C dates.
We estimated the ages of undated samples from a prior distribution as follows: (1) for the n = 24 ancient samples with no ¹⁴C information, we used a uniform prior of 0 to 1,000,000 years before the present (BP); (2) for the n = 7 ancient samples with infinite ¹⁴C dates, we used a uniform prior as in (1), but with the lower limit set to the minimum date given by the radiocarbon dating; (3) all n = 90 modern samples had already been published previously21, and the tip date priors for these samples were the same as the uniform priors used in the previous study (either 0 to 100 or 0 to 500 BP). The mitochondrial consensus sequences for the wolf samples newly reported here (excluding those that were removed because they had too much missing data) are available as Supplementary Data 4.
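As an illustration of how a normal tip-date prior can be matched to a radiocarbon date, the sketch below derives the standard deviation from the width of the 95% interval. It assumes a symmetric interval around the mean, which calibrated ¹⁴C dates do not always satisfy; the function name and the example numbers are hypothetical.

```python
from statistics import NormalDist


def normal_tip_prior(mean: float, lower: float, upper: float,
                     level: float = 0.95) -> NormalDist:
    """Normal distribution with the given mean whose central `level`
    interval has the same width as (lower, upper)."""
    # z is ~1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sd = (upper - lower) / (2 * z)
    return NormalDist(mean, sd)


# Hypothetical date: mean 30,000 years BP, 95% interval 29,000 to 31,000
prior = normal_tip_prior(30_000, 29_000, 31_000)
```

The resulting mean and standard deviation are the two numbers that would be supplied for a normal tip-date prior; the uniform priors for undated samples need only their stated bounds.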

f-statistics and admixture graphs

f3- and f4-statistics were calculated with ADMIXTOOLS (v5.0)84, using only transversion sites and with the ‘numchrom: 38’ argument. To overcome memory limitations when calculating large numbers of f4-statistics, block jackknifing was performed external to ADMIXTOOLS across 225 blocks of 10 Mb in size. Admixture graphs were fitted using qpGraph, with arguments ‘outpop: NULL’, ‘useallsnps: NO’, ‘blgsize: 0.05’, ‘forcezmode: YES’, ‘lsqmode: NO’, ‘diag: 0.0001’, ‘bigiter: 6’, ‘hires: YES’ and ‘lambdascale: 1’. Outgroup f3-statistics were calculated using only sites ascertained to be heterozygous in the CoyoteCalifornia individual.

PCA was performed on outgroup f3-statistics by transforming the values to distances by taking 1 − f3 and then running the prcomp R function on the resulting distance matrix. Only ancient wolves were included in the calculation of PCs; present-day wolves and ancient and present-day dogs were then individually projected onto the PCs by re-running the analysis once for each of these individuals independently, with that single individual added in, and saving its coordinates. To avoid overloading the plot with dogs, only the following dogs were included: Basenji, Boxer, BullTerrier, NewGuineaSingingDog, SiberianHusky, Germany.HXH (7,000 BP), Germany.CTC (4.7 ka), Ireland.Newgrange (4,800 BP), Israel.THRZ02 (7,200 BP), Baikal.OL4223 (6,900 BP), Zhokhov.CGG6 (9,500 BP) and PortauChoix.AL3194 (4,000 BP).

PCA was performed on f4-statistics by transforming the values to pairwise distances by taking \(\sqrt{2 \times (1 - r)}\), where r is the Pearson correlation for a given pair of individuals, and then running the ppca function from the pcaMethods (v1.74.0) R package on the resulting distance matrix. For the ‘pre-LGM PCA’ (Fig. 4a and Extended Data Fig. 2), the input was restricted to all possible f4-statistics of the form f4(X,A;B,C), where X was one of the post-25 ka and present-day individuals included in the plot and A, B and C were drawn from a reference set of ancient wolves that lived before 28 ka. For each X, the input was thus a vector of f4-statistics that quantified its relationships to pre-LGM wolves. Only wolves (post-25 ka and present day) were included in the calculation of PCs, and ancient and present-day dogs were then individually projected onto the PCs as described above.
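The correlation-to-distance transformation can be sketched as below, where each individual is represented by its vector of f4-statistics. The function names are hypothetical, and the clamp at zero is an added guard against floating-point rounding rather than part of the published method.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def f4_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance sqrt(2 * (1 - r)) between two individuals' f4 vectors."""
    r = pearson(x, y)
    # Guard: rounding can push r marginally above 1
    return math.sqrt(max(0.0, 2.0 * (1.0 - r)))
```

Under this transformation, perfectly correlated vectors lie at distance 0 and perfectly anticorrelated vectors at distance 2; the resulting matrix is what would be passed to the PCA step.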

Heterozygosity and FST estimates

Conditional heterozygosity was estimated at 1,250,173 transversion sites ascertained to be heterozygous in the CoyoteCalifornia individual, chosen because it is largely an outgroup to wolf diversity. For each individual, exactly two reads were sampled at each of these sites (if available), and the fraction of sites where these two reads displayed different alleles was calculated (alleles other than the two observed in the coyote were ignored). Standard errors were obtained by block jackknifing across the 38 chromosomes.
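A minimal sketch of the two-read sampling and the chromosome-level jackknife is given below. Two points are assumptions rather than statements from the text: reads carrying non-coyote alleles are discarded before sampling, and the jackknife is the simple unweighted delete-one form. All function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Set, Tuple


def site_mismatch(read_alleles: Sequence[str], coyote_alleles: Set[str],
                  rng: random.Random) -> Optional[bool]:
    """Sample two reads at one site; True if they carry different alleles.

    Returns None when fewer than two usable reads are available.
    """
    usable = [a for a in read_alleles if a in coyote_alleles]
    if len(usable) < 2:
        return None
    first, second = rng.sample(usable, 2)
    return first != second


def jackknife_se(counts: List[Tuple[int, int]]) -> float:
    """Delete-one-chromosome jackknife standard error.

    counts holds (n_mismatching_sites, n_sites) per chromosome.
    """
    total_m = sum(m for m, _ in counts)
    total_n = sum(n for _, n in counts)
    # Estimate recomputed with each chromosome left out in turn
    loo = [(total_m - m) / (total_n - n) for m, n in counts]
    k = len(loo)
    mean = sum(loo) / k
    return math.sqrt((k - 1) / k * sum((x - mean) ** 2 for x in loo))
```

The point estimate is the total number of mismatching sites divided by the total number of sites with two sampled reads; with 38 chromosomes, `counts` would hold 38 pairs.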

FST was calculated with smartpca from the EIGENSOFT (v7.2.1) package85, using the ‘inbreed: YES’ option to account for the pseudohaploid genotypes of the ancient genomes (this option was also applied to present-day diploid genomes). FST was calculated pairwise for pools of at least two genomes, formed from individuals chosen for being close in time and space (Supplementary Table 1). Several pairs of individuals showed high similarity indicating possible relatedness, as assessed by comparing read mismatch rates across versus within individuals, and one individual from each of these pairs was excluded from these analyses (JK2174 was excluded because of high similarity to JK2183, TU839 because of high similarity to TU840, and CGG17 because of high similarity to Yana-1). FST values for pairs of pools with age midpoints separated by less than 12,500 years were included in the plot.

Divergence time and effective population size analyses with MSMC2

We used MSMC2 (v2.1.2)26 to infer population divergence times and effective population size histories. Input genotypes for this were called using GATK HaplotypeCaller (v3.6)86 on ancient and modern genomes with sequencing coverage >5.8×. For divergence time analyses, haploid X chromosomes from two different male genomes were combined, and the point at which the inferred effective population size for this ‘pseudodiploid’ chromosome increased sharply was taken to correspond to a population divergence. Results were scaled using a mutation rate of 0.4 × 10⁻⁸ mutations per site per generation13,87 (with a 25% lower rate for X-chromosome analyses) and a mean generation interval of 3 years13. For effective population size inferences, transition variants were ignored and results were scaled using a transversions-only mutation rate inferred from results on modern genomes. For more details on the MSMC2 analyses, see Supplementary Information section 3.
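The scaling step can be illustrated with the standard conversion for MSMC2 output, in which time boundaries are reported in units of the per-generation mutation rate and the coalescence rate is reported as a scaled rate (often called lambda). The sketch below uses the rates stated above; the function names are hypothetical, and the transversions-only rate used for the population size analyses is not reproduced here.

```python
MU_AUTOSOMAL = 0.4e-8           # mutations per site per generation
MU_X = MU_AUTOSOMAL * 0.75      # 25% lower rate for the X chromosome
GENERATION_YEARS = 3.0          # mean generation interval in years


def scaled_time_to_years(scaled_time: float, mu: float = MU_AUTOSOMAL,
                         generation_years: float = GENERATION_YEARS) -> float:
    """Convert a mutation-scaled time boundary to years."""
    return scaled_time / mu * generation_years


def scaled_rate_to_ne(scaled_rate: float, mu: float = MU_AUTOSOMAL) -> float:
    """Convert a scaled coalescence rate to an effective population size."""
    return 1.0 / (2.0 * mu * scaled_rate)
```

For example, a scaled time of 4 × 10⁻⁵ corresponds to 10,000 generations, or 30,000 years, under the autosomal rate, and a scaled coalescence rate of 1,250 corresponds to an effective population size of 100,000.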

Selection analyses

Selection analysis was performed using PLINK (v1.90b5.2)88. This analysis used the 72 ancient wolf genomes and 68 modern wolf genomes (with the latter including a historical Japanese wolf genome73 treated as ancient for analysis purposes, with its age set to 200 BP). A list of the genomes used for this analysis is available in Supplementary Data 2 (“Used for selection scan” column). All SNPs, not only transversions, were used for this analysis. The age of each wolf was set as the phenotype, with values of 0 for modern wolves, and the ‘--linear’ argument was used to test for an association between SNP genotypes and age, also applying the ‘--adjust’ argument to correct P values using genomic control. The application of genomic control34 here aimed to use the magnitude of temporal allele frequency variance observed across the genome to account for the variance that would arise from genetic drift alone given wolf demographic history. Only results for the following sets of sites were retained and included in the Manhattan plot: sites where at least 40 ancient genomes had a genotype call, sites with a minor allele frequency among the ancient wolves of ≥5% and sites that had at least 7 neighbouring sites within a 50-kb window with a P value that was at least 90% as large (on a log10 scale) as the P value of the site itself. The last ‘neighbourhood filter’ aimed to reduce false positives by requiring similar evidence across multiple nearby sites. As a P-value significance cut-off to correct for genome-wide testing, we used 5 × 10⁻⁸, which is commonly used in genome-wide association studies in humans and also in dogs89. We excluded 15 regions where only a single variant reached significance. A detailed table with the 24 detected regions is available in Supplementary Data 3.
To test the robustness of this analysis to false positives arising from genetic drift alone, we applied the same analysis to data from neutral coalescent simulations generated using ms90 and found no false positives. For more details, see Supplementary Information section 4.
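The ‘neighbourhood filter’ can be sketched as follows for the sites on one chromosome. The text does not state how the 50-kb window is positioned, so this sketch assumes a window centred on the focal site (25 kb on either side); the function name and input layout are likewise hypothetical.

```python
import math
from typing import List, Tuple


def neighbourhood_filter(sites: List[Tuple[int, float]],
                         window: int = 50_000,
                         min_neighbours: int = 7,
                         fraction: float = 0.9) -> List[Tuple[int, float]]:
    """Keep sites supported by enough nearby sites of similar significance.

    sites holds (position, p_value) pairs. A neighbour counts if its
    -log10(P) is at least `fraction` of the focal site's -log10(P).
    """
    half = window / 2
    kept = []
    for pos, p in sites:
        threshold = fraction * -math.log10(p)
        n_support = sum(
            1
            for other_pos, other_p in sites
            if other_pos != pos
            and abs(other_pos - pos) <= half
            and -math.log10(other_p) >= threshold
        )
        if n_support >= min_neighbours:
            kept.append((pos, p))
    return kept
```

The quadratic loop keeps the logic transparent; a genome-wide implementation would more plausibly sort sites by position and use a sliding window.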

Ancestry modelling with qpAdm and qpWave

We used the qpAdm and qpWave methods43 from ADMIXTOOLS (v5.0)84 to test ancestry models for wolf and dog targets postdating 23 ka. For the primary analyses, we used the following set of candidate source populations (age estimate in brackets, years BP): Armenia_Hovk1.HOV4 (ancient dhole), Siberia_UlakhanSular.LOW008 (70,772), Germany_Aufhausener.AH575 (57,233), Siberia_BungeToll.CGG29 (48,210), Germany_HohleFels.JK2183 (32,366), Siberia_BelayaGora.IN18_016 (32,020), Yukon_QuartzCreek.SC19.MCJ010 (29,943), Altai_Razboinichya.AL2744 (28,345), Siberia_BelayaGora.IN18_005 (18,148) and Germany_HohleFels.JK2179 (13,229). We used a rotating approach in which, for each target, we tested all possible one-, two- and three-source models that could be enumerated from the above set. Individuals from the set that were not used as a source in a given model served as the reference set (or the ‘right’ population in the qpAdm framework). This means that, in every model, each of the above individuals was always either in the source list or in the reference list. We ranked models on the basis of their P values, but prioritized models with fewer sources using a P-value threshold of 0.01: if a simpler model (meaning a model with fewer sources) had a P value above this threshold, it ranked above a more complex model (meaning a model with more sources) regardless of the P value of the latter. We also failed models with inferred ancestry proportions larger than 1.1 or smaller than −0.1. For single-source models, qpWave was run instead of qpAdm. Both programs were run with the ‘allsnps: YES’ option (without this option, there was very little power to reject models).
We describe ancestry assigned to the ancient dhole source (Armenia_Hovk1.HOV4) as ‘unsampled’ ancestry; note that this does not imply that such ancestry is of non-wolf origin, only that it is not represented by (that is, diverged early from and lacks shared genetic drift with) the ancient wolf genomes in the reference set.
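The rotating scheme and the ranking rule can be illustrated with the sketch below. It only enumerates source and reference sets and orders already-computed results; it does not run qpAdm or qpWave. The function names and the result dictionary layout are hypothetical, and the sort key is one reading of the ranking rule described above.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Sequence, Tuple


def enumerate_models(candidates: Sequence[str], max_sources: int = 3
                     ) -> Iterator[Tuple[Tuple[str, ...], List[str]]]:
    """Yield (sources, reference) pairs; every candidate is in exactly one."""
    for k in range(1, max_sources + 1):
        for sources in combinations(candidates, k):
            reference = [c for c in candidates if c not in sources]
            yield sources, reference


def rank_models(results: List[Dict], threshold: float = 0.01) -> List[Dict]:
    """Order fitted models, best first.

    Each result has 'sources', 'p' and 'proportions'. Models with any
    proportion outside [-0.1, 1.1] are failed. Models with P above the
    threshold are ordered by number of sources, then by P; the remaining
    models follow, ordered by P alone.
    """
    feasible = [m for m in results
                if all(-0.1 <= w <= 1.1 for w in m["proportions"])]

    def key(model: Dict) -> Tuple[int, int, float]:
        if model["p"] > threshold:
            return (0, len(model["sources"]), -model["p"])
        return (1, 0, -model["p"])

    return sorted(feasible, key=key)
```

With the ten candidate sources listed above, this enumeration yields 10 + 45 + 120 = 175 models per target.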

To test whether any available post-23 ka or modern wolf genome might be a good proxy for the western Eurasian wolf-related ancestry identified in Near Eastern and African dogs, we added the 9,500-year-old Zhokhov dog17 to the rotating set of candidate source populations. The Zhokhov dog was chosen for its high coverage, early date and easterly location; this approach assumes that it is a good representative of the eastern dog ancestry component. Using the African Basenji dog as a target, models involving the Zhokhov dog plus another given wolf thus allowed us to test whether that wolf was a good match for the additional component of ancestry. For more details on the qpAdm and qpWave analyses, see Supplementary Information sections 2 (wolf targets) and 5 (dog targets).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.



