πS and KA using the R function “pcor.test,” available at yilab.gatech.edu/pcor.R, with 95% CIs obtained by bootstrapping across genes. We used the following statistics for each gene as covariates: codon use bias, measured as the proportion of optimal codons (Fop); synonymous divergence (KS), a proxy for the mutation rate of each gene; gene expression, measured using RNAseq (average log2 reads per kilobase of transcript per million mapped reads across all developmental stages of D. melanogaster); smoothed effective rates of crossing over (centimorgans per megabase) from Loess regression fits to the rates of crossing over from the D. melanogaster data of ref. 22, multiplied by one-half to correct for the absence of recombination in males; GC content of short introns (<80 bp); and the coding sequence (CDS) length of each gene. For further details concerning these measures, see refs. 18, 20, and 58.
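As an illustration of the kind of calculation described above, the following is a minimal Python sketch of a partial correlation with a percentile bootstrap CI obtained by resampling genes. It is not the pcor.R code; the recursive formula, the function names, and the layout of `cols` (one list per variable, one entry per gene) are assumptions made for this example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(cols, i, j, controls):
    """Partial correlation of cols[i] and cols[j], controlling for the
    columns listed in `controls`, via the standard recursive formula."""
    if not controls:
        return pearson(cols[i], cols[j])
    k, rest = controls[0], controls[1:]
    r_ij = partial_corr(cols, i, j, rest)
    r_ik = partial_corr(cols, i, k, rest)
    r_jk = partial_corr(cols, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

def bootstrap_ci(cols, i, j, controls, n_boot=1000, seed=0):
    """95% percentile CI from resampling genes (rows) with replacement."""
    rng = random.Random(seed)
    n = len(cols[0])
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = [[c[t] for t in idx] for c in cols]
        stats.append(partial_corr(sample, i, j, controls))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

The recursion makes three calls per covariate, so with many covariates a regression-residual approach would scale better; the version here is kept short for readability.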
For the analyses of NS sites, we applied DFE-α (24) to each of 52 (mel-yak) or 53 (mel) sets of genes binned by KA, to estimate the following parameters: β, the shape parameter of a gamma distribution of the heterozygous fitness effects of deleterious NS mutations (the DFE); α, the proportion of adaptive substitutions; ωa = α KA/KS, the rate of adaptive substitutions for NS mutations relative to the neutral rate; ωna = (1 – α) KA/KS, the rate of nonadaptive substitutions (due to fixations of neutral or slightly deleterious mutations) relative to the neutral rate (26). Binning was necessary, because DFE-α parameter estimates for single genes are very imprecise.
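The two relative substitution rates defined above follow directly from the proportion of adaptive substitutions and KA/KS. The short Python sketch below shows the arithmetic; the function name and the numerical inputs are hypothetical and are not taken from the data.

```python
def relative_substitution_rates(alpha, ka, ks):
    """Return (adaptive, nonadaptive) substitution rates relative to the
    neutral rate, given the proportion of adaptive substitutions `alpha`."""
    ratio = ka / ks
    omega_a = alpha * ratio          # adaptive component of KA/KS
    omega_na = (1 - alpha) * ratio   # nonadaptive component of KA/KS
    return omega_a, omega_na

# Hypothetical values, for illustration only.
omega_a, omega_na = relative_substitution_rates(alpha=0.5, ka=0.02, ks=0.1)
```

By construction the two rates sum to KA/KS, which is a convenient check on any implementation.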
We ensured that each bin included at least 50 genes (SI Appendix, Table S2) and removed bins with negative estimates of α (the bin with the lowest KA in each case). We also excluded the last bin of mel-yak and the last two bins of mel, which contained genes with a broad range of very high KA values that yielded anomalous estimates of α and/or β; this excluded only 74 and 81 genes for the mel-yak and mel datasets, respectively. This left a total of 50 bins for each NS dataset: 6,748 genes for mel-yak, and 5,397 genes for mel (SI Appendix, Table S2). The binned data gave similar linear regression coefficients for the relation between πS and mean KA for a bin (–0.026, mel-yak; –0.179, mel) to those for the unbinned KA and πS values (–0.028, mel-yak; –0.177, mel).
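One way to form bins of genes ordered by KA, each with a guaranteed minimum number of genes, is sketched below in Python. The text does not state how bin boundaries were chosen, so the equal-count scheme and the function name `bin_by_ka` are assumptions for illustration only.

```python
def bin_by_ka(ka_values, n_bins, min_genes=50):
    """Split gene indices into `n_bins` contiguous groups after sorting by KA.

    Assumes near-equal bin sizes (an assumption, not the paper's stated
    scheme) and raises an error if any bin would hold fewer than
    `min_genes` genes.
    """
    order = sorted(range(len(ka_values)), key=lambda g: ka_values[g])
    base, extra = divmod(len(order), n_bins)
    if base < min_genes:
        raise ValueError("too few genes for the requested number of bins")
    bins, start = [], 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)  # spread the remainder
        bins.append(order[start:start + size])
        start += size
    return bins
```

Each returned bin is a list of gene indices, so the mean KA or any other per-bin statistic can be computed from the original per-gene values.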
To run DFE-α, we used a demographic model where the population at initial size N1 (set to 100) experienced a step change to N2 at n generations in the past. We also generated replicate bootstrap estimates of all of the variables for each bin by resampling genes 1,000 times within a given bin and running DFE-α for each bootstrap.
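The within-bin bootstrap can be sketched as follows in Python. The `estimate` argument is a hypothetical placeholder for whatever is run on each resampled set of genes (in the analysis described here, an external DFE program); nothing about that program's interface is assumed.

```python
import random

def bootstrap_bin(genes, estimate, n_boot=1000, seed=0):
    """Resample the genes of one bin with replacement `n_boot` times and
    apply `estimate` (a caller-supplied function) to each resample."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(genes, k=len(genes))  # same size as the bin
        replicates.append(estimate(resample))
    return replicates

# Toy usage: per-gene values stand in for real gene records.
toy_bin = [0.01, 0.02, 0.015, 0.03, 0.025]
means = bootstrap_bin(toy_bin, lambda g: sum(g) / len(g), n_boot=100)
```

Resampling whole genes, rather than individual sites, keeps linked sites within a gene together in each replicate.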
We also applied DFE-α to UTRs for genes binned by KA, for both the mel-yak and mel data (SI Appendix, Table S3). (Note that these bins do not contain exactly the same genes as in the NS site analyses.) Least-squares quadratic regressions gave little evidence for significant relations between KA and the primary parameters, β and α, for the two types of UTR. The linear regression coefficient on KA for one of these parameters, for 5′-UTRs in mel-yak, had P = 0.041; no other coefficient approached significance. Given the number of tests, and the fact that normal distribution tests are likely to exaggerate significance, it seems safe to treat β and α for UTRs as independent of KA, which reduces the complexity of the models used in the data analyses.
ωa values were, however, strongly related to KA for the binned data. For mel-yak, the quadratic regressions of ωa on KA were y = 0.068 + 1.54x – 6.08x² for 3′-UTRs and y = 0.0087 + 0.641x – 1.901x² for 5′-UTRs. For mel, the regressions were y = 0.016 + 1.12x – 27.4x² for 3′-UTRs, and y = 0.019 + 0.330x for 5′-UTRs (there was no evidence for a significant quadratic term in this case). For all coefficients shown, P < 0.005. These were used to obtain values of ωa for the KA bins used in the BGS and SSW models in the final data analyses described below.
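The fitted regressions quoted above can be evaluated at the mean KA of a bin as in the Python sketch below. The coefficients are copied from the text; the dictionary layout, the key names, and the function name are assumptions for illustration.

```python
# (intercept, linear, quadratic) coefficients of y on x = KA, from the text.
UTR_REGRESSIONS = {
    ("mel-yak", "3'"): (0.068, 1.54, -6.08),
    ("mel-yak", "5'"): (0.0087, 0.641, -1.901),
    ("mel", "3'"): (0.016, 1.12, -27.4),
    ("mel", "5'"): (0.019, 0.330, 0.0),  # no significant quadratic term
}

def predict_from_ka(dataset, utr, ka):
    """Evaluate the fitted regression for one dataset and UTR type at `ka`."""
    b0, b1, b2 = UTR_REGRESSIONS[(dataset, utr)]
    return b0 + b1 * ka + b2 * ka ** 2
```

For example, `predict_from_ka("mel-yak", "3'", 0.05)` evaluates 0.068 + 1.54(0.05) – 6.08(0.05)². The fits should only be applied within the range of KA covered by the bins.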