To compare the predictions of the models with the observed values of −ln(πS) for each bin, we corrected for the possible effect on πS of differences in KS among bins (15). The regression coefficient for −ln(πS) on KS was estimated by dividing the multiple regression coefficient for πS on KS for the unbinned data by the mean of KS. We then multiplied this regression coefficient by the difference between the mean KS for a bin and the mean of KS over all bins, and added the product to −ln(πS) for the bin. We also applied this procedure to the crossing-over rate, because it also had a substantial effect on πS (SI Appendix, Table S1). Other variables had significant multiple regression coefficients. With the exception of Fop, however, their effect sizes were small, so they were ignored. In Discussion, we give our reasons for disregarding the effect of Fop.
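The bin correction described above can be written as a short function. This is a minimal sketch that follows the text literally:

- the unbinned regression coefficient for πS is rescaled by the covariate mean;
- the rescaled coefficient is multiplied by each bin's departure from the mean over bins;
- the product is added to −ln(πS).

The function names and inputs are hypothetical, and the multiple regression itself is assumed to have been fitted elsewhere.

```python
from statistics import fmean


def correct_for_covariate(neg_log_pi_s, bin_means, coef_pi_s, covariate_mean):
    """Correct per-bin -ln(piS) for one covariate (e.g. KS), following the text.

    neg_log_pi_s   -- observed -ln(piS) for each bin
    bin_means      -- mean covariate value for each bin
    coef_pi_s      -- multiple regression coefficient for piS on the covariate (unbinned data)
    covariate_mean -- mean of the covariate, used to rescale that coefficient
    """
    if len(neg_log_pi_s) != len(bin_means):
        raise ValueError("need one covariate mean per bin")
    coef = coef_pi_s / covariate_mean  # rescaled coefficient, as stated in the text
    overall = fmean(bin_means)  # mean of the covariate over all bins
    # add coefficient * (bin mean - overall mean) to each bin's -ln(piS)
    return [y + coef * (m - overall) for y, m in zip(neg_log_pi_s, bin_means)]


def correct_bins(neg_log_pi_s, covariates):
    """Apply the correction for each (bin_means, coef_pi_s, covariate_mean) in turn,
    e.g. once for KS and once for the crossing-over rate."""
    corrected = list(neg_log_pi_s)
    for bin_means, coef_pi_s, covariate_mean in covariates:
        corrected = correct_for_covariate(corrected, bin_means, coef_pi_s, covariate_mean)
    return corrected


# Hypothetical numbers for two bins, correcting for KS only.
print(correct_bins([1.0, 1.0], [([0.1, 0.3], 0.5, 0.2)]))
```

The corrections are additive, and the bin deviations from the overall mean sum to zero. The average of −ln(πS) over bins is therefore unchanged; only its distribution across bins shifts.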
The joint effects of BGS and SSWs were modeled by modifying the summation model described above. We used two commonly made assumptions. First, sweeps are sufficiently rare that the effects of different sweeps on diversity at a linked site can be treated as independent of each other (14, 62). Second, BGS effects are independent of sweep effects (3, 14, 63) (SI Appendix, section 4, Eq. S16).
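A common way to formalize these two independence assumptions is sketched here. This is an assumption, since the exact form of SI Appendix, Eq. S16 is not reproduced. BGS is treated as reducing the effective population size to B·N0, and each class of sweep as an extra, independent source of coalescence, so that per-generation coalescence rates simply add. The expected pairwise coalescence time is then 1/[1/(2N0B) + Λ], which gives π/π0 = B/(1 + 2N0BΛ), where Λ is the summed sweep-induced coalescence rate. N0 and the sweep rates below are hypothetical inputs.

```python
import math


def neg_log_rel_diversity(B, sweep_coal_rates, N0):
    """-ln(pi/pi0) when BGS and independent sweeps act as competing coalescence rates.

    B                -- BGS factor, so that Ne = B * N0
    sweep_coal_rates -- per-generation coalescence probabilities caused by each sweep class
    N0               -- population size in the absence of selection (hypothetical input)
    """
    if not 0.0 < B <= 1.0:
        raise ValueError("B must lie in (0, 1]")
    total_sweep_rate = sum(sweep_coal_rates)  # independence: sweep rates simply add
    # T = 1 / (1/(2*N0*B) + Lambda); dividing by the neutral time 2*N0 gives pi/pi0.
    ratio = B / (1.0 + 2.0 * N0 * B * total_sweep_rate)
    return -math.log(ratio)


print(neg_log_rel_diversity(0.8, [1e-7, 1.5e-7], 1e6))  # ≈ 0.5596 (= ln 1.75)
```

With no sweeps the expression reduces to −ln(B), the BGS-only prediction. With B = 1 it reduces to the sweep-only prediction ln(1 + 2N0Λ).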
Estimating Positive-Selection Parameters.
To estimate the parameters of positive selection, the effects of SSWs were included in the predictions of diversity relative to its value in the absence of selection (π0), using SI Appendix, Eq. S16c. This yields the deviation, devj, between the observed and predicted values of −ln(π/π0) for the jth bin. Consider a given pair of values of the scaled selection coefficients γa and γu for NS and UTR sites in a bin. All of the variables in the second term on the right-hand side of SI Appendix, Eq. S18, can then be computed. For this we used the empirical estimates for the bin, obtained from DFE-α, of the rates of adaptive substitutions at NS and UTR sites (ωa and ωu) used in SI Appendix, Eq. S17. For NS sites, we used a model in which γa was a linear function of the ratio of KA to its maximum value, yielding a different γa estimate for each bin. In contrast, γu was assumed to be constant across bins, because the DFE-α analyses described in Primary Data Analyses suggested that the DFE-α estimates for UTRs were constant across bins. We then used SI Appendix, Eq. S18, to search for the set of positive-selection parameters that minimized the sum of squares of the devj, SSD, as described after SI Appendix, section 4, Eq. S19.
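The search can be pictured as a grid search over the three parameters: the intercept and slope of the linear model for γa, and γu. In this minimal sketch, the per-bin prediction is replaced by a hypothetical `predict` callable; in the paper, it comes from SI Appendix, Eqs. S16c and S18. The toy predictor and data in the example are invented purely for illustration.

```python
import itertools


def gamma_a_for_bins(intercept, slope, ka_values):
    """Linear model: gamma_a for each bin = intercept + slope * KA / max(KA)."""
    ka_max = max(ka_values)
    return [intercept + slope * ka / ka_max for ka in ka_values]


def fit_by_grid(observed, ka_values, predict, intercepts, slopes, gamma_us):
    """Return (SSD, intercept, slope, gamma_u) minimizing the sum of squared deviations.

    observed -- observed -ln(pi/pi0) for each bin
    predict  -- hypothetical callable predict(j, gamma_a, gamma_u) giving the model value for bin j
    """
    best = None
    for a0, a1, gamma_u in itertools.product(intercepts, slopes, gamma_us):
        gamma_as = gamma_a_for_bins(a0, a1, ka_values)
        # SSD: sum over bins of squared deviations between observed and predicted values
        ssd = sum(
            (obs - predict(j, g_a, gamma_u)) ** 2
            for j, (obs, g_a) in enumerate(zip(observed, gamma_as))
        )
        if best is None or ssd < best[0]:
            best = (ssd, a0, a1, gamma_u)
    return best


# Toy example: an invented linear predictor, with data generated from known parameters.
def toy_predict(j, gamma_a, gamma_u):
    return 0.01 * gamma_a + 0.005 * gamma_u


ka = [0.02, 0.05, 0.1]
obs = [toy_predict(j, g, 50) for j, g in enumerate(gamma_a_for_bins(100, 200, ka))]
print(fit_by_grid(obs, ka, toy_predict, [50, 100, 150], [100, 200, 300], [25, 50, 75]))
# → (0.0, 100, 200, 50)
```

A grid keeps the search derivative-free. Each evaluation of the real objective is expensive, so a coarse grid with a few refining iterations is a natural compromise.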
To obtain CIs for the parameter estimates, we bootstrapped over genes within each bin. For this, we used grids of 7 values of each variable, with only a few iterations of the search, because the computation times were long (a couple of days on a desktop computer).
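Resampling genes within bins can be sketched as a percentile bootstrap. In the paper, each replicate would refit the full model on the 7-value grids. Here a cheap, hypothetical statistic (the mean over bins of the per-bin mean πS) stands in for that fit, and the gene values are invented.

```python
import random
from statistics import fmean


def bootstrap_within_bins(bins, statistic, n_reps=200, seed=1, level=0.95):
    """Percentile bootstrap CI, resampling genes with replacement within each bin.

    bins      -- dict mapping bin label -> list of per-gene records
    statistic -- callable taking a resampled dict of the same shape (e.g. a model fit)
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie in (0, 1)")
    rng = random.Random(seed)
    reps = []
    for _ in range(n_reps):
        # Each bin keeps its own size; genes never move between bins.
        resampled = {label: rng.choices(genes, k=len(genes)) for label, genes in bins.items()}
        reps.append(statistic(resampled))
    reps.sort()
    lo_idx = int((1.0 - level) / 2.0 * n_reps)
    hi_idx = max(lo_idx, int((1.0 + level) / 2.0 * n_reps) - 1)
    return reps[lo_idx], reps[hi_idx]


def mean_of_bin_means(bins):
    """Hypothetical stand-in statistic: mean over bins of the mean piS per bin."""
    return fmean(fmean(values) for values in bins.values())


bins = {"low KA": [0.020, 0.018, 0.022], "high KA": [0.010, 0.012, 0.011]}
print(bootstrap_within_bins(bins, mean_of_bin_means))
```

Resampling within bins, rather than across the whole gene set, preserves the KA binning that the model fit depends on.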
Here, we describe an approach that fits models of both BGS and SSWs to an important aspect of population genomic data. This is the negative relation between the level of synonymous site diversity (πS) in a Drosophila melanogaster gene and KA, its nonsynonymous site (NS) divergence from a related species. The relation was first noted by Andolfatto (15) and confirmed in later studies (16, 17). We used whole-genome polymorphism data from a Rwandan population of D. melanogaster, previously analyzed for other purposes (18–20). We binned genes into sets with similar KA values, with KA measured either as divergence from Drosophila yakuba or along the D. melanogaster lineage since its divergence from its closest relative, Drosophila simulans. For bins with different KA values, we then estimated the parameters of the DFE and the extent of positive selection on NS sites. We also estimated these parameters for the untranslated regions (UTRs) of protein-coding genes, which show levels of selective constraint intermediate between those of synonymous and nonsynonymous sites (21).
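Binning genes by KA can be sketched as follows. The sketch makes two assumptions: bins contain equal numbers of genes, and each bin is summarized by its mean KA and by −ln of its mean πS. The paper's actual bin boundaries and summary statistics may differ, and the gene values in the example are invented.

```python
import math
from statistics import fmean


def bin_by_ka(genes, n_bins):
    """Sort genes by KA and split them into n_bins bins of (nearly) equal size.

    genes -- list of (ka, pi_s) tuples, one per gene
    Returns a list of (mean KA, -ln(mean piS)) tuples, one per bin, in order of KA.
    """
    if not 0 < n_bins <= len(genes):
        raise ValueError("need 1 <= n_bins <= number of genes")
    ordered = sorted(genes, key=lambda g: g[0])
    size, extra = divmod(len(ordered), n_bins)
    summary, start = [], 0
    for i in range(n_bins):
        stop = start + size + (1 if i < extra else 0)  # spread leftover genes over early bins
        chunk = ordered[start:stop]
        summary.append((fmean(k for k, _ in chunk), -math.log(fmean(p for _, p in chunk))))
        start = stop
    return summary


genes = [(0.10, 0.010), (0.01, 0.020), (0.05, 0.015), (0.15, 0.009), (0.02, 0.018), (0.08, 0.012)]
for mean_ka, neg_log_pi in bin_by_ka(genes, 3):
    print(round(mean_ka, 3), round(neg_log_pi, 3))
```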
Potential Results of BGS Alone
The figure plots theoretical values of mean E (percent) against values of mean ?na (percent) for the standard model of a single gene with five exons of 100 codons each. A gamma distribution of selection coefficients with shape parameter ? = 0.3 was assumed, with ?c = 5. For the results obtained by the summation method (red and blue solid lines), the exons were separated by four introns of 100 bp. For the results obtained from the integral model (black and green dashed lines), a continuous stretch of coding sequence was assumed. The green and blue lines show the net BGS effects of both NS and UTR sites; the black and red lines show the effects of NS sites alone. Two-thirds of mutations at coding sites were assumed to be NS mutations. The rate of crossing over per base pair was 1 × 10⁻⁸, and the mutation rate was 4.5 × 10⁻⁹ per base pair. The gene conversion parameters were gc = 1 × 10⁻⁸ and dg = 440 for the low gene conversion case (A), and gc = 5 × 10⁻⁸ and dg = 500 for the high gene conversion case (B). Large-effect mutations were not included.
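The summation method for a single gene can be illustrated with the classic background-selection approximation for one selection coefficient, B = exp(−Σ u·t/[t + r(1 − t)]²). This minimal sketch makes several assumptions that are not taken from the paper:

- Gene conversion follows the standard approximation, in which it adds 2·gc·dg·(1 − e^(−z/dg)) to the recombination probability between sites z bp apart.
- A single hypothetical heterozygous selection coefficient (t = 1 × 10⁻⁵) replaces the gamma distribution used for the figure.
- The crossing-over rate is applied without any sex-specific adjustment.
- The focal neutral site is placed at position 950.

It is not the authors' implementation and does not compute E as defined in the paper.

```python
import math


def gene_layout(n_exons=5, exon_bp=300, intron_bp=100):
    """Coding positions (bp) of a gene with equal-length exons separated by equal introns."""
    coding, start = [], 0
    for _ in range(n_exons):
        coding.extend(range(start, start + exon_bp))
        start += exon_bp + intron_bp
    return coding


def recomb_prob(z, r_co, g_c, d_g):
    """Crossing over plus gene conversion between two sites z bp apart (assumed form)."""
    return r_co * z + 2.0 * g_c * d_g * (1.0 - math.exp(-z / d_g))


def bgs_B(focal, coding, u_ns, t, r_co, g_c, d_g):
    """Ratio of diversity at the focal neutral site to its value without BGS,
    summing over all selected coding sites except the focal one."""
    total = 0.0
    for x in coding:
        if x == focal:
            continue
        r = recomb_prob(abs(x - focal), r_co, g_c, d_g)
        total += u_ns * t / (t + r * (1.0 - t)) ** 2
    return math.exp(-total)


coding = gene_layout()
u_ns = (2.0 / 3.0) * 4.5e-9  # two-thirds of coding-site mutations are NS
b_low = bgs_B(950, coding, u_ns, t=1e-5, r_co=1e-8, g_c=1e-8, d_g=440)
b_high = bgs_B(950, coding, u_ns, t=1e-5, r_co=1e-8, g_c=5e-8, d_g=500)
print(b_low, b_high)
```

The high gene conversion case gives a larger B, and hence a weaker BGS effect, because gene conversion increases r at every distance.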
The black diamonds are the observed values of −ln(πS) for each bin of KA values for autosomes, corrected for the correlation between πS and KS as described in Materials and Methods, Primary Data Analyses. The circles are the theoretical values of mean E for each bin, obtained from the integral model of BGS, assuming a single gene with 500 NS sites. The crosses are the predicted values of −ln(πS) for each bin, given by the combined BGS and SSW models for NS and UTR sites. Red and blue correspond to the low and high gene conversion rates used in Fig. 2. The mutation rate and crossing-over parameters are as in Fig. 2, except that large-effect mutations constitute 15% of all mutations, with a selection coefficient against heterozygotes of 0.044.
We assumed that the scaled selection coefficient for positively selected UTR mutations, γu, was constant across bins, given the weak observed relation between KA and the DFE-α estimates for UTRs (Materials and Methods, Primary Data Analyses). For NS sites, we used a model in which the scaled selection coefficient for positively selected mutations, γa, was linearly related to the ratio of KA to its maximum value, yielding a different γa estimate for each bin. The intercept and slope of this model, together with γu, are the three parameters estimated by fitting the predictions to the data. As described in Materials and Methods, the parameter estimates were obtained by minimizing the sum of squares (SSD) of the deviations between the predicted and observed values of −ln(πS) for all bins except the first. The first bin was used to estimate the value of −ln(π0). We also used the empirical estimates of the rates of adaptive substitution of favorable NS and UTR mutations (ωa and ωu). From these and the estimates of γa and γu, SI Appendix, Eq. S19, gives the proportions of new NS and UTR mutations that are beneficial (pa and pu), as in ref. 33.
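SI Appendix, Eq. S19 is not reproduced here. As a hedged sketch, assume it takes the standard Kimura form, ω = p·γ/(1 − e^(−γ)). Here ω is the rate of adaptive substitution relative to neutral, p is the proportion of new mutations that are beneficial, and γ is the scaled selection coefficient (4Ne·s for a semidominant mutation). This is an assumption, and the paper's scaling of γ may differ. Inverting the relation gives p for each bin. The per-bin values in the example are hypothetical.

```python
import math


def proportion_beneficial(omega, gamma):
    """Proportion p of new mutations that are beneficial, assuming
    omega = p * gamma / (1 - exp(-gamma)) (Kimura fixation probability relative to neutral)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive for beneficial mutations")
    return omega * -math.expm1(-gamma) / gamma  # -expm1(-g) = 1 - exp(-g), accurate for small g


# Hypothetical per-bin values: gamma_a from the linear model, omega_a from DFE-alpha.
for gamma_a, omega_a in [(150.0, 0.05), (300.0, 0.12)]:
    print(gamma_a, omega_a, proportion_beneficial(omega_a, gamma_a))
```

For strongly selected mutations (large γ), the relation reduces to p ≈ ω/γ. A given rate of adaptive substitution can therefore be produced by many weakly favored mutations or by a few strongly favored ones.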