{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600 {\fonttbl\f0\fswiss\fcharset0 Helvetica;} {\colortbl;\red255\green255\blue255;} {\*\expandedcolortbl;;} \margl1440\margr1440\vieww10800\viewh8400\viewkind0 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0 \f0\fs24 \cf0 Hello and thanks for stopping by my poster! My name is Steph - I'm a PhD student in Rajiv McCoy's lab at Johns Hopkins, and I'll be telling you about the work we've been doing on the role of structural variation in human local adaptation.\ \ So, humans, like all other species that have migrated into diverse environments, have developed genetic adaptations to these environments. One famous example of this that you might have heard of is the adaptation of some European and African populations to dairy consumption, where there was selection on variants in a region that encodes the lactase enzyme, which allows people to metabolize lactose. So a lot of the research on this phenomenon of local adaptation in humans has focused on single-nucleotide variants, or SNPs, and this is because these are the variants that are easiest to detect with short-read sequencing.\ \ But this focus on SNPs means that we're missing other classes of larger and more complex variants, which are collectively called structural variants, or SVs. These are large insertions, deletions, and inversions that are very difficult to detect with short-read sequencing due to their size, but they may constitute prominent but hidden targets of positive selection. It's only recently, with the application of long-read sequencing to some human samples, that we've been able to develop a comprehensive and high-quality catalog of SVs in humans.\ \ It would be really interesting to study positive selection and adaptation with SVs in diverse human populations, but long-read sequencing has not really been applied on a population-wide scale. 
But we do have short-read sequencing databases of thousands of individuals from diverse populations. We set out to combine the SV detection sensitivity of long reads with the quantity and availability of short-read sequencing data, and the way that we did this was through a method called variant graph genotyping, which I'm depicting with this schematic on the bottom left panel.\ \ With graph genotyping, instead of aligning short reads to a linear reference genome and inferring the presence of structural variants from how they align, you create a graph representation of the reference genome that incorporates known variants, including SVs or SNPs, as alternative paths in the graph. You then align the short reads from your sample to the graph along the path of best fit, which lets you determine accurate genotypes for your sample at each of these variants. Using a database of SVs that were discovered from long-read sequencing of 15 diverse human samples, we applied this genotyping method to three populations from the 1000 Genomes Project, using genotyping software called Paragraph. And I'm currently working on extending this genotyping to all of the populations in 1000 Genomes. Although this method limits me to only genotyping SVs that have been previously discovered from long-read sequencing, variants involved in local adaptation should be locally common, and so they're more likely to be present in my SV database even though they were discovered from a small number of samples.\ \ After genotyping SVs in these three populations using Paragraph, we did two quality control steps to filter out variants that were misgenotyped. We required all SVs to be genotyped in at least 98% of samples, and also removed SVs with excess heterozygote calls based on their deviation from Hardy-Weinberg equilibrium. After these filtering steps, we were left with around 90,000 genotyped SVs. 
We then looked at the distribution of linkage disequilibrium between these SVs and SNPs that were called in these 1000 Genomes samples. In the top middle panel is a histogram showing, on the x-axis, an SV's maximum LD with any SNP, and on the y-axis, the number of SVs with that max R^2 value. This is really interesting because there are a lot of SVs with low R^2 values, indicating that they are not strongly linked to a SNP, and that suggests that they may have been missed entirely in previous studies of selection and adaptation that only looked at SNP data.\ \ Next, we used the population branch statistic, or PBS, to identify SVs with extreme differences in allele frequency between populations, which is one indication of positive selection. We identified 330 SVs with outlier PBS scores greater than 0.5. You can see them in the plot in the bottom middle panel, which shows an SV's PBS score on the y-axis and its length on the x-axis. These SVs are also colored by their maximum LD with a SNP, and you can see that among the SVs with high PBS scores, there are many that are unlinked to SNPs. The top outlier in this plot, which is a deletion in the gene SLC35F3, is a positive control SV that was previously identified from short reads in 1000 Genomes, and it shows that our genotyping is working as expected.\ \ I'm going to highlight one particular example SV, which is a 447 bp deletion in a gene called PCDH15, which codes for a cadherin protein that's important for retinal and cochlear function. In the top right panel I'm showing the FST-based branch lengths of this SV in the three populations it was genotyped in, along with its allele frequencies in each population, and you can see visually that it has exceptionally high allele frequency in CHB, which is a Han Chinese population. 
This difference is especially apparent if you compare this SV's tree to the genome-wide average branch length tree on the right, and it suggests that this particular deletion may have undergone positive selection at some point in this Chinese population's history. Below are the aligned reads from a CHB sample that's homozygous for the SV, so that you can visualize it. So this is just one example of one interesting SV, but as you can see from the PBS plot in the previous panel, there are many other potentially interesting outliers, and we're excited to look at those further, and also to start establishing thresholds for significance based on neutral demographic simulations.\ \ So in conclusion, we applied variant graph genotyping to genotype structural variants on a population-wide scale, and we're also in the process of extending this genotyping to all the 1000 Genomes populations. Even in just the three populations we have now, we've been able to identify SVs that show signatures of positive selection. And finally, many of the SVs we genotyped were not strongly linked to known SNPs, which suggests that some SVs may constitute previously unknown adaptive loci.\ \ I want to thank my lab and my collaborators for their work and feedback on this project, and if you have any questions, suggestions, or thoughts, feel free to email me! My email is in the bottom right corner of the poster title. Thanks for stopping by!}