Evolutionary genomics of centromeric satellites in House Mice (Mus)
Centromeres play a conserved role in kinetochore assembly and chromosome segregation. Despite this functional importance, association studies currently ignore the megabases of DNA that span each centromere because the repetitive sequence content of these regions makes them refractory to assembly and analysis with short-read sequencing methods. As a result, centromere variation at the population level remains largely undefined and uncharacterized. To address these knowledge gaps, we used data from diverse house mice (genus Mus) to develop a k-mer-based bioinformatic strategy that uses whole-genome shotgun read libraries to quantify centromere satellite copy number and sequence variation. We applied this approach to a sample of 33 laboratory mouse strains and 67 wild-caught mice from 9 diverse mouse (Mus) populations and two divergent Mus species (Mus caroli and Mus pahari). Inbred laboratory strains exhibit striking differences in the relative copy number of minor (core centromere) satellite repeats in their genomes. Surprisingly, centromere satellite copy number divergence does not mirror the known phylogenetic relationships between inbred mouse strains. In addition to copy number differences, our analysis uncovers centromere satellite sequence polymorphisms among house mouse strains and subspecies. These differences demonstrate substantial turnover of centromere satellite repeat composition on short evolutionary time scales. Using a de novo assembly strategy based on highly abundant k-mers, we define, for the first time, a centromeric consensus sequence for the distantly related species Mus pahari. Lastly, we uncover phenotypic associations by correlating chromosomal instability phenotypes with centromeric satellite copy number. These results highlight the power of k-mer-based methods for inferring variation in the sequence content and structure of repetitive and dynamic genomic regions, and they provide the first in-depth phylogenetic portrait of centromere sequence evolution across Mus.
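The general idea of estimating relative satellite copy number from k-mers can be illustrated with a minimal sketch. This is not the pipeline used in this study: the function names, the default k, and the normalization by total k-mer count are assumptions chosen for illustration. The sketch measures the fraction of k-mers in a read library that match a satellite consensus sequence, treating the consensus as a tandem repeat.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_kmers(seq, k):
    """Yield each k-mer in canonical form (the smaller of the two strands).

    k-mers containing ambiguous bases such as N are skipped.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield min(kmer, reverse_complement(kmer))


def satellite_kmer_fraction(reads, consensus, k=15):
    """Fraction of read k-mers that match the satellite consensus.

    Hypothetical helper: the consensus is extended by its first k-1 bases
    so that k-mers spanning the junction of adjacent repeat units count.
    """
    satellite = set(canonical_kmers(consensus + consensus[:k - 1], k))
    hits = total = 0
    for read in reads:
        for kmer in canonical_kmers(read, k):
            total += 1
            hits += kmer in satellite
    return hits / total if total else 0.0
```

Comparing this fraction across samples sequenced to different depths gives a relative, not absolute, measure of satellite abundance. A real analysis would also need to account for sequencing error, GC bias, and k-mers shared with non-satellite sequence.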