Test Date: December 28, 2016
Report Short URL: http://phylos.me/g2217
Closest Genetic Relatives
Genetic Novelty Score
Genetically Distant Varieties
The Genotype Seal
Colors - the colors used on the seal are the same as those used to define the population profiles in the report (e.g., green for Landrace and red for Skunk).
Data - the data represent the set of genetic variants that are most predictive of each population. A variety that is predominantly Skunk will show mostly dark red, because it matches the profile of that population, and a mixture of light and dark colors for the other populations.
Comparison - the more genetically similar two varieties are, the more similar their seals will look.
URL - the URL takes you to the genetic report for the variety.
Lines between samples show family relationships using a metric called identity by descent (IBD) (plink v1.9; Purcell et al. 2007). Specifically, the proportion IBD between two samples is the probability of sharing both alleles at a locus (e.g., AA, AA) plus 1/2 of the probability of sharing a single allele at a locus (e.g., AA, Aa). We use a second metric, genetic similarity, to identify putative family members as well as clones. Genetic similarity is simply the number of shared alleles divided by the total number of alleles compared between two samples. Clones are designated as those samples that have extremely high genetic similarity; the allowed difference between two clones is set slightly higher than the average technical error rate of genotyping. Based on analysis of known pedigrees, we have also set thresholds for likely immediate family relationships. Familial relationships are useful for several reasons. Many varieties within the Galaxy have the same name, yet are genetically distinct. Familial relationships may help determine whether a variety has been mislabeled. They may also help validate pedigrees of varieties.
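The two metrics described above can be illustrated with a minimal sketch. It is not the plink or Phylos implementation: the function names, the two-letter genotype strings, and the use of None for a missing call are all assumptions made for the example.

```python
def genetic_similarity(geno_a, geno_b):
    """Shared alleles divided by total alleles compared between two samples.

    Genotypes are lists of 2-character strings such as "AA" or "AG";
    None marks a missing call (hypothetical encoding for this sketch).
    """
    shared = 0
    compared = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip loci not called in both samples
        remaining = list(b)
        for allele in a:
            # Match each allele of sample A against an unused allele of B.
            if allele in remaining:
                remaining.remove(allele)
                shared += 1
        compared += 2
    if compared == 0:
        raise ValueError("no loci with calls in both samples")
    return shared / compared


def proportion_ibd(p_share_both, p_share_one):
    """P(sharing both alleles) plus 1/2 of P(sharing a single allele)."""
    return p_share_both + 0.5 * p_share_one


sample_a = ["AA", "AG", "GG", "CT"]
sample_b = ["AA", "AA", "GG", "CC"]
similarity = genetic_similarity(sample_a, sample_b)  # 6 shared of 8 compared
ibd = proportion_ibd(0.25, 0.5)
```

In this toy input, six of the eight compared alleles match, so the similarity is 0.75. The IBD call uses illustrative probabilities only; in practice they would be estimated from genome-wide genotype data.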
Genetic Novelty Score
Genetic Novelty Score indicates how many other varieties are nearby on the Phylos Galaxy. The density of surrounding varieties is a proxy for how common a genetic background is; rare genotypes have very few neighbors on the Phylos Galaxy. Neighborhood scores are based on a radial density metric, which is the density of varieties on the Phylos Galaxy within cubes of increasing volume (much like layers of an onion). The proportion of Phylos Galaxy samples in each of the three radial density score categories is shown along the horizontal bar. Samples considered rare are in the 90th percentile of lowest radial density scores; uncommon samples are in the 70th to 90th percentile. Locations of varieties on the Phylos Galaxy are based on a Principal Components Analysis (PCA) (plink v1.9; Purcell et al. 2007). PCA is a dimension reduction technique for analyzing multivariate data sets, like this genetic variant data set. PCA finds linear combinations of variables that capture as much of the variance in the data as possible; each linear combination is one principal component (PC) and is orthogonal to the other PCs.
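As a rough illustration of a radial density metric, the sketch below counts neighbors inside cubes of increasing size centered on one sample and divides by the cube volume. How the report combines these densities into a single score is not stated, so the function only returns the per-cube densities; the function name, the coordinates, and the half-widths are hypothetical.

```python
def radial_densities(points, center_index, half_widths):
    """Density of other points inside cubes of increasing size.

    points: list of coordinate tuples (e.g., positions on the first PCs).
    center_index: index of the sample of interest.
    half_widths: half the edge length of each cube, in increasing order.
    """
    center = points[center_index]
    dims = len(center)
    densities = []
    for h in half_widths:
        # A point lies inside the cube when every coordinate is within h
        # of the center (Chebyshev distance <= h).
        count = sum(
            1
            for i, p in enumerate(points)
            if i != center_index
            and max(abs(x - c) for x, c in zip(p, center)) <= h
        )
        volume = (2 * h) ** dims
        densities.append(count / volume)
    return densities


galaxy = [(0, 0, 0), (0.5, 0.5, 0.5), (1, 1, 1), (3, 0, 0)]
densities = radial_densities(galaxy, 0, [1, 4])
```

A sample with few neighbors yields low densities at every cube size, which corresponds to the rare end of the scale described above.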
Population Structure is estimated using the program Admixture (Alexander et al. 2009), which is a model-based method that uses genotype data to infer population structure and assign individuals to populations. Population structure will evolve as more samples are added to the Phylos Galaxy.
The Genetic Variation plot shows the distribution of heterozygosity levels for varieties in the Phylos Galaxy. Most samples that are drug varieties have high levels of genetic variation, possibly as a result of hybridization between divergent parents. A cannabis breeder may use multiple rounds of inbreeding to select for and stabilize the phenotype of a variety in order to sell seeds or grow from seeds instead of clones. Each generation could be genotyped to measure the progress of inbreeding, which helps estimate the expected phenotypic variation of the progeny. Genetic Variation is estimated by the inbreeding coefficient (F) using VCFtools (Danecek et al. 2011).
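For intuition, a method-of-moments inbreeding coefficient can be sketched as F = (O - E) / (N - E), where O is the observed number of homozygous sites, E the number expected from allele frequencies, and N the number of sites with a call. This is a simplified sketch, not the VCFtools code: it assumes biallelic sites, applies no small-sample correction, and uses a hypothetical encoding of genotypes as alternate-allele counts.

```python
def inbreeding_coefficient(genotypes, alt_freqs):
    """Method-of-moments estimate F = (O - E) / (N - E).

    genotypes: alternate-allele count per site (0, 1 or 2), None if missing.
    alt_freqs: population alternate-allele frequency per site.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_sites = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # missing call, site not counted
        n_sites += 1
        if g != 1:
            observed_hom += 1
        # Expected homozygosity under random mating: 1 - 2pq.
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_sites == expected_hom:
        raise ValueError("no polymorphic sites with calls")
    return (observed_hom - expected_hom) / (n_sites - expected_hom)


f_value = inbreeding_coefficient([0, 2, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

Values near 1 indicate a highly inbred (mostly homozygous) sample, while values at or below 0 indicate as much or more heterozygosity than expected, as might follow hybridization between divergent parents.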
Genetically Distant Varieties
Genetically Distant varieties are genetically divergent from the node (sample) of interest. To select a diverse set of distant neighbors, we implemented an algorithm that maximizes the sum of the distances between the node of interest and the chosen distant neighbors; this is similar to the heuristic Furthest Point First algorithm (Gonzalez 1985). The distances used are from a Principal Components Analysis (PCA) (plink v1.9; Purcell et al. 2007). PCA is a dimension reduction technique for analyzing multivariate data sets, like this genetic variant data set. PCA finds linear combinations of variables that capture as much of the variance in the data as possible; each linear combination is one principal component (PC) and is orthogonal to the other PCs. This widget plots genetically distant varieties along the top two principal components.
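One way to picture this selection is a greedy loop: at each step, pick the candidate whose summed distance to the node of interest and to the neighbors already chosen is largest. The sketch below is an assumption about how such a heuristic could look, not the report's actual code; note that the classic Furthest Point First algorithm maximizes the minimum distance, whereas this variant uses the sum, as the text describes. The function name and coordinates are hypothetical.

```python
import math


def pick_distant(points, focus_index, k):
    """Greedily choose k points far from the focus and from each other.

    points: list of coordinate tuples (e.g., positions on the top PCs).
    focus_index: index of the node (sample) of interest.
    Returns the indices of the chosen distant neighbors, in pick order.
    """
    chosen = [focus_index]
    remaining = set(range(len(points))) - {focus_index}
    while remaining and len(chosen) - 1 < k:
        # Score each candidate by its summed Euclidean distance to the
        # focus and to every neighbor picked so far; ties go to the
        # lowest index because candidates are scanned in sorted order.
        best = max(
            sorted(remaining),
            key=lambda i: sum(math.dist(points[i], points[j]) for j in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen[1:]


pcs = [(0, 0), (1, 0), (10, 0), (-9, 0), (0, 1)]
distant = pick_distant(pcs, 0, 2)
```

With these toy coordinates the first pick is the point at (10, 0), and the second is (-9, 0), which is far from both the focus and the first pick, so the selection spreads across the plot rather than clustering on one side.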