Fig. 2 | BMC Methods

Fig. 2

From: scAI-SNP: a method for inferring ancestry from single-cell data

PCA of the training data and the heterogeneity within each population group. a The first two principal components of the 1KGP data. Marker shapes and colors distinguish the 26 population groups, with one color for each of the 5 geographic regions. The PCA plot shows a clear separation of the points by region. However, it also shows the admixed nature of the populations: many of the population groups form a continuum of points rather than distinct clusters. b After splitting the data into training and test sets (80–20%), we converted each test data point to a vector in the 600-dimensional PCA space obtained from the training data. We then used convex optimization to construct the closest possible vector to the test vector as a linear combination of the mean vectors of the 26 population groups, with the coefficients of the linear combination constrained to be non-negative and to sum to 1. For each test data point, we plotted the cosine similarity between the vector corresponding to the test data point and its reconstruction. The values range from 0 to 1, where 1 indicates that the data point can be perfectly explained by the mean PC vectors of the 26 population groups. Lower similarity values result from heterogeneity of the data within each population group and from the lack of population groups in the training data that would explain this heterogeneity through admixture.
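
The reconstruction described in panel b can be illustrated with a minimal sketch. The caption does not say which convex solver was used, so the projected gradient descent below is a stand-in chosen for illustration, and the function names (`project_to_simplex`, `reconstruct`, `cosine_similarity`) and the toy 2-dimensional "group means" are hypothetical, not taken from scAI-SNP. It finds non-negative weights that sum to 1 and minimize the squared distance between the weighted combination of group means and a test vector, then scores the fit with cosine similarity.

```python
import math


def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def combine(weights, means):
    """Weighted sum of the group mean vectors."""
    dim = len(means[0])
    return [sum(w * m[d] for w, m in zip(weights, means)) for d in range(dim)]


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keeps the threshold of the last index that passes
    return [max(v_i - theta, 0.0) for v_i in v]


def reconstruct(x, means, n_iter=2000):
    """Minimize ||sum_i w_i * means[i] - x||^2 over the simplex.

    Assumption: projected gradient descent stands in for whatever
    convex solver the authors used. Assumes at least one nonzero mean.
    """
    k = len(means)
    # Conservative step size: 1 / (2 * trace(M M^T)) is at most 1 / L.
    step = 1.0 / (2.0 * sum(dot(m, m) for m in means))
    w = [1.0 / k] * k
    for _ in range(n_iter):
        residual = [r - x_d for r, x_d in zip(combine(w, means), x)]
        grad = [2.0 * dot(m, residual) for m in means]
        w = project_to_simplex([w_i - step * g_i for w_i, g_i in zip(w, grad)])
    return w, combine(w, means)


def cosine_similarity(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


# Toy example: three hypothetical group means in a 2-D "PCA space".
means = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# A test point that is an exact mixture of groups 0 and 1.
inside = [0.3, 0.7]
w_in, recon_in = reconstruct(inside, means)
sim_in = cosine_similarity(inside, recon_in)

# A test point that no mixture of the group means can reproduce.
outside = [2.0, -1.0]
w_out, recon_out = reconstruct(outside, means)
sim_out = cosine_similarity(outside, recon_out)
```

In this toy setup the first point is recovered almost exactly, so its similarity is close to 1, while the second lies outside the convex hull of the group means and scores lower, mirroring the caption's reading of low similarity values.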
