YFull: FAQ

YFull Definitions

BAM file - a compressed binary version of a SAM file that is used to represent aligned sequences of a comprehensive Y-DNA test such as Big Y. [A SAM file is a text-based format for storing biological sequences aligned to a reference sequence.]

Raw data file - In the FAQs, YFull may use "raw data" and "raw data file" to refer to a BAM file or a VCF file.

VCF (Variant Call Format) file - a text file format used for reporting the results of a comprehensive Y-DNA test. Its level of detail is significantly lower than that of a BAM file.
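To make the REF and ALT terms used later in these definitions more concrete, here is a minimal sketch of reading one data line of a VCF file. It assumes the standard tab-separated column order of the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, then optional FORMAT and sample columns, which are ignored here). The record shown is hypothetical and is not taken from a real YFull or Big Y file.

```python
from dataclasses import dataclass


@dataclass
class VcfRecord:
    """Fixed columns of one VCF data line (FORMAT and sample columns are ignored)."""
    chrom: str
    pos: int                 # 1-based position on the chromosome
    variant_id: str          # "." when the variant has no identifier
    ref: str                 # reference allele (REF)
    alt: list[str]           # one or more alternate alleles (ALT)
    qual: str
    filter_status: str
    info: dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse a single tab-separated VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags ("." if empty).
    info_dict: dict[str, str] = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","), qual, filt, info_dict)


# Hypothetical example line; the position and values are illustrative only.
record = parse_vcf_line("chrY\t12345678\t.\tC\tT\t50\tPASS\tDP=12")
print(record.ref, record.alt, record.info)   # C ['T'] {'DP': '12'}
```

For real files, a dedicated VCF library is the safer choice; this sketch only shows how REF and ALT sit side by side in each record.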

PM - the acronym for YFull's personal message system.

SNP definitions:

* SNP - single nucleotide polymorphism.

* Positive SNP - a SNP with a mutation. Also called a derived SNP or a SNP variant. The mutations in a customer's raw data sample (BAM file) establish the customer's genetic path from the deep past towards the present.

* Negative SNP - a SNP without a mutation. Also called an ancestral SNP. About 95% of a customer's SNPs are negative.

* Known SNP - a SNP that has been discovered and named by YFull or by another organization or person. There is a list of organizations and persons at the end of the Positive, Ambiguous and No call tabs on YFull's Hg and SNPs page.

The letter G (in a yellow box) after a Novel SNP means that the SNP is located in the coding region of a gene. Hovering your pointing device over the yellow box shows the name of the gene.

The letter H (in a red box) after a Novel SNP means that the SNP is in a homologous region. Hovering your pointing device over the red box will show additional information.

The letters MC (in a gray box) after a Novel SNP mean that the SNP is in a multi-copy region or a homologous region of the Y chromosome.

* Not used for analysis - This phrase appears after certain SNPs located with the Check SNPs tool. These SNPs do not have a star rating because they are not used for sample analysis, for a variety of reasons, such as being located in a homologous region or YFull having at most one sample that shows the SNP.

The words "up to [name of subclade]" after a Novel SNP mean that the SNP's eventual location may be as far in the past as the named subclade, or closer to the present than the current Terminal Hg.

* Private SNP - A SNP in a sample in the YFull database is considered private until it has been matched in another YFull database sample with the same "localization" (see the definition of Localization). On YFull's home page (available by logging in to your account), YFull indicates that Novel SNPs are considered to be private. In addition, some Known Positive SNPs may be designated as private. Details for each Private SNP can be reviewed with the YFull tool called "View this position in BAM": use the spyglass button or the BAM button for a particular SNP in Hg and SNPs, or for a particular HG19 or HG38 position in Novel SNPs. The tool provides extensive information, including whether a SNP or position is included in the YBrowse (YB) database. The YF abbreviation refers to YFull's database. YFull names a Private SNP (and adds it to the Yxxxxxx series of SNPs) if there is no known alternative name, it is of Best or Acceptable quality, it is not in a homologous region, and it has no more than one additional Localization.

* Localization - the determination that a Novel SNP lies within a particular haplogroup. The phrase "Additional localizations for variant" followed by a red number, as used in the Ambiguous tab of Novel SNPs, means that the SNP was observed in one or more additional haplogroups.

* Novel SNP - One type of Private SNP. See FAQ Definitions for Private SNP and YFSxxxxxx.

* YFSxxxxxx - The series of SNPs used to designate YFull's Novel SNPs that have not yet been added to the Yxxxxxx series by YFull. Sometimes called "YFull singles".

* YFCxxxxxx - The series of SNPs used to designate SNPs being considered by YFull for naming and inclusion in the YTree. Sometimes called "YFull Candidates."

* Yxxxxxx - The series used to designate SNPs that YFull has named.

* Terminal Hg - the haplogroup or branch with the youngest estimated age (age closest to the present) for a sample in the YFull database. In the context of the YTree, which is frequently updated to include new samples and new branches, Terminal Hg means the most recent branch discovered for a sample as of the time of a specific YTree revision.

* REF and ChrY reference number - REF is the abbreviation for reference allele, the allele found at a given position in the reference genome (a digital nucleic acid sequence database assembled by scientists). Each reference allele is identified by its own ChrY position number. The build/version and reference sequence numbers used by YFull are shown in YFull's Check SNPs and Browse RAW data tools and in YFull's Novel SNPs report.

* ALT - the abbreviation for alternate allele, used by YFull to identify the SNP variants it finds during its analysis of a sample (e.g. a BAM file), that is, alleles that differ from REF. By comparing ALT with REF, the mutation path of a sample can be determined.

* No Call - for a specific SNP or ChrY position in a sample, the YFull analysis did not detect any reads or other information.

* The letter X (in a gold box) shown after the star quality rating for a specific SNP designates the SNP as ambiguous. This means that YFull was not able to classify the SNP as positive, negative or no call. YFull sometimes uses the word controversial instead of ambiguous.

* Certain letters indicate contradictions - YFull follows the uniform system of lettering established by the IUPAC (International Union of Pure and Applied Chemistry). Certain letters from the IUPAC system indicate contradictions in the reads of a sample allele. For example, the letter M means A or C.
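The IUPAC letters and the positive, negative, no call and ambiguous categories above can be tied together in a short sketch. It decodes an IUPAC letter into the bases it stands for, then labels an observed call for a SNP whose ancestral and derived alleles are known. This is a simplified illustration under assumed inputs: one observed letter per position, None (or N) when there is no usable read, and any mixed letter treated as ambiguous. The function name `classify_call` is hypothetical, and YFull's actual analysis weighs read counts and quality, which this sketch does not attempt.

```python
from typing import Optional

# Standard IUPAC nucleotide codes: each letter maps to the set of bases it represents.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def decode(letter: str) -> set[str]:
    """Return the set of bases an IUPAC letter stands for."""
    return IUPAC[letter.upper()]


def classify_call(ancestral: str, derived: str, observed: Optional[str]) -> str:
    """Hypothetical helper: label an observed call relative to a SNP's alleles."""
    # Assumption: no reads (None) or a fully unknown base (N) counts as a no call.
    if observed is None or observed.upper() == "N":
        return "no call"
    bases = decode(observed)
    if bases == {derived.upper()}:
        return "positive"      # derived allele only: the SNP carries the mutation
    if bases == {ancestral.upper()}:
        return "negative"      # ancestral allele only
    return "ambiguous"         # mixed letter or unexpected base: the reads conflict


print(sorted(decode("M")))             # ['A', 'C']
print(classify_call("C", "T", "T"))    # positive
print(classify_call("C", "T", "Y"))    # ambiguous (Y means C or T)
print(classify_call("C", "T", None))   # no call
```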
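The conditions YFull applies before naming a Private SNP (listed in the Private SNP definition) can be read as a single yes/no test. The sketch below encodes those four conditions as stated. The `PrivateSnp` fields and the `qualifies_for_y_name` function are hypothetical names chosen for illustration, not part of any YFull interface, and the quality labels are assumed to be the words "Best" and "Acceptable" used in this FAQ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrivateSnp:
    """Hypothetical record of the facts the naming rule depends on."""
    known_alternative_name: Optional[str]   # None when no other name is known
    quality: str                            # e.g. "Best", "Acceptable", ...
    in_homologous_region: bool
    additional_localizations: int           # localizations beyond the sample's own


def qualifies_for_y_name(snp: PrivateSnp) -> bool:
    """Apply the four stated conditions for adding a SNP to the Yxxxxxx series."""
    return (
        snp.known_alternative_name is None
        and snp.quality in {"Best", "Acceptable"}
        and not snp.in_homologous_region
        and snp.additional_localizations <= 1
    )


print(qualifies_for_y_name(PrivateSnp(None, "Best", False, 0)))        # True
print(qualifies_for_y_name(PrivateSnp(None, "Acceptable", False, 2)))  # False
```

The second example fails only because it has two additional localizations, one more than the rule allows.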

YTree definitions:

* Haplogroup - a major branch of the YTree, such as Haplogroup E or Haplogroup I.

* Clade - from the Greek word klados, clade means branch, and subclade means a further branch.

* ybp - years before present: the number of years before the current year (for example, 2017) that a common ancestor is estimated to have lived or an event is estimated to have occurred. When ybp is used in reference to an individual analyzed sample, "present" means the year in which the YFull age estimation information for the sample is added to the YTree. [ybp is not measured from 1 January 1950, as it is in some parts of the scientific community.] YFull's definition of ybp should not be confused with the figures YFull uses for age estimation: 144.41 years per SNP (the assumed mutation rate) plus 60 years (the assumed age of the person whose sample is analyzed by YFull).

* bp - base pairs, as used when describing the length of coverage of a BAM sample.

* Asterisk (*) after the name of a subclade - an indication that the YFull database does not yet contain sufficient reliable information to permit the identification of a younger (toward the present) branch of the subclade.

* New - used to identify samples uploaded to YFull within the 45 days before the current date.

* YFxxxxx - used to identify samples provided to YFull by customers of testing companies (other than Enlighten Lab in China).

* ELTxxxxx - used to identify samples provided to YFull by customers of Enlighten Lab in China.

* TMRCA - time to most recent common ancestor.

* Symbol "i" - additional information, for example that several samples belong to the same tested person, or that other terminal subclades are possible for this sample.
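The two age-estimation figures mentioned in the ybp definition (144.41 years per SNP and 60 years for the tested person's age) combine into a simple calculation. The sketch below is a simplified reading of those figures, assuming the SNP count has already been adjusted to the regions YFull counts. It omits YFull's coverage correction and its averaging across descendant lines, so it will not reproduce YFull's published estimates. The function name `rough_age_ybp` is hypothetical.

```python
YEARS_PER_SNP = 144.41    # assumed mutation rate, expressed as years per SNP
TESTER_AGE_YEARS = 60     # assumed age of the person whose sample is analyzed


def rough_age_ybp(snp_count: float) -> float:
    """Simplified estimate: SNPs accumulated x years per SNP, plus the tester's age.

    snp_count is assumed to be already adjusted to the regions YFull counts;
    coverage correction and averaging over descendant lines are not modeled.
    """
    if snp_count < 0:
        raise ValueError("snp_count must be non-negative")
    return snp_count * YEARS_PER_SNP + TESTER_AGE_YEARS


print(round(rough_age_ybp(10)))   # 1504
```

Ten SNPs give 10 x 144.41 = 1444.1 years, and adding the 60-year tester age gives about 1504 ybp.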

STR definitions:

STR - short tandem repeat. STRs are short sequences of Y-DNA that are repeated a number of times in a row (in tandem).
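To illustrate what "repeated in tandem" means, the sketch below counts the longest run of back-to-back copies of a repeat unit in a DNA string. The motif and sequence are made-up examples, and `longest_tandem_run` is a hypothetical helper. Real STR allele calling relies on marker-specific definitions and sequencing reads, which this does not model.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive (tandem) copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, unit = sequence.upper(), motif.upper()
    step = len(unit)
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Walk forward one repeat unit at a time while the unit keeps matching.
        while seq[pos:pos + step] == unit:
            count += 1
            pos += step
        best = max(best, count)
    return best


# Hypothetical sequence: five GATA copies in a row between flanking bases.
print(longest_tandem_run("TT" + "GATA" * 5 + "CC", "GATA"))   # 5
```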

MTree definitions:

FASTA - A text-based format for nucleotide or peptide sequences. In the context of mtDNA, a FASTA file typically contains the mitochondrial genome sequence data that can be used for analysis and comparison.

HVR1 (Hypervariable Region 1) - A segment of mtDNA spanning positions 16024 to 16383. It is one of the regions with the highest mutation rates and is often used in genetic studies to determine maternal lineage.

HVR2 (Hypervariable Region 2) - Another segment of mtDNA with high variability, spanning positions 57 to 574. Like HVR1, HVR2 is used to trace maternal ancestry and study human migrations.

CR (Coding Region) - The part of the mitochondrial genome that encompasses the genes responsible for the production of proteins, rRNAs, and tRNAs. It spans positions 575 to 16000 and contains most of the mtDNA sequence.
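The three regions above can be expressed as a simple position lookup. The sketch below uses exactly the boundaries given in these definitions, treated as inclusive. It assumes positions are numbered against the rCRS, as is conventional, and it returns None for positions the definitions do not assign to any region (for example 16001 to 16023). The function name `mt_region` is hypothetical.

```python
from typing import Optional

# Region boundaries as given in the definitions (inclusive; rCRS numbering assumed).
REGIONS = [
    ("HVR2", 57, 574),
    ("CR", 575, 16000),
    ("HVR1", 16024, 16383),
]


def mt_region(position: int) -> Optional[str]:
    """Return the region name for an mtDNA position, or None if no region covers it."""
    for name, start, end in REGIONS:
        if start <= position <= end:
            return name
    return None


print(mt_region(16129))   # HVR1
print(mt_region(263))     # HVR2
print(mt_region(16010))   # None (between CR and HVR1 as defined here)
```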

* Back Mutation - A genetic change that reverses a previous mutation, potentially reverting the DNA sequence back to its ancestral state. This can complicate the interpretation of lineage and evolutionary history. In MTree, a back mutation is denoted by an exclamation mark. 
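Because MTree marks a back mutation with an exclamation mark, a mutation label can be split into its parts and the marks counted. The sketch below assumes a label shape of an optional base, a position, an optional base, then zero or more "!" characters (for example T152C!), with "!!" read as a second back mutation. That shape follows common PhyloTree-style notation and is an assumption; MTree's own display may differ. The names `parse_mutation` and `MtMutation` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed label shape: optional base, position, optional base, then zero or more '!'.
LABEL = re.compile(r"^([ACGT]?)(\d+)([ACGT]?)(!*)$", re.IGNORECASE)


class MtMutation(NamedTuple):
    position: int
    from_base: Optional[str]
    to_base: Optional[str]
    back_mutations: int        # number of '!' marks (0 = ordinary mutation)


def parse_mutation(label: str) -> MtMutation:
    """Split a mutation label such as 'T152C!' into its parts."""
    match = LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    src, pos, dst, bangs = match.groups()
    return MtMutation(int(pos), src.upper() or None, dst.upper() or None, len(bangs))


m = parse_mutation("T152C!")
print(m.position, m.back_mutations)   # 152 1
```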

Extras - Mutations that are unique to an individual's mtDNA sequence and are not currently used in the construction of the mtDNA tree. These could provide new insights into the maternal lineage as more data becomes available.

rCRS (Revised Cambridge Reference Sequence) - The standard reference sequence for human mtDNA, first published in 1981 and revised in 1999. It is derived from a European individual and is used as a comparison point for mtDNA studies.

RSRS (Reconstructed Sapiens Reference Sequence) - A more recent reference sequence that attempts to represent the mtDNA of the common ancestor of all modern humans, known as 'mitochondrial Eve'. It is used alongside rCRS to provide a more complete picture of human mtDNA variation.

Last updated on May 29, 2024