SNPlinks: A brief primer using BLAST
BLAST (Basic Local Alignment Search Tool) measures sequence similarity by comparing a submitted sequence against the NCBI GenBank databases (http://www.ncbi.nlm.nih.gov/blast/).
Normally a sequence is pasted into a query box and submitted to a queue; BLAST then returns a report of the matches found. BLAST includes tools for comparing nucleotide sequences, protein sequences and whole genomes, but most forensic research will only need nucleotide sequence analysis.
Nucleotide BLAST is used in two ways: finding a location for a submitted sequence (the query being: does the submitted sequence exist in a GenBank database?) and checking for coincidental similarity in a sequence, normally a PCR primer (the query being: how specific or unique is the submitted sequence?). The alignment comparisons required for these two queries are provided by MegaBLAST and standard BLAST (blastn) respectively. In developing SNP typing assays blastn will be the only tool needed. The two statistics that annotate the returns from blastn, the bit score and the E-value, are listed in the detailed report that BLAST returns from a query. This report consists of three parts:
i. the header, with query sequence information and a summarizing graphical overview,
ii. single-line descriptions of the matching sequences, and
iii. the matching alignments themselves.
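For a primer specificity check, the hits listed in part ii can be screened in a script rather than by eye. The following is a minimal sketch under assumptions that go beyond this primer: it assumes the search was run with the standalone BLAST+ blastn program using its 12-column tabular output (-outfmt 6) instead of the web form. The function names and the sample lines (including the subject names and scores) are invented purely for illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str      # identifier of the matched database sequence
    identity: float   # percent identity over the aligned region
    length: int       # alignment length in nucleotides
    evalue: float     # expect value reported by BLAST
    bitscore: float   # bit score reported by BLAST


def parse_tabular(text: str) -> List[Hit]:
    """Parse 12-column tab-separated BLAST output into Hit records.

    Assumed column order: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits


def full_length_perfect_hits(hits: List[Hit], primer_length: int) -> List[Hit]:
    """Keep hits that match the whole primer with 100% identity."""
    return [h for h in hits if h.length == primer_length and h.identity == 100.0]


# Invented sample: one intended target and two partial or imperfect matches.
SAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ["primer1", "hypothetical_target", "100.000", "20", "0", "0",
         "1", "20", "1500", "1519", "0.05", "40.1"],
        ["primer1", "hypothetical_offtarget_1", "100.000", "15", "0", "0",
         "3", "17", "220", "234", "12", "30.2"],
        ["primer1", "hypothetical_offtarget_2", "95.000", "20", "1", "0",
         "1", "20", "9000", "9019", "3.1", "32.2"],
    ]
)

hits = parse_tabular(SAMPLE)
perfect = full_length_perfect_hits(hits, primer_length=20)
print([h.subject for h in perfect])
```

A primer that returns more than one full-length perfect hit deserves a closer look at the alignments in part iii. Partial matches are not necessarily harmless, so the filter is a first pass only.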
The graphical overview shows the query sequence as a numbered red bar and, below this, the database hits as coloured bars aligned to the query. Colour and vertical position both reflect the alignment score: colours run from red (highest) through to black (lowest), and the best-scoring hits are drawn closest to the query bar.
The single-line descriptions give both a bit score, indicating the goodness of fit of each matched sequence, and an expect value (E-value). The bit score is calculated from a formula that takes into account all the matching nucleotides, mismatches and gaps: the higher the score, the better the alignment (statistical guide at: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html).
The E-value indicates the statistical significance of the alignment. It is the number of alignments with an equal or better score that would be expected by chance, and it reflects both the size of the database searched and the scoring system used. The lower the E-value, the more significant the hit. For example, a value of 0.05 means that 0.05 such matches are expected by chance alone; for values this small the E-value is close to a probability, here roughly 5 in 100 or 1 in 20. This statistic requires careful interpretation in the context of the sequence comparison being performed.
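The relationship between the two statistics can be made concrete with a short calculation. The sketch below uses the formula given in the statistical guide cited above, E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the database length, together with P = 1 - exp(-E) for the probability of at least one chance match. The function names and the example lengths are assumptions for illustration, and BLAST itself applies corrected "effective" lengths, so reported E-values will differ somewhat from this simplified arithmetic.

```python
import math


def expected_hits(bit_score: float, query_length: float, database_length: float) -> float:
    """E-value from a bit score: E = m * n * 2**(-S').

    Simplification: raw lengths are used here rather than the
    effective lengths that BLAST computes internally.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


def chance_probability(e_value: float) -> float:
    """Probability of at least one chance match: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


# Illustrative numbers only: a 20-nucleotide primer searched against
# a database of 3e9 nucleotides, with a hit scoring 40 bits.
e = expected_hits(bit_score=40, query_length=20, database_length=3e9)
p = chance_probability(e)
print(f"E = {e:.4f}, P = {p:.4f}")
```

For small E-values the two numbers are nearly equal, which is why an E-value of 0.05 is often read as a probability. For larger values they diverge: an E-value of 3 means three chance matches are expected, not a probability of 3.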
A helpful BLAST program selection guide can be found at: http://www.ncbi.nlm.nih.gov/BLAST/producttable.shtml
Chris Phillips 18.02.2004