Protocol for using Sequencher and BLASTing Results
After doing this exercise, you should be able to:
– Tell the difference between a distinct peak and an ambiguous one in your chromatogram
– Edit bases in Sequencher
– Assemble a contig
– Use the Basic Local Alignment Search Tool (BLAST) to compare your sequence to those published in GenBank
Now that your samples are sequenced, they must be analyzed, aligned, and compared to previously published sequences. After cycle-sequencing all samples in the forward and reverse directions, obtain the sequence files from the Core Lab. Unzip the folder of sequences and import it into Sequencher.
- Open each sequence separately to view the nucleotide sequence. Once this window opens, select Show Chromatogram, which will open another window.
- Visually analyze the chromatogram to make sure that the peaks are distinct with little overlap and background.
- Verify that Sequencher has correctly identified each base.
- Use the slider (located in the top left of the window) to optimize the peak height if needed.
- To view secondary peaks better, click on a respective base (located just below the slider) to “turn it off.”
- If there is an incorrect read, correct it by highlighting the incorrect base and changing it to the correct one. For example, if it reads A but it should be T, just highlight the A and type T. If an N is incorrectly labeled and should be replaced by a base, then do the same for the N.
- You can also pull down the Sequence menu and select Call Secondary Peaks, but this will call all of them using the ambiguous DNA codes.
- If there appears to be a run of generally bad-quality bases at the beginning and/or end of the sequence, delete these bases.
- After editing each sequence, close all chromatogram windows. Check the quality (third column) of each sequence.
- Set the Assembly Parameters as follows:
- Assembly Algorithm: Select Dirty Data
- Optimize Gap placement: Check Use ReAligner and Prefer 3’ Gap Placement
- Minimum Match Percentage: Set to 85
- Minimum Overlap: Set to 45
- Assemble By Name: Optional (convenient if you have a lot of sequences)
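To get a feel for what Minimum Match Percentage and Minimum Overlap control, the Python sketch below checks whether two reads share an end-to-end overlap that satisfies both thresholds. It is not Sequencher's algorithm: Sequencher's assembly also places gaps, which this ungapped comparison ignores. The function names reverse_complement and best_overlap are made up for illustration.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string; any character other than A, C, G, T becomes N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def best_overlap(left, right, min_overlap=45, min_match_pct=85.0):
    """Find the longest ungapped overlap between the end of `left` and the
    start of `right` that meets both thresholds.

    Returns (overlap_length, percent_identity), or None if no overlap of at
    least `min_overlap` bases reaches `min_match_pct` identity.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, then shorter ones.
    for length in range(max_len, min_overlap - 1, -1):
        tail = left[-length:]
        head = right[:length]
        matches = sum(1 for a, b in zip(tail, head) if a == b)
        pct = 100.0 * matches / length
        if pct >= min_match_pct:
            return length, pct
    return None
```

Because the reverse read comes from the opposite strand, it would need to be reverse-complemented before the comparison, for example best_overlap(forward_read, reverse_complement(reverse_read)). With the settings above, reads that overlap by fewer than 45 bases, or whose overlap is less than 85% identical, would not be joined.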
- Highlight the forward and reverse sequence for a sample.
- Click the Assemble Automatically button to form a contig.
- A window reading Assembly Completed will appear, noting the number of contigs created.
- Close this window and then click on the contig to open it. A new window will appear with a graphical view of the contig.
- Click on the Bases tab to view a consensus sequence of these two directions.
- Note that the consensus sequence is along the baseline and the two sequences are above it. Mismatches are denoted by a mark along the consensus sequence.
- To view and edit a particular section, click on a selected base of the consensus (the above bases will highlight as well) and click Show Chromatograms.
- Open your browser and go to the NCBI BLAST website (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Select Nucleotide BLAST.
- In Sequencher, highlight the entire consensus sequence, copy it, and paste it into the box labeled Enter Query Sequence on the BLAST webpage.
- To mass BLAST a collection of contigs:
- Highlight all contigs
- Export as Consensus
- Select FASTA format
- Export the file and then upload it on the NCBI BLAST webpage.
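For reference, a FASTA file holding several consensus sequences is plain text: each record starts with a ">" header line carrying the sequence name, followed by the bases. Sequencher writes this file for you; the Python sketch below only illustrates the layout that BLAST expects. The function name to_fasta and the 70-character line width are assumptions made for illustration.

```python
def to_fasta(records, width=70):
    """Format (name, sequence) pairs as multi-FASTA text.

    Each record gets a '>' header line, then the sequence wrapped at
    `width` characters per line.
    """
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"
```

Opening the exported file in a text editor and counting the ">" lines is a quick way to confirm that every contig was included before uploading.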
- Change the page default settings.
- Under Choose Search Set, click on Others
- Under Program Selection, click on Highly Similar Sequences.
- Now, click on the BLAST button.
- A new window will appear in a few seconds with a graphic summary, description and alignment of the “best” blasts or matched database sequences to your submitted sequence.
- Scroll down to the Descriptions tab to check the statistics on these matches. The score reflects the similarity between your query sequence and the matched sequence; a higher score means a better alignment. The E value is the number of alignments with an equal or better score that you would expect to find by chance. Therefore, a significant E value should be 10^-5 or less.
- The query coverage is the percent of the query length that is included in the aligned segments.
- Sort by Max Score or E value, and ensure that all of the significant matches are to segment 4 (the HA gene) and that their H subtypes all match.
- Record this subtype as the H subtype of your sample in the respective results folder.
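The last two checks (keep only matches with an E value of 10^-5 or less, then confirm that they all share one H subtype) can be sketched in Python as follows. This assumes you have typed the relevant columns of the Descriptions table into a list of dictionaries by hand; the keys "evalue" and "subtype" and the function name consensus_subtype are hypothetical and are not part of any BLAST output format.

```python
def consensus_subtype(hits, max_evalue=1e-5):
    """Return the single H subtype shared by all significant hits.

    `hits` is a list of dicts with (hypothetical) keys 'evalue' and
    'subtype'. Returns None if there are no significant hits or if the
    significant hits disagree on the subtype.
    """
    significant = [h for h in hits if h["evalue"] <= max_evalue]
    subtypes = {h["subtype"] for h in significant}
    if len(subtypes) == 1:
        return subtypes.pop()
    return None
```

For example, with the made-up input [{"evalue": 1e-50, "subtype": "H3"}, {"evalue": 0.5, "subtype": "H1"}], the second match is discarded as not significant and the function returns "H3". A result of None means the significant matches disagree (or there are none), and the sample should be re-examined before a subtype is recorded.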