Bioinformatics For Dummies Cheat Sheet
Bioinformatics is the marriage of molecular biology and information technology. Web sites direct you to basic bioinformatics data and get down to specifics in helping you analyze DNA/RNA and protein sequences. All this data comes at you in several formats, so becoming familiar with various format types helps you know how to interpret and store the data.
Where to Find Bioinformatics Data
Bioinformatics combines information technology and molecular biology, so it makes sense that the Internet is the main arena for pursuing bioinformatics information. The following list offers links to helpful Web sites around the world and the areas that they specialize in:
- Ensembl: The Human Genome
- GenBank/DDBJ/EMBL: Nucleotide sequence
- PubMed: Literature references
- Swiss Institute of Bioinformatics: Annotated protein sequences
- InterProScan: Protein domains
- OMIM: Genetic diseases
- GenomeNet: Metabolic pathways
Bioinformatics Web Sites for Analyzing DNA/RNA Sequences
The bioinformatics Web sites in the following list offer help in analyzing DNA and RNA sequences. This type of analysis is at the heart of bioinformatics.
- Webcutter: Restriction map
- GenomeScan: Gene discovery
- blastn, tblastn, blastx: Database search
- The Genome Browser: Browse the ultimate data!
- Mfold: RNA structure prediction
Bioinformatics Web Sites for Analyzing Protein Sequences
With bioinformatics you can explore molecular biology using information technology. The links to the Web sites in the following list focus on protein sequences. Some offer searchable databases, others help you investigate a single protein; all are helpful:
- BLAST: Database homology search
- SRS: Database search
- Entrez: Database search
- InterProScan: Find protein domains
- ExPASy: Analyze a protein
- ClustalW: Multiple sequence alignment
- T-Coffee: Evaluate multiple alignment
- Jalview: Multiple alignment editor
- PSIPRED: Secondary structure prediction
- Cn3D: Display and spin 3-D structures
Bioinformatics Data Formats
When you’re using the Internet to help with your bioinformatics project, you come across data in all sorts of different formats. The following table can help you understand common bioinformatics formats and what you can and cannot do with them.
Format Name | Description |
---|---|
RAW | Sequence format that doesn’t contain any header. Spaces and numbers are usually tolerated. |
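FASTA | This is the default format. Sequence format that contains a header line starting with > followed by the sequence on the next line or lines, for example the header >name followed by the sequence AGCTGTGTGGGTTGGTGGGTT. |
PIR | Sequence format that’s similar to FASTA but less common. |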
MSF | Multiple sequence alignment format |
CLUSTAL | Multiple sequence alignment format (works with T-Coffee) |
TXT | Text format |
GIF, JPEG, PNG, PDF | Graphic formats. Do not use them to store important information. |
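
To make the RAW and FASTA rows of the table more concrete, here is a minimal Python sketch of how the two formats might be handled. The function names (clean_raw, parse_fasta, to_fasta) and the 60-character line width are illustrative assumptions, not part of any particular tool. The sketch also assumes that only letters belong to a sequence, so gap or stop symbols would be dropped.

```python
def clean_raw(text):
    """Drop the spaces, digits, and line breaks that RAW input usually tolerates.

    Assumption: only letters are sequence characters.
    """
    return "".join(ch for ch in text if ch.isalpha())


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Sequence lines are joined; stray spaces and digits are removed.
            chunks.append(clean_raw(line))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_fasta(name, sequence, width=60):
    """Format one sequence as FASTA text, wrapping lines at `width` characters."""
    lines = [">" + name]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"
```

For example, calling to_fasta("name", clean_raw(raw_text)) would turn a RAW sequence into a FASTA record, and parse_fasta would read it back as a (name, sequence) pair. For real projects, an established library is usually a safer choice than a hand-written parser.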