Cheat Sheet
Bioinformatics For Dummies
Bioinformatics is the marriage of molecular biology and information technology. Web sites direct you to basic bioinformatics data and get down to specifics in helping you analyze DNA/RNA and protein sequences. All this data comes at you in several formats, so becoming familiar with various format types helps you know how to interpret and store the data.
Where to Find Bioinformatics Data
Bioinformatics combines information technology and molecular biology, so it makes sense that the Internet is the main arena for pursuing bioinformatics information. The following list offers links to helpful Web sites around the world and the areas that they specialize in:
Ensembl: The Human Genome
GenBank/DDBJ/EMBL: Nucleotide sequence
PubMed: Literature references
Swiss Institute of Bioinformatics: Annotated protein sequences
InterProScan: Protein domains
OMIM: Genetic diseases
GenomeNet: Metabolic pathways
Bioinformatics Web Sites for Analyzing DNA/RNA Sequences
The bioinformatics Web sites in the following list offer help in analyzing DNA and RNA sequences. This type of analysis is at the heart of bioinformatics, the marriage of information technology and molecular biology.
Webcutter: Restriction map
GenomeScan: Gene discovery
blastn, tblastn, blastx: Database search
The Genome Browser: Browse the ultimate data!
Mfold: RNA structure prediction
Bioinformatics Web Sites for Analyzing Protein Sequences
With bioinformatics you can explore molecular biology using information technology. The links to the Web sites in the following list focus on protein sequences. Some offer searchable databases, others help you investigate a single protein; all are helpful:
BLAST: Database homology search
SRS: Database search
Entrez: Database search
InterProScan: Find protein domains
ExPASy: Analyze a protein
ClustalW: Multiple sequence alignment
T-Coffee: Evaluate multiple alignment
Jalview: Multiple alignment editor
PSIPRED: Secondary structure prediction
Cn3D: Display and spin 3-D structures
Bioinformatics Data Formats
When you're using the Internet to help with your bioinformatics project, you come across data in all sorts of different formats. The following table can help you understand common bioinformatics formats and what you can and cannot do with them.
| Format Name | Description |
|---|---|
| RAW | Sequence format that doesn't contain any header. Spaces and numbers are usually tolerated. |
| FASTA | This is the default format. Sequence format that contains a header line (>name) followed by the sequence (AGCTGTGTGGGTTGGTGGGTT). |
| PIR | Sequence format that's similar to FASTA but less common |
| MSF | Multiple sequence alignment format |
| CLUSTAL | Multiple sequence alignment format (works with T-Coffee) |
| TXT | Text format |
| GIF, JPEG, PNG, PDF | Graphic formats. Do not use them to store important information. |
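
To make the RAW and FASTA rows above concrete, here is a minimal Python sketch of how the two formats might be read. The function names `clean_raw` and `parse_fasta` are hypothetical, chosen for illustration only. The sketch assumes that sequences are made of letters and that anything else in the sequence lines (spaces, digits, line breaks) can be discarded, so alignment gap characters such as "-" would be dropped as well.

```python
def clean_raw(text):
    """Normalize a RAW sequence by keeping only letters.

    Assumption: spaces, digits, and line breaks are layout, not data.
    """
    return "".join(ch for ch in text if ch.isalpha())


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text.

    A line starting with ">" opens a new record; the lines that follow
    are joined into one sequence until the next header line.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, clean_raw("".join(chunks))))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
        # Sequence lines before any header are ignored in this sketch.
    if name is not None:
        records.append((name, clean_raw("".join(chunks))))
    return records


# The FASTA example from the table, split over two sequence lines.
example = ">name\nAGCTGTGTGG\nGTTGGTGGGTT\n"
print(parse_fasta(example))  # [('name', 'AGCTGTGTGGGTTGGTGGGTT')]

# A RAW sequence with position numbers and spacing.
print(clean_raw("1 AGCTG TGTGG 11 GTTGG"))  # AGCTGTGTGGGTTGG
```

For real projects, an established library parser is usually a safer choice than a hand-written one; this sketch only illustrates the structure of the two formats.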









