Bioinformatics Data Formats

By Jean-Michel Claverie, Cedric Notredame

Part of Bioinformatics For Dummies Cheat Sheet

When you’re using the Internet to help with your bioinformatics project, you come across data in all sorts of different formats. The following table can help you understand common bioinformatics formats and what you can and cannot do with them.
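
Two of the formats in the table, RAW and FASTA, are simple enough to read with a few lines of code. The Python sketch below is only an illustration. The function names `clean_raw` and `parse_fasta` are hypothetical and do not come from any library. The sketch assumes that RAW sequences contain only letters once spaces and numbers are removed, and that each FASTA record starts with a `>name` header line.

```python
import re


def clean_raw(text):
    """Return a RAW sequence with spaces, digits, and line breaks removed.

    Assumption: every character that is not a letter is noise.
    """
    return re.sub(r"[^A-Za-z]", "", text).upper()


def parse_fasta(text):
    """Return a dict that maps each FASTA header name to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: everything after '>' is treated as the name.
            name = line[1:].strip() or "unnamed"
            records[name] = []
        elif name is not None:
            # Sequence line: belongs to the most recent header.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


raw_example = "1 atgc cgta 10\n11 ttga"
fasta_example = ">seq1\nATGC\nCGTA\n>seq2\nTTGA\n"

print(clean_raw(raw_example))      # ATGCCGTATTGA
print(parse_fasta(fasta_example))  # {'seq1': 'ATGCCGTA', 'seq2': 'TTGA'}
```

For real projects, an established parser is usually a safer choice than hand-written code like this, because sequence files in the wild often bend the rules.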

Format Name            Description
RAW                    Sequence format with no header line. Spaces and numbers are usually tolerated.
FASTA                  The default sequence format, accepted by most programs. It contains a header line (>name) followed by the sequence on the lines below.
PIR                    Sequence format that’s similar to FASTA but less common.
MSF                    Multiple sequence alignment format.
CLUSTAL                Multiple sequence alignment format (works with T-Coffee).
TXT                    Text format.
GIF, JPEG, PNG, PDF    Graphic formats. Do not use them to store important