Amazon Web Services’ Elastic Compute Cloud

By Bernard Golden

The Elastic Compute Cloud (EC2) is the most widely used Amazon Web Service. Even the term “cloud computing” emphasizes computing, and it’s computing that EC2 delivers: at scale, in a wide variety of types, and at ridiculously low prices.

EC2 is based on virtualization — the process of using software to create virtual machines that then carry out all the tasks you’d associate with a “real” computer using a “real” operating system. If you have any experience with virtualization, you’ll understand the foundation of EC2.

The foundation isn’t everything to everyone, though. There are significant differences between EC2 and traditional virtualization, typified by products such as VMware ESX and Citrix XenServer — differences that you’ll recognize quickly enough when you begin to use EC2. In a standard virtualization product, a virtual machine is either running or quiescent (a fancy way of saying “not running”).

EC2 has come up with its own terminology: When a virtual machine is running in EC2, it’s referred to as an instance; when an instance isn’t running in EC2, it’s referred to as an image. Likewise, in virtualization, a virtual machine is started, and in EC2 an instance is launched.

Terminology aside, a more significant difference between virtualization and EC2 lies in how a virtual machine or instance is stored when it isn’t running. A virtualization product stores the entire virtual machine on disk; the only difference between a running virtual machine and a quiescent one is that the running machine is brought into the virtual machine manager and made operational. The disk storage requirements are exactly the same.

The implication is that you may be wasting disk storage. If you have, say, a virtual machine with 1.7GB of disk space, but the operating system and application software require only 300MB, you have 1.4GB of unused storage and, by extension, 1.4GB of wasted disk space.

EC2, by contrast, stores only the actual data necessary to provide the virtual machine and operating system, so only 300MB is stored on disk when the instance is not running — and, crucially, you don’t pay for the 1.4GB of unused disk space that otherwise would sit empty. This arrangement reduces your EC2 cost when your instances are not running.
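The arithmetic behind this comparison can be sketched in a few lines of Python. The function name and the use of decimal units (1GB = 1,000MB) are assumptions made for illustration; the figures are the ones from the example above.

```python
def stored_mb(allocated_mb, used_mb, stores_full_disk):
    """Return how many MB sit on disk while the machine is not running."""
    # Traditional virtualization keeps the whole allocated disk;
    # the S3-backed EC2 model described here keeps only the data in use.
    return allocated_mb if stores_full_disk else used_mb


ALLOCATED_MB = 1700  # 1.7GB virtual disk (assuming 1GB = 1,000MB)
USED_MB = 300        # operating system plus application software

traditional = stored_mb(ALLOCATED_MB, USED_MB, stores_full_disk=True)
ec2_image = stored_mb(ALLOCATED_MB, USED_MB, stores_full_disk=False)
wasted_mb = traditional - ec2_image

print(f"Traditional VM at rest:  {traditional}MB")
print(f"EC2 image at rest:       {ec2_image}MB")
print(f"Unused space not stored: {wasted_mb}MB")
```

The 1,400MB difference is the 1.4GB of empty space that the traditional approach keeps on disk and the S3-backed image does not.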

This is only a simplified version of what really happens. AWS actually has two types of Amazon Machine Images (AMIs). The description so far covers images that are stored in the Amazon Simple Storage Service (known as S3). These S3-backed images are given the standard treatment: a full file system while running as an instance but a stripped-down image when not running.

The other type of image, referred to as an EBS-backed image (because of its links to the AWS product Elastic Block Store), operates more like traditional virtualization, with full storage of the entire instance file system, even if much of it has no data.
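If you want to tell the two image types apart programmatically, one possible approach is to look at the root device type that EC2 reports for an image. The sketch below assumes a description shaped like an entry in the EC2 DescribeImages response, where the RootDeviceType field is either "ebs" or "instance-store" (the latter being AWS's own name for what this article calls S3-backed). The helper function and the sample image IDs are hypothetical, and no call to AWS is made.

```python
def image_backing(image):
    """Classify an image description as EBS-backed or S3-backed."""
    root_type = image.get("RootDeviceType")
    if root_type == "ebs":
        return "EBS-backed"
    if root_type == "instance-store":
        # AWS calls these "instance store-backed"; the image itself lives in S3.
        return "S3-backed"
    return "unknown"


# Made-up sample data standing in for real image descriptions.
sample_images = [
    {"ImageId": "ami-example-1", "RootDeviceType": "ebs"},
    {"ImageId": "ami-example-2", "RootDeviceType": "instance-store"},
]

for image in sample_images:
    print(image["ImageId"], "->", image_backing(image))
```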

When an S3-backed instance is shut down (terminated), the changes made to its file system aren’t stored. The next time the image is launched, the running instance reflects the layout of the image as originally created. It’s similar to a gold image or a LiveCD (in case you’ve used a CD-based Linux system).

Understanding the transient nature of the file system for S3-backed instances is critical. No changes made to an instance persist after termination, unlike in any operating system you’ve ever used (except for a LiveCD). If your instance will process and save data, you must find a way to save the data outside of the instance. Simply put, S3-backed images don’t make data persistent.
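The behavior described above can be illustrated with a toy model. This is not the AWS API: the Image and Instance classes, the file paths, and the external_store dictionary are all invented for the sketch, with external_store standing in for whatever durable storage you choose outside the instance.

```python
import copy


class Image:
    """A stored image: the file system layout as originally created."""

    def __init__(self, files):
        self.files = dict(files)

    def launch(self):
        return Instance(self)


class Instance:
    """A running copy of an image with a transient file system."""

    def __init__(self, image):
        # Every launch starts from a fresh copy of the image's files.
        self.files = copy.deepcopy(image.files)
        self.running = True

    def write(self, path, data):
        self.files[path] = data

    def terminate(self):
        # Changes are discarded; nothing is written back to the image.
        self.files = {}
        self.running = False


external_store = {}  # hypothetical stand-in for storage outside the instance

image = Image({"/etc/app.conf": "defaults"})

first = image.launch()
first.write("/data/results.txt", "42 rows processed")
# Save the data outside the instance before terminating it.
external_store["results.txt"] = first.files["/data/results.txt"]
first.terminate()

second = image.launch()
print("/data/results.txt" in second.files)  # False: the change did not persist
print(external_store["results.txt"])        # the externally saved copy survives
```

The second launch sees only the original image contents, while the copy saved to the external store is still available.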