What Are Buckets on the Google Cloud Platform?

TensorFlow For Dummies

The filesystem on your computer stores data in files and organizes files using directories. Cloud Storage stores data in objects and collects objects inside buckets. Buckets have a lot in common with directories, but there's one major difference: Buckets can’t be nested. That is, you can’t organize buckets into a hierarchy in the way that you can organize directories.

When working with buckets, you should be familiar with three points:

All load/store/delete operations involving Cloud Storage must identify at least one target bucket.
Every bucket has a globally unique name, a storage class, and a geographic location.
A project can create/delete buckets at most once every two seconds.

This last point is important. Creating and deleting buckets takes a significant amount of time, so Google recommends creating a small number of persistent buckets and reusing them as needed.
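If a script does need to create or delete several buckets in a row, it can pace itself to stay under the limit described above. The following is a minimal sketch, not an official client feature: the class name `BucketOpThrottle` and its parameters are hypothetical, and the two-second interval is taken from the text.

```python
import time


class BucketOpThrottle:
    """Spaces out bucket create/delete calls by a minimum interval."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        # min_interval: seconds between operations (two seconds, per the text).
        # clock and sleep can be swapped out, which makes the class easy to test.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous operation, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would call `wait()` immediately before each create or delete request. Even with a throttle like this, reusing a few persistent buckets remains the simpler approach.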

Bucket names

When you access a bucket, you need to identify it through its Uniform Resource Identifier (URI), which starts with gs://. A bucket’s name must be unique across all GCP projects, not just your own projects. Therefore, it’s a good idea to prepend your project ID to your bucket name, as in gs://myproject3712_tfbook.

The GCP sets the following criteria for bucket names:

A bucket's name must have more than two characters and fewer than 64.
The characters in a bucket’s name are limited to lowercase letters, numbers, dashes, underscores, and dots, and the name must start and end with a letter or number.
A bucket’s name can’t start with “goog”, and it can’t contain “google” or misspellings of “google.”
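You can check a candidate name against these criteria before asking Cloud Storage to create the bucket. The function below is a rough sketch under a few assumptions: the name `check_bucket_name` is hypothetical, letters are assumed to be lowercase, only the literal string "google" is checked (misspellings aren't detected), and the longer limit for dotted, domain-style names is ignored.

```python
import re

# Characters allowed by the criteria above (lowercase letters are an assumption).
_ALLOWED_CHARS = re.compile(r"[a-z0-9._-]+")


def check_bucket_name(name):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    # "More than two characters and fewer than 64" means 3 through 63.
    if not 3 <= len(name) <= 63:
        problems.append("length must be 3 to 63 characters")
    if not _ALLOWED_CHARS.fullmatch(name):
        problems.append("only letters, numbers, dashes, underscores, and dots are allowed")
    if name.startswith("goog"):
        problems.append('name cannot start with "goog"')
    if "google" in name:
        problems.append('name cannot contain "google"')
    return problems
```

For example, `check_bucket_name("myproject3712_tfbook")` returns an empty list, while `check_bucket_name("ab")` reports a length problem. Passing this check doesn't guarantee that the name is available, because another project may already have claimed it.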

If you create a bucket whose name contains dots, Cloud Storage assumes that you’ve named your bucket after a domain. The good news is that Cloud Storage extends the maximum name length of domain-named buckets to 222 characters. The bad news is that you need to convince Google that you own the domain.

Storage classes and locations

Every bucket has a storage class that determines its availability, pricing, and storage characteristics. The following table lists the four storage classes and their characteristics.

Storage Classes of Cloud Storage Buckets

Storage Class	ID	Description
Multi-Regional	`multi_regional`	Data frequently accessed across a wide area (Price: $0.026 per GB per month)
Regional	`regional`	Data frequently accessed in a limited region (Price: $0.02 per GB per month)
Nearline	`nearline`	Data accessed no more than once per month (Price: $0.01 per GB per month)
Coldline	`coldline`	Data accessed no more than once per year (Price: $0.007 per GB per month)

For example, suppose that you want a bucket to contain video that will be displayed across the world. In this case, you'd create a bucket and set its storage class to multi_regional. You can set a multi-regional bucket’s location to one of three values: eu, us, or asia.
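To compare storage classes by price, you can multiply the amount of stored data by the per-gigabyte rate. The sketch below uses the prices from the storage class table in this article. The function name `monthly_storage_cost` is hypothetical, and the estimate covers storage only, so it leaves out any retrieval, operation, or network charges.

```python
# Storage prices in dollars per GB per month, copied from the table above.
PRICE_PER_GB_MONTH = {
    "multi_regional": 0.026,
    "regional": 0.02,
    "nearline": 0.01,
    "coldline": 0.007,
}


def monthly_storage_cost(storage_class, gigabytes):
    """Estimate the monthly storage charge for the given class and data size."""
    if storage_class not in PRICE_PER_GB_MONTH:
        raise ValueError(f"unknown storage class: {storage_class}")
    if gigabytes < 0:
        raise ValueError("gigabytes must not be negative")
    return PRICE_PER_GB_MONTH[storage_class] * gigabytes


if __name__ == "__main__":
    # Compare the cost of storing 500 GB of video in each class.
    for storage_class in PRICE_PER_GB_MONTH:
        cost = monthly_storage_cost(storage_class, 500)
        print(f"{storage_class}: ${cost:.2f} per month")
```

At these rates, 500 GB of video costs about $13 per month in a multi-regional bucket and about $10 per month in a regional one.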

If your data needs to be accessed only in a specific region, you should set the bucket's storage class to Regional. You can associate a Regional bucket with one of 13 different locations, and the following table lists them all.

Location Codes of Regional Buckets

`us-east1`	`us-east4`	`us-central1`	`us-west1`
`asia-east1`	`asia-northeast1`	`asia-southeast1`	`asia-south1`
`australia-southeast1`	`europe-west1`	`europe-west2`	`europe-west3`
`southamerica-east1`
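A storage class and a location have to match: a multi-regional bucket takes one of the three wide-area codes, and a regional bucket takes one of the codes in the table. As an illustration, the sketch below checks the pairing and assembles a `gsutil mb` command. Using `gsutil` here is an assumption, since this article doesn't say which tool creates the bucket, and the function name `gsutil_mb_command` is hypothetical. The sketch handles only the two classes whose locations are described here.

```python
MULTI_REGIONAL_LOCATIONS = {"eu", "us", "asia"}

# The 13 regional codes from the table above.
REGIONAL_LOCATIONS = {
    "us-east1", "us-east4", "us-central1", "us-west1",
    "asia-east1", "asia-northeast1", "asia-southeast1",
    "asia-south1",  # the Mumbai region; Google's identifier ends in 1
    "australia-southeast1", "europe-west1", "europe-west2", "europe-west3",
    "southamerica-east1",
}


def gsutil_mb_command(bucket, storage_class, location):
    """Build the argument list for creating a bucket with gsutil mb."""
    if storage_class == "multi_regional":
        allowed = MULTI_REGIONAL_LOCATIONS
    elif storage_class == "regional":
        allowed = REGIONAL_LOCATIONS
    else:
        raise ValueError("this sketch covers only multi_regional and regional")
    if location not in allowed:
        raise ValueError(f"{location} is not a valid {storage_class} location")
    # -c sets the storage class and -l sets the location.
    return ["gsutil", "mb", "-c", storage_class, "-l", location, f"gs://{bucket}"]
```

For instance, `gsutil_mb_command("myproject3712_tfbook", "regional", "us-central1")` produces the arguments for `gsutil mb -c regional -l us-central1 gs://myproject3712_tfbook`. Because the list of regions changes, treat the hard-coded sets as a snapshot of the table rather than a complete reference.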

Google's list of supported regions grows regularly. For up-to-date information on storage classes and bucket locations, visit the GCP documentation.

About This Article

About the book author:

Matthew Scarpino has been a programmer and engineer for more than 20 years. He has worked extensively with machine learning applications, especially those involving financial analysis, cognitive modeling, and image recognition. Matthew is a Google Certified Data Engineer and blogs about TensorFlow at tfblog.com.