Understanding How Permanent Storage Works for Python Programming

By John Paul Mueller

You don’t need to understand absolutely every detail about how permanent storage works with Python in order to use it. For example, just how the drive spins (assuming that it spins at all) is unimportant. However, most platforms adhere to a basic set of principles when it comes to permanent storage. These principles have developed over a period of time, starting with mainframe systems in the earliest days of computing.

Data is generally stored in files (with pure data representing application state information), but you could also find it stored as objects (a method of storing serialized class instances). probably know about files already because almost every useful application out there relies on them. For example, when you open a document in your word processor, you’re actually opening a data file containing the words that you or someone else has typed.

Files typically have an extension associated with them that defines the file type. The extension is generally standardized for any given application and is separated from the filename by a period, such as MyData.txt. In this case, .txt is the file extension, and you probably have an application on your machine for opening such files. In fact, you can likely choose from a number of applications to perform the task because the .txt file extension is relatively common.

Internally, files structure the data in some specific manner to make it easy to write and read data to and from the file. Any application you write must know about the file structure in order to interact with the data the file contains. File structures can become quite complex.

Files would be nearly impossible to find if you placed them all in the same location on the hard drive. Consequently, files are organized into directories. Many newer computer systems also use the term folder for this organizational feature of permanent storage. No matter what you call it, permanent storage relies on directories to help organize the data and make individual files significantly easier to find. To find a particular file so that you can open it and interact with the data it contains, you must know which directory holds the file.

Directories are arranged in hierarchies that begin at the uppermost level of the hard drive. For example, when working with the downloadable source code for this book, you find the code for the entire book in the BPPD directory within the user folder on your system. On a Windows system, that directory hierarchy is C:\Users\John\BPPD. However, other Mac and Linux systems have a different directory hierarchy to reach the same BPPD directory, and the directory hierarchy on your system will be different as well.

Notice that you use a backslash (\) to separate the directory levels. Some platforms use the forward slash (/); others use the backslash. The book uses backslashes when appropriate and assumes that you’ll make any required changes for your platform.

A final consideration for Python developers (at least for this book) is that the hierarchy of directories is called a path. You see the term path in a few places in this book because Python must be able to find any resources you want to use based on the path you provide. For example, C:\Users\John\BPPD is the complete path to the source code on a Windows system. A path that traces the entire route that Python must search is called an absolute path. An incomplete path that traces the route to a resource using the current directory as a starting point is called a relative path.

To find a location using a relative path, you commonly use the current directory as the starting point. For example, BPPD\__pycache__ would be the relative path to the Python cache. Note that it has no drive letter or beginning backslash. However, sometimes you must add to the starting point in specific ways to define a relative path. Most platforms define these special relative path character sets:

  • \: The root directory of the current drive. The drive is relative, but the path begins at the root, the uppermost part, of that drive.
  • .\: The current directory. You use this shorthand for the current directory when the current directory name isn’t known. For example, you could also define the location of the Python cache as .\__pycache__.
  • ..\: The parent directory. You use this shorthand when the parent directory name isn’t known.
  • ..\..\: The parent of the parent directory. You can proceed up the hierarchy of directories as far as necessary to locate a particular starting point before you drill back down the hierarchy to a new location.