How to Create Content for Permanent Storage in Python

By John Paul Mueller

Python allows you to permanently store content. A file can contain structured or unstructured data. An example of structured data is a database in which each record has specific information in it. An employee database would include columns for name, address, employee ID, and so on. Each record would be an individual employee and each employee record would contain the name, address, and employee ID fields.

An example of unstructured data is a word processing file whose text can contain any content in any order. There is no required order for the content of a paragraph, and sentences can contain any number of words. However, in both cases, the application must know how to perform CRUD (create, read, update, and delete) operations with the file.

This means that the content must be prepared in such a manner that the application can both write to and read from the file.

Even with word processing files, the text must follow a certain series of rules. Assume for a moment that the files are simple text. Even so, every paragraph must have some sort of delimiter telling the application to begin a new paragraph.

The application reads the paragraph until it sees this delimiter, and then it begins a new paragraph. The more that the word processor offers in the way of features, the more structured the output becomes. For example, when the word processor offers a method of formatting the text, the formatting must appear as part of the output file.
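The following minimal sketch shows the delimiter idea for simple text. It assumes that a blank line (two consecutive newline characters) marks the end of a paragraph; the names PARAGRAPH_DELIMITER, join_paragraphs, and split_paragraphs are hypothetical and chosen only for illustration.

```python
# Assumption: a blank line (two newlines) ends each paragraph.
PARAGRAPH_DELIMITER = "\n\n"


def join_paragraphs(paragraphs):
    """Prepare a list of paragraphs for storage as one string."""
    return PARAGRAPH_DELIMITER.join(paragraphs)


def split_paragraphs(stored_text):
    """Recover the paragraphs by reading up to each delimiter."""
    return stored_text.split(PARAGRAPH_DELIMITER)


paragraphs = ["This is the first paragraph.",
              "This is the second paragraph."]

stored = join_paragraphs(paragraphs)

# repr() makes the otherwise invisible delimiter visible as \n\n.
print(repr(stored))

# Splitting on the delimiter returns the original paragraphs.
print(split_paragraphs(stored))
```

Because the same delimiter is used for writing and for reading, the text survives the round trip unchanged. A real word processor would need additional rules, for example to handle a paragraph that itself contains the delimiter.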

The cues that make content usable for permanent storage are often hidden from sight. All you see when you work with the file is the data itself. The formatting remains invisible for a number of reasons, such as these:

  • The cue is a control character, such as a carriage return or linefeed, that the platform doesn't display by default.

  • The application relies on special character combinations, such as commas and double quotes, to delimit the data entries. These special character combinations are consumed by the application during reading.

  • Part of the reading process converts the character to another form, such as when a word processor reads in content that is formatted. The formatting appears onscreen, but in the background the file contains special characters to denote the formatting.

  • The file is actually in an alternative format, such as eXtensible Markup Language (XML). The alternative format is interpreted and presented onscreen in a manner the user can understand.
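The first two reasons in the list above can be observed directly in Python. The sketch below is only an illustration: the two-record text in raw_text is made-up sample data held in memory rather than read from a real file.

```python
import csv
import io

# Made-up sample data: two records, each ending in a carriage return
# and linefeed, with the text field enclosed in double quotes.
raw_text = '"Doe, Jane",42\r\n"Roe, Richard",37\r\n'

# print() shows the data, but the control characters stay invisible.
print(raw_text)

# repr() exposes them as the escape sequences \r and \n.
print(repr(raw_text))

# csv.reader consumes the double quotes and the delimiting commas
# during reading, leaving only the field values.
rows = list(csv.reader(io.StringIO(raw_text, newline="")))
for row in rows:
    print(row)
```

Note that the comma inside each name is kept as data because the surrounding double quotes tell the reader that it isn't a field separator.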

Other rules likely exist for formatting data. For example, Microsoft actually uses a .zip file to hold its latest word processing files (the .docx file). The use of a compressed file archive, such as .zip, makes storing a great deal of information in a small space possible. It’s interesting to see how others store data because you can often find more efficient and secure means of data storage.
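To get a feel for how a .zip container holds several pieces of content in one file, you can experiment with Python's standard zipfile module. The sketch below builds a tiny archive in memory; the member names content.xml and styles.xml are illustrative stand-ins and are not meant to reproduce the actual layout of a .docx file.

```python
import io
import zipfile

# Build a small .zip archive in memory instead of on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w",
                     compression=zipfile.ZIP_DEFLATED) as archive:
    # Hypothetical members; repetitive text compresses well.
    archive.writestr("content.xml", "<text>Hello</text>" * 100)
    archive.writestr("styles.xml", "<style>bold</style>")

# Reopen the archive and list what it holds.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    member_names = archive.namelist()
    for info in archive.infolist():
        # Show each member's original and compressed sizes in bytes.
        print(info.filename, info.file_size, info.compress_size)
```

The same zipfile.ZipFile call accepts a filename in place of the in-memory buffer, so you can use this approach to inspect any .zip file you have on disk.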

Now that you have a better idea of what could happen as part of preparing content for disk storage, it’s time to look at an example. In this case, the formatting strategy is quite simple. All this example does is accept input, format it for storage, and present the formatted version onscreen (rather than save it to disk just yet).

  1. Open a Python File window.

    You see an editor in which you can type the example code.

  2. Type the following code into the window — pressing Enter after each line:

    class FormatData:
        # Store the three fields that make up one record.
        def __init__(self, Name="", Age=0, Married=False):
            self.Name = Name
            self.Age = Age
            self.Married = Married

        # Format the record as a single comma-separated line.
        def __str__(self):
            OutString = "'{0}', {1}, {2}".format(
                self.Name,
                self.Age,
                self.Married)
            return OutString

    This is a shortened class. Normally, you’d add accessors (getter and setter methods) and error-trapping code. (Remember that getter methods provide read-only access to class data and setter methods provide write-only access to class data.) However, the class works fine for the demonstration.

    The main feature to look at is the __str__() method. Notice that it formats the output data in a specific way. The string value, self.Name, is enclosed in single quotes. Each of the values is also separated by a comma.

    This is a simplified form of a standard output format, comma-separated values (CSV), that is used on a wide range of platforms because it’s easy to translate and is in plain text, so nothing special is needed to work with it. (Most CSV tools expect double quotes around text fields; the single quotes here keep the example simple.)

  3. Save the code as FormattedData.py.

  4. Open another Python File window.

  5. Type the following code into the window — pressing Enter after each line:

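    from FormattedData import FormatData

    # Build a list of records that stands in for the whole document.
    NewData = [FormatData("George", 65, True),
               FormatData("Sally", 47, False),
               FormatData("Doug", 52, True)]

    # Output loop: print each formatted record on its own line.
    for Entry in NewData:
        print(Entry)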

    The code begins by importing just the FormatData class from FormattedData. In this case, it doesn’t matter because the FormattedData module contains only a single class. However, you need to keep this technique in mind when you need only one class from a module.

    Most of the time, you work with multiple records when you save data to disk. You might have multiple paragraphs in a word processed document or multiple records, as in this case. The example creates a list of records and places them in NewData. In this case, NewData represents the entire document. The representation will likely take other forms in a production application, but the idea is the same.

    Any application that saves data goes through some sort of output loop. In this case, the loop simply prints the data onscreen.

  6. Choose Run→Run Module.

    The application prints one line for each record; the first line, for example, reads 'George', 65, True. This output is a representation of how the data would appear in the file. In this case, each record is separated by a line ending (a carriage return and linefeed control character combination on Windows). That is, George, Sally, and Doug are all separate records in the file. Each field (data element) is separated by a comma. Text fields appear in quotes so that they aren’t confused with other data types.

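As a possible next step, the sketch below shows one way the same three records could be written and then read back using Python's standard csv module. It is an illustration rather than part of the original example: it uses plain tuples instead of the FormatData class so that it runs on its own, it writes to an in-memory buffer instead of a disk file, and the names records, stored, and restored are hypothetical.

```python
import csv
import io

# The same three records as the example, as plain tuples.
records = [("George", 65, True),
           ("Sally", 47, False),
           ("Doug", 52, True)]

# Write to an in-memory buffer. A real application would open a
# disk file in text mode with newline="" instead.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
writer.writerows(records)
stored = buffer.getvalue()

# repr() reveals the quotes, commas, and line endings that were added.
print(repr(stored))

# Read the records back. The reader returns every field as a string,
# so the application must convert each one to its original type.
restored = []
for name, age, married in csv.reader(io.StringIO(stored, newline="")):
    restored.append((name, int(age), married == "True"))

print(restored)
```

The conversion step when reading is the counterpart of the formatting step when writing: the application has to know the rules in both directions for the stored content to be usable.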