Working with Text Files

In this lesson, we will see how to use the open() function to open an existing text file, or to create a new text file. We will see how to read text from a file and how to write text to a file.

In several of the examples we see in this lesson, we will be working with the file my_file.txt in the data/ directory. This file is a text file containing the three lines of text shown below.

This is the first line. 
This is the second line. 
This is the third line. 

Opening and Closing Files

We will start by discussing how to open and close files. These tasks can be accomplished using the built-in open() function and the close() method of the object that open() returns. The open() function requires a parameter named file, which is expected to be a string representing the path to a file. This function also accepts a number of optional parameters. The most important of these is mode, which we will consider later in the lesson.

In the cell below, we open the file my_file.txt, storing the value returned into a variable named fin (which stands for file input). We then print the type of fin, and see that it has type _io.TextIOWrapper. This object does not contain the actual text from the file, but instead provides a link through which we can access the contents of the file.

fin = open('data/my_file.txt')
print(type(fin))
<class '_io.TextIOWrapper'>

After running the cell above, the file will be open in Python. You won’t see the contents of the file in a window in your operating system, but the file is nonetheless open. If you were to try to delete the file at this point on Windows, you would likely see a message similar to the one below:

"The action can't be completed because the file is open in Python."

We can confirm that the file is open by printing the closed attribute of the TextIOWrapper object.

print(fin.closed)
False

It is good practice to always close files when you are done working with them. This can be accomplished using the close() method of the TextIOWrapper object.

Python will automatically close any open files when the Python session ends, but closing files manually frees up system resources sooner, and is particularly important in programs that work with multiple files or very large files.

fin.close()

We will again check the value of the closed attribute to confirm that the file has been closed.

print(fin.closed)
True
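The open, use, and close steps above can be sketched in a way that guarantees the file is closed even if an error occurs while reading. This is only an illustration: the temporary directory and the file name example.txt are assumptions made so that the sketch runs without data/my_file.txt, and the small setup step uses write mode, which is discussed later in the lesson.

```python
import os
import tempfile

# Setup: create a throwaway file (hypothetical name) so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.txt')
fout = open(path, 'w')
fout.write('This is the first line.\n')
fout.close()

fin = open(path)
try:
    first_line = fin.readline()
    closed_while_reading = fin.closed   # the file is still open at this point
finally:
    fin.close()                         # runs whether or not the read raised an error

print(closed_while_reading)
print(fin.closed)
```

The try/finally pattern is one common way to make sure close() is always called. The with keyword, covered later in this lesson, achieves the same thing with less code.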

The Mode Parameter

We can use the mode parameter to specify the type of file operations that should be allowed on the file we have opened. In particular, we can use mode to specify whether the text file should be read-only, or whether writing to the file should be allowed.

A list of possible values for the mode parameter is provided below, along with explanations of the purpose of these values.

  • r means “read”. A file opened in this mode will be read-only. If the file does not exist, an error will occur.

  • w means “write”. If the file does not exist, it is created. If the file does exist, it is overwritten.

  • x means “exclusive creation”. This mode creates a new file for writing and only works if the file does not already exist. If the file already exists, an error will occur.

  • a means “append”. This mode allows new text to be added to the end of a file without removing its existing content. If the file does not exist, it is created.

The default value for mode is r, so if we only wish to read the contents of a file, we do not need to specify the mode parameter.
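As a rough illustration of how these modes differ, the sketch below runs x, a, and w against the same file. The temporary directory and the file name modes_demo.txt are hypothetical and are used only so that the example does not touch any real data.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')   # hypothetical file name

# 'x' succeeds here because the file does not exist yet.
fout = open(path, 'x')
fout.write('created with x\n')
fout.close()

# A second attempt with 'x' fails because the file now exists.
try:
    open(path, 'x')
    x_failed = False
except FileExistsError:
    x_failed = True

# 'a' keeps the existing content and writes at the end of the file.
fout = open(path, 'a')
fout.write('added with a\n')
fout.close()

fin = open(path)            # mode defaults to 'r'
after_append = fin.read()
fin.close()

# 'w' discards the existing content before writing.
fout = open(path, 'w')
fout.write('replaced with w\n')
fout.close()

fin = open(path, 'r')
after_write = fin.read()
fin.close()

print(x_failed)
print(repr(after_append))
print(repr(after_write))
```

Because w silently discards existing content, x can be a safer choice when a program should never replace a file that is already there.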

Reading File Contents

There are several tools available for reading the contents of an open file. The three most common are the methods read(), readlines(), and readline().

  • The read() method will return a string that contains the entire content of the file.

  • The readlines() method will return a list of strings, with each string representing a single line of the file.

  • The readline() method will return a single string containing the next line of the file. Each call returns the following line, and an empty string is returned once the end of the file has been reached.
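Since readline() is not demonstrated elsewhere in this lesson, here is a minimal sketch of its behavior. To keep the example self-contained, it reads from an io.StringIO object, an in-memory text stream that offers the same reading methods as a file opened in text mode. Using StringIO in place of data/my_file.txt is an assumption made purely for illustration.

```python
import io

# In-memory stand-in for my_file.txt (same three lines of text).
fin = io.StringIO(
    'This is the first line.\n'
    'This is the second line.\n'
    'This is the third line.'
)

first = fin.readline()     # each call returns the next line, newline included
second = fin.readline()
third = fin.readline()     # the last line has no trailing newline in this text
at_end = fin.readline()    # an empty string signals the end of the stream
fin.close()

print(repr(first))
print(repr(second))
print(repr(third))
print(repr(at_end))
```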

read()

We will now take a look at an example of using the read() method.

fin = open('data/my_file.txt')
contents = fin.read()
fin.close()

We will print the data type of the contents variable to confirm that it is a string.

print(type(contents))
<class 'str'>

If we print contents, we will see that it contains all three lines of my_file.txt.

print(contents)
This is the first line.
This is the second line.
This is the third line.

If we display the contents variable without using print(), we can see that the string contains the newline characters used to separate the lines.

contents
'This is the first line.\nThis is the second line.\nThis is the third line.'

If we wanted to separate each line of the file into its own string, we could use the split() method, splitting the string on newline characters.

contents_list = contents.split('\n')
print(contents_list)
['This is the first line.', 'This is the second line.', 'This is the third line.']

readlines()

We will now explore the readlines() method. In the cell below, we open the file my_file.txt, read its contents using readlines(), and then close the file. We also display the results returned by readlines() to confirm that this is a list of strings.

fin = open('data/my_file.txt')
contents_list = fin.readlines()
fin.close()

print(contents_list)
['This is the first line.\n', 'This is the second line.\n', 'This is the third line.']

Notice that every string above except the last ends with a newline character. If we wish to remove these, we can use the strip() method, which removes whitespace characters (including newlines) from both ends of a string.

for i in range(len(contents_list)):
    contents_list[i] = contents_list[i].strip()
    
print(contents_list)
['This is the first line.', 'This is the second line.', 'This is the third line.']
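The same cleanup can be written more compactly. The sketch below shows two possible alternatives: a list comprehension using rstrip('\n'), which removes only trailing newline characters and leaves other whitespace alone, and the string method splitlines(). The list is typed in by hand here so that the example runs without the data file.

```python
# The list that readlines() returned above, reproduced by hand.
contents_list = [
    'This is the first line.\n',
    'This is the second line.\n',
    'This is the third line.',
]

# Option 1: build a new list, removing only the trailing newline from each string.
stripped = [line.rstrip('\n') for line in contents_list]

# Option 2: rejoin the text and let splitlines() split it without keeping newlines.
via_splitlines = ''.join(contents_list).splitlines()

print(stripped)
print(via_splitlines)
```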

Using With

We can use the with keyword to reduce the number of steps involved in working with a file. When we open a file using with, the file will be automatically closed when we leave the with block. The usage of this keyword is illustrated in the example below.

with open('data/my_file.txt') as fin:
    contents = fin.read()

contents_list = contents.split('\n')
print(contents_list)
['This is the first line.', 'This is the second line.', 'This is the third line.']
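To check that with really does close the file, including when an error interrupts the block, we can inspect the closed attribute afterwards. The sketch below also loops over the file object directly, which yields one line at a time. The file name with_demo.txt and the temporary directory are hypothetical, and the ValueError is raised on purpose to simulate a failure.

```python
import os
import tempfile

# Setup: write a small throwaway file (hypothetical name).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'with_demo.txt')
with open(path, 'w') as fout:
    fout.write('This is the first line.\nThis is the second line.\n')

lines = []
with open(path) as fin:
    for line in fin:                    # a file object yields its lines one by one
        lines.append(line.rstrip('\n'))
    closed_inside = fin.closed          # still open inside the block

closed_after = fin.closed               # closed once the block has ended

# The file is also closed when the block is left because of an error.
try:
    with open(path) as fin2:
        raise ValueError('simulated failure')
except ValueError:
    pass
closed_after_error = fin2.closed

print(lines)
print(closed_inside, closed_after, closed_after_error)
```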

Writing to a File

To write to a file, we set the mode parameter of open() to w. When using mode='w', a new file will be created if one does not already exist with the specified name. If the file does already exist, then its existing contents will be overwritten.

In the cell below, we will create a file named new_file.txt within the data/ folder, and will then write three lines to it.

line1 = 'This is the first line.\n'
line2 = 'This is the second line.\n'
line3 = 'This is the third line.'

with open('data/new_file.txt', 'w') as fout:
    fout.write(line1)
    fout.write(line2)
    fout.write(line3)

We will confirm that the file was written correctly by opening the file in read-only mode and printing its contents.

with open('data/new_file.txt') as fin:
    print(fin.read())
This is the first line.
This is the second line.
This is the third line.
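When the lines are already stored in a list, one possible variation is to join them with newline characters and write the result with a single call to write(). The sketch below assumes a temporary directory rather than the data/ folder so that it can run anywhere. It also records the value returned by write(), which is the number of characters written.

```python
import os
import tempfile

lines = [
    'This is the first line.',
    'This is the second line.',
    'This is the third line.',
]

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'new_file.txt')   # written to a temporary directory

text = '\n'.join(lines)        # newlines go between lines, not after the last one
with open(path, 'w') as fout:
    chars_written = fout.write(text)

with open(path) as fin:
    contents = fin.read()

print(chars_written == len(text))
print(contents)
```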

Appending

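If we open a file using mode='a', then we can write to the end of the file. This will not delete the current content of the file, but will instead append new text to the end of the file. Because the third line was written without a trailing newline, each new line below begins with a newline character.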

line4 = '\nThis is the fourth line.'
line5 = '\nThis is the fifth line.'

with open('data/new_file.txt', 'a') as fout:
    fout.write(line4)
    fout.write(line5)

We will confirm that the new content was written to the file by opening the file in read-only mode and printing its contents.

with open('data/new_file.txt') as fin:
    print(fin.read())
This is the first line.
This is the second line.
This is the third line.
This is the fourth line.
This is the fifth line.
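Append mode is often used for files that grow over time, such as logs. The sketch below is a minimal illustration built around a hypothetical helper named append_line() and a hypothetical file named log.txt. It ends every entry with a newline instead of starting each one with a newline, which is a design choice rather than a requirement. It also shows that mode='a' creates the file when it does not yet exist.

```python
import os
import tempfile

def append_line(path, text):
    """Hypothetical helper: add one line of text to the end of the file at path."""
    with open(path, 'a') as fout:
        fout.write(text + '\n')

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'log.txt')       # hypothetical file name

existed_before = os.path.exists(path)         # the file has not been created yet
append_line(path, 'This is the first entry.')
append_line(path, 'This is the second entry.')

with open(path) as fin:
    entries = fin.read().splitlines()

print(existed_before)
print(entries)
```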

Processing Strings of Text

Occasionally, you will need to break each line of a text file up into smaller pieces called tokens. This is particularly necessary when we are reading tabular data that has been stored as a text file.

In the example below, we will open the file titanic.txt and read its contents using readlines(). We will then split each line into tokens on tab characters, and print the contents of each line in a tabular format.

with open('data/titanic.txt') as fin:
    line_list = fin.readlines()
    
for line in line_list[:20]:
    tokens = line.split('\t')
    print(f'{tokens[0]:<10}{tokens[1]:<8}{tokens[3]:<10}{tokens[4]:<8}{tokens[2]:<60}')
        
Survived  Pclass  Sex       Age     Name                                                        
0         3       male      22      Mr. Owen Harris Braund                                      
1         1       female    38      Mrs. John Bradley (Florence Briggs Thayer) Cumings          
1         3       female    26      Miss. Laina Heikkinen                                       
1         1       female    35      Mrs. Jacques Heath (Lily May Peel) Futrelle                 
0         3       male      35      Mr. William Henry Allen                                     
0         3       male      27      Mr. James Moran                                             
0         1       male      54      Mr. Timothy J McCarthy                                      
0         3       male      2       Master. Gosta Leonard Palsson                               
1         3       female    27      Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson            
1         2       female    14      Mrs. Nicholas (Adele Achem) Nasser                          
1         3       female    4       Miss. Marguerite Rut Sandstrom                              
1         1       female    58      Miss. Elizabeth Bonnell                                     
0         3       male      20      Mr. William Henry Saundercock                               
0         3       male      39      Mr. Anders Johan Andersson                                  
0         3       female    14      Miss. Hulda Amanda Adolfina Vestrom                         
1         2       female    55      Mrs. (Mary D Kingcome) Hewlett                              
0         3       male      2       Master. Eugene Rice                                         
1         2       male      23      Mr. Charles Eugene Williams                                 
0         3       female    31      Mrs. Julius (Emelia Maria Vandemoortele) Vander Planke
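Tokens produced by split() are always strings, so numeric columns usually need to be converted before they can be used in calculations. The sketch below illustrates one way to do this. It works on a few lines typed in by hand, using values from the output above, and it assumes that the columns appear in the order Survived, Pclass, Name, Sex, Age with no additional columns. The real titanic.txt file may be laid out differently.

```python
# Hand-typed sample in the assumed tab-separated layout (not read from titanic.txt).
sample_lines = [
    'Survived\tPclass\tName\tSex\tAge\n',
    '0\t3\tMr. Owen Harris Braund\tmale\t22\n',
    '1\t1\tMrs. John Bradley (Florence Briggs Thayer) Cumings\tfemale\t38\n',
    '1\t3\tMiss. Laina Heikkinen\tfemale\t26\n',
]

# The first line holds the column names.
header = sample_lines[0].rstrip('\n').split('\t')

records = []
for line in sample_lines[1:]:
    tokens = line.rstrip('\n').split('\t')     # drop the newline before splitting
    record = dict(zip(header, tokens))         # pair each column name with its token
    record['Survived'] = int(record['Survived'])
    record['Pclass'] = int(record['Pclass'])
    record['Age'] = float(record['Age'])       # float is an assumption about the data
    records.append(record)

# With numeric types in place, simple calculations become possible.
average_age = sum(r['Age'] for r in records) / len(records)

print(records[0])
print(average_age)
```

Removing the trailing newline before splitting matters because the newline would otherwise remain attached to the last token on each line.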