In this lesson, we will see how to use the open() function to open an existing text file or to create a new one. We will also see how to read text from a file and how to write text to a file.
In several of the examples in this lesson, we will work with the file my_file.txt in the data/ directory. This text file contains the three lines shown below.
This is the first line.
This is the second line.
This is the third line.
We will start by discussing how to open and close files. These tasks can be accomplished using the open() function and the close() method of the object that open() returns. The open() function requires a parameter named file, which is expected to be a string representing the path to a file. The function also accepts a number of optional parameters. The most important of these is mode, which we will consider later in the lesson.
In the cell below, we open the file my_file.txt, storing the value returned in a variable named fin (which stands for file input). We then print the type of fin and see that it is _io.TextIOWrapper. This object does not contain the actual text from the file, but instead provides a link through which we can access the contents of the file.
fin = open('data/my_file.txt')
print(type(fin))
After running the cell above, the file will be open in Python. You won't see the contents of the file in a window in your operating system, but the file is nonetheless open. On Windows, if you were to try to delete the file at this point, you would likely see a message similar to the one below:
"The action can't be completed because the file is open in Python."
We can confirm that the file is open by printing the closed attribute of the TextIOWrapper object.
print(fin.closed)
It is good practice to always close files when you are done working with them. This can be accomplished using the close() method of the TextIOWrapper object.
Python will automatically close any open files when the Python session ends, but closing files manually frees up system resources sooner, and is particularly important in programs that work with many files or very large files.
fin.close()
We will again check the value of the closed attribute to confirm that the file has been closed.
print(fin.closed)
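As a minimal sketch of why closing matters, the cell below shows one common way to guarantee that close() is called, using try and finally, and then shows what happens if we try to read from a file object after it has been closed. The file name demo.txt and the temporary directory are assumptions made only so that the sketch can run anywhere; the 'w' mode used to create the file is covered later in the lesson.

```python
import os
import tempfile

# Create a small throwaway file so that this sketch runs anywhere.
# The name demo.txt is hypothetical and not part of the lesson's data.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo.txt')
fout = open(demo_path, 'w')
fout.write('This is the first line.\n')
fout.close()

fin = open(demo_path)
try:
    first_line = fin.readline()
finally:
    # This runs even if the read above raises an exception.
    fin.close()

print(fin.closed)

# Reading from a closed file object raises a ValueError.
try:
    fin.read()
    read_failed = False
except ValueError:
    read_failed = True

print(read_failed)
```

Both print statements should display True: the file is closed, and the attempt to read from it afterwards fails.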
We can use the mode parameter to specify the type of file operations that should be allowed on the file we have opened. In particular, we can use mode to specify whether the file should be read-only or whether writing to the file should be allowed.
A list of possible values for the mode parameter is provided below, along with explanations of their purpose.
r means "read". A file opened in this mode will be read-only.
w means "write". If the file does not exist, it is created. If the file does exist, it is overwritten.
x means "exclusive creation". This mode creates a new file for writing, and only works if the file does not already exist. If the file already exists, an error will occur.
a means "append". This mode allows new text to be added to the end of a file. If the file does not exist, it is created.
The default value for mode is r, so if we only wish to read the contents of a file, we do not need to specify the mode parameter.
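The behavior of these modes can be checked with a short experiment. The sketch below is only an illustration: it works in a temporary directory with a hypothetical file named modes_demo.txt, so it does not touch the files in data/.

```python
import os
import tempfile

# Work in a temporary directory; modes_demo.txt is a hypothetical name.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')

# 'x' succeeds here because the file does not exist yet.
f = open(path, 'x')
f.write('first\n')
f.close()

# A second attempt with 'x' fails because the file now exists.
try:
    open(path, 'x')
    x_failed = False
except FileExistsError:
    x_failed = True

# 'a' keeps the existing content and adds to the end of the file.
f = open(path, 'a')
f.write('second\n')
f.close()

f = open(path)  # the default mode is 'r'
after_append = f.read()
f.close()

# 'w' discards the existing content before writing.
f = open(path, 'w')
f.write('replaced\n')
f.close()

f = open(path)
after_write = f.read()
f.close()

print(x_failed)
print(repr(after_append))
print(repr(after_write))
```

After the append step the file holds both lines, while after reopening with 'w' only the newly written line remains.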
There are several tools available for reading the contents of an open file. The three most common are the methods read(), readlines(), and readline().
The read() method returns a string that contains the entire content of the file.
The readlines() method returns a list of strings, with each string representing a single line of the file.
The readline() method returns a single string containing the next line of the file. Each call advances to the following line, and an empty string is returned once the end of the file has been reached.
We will now take a look at an example of using the read() method.
fin = open('data/my_file.txt')
contents = fin.read()
fin.close()
We will print the data type of the contents variable to confirm that it is a string.
print(type(contents))
If we print contents, we will see that it contains all three lines of my_file.txt.
print(contents)
If we display the contents variable without using print(), we can see that the string contains newline characters used to separate the lines.
contents
If we wanted to separate each line of the file into its own string, we could use the split() method, splitting the string on newline characters.
contents_list = contents.split('\n')
print(contents_list)
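One detail worth knowing when splitting on newline characters: if the text ends with a newline, split('\n') produces an empty string as its last element. The built-in string method splitlines() avoids this. The short sketch below uses a made-up string rather than the contents of my_file.txt.

```python
# A made-up string that ends with a newline character.
text = 'This is the first line.\nThis is the second line.\n'

split_result = text.split('\n')
splitlines_result = text.splitlines()

print(split_result)       # ['This is the first line.', 'This is the second line.', '']
print(splitlines_result)  # ['This is the first line.', 'This is the second line.']
```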
We will now explore the readlines() method. In the cell below, we open the file my_file.txt, read its contents using readlines(), and then close the file. We also display the results returned by readlines() to confirm that this is a list of strings.
fin = open('data/my_file.txt')
contents_list = fin.readlines()
fin.close()
print(contents_list)
Notice that each string above ends with a newline character. If we wish to remove these, we can use the strip() method, which removes whitespace characters from both ends of a string.
for i in range(len(contents_list)):
    contents_list[i] = contents_list[i].strip()
print(contents_list)
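The readline() method has not been demonstrated so far, so here is a minimal sketch of how it behaves. It reads a file one line per call and returns an empty string at the end of the file. The sketch also shows that a file object can be used directly in a for loop, which reads one line at a time without building the whole list first. The file sample.txt is a hypothetical stand-in created in a temporary directory with the same three lines as my_file.txt.

```python
import os
import tempfile

# Create a hypothetical stand-in for my_file.txt in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')
fout = open(path, 'w')
fout.write('This is the first line.\nThis is the second line.\nThis is the third line.')
fout.close()

# Each call to readline() returns the next line of the file.
fin = open(path)
first = fin.readline()
second = fin.readline()
third = fin.readline()
at_end = fin.readline()  # an empty string signals the end of the file
fin.close()

print(repr(first))
print(repr(at_end))

# A file object can also be looped over directly, one line at a time.
stripped_lines = []
fin = open(path)
for line in fin:
    stripped_lines.append(line.strip())
fin.close()

print(stripped_lines)
```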
We can use the with keyword to reduce the number of steps involved in working with a file. When we open a file using with, the file will be automatically closed when we leave the with block. The usage of this keyword is illustrated in the example below.
with open('data/my_file.txt') as fin:
    contents = fin.read()
contents_list = contents.split('\n')
print(contents_list)
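To check that with really does close the file, we can inspect the closed attribute inside and outside the block. The sketch below, which uses a hypothetical file created in a temporary directory, also shows that the file is closed even when an error interrupts the block.

```python
import os
import tempfile

# Create a hypothetical file to read in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'with_demo.txt')
with open(path, 'w') as fout:
    fout.write('This is the first line.\n')

with open(path) as fin:
    closed_inside = fin.closed   # the file is still open here
closed_outside = fin.closed      # the file was closed on leaving the block

print(closed_inside, closed_outside)

# The file is closed even if an exception is raised inside the block.
try:
    with open(path) as fin2:
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass

closed_after_error = fin2.closed
print(closed_after_error)
```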
We will see how to write to a file by setting the mode parameter of open() to w. When using mode='w', a new file will be created if one does not already exist with the specified name. If the file does already exist, then it will be overwritten.
In the cell below, we will create a file named new_file.txt within the data/ folder and then write three lines to it.
line1 = 'This is the first line.\n'
line2 = 'This is the second line.\n'
line3 = 'This is the third line.'
with open('data/new_file.txt', 'w') as fout:
    fout.write(line1)
    fout.write(line2)
    fout.write(line3)
We will confirm that the file was written correctly by opening the file in read-only mode and printing its contents.
with open('data/new_file.txt') as fin:
    print(fin.read())
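Note that write() does not add newline characters for us, which is why the strings above include '\n' explicitly. When the lines are stored in a list, one possible approach is to join them with newline characters and write the result in a single call. The sketch below writes to a hypothetical file in a temporary directory; it also captures the value returned by write(), which is the number of characters written.

```python
import os
import tempfile

# Write to a hypothetical file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'joined_demo.txt')

lines = [
    'This is the first line.',
    'This is the second line.',
    'This is the third line.',
]

with open(path, 'w') as fout:
    # join() places a newline between the lines, but not after the last one.
    n_chars = fout.write('\n'.join(lines))

with open(path) as fin:
    round_trip = fin.read().split('\n')

print(n_chars)
print(round_trip == lines)
```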
If we open a file using mode='a', then we can write to the end of the file. This will not delete the current content of the file, but will instead append new text to the end of the file.
line4 = '\nThis is the fourth line.'
line5 = '\nThis is the fifth line.'
with open('data/new_file.txt', 'a') as fout:
    fout.write(line4)
    fout.write(line5)
We will confirm that the new content was written to the file by opening the file in read-only mode and printing its contents.
with open('data/new_file.txt') as fin:
    print(fin.read())
Occasionally, you will need to break each line of a text file up into smaller pieces called tokens. This is particularly useful when reading tabular data that has been stored as a text file.
In the example below, we will open the file titanic.txt and read its contents using readlines(). We will then split each line into tokens on tab characters, and print the contents of each line in a tabular format.
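with open('data/titanic.txt') as fin:
    line_list = fin.readlines()

# Print the first 20 lines as aligned columns.
for line in line_list[:20]:
    # Drop the trailing newline, then split the line on tab characters.
    tokens = line.rstrip('\n').split('\t')
    print(f'{tokens[0]:<10}{tokens[1]:<8}{tokens[3]:<10}{tokens[4]:<8}{tokens[2]:<60}')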
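Tokens produced by split() are always strings, so numeric columns usually need to be converted before they can be used in calculations. The sketch below illustrates this with a small, made-up tab-separated file; the file name and the columns name, age, and city are hypothetical and do not reflect the layout of titanic.txt.

```python
import os
import tempfile

# Build a small, made-up tab-separated file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'people.txt')
rows = ['name\tage\tcity\n', 'Ana\t34\tLisbon\n', 'Ben\t27\tOslo\n']
with open(path, 'w') as fout:
    for row in rows:
        fout.write(row)

with open(path) as fin:
    line_list = fin.readlines()

# The first line holds the column names.
header = line_list[0].rstrip('\n').split('\t')

records = []
for line in line_list[1:]:
    tokens = line.rstrip('\n').split('\t')
    tokens[1] = int(tokens[1])  # convert the age token from str to int
    records.append(tokens)

print(header)
for name, age, city in records:
    print(f'{name:<8}{age:<6}{city:<10}')
```

For files with quoted fields or other complications, Python's built-in csv module is often a more robust choice than splitting lines by hand.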