Files and directories¶
pathlib
implements path operations using
pathlib.PurePath
and pathlib.Path
objects. The
os
and os.path
modules, on the other hand, offer
functions that work at a low level with str
- and bytes
which is more in
line with a procedural approach. We consider the object-oriented style of
pathlib
be more readable and therefore present it in more detail
here.
See also
PEP 428: The pathlib module – object-oriented filesystem paths
Reading and writing files¶
In Python, you open and read a file by using the
pathlib.Path.open()
function and various built-in read operations.
pathlib.Path.open()
opens the file to which the path refers, as
the built-in open()
function does. The following short Python
programme reads a line from a text file named myfile.txt
in
docs/save-data/
:
1>>> from pathlib import Path
2>>> p = Path("docs", "save-data", "myfile.txt")
3>>> f = p.open()
4>>> headline = f.readline()
- Line 2:
The arguments of
pathlib.Path
are path segments, either asPosixPath
orWindowsPath
. In the previous example, you open a file that you assume is located relative to your call indocs/save-data/myfile.txt
.The following example opens a file at an absolute location –
C:\My Documents\myfile.txt
:2>>> p = Path("c:", "Users", "Veit", "My Documents", "myfile.txt") 3>>> with p.open() as f: 4... f.readline() 5...
Note
In this example, the keyword
with
is used, which means that the file is opened with a context manager, as explained in more detail in Context management with with. This way of opening files handles potential I/O errors better and should generally be preferred.- Line 3:
pathlib.Path.open()
does not read anything from the file, but returns a file object that you can use to access the opened file. It keeps track of a file and how much of the file has been read or written. All file operations in Python are performed with file objects, not file names.- Line 4:
The first call to
readline()
returns the first line of the file object, that is, everything up to and including the first line break, or the entire file if there is no line break in the file; the next call toreadline()
returns the second line, if it exists, and so on. When there is nothing left to read,readline()
returns an empty string.
This behaviour of readline()
makes it
easy to determine, for example, the number of lines in a file:
>>> with p.open() as f:
... lc = 0
... while f.readline() != "":
... lc = lc + 1
... print(lc)
...
2
A shorter way to count all lines is to use the built-in readlines()
method, which reads all lines of a file and
returns them as a list of strings, with one string per line:
>>> with p.open() as f:
... print(len(f.readlines()))
...
2
However, if you count all lines in a large file, this method can cause memory to
overflow because the entire file is read at once. It is also possible for memory
to overflow with readline()
if you try to
read a line from a large file that does not contain line break characters. To
better handle such situations, both methods have an optional argument that
affects the amount of data read at a time. Another way to iterate over all lines
of a file is to treat the file object as an iterator in a for loop:
>>> with p.open() as f:
... lc = 0
... for l in f:
... lc = lc + 1
... print(lc)
...
2
This method has the advantage that the lines are read into memory as needed, so even with large files, there is no need to worry about running out of memory. The other advantage of this method is that it is simpler and more readable.
However, a potential problem with the read method can arise if translations are
performed in text mode on Windows and macOS when you use the open()
command in text mode, in other words, without appending a b
. In text mode,
macOS converts every \r
to \n
, while Windows converts \r\n
pairs to
\n
. You can specify how line breaks are handled by using the newline
parameter when opening the file and specifying newline="\n"
, \r
or
\r\n
, which will only use that string as a line break:
>>> with p.open(newline="\r\n") as f:
... lc = 0
...
In this example, only \n
is interpreted as a line break. However, if the
file was opened in binary mode, the newline
parameter is not necessary, as
all bytes are returned exactly as they appear in the file.
pathlib.Path.read_text()
returns the decoded content of the specified file as a string:
>>> p.read_text() 'This is the first line of myfile.\nAnd this is another line.\n'
pathlib.Path.write_text()
opens the specified file in text mode, writes data to it, and closes the file:
>>> p.write_text("New content") 11 >>> p.read_text() 'New content'
An existing file with the same name will be overwritten.
Reading directories¶
pathlib.Path.iterdir()
If the path refers to a directory, the path objects of the directory contents are returned:
>>> p = Path("docs", "save-data") >>> for child in p.iterdir(): ... child ... PosixPath('docs/save-data/index.rst') PosixPath('docs/save-data/minidom_example.py') PosixPath('docs/save-data/pickle.rst') PosixPath('docs/save-data/xml.rst') PosixPath('docs/save-data/books.xml') PosixPath('docs/save-data/files.rst')
The child objects are returned in arbitrary order, and the special entries .
and ..
are not included. If the path is not a directory or is otherwise
inaccessible, an OSError
is raised.
pathlib.Path.glob()
finds the specified relative pattern in the directory represented by this path and returns all matching files:
>>> sorted(p.glob("*.rst")) [PosixPath('docs/save-data/files.rst'), PosixPath('docs/save-data/index.rst'), PosixPath('docs/save-data/pickle.rst'), PosixPath('docs/save-data/xml.rst')]
See also
pathlib.Path.rglob()
recursively finds the specified relative pattern. This corresponds to calling with
**/
before the pattern.pathlib.Path.walk()
generates the file names in a directory structure by traversing the structure either from top to bottom or from bottom to top. It returns a 3-tuple consisting of
(dirpath, dirnames, filenames)
.With the default setting of the optional argument
top_down=True
, the triple for a directory is generated before the triples for its subdirectories.With
follow_symlinks=True
, symlinks are resolved and placed indirnames
andfilenames
according to their targets.The following example shows the size of the files in a directory, ignoring
__pycache__
directories:>>> for root, dirs, files in p.walk(): ... print( ... root, ... "consumes", ... sum((root / file).stat().st_size for file in files), ... "bytes in", ... len(files), ... "non-directory files", ... ) ... if "__pycache__" in dirs: ... dirs.remove("__pycache__") ... docs/save-data consumes 88417 bytes in 13 non-directory files docs/save-data/sqlite consumes 35187 bytes in 19 non-directory files
The next example is a simple implementation of
shutil.rmtree()
, whereby the directory tree must be traversed from bottom to top, aspathlib.Path.rmdir()
only allows a directory to be deleted if it is empty:>>> for root, dirs, files in p.walk(top_down=False): ... for name in files: ... (root / name).unlink() ... for name in dirs: ... (root / name).rmdir() ...
Creating files and directories¶
pathlib.Path.touch()
creates a file at the specified path.
mode
can be used to specify the file mode and access flags. If the file already exists, the modification time is updated to the current time ifexist_ok=True
, otherwise aFileExistsError
is raised.Note
pathlib.Path.open()
orpathlib.Path.write_text()
are also often used to create files.pathlib.Path.mkdir()
creates a new directory under the specified path. The parameters
mode
andexist_ok
work as specified inpathlib.Path.touch()
.If
parents=True
, missing parent directories of the path are created as needed with the default permissions. With the default settingparents=False
, however,FileNotFoundError
is triggered.
RRenaming, copying and deleting¶
pathlib.Path.rename()
renames the file or directory to the specified destination and returns a new
pathlib.Path
instance that points to the destination. On Unix, if the destination exists and is a file, it is simply replaced; on Windows, aFileExistsError
is raised.>>> myfile = Path("docs", "save-data", "myfile.txt") >>> newfile = Path("docs", "newdir", "newfile.txt") >>> myfile.rename(newfile) PosixPath('docs/newdir/newfile.txt')
Added in version 3.14: Python 3.14, the methods pathlib.Path.copy()
,
pathlib.Path.copy_into()
, pathlib.Path.move()
and
pathlib.Path.move_into()
are added.
See also
Permissions and ownership¶
pathlib.Path.owner()
returns the name of the person who owns the file. Normally, symlinks are followed, but if you want to determine the person who owns the symlink, add
follow_symlinks=False
. If the user ID (UID) of the file is not found, aKeyError
is raised.pathlib.Path.group()
returns the name of the group that owns the file. The behaviour for symlinks is the same as for
pathlib.Path.owner()
. And if the group ID (GID) of the file is not found, aKeyError
is also raised.pathlib.Path.chmod()
changes the file mode and permissions. Symlinks are normally followed. To change the symlink permissions, you can use
follow_symlinks=False
orpathlib.Path.lchmod()
.
Comparison with os
and os.path
¶
pathlib
implements objects withpathlib.PurePath
andpathlib.Path
, whileos
andos.path
work more procedurally with low-levelstr
andbytes
.Many functions in
os
andos.path
support paths relative to directory descriptors. These functions are not available inpathlib
.str
andbytes
, as well as parts ofpython3:os-os
andos.path
, are written in C and are very fast.pathlib
, on the other hand, is written in Python and is often slower, but this does not always matter.
Despite the differences, many os functions can be translated into corresponding
pathlib.Path
or pathlib.PurePath
functions:
Checks¶
Uses the functions of the
pathlib
module to take a path to a file namedexample.log
and create a new file path in the same directory for a file namedexample.log1
.Open a file
my_file.txt
and insert additional text at the end of the file. Which command would you use to openmy_file.txt
? Which command would you use to reopen the file and read it from the beginning?If you look at the man page for the wc utility, you will see two command line options:
-c
counts the bytes in the file
-m
counts the characters, which in the case of some Unicode characters can be two or more bytes long
Also, if a file is specified, our module should read from and process that file, but if no file is specified, it should read from and process
stdin
.If a context manager is used in a script that reads and/or writes multiple files, which of the following approaches do you think would be best?
Put the entire script in a block managed by a
with
statement.Use one
with
statement for all reads and another for all writes.Use a
with
statement every time you read or write a file, that is, for every line.Use a
with
statement for each file you read or write.
Archive
*.txt
files from the current directory in thearchive
directory as*.zip
files with the current date as the file name.Which modules do you need for this?
Write a possible solution.