Files and directories

pathlib implements path operations using pathlib.PurePath and pathlib.Path objects. The os and os.path modules, on the other hand, offer functions that work at a low level with str and bytes objects, which is more in line with a procedural approach. We consider the object-oriented style of pathlib to be more readable and therefore present it in more detail here.

Reading and writing files

In Python, you open a file with the pathlib.Path.open() method and read it with the methods of the returned file object. pathlib.Path.open() opens the file to which the path refers, just as the built-in open() function does. The following short Python programme reads a line from a text file named myfile.txt in docs/save-data/:

1>>> from pathlib import Path
2>>> p = Path("docs", "save-data", "myfile.txt")
3>>> f = p.open()
4>>> headline = f.readline()
Line 2:

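The arguments of pathlib.Path are path segments, which are joined into a single path. Depending on the operating system, the resulting object is a PosixPath or a WindowsPath. In the previous example, you open a file that you assume is located at docs/save-data/myfile.txt relative to the current working directory.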
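Because the segments are only joined, you can inspect the result without touching the file system by using the pure path classes, which work on any operating system. The following sketch also shows an easily overlooked Windows pitfall: a drive letter without a following separator, as in "c:", yields a path relative to the current directory on that drive, so "C:/" is needed for an absolute path.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same segments, joined with the separator of each flavour.
posix = PurePosixPath("docs", "save-data", "myfile.txt")
windows = PureWindowsPath("docs", "save-data", "myfile.txt")
print(posix)    # docs/save-data/myfile.txt
print(windows)  # docs\save-data\myfile.txt

# "c:" without a separator is relative to the current directory on drive C.
drive_relative = PureWindowsPath("c:", "Users")
absolute = PureWindowsPath("C:/", "Users")
print(drive_relative, drive_relative.is_absolute())  # c:Users False
print(absolute, absolute.is_absolute())              # C:\Users True
```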

The following example opens a file at an absolute location, C:\Users\Veit\My Documents\myfile.txt:

2>>> p = Path("C:/", "Users", "Veit", "My Documents", "myfile.txt")
3>>> with p.open() as f:
4...     f.readline()
5...

Note

In this example, the keyword with is used, which means that the file is opened with a context manager, as explained in more detail in Context management with with. The context manager closes the file automatically when the with block is left, even if an exception occurs inside it, so this way of opening files should generally be preferred.
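To see what the context manager does, the following sketch writes a temporary stand-in for myfile.txt (the temporary directory and the file content are assumptions for illustration) and checks the closed attribute of the file object after the with block, once after normal completion and once after an exception:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "myfile.txt")
    p.write_text("This is the first line of myfile.\nAnd this is another line.\n")

    with p.open() as f:
        headline = f.readline()
    print(f.closed)  # True: the file was closed when the block ended

    try:
        with p.open() as g:
            raise ValueError("simulated error while processing the file")
    except ValueError:
        pass
    print(g.closed)  # True: closed even though the block raised an exception
```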

Line 3:

pathlib.Path.open() does not read anything from the file, but returns a file object that you can use to access the opened file. The file object keeps track of the file and of how much of it has been read or written. All file operations in Python are performed with file objects, not file names.

Line 4:

The first call to readline() returns the first line of the file object, that is, everything up to and including the first line break, or the entire file if there is no line break in the file; the next call to readline() returns the second line, if it exists, and so on. When there is nothing left to read, readline() returns an empty string.
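The following sketch uses io.StringIO as an in-memory stand-in for a text file; it offers the same readline() method as the file object returned by pathlib.Path.open(). The content is made up, and its last line deliberately has no line break:

```python
import io

# In-memory text file; the last line has no trailing line break.
f = io.StringIO("first line\nsecond line")

first = f.readline()   # 'first line\n' (includes the line break)
second = f.readline()  # 'second line' (no line break at the end of the file)
third = f.readline()   # '' (nothing left to read)
print(repr(first), repr(second), repr(third))
```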

This behaviour of readline() makes it easy to determine, for example, the number of lines in a file:

>>> with p.open() as f:
...     lc = 0
...     while f.readline() != "":
...         lc = lc + 1
...     print(lc)
...
2

A shorter way to count all lines is to use the readlines() method of the file object, which reads all lines of a file and returns them as a list of strings, with one string per line:

>>> with p.open() as f:
...     print(len(f.readlines()))
...
2

However, if you count the lines of a large file this way, you may run out of memory because the entire file is read at once. readline() can also exhaust memory if you read a line from a large file that contains no line break characters. To handle such situations, both methods accept an optional argument that limits the amount of data read at a time: readline(size) reads at most size characters, and readlines(hint) stops reading further lines once the total size of the lines read so far exceeds hint. Another way to iterate over all lines of a file is to treat the file object as an iterator in a for loop:

>>> with p.open() as f:
...     lc = 0
...     for l in f:
...         lc = lc + 1
...     print(lc)
...
2

This method has the advantage that the lines are read into memory one at a time as needed, so even a large file does not have to fit into memory as a whole. It is also simpler and more readable.
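The following sketch illustrates the size arguments and the iterator approach with in-memory io.StringIO stand-ins (the contents and the chunk size of 4096 characters are arbitrary assumptions): readline() with a size argument reads a line without line breaks in chunks, readlines() with a size hint stops early, and iterating over the file object counts the lines one at a time.

```python
import io

# A single long "line" without any line break.
long_line = io.StringIO("x" * 10_000)
chunks = 0
while True:
    part = long_line.readline(4096)  # at most 4096 characters per call
    if part == "":
        break
    chunks += 1
print(chunks)  # 3 (4096 + 4096 + 1808 characters)

# readlines() with a size hint stops once the lines read so far exceed the hint.
text = io.StringIO("one\ntwo\nthree\n")
first_batch = text.readlines(1)
print(first_batch)  # ['one\n']

# Iterating over the file object reads one line at a time;
# sum() counts them without building a list of all lines.
text = io.StringIO("This is the first line of myfile.\nAnd this is another line.\n")
lc = sum(1 for _ in text)
print(lc)  # 2
```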

However, the read methods may not return line breaks exactly as they appear in the file when you open it in text mode, in other words, without appending a b to the mode. By default (newline=None), Python uses universal newlines mode in text mode on all platforms: \n, \r and \r\n are all recognised as line breaks and converted to \n before they are returned. You can change this with the newline parameter when opening the file: with newline="" all three line breaks are recognised but returned unchanged, and with newline="\n", "\r" or "\r\n" only that string is treated as a line break:

>>> with p.open(newline="\r\n") as f:
...     lc = 0
...     for line in f:
...         lc = lc + 1
...     print(lc)
...

In this example, only \r\n is interpreted as a line break, and it is returned untranslated. If the file is opened in binary mode, the newline parameter is neither needed nor allowed, as all bytes are returned exactly as they appear in the file.
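The effect of the newline parameter is easiest to see with a file that mixes all three kinds of line break. The following sketch writes such a file as raw bytes into a temporary directory (file name and content are made up for illustration) and reads it back in different ways:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "mixed.txt")
    # Raw bytes with a Unix, a Windows and an old Mac line break.
    p.write_bytes(b"unix\nwindows\r\nold mac\rend")

    with p.open() as f:                # newline=None: universal newlines
        default = f.readlines()
    with p.open(newline="") as f:      # recognise all breaks, do not translate
        untranslated = f.readlines()
    with p.open(newline="\r\n") as f:  # only \r\n ends a line
        crlf_only = f.readlines()
    raw = p.read_bytes()               # binary mode: bytes exactly as stored

print(default)       # ['unix\n', 'windows\n', 'old mac\n', 'end']
print(untranslated)  # ['unix\n', 'windows\r\n', 'old mac\r', 'end']
print(crlf_only)     # ['unix\nwindows\r\n', 'old mac\rend']
print(raw)           # b'unix\nwindows\r\nold mac\rend'
```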

pathlib.Path.read_text()

returns the decoded content of the specified file as a string:

>>> p.read_text()
'This is the first line of myfile.\nAnd this is another line.\n'

pathlib.Path.write_text()

opens the specified file in text mode, writes data to it, and closes the file:

>>> p.write_text("New content")
11
>>> p.read_text()
'New content'

An existing file with the same name will be overwritten.
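write_text() returns the number of characters written, not the number of bytes, and both methods accept an encoding argument. The following sketch, which uses a temporary file and a made-up German greeting as assumptions, makes the difference visible and shows that a second call overwrites the previous content:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "myfile.txt")
    written = p.write_text("Grüße", encoding="utf-8")
    size = p.stat().st_size  # ü and ß take two bytes each in UTF-8
    text = p.read_text(encoding="utf-8")

    p.write_text("New content", encoding="utf-8")  # replaces the old content
    replaced = p.read_text(encoding="utf-8")

print(written, size, text)  # 5 7 Grüße
print(replaced)             # New content
```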

Reading directories

pathlib.Path.iterdir()

If the path refers to a directory, this method yields path objects for the directory contents:

>>> p = Path("docs", "save-data")
>>> for child in p.iterdir():
...     child
...
PosixPath('docs/save-data/index.rst')
PosixPath('docs/save-data/minidom_example.py')
PosixPath('docs/save-data/pickle.rst')
PosixPath('docs/save-data/xml.rst')
PosixPath('docs/save-data/books.xml')
PosixPath('docs/save-data/files.rst')

The child objects are returned in arbitrary order, and the special entries . and .. are not included. If the path is not a directory or is otherwise inaccessible, an OSError is raised.
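Because the order is arbitrary, the entries are usually sorted for reproducible output. The following sketch builds a small temporary directory (the names are made up) and lists all entries as well as only the regular files:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "index.rst").touch()
    (base / "books.xml").touch()
    (base / "sqlite").mkdir()

    # Sort the names, since iterdir() returns the entries in arbitrary order.
    names = sorted(child.name for child in base.iterdir())
    files = sorted(child.name for child in base.iterdir() if child.is_file())

print(names)  # ['books.xml', 'index.rst', 'sqlite']
print(files)  # ['books.xml', 'index.rst']
```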

pathlib.Path.glob()

finds the specified relative pattern in the directory represented by this path and yields all matching paths:

>>> sorted(p.glob("*.rst"))
[PosixPath('docs/save-data/files.rst'), PosixPath('docs/save-data/index.rst'), PosixPath('docs/save-data/pickle.rst'), PosixPath('docs/save-data/xml.rst')]

See also

Pattern language

pathlib.Path.rglob()

recursively finds the specified relative pattern. This corresponds to calling pathlib.Path.glob() with **/ added in front of the pattern.
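The following sketch compares glob() and rglob() on a small temporary tree (the file names are made up) and checks that rglob("*.rst") finds the same paths as glob("**/*.rst"):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sqlite").mkdir()
    for name in ["index.rst", "files.rst", "sqlite/sqlite.rst", "sqlite/create_db.py"]:
        (base / name).touch()

    def rel(paths):
        """Sorted paths relative to base, with / as separator on every OS."""
        return sorted(q.relative_to(base).as_posix() for q in paths)

    flat = rel(base.glob("*.rst"))   # only the top-level directory
    deep = rel(base.rglob("*.rst"))  # the whole tree
    same = deep == rel(base.glob("**/*.rst"))

print(flat)  # ['files.rst', 'index.rst']
print(deep)  # ['files.rst', 'index.rst', 'sqlite/sqlite.rst']
print(same)  # True
```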

pathlib.Path.walk()

generates the file names in a directory tree by walking it either top-down or bottom-up. For each directory in the tree, it yields a 3-tuple (dirpath, dirnames, filenames).

With the default setting of the optional argument top_down=True, the triple for a directory is generated before the triples for its subdirectories.

By default, symlinks are not followed but added to filenames. With follow_symlinks=True, symlinks are resolved and placed in dirnames or filenames according to their targets.

The following example shows how many bytes the files in each directory of the tree consume, ignoring __pycache__ directories:

>>> for root, dirs, files in p.walk():
...     print(
...         root,
...         "consumes",
...         sum((root / file).stat().st_size for file in files),
...         "bytes in",
...         len(files),
...         "non-directory files",
...     )
...     if "__pycache__" in dirs:
...         dirs.remove("__pycache__")
...
docs/save-data consumes 88417 bytes in 13 non-directory files
docs/save-data/sqlite consumes 35187 bytes in 19 non-directory files

The next example is a simple implementation of shutil.rmtree(). The directory tree must be traversed from bottom to top, as pathlib.Path.rmdir() only allows a directory to be deleted if it is empty. Unlike shutil.rmtree(), the loop deletes only the contents of p, not p itself:

>>> for root, dirs, files in p.walk(top_down=False):
...     for name in files:
...         (root / name).unlink()
...     for name in dirs:
...         (root / name).rmdir()
...

Creating files and directories

pathlib.Path.touch()

creates a file at the specified path. mode can be used to specify the file mode and access flags. If the file already exists and exist_ok=True (the default), only its modification time is updated to the current time; with exist_ok=False, a FileExistsError is raised.
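A short sketch of both settings of exist_ok, using a made-up file name in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "new.txt")
    p.touch()  # creates an empty file
    p.touch()  # exist_ok=True (default): only the modification time changes
    try:
        p.touch(exist_ok=False)
    except FileExistsError:
        raised = True
    else:
        raised = False
    size = p.stat().st_size

print(raised, size)  # True 0
```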

Note

pathlib.Path.open() or pathlib.Path.write_text() are also often used to create files.

pathlib.Path.mkdir()

creates a new directory under the specified path. The parameter mode works as in pathlib.Path.touch(). exist_ok, however, defaults to False here: if the directory already exists, FileExistsError is raised unless exist_ok=True.

If parents=True, missing parent directories of the path are created as needed with the default permissions. With the default setting parents=False, a missing parent directory raises FileNotFoundError.
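The following sketch runs through these cases with a made-up directory path inside a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "docs", "newdir")

    try:
        target.mkdir()  # parent directory docs/ does not exist yet
    except FileNotFoundError:
        missing_parent = True
    else:
        missing_parent = False

    target.mkdir(parents=True)                 # creates docs/ and docs/newdir/
    target.mkdir(parents=True, exist_ok=True)  # no error: directory already exists

    try:
        target.mkdir()  # exist_ok=False (default)
    except FileExistsError:
        already_exists = True
    else:
        already_exists = False

    is_dir = target.is_dir()

print(missing_parent, already_exists, is_dir)  # True True True
```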

Renaming, copying and deleting

pathlib.Path.rename()

renames the file or directory to the specified destination and returns a new pathlib.Path instance that points to the destination. On Unix, if the destination exists and is a file, it is simply replaced; on Windows, a FileExistsError is raised.

>>> myfile = Path("docs", "save-data", "myfile.txt")
>>> newfile = Path("docs", "newdir", "newfile.txt")
>>> myfile.rename(newfile)
PosixPath('docs/newdir/newfile.txt')
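The example above only works if docs/newdir/ already exists, because rename() does not create missing directories. The following sketch recreates it in a temporary directory and additionally uses pathlib.Path.replace(), which is not covered in the text: unlike rename(), it overwrites an existing target file on Windows as well as on Unix. The file contents are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    myfile = base / "save-data" / "myfile.txt"
    myfile.parent.mkdir()
    myfile.write_text("original content")

    newfile = base / "newdir" / "newfile.txt"
    newfile.parent.mkdir()          # rename() needs an existing target directory
    moved = myfile.rename(newfile)  # returns the new path

    other = base / "newdir" / "other.txt"
    other.write_text("will be overwritten")
    final = moved.replace(other)    # overwrites other.txt on every platform

    result = (final.name, final.read_text(), myfile.exists(), moved.exists())

print(result)  # ('other.txt', 'original content', False, False)
```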

Added in version 3.14: the methods pathlib.Path.copy(), pathlib.Path.copy_into(), pathlib.Path.move() and pathlib.Path.move_into().
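The following sketch uses pathlib.Path.copy() and pathlib.Path.move_into() where available and otherwise falls back to shutil.copy2() and shutil.move(), so it also runs on versions before 3.14. The file and directory names are made up:

```python
import shutil
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "myfile.txt")
    src.write_text("some content")
    backup = Path(tmp, "backup.txt")
    archive = Path(tmp, "archive")
    archive.mkdir()

    if sys.version_info >= (3, 14):
        copied = src.copy(backup)                # returns the path of the copy
        moved = src.move_into(archive)           # keeps the name, moves into archive/
    else:
        copied = Path(shutil.copy2(src, backup))
        moved = Path(shutil.move(src, archive))  # moving into an existing directory

    result = (copied.read_text(), moved.relative_to(tmp).as_posix(), src.exists())

print(result)  # ('some content', 'archive/myfile.txt', False)
```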

Permissions and ownership

pathlib.Path.owner()

returns the name of the user who owns the file. Symlinks are followed by default; to determine the owner of the symlink itself, pass follow_symlinks=False. If the file's user ID (UID) is not found in the system database, a KeyError is raised.

pathlib.Path.group()

returns the name of the group that owns the file. Symlinks are handled as in pathlib.Path.owner(), and if the file's group ID (GID) is not found, a KeyError is raised as well.

pathlib.Path.chmod()

changes the file mode and permissions. Symlinks are normally followed. To change the symlink permissions, you can use follow_symlinks=False or pathlib.Path.lchmod().
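owner() and group() look up names in the system's user and group database, so they can fail with KeyError, for example in containers whose user ID has no entry, and they are not supported on Windows, where pathlib raises an exception derived from NotImplementedError. The following sketch catches both cases; the file name and the mode 0o640 are assumptions for illustration. On Windows, chmod() can only change the read-only flag, so the resulting mode differs there:

```python
import os
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "report.txt")
    p.write_text("data")

    try:
        owner, group = p.owner(), p.group()
    except (KeyError, NotImplementedError):
        # KeyError: UID or GID not in the system database;
        # NotImplementedError: not supported on this platform.
        owner = group = None

    p.chmod(0o640)                         # rw for the owner, r for the group
    mode = stat.S_IMODE(p.stat().st_mode)  # permission bits only

print(owner, group, oct(mode))
```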

Comparison with os and os.path

  • pathlib implements objects with pathlib.PurePath and pathlib.Path, while os and os.path work more procedurally with low-level str and bytes.

  • Many functions in os support paths relative to directory descriptors (the dir_fd parameter). pathlib has no equivalent.

  • str and bytes, as well as parts of os and os.path, are written in C and are very fast. pathlib, on the other hand, is written in Python and is often slower, but this does not always matter.

Despite the differences, many os functions can be translated into corresponding pathlib.Path or pathlib.PurePath methods.
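A few common pairs can be compared directly by applying both variants to the same file. The following sketch does this for a made-up file in a temporary directory and checks that each pair agrees; the comments name further pairs that work the same way:

```python
import os
import os.path
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, "docs", "myfile.txt")  # os.path: plain str
    p = Path(tmp, "docs", "myfile.txt")             # pathlib: path object

    os.makedirs(os.path.dirname(name))              # ~ p.parent.mkdir(parents=True)
    with open(name, "w") as f:                      # ~ p.open("w")
        f.write("some text")

    pairs = [
        (name, str(p)),                             # os.path.join ~ Path(...)
        (os.path.exists(name), p.exists()),
        (os.path.isfile(name), p.is_file()),
        (os.path.isdir(os.path.dirname(name)), p.parent.is_dir()),
        (os.path.basename(name), p.name),
        (os.path.splitext(name)[1], p.suffix),
        (os.path.getsize(name), p.stat().st_size),
        (os.listdir(os.path.dirname(name)), [c.name for c in p.parent.iterdir()]),
    ]
    # Further pairs: os.rename ~ Path.rename, os.remove ~ Path.unlink,
    # os.rmdir ~ Path.rmdir, os.getcwd ~ Path.cwd, os.path.expanduser ~ Path.expanduser

print(all(a == b for a, b in pairs))  # True
```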

Checks

  • Use the functions of the pathlib module to take a path to a file named example.log and create a new path in the same directory for a file named example.log1.

  • Open a file my_file.txt and insert additional text at the end of the file. Which command would you use to open my_file.txt? Which command would you use to reopen the file and read it from the beginning?

  • If you look at the man page for the wc utility, you will see two command line options:

    -c

    counts the bytes in the file

    -m

    counts the characters, which in the case of some Unicode characters can be two or more bytes long

    Also, if a file is specified, our module should read from and process that file, but if no file is specified, it should read from and process stdin.

  • If a context manager is used in a script that reads and/or writes multiple files, which of the following approaches do you think would be best?

    1. Put the entire script in a block managed by a with statement.

    2. Use one with statement for all reads and another for all writes.

    3. Use a with statement every time you read or write a file, that is, for every line.

    4. Use a with statement for each file you read or write.

  • Archive *.txt files from the current directory in the archive directory as *.zip files with the current date as the file name.

    • Which modules do you need for this?

    • Write a possible solution.