Files and directories
pathlib implements path operations using
pathlib.PurePath and pathlib.Path objects. The
os and os.path modules, on the other hand, offer
functions that work at a low level with str and bytes objects, which is more in
line with a procedural approach. We consider the object-oriented style of
pathlib to be more readable and therefore present it in more detail
here.
See also
PEP 428: The pathlib module – object-oriented filesystem paths
Reading and writing files
In Python, you open and read a file by using the
pathlib.Path.open() method and various built-in read operations.
pathlib.Path.open() opens the file to which the path refers, as
the built-in open() function does. The following short Python
programme reads a line from a text file named myfile.txt in
docs/save-data/:
1 >>> from pathlib import Path
2 >>> p = Path("docs", "save-data", "myfile.txt")
3 >>> f = p.open()
4 >>> headline = f.readline()
- Line 2:
  The arguments of pathlib.Path are path segments; the resulting object is a PosixPath or a WindowsPath, depending on your operating system. In the previous example, you open a file that you assume is located at docs/save-data/myfile.txt, relative to the directory from which you call the programme.

  The following example opens a file at an absolute location, C:\Users\Veit\My Documents\myfile.txt:

  2 >>> p = Path("C:/", "Users", "Veit", "My Documents", "myfile.txt")
  3 >>> with p.open() as f:
  4 ...     f.readline()
  5 ...

  The first segment is written as "C:/" because a bare drive segment such as "c:" would produce a path relative to the current directory on that drive.

  Note
  In this example, the keyword with is used, which means that the file is opened with a context manager, as explained in more detail in Context management with with. The context manager closes the file even if an I/O error occurs, so this way of opening files should generally be preferred.
- Line 3:
  pathlib.Path.open() does not read anything from the file, but returns a file object that you can use to access the opened file. The file object keeps track of the file and of how much of it has been read or written. All file operations in Python are performed with file objects, not file names.
- Line 4:
  The first call to readline() returns the first line of the file object, that is, everything up to and including the first line break, or the entire file if there is no line break in the file; the next call to readline() returns the second line, if it exists, and so on. When there is nothing left to read, readline() returns an empty string.
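The following is a minimal sketch of the readline() behaviour described above. The file name and its two lines are made up for illustration, and the file is created in a temporary directory so that no real data is touched:

```python
import tempfile
from pathlib import Path

# Hypothetical sample file in a temporary directory (an assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "myfile.txt")
    p.write_text("first line\nsecond line\n", encoding="utf-8")

    with p.open(encoding="utf-8") as f:
        first = f.readline()   # includes the trailing line break
        second = f.readline()  # the next call continues where the first stopped
        at_end = f.readline()  # nothing left to read: empty string

print(repr(first), repr(second), repr(at_end))
```

Because an empty line in the file is still returned as "\n", an empty string can only mean that the end of the file has been reached.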
This behaviour of readline() makes it
easy to determine, for example, the number of lines in a file:
>>> with p.open() as f:
...     lc = 0
...     while f.readline() != "":
...         lc = lc + 1
...     print(lc)
...
2
A shorter way to count all lines is to use the built-in readlines() method, which reads all lines of a file and
returns them as a list of strings, with one string per line:
>>> with p.open() as f:
...     print(len(f.readlines()))
...
2
However, if you count all lines in a large file, this method can exhaust the
available memory because the entire file is read at once. The same can happen
with readline() if you try to read a line from a large file that does not
contain any line break characters. To better handle such situations, both
methods accept an optional argument that limits the amount of data read at a
time.

Another way to iterate over all lines of a file is to treat the file object as
an iterator in a for loop:
>>> with p.open() as f:
...     lc = 0
...     for line in f:
...         lc = lc + 1
...     print(lc)
...
2
This method has the advantage that the lines are read into memory as needed, so even with large files, there is no need to worry about running out of memory. The other advantage of this method is that it is simpler and more readable.
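The optional size argument of readline() and readlines() mentioned earlier can be sketched as follows. The file name and contents are assumptions made for this example; readline(100) returns at most 100 characters per call, and readlines(1) stops as soon as the size hint has been reached:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "long.txt")  # hypothetical file name

    # One long line without any line break.
    p.write_text("x" * 1000, encoding="utf-8")
    pieces = 0
    with p.open(encoding="utf-8") as f:
        # Each call returns at most 100 characters instead of the whole line.
        while f.readline(100) != "":
            pieces += 1

    # Three short lines.
    p.write_text("aaaa\nbbbb\ncccc\n", encoding="utf-8")
    with p.open(encoding="utf-8") as f:
        head = f.readlines(1)  # stops once the size hint is reached
        rest = f.readlines()   # reads whatever is left

print(pieces, head, rest)
```

The hint passed to readlines() is only a lower bound for the amount of data: complete lines are always returned.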
However, reading in text mode, that is, when you open the file without
appending a b to the mode, can produce surprises because line endings are
translated. By default, Python reads text files in universal newlines mode:
every \r\n pair and every lone \r is converted to \n, regardless of the
operating system. You can specify how line breaks are handled by using the
newline parameter when opening the file: with newline="\n", "\r" or "\r\n",
only that string is treated as a line break, and line endings are returned
untranslated:
>>> with p.open(newline="\r\n") as f:
...     lc = 0
...
In this example, only \r\n is interpreted as a line break. However, if the
file is opened in binary mode, the newline parameter is not necessary, as
all bytes are returned exactly as they appear in the file.
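The effect of the newline parameter can be checked with a small experiment. This is only a sketch: the file name and its mixed line endings are invented for the example, and the file is written as raw bytes so that nothing is translated on the way in:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "mixed.txt")  # hypothetical file with mixed line endings
    p.write_bytes(b"one\r\ntwo\nthree\r\n")

    # Default text mode: every line ending is translated to "\n".
    with p.open(encoding="utf-8") as f:
        default_lines = f.readlines()

    # Only "\r\n" ends a line, and it is returned untranslated.
    with p.open(encoding="utf-8", newline="\r\n") as f:
        crlf_lines = f.readlines()

    # Binary mode: the bytes exactly as they are stored.
    raw = p.read_bytes()

print(default_lines)
print(crlf_lines)
print(raw)
```

With newline="\r\n", the lone \n after "two" no longer separates lines, so "two" and "three" end up in the same string.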
pathlib.Path.read_text() returns the decoded content of the specified file as a string:

>>> p.read_text()
'This is the first line of myfile.\nAnd this is another line.\n'

pathlib.Path.write_text() opens the specified file in text mode, writes data to it, and closes the file:

>>> p.write_text("New content")
11
>>> p.read_text()
'New content'

An existing file with the same name will be overwritten.
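As a small illustration of these two methods, the following sketch passes an explicit encoding, which both methods accept. The file name and the text are assumptions for the example. It also shows that write_text() returns the number of characters written, which can differ from the number of bytes stored on disk:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "greeting.txt")  # hypothetical file name
    p.write_text("Old content", encoding="utf-8")

    # The second call replaces the old content completely.
    written = p.write_text("Grüße", encoding="utf-8")
    text = p.read_text(encoding="utf-8")

    # "ü" and "ß" each take two bytes in UTF-8.
    size_in_bytes = len(p.read_bytes())

print(written, text, size_in_bytes)
```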
Reading directories
pathlib.Path.iterdir() yields the path objects of the directory contents if the path refers to a directory:

>>> p = Path("docs", "save-data")
>>> for child in p.iterdir():
...     child
...
PosixPath('docs/save-data/index.rst')
PosixPath('docs/save-data/minidom_example.py')
PosixPath('docs/save-data/pickle.rst')
PosixPath('docs/save-data/xml.rst')
PosixPath('docs/save-data/books.xml')
PosixPath('docs/save-data/files.rst')
The child objects are returned in arbitrary order, and the special entries .
and .. are not included. If the path is not a directory or is otherwise
inaccessible, an OSError is raised.
pathlib.Path.glob() matches the specified relative pattern in the directory represented by this path and yields all matching files:

>>> sorted(p.glob("*.rst"))
[PosixPath('docs/save-data/files.rst'),
 PosixPath('docs/save-data/index.rst'),
 PosixPath('docs/save-data/pickle.rst'),
 PosixPath('docs/save-data/xml.rst')]
pathlib.Path.rglob() recursively matches the specified relative pattern. This corresponds to calling pathlib.Path.glob() with **/ before the pattern.

pathlib.Path.walk() generates the file names in a directory structure by traversing the structure either from top to bottom or from bottom to top. For each directory, it yields a 3-tuple consisting of (dirpath, dirnames, filenames).

With the default setting of the optional argument top_down=True, the triple for a directory is generated before the triples for its subdirectories.

With follow_symlinks=True, symlinks are resolved and placed in dirnames and filenames according to their targets.

The following example shows the size of the files in a directory, ignoring __pycache__ directories:

>>> for root, dirs, files in p.walk():
...     print(
...         root,
...         "consumes",
...         sum((root / file).stat().st_size for file in files),
...         "bytes in",
...         len(files),
...         "non-directory files",
...     )
...     if "__pycache__" in dirs:
...         dirs.remove("__pycache__")
...
docs/save-data consumes 88417 bytes in 13 non-directory files
docs/save-data/sqlite consumes 35187 bytes in 19 non-directory files
The next example is a simple implementation of shutil.rmtree(), whereby the directory tree must be traversed from bottom to top, as pathlib.Path.rmdir() only allows a directory to be deleted if it is empty:

>>> for root, dirs, files in p.walk(top_down=False):
...     for name in files:
...         (root / name).unlink()
...     for name in dirs:
...         (root / name).rmdir()
...
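To try out iterdir(), glob() and rglob() without depending on an existing project, the following sketch builds a small directory tree in a temporary directory. All file and directory names are made up for the example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    # Hypothetical files: three at the top level, one in a subdirectory.
    for name in ("index.rst", "files.rst", "books.xml", "sub/notes.rst"):
        (root / name).touch()

    # iterdir() yields entries in arbitrary order, so sort them for display.
    children = sorted(child.name for child in root.iterdir())

    # glob() only looks at the directory itself ...
    top_level = sorted(path.name for path in root.glob("*.rst"))

    # ... while rglob() also descends into subdirectories.
    recursive = sorted(
        path.relative_to(root).as_posix() for path in root.rglob("*.rst")
    )

    # rglob(pattern) corresponds to glob("**/" + pattern).
    same = sorted(root.rglob("*.rst")) == sorted(root.glob("**/*.rst"))

print(children)
print(top_level)
print(recursive)
print(same)
```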
Creating files and directories
pathlib.Path.touch() creates a file at the specified path. mode can be used to specify the file mode and access flags. If the file already exists, the modification time is updated to the current time if exist_ok=True, which is the default; otherwise a FileExistsError is raised.

Note
pathlib.Path.open() or pathlib.Path.write_text() are also often used to create files.

pathlib.Path.mkdir() creates a new directory under the specified path. The parameters mode and exist_ok work as specified in pathlib.Path.touch(), except that exist_ok defaults to False here.

If parents=True, missing parent directories of the path are created as needed with the default permissions. With the default setting parents=False, however, a missing parent directory raises a FileNotFoundError.
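The interplay of parents and exist_ok can be sketched as follows. The directory and file names are assumptions for this example, and everything happens inside a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b"  # hypothetical nested directory

    try:
        nested.mkdir()  # the parent directory "a" does not exist yet
    except FileNotFoundError:
        missing_parent = True
    else:
        missing_parent = False

    nested.mkdir(parents=True)                 # creates "a" and "a/b"
    nested.mkdir(parents=True, exist_ok=True)  # already exists: no error

    log = nested / "example.log"
    log.touch()  # creates the empty file
    log.touch()  # exists already: only the modification time is updated

    try:
        log.touch(exist_ok=False)
    except FileExistsError:
        refused = True
    else:
        refused = False

    created = nested.is_dir() and log.is_file()

print(missing_parent, refused, created)
```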
Renaming, copying and deleting
pathlib.Path.rename() renames the file or directory to the specified destination and returns a new pathlib.Path instance that points to the destination. On Unix, if the destination exists and is a file, it is simply replaced; on Windows, a FileExistsError is raised.

>>> myfile = Path("docs", "save-data", "myfile.txt")
>>> newfile = Path("docs", "newdir", "newfile.txt")
>>> myfile.rename(newfile)
PosixPath('docs/newdir/newfile.txt')
Added in version 3.14: pathlib.Path.copy(), pathlib.Path.copy_into(),
pathlib.Path.move() and pathlib.Path.move_into().
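A self-contained sketch of renaming might look like this. The file names are invented for the example. It also uses pathlib.Path.replace(), which is not discussed above and is included here as one way to overwrite an existing destination on both Unix and Windows:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    myfile = base / "myfile.txt"  # hypothetical file names
    myfile.write_text("content", encoding="utf-8")

    newdir = base / "newdir"
    newdir.mkdir()  # rename() does not create missing directories

    newfile = myfile.rename(newdir / "newfile.txt")
    moved = newfile.exists() and not myfile.exists()

    other = base / "other.txt"
    other.write_text("other", encoding="utf-8")

    # replace() overwrites an existing destination unconditionally.
    replaced = other.replace(newfile)
    final = replaced.read_text(encoding="utf-8")

print(moved, final)
```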
Permissions and ownership
pathlib.Path.owner() returns the name of the user who owns the file. Normally, symlinks are followed, but if you want to determine the owner of the symlink itself, add follow_symlinks=False. If the user ID (UID) of the file is not found, a KeyError is raised.

pathlib.Path.group() returns the name of the group that owns the file. The behaviour for symlinks is the same as for pathlib.Path.owner(). If the group ID (GID) of the file is not found, a KeyError is also raised.

pathlib.Path.chmod() changes the file mode and permissions. Symlinks are normally followed. To change the symlink permissions, you can use follow_symlinks=False or pathlib.Path.lchmod().
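The following sketch changes the permissions of a file and reads them back. It assumes a POSIX system, where the permission bits are fully supported; the file name is hypothetical, and the constants come from the standard stat module:

```python
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "secret.txt")  # hypothetical file name
    p.touch()

    # Owner may read and write, the group may read, others get nothing.
    p.chmod(stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
    mode = stat.S_IMODE(p.stat().st_mode)

    try:
        owner = p.owner()
    except (KeyError, NotImplementedError):
        # The UID has no entry in the user database, or the platform
        # does not support looking up owners.
        owner = None

print(oct(mode), owner)
```

On a POSIX system, the printed mode should be 0o640.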
Comparison with os and os.path
- pathlib implements objects with pathlib.PurePath and pathlib.Path, while os and os.path work more procedurally with low-level str and bytes.
- Many functions in os and os.path support paths relative to directory descriptors. These functions are not available in pathlib.
- str and bytes, as well as parts of os and os.path, are written in C and are very fast. pathlib, on the other hand, is written in Python and is often slower, but this does not always matter.
Despite the differences, many os functions can be translated into corresponding
pathlib.Path or pathlib.PurePath methods.
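A few such correspondences can be checked side by side. The selection below is only a sample, and the directory and file names are assumptions for the example:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, "reports", "summary.txt")  # str-based path
    path = Path(tmp) / "reports" / "summary.txt"        # object-based path

    os.makedirs(os.path.dirname(name))  # os style
    path.parent.mkdir(exist_ok=True)    # pathlib style; directory exists already
    path.touch()

    # Each entry pairs the result of an os call with its pathlib counterpart.
    pairs = {
        "os.path.basename / .name": (os.path.basename(name), path.name),
        "os.path.dirname / .parent": (os.path.dirname(name), str(path.parent)),
        "os.path.splitext / .suffix": (os.path.splitext(name)[1], path.suffix),
        "os.path.exists / .exists()": (os.path.exists(name), path.exists()),
        "os.path.isdir / .is_dir()": (os.path.isdir(name), path.is_dir()),
        "os.listdir / .iterdir()": (
            os.listdir(os.path.dirname(name)),
            [child.name for child in path.parent.iterdir()],
        ),
    }

for label, (os_result, pathlib_result) in pairs.items():
    print(label, os_result == pathlib_result)
```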
Checks
- Use the functions of the pathlib module to take a path to a file named example.log and create a new file path in the same directory for a file named example.log1.
- Open a file my_file.txt and insert additional text at the end of the file. Which command would you use to open my_file.txt? Which command would you use to reopen the file and read it from the beginning?
- If you look at the man page for the wc utility, you will see two command line options:
  - -c counts the bytes in the file
  - -m counts the characters, which in the case of some Unicode characters can be two or more bytes long

  Also, if a file is specified, our module should read from and process that file, but if no file is specified, it should read from and process stdin.
- If a context manager is used in a script that reads and/or writes multiple files, which of the following approaches do you think would be best?
  - Put the entire script in a block managed by a with statement.
  - Use one with statement for all reads and another for all writes.
  - Use a with statement every time you read or write a file, that is, for every line.
  - Use a with statement for each file you read or write.
- Archive *.txt files from the current directory in the archive directory as *.zip files with the current date as the file name.
  - Which modules do you need for this?
  - Write a possible solution.