Modules for files

Built-in modules

The Python standard library contains a number of built-in modules that you can use to manage files:

===============================================================  ================================================================================
Module                                                           Description
===============================================================  ================================================================================
os.path                                                          performs common pathname manipulations
pathlib                                                          provides object-oriented filesystem paths
fileinput                                                        iterates over lines from multiple input files
filecmp                                                          compares files and directories
tempfile                                                         creates temporary files and directories
glob, fnmatch                                                    match path and file names against UNIX shell-style patterns
linecache                                                        provides random access to lines of text files
shutil                                                           performs higher-level file operations
mimetypes                                                        maps file names to MIME types
pickle, shelve                                                   enable Python object serialisation and persistence, see also The pickle module
csv                                                              reads and writes CSV files
json                                                             encodes and decodes JSON
sqlite3                                                          provides a DB-API 2.0 interface for SQLite databases, see also The sqlite module
xml, xml.parsers.expat, xml.dom, xml.sax, xml.etree.ElementTree  read and write XML files, see also ../save-data/xml
html.parser, html.entities                                       parse HTML and XHTML
configparser                                                     reads and writes Windows-like configuration files (.ini)
base64, binascii, quopri, uu                                     encode and decode files or streams (uu was removed in Python 3.13)
struct                                                           reads and writes structured binary data to and from files
zlib, gzip, bz2, zipfile, tarfile                                work with archive files and compression
===============================================================  ================================================================================
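To give an impression of how several of these modules work together, here is a minimal sketch that combines tempfile, pathlib, shutil and filecmp. The file names and the function name `demo_file_modules` are made up for illustration and do not come from the tutorial.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def demo_file_modules():
    """Create, copy, list and compare files inside a throwaway directory."""
    # tempfile: the directory and its contents are deleted when the block ends.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)

        # pathlib: build paths with "/" and write text without an explicit open().
        original = base / "notes.txt"  # hypothetical file name
        original.write_text("first line\nsecond line\n", encoding="utf-8")

        # shutil: a higher-level operation, here a copy that also keeps metadata.
        copy = base / "notes_copy.txt"  # hypothetical file name
        shutil.copy2(original, copy)

        # glob-style pattern matching, used here through pathlib.
        names = sorted(p.name for p in base.glob("*.txt"))

        # filecmp: shallow=False compares the file contents, not just the metadata.
        same = filecmp.cmp(original, copy, shallow=False)

        return names, same


names, same = demo_file_modules()
print(names)  # ['notes.txt', 'notes_copy.txt']
print(same)   # True
```

Because everything happens inside a temporary directory, the sketch leaves no files behind after it has run.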

See also: pandas IO tools

Checks

  • What use cases can you imagine in which the struct module would be useful for reading or writing binary data?

    • when reading and writing a binary file

    • when reading from an external interface, where the data should be stored exactly as it was transmitted

  • Why might pickle be suitable or unsuitable for the following use cases?

    1. Saving some state variables from one run to the next ✅

    2. Storing evaluation results ❌, as pickles can only be read by Python and depend on the Python version used

    3. Saving user names and passwords ❌, as pickles are not encrypted and are unsafe to load from untrusted sources

    4. Saving a large dictionary with English terms ❌, as the entire pickle would have to be loaded into memory
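The struct use cases named in the first check could look roughly like the following sketch. The record layout (a 16-bit id, a 32-bit float and a 4-byte tag) and the names `RECORD`, `encode` and `decode` are assumptions chosen for illustration, not a format defined in this tutorial.

```python
import struct

# Hypothetical record layout: unsigned 16-bit id, 32-bit float, 4-byte tag.
# "<" selects little-endian byte order without padding bytes.
RECORD = struct.Struct("<Hf4s")


def encode(sensor_id, value, tag):
    """Pack one record into exactly RECORD.size bytes."""
    return RECORD.pack(sensor_id, value, tag)


def decode(raw):
    """Unpack bytes, for example as received from an external interface."""
    return RECORD.unpack(raw)


raw = encode(7, 21.5, b"TEMP")
print(RECORD.size)  # 10
print(decode(raw))  # (7, 21.5, b'TEMP')
```

Fixing the byte order and sizes in the format string is what makes the bytes reproducible across machines. With the default native mode, padding and byte order can differ between platforms.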
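For the pickle use cases, the following sketch contrasts case 1 with one possible alternative for case 4. It assumes that shelve, which the module table lists next to pickle, is acceptable for the large dictionary. shelve pickles each value separately under its key, so a single entry can be read without loading all the others. The file names and sample data are invented for illustration.

```python
import pickle
import shelve
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Use case 1: persist a few state variables from one run to the next.
    state = {"run": 3, "last_index": 128}  # hypothetical state
    state_file = base / "state.pickle"
    with state_file.open("wb") as f:
        pickle.dump(state, f)
    with state_file.open("rb") as f:
        # Only unpickle data you wrote yourself: loading a pickle can run code.
        restored = pickle.load(f)

    # Use case 4: shelve stores every value under its own key on disk.
    with shelve.open(str(base / "terms")) as db:
        db["apple"] = "a round fruit"
        db["pear"] = "a fruit that narrows towards the stalk"
    with shelve.open(str(base / "terms"), flag="r") as db:
        definition = db["apple"]  # reads this one entry only

print(restored)    # {'run': 3, 'last_index': 128}
print(definition)  # a round fruit
```

Since shelve uses pickle internally, the security and version caveats from cases 2 and 3 apply to it as well.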