Files and paths

At the start of the course, we learned how to manipulate strings and how to read and write files. In this lecture, we go over a few useful features of Python that make it easier to deal with lists of files, as well as to format data into strings (which can be useful for e.g. constructing filenames or writing data).

The glob module

On the Linux command line, it is possible to list multiple files matching a pattern with e.g.:

$ ls *.py

This lists all files in the current directory whose names end in .py.

The built-in glob module allows you to do something similar from Python. The main function of interest in this module is also called glob, which is why it is called as glob.glob.

This function can be given a pattern (such as *.py) and will return a list of filenames that match:

In [ ]:
import glob
glob.glob('*.ipynb')

The os module

The os module allows you to interact with the system, and also contains utilities to construct or analyse file paths. The os.path sub-module is particularly useful for accessing files - for example,

In [ ]:
import os
os.path.exists('test.py')

can be used to find out if a file exists.
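As a minimal sketch of how such checks can be used, the example below tests the current directory ('.'), which always exists, and a filename that is assumed not to exist. The filename is hypothetical and chosen only for illustration; os.path.isdir and os.path.isfile are related functions from the same sub-module.

```python
import os

# '.' always refers to the current directory, so it exists and is a directory
here_exists = os.path.exists('.')
here_is_dir = os.path.isdir('.')
here_is_file = os.path.isfile('.')
print(here_exists, here_is_dir, here_is_file)

# Hypothetical filename, assumed not to exist in the current directory
missing = 'no_such_file_for_this_example.txt'
if os.path.exists(missing):
    print(missing, 'exists')
else:
    print(missing, 'not found')
```

Checking first in this way lets a script skip or report a missing file instead of stopping with an error when it tries to open it.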

When constructing the path to a file, for example data/file.txt, one normally has to worry about whether the path will be used on Linux/Mac or on Windows, since Linux and Mac separate directories with / whereas Windows uses \. The os.path.join function builds the path with the correct separator for the current system, so you do not have to worry about this:

In [ ]:
os.path.join('data', 'file.txt')
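To illustrate what os.path.join does, the sketch below joins three components and compares the result with a path built by hand using os.sep, which holds the separator for the current system. The sub-directory name 'run1' is hypothetical.

```python
import os

# Any number of components can be joined in a single call
# ('run1' is a hypothetical sub-directory name)
path = os.path.join('data', 'run1', 'file.txt')
print(path)

# os.sep holds the separator used on the current system
print(repr(os.sep))

# Building the same path by hand with os.sep gives an identical string
manual = 'data' + os.sep + 'run1' + os.sep + 'file.txt'
print(path == manual)
```

The printed path uses / on Linux and Mac and \ on Windows, but the code calling os.path.join is the same in both cases.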

This can be combined with glob, for example:

glob.glob(os.path.join('data', '*.txt'))
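The following sketch shows the combination on a concrete case. It assumes nothing about your own files: it creates a temporary directory containing a few empty files with hypothetical names, then lists only the .txt files. The results are sorted because glob does not guarantee any particular order.

```python
import glob
import os
import tempfile

# Work in a throwaway directory so the example does not depend on existing files
with tempfile.TemporaryDirectory() as tmpdir:
    data_dir = os.path.join(tmpdir, 'data')
    os.mkdir(data_dir)

    # Create a few empty files with hypothetical names
    for name in ['b.txt', 'a.txt', 'image.png']:
        with open(os.path.join(data_dir, name), 'w') as f:
            pass

    # Match only the .txt files, and sort since glob returns them in arbitrary order
    txt_files = sorted(glob.glob(os.path.join(data_dir, '*.txt')))

    # Strip the directory part to keep just the filenames
    txt_names = [os.path.basename(p) for p in txt_files]
    print(txt_names)
```

Note that glob returns each match with the directory part included, which is why os.path.basename is used here to recover the bare filenames.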

The os module also has other useful functions, which you can find out about in the documentation.

Exercise

The os.path.getsize function returns the size of a file in bytes. Using glob, loop over all the files in the current directory and, for each one, print the filename and its size in kilobytes (1 kilobyte = 1024 bytes):

In [ ]:
# your solution here