Notebook 1 - Getting Started with PyTorch on Colab¶

Basic Colab Usage¶

Colab is a service from Google that provides access to computing resources (like GPUs) useful for training and running neural networks. A Colab notebook consists of:

  • Code cells, which accept:
      • Ordinary Python code
      • Unix/Linux-style bash commands - these must be preceded by an exclamation mark: !command-name arguments...
      • Notebook-specific "magic commands" - these must be preceded by a percent sign: %command-name arguments...
  • Text (markdown) cells
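
For example, a single code cell can mix all three (a small illustration - the specific commands here are arbitrary):

In [ ]:
print('ordinary Python')  # ordinary Python code
!ls /content              # bash command: list files in Colab's working directory
%time sum(range(10**6))   # magic command: time a Python expression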

Importantly, we can set the runtime type using the Runtime menu at the top of the notebook:

Runtime > Change runtime type > T4 GPU

will give you access to a cheap GPU. Feel free to do this.

You can also check your CPU, Disk, and GPU memory usage in the upper right corner; click the box that says RAM and Disk.
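
If you've selected a GPU runtime, you can also confirm that a GPU is attached from a code cell (nvidia-smi is NVIDIA's standard GPU monitoring utility and is available on Colab GPU runtimes):

In [ ]:
!nvidia-smi # shows the attached GPU and its current memory usage (GPU runtimes only)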

The following cells can be used to check what Python version Colab is using.

Unfortunately, the Python version used by bash commands can disagree with the version used by Python cells. This won't be an issue unless you work with virtual environments or try to change versions.

If you decide to change versions or use venvs for any reason, be careful!

In [ ]:
!python --version # check what version of Python bash commands use
In [ ]:
import sys # check what version of Python the Python cells use
sys.version

In this notebook, we'll need to use the deep learning library PyTorch.

We can ensure it is installed (or install it if it isn't) using the package manager pip. To be safe, use pip3 in most cases - this ensures you are installing packages for use with Python 3.

We will also use google.colab, which of course is pre-installed in Colab environments.

In [ ]:
!pip3 show torch # check that torch is installed - it should be!

If at any point another package/library is needed and Colab throws an error because it isn't installed, use !pip3 install package-name.
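
For example (package-name is just a placeholder - substitute whatever package the error message says is missing):

In [ ]:
!pip3 install package-name # replace package-name with the missing package before running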

Mounting your Google Drive¶

Colab allows you to access your files via Google Drive. This is useful when working with large datasets. Simply store them in a drive you have access to and you'll never have to download them again!

Let's see how:

In [ ]:
# Mount your Google Drive - will show up in the folder 'drive'
from google.colab import drive
drive.mount('/content/drive') # this line will produce a pop-up

You can check the contents of your Drive as follows. You can also use the icon that looks like a folder on the left-hand side of the screen.

In [ ]:
# List the contents of your Google Drive.
!ls "/content/drive/My Drive/"

Data¶

Before we get into PyTorch, we should load up some data so we have something to actually learn from.

An easy-to-use dataset is the Iris dataset, which consists of a set of measurements taken of many iris flowers paired with the flowers' species. It is readily available from sklearn.

In [5]:
from sklearn.datasets import load_iris

data = load_iris()
data['data'].shape, data['target'].shape
Out[5]:
((150, 4), (150,))

sklearn formats the Iris dataset as two arrays.

The first, called 'data', contains 150 rows and 4 columns. This is meant to be interpreted as 150 samples or instances of data, with each having 4 features (sepal length, sepal width, petal length, petal width):

In [6]:
print(data['data'].shape)
data['data'][:5] # look at 5 iris' measurements
(150, 4)
Out[6]:
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

The second, called 'target', is a single vector of 150 entries, which we can interpret as 150 rows and 1 column. Each entry is the class label for the corresponding 4-feature input, and there are three classes (0 for Setosa, 1 for Versicolour, 2 for Virginica):

In [7]:
print('instances:', data['target'].shape)
print('number of unique classes:', len(set(data['target'])))
data['target'][:5] # first 5 are all class 0 (Setosa)
instances: (150,)
number of unique classes: 3
Out[7]:
array([0, 0, 0, 0, 0])

Great! Let's try and pass the data through a neural net!

Writing a model¶

PyTorch is used to build, train, and run deep neural networks. Because the nn (neural network) and nn.functional (common mathematical functions used in neural networks) submodules are used so often, it is common practice to do the following:

In [8]:
import torch
from torch import nn
from torch.nn import functional as F

Let's build a simple neural network and try to complete a forward pass. In PyTorch, a neural network:

  • must be a Python class
  • must subclass (inherit from) torch.nn.Module
  • must have an __init__() method (a constructor), in which ALL trainable model parameters are defined
  • must have a forward() method that defines the behavior of the model (how an input is passed through the layers)
In [9]:
class Net(nn.Module):
    def __init__(self):

        # do whatever must be done for an nn.Module
        super(Net, self).__init__()

        # 4D input, 3D output
        self.linear = nn.Linear(4, 3)

    def forward(self, x):

        # just pass the input through the linear layer
        return self.linear(x)

This is a very simple neural network. In fact, it's hard to even call it a neural network because it is not deep - it has only one layer. This is commonly called a linear model.

Nonetheless, let's try to see what it looks like. I'm going to add a seed as well, so that we all get the same numbers every time (reproducibility):

In [10]:
torch.manual_seed(42)

model = Net() # make an instance of the Net class we just wrote
model # take a look
Out[10]:
Net(
  (linear): Linear(in_features=4, out_features=3, bias=True)
)

4 input features, 3 output features. How many parameters?

In [11]:
s = 0

# iterate over model's parameters
for name, param in model.named_parameters():
    print(name, param.shape)
    print(param, '\n\n\n')
    s += param.numel() # count elements in a tensor
print('total parameters:', s)
linear.weight torch.Size([3, 4])
Parameter containing:
tensor([[ 0.3823,  0.4150, -0.1171,  0.4593],
        [-0.1096,  0.1009, -0.2434,  0.2936],
        [ 0.4408, -0.3668,  0.4346,  0.0936]], requires_grad=True) 



linear.bias torch.Size([3])
Parameter containing:
tensor([0.3694, 0.0677, 0.2411], requires_grad=True) 



total parameters: 15

We have 15 parameters, including the bias terms. This model represents what's called an affine transformation (one that takes an input like $\mathbf{x}$ and maps it to $W\mathbf{x}+\mathbf{b}$):

$$ W\mathbf{x} + \mathbf{b} = \hat{\mathbf{y}} $$

$$ \begin{bmatrix} 0.3823 & 0.4150 & -0.1171 & 0.4593 \\ -0.1096 & 0.1009 & -0.2434 & 0.2936 \\ 0.4408 & -0.3668 & 0.4346 & 0.0936 \\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \end{bmatrix} + \begin{bmatrix} 0.3694 \\ 0.0677 \\ 0.2411 \\ \end{bmatrix} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \end{bmatrix} $$

Alternatively, we can let the first column of $W$ be the bias vector, and assign $\mathbf{x}$ a dummy feature in its first slot which is always 1:

$$ W\mathbf{x} = \hat{\mathbf{y}} $$

$$ \begin{bmatrix} 0.3694 & 0.3823 & 0.4150 & -0.1171 & 0.4593 \\ 0.0677 & -0.1096 & 0.1009 & -0.2434 & 0.2936 \\ 0.2411 & 0.4408 & -0.3668 & 0.4346 & 0.0936 \\ \end{bmatrix} \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ \end{bmatrix} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \end{bmatrix} $$
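
We can verify the affine form directly by computing $W\mathbf{x}+\mathbf{b}$ ourselves and comparing it to the model's output (a quick sketch using the seeded model above and the first iris sample):

In [ ]:
x = torch.tensor([5.1, 3.5, 1.4, 0.2]) # first iris sample
W = model.linear.weight                # shape (3, 4)
b = model.linear.bias                  # shape (3,)
print(W @ x + b)                       # the affine transformation, computed by hand
print(model(x))                        # the model's forward pass - same values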

If we wanted to make this model a bit more powerful, it would be smart to (1) have multiple layers and (2) use a non-linearity between them so the model can learn non-linear functions:

In [12]:
class DeepNet(nn.Module):
    def __init__(self):

        # do whatever must be done for an nn.Module
        super(DeepNet, self).__init__()

        # 4D input, 8D hidden layer, 3D output
        self.linear1 = nn.Linear(4, 8) # hidden size 8 is arbitrary
        self.linear2 = nn.Linear(8, 3) # input size must match the previous layer's output size

    def forward(self, x):

        # pass through first layer
        x = self.linear1(x)

        # activation function, ReLU
        x = F.relu(x)

        # pass through second layer
        x = self.linear2(x)

        return x
In [13]:
deep_model = DeepNet()
deep_model
Out[13]:
DeepNet(
  (linear1): Linear(in_features=4, out_features=8, bias=True)
  (linear2): Linear(in_features=8, out_features=3, bias=True)
)
In [14]:
s = 0

# iterate over model's parameters
for name, param in deep_model.named_parameters():
    print(name, param.shape)
    print(param, end='\n\n')
    s += param.numel() # count elements in a tensor
print('total parameters:', s)
linear1.weight torch.Size([8, 4])
Parameter containing:
tensor([[-0.0706,  0.3854,  0.0739, -0.2334],
        [ 0.1274, -0.2304, -0.0586, -0.2031],
        [ 0.3317, -0.3947, -0.2305, -0.1412],
        [-0.3006,  0.0472, -0.4938,  0.4516],
        [-0.4247,  0.3860,  0.0832, -0.1624],
        [ 0.3090,  0.0779,  0.4040,  0.0547],
        [-0.1577,  0.1343, -0.1356,  0.2104],
        [ 0.4464,  0.2890, -0.2186,  0.2886]], requires_grad=True)

linear1.bias torch.Size([8])
Parameter containing:
tensor([ 0.0895,  0.2539, -0.3048, -0.4950, -0.1932, -0.3835,  0.4103,  0.1440],
       requires_grad=True)

linear2.weight torch.Size([3, 8])
Parameter containing:
tensor([[ 0.1464,  0.1118, -0.0062,  0.2767, -0.2512,  0.0223, -0.2413,  0.1090],
        [-0.1218,  0.1083, -0.0737,  0.2932, -0.2096, -0.2109, -0.2109,  0.3180],
        [ 0.1178,  0.3402, -0.2918, -0.3507, -0.2766, -0.2378,  0.1432,  0.1266]],
       requires_grad=True)

linear2.bias torch.Size([3])
Parameter containing:
tensor([ 0.2938, -0.1826, -0.2410], requires_grad=True)

total parameters: 67

Without writing out all the parameters, this model computes:

$$ W_2\sigma(W_1\mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2 = \hat{\mathbf{y}} $$

with:

$$ W_2 \in \mathbb{R}^{3\times8}, \quad W_1 \in \mathbb{R}^{8\times4}, \quad \mathbf{x} \in \mathbb{R}^{4\times1}, \quad \mathbf{b}_1 \in \mathbb{R}^{8\times1}, \quad \mathbf{b}_2 \in \mathbb{R}^{3\times1} $$

Note that an activation function such as ReLU operates element-wise, so $$X\in\mathbb{R}^{m\times n} \implies \sigma(X)\in\mathbb{R}^{m\times n}$$
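
A quick check of that shape-preserving behavior (a sketch - the input here is just random numbers):

In [ ]:
X = torch.randn(8, 4)     # a random matrix
X.shape, F.relu(X).shape  # ReLU is applied element-wise, so the shape is unchanged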

Forward pass¶

Let's start by converting our data to a torch.Tensor. PyTorch operates on these specialized arrays, which are easily constructed from Python lists, numpy.ndarray objects, and the like.

In [15]:
tensor_data = torch.Tensor(data['data'])
print('data shape:', tensor_data.shape)
print('\nsome rows in our data:', tensor_data[50:55], sep='\n')
data shape: torch.Size([150, 4])

some rows in our data:
tensor([[7.0000, 3.2000, 4.7000, 1.4000],
        [6.4000, 3.2000, 4.5000, 1.5000],
        [6.9000, 3.1000, 4.9000, 1.5000],
        [5.5000, 2.3000, 4.0000, 1.3000],
        [6.5000, 2.8000, 4.6000, 1.5000]])

As with other array-like objects, you can slice them to get rows or columns.
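
For example (a small sketch):

In [ ]:
print(tensor_data[0])         # first row: one flower's 4 measurements
print(tensor_data[:, 0][:5])  # first column: sepal length, first 5 entries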

Now let's try and pass a single row through the DeepNet:

In [16]:
 # calling a model calls its forward() method
input_tensor = tensor_data[50]
output_tensor = deep_model(input_tensor)
print('input:', input_tensor, sep='\n')
print('input shape:', input_tensor.shape, end='\n\n')
print('output:', output_tensor, sep='\n')
print('output shape:', output_tensor.shape)
print('\nprediction:', output_tensor.argmax().item())
print('    target:', data['target'][50])
input:
tensor([7.0000, 3.2000, 4.7000, 1.4000])
input shape: torch.Size([4])

output:
tensor([ 0.8965,  0.0052, -0.6412], grad_fn=<ViewBackward0>)
output shape: torch.Size([3])

prediction: 0
    target: 1

We can also pass many rows through at once by batching:

In [17]:
batched_input_tensor = tensor_data[50:55]
batched_output_tensor = deep_model(batched_input_tensor)
print('input:', batched_input_tensor, sep='\n')
print('input shape:', batched_input_tensor.shape, end='\n\n')
print('output:', batched_output_tensor, sep='\n')
print('output shape:', batched_output_tensor.shape)
print('\nprediction:', batched_output_tensor.argmax(dim=1).numpy())
print('    target:', data['target'][50:55])
input:
tensor([[7.0000, 3.2000, 4.7000, 1.4000],
        [6.4000, 3.2000, 4.5000, 1.5000],
        [6.9000, 3.1000, 4.9000, 1.5000],
        [5.5000, 2.3000, 4.0000, 1.3000],
        [6.5000, 2.8000, 4.6000, 1.5000]])
input shape: torch.Size([5, 4])

output:
tensor([[ 0.8965,  0.0052, -0.6412],
        [ 0.8701, -0.0025, -0.6034],
        [ 0.8821, -0.0281, -0.6684],
        [ 0.7510, -0.0449, -0.5795],
        [ 0.8383, -0.0217, -0.6428]], grad_fn=<AddmmBackward0>)
output shape: torch.Size([5, 3])

prediction: [0 0 0 0 0]
    target: [1 1 1 1 1]

Our model predicts 0 for everything, but it should predict 1 for these flowers - what's wrong? Nothing is broken: the model's weights are still their random initial values, so it hasn't learned anything yet.

Let's supply some trained weights:

In [ ]:
trained_weights_path = 'path/to/model/iris_model.pt'

deep_model.load_state_dict(
    torch.load(trained_weights_path)
)
batched_output_tensor = deep_model(batched_input_tensor)
print('prediction:', batched_output_tensor.argmax(dim=1).numpy())
print('    target:', data['target'][50:55])

Hardware considerations¶

Deep learning is computationally expensive. As such, it's good to be mindful of choices that will reduce your computational expense for financial, environmental, and efficiency reasons.

One way to speed things up is to use a GPU.

GPUs are designed to perform vectorized linear algebra computations on arrays of numbers. These computations can be done much more efficiently than on a CPU. To take advantage of this, we must put all relevant objects on the GPU. This can be done using .to(device_name). Using 'cuda' as the device name will find the first GPU the system has detected.

In [ ]:
# move the input and model to GPU for speed if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
input_tensor = input_tensor.to(device) # tensor .to() is out of place - must reassign with =
model.to(device) # module .to() moves the model's parameters in place
output = model(input_tensor)
output

The following cell will confirm that both the input tensor and the model are on the same GPU if you're in a GPU runtime environment. Otherwise, both will be on the CPU.

In [ ]:
print(next(model.parameters()).device)
input_tensor.device

Training vs inference practices¶

By default, your model is in training mode. You can ensure training mode with:

In [ ]:
deep_model.train()

If you are done training and ready to:

  • evaluate on a dev set
  • evaluate on a test set
  • run inference on some data without updating your model's weights
  • deploy a model for some purpose

You should do two things:

  1. Set eval mode (turns off dropout and switches batch normalization and any similar layers to their inference behavior):
In [ ]:
deep_model.eval()
  2. Use torch.no_grad() as a context manager to avoid computing gradients. Gradient tracking adds significantly to the time and memory needed to process an input, and it is unnecessary unless we wish to update the model's parameters.
In [ ]:
with torch.no_grad():
    output = deep_model(input_tensor)
output

Be sure to always switch back to training mode before you continue to train!
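
Putting the pieces together, a typical evaluation step looks something like this (a sketch reusing the batched tensor from earlier):

In [ ]:
deep_model.eval()                                  # 1. switch to eval mode
with torch.no_grad():                              # 2. turn off gradient tracking
    dev_output = deep_model(batched_input_tensor)
deep_model.train()                                 # back to training mode before any further training
dev_output.argmax(dim=1)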

More practice (if time)¶

Take a look at this dataset of hand-written digits.

Try to define a model architecture suited to these inputs and outputs.

In [19]:
from sklearn.datasets import load_digits
In [ ]:
# YOUR CODE HERE

Bonus¶

In [20]:
from sklearn.datasets import load_iris

data = load_iris()
data['data'].shape, data['target'].shape
Out[20]:
((150, 4), (150,))
In [21]:
X = torch.tensor(data['data'])
y = torch.tensor(data['target'])

# hold out rows 50-55
X_test = X[50:55]
y_test = y[50:55]

X_train = torch.cat((X[:50], X[55:]))
y_train = torch.cat((y[:50], y[55:]))
In [22]:
crit = nn.CrossEntropyLoss()
opt = torch.optim.SGD(deep_model.parameters(), lr=0.1)
In [24]:
for i in range(1000):
    opt.zero_grad()
    loss = crit(deep_model(X_train.float()), y_train.long())
    loss.backward()
    opt.step()
In [25]:
deep_model(X_test.float()).argmax(dim=1).tolist(), y_test.tolist()
Out[25]:
([1, 1, 1, 1, 1], [1, 1, 1, 1, 1])
In [26]:
torch.save(deep_model.state_dict(), 'iris_model.pt')