PyTorch ML Pipeline: Training PyTorch model on GCP

Jieun Jeon
5 min read · Oct 2, 2021

There are two options for training PyTorch models on Google Cloud Platform. I will go over the key features of each option and show how to set them up in a GCP project.

Table of Contents:

  1. Cloud Datalab (for prototyping)
  2. GCP Deep Learning VM instance (with Cuda GPU)
  3. Setting up Cloud Datalab with PyTorch
  4. Setting up Deep Learning VM instance with PyTorch

1. Cloud Datalab (for prototyping)

If you want to prototype a model quickly, Google offers Cloud Datalab, which provides hosted Datalab notebooks (similar to other Jupyter notebooks).

These are Jupyter notebooks hosted on Compute Engine VM instances. Datalab is packaged as a Docker container that bundles the notebook server along with common libraries and frameworks (PyTorch, TensorFlow, etc.).

Key features of Cloud Datalab are:

  1. The standard way to run Python on GCP
  2. Authentication built-in
  3. Jupyter notebooks
  4. Able to execute Python, SQL and JS (for BigQuery UDFs)
  5. Able to connect to a Git repository: code is integrated with Cloud Source Repositories and auto-saved to the local persistent disk

Key features of the Datalab VM are:

  1. Datalab is packaged as a container running on the VM
  2. The VM is accessible to all users in the project (just like any other VM)

2. GCP Deep Learning VM instance (with Cuda GPU)

If you want to build and train PyTorch models with GPU support, you'll want to use a Deep Learning VM.

Key features of the Deep Learning VM are:

  1. Powerful Google Compute Engine VM instance
  2. Pre-installed with TensorFlow, PyTorch, and scikit-learn
  3. Integrated with JupyterLab, a web-based interface for Jupyter notebooks

3. Setting up Cloud Datalab with PyTorch

  1. Create a new GCP project

From the GCP console, you can create a brand-new GCP project as follows.

I used my school account, so my school organization is auto-selected.
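If you prefer the command line, the same project can be created from Cloud Shell with gcloud; the project ID below is only an example and must be globally unique.

```

# Create a new project (project IDs must be globally unique)
gcloud projects create pytorch-classification-project --name="PyTorch Classification"

# Make it the active project for subsequent gcloud commands
gcloud config set project pytorch-classification-project

```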

2. Enable APIs: Compute Engine API and Cloud Source Repositories API
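Both APIs can be enabled from the console's API Library, or from Cloud Shell using their standard service names:

```

# Enable the Compute Engine API and the Cloud Source Repositories API
gcloud services enable compute.googleapis.com sourcerepo.googleapis.com

```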

3. Create Cloud Datalab VM on Compute Engine

Open Cloud Shell to install the Datalab command-line component.

Then you need to specify a location; you can choose the "asia-northeast3" region since this is the region for Seoul, Korea.
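As a rough sketch, the Cloud Shell commands look like this; the instance name is only an example, and datalab create expects a zone (e.g. asia-northeast3-a) rather than a region:

```

# Install the Datalab command-line component
gcloud components install datalab

# Create a Datalab VM in the Seoul region (asia-northeast3)
datalab create pytorch-datalab-vm --zone asia-northeast3-a

```

Once the instance is up, datalab connects to it over an SSH tunnel and serves the notebook on localhost.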

Let’s see how you can access other GCP resources within Datalab.

You can use Cloud Datalab just like other notebooks such as Colab or a local Jupyter notebook, but it provides much more functionality, such as access to other GCP resources from within the notebook.
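For example, Datalab's built-in authentication lets you query BigQuery directly from a notebook cell. Here is a minimal sketch using the google.datalab client that ships with Datalab; the public dataset is only an example:

```

# Query a BigQuery public dataset from inside a Datalab notebook
import google.datalab.bigquery as bq

query = bq.Query(
    'SELECT name, SUM(number) AS total '
    'FROM `bigquery-public-data.usa_names.usa_1910_current` '
    'GROUP BY name ORDER BY total DESC LIMIT 10')

# Run the query and convert the result to a pandas DataFrame
df = query.execute().result().to_dataframe()
df.head()

```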

4. Setting up Deep Learning VM instance with PyTorch

  1. Select the same GCP project "pytorch-classification-project" and browse to AI Platform -> Notebooks.

The "Notebooks" page is a quick way to create notebook instances backed by a deep learning VM.

2. Create a new instance of a deep learning VM; you can set it up for PyTorch or TensorFlow.

Notice you have two options to choose from: a deep learning VM with a GPU or without one.

Then you can configure your notebook instance. You can choose the appropriate region here, and I checked the option to install the NVIDIA GPU driver automatically so I don't need to do it later.
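If you prefer scripting this step instead of using the console, a PyTorch Deep Learning VM with a GPU can also be created from Cloud Shell using the public deep-learning images; the instance name, zone, and GPU type below are only examples, and GPU availability and quota vary by zone:

```

# Create a PyTorch Deep Learning VM with one NVIDIA T4 GPU and the driver pre-installed
gcloud compute instances create pytorch-dl-vm \
  --zone=asia-northeast3-a \
  --image-family=pytorch-latest-gpu \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator="type=nvidia-tesla-t4,count=1" \
  --metadata="install-nvidia-driver=True"

```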

In JupyterLab, you can also find tutorials from PyTorch. If you open one of the tutorials, you can use JupyterLab just like any other Jupyter notebook on your local machine.

Note that since this notebook instance already has a GPU attached, you'll find that `torch.cuda.is_available()` returns `True`.
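You can confirm this with a quick sanity check in a notebook cell:

```

import torch

print(torch.cuda.is_available())      # True on a GPU-backed notebook instance
print(torch.cuda.get_device_name(0))  # name of the attached NVIDIA GPU

```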

Remember that we have GPU support on this VM, so when you are training your model in this notebook, you need to make sure that you instantiate your tensors on the GPU, i.e., on the CUDA device.

```
device = torch.device("cuda")
```

Also, when you train and test your model, make sure that your training and test labels are on the CUDA device as well. Remember that PyTorch does not support operations between tensors that live on different devices.

This also means that you need to move the network and its parameters onto the same CUDA device.

```
model.to(device)
```

The above code will transfer your neural network parameters to your GPU.
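Putting the two snippets together, a single training step might look like the following sketch; the model, batch size, and optimizer here are placeholders, since the actual network is not shown in this post:

```

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model and optimizer; substitute your own network
model = nn.Linear(784, 10).to(device)   # parameters now live on the GPU
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy batch standing in for a real DataLoader batch
inputs = torch.randn(32, 784, device=device)         # inputs on the CUDA device
labels = torch.randint(0, 10, (32,), device=device)  # labels on the same device

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

```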

To run your own PyTorch model on this deep learning VM, you can upload your own data and .ipynb notebooks using the upload feature.

So far we have learned how to create notebooks on GCP and train a model with cloud resources. Next, I will write about running inference with a PyTorch model on GCP.
