How to Mount a GCP Bucket in Google Colab

Jerri Zhang
3 min read · Nov 11, 2020

Google Colab has become popular among data science beginners and students because it offers free GPU instances. This post shows how to handle large data files in Colab by mounting a Google Cloud Platform (GCP) bucket in the notebook, which lets you read files from and write files to the bucket directly from your Colab notebook.

Requirements

  • Gmail account
  • Google Cloud Platform account
  • Google Cloud Storage access and permissions
  • Google Colab notebooks

Mount GCP bucket in Google Colab

To start, upload your local image files to the destination GCP bucket with the following command. The -m flag performs a parallel (multi-threaded/multi-processing) copy, which speeds up the file transfer.

gsutil -m cp -r local_directory gs://bucket_name/

Content in my GCP bucket. train2014/ contains 80k+ images.
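
If you would rather trigger the upload from a Python script than from a terminal, the same gsutil call can be assembled and run with subprocess. This is only a sketch: the helper names build_upload_command and upload_directory are made up for illustration, and it assumes gsutil is installed and already authenticated on the machine that holds the files.

```python
import shlex
import subprocess


def build_upload_command(local_directory: str, bucket_name: str) -> list[str]:
    """Return the argv for a parallel, recursive gsutil copy (hypothetical helper)."""
    # Strip an accidental gs:// prefix or trailing slash so the URL is not doubled.
    bucket = bucket_name.removeprefix("gs://").strip("/")
    return ["gsutil", "-m", "cp", "-r", local_directory, f"gs://{bucket}/"]


def upload_directory(local_directory: str, bucket_name: str) -> None:
    """Run the copy; raises CalledProcessError if gsutil exits with an error."""
    subprocess.run(build_upload_command(local_directory, bucket_name), check=True)


# Print the command for inspection before actually running it.
print(shlex.join(build_upload_command("local_directory", "bucket_name")))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the local path contains spaces.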

Open your Google Colab notebook and run the following commands in a code cell. The code returns a link where you log in with your Google account and copy a verification code to authenticate yourself.

from google.colab import auth

auth.authenticate_user()

Run the following commands to install gcsfuse.

!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list

!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

!apt -qq update

!apt -qq install gcsfuse
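
Before mounting, it can help to confirm that the install actually put gcsfuse on the PATH. A minimal check using only the standard library might look like this; find_executable is a hypothetical helper name, not part of gcsfuse or Colab.

```python
import shutil
from typing import Optional


def find_executable(name: str = "gcsfuse") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_executable()
if path is None:
    print("gcsfuse not found; re-run the install commands")
else:
    print(f"gcsfuse found at {path}")
```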

The following commands create a directory local to the Colab notebook and mount the GCP bucket on it. --implicit-dirs makes folders in the bucket appear as directories even when they exist only as prefixes of object names; without it, gcsfuse shows only folders that were created explicitly as directory objects, so files nested under other folders can be missing. gcp_bucket_name is the name of the GCP bucket without the gs:// prefix. For example, if your bucket path is gs://my_bucket, then gcp_bucket_name in the following command should be just my_bucket.

!mkdir colab_directory

!gcsfuse --implicit-dirs gcp_bucket_name colab_directory

You can check the content of your GCP bucket now by running ls in the Colab notebook.

!ls colab_directory

Content in the mounted directory

Done! Now you can call Image.open() directly on a file path in the mounted GCP bucket. What’s even better is that when you save your model weights or a serialized model to the mounted colab_directory, it also gets saved to the GCP bucket :)

Open an image stored in GCP bucket directly in Google Colab
Files can be saved directly to GCP bucket in Colab.
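
Because the mounted bucket behaves like an ordinary directory, plain Python file operations are enough to read from and write to it. The sketch below uses two hypothetical helpers, save_bytes and list_files, and demonstrates them on a temporary directory so that it runs anywhere; in Colab you would pass the directory you mounted with gcsfuse instead. The file name checkpoints/weights.bin is an invented example.

```python
import os
import tempfile


def save_bytes(mount_dir: str, relative_path: str, payload: bytes) -> str:
    """Write payload under mount_dir and return the full path of the new file."""
    target = os.path.join(mount_dir, relative_path)
    # Create intermediate folders if they do not exist yet.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(payload)
    return target


def list_files(mount_dir: str, suffix: str = "") -> list[str]:
    """Return sorted paths, relative to mount_dir, of files ending in suffix."""
    found = []
    for root, _dirs, files in os.walk(mount_dir):
        for name in files:
            if name.endswith(suffix):
                found.append(os.path.relpath(os.path.join(root, name), mount_dir))
    return sorted(found)


# Stand-in for the mounted directory; replace demo_dir with your mount point in Colab.
with tempfile.TemporaryDirectory() as demo_dir:
    save_bytes(demo_dir, "checkpoints/weights.bin", b"\x00\x01")
    print(list_files(demo_dir, ".bin"))
```

Walking a folder with tens of thousands of objects through the mount can be slow, so for a large bucket it is probably better to list one subfolder at a time.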

Written by Jerri Zhang

Health Data Science at Harvard/DFCI
