Cannot write files to Google Cloud Storage from Google Compute Engine VMs?
Recently, while working on a project on Google Cloud as part of my effort to learn the platform, I ran into an issue with a bash script that I was trying to run on a Debian Linux 10 Google Compute Engine VM instance as part of the VM startup process (a startup script). In that script, I was trying to do the following:
- Update the software packages on the GCP instance using ‘apt’ (for those who don’t know, it is the command-line tool for handling software packages on Debian / Ubuntu Linux systems).
- Download and bring up the Google Cloud logging agent on the GCP instance. (The logging agent, once configured and started, pushes syslog and other logs to the Operations Logging service on GCP, so that the logs can be monitored externally, even if the GCP instance itself is stopped / deleted.)
- Install Apache web server on the GCP instance.
- Read a few PHP files from a Google Cloud Storage bucket and write them to a local directory on the GCP instance so that they could be served by the Apache web server, then bring up the Apache web server.
- Lastly, check that the web server was successfully brought up on the GCP instance using curl, and write a file to a Google Cloud Storage bucket. The file was just a one-liner stating that the web server was up and running.
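The steps above can be sketched as a startup script roughly like the one below. The bucket names and file paths are placeholders I made up for illustration, not the ones from my actual project, and the logging-agent install follows Google’s documented one-liner:

```shell
#!/bin/bash
# Hypothetical bucket names -- substitute your own.
SITE_BUCKET="my-site-content-bucket"
STATUS_BUCKET="my-status-bucket"

# 1. Update the software packages on the instance.
apt update && apt -y upgrade

# 2. Download and install the Cloud Logging agent.
curl -sSO https://dl.google.com/cloudagents/add-logging-agent-repo.sh
bash add-logging-agent-repo.sh --also-install

# 3. Install the Apache web server and PHP.
apt -y install apache2 php

# 4. Copy the PHP files from Cloud Storage and restart Apache.
gsutil cp "gs://${SITE_BUCKET}/*.php" /var/www/html/
systemctl restart apache2

# 5. Check that the web server is up, then write a status file to a bucket.
#    This last gsutil cp is the step that silently failed for me.
if curl -s --head http://localhost | grep -q "200 OK"; then
  echo "Web server is up and running on $(hostname)" > /tmp/status.txt
  gsutil cp /tmp/status.txt "gs://${STATUS_BUCKET}/status-$(hostname).txt"
fi
```

This script needs to run on a GCP VM with the right permissions, which is exactly what the rest of this post is about.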
For those who don’t know, Google Cloud Storage is an object storage service inside GCP, like S3 in AWS and Blob Storage in Azure. It has a flat structure where files can be stored inside globally uniquely named buckets (kinda like a directory inside a regular file system).
Getting back to the startup script: it was executing just fine and I was able to externally access the web pages being served by the Apache web server on that instance, but the file that I was trying to write to the Google Cloud Storage bucket at the end of that script just wasn’t showing up inside the bucket! Coming from an AWS background, I expected things to work in a certain way, but to my absolute bewilderment they were not!
Now, you might be wondering why I was doing all this in a startup script instead of logging into the instance and doing it manually from the command line. The reason is that I was actually trying to set up a managed instance group (MIG) inside Google Cloud. Managed instance groups allow us to auto scale and automatically bring up new VM instances inside Google Cloud if the currently serving instances get overloaded. This startup script was part of an instance template that was being used by the managed instance group. An instance template is a way for us to specify a blueprint that will be used to bring up new instances inside the managed instance group to serve our workload; it has all the configuration needed to bring up new VM instances of the same type.
Now, getting back to the inability to write a file to a Google Cloud Storage bucket from an instance, I followed the steps below to get past this issue:
- First of all, I went to the Logs Explorer inside the Operations Logging service on the GCP console and checked the log messages being written from the VM instance. Based on the log messages, it seemed that the startup script was working fine and was not bailing out due to an error before actually writing the file to Cloud Storage. BTW, you should always keep an eye on the Operations monitoring and logging services inside GCP to be able to troubleshoot, investigate and debug any issues that you might face while working on a GCP project.
- Coming from an AWS background, I thought that maybe the default service account being used by the instances did not have the permissions to write files to the Cloud Storage bucket. For those who don’t know, service accounts are IAM identities that are used by VM instances to access other services inside GCP, like Cloud Storage. Just like people use usernames and passwords to access Google Cloud services, the services themselves use service accounts to access other services. I logged into the GCP instance using ssh and tried to manually write a file to Cloud Storage, and I got a 403 insufficient permissions error. I thought that was it: the issue was the VM instance service account not having the permission to write to Cloud Storage. So, I created a new service account for the instance VMs and made sure that it had the correct role (permissions) assigned to be able to read and write to Cloud Storage. To my absolute surprise, after I updated the instances to use this new service account, the file was still not getting written to Cloud Storage from the startup script! I checked the service account again and repeated the manual process of logging into the instance and trying to write a file to Cloud Storage, but I was still seeing the 403 insufficient permissions error. :(
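For reference, the manual write test and the service-account setup I described can be done from the command line roughly as follows. The project ID, bucket name, and service account name here are made up for illustration:

```shell
# From inside the VM: try writing a file to the bucket.
# This is the step where I kept getting the 403 error.
echo "test" > /tmp/test.txt
gsutil cp /tmp/test.txt gs://my-status-bucket/test.txt

# From Cloud Shell or a workstation: create a new service account
# and grant it read/write access to Cloud Storage objects.
gcloud iam service-accounts create my-web-vm-sa \
    --display-name="Web VM service account"

gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:my-web-vm-sa@my-project-id.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
```

As the rest of the post shows, getting the role right was necessary but not sufficient: the access scopes on the VM also had to allow the write.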
- I just did not know what was happening, and I was about ready to pull my hair out when I saw the section below inside the VM instance details.
Google Cloud has this concept of API access scopes, whereby we select certain Google Cloud API actions while creating a new VM instance, and that VM only has access to those API actions. The default API access scope that I was using while creating new VM instances only had read-only access to Cloud Storage. So, even though the service account that I was using for the VM instances had the permission to read and write to Cloud Storage, the instance still wasn’t able to write to Cloud Storage, because the correct API access scope for reading and writing to Cloud Storage wasn’t assigned to it.
So, I had to edit the instance template, switch to the ‘Set access for each API’ access scope option, and select ‘Read Write’ from the options inside the ‘Storage’ dropdown, for the VM instance to be able to successfully call the Read and Write Cloud Storage API actions, as we can see in the screenshots below.
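The same fix can also be applied from the command line when creating an instance template, by passing the storage read/write scope explicitly. The template, machine type, and service account names below are placeholders:

```shell
# Create a new instance template whose VMs get read/write access to
# Cloud Storage via the storage-rw access scope (an alias for
# https://www.googleapis.com/auth/devstorage.read_write).
gcloud compute instance-templates create web-template-v2 \
    --machine-type=e2-small \
    --image-family=debian-10 \
    --image-project=debian-cloud \
    --service-account=my-web-vm-sa@my-project-id.iam.gserviceaccount.com \
    --scopes=storage-rw \
    --metadata-from-file=startup-script=startup.sh
```

Note that access scopes are baked in at instance creation time, which is why the managed instance group had to bring up new instances from the updated template before the change took effect.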
And voila, I was finally able to see the files getting written to Cloud Storage from the VM instances!
So, in order to be able to access other Google Cloud services, including Cloud Storage, from inside a Google Compute Engine VM instance, we need to make sure of the following:
- We need to make sure that the service account being used by the VM instance has the correct role assigned to it, for it to be able to access the other GCP services it needs to.
- We also need to make sure that the access scopes for that VM instance include the correct API actions, for the VM instance to be able to successfully call those API actions. We have seen how to do this in the previous 2 screenshots.
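A quick way to verify both of these from inside a running VM is to query the Compute Engine metadata server, which reports the service account and the access scopes the instance was started with:

```shell
# Which service account is this VM using?
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"

# Which access scopes does it have? For read/write access to Cloud
# Storage, this list should include .../auth/devstorage.read_write
# (or a broader scope such as .../auth/cloud-platform).
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"
```

If the scopes listed here are read-only for storage, no amount of IAM role changes will let the VM write to a bucket.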
As a side note, let me remind you guys (if you don’t already know) that if you open a new Google Cloud account, you get $300 of free credit that you can use during your first 90 days to play around and get to know the different services offered on the Google Cloud platform.
GCP Free Tier - Free Extended Trials and Always Free | Google Cloud
Besides that, there are 20+ always-free products that you can use for free (with a monthly limit, of course) on Google Cloud!
Below is a link to the Google Cloud documentation, if you want to read up more about service accounts and access scopes.
Service accounts | Compute Engine Documentation | Google Cloud
Apparently, there is a way to move away from access scopes, which are now considered legacy. I wonder why access scopes still affect newly created VM instances if they are legacy now. Below is another link to Google’s documentation if you want to read up more about access scopes.
Migrating from legacy access scopes | Kubernetes Engine Documentation
Enjoy and keep learning :)…