An Anxious Engineer's Digital Diary

Hello, World!

Konstantin Andrikopoulos November 10, 2024 #gcp #terraform #about-me

Welcome to Overthinking Compiler, my new blog and my latest attempt at taking my overthinking habit and compiling it into something productive!

For my first post, I thought it’d be fitting to cover how I managed to deploy this very blog. I didn’t go the “easy” route of choosing a managed platform. Instead, I took a detour through Google Cloud Platform (GCP), containers, and Terraform for infrastructure as code. Why? Because I like a good technical challenge. Plus, having a real-world goal behind this helped me push past my fear of making mistakes and actually finish what I started. Here’s a look at the journey, missteps included.

Serving the Blog

First things first: how is the site created and served? Well, as one of the goals was to beat my overthinking, I kept things simple. At least I hope I did.

Since I want to keep things simple and not overthink, I am using a static site generator. Specifically Zola, which is written in Rust, a language I really, really like. I am customising the site with the abridged theme. I find it quite nice to look at! And while Zola can generate the static content, it can't serve it. For that I am using Static Web Server, a tiny and fast web server also written in Rust.

Since I also enjoy using containers, I generate the site and package it with the static server in a Dockerfile. This way I can build a container which can be deployed wherever!
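For the curious, the Dockerfile is essentially the multi-stage build from Zola's own documentation. A minimal sketch, with image tags that are indicative rather than the exact ones I use:

# Build stage: generate the static site with Zola
FROM ghcr.io/getzola/zola:v0.19.1 AS zola
COPY . /project
WORKDIR /project
RUN ["zola", "build"]

# Runtime stage: serve the generated files with Static Web Server
FROM ghcr.io/static-web-server/static-web-server:2
COPY --from=zola /project/public /public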

Infrastructure

But the above are pretty standard. What really challenged me was creating the infrastructure for the blog. The reason was mainly that I chose the ☁️ cloud ☁️, which I have little experience with. Specifically GCP. To be perfectly honest (and as a disclaimer), I chose GCP because I am currently employed at Google, and I thought it would be quite easy to annoy some colleague if I wanted someone to explain things to me.

To manage my infrastructure, I am using terraform. I think this was a really important decision, even though, as you will soon see, it delayed me quite a bit while I was learning how to use it. But as much as it confused me, it was also a great stress reducer. You see, I have always had anxiety about system administration. I would try to get something to work, changing various things that may or may not have helped, until I got it running. Then I wouldn't know which of the things I did were actually necessary, nor would I remember all of them.

So a few weeks or months would pass, and I would try to change something. Often this change would break other stuff, or I wouldn't remember how I had set things up originally and would have to figure it out from scratch. So I tended to steer clear of system administration/configuration tasks because they made me anxious.

But having my infrastructure as code gives me a bit more courage: all of my configuration lives in a single place and I can easily edit it. Or at least that's the idea; who knows how this whole thing will evolve months (years?) from now.

Overthought

Maybe I should try nix sometime... this will make things easier... right?

The architecture

Well, it's just a blog, so don't expect anything crazy, OK?

Since I wanted to deploy just a simple and lightweight container, I chose to use a Cloud Run service. Cloud Run is a serverless product which allows you to run custom containers.
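In terraform, the service itself boils down to a handful of lines. A minimal sketch, where the name, region, and image are placeholders rather than my exact setup:

resource "google_cloud_run_v2_service" "blog" {
  name     = "blog"
  location = "europe-west1" # placeholder region

  template {
    containers {
      # Placeholder image reference; the real one is built by Cloud Build
      image = "gcr.io/my-project/blog:latest"
      ports {
        # Static Web Server listens on port 80 by default
        container_port = 80
      }
    }
  }
}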

But it is not quite that simple. In order to serve the service to the public properly and have a custom domain, I needed to create a Load Balancer (LB). As far as I understand from the way I had to set it up, the LB is basically a proxy that sits in front of your services, terminates any TLS connections, and then forwards the requests to the corresponding service inside your project.
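To give a rough idea of how the two connect: the Cloud Run service is wrapped in a serverless network endpoint group (NEG), which then backs a backend service of the LB. This is only a partial sketch with placeholder names; the full setup also needs a URL map, a target HTTPS proxy, a forwarding rule, and the managed certificate:

resource "google_compute_region_network_endpoint_group" "blog_neg" {
  name                  = "blog-neg"
  region                = "europe-west1" # placeholder region
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = google_cloud_run_v2_service.blog.name
  }
}

resource "google_compute_backend_service" "blog_backend" {
  name = "blog-backend"

  backend {
    group = google_compute_region_network_endpoint_group.blog_neg.id
  }
}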

Overthought

I don't have a good way to automate the creation of the DNS records when recreating the LB. I assume it is possible to do that with terraform, but I haven't figured it out yet.

Maybe part of the problem is that I am using GoDaddy as my registrar. I should probably move my domain to a better alternative like Cloudflare. With a very quick search I see that terraform has resources for adding Cloudflare DNS records.
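If I ever do move, I imagine the record would look something like this untested sketch (the zone ID and address variables are placeholders, and depending on the provider version the attribute is called value or content):

resource "cloudflare_record" "blog" {
  zone_id = var.cloudflare_zone_id # placeholder
  name    = "blog"
  type    = "A"
  value   = var.lb_ip_address # placeholder; "content" in newer provider versions
}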

Then, to automate deployment of the blog, I have set up a Cloud Build trigger which detects pushes to the GitHub repo where I store my blog, and kicks off an automation that builds the container and deploys it to the Cloud Run service.
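At its core, that automation is just a docker build, a docker push, and a gcloud run deploy. A minimal sketch of the relevant cloudbuild.yaml steps (the image name, service name, and region are placeholders, not my exact configuration):

steps:
  # Build and push the blog container image
  - name: gcr.io/cloud-builders/docker
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/blog', '.']
  - name: gcr.io/cloud-builders/docker
    args: ['push', 'gcr.io/$PROJECT_ID/blog']
  # Deploy the new image to the Cloud Run service
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: gcloud
    args: ['run', 'deploy', 'blog', '--image', 'gcr.io/$PROJECT_ID/blog', '--region', 'europe-west1']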

The code, configuration, and content of the blog are all stored in the same GitHub repo.

Overthought

In retrospect, maybe having the configuration and the content in the same repo was a mistake. If I push configuration changes, a build of the blog image gets triggered, even though no changes to the blog content have been made. On the other hand, a single repo keeps things simple. Maybe there is a way to have the best of both worlds; perhaps the trigger can be made to fire only if a specific path has changes?
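A quick look at the docs suggests it can: the google_cloudbuild_trigger resource accepts an included_files list of glob patterns. Something like this untested sketch (the paths are guesses based on a typical Zola layout):

resource "google_cloudbuild_trigger" "blog" {
  # ... the rest of the trigger configuration ...

  # Only fire when the blog content or theme configuration changes
  included_files = ["content/**", "templates/**", "config.toml"]
}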

But as I said: No Overthinking, I went with the simple solution.

Finally, you may or may not know that Terraform requires a state file, which keeps track of what the deployment looks like. There are several reasons for this that you can read about here, but in brief, terraform needs the state to map your configuration to the real-world resources it has created, and to track metadata such as the dependencies between resources.

As an extra benefit, a state file improves terraform's performance, since it doesn't always need to query the provider for all existing resources.

But useful as state files are, they are also somewhat of a liability. Since terraform relies on the state to store its knowledge about your deployment, if you lose it you are toast! Terraform will think that it has never created the resources, resulting in all sorts of trouble. Oh boy.

So here I let my overthinking win and decided to find a proper solution for storing the state file before deploying. Fortunately this was quite easy, as terraform supports storing your state file in a GCS bucket. And to make things even easier, the process is officially documented by GCP.
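For reference, the backend configuration amounts to just a few lines (the bucket name is a placeholder; the bucket itself has to exist before terraform can use it):

terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket" # placeholder name
    prefix = "blog"
  }
}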

Where Things Got Ugly

This is all well and good, but as is often the case, when it's time to apply things in practice, not everything goes smoothly. And this was the case with deploying my blog too, although part of the reason was that I was not very familiar with the tools I was using.

Connecting my Repo to the Cloud Build Trigger

Giving the Cloud Build trigger access to my GitHub repo was where I met my first difficulties. You see, in the Terraform documentation for the google_cloudbuild_trigger resource, I saw that a github block can be provided. So I assumed that this was the intended way of connecting a GitHub repository. And this is what I did:

github {
  owner = "<username>"
  name = "<repo name>"
  push {
    branch = "^main$"
  }
}

Of course, I also created a google_cloudbuildv2_connection and a google_cloudbuildv2_repository to properly connect the repository.
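For reference, those two resources look roughly like this (the names, location, and the secret version holding my GitHub token are placeholders):

resource "google_cloudbuildv2_connection" "github" {
  name     = "github-connection"
  location = "europe-west1" # placeholder region

  github_config {
    app_installation_id = var.github_app_installation_id # placeholder
    authorizer_credential {
      oauth_token_secret_version = var.github_token_secret_version # placeholder
    }
  }
}

resource "google_cloudbuildv2_repository" "my_repository" {
  name              = "<repo name>"
  parent_connection = google_cloudbuildv2_connection.github.id
  remote_uri        = "https://github.com/<username>/<repo name>.git"
}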

And yet it didn't work.

I did notice, however, that when creating the connection, terraform created a 2nd gen repository. That v2 in the resource name should probably have been a clue...

I tried manually creating a 1st gen repository and creating the trigger, and that seemed to work. So I decided to figure out how to create 1st gen repos through terraform, hoping this would fix my issues. But I couldn't find a way to do it.

Thus I turned my attention to doing it properly and understanding how to use 2nd gen repos. Turns out that when you don't get anxious, try to understand how things work, and (most importantly) RTFM, you can find solutions. The trick was to not use the github block! Instead, this is how you can use a 2nd gen repo in your Cloud Build trigger:

repository_event_config {
  repository = google_cloudbuildv2_repository.my_repository.id
  push {
    branch = "^main$"
  }
}

Using Git Submodules

Well, if you are thinking that git submodules are a pretty basic git feature and should work out of the box, that's what I thought too! However, it seems that the Cloud Build environment doesn't initialise submodules 🙄. I had to use a custom cloudbuild.yaml file in order to add steps that initialise the submodules (if you are wondering why I needed submodules in the first place, it is the recommended way to use the abridged theme):

steps:
  - name: gcr.io/cloud-builders/git
    args: ['submodule', 'init']
  - name: gcr.io/cloud-builders/git
    args: ['submodule', 'update']

Using Git LFS

Thus far my blog has a grand total of one (1) image: my banner! It is not that big, but I decided to store it with git lfs, to follow best practices. However, it seems that the Cloud Build environment doesn't initialise git-lfs 🙄. So I needed more custom stuff in my cloudbuild.yaml. Yay!

However, if you thought that adding

  - name: gcr.io/cloud-builders/git
    args: ['lfs', 'pull']

would work, then think again. The first reason this doesn't work is that the git cloud builder provided by GCP doesn't have git lfs installed. So my second attempt looked like this:

  - name: gcr.io/cloud-builders/git
    script: |
      apt-get -y update
      apt-get -y install git-lfs
      git lfs pull

This theoretically should have worked. However, it seems that the Cloud Build environment doesn't set up the git credentials inside the build environment (No rolling eyes here since I believe this is done for security reasons). Thus, git lfs pull fails because it cannot read my private GitHub repository. Yay, security \o/!

But I still need to pull my large files. So what do? One solution would be to set up a deploy key. Deploy keys can give read-only access to a repo so that its source can be pulled and deployed from somewhere else. However, I would need to set up a secret on GCP, and possibly rotate it every so often. I didn't want to mess with that.

But there is another way! Specifically, Cloud Build's 2nd gen repository API provides a method that returns a read-only token for a connected repo! So in the end, what I ended up doing was run a script in the git cloud builder that:

- installs git-lfs and jq,
- requests a read-only token for the connected repository from the accessReadToken API method,
- rewrites the origin remote's URL to embed the username and token, and
- finally runs git lfs pull.

And it looks like this:

  - name: gcr.io/cloud-builders/git
    script: |
      # git-lfs and jq are not preinstalled in the git builder
      apt-get -y update
      apt-get -y install git-lfs jq
      # Ask the Cloud Build v2 API for a read-only token for the connected repo
      # ($_CB_REPO is a substitution holding the repository's resource name)
      token=$(curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)" https://cloudbuild.googleapis.com/v2/$_CB_REPO:accessReadToken | jq -r .token)
      # Rebuild the origin remote URL with the username and token embedded
      user=$(git remote get-url origin | cut -d / -f 4)
      new_remote=$(git remote get-url origin | sed "s/^https:\/\//https:\/\/${user}:${token}@/")
      git remote remove origin
      git remote add origin ${new_remote}
      # Now git lfs can authenticate against GitHub and fetch the large files
      git lfs pull

I don't know if this is the best way to do this, but it is the one I found while searching online. A nice side effect was that I discovered jq, a command-line JSON processor. Neat!

A Small Hiccup Along the Way

A slight issue I encountered (and one quite easy to fix) was caused by my using a different domain for my managed SSL certificate during development, instead of blog.mandragore.io. And of course I only realised after I had deployed everything. But no worries! I am using terraform; I can simply change the domain in my terraform files, run apply again, and everything should be OK. Well... no.

You see, when a managed SSL certificate is already in use, it can't be destroyed. And in order to change the domain of a certificate, terraform needs to destroy the resource and then recreate it. There is an easy fix for that: terraform provides a lifecycle meta-argument on all resources that can control a resource's lifecycle. In this case, I added this to the managed certificate's resource definition:

lifecycle {
  create_before_destroy = true
}

and now terraform will first create the new resource. But there is still one issue: in my definition of the managed cert resource I used a hardcoded name. And when terraform tries to create the new managed certificate with the same hardcoded name, before destroying the old one, it fails. You know... since it already exists.

For now I quickly solved the issue by manually giving the certificate a new name. This way the new certificate has a different name and gets created, while the old one still gets deleted, because in terraform a resource that exists in the state but no longer appears in the configuration is destroyed.

But in the future I should probably derive the certificate's name from a random_id resource, so that this name-changing dance happens automatically.
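Something like the following sketch should do the trick (untested; the variable and resource names are placeholders). The keepers argument ensures a fresh suffix is generated whenever the domain changes, and create_before_destroy takes care of the ordering:

resource "random_id" "cert" {
  byte_length = 4

  keepers = {
    # A new suffix is generated whenever the domain changes
    domain = var.blog_domain # placeholder variable
  }
}

resource "google_compute_managed_ssl_certificate" "blog" {
  name = "blog-cert-${random_id.cert.hex}"

  managed {
    domains = [var.blog_domain]
  }

  lifecycle {
    create_before_destroy = true
  }
}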

Seeing Everything Working

I know deploying a static website is not the most complex thing in the world. And yet I must admit that I got a lot of satisfaction from seeing it running. I guess it really is the simple things in life...

But in all seriousness, this is the 94272nd time I said I should start a blog. This time I actually deployed something. So I was satisfied to see that I made it one step farther than all the other times.

Beyond that though... When I was done playing around with terraform, had all the resources defined as they needed to be, and ran terraform apply... It was such a nice feeling seeing my entire infrastructure being created with a single command. And I knew exactly where to look if I needed to change anything, or how to destroy and rebuild everything. I really like the magic of infrastructure as code. It makes me wish I had discovered tools like terraform or ansible sooner.

What touched me the most, however, was seeing the TLS certificate for my blog being issued by Google Trust Services. And no, before you start thinking "what a schmuck, who cares if an obscure Google service issued his certificate", let me explain... You see, I worked on that team for 2.5 years! So seeing things I worked on actually being useful, and realising that they are useful to so many other people using GCP, gave me a nice warm feeling for a little bit 🥺. Well, I guess I can get sentimental sometimes.

Takeaways

Well, I enjoyed the process! I feel like I learnt a lot about both GCP and Terraform, although I suspect there is much, much more to learn. But that's good!

Moreover, I should emphasise again how useful it was to have an end goal in mind. It really helped me overcome anxiety and overthinking, and actually finish deploying the blog (and writing this first post). While doing many side projects over the years has helped me learn a ton of stuff, putting together something that solves a real problem was something missing from my life. It feels incredibly rewarding to see your work exist in the real world.

But beyond all the technical stuff I learned, I also got to experience from both sides how it feels to encourage someone and to be encouraged by others. I mentioned that this was the 94272nd time I said I should start a blog. So what changed now that I actually did it? Part of the reason, of course, is that I have worked on my anxiety, and that I feel more confident that others might want to read about things I have done. But another part was the encouragement I got from my friend daknob. He kept asking me how my blog was going, reminding me why it is a good idea to have one, and generally providing gentle encouragement. Would I have done it without his input? Maybe... But he made me more eager!

To explain how I experienced it from the other side, please bear with me for a second. I have a very good friend, Angie, who is now a data scientist. She has a lot of experience with GCP and terraform, and I spammed her with a lot of questions while I was trying to get things right; she helped me a lot! Now, an important detail here is that she was not always a data scientist: she used to be an amazing maths teacher. I still remember how she slowly made the switch from maths to data science, learning programming and the required technologies. And while she is smart enough that she didn't need particular help with all that, I think I played a small part in encouraging her along the way. Closing the circle, discussing my technical problems with her and having her help me, felt really satisfying!

So if you are to take one thing from this blog post, above all the technical issues I discussed, let it be this: Helping and encouraging others can be immensely fulfilling and rewarding!

What's Next?

Well, I have some ideas in mind. I won't reveal them just yet, but they will involve Rust, open source contributions, Linux, and building servers.

But for now, if you stayed until this point, thanks a lot for your attention! I hope you enjoyed it and found it useful.

And if you did, consider following me on Mastodon to get updates on future posts! (Or to point out any mistakes.)
