GitHub Actions Survival Skills

After close to a year of working with GitHub actions, I’ve compiled a list of handy “survival skills” that help to keep developer velocity high.

Kevin Wang, June 13, 2022

I've worked on teams that used various tools for CI/CD or automation, like Jenkins, CircleCI, Codefresh, and most recently GitHub actions. Of these, GitHub actions have certainly taken the cake ¹ in my opinion.

Jenkins was always slow, had an awful user interface, and the Groovy language used in the Jenkinsfile never felt right.

Codefresh and CircleCI were large improvements with their modern UIs and various tunable knobs, but the barrier to entry felt, and still does feel, quite high.

Lastly, while still far from perfect, GitHub actions have definitely been the most enabling to me. Additionally, in recent months it seems like GitHub has released a ton of great features on a fairly rapid cadence. This is expected given that they’re a part of the massive entity that is Microsoft, and that they have a ton of catching up to do ².

After close to a year of using GitHub actions for ETL³ pipeline development as well as other bits of automation, I’ve found a few things to be essential for keeping developer velocity high.

Basic `bash` proficiency

I think you can get reasonably far without knowing any bash commands by relying on publicly available actions instead, but eventually you’ll need to expand your skillset when your use case becomes more complex or specific.

Warning: Any and all following code samples are meant for macOS / Linux systems. They likely won’t work as intended on Windows.

String manipulation

String manipulation in the terminal is a bit of a niche skill. It may feel foreign especially if you spend most of your time in a text editor like VSCode doing application development (web backends and frontends). However, it’s a pretty fundamental and powerful skill. There are many ways, possibly too many, to manipulate strings in bash, but knowing just enough can make you quite effective.

Here are a few tools...

...and here is an example demonstrating four ways to extract the branch name from the built-in GITHUB_REF environment variable in GitHub actions:

# GITHUB_REF="refs/heads/octocat/branch"

echo ${GITHUB_REF#refs/*/}
echo "${GITHUB_REF#refs/*/}"
echo $GITHUB_REF | awk -F '/' '{printf $3}{print "/"$4}'
echo $GITHUB_REF | sed "s|refs/heads/||"

Note: GITHUB_REF_NAME is available as another default environment variable that has the same output as the above.

Another example is getting the short SHA of the current commit, or GITHUB_SHA environment variable:

# GITHUB_SHA=$(git rev-parse HEAD)

echo ${GITHUB_SHA:0:7}
echo $GITHUB_SHA | cut -c1-7

Though there are multiple options at your disposal, you’ll likely gravitate towards the ones that feel most ergonomic to you and your team.
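To show how these expansions compose, here is a minimal sketch that turns a ref and a SHA into a single slug. The function name image_tag and the slug format are my own assumptions for illustration, not something GitHub provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<branch-with-dashes>-<short sha>" from a ref and a SHA.
image_tag() {
  local ref=$1 sha=$2
  local branch=${ref#refs/*/}   # strip "refs/heads/" or "refs/tags/"
  local safe=${branch//\//-}    # replace every "/" with "-"
  echo "${safe}-${sha:0:7}"     # append the first 7 characters of the SHA
}

image_tag "refs/heads/octocat/branch" "3bdcab962faa2ce5a9569df792c8009b609bdaab"
# prints: octocat-branch-3bdcab9
```

In a workflow step, the two arguments would typically be "$GITHUB_REF" and "$GITHUB_SHA".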

Regex

Regex, or regular expressions, is another tool that everyone will inevitably encounter in regular day to day software engineering. It’s worth learning at least to a minimal extent, irrespective of use case.

# Quote the pattern so bash does not try to parse the parentheses itself
regex='^[-a-z]+/v[0-9]+\.[0-9]+\.[0-9]+(-rc\.[0-9]+)?$'

[[ foo/0.1.x       =~ $regex ]] && echo "✅" || echo "❌"
[[ bar-bar/v1.1.0  =~ $regex ]] && echo "✅" || echo "❌"
[[ baz/v5.0.0-rc.5 =~ $regex ]] && echo "✅" || echo "❌"

These are three one-line conditionals that print whether each string matches the given regex. This may come in handy when you need to validate a branch pattern in your GitHub actions.

Note: =~ is the Equal Tilde operator, which allows the use of regex inside a [[ ]] conditional.

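Warning: While it is not super pertinent, it’s worth knowing that there are a few different regex implementations out there, just in case things go awry when working with regex across different languages. Bash's =~ uses POSIX extended regular expressions, so shorthand from other engines, such as \d, may not work.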
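Beyond a yes/no match, bash also exposes capture groups through the BASH_REMATCH array. Here is a minimal sketch; the function name parse_release_branch and the output format are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "<name>/v<major>.<minor>.<patch>" into its parts.
parse_release_branch() {
  local re='^([-a-z]+)/v([0-9]+)\.([0-9]+)\.([0-9]+)$'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; [1]..[4] are the capture groups.
    echo "name=${BASH_REMATCH[1]} major=${BASH_REMATCH[2]} minor=${BASH_REMATCH[3]} patch=${BASH_REMATCH[4]}"
  else
    echo "no match: $1" >&2
    return 1
  fi
}

parse_release_branch "bar-bar/v1.10.0"
# prints: name=bar-bar major=1 minor=10 patch=0
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ is what makes bash treat it as a regex rather than a literal string.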

CLI tools

Beyond string manipulation are additional operations like API calling that you might find yourself doing in more complex workflows. This is where tools like curl and jq become relevant, and knowing them to a basic degree will simply help you to be flexible and productive.

It is helpful to know how some commands, like the docker CLI, may behave differently in different environments. The nice interactive prompts that you receive locally will not be usable in CI environments. However, there are typically flags for bypassing said prompts, such as docker login's --password-stdin.

A quick example: the following command will prompt you for input, which will likely cause a CI job to hang and eventually time out or fail.

docker login
# Login with your Docker ID to push and pull images from Docker Hub.
# If you don't have a Docker ID, head over to https://hub.docker.com to create one.
# Username:

This is what you may expect to be run instead, to skip the prompt.

aws ecr get-login-password | docker login -u AWS --password-stdin 000000000000.dkr.ecr.us-east-1.amazonaws.com
# Login Succeeded
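The same idea applies to your own scripts: fail fast instead of waiting on a prompt that nobody will answer. Below is a rough sketch under a couple of assumptions. TOKEN is a made-up variable name, and the function get_token is hypothetical. The CI variable is one that GitHub Actions sets to true on its runners.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return a token without ever blocking a CI job on a prompt.
get_token() {
  local token
  if [[ -n ${TOKEN:-} ]]; then
    # Prefer a value supplied through the environment (e.g. from a secret).
    printf '%s\n' "$TOKEN"
  elif [[ -n ${CI:-} || ! -t 0 ]]; then
    # In CI, or with no terminal attached, prompting would hang the job.
    echo "TOKEN is not set and no terminal is attached; refusing to prompt" >&2
    return 1
  else
    # Interactive shell: ask the user without echoing the input.
    read -r -s -p "Token: " token && printf '%s\n' "$token"
  fi
}

TOKEN=example get_token
# prints: example
```

The output can then be piped into a command that reads from stdin, in the same way as the docker login example above.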

Recap and example

Below is an example that combines the tips from above. It uses string replacement and regex matching to exit early if an incoming git tag does not match the expected pattern, and then passes values to npm.

Workflow example:

name: Publish
on:
  release:
    types: [published]

jobs:
  publish:
    name: Publish
    runs-on: ubuntu-latest
    steps:
      - name: Validate Tag
        id: validate_tag
        run: |
          tag=${GITHUB_REF#refs/*/v}
          if [[ $tag =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]
          then
            echo "::notice::Tag is valid — $tag"
            echo "tag=$tag" >> "$GITHUB_OUTPUT"
          else
            echo "::error::Tag is invalid — $tag"
            exit 1
          fi
      - name: Setup Node
        uses: actions/setup-node@v2
        with:
          node-version: 16
          registry-url: "https://registry.npmjs.org"
      - run: npm version ${{steps.validate_tag.outputs.tag}} || true
      - run: npm ci
      - run: npm run build
      - run: npm publish
        env:
          NODE_AUTH_TOKEN: ${{secrets.NPM_TOKEN}}

Line 14: Get the tag from the GITHUB_REF environment variable, stripping the refs/*/v prefix.
Line 15: If the tag is valid, set the tag output to the tag; otherwise the step exits with an error.
Lines 31-33: Presence of the NODE_AUTH_TOKEN environment variable allows npm publish to authenticate without prompting for credentials.
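Because the validation step is plain bash, it can be exercised locally before pushing. The sketch below wraps the same logic in a function (the name validate_tag is my own) and emulates the runner with a fake GITHUB_REF and a temporary file standing in for GITHUB_OUTPUT, the file that GitHub documents for step outputs.

```shell
#!/usr/bin/env bash
# Sketch: the "Validate Tag" logic as a function that can be run on a laptop.
validate_tag() {
  local tag=${GITHUB_REF#refs/*/v}
  if [[ $tag =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo "tag=$tag" >> "$GITHUB_OUTPUT"
  else
    echo "::error::Tag is invalid: $tag"
    return 1
  fi
}

# Emulate the runner: a temporary file plays the role of GITHUB_OUTPUT.
GITHUB_OUTPUT=$(mktemp)
export GITHUB_OUTPUT

GITHUB_REF="refs/tags/v1.2.3" validate_tag && cat "$GITHUB_OUTPUT"
# prints: tag=1.2.3
```

This only checks the shell logic; anything that depends on the actual runner, such as the expressions in ${{ }}, still needs a real workflow run.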

Pre-installed software

Speaking of CLI tools, did you know that GitHub-hosted runners come with a plethora of pre-installed software? I happened to stumble upon this README which lists the pre-installed software in the virtual environment that a GitHub Ubuntu 20.04 runner uses. At the time of writing, this is effectively what you get out-of-the-box when you specify runs-on: ubuntu-latest.

- Terraform 1.2.1
- yamllint 1.26.3
- yq 4.25.2
- zstd 1.5.2 (homebrew)

#### CLI Tools

- Alibaba Cloud CLI 3.0.121
- AWS CLI 2.7.4
- AWS CLI Session manager plugin 1.2.331.0
- AWS SAM CLI 1.51.0
- Azure CLI (azure-cli) 2.37.0 (installation method: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-linux?pivots=apt)
- Azure CLI (azure-devops) 0.25.0
- GitHub CLI 2.11.3
- Google Cloud SDK 369.0.0 (apt source repository: https://packages.cloud.google.com/apt)
- Hub CLI 2.14.2
- Netlify CLI 10.4.0
- OpenShift CLI 4.10.15
- ORAS CLI 0.12.0
- Vercel CLI 24.2.5

I was pretty surprised with some of the installed software like terraform, jq, aws, gh, and even vercel. Knowing what is already available to you will save you from having to add boilerplate for installing some commonly used CLI tools.

Note: For the full list, check out the README
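Since the pre-installed list changes between runner images, it can be worth checking for the tools a workflow relies on before using them. A small sketch, with missing_tools being a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: print each command that is not on PATH; succeed only if none are missing.
missing_tools() {
  local tool missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if ((${#missing[@]} == 0)); then
    return 0
  fi
  printf '%s\n' "${missing[@]}"
  return 1
}
```

A step could then start with something like `missing_tools jq gh terraform || exit 1` so that a missing tool fails loudly at the top of the job rather than halfway through.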

Iterating quickly

Chances are you’ll push changes ~5-10 times before you get a workflow just right... and then push another ~5-10 times while you try to tidy it up but end up breaking it instead.

For iterating quickly, I’ve found the workflow_dispatch event trigger to be immensely useful. As long as a workflow file with this trigger exists on the default branch, it can be modified on any other development branch and then manually triggered at will, via the GitHub API, CLI, or UI.

This way, you can push all the changes you want to your development branch and test run your action at the same time. This saves you from having to do it on main, although that sometimes cannot be avoided.

This is a curl example that calls the API.

curl \
  -X POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ghp_yourtoken' \
  'https://api.github.com/repos/:owner/:repo/actions/workflows/:workflow_id/dispatches' \
  --data-raw '{
      "ref": "some-branch",
      "inputs": {
          "input1": "boop",
          "input2": "bop"
      }
  }'

Docs: https://docs.github.com/en/rest/actions/workflows#create-a-workflow-dispatch-event
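For the CLI route, the GitHub CLI has a `gh workflow run` command that accepts `--ref` for the branch and `-f key=value` for each input. Here is a small wrapper sketch around it; the function name dispatch, the DRY_RUN switch, and the file name publish.yml are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `gh workflow run`.
# Set DRY_RUN=1 to print the command instead of executing it.
dispatch() {
  local workflow=$1 ref=$2
  shift 2
  local args=(workflow run "$workflow" --ref "$ref")
  local input
  for input in "$@"; do
    args+=(-f "$input")   # each key=value pair becomes a workflow input
  done
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "gh ${args[*]}"
  else
    gh "${args[@]}"
  fi
}

DRY_RUN=1 dispatch publish.yml some-branch input1=boop input2=bop
# prints: gh workflow run publish.yml --ref some-branch -f input1=boop -f input2=bop
```

Without DRY_RUN, this requires an authenticated gh session and a workflow on the default branch that declares the workflow_dispatch trigger.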

And yes, even if the workflow will ultimately be triggered by another event, like pull_request, you can still use workflow_dispatch for testing without requiring a real event to occur. You just need to map the inputs and contexts to the necessary workflow values you need.

Warning: act is certainly an option for testing actions locally, but it is lacking in feature parity with the actual GitHub actions virtual environment. At the time of writing, things like composite actions and matrix jobs are not supported.

Staging clone

Another method of iterating quickly with GitHub actions is by creating a staging repository clone of your main repository. You can then push all your changes to the main branch here with reckless abandon.

Know your Contexts!

Regardless of the event that triggers your workflow, it is pretty crucial to know how to map incoming context values to usable variables. Different event triggers will have different context values.

pull_request_target: You may be accessing values on ${{ github.event.[*].[*] }}
- see github context
workflow_dispatch: You may be accessing values on ${{ inputs.[*] }}
- see inputs context

Multiple event triggers

If you’re simultaneously supporting two or more event triggers, an event mapping step may be useful.

Workflow example:

Lines 30-31 map two different event sources to some job outputs. These outputs are then available to subsequent, dependent jobs.

Warning: Environment variables set within individual jobs, including those written to GITHUB_ENV, are only available to steps within that job. Each job runs in its own runner environment, so nothing set in one job's environment carries over to another. Job outputs are the supported way to pass values between jobs.

In the screenshots below, you can see where some outputs in the job summaries are empty.

name: Input Mapping Example

on:
  pull_request_target:
    branches:
      - main
  workflow_dispatch:
    inputs:
      action:
        type: choice
        description: The pull request action
        required: true
        options:
          - closed
          - opened
          - synchronize
          # List shortened for brevity
      author:
        type: string
        required: true
        description: The pull request author

jobs:
  map_inputs:
    runs-on: ubuntu-latest
    env:
      pr_action: ${{ github.event.action                  || inputs.action }}
      pr_author: ${{ github.event.pull_request.user.login || inputs.author }}
    outputs:
      pr_action: ${{ github.event.action                  || inputs.action }}
      pr_author: ${{ github.event.pull_request.user.login || inputs.author }}
    steps:
      - run: |
          echo "ACTION=${{env.pr_action}}" >> $GITHUB_ENV
          echo "AUTHOR=${{env.pr_author}}" >> $GITHUB_ENV
          echo "### \`map_inputs\`"        >> $GITHUB_STEP_SUMMARY
          echo "ACTION=${{env.pr_action}}" >> $GITHUB_STEP_SUMMARY
          echo "AUTHOR=${{env.pr_author}}" >> $GITHUB_STEP_SUMMARY
          echo ""                          >> $GITHUB_STEP_SUMMARY
  summary:
    needs: map_inputs
    runs-on: ubuntu-latest
    steps:
      - run: |
          echo "### \`summary\`"                                                >> $GITHUB_STEP_SUMMARY
          echo ""                                                               >> $GITHUB_STEP_SUMMARY
          echo "**env.ACTION**: ${{env.ACTION}}"                                >> $GITHUB_STEP_SUMMARY
          echo "**env.AUTHOR**: ${{env.AUTHOR}}"                                >> $GITHUB_STEP_SUMMARY
          echo ""                                                               >> $GITHUB_STEP_SUMMARY
          echo "**outputs.pr_action**: ${{needs.map_inputs.outputs.pr_action}}" >> $GITHUB_STEP_SUMMARY
          echo "**outputs.pr_author**: ${{needs.map_inputs.outputs.pr_author}}" >> $GITHUB_STEP_SUMMARY

[Screenshots: the eventual job summaries from two different event triggers, pull_request_target and workflow_dispatch]

For event objects, see “Events that trigger workflows” and “GitHub event types”.
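The mapping in the workflow above is done with expressions, but the same idea can be sketched in a shell step by branching on the default GITHUB_EVENT_NAME variable. In this sketch, PR_ACTION and INPUT_ACTION are hypothetical variables that I assume are passed in through the step's env block; GitHub does not set them for you.

```shell
#!/usr/bin/env bash
# Hypothetical mapping step: pick the right source for the "action" value
# depending on which event triggered the workflow.
resolve_action() {
  case "${GITHUB_EVENT_NAME:-}" in
    pull_request | pull_request_target)
      echo "${PR_ACTION:-}" ;;
    workflow_dispatch)
      echo "${INPUT_ACTION:-}" ;;
    *)
      echo "unsupported event: ${GITHUB_EVENT_NAME:-<unset>}" >&2
      return 1
      ;;
  esac
}

GITHUB_EVENT_NAME=workflow_dispatch INPUT_ACTION=opened resolve_action
# prints: opened
```

The explicit failure for unknown events is a design choice: it surfaces a newly added trigger that nobody mapped, rather than silently producing an empty value.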

Default environment variables

In addition to the event contexts, there are several default environment variables that store commonly used information. These are values like GITHUB_ACTOR, GITHUB_REF_TYPE and GITHUB_SHA, to name a few.

Note: Most of the default environment variables have a corresponding, and similarly named, context property. For example, the value of the GITHUB_REF environment variable can be read during workflow processing using the ${{ github.ref }} context property. ⁴

Here are a few environment variables and github context equivalents:

| Variable Name | Equivalent Context | Value |
| --- | --- | --- |
| `$GITHUB_WORKSPACE` | `github.workspace` | `/home/runner/work/<repo>/<repo>` |
| `$GITHUB_REF_NAME` | `github.ref_name` | `my-branch`, `v1.2.3` (no `refs/*/` prefix) |
| `$GITHUB_SHA` | `github.sha` | `3bdcab962faa2ce5a9569df792c8009b609bdaab` |
| `$GITHUB_REPOSITORY` | `github.repository` | `Codertocat/Hello-World` |
| `$RUNNER_OS` | `runner.os` | `Linux`, `Windows`, `macOS` |
| Full list | Full list | |

There is no need to memorize these, but knowing which ones to reach for and when, will be helpful.

Runners’ file systems

It took me a while to develop a clear mental model of the file system of GitHub actions. The biggest hurdle was not knowing what home base was, so I wasn’t clear on where any given file system path would resolve to. I later discovered that GITHUB_WORKSPACE would be the anchoring point that I was looking for.

If you need to do any sort of file system traversal, whether that’s in your GitHub action YAML code, or maybe from a custom JavaScript action, it helps to know where anything and everything gets cloned to and what paths to reference.

Note: GITHUB_WORKSPACE/github.workspace is a default environment variable that is the working directory of your runner and the equivalent of pwd, assuming no directory changes are made. This is an absolute path.

Note: You can specify a working-directory option on your job or step. This can be a path that is absolute or relative to GITHUB_WORKSPACE for the job or step to execute in. In my opinion, it’s preferable to rely on this rather than any manual cd/pushd/popd-ing, which could potentially land you in file system oblivion.
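When a directory change inside a script is unavoidable, scoping it to a subshell keeps the rest of the step anchored where it started. A minimal sketch, with run_in being a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: run a command in another directory without moving the calling shell.
run_in() {
  local dir=$1
  shift
  ( cd "$dir" && "$@" )   # the cd happens in a subshell and is discarded on exit
}

start=$(pwd)
run_in / pwd
# prints: /
[[ $(pwd) == "$start" ]] && echo "still in the starting directory"
```

In a workflow, `run_in "$GITHUB_WORKSPACE/foo" npm ci` would behave much like a step with working-directory set, though the built-in option is still the simpler choice when it fits.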

Let’s say you have a repository named workflows-test, and a workflow that clones down several different repositories, including the original repository itself. Take note of the following path and working-directory options (highlighted). The slight variances are intentional.

name: Print working directory

on:
  workflow_dispatch:
jobs:
  my-job:
    name: Clone repos
    runs-on: ubuntu-latest
    steps:
      - name: Clone this repo
        uses: actions/checkout@v3
        with:
          repository: ${{ github.repository }}
          path: ${{ github.workspace }}/foo
      - name: Clone repo 2
        uses: actions/checkout@v3
        with:
          repository: thiskevinwang/workflows-test
          path: ./bar
      - name: Clone repo 3
        uses: actions/checkout@v3
        with:
          repository: thiskevinwang/workflows-test
          path: baz
      - run: |
          echo "- $GITHUB_WORKSPACE" >> $GITHUB_STEP_SUMMARY
          echo "1. $(pwd)"           >> $GITHUB_STEP_SUMMARY
      - run: echo "2. $(pwd)"        >> $GITHUB_STEP_SUMMARY
        working-directory: ${{ github.workspace }}/foo
      - run: echo "3. $(pwd)"        >> $GITHUB_STEP_SUMMARY
        working-directory: ./bar
      - run: echo "4. $(pwd)"        >> $GITHUB_STEP_SUMMARY
        working-directory: baz

This creates a folder structure like:

/home/runner/work/workflows-test/workflows-test
├── bar
├── baz
└── foo

...and outputs a summary like:

- /home/runner/work/workflows-test/workflows-test

1. /home/runner/work/workflows-test/workflows-test

2. /home/runner/work/workflows-test/workflows-test/foo

3. /home/runner/work/workflows-test/workflows-test/bar

4. /home/runner/work/workflows-test/workflows-test/baz

Warning: actions/checkout@v3 prevents you from cloning a repo to a path that is outside of GITHUB_WORKSPACE. For example, ../foo will not work.
Error: Repository path '/home/runner/work/workflows-test/foo'
is not under '/home/runner/work/workflows-test/workflows-test'

Closing thoughts

This might’ve been my chonkiest post yet, but hopefully I conveyed the point that you don’t need to be an expert in any of the previous topics, and that knowing a little will go a long way.

I foresee GitHub actions only growing larger and larger in the future, near and far, so it’s something I personally want to keep learning. The surface area is also pretty finite so this is a very reasonable chore in my opinion.

All in all, and I’ll say it again, GitHub actions have felt the most enabling to me for any and all CI/CD or automation needs. I will acknowledge though that this impression is certainly conflating 1.) the platform’s actual UX, 2.) myself being at a point in my life where I also feel the most competent as an individual contributor, and 3.) the experience with using it in the most complex use case ⁵ that I have encountered in my career thus far.

Appendix

Gotchas

`on.deployment_status` and Vercel Monorepos

When using the deployment_status trigger with Vercel deployments, the github.event.deployment.environment value will be slightly different depending on whether your repository is a single-app repo or a multi-app monorepo.

I don’t know the exact reason for this but it might be due to Turborepo usage or automatic monorepo detection.

With a single-app repo, the environment value will be the string Production or Preview.
With a multi-app monorepo, the environment value will be the string Production – {VercelProjectName} or Preview – {VercelProjectName}.

An easy way to conditionally run a workflow based on the incoming deployment_status is to use the contains function.

on: deployment_status

jobs:
  do-thing:
    if: github.event.deployment_status.state == 'success' && github.event.sender.id == 35613825 && contains(github.event.deployment.environment, 'production') && contains(github.event.deployment.environment, 'my-project')

There are four conditions:

github.event.deployment_status.state == 'success': Only run when the deployment is successful.
github.event.sender.id == 35613825: Only run when the deployment is triggered by a specific user, such as Vercel.
contains(github.event.deployment.environment, 'production'): Only run when the deployment is for the production environment.
contains(github.event.deployment.environment, 'my-project'): Only run when the deployment is for a specific Vercel project.

I’m not yet sure how to make this more readable, but readability is merely a nice-to-have compared with functionality, which is a must-have.

¹ To “take the cake” is an American saying that means to be the most remarkable or foolish of its kind. Source: Google.
² Jenkins was released in 2011. CircleCI was founded in 2011. Codefresh was founded in 2014. GitHub actions launched in 2018.
³ ETL, which stands for extract, transform, and load, is the process data engineers use to extract data from different sources, transform the data into a usable and trusted resource, and load that data into the systems end-users can access and use downstream to solve business problems. Source: Google.
⁴ See docs for note on environment variable and github context equivalents.
⁵ At HashiCorp, where I currently work on the Digital Team, we’ve leveraged GitHub actions to build out our ETL pipeline for ingesting versioned docs content for our various products' documentation sites; waypointproject.io and vaultproject.io are a couple.