GitHub Actions Survival Skills
After close to a year of working with GitHub actions, I’ve compiled a list of handy “survival skills” that help to keep developer velocity high.
Kevin Wang
Having come from teams that used various tools for CI/CD or automation, like Jenkins, CircleCI, Codefresh, and most recently GitHub actions, I think GitHub actions have certainly taken the cake [1].
Jenkins was always slow, had an awful user interface, and the groovy language used in the Jenkinsfile never felt right.
Codefresh and CircleCI were large improvements with their modern UIs and various tunable knobs, but the barrier to entry felt, and still does feel, quite high.
Lastly, while still far from perfect, GitHub actions have definitely been the most enabling to me. Additionally, in recent months it seems like GitHub has released a ton of great features on a fairly rapid cadence. This is expected given that they’re a part of the massive entity that is Microsoft, and that they have a ton of catching up to do [2].
After close to a year of using GitHub actions for ETL [3] pipeline development as well as other bits of automation, I’ve found a few things to be essential for keeping developer velocity high.
Basic bash proficiency
I think you can get reasonably far without knowing any bash commands by relying on publicly available actions instead, but eventually you’ll need to expand your skillset when your use case becomes more complex or specific.
Warning: Any and all following code samples are meant for macOS / Linux systems. They likely won’t work as intended on Windows.
String manipulation
String manipulation in the terminal is a bit of a niche skill. It may feel foreign especially if you spend most of your time in a text editor like VSCode doing application development (web backends and frontends). However, it’s a pretty fundamental and powerful skill. There are many ways, possibly too many, to manipulate strings in bash, but knowing just enough can make you quite effective.
Here are a few tools...
...and here is an example demonstrating four ways to extract the branch name from the built-in GITHUB_REF environment variable in GitHub actions:
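A minimal sketch of what four such approaches could look like, assuming the ref is a branch ref of the form refs/heads/<branch>. The sample value and the variable names are made up for illustration; on a real runner GITHUB_REF is set for you.

```shell
#!/usr/bin/env bash
# Sample value for illustration only; GitHub sets GITHUB_REF on a real runner.
GITHUB_REF="refs/heads/feature/my-branch"

# 1. Parameter expansion: strip the leading "refs/heads/" prefix.
branch_expansion="${GITHUB_REF#refs/heads/}"

# 2. cut: split on "/" and keep everything from the third field onward.
branch_cut="$(echo "$GITHUB_REF" | cut -d/ -f3-)"

# 3. sed: delete the leading "refs/heads/".
branch_sed="$(echo "$GITHUB_REF" | sed 's|^refs/heads/||')"

# 4. awk: remove the same prefix with sub(), then print the line.
branch_awk="$(echo "$GITHUB_REF" | awk '{ sub(/^refs\/heads\//, ""); print }')"

printf '%s\n' "$branch_expansion" "$branch_cut" "$branch_sed" "$branch_awk"
```

All four keep any slashes inside the branch name itself, which is why none of them simply takes the last path segment.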
Note: GITHUB_REF_NAME is available as another default environment variable that has the same output as the above.
Another example is getting the short SHA of the current commit from the GITHUB_SHA environment variable:
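A small sketch of two ways this could be done, assuming a 7-character short SHA is what you want. The sample SHA is a placeholder.

```shell
#!/usr/bin/env bash
# Sample value for illustration only; GitHub sets GITHUB_SHA on a real runner.
GITHUB_SHA="3bdcab962faa2ce5a9569df792c8009b609bdaab"

# 1. Bash substring expansion: 7 characters starting at offset 0.
short_expansion="${GITHUB_SHA:0:7}"

# 2. cut: keep characters 1 through 7.
short_cut="$(echo "$GITHUB_SHA" | cut -c1-7)"

echo "$short_expansion $short_cut"

# Inside a checked-out repository, git can also abbreviate the commit for you:
#   git rev-parse --short HEAD
```

Note that git picks the abbreviation length itself, so its result may be longer than 7 characters in large repositories.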
Though there are multiple options at your disposal, you’ll likely gravitate towards the ones that feel most ergonomic to you and your team.
Regex
Regex, or regular expressions, is another tool that everyone will inevitably encounter in regular day to day software engineering. It’s worth learning at least to a minimal extent, irrespective of use case.
These are three inline if statements that print whether some strings match a given regex. This may come in handy when you need to validate a branch pattern in your GitHub actions.
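A sketch of what such checks might look like. The branch naming rule here (feature/ or fix/ followed by lowercase letters, digits, or hyphens) is a hypothetical one picked for illustration, and check_branch is just a helper name.

```shell
#!/usr/bin/env bash
# Hypothetical naming rule: "feature/" or "fix/" plus lowercase letters, digits, or hyphens.
pattern='^(feature|fix)/[a-z0-9-]+$'

# Inline if statement wrapped in a helper so it can be reused per string.
check_branch() {
  if [[ "$1" =~ $pattern ]]; then echo "match"; else echo "no match"; fi
}

check_branch "feature/login-form"
check_branch "fix/123"
check_branch "Feature/Login"
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of the operator avoids bash treating it as a literal string.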
Note: =~ is the Equal Tilde operator that allows the use of regex in an if statement, inside a [[ ... ]] conditional.
Warning: While it is not super pertinent, it’s worth knowing that there are a few different regex implementations out there, just in case things go awry when working with regex across different languages. Bash’s =~ uses POSIX extended regular expressions.
CLI tools
Beyond string manipulation are additional operations like API calling that you might find yourself doing in more complex workflows. This is where tools like curl and jq become relevant, and knowing them to a basic degree will simply help you to be flexible and productive.
It is helpful to know how some commands, like the docker CLI, may behave differently in different environments. The nice interactive prompts that you receive locally will not be usable in CI environments. However, there are typically flags such as --no-prompt or --password-stdin (the exact name varies by tool) to help you bypass said prompts.
A quick example. The following command will prompt you for input, and will likely result in a CI job hanging up, and eventually timing out or failing.
This is what you may expect to be run instead, to skip the prompt.
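As one illustration, assuming the CLI in question is docker login: the sketch below shows the prompting form in a comment and the piped form as a function. REGISTRY_USER and REGISTRY_TOKEN are hypothetical names for credentials that would normally come from repository secrets.

```shell
#!/usr/bin/env bash
# REGISTRY_USER and REGISTRY_TOKEN are hypothetical names for secrets.

# Interactive form: prompts for a password, so a CI job would sit waiting on it.
#   docker login --username "$REGISTRY_USER"

# Non-interactive form: the password is read from standard input instead.
ci_docker_login() {
  printf '%s' "$REGISTRY_TOKEN" |
    docker login --username "$REGISTRY_USER" --password-stdin
}

# Only attempt the login when docker and the credentials are actually present.
if command -v docker >/dev/null 2>&1 && [ -n "${REGISTRY_TOKEN:-}" ]; then
  ci_docker_login
  login_status="attempted"
else
  login_status="skipped"
fi
echo "docker login: $login_status"
```

Piping the secret in also keeps it out of the process arguments.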
Recap and example
Below is an example that combines the tips from above. It uses string replacement and regex matching to exit early if an incoming git tag does not match the expected pattern, and then passes values to npm.
Workflow example:
- Line 14: Get the tag from the GITHUB_REF environment variable.
- Line 15: If the tag is valid, set the tag output to the tag.
- Line 31-33: Presence of the NODE_AUTH_TOKEN environment variable allows npm publish to authenticate without prompting for credentials.
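The tag-handling part of such a step could be sketched in bash roughly as follows. The vMAJOR.MINOR.PATCH pattern and the validate_tag helper are assumptions made for illustration, and the temporary file only stands in for the GITHUB_OUTPUT file that a real runner provides.

```shell
#!/usr/bin/env bash
# Sample value for illustration only; GitHub sets GITHUB_REF on a real runner.
GITHUB_REF="refs/tags/v1.2.3"
# On a runner GITHUB_OUTPUT already points at a file; fall back to a temp file locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# String replacement: strip the "refs/tags/" prefix to get the bare tag.
tag="${GITHUB_REF#refs/tags/}"

# Hypothetical expected pattern: v<major>.<minor>.<patch>
semver_pattern='^v[0-9]+\.[0-9]+\.[0-9]+$'
validate_tag() {
  [[ "$1" =~ $semver_pattern ]]
}

if validate_tag "$tag"; then
  # Expose the tag as a step output named "tag".
  echo "tag=$tag" >> "$GITHUB_OUTPUT"
else
  echo "Tag '$tag' does not match the expected pattern" >&2
  # In a workflow step you would `exit 1` here to stop the job early.
fi
```

A later step can then read the output and run npm publish with NODE_AUTH_TOKEN set in its environment.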
Pre-installed software
Speaking of CLI tools, did you know that GitHub actions come with a plethora of pre-installed software? I happened to stumble upon this README which shows a list of pre-installed software in the virtual environment that a GitHub Ubuntu 20.04 runner uses. At the time of writing, this is effectively what you get out-of-the-box when you specify runs-on: ubuntu-latest.
I was pretty surprised with some of the installed software like terraform, jq, aws, gh, and even vercel. Knowing what is already available to you will save you from having to add boilerplate for installing some commonly used CLI tools.
Note: For the full list, check out the README.
Iterating quickly
Chances are you’ll push changes ~5-10 times before you get a workflow just right... and then push another ~5-10 times while you try to tidy it up but end up breaking it instead.
For iterating quickly, I’ve found the workflow_dispatch event trigger to be immensely useful. As long as a workflow file with this trigger exists on the default branch, it can be modified on any other development branch and then manually triggered at will, via the GitHub API, CLI, or UI.
This way, you can push all the changes you want to your development branch and test run your action at the same time. This saves you from having to do it on main, although that sometimes cannot be avoided.
This is a curl example.
Docs: https://docs.github.com/en/rest/actions/workflows#create-a-workflow-dispatch-event
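A sketch of such a call against the endpoint documented above. The owner, repository, workflow file name, branch, and the dry_run input are all hypothetical, and GH_PAT is an assumed name for an environment variable holding a token that is allowed to trigger workflows. Without a token the script only prints what it would send.

```shell
#!/usr/bin/env bash
# Hypothetical values; replace them with your own.
OWNER="my-org"
REPO="workflows-test"
WORKFLOW_FILE="my-workflow.yml"
REF="my-dev-branch"   # the branch whose version of the workflow should run

url="https://api.github.com/repos/${OWNER}/${REPO}/actions/workflows/${WORKFLOW_FILE}/dispatches"
# "inputs" must match inputs declared under workflow_dispatch in the workflow file;
# dry_run is a made-up example input.
payload="{\"ref\":\"${REF}\",\"inputs\":{\"dry_run\":\"true\"}}"

if [ -n "${GH_PAT:-}" ]; then
  # A successful dispatch returns 204 No Content.
  curl --silent --show-error --fail \
    --request POST \
    --header "Accept: application/vnd.github+json" \
    --header "Authorization: Bearer ${GH_PAT}" \
    --data "$payload" \
    "$url"
else
  echo "GH_PAT not set; would POST $payload to $url"
fi
```

The gh CLI offers a shorter route to the same thing, along the lines of gh workflow run my-workflow.yml --ref my-dev-branch.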
And yes, even if the workflow will ultimately be triggered by another event, like pull_request, you can still use workflow_dispatch for testing without requiring a real event to occur. You just need to map the inputs and contexts to the necessary workflow values you need.
Warning: act is certainly an option for testing actions locally, but it is lacking in feature parity with the actual GitHub actions virtual environment. At the time of writing, things like composite actions and matrix jobs are not supported.
Staging clone
Another method of iterating quickly with GitHub actions is by creating a staging repository clone of your main repository. You can then push all your changes to the main branch here with reckless abandon.
Know your Contexts!
Regardless of the event that triggers your workflow, it is pretty crucial to know how to map incoming context values to usable variables. Different event triggers will have different context values.
- pull_request_target: You may be accessing values on ${{ github.event.[*].[*] }}
  - see github context
- workflow_dispatch: You may be accessing values on ${{ inputs.[*] }}
  - see inputs context
Multiple event triggers
If you’re simultaneously supporting two or more event triggers, an event mapping step may be useful.
Workflow example:
Lines 30-31 map two different event sources to some job outputs. These outputs are then available to subsequent, dependent jobs.
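The mapping itself can be sketched as the bash body of a single step. The names PR_HEAD_REF and DISPATCH_BRANCH, the branch input, and the map_event helper are all hypothetical. In a workflow the two variables would be filled from the contexts through the step's env block, as noted in the comments.

```shell
#!/usr/bin/env bash
# In a workflow these would come from the step's `env:` block, for example:
#   PR_HEAD_REF: ${{ github.event.pull_request.head.ref }}
#   DISPATCH_BRANCH: ${{ inputs.branch }}   (hypothetical input name)
# Sample values for illustration only:
GITHUB_EVENT_NAME="workflow_dispatch"
PR_HEAD_REF=""
DISPATCH_BRANCH="my-dev-branch"
# On a runner GITHUB_OUTPUT already points at a file; fall back to a temp file locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Pick the value that belongs to whichever event triggered the run.
map_event() {
  case "$GITHUB_EVENT_NAME" in
    pull_request_target) echo "$PR_HEAD_REF" ;;
    workflow_dispatch)   echo "$DISPATCH_BRANCH" ;;
    *) echo "unsupported event: $GITHUB_EVENT_NAME" >&2; return 1 ;;
  esac
}

branch="$(map_event)"
# Expose one normalized value, whichever event it came from.
echo "branch=$branch" >> "$GITHUB_OUTPUT"
```

Downstream jobs then only need to read the one normalized output instead of branching on the event type themselves.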
Warning: Environment variables set within individual jobs are only available to steps within that job. Each job runs in its own fresh runner environment, so any value that a later job needs has to be passed along explicitly, for example as a job output.
In the screenshots below, you can see where some outputs in the job summaries are empty.
The two following screenshots show the eventual job summaries, from two different event triggers.
[Screenshots: job summaries for the pull_request_target and workflow_dispatch triggers]
For event objects, see “Events that trigger workflows” and “GitHub event types”.
Default environment variables
In addition to the event contexts, there are several default environment variables that store commonly used information. These are values like GITHUB_ACTOR, GITHUB_REF_TYPE and GITHUB_SHA, to name a few.
Note: Most of the default environment variables have a corresponding, and similarly named, context property. For example, the value of the GITHUB_REF environment variable can be read during workflow processing using the ${{ github.ref }} context property. [4]
Here are a few environment variables and their github context equivalents:
Variable Name | Equivalent Context | Value |
---|---|---|
$GITHUB_WORKSPACE | github.workspace | /home/runner/work/<repo>/<repo> |
$GITHUB_REF_NAME | github.ref_name | my-branch, v1.2.3 (no refs/*/ prefix) |
$GITHUB_SHA | github.sha | 3bdcab962faa2ce5a9569df792c8009b609bdaab |
$GITHUB_REPOSITORY | github.repository | Codertocat/Hello-World |
$RUNNER_OS | runner.os | Linux, Windows, macOS |
Full list | Full list | |
There is no need to memorize these, but knowing which ones to reach for, and when, will be helpful.
Runners’ file systems
It took me a while to develop a clear mental model of the file system of GitHub actions. The biggest hurdle was not knowing what home base was, so I wasn’t clear on where any given file system path would resolve to. I later discovered that GITHUB_WORKSPACE would be the anchoring point that I was looking for.
If you need to do any sort of file system traversal, whether that’s in your GitHub action YAML code, or maybe from a custom JavaScript action, it helps to know where anything and everything gets cloned to and what paths to reference.
Note: GITHUB_WORKSPACE / github.workspace is a default environment variable that is the working directory of your runner and the equivalent of pwd, assuming no directory changes are made. This is an absolute path.
Note: You can specify a working-directory option on your job (via defaults.run) or step. This can be a path that is absolute or relative to GITHUB_WORKSPACE for the job or step to execute in. In my opinion, it’s preferable to rely on this rather than any manual cd / pushd / popd-ing, which could potentially land you in file system oblivion.
Let’s say you have a repository named workflows-test, and a workflow that clones down several different repositories, including the original repository itself. Take note of the following path and working-directory options (highlighted). The slight variances are intentional.
This creates a folder structure like:
...and outputs a summary like:
Warning: actions/checkout@v3 prevents you from cloning a repo to a path that is outside of GITHUB_WORKSPACE. Ex. ../foo will not work.
Closing thoughts
This might’ve been my chonkiest post yet, but hopefully I conveyed the point that you don’t need to be an expert in any of the previous topics, and that knowing a little will go a long way.
I foresee GitHub actions only growing larger and larger in the future, near and far, so it’s something I personally want to keep learning. The surface area is also pretty finite so this is a very reasonable chore in my opinion.
All in all, and I’ll say it again, GitHub actions have felt the most enabling to me for any and all CI/CD or automation needs. I will acknowledge though that this impression is certainly conflating 1.) the platform’s actual UX, 2.) myself being at a point in my life where I also feel the most competent as an individual contributor, and 3.) the experience with using it in the most complex use case [5] that I have encountered in my career thus far.
Appendix
Gotchas
on.deployment_status and Vercel Monorepos
When using the deployment_status trigger with Vercel deployments, the github.event.deployment.environment value will be slightly different depending on whether your repository is a single-app repo or a multi-app monorepo.
I don’t know the exact reason for this but it might be due to Turborepo usage or automatic monorepo detection.
- With a single-app repo, github.event.deployment.environment will be the string value Production or Preview.
- With a multi-app monorepo, github.event.deployment.environment will be the string value Production – {VercelProjectName} or Preview – {VercelProjectName}.
An easy way to conditionally run a workflow based on the incoming deployment_status is to use the contains function.
There are four conditions:
- github.event.deployment_status.state == 'success': Only run when the deployment is successful.
- github.event.sender.id == 35613825: Only run when the deployment is triggered by a specific user, such as Vercel.
- contains(github.event.deployment.environment, 'production'): Only run when the deployment is for the production environment.
- contains(github.event.deployment.environment, 'my-project'): Only run when the deployment is for a specific Vercel project.
I’m not yet sure how to make this more readable, but readability is merely a nice-to-have compared with functionality, which is a must-have.
Footnotes
1. To “take the cake” is an American saying that means to be the most remarkable or foolish of its kind. Source: google
2. Jenkins was released in 2011. CircleCI was founded in 2011. Codefresh was founded in 2014. GitHub actions launched in 2018.
3. ETL, which stands for extract, transform, and load, is the process data engineers use to extract data from different sources, transform the data into a usable and trusted resource, and load that data into the systems end-users can access and use downstream to solve business problems. Source: google
4. See docs for note on environment variable and github context equivalents.
5. At HashiCorp, where I currently work on the Digital Team, we’ve leveraged GitHub actions to build out our ETL pipeline for ingesting versioned docs content for our various products' documentation sites; waypointproject.io and vaultproject.io are a couple.