Sarah Gibson is an Open Source Infrastructure Engineer at 2i2c, and an open source contributor and advocate. She has more than two years of experience as a Research Engineer at a national institute for data science and artificial intelligence, and is a core contributor to the open source projects Binder, JupyterHub, and The Turing Way.
Sarah is passionate about working with domain experts to leverage cloud computing in order to accelerate cutting-edge, data-intensive research and to disseminate the results in an open, reproducible and reusable manner. Sarah holds a Fellowship with the Software Sustainability Institute and advocates for best software practices in research. She is a member of the mybinder.org operating team and maintains infrastructure supporting a global community in sharing reproducible computational environments. She has also mentored projects through two cohorts of the Open Life Science programme, sharing her lived experience of participating in and leading open science projects.
You can follow Sarah’s work on GitHub: @sgibson91.
Read the latest blog posts here
TECH UPDATE: MULTIPLE JUPYTERHUBS, MULTIPLE CLUSTERS, ONE REPOSITORY.
This blog was originally posted by 2i2c and represents a project Sarah led and developed.
2i2c manages the configuration and deployment of multiple Kubernetes clusters and JupyterHubs from a single open infrastructure repository. This is a challenging problem, as it requires us to centralize information about a number of independent cloud services, and deploy them in an efficient and reliable manner. Our initial attempt at this had a number of inefficiencies, and we recently completed an overhaul of its configuration and deployment infrastructure.
Read more
THINGS I'VE LEARNED: JANUARY 2022
- Nested build matrices are not (yet?) supported in GitHub Actions, but you can explicitly define a set of matrix parameters using YAML array syntax. See an example here.
- A pattern I often use to update my working branch with the default branch is:
    git checkout main
    git pull  # Add `upstream main` if appropriate
    git checkout my_branch
    git merge main
  Mostly this is fine, but occasionally merge conflicts happen. If I know I want to keep a specific version of a conflicting file from one of the branches (as opposed to finding a non-conflicting combination), --theirs/--ours can be used.
    git checkout --ours conflicting_filename  # To keep the version from the current branch
    git checkout --theirs conflicting_filename  # To keep the file from the incoming branch
  Note that behaviour can change depending on which branch is checked out and whether a merge or rebase is being performed, so I recommend double-checking online!
- That the second --- in YAML delimits as if what follows it is another YAML file. This can cause issues for command line YAML parsers like yq and made pulling the front matter from my Markdown files a little trickier than expected!
- How to automatically tweet out new blog posts when they are merged into main
- How to use a GitHub App to generate tokens in GitHub Action workflows. The tokens can then be used to securely work around the fact that GitHub Action workflows can’t be triggered by events that were authorised by the GITHUB_TOKEN in another workflow. There is a nice write-up that helped me here.
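To illustrate the front matter point in the list above: one way to sidestep the "second --- starts a new YAML document" problem is to cut the front matter out of the Markdown file before handing it to any YAML parser. The following is only a minimal sketch using the Python standard library, not the approach used for this site. The function name `split_front_matter` and the sample document (title, date, body) are made up for illustration.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Return (front_matter, body) for a Markdown document.

    Front matter is taken to be the text between a `---` on the first
    line and the next line that is exactly `---`. If either delimiter is
    missing, the front matter is empty and the whole text is the body.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            front_matter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return front_matter, body
    # Opening delimiter without a closing one: treat everything as body.
    return "", text


# Hypothetical example document; the values are placeholders.
example = """---
title: An example post
date: 2000-01-01
---
# Heading

Body text with a horizontal rule below.

---
More text.
"""

front_matter, body = split_front_matter(example)
print(front_matter)
```

Only the first closing --- is treated as a delimiter, so a later --- used as a horizontal rule stays in the body. The returned front matter string is a single YAML document that can then be passed to a parser of your choice.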
HOW I AUTOMATED AUTHORISED CLOUD DEPLOYMENTS FROM PULL REQUESTS WITH GITHUB ACTIONS
This blog was originally posted on the Jupyter blog: https://blog.jupyter.org/how-i-automated-authorised-cloud-deployments-from-pull-requests-with-github-actions-13f890538e32
I recently did some work on the mybinder.org deployment infrastructure to solve a problem with testing Pull Requests before deployment.
It had not been possible to test Pull Requests on our staging deployment because our automated workflows don’t have access to secrets.
This resulted in my writing the test-this-pr action and this blog is a retrospective of what I learned over that process.
CREATE A BLOG WITH HUGO AND GITHUB PAGES
As a scientist in today’s ever connected, digital world, having a platform to talk about one’s work can be a really useful tool. Whether you’re looking to strike up new collaborations or promote your freshly published paper, a blog can help signal boost your work and function as an archive of your ideas (the developed and forgotten).
I put together a tutorial for building a blog site from scratch using Hugo and GitHub Pages - the same tools I use to host this site!
Read more
COLLABORATIONS WORKSHOP 2021
At the end of March 2021, I attended the Collaborations Workshop hosted by the Software Sustainability Institute. The following is my round-up of what the event is, who runs it, and some of my highlights.
What is the SSI?
The Software Sustainability Institute (SSI for short) is a network of UK universities, Research Software Engineering groups and policy makers dedicated to improving the quality, sustainability and recognition of research software. They help the members in their network learn software skills and best practices, and advocate for culture change in their organisations and institutions. I am also an SSI Fellow and if anyone has any questions about the Fellowship round in the future, please do get in touch (unfortunately, applications have already closed for the 2021 round). And read my previous blog to find out more about my Fellowship goals.
Read more
CONTINUOUS INTEGRATION: FAIL FAST AND FAIL FIRST
Sarah and Graham have different career backgrounds - Sarah having come through academia whereas Graham earned his stripes in industry. However, in their current roles, they often find themselves using the same tools, for example Continuous Integration. They have written this blog post to identify how academia and industry may use Continuous Integration in different ways, and what they might learn from one another.
What is Continuous Integration and why do we use it?
In Continuous Integration (CI) and Continuous Deployment (CD), the key concept is “continuous”. That is where it departs from what software engineering teams were doing before: rather than eventually integrating, at the end of developing a feature, we do it continuously, as we’re working. Instead of eventually deploying, when we’ve got a collection of features built and bugs fixed, we do it continuously.
Read more
FEBRUARY 2020 UPDATE
Hello friends! 👋 It feels like such a long time since I wrote a blog post but the truth is that I’ve just been up to so many exciting things! So this blog post will be a quick run down on everything I’ve been up to recently.
At the start of January, I helped organise and run the Research Software Reactor with Tania Allard and Gerard Gorman. Our topic was “DevOps for better software and research reproducibility”. We had around 30 attendees come together at the Turing Institute for 2 days and learn about GitHub Actions for automating their workflows. The aim was to produce reproducible software to support gold-standard research.
Read more
SOFTWARE SUSTAINABILITY INSTITUTE FELLOWSHIP 2020
So 2020 is off to an incredibly exciting start!
I have been awarded a Software Sustainability Institute 2020 Fellowship!!! 🎉 🎉 🎉
The Software Sustainability Institute (SSI) is an organisation that facilitates the advancement of software in research by cultivating better, more sustainable, research software to enable world-class research (“Better software, better research”). Its mission is to become the world-leading hub for research software practice. The Institute is based at the Universities of Edinburgh, Manchester, Oxford and Southampton, and draws on a team of experts with a breadth of experience in software development, project and programme management, research facilitation, publicity and community engagement.
Read more
2019: SEASONS OF REFLECTION
As 2019 comes to a close, I have spent a full year as a Research Software Engineer (RSE) at The Alan Turing Institute. A lot has changed over the last 12 months and I’ve come a long way - sometimes surprising myself!
Winter ❄️
In February, I gave my first professional talk at the UKRI Cloud Working Group workshop hosted by The Francis Crick Institute. It wasn’t the furthest I’ve travelled for a conference, given it’s next door to the British Library, but it was certainly good practice!
Read more
DIVING INTO LEADERSHIP TO BUILD PUSH-BUTTON CODE
“Hi everyone, I’m Sarah! I’m a Research Data Scientist at the Alan Turing Institute and I’m also an operator of mybinder.org. It’s really cool seeing how many people here are interested in BinderHub!”
And it is cool. Really cool! But also a bit scary as a room full of Research Software Engineers (each of them much further on in their careers than I am) suddenly turn to me, eager for the knowledge I was surely about to impart to them.
Read more