alphalist Blog

6 Tips to Overcome Scaling Challenges Like Design Decisions, Tech Debt, and Developer Satisfaction

What worked when your company was founded will no longer work now that you are scaling and hiring at a rapid pace. How will people communicate? How will tech debt be tackled? The choices one engineer makes now affect many more people, so it's time to strategize on how your larger workforce works together. Rachel tackled many scaling challenges when the team she led grew threefold - to 500 people! This is what she shared on the alphalist CTO podcast about how she had to adapt as the team grew.

Table of Contents
  1. Tackle Tech Debt to Keep Morale Up
  2. Create Processes for Fan-Out Work
  3. Use Design Guidance to Simplify Design Reviews
  4. Use Council Meetings to Make Aligned Technical Decisions
  5. Assign DRIs for Effective Decision Making
  6. Create a DevSat survey

Tackle Tech Debt to Keep Morale Up

If an engineer hits the same friction again and again because of tech debt, it is going to demoralize them. So as important as it is to launch new features, it is also important that your investment portfolio is healthy - which is where tackling tech debt comes in.

Create Processes for Fan-Out Work

Fan-out work is when you decide to get something done - maybe it's a large-scale codebase evolution to clean up some technical debt - and it's something that no single team can do alone: lots of different teams all have to be involved.

The way it used to work: an engineer somewhere would have a great idea and would write a discussion post or a team post internal to GitHub and say, “Hey, I realize this API/technology we're using is not good. We should refactor and remove it, and look, here's my PR where I removed it from my project. Now everyone, please go and do this.” Perhaps when there was a small number of engineers, working through influence to get stuff like that done could work, but we're not at that scale anymore. Now we're a global company with people working all around the world.

In the past, one influential developer who wanted to get rid of tech debt would make a PR refactoring one part of the project and ask others to do the same in their own work. But at a company this big, not everyone knows each other, and the effort likely ends up as a half-done refactor. And we know well that the only thing worse than code that needs refactoring is code that is half refactored! That doesn’t mean tech debt cleanups and refactorings shouldn’t be initiated - they just need to be done systematically. Centrally, you need to decide:

  • What is the scope of the proposed fan-out project?
  • What is it going to cost?
  • What is the benefit?
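As a rough illustration of comparing proposals on scope, cost, and benefit, here is a minimal Python sketch. It is not GitHub's actual process: the `FanOutProposal` fields, the benefit-per-engineer-week ranking, the greedy selection, and all of the example numbers are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanOutProposal:
    """Hypothetical record for one proposed fan-out project."""
    name: str
    teams_affected: int      # scope: how many teams must take part
    cost_eng_weeks: float    # estimated total cost in engineer-weeks
    benefit_score: float     # benefit estimate on any consistent scale

    @property
    def benefit_per_week(self) -> float:
        return self.benefit_score / self.cost_eng_weeks


def pick_for_quarter(proposals, budget_eng_weeks):
    """Greedily pick the highest benefit-per-cost proposals that fit the budget."""
    chosen = []
    remaining = budget_eng_weeks
    ranked = sorted(proposals, key=lambda p: p.benefit_per_week, reverse=True)
    for proposal in ranked:
        if proposal.cost_eng_weeks <= remaining:
            chosen.append(proposal)
            remaining -= proposal.cost_eng_weeks
    return chosen


# Illustrative, invented numbers only.
proposals = [
    FanOutProposal("remove stale feature flags", 40, 60, 90),
    FanOutProposal("retire legacy API", 25, 120, 100),
    FanOutProposal("upgrade logging library", 10, 30, 20),
]
selected = pick_for_quarter(proposals, budget_eng_weeks=100)
print([p.name for p in selected])
```

A greedy ranking like this is only a starting point for discussion. In practice the final choice would still be a judgment call, since benefit estimates are rough and scope (the number of teams involved) affects coordination cost in ways a single score does not capture.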

All potential fan-out projects are compared in this way and a select few are picked to work on every quarter, with the goal of getting a few impactful things completed. Each chosen fan-out project is properly tracked (with TPMs, project managers, etc.) to make sure it makes a measurable, distinct difference.

One such fan-out project we worked on recently was cleaning up feature flags in the monolith. As you know, feature flags are an excellent tool for deployment, but if too many hang around for too long they hurt code readability. Over 15 years we had amassed feature flags that were always on, feature flags that were never turned on, and, worst of all, feature flags that were enabled for some specific enterprise customers and not for others. No customer will appreciate being on a bespoke feature flag that we are not properly aware of!

We tackled this fan-out project by centrally staffing a bunch of motivated people who were like, “Yeah, we really wanna do this. We wanna get this done. It's gonna be better for everyone.” This team investigated each feature flag to figure out who owned it and whether it could be safely removed. We made great progress on our product doing it this way.
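The three kinds of stale flags described above (always on, never on, and on for only some customers) can be told apart mechanically once rollout data is available. The following Python sketch shows one way to do that triage. The `FlagState` record, its fields, and the example flags are hypothetical and do not reflect GitHub's internal tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlagState:
    """Hypothetical snapshot of one feature flag's rollout."""
    name: str
    owner: Optional[str]   # owning team, or None if nobody claims the flag
    enabled_for: int       # number of customers with the flag turned on
    total_customers: int


def classify(flag: FlagState) -> str:
    """Sort a flag into one of the three cleanup categories."""
    if flag.enabled_for == 0:
        return "never_on"      # candidate: delete the flag and the unused path
    if flag.enabled_for == flag.total_customers:
        return "always_on"     # candidate: delete the flag, keep the new path
    return "partial"           # needs an owner to decide per customer


def cleanup_report(flags):
    """Group flag names by category and list flags that have no owner."""
    report = {"always_on": [], "never_on": [], "partial": [], "unowned": []}
    for flag in flags:
        report[classify(flag)].append(flag.name)
        if flag.owner is None:
            report["unowned"].append(flag.name)
    return report


# Invented example data.
flags = [
    FlagState("new_diff_view", "pull-requests", 1000, 1000),
    FlagState("experimental_search", None, 0, 1000),
    FlagState("custom_sso_flow", "enterprise", 3, 1000),
]
print(cleanup_report(flags))
```

The "partial" bucket is the one that cannot be automated away: as the text notes, someone still has to find the owner and decide what each affected customer should end up with.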

Use Design Guidance to Simplify Design Reviews

The need for paved paths and design guidance is another thing that needs to be addressed as an engineering org scales and matures. How will a team building a new service know what building blocks or languages to use, or how to architect the code so that it works smoothly with the work another team is doing on the monolith? Not only does GitHub have cascading levels of design reviews, but they are increasingly investing in paved paths so engineers can focus on the novel work they’re doing and have infrastructure to rely on that they don’t have to think about too much.

With design reviews you want to balance the effort relative to the importance of the change – not everything requires a documented design, but changes that are substantial and will have a lasting impact across engineering should be carefully reviewed and communicated.

GitHub also has a Principal Council composed of the company’s most senior technical ICs and VPs who review the most important design decisions. The Principal Council sets the overall architectural roadmap in order to simplify design guidance for teams. The Council might say that if the project meets Condition X, it should be built in Go, but if it meets Condition Y, it should be built in the monolith. Of course, choosing a programming language is just scratching the surface, so now the Principal Council is working on a broader future architecture plan.

Use Council Meetings to Make Aligned Technical Decisions

We really want teams to have agency in making the decisions that apply to them about the novel work that they're doing. But we also need a way for anyone in engineering to bring big questions that they don't feel can be decided within their own team - like bigger infrastructure questions, investment questions, or engineering-wide questions - to a senior group that will have thoughtful discussions and communicate the outcome back. This is what happens at our council meetings. Anyone in engineering can submit a GitHub issue that will be discussed by people who work in various parts of the platform and who are able to make decisions on things like “I want to start using this database technology. I'm not sure how that's going to work on GitHub Enterprise Server, which is our on-prem server deployment, or on our future cloud-based SaaS offering. Is this something I can do, and what are the constraints?”

Of course, GitHub is built on GitHub and uses GitHub. (I love how when we do work on our systems for internal developer productivity, it's not only helping our own developers but it’s also testing our products and making them better for users.) So we use GitHub Discussions and even repos for communication. (Occasionally we'll write Google Docs, as Google Docs are really good for iterative commenting and working. But then when something's locked, we bring it into a repo.)

Assign DRIs for Effective Decision Making

Another thing that comes up in a large organisation is that sometimes decisions take too long. Not only should there be processes for healthy escalation, but each team also needs a DRI (Directly Responsible Individual). The DRI is empowered to make decisions and iterate, and makes sure that the team keeps a healthy cadence of moving forward without getting in its own way.

Create a DevSat survey

A DevSat survey is a developer productivity and happiness survey. The GitHub DevSat survey is focused on the internal GitHub developer experience. There is a whole set of questions and topics that we go through, e.g.:

  • What is causing developer friction?
  • What is the satisfaction with our tools and systems?
  • What jobs to be done have the most friction? 
  • Psychological safety on teams
  • Decision making on teams
  • The On-Call experience
  • How much unplanned work versus planned work is being done?

We use these to provide anonymized reports to managers and to leadership, which helps inform our investment decisions and elevates our decision-making.
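To make the idea of an anonymized report concrete, here is a small Python sketch that aggregates survey scores per team and suppresses any team with too few responses to stay anonymous. The minimum-responses threshold, the response format, and the sample data are all assumptions for illustration. The source does not describe how GitHub actually builds its reports.

```python
from collections import defaultdict
from statistics import mean

# Assumed threshold: below this, an average could reveal individual answers.
MIN_RESPONSES = 5


def team_report(responses, min_responses=MIN_RESPONSES):
    """Return the average score per team, or None where the group is too small."""
    scores_by_team = defaultdict(list)
    for response in responses:
        # Only team and score are read; no respondent identity is carried over.
        scores_by_team[response["team"]].append(response["score"])

    report = {}
    for team, scores in scores_by_team.items():
        if len(scores) >= min_responses:
            report[team] = round(mean(scores), 2)
        else:
            report[team] = None  # suppressed to protect anonymity
    return report


# Invented sample data: satisfaction scores on a 1-5 scale.
responses = (
    [{"team": "codespaces", "score": s} for s in (4, 5, 3, 4, 4)]
    + [{"team": "search", "score": s} for s in (2, 5)]
)
print(team_report(responses))
```

Suppressing small groups is a common safeguard in survey reporting, because an average over two or three people can effectively identify who said what.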

An example of this is when we asked in our DevSat about any friction involved in using Codespaces within GitHub. We gained a wealth of insight from our internal developers that the Codespaces team can use. People tend to share more when a survey is anonymous, so we can spot real trends based on what comes up as most important. This ultimately helps make our external product better as well!

In conclusion: Scaling isn’t easy. You will need to set up processes on how things are going to work now that you are a big company with multiple teams. No one person can understand the details of all the work that is happening, so you need effective development practices and communication strategies. In this article, we discussed how to make design decisions, handle tech debt and keep in touch with developers now that the company is so much bigger.

Rachel Potvin

VP Engineering @ GitHub

I'm an engineering executive with 25 years of experience working in the technology sector. I've spent the bulk of my career working on developer focused infrastructure and am passionate about building and investing in systems that improve developer productivity and happiness. At GitHub, I'm motivated by the mission of helping developers around the world collaborate, solve challenging problems, and have fun while doing so. I'm focused on building healthy sustainable engineering teams that scale to meet the demands of the world's developers. I manage a team of 500+ developers working on products such as Codespaces, Copilot, GitHub Core (PRs, Repos, Notifications), Planning and Tracking (Issues, Projects), GitHub Advanced Security (CodeQL/Code Scanning, Secret Scanning, Software Supply Chain), Code Productivity (CodeSearch, Nav, Dependencies), and Data (Data Platform, Data Science & ML, Insights). Before GitHub, I worked at Google for over 11 years. There, I most recently ran the Google Cloud Insights organization, responsible for managing and building products based on Google Cloud's customer and product data, including Google Cloud's recommendations and insights platform. Prior to that I ran Google’s DevOps teams focused on Developer Productivity: Code Search, Code Review, Issues, Source Control, IDEs, and developer research (see https://dl.acm.org/doi/pdf/10.1145/2854146). Earlier in my career I worked at an ISP startup doing web and database development, spent 6 years working in the video game industry, and spent two years in consulting.