Learn Incremental Deployment the Hard Way

I failed to deploy two big features in the past two years. I want to discuss why they failed and what I think is the best way to finish them.

2 Failed Features

Extract App Configurations to Environment Variables

Context

We have many app configurations (e.g. third-party service configurations like AWS S3, and app behavior configurations like the host name). We were storing these configurations in files, as a typical Rails app does. In the config files, we had different values for each environment (development/staging/production).

Reason for refactoring

Then we migrated to Docker/Kubernetes for deployments. We wanted to set up more staging environments so that different teams could test independent features.

The Plan

So we decided to extract these configuration values into environment variables.
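
A minimal sketch of what one extracted value might look like, assuming a hypothetical `S3_BUCKET` variable and a hypothetical `s3_bucket` helper (neither name comes from the original app). The helper takes the environment as a parameter so it can be exercised without touching the real `ENV`. `fetch` raises `KeyError` when the key is missing, so a mistyped variable name fails loudly instead of quietly returning `nil`.

```ruby
# Before: the value lived in a per-environment config file.
# After: the value is read from the process environment.

# Hypothetical helper; `env` defaults to the real ENV, but any
# object that responds to #fetch (such as a Hash) works for testing.
def s3_bucket(env = ENV)
  # fetch raises KeyError if the variable is not set,
  # unlike env["S3_BUCKET"], which would return nil.
  env.fetch("S3_BUCKET")
end

puts s3_bucket({ "S3_BUCKET" => "my-app-staging" })
```

Using `fetch` rather than `[]` is a design choice for this sketch: it turns a missing or misspelled variable into an immediate error.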

I thought this was an easy change, so I kept all the changes in a single feature branch. That turned out to be a silly decision.

Issues

It was a ton of work to review. Although it was almost entirely replacing hard-coded values with calls to ENV, this is the hardest kind of merge request to review, because the reviewer has to check for typos rather than logic defects.
We didn't have any test coverage for most of the features, so it was hard to know whether they actually worked. We had to deploy them to staging or even production to find out.
I took a two-month vacation right after I finished the code changes, and the other developers neither deployed nor tested them. After I got back, I had to rebase the feature branch onto the newest changes and deal with newly added configurations.

Results

Of course, things went completely wrong after I merged this branch and tested it on staging. Even the homepage had issues. We had to go back and forth: find an issue, fix it, test again. In the end, it took us almost 3 months to finish this refactoring.

Merge an API service into the main App

Context

We have two apps running: one is the web interface (it responds to HTML requests), and the other is the API interface (it responds to JSON requests).

Reason for refactoring

They share almost the same behaviour, except the view templates are different. Whenever we need to add or modify a piece of logic, we need to do it twice.

The Plan

So we decided to merge these two apps and leverage Rails' powerful respond_to method.
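
The idea behind the merge can be sketched in plain Ruby: compute the data once, then branch only on the requested format. This is a simplified stand-in and not Rails' `respond_to` API; `show_user`, the user hash, and the format symbols are all hypothetical.

```ruby
require "json"

# Shared logic lives in one place; only the rendering differs.
def show_user(user, format)
  case format
  when :html then "<h1>#{user[:name]}</h1>" # web interface
  when :json then JSON.generate(user)       # API interface
  else raise ArgumentError, "unsupported format: #{format}"
  end
end

user = { name: "Ann" }
puts show_user(user, :html)
puts show_user(user, :json)
```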

Since I had the experience from last time (Extract App Configurations to Environment Variables), I decided to do it one controller at a time.

It started pretty well: I separated the work into different branches, so that they could be tested and merged independently. I planned to test and deploy the migrated APIs while migrating the other APIs at the same time.

Issues

But then bad things happened in our process:

My manager took several weeks off for vacation, and I took several weeks off after he got back. We didn't merge or deploy any of these changes during those weeks.
Another developer was still developing new features in the meantime, so when I got back from vacation, I had more things to sync.
We didn't have any integration tests for our APIs, which meant I also needed to add them when I migrated the APIs into the web app. (But I was not sure whether these tests were even correct.)
The client (mobile apps) for this API is maintained by our clients in a different timezone (yes, we are only responsible for the web part and the API). They didn't provide any test apps for us to test the migrated API until I had migrated all of them.

Of course, things went wrong when all these bad things came together.
1. Since my manager was off, I finished all the APIs without deploying anything. (Things started to go off the rails from this decision.)
2. The client team moved really slowly. I received the test app weeks after I finished the APIs, and by then another dev had already added new features to the existing old APIs.
3. In our old API app, we had handlers for both the .api and .json formats. Since I had neither integration tests nor a test app, I had to re-implement both of them, which added some garbage code to our web app.
4. The newly added tests didn't give me enough confidence, so I had to test the new APIs with a staging app. But then the controllers depended on each other because of the business logic on the client side. (I had to log in to test all the other controllers.)
5. Since the client team was in a different timezone, I had to wait a day for their feedback on the API specifications, which made things much harder.

Results

In the end, I had to merge all the changes into one branch and hand over the testing phase to my manager (who's in the same timezone as the client team), and let them test the new APIs. This means my plan to deploy these new APIs incrementally completely failed.

Common Issues

No Tests

Both refactorings started without any integration test coverage, and testing them on staging/production took a lot of time. Without the confidence provided by integration tests, I was always afraid that some change would break something when deployed.

Vacations

I took a vacation during the development of both refactorings. I should've at least deployed what I'd finished and tested it out, so that the changes wouldn't get outdated during my vacation.

Huge Steps

The most important issue was that I wanted to take one huge step and finish these tasks once and for all, instead of breaking them down into smaller tasks (even though both tasks could easily be separated into sub-tasks by nature). This is against the spirit of Agile and looks more like waterfall.

I ran into many unexpected problems when I deployed the huge changesets. I would have gained more experience dealing with these problems if I had taken smaller steps and tried to learn from each step.

Incremental Deployment: the Only Way to Deploy Large Feature Changes

I think the only way to deploy this kind of large changeset is to break it down into smaller changesets and deploy them incrementally. Even if I had better test coverage, I would still choose this incremental strategy. I think taking small steps and getting feedback from each step is the core idea of Agile and Continuous Integration. If I always keep the system deployable and runnable and add small refactorings step by step, any huge task can be taken down easily.

I'll take the above two examples and explain how I would deploy them, and also give a template for this kind of change.

Environment Variables

1. Find all the environment variables that need to be extracted
2. Create a separate branch for one variable
3. Extract it to an environment variable and update the deploy settings
4. Test and deploy it
5. Repeat 2-4 for every variable
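
One way to make each variable independently deployable, as in the steps above, is to fall back to the old file value until the variable is set in every environment. This is a sketch under assumptions: `file_config`, `setting`, and the `HOST_NAME` key are hypothetical, and the fallback would be deleted once the variable is deployed everywhere.

```ruby
# Hypothetical stand-in for values still loaded from config files.
def file_config
  { "HOST_NAME" => "localhost:3000" }
end

# Prefer the environment variable; fall back to the file value
# while this particular variable is still being rolled out.
# Raises KeyError if the key is missing from both sources.
def setting(key, env = ENV, fallback = file_config)
  env.fetch(key) { fallback.fetch(key) }
end

puts setting("HOST_NAME", { "HOST_NAME" => "staging.example.com" })
puts setting("HOST_NAME", {})
```

With a fallback like this, the branch for a single variable can be merged and deployed before the deploy settings are updated in every environment.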

API rewriting

1. List all the APIs that need to be rewritten
2. Figure out the dependencies between controllers to decide the order in which to rewrite them
3. Create a separate branch for one controller (or even one controller action)
4. Rewrite it in the web app and direct the client to use this new endpoint¹
5. Test and deploy it
6. Repeat 3-5 for every controller
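
The ordering step above can be sketched with Ruby's standard `tsort` library, which sorts nodes so that dependencies come first. The controller names and their dependencies here are hypothetical; the only detail taken from the story is that logging in has to work before the other controllers can be tested.

```ruby
require "tsort"

# Hypothetical controllers mapped to the controllers they depend on.
dependencies = {
  "OrdersController"   => ["SessionsController", "ProfilesController"],
  "ProfilesController" => ["SessionsController"],
  "SessionsController" => []
}

each_node  = ->(&block) { dependencies.each_key(&block) }
each_child = ->(node, &block) { dependencies.fetch(node).each(&block) }

# tsort lists dependencies before the controllers that need them.
migration_order = TSort.tsort(each_node, each_child)
puts migration_order
```

`TSort.tsort` raises `TSort::Cyclic` if two controllers depend on each other, which would be a signal to migrate them together in one step.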

Template

1. Break down the task and list every separate sub-task that needs to be done
2. Figure out the dependencies if there are any
3. Start working on the first sub-task
4. Test and deploy it
5. Repeat 3-4 until all the sub-tasks are done

Summary

The final template is super simple, and I think it is what I was trying to do when I implemented the refactorings above. But for many complex reasons, I did it completely wrong. I guess I just learned this idea of incremental deployment the hard way from these real cases.

Footnotes:

¹ Things get trickier when a separate team controls the client app, and when the same client app has to call different endpoints.
