Something every developer would like to do is release code to production more often. But why? Features are written for end-users, and the only place where an end-user can use a feature is in production. So why not release it as soon as possible? This is of course easier said than done, especially when dealing with a large monolithic system that contains code that has been running for years, as in our case. We used to deploy once a sprint (so once every two weeks), but we were not happy about it.
Looking at where we came from, the situation was as follows. We worked on all kinds of features for two weeks. Changes would include new features, database changes, property changes, front-end changes, just about anything you can think of. All of this was piled up into one release, for which we would create a release branch and put it on a staging environment on the last day of the sprint. We would run some tests on it, and there would be some manual testing during the first couple of days of the next week. With these manual tests we would check that the basic functionality was still working, but also test each and every delivered story (if it was testable) to check that it actually worked the way it was described. Some bugs might come out of this, which were then resolved on the release branch.
To move the release into production, our test results had to look good and the manual testers had to approve it. This would result in a recommendation to go into production, which was then done at night. We did this because most of our releases were breaking, and at night traffic is low, so the number of affected users would be small.
We worked this way for years, and it did work for us… more or less. Every now and then a release would fail and we had to roll back. Sometimes it was because a property was not set correctly. To prevent these silly things, we always had one developer present when the release was being done. If something did go wrong, that developer would check whether it was easily fixable or whether an actual rollback had to be done.
How do you break the habit?
It is not easy to change the way people work. First you need to make sure that everyone in your team actually wants to deploy to production more often, because the mindset needs to change. Every code change needs to be non-breaking, and you might need to use feature switches so that you can deploy a new feature to production but keep it turned off until it is approved by a product owner. This is probably the hardest part and needs some experimentation and investigation. We, for instance, needed to define how database changes should be done and, since we use distributed caching, what would happen when different versions of the same objects are in the cache.
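To make the idea of a feature switch concrete, here is a minimal sketch in Python. It is not how our platform implements it; the class `FeatureSwitches`, the function `render_bid_page` and the switch name `new-bid-button` are all made up for illustration. The point is only that the new code path is deployed but stays off until someone turns it on.

```python
class FeatureSwitches:
    """Hypothetical in-memory switch registry; every switch is off by default."""

    def __init__(self, enabled=None):
        # Names of the switches that are currently turned on.
        self._enabled = set(enabled or [])

    def is_enabled(self, name):
        return name in self._enabled

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        # discard() does nothing if the switch was already off.
        self._enabled.discard(name)


def render_bid_page(switches):
    """Hypothetical page renderer with an old and a new code path."""
    if switches.is_enabled("new-bid-button"):
        # New feature: already in production, but only visible when switched on.
        return "bid page with new button"
    # Existing behaviour stays the default, so the deploy is non-breaking.
    return "bid page"


if __name__ == "__main__":
    switches = FeatureSwitches()
    print(render_bid_page(switches))   # old behaviour right after the deploy
    switches.enable("new-bid-button")  # e.g. after the product owner approves
    print(render_bid_page(switches))   # new behaviour, without a new deploy
```

In a real system the switch state would probably live in configuration or a database rather than in memory, so that it can be changed without restarting the application. Turning the switch off again also gives you a quick way back when the new feature misbehaves.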
Second, start deploying to your staging environment more often. Put a bit of load on the environment and deploy… see what happens.
Third, make it clear to the business that things might go wrong. A release might break the production environment even when it did not break staging… and you should accept failure. Of course you can minimise the loss by specifying a window in which you can release, but in the end you want to get rid of even that. Failure is not a bad thing. You can learn from it and change things so that it will not fail the next time. That is also what happened to us. And because we now deploy during the day, there are a lot of developers around who can help fix things!
Finally, the most important thing… just start doing it! Do not try to think of everything that might need to be fixed before you can release more often. You might get depressed, decide that it will never work, and therefore never take the step to actually release much faster.
Currently we release features multiple times per week, and this has shown us a lot of value. Developers feel more responsible for their code. They actually check the logs during startup of the application, and when something strange shows up it gets fixed faster. Bugs in the application are also fixed faster and the fixes are put into production as soon as possible.
At the beginning of this article I emphasised that we do it for the end-user. So how do we know that end-users are using a new feature, and do they really appreciate the work we do? You should measure it… this is something we are not doing (enough) yet, but it will come. We do see, however, that the end-user is happy. We see it for instance in the reviews of our native iOS and Android apps. Every now and then there is feedback that something is not working properly, which results in a negative rating. We can then act upon it, fix the issue and put the fix into production within a couple of hours. The feedback we get then is that people are surprised that it was fixed that fast, and the rating is changed into a positive one. End-user happy, developers happy and business happy.
What has changed?
- We deploy during the day, when all developers are present.
- We deploy from our main branch (so no more fixing on release branches, with the risk of losing a fix later because it never made it to the main branch).
- Developers feel more responsible for the code running in production.
- Developers are more proud of what they built.
- There is no more manual testing done on a staging environment.
- During the demo at the end of the sprint, we try to demo it on production!
In the last two weeks we released the auctioning platform to production around 11 times (not much compared to big companies, but our team is relatively small). On some days we released more than once. Only one of the releases caused some downtime… this lasted for around 20 minutes, after which the platform was functioning properly again. Before, we released only once every two weeks.
For the future we want to release even more!