The Cost of Fixing Failed Software Deployments

A report from Capers Jones found that 60% of US software engineering work time is spent finding and fixing errors. This represents roughly $500 billion in costs as Developers and QA/QE teams find and fix deficiencies that arise during the software development lifecycle. The broader impact can also include lost productivity for IT teams and end users, and delayed product roadmaps.

Capers Jones, “Wastage: The Impact of Poor Quality on Software Economics,” Version 8.0, September 3, 2017.

We have all experienced these failures as Developers, QA/QE engineers, System Admins, DevOps teams, Tiers 1 and 2 support teams and as end users. 

The sequence of events looks something like this: Developers write code to address requests for feature enhancements, patches and upgrades. The finished product is then sent to the QA/QE engineers for regression testing. Depending on the test bed, these teams test on like-for-like systems or, more often than not, spin up VMs to mimic the production environment. Once testing is complete, the release is handed over to the system admins, who plan the deployment steps. At deployment time, the team convenes, pushes out the release, and validates that the platform is operating as designed. Once validation is complete, all concerned teams are notified. In a perfect world, this is the end of the process.
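The hand-offs above form a strictly ordered pipeline: each team's stage has to pass before the next one starts, and a problem found late (for example, during post-deployment validation) stops the release. The following is a minimal Python sketch of that sequence. It assumes each stage can be reduced to a pass/fail check; the stage names, `ReleaseResult` and `run_release` are hypothetical illustrations, not part of any tool described in this paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Each stage is a (name, check) pair; the check returns True if the stage passed.
Stage = Tuple[str, Callable[[], bool]]


@dataclass
class ReleaseResult:
    completed: List[str]          # stages that passed, in order
    failed_stage: Optional[str]   # first stage that failed, or None


def run_release(stages: List[Stage]) -> ReleaseResult:
    """Run the hand-off stages in order and stop at the first failure."""
    completed: List[str] = []
    for name, check in stages:
        if not check():
            return ReleaseResult(completed, name)
        completed.append(name)
    return ReleaseResult(completed, None)


# Hypothetical stage checks standing in for real team activities.
stages: List[Stage] = [
    ("develop", lambda: True),
    ("qa_regression", lambda: True),
    ("deploy", lambda: True),
    ("validate", lambda: False),  # post-deployment validation finds a problem
    ("notify", lambda: True),
]

result = run_release(stages)
print(result.failed_stage)  # validate
print(result.completed)     # ['develop', 'qa_regression', 'deploy']
```

In this simplified model, a failure at the validation stage means every earlier stage already "passed," which is why the problem only surfaces after the release is live.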

Unfortunately, this is not always the case. As mentioned above, finding and fixing errors is estimated to cost roughly $500 billion in software development. That figure does not include the time spent supporting a failed release: the System Admin team supporting the Tier 1 team, and ultimately the cost to the end user. In 2016, the estimated total cost of software failures worldwide was $1.1 trillion, a whopping 315 years of lost time.

Requirements for a Successful Deploy 

The morning after a new release reveals how well the teams named earlier have done their jobs, including:

  • If the requirements are well written
  • If the code is clear, maintainable and documented
  • If the testing is done in a like for like environment
  • If the testing is thorough and complete, including canary and/or blue/green deployments
  • If the deployment validation is planned and executed properly

That is a lot of “ifs”. 
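Canary deployment, one of the testing practices listed above, routes a small share of production traffic to the new release and compares it against the current (baseline) release before rolling out fully. The sketch below shows one simple way to make that promote-or-roll-back call by comparing error rates. It is a minimal illustration under assumed thresholds (`max_ratio`, `min_requests`) with hypothetical names; dedicated canary-analysis tools typically weigh more metrics and apply statistical tests.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int  # requests served during the observation window
    errors: int    # failed requests in the same window

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Metrics, canary: Metrics,
                    max_ratio: float = 1.5, min_requests: int = 100) -> str:
    """Return "promote", "rollback" or "wait" for a canary release.

    Assumed rule: roll back if the canary's error rate is more than max_ratio
    times the baseline's (or if it has any errors while the baseline has none).
    """
    if canary.requests < min_requests:
        return "wait"  # too little canary traffic to judge yet
    if baseline.error_rate == 0.0:
        return "promote" if canary.errors == 0 else "rollback"
    if canary.error_rate > max_ratio * baseline.error_rate:
        return "rollback"
    return "promote"


baseline = Metrics(requests=10_000, errors=20)    # 0.2% errors
print(canary_decision(baseline, Metrics(500, 5)))  # rollback (1.0% errors)
print(canary_decision(baseline, Metrics(500, 1)))  # promote (0.2% errors)
print(canary_decision(baseline, Metrics(50, 0)))   # wait
```

A check like this only catches problems that show up as errors in the traffic the canary happens to receive; functionality that the sampled traffic never exercises can still pass.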

The Impact of an Unsuccessful Deploy 

Either of two scenarios could play out after the deployment described above.

In a “successful” deployment, end users will call with questions about feature changes, such as “this used to work before, why not now?” or “how do I do this now?” The developers and system admins may investigate those questions, with little investment of time and effort. After this effort, the teams return to their normal duties and the next release enters the process.

In an unsuccessful deployment, one or more of these pitfalls were not avoided:

  • Unclear requirements that do not define the end state
  • Poorly written code that is not maintainable or well documented
  • Inadequate application testing that gives the deployment team a false set of expectations
  • Incomplete platform validation that does not fully test the release’s effects on the platform

The morning after a release suffering from these failures deteriorates rapidly. The Help Desk experiences a spike in calls from end users, and Tier 1 support is quickly overwhelmed with requests for deskside assistance. This rapidly escalating scenario works its way back up the deployment chain, with system admins going fully defensive to figure out what to do. The QA/QE team is called in to review the testing and compare it against the production issues happening in real time. The developers are also called in to review line after line of code to find where they went wrong.

In the worst case, the platform fails as peak demand approaches and an outage is declared. This scenario consumes the time and effort of the Developers, QA/QE, System Admins, Help Desk, Tier 1 and 2 support, Product Managers and Executive Management. If the customer is external, add the corresponding teams on the customer side.

The Cost of Failure for Online Businesses 

In its October 2019 study “Website Downtime: How It Hurts Businesses,” Hosting Review listed the following example revenue losses from downtime at prominent online businesses.

Introduction to SafeDeploy

SafeDeploy is a CI/CD and DevOps tool that uniquely ensures that all pre-existing functionality works with an application update, eliminating failed canary and blue/green deployments.

Unlike the existing manual “try it and hope it works” canary and blue/green deployment process, SafeDeploy guarantees that all pre-existing functionality works in the new release, leading to higher quality and greater agility.

SafeDeploy accomplishes the above with the following unique approach:

  • Unique data collection agent – the only agent able to collect the complete execution stream of an application in production without impacting production operations
  • Unique data collected – the only dataset of the actual execution stream of the application in production
  • Unique playback capability – the only ability to play back the complete execution stream from the previous production release into the new release candidate
  • Unique comparison capability – the only ability to compare the results of the production execution stream across the production release and the release candidate in order to prevent issues
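The general idea behind collecting production behavior, playing it back into a release candidate and comparing the results can be illustrated with a simple record-and-replay regression check. The sketch below is not SafeDeploy’s implementation: its agent works on the complete production execution stream, whereas this example assumes simple in-process request/response pairs. The functions `record`, `replay_and_compare`, `total_v1` and `total_v2` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Request = Dict[str, Any]
Handler = Callable[[Request], Any]
Recording = List[Tuple[Request, Any]]


def record(production: Handler, requests: List[Request]) -> Recording:
    """Capture each request together with the production release's response."""
    return [(req, production(req)) for req in requests]


def replay_and_compare(recording: Recording, candidate: Handler) -> List[Dict[str, Any]]:
    """Play the recording into the candidate and report every differing response."""
    mismatches = []
    for req, expected in recording:
        actual = candidate(req)
        if actual != expected:
            mismatches.append({"request": req, "expected": expected, "actual": actual})
    return mismatches


def total_v1(req: Request) -> int:
    """Hypothetical production logic: 10% bulk discount from 10 items, in cents."""
    cents = req["qty"] * req["unit_cents"]
    return cents * 9 // 10 if req["qty"] >= 10 else cents


def total_v2(req: Request) -> int:
    """Hypothetical release candidate with an off-by-one regression (> instead of >=)."""
    cents = req["qty"] * req["unit_cents"]
    return cents * 9 // 10 if req["qty"] > 10 else cents


traffic = [{"qty": 1, "unit_cents": 500},
           {"qty": 10, "unit_cents": 500},
           {"qty": 12, "unit_cents": 500}]
recording = record(total_v1, traffic)
for m in replay_and_compare(recording, total_v2):
    print(m)
# {'request': {'qty': 10, 'unit_cents': 500}, 'expected': 4500, 'actual': 5000}
```

Only the recorded request that hits the changed boundary exposes the regression, which is why replaying real production traffic can catch cases a hand-written test plan may miss. Real responses often carry timestamps or generated IDs, so a practical comparison would need to normalize such fields before checking for differences.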

Please fill out the Contact Me Form for a personal demonstration of SafeDeploy and a discussion about how it can dramatically improve your time to market and online business performance.

Author: Michael Lombardo

Michael Lombardo is an IT professional with experience in operations and project management at companies in the financial, wireless, telecom and insurance industries. Michael holds a Master’s degree in Telecommunications from Golden Gate University and a current PMP certification. He currently works as a Technical Program Manager – Infrastructure in the insurance industry and is an Adjunct Professor at the Ageno School of Business at Golden Gate University. He also advises SafeDeploy as a strategist in IT operations.