Risk Management in Kanban


This is the fourth post in preparation for my presentation at the Lean Kanban Netherlands 2012 conference, about Enhanced Risk Management in Kanban via the Theory of Constraints. In the earlier posts…

…we learned about the Theory of Constraints, and in particular how TOC deals with:

  • Schedule Management
  • Buffer Management
  • Risk Management
  • Root Cause Analysis
  • People Factors
  • Continuous Improvement

The purpose was to learn how we can use the Theory of Constraints to manage risks and to improve our software engineering processes. We learned to react to emergency situations (special cause variation), and to handle the most pressing recurring problems (common cause variation) by improving the underlying process and eliminating the root causes. We came across a variety of tools for doing so.

(Also, a general introduction to the Theory of Constraints was given in the earlier post: Theory of Constraints in Software Engineering.)

In this post, we will focus on an approach that has lately become very popular: the Kanban Method (or just “Kanban” for short). Specifically, we will look at how Kanban is equipped to deal with risk.

We will find that Kanban is already excellently equipped to handle risks, and thus ensure final delivery of software development projects. After learning in more detail about how Kanban handles risk management, in later posts we will look at how we can use what we have learned thus far from the Theory of Constraints to introduce significant innovations over a standard Kanban process.

The end result will give you both the flexibility of Kanban, and the systematic approach of the Theory of Constraints for managing risk and improving the underlying processes.

Kanban in a Nutshell

Kanban is one of the many tools used in Lean Manufacturing, and has recently appeared as a software method. [PATTON-2009] gives a simple introduction to Kanban. [LADAS-2008] and [ANDERSON-2010] describe the method in much detail. [ANDERSON-2012] gives a recollection of how the method was conceived over a period of thirteen years.

A Pull System with Limited Work in Process

In essence: Kanban is a pull system whereby work in progress is limited per work state. The number of items in any work state (like analysis, development, testing, etc.) is predetermined, and that limit must be respected. Kanban imposes very few rules, and is even less prescriptive than Scrum. Therefore, Kanban is often combined with other agile approaches: for instance, [LADAS-2008] and [KNIBERG-2010] show how to combine Kanban with Scrum. For the very same reason, it is just as easy to combine Kanban with other, more traditional methods.
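The pull mechanism described above can be illustrated with a minimal sketch. The class name `KanbanBoard`, its methods, and the state names and limits are hypothetical choices made for this example; they are not taken from any of the cited sources.

```python
from dataclasses import dataclass, field


@dataclass
class KanbanBoard:
    """Toy pull system: every work state has a predetermined WIP limit."""

    wip_limits: dict[str, int]  # state name -> limit, in workflow order
    columns: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start with one empty column per work state.
        self.columns = {state: [] for state in self.wip_limits}

    def has_capacity(self, state: str) -> bool:
        return len(self.columns[state]) < self.wip_limits[state]

    def add(self, item: str) -> bool:
        """Admit a new item into the first state, if its limit allows."""
        first = next(iter(self.wip_limits))
        if not self.has_capacity(first):
            return False
        self.columns[first].append(item)
        return True

    def pull(self, item: str, to_state: str) -> bool:
        """Pull an item from the preceding state into `to_state`.

        The move is refused when `to_state` is already at its WIP limit.
        """
        states = list(self.wip_limits)
        idx = states.index(to_state)
        if idx == 0:
            raise ValueError("use add() for the first state")
        from_state = states[idx - 1]
        if item not in self.columns[from_state] or not self.has_capacity(to_state):
            return False
        self.columns[from_state].remove(item)
        self.columns[to_state].append(item)
        return True


# Hypothetical board with three work states.
board = KanbanBoard({"analysis": 2, "development": 1, "testing": 1})
board.add("A")
board.add("B")
moved_a = board.pull("A", "development")  # accepted: development is empty
moved_b = board.pull("B", "development")  # refused: development is at its limit
```

Item "B" stays in analysis until "A" has been pulled on into testing. Work moves only when the downstream state has capacity, which is what makes this a pull system rather than a push system.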

Estimation is Optional in Kanban

Kanban, just like other lean methods, is very concerned with identifying and eliminating waste. This goes as far as considering estimation optional, or even waste. Many Kanban software projects forgo estimation entirely.

Originally, the idea that estimation is waste comes from Lean Manufacturing, where production companies adopted pull systems. Because production in a pull system is triggered by actual demand, sales forecasts become redundant. Both Lean Manufacturing in [WOMACK-2003] and the Theory of Constraints in [COX-2010] come to the conclusion that forecasts are wasteful.

Note, however, that there is a substantial difference between sales forecasts and software estimation. Referring to what we learned about the Thinking Processes of the Theory of Constraints, the former is outside of your Sphere of Influence, while the latter is definitely within your Span of Control. Sales forecasts are purely speculative, while software estimates are based on the first-hand experience of your team: you can resort to Expert Estimation. While it is possible to forgo estimation, it is wise to consider the negative consequences of doing so. One is the missed opportunity to engage the team in discussions that develop a shared understanding and a common vision of the challenges ahead; this is especially important in new greenfield projects (with lots of unknowns), more so than in established projects.

Risk Management is Optional in Kanban

Kanban gives no mandatory recommendations for risk management; though there are many proposals of how to extend Kanban in order to better deal with risk. Many different ideas are emerging, and it is likely Kanban will be extended with explicit risk management in the near future. The present series of blog posts can be considered as a proposal in this direction, where we seek to use the Theory of Constraints to improve risk management in Kanban.

Risk Management without Estimation is Difficult

With respect to risk management, leaving out estimation misses a good opportunity to expose risks, and prevents using Quantitative Risk Management techniques which are well known in a traditional setting. If the intention is to combine Kanban with traditional risk management (based on statistics and probabilities), then it is necessary to resort to software estimation too.

However, the notion that estimation is waste is intriguing, because a lot of risks originate from, and are almost induced by, the very gap that exists between estimates and actuals. Typically, all risks related to expectations and schedule are of this nature. We should therefore consider whether we can abandon estimates altogether. Furthermore, freeing the resources employed in the estimation process makes more resources available to prevent such common risks from materializing.

Kanban Metrics instead of Estimation

Kanban offers different software metrics. [ANDERSON-2010] mentions:

  • WIP Load
  • Lead Time
  • Cycle Time
  • Touch Time
  • Due Date Performance
  • Throughput
  • Flow Efficiency
  • Initial Quality
  • Defect Rates
  • Failure Load

With these metrics, risk management can be exercised differently too. Instead of effort estimates, Kanban relies on past performance metrics, and expects the software development team to deliver according to previously achieved Service Levels. If sufficient past data exists about the team’s delivery performance, then this is a viable alternative that can forgo estimates altogether.
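As a rough illustration of how a Service Level can be derived from past performance rather than from estimates, the sketch below computes a percentile of historical lead times and the resulting Due Date Performance. The function names, the 85% percentile, the nearest-rank method, and the sample data are all assumptions made for this example.

```python
import math


def service_level(lead_times: list[int], percentile: float = 0.85) -> int:
    """Lead time (in days) met by at least `percentile` of past items.

    Uses the nearest-rank percentile of the observed lead times.
    """
    ordered = sorted(lead_times)
    rank = max(1, math.ceil(percentile * len(ordered)))
    return ordered[rank - 1]


def due_date_performance(lead_times: list[int], target_days: int) -> float:
    """Fraction of past items delivered within `target_days`."""
    return sum(d <= target_days for d in lead_times) / len(lead_times)


# Hypothetical lead times, in days, of the last ten delivered work items.
history = [3, 5, 4, 8, 6, 5, 12, 4, 7, 5]

sla_days = service_level(history)                  # 8
on_time = due_date_performance(history, sla_days)  # 0.9
```

With this history the team could state that a typical work item is delivered within 8 days, a level that 9 of the 10 past items met. No effort estimate for any individual item is needed to make that statement.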

Kanban and Schedule, Budget and Scope Risks

Kanban, like other Agile processes, is well equipped to handle the classical schedule, budget and scope risks. These risks are well known in any software project. [ADDISON-2002] confirms that these are the most frequently recurring risks. [BOEHM-1991] ranked them among the top six risk factors in absolute terms, already two decades ago.

Kanban’s fast task flow and short feedback loops allow schedule and budget to be specified early, while scope risk is controlled because Kanban is designed to deal with Continuous Requirement Changes, as observed by [IKONEN-2011].

Kanban and Event-Driven Risk Management

Kanban applies the Lean Software Development [POPPENDIECK-2003] principle of deferring commitment until the Last Responsible Moment, which in practice enables a Real Options approach, and allows decisions to be made with more facts and less speculation.

Kanban is the most reactive of all Agile/Lean methods, and is capable of reacting quickly once problems surface, creating the conditions for Event-Driven Risk Management.

With or without explicit risk management, Kanban offers the practical advantage of reacting to negative events much faster than other approaches, and therefore decreases the need to invest time in Up-front Risk Planning.

Explicit Risk Management in Kanban

Explicit risk management is not prescribed in Kanban; but it is not ignored either. On the contrary, it is encouraged as [ANDERSON-2010] explains:

Building a strong risk management capability as part of an overall goal of improving organizational maturity will improve the predictability of a software engineering function whether it is using Kanban or not. However, Kanban systems exhibit greater predictability when risk is managed well. This builds greater trust in the system.

The predictability derives from Kanban’s ability to quickly arrive at the Steady State, and to immediately react when something goes wrong. WIP limits reduce risks and their impact because “only a small fraction of work is in progress at any given time.” In other words: there is less that can go wrong.

Classify Work Items by Type and Class of Service

Work items can be categorized according to Work Item Type, and classified with a Class of Service. WIP limits can be assigned not only to work states, but also to work item types and classes of service. Such additional WIP limits help to absorb common-cause variation. It can be argued, though, that they don’t help in identifying the causes of common-cause variation.

Risk Management through Capacity Allocation with Class of Service Policies

Capacity Allocation is not only an operational practice, but also a risk management practice. WIP limits can differ for different classes of work items. Policies governing how work must be performed can differ according to the class of service. Such policies can take into consideration the risk profile of the class of service or of the work item under consideration, and establish its priority within the system. [ANDERSON-2010] says:

Classes of service allow for a self-organizing, value- and risk-optimized approach to prioritization and re-planning.
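One way to picture capacity allocation is to split the total WIP limit of the system into a share per class of service, and to allow a pull only while the class is within its share. The sketch below does this; the class names, the percentages, and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical allocation: share of the total WIP limit per class of service.
ALLOCATION = {
    "expedite": 0.1,
    "fixed_date": 0.2,
    "standard": 0.5,
    "intangible": 0.2,
}


def class_limits(total_wip: int, allocation: dict[str, float]) -> dict[str, int]:
    """Split a total WIP limit into per-class limits (at least 1 each).

    Because of rounding, the per-class limits may not add up exactly
    to `total_wip` for every input.
    """
    return {cls: max(1, round(total_wip * share)) for cls, share in allocation.items()}


def can_pull(in_progress: list[str], cls: str, limits: dict[str, int]) -> bool:
    """True if one more item of class `cls` still fits within its allocation."""
    return Counter(in_progress)[cls] < limits[cls]


limits = class_limits(10, ALLOCATION)

# Five standard items are already in progress.
busy = ["standard"] * 5
standard_ok = can_pull(busy, "standard", limits)      # False: share is used up
fixed_date_ok = can_pull(busy, "fixed_date", limits)  # True: share is still free
```

Reserving a share for riskier classes means that an urgent or date-bound item can always be started, even when lower-risk work has filled its own allocation.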

Which classes of service to use, and which policies govern their processing, is left to the discretion of the project team. [GOVINDARAJ-2011] suggests classes of service similar to those shown in the following figure:

[Figure: Sample Kanban Classes of Service]

When classes of service are aligned with Risk Profiles, a Risk-Oriented View emerges, encouraging active thinking about risks, and promoting a Risk-Oriented Pull Policy.

[GOVINDARAJ-2011] further suggests focusing on risk mitigation by relating work either to Market Risk (acknowledging the need to learn, and making cheap prototypes to be validated by the target market), or to Technology Risk (making a “spike” to grasp the technical unknowns). This facilitates prioritizing the project backlog with a risk-oriented view, and assigning a class of service to work items.

Issue Management and Escalation Policies

When risks materialize they become problems, or issues. [ANDERSON-2010] suggests resorting to strong Issue Management and Escalation Policies. An issue is considered a first-class work item type that blocks the delivery of customer-valued work.

A strong capability for issue management and resolution is essential to maintain flow. [Issues] should be managed as special-cause variation. […] Issue management should be a strong focus of daily stand-up meetings. A strong capability for issue escalation is essential as part of a strong capability for issue management. Escalation policies should be clearly defined and documented and all team members should be aware of them. Escalation policies work better when they are agreed upon collaboratively by all departments involved in the value-stream.

Effective escalation policies can be as simple as promoting the class of service of any problematic work item. For example, a “Fixed Date” item can be promoted to an “Expedite” item if the due date performance is at risk.
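Such a promotion rule is simple enough to write down explicitly. The following sketch is one possible reading of it: the `WorkItem` type, the `escalate` function, and the definition of “at risk” (less time left before the due date than the lead time the team usually needs) are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class WorkItem:
    title: str
    class_of_service: str
    due: Optional[date] = None


def escalate(item: WorkItem, today: date, expected_lead_time: timedelta) -> WorkItem:
    """Promote a Fixed Date item to Expedite when its due date is at risk.

    Assumption: "at risk" means the time remaining until the due date is
    shorter than the lead time the team normally needs for such an item.
    """
    if (
        item.class_of_service == "fixed_date"
        and item.due is not None
        and item.due - today < expected_lead_time
    ):
        item.class_of_service = "expedite"
    return item


# Hypothetical item and dates.
report = WorkItem("Regulatory report", "fixed_date", date(2012, 10, 31))

# Thirty days left against an expected lead time of eight days: no change.
escalate(report, date(2012, 10, 1), timedelta(days=8))
early_class = report.class_of_service  # "fixed_date"

# Six days left: the item is promoted.
escalate(report, date(2012, 10, 25), timedelta(days=8))
late_class = report.class_of_service  # "expedite"
```

The expected lead time used as the threshold could itself come from the team’s historical delivery data, so that the escalation policy rests on measured performance rather than on an estimate.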

Once issues have been resolved, or as part of their resolution, it is suggested to subject them to root-cause analysis, to improve the process.

No Organizational Risk or Coordination Risks with Kanban

Kanban avoids other Organizational Risks and Coordination Risks, because it does not use fixed-length iterations, but resorts to Decoupled Cycles (of planning vs. implementation) and Variable Work Item Sizes. Prioritization cadence, development cycle, and delivery cadence are thereby decoupled from one another. Value delivery is likewise decoupled from variability in work item size and effort.

While Kanban actually strives to avoid variability in work item size (to increase predictability of performance), it does not predetermine the size, and can handle different sizes. This avoids the risk of an artificial mismatch between what the methodology prescribes and the actual units of value.

As we will see, Kanban’s ability to handle differently sized work items is one of the key features that enables better risk management, when combined with the Theory of Constraints.

Team Swarming Reduces the Impact of Risk Materialization

A common Kanban practice is that of resorting to Team Swarming to resolve issues. Consequently, issue resolution becomes much faster, and the impact of risk materialization becomes smaller. (Usually impact increases with time and with delayed or inappropriate resolution.)

Note: A consequence of team swarming is that specialized roles should not be shared between different teams. Naturally, each single team must be multi-functional. Kanban almost implies that allocated resources are fully dedicated to the work that is flowing through the system. If you have multiple teams working on multiple projects, or even on parallel development of the same project, and you share resources amongst them, then the advantages will not be realized: delays and waiting times will be introduced. At that point you might just as well use Critical Chain Project Management to coordinate resources and tasks; but that jeopardizes the very purpose of using Kanban in the first place.

Kanban is Good, but not Enough!

From all the above, it is obvious that Kanban offers one of the strongest approaches to risk management. However, it is not enough. While the event-driven nature of risk management, with escalation policies and classes of service, is excellent at reacting to Special Cause Variation, there is no consideration given to Common Cause Variation.

This is very much attuned to Deming’s teaching of intervening only when Special Cause Variation is identified, while one should not be concerned with Common Cause Variation. On the other hand, Reinertsen tells us that in a product development setting, Common Cause Variation should be taken into account. That is exactly what the Theory of Constraints will allow us to do. By using the techniques of the Theory of Constraints that were presented in the earlier posts, we will learn to identify the constraints in your process. Such constraints are due to Common Cause Variation. With the focused approach of the Theory of Constraints, we will not handle all sources of Common Cause Variation indiscriminately. Instead we will be able to focus on the one constraint (the weakest link) that is holding your process back from delivering at greater performance levels.

Once this marriage between Kanban and the Theory of Constraints is celebrated, you will have an excellent system that takes care of all emergencies (Special Cause Variation) that happen in your project and that hinder the steady state work flow. At the same time, you will have the power of the Theory of Constraints to identify the constraints, that is, the weaknesses in your process (Common Cause Variation), and then improve the process continuously, increasing throughput and decreasing cycle time.

The event-driven risk management of Kanban will keep your project healthy, and you can quickly extinguish any fire that might develop. At the same time, the systematic root cause analysis power of the Theory of Constraints will allow you to realize its process of ongoing improvement, and prevent fires from developing in the first place. Once you improve your process systematically, the benefit will be harvested not only by your current project, but also by all projects that follow. Your steady state throughput will have increased, and your steady state cycle time will be lower.

Stay tuned to see how this can be done.
