Friday, October 2, 2009

When it Just Has to Work

At Agile 2009, Brian Shoemaker and Nancy Van Schooenderwoert gave a presentation about safety-critical agile projects. They start by describing how software can contribute to safety failures in systems such as chemical plants, power stations, aviation systems, and medical devices. Many think the solution is to do things in a controlled, sequential manner. Traditional development puts a lot of emphasis on upfront planning. In agile, what you need is a sense of direction. The product backlog holds stories, which are requirements that are still negotiable.

Next they discuss some benefits of agile. It has two great strengths: fast time to market and the ability to hit a moving target (tracer bullets). Agile teams bring the certainty of project costs forward in time. In traditional development, code becomes more brittle with time, so more effort is needed to create the same feature later. In general, velocity goes down over time as the code base grows more complicated (non-linear effort vs. results). In agile, if we finish what we commit to in every iteration (delivering small features), we can keep the effort vs. results curve almost linear. To estimate the future from past performance, a conservative approach is to use the team's lowest velocity over the past couple of iterations. Lastly, agile development removes the opportunity to hide bad news.
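The conservative forecasting rule is simple arithmetic, so here is a minimal Python sketch of it. The function names, the three-iteration window, and the story-point figures are illustrative assumptions, not something taken from the talk.

```python
import math


def conservative_velocity(velocities, window=3):
    """Return the team's lowest velocity over the last `window` iterations."""
    if not velocities:
        raise ValueError("need at least one completed iteration")
    return min(velocities[-window:])


def iterations_remaining(backlog_points, velocities, window=3):
    """Forecast iterations needed for the remaining backlog, using the
    conservative (lowest recent) velocity rather than the average."""
    velocity = conservative_velocity(velocities, window)
    if velocity <= 0:
        raise ValueError("lowest recent velocity must be positive to forecast")
    return math.ceil(backlog_points / velocity)


# Hypothetical history: story points completed in each past iteration.
history = [18, 22, 17, 20]
print(conservative_velocity(history))      # min(22, 17, 20) -> 17
print(iterations_remaining(120, history))  # ceil(120 / 17) -> 8
```

Averaging the same three iterations (about 19.7 points) would forecast 7 iterations instead of 8; taking the minimum builds a margin into the estimate.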

They then cover how risk management benefits from iteration. With iterations, we analyze risk early and often. Requirements and hazards converge when we have positive stories and negative stories prioritized in the product backlog. Hazards are often caught in context. Analysis can be done using a classical risk ranking matrix of probability (high, occasional, low, remote) and severity (major, moderate, minor). Acceptability is ranked as Unacceptable (mitigation required), As low as reasonably practical (mitigate as reasonable; the risk decision must be documented and reviewed), or Negligible (acceptable without review). Another technique is fault tree analysis, which starts from a bad thing that can happen: you begin with the effect and work out what could cause it. Another approach is failure mode, effects, and criticality analysis (FMECA), where you build up from the components and decide what can fail in each one.
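To make the risk ranking matrix concrete, here is a minimal Python sketch. The talk names the probability levels, severity levels, and acceptability categories, but not which cell maps to which category, so the cell assignments and the example hazards below are hypothetical.

```python
from enum import Enum


class Acceptability(Enum):
    UNACCEPTABLE = "mitigation required"
    ALARP = "as low as reasonably practical: mitigate, document and review"
    NEGLIGIBLE = "acceptable without review"


U, A, N = Acceptability.UNACCEPTABLE, Acceptability.ALARP, Acceptability.NEGLIGIBLE

# Hypothetical cell assignments: rows are probability, columns are severity.
# A real project defines these in its own risk management plan.
RISK_MATRIX = {
    "high":       {"major": U, "moderate": U, "minor": A},
    "occasional": {"major": U, "moderate": A, "minor": A},
    "low":        {"major": A, "moderate": A, "minor": N},
    "remote":     {"major": A, "moderate": N, "minor": N},
}


def rank_hazard(probability, severity):
    """Look up the acceptability of a hazard in the risk ranking matrix."""
    try:
        return RISK_MATRIX[probability][severity]
    except KeyError:
        raise ValueError(
            f"unknown probability/severity: {probability!r}, {severity!r}"
        ) from None


# Hypothetical negative stories (hazards), re-ranked every iteration.
hazards = [
    ("Dose entered in wrong units", "occasional", "major"),
    ("Alarm volume too low", "low", "moderate"),
    ("Display flicker", "remote", "minor"),
]
for name, probability, severity in hazards:
    print(f"{name}: {rank_hazard(probability, severity).name}")
```

Keeping the matrix as data rather than scattered `if` statements makes it easy to review and to re-run against the whole backlog whenever a hazard's probability or severity estimate changes.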

Next they describe five types of failure:

1. Direct failure: a software flaw, during normal and correct use of the system, causes or permits an incorrect dosage or energy to be delivered to the patient.

2. Permitted misuse: the software does not reject or prevent entry of data that a) is incorrect according to the user instructions and b) can result in incorrect calculations or logic, and consequently in a life-threatening or damaging therapeutic action.

3. User complacency: although the software or system clearly states that users must verify results, routine use leads to over-reliance on the software's output and failure to cross-check calculations or results.

4. User interface confusion: software instructions, prompts, input labels, or other information are frequently confusing or misleading and can result in incorrect user actions with potentially harmful or fatal outcomes.

5. Security vulnerability: an attack by malicious code causes the device to transmit incorrect information, control therapy incorrectly, or stop operating. There are no known examples in medical-device software at this time, but experience with personal computers and small cellular phones suggests this is a serious possibility.
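Permitted misuse is the failure type most directly addressed in code: reject input that the instructions do not allow instead of passing it on to a calculation. Below is a minimal Python sketch of that idea for a dose entry field; the `DoseLimits` type, the `parse_dose` function, the unit, and the numeric limits are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DoseLimits:
    """Hypothetical entry limits; real ones come from the device's requirements."""
    unit: str
    minimum: float
    maximum: float


class DoseEntryError(ValueError):
    """Raised when an entry must be rejected instead of silently accepted."""


def parse_dose(raw: str, unit: str, limits: DoseLimits) -> float:
    """Validate a user-entered dose and return it, or raise DoseEntryError."""
    if unit != limits.unit:
        raise DoseEntryError(f"expected unit {limits.unit!r}, got {unit!r}")
    try:
        value = float(raw)
    except ValueError:
        raise DoseEntryError(f"not a number: {raw!r}") from None
    # float() accepts "nan" and "inf", so reject non-finite values explicitly.
    if not math.isfinite(value):
        raise DoseEntryError(f"not a finite number: {raw!r}")
    if not limits.minimum <= value <= limits.maximum:
        raise DoseEntryError(
            f"{value} {unit} is outside {limits.minimum} to {limits.maximum} {limits.unit}"
        )
    return value


# Hypothetical infusion-rate field.
rate_limits = DoseLimits(unit="mL/h", minimum=0.1, maximum=999.0)
print(parse_dose("12.5", "mL/h", rate_limits))  # 12.5
for raw in ["1250", "abc", "nan"]:
    try:
        parse_dose(raw, "mL/h", rate_limits)
    except DoseEntryError as err:
        print("rejected:", err)
```

Raising a dedicated exception forces the caller to handle the rejection explicitly, rather than letting a bad value flow silently into the therapy calculation.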

They then describe how team autonomy in agile forces a rethink of interactions. Experienced agile teams report that each individual feels accountable for all team commitments. Members are specialized yet coordinated and cooperative, and decisions get made and carried out smoothly. This contributes enormously to safety. Safety emerges when there is trust, group renewal, and group learning. However, it is important to have a clear mission and a team decision-making mechanism that does not split the difference.

They also share some tips for managers: set clear bounds for team autonomy, manage team membership with a light touch, get stakeholder decisions rapidly, expect honest estimates (transparency), allocate team members 100%, clear blockages promptly, participate in retrospectives, and get a coach, since training alone is not enough.

Finally, they conclude that iterations lead to safer products and happier auditors. The advantages of agile include the ability to resolve incomplete or conflicting requirements, the ability to reprioritize requirements (mitigations) as the system takes shape, and many chances to identify hazards (controls are not frozen too soon). The key elements are collaborating with the customer, delivering a working system early and often, automated testing, reviewing hazards often as the system becomes better understood, and documenting effectively and flexibly.

This presentation is available on InfoQ at http://www.infoq.com/presentations/when-it-just-has-to-work