IT professionals in general, and Quality Assurance people in particular, spend a great deal of time ensuring that new and changed systems do what they are expected to do, and only what they are expected to do, before they are promoted into the Live environment. The commercial driver for getting new and changed products to market, however, often means that Quality Assurance processes are under massive pressure to approve the implementation of change, and to do so faster, cheaper and more efficiently every time.
Test environments, regression testing and the release management processes needed to control code progression into Live can therefore all feel like expensive overheads. It’s deceptively attractive to fall into the trap of thinking “Couldn’t we just implement in Live – if it doesn’t work, well, we can always back out the change”.
Live environments should always be separated from test environments. In practice, in regulated environments, this is a requirement for compliance. In mature organisations, it is usual to operate with at least three domains: Production/Live, Testing/QA and Development.
It is also worth noting that mature organisations will not use actual production data outside the production domain. If a copy of production data does need to be used elsewhere (which is rarely appropriate), the data must be anonymised first. Anonymisation introduces its own challenge: the data's referential integrity must be maintained, especially when the data is drawn from multiple sources.
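One common way to square anonymisation with referential integrity is deterministic pseudonymisation: every occurrence of an identifier is replaced with the same irreversible token, so joins between extracts from different Live systems still line up. The sketch below illustrates the idea with a keyed HMAC; the field names, key handling and two-source example are hypothetical, not drawn from any particular organisation's tooling.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice this would be generated per
# extract and never stored alongside the anonymised data.
SECRET_KEY = b"example-key-held-outside-the-test-domain"

def pseudonymise(value: str) -> str:
    """Replace an identifier with a stable, irreversible token.

    The same input always yields the same token, so references
    between datasets survive anonymisation.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Two extracts from different source systems referencing the same customer:
crm_rows = [{"customer_id": "CUST-1001", "name": "Jane Doe"}]
billing_rows = [{"customer_id": "CUST-1001", "amount": 42.50}]

anon_crm = [
    {"customer_id": pseudonymise(r["customer_id"]), "name": "REDACTED"}
    for r in crm_rows
]
anon_billing = [
    {"customer_id": pseudonymise(r["customer_id"]), "amount": r["amount"]}
    for r in billing_rows
]

# Referential integrity survives: the pseudonyms still match across sources.
assert anon_crm[0]["customer_id"] == anon_billing[0]["customer_id"]
```

Because the mapping is keyed rather than a plain hash, the tokens cannot be reversed by simply hashing a dictionary of known identifiers, yet the cross-source links the test team relies on are preserved.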
Even when organisations achieve that level of environment segregation and test data control, they still operate in a world of devolved control of systems and multi-team delivery into Live.
Some companies are not as aware of the actual content of their live environment as they need to be. Take, for example, the story of Knight Capital, once known as a world-leading market maker in equities. In August 2012, poor code control meant that one of Knight's eight servers was running code that should have been removed from Live. An unrelated software upgrade caused that old code to be executed, resulting in major disruption to the prices of 148 companies listed on the New York Stock Exchange, a pre-tax loss of $440 million for Knight Capital, and ultimately the brink of bankruptcy for the firm.
In addition to the potential consequences of poor code control, we live in a world in which, increasingly, key financial decisions are being made by systems that operate at speeds far beyond a human's ability to recognise errors and respond before they cause serious damage. Recognising this, the Markets in Financial Instruments Directive II (MiFID II) requires that companies explicitly recognise the risks of this type of "algorithmic" trading. The Financial Conduct Authority (FCA) has made its views on the consequences of MiFID II clear: because algorithmic trading tends to increase the speed, volume and complexity of the system-to-system messaging involved in the trading process, testing of these systems must be much more tightly controlled prior to Live than has been accepted until now. Any trading organisation that fails to implement these requirements does so at considerable risk to its business.
The pattern of well-controlled, segregated test environments as a route to success has been demonstrated time and again in mature, stable organisations. The consequences of failing to apply tight control of systems assurance in high-volume, high-speed algorithmic trading, on the other hand, were demonstrated starkly by Knight Capital, and under MiFID II those consequences now carry the added weight of regulatory enforcement. It remains to be seen how many trading operations and commercial banks have recognised the risks implicit in these messages and upgraded the levels of control applied to their assurance processes as a result.