
Our daily lives are built on seamless digital experiences, from checking in for a flight to running global logistics.
The reliability of our applications has become a fundamental business dependency.
The recent high-profile outages affecting major brands serve as a stark reminder that a single incident can cause lasting damage to revenue, brand reputation, and consumer trust.
The true cost of IT outages
Recent research has revealed the enormous cost of outages and downtime for organizations across the EMEA region, finding that high-business-impact outages carry an annual median cost of $102 million (£79.9 million) for EMEA organizations, and $38 million (£28.3 million) for organizations based in the UK & Ireland specifically.
The median cost per hour of high-impact outages in EMEA is $2 million (£1.49 million), the equivalent of $33,333 or £24,835 for every minute of downtime.
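For scale, the per-minute figure is simply the hourly median divided by 60. A quick back-of-the-envelope check (using only the figures quoted above) also shows the sterling numbers are consistent once rounding is accounted for:

```python
# Back-of-the-envelope check on the per-minute figures (numbers from the article).
hourly_usd = 2_000_000
print(f"${hourly_usd / 60:,.0f} per minute")  # -> $33,333 per minute

# Working backwards, £24,835 per minute implies roughly £1.4901M per hour,
# which the article rounds to £1.49 million.
print(f"£{24_835 * 60:,} per hour")           # -> £1,490,100 per hour
```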
Outages occur more frequently than many would guess; the newspaper headlines we read divulge only a fraction of the outages that actually occur.
Thirty-seven percent of respondents to the report say high-business-impact outages occur at least weekly, putting their brand’s reputation at risk and degrading the customer experience.
It’s time for a new approach
The traditional approach to handling outages is broken. For too long, organizations have adopted a reactive mindset — a culture of firefighting where problems are addressed only after they have impacted customers.
This is an unsustainable strategy. It’s a resource drain that costs millions and diverts valuable engineering talent from innovation to crisis management.
The same research shows that more than a quarter (26 percent) of engineering teams’ time is spent addressing disruptions. And too many organizations (41 percent) are learning about software and system disruptions through outdated means such as manual checks, complaints from internal stakeholders or, worse, from customers themselves.
The data shows that deploying observability tools has a substantial positive impact on organizations’ ability to detect and resolve issues before they lead to outages and poor customer experiences.
Sixty-three percent of respondents said mean time to detection (MTTD), and 64 percent said mean time to resolution (MTTR), had improved measurably since adopting observability solutions.
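To make these metrics concrete, here is a minimal sketch, with hypothetical field names rather than any particular vendor's data model, of how MTTD and MTTR are typically computed from incident records:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Incident:
    started: datetime    # when the fault actually began
    detected: datetime   # when monitoring (or a customer) flagged it
    resolved: datetime   # when service was fully restored

def mttd_minutes(incidents: list[Incident]) -> float:
    """Mean time to detection: average gap between fault start and detection."""
    return mean((i.detected - i.started).total_seconds() / 60 for i in incidents)

def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean time to resolution: average gap between detection and resolution."""
    return mean((i.resolved - i.detected).total_seconds() / 60 for i in incidents)
```

Conventions vary (some teams measure MTTR from fault start rather than from detection); what matters is measuring consistently, so that improvements like those respondents report are actually visible.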
To truly mitigate the threat of an outage, we must shift our focus from reacting to preventing. This requires a new mindset where we plan for the worst-case scenario from the very beginning of the design and build phase, long before an application ever goes into production.
Ultimately, this is about a cultural shift. High standards and engineering excellence must be ingrained in everything we build, starting even before the first line of code is written.
Observability, in this new world, must be treated not as a tool for reactive monitoring, but as an integral part of the software development lifecycle (SDLC) from the very beginning.
Building a new engineering mindset will only work if there is buy-in from key stakeholders across the organization.
A lack of a cohesive IT strategy often stems from decentralized decision-making, with no clear governance or policy around the tooling and software that different departments use. Organizations are beginning to understand the advantages of consolidating their tools.
Although the average number of tools used per EMEA organization is four, our data shows that 10 percent of EMEA organizations have consolidated to a single observability tool, up from 2 percent in 2022, and 44 percent plan to consolidate tools within the next year.
The benefits of consolidating observability tools are vast: from boosting productivity and efficiency and strengthening security and resilience, to generating better data to power decision-making.
Balancing speed with stability
It may be contentious to say so, but in our race for speed, our devotion to Agile methodology may have gone too far.
Our relentless pursuit of speed, often championed by Agile methodologies, has at times inadvertently sidelined the rigorous engineering practices that once ensured stability.
While Agile is a powerful framework for rapid development and adaptability, a singular focus on feature velocity can lead to a neglect of thorough architectural planning, formalized testing, and comprehensive documentation.
To achieve a more robust and sustainable approach, it’s beneficial to revisit principles from methodologies like Six Sigma.
Six Sigma, originating from manufacturing and rooted in statistical process control, provides a structured, data-driven methodology for eliminating defects and improving processes. Its DMAIC framework emphasizes five steps (a worked sketch of the Measure step follows the list):
- Defining the problem or defect
- Measuring the extent of the problem with data
- Analyzing the root causes of the problem
- Improving the process with solutions that address the root causes
- Establishing controls to sustain the improvements
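To ground the Measure step, here is a minimal sketch, with hypothetical deployment figures, of Six Sigma's standard defect metrics: defects per million opportunities (DPMO) and the corresponding sigma level, using the conventional 1.5-sigma long-term shift:

```python
from statistics import NormalDist

def dpmo(defects: int, units: int, opportunities_per_unit: int = 1) -> float:
    """Defects per million opportunities, Six Sigma's core Measure-phase metric."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Convert a DPMO figure to a sigma level, including the conventional
    1.5-sigma shift that accounts for long-term process drift."""
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + shift

# Hypothetical example: 7 failed deployments out of 2,000 in a quarter.
rate = dpmo(defects=7, units=2_000)
print(f"DPMO: {rate:.0f}, sigma level: {sigma_level(rate):.2f}")  # ~3500, ~4.20
```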
Applying these Six Sigma principles to software engineering, particularly with the aid of observability, can significantly enhance stability. Observability tools provide the critical data needed for the “Measure” and “Analyze” phases of Six Sigma. Engineers can leverage this data to:
- Proactively identify and prevent issues: Instead of reacting to outages, observability allows teams to detect anomalies and potential problems early in the development lifecycle, aligning with Six Sigma’s emphasis on defect prevention (see the sketch after this list).
- Improve root cause analysis: Detailed telemetry from observability tools helps pinpoint the exact cause of issues, enabling more effective “Improve” actions.
- Drive continuous improvement: By continuously monitoring system health and performance, engineers can use observability data to inform ongoing process adjustments, fostering a culture of continuous improvement and quality control.
- Foster a data-driven culture: When engineers are empowered with comprehensive observability data, they can make informed decisions, understand the impact of their changes, and take ownership of system reliability, embedding engineering excellence into every stage of the software development lifecycle.
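As an illustration of that first point, here is a minimal sketch, independent of any particular observability product, of flagging latency anomalies against a rolling baseline:

```python
from collections import deque
from statistics import mean, stdev

def latency_anomalies(samples, window: int = 60, threshold: float = 3.0):
    """Yield (index, value) pairs where a latency sample deviates from the
    rolling baseline by more than `threshold` standard deviations."""
    history = deque(maxlen=window)
    for i, value in enumerate(samples):
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value  # candidate anomaly: investigate before it becomes an outage
        history.append(value)

# e.g. alerts = list(latency_anomalies(response_times_ms))
```

In practice this logic lives in the alerting layer of an observability platform; the point is that detection thresholds are derived from measured baselines rather than guesswork.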
Building the resilient digital infrastructure of the future
This is not about slowing down innovation, but about building a solid foundation that can withstand the complexity of modern systems. By re-embedding rigorous engineering practices, making observability a core part of the development process, and fostering a culture of engineering excellence, organizations can build a resilient digital infrastructure that protects against the inevitable, securing their systems, their reputation, and their enterprises for the long term.




