Imagine it is Friday night and, after a long week of hard work, you are set to enjoy some much-deserved time off. You have planned a night with your significant other to finally watch that movie you have heard so much about. Your kids are at their grandparents' and you order your favourite takeout.
But then disaster strikes. The food ordering system has an outage, the payment provider is down, and when you try to start the movie on your chosen streaming provider you are faced with timeouts and errors.
All of these issues can be worked around or are probably just temporary, but they do severely impact your experience as a customer. They might lead you to order food from another vendor next time or to choose a different streaming platform for your movies and series.
It is therefore imperative that organizations spend time and resources on preventing outages as much as possible and, when outages do happen, make sure that either the customer does not notice at all or the problem is solved quickly.
An outage can occur for many reasons, so we need several strategies to reduce the risk of outages and to recover quickly when they happen.
So how can we do it? Just as humans have an immune system, we can give our systems one that fights off threats by applying healthy practices. For humans that means healthy food, regular exercise, enough sleep and rest and, when needed, a visit to a doctor or specialist.
The same idea applies to digital systems. The six pillars of Digital Immune Systems do just that.
First and foremost, getting software from commit to production takes several steps, and at each of them both accidents and malicious intent can have a profound effect on the reliability of the released software. Cases such as the SolarWinds attack show how much good supply chain security matters for the reliability and customer experience of software systems.
Chaos Engineering refers to the practice of deliberately introducing failures into systems and observing how the systems respond to them. These chaos experiments can be applied at different levels and in different environments, depending on the maturity of the organization. There is much to say about Chaos Engineering, but in a nutshell the practice starts with a hypothesis and an experiment designed to introduce failures such as added latency and intermittent errors. This allows engineers to verify the reliability of their systems, up to and including production.
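To make the hypothesis-and-experiment idea a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a real chaos tool: `with_chaos`, `run_experiment`, `InjectedFault` and `place_order` are hypothetical names invented for this example, and the failure rate and latency values are arbitrary assumptions.

```python
import random
import time


class InjectedFault(Exception):
    """Raised by the chaos wrapper to simulate an intermittent failure."""


def with_chaos(func, failure_rate, max_latency_s, rng, sleep=time.sleep):
    """Wrap func so that each call is randomly delayed and sometimes fails."""
    def wrapper(*args, **kwargs):
        sleep(rng.uniform(0, max_latency_s))  # injected latency
        if rng.random() < failure_rate:       # injected intermittent failure
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)
    return wrapper


def run_experiment(call, attempts):
    """Call the target repeatedly and return the observed success ratio."""
    successes = 0
    for _ in range(attempts):
        try:
            call()
            successes += 1
        except InjectedFault:
            pass
    return successes / attempts


def place_order():
    # Hypothetical stand-in for a real dependency such as an ordering API.
    return "order accepted"


# Hypothesis (assumed for this sketch): with 20% injected failures,
# at least 70% of the calls still succeed.
rng = random.Random(42)
chaotic_order = with_chaos(place_order, failure_rate=0.2, max_latency_s=0.002, rng=rng)
ratio = run_experiment(chaotic_order, attempts=100)
print(f"success ratio under injected faults: {ratio:.2f}")
```

In a real setup the faults would typically be injected at the network or infrastructure level rather than in application code, and the hypothesis would be checked against the system's actual service level targets.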
By using AI to augment traditional testing, we can increase confidence in our systems and, for example, generate scenarios that humans would struggle to come up with. As more and more automation applies AI to autonomous testing, we save time, increase confidence and reduce risk.
We also need end-to-end observability over our systems, not only in production but throughout the entire SDLC. The insight gained from metrics, logs and traces allows engineers to see issues before they hit production. Once the system is in production, it allows engineers to detect issues and take measures before users are impacted, thus improving the customer experience.
With the increased complexity of software systems, new problems pop up constantly. Depending on the scale of the system, this can require many engineers to care for it. With Auto Remediation we can handle many different cases, both inside and between components. This reduces the demand on engineers and allows systems to repair themselves continuously. Examples include retrying failed requests, autoscaling based on metrics and running complex workflows in response to specific events.
And since Auto Remediation runs 24×7, it will also help ensure those Friday nights are problem-free.
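The simplest of the Auto Remediation concepts mentioned above, retrying failed requests, can be sketched in a few lines of Python. This is a minimal illustration under assumptions: `retry` and `flaky_payment` are hypothetical names, and the attempt count and delays are arbitrary. Production code would usually retry only on errors known to be transient and add jitter to the delays.

```python
import time


def retry(operation, max_attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Run operation, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Wait 0.1s, 0.2s, 0.4s, ... before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))


# Hypothetical dependency that fails twice before recovering.
calls = {"count": 0}


def flaky_payment():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("payment provider unavailable")
    return "paid"


print(retry(flaky_payment))  # prints "paid" after two retried failures
```

The caller never sees the two transient failures, which is the point: the customer does not notice the problem at all.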
Automation and AI are important for increasing efficiency and reducing the workload of experienced engineers. However, the human side should not be underestimated. By applying good engineering practices and balancing delivery velocity with reliability, teams keep the customer experience in constant focus. This reduces the burden of remediation and of dealing with technical debt.
None of the concepts in the six pillars is really new. However, by applying them together we get more than the sum of the parts. In the end, giving your customers the best possible experience is the purpose of IT solutions, and that is where Digital Immune Systems really shine.
Because all of us deserve our Friday night movie with our favourite food, without frustration.
Lead Software Architect | Netherlands