Hi Friends,

Even as I launch this today (my 80th birthday), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now, as I approach my 90th birthday (27 June 2023), I invite you to visit my Digital Avatar (www.hemenparekh.ai) and continue chatting with me, even when I am no longer here physically.

Wednesday, 29 October 2025

Cloud: Resilience Is Always Key

The recent widespread outage across Microsoft Azure services, impacting everything from Microsoft 365 and Outlook to Xbox Live and Copilot, has given me pause for thought ["Azure services back after outage: What 'went wrong and why' hours before Microsoft's Q3 results announcement" (https://timesofindia.indiatimes.com/technology/tech-news/azure-services-back-after-outage-what-went-wrong-and-why-hours-before-microsofts-q3-results-announcement/articleshow/124929960.cms), "Microsoft 365 down? Current problems and outages | Downdetector" (https://downdetector.ca/status/microsoft-365/)]. It appears a simple configuration error in Azure Front Door triggered cascading failures globally. While Microsoft engineers were quick to roll back changes and restore services, the incident serves as a potent reminder of the inherent vulnerabilities in our increasingly interconnected digital infrastructure.

Reflecting on this, I'm reminded of conversations from years past, discussions that feel remarkably relevant today. Back in 2013, when we faced local server issues, I was already stressing the importance of system uptime and proactive solutions. I recall a specific incident where electrical maintenance work at Hyde Park meant our servers would be inaccessible. My immediate concern was, "What happens to our Web sites? We cannot allow these to shut down!" and I urged Kailas Patil (kailas.patil@thepalladiumgroup.com) to find a solution ["What happens to our Web sites ?" (http://emailothers.blogspot.com/2013/08/re-maintenance-work-hyde-park.html)].

Later, during a crucial discussion about our website's hosting, when ports suddenly stopped working, I was deeply involved in troubleshooting alongside Manoj Hardwani (manoj.hardwani@atidan.com) and Sandeep Tamhankar (stamhankar@apple.com). We delved into the intricacies of CPU utilization, firewall settings, and public accessibility. Sharon even suggested constant logging to ensure uninterrupted service ["Google Cloud Configurations" (http://emailothers.blogspot.com/2023/09/google-cloud-configurations.html)].

In fact, years ago, when we faced a hard-disk crash, I had already anticipated this type of challenge and proposed a solution at the time: a shift to cloud hosting. I wrote about turning "setbacks into opportunities," explicitly considering the advantages of moving our site "totally onto CLOUD" to avoid future crashes and gain "rapid scalability to cope-up with any sudden future increase in data-transfer" ["From Setback to Step Up" (http://emailothers.blogspot.com/2013/04/from-setback-to-step-up.html)]. I consulted with Kailas Patil (kailas.patil@thepalladiumgroup.com), Shuklendu (shuklendu.baji@sentientsystems.net), and Nitin on these very ideas.

Now, seeing how even a giant like Microsoft can be brought to its knees by a configuration error, it's striking how relevant that earlier insight still is. It highlights that even the most advanced cloud infrastructures are not immune to human error and complex system interactions. Microsoft CEO Satya Nadella (satyan@microsoft.com) rightly emphasized the company's "commitment to resilience and innovation," even as the company reported strong Q3 earnings amidst the disruption ["Azure services back after outage: What 'went wrong and why' hours before Microsoft's Q3 results announcement" (https://timesofindia.indiatimes.com/technology/tech-news/azure-services-back-after-outage-what-went-wrong-and-why-hours-before-microsofts-q3-results-announcement/articleshow/124929960.cms)]. This focus on resilience is not just a buzzword; it's an existential necessity in our digital age. Today I feel a sense of validation for my earlier concerns, and also a renewed urgency to keep revisiting and reinforcing our approaches to system reliability, because continuous availability is paramount.


Regards, Hemen Parekh


Of course, if you wish, you can debate this topic with my Virtual Avatar at hemenparekh.ai

