About Customer
An AI-driven tool that simplifies the creation of professional presentations by offering smart design suggestions, templates, and real-time collaboration features. It allows users to focus on content while AI manages the design.
Overview
Challenge
- Manual deployment processes slowed down feature releases.
- Difficulty in rolling out updates quickly impacted time-to-market.
- Limited ability to scale the platform efficiently during peak traffic.
- Operational overhead increased as the platform grew.
- Inconsistent release cycles affected product agility and responsiveness.
Solution
- Containerized the application and deployed it on AWS EKS for scalable orchestration.
- Built a CI/CD pipeline for safe rollouts and quick rollbacks.
- Enabled Horizontal Pod Autoscaling to handle traffic spikes dynamically.
- Implemented real-time monitoring for proactive issue detection.
- Leveraged AWS-managed services and secure network isolation to reduce overhead and boost security.
- Optimized resource allocation to improve efficiency and lower cloud costs.
Outcome
- Maintained 100% application uptime by leveraging AWS EKS.
- Achieved 60% faster deployments with zero downtime and lower operational effort.
- Reduced cloud costs by 30% through traffic-based autoscaling.
Architecture Diagram
Business Challenge
- The application was hosted on a virtual machine, which made changes, deployments, and patches slow and risky. Adding new features was challenging and required extra testing effort.
- Manual deployment slowed down release cycles and required significant effort. The process was error-prone and lacked consistency across environments. Without automation, scaling and agility were difficult to achieve.
- Critical resources were hosted on the public network, increasing the system’s vulnerability to external threats and risking business continuity and data security.
- Manual debugging and the absence of observability and automation tools made issue resolution slow and inconsistent, putting product stability and future scalability at risk.
- After these issues were evaluated and the migration was planned, the team moved to the implementation phase.
Solution
To overcome these challenges, Sentinelfox migrated the customer's critical application to AWS EKS to improve scalability and reliability.
Infrastructure Redesign
- The entire infrastructure was revamped to improve security and operational efficiency. Applications were containerized using Docker to ensure consistent behavior across all environments (dev, staging, and prod). AWS EKS was adopted for reliable and scalable container orchestration.
- We built a custom VPC and moved critical components, including the RDS database, the EKS cluster and worker nodes, and the cache, into private subnets to minimize public exposure.
- Secure access to the cluster is managed through a bastion host, enabling controlled entry without exposing the private network to the internet.
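The subnet layout described above can be sketched as a small planning script. This is only an illustration: the CIDR range, the subnet sizes, and the component names are assumptions, not values from the actual environment.

```python
import ipaddress

def plan_subnets(vpc_cidr: str, new_prefix: int, public_count: int, private_count: int):
    """Split a VPC CIDR into non-overlapping public and private subnets."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    if public_count + private_count > len(blocks):
        raise ValueError("VPC CIDR is too small for the requested subnets")
    return {
        "public": blocks[:public_count],
        "private": blocks[public_count:public_count + private_count],
    }

# Hypothetical placement: only the bastion host sits in a public subnet.
PLACEMENT = {
    "bastion": "public",
    "eks-worker-nodes": "private",
    "rds": "private",
    "cache": "private",
}

# 10.0.0.0/16 is an assumed range chosen for the example.
plan = plan_subnets("10.0.0.0/16", new_prefix=20, public_count=2, private_count=2)
for component, tier in PLACEMENT.items():
    print(f"{component:18s} -> {tier} subnet {plan[tier][0]}")
```

Keeping the placement in one table makes it easy to review which components, if any, are reachable from the internet.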
Deployment Automation
- A robust deployment pipeline was implemented using GitHub Actions to automate the build and delivery process. The pipeline pulls updated code from GitHub, builds new Docker images, and pushes them to AWS ECR.
- Application infrastructure and Kubernetes configurations are stored in a separate repository, which Argo CD continuously monitors to apply updates automatically. This GitOps approach ensures seamless, reliable deployments integrated with the existing cloud infrastructure.
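One common way to connect the two halves of such a GitOps flow is for the build pipeline to update the image tag in the configuration repository, which Argo CD then picks up. The source does not say how the tag reaches that repository, so the sketch below is an assumption; the manifest fragment, the image name, and the `bump_image_tag` helper are hypothetical.

```python
import re

def bump_image_tag(manifest: str, image_repo: str, new_tag: str) -> str:
    """Return the manifest text with the tag of `image_repo` replaced by `new_tag`."""
    pattern = re.compile(r"(image:\s*" + re.escape(image_repo) + r"):[\w.\-]+")
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"no image line found for {image_repo}")
    return updated

# Trimmed, hypothetical Deployment fragment as it might sit in the config repo.
MANIFEST = """\
kind: Deployment
metadata:
  name: web
spec:
  template:
    spec:
      containers:
        - name: web
          image: example.registry/web:1.4.0
"""

updated = bump_image_tag(MANIFEST, "example.registry/web", "1.5.0")
print(updated)
```

Committing the updated text to the configuration repository is what triggers the rollout. Reverting that commit is one way to roll back.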
Monitoring & Observability
- To maintain system health and catch issues early, APM was deployed to monitor critical application metrics such as response times, transaction volumes, and error rates. Alerts are configured to notify the team instantly via Slack.
- Grafana was implemented to monitor the EKS cluster’s performance, tracking resource usage and pod status. Real-time alerts from Grafana are also routed to Slack, so the team stays informed and can respond quickly to operational issues.
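The alerting logic described above amounts to comparing metrics against thresholds and posting a message to Slack. In practice the APM tool and Grafana do this themselves; the sketch below only illustrates the idea. The threshold values and metric names are assumptions, and the payload uses the `{"text": ...}` body that Slack incoming webhooks accept.

```python
import json

# Hypothetical thresholds; the actual alert rules are not stated in the source.
THRESHOLDS = {"error_rate": 0.05, "p95_response_ms": 800}

def breached(metrics: dict, thresholds: dict) -> list:
    """Return (name, value, limit) for every metric above its threshold."""
    return [
        (name, metrics[name], limit)
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    ]

def slack_payload(service: str, breaches: list) -> str:
    """Build the JSON body for a Slack incoming webhook."""
    lines = [f"{name} = {value} (limit {limit})" for name, value, limit in breaches]
    return json.dumps({"text": f"Alert for {service}:\n" + "\n".join(lines)})

sample = {"error_rate": 0.08, "p95_response_ms": 420}
hits = breached(sample, THRESHOLDS)
if hits:
    # A real integration would POST this body to the webhook URL.
    print(slack_payload("web", hits))
```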
Business Benefit of Migrating to AWS EKS
- The redesigned system enhanced user experience and laid the foundation for a future-ready infrastructure, ensuring consistent and seamless application performance. Migrating to AWS EKS improved reliability, scalability, and deployment efficiency.
- Automated CI/CD pipelines using GitHub Actions, ECR, and Argo CD enabled faster and more reliable deployments, reducing manual effort and allowing teams to focus on delivering business value.
- Standardized deployment processes improved operational consistency across environments (dev, staging, and prod), and autoscaling within EKS helped optimize resource usage, reducing cloud costs and increasing efficiency.
- Centralized monitoring with Grafana, AWS CloudWatch and New Relic enabled real-time visibility and faster issue detection. Proactive alerts reduced downtime, improved system reliability, and allowed the team to focus more on delivering business value.
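The traffic-based autoscaling mentioned above follows the Horizontal Pod Autoscaler scaling rule: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default) and bounded by the configured minimum and maximum. A minimal sketch of that rule follows; the replica counts and CPU figures are made up for illustration.

```python
import math

def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Apply the HPA scaling rule and clamp to the configured bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: no scaling
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))

# Illustrative numbers: 4 pods, target of 60% average CPU, bounds of 2 to 10 pods.
print(desired_replicas(4, 90, 60, 2, 10))   # traffic spike: scale out
print(desired_replicas(4, 20, 60, 2, 10))   # quiet period: scale in
print(desired_replicas(4, 63, 60, 2, 10))   # within tolerance: unchanged
```

Scaling in during quiet periods is what produces the cost savings, and the maximum bound caps spend during spikes.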
Modernizing deployment workflows to accelerate developer productivity by 50%
About Customer
DataviCloud is a no-code, AI-powered business and revenue intelligence platform designed to make data insights accessible to non-technical users. It integrates seamlessly with tools like CRM, billing, and support systems, allowing teams to gain actionable insights without writing SQL or building complex dashboards. With features like AutoML, GenAI, and real-time predictive analytics, DataviCloud helps organizations improve pipeline health, forecast revenue accurately, and drive data-informed decisions—all through a simple, conversational interface.
Overview
Challenge
- No Fallback or Rollback Mechanism in the Deployment Workflow
- Lack of Version Control for Critical Infrastructure Manifests
- Slow Tenant Registration and Onboarding Process
- Single-Region Database with Cross-Region Access Dependency
- Limited Observability and Troubleshooting Capabilities
Solution
- Adopt GitOps with ArgoCD for declarative and automated deployments.
- Optimize the tenant bootstrap pipeline to reduce onboarding time.
- Leverage Helm charts and Kubernetes operators for scalable tenant provisioning.
- Establish secure, low-latency cross-region database replication using AWS PrivateLink and NLB.
- Implement end-to-end observability with Grafana, Prometheus, Loki, and Alertmanager.
Outcome
- Achieved 50% faster tenant onboarding.
- Enabled 50% faster deployments by implementing GitOps best practices.
- Enabled scalable tenant rollout with Helm and ArgoCD.
- Enabled developers to focus more on product innovation by minimizing operational overhead.
Architecture Diagram
Business Challenge
- Tenant onboarding relied on a shell script with no fallback or fault tolerance, making the process fragile. Failures often went undetected due to a lack of visibility, leading to unreliable provisioning.
- The script and application manifests were stored on a single server. Any failure of this server could result in loss of access and recovery challenges, introducing major reliability risks.
- The master tenant registry is hosted in the Mumbai (ap-south-1) Kubernetes cluster. To ensure a seamless global experience and prevent duplicate tenant creation, this database must be available across regions. However, replicating and synchronizing it in other regions introduces significant operational and architectural complexity.
- The script dynamically generates Kubernetes manifest YAMLs at runtime. These manifest files are not stored in any version control system (e.g., Git). In the event of a Jenkins server failure, all generated manifests would be lost, making it difficult or impossible to re-provision the tenant or recover the state.
Solution
To address these challenges, Sentinelfox implemented the following:
- A GitOps-based CI/CD pipeline using ArgoCD for automated and reliable deployments.
- A cross-region database replica was created, and secure connectivity was established using AWS PrivateLink, ensuring consistent tenant management and enhanced operational resilience.
- New tenant sign-up was moved from the shell script to a Kubernetes job. A Jenkins pod running inside the cluster triggers tenant creation via a webhook, and the job includes stages that verify the required resources for every new tenant. It then commits the new tenant's configuration to the GitHub repository.
- CI/CD deployments were streamlined using GitHub, Jenkins, and ArgoCD, enabling seamless code changes and automated tenant creation. Application configurations were managed and deployed via Helm charts, providing an effective rollback mechanism.
- Clusters were provisioned in two regions, with ArgoCD deployed in Mumbai to manage deployments. A master-slave database setup was implemented on Kubernetes, with Mumbai as master and the US region as slave, to prevent duplicate tenant creation. AWS PrivateLink was used to secure cross-cluster communication.
- Monitoring for the application and clusters runs as pods inside the cluster. Grafana and Prometheus were deployed through Helm charts for observability, so no separate server needs to be maintained. Running every workload as pods in the clusters also reduces cloud cost. Real-time error alerts are pushed to Slack channels for immediate visibility and response.
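The tenant onboarding flow above can be sketched as three steps: reject duplicates against the master registry, verify the required resources, and produce the configuration that gets committed to Git. Everything named here is hypothetical: the resource list, the in-memory registry standing in for the master database, and the shape of the returned values.

```python
from dataclasses import dataclass, field

# Hypothetical list of per-tenant resources; the source does not enumerate them.
REQUIRED_RESOURCES = ("namespace", "database", "ingress")

@dataclass
class TenantRegistry:
    """In-memory stand-in for the master tenant registry."""
    tenants: set = field(default_factory=set)

    def exists(self, name: str) -> bool:
        return name in self.tenants

    def register(self, name: str) -> None:
        self.tenants.add(name)

def verify_resources(provisioned: dict) -> list:
    """Return the required resources that are missing or not ready."""
    return [r for r in REQUIRED_RESOURCES if not provisioned.get(r, False)]

def onboard(registry: TenantRegistry, name: str, provisioned: dict) -> dict:
    """Run the checks, register the tenant, and return its config values."""
    if registry.exists(name):
        raise ValueError(f"tenant {name!r} already exists")
    missing = verify_resources(provisioned)
    if missing:
        raise RuntimeError(f"tenant {name!r} is missing resources: {missing}")
    registry.register(name)
    # These values would be committed to the Git repository for ArgoCD to apply.
    return {"tenant": name, "namespace": f"tenant-{name}"}

registry = TenantRegistry()
ready = {"namespace": True, "database": True, "ingress": True}
print(onboard(registry, "acme", ready))
```

Registering the tenant only after verification passes means a failed run leaves no half-created entry in the registry, so the job can be retried safely.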
Business Benefit
- 80% automation in new tenant onboarding by replacing manual shell scripts with Kubernetes jobs triggered via Jenkins webhooks, ensuring reliable and error-free tenant creation.
- 70% faster CI/CD deployments with streamlined workflows using GitHub, Jenkins, and ArgoCD, enabling rapid and consistent application updates.
- 99.9% uptime and data consistency ensured through multi-region cluster provisioning and master-slave database replication secured via AWS PrivateLink.
- Real-time alerting enabled with 100% incident visibility via Grafana, Prometheus, and Slack notifications, reducing mean time to detect and respond by over 60%.