Site Reliability Engineer
- Tel Aviv
- Training
- Full-time
- Full Deployment Ownership: Manage and execute full deployments of new versions across all regions and data centers, ensuring seamless end-to-end rollout.
- Monitoring and Dashboards: Build dashboards that enable real-time incident detection, rapid troubleshooting, and proactive issue prevention.
- Alerting Systems: Develop, maintain, and optimize alerting mechanisms that serve the DevOps team and enable swift responses to any operational incidents.
- System Stability and Scalability: Continuously strengthen system stability and introduce new resilience mechanisms to support the department's ongoing innovation and rapid evolution.
- Disaster Recovery (DR): Own and manage our Disaster Recovery strategy and execution, ensuring minimal downtime and strong recovery processes.
- Close Collaboration with DevOps: Work hand-in-hand with the DevOps team to deliver robust, scalable, and high-availability infrastructure.
- 2-3 years of experience in a DevOps or SRE team.
- Proven experience in SRE, DevOps, or infrastructure engineering roles.
- Deep expertise in cloud environments, automation, and system reliability practices.
- Strong skills with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog) and with alerting systems.
- Familiarity with disaster recovery planning and execution.
- A proactive, solution-driven mindset with a passion for system stability and innovation.
- Excellent communication and collaboration skills, with a hands-on, ownership mentality.
- Guidance through the...
Mploy