Site Reliability Engineer
- Tel Aviv
- Permanent position
- Full-time
- Ensure reliability & scalability of our production environment across multiple cloud providers.
- Define and implement SRE best practices, fostering a culture of ownership, continuous improvement, and automation.
- Automate everything, from infrastructure deployment to self-healing mechanisms that eliminate manual intervention.
- Design and improve observability solutions (monitoring, logging, tracing) to enable faster detection and resolution of issues.
- Optimize alerting strategies to ensure actionable, high-quality alerts while minimizing noise and fatigue.
- Improve system resilience, driving chaos engineering, failover strategies, and automatic recovery processes.
- Enhance incident response processes, including on-call strategies, root cause analysis, and post-mortems to drive long-term stability.
- Collaborate with development teams to build reliable, scalable, and efficient architectures, ensuring seamless deployment and rollback processes.
- Promote a culture of reliability, educating teams on best practices, service ownership, and production-readiness.
- 3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
- Strong expertise in Kubernetes and container orchestration in production.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Proven experience with monitoring & observability tools (Prometheus, ELK, Grafana, Coralogix, etc.).
- Strong scripting/programming skills (Python, Go, Bash, or similar).
- Experience with Infrastructure as Code (IaC): Terraform, Helm, or similar tools.
- Track record of improving system reliability, scalability, and performance.
- Experience designing and implementing self-healing mechanisms to minimize human intervention.
- Ability to foster a strong reliability culture across engineering teams, leading by example.
- Excellent problem-solving skills, with a proactive and ownership-driven mindset.
- Manage and maintain Azure infrastructure with an emphasis on cost-performance.
- Develop automations and CI/CD processes (argoc...
Mploy