Site Reliability Engineer
- Tel Aviv
- Permanent position
- Full-time
- Design, implement, and maintain monitoring and alerting systems (e.g., Prometheus, Grafana) to detect and prevent reliability issues.
- Develop tools and automation (Python, Bash, etc.) to improve infrastructure reliability and operational efficiency.
- Collaborate with R&D and Product teams to embed reliability-first principles into every stage of the development process.
- Participate in and improve incident response processes, including running blameless postmortems and implementing preventive measures.
- Enhance our Infrastructure-as-Code (IaC) and CI/CD practices to streamline deployments and reduce risk.
- Maintain and extend internal AI-driven tools, such as bots that support SRE workflows (on-call management, triaging, etc.).
- Document infrastructure, playbooks, and operational procedures to facilitate onboarding and knowledge sharing.
- Manage and maintain Azure infrastructure with an emphasis on cost-performance.
- Develop automation and CI/CD processes (argoc...
Mploy