
Site Reliability Engineer
- Herzliya
- Permanent position
- Full-time
- Collaborate with cross-functional teams to monitor and support cloud-deployed services.
- Design, implement, and maintain automated CI/CD pipelines for seamless deployment into cloud environments.
- Develop and implement observability and monitoring solutions using tools such as Datadog, ELK, Prometheus, or Grafana.
- Troubleshoot, diagnose, and resolve production issues, ensuring minimal impact on customers.
- Lead and participate in incident response, including root cause analysis and post-mortems.
- Manage security incidents and contribute to strengthening cloud security practices.
- Contribute to cloud architecture reviews and support SaaS operations processes.
- Develop scripts and automation tools to improve operational efficiency.
- Proven experience in Cloud DevOps and Site Reliability Engineering.
- Strong hands-on experience with monitoring/observability platforms (e.g., Datadog, Grafana).
- Solid knowledge of cloud security principles, incident handling, and best practices.
- Experience with cloud architecture reviews, SaaS processes, and supporting secure cloud environments.
- Proficiency in scripting (Bash, Python, etc.) and building automated workflows/pipelines.
- Ability to troubleshoot complex issues across distributed systems.
- Excellent communication skills (written and verbal), with the ability to collaborate effectively across teams.
- Hands-on experience with Azure.
- Development experience in Python for automation and tooling.
- Experience with microservices development, web services, and service integrations.
- Experience with Docker and Docker Compose for containerized environments.