Senior Technical Operations Engineer
- Tel Aviv
- Permanent position
- Full-time
- Lead deep-dive root cause investigations across infrastructure, application, and network layers.
- Take ownership of critical, time-sensitive customer issues, driving resolution under pressure.
- Collaborate directly with R&D on cases requiring code-level fixes or changes.
- Lead live incidents and SEVs: manage real-time troubleshooting, coordinate with internal teams, and provide timely customer updates.
- Facilitate and document technical post-incident retrospectives, contributing to RCA reports and driving follow-up action items.
- Identify and correlate recurring issues across incidents to uncover systemic weaknesses and chronic problems.
- Drive cross-functional initiatives aimed at eliminating root causes and improving production stability.
- Develop and execute preventive maintenance tasks: system health checks, resource monitoring, configuration audits, and more.
- Work closely with the Observability team to improve alert quality, proactive issue detection, and incident response readiness.
- Implement and maintain operational runbooks, diagnostic scripts, and internal tools to streamline support workflows.
- Build automation solutions to reduce manual tasks and accelerate troubleshooting.
- Contribute to self-service tools for Tier 1 and Tier 2 teams to reduce escalation volume.
- 5+ years of deep, hands-on experience in Tier 3 Technical Support, Production Operations, or Systems Engineering roles.
- Proven track record of resolving complex, customer-facing production issues in mission-critical environments.
- Strong expertise with Linux, networking fundamentals, and distributed system troubleshooting.
- Extensive experience leading live incidents and war rooms, with a focus on real-time impact mitigation.
- Solid working knowledge of log analysis, metrics monitoring, and observability tools such as Grafana, ELK, and Datadog.
- Proficiency in scripting languages (Python, Bash, or equivalent) for automation and tooling.
- Excellent communication skills: able to translate complex technical issues for both technical teams and non-technical stakeholders.
- Strong sense of ownership, urgency, and customer empathy, especially under pressure.
Mploy