
Senior Machine Learning Backend Engineer
- Tel Aviv
- Permanent position
- Full-time
- ML Serving Architecture: Design and implement high-performance inference APIs and model serving backends using Java & Python.
- Model Lifecycle Management: Build systems for model versioning, A/B testing, canary deployments, and automated rollbacks.
- Integration Platform: Create robust APIs and SDKs that enable product teams to integrate AI capabilities seamlessly.
- Observability & Monitoring: Build comprehensive metrics, logging, and tracing systems for ML workloads.
- Cross-team Leadership: Mentor engineers & researchers, drive technical decisions, and influence platform architecture across the organization.
- 6+ years of hands-on experience in large-scale backend development, with strong emphasis on Java programming and building high-performance AI/ML inference systems.
- Strong analytical and problem-solving skills, with the ability to debug and resolve complex technical issues in AI applications serving millions of requests daily.
- Experience with cloud platforms (preferably AWS; Azure or Google Cloud also relevant) and with building scalable microservices architectures for AI model serving and data processing pipelines.
- An advantage: experience with ML frameworks (TensorFlow, PyTorch), model serving platforms (Triton, TorchServe, KServe), and building high-throughput AI-powered APIs and data processing systems.
- Excellent communication skills, both verbal and written, with the ability to articulate technical AI system design decisions clearly and collaborate effectively with ML engineers, data scientists, and DevOps teams.
- A Bachelor's degree in Computer Science, Engineering, or a related field is preferred. Experience with AI/ML systems in production environments is highly valued.
🚀 Scale That Matters
We're not just another ML team: we're powering AI that directly shapes how thousands of companies understand their customers. Our platform serves 40M+ inference requests daily, scaling toward hundreds of millions, with real business consequences riding on every millisecond of latency.
🔬 Research Meets Production
We bridge the gap between cutting-edge AI research and bulletproof production systems. Our engineers work directly with researchers to take models from paper to production, solving novel challenges in LLM serving, real-time inference, and GPU optimization that don't have Stack Overflow answers.
⚡ Technical Excellence at Speed
- Sub-100ms p95 latency while serving dozens of models simultaneously
- 99.9% uptime for business-critical AI features used by enterprise customers
- Continuous deployment of new models without service interruption
- Cost optimization that saves millions while improving performance
You won't just maintain existing systems: you'll architect the next generation of ML infrastructure. From GPU cluster design to API optimization, from model lifecycle management to observability platforms, you'll own the complete stack.
🔮 Future-Forward Challenges
As AI transforms from feature to core product differentiator, we're solving tomorrow's problems today: multi-modal AI serving, LLM optimization, real-time personalization at scale, and building the infrastructure that will power the next generation of AI-driven products.