Principal Machine Learning Engineer - GenAI
- Holon
- Permanent position
- Full-time
Responsibilities:
- Architect and lead scalable benchmarking pipelines for LLM performance measurement (latency, throughput, accuracy, cost) across multiple serving backends and hardware types (a minimal probe is sketched after this list).
- Build optimization & profiling tools for inference performance, including GPU utilization, memory footprint, CUDA kernel efficiency, and parallelism strategies.
- Develop Validation-as-a-Service platforms with APIs and self-service tools for standardized, on-demand model evaluation.
- Integrate and optimize model serving frameworks (vLLM, TGI, LMDeploy, Triton) and API-based serving (OpenAI, Mistral, Anthropic) in production environments.
- Establish dataset & scenario management workflows for reproducible, comprehensive evaluation coverage.
- Implement observability & diagnostics systems (Prometheus, Grafana) for real-time benchmark and inference performance tracking.
- Deploy and manage workloads in Kubernetes (Helm, Argo CD, Argo Workflows) across AWS/GCP GPU clusters.
- Lead performance engineering efforts to identify bottlenecks, apply optimizations, and document best practices.
- Stay ahead of the GenAI ecosystem by tracking emerging frameworks, benchmarks, and optimization techniques, and integrating them into the platform.
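To give a flavor of the benchmarking bullet above, here is a minimal sketch of a latency/throughput probe against an OpenAI-compatible completions endpoint (the interface vLLM, and TGI via its shim, can expose). The base URL, model id, and prompts are illustrative placeholders, not details from this posting:

```python
"""Minimal sketch of a single-backend LLM latency/throughput probe.

Assumes an OpenAI-compatible /v1/completions endpoint (e.g. vLLM's
built-in server). URL, model id, and prompts are placeholders.
"""
import statistics
import time

import requests  # third-party: pip install requests

BASE_URL = "http://localhost:8000/v1/completions"  # assumed vLLM-style endpoint
MODEL = "my-model"                                 # placeholder model id


def time_request(prompt: str, max_tokens: int = 128) -> tuple[float, int]:
    """Send one completion request; return (wall-clock seconds, completion tokens)."""
    start = time.perf_counter()
    resp = requests.post(
        BASE_URL,
        json={"model": MODEL, "prompt": prompt, "max_tokens": max_tokens},
        timeout=120,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    # OpenAI-style responses report token usage alongside the completion.
    tokens = resp.json()["usage"]["completion_tokens"]
    return elapsed, tokens


def run_benchmark(prompts: list[str]) -> None:
    """Run all prompts sequentially and report median latency and token throughput."""
    latencies, token_counts = [], []
    for p in prompts:
        elapsed, tokens = time_request(p)
        latencies.append(elapsed)
        token_counts.append(tokens)
    print(f"p50 latency: {statistics.median(latencies):.3f}s")
    print(f"throughput:  {sum(token_counts) / sum(latencies):.1f} tokens/s")


if __name__ == "__main__":
    run_benchmark(["Explain KV caching in one sentence."] * 10)
```

A production pipeline would add warm-up requests, concurrency sweeps, and per-token streaming latency; this shows only the core measurement loop.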
Requirements:
- Advanced Python for ML/GenAI pipelines, backend development, and data processing.
- Kubernetes (Deployments, Services, Ingress) with Helm for large-scale distributed workloads.
- Deep expertise in LLM serving frameworks (vLLM, TGI, LMDeploy, Triton) and API-based serving (OpenAI, Mistral, Anthropic).
- GPU optimization mastery: CUDA, mixed precision, tensor/sequence parallelism, memory optimization, kernel-level profiling.
- Design and operation of benchmarking/evaluation pipelines with metrics for accuracy, latency, throughput, cost, and robustness.
- Experience with Hugging Face Hub for model/dataset management and integration.
- Familiarity with GenAI tools: OpenAI SDK, LangChain, LlamaIndex, Cursor, Copilot.
- Argo CD and Argo Workflows for reproducible ML orchestration.
- CI/CD (GitHub Actions, Jenkins) for ML workflows.
- Cloud expertise (AWS/GCP) for provisioning, running, and optimizing GPU workloads (A100, H100, etc.).
- Monitoring and observability (Prometheus, Grafana) and database experience (PostgreSQL, SQLAlchemy).
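As a flavor of the observability requirement above, here is a minimal sketch of exporting inference latency to Prometheus with the official prometheus_client library. The metric name, label, and scrape port are illustrative choices, not details from this posting:

```python
"""Minimal sketch of exporting LLM benchmark metrics to Prometheus.

Uses the official prometheus_client library; metric and label names
and the scrape port are illustrative placeholders.
"""
import random
import time

from prometheus_client import Histogram, start_http_server

# Histogram of per-request inference latency, labeled by serving backend.
INFERENCE_LATENCY = Histogram(
    "llm_inference_latency_seconds",
    "End-to-end LLM request latency",
    ["backend"],
)


def observe_request(backend: str) -> None:
    """Simulate one inference call and record its latency in the histogram."""
    with INFERENCE_LATENCY.labels(backend=backend).time():
        time.sleep(random.uniform(0.05, 0.2))  # stand-in for a real model call


if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes http://localhost:9100/metrics
    while True:
        observe_request("vllm")
```

Grafana can then chart latency quantiles with PromQL's histogram_quantile over the scraped llm_inference_latency_seconds_bucket series.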
Nice to have:
- Distributed training across multi-node, multi-GPU environments.
- Advanced model evaluation: bias/fairness testing, robustness analysis, domain-specific benchmarks.
- Experience with OpenShift/RHOAI for enterprise AI workloads.
- Benchmarking frameworks: GuideLLM, HELM (Holistic Evaluation of Language Models), EleutherAI's LM Evaluation Harness.
- Security scanning for ML artifacts and containers (Trivy, Grype).
- Design of tradeoff-analysis tools for model selection and deployment.
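For the tradeoff-analysis bullet directly above, a toy sketch of one common approach: keeping only the Pareto-optimal candidates across accuracy, latency, and cost. The candidate models and their numbers are fabricated for illustration:

```python
"""Toy sketch of cost/latency/accuracy tradeoff analysis for model
selection: compute the Pareto front over three axes. All candidate
names and numbers below are fabricated placeholders.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float            # higher is better
    p50_latency_s: float       # lower is better
    cost_per_1k_tokens: float  # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on every axis and strictly better on one."""
    at_least_as_good = (
        a.accuracy >= b.accuracy
        and a.p50_latency_s <= b.p50_latency_s
        and a.cost_per_1k_tokens <= b.cost_per_1k_tokens
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.p50_latency_s < b.p50_latency_s
        or a.cost_per_1k_tokens < b.cost_per_1k_tokens
    )
    return at_least_as_good and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


if __name__ == "__main__":
    models = [
        Candidate("small-7b", accuracy=0.71, p50_latency_s=0.4, cost_per_1k_tokens=0.02),
        Candidate("mid-13b", accuracy=0.78, p50_latency_s=0.9, cost_per_1k_tokens=0.05),
        Candidate("slow-13b", accuracy=0.77, p50_latency_s=1.4, cost_per_1k_tokens=0.06),
    ]
    for c in pareto_front(models):
        print(c.name)  # prints small-7b and mid-13b; slow-13b is dominated
```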