Senior Machine Learning Engineer - GenAI
- Holon
- Permanent position
- Full-time
- Benchmarking Platform Development: Design and implement scalable benchmarking pipelines for LLM performance measurement (latency, throughput, accuracy, cost) across multiple serving backends and hardware types (a minimal sketch follows this list).
- Optimization Tooling: Build utilities and automation to profile, debug, and optimize inference performance (GPU utilization, memory footprint, CUDA kernels, parallelism strategies).
- Validation-as-a-Service: Develop APIs and self-service platforms for model evaluation, enabling teams to run standardized benchmarks on demand.
- Serving Integration: Integrate and operate high-performance serving frameworks (vLLM, TGI, LMDeploy, Triton) with cloud-native deployment patterns.
- Dataset & Scenario Management: Create reproducible workflows for dataset preparation, augmentation, and scenario-based testing to ensure robust evaluation coverage.
- Observability & Diagnostics: Implement real-time monitoring, logging, and metrics dashboards (Prometheus, Grafana) for benchmark and inference performance.
- Cloud-Native Orchestration: Deploy and manage benchmarking workloads on Kubernetes (Helm, Argo CD, Argo Workflows) across AWS/GCP GPU clusters.
- Integration with GenAI Tooling: Leverage Hugging Face Hub, OpenAI SDK, LangChain, LlamaIndex, and internal frameworks for streamlined evaluation workflows.
- Performance Engineering: Identify bottlenecks, apply targeted optimizations, and document best practices for inference scalability.
- Ecosystem Leadership: Track emerging frameworks, benchmarks, and optimization techniques to continuously improve the evaluation platform.
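To ground the benchmarking responsibilities above, here is a minimal sketch of a latency/throughput probe against an OpenAI-compatible endpoint such as the one vLLM or TGI exposes. The base URL, API key, and model id are placeholder assumptions, not values from this posting.

```python
# Minimal latency/throughput probe for an OpenAI-compatible serving
# endpoint (e.g. vLLM or TGI). Endpoint, key, and model id are placeholders.
import time
from statistics import mean, quantiles

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder

PROMPT = "Summarize the benefits of paged attention in two sentences."
N_REQUESTS = 20

latencies, completion_tokens = [], 0
start = time.perf_counter()
for _ in range(N_REQUESTS):
    t0 = time.perf_counter()
    resp = client.chat.completions.create(
        model="my-model",  # placeholder model id
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=128,
    )
    latencies.append(time.perf_counter() - t0)
    completion_tokens += resp.usage.completion_tokens
elapsed = time.perf_counter() - start

qs = quantiles(latencies, n=100)  # 99 percentile cut points
print(f"mean latency: {mean(latencies):.3f}s  p50: {qs[49]:.3f}s  p95: {qs[94]:.3f}s")
print(f"throughput: {completion_tokens / elapsed:.1f} tokens/s over {N_REQUESTS} requests")
```

A production pipeline would add warmup requests, concurrency sweeps, and time-to-first-token measurement via streaming, but the measurement core looks like this.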
- Advanced Python for backend development, data processing, and ML/GenAI pipelines.
- Kubernetes (Deployments, Services, Ingress) and Helm for large-scale distributed training and inference workloads.
- LLM training, fine-tuning, and optimization (PyTorch, DeepSpeed, HF Transformers, LoRA/PEFT); a minimal LoRA setup is sketched after this list.
- GPU optimization expertise: CUDA, mixed precision, tensor/sequence parallelism, memory management, and throughput tuning.
- High-performance model serving with vLLM, TGI, LMDeploy, Triton, and API-based serving (OpenAI, Mistral, Anthropic).
- Benchmarking and evaluation pipelines: dataset preparation, accuracy/latency/throughput measurement, and cost-performance tradeoffs.
- Multi-model, multi-engine comparative testing for optimal deployment decisions.
- Hugging Face Hub for model/dataset management, including private hosting and pipeline integration.
- GenAI development tools: OpenAI SDK, LangChain, LlamaIndex, Cursor, Copilot.
- Argo CD & Argo Workflows for reproducible, automated ML pipelines.
- CI/CD (GitHub Actions, Jenkins) for ML lifecycle automation.
- Cloud (AWS/GCP) for provisioning, running, and optimizing GPU workloads (A100, H100, etc.).
- Monitoring & observability (Prometheus, Grafana); PostgreSQL, with SQLAlchemy for database access.
- Distributed training across multi-node, multi-GPU clusters.
- Advanced model evaluation: bias/fairness, robustness, and domain-specific benchmarks.
- Experience with OpenShift / Red Hat OpenShift AI (RHOAI) for enterprise AI deployments.
- Benchmarking frameworks: GuideLLM, HELM, LM Evaluation Harness.
- Security scanning for artifacts/containers (Trivy, Grype).
- Tradeoff-analysis tooling for model selection and deployment; a toy Pareto filter is sketched below.
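For the fine-tuning requirement, a minimal LoRA setup with HF Transformers and PEFT looks roughly like the following. The model id and target module names are illustrative assumptions; they vary by architecture.

```python
# Minimal LoRA fine-tuning setup with HF Transformers + PEFT.
# Model id and target modules are placeholders; choose them per architecture.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder causal LM
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of weights train
```

From here the model trains with a standard Trainer or custom loop; only the adapter weights receive gradients, which is what keeps the memory footprint small.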
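For the tradeoff-analysis point, the core of model selection is often a Pareto filter over benchmark results: discard any configuration beaten on both accuracy and cost by another. The candidates and numbers below are invented placeholders, purely for illustration.

```python
# Toy tradeoff analysis: keep Pareto-optimal configs on accuracy vs. cost.
# All benchmark numbers below are invented placeholders.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float     # task score, higher is better
    cost_per_1k: float  # USD per 1k requests, lower is better

candidates = [
    Candidate("model-a-fp16", 0.81, 4.20),
    Candidate("model-a-int8", 0.79, 2.10),
    Candidate("model-b-fp16", 0.84, 6.50),
    Candidate("model-b-int8", 0.80, 3.00),
    Candidate("model-c-fp16", 0.78, 5.00),  # dominated by model-a-fp16
]

def pareto_front(items):
    """Drop any candidate beaten on both axes by some other candidate."""
    return sorted(
        (c for c in items
         if not any(o.accuracy >= c.accuracy and o.cost_per_1k <= c.cost_per_1k
                    and o is not c for o in items)),
        key=lambda c: c.cost_per_1k,
    )

for c in pareto_front(candidates):
    print(f"{c.name}: accuracy={c.accuracy:.2f}, cost=${c.cost_per_1k:.2f}/1k")
```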