Senior Software Engineer - AI Benchmarking
- Ra'anana
- Permanent position
- Full-time
- Platform Architecture & Development: Lead the design and implementation of a modular, extensible, and cloud-native benchmarking platform capable of supporting large-scale AI model evaluations across diverse workloads and hardware.
- Scalable Infrastructure: Build distributed systems optimized for high-throughput and low-latency execution of benchmarks, ensuring the platform can scale seamlessly to meet growing demands.
- Open-Source Leadership: Design and contribute to open-source benchmarking and validation tools, fostering community adoption and influencing industry benchmarking standards.
- Integration with AI Tooling: Develop robust APIs, services, and orchestration layers to connect the benchmarking platform with inference engines and downstream consumers such as AI Hub and observability platforms.
- Workflow Automation: Implement advanced automation for scheduling, executing, and monitoring benchmarks using Kubernetes, OpenShift, and Argo Workflows.
- Data Systems & APIs: Build reliable data pipelines for benchmark result ingestion, storage, querying, and integration with decision-support tools.
- Performance & Reliability Engineering: Proactively identify and resolve system bottlenecks, optimize for resource efficiency, and ensure high availability.
- Engineering Excellence: Apply industry-leading development practices, including automated testing, CI/CD, and rigorous code review, to maintain exceptional software quality.
- Ecosystem Awareness: Maintain deep knowledge of the AI infrastructure and benchmarking ecosystem, tracking emerging frameworks, standards, and best practices to guide platform evolution.
- Proficiency in Python for backend development, API integration, and data processing.
- Hands-on experience with Kubernetes (Deployments, Services, Ingress).
- Ability to create and maintain Helm charts.
- Strong understanding of Docker for building and managing containers.
- Experience with CI/CD pipelines (GitHub Actions or Jenkins).
- Proficiency with Argo CD for GitOps-based continuous delivery.
- Familiarity with model serving frameworks such as vLLM, TGI, or LMDeploy.
- Experience with cloud platforms (AWS or GCP).
- Knowledge of monitoring and visualization tools such as Prometheus, Grafana, or Streamlit.
- Experience working with PostgreSQL and an ORM such as SQLAlchemy.
- Experience with Go for tooling and infrastructure development.
- Familiarity with Argo Workflows for pipeline orchestration.
- Experience with OpenShift or RHOAI.
- Direct hands-on experience with cloud GPUs.
- Knowledge of rollout event monitoring and advanced observability practices.
- Familiarity with artifact and code security scanning tools (e.g., Trivy, Grype).
- Experience with LLM benchmarking frameworks such as GuideLLM.
Mploy