Machine Learning Engineer
- Ramat Gan
- Permanent position
- Full-time
- Design and implement strategies to maximize the efficient utilization of GPU resources across the organization.
- Develop tools and processes for GPU allocation, workload management, and performance monitoring in alignment with selected infrastructure tools.
- Monitor and fine-tune GPU performance to ensure optimal throughput for machine learning workloads.
- Build and maintain a robust system for automated reporting of key model performance metrics.
- Integrate with diverse data sources to create customizable dashboards for monitoring performance across datasets.
- Set up anomaly detection systems and alerts to ensure timely identification of performance degradation.
- Enhance the existing benchmarking suite for seamless evaluation of datasets in federated data lakes.
- Partner with machine learning scientists, data engineers, and DevOps teams to enable researchers to efficiently train and deploy models.
- Provide technical guidance and support for effectively utilizing available infrastructure and tools.
- Stay updated with the latest advancements in GPU technologies, ML infrastructure best practices, and model performance metrics.
- Evaluate and recommend new tools, technologies, and approaches to enhance the efficiency of the ML enablement platform.
- Implement best practices for Git workflows, code versioning, and safe release processes.
- Foster a culture of high-quality, collaborative development within the engineering team.
- Bachelor's degree in Computer Science, Engineering, or a related field
- 4+ years of experience as a software engineer
- Proficiency in Python and Git
- 2+ years of experience in cloud infrastructure or developer platform teams
- Hands-on experience with high-performance computing (HPC) and GPU cluster performance optimization for AI workloads
- Strong knowledge of GPU technologies and deployment strategies
- Familiarity with GCP compute deployment options, such as Kubernetes (GKE)
- Experience integrating observability tools for model performance metrics and evaluation
- Knowledge of federated learning and multi-dataset evaluation methodologies
- Experience in designing and scaling benchmarking frameworks
- Strong analytical and troubleshooting skills in cloud infrastructure and GPU utilization
- You want to make an impact on humankind
- You prioritize “We” over “I”
- You enjoy getting things done and striving for excellence
- You collaborate effectively with people of diverse backgrounds and cultures
- You have a growth mindset
- You are candid, authentic, and transparent
Mploy