Senior AI Network System Architect
- Tel Aviv
- Permanent position
- Full-time
- Define, develop, and execute cutting-edge benchmarks and workloads to analyze system performance, identify bottlenecks, and drive optimizations across our hardware and software stack.
- Drive the direction of our future products by performing deep-dive analysis of system architectures and solutions to assess their performance, efficiency, and value proposition.
- Develop and validate sophisticated performance and network simulation models, correlating them with real-world hardware to predict and analyze the behavior of future systems.
- Analyze and optimize the entire AI stack, from communication libraries (such as NCCL) and system software down to the underlying network fabric, developing Proof-of-Concepts (POCs) for new features and improvements.
- Conceptualize next-generation networking architectures driven by emerging DL and AI technologies.
- Collaborate with multi-functional teams, including other architecture teams, logic design, system software, firmware, and DL research teams, to ensure the successful execution of our vision.
- M.Sc. or Ph.D. degree in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.
- 6+ years of relevant industry or research experience in high-performance computing, computer architecture, or computer networks.
- Excellent understanding of large-scale system behavior and the effect of distributed computing workloads on network and system performance.
- Proven experience in simulation-based performance analysis or benchmarking.
- Exceptional analytical, problem-solving, and systems-thinking skills, with the ability to translate complex technical data into strategic architectural insights.
- Hands-on programming skills in Python and/or AI frameworks for system analysis, automation, and modeling.
- Ability to thrive in a fast-paced, dynamic environment and work concurrently with multiple groups across the organization.
- Expertise in the architecture and system-level requirements of large-scale, distributed DL workloads (e.g., LLMs, Generative AI for vision).
- Deep understanding of communication libraries such as NCCL, UCX, or UCC.
- Expertise in network protocols (Ethernet, InfiniBand, RoCE) and large-scale network topologies.
- Experience with industry-standard AI benchmarks (e.g., MLPerf) and NVIDIA's frameworks (e.g., NeMo) on large-scale clusters.
Mploy