
Senior Software Architect, AI Networking
- Tel Aviv
- Permanent position
- Full-time
Responsibilities:
- Design and evolve scalable architectures for multi-node LLM inference across GPU clusters.
- Develop infrastructure to optimize latency, throughput, and cost-efficiency of serving large models in production.
- Collaborate with model, systems, compiler, and networking teams to deliver end-to-end, high-performance solutions.
- Prototype novel approaches to KV cache handling, tensor/pipeline parallel execution, and dynamic batching.
- Evaluate and integrate new software and hardware technologies relevant to model inference (e.g., memory hierarchy, network topology, modern inference architectures).
- Work closely with internal teams and external partners to translate high-level architecture into reliable, high-performance systems.
- Author design documents, internal specs, and technical blog posts, and contribute to open-source efforts when appropriate.
Qualifications:
- Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or equivalent experience.
- 5+ years of experience building large-scale distributed systems or performance-critical software.
- Thorough understanding of deep learning systems, GPU acceleration, and AI model execution flows.
- Solid software engineering skills in C++ and/or Python, and strong familiarity with CUDA or similar GPU programming platforms.
- Strong system-level thinking across memory, networking, scheduling, and compute orchestration.
- Excellent communication skills and ability to collaborate across diverse technical domains.
- Experience working on LLM inference pipelines, transformer model optimization, or model-parallel deployments.
- Demonstrated success in profiling and resolving performance bottlenecks across the LLM training or inference stack.
- Familiarity with data-center-scale orchestration, cluster schedulers, or AI service deployment pipelines.
- Passion for solving tough technical problems and shipping high-impact solutions.