Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA

<p><strong>Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA</strong></p><p><br></p><p><strong>Title: </strong>Machine Learning Engineer</p><p><strong>Location: San Jose, CA</strong></p><p><strong>Responsibilities:</strong></p><ul><li>Productize and optimize models from Research into reliable, performant, and cost-efficient services with clear SLOs (latency, availability, cost).</li><li>Scale training across nodes/GPUs (DDP/FSDP/ZeRO, pipeline/tensor parallelism) and own throughput/time-to-train using profiling and optimization.</li><li>Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache, Flash Attention) for training and inference without materially degrading quality.</li><li>Build and maintain model-serving systems (vLLM/Triton/TGI/ONNX/TensorRT/AITemplate) with batching, streaming, caching, and memory management.</li><li>Integrate with vector/feature stores and data pipelines (FAISS/Milvus/Pinecone/pgvector; Parquet/Delta) as needed for production.</li><li>Define and track performance and cost KPIs; run continuous improvement loops and capacity planning.</li><li>Partner with ML Ops on CI/CD, telemetry/observability, and model registries; partner with Scientists on reproducible handoffs and evaluations.</li></ul><p><br></p><p><strong>Educational Qualifications:</strong></p><ul><li>Bachelor’s in Computer Science, Electrical/Computer Engineering, or a related field required; Master’s preferred (or equivalent industry experience).</li><li>Strong systems/ML engineering background with exposure to distributed training and inference optimization.</li></ul><p><br></p><p><strong>Industry Experience: </strong></p><ul><li>3–5 years in ML/AI engineering roles owning training and/or serving in production at scale.</li><li>Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements.</li><li>Experience collaborating across Research, Platform/Infra, Data, and Product functions.</li></ul><p><br></p><p><strong>Technical Skills:</strong></p><ul><li>Familiarity with deep learning frameworks: PyTorch (primary), TensorFlow.</li><li>Exposure to large-model training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism); distributed training experience a plus.</li><li>Optimization: experience profiling and optimizing code execution and model inference, including quantization (PTQ/QAT/AWQ/GPTQ), pruning, distillation, KV-cache optimization, and Flash Attention.</li><li>Scalable serving: autoscaling, load balancing, streaming, batching, caching; collaboration with platform engineers.</li><li>Data &amp; storage: SQL/NoSQL, vector stores (FAISS/Milvus/Pinecone/pgvector), Parquet/Delta, object stores.</li><li>Ability to write performant, maintainable code.</li><li>Understanding of the full ML lifecycle: data collection, model training, deployment, inference, optimization, and evaluation.</li></ul>
