The Storyboard

Welcome to the Storyboard, a place to explore career adventures at start-ups and companies founded by Claremont alumni and the Claremont community. Choose your next adventure at a company where you’ll have an edge from day one, and leverage our Claremont network to build your career.

Also, make sure to check out our newsletter, StoryHouse Review, to find out more about these companies in the Claremont ecosystem.

Staff Machine Learning Architect

Neurophos

Software Engineering, IT, Data Science
United States · California, USA · San Jose, CA, USA · San Mateo, CA, USA
Posted on Jan 16, 2026
At Neurophos, named to EE Times' 2025 list of the 100 Most Promising Start-ups, we are revolutionizing AI computation with the world's first metamaterial-based optical computing platform. Our design addresses the modulator-density limitation that has traditionally held silicon photonics back for inference, delivering an AI engine with substantially higher throughput and efficiency than any existing solution.

We've created an optical metasurface with 10,000x the density of traditional silicon photonics modulators. This enables 100x gains in power efficiency for neural-network computing without sacrificing throughput, which improves as well. By integrating metamaterials with conventional optoelectronics, our compute-in-memory optical system surpasses existing solutions by a wide margin and enables truly high-performance, cost-effective AI compute.

Join us to shape the future of optical computing.

Location: Austin, TX or San Francisco, CA. Full-time onsite position.

Position Overview

We are seeking an experienced machine learning architect to lead the porting and optimization of large language models (LLMs), diffusion models, and other ML applications to our revolutionary optical inference engines. This role is critical to demonstrating the full potential of our metamaterial-based optical processing units (OPUs) by adapting state-of-the-art AI models to leverage our ultra-high-throughput, low-precision compute architecture. The ideal candidate will bridge the gap between cutting-edge ML research and novel hardware capabilities, ensuring customers can seamlessly deploy their AI workloads on Neurophos hardware.

Key Responsibilities

  • Lead the porting of LLM applications, diffusion models, and visual ML applications to Neurophos optical inference engines
  • Adapt models from diverse sources, including GitHub, Hugging Face, other open-source repositories, and customer private models
  • Work with models in various formats, including PyTorch, Triton, JAX, and emerging frameworks
  • Develop and implement quantization strategies to migrate models from higher-precision formats (FP8, INT8, and above) to our optimized 4-bit precision (FP4/INT4) for weights and activations; a minimal PTQ sketch appears after this list
  • Design and execute re-quantization, retraining, and other model adaptation techniques to minimize accuracy loss during precision reduction
  • Create or integrate third-party tools and workflows for efficient model porting and optimization
  • Optimize GEMM operations for high-throughput execution
  • Develop benchmarking methodologies to measure and validate model quality post-porting, including perplexity and other quality metrics; see the perplexity sketch after this list
  • Collaborate with hardware and software teams to co-optimize model architectures for optical compute characteristics
  • Publish research papers on novel optimization techniques and methodologies (with appropriate IP protection)
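
For illustration only, here is a minimal sketch of the kind of work the quantization and GEMM bullets describe: symmetric per-channel INT4 post-training quantization (PTQ) of linear-layer weights, with a "fake-quant" GEMM to gauge the accuracy impact. Every name here is hypothetical; nothing in this posting describes Neurophos' actual (non-public) toolchain.

```python
# Illustrative only: minimal symmetric INT4 PTQ of linear-layer weights,
# plus a "fake-quant" GEMM to estimate accuracy impact in FP32.
import torch

def quantize_weight_int4(w: torch.Tensor):
    """Symmetric per-output-channel quantization to the INT4 range [-8, 7]."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), min=-8, max=7)
    return q.to(torch.int8), scale  # 4-bit values held in int8 storage

def fake_quant_linear(x, q, scale, bias=None):
    """GEMM against dequantized weights, emulating INT4 inference."""
    w_dq = q.to(x.dtype) * scale
    return torch.nn.functional.linear(x, w_dq, bias)

# Compare one layer's FP32 output against its INT4 fake-quant output.
layer = torch.nn.Linear(512, 512)
x = torch.randn(8, 512)
q, s = quantize_weight_int4(layer.weight.data)
err = (layer(x) - fake_quant_linear(x, q, s, layer.bias)).abs().max()
print(f"max abs error after INT4 PTQ: {err.item():.4f}")
```

In practice, the re-quantization and quantization-aware retraining mentioned above would be layered on top wherever straight PTQ loses too much accuracy.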
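
On the benchmarking side, a minimal sketch of perplexity measurement with Hugging Face transformers: perplexity is exp of the mean token cross-entropy, so comparing it before and after quantization quantifies quality loss. The gpt2 checkpoint is only a stand-in for whatever model is being ported.

```python
# A minimal perplexity benchmark: exp(mean token cross-entropy).
# The model name is illustrative; any causal LM from the Hugging Face hub works.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str, max_len: int = 1024) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :max_len]
    with torch.no_grad():
        # Passing labels makes the model return mean cross-entropy over tokens.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative baseline model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
print(f"ppl: {perplexity(model, tok, 'The quick brown fox jumps over the lazy dog.'):.2f}")
```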

Qualifications

  • MS or PhD in Computer Science, Data Science, Machine Learning, Mathematics, or related field
  • 7+ years of experience in machine learning engineering with at least 3 years focused on model optimization and deployment
  • Deep expertise in neural network quantization techniques, including post-training quantization (PTQ) and quantization-aware training (QAT)
  • Strong proficiency in PyTorch and familiarity with other ML frameworks (JAX, Triton, TensorFlow)
  • Hands-on experience with transformer architectures, LLMs, and diffusion models
  • Experience with low-precision inference optimization (INT8, FP8, or lower)
  • Strong understanding of GEMM operations and linear algebra optimizations for deep learning
  • Experience with model evaluation metrics, including perplexity, accuracy, and benchmark suites
  • Track record of successfully deploying ML models on specialized hardware accelerators
  • Excellent communication skills with the ability to collaborate across hardware and software teams

Preferred Skills

  • Experience with sub-8-bit quantization (INT4, FP4) and mixed-precision inference
  • Familiarity with Hugging Face Transformers library and model hub ecosystem
  • Experience with ONNX, TensorRT, or other model optimization frameworks
  • Background in analog or optical computing architectures
  • Knowledge of in-memory computing paradigms and matrix-vector multiplication acceleration
  • Published research in model compression, quantization, or efficient inference
  • Experience with large-scale batch inference optimization
  • Familiarity with prefill vs. decode optimization strategies in LLM inference

What We Offer

  • A pivotal role in an innovative startup redefining the future of AI hardware
  • A collaborative and intellectually stimulating work environment
  • Competitive compensation, including salary and equity options
  • Opportunities for career growth and future team leadership
  • Access to cutting-edge technology and state-of-the-art facilities
  • Opportunity to publish research and contribute to the field of efficient AI inference

If you are passionate about pushing the boundaries of model optimization and driving impact in the semiconductor industry, we want to hear from you! This is a rare opportunity to work on a game-changing technology at the intersection of photonics and AI. As part of our elite team, you’ll contribute to a platform that redefines computational performance and accelerates the future of artificial intelligence. Be a key player in bringing this transformative innovation to the world.