On-Device Machine Learning Engineer
Company: webAI
Location: Austin
Posted on: April 1, 2026
Job Description:
About Us:
webAI is pioneering the future of artificial
intelligence by establishing the first distributed AI
infrastructure dedicated to personalized AI. We recognize the
evolving demands of a data-driven society for scalability and
flexibility, and we firmly believe that the future of AI lies in
distributed processing at the edge, bringing computation closer to
the source of data generation. Our mission is to build a future
where a company's valuable data and intellectual property remain
entirely private, enabling the deployment of large-scale AI models
directly on standard consumer hardware without compromising the
information embedded within those models. We are developing an
end-to-end platform that is secure, scalable, and fully under the
control of our users, empowering enterprises with AI that
understands their unique business. We are a team driven by truth,
ownership, tenacity, and humility, and we seek individuals who
resonate with these core values and are passionate about shaping
the next generation of AI.

About the Role
We’re looking for an On-Device Machine Learning Engineer to bring
modern ML capabilities directly onto consumer hardware: fast,
private, and reliable. You’ll own the design, optimization, and lifecycle of
models running locally (e.g., iPhone/iPad/Mac-class devices), with
a sharp focus on latency, battery, thermal behavior, and real-world
UX. This role sits at the intersection of ML systems, product
engineering, and performance tuning, and will help power local RAG,
memory, and personalized experiences without relying on the
network.

What You’ll Do

On-device model optimization and deployment
- Convert, optimize, and deploy models to run efficiently on-device
  using Core ML and/or MLX.
- Implement quantization strategies (e.g., 8-bit / 4-bit where
  applicable), compression, pruning, distillation, and other
  techniques to meet performance targets.
- Profile and improve model execution across compute backends
  (CPU/GPU/Neural Engine where relevant), and reduce memory
  footprint.

Local RAG and memory systems
- Build and optimize local retrieval pipelines (embeddings,
  indexing, caching, ranking) that work offline and under tight
  resource constraints.
- Implement local memory systems (short/long-term) with careful
  attention to privacy, durability, and performance.
- Collaborate with product/design to translate “memory” behavior
  into concrete technical architectures and measurable quality
  targets.

Model lifecycle on consumer hardware
- Own the on-device model lifecycle: packaging, versioning, updates,
  rollback strategies, on-device A/B testing approaches, telemetry,
  and quality monitoring.
- Build robust evaluation and regression suites that reflect real
  device constraints and user workflows.
- Ensure models degrade gracefully (low-power mode, thermals,
  backgrounding, OS interruptions).

Performance, reliability, and user experience
- Treat battery, thermal, and latency as first-class product
  requirements: instrument, benchmark, and optimize continuously.
- Design inference pipelines and scheduling strategies that respect
  app responsiveness, animations, and UI smoothness.
- Partner with platform engineers to integrate ML into production
  apps with clean APIs and stable runtime behavior.
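For a sense of the retrieval work described above, here is a toy
sketch of an offline-first pipeline: embed, index in memory, rank by
cosine similarity. The hashed bag-of-words embedding is purely
illustrative, standing in for a real on-device embedding model, and
none of the names here reflect webAI's actual stack.

```python
import hashlib
import math

DIM = 64

def embed(text: str) -> list[float]:
    """Toy deterministic embedding: tokens hashed into DIM buckets."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        h = int(hashlib.sha256(token.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are pre-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class LocalIndex:
    """In-memory vector index; works fully offline, no network calls."""
    def __init__(self):
        self.docs: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        # Embeddings are computed once at insert time and cached.
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

index = LocalIndex()
index.add("battery and thermal profiling on iPhone")
index.add("quantization strategies for Core ML models")
index.add("team offsite travel itinerary")
results = index.search("thermal battery profiling", k=1)
```

A production version would swap in a learned embedding model and a
quantized vector index, but the shape of the pipeline is the same.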
What We’re Looking For
- Strong experience shipping ML features into production, ideally
  including mobile / edge / consumer devices.
- Hands-on proficiency with Core ML and/or MLX, and the practical
  realities of running models locally.
- Solid understanding of quantization and optimization techniques
  for inference (accuracy/perf tradeoffs, calibration, benchmarking).
- Experience building or operating retrieval systems (embedding
  generation, vector search/indexing, caching strategies),
  especially under resource constraints.
- Fluency in performance engineering: profiling, latency breakdowns,
  memory analysis, and tuning on real devices.
- Strong software engineering fundamentals: maintainable code,
  testing, CI, and debugging across complex systems.

Nice to Have
- Experience with on-device LLMs, multimodal models, or real-time
  interactive ML features.
- Familiarity with Metal / GPU compute, or performance tuning of ML
  workloads on Apple platforms.
- Experience designing privacy-preserving personalization and memory
  (local-first data handling, encryption, retention policies).
- Experience building developer tooling for model packaging,
  benchmarking, and release management.
- Prior work on offline-first architectures, edge inference, or
  battery/thermal-aware scheduling.
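As a point of reference for the quantization skills listed above,
here is a minimal sketch of symmetric 8-bit quantization and the
accuracy tradeoff it implies. This hand-rolled version is for
illustration only; actual deployments would use Core ML or MLX
tooling.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 [-127, 127] with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

weights = [0.31, -1.2, 0.07, 0.9, -0.44]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Round-to-nearest bounds the error by half a quantization step.
assert max_err <= scale / 2 + 1e-9
```

The calibration and benchmarking mentioned in the requirements are
about choosing scales (per-tensor vs. per-channel, clipping
outliers) so this error budget translates into acceptable model
accuracy on real devices.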
We at webAI are committed to living out the core values we have put
in place as the foundation on which we operate as a team. We seek
individuals who exemplify the following:

Truth - Emphasizing transparency and honesty in every interaction
and decision.
Ownership - Taking full responsibility for one’s actions and
decisions, demonstrating commitment to the success of our clients.
Tenacity - Persisting in the face of challenges and setbacks,
continually striving for excellence and improvement.
Humility - Maintaining a respectful and learning-oriented mindset,
acknowledging the strengths and contributions of others.

Benefits:
- Competitive salary and performance-based incentives.
- Comprehensive health, dental, and vision benefits package.
- 401k Match (US-based only)
- $200/month Health and Wellness Stipend
- $400/year Continuing Education Credit
- $500/year Function Health subscription (US-based only)
- Free parking for in-office employees
- Unlimited Approved PTO
- Parental Leave for Eligible Employees
- Supplemental Life Insurance
webAI is an Equal Opportunity Employer and does not discriminate
against any employee or applicant on the basis of age, ancestry,
color, family or medical care leave, gender identity or expression,
genetic information, marital status, medical condition, national
origin, physical or mental disability, protected veteran status,
race, religion, sex (including pregnancy), sexual orientation, or
any other characteristic protected by applicable laws, regulations
and ordinances. We adhere to these principles in all aspects of
employment, including recruitment, hiring, training, compensation,
promotion, benefits, social and recreational programs, and
discipline. In addition, it is the policy of webAI to provide
reasonable accommodation to qualified employees who have protected
disabilities to the extent required by applicable laws, regulations
and ordinances where a particular employee works.