Software Engineer · Google Cloud

I build the systems that train AI.

Joseph Thomas — Cornell MEng CS · US Patent Holder · Swarm Robotics Researcher

Engineering Gemini-powered synthetic data pipelines, LLMOps platforms, and RLHF datasets at planet scale.

Open to consulting & speaking
21 Google Awards · 27 Citations · 1 US Patent · 4 Publications · 10B+ Events/Day

Experience

06 roles
Software Engineer — Google Cloud Oct 2019 – Present
Google · Mountain View, CA
  • Engineered synthetic data generation pipelines using Gemini to produce high-fidelity instruction-tuning pairs that improve foundation model performance.
  • Architected a low-latency LLMOps platform (DQaaS) using gRPC/Protobufs, enabling enterprise-scale prompt versioning, testing, and retrieval for GenAI and RAG workflows.
  • Led backend storage and data integrity for CrowdCompute, Google's large-scale data engine for generating RLHF and high-quality SFT datasets for foundation models.
  • Provided technical leadership to the Crowd Data Platform team, driving the evolution of AI data-generation tools for GenAI/LLM use cases.
  • Partnered cross-functionally to streamline the collection of high-quality data.
  • Received 21 Google awards, including 5 Spot Bonuses.
Software Development Engineer — Big Data Technologies Feb 2017 – Sep 2019
Amazon · Palo Alto, CA
  • Architected and launched DataCraft, a centralised data ingestion platform processing more than 10 billion events per day.
  • Designed and implemented resilient, fault-tolerant ingestion systems using Kinesis, Lambda, and S3, ensuring data integrity and timely delivery into the data lake.
Data Scientist Feb 2015 – Jan 2017
Datanyze · San Mateo, CA
  • Owned the strategy and roadmap for a product that integrated directly with customer CRMs (e.g., Salesforce) to analyse clients' most successful customers.
  • Used proprietary technology to crawl millions of websites daily and identify the technology stacks companies run.
  • Leveraged this "technographic" data as a predictive attribute to find "look-alike" leads that mirrored clients' ideal customer profiles and were more likely to convert.
  • Ran beta tests that showed an increase in qualified opportunities and higher average deal sizes.
Energy Analytics Software Developer Jun 2013 – Dec 2014
Ascend Analytics · Oakland, CA
  • Migrated the core energy analytics codebase from SAS to WPS, saving an estimated $250K annually and improving platform stability.
Senior Analyst — R&D Jun 2011 – Jul 2012
Global Analytics · Chennai, India
  • Led a four-member team that predicted online lending risk from customers' social media profiles, generating $1.2M in revenue (R, Python, MySQL).
  • Designed, developed, and tested key modules of the Automated Modeling Platform (R, Python, MySQL).
Associate — R&D Jan 2010 – Jun 2011
Idea Research and Development · Pune, India
  • Developed SCION, an evolutionary computation algorithm for maximising the performance of the air intake of an air-breathing missile (MATLAB, Python, Qt Creator).

Education

03 degrees
Master of Engineering, Computer Science
Cornell University · New York, USA
Machine Learning · AI · NLP · Algorithm Design · Databases
2012 – 2013 · GPA 3.5/4.0
MSc (Eng), Aerospace Engineering
Indian Institute of Science · Bangalore, India
Thesis: Odor Source Localization using Swarm Robotics
2006 – 2008 · GPA 7.0/8.0
BTech, Electronics & Communication Engineering
Government Engineering College · Trichur, India
Thesis: Blind Source Separation using Independent Component Analysis
2002 – 2006
The key insight from swarm intelligence is that complex, intelligent behavior emerges from simple rules applied at scale — the same principle behind modern distributed AI systems.
— From invited talk at IDSIA, Switzerland

Patents, Publications & Talks

06 entries
Conference Paper 12 citations
Strategies for Locating Multiple Odor Sources using Glowworm Swarm Optimization
J. Thomas, D. Ghose · IICAI, pp. 842–861 · 2009
Patent 11 citations
Detection of Nuclear Spills Using Swarm Optimization Algorithms
D. Ghose, J. Thomas, K.N. Krishnanand · US Patent 8,838,271 · 2014
Thesis 3 citations
Odor Source Localization using Swarm Robotics
J. Thomas · Master's thesis, Indian Institute of Science, Bangalore · 2008
Book Chapter 1 citation
A GSO-Based Swarm Algorithm for Odor Source Localization in Turbulent Environments
J. Thomas, D. Ghose · Handbook of Approximation Algorithms and Metaheuristics, 2nd Ed., CRC Press, pp. 711–737 · 2018
Invited Talk
Industry Talk: Data Science Road
University of New Brunswick · March 18, 2022
Invited Talk
Odor Source Localization using Swarm Robotics
IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale), Switzerland · Nov 20, 2009

Skills & Tools

GenAI & ML
Gemini · RAG · RLHF · SFT · LLMOps · Prompt Engineering · Synthetic Data
Cloud & Infrastructure
GCP · AWS Kinesis · AWS Lambda · AWS S3 · Distributed Systems · Vector Databases
Languages & Backend
Python · Java · gRPC · SQL · Protobuf