Zeyneb N. Kaya

Hi! I am Zeyneb, a student at Stanford University. I work on understanding and pushing the limits of AI, exploring robustness, learning from data (efficiently), and statistics/optimization, among other things.
Most recently, I've worked on physics-based foundation models for computational design as co-founder/CTO @ Topological (YC S25); decentralized AI, synthetic data, and midtraining @ Dria; and RL/reasoning with text diffusion LLMs @ Stanford.
I’m always eager to discuss interesting ideas and opportunities—please reach out!
zeynebnk [at] stanford [dot] edu
Research.
My work aims to advance our understanding of AI and its capabilities, and to use that understanding to improve those capabilities and push their limits on fundamental challenges. I'm interested in robustness, data/efficiency, and generalizability under distribution shift, working across machine learning, statistics, and physics.
Listed below are selected relevant publications.
The Unified Cognitive Consciousness Theory for Language Models: Semantic Anchoring, Threshold Activation, and Emergent Intelligence
Edward Y. Chang, Zeyneb N. Kaya, Ethan Chang
Under Review
Vector Space Distance as a Measurement of Word Embedding Variability in Low-Resource Linguistic Environments
Annie K. Lamar, Zeyneb N. Kaya, Nichole M. Nomura
Under Review
Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT
Zeyneb N. Kaya, Annie K. Lamar
Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT) @ EACL, 2023
MADLIBS: A Novel Multilingual Data Augmentation Algorithm for Low-Resource Neural Machine Translation
Zeyneb N. Kaya
Regeneron Science Talent Search, 2024 & National Junior Science and Humanities Symposium, 2023
Zeyneb N. Kaya, Souvick Ghosh
arXiv preprint
Full Scope Word Embedding Variability for Low-Resource Languages
Zeyneb N. Kaya, Annie K. Lamar
IEEE MIT Undergraduate Research and Technology Conference, 2023
Zeyneb N. Kaya
Proceedings of the Linguistic Society of America (PLSA), 2023
Women in the Workplace: Analyzing Gender Biases in Corporate Email Communications
Zeyneb N. Kaya
International Conference on Computational Social Science (IC2S2), 2023
What You Say Is What You Think: An Analysis Of Intellectual Humility In Online Discussion Forums
Zeyneb N. Kaya, Manya Sriram
University of California, Santa Barbara, 2022
Ahmet C. Genc, Zeyneb N. Kaya, et al.
Annals of the Rheumatic Diseases, 2021
Awards & Recognition.
Etched x Mercor x Cognition Hackathon – 1st Place/$40K Winner, 2025
Regeneron Science Talent Search Winner – 5th Place/$90K Winner, 2024
Coca-Cola Scholar – 2024
PearVC x Anthropic Hackathon – 1st Place/Most Technical Winner, 2025
TreeHacks Scrapybara Prize – 1st Place/$16K-valued Winner, 2025
National Junior Science and Humanities Symposium (NJSHS) – National Honorable Mention, Regional 2nd Math/CS, 2023
Congressional App Challenge – 1st Place Winner, 2021
Olympiad in Linguistics (Online) – 10th Place / 1st in USA, 2023
North American Computational Linguistics Olympiad (NACLO) – Finalist / Invitational Round Qualifier, 2023
International Olympiad in Artificial Intelligence (IOAI) – Team USA invited representative (did not attend due to conflicts)
NCWIT Aspirations in Computing – National Award Winner + Regional Affiliate, 2023
Synopsys Science Fair – 1st Award + CSEF Qualifier (did not attend due to conflicts), 2023
Stanford Women in Data Science (WiDS) Datathon – HS Winner, 2023
RISE Challenge – Finalist, 2023
Technovation Global – Semifinalist, 2021
US Presidential Scholars – Semifinalist, 2024
GeoGuessr – Master Tier Player, 2025
National Merit Scholarship – Finalist, 2024
Scholastic Art and Writing Competition – Honorable Mention, 2020
Bausch and Lomb Honorary Science Award, University of Rochester – 2023
Yale YES Scholar + Hahn Scholar – 2024
Columbia Egleston Scholar – 2024
Cornell Hunter R. Rawlings III Presidential Research Scholar – 2024

Education.
Stanford University
Computer Science (AI) / Minor in Electrical Engineering
ASES (Affiliated Stanford Entrepreneurial Students) Bootcamp Scholar + 2nd Place Winner; Co-director.
Relevant Coursework: Deep NLP; Deep RL; Probability & Stochastic Differential Equations; AI & Language; AI for Reasoning; Statistical Mechanics for Computation and Learning.
Saratoga High School + Dual Enrolled West Valley College
AI Club Co-President, Linguistics Club Founder + President, Chinese Club Events Coordinator.
Dual Enrollment: Differential Equations, Linear Algebra, Multivariable Calculus, Cultural Anthropology
Projects.
MADLIBS
Designed Multilingual Augmentation of Data with Alignment-Based Substitution, an efficient multilingual synthetic data generation algorithm achieving SOTA performance with less data.
@ Regeneron Science Talent Search 2024
LLaDA-R1
Created LLaDA-R1, a diffusion LLM optimized for reasoning and efficiency at inference time with SFT+RL for dynamic diffusion step adaptation and remasking refinement.
@ Mercor x Etched x Cognition Inference-Time Compute Hackathon 2025
SHIELD.
Built SHIELD., a multi-agent RL + tool use framework for automatic identification and remediation of system vulnerabilities.
@ Pear VC x Anthropic Hackathon 2025
In-Context Learning of Transformers: A Statistical Mechanics Lens
Investigated statistical physics models explaining in-context learning, applying spin glasses, random matrix theory, and phase transitions toward transformer interpretability.
@ APPPHYS 229 2025
Linguistic Reasoning: Dissociating Language and Logic
Developed parallel symbol tuning, an approach to improve in-context linguistic reasoning capabilities of LLMs for few-shot language learning.
@ CS 224N 2025
Language Models (can be) Few-Shot Fakers
Investigated CoT faithfulness and the role of memorization; implemented a corrupted-CoT RL approach.
@ Anthropic Alignment Research Hackathon 2025
NeuroPilot
Built a brain-computer interface and agentic AI system for hands-free computer control through brain-powered natural language commands.
@ TreeHacks 2025