Zeyneb N. Kaya
Hi! I am Zeyneb, a student at Stanford University. I am an AI researcher interested in understanding and advancing NLP algorithms; my work broadly explores data efficiency and interpretability, with the aim of building models that reason robustly rather than memorize shallow patterns. I'm always looking to discuss interesting ideas and opportunities.
Feel free to reach out at zeynebnk [at] stanford [dot] edu.
Education.
Stanford University
Stanford ASES (Affiliated Stanford Entrepreneurial Students) Bootcamp Scholar
Relevant Coursework: Natural Language Processing with Deep Learning; Deep Reinforcement Learning; AI for Reasoning, Planning, and Decision Making; AI & Language; Programming Abstractions; Computer Organization & Systems; Probability for Computer Scientists
Saratoga High School 2020-2024
AI Club Co-President, Linguistics Club Founder & President, Chinese Club Officer
4.0 unweighted / 4.5 weighted GPA (grades 10-12)
West Valley Community College
4.0 GPA
Dual Enrollment: Differential Equations, Linear Algebra, Multivariable Calculus, Cultural Anthropology
Research.
My research interests are in natural language processing, linguistics, and data science. I am interested in deepening our understanding of language models and their capabilities, and in using that understanding to push their limits; I am broadly drawn to robust, data-efficient algorithms and interpretability. Selected publications are listed below.
Vector Space Distance as a Measurement of Word Embedding Variability in Low-Resource Linguistic Environments
Zeyneb N. Kaya, Annie K. Lamar
Under review at the North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions
Zeyneb N. Kaya, Souvick Ghosh
arXiv preprint
Zeyneb N. Kaya, Annie K. Lamar
Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT) @ EACL, 2023
Full Scope Word Embedding Variability for Low-Resource Languages
Zeyneb N. Kaya, Annie K. Lamar
IEEE MIT Undergraduate Research and Technology Conference, 2023
MADLIBS: A Novel Multilingual Data Augmentation Algorithm for Low-Resource Neural Machine Translation
Zeyneb N. Kaya
Regeneron Science Talent Search, 2023-2024
Zeyneb N. Kaya
Proceedings of the Linguistic Society of America (PLSA), 2023
Zeyneb N. Kaya
International Conference on Computational Social Science (IC2S2), 2023
Zeyneb N. Kaya, Manya Sriram
University of California, Santa Barbara, 2022
Ahmet C. Genc, Zeyneb N. Kaya, et al.
Annals of the Rheumatic Diseases, 2021
Honors.
Selected Honors
Mercor x Etched x Cognition Inference-Time Compute Hackathon
1st Place / $40K Winner, 2025
Regeneron Science Talent Search
5th Place / $90K Winner, 2024
Finalist, 2025
TreeHacks - Scrapybara Award
Prize valued at $16K, 2025
US Presidential Scholars Semifinalist
2024
10th Place, 1st in USA, 2023
Scholastic Art and Writing Competition
Honorable Mention, 2020
USACO
Silver, 2020
National Junior Science and Humanities Symposium (NJSHS)
Honorable Mention, Math/CS, 2023
North American Computational Linguistics Olympiad (NACLO)
Invitational Round Qualifier, 2023
Junior Science and Humanities Symposium (JSHS)
National Qualifier (Top 5), 2nd Place Math/CS, 2023
Synopsys Science Fair
1st Award, CSEF Qualifier, 2023
Congressional App Challenge
Winner, 2021; Finalist, 2023
National Merit Scholarship Finalist
2023
Bausch and Lomb Honorary Science Award, University of Rochester
Saratoga High School Junior Awards Ceremony, 2023
National Award Winner + Regional Affiliate, 2023
Stanford Women in Data Science (WiDS) Datathon
Top High School Winner, 2023
Technovation Global Challenge
Semifinalist, 2021; VIP Invitee, 2024
Projects.
Data-Efficient NLP + Synthetic Data
2024
Designed MADLIBS (multilingual augmentation of data with alignment-based substitution), an efficient multilingual synthetic data generation algorithm that achieves state-of-the-art neural machine translation performance with less data.
2024 Regeneron STS 5th Place / $90K Winner
2023 National Junior Science and Humanities Symposium (NJSHS) Honorable Mention
2023 Synopsys 1st Award
2024 RISE Global Challenge Finalist
Diffusion LLMs for Reasoning and Efficiency at Inference Time
2025
Created LLaDA-R1, a diffusion LLM optimized for inference-time reasoning and efficiency, using SFT and RL for dynamic diffusion-step adaptation and remasking refinement.
2025 Mercor x Etched x Cognition Inference-Time Compute Hackathon 1st Place / $40K Winner
In-Context Linguistic Reasoning
2025
Developed parallel symbol tuning, an approach that advances robust few-shot in-context deductive multilingual reasoning in LLMs by dissociating language from logic.
2025 Stanford CS 224N course project
NeuroPilot
2025
Built a brain-computer interface and agentic AI system that translates brain signals into natural language commands for hands-free computer control.
2025 TreeHacks Sponsor Prize Winner