
Zeyneb N. Kaya

Hi! I am Zeyneb, a student at Stanford University. I’m an AI researcher interested in understanding and advancing NLP algorithms; my work broadly explores data efficiency and interpretability, with the goal of building robust reasoning capabilities that go beyond shallow pattern memorization. I’m always happy to discuss interesting ideas and opportunities.

Feel free to reach out at zeynebnk [at] stanford [dot] edu. 

Education.

Stanford University 

Stanford ASES (Affiliated Stanford Entrepreneurial Students) Bootcamp Scholar

Relevant Coursework: Natural Language Processing with Deep Learning; Deep Reinforcement Learning; AI for Reasoning, Planning, and Decision Making; AI & Language; Programming Abstractions; Computer Organization & Systems; Probability for Computer Scientists

Saratoga High School 2020-2024

AI Club Co-President, Linguistics Club Founder & President, Chinese Club Officer

4.0 unweighted / 4.5 weighted GPA (grades 10-12)

West Valley Community College

4.0 GPA

Dual Enrollment: Differential Equations, Linear Algebra, Multivariable Calculus, Cultural Anthropology

Research.

My research interests are in natural language processing, linguistics, and data science. I work to deepen our understanding of language models and their capabilities, and to use that understanding to push their limits; I'm broadly interested in robust, data-efficient algorithms and interpretability. Selected publications are listed below.

Vector Space Distance as a Measurement of Word Embedding Variability in Low-Resource Linguistic Environments

Zeyneb N. Kaya, Annie K. Lamar

Under review, North American Chapter of the Association for Computational Linguistics (NAACL), 2025

Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions

Zeyneb N. Kaya, Souvick Ghosh

arXiv preprint

Zeyneb N. Kaya, Annie K. Lamar

Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT) @ EACL, 2023

Full Scope Word Embedding Variability for Low-Resource Languages

Zeyneb N. Kaya, Annie K. Lamar

IEEE MIT Undergraduate Research and Technology Conference, 2023

MADLIBS: A Novel Multilingual Data Augmentation Algorithm for Low-Resource Neural Machine Translation 

Zeyneb N. Kaya

Regeneron Science Talent Search, 2023-2024

Zeyneb N. Kaya

Proceedings of the Linguistic Society of America (PLSA), 2023

Zeyneb N. Kaya

International Conference on Computational Social Science (IC2S2), 2023

Zeyneb N. Kaya, Manya Sriram

University of California, Santa Barbara, 2022

Honors.

Selected Honors

TreeHacks - Scrapybara Award

Prize valued at $16K, 2025

US Presidential Scholars Semifinalist

2024

10th Place, 1st in USA, 2023

Scholastic Art and Writing Competition

Honorable Mention, 2020

USACO

Silver, 2020

North American Computational Linguistics Olympiad (NACLO)

Invitational Round Qualifier, 2023

Junior Science and Humanities Symposium (JSHS)

National Qualifier (Top 5), 2nd Place in Math/CS, 2023

Synopsys Science Fair

1st Award, CSEF Qualifier, 2023

Congressional App Challenge

Winner, 2021; Finalist, 2023

National Merit Scholarship Finalist

2023

Bausch and Lomb Honorary Science Award, University of Rochester

Saratoga High School Junior Awards Ceremony, 2023

National Award Winner + Regional Affiliate, 2023

Stanford Women in Data Science (WiDS) Datathon

Top High School Winner, 2023

Technovation Global Challenge

Semifinalist, 2021

VIP Invitee, 2024

Projects.

Data-Efficient NLP + Synthetic Data

2024

Designed MADLIBS (multilingual augmentation of data with alignment-based substitution), an efficient multilingual synthetic data generation algorithm achieving SOTA neural machine translation performance with less data.

2024 Regeneron STS 5th Place / $90K winner

2023 National Junior Science and Humanities Symposium (NJSHS) HM

2023 Synopsys 1st Award

2024 RISE Global Challenge Finalist

Diffusion LLMs for Reasoning and Efficiency at Inference Time

2025

Created LLaDA-R1, a diffusion LLM optimized for reasoning and efficiency at inference time, using SFT and RL for dynamic diffusion-step adaptation and remasking refinement.

2025 Mercor x Etched x Cognition Inference-Time Compute Hackathon 1st Place / $40K Winner 

In-Context Linguistic Reasoning 

2025

Developed parallel symbol tuning, an approach that advances robust in-context few-shot deductive multilingual reasoning in LLMs by dissociating language from logic.

2025 CS 224N course project

NeuroPilot 

2025

Built a brain-computer interface and agentic AI system that translates brain-powered natural language commands into hands-free computer control.

2025 TreeHacks Sponsor Prize Winner 

Gallery.
