Allen Thomas
217-298-6572 | allenthomasdev@gmail.com | LinkedIn | GitHub
Education
University of Illinois at Urbana-Champaign
Master of Computer Science
August 2024 – December 2025
Relevant Coursework: Distributed Systems, Systems for Gen AI, Topics in LLM Agents, Deep Learning
Experience
Data Engineer
Helpshift, Pune, India
June 2022 – June 2024
- Led migration of Issues Analytics API and Power BI API from HBase to Amazon Redshift, handling a 10x traffic increase and a 50x surge in queries with zero downtime.
- Drove petabyte-scale pipeline migration to AWS, resulting in monthly savings exceeding $16,000.
- Led successful migration of customer-facing APIs, ensuring zero downtime, improving response times by 40%.
- Migrated legacy stream processing pipelines from Apache Storm and HBase to Apache Flink and YugabyteDB.
- Played key role in decommissioning legacy infrastructure, supporting cost-saving initiatives.
- Led data democratization initiative, reducing support tickets by 40% by training support staff on Metabase.
Projects
Control Vector-Based LLM Steering and Analysis
LLMs, Hugging Face, OpenAI | March 2025
- Developed framework to train control vectors modulating LLM personality traits.
- Engineered pipeline for generating, collecting, and analyzing LLM responses with automated scoring.
- Integrated evaluation compatibility with lm-evaluation-harness for systematic benchmarking.
Stream Processing Framework
Golang, Distributed Systems, Multithreading | November 2024
- Architected high-performance stream processing system with exactly-once semantics and fault tolerance.
- Engineered optimized task scheduling algorithms for enhanced real-time throughput.
- Integrated custom HyDFS storage layer for reliable data persistence and recovery.
Hybrid Distributed File System (HyDFS)
Golang, Distributed Systems, Fault Tolerance | October 2024
- Implemented scalable distributed file system supporting operations across 10 nodes with failure tolerance.
- Implemented consistent hashing for optimal data distribution, improving replication and routing.
- Synthesized HDFS and Cassandra principles for a robust, scalable storage solution.
Research
Independent Research on LLM Inference Scaling
RunPod, Qwen, DeepSeek, OpenAI | 2025
- Investigated inference-time scaling laws and reasoning token manipulation.
- Benchmarked and fine-tuned s1/s1.1 models and analyzed "Wait"-based budget forcing.
- Conducted large-scale inference experiments on distributed GPU clusters.
Agentic Research Code Reproducibility System
Docker, Google Gemini, OpenAI | October 2024
- Engineered autonomous AI system analyzing research papers and reconstructing environments.
- Developed self-directed exploration for automated reproducibility testing via containerized execution.
Multi-Agent Reinforcement Learning: Implementation of Hide and Seek
IEEE | 2021
- Researched and implemented MARL algorithms and developed a hide-and-seek simulation inspired by OpenAI's work.
Technical Skills
Languages: Golang, Java, Python, Clojure, SQL, R, Scala, JavaScript
Frameworks & Platforms: Apache Spark, Airflow, Hadoop, Kafka, Flink, Hive, HBase, Amazon Redshift, S3, Athena, EMR, dbt, PyTorch, Docker, pandas, NumPy, scikit-learn, Hugging Face Transformers
Developer Tools: Neovim, Git, Gerrit, Jira, AWS, Terraform
Databases: PostgreSQL, MySQL, MongoDB, Cassandra, Elasticsearch