Allen Thomas

217-298-6572 | allenthomasdev@gmail.com | LinkedIn | GitHub

Education

University of Illinois at Urbana-Champaign

Master of Computer Science

August 2024 – December 2025

Relevant Coursework: Distributed Systems, Systems for Gen AI, Topics in LLM Agents, Deep Learning

Experience

Data Engineer

Helpshift, Pune, India

June 2022 – June 2024

Led the migration of the Issues Analytics API and Power BI API from HBase to Amazon Redshift, handling a 10x traffic increase and a 50x surge in query volume with zero downtime.
Drove a petabyte-scale pipeline migration to AWS, yielding monthly savings of more than $16,000.
Led the migration of customer-facing APIs with zero downtime, improving response times by 40%.
Migrated legacy stream processing pipelines from Apache Storm and HBase to Apache Flink and YugaByte.
Played a key role in decommissioning legacy infrastructure as part of cost-saving initiatives.
Led a data democratization initiative that trained support staff on Metabase, reducing support tickets by 40%.

Projects

Control Vector-Based LLM Steering and Analysis

LLMs, Hugging Face, OpenAI | March 2025

Developed a framework for training control vectors that modulate LLM personality traits.
Engineered a pipeline for generating, collecting, and analyzing LLM responses with automated scoring.
Added compatibility with lm-evaluation-harness for systematic benchmarking.

Stream Processing Framework

Golang, Distributed Systems, Multithreading | November 2024

Architected a high-performance stream processing system with exactly-once semantics and fault tolerance.
Designed optimized task scheduling algorithms to increase real-time throughput.
Integrated the custom HyDFS storage layer for reliable data persistence and recovery.

Hybrid Distributed File System (HyDFS)

Golang, Distributed Systems, Fault Tolerance | October 2024

Built a scalable distributed file system operating across 10 nodes with fault tolerance.
Implemented consistent hashing to distribute data evenly across nodes, improving replication and routing.
Combined design principles from HDFS and Cassandra into a robust, scalable storage system.

Research

Independent Research on LLM Inference Scaling

RunPod, Qwen, DeepSeek, OpenAI | 2025

Investigated inference-time scaling laws and reasoning token manipulation.
Benchmarked and fine-tuned s1/s1.1 models and analyzed "Wait"-based budget forcing.
Conducted large-scale inference experiments on distributed GPU clusters.

Agentic Research Code Reproducibility System

Docker, Google Gemini, OpenAI | October 2024

Engineered an autonomous AI system that analyzes research papers and reconstructs their code environments.
Developed self-directed exploration to automate reproducibility testing through containerized execution.

Multi-Agent Reinforcement Learning-Implementation of Hide and Seek

IEEE | 2021

Researched and implemented MARL algorithms and developed a hide-and-seek simulation inspired by OpenAI's hide-and-seek experiments.

Technical Skills

Languages: Golang, Java, Python, Clojure, SQL, R, Scala, JavaScript

Frameworks & Platforms: Apache Spark, Airflow, Hadoop, Kafka, Flink, Hive, HBase, Amazon Redshift, S3, Athena, EMR, dbt, PyTorch, Docker, pandas, NumPy, scikit-learn, Hugging Face Transformers

Developer Tools: Neovim, Git, Gerrit, Jira, AWS, Terraform

Databases: PostgreSQL, MySQL, MongoDB, Cassandra, Elasticsearch