Allen Thomas

217-298-6572 | allenthomasdev@gmail.com | LinkedIn | GitHub

Education

University of Illinois at Urbana-Champaign

Master of Computer Science

August 2024 – December 2025

Relevant Coursework: Distributed Systems, Systems for Gen AI, Topics in LLM Agents, Deep Learning

Experience

Data Engineer

Helpshift, Pune, India

June 2022 – June 2024

  • Led migration of Issues Analytics API and Power BI API from HBase to Amazon Redshift, handling 10x traffic increase and 50x surge in queries with zero downtime.
  • Drove petabyte-scale pipeline migration to AWS, resulting in monthly savings exceeding $16,000.
  • Led migration of customer-facing APIs with zero downtime, improving response times by 40%.
  • Migrated legacy stream processing pipelines from Apache Storm and HBase to Apache Flink and YugabyteDB.
  • Played key role in decommissioning legacy infrastructure, supporting cost-saving initiatives.
  • Led data democratization initiative, reducing support tickets by 40% by training support staff on Metabase.

Projects

Control Vector-Based LLM Steering and Analysis

LLMs, Hugging Face, OpenAI | March 2025

  • Developed framework to train control vectors modulating LLM personality traits.
  • Engineered pipeline for generating, collecting, and analyzing LLM responses with automated scoring.
  • Integrated with lm-evaluation-harness for systematic benchmarking.

Stream Processing Framework

Golang, Distributed Systems, Multithreading | November 2024

  • Architected high-performance stream processing system with exactly-once semantics and fault tolerance.
  • Engineered optimized task scheduling algorithms for enhanced real-time throughput.
  • Integrated custom HyDFS storage layer for reliable data persistence and recovery.

Hybrid Distributed File System (HyDFS)

Golang, Distributed Systems, Fault Tolerance | October 2024

  • Implemented scalable distributed file system supporting operations across 10 nodes with failure tolerance.
  • Implemented consistent hashing for optimal data distribution, improving replication and routing.
  • Synthesized HDFS and Cassandra principles for a robust, scalable storage solution.

Research

Independent Research on LLM Inference Scaling

RunPod, Qwen, DeepSeek, OpenAI | 2025

  • Investigated inference-time scaling laws and reasoning token manipulation.
  • Benchmarked and fine-tuned s1/s1.1 models and analyzed "Wait"-based budget forcing.
  • Conducted large-scale inference experiments on distributed GPU clusters.

Agentic Research Code Reproducibility System

Docker, Google Gemini, OpenAI | October 2024

  • Engineered autonomous AI system analyzing research papers and reconstructing environments.
  • Developed self-directed exploration for automated reproducibility testing via containerized execution.

Multi-Agent Reinforcement Learning-Implementation of Hide and Seek

IEEE | 2021

  • Researched and implemented MARL algorithms and developed a hide-and-seek simulation inspired by OpenAI.

Technical Skills

Languages: Golang, Java, Python, Clojure, SQL, R, Scala, JavaScript

Frameworks & Platforms: Apache Spark, Airflow, Hadoop, Kafka, Flink, Hive, HBase, Amazon Redshift, S3, Athena, EMR, dbt, PyTorch, Docker, pandas, NumPy, scikit-learn, Hugging Face Transformers

Developer Tools: Neovim, Git, Gerrit, Jira, AWS, Terraform

Databases: PostgreSQL, MySQL, MongoDB, Cassandra, Elasticsearch