Yashowardhan Singh Tomar

AI Evaluation Engineer | RLHF Specialist | LLM Quality Analyst

Indore, India | Open to remote and relocation | yashtomar10122@gmail.com | +91-97549-50809

linkedin.com/in/yashowardhansinghtomar | github.com/yashowardhansinghtomar

Profile

AI engineer focused on LLM evaluation, RLHF workflows, rubric design, and model-quality analysis. Hands-on experience contributing to RLHF pipelines for xAI and Meta Llama 4 through Turing-sourced work, including single-turn and multi-turn prompt design, side-by-side response evaluation, preference judgments, and ground-truth corrections. Public portfolio includes config-driven LLM evaluation and RAG evaluation workbenches with validation, scoring, failure tags, and generated reports.

Experience

Jr. AI / Data Science Engineer | Engineer Master | Sep 2024 - Jan 2025

Contributed to RLHF training pipelines for xAI and Meta Llama 4 through Turing-sourced work, evaluating model behavior across Python and data-science tasks.
Designed single-turn and multi-turn prompts to stress-test correctness, instruction following, reasoning quality, and response clarity.
Reviewed side-by-side model responses using structured 5-point rubrics and submitted preference judgments for alignment workflows.
Created ground-truth corrections and failure notes to improve prompt-response quality and reduce recurring model errors.
Integrated text-to-speech (TTS) and speech-to-text (STT) models into PreCall AI, a production AI calling workflow, gaining applied systems experience beyond evaluation work.

Business Development Associate | Byju's | 2020 - 2022

Managed high-volume B2C sales of online education products with weekly revenue targets.
Developed consultative discovery, stakeholder communication, and objection-handling skills across a large customer base.

Selected Projects

llm-evaluation-lab | https://github.com/yashowardhansinghtomar/llm-evaluation-lab

Python, JSONL, CLI, Markdown/JSON reports

Built a config-driven LLM evaluation workbench with rubric validation, pairwise scoring, agreement checks, and failure-mode reporting.
Modeled RLHF-style review data with prompt, candidate responses, weighted dimensions, human preference, notes, and failure tags.

rag-evaluation-workbench | https://github.com/yashowardhansinghtomar/rag-evaluation-workbench

Python, BM25 retrieval, citation checks, reports

Built a workbench that evaluates RAG answers for retrieval recall, citation precision/recall, required-fact coverage, grounded answer rate, and failure tags.

Skills

Evaluation: LLM evaluation, RLHF, pairwise comparison, rubric design, prompt regression, failure tagging
RAG Quality: Retrieval recall, citation precision/recall, groundedness checks, required-fact coverage
Tools: Python, JSONL, Markdown/JSON reporting, LangChain, OpenAI API, Groq, Ollama, Hugging Face
Interfaces: CLI tools, Streamlit, Gradio

Education

Prestige Institute of Management & Research
Bachelor of Business Administration (BBA), Global Marketing & Brand Management | 2017 - 2020

Certifications

Data Analyst - Edubridge | Generative AI - Debugshala