I'm Arian, and I like to write when I have free time.

I also pay my bills as an AI Engineer specialized in NLP and Large Language Models.

Lately, I've been diving into evaluation methods for generative AI and agentic systems, making sure we’re building smart stuff with confidence. At my current job, I sit somewhere between engineering, research, and consulting.

QUICK LINKS


PROJECTS

JOURNAL

CONTACT ME


LATEST POST


OPINION

“90% of code will be machine-generated by the end of the year”


Sounds so 2025, but this could have been a headline from Byte Magazine in 1975, when C (introduced in 1972) was gaining popularity.

Imagine Assembly programmers worrying that their jobs would become obsolete.

Instead of writing Assembly by hand, you could write code in a new language called C. The compiler would then generate the Assembly for you. Imagine that.

READ MORE


JOURNAL

How many samples do we need in our evaluation dataset?


The most common question I hear when teams set up their evaluation framework: "How many samples do we need?"

This question also comes in other forms, such as:

"How much time do we need from our domain experts to annotate the dataset?"

In large corporations, domain experts are usually the most expensive resource in the company. Even one hour of their time costs serious money.

So we need to justify exactly how much of their time we have to borrow to annotate the data for our brand-new AI project.

READ MORE

FIND ME ON


LinkedIn - Instagram - Calendar