How My Agent Learned GitLab

I work with a monorepo that has over 80 CI/CD jobs across 12 stages. When pipelines fail, I need to trace through parent pipelines, child pipelines, failed jobs, and error logs. There’s an MCP server for GitLab. I tried it once, then installed glab and wrote a basic skill file with command examples.
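
That first version was barely a skill at all: a short command list and nothing else. Roughly this shape (a sketch, not a verbatim excerpt):

## Basic Commands

glab ci list            # recent pipelines for the current project
glab ci trace <job-id>  # stream a job's log
glab api <endpoint>     # raw GitLab API access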

What’s interesting isn’t the skill itself. It’s how it developed through three investigation sessions.

Session One: Real-Time Self-Correction

“Investigate pipeline 2961721” was my first request. Claude ran a command. Got 20 jobs back. The pipeline had 80+.

I watched Claude notice the discrepancy, run glab api --help, spot the --paginate flag, and try again. This time: all the jobs.
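
The fix is one flag. Roughly what the two attempts looked like (the jq counting on the end is mine, added to make the difference visible):

# Default: only the first page of results comes back (20 jobs)
glab api "projects/2558/pipelines/2961721/jobs" | jq -r '.[].id' | wc -l

# With --paginate, glab walks every page and returns all of them
glab api "projects/2558/pipelines/2961721/jobs" --paginate | jq -r '.[].id' | wc -l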

Then it pulled logs with glab ci trace <job-id>. The logs looked clean. No errors visible. But the job had definitely failed.

I didn’t explain what was wrong. I asked: “The job failed, but you’re not seeing errors. What might be happening?”

Claude reasoned through it: “Errors might be going to stderr instead of stdout.” Then checked glab ci trace --help, found nothing about stderr handling, and figured out the solution: glab ci trace <job-id> 2>&1. Reran it. Errors appeared.

After the session, I asked: “What went wrong? What did you learn?”

Claude listed the issues: forgot to paginate (only saw 20 of 80+ jobs), missed stderr output, didn’t know about child pipelines. We talked through each one, then updated the skill file:

## Critical Best Practices

1. **Always use --paginate** for job queries
2. **Always capture stderr** with `2>&1` when getting logs
3. **Always check for child pipelines** via bridges API
4. **Limit log output** — use `tail -100` or `head -50`

Twenty minutes of reflection. Four critical lessons documented.

Session Two: Faster, Smarter

“Check pipeline 2965483.”

This time, Claude used --paginate from the start, captured stderr when pulling logs, and checked for child pipelines via the bridges API. Found a failed child pipeline, got its jobs, identified the error. Start to finish: five minutes.
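
The bridges lookup deserves a concrete example, since it's the least discoverable of the three. A sketch (the jq field path assumes GitLab's standard bridges response, where the child pipeline sits under downstream_pipeline):

# Child pipelines show up as "bridge" jobs on the parent pipeline
glab api "projects/2558/pipelines/<PIPELINE_ID>/bridges" --paginate | \
  jq -r '.[] | "\(.name): \(.downstream_pipeline.status) (pipeline \(.downstream_pipeline.id))"'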

But something new happened. All 15 Image build jobs failed. Claude started pulling logs for each one. I watched it fetch the first three — all identical errors. The base Docker image was missing from ECR.

“You just pulled three identical error messages,” I pointed out. “What does that tell you?”

Claude recognised the pattern: “When multiple jobs of the same type fail, they likely have the same error. I should check one representative job instead of all 15.”

Added to the skill file:

## Pattern: Multiple Failed Jobs

When many jobs fail (e.g., all Image builds), check one representative job first.

FIRST_FAILED=$(glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | \
  jq -r '.[] | select(.status == "failed") | .id' | head -1)

glab ci trace "$FIRST_FAILED" 2>&1 | tail -100

Session Three: Institutional Knowledge

Third investigation. Checkout server build timed out. Claude saw the error, started digging.

“Wait,” I said. “Before you investigate, check the duration.”

Claude checked: 44 minutes. “That’s within normal range for checkout server builds,” I told it. “This is a known issue, not an actual failure.”
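
Duration is cheap to check: it's a single field on the job object, so no log reading is needed first. A sketch (the jq formatting is mine):

# How long did the job actually run, and what state did it end in?
glab api "projects/2558/jobs/<JOB_ID>" | \
  jq -r '"\(.name): \(.duration / 60 | floor) min (\(.status))"'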

Added to the skill file:

## Common Error Patterns

Build Timeout:
ERROR: Job failed: execution took longer than <time>
→ Checkout server builds can take 44+ minutes (known issue)

Missing Docker Image:
manifest for <image> not found: manifest unknown
→ Base runner image not available in ECR (common during Node version transitions)

By session three, the skill file had accumulated pitfalls to avoid:

## Common Pitfalls

- ❌ Forgetting `--paginate` (only gets first 20 jobs)
- ❌ Not checking child pipelines (missing UI Test/Deploy jobs)
- ❌ Confusing Pipeline IDs (~2M) with Job IDs (~20M+)
- ❌ Missing stderr output (forgetting `2>&1`)
- ❌ Dumping entire logs (use tail/head/grep)

This is no longer just a command reference. It’s institutional knowledge about this specific codebase.

Why CLI Tools Enable This

CLI tools provide everything an agent needs for self-correction:

Clear errors: When glab api "projects/2558/pipelines/invalid" fails, stderr shows: “404 Not Found - Pipeline not found.” The error tells you exactly what went wrong.

Exit codes: Every command returns 0 for success, non-zero for failure. The agent knows a command failed before reading any output.

Help flags: Run glab ci trace --help and see every flag, every option, complete syntax. Self-service documentation that’s always current.

Immediate feedback: Try something, see if it works, adjust, try again. The loop is tight.
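
Put together, the loop looks something like this (a minimal sketch, not a transcript of what Claude ran):

# Try the call; the exit code says whether it worked, stderr says why it didn't
if ! jobs_json=$(glab api "projects/2558/pipelines/$PIPELINE_ID/jobs" --paginate 2>&1); then
  echo "Lookup failed: $jobs_json"   # e.g. a 404 points straight at a bad pipeline ID
  exit 1
fi

echo "$jobs_json" | jq -r '.[] | select(.status == "failed") | "\(.id) \(.name)"'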

The help flag tells you what’s possible. The skill file captures what’s effective. Together, they create a learning environment where the agent improves with each session.

The Result

Three sessions. Two hours total, including reflection time. The skill file went from basic command syntax to 200 lines of documented patterns, common errors, project-specific quirks, and investigation strategies.

I didn’t write comprehensive documentation up front. The agent and I built it together through use, through failure, through reflection.

After each session, I ask the same questions: What went wrong? What went well? What should you do differently? Then we update the skill file. The next session starts better.

That’s how this skill developed.