# How My Agent Learned GitLab

> Teaching an agent to use CLI tools isn't about writing perfect documentation. It's about creating a feedback loop where the tool teaches, the agent learns, and reflection builds institutional knowledge.

URL: https://kumak.dev/how-my-agent-learned-gitlab/
Published: 2025-11-17
Category: tutorial

I work with a monorepo that has over 80 CI/CD jobs across 12 stages. When pipelines fail, I need to trace through parent pipelines, child pipelines, failed jobs, and error logs. There's an MCP server for GitLab. I tried it once, then installed `glab` and wrote a basic [skill file](https://gist.github.com/szymdzum/304645336c57c53d59a6b7e4ba00a7a6) with command examples.

What's interesting isn't the skill itself. It's how it developed through three investigation sessions.

## Session One: Real-Time Self-Correction

"Investigate pipeline 2961721" was my first request. Claude ran a command. Got 20 jobs back. The pipeline had 80+.

I watched Claude notice the discrepancy, run `glab api --help`, spot the `--paginate` flag, and try again. This time: all the jobs.

Then it pulled logs with `glab ci trace <job-id>`. The logs looked clean. No errors visible. But the job had definitely failed.

I didn't explain what was wrong. I asked: "The job failed, but you're not seeing errors. What might be happening?"

Claude reasoned through it: "Errors might be going to stderr instead of stdout." Then checked `glab ci trace --help`, found nothing about stderr handling, and figured out the solution: `glab ci trace <job-id> 2>&1`. Reran it. Errors appeared.

**After the session**, I asked: "What went wrong? What did you learn?"

Claude listed the issues: forgot to paginate (only saw 20 of 80+ jobs), missed stderr output, didn't know about child pipelines. We talked through each one, then updated the skill file:

```markdown
## Critical Best Practices

1. **Always use --paginate** for job queries
2. **Always capture stderr** with `2>&1` when getting logs
3. **Always check for child pipelines** via bridges API
4. **Limit log output** — use `tail -100` or `head -50`
```

Twenty minutes of reflection. Four critical lessons documented.

## Session Two: Faster, Smarter

"Check pipeline 2965483."

This time, Claude used `--paginate` from the start, captured stderr when pulling logs, and checked for child pipelines via the bridges API. Found a failed child pipeline, got its jobs, identified the error. Start to finish: five minutes.

But something new happened. All 15 Image build jobs failed. Claude started pulling logs for each one. I watched it fetch the first three — all identical errors. The base Docker image was missing from ECR.

"You just pulled three identical error messages," I pointed out. "What does that tell you?"

Claude recognised the pattern: "When multiple jobs of the same type fail, they likely have the same error. I should check one representative job instead of all 15."

Added to the skill file:

```markdown
## Pattern: Multiple Failed Jobs

When many jobs fail (e.g., all Image builds), check one representative job first.

FIRST_FAILED=$(glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate |\
  jq -r '.[] | select(.status == "failed") | .id' | head -1)

glab ci trace $FIRST_FAILED 2>&1 | tail -100
```

## Session Three: Institutional Knowledge

Third investigation. Checkout server build timed out. Claude saw the error, started digging.

"Wait," I said. "Before you investigate, check the duration."

Claude checked: 44 minutes. "That's within normal range for checkout server builds," I told it. "This is a known issue, not an actual failure."

Added to the skill file:

```markdown
## Common Error Patterns

Build Timeout:
ERROR: Job failed: execution took longer than <time>
→ Checkout server builds can take 44+ minutes (known issue)

Missing Docker Image:
manifest for <image> not found: manifest unknown
→ Base runner image not available in ECR (common during Node version transitions)
```

By session three, the skill file had accumulated pitfalls to avoid:

```markdown
## Common Pitfalls

- ❌ Forgetting `--paginate` (only gets first 20 jobs)
- ❌ Not checking child pipelines (missing UI Test/Deploy jobs)
- ❌ Confusing Pipeline IDs (~2M) with Job IDs (~20M+)
- ❌ Missing stderr output (forgetting `2>&1`)
- ❌ Dumping entire logs (use tail/head/grep)
```

This is no longer just a command reference. It's institutional knowledge about this specific codebase.

## Why CLI Tools Enable This

CLI tools provide everything an agent needs for self-correction:

**Clear errors**: When `glab api "projects/2558/pipelines/invalid"` fails, stderr shows: "404 Not Found - Pipeline not found." The error tells you exactly what went wrong.

**Exit codes**: Every command returns 0 for success, non-zero for failure. The agent knows a command failed before reading any output.

**Help flags**: Run `glab ci trace --help` and see every flag, every option, complete syntax. Self-service documentation that's always current.

**Immediate feedback**: Try something, see if it works, adjust, try again. The loop is tight.

The help flag tells you what's possible. The skill file captures what's effective. Together, they create a learning environment where the agent improves with each session.

## The Result

Three sessions. Two hours total, including reflection time. The skill file went from basic command syntax to 200 lines of documented patterns, common errors, project-specific quirks, and investigation strategies.

I didn't write comprehensive documentation up front. The agent and I built it together through use, through failure, through reflection.

After each session, I ask the same questions: What went wrong? What went well? What should you do differently? Then we update the skill file. The next session starts better.

[That's how this skill developed.](https://gist.github.com/szymdzum/304645336c57c53d59a6b7e4ba00a7a6)