Steve's Real Blog


A thing I used to do was install new vim plugins.

I was an intern at Yelp who was mostly coding in TextMate through college. TextMate wasn’t going to fly at Yelp because most development happened over ssh. My Real Hacker coworkers mostly used vim and emacs. I wanted to grow up to be a Real Hacker, and so, during work hours, I forced myself to use vim.

At first I slowed down. A lot. But a week into it, I was just as fast as I had been with TextMate, and by the end of the internship, I was much faster.

In 2011, one did not simply “use vim.” vim needed customization. You could optimize all kinds of things! How integrated the programming language was with the editor. How to browse the directory tree. Even how to move the cursor. While it was annoying to need to add plugins to build a good experience, it also meant I was learning each piece at a time, and making a considered decision based on how it would affect my workflow. I was constantly introspecting and “watching myself code.”

My goal as I added plugins to my config was to minimize time and effort between having an intention and fulfilling that intention in code. If I edited a function and wanted to update the corresponding test, I wanted to blink and find my cursor in the right place. If I was deep into a complex problem with ten interrelated files, I wanted to be able to flip between them instantly. (Naturally, I installed too many plugins and had to pare back when things got unwieldy.)

The point of all this isn’t that I eventually reached the perfect state. I didn’t—the point is the process of improvement. My efforts paid off over time, even now in the VSCode era when enough people find vim’s model valuable that its key commands are available in most places that matter.

The outcome you are looking for is productivity

Good engineers care whether they are productive or not. Productivity is hard to measure as an outsider, but as an individual, you know whether you’re doing your best and getting good outcomes.

The metrics you can move when optimizing your coding workflow are roughly:

  • How fast you can type
  • How fast you can navigate to a specific place so you can type
  • Whether you know the right answer to a question without having to ask or look it up
  • How fast you can discover the right answer to a question
  • How fast you can get confirmation of a hypothesis
  • How quickly you enter and how long you stay in flow state

People quibble about how important each one is. Personally, I think raw typing speed is underrated. But when you add large language models using tools in a loop—OK fine, “AI agents”—you introduce a whole new set of metrics you can move:

  • How much correct code you can produce by typing and thinking as little as possible
  • How much time you spend unwinding garbage code
  • How often you are kicked out of the flow state due to waiting for agents
  • How often you fail to apply your own skills because you are distracted by agents

Moving these metrics for yourself is, to use a BigCo phrase, extremely leveraged.

It’s roughly as hard as learning vim, but the potential gains are much greater. And the state of the art with language models doesn’t need to advance for this to change the industry forever. It’s not only that you can potentially do your existing work faster; embracing these tools can change the nature of the work you choose to take on.

I want to talk about how you can become faster in practice by using language models.

The rest of this post assumes you are using Claude Code because it’s currently the best monetary value, gets the best results in practice, and is the most extensible, but the framing applies to all coding agents.

How to move the metrics

Practice a lot and build a mental model

A coding agent is not a magic “do what I want” machine. It is a tool with tradeoffs, rough edges, and best practices.

Learn when to cut bait

There are times when agents simply cannot make the leap to find the root cause of an issue or the solution to a problem. The only way to know when this is happening is to experience it and develop a taste for it.

Develop a taste for models

Gemini leaves too many useless comments. o3 is a great problem solver but its context window feels short. Opus isn’t that much better than Sonnet as a coding agent. These are beliefs I’ve developed after switching between models a lot during the day in the Cursor IDE.

I don’t want you to adopt my beliefs. I want you to develop your own beliefs through experimentation. If you only ever code with GPT-4.1, you’ll think that LLMs are terrible coders and insufferable communicators.

It’s a rough time to be doing this here in July 2025 because Cursor just nerfed its paid plans and all the first-party coding tools (Codex, Gemini CLI, Claude Code) don’t have feature parity with each other, but do what you can.

Get good at prompting

“Prompting” can sound like a silly skill to work on. But there are specific techniques that I’ve found can greatly improve the output quality in practice. The techniques will probably change over time, and this isn’t a how-to-prompt article, so I won’t get deep into it.

However, I want to give a special mention to one technique: having agents write down plans in Markdown files. Having an agent write down a plan forces the agent to gather enough information to make well-informed decisions, and then gives you an opportunity to review its future actions before they happen, reducing the need for you to step in as it works.

Also, agents are just OK at finding content related to a topic. If you mention a few related files in your prompt, you’re likely to save seconds or minutes of the agent grepping pointlessly. It took me some time to start doing this habitually.

Get the agent to do the right thing more often

Document things and help agents find the docs

Because LLMs are “just” spicy autocomplete, what gets into the context window is important.

Suppose you hire an intern and ask them to write an API endpoint. If there are a dozen gotchas with this and you don’t warn them, the intern is going to hit at least four. On the other hand, if you give them a document about best practices for writing API endpoints, you’re a little more likely to get a good result.

The same principle applies to agents. If you write the doc, and then mention its existence in all the API endpoint files, then agents will probably go read it and follow best practices. And then you can say “hey agent, reread the best practices doc and audit the new code carefully” to catch problems.

A related strategy is to use #hashtags in your code. If an agent sees “#auth-edge-case” in a comment above a function, it will probably search your codebase for the string “#auth-edge-case” and find the large comment explaining exactly what the auth edge case is. A lot of orgs do this already for humans, and it’s just as effective for LLMs. Probably more effective, since LLMs don’t get bored.

If you adopt good documentation practices, they benefit humans and agents alike!

Garden your prompts

Every tool has a special set of Markdown files that get injected into the LLM’s context window. With Claude Code, this is CLAUDE.md. This file can make or break your productivity. If it contains the right set of relevant information, you have to do so much less prompting in each session to get an agent to do the right thing.

My personal project prompt files usually contain the following (a trimmed-down example follows the list):

  • A very brief summary of how the project is structured
  • A list of encouraged and forbidden commands (for example, “always use git --no-pager diff instead of git diff to avoid pagination”)
  • Workflow notes such as universal acceptance criteria (tests, typechecking, and lint must pass) and when to commit or push (which may be never)
  • Style guide rules that can’t be enforced automatically
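
For example, a pared-down CLAUDE.md for a hypothetical TypeScript project might read something like this; every command and detail here is a placeholder for whatever your project actually uses:

This is a Vite + TypeScript web app. Source lives in src/; tests live next to the code as *.test.ts files.
- Run pnpm typecheck, pnpm lint, and pnpm test before calling any task done.
- Always use git --no-pager diff instead of git diff to avoid pagination.
- Never commit or push; I handle version control myself.
- Prefer named exports over default exports.
- Don't add comments that restate what the code does.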

Using bleeding-edge or hipster tech has always been risky. Now it can also prevent agents from being able to code effectively. If something is too new or too unusual to have much written about it around the internet, LLMs are likely to hallucinate. I found this to be true when trying to write a Tauri app, and eventually stopped trying to get my agent to write Tauri code. (At the start of the project, I knew zero Rust and zero Tauri. By the time I dropped the agent, I had learned enough to fly solo. The mediocre agent code got me far enough to start learning organically—I never read a tutorial.)

Get the agent to do the wrong thing less often

Use automated typechecking, linting, and testing

One of the great things about the software engineering industry is that we’re mostly bought into automated checks for fallible human changes. LLMs are either more or less fallible than humans, depending on the task, and so these automated checks remain critical.

I am most productive working with an agent when it can check its own output and notice mistakes based on evidence. If an agent can typecheck, lint, and unit test changes locally, it can do more on its own and self-correct.

Write consistent code, or make “the right way” obvious

LLMs, being stochastic parrots, love repeating what they’ve recently heard. If there’s mostly one way to accomplish a common task in your codebase, and all features mostly do it that way, then an LLM will be able to pattern match well. Without consistency, you’re likely to get the LLM adding yet another way to do the same thing.

Those of you working in mature codebases might be giving a heavy sigh right now. One thing I’ve found helpful in these situations is to mark some specific files as “the right way” and then provide a way for agents to discover these files.

When I worked at Asana, there were three different ways a feature could be written: oldest, old, and current. We automatically annotated every single file with which pattern it used. We did this in the service of humans, before the age of LLMs, but I think it would work well for LLMs as well.

Treat bad agent behavior like a developer experience problem

If an agent does something obviously wrong, try to figure out if you can prevent that behavior automatically in the future. You have many tools available to you. You can write a hook to catch it, or write docs that agents are prompted in CLAUDE.md to read, or add a lint rule, or an automated test. If you’re lacking inspiration, you can ask the agent itself. Its idea might be bad, or it might not.

Sometimes the answer is to just get better at prompting, or to recognize that a class of problems is a bad fit for coding with agents.

Use the full power of the tools

Claude Code has a feature called Hooks which lets you run scripts when things happen. One way we use this at Descript is to force agents to call pnpm instead of npm. pnpm is popular, but npm will always be the 500lb gorilla of the JavaScript packaging world, and it doesn’t bother me that LLMs can’t help but try to use it. Human engineers make the same mistake sometimes, but I can’t add automated hooks into their brains like I can do with Claude!
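
The wiring is specific to Claude Code, so check the hooks documentation for the exact settings schema, but the idea is: register a small script as a PreToolUse hook for the Bash tool, let it inspect the proposed command on stdin, and have it exit with a blocking status that explains the correction back to the agent. A hypothetical sketch of that script (not our production hook):

import json
import sys

# Hypothetical PreToolUse hook: Claude Code passes a JSON payload on stdin
# that includes the proposed Bash command under tool_input.command.
payload = json.load(sys.stdin)
command = payload.get("tool_input", {}).get("command", "")

if command.strip().startswith("npm ") or " npm " in command:
    # A blocking exit (code 2) reports the stderr message back to the agent.
    print("Use pnpm instead of npm in this repository.", file=sys.stderr)
    sys.exit(2)

sys.exit(0)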

Keep momentum

Give agents the tools to see and solve problems without your involvement

If you need to talk to your agent a lot to accomplish simple tasks, you’re probably not moving faster. When agents can test changes and iterate, they can work autonomously. As long as agents are behaving well, autonomy is a very good thing!

Here’s an example. Suppose you have a language model that is able to fix basic test failures. You, a human being, put up a PR that fails a test after a five-minute CI run. You click into the log for the failure and scroll down until you see the failure traceback. You paste it into your agent and say “fix this.” It fixes a bug in your code to get the test to pass, but five minutes later, the next CI run fails because the fixed code reveals another issue. You repeat the process, clicking into the logs, pasting the message, having the agent fix the bug, committing and pushing. You check on the build and see that it passed. Congratulations, you used AI to solve a problem! It only took you fifteen minutes of lost focus.

Now suppose that instead of manually pasting the CI output to your agent, you asked Claude Code to run cimonitor watch after every push and fix problems that have obvious solutions. When the first run fails, the agent notices the error, pushes the first fix, notices the second error, and pushes the second fix. With enough tools to observe the outcome and the ability to run those tools, you fired it off once and went to work on something else. You still had to check on it at the end, but you had two fewer interruptions.

A good general rule is, any time you copy and paste to or from an agent, find a way to automate it, probably using a command line tool or MCP server. Investigate log forwarding or use persistproc to make sure your agents can read your logs.

Multitask

Let’s continue with the example above. Now you’ve got Claude Code waiting on a CI job for your branch, so you’re free to do whatever. But your git clone is tied up with this CI thing! What do you do?

Developers are used to working out of “their clone” of a git repo. Dev tooling in organizations is mostly set up based on this assumption. In a world of agent-assisted coding, I don’t think this is reasonable anymore because agents encourage you to multitask.

I believe this is one of the frontiers of tooling as far as developer productivity goes. Engineers who embrace coding with agents will have multiple worktrees or containers or cloud VMs at the same time, dedicated to different ongoing coding tasks. During the workday I’ve started using a small script that wraps git worktrees and it’s been going well, but I suspect more will happen in this space, especially open source tools.
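
My script is tailored to my own setup, but a toy version of the idea, with invented paths and naming, looks roughly like this: one worktree per task, named after its branch, so each agent session gets its own checkout.

import subprocess
import sys
from pathlib import Path

# Toy sketch: `python worktree.py fix-login-bug` creates a sibling directory
# ../<repo>-worktrees/fix-login-bug on a new branch with the same name.
def main() -> None:
    task = sys.argv[1]
    repo_root = Path(
        subprocess.check_output(
            ["git", "rev-parse", "--show-toplevel"], text=True
        ).strip()
    )
    worktree_dir = repo_root.parent / f"{repo_root.name}-worktrees" / task
    worktree_dir.parent.mkdir(parents=True, exist_ok=True)
    subprocess.check_call(["git", "worktree", "add", "-b", task, str(worktree_dir)])
    print(f"Now start an agent session in {worktree_dir}")

if __name__ == "__main__":
    main()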

Prompt boldly, commit conservatively

One way in which I differ from a lot of AI hypefolk is that I don’t let agents commit my changes. I manage the staging area using Tower and make sure I understand every change that goes into a commit.

This workflow allows me to ask an agent to do something really ambitious, and just back out of the whole thing if it goes badly. It’s one area where my “classical engineering productivity” skill of being good at Git has served me well in the coding-with-agents era.

This is how I operate at work. But at home, on side projects, I do let Claude commit on my behalf because the stakes are lower and the problems are simpler.

It is worth finding out whether it works for you

This might sound like a lot of work. Maybe you’re skeptical that this can ever work. Maybe this isn’t why you got into coding and agent-assisted coding seems like a black hole for joy. Maybe you don’t like the vibes you get from AI companies. Or maybe your coworkers are submitting low-quality AI-generated pull requests without checking the results and you don’t want to be like that.

I’ve heard all of these things from friends, coworkers, and social media posts. I agree with at least one.

But despite all that, I find myself seeing a step change in my output without a drop in quality. It energizes me because I love building, and it scares me because I believe the bar will rise for engineering productivity across the industry and I need to stay employed.

So, seeing an opportunity to become more productive without spending more time, I take it. It seems like the rational thing to do. This entire article is a distillation of what I’ve learned just by practicing and experimenting. I still think I’m half as productive as I could be with the right tooling, and there are so many new plugins to install.

No part of this post was generated by an LLM.

MDN is great as a reference, but I haven't found a source of truth for modern CSS best practices. Once in a while I run across an article that captures a small piece of it. Here's a list of those articles.

Hit me up on Mastodon if I should add anything to the list.

There are three patterns I use in most of my UIKit projects that I've never seen anyone else talk about. I think they help readability a lot, so I'm sharing them here:

  1. An addSubviews method to define your view hierarchy all at once
  2. An @AssignedOnce property wrapper
  3. A pattern for keeping view creation at the bottom of a file to keep the top clean

addSubviews

I've seen a lot of view and view controller code that looks like this:

override func loadView() {
  view = UIView()
  let scrollView = UIScrollView()
  view.addSubview(scrollView)
  let contentView = MyContentView()
  scrollView.addSubview(contentView)
  let topLabel = UILabel()
  let button = UIButton()
  contentView.addSubview(topLabel)
  contentView.addSubview(button)
}

This style of setup is straightforward, but it usually takes me longer than I'd like to work out the view hierarchy from it.

In most of my projects, I use this simple extension to help with this problem:

extension UIView {
  @discardableResult
  func addSubviews(_ subviews: UIView...) -> UIView {
    subviews.forEach { self.addSubview($0) }
    return self // returning self lets calls nest to mirror the hierarchy
  }
}

Now, it's possible for the calls to addSubviews() to visually resemble the view hierarchy!

override func loadView() {
  view = UIView()
  let scrollView = UIScrollView()
  let contentView = MyContentView()
  let topLabel = UILabel()
  let button = UIButton()
  
  view.addSubviews(
    scrollView.addSubviews(
      contentView.addSubviews(
        topLabel,
        button)))
}

You can also use this pattern in UIView initializers.

@AssignedOnce

When using storyboards, you commonly need to use force-unwrapped optional vars to keep references to views and other objects. That's fine, but there is no compile-time guarantee that the property won't be overwritten. Kotlin solves this problem with the lateinit keyword, but Swift has no equivalent.

You can at least prevent multiple writes to vars at runtime by using this simple property wrapper, which throws an assertion failure in debug builds if you write to the property more than once. It's not as good as a compile-time guarantee, but it does double as inline documentation.

@propertyWrapper
public struct AssignedOnce<T> {
  #if DEBUG
    private var hasBeenAssignedNotNil = false
  #endif

  public private(set) var value: T!

  public var wrappedValue: T! {
    get { value }
    
    // Normally you don't want to be running a bunch of extra code when storing values, but
    // since you should only be doing it one time, it's not so bad.
    set {
      #if DEBUG
        assert(!hasBeenAssignedNotNil)
        if newValue != nil {
          hasBeenAssignedNotNil = true
        }
      #endif

      value = newValue
    }
  }

  public init(wrappedValue initialValue: T?) {
    wrappedValue = initialValue
  }
}

In practice, you can just add @AssignedOnce in front of any properties you want to prevent multiple assignment to:

class MyViewController: UIViewController {
  @AssignedOnce var button: UIButton! // assigned by storyboard
  @AssignedOnce var label: UILabel! // assigned by storyboard
}

Looks pretty nice, right?

View Factories

The most critical part of any source code file is the first hundred lines. If you're browsing through code, it really helps to not have to scroll very much to see what's going on.

Unfortunately, it's very easy to gum up the top of a view controller file by creating subviews over multiple lines, especially if (like me) you don't use storyboards at all. Here's what I mean:

class MyViewController: UIViewController, UITableViewDataSource, UITableViewDelegate {
  // I'm declaring all these as FUO `var`s instead of `let`s
  // so I can instantiate them in loadView().
  @AssignedOnce private var headerLabel: UILabel!
  @AssignedOnce private var tableView: UITableView!
  @AssignedOnce private var continueButton: UIButton!
  
  override func loadView() {
    view = UIView()
    
    headerLabel = UILabel()
    headerLabel.text = NSLocalizedString("List of things:", comment: "")
    headerLabel.font = UIFont.preferredFont(forTextStyle: .largeTitle)
    headerLabel.textAlignment = .center
    headerLabel.textColor = UIColor.systemBlue
    
    continueButton = UIButton()
    continueButton.setTitle(NSLocalizedString("Continue", comment: ""), for: .normal)
    continueButton.addTarget(self, action: #selector(continueAction), for: .touchUpInside)
    
    tableView = UITableView()
    tableView.dataSource = self
    tableView.delegate = self
    // more semi-arbitrary table view configuration
    tableView.separatorStyle = .none
    tableView.showsVerticalScrollIndicator = false
    
    view.addSubviews(
      headerLabel,
      tableView,
      continueButton)
      
    // ... add constraints ...
  }
  
  // MARK: Actions
  
  @objc private func continueAction() {
    dismiss(animated: true, completion: nil)
  }
  
  // MARK: UITableViewDataSource
  
  /* ... */
  
  // MARK: UITableViewDelegate
  
  /* ... */
}

This is OK, but do you really need to know the implementation details of all the views so near the top of the file? In my experience, those parts of the code are written once and then never touched again.

Additionally, it's not great to use force-unwrapped optionals to store anything. But if we use let instead, then all views will be created at init time instead of in loadView().

We can solve a lot of problems by moving all view creation to the bottom of the file and using lazy var.

class MyViewController: UIViewController, UITableViewDataSource, UITableViewDelegate {
  private lazy var headerLabel = makeHeaderLabel()
  private lazy var tableView = makeTableView()
  private lazy var continueButton = makeContinueButton()
  
  override func loadView() {
    view = UIView()
    
    view.addSubviews(
      headerLabel,
      tableView,
      continueButton)
      
    // ... add constraints ...
  }
  
  // MARK: Actions
  
  @objc private func continueAction() {
    dismiss(animated: true, completion: nil)
  }
  
  // MARK: UITableViewDataSource
  
  /* ... */
  
  // MARK: UITableViewDelegate
  
  /* ... */
  
  // MARK: View factories
  
  private func makeHeaderLabel() -> UILabel {
    let headerLabel = UILabel()
    headerLabel.text = NSLocalizedString("List of things:", comment: "")
    headerLabel.font = UIFont.preferredFont(forTextStyle: .largeTitle)
    headerLabel.textAlignment = .center
    headerLabel.textColor = UIColor.systemBlue
    return headerLabel
  }
  
  private func makeTableView() -> UITableView {
    let tableView = UITableView()
    tableView.dataSource = self
    tableView.delegate = self
    // more semi-arbitrary table view configuration
    tableView.separatorStyle = .none
    tableView.showsVerticalScrollIndicator = false
    return tableView
  }
  
  private func makeContinueButton() -> UIButton {
    let continueButton = UIButton()
    continueButton.setTitle(NSLocalizedString("Continue", comment: ""), for: .normal)
    // `self` is available inside `lazy var` method calls!
    continueButton.addTarget(self, action: #selector(continueAction), for: .touchUpInside)
    return continueButton
  }
}

The main advantage of this approach is that rarely-touched view creation code is both in a predictable place, and completely out of the way if you're browsing lots of files quickly. A bonus is that FUOs are not necessary due to the use of lazy var. And the factory method return types enable you to remove the explicit types from the property declarations.

That's all, thanks for reading!

In the middle of 2020, I was inspired to take everything I had learned from working on Literally Canvas and Buildy, and make a multi-user online whiteboard. The result is Browserboard, and I've been improving it regularly for the past year and a half.

screenshot

If you want to read more about it, check out the Browserboard Blog.

Yesterday, I added PNG export to Browserboard, my multiplayer whiteboard web app. About half the effort was spent getting text to render correctly. 90% of what I found about this topic on Google was garbage, especially on Stack Overflow, so here's my attempt at clearing things up for people who want to do this without relying on somebody's half-baked code snippet.

HTML Canvas has not changed in 18 years

Apple created the <canvas> element to enable Mac developers to create widgets for Dashboard, a feature that died in 2019. Canvas replicates a subset of the Core Graphics C API in JavaScript. Naturally, this hack intended for a proprietary OS feature became the foundation of thousands of browser games and the only browser-native way to generate PNG images from JavaScript.

Because <canvas> mirrors a macOS graphics API rather than anything designed for the web, it wasn't built with the rest of the web platform in mind. In particular, its text rendering capabilities are extremely poor. Some issues include:

  1. Line breaks are ignored.
  2. Only one font and style can be used at a time; mixing styles within a single piece of text is not possible.
  3. Text will not wrap automatically. There is a “max width” argument, but it stretches the text instead of wrapping.
  4. Only pixel-based font and line sizes are supported.

So the best we can hope for in “multi-line text support in <canvas>” is to support line breaks, text wrapping, and a single font style. Supporting right-to-left languages is an exercise for the reader.

There is a good JavaScript library for drawing multi-line text in <canvas>

There's a library called Canvas-Txt that is as good as it gets, but for some reason doesn't rise above the blog trash on Google.

If you got here just trying to figure out how to accomplish this one task, there is your answer. You can stop here.

Deriving <canvas> text styles from the HTML DOM

For Browserboard's PNG export, I needed a way to configure Canvas-Txt to match my quill.js contenteditable text editor. The key is CSSStyleDeclaration.getPropertyValue(), which lets you read the value of any CSS property from an element's computed style.

The TypeScript code snippet below finds the first leaf node in a DOM element and applies its styles to Canvas-Txt. (If you're using JavaScript, you can just delete all the type declarations and it should work.)

import canvasTxt from "canvas-txt";

function findLeafNodeStyle(ancestor: Element): CSSStyleDeclaration {
  let nextAncestor = ancestor;
  while (nextAncestor.children.length) {
    nextAncestor = nextAncestor.children[0];
  }
  return window.getComputedStyle(nextAncestor, null);
}

function renderText(
  element: HTMLElement,
  canvas: HTMLCanvasElement,
  x: number,
  y: number,
  maxWidth: number = Number.MAX_SAFE_INTEGER,
  maxHeight: number = Number.MAX_SAFE_INTEGER
) {
  const ctx = canvas.getContext("2d");
  if (!ctx) return;

  // const format = this.quill.getFormat(0, 1);
  const style = findLeafNodeStyle(element);

  ctx.font = style.getPropertyValue("font");
  ctx.fillStyle = style.getPropertyValue("color");

  canvasTxt.vAlign = "top";
  canvasTxt.fontStyle = style.getPropertyValue("font-style");
  canvasTxt.fontVariant = style.getPropertyValue("font-variant");
  canvasTxt.fontWeight = style.getPropertyValue("font-weight");
  canvasTxt.font = style.getPropertyValue("font-family");
  // This is a hack that assumes you use pixel-based line heights.
  // If you're rendering at something besides 1x, you'll need to multiply this.
  canvasTxt.lineHeight = parseFloat(style.getPropertyValue("line-height"));
  // This is a hack that assumes you use pixel-based font sizes.
  // If you're rendering at something besides 1x, you'll need to multiply this.
  canvasTxt.fontSize = parseFloat(style.getPropertyValue("font-size"));

  // you could probably just assign the value directly, but in TypeScript
  // we try to explicitly handle every possible case.
  switch (style.getPropertyValue("text-align")) {
    case "left":
      canvasTxt.align = "left";
      break;
    case "right":
      canvasTxt.align = "right";
      break;
    case "center":
      canvasTxt.align = "center";
      break;
    case "start":
      canvasTxt.align = "left";
      break;
    case "end":
      canvasTxt.align = "right";
      break;
    default:
      canvasTxt.align = "left";
      break;
  }

  canvasTxt.drawText(
    ctx,
    // Use the element's rendered text rather than reaching back into the editor.
    element.innerText,
    x,
    y,
    maxWidth,
    maxHeight
  );
}

So there you go. Good luck.

This is the fifth post in a series about my new app Oscillator Drum Jams. Start here in Part 1.

You can download Oscillator Drum Jams at oscillatordrums.com.

Earlier this year I learned that Garageband on my iPhone can do multitrack recording when I plug it into my 16-channel USB audio interface. This is an object less than 6 inches long handling tasks that would have required thousands of dollars of equipment twenty years ago, accessible to most teenagers today.

The audio system running on iPhones was designed in 2003 for desktop Macs running at around 800 MHz, slightly slower than the original iPhone’s processor. It’s a complex system, but the high level APIs are consistent and well-documented. As a result, there are many fantastic audio-related apps on the store: synthesizers, metronomes, music players, and toys that make music creation accessible to anyone. And because there’s a USB audio standard shared between iOS and macOS, there’s no need to install drivers.

I’m really grateful that I’m able to build on the work of past engineers to make Oscillator Drum Jams. It wasn’t easy, but I was ultimately able to ship it because the pieces already exist and can be plugged together by a single person working on occasional nights and weekends.

I’m also grateful that I got the opportunity to work on this project with Jake, whose passion and dedication to his music meant that we had over a hundred loops to share with the drum students of the world.

This is the fourth post in a series about my new app Oscillator Drum Jams. Start here in Part 1.

You can download Oscillator Drum Jams at oscillatordrums.com.

This will be a shallower post than the others in this series. I just want to point out a few things.

The original UI was backwards

When I started this project, I thought about it like a programmer: as a view of a collection of data. So I naturally created a hierarchical interface: pages contain exercises, and exercises have a bunch of stuff in them. I worked really hard on an “exercise card” that would slide up from the bottom of the screen and could be swiped up to show more detail or swiped down to be hidden.

Screenshot of old design

After an embarrassing amount of time, I realized I was optimizing for the wrong thing. Really, I was optimizing for nothing. I finally asked myself what people would want to do while using this app. My speculative answers were, in order of frequency:

  1. Stop and start loops
  2. Adjust the tempo
  3. Go to another exercise within the same page
  4. Go to another page

With that insight—and no user research, so never hire me as a PM—I made some wireframes:

Wireframe 1

Wireframe 2

I shed a single tear for my “wasted” work and spent the next couple of weekends replacing all of my UI code.

Although the iPad wireframe was still a bit silly, we ended up in a pretty good place. The important thing is the play/pause button is nice and big. At some point I expect to rearrange all the controls on iPad, though, because the arrangement doesn't have any organizing principle to it.

(It does look much better than it could have due to the efforts of designer Hannah Lipking!)

Final screenshot of iPhone app

Final screenshot of iPad app

AutoLayout is a pain

I did this whole project using nothing but NSLayoutConstraint for layout, and I regret it. Cartography or FlexLayout would have saved me a lot of time and bugs.

Continue to Part 5: Coda

This is the third post in a series about my new app Oscillator Drum Jams. Start here in Part 1.

You can download Oscillator Drum Jams at oscillatordrums.com.

With my audio assets in place, I started work on a proof of concept audio player and metronome.

The audio player in Oscillator has three requirements:

  1. It must support multiple audio streams playing exactly in sync.
  2. It must loop perfectly.
  3. It must include a metronome that matches the audio streams at any tempo.

Making the audio player work involved solving a bunch of really easy problems and one really hard problem. I’m going to gloss over lots of detail in this post because I get a headache just thinking about it.

AudioKit

I used AudioKit, a Swift wrapper on top of Core Audio with lots of nice classes and utilities. My computer audio processing skills are above average but unsophisticated, and using AudioKit might have saved me time.

I say “might have saved me time” because using AudioKit also cost me time. They changed their public APIs several times in minor version bumps over the two years I worked on this project, and the documentation about the changes was consistently poor. I figured things out eventually by experimenting and reading the source code, but I wonder if I would have had an easier time learning Core Audio myself instead of dealing with a feature-rich framework that loves rapid change and hates documentation.

Time stretching is easy unless you want a metronome

Playing a bunch of audio tracks simultaneously and adjusting their speed is simple. Create a bunch of audio players, set them to loop, and add a time pitch that changes their speed and length without affecting their pitch.

My first attempt for adding a metronome to these tracks was to keep doing more of the same: record the metronome to an audio track with the same length as the music tracks and play them simultaneously.

This syncs up perfectly, but sounds horrible when you play it faster or slower than the tempo it was recorded at. This is because each tick of a metronome is supposed to be a sharp transient. If you shorten the metronome loop track, each metronome tick becomes shorter, and because the algorithm can’t preserve all the information accurately, it gets distorted and harder to hear. If you lengthen the metronome loop track, the peak of the metronome’s attack is stretched out, so the listener can’t hear a distinct “tick” that tells them exactly when the beat occurs.

My first solution to this was to use AudioKit’s built-in AKMetronome class. This almost worked, but because it was synchronized to beats-per-minute rather than the sample length of the music tracks, it would drift over time due to tiny discrepancies in the number of audio ticks between the two.

My second, third, and fourth solutions were increasingly hacky variations on my first solution.

My fifth and successful metronome approach was to use a MIDI sequencer that triggers a callback function on each beat. On the first beat, the music loops are all triggered simultaneously, and a metronome beat is played. On subsequent beats, just the metronome is played.

Metronome timing is hard

With a metronome that never drifted, I still had an issue: the metronome would consistently play too late when the music was sped up, and too early when the music was slowed down.

The reason is obvious when you look at the waveforms:

Illustration of waveforms

The peak of each waveform doesn't match exactly with the mathematical location of each beat, because each instrument’s note has an attack time between the start of the beat and the peak of the waveform. When we slow down a loop, the attack time increases, but the metronome attack time is the same, so the music starts to sound “late” relative to the metronome. If we speed it up, the attack time decreases, and it starts to sound “early.”

To get around this, I did some hand-wavey math that nudges the metronome forward or backward in time relative to the time pitch adjustment applied to the music tracks.

This approach uses the CPU in real time, which adds risk of timing problems when the system is under load, but in practice it seems to work fine.

Continue to Part 4: The Interface

This is the second post in a series about my new app Oscillator Drum Jams. Start here in Part 1.

You can download Oscillator Drum Jams at oscillatordrums.com.

To start making this app, I couldn’t just fire up Xcode and get to work. The raw materials were (1) a PDF ebook, and (2) a Dropbox folder full of single-instrument AIFF tracks exported from Jake’s Ableton sessions. Neither of those things could ship in the app as-is; I needed compressed audio tracks, icons for each track representing the instrument, and the single phrase of sheet music for every individual exercise.

Screenshot of Oscillator with controls for each track

Processing the audio

Each music loop has multiple instruments plus a drum reference that follows the sheet music. We wanted to isolate them so people could turn them on and off at will, so each exercise has 3-6 audio files meant to be played simultaneously.

Jake made the loops in Ableton, a live performance and recording tool, and its data isn’t something you can just play back on any computer, much less an iPhone. So Jake had to export all the exercises by hand in Ableton’s interface.

Ableton Live

We had to work out a system that would minimize his time spent clicking buttons in Ableton’s export interface, and minimize my time massaging the output for use in the app. Without a workflow that minimizes human typing, it’s too easy to introduce mistakes.

The system we settled on looked like this:

p12/
  #7/
    GUITAR.aif
    BASS.aif
    Drum Guide.aif
    Metronome.aif
  #9/
    BASS.aif
    Drum Guide.aif
    GUITAR.aif
    Metronome.aif

p36 50BPM triplet click/
  #16/
    Metronome.aif
    GUITAR.aif
    BASS.aif
    RHODES.aif
    MISC.aif
    Drum Guide.aif

The outermost folder contains the page number. Each folder inside a page folder contains audio loops for a single exercise. The page or the exercise folder name may contain a tempo (“50BPM”) and/or a time signature note (“triplet click”, “7/8”). This notation is pretty ad hoc, but we only needed to handle a few cases. We changed the notation a couple of times, so there were a couple more conventions that worked the same way with slight differences.

I wrote a simple Python script to walk the directory, read all that messy human-entered data using regular expressions, and output a JSON file with a well-defined schema for the app to read. I wanted to keep the iOS code simple, so all the technical debt related to multiple naming schemes lives in that Python script.
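
A simplified sketch of that kind of script, with invented paths and a trimmed-down schema, looks roughly like this:

import json
import re
from pathlib import Path

SOURCE = Path("source_audio")  # hypothetical path to the exported Dropbox folder

# Page folders look like "p12" or "p36 50BPM triplet click"; exercise folders look like "#7".
PAGE_RE = re.compile(r"^p(?P<page>\d+)(?:\s+(?P<bpm>\d+)BPM)?(?:\s+(?P<note>.+))?$")

pages = []
for page_dir in sorted(SOURCE.iterdir()):
    match = PAGE_RE.match(page_dir.name)
    if not page_dir.is_dir() or not match:
        continue
    exercises = []
    for exercise_dir in sorted(page_dir.glob("#*"), key=lambda p: int(p.name.lstrip("#"))):
        exercises.append({
            "number": int(exercise_dir.name.lstrip("#")),
            "tracks": sorted(f.stem for f in exercise_dir.glob("*.aif")),
        })
    pages.append({
        "page": int(match.group("page")),
        "bpm": int(match.group("bpm")) if match.group("bpm") else None,
        "note": match.group("note"),
        "exercises": exercises,
    })

Path("manifest.json").write_text(json.dumps(pages, indent=2))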

The audio needed another step: conversion to a smaller format. AIFF, FLAC, or WAV files are “lossless,” meaning they contain 100% of the original data, but none of those formats can be made small enough to ship in an app. I’m talking gigabytes instead of megabytes. I needed to convert them to a “lossy” format, one that discards a little bit of fidelity but is much, much smaller.

I first tried converting them to MP3. This got the app down to about 200 MB, but suddenly the beautiful seamless audio tracks had stutters between each loop. When I looked into the problem, I learned that MP3 files often contain extra data at the end because of how the compression algorithm works, making seamless looping very complex. MP3 was off the table.

Fortunately, there are many other lossy audio formats supported on iOS, and M4A/MPEG-4 has perfect looping behavior.

Finally, because Jake’s Ableton session sometimes contains unused instruments, I needed to delete files that contained only silence. This saved Jake a lot of time toggling things on and off during the export process. I asked FFmpeg to find all periods of silence in a file, and if a file had exactly one period of silence exactly as long as the track, I could safely delete the file.

Here’s how you find the silences in a file using FFmpeg:

ffmpeg
  -i <PATH>
  -nostdin
  -loglevel 32
  -af silencedetect=noise=-90dB:d=0.25
  -f null
  -
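
silencedetect reports its findings on stderr as lines like “silence_start: 0” and “silence_end: 96.5 | silence_duration: 96.5”. The check in my script was conceptually something like this sketch; the helper name, the tolerance, and how the expected duration is obtained are all illustrative:

import re
import subprocess

def is_entirely_silent(path: str, expected_duration: float) -> bool:
    """True if silencedetect reports a single silence spanning the whole file."""
    # expected_duration would come from ffprobe or from the export settings.
    result = subprocess.run(
        ["ffmpeg", "-nostdin", "-i", path,
         "-af", "silencedetect=noise=-90dB:d=0.25", "-f", "null", "-"],
        capture_output=True, text=True,
    )
    durations = [float(d) for d in re.findall(r"silence_duration: ([\d.]+)", result.stderr)]
    return len(durations) == 1 and abs(durations[0] - expected_duration) < 0.5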

Here’s how the audio pipeline ended up working once I had worked out all the kinks:

  1. Loop over all the lossless AIFF audio files in the source folder.
  2. Figure out if a file is silent. Skip it if it is.
  3. Convert the AIFF file to M4A and put it in the destination folder under the same path.
  4. Look at all the file names in the destination folder and output a JSON file listing the details for all pages and exercises.

Creating the images

The exercise images were part of typeset sheet music like this:

Sheet music

There were enough edge cases that I never considered automating the identification of exercises in a page, but I also never considered doing it by hand in an image editor. No, I am a programmer, and I would rather spend 4 hours writing a program to solve the problem than spend 4 hours solving the problem by hand!

I started by using ImageMagick to convert the PDF into PNGs. Then I wrote a single-HTML-file “web app” that used JavaScript to display each page of sheet music, with a red rectangle following my mouse. The JavaScript code assigned keys 1-9 to different rectangle shapes, so pressing a key would change the size of the rectangle. When I clicked, the rectangle would “stick” and I could add another one. The points were all stored as fractions of the width and height of the page, in case I decided to change the PPI (pixels per inch) of the PNG export. I’m glad I made that choice because I tweaked the PPI two or three times before shipping.

Here’s what that looked like to use:

Red rectangles around sheet music

The positions of all the rectangles on each page were stored in Safari’s local storage as JSON, and when I finished, I simply copied the value from Safari’s developer tools and pasted it into a text file.

Now that I had a JSON file containing the positions of every exercise on every page, I could write another Python script using Pillow to crop all the individual exercise images out of each page PNG.

But that wasn’t enough. The trouble with hand-crafted data is you get hand-crafted inconsistencies! Each exercise image had a slightly different amount of whitespace on each side. So I added some code to my image trimming script that would detect how much whitespace was around each exercise image, remove it, and then add back exactly 20 pixels of whitespace on each side.
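
Conceptually, the crop-and-trim step works like the sketch below; the names are invented, and it assumes white page backgrounds and the fractional rectangles described above.

from PIL import Image, ImageChops, ImageOps

def crop_exercise(page_png: str, rect: dict) -> Image.Image:
    """Crop a fractional rectangle out of a page, trim the whitespace, and re-pad to 20px."""
    page = Image.open(page_png).convert("RGB")
    width, height = page.size
    # rect values are fractions of the page size, e.g. {"x": 0.1, "y": 0.2, "w": 0.8, "h": 0.1}
    box = (
        int(rect["x"] * width),
        int(rect["y"] * height),
        int((rect["x"] + rect["w"]) * width),
        int((rect["y"] + rect["h"]) * height),
    )
    exercise = page.crop(box)
    # On a white background, inverting the image makes content nonzero, so getbbox()
    # returns the bounding box of everything that isn't whitespace.
    content_box = ImageChops.invert(exercise).getbbox()
    if content_box:
        exercise = exercise.crop(content_box)
    return ImageOps.expand(exercise, border=20, fill="white")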

I still wish I had found a way to remove the number in the upper left corner, but at the end of the day I had to ship.

Diagrams of the asset pipeline

Ableton session → Jake exports to AIFF → Dropbox folder of AIFF files, some all silence → Python script → m4a files omitting silent ones → Python script → JSON manifest

Book PDF → convert to PNG → Steve adds rectangles using special tool → Python script → exercise PNGs

Continue to Part 3: The Audio Player

For the past two years, I’ve been slowly working with my drum teacher Jake Wood on an interactive iOS companion to Oscillator, his drum book for beginners. The app is called Oscillator Drum Jams, and it’s out now!

Jake wrote almost 150 music loops tailored to individual exercises in the book. The app lets you view the sheet music for each exercise and play the corresponding loop at a range of tempos.

Instead of practicing all week to a dry metronome, or spending time making loops in a music app like Figure, students can sit down with nothing but their phone and have all the tools they need to be productive and engaged.

The app supports all iPhone and iPad models that can run iOS 11, in portrait or landscape mode.

This project ties together a lot of skills, and I’m going to unpack them in a series of posts following this one.

If you enjoy this series, you might also want to check out my procedural guitar pedal generator.