Benchmarks and Profiling Out of the Box

Here’s something that surprised me: Go has built-in benchmarking. No BenchmarkDotNet to install. No configuration. Write a function, run go test -bench=., get numbers.

And profiling? Also built in. CPU profiles, memory profiles, blocking profiles. All from the standard toolchain.

Coming from .NET, where you install BenchmarkDotNet for proper benchmarks and a separate profiler for profiling, I find Go’s integrated approach refreshing.

Writing Benchmarks

Benchmark functions start with Benchmark and take *testing.B:

func BenchmarkAdd(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Add(2, 3)
    }
}

That b.N is key. The framework controls it: it calls your function repeatedly with a growing b.N until the loop runs long enough (one second by default) to give a stable per-operation time.

Run with:

go test -bench=.                    # all benchmarks (the value is a regexp)
go test -bench=BenchmarkAdd         # benchmarks whose name matches BenchmarkAdd
go test -bench=. -benchtime=5s      # run each benchmark for about 5 seconds
go test -bench=. -count=10          # run each benchmark 10 times

Output:

BenchmarkAdd-8    1000000000    0.3192 ns/op

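Translation: the -8 suffix is the GOMAXPROCS value (8 here, which usually matches the number of logical CPUs), the benchmark ran 1 billion iterations, and each took ~0.32 nanoseconds. A number that small usually means the compiler inlined or discarded the call; see Avoid Compiler Optimisation under Practical Tips.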
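To watch the framework pick that iteration count, here is a minimal sketch that runs a benchmark from an ordinary program with testing.Benchmark and records every b.N it tries. Add, sink and runAddBenchmark are my own stand-ins for illustration, not part of the standard library.

```go
package main

import (
	"fmt"
	"testing"
)

// Add is a stand-in for the function under test.
func Add(a, b int) int { return a + b }

// sink receives the result so the compiler cannot discard the call.
var sink int

// runAddBenchmark runs the benchmark outside "go test" and returns every
// b.N value the framework tried, plus the final result.
func runAddBenchmark() ([]int, testing.BenchmarkResult) {
	testing.Init() // registers the testing flags; needed outside "go test"

	var tried []int
	res := testing.Benchmark(func(b *testing.B) {
		tried = append(tried, b.N) // the function is called once per attempt
		for i := 0; i < b.N; i++ {
			sink = Add(2, 3)
		}
	})
	return tried, res
}

func main() {
	tried, res := runAddBenchmark()
	fmt.Println("b.N values tried:", tried)
	fmt.Println("result:", res) // iterations and ns/op, as in the go test output
}
```

The first attempt uses b.N = 1, and later attempts grow until the loop has run for the target time. That is also why anything you do outside the loop runs several times, not once.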

Comparing to BenchmarkDotNet

In C#, you’d write:

[Benchmark]
public int Add() => Math.Add(2, 3);

Then run the benchmark project. BenchmarkDotNet gives you beautiful output, statistical analysis, memory diagnostics, multiple runtimes…

Go’s benchmarks are simpler. Less analysis, less ceremony, faster feedback.

Benchmarking with Setup

Don’t include setup in the measured loop:

func BenchmarkParse(b *testing.B) {
    data := loadTestData()  // setup, not measured
    
    b.ResetTimer()  // start timing from here
    
    for i := 0; i < b.N; i++ {
        Parse(data)
    }
}

b.ResetTimer() excludes setup time from measurements.
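b.ResetTimer() handles one-off setup. When every iteration needs fresh input, b.StopTimer() and b.StartTimer() can pause the clock inside the loop. A rough sketch, assuming a hypothetical shuffled helper for test data; sort.Ints is just a convenient example of a function that mutates its input:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"testing"
)

// shuffled returns a deterministic pseudo-random permutation of 0..n-1.
// It is a hypothetical helper for generating test data.
func shuffled(n int) []int {
	return rand.New(rand.NewSource(1)).Perm(n)
}

func benchSortFresh(b *testing.B) {
	src := shuffled(10000) // one-off setup
	buf := make([]int, len(src))
	b.ResetTimer() // discard the time spent above

	for i := 0; i < b.N; i++ {
		b.StopTimer()  // pause the clock for per-iteration setup
		copy(buf, src) // restore the unsorted input; not measured
		b.StartTimer() // resume just before the code under test
		sort.Ints(buf)
	}
}

// runSortBenchmark runs the benchmark outside "go test".
func runSortBenchmark() testing.BenchmarkResult {
	testing.Init() // registers the testing flags; needed outside "go test"
	return testing.Benchmark(benchSortFresh)
}

func main() {
	fmt.Println(runSortBenchmark())
}
```

Pausing and resuming the timer has a cost of its own, so this pattern suits operations that take microseconds or more. For very cheap operations it can make the benchmark run for a long time.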

Memory Benchmarks

Track allocations:

go test -bench=. -benchmem

Output:

BenchmarkParse-8    50000    32145 ns/op    4096 B/op    12 allocs/op

Now you see: 4096 bytes allocated per operation, 12 separate allocations.

You can also report memory in code:

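// sink is package-level so the slice escapes to the heap;
// otherwise the compiler may keep it on the stack and report 0 allocs/op.
var sink []byte

func BenchmarkAllocations(b *testing.B) {
    b.ReportAllocs()  // like -benchmem, but only for this benchmark
    for i := 0; i < b.N; i++ {
        sink = make([]byte, 1024)
    }
}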
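The same allocation counters are available on the benchmark result, so you can read them programmatically. A small sketch under my own naming (benchAlloc, measureAllocs and sink are illustrative); the package-level sink is there so the slice escapes to the heap and actually gets counted:

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so the slice escapes to the heap.
var sink []byte

func benchAlloc(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = make([]byte, 1024)
	}
}

// measureAllocs returns allocations and bytes allocated per operation.
func measureAllocs() (allocs, bytes int64) {
	testing.Init() // registers the testing flags; needed outside "go test"
	res := testing.Benchmark(benchAlloc)
	return res.AllocsPerOp(), res.AllocedBytesPerOp()
}

func main() {
	allocs, bytes := measureAllocs()
	fmt.Printf("%d allocs/op, %d B/op\n", allocs, bytes)
}
```

This is handy when you want a regular test to fail if a hot path starts allocating more than it used to.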

Benchmark Comparison

Compare different implementations:

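// Package-level variables stop the compiler from folding the
// concatenation into a single constant at compile time.
var (
    hello = "hello"
    world = "world"
)

func BenchmarkConcatPlus(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s := hello + " " + world
        _ = s
    }
}

func BenchmarkConcatBuilder(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var sb strings.Builder
        sb.WriteString(hello)
        sb.WriteString(" ")
        sb.WriteString(world)
        _ = sb.String()
    }
}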

Run both:

go test -bench=BenchmarkConcat

Use benchstat for statistical comparison:

go install golang.org/x/perf/cmd/benchstat@latest

go test -bench=. -count=10 > old.txt
# make changes
go test -bench=. -count=10 > new.txt

benchstat old.txt new.txt

The output lists the old and new figures for each benchmark side by side, with the percentage change and whether it is statistically significant.

Sub-Benchmarks

Test different sizes:

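func BenchmarkSort(b *testing.B) {
    sizes := []int{10, 100, 1000, 10000}
    
    for _, size := range sizes {
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            data := generateData(size)
            buf := make([]int, size)
            b.ResetTimer()
            
            for i := 0; i < b.N; i++ {
                // sort.Ints sorts in place, so restore the unsorted input
                // each time (the copy is included in the measurement)
                copy(buf, data)
                sort.Ints(buf)
            }
        })
    }
}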

Output:

BenchmarkSort/size=10-8       5000000     234 ns/op
BenchmarkSort/size=100-8       500000    3456 ns/op
BenchmarkSort/size=1000-8       30000   45678 ns/op
BenchmarkSort/size=10000-8       2000  678901 ns/op

CPU Profiling

Generate a CPU profile:

go test -bench=. -cpuprofile=cpu.prof

Analyse with pprof:

go tool pprof cpu.prof

Interactive commands:

(pprof) top           # hottest functions by self time
(pprof) top --cum     # sorted by cumulative time (function plus its callees)
(pprof) list FuncName # source annotation
(pprof) web           # open call graph in browser (needs Graphviz)

Or the web UI:

go tool pprof -http=:8080 cpu.prof

Opens a browser with flame graphs, call graphs, source annotations.

Memory Profiling

Generate memory profiles:

go test -bench=. -memprofile=mem.prof

Analyse:

go tool pprof mem.prof
(pprof) top           # biggest allocators
(pprof) list FuncName # source with allocation sizes

Two views:

go tool pprof -alloc_space mem.prof  # total bytes allocated
go tool pprof -inuse_space mem.prof  # bytes in use at snapshot

Runtime Profiling

For running applications (not benchmarks):

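import (
    "log"
    "net/http"
    _ "net/http/pprof"  // side-effect import: registers /debug/pprof/ on the default mux
)

func main() {
    go func() {
        // localhost only, so the profiling endpoints are not publicly reachable
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    
    // your application
}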

Now you can:

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"  # 30-second CPU profile
go tool pprof http://localhost:6060/debug/pprof/heap                  # heap snapshot
go tool pprof http://localhost:6060/debug/pprof/goroutine             # goroutine stacks

Profile a live application without restarting.

The Comparison

Feature	BenchmarkDotNet	Go Benchmarks
Setup	NuGet + project	Built-in
Statistical analysis	Extensive	Basic (use benchstat)
Memory tracking	Detailed	Good
Profiling integration	Separate tools	Built-in
Output format	Beautiful	Functional
Multiple runtimes	Yes	No (just Go)
Configuration	Attributes	Code

Practical Tips

Avoid Compiler Optimisation

The compiler might optimise away unused results:

// BAD: compiler might skip the work
func BenchmarkBad(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Add(2, 3)  // result unused, might be optimised away
    }
}

// GOOD: use the result
var result int

func BenchmarkGood(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = Add(2, 3)
    }
    result = r  // prevent optimisation
}

Benchmark What Matters

Don’t benchmark trivial operations:

// Pointless: 2 + 3 is folded to a constant at compile time, so this measures only loop overhead
func BenchmarkTrivial(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = 2 + 3
    }
}

Benchmark real workloads with realistic data sizes.

Profile Before Optimising

Don’t guess. Profile first:

go test -bench=SlowFunction -cpuprofile=cpu.prof
go tool pprof -http=:8080 cpu.prof

Find the actual bottleneck. Often it’s not where you’d expect.

The Honest Take

Go’s built-in benchmarking is one of its genuinely great features. Zero setup, good enough for most needs.

What Go does better:

Zero dependencies for benchmarking
Integrated profiling
pprof is excellent
Quick feedback loop
Live application profiling is easy

What BenchmarkDotNet does better:

More rigorous statistical analysis
Better output formatting
Multiple runtime comparison
More configuration options
Memory diagnostics are more detailed

The verdict: For day-to-day “is this fast enough?” benchmarking, Go’s built-in tools are excellent. For publishing performance claims or deep analysis, BenchmarkDotNet is more thorough.

But Go wins on the “I can just do this” factor. No packages to install. No project to configure. Write a function, run a command, get numbers. That low friction means you actually benchmark things.

That wraps up Phase 2 on the ecosystem: project structure, testing, mocking, and benchmarking. These are the tools you’ll use every day. Go’s approach is consistently simpler than .NET’s, with trade-offs in power and flexibility. But for most work, simpler is better.
