I’ve been noodling around with the Model Context Protocol lately, and I wanted to share something I’ve built that demonstrates what I think is a genuinely interesting pattern: how you can split responsibilities between an AI and a server in a way that plays to each of their strengths.
The project is a 3D scene builder. But before you start thinking about meshes, textures, and rendering pipelines—stop. This isn’t that. It’s far more minimal, and I’d argue, far more useful as a foundation.
What It Actually Does
The server maintains a collection of named objects in real-world geographic space. That’s it. No rendering, no assets, no visual output. Just a spatial registry that speaks in WGS84 coordinates—the same system your phone’s GPS uses.
Every object has:
| Property | What it is |
|---|---|
name | Unique identifier (“Home”, “Garden Oak”, “The Shed”) |
type | Freeform descriptor—could be “house”, “ancient oak”, or “portal to dimension X” |
lat/lon | Decimal degrees, WGS84 |
elevation | Metres above sea level |
orientation | Degrees clockwise from north |
scale | Uniform scale factor |
Why so minimal? Because downstream systems can interpret this data however they like. Feed it to a game engine, a GIS tool, a mapping API, whatever. The scene server doesn’t care about rendering—it cares about where things are.
The Interesting Bit: Who Does What
Here’s where it gets good. The server is deliberately precise and dumb. It speaks in absolute coordinates only. If you want to place something “20 metres southwest”, the server has no idea what that means. It just knows lat/lon.
Claude, on the other hand, is brilliant at understanding spatial language. “Behind the house”, “near the oak tree”, “along the path”—this is exactly the kind of fuzzy human concept that LLMs handle well.
So the division of labour is:
- Server: Storage, geodesic maths, geocoding
- Claude: Natural language interpretation, spatial reasoning, orchestration
Let me show you what this looks like in practice.
Real Conversation Patterns
“Add a house on the corner of Kiln Close and Church Lane”
Claude’s thinking here:
- User wants an object at a real-world location
- Need to convert that address to coordinates → call
geocode geocode("corner of Kiln Close and Church Lane")returns coords plus a confidence level- If confidence is low (say, “city” level), ask for clarification
- If confident, call
add_object(name="Home", type="house", lat=52.1234, lon=-0.5678)
The server provides the geocoding (via OpenStreetMap Nominatim) and the storage. Claude handles the conversational flow and error handling.
“Move Home 20 metres southwest”
This is where the split really shines:
- Claude interprets “20 metres southwest” → bearing of 225°, distance of 20m
get_object("Home")→ fetches current positioncalculate_offset(lat=52.1234, lon=-0.5678, bearing=225, distance=20)→ server does the geodesic mathsset_position("Home", lat=52.1232, lon=-0.5680)→ new position set
The server handles the tricky bit—calculating what “20 metres at bearing 225” means on a sphere. Claude just needs to know that southwest = 225°.
“Put a shed behind the house”
This one’s fun because “behind” is entirely contextual:
get_object("Home")→ gets position and crucially, orientation (say, 180° = facing south)- Claude reasons: “behind” = opposite of facing direction = 0° (north)
- Picks a reasonable distance (5m? 10m?) based on context
calculate_offset(...)with the computed bearingadd_object(name="Shed", type="shed", ...)
The server has no concept of “behind”. It just knows coordinates and orientations. Claude does the spatial reasoning.
The Toolset
The server exposes a focused set of tools:
Creating/Destroying:
add_object— place something in the sceneremove_object— delete by name
Transforming:
set_position— move to new coordinatesset_orientation— rotateset_scale— resize
Querying:
get_object— full details of one objectlist_objects— all objects, optionally filtered by type
Utilities:
geocode— natural language → coordinatescalculate_offset— “from here, go X metres at bearing Y”
Scene management:
get_scene— export everythingset_scene— import/replace everything
That’s the entire API. Twelve tools. No more than needed.
Why This Pattern Matters
I think there’s a broader lesson here about building MCP servers. The temptation is to make them smart—to handle natural language, fuzzy inputs, contextual interpretation. But that’s Claude’s job. That’s what the LLM is for.
Your MCP server should be:
- Precise — clear inputs, predictable outputs
- Minimal — do one thing well
- Portable — standard formats (WGS84, JSON, etc.)
- Stateless where possible — let the client handle persistence
The magic happens at the boundary. Claude interprets “the old oak tree by the pond” and turns it into a geocode query. The server returns precise coordinates. Claude stores the object. Later, when you say “move it closer to the house”, Claude knows which object you mean, calculates what “closer” means in spatial terms, and calls the appropriate tools.
Neither system is doing something it’s bad at. That’s the whole point.
Running It Yourself
If you want to have a play:
pip install fastmcp pyproj geopy
Clone the repo, install, and run:
scene-mcp
By default it uses stdio transport (for Claude Desktop). For web clients:
SCENE_MCP_TRANSPORT=http SCENE_MCP_PORT=8080 scene-mcp
Then connect your MCP client and start describing scenes. “Create a small village with a pub, a church, and a village green.” See what happens.
What’s Next
This is foundation work. The scene data is portable—it’s just coordinates and metadata. You could feed it to:
- A game engine (Unreal, Unity, Godot)
- A GIS visualisation tool
- A procedural terrain generator
- A VR environment builder
- Literally anything that understands lat/lon
The point is to separate scene description from scene rendering. Let conversations build the spatial model. Let specialised tools handle the visuals.
That’s the bet, anyway. I’ll write more as I build out the rendering side. For now, give the scene server a go and let me know what you think.
The 3D Scene MCP Server is available on GitHub. It’s MIT licensed because life’s too short for anything else.