If you have more than five engineers and an authorized_keys file on a production host, there’s a non-trivial chance that file has a key in it belonging to someone who no longer works at the company. This is not a theoretical problem. It happens all the time, usually quietly, and it’s almost never detected until someone does an audit.
The fix isn’t hard in concept, but the path from “we have shared keys everywhere” to “we have no shared keys anywhere” takes more planning than most teams give it credit for.
Why keys spread
The behavior is rational at the individual level. An engineer gets a new laptop and copies their keys over. A contractor needs to access two servers, so someone adds their key to both. Someone else is offboarded, and whoever handles IT pulls their email access first and SSH access later. Or never.
No single one of those decisions is wrong. Added up over three years, you end up with dozens of copies of a dozen keys, and nobody has a clear map.
The pattern that works
The outline, regardless of what tool you use:
Step 1: Give every operator an SSO identity. If you already have Google Workspace, Okta, or Entra ID, you’re done with this step.
Step 2: Run a per-host agent (or a bastion) that can mint short-lived SSH certificates signed by a CA whose private key lives in KMS or an HSM.
Step 3: When an operator needs access, they authenticate to the agent or bastion via SSO, receive a cert valid for 15 minutes, and use it to connect.
Step 4: After a transition period, remove all long-lived keys from authorized_keys. Keep a single break-glass root key in a vault for actual emergencies.
Step 5: After another transition period (30 to 60 days of uneventful operation), destroy the break-glass key and declare the migration done.
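Steps 2 and 3 come down to one signing operation per session. Below is a minimal sketch that builds the `ssh-keygen -s` command for a 15-minute user certificate. It assumes `ssh-keygen` can reach the CA key as a local file. In the KMS/HSM setup from Step 2, the signing call would go through that service instead, and this sketch does not show that part. `build_sign_command`, the paths, and the principal names are hypothetical. The flags (`-s`, `-I`, `-n`, `-V`, `-z`) are standard `ssh-keygen` certificate options.

```python
import shlex


def build_sign_command(ca_key_path: str, user_pubkey_path: str,
                       identity: str, principals: list[str],
                       ttl_minutes: int = 15, serial: int = 1) -> list[str]:
    """Return an ssh-keygen argv that signs a user key as a short-lived cert.

    Hypothetical helper: assumes the CA private key is a readable file.
    """
    if not principals:
        raise ValueError("at least one principal is required")
    return [
        "ssh-keygen",
        "-s", ca_key_path,           # CA private key used to sign
        "-I", identity,              # key ID, recorded in sshd's auth log
        "-n", ",".join(principals),  # accounts the cert may log in as
        "-V", f"+{ttl_minutes}m",    # valid from now until now + ttl
        "-z", str(serial),           # serial number for audit and revocation
        user_pubkey_path,            # ssh-keygen writes <name>-cert.pub beside it
    ]


cmd = build_sign_command("/etc/ssh/user_ca", "/tmp/alice.pub",
                         "alice@example.com", ["ubuntu"])
print(shlex.join(cmd))
# → ssh-keygen -s /etc/ssh/user_ca -I alice@example.com -n ubuntu -V +15m -z 1 /tmp/alice.pub
# Passing cmd to subprocess.run(cmd, check=True) would sign and write /tmp/alice-cert.pub.
```

On the host side, sshd accepts certificates like this once the CA's public key is listed under `TrustedUserCAKeys` in `sshd_config`.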
Every good implementation of this pattern uses the same basic moves. The details of tooling change; the outline doesn’t.
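Step 4 is the one that edits files on every host. Below is a minimal sketch of that cleanup. It assumes you have recorded the SHA256 fingerprint of the break-glass key, for example with `ssh-keygen -lf`. The sketch keeps comments and that one key, drops every other key, and reports what it dropped so you can review a dry run first. The function names are hypothetical. The parser is deliberately simple: it does not fully handle quoted options that contain spaces, it leaves unparseable lines in place for a human to check, and it drops `cert-authority` lines unless their key is in the keep set.

```python
import base64
import hashlib

# Token prefixes that mark the key type (anything before it is options).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-sha2-", "sk-")


def fingerprint(blob_b64: str) -> str:
    """OpenSSH-style SHA256 fingerprint of a base64-encoded public key blob."""
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def prune_authorized_keys(text: str, keep: set[str]) -> tuple[str, list[str]]:
    """Keep blanks, comments, and keys whose fingerprint is in `keep`.

    Returns the new file contents and the fingerprints that were removed.
    """
    kept, dropped = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        tokens = stripped.split()
        # Skip leading options such as from="..." until the key type token.
        idx = next((i for i, t in enumerate(tokens)
                    if t.startswith(KEY_TYPE_PREFIXES)), None)
        if idx is None or idx + 1 >= len(tokens):
            kept.append(line)  # unparseable: leave it for manual review
            continue
        try:
            fp = fingerprint(tokens[idx + 1])
        except ValueError:  # bad base64 (binascii.Error subclasses ValueError)
            kept.append(line)
            continue
        if fp in keep:
            kept.append(line)
        else:
            dropped.append(fp)
    return "\n".join(kept) + "\n", dropped
```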
What you actually give up
Be honest about the costs:
Offline access. Your SSH cert mint depends on the control plane. If it’s down, existing sessions continue, but you can’t open new ones. For most teams, this is fine; for some, it’s a dealbreaker.
Some SSH ergonomics. ~/.ssh/config aliases work differently in a cert-based world. Custom ProxyCommand setups may need rework. Plan for a week of grumbling.
Occasional latency. The cert mint adds a few hundred milliseconds at session start. After that, the session is direct. It’s noticeable if you’re opening 50 sessions in an hour and invisible otherwise.
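Here is the latency point in numbers. The sketch assumes roughly 300 ms per mint, which is our own stand-in for "a few hundred milliseconds":

```latex
50 \times 300\,\text{ms} = 15\,\text{s of extra waiting per hour}
```

Spread over a handful of sessions a day, the same per-session cost adds up to about a second.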
What you gain
Offboarding becomes a one-step operation. Remove the SSO identity. Done.
Audit becomes tractable. Every session has a named identity attached, with a time, a source, and a cert ID.
Sharing keys becomes structurally impossible. Two operators can’t use the same credentials because there are no persistent credentials to share.
Contractor access ends on the expiry date, without anyone having to remember to revoke it.
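The audit point is concrete because OpenSSH logs a certificate's key ID and serial number when a certificate login succeeds. Below is a minimal sketch that extracts those fields from an sshd log line. The sample line is illustrative, and the exact wording can differ between OpenSSH versions, so treat the regex as an assumption to check against your own logs. `parse_cert_login` is a hypothetical helper. The timestamp comes from whatever prefix syslog or journald adds, and this sketch ignores it.

```python
import re
from typing import Optional

# Assumed shape of sshd's message for a successful certificate login.
CERT_LOGIN = re.compile(
    r"Accepted publickey for (?P<user>\S+) from (?P<source>\S+) "
    r"port (?P<port>\d+) ssh2: (?P<key_type>\S+-CERT) "
    r"(?P<fingerprint>SHA256:\S+) "
    r"ID (?P<key_id>.+?) \(serial (?P<serial>\d+)\) CA "
)


def parse_cert_login(line: str) -> Optional[dict]:
    """Return user, source, port, key ID, and serial for a cert login, else None."""
    m = CERT_LOGIN.search(line)
    if m is None:
        return None  # not a certificate login (plain key, password, etc.)
    fields = m.groupdict()
    fields["port"] = int(fields["port"])
    fields["serial"] = int(fields["serial"])
    return fields


sample = ("sshd[1234]: Accepted publickey for ubuntu from 203.0.113.7 port 52814 "
          "ssh2: ED25519-CERT SHA256:AbC123 ID alice@example.com (serial 42) "
          "CA ED25519 SHA256:XyZ789")
print(parse_cert_login(sample)["key_id"])  # → alice@example.com
```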
The bit nobody talks about
The hardest part of this migration isn’t technical. It’s convincing the most senior engineer on the team (who probably has a hardware key and a 15-year-old ~/.ssh/config and strong opinions) that they personally need to switch too.
Don’t exempt them. Exemptions ruin the model, because the exempted operator is the one who most often gets their key copied around. Be kind about it, give them a week to adjust, make sure tmux sessions still work, and they’ll come around.
When not to do this
Three cases where traditional keys are still a better answer:
- Truly air-gapped networks where you cannot reach a control plane.
- One-person or two-person operations where the overhead of setting up the pattern exceeds the benefit.
- Hardware or robotics labs where the cert-mint dependency creates unacceptable brittleness.
For everyone else, it’s one of those investments that looks like overhead for two months and then feels like oxygen.
LynxTrac is free forever for up to 2 servers, no card required. If you want to try it on real infrastructure instead of reading about it: app.lynxtrac.com.