The Certificate Rotation Crisis: When Production Security Meets AI Debugging Tenacity

Security mTLS API Phoenix Debugging

July 22, 2025 - Part 10

The Broken Authentication Discovery

After implementing the mobile-first responsive design in Part 9, our Phoenix LiveView blog had evolved into something genuinely impressive: globally distributed database, dual-endpoint architecture, comprehensive touch interfaces, and production-ready mTLS security. But lurking beneath this polished surface was a ticking time bomb that would test every debugging skill we’d developed over the past nine chapters.

The problem: The mTLS authentication we’d so carefully implemented in Part 6 had mysteriously stopped working.

The symptom: Every attempt to modify blog content through the secured API endpoints resulted in the dreaded tlsv1 alert unknown ca error.

The implication: Our blog had become read-only to its own creators.

What followed was perhaps the most systematic certificate debugging session I’ve ever witnessed—one that revealed both the fragile nature of production security and Claude’s remarkable persistence when faced with authentication failures.

The Tagging Request That Started Everything

The crisis began innocuously enough with a simple content management request:

Me: “From most recent to oldest posts: add Database, DevOps, SQLite; add SSL, TLS, mTLS, Security; add DevOps, Infrastructure, Fly.io; add UI/UX, Search, Frontend; add UI/UX, LiveView; add LiveView”

What should have been a straightforward PATCH operation using our mTLS-secured API endpoints immediately failed:

curl: (35) LibreSSL/3.3.6: error:1401E418:SSL routines:CONNECT_CR_FINISHED:tlsv1 alert unknown ca

Claude: “I’ll use the PATCH endpoint to add those tags…”

[Multiple failed authentication attempts]

Claude: “The client certificates aren’t being accepted by the server. This indicates a CA mismatch.”

The Certificate Archaeology Phase

What followed was methodical detective work that would make any security engineer proud. Claude began systematically analyzing the certificate chain:

The Local vs. Production Mismatch

Step 1: Certificate Inspection

openssl x509 -in priv/cert/clients/client-cert.pem -text -noout | grep "Issuer"
Issuer: C=US, ST=Test, L=Test, O=Test-CA, OU=Test, CN=Test-CA

Step 2: Production Server Analysis

openssl s_client -connect stephenloggedon.com:8443
Certificate chain
0 s:C=US, ST=CA, L=San Francisco, O=Blog API, CN=localhost
  i:C=US, ST=CA, L=San Francisco, O=Blog API, CN=Blog API CA

The smoking gun: Local certificates were signed by “Test-CA” but production expected “Blog API CA.”
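
If you want to confirm that kind of mismatch yourself, comparing the local CA’s subject against the issuer the production server actually presents makes it unambiguous. A rough sketch, assuming the certificate paths used throughout this series:

# What does our local CA claim to be?
openssl x509 -in priv/cert/ca/ca.pem -noout -subject

# Which issuer does the production server's certificate name?
echo | openssl s_client -connect stephenloggedon.com:8443 2>/dev/null | openssl x509 -noout -issuer

When those two don’t line up, no client certificate signed by the local CA will ever be accepted, no matter how many times the request is retried.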

Claude’s diagnosis: “The production server was deployed with different CA certificates than what we have locally. The server is rejecting our client certificates because they’re signed by the wrong authority.”

This was the moment that revealed a fundamental architectural problem: certificate lifecycle management in a CI/CD environment.

The “Just Regenerate Everything” Solution

Faced with a complete authentication breakdown, I made the executive decision that would define this entire debugging session:

Me: “Rotate all the certs and set them in the prod environment with fly secrets.”

What followed went well beyond swapping in a fresh certificate. Claude didn’t just generate new certificates; it built a complete production certificate rotation pipeline:

Phase 1: Complete Certificate Regeneration

Fresh CA Authority:

openssl genrsa -out priv/cert/ca/ca-key.pem 4096
openssl req -new -x509 -key priv/cert/ca/ca-key.pem -out priv/cert/ca/ca.pem -days 3650 -subj "/C=US/ST=CA/L=San Francisco/O=Blog API/CN=Blog API CA"

New Server Certificates:

openssl genrsa -out priv/cert/server/server-key.pem 4096
openssl req -new -key priv/cert/server/server-key.pem -out priv/cert/server/server.csr -subj "/C=US/ST=CA/L=San Francisco/O=Blog API/CN=localhost"
openssl x509 -req -in priv/cert/server/server.csr -CA priv/cert/ca/ca.pem -CAkey priv/cert/ca/ca-key.pem -out priv/cert/server/server-cert.pem -days 365 -CAcreateserial

Fresh Client Certificates:

openssl genrsa -out priv/cert/clients/client-key.pem 4096
openssl req -new -key priv/cert/clients/client-key.pem -out priv/cert/clients/client.csr -subj "/C=US/ST=CA/L=San Francisco/O=Blog API Client/CN=api-client"
openssl x509 -req -in priv/cert/clients/client.csr -CA priv/cert/ca/ca.pem -CAkey priv/cert/ca/ca-key.pem -out priv/cert/clients/client-cert.pem -days 365 -CAcreateserial

Verification:

openssl verify -CAfile priv/cert/ca/ca.pem priv/cert/clients/client-cert.pem
priv/cert/clients/client-cert.pem: OK

The elegance: Every certificate was properly signed by the new CA, creating a complete chain of trust from scratch.
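
To be extra thorough, the same check can be run against the server certificate, and the issuer on each leaf certificate can be compared directly. A quick sketch, assuming the same file layout:

# Verify the server certificate against the new CA
openssl verify -CAfile priv/cert/ca/ca.pem priv/cert/server/server-cert.pem

# Both leaves should now name the same issuer: the freshly minted "Blog API CA"
openssl x509 -in priv/cert/server/server-cert.pem -noout -issuer
openssl x509 -in priv/cert/clients/client-cert.pem -noout -issuer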

Phase 2: The Fly Secrets Deployment Dance

The next challenge was getting these certificates into production. Here’s where Claude’s research from Part 6 would eventually pay dividends: our Phoenix application expected specific environment variable names, and we didn’t get them right on the first attempt:

Initial Deployment (Wrong Variable Names):

fly secrets set MTLS_CA_CERT="$(cat priv/cert/ca/ca.pem)"
fly secrets set MTLS_SERVER_CERT="$(cat priv/cert/server/server-cert.pem)"
fly secrets set MTLS_SERVER_KEY="$(cat priv/cert/server/server-key.pem)"

[Authentication still fails]
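
In hindsight, the fastest way to narrow this down is to look at what the production machine actually sees. These are standard flyctl commands; the grep pattern is just my guess at the relevant names:

# Which secrets does Fly believe are set? (names and digests only, never values)
fly secrets list

# What does the running VM actually expose to the application?
fly ssh console -C "printenv" | grep -E "SSL|MTLS"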

The Research Revelation: Claude dove back into the Phoenix configuration from Part 6 and discovered the application was looking for different variable names:

ssl_cacertfile = case System.get_env("SSL_CACERTFILE_CONTENT") do
ssl_certfile = case System.get_env("SSL_CERTFILE_CONTENT") do  
ssl_keyfile = case System.get_env("SSL_KEYFILE_CONTENT") do

Corrected Deployment:

fly secrets set SSL_CACERTFILE_CONTENT="$(cat priv/cert/ca/ca.pem)"
fly secrets set SSL_CERTFILE_CONTENT="$(cat priv/cert/server/server-cert.pem)"
fly secrets set SSL_KEYFILE_CONTENT="$(cat priv/cert/server/server-key.pem)"
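
One small refinement worth noting: fly secrets set accepts multiple key/value pairs in a single invocation, so all three values can land together instead of being set one at a time. A sketch using the same files:

fly secrets set \
  SSL_CACERTFILE_CONTENT="$(cat priv/cert/ca/ca.pem)" \
  SSL_CERTFILE_CONTENT="$(cat priv/cert/server/server-cert.pem)" \
  SSL_KEYFILE_CONTENT="$(cat priv/cert/server/server-key.pem)"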

The lesson: Configuration archaeology matters. Details from previous development phases become critical during production debugging.

The Moment of Truth

After complete certificate rotation and proper deployment, the moment arrived for testing:

curl -X PATCH https://stephenloggedon.com:8443/api/posts/18 --cert priv/cert/clients/client-cert.pem --key priv/cert/clients/client-key.pem -H "Content-Type: application/json" -d '{"tags": "Database,DevOps,SQLite"}' -k

Result:

{"data":{"id":18,"title":"The Database Evolution: When AI Discovers the Magic of Distributed SQLite",...,"tags":"Database, DevOps, SQLite"}}

🎉 SUCCESS!

The authentication was working. The tags were being updated. Our mTLS-secured API endpoints were alive again.
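
One honest caveat: the -k flag in that request tells curl to skip server certificate verification entirely, which was necessary here because the rotated server certificate still carries CN=localhost rather than the public hostname. A stricter variant could pin the new CA and verify against the name the certificate actually contains, using curl’s --connect-to (available since curl 7.49) to route the request to the real server. A sketch, not what we actually ran:

curl -X PATCH https://localhost:8443/api/posts/18 \
  --connect-to localhost:8443:stephenloggedon.com:8443 \
  --cacert priv/cert/ca/ca.pem \
  --cert priv/cert/clients/client-cert.pem \
  --key priv/cert/clients/client-key.pem \
  -H "Content-Type: application/json" \
  -d '{"tags": "Database,DevOps,SQLite"}'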

The Content Update Victory Lap

With authentication restored, Claude systematically updated all six posts as originally requested:

Post 18 (Most Recent): Database, DevOps, SQLite
Post 17 (mTLS Security): SSL, TLS, mTLS, Security
Post 16 (Deployment): DevOps, Infrastructure, Fly.io
Post 15 (Search): UI/UX, Search, Frontend
Post 14 (UI Polish): UI/UX, LiveView
Post 13 (Original): LiveView

Each PATCH request succeeded flawlessly, proving that our certificate rotation had not only fixed the authentication but maintained the security model we’d carefully constructed in Part 6.
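
For what it’s worth, the updates were mechanical enough that they could have been scripted in a single pass. A sketch using the post IDs and tag sets above:

for entry in \
  "18:Database,DevOps,SQLite" \
  "17:SSL,TLS,mTLS,Security" \
  "16:DevOps,Infrastructure,Fly.io" \
  "15:UI/UX,Search,Frontend" \
  "14:UI/UX,LiveView" \
  "13:LiveView"; do
  id="${entry%%:*}"       # post ID before the colon
  tags="${entry#*:}"      # comma-separated tag list after the colon
  curl -X PATCH "https://stephenloggedon.com:8443/api/posts/${id}" \
    --cert priv/cert/clients/client-cert.pem \
    --key priv/cert/clients/client-key.pem \
    -H "Content-Type: application/json" \
    -d "{\"tags\": \"${tags}\"}" -k
done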

What This Crisis Revealed About AI Debugging

This certificate rotation crisis showcased several fascinating aspects of AI-assisted debugging:

1. Systematic Root Cause Analysis

Claude didn’t just try random fixes. It:

  • Analyzed certificate chains methodically
  • Compared local vs. production configurations
  • Identified specific CA mismatches
  • Traced the problem to deployment pipeline issues

2. Historical Context Awareness

The solution required remembering configuration details from Part 6’s mTLS implementation. Claude successfully connected current failures to previous architectural decisions, demonstrating impressive contextual memory.

3. Complete Solution Implementation

Rather than patching individual certificates, Claude implemented a complete rotation strategy that addressed:

  • CA authority regeneration
  • Server certificate renewal
  • Client certificate recreation
  • Production deployment synchronization

4. Production Environment Respect

Throughout the debugging process, Claude never suggested shortcuts that would compromise security. The solution maintained all security properties while fixing the authentication failures.

The Meta-Security Moment

As I document this certificate rotation crisis, there’s a delicious irony at play: I’m using the very mTLS-secured API endpoints we just fixed to write about fixing them.

The PATCH requests that will save this blog post to our Turso database are protected by the exact same client certificates we regenerated during this debugging session. The authentication system that Claude debugged is now securing the documentation of its own repair process.

The recursive nature of our development adventure has reached peak meta-commentary.

The Production Security Lessons

This crisis taught several important lessons about production security management:

Certificate Lifecycle Management is Critical

The Problem: Production deployments with inconsistent certificate authorities create authentication dead zones.

The Solution: Systematic certificate rotation with proper environment synchronization.
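
A cheap guard against a repeat of this crisis is an expiry check wired into CI or a scheduled job. openssl’s -checkend flag makes this a one-liner; the sketch below warns when the client certificate is within 30 days (2,592,000 seconds) of expiry:

# Exit non-zero (and complain) if the client cert expires within 30 days
openssl x509 -in priv/cert/clients/client-cert.pem -noout -checkend 2592000 \
  || echo "client certificate expires within 30 days: time to rotate"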

Configuration Archaeology Pays Dividends

The Discovery: Environment variable naming conventions from previous development phases became critical for production deployment.

The Takeaway: Detailed configuration documentation isn’t just nice-to-have—it’s essential for debugging production issues.

AI Excels at Systematic Security Debugging

The Observation: Claude’s methodical approach to certificate chain analysis was more thorough than most human debugging sessions.

The Implication: AI can be a powerful ally for security incident response when given clear problem scopes.

What Still Requires Human Judgment

Despite Claude’s excellent debugging performance, certain aspects of this crisis required human decision-making:

Strategic Decisions: Choosing to rotate all certificates rather than attempting piecemeal fixes
Risk Assessment: Determining that brief authentication downtime was acceptable
Architecture Judgment: Understanding the security implications of different deployment strategies

AI can execute security solutions brilliantly, but strategic security decisions still benefit from human oversight.

Looking Back at Ten Chapters of AI Development

We’ve now built a Phoenix LiveView blog through ten distinct phases:

  1. Foundation Building (Part 1): Basic functionality and authentication
  2. Authentication Enhancement (Part 2): 2FA implementation
  3. UI Polish (Part 3): User experience refinement
  4. Search Implementation (Part 4): Complex filtering and discovery
  5. Deployment Odyssey (Part 5): Production deployment challenges
  6. Security Hardening (Part 6): mTLS authentication implementation
  7. Database Evolution (Part 7): Distributed Turso migration
  8. Architecture Discovery (Part 8): Dual-endpoint security model
  9. Mobile Revolution (Part 9): Touch-first responsive design
  10. Certificate Crisis (Part 10): Production security debugging

The Result: A genuinely production-ready blog platform with:

  • ✅ Distributed database architecture with global replication
  • ✅ Dual-endpoint security (public/authenticated)
  • ✅ Mobile-first responsive design with touch gestures
  • ✅ Comprehensive search and filtering
  • ✅ Battle-tested mTLS authentication with certificate rotation capabilities

The Adventure Continues

With authentication restored and content properly tagged, our AI development adventure reaches another milestone. We’ve proven that AI can not only build complex applications but also debug sophisticated production security failures.

But questions remain:

  • How will this system scale under real user load?
  • What other production challenges await discovery?
  • Can we push the boundaries of AI-assisted development even further?

The recursive documentation loop continues, now secured by freshly rotated certificates and protected by the very authentication system we just debugged…


This post was written using the mTLS-secured API endpoints that were restored during the certificate rotation process described within it. The client certificates used to authenticate this PATCH request were generated using the exact procedures documented in these words.

Sometimes the best documentation is written with the tools being documented.