My Experience with Claude 3.5 Sonnet: Pros and Cons in 2026
TL;DR: I have used Claude 3.5 Sonnet as a daily driver across writing, coding and research from June 2025 through April 2026, roughly 11 months. By 2026 the model is a generation behind the latest Claude family but stays useful for three specific things: long-form writing where voice matters, fast back-and-forth coding chat where speed beats peak quality, and document review where cost per million tokens is the binding constraint. I moved data analysis, deep multi-file refactoring, and anything visual off Sonnet 3.5 to newer models in late 2025. Cost over 11 months: $214 on the API for personal use, plus $20 a month on Claude Pro. Worth it if you write and code daily and want a steady, fast model that does not break the bank.
Jump To
- How We Tested
- Where It Wins
- Where It Falls Behind
- Performance and Cost
- Pros and Cons
- Who This Is For
- Bottom Line
How We Tested
Personal usage across 11 months ending April 30, 2026. Two access paths: Claude.ai web app on Pro at $20 per month and API access via the Anthropic console with a Tier 2 limit. Token usage breakdown from the Anthropic dashboard: 14.2 million input tokens, 2.8 million output tokens over the 11 months. Spend on API: $214. Spend on Claude Pro chat: $220. Tasks logged in a Notion tracker with one-line task descriptions and a 1 to 5 satisfaction score. Categories: writing (blog drafts, edits, social posts), coding (PHP, TypeScript, Python, shell), research (synthesising long papers, comparing options), document review (contract reading, postmortem drafting). I compared against GPT-4o and the newer Anthropic models on the same workload during two specific weeks (October 14-20, 2025 and February 9-15, 2026) where I ran each task through two models and rated outputs blind, marking which I would actually ship to a client or use as final. Sample size: 312 tasks logged total. Bias caveat: I am the only rater, and I know which model wrote which output if I notice the writing voice.
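The blind two-model comparison described above can be sketched in a few lines. This is a minimal illustration, not the tracker I actually used: the function names `blind_pair` and `tally_ship_picks`, the model labels, and the placeholder drafts are all assumptions made for the example.

```python
import random
from collections import Counter


def blind_pair(outputs, rng):
    """Hide two model outputs behind the labels A and B in random order.

    outputs maps a model name to its text. Returns (shown, key), where
    shown maps label -> text and key maps label -> model name.
    """
    models = list(outputs)
    rng.shuffle(models)
    shown = {label: outputs[model] for label, model in zip("AB", models)}
    key = dict(zip("AB", models))
    return shown, key


def tally_ship_picks(picks, keys):
    """Count how often each model's output was the one marked shippable."""
    return Counter(key[pick] for pick, key in zip(picks, keys))


# Hypothetical example with placeholder text, not real model output.
rng = random.Random(7)
drafts = {"sonnet-3.5": "Draft one.", "newer-claude": "Draft two."}
shown, key = blind_pair(drafts, rng)
wins = tally_ship_picks(["A"], [key])  # the rater picked label A for this task
```

The rater only sees `shown`; `key` stays hidden until the pick is logged. This does nothing about the bias caveat above: a rater who recognises a model's voice is no longer blind, whatever the labels say.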
Where It Wins
Voice in long-form writing. Sonnet 3.5 has a writing voice that feels less generic than most other models I have used in 2026, including the newer Claude family on default temperature. It writes with restraint, does not stack metaphors, and respects the brief on tone. I have a personal style guide (short sentences, no hedging, no rhetorical questions in body paragraphs) that I paste into the system prompt. Sonnet 3.5 holds the style across a 2,000-word piece more consistently than GPT-4o or newer Claude on default settings. Where I notice this most: ghost-writing for friends and clients. Three of them passed off Sonnet-generated drafts as their own work without anyone noticing. That is not a measure of accuracy but it is a real test of writing voice. The newer Claude models can be tuned to match the Sonnet voice via system prompt but require more careful prompting; Sonnet 3.5 just does it by default.
Coding chat speed. Sonnet 3.5 is faster than the newer Claude models on simple coding tasks. Median time to first token is about 360 milliseconds for me, vs about 900 ms on newer Claude. For a back-and-forth conversation where I am iterating on a Symfony controller method or a regex, the speed difference matters more than the quality difference. The newer Claude models are noticeably better on harder coding problems (multi-file edits, agentic workflows, anything with reasoning) but slower for trivial tasks. I keep Sonnet 3.5 in a separate window for quick lookup-and-rewrite work and switch to the newer Claude when I need actual problem-solving.
Document review and summarisation. Sonnet 3.5 reads a 30-page PDF reliably. I feed it postmortem source documents (Slack transcripts, deploy logs, customer tickets) and ask for a draft postmortem in our template. The output is consistently 80 to 85 percent of the way to a finished doc. Newer models do better at 90 to 95 percent on the same task but cost about 5 times more per token. At the volume of postmortems I write (about 6 a month), Sonnet 3.5 economics still win.
Where It Falls Behind
Multi-file code work. The newer Claude family in 2026 ships with agentic abilities that Sonnet 3.5 simply does not have. Ask Sonnet 3.5 to open three files, edit one, run a test, observe the failure, and edit a second file. It tries but breaks down after two steps. The newer models handle this kind of chain in a single conversation. I moved any refactor work over a single file to the newer Claude in late 2025 and never went back.
Data analysis. Sonnet 3.5 cannot run code. It writes good Python and SQL but the lack of an execution loop means analysis tasks take 3 to 5 round trips of write-paste-error-rewrite. The newer Claude models with Code Interpreter equivalents close that loop. Same task takes one round trip instead of four. The cost saving from cheaper Sonnet tokens is more than wiped out by the round-trip overhead.
Anything with images. Sonnet 3.5 takes images as input but the analysis is shallow by 2026 standards. Ask it to identify a UI bug in a screenshot. It will sometimes get it right, sometimes confabulate. Newer Claude and GPT-4o are noticeably better. I do not feed Sonnet images anymore unless the alternative is no model at all.
Long context. Sonnet 3.5 has a 200k token context window, which is generous, but recall accuracy on a needle-in-haystack test from my own data falls off after about 80k tokens. Tested in March 2026: I embedded a sentence at token position 150k and asked Sonnet to retrieve it. It returned the wrong sentence in 6 of 10 attempts. Newer Claude models with 1M context are reliable up to 700k tokens in the same test. So if you actually need to use the full context window, you have outgrown Sonnet 3.5.
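The long-context check described above can be approximated with a small harness. This is a rough sketch under stated assumptions, not the exact script behind my numbers: `ask_model` is a hypothetical callable standing in for a real API call, the needle position is a fraction of the sentence list rather than an exact token offset, and `truncating_stub` is a fake model that only "sees" the first 1,000 characters so the example runs offline.

```python
def build_haystack(filler, needle, position):
    """Insert the needle sentence at a relative position (0.0 = start, 1.0 = end)."""
    idx = int(len(filler) * position)
    return " ".join(filler[:idx] + [needle] + filler[idx:])


def recall_rate(ask_model, filler, needle, question, expected, position, attempts=10):
    """Fraction of attempts in which the model's answer contains the expected text."""
    hits = 0
    for _ in range(attempts):
        prompt = build_haystack(filler, needle, position) + "\n\n" + question
        answer = ask_model(prompt)
        if expected.lower() in answer.lower():
            hits += 1
    return hits / attempts


def truncating_stub(prompt, visible_chars=1000):
    """Fake model for offline runs: it can only 'recall' the start of the prompt."""
    window = prompt[:visible_chars]
    return "Thursdays" if "Thursdays" in window else "I could not find it."


# Hypothetical filler and needle, invented for the example.
filler = [f"Filler sentence number {i}." for i in range(200)]
needle = "The deploy key rotates on Thursdays."
question = "On which day does the deploy key rotate?"

early = recall_rate(truncating_stub, filler, needle, question, "Thursdays", position=0.1)
late = recall_rate(truncating_stub, filler, needle, question, "Thursdays", position=0.9)
```

Swapping `truncating_stub` for a function that calls a real model, and sweeping `position` over several values, gives a recall-by-depth curve. Substring matching is a crude scorer; a stricter check may be needed if the model paraphrases.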
- Win: writing voice holds steady across 2,000 word drafts without much prompting
- Win: 360 ms median time to first token on simple chat, versus about 900 ms on newer Claude
- Win: API cost is about a fifth of newer Claude per million tokens, input and output alike
- Gripe: cannot do multi-file or agentic code work reliably
- Gripe: long-context recall degrades sharply past 80k tokens
Performance and Cost
API pricing as of April 30, 2026: Claude 3.5 Sonnet is at $3 per million input tokens and $15 per million output tokens. Newer Claude models in the same family sit at $15 per million input and $75 per million output (5x more on both sides). GPT-4o is at $5 per million input and $20 per million output, roughly between Sonnet 3.5 and newer Claude on price.
Latency: Sonnet 3.5 median first token 360 ms, p95 720 ms. Newer Claude median 900 ms, p95 1.6 seconds. GPT-4o median 480 ms, p95 1.1 seconds.
Throughput: Sonnet 3.5 averages 70 to 80 output tokens per second once streaming starts. Newer Claude is around 40 to 50. GPT-4o around 60. For a chat-driven workflow where each turn is small, Sonnet 3.5 feels noticeably snappier.
Total cost for my 11 months of personal use: $214 API plus $220 Pro subscription, $434 total. By task category: writing $113, coding $89, research $42, document review $190. Document review is the largest line because postmortems and contract reads consume input tokens heavily. If I priced the same volume on newer Claude API, my bill would be about $1,600 over the same period.
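The per-token prices quoted in this section turn into a per-task cost with simple arithmetic. The sketch below is illustrative only: the dictionary keys are labels for this article rather than real API model identifiers, and the 60,000-in, 3,000-out token counts are a hypothetical postmortem draft, not a figure from my logs.

```python
# Prices in USD per million tokens (input, output), as quoted in this section.
PRICES = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "newer-claude": (15.00, 75.00),
    "gpt-4o": (5.00, 20.00),
    "claude-3.5-haiku": (0.80, 4.00),
}


def task_cost(model, input_tokens, output_tokens):
    """Return the list-price cost in USD of one request for the given model label."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical postmortem draft: 60,000 tokens in, 3,000 tokens out.
sonnet = task_cost("claude-3.5-sonnet", 60_000, 3_000)
newer = task_cost("newer-claude", 60_000, 3_000)
print(f"Sonnet 3.5: ${sonnet:.3f}  newer Claude: ${newer:.3f}  ratio: {newer / sonnet:.1f}x")
```

Because both the input and output prices differ by the same factor, the ratio stays at 5x whatever the input-to-output mix; only the absolute cost per task changes.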
| Model | Input price per million | Output price per million | Median time to first token | Best for |
|---|---|---|---|---|
| Claude 3.5 Sonnet | $3 | $15 | 360 ms | Writing, chat coding, doc review |
| Newer Claude (2026) | $15 | $75 | 900 ms | Agentic, multi-file, long context |
| GPT-4o | $5 | $20 | 480 ms | Vision, balanced general use |
| Claude 3.5 Haiku | $0.80 | $4 | 220 ms | Volume tasks, classifiers |
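For anyone who wants to reproduce latency figures like the ones in the table, a median and a p95 can be derived from logged first-token timings as follows. This is a minimal sketch assuming a nearest-rank percentile; the helper name `percentile_nearest_rank` and the sample timings are invented for illustration and are not my raw measurements.

```python
import math
import statistics


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct percent of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical first-token timings in milliseconds, invented for illustration.
ttft_ms = [310, 330, 350, 355, 360, 380, 410, 520, 720]

median_ms = statistics.median(ttft_ms)
p95_ms = percentile_nearest_rank(ttft_ms, 95)
```

With only a handful of samples the p95 is simply the slowest request, so a tail figure is only meaningful once a few hundred timings have been logged.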
Pros and Cons
- Pro: writing voice is the most consistent in the Claude family across long drafts
- Pro: about a fifth of newer Claude's API price, which suits high-volume document review and summarisation
- Pro: snappier first-token time than newer Claude or GPT-4o on simple chat tasks
- Pro: 200k context is enough for 95 percent of writing and review work
- Con: cannot handle agentic or multi-file coding workflows
- Con: long-context recall degrades past 80k tokens in practice
- Con: image analysis is shallow by 2026 standards
- Con: no code execution loop forces 3 to 5 round trips on analysis tasks
Who This Is For
Pick Claude 3.5 Sonnet if you do high-volume writing or document review and care about cost per token. Pick it if you want a fast chat companion for incremental coding work, particularly back-and-forth on small functions. Pick it as your default if you have a personal style guide you want followed without constant reminders. Skip Sonnet 3.5 if your work is multi-file refactors, agentic coding, or anything that benefits from code execution; the newer Claude is worth 5 times the price for those workflows. Skip it for image-heavy work; GPT-4o or newer Claude handle that better. Skip it if you genuinely need the full 200k or beyond; long-context recall is not what it should be. Skip it if you only chat occasionally; the API and Pro subscription together are overkill, just stay on the free tier. For most working writers and most working developers in 2026, Sonnet 3.5 plus the newer Claude in a separate window is the right pair.
Sonnet 3.5 is the workhorse you keep when the new model is the showpiece. Cheaper, faster, quieter, and still good enough for most days.
Bottom Line
Eleven months in, Sonnet 3.5 is still my default for writing and small-scope coding chat. The newer Claude is open in another tab for anything multi-step or agentic. The combined cost averages about $40 per month ($434 over 11 months) and feels well spent. The honest concern: Anthropic will deprecate older Sonnet versions at some point and I will need to re-test the cost-per-task math on Haiku or the newer family. For now the economics work. If you are deciding between Sonnet 3.5 and the newer Claude, do not pick one. Use both. Treat Sonnet 3.5 as your first responder for cheap, fast, voiced output and reserve newer Claude for the work where the price difference is justified by results. Got a workflow I have not covered? Drop me a note. I will share the system prompt I use for writing and the postmortem template that survived 11 months.