Beginner training

How Claude Actually Works

Build a working mental model of large language models — one level deeper than "it predicts language." Tokens, training, context windows, and why Claude says what it says, without the engineering lecture.

18 minutes · Builds on Module 1.1 · Includes context visualizer

What you'll be able to explain after this lesson

01

Describe what an LLM does

In plain terms, explain what Claude is doing when it answers — building a response one piece at a time from patterns it learned in training.

02

Explain why context length matters

Tell a coworker why long prompts, long files, and long chats all consume the same finite working space — and why that changes how you should use Claude.

03

Summarize training in two sentences

Describe what training on a large body of text actually gave Claude, and why a knowledge cutoff is not the same thing as memory.

How a large language model produces an answer

Training built the patterns

Anthropic trained Claude on a vast body of text and code over months of compute. Claude does not keep a copy of that material to look things up in; it learned statistical patterns about how language is used. Training has a cutoff date, and Claude knows nothing about events after that date unless you provide the information.

Prediction builds the answer

When you send a prompt, Claude builds the reply one piece at a time. Each piece is a token (a word or a fragment of a word), and Claude predicts each next token from your prompt plus everything it learned in training. It is not retrieving a stored answer; it is composing one in the moment.
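To make "one piece at a time" concrete, here is a toy sketch in Python. It is not how Claude is built: real models use neural networks trained on enormous amounts of text, while this example assumes a tiny made-up training text, treats each word as one piece, and simply counts which word tends to follow which. The names `train` and `generate` are illustrative only.

```python
from collections import Counter, defaultdict

# Toy "training data", made up for illustration.
TRAINING_TEXT = (
    "the loan was approved . "
    "the loan was denied . "
    "the loan was approved ."
)


def train(text):
    """Count, for each word, which words followed it in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_words=5):
    """Extend the prompt one word at a time, always picking the most common follower."""
    output = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # nothing was learned about what follows this word
        next_word = candidates.most_common(1)[0][0]
        output.append(next_word)
        if next_word == ".":
            break  # stop at the end of a sentence
    return " ".join(output)


patterns = train(TRAINING_TEXT)
print(generate(patterns, "the loan"))  # prints "the loan was approved ."
```

Notice that `patterns` holds only counts, not sentences. The reply is assembled word by word from those counts, which is the same idea, at a vastly smaller scale, as composing an answer rather than retrieving one.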

Context shapes every next word

Everything in the conversation — system instructions, your prompt, attached files, prior turns, Claude's own running output — sits in a single working space called the context window. It is finite, and it is what Claude can actually "see" when it predicts the next word.

Trainer note: Knowledge cutoff is not the same thing as memory. Claude does not remember yesterday's conversation. Every new chat starts with none of your earlier conversations in its context, so anything Claude needs to know, you have to put in front of it.

What occupies the context window

System prompt & instructions

The standing rules Claude operates under — set by the product (Claude.ai, Cowork, an enterprise integration). Small but always present.

Always on

Your turn & attachments

Everything you type and every file you attach. A short question costs almost nothing. A 40-page PDF can eat most of the budget on its own.

You control

The running conversation

Every prior turn — yours and Claude's — stays in the window as the chat grows. This is why long chats start to drift: the model is reading more and more material each turn.

Grows silently
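The three occupants described above can be tallied as a simple budget. The Python sketch below is an illustration only: the 50,000-token window and every size in it are made-up numbers chosen to show the pattern, not figures from any Claude product, and the names `context_used` and `report` are hypothetical.

```python
# All numbers are made up for illustration; real limits vary by model and product.
WINDOW_TOKENS = 50_000  # hypothetical context window size


def context_used(system_tokens, attachment_tokens, turn_tokens):
    """Tokens in the window: system prompt + attachments + every turn so far."""
    return system_tokens + attachment_tokens + sum(turn_tokens)


def report(label, used, window=WINDOW_TOKENS):
    """Format how much of the window is in use."""
    percent = round(100 * used / window)
    return f"{label}: {used:,} of {window:,} tokens ({percent}%)"


system_prompt = 1_000     # small, always present
long_attachment = 30_000  # a long file you attached
turns = []                # grows with every message, yours and Claude's

print(report("before first question", context_used(system_prompt, long_attachment, turns)))
for turn_number in range(1, 4):
    turns.append(200)     # your question
    turns.append(1_800)   # Claude's reply
    used = context_used(system_prompt, long_attachment, turns)
    print(report(f"after turn {turn_number}", used))
```

Under these assumed numbers the attachment dominates from the start, and each exchange adds to the total even though no single message is large. That is the "grows silently" effect in miniature.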

Five rules for working with the model's mechanics

1

Front-load what matters

Put the most important context, instructions, and questions at the top of your prompt. Claude reads like a person skimming on deadline — the headline goes first.

2

Keep prompts lean

Trim ruthlessly before sending. Extra paragraphs of background, copy-pasted boilerplate, and "just in case" detail all crowd out the answer.

3

Attach the source, don't describe it

If a document is what you actually want Claude to work from, attach it. Paraphrasing a 14-page disclosure in two sentences gives Claude two sentences to reason from.

4

Restart when a chat is past its life

When answers start drifting — wrong tone, contradictions, repeated mistakes — open a clean chat and paste only the important context. You're resetting the budget.

5

Ask Claude to show its reasoning

On high-stakes work, ask Claude to walk through its thinking and cite which part of the source it used. Hidden reasoning is harder to verify than visible reasoning.

Weak prompt

Tell me about our underwriting policy on appraisal gaps.

Work-ready prompt

Attached is our 2026 underwriting guide (PDF). Go to the section on appraisal gaps on page 14. In plain English, what are the options when an appraisal comes in $8,000 low on a conventional loan? Quote the exact line for each option.
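Asking for exact quotes makes the answer checkable. As a minimal sketch of that check, the Python below takes the source text and the lines Claude quoted, then reports which quotes actually appear in the source. The function name `check_quotes` and the sample text are hypothetical and invented for this example (they are not from any real underwriting guide), and the sketch assumes the document's text has already been extracted.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line breaks don't cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_quotes(source_text, quotes):
    """Return a dict mapping each quote to True if it appears verbatim in the source."""
    haystack = normalize(source_text)
    return {quote: normalize(quote) in haystack for quote in quotes}


# Hypothetical sample data, invented for this example.
source = """
Option 1: The borrower may pay the difference
in cash at closing.
Option 2: The seller may reduce the price.
"""
quoted_by_model = [
    "The borrower may pay the difference in cash at closing.",
    "The lender will waive the gap automatically.",
]

for quote, found in check_quotes(source, quoted_by_model).items():
    status = "found in source" if found else "NOT in source, verify by hand"
    print(f"{status}: {quote}")
```

A match only confirms that the words exist in the document. It does not confirm that Claude interpreted them correctly, so a quick read of the surrounding passage is still worth the time on high-stakes work.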

Tasks where understanding how Claude works matters most

Long-document summarization

Attaching a long file and putting the right question up front gets a far better answer than pasting the document text into a long prompt.

Multi-step reasoning

Working through one problem in a tight, focused chat — instead of stacking unrelated questions on top of it — keeps the context clean.

Comparing two sources

Attaching both, labeling them clearly ("Source A:", "Source B:"), and asking a specific comparison question works better than pasting both into one wall of text.

Structured extraction

Pulling names, dates, or numbers out of unstructured text. The cleaner the prompt and the leaner the source, the more reliable the extraction.

Five signals the model is straining

Employee rule: A straining model is not a broken tool — it's a signal to restart with a cleaner context. If you ignore the signal and ship the answer, you own the consequences.

Six exercises to make this concrete

Pick three. Save anything that produces a finding you'll remember.

  1. Open the context visualizer in this lesson. Run each scenario and write a one-sentence takeaway in your own words.
  2. Attach a long PDF you actually work with. Ask Claude for the second-most-important point, not the most important — and notice what kind of reading it had to do.
  3. Deliberately overload a chat with off-topic detours, then start a clean chat with only the relevant context. Compare the quality.
  4. Ask Claude to explain its last answer — which part of the source it used, and which part it didn't. Look for citations that don't check out.
  5. Run the same prompt twice — once with a thin description, once with the actual source attached. Compare the answers side by side.
  6. Write a one-paragraph explanation of how LLMs work for a coworker who has never used one. Trade with a teammate and grade each other.

Completion standard

You've finished this module when you can explain — in two sentences each — what training gave Claude, what tokens are, and why long chats drift. Bonus credit if you can describe one habit you'll change because of it.