Prompt Engineering Foundations: Universal Principles That Still Work

Mar 9, 2026

New

Prompt Engineering LLM Chain Of Thought Few-Shot Prompting

Why This Still Matters
The Six Universal Principles
What Does NOT Work
Practical Toolkit
Key Takeaway

Why This Still Matters

Prompt engineering has matured rapidly since ChatGPT’s release in late 2022. What began as ad hoc experimentation has become a structured discipline with documented best practices from every major AI provider. Google, OpenAI, Anthropic, and Amazon have all published comprehensive guides, and they agree on more than they disagree.

The core insight across all of these guides: prompt engineering is not about finding magic words. It is about clearly communicating intent. As Anthropic’s documentation puts it: “Think of Claude as a brilliant but new employee who lacks context on your norms and workflows. The more precisely you explain what you want, the better the result.”

Organizations that adopt structured prompt engineering frameworks report an average 67% productivity improvement across AI-enabled processes.

The Six Universal Principles

1. Be Clear and Specific

Vagueness is the single most common prompting failure. Asking for “a summary” without specifying length, format, audience, or focus gives the model too much room for interpretation.

Less effective:

Explain climate change.

More effective:

Write a 3-paragraph summary of the primary causes of climate change for high school students.
Use bullet points for each cause. Maintain a neutral, informational tone.

Google’s Gemini documentation emphasizes four input types: question input (direct queries), task input (specific actions), entity input (defined objects to operate on), and completion input (content to continue). Recognizing which type you are providing helps structure the prompt.

Practical test from Anthropic: Show your prompt to a colleague with minimal context on the task and ask them to follow it. If they would be confused, the model will be too.

2. Provide Context, Not Just Instructions

Research consistently shows that context is “underrated and massively impactful.” Simply providing relevant background information can drastically improve performance --- often more than any clever prompting technique.

Less effective:

NEVER use ellipses.

More effective:

Your response will be read aloud by a text-to-speech engine, so never use ellipses
since the text-to-speech engine will not know how to pronounce them.

The model can generalize from the explanation. When it understands why a rule exists, it applies the spirit of the rule more reliably than when it only knows the letter of it. This applies to business context too --- models do not know your company’s privacy policies, database schemas, or edge cases. Include all relevant constraints upfront.

3. Use Few-Shot Examples

Few-shot prompting --- providing input-output examples before the actual task --- is the single most reliable technique for steering output format, tone, and structure.

Best practices:

Make them relevant: mirror your actual use case closely
Make them diverse: cover edge cases and vary enough to prevent unintended patterns
Use structural delimiters: wrap in tags (Anthropic recommends <example> tags, Bedrock uses <example> with H: and A: delimiters) so the model distinguishes them from instructions
Include 3-5 examples for best results; research shows diminishing returns beyond 4-5
Show positive patterns, not anti-patterns: demonstrate what good output looks like

Google adds the concept of prefixes --- labeling inputs and outputs (e.g., “English:” / “French:”) to help the model understand the mapping.

4. Structure Your Prompts with Delimiters

Delimiters help the model parse different sections unambiguously. All major providers recommend them, though they differ on format:

Format	Best For	Provider Preference
XML tags	Complex multi-section prompts, Claude	Anthropic, OpenAI (strong in testing)
Markdown headers	General use, hierarchical structure	Google, OpenAI (starting point)
JSON	Structured data (use cautiously)	Poor in multi-document tests

OpenAI’s GPT-4.1 testing found XML and delimited formats significantly outperformed JSON for multi-document contexts. JSON’s escaping overhead and verbosity degraded performance.

Anthropic recommends consistent, descriptive tag names with nesting:

<documents>
  <document index="1">
    <source>annual_report.pdf</source>
    <document_content>{{CONTENT}}</document_content>
  </document>
</documents>

Delimiters also defend against prompt injection by clearly separating instructions from user-provided content.

5. Give the Model Room to Think

Chain-of-thought (CoT) prompting --- encouraging step-by-step reasoning --- improves accuracy for multi-step problems. The simplest implementation: add “Think step by step.”

Tradeoffs:

Cost: 3-5x increase in token usage
Compute: reasoning chains can consume 10x more compute
Model size: benefits most pronounced with 100B+ parameter models

Structured Chain-of-Thought (SCoT) combines structured formatting with CoT reasoning. Research shows SCoT outperforms standard CoT by up to 13.79% on HumanEval for code generation.

Key distinction: Prompt chaining breaks tasks into sequential API calls. Chain-of-thought encourages reasoning within a single prompt. Both useful, for different situations.

Critical caveat for reasoning models: OpenAI’s o3 and o4-mini generate internal chains of thought. Adding explicit “think step by step” to reasoning models can degrade performance.

6. Iterate and Refine

Every official guide emphasizes that prompt engineering is empirical and iterative. OpenAI’s GPT-4.1 guide: “AI engineering is inherently an empirical discipline. Large language models are inherently nondeterministic.”

The iteration cycle:

Start with a baseline prompt
Test across diverse inputs
Identify failure patterns
Make targeted adjustments (rewording, adding examples, clarifying constraints)
Compare outputs
Repeat

As prompts move into production, they should be “hardened and optimized like production code.”

What Does NOT Work

Role Prompting Is Overrated

Despite widespread use of “You are a math professor” prefixes, research shows role prompts have “little to no effect on improving correctness.” The Prompt Report (1,500+ papers analyzed) found role prompting was the most frequently mentioned technique (105 mentions) but not the most effective. It influences tone and style, not accuracy. Context and examples do far more.

Longer Is Not Always Better

Prompt compression research shows 50-65% token reductions are possible without sacrificing output quality. The goal is signal density, not volume.

Threatening the AI

“If you don’t answer correctly, you will be shut down” --- these manipulation tactics no longer work and are actively counterproductive with modern models.

Negative Framing

“Don’t use markdown” is less effective than “Write in flowing prose paragraphs.” Models respond better to positive instructions that describe what to do.

Practical Toolkit

Recommended Prompt Structure

OpenAI recommends this ordering for system prompts:

# Role and Objective
# Instructions
## Sub-categories for detailed instructions
# Reasoning Steps
# Output Format
# Examples
# Context
# Final instructions and step-by-step thinking prompt

Universal Checklist

Before sending any prompt, verify:

Is the desired output format specified?
Is there enough context for the task?
Are there examples demonstrating the expected pattern?
Are instructions ordered logically (most important first)?
Are different sections clearly delimited?
Have you tested with varied inputs?

Key Takeaway

The fundamentals --- clarity, context, examples, structure, reasoning scaffolds, and iteration --- remain the highest-leverage techniques available. They work across models, are well-documented by every major provider, and require no special tooling.

Before reaching for advanced techniques, ask: Is my prompt clear? Does it include relevant context? Have I shown examples? If any answer is no, start there.

Next in the series: Model-Specific Prompting --- how Claude, GPT, and Gemini each require different approaches despite sharing these universal foundations.