Skip to content
Back to AI Blog
Prompt Engineering Foundations: Universal Principles That Still Work
 AI

Prompt Engineering Foundations: Universal Principles That Still Work


New
Prompt Engineering LLM Chain Of Thought Few-Shot Prompting

Table of Contents

  1. Why This Still Matters
  2. The Six Universal Principles
  3. What Does NOT Work
  4. Practical Toolkit
  5. Key Takeaway

Why This Still Matters

Prompt engineering has matured rapidly since ChatGPT’s release in late 2022. What began as ad hoc experimentation has become a structured discipline with documented best practices from every major AI provider. Google, OpenAI, Anthropic, and Amazon have all published comprehensive guides, and they agree on more than they disagree.

The core insight across all of these guides: prompt engineering is not about finding magic words. It is about clearly communicating intent. As Anthropic’s documentation puts it: “Think of Claude as a brilliant but new employee who lacks context on your norms and workflows. The more precisely you explain what you want, the better the result.”

Organizations that adopt structured prompt engineering frameworks report an average 67% productivity improvement across AI-enabled processes.


The Six Universal Principles

1. Be Clear and Specific

Vagueness is the single most common prompting failure. Asking for “a summary” without specifying length, format, audience, or focus gives the model too much room for interpretation.

Less effective:

Explain climate change.

More effective:

Write a 3-paragraph summary of the primary causes of climate change for high school students.
Use bullet points for each cause. Maintain a neutral, informational tone.

Google’s Gemini documentation emphasizes four input types: question input (direct queries), task input (specific actions), entity input (defined objects to operate on), and completion input (content to continue). Recognizing which type you are providing helps structure the prompt.

Practical test from Anthropic: Show your prompt to a colleague with minimal context on the task and ask them to follow it. If they would be confused, the model will be too.

2. Provide Context, Not Just Instructions

Research consistently shows that context is “underrated and massively impactful.” Simply providing relevant background information can drastically improve performance --- often more than any clever prompting technique.

Less effective:

NEVER use ellipses.

More effective:

Your response will be read aloud by a text-to-speech engine, so never use ellipses
since the text-to-speech engine will not know how to pronounce them.

The model can generalize from the explanation. When it understands why a rule exists, it applies the spirit of the rule more reliably than when it only knows the letter of it. This applies to business context too --- models do not know your company’s privacy policies, database schemas, or edge cases. Include all relevant constraints upfront.

3. Use Few-Shot Examples

Few-shot prompting --- providing input-output examples before the actual task --- is the single most reliable technique for steering output format, tone, and structure.

Best practices:

  • Make them relevant: mirror your actual use case closely
  • Make them diverse: cover edge cases and vary enough to prevent unintended patterns
  • Use structural delimiters: wrap in tags (Anthropic recommends <example> tags, Bedrock uses <example> with H: and A: delimiters) so the model distinguishes them from instructions
  • Include 3-5 examples for best results; research shows diminishing returns beyond 4-5
  • Show positive patterns, not anti-patterns: demonstrate what good output looks like

Google adds the concept of prefixes --- labeling inputs and outputs (e.g., “English:” / “French:”) to help the model understand the mapping.

4. Structure Your Prompts with Delimiters

Delimiters help the model parse different sections unambiguously. All major providers recommend them, though they differ on format:

FormatBest ForProvider Preference
XML tagsComplex multi-section prompts, ClaudeAnthropic, OpenAI (strong in testing)
Markdown headersGeneral use, hierarchical structureGoogle, OpenAI (starting point)
JSONStructured data (use cautiously)Poor in multi-document tests

OpenAI’s GPT-4.1 testing found XML and delimited formats significantly outperformed JSON for multi-document contexts. JSON’s escaping overhead and verbosity degraded performance.

Anthropic recommends consistent, descriptive tag names with nesting:

<documents>
  <document index="1">
    <source>annual_report.pdf</source>
    <document_content>{{CONTENT}}</document_content>
  </document>
</documents>

Delimiters also defend against prompt injection by clearly separating instructions from user-provided content.

5. Give the Model Room to Think

Chain-of-thought (CoT) prompting --- encouraging step-by-step reasoning --- improves accuracy for multi-step problems. The simplest implementation: add “Think step by step.”

Tradeoffs:

  • Cost: 3-5x increase in token usage
  • Compute: reasoning chains can consume 10x more compute
  • Model size: benefits most pronounced with 100B+ parameter models

Structured Chain-of-Thought (SCoT) combines structured formatting with CoT reasoning. Research shows SCoT outperforms standard CoT by up to 13.79% on HumanEval for code generation.

Key distinction: Prompt chaining breaks tasks into sequential API calls. Chain-of-thought encourages reasoning within a single prompt. Both useful, for different situations.

Critical caveat for reasoning models: OpenAI’s o3 and o4-mini generate internal chains of thought. Adding explicit “think step by step” to reasoning models can degrade performance.

6. Iterate and Refine

Every official guide emphasizes that prompt engineering is empirical and iterative. OpenAI’s GPT-4.1 guide: “AI engineering is inherently an empirical discipline. Large language models are inherently nondeterministic.”

The iteration cycle:

  1. Start with a baseline prompt
  2. Test across diverse inputs
  3. Identify failure patterns
  4. Make targeted adjustments (rewording, adding examples, clarifying constraints)
  5. Compare outputs
  6. Repeat

As prompts move into production, they should be “hardened and optimized like production code.”


What Does NOT Work

Role Prompting Is Overrated

Despite widespread use of “You are a math professor” prefixes, research shows role prompts have “little to no effect on improving correctness.” The Prompt Report (1,500+ papers analyzed) found role prompting was the most frequently mentioned technique (105 mentions) but not the most effective. It influences tone and style, not accuracy. Context and examples do far more.

Longer Is Not Always Better

Prompt compression research shows 50-65% token reductions are possible without sacrificing output quality. The goal is signal density, not volume.

Threatening the AI

“If you don’t answer correctly, you will be shut down” --- these manipulation tactics no longer work and are actively counterproductive with modern models.

Negative Framing

“Don’t use markdown” is less effective than “Write in flowing prose paragraphs.” Models respond better to positive instructions that describe what to do.


Practical Toolkit

OpenAI recommends this ordering for system prompts:

# Role and Objective
# Instructions
## Sub-categories for detailed instructions
# Reasoning Steps
# Output Format
# Examples
# Context
# Final instructions and step-by-step thinking prompt

Universal Checklist

Before sending any prompt, verify:

  1. Is the desired output format specified?
  2. Is there enough context for the task?
  3. Are there examples demonstrating the expected pattern?
  4. Are instructions ordered logically (most important first)?
  5. Are different sections clearly delimited?
  6. Have you tested with varied inputs?

Key Takeaway

The fundamentals --- clarity, context, examples, structure, reasoning scaffolds, and iteration --- remain the highest-leverage techniques available. They work across models, are well-documented by every major provider, and require no special tooling.

Before reaching for advanced techniques, ask: Is my prompt clear? Does it include relevant context? Have I shown examples? If any answer is no, start there.


Next in the series: Model-Specific Prompting --- how Claude, GPT, and Gemini each require different approaches despite sharing these universal foundations.

Sources & References

  1. Anthropic - Prompting Best Practices for Claude (accessed 2026-03-08)
  2. OpenAI - Prompt Engineering Guide (accessed 2026-03-08)
  3. OpenAI - GPT-4.1 Prompting Guide (accessed 2026-03-08)
  4. Google - Gemini API Prompting Strategies (accessed 2026-03-08)
  5. Amazon - Prompt Engineering Concepts (Bedrock) (accessed 2026-03-08)
  6. The Prompt Report - Survey of 1,500+ Papers (arXiv) (accessed 2026-03-08)
  7. Structured Chain-of-Thought Prompting for Code Generation (TOSEM 2025) (accessed 2026-03-08)
  8. AI Prompt Engineering in 2025: What Works (Schulhoff / Lenny's Newsletter) (accessed 2026-03-08)
  9. The Ultimate Guide to Prompt Engineering in 2026 (Lakera) (accessed 2026-03-08)
  10. Chain of Thought Prompting Overview (SuperAnnotate) (accessed 2026-03-08)
  11. OpenAI - Reasoning Best Practices (o3/o4-mini) (accessed 2026-03-08)

Sources compiled from official documentation by Anthropic, OpenAI, Google, and Amazon, supplemented by academic research (The Prompt Report, SCoT paper) and expert analysis. This article is part 1 of a 4-part series on prompt engineering in the agentic era, drawing from the research conducted across all major AI provider guides and community resources.