GenAI: Capabilities, Limits & Use Cases
Where generative AI shines, where it fails (hallucinations and friends), and how to pick the right use case.
What generative AI is great at
Key points
- Text generation & summarization — drafts, reports, meeting notes.
- Chatbots & virtual assistants — customer support, internal helpdesks.
- Code generation — writing, explaining, and reviewing code.
- Translation & rewriting — tone changes, simplification, localization.
- Search & question answering — especially when combined with company data (RAG).
- Image/audio/video generation — marketing assets, product mockups.
- Extraction & classification — pulling structured facts from messy text.
The limitations you MUST know
| Limitation | What it means | Mitigation |
|---|---|---|
| Hallucination | The model confidently generates false or fabricated information | RAG with trusted sources, human review, guardrails |
| Nondeterminism | The same prompt can produce different outputs each time | Lower the temperature to reduce variation; don't rely on exact repeatability |
| Knowledge cutoff | The model knows nothing about events or documents newer than its training data | RAG or tools to supply current information |
| Bias & toxicity | Models can reproduce biases present in training data | Curated data, guardrails, evaluation, human oversight |
| Prompt sensitivity | Small wording changes can change results significantly | Prompt engineering and testing |
| Cost & latency | Larger models are slower and cost more per call | Pick the smallest model that meets the need |
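
To make the "lower temperature" mitigation concrete, here is a minimal sketch of temperature-scaled sampling over next-token scores. The candidate tokens and their scores are invented for illustration, and real models sample over tens of thousands of tokens with extra controls such as top-p, so treat this as a toy model of the idea rather than how any particular service implements it.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-token candidates and scores (assumptions, not real model output).
tokens = ["30 days", "14 days", "90 days"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)
for temperature in (1.0, 0.1):
    draws = [sample_token(tokens, logits, temperature, rng) for _ in range(1000)]
    share = draws.count("30 days") / len(draws)
    print(f"temperature={temperature}: top token chosen {share:.0%} of the time")
```

At the lower temperature the top-scoring token wins almost every draw, which is why outputs become more repeatable. The randomness shrinks but does not vanish, so exact repeatability is still not guaranteed.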
Hallucination is the single most-tested limitation. If a scenario says the chatbot "invented an answer" or "cited a nonexistent policy," the fix is almost always RAG grounded in company data, plus human oversight for high-stakes decisions.
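
The grounding idea can be sketched in a few lines: retrieve a trusted snippet first, build the prompt only from that snippet, and hand off to a human when nothing relevant is found. Everything named here is hypothetical (the `KNOWLEDGE_BASE` contents, the stopword list, the overlap threshold), and plain keyword overlap stands in for the embedding-based search a production RAG system would typically use.

```python
import re

# Hypothetical policy snippets standing in for a company knowledge base.
KNOWLEDGE_BASE = {
    "returns-policy": "Items can be returned within 30 days of delivery with a receipt.",
    "shipping-policy": "Standard shipping takes 3 to 5 business days.",
}

# Small illustrative stopword list so common words do not count as matches.
STOPWORDS = {"a", "the", "to", "of", "with", "can", "be", "do", "i", "is", "for"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, min_overlap=2):
    """Return (doc_id, text) for the best-matching snippet, or None if the match is too weak."""
    question_terms = tokenize(question)
    best_id, best_score = None, 0
    for doc_id, text in KNOWLEDGE_BASE.items():
        score = len(question_terms & tokenize(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    if best_score < min_overlap:
        return None
    return best_id, KNOWLEDGE_BASE[best_id]


def build_grounded_prompt(question):
    """Build a prompt restricted to the retrieved source, or None to signal a human handoff."""
    hit = retrieve(question)
    if hit is None:
        return None  # nothing trusted to ground on: escalate instead of guessing
    doc_id, text = hit
    return (
        "Answer using ONLY the source below. If the source does not contain "
        "the answer, say you do not know.\n"
        f"Source [{doc_id}]: {text}\n"
        f"Question: {question}"
    )


print(build_grounded_prompt("Can I return items within 30 days?"))
print(build_grounded_prompt("Do you offer a refund for price drops?"))
```

The second question matches no snippet, so the function returns `None` rather than letting the model improvise a policy. That refusal path is where human oversight fits in for high-stakes answers.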
Choosing GenAI (or not)
Generative AI suits open-ended, language-heavy tasks where a human reviews the output. It's the *wrong* tool when you need deterministic, auditable answers (use rules), when you need precise numeric prediction (use classic ML; forecasting demand, for example, is Amazon Forecast territory, not an LLM), or when errors are unacceptable and nobody will review the output. Judge success on both technical quality and business metrics: cost per interaction, deflection rate, user satisfaction, and revenue impact.
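
As a rough illustration of the business metrics, the sketch below computes three of them for one month of a support bot. The figures are made up, the `BotMonth` fields are hypothetical, and deflection rate is assumed here to mean the share of conversations resolved without handing off to a human agent.

```python
from dataclasses import dataclass, field


@dataclass
class BotMonth:
    """One month of support-bot activity (all fields are illustrative assumptions)."""
    conversations: int
    resolved_without_agent: int
    model_cost_usd: float
    satisfaction_scores: list = field(default_factory=list)  # 1-5 survey ratings


def deflection_rate(month):
    """Share of conversations the bot resolved with no human handoff."""
    return month.resolved_without_agent / month.conversations


def cost_per_interaction(month):
    """Model spend divided by the number of conversations."""
    return month.model_cost_usd / month.conversations


def average_satisfaction(month):
    """Mean survey rating across the collected responses."""
    return sum(month.satisfaction_scores) / len(month.satisfaction_scores)


# Made-up numbers for illustration only.
month = BotMonth(
    conversations=2000,
    resolved_without_agent=1300,
    model_cost_usd=500.0,
    satisfaction_scores=[5, 4, 4, 3, 5],
)

print(f"Deflection rate: {deflection_rate(month):.0%}")
print(f"Cost per interaction: ${cost_per_interaction(month):.2f}")
print(f"Average satisfaction: {average_satisfaction(month):.1f} / 5")
```

Tracking these alongside technical quality measures helps show whether a cheaper, smaller model still meets the business need.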
An LLM-powered support bot confidently tells a customer about a refund policy that does not exist. What is this failure called?