100 Common AI terminologies for QA

A2A Protocol
Application-to-Application communication method. QA may test that these automated interactions are secure and behave as expected.
Acceptance Thresholds (Metrics)
Minimum performance levels required for release. The product team defines these thresholds, and QA assesses whether the model meets them and is ready for deployment.
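A release gate along these lines can be sketched in a few lines of Python. The function name `failed_thresholds`, the metric names, and the numbers are illustrative assumptions, not part of any specific tool.

```python
def failed_thresholds(metrics, thresholds):
    """Return {metric: (actual, required)} for every threshold that is missed."""
    failures = {}
    for name, minimum in thresholds.items():
        actual = metrics.get(name)
        # A metric that was never measured counts as a failure.
        if actual is None or actual < minimum:
            failures[name] = (actual, minimum)
    return failures


# Hypothetical thresholds and measured results, for illustration only.
thresholds = {"accuracy": 0.90, "f1": 0.85}
metrics = {"accuracy": 0.93, "f1": 0.81}

print(failed_thresholds(metrics, thresholds))  # {'f1': (0.81, 0.85)}
```

An empty result means every threshold was met; anything else lists what blocks the release.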
Accuracy (Metric)
Percentage of correct predictions. QA uses it to assess overall model correctness on test sets.
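As a minimal sketch, accuracy is the number of matching predictions divided by the total. The helper name `accuracy` and the sample labels below are made up for illustration.

```python
def accuracy(predictions, ground_truth):
    """Fraction of predictions that match the expected labels."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)


preds = ["cat", "dog", "dog", "cat"]
truth = ["cat", "dog", "cat", "cat"]
print(accuracy(preds, truth))  # 3 of 4 correct -> 0.75
```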
Adaptive Testing
Testing that changes as the AI evolves. QA continuously adjusts test scenarios to validate the model's evolving behavior.
Adversarial
Intentional input manipulation to mislead models. QA tests model robustness against such attacks.
AGI
Artificial General Intelligence: a theoretical AI that can understand and perform any task a human can. The concept is essential for understanding AI evolution.
AI Agents
Autonomous components that can make decisions and perform tasks without human intervention. QA may need to test for autonomy, reliability, and safety.
AI Alignment
Ensuring AI behaviors stay aligned with human goals and ethical expectations. QA often checks for safety, fairness, and unintended behaviors.
AI Model
A trained system (e.g., neural network) that maps inputs to outputs. Testers often validate it against expected performance, accuracy, and edge cases.
AI Wrapper
A software layer or application that simplifies access to AI models. QA tests it like any other software application, with extra attention to its AI-driven functionality.
Algorithm
The underlying set of rules or logic that the AI follows. Testers may verify logical consistency and traceability.
ANN (Artificial Neural Network)
A machine learning model inspired by the brain. QA may test its outputs for accuracy and edge-case handling, and check for overfitting or underfitting.
Artificial Intelligence
Computer systems that mimic human intelligence to solve problems. QA testers evaluate how reliably and safely they perform tasks.
Assistant Role (In Prompt Engineering)
The AI's response behavior. QA tests for correctness, tone, and task-following accuracy.
Autonomous
Refers to systems that operate independently. QA tests whether autonomous agents behave predictably, safely, and as expected.
Bias
When a model shows unfair preferences. QA tests performance across diverse groups to detect bias.
Bidirectional Testing
Evaluates how input affects output and how outputs may influence subsequent inputs or prompts. QA tests both directions to detect inconsistencies or circular dependencies.
Chain-of-Thought (CoT) Prompting
Encourages the AI to reason step-by-step. QA checks each reasoning step for correctness and coherence.
Chatbot
An AI system that interacts with users in natural language. Testers evaluate language understanding, edge cases, and intent coverage.
CNN (Convolutional Neural Network)
A neural network well suited to image processing. QA tests image classification accuracy and robustness to distortions.
Computer Vision
AI interpreting visual input. QA tests include edge detection, object recognition accuracy, and robustness to noise.
Concept Drift
This happens when what a model is predicting starts to mean something different over time, such as when "fraud" in financial data changes due to new scam tactics. QA testers check whether the model's predictions still match current realities and whether labels need to be updated.
Context
Relevant data and history an AI uses for decision-making. Testers check that it is properly maintained and used.
Context Injection Testing
Simulates scenarios where contextual inputs could manipulate or interfere with system prompts. QA verifies prompt isolation and defense layers.
Context Window
The maximum amount of text, measured in tokens, that an LLM can consider at once. QA checks truncation effects and response degradation.
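A truncation check can be sketched as follows. It assumes whitespace-separated words stand in for tokens and that the oldest tokens are dropped first; real tokenizers and truncation strategies vary by model, and `fit_to_window` is a hypothetical helper.

```python
def fit_to_window(tokens, window_size):
    """Keep only the most recent `window_size` tokens (oldest are dropped)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return tokens[-window_size:]


# Whitespace splitting is a stand-in for a real tokenizer.
conversation = "the secret code is 1234 . please summarise the chat so far".split()
kept = fit_to_window(conversation, window_size=6)

print(kept)            # ['please', 'summarise', 'the', 'chat', 'so', 'far']
print("1234" in kept)  # False: the early detail fell out of the window
```

A test like this shows which earlier details the model can no longer see once the conversation outgrows the window.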
Continuous Learning
AI improves by learning from new data. QA tests that model updates do not introduce regressions.
Co-piloting
AI-assisted development or testing support. QA validates suggestion accuracy, usefulness, and workflow integration.
Customize ChatGPT
Modifying ChatGPT's behavior using system instructions. It's a useful way to tailor ChatGPT for better output quality.
Data Dependency
AI relies heavily on training data. QA checks that datasets are diverse, representative, and high quality.
Data Drift
When incoming data shifts from training data. QA detects shifts that could lead to performance and accuracy drops.
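One simple way to flag a shift is to compare the mean of incoming values with the training mean, scaled by the training standard deviation. This is only a sketch: the function name `mean_shift`, the sample values, and the alert threshold of 2.0 are all assumptions, and production monitoring usually relies on richer statistical tests.

```python
from statistics import mean, stdev


def mean_shift(train_values, live_values):
    """Shift of the live mean, in units of the training standard deviation."""
    spread = stdev(train_values)
    if spread == 0:
        raise ValueError("training values have no spread")
    return abs(mean(live_values) - mean(train_values)) / spread


ALERT_THRESHOLD = 2.0  # hypothetical cut-off chosen for this example

train = [10, 12, 11, 13, 12, 11]
live_ok = [11, 12, 12, 11, 13, 10]
live_shifted = [18, 19, 20, 18, 21, 19]

print(mean_shift(train, live_ok) > ALERT_THRESHOLD)       # False
print(mean_shift(train, live_shifted) > ALERT_THRESHOLD)  # True
```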
Data Science
Field focused on extracting insights from data. QA may validate models built by data scientists and the correctness of data pipelines.
Dataset
A collection of data used for training or testing models. QA checks data quality, labeling accuracy, and bias.
Deep Learning
Subset of ML with layered neural networks. QA may test for generalization, overfitting, and inference correctness.
Embedding
Numeric vectors that represent words/phrases. QA validates clustering, similarity scores, and semantic consistency.
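Similarity between embeddings is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors; real embeddings have hundreds or thousands of dimensions and come from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Invented vectors for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.2, 0.9]

# A semantic-consistency check: related words should score higher.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```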
Ethics or Ethical Considerations
Responsible AI usage. QA tests for bias, misuse, and harmful outcomes.
Explainability
How well humans can understand AI decisions. QA checks that explanations are available, truthful, and useful.
F1-Score (Metric)
The harmonic mean of precision and recall, balancing the two. QA uses it to assess performance on imbalanced datasets.
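The sketch below computes precision, recall, and F1 from raw predictions for a binary task. The helper name and the toy labels are illustrative; the data is deliberately imbalanced to show why accuracy alone can mislead.

```python
def precision_recall_f1(predictions, ground_truth, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(predictions, ground_truth))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Only 3 of 10 cases are positive, and the model finds just one of them.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
preds = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(preds, truth)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

Here 7 of 10 predictions are correct, yet F1 is far lower because most positives are missed.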
Fairness
Equity in model outputs across users. QA tests that no group is unfairly disadvantaged.
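One of several possible fairness checks is to compare how often each group receives a positive outcome (often called a demographic parity difference). The glossary does not prescribe a specific measure, so treat this as one illustrative option; the group names and records are invented.

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """records: iterable of (group, prediction) pairs, prediction is 0 or 1."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, prediction in records:
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(records):
    """Largest difference in positive rate between any two groups."""
    rates = positive_rate_by_group(records)
    return max(rates.values()) - min(rates.values())


records = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
           ("B", 1), ("B", 0), ("B", 0), ("B", 0)]

print(positive_rate_by_group(records))  # {'A': 0.75, 'B': 0.25}
print(parity_gap(records))              # 0.5
```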
Few-shot Prompting
AI is given a few examples before answering. QA validates that the model learns from the examples and matches their output format.
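A few-shot prompt is plain text with worked examples placed before the real question. The layout below ("Review:" / "Sentiment:") is just one hypothetical format; any consistent pattern works.

```python
def build_few_shot_prompt(examples, query):
    """Place labeled examples before the query, ending where the model answers."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


examples = [
    ("Great battery life.", "positive"),
    ("Screen cracked in a week.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Fast delivery, works well.")
print(prompt)
```

QA can then check that the model's answer follows the example format, here a single word such as "positive" or "negative".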
Fine-tuning
Additional training on specific data. QA compares performance before and after fine-tuning.
Foundation Model
A large pre-trained model adaptable to many tasks. QA evaluates adaptability, efficiency, and risks of over-generalization.
Frequency Penalty
Reduces repetition by lowering the likelihood of words in proportion to how often they have already appeared. QA evaluates its effect on redundancy and coherence.
G.U.I.D.E. Prompting Formula
A structured way (Goals-Users-Instructions-Details-Examples) to design effective prompts. QA uses it to construct better prompts that elicit consistent and relevant AI outputs.
Generative AI
AI that creates content (text, images, etc.). QA validates creativity vs. control, safety, and factuality.
GPU
Graphics hardware used to run AI efficiently. QA may monitor GPU utilization and performance under load.
Ground Truth
The labeled data used as the correct answer during testing. Testers validate model outputs against ground truth.
Hallucination
When the model generates false or misleading content. Testers identify and report incorrect or nonsensical outputs.
Human-in-the-Loop (HITL) Testing
Humans validate or correct model decisions. QA includes HITL as part of accuracy assurance.
Inference
The model making predictions from inputs. QA tests that predictions are accurate, fast, and explainable.
Instruction Collision Testing
Tests what happens when multiple instructions conflict. QA checks if the model prioritizes or blends conflicting directions responsibly.
LLM (Large Language Model)
A massive text-trained model like GPT-4. QA tests outputs for correctness, reliability, and adherence to constraints.
Machine Learning
AI that learns from data. QA tests ML systems across different datasets, distributions, and use cases.
Max Tokens
The maximum number of tokens (sub-word units, not whole words or characters) the AI can generate in a response. QA testers validate that responses respect this limit and are not truncated unexpectedly.
MCP (Model Context Protocol)
A protocol framework for passing background information, such as user history, settings, or task data, to AI models so they can respond more accurately. QA testers validate that this context is delivered correctly and that the model behaves appropriately based on it.
MLOps
Operational practices for ML lifecycle. QA may test deployment stability, versioning, and monitoring processes.
Model Architecture
The design or structure of the AI model. QA may test performance differences across architectures.
Model Drift
When model behavior changes over time. QA monitors for accuracy degradation or unexpected predictions.
Multimodal Testing
Validates models that use more than one input type (e.g., text + image). QA checks if the system processes and combines inputs correctly.
Natural Language Processing (NLP)
AI that processes and understands human language. Testers validate intent recognition, language accuracy, and contextual responses.
Neural Network
A type of model inspired by the human brain. QA may test that learning is effective and, where possible, interpretable.
Overfitting
When the model performs well on training data but poorly on new data. QA uses holdout or test sets to detect it.
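A basic overfitting check compares the score on training data with the score on a holdout set. The 0.10 tolerance below is an arbitrary assumption for illustration; teams choose their own limit.

```python
def generalization_gap(train_score, holdout_score):
    """How much better the model does on data it has already seen."""
    return train_score - holdout_score


def looks_overfit(train_score, holdout_score, max_gap=0.10):
    """Flag the model when the train/holdout gap exceeds the tolerance."""
    return generalization_gap(train_score, holdout_score) > max_gap


print(looks_overfit(train_score=0.99, holdout_score=0.72))  # True
print(looks_overfit(train_score=0.91, holdout_score=0.89))  # False
```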
Parameters
The internal weights the model learns. QA doesn't directly test them, but they influence output quality.
Precision (Metric)
The share of positive predictions that are actually correct. QA uses it to check for false positives.
Presence Penalty
Discourages using words already mentioned. QA checks for novelty and logical flow in results.
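The difference between the presence penalty and the frequency penalty is easiest to see on raw token scores (logits). The sketch below follows one common formulation, in which the frequency penalty scales with how many times a token has appeared and the presence penalty is a flat deduction once it has appeared at all. Exact behavior depends on the provider, and `apply_penalties` is a hypothetical helper.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that already appear in the output so far."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        seen = 1 if count > 0 else 0
        # Frequency penalty grows with each repeat; presence penalty is flat.
        adjusted[token] = logit - frequency_penalty * count - presence_penalty * seen
    return adjusted


logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
generated = ["the", "cat", "the"]

print(apply_penalties(logits, generated,
                      frequency_penalty=0.5, presence_penalty=0.25))
# {'the': 0.75, 'cat': 0.75, 'sat': 1.0}
```

After the penalties, the unused word "sat" becomes the most likely next token.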
Pre-trained Transformer
A model already trained on large data and adapted for new tasks. QA tests adaptation quality and residual knowledge.
Pretraining
Initial large-scale training on generic data. QA may test for knowledge transfer during fine-tuning.
Privacy
Protection of user data. QA checks for data leaks, logging issues, and compliance with data handling policies.
Probabilistic Outputs
AI provides outputs with varying confidence. QA evaluates certainty thresholds and result variation.
Prompt Engineering
Crafting input prompts to guide model behavior. QA creates prompt suites to test consistency and compliance.
Prompt Entanglement Testing
Examines whether earlier prompts unintentionally affect later responses. QA ensures prompt sessions are scoped correctly.
RAG (Retrieval-Augmented Generation)
A model architecture that uses search results to enhance output. QA validates grounding, retrieval accuracy, and source attribution.
Reasoning Model
Models designed for logical tasks. QA tests chain of thought, deduction accuracy, and reasoning robustness.
Recall (Metric)
The share of actual positives that the model finds. QA uses it to check for false negatives.
Reinforcement Learning
Learning through reward signals. QA verifies stability, convergence, and policy safety.
Repeatability Testing
Checks whether a model consistently gives the same output when run multiple times under the same conditions. QA uses it to assess determinism and debugging reliability.
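A repeatability check can be as simple as calling the model several times with the same input and counting distinct outputs. In this sketch `stable_model` and `flaky_model` are stand-in functions, not real model clients.

```python
import itertools


def is_repeatable(generate, prompt, runs=5):
    """True if `generate` returns the identical output on every run."""
    outputs = {generate(prompt) for _ in range(runs)}
    return len(outputs) == 1


# Stand-ins for a real model call, for illustration only.
def stable_model(prompt):
    return prompt.upper()


_counter = itertools.count()


def flaky_model(prompt):
    return f"{prompt}-{next(_counter)}"  # differs on every call


print(is_repeatable(stable_model, "hello"))  # True
print(is_repeatable(flaky_model, "hello"))   # False
```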
Response Format
The structure or schema of the model's output (e.g., JSON, markdown). QA verifies that formatting meets application needs and is parseable.
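For JSON output, a parseability check might look like the sketch below. The field names `answer` and `confidence` are a hypothetical schema invented for this example.

```python
import json


def validate_response(raw, required_fields):
    """Return a list of problems; an empty list means the response is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for field, expected_type in required_fields.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


schema = {"answer": str, "confidence": float}  # hypothetical schema

print(validate_response('{"answer": "42", "confidence": 0.9}', schema))  # []
print(validate_response('{"answer": 42}', schema))
# ['wrong type for answer', 'missing field: confidence']
```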
RNN (Recurrent Neural Network)
A network for sequential data (e.g., text, time series). QA checks for memory of past inputs and proper sequence predictions.
Robustness
Stability of model outputs under noisy, diverse, or adversarial inputs. QA stress-tests edge conditions and rare cases.
Semi-supervised Learning (Algorithm)
A method using both labeled and unlabeled data. QA may test for generalization and monitor performance gaps.
Sensitivity Analysis
Analyzes how small changes in input affect output. QA uses it to identify fragile or overly sensitive behaviors.
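For a numeric input, sensitivity can be approximated by nudging the input slightly and measuring how much the output moves. Both "models" below are toy functions written for this example.

```python
def sensitivity(model, x, delta=0.01):
    """Approximate change in output per unit change in input, near x."""
    return (model(x + delta) - model(x)) / delta


def smooth_model(x):
    return 2 * x + 1  # output changes gradually


def fragile_model(x):
    return 1.0 if x >= 0.5 else 0.0  # output jumps at a hard threshold


print(sensitivity(smooth_model, 1.0))     # close to 2
print(sensitivity(fragile_model, 0.495))  # very large: a tiny nudge flips the output
```

Unusually large values point QA toward inputs where the model's behavior is fragile.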
Stop Sequences
Text that tells the AI where to stop generating. QA verifies proper cutoffs to avoid extra or incomplete responses.
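The effect of a stop sequence can be imitated on a finished string: everything from the first stop sequence onward is cut. Real APIs apply this during generation; `apply_stop_sequences` is a hypothetical helper for building expected results in tests.

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty stop strings
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


raw = "Answer: 42\nUser: next question"
print(repr(apply_stop_sequences(raw, ["\nUser:"])))  # 'Answer: 42'
```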
Supervised Learning
Training with labeled data. QA uses test sets and metrics (accuracy, F1) to validate.
System Role (In Prompt Engineering)
Defines the AI's behavior context (e.g., expert, assistant). QA tests for role consistency and output alignment.
Temperature
Controls randomness: higher values produce more creative, varied responses. QA tunes this to balance output consistency against creativity.
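Temperature is typically applied by dividing the token scores (logits) by the temperature before converting them to probabilities. The sketch below shows that step with made-up logits.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the peak."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)

# The top token dominates more at low temperature than at high temperature.
print(cold[0] > hot[0])  # True
```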
Text-to-Image Generation
Creating images from text prompts. QA assesses image quality, prompt alignment, and potential harm.
Tokenization
Splitting text into tokens for processing. QA tests that token limits are respected and text structure isn't broken.
Top P
Controls randomness by sampling only from the most likely tokens whose cumulative probability reaches the chosen value. QA tests how this affects output diversity and determinism.
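The filtering step behind Top P (also called nucleus sampling) can be sketched as follows: rank tokens by probability, keep them until the running total reaches `top_p`, then renormalize. The token probabilities are invented for the example.

```python
def top_p_filter(probabilities, top_p):
    """Keep the most likely tokens up to cumulative `top_p`, then renormalize."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


probs = {"blue": 0.5, "green": 0.3, "red": 0.15, "plaid": 0.05}
filtered = top_p_filter(probs, top_p=0.75)
print(sorted(filtered))  # ['blue', 'green']
```

Lower `top_p` values leave fewer candidates, which is why outputs become less diverse.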
TPU
Tensor Processing Unit: Google's AI chip.
Training
The full process of model learning. QA checks for learning progress, convergence, and generalization.
Training Data
The dataset used to teach the model. QA assesses quality, diversity, and labeling accuracy.
Transformer
Model design used in most LLMs. QA focuses on attention behaviors, context limits, and generation stability.
Transformer Model
A model using attention mechanisms. QA tests for handling of long sequences and correct interpretation.
Tree-of-Thought (ToT) Prompting
AI explores multiple reasoning paths like decision trees. QA tests for completeness and evaluates path quality.
Unsupervised Learning
Learning without labeled data. QA checks the discovered patterns and their use in downstream tasks.
User Role (In Prompt Engineering)
The prompt or question from the human user. QA designs diverse prompts to validate AI understanding.
Variance/Configuration Testing
Tests how changes in settings (e.g., temperature, top_p) affect model behavior. QA evaluates performance and stability across configurations.
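To cover configurations systematically, QA can generate every combination of the settings under test. The sketch below builds such a grid; the parameter values are arbitrary examples, and each resulting dictionary would be passed to whatever model client the team uses.

```python
from itertools import product


def config_grid(settings):
    """Expand {name: [values]} into a list of every combination."""
    names = list(settings)
    combos = product(*(settings[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


grid = config_grid({"temperature": [0.0, 0.7, 1.0], "top_p": [0.5, 1.0]})

print(len(grid))  # 6
print(grid[0])    # {'temperature': 0.0, 'top_p': 0.5}
```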
Vibe Coding
Natural language-driven coding tools. QA tests IDE integration, code accuracy, and developer prompts.
Weights
Numerical values the model adjusts during training. QA indirectly assesses them via output quality.
Zero-Shot Learning
Making predictions without task-specific training. QA assesses generalization and failure cases.
Zero-shot Prompting
AI performs a task without specific examples. QA evaluates whether it generalizes effectively.