Privacy & PII Glossary
102 plain-English definitions of the privacy, compliance, and AI vocabulary practitioners encounter every day.
Companion to the field manual on Privacy Regulations for Users of AI Models.
A
-
Adequacy Decision
A formal determination by a regulator (most often the European Commission) that another country provides a level of personal-data protection essentially equivalent to its own, allowing data to flow there without additional safeguards.
Legal -
Affirmative Consent
A clear, voluntary, opt-in indication of agreement to processing.
Legal -
AI Governance
The set of policies, controls, and oversight mechanisms an organization uses to decide which AI systems may be deployed, how they are tested, and who is accountable for their outputs.
AI Operational -
Algorithmic Accountability
The obligation to explain, justify, and remediate consequential decisions made or assisted by algorithms.
AI Legal -
Anonymization
The irreversible removal or transformation of identifiers so that data can no longer reasonably be linked to an individual.
Technical -
Audit Trail
A chronological record of who accessed or modified data, when, and from where.
Operational -
Automated Decision-Making (ADM)
A decision produced by software with little or no human involvement that has a legal or similarly significant effect on a person, e.g., in credit, employment, housing, or insurance.
AI Legal
B
-
BAA (Business Associate Agreement)
A HIPAA-required contract between a Covered Entity and any vendor that handles Protected Health Information on its behalf.
Legal -
Biometric Data
Measurements of unique physical or behavioral characteristics (such as fingerprint, retina, voiceprint, face geometry, or gait) used to identify or authenticate a person.
Legal Technical -
BIPA (Biometric Information Privacy Act)
Illinois's 2008 statute regulating the collection, storage, and use of biometric identifiers (fingerprints, face geometry, voice prints).
Legal -
Breach Notification
A statutory or contractual obligation to inform affected individuals, regulators, or business partners about a security incident involving personal data within a defined window.
Legal -
Business Associate
Under HIPAA, a person or entity that creates, receives, maintains, or transmits Protected Health Information on behalf of a Covered Entity.
Legal
C
-
CCPA / CPRA
The California Consumer Privacy Act (effective 2020) and its 2023 successor, the California Privacy Rights Act, together form the most consequential U.S.
Legal -
Confidence Score
The numerical likelihood, typically from 0 to 1, that a PII entity classifier's output is correct, e.g., that a detected token is in fact a Social Security number.
Technical AI -
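A confidence score is usually acted on through a threshold. The sketch below is illustrative only: the `Detection` class, the `filter_by_confidence` helper, and the 0.8 cutoff are assumptions made for this example, not part of any particular detection tool.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    text: str          # the span the classifier flagged
    entity_type: str   # e.g. "SSN"
    score: float       # classifier confidence, from 0 to 1


def filter_by_confidence(detections, threshold=0.8):
    """Keep only detections whose score meets the (hypothetical) threshold."""
    return [d for d in detections if d.score >= threshold]


detections = [
    Detection("123-45-6789", "SSN", 0.97),
    Detection("555-0100", "SSN", 0.41),
]
kept = filter_by_confidence(detections)
```

Raising the threshold trades false positives for false negatives, so the cutoff is normally tuned per entity type rather than fixed globally.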
Consent
A data subject's permission to process their personal data for a defined purpose.
Legal -
Controller (Data Controller)
Under GDPR and most state laws, the party that determines the purposes and means of processing personal data.
Legal -
Cookie
A small piece of data a website stores in a user's browser to remember state, identify the user across requests, or track behavior.
Technical -
Covered Entity
Under HIPAA, a health plan, health-care clearinghouse, or health-care provider that transmits health information electronically in connection with a covered transaction.
Legal -
Cross-Border Data Transfer
Movement of personal data from one jurisdiction to another.
Legal
D
-
Data Breach
An unauthorized acquisition, access, use, or disclosure of personal data.
Legal Operational -
Data Classification
The process of categorizing data by sensitivity (e.g., public, internal, confidential, restricted) so that handling, retention, and access controls can be applied consistently.
Operational Technical -
Data Discovery
Automated scanning of structured and unstructured data stores to locate personal information, classify it, and map where it lives.
Technical -
Data Inventory / Data Mapping
A catalog of what personal data an organization holds, where it lives, why it was collected, who has access, how long it is kept, and where it flows.
Operational -
Data Loss Prevention (DLP)
Software that monitors and controls movement of sensitive data at endpoints, on networks, and in cloud services to prevent unauthorized exfiltration.
Technical -
Data Minimization
The principle that only personal data necessary for the stated purpose should be collected, and no more.
Legal -
Data Processing Agreement (DPA)
A GDPR-mandated contract between a controller and processor setting out the subject matter, duration, nature, purpose, and security obligations of the processing.
Legal -
Data Protection Impact Assessment (DPIA)
A structured analysis required by GDPR Article 35 (and its analogs) before processing that is likely to result in high risk to individuals, such as large-scale profiling, biometric processing, or systematic monitoring of public areas.
Legal -
Data Protection Officer (DPO)
A designated official responsible for monitoring an organization's compliance with privacy law, advising on DPIAs, and serving as the point of contact for regulators.
Legal Operational -
Data Subject
An identified or identifiable natural person whose personal data is being processed.
Legal -
Data Subject Access Request (DSAR)
A request from an individual to exercise statutory rights, typically access, correction, deletion, or portability of their personal data.
Legal Operational -
De-identification
A process that removes or transforms identifiers so that data cannot reasonably be linked to a specific individual.
Technical Legal -
Differential Privacy
A mathematical framework that bounds how much any single individual's data can influence an aggregate result, by adding calibrated noise.
Technical -
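One common way to add calibrated noise is the Laplace mechanism. The following is a minimal sketch for a counting query, assuming a privacy parameter `epsilon` chosen by the caller; the function names and the sample `ages` list are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy `predicate`."""
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


ages = [23, 35, 41, 52, 67, 44]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
```

A smaller `epsilon` means more noise and a stronger privacy guarantee; repeated queries consume privacy budget, which this sketch does not track.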
Direct Identifier
A data element that, alone, identifies an individual: full name, Social Security number, account number, email address, biometric template.
Technical Legal
E
-
Encryption
The transformation of plaintext into ciphertext using an algorithm and a key.
Technical -
EU AI Act
The European Union's comprehensive AI regulation (Regulation (EU) 2024/1689), risk-tiered: unacceptable, high, limited, and minimal.
Legal AI -
Expert Determination Method
The HIPAA de-identification alternative to Safe Harbor: a qualified statistician applies generally accepted statistical principles and documents that the risk of re-identification is very small.
Legal Technical
F
-
False Negative
A piece of sensitive data that a detection system missed.
Technical -
False Positive
A detection alert that turns out not to be sensitive data.
Technical -
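False positives and false negatives are usually summarized as precision and recall. A small sketch, with a hypothetical helper name and made-up counts:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Summarize detector quality from raw outcome counts."""
    detected = true_positives + false_positives   # everything the system flagged
    actual = true_positives + false_negatives     # everything that was really sensitive
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Illustrative numbers: 90 correct detections, 10 false alarms, 30 misses.
precision, recall = precision_recall(90, 10, 30)
```

In privacy work a false negative (leaked data) is typically costlier than a false positive (an unnecessary redaction), so recall often gets more weight.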
FCRA (Fair Credit Reporting Act)
1970 U.S. statute regulating consumer reporting agencies and the use of consumer reports for credit, employment, insurance, and tenancy decisions.
Legal -
Federated Learning
A machine-learning approach in which a model is trained across multiple devices or servers holding local data, with only model updates, not raw data, shared centrally.
Technical AI -
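The central aggregation step in federated learning can be sketched as a weighted average of client model parameters (the approach commonly called federated averaging). The function name and the flat-list representation of model weights are assumptions made for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter lists, weighted by local dataset size.

    Only these parameter lists reach the server; the raw records stay
    on each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients with 1 and 3 local examples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Model updates can still leak information about local data, which is why federated learning is often combined with differential privacy or secure aggregation.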
FERPA (Family Educational Rights and Privacy Act)
1974 U.S. statute giving parents, and adult students, rights over education records held by schools that receive federal funding.
Legal -
Fingerprinting
Identifying a browser, device, or user from a constellation of seemingly innocuous attributes: fonts installed, screen resolution, time zone, plugin list.
Technical
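To see why individually harmless attributes become identifying in combination, the sketch below hashes a set of attributes into a single stable identifier. The attribute names and values are hypothetical, and real fingerprinting scripts collect many more signals.

```python
import hashlib
import json


def fingerprint(attributes):
    """Derive a stable identifier from a dictionary of device attributes."""
    # Sorting keys makes the result independent of attribute order.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


device = {
    "fonts": ["Arial", "Helvetica"],
    "screen": "1920x1080",
    "timezone": "America/Chicago",
}
device_id = fingerprint(device)
```

No cookie is stored, so clearing browser storage does not change the identifier; it changes only when one of the underlying attributes does.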
G
H
-
Hashing
A one-way mathematical transformation that produces a fixed-length value from arbitrary input.
Technical -
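A quick illustration using SHA-256 from Python's standard `hashlib` module; the helper name `sha256_hex` is made up for this example.

```python
import hashlib


def sha256_hex(value):
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)
# Both digests have the same length regardless of input size, and the
# same input always produces the same digest.
```

Because the output is deterministic, an unsalted hash of a low-entropy value such as an SSN can be reversed by hashing every possible input, so hashing alone is not anonymization.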
Highlight
A non-destructive review mode in PII detection tools that overlays a colored marker on every detected sensitive entity without altering the underlying document.
Technical -
HIPAA (Health Insurance Portability and Accountability Act)
1996 U.S. statute whose Privacy and Security Rules govern Protected Health Information held by Covered Entities and their Business Associates.
Legal -
Homomorphic Encryption
A class of encryption that allows computation directly on ciphertext, producing an encrypted result that decrypts to the same answer as if the computation had been performed on the plaintext.
Technical
I
J
K
L
-
L-Diversity
An extension of k-anonymity that requires each equivalence class to contain at least l well-represented values for any sensitive attribute, defending against attribute disclosure.
Technical -
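A minimal check for the simplest variant, distinct l-diversity, which just counts different sensitive values per equivalence class. The record layout, field names, and function name are assumptions for illustration.

```python
from collections import defaultdict


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every equivalence class has at least `l` distinct sensitive values."""
    classes = defaultdict(set)
    for record in records:
        # Records sharing the same quasi-identifier values form one class.
        key = tuple(record[q] for q in quasi_identifiers)
        classes[key].add(record[sensitive])
    return all(len(values) >= l for values in classes.values())


records = [
    {"zip": "606**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "606**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "607**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "607**", "age": "40-49", "diagnosis": "flu"},
]
ok = is_l_diverse(records, ["zip", "age"], "diagnosis", l=2)
```

Here the second class contains only "flu", so anyone known to be in that class has their diagnosis disclosed even though the class has two members.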
Lawful Basis
Under GDPR Article 6, one of six grounds that justifies processing personal data: consent, contract, legal obligation, vital interests, public task, or legitimate interests.
Legal -
Legitimate Interest
A GDPR lawful basis for processing where the controller's interests are not overridden by the rights and freedoms of the data subject.
Legal -
Limited Data Set
Under HIPAA, a subset of PHI from which certain direct identifiers have been removed (names, street addresses, telephone numbers, email addresses, SSNs, biometric identifiers, etc.) but which may include dates, city, state, and ZIP codes.
Legal
M
-
Masking
Concealing parts of a data value with a fixed character pattern (e.g., showing "***-**-1234" for an SSN).
Technical -
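A short sketch of masking that keeps only the last four digits of a Social Security number. The function name and the choice of which digits to reveal are assumptions for this example.

```python
def mask_ssn(ssn):
    """Mask all but the last four digits of a 9-digit SSN."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected an SSN with exactly 9 digits")
    return "***-**-" + "".join(digits[-4:])


masked = mask_ssn("123-45-6789")
```

The visible digits are still personal data, so a masked value reduces exposure on screen but does not de-identify the record.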
Material Breach
A breach significant enough to trigger notification obligations under the applicable law.
Legal -
Membership Inference Attack
An attack on a machine-learning model that determines whether a particular record was part of the training data.
AI Technical -
Model Inversion Attack
An attack that reconstructs training data, or features of training data, from a model's outputs.
AI Technical
N
-
Named Entity Recognition (NER)
A natural-language-processing technique that identifies and classifies spans of text as people, organizations, locations, dates, account numbers, and other entity types.
Technical AI -
NIST Privacy Framework
A voluntary, risk-based framework from the U.S.
Operational -
Nonpublic Personal Information (NPI)
Personally identifiable financial information that a financial institution collects about a customer and that is not publicly available.
Legal -
Notice (Privacy Notice)
A statement, posted or delivered to data subjects, that describes what personal data is collected, how it is used, who it is shared with, how long it is kept, and what rights the individual has.
Legal
O
P
-
Personally Identifiable Information (PII)
Information that identifies, relates to, or could reasonably be linked to a particular individual.
Legal Technical -
PIPL (Personal Information Protection Law)
China's comprehensive data-protection law, effective November 1, 2021.
Legal -
Privacy by Design
A philosophy, popularized by Ann Cavoukian and codified in GDPR Article 25, that builds privacy protections into systems from the outset rather than bolting them on later.
Operational -
Privacy Enhancing Technologies (PETs)
A catch-all term for technical measures that preserve utility while reducing exposure of personal data: differential privacy, federated learning, secure multi-party computation, homomorphic encryption, synthetic data, trusted execution...
Technical -
Privacy Impact Assessment (PIA)
A documented evaluation of how a project or system handles personal data and what risks it poses to individuals.
Legal Operational -
Processing
Any operation performed on personal data, such as collection, storage, retrieval, use, disclosure, modification, or deletion.
Legal -
Processor
A party that processes personal data on behalf of a controller.
Legal -
Prompt Injection
An attack on a large-language-model application in which adversarial instructions embedded in user input or retrieved content cause the model to deviate from its intended behavior.
AI Technical -
Protected Health Information (PHI)
Under HIPAA, individually identifiable health information held or transmitted by a Covered Entity or Business Associate.
Legal -
Pseudonymization
Replacing direct identifiers with reversible tokens or codes so that data can no longer be attributed to a person without additional information held separately.
Technical Legal -
Purpose Limitation
The principle that personal data collected for one specified purpose should not be processed for another incompatible purpose without a new lawful basis.
Legal
Q
R
-
Re-Identification Risk
The probability that a de-identified or anonymized record can be matched back to a real individual using external data or inference.
Technical Legal -
Records of Processing Activities (RoPA)
A documented inventory of processing operations required under GDPR Article 30 for most controllers and processors.
Legal Operational -
Red Teaming
Adversarial testing of an AI system to uncover failure modes, jailbreaks, data leaks, and harmful outputs before deployment.
AI Technical -
Redaction
The permanent removal of sensitive elements from a document or data set, leaving placeholders or blank space.
Technical -
Replace
A de-identification method that swaps each detected entity with a numbered placeholder such as <<PERSON1>> or <<SSN3>>, often paired with a legend that can re-identify the original values.
Technical -
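The Replace method can be sketched with a regular expression standing in for the detector. Only SSN-shaped strings are handled here; the pattern, function names, and sample text are hypothetical, and a real tool would use a fuller detection pipeline.

```python
import re

# Simplified stand-in for a PII detector: matches NNN-NN-NNNN only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def replace_ssns(text):
    """Swap each distinct SSN for a numbered placeholder and build a legend."""
    value_to_placeholder = {}
    legend = {}

    def substitute(match):
        value = match.group(0)
        if value not in value_to_placeholder:
            placeholder = f"<<SSN{len(value_to_placeholder) + 1}>>"
            value_to_placeholder[value] = placeholder
            legend[placeholder] = value
        return value_to_placeholder[value]

    return SSN_PATTERN.sub(substitute, text), legend


def restore(text, legend):
    """Re-identify a document using the legend."""
    for placeholder, value in legend.items():
        text = text.replace(placeholder, value)
    return text


original = "A: 123-45-6789, B: 987-65-4321, again 123-45-6789"
replaced, legend = replace_ssns(original)
```

Repeated values get the same placeholder, which preserves references within the document; the legend is as sensitive as the original and needs to be stored separately.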
Retrieval-Augmented Generation (RAG)
An architecture that grounds an LLM's responses in retrieved documents at inference time.
AI Technical -
Right of Access
An individual's statutory right to obtain confirmation of whether their personal data is being processed and to receive a copy of that data along with related information.
Legal -
Right to Erasure (Right to be Forgotten)
An individual's statutory right to have their personal data deleted in defined circumstances, e.g., when no longer necessary, when consent is withdrawn, or when processing was unlawful.
Legal
S
-
Safe Harbor Identifiers
beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers; device identifiers; URLs; IP addresses; biometric identifiers; full-face photographs and comparable images; and any other unique identifying number,...
Legal -
Salting
Adding random data to an input before hashing so that identical inputs produce different hashes, defeating pre-computed lookup attacks.
Technical -
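A minimal sketch of salting with SHA-256, assuming a fresh 16-byte random salt per value; the function name is hypothetical. For passwords, a deliberately slow key-derivation function would normally be used instead of a single SHA-256 pass.

```python
import hashlib
import secrets


def salted_hash(value, salt=None):
    """Hash `value` with a salt, generating a random one if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    # The salt is not secret, but it must be stored to verify the value later.
    return salt, digest


salt_a, digest_a = salted_hash("123-45-6789")
salt_b, digest_b = salted_hash("123-45-6789")
# Verification reuses the stored salt and compares digests.
_, recomputed = salted_hash("123-45-6789", salt_a)
```

The two digests differ even though the input is identical, so a table of pre-computed hashes for all SSNs would not match either one.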
Schrems II
The 2020 Court of Justice of the European Union decision that invalidated the EU-U.S.
Legal -
Sectoral Privacy Law
A privacy law that regulates a specific industry or data category (health, finance, education, children's data) rather than personal data generally.
Legal -
Sensitive Data Scanning
Automated discovery of regulated data classes (SSNs, payment-card numbers, health information, biometric data) inside files, databases, message archives, and cloud storage.
Technical -
Sensitive Personal Information (SPI)
A category in most U.S.
Legal -
Service Provider
Under CCPA/CPRA, a vendor that processes personal information on behalf of a business under a contract restricting its use.
Legal -
Standard Contractual Clauses (SCCs)
Pre-approved contract language published by the European Commission that establishes the safeguards required to transfer personal data outside the EEA to non-adequate countries.
Legal -
Subprocessor
A processor engaged by another processor to perform processing on behalf of the original controller.
Legal -
Synthetic Data
Artificially generated data that resembles a real dataset statistically but does not correspond to any real individual.
Technical AI
T
-
T-Closeness
An extension of k-anonymity and l-diversity that requires the distribution of a sensitive attribute in any equivalence class to be close to the distribution in the overall table, defending against skewness and similarity attacks.
Technical -
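A rough sketch of a t-closeness check. It uses total variation distance to compare each equivalence class with the whole table, which is an assumption made for simplicity; other distance measures are used in practice. Field and function names are hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Relative frequency of each value in a list."""
    counts = Counter(values)
    return {k: c / len(values) for k, c in counts.items()}


def total_variation(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_class_distance(records, quasi_identifiers, sensitive):
    """Largest distance between any class distribution and the overall one."""
    overall = distribution([r[sensitive] for r in records])
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    return max(total_variation(distribution(v), overall) for v in classes.values())


records = [
    {"zip": "606**", "diagnosis": "flu"},
    {"zip": "606**", "diagnosis": "flu"},
    {"zip": "607**", "diagnosis": "cancer"},
    {"zip": "607**", "diagnosis": "flu"},
]
worst = max_class_distance(records, ["zip"], "diagnosis")
```

Under this measure the table satisfies t-closeness for any threshold t at or above the returned value.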
Third-Party Risk Management (TPRM)
A program for assessing and monitoring the privacy and security posture of vendors, subprocessors, and other external parties that handle the organization's data.
Operational -
Tokenization
Replacing a sensitive value with a non-sensitive substitute (a "token") that has no exploitable value and can be reversed only through a secure mapping held elsewhere.
Technical -
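The idea of a secure mapping held elsewhere can be sketched as an in-memory vault. The `TokenVault` class and the `tok_` prefix are hypothetical; a production vault would be a separate, access-controlled, persistent service.

```python
import secrets


class TokenVault:
    """Toy token vault: random tokens plus a private lookup table."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, not derived from value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
```

Because the token is random rather than computed from the value, it cannot be reversed without access to the vault, unlike an encrypted or hashed value that depends on a key or on the input itself.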
Training Data
The dataset used to fit a machine-learning model.
AI -
Transfer Impact Assessment (TIA)
A documented analysis of the legal and practical risks of transferring personal data to a non-adequate jurisdiction, including the surveillance and judicial-redress environment of the recipient country.
Legal
U
V
W
Go deeper than definitions
The full Privacy Regulations field manual covers HIPAA, GDPR, CCPA, FERPA, GLBA, and the EU AI Act with practical guidance for working with AI models.