Snowflake PHI Detection
Learn how to detect protected health information (PHI) in Snowflake environments. Follow step-by-step guidance for HIPAA compliance.
Why It Matters
The goal is to identify every location where protected health information (PHI) is stored in your Snowflake environment, so you can remediate unintended exposures before they become breaches. For healthcare organizations subject to HIPAA, scanning Snowflake for PHI is a priority: it lets you demonstrate that you have discovered and accounted for your sensitive healthcare data, which reduces the risk of exposure.
A thorough scan delivers immediate visibility, laying the foundation for automated policy enforcement and ongoing compliance.
Prerequisites
Permissions & Roles
- Snowflake ACCOUNTADMIN or SECURITYADMIN role (for initial setup: creating the scanning role and granting its privileges)
- USAGE privileges on the databases and schemas to be scanned
- SELECT privileges on the tables to be scanned, including any already known to contain PHI
External Tools
- Snowflake CLI or SnowSQL
- Cyera DSPM account
- API credentials
Prior Setup
- Snowflake account provisioned
- Database and schema structure defined
- Network policies configured
- Business Associate Agreement (BAA) in place
Introducing Cyera
Cyera is a Data Security Posture Management (DSPM) platform that discovers, classifies, and continuously monitors sensitive data across cloud services. It automates PHI discovery in Snowflake using AI and Named Entity Recognition (NER) models, which helps you catch accidental exposures early and supports HIPAA audit requirements. Its classification engine is designed to identify medical record numbers, patient names, diagnosis codes, and other PHI patterns.
Step-by-Step Guide
Ensure proper role hierarchy is established and create a service account with the minimum required privileges for scanning healthcare data tables.
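As a minimal sketch of what "minimum required privileges" could look like, the Python helper below generates the Snowflake SQL for a read-only scanning role and service user. The role, user, warehouse, and database names are hypothetical placeholders, and the exact privilege set your scanner needs is an assumption to confirm against the vendor's documentation.

```python
import re

# Plain unquoted Snowflake identifiers: letter or underscore first,
# then letters, digits, underscores, or dollar signs.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _ident(name: str) -> str:
    """Reject anything that is not a plain unquoted identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def scanner_grants(role: str, user: str, warehouse: str, databases: list[str]) -> list[str]:
    """Return SQL statements that set up a read-only scanning role and user."""
    role, user, warehouse = _ident(role), _ident(user), _ident(warehouse)
    stmts = [
        f"CREATE ROLE IF NOT EXISTS {role}",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
    ]
    for db in map(_ident, databases):
        stmts += [
            f"GRANT USAGE ON DATABASE {db} TO ROLE {role}",
            f"GRANT USAGE ON ALL SCHEMAS IN DATABASE {db} TO ROLE {role}",
            f"GRANT SELECT ON ALL TABLES IN DATABASE {db} TO ROLE {role}",
        ]
    stmts += [
        f"CREATE USER IF NOT EXISTS {user} DEFAULT_ROLE = {role} DEFAULT_WAREHOUSE = {warehouse}",
        f"GRANT ROLE {role} TO USER {user}",
    ]
    return stmts


if __name__ == "__main__":
    # All names here are illustrative only.
    for stmt in scanner_grants("PHI_SCANNER_ROLE", "PHI_SCANNER_SVC", "SCAN_WH", ["CLINICAL_DB"]):
        print(stmt + ";")
```

Run the printed statements with a role that can create roles and manage grants. Grants on ALL TABLES cover only tables that exist at grant time, so you may also want matching FUTURE TABLES grants if new tables should be scanned automatically.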
In the Cyera portal, navigate to Integrations → DSPM → Add new. Select Snowflake, provide your account details and the service account credentials, then define the scan scope to include healthcare databases and patient data tables.
Configure webhooks or streaming exports to push PHI detection results into your healthcare SIEM or Security Hub. Link findings to existing compliance tracking systems like GRC platforms or healthcare audit tools.
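The forwarding step might look roughly like the sketch below, which maps one detection result onto a flat event and builds an HTTP POST for a SIEM ingestion endpoint. The finding fields, the event schema, the URL, and the bearer-token scheme are all assumptions for illustration; they are not Cyera's webhook format or any particular SIEM's API.

```python
import json
import urllib.request
from datetime import datetime, timezone


def to_siem_event(finding: dict) -> dict:
    """Map one (hypothetical) PHI finding onto a flat event a SIEM can index."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": "phi-scan",
        "resource": ".".join([finding["database"], finding["schema"], finding["table"]]),
        "column": finding["column"],
        "category": finding["category"],
        # Assumed rule: very confident detections are raised as high severity.
        "severity": "high" if finding["confidence"] >= 0.9 else "medium",
    }


def build_request(url: str, event: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=10)` and handle errors and retries as your SIEM requires. Keeping the event limited to locations and categories, rather than sampled values, avoids copying PHI into the SIEM.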
Review the initial PHI detection report, prioritize tables with large volumes of patient data, and adjust detection rules to reduce false positives on synthetic or de-identified datasets. Schedule recurring scans to maintain HIPAA compliance visibility.
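One simple way to prioritize the initial report is to rank flagged tables by row count, breaking ties by the number of PHI columns. The sketch below assumes you have exported per-table results into a small record; the `TableFinding` fields are hypothetical and not a Cyera export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableFinding:
    table: str        # fully qualified table name
    row_count: int    # approximate number of rows in the table
    phi_columns: int  # number of columns flagged as containing PHI


def prioritize(findings: list[TableFinding], top_n: int = 10) -> list[TableFinding]:
    """Return flagged tables, largest first, ties broken by PHI column count."""
    flagged = [f for f in findings if f.phi_columns > 0]
    ranked = sorted(flagged, key=lambda f: (f.row_count, f.phi_columns), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    report = [
        TableFinding("CLINICAL_DB.PUBLIC.PATIENTS", 2_000_000, 6),
        TableFinding("CLINICAL_DB.PUBLIC.LOOKUP_CODES", 500, 0),
        TableFinding("CLINICAL_DB.PUBLIC.VISITS", 9_000_000, 3),
    ]
    for f in prioritize(report):
        print(f.table, f.row_count, f.phi_columns)
```

Row count is only a proxy for exposure; you may prefer to weight by who can access the table or by the sensitivity of the categories found.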
Architecture & Workflow
- Snowflake Information Schema: source of metadata for databases, schemas, and tables
- Cyera Connector: pulls metadata and samples healthcare data for classification
- Cyera AI Engine: applies NER models and PHI detection algorithms
- HIPAA Reporting & Remediation: compliance dashboards, alerts, and remediation playbooks
Data Flow Summary
Snowflake Information Schema → Cyera Connector → Cyera AI Engine → HIPAA Reporting & Remediation
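To make the flow concrete, here is a simplified sketch of the same stages: read column metadata, classify each column from its name and a few sampled values, and collect a report. It uses plain regular expressions as a stand-in for the NER models, so it illustrates the shape of the pipeline rather than Cyera's actual engine. The hint and pattern tables are assumptions, and the query is shown for reference only; the code does not execute it.

```python
import re

# Metadata query against Snowflake's INFORMATION_SCHEMA (reference only).
COLUMNS_QUERY = """
SELECT table_catalog, table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema <> 'INFORMATION_SCHEMA'
"""

# Column-name hints (illustrative, far from exhaustive).
NAME_HINTS = {
    "medical_record_number": re.compile(r"(^|_)mrn($|_)|medical_record", re.I),
    "date_of_birth": re.compile(r"(^|_)dob($|_)|birth", re.I),
    "patient_name": re.compile(r"patient_?name", re.I),
    "diagnosis_code": re.compile(r"diagnosis|(^|_)icd", re.I),
}

# Value patterns: US SSN layout and an approximate ICD-10 code shape.
VALUE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "diagnosis_code": re.compile(r"\b[A-TV-Z]\d[0-9AB](?:\.[0-9A-TV-Z]{1,4})?\b"),
}


def classify_column(column_name: str, samples: list) -> set[str]:
    """Return PHI labels suggested by the column name or by sampled values."""
    labels = {label for label, rx in NAME_HINTS.items() if rx.search(column_name)}
    for label, rx in VALUE_PATTERNS.items():
        if any(rx.search(str(v)) for v in samples if v is not None):
            labels.add(label)
    return labels


def scan(columns) -> dict[str, list[str]]:
    """columns: iterable of (table, column_name, samples) tuples."""
    report = {}
    for table, column, samples in columns:
        labels = classify_column(column, samples)
        if labels:
            report[f"{table}.{column}"] = sorted(labels)
    return report
```

Regex matching of this kind produces false positives (for example, a product code shaped like an ICD-10 code), which is one reason production classifiers combine patterns with context-aware models and confidence scores.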
Best Practices & Tips
Performance Considerations
- Start with incremental scans on patient data tables
- Use statistical sampling for very large healthcare datasets
- Schedule scans during off-peak hours
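The sampling and incremental-scan tips above can be sketched as follows. The helper picks a Snowflake `SAMPLE BERNOULLI` percentage that targets a fixed number of rows, and a second helper keeps only tables altered since the last scan. The 10,000-row target is an arbitrary assumption, and the table names and timestamps are expected to come from trusted metadata such as `INFORMATION_SCHEMA.TABLES`.

```python
from datetime import datetime


def sample_clause(row_count: int, target_rows: int = 10_000) -> str:
    """Return a SAMPLE clause aiming for roughly target_rows rows, or '' for small tables."""
    if row_count <= target_rows:
        return ""  # small table: read it in full
    pct = max(target_rows / row_count * 100, 0.0001)
    return f" SAMPLE BERNOULLI ({pct:.4f})"


def sample_query(table: str, row_count: int, target_rows: int = 10_000) -> str:
    """Build a sampling query; `table` must be a trusted, fully qualified name."""
    return f"SELECT * FROM {table}{sample_clause(row_count, target_rows)}"


def tables_to_rescan(tables, last_scan: datetime) -> list[str]:
    """tables: iterable of (name, last_altered). Keep those changed since last_scan."""
    return [name for name, last_altered in tables if last_altered > last_scan]


if __name__ == "__main__":
    print(sample_query("CLINICAL_DB.PUBLIC.VISITS", 9_000_000))
    print(sample_query("CLINICAL_DB.PUBLIC.CLINICS", 800))
```

Row-level Bernoulli sampling returns an approximate, not exact, number of rows, and rare PHI values can be missed at low sampling rates, so consider a periodic full scan of the highest-risk tables.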
Tuning PHI Detection Rules
- Maintain allowlists for synthetic patient datasets
- Adjust confidence thresholds for medical terminology
- Configure custom patterns for organization-specific PHI
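The three tuning levers above can be illustrated with a small post-processing filter: an allowlist of synthetic datasets, per-category confidence thresholds, and an organization-specific pattern. Every value below (the glob patterns, the thresholds, the `MRN-` plus eight digits format) is a made-up example, and the finding dictionary is a hypothetical shape rather than a Cyera rule format.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical allowlist of synthetic or de-identified datasets (glob patterns).
SYNTHETIC_ALLOWLIST = ["*.SYNTHETIC.*", "DEV_DB.*"]

# Hypothetical minimum confidence per category; medical terms get a stricter bar.
THRESHOLDS = {"diagnosis_code": 0.8}
DEFAULT_THRESHOLD = 0.6

# Hypothetical organization-specific identifier format.
CUSTOM_PATTERNS = {"org_mrn": re.compile(r"\bMRN-\d{8}\b")}


def is_allowlisted(table: str) -> bool:
    """True if the fully qualified table name matches an allowlist pattern."""
    return any(fnmatchcase(table, pattern) for pattern in SYNTHETIC_ALLOWLIST)


def keep_finding(finding: dict) -> bool:
    """Drop allowlisted tables and findings below their category's threshold."""
    if is_allowlisted(finding["table"]):
        return False
    threshold = THRESHOLDS.get(finding["category"], DEFAULT_THRESHOLD)
    return finding["confidence"] >= threshold


def match_custom(text: str) -> list[str]:
    """Return the names of custom patterns found in the text."""
    return sorted(name for name, rx in CUSTOM_PATTERNS.items() if rx.search(text))
```

Review allowlist entries periodically: a dataset labeled synthetic can pick up real patient records over time, and an allowlist would then hide them.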
Common Pitfalls
- Overlooking temporary tables with patient data
- Missing external stages containing healthcare files
- Neglecting to scan shared databases from partners
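A simple guard against these pitfalls is to compare an inventory of everything that could hold patient data with what the scan actually covered. The sketch below assumes you assemble the inventory yourself, for example from table metadata plus the output of Snowflake's `SHOW STAGES` and `SHOW SHARES` commands; the object kinds and names are hypothetical.

```python
def coverage_gaps(inventory: dict[str, set[str]], scanned: set[str]) -> dict[str, list[str]]:
    """Return, per object kind, the inventoried objects missing from the scan."""
    gaps = {}
    for kind, names in inventory.items():
        missing = names - scanned
        if missing:
            gaps[kind] = sorted(missing)
    return gaps


if __name__ == "__main__":
    # Illustrative inventory; build yours from real account metadata.
    inventory = {
        "tables": {"CLINICAL_DB.PUBLIC.PATIENTS", "CLINICAL_DB.PUBLIC.VISITS"},
        "external_stages": {"CLINICAL_DB.PUBLIC.LAB_FILES_STAGE"},
        "shared_databases": {"PARTNER_CLAIMS_DB"},
    }
    scanned = {"CLINICAL_DB.PUBLIC.PATIENTS", "CLINICAL_DB.PUBLIC.VISITS"}
    for kind, missing in coverage_gaps(inventory, scanned).items():
        print(kind, missing)
```

Running a check like this after each scheduled scan turns the pitfalls list into a recurring control rather than a one-time review.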