Snowflake PHI Detection

Learn how to detect protected health information (PHI) in Snowflake environments. Follow step-by-step guidance for HIPAA compliance.

Why It Matters

The goal is to identify every location where protected health information is stored in your Snowflake environment so you can remediate unintended exposures before they become breaches. For healthcare organizations subject to HIPAA, scanning Snowflake for PHI is a priority: it helps demonstrate that sensitive healthcare data has been discovered and accounted for, which reduces the risk of data exposure.

Primary Risk: Data exposure of protected health information

Relevant Regulation: HIPAA (Health Insurance Portability and Accountability Act)

A thorough scan delivers immediate visibility, laying the foundation for automated policy enforcement and ongoing compliance.

Prerequisites

Permissions & Roles

  • Snowflake ACCOUNTADMIN or SECURITYADMIN role
  • USAGE privileges on databases and schemas
  • SELECT privileges on tables containing PHI

External Tools

  • Snowflake CLI or SnowSQL
  • Cyera DSPM account
  • API credentials

Prior Setup

  • Snowflake account provisioned
  • Database and schema structure defined
  • Network policies configured
  • Business Associate Agreement (BAA) in place

Introducing Cyera

Cyera is a Data Security Posture Management (DSPM) platform that discovers, classifies, and continuously monitors sensitive data across cloud services. It automates PHI discovery in Snowflake using AI and Named Entity Recognition (NER) models, which helps you catch accidental exposures early and support HIPAA audit requirements. Its classification engine is designed to identify medical record numbers, patient names, diagnosis codes, and other PHI patterns.

Step-by-Step Guide

1. Configure your Snowflake workspace

Ensure a proper role hierarchy is established, then create a service account with the minimum privileges required to scan healthcare data tables.

snowsql -a <account_identifier> -u <username>
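
The least-privilege setup described in this step could be scripted roughly as follows. This is a minimal sketch, not Cyera's official onboarding script: the role, user, warehouse, and database names are hypothetical placeholders, and the function only builds standard Snowflake SQL statements for an administrator to review and run through SnowSQL.

```python
def build_scanner_grants(role, user, warehouse, databases):
    """Return Snowflake SQL statements for a read-only scanning role.

    All identifiers passed in are hypothetical examples; adapt them to
    your own naming conventions before running anything.
    """
    statements = [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"CREATE USER IF NOT EXISTS {user} DEFAULT_ROLE = {role};",
        f"GRANT ROLE {role} TO USER {user};",
        # The scanner needs a warehouse to run its read queries.
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
    ]
    for db in databases:
        statements += [
            f"GRANT USAGE ON DATABASE {db} TO ROLE {role};",
            f"GRANT USAGE ON ALL SCHEMAS IN DATABASE {db} TO ROLE {role};",
            # Read access only: the role receives no write or ownership grants.
            f"GRANT SELECT ON ALL TABLES IN DATABASE {db} TO ROLE {role};",
        ]
    return statements


if __name__ == "__main__":
    for stmt in build_scanner_grants(
        role="PHI_SCANNER_ROLE",
        user="PHI_SCANNER_SVC",
        warehouse="SCAN_WH",
        databases=["CLINICAL_DB"],
    ):
        print(stmt)
```

Grants on ALL TABLES apply only to tables that exist when the statement runs, so tables created later would need a matching grant on FUTURE TABLES or a re-run of the script.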

2. Enable scanning workflows

In the Cyera portal, navigate to Integrations → DSPM → Add new. Select Snowflake, provide your account details and the service account credentials, then define the scan scope to include healthcare databases and patient data tables.

3. Integrate with third-party tools

Configure webhooks or streaming exports to push PHI detection results into your healthcare SIEM or Security Hub. Link findings to existing compliance tracking systems like GRC platforms or healthcare audit tools.
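
As an illustration of the webhook approach, the sketch below wraps a single finding in a JSON POST request using only the Python standard library. The finding fields, the event envelope, and the webhook URL are all assumptions made for the example; they do not reflect Cyera's actual export schema or any particular SIEM's ingestion format.

```python
import json
import urllib.request


def build_siem_request(finding, webhook_url):
    """Wrap a PHI finding in a JSON POST request for a SIEM webhook.

    The finding keys and the event layout are hypothetical.
    """
    event = {
        "source": "phi-scan",
        "severity": "high" if finding["exposed"] else "medium",
        "resource": f'{finding["database"]}.{finding["schema"]}.{finding["table"]}',
        "data_classes": sorted(finding["data_classes"]),
        "regulation": "HIPAA",
    }
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_siem_request(
        {
            "database": "CLINICAL_DB",
            "schema": "PUBLIC",
            "table": "PATIENTS",
            "data_classes": {"patient_name", "mrn"},
            "exposed": True,
        },
        "https://siem.example.com/hooks/phi",  # placeholder URL
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # urllib.request.urlopen(request) would send it; omitted in this sketch.
```

Building the request separately from sending it keeps the payload easy to inspect and test before any event reaches a production SIEM.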

4. Validate results and tune policies

Review the initial PHI detection report, prioritize tables with large volumes of patient data, and adjust detection rules to reduce false positives on synthetic or de-identified datasets. Schedule recurring scans to maintain HIPAA compliance visibility.
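
One way to prioritize the review is to rank findings by a rough exposure score and skip tables already known to be de-identified. The sketch below assumes each finding is a dictionary with hypothetical keys "table", "row_count", and "phi_columns"; the scoring rule (rows multiplied by PHI columns) is an illustrative heuristic, not a Cyera metric.

```python
def prioritize_findings(findings, deidentified_tables=frozenset()):
    """Order findings for review, largest PHI footprint first.

    Each finding is a dict with hypothetical keys: "table",
    "row_count", and "phi_columns" (a list of column names).
    """
    actionable = [
        f for f in findings
        if f["table"] not in deidentified_tables and f["phi_columns"]
    ]
    # Rank by rows times PHI columns as a rough proxy for exposure.
    return sorted(
        actionable,
        key=lambda f: f["row_count"] * len(f["phi_columns"]),
        reverse=True,
    )


if __name__ == "__main__":
    report = [
        {"table": "VISITS", "row_count": 2_000_000, "phi_columns": ["mrn"]},
        {"table": "PATIENTS", "row_count": 900_000, "phi_columns": ["name", "dob", "mrn"]},
        {"table": "SYNTH_PATIENTS", "row_count": 5_000_000, "phi_columns": ["name"]},
    ]
    for item in prioritize_findings(report, deidentified_tables={"SYNTH_PATIENTS"}):
        print(item["table"])
```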

Architecture & Workflow

Snowflake Information Schema

Source of metadata for databases, schemas, and tables

Cyera Connector

Pulls metadata and samples healthcare data for classification

Cyera AI Engine

Applies NER models and PHI detection algorithms

HIPAA Reporting & Remediation

Compliance dashboards, alerts, and remediation playbooks

Data Flow Summary

Enumerate Databases → Send to Cyera → Apply PHI Detection → Route Findings
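
The four stages can be pictured as a small pipeline. The sketch below runs entirely on in-memory data and uses a single regular expression for US Social Security numbers as a stand-in for the classification step; Cyera's real engine uses NER models, and every function name here is hypothetical.

```python
import re

# Stand-in pattern for US Social Security numbers; a real engine would
# use NER models rather than a single regular expression.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def enumerate_tables(catalog):
    """Stage 1: list qualified table names from a metadata catalog."""
    return [f"{db}.{table}" for db, tables in catalog.items() for table in tables]


def sample_rows(table, data, limit=100):
    """Stage 2: take a bounded sample of rows to send for classification."""
    return data.get(table, [])[:limit]


def detect_phi(rows):
    """Stage 3: collect sampled values that look like PHI."""
    return [value for row in rows for value in row if SSN_PATTERN.search(str(value))]


def route_findings(table, hits):
    """Stage 4: turn hits into a finding record for dashboards or alerts."""
    return {"table": table, "hit_count": len(hits), "alert": bool(hits)}


def run_pipeline(catalog, data):
    """Run all four stages for every table in the catalog."""
    findings = []
    for table in enumerate_tables(catalog):
        hits = detect_phi(sample_rows(table, data))
        findings.append(route_findings(table, hits))
    return findings


if __name__ == "__main__":
    catalog = {"CLINICAL_DB": ["PATIENTS", "CODES"]}
    data = {
        "CLINICAL_DB.PATIENTS": [("Jane Doe", "123-45-6789")],
        "CLINICAL_DB.CODES": [("E11.9",)],
    }
    for finding in run_pipeline(catalog, data):
        print(finding)
```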

Best Practices & Tips

Performance Considerations

  • Start with incremental scans on patient data tables
  • Use statistical sampling for very large healthcare datasets
  • Schedule scans during off-peak hours
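
To illustrate the sampling tip, the sketch below builds a query that reads a small table in full and applies Snowflake's SAMPLE clause with row-level (BERNOULLI) sampling to larger ones. The 10,000-row target and the function name are assumptions chosen for the example, not recommended values.

```python
def build_sample_query(table, row_count, target_rows=10_000):
    """Return a Snowflake query that reads roughly target_rows rows.

    Small tables are read in full; larger ones use row-level
    (BERNOULLI) sampling with a percentage sized to hit the target.
    """
    if row_count <= target_rows:
        return f"SELECT * FROM {table};"
    # SAMPLE takes a percentage between 0 and 100; keep a small floor
    # so very large tables still return some rows.
    percent = max(round(100 * target_rows / row_count, 4), 0.0001)
    return f"SELECT * FROM {table} SAMPLE BERNOULLI ({percent});"


if __name__ == "__main__":
    print(build_sample_query("CLINICAL_DB.PUBLIC.CODES", 500))
    print(build_sample_query("CLINICAL_DB.PUBLIC.VISITS", 50_000_000))
```

Row-level sampling is probabilistic, so the number of rows returned varies from run to run around the target.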

Tuning PHI Detection Rules

  • Maintain allowlists for synthetic patient datasets
  • Adjust confidence thresholds for medical terminology
  • Configure custom patterns for organization-specific PHI
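
The three tuning levers above can be sketched in a few lines. The finding keys, the 0.8 default threshold, and the "MRN-" plus eight digits format are hypothetical examples; an actual medical record number format and suitable thresholds depend on your organization and tooling.

```python
import re

# Hypothetical organization-specific pattern: "MRN-" plus eight digits.
CUSTOM_PATTERNS = {"mrn": re.compile(r"\bMRN-\d{8}\b")}


def apply_tuning(findings, allowlist, min_confidence=0.8):
    """Drop findings from allowlisted tables or below the threshold."""
    return [
        f for f in findings
        if f["table"] not in allowlist and f["confidence"] >= min_confidence
    ]


def match_custom_patterns(text, patterns=CUSTOM_PATTERNS):
    """Return the names of custom patterns found in a text value."""
    return [name for name, rx in patterns.items() if rx.search(text)]


if __name__ == "__main__":
    findings = [
        {"table": "PATIENTS", "confidence": 0.95},
        {"table": "SYNTH_PATIENTS", "confidence": 0.99},
        {"table": "NOTES", "confidence": 0.40},
    ]
    print(apply_tuning(findings, allowlist={"SYNTH_PATIENTS"}))
    print(match_custom_patterns("Chart for MRN-00123456 updated"))
```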

Common Pitfalls

  • Overlooking temporary tables with patient data
  • Missing external stages containing healthcare files
  • Neglecting to scan shared databases from partners
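
A simple coverage check can help catch these gaps. The sketch below compares an object inventory against the set of names covered by the last scan and reports anything of an easily missed kind. The inventory format and the kind labels are assumptions; in practice the inventory might be assembled from Snowflake commands such as SHOW STAGES and SHOW SHARES.

```python
def find_coverage_gaps(inventory, scanned):
    """Return easily missed objects that the last scan did not cover.

    `inventory` is a list of dicts with hypothetical keys "name" and
    "kind"; `scanned` is a set of object names covered by the scan.
    """
    easily_missed = {"TEMPORARY TABLE", "EXTERNAL STAGE", "SHARED DATABASE"}
    return sorted(
        obj["name"]
        for obj in inventory
        if obj["kind"] in easily_missed and obj["name"] not in scanned
    )


if __name__ == "__main__":
    inventory = [
        {"name": "CLINICAL_DB.PUBLIC.PATIENTS", "kind": "TABLE"},
        {"name": "CLINICAL_DB.PUBLIC.TMP_INTAKE", "kind": "TEMPORARY TABLE"},
        {"name": "CLINICAL_DB.PUBLIC.LAB_FILES", "kind": "EXTERNAL STAGE"},
        {"name": "PARTNER_CLAIMS", "kind": "SHARED DATABASE"},
    ]
    scanned = {"CLINICAL_DB.PUBLIC.PATIENTS", "PARTNER_CLAIMS"}
    print(find_coverage_gaps(inventory, scanned))
```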