Snowflake Analytics Data Detection
Learn how to detect analytics data in Snowflake environments, with step-by-step guidance to support SOC 2 compliance.
Why It Matters
The core goal is to identify every location where analytics data is stored within your Snowflake environment, so you can remediate unintended exposures before they become breaches. Scanning for analytics data in Snowflake is a priority for organizations subject to SOC 2, as it helps you prove you've discovered and accounted for all sensitive analytical assets—mitigating the risk of shadow data repositories that exist outside your governance framework.
A thorough scan delivers immediate visibility, laying the foundation for automated policy enforcement and ongoing compliance.
Prerequisites
Permissions & Roles
- Snowflake ACCOUNTADMIN or SYSADMIN role
- Database and schema read privileges
- Ability to create service accounts
External Tools
- Snowflake CLI or SnowSQL
- Cyera DSPM account
- API credentials
Prior Setup
- Snowflake account provisioned
- Network policies configured
- Authentication method established
- Data governance framework defined
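With the prerequisites in place, a quick connectivity check confirms the account, role, and databases the scanner will be able to see. Below is a minimal sketch using the snowflake-connector-python package; the environment-variable names and the SYSADMIN role are placeholders for your own setup:

```python
# Minimal pre-flight check before onboarding the scanner. Credentials come
# from environment variables purely for illustration; key-pair or SSO
# authentication may be preferable to passwords.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],      # e.g. "xy12345.us-east-1"
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SYSADMIN",                              # or whichever admin role you use
    warehouse=os.environ.get("SNOWFLAKE_WAREHOUSE"),
)
cur = conn.cursor()

# Confirm who and where we are connected as.
cur.execute("SELECT CURRENT_ACCOUNT(), CURRENT_ROLE(), CURRENT_USER()")
print(cur.fetchone())

# List the databases the scanning service account will need to read.
cur.execute("SHOW DATABASES")
for row in cur:
    print(row[1])   # 'name' is the second column in SHOW DATABASES output
conn.close()
```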
Introducing Cyera
Cyera is a modern Data Security Posture Management (DSPM) platform that discovers, classifies, and continuously monitors your sensitive data across cloud services. By leveraging advanced AI and natural language processing (NLP) techniques, Cyera automatically identifies analytics data patterns, dashboard queries, and reporting datasets in Snowflake, ensuring you stay ahead of shadow data proliferation and meet SOC 2 audit requirements in real time.
Step-by-Step Guide
Step 1: Configure Snowflake access
Create a dedicated service account with read permissions across all databases and schemas in scope, and configure network access policies so Cyera's scanning infrastructure can connect.
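A minimal setup sketch follows. The role, user, policy, database, and warehouse names and the allowed IP range are illustrative placeholders; use Cyera's published egress IPs for the network policy and your own authentication method for the service user.

```python
# One-time Snowflake setup for a read-only scanning service account.
# All object names below (CYERA_SCANNER_ROLE, CYERA_SVC, CYERA_NET_POLICY,
# ANALYTICS_DB, COMPUTE_WH) and the IP range are placeholders.
import snowflake.connector

setup_statements = [
    "CREATE ROLE IF NOT EXISTS CYERA_SCANNER_ROLE",
    # Read-only access to one database; repeat for each database in scope.
    "GRANT USAGE ON DATABASE ANALYTICS_DB TO ROLE CYERA_SCANNER_ROLE",
    "GRANT USAGE ON ALL SCHEMAS IN DATABASE ANALYTICS_DB TO ROLE CYERA_SCANNER_ROLE",
    "GRANT SELECT ON ALL TABLES IN DATABASE ANALYTICS_DB TO ROLE CYERA_SCANNER_ROLE",
    "GRANT SELECT ON ALL VIEWS IN DATABASE ANALYTICS_DB TO ROLE CYERA_SCANNER_ROLE",
    # Keep future objects covered as new analytics tables appear.
    "GRANT SELECT ON FUTURE TABLES IN DATABASE ANALYTICS_DB TO ROLE CYERA_SCANNER_ROLE",
    # Warehouse the scanner can use for its queries (placeholder name).
    "GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE CYERA_SCANNER_ROLE",
    # Dedicated service user; set PASSWORD or RSA_PUBLIC_KEY per your auth method.
    "CREATE USER IF NOT EXISTS CYERA_SVC DEFAULT_ROLE = CYERA_SCANNER_ROLE",
    "GRANT ROLE CYERA_SCANNER_ROLE TO USER CYERA_SVC",
    # Restrict where the service user may connect from (use Cyera's egress IPs).
    "CREATE NETWORK POLICY IF NOT EXISTS CYERA_NET_POLICY ALLOWED_IP_LIST = ('203.0.113.0/24')",
    "ALTER USER CYERA_SVC SET NETWORK_POLICY = 'CYERA_NET_POLICY'",
]

conn = snowflake.connector.connect(
    account="<account_locator>", user="<admin_user>", password="<password>",
    role="SECURITYADMIN",   # role with CREATE USER / CREATE ROLE / MANAGE GRANTS
)
cur = conn.cursor()
for stmt in setup_statements:
    cur.execute(stmt)
conn.close()
```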
Step 2: Enable the Snowflake integration in Cyera
In the Cyera portal, navigate to Integrations → DSPM → Add new. Select Snowflake, provide your account URL and service account credentials, then configure detection rules specifically for analytics data patterns like aggregated tables, BI views, and reporting schemas.
Step 3: Set up continuous monitoring
Configure scheduled scans to monitor for new analytics workloads, unauthorized data sharing, and changes in data classification. Set up alerts for when analytics data appears in unexpected locations or with improper access controls.
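As a supplement to Cyera's scheduled scans, you can run a simple in-account check for newly created tables in analytics-looking schemas. A rough sketch, assuming access to SNOWFLAKE.ACCOUNT_USAGE; the schema-name patterns and the 7-day window are assumptions to tune for your environment:

```python
# Flag tables created in the last 7 days whose schema names look
# analytics-related. ACCOUNT_USAGE views can lag real time by a few hours.
import snowflake.connector

NEW_ANALYTICS_TABLES_SQL = """
    SELECT table_catalog, table_schema, table_name, table_type, created
    FROM snowflake.account_usage.tables
    WHERE deleted IS NULL
      AND created >= DATEADD(day, -7, CURRENT_TIMESTAMP())
      AND (table_schema ILIKE '%ANALYTIC%'
           OR table_schema ILIKE '%REPORT%'
           OR table_schema ILIKE '%BI%')
    ORDER BY created DESC
"""

conn = snowflake.connector.connect(
    account="<account_locator>", user="<user>", password="<password>",
    role="ACCOUNTADMIN",          # or any role granted ACCOUNT_USAGE access
    warehouse="<warehouse>",
)
cur = conn.cursor()
cur.execute(NEW_ANALYTICS_TABLES_SQL)
for db, schema, table, table_type, created in cur:
    print(f"{created}  {db}.{schema}.{table} ({table_type})")
conn.close()
```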
Step 4: Review findings and enforce policy
Review the initial detection report, categorize analytics datasets by sensitivity and business criticality, and establish data lineage tracking. Create policies to prevent analytics data from being inadvertently shared or copied to unsecured environments.
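One way to make those classifications enforceable inside Snowflake is object tagging (available on Enterprise Edition and higher). A hedged sketch, where the tag schema, tag values, and example table are hypothetical:

```python
# Record classification results as Snowflake object tags so downstream
# access policies can key off them. GOVERNANCE.TAGS and the REVENUE_SUMMARY
# table are placeholders; the connecting role needs APPLY TAG or ownership.
import snowflake.connector

tagging_statements = [
    "CREATE TAG IF NOT EXISTS GOVERNANCE.TAGS.DATA_SENSITIVITY "
    "ALLOWED_VALUES 'public', 'internal', 'confidential', 'restricted'",
    # Apply the classification decided during report review.
    "ALTER TABLE ANALYTICS_DB.REPORTING.REVENUE_SUMMARY "
    "SET TAG GOVERNANCE.TAGS.DATA_SENSITIVITY = 'confidential'",
]

conn = snowflake.connector.connect(
    account="<account_locator>", user="<user>", password="<password>",
    role="SYSADMIN", warehouse="<warehouse>",
)
cur = conn.cursor()
for stmt in tagging_statements:
    cur.execute(stmt)

# Verify which tags are now applied to the table.
cur.execute(
    "SELECT * FROM TABLE(ANALYTICS_DB.INFORMATION_SCHEMA.TAG_REFERENCES("
    "'ANALYTICS_DB.REPORTING.REVENUE_SUMMARY', 'table'))"
)
print(cur.fetchall())
conn.close()
```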
Architecture & Workflow
- Snowflake Information Schema: source of metadata for tables, views, and queries
- Cyera Connector: pulls metadata and analyzes query patterns
- AI Classification Engine: applies ML models to identify analytics workloads
- Governance Dashboard: risk scoring, alerts, and remediation workflows
Data Flow Summary
Metadata and query patterns flow from the Snowflake Information Schema through the Cyera connector into the AI classification engine, which identifies analytics workloads and feeds risk scores, alerts, and remediation workflows to the governance dashboard.
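For illustration only (this is not Cyera's implementation), the kind of metadata the connector works from is available directly in the Information Schema. A simplified sketch, using the placeholder database and scanner role from Step 1:

```python
# Pull the table/view inventory and recent query text that the
# classification stage conceptually mines for analytical patterns.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_locator>", user="CYERA_SVC", password="<password>",
    role="CYERA_SCANNER_ROLE", warehouse="<warehouse>",
)
cur = conn.cursor()

# Table and view inventory for one database.
cur.execute("""
    SELECT table_schema, table_name, table_type, row_count, bytes
    FROM ANALYTICS_DB.INFORMATION_SCHEMA.TABLES
    WHERE table_schema <> 'INFORMATION_SCHEMA'
""")
inventory = cur.fetchall()

# Recent query text (aggregations, BI tool fingerprints, etc.).
cur.execute("""
    SELECT query_text, user_name, start_time
    FROM TABLE(ANALYTICS_DB.INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100))
""")
recent_queries = cur.fetchall()
conn.close()
```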
Best Practices & Tips
Performance Considerations
- Schedule scans during off-peak hours
- Use query result sampling for large datasets (see the sketch after this list)
- Implement incremental scanning for frequent updates
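For the sampling tip above, Snowflake's SAMPLE clause is one way to read only a fraction of a very large table during content scans; the table name and the 1% rate below are placeholders:

```python
# Row sampling keeps content scans cheap on large tables: SAMPLE BERNOULLI
# reads roughly the requested percentage of rows.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_locator>", user="CYERA_SVC", password="<password>",
    role="CYERA_SCANNER_ROLE", warehouse="<scan_warehouse>",
)
cur = conn.cursor()
cur.execute("""
    SELECT *
    FROM ANALYTICS_DB.REPORTING.EVENTS_FACT SAMPLE BERNOULLI (1)  -- ~1% of rows
    LIMIT 10000
""")
sample_rows = cur.fetchall()
conn.close()
```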
Analytics Data Identification
- Focus on aggregated tables and materialized views
- Monitor query history for analytical patterns (example query after this list)
- Track data sharing and external access
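A rough heuristic for the query-history tip above, assuming access to SNOWFLAKE.ACCOUNT_USAGE; the keyword list and one-day window are assumptions, not Cyera's classifier:

```python
# Spot analytical query patterns by scanning recent query text for
# aggregation keywords.
import snowflake.connector

ANALYTICAL_QUERIES_SQL = """
    SELECT user_name, warehouse_name, query_text, start_time
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD(day, -1, CURRENT_TIMESTAMP())
      AND query_type = 'SELECT'
      AND (query_text ILIKE '%GROUP BY%' OR query_text ILIKE '%ROLLUP%'
           OR query_text ILIKE '%PIVOT%')
    ORDER BY start_time DESC
    LIMIT 200
"""

conn = snowflake.connector.connect(
    account="<account_locator>", user="<user>", password="<password>",
    role="ACCOUNTADMIN", warehouse="<warehouse>",
)
cur = conn.cursor()
cur.execute(ANALYTICAL_QUERIES_SQL)
for user, wh, text, started in cur:
    print(f"{started}  {user}@{wh}: {text[:80]}")
conn.close()
```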
Common Pitfalls
- Missing temporary analytics tables in transient databases (see the sketch after this list)
- Overlooking shared analytics data across accounts
- Failing to track data pipeline transformations
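For the first pitfall, a quick inventory of transient tables helps confirm nothing is slipping past the scanner. A sketch assuming ACCOUNT_USAGE access:

```python
# Transient tables skip fail-safe and often hold throwaway analytics
# output, so list them account-wide and check they are in scan scope.
import snowflake.connector

TRANSIENT_TABLES_SQL = """
    SELECT table_catalog, table_schema, table_name, created
    FROM snowflake.account_usage.tables
    WHERE deleted IS NULL
      AND is_transient = 'YES'
    ORDER BY created DESC
"""

conn = snowflake.connector.connect(
    account="<account_locator>", user="<user>", password="<password>",
    role="ACCOUNTADMIN", warehouse="<warehouse>",
)
cur = conn.cursor()
cur.execute(TRANSIENT_TABLES_SQL)
for db, schema, table, created in cur:
    print(f"{db}.{schema}.{table}  created {created}")
conn.close()
```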