joshita / dev


Snowflake


Details about Snowflake

  1. Snowflake uses a multi-cluster shared data architecture that separates compute from storage, enabling elastic scaling.
  2. Built on cloud infrastructure (AWS, Azure, GCP) and does not rely on Hadoop.
  3. Supports structured and semi-structured data (e.g., JSON, Parquet, Avro).
  4. Automatic query optimization without requiring indexes or tuning.
  5. End-to-end encryption for data at rest and in transit.
  6. Fully managed, with no infrastructure maintenance required.
  7. Simple user interface and support for standard SQL.
  8. Connects with tools like Tableau, Power BI, and Apache Spark.
  9. Optimized query execution and concurrency handling.
  10. No on-premises version; it is entirely cloud-dependent.
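
To make item 3 concrete, here is a minimal sketch of one common way to handle semi-structured JSON: the raw document goes into a `VARIANT` column, and fields are read with path notation and casts. The table and field names (`raw_events`, `payload`, `customer`, `items`) are hypothetical, and the Python code only assembles SQL text; it does not connect to Snowflake.

```python
# Sketch only: builds Snowflake SQL strings for a hypothetical JSON table.
# All table, column, and field names are illustrative assumptions.

def json_table_statements(table: str = "raw_events") -> list[str]:
    """Return SQL that stores JSON in a VARIANT column and then queries it."""
    sample = '{"customer": {"name": "Ada"}, "items": [{"sku": "A1"}, {"sku": "B2"}]}'

    # A single VARIANT column can hold the whole JSON document.
    create = f"CREATE TABLE IF NOT EXISTS {table} (payload VARIANT);"

    # PARSE_JSON turns the text into a VARIANT value.
    insert = f"INSERT INTO {table} SELECT PARSE_JSON('{sample}');"

    # Path notation (payload:customer.name) reads nested fields;
    # LATERAL FLATTEN produces one row per element of the items array.
    query = (
        "SELECT payload:customer.name::STRING AS customer_name, "
        "item.value:sku::STRING AS sku "
        f"FROM {table}, LATERAL FLATTEN(input => payload:items) item;"
    )
    return [create, insert, query]


for statement in json_table_statements():
    print(statement)
```

The statements can be pasted into a Snowflake worksheet one at a time.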

Snowflake Multi-Cluster Architecture

  1. Centralized storage: All data is stored centrally in Snowflake's managed cloud storage.
  2. This storage is shared across all compute clusters, and it is automatically optimized and compressed.
  3. Virtual warehouses are independent compute clusters that execute queries and other data-processing tasks. Each warehouse is isolated, so workloads on one do not impact others.
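
The idea of isolated virtual warehouses over shared storage can be sketched with two `CREATE WAREHOUSE` statements, one per workload. This is a minimal example under assumptions: the warehouse names (`etl_wh`, `bi_wh`), sizes, and suspend timeout are made up, and a `MAX_CLUSTER_COUNT` above 1 relies on the multi-cluster warehouse feature, which as far as I know is only available on higher Snowflake editions. The Python helper just formats the SQL.

```python
# Sketch only: formats CREATE WAREHOUSE statements.
# Warehouse names, sizes, and timeouts are illustrative assumptions.

def warehouse_ddl(name: str, size: str, max_clusters: int = 1,
                  auto_suspend_seconds: int = 60) -> str:
    """Return a CREATE WAREHOUSE statement for one independent compute cluster."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH "
        f"WAREHOUSE_SIZE = '{size}' "
        f"MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = {max_clusters} "
        f"AUTO_SUSPEND = {auto_suspend_seconds} AUTO_RESUME = TRUE;"
    )


# One warehouse for batch ETL, another for dashboards. Both read the same
# centrally stored data, but neither competes for the other's compute.
etl_wh = warehouse_ddl("etl_wh", "LARGE")
bi_wh = warehouse_ddl("bi_wh", "SMALL", max_clusters=3)

print(etl_wh)
print(bi_wh)
```

Keeping the BI warehouse small but allowing extra clusters targets concurrency, while the larger single-cluster ETL warehouse targets heavy individual jobs.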

Where Is Snowflake Used

  1. High concurrency: Enables many users to run queries simultaneously without performance degradation. Useful for real-time analytics, dashboards, or BI tools where multiple queries run at once.
  2. Workload isolation: Separate workloads, such as ETL jobs and interactive analysis, can run on separate warehouses.
  3. Elastic scaling: Automatically scales up and down as load rises to a peak and falls off.
  4. Cost efficiency: You pay only for the compute you use, and auto-scaling helps reduce cost as well.
  5. It provides a SQL query interface: create a database, create tables, load data from S3 in the desired format, and query it in a structured way.
  6. Great for SQL analytics as a fully managed service.
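
The workflow in item 5 (create a database, create a table, load from S3, query) can be sketched as an ordered list of SQL statements. This is a rough example, not a recommended setup: the database, table, stage, bucket URL, and storage integration names are all hypothetical, and it assumes CSV files with a header row and an existing storage integration that grants access to the bucket.

```python
# Sketch only: the statements a session might run, in order.
# Database, table, stage, bucket, and integration names are hypothetical.

def load_workflow(bucket_url: str, integration: str) -> list[str]:
    """Return SQL for: create database, create table, stage S3 data, load, query."""
    table = "sales_db.public.orders"
    stage = "sales_db.public.orders_stage"
    return [
        "CREATE DATABASE IF NOT EXISTS sales_db;",
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(order_id INTEGER, amount NUMBER(10, 2), ordered_at TIMESTAMP_NTZ);",
        # An external stage points Snowflake at the S3 location.
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"URL = '{bucket_url}' STORAGE_INTEGRATION = {integration};",
        # COPY INTO bulk-loads the staged files into the table.
        f"COPY INTO {table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);",
        # Once loaded, the data is queried with ordinary SQL.
        f"SELECT DATE_TRUNC('day', ordered_at) AS day, SUM(amount) AS revenue "
        f"FROM {table} GROUP BY 1 ORDER BY 1;",
    ]


for statement in load_workflow("s3://example-bucket/orders/", "s3_int"):
    print(statement)
```

The statements could be run from a worksheet, or passed one by one to a cursor's `execute` method when using the Snowflake Python connector.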

Snowflake Weaknesses

  1. Lacks advanced real-time stream processing capabilities.
  2. Dependent on cloud providers, and it is not open source, unlike Hive, which is open source and can run on data in Hadoop.

Some Competitors of Snowflake

  1. BigQuery
  2. Amazon Redshift
  3. Apache Hive (open source)
  4. Presto (Trino)