Redshift

  • Petabyte-scale data warehouse.
  • It is an OLAP, column-based database.
  • Data on S3 can be queried directly, without loading it into Redshift, using Redshift Spectrum
  • Federated query allows querying of multiple remote databases
  • It is server based and provisioned
  • Redshift runs across multiple nodes connected by a high-speed network, so a cluster runs in a single AZ and is therefore not highly available by design. It is a VPC service
  • Leader Node - Takes query input, creates the execution plan, and aggregates the results returned by the compute nodes
  • Compute Node - Performs the actual query work. Each compute node is divided into slices
  • Slices work in parallel
  • A node can have 2, 4, 16 or 32 slices, depending on the node size
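  • Enhanced VPC routing must be enabled for advanced VPC configurations. It forces traffic between the cluster and data repositories such as S3 through the VPC instead of public routes.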
  • When data is written, it is automatically replicated to an additional node other than the one being written to
  • Automatic backups to S3 happen every 8 hours or every 5 GB of data written to the cluster, with 1-day retention by default. Manual snapshots can also be created.
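
The leader node / compute node / slice split described above can be pictured with a small toy model. This is only a sketch in plain Python, not how Redshift is implemented: the function names (split_into_slices, slice_partial_sum, leader_sum) are made up for illustration, round-robin distribution is an assumption, and threads stand in for slices.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(rows, num_slices):
    """Spread rows round-robin across slices (assumed distribution, for illustration)."""
    return [rows[i::num_slices] for i in range(num_slices)]


def slice_partial_sum(slice_rows):
    """Work done by one slice: aggregate only its own share of the data."""
    return sum(slice_rows)


def leader_sum(rows, num_slices=4):
    """Leader role: fan the work out to the slices, then combine the partial results."""
    slices = split_into_slices(rows, num_slices)
    # Slices work in parallel; here each one is simulated by a thread.
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(slice_partial_sum, slices))
    # Final aggregation happens on the leader.
    return sum(partials)


if __name__ == "__main__":
    print(leader_sum(list(range(1, 101)), num_slices=4))
```

The point of the sketch is the shape of the work: each slice only touches its own portion of the data, and the leader only combines small partial results.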
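
The automatic backup rule ("every 8 hours or every 5 GB written, whichever comes first") can be written down as a simple check. This is a sketch of the rule as stated in these notes, not an AWS API: the function name snapshot_due and the constants are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
# Thresholds taken from the notes above; the names are illustrative only.
SNAPSHOT_INTERVAL_HOURS = 8
SNAPSHOT_DATA_THRESHOLD_GB = 5


def snapshot_due(hours_since_last, gb_written_since_last):
    """Return True when either the time or the data threshold has been reached."""
    return (
        hours_since_last >= SNAPSHOT_INTERVAL_HOURS
        or gb_written_since_last >= SNAPSHOT_DATA_THRESHOLD_GB
    )


if __name__ == "__main__":
    print(snapshot_due(2, 1))  # neither threshold reached
    print(snapshot_due(2, 6))  # data threshold reached before 8 hours
```

A busy cluster therefore gets snapshots more often than every 8 hours, while an idle one still gets one on the time interval.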