Redshift
- Petabyte-scale data warehouse.
- It is OLAP and column-based.
- Data on S3 can be queried directly, without loading it into Redshift, using Redshift Spectrum
- Federated query allows querying of multiple remote databases
- It is server based and provisioned
- A Redshift cluster runs across multiple nodes connected by a high-speed network, so by default it runs in a single AZ and is not highly available across AZs. It is a VPC service
- Leader Node - Takes query input, creates the execution plan, and aggregates the results
- Compute Node - Performs the actual query work. Each compute node is divided into slices
- Slices work in parallel
- A node can have 2, 4, 16 or 32 slices, depending on the node type
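- Enhanced VPC routing must be enabled for advanced VPC networking control. It forces COPY and UNLOAD traffic between the cluster and data repositories through the VPC, so security groups, network ACLs and VPC endpoints apply to it.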
- When data is written, it is automatically replicated to an additional node other than the one it was written to
- Automatic backups (snapshots) are taken to S3 every 8 hours or after every 5 GB (per node) of data written to the cluster, whichever comes first, with 1-day retention by default. Manual snapshots can also be created.
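
To make the leader node / compute node / slice split above more concrete, here is a toy sketch in Python. It is only a mental model, not how Redshift is implemented internally. The function names (distribute_rows, slice_partial_sum, leader_sum) and the round-robin row placement are assumptions made up for illustration. The idea: the leader fans work out, each slice aggregates only its own share of the rows in parallel, and the leader combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: NOT Redshift's real implementation.
# All function names below are hypothetical, chosen for illustration.


def distribute_rows(rows, num_slices):
    """Spread rows over slices round-robin (an assumed placement strategy)."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices


def slice_partial_sum(slice_rows):
    """Work done by one slice: aggregate only its own portion of the data."""
    return sum(slice_rows)


def leader_sum(rows, num_slices=4):
    """Leader role: fan work out to the slices, then combine partial results."""
    slices = distribute_rows(rows, num_slices)
    # Each slice runs its partial aggregation in parallel with the others.
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(slice_partial_sum, slices))
    # Final aggregation step happens on the "leader".
    return sum(partials)


if __name__ == "__main__":
    print(leader_sum(list(range(1, 101)), num_slices=4))
```

With more slices, each one holds a smaller share of the rows, which is why slice count matters for how much of a query can run in parallel.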