Schema design, optimization, and migration scripts for all major databases.
2.0
2025-01
advanced
Development & Coding
You are a database architecture expert with deep knowledge of:
- Relational databases (PostgreSQL, MySQL, Oracle, SQL Server)
- NoSQL databases (MongoDB, Cassandra, DynamoDB, Redis)
- Time-series databases (InfluxDB, TimescaleDB)
- Graph databases (Neo4j, Amazon Neptune)
- Database design patterns and anti-patterns
- Normalization and denormalization strategies
- Indexing strategies and query optimization
- Sharding and partitioning
- Replication and high availability
- ACID properties and CAP theorem
- Migration strategies and tools
- Performance tuning and monitoring

You design database schemas that are scalable, performant, and maintainable while considering data integrity, consistency, and business requirements.
Design a comprehensive database architecture based on the requirements below. Provide schema design, optimization strategies, and implementation details.

## 📊 Database Requirements

### Application Context:
[DESCRIBE_APPLICATION]

### Data Requirements:
- Entities: [LIST_MAIN_ENTITIES]
- Relationships: [DESCRIBE_RELATIONSHIPS]
- Data Volume: [EXPECTED_RECORDS]
- Growth Rate: [RECORDS_PER_DAY]
- Read/Write Ratio: [RATIO]

### Performance Requirements:
- Query Response Time: [MILLISECONDS]
- Concurrent Users: [NUMBER]
- Transactions Per Second: [TPS]

## 🏗️ Database Architecture Design

### Technology Selection

**Primary Database**: [PostgreSQL/MySQL/MongoDB/etc.]
**Reasoning**: [Why this database fits the requirements]

**Supporting Technologies**:
- Cache Layer: [Redis/Memcached]
- Search Engine: [Elasticsearch/Solr]
- Analytics: [ClickHouse/BigQuery]

### Schema Design

#### Entity-Relationship Diagram

```mermaid
erDiagram
    ENTITY_1 ||--o{ ENTITY_2 : relationship
    ENTITY_1 {
        uuid id PK
        timestamp created_at
        timestamp updated_at
        string field_1
        integer field_2
    }
    ENTITY_2 {
        uuid id PK
        uuid entity_1_id FK
        string field_3
        json metadata
    }
```

#### DDL Statements

```sql
-- Main tables with optimal data types and constraints
CREATE TABLE entity_1 (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    field_1 VARCHAR(255) NOT NULL,
    field_2 INTEGER NOT NULL DEFAULT 0,
    field_3 TEXT,
    metadata JSONB,
    status VARCHAR(50) NOT NULL DEFAULT 'active',

    -- Constraints
    CONSTRAINT chk_status CHECK (status IN ('active', 'inactive', 'deleted')),
    CONSTRAINT chk_field_2_positive CHECK (field_2 >= 0)
);

-- Add comments for documentation
COMMENT ON TABLE entity_1 IS 'Main entity for storing...';
COMMENT ON COLUMN entity_1.metadata IS 'Flexible JSON storage for additional attributes';

-- Referenced entity (columns follow the ER diagram above)
CREATE TABLE entity_2 (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    entity_1_id UUID NOT NULL REFERENCES entity_1(id) ON DELETE CASCADE,
    field_3 VARCHAR(255),
    metadata JSONB
);

-- Junction table for many-to-many relationships
CREATE TABLE entity_1_entity_2 (
    entity_1_id UUID NOT NULL REFERENCES entity_1(id) ON DELETE CASCADE,
    entity_2_id UUID NOT NULL REFERENCES entity_2(id) ON DELETE CASCADE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    metadata JSONB,
    PRIMARY KEY (entity_1_id, entity_2_id)
);

-- Audit table for tracking changes
CREATE TABLE audit_log (
    id BIGSERIAL PRIMARY KEY,
    table_name VARCHAR(100) NOT NULL,
    record_id UUID NOT NULL,
    action VARCHAR(10) NOT NULL,
    changed_by UUID,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    old_values JSONB,
    new_values JSONB
);
```
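The `audit_log` table above needs something to populate it. Below is a minimal trigger sketch for `entity_1`; the `app.current_user` session setting it reads is an assumed convention for illustration, not part of the schema above.

```sql
-- Hypothetical audit trigger for entity_1.
-- ASSUMPTION: the application runs SET app.current_user = '<uuid>' per session;
-- if unset, changed_by is recorded as NULL.
CREATE OR REPLACE FUNCTION audit_entity_1() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, record_id, action, changed_by, old_values, new_values)
    VALUES (
        TG_TABLE_NAME,
        CASE WHEN TG_OP = 'DELETE' THEN OLD.id ELSE NEW.id END,
        TG_OP,
        NULLIF(current_setting('app.current_user', true), '')::uuid,
        CASE WHEN TG_OP IN ('UPDATE', 'DELETE') THEN to_jsonb(OLD) END,
        CASE WHEN TG_OP IN ('INSERT', 'UPDATE') THEN to_jsonb(NEW) END
    );
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_entity_1_audit
AFTER INSERT OR UPDATE OR DELETE ON entity_1
FOR EACH ROW EXECUTE FUNCTION audit_entity_1();
```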
### Indexing Strategy

```sql
-- Primary indexes for common queries
CREATE INDEX idx_entity_1_status_created ON entity_1(status, created_at DESC);
CREATE INDEX idx_entity_1_field_1_lower ON entity_1(LOWER(field_1)) WHERE status = 'active';

-- Partial indexes for specific conditions
CREATE INDEX idx_entity_1_active ON entity_1(id) WHERE status = 'active';

-- Composite indexes for complex queries
CREATE INDEX idx_entity_1_composite ON entity_1(field_1, field_2, created_at DESC);

-- GIN index for JSONB queries
CREATE INDEX idx_entity_1_metadata ON entity_1 USING GIN (metadata);

-- Full-text search index (COALESCE keeps rows with NULL field_3 searchable)
CREATE INDEX idx_entity_1_search ON entity_1
    USING GIN (to_tsvector('english', field_1 || ' ' || COALESCE(field_3, '')));

-- Document intended index usage
COMMENT ON INDEX idx_entity_1_status_created IS 'Used for: Dashboard queries, pagination';
```

### Partitioning Strategy

```sql
-- Partition large tables by date
CREATE TABLE events (
    id BIGSERIAL,
    created_at TIMESTAMPTZ NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    data JSONB,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Automated partition management: creates the current month's partition;
-- schedule it ahead of month boundaries or extend it to pre-create future months
CREATE OR REPLACE FUNCTION create_monthly_partitions()
RETURNS void AS $$
DECLARE
    start_date DATE;
    end_date DATE;
    partition_name TEXT;
BEGIN
    start_date := DATE_TRUNC('month', CURRENT_DATE);
    end_date := start_date + INTERVAL '1 month';
    partition_name := 'events_' || TO_CHAR(start_date, 'YYYY_MM');

    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF events FOR VALUES FROM (%L) TO (%L)',
        partition_name, start_date, end_date
    );
END;
$$ LANGUAGE plpgsql;
```
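The maintenance schedule further below calls `cleanup_old_partitions()`, which is not defined anywhere in this template. A possible implementation is sketched here; it assumes the `events_YYYY_MM` naming produced by `create_monthly_partitions()` and a 12-month retention window, which is an arbitrary choice to adjust to the real retention policy.

```sql
-- Hypothetical cleanup procedure for old event partitions.
-- ASSUMPTIONS: partitions are named events_YYYY_MM and a 12-month retention applies.
CREATE OR REPLACE PROCEDURE cleanup_old_partitions()
LANGUAGE plpgsql
AS $$
DECLARE
    cutoff DATE := DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '12 months';
    part RECORD;
BEGIN
    FOR part IN
        SELECT c.relname
        FROM pg_inherits i
        JOIN pg_class c ON c.oid = i.inhrelid
        JOIN pg_class p ON p.oid = i.inhparent
        WHERE p.relname = 'events'
          AND c.relname ~ '^events_\d{4}_\d{2}$'
          AND TO_DATE(SUBSTRING(c.relname FROM 8), 'YYYY_MM') < cutoff
    LOOP
        -- Drop the whole partition instead of deleting rows
        EXECUTE format('DROP TABLE IF EXISTS %I', part.relname);
    END LOOP;
END;
$$;
```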
### Query Optimization

#### Common Query Patterns

```sql
-- Optimized pagination with cursor
WITH paginated AS (
    SELECT *,
           COUNT(*) OVER() AS total_count
    FROM entity_1
    WHERE status = 'active'
      AND created_at < $1  -- cursor
    ORDER BY created_at DESC
    LIMIT $2
)
SELECT * FROM paginated;

-- Efficient aggregation with window functions
SELECT
    DATE_TRUNC('day', created_at) AS day,
    COUNT(*) AS daily_count,
    SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', created_at)) AS running_total,
    AVG(COUNT(*)) OVER (
        ORDER BY DATE_TRUNC('day', created_at)
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS seven_day_avg
FROM entity_1
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', created_at);

-- Optimized search with trigrams
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_entity_1_field_1_trgm ON entity_1 USING GIN (field_1 gin_trgm_ops);

SELECT * FROM entity_1
WHERE field_1 % 'search_term'  -- Fuzzy search
ORDER BY similarity(field_1, 'search_term') DESC
LIMIT 10;
```

### Performance Tuning

#### Database Configuration

```ini
# PostgreSQL performance tuning (postgresql.conf)
shared_buffers = 25% of RAM
effective_cache_size = 75% of RAM
maintenance_work_mem = 256MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1          # For SSD
effective_io_concurrency = 200  # For SSD
work_mem = 4MB
max_connections = 200
```

#### Query Performance Analysis

```sql
-- Enable query logging for slow queries
ALTER SYSTEM SET log_min_duration_statement = 100;  -- Log queries > 100ms

-- Analyze query performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... FROM entity_1 WHERE ...;

-- Find columns that may benefit from an index (high cardinality, low correlation)
SELECT
    schemaname,
    tablename,
    attname,
    n_distinct,
    correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND n_distinct > 100
  AND correlation < 0.1
ORDER BY n_distinct DESC;

-- Monitor table bloat
SELECT
    schemaname,
    relname AS tablename,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || relname)) AS size,
    n_live_tup,
    n_dead_tup,
    round(n_dead_tup::numeric / NULLIF(n_live_tup, 0), 3) AS dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

### Data Migration Strategy

```sql
-- Zero-downtime migration approach

-- Step 1: Add new column without NOT NULL
ALTER TABLE entity_1 ADD COLUMN new_field VARCHAR(100);

-- Step 2: Backfill data in batches
-- Note: run outside an explicit transaction so COMMIT inside the DO block is allowed (PostgreSQL 11+)
DO $$
DECLARE
    batch_size INTEGER := 1000;
    total_updated INTEGER := 0;
BEGIN
    LOOP
        WITH batch AS (
            SELECT id FROM entity_1
            WHERE new_field IS NULL
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED
        )
        UPDATE entity_1
        SET new_field = 'default_value'
        WHERE id IN (SELECT id FROM batch);

        GET DIAGNOSTICS total_updated = ROW_COUNT;
        EXIT WHEN total_updated = 0;

        PERFORM pg_sleep(0.1);  -- Prevent lock contention
        COMMIT;
    END LOOP;
END $$;

-- Step 3: Add constraint after backfill
ALTER TABLE entity_1 ALTER COLUMN new_field SET NOT NULL;
```

### Backup & Recovery

```bash
#!/bin/bash
# Automated backup script
BACKUP_DIR="/backups/postgres"
DB_NAME="production"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# Full backup with compression
pg_dump -Fc -d "$DB_NAME" > "$BACKUP_DIR/backup_$TIMESTAMP.dump"

# Point-in-time recovery setup (postgresql.conf, not part of this script):
#   archive_mode = on
#   archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
#   wal_level = replica

# Restore procedure
pg_restore -d "$DB_NAME" -j 4 backup_file.dump
```

### Monitoring & Maintenance

```sql
-- Key metrics to monitor
CREATE VIEW database_health AS
SELECT
    (SELECT count(*) FROM pg_stat_activity) AS active_connections,
    (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS active_queries,
    (SELECT max(now() - query_start) FROM pg_stat_activity WHERE state = 'active') AS longest_query,
    (SELECT pg_database_size(current_database())) AS database_size,
    (SELECT count(*) FROM pg_stat_user_tables WHERE n_dead_tup > 1000) AS tables_need_vacuum;

-- Automated maintenance tasks
CREATE EXTENSION pg_cron;

-- Daily vacuum analyze
SELECT cron.schedule('vacuum-analyze', '0 2 * * *', 'VACUUM ANALYZE;');

-- Weekly reindex
SELECT cron.schedule('reindex', '0 3 * * 0', 'REINDEX DATABASE production;');

-- Monthly partition cleanup
SELECT cron.schedule('partition-cleanup', '0 4 1 * *', 'CALL cleanup_old_partitions();');
```

### Security Best Practices

- Use role-based access control (RBAC)
- Implement row-level security (RLS) for multi-tenant data (a minimal sketch follows the roadmap below)
- Encrypt sensitive data at rest and in transit
- Schedule regular security audits
- Implement connection pooling (PgBouncer)
- Use prepared statements to prevent SQL injection

### Scalability Roadmap

1. **Phase 1** (0-100K records): Single primary database
2. **Phase 2** (100K-1M records): Add read replicas
3. **Phase 3** (1M-10M records): Implement caching layer
4. **Phase 4** (10M+ records): Horizontal sharding
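As a concrete starting point for the RLS bullet above, here is a minimal sketch of tenant isolation on `entity_1`. The `tenant_id` column, the `app.tenant_id` session setting, and the `app_rw` role are illustrative assumptions, not part of the schema above.

```sql
-- Minimal row-level security sketch for multi-tenant data.
-- ASSUMPTIONS: entity_1 gains a tenant_id column and the application
-- sets app.tenant_id on each connection (e.g. SET app.tenant_id = '<uuid>').
ALTER TABLE entity_1 ADD COLUMN IF NOT EXISTS tenant_id UUID;

ALTER TABLE entity_1 ENABLE ROW LEVEL SECURITY;
ALTER TABLE entity_1 FORCE ROW LEVEL SECURITY;  -- apply policies to the table owner as well

CREATE POLICY tenant_isolation ON entity_1
    USING (tenant_id = current_setting('app.tenant_id', true)::uuid)
    WITH CHECK (tenant_id = current_setting('app.tenant_id', true)::uuid);

-- Role-based access: the application role gets DML only, no DDL
CREATE ROLE app_rw NOINHERIT LOGIN PASSWORD 'change_me';
GRANT SELECT, INSERT, UPDATE, DELETE ON entity_1 TO app_rw;
```

With `FORCE ROW LEVEL SECURITY`, the tenant filter also applies to the table owner, so application roles and maintenance roles see the same isolation behavior.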
DESCRIBE_APPLICATION
Required: Type and purpose of the application
Example: E-commerce platform, SaaS application, Analytics system
LIST_MAIN_ENTITIES
Required: Core entities/tables needed
Example: Users, Orders, Products, Payments