# Platform Architecture
## L1 — Logical Architecture
The platform follows a layered data architecture separating ingestion, processing, storage, and consumption concerns.
## L2 — Physical Implementation (GCP)
### Ingestion
| Source | Method | Executor |
|---|---|---|
| Oracle DB | JDBC via Beam | Dataflow (heavy) or Airflow task (light) |
| CSV / Files | Direct read | Airflow task |
| REST APIs | HTTP client | Airflow task |
| PDFs | Custom PDF parser | Airflow task |
All ingestion outputs land in Cloud Storage as Avro files, partitioned by date.
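On the light path, the extract-and-land step can run entirely inside an Airflow task. The sketch below is illustrative only: the `clientes` table, Avro schema, bucket name, and connection details are hypothetical placeholders, and it assumes the `oracledb`, `fastavro`, and `google-cloud-storage` packages are available on the worker.

```python
# Illustrative light-path ingestion: Oracle -> Avro -> Cloud Storage.
# Table, schema, bucket, and credentials are placeholders, not the
# framework's real configuration.
import datetime
import io

import fastavro
import oracledb
from google.cloud import storage

AVRO_SCHEMA = fastavro.parse_schema({
    "type": "record",
    "name": "Cliente",
    "fields": [
        {"name": "ID", "type": "long"},
        {"name": "NOMBRE", "type": ["null", "string"], "default": None},
    ],
})


def ingest_oracle_to_gcs(fecha_lote: str) -> None:
    """Extract one table and land it as Avro under a fecha_lote folder."""
    conn = oracledb.connect(user="etl", password="***", dsn="db-host/ORCLPDB1")
    with conn.cursor() as cur:
        cur.execute("SELECT id, nombre FROM clientes")
        # Real code would map Oracle types onto the Avro schema explicitly.
        records = [{"ID": int(i), "NOMBRE": n} for (i, n) in cur]

    buf = io.BytesIO()
    fastavro.writer(buf, AVRO_SCHEMA, records)
    buf.seek(0)

    blob = f"raw/clientes/fecha_lote={fecha_lote}/part-000.avro"
    storage.Client().bucket("my-raw-bucket").blob(blob).upload_from_file(buf)


if __name__ == "__main__":
    ingest_oracle_to_gcs(datetime.date.today().isoformat())
```

Writing under a `fecha_lote=YYYY-MM-DD` folder keeps the Raw layer replayable batch by batch.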
### Processing Decision
Whether an entity is processed on Dataflow (the heavy path) or directly inside an Airflow task (the light path) is driven by metadata in the `aud_entidad` table.
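One way to express that routing in a DAG is a branch task that looks up the entity's metadata row and returns the task_id to follow. A minimal sketch, assuming Airflow 2.3+ and hypothetical names (the `motor_proceso` column, the `my_project.metadata` dataset, and the `run_dataflow` / `run_in_airflow` task_ids); the real `aud_entidad` schema may differ:

```python
# Illustrative routing between Dataflow and in-Airflow processing.
# aud_entidad column names and task_ids are assumptions for the sketch.
from airflow.decorators import task
from google.cloud import bigquery


@task.branch
def choose_executor(entidad: str) -> str:
    """Return the task_id of the processing branch for this entity."""
    client = bigquery.Client()
    job = client.query(
        "SELECT motor_proceso FROM `my_project.metadata.aud_entidad` "
        "WHERE entidad = @entidad",
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("entidad", "STRING", entidad)
            ]
        ),
    )
    rows = list(job.result())
    # Fall back to the light path when no routing row exists.
    motor = rows[0].motor_proceso if rows else "AIRFLOW"
    return "run_dataflow" if motor == "DATAFLOW" else "run_in_airflow"
```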
### Storage
| Layer | Technology | Format | Partitioning |
|---|---|---|---|
| Raw | Cloud Storage | Avro | fecha_lote (date folder) |
| History | BigQuery native table | Columnar | fecha_lote partition |
| Active | BigQuery native table | Columnar | None (latest version per PK) |
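The Active table's "latest version per PK" contract can be maintained with a BigQuery `MERGE` from the newest History partition. A minimal sketch, assuming an `id` primary key and illustrative dataset and table names; the framework's actual load mechanism may differ:

```python
# Illustrative refresh of the Active table: keep the latest version per
# primary key. Dataset, table, and key names are placeholders.
from google.cloud import bigquery

MERGE_SQL = """
MERGE `my_project.active.clientes` AS act
USING (
  SELECT * EXCEPT (rn)
  FROM (
    SELECT h.*,
           -- rn = 1 drops duplicate PKs within the batch, if any
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY fecha_lote DESC) AS rn
    FROM `my_project.history.clientes` AS h
    WHERE fecha_lote = @fecha_lote
  )
  WHERE rn = 1
) AS hist
ON act.id = hist.id
WHEN MATCHED THEN
  UPDATE SET nombre = hist.nombre, fecha_lote = hist.fecha_lote
WHEN NOT MATCHED THEN
  INSERT (id, nombre, fecha_lote) VALUES (hist.id, hist.nombre, hist.fecha_lote)
"""


def refresh_active(fecha_lote: str) -> None:
    """Apply one History partition onto the Active table."""
    client = bigquery.Client()
    client.query(
        MERGE_SQL,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("fecha_lote", "DATE", fecha_lote)
            ]
        ),
    ).result()
```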
### Orchestration
- Cloud Composer 2 running Airflow 2.x
- DAGs read configuration from `aud_*` tables in BigQuery at runtime (a sketch follows this list)
- Framework libraries are imported by DAGs as Python modules
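A minimal sketch of that runtime-configuration pattern, assuming Airflow 2.4+ (for the `schedule` argument and dynamic task mapping) and a hypothetical `activo` flag on `aud_entidad`:

```python
# Illustrative config-driven DAG: the task list comes from aud_* metadata
# at run time. Project, dataset, and column names are assumptions.
import pendulum
from airflow.decorators import dag, task
from google.cloud import bigquery


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def ingest_from_metadata():
    @task
    def list_entities():
        rows = bigquery.Client().query(
            "SELECT entidad FROM `my_project.metadata.aud_entidad` WHERE activo"
        ).result()
        return [r.entidad for r in rows]

    @task
    def ingest(entidad):
        # Placeholder for the real extract-and-land logic.
        print(f"ingesting {entidad}")

    # One mapped task instance per configured entity.
    ingest.expand(entidad=list_entities())


ingest_from_metadata()
```

Because the entity list is read at run time, onboarding a new source is a metadata insert rather than a code deployment.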
### Security & Operations
- IAM roles per service account (Composer, Dataflow, BQ, GCS)
- Cloud KMS for encryption at rest
- Cloud Logging for centralized log aggregation
- GitHub Actions for CI/CD of DAGs and pipeline code