Platform Architecture


L1 — Logical Architecture

The platform follows a layered data architecture separating ingestion, processing, storage, and consumption concerns.


L2 — Physical Implementation (GCP)

Ingestion

| Source | Method | Executor |
| --- | --- | --- |
| Oracle DB | JDBC via Beam | Dataflow (heavy) or Airflow task (light) |
| CSV / Files | Direct read | Airflow task |
| REST APIs | HTTP client | Airflow task |
| PDFs | Custom PDF parser | Airflow task |

All ingestion outputs land in Cloud Storage as Avro files, partitioned by date.
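
As a rough illustration, the sketch below shows a Beam pipeline in the shape the Oracle path implies: read over JDBC (Beam's cross-language ReadFromJdbc, which needs a Java expansion service at runtime) and write date-partitioned Avro to Cloud Storage. The connection string, table name, Avro schema, and bucket path are placeholders, not the platform's real values.

```python
# Minimal sketch: Oracle table -> date-partitioned Avro in GCS.
# All identifiers below (table, schema, bucket) are illustrative.
from datetime import date

import apache_beam as beam
from apache_beam.io.avroio import WriteToAvro
from apache_beam.io.jdbc import ReadFromJdbc
from apache_beam.options.pipeline_options import PipelineOptions

AVRO_SCHEMA = {
    "type": "record",
    "name": "cliente",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "nombre", "type": ["null", "string"]},
    ],
}

fecha_lote = date.today().isoformat()  # the date partition folder

with beam.Pipeline(options=PipelineOptions()) as p:
    (
        p
        | "ReadOracle" >> ReadFromJdbc(
            table_name="CLIENTES",
            driver_class_name="oracle.jdbc.OracleDriver",
            jdbc_url="jdbc:oracle:thin:@//db-host:1521/ORCLPDB",
            username="etl_user",
            password="secret",  # use Secret Manager in practice
        )
        | "ToDict" >> beam.Map(lambda row: row._asdict())  # Beam Row -> plain dict for fastavro
        | "WriteAvro" >> WriteToAvro(
            f"gs://my-raw-bucket/clientes/fecha_lote={fecha_lote}/part",
            schema=AVRO_SCHEMA,
            file_name_suffix=".avro",
        )
    )
```

Run on Dataflow by passing the usual `--runner=DataflowRunner` pipeline options; run the light variant in-process inside an Airflow task.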

Processing Decision

Whether a load runs on Dataflow or executes directly inside an Airflow task is driven by per-entity metadata in aud_entidad: heavy-volume sources go to Dataflow, light ones run in-process in Airflow.
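
A minimal sketch of how that branch might look in Airflow, assuming aud_entidad exposes a per-entity executor flag (the `ejecutor` column, task ids, and project/dataset names below are invented for illustration):

```python
# Sketch of the Dataflow-vs-Airflow branch driven by aud_entidad metadata.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import BranchPythonOperator
from google.cloud import bigquery


def choose_executor(entidad: str) -> str:
    """Look up the entity's executor flag and return the task id to follow."""
    client = bigquery.Client()
    job = client.query(
        """
        SELECT ejecutor  -- hypothetical column: 'dataflow' or 'airflow'
        FROM `my-project.metadata.aud_entidad`
        WHERE nombre = @entidad
        """,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("entidad", "STRING", entidad)
            ]
        ),
    )
    row = next(iter(job.result()))
    return "launch_dataflow" if row.ejecutor == "dataflow" else "run_in_airflow"


with DAG(
    dag_id="decide_ejecutor",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Downstream tasks "launch_dataflow" / "run_in_airflow" would be
    # defined in the same DAG.
    branch = BranchPythonOperator(
        task_id="choose_executor",
        python_callable=choose_executor,
        op_kwargs={"entidad": "clientes"},
    )
```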

Storage

| Layer | Technology | Format | Partitioning |
| --- | --- | --- | --- |
| Raw | Cloud Storage | Avro | fecha_lote (date folder) |
| History | BigQuery native table | Columnar | fecha_lote partition |
| Active | BigQuery native table | Columnar | None (latest version per PK) |
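
Keeping the Active layer at "latest version per PK" is typically a MERGE from History; the sketch below shows one way to express that with the BigQuery client, using invented dataset, table, and column names:

```python
# Sketch of the History -> Active refresh: keep only the newest row per
# primary key. Dataset, table, and column names are illustrative.
from google.cloud import bigquery

MERGE_SQL = """
MERGE `my-project.active.clientes` AS a
USING (
  SELECT * EXCEPT (rn) FROM (
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY fecha_lote DESC) AS rn
    FROM `my-project.history.clientes` h
  )
  WHERE rn = 1
) AS h
ON a.id = h.id
WHEN MATCHED THEN
  UPDATE SET nombre = h.nombre, fecha_lote = h.fecha_lote
WHEN NOT MATCHED THEN
  INSERT (id, nombre, fecha_lote) VALUES (h.id, h.nombre, h.fecha_lote)
"""

client = bigquery.Client()
client.query(MERGE_SQL).result()  # blocks until the merge completes
```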

Orchestration

  • Cloud Composer 2 running Airflow 2.x
  • DAGs read configuration from aud_* tables in BigQuery at runtime (see the sketch after this list)
  • Framework libraries imported by DAGs as Python modules
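
A condensed sketch of the runtime-configuration pattern (Airflow 2.4+ style; the project, dataset, and column names are placeholders, not the platform's real schema):

```python
# Config-driven DAG sketch: a first task pulls the entity's settings from a
# hypothetical aud_entidad table and hands them downstream via XCom.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery


def load_config(entidad: str) -> dict:
    client = bigquery.Client()
    rows = client.query(
        "SELECT * FROM `my-project.metadata.aud_entidad` WHERE nombre = @e",
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("e", "STRING", entidad)]
        ),
    ).result()
    return dict(next(iter(rows)).items())  # returned value lands in XCom


def run_ingestion(**context) -> None:
    config = context["ti"].xcom_pull(task_ids="load_config")
    print(f"Ingesting with config: {config}")


with DAG(
    dag_id="ingesta_generica",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    cfg = PythonOperator(
        task_id="load_config",
        python_callable=load_config,
        op_kwargs={"entidad": "clientes"},
    )
    ingest = PythonOperator(task_id="run_ingestion", python_callable=run_ingestion)
    cfg >> ingest
```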

Security & Operations

  • IAM roles per service account (Composer, Dataflow, BQ, GCS)
  • Cloud KMS for encryption at rest
  • Cloud Logging for centralized log aggregation
  • GitHub Actions for CI/CD of DAGs and pipeline code (deploy-step sketch below)
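
The CI/CD step that actually ships DAGs to Composer is, at bottom, an upload into the environment's dags/ folder in GCS. A minimal sketch of that deploy step, with a hypothetical bucket name (in practice the bucket comes from `gcloud composer environments describe`):

```python
# Sketch of the deploy step a CI job might run: upload DAG files into the
# Composer environment's dag folder in Cloud Storage.
from pathlib import Path

from google.cloud import storage

COMPOSER_BUCKET = "us-central1-my-env-bucket"  # hypothetical bucket name

client = storage.Client()
bucket = client.bucket(COMPOSER_BUCKET)

for dag_file in Path("dags").glob("*.py"):
    blob = bucket.blob(f"dags/{dag_file.name}")
    blob.upload_from_filename(str(dag_file))
    print(f"Deployed {dag_file.name}")
```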