Platform Architecture


L1 — Logical Architecture

The platform follows a layered data architecture separating ingestion, processing, storage, and consumption concerns.


L2 — Physical Implementation (GCP)

Ingestion

| Source | Method | Executor |
| --- | --- | --- |
| Oracle DB | JDBC via Beam | Dataflow (heavy) or Airflow task (light) |
| CSV / Files | Direct read | Airflow task |
| REST APIs | HTTP client | Airflow task |
| PDFs | Custom PDF parser | Airflow task |

All ingestion outputs land in Cloud Storage as Avro files, partitioned by date.
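
As a rough illustration, the sketch below shows a Beam pipeline in the shape the Oracle path implies: read over JDBC (Beam's cross-language ReadFromJdbc, which needs a Java expansion service at runtime) and write date-partitioned Avro to Cloud Storage. The connection string, table name, Avro schema, and bucket path are placeholders, not the platform's real values.

```python
# Minimal sketch: Oracle table -> date-partitioned Avro in GCS.
# All identifiers below (table, schema, bucket) are illustrative.
from datetime import date

import apache_beam as beam
from apache_beam.io.avroio import WriteToAvro
from apache_beam.io.jdbc import ReadFromJdbc
from apache_beam.options.pipeline_options import PipelineOptions

AVRO_SCHEMA = {
    "type": "record",
    "name": "cliente",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "nombre", "type": ["null", "string"]},
    ],
}

fecha_lote = date.today().isoformat()  # the date partition folder

with beam.Pipeline(options=PipelineOptions()) as p:
    (
        p
        | "ReadOracle" >> ReadFromJdbc(
            table_name="CLIENTES",
            driver_class_name="oracle.jdbc.OracleDriver",
            jdbc_url="jdbc:oracle:thin:@//db-host:1521/ORCLPDB",
            username="etl_user",
            password="secret",  # use Secret Manager in practice
        )
        | "ToDict" >> beam.Map(lambda row: row._asdict())  # Beam Row -> plain dict for fastavro
        | "WriteAvro" >> WriteToAvro(
            f"gs://my-raw-bucket/clientes/fecha_lote={fecha_lote}/part",
            schema=AVRO_SCHEMA,
            file_name_suffix=".avro",
        )
    )
```

Run on Dataflow by passing the usual `--runner=DataflowRunner` pipeline options; run the light variant in-process inside an Airflow task.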

Processing Decision

Whether a load runs on Dataflow or executes directly inside an Airflow task is driven by per-entity metadata in aud_entidad: heavy-volume sources go to Dataflow, light ones run in-process in Airflow.
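
A minimal sketch of how that branch might look in Airflow, assuming aud_entidad exposes a per-entity executor flag (the `ejecutor` column, task ids, and project/dataset names below are invented for illustration):

```python
# Sketch of the Dataflow-vs-Airflow branch driven by aud_entidad metadata.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import BranchPythonOperator
from google.cloud import bigquery


def choose_executor(entidad: str) -> str:
    """Look up the entity's executor flag and return the task id to follow."""
    client = bigquery.Client()
    job = client.query(
        """
        SELECT ejecutor  -- hypothetical column: 'dataflow' or 'airflow'
        FROM `my-project.metadata.aud_entidad`
        WHERE nombre = @entidad
        """,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("entidad", "STRING", entidad)
            ]
        ),
    )
    row = next(iter(job.result()))
    return "launch_dataflow" if row.ejecutor == "dataflow" else "run_in_airflow"


with DAG(
    dag_id="decide_ejecutor",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Downstream tasks "launch_dataflow" / "run_in_airflow" would be
    # defined in the same DAG.
    branch = BranchPythonOperator(
        task_id="choose_executor",
        python_callable=choose_executor,
        op_kwargs={"entidad": "clientes"},
    )
```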

Storage

| Layer | Technology | Format | Partitioning |
| --- | --- | --- | --- |
| Raw | Cloud Storage | Avro | fecha_lote (date folder) |
| History | BigQuery native table | Columnar | fecha_lote partition |
| Active | BigQuery native table | Columnar | None (latest version per PK) |
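
Keeping the Active layer at "latest version per PK" is typically a MERGE from History; the sketch below shows one way to express that with the BigQuery client, using invented dataset, table, and column names:

```python
# Sketch of the History -> Active refresh: keep only the newest row per
# primary key. Dataset, table, and column names are illustrative.
from google.cloud import bigquery

MERGE_SQL = """
MERGE `my-project.active.clientes` AS a
USING (
  SELECT * EXCEPT (rn) FROM (
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY fecha_lote DESC) AS rn
    FROM `my-project.history.clientes` h
  )
  WHERE rn = 1
) AS h
ON a.id = h.id
WHEN MATCHED THEN
  UPDATE SET nombre = h.nombre, fecha_lote = h.fecha_lote
WHEN NOT MATCHED THEN
  INSERT (id, nombre, fecha_lote) VALUES (h.id, h.nombre, h.fecha_lote)
"""

client = bigquery.Client()
client.query(MERGE_SQL).result()  # blocks until the merge completes
```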

Orchestration

  • Cloud Composer 2 running Airflow 2.x
  • DAGs read configuration from aud_* tables in BigQuery at runtime (see the sketch after this list)
  • Framework libraries imported by DAGs as Python modules
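
A condensed sketch of the runtime-configuration pattern (Airflow 2.4+ style; the project, dataset, and column names are placeholders, not the platform's real schema):

```python
# Config-driven DAG sketch: a first task pulls the entity's settings from a
# hypothetical aud_entidad table and hands them downstream via XCom.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery


def load_config(entidad: str) -> dict:
    client = bigquery.Client()
    rows = client.query(
        "SELECT * FROM `my-project.metadata.aud_entidad` WHERE nombre = @e",
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("e", "STRING", entidad)]
        ),
    ).result()
    return dict(next(iter(rows)).items())  # returned value lands in XCom


def run_ingestion(**context) -> None:
    config = context["ti"].xcom_pull(task_ids="load_config")
    print(f"Ingesting with config: {config}")


with DAG(
    dag_id="ingesta_generica",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    cfg = PythonOperator(
        task_id="load_config",
        python_callable=load_config,
        op_kwargs={"entidad": "clientes"},
    )
    ingest = PythonOperator(task_id="run_ingestion", python_callable=run_ingestion)
    cfg >> ingest
```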

Security & Operations

  • IAM roles per service account (Composer, Dataflow, BQ, GCS)
  • Cloud KMS for encryption at rest
  • Cloud Logging for centralized log aggregation
  • GitHub Actions for CI/CD of DAGs and pipeline code (deploy-step sketch below)
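
The CI/CD step that actually ships DAGs to Composer is, at bottom, an upload into the environment's dags/ folder in GCS. A minimal sketch of that deploy step, with a hypothetical bucket name (in practice the bucket comes from `gcloud composer environments describe`):

```python
# Sketch of the deploy step a CI job might run: upload DAG files into the
# Composer environment's dag folder in Cloud Storage.
from pathlib import Path

from google.cloud import storage

COMPOSER_BUCKET = "us-central1-my-env-bucket"  # hypothetical bucket name

client = storage.Client()
bucket = client.bucket(COMPOSER_BUCKET)

for dag_file in Path("dags").glob("*.py"):
    blob = bucket.blob(f"dags/{dag_file.name}")
    blob.upload_from_filename(str(dag_file))
    print(f"Deployed {dag_file.name}")
```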