Vantino regularly builds data analytics pipelines for its customers. This is how we migrated a custom on-premises pipeline to Google Cloud Platform.
For business analytics, data is the foundation. For a Swiss financial company with offices in Austria and Germany, a reliable and efficient data analytics pipeline is essential to making informed decisions and staying competitive.
Version 1 of the company's pipeline relied on its ERP (Abacus) and CRM systems to push CSV reports on a daily basis. The reports included key information such as employee timesheets and tasks. Before the data was loaded into a PostgreSQL database, it was validated by Python scripts against a predefined schema using the tableschema Python library, which kept the data consistent and accurate. Once validated, the data was made available in the Apache Superset visualisation platform for analysis and reporting.
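To give a feel for the validation step, here is a minimal sketch that checks a CSV report against a predefined schema. It uses only the Python standard library as a stand-in and does not reproduce the tableschema API; the column names, types and the `validate_csv` helper are assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical schema for a timesheet report: column name -> parser.
# Each parser raises ValueError when a value does not fit the expected type.
SCHEMA = {
    "employee_id": int,
    "work_date": date.fromisoformat,
    "hours": float,
    "task": str,
}


def validate_csv(text, schema=SCHEMA):
    """Return a list of (line_number, message) problems; empty means valid."""
    reader = csv.DictReader(io.StringIO(text))
    # The header must match the declared columns exactly, in order.
    if reader.fieldnames != list(schema):
        return [(1, f"unexpected header: {reader.fieldnames}")]
    errors = []
    for row in reader:
        for column, parse in schema.items():
            try:
                parse(row[column])
            except (TypeError, ValueError):
                # TypeError covers missing fields, which DictReader fills with None.
                errors.append(
                    (reader.line_num, f"{column}: invalid value {row[column]!r}")
                )
    return errors
```

A report would only be loaded into the database when `validate_csv` returns an empty list; otherwise the reported line numbers point to the rows that need fixing.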
To gain scalability and reliability, the company then migrated its data analytics pipeline to Google Cloud Platform (GCP). In version 2, CSV reports are loaded directly into BigQuery, Google's cloud-native data warehouse. This removes the need for a custom analytics server and lets the company query and analyse large volumes of data in near real time.
To run the migration, we used an ETL (Extract, Transform, Load) process. CSV files are uploaded to Google Cloud Storage (GCS) via Google Cloud Functions, transformed with dbt (data build tool), and loaded into BigQuery. dbt is a command-line tool, distributed as a Python package, that lets data teams define their transformations as version-controlled SQL models. By declaring its schema and transformations in code, the company can automate and govern its pipeline in a consistent and maintainable way.
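The idea of declaring a transformation in code can be sketched as follows. This is not the company's pipeline: an in-memory SQLite database stands in for BigQuery, and the table, view and column names are invented for the example. The transformation is written as a plain SELECT and, as is usual with this style of tooling, runs inside the database once the raw rows are there.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract, as it might arrive from the ERP.
RAW_CSV = """employee_id,work_date,hours,task
7,2023-01-15,6.5,Audit
7,2023-01-15,1.5,Admin
9,2023-01-15,8.0,Audit
"""

# The transformation is declared as a SELECT statement.
# The view name and columns are made up for this example.
DAILY_HOURS_MODEL = """
CREATE VIEW daily_hours AS
SELECT employee_id, work_date, SUM(hours) AS total_hours
FROM raw_timesheets
GROUP BY employee_id, work_date
"""


def run_pipeline(raw_csv):
    """Load raw rows, apply the declared model, return the transformed rows."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    conn.execute(
        "CREATE TABLE raw_timesheets "
        "(employee_id INTEGER, work_date TEXT, hours REAL, task TEXT)"
    )
    rows = [
        (int(r["employee_id"]), r["work_date"], float(r["hours"]), r["task"])
        for r in csv.DictReader(io.StringIO(raw_csv))
    ]
    conn.executemany("INSERT INTO raw_timesheets VALUES (?, ?, ?, ?)", rows)
    conn.execute(DAILY_HOURS_MODEL)
    result = conn.execute(
        "SELECT employee_id, work_date, total_hours FROM daily_hours "
        "ORDER BY employee_id"
    ).fetchall()
    conn.close()
    return result
```

Because the model is just text in a file, it can be reviewed, versioned and re-run like any other code.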
Apache Airflow, an open-source platform for orchestrating and scheduling data pipelines, manages the flow of data from the ERP and CRM systems to BigQuery, ensuring data arrives on schedule and in the correct format.
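At its core, an orchestrator runs tasks in an order that respects their dependencies. The sketch below illustrates that idea with Python's standard `graphlib` module rather than Airflow itself; the task names and the `execution_order` helper are hypothetical and do not describe the company's actual DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it waits for.
DEPENDENCIES = {
    "upload_to_gcs": {"export_erp_csv", "export_crm_csv"},
    "transform_and_load": {"upload_to_gcs"},
    "refresh_dashboards": {"transform_and_load"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `execution_order(DEPENDENCIES)` yields the two export tasks first (in either order), followed by the upload, the transformation and load, and finally the dashboard refresh. A scheduler such as Airflow adds timing, retries and monitoring on top of this ordering.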
To further optimise the pipeline, the company uses GCP Dataflow, a fully managed service for transforming and enriching large datasets in near real time. This lets the team run complex transformations without dedicated infrastructure or time-consuming manual processes.
Once the data is in BigQuery, it is available for analysis and visualisation in Apache Superset. The company can keep using its existing dashboards while benefiting from the performance and scalability of GCP.
Thanks to the migration to Google Cloud Platform, the Swiss financial company has improved both the efficiency and the reliability of its data analytics pipeline. It can now manage, transform and analyse its data in near real time, which supports faster, better-informed decisions and helps it stay competitive.
Contact us for a data management consultation. For more details, visit our Data & BI Consulting page.