Data

Blog posts tagged with Data
Cross-Border Data Processing With Privacy Compliance Through Expanso

22/05/2025
Many organizations work with clients and infrastructure around the world and face significant challenges ensuring they follow privacy regulations as their application data flows across borders. This post looks at how you can use Bacalhau to handle distributed cross-border processing and anonymize data with Microsoft Presidio to help meet some of these requirements.
Getting Started with Machine Learning on Bacalhau

08/05/2025
Machine learning requires vast amounts of resources, and distributing these resources across multiple devices and regions helps with cost, speed, and data sovereignty. Bacalhau is an open-source distributed orchestration framework designed to bring compute to the data, where and when you want. Instead of moving large datasets around networks, Bacalhau makes it easy to execute jobs close to the data’s location, drastically reducing latency and resource overhead.
Bacalhau v1.7.0 - Day 5: Distributed Data Warehouse with Bacalhau and DuckDB

28/03/2025
Many applications that rely on data warehouses need to keep data sources in different locations. This could be for privacy or regulatory reasons, or because you want to keep processing close to the source. However, there are still times when you want to analyze these data sources, individually and together, from one location without moving the data. This post uses Bacalhau to orchestrate the distributed processing and DuckDB to provide SQL storage and querying for some mock sales data based in the EU and the US.