Enterprise Data Integration Platform
A high-performance distributed data platform enabling semantic data discovery and SQL-based querying across heterogeneous data sources. Co-developed at Stream Financial to solve the enterprise data integration challenge—allowing tech-savvy business users to access and query data without requiring IT change requests.
Large organisations with complex legacy infrastructure face a dilemma: the technology that enabled their growth has reached a turning point and now delivers diminishing value. Legacy systems are inflexible just when changing markets demand agility.
The tools available to business users for discovering and accessing enterprise data are poor. Even when data is found, extracting and using it productively is difficult, which leads to a proliferation of shadow IT spreadsheets and Access databases.
DataFusion provides a semantic layer that makes business-relevant metadata available, along with tools to discover, retrieve, and process information regardless of where that data actually resides in the enterprise.
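To make the idea of a semantic layer concrete, here is a minimal sketch of a catalog that maps business terms to the provider and physical location behind them. It assumes a simple in-memory dictionary. `CatalogEntry`, `SemanticCatalog`, and the example datasets are hypothetical illustrations, not DataFusion's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Business-facing metadata for one dataset (all field names are hypothetical)."""
    business_name: str  # the term users search for
    description: str
    provider: str       # which provider serves the data, e.g. "oracle" or "csv"
    location: str       # physical address that only the provider needs to understand
    tags: frozenset = field(default_factory=frozenset)


class SemanticCatalog:
    """Maps business terms to the physical location of the data behind them."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.business_name.lower()] = entry

    def search(self, term):
        """Substring match on name or description, exact match on tags (case-insensitive)."""
        term = term.lower()
        return [
            e for e in self._entries.values()
            if term in e.business_name.lower()
            or term in e.description.lower()
            or term in {t.lower() for t in e.tags}
        ]

    def resolve(self, business_name):
        """Return (provider, location) so queries never hard-code where data lives."""
        entry = self._entries[business_name.lower()]
        return entry.provider, entry.location


# Hypothetical example datasets.
catalog = SemanticCatalog()
catalog.register(CatalogEntry("Counterparty exposure", "Daily credit exposure by counterparty",
                              "oracle", "RISK.CPTY_EXPOSURE", frozenset({"risk", "credit"})))
catalog.register(CatalogEntry("GL balances", "General ledger balances by account",
                              "csv", "/data/finance/gl.csv", frozenset({"finance"})))
```

A user searching for "credit" finds the exposure dataset without knowing it lives in Oracle. Only `resolve` exposes the physical location, so that location can change without breaking anyone's queries.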
Written from the ground up in C++ to exploit the vector (SIMD) processing units of modern CPUs. Provides exceptional read/write speeds, with compressed operations that actually run faster than their uncompressed equivalents.
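The source does not say which compression schemes the engine uses. Dictionary encoding is one common technique that shows how processing compressed data can beat processing raw data. A filter evaluates its predicate once per distinct value and then only scans compact integer codes, and such scans vectorise well in C++. The Python sketch below, with hypothetical function names, shows the reduction in work, not the speed-up itself.

```python
def dictionary_encode(values):
    """Encode a column as (dictionary, codes): each distinct value is stored once."""
    dictionary = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def filter_encoded(dictionary, codes, predicate):
    """Evaluate the predicate once per distinct value, then scan the integer codes."""
    matching = {code for code, value in enumerate(dictionary) if predicate(value)}
    return [row for row, code in enumerate(codes) if code in matching]


currencies = ["GBP", "USD", "EUR", "USD", "GBP", "USD"]
dictionary, codes = dictionary_encode(currencies)
rows = filter_encoded(dictionary, codes, lambda c: c.startswith("US"))
```

For six rows the predicate runs three times instead of six. On a column with millions of rows and a few hundred distinct values, the saving, together with the smaller memory footprint of the codes, is what lets a compressed scan outpace an uncompressed one.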
SQL-92 compliant engine that distributes queries across multiple heterogeneous data sources and combines the results seamlessly, with full data lineage tracking.
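The following sketch shows the core idea of answering one SQL query over heterogeneous sources while keeping lineage. Rows from a CSV extract and from a second system are each tagged with their origin, and the origin survives into the query result. For brevity it materialises everything into Python's built-in SQLite rather than distributing execution, as the real engine does. SQLite is only a convenient stand-in for an SQL engine here, and the sources, table, and `source` column are hypothetical.

```python
import csv
import io
import sqlite3

# Two hypothetical heterogeneous sources: a CSV extract and rows from another system.
LEDGER_CSV = "account,amount\nA1,100.0\nA2,250.5\n"
RISK_ROWS = [("A1", 40.0), ("A3", 10.0)]


def load_sources():
    """Pull rows from each source, tagging every row with its origin (lineage)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(LEDGER_CSV)):
        rows.append((rec["account"], float(rec["amount"]), "csv:ledger"))
    for account, amount in RISK_ROWS:
        rows.append((account, amount, "db:risk"))
    return rows


def run_query(sql):
    """Materialise all sources into one in-memory table and run plain SQL over it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE positions (account TEXT, amount REAL, source TEXT)")
        conn.executemany("INSERT INTO positions VALUES (?, ?, ?)", load_sources())
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Each result row carries the set of sources that contributed to it.
result = run_query(
    "SELECT account, SUM(amount), GROUP_CONCAT(DISTINCT source) "
    "FROM positions GROUP BY account ORDER BY account"
)
```

Account A1 appears in both sources, so its total of 140.0 is reported alongside both `csv:ledger` and `db:risk`. This gives a row-level answer to "where did this number come from?"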
Extensible provider services for accessing data: CSV files, ODBC sources (Excel, Access), databases (SQL Server, Oracle, Snowflake), and opaque providers for R, MATLAB, and Python scripts.
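The source lists the provider types but not their interface. One plausible shape for an extensible provider layer is a small abstract contract plus a registry. This is a minimal sketch under that assumption. `DataProvider`, `CsvProvider`, `ScriptProvider`, and `REGISTRY` are hypothetical names, and a plain Python callable stands in for an opaque R, MATLAB, or Python script.

```python
import csv
import io
from abc import ABC, abstractmethod


class DataProvider(ABC):
    """Hypothetical provider contract: every source exposes its rows as dicts."""

    name: str

    @abstractmethod
    def read(self, location):
        """Return all rows at `location` as a list of dicts."""


class CsvProvider(DataProvider):
    name = "csv"

    def __init__(self, files):
        self._files = files  # location -> CSV text (stands in for the filesystem)

    def read(self, location):
        return list(csv.DictReader(io.StringIO(self._files[location])))


class ScriptProvider(DataProvider):
    """'Opaque' provider: the engine only sees the rows a user script returns."""

    name = "script"

    def __init__(self, scripts):
        self._scripts = scripts  # location -> zero-argument callable

    def read(self, location):
        return list(self._scripts[location]())


REGISTRY = {}


def register(provider):
    REGISTRY[provider.name] = provider


def read(provider_name, location):
    """Route a read to whichever provider is registered under that name."""
    return REGISTRY[provider_name].read(location)


register(CsvProvider({"trades.csv": "id,qty\n1,10\n2,5\n"}))
register(ScriptProvider({"vol_surface": lambda: [{"tenor": "1Y", "vol": 0.2}]}))
```

Adding a new source type then means writing one class and registering it. The rest of the engine only ever calls `read`.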
Bespoke map-reduce optimized for low latency: processing happens in memory, while data is persisted to disk for durability.
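The combination of in-memory processing and on-disk persistence can be illustrated with a small map-reduce. Each partition is aggregated in memory, the partial results are merged, and the final result is written durably by flushing, syncing, and renaming a temporary file. This is a sketch, not the bespoke framework itself. The function names and JSON checkpoint format are hypothetical, and Python threads only show the task structure, since the GIL prevents real CPU parallelism.

```python
import json
import os
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(rows):
    """Map step: aggregate one in-memory partition to {key: partial_sum}."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def reduce_partials(partials):
    """Reduce step: merge the partial sums from every partition."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)


def map_reduce(partitions, checkpoint_dir):
    """Run map tasks concurrently in memory, then persist the result to disk."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    result = reduce_partials(partials)
    path = os.path.join(checkpoint_dir, "result.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(result, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk
    os.replace(tmp, path)  # swap in the complete file; readers never see a partial one
    return result, path


partitions = [[("GBP", 10), ("USD", 5)], [("USD", 7), ("EUR", 3)]]
with tempfile.TemporaryDirectory() as d:
    result, path = map_reduce(partitions, d)
    with open(path) as f:
        persisted = json.load(f)
```

The query is answered from memory without waiting on storage for intermediate steps, and only the final result touches the disk. This is one way to balance low latency against durability.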
Client Query → Query Engine → Provider Selection → Distributed Execution → Result Aggregation → Transparent Lineage
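The stages of this flow can be traced in a compact sketch. It skips SQL parsing and takes a structured request instead. It assumes a hypothetical routing table from logical tables to providers, fetches from the selected providers concurrently, aggregates, and returns lineage listing every source that contributed. `SOURCES`, `QueryResult`, and `execute` are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical routing table: logical table -> (provider name, fetch function).
SOURCES = {
    "eu_trades": ("oracle", lambda: [{"desk": "rates", "notional": 100}]),
    "us_trades": ("snowflake", lambda: [{"desk": "rates", "notional": 50},
                                        {"desk": "fx", "notional": 20}]),
}


@dataclass
class QueryResult:
    totals: dict   # aggregated answer returned to the client
    lineage: list  # (table, provider, rows contributed) for every source touched


def execute(tables, group_by, measure):
    # 1. Provider selection: resolve each logical table to the provider serving it.
    plan = [(t, *SOURCES[t]) for t in tables]
    # 2. Distributed execution: fetch from every selected provider concurrently.
    with ThreadPoolExecutor() as pool:
        fetched = list(pool.map(lambda step: step[2](), plan))
    # 3. Result aggregation: combine the partial results into one answer.
    totals = {}
    for rows in fetched:
        for row in rows:
            totals[row[group_by]] = totals.get(row[group_by], 0) + row[measure]
    # 4. Transparent lineage: record where every contribution came from.
    lineage = [(t, provider, len(rows)) for (t, provider, _), rows in zip(plan, fetched)]
    return QueryResult(totals, lineage)


result = execute(["eu_trades", "us_trades"], group_by="desk", measure="notional")
```

The client sees one answer (`rates` = 150, `fx` = 20), and the lineage shows that it combined one Oracle row with two Snowflake rows.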
Risk & Finance data unification: Enable consolidated views across traditionally siloed systems
Regulatory reporting: Aggregate data from multiple sources for BCBS 239 compliance
Data quality initiatives: Provide a single query interface for data validation across systems
Business intelligence: Allow business users to create ad-hoc analyses without IT involvement
Legacy system migration: Query both old and new systems during transition periods
DataFusion represents the transition from consulting (identifying data integration problems) to building (creating technology solutions). It reflects the same first-principles approach later used in PAI and Risk-Agents: understand the problem deeply, then build something that addresses the root cause.