With the ongoing digitalization, an increasing number of sensors is becoming
part of our digital infrastructure. These sensors produce highly, even globally,
distributed data streams. The aggregate data rate of these streams far exceeds
local storage and computing capabilities. Yet, for radically new services (e.g.,
predictive maintenance and autonomous driving), which depend on various control
loops, this data needs to be analyzed in a timely fashion.
In this position paper, we outline a system architecture that can effectively handle distributed mega-datasets using data aggregation. In doing so, we identify two research challenges: the need for (1) novel computing primitives that allow us to aggregate data at scale across multiple hierarchies (i.e., time and location) while answering a multitude of a priori unknown queries, and (2) transfer optimizations that enable rapid local and global decision making.