The rate at which data is produced at the network edge, e.g., collected from
sensors and Internet of Things (IoT) devices, will soon exceed the storage and
processing capabilities of a single system and the capacity of the network.
Thus, data will need to be collected and preprocessed in distributed data
stores, as part of a distributed database, at the network edge. Yet, even in this
setup, the transfer of query results will incur prohibitive costs. To further
reduce the data transfers, patterns in the workloads must be exploited.
Particularly in IoT scenarios, we expect data access to be highly skewed: most
data will be store-only, while a small fraction will be popular. Here, the replication
of popular, raw data, as opposed to the shipment of partially redundant query
results, can reduce the volume of data transfers over the network.
In this paper, we design online strategies to decide between replicating data
from data stores or forwarding the queries and retrieving their results. Our
insight is that by profiling the access patterns of the data, we can lower the
data transfer cost and the corresponding response times. We evaluate the benefit
of our strategies using two real-world datasets.