Recent studies show that a significant part of Internet traffic is delivered
through Web-based applications. To cope with the increasing
demand for Web content, large scale content hosting and delivery
infrastructures, such as data-centers and content distribution
networks, are continuously being deployed. Being able to identify
and classify such hosting infrastructures is helpful not only to content
producers, content providers, and ISPs, but also to the research
community at large, for example to quantify the degree of hosting
infrastructure deployment in the Internet or the replication of Web
content.
In this paper, we introduce Web Content Cartography, i.e., the identification and classification of content hosting and delivery infrastructures. We propose a lightweight and fully automated approach to discover hosting infrastructures based only on DNS measurements and BGP routing table snapshots. Our experimental results show that our approach is feasible even with a limited number of well-distributed vantage points. We find that some popular content is served exclusively from specific regions and ASes. Furthermore, our classification enables us to derive content-centric AS rankings that complement existing AS rankings and shed light on recent observations about shifts in inter-domain traffic and the AS topology.
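To make the combination of DNS measurements and BGP snapshots concrete, the following is a minimal sketch under simplifying assumptions. It is not the clustering procedure used in this work. Each hostname's resolved IP addresses are mapped to origin ASes by longest-prefix match against a BGP table snapshot. Hostnames whose answers map to the same set of origin ASes are then grouped together. The table, hostnames, addresses, and AS numbers (64500-64502, from the RFC 5398 documentation range) are all hypothetical, and the names `build_rib`, `origin_as`, and `group_hostnames` are illustrative only.

```python
import ipaddress
from collections import defaultdict

# Hypothetical BGP snapshot: announced prefix -> origin AS (illustrative only).
BGP_TABLE = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.0.0/16": 64502,
    "203.0.113.0/24": 64500,
}

# Hypothetical DNS measurements: hostname -> IP addresses returned by resolvers.
RESOLUTIONS = {
    "www.example.com": ["192.0.2.10", "203.0.113.7"],
    "img.example.com": ["192.0.2.44"],
    "cdn.example.net": ["198.51.100.5"],
    "static.example.org": ["198.51.7.9"],
}


def build_rib(table):
    """Parse prefixes once and sort them longest-first for prefix matching."""
    rib = [(ipaddress.ip_network(prefix), asn) for prefix, asn in table.items()]
    rib.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return rib


def origin_as(ip, rib):
    """Return (prefix, origin AS) of the longest matching prefix, or (None, None)."""
    addr = ipaddress.ip_address(ip)
    for net, asn in rib:
        if addr in net:  # False for IPv4/IPv6 mismatches, so mixed tables are safe
            return net, asn
    return None, None


def group_hostnames(resolutions, rib):
    """Group hostnames whose DNS answers map to the same set of origin ASes."""
    clusters = defaultdict(set)
    for host, ips in resolutions.items():
        matches = (origin_as(ip, rib) for ip in ips)
        ases = frozenset(asn for _, asn in matches if asn is not None)
        clusters[ases].add(host)
    return dict(clusters)


if __name__ == "__main__":
    rib = build_rib(BGP_TABLE)
    for ases, hosts in group_hostnames(RESOLUTIONS, rib).items():
        print(sorted(ases), sorted(hosts))
```

With these toy inputs, `cdn.example.net` matches the more specific /24 rather than the covering /16, so it lands in a different group than `static.example.org`. A real pipeline would need to handle multi-origin prefixes, unannounced addresses, and answers that vary across vantage points. This sketch ignores all of those cases.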