CAT: A Last Mile Protocol for Content-Centric Networks
ABSTRACT In recent years, content-centric networking has become an active area of research. However, the proposals in this area often use custom protocols for the last mile communication between end users and content networks, which makes the technologies hard to adopt. In this paper we present a content-aware publish-subscribe protocol, called CAT, that can become a common solution for end users to access different content-centric networks. We discuss how the CAT protocol can be implemented with HTTP. We analyze the performance of a CAT proxy and discuss the potential performance improvements brought by parallel access and data caching.
Sasu Tarkoma, Dmitriy Kuptsov, and Petri Savolainen
Helsinki Institute for Information Technology
University of Helsinki and Aalto University
Finland
Email: firstname.lastname@hiit.fi
Pasi Sarolahti
Aalto University
Finland
Email: pasi.sarolahti@iki.fi
I. INTRODUCTION
The idea of content-centric networking has been popularized
by proposals such as Data-Oriented Network Architecture
(DONA) [1], Content-Centric Networking (CCN) [2]
and PSIRP [3]. However, DONA, CCN and PSIRP each
propose using their own custom protocols also for the last
mile communication between the client and the core network,
which makes it difficult for the end users to adopt the proposed
technologies. In this paper, we present a novel content-aware
publish-subscribe protocol called CAT, which is designed to
act as a common last mile protocol between the user and
the different content-centric networks. CAT offers a common
method for publish/subscribe content delivery between an
Internet service provider and an end-user.
In content-centric networks data is a first-class citizen of the
network, and the user fetches the data by its name or identifier
instead of using a host name or address. CAT adopts the
publish-subscribe paradigm because it is straightforward for users to
specify their interest in named content through subscriptions,
and for content providers to publish their content into the
content-centric networks with publish operations. In practice,
end users can be both subscribers and publishers. In addition,
considering that some content-centric networks such as PSIRP
were built from the start as publish-subscribe systems, it
is natural for CAT to be based on the publish-subscribe
paradigm.
Besides being based on the publish-subscribe paradigm, our
proposed CAT protocol is also content-aware, which means
that it allows attaching metadata to the content. Using this
metadata, protocol implementers and network administrators
can choose different levels of service according to the content
type. For example, it is possible to sacrifice reliability to meet
timing constraints for real-time content while downloading
larger files at a lower priority in the background.
The main motivations for the CAT protocol are the following:
providing content-centric networks as a generic service,
allowing incremental deployment of these networks, and being
able to access the networks in parallel with the aim of
minimizing delays and costs. CAT is implemented following a
proxy-based approach, in which a proxy works as an intermediary
between end users and the content networks. To further improve
the performance of our solution, we propose implementing
data caching and parallel access to different content-centric
networks on CAT proxies.
The remainder of the paper is organized as follows: In
section II we present background and related work. In section
III we introduce the CAT protocol and its implementation over
HTTP. In section IV we analyze our design and propose further
improvements, and in section V we present our conclusions
and future work.
II. BACKGROUND AND RELATED WORK
Even though content-centric networking has been an active
discussion topic in recent years, transport protocols for
content networking have received less attention. Figure 1 gives
an overview of the salient features of three recent network
architecture proposals. We observe that data and content-
centric operation as well as receiver-driven communications
are the key properties of these systems.
DONA proposes a “shim layer” between the transport layer
and network layer for data lookup, and relies on existing
transport protocols for data transmission. CCN assumes a
new data-oriented packet-level protocol, but does not discuss
transport issues, such as reliability and congestion control, in
great detail. Transport issues have not yet been extensively
addressed in the PSIRP system either, which is based on the
publish/subscribe paradigm.
An API for publish/subscribe networks has also been dis-
cussed recently in [4]. We do not focus on APIs in this paper,
but refer the reader to one of the existing APIs, such as the
one proposed in the aforementioned paper.
Fig. 1. Overview of three new network architecture proposals.
The most prevalent transport protocol, TCP, has served the
Internet and its users well, but is approaching the limits of its
extensibility. TCP allows a maximum of 40 bytes of option
space for new features. By now multiple extensions, such as
selective acknowledgments [5] and TCP timestamps [6],
have been defined and are in widespread use, potentially
taking a significant portion of the available option space. New
features under development, such as Multipath TCP [7], are
expected to need multiple additional options and even more
bytes from the scarce option space. In this respect SCTP [8]
provides a more flexible extension mechanism that is not
bound to a predefined option space limit.
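To make the pressure on the option space concrete, the following sketch tallies how many selective acknowledgment blocks still fit once other options are in use. The option lengths follow the TCP specifications cited above (a SACK option occupies 2 bytes plus 8 bytes per block [5]; the timestamps option occupies 10 bytes [6]). The helper name and the padding of timestamps to 12 bytes are assumptions made for illustration.

```python
TCP_OPTION_SPACE = 40    # bytes: 60-byte maximum header minus the 20-byte fixed header
TIMESTAMPS_LEN = 10      # kind + length + two 4-byte timestamp values
TIMESTAMPS_PADDED = 12   # assumption: two NOP bytes added for 4-byte alignment


def max_sack_blocks(bytes_used_by_other_options: int) -> int:
    """Largest number of SACK blocks that fits in the remaining option space.

    A SACK option needs 2 bytes of kind/length plus 8 bytes per block.
    """
    remaining = TCP_OPTION_SPACE - bytes_used_by_other_options
    return max(0, (remaining - 2) // 8)


if __name__ == "__main__":
    print("SACK blocks with no other options:", max_sack_blocks(0))
    print("SACK blocks alongside timestamps:", max_sack_blocks(TIMESTAMPS_PADDED))
```

Under these assumptions a single widely deployed option already removes one SACK block from each segment, which illustrates why further extensions compete for very little room.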
In addition to the traditional sender-oriented transport
protocols, there are some proposals that are more receiver-oriented,
and therefore potentially a good match for content-centric
networking. WebTP is an alternative receiver-oriented transport
protocol designed to be used with the World Wide Web [9].
RCP is another application of receiver-oriented transport,
motivated by its benefits for wireless hosts [10].
Many new transport protocols, such as SCTP, are known to
face deployment challenges due to middleboxes, such as home
gateways, that try to process the transport layer headers [11].
Therefore HTTP and TCP are the natural candidates for a
deployable application-layer protocol, although such a solution
has inherent transmission overhead. Instead of designing a new
transport-layer protocol, we have decided to rely on an existing
protocol that we believe will serve our purpose.
III. CAT PROTOCOL
We assume a setup illustrated in Figure 2, where an Internet
Service Provider (ISP) connects with one or more content-
centric networks to enhance its content delivery efficiency,
possibly in addition to being connected to the traditional
Internet. The different content-centric domains may be based
on their own respective architectures and communication
protocols that the ISP needs to support. The gray circle in the
bottom left corner illustrates the scope of this paper: the CAT
protocol, which is designed for the last-mile communication
between end-user clients and a server in the local ISP’s network.
The CAT protocol hides the multitude of choices by offering a
single protocol for this communication. The CAT server
then translates the CAT content requests to the appropriate
content delivery protocol, as determined by the ISP.
Fig. 2. Network setup with CAT protocol between an ISP and clients (elements shown: CAT clients, CAT protocol, IP, CAT proxy at the ISP, content protocols A and B, Content Networks A and B, Internet).
A. Network elements
We begin the description of our protocol by introducing the
entities that comprise the network and describing the
functionality of each.
1) Content distribution infrastructure: The core of each
content distribution network comprises a large number of
distributed routers and content storage entities. This core
network is responsible for routing messages using the content
identifiers, as well as for storing and retrieving the actual
content. Such a network should be robust and able to serve
a large number of content retrieval requests. While the
high-level functionality is common to all content-centric networks,
the specifics of each type of network may differ. These
specifics may include the type and form of data naming and
identification, security mechanisms, protocol semantics, and
routing algorithms. The two major responsibilities of the core
network are: (i) maintaining the distributed infrastructure that
stores the content, and (ii) routing the requests and data
retrieval in a timely manner.
2) Proxies: As the name implies, proxies are the
intermediaries that interconnect the end-hosts with the content-centric
networks. Proxies interact with the core network on
behalf of end-hosts. On the one hand, this hides the
complexity and internals of the specific content distribution
network; on the other hand, proxies may provide additional
functionality, ranging from security policy enforcement to
content caching that increases system responsiveness.
3) End-hosts: Publishers and Subscribers: Finally, we de-
scribe the end-hosts as actual content producers (publishers)
and content consumers (subscribers). The publishers are in-
terested in making their content highly visible and available
in the network while the subscribers are interested in an easy
and rapid way of data discovery and retrieval.
B. Protocol overview
Almost all proposed content-centric approaches suggest
clean slate designs and therefore require re-architecting the
existing Internet, and most importantly the end-hosts. There
is no doubt that the deployability of such systems will be a
considerable issue. Our protocol, in contrast,
tries to make the deployment as seamless as possible both
for the end-hosts and for any intermediaries (such
as middleboxes). To mitigate the issues that will arise
during the transition period (or when interconnecting different
networks), we define our protocol to be agnostic about the type
of the underlying protocol used for message passing. CAT can
be implemented over any reliable transport: it can run
over HTTP, TCP or SCTP, or be implemented on top
of some future reliable transport protocol.
In order to be able to leverage content-centric networks
immediately, we assume in the remainder of the paper that
CAT is implemented on top of HTTP in a REST-like manner.
Such a design has two clear advantages: (i)
it allows including content that is retrieved from multiple
CCNs (separately or simultaneously) on web pages using
familiar URL syntax, and (ii) carrying the messages in a
well-known protocol allows them to traverse middleboxes,
such as NATs and firewalls, which are known to be a limiting
factor [11] when deploying new protocols.
In the following subsections we briefly describe the main
functionality of the CAT protocol.
1) Local proxy discovery: An end-host attached to an
administrative domain, i.e., a serving ISP, should discover the
closest CAT proxy within the domain. There are several
alternatives for performing such discovery: (i) an end-host configured
with a CAT proxy name, e.g., local.cat.proxy, sends a DNS
query in its local domain to resolve the name local.cat.proxy to
a valid routing identifier; we assume that an ISP supporting CAT
proxies responds to such queries with the routing identifier
(e.g., an IP address) of a locally maintained CAT proxy; (ii)
a DHCP-like configuration, where a client receives the local
CAT proxy routing identifier along with its other network
configuration parameters; and (iii) an IPv6-like CAT proxy
discovery performed by the client. Throughout
this paper we assume that clients use DNS resolution
and that ISPs supporting CAT proxies respond to such
queries using locally deployed DNS servers.
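The DNS-based alternative can be sketched as follows. This is a minimal illustration, assuming the example name local.cat.proxy from the text and an assumed port 80; the function name and the injectable resolver parameter are hypothetical and exist only so that the logic can be exercised without a live DNS server.

```python
import socket
from typing import Callable, List

CAT_PROXY_NAME = "local.cat.proxy"  # example name used in the text


def discover_cat_proxy(name: str = CAT_PROXY_NAME,
                       port: int = 80,  # assumption: the proxy listens on the HTTP port
                       resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Resolve the well-known proxy name to a list of candidate IP addresses."""
    try:
        infos = resolver(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # the local domain does not advertise a CAT proxy
    addresses: List[str] = []
    for info in infos:
        # Each entry is (family, type, proto, canonname, sockaddr);
        # the first element of sockaddr is the IP address.
        address = info[4][0]
        if address not in addresses:
            addresses.append(address)
    return addresses


if __name__ == "__main__":
    # Outside a CAT-enabled domain this is expected to print an empty list.
    print(discover_cat_proxy())
```

An empty result lets the client fall back to one of the other discovery alternatives or to ordinary Internet access.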
2) Connection establishment: Before publishing or
subscribing to content, the CAT client opens a single reliable
transport connection (usually TCP or SCTP) to a CAT proxy; we
refer to this connection as the control association. A new
connection, referred to as a data association, is opened for
transporting the data of each content item, much the same way as
different HTTP/1.0 requests are carried within separate transport
connections. After the control association is established, the
client may send publish/subscribe requests. We define these
operations in the following subsections.
3) Publishing: In order to publish a content item (see
Figure 3), the CAT client first sends metadata about the content
item to the CAT proxy using an HTTP POST
request. This metadata also includes the identifier of the
content distribution network that the client would like to use
for publishing the content. The CAT proxy reads the metadata and
registers the content item with the desired content distribution
network. Upon registering the metadata with the corresponding
content distribution network, the CAT proxy replies to the client
with the publication identifier and other related information.
The client can now publish the content via the CAT proxy in the
desired content distribution network.
Fig. 3. Publishing operation (participants: CAT client, CAT proxy, content network; control association messages: publish(metadata), publish(), publication_ok(publication_id, port); data association messages: connect(port), data(publication_id, data_bytes)).
The client sends the actual data of the content item to
be published to the CAT proxy in an HTTP PUT request. The
request contains (i) the content identifier, and (ii) the actual
content, if any (missing content may indicate that the client
wishes to publish content that is not cacheable and will be
provided only after a subscriber registers a trigger for such
content). Depending on the case, the CAT proxy may (i) block the
PUT request and wait until a corresponding subscription request
arrives, or (ii) reply immediately to the client.
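The two publishing steps can be sketched as request builders on the client side. The paper does not fix a URL layout, header set, or metadata encoding, so the paths /publications and /publications/<id>, the JSON encoding, and the "network" field name are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HttpRequest:
    """Plain description of an HTTP request (hypothetical helper type)."""
    method: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def build_publish_metadata_request(metadata: dict, network_id: str) -> HttpRequest:
    """Step 1, control association: POST the metadata, including the identifier
    of the content network the client wants to publish into."""
    payload = dict(metadata, network=network_id)  # assumed field name
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Content-Length": str(len(body))}
    return HttpRequest("POST", "/publications", headers, body)


def build_publish_data_request(publication_id: str,
                               data: Optional[bytes]) -> HttpRequest:
    """Step 2, data association: PUT the content under the publication
    identifier returned by the proxy. An empty body stands for content that is
    not cacheable and is supplied only once a subscriber asks for it."""
    body = data or b""
    headers = {"Content-Type": "application/octet-stream",
               "Content-Length": str(len(body))}
    return HttpRequest("PUT", "/publications/" + publication_id, headers, body)


if __name__ == "__main__":
    print(build_publish_metadata_request({"name": "video-1"}, "ccn-a"))
    print(build_publish_data_request("pub-42", b"hello"))
```

Keeping the metadata request separate from the data request mirrors the split between control and data associations: the proxy can register the item and choose a service level before any content bytes arrive.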
4) Subscribing: In a similar fashion to the publication
procedure, a client that wishes to subscribe to specific content
(see Figure 4) contacts the CAT proxy and submits metadata
about the data item to be subscribed to. The proxy
checks the availability of the data the client has subscribed
to. At this phase, the CAT proxy may contact all available
content distribution networks to discover the desired content.
Depending on the availability of the requested content,
the CAT proxy will either respond with a proper
publication identifier and metadata, or reply that the requested
data is not yet available. In the latter case, the client may wait
(depending on the user’s behavior) until the content is found in
any content distribution network available to the CAT proxy.
The CAT proxy stores soft state for a subscription request
and periodically queries the content distribution networks for
availability of the content. When the content is available, the
client can start to pull it using a proper publication identifier.
Fig. 4. Subscription operation (participants: CAT client, CAT proxy, content network; control association messages: subscribe(metadata), subscribe(), subscription_ok(publication_id, port); data association messages: connect(port), data(), data(publication_id, data_bytes)).
In contrast to the publish request, the subscription
operation is implemented using an HTTP GET request. The client
constructs an HTTP GET request that includes the unique
identifier of the content and other metadata, such as
authentication information. Depending on the particular
implementation, if the content is not available in the content
distribution network, the CAT proxy may (i) respond to the client
with an HTTP 404 error message, or (ii) block the GET request
until the requested data item appears in the content distribution
network. If the waiting time exceeds a specified threshold, the
connection will be closed with a timeout message sent to the
client. In this case, the client may repeat the request later. If
the specified data item is present in the content distribution
network, an HTTP 200 message will be sent to the client, and
the client can start pulling the actual data.
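The proxy-side handling of a subscription GET described above can be sketched as a small decision loop. The function and parameter names are hypothetical, the lookup callback stands in for querying the content distribution networks, and the use of status 504 for the timeout case is an assumption, since the text only specifies that a timeout message is sent.

```python
import time
from typing import Callable, Optional, Tuple


def handle_subscribe(content_id: str,
                     lookup: Callable[[str], Optional[str]],
                     block: bool = False,
                     timeout_s: float = 5.0,
                     poll_interval_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, publication identifier) for a subscription GET.

    lookup(content_id) returns a publication identifier, or None while the
    content is not yet available in any content distribution network.
    """
    waited = 0.0
    while True:
        publication_id = lookup(content_id)
        if publication_id is not None:
            return 200, publication_id  # content found: the client may start pulling
        if not block:
            return 404, None            # option (i): report unavailability at once
        if waited >= timeout_s:
            return 504, None            # option (ii): waiting threshold exceeded (assumed code)
        sleep(poll_interval_s)          # soft state: poll the networks periodically
        waited += poll_interval_s


if __name__ == "__main__":
    print(handle_subscribe("video-1", lambda content_id: None))
```

Passing block=True selects the blocking variant, in which the request is held open and the networks are polled until the content appears or the threshold is reached.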
IV. ANALYSIS
We introduce a simple model for our proxy-based approach
and analyze the system performance. Among the most
important properties of a content distribution network are its
availability, responsiveness, and resulting throughput. The
system aims to provide high availability for content by caching
content locally and by utilizing a number of CCNs. We
believe that a network operator can maintain several proxies
and assign a set of clients to each proxy (e.g., by using DNS
load balancing).
Let λ denote the request arrival rate at the CAT proxy from
the clients, and p denote the cache hit probability at the proxy.
Then (1 − p)λ + Λ denotes the load to the CCN system,
where Λ is the request rate from other proxies to the CCN.
Let k denote the number of different CCN systems, and f
denote the fan-out of the requests to the CCNs. Therefore the
overall rate of traffic at a single CCN is
$$\lambda_{CCN} = \frac{f(1-p)\lambda + \Lambda}{k}.$$
Anycast happens when f = 1; if f is greater, requests are
performed in parallel.
The key parameters for the proxy are the following:
• The request rate λ that determines the overall load of the
CAT proxy, and also the load on the CCNs.
• The cache hit probability p, which determines the request
forwarding rate to the CCNs.
• The fan-out parameter f of the requests, which determines
the level of parallel processing in the network, and
the overall load on the CCNs.
CAT operations, namely subscribe and publish, can be
implemented in different ways from the viewpoint of CCNs.
In particular, a subscribe request is sent to f CCNs. This requires
a mechanism for preventing redundant content
updates from the CCNs towards the CAT proxy. Publications
do not pose a similar challenge: a piece of content is simply
made available in f CCNs. We believe that an optional selector
function is needed that clients can use to define which
CCNs are used for subscriptions and publications.
For details regarding buffering and response times, we refer
to the queuing model presented in [12]. This model can be
used to include the effects of queuing and network bandwidth
in the analysis. It applies to the case when f = 1
and there is a single CCN. In this case, multiple CCNs
can be modeled simply by increasing the service rate of a
single CCN (and updating the Λ parameter). We have analyzed
this more detailed model, and based on the analysis high hit
rates improve system performance significantly. The size of
the requested data items and the buffer lengths are important,
and high hit rates can result in congestion at the local proxy.
Recent results indicate that approximately 71% of HTTP
requests and 28% of the bytes are cacheable. Similar analysis
can be applied for the case of peer-to-peer file exchange.
Experimental results suggest that 27% of local BitTorrent traffic
and 97% of global AS-wide traffic are cacheable [13].
It is well known that the popularity of Web objects and RSS
feeds follows the Zipf distribution. The popularity of media
objects, such as video-on-demand objects, has been shown to
follow the stretched exponential distribution. This indicates
that the hit ratio will be lower for media objects than for
Web page resources. With high temporal locality, long-term
caching for media objects might attain a hit ratio greater
than 85% when caching 10% of the content [14]. Thus it is
reasonable to expect that the hit rate parameter for the CAT
proxy would tend towards one with popular and cacheable
content. This would then localize the content dissemination.
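The load model introduced earlier in this section can be explored numerically with a short sketch. It reads the per-CCN rate as (f(1 − p)λ + Λ)/k, which reduces to the stated load (1 − p)λ + Λ when f = 1 and k = 1. The function name and the constraint 1 ≤ f ≤ k are assumptions made here, on the grounds that a request cannot fan out to more CCNs than exist.

```python
def ccn_load(lam: float, p: float, f: int, k: int, big_lambda: float = 0.0) -> float:
    """Request rate seen by a single CCN: (f * (1 - p) * lam + big_lambda) / k.

    lam        -- request arrival rate at the CAT proxy
    p          -- cache hit probability at the proxy
    f          -- fan-out of each forwarded request
    k          -- number of CCNs
    big_lambda -- request rate from other proxies
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if not 1 <= f <= k:
        raise ValueError("fan-out f must satisfy 1 <= f <= k (assumption)")
    return (f * (1.0 - p) * lam + big_lambda) / k


if __name__ == "__main__":
    # Compare anycast (f = 1) with full parallel access (f = k) at several hit rates.
    for hit_rate in (0.0, 0.5, 0.75, 1.0):
        anycast = ccn_load(100.0, hit_rate, f=1, k=4)
        parallel = ccn_load(100.0, hit_rate, f=4, k=4)
        print(hit_rate, anycast, parallel)
```

The comparison shows the trade-off discussed in the text: a larger fan-out multiplies the forwarded load on each CCN, while a higher hit probability shrinks it, so parallel access is cheapest exactly when caching works well.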
We conclude the analysis of the proposed CAT system with
the following observations:
• Parallel access to multiple CCNs improves system perfor-
mance and content availability; however, the proxy needs
to coordinate updates to subscribed content. The updates
for the same content may be available from different
CCNs, which results in unnecessary delivery overhead
unless content version checking is done. This can be
done by first signaling about new content that matches
subscriptions, and only after that transferring the content.
The proxy thus provides a concast service for subscribers.
• The CAT proxy offers significant improvements when
content can be cached. Recent experimental results indi-
cate that a large portion of Internet content can be cached.
• A selector function enables clients and applications to
specify which CCNs should be used for which content,
thus providing a universal CCN service. The selector
function should specify CCN policies and delivery
options.
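The version checking mentioned in the first observation can be sketched as follows: a CCN first signals that new content is available, and the proxy transfers it only if no other CCN has already supplied that version. The class and method names are hypothetical, and integer version numbers that increase with each update are an assumption; the paper does not specify how versions are represented.

```python
from typing import Dict


class UpdateCoordinator:
    """Tracks the newest version already accepted for each content item."""

    def __init__(self) -> None:
        self._accepted: Dict[str, int] = {}

    def should_fetch(self, content_id: str, version: int) -> bool:
        """Called when a CCN signals new content; True only for unseen versions."""
        if version <= self._accepted.get(content_id, -1):
            return False  # another CCN already supplied this or a newer version
        self._accepted[content_id] = version
        return True


if __name__ == "__main__":
    coordinator = UpdateCoordinator()
    # The same update is announced by two CCNs; only the first is transferred.
    print(coordinator.should_fetch("feed-1", 1))
    print(coordinator.should_fetch("feed-1", 1))
```

Checking the announcement before the transfer keeps the redundant copies from ever crossing the link between the CCNs and the proxy, which is where the overhead of parallel subscriptions would otherwise appear.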
V. CONCLUSIONS AND FUTURE WORK
We began this paper by examining various state-of-the-art
content-centric networks and the commonalities among
them. A common last-mile transport protocol is
crucial for the adoption of future content-centric networks.
Therefore, we have outlined the concept of a content-aware
publish-subscribe protocol for content-centric networks, that is,
networks in which information is retrieved not in an end-to-end
but rather in an end-to-middle manner. The CAT proxy design
supports various transport protocols; however, HTTP is the
main focus of our work due to its interoperable nature. The
CAT proxy needs to support content selection and delivery
across various content-centric networks, which means that the
proxy is metadata-aware and able to support concast data
delivery to subscribers. The key features are thus universal
data access, content locality, and parallel access to
content-centric networks.
To complement our concept we have analyzed the proposed
design with a simple model. One of the key parameters of the
analytical model is the cacheability of data. Indeed, the model
suggests that with good cacheability and a high hit rate, the delay
and communication cost of the system are significantly reduced.
Current and future work includes a prototype implementation
of the system as well as experiments with realistic workloads.
VI. ACKNOWLEDGMENTS
This work has been supported by the ICT SHOK Future
Internet project. We would like to thank Long Nguyen Hoang
for initial work on the proxy, and Dr. Pasi Lassila for feedback
regarding the modeling aspects.
REFERENCES
[1] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, K. H. Kim,
S. Shenker, and I. Stoica, “A data-oriented (and beyond) network
architecture,” in SIGCOMM ’07: Proceedings of the 2007 conference
on Applications, technologies, architectures, and protocols for computer
communications. New York, NY, USA: ACM, 2007, pp. 181–192.
[2] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs,
and R. L. Braynard, “Networking Named Content,” in Proceedings of
ACM CoNEXT ’09, Rome, Italy, Dec. 2009.
[3] D. Trossen, M. Särelä, and K. Sollins, “Arguments for an
information-centric internetworking architecture,” SIGCOMM Computer
Communication Review, vol. 40, no. 2, pp. 26–33, 2010.
[4] M. Demmer, K. Fall, T. Koponen, and S. Shenker, “Towards a modern
communications api,” in ACM HotNets-VI, 2007.
[5] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, “TCP Selective
Acknowledgement Options,” RFC 2018, Oct. 1996.
[6] D. Borman, R. Braden, and V. Jacobson, “TCP Extensions for High
Performance,” RFC 1323, May 1992.
[7] D. Wischik, M. Handley, and M. Bagnulo, “The Resource Pooling
Principle,” ACM SIGCOMM Computer Communication Review, vol. 38,
no. 5, pp. 47–52, Oct. 2008.
[8] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer, T. Taylor,
I. Rytina, M. Kalla, L. Zhang, and V. Paxson, “RFC 4960: Stream control
transmission protocol,” United States, 2000.
[9] R. Gupta, M. Chen, S. Mccanne, and J. Walrand, “WebTP: A Receiver-
Driven Web Transport Protocol,” in Proceedings of IEEE Infocom ’99,
1999.
[10] H.-Y. Hsieh, K.-H. Kim, Y. Zhu, and R. Sivakumar, “A receiver-
centric transport protocol for mobile hosts with heterogeneous wireless
interfaces,” in Proceedings of ACM MOBICOM ’03, 2003.
[11] S. Hätönen, A. Nyrhinen, L. Eggert, S. Strowes, P. Sarolahti, and
M. Kojo, “An experimental study of home gateway characteristics,”
in Proceedings of ACM SIGCOMM Internet Measurement Conference
(IMC) ’10, Melbourne, Australia, November 2010.
[12] T. Berczes, G. Guta, G. Kusper, W. Schreiner, and J. Sztrik, “Analyzing
a proxy cache server performance model with the probabilistic model
checker prism,” in Proceedings of the 5th International Workshop on
Automated Specification and Verification of Web Systems (WWV’09),
2009.
[13] B. Ager, J. Kim, F. Schneider, and A. Feldmann, “Revisiting cacheability
in times of user generated content,” in 13th IEEE Global Internet
Symposium, 2010.
[14] L. Guo, E. Tan, S. Chen, Z. Xiao, and X. Zhang, “Does internet
media traffic really follow Zipf-like distribution?” in SIGMETRICS ’07:
Proceedings of the 2007 ACM SIGMETRICS international conference
on Measurement and modeling of computer systems. New York, NY,
USA: ACM, 2007, pp. 359–360.