NeTS-FIND: Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures without Topology Updates
Project description
Routing information is the most basic and, perhaps, the
most complicated function that networks perform. Conventional
wisdom states that to find paths to destinations through
the complex network maze, nodes must communicate and exchange
information about the status of their connections to
other nodes, since without some knowledge of changing network
connectivity, it is not possible to successfully route information
through the network.
In the Internet, this required inter-node communication
makes routing both expensive and fragile. The recent Internet
Architecture Board report on routing and addressing (RFC4984)
identifies convergence costs of deployed routing protocols as one of
the most serious scaling issues with the existing Internet routing
architecture, aggravated by explosive rates of routing table size
growth. Worse yet, in our recent review
of compact routing described below, we show that the required number
of messages for routing state to converge on small-world networks
cannot scale better than linearly with network size for any routing
algorithm.
In many other real networks however, nodes can efficiently
communicate, even though they do not exchange any information about
the current global state of the network topology. In 1969, Stanley
Milgram performed the following
experiment. He asked some random
individuals (sources) to send a letter to a specific person (the
destination), described by his name, occupation, age, and the city
he lived in. The sources were asked to pass the letter to friends
chosen to maximize the probability of the letter reaching its
destination. Approximately 30% of the letters reached their
destination, even though nodes in this routing example had no global
knowledge of the human acquaintance network topology, except their
local connections and some characteristics (e.g., occupation, age,
city of dwelling) of their connections.
Much later, Jon Kleinberg
offered the first popular
explanation of this surprising effect. In
his model, each node, in addition to being a part of the graph
representing the global network topology, resides in a coordinate
space -- a grid embedded in the Euclidean plane. The coordinates of
a node in the plane, its address, abstracts the information about
the destination in Milgram's experiments. Each node knows: 1) its
coordinates; 2) the coordinates of its neighbors; and 3) the
coordinates of the destination written on the packet. Given these
three pieces of information, the node can route greedily by
selecting the direct neighbor closest to the destination in the
plane.
Clearly, the described greedy routing strategy can be efficient
only if the network topology is in some way congruent
with the underlying space. Kleinberg indeed finds that
the paths produced by greedy routing are polylogarithmically
short only if the probability that there is a link between a pair
of nodes depends in a very specific way on the distance between
the two nodes in the plane. This finding implies that
greedy routing cannot be equally efficient on any arbitrary
network topology built over a given underlying space. Only
certain topologies, congruent with the underlying space, will
perform well.
Given that the network topology is so critically important, the
Kleinberg model stands closer to the beginning of an explanation for
Milgram's experiment than to its end. The model does not (try to)
reproduce the basic topological properties of social networks
through which letters were traveling in Milgram's experiments. For
instance, the Kleinberg model produces k-regular graphs while
social networks, the Internet, and many other complex networks are
known to be scale-free, meaning that: i) the distribution P(k) of
node degrees k in a network follows power laws P(k) ~ k-γ with
exponent γ lying between 2 and 3; and ii) the network has strong
clustering, i.e., a large number of triangular subgraphs.
Extending the Kleinberg's approach, we assume in this project that
nodes in the Internet and other complex networks exist in some
spaces that underlie the observed network topologies. We call these
spaces hidden metric spaces. The observed network topology is
coupled to the hidden space geometry in the following way: a link
between two nodes in the topology exists with a certain probability
that depends on the distance between two nodes in the hidden
geometry. In the most general settings, hidden distances
abstract intrinsic similarities between nodes. The assumption that
more similar nodes are more likely to be connected makes the
connection probability a decreasing function of the hidden
distance.
The ultimate goal of this project is to find the hidden spaces
underlying real networks and to identify node coordinates in them.
Unfortunately, it is quite difficult to find something if one does not
know what it is. Therefore, we have developed the following
step-by-step research programme that we believe is optimal to
achieve our ultimate goal:
-
Obtain empirical evidence that hidden spaces do exist and
that they are metric indeed;
-
Identify navigability mechanisms that influence the efficiency of
greedy routing in complex networks;
-
Find the basic geometrical and topological properties of hidden
spaces that make them maximally congruent with respect to the
identified navigability mechanisms;
-
Obtain empirical evidence that hidden spaces under real networks
do possess these properties; and
-
Find mappings of nodes in real networks to the identified spaces
or their models.
The results obtained so far are described below.
Papers
- D. Krioukov, kc claffy, K. Fall, and A. Brady
On Compact Routing for the Internet
Here we review compact routing and discuss its applicability to
routing in the Internet.
Compact routing is the routing theory that studies the fundamental
limits for routing efficiency, and tries to construct routing
algorithms that meet those limits. Measures of routing
efficiency include the routing table size, routing path stretch, and
communication overhead, often estimated by the number of messages
required for the algorithm to converge upon a network topology
change. The global network topology knowledge is usually assumed,
and all the efficiency parameters are estimated in the worst case,
across all possible network topologies on which the routing
algorithm correctly operates.
The most pessimistic fact from this theory is that there can exist
no routing algorithm that would be able to converge with the number
of control messages growing slower than linearly with the network
size in the worst case. The most pessimistic finding in this paper
is that the small-world topologies are this worst case. Almost all
complex network topologies, including the Internet, are small-world:
the average shortest path length in them grows (sub)logarithmically
with the network size.
- M. Angeles Serrano, D. Krioukov, and M. Boguna
Self-Similarity of Complex Networks and Hidden Metric Spaces
Here we obtain empirical evidence that metric spaces do underlie
real networks.
We do so the following way. We consider the simplest possible
degree-thresholding network renormalization procedure: given a
network, obtain its subnetworks by throwing out all nodes of degrees
less than a certain threshold. We apply this procedure to real
networks and find that the degree distribution, degree correlation,
and clustering are self-similar: both before and after
renormalization, all these characteristics follow the same curves.
We then randomize the networks. We randomly rewire links such
that the degree distribution is preserved. In these randomized
networks, the degree distribution and correlations
remain self-similar, but clustering does not.
We then introduce the model of scale-free networks with the simplest
hidden metric space (a circle) underneath, generate synthetic
networks in this model, perform all the same operations described
above with these synthetic networks, and find that they exhibit
qualitatively the same effects as the real networks. Given the
intimate relationship between the triangle inequality in the hidden
geometry (transitivity of being close) and clustering in the
observed topology (transitivity of being connected), these findings
provide a strong evidence that metric spaces do underlie real networks,
including the Internet, and are a plausible explanation of their
self-similarity.
An interesting and fundamentally important "by-product" of these results is that
they offer a new perspective on self-similarity of complex networks.
These networks turn out to be self-similar with respect to distance
rescaling in the hidden space, because
this distance rescaling is
equivalent to the degree-thresholding renormalization described
above.
- M. Boguna, D. Krioukov, and kc claffy
Navigability of Complex Networks
Here we discuss the general mechanisms behind navigability of
complex networks and show that the structural characteristics of
real networks correspond to most navigable configurations.
Specifically, we show that more navigable networks are networks with
more heterogenous degree distributions
(γ's closer to 2) and
stronger clustering. The first property is easy to explain: higher
heterogeneity implies relatively more hubs in the network that boost
up its navigability. Clustering is trickier. Networks with stronger
clustering are those that have stronger ties between the network
topology and hidden geometry. More formally, in such networks,
connections between nodes close in the hidden space are more
preferred, or equivalently, the connection probability function
decreases faster with the hidden distance. The stronger the ties
between the network topology and hidden geometry, the more congruent
they are, the closer and more successfully the greedy routing paths
follow the shortest paths in the network.
- D. Krioukov, F. Papadopoulos, M. Boguna, and A. Vahdat
Efficient Navigation in Scale-Free Networks Embedded in Hyperbolic Metric Spaces
Here we establish the main geometric property of hidden spaces that
make them maximally congruent with respect to the described mechanisms
behind navigability of complex networks. We argue that these spaces
are hyperbolic, i.e., negatively curved.
Specifically, we consider the simplest example of a compact
hyperbolic space, a hyperbolic disc, and show that if nodes are
uniformly distributed in it, then the scale-free (power-law) network
topologies naturally emerge in these settings as a consequence of
negative curvature in the hidden space. Greedy routing is maximally
efficient in these settings, too. Almost all paths are shortest and
successfully reach destinations. If we perturb topologies by
removing links from them, thus emulating network dynamics, the
efficiency of greedy routing deteriorates very slowly. Removal of up
to 10% of the links degrades the percentage of successful paths by
less than 1%.
The main reason for such outstanding efficiency is the exceptional
congruency between the metric structures of scale-free topologies
and hyperbolic geometries. The shortest paths in the network follow
very closely the hyperbolic geodesics between the source and
destination, and there are many shortest paths between the same
source and destination that all do so. Link removals affect some of
these shortest paths but not all of them.
The reason why the hidden spaces under real networks are negatively
curved is as follows. Complex networks connect distinguishable,
heterogenous elements abstracted as nodes. Understood broadly, this
heterogeneity implies that there is at least some taxonomy of
elements, meaning that all nodes can be somehow classified. In most
general settings, this classification implies that nodes can be
split in large groups consisting of smaller subgroups, which in turn
consist of even smaller subsubgroups, etc. The relationships between
such groups and subgroups can thus be approximated by tree-like
structures, sometimes called dendrograms, that represent hidden
hierarchies in networks. But the geometries of trees and hyperbolic
spaces are intimately related. Trees embed with minimal,
logarithmic, distortion into hyperbolic spaces, while they embed
with exponential distortion into Euclidean spaces. Colloquially,
hyperbolic spaces are continuous versions of trees. The metric
structures of both are essentially the same. In other words, the
hierarchical organization of complex networks translates
into the negative curvature of hidden metric spaces.
Other FIND-funded papers
- X. Dimitropoulos, D. Krioukov, A. Vahdat, and G. Riley
Graph Annotations in Modeling Complex Network Topologies
Here we develop a methodology to generate annotated complex network
topologies of different sizes.
We focus on the AS-level Internet with AS links annotated by
business relationships between ASs inferred in our previous work. The
methodology starts with extracting a set of specific properties of
the Internet topology that remain the same during its evolution.
This set of properties is contained in the correlation profile of
the annotated Internet. We first extend the generic
dK-series
formalism (also developed in our previous work) to apply to
annotated graphs. By annotated graphs we essentially mean colored
graphs, except that each link can be colored not by one but by two
colors: each stub (half-link) is colored by a different color. These
colors can be AS relationships. For example, provider=red,
customer=green, and peer=blue, so that links connecting customers to
providers are red-green, while links between peers are all blue.
We extract different kind of correlations from the real Internet
topologies annotated by AS relationships. The 1K-correlations are
correlations among colors of stubs attached to nodes, i.e., the
joint distribution of colors showing how many nodes with x red stubs,
y green stubs, and z blue stubs the network has. The 2K-correlations
tell us how many differently colored links connecting nodes of
different colored degrees the network has. It turns out
that these types of correlations fully charcterzied the structure
of the annotated AS Internet.
We then decompose
these joint distributions into a collections of marginals and the
normalized correlation profiles using copulas, a technique from
statistics specifically designed to do so. We then rescale
separately the marginals, by preserving their 1-dimensional shapes, and
correlations, by using copulas, to obtain the corresponding joint
distributions for target synthetic networks of desired sizes. By
construction, these networks have the same correlation profile as
the annotated Internet, so that they reproduce the topological
invariants of its evolution.
- P. Krapivsky and D. Krioukov
Scale-Free Networks as Pre-Asymptotic Regimes of Super-Linear Preferential Attachment
Here we show that scale-free networks, including the Internet, may
exist in pre-asymptotic regimes of network evolution processes that
do not have any scale-free, self-similar structures as their
asymptotes.
Specifically, we study the Positive Feedback Preference model, which
is a growing network model that, among growing models, reproduced
the AS topology of the Internet most accurately according to a long
list of topological characteristics. However, the model is based on
super-liner preferential attachment, which is known to lead to not
scale-free but star-like graphs in the limit of large graph sizes.
We resolve this paradox by showing that the model has a vast
pre-asymptotic regime, in which it is capable, according to our
analysis confirmed by extensive simulations, produce scale-free
networks. The asymptotic, star-like graph regime starts, given the
specific set of model parameters, only at network sizes of the order
of 1017.
These findings have potential to impact both practice and theory of
network science. In practice, if we want to construct topology
generators of growing Internet-like networks, we no longer have to
limit ourselves to mechanisms, such as preferential attachment, that
quickly, for small network sizes, achieve power laws as their
asymptotes. In theory, the research area on what growth mechanisms
produces scale-free networks not in their asymptotic but
pre-asymptotic regimes is virtually untouched.
|
|