Probabilistic models for real world networks

Networks

Networks, or complex interconnected systems that work together to generate macroscopic functions are amongst the most important facets of daily. Bhamidi’s research is motivated by probabilistic and statistical aspects of such systems, including understanding how such systems change and evolve over time, properties of various dynamics such as random walks, epidemic models and models for diffusion of information through such networks. Let me now describe my work,  some of current projects and interests.

  • Understanding the effect of edge costs in routing traffic through networks:

Internet routing tree of paths constructed by the Opte project

   Many of the networks we deal with in practice (for example the Internet or the power grid)carry flow (bits, physical quantities) through the network. The manner in which such systems transmit information through the network depends not just on topological (graph) structure of the network, but on the costs (congestion costs, physical distances etc) of traversing edges in the network. Bhamidi has been heavily involved in understanding such questions in the context of  mathematical models of real world networks (such as the configuration model on n vertices) largely with Bhamidi’s collaborators. Dr. Remco van der Hofstad and Dr. Gerard Hooghmiestra. We have been able to analyze properties of the optimal routing path between typical vertices in such idealized models. In particular, it turns out that for wide variety of such models, under some basic assumptions on the edge weight distribution, the hopcount (number of edges on the optimal path) scales like \log{n}  and in fact satisfy central limit theorems even when the degree exponent of the model \tau \in (2,3), where the graph distance scales like \log\log n. Thus the presence of edge costs fundamentally alters the geometry of the network. A rigorous analysis of such questions lead to surprisingly deep connections to stable age distribution theory and Malthusian rate of growth of age dependent branching processes, studied deeply by Jagers and Nerman in the 70s and early 80s. Typically one sees two regimes, the central limit regime mentioned above and a “localization” regime where only really short paths matter. In the second regime, probabilistic techniques such as extreme value theory and the Poisson Dirichlet distribution and Stein’s method for Poisson approximation play a major role.

  •  Understanding sources of variation in blood vascular networks:

Vascular trees from UNC CASILab

The medical school at UNC (see CASILab) has data on the vascular networks in the brain on a large number of subjects. One of the aims of the Lab is to understand statistical causes of variation in such vascular networks. Prof. Steve Marron in the statistics and operations research department has developed a number of  techniques to extend classical techniques such as PCA to non-euclidean settings such as this to quantify the sources of variation. Along with Steve Marron, Haipeng Shen and Dan Shen Bhamid has been exploring the applications of sophisticated combinatorial and probabilistic techniques used in the modern analysis of random tree models to answer such questions.

  • Critical random graphs and coagulation models: A large number of random graph models have been developed to model real world systems. Many of these models of networks on n vertices have a parameter \nu such that when the connectivity in the network is less than \nu, the size of the largest connected component is o(n). Understanding exactly how this giant component emerges via the merger of smaller components (technically understanding the system at criticality and the critical scaling window) turns out to have deep connections to stochastic processes such as reflected inhomogeneous brownian motion, as well as the entrance boundary of Markov processes. At the same time, such systems rigorously define coalescent models that are of great interest to practitioners in the physical sciences. Building on Aldous’s work on the analysis of the critical Erdos-Renyi random graph, with Remco van der Hofstad and Johan van Leeuwarden, Bhamidi has extended these results to a very popular random network model (the so called rank-1 inhomogeneous random graph). One of the surprising aspects of this model is that in the critical regime, there is a difference in the system when the degree exponent \tau>4 and \tau <4 which is very different from the behavior of various other dynamic processes such as first passage percolation, where the transition occurs when the degree exponent \tau < 3 and \tau>3. Along with Amarjit Budhiraja and Xuan Wang, we have also analyzed the emergence of the giant component in the Bohman-Frieze model, one of the most famous examples of how limited choice in the construction of a network can delay the emergence of the giant component.  For a rigorous analysis of such questions, one needs a wide variety of techniques, ranging from breadth first explorations of graphs, the barely subcritical regime, as well as the entrance boundary of the multiplicative coalescent.
  • MCMC algorithms in social networks: 

My Facebook friend network in 2009

Many social networks for example facebook) show a high degree of clustering, reflecting the natural property that if A is a friend of B and C then it is quite likely that B and C are friends as well. Such properties are reflected by the large number of triangles in the network. To statistically model such networks, social scientists typically use the exponential random graph model which is a probability measure on the space of graphs on the space of n vertices with a particular structure. To consider a simple example, for any graph G let E(G) and T(G) be the number of edges and triangles respectively in the graph. Now for fixed constants \beta_0, \beta_1 consider the probability measure:

\mathbb{P}(G)\propto \exp(\beta_0 E(G)+\beta_1 T(G))

If \beta_1> 0 then this gives higher mass to graphs with more triangles. Typically one needs MCMC techniques to generate such models and do infirence and estimation procedures with them. Understanding when these MCMC techniques converge has been one of the core problems in this area. Along with Allan Sly and Guy Bresler, Bhamidi derived conditions on the parameters arising in such models that guarantee that such MCMC algorithms mix quickly (as well as when it takes an exponentially long time for such algorithms to converge to the stationarity).