k-clique percolation and clustering in directed and weighted networks
Gergely Palla (1), Dániel Ábel (2), Imre Derényi (2), Illés Farkas (1), Péter Pollner (1), Tamás Vicsek (1,2)
(1) Statistical and Biological Physics Research Group, Hungarian Academy of Sciences
(2) Department of Biological Physics, Eötvös University, Hungary
March 2008, Louvain-la-Neuve
Vicsek group: Directed and weighted communities

Outline
Introduction
  The Clique Percolation Method (CPM)
  Phase transition in the Erdős-Rényi graph
Directed communities
  Relative in- and out-degree
  Directed CPM
  Results
Weighted communities
  Weights in the original CPM
  Weighted CPM
  Results

The Clique Percolation Method (CPM)
Definitions
  k-clique: a complete (fully connected) subgraph of k vertices.
  k-clique adjacency: two k-cliques are adjacent if they share k − 1 vertices, i.e., if they differ only in a single node.

CPM (continued)
Definition
  k-clique community: the union of k-cliques that can be reached from one to the other through a sequence of adjacent k-cliques.
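
The two definitions (k-clique adjacency and k-clique community) can be sketched in a few lines of Python. This is only a brute-force illustration for toy graphs, not the algorithm used by the authors' software; the function names `k_cliques` and `k_clique_communities` and the example graph are assumptions made for this sketch.

```python
from itertools import combinations

def k_cliques(adj, k):
    """Return all k-cliques of an undirected graph as frozensets.

    adj maps each node to the set of its neighbours.
    Brute force over all k-subsets: suitable for toy graphs only.
    """
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def k_clique_communities(adj, k):
    """Merge adjacent k-cliques (sharing k - 1 nodes) into communities."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())

# Hypothetical toy graph: two triangles sharing a link (0-1-2 and 1-2-3),
# plus a third triangle (3-4-5) that touches them only at node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

communities = k_clique_communities(adj, 3)
```

In this toy graph the first two triangles share two nodes and merge into one community, while the third triangle shares only node 3 with them and forms its own community, so node 3 belongs to both.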
Illustration: [figure not reproduced in this transcript], and the same at k = 4.

Advantages of the CPM
The main advantages of the CPM:
  Allows overlaps between the communities.
  The definition is based on the density of the links.
  It is local (no resolution limit).

The order parameters
The Erdős-Rényi graph: N nodes, every pair is independently linked with probability p.
A giant k-clique percolation cluster can be found if $p \geq p_c(k)$.
The order parameter of the phase transition is the size of the giant cluster:
  The number of nodes, $N^*$ → $\Phi \equiv N^*/N$,
  The number of k-cliques, $\mathcal{N}^*$ → $\Psi \equiv \mathcal{N}^*/\mathcal{N}$.
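
As a hedged illustration of the two order parameters, the sketch below computes Φ (the fraction of nodes in the largest cluster) and Ψ (the fraction of k-cliques in the largest cluster) from a list of clusters that is assumed to be already known. The function name `order_parameters` and the input layout are assumptions of this sketch, and "largest" is taken separately by node count for Φ and by clique count for Ψ.

```python
def order_parameters(clusters, n_nodes):
    """Return (Phi, Psi) for a set of k-clique percolation clusters.

    clusters: list of clusters, each a non-empty list of k-cliques
              (every k-clique given as a set of node labels).
    n_nodes:  total number of nodes N in the graph.
    """
    total_cliques = sum(len(cluster) for cluster in clusters)
    if total_cliques == 0:
        return 0.0, 0.0
    # N*: number of nodes covered by the cluster that covers the most nodes.
    largest_nodes = max(len(set().union(*cluster)) for cluster in clusters)
    # Number of k-cliques in the cluster that contains the most k-cliques.
    largest_cliques = max(len(cluster) for cluster in clusters)
    return largest_nodes / n_nodes, largest_cliques / total_cliques

# Hypothetical example: 6 nodes, three triangles grouped into two clusters.
clusters = [[{0, 1, 2}, {1, 2, 3}], [{3, 4, 5}]]
phi, psi = order_parameters(clusters, n_nodes=6)
```

Here the larger cluster covers 4 of the 6 nodes and holds 2 of the 3 triangles.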

Results
Numerical results: [Figure: the order parameters Φ and Ψ as functions of p/p_c(k) at k = 4, for N = 100, 200, 500, 1000, 2000, 5000.]
$p_c(k) = \frac{1}{[N(k-1)]^{1/(k-1)}}$

Directed links
Direction of the links:
  Direction of some kind of flow (e.g. information, energy).
  Asymmetrical relation (e.g. superior-inferior).
Out-hubs in communities represent "sources", whereas in-hubs correspond to "drains": [Figure labels: source/top, drain/bottom, interface.]

Relative in- and out-degree
We define the relative in-degree and relative out-degree of node i in community α as
$D^{\alpha}_{i,\mathrm{in}} \equiv \frac{d^{\alpha}_{i,\mathrm{in}}}{d^{\alpha}_{i,\mathrm{in}} + d^{\alpha}_{i,\mathrm{out}}}, \qquad D^{\alpha}_{i,\mathrm{out}} \equiv \frac{d^{\alpha}_{i,\mathrm{out}}}{d^{\alpha}_{i,\mathrm{in}} + d^{\alpha}_{i,\mathrm{out}}}.$
For weighted networks these can be replaced by the relative in-strength and relative out-strength:
$W^{\alpha}_{i,\mathrm{in}} \equiv \frac{w^{\alpha}_{i,\mathrm{in}}}{w^{\alpha}_{i,\mathrm{in}} + w^{\alpha}_{i,\mathrm{out}}}, \qquad W^{\alpha}_{i,\mathrm{out}} \equiv \frac{w^{\alpha}_{i,\mathrm{out}}}{w^{\alpha}_{i,\mathrm{in}} + w^{\alpha}_{i,\mathrm{out}}}.$

Directed k-cliques?
Comparing undirected and directed connections: [Figure: one undirected link between A and B next to its three directed counterparts.]
In case of k-cliques: $k(k-1)/2$ links → $3^{k(k-1)/2}$ possible configurations.
However, we would like the k-clique to have some kind of directionality as a whole as well.

Definition
A directed k-clique has to fulfil the following conditions:
In the absence of double links:
  Any directed link in the k-clique points from a node with a higher order (larger restricted out-degree) to a node with a lower order.
  The k-clique contains no directed loops.
  The restricted out-degree of each node in the k-clique is different.
If double links are present:
  It is possible to eliminate the double links in such a way that the single links fulfil the above conditions.
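
The definition can be checked by brute force on a small node set, as in the sketch below. For a complete subgraph without double links the three listed conditions coincide (a standard property of tournaments), so the sketch tests only whether all restricted out-degrees differ; double links are handled by trying every way of keeping one direction of each. The function name `is_directed_k_clique` and the input format are assumptions of this sketch, not the authors' implementation.

```python
from itertools import combinations, product

def is_directed_k_clique(nodes, links):
    """Check the directed k-clique conditions on a small node set.

    nodes: iterable of node labels.
    links: set of (source, target) pairs; a double link appears as both
           (u, v) and (v, u).
    """
    nodes = list(nodes)
    single, double = [], []
    for u, v in combinations(nodes, 2):
        forward, backward = (u, v) in links, (v, u) in links
        if forward and backward:
            double.append((u, v))
        elif forward:
            single.append((u, v))
        elif backward:
            single.append((v, u))
        else:
            return False  # missing link: not a complete subgraph

    # Try every way of eliminating the double links (keep one direction each).
    for flips in product((False, True), repeat=len(double)):
        oriented = single + [(v, u) if flip else (u, v)
                             for (u, v), flip in zip(double, flips)]
        out_degree = {n: 0 for n in nodes}  # restricted to this node set
        for source, _target in oriented:
            out_degree[source] += 1
        # All restricted out-degrees different: they must be 0, 1, ..., k-1.
        if len(set(out_degree.values())) == len(nodes):
            return True
    return False

ordered = is_directed_k_clique([0, 1, 2], {(0, 1), (0, 2), (1, 2)})
loop = is_directed_k_clique([0, 1, 2], {(0, 1), (1, 2), (2, 0)})
loop_with_double = is_directed_k_clique(
    [0, 1, 2], {(0, 1), (1, 0), (1, 2), (2, 0)})
```

The directed loop fails, but once the link between nodes 0 and 1 is doubled it passes, because keeping only the 1 → 0 direction removes the loop.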

Illustration
[Figure: four example cliques, panels a) to d), with nodes labelled by their restricted out-degree and each example classified by two questions: "contains double links?" and "directed k-clique?".]

Phase transition in the directed E-R graph
The directed E-R graph:
  N nodes,
  The N(N − 1) possible "places" for the directed links are filled independently with probability p.
Theoretical prediction of the critical point for the appearance of a giant directed k-clique percolation cluster:
$p_c^{\mathrm{theor}} = \frac{1}{[Nk(k-1)]^{1/(k-1)}}.$
Order parameters: Φ, Ψ (same as in the undirected case).
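
A minimal sketch of the setting on this slide: it evaluates the theoretical critical point $[Nk(k-1)]^{-1/(k-1)}$ and samples a directed E-R graph by filling each of the N(N − 1) places independently with probability p. The function names, the seed argument and the chosen values of N and k are assumptions made for illustration.

```python
import random

def pc_directed_theor(n, k):
    """Theoretical critical probability [N k (k - 1)]^(-1/(k - 1))."""
    return (n * k * (k - 1)) ** (-1.0 / (k - 1))

def directed_er_graph(n, p, seed=0):
    """Sample a directed E-R graph as a set of (source, target) links.

    Each of the n * (n - 1) ordered pairs of distinct nodes is filled
    independently with probability p.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < p}

n, k = 200, 4
p_c = pc_directed_theor(n, k)     # linking probability at the predicted threshold
links = directed_er_graph(n, p_c)
```

Scanning p around `p_c` and measuring Φ and Ψ on each sampled graph would be the natural next step for reproducing the kind of curves shown in the numerical results.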

Numerical results
[Figure, four panels: (a) Φ(p) and (b) Ψ(p) as functions of $p/p_c^{\mathrm{theor}}$ at k = 4 for system sizes from N = 25 to N = 1600; (c) χ(p) as a function of $p/p_c^{\mathrm{theor}}$ at k = 4; (d) $p_c^{\mathrm{num}}/p_c^{\mathrm{theor}}$ as a function of 1/N for k = 3, 4, 5, 6, with fits of the form $1 + cN^{-\alpha}$ giving α = 0.49(2), 0.49(1), 0.48(2), 0.45(2).]

Word association network
Local picture of the communities:
[Figure: network of word nodes coloured according to $W^{\alpha}_{\mathrm{out}}$ (colour scale from 0 to 1). Node labels, with the values shown in parentheses: PRESTIGE (1.0); SUCCESS (0.46); BRONZE (0.95), (1.0); BRASS (0.87); TARNISH (0.93); FAME (0.89); PROSPER (0.84); WEALTH (0.60); POVERTY (1.0); FORTUNE (0.56); OLYMPICS (0.79); METAL (0.22); POOR (0.28); GOLD (0.29), (0.0), (0.73), (0.15); LUXURY (0.91); MANSION (1.0); TREASURE (1.0); SILVER (0.39); COPPER (0.63); RICH (0.31); MEDAL (0.54); SUCCEED (1.0); MONEY (0.0), (0.06); VALUABLE (0.80); DIAMOND (0.25); BRACELET (0.80); LIMOUSINE (1.0); EXPENSIVE (0.14); NECKLACE (0.55); JEWELRY (0.43); PRICELESS (0.93); JEWEL (0.67); RING (0.13); EARRING (0.87); STONE (0.0); PEARL (0.81); SAPPHIRE (1.0); GEM (0.71); EMERALD (0.86); RUBY (0.60).]

Relative out-degree and number of hits
The number of hits in Google as a function of $W^{\alpha}_{i,\mathrm{out}}$:
[Figure: number of hits (logarithmic axis, 1e+05 to 1e+10) versus $W^{\alpha}_{i,\mathrm{out}}$ (0 to 1).]

Google's own web pages
Local picture of the communities:
[Figure: network of page nodes. Node labels, with the values shown in parentheses: enterprise/maps (0.83); enterprise/government (1.0); enterprise addurl (0.83); enterprise/whygoogle.html (0.50); enterprise/news_events.html (0.43); enterprise/gsa/onebox.html (0.64); enterprise/support.html (0.43); enterprise/gep (0.43); services/websearch.html (1.0); enterprise/apps (0.91); enterprise (0.52); enterprise/customers.html (0.43); enterprise/gsa (0.43); psearch (0.17); privacy.html (0.17); accounts/Login (0.83); help/faq_accounts.html (0.54); accounts jobs newsalerts (0.56); jobs/working.html (0.50); jobs/international.html (0.67); (https) accounts (0.60); alerts (0.80); accounts/ServiceLogin (0.83); accounts (0.89); about.html (0.33), (0.06); sitemap.html (1.0), (0.83); jobs (0.50); terms_of_service.html (0.18); accounts/NewAccount (0.67); enterprise/mini (0.39); www.google.com (0.0), (0.08), (0.0); accounts/ForgotPasswd (0.67).]

Comparing overlaps
Membership number as a function of $D^{\alpha}_{i,\mathrm{out}}$:
[Figure: average number of modules of a node versus $d_{\mathrm{out}}/(d_{\mathrm{in}} + d_{\mathrm{out}})$ for four networks: Google's own pages, word associations, e-mail network, yeast transcription regulatory network.]
Community overlaps:
  word association net, Google's web pages → in-hubs,
  e-mail net, transcription regulatory network → out-hubs.

Link weights in the original CPM
In the original CPM we can take into account the weights by ignoring links weaker than a certain threshold $w^*$.
Changing $w^*$ and k is similar to changing the resolution in a microscope.
Optimal k-clique size and $w^*$
  Where the community structure is as highly structured as possible: just below the critical point of the appearance of a giant k-clique community.
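
The thresholding step of the original CPM amounts to a simple filter, sketched below under the assumption that link weights are stored in a dictionary keyed by node pairs; the function name `threshold_links` and the example weights are hypothetical.

```python
def threshold_links(weights, w_star):
    """Return the links that survive the weight threshold.

    weights: dict mapping a node pair (u, v) to the link weight.
    w_star:  threshold; links weaker than w_star are ignored.
    """
    return {pair for pair, w in weights.items() if w >= w_star}

# Hypothetical weighted graph: a strong triangle plus one weak link.
weights = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.7, (2, 3): 0.2}
kept = threshold_links(weights, w_star=0.5)
```

The surviving links are then treated as an unweighted graph and passed to the usual k-clique search; raising `w_star` or k makes the resulting communities smaller and more cohesive.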

k-clique intensity
The intensity I of a subgraph is defined as the geometric mean of its link weights.
For a k-clique C:
$I(C) = \left( \prod_{i<j;\; i,j \in C} w_{ij} \right)^{2/[k(k-1)]}$
Weighted k-clique
  A k-clique with an intensity greater than or equal to a given intensity threshold $I^*$.

Percolation transition in the E-R graph
A weighted E-R graph:
  N nodes,
  every pair is linked independently with uniform probability p,
  each link is assigned a weight chosen randomly from a uniform distribution on the (0, 1] interval.
The critical linking probability is a function of the intensity threshold.
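
The intensity of a k-clique and the weighted k-clique criterion can be sketched as follows. The function names and the weight container (a dictionary keyed by `frozenset` node pairs) are assumptions of this sketch; the weights are assumed to be positive.

```python
import math
from itertools import combinations

def intensity(clique, weights):
    """Geometric mean of the link weights inside a k-clique.

    clique:  iterable of k node labels forming a complete subgraph.
    weights: dict mapping frozenset({u, v}) to the weight of link u-v.
    """
    link_weights = [weights[frozenset(pair)]
                    for pair in combinations(clique, 2)]
    # There are k(k-1)/2 links, so the exponent equals 2 / [k(k-1)].
    return math.prod(link_weights) ** (1.0 / len(link_weights))

def is_weighted_k_clique(clique, weights, i_star):
    """True if the clique intensity reaches the threshold i_star."""
    return intensity(clique, weights) >= i_star

# Hypothetical triangle with one weak link.
weights = {frozenset({0, 1}): 1.0,
           frozenset({0, 2}): 0.5,
           frozenset({1, 2}): 0.25}
triangle_intensity = intensity((0, 1, 2), weights)   # cube root of 0.125
accepted = is_weighted_k_clique((0, 1, 2), weights, i_star=0.4)
```

In this example the triangle is accepted at a threshold of 0.4 even though one of its links has weight 0.25, which a plain link-weight threshold of 0.4 would have removed. With a threshold of 0 every k-clique with positive weights is accepted.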
At I = 0 we recover $p_c(I = 0) = [N(k-1)]^{-1/(k-1)}$.

Results
[Figure not reproduced in this transcript.]

NYSE graph
New York Stock Exchange graph:
  We studied the pre-computed stock correlation matrix containing the averaged correlation between the daily logarithmic returns.
  The correlation coefficients can be used as link weights.
  We kept only the strongest 3% of the links.

Summary
Directed communities:
  Relative in- and out-degree,
  Directed k-cliques.
Weighted communities:
  k-clique intensity.
Publications:
  New Journal of Physics 9, 180 (2007),
  New Journal of Physics 9, 186 (2007).
Downloadable community finding software: http://cfinder.org