Efficient Algorithms towards Network Intervention

Research suggests that social relationships have substantial impacts on individuals' health outcomes. Network intervention, through careful planning, can assist a network of users to build healthy relationships. However, most previous work is not designed to assist such planning by carefully examining and improving multiple network characteristics. In this paper, we propose and evaluate algorithms that facilitate network intervention planning through simultaneous optimization of network degree, closeness, betweenness, and local clustering coefficient, under scenarios involving Network Intervention with Limited Degradation - for Single target (NILD-S) and Network Intervention with Limited Degradation - for Multiple targets (NILD-M). We prove that NILD-S and NILD-M are NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP. We propose the Candidate Re-selection with Preserved Dependency (CRPD) algorithm for NILD-S, and the Objective-aware Intervention edge Selection and Adjustment (OISA) algorithm for NILD-M. Various pruning strategies are designed to boost the efficiency of the proposed algorithms. Extensive experiments on various real social networks collected from public schools and Web and an empirical study are conducted to show that CRPD and OISA outperform the baselines in both efficiency and effectiveness.


INTRODUCTION
WWW '20, April 20-24, 2020, Taipei, Taiwan. © 2020 Association for Computing Machinery.
Previous studies have shown the importance and strengths of social relationships in influencing individual behaviors. Strong social relationships have been shown to facilitate the dissemination of information, encourage innovations, and promote positive behavior [44]. Also, social relationships surrounding individuals have substantial impacts on individuals' mental and physical health [12,17]. For example, studies in Science and American Sociological Review indicate that socially isolated individuals are more inclined to have mental health problems and physical diseases, ranging from psychiatric disorders to tuberculosis, suicide, and accidents [13,17].
To alleviate the problems that arise with social isolation, two classes of intervention strategies may be adopted: 1) personal intervention, which guides individuals to understand their situations, attitudes, and capacities through counseling [9]; and 2) network intervention, which emphasizes the need to strengthen individuals' social networks to accelerate behavior changes that can lead to desirable outcomes at the individual, community, or organizational level [12,17]. As an example, network intervention that helps establish new social links is crucial for individuals with autism spectrum disorders [6]. Those new links may be effectively introduced by curative groupings and network meetings, i.e., events that encourage them to socialize more frequently [11,14].
In order for network intervention to improve individual health outcomes, it is crucial to add social links in ways that promote network characteristics found to be related to positive outcomes. Thus, an important question to ask is: given an individual, what properties define the strength of the individual's network? Previous studies point out several possibilities: 1) Degree indicates the number of established friendships. An individual with a large degree is more popular and has more opportunities to establish self-identity and social skills [43]. Thus, a large-degree individual is less inclined to be socially isolated and have mental health problems [13]. 2) Closeness indicates the inverse of the average social distance from an individual to all others in the network. An individual with great closeness is typically located in the center of the network and tends to perceive a lower level of stress [18]. 3) Betweenness indicates the tendency of an individual to fall on the shortest path between pairs of other individuals. An individual with large betweenness tends to occupy brokerage positions in the network and have more knowledge of events happening in the network. Thus, an individual with large betweenness may also perceive social relationships more accurately [24]. 4) Finally, Local clustering coefficient (LCC) indicates the diversity of relationships within an individual's ego network [12]. Specifically, an individual with a small LCC tends to have diverse relationships since her friends are less likely to be acquainted with each other [12]. The perspective that individuals with small LCCs are less likely to have mental health problems has been postulated in the functional specificity theory, which advocates the need for having different support groups for distinct functions, e.g., by obtaining attachment from families or friends, social integration from social activity groups, and guidance from colleagues [50].
In other words, individuals with large LCCs tend not to build a diverse social network by putting all their relationship eggs in a few baskets, and tend to have depressive symptoms and neurological illnesses [12]. This relationship has been validated in studies with participants across various cultures and ages. In a study involving 173 retired US elders, higher LCC is found to be associated with lower life satisfaction, self-esteem, happiness, and higher depression [52]. In another study involving 2844 high school students, higher LCC again is associated with lower self-esteem [42].
In practice, specialists and practitioners may not have the time or resources to provide frequent and ongoing relationship recommendations to every individual. Also, recommendations made by people are susceptible to subjective biases. As such, supplemental, objective information from automated network planning algorithms that can simultaneously optimize multiple network characteristics is helpful and valuable. This paper aims to develop novel algorithms that recommend suitable intervention links based on multiple potentially health-enhancing network characteristics. However, adding social links does not always improve every network characteristic. Of the characteristics noted earlier, the degree of each individual can be improved by adding more edges, in which case the closeness and betweenness of the network are also enhanced. However, improving the LCC is more challenging, as the LCCs of individuals and nearby friends may not always improve, and can even deteriorate, when more edges are added.
Selecting good intervention links based on the LCC is further deterred by other challenges. A new link established for a targeted individual may increase the LCCs of her friends, when those new friends are acquainted with each other. Figure 1 presents an example showing the side-effects of adding improper new links on the LCC. Figure 1(a) shows a social network in which the nodes are annotated by their initial LCCs, and Figure 1(b) presents the network after adding an edge from B to G. The LCC of B is effectively reduced to 0.5, but the LCC of C unfortunately grows to 0.66. The example shows that heuristic or uninformed selection of intervention links by specialists and practitioners may lead to undesirable changes in the LCC; even worse, the undesirable changes may happen to many nearby individuals when the network size is large. Moreover, even though a simple way to decrease LCC is to remove some existing social links, this approach is not considered because removal of existing social links undermines established social support [41].
In this paper, we propose and test several algorithms that can simultaneously optimize network LCC, closeness, betweenness, and degree. We first formulate a new problem, namely, Network Intervention with Limited Degradation - for Single target (NILD-S). Given a budget of k, a threshold τ, and a target t, NILD-S finds the set F of k intervention edges to minimize the LCC of t, such that the side effect, in terms of the increment in anyone's LCC, cannot exceed τ. NILD-S also ensures that the degree, betweenness, and closeness of t exceed given thresholds. We propose the Candidate Re-selection with Preserved Dependency (CRPD) algorithm, which first obtains an initial solution by extracting the individuals with the smallest degrees, and improves the initial solution by re-examining the candidates filtered out by nodes involved in the solution. Note that CRPD selects edges according to multiple characteristics. We prove that NILD-S is NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP. Nevertheless, we prove that CRPD can find the optimal solution for threshold graphs, which are very similar to many well-known online social networks with regard to measurements such as degree distribution, diameter, and clustering coefficient [27,32,46].
Finally, we seek to extend NILD-S to simultaneously improve the LCCs of multiple individuals while ensuring other network characteristics, including betweenness, closeness, and degree. To do so, we formulate the Network Intervention with Limited Degradation - for Multiple targets (NILD-M) problem to jointly minimize the LCCs of multiple targets. Given the aforementioned k, τ, and the set of targets T, NILD-M finds the set F of k intervention edges such that the maximal LCC of individuals in T is minimized, while the LCC increment of any person does not exceed τ. NILD-M also ensures that the degree, betweenness, and closeness of all targets exceed their minimum thresholds. We prove that NILD-M is NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP. To solve NILD-M, we design Objective-aware Intervention edge Selection and Adjustment (OISA), which 1) carefully examines both the LCC of each terminal and the network structure to ensure the constraint of τ, 2) explores the idea of optionality to improve the solution quality, and 3) derives the lower bound on the number of required edges and the LCC upper bounds to effectively reduce computational time. Also, we evaluate OISA via an empirical study on four psychological outcomes: anxiety, perceived stress, positive and negative emotions, and psychological well-being.
The contributions are summarized as follows.
• Previous research has suggested the use of network intervention in improving health outcomes. However, although new acquaintances have the potential to enlarge the support network, there is no effective planning tool for practitioners to select suitable intervention edges. We formulate NILD-S and NILD-M to address this critical need for identifying suitable intervention links for a single target and a group of targets, while considering multiple network characteristics.
• We prove that NILD-S is NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP. We propose CRPD and prove that CRPD obtains the optimal solution for threshold graphs.
• We prove that NILD-M is NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP and design OISA for NILD-M.
• Experiments on real datasets show that the proposed CRPD and OISA efficiently find near-optimal solutions for NILD-S and NILD-M and outperform the baselines. Also, an empirical study assessed by clinical psychologists and professors in the field shows that the network intervention improves the self-reported health outcomes of participants, and the effects are statistically significant compared with a control group.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 formulates NILD-S, analyzes its theoretical hardness, and proposes CRPD. Section 4 formulates NILD-M and analyzes its theoretical hardness. Section 5 proposes OISA. Section 6 reports the experiments. Finally, Section 7 concludes the paper.

RELATED WORK
The theory of network intervention has been studied in the fields of psychology, behavioral health, and education for lowering negative emotions by enhancing social integration, support, engagement, and attachment [3]. It has also been adopted for family therapy and bullying avoidance [14]. In education, network intervention has been implemented to facilitate knowledge dissemination among students, thereby improving student learning [45]. Under current practice, new intervention links are typically selected heuristically by practitioners [23,35]. However, it is very challenging to consider multiple persons simultaneously without deteriorating the status of surrounding individuals. Thus, it would be worthwhile to develop algorithms for this important need.
In the field of social network analysis, researchers have paid considerable attention to efficiently finding the number of triangles [25,28] and selecting a group of individuals with the maximum or minimum number of triangles [34]. Notice that the abovementioned research mostly focuses on measuring structural properties of nodes in static or dynamic networks, with no intention to tailor and change the network graph. Recently, a new line of research in network science has emerged with the objective of revising a network graph according to specific network characteristics. These include maximizing the closeness centrality, betweenness centrality and influence score, minimizing the diameter, and enhancing the network robustness [10,30,51]. However, these algorithms do not include LCC as a target network characteristic for intervention purposes. Importantly, none of these algorithms was designed to optimize multiple network characteristics simultaneously.
Recently, owing to the success of online social networks, reported cases of social network mental disorders have increased, motivating new collaborations between data scientists and mental health practitioners. New machine learning frameworks have been shown to be helpful in identifying patients tending to be vulnerable, and even have clinical levels of negative emotions and unhealthy living [31,36,37,38]. However, those are not designed for network intervention, which actually changes the network graph. Finally, link prediction [1,4,7,15,33,48,53] has been widely studied. Existing algorithms usually recommend individuals sharing many common friends and similar interests to become friends. However, they are not designed for network intervention, which

INTERVENTION FOR A SINGLE TARGET
In this section, we first reduce the Local Clustering Coefficient (LCC) of a targeted individual (denoted as $t$) by selecting a set of people from the social network to become friends with $t$. Given a social network $G = (V, E)$ (or $G$ for short), where each node $v \in V$ denotes an individual, and each edge $(i, j) \in E$ represents the social link between individuals $i$ and $j$, the ego network of an individual $v$ is the subgraph induced by $v$ and its neighbors $N_G(v)$. The LCC of a node $v$ in $G$, $LCC_G(v)$, is defined as the number of edges between the nodes in $N_G(v)$ divided by the maximum number of possible edges among the nodes in $N_G(v)$,

$$LCC_G(v) = \frac{|\{(i, j) \in E : i, j \in N_G(v)\}|}{C(d_G(v), 2)},$$

where $d_G(v) = |N_G(v)|$ and $C(d_G(v), 2)$ is the number of combinations to choose two items from $d_G(v)$ ones. Adding social links may increase the LCC of other nodes not incident to any new edge. Therefore, the increment of LCC for healthy people also needs to be carefully controlled. In addition to LCC, it is also important to ensure that the degree, betweenness, and closeness are sufficiently large. Therefore, we formulate the Network Intervention with Limited Degradation - for Single target (NILD-S) problem as follows.
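The LCC definition above can be illustrated with a minimal Python sketch. The adjacency-set representation, the helper names `lcc` and `with_edge`, the four-node toy graph, and the convention that a node with fewer than two neighbors has LCC 0 are all assumptions made for illustration; none of them come from the paper.

```python
from itertools import combinations


def lcc(adj, v):
    """LCC of v: edges among v's neighbors divided by C(d, 2)."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for i, j in combinations(nbrs, 2) if j in adj[i])
    return links / (d * (d - 1) / 2)


def with_edge(adj, i, j):
    """Return a copy of adj with the undirected edge (i, j) added."""
    new = {v: set(ns) for v, ns in adj.items()}
    new[i].add(j)
    new[j].add(i)
    return new


# Hypothetical toy graph: c is adjacent to a, b and x; a is adjacent to x.
before = {
    "a": {"c", "x"},
    "b": {"c"},
    "c": {"a", "b", "x"},
    "x": {"a", "c"},
}
after = with_edge(before, "a", "b")

for node in sorted(before):
    print(node, round(lcc(before, node), 2), "->", round(lcc(after, node), 2))
```

In this toy graph, the new edge (a, b) lowers the LCC of a from 1 to 2/3, but raises the LCC of c from 1/3 to 2/3 even though c is not an endpoint of the new edge. This is the kind of side effect that the degradation threshold is meant to limit.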

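Definition 1. Given a social network $G = (V, E)$ (or $G$ for short), the target $t$, the number $k$ of intervention edges to be added, the LCC degradation threshold $\tau$, and the lower bounds $\omega_b$, $\omega_c$, and $\omega_d$ on betweenness, closeness, and degree, NILD-S minimizes the LCC of $t$ by adding a set $F$ of $k$ edges incident to $t$, such that in the new network $\overline{G} = (V, E \cup F)$, 1) $LCC_{\overline{G}}(v) - LCC_G(v) \le \tau$ for every $v \in V$, 2) $b_{\overline{G}}(t) \ge \omega_b$, 3) $c_{\overline{G}}(t) \ge \omega_c$, and 4) $d_{\overline{G}}(t) \ge \omega_d$, where $b_{\overline{G}}(t)$, $c_{\overline{G}}(t)$, and $d_{\overline{G}}(t)$ are the betweenness, closeness, and degree of $t$ in $\overline{G}$.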
NILD-S is computationally expensive. We prove that it is NP-hard and inapproximable within any ratio, i.e., there is no approximation algorithm with a finite ratio for NILD-S unless P=NP. However, we show later that NILD-S is tractable for threshold graphs, which share similar graph properties with many well-known online social networks, e.g., LiveJournal, Flickr, and Youtube [27,32,46].
Theorem 1. NILD-S is NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP.

Proof. We prove the NP-hardness by a reduction from the Maximum Independent Set (MIS) problem on triangle-free graphs (i.e., graphs without any three nodes forming a triangle) [26]. Given a triangle-free graph $G_M = (V_M, E_M)$, MIS is to find the largest subset of nodes $S_M \subseteq V_M$, such that every node in $S_M$ is not adjacent to any other node in $S_M$. For each instance of MIS, we construct an instance $G = (V, E)$ of NILD-S as follows. For each node $v' \in V_M$ and edge $(i', j') \in E_M$, we create the corresponding node $v \in V$ and edge $(i, j) \in E$, respectively. Also, we add a node $t$ as the targeted node and set $\tau$, $\omega_b$, $\omega_c$, and $\omega_d$ to 0. In the following, we prove that $G_M = (V_M, E_M)$ has an independent set $S_M$ of size $k$ in MIS if and only if the LCC of $t$ in $G = (V, E)$ remains 0 after adding $(t, v)$ for every $v' \in S_M$. We first prove the sufficient condition. If $G_M = (V_M, E_M)$ has an independent set $S_M$ of size $k$, then there is no edge between any two nodes in $S_M$. Thus, if we add an edge $(v, t)$ for each node $v' \in S_M$ in $G = (V, E)$, the LCC of $t$ is still 0. We then prove the necessary condition. If there is a set $S$ of $k$ nodes such that the LCC of $t$ remains 0 after adding $(v, t)$ to $E$ for every $v \in S$, then there exists no edge among the neighbors of $t$, i.e., $S$. Therefore, $S_M$ with the corresponding nodes in $S$ is an independent set.
Next, we prove by contradiction that NILD-S cannot be approximated within any ratio in polynomial time unless P=NP. Assume that there exists a polynomial-time algorithm that returns a solution with LCC lcc and approximates NILD-S within a finite ratio ro for a triangle-free G = (V, E), i.e., the LCC of the optimal solution is at least lcc/ro. If lcc = 0, there is an independent set of size k. If lcc > 0, the LCC of the optimal solution is at least lcc/ro > 0, and there is no k-node independent set. Thus, the approximation algorithm for NILD-S can solve MIS in polynomial time by examining lcc, which is impossible unless P=NP because MIS is NP-hard [26].
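The construction used in the reduction can be checked on a small instance with the sketch below. The helper names, the adjacency-set representation, and the four-node path used as the triangle-free MIS instance are assumptions for illustration. The sketch only checks the LCC of t, not the other constraints.

```python
from itertools import combinations


def lcc(adj, v):
    """LCC of v, taken as 0 when v has fewer than two neighbors (assumed convention)."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for i, j in combinations(nbrs, 2) if j in adj[i])
    return links / (d * (d - 1) / 2)


def build_instance(mis_adj, target="t"):
    """Copy the MIS graph and add an isolated target node."""
    adj = {v: set(ns) for v, ns in mis_adj.items()}
    adj[target] = set()
    return adj


def lcc_of_target_after(adj, target, chosen):
    """LCC of the target after connecting it to every node in chosen."""
    new = {v: set(ns) for v, ns in adj.items()}
    for u in chosen:
        new[target].add(u)
        new[u].add(target)
    return lcc(new, target)


def is_independent(adj, nodes):
    """True if no two of the given nodes are adjacent."""
    return all(j not in adj[i] for i, j in combinations(nodes, 2))


# Hypothetical triangle-free MIS instance: the path 1 - 2 - 3 - 4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
instance = build_instance(path)

for chosen in combinations(path, 2):
    stays_zero = lcc_of_target_after(instance, "t", chosen) == 0.0
    print(chosen, "independent:", is_independent(path, chosen), "LCC(t) = 0:", stays_zero)
```

For every pair of nodes in this toy instance, the LCC of t stays 0 exactly when the pair is an independent set, which is the equivalence the proof relies on.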

The CRPD Algorithm
For NILD-S, a simple approach is to iteratively choose a node u, add (t, u) into F, and eliminate (i.e., no longer regard as a candidate) every neighbor r of u if adding both (t, u) and (t, r) into F would increase the LCC of any node by more than τ (called the LCC degradation constraint). However, this approach does not carefully examine the structure among the neighbors of t. The selection of u is crucial, because it may become difficult to choose its neighbors for connection to t later due to the LCC degradation constraint. Therefore, a simple baseline is to extract the u with the smallest degree and add (t, u) to F, because such a u tends to result in the least number of neighbors removed from the pool of candidates for connecting to t. The baseline removes those neighbors r of u for which also adding (t, r) would increase the LCC of any individual by more than τ. However, Example 1 indicates that a good candidate r may be improperly removed due to a small LCC.
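A simplified version of the smallest-degree baseline can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it enforces only the LCC degradation constraint (the bounds on betweenness, closeness, and degree are ignored), it re-tests every remaining candidate in each round instead of maintaining an explicit removal list, and the toy graph and helper names are hypothetical.

```python
from itertools import combinations


def lcc(adj, v):
    """LCC of v, taken as 0 when v has fewer than two neighbors (assumed convention)."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for i, j in combinations(nbrs, 2) if j in adj[i])
    return links / (d * (d - 1) / 2)


def with_edge(adj, i, j):
    """Return a copy of adj with the undirected edge (i, j) added."""
    new = {v: set(ns) for v, ns in adj.items()}
    new[i].add(j)
    new[j].add(i)
    return new


def violates_tau(original, modified, tau):
    """True if some node's LCC grew by more than tau relative to the original graph."""
    return any(lcc(modified, v) - lcc(original, v) > tau + 1e-12 for v in original)


def smallest_degree_baseline(adj, t, k, tau):
    """Greedily connect t to the feasible candidate with the smallest original degree."""
    current = {v: set(ns) for v, ns in adj.items()}
    chosen = []
    while len(chosen) < k:
        feasible = [
            u for u in sorted(current)
            if u != t and u not in current[t]
            and not violates_tau(adj, with_edge(current, t, u), tau)
        ]
        if not feasible:
            break  # the budget cannot be filled without violating the constraint
        u = min(feasible, key=lambda x: len(adj[x]))
        current = with_edge(current, t, u)
        chosen.append((t, u))
    return chosen, current


# Hypothetical toy graph: a path t - a - b - c - d plus a separate edge e - f.
g = {
    "t": {"a"}, "a": {"t", "b"}, "b": {"a", "c"},
    "c": {"b", "d"}, "d": {"c"}, "e": {"f"}, "f": {"e"},
}
edges, final = smallest_degree_baseline(g, "t", k=2, tau=0.0)
print(edges, lcc(final, "t"))
```

On this toy graph with τ = 0, node b is never feasible because it is adjacent to a neighbor of t, and c becomes infeasible once d has been chosen, which mirrors how an early choice can remove later candidates.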

Example 1. Figure 2 shows an example of NILD-S with 16 nodes, where $t = v_1$, $k = 3$, $\tau = 0.05$, $\omega_b = 0.5$, $\omega_c = 0.5$, and $\omega_d = 4$. Note that all edges in Figure 2 are edges in $E$ regardless of their colors. The baseline first selects $v_5$ (i.e., adding $(v_1, v_5)$ to $F$) since it has the smallest degree among all nodes not connected to $v_1$. Then, for the neighbor $v_4$ of $v_5$, adding $(v_1, v_4)$ to $F = \{(v_1, v_5)\}$ does not increase the LCC of any node by more than $\tau$. Thus, $v_4$ is still a valid candidate. However, after choosing $v_5$, the baseline excludes $v_6$ and $v_8$.
Motivated by Example 1, we propose the Candidate Re-selection with Preserved Dependency (CRPD) algorithm for NILD-S. CRPD includes two components: 1) Removed Node Re-selection Strategy, and 2) Multi-measurement Integration Selection Strategy, as follows. A pseudocode of CRPD is shown in Algorithm 1.
Algorithm 1 (excerpt):
    while |F| < k do
        Choose u from C according to d_G(u) and Eq. 2
        if ∃ i ∈ C s.t. adding (t, i) violates the τ constraint then
            ...
Removed Node Re-selection (RNR) Strategy. To avoid missing good candidates when processing each node $u$ in the above baseline, CRPD first extracts $R$ during the process, where each $r$ in $R$ is a neighbor of $u$ removed by the baseline due to the LCC degradation constraint, i.e., including both $(t, u)$ and $(t, r)$ in $F$ increases some node's LCC by more than $\tau$. CRPD improves the above baseline by conducting a deeper exploration that tries to replace $(t, u)$ by $(t, r_m)$, where $r_m$ is the node with the minimum degree in $R$, if selecting $(t, r_m)$ instead removes fewer neighbors and obtains a better solution later. In Example 1, adding $(v_1, v_5)$ to $F$ removes its neighbors $v_6$ and $v_8$ because including $(v_1, v_6)$ or $(v_1, v_8)$ together with $(v_1, v_5)$ increases the LCC of $v_6$ or $v_8$ by more than $\tau$, respectively. In contrast, adding $(v_1, v_6)$ (instead of $(v_1, v_5)$) to $F$ only removes $v_5$ and obtains a better solution than that of the baseline. With the above deeper inspection, we prove later that CRPD can find the optimal solution of NILD-S in threshold graphs.

Multi-measurement Integration Selection Strategy (MISS).
To address ω d , the degree of t can be examined according to k + d G (t) when more edges are included. However, when the betweenness and closeness of t are smaller than ω b and ω c , respectively, CRPD selects u as follows to improve the betweenness and closeness of t when multiple nodes are available to be chosen as u.
where b G (u) and c G (u) denote the betweenness and closeness of u, and w b and w c are the weights of betweenness and closeness in selecting u. w b (or w c ) is derived by computing the difference of t's betweenness and ω b (or t's closeness and ω c ) when t's betweenness (or closeness) is smaller than ω b (or ω c ). Otherwise, w b (or w c ) is set as 0.
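The selection rule described here can be sketched in Python as follows. The weighted-sum score `w_b * b(u) + w_c * c(u)` is an assumption about the form of Eq. 2, which is not reproduced in this text; the function names and the centrality values are hypothetical, and the centralities are assumed to be precomputed.

```python
def miss_weights(b_t, c_t, omega_b, omega_c):
    """Weights as described in the text: the gap to each lower bound, or 0 if the bound is met."""
    w_b = max(0.0, omega_b - b_t)
    w_c = max(0.0, omega_c - c_t)
    return w_b, w_c


def miss_select(candidates, betweenness, closeness, w_b, w_c):
    """Pick the candidate with the largest weighted score (assumed form of the rule)."""
    return max(candidates, key=lambda u: w_b * betweenness[u] + w_c * closeness[u])


# Hypothetical precomputed centralities for three candidates.
betweenness = {"u1": 0.10, "u2": 0.40, "u3": 0.05}
closeness = {"u1": 0.60, "u2": 0.30, "u3": 0.70}

# The target is short on betweenness only, so only betweenness drives the choice.
w_b, w_c = miss_weights(b_t=0.2, c_t=0.6, omega_b=0.5, omega_c=0.5)
print(w_b, w_c, miss_select(["u1", "u2", "u3"], betweenness, closeness, w_b, w_c))
```

When both bounds are already met, both weights are 0 and every candidate scores the same, so in this sketch the choice falls back to the order of the candidate list.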

Solution Quality in Threshold Graph
We prove that CRPD obtains the optimal solution for threshold graphs, which are similar to many well-known online social networks in terms of important properties like the scale-free degree distribution, short diameter, and clustering coefficient [27,32,46].
Theorem 2. The CRPD algorithm can find an optimal solution of NILD-S when $G = (V, E)$ is a threshold graph, and the running time of CRPD is $O(|E| \hat{d} + k \hat{d} |V|)$, where $\hat{d}$ is the maximum degree in $G = (V, E)$.
With the property of threshold graphs, we prove that CRPD always finds a feasible solution, and the obtained feasible solution is one of the optimal ones in Appendix A.1.

INTERVENTION FOR MULTIPLE TARGETS
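We formulate the Network Intervention with Limited Degradation - for Multiple targets (NILD-M) problem and show its NP-hardness.
Definition 3. Given a social network $G = (V, E)$ (or $G$ for short), the number $k$ of intervention edges to be added, the LCC degradation threshold $\tau$, the lower bounds $\omega_b$, $\omega_c$, and $\omega_d$ on betweenness, closeness, and degree, and a set of targeted individuals $T$, the NILD-M problem minimizes the maximal LCC among all nodes in $T$, i.e., $\max_{t \in T} LCC_{\overline{G}}(t)$, by selecting a set $F$ of $k$ intervention edges, such that in the new network $\overline{G} = (V, E \cup F)$, 1) $LCC_{\overline{G}}(v) - LCC_G(v) \le \tau$ for every $v \in V$, 2) $b_{\overline{G}}(t) \ge \omega_b$, 3) $c_{\overline{G}}(t) \ge \omega_c$, and 4) $d_{\overline{G}}(t) \ge \omega_d$ for every $t \in T$, where $b_{\overline{G}}(t)$, $c_{\overline{G}}(t)$, and $d_{\overline{G}}(t)$ are the betweenness, closeness, and degree of $t$ in $\overline{G}$.
Corollary 1. NILD-M is NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP.
Corollary 1 follows from Theorem 1, since NILD-S, the special case of NILD-M when $|T| = 1$, is NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP.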

THE OISA ALGORITHM
A naïve approach for NILD-M is to exhaustively search every k-edge set spanning the nodes in T. However, this approach is not scalable, as shown in Section 6. In the following, we first present two baseline heuristics for NILD-M, Budget Utility Maximization (BUM) and Surrounding Impact Minimization (SIM).
Budget Utility Maximization (BUM). To intervene on the maximal number of individuals within T, BUM repeatedly selects the node u having the largest LCC without violating any constraint and connects u to the node m with the largest LCC in T until k edges are selected.
Surrounding Impact Minimization (SIM). Without considering the proximity between m and u, adding (m, u) sometimes increases the LCCs of their common neighbors. To avoid this situation, SIM chooses the u with the maximum number of hops from m in T because adding (m, u) is then less inclined to change the LCCs of other neighboring nodes.
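One selection step of each heuristic can be sketched as follows. This is a simplified illustration under stated assumptions: the constraint checks are omitted, u is drawn from all nodes not yet adjacent to m, SIM only considers nodes reachable from m, ties are broken by sorted node order, and the toy graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def lcc(adj, v):
    """LCC of v, taken as 0 when v has fewer than two neighbors (assumed convention)."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for i, j in combinations(nbrs, 2) if j in adj[i])
    return links / (d * (d - 1) / 2)


def hop_distances(adj, source):
    """Breadth-first hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def pick_m(adj, targets):
    """Target with the largest LCC (first in sorted order on ties)."""
    return max(sorted(targets), key=lambda v: lcc(adj, v))


def bum_pick(adj, targets):
    """BUM step: connect m to the non-adjacent node with the largest LCC."""
    m = pick_m(adj, targets)
    candidates = [u for u in sorted(adj) if u != m and u not in adj[m]]
    return m, max(candidates, key=lambda v: lcc(adj, v))


def sim_pick(adj, targets):
    """SIM step: connect m to the reachable non-adjacent node farthest from m."""
    m = pick_m(adj, targets)
    dist = hop_distances(adj, m)
    candidates = [u for u in sorted(dist) if u != m and u not in adj[m]]
    return m, max(candidates, key=lambda v: dist[v])


# Hypothetical toy graph: a triangle a-b-c with a tail c - d - e.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(bum_pick(g, {"a", "b"}), sim_pick(g, {"a", "b"}))
```

Both functions assume that m has at least one non-adjacent candidate; a fuller version would also have to test each candidate edge against τ and the centrality bounds before accepting it.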
In summary, BUM carefully evaluates the LCC of $u$ but ignores the structural properties, whereas SIM focuses on the distance between $m$ and $u$ but overlooks the LCC. Note that BUM and SIM are also equipped with MISS to ensure that they obtain feasible solutions. Example 2 indicates that their solutions are far from the optimal solution. In contrast, the maximal LCC in the optimal solution is 0.66, acquired by adding the dotted blue edges into $F$. For each node $v$ whose $LCC_{\overline{G}}(v)$ is not equal to $LCC_G(v)$, its $LCC_{\overline{G}}(v)$ is labeled aside in red. The optimal solution effectively lowers the maximal LCC by 34% from BUM and SIM without increasing any node's LCC.
Motivated by the strengths and pitfalls of BUM and SIM, we propose Objective-aware Intervention Edge Selection and Adjustment (OISA) to jointly consider the LCCs and the network structure with three ideas: 1) Expected Objective Reaching Exploration (EORE), 2) Poor Optionality Node First (PONF), and 3) Acceleration of LCC Calculation (ALC). EORE finds the minimum number of intervention edges required to achieve any targeted LCC for each candidate solution and checks whether it can meet the budget constraint k. Given a targeted LCC, PONF carefully adds intervention edges to avoid seriously increasing LCCs for some individuals. A pseudocode of OISA is shown in Algorithm 2.

Expected Objective Reaching Exploration
EORE carefully examines the correlation between the LCC reduction and the number of intervention edges. Lemma 1 first derives the minimum number of required intervention edges $k_G$ for $G$ to achieve any targeted LCC. To meet the degree constraint $\omega_d$, the minimum number of edges for each node $t \in T$ is at least $\omega_d - d_G(t)$.
Lemma 1. Given a node $t$ of degree $d_G(t)$, to reduce $LCC_G(t)$ to any targeted LCC $l$, the minimum number of intervention edges $k_t$ is the smallest number of new edges incident to $t$ with which the LCC of $t$ can drop to $l$ or below. Also, the minimum number of edges $k_G$ for $G$ is $k_G = \frac{1}{2} \sum_{t \in T} k_t$, since each intervention edge can serve at most two targets.
Equipped with Lemma 1, a simple approach is to examine every possible LCC, i.e., $1/C(\hat{d}, 2), 2/C(\hat{d}, 2), \ldots, 1$, where $\hat{d}$ is the maximum degree among all nodes in $G$. However, since adding $k$ edges can make intervention for at most $2k$ nodes, it is not necessary to scan all possible LCCs, and EORE thereby only examines a small number of targeted LCCs $l_j$, where $l_j = j / C(\hat{d}_{2k}, 2)$ with $j = 1, 2, \ldots$, until $l_j$ exceeds the maximal LCC before intervention, and $\hat{d}_{2k} \le \hat{d}$ is the maximum degree among all top-$2k$ nodes with the largest LCCs in $T$. OISA skips an $l_j$ if $k_G > k$ according to the above lemma.
For any targeted LCC $l_j$, every node $t \in T$ requires at least $\max\{k_t, \omega_d - d_G(t)\}$ intervention edges to achieve the targeted LCC $l_j$ and the degree constraint. Thus, OISA stops the edge selection process if there exists a node $t$ that is not able to achieve the above goals. For the example in Figure 3, the node with the maximum degree among the top-$2k$ nodes with the largest LCCs is $v_1$, whose degree is 4. The targeted LCCs to be examined (i.e., $l_j$) are 0.17, 0.33, 0.5, 0.67, 0.83, and 1. For the targeted LCC $l_2 = 0.33$, $k_G$ is 7/2 = 3.5 since $k_t = 0$ for $v_{14}$, and $k_t = 1$ for $v_1$, $v_2$, $v_4$, $v_7$, $v_{11}$, $v_{12}$, and $v_{13}$. However, for the targeted LCC $l_1 = 0.17$, $k_G$ is (2 × 7 + 1)/2 = 7.5 since $k_t = 1$ for $v_{14}$, and $k_t = 2$ for $v_1$, $v_2$, $v_4$, $v_7$, $v_{11}$, $v_{12}$, and $v_{13}$. Thus, it is impossible for the maximal LCC to reach 0.17.
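The bookkeeping in this example can be sketched in Python. Two parts are assumptions rather than statements from the paper: the grid of targeted LCCs is taken to be j / C(d, 2) for the maximum degree d among the top-2k nodes, which reproduces the six values listed for degree 4, and `best_case_min_edges` uses a best-case bound in which none of the new neighbors of t is adjacent to any other neighbor of t. The exact inequality of Lemma 1 is not reproduced in this text, so the bound may differ from it. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def targeted_lccs(d_2k, max_lcc):
    """Assumed grid of targeted LCCs: j / C(d_2k, 2) for j = 1, 2, ... up to max_lcc."""
    if d_2k < 2:
        raise ValueError("d_2k must be at least 2")
    denom = comb(d_2k, 2)
    levels = []
    j = 1
    while Fraction(j, denom) <= max_lcc:
        levels.append(Fraction(j, denom))
        j += 1
    return levels


def best_case_min_edges(d, n, target):
    """Smallest k_t with n / C(d + k_t, 2) <= target (best-case assumption).

    d is the degree of t and n is the number of edges among its neighbors.
    """
    if n == 0:
        return 0
    if target <= 0:
        raise ValueError("a positive LCC cannot reach 0 by adding edges")
    k_t = 0
    while Fraction(n, comb(d + k_t, 2)) > target:
        k_t += 1
    return k_t


def required_edges(k_values):
    """Half the sum of the per-target counts, since one edge can serve two targets."""
    return Fraction(sum(k_values), 2)


levels = targeted_lccs(4, 1)
print([round(float(l), 2) for l in levels])

# Per-target counts quoted in the example: one target with k_t = 0 or 1, seven with 1 or 2.
print(float(required_edges([0] + [1] * 7)), float(required_edges([1] + [2] * 7)))

# A hypothetical node of degree 2 whose two neighbors are adjacent (LCC = 1).
print(best_case_min_edges(2, 1, Fraction(1, 3)), best_case_min_edges(2, 1, Fraction(1, 6)))
```

Exact fractions are used so that comparisons such as 2/6 against 1/3 do not depend on floating-point rounding.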

Poor Optionality Node First
Recall that BUM ignores the proximity between the two terminal nodes of an intervention edge, and SIM does not examine the LCCs of both terminals and their nearby nodes. Most importantly, neither strategy ensures the LCC degradation constraint. To address this critical issue, for each targeted LCC, we first propose the notion of optionality to identify qualified candidate intervention edges that do not increase the LCC of any individual by more than τ.
Definition 4 (Optionality). For a target t, the optionality of t denotes the number of nodes in the option set U t ⊆ T such that for every u t ∈ U t , either 1) the hop number from u t to t is no smaller than 3, or 2) u t is two hops away from t and adding an edge (t, u t ) does not increase the LCC of any common neighbor by more than τ.
For the first case, adding an intervention edge $(t, u_t)$ does not increase the LCC of any node. For the second case, the LCC degradation constraint can be ensured as long as the LCCs of common neighbors are sufficiently small. Equipped with the optionality, each iteration of PONF first extracts the node $m$ with the largest LCC in $T$. If there are multiple candidates for $m$ with the same LCC (e.g., $v_2$ and $v_4$ in Figure 3), PONF selects the one with the smallest optionality as $m$ so that others with larger optionalities can be employed later. Otherwise, if $m$ is not selected now, its optionality tends to decrease later when the network becomes denser and may reach 0, so that the LCC of $m$ can no longer be reduced without increasing the LCCs of others, boosting the risk of violating the LCC degradation constraint. The node $u$ of the intervention edge $(m, u)$ is the one with the largest LCC in the option set $U_m$, which reduces the LCCs of both $m$ and $u$. PONF also exploits MISS in Section 3.1 to choose $u$ according to the differences between $t$'s betweenness and $\omega_b$, and between $t$'s closeness and $\omega_c$.

Algorithm 2: The OISA algorithm
Require: G, T, k, τ, ω_b, ω_c, ω_d
Ensure: A set F of k edges to be added, such that the maximal LCC among nodes in T is minimized, while the LCC increment of any node does not exceed τ and the betweenness, closeness, and degree of all target nodes exceed their lower bounds
    ...
    while l_j < the maximum LCC of nodes in G do
        for every t ∈ T do
            Calculate k_t according to Lemma 1
        if k_G < k then
            ...
            Choose m as the node with the maximal LCC in T
            Choose u according to Definition 4 and Eq. 2
            Add (m, u) into F
            Recompute nodes' LCC with the acceleration of ALC
        Record F if it reaches a smaller maximal LCC
        j ← j + 1
    return The best F found
For the example in Figure 3 with the targeted LCC as 0.33, one intervention edge is selected for $v_2$, $v_4$, $v_7$, $v_{11}$, $v_{12}$, and $v_{13}$. In the first iteration, the optionalities of nodes $v_2$, $v_4$, $v_7$, $v_{11}$, $v_{12}$, and $v_{13}$ are 7, 5, 5, 5, 6, and 7, respectively. The option set of $v_4$ is $\{v_2, v_7, v_{11}, v_{13}, v_{14}\}$. Thus, PONF chooses $v_4$ as $m$ and $v_7$ as $u$. Note that $v_7$ is the node with the largest difference to reach $\omega_b$, $\omega_c$, and $\omega_d$ according to Equation 2. In the second iteration, the optionalities of $v_2$, $v_{11}$, $v_{12}$, and $v_{13}$ are 7, 5, 6, and 7, respectively. Thus, PONF selects $v_{11}$ as $m$ and $v_{12}$ as $u$. It repeats the above process and chooses $(v_2, v_{13})$ and $(v_1, v_{14})$ afterward. Figure 3 shows the returned solution with the maximal LCC as 0.33. It is also the optimal solution in this example.
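The option set of a target can be computed with a breadth-first search, as in the minimal sketch below. The graph, the function names, and the treatment of unreachable nodes as being at least three hops away are assumptions for illustration. The test for two-hop candidates uses the fact that adding (t, u) leaves the degree of a common neighbor c unchanged and adds exactly one edge among its neighbors, so its LCC grows by 1 / C(d_c, 2).

```python
from collections import deque
from math import comb


def hop_distances(adj, source):
    """Breadth-first hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def option_set(adj, targets, t, tau):
    """Targets that can be linked to t without raising a common neighbor's LCC by more than tau."""
    dist = hop_distances(adj, t)
    options = set()
    for u in targets:
        if u == t:
            continue
        hops = dist.get(u)  # None when u is unreachable from t (treated as far away)
        if hops is None or hops >= 3:
            options.add(u)
        elif hops == 2:
            common = adj[t] & adj[u]
            # Each common neighbor c gains one edge among its neighbors.
            if all(1 / comb(len(adj[c]), 2) <= tau for c in common):
                options.add(u)
    return options


# Hypothetical graph: t - c - u2 - u3, where c also has leaves x, y, z; w is isolated.
g = {
    "t": {"c"}, "c": {"t", "u2", "x", "y", "z"}, "u2": {"c", "u3"},
    "u3": {"u2"}, "x": {"c"}, "y": {"c"}, "z": {"c"}, "w": set(),
}
targets = {"t", "c", "u2", "u3", "w"}
print(sorted(option_set(g, targets, "t", tau=0.05)))
print(sorted(option_set(g, targets, "t", tau=0.1)))
```

With τ = 0.05 the two-hop node u2 is excluded because the common neighbor c (degree 5) would gain 1/10 in LCC, whereas with τ = 0.1 it is admitted; the optionality of t is the size of the returned set.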

Acceleration of LCC Calculation
When an edge $(m, u)$ is added, only the LCCs of $m$ and $u$ and their common neighbors are likely to change. However, the LCC update cost of $m$ is not negligible when the number of neighbors is huge, since it needs to examine whether there is an edge from $u$ to each of $m$'s neighbors. The adjacency list of every neighbor $c$ of $m$ is required to be inspected even when the LCC of $c$ remains the same. To improve the efficiency, ALC avoids examination of every node $v$ by deriving its LCC upper bound $\overline{LCC}(v, k)$.
Here, $d_G(v)$ is the degree of $v$ in $G$, and $n_v$ is the number of edges between the neighbors of $v$ before intervention, i.e., $n_v = LCC_G(v) \times C(d_G(v), 2)$.
Theorem 3. For an intervention edge set $F$ of size $k$, $LCC_{\overline{G}}(v) \le \overline{LCC}(v, k)$ holds.

P
. Let k 1 and k 2 denote the numbers of intervention edges connecting to and any two neighbors of , respectively. After intervention, k 1 + k 2 ≤ k, and LCC G ( ) = n +k 2 + C (d G ( )+k 1 , 2) , where is the number of edges between the new neighbors via the k 1 new edges and the original neighbors of in G, and ≤ d G ( ) × k 1 + C(k 1 , 2). Thus, LCC G ( ) ≤ n +k 2 +d G ( )×k 1 +C (k 1 , 2) According to the above theorem, ALC first derives LCC G ( ) and LCC( , k) as a pre-processing step of OISA before intervention. Accordingly, PONF does not update the LCC of a node if the intervention edge neither connects to nor spans 's two neighbors, and is not going to be m and u in the next iteration since LCC( , k) is smaller than the current maximal LCC potential to be m and u in the next iteration. Therefore, Theorem 3 enables OISA to effectively skip the LCC updates of most nodes.
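The inequality derived in the proof can be checked numerically on a toy graph. The sketch below is illustrative only: the names `lcc` and `lcc_bound`, the dictionary-of-sets graph, and the toy edges are assumptions, and the way ALC aggregates the bound over all splits of k into k_1 and k_2 is not reproduced here. Writing v for the node under consideration, the code evaluates the bound for one given pair (k_1, k_2).

```python
from itertools import combinations
from math import comb


def lcc(adj, v):
    """Local clustering coefficient of v; adj maps each node to its neighbor set."""
    d = len(adj[v])
    if d < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / comb(d, 2)


def lcc_bound(n_v, d_v, k1, k2):
    """Bound from the proof: k1 new edges touch v, k2 new edges join two of its neighbors."""
    denom = comb(d_v + k1, 2)
    if denom == 0:
        return 0.0
    # An LCC never exceeds 1, so the bound is capped at 1.
    return min(1.0, (n_v + k2 + d_v * k1 + comb(k1, 2)) / denom)


# Hypothetical graph: node 0 has neighbors 1, 2, 3, of which only 1 and 2 are linked.
adj = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
d_v = len(adj[0])
n_v = round(lcc(adj, 0) * comb(d_v, 2))  # n_v = LCC_G(v) x C(d_G(v), 2)

# Intervention: one edge touching node 0 (k1 = 1) and one between its neighbors (k2 = 1).
for a, b in [(0, 4), (2, 3)]:
    adj[a].add(b)
    adj[b].add(a)

after = lcc(adj, 0)
bound = lcc_bound(n_v, d_v, k1=1, k2=1)
print(after, bound)
```

Because the bound depends only on n_v, d_G(v), and the edge budget, it can be computed once before intervention without inspecting any adjacency list again.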

EXPERIMENTAL RESULTS
We evaluate the effectiveness and efficiency of CRPD and OISA by experimentation in Sections 6.1 and 6.2, respectively. Also, to show the feasibility of using OISA in real-world settings, we present an empirical study in Section 6.3. The study has been inspected by 11 clinical psychologists and professors in the field. For simulation, since there is no prior work on lowering LCCs while ensuring betweenness, closeness, and degree, we compare the proposed CRPD and OISA with six baselines: 1) Budget Utility Maximization (BUM): BUM iteratively adds an edge between a targeted node and the node with the largest LCC, while not violating the constraints; 2) Surrounding Impact Minimization (SIM): SIM iteratively adds an edge from a targeted node to the node with the maximal number of hops from it, while not violating the constraints; 3) Enumeration (ENUM): ENUM exhaustively finds the optimal solution; 4) Edge Addition for Improving Network Centrality (EA) [30]: EA iteratively adds an edge with the largest increment on closeness centrality; 5) Target-oriented Edge Addition for Improving Network Centrality (TEA) [10]: TEA iteratively adds an edge with the largest increment on closeness centrality for the targeted nodes; and 6) Greedy algorithm for Dyad scenario (GD) [51]: GD iteratively adds an edge with the largest increment on influence score, where the influence score is calculated in a similar way to PageRank. All algorithms are implemented on an HP DL580 G9 server with four Intel Xeon E7-8870v4 2.10 GHz CPUs and 1.2 TB RAM. Six real datasets are evaluated in the experiments. The first one, CPEP [16], contains the complete social network of the students in 10 classrooms of several public elementary schools in the US. The other five large real social network datasets, collected from the Web, are Facebook, Flickr, Youtube, Amazon, and Cond-Mat. Some statistics of the datasets used in our experiments are summarized in Table 2.
The default τ, ω_b, and ω_c are set to 0.12, 0.01, and 0.1, respectively.
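To make the two simplest baselines concrete, the following Python sketch shows how BUM and SIM could choose the second endpoint u for a targeted node t. It is a simplified illustration under stated assumptions: the function names, the toy graph, and the tie-breaking by smallest node ID are hypothetical, and the checks on τ, ω_b, and ω_c ("while not violating the constraints") are omitted.

```python
from collections import deque
from itertools import combinations


def lcc(adj, v):
    """Local clustering coefficient of v; adj maps each node to its neighbor set."""
    d = len(adj[v])
    if d < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (d * (d - 1) / 2)


def hops_from(adj, t):
    """Breadth-first search: hop distance from t to every reachable node."""
    dist = {t: 0}
    queue = deque([t])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def bum_pick(adj, t):
    """BUM-style choice: the non-neighbor of t with the largest LCC."""
    candidates = [v for v in adj if v != t and v not in adj[t]]
    return max(candidates, key=lambda v: (lcc(adj, v), -v))


def sim_pick(adj, t):
    """SIM-style choice: the reachable non-neighbor of t with the most hops from t."""
    dist = hops_from(adj, t)
    candidates = [v for v in dist if v != t and v not in adj[t]]
    return max(candidates, key=lambda v: (dist[v], -v))


# Hypothetical graph: two triangles (0-1-2 and 3-4-5) joined by edge (2, 3), plus a pendant node 6.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5},
       4: {3, 5}, 5: {3, 4, 6}, 6: {5}}
print(bum_pick(adj, 0), sim_pick(adj, 0))
```

In this toy graph the two rules disagree: BUM selects node 4, whose two neighbors are linked, whereas SIM selects the pendant node 6, which is the farthest from node 0.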

Evaluation of CRPD for NILD-S
In the following, we first compare CRPD with the baselines by varying k. For Facebook and Flickr, ENUM does not return the solutions in two days even when k = 10, and thus it is not shown. Figures 4(a)-(e) compare the results of various k on CPEP and Flickr under default τ, ω_b, and ω_c. For CPEP, k is set to 6%, 12%, 18%, 25%, and 31% of the number of nodes (i.e., 1, 2, 3, 4, and 5 intervention edges). For Flickr, k is set to 0.02%, 0.04%, 0.06%, 0.08%, and 0.1% of the number of nodes (i.e., 400, 800, 1200, 1600, and 2000 intervention edges). The target t is randomly chosen from the nodes with LCC larger than 0.8, and we report the average of 50 trials. Figures 4(a) and 4(b) show that as k increases, t's LCC decreases because more edges are connected with t to reduce its LCC. Also, CRPD outperforms the baselines as it carefully examines the candidates' structure to avoid increasing the number of edges among t's neighbors. Figure 4(c) shows that the running time of CRPD is comparable with that of the simple baselines BUM and SIM while achieving better LCC on Flickr.
Figures 4(d) and 4(e) show the betweenness and closeness of t after adding the intervention edges to Flickr. EA and TEA obtain the largest betweenness and closeness, but their maximal LCCs are not effectively reduced. SIM also achieves large betweenness and closeness since it selects the node farthest from t as u and creates a shortcut between them, but the maximal LCC of SIM does not significantly decrease. GD generally selects a node u with a large PageRank score, and its improvement on t's betweenness and closeness is smaller than that of SIM. BUM induces the smallest betweenness and closeness of t since it selects u with the largest LCCs, and those chosen u are inclined to be near each other. In contrast, CRPD achieves comparable performance with EA and TEA, showing that CRPD can effectively improve not only LCC but also other network characteristics.
Next, we conduct a series of sensitivity tests on τ, ω_b, and ω_c, and show the component effectiveness. The results on CPEP and Facebook are similar to those on Flickr and thus not shown here.
Varying τ. Figure 4(f) compares the LCC of t with different τ on Flickr, under default ω_b and ω_c. As shown, the LCC of t decreases as τ grows because a large τ allows more nodes to be candidates. Figure 4(g) shows that the running time of CRPD and the baselines on Flickr increases as τ grows because more candidates are considered.
Varying ω_b and ω_c. Figure 4(h) compares the LCC of t with different ω_b on Flickr, where k = 0.04% (800 intervention edges). The trend of ω_c is similar to that of ω_b and thus not shown here. The LCC of t slightly grows with increasing ω_b and ω_c, because large ω_b and ω_c require CRPD and the baselines to allocate more edges for betweenness and closeness, instead of focusing on reducing LCC. Moreover, the LCC of t obtained by TEA, EA, and CRPD increases more slowly than that obtained by BUM, SIM, and GD, since the edges selected by TEA and EA maximize the closeness centrality while fulfilling the constraints of ω_b and ω_c, whereas the edges from CRPD effectively increase the betweenness and closeness of t.
Component Effectiveness. Figure 4(i) compares the LCC of t obtained by CRPD with and without RNR, and Figure 4(j) compares the closeness of CRPD with and without MISS on Flickr. The trend of betweenness is similar to that of closeness and thereby not shown here. As shown, while CRPD with and without MISS share similar LCCs, CRPD with MISS achieves a greater closeness because choosing u with larger closeness/betweenness tends to increase t's betweenness and closeness. The results of CRPD on CPEP and Facebook share a similar trend and thus are not shown here.

Evaluation of OISA for NILD-M
In the following, we evaluate the proposed OISA by randomly choosing 20% of the nodes in V as T from the nodes with the top 40% largest LCCs. The results are the average of 50 trials. Figures 5(a)-(c) first compare the effectiveness of all examined approaches on CPEP by varying the number of intervention edges k (relative to the number of edges in G). Figure 5(a) indicates that as k grows, the maximum LCC generally decreases. OISA and ENUM significantly outperform BUM, SIM, EA, TEA, and GD because EA, TEA, and GD are designed to maximize the closeness centrality and influence score, instead of reducing the maximum LCC. Also, BUM only examines the LCCs of the nodes, while SIM ignores LCCs and gives priority to the intervention edges with the maximum numbers of hops. Figure 5(b) presents the running time of all approaches in log scale. OISA and most baselines select F within 1 second. In contrast, the running time of ENUM grows exponentially.
To understand the changes of LCCs in different nodes, we take a closer look at the nodes whose LCC potentially changes, i.e., the terminal nodes of the selected edges F and their neighbors. In Figure 5(c), even though BUM outperforms SIM, the average LCC reduction of BUM in the range [0.9, 1] is smaller than that of SIM since BUM may connect nearby nodes and increase the LCCs of the common neighbors. It also explains why the maximum LCC achieved by BUM drops more slowly than those of SIM, ENUM, and OISA as k increases, i.e., BUM creates many targeted nodes with LCCs around 0.6 and 0.7, and BUM needs to intervene on all of them again to reduce the maximum LCC below 0.6. In contrast, the behavior of OISA is similar to that of ENUM in most ranges, and it successfully achieves comparable performance.
Next, Figures 5(d)-(i) compare all approaches except ENUM on Flickr, since ENUM does not return any solution in two days even for k = 10. The result on Facebook is similar and thereby is not shown here. As there are nearly 9000 nodes with an LCC of 1 on Flickr, the minimal k is 4500 edges, because adding one edge can intervene on at most two nodes with an LCC of 1. Thus, k is set to 0.02%, 0.04%, 0.06%, 0.08%, and 0.1% of the number of edges (i.e., 4500, 9000, 13500, 18000, and 22500 edges). Figure 5(d) indicates that OISA significantly outperforms all baselines under all settings of k. BUM is superior to SIM when k is small because the farthest node u selected by SIM usually has a small LCC and few neighbors, i.e., reducing the LCC of u does not help reduce the maximum LCC. Thus, when the number of intervention edges to be added is small, e.g., smaller than the number of targeted nodes in T, it is impossible for SIM to reduce the maximum LCC. In contrast, u selected by BUM usually has a large LCC, and thus BUM is able to reduce the maximum LCC with the number of intervention edges around half of the minimal k. Figure 5(e) compares the running time. EA considers every possible edge and thus incurs the largest running time. BUM requires the least running time since it only retrieves the nodes with the largest LCC as the terminals of the intervention edges. OISA needs slightly more time but obtains much better solutions since it carefully examines multiple anticipated LCCs, i.e., l_j. Also, OISA takes a longer time on Flickr since Flickr has many more large-LCC nodes, and it is necessary for OISA to derive the optionality for all these

Empirical Study
The empirical study aims at evaluating the utility and feasibility of the proposed network intervention algorithm in real-world settings. The study, spanning two months, included 8 weekly measurements of psychological outcomes among 424 participants, aged between 18 and 25. The participants were university students and employees with 638 pre-existing friendship links at the beginning of the study. Four self-reported standard psychological questionnaires were adopted as indicators, including the Beck Anxiety Inventory (BAI) [2] for anxiety, the Perceived Stress Scale (PSS) [8] for stress, the Positive And Negative Affect Schedule (PANAS) [49] for emotion, and the Psychological Well-being Scale (PWS) [40] for well-being. Table 3 summarizes some evaluated items in the above questionnaires. In PANAS, there are 12 positive emotion terms and 14 negative emotion terms, and the overall score is the total score of the 12 positive emotion terms minus the total score of the negative emotion terms. For anxiety and stress, higher scores indicate higher levels of anxiety and perceived stress, respectively. For emotion and well-being, higher scores imply better emotion and psychological well-being.
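The PANAS scoring rule described above amounts to a simple difference of sums. The following Python sketch illustrates it; the function name `panas_overall`, the 1-to-5 rating scale, and the ratings themselves are hypothetical and are not data from the study.

```python
def panas_overall(positive_scores, negative_scores):
    """Overall PANAS score: total of the positive terms minus total of the negative terms."""
    return sum(positive_scores) - sum(negative_scores)


# Hypothetical ratings on an assumed 1-5 scale: 12 positive and 14 negative terms.
positive = [4, 3, 5, 4, 3, 4, 2, 5, 3, 4, 4, 3]
negative = [2, 1, 3, 2, 1, 2, 2, 1, 3, 2, 1, 2, 2, 1]

score = panas_overall(positive, negative)
print(score)
```

A higher overall score therefore corresponds to better emotion, which matches the direction used in the analysis.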
To evaluate the effects of adding friendship links based on different approaches, the participants were randomly assigned to one of the following four groups: three intervention groups and one control group, with 103 participants in every group. Participants in the intervention groups were provided with explicit friendship recommendations suggested by OISA and two baselines, GD and BUM, respectively. Among the baselines, GD was chosen because it can be applied to propagate health-related information to prevent obesity [51]. BUM was chosen because it performs the best among all baselines in Section 6.2. In the study, k was set as 28 for OISA, GD, and BUM, whereas τ = 0.12, ω_b = 0.01, ω_c = 0.1, and |T| = 103 for OISA. For OISA, GD, and BUM, the recommended edges were suggested by providing specific instructions for the participants to engage in online chatting. In contrast, participants in the control group received neither intervention nor explicit instructions to interact with other participants. Each participant was required to provide responses to the four questionnaires every week for a total of eight times in this study. An important question in the study is whether the participants accepted the friendship recommendations or not. To answer this question, participants were also asked the following questions at the end of the study: Q1. Do you feel happy chatting with this recommended participant? Q2. After the study ends, are you willing to chat with this participant? Q3. If possible, are you willing to become a friend of this participant? For Q1, 80.2% of the participants reported that they felt happy during the chat with the recommended participants. For Q2, 79.9% of the participants replied that they would be willing to chat with the recommended participants even after the study ended. For Q3, 83.9% of the participants reported that they would be willing to become a friend of the recommended participant. Accordingly, participants in this study tended to accept the friend recommendations.
To evaluate the effects of the intervention, Figure 6 reports the average improvement on each psychological outcome for GD, BUM, and OISA. As shown in Figure 6(a), among the three intervention groups, OISA outperforms GD and BUM in all four measures: anxiety, stress, emotion, and well-being. Figures 6(b)-(c) further divide participants into four sub-groups according to the percentage of reduction in LCC (results for stress and well-being are similar and thus not shown here). Figures 6(b)-(c) show that greater percentages of reduction in LCC are associated with more significant improvements in the psychological outcomes. This validates that new friendships via OISA are able to improve the psychological outcomes in this study. GD is inferior due to a different goal (i.e., maximizing the social influence score). Figures 6(d)-(e) plot the average scores of anxiety and emotion of participants in the intervention groups of GD, BUM, and OISA over time. As shown in Figures 6(d)-(e), the improvements of OISA are the most significant. In contrast, GD and BUM do not consider multiple network characteristics simultaneously. Figure 6(f), which plots the improvement of OISA for each outcome, also shows that the improvements on anxiety, stress, and emotion with negative feelings are slightly better than that on well-being with only positive feelings, because brains tend to focus on potentially threatening and negative emotions [22]. In this case, the friend candidate recommended by OISA is able to provide social support to the participants.
We also evaluate OISA, GD, and BUM separately with mixed effects modeling [19], a statistical technique to examine whether the intervention group and the control group are statistically different. Specifically, we fitted the model

\[ H_{it} = \beta_0 + \beta_{0i} + \beta_1 \, \mathit{intervention}_i + \beta_2 \, \mathit{time} + \beta_3 \, \mathit{intervention}_i \cdot \mathit{time} + \epsilon_{it}, \]

where H_it represents participant i's emotion at time t; intervention_i denotes whether participant i is in the control group (intervention_i = 0) or the intervention group (intervention_i = 1); and time represents the number of weeks elapsed since the study started. The study starts at time 0. β_0 is a group intercept term representing the predicted outcome for the control group at time 0. β_0i is participant i's deviation in intercept relative to β_0, which is a random effect in intercept at time 0. β_1 to β_3 are the regression weights associated with intervention_i, time, and their interaction, respectively. Specifically, β_1 indicates the difference in predicted psychological outcomes between the control group and the intervention group at time 0; β_2 represents the estimated amount of change in emotion in the control group for each week of elapsed time (i.e., the "placebo effect", or the improvements in psychological outcomes shown by the control group). Finally, β_3 is the most important, i.e., the regression coefficient for the intervention_i · time interaction reveals the estimated difference in the amount of change in outcomes reported by the intervention group relative to the control group for each week of elapsed time. If β_3 is statistically significantly different from 0, OISA (or another baseline) is able to change the participants' health outcomes over time substantially more than what is expected in the control group. Finally, ϵ_it represents the residual error in the negative emotion that cannot be accounted for by other terms. Table 4 presents the results of model fitting of OISA. Estimates with p-values smaller than 0.05 are identified as statistically significantly different from 0.
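The role of each regression weight can be illustrated by evaluating the fixed part of the model, i.e., the group intercept plus the weighted intervention, time, and interaction terms. The Python sketch below does not fit the model; the function name `predict` and the coefficient values are hypothetical and are not taken from Table 4. It only shows that the gap between the weekly change of the intervention group and that of the control group equals the interaction weight.

```python
def predict(beta, intervention, time, random_intercept=0.0):
    """Predicted outcome without the residual term.

    beta = (b0, b1, b2, b3): group intercept, group difference at time 0,
    weekly change in the control group, and the intervention-by-time weight.
    """
    b0, b1, b2, b3 = beta
    return (b0 + random_intercept + b1 * intervention
            + b2 * time + b3 * intervention * time)


# Hypothetical coefficients for an anxiety-like outcome (lower is better).
beta = (20.0, 0.5, -0.1, -0.8)

# Change per week of elapsed time in each group.
control_slope = predict(beta, 0, 1) - predict(beta, 0, 0)
intervention_slope = predict(beta, 1, 1) - predict(beta, 1, 0)

# The gap between the two weekly changes is the interaction weight b3.
slope_gap = intervention_slope - control_slope
print(control_slope, intervention_slope, slope_gap)
```

With these assumed values, the control group changes by about -0.1 per week and the intervention group by about -0.9 per week, so the gap of about -0.8 is exactly the quantity whose significance is tested.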
Thus, the two values of β_0, the predicted psychological outcomes of the control group at time 0, are estimated to be significantly different from 0. The term β_1 is not significantly different from 0 for all four outcomes, which validates our random assignment procedure, indicating that the intervention and control groups do not differ substantially in their anxiety, stress, emotion, and well-being scores at time 0. Also, the control group does not show substantial changes in any of the four outcomes with time, as β_2 is not significant. Moreover, the values of β_3 are negative and significantly different from zero for anxiety and stress, indicating that the intervention group shows statistically greater decreases in anxiety and stress. Similarly, the values of β_3 are positive and significantly different from zero for emotion and well-being, indicating that the intervention group shows statistically greater increases in emotion and well-being. In other words, the intervention group, whose friendship recommendation is suggested by OISA, shows statistically greater improvement for all four outcomes. Also, the more extreme test statistic value (t-value) and its corresponding p-value, associated with the outcomes, suggest that the intervention has a more systematic effect (i.e., less uncertainty or smaller standard error) [5]. Also, Table 5 and Table 6 reveal that the model fitting results for BUM and GD are not statistically significant. Lastly, these results were inspected by 11 clinical psychologists and professors to observe the behavioral implications behind the scores. The psychologists and professors were asked the following question on a Likert scale: Does the network intervention help improve the participants' outcomes? Comparing their evaluation among intervention and control groups, the results indicate that 82% of the psychologists and professors agree that the recommendation of OISA is the most effective, while only 9% of them agree for GD and BUM. The above results lead to a conclusion consistent with the study: the intervention is therapeutic and positive enough to be brought into clinical consideration.

CONCLUSION
Even though research has suggested the use of network intervention for improving psychological outcomes, there is no effective planning tool for practitioners to select suitable intervention edges from a large number of candidate network characteristics. In this paper, we formulate NILD-S and NILD-M to address this practical need. We prove the NP-hardness and inapproximability, and propose effective algorithms for them. Experiments based on real datasets show that our algorithms outperform other baselines in terms of both efficiency and effectiveness; empirical results further attest to the practical feasibility and utility of using the OISA algorithm for real-world intervention purposes.

Lemma 2. After (i, j) is added to G = (V, E), the LCC of every v ∈ (V \ {i, j}) \ (N_G(i) ∩ N_G(j)) remains the same.

Proof. We also prove the lemma by contradiction. Assume the LCC of a node v ∈ (V \ {i, j}) \ (N_G(i) ∩ N_G(j)) becomes different. Two possible cases exist. (1) v is a neighbor of neither i nor j, i.e., (v, i) ∉ E and (v, j) ∉ E. (2) v is a neighbor of either i or j. Without loss of generality, we assume (v, i) ∈ E but (v, j) ∉ E. For the first case, i and j are not in the subgraph induced by v and the neighbors of v. Thus, adding (i, j) does not change v's LCC. For the second case, i is in the subgraph induced by v and its neighbors, but j is not. Therefore, (i, j) is not included in the induced subgraph, so v's LCC does not change either, which contradicts the assumption.
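Lemma 2 can be checked on a small graph by comparing every node's LCC before and after one edge is added. The Python sketch below is only an illustration on a hypothetical graph; the names `lcc` and `changed_nodes` are assumptions. It verifies that every node whose LCC changes is either an endpoint of the new edge or a common neighbor of its endpoints.

```python
from itertools import combinations


def lcc(adj, v):
    """Local clustering coefficient of v; adj maps each node to its neighbor set."""
    d = len(adj[v])
    if d < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (d * (d - 1) / 2)


def changed_nodes(adj, i, j):
    """Nodes whose LCC differs after adding the edge (i, j) to a copy of the graph."""
    before = {v: lcc(adj, v) for v in adj}
    new_adj = {v: set(neighbors) for v, neighbors in adj.items()}
    new_adj[i].add(j)
    new_adj[j].add(i)
    return {v for v in adj if lcc(new_adj, v) != before[v]}


# Hypothetical graph; the edge (0, 3) is the one being added.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4},
       3: {1, 4}, 4: {2, 3, 5}, 5: {4}}
i, j = 0, 3

changed = changed_nodes(adj, i, j)
allowed = {i, j} | (adj[i] & adj[j])  # endpoints and their common neighbors
print(changed, allowed)
```

Here node 1 is the only common neighbor of nodes 0 and 3, and nodes 2, 4, and 5 keep their LCCs, as the lemma states.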
if there exists at least one feasible solution following the constraints of NILD-S on τ, ω_b, ω_c, and ω_d, the solution obtained by CRPD is always feasible.

Proof. Assume that G = (V, E) has at least one feasible solution V_FS, but the solution V_F obtained by CRPD is different and infeasible. If there are multiple feasible solutions, let V_FS be the one with the most common nodes with V_F. Let u_fs and u_f denote the first different nodes in V_FS and V_F, respectively, when all nodes are sorted according to their IDs in the threshold graph. Note that the ID of u_f is smaller than that of u_fs; otherwise, CRPD would choose u_fs in V_F. In the following, we prove that connecting t to the nodes in V_FF = V_FS \ {u_fs} ∪ {u_f} leads to another feasible solution following the constraints on ω_b, ω_c, and ω_d. 1) Both V_FS and V_FF add k new neighbors to t, and the degree of t for V_FF always exceeds ω_d. 2) The betweenness of t is the proportion of shortest paths among all node pairs passing through t. If both u_fs and u_f are in V_D ∪ V_C, the betweenness of t is the same in V_FS and V_FF, because the length of shortest paths among all nodes in V_D ∪ V_C is at most 2, and the betweenness will not be improved by edges in {(t, v) | v ∈ V_FS} or {(t, v) | v ∈ V_FF}. If both u_fs and u_f are in V_Z, the betweenness of t is the same, because the number of node pairs (u_fs, v) with v ∈ V_D ∪ V_C \ {t} whose corresponding shortest paths do not pass through t after removing (t, u_fs) is identical to the number of node pairs (u_f, v) with v ∈ V_D ∪ V_C \ {t} whose corresponding shortest paths pass through t after adding (t, u_f). If u_fs ∈ V_D ∪ V_C and u_f ∈ V_Z, the betweenness of t grows and becomes larger than ω_b, because the shortest paths of node pairs (u_f, v) with v ∈ V_D ∪ V_C \ {t} pass through t. Therefore, the betweenness of t in solution V_FF is larger than ω_b. 3) The closeness of t is the sum of the reciprocals of the distances to other nodes. If both u_fs and u_f are in V_D ∪ V_C (or both u_fs and u_f are in V_Z), the closeness of t is the same. If u_fs ∈ V_D ∪ V_C and u_f ∈ V_Z, the closeness of t increases because the distance between t and u_fs changes from 1 to 3, but the distance between t and u_f changes from ∞ to 1. Theorem 2 proves that the above feasible solution is optimal.

Theorem 2. The CRPD algorithm can find the optimal solution of NILD-S when G = (V, E) is a threshold graph, and the running time of CRPD is O(|E|d + kd|V|).

Proof. We prove the optimality by exploring the following two cases: 1) |V_Z ∪ V_D \ {t}| ≥ k and 2) |V_Z ∪ V_D \ {t}| < k. For the first case, according to Definition 2, a node with a larger weight is more inclined to connect to others. According to Corollary 2, V_Z ∪ V_D forms an independent set. Also, the nodes are ordered in the ascending order of their degrees. Thus, CRPD examines nodes in the ascending order of their weights and chooses all nodes in V_Z ∪ V_D. Adding edges in {(t, v) | v ∈ V_Z ∪ V_D} will optimize the LCC of t because of the following reasons. (a) Adding edges in {(t, v) | v ∈ V_Z ∪ V_D} only changes the LCCs of the nodes in {t} ∪ V_Z ∪ V_D according to Lemma 2. (b) For each node in V_Z, its degree is increased by 1 (i.e., by connecting to t), but its LCC remains 0 because V_Z ∪ V_D is an independent set. (c) The LCCs of the nodes in V_D are 1 and thus cannot be increased. (d) The LCC of t will be optimized because adding edges in {(t, v) | v ∈ V_Z ∪ V_D} only increases the denominator in Equation 1, but it does not increase the numerator in Equation 1. Thus, if |V_Z ∪ V_D \ {t}| ≥ k, CRPD finds the optimal solution.
Next, we prove the second case by contradiction. Assume that there is an optimal solution F* = {(t, v) | v ∈ V*_F} better than the solution F = {(t, v) | v ∈ V_F} obtained by CRPD. When there are multiple optimal solutions, let F* = {(t, v) | v ∈ V*_F} be the one such that V*_F has the most common nodes with V_F. Let V_F = {u_1, u_2, ..., u_{i−1}, u_i, ..., u_k} and V*_F = {u_1, ..., u_{i−1}, u*_i, ..., u*_k}, such that u_i and u*_i are the first different nodes in V_F and V*_F when all nodes are sorted according to their IDs. The ID of u_i is smaller than that of u*_i; otherwise, CRPD would select u*_i in V_F. Moreover, N_G(u_i) ⊆ N_G(u*_i) according to Definition 2. We consider another solution {(t, v) | v ∈ V**_F} with V**_F = V*_F \ {u*_i} ∪ {u_i}. First, the solution {(t, v) | v ∈ V**_F} is a feasible solution due to the following reason. If u_i ∈ V_Z ∪ V_D \ {t}, the LCC of u_i after adding {(t, v) | v ∈ V**_F} satisfies the LCC degradation constraint τ, since the LCC of u_i ∈ V_D is 1 before adding edges (i.e., it can never increase), and the LCC of u_i ∈ V_Z remains 0 before and after adding edges. On the other hand, if u_i ∈ V_C, the LCC of u_i after adding {(t, v) | v ∈ V**_F} is identical to the one after adding {(t, v) | v ∈ V*_F}, because all nodes u_{i+1}, ..., u_k and u*_{i+1}, ..., u*_k are fully connected, and the number of edges between u_i's neighbors is thereby the same. Second, the LCC of t after adding {(t, v) | v ∈ V**_F} is not larger than the LCC of t in solution {(t, v) | v ∈ V*_F}, as N_G(u_i) ⊆ N_G(u*_i). Thus, it contradicts that {(t, v) | v ∈ V*_F} is the optimal solution with the most common nodes with {(t, v) | v ∈ V_F}.
In the LCC Calculation Step, to find the number of edges between each node's neighbors, CRPD stores a count for each node. (1) For each edge (i, j) ∈ E, it examines the neighbors of the terminal nodes, N_G(i) and N_G(j), in O(d) time, where d is the maximum degree in G = (V, E). For every common neighbor of i and j, CRPD increases its count by 1. After examining all |E| edges in O(|E|d) time, CRPD extracts n_v for each node v. (2) With n_v, CRPD finds LCC_G(v) of every node v in G = (V, E) in O(|V|) time, and the total time complexity of the LCC Calculation Step is O(|E|d + |V|). In the Edge Selection Process Step, CRPD first examines each of t's two-hop neighbors v and excludes v if adding (t, v) increases the LCC of any of t's one-hop neighbors to more than τ, in O(d|V|) time. Then, CRPD acquires an initial solution in k iterations. In each iteration, CRPD first finds u from the remaining candidates in O(|V|) time and excludes any node v if adding (t, v) will increase another node's LCC to more than τ, in O(|V|d) time. The total time to find the initial solution is O(kd|V|). After that, CRPD explores another solution starting from r in O(kd|V|) time. In summary, CRPD requires O(|E|d + |V|) in the LCC Calculation Step and O(kd|V|) in the Edge Selection Process Step, and the total running time is O(|E|d + kd|V|).
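The counting procedure of the LCC Calculation Step can be sketched in Python as follows. This is a minimal sketch under assumptions, not the authors' code: the function name `all_lcc`, the dictionary-of-sets graph with integer node IDs, and the toy graph are hypothetical. Each edge contributes one unit to the count of every common neighbor of its endpoints, and the count of a node is then divided by the number of pairs of its neighbors.

```python
from math import comb


def all_lcc(adj):
    """Compute every node's LCC by counting, per edge, the common neighbors of its endpoints."""
    count = {v: 0 for v in adj}  # number of edges between v's neighbors
    for i in adj:
        for j in adj[i]:
            if i < j:  # visit each undirected edge once
                for c in adj[i] & adj[j]:  # common neighbors of i and j
                    count[c] += 1
    return {
        v: (count[v] / comb(len(adj[v]), 2) if len(adj[v]) >= 2 else 0.0)
        for v in adj
    }


# Hypothetical graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
result = all_lcc(adj)
print(result)
```

The set intersection per edge touches at most the neighbors of the two endpoints, which is where the O(|E|d) term for this step comes from.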