A comparative study on the reliability of co-authorship networks with emphases on edges and nodes

A scientific co-authorship network may be modeled by a graph G composed of k nodes and m edges. The researchers that make up this network may be interpreted as its nodes, and the links between these agents (co-authored papers) as its edges. The current work evaluates and compares the reliability of networks under two emphases: 1) on nodes (perfectly reliable edges) and 2) on edges (perfectly reliable nodes). Specifically, the reliability of a fictitious co-authorship network at a given time t was analyzed considering, first, equal and different reliabilities of the nodes (researchers) and, second, equal and different reliabilities of the edges (co-authorship relations). Additionally, centrality measures of nodes were obtained to identify situations in which the insertion of an edge significantly increased the reliability of the network. Results showed that the reliability of the co-authorship network with emphasis on edges is more sensitive to changes in individual reliabilities than the reliability of the network with emphasis on nodes. Centrality measures also proved a viable way to identify insertions of edges (co-authorship relations) that increase the reliability of the network under both approaches.


Introduction
Physical, biological or social systems made up of a large set of well-defined items that interact dynamically are networks. Maintaining the functionality of a network requires information about its structure, functions and characteristics.
Since the structure of a network may be represented by a graph G composed of k nodes and m edges, graph theory is basic for determining its properties, which refer to topological aspects of the network (Brigantini, Oliveira, & Braga Junior, 2014).
Network reliability is the probability that the network remains operational even when one or more subsets of its components (edges and/or nodes) are removed (Barlow & Proschan, 1981). Highly reliable networks are robust structures. Moreover, one network is more reliable than another if its probability of becoming disconnected is smaller (Oliveira, Ferreira, Brigantini, & Uehara, 2014).
Social networks are structures composed of people, organizations, territories or others, connected among themselves by one or several types of relationships (friendship, family, commercial etc.) through which information, knowledge, interests, values and aims are shared (Lyra & Oliveira, 2011).
Co-authorship networks fit this context, since they are made up of researchers and shared tasks. They are symmetrical: at a given time t, researcher A collaborates with researcher B exactly as many times as researcher B collaborates with A. Co-authorship of the products of scientific activity, especially scientific publications, indicates collaboration.
A co-authorship network is called highly reliable when it is likely to continue producing science; if this probability decreases, the reliability of the network also decreases.
The analysis of co-authorship networks is one of the most visible and accessible ways to identify scientific collaboration relationships within the academic milieu. These networks have been extensively studied with social network analysis (SNA), a method focused specifically on the inter-relationships of agents (researchers). SNA aims at understanding the social mechanisms behind the connections (co-authorship, citations and others) and the way they ease the flow of information and knowledge among agents (Oliveira et al., 2014).
Network structural analysis, in turn, investigates the roles and positions of agents in the network, employing concepts from graph theory that provide the mathematical basis for computer calculations and allow more specific approaches to data analysis. Recent studies on the reliability of co-authorship networks belong to this line, since they have mainly emphasized combinatorial distributions (Lyra & Oliveira, 2011; Brigantini, Oliveira, & Braga Junior, 2014; Oliveira et al., 2014).
In this context, structural analyses of co-authorship networks that combine reliability measures with the SNA methodology are relevant both for keeping the networks and their activities running and for developing strategies for survival and competitiveness.
A network of scientific co-authorship may be modeled by a graph G composed of k nodes and m edges. The researchers of the network are its nodes, and the connections or links among these agents (represented by common, or co-authored, publications) are its edges.
The current paper evaluates and compares the reliability of networks under two emphases: 1) on nodes (unreliable nodes, i.e., nodes prone to failure, meaning that one or more researchers may leave the network, and perfectly reliable edges); and 2) on edges (unreliable edges, i.e., edges prone to failure, meaning that one or more co-authorship relations among researchers may cease to exist, and perfectly reliable nodes). Specifically, the reliability of a fictitious co-authorship network at a given time t was analyzed considering, first, equal and different reliabilities of the nodes (researchers) and, second, equal and different reliabilities of the edges (co-authorship relations). Additionally, centrality measures of nodes were obtained to identify situations in which the insertion of an edge may significantly increase the reliability of the network, under both the node and the edge emphasis.

Material and methods
In a co-authorship network, each researcher is represented by a node, and two nodes are linked by an edge when the corresponding researchers have at least one publication in common. The reliability of the co-authorship network is then the probability that the network is active at time t, even when one or more failures cause the removal of one or more subsets of nodes or edges of the graph.
The current study uses a fictitious scientific co-authorship network modeled by graph G of Figure 1.

Calculation of the network reliability
Let a network be modeled by a simple undirected graph G = (V, E) with k nodes and m edges. For the network to function (to be active) at time t, every pair of nodes must be connected by at least one path. Two situations were considered in this work.

Unreliable edges and perfectly reliable nodes (emphasis on edges)
Suppose that the nodes are perfectly reliable and only the edges may fail. Each edge $e_j$ ($j = 1, \ldots, m$) then has a probability of operation (the reliability of edge j), denoted by $p_j$. When all the edges of the graph that models the network have the same reliability, it is simply denoted by p. Edge failures are also assumed to be pairwise independent; in other words, the failure of one edge does not imply the failure of another.
To calculate the reliability of the network (the probability that graph G remains connected even after one or more edges fail), the probability of each operating state of the network must first be determined by Equation 1:

$$P(E') = \prod_{j \in E'} p_j \prod_{j \in E \setminus E'} (1 - p_j) \qquad (1)$$

in which E is the set of edges of graph G and E' is the set of operating edges of graph G. When the edges of graph G have the same reliability p, the network reliability is given by Equation 2:

$$R_G(p) = \sum_{j=k-1}^{m} S_j \, p^{j} (1 - p)^{m-j} \qquad (2)$$

in which G is the graph that models the network with k nodes and m edges, and $S_j$ is the number of connected sub-graphs of G that contain all k nodes and exactly j edges (Kelmans, 1966). The sum starts at j = k − 1 because at least k − 1 edges are needed to connect k nodes.
When the edges of the graph that models the network have different reliabilities $p_j$, the network reliability $R_G(p)$ is calculated in the same way as in Equation (2): the connected sub-graphs of G with j edges are obtained, the probability of each corresponding operating state is calculated with Equation (1), and the results are added.
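Both edge-emphasis formulas can be checked by brute force on small graphs. The sketch below is a minimal illustration, not the authors' own procedure: it enumerates every set E' of operating edges, keeps those that leave all nodes connected, and adds their Equation 1 probabilities; it also counts the $S_j$ of Equation 2. The helper names (`is_connected`, `edge_reliability`, `spanning_counts`) and the 4-cycle example are hypothetical, and visiting all 2^m edge subsets is only practical for small m.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """True if the given edges link every node in `nodes` (iterative depth-first search)."""
    nodes = set(nodes)
    if not nodes:
        return False
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes


def edge_reliability(nodes, edge_probs):
    """Sum the Equation 1 probability of every operating edge set E' that keeps G connected."""
    edges = list(edge_probs)
    total = 0.0
    for r in range(len(edges) + 1):
        for operating in combinations(edges, r):
            if not is_connected(nodes, operating):
                continue
            prob = 1.0
            for e in edges:
                # p_j if edge j operates, 1 - p_j if it has failed
                prob *= edge_probs[e] if e in operating else 1.0 - edge_probs[e]
            total += prob
    return total


def spanning_counts(nodes, edges):
    """S_j of Equation 2: connected sub-graphs that keep all nodes and exactly j edges."""
    return {j: sum(is_connected(nodes, s) for s in combinations(edges, j))
            for j in range(len(edges) + 1)}


# Hypothetical example: a 4-cycle whose edges all have reliability p = 0.9.
cycle = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
probs = {e: 0.9 for e in edges}
S = spanning_counts(cycle, edges)  # S[4] == 1, S[3] == 4, all others 0
p = 0.9
poly = sum(S[j] * p**j * (1 - p) ** (len(edges) - j) for j in S)  # Equation 2
print(round(edge_reliability(cycle, probs), 4), round(poly, 4))  # 0.9477 0.9477
```

With equal reliabilities the enumeration (Equation 1 summed) and the polynomial of Equation 2 coincide; with distinct $p_j$ only `edge_reliability` applies.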

Unreliable nodes and perfectly reliable edges (emphasis on nodes)
Suppose that the edges are perfectly reliable and only the nodes may fail. Each node $v_i$ ($i = 1, \ldots, k$) then has a probability of operation (the reliability of node i), denoted by $p_i$. When all the nodes of the graph that models the network have the same reliability, it is simply denoted by p. Node failures are also assumed to be pairwise independent; in other words, the failure of one node does not imply the failure of another.
To calculate the reliability of the network (the probability that the operating nodes of graph G remain connected even after one or more nodes fail), the probability of each operating state of the network must first be determined by Equation 3:

$$P(V') = \prod_{i \in V'} p_i \prod_{i \in V \setminus V'} (1 - p_i) \qquad (3)$$

in which V is the set of nodes of graph G and V' is the set of operating nodes of graph G. When the nodes of graph G have the same operating probability p, the network reliability is given by Equation 4:

$$R_G(p) = \sum_{i=2}^{k} S_i \, p^{i} (1 - p)^{k-i} \qquad (4)$$

in which G is the graph that models the network with k nodes and m edges, and $S_i$ is the number of connected sub-graphs of G induced by i nodes, that is, formed by i nodes together with all the edges of G that join them (Goldschmidt, Jaillet, & Lasota, 1994).
When the nodes of the graph that models the network have different reliabilities $p_i$, the network reliability $R_G(p)$ is calculated in the same way as in Equation (4): the connected sub-graphs of G with i nodes are obtained, the probability of each corresponding operating state is calculated with Equation (3), and the results are added.
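Under the node emphasis the same brute-force idea applies, except that an operating set V' counts only if the sub-graph it induces (the surviving researchers with all their co-authorship links) is connected. This is a sketch with hypothetical helper names; following the counts reported for graph G in the Results, which start at two-node sub-graphs, it assumes that at least two researchers must remain (`min_size=2`), and it assumes a small k.

```python
from itertools import combinations


def induced_connected(subset, edges):
    """True if the nodes in `subset`, with every edge of G joining two of them, are connected."""
    sub = set(subset)
    adj = {v: set() for v in sub}
    for u, v in edges:
        if u in sub and v in sub:  # perfectly reliable edges survive between operating nodes
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(sub))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == sub


def node_reliability(node_probs, edges, min_size=2):
    """Sum the Equation 3 probability of every operating node set V' inducing a connected sub-graph.

    min_size=2 is an assumption mirroring the counts reported for graph G.
    """
    nodes = list(node_probs)
    total = 0.0
    for r in range(min_size, len(nodes) + 1):
        for operating in combinations(nodes, r):
            if induced_connected(operating, edges):
                prob = 1.0
                for v in nodes:
                    # p_i if researcher i remains, 1 - p_i if researcher i has left
                    prob *= node_probs[v] if v in operating else 1.0 - node_probs[v]
                total += prob
    return total


def induced_counts(nodes, edges, min_size=2):
    """S_i of Equation 4: connected induced sub-graphs with exactly i nodes."""
    return {i: sum(induced_connected(s, edges) for s in combinations(nodes, i))
            for i in range(min_size, len(nodes) + 1)}


# Hypothetical example: the path 1-2-3; the pair {1, 3} alone is not connected.
path_nodes = [1, 2, 3]
path_edges = [(1, 2), (2, 3)]
probs = {v: 0.9 for v in path_nodes}
print(induced_counts(path_nodes, path_edges))         # {2: 2, 3: 1}
print(round(node_reliability(probs, path_edges), 3))  # 0.891
```

Here Equation 4 gives $p^3 + 2p^2(1-p) = 0.729 + 0.162 = 0.891$ at p = 0.9, the same value as the enumeration.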

Centrality measures
Centrality measures are employed to verify the relevance of a node with regard to the others in a network.Through centrality measures, nodes may be ordered according to their relative importance.
Different centrality measures capture different types of relevance (position, flow, influence and others). Two centrality measures that assess the importance of nodes according to their structural position were employed in this work (Newman, 2010):
• Closeness measure relates to the total distance from a node to the other nodes of the network; it indicates how quickly a node can reach the others and shows which nodes need improvement. The closeness measure of node i ($v_i$) is calculated by Equation 5:

$$C_C(v_i) = \frac{1}{k-1} \sum_{j=1,\, j \neq i}^{k} d(v_i, v_j) \qquad (5)$$

in which $d(v_i, v_j)$ represents the shortest distance between node i ($v_i$) and node j ($v_j$), and k is the number of nodes in the network. The most central item of the network has the lowest value of $C_C(v_i)$, that is, it is the item that communicates most quickly with the other items of the network owing to its structural position.
• Information degree measure gives relevance to a node according to the number of direct links it establishes with the other nodes of the network. In other words, it evaluates the direct influence (immediate effect at time t + 1) of a node on the others through the number of paths of unit length originating from it. The information degree measure of node i ($v_i$) is given by Equation 6:

$$C_D(v_i) = \frac{\deg(v_i)}{k-1} \qquad (6)$$

in which $\deg(v_i)$ is the number of edges incident to $v_i$ and k is the number of nodes in the network.
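Equations 5 and 6, as written above, can be computed with a breadth-first search for the shortest distances. The sketch assumes a connected, unweighted, undirected graph; the function names and the four-node path used as an example are illustrative.

```python
from collections import deque


def build_adjacency(nodes, edges):
    """Adjacency sets of an undirected graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest distances d(source, v), counted in edges, to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """Equation 5: total distance to the other k - 1 nodes divided by k - 1 (lower = more central)."""
    k = len(adj)
    return {v: sum(bfs_distances(adj, v).values()) / (k - 1) for v in adj}


def information_degree(adj):
    """Equation 6: number of direct links divided by k - 1 (higher = more central)."""
    k = len(adj)
    return {v: len(adj[v]) / (k - 1) for v in adj}


# Hypothetical example: the path 1-2-3-4.
adj = build_adjacency([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
print(closeness(adj))           # node 1 -> 6/3 = 2.0, node 2 -> 4/3
print(information_degree(adj))  # node 1 -> 1/3, node 2 -> 2/3
```

Note that some libraries define closeness as the reciprocal, (k − 1) divided by the total distance, so that higher means more central; the version above follows the "lowest value is most central" reading of Equation 5.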

Results and discussion
Perfectly reliable nodes and unreliable edges (emphasis on edges or co-authorship relations)
Graph G of Figure 1 is first approached with emphasis on edges: the nodes (researchers) are taken as perfectly reliable and the edges (co-authorship relations) as prone to failure (Figure 2), each edge j having its own reliability $p_j$. In practice, these reliabilities may be obtained as the frequentist probability of each edge or co-authorship relation, that is, the number of co-authored publications of a pair of researchers relative to the total number of publications of this same pair of researchers for the specific network (Brigantini, Oliveira, & Braga Junior, 2014).
Four connected sub-graphs may be formed from graph G of Figure 2: three sub-graphs with five edges and one sub-graph with six edges. Owing to the configuration of graph G, it is not possible to form connected sub-graphs with four or fewer edges. The reliability of the network modeled by graph G at time t is therefore the sum of the probabilities, given by Equation 1, of these four operating states. When all edges have the same reliability p, Equation (2) applies with $S_5 = 3$ and $S_6 = 1$, and the reliability of the network is given by Equation 12:

$$R_G(p) = p^6 + 3p^5(1-p) = 3p^5 - 2p^6 \qquad (12)$$

Table 1 shows the results of simulations for different values of p, that is, how the network reliability behaves as the reliability of each edge or co-authorship relation (the value of p) increases. Owing to the configuration of the researchers' network and its co-authorship relations, a probability of failure of an edge or co-authorship relation of 0.6 or more (p ≤ 0.4) makes the reliability of the network at time t close to zero.

Perfectly reliable edges and unreliable nodes (emphasis on nodes or researchers)
In the case of focus on nodes or researchers, that is, with the edges (co-authorship relations) taken as totally reliable and the nodes (researchers) as prone to failure, graph G of Figure 1 is approached as in Figure 3, with k = 6 nodes or researchers, each researcher i having reliability $p_i$. In a real situation, these reliabilities may be obtained as the frequentist probability of each node or researcher, i.e., the number of publications of a researcher for the scientific co-authorship network relative to the total number of publications of this same researcher, for that network and for other ends (Oliveira et al., 2014).
Twenty-six connected sub-graphs may be formed from graph G of Figure 3: one sub-graph with six nodes, four with five nodes, seven with four nodes, eight with three nodes and six with two nodes. The reliability of the co-authorship network modeled by graph G at time t is the sum of the probabilities, given by Equation 3, of the 26 operating states that correspond to these sub-graphs; replacing the respective reliabilities gives Equations 13 to 17, from which the reliability of the co-authorship network with emphasis on nodes or researchers is obtained. If all the nodes of graph G have the same reliability p, the reliability of the co-authorship network may be expressed by Equation (4) with $S_6 = 1$, $S_5 = 4$, $S_4 = 7$, $S_3 = 8$ and $S_2 = 6$:

$$R_G(p) = p^6 + 4p^5(1-p) + 7p^4(1-p)^2 + 8p^3(1-p)^3 + 6p^2(1-p)^4$$

Table 2 shows the results of simulations for different values of p, that is, how the network reliability behaves as the reliability of each node or researcher (the value of p) increases. Owing to the configuration of this network and its existing co-authorship relations, a probability of failure of a node or researcher of 0.8 or more (p ≤ 0.2) makes the reliability of the network at time t close to zero.
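The sub-graph counts quoted for graph G can be re-derived by brute force. Figure 1 is not reproduced in this text, so the edge list in this sketch is an assumption: it takes the six edges of G to be the researcher pairs that are not among the nine candidate links of Table 3, that is, 1-2, 2-3, 2-4, 2-6, 3-4 and 3-5. With this edge list the enumeration returns $S_5 = 3$ and $S_6 = 1$ under the edge emphasis and the 26 sub-graphs (6, 8, 7, 4 and 1 with two to six nodes) under the node emphasis.

```python
from itertools import combinations

NODES = [1, 2, 3, 4, 5, 6]
# Assumption: edges of G inferred from Table 3 (the six pairs not offered as new links).
EDGES = [(1, 2), (2, 3), (2, 4), (2, 6), (3, 4), (3, 5)]


def connected(vertices, edges):
    """True if `vertices`, together with the edges joining them, form one connected piece."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices


# Emphasis on edges: S_j counts connected spanning sub-graphs with j edges (Equation 2).
S_edges = {j: sum(connected(NODES, s) for s in combinations(EDGES, j))
           for j in range(len(EDGES) + 1)}
# Emphasis on nodes: S_i counts connected sub-graphs induced by i nodes (Equation 4).
S_nodes = {i: sum(connected(s, EDGES) for s in combinations(NODES, i))
           for i in range(2, len(NODES) + 1)}

print({j: c for j, c in S_edges.items() if c})  # {5: 3, 6: 1}
print(S_nodes)                                   # {2: 6, 3: 8, 4: 7, 5: 4, 6: 1}
print(sum(S_nodes.values()))                     # 26
```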
Given the configuration of the fictitious co-authorship network of Figure 1 and its existing co-authorship relations, comparing the two situations for different values of p (emphasis on edges, then emphasis on nodes) shows that, as p decreases, the reliability of the network approaches zero more quickly in the first case (unreliable edges or co-authorship relations) than in the second (unreliable nodes or researchers), as Figure 4 shows. Therefore, the reliability $R_G(p)$ of the network with emphasis on edges or co-authorship relations is more sensitive to changes in the individual reliabilities p than the reliability of the network with emphasis on nodes or researchers.
To increase the reliability of the fictitious co-authorship network and to introduce the role of centrality measures in this context, all possible insertions of a new edge (co-authorship relation) in graph G of Figure 1 were considered, producing the nine non-isomorphic graphs¹ described in Table 3.
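The comparison shown in Figure 4 can be sketched from the closed forms for equal reliabilities: Equation 12 for edges, and Equation 4 with the node counts of graph G. Subtracting them gives $R_{nodes} - R_{edges} = p^5(1-p) + 7p^4(1-p)^2 + 8p^3(1-p)^3 + 6p^2(1-p)^4$, which is positive for every 0 < p < 1, so under these formulas the edge-emphasis reliability is always the lower of the two. The function names below are illustrative.

```python
def r_edges(p):
    """Equation 12 (emphasis on edges): S_5 = 3, S_6 = 1."""
    return p**6 + 3 * p**5 * (1 - p)


def r_nodes(p):
    """Equation 4 for graph G (emphasis on nodes): S_6..S_2 = 1, 4, 7, 8, 6."""
    q = 1 - p
    return p**6 + 4 * p**5 * q + 7 * p**4 * q**2 + 8 * p**3 * q**3 + 6 * p**2 * q**4


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}   edges: {r_edges(p):.4f}   nodes: {r_nodes(p):.4f}")
```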
Assuming that all the elements of the nine graphs of Table 3 have the same reliability p (under both the edge and the node emphasis), a simulation was conducted for different values of p; the reliabilities calculated for G1, G2, ..., G9 are given in Tables 4 and 5. These results were then compared with the reliabilities of the network modeled by graph G, calculated in Tables 1 and 2, to identify which graph(s) provided the highest increase in the reliability of G among all possible insertions of an edge (a link between nodes or researchers that did not previously exist).
¹ Isomorphic graphs have the same structure: there is a one-to-one correspondence between their nodes that preserves adjacency, so they differ only in how their nodes are labeled or drawn.
Table 3. Non-isomorphic graphs obtained from G (Figure 1) by introducing an edge or co-authorship relation.

| Graph | Link |
|---|---|
| G1 | Researchers 1 and 6 |
| G2 | Researchers 4 and 6 |
| G3 | Researchers 1 and 3 |
| G4 | Researchers 4 and 5 |
| G5 | Researchers 3 and 6 |
| G6 | Researchers 1 and 5 |
| G7 | Researchers 1 and 4 |
| G8 | Researchers 2 and 5 |
| G9 | Researchers 5 and 6 |

In the case of unreliable edges (Table 4), after the insertion of an edge or co-authorship relation in graph G, graphs G6 and G9, in that order, showed the highest increases in reliability. Therefore, inserting an edge or co-authorship link between nodes or researchers '1' and '5', or between researchers '5' and '6', made the network modeled by graph G on average 32.87% (G6) and 31.22% (G9) more reliable.
In the case of unreliable nodes (Table 5), after the insertion of an edge or co-authorship relation in graph G, graphs G8, G6 and G9, in that order, showed the highest increases in reliability. Therefore, inserting an edge or co-authorship link between nodes or researchers '2' and '5', between researchers '1' and '5', or, similarly, between researchers '5' and '6' increased the reliability of the network modeled by graph G on average by 22.26% (G8) and 20.33% (G6 and G9).
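The insertion experiment can be sketched in the same brute-force way: add each candidate link of Table 3 to G, recompute both reliabilities with equal p, and compare them with those of G. This is an illustration only. The edge list of G is an assumption inferred from Table 3 (the six researcher pairs not listed there as candidate links), and the sketch evaluates a single value, p = 0.5, whereas Tables 4 and 5 report averages over several values of p, so its percentages are not comparable with the averages quoted in the text. At p = 0.5 it ranks G6 and G9 first under the edge emphasis and G8 first under the node emphasis.

```python
from itertools import combinations

NODES = [1, 2, 3, 4, 5, 6]
# Assumption: edges of G inferred from Table 3 (the six pairs not offered as new links).
EDGES = [(1, 2), (2, 3), (2, 4), (2, 6), (3, 4), (3, 5)]
# Candidate links of Table 3.
CANDIDATES = {"G1": (1, 6), "G2": (4, 6), "G3": (1, 3), "G4": (4, 5), "G5": (3, 6),
              "G6": (1, 5), "G7": (1, 4), "G8": (2, 5), "G9": (5, 6)}


def connected(vertices, edges):
    """True if `vertices`, together with the edges joining them, form one connected piece."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices


def r_edges(edges, p):
    """Equal edge reliability p, perfectly reliable nodes (Equation 2 by enumeration)."""
    m = len(edges)
    return sum(p**j * (1 - p) ** (m - j)
               for j in range(m + 1)
               for s in combinations(edges, j) if connected(NODES, s))


def r_nodes(edges, p):
    """Equal node reliability p, perfectly reliable edges (Equation 4 by enumeration)."""
    k = len(NODES)
    return sum(p**i * (1 - p) ** (k - i)
               for i in range(2, k + 1)
               for s in combinations(NODES, i) if connected(s, edges))


p = 0.5  # single illustrative value, not the grid used in Tables 4 and 5
base_edges, base_nodes = r_edges(EDGES, p), r_nodes(EDGES, p)
edge_gain, node_gain = {}, {}
for name, link in CANDIDATES.items():
    augmented = EDGES + [link]
    edge_gain[name] = 100 * (r_edges(augmented, p) / base_edges - 1)
    node_gain[name] = 100 * (r_nodes(augmented, p) / base_nodes - 1)
    print(f"{name} {link}: edges +{edge_gain[name]:.1f}%, nodes +{node_gain[name]:.1f}%")
```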
Centrality measures may be employed to verify how important a node or researcher of a co-authorship network is relative to the others and to indicate between which nodes or researchers a new edge may be inserted to maximize the reliability of the network.
The two node centrality measures defined above, the closeness measure and the information degree measure, were used in the analysis of the fictitious co-authorship network modeled by graph G.
The former refers to the distance a node must travel to reach the others, and the latter to the direct links the node has with the others. The node with the lowest closeness measure is the most central of the network, that is, the one that communicates most rapidly with the other nodes.
In the case of the information degree measure, the node with the highest value (degree) has the most direct contacts with the other nodes.
Table 6 shows the centrality measures calculated for the nodes of graph G of Figure 1. According to both the closeness and the information degree measures, the most central node or researcher of graph G is '2', that is, it has the highest access speed and the greatest influence on the other nodes or researchers. Therefore, if node or researcher '2' were removed from the graph (for any reason), the network would become less connected and, consequently, less reliable, since certain paths would be eliminated. Conversely, the least central nodes or researchers of graph G are '5', '1' and '6', in that order. Taking into consideration the results of recent studies on the reliability of co-authorship networks (Lyra & Oliveira, 2011; Oliveira et al., 2014; Brigantini, Oliveira & Braga Junior, 2014), links between nodes '1' and '5' or between '5' and '6' should be targeted if the aim is to make the network more reliable through the insertion of an edge or relation between nodes or researchers.
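The reasoning of this paragraph can be turned into a small rule, sketched below as a hypothetical heuristic rather than the authors' procedure: rank the nodes by Equations 5 and 6, take the least central node, and propose links from it to the next least central nodes it is not yet linked to. The edge list of G is an assumption inferred from Table 3 (the six researcher pairs not listed there as candidate links). Under that assumption the sketch finds '2' as the most central node, ranks '5', '1' and '6' as the least central, and proposes the links 1-5 and 5-6 named above.

```python
from collections import deque

NODES = [1, 2, 3, 4, 5, 6]
# Assumption: edges of G inferred from Table 3 (the six pairs not offered as new links).
EDGES = [(1, 2), (2, 3), (2, 4), (2, 6), (3, 4), (3, 5)]

adj = {v: set() for v in NODES}
for u, v in EDGES:
    adj[u].add(v)
    adj[v].add(u)


def farness(source):
    """Closeness of Equation 5: mean BFS distance from `source` to the other k - 1 nodes."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values()) / (len(NODES) - 1)


closeness = {v: farness(v) for v in NODES}                 # lower = more central
degree = {v: len(adj[v]) / (len(NODES) - 1) for v in NODES}  # Equation 6, higher = more central

most_central = min(NODES, key=lambda v: (closeness[v], -degree[v]))
# Least central first: highest closeness value, ties broken by lower degree, then by label.
ranking = sorted(NODES, key=lambda v: (-closeness[v], degree[v], v))
weakest = ranking[0]
# Hypothetical heuristic: link the least central node to the next two least central
# nodes it is not yet connected to.
suggested = [tuple(sorted((weakest, v))) for v in ranking[1:3] if v not in adj[weakest]]
print(most_central, ranking[:3], suggested)  # 2 [5, 1, 6] [(1, 5), (5, 6)]
```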
The simulation results in Table 4 (emphasis on edges) showed that the greatest increases in the reliability of the fictitious network of researchers modeled by graph G were obtained precisely with the insertion of edges between nodes or researchers '1' and '5' and between researchers '5' and '6', in that order. This fully corroborates the results obtained with the centrality measures.

Conclusion
Studies on the reliability of scientific co-authorship networks identify which networks are reliable under different approaches (edges and/or nodes), according to the participation of researchers and the intensity of existing co-authorship relations.
The advantage of the approach with emphasis on nodes is that, in future research, it makes it possible to measure the importance of each researcher in the network by calculating the network reliability conditioned on that researcher's absence. Although an approach considering both edges and nodes would be desirable, there are reasons for not adopting it: first, it may be intractable; second, the extra information gained by adding edge failures to the approach that considers only nodes may be negligible. The same reasoning applies to an approach focused on edges.
The results of this work showed that the reliability of the co-authorship network with emphasis on edges or co-authorship relations is more sensitive to changes in individual reliabilities than the reliability of the network with emphasis on nodes or researchers.
The example provided showed that calculating the reliability of a co-authorship network may be laborious, whether done manually or by computer. Centrality measures may be considered a feasible alternative. However, the use of other centrality measures and further simulations are recommended to obtain more trustworthy results.

Figure 2. Graph G from Figure 1 with emphasis on edges or co-authorship relations: nodes (researchers) are perfectly reliable and edges (co-authorship relations) are prone to failure.

Figure 3. Graph G from Figure 1 with emphasis on nodes or researchers (k = 6 nodes or researchers).

$$A = p_1 \times p_2 \times p_3 \times p_4 \times p_5 \times p_6 = 0.12096 \qquad (13)$$

Figure 1. Graph G with six nodes or researchers (numbers in white) and six edges or co-authorship relations (numbers in black).

Table 1. Reliability of the co-authorship network modeled by graph G (edges or co-authorship relations with equal reliabilities) for different values of p.

Table 2. Reliability of the co-authorship network modeled by graph G (nodes or researchers with equal reliabilities) for different values of p.
Figure 4. Comparison between the reliabilities of the co-authorship network modeled by graph G ($R_G(p)$) shown in Tables 1 and 2 for different values of p.

Table 6. Centrality measures of nodes for graph G.