During my bachelor's, I picked up Albert-László Barabási's Network Science book for a research project. The premise is elegant: protein interactions, power grids, social media feeds, financial markets, and many other complex systems can be modeled as graphs, with objects as nodes and relationships as edges. Once you have a graph, you can measure things: who is central, what clusters form, how fast information spreads, and whether the network has the "small-world" property where any two nodes are surprisingly close [1].
Scientific collaboration networks fell into this framework naturally and captivated me. I wanted to apply these ideas to Brazilian computer science to answer some questions that I had out of curiosity. The result is a peer-reviewed paper, Beyond Boundaries: Collaboration Networks and Research Output in Brazilian Computer Science, co-authored with André Vignatti, published in the XIV Brazilian Workshop on Social Network Analysis and Mining (BraSNAM) [2]. This post is my attempt to bring the findings out of LaTeX and into plain language, while adding some context about the research.
The Science of Measuring Science
Before the methodology, some framing. Bibliometrics is the quantitative study of scientific publications: how many papers are produced, who cites whom, how journals and conferences vary in prestige, and how those numbers change over time. A related but broader field, scientometrics, studies science itself as a social and epistemic system, asking how institutions, funding, geography, and policy shape what knowledge gets produced and by whom [3].
These fields have exploded in the last two decades, partly because open bibliometric databases made large-scale analysis feasible, and partly because funding agencies started using bibliometric indicators in evaluation processes. In Brazil, CAPES (which accredits graduate programs) and CNPq (the main research funding agency) both rely on metrics such as publication counts, citation rates, and collaboration breadth to assess researchers and programs.
Instead of counting papers and citations in isolation, Social Network Analysis (SNA) lets you ask structural questions: is this research community tightly clustered or spread out? Who are the "bridge" researchers connecting otherwise separate groups? How much does the network depend on a handful of key connectors? These questions cannot be answered by looking at individual papers; they require treating the entire community as a system.
Data Collection
There are a few options for collecting publication metadata: open sources such as DBLP, Semantic Scholar, and OpenAlex, and commercial tools such as Scopus and Web of Science. All of them expose publication metadata through API requests.
We decided to use OpenAlex for two main reasons. First, its metadata is detailed enough to provide everything we needed. For instance, it includes the authors' institutional affiliations, which support geographical analysis, and a systematic classification of publications into disciplinary subfields. Second, its API documentation is great, especially compared to other data sources, where both the API data and the documentation are confusing. Overall, OpenAlex is very easy to use, and the data we collected follows this structure:

We collected all data on March 31, 2025, and stored the raw dataset in CSV. The pipeline followed a classic ETL (Extract, Transform, Load) pattern:
- Extract: Python scripts queried the OpenAlex API in batches of 25 records, organized by subfield and country to handle pagination constraints.
- Transform: We cleaned duplicates (using DOIs and OpenAlex IDs as keys), removed entries without DOIs, and built co-authorship graphs using NetworkX. Edges were weighted by collaboration frequency and graphs were exported as GEXF files.
- Load: Statistical visualizations were produced with Seaborn and network visualizations with Gephi.
All code and datasets are publicly available on our GitHub repository.
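To make the Extract and Transform steps more concrete, here is a simplified sketch rather than the exact code in the repository. The filter keys (`primary_topic.subfield.id`, `authorships.institutions.country_code`), the cursor-based paging, and the injected `fetch_page` callable are assumptions about how one could query the OpenAlex `/works` endpoint; all helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_params(subfield_id, country_code="BR", cursor="*"):
    """Query parameters for one page of /works (filter keys are assumptions)."""
    filters = [
        f"primary_topic.subfield.id:{subfield_id}",
        f"authorships.institutions.country_code:{country_code}",
    ]
    return {"filter": ",".join(filters), "per-page": 25, "cursor": cursor}


def fetch_all(fetch_page, subfield_id):
    """Yield works page by page until no next cursor is returned.

    fetch_page is any callable that takes the params dict and returns the
    decoded JSON page, e.g. a thin wrapper around an HTTP GET.
    """
    cursor = "*"
    while cursor:
        page = fetch_page(build_params(subfield_id, cursor=cursor))
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")


def clean(works):
    """Drop works without a DOI and duplicates (by DOI or OpenAlex ID)."""
    seen = set()
    for work in works:
        doi = work.get("doi")
        if not doi:
            continue
        keys = {doi.lower(), work["id"]}
        if seen & keys:
            continue
        seen |= keys
        yield work


def coauthor_edges(works):
    """Count how many works each pair of authors has in common."""
    weights = Counter()
    for work in works:
        authors = sorted({a["author"]["id"] for a in work["authorships"]})
        for pair in combinations(authors, 2):
            weights[pair] += 1
    return weights
```

The resulting weights can be loaded into NetworkX with `G.add_weighted_edges_from((u, v, w) for (u, v), w in weights.items())` and written out with `nx.write_gexf(G, path)`.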
Global Landscape
Brazil ranks 12th globally in publication output, with 76,184 publications and 447,919 citations. This places Brazil in the middle tier of global research productivity, yet its 5.88 citations per paper fall considerably short of high-impact nations like Australia, Great Britain, and the United States, which all exceed 20.
| Rank | Country | Publications | Citations | Citations/Paper |
|---|---|---|---|---|
| 1 | 🇨🇳 China | 694,103 | 8,280,834 | 11.93 |
| 2 | 🇺🇸 United States | 474,474 | 9,590,230 | 20.21 |
| 3 | 🇮🇳 India | 311,644 | 2,224,750 | 7.14 |
| 4 | 🇮🇩 Indonesia | 266,047 | 755,078 | 2.84 |
| 5 | 🇩🇪 Germany | 141,044 | 1,960,782 | 13.90 |
| 6 | 🇬🇧 Great Britain | 140,019 | 3,026,576 | 21.62 |
| 7 | 🇯🇵 Japan | 103,265 | 799,048 | 7.74 |
| 8 | 🇫🇷 France | 92,131 | 1,011,787 | 10.98 |
| 9 | 🇨🇦 Canada | 84,076 | 1,562,176 | 18.58 |
| 10 | 🇷🇺 Russian Federation | 83,214 | 363,210 | 4.36 |
| 11 | 🇮🇹 Italy | 80,647 | 1,021,944 | 12.67 |
| 12 | 🇧🇷 Brazil | 76,184 | 447,919 | 5.88 |
| 13 | 🇪🇸 Spain | 75,433 | 944,337 | 12.52 |
| 14 | 🇰🇷 South Korea | 74,421 | 994,882 | 13.37 |
| 15 | 🇦🇺 Australia | 68,502 | 1,561,906 | 22.80 |
Many factors influence how often a scientific publication is cited, including the quality of the research, the field of study, the publication venue, and the language. These factors are extensively studied in Bibliometrics and Scientometrics [3], but they are not the focus of our work. Instead, we investigate international scientific collaboration involving Brazilian researchers.
International Collaboration Levels
To examine the level of international collaborations for each Computer Science subfield, we classified each publication as domestic (all authors at Brazilian institutions) or international (at least one co-author at a foreign institution).
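As a rough illustration of this classification rule, the sketch below labels a single work from its authorship records. It assumes OpenAlex-style records where each authorship lists institutions with a `country_code`; the function names are hypothetical, and ignoring authors without any affiliation data is an assumption of this sketch.

```python
def classify(work, home="BR"):
    """Label a work 'international' if any affiliation is outside `home`."""
    countries = {
        inst.get("country_code")
        for authorship in work["authorships"]
        for inst in authorship.get("institutions", [])
    }
    countries.discard(None)  # assumption: missing country codes are ignored
    return "international" if countries - {home} else "domestic"


def international_share(works, home="BR"):
    """Percentage of works with at least one foreign affiliation."""
    labels = [classify(work, home) for work in works]
    return 100 * labels.count("international") / len(labels)
```

The percentages in the table follow the same arithmetic: for Theory & Math, 1,480 / (2,471 + 1,480) is about 37.46%.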
The share of international publications varies widely across subfields: Theory & Math leads in international collaboration (37%), while Info Systems lags (17%).
| Subfield | Domestic-only Pub. | Domestic-only % | International Pub. | International % |
|---|---|---|---|---|
| Theory & Math | 2,471 | 62.54 | 1,480 | 37.46 |
| Networks & Comm | 5,309 | 67.34 | 2,575 | 32.66 |
| Graphics & CAD | 251 | 68.21 | 117 | 31.79 |
| Hardware & Arch | 811 | 69.38 | 358 | 30.62 |
| AI | 10,009 | 70.68 | 4,152 | 29.32 |
| Vision & Recognition | 4,470 | 71.93 | 1,744 | 28.07 |
| Signal Processing | 1,543 | 72.20 | 594 | 27.80 |
| Software | 583 | 72.24 | 224 | 27.76 |
| CS Apps | 2,260 | 74.69 | 766 | 25.31 |
| HCI | 1,443 | 78.42 | 397 | 21.58 |
| Info Systems | 22,135 | 83.26 | 4,450 | 16.74 |
| Total | 51,285 | 75.26 | 16,857 | 24.74 |
The United States is the primary collaborator of Brazil across all subfields. Portugal emerges as the second most significant collaborator, likely facilitated by shared linguistic and cultural ties. Other European countries, such as Spain, Germany, and France, exhibit specialized collaborations.

Our analysis also showed that papers involving international collaboration receive substantially more citations than domestic-only papers, with an average of 12.56 citations per paper compared to 3.95. Furthermore, international collaboration papers exhibited a considerably lower zero-citation rate: only 29% remained uncited, whereas 49% of domestic-only papers received no citations.
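Both numbers are simple group statistics. A minimal sketch of how they could be computed, assuming each paper is reduced to a (group label, citation count) pair such as the `cited_by_count` field in OpenAlex; the function name and the toy data are illustrative only:

```python
from collections import defaultdict
from statistics import mean


def summarize_by_group(records):
    """records: iterable of (group_label, citation_count) pairs."""
    groups = defaultdict(list)
    for label, count in records:
        groups[label].append(count)
    return {
        label: {
            "papers": len(counts),
            "mean_citations": mean(counts),
            # Share of papers in the group that were never cited.
            "uncited_share": sum(c == 0 for c in counts) / len(counts),
        }
        for label, counts in groups.items()
    }


# Toy data, not taken from the dataset.
toy = [("domestic", 0), ("domestic", 4),
       ("international", 0), ("international", 10), ("international", 20)]
summary = summarize_by_group(toy)
```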
Network Structure and Dynamics
The full network, encompassing all publications, has 128,847 nodes (N) and 492,524 edges (E), a fragmentation (F) of 0.64, an average clustering coefficient (C) of 0.78, and 76,770 nodes in the largest component (NLC). This suggests a relatively connected network in which an author's collaborators are very likely to also collaborate with each other.
| Network | Pub. | N | E | F | C | NLC |
|---|---|---|---|---|---|---|
| Info Systems | 26,585 | 65,349 | 196,087 | 0.83 | 0.80 | 26,543 |
| AI | 14,161 | 28,636 | 119,515 | 0.64 | 0.77 | 17,211 |
| Networks & Comm | 7,884 | 14,701 | 48,756 | 0.64 | 0.78 | 8,763 |
| Vision & Recognition | 6,214 | 13,776 | 49,678 | 0.69 | 0.82 | 7,641 |
| Theory & Math | 3,951 | 8,568 | 46,513 | 0.87 | 0.80 | 3,132 |
| CS Apps | 3,026 | 7,023 | 17,872 | 0.85 | 0.78 | 2,705 |
| Signal Processing | 2,137 | 4,399 | 13,340 | 0.89 | 0.82 | 1,409 |
| HCI | 1,840 | 4,516 | 12,116 | 0.93 | 0.78 | 1,174 |
| Hardware & Arch | 1,169 | 2,276 | 7,084 | 0.65 | 0.83 | 1,345 |
| Software | 807 | 1,736 | 4,221 | 0.86 | 0.80 | 630 |
| Graphics & CAD | 368 | 925 | 1,709 | 0.98 | 0.79 | 105 |
| Full network | 68,142 | 128,847 | 492,524 | 0.64 | 0.78 | 76,770 |
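For readers who want to reproduce these columns, here is a minimal standard-library sketch. It assumes F is the pairwise fragmentation measure (the share of node pairs with no path between them), which this post does not spell out, and that C is the mean local clustering coefficient with low-degree nodes counted as zero. The function names are illustrative; in practice NetworkX offers equivalents such as `nx.connected_components` and `nx.average_clustering`.

```python
from collections import Counter, defaultdict


def component_sizes(nodes, edges):
    """Sizes of connected components, found with a small union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return sorted(Counter(find(n) for n in nodes).values(), reverse=True)


def fragmentation(sizes):
    """Share of node pairs that cannot reach each other (assumed definition)."""
    n = sum(sizes)
    if n < 2:
        return 0.0
    connected_pairs = sum(s * (s - 1) for s in sizes)
    return 1 - connected_pairs / (n * (n - 1))


def average_clustering(nodes, edges):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    total = 0.0
    for n in nodes:
        nbrs = neighbors[n]
        k = len(nbrs)
        if k < 2:
            continue
        # Each link between two neighbors is seen from both ends, hence / 2.
        links = sum(len(neighbors[a] & nbrs) for a in nbrs) / 2
        total += links / (k * (k - 1) / 2)
    return total / len(nodes)


# Toy graph: a triangle, a separate pair, and an isolated node.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
sizes = component_sizes(nodes, edges)  # NLC is sizes[0]
```

As a sanity check on the assumed definition, the AI row is roughly consistent with it: the largest component alone gives 1 - (17,211 × 17,210) / (28,636 × 28,635), which is about 0.64.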
Visualizing this type of network is challenging because the large number of nodes and edges leads to heavily cluttered representations, which reduces interpretability. So we generated a series of network visualizations focusing on two aspects of the collaboration structure: highly cited publications (those with more than 40 citations) and recurrent co-authorship relationships. The analysis considered both country-level and subfield-level collaboration networks. The filtered network has 1,063 nodes and 2,237 edges.
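One way such a filter could look is sketched below. The citation cutoff of 40 comes from the text above, but the `min_shared_papers` threshold for "recurrent" co-authorship and the choice to apply both filters together are assumptions made only for illustration; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def filtered_edges(works, min_citations=40, min_shared_papers=2):
    """Co-authorship edges among highly cited works, kept only if recurrent."""
    weights = Counter()
    for work in works:
        # Keep only works with MORE than `min_citations` citations.
        if work["cited_by_count"] <= min_citations:
            continue
        authors = sorted({a["author"]["id"] for a in work["authorships"]})
        weights.update(combinations(authors, 2))
    # min_shared_papers is a hypothetical threshold for "recurrent" pairs.
    return {pair: w for pair, w in weights.items() if w >= min_shared_papers}
```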


In highly cited publications, Brazilian authors form central hubs, and international authors are not evenly distributed over the network, which suggests that international partnerships concentrate around a few key researchers. In addition, cross-disciplinary edges exist but largely rely on a handful of “bridge” researchers.
Discussion
It is very interesting to see how collaboration varies across Computer Science subfields, and this gives us more context for understanding some of the drivers of research impact. Still, there are many other angles to explore. For instance, adding another data source (Scopus, Semantic Scholar, Web of Science) would help improve data coverage and quality. It would also help to compare Brazilian collaboration networks with those of other countries. Canada, France, and Italy, for example, have a similar number of publications, which raises a few questions: what do these countries' collaboration networks look like? Which countries do they collaborate with the most? How does the citation gap between international and domestic papers differ across subfields in those countries?
[1] https://networksciencebook.com/
[2] https://sol.sbc.org.br/index.php/brasnam/article/view/36367
[3] https://researchmusings.substack.com/p/scientometrics-or-bibliometrics
[4] https://link.springer.com/article/10.1023/A:1017919924342