I was recently shown this Web site, called the Computer Science Ranking, which provides "[a] ranking of top computer science schools". According to the site, it "is designed to identify institutions and faculty actively engaged in research across a number of areas of computer science, based on the number of publications by faculty that have appeared at the most selective conferences in each area of computer science".

Now, such a ranking is highly perplexing.

As software engineers, we all know that there is no single metric that can adequately and comprehensively measure the quality of all software systems. Indeed, LOC (or one of its many variants) is a "good" first metric, but we also know and understand that LOC makes little sense when comparing, say, two Java systems, one for image processing (short but complex) and another for managing a database (long but straightforward), or when comparing, say, a Java system with a Prolog system. As software engineers, we all know that no single metric can adequately and comprehensively describe all software systems and that we need many metrics (to measure different aspects of the systems), quality models (to link and interpret the metrics and their values), and external factors (to assess the adequacy and comprehensiveness of the models).

Perplexing? (Arguments)

As researchers, we should know better than to use one and only one metric to measure the quality of all software-engineering researchers. Indeed, ICSE, FSE, ASE, and ISSTA are (as of today) the "selective conferences". But what does their "selectiveness" represent? Like LOC, "selectiveness" (or one of its many variants) is a "good" first metric, but we should all know and understand that "selectiveness" makes little sense when comparing, say, one researcher who comes from a well-endowed Ph.D. program and a long line of dedicated mentors with another who makes do with few resources, no mentoring, or a high teaching load. As researchers, we should all know that there is no single metric that can adequately and comprehensively describe all researchers and that we need many metrics that measure different aspects of researchers and models that combine these metrics.

NSERC (the Natural Sciences and Engineering Research Council of Canada) does a very good job of taking into account the different aspects of the roles of the researchers applying to its Discovery Grant program. It explicitly instructs the members of its Evaluation Groups (and associated reviewers) to consider different aspects of these roles: funding, training, and publications. It also takes into account circumstances affecting productivity. It further considers, in part, community service and the environments in which researchers work. NSERC then asks the members of its Evaluation Groups to weigh all these aspects and grade each applicant on the "Excellence of the researcher" and their "Contributions to the training of highly qualified personnel". As such, NSERC performs more adequate and comprehensive evaluations of researchers and their organisations. (The two aspects that NSERC does not fully consider, administrative service and access to a graduate program, are debated every year.)

Perplexing? (Numbers)

Just in Canada, more than 3,000 researchers in various areas of computer science apply to the NSERC Discovery Grant program every year and, thus, could be considered "actively engaged in research". Would it make any sense for these 3,000 researchers to submit to ICSE and FSE? Let us assume that only half of these researchers would submit to ICSE or FSE, and that only half of this half would submit to ICSE: that still leaves 750 researchers. If these 750 researchers each submitted only one paper (and some would submit more!), that would already exceed the 530 submissions that ICSE received in 2016. And that is from only one country!
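As a quick sanity check on these back-of-the-envelope numbers (the 3,000 applicants and the 530 submissions are the figures quoted above; the two one-half fractions and the one-paper-each rate are the assumptions just stated):

```python
# Back-of-the-envelope check of the numbers above (all figures and
# fractions are the assumptions stated in the text, not real data).
active_researchers = 3_000                      # NSERC Discovery Grant applicants per year
submit_icse_or_fse = active_researchers // 2    # assume half: 1,500
submit_icse = submit_icse_or_fse // 2           # assume half of that half: 750
papers_each = 1                                 # assume one paper per researcher
canadian_submissions = submit_icse * papers_each

icse_2016_submissions = 530
print(canadian_submissions)                             # 750
print(canadian_submissions > icse_2016_submissions)     # True: 750 > 530
```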

In addition, blindly (misleadingly?) counting the number of papers published in a (very) small set of conferences obviously leads to perverse effects, including but not limited to: inflation of the number of submissions, creation of a "caste" of submitters and reviewers, promotion of an "us vs. them" mentality... all of them detrimental to the advancement of software-engineering research. (Allow me to draw a parallel between using a subset of "selective conferences" to rank researchers and their organisations and using the amount of money brought in by traffic tickets to rank police officers...)

Alternative Solutions?

Rather than relying on a subset of "selective conferences", which, by the very definition of "selectiveness" and the difficulty of the reviewers' jobs, necessarily reject meritorious papers and accept mediocre ones, "[a] ranking of top computer science schools" must take into account the different aspects of the roles of researchers, including but not limited to funding, training, publications, community service, special circumstances, and environment. I am all for ranking researchers and their organisations to promote excellence, but I am against using metrics (and no models!) that lead to biases and, ultimately, worthless rankings. The sketch below makes this "many metrics plus a model" idea concrete.
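Here is a deliberately naive sketch in Python: the aspects mirror those listed above, but the weights and the weighted sum are illustrative assumptions of mine, not NSERC's model and not a proposal for actual weights.

```python
# A deliberately naive illustration of combining several aspects of a
# researcher's role into one evaluation, instead of counting papers at
# a handful of "selective conferences". Aspects follow the text above;
# the weights and the linear combination are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ResearcherProfile:
    funding: float            # all scores normalised to 0..1
    training: float           # supervision of highly qualified personnel
    publications: float       # all venues, not only "selective" conferences
    community_service: float
    environment: float        # resources, mentoring, teaching load, etc.

WEIGHTS = {                   # debatable by design: the model is explicit
    "funding": 0.20,
    "training": 0.30,
    "publications": 0.30,
    "community_service": 0.10,
    "environment": 0.10,
}

def evaluate(profile: ResearcherProfile) -> float:
    """Combine many metrics through an explicit (and criticisable) model."""
    return sum(w * getattr(profile, aspect) for aspect, w in WEIGHTS.items())

# Example: a researcher with little funding but strong training and service.
print(evaluate(ResearcherProfile(funding=0.2, training=0.9,
                                 publications=0.6, community_service=0.8,
                                 environment=0.4)))   # 0.61
```

Such a toy model is, of course, as debatable as the weights it uses; the point is only that the metrics and the model are explicit and open to criticism, unlike a single count of papers at "selective conferences".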

Disclaimer: I was a member of the CS 1507 NSERC Evaluation Group for four years; my tenure ended in 2017. I have no stake in NSERC: I promote it because I argue that its instructions provide more adequate and comprehensive evaluations of researchers and their organisations than any set of "selective conferences".

Further reading: Pek, Mens, et al. have an interesting paper on "How healthy are software engineering conferences?"