Ad-hoc Features and Where to Find Them, Part 1: Origin Story

When you look at a piece of software, it is often fairly easy to tell which features are implemented and where: there is a good amount of documentation, the code is well structured, or there are tools that help with feature location. However, not all features are simple to find: some because of poor maintenance, others because the feature was never treated as one during implementation. I call these ad-hoc features. As the name suggests, these features were implemented without much thought: maybe developers did not think it worthwhile to package them properly and they ended up being reused elsewhere, maybe a large team got confused and re-implemented the same piece of code many times, or maybe no one recognized that there was a feature to begin with!

Recently, there has been some progress on the detection of such features, and it was shown that concept lattices can help in that endeavor [1]. In short, a concept lattice is a hierarchical structure composed of concepts, which is used to find commonalities and variations among a set of objects described by some attributes. Each concept pairs a set of objects (its extent) with the set of attributes that those objects share (its intent). See this post for an introductory approach to lattices in software engineering.
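To make the idea of a concept more tangible, here is a minimal sketch that enumerates the concepts of a tiny context, each concept being a pair (objects, shared attributes). The objects A, B, C and the attributes x, y, z are made up for illustration, and the brute-force enumeration is my own shortcut for toy inputs, not the algorithm used in [1].

```python
from itertools import combinations

# Hypothetical toy context: each object maps to the attributes it has.
CONTEXT = {
    "A": {"x", "y"},
    "B": {"x", "y", "z"},
    "C": {"z"},
}

def shared_attributes(objects, context):
    """Attributes common to every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared

def objects_with(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Naive enumeration: close every subset of objects (exponential, toy inputs only)."""
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = shared_attributes(subset, context)
            extent = objects_with(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts(CONTEXT)
# Print from the most specific concept (small extent) to the most general one.
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

On this toy context the sketch finds four concepts, for instance the pair ({A, B}, {x, y}): A and B are exactly the objects that share both x and y.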

In a paper from 2023, Mili et al. [1] showed that, given the correct incidence relation and appropriate post-processing, we can build a lattice and transform it to detect ad-hoc features. Let's look at a concrete example from the paper to see how this works:

Given this example hierarchy, the ad-hoc feature that we want to detect consists of the elements {capabilities, schedule, assemblyLine, licenseClass}, which are duplicated across classes that are not directly related (i.e., there is no sub/superclass relation between the classes that implement these elements, and there is no composition). This is a textbook example of an ad-hoc feature, which is defined in the paper as "sets of program elements at the class level, i.e., fields and methods, that appear together in multiple locations in the system", with togetherness and multilocation being defined respectively as "'occurring within the same class sub-hierarchy' and [...] 'occurring in two sub-hierarchies that are not hierarchically related', i.e., such that one sub-hierarchy is not included in the other" [1].

With this understanding in mind, we can now get to building our concept lattice. To do so, we first have to encode the class hierarchy, i.e., associate each class with some program elements to build an incidence relation. The encoding is done as follows:

Each method is referred to using its signature, and each attribute is referred to by its type and name (so methods from different classes that have the same signature map to the same attribute in the relation).
Each class is then associated with the elements that it declares, plus the elements declared by all of its subclasses.
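The encoding above can be sketched in a few lines. The hierarchy below is an assumption on my part, loosely modelled on the class names used in this post (the exact hierarchy from the paper is not reproduced here), the "name" element is hypothetical, and elements are referred to by name only since their types and signatures are not given.

```python
# Assumed toy hierarchy: child class -> parent class (single inheritance).
PARENT = {
    "Personnel": "Ressource",
    "ShopFloorStaff": "Personnel",
    "Machinery": "Ressource",
}

# Elements declared directly in each class (names only; "name" is hypothetical).
FEATURE = {"capabilities", "schedule", "assemblyLine", "licenseClass"}
DECLARED = {
    "Ressource": set(),
    "Personnel": {"name"},
    "ShopFloorStaff": set(FEATURE),
    "Machinery": set(FEATURE),
}

def with_subclasses(cls, parent):
    """Return `cls` together with all of its direct and indirect subclasses."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in found and child not in found:
                found.add(child)
                changed = True
    return found

def incidence_relation(declared, parent):
    """Associate each class with its own elements and those of all its subclasses."""
    return {
        cls: set().union(*(declared[c] for c in with_subclasses(cls, parent)))
        for cls in declared
    }

relation = incidence_relation(DECLARED, PARENT)
for cls in sorted(relation):
    print(cls, sorted(relation[cls]))
```

With this encoding, a superclass such as Ressource ends up carrying every element declared anywhere below it, which is why superclasses show up in the same concept extents as the classes that actually declare the feature.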

Once the relation is built, we get the following lattice:

We can clearly see in Concept_ExAntiSpec_4 the feature that we're after, but some post-processing is still needed before we can extract it. Let's refer back to our original hierarchy. We can see below that the classes Personnel and Ressource contain the same attributes as ShopFloorStaff, but these three classes all correspond to the same occurrence of the feature we're looking for. Because we're looking for independent occurrences, we keep only the class at the very bottom of the sub-hierarchy, i.e., the one closest to where the feature occurs. This means that, out of the three classes, we only keep ShopFloorStaff. The reason we don't choose one of the other two classes is to avoid conflicts with other, potentially independent occurrences. For instance, if we kept Ressource, it would conflict with the occurrence that we detect from Machinery.
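Here is a minimal sketch of that pruning step: from a concept's extent, keep only the classes that have no subclass in the same extent. The parent map is an assumed toy hierarchy based on the class names mentioned above, and single inheritance is assumed for simplicity.

```python
# Assumed toy hierarchy: child class -> parent class (single inheritance).
PARENT = {
    "Personnel": "Ressource",
    "ShopFloorStaff": "Personnel",
    "Machinery": "Ressource",
}

def ancestors(cls, parent):
    """All direct and indirect superclasses of `cls`."""
    found = set()
    while cls in parent:
        cls = parent[cls]
        found.add(cls)
    return found

def lowest_classes(extent, parent):
    """Drop every class that is an ancestor of another class in the extent."""
    covered = set()
    for cls in extent:
        covered |= ancestors(cls, parent)
    return {cls for cls in extent if cls not in covered}

extent = {"Ressource", "Personnel", "ShopFloorStaff", "Machinery"}
occurrences = lowest_classes(extent, PARENT)
print(sorted(occurrences))
```

On this extent only ShopFloorStaff and Machinery survive, i.e., one class per independent occurrence.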

You might notice that we could also detect {assemblyLine} and {licenseClass}. These are ad-hoc features too, but they are less interesting than the more general case {capabilities, schedule, assemblyLine, licenseClass}. If such a sub-feature has more independent occurrences than the more general one (i.e., the extent of the related concept contains more independent occurrences), we keep it; otherwise we discard it.
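As a rough sketch of this rule: a feature whose elements are a strict subset of another feature's elements is kept only if it has strictly more independent occurrences. The occurrence sets below are invented for illustration (in particular, the Vehicle class is hypothetical), and this is my reading of the rule rather than the paper's exact procedure.

```python
FULL = frozenset({"capabilities", "schedule", "assemblyLine", "licenseClass"})

# (intent, independent occurrences) pairs; the data is hypothetical.
features = [
    (FULL, {"ShopFloorStaff", "Machinery"}),
    (frozenset({"assemblyLine"}), {"ShopFloorStaff", "Machinery"}),
    (frozenset({"licenseClass"}), {"ShopFloorStaff", "Machinery", "Vehicle"}),
]

def drop_redundant_subfeatures(features):
    """Keep a sub-feature only if it occurs more often than every feature that contains it."""
    kept = []
    for intent, occurrences in features:
        redundant = any(
            intent < other_intent and len(occurrences) <= len(other_occurrences)
            for other_intent, other_occurrences in features
        )
        if not redundant:
            kept.append((intent, occurrences))
    return kept

kept = drop_redundant_subfeatures(features)
for intent, occurrences in kept:
    print(sorted(intent), sorted(occurrences))
```

Here {assemblyLine} is discarded because it occurs in exactly the same places as the full feature, while {licenseClass} is kept because it has one additional occurrence.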

In the end, once the extent of each concept has been rid of all the superfluous classes and all the uninteresting sub-features have been deleted, every concept whose intent is non-empty and whose extent contains more than one class is a potential ad-hoc feature. Notice that I wrote 'potential': more work needs to be done to filter out the features that are not ad hoc. Indeed, because of composition and redefinition, the same elements can occur in several parts of the hierarchy even though the classes in the extent represent a deliberate, well-designed feature.
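The selection rule itself is simple enough to write down directly. The concepts below are placeholders I made up (classes A, B, C and elements f, g, h), with extents assumed to be already pruned.

```python
# (intent, pruned extent) pairs; hypothetical placeholder data.
pruned_concepts = [
    (frozenset(), {"A", "B", "C"}),        # empty intent: nothing shared
    (frozenset({"f", "g"}), {"A", "B"}),   # shared by two unrelated classes
    (frozenset({"h"}), {"C"}),             # a single occurrence
]

def potential_adhoc_features(concepts):
    """Keep concepts with a non-empty intent and more than one class in the extent."""
    return [
        (intent, extent)
        for intent, extent in concepts
        if intent and len(extent) > 1
    ]

candidates = potential_adhoc_features(pruned_concepts)
print(candidates)
```

Only the second concept remains as a candidate, and it would still have to go through the filtering described next.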

The original paper considers three other categories of feature:

Interface implementation
Aggregation
Class-subclass redefinition

For brevity, I won't explain this last part of the approach. Still, the need for such manipulations shows that, while the approach works and can detect useful ad-hoc features (see the original paper for the detailed arguments), it can probably be improved.

Now that we've laid the groundwork for further discussion, I'll try to make the case, over the next few blog posts, as to why this approach can probably be improved, and I'll present my own attempt at providing a better algorithm for the detection of ad-hoc features (still using concept lattices!).

See you soon!

References:

[1] H. Mili, I. Benzarti, A. Elkharraz, G. Elboussaidi, Y.-G. Guéhéneuc and P. Valtchev, "Discovering Reusable Functional Features in Legacy Object-Oriented Systems," in IEEE Transactions on Software Engineering, vol. 49, no. 7, pp. 3827-3856, July 2023.

Jungle image from https://pxhere.com/en/photo/912906