Ad-hoc Features and Where to Find Them, Part 2: Current Limitations

Previously, we discussed how lattices can be used to discover features. Using a relationship called 'reverse-inheritance', it is possible to associate classes with methods from their sub-classes, and build a lattice that can then be processed to extract features. This is a great way to understand a system we want to study or improve, and the original paper introducing this method [1] shows real examples of how the approach can be useful in practice. However, when focusing specifically on ad-hoc features (i.e., poorly implemented features that don't fully take advantage of composition/inheritance to reduce duplication), the method has some shortcomings that I will develop in this blog post.

As a small reminder, the kind of feature we want to detect looks like this:

That is, a set of methods that is duplicated together in the hierarchy. However, a quick look at our reference paper [1] reveals at least one limitation, which I'll call the "one method, many paths" problem. Because there isn't always a single path for our methods to 'travel upwards' in the hierarchy (for instance, because of interfaces in Java programs), a set of methods can appear to be duplicated when, in fact, some elements of the set come from the same place. Thankfully, not all elements can come from the same place; otherwise the super-classes would have been considered redundant and cut from the extent (cf. my previous post for an explanation of why that is).
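To make the "one method, many paths" problem more tangible, here is a minimal Python sketch. It is not the algorithm from the paper: the hierarchy, the type names (Child, Base, IMarker, Other), the method run and the helper functions are all hypothetical, and I assume that each type keeps its own methods in addition to receiving those of its sub-types. The sketch tracks where each reverse-inherited method was declared, so that two types sharing a method can be checked for a common origin.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (names are illustrative, not from the paper):
# each type maps to its direct supertypes (classes or interfaces).
SUPERTYPES = {
    "Child": ["Base", "IMarker"],  # Child extends Base and implements IMarker
    "Base": [],
    "IMarker": [],
    "Other": [],
}

# Methods declared directly in each type.
DECLARED = {
    "Child": {"run"},
    "Base": set(),
    "IMarker": set(),
    "Other": {"run"},
}


def ancestors(type_name):
    """Return every strict supertype reachable from type_name, along any path."""
    seen, stack = set(), list(SUPERTYPES[type_name])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERTYPES[current])
    return seen


def reverse_inherited():
    """Map each type to {method: declaring types}, letting methods travel upwards."""
    table = defaultdict(lambda: defaultdict(set))
    for declaring_type, methods in DECLARED.items():
        # Assumption: a type keeps its own methods and receives those of its subtypes.
        for target in {declaring_type} | ancestors(declaring_type):
            for method in methods:
                table[target][method].add(declaring_type)
    return table


def same_origin(table, method, first, second):
    """True if both types received `method` from at least one common declaration."""
    return bool(table[first][method] & table[second][method])


table = reverse_inherited()
print(same_origin(table, "run", "Base", "IMarker"))  # True: both copies come from Child
print(same_origin(table, "run", "Base", "Other"))    # False: two separate declarations
```

In this toy case, Base and IMarker both seem to "have" run(), but only because Child.run() reached them through two different paths, whereas Base and Other share run() through two distinct declarations.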

Nonetheless, this is annoying for any user of this method. They might have to inspect every method of the ad-hoc features they find to see whether it is included because there is truly some kind of duplication, or because the hierarchy is built in such a way that the methods come from the same place through two different paths. In practice, we could argue that developers should be aware of the hierarchy they built, and would be able to catch such occurrences at a glance, but (1) we probably don't want to impose such a strain on our users and (2) not every user will be familiar with the code that they analyse. They might use this method to understand a new system, or to maintain some legacy code that they don't really care about.

Besides this inherent limitation, due to the way lattices are built, there are many edge cases where the method produces false positives. Let's look at a few examples:

This is a concrete example where the problem I mentioned previously can appear. Ideally, we would like to detect a feature for the method getConstituent() that groups together Package and Element. However, because of the getTargetAssoc() method, there will be a concept grouping together IElemMarker and Element with both methods. This concept is more specialised than the first one (because it has more attributes), and there will be as many occurrences of the 'duplication' in both concepts, so the algorithm will delete the first concept because it is considered less 'interesting' (i.e., it can be compared to the other concept and has fewer methods, but contains the same number of non-minimal classes, as defined by Mili et al. [1]).
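The pruning rule described above can be sketched as follows. This is only my simplified reading of the rule as paraphrased in this post, not the implementation from Mili et al. [1]: the Concept class and both functions are hypothetical, the number of non-minimal classes is assumed to be computed elsewhere, and the value 2 used in the example is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    intent: frozenset   # methods shared by the classes of the concept
    non_minimal: int    # number of non-minimal classes, assumed precomputed


def is_less_interesting(a, b):
    """True if `a` would be dropped in favour of `b` under the simplified rule."""
    # Comparable and more specialised: b has every method of a, plus some more.
    comparable = a.intent < b.intent
    return comparable and a.non_minimal == b.non_minimal


def prune(concepts):
    """Keep only the concepts that no other concept makes less interesting."""
    return [c for c in concepts
            if not any(is_less_interesting(c, other) for other in concepts)]


# Hypothetical numbers: both concepts are assumed to have 2 non-minimal classes.
first = Concept("first", frozenset({"getConstituent"}), 2)
second = Concept("second", frozenset({"getConstituent", "getTargetAssoc"}), 2)

print([c.name for c in prune([first, second])])  # ['second']
```

Under these assumptions, the concept that isolates getConstituent() disappears as soon as a more specialised concept with the same count exists, which is exactly the behaviour that hides the feature we were looking for.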

In short, because of the "one method, many paths" problem, some features can be "hidden", or more annoying to find, as there will be false positives in the concept. Now, it may still be interesting to know that getTargetAssoc() always appears close to getConstituent(), at least from a "finding features" perspective, but from a refactoring point of view, it could be argued that having a way to isolate getConstituent() could be more interesting.

Let's move on to another example:

This one is more artificial, and shows a concept grouping C2 and C3 with foo() and bar(). Here, foo() is not duplicated in an ad-hoc manner, even though the "one method, many paths" problem is not involved. Unfortunately, we don't have a way to differentiate foo() from bar() using only the given lattice.

Finally, let's look at an example illustrating an annoying problem:

In this diagram, we can detect a simple feature with the classes highlighted in red. On the surface, there seems to be no issue, and looking at the diagram with the feature in mind, it seems pretty natural to try and move the duplicated methods into the Package class. In this case, it all works out, and we have a demonstration of how discovering features could lead to an improvement in software quality.

Let's add a method into PackageDefault, ideally a method that is not found anywhere else. Because of the "one method, many paths" problem, a new, supposedly more 'interesting' concept, grouping Package and IPackageDefault, will replace our original concept. Now, when looking at this feature, it seems a bit less natural to move the duplicated methods into Package. At least for me, the first intuition would be to look for some common ancestor of Package and IPackageDefault, not to mention the fact that we have a false positive to take care of.

This demonstrates two things:

Non-relevant methods can change the 'shape' of our lattice.
It is not always advantageous to favour the concepts that contain more methods over the 'smaller' concepts.
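The first point can be reproduced on a tiny formal context. The sketch below is a brute-force enumeration of formal concepts, which is only practical for toy inputs and is not the lattice construction used in the paper. The types A, B and I and the methods save, load and extra are hypothetical, and the contexts are assumed to already contain the reverse-inherited methods.

```python
from itertools import combinations


def concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context by brute force."""
    objects = set(context)
    attributes = set().union(*context.values())
    found = set()
    for size in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), size):
            # Extent: every object that has all the chosen attributes.
            extent = frozenset(o for o in objects if set(attrs) <= context[o])
            # Intent: everything the extent has in common (all attributes if it is empty).
            intent = frozenset(attributes if not extent
                               else set.intersection(*(context[o] for o in extent)))
            found.add((extent, intent))
    return found


# Hypothetical context: three types sharing the same two methods.
before = {
    "A": {"save", "load"},
    "B": {"save", "load"},
    "I": {"save", "load"},
}

# Same context after an unrelated method `extra` reaches both B and I.
after = {
    "A": {"save", "load"},
    "B": {"save", "load", "extra"},
    "I": {"save", "load", "extra"},
}

print(len(concepts(before)))  # 1
print(len(concepts(after)))   # 2
```

The method extra has nothing to do with the save/load feature, yet it creates a second, more specialised concept grouping B and I, which a "bigger is better" filter would then favour over the original one.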

There are also many edge cases that can be found by fiddling with multiple inheritance/weird hierarchies, but to avoid adding too much noise I won't elaborate on them.

In short, we found out that while the lattice-based method is able to discover useful features, it fails to filter out false positives in the case of ad-hoc detection. In the next blog post, I'll argue that we can improve this method and clearly differentiate ad-hoc methods from the rest, within the lattice, by extending the set of attributes used to build the concept lattice. This will also entail some changes to the reverse-inheritance relationship, but in the end we will have a specialised algorithm that is more capable of discovering ad-hoc features (at the cost of not detecting the other kinds of features detected by Mili et al.).

See you next time!

References:

[1] H. Mili, I. Benzarti, A. Elkharraz, G. Elboussaidi, Y.-G. Guéhéneuc and P. Valtchev, "Discovering Reusable Functional Features in Legacy Object-Oriented Systems," in IEEE Transactions on Software Engineering, vol. 49, no. 7, pp. 3827-3856, July 2023.

Jungle image from https://pxhere.com/en/photo/645274

Written by:

Luca Scistri