I expressed this idea recently in a tweet and then started writing it up in more detail as a comment to Brendan Byrd’s The Four Major Problems with CPAN blog post. It grew in detail until I figured I should just write it up as a blog post of my own. (I fell out of the way of blogging over the two years or so of focus and distraction that our major house extension took to go from conception to reality. I’ve been meaning to start blogging again more regularly anyway. I’ve a few blog posts brewing in the back of my mind, so we’ll see how it goes.)
In Brendan’s post he describes four problems with CPAN:
- Too many modules are unmaintained; abandoned but not marked as such.
- There is not enough data on what modules are mature; which ones are the “right ones” to use.
- Many modules are only used for semi-private needs.
- Modules cannot be renamed or deleted, even with a long-term deprecation process.
I’d like to propose a feature that doesn’t seem to address these issues directly but would, I believe, greatly reduce the significance of all of them.
Olaf Alders responded to Brendan’s post with Sifting Through the CPAN, pointing out the need for better search tools and specifically suggesting tagging. While tagging might be helpful in general, I think we need a way to explicitly guide users from one module to another.
I’ve long thought that CPAN would benefit from a mechanism to track “suggested alternative modules”. (And/or perhaps “suggested alternative distributions”, but I’ll just talk about modules for now.)
I envisage a “Suggested Alternatives” section in the right sidebar on every module page. It would show the top-N suggestions, with a [++] icon beside each, ordered by the number of people who have made the suggestion or agreed to it by pressing the [++] icon. And naturally it would have a text field to enter an existing module name, with type-ahead suggestions. Finally, the Suggested Alternatives heading would be a link to a details page.
The details page would show, for that module, every instance of a suggestion being made or up-voted, with the user and the date. That would let people see who made the suggestion and when. Users would be able to remove their own suggestions.
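To make the sidebar and details page concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption made for illustration: the `SuggestionStore` class, its method names, and the default of five sidebar entries are hypothetical and not part of any existing CPAN or MetaCPAN code. It records one vote per user per link, ranks the top-N alternatives by the number of supporters, and lets a user withdraw only their own vote.

```python
from collections import defaultdict
from datetime import date


class SuggestionStore:
    """Hypothetical in-memory stand-in for the suggestions data."""

    def __init__(self):
        # (from_module, to_module) -> {user: date the user suggested or up-voted}
        self._votes = defaultdict(dict)

    def suggest(self, user, from_module, to_module, when):
        # Making a suggestion and pressing [++] are treated as the same action:
        # one vote per user per link, keeping the earliest date.
        self._votes[(from_module, to_module)].setdefault(user, when)

    def remove(self, user, from_module, to_module):
        # A user can only withdraw their own vote.
        self._votes[(from_module, to_module)].pop(user, None)

    def top_alternatives(self, from_module, n=5):
        """Sidebar view: (module, supporter count), most supported first."""
        counts = [
            (to_mod, len(users))
            for (from_mod, to_mod), users in self._votes.items()
            if from_mod == from_module and users
        ]
        # Ties are broken by module name so the order is stable.
        counts.sort(key=lambda item: (-item[1], item[0]))
        return counts[:n]

    def details(self, from_module):
        """Details view: every (module, user, date) vote, oldest first."""
        rows = [
            (to_mod, user, when)
            for (from_mod, to_mod), users in self._votes.items()
            if from_mod == from_module
            for user, when in users.items()
        ]
        return sorted(rows, key=lambda row: (row[2], row[0], row[1]))
```

With votes from two users for Foo to Bar and one for Foo to Baz, `top_alternatives("Foo")` would list Bar ahead of Baz, and `details("Foo")` would return all three votes with their users and dates.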
For modules that are the suggested alternative for some other module, their page could show something like “Suggested as the alternative to X other modules by Y people” with a link to a page that would show the corresponding details.
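The X and Y in that summary line are just two distinct counts over the inbound suggestions. A small Python sketch, assuming the suggestions are available as `(from_module, to_module, user)` tuples (a made-up representation, as is the function name):

```python
def inbound_summary(suggestions, module):
    """Return (X, Y): how many other modules point at `module`, and how many
    distinct people made those suggestions."""
    sources = {src for src, dst, _user in suggestions if dst == module}
    people = {user for _src, dst, user in suggestions if dst == module}
    return len(sources), len(people)


def summary_text(suggestions, module):
    modules, people = inbound_summary(suggestions, module)
    return (
        f"Suggested as the alternative to {modules} other modules "
        f"by {people} people"
    )
```

A person who suggests the same target from two different modules is counted once in Y, which seems closest to the wording above.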
With something like this in place “unmaintained, abandoned” modules would gather suggested alternatives. Mature ‘good’ modules would tend to accumulate suggestions pointing towards them, while mature ‘poor’ modules would tend to accumulate suggestions pointing away. Experiments and obscure “private needs” modules wouldn’t gather suggestions and that, combined with the higher ranking of modules with votes and inward pointing suggestions, means they’d languish in obscurity doing little harm.
The Alternatives Graph
This “alternatives” data creates a graph of relationships among similar modules in a powerful and directly useful way.
For search results it would be useful not only for ranking but also for widening the search. Modules that are the suggested alternatives for modules in the ‘natural’ results could be included. That’s potentially a big win.
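As a rough illustration of widening, the Python sketch below appends well-supported alternatives to the 'natural' results. The function name, the shape of the `alternatives` mapping, and the `min_votes` threshold are all assumptions made for the example; a real search engine would fold this into its own ranking.

```python
def widen_results(natural_results, alternatives, min_votes=2):
    """Append suggested alternatives of the natural hits to the result list.

    natural_results: module names in their original ranking order.
    alternatives:    {module: {alternative_module: vote_count}}.
    min_votes:       hypothetical cut-off so single drive-by suggestions
                     do not pull unrelated modules into the results.
    """
    widened = list(natural_results)
    seen = set(natural_results)
    for module in natural_results:
        for alt, votes in alternatives.get(module, {}).items():
            if votes >= min_votes and alt not in seen:
                seen.add(alt)
                widened.append(alt)
    return widened
```

Here the natural results keep their positions and the alternatives are added at the end; whether they should instead be interleaved by vote count is an open design choice.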
Of course it would be perfectly reasonable for a pair of modules to have suggestions pointing to each other. Or for there to be loops of suggestions. That’s fine and simply expresses the conflicting views of the users making the suggestions.
Similar Modules (a digression)
I also had the idea that there may be value in having a ‘similar modules’ link that shows the list of modules produced by traversing the graph of suggestions for some number of hops in both directions, and ranked by some combination of votes and placement in the graph.
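One way to picture that traversal is a breadth-first walk that ignores edge direction and stops after a fixed number of hops. The Python sketch below is only that: the function name and the `max_hops` default are invented for the example, and it ranks purely by hop distance, leaving out the votes that a real ranking would also weigh.

```python
from collections import deque


def similar_modules(edges, start, max_hops=2):
    """Modules reachable from `start` within `max_hops`, following
    suggestions in both directions. Returns (module, hops) pairs,
    nearest first."""
    # Build an undirected adjacency map from the directed suggestion edges.
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        module = queue.popleft()
        if hops[module] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in neighbours.get(module, ()):
            if other not in hops:  # the visited check also makes loops harmless
                hops[other] = hops[module] + 1
                queue.append(other)

    del hops[start]
    return sorted(hops.items(), key=lambda item: (item[1], item[0]))
```

Because visited modules are never revisited, mutual suggestions and longer loops in the graph do not cause problems.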
But then I wondered if that would be better implemented as an explicit way to suggest a ‘similar module’. In other words, generalize the idea of a “suggested alternative” into a “related module” relationship plus attributes like a “weight”, where a positive weight denotes a “suggested alternative” and a zero weight is simply a “similar module” or a “see also”. Perhaps there’s also value in having a “complementary module” relationship.
This is all a bit vague. It suggests to me that any code to support a “module relationship” mechanism should be kept generic to allow for other kinds of relationships in future.
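A generic relationship record might look something like the following Python sketch. The class, its field names, and the `kind` values are hypothetical; the only parts taken from the idea above are that a positive weight means "suggested alternative", a zero weight means "similar module", and that other kinds such as "complementary" could be added later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleRelationship:
    """Hypothetical generic link between two modules."""

    source: str
    target: str
    kind: str = "related"  # other kinds, e.g. "complementary", can be added later
    weight: int = 0        # only interpreted for kind == "related"

    def label(self):
        if self.kind == "related":
            # Positive weight: suggested alternative; zero: similar / see also.
            return "suggested alternative" if self.weight > 0 else "similar module"
        return self.kind
```

Keeping the kind as plain data rather than a separate table or class per relationship is what leaves room for new relationship types without a schema change.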
The Whys and Wherefores
The primary data of the graph is a link from one module to another with a count of the number of people who agreed with that suggestion.
That surface data is built from a deeper layer that records, for each link, which users made the suggestion and when.
A helpful extra feature would be to let users optionally give a short reason for why they are suggesting this particular alternative. Perhaps because they feel it’s unmaintained, or lacks specific features that their suggested alternative has.
Suggestions without the whys would be very useful, and I’d suggest that that much is implemented first. But suggestions without explanations are also very limited. Knowing what motivated someone to suggest a particular alternative would be very helpful to others trying to pick a module for a task. For example, people might make multiple alternative suggestions, recommending Bar instead of Foo if you want a certain feature, and Baz instead of Foo if you want another.
I don’t think there’s much risk of this becoming a comment battlefield because on any given page all the comments share the same direction ‘away’ from the module. Someone with an opposing viewpoint would add a separate suggestion with their own comments on the ‘opposite’ module.
I’d suggest the comment field be kept very short, say 50 characters, and provide a separate url to encourage referencing supporting material such as a blog post or mailing list archive.
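Checking such a reason field is simple enough to sketch in Python. The function name, the error messages, and the restriction to http(s) links are assumptions for the example; only the 50-character limit and the separate URL come from the suggestion above.

```python
from urllib.parse import urlparse

MAX_REASON_LENGTH = 50  # the "say 50 characters" limit suggested above


def validate_reason(reason, url=None):
    """Return a list of problems with an optional reason and supporting URL.
    An empty list means the input is acceptable."""
    errors = []
    if len(reason.strip()) > MAX_REASON_LENGTH:
        errors.append(f"reason is longer than {MAX_REASON_LENGTH} characters")
    if url is not None:
        parts = urlparse(url)
        # Assumption: only absolute http(s) links count as supporting material.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            errors.append("url must be an absolute http(s) link")
    return errors
```

For example, `validate_reason("Unmaintained since 2009")` returns an empty list, while a 51-character reason or a URL without a scheme each produce one error.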
Other approaches might be to have a few checkboxes with typical reasons (very limited), or perhaps tags, or link in with cpanratings in some way (possibly complex).
The best way to build and present Alternative Distributions data is probably to simply derive it from the Alternative Modules data.
It would simply be a read-only view that collapses the module level graph data down to links between the corresponding distributions.
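A sketch of that collapse in Python, under a couple of stated assumptions: the module graph is a mapping of `(from_module, to_module)` to vote counts, a separate mapping gives each module's distribution, and votes are simply summed (so a user who suggested two modules from the same distribution would be counted twice). The function and parameter names are made up for the example.

```python
from collections import Counter


def collapse_to_distributions(module_links, module_to_dist):
    """Derive distribution-level links from module-level links.

    module_links:   {(from_module, to_module): vote_count}
    module_to_dist: {module: distribution}
    """
    dist_links = Counter()
    for (src, dst), votes in module_links.items():
        src_dist = module_to_dist.get(src)
        dst_dist = module_to_dist.get(dst)
        # Skip modules with no known distribution, and links between two
        # modules that live in the same distribution.
        if src_dist is None or dst_dist is None or src_dist == dst_dist:
            continue
        dist_links[(src_dist, dst_dist)] += votes
    return dict(dist_links)
```

Because the result is recomputed from the module data, it stays correct when a module moves from one distribution to another: only `module_to_dist` changes.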
Yanick Steps Up
After writing a draft of this post I saw a tweet from Yanick with a link to a specific proposal on his blog. I skimmed it, realised it was similar to mine and replied saying I’d reference it here. I decided I’d finish my post before reading it properly.
So here are my thoughts on Yanick’s suggestions:
Distributions vs Modules: Modules are the fundamental unit of use and the natural focus of attention and reviews. It’s relatively easy to derive distribution suggestions from module suggestions, but not the other way around. Using modules as the focus also means the suggestions will still be valid if a module moves from one distribution to another.
Adding notes: I agree that comments are best avoided for the initial system. I also feel strongly that their value outweighs their risks if implemented and presented carefully, so they should at least be taken into account in the initial design work.
User interface for recommending an alternative: Having a button beside the existing high-profile vote button doesn’t feel right to me. The vote button is a positive action and encouraging low-friction drive-by voting makes sense. Suggesting an alternative is a more negative action, and one to be considered more carefully. Using the sidebar seems more appropriate.
User interface for viewing suggested alternatives: I’d rather not include any user names on the module page. It complicates the code and confuses the user experience (“which names are shown and why?” etc). The full details are available on the detail page if anyone wants to take the extra step to see them.
Volunteering to do something: Awesome!
Update: Implementation is being discussed on this cpan-api ticket.