Poorly handled enums can infect code with fragility and tight coupling like a digital Typhoid Mary.
Say you’re writing software that optimizes traffic flow patterns, and you need to model different vehicle types. So you code up something like this:
vehicle_type.h
enum VehicleType { eVTCar, eVTMotorcycle, eVTTruck, eVTSemi, };
Then you press your enum into service:
route.cpp
if (vehicle.vt == eVTSemi || vehicle.vt == eVTTruck) {
// These vehicle types sometimes have unusual weight, so we
// have to test whether they can use old bridges...
if (vehicle.getWeight() > bridge.getMaxWeight()) {
Quickly your enum becomes handy in lots of other places as well:
if (vehicle.vt == eVTMotorcycle) {
// vehicle is particularly sensitive to slippery roads
And…
switch (vehicle.vt) {
case eVTTruck:
case eVTSemi:
    // can't use high-occupancy/fuel-efficient lane
    break;
case eVTMotorcycle:
    // can always use high-occupancy/fuel-efficient lane
    break;
default:
    // can only use lane during off-peak hours
    break;
}
Diagnosis
The infection from your enum is already coursing through the bloodstream at this point. Do you recognize the warning signs?
- Knowledge about the semantics of each member of the enum is spread throughout the code.
- The members of the enum are incomplete. How will we account for cranes and bulldozers and tractors and vans?
- Semantics are unsatisfying. We’re saying cars are never gas guzzlers or gas savers; what about massive old steel-framed jalopies and tiny new hybrids?
The infection amplifies when we want to represent the enum in a config file or a UI. Now we need to convert to and from strings, and we use the classic shadow array of string literals, indexed by enum:
// THIS ARRAY ***MUST*** BE KEPT IN SYNC WITH THE ENUM DECLARED
// AT THE TOP OF vehicle_type.h!!!
char const * VEHICLE_TYPE_NAMES[] = {
"car",
"motorcycle",
"truck",
"semi"
};
char const * getVehicleTypeName(VehicleType vt) {
return VEHICLE_TYPE_NAMES[vt];
}
Lest you think this ugliness is unique to C/C++, the Java or C# equivalent isn’t all that pretty, either:
@Override
public String toString() {
// eVTxyz --> xyz
return super.toString().toLowerCase().substring(3);
}
static VehicleType fromString(String vt) {
if (vt.equals("truck")) return eVTTruck;
if (vt.equals("semi")) return eVTSemi;
...
}
You might be rolling your eyes at the clumsy conversions. Yes, we could do error checking to make getVehicleTypeName() safer. Yes, we could use reflection in some languages to automate these conversions.
That misses the point.
We’re still propagating knowledge indiscriminately. If the UI is involved, chances are there’s a view, or an HTML <select> tag, or a JavaScript validation function, or (heaven help us) a localized message table that has knowledge about the possible values of this enum. As the enum grows over time, your maintenance must regularly touch many different modules, possibly at many different layers. This is a recipe for bugs.
The code is sick.
Pretty soon symptoms become externally visible: code is measurably buggy; unit tests require lots of maintenance when you make a change; you have debates about how to accommodate strange new vehicle types; the high priest/grand wizard of the codebase regularly corrects acolytes who attempt “simple” tweaks; people advocate a coding standard that requires comments on every member of the enum to explain its ramifications.
Treatment
The good news is that this particular sickness has an effective and straightforward cure.
The root cause of our disease is semantic diffusion and coupling, and the essence of the cure is encapsulation through a form of declarative programming.
I’ll present a formula for our prescription in C++ (where I first learned it from Julie Jones, years ago); then we can explore what it’s doing, and what its analogs might be in other languages.
vehicle_type_tuples.h
// No sentry. This is deliberate.
// TUPLE(id, max_wheels, max_weight_kg, max_passengers, avg_km_per_liter)
TUPLE(Car, 4, 1800, 6, 8)
TUPLE(Truck, 4, 5700, 4, 5.5)
// (max_passengers of 2 for Motorcycle is an assumed value)
TUPLE(Motorcycle, 2, 450, 2, 18)
TUPLE(Semi, 18, 19000, 2, 4)
#undef TUPLE
vehicle_type.h
#ifndef VEHICLETYPE_H
#define VEHICLETYPE_H

enum VehicleType {
#define TUPLE(id, max_wheels, max_weight_kg, max_passengers, avg_km_per_liter) eVT##id,
#include "vehicle_type_tuples.h"
};

char const * getVehicleTypeName(VehicleType vt);
int getVehicleTypeMaxWheels(VehicleType vt);
int getVehicleTypeMaxWeightKg(VehicleType vt);
int getVehicleTypeMaxPassengers(VehicleType vt);
double getVehicleTypeFuelEconomy(VehicleType vt);

#endif // sentry
vehicle_type.cpp
#include <cstddef>
#include "vehicle_type.h"

struct VehicleTypeTuple {
    VehicleType id;
    char const * name;
    int max_wheels;
    int max_weight_kg;
    int max_passengers;
    double avg_km_per_liter;
};

// One row per TUPLE() call; #id turns the identifier into a string literal.
static VehicleTypeTuple const TUPLES[] = {
#define TUPLE(id, max_wheels, max_weight_kg, max_passengers, avg_km_per_liter) \
    { eVT##id, #id, max_wheels, max_weight_kg, max_passengers, avg_km_per_liter },
#include "vehicle_type_tuples.h"
};

static const std::size_t TUPLE_COUNT = sizeof(TUPLES) / sizeof(TUPLES[0]);

char const * getVehicleTypeName(VehicleType vt) {
    // Enum values double as indexes into TUPLES, because both are
    // generated from the same table in the same order.
    if (static_cast<std::size_t>(vt) < TUPLE_COUNT) {
        return TUPLES[vt].name;
    }
    return "unknown";
}

// ... other functions ...
bridge.cpp
bool Bridge::mayBeTooHeavy(VehicleType vt) {
    return getVehicleTypeMaxWeightKg(vt) > 5000;
}
route.cpp
if (Bridge::mayBeTooHeavy(vehicle.vt)) {
Setting aside the last two snippets for a moment, the obvious ingredients in the C++ version of our formula are:
- Our vehicle enum values, and their associated attributes or semantics, are declared by calling a macro, TUPLE.
- This macro is called once for each enum value, in a header that contains no sentry (vehicle_type_tuples.h). Essentially, this creates a table of data that can be manipulated at compile time.
- The TUPLE macro is #defined to mean different things in different places (in vehicle_type.h, and again in vehicle_type.cpp). Each time the meaning of the macro changes, we #include our table of data and generate more code.
How does this help us?
- All knowledge about possible enum values is concentrated in one file.
- We no longer have to hand-edit a parallel shadow array with an obnoxious (and ignorable) comment to keep it in sync with our enum. The two can’t drift apart, because both are generated from the same table.
- The set of attributes that we can associate with our enum is unbounded; we can add as many fields to our tuple as we wish.
- Our file of tuples is extraordinarily simple to parse; it contains nothing other than a series of TUPLE() calls. If we need to validate enum values in some other language or environment, we can process the file during the build to generate a JavaScript function, an XML example, a sample config file, and so forth.
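As a rough illustration of that last point, here is a minimal sketch of a build-time step that reads the tuple file and emits a JavaScript validation function. The function names (readTupleIds, generateJsValidator, and the generated isValidVehicleType) are hypothetical, and the parsing assumes one TUPLE() call per line, as in the file above.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect the first argument (the id) of every TUPLE(...) line.
// Comments and "#undef TUPLE" are skipped: they don't start with "TUPLE(".
std::vector<std::string> readTupleIds(std::istream & in) {
    std::vector<std::string> ids;
    std::string const prefix = "TUPLE(";
    std::string line;
    while (std::getline(in, line)) {
        std::size_t const start = line.find_first_not_of(" \t");
        if (start == std::string::npos ||
            line.compare(start, prefix.size(), prefix) != 0) {
            continue;
        }
        std::size_t const idBegin = start + prefix.size();
        std::size_t const comma = line.find(',', idBegin);
        if (comma == std::string::npos) {
            continue;  // not a well-formed row; ignore it
        }
        ids.push_back(line.substr(idBegin, comma - idBegin));
    }
    return ids;
}

// Emit JavaScript source for a function that accepts only known type names.
std::string generateJsValidator(std::istream & in) {
    std::vector<std::string> const ids = readTupleIds(in);
    std::ostringstream js;
    js << "function isValidVehicleType(name) {\n  return [";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        js << (i ? ", " : "") << '"' << ids[i] << '"';
    }
    js << "].indexOf(name) !== -1;\n}\n";
    return js.str();
}
```

A build script could open vehicle_type_tuples.h with std::ifstream, pass it to generateJsValidator, and write the result wherever the UI expects it. The generated names match what the stringized id produces in C++ ("Car", "Semi", and so on).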
Separation of Concerns
Another characteristic of the solution deserves deeper discussion. Why did we include the snippet from bridge.cpp in our solution? Isn’t another function there unnecessary? Why not do the following in vehicle_type_tuples.h?
// TUPLE(id, heavy_risk, slides_easily, fuel_consumption)
TUPLE(Car, false, false, average)
TUPLE(Motorcycle, false, true, low)
TUPLE(Truck, true, false, high)
TUPLE(Semi, true, true, high)
Then we could do this in route.cpp:
if (vehicle.vt.heavy_risk) {
After all, if our goal is to figure out which vehicles are heavy enough to cause problems on bridges, shouldn’t we just say that in our tuples?
The answer involves coupling. The second, less optimal form of the TUPLE macro builds into each vehicle type assumptions about how and why the vehicle type’s inherent characteristics will be analyzed, while the earlier and better form does not. Instead, it leaves judgement about the ramifications of these characteristics to other parts of the system (like bridges) that know about their own problem domain.
In other words, the better version couples vehicle type and traffic routing more loosely.
Which version will require less maintenance if you decide that the threshold for vehicles that are too heavy for a bridge is 10,000 kg instead of 5,000? Which will require less maintenance if you decide you now need 4 gradations of ranking on fuel economy, or if the average fuel economy on your vehicles changes?
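To make the first question concrete, here is a sketch of a slightly different Bridge than the one above: each bridge carries its own limit instead of a hard-coded 5000. The class shape, the constructor, and the member name maxWeightKg_ are assumptions for illustration, and the small table inside getVehicleTypeMaxWeightKg is only a stand-in for the generated accessor so the sketch compiles on its own.

```cpp
#include <cstddef>

// Stand-in for vehicle_type.h, so this sketch is self-contained.
enum VehicleType { eVTCar, eVTTruck, eVTMotorcycle, eVTSemi };

int getVehicleTypeMaxWeightKg(VehicleType vt) {
    static int const MAX_WEIGHT_KG[] = { 1800, 5700, 450, 19000 };
    std::size_t const i = static_cast<std::size_t>(vt);
    std::size_t const count = sizeof(MAX_WEIGHT_KG) / sizeof(MAX_WEIGHT_KG[0]);
    return i < count ? MAX_WEIGHT_KG[i] : 0;
}

// Hypothetical Bridge: the weight limit is the bridge's knowledge,
// not the vehicle type's.
class Bridge {
public:
    explicit Bridge(int maxWeightKg) : maxWeightKg_(maxWeightKg) {}

    // True if the heaviest vehicle of this type could exceed the limit.
    bool mayBeTooHeavy(VehicleType vt) const {
        return getVehicleTypeMaxWeightKg(vt) > maxWeightKg_;
    }

private:
    int maxWeightKg_;
};
```

Raising the threshold from 5,000 kg to 10,000 kg changes one number on the bridge side and nothing in the tuples. With a heavy_risk flag baked into each vehicle type, the same change would mean revisiting every row.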
Other Languages
Only a few modern programming languages provide a preprocessor, but a lack of macros doesn’t make enum encapsulation impossible. Every language I know of supports some form of tabular data structure, and quite a few offer first-class tuples.
In Java, for example, you could write a static initializer block that builds a HashMap of attributes for each value in an enum. In Python, you could populate a dict indexed by string constants. The basics of the technique are replicable anywhere.
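To keep the examples in one language, here is what that macro-free shape might look like in C++: a map of attributes keyed by string constants, much like the HashMap or dict described above. The names VehicleAttributes, vehicleTypeTable, and findVehicleType are hypothetical, and the motorcycle passenger count is an assumed value. Translating the sketch to Java or Python is mostly a matter of syntax.

```cpp
#include <string>
#include <unordered_map>

// Attributes for one vehicle type; the fields mirror the TUPLE columns.
struct VehicleAttributes {
    int maxWheels;
    int maxWeightKg;
    int maxPassengers;
    double avgKmPerLiter;
};

// The single table that owns all knowledge about vehicle types.
// Built once, on first use.
std::unordered_map<std::string, VehicleAttributes> const & vehicleTypeTable() {
    static std::unordered_map<std::string, VehicleAttributes> const table = {
        { "car",        { 4,  1800,  6, 8.0  } },
        { "truck",      { 4,  5700,  4, 5.5  } },
        { "motorcycle", { 2,  450,   2, 18.0 } },
        { "semi",       { 18, 19000, 2, 4.0  } },
    };
    return table;
}

// Returns the attributes for a type name, or nullptr if the name is unknown.
VehicleAttributes const * findVehicleType(std::string const & name) {
    auto const & table = vehicleTypeTable();
    auto const it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}
```

Adding a vehicle type here means adding one row, and a row can't be written without its attributes. What this version gives up, compared with the macro formula, is a compile-time enum generated from the same data.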
Pragmatism
Of course, not every enum is worth handling in this careful and encapsulated way. If you have an enum that’s got three items, and it will never change, and you have no interesting semantics to manage, and you’re not converting it to and from strings, and the enum is only visible in a single module, then (to quote my friend Moray King), the juice is probably not worth the squeeze.
For the more critical enums in your codebase, however, I think a careful approach will pay big dividends.
Signs of Health
You’ll know you’re handling enums right if it’s difficult or impossible to add a new value to an enum without also specifying that value’s attributes, and if you stop seeing tests for one or more enum values, scattered in conditionals all over the code. Statements like this:
if (vehicle.vt == eVTTruck || vehicle.vt == eVTSemi)
or…
switch (vehicle.vt)
… will be hidden in functions that capture (encapsulate) the semantic condition you really want to test. In fact, enum values themselves will only appear in places where an object’s state is set directly; even in semantic wrappers, you’ll often be testing a characteristic (like weight [mass], in our example) instead of actual enum values themselves. Certainly, all other code works off of semantics. When you add a new enum value, you only have to examine a handful of semantic functions to tease out ramifications, and your confidence in the tweak is high. Unit tests break in predictable and isolated ways, and the fixes become obvious.
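Both signs can be sketched in a few lines. The example below assumes a hand-maintained table instead of the TUPLE macro, so it needs a hypothetical sentinel member, eVTCount, and a static_assert to make a missing row a compile error; with the macro formula that guarantee comes for free. The wrapper isSensitiveToSlipperyRoads and its two-wheel rule are invented for illustration.

```cpp
#include <cstddef>

// eVTCount is a hypothetical sentinel: it must stay last.
enum VehicleType { eVTCar, eVTTruck, eVTMotorcycle, eVTSemi, eVTCount };

struct VehicleTypeTuple {
    int max_wheels;
    int max_weight_kg;
};

static VehicleTypeTuple const TUPLES[] = {
    { 4,  1800  },  // eVTCar
    { 4,  5700  },  // eVTTruck
    { 2,  450   },  // eVTMotorcycle
    { 18, 19000 },  // eVTSemi
};

// Adding an enum value without adding its row fails to compile.
static_assert(sizeof(TUPLES) / sizeof(TUPLES[0]) == static_cast<std::size_t>(eVTCount),
              "every VehicleType needs a row in TUPLES");

// Semantic wrapper: callers ask the question they care about. The rule
// tests a characteristic (wheel count), not specific enum values.
bool isSensitiveToSlipperyRoads(VehicleType vt) {
    std::size_t const i = static_cast<std::size_t>(vt);
    return i < static_cast<std::size_t>(eVTCount) && TUPLES[i].max_wheels <= 2;
}
```

The earlier check, if (vehicle.vt == eVTMotorcycle), then becomes if (isSensitiveToSlipperyRoads(vehicle.vt)), and a newly added two-wheeled type is covered without touching the caller.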
Action Item
Find one enum that’s problematic in your code, and clean it up.