The Meaning of Null

In many object-oriented languages there is a difference between an empty object and a null object. I often come across code that recognizes the technical difference between empty and null but doesn't properly use the semantics of the two concepts. The semantic difference is important. Professional developers can write better code if they understand it.

First consider the technical difference:

  • An empty Integer is zero; a null Integer is not zero -- it's Nothing, or null, or whatever your language uses to say "null." Unless you're a VB developer, you know that zero is not equal to null. (VB.NET blurs this technical distinction in some cases, an unfortunate holdover from its ancient roots.)
  • An empty string is "" while a null string is Nothing. "" does not equal Nothing (VB gets this one right).
  • Same with List: you can have an empty List or a null List; they do not equal each other and they carry different semantics (a quick sketch follows this list).
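
Here's a quick C# sketch of those technical differences; the comments note what each line prints or does:

using System;
using System.Collections.Generic;

string emptyString = "";
string nullString = null;
Console.WriteLine(emptyString == nullString);    // False: "" is a string; null is no string

var emptyList = new List<int>();
List<int> nullList = null;
Console.WriteLine(emptyList.Count);              // 0: the list exists and is empty
// Console.WriteLine(nullList.Count);            // NullReferenceException: there is no list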

Now take that List and consider the semantic difference:

  • An empty List has a specific meaning: it means there is a bag that can hold stuff, but the bag is empty -- it has nothing in it.
  • A null list says, "There is no bag."

Here comes the main take-away from this blog post: programmers tend to use the empty value as a special, magic value. Avoid doing this; use the null value as the special value instead.

Consider this coding example where you can optionally filter a report by year:

void PrintReport(int year) {
    var query = GetRecords();
    if (year != 0) {    // zero is the magic "run for all years" value
        query.FilterByYear(year);
    }
    Print(query);
}

This code uses the empty value, zero, to carry a special, magic meaning: run the report across all years, not just one. The code should be changed to use null for that special value instead:

void PrintReport(int? year) {
    var query = GetRecords();
    if (year != null) {
        query.FilterByYear(year.Value);    // .Value is safe here; we just checked for null
    }
    Print(query);
}
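
Call sites stay readable, too. An int literal converts implicitly to int?, so only the "all years" case has to spell out null:

PrintReport(2021);   // filter the report to a single year
PrintReport(null);   // no year given: print the report for all years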

Why is this code so much better that I'm taking the time to blog about it?

  1. It expresses intent better. Imagine a developer who's about to call the top version of PrintReport. They wouldn't know without looking at the code that printing a report for all years is even an option. Whereas the second version makes it pretty clear: the question mark in the method signature tells every developer who ever calls this method, "Hey, there's something interesting about this parameter!"
  2. It avoids collisions. What if we're printing a report of archeological artifacts filtered by the estimated year the artifact was created? The top code has a bug for that case: when the user wants to find artifacts estimated at 0 AD, it will return every artifact in the system instead.
  3. It prevents future bugs. In the former code, everyone who ever touches the "year" parameter needs to remember, "this value could be zero, and that means something special," but there's nothing in the language itself to remind us of that. In the second version, any time you touch the "year" parameter without handling the null case, the compiler refuses to convert int? to int and forces you to deal with it (a small sketch follows).
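
Here's a minimal C# sketch of that enforcement; the commented-out line is exactly the kind of mistake the compiler catches:

using System;

int? year = null;

// int plain = year;             // compile error: cannot implicitly convert 'int?' to 'int'
int plain = year ?? 0;           // explicit fallback when there is no year
if (year.HasValue) {
    Console.WriteLine(year.Value);   // .Value is only safe after the check
}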

Sometimes -1 is used with the same magic semantics; in general, don't do that either. Here are some more examples demonstrating the semantic difference between empty and null.

  • We don't know the age of a person.
    • Person.Age = 0 is ambiguous; maybe we start tracking infants and 0 is a real valid age, not a magic special value.
    • Person.Age = null is much more clear: "there is no age value."
    • Person.Age = -1 is dangerous; if we start doing calculations based on age, the calculations will probably work and we'll get wacky values that we may or may not notice. But if you try to do a calculation when Age = null, stuff usually blows up and the coder can immediately see their mistake. In fact, having to write Person.Age.Value is an immediate and unavoidable reminder that you have to handle the null case (see the first sketch after this list).
  • We give our programming candidates a bowling scoring exercise; in it there is an array of integers called “throws” which represents how many pins were knocked down in a throw.
    • Many applicants accept a strike as { 10, 0 }. Semantically this is wrong. { 10, 0 } implies two throws, one in which 10 pins were knocked down and one in which zero pins were knocked down.
    • The correct way to accept a strike is { 10 }. There is no second throw. Many bowling submissions we receive require the first representation and blow up if you use the second, correct one (see the second sketch below). (This doesn't necessarily disqualify a candidate, especially an entry-level candidate, but it's something I expect expert programmers to think about.)
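
Here's a minimal sketch of the Person.Age point, using a hypothetical Person class (the names are made up):

using System;

class Person {
    public string Name;
    public int? Age;    // null means "there is no age value", not a magic number
}

class Demo {
    static void Main() {
        var infant  = new Person { Name = "June", Age = 0 };     // zero is a real, valid age
        var unknown = new Person { Name = "Ada",  Age = null };  // we simply don't know

        if (unknown.Age.HasValue)
            Console.WriteLine(unknown.Age.Value + 1);    // .Value forces the null question
        else
            Console.WriteLine("Age unknown");

        // Console.WriteLine(unknown.Age.Value + 1);     // throws InvalidOperationException
    }
}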
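
And the bowling representations side by side; the array shapes are the point here, not the scoring logic:

int[] strikeFrame = { 10 };      // one throw; a strike frame has no second throw
int[] openFrame   = { 3, 4 };    // two throws: 3 pins, then 4 pins
int[] wrongStrike = { 10, 0 };   // semantically wrong: implies a second throw that never happened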

Understanding the semantic difference between empty and null makes your code more consistent, more readable, and less prone to bugs.
