Tuesday, March 18, 2014

The law of least surprise

Extreme programming argues that documentation is not needed, since code is the best description of what the program does. In order for this to be true, code has to be readable. Very readable.

It is important to be able to tell which way the data flows on every line you read. In the following call, which variable is the source and which is the destination?

    Array *names;
    struct Detail details;
    ...
    set_name_details (names, &details);

Here is a simple rule to make the caller code clearer...


Back when you first learned to write C or C++, you had to pass a pointer to the variable that you wanted to act as a return value:

    int product;
    int multiplier, multiplicand;
    ...
    multiply (multiplier, multiplicand, &product);

With this type of call the intention is clear: product is going to be written, while multiplier and multiplicand are passed as read-only values.
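
The post shows only the call site, so here is a minimal sketch of what the declaration behind it might look like. The signature and the null check are assumptions made for illustration, not something the original code provides:

```cpp
#include <cassert>

// Hypothetical signature: inputs are passed by value,
// the output is passed by pointer so the caller has to write '&'.
void multiply (int multiplier, int multiplicand, int *product)
{
    assert (product != nullptr);   // an out-parameter must point somewhere
    *product = multiplier * multiplicand;
}
```

Because the two inputs are taken by value, the function cannot modify the caller's copies even by accident.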

Well, if you keep this rule in everything you write, then you do not need to:
- read documentation of how a function works
- read the implementation of the function to see whether it treats the variables as read-only
- read the header to see which parameters are constant
- guess what the function does

This works if you write in a C++ style instead of relying on low-level C constructs (such as passing a pointer to a struct even when the struct is only read).

If you follow this calling convention, then none of these calls is ambiguous about the direction of information flow:


    vector<NameDetails> names;
    struct Detail details;
    ...
    set_name_details (&names, details);
    ...
    extract_name_details (names, &details);


It is always the argument marked with '&' at the call site, i.e. the one passed by pointer, that is expected to be written!


But it is not efficient to pass every read-only variable by value


Yes, for efficiency, we should pass large variables by reference. In C++ you can do that by declaring the functions like so:


    void set_name_details     (vector<NameDetails>       *names, const Detail &details);
    void extract_name_details (const vector<NameDetails> &names, Detail       *details);


where:

  • when you want to pass a variable for read-only purposes,
    you pass it as a const reference
        void foo (const X &);
    or you pass it by value
        void foo (X);
  • when you want to pass a variable to be written,
    you pass a pointer to it
        void foo (X *);
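
To make the convention concrete, here is a minimal sketch that fills in bodies for the two declarations above. The post never defines Detail or NameDetails, nor says what the functions actually do, so the field names and the behaviour (append one entry, read back the most recent one) are assumptions made purely for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types: the post does not define them.
struct Detail      { std::string name; int age; };
struct NameDetails { std::string name; int age; };

// Written: names (pointer). Read: details (const reference).
void set_name_details (std::vector<NameDetails> *names, const Detail &details)
{
    assert (names != nullptr);
    names->push_back (NameDetails{details.name, details.age});
}

// Read: names (const reference). Written: details (pointer).
// Assumed behaviour: copy the most recent entry, or leave
// *details untouched when the vector is empty.
void extract_name_details (const std::vector<NameDetails> &names, Detail *details)
{
    assert (details != nullptr);
    if (names.empty ())
        return;
    details->name = names.back ().name;
    details->age  = names.back ().age;
}
```

The compiler enforces half of the convention for you: any attempt to modify details inside set_name_details, or names inside extract_name_details, fails to compile because of the const.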



What about the return value?


The class-of-85 rules dictated that you should not return large variables, because doing so makes unnecessary copies. Well, this rule is almost history now.

It is clearer to read 
    vector<NameDetails> details = get_name_details ();

rather than 
    vector<NameDetails> details;
    get_name_details (&details);

where there is always the dilemma: is the function going to append to the details vector, or will it clear my vector and then append?
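
The dilemma is easy to reproduce. In the sketch below, the two out-parameter variants are hypothetical names invented for this illustration, and NameDetails is given a single made-up field. Their call sites look the same, yet they treat existing contents differently, whereas the return-by-value version leaves no room for doubt:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct NameDetails { std::string name; };   // hypothetical definition

// Variant A (hypothetical): keeps whatever the caller already had.
void get_name_details_append (std::vector<NameDetails> *details)
{
    assert (details != nullptr);
    details->push_back (NameDetails{"new"});
}

// Variant B (hypothetical): silently discards the caller's contents first.
void get_name_details_replace (std::vector<NameDetails> *details)
{
    assert (details != nullptr);
    details->clear ();
    details->push_back (NameDetails{"new"});
}

// Return by value: the result is always a fresh vector.
std::vector<NameDetails> get_name_details ()
{
    std::vector<NameDetails> ret;
    ret.push_back (NameDetails{"new"});
    return ret;
}
```

Starting from a vector that already holds one element, variant A leaves two elements and variant B leaves one, and nothing at the call site tells you which you are getting.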

Well, in C++, both versions can be equally efficient, because of the return value optimization (RVO) that the compiler may perform.


What do you mean, the optimization may be performed?


  • Here the compiler can optimize the return value (no unnecessary copy takes place):

      vector<NameDetails> get_name_details ()
      {
          vector<NameDetails> ret;
          ...
          return ret;
      }

    This works as long as you follow two rules:

    • declare the return variable first,
    • have only one return statement.

  • Here the compiler generally cannot optimize the return value, because it does not know in advance which of the two variables will be returned:

      vector<NameDetails> get_name_details ()
      {
          vector<NameDetails> ret1;
          vector<NameDetails> ret2;
          ...
          if( ... )
              return ret1;
          else
              return ret2;
      }



Note also that in C++11 the cost of returning containers is minimal either way, because of move semantics.
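
If you want to check this on your own compiler, a small instrumented type helps. The sketch below is an illustration only: Tracked, single_return, two_returns and check_returns are names invented here, and the number of moves you observe depends on the compiler and its flags. What C++11 does guarantee is that returning a local variable by name never needs the copy constructor when a move constructor is available:

```cpp
#include <cassert>

// Hypothetical helper type that counts how it gets constructed.
struct Tracked {
    static int copies;
    static int moves;
    Tracked () {}
    Tracked (const Tracked &)     { ++copies; }
    Tracked (Tracked &&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves  = 0;

// One variable, one return statement: eligible for elision.
Tracked single_return ()
{
    Tracked ret;
    return ret;
}

// Two candidates: elision is unlikely, so the result is moved instead.
Tracked two_returns (bool first)
{
    Tracked ret1;
    Tracked ret2;
    if (first)
        return ret1;
    else
        return ret2;
}

// Exercises both functions. Only the copy count is asserted,
// because the move count varies between compilers.
void check_returns ()
{
    Tracked::copies = 0;
    Tracked::moves  = 0;
    Tracked a = single_return ();
    Tracked b = two_returns (true);
    (void) a;
    (void) b;
    assert (Tracked::copies == 0);   // elided or moved, never copied
}
```

Printing Tracked::moves after each call shows whether your compiler elided the move for the single-return version.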
