from
Programming Pearls
by
Jon Bentley

This page:

Programming Pearls

Category:

Computer programming

index pages:
authors
titles
categories
topics
translators

Programming Pearls

Copyright © 1986 by Bell Telephone Laboratories, Incorporated

Note (Hal’s):
CACM refers to Communications of the Association for Computing Machinery, which published the original versions of these columns, and where I first read some of them.

— end note

Preface

The essays in this book are about a more glamorous aspect of the profession: programming pearls whose origins lie beyond solid engineering, in the realm of insight and creativity. Just as natural pearls grow from grains of sand that have irritated oysters, these programming pearls have grown from real problems that have irritated real programmers. The programs are fun, and they teach important programming techniques and fundamental design principles.

Part I: Preliminaries

Column 1: Cracking the Oyster
CACM, August 1983

1.5 Principles

A Simple Design. Antoine de Saint-Exupéry, the French writer and aircraft designer, said that “a designer knows he has arrived at perfection not when there is no longer anything to add, but when there is no longer anything to take away”. More programmers should judge their work by this criterion. Simple programs are usually more reliable, secure, robust and efficient than their complex cousins, and a lot easier to build and to maintain.

1.7 Further Reading

In the case study described in this column, the programmer’s main problem was not so much technical as psychological: he couldn’t make progress because he was trying to solve the wrong problem. We finally solved his problem by breaking through his conceptual block and solving an easier problem. Conceptural Blockbusting by James L. Adams [...] studies this kind of leap and is generally a pleasant prod towards more creative thinking. Although it was not written with programmers in mind, many of its lessons are particularly appropriate for programming problems.

Column 2: Aha! Algorithms
CACM, September 1983

2.3 The Power of Primitives
Problem B is to rotate the N-element vector X left by I positions in time proportional to N and with just a few words of extra space.

The problem looks hard until you finally have the aha! insight: let’s view the problem as transforming the array AB into the array BA, but let’s also assume we have a subroutine that reverses the elements in a specified portion of the array. Starting with AB, we reverse A to get ARB, reverse B to get ARBR, and then reverse the whole thing to get (ARBR)R, which is exactly BA. This results in the following code for rotation; the comments show the results when ABCDEFGH is rotated left three elements.

Reverse(1,I)/* CBADEFGH */
Reverse(I+1,N)/* CBAHGFED */
Reverse(1,N)/* DEFGHABC */

The reversal code is time- and space-efficient, and is so short and simple that it’s pretty hard to get wrong. [...] Ken Thompson wrote the editor and the reversal code in 1971, and claims that it was part of the folklore even then.

2.4 Getting It Together: Sorting

Given a dictionary of English words (one word per input line in lower case letters), we must find all anagram classes. There are several good reasons for studying this problem. The first is technical: the solution is a nice combination of getting the right viewpoint and then using the right tools. The second reason is more compelling: wouldn’t you hate to be the only person at a party who didn’t know that “deposit”, “dopiest”, “posited” and “topside” are anagrams? [...]

There are a number of surprisingly ineffective and complicated ways to solve this problem. Any method that considers all permutations of letters for a word is doomed to failure. The word “microphotographic” (an anagram of “photomicrographic”) has 17! permutations, and a few multiplications showed that 17! ≈ 3×1014. Even assuming the blazing speed of one microsecond per permutation, this will take 3×108 seconds. The rule of thumb that “π seconds is a nanocentury” is true to within half a percent (there are roughly 3.155×109 seconds in a century), so the 3×108 seconds are almost a decade.

Note (Hal’s):
A mathematical purist might note that repeated letters reduce the number of different permutations to 17!/(2!2!2!2!2!6!) ≈ 1.85×1012. But a programming purist recognizes that excluding duplicate permutations is as much of a computation burden as checking them all.

— end note

2.5 Principles

A Problem Solver’s Perspective. Good programmers are a little bit lazy: they sit back and wait for an insight rather than rushing forward with their first idea. That must, of course, be balanced with the initiative to code at the proper time. The real skill, though, is knowing the proper time. That judgment comes only with the experience of solving problems and reflecting on their solutions.

Column 3: Data Structures Programs
CACM, October 1983

Most programmers have seen them, and most good programmers realize they’ve written at least one. They are huge, messy, ugly programs that should have been short, clean, beautiful programs. I once saw a COBOL program whose guts were

IF THISINPUT IS EQUAL TO 001 ADD 1 TO COUNT001.
IF THISINPUT IS EQUAL TO 002 ADD 1 TO COUNT002.
   ...
IF THISINPUT IS EQUAL TO 500 ADD 1 TO COUNT500.

[...] Think for a minute about how you could accomplish the same task with a program just a tiny fraction of the size by using a different data structure — a 500-element array to replace the 500 individual variables. (Programmers who are paid per line of code may wish to ignore this exercise; managers who pay by the line may wish to faint.)

Hence the title of this column: a proper view of data does indeed structure programs.

3.5 Principles

The moral of each of the stories is the same: don’t write a big program when a little one will do. Most of the structures exemplify what Polya calls the Inventor’s Paradox in his How To Solve It: “the more general problem may be easier to solve”. In programming this means that it may be harder to solve a 73-case problem directly than to write a general program to handle the N-case version, and then apply it to the case that N=73.

This column has concentrated on just one contribution that data structures can make to software: reducing big programs to small programs. Data structure design can have many other positive impacts, including time and space reduction and increased portability and maintainability. Fred Brooks’s comment in Chapter 9 of his Mythical Man Month is stated for space reduction, but it is good advice for programmers who desire the other attributes as well [...]

Here are a few principles to ponder as you rear back.

Rework repeated code into arrays. A long stretch of similar code is often best expressed by the simplest of data structures, the array.

Be familiar with advanced data structures. Advanced data structures aren’t appropriate very often, but when you need them, they’re indispensable.

Let the data structure the program. The theme of this column is that data can structure a program by replacing complicated code by a data structure. There are many other interpretations of the advice. [...] Although the particulars change, the theme remains: before writing code, good programmers thoroughly understand the input, the output, and the intermediate data structures around which their programs are built.

Column 4: Writing Correct Programs
CACM, December 1983

Coding skill is just one small part of writing correct programs. The majority of the task is the subject of the three previous columns: problem definition, algorithm design, and data structure selection. If you perform those tasks well, writing correct code is usually easy.
4.1 The Challenge of Binary Search

I was amazed: given ample time, only about ten percent of professional programmers were able to get this small program right. But they aren’t the only ones to find this task difficult: in the history in Section 6.2.1 of his Sorting and Searching, Knuth points out that while the first binary search was published in 1946, the first published binary search without bugs did not appear until 1962.

Topic:

Bugs

Part II: Performance

Column 5: Perspective on Performance
CACM, November 1984

5.3 Principles

Because an ounce of prevention is worth a pound of cure, we should keep in mind an observation due to Gordon Bell of Encore Computer Corporation.

The cheapest, fastest and most reliable components of a computer system are those that aren’t there.

Those missing components are also the most accurate (they never make mistakes), the most secure (they can’t be broken into), and the easiest to design, document, test and maintain. The importance of a simple design can’t be overemphasized.

But when performance problems can’t be sidestepped, thinking about design levels can help focus a programmer’s effort.

Column 6: The Back of the Envelope
CACM, March 1984

6.2 Quick Calculations in Computing

Early in the life of a system, rapid calculations can steer the designer away from dangerous waters into safe passages. And if you don’t use them early, they may show in retrospect that a project was doomed to failure. The calculations are often trivial, employing no more than high school mathematics. The hard part is remembering to use them soon enough.

6.5 Principles

When you use back-of-the-envelope calculations, be sure to recall Einstein’s famous advice.

Everything should be made as simple as possible, but no simpler.

We know that simple calculations aren’t too simple by including safety factors to compensate for our mistakes in estimating parameters and our ignorance of the problem at hand.

Topic:

Occam’s Razor

6.8 Quick Calculations in Everyday Life [Sidebar]

Everyday life presents us with many opportunities to hone our skills at quick calculations. For instance, how much money have you spent in the past year eating in restaurants? I was once horrified to hear a New Yorker quickly compute that he and his wife spend more money each month on taxicabs than they spend on rent. And for California readers (who may not know what a taxicab is), how long does it take to fill a swimming pool with a garden hose?

Several other readers discussed teaching this attitude to children, from the viewpoints of both parent and child. Popular questions were of the form “How long would it take you to walk to Washington, D.C.?” and “How many leaves did we rake this year?” Administered properly, such questions seem to encourage a life-long inquisitiveness in children, at the cost of bugging the heck out of the poor kids at the time.

Topic:

Education

Column 8: Code Tuning
CACM, February 1984

Some programmers pay too much attention to efficiency; by worrying too soon about little “optimizations” they create ruthlessly clever programs that are insidiously difficult to maintain. Others pay too little attention; they end up with beautifully structured programs that are utterly inefficient and therefore useless. Good programmers keep perspective on efficiency: it is just one of many problems in software, but it is sometimes very important.

[...] “Code tuning” locates the expensive parts of an existing program and then makes little changes to the code to improve its performance. It’s not always the right approach to follow and it’s rarely glamorous, but it can sometimes make a big difference in a program’s performance.

8.4 Principles

The most important principle about code tuning is that it should be done rarely.

The Role of Efficiency. Many other properties of software are as important as efficiency, if not more so. Don Knuth has observed that premature optimization is the root of much programming evil; it can compromise the correctness, functionality and maintainability of programs. Save concerns for efficiency for when it matters.

Design Levels. [...] Before tuning code, we should make sure that other approaches don’t provide a more effective solution.

Column 9: Squeezing Space
CACM, May 1984

9.3 Code Space

Although one can’t ignore the many merits of “top-down” methods, the homogeneous world view given by good primitive routines can make a system easier to maintain and simultaneously reduce space.

Featurecide reduces code space by removing functionality from subroutines.

Part III: The Product

Column 11: Searching
CACM, December 1984

11.3 The Design Space

One part of a programmer’s job is solving today’s problem. Another, and perhaps more important, part of the job is to prepare for solving tomorrow’s problems. Sometimes that preparation involves taking classes or studying books like Knuth’s. More often, though, we programmers learn by the simple mental exercise of asking how we might have solved a problem differently.

One student suggested photocopying the precinct list, cutting the copy with a paper slicer, shaking the slips in a paper bag, and then pulling out the required number of slips. That cadet showed the “conceptual blockbusting” that is the subject of Adams’s book cited in Section 1.7.†

† Page 57 of that book sketches Arthur Koestler’s views on three kinds of creativity. Ah! insights are his name for originality, and aha! insights are acts of discovery. He would call this cadet’s solution a haha! insight: the low-tech answer to a high-tech question is an act of comic inspiration [...]

book citation
11.4 Principles

Although the following discussion presents the stages in one natural order, the design process is more active: we hop from one activity to another, usually iterating through each many times before arriving at an acceptable solution.

Understand the Perceived Problem. Talk with the user about the context in which the problem arises. Problem statements often include ideas about solutions: like all early ideas, they should be considered but not followed slavishly.

Specify an Abstract Problem. A clean, crisp problem statement helps us first to solve this problem and then to see how this solution can be applied to other problems.

Explore the Design Space. Too many programmers jump too quickly to “the” solution to their problem; they think for a minute and code for a day rather than thinking for an hour and coding for an hour.

Implement One Solution. On lucky days our exploration of the design space shows that one program is far superior to the rest; at other times we have to prototype the top few to choose the best. We should strive to implement the chosen design in straightforward and succinct code.†

† Problem 5 describes a class exercise that I graded on programming style. Most students turned in one-page solutions and received mediocre grades. Two students who had spent the previous summer on a large software development project turned in beautifully documented five-page programs, broken into a dozen procedures, each with an elaborate heading. They received failing grades. My program worked in five lines of code, and their inflation factor of sixty was too much for a passing grade. When they complained that they were employing standard software engineering tools, I should have quoted Pamela Zave: “The purpose of software engineering is to control complexity, not to create it.”

Column 12: Heaps
CACM, March 1985

12.3 Priority Queues

There are two sides to any data structure. Looking from the outside, its abstraction tells what it does — a queue maintains a sequence of elements under the operations of insert and extract. On the inside, its implementation tells how it does it — a queue might use an array or a linked list.

12.5 Principles

Abstraction. Good engineers distinguish between what a component does (the abstraction seen by the user) and how it does it (the implementation inside the black box). In this column we’ve seen two ways to package black boxes: procedural abstraction and abstract data types.

Column 13: A Spelling Checker
CACM, May 1985

13.2 The Design Space
An affix is a prefix like pre- or a suffix like -ly; most dictionaries leave affix analysis to the reader. Although a spelling checker may in good conscience report antidisestablishmentarianism as a mistake, I would be miffed to wade through a long list of misspellings like cats and replay when cat and play were in the dictionary.
Epilog
[...] thinking hard about programming can be both useful and fun. There’s more to the job than systematic program development from formal requirements documents. If this book helps just one disillusioned programmer to fall back in love with his or her work, it will have served its purpose.

Here are some points that occur repeatedly in “Principles” sections.

Work on the right problem.
Explore the design space of solutions.
Look at the data.
Use the back of the envelope.
Design with components.
Build prototypes.
Make tradeoffs when you have to.
Keep it simple.

The points were originally discussed in the context of programming, but they apply to any engineering endeavor.

text checked (see note) Feb 2024

top of page