Monday, July 28, 2008

Optimal Span - Why YOU Should Care

Howard agreed to do his new Topics on Biosemiotics and Language (with some help from Joel's Topic on Memory) with the expectation I would do one on Optimal Span. He was Chairman of my PhD Committee. My dissertation, Hierarchy Theory: Some Common Properties of Competitively-Selected Systems, centered on Optimal Span. (Howard is the author of an excellent book Hierarchy Theory - The Challenge of Complex Systems.)

Why should you care and how does this affect you?

Most obviously, it affects your employment experiences, but also (according to my thesis) the hierarchical structure of things from your ability to discriminate sights and sounds and tastes to written language to how proteins, RNA, and DNA fold! As recently as 2006, a Dutch scholar I do not know wrote Organizational Structures for Dealing with Complexity, which cites my PhD dissertation and a draft paper I wrote for my students at U. Maryland (see Bart A. Meijer, pages 6, 104, 106, 107 and 204).


Management experts have long recommended that Management Span of Control be in the range of five or six for employees whose work requires considerable interaction. That's why corporate hierarchies usually have around six employees (sometimes a few more than six) in each first-level department and around five (sometimes a bit less) first-level departments reporting to the next level up and so on. (If the lowest level consists of service-type employees, there may be a dozen or two or more in a department, but there will usually be one or more foremen or team leaders, etc.)

The above diagram shows three different ways you might organize 49 workers. In (A) you have ONE manager and 48 workers, which is a BROAD hierarchy. Management experts would say a Management Span of Control of 48 is way too much for anyone to handle! In (B) you have THIRTEEN managers in a three-level management hierarchy and only 36 workers, which is a TALL hierarchy with an average Management Span of Control of only 3.3. Management experts would say this is way too inefficient with too many managers! In (C) you have SEVEN managers and 42 workers in a MODERATE hierarchy with an average Management Span of Control of about 6.5. Management experts would say this is about right for most organizations where the workers have to interact with each other. Optimal Span theory supports this common-sense belief!


George A. Miller's classic 1956 paper, The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, showed that human senses of sight, sound, and taste were generally limited to five to nine gradations that could be reliably distinguished. Miller's paper begins as follows:

My problem is that I have been persecuted by an integer [7 +/- 2]. For seven years this number has followed me around, has intruded in my most private data, and has assaulted me from the pages of our most public journals. This number assumes a variety of disguises, being sometimes a little larger and sometimes a little smaller than usual, but never changing so much as to be unrecognizable. The persistence with which this number plagues me is far more than a random accident. There is, to quote a famous senator, a design behind it, some pattern governing its appearances. Either there really is something unusual about the number or else I am suffering from delusions of persecution.
Miller's paper is well worth reading!


Miller's number also pursued me until I caught it. I showed, as part of my PhD research, that, based on empirical data from varied domains, the optimal span for virtually all hierarchical structures falls into Miller's range, five to nine. Using Shannon's information theory, I also showed that maximum intricacy is obtained when the Span (optimal) for single-dimensional structures is So = 1 + 2e = 6.4 (where e is the base of the natural logarithm, 2.718281828). My "magical number" is not the integer 7, but 6.4, a more precise rendition of Miller's number!

Hierarchy and Complexity

As M. Mitchell Waldrop observes:

[The] hierarchical, building-block structure of things is as commonplace as air. (Complexity - The Emerging Science at the Edge of Order and Chaos, Touchstone, 1992.)

Howard H. Pattee, in his seminal book that I mentioned above, was one of the early researchers in hierarchy theory and he personally challenged me to find:

a simple theory of very complex, evolving systems [and] common, essential properties of hierarchical organizations (Hierarchy Theory - The Challenge of Complex Systems, Braziller, 1973.)

Most complex structures are compositional or control hierarchies. An example of a compositional hierarchy is written language. A word is composed of characters. A simple sentence is composed of words. A paragraph is composed of simple sentences, and so on. An example of a control hierarchy is a management structure, where a manager controls a number of foremen or team leaders, and they, in turn, control a number of workers.

Ira's Hypothesis

The hypothesis at the heart of my PhD dissertation is that the optimal span is about the same for virtually all complex structures that have been competitively selected. That includes the products of Natural Selection (Darwinian evolution) and the products of Artificial Selection (Human inventions that competed for acceptance by human society).

Weak Statement of Hypothesis

In what I call the "weak" statement of the hypothesis, I showed that it is scientifically plausible to believe that diverse structures tend to have spans in the range of five to nine. I did this by gathering empirical data from six domains plus a computer simulation. The domains are:

  1. Human Cognition: Span of Absolute Judgement (one, two and three dimensions), Span of Immediate Memory, Categorical hierarchies and the fine structure of the brain. These all conform to my hypothesis.

  2. Written Language: Pictographic, Logographic, Logo-Syllabic, Semi-alphabetic, and Alphabetic writing. Hierarchically-folded linear structures in written languages, including English, Chinese, and Japanese writing. These all conform to my hypothesis.

  3. Organization and Management of Human Groups: Management span of control in business and industrial organizations, military, and church hierarchies. These all conform to my hypothesis. NOTE: Hierarchy means rank or order of holy beings. I showed that the hierarchies of the angels of the heavenly host, as recounted in Jewish and Christian scriptures and later mystical writings, are not typically in the range five to nine and therefore do not conform to my hypothesis. That is a good result because these hierarchies are not competitively selected! They are either the product of human imagination -or- the Creation of God who is not bound by the laws of information theory!

  4. Animal and Plant Organization and Structure: Primates, schooling fish, eusocial insects (bees, ants), plants. These all conform to my hypothesis.

  5. Structure and Organization of Cells and Genes: Prokaryotic and eukaryotic cells, gene regulation hierarchies. These all conform to my hypothesis.

  6. RNA and DNA: Structure of nucleic acids. These all conform to my hypothesis.

  7. Computer Simulations: Hierarchical generation of initial conditions for Conway's Game of Life (two-dimensional). These all conform to my hypothesis.

Strong Statement of Hypothesis

What I call the "strong" statement of the hypothesis is that Shannon's information theory, and Smith and Morowitz' concept of the intricacy of a graphical representation of a structure, can be used to derive a formula for the optimal span of a hierarchical graph.

This work extended the single-dimensional span concepts of management theory and Miller's "seven plus or minus two" concepts to a general equation for any number of dimensions. I derived an equation that yields Optimal Span for a structure with one-, two-, three- or any number of dimensions!

My equation for Span (optimal) is: So = 1 + De (where D is the degree of the nodes and e is the base of the natural logarithm, 2.718281828).

NOTE: For a one-dimensional structure, such as a management hierarchy or the span of absolute judgement for a single-dimensional visual, taste or sound, the degree of the nodes, D = 2 . This is because each node is a link in a one-dimensional chain or string and so each node has two closest neighbors. For a two-dimensional structure, such as a 2D visual or the pitch and intensity of a sound or a mixture of salt and sugar, D = 4. Each node is a link in a 2D mesh and so each node has four closest neighbors. For a 3D structure, D = 6 because each node is a link in a 3D egg crate and has six closest neighbors. Some of the examples in Miller's paper were 2D and 3D and his published data agreed with the results of my formula. The computer simulation was 2D and also conformed well to the hypothesis.
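As a quick arithmetic check, the formula So = 1 + De can be evaluated for the three cases described in the note above. This is only a minimal sketch: the function name `optimal_span` is made up for illustration, and the degrees 2, 4 and 6 are the ones given in the note.

```python
import math

def optimal_span(degree: int) -> float:
    """Evaluate the post's formula So = 1 + D*e for a given node degree D."""
    return 1 + degree * math.e

# Degrees taken from the note: D = 2 (1D chain), D = 4 (2D mesh), D = 6 (3D lattice).
for dimensions, degree in [(1, 2), (2, 4), (3, 6)]:
    print(f"{dimensions}D structure, D = {degree}: So = {optimal_span(degree):.2f}")
```

Run as is, this gives roughly 6.44, 11.87 and 17.31 for one, two and three dimensions.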


In Chapter 6 of my novel, Jim and Luke wonder about the control structure for the 1600 scepter-holders:

After a period of silence, Luke spoke up. “Sixteen hundred people are way too many for there not to be a hierarchical structure,” he began. “If the scepter-holder system was properly designed, according to system science theory at least, there would have to be several grades above the lowest class of scepter-holder.”

He took out his read-WINs and put them on.

“Luke,” I observed, “There’s no WIN coverage in this area …”

“Right,” answered Luke, “But there are processors and software in my read-WINs that allows them to operate independently. I’ve got a program for ‘optimal span’ – you know the ‘magical number seven plus or minus two.’”

“What the heck is that?” I asked, “And why would I care? Where are we going here?”

“Well, back about a century ago, a psychologist named Miller discovered that human perception, such as sight and smell and taste and memory and so on, is limited to five to nine gradations. He called it 'the magical number seven, plus or minus two' or, more scientifically, the 'span of human perception'."

“Another guy, an engineer named Glickstein, about sixty years ago, proved the optimal span for any structure is one plus the degree of the nodes times 2.71828459, the natural number ‘e.’ For a one-dimensional string, the degree is two and the formula comes out to be around six and a third, or a little more. He also showed with Shannon’s information theory that the range five to nine was, at least theoretically, over ninety-six percent efficient and four to twelve was over eighty percent efficient. And that’s not just for control hierarchies like a management chain, but also containment hierarchies in all types of physical systems and even software systems like …”

“You just told me how to build a clock,” I laughed, interrupting Luke. “All I want to know is what time it is! Please, tell me why I give a hoot about the range five to nine or the number six and a third or a bit more?”

“About forty years ago,” continued Luke, “A management expert rediscovered the optimal span theory and proclaimed that all management structures must adhere to it! Did you ever notice how nearly all departments at TABB have either six or seven workers to each manager? How each second-level manager has six or seven first-level managers working for him or her?”

“Yeah, come to think of it,” I replied, “That’s how it is. On the other hand, when I worked in a factory as a college summer job, we had about a dozen guys and gals in our team.”

“Well,” replied Luke, “The lowest level, like a platoon in the military, can have ten or twelve or sometimes a bit more. The theory only applies when the workers have to interact with each other in complex ways, not when they’re doing grunt work.”

“OK,” I replied, “So, as I asked before, where are we going here?”

“If you’d quit interrupting, I’ll tell you,” Luke said good-naturedly, “According to the optimal span program in my read-WINs, sixteen-hundred scepter-holders would break down into about two-hundred-fifty first-level ‘departments,’ each with six or seven scepter-holders and one higher-level scepter-holder ‘managing’ them. The two-hundred-fifty second-level scepter-holders would report to thirty-six third-level scepter-holders who, in turn, would report to six fourth-level scepter-holders who would report to the top dog scepter-holder if there was one.”

“OK,” I replied, “So the scepter-holders are hierarchically organized … Wait a minute, did you say thirty-six?”

“Yeah,” replied Luke, “There should be thirty-six scepter-holders at the third level. What about it?”

“Well,” I began, very seriously, “We have a tradition in Judaism that there are thirty-six ‘tzadikim’ or ‘righteous ones’ for whose sake the world exists. No one knows who they are. When one dies, he, or she I guess, is replaced by another, chosen by God. They are sometimes called the ‘Lamed Vovniks’ because, according to gematria, which we discussed some months ago, the Hebrew letter Lamed stands for thirty and the letter Vuv for six, which adds up to thirty-six.”

“So,” replied Luke with a level of interest that surprised me at the time, “There would be thirty-six especially powerful scepter-holders who would regulate the rest! And they do need regulation. I’m not one-hundred percent pleased with Stephanie’s ethics ...

Ira Glickstein


joel said...

Hi Ira -Sorry but I'm having difficulty following. I read Miller's paper, but that didn't help. I couldn't figure out what the word "optimal" means in this case. What is the quantity being optimized? -Joel

Ira Glickstein said...

Thanks for your question Joel!

According to my Optimal Span Hypothesis, if the Span is close to 6.4, the Intricacy of the system is optimized to 100%. In other words, using the example illustrated by the diagram in the above posting, if you are given a resource of 49 employees, all else being equal, you will get close to 100% of the maximum theoretical Intricacy if you organize them as shown in (C).

If you organize them as shown in (A), with a Span of 48 -or- as shown in (B) with a Span of 3.3, you will get a substantially lower percentage of the theoretical Intricacy.

So, why should you want to increase Intricacy? For that you need to understand the difference between "complexity" and "intricacy". (In short, complexity = bad; intricacy = good!)

Fortunately, I have just published my first "Knol" on Google's new Knol service and it is about Optimal Span. I plan to post a new Topic on our Blog about the Knol service in a week or so. (In short, "a little bit of knowledge is (not necessarily dangerous, but it is) a Knol!")

Here is the key part that answers your question copied from the Knol:

In normal usage, complexity and intricacy are sometimes used interchangeably. However, there is an important distinction between them according to [Smith and Morowitz, 1982].

Something is said to be complex if it has a lot of different parts, interacting in different ways. To completely describe a complex system you would have to completely describe each of the parts and then describe how they interact. Therefore, a measure of complexity is how long a description would be required for one person to explain it to another.

Something is said to be intricate if it has a lot of parts, but they may all be the same and they may interact in simple ways. To completely describe an intricate system you would only have to describe one or two or a few different parts and then describe the simple ways they interact. For example, a window screen is intricate but not at all complex. It consists of equally-spaced vertical and horizontal wires criss-crossing in a regular pattern in a frame where the spaces are small enough to exclude bugs down to some size. All you need to know is the material and diameter of the wires, the spacing between them, and the size of the window frame. Similarly, a field of grass is intricate but not complex.

If you think about it for a moment, it is clear that, given limited resources, they should be deployed in ways that minimize complexity to the extent possible, and maximize intricacy!

Using [Smith and Morowitz, 1982] concepts of intricacy, it is possible to compute the theoretical efficiency and effectiveness of a hierarchical structure. If it has the Optimal Span, it is 100% efficient, meaning that it attains 100% of the theoretical intricacy given the resources used. If not, the percentage of efficiency can be computed. For example, a one-dimensional tree structure hierarchy is 100% efficient (maximum theoretical intricacy) with a Span of 6.4. For a Span of five, it is 94% efficient (94% of maximum theoretical intricacy). It is also 94% efficient with a Span of nine. For a Span of four or twelve, it is 80% efficient.

You can see the complete Knol and links to the references at

Miller's paper, which is a classic and I'm glad you read it, points out several domains of human sensory and memory ability that seem to favor the "magical number seven plus or minus two". He has no idea why the range five to nine is favored.

I argue that evolution and natural selection happened upon that Optimal Span range because it makes best use of resources available. All competitive-selection processes tend to gravitate to somewhere near the best use of resources. Other factors may force the Span to be somewhat higher or lower, and that is understandable, but, all else being equal, the Optimal Span range will be found and adopted.

Ira Glickstein

joel said...

Hi Ira -Sorry to be dense, but I don't see how one computes this quantity called "intricacy" even in the simple cases you depict. -Joel

Ira Glickstein said...

Joel: Shannon's Information Theory (1948) provides a formula for computation of the "information entropy" (or "uncertainty"). "It quantifies the information contained in a message, usually in bits or bits/symbol. It is the minimum message length necessary to communicate information."

The Shannon equation has the form:

H(X) = -Σ p(xi) log p(xi), where the sum runs over all possible values xi

While I am not enough of a mathematician to fully understand this in detail, I know Shannon worked for the telephone company and was interested in how much information could be transferred from point A to point B.
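To make the entropy formula concrete, here is a small sketch that evaluates it in bits (log base 2), using the standard form with a leading minus sign. The function name `shannon_entropy` and the two coin examples are illustrative assumptions, not something taken from Shannon's paper or from my dissertation.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits: H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin is maximally uncertain; a heavily biased coin is less surprising.
fair_coin = shannon_entropy([0.5, 0.5])
biased_coin = shannon_entropy([0.9, 0.1])

print(f"fair coin:   {fair_coin:.3f} bits")
print(f"biased coin: {biased_coin:.3f} bits")
```

The fair coin comes out at exactly one bit per toss, and the biased coin at a bit under half of that, which matches the idea that a less surprising message carries less information.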

Smith and Morowitz (1982) adapted Shannon entropy to compute what they called the intricacy per edge of a structure represented by a graph.

I put their equation in the following form:

I = S (A/M) LOG(base 2) (A/M)

where I is intricacy in bits per edge, S is the total number of nodes in the graph, A is the actual number of nodes connected, and M is the maximum number of nodes that could be connected.

"Information" in a message, viewed abstractly, has to do with uncertainty or surprise. If I told you GW Bush was in Washington or Texas or was married to Laura Bush that would not be much information to you. If I told you he was on the Moon or married to Hillary Clinton that would be more surprising and (if true) would be more information!

Smith and Morowitz use that idea in their concept of intricacy. Given a graph consisting of nodes and edges, how many edges would be the most surprising per investment in edges?

Let us take a simple example to get the idea across and then we'll look at the equation again.

Say you have five nodes, A B C D and E.

There are ten places you could place an edge: A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E.

If you decide to place one edge, there are 10 possible places you could put it. So the "surprise" per edge would be 10/1= 10.

If you decide to place two edges, there are 45 possible ways you could do it. So the "surprise" per edge would be 45/2 = 22.5. So, there would be more intricacy per edge for two edges than for one.

If you decide to place three edges, there are 120 possible ways you could do it. So the "surprise" per edge would be 120/3 = 40. So, there would be more intricacy per edge for three edges than for two.

If you decide to place four edges, there are 210 possible ways you could do it. So the "surprise" per edge would be 210/4 = 52.5. So, there would be more intricacy per edge for four edges than for three.

If you decide to place five edges, there are 252 possible ways you could do it. So the "surprise" per edge would be 252/5 = 50.4. So, there would be less intricacy per edge for five edges than for four.

OK, we now know that the intricacy per edge for a graph with five nodes is maximized if you place four of the possible ten edges. Any fewer than four will give you fewer possible ways per investment in edges. Any more than four will also give you fewer possible ways per edge.
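The counting in the five-node example can be reproduced in a few lines. This sketch covers only this one example (five nodes, ten possible edges); the variable names are made up for illustration, and "surprise per edge" is simply the number of ways to place k edges divided by k, as in the steps above.

```python
from math import comb

nodes = 5
possible_edges = comb(nodes, 2)  # ten distinct pairs among five nodes

# Ways to choose k of the possible edges, divided by the k edges invested.
surprise = {k: comb(possible_edges, k) / k for k in range(1, possible_edges + 1)}

for k, value in surprise.items():
    print(f"{k} edges: {comb(possible_edges, k)} ways, {value:.1f} per edge")

best_k = max(surprise, key=surprise.get)
print(f"most surprise per edge with {best_k} edges")
```

For this example the largest value per edge lands at four edges, in agreement with the hand calculation.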

OK, back to the Smith/Morowitz equation for intricacy I that I put in the following form:

I = S (A/M) LOG(base 2) (A/M)

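Any good mathematician could tell you I is maximized when (A/M) = 1/e = 1/2.718281828 = 0.367879. (Because A/M is less than one, the logarithm is negative, so strictly speaking it is the magnitude of I that peaks there.)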

(We just showed that with the simple example of five nodes, where intricacy was maximized when we connected four of the possible ten edges. A/M = 4/10 = 0.4 which is closest to 0.367879. Would you like me to repeat this for six nodes? Seven? I did not think so!!! Please take my word for it, it works for any number of nodes. The element of "surprise" that we call intricacy is maximized for the investment in edges if we have as (A/M) as close as possible to 0.367879.)
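The location of that peak can also be checked numerically. The sketch below scans values of x = A/M between 0 and 1 and looks for the largest value of -x log2(x). The leading minus sign is my assumption here, added so that the quantity is positive and has a maximum; it covers only the (A/M) LOG (A/M) part of the equation, and the name `intricacy_term` is purely illustrative.

```python
import math

def intricacy_term(x: float) -> float:
    """Magnitude of x * log2(x) for 0 < x < 1, written as -x * log2(x)."""
    return -x * math.log2(x)

# Scan a fine grid of ratios strictly between 0 and 1.
grid = [i / 100000 for i in range(1, 100000)]
best_x = max(grid, key=intricacy_term)

print(f"peak found near x = {best_x:.5f}")
print(f"1/e              = {1 / math.e:.5f}")
```

On this grid the peak sits within one grid step of 1/e, and a ratio of 0.4 scores slightly below that peak.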

I took that result, and showed in my dissertation, using the Derivation reproduced in my Knol, that the Optimal Span for any structure was:

S(optimal) = 1 + De

I also showed that, making certain assumptions (bi-directional, equal-weighted edges) it is best for each node to be connected to around two others (the degree of the node, D = 2).

Assuming D = 2, you get maximum intricacy when

S = 1 + 2e = 6.4

I hope this answers your questions. If you or anyone else has any other questions I will be happy to try to answer them.

Ira Glickstein