After looking at the most common words, I decided to look at the most common combinations of words that Sal uses. I expected that *"and now we are done"* would be common.

## Bigrams

Bigrams are combinations of two words (or letters). The five most common bigrams Sal uses are listed below, with counts in parentheses. Before running my analysis, I had assumed that the most frequent bigrams would all be common English word combinations, but actually I think only *"this is"* and *"of the"* are common in most text (though I'd like to check this).

The bigram *"going to"* is probably quite common in English, but I suspect not as common as it is here. This is likely because Sal frequently introduces what he is about to do before doing it. The bigrams *"equal to"* and *"is equal"* are probably relatively rare in English outside of mathematical discussions.

- going to (11,939)
- this is (11,182)
- equal to (10,925)
- of the (8,462)
- is equal (8,068)
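For readers who want to reproduce counts like these, here is a minimal Python sketch of bigram counting. The post doesn't say how the transcripts were tokenized, so the `tokenize` rule here (lowercase, keep only letters and apostrophes so that *"let's"* stays one word) is an assumption, as are all the helper names. The `per_million` helper suggests one way to do the check mentioned above: raw counts depend on corpus size, so comparing Sal's rate for *"going to"* with a general English corpus would need normalized frequencies.

```python
import re
from collections import Counter

def tokenize(text):
    """Assumed tokenization: lowercase, keep letters and apostrophes.

    Punctuation is discarded, so a bigram can span a sentence break.
    """
    return re.findall(r"[a-z']+", text.lower())

def bigram_counts(tokens):
    """Count adjacent word pairs by zipping the tokens with themselves shifted by one."""
    return Counter(zip(tokens, tokens[1:]))

def per_million(counts, gram):
    """Occurrences of one bigram per million bigrams, so corpora of different sizes compare fairly."""
    total = sum(counts.values())
    return 1_000_000 * counts[gram] / total if total else 0.0

# A made-up toy transcript, not real data.
transcript = "So this is going to be equal to x. And this is going to be the answer."
counts = bigram_counts(tokenize(transcript))
for (w1, w2), n in counts.most_common(3):
    print(f"{w1} {w2} ({n})")
```

On the toy sentence this prints three pairs that each occur twice; on real transcripts you would pass in the full text.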

## Trigrams

When we extend our search to trigrams, most of the common bigrams get extended.

You can see that the bigram *"going to"* is extended in both directions to *"(is) going to (be)"*, while *"this is"* is extended to *"(so) this is (the)"*. When we move to 4-grams we can see how *"is equal to"* fits in.

- is equal to (7,990)
- going to be (5,101)
- is going to (3,108)
- so this is (2,137)
- this is the (1,958)
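Going from bigrams to trigrams (or any n) only changes the width of the sliding window. This sketch generalizes the counter and adds a hypothetical `extensions` helper that splits the trigrams containing a given bigram into left and right extensions, which is one way to see *"going to"* grow into *"is going to"* and *"going to be"*. The example sentence is made up for illustration.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=2 gives bigrams, n=3 trigrams, ...)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def extensions(trigrams, bigram):
    """Hypothetical helper: split trigrams containing `bigram` by where the extra word sits.

    left  holds (w, a, b): a word added before the bigram.
    right holds (a, b, w): a word added after the bigram.
    """
    left = {t: c for t, c in trigrams.items() if t[1:] == bigram}
    right = {t: c for t, c in trigrams.items() if t[:2] == bigram}
    return left, right

tokens = "it is going to be so this is going to be the same".split()
trigrams = ngram_counts(tokens, 3)
left, right = extensions(trigrams, ("going", "to"))
print(left)
print(right)
```

On real transcripts, sorting `left` and `right` by count gives the most frequent extension on each side.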

## 4-grams

More of the same, with the sentence fragments expanding further. You can see how these could be pieced together to form the fragment *"is going to be equal to the same thing as"*.

Other 4-grams I found interesting are *"x is equal to"* (1,015), *"the square root of"* (691), *"in the last video"* (379) and *"with respect to x"* (309).

- is going to be (2,395)
- to be equal to (1,148)
- is equal to the (1,096)
- the same thing as (1,064)
- going to be equal (1,020)
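The "piecing together" described above can be made concrete by chaining n-grams whose trailing words match the next n-gram's leading words. This is a greedy sketch with hypothetical helpers `merge` and `stitch`; note that an overlap alone doesn't prove the longer phrase actually occurs in the transcripts, so you would still need to count it.

```python
def merge(fragment, gram, min_overlap=1):
    """Append `gram` to `fragment` if the end of one matches the start of the other.

    Tries the longest overlap first (always leaving at least one new word to add);
    returns None if there is no overlap of at least `min_overlap` words (assumed >= 1).
    """
    for k in range(min(len(fragment), len(gram) - 1), min_overlap - 1, -1):
        if fragment[-k:] == gram[:k]:
            return fragment + gram[k:]
    return None

def stitch(grams, min_overlap=1):
    """Greedily chain a non-empty list of n-grams in order; stop at the first that doesn't overlap."""
    fragment = tuple(grams[0].split())
    for g in grams[1:]:
        merged = merge(fragment, tuple(g.split()), min_overlap)
        if merged is None:
            break
        fragment = merged
    return " ".join(fragment)

print(stitch(["is going to be", "going to be equal", "to be equal to"]))
# → is going to be equal to
```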

## 5-grams

You can probably see where this is going now. Clearly, the clause *"this is going to be equal to"* is very common. Interestingly, *"is the same thing as"* is also very common, and it means essentially the same as *"is equal to"*. The fragment *"let's see if we can"* is very much part of Sal's inclusive style of talking. Other 5-grams include *"both sides of this equation"* (209), *"so let's say I have"* (130), *"the limit as x approaches"* (109) and *"let's say I have a"* (107).

- going to be equal to (932)
- is going to be equal (681)
- is the same thing as (647)
- this is going to be (635)
- let's see if we can (348)
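Rather than inferring a longer clause such as *"this is going to be equal to"* from overlapping 5-grams, one could count it directly. A minimal sketch, assuming the transcripts have already been tokenized into a lowercase word list (here simply by splitting a made-up sentence on whitespace); `phrase_count` is a hypothetical helper.

```python
def phrase_count(tokens, phrase):
    """Count (possibly overlapping) occurrences of a multi-word phrase in a token list."""
    words = phrase.lower().split()
    if not words:
        return 0
    n = len(words)
    # Slide an n-word window over the tokens and compare it with the phrase.
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)

tokens = "so this is going to be equal to that and this is going to be equal to zero".split()
print(phrase_count(tokens, "this is going to be equal to"))  # 2
```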

To be continued...
