Quantification of Information

Entropy of English - Letter Models

  1. Zero-order approximation. (The symbols are independent and equiprobable; with 26 letters plus the space, the entropy is log2 27 ≈ 4.75 bits per letter.)

    XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGXYD QPAAMKBZAACIBZLHJQD
     

  2. First-order approximation. (The symbols are independent, but the frequency of letters matches English text -- e.g., p(e) = 0.13 and p(q) = p(z) = 0.001.)

    OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.

  3. Second-order approximation. (The frequency of pairs of letters (i.e., digrams) matches English text.)

    ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.


  4. Third-order approximation. (The frequency of triplets of letters (i.e., trigrams) matches English text.)

    IN NO IST LAT WHEY CRATICT FROURE BERS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.


  5. Fourth-order approximation. (The frequency of quadruplets of letters matches English text. Each letter depends on the previous three letters.)

    THE GENERATED JOB PROVIDUAL BETTER TRAND THE DISPLAYED CODE, ABOVERY UPONDULTS WELL THE CODERST IN THESTICAL IT DO HOCK BOTHEMERG.
    (INSTATES CONS ERATION. NEVER ANY OF PUBLE AND TO THEORY. EVENTIAL CALLEGAND TO ELAST BENERATED IN WITH PIES AS IS WITH THE)
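
The approximations above can be reproduced with a small character n-gram sampler. The following is a minimal sketch, not the procedure used to produce the samples quoted here: the function names (normalize, zero_order_sample, train_ngram_model, generate) and the toy corpus are assumptions made for illustration. Here n is the order in the sense used above, so n = 1 samples letters independently by frequency, n = 2 matches digram frequencies, and n = 4 makes each letter depend on the previous three.

```python
import math
import random
from collections import Counter, defaultdict

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 26 letters plus the space: 27 symbols


def normalize(text):
    """Uppercase the text and turn every run of non-letters into one space."""
    kept = [c if c in ALPHABET else " " for c in text.upper()]
    return " ".join("".join(kept).split())


def zero_order_sample(length, rng):
    """Zero-order model: independent, equiprobable symbols."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def train_ngram_model(text, n):
    """For each context of n-1 characters, count the characters that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        model[text[i:i + n - 1]][text[i + n - 1]] += 1
    return model


def generate(model, n, length, rng):
    """Sample text in which each character depends on the previous n-1."""
    if not model:
        raise ValueError("training text is shorter than n")
    contexts = sorted(model)  # sorted so a fixed seed gives a fixed result
    out = list(rng.choice(contexts))
    while len(out) < length:
        context = "".join(out[len(out) - (n - 1):])
        followers = model.get(context)
        if not followers:
            # Context seen only at the very end of the corpus: restart.
            out.extend(rng.choice(contexts))
            continue
        symbols = list(followers)
        weights = [followers[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out[:length])


# Entropy of the zero-order model: log2 of the alphabet size.
print(f"{math.log2(len(ALPHABET)):.3f} bits per letter")

rng = random.Random(0)
# Toy corpus, an assumption for this sketch; real use needs a large English text.
corpus = normalize("the cat sat on the mat and the rat sat on the hat. " * 5)
print(0, zero_order_sample(40, rng))
for n in (1, 2, 3, 4):
    print(n, generate(train_ngram_model(corpus, n), n, 40, rng))
```

With a corpus this small the higher-order output simply replays the training text; the gradual drift toward English-looking words seen in the samples above only appears when the counts come from a large body of text.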

Biographies and Resources
  Abstract:
Shannon in 1950 estimated the entropy of written English to be between 0.6 and 1.3 bits per character (bpc), based on the ability of human subjects to guess successive characters in text.  Simulations to determine the empirical relationship between the provable bounds and the known entropies of various models suggest that the actual value is 1.1 bpc or less.
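
For comparison with the figures in the abstract, the entropy of a letter n-gram model can be estimated directly from counts. The sketch below is a plug-in estimate of the conditional entropy of a character given the previous n-1 characters; it is not Shannon's guessing-game method, and the function name conditional_entropy_bits is an assumption for illustration. On small corpora the estimate is biased low for larger n, because most contexts are seen only a few times.

```python
import math
from collections import Counter


def conditional_entropy_bits(text, n):
    """Plug-in estimate of H(X_n | X_1 .. X_{n-1}) in bits per character."""
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count  # total occurrences of each context
    total = sum(ngrams.values())
    h = 0.0
    for gram, count in ngrams.items():
        # p(gram) * log2 p(last char | context)
        h -= (count / total) * math.log2(count / contexts[gram[:-1]])
    return h


# Alternating text: 1 bit per letter with no context, 0 bits given one letter.
print(conditional_entropy_bits("ABABABAB", 1))
print(abs(conditional_entropy_bits("ABABABAB", 2)))
```

Running the function for n = 1, 2, 3, ... on a large English text gives a decreasing sequence of values, which is the sense in which longer contexts lower the per-character entropy.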

 
 


Reference
Shannon examples from Elements of Information Theory, Thomas M. Cover and Joy A. Thomas, John Wiley & Sons, New York (1991), ISBN 0-471-06259-6