Physics 5  Class Notes from 3/10/11      

Here is a polished version of the code we started in class on Thursday. It measures empirically (by performing random trials) the probability that a randomly chosen sequence of three alphabetic characters is an English word.

 

// Physics 5 code for March 10

#include <iostream>
#include <ctime>
#include <fstream>
#include <string>
#include <cstdlib>

using namespace std;

int main() {
    int count = 0;
    srand(int(time(NULL))); // cast time as an int to avoid warning
    ifstream inky("threeletterwords.txt"); // open for reading
    string rword = "AAA", word = "AAA"; // initialize as 3-letter words
    for (int j = 0; j < 10000; ++j) { // do ten thousand trials
        for (int i = 0; i < 3; i++) // create a random string
            rword[i] = (char)(rand()%26+65);
        inky.clear(); // clear the eof (end of file) state
        inky.seekg(0, ios::beg); // set inky’s get pointer to the start
        while (inky >> word) { // get next word while not at end of file
            if (word == rword) { // compare random word with file word
                cout << rword << " ";
                ++count;
                break; // break out of while loop
            } // end if
        } // end while
    } // end for
    cout << "\nThe empirical probability of a random string "
         << "\nof three letters being an English word is "
         << (float)count/10000.;
    cin.get();
}

 

Observe how

·         The seed for the random number generator is the current time, which time() returns as a time_t that is often 8 bytes. Since srand() expects an unsigned int seed (typically 4 bytes), the time is cast explicitly to avoid a compiler warning about the size mismatch.

·         Often the input file stream object is named “infile.”  Here it is named “inky” just for some variety.

·         The string variables rword and word are initialized as three-letter strings so they’ll have the right size.

·         The ASCII characters for A, B, … , Z have decimal values 65-90, so (char)(rand()%26+65) will be a random capital letter from the alphabet.

·         If the input file stream pointer reaches the end of the file, then the “end of file” (eof) flag is set to true.  To read through the file again, clear this flag with inky.clear() and then move the pointer back to the beginning with inky.seekg(0, ios::beg).

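·         The counter count is used to tally how many random strings were English words.  The ratio count/10000 is then an empirical measure of the probability that a random three-letter string is an English word.  The code does this division in floating point (note the cast and the decimal point in 10000.), since integer division would give 0.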
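The class code rewinds and rescans the file on every trial. As an alternative sketch (not the version written in class), the word list could be read once into a std::set, after which each trial is a single lookup. The function names randomWord, loadWords, loadWordsFromText, and countHits are made up for this illustration, and the caller is assumed to have seeded rand() with srand() as in the code above.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Return a string of n random capital letters, using the same
// rand()%26+65 trick as the class code (65 is ASCII 'A').
std::string randomWord(std::size_t n) {
    std::string s(n, 'A');
    for (std::size_t i = 0; i < n; ++i)
        s[i] = (char)(std::rand() % 26 + 65);
    return s;
}

// Read every whitespace-separated word from a stream into a set.
// Works with any input stream, e.g. an ifstream opened on the word file.
std::set<std::string> loadWords(std::istream& in) {
    std::set<std::string> words;
    std::string w;
    while (in >> w)
        words.insert(w);
    return words;
}

// Convenience wrapper (hypothetical) for trying the code without a file.
std::set<std::string> loadWordsFromText(const std::string& text) {
    std::istringstream in(text);
    return loadWords(in);
}

// Count how many of `trials` random three-letter strings are in the set.
int countHits(const std::set<std::string>& words, int trials) {
    assert(trials >= 0);
    int count = 0;
    for (int j = 0; j < trials; ++j)
        if (words.count(randomWord(3)) > 0)
            ++count;
    return count;
}
```

For example, after opening ifstream inky("threeletterwords.txt"), calling countHits(loadWords(inky), 10000) should give a count comparable to the one from the class code, though the particular random strings will differ from run to run.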

 

Here’s one output from running this code:

PRY PEW CWM HUM ERE PIT TOE SOU FEE DUD LAG POI ZED BEE YOW GOA DOT GAD BUR AHA KOR LOO OLD UKE CAW KAY FRO RIN DUI BUN ODD HAJ AAS PRY HAT LIN ELL ARB GNU RIG FEN LOX KIR FEZ LOX HUT FEM LEK BOO DEE WYN AWN EME LUV SEC FEE DUB HEN ARK YAK PEP BRO LAC BUY OOT VAN ELK PAD REC VAC EON LOW GAM ZED VAS OPT LOT HOE CAM ZOO WEN SUM AHA FEY ZEE NAH MUN MUT SKA RET ASS GAN WEN FIL ERR POH DAK ATE DOL YAH PAR BUD KEY ELF YID WAW PAH ZOA DOC KOP RUE SOT AAH AVE ARF OLD BAP ZIG CAB FLY WON JUG NOR PIG YOB PIN PIC FIN TEG KUE THY YOD FER WAT SOD POP VOE FIZ BAY AUK BOA HAO VEX CEP TIT YAK GIN FOU COT MED PEA TIE JOT BOP YOD ABY ERR DAL HAP DEL AIN DUD GHI CEE PED NIM NUB AGA PEW HOD HAJ CAR VEX ALB JEU APT MAR YOU ORE ZIP DEN DIS SIS AVE LIN WAE PAR REC TIS ZOO REE COR FEH PAT JOY BUG VAS PEW JAG GUL LEX SEI LUV DUI QAT GOX PEN HIM RIM SUM AIL PEE GAG CAW JOE FEY RIP DAH AMU APT LET TAG RAT ALT KAF GOX BAM BIB AWL FAY TWO PLY FEZ FIZ DID DAY BAT VAT PET TAG DOC PEC ULU RUT AUK WIZ SOL NAB BID ARS WOK LOO ALP UGH SIB COS NOS LAM OES DIN RUN COD SOT HOP TIN SOX RAS SEA BAD LIT SOW AGE FUR PIG FUG KEG GAT KOR HMM BIB LED AHA SAU HON PAM IFF WIS SEW OAF YIP VIG NAG KIT ITS PAH TIL MUN ATE NEE ABO HAH BOP YET ARM AIS MOA LEX LAX HEW SKI ANY GUY HET YIP FEE IFF WAB YET HET YID LOP YAR LET DEW AYE TEG DIB SOP WAY LAC GAD YOD GEM NIM ODD ARF PYA GYP FAG BAL YET TED VEX KAS INK YAH DEL WRY OMS OFF ROD WAP COW HOB WOK HUN SUM SAP DUP FET VOX UTS NOW ABY DAW NAN RHO SIT ERG PUB HUH MAR SAD ERE OUD BIN EAT BUY FOX GAM CEP UNS AGA ANT WOE THO RIB MIS BYE ROT WAN MAY RAS ORE AWE PUD ALP OAR ZEE HEX KOR HOY ANY HER OPE BIN LAD MOS RYE ORE HER GAL WYE HAO OUT FOG ONS SRI LOW HEP EMU DIP WAY WYN BAG TUT CWM UKE AWA THY SIT MEN HAY USE KEX PUS CUD OWE EKE NAE ZAG ROW KOI SOB DID TOP TIC HMM PUP THE BOD PHT DOR DEE VOE OUT PAC WET GIP UKE SPY TOD FEN OPS SHY EBB DAK HAG SHH HUP KEA LEU BUD NOD HEM WIT UMP GIP HUN GAR BAY ABS DOC MOB RAS ANE ZIN ZOA TOP GUL EDH BRO YUP MON HUB FAR WRY TAP JEU PET 
YAW YOD BOO FIB GUN THO WAX VIE AIL JAR OMS TIL MOA SUP MOB DUO LOX LOW DIG CEP WON DEL ASS ARS WIZ GOB IRK PAR FOB SOW VAU REG JIN LAR WAD TIC MAE DUI WHO ABO SEG HIC GIE NOW VET BAP REF ROE RID ERN YOB HEN TOT OXY YOM
The empirical probability of a random string
of three letters being an English word is 0.0555

 

Since the word list contains 970 three-letter words and there are 26^3 = 17576 possible three-letter strings, the likelihood of picking a word at random is 970/26^3 = 970/17576 ≈ 0.0552, so this looks good!
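To judge how close "close" should be, the comparison can be made quantitative. The sketch below computes the exact probability and the binomial standard error expected for the measured fraction, assuming the 10000 trials are independent and each letter is uniformly distributed. The function names exactProbability and standardError are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Exact probability that a uniformly random three-letter string is one of
// `numWords` distinct three-letter words: numWords / 26^3.
double exactProbability(int numWords) {
    return numWords / (26.0 * 26.0 * 26.0);
}

// Binomial standard error of a hit fraction measured over `trials`
// independent trials, each succeeding with probability p.
double standardError(double p, int trials) {
    assert(trials > 0);
    return std::sqrt(p * (1.0 - p) / trials);
}
```

With 970 words and 10000 trials, the standard error comes out to about 0.0023, so the measured 0.0555 lies well within one standard error of the expected 0.0552.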