Physics 5 Class Notes from 3/10/11
Here is a polished version of the code we started in class on Thursday. It is
designed to measure empirically (by performing random trials) the probability
that a randomly chosen sequence of three alphabetic characters is an English
word.
// Physics 5 code for March 10
#include <iostream>
#include <ctime>
#include <fstream>
#include <string>
#include <cstdlib>
using namespace std;

int main() {
    int count = 0;
    srand(int(time(NULL)));                 // cast time as an int to avoid warning
    ifstream inky("threeletterwords.txt");  // open for reading
    string rword = "AAA", word = "AAA";     // initialize as 3-letter words
    for(int j = 0; j < 10000; ++j) {        // do ten thousand trials
        for(int i = 0; i < 3; i++)          // create a random string
            rword[i] = (char)(rand()%26+65);
        inky.clear();                       // clear the eof (end of file) state
        inky.seekg(0, ios::beg);            // set inky's get pointer to the start
        while(inky >> word) {               // get next word while not at end of file
            if (word == rword) {            // compare random word with file word
                cout << rword << " ";
                ++count;
                break;                      // break out of while loop
            } // end if
        } // end while
    } // end for
    cout << "\nThe empirical probability of a random string "
         << "\nof three letters being an English word is "
         << (float)count/10000.;
    cin.get();
}
Observe the following:
· The seed for the random number generator is the current time. time() returns a time_t, which is often 8 bytes, but srand() expects an unsigned int seed (typically 4 bytes), so the time is cast to an int to avoid a type-conversion warning.
· The input file stream object is often named “infile.” Here it is named “inky” just for some variety.
· The string variables rword and word are initialized as three-letter strings so they’ll have the right size. This matters for rword, because rword[i] can only be assigned at positions that already exist.
· The ASCII characters for A, B, … , Z have decimal values 65-90, so (char)(rand()%26+65) will be a random capital letter from the alphabet.
· When a read runs into the end of the file, the “end of file” (eof) flag is set to true and further reads fail. To read through the file again, clear this flag with inky.clear() and then move the get pointer back to the beginning with inky.seekg(0, ios::beg).
· The counter count is used to tally how many random strings were English words. count/10000 (computed in floating point, hence the (float) cast) is then an empirical measure of the probability that a random three-letter string is an English word.
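The random-letter and rewind steps described above can be isolated into two small helpers. This is only a sketch: the function names randomWord and containsWord are hypothetical (they do not appear in the class code), and containsWord takes any std::istream so it can be tried on an in-memory string as well as on a file.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: build a string of `length` random capital letters.
// 65 is the ASCII code for 'A', so rand()%26+65 lands in 'A'..'Z'.
std::string randomWord(int length) {
    std::string s(length, 'A');          // pre-size the string so s[i] is valid
    for (int i = 0; i < length; ++i)
        s[i] = static_cast<char>(std::rand() % 26 + 65);
    return s;
}

// Hypothetical helper: rewind `in`, then scan its whitespace-separated
// words and report whether `target` is among them.
bool containsWord(std::istream& in, const std::string& target) {
    in.clear();                          // reset eof/fail flags from an earlier pass
    in.seekg(0, std::ios::beg);          // move the get pointer back to the start
    std::string word;
    while (in >> word) {                 // read until the stream runs out
        if (word == target) return true;
    }
    return false;
}
```

For example, with std::istringstream in("CAT DOG EMU"), calling containsWord(in, "EMU") and then containsWord(in, "CAT") both return true. Without the clear() and seekg() calls, the second search would start at the end of the stream and find nothing.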
Here’s one output from running this code:
PRY PEW CWM HUM ERE PIT TOE SOU FEE DUD LAG POI
ZED BEE YOW GOA DOT GAD BUR AHA KOR LOO OLD UKE CAW KAY FRO RIN DUI BUN ODD HAJ
AAS PRY HAT LIN ELL ARB GNU RIG FEN LOX KIR FEZ LOX HUT FEM LEK BOO DEE WYN AWN
EME LUV SEC FEE DUB HEN ARK YAK PEP BRO LAC BUY OOT VAN ELK PAD REC VAC EON LOW
GAM ZED VAS OPT LOT HOE CAM ZOO WEN SUM AHA FEY ZEE NAH MUN MUT SKA RET ASS GAN
WEN FIL ERR POH DAK ATE DOL YAH PAR BUD KEY ELF YID WAW PAH ZOA DOC KOP RUE SOT
AAH AVE ARF OLD BAP ZIG CAB FLY WON JUG NOR PIG YOB PIN PIC FIN TEG KUE THY YOD
FER WAT SOD POP VOE FIZ BAY AUK BOA HAO VEX CEP TIT YAK GIN FOU COT MED PEA TIE
JOT BOP YOD ABY ERR DAL HAP DEL AIN DUD GHI CEE PED NIM NUB AGA PEW HOD HAJ CAR
VEX ALB JEU APT MAR YOU ORE ZIP DEN DIS SIS AVE LIN WAE PAR REC TIS ZOO REE COR
FEH PAT JOY BUG VAS PEW JAG GUL LEX SEI LUV DUI QAT GOX PEN HIM RIM SUM AIL PEE
GAG CAW JOE FEY RIP DAH AMU APT LET TAG RAT ALT KAF GOX BAM BIB AWL FAY TWO PLY
FEZ FIZ DID DAY BAT VAT PET TAG DOC PEC ULU RUT AUK WIZ SOL NAB BID ARS WOK LOO
ALP UGH SIB COS NOS LAM OES DIN RUN COD SOT HOP TIN SOX RAS SEA BAD LIT SOW AGE
FUR PIG FUG KEG GAT KOR HMM BIB LED AHA SAU HON PAM IFF WIS SEW OAF YIP VIG NAG
KIT ITS PAH TIL MUN ATE NEE ABO HAH BOP YET ARM AIS MOA LEX LAX HEW SKI ANY GUY
HET YIP FEE IFF WAB YET HET YID LOP YAR LET DEW AYE TEG DIB SOP WAY LAC GAD YOD
GEM NIM ODD ARF PYA GYP FAG BAL YET TED VEX KAS INK YAH DEL WRY OMS OFF ROD WAP
COW HOB WOK HUN SUM SAP DUP FET VOX UTS NOW ABY DAW NAN RHO SIT ERG PUB HUH MAR
SAD ERE OUD BIN EAT BUY FOX GAM CEP UNS AGA ANT WOE THO RIB MIS BYE ROT WAN MAY
RAS ORE AWE PUD ALP OAR ZEE HEX KOR HOY ANY HER OPE BIN LAD MOS RYE ORE HER GAL
WYE HAO OUT FOG ONS SRI LOW HEP EMU DIP WAY WYN BAG TUT CWM UKE AWA THY SIT MEN
HAY USE KEX PUS CUD OWE EKE NAE ZAG ROW KOI SOB DID TOP TIC HMM PUP THE BOD PHT
DOR DEE VOE OUT PAC WET GIP UKE SPY TOD FEN OPS SHY EBB DAK HAG SHH HUP KEA LEU
BUD NOD HEM WIT UMP GIP HUN GAR BAY ABS DOC MOB RAS ANE ZIN ZOA TOP GUL EDH BRO
YUP MON HUB FAR WRY TAP JEU PET YAW YOD BOO FIB GUN THO WAX VIE AIL JAR OMS TIL
MOA SUP MOB DUO LOX LOW DIG CEP WON DEL ASS ARS WIZ GOB IRK PAR FOB SOW VAU REG
JIN LAR WAD TIC MAE DUI WHO ABO SEG HIC GIE NOW VET BAP REF ROE RID ERN YOB HEN
TOT OXY YOM
The empirical probability of a random string
of three letters being an English word is 0.0555
Since there are 970 three-letter words in the word list and 26^3 = 17576
possible three-letter strings, the likelihood of picking a word at random is
970/26^3 = 0.0552, so this looks good!
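To put a number on "looks good," the comparison can be sketched in code. The helper names exactProbability and standardError are hypothetical, and the standard-error estimate is an addition that assumes the 10000 trials are independent, so the word count follows a binomial distribution.

```cpp
#include <cmath>

// Exact probability that a uniformly random three-letter string is one of
// `wordCount` distinct three-letter words (970 in the notes' word list).
double exactProbability(int wordCount) {
    return wordCount / (26.0 * 26.0 * 26.0);   // 26^3 = 17576 possible strings
}

// Standard error of the measured fraction after `trials` independent trials,
// from the binomial formula sqrt(p(1-p)/N).
double standardError(double p, int trials) {
    return std::sqrt(p * (1.0 - p) / trials);
}
```

With 970 words and 10000 trials, these give a probability of about 0.0552 and a standard error of about 0.0023, so the measured value of 0.0555 lies well within one standard error of the expected one.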