Back to November Calendar

Previous Entry Next Entry→

November 23, 2005

NPR's Weekend Edition Sunday Chemical Symbol Puzzle

The WESun puzzle of  Nov. 13th struck me as a nice way to mix basic chemical literacy with basic English vocabulary with the stirring power of mathematics and computer science.  I didn't realize the deadline was Wednesday until Thursday (Thanksgiving) so I didn't turn in a solution--turns out I didn't find the best one, in anycase, so they wouldn't have chosen me which I wouldn't want to have been chosen, even if I were home to be called. 

Here is a statement of the problem:

This week we have a two-week challenge: Write down these two chemical symbols, on the first line write LI, for lithium, and FE for iron. Underneath write NE for neon. And AR for Argon. Reading across you get the four letter words, life and near. And reading down you get, line and fear, completing a miniature word square composed only of chemical symbols. The object is to create a 3X3 square, composed of nine chemical symbols, in which each of the three rows across and each of the columns down, spells a word. You may use either one letter or two letter chemical symbols, but the idea is to use as many two letter ones as possible. Only un-capitalized words are allowed. Our source for acceptable words will be Merriam-Webster 11th Collegiate Dictionary.  What is the Samuel Johnson dictionary's list?

I figured this was a good opportunity to practice reading and writing from/to files, so I forged ahead with this basic plan:

  • Get a list of all six letter words in a text file.  I googled "six letter words" and got to these lists of Scrabble words, which I (perhaps mistakenly!) took as good as gold.  If anything, I was worried about saying some of these words with a straight face.
  • Check all combinations of chemical symbols and see whether or not they're in the list of 6-letter words.  If they're in the list, write them to another file, which will contain only  (one hopes) many 6-letter words, consisting of three consecutive 2-letter chemical symbols, like CaBaNa (calcium, barium, Sodium.)
  • Construct a vector of vectors of strings consisting of 1 or 2 characters representing a chemical symbol.
  • Sort these alphabetically.
  • Use lots of programming structure to build an interlocking 3X3 grid following an algorithm something like this:
  • for each chemWord find all other chemWords with the same first symbol, but sharing no other symbols and fill the matrix with the first word across the first row and the second word down the first column. 
  • For each such gnomon, check to see if there's a 2 across whose second symbol matches the second symbol the 2 down (without repeating a symbol) and
  • check that if you've got a good middle, if there's a 2-down and 2-across combo that also allows for words in the 3-across and 3-down positions (without repeating a symbol) and
  • if you've found all the above, does the third symbol of 3-across match the third symbole of 3-down (without repeating a symbol) ?  If so, hooray!  print!

Even having laid out the plan, it still seems like a fair bit of coding, especially keeping track of the logic levels and all the file i/o and vector and sorting challenges.  Never to worry.  Here we go!

Put aside the question of how to get a better list of six letter words and just accept that the this list is complete.  It does, after all, contain lots of words, including the pigeon anagram, "epigon," which does not appear in Merriam Webster's on-line dictionary, but may be an alternate spelling for their "epigone" which is defined as "an inferior imitator." 

The second bulleted task is not so easily faked.  What I end up doing is concatenating all possible combinations of my 3-letter symbol set and checking to see if the concatenation is in the list of six letter words.  In pseudocode:

for each chemical symbol
    concatenate with each other symbol
    for two-symbol concatenation
        concatenate with a third symbol, not like either of the first two
        if the 3-symbol concatenation is in the list of English words
            write it to the list of chemical words

It's necessary that various text files be in alphabetical order.   If necessary, use a bubblesort routine (see code later on.)

Here's one possible implementation I developed:

//ChemPuzzMain -- for developing list of 6-letter chem words.

#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <cstdlib>
using namespace std;

bool checkThree(string, istream &, ostream &);

int main() {
	// Build a vector of chemical symbol strings 
	// One letter symbols may be omitted here 
	// CS.push_back("f"); CS.push_back("o"); vector<string> CS; CS.push_back("h");  
	// CS.push_back("c"); CS.push_back("i"); CS.push_back("v"); CS.push_back("w"); 
	// CS.push_back("y"); CS.push_back("w"); CS.push_back("u"); CS.push_back("n"); 
	// CS.push_back("s");
	// Here is an incomplete list of two-letter symbols chosen to be 
	// most promising in building 6-letter words.
	CS.push_back("he"); CS.push_back("ac"); CS.push_back("al"); CS.push_back("am"); 
	CS.push_back("ar"); CS.push_back("as"); CS.push_back("at"); CS.push_back("ba"); 
	CS.push_back("be"); CS.push_back("bi"); CS.push_back("br"); CS.push_back("ca"); 
	CS.push_back("ce"); CS.push_back("cl"); CS.push_back("co"); CS.push_back("es"); 
	CS.push_back("er"); CS.push_back("fr"); CS.push_back("ga"); CS.push_back("ge"); 
	CS.push_back("au"); CS.push_back("ha"); CS.push_back("he"); CS.push_back("ho");
	CS.push_back("in"); CS.push_back("ir"); CS.push_back("fe"); CS.push_back("la"); 
	CS.push_back("li"); CS.push_back("mo"); CS.push_back("ne"); CS.push_back("ni"); 
	CS.push_back("no"); CS.push_back("po"); CS.push_back("pr"); CS.push_back("pa"); 
	CS.push_back("ra"); CS.push_back("re"); CS.push_back("se"); CS.push_back("si"); 
	CS.push_back("ag"); CS.push_back("na"); CS.push_back("ta"); CS.push_back("te");
	CS.push_back("th"); CS.push_back("ti"); CS.push_back("te"); CS.push_back("sc");
	
	//open six letter word input file stream
	ifstream inWordSix( "sixLetterWords.txt", ios::in); //, ios::in
	if( !inWordSix ) {
		cerr << "File could not be opened." << endl;
		exit(1);
	}

	//open six letter word output file stream
	ofstream outWordSix("sixChemWords.txt", ios::out);
	if(!outWordSix) {
		cerr << "\nFile sixChemWords couldn't be opened." << endl;
		exit(2);
	}
	// jGuess and kGuess are needed to keep track of running guesses 
	string guess = "", jGuess, kGuess;  
	unsigned i,j,k;
	for(i = 0; i < CS.size(); ++i) {  //for each chemical symbol  
		guess = CS[i];
		// concatenate everyother chemical symbol (should omit same)
		for(j = 0; j < CS.size(); ++j) {
			jGuess = guess;
			guess += CS[j]; //+= is overloaded to concatenate strings
			for(k = 0; k < CS.size(); ++k) {  // for each pair, append third 
				kGuess = guess;
				// this guess will be one of the N^3 possible chem words
				guess += CS[k]; 
				inWordSix.clear();  //WHOA!! This took a long time to find!
				// checkThree checks of this concatenation of three
				// chemical symbols is in the list of 6-letter words
				// and if it is, writes it to the file of same.
				if (checkThree(guess, inWordSix, outWordSix)) {
					cout << guess << endl;
				}
				guess = kGuess; // Restore last 2-symbol guess 
			} // end innermost for loop 
			guess = jGuess;
			cout << "\nguess is now " << guess;
		} //end second for loop
	} // end first for loop 
	char a;
	cin >> a; // for a pause 
	return 0;
}
 
bool checkThree(string s, istream &input, ostream &output) {
	string word;
	bool found = false;
	input.seekg(0, ios::beg); // seekg ifstream ftn positions to start
	do {
		input >> word; // Read word from file 
        	// and if it's in the alphabetically sorted file
		// write it to the output file
		if(word == s) output << word << " ";
	} while( word <= s && !input.eof() );
	// return info on whether or not a word was found 
	return found;
}
 

The main outcome of running this program is a file of six letter chemical words like this one --though by using more symbols or a different list of 6-letter words, of course, you'll get a somewhat different result. 

It may be that concatenating wasn't the most productive thing, since the third bulleted item above, "Construct a vector of vectors of strings consisting of 1 or 2 characters representing chemical symbols"  means that we need to break them apart again in pieces so we have a proper "database" of words of symbol sequences to build our 3X3 array.  Below you should find the code I used to build vector of vectors of strings which compose this data.  As I recall, this code will first build the structure and then sort it.  It could be better to sort as we build. 

Tomorrow.