Loading…

Given a list of millions of words, design an algorithm

 

Many problems involving a dictionary can be solved by doing some pre-processing. Where can we do preprocessing?

 

Well, if we’re going to create a rectangle of words, we know that each row must be the same length and each column must be the same length. So let’s group the words of the dictionary based on their sizes. Let’s call this grouping D, where D [ i] contains the list of words of length i.

Next, observe that we’re looking for the largest rectangle. What is the largest rectangle that could be formed? It’s a length ( largest word ).

 

 int maxRectangle = longestWord * longestWord;

 for z = maxRectangle to 1 {

 for each pair of numbers (i, j) where i*j = z {

 /* attempt to make a rectangle. return if successful. */

 }

 }

 

By iterating from the biggest possible rectangle to the smallest, we ensure that the first valid rectangle we find will be the largest possible one.

 

Now, for the hard part: makeRectangle( int 1, int h). This method attempts to build a rectangle of

words which has length 1 and height h.

 

One way to do this is to iterate through all (ordered) sets of h words and then check if the columns are also valid words. This will work, but it’s rather inefficient.

 

Imagine that we are trying to build a 6×5 rectangle and the first few rows are:

  • there

  • queen

  • pizza

 

At this point, we know that the first column starts with tqp. We know or should know-that no dictionary word starts with tqp. Why do we bother continuing to build a rectangle when we know we’ll fail to create a valid one in the end? This leads us to a more optimal solution. We can build a trie to easily look up if a substring is a prefix of a word in the dictionary. Then, when we build our rectangle, row by row, we check to see if the columns are all valid prefixes. If not, we fail immediately, rather than continue to try to build this rectangle.

 

The code below implements this algorithm. It is long and complex, so we will go through it step by step. First, we do some pre-processing to group words by their lengths. We create an array of tries (one for each word length) but hold off on building the tries until we need them.

 

 WordGroup[] grouplist = WordGroup.createWordGroups(list);

 int maxWordlength = grouplist.length;

 Trie trielist[] = new Trie[maxWordlength];

 

The maxRectangle method is the “main” part of our code. It starts with the biggest possible rectangle area (which is maxWord Length) and tries to build a rectangle of that size. If it fails, it subtracts one from the area and attempts this new, smaller size. The first rectangle that can be successfully built is guaranteed to be the biggest.

 

 Rectangle maxRectangle() {

 int maxSize = maxWordLength * maxWordLength;

 for (int z = maxSize; z > 0; z–) {// start from biggest area

 for (int i= 1; i <= maxWordLength; i TT) {

 if (z % i == 0) {

 int j = z / i;

 if (j <= maxWordLength) {

 /* Create rectangle of length i and height j. Note that i * j z. */

Rectangle rectangle = makeRectangle(i, j);

if (rectangle != null) return rectangle;

 }

  }

  }

 }

 return null;

 }

 

The makeRectangle method is called by maxRectangle and tries to build a rectangle of a specific length and height.

 

 Rectangle makeRectangle(int length, int height) {

 if (grouplist[length-1] == null I I grouplist[height-1]

 return null;

 }

 

 /* Create trie for word length if we haven’t yet*/

 if (trielist[height – 1] == null) {

null) {

 LinkedList<String> words = grouplist[height – 1].getWords();

 trielist[height – 1] = new Trie(words);

 }

 

 return makePartialRectangle(length, height, new Rectangle(length));

}

 

The makePartialRectangle method is where the action happens. It is passed in the intended, final length and height, and a partially formed rectangle. If the rectangle is already of the final height, then we just check to see if the columns form valid, complete words, and return.

 

Otherwise, we check to see if the columns form valid prefixes. If they do not, then we immediately break since there is no way to build a valid rectangle off of this partial one.

 

But, if everything is okay so far, and all the columns are valid prefixes of words, then we search through all the words of the right length, append each to the current rectangle, and recursively try to build a rectangle off of{current rectangle with new word appended}.

 

 Rectangle makePartialRectangle(int 1, int h, Rectangle rectangle) {

 if (rectangle.height== h) {// Check if complete rectangle

 if (rectangle.isComplete(l, h, grouplist[h – 1])) {

 return rectangle;

 }

 return null;

 }

 

 /* Compare columns to trie to see if potentially valid rect */

 if (!rectangle.isPartialOK(l, trielist[h – 1])) {

 return null;

 }

 /* Go through all words of the right length. Add each one to the current partial

 * rectangle, and attempt to build a rectangle recursively. */

 for (int i= 0; i < grouplist[l-1].length(); i++) {

 /* Create a new rectangle which is this rect + new word. */

 Rectangle orgPlus = rectangle.append(groupList[l-1].getWord(i));

 

 /* Try to build a rectangle with this new, partial rect */

 Rectangle rect = makePartialRectangle(l, h, orgPlus);

 if (rect != null) {

 return rect;

 }

 }

 return null;

 }

 

The Rectangle class represents a partially or fully formed rectangle of words. The method isPartialOk can be called to check if the rectangle is, thus far, a valid one (that is, all the columns are prefixes of words). The method isComplete serves a similar function, but checks if each of the columns makes a full word.

 

 public class Rectangle {

 public int height, length;

 public char[][] matrix;

 

 /* Construct an “empty” rectangule. Length is fixed, but height varies as we add

 *words.*/

 public Rectangle(int 1) {

 height 0;

length= l;

 }

 

 /* Construct a rectangular array of letters of the specified length and height,

 * and backed by the specified matrix of letters. (It is assumed that the length

 * and height specified as arguments are consistent with the array argument’s

 *dimensions.)*/

 public Rectangle(int length, int height, char[][] letters) {

 this.height = letters.length;

 this.length = letters[0].length;

 matrix = letters;

 }

 

 public char getletter (int i, int j) { return matrix[i][j]; }

 public String getColumn(int i) { … }

 

 /* Check if all columns are valid. All rows are already known to be valid since

 * they were added directly from dictionary.*/

 public boolean isComplete(int 1, int h, WordGroup grouplist) {

 if (height== h) {

 /* Check if each column is a word in the dictionary.*/

 

for (int i= 0; i < l; i++) {

String col= getColumn(i);

if (!groupList.containsWord(col)) {

return false;

}

}

 return true;

 }

 return false;

 }

 

 public boolean isPartialOK(int 1, Trie trie) {

 if (height == 0) return true;

 for (inti= 0; i < l; i++ ) {

 String col= getColumn(i);

 if (!trie.contains(col)) {

 return false;

}

}

Cracking the Coding Interview, 6th Edition 

 return true;

 }

 /* Create a new Rectangle by taking the rows of the current rectangle and

 * appending s. */

 public Rectangle append(String s) { … }

 }

 

The WordGroup class is a simple container for all words of a specific length. For easy lookup, we store the words in a hash table as well as in an Array List.

 

The lists in WordGroup are created through a static method called createWordGroups.

 

 public class WordGroup {

 private HashMap<String, Boolean> lookup= new HashMap<String, Boolean>();

 private Arraylist<String> group= new Arraylist<String>();

 public boolean containsWord(String s) { return lookup.containsKey(s); }

 public int length() { return group.size(); }

 public String getWord(int i) { return group.get(i); }

 public Arraylist<String> getWords() { return group; }

 

 public void addWord (String s) {

 group.add(s);

 lookup.put(s, true);

 }

 

 public static WordGroup[] createWordGroups(String[] list) {

 WordGroup[] grouplist;

 int maxWordlength = 0;

 /* Find the length of the longest word*/

 for (int i= 0; i < list.length; i++) {

 if (list[i].length() > maxWordLength) {

 maxWordLength = list[i].length();

 }

 }

 

 /* Group the words in the dictionary into lists of words of same length.

 * grouplist[i] will contain a list of words, each of length (i+l). */

 grouplist = new WordGroup[maxWordlength];

 for (int i= 0; i < list.length; i++) {

 /*We do wordlength – 1 instead of just wordlength since this is used as

 * an index and no words are of length 0 */

 int wordLength = list[i].length() – 1;

 if (groupList[wordLength] == null) {

 grouplist[wordlength] = new WordGroup();

 }

 groupList[wordLength].addWord(list[i]);

 }

 return grouplist;

 }

 }

 

The full code for this problem, including the code for Trie and TrieNode, can be found in the code

attachment. Note that in a problem as complex as this, you’d most likely only need to write the pseudocode.

Writing the entire code would be nearly impossible in such a short amount of time. 

 


Leave a Comment