Monday, August 26, 2013

Open Hatch's Scrabble Challenge: The Analysis Explained

How long has this been going on?


In the last post of this challenge we wrote a program to examine the SOWPODS word list. Why don't we just skip all this statistics nonsense and write the final code that solves the problem? Why do we care about the word length?

We need to determine the scope of the project before we can choose a proper solution. A SOWPODS word list of 100 words and one of 7 million words could call for drastically different approaches. Word length can also be used to narrow the set of possible matches for a particular rack entered by the user. By analyzing the SOWPODS list, we see that there are a substantial number of longer words, what some call five-dollar words. If our user only gives us 7 or 8 letters, we can exclude any longer words from the set of possible matches and speed up response time.
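To make the length idea concrete, here is a minimal sketch of that filter. The function name candidates_by_length and the sample words are our own illustration and not part of the stats program; it also assumes no blank tiles or letters already on the board.

```python
from __future__ import print_function


def candidates_by_length(words, rack):
    """Keep only the words that are no longer than the rack.

    Hypothetical helper: a word longer than the rack can never be
    spelled from it, so it can be dropped before any letter matching.
    """
    max_len = len(rack)
    return [word for word in words if len(word) <= max_len]


sample_words = ["cat", "quiz", "zebra", "abandonments"]
print(candidates_by_length(sample_words, "zebraqu"))
# prints ['cat', 'quiz', 'zebra']
```

This only trims the candidate list. Checking that the remaining words can actually be built from the rack's letters is a separate step.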

Let's take a look at the code. If you have questions about specific functions and commands, check out the official Python Language Reference. If you are an experienced Python programmer, you may want to skip through the pedantic explanations.

In the Beginning



#!/usr/bin/env python
# -*- coding: ascii -*-

"""
Sowpods stats
    - counts the words on the list
    - finds the longest word
    - breakdown of word length
 
    ToDo
    - Help
    - Error handling
 
"""

If you need some explanation of this section, you can review Executable Python Scripts, Source Code Encoding and Documentation Strings.

Back to the Future


from __future__ import print_function
import string
import sys

Here we import the Python 3 print function from the __future__ module. This makes it easier to port our code to Python 3 if need be.

Get Some Class


class LenCounter:
    """Tally how many times each item (here, a word length) is added."""
    def __init__(self):
        self.dict = {}
    def add(self, item):
        # Start from 0 the first time an item is seen.
        count = self.dict.get(item, 0)
        self.dict[item] = count + 1
    def counts(self, desc=None):
        """Returns a list of (count, item) tuples sorted by count.
        Pass desc as 1 if you want a descending sort. """
        # zip() replaces map(None, ...), which only works in Python 2.
        result = sorted(zip(self.dict.values(), self.dict.keys()))
        if desc: result.reverse()
        return result
 

Here we have created a LenCounter class to build a dictionary to count words of various lengths. See if you can determine what each method in the class does.
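If you want to check your answer, here is a small sketch of the same tally built on collections.Counter from the standard library. This is not what the stats program uses; the function name length_counts and the sample words are assumptions made for illustration.

```python
from __future__ import print_function
from collections import Counter


def length_counts(words):
    """Return (count, length) pairs sorted ascending by count.

    Hypothetical stand-in for add() followed by counts(): each word
    length is tallied, then the pairs are sorted with the count first.
    """
    tally = Counter(len(word) for word in words)
    return sorted((count, length) for length, count in tally.items())


sample = ["aa", "ab", "cat", "dog", "cow", "zebra"]
print(length_counts(sample))
# prints [(1, 5), (2, 2), (3, 3)]
```

Each pair reads as "how many words, of what length", so (3, 3) means three words that are three letters long.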

Get the Stats


def get_stats():
    input_file = sys.argv[1]
    word_count = 0
    longest_length = 0
    lc = LenCounter()

    # The with block closes the file even if an error occurs.
    with open(input_file, 'r') as f:
        for line in f:
            # Strip the newline once and reuse the length.
            length = len(line.strip())
            word_count += 1
            lc.add(length)
            if length > longest_length:
                longest_length = length

    print("Word Count: ", word_count)
    print("Longest Word Length: ", longest_length)
    for item in lc.counts():
        print(item)
     
         

This is the heart of the program. We open the SOWPODS file, read in each line, trim the whitespace, record the word's length in the LenCounter, keep track of the longest length seen so far, and print the results.

The Main Event


if __name__=='__main__':
    # test()
    get_stats()

We will be running this program as a standalone utility, so this block is what calls get_stats() when the file is run directly. Eventually we may modify the program and use it as a module in a larger program.
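If we do go the module route, one possible shape is a function that accepts any iterable of lines instead of reading sys.argv itself. This is only a sketch under that assumption: the name summarize is hypothetical, and unlike the loop in get_stats() it skips blank lines.

```python
from __future__ import print_function
import io


def summarize(lines):
    """Return (word_count, longest_length) for an iterable of lines."""
    word_count = 0
    longest = 0
    for line in lines:
        word = line.strip()
        if not word:
            continue  # assumption: a blank line is not a word
        word_count += 1
        longest = max(longest, len(word))
    return word_count, longest


if __name__ == '__main__':
    # An in-memory file stands in for the real word list here.
    sample = io.StringIO(u"aa\ncat\nzebra\n")
    print(summarize(sample))
    # prints (3, 5)
```

A larger program could then import the function and pass it an open file, a list of strings, or anything else that yields lines.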

Unfinished Business

You may have noticed sections of the stats program are like this:

def test():
    """ Testing Docstring"""
    pass

These functions have yet to be completed and aren't necessary to the core functionality of the program. For now they serve as placeholders for features yet to be implemented. We will come back to finish them at a later date, since they will make for a more complete, correct and friendly program.


