Tuesday, August 20, 2013

Open Hatch's Scrabble Challenge: The Analysis

Charting letter frequency

Our first step is to analyze the SOWPODS word list used in this challenge. The Wikipedia entry for SOWPODS includes a breakdown of the word list by number of letters. That breakdown is worth knowing whether you are a tournament player or just completing this challenge.

Below is a program that recreates this breakdown, although in a slightly different manner. Read through the code and see if you can explain what each part does. Try to determine why we care about the length of the words.

stats.py



#!/usr/bin/env python
# -*- coding: ascii -*-

"""
Sowpods stats
    - counts the words on the list
    - finds the longest word
    - breakdown of word length
 
    ToDo
    - Help
    - Error handling
 
"""
from __future__ import print_function
import sys

class LenCounter:
    def __init__(self):
        self.dict = {}
    def add(self, item):
        count = self.dict.get(item, 0)
        self.dict[item]  = count + 1
    def counts(self, desc=None):
        """Returns a list of (count, key) pairs sorted by count.
        Pass desc as 1 if you want a descending sort. """
        # sorted(zip(...)) works on Python 2 and 3; map(None, ...) is Python 2 only
        result = sorted(zip(self.dict.values(), self.dict.keys()))
        if desc: result.reverse()
        return result
 
def get_stats():
    input_file = sys.argv[1]
    word_count = 0
    longest_length = 0
    lc = LenCounter()
 
    # The with block closes the file even if an error occurs
    with open(input_file, 'r') as f:
        for line in f:
            # Strip the newline once and reuse the length
            length = len(line.strip())
            word_count += 1
            lc.add(length)
            if length > longest_length:
                longest_length = length
 
    print("Word Count:", word_count)
    print("Longest Word Length:", longest_length)
    for item in lc.counts():
        print(item)
     
         
def print_help():
    """ Help Docstring"""
    pass


def test():
    """ Testing Docstring"""
    pass

if __name__=='__main__':
    # test()
    get_stats()
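
For comparison, the same tally can be sketched with collections.Counter from the standard library. This is an alternative to LenCounter, not the approach stats.py takes. The function name length_stats and the sample words are made up for illustration, and unlike stats.py this sketch skips blank lines instead of counting them as words.

```python
from collections import Counter


def length_stats(words):
    """Return (word_count, longest_length, counts) for an iterable of words.

    counts is a list of (count, length) pairs in ascending order,
    the same shape that LenCounter.counts() produces.
    """
    # Strip whitespace and skip blank lines before measuring.
    lengths = [len(w.strip()) for w in words if w.strip()]
    tally = Counter(lengths)
    counts = sorted((count, length) for length, count in tally.items())
    longest = max(lengths) if lengths else 0
    return len(lengths), longest, counts


# Hypothetical sample; a real run would pass the lines of the word list file.
sample = ["aa\n", "ab\n", "cat\n", "zebra\n"]
print(length_stats(sample))
```

Passing an open file object instead of the sample list should work the same way, since the function only needs something it can iterate over line by line.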


The Results


Word Count: 267751
Longest Word Length: 15
(124, 2)
(1292, 3)
(5454, 4)
(5757, 15)
(9116, 14)
(12478, 5)
(13857, 13)
(20297, 12)
(22157, 6)
(27893, 11)
(32909, 7)
(35529, 10)
(40161, 8)
(40727, 9)
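
Each pair is (number of words, word length), and the list is sorted by count, so the lengths appear out of order. A small sketch like the following re-sorts the pairs by word length and adds each length's share of the list. The pairs are copied from the results above; the names results, total, and by_length are only illustrative.

```python
# (count, length) pairs copied from the output of stats.py
results = [
    (124, 2), (1292, 3), (5454, 4), (5757, 15), (9116, 14),
    (12478, 5), (13857, 13), (20297, 12), (22157, 6), (27893, 11),
    (32909, 7), (35529, 10), (40161, 8), (40727, 9),
]

# The counts should add up to the reported word count.
total = sum(count for count, _ in results)

# Sort on the second element of each pair, the word length.
by_length = sorted(results, key=lambda pair: pair[1])
for count, length in by_length:
    share = 100.0 * count / total
    print("{:2d} letters: {:6d} words ({:4.1f}%)".format(length, count, share))
```

Read this way, words of 7 to 10 letters account for a bit more than half of the list, while two-letter words are the rarest.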




2 comments:

  1. This is great Chris! I'm really excited by this, and I have some code I'll share on GitHub as soon as I get home this evening.

    I tried to share it last night, but my internet (Time Warner) is not the greatest.

    Thanks for this fun exercise!!!

  2. Ed,

    I'm looking forward to seeing someone else's solution.

    Thanks!
