Charting letter frequency
Our first step is to analyze the SOWPODS word list used in this challenge. If you read the Wikipedia entry, you will notice that it includes a word distribution by number of letters. This is worth noting whether you are a tournament player or just completing this challenge. Below is a program that recreates this list, although in a slightly different manner. Read through the code and see if you can explain what each part does. Try to determine why we care about the length of the words.
stats.py
#!/usr/bin/env python
# -*- coding: ascii -*-
"""
Sowpods stats
- counts the words on the list
- finds the longest word
- breakdown of word length
ToDo
- Help
- Error handling
"""
from __future__ import print_function
import sys


class LenCounter:
    """Tallies how often each item (here, a word length) occurs."""

    def __init__(self):
        self.dict = {}

    def add(self, item):
        # Bump the tally for this item, starting at 0 the first time it is seen
        count = self.dict.get(item, 0)
        self.dict[item] = count + 1

    def counts(self, desc=None):
        """Returns a list of (count, item) tuples sorted by count.
        Pass desc as 1 if you want a descending sort. """
        # Building the pairs from items() replaces the Python 2-only
        # map(None, values, keys) idiom, so this runs on Python 2 and 3
        result = sorted((count, item) for item, count in self.dict.items())
        if desc:
            result.reverse()
        return result


def get_stats():
    input_file = sys.argv[1]
    word_count = 0
    longest_length = 0
    lc = LenCounter()
    # "with" closes the file even if an error occurs while reading
    with open(input_file, 'r') as f:
        for line in f:
            word = line.strip()  # drop the trailing newline
            word_count += 1
            lc.add(len(word))
            if len(word) > longest_length:
                longest_length = len(word)
    print("Word Count: ", word_count)
    print("Longest Word Length: ", longest_length)
    # One (number of words, word length) tuple per line, fewest words first
    for item in lc.counts():
        print(item)


def print_help():
    """ Help Docstring"""
    pass


def test():
    """ Testing Docstring"""
    pass


if __name__ == '__main__':
    # test()
    get_stats()
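As a point of comparison, here is a minimal sketch of the same length tally built on the standard library's collections.Counter, assuming Python 3. Unlike stats.py, which sorts its (count, length) pairs by count, it returns (length, count) pairs ordered by word length, which may be easier to read against a distribution by number of letters. The helper name length_breakdown is hypothetical and not part of stats.py.

```python
from collections import Counter


def length_breakdown(lines):
    """Return (word_length, number_of_words) pairs, shortest length first.

    Hypothetical helper (not part of stats.py): `lines` is any iterable
    of words, one per item, such as an open word-list file.
    """
    # Count how many words have each length, ignoring the trailing newline
    lengths = Counter(len(line.strip()) for line in lines)
    # Sorting the (length, count) items orders them by word length
    return sorted(lengths.items())
```

Called on an open word-list file, length_breakdown(f) gives the per-length counts directly; summing the counts gives the word count, and the last pair holds the longest word length.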
The Results
Word Count: 267751
Longest Word Length: 15
(124, 2)
(1292, 3)
(5454, 4)
(5757, 15)
(9116, 14)
(12478, 5)
(13857, 13)
(20297, 12)
(22157, 6)
(27893, 11)
(32909, 7)
(35529, 10)
(40161, 8)
(40727, 9)
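To turn these numbers into a chart, here is a minimal sketch, assuming Python 3, that prints the results above as a text bar chart ordered by word length. The counts are copied from the output above; the names RESULTS and text_histogram and the 50-character bar width are arbitrary choices for illustration.

```python
# (word_length: number_of_words) taken from the stats.py output above
RESULTS = {
    2: 124, 3: 1292, 4: 5454, 5: 12478, 6: 22157, 7: 32909, 8: 40161,
    9: 40727, 10: 35529, 11: 27893, 12: 20297, 13: 13857, 14: 9116, 15: 5757,
}


def text_histogram(counts, width=50):
    """Return one chart line per word length, with bars scaled so the
    largest count is `width` characters long (shorter bars round down)."""
    biggest = max(counts.values())
    lines = []
    for length in sorted(counts):
        bar = "#" * (counts[length] * width // biggest)
        # Right-align the length, pad the bar to a fixed width, then the count
        lines.append("%2d | %-*s %d" % (length, width, bar, counts[length]))
    return lines


if __name__ == "__main__":
    print("Total words:", sum(RESULTS.values()))
    for line in text_histogram(RESULTS):
        print(line)
```

The per-length counts add up to 267751, the same as the reported word count, and the chart makes it easy to see that the distribution peaks at 8 and 9 letters.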
This is great, Chris! I'm really excited by this, and I have some code I'll share on GitHub as soon as I get home this evening.
I tried to share it last night, but my internet (Time Warner) is not the greatest.
Thanks for this fun exercise!!!
Ed,
I'm looking forward to seeing someone else's solution.
Thanks!