Monday, August 26, 2013

Open Hatch's Scrabble Challenge: The Analysis Explained

How long has this been going on?


In the last post of this challenge we wrote a program to examine the SOWPODS word list. Why don't we just skip all this statistics nonsense and write the final code that solves the problem? Why do we care about the word length?

We need to determine the scope of the project before we can choose a proper solution. If the SOWPODS word list had only 100 words, or 7 million, the best approach could be drastically different. The length of a word can also be used to narrow the set of possible matches for a rack entered by the user. Analyzing the SOWPODS list shows that there is a substantial number of longer words, what some call five-dollar words. If our user only gives us 7 or 8 letters, we can exclude longer words from the set of possible matches and speed up response time.
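Here is a minimal sketch of that length filter, assuming the word list has already been loaded into a Python list of uppercase words. The function name `candidate_words` is hypothetical and not part of the challenge; it only illustrates the idea of discarding words longer than the rack before doing any letter matching.

```python
def candidate_words(words, rack):
    """Keep only words that are no longer than the rack.

    A word can never use more tiles than the player holds, so anything
    longer than the rack is discarded before letter-by-letter matching.
    """
    max_len = len(rack)
    return [w for w in words if len(w) <= max_len]


words = ["QI", "ZAX", "QUIXOTIC", "ABSTEMIOUSNESS"]
print(candidate_words(words, "QIZAXTO"))  # 7-letter rack -> ['QI', 'ZAX']
```

The check is cheap (just `len()`), so it makes sense to run it before any more expensive test of which letters a word needs.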

Let's take a look at the code. If you have questions about specific functions and commands, check out the official Python Language Reference. If you are an experienced Python programmer, you may want to skip through the pedantic explanations.

In the Beginning



#!/usr/bin/env python
# -*- coding: ascii -*-

"""
Sowpods stats
    - counts the words on the list
    - finds the longest word
    - breakdown of word length
 
    ToDo
    - Help
    - Error handling
 
"""

If you need some explanation of this section, you can review Executable Python Scripts, Source Code Encoding and Documentation Strings.

Back to the Future


from __future__ import print_function
import string
import sys

Here we use the print function from Python 3 by using the __future__ module. This allows us to easily port our code to Python 3 if need be.

Get Some Class


class LenCounter:
    def __init__(self):
        self.dict = {}
    def add(self, item):
        count = self.dict.get(item, 0)
        self.dict[item]  = count + 1
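    def counts(self, desc=None):
        """Returns a list of (count, key) tuples sorted by count.
        Pass desc as 1 if you want a descending sort. """
        # Build (count, key) pairs and sort them. Unlike map(None, ...),
        # this works on both Python 2 and Python 3.
        result = sorted((count, key) for key, count in self.dict.items())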
        if desc: result.reverse()
        return result     
 

Here we create a LenCounter class that builds a dictionary counting how many words there are of each length. See if you can determine what each method in the class does.
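For comparison, Python 2.7 and later ship `collections.Counter`, which does the same bookkeeping as LenCounter. This is a sketch of an alternative, not the code used in stats.py, and it would not run on Python 2.6 (Counter was added in 2.7). The short word list is just sample data.

```python
from collections import Counter

# Count how many words there are of each length, like LenCounter.add()
lengths = Counter(len(w) for w in ["QI", "ZA", "ZAX", "QUIXOTIC"])

# Equivalent of LenCounter.counts(): (count, length) pairs sorted by count
pairs = sorted((count, length) for length, count in lengths.items())
print(pairs)  # [(1, 3), (1, 8), (2, 2)]
```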

Get the Stats


def get_stats():
    input_file = sys.argv[1]
    word_count = 0
    longest_length = 0
    lc = LenCounter()
 
    # 'with' closes the file automatically, even if an error occurs
    with open(input_file, 'r') as f:
        for line in f:
            word = line.strip()  # remove the trailing newline and spaces
            word_count += 1
            lc.add(len(word))
            if len(word) > longest_length:
                longest_length = len(word)

    print("Word Count: ", word_count)
    print("Longest Word Length: ", longest_length)
    for item in lc.counts():
        print(item)
     
         

The heart of the program. We open the SOWPODS file named on the command line, read in each line, trim the whitespace, record each word's length with a LenCounter, keep track of the longest length seen, and print the results.
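One way to make that loop easier to test is to pull it out of get_stats() so it accepts any iterable of lines, such as an open file or a plain list of strings. The function name `summarize` is hypothetical; this is a sketch, not part of stats.py.

```python
def summarize(lines):
    """Return (word_count, longest_length, length_counts) for an iterable of lines."""
    word_count = 0
    longest_length = 0
    length_counts = {}
    for line in lines:
        word = line.strip()  # drop the trailing newline and any spaces
        n = len(word)
        word_count += 1
        length_counts[n] = length_counts.get(n, 0) + 1
        longest_length = max(longest_length, n)
    return word_count, longest_length, length_counts


# A list of strings behaves like a file object here
print(summarize(["AA\n", "ZAX\n", "QUIXOTIC\n"]))  # (3, 8, {2: 1, 3: 1, 8: 1})
```

With a real word list you would pass the file object directly, for example `with open(sys.argv[1]) as f: print(summarize(f))`.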

The Main Event


if __name__=='__main__':
    # test()
    get_stats()

We will be running this program as a standalone utility, so this check is necessary: get_stats() runs only when the file is executed directly, not when it is imported. Eventually we may modify this and use it as a module in a larger program.

Unfinished Business

You may have noticed that some sections of the stats program look like this:

def test():
    """ Testing Docstring"""
    pass

These functions have yet to be completed and aren't necessary for the core functionality of the program. For now they serve as placeholders for features yet to be implemented, but we will come back to finish them later, since they will make for a more complete, correct and friendly program.
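The docstring's ToDo list names two of these features: help and error handling. As one possible direction, here is a sketch of a `main()` function that checks its arguments and handles an unreadable file. The help wording, the `main()` name and the exit codes (0 for success or explicit help, 2 for a usage error, 1 for a file error) are assumptions, not part of the original program.

```python
import sys


def print_help():
    """Print a short usage message (wording is illustrative)."""
    print("Usage: stats.py WORDLIST")
    print("Counts the words in WORDLIST (one word per line), reports the")
    print("longest word length and a breakdown of word lengths.")


def main(argv):
    """Validate arguments and read the word list; return an exit code."""
    if len(argv) == 2 and argv[1] in ("-h", "--help"):
        print_help()
        return 0
    if len(argv) != 2:
        print_help()
        return 2  # usage error: missing or extra arguments
    try:
        with open(argv[1]) as f:
            word_count = sum(1 for _ in f)
    except IOError as err:  # in Python 3, IOError is an alias of OSError
        sys.stderr.write("Cannot read %s: %s\n" % (argv[1], err))
        return 1
    print("Word Count: %d" % word_count)
    return 0
```

In a script, the `__main__` block would call `sys.exit(main(sys.argv))`. In stats.py, get_stats() could then take the checked path as a parameter instead of reading sys.argv itself.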



Friday, August 23, 2013

DFW Pythoneers Meeting August 22, 2013

Ten people showed up at Taco Cabana over the course of last night's monthly Social Meeting. There were some questions about the Scrabble Challenge, so we set up laptops for a short while to poke around some code.

There was some discussion about learning Python at a more intermediate level, so someone suggested using Doug Hellmann's Python Module of the Week to learn the standard Python library. It's free and it's good. Plus, if you'd like a dead-tree copy, he's compiled the postings into a book.

If more people start attending, we can move to a nearby restaurant that has a separate meeting room. That would make conversation easier and give us more space, since Taco Cabana can get noisy and packed.

The next Second Saturday Teaching Meeting is in three weeks, so we need to firm up a location. If you have a facility that can host, let us know. We are looking for room for 30 to 40 people, WiFi and a projector. You get the glory of being a sponsor, and your HR people can make some connections.

If you want to suggest a space for an additional social or project night somewhere other than Frisco, feel free to do so. There are a few people on the west side of the Metroplex who need some attention. I can travel, or continue to host at Gazebo Burger.

If you need a Python programmer for a contract, Ralph is available. Send an email to the mailing list or to me with some contact info.

Tuesday, August 20, 2013

Open Hatch's Scrabble Challenge: The Analysis

Charting letter frequency

Our first step is to analyze the SOWPODS word list used in this challenge. If you read the Wikipedia entry, you will notice that it gives a breakdown of words by number of letters. This is worth noting whether you are a tournament player or just completing this challenge.

Below is a program that recreates this breakdown, although in a slightly different manner. Read through the code and see if you can explain what each part does. Try to determine why we care about the length of the words.

stats.py



#!/usr/bin/env python
# -*- coding: ascii -*-

"""
Sowpods stats
    - counts the words on the list
    - finds the longest word
    - breakdown of word length
 
    ToDo
    - Help
    - Error handling
 
"""
from __future__ import print_function
import string
import sys

class LenCounter:
    def __init__(self):
        self.dict = {}
    def add(self, item):
        count = self.dict.get(item, 0)
        self.dict[item]  = count + 1
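    def counts(self, desc=None):
        """Returns a list of (count, key) tuples sorted by count.
        Pass desc as 1 if you want a descending sort. """
        # Build (count, key) pairs and sort them. Unlike map(None, ...),
        # this works on both Python 2 and Python 3.
        result = sorted((count, key) for key, count in self.dict.items())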
        if desc: result.reverse()
        return result     
 
def get_stats():
    input_file = sys.argv[1]
    word_count = 0
    longest_length = 0
    lc = LenCounter()
 
    # 'with' closes the file automatically, even if an error occurs
    with open(input_file, 'r') as f:
        for line in f:
            word = line.strip()  # remove the trailing newline and spaces
            word_count += 1
            lc.add(len(word))
            if len(word) > longest_length:
                longest_length = len(word)

    print("Word Count: ", word_count)
    print("Longest Word Length: ", longest_length)
    for item in lc.counts():
        print(item)
     
         
def print_help():
    """ Help Docstring"""
    pass


def test():
    """ Testing Docstring"""
    pass

if __name__=='__main__':
    # test()
    get_stats()


The Results


Word Count: 267751
Longest Word Length: 15
(124, 2)
(1292, 3)
(5454, 4)
(5757, 15)
(9116, 14)
(12478, 5)
(13857, 13)
(20297, 12)
(22157, 6)
(27893, 11)
(32909, 7)
(35529, 10)
(40161, 8)
(40727, 9)
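Each tuple is a (count, length) pair, sorted by count because that is how LenCounter.counts() orders them. As a small worked example using only the figures above, this snippet regroups them by word length and totals how many words could possibly fit a 7- or 8-letter rack. The dictionary is simply the results copied by hand.

```python
# Word-length distribution copied from the results above: {length: count}
distribution = {
    2: 124, 3: 1292, 4: 5454, 5: 12478, 6: 22157, 7: 32909, 8: 40161,
    9: 40727, 10: 35529, 11: 27893, 12: 20297, 13: 13857, 14: 9116, 15: 5757,
}

total = sum(distribution.values())  # should match the reported word count
for rack_size in (7, 8):
    # Only words no longer than the rack can be played from it
    usable = sum(n for length, n in distribution.items() if length <= rack_size)
    print("%d letters: %d of %d words (%.1f%%)"
          % (rack_size, usable, total, 100.0 * usable / total))
```

The counts add up to the reported 267,751 words. Only 74,414 words (27.8%) are 7 letters or shorter, and 114,575 (42.8%) are 8 letters or shorter, so the length check alone rules out well over half the list.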




Monday, August 19, 2013

Python Challenge Number 1: Open Hatch's Scrabble Challenge: Intro

Intro to the Intro:

If you are new to Python or new to programming, there are numerous Python tutorials and online courses. But once you get through Learn Python the Hard Way or the official tutorial, what's next?

The best next step may not be another course or tutorial, but a project. Projects may be work projects or personal projects, or, if you haven't found an idea that strikes your fancy, a suggested programming challenge. There are thousands of project and challenge sites out there, so let's narrow the scope and pick a few interesting, educational, fun, but doable ones.

The Challenge:

OpenHatch has a list of Intermediate Python Workshops/Projects on their wiki that suit our requirements. The Scrabble Challenge is the first one we want to attempt.  Scrabble, especially in the form of "Words with Friends", is a popular pastime among many people, including my spouse.

Scrabble has been around for generations in my family, and it has evolved into more interactive forms on Facebook and various smartphones. My spouse usually has several games going with friends, and many of them use "hint" or "cheater" web sites to "broaden their vocabulary" or gain an advantage.

The Scrabble Challenge is to write a "Scrabble cheater": a command-line (CLI) Python program that helps the player find words that can be made from the letters in their rack.
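The core question such a program has to answer is whether a word can be spelled from the tiles in a rack. Here is a minimal sketch using `collections.Counter` (Python 2.7+), assuming uppercase words and no blank tiles. `can_make` is a hypothetical name, not something the challenge defines.

```python
from collections import Counter


def can_make(word, rack):
    """True if every letter in word is available in rack, counting duplicates."""
    needed = Counter(word)
    available = Counter(rack)
    # A missing letter has count 0 in a Counter, so the comparison fails cleanly
    return all(available[letter] >= n for letter, n in needed.items())


rack = "AEINRST"
print(can_make("STAIN", rack))   # True
print(can_make("STAINS", rack))  # False: only one S in the rack
```

Counting letters, rather than just checking membership, is what stops a single S in the rack from being used twice.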

The Requirements:

You need to have a computer with Python 2.6 or greater, a text editor and a copy of the SOWPODS word list that is referenced in the Scrabble Challenge. You should attempt to complete as much of this challenge as you can on your own before resorting to the help of others.

The challenge web site does have some helpful guidelines and hints on how to break the problem down into easier pieces.  I'll post some code this week that may help you analyze the challenge.

Sunday, August 18, 2013

August DFW Pythoneers Meetings

This weekend a good chunk of the crew are at PyTexas 2013, so let's recap the local events so far.

Last Saturday we had the 2nd Saturday Teaching Meeting at ZeOmega. 22 Pythoneers showed up and we had a great time with presentations, networking, snacks and pizza. I ended up leading the beginners group and covered analyzing words in the word list for the Scrabble Solver Challenge.

Special thanks to ZeOmega for hosting on Saturday, and to Bill for the loaner laptop running Linux Mint; it was a lifesaver! If you have some SQL Server reporting skills, ZeOmega is looking for some good analysts.

On Thursday night, 3 coders showed up at Project Night at Gazebo Burger in Frisco. I continued work on the Scrabble Solver Challenge, putting the word list into a SQLite database and calculating word scores, among other things.
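For readers curious about that approach, here is a minimal sketch (not the code from Project Night) that stores words and their scores in an in-memory SQLite database using Python's standard `sqlite3` module. The table layout is an assumption, and the letter values are the standard English Scrabble tile values, which differ from Words with Friends.

```python
import sqlite3

# Standard English Scrabble tile values
LETTER_VALUES = {}
for letters, value in (("AEILNORSTU", 1), ("DG", 2), ("BCMP", 3),
                       ("FHVWY", 4), ("K", 5), ("JX", 8), ("QZ", 10)):
    for letter in letters:
        LETTER_VALUES[letter] = value


def word_score(word):
    """Sum of tile face values; ignores board bonuses and blank tiles."""
    return sum(LETTER_VALUES[letter] for letter in word.upper())


# Hypothetical schema: one row per word with its precomputed score
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, score INTEGER)")
for w in ["QI", "ZAX", "STAIN"]:
    conn.execute("INSERT INTO words VALUES (?, ?)", (w, word_score(w)))

# Highest-scoring words first
for row in conn.execute("SELECT word, score FROM words ORDER BY score DESC"):
    print(row)
```

Precomputing the score once per word lets a later query sort or filter candidates without recalculating it for every rack.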

Next Thursday night is the Social Meeting at Taco Cabana in Addison. No laptop is required, but if you want to talk code or design, feel free.