tag:blogger.com,1999:blog-68042524923409005332024-03-13T23:37:57.252-07:00Once More, With ParityUnknownnoreply@blogger.comBlogger18125tag:blogger.com,1999:blog-6804252492340900533.post-83826853489784741012013-09-19T17:27:00.001-07:002013-09-19T17:27:17.520-07:00Open Hatch's Scrabble Challenge: Let's Score and More<h4>
<b><span style="font-weight: normal;">Scoring</span></b></h4>
So we've discussed and analyzed word length as part of our Scrabble Challenge. Now let's discuss scoring the words in the list. The folks at OpenHatch kindly give us a <a href="https://openhatch.org/wiki/Scrabble_challenge#Resources">dictionary</a> of letters with their values, so the "base" score of a word is the sum of the values of its letters. Let's build a function that uses this dictionary to calculate the score.<br />
<br />
<code class="prettyprint lang-py"><br />
def word_score(input_word):<br />
    # score the word<br />
    # need to account for the blanks<br />
    scores = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2,<br />
              "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3,<br />
              "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1,<br />
              "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4,<br />
              "x": 8, "z": 10}<br />
<br />
    word_score = 0<br />
<br />
    for letter in input_word:<br />
        word_score = word_score + scores[letter]<br />
<br />
    return word_score</code><br />
<br />
In actual play, scoring is more complex. The player has to consider the premium squares, blank tiles and the words that are already in play. These situations are a bit too complex to be calculated in our simple program, but we can assist the player by returning the base score of their possibilities. <br />
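As a rough sketch of that idea, the snippet below ranks a few candidate words by base score. It reuses the letter values from the function above; the <code>rank_by_score</code> helper and the sample words are hypothetical, added only for illustration, and premium squares and blanks are still ignored.<br />

```python
# Letter values copied from the word_score function in this post.
SCORES = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2,
          "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3,
          "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1,
          "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4,
          "x": 8, "z": 10}


def word_score(input_word):
    # Base score: the sum of the letter values (lowercase letters only).
    return sum(SCORES[letter] for letter in input_word)


def rank_by_score(words):
    # Hypothetical helper: highest base score first, ties broken alphabetically.
    return sorted(words, key=lambda w: (-word_score(w), w))


candidates = ["cat", "quiz", "jazz"]  # made-up sample words
for word in rank_by_score(candidates):
    print(word, word_score(word))
```

Here "jazz" (8 + 1 + 10 + 10) comes out ahead of "quiz" (10 + 1 + 1 + 10) and "cat" (3 + 1 + 1).<br />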
<br />
<h4>
<span style="font-weight: normal;">Where to Store the Score?</span></h4>
<br />
We could calculate the scores and store them with the words from the SOWPODS list in a large dictionary, similar to that of the letters and their scores. But this approach doesn't resolve some issues. We also need a place to store the length of the word. A dictionary can store a key/value pair, but we need the ability to sort by more than one value. We could use a list, but instead we are going to use a secret weapon included with Python: <a href="https://sqlite.org/">SQLite</a>.<br />
<br />
Loading the word list, word lengths and scores into a database solves a few problems in this challenge. We only have to load the data and the calculated values once, then use the fruits of the work again and again.<br />
<br />
<code class="prettyprint lang-py"><br />
#!/usr/bin/env python<br />
# -*- coding: ascii -*-<br />
"""<br />
Load the sowpods word list into a sqlite database table<br />
<br />
Note: Rough Prototype<br />
"""<br />
<br />
from __future__ import print_function<br />
import string<br />
import sys<br />
import sqlite3 as sqlite<br />
<br />
def test_for_db():<br />
    # test for existence of sowpods database<br />
    pass<br />
<br />
def test_for_sowpods():<br />
    # test for existence of sowpods text file<br />
    pass<br />
<br />
def word_score(input_word):<br />
    # score the word<br />
    # need to account for the blanks<br />
    scores = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2,<br />
              "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3,<br />
              "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1,<br />
              "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4,<br />
              "x": 8, "z": 10}<br />
<br />
    word_score = 0<br />
<br />
    for letter in input_word:<br />
        word_score = word_score + scores[letter]<br />
<br />
    return word_score<br />
<br />
<br />
def word_list(input_file):<br />
    # create a list of tuples containing the word, its length, its sorted letters and its score<br />
<br />
    sp_list = []<br />
    f = open(input_file, 'r')<br />
<br />
    for line in f:<br />
        sp_word = line.strip().lower()<br />
        sp_list.append((sp_word, len(sp_word), ''.join(sorted(sp_word)), word_score(sp_word)))<br />
<br />
    f.close()<br />
<br />
    return sp_list<br />
<br />
<br />
def load_db(data_list):<br />
<br />
    # create database/connection string/table<br />
    conn = sqlite.connect("sowpods.db")<br />
<br />
    cursor = conn.cursor()<br />
    # create a table <br />
    tb_create = """CREATE TABLE spwords<br />
                (sp_word text, word_len int, word_alpha text, word_score int)<br />
                """<br />
    conn.execute(tb_create)<br />
    conn.commit()<br />
<br />
    # Fill the table<br />
    conn.executemany("insert into spwords(sp_word, word_len, word_alpha, word_score) values (?,?,?,?)", data_list)<br />
    conn.commit()<br />
<br />
    # Print the table contents<br />
    for row in conn.execute("select sp_word, word_len, word_alpha, word_score from spwords"):<br />
        print(row)<br />
<br />
    if conn:<br />
        conn.close()<br />
<br />
def print_help():<br />
    """ Help Docstring"""<br />
    pass<br />
<br />
<br />
def test():<br />
    """ Testing Docstring"""<br />
    pass<br />
<br />
if __name__=='__main__':<br />
    # test()<br />
    sp_file = "sowpods.txt"<br />
    load_db(word_list(sp_file))<br />
<br />
<br />
</code><br />
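Once the table is loaded, the payoff is in the queries. The sketch below is one possible way to use the <code>word_alpha</code> column, on the assumption that it exists to make anagram lookups cheap: sort the rack's letters the same way and match on the result. The <code>exact_anagrams</code> and <code>make_row</code> helpers, the sample words and the in-memory database standing in for sowpods.db are all hypothetical; only the table layout comes from the script above.<br />

```python
import sqlite3

# Letter values copied from the word_score function in this post.
SCORES = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2,
          "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3,
          "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1,
          "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4,
          "x": 8, "z": 10}


def make_row(word):
    # Same column order as spwords: word, length, sorted letters, score.
    return (word, len(word), "".join(sorted(word)),
            sum(SCORES[c] for c in word))


def exact_anagrams(conn, rack):
    # Words that use every letter of the rack, highest score first.
    key = "".join(sorted(rack.lower()))
    query = ("select sp_word, word_score from spwords "
             "where word_alpha = ? order by word_score desc, sp_word")
    return conn.execute(query, (key,)).fetchall()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spwords "
             "(sp_word text, word_len int, word_alpha text, word_score int)")
conn.executemany("insert into spwords values (?,?,?,?)",
                 [make_row(w) for w in ["cat", "act", "quiz", "tack", "at"]])
conn.commit()

print(exact_anagrams(conn, "tca"))
```

This only finds words that use the whole rack; shorter words built from a subset of the letters would need a different query or a post-filter in Python.<br />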
<br />
<br />
<br />
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-8931844477558104462013-09-15T09:08:00.000-07:002013-09-15T09:08:03.412-07:00DFW Pythoneers 2nd Saturday Teaching Meeting, September 14, 2013We had 11 Pythoneers show up for <a href="http://www.meetup.com/dfwpython/events/124805362/">DFW Pythoneers 2nd Saturday Teaching Meeting.</a> A special shout out to Jose and Jim from <a href="http://www.kforce.com/Office-Locations/Dallas-Texas.aspx">KForce Technology</a> for setting us up with a great conference room and refreshments.<br />
<br />
John Zurawski covered the news and interesting projects occurring in the Python world. He also covered his latest entry in the <a href="http://www.ludumdare.com/" target="_blank">Ludum Dare</a> contest and went into detail about his challenges with Python application installers. (John also loaned me his Apple Mini DisplayPort to VGA Adapter, which allowed me to present on the projector. Thanks a million, John!)<br />
<br />
I made a presentation on Python Challenges and covered another step in solving the <a href="https://openhatch.org/wiki/Scrabble_challenge">OpenHatch Scrabble Challenge</a>. <br />
<br />
If you missed the meeting due to the time change, please be aware that the scheduling may vary based on the venue, so double check the time and location for each individual meetup. KForce offered us a great venue in a very central location, so it's probable that we will meet here in the future, and 1:00 PM is the closing time on Saturday. If you have any <b>firm</b> alternative sites, please suggest them to John, Kevin, Jeff or me. We need room for 25 people, WiFi, a projector and restrooms. <br />
<br />
<br />
<br />
<br />
<br />
Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-6804252492340900533.post-82391715503160726452013-08-26T18:07:00.000-07:002013-08-28T19:04:39.490-07:00 Open Hatch's Scrabble Challenge: The Analysis Explained<h4>How long has this been going on?</h4><br />
In the <a href="http://withparity.blogspot.com/2013/08/open-hatchs-scrabble-challenge-analysis.html" target="_blank">last post</a> of this challenge we wrote a program to examine the SOWPODS word list. Why don't we just skip all this statistics nonsense and write the final code that solves the problem? Why do we care about the word length? <br />
<br />
We need to determine the scope of the project so we can choose a proper solution. If the SOWPODS word list had only 100 words, or as many as 7 million, the approach to the solution could be drastically different. The length of a word can also be used to narrow the set of possible matches for a particular rack entered by the user. By analyzing the SOWPODS list, we see that there is a substantial number of longer words, what some call five dollar words. If our user only gives us 7 or 8 letters, we can exclude words that are longer from the set of possible matches and speed up response time. <br />
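To make the length argument concrete, here is a minimal sketch of that pruning step. The length test is the idea described above; the letter check with <code>collections.Counter</code> (which needs Python 2.7 or later), the function names and the tiny word list are assumptions added for illustration, not part of the challenge code.<br />

```python
from collections import Counter


def can_build(word, rack):
    # True when the rack holds every letter of the word, repeats included.
    have = Counter(rack)
    return all(have[ch] >= n for ch, n in Counter(word).items())


def candidates(words, rack):
    # Cheap length test first, so long words never reach the letter check.
    return [w for w in words
            if len(w) <= len(rack) and can_build(w, rack)]


words = ["cat", "act", "tack", "catalog", "at"]  # made-up sample list
print(candidates(words, "tcak"))
```

With a four-letter rack, "catalog" is rejected on length alone, before any letters are compared.<br />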
<br />
Let's take a look at the code and if you have questions about specific functions and commands, check out the official <a href="http://docs.python.org/2/reference/" target="_blank">Python Language Reference</a>. <i>If you are an experienced Python programmer, you may want to skip through the pedantic explanations.</i><br />
<br />
<h4>In the Beginning</h4><br />
<code class="prettyprint lang-py"><br />
#!/usr/bin/env python<br />
# -*- coding: ascii -*-<br />
<br />
"""<br />
Sowpods stats<br />
- counts the words on the list<br />
- finds the longest word<br />
- breakdown of word length<br />
<br />
ToDo<br />
- Help<br />
- Error handling<br />
<br />
"""<br />
</code><br />
If you need some explanation of this section, you can review <a href="http://docs.python.org/2/tutorial/interpreter.html#executable-python-scripts">Executable Python Scripts</a>, <a href="http://docs.python.org/2/tutorial/interpreter.html#source-code-encoding">Source Code Encoding</a> and <a href="http://docs.python.org/2/tutorial/controlflow.html#documentation-strings">Documentation Strings</a>.<br />
<br />
<h4>Back to the Future</h4><code class="prettyprint lang-py"><br />
from __future__ import print_function<br />
import string<br />
import sys<br />
</code><br />
Here we use the print function from Python 3 by using the <a href="http://docs.python.org/2/glossary.html#term-future">__future__</a> module. This allows us to easily port our code to Python 3 if need be. <br />
<br />
<h4>Get Some Class</h4><code class="prettyprint lang-py"><br />
class LenCounter:<br />
    def __init__(self):<br />
        self.dict = {}<br />
    def add(self, item):<br />
        count = self.dict.get(item, 0)<br />
        self.dict[item] = count + 1<br />
    def counts(self, desc=None):<br />
        """Returns a list of (count, key) tuples sorted by count.<br />
        Pass desc as 1 if you want a descending sort. """<br />
        result = map(None, self.dict.values(), self.dict.keys())<br />
        result.sort()<br />
        if desc: result.reverse()<br />
        return result <br />
<br />
</code><br />
Here we have created a LenCounter class to build a dictionary to count words of various lengths. See if you can determine what each method in the class does. <br />
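If you want to check your reading of the class, the sketch below produces the same kind of result with <code>collections.Counter</code> from the standard library. It is an alternative, not the code used in this series: it assumes Python 2.7 or later (Counter is not in 2.6), and <code>length_counts</code> and the sample words are made up for the example. It also avoids <code>map(None, ...)</code>, which only works in Python 2.<br />

```python
from collections import Counter


def length_counts(words, desc=False):
    # Tally how many words have each length, then return
    # (count, length) pairs sorted by count, like LenCounter.counts().
    tally = Counter(len(word) for word in words)
    pairs = [(count, length) for length, count in tally.items()]
    return sorted(pairs, reverse=bool(desc))


sample = ["aa", "ab", "cat", "dog", "eel", "bird"]  # made-up words
print(length_counts(sample))
print(length_counts(sample, desc=True))
```

The sample has one four-letter word, two two-letter words and three three-letter words, so the ascending result starts with the pair (1, 4).<br />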
<br />
<h4>Get the Stats</h4><code class="prettyprint lang-py"><br />
def get_stats():<br />
    input_file = sys.argv[1]<br />
    word_count = 0<br />
    longest_length = 0<br />
    lc = LenCounter()<br />
<br />
    f = open(input_file, 'r')<br />
<br />
    for line in f:<br />
        word_count += 1<br />
        lc.add(len(line.strip()))<br />
        if len(line.strip()) > longest_length:<br />
            longest_length = len(line.strip())<br />
<br />
    print("Word Count: ", word_count)<br />
    print("Longest Word Length: ", longest_length)<br />
    for item in lc.counts():<br />
        print(item)<br />
<br />
    f.close()<br />
<br />
<br />
</code><br />
The heart of the program. We open the SOWPODS file, read in each line, trim the white space, call the LenCounter class, determine the length of the largest string in the list and print the results. <br />
<br />
<h4>The Main Event</h4><code class="prettyprint lang-py"><br />
if __name__=='__main__':<br />
    # test()<br />
    get_stats()<br />
</code><br />
We will be running this program as a standalone utility so this is necessary. Eventually we may modify this and use it as a module in a larger program.<br />
<br />
<h4>Unfinished Business</h4>You may have noticed sections of the stats program are like this:<br />
<code class="prettyprint lang-py"><br />
def test():<br />
    """ Testing Docstring"""<br />
    pass<br />
</code><br />
These functions have yet to be completed and currently aren't necessary to the core functionality of the program. Currently they serve as placeholders for features yet to be implemented. But we will come back to finish them at a later date, since they will make for a more complete, correct and friendly program. <br />
<br />
<br />
<br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-7363753922920601642013-08-23T06:03:00.000-07:002013-08-28T19:02:30.651-07:00DFW Pythoneers Meeting August 22, 2013We had 10 individuals show up at Taco Cabana last night throughout the evening for the <a href="http://www.meetup.com/dfwpython/events/133070472/">Monthly Social meeting</a>. There were some questions about the Scrabble Challenge, so laptops were set up for a short while to poke around some code. <br />
<br />
There was some discussion on learning Python on a more intermediate level, so a suggestion was made to utilize Doug Hellmann's <a href="http://pymotw.com/2/" target="_blank">Python Module of the Week</a> to learn the standard Python library. It's free and it's good. Plus if you like a dead tree copy, he's compiled the postings into a book.<br />
<br />
If we start getting more people to attend, we can move to a nearby restaurant that has a separate meeting room. This would improve communication and space since Taco Cabana can get noisy and packed. <br />
<br />
The next Second Saturday Teaching meeting is in three weeks so we need to firm up a location. If you have a facility that can host, let us know. Room for up to 30 or 40 people, WiFi and a projector are what we are looking for. You get the glory of being a sponsor and your HR people can make some connections. <br />
<br />
If you want to suggest a space for an additional social or project night at a site other than Frisco, feel free to do so. There's a few people on the west side of the MetroPlex that need some attention. I can travel or continue to host at Gazebo Burger. <br />
<br />
If you need a Python programmer for a contract, Ralph's available. Send an email on the <a href="http://www.dfwpython.org/mailman/listinfo/dfwpython" target="_blank">mailing list</a> or to me with some contact info. Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-79735277068962795592013-08-20T19:14:00.000-07:002013-08-28T19:01:59.664-07:00 Open Hatch's Scrabble Challenge: The Analysis<h4>Charting letter frequency</h4>Our first step is to analyze the <a href="https://en.wikipedia.org/wiki/SOWPODS" target="_blank">SOWPODS</a> word list used in this challenge. If you read the Wikipedia entry, you will notice that there is a word distribution by number of letters. This is important to note if you are a tournament player or just completing this challenge.<br />
<br />
Below is a program that will recreate this list, although in a slightly different manner. Read through the code and see if you can explain what each part does. Try to determine why we care about the length of the words.<br />
<br />
<h4>stats.py</h4><br />
<code class="prettyprint lang-py"><br />
#!/usr/bin/env python<br />
# -*- coding: ascii -*-<br />
<br />
"""<br />
Sowpods stats<br />
- counts the words on the list<br />
- finds the longest word<br />
- breakdown of word length<br />
<br />
ToDo<br />
- Help<br />
- Error handling<br />
<br />
"""<br />
from __future__ import print_function<br />
import string<br />
import sys<br />
<br />
class LenCounter:<br />
    def __init__(self):<br />
        self.dict = {}<br />
    def add(self, item):<br />
        count = self.dict.get(item, 0)<br />
        self.dict[item] = count + 1<br />
    def counts(self, desc=None):<br />
        """Returns a list of (count, key) tuples sorted by count.<br />
        Pass desc as 1 if you want a descending sort. """<br />
        result = map(None, self.dict.values(), self.dict.keys())<br />
        result.sort()<br />
        if desc: result.reverse()<br />
        return result <br />
<br />
def get_stats():<br />
    input_file = sys.argv[1]<br />
    word_count = 0<br />
    longest_length = 0<br />
    lc = LenCounter()<br />
<br />
    f = open(input_file, 'r')<br />
<br />
    for line in f:<br />
        word_count += 1<br />
        lc.add(len(line.strip()))<br />
        if len(line.strip()) > longest_length:<br />
            longest_length = len(line.strip())<br />
<br />
    print("Word Count: ", word_count)<br />
    print("Longest Word Length: ", longest_length)<br />
    for item in lc.counts():<br />
        print(item)<br />
<br />
    f.close()<br />
<br />
<br />
def print_help():<br />
    """ Help Docstring"""<br />
    pass<br />
<br />
<br />
def test():<br />
    """ Testing Docstring"""<br />
    pass<br />
<br />
if __name__=='__main__':<br />
    # test()<br />
    get_stats()<br />
</code><br />
<br />
<h4>The Results</h4><code class="prettyprint"><br />
Word Count: 267751<br />
Longest Word Length: 15<br />
(124, 2)<br />
(1292, 3)<br />
(5454, 4)<br />
(5757, 15)<br />
(9116, 14)<br />
(12478, 5)<br />
(13857, 13)<br />
(20297, 12)<br />
(22157, 6)<br />
(27893, 11)<br />
(32909, 7)<br />
(35529, 10)<br />
(40161, 8)<br />
(40727, 9)<br />
</code><br />
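The output above is sorted by count, which makes it a little hard to read by word length. As a small sketch, the snippet below takes the (count, length) pairs exactly as printed and reorders them by length; the <code>words_up_to</code> helper is a hypothetical addition that shows how many words survive if we only keep those of a given length or shorter.<br />

```python
# (count, length) pairs exactly as printed by stats.py above.
results = [(124, 2), (1292, 3), (5454, 4), (5757, 15), (9116, 14),
           (12478, 5), (13857, 13), (20297, 12), (22157, 6),
           (27893, 11), (32909, 7), (35529, 10), (40161, 8),
           (40727, 9)]

# Flip each pair so the list sorts by word length instead of count.
by_length = sorted((length, count) for count, length in results)

total = sum(count for count, _ in results)


def words_up_to(max_len):
    # Hypothetical helper: how many words have max_len letters or fewer.
    return sum(count for count, length in results if length <= max_len)


for length, count in by_length:
    print(length, count)
print("Total:", total)
print("8 letters or fewer:", words_up_to(8))
```

The counts add up to the reported word count of 267751, and 114575 of those words have 8 letters or fewer, so more than half of the list can be skipped for a typical rack.<br />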
<br />
<br />
<br />
</code>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-6804252492340900533.post-66616092160360481452013-08-19T11:35:00.002-07:002013-08-28T19:01:44.178-07:00Python Challenge Number 1: Open Hatch's Scrabble Challenge: Intro<h4>Intro to the Intro: </h4> If you are new to Python or new to programming, there's numerous Python tutorials and online courses. But once you get through <a href="http://learnpythonthehardway.org/" target="_blank">Learn Python the Hard Way</a> or the <a href="http://docs.python.org/2/tutorial/" target="_blank">official tutorial</a>, what's next?<br />
<br />
The best way to learn may not be another course or tutorial, but a project. Projects may be work projects or personal projects, or if you haven't found an idea that strikes your fancy, a suggested programming challenge. There's thousands of projects and challenge sites out there, so let's narrow the scope down and pick a few interesting, educational, fun, but doable ones.<br />
<br />
<h4>The Challenge:</h4>OpenHatch has a list of <a href="https://openhatch.org/wiki/Intermediate_Python_Workshop/Projects" target="_blank">Intermediate Python Workshops/Projects</a> on their wiki that suit our requirements. The <a href="https://openhatch.org/wiki/Scrabble_challenge" target="_blank">Scrabble Challenge</a> is the first one we want to attempt. Scrabble, especially in the form of "Words with Friends", is a popular pastime among many people, including my spouse.<br />
<br />
<a href="http://en.wikipedia.org/wiki/Scrabble" target="_blank"> Scrabble</a> has been around for generations in my family and has evolved in a few forms to be a more interactive game on Facebook and various smart phones. My spouse usually has several games or more ongoing with friends and many of them use "hint" or "cheater" web sites to "broaden their vocabulary" or gain an advantage.<br />
<br />
The Scrabble challenge is to make a "Scrabble cheater": a CLI Python program that helps the player find words in their letter rack.<br />
<br />
<h4>The Requirements:</h4>You need to have a computer with Python 2.6 or greater, a text editor and a copy of the <a href="http://en.wikipedia.org/wiki/SOWPODS" target="_blank">SOWPODS</a> word list that is referenced in the <a href="https://openhatch.org/wiki/Scrabble_challenge" target="_blank">Scrabble Challenge</a>. You should attempt to complete as much of this challenge as you can on your own before resorting to the help of others.<br />
<br />
The challenge web site does have some helpful guidelines and hints on how to break the problem down into easier pieces. I'll post some code this week that may help you analyze the challenge. <br />
<br />
<h1 class="title"></h1>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-23491986594296848672013-08-18T06:14:00.003-07:002013-08-28T19:01:11.727-07:00August DFW Pythoneers Meetings This weekend a good chunk of the crew are at <a href="http://www.pytexas.org/2013/" target="_blank">PyTexas 2013</a>, so let's recap the local events so far. <br />
<br />
Last Saturday we had the <a href="http://www.meetup.com/dfwpython/events/131291582/" target="_blank">2nd Saturday Teaching Meeting</a> at ZeOmega. 22 Pythoneers showed up and we had a great time with presentations, networking, snacks and pizza. I ended up leading the beginners group and covered analyzing words in the word list for the <a href="https://openhatch.org/wiki/Scrabble_challenge" target="_blank">Scrabble Solver Challenge</a>. <br />
<br />
Special thanks to <a href="http://www.zeomega.com/" target="_blank">ZeOmega</a> for hosting Saturday. Also to Bill for the loaner laptop with Mint Linux; it was a lifesaver! If you have some SQL Server reporting skills, ZeOmega is looking for some <a href="http://www.zeomega.com/zeomega-disease-care-and-utilization-management-software/careers-at-zeomega/friscou-s-careers/" target="_blank">good analysts</a>. <br />
<br />
Thursday night 3 coders showed up at <a href="http://www.meetup.com/dfwpython/events/134005182/" target="_blank">Project Night</a> at Gazebo Burger in Frisco. I continued work on the Scrabble Solver Challenge, putting the word list in a SQLite database and calculating the word score, among other functions. <br />
<br />
Next Thursday night is <a href="http://www.meetup.com/dfwpython/events/133070472/" target="_blank">Social Meeting </a>at Taco Cabana in Addison. No laptop required, but if you want to talk code or design, feel free.<br />
<br />
<br />
<br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-20949404073757024132013-07-15T19:45:00.001-07:002013-07-15T19:45:23.416-07:00DFW Pythoneers 2nd Saturday Teaching Meeting, July 13, 201318 <a href="http://www.meetup.com/dfwpython/">Pythoneers</a> showed up at <a href="http://collidecenter.com/">The Collide Center</a> in McKinney. John Zurawski and Joseph Weaver found the location, and John was the gracious host and leader for the meeting. The venue was great, but unfortunately for us the meeting space will become more space for startups afterwards. <br />
<br />
<h3>
Upcoming Events</h3>
Kevin Horn announced that <a href="http://www.pytexas.org/2013/">PyTexas 2013</a> is August 16-18 in College Station. Early registration ends July 16th. Friday is oriented towards tutorials and training. See the web site for details.<br />
<br />
July 18th is Project Night at <a href="http://www.gazeeboburgers.com/locations.html#3">Gazeebo Burgers</a> in Frisco from 6:30 to 8:30. There's a separate meeting room that I'll request and the WiFi is usually good. Topic is pandas and data analysis, so show up prepped with software loaded and data to rip. If you are having difficulty loading all the required packages, consider loading the <a href="https://store.continuum.io/cshop/anaconda/">Anaconda</a> distribution from <a href="http://continuum.io/">Continuum Analytics</a>.<br />
<br />
July 20th is <a href="http://www.flightmuseum.com/event/moon-day-2/">Moon Day</a> at Frontiers Of Flight Museum at Love Field in Dallas. Our buddies at <a href="http://www.dprg.org/">DPRG</a> will be demoing various robots, plus there will be other cool displays. Moon Day is my favorite unofficial holiday, so even if you can't attend, pause for a moment and realize what an awesome achievement the Apollo program was... <br />
<br />
July 25 will be the normal casual <a href="http://www.meetup.com/dfwpython/events/123733962/" target="_blank">meeting</a> at Taco Cabana in Addison. This is an informal get together to chat and network with other Pythoneers and techies. If you are seeking a solution for a problem, ask around and chances are someone can help you. Otherwise, just geek out and enjoy the company. <br />
<br />
<h3>
Group Discussions</h3>
One of the main topics of discussion revolved around meeting locations and meeting content.<br />
<br />
<h4>
Meeting Space</h4>
In the past the group has had corporate sponsors who had meeting facilities. Currently we need locations for Saturday teaching meetings and alternative sites for the 4th Thursday informal meetings. I can lead the 3rd Thursday Project nights at Gazeebo Burgers.<br />
<br />
An ideal teaching location would be central, easy to find, with a room for 30 or 40 people, WiFi, power outlets and restrooms. We also need a slightly larger venue than Taco Cabana for the casual meeting since the largest table there is about eight seats.<br />
<br />
<h4>
Meeting Topics</h4>
There was an active discussion about Topics, Teaching, Presentation and Projects/Challenges. John proposed a structure that works for starting the meeting, and the group discussed various ideas for the "meat" of the meeting. It was recognized that people into Python have different needs, skill levels and interests. Some of the various topics brought forth:<br />
<br />
<pre>Web Frameworks
- Flask
- Django
- Idea: Framework Shootout - Simple web app spec; write it in diff frameworks
Related: find a way to allow beginners to work on subject area before talk
- Database?
- ORM's
- REST API's in Python
- Network Programming
- Twisted / Tornado
- Scientific Computing
- Pandas
- Game Programming
- IPython
- IPython Notebook
- Python eco-system/community
- Best Practices
- PEP8</pre>
<pre> </pre>
<h4>
Sharknado</h4>
The breakout presentation of the meeting was John Zurawski's pixel accurate clone of Sharknado done in <a href="http://cocos2d.org/" target="_blank">cocos2d</a>. Most awesome use of Python, ever!<br />
<br />
<h4>
Challenges </h4>
It was observed that programmers don't learn unless they have projects or challenges. So I'm borrowing a Python <a href="https://openhatch.org/wiki/Scrabble_challenge" target="_blank">challenge</a> from elsewhere and will be working on this particular project in my spare time in the next couple of months. Let's swap notes and review code at a meeting in the future if you are interested. :)<br />
<br />
<br />
<br />Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-6804252492340900533.post-34252310729840012022013-07-01T03:48:00.002-07:002013-07-15T10:52:47.389-07:00DFW Pythoneers Meeting June 27, 201313 Python enthusiasts showed up at the Taco Cabana in Addison for the monthly meeting. There was some confusion about the meeting location and some participants were disappointed with the lack of formality. Overall most of the attendees enjoyed themselves and there was much discussion about Python and related topics. This was the largest group that has met up for the Thursday night meetings. <br />
<br />
The good news: We may have found a space for the upcoming 2nd Saturday Teaching Meeting. Joseph Weaver mentioned that John Zurawski had found a possible meeting space in McKinney. John got in contact with me and will try to contact some of the more veteran members and leaders of the group. From what I've heard of the location, it will be good for the larger teaching meetings. <br />
<br />
The challenge: With attendance increasing, it is difficult to hold more than the most casual of meetings at Taco Cabana. I spoke with Jay, who said that this meeting has rarely been larger than 4 to 6 participants. If you wish to have a larger space that is more conducive to larger meetings, we're open to suggestions. A key thing to consider is a central location, since we attract folks from 20 to 40 miles away. Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-37380484814105561472013-06-26T07:32:00.001-07:002013-06-26T07:32:48.898-07:00Another Iteration...Sometimes you have to throw a small script together to fix an issue. When you deal with third party data that's manually generated, sometimes you have to take what they give you. <br />
<br />
I had to determine the starting positions of the fields in a fixed record length text file from the positions of the double quotes in the file. After some research I decided to use a regular expression with an iterator. <br />
<br />
<code><br />
import re<br />
<br />
text = 'Some Long" Record with " lots of '<br />
pattern = re.compile('"')<br />
print([m.start() for m in pattern.finditer(text)])<br />
<br />
</code>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-88356705936256859012013-06-11T18:46:00.003-07:002013-06-11T18:50:32.660-07:00DFW Pythoneers Meeting June 8, 201316 Python Enthusiasts showed up at the Gravity Center for the meeting. This will be the last meeting at the Gravity Center; a new meeting space will need to be found.<br />
<br />
We discussed Python and other resources for young programmers. John discussed Cocos2d, Greg demoed Python koans, and Jeremy showed how to install Pelican and deploy a static site on Amazon S3. Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-19191275229591429802013-05-12T14:02:00.000-07:002013-05-12T14:02:16.341-07:00DFW Pythoneers Meeting May 14, 2013<i>I like to talk about Python. - Kevin Horn</i><br />
<br />
11 Python Enthusiasts showed up at the Gravity Center for the <a href="http://www.meetup.com/dfwpython/events/115948392/?_af_eid=115948392&a=uc1_te&_af=event">meeting.</a> After brief introductions, Kevin was gracious enough to take charge of the presentation and demo <a href="http://www.virtualenv.org/en/latest/#">virtualenv</a> and <a href="http://www.pip-installer.org/en/latest/">pip</a>. <br />
<br />
Kevin also discussed various package repositories, <a href="http://www.lfd.uci.edu/~gohlke/pythonlibs/">Unofficial Windows Binaries for Python Extension Packages</a>, scipy, numpy, ipython, mingw and <a href="http://nuget.org/packages/chocolatey">Chocolatey</a> for Windows.<br />
<br />
Mention was made of the forthcoming PyTexas conference in August and Pyvideo.org, where various talks and tutorials are available for viewing. <br />
<br />
The main focus of the second half of the meeting was data, databases and Python. Topics of discussion were: CSV, json, pickle, dbapi, SQLite, SQLAlchemy, sqlautocode and <a href="https://bitbucket.org/zzzeek/alembic">Alembic</a>.<br />
<br />
After the meeting a group of us went to Cafe Brazil for food and further discussion. <br />
<br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-33598943713749670512013-03-12T05:50:00.000-07:002013-03-22T13:13:05.219-07:00Netflix and PythonThere's a nice post on their technical blog about how <a href="http://techblog.netflix.com/2013/03/python-at-netflix.html" target="_blank">Netflix uses Python</a>.<br />
<br />
I found this section quite interesting:<br />
<br />
<blockquote class="tr_bq"><span style="font-family: Arial; font-size: 15px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;">Data Science and Engineering</span><br />
<span style="font-family: Arial; font-size: 15px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"></span><br />
<span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;">Our Data Science and Engineering teams rely heavily on Python to help surface insights from the vast quantities of data produced by the organization. Python is used in tools for monitoring data quality, managing data movement and syncing, expressing business logic inside our ETL workflows, and running various web applications to visualize data. </span><br />
<span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;"></span><br />
<span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;">One such application is Sting, a lightweight RESTful web service that slices, dices, and produces visualizations of large in-memory datasets. Our data science teams use Sting to analyze and iterate against the results of Hive queries on our big data platform. While a Hive query may take hours to complete, once the initial dataset is loaded in Sting, additional iterations using OLAP style operations enjoy sub-second response times. Datasets can be set to periodically refresh, so results are kept fresh and up to date. Sting is written entirely in Python, making heavy use of libraries such as pandas and numpy to perform fast filtering and aggregation operations.</span></blockquote><br />
<br />
Here's the video from PyCon 2013: <a href="http://pyvideo.org/video/1743/python-at-netflix">http://pyvideo.org/video/1743/python-at-netflix</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-33146449535067382752013-03-11T11:12:00.004-07:002013-05-12T14:12:23.966-07:00Text processing: The Bottom LineYou get all types of data formats when you deal with clients and financial data. Some send you nicely delimited text files with a current data dictionary. Some send Excel files that look like the intern's preschooler designed them. But sometimes you end up with a report consisting of pages of fixed-width text designed to be printed off on the green-bar paper printer by the office AS/400.<br />
<br />
If you need assistance in parsing text files, you can use commercial applications designed to handle the job like <a href="http://www.datawatch.com/dw-information-optimization-suite/datawatch-monarch-professional" target="_blank">Monarch</a>. There are also many tools and utilities designed to view and parse text files. Both <a href="http://www.hanselman.com/blog/ScottHanselmans2011UltimateDeveloperAndPowerUsersToolListForWindows.aspx" target="_blank">Scott Hanselman</a> and <a href="https://www.simple-talk.com/sql/database-administration/setting-up-a-data-science-laboratory/" target="_blank">Buck Woody</a> have detailed lists that you should peruse and explore. <br />
But let's use our skills and tackle the problem programmatically. <br />
<br />
The nice thing about many of the fixed-width text reports is that they are very consistent in layout and organization, making them easy to parse. If they are generated from an accounting system that includes the GL (General Ledger) account number on each row, then you probably have the key to pulling out the information needed on a periodic basis. Let's see a small <a href="http://www.treasurydirect.gov/govt/reports/tbp/tbp_1301.txt" target="_blank">example.</a><br />
<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0xyWA-dLKcbla-HyZz3Fo-bLS21g9OYROozCC7uMoBiBZ76g2rIwxOe9Aq9Ocx6Nt7xaR0O2Fim6Mf0DDVpFz_qSOOySUqTXaCxLhIpfZ2RQX0mRyve88D-LyNTKXiPRAj7FYtVEqIYUW/s1600/gl_example.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img alt="GL Example" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0xyWA-dLKcbla-HyZz3Fo-bLS21g9OYROozCC7uMoBiBZ76g2rIwxOe9Aq9Ocx6Nt7xaR0O2Fim6Mf0DDVpFz_qSOOySUqTXaCxLhIpfZ2RQX0mRyve88D-LyNTKXiPRAj7FYtVEqIYUW/s1600/gl_example.gif" height="175" title="" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Federal Borrowings Program </td></tr>
</tbody></table>Notice the layout is very regimented with nicely formatted columns, descriptive headers and unique account numbers. The normal way a novice handles this type of file is to hand-edit it and then try to clean up the result in Excel. (Shudder!) This report's organization makes it easy to write a simple utility to rip the needed values. Even if the file is in a printable "report" format with headers on each page, it's a simple task to ignore these rows by focusing on the ACCOUNT column.<br />
<br />
Sometimes you don't need every row: rather than loading the data back into a database, you just want to pull out specific totals and subtotals. It's easy enough to feed a list of account numbers or GL items to a routine, along with the positions and widths of the account/item column and of the balance column. You then end up with a dictionary (<a href="http://docs.python.org/2/tutorial/datastructures.html#dictionaries" target="_blank">Python</a>, <a href="http://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.90).aspx" target="_blank">C#</a>), a data structure that you can reference for calculations or export/return to be handled by another process. The process is something like this:<br />
<br />
<ol><li>Pass file name and list of items to a routine</li>
<li>Create a dictionary structure with the list of items as the key values</li>
<li>Read in each line of the file, looking for matching keys, (using position and width)</li>
<li>If match found, populate the value for the matching key, (using position and width)</li>
<li>Continue till done with file.</li>
<li>Export dictionary to files, do calculations, or whatever. </li>
</ol>Note: If you intend to do calculations on the values and wish to use them as numeric values, you will need to convert the text to numeric. This means you will probably have to clean up the currency characters and thousands separators. Easy to do in Python, but sometimes tricky in C#. In the case of C#, include: <span style="color: blue;">using</span> <a href="http://msdn.microsoft.com/en-us/library/system.globalization(v=vs.90).aspx" target="_blank">System.Globalization</a>;<br />
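<br />
For the Python route, here is a minimal sketch of the steps above. It is only an illustration: the function names (clean_amount, extract_balances), the column offsets you pass in, and the assumption that negative balances are printed either in parentheses or with a trailing minus sign are all made up for the example, so adjust them to match your own report.<br />

```python
from decimal import Decimal


def clean_amount(text):
    """Turn a printed balance such as '$1,234.56' into a Decimal.

    Assumption: negatives appear as '(1,234.56)' or '1,234.56-'.
    Returns None when the field is blank.
    """
    # Drop surrounding padding, currency symbols and thousands separators.
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    negative = False
    if cleaned.startswith("(") and cleaned.endswith(")"):
        negative, cleaned = True, cleaned[1:-1]
    elif cleaned.endswith("-"):
        negative, cleaned = True, cleaned[:-1]
    value = Decimal(cleaned)
    return -value if negative else value


def extract_balances(lines, accounts, key_pos, key_width, val_pos, val_width):
    """Collect the balance for each requested account from fixed-width lines.

    lines    -- any iterable of text lines (an open file works)
    accounts -- account numbers / GL items to look for
    key_pos, key_width -- zero-based start and width of the account column
    val_pos, val_width -- zero-based start and width of the balance column
    """
    # Step 2: the requested items become the dictionary keys.
    balances = dict.fromkeys(accounts)
    # Steps 3-5: scan every line; page headers simply never match a key.
    for line in lines:
        key = line[key_pos:key_pos + key_width].strip()
        if key in balances:
            balances[key] = clean_amount(line[val_pos:val_pos + val_width])
    return balances
```

Since the routine accepts any iterable of lines, an open file can be passed straight in. Accounts that never show up in the report keep a value of None, which makes missing rows easy to spot before you export or calculate anything. On the C# side, start with the System.Globalization include noted above.<br />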
<br />
Then use the following method: <br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoiKsAwVx95pYTDJ-pAcfqVkoBqOMWZ6cqUJJB9QuEFfUSmxsl3f3fSy2bqdN3KlFbkNPJ8jczk8mqKo-J2WJ4DnNAfxFTJjvXyV31xpJ63R7rMdoBqHThSpR8HnutkMsnUMMaAL72l3la/s1600/g2_example.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img alt="public static decimal getFinancials" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoiKsAwVx95pYTDJ-pAcfqVkoBqOMWZ6cqUJJB9QuEFfUSmxsl3f3fSy2bqdN3KlFbkNPJ8jczk8mqKo-J2WJ4DnNAfxFTJjvXyV31xpJ63R7rMdoBqHThSpR8HnutkMsnUMMaAL72l3la/s1600/g2_example.gif" height="70" title="public static decimal getFinancials" width="640" /></a></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-59931334557401942282013-02-25T07:58:00.001-08:002013-03-12T10:00:50.322-07:00Keep it CleanAs a developer, I was paid the ultimate compliment by a coworker last week.<br />
<br />
"Hey, that PowerShell script you wrote is really clean." he said in passing. <br />
<br />
This was a script that I had ported from another language, then tweaked and forwarded to him to help manage some server resources. It was an ugly hack originally intended to solve a personal need, but I refactored it to make it simpler, more modular and more readable, and added a few key comments. <br />
<br />
The main benefit of clean code is that it's easy to come back months later and modify it to suit your needs without having to do some sort of digital archaeology. But the best outcome is when others can use it and maintain it without any additional assistance from yourself.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-19596809116808894662013-02-25T06:33:00.000-08:002013-03-12T10:01:06.272-07:00Would you like some Data with that?<h4><span style="font-size: large;"><i> <span style="font-size: small;"><span class="st"><i>Water</i>, <i>water</i>, <i>everywhere</i>, Nor any <i>drop to drink</i>.</span></span></i></span> </h4>One thing that has always bugged me about many of the talks and conferences I go to is the lack of good real world datasets and examples. There is the ubiquitous use of Adventureworks, which does fine for many demonstrations. Or the session based on the presenter's experience with his employer's assets, which are not accessible to the audience to view or play with. And there's the MVP speech with the sports statistics and the matching ball cap discussing ERA or passer efficiency rating to audiences from other cultures that follow different sports. And there are vendors that offer tools to generate sanitized datasets. If you need small or large datasets, hopefully you don't always resort to these fall-backs, since there are terabytes of interesting public data available on the Internet.<br />
<br />
Open or "public" data, as it is called, has been around for years. Before the WWW was in the public spotlight, you could order various data sets and source code on physical media from vendors. Two decades later, with the acceptance of the Internet and the increase of bandwidth, there's a plethora of sources of a huge variety of data sets available. One good stopping point for an overview is <a href="https://explore.data.gov/" target="_blank">Data.Gov</a>, an aggregate of Open Federal Data sources and tools.<br />
<br />
Before you dive in and start grabbing collections of miscellaneous agricultural and health care stats from online sources, you need to have an idea of what type, quality and quantity of data set you are seeking. It's probably better to pick a domain in which you have some understanding and experience. And it doesn't hurt to select a data set that may solve a personal itch or business problem.<br />
<h4><i><span style="font-size: small;">What's the Frequency, Kenneth?</span> </i></h4>One of my favourite online databases to pull from is the FCC ULS database. The <a href="http://transition.fcc.gov/" target="_blank">FCC</a> (Federal Communications Commission) is responsible for managing RF (radio waves) and other communications in the United States. The <a href="http://wireless.fcc.gov/uls/index.htm?job=about" target="_blank">ULS</a> (Universal Licensing System) is a system to keep track of licenses, frequency allocations and other business related to the FCC. As an amateur radio license holder, it's fun to keep track of my license and those of several hundred thousand other "ham" license holders. As a database professional, it's an open, well documented source of real world addresses with which to test skills, geocoding and CASS certification. So let's grab the <i><span class="text-black-small"><a href="http://wireless.fcc.gov/uls/data/complete/l_amat.zip" target="_blank">Amateur Radio Service License</a> </span></i><span class="text-black-small">database</span><i><span class="text-black-small">.</span></i><br />
<br />
The license database (l_amat.zip) is an archive over 400 MB in size when expanded, so make sure you have the resources to handle it. Once you have the data extracted, it's time to take inventory and break out the tool kit.<br />
<h3></h3>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-3342584256945551472013-02-25T06:28:00.003-08:002013-03-12T10:01:19.235-07:00Pragmatism vs Partisanship<br />
<em><strong>You ate Chinese food, so obviously you must hate Europeans...</strong></em> <br />
<br />
Sounds silly, doesn't it? So was the type of reaction I got from a data professional when I showed him a new book on <a href="http://oreilly.com/shop/product/0636920023784.html" target="_blank">data analysis</a> that I was excited to add to my library. The software language didn't match his worldview or career investment, so I was labeled a "Microsoft basher". Which is silly, since we were at an event for users of Microsoft software, I was using a Windows phone, and two out of the three operating systems I was running on my laptop were Windows 7 and Windows Server 2012. And I spent much of the time taking notes in OneNote and discussing PowerShell 3 and SQL Server 12 with my cohort. <br />
<br />
And the irony of the situation is that Microsoft and many of its employees and advocates recognize that not all the great tools and goodness flow from the mother-ship in Redmond. <a href="http://blogs.msdn.com/b/buckwoody/" target="_blank">Buck Woody</a>, an author and well known Microsoft database and Azure evangelist, recommends installing OSS text-handling utilities when setting up your <a href="https://www.simple-talk.com/sql/database-administration/setting-up-a-data-science-laboratory/" target="_blank">Data Science Laboratory.</a> Another well known Microsoft technologist, <a href="http://www.hanselman.com/blog/" target="_blank">Scott Hanselman</a>, suggests many third party tools and has a recent post discussing GitHub and line endings. With the existence of CodePlex, the inclusion of Git support in Visual Studio and the offering of Linux VMs on Azure, Microsoft is becoming more pragmatic and inclusive in regards to OSS. <br />
<br />
And OSS has been garnering growing commercial support. Red Hat has been making money for years. VMware supports both commercial and OSS hosts and guests. Some of the projects on CodePlex get adopted by commercial companies. And the data analysis tools featured in the book that was the seed of this post have commercial support from a company, <a href="http://continuum.io/index.html" target="_blank">Continuum Analytics,</a> which just received a grant from DARPA to further develop their tools. <br />
<br />
So, while disappointed in the reaction I received from this individual, I still respect him and hope to demonstrate the power of using both OSS and Microsoft tools together to tackle some tough data problems. <br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-6804252492340900533.post-82961653726714351902013-01-03T11:21:00.000-08:002013-06-26T07:35:11.068-07:00Cleaning House, Shifting GearsI've always practiced safe internet. But over the holidays I ended up spending more time reinstalling software than writing code on my laptop. No matter how careful you are in browsing and locking down your environment, there's a punk, a crook or a TLA that has the kryptonite to your defenses. The vendors can't keep up with the vandals. The out-of-band security updates from Redmond and other vendors came out <em>one</em> day after my firewall reported strange outbound traffic and blocked it. <br />
<br />
This is something I anticipated would eventually happen. So I had backups and developed a plan of action. Since the laptop had to be zapped, I decided to remodel the contents. Instead of virtualizing Linux on a Windows 7 host, I reversed the roles. This allowed me to create an optimized Windows 7 image that can be used for various purposes. I already had Server 2008 R2, Server 2012 and Windows 8 in VMs.<br />
<br />
Why Linux? I debated installing Windows 8 as the host for a nanosecond, but I really don't feel any love for that mess of a UI. The tile interface I like on my phone doesn't work for me on my production desktop. Add the fact that Dell's recovery partition didn't recognize the hardware it's supposed to recover, while Linux does. But switching host systems allowed me to play with VMs and set up the tools for my latest digital explorations.<br />
<br />
Back in the '90s I used to play with all the OSes on the block. At one time I was using seven distinct operating systems as part of my job. At the house we had even more esoteric hardware and software. When it came along, Linux was a bear to set up and configure. Fast forward 15 years, a couple of generations of geeks and users, and billions in corporate contributions later, and Linux and much of the OSS universe have been refined and polished. <br />
<br />
Along with Linux, Android, the BSDs and their ilk, many of the programming and data tools have matured. The most interesting code comes from the scientific data community. And some of the best packages work with my favorite programming language: Python. Unknownnoreply@blogger.com0Dallas-Fort Worth Metroplex, TX, USA32.8334428466495 -96.76757812531.978106346649497 -98.058471625 33.6887793466495 -95.476684625