
Python

One of the downsides of working on a pre-startup project is that you really can’t say much about it. Seriously. You think Cryptonomicon seemed paranoid about security? You’ve just never met the folks who safeguard possible IP for college spin-outs. Yowza. And it’s a shame because some of this stuff is really rather nifty, and it’s been good to not only do some high-level design of low-level stuff, but also to get back to implementing in C and for high-capacity stuff as well.

However, side projects are totally fair game 😀

At the moment, most of my side-project time has gone into a quick script for the rifle club in college. It has to read in a text file and do some basic statistics on the data therein. PHP would blaze through this in a web setting, but to my mind, PHP is out of its depth when not running on a webserver, so I thought something else would be more appropriate. Perl is certainly up to the task, as is Ruby, and I’ve been wanting to learn Ruby for a while. But some upcoming PhD stuff requires me to know Python, so I figured this would be a good starting point for it: apt-get install python and away I went.

First, the problem. There’s a program my college rifle club uses called Kada, which tracks members’ names, details and scores and produces charts of the club’s “ladders”, a sort of running competition which tracks shooters’ performances over the year (there’s a small prize at the end). The problem isn’t Kada itself – it is pretty much a textbook case of how you develop a user interface, even if it is character-mode. Keith, who developed Kada, spent ages refining the user interface by actually using it, talking to other range officers in the club who used it as well, and making changes and tweaks. The UI isn’t fancy and mousable, but it’s slick and efficient, and all the edge cases for using it are handled. That’s the kind of really solid work that takes time and effort to do and frankly, it’s always impressed me. Keith also did some work on a program to run competitions in the statistics office back in the days when we scored targets by hand. It had a very slick and efficient UI too, but it also did some smart things – it let us project scores on a projector long before we had electronic targets or target scoring machines, and it estimated what a shooter’s likely final score was as targets were scored.

The problem is that Keith’s not been actively developing Kada for a while and the current committee in the club wanted to make some slight changes to how the ladders were calculated. So I volunteered to take Kada’s data files, figure out how to extract the data entered with Kada and produce the ladder charts. The data files are text with a straightforward format, so reading in the data is simple enough. First, read the members data file:
[cc escaped="true" lang="python"]
def readMembers():
    membersfile = open("members.kda", "r")
    # Skip the two header lines
    membersfile.readline()
    membersfile.readline()

    i = 1
    firstname = ' '
    while firstname:
        shooter = Shooter()

        #    members.kda file record format:
        #        surname
        #        firstname
        #        gender (m/f)
        #        alias (not usually used)
        #        student ID card seen? (y/n)
        #        experience level (three letters, Novice/Experienced/Advanced, in Target, Air, Sporter)
        #        course
        #        year
        #        unknown

        surname = membersfile.readline().rstrip()
        firstname = membersfile.readline().rstrip()
        shooter.name = firstname + ' ' + surname
        shooter.gender = membersfile.readline().rstrip()
        shooter.alias = membersfile.readline()
        shooter.idcardseen = membersfile.readline().rstrip()
        shooter.experience = list(membersfile.readline().rstrip())
        shooter.course = membersfile.readline().rstrip()
        shooter.year = membersfile.readline().rstrip()
        membersfile.readline()

        member[i] = shooter
        i += 1

    membersfile.close()[/cc]
This just reads in the file and creates a dictionary of Shooter objects (they’re not really an object as such, I’m just emulating a C struct sort of idea there), which is actually a global in the script. We discard the initial two lines in the file (I don’t actually know what Keith stores there apart from the season – 2007/8 or whatever). The readline() approach is a bit awkward and not very Pythonic methinks.
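For comparison, here is a minimal sketch of a less readline()-heavy way to read fixed-size records, using itertools.islice to pull nine lines at a time. It is written for Python 3 rather than the Python 2 used in this post, it stores each record as a plain dictionary instead of a Shooter, and the function name, the FIELDS tuple and the sample data are all invented for illustration. Unlike the loop above, it stops before storing the empty record at end of file.

```python
import io
from itertools import islice

# Field order taken from the record-format comment in readMembers();
# the names themselves are hypothetical.
FIELDS = ("surname", "firstname", "gender", "alias", "idcardseen",
          "experience", "course", "year", "unknown")


def read_members(lines):
    """Return {id: record dict}, numbering records from 1 in file order."""
    lines = iter(lines)
    # Discard the two header lines.
    for _ in islice(lines, 2):
        pass
    members = {}
    i = 1
    while True:
        # Take the next nine lines as one record.
        record = [line.rstrip() for line in islice(lines, len(FIELDS))]
        if len(record) < len(FIELDS) or not record[1]:
            break  # end of file, or a record with no first name
        rec = dict(zip(FIELDS, record))
        rec["experience"] = list(rec["experience"])
        members[i] = rec
        i += 1
    return members


# Invented sample data standing in for members.kda.
sample = io.StringIO(
    "2007/8\n"
    "second header line\n"
    "D'Plumber\nJoe\nm\n\ny\nNEN\nEngineering\n2\n\n"
    "D'Builder\nTito\nm\n\nn\nNNN\nScience\n1\n\n"
)
members = read_members(sample)
print(members[1]["firstname"], members[1]["surname"])
```

With a real file, passing the open file object in place of the StringIO should behave the same way, since both are iterated line by line.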

Next, read in the scores file, and add the scores as a sequence to the member[] dictionary. Slightly complicated by the air and smallbore rifle scores being in the same file, and I haven’t yet figured out a clean way to use Python’s for line in file idiom to cover just a part of the file.
[cc escaped="true" lang="python"]
def readScores():
    scoresfile = open("scores.g1", "r")
    x = False   # becomes True once the blank line before the air rifle scores is seen
    y = 3       # header lines still to be skipped
    for line in scoresfile:
        if y == 0:
            line = line.strip()
            if line:
                # Each line is a shooter id followed by that shooter's scores
                scores = []
                for s in line.split(' '):
                    scores.append(int(s))

                id = scores[0]

                if x:
                    # Air Rifle scores
                    member[id].scores_air = scores[1:]
                    member[id].mean_air = stats.mean(scores[1:])
                    member[id].oldKADAannual_air = oldKADAannual(member[id].scores_air)
                    member[id].oldKADA_air = oldKADA(member[id].scores_air)
                    member[id].newKADAannual_air = newKADAannual(member[id].scores_air)
                    member[id].newKADA_air = newKADA(member[id].scores_air)
                else:
                    # Target Rifle scores
                    member[id].scores_target = scores[1:]
                    member[id].mean_target = stats.mean(scores[1:])
                    member[id].oldKADAannual_target = oldKADAannual(member[id].scores_target)
                    member[id].oldKADA_target = oldKADA(member[id].scores_target)
                    member[id].newKADAannual_target = newKADAannual(member[id].scores_target)
                    member[id].newKADA_target = newKADA(member[id].scores_target)
            else:
                # Blank line: everything after it is air rifle
                x = True
        else:
            y = y - 1
    scoresfile.close()[/cc]
That’s not horrible. It’s certainly less clumsy than the readline() approach when reading the members file.
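One possible way to keep the for line in file idiom while skipping the header is itertools.islice, with a variable that points at whichever section is currently being filled. This is only a sketch in Python 3: the function name, the two plain dictionaries it returns and the sample data are invented, and it only parses the scores rather than computing any averages. It also uses split() with no argument, which tolerates repeated spaces where split(' ') would not.

```python
import io
from itertools import islice


def read_scores(lines, header_lines=3):
    """Return (target, air) dicts mapping shooter id -> list of scores."""
    target, air = {}, {}
    current = target  # target rifle scores come first in the file
    for line in islice(lines, header_lines, None):
        line = line.strip()
        if not line:
            # A blank line marks the switch to the air rifle section.
            current = air
            continue
        numbers = [int(s) for s in line.split()]
        current[numbers[0]] = numbers[1:]
    return target, air


# Invented sample data standing in for scores.g1.
sample = io.StringIO(
    "header 1\nheader 2\nheader 3\n"
    "1 95 96 97\n"
    "2 88 90\n"
    "\n"
    "1 80 85\n"
)
target, air = read_scores(sample)
print(target)
print(air)
```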

So at this stage the data’s in. Turns out, calculating the ladder averages is exceptionally compact in Python. First, I wanted to duplicate Kada’s original algorithm:
[cc escaped="true" lang="python"]
def oldKADAannual(scores):
    """Calculates the shooter's ladder average for the whole year under
    the old KADA algorithm of dropping the lowest card out of every six shot
    """
    tmpscores = sorted(scores)
    n = len(tmpscores)//6
    tmpscores = tmpscores[n:]
    return stats.mean(tmpscores)[/cc]
Four lines of code. I do love scripting languages 😀 Now for the new algorithm:
[cc escaped="true" lang="python"]
def newKADAannual(scores):
    """Calculates the shooter's ladder average for the whole year under
    the new KADA algorithm of dropping the lowest and highest cards out
    of every eight shot
    """
    tmpscores = sorted(scores)
    n = len(tmpscores)//8
    if n >= 1:
        tmpscores = tmpscores[n:-n]
    else:
        if len(tmpscores) == 7:
            tmpscores = tmpscores[1:]
    return stats.mean(tmpscores)[/cc]
Again, very compact with only eight lines of code (and really, it’s hard to count some of them as actual lines as such 😀 ). There are two other variants of these for calculating the running averages rather than the end-of-year averages, but let’s ignore them. They’re equally short.
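To make the difference between the two annual averages concrete, here is a small worked example in Python 3. The two functions mirror the logic above under shorter, hypothetical names, statistics.mean from the standard library stands in for stats.mean, and the eight scores are invented.

```python
from statistics import mean


def old_annual(scores):
    """Drop the lowest card out of every six shot, then average."""
    ordered = sorted(scores)
    n = len(ordered) // 6
    return mean(ordered[n:])


def new_annual(scores):
    """Drop the lowest and highest cards out of every eight shot, then average."""
    ordered = sorted(scores)
    n = len(ordered) // 8
    if n >= 1:
        ordered = ordered[n:-n]
    elif len(ordered) == 7:
        ordered = ordered[1:]
    return mean(ordered)


cards = [90, 85, 70, 95, 88, 92, 60, 91]  # eight invented scores

# Sorted: 60 70 85 88 90 91 92 95
# Old rule: 8 // 6 = 1, so only the 60 is dropped -> 611 / 7
# New rule: 8 // 8 = 1, so the 60 and the 95 are dropped -> 516 / 6
print("%.3f" % old_annual(cards))  # 87.286
print("%.3f" % new_annual(cards))  # 86.000
```

So for a shooter with one very good card, the new rule pulls the average down a little compared to the old one.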

Calculating the ladders themselves is even shorter, at least on a per-ladder basis. There are 16 ladders all told: Target/Air x Novice/Experienced/Advanced/Overall x Running/Final. Looking at just the code for two (the others are basically more of the same):
[cc escaped="true" lang="python"]
def calculateLadders():
    # Novice Air Ladder
    for id, m in member.iteritems():
        if hasattr(m, 'scores_air'):
            if m.experience[1] == 'N':
                NAL[m.newKADA_air] = id

    # Final Novice Air Ladder
    for id, m in member.iteritems():
        if hasattr(m, 'scores_air'):
            if m.experience[1] == 'N':
                if len(m.scores_air) >= 3:
                    NALannual[m.oldKADAannual_air] = id
[/cc]
Very straightforward really, the ladders are just dictionaries of id numbers with the keys being the actual ladder averages. That way, to print them out, you just sort the keys, pull the first key from that list and look up the id and that id is the key to your shooter object with all the data to play with. So that’s the calculation done. Now, the output.
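One caveat with keying a dictionary on the average: two shooters with exactly the same average would share a key, and the second would silently replace the first. A minimal sketch of a tie-safe alternative, in Python 3 with invented ids and averages, is to keep a list of (average, id) pairs and sort that instead. The variable names here are hypothetical and not part of the script above.

```python
# id -> ladder average (invented values; ids 1 and 3 are tied)
averages = {1: 86.5, 2: 91.833, 3: 86.5}

# Dictionary keyed on the average: one of the tied shooters is lost.
by_average = {}
for shooter_id, avg in averages.items():
    by_average[avg] = shooter_id

# List of (average, id) pairs: every shooter survives, ties included.
ladder = sorted(((avg, sid) for sid, avg in averages.items()), reverse=True)

print(len(by_average))  # 2
print(ladder)           # [(91.833, 2), (86.5, 3), (86.5, 1)]
```

Sorting the pairs in reverse puts the highest average first, and tied shooters simply appear next to each other.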

Initially, I wanted some very basic output of the ladders, more for test reasons than anything else. Kada just creates plain ASCII text files which we dump to the printer like so:

              NOVICE 10-METRE AIR RIFLE LADDER Final Ladder 9/05/08

                                         CARDS          BEST
                    RANK    NAME          SHOT AVERAGE  CARD

                     1. J.D'Plumber        26   86.864   92
                     2. J.D'Plumber        45   86.500   94
                     3. J.D'Plumber        18   85.067   92
                     4. J.D'Plumber        16   84.857   91
                     5. J.D'Plumber        36   81.300   91
                     6. J.D'Plumber        31   79.846   87
                     7. J.D'Plumber         7   74.000   82
                     8. J.D'Plumber        16   65.429   84
                     9. J.D'Plumber         3   64.000   71
                    10. J.D'Plumber        10   63.444   78
                    11. J.D'Plumber         4   61.750   76
                    12. J.D'Plumber         3   53.667   62

So I set up to do something similar. It’s not a perfect match, but I just wanted to be able to compare the old output and mine to know my math was good:
[cc escaped="true" lang="python"]
def printLadder(ladder, description, discipline):
    keys = ladder.keys()
    keys.sort()
    keys.reverse()
    i = 0
    j = 0
    print description
    for j in range(1, len(description)):
        print '-',
    print '\n'
    for key in keys:
        if discipline == 'air':
            scores = member[ladder[key]].scores_air
        else:
            scores = member[ladder[key]].scores_target
        if len(scores) >= 3:
            i = i + 1
            print '%2d ' % i,
        else:
            print '   ',
        print '%20s ' % member[ladder[key]].name,
        print '%4d ' % len(scores),
        print '%5.3f ' % key,
        print '%4d ' % max(scores),
        if len(scores) < 3:
            print '*',
        print
    print '\n\n'[/cc]
And the output (with the names redacted for their privacy):

Novice Air Ladder
- - - - - - - - - - - - - - - -

 1         Joe D'Plumber    45  91.833    94
 2         Joe D'Plumber    36  87.500    91
 3         Joe D'Plumber    18  87.167    92
 4         Joe D'Plumber    16  85.833    91
 5         Joe D'Plumber    26  85.167    92
 6         Joe D'Plumber    31  81.167    87
          Tito D'Builder     2  76.000    76  *
 7         Joe D'Plumber     7  74.000    82
          Tito D'Builder     2  69.000    75  *
          Tito D'Builder     2  67.000    68  *
          Tito D'Builder     2  66.000    70  *
          Tito D'Builder     1  64.000    64  *
 8         Joe D'Plumber    10  63.167    78
          Tito D'Builder     1  62.000    62  *
 9         Joe D'Plumber     4  61.750    76
          Tito D'Builder     2  61.500    72  *
          Tito D'Builder     1  60.000    60  *
          Tito D'Builder     2  55.500    71  *
          Tito D'Builder     2  55.000    58  *
10         Joe D'Plumber     3  53.667    62
          Tito D'Builder     1  52.000    52  *
          Tito D'Builder     1  50.000    50  *
          Tito D'Builder     1  49.000    49  *

Which is unfancy and basic, but does the job. You’ll note the lists aren’t identical – in the Kada output, there’s only Joe D’Plumber, but mine has several entries from Tito D’Builder as well (thank the Daily Show for the names btw). The reason is that the official rules of the ladder say you must have at least 3 cards shot to enter; Tito hasn’t had three cards shot, so he’s listed on the chart (which is posted each week) to see where he is in relation to those who are entered, but with no rank and an asterisk to point out that his score isn’t official yet.
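The ranking rule on its own (a rank only for shooters with at least three cards, everyone else listed in place with an asterisk) can be sketched like this in Python 3. The function names, the tuple layout and the sample entries are invented for illustration, and the column widths only approximate the output above.

```python
def ladder_rows(entries, minimum_cards=3):
    """Yield (rank, name, scores, average), highest average first.

    rank is None for shooters with fewer than minimum_cards cards.
    """
    rank = 0
    for average, name, scores in sorted(entries, key=lambda e: e[0], reverse=True):
        if len(scores) >= minimum_cards:
            rank += 1
            yield rank, name, scores, average
        else:
            yield None, name, scores, average


def format_row(rank, name, scores, average):
    """Format one ladder line; unranked rows get a blank rank and a '*'."""
    rank_text = "  " if rank is None else "%2d" % rank
    flag = "  *" if rank is None else ""
    return "%s %20s %4d %7.3f %4d%s" % (
        rank_text, name, len(scores), average, max(scores), flag)


# Invented entries: (ladder average, name, cards shot)
entries = [
    (74.0, "Joe D'Plumber", [70, 74, 82]),
    (76.0, "Tito D'Builder", [76, 76]),
    (87.5, "Joe D'Plumber", [85, 91, 87, 88]),
]

rows = list(ladder_rows(entries))
for row in rows:
    print(format_row(*row))
```

Tito sits between the two ranked shooters in the listing, but the rank counter skips him, which is the behaviour described above.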

This isn’t bad, but I wanted to be a bit fancier with the output. Enter the ReportLab library, which lets Python generate PDFs, plus some simple code to generate sparklines. Now the ladder prints to a PDF file: Tito is in a smaller, italic, gray font so that Joe stands out more, and Joe has sparklines of his scores showing his (hopefully) upward progress through the year, with his high point highlighted. The PDF isn’t quite as I’d like it yet, but I’ll post up a snapshot when I get it right.

The important thing about all this though, is that I never did any Python programming before last week; it’s taken the very limited free time I’ve had over five days (while under the gun for a major demo during the day) to learn Python and do something useful and a bit complicated with it (generating a PDF report with custom graphics and typography). All told, maybe eight to ten hours. I’m really quite impressed with Python – I didn’t think much of it prior to now because frankly I had hassles with how the indentation thing sat in my head, but the truth is that once you’ve started (and gotten frustrated, cursed a bit, googled vim customisations for Python and installed the main two or three and restarted) it really does just go away and you see through it to the code itself. Which is clean, powerful, and very easy to read. So far I’ve not run into any real warts, though I will admit to having used an in-scope import because I couldn’t figure out how to use Python’s namespacing to instantiate an Image object from the Imaging library in the same script that I was using Image objects from the ReportLab library. And even the stranger idioms like ''.join() aren’t horrible. It’s quite a relief really because I was planning on using SAGE in my PhD stuff and this does indicate that it’ll be a lot easier than I was anticipating.
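For the name clash, the usual way out is to alias one or both names at import time with "as". The sketch below (Python 3) uses two standard-library modules that both export a function called dumps, purely as stand-ins for the two Image classes. It is an assumption that the same form would apply to the imaging libraries, since their exact import paths are not shown here.

```python
# Both json and pickle export a function called dumps; aliasing keeps
# the two apart in one namespace. These modules are stand-ins only.
from json import dumps as json_dumps
from pickle import dumps as pickle_dumps

data = {"name": "Joe"}

text = json_dumps(data)    # JSON text (a str)
blob = pickle_dumps(data)  # pickled bytes

print(type(text).__name__, type(blob).__name__)
```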

Next up, once this is done and made into an executable using py2exe, will be PyQt. I’ve bought this new toy for the RCMS project you see… but that’s another post 😀

14 Comments

  1. Python is wonderful. Incidentally, I also first started using it because PHP for non-server scripting felt icky.

    I believe the Pythonic solution to both the awkward readlines and the kludgy way of dropping the first few lines of a file is islice in the itertools package.

  2. keys.sort()
    keys.reverse()

    should be:
    keys.sort(reverse=True)

  3. You should get comfortable with list comprehensions. They make code compact when you are looping and appending onto a list.

    Instead of
    scores = []
    for s in line.split(' '): scores.append(int(s))

    you can do
    scores = [int(score) for score in line.split()]
    or even more compactly
    scores = map(int, line.split())
    although many would prefer the list comp.

  4. I couldn’t figure out how to use Python’s namespacing to instantiate an Image object from the Imaging library in the same script that I was using Image objects from the ReportLab library.

    instead of “from foo import bar”, use “import foo.bar” or “from foo import bar as baz”.

  5. The layout of the Kada files is pretty consistent and straightforward – I’d be tempted to run bulk inserts with them (or even treat them as linked tables) and have them in a db rather than read in from a text file each time. That would take up even less code and be neater still. (Mind you I am thinking of this from an SQL Server perspective, but I presume MySQL could work in a similar fashion.)
