ReportLab

As I mentioned before, after writing a Python script to read in Kada’s data files on the rifle club shooters’ scores and calculate new ladders, the next step is output that’s a bit fancier than the straight ASCII text dump:

<span style="color: #008000;">Novice Air Ladder
- - - - - - - - - - - - - - - -

 1         Joe D'Plumber    45  91.833    94
 2         Joe D'Plumber    36  87.500    91
 3         Joe D'Plumber    18  87.167    92
 4         Joe D'Plumber    16  85.833    91
 5         Joe D'Plumber    26  85.167    92
 6         Joe D'Plumber    31  81.167    87
          Tito D'Builder     2  76.000    76  *
 7         Joe D'Plumber     7  74.000    82
          Tito D'Builder     2  69.000    75  *
          Tito D'Builder     2  67.000    68  *
          Tito D'Builder     2  66.000    70  *
          Tito D'Builder     1  64.000    64  *
 8         Joe D'Plumber    10  63.167    78
          Tito D'Builder     1  62.000    62  *
 9         Joe D'Plumber     4  61.750    76
          Tito D'Builder     2  61.500    72  *
          Tito D'Builder     1  60.000    60  *
          Tito D'Builder     2  55.500    71  *
          Tito D'Builder     2  55.000    58  *
10         Joe D'Plumber     3  53.667    62
          Tito D'Builder     1  52.000    52  *
          Tito D'Builder     1  50.000    50  *
          Tito D'Builder     1  49.000    49  *
</span>

This does the basic job that the original system did (actually, it does a bit more – the asterisks mark out those shooters who haven’t yet shot enough cards to get on the ladder, but they’re still listed as an incentive for them to shoot more cards – the current system doesn’t do this). It’s not really doing all that can be done, however, and it’s certainly not all that fancy-looking. Especially in a scripting language, where the whole point is to do fancy stuff quickly through toolkits. So, what I wanted was PDF output (because these are printed off and posted up in the club), graphics like logos and so on, but also some graphs and charts with some meaningful data.
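The ranking rule described above (everyone is listed, but only shooters with enough cards get a position) boils down to something like the following. This is a minimal sketch rather than the script’s actual code: the function name, the `(name, scores)` input shape and the sample data are all assumptions, and a plain mean stands in for whatever averaging the real ladder uses. The threshold of three cards matches the `ladderData` code later in the post.

```python
MIN_CARDS = 3  # cards needed before a shooter gets a ladder position


def build_ladder(shooters, min_cards=MIN_CARDS):
    """Return (rank, name, cards, average, flag) rows, best average first.

    `shooters` is a list of (name, scores) pairs with at least one score each.
    Shooters below `min_cards` are still listed, but with rank None and '*'.
    """
    def mean(scores):
        return sum(scores) / float(len(scores))

    ordered = sorted(shooters, key=lambda entry: mean(entry[1]), reverse=True)
    rows = []
    rank = 0
    for name, scores in ordered:
        if len(scores) >= min_cards:
            rank += 1
            rows.append((rank, name, len(scores), mean(scores), ""))
        else:
            rows.append((None, name, len(scores), mean(scores), "*"))
    return rows


# Made-up sample data, purely for illustration
sample = [("Shooter A", [90, 92, 94]), ("Shooter B", [99]), ("Shooter C", [80, 80, 80, 80])]
for rank, name, cards, average, flag in build_ladder(sample):
    print("%2s %12s %3d %7.3f %s" % ("" if rank is None else rank, name, cards, average, flag))
```

The unranked shooter keeps their place in the sort order, so they can see exactly where they would land once they have shot enough cards.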

One of the graphs I wanted to include was one of Edward Tufte’s many good ideas, sparklines: small graphs which summarise the state of play of a variable in an easy-to-read, inline format (meaning that it sits in the flow of the text itself, as if English had suddenly become a pictographic language for a moment). They seemed perfect to show a high-level view of how the shooters were doing over the course of the year. Also, it would be nice to display the various breakdowns and analysis of membership (by gender, experience, college year, etc) in a graphical form – there’s nothing wrong with the raw data, but it’s almost always easier and faster to take in analysis that has a graphical expression. So pie charts and such would be an improvement.

Right, so I need a PDF generation library and a graphing library, and after some casting about and looking up different options, I settled on ReportLab to generate the PDF because it could do both high and low level generation (more on that in a moment) and because it has some well-written documentation. For the graphs and charts, MatPlotLib seemed to have the best combination of documentation and features (though there are others with wide followings). I’ll write a bit more about MatPlotLib in a future post and just focus on ReportLab for now.

PDF generation turns out to be conceptually very easy with ReportLab, both at high and low levels. At low level, the page is a canvas and you simply draw lines and place text on it using the pdfgen section of the library. I’d guess it would be very useful for small jobs, or for fine detail; but for this project pdfgen’s a bit too fine-grained. ReportLab’s high-level document creation model, platypus (Page Layout and Typography Using Scripts), is more like it – it’s based on the TeX model of creating documents (i.e. specify the content, let the software do the layout properly for you).

In this case, I want two documents, one for the actual ladders and one for the analysis and breakdowns of membership numbers. Platypus requires that you create an actual document object (in this case I use their base template, but custom templates are possible too), and each document will require you to create and populate a list of elements within that document, as well as any layout information, before generating the document itself.  So to start with, two lists, two document objects, and setting of some basic layout stuff:
[cc escaped=”true” lang=”python”]
from reportlab import platypus
from reportlab.lib.units import inch

def prepareReport():
    # One list of elements per document
    LadderDocElements = []
    ReportDocElements = []

    # Ladder document with half-inch margins all round
    ladderDoc = platypus.BaseDocTemplate('ladders.pdf')
    ladderDoc.leftMargin = 0.5*inch
    ladderDoc.rightMargin = 0.5*inch
    ladderDoc.topMargin = 0.5*inch
    ladderDoc.bottomMargin = 0.5*inch
    # Usable area between the margins
    ladderDoc.width = ladderDoc.pagesize[0] - ladderDoc.leftMargin - ladderDoc.rightMargin
    ladderDoc.height = ladderDoc.pagesize[1] - ladderDoc.topMargin - ladderDoc.bottomMargin

    # Membership report document, same margins
    reportDoc = platypus.BaseDocTemplate('report.pdf')
    reportDoc.leftMargin = 0.5*inch
    reportDoc.rightMargin = 0.5*inch
    reportDoc.topMargin = 0.5*inch
    reportDoc.bottomMargin = 0.5*inch
    reportDoc.width = reportDoc.pagesize[0] - reportDoc.leftMargin - reportDoc.rightMargin
    reportDoc.height = reportDoc.pagesize[1] - reportDoc.topMargin - reportDoc.bottomMargin
[/cc]
Nice and simple. Two lists of elements, one for each document (LadderDocElements and ReportDocElements); two actual document objects; and some basic margin setup.

There is some additional setup to arrange the layout of frames on the page that the elements will plug into, and it looks a little complex because I’ve got two different layouts (one and two column), but to be honest the lack of readability is more down to formatting issues than code complexity:
[cc escaped=”true” lang=”python”]
# Two columns separated by a half-inch gap, below a title bar
interFrameMargin = 0.5*inch
frameWidth = ladderDoc.width/2 - interFrameMargin/2
frameHeight = ladderDoc.height - inch*0.6
framePadding = 0*inch

frames = []

# Title bar spanning the full width at the top of the page
leftMargin = ladderDoc.leftMargin
titlebar = platypus.Frame(leftMargin, ladderDoc.height, ladderDoc.width, 0.75*inch,
                          leftPadding=0, rightPadding=0, topPadding=0, bottomPadding=0,
                          showBoundary=1)
frames.append(titlebar)

# Left column
column = platypus.Frame(leftMargin, ladderDoc.bottomMargin, frameWidth, frameHeight)
frames.append(column)

# Right column, offset by one column width plus the gap
leftMargin = ladderDoc.leftMargin + frameWidth + interFrameMargin
column = platypus.Frame(leftMargin, ladderDoc.bottomMargin, frameWidth, frameHeight)
frames.append(column)

# Single full-page frame for the one-column layout
soloframe = platypus.Frame(ladderDoc.leftMargin, ladderDoc.bottomMargin, ladderDoc.width, ladderDoc.height)

[/cc]
So we have frames and soloframe, the former a two-column layout with a titlebar, the latter a single frame taking up the whole of the page between the margins. Later on in the file (which is now in dire need of refactoring for readability’s sake, which is the next thing on the project to-do list), those frame layouts are fed into the document objects and the document generated.
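The frame arithmetic is easier to check with actual numbers. The following sketch redoes the same sums in plain Python, without ReportLab, assuming an A4 page (the page size is an assumption on my part; the dimensions below are A4 in points, rounded) and the half-inch margins from the setup code. The variable names are illustrative.

```python
INCH = 72.0                      # points per inch
PAGE_W, PAGE_H = 595.28, 841.89  # A4 in points, rounded (assumed page size)

margin = 0.5 * INCH
doc_width = PAGE_W - 2 * margin
doc_height = PAGE_H - 2 * margin

gap = 0.5 * INCH                      # interFrameMargin
col_width = doc_width / 2 - gap / 2   # frameWidth
col_height = doc_height - 0.6 * INCH  # frameHeight

left_x = margin
right_x = margin + col_width + gap

# (x, y, width, height) per frame, with y measured from the bottom of the page
titlebar = (margin, doc_height, doc_width, 0.75 * INCH)
left_col = (left_x, margin, col_width, col_height)
right_col = (right_x, margin, col_width, col_height)

for name, box in [("titlebar", titlebar), ("left", left_col), ("right", right_col)]:
    x, y, w, h = box
    print("%-8s x=%.1f..%.1f  y=%.1f..%.1f" % (name, x, x + w, y, y + h))
```

Two things fall out of this: the right-hand column ends exactly on the right margin, and the title bar, being placed at y = document height rather than relative to the top margin, pokes a quarter of an inch above the top margin line whatever the page size.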

There’s also a helper function or two defined within prepareReport() to manage simple tasks like header text and preformatted chunks. These are mainly left over from swiped demo code, but they do show rather well how text gets added into the document.
[cc escaped=”true” lang=”python”]
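# HeaderStyle and PreStyle are paragraph styles defined elsewhere in the script (not shown)
def header(Elements, txt, style=HeaderStyle, klass=platypus.Paragraph, sep=0.3):
    # Vertical gap above the header, then the header text itself
    s = platypus.Spacer(0.2*inch, sep*inch)
    Elements.append(s)
    # Note: these assignments modify the style object that was passed in
    style.alignment = 1  # 1 = centred
    style.fontName = 'Helvetica-Bold'
    style.fontSize = 18
    para = klass(txt, style)
    Elements.append(para)

def pre(txt):
    # Small gap plus preformatted text, kept together as one unit
    s = platypus.Spacer(0.1*inch, 0.1*inch)
    p = platypus.Preformatted(txt, PreStyle)
    precomps = [s, p]
    result = platypus.KeepTogether(precomps)
    return result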
[/cc]
That’s pretty much boilerplate sort of stuff, not hugely complicated.

One useful option is showBoundary, which draws the outlines of the frames (handy for initial design work).
[cc escaped=”true” lang=”python”] ladderDoc.showBoundary = True
[/cc]
Once all that boilerplate and initial layout work is done, it’s just a matter of adding elements to the various documents, which consists mainly of repeated chunks of code like this:
[cc escaped=”true” lang=”python”]
header(LadderDocElements, 'Air Rifle Ladders', sep=0)
header(LadderDocElements, str(datetime.date.today()), sep=0)
LadderDocElements.append(platypus.FrameBreak())
header(LadderDocElements, 'Overall')
LadderDocElements.append(pdfLadder(OAL, 'air'))
LadderDocElements.append(platypus.FrameBreak())
[/cc]
In this case I’ve wrapped up the actual ladder generation a bit for clarity, but I’ll get to that in a second. Once all the elements are appended to the list, you add the frame layouts and tell the document to build itself (as mentioned above) and that’s it – the library goes off, builds the PDF and saves it to the file you specified at the start.
[cc escaped=”true” lang=”python”]
ladderDoc.addPageTemplates([
    platypus.PageTemplate(id='TwoColumn', frames=frames),
    platypus.PageTemplate(id='Normal', frames=soloframe),
])
ladderDoc.build(LadderDocElements)
[/cc]
It’s not that much work generating the actual ladder either; platypus has high-level table routines as well, and it’s just a case of supplying a list of data elements to them and then choosing what style each column/row set is rendered in.
[cc escaped=”true” lang=”python”]
from reportlab.lib import colors

def pdfLadder(ladder, discipline):
    data = ladderData(ladder, discipline)
    t = platypus.Table(data)
    # Base style for every cell, then per-column and per-row overrides
    ts = platypus.TableStyle([('FONT', (0, 0), (-1, -1), 'Helvetica', 8)])
    ts.add('ALIGN', (0, 0), (-1, -1), 'CENTRE')
    ts.add('ALIGN', (0, 0), (0, -1), 'RIGHT')
    ts.add('ALIGN', (5, 0), (5, -1), 'LEFT')
    ts.add('TEXTCOLOR', (0, 0), (-1, -1), colors.black)
    ts.add('FONT', (1, 0), (1, -1), 'Helvetica-Bold', 8)
    # Header row: bold, slightly larger, with a rule underneath
    ts.add('FONT', (0, 0), (-1, 0), 'Helvetica-Bold', 9)
    ts.add('LINEBELOW', (0, 0), (-1, 0), 0.5, colors.black)
    ts.add('LEFTPADDING', (0, 0), (-1, -1), 2)
    ts.add('RIGHTPADDING', (0, 0), (-1, -1), 2)
    ts.add('TOPPADDING', (0, 0), (-1, -1), 0.25)
    ts.add('BOTTOMPADDING', (0, 0), (-1, -1), 0.25)
    # Extra padding on the first data row
    ts.add('TOPPADDING', (0, 1), (-1, 1), 3)
    ts.add('BOTTOMPADDING', (0, 1), (-1, 1), 2)

    # Grey italics for shooters who are not yet ranked
    for i, d in enumerate(data):
        if d[1] is None:
            ts.add('FONT', (0, i), (-1, i), 'Helvetica-Oblique', 8)
            ts.add('TEXTCOLOR', (0, i), (-1, i), colors.gray)
    t.setStyle(ts)
    return t
[/cc]
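The per-row styling at the end of that function is the only part with any logic in it, and it can be pulled out into a small helper that is easy to test without generating a PDF. The helper below is hypothetical (its name and parameters are mine, not the script’s); it just returns the same command tuples that `ts.add()` is given above, with the colour passed in by the caller (`colors.gray` in the original code).

```python
def unranked_row_styles(data, colour, rank_column=1):
    """Return style command tuples for every row whose rank cell is None."""
    commands = []
    for row_index, row in enumerate(data):
        if row[rank_column] is None:
            # Same coordinates as in pdfLadder: the whole of row `row_index`
            commands.append(("FONT", (0, row_index), (-1, row_index), "Helvetica-Oblique", 8))
            commands.append(("TEXTCOLOR", (0, row_index), (-1, row_index), colour))
    return commands


# Made-up rows: header first, then name and rank (None means unranked)
rows = [["Name", "Rank"], ["Shooter A", 1], ["Shooter B", None], ["Shooter C", 2]]
commands = unranked_row_styles(rows, "gray")
```

Each tuple could then be applied with `ts.add(*command)`, which is equivalent to the calls written out by hand in `pdfLadder`.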
Again, the actual data formatting is wrapped up for clarity, but it isn’t very complex:
[cc escaped=”true” lang=”python”]
def ladderData(ladder, discipline):
    # The ladder's keys are the running averages; its values index into members
    data = []
    keys = sorted(ladder.keys(), reverse=True)  # highest average first
    i = 0  # current ladder position

    row = ['Name', 'Rank', 'Cards', 'Running\nAverage', 'Overall\nAverage', 'Range', None]
    data.append(row)
    for key in keys:
        row = []
        shooter = members[ladder[key]]
        if discipline == 'air':
            scores = shooter.scores_air
        else:
            scores = shooter.scores_target
        row.append(shooter.name)
        # Only shooters with at least three cards get a rank
        if len(scores) >= 3:
            i = i + 1
            row.append(i)
        else:
            row.append(None)

        row.append(len(scores))
        row.append('%5.3f' % key)

        if discipline == 'air':
            row.append('%5.3f' % shooter.oldKADAannual('air'))
        else:
            row.append('%5.3f' % shooter.oldKADAannual('target'))

        row.append('%d -> %d' % (min(scores), max(scores)))

        # Ranked shooters also get a sparkline image in the last column
        if len(scores) > 2:
            plotLadderSparkline(ladder[key], ladder, discipline)
            filename = ''.join(['tmp/', discipline, str(ladder[key]), '.png'])
            image = platypus.Image(filename)
            row.append(image)
        else:
            row.append(None)

        data.append(row)

    return data
[/cc]
At the end there is the first instance of graphing. Rather than trying to hook MatPlotLib into ReportLab tightly, the easier option is to simply generate individual png image files for each sparkline (and the same method is used for all the other graphics as well) and to save them all in a temporary directory which can be either deleted or re-used after the pdf is generated. It’s not too hard to think of advantages to this – you could upload the images to the club webserver and have both PDF and webpage formats for reports and ladders and so on. As I said, I’ll go into this in more detail in a later post.
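The temporary-directory housekeeping can be handled with the standard library alone. The sketch below is an assumption about how that might look, not what the script does (the script writes into a fixed `tmp/` directory): the helper name and the member id are made up, and an empty placeholder file stands in for the image a plotting routine would write.

```python
import os
import shutil
import tempfile


def sparkline_path(directory, discipline, member_id):
    """Path for one shooter's sparkline image, e.g. <directory>/air42.png."""
    return os.path.join(directory, "%s%s.png" % (discipline, member_id))


work_dir = tempfile.mkdtemp(prefix="ladder-")  # scratch directory for the images
try:
    path = sparkline_path(work_dir, "air", 42)
    # A plotting routine would write the PNG to `path` here; an empty
    # placeholder file stands in for it in this sketch.
    with open(path, "wb") as handle:
        handle.write(b"")
    image_exists = os.path.exists(path)
    # ... build the PDF here, while the images are still on disk ...
finally:
    shutil.rmtree(work_dir)  # remove the images once the PDF has been built
```

To re-use the images afterwards (for a web page, say), the clean-up step would simply be skipped and the directory kept.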

I’m not really overly happy with the code as yet though. It’s rambling in places, downright ugly in others, and I really want to do a job refactoring it before putting it up anywhere. This whole series of blog posts isn’t really supposed to be saying “look at this incredible Python coding”, it’s meant to be saying “look how fast you can do actual, useful work” – in this case the actual need comes from a sports club to be sure, but it’s an actual need nonetheless. And the fact that you can pick up Python with no prior experience and put something like this together inside about 20 hours of playing with it is a rather excellent recommendation really. I’m definitely going to be keeping Python in the toolbox for future projects – in fact I’m already playing with using PyQt for the Range Officer Report part of my RCMS project. More on that later as well, but I’m already finding it’s a nifty little tool for doing GUIs quickly. Whether or not it’ll be fast enough on the new toy to be usable remains to be seen, of course.

7 comments

  1. Mark,

    Do you have an example of the output as it stands at the moment? I’ve been following this little series with interest and I’m keeping note of it all as I may be faced with a similar enough problem in the next 12 – 18 months at a club I’m involved with myself (not air rifles though).

    Cheers,
    Rob

  2. Another option is to generate LaTeX and use pdflatex.

  3. The problem with LaTeX and pdflatex (or even just dvips followed by ps2pdf) is that it’s a whole additional toolchain to be installed on the club computer. ReportLab’s just another library so I just have to worry about the Python install and that’s it.

  4. The only examples I have handy have personal data in them Rob (names, etc); I’ll mung the test data I’m using to change names and such and I’ll post the output from that data.

  5. Done! They’re up in a new post here.

  6. […] of wading through the ReportLab documentation enjoyable.  Furthermore, although I have found some 3rd party documentation, there is really not too much information out […]

  7. Hi,
    Can we append data to an existing pdf file using reportlab?

    Thanks.
