Introduction
I've always been a big believer in language variety, mostly as a means to sharpen one's skills and develop new ways of thinking about problems. This past year I decided to pick Python as the language to work with, where viable, for personal projects to break the usual run of languages I use at the office.
Python was a good choice for a number of reasons, the most important being that it has a tremendous wealth of support libraries, both in the standard library and at the cheeseshop. I also wanted a language I don't normally use, and outside of Conary recipes I hardly touch Python at all. There was another reason to choose Python, however: the hope that it would make me release more code.
You see, we can't use Python at work, for a whole heap of reasons I'm not going to get into now, but that simple fact made me believe that getting any code written in Python past the company lawyers would be a whole lot easier than Ruby or Haskell. And actually, I think it did, as it removed one whole avenue of concern that is often raised with code releases: whether there has been any wholesale code reuse from internal projects. The so-called "intellectual property" clauses that are found in many developers' contracts aren't that easy to bypass though, so I still haven't released as much as I'd like.
The truth is I've used Python for a whole lot of personal stuff over the past year. The graph below is generated solely from the lines of code in each language (because I can't think of a better way). It is a pretty accurate representation of usage, as I keep almost every piece of code over a few lines in a Mercurial repository in case I want it again in the future.
The others block consists of a pretty even spread of C, Lua, Erlang, Vala, and OCaml. Vala has taken up its place in that group very quickly though, and I imagine it would have overtaken at least OCaml if I hadn't placed a cut-off of December 31st on the data.
A few things aren't included in the metrics though: the one-liners that are created in the interactive shells or in that test.<language> file we all use so much. If a piece of code is useful in the shell I often generalise it and save it for future use, but not if it is so small as to be of little use in the future. An example of the sort of thing that can be found in my IPython history file from today, and that was never copied into my scripts repository, is:
>>> from glob import glob
>>> from os.path import (exists, islink, realpath)
>>> print "\n".join([i for i in glob("/var/lib/packages/*/*/*")
...                  if islink(i) and not exists(realpath(i))])
All it does is print a list of broken links in my Gentoo binary packages trees, and it would take maybe 30 seconds to write again in the shell, which is probably longer than it would take to generalise. Much beyond that, and it would have found its way into the scripts repository. So, the above graph is likely to be representative of the year's code, at least as far as lines of code can be.
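For the curious, a generalised form of that one-liner might look something like the sketch below. It is written for a current Python 3 rather than the Python 2 of the session above, and the broken_links name is made up for illustration:

```python
import os
from glob import glob


def broken_links(pattern):
    """Return the sorted paths matching ``pattern`` that are dangling symlinks."""
    return sorted(path for path in glob(pattern)
                  # a symlink whose resolved target does not exist
                  if os.path.islink(path)
                  and not os.path.exists(os.path.realpath(path)))
```

Calling broken_links("/var/lib/packages/*/*/*") and joining the result with newlines gives the same listing as the shell session.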
The graph, and underlying data, do prove one other point to me: if my plan was to use only Python, I have failed. However, that wasn't really the plan. I know there are times when other languages are just more expressive or better suited to the task, and I tend to choose them if that is the case. For example, one of the large Ruby projects was built around libraries that are only available for Ruby. And about a quarter of the Haskell code is proofs, a task I find Haskell most suitable for.
The effects
There have been a few effects that have surprised me, the biggest of all being quite how much my current full-time project affects my train of thought (perhaps the lawyers are right to put all the "Intellectual Property" clauses into contracts after all). Looking back through some of the code I can see there are significant chunks that read like Pythonised Haskell, right down to the purity of functions. And then there is a month or so that coincides with a large work project in Fortran that would turn genuine Python developers into rabid blood-lusting maniacs.
I didn't actually spot this effect myself; a friend noticed it after asking me for help with an idea and receiving "a marketable 4-function Python interpreter; map, reduce, filter and partial".
The most lucid demonstration I can imagine, having filtered my way through a large chunk of the code base as a refresher, would be the naïve factorial function. Following on from lots of C at the office I have a tendency to write recursive functions, even if I would normally consider them to be inappropriate for the task:
def fact(n):
    if n <= 1:
        return 1
    else:
        return n * fact(n-1)
In Python we'd have to raise the maximum recursion depth before we hit fact(1000), and the calculation is incredibly expensive. There is just some thought in the back of my mind that drives me that way, in much the same way as you see people use a heavy RegExp when a simple substring match would work.
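For contrast, a plain loop sidesteps the recursion limit altogether. This is only a sketch in current Python 3, and fact_iter is a hypothetical name rather than anything from the code base discussed here:

```python
def fact_iter(n):
    """Factorial with a plain loop: no recursion, so no depth limit to hit."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
```

With this form fact_iter(1000) needs no change to the recursion limit, because the call stack never grows beyond a single frame.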
To round this off, there is a genuine example of the factorial function in a piece of code from November, and it is probably pretty easy to guess what I had been working in that week.
from operator import mul
fact = lambda n: 1 if n <= 1 else reduce(mul, range(2, n+1))
While reading Planet Haskell back in December I came across Adam Turoff's entry titled Taxicab Numbers. The first thought I had after reading it was how readable the Python version would be, and the answer isn't that clear cut.
cube = lambda x: x**3
def taxicab(n):
    # inclusive range wrapper
    irange = lambda x: range(x, n+1)
    return [(cube(a) + cube(b), (a,b), (c,d))
            for a in irange(1)
            for b in irange(a+1)
            for c in irange(a+1)
            for d in irange(c+1)
            if (cube(a) + cube(b)) == (cube(c) + cube(d))]
If you spend much time around Haskell it probably reads okay, as it could have been converted with sed. However, if you show it to people who spend a lot of time with Python they tend to turn their noses up at it. And here lies the problem...
After a year, I've succeeded in writing a lot of Py-kell, lu-by and a degenerate pile of other monstrosities.
With the year in review like this I'm not yet sure I want to repeat the experiment afresh.
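As a point of comparison for the taxicab search above, here is one way it might be written so that regular Python users are less likely to wince. It is a sketch in current Python 3 that swaps the four nested loops for a dictionary keyed on each sum of cubes, so it returns a mapping rather than the list of tuples the original produces. The taxicab_pairs name is hypothetical:

```python
from collections import defaultdict
from itertools import combinations


def taxicab_pairs(n):
    """Map each sum of two cubes to its pairs, keeping sums reached more than once."""
    sums = defaultdict(list)
    # combinations() yields every (a, b) with 1 <= a < b <= n exactly once
    for a, b in combinations(range(1, n + 1), 2):
        sums[a**3 + b**3].append((a, b))
    return {total: pairs for total, pairs in sums.items() if len(pairs) > 1}
```

For n = 12 this yields {1729: [(1, 12), (9, 10)]}, and it visits each pair only once instead of comparing every pair against every other.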
The love
With this past year being the first that I've written more than a few thousand lines of Python I've grown to love it. I enjoy the elegance of the syntax, the Cold War sized stockpile of functionality from the standard library and the almost mystical power of IPython as an interactive shell.
I may abuse some of the less than favoured functional support a lot, but I'm a huge fan of Python's implementation of generators and list comprehensions too.
Decorators
I adore the simplicity of decorators; adding coverage analysis or timing to a function can be as easy as prefixing the definition with the appropriate @<decorator>. For example, a brittle, yet simple, function timing decorator could be implemented as:
from time import time
from sys import stderr

def time_func(function):
    def wrap(*args, **kwargs):
        begin = time()
        result = function(*args, **kwargs)
        end = time()
        print >>stderr, "%s%s: %.3fs" % (function.func_name, args, end-begin)
        return result
    return wrap
You can also fashion a time_func decorator using the timeit module for more accurate results, but you have to be very careful if your functions have side effects. And for the more complex tasks it may make sense to just use the significantly more heavyweight profile module.
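A slightly less brittle take on the same decorator is sketched below for current Python 3. It assumes the sharp edges worth smoothing are the lost function name and calls that raise, and it borrows only the default_timer clock from the timeit module rather than timeit's repeated-run machinery:

```python
import sys
from functools import wraps
from timeit import default_timer


def time_func(function):
    """Report how long each call to ``function`` takes on stderr."""
    @wraps(function)  # keep the wrapped function's name and docstring
    def wrap(*args, **kwargs):
        begin = default_timer()
        try:
            return function(*args, **kwargs)
        finally:
            # runs even when the call raises, so failures are timed too
            elapsed = default_timer() - begin
            print("%s%r: %.3fs" % (function.__name__, args, elapsed),
                  file=sys.stderr)
    return wrap
```

Because each call is still executed exactly once, this version stays safe for functions with side effects, at the cost of the accuracy that repeated timeit runs would give.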
Standard library
What can you say about doctest? It is a truly fantastic module. The ease with which one can drop tests into code with doctest is amazing. I use it, and probably in many cases overuse it, for everything. For interface documentation I tend to drop the tests straight into the docstrings; for unit tests I often create new files for the tests, as one might with the unittest module. I even use it for functional testing and a pseudo-literate Python, thanks to the doctest.testfile runner.
It is great to be able to just write plain text requirements, and litter them with interface design samples, when starting out on a new project. And later, when the project is running, they become both a part of the test suite and a guarantee that you maintain the documentation as the code changes.
I won't reproduce any examples here, but the Pythons on a plane page is a reasonable example of what I'm talking about. It is included as a usage example in the upoints package, and it is also executed as part of the test suite with ./setup.py test_doc.
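For anyone who hasn't met the docstring form of doctest, a tiny sketch may help. The irange helper here is hypothetical (it borrows the name of the inclusive range wrapper from the taxicab code earlier) and is written for a current Python 3:

```python
def irange(start, stop):
    """Return an inclusive range as a list.

    >>> irange(1, 4)
    [1, 2, 3, 4]
    >>> irange(3, 2)
    []
    """
    return list(range(start, stop + 1))
```

Saved in a file, running python -m doctest on that file executes both examples and stays silent when they pass, so the documentation doubles as the test.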
Beyond that, I seem to have written 30 different scripts just for mangling mail over the past year, and without the excellent email and mimetools modules they would have required probably 50 times the code for the same functionality (and they would no doubt have been considerably more fragile too).
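To give a flavour of how little code the email package asks for, here is a minimal sketch in current Python 3. The message text, the addresses and the summarise helper are all invented for illustration, and it sticks to the email package alone:

```python
from email import message_from_string

# A made-up message, used purely as sample input
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Weekly report
Content-Type: text/plain; charset="us-ascii"

All systems nominal.
"""


def summarise(raw_message):
    """Return (sender, subject, body) for a simple, non-multipart message."""
    message = message_from_string(raw_message)
    return message["From"], message["Subject"], message.get_payload()
```

Header lookups are case-insensitive dictionary-style accesses, which covers a surprising share of everyday mail mangling.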
I've also done a lot of XML processing, and Fredrik Lundh's ElementTree module is a great way to work with XML. It is one you have no excuse not to use now, as it has been in the standard library since the Python 2.5 release. I find its interface much cleaner and more expressive than pretty much any other XML library I've used, irrespective of the language. And it is incredibly fast in its cElementTree form, and reasonably nippy in its pure Python version too.
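A short sketch of that interface, for current Python 3, where the module lives at xml.etree.ElementTree. The sample document and the language_years helper are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A made-up document, used purely as sample input
SAMPLE = """\
<languages>
  <language name="Python" year="1991"/>
  <language name="Haskell" year="1990"/>
  <language name="Vala" year="2006"/>
</languages>
"""


def language_years(xml_text):
    """Map each <language> element's name attribute to its year as an int."""
    root = ET.fromstring(xml_text)
    return {node.get("name"): int(node.get("year"))
            for node in root.findall("language")}
```

Elements behave much like lists of their children with a dictionary of attributes attached, which is most of what there is to learn.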
Note
Another option you may wish to explore if you like ElementTree is the lxml module, which wraps libxml2 and libxslt in a mostly ElementTree-compliant API. In my tests, while it is much faster than the Python version of ElementTree, it is much slower than cElementTree. However, it has more functionality, and you may be willing to sacrifice the speed for its extra features in certain circumstances.
External packages
Outside of the standard library there are many excellent packages, such as BeautifulSoup for handling the abominations of markup often found in the wild. It allows you to bring a standard DOM-based approach back to your coding, when you'd normally end up with an error-prone and often entirely broken RegExp "solution".
And as I mentioned Mercurial above, we have to mention it again as what is in my opinion a flagship Python project. One that definitely forces you to rethink any poor opinions on the language you may have, when you consider the functionality packed in to such a fast and small SCM.
Last, but not least, there is Werkzeug, the fantastically functional Web anti-framework. I rewrote the front end of the project that spawned upoints with Werkzeug in one afternoon, and reduced the code size by some sixty percent from the custom solution. And in its current, closed-audience implementation it is serving 500 users from a Linksys NSLU2 with no problems.
...and mild disdain
Of course, there are some complaints I would make. No, I'm not going to complain about self, because I think it was a great design decision. And if, like many others, your only complaint about self is that it is too much to type, add a macro to your editor. For example, in my editor of choice, vim, you could add the following snippet to your ~/.vimrc to expand \@ to self. in Python source.
autocmd FileType python imap <Leader>@ self.
Whitespace
There is the issue of significant whitespace though, and my complaint may surprise you. I think it is too lax, plain and simple. I love the enforced formatting, but it pains me to see the craziness that happens the moment somebody wraps parentheses around code. If you're going to make whitespace significant, go all the way and remove that backdoor to nastiness that I'm sure we've all seen used (and may even have used ourselves at times).
On top of that, -tt should have been the default operating mode; tabs and spaces should never be mixed. Once again your text editor or IDE should come to the rescue here and allow you to make your Python templates set #! /usr/bin/python -tt like mine.
Coding style
It took me about a week before I started enforcing strict coding guidelines. Beyond the Style Guide for Python Code [PEP 8] basics, I have some rules that are specifically aimed at diff sanity. Consider the following patch hunk:
--- test.py
+++ test2.py
@@ -1,4 +1,5 @@
 test_dict = {
     "Python": "Fun",
-    "Vala": "Fresh"
+    "Vala": "Fresh",
+    "Scala": "Esoteric"
 }
You're required to needlessly scan two lines when semantically only one has changed. Compare that with the following output, where the dictionary uses trailing commas:
--- test.py
+++ test2.py
@@ -1,4 +1,5 @@
 test_dict = {
     "Python": "Fun",
     "Vala": "Fresh",
+    "Scala": "Esoteric",
 }
Enhancements
I can think of quite a few enhancements I'd like to see made to Python, and I notice that other people feel the same about some of them as they are going to be made for Python 3000. Makes me feel like I'm mostly heading in the right direction.
The first that springs to mind is attribute access in print "functions". Not only is print a function in the Python 3000 betas, but the new str.format method handles attribute access in a cool way, removing a lot of the excuses for the magic often seen following the % in current print statements.
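A quick sketch of what that attribute access looks like in a format string. The Release record and the describe helper are hypothetical, chosen only to have some attributes to look up:

```python
from collections import namedtuple

# Hypothetical record type, present only to supply attributes
Release = namedtuple("Release", "name version")


def describe(release):
    """Format a record using attribute lookups inside the format string."""
    # The %-style equivalent needs the lookups spelled out by the caller:
    #   "%s is at version %s" % (release.name, release.version)
    return "{0.name} is at version {0.version}".format(release)
```

Calling describe(Release("example", "1.0")) gives "example is at version 1.0", with the object passed once and the field names kept next to the text they fill.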
And most importantly to me, there are function annotations [PEP 3107], although I suspect their open-ended implementation will limit take-up and usage. At the same time, the open-ended design allows them to be used simply as developer documentation of type signatures without breaking duck typing, and that is a good thing in itself.
I'm sure there will be times when the advantages of breaking duck typing far outweigh the costs, and instead of the usual clutter of isinstance calls throughout a function we can just wrap a type-checking-by-introspection decorator around a chunk of code (even if just for debug sessions).
An example for Python 3.0a1, oversimplified for clarity so that it checks only positional arguments and return types, could be implemented like this:
from functools import reduce
from inspect import getfullargspec
from operator import mul

def type_check(function):
    def wrap(*args):
        argspec = getfullargspec(function)
        arg_order = argspec[0]
        annos = argspec[6]
        for pos, arg in enumerate(arg_order):
            if not isinstance(args[pos], annos[arg]):
                raise TypeError("Invalid type for `%s' arg `%s' (should be %s)"
                                % (arg, args[pos], annos[arg]))
        result = function(*args)
        if not isinstance(result, annos['return']):
            raise TypeError("Invalid type for return value `%s'(should be %s)"
                            % (result, annos['return']))
        return result
    return wrap

@type_check
def fact(n: int) -> int:
    return 1 if n <= 1 else reduce(mul, range(2, n+1))

@type_check
def sum(x: int, y: int) -> int:
    return x + y
When fact and sum are used we get the following results:
>>> print(fact(4))
24
>>> print(fact("test"))
Traceback (most recent call last):
...
TypeError: Invalid type for `n' arg `test' (should be <type 'int'>)
>>> print(sum(2, 2))
4
>>> print(sum(4, []))
Traceback (most recent call last):
...
TypeError: Invalid type for `y' arg `[]' (should be <type 'int'>)
People following along will notice there is a serious bug in all the factorial generating functions so far, and that is they produce invalid results when n < 0. The @type_check decorator may have fixed the bug that was allowing us to pass in floats, but it isn't the complete solution yet. However, it would be a piece of cake to define a subclass of int we can use in our annotations that allows us to check for positive integers and raise errors on negative values.
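One possible reading of that int subclass is sketched below for current Python 3. It is an assumption about how such a class could cooperate with an isinstance-based decorator, not the only way to do it: the Natural and NaturalMeta names are hypothetical, and the trick is a metaclass __instancecheck__ so that plain non-negative ints pass the check without being converted first. A cut-down type_check is repeated so the block stands alone:

```python
from functools import reduce, wraps
from inspect import getfullargspec
from operator import mul


class NaturalMeta(type):
    """Make isinstance(x, Natural) true for any non-negative int."""
    def __instancecheck__(cls, value):
        return isinstance(value, int) and value >= 0


class Natural(int, metaclass=NaturalMeta):
    """Annotation-only marker: an int that is zero or greater."""


def type_check(function):
    """Cut-down decorator: checks positional arguments and the return value."""
    spec = getfullargspec(function)

    @wraps(function)
    def wrap(*args):
        for name, value in zip(spec.args, args):
            if not isinstance(value, spec.annotations[name]):
                raise TypeError("Invalid type for `%s' arg `%s'" % (name, value))
        result = function(*args)
        if not isinstance(result, spec.annotations["return"]):
            raise TypeError("Invalid type for return value `%s'" % (result,))
        return result
    return wrap


@type_check
def fact(n: Natural) -> Natural:
    return 1 if n <= 1 else reduce(mul, range(2, n + 1))
```

With this in place fact(4) still returns 24, while fact(-1), fact(2.5) and fact("test") all raise TypeError before the function body runs.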
I know that a lot of heavy Python users seem to have a dislike for explicit type checking, but as the factorial example shows there are a lot of times when it is the simplest solution. It is also a solution that doesn't drastically reduce readability, and I believe the code has much higher clarity than the current Python fix that would require us to dump all our validity checks inside of the function body.
Silver scope sugar
And I have another issue of syntactic sugar. As I said above, I'm a big fan of decorators, and I make significant use of both @staticmethod and @classmethod on a regular basis. The sugary niggle I have is this: when it is considered bad taste not to use self and cls in instance and class methods, why can't the naming guidelines be enforced and the scope of methods be automatically deduced? I don't believe it breaks the "explicit is better than implicit" mantra, because you're specifying the use in the method signature already by using self and cls as the first argument.
Reread that carefully: I'm absolutely not suggesting implicit self or other such nonsense, as I truly believe its current explicitness dramatically decreases the effort for visual scope analysis. I'm only suggesting some mild syntactic sugar to remove the duplication of information in instance, class and static methods. That being said, the current practice with decorators is definitely a lot better than the previous method of defining a method's scope with a reassignment such as method = staticmethod(method) buried some place in the code.
class Test:
    def squared(self):
        # instance method, because of self
        pass

    def get_instances(cls):
        # class method, because of cls
        pass

    def from_string(string):
        # static method, because string is neither self nor cls
        pass
The only marginally common case I'm aware of that breaks the self convention is Conary build recipes, where r is the preferred name for the instance reference. And trust me, being a big Conary user, there is no great advantage in either increased clarity or saved time. I might even argue the opposite is true, because one becomes accustomed to looking for the standard names and occasionally misses the purpose of r in build recipes.
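The deduction described in this section can be approximated today with a class decorator. The sketch below is for current Python 3, and auto_scope is a hypothetical helper rather than anything Python provides. It only inspects plain functions defined directly on the class, so special cases such as __new__ would need extra care:

```python
from inspect import getfullargspec, isfunction


def auto_scope(klass):
    """Wrap plain functions in ``klass`` according to their first argument name."""
    for name, attr in list(vars(klass).items()):
        if not isfunction(attr):
            continue  # leave properties, existing classmethods and so on alone
        args = getfullargspec(attr).args
        first = args[0] if args else None
        if first == "cls":
            setattr(klass, name, classmethod(attr))
        elif first != "self":
            setattr(klass, name, staticmethod(attr))
    return klass


@auto_scope
class Test:
    instances = []

    def __init__(self, value):
        self.value = value
        Test.instances.append(self)

    def squared(self):
        # instance method, because of self
        return self.value ** 2

    def get_instances(cls):
        # class method, because of cls
        return list(cls.instances)

    def from_string(string):
        # static method, because string is neither self nor cls
        return Test(int(string))
```

The method bodies are filled in only so there is something to call. Test.from_string("4").squared() returns 16 without a single @staticmethod or @classmethod line in the class.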
Conclusions
It's been a fun year, no really it has. I picked Python especially because I don't use it, and it has led to some tangible results. The whole purpose of the experiment was to engage myself in some challenges I don't normally come across and develop some new skills along the way, and I believe I've achieved that.
I've come to love Python for its simplicity and expressiveness, and the only irritations I've found over time are easily overcome. In most cases they are actively being overcome in the Python 3000 process.
I'm absolutely sure that I'll keep using Python now, and not only because Nokia provide a build for my phone, as has been the case in the past. I guess I'll have to wait another year to see if it manages to displace any of the other languages from the graph above; I do suspect it may.
I'm planning on repeating the experiment this year, but I'm not sure what language to pick. Part of me wants to say "choose Scala", but unfortunately many of the architectures I use don't have a usable JRE available. I'd like to try F# on Mono, but the runtime requirements seem a little too high for comfort for the simple tasks. I'm open to suggestions for the new direction, and you can drop me a mail to make them. Or even to abuse me if you find an error in something I've written here, or can offer me a better way to accomplish one of the tasks I've described above.