A few ‘bits’ of python

Bit twiddling has always mesmerized me, or rather, watching others manipulate raw data (at the lowest level possible) is what was so mesmerizing. This is mostly because it seems magical to me. Not coming from a CS degree background I’ve always been attracted to abstractions and libraries that shield me from dealing with system-level data and components.

A recent project required me to take a text file full of csv records and compress each one into a six byte structure.

Legend:
| = byte separator
. = one bit
<----> = space consumption of numeric component

Binary structure of one record:

 <--------------------><--------------------> <------>
|........|........|........|........|........|........|

As you can see, there are two numeric components that take up to 2.5 bytes (20 bits) of space and one 8-bit component. One example csv record looked something like this:

129,-42,54038

Each number (component) expressed in binary using the built-in python function bin() looks like this:

>>> bin(129)
'0b10000001'
>>> bin(-42)
'-0b101010'
>>> bin(54038)
'0b1101001100010110'

Now, try running this line of code in your interpreter:

>>> bin(42)

I’ll wait while you do that…

I’m still waiting.

Ok, did you notice how the result of bin(-42) from earlier looks just like the result for bin(42) just with a leading negative sign (-)? That’s actually really unhelpful in visualizing the end result. What would be really nice to have seen in the beginning is the number represented in “two’s compliment binary”. For reasons I don’t fully understand python differs in its visual representation from its actual implementation for this function’s output. An optional parameter to the bin function that specified “two’s compliment” would be awesome, but it doesn’t exist. So, I wrote some that would show the “two’s compliment” representation of negative numbers:

>>> bits(3, 8)
'00000011'
>>> bits(-3, 8)
'11111101'

Compare the result above with the built-in:

>>> bin(-3)
-0b11

Here’s the implementaion of my bits() method, complete with documentation:

def bits(number, size_in_bits):
    """
    The bin() function is *REALLY* unhelpful when working with negative numbers.
    It outputs the binary representation of the positive version of that number
    with a '-' at the beginning. Woop-di-do. Here's how to derive the two's-
    complement binary of a negative number:

        complement(bin(+n - 1))

    `complement` is a function that flips each bit. `+n` is the negative number
    made positive.

    """
    if number < 0:
        return compliment(bin(abs(number) - 1)[2:]).rjust(size_in_bits, '1')
    else:
        return bin(number)[2:].rjust(size_in_bits, '0')

def compliment(value):
    return ''.join(COMPLEMENT[x] for x in value)

COMPLEMENT = {'1': '0', '0': '1'}

So, now that I could see the bits as they really were I came up with the following description of the scenario. Take each number visualized using the available space:

>>> bits(129, 8)
'10000001'
>>> bits(-42, 20)
'11111111111111010110'
>>> bits(54038, 20)
'11111111111110010001'

No concatenate them in a 48-bit, or 6-byte structure, with bits(54038, 20) ocupying the left-most position, followed by bits(-42, 20), then finally bits(129, 8):

           <--------------------><--------------------> <------>
          |........|........|........|........|........|........|
Expected:  11111111 11111001 00011111 11111111 11010110 10000001

So now, what we need is a sequence of bitwise operations that will produce the 6-byte structure from the 8-bit component and the 2 20-bit components.

Let’s start simple.

We are going to need to do some bit shifting in order to get the 20-bit components to the left of the 8-bit component, which needs to be on the far right.

So, let’s just work with 129 (8-bits) and -42 (20-bits):

>>> bits(-42 << 8 | 129, 48)
'111111111111111111111111111111111101011010000001'

As you can hopefully see, the 129 takes up the right-most 8 bits. The 129 takes up the rest.

Now, the tricky part is getting the 20 bits from 54038 in there without messing up what we’ve already done. Without any more messing around, here’s the solution diagrammed:

Start with bits(54038):

>>> bits(54038)  # showing prettier result for the purpose of illustration
00000000 00000000 00000000 00001111 11111111 10010001

Now shift left 28 bits (<<):

>>> bits(54038 << 28, 48)
00001101 00110001 01100000 00000000 00000000 00000000

Now the tricky part, which makes the next step possible:

>>> bits(54038 << 28 ^ 268435455, 48)
00001101 00110001 01101111 11111111 11111111 11111111

What we just did was a “bitwise exclusive or (^)” using a mask. In this case the mask was 268435455. What’s so special about that number? bits(268435455, 21) shows that in binary, that number is 21 ones in a row. Try removing the bitwise exclusive or using the mask and you’ll see things go bonkers in a hurry.

Now, lets prepare the -42 component:

>>> bits(-42, 48)
11111111 11111111 11111111 11111111 11111111 11010110

Shift it over 8 to make room for the 129 later:

>>> bits(-42 << 8, 48)
11111111 11111111 11111111 11111111 11010110 00000000

Now, add that result into what we had previously using a “bitwise and (&)”:

>>> bits(54038 << 28 ^ 268435455 & -42 << 8, 48)
00001101 00110001 01101111 11111111 11010110 00000000

Can you see both numbers in that structure? Now, add in the 129 on the end (or the beginning if you’re thinking in binary) using a “bitwise or (|)”:

>>> bits(54038 << 28 ^ 268435455 & -42 << 8 | 129, 48)
00001101 00110001 01101111 11111111 11010110 10000001

Phew! Now, we really don’t need the bits function anymore:

>>> result = 54038 << 28 ^ 268435455 & -42 << 8 | 129, 48

Now it was just a matter of creating a function that accepted the 3 numeric inputs and calculated the result using the above operators and mask. The last step was to output the raw bytes of the result:

>>> import struct
>>> b = bytearray(struct.pack('q', result))[:6]

And now, one last sanity check:

# working left to right:
# 00001101 00110001 01101111 11111111 11010110 10000001

>>> bits(b[0], 8)
'10000001'
>>> bits(b[1], 8)
'11010110'
>>> bits(b[2], 8)
'11111111'
>>> bits(b[3], 8)
'01101111'
>>> bits(b[4], 8)
'00110001'
>>> bits(b[5], 8)
'00001101'

As expected, we have 6 bytes that match up with our expected bit array. I must admit that it took quite the reading of the python bitwise operator docs as well as a lot of expirementation once I had created the bits function to find the right order of operations. But my understanding of bitwise operations is much more solid as a result. I’ve posted my bits function as a gist. Enjoy!


pyspecs – my first real open source contribution to the world

pyspecs – Minimalistic BDD in Python.

pyspecs is a testing framework that strives to achieve more readable
specifications (tests) by leveraging some fancy syntactic sugar and
auto-discovery of tests/specifications (specs).

Installation is straightforward:

    $ pip install pyspecs

or…

    $ easy_install pyspecs

or…

    $ git clone https://mdwhatcott@github.com/mdwhatcott/pyspecs.git
    $ cd pyspecs
    $ python setup.py

Writing specs

Specifications are identified by subclassing from the spec class. From there
the idea is then to lay out the specification in steps (given-when-then). The
following steps are available to each subclass of spec as method decorators and
are executed in the order listed:

given – The context for the specification, the initial setup phase.
when – This is where to invoke the action under test.
collect – Allows the aggregation of results for ease when making assertions.
then – This is where assertions are made (more details below) about the results arrived at in the when and collect steps.
after – Analogous to the tearDown method in unit-testing frameworks.

Assertions

The simplest assertion can be made by using the built-in assert statement:

assert 42 == 'The answer the life, the universe and everything'

For readability this project provides a more fluent method for making
assertions:

# These imported names are all synonyms for the class that
# provides fluent assertions (Should). Use whichever provides
# the best readability.  The general patter is:
# >>> the([value]).should.[condition_method]([comparison_args])
#  or...
# >>> the([value]).should_NOT.[condition_method]([comparison_args]) # negated!

from pyspecs import the, this, that, it, then

this(42).should.equal(42) # this passes

this([1, 2, 3]).should.contain(2) # this also passes

the(list()).should.be_empty() # passes

it(1).should_NOT.be_greater_than(100) # passes

# raises AssertionError, caught by framework, logged as failure
that(200).should.be_less_than(0)

Example

from pyspecs import spec, given, when, then, the

class simple_addition(spec):
    @given
    def two_numbers(self):
        self.first = 2
        self.second = 3

    @when
    def we_add_them(self):
        self.result = add(self.first, self.second)

    @then
    def the_sum_should_equal_5(self):
        the(self.result).should.equal(5)

def add(a, b):
    return a + b

Execution of specs

Beyond providing the python library which will be explained below, installation
provides two command-line scripts into the environment, meant to be invoked
from the root of your project. Each will execute all specs in .py files
ending in ‘spec.py’ or ‘specs.py’.

For one-time execution of specs:

    $ pyspecs

To begin an auto-test loop (runs all specs anytime a .py file is saved):

    $ pyspecs_idle

To increase verbosity (default is ‘dot’):

    $ pyspecs --verbosity=story

or…

    $ pyspecs_idle --verbosity=story

Output

$ pyspecs --verbosity=story

------------------------------------ Specs ------------------------------------

"simple addition"
     given two numbers
     when we add them
     then the sum should equal 5

---------------------------------- Statistics ----------------------------------

1 specs
1 assertions passed

Duration: 0.081s

(ok)

What now? Please try it out! If you like it but find yourself thinking “I wish that is supported ______” (fill in blank with feature you want) then by all means fork the github repository, hack away (adding tests as you go, if you please) and submit a pull request. Enjoy!


C# LOC

Here’s a little Python script for counting lines in a c# project:

import os

lines = 0

for root, dirs, files in os.walk(root_directory):
	for file in files:
		if file.endswith('.cs') and file != 'AssemblyInfo.cs':
			with open(root + '/' + file, 'r') as f:
				lines += sum(1 for line in f)

print lines

Chained null-coalescing operators

Ok, I really didn’t mean to submit two posts in a row about chained operators but this was too good to pass up.
After pairing with a colleague for most of the day composing a concise but difficult test suite and finally implementing a small subset of that test suite I was tasked to refactor the production code we had written thus far on my own. Since refactoring is usually my favorite part of coding I was excited to clear the way for the next day when we would implement the rest of the test suite.
This code, like the code from the last post, was for a predefined algorithm that contained a lot of tedious conditional logic (the reason for the hard-to-compose test suite). The desired end result was that one of the conditions would be met allowing a single candidate to be selected from a larger set so that further processing could take place on that selected candidate.
Here’s a sketch of how it looked originally:
public class WorkerClass
{
	public ProcessedCandidate ProcessWinner(IList candidates)
	{
		var winner = SelectWinner(candidates);

		if (winner == null)
			return null;

		return ProcecssWinner(winner);
	}

	private Candidate SelectWinner(IList candidates)
	{
		if (SomeComplexConditionIsMet(candidates))
		{
			// lots of code to select the winning candidate
			return winningCandidate;
		}

		if (ACompletelyDifferentConditionIsMet(candidates))
		{
			// lots of code to select the winning candidate
			return winningCandidate;
		}

		if (AnEvenMoreComplexAndLessImaginableConditionIsMet(candidates))
		{
			// even more code to select the winner
			return winningCandidate;
		}

		// etc... (there were about 6 different conditions,
		// each with their own way of selecting a winner)

		return ambiguousResult;
	}
}
So most of the condition checking and winner selection was happening from inside the SelectWinner method. I started by extracting the conditions and selection code out to private methods but it still left a bunch of ‘if’ statements in the high level algorithm. Then, it came to me – why not chain together a bunch of null-coalescing operators to cleanly call the condition checking methods and ensure that when a winner is selected earlier on, a later winner (if a further condition is satisfied) would not overwrite the previous winner?
Here is the final result:
public class WorkerClass
{
	public ProcessedCandidate ProcessWinner(IList candidates)
	{
		return ProcessWinner(SelectWinner(candidates));
	}

	private Candidate SelectWinner(IList candidates)
	{
		return CheckForWinnerFromFirstCondition(candidates)
			?? CheckForWinnerFromSecondCondition(candidates)
			?? CheckForWinnerFromThirdCondition(candidates);
	}

	// small private methods containing condition checking logic
	// as well as candidate selection algorithms.
}
You can see exactly what is going on at a glance and it will be a cinch to add further conditional logic (although the class might get large enough to break the logic into separate modules/objects). What do you think?

Readable ‘if’ statements (impossible you say???)…

I was pairing with a colleague today as we wrote C# code for a predefined algorithm of a process that needed to be automated. The logic of the process was quite convoluted and at its core involved 3 different outcomes as well as a catch-all instruction to raise an error if none of those outcomes could be reached. That made a total of at least 4 ways to complete that part of the algorithm – we knew we would probably have to start with a bunch of if statements (as was indicated in the docs we were consulting) and try refactoring from there once we had good test coverage and a working implementation.
Here’s a general idea of how the code looked at that point:
private string TheMethodWithConvolutedLogic()
{
	if (ConditionA())
		return "A";

	if (ConditionB())
		return "B";

	if (ConditionC())
		return "C";

	return "None of the above"; // 'error' condition
}
I’ve seen worse-looking code but there was room for improvement. The final outcome of the code didn’t exactly use ‘if’ statements but it did the same thing in a much cleaner, and succinct way. We accomplished it with a chained ternary statement. The simplest use of a ternary statement is like so:
private string SomeMethod()
{
	return SomeBooleanCondition() ? "Value when true" : "Value when false";
}
Here was our application of the chained ternary statement on the first example:
private string TheMethodWithConvolutedLogic()
{
	return ConditionA() ? "A"
		 : ConditionB() ? "B"
		 : ConditionC() ? "C"
		 "None of the above";
}
After staring at that for a few seconds we both concluded, against both of our preconceptions of how ugly a chained ternary statement would be, that the change actually improved readability. The question marks do a good job of showing the question-answer feeling of this code in a single return statement. Do you have any ideas for making this even cleaner?

Python refactoring with ropevim

I just started using pylint to police my python code for inconsistencies and cruft. Along with this, I’m trying to unify my naming conventions but this is tricky as I haven’t been able to find any tools that would facilitate a great deal of quick and safe renaming operations throughout a project. I just haven’t any free full-blown python IDE’s that provide a rich and varied set of refactorings (please comment if you know of any!).

Luckily, I stumbled upon a blogpost including a script that automatically pulled the correct packages from source and provided the appropriate addition to your .vimrc file to get Vim working with the Rope refactoring library via the Ropevim plugin.

The first ‘rename’ operation I tried in my project was successful! The rename was executed on the production side of the code and the tests were updated with the rename and nothing broke! I’d love to hear from others who are using refactoring tools with python code.
[update]: The renaming does have trouble when you want to rename a field on an objects that is passed around into methods in other classes and used there – at that point, the symbol can’t be resolved and the renaming doesn’t take effect. But for class names, method argument names and class variable names it works great!

Follow

Get every new post delivered to your Inbox.