A few ‘bits’ of python
Posted: June 13, 2013 Filed under: bitwise, python Leave a comment »Bit twiddling has always mesmerized me, or rather, watching others manipulate raw data (at the lowest level possible) is what was so mesmerizing. This is mostly because it seems magical to me. Not coming from a CS degree background I’ve always been attracted to abstractions and libraries that shield me from dealing with system-level data and components.
A recent project required me to take a text file full of csv records and compress each one into a six byte structure.
Legend:
| = byte separator
. = one bit
<----> = space consumption of numeric component
Binary structure of one record:
<--------------------><--------------------> <------>
|........|........|........|........|........|........|
As you can see, there are two numeric components that take up to 2.5 bytes (20 bits) of space and one 8-bit component. One example csv record looked something like this:
129,-42,54038
Each number (component) expressed in binary using the built-in python function bin() looks like this:
>>> bin(129)
'0b10000001'
>>> bin(-42)
'-0b101010'
>>> bin(54038)
'0b1101001100010110'
Now, try running this line of code in your interpreter:
>>> bin(42)
I’ll wait while you do that…
I’m still waiting.
…
Ok, did you notice how the result of bin(-42) from earlier looks just like the result for bin(42) just with a leading negative sign (-)? That’s actually really unhelpful in visualizing the end result. What would be really nice to have seen in the beginning is the number represented in “two’s compliment binary”. For reasons I don’t fully understand python differs in its visual representation from its actual implementation for this function’s output. An optional parameter to the bin function that specified “two’s compliment” would be awesome, but it doesn’t exist. So, I wrote some that would show the “two’s compliment” representation of negative numbers:
>>> bits(3, 8)
'00000011'
>>> bits(-3, 8)
'11111101'
Compare the result above with the built-in:
>>> bin(-3)
-0b11
Here’s the implementaion of my bits() method, complete with documentation:
def bits(number, size_in_bits):
"""
The bin() function is *REALLY* unhelpful when working with negative numbers.
It outputs the binary representation of the positive version of that number
with a '-' at the beginning. Woop-di-do. Here's how to derive the two's-
complement binary of a negative number:
complement(bin(+n - 1))
`complement` is a function that flips each bit. `+n` is the negative number
made positive.
"""
if number < 0:
return compliment(bin(abs(number) - 1)[2:]).rjust(size_in_bits, '1')
else:
return bin(number)[2:].rjust(size_in_bits, '0')
def compliment(value):
return ''.join(COMPLEMENT[x] for x in value)
COMPLEMENT = {'1': '0', '0': '1'}
So, now that I could see the bits as they really were I came up with the following description of the scenario. Take each number visualized using the available space:
>>> bits(129, 8)
'10000001'
>>> bits(-42, 20)
'11111111111111010110'
>>> bits(54038, 20)
'11111111111110010001'
No concatenate them in a 48-bit, or 6-byte structure, with bits(54038, 20) ocupying the left-most position, followed by bits(-42, 20), then finally bits(129, 8):
<--------------------><--------------------> <------>
|........|........|........|........|........|........|
Expected: 11111111 11111001 00011111 11111111 11010110 10000001
So now, what we need is a sequence of bitwise operations that will produce the 6-byte structure from the 8-bit component and the 2 20-bit components.
Let’s start simple.
We are going to need to do some bit shifting in order to get the 20-bit components to the left of the 8-bit component, which needs to be on the far right.
So, let’s just work with 129 (8-bits) and -42 (20-bits):
>>> bits(-42 << 8 | 129, 48)
'111111111111111111111111111111111101011010000001'
As you can hopefully see, the 129 takes up the right-most 8 bits. The 129 takes up the rest.
Now, the tricky part is getting the 20 bits from 54038 in there without messing up what we’ve already done. Without any more messing around, here’s the solution diagrammed:
Start with bits(54038):
>>> bits(54038) # showing prettier result for the purpose of illustration
00000000 00000000 00000000 00001111 11111111 10010001
Now shift left 28 bits (<<):
>>> bits(54038 << 28, 48)
00001101 00110001 01100000 00000000 00000000 00000000
Now the tricky part, which makes the next step possible:
>>> bits(54038 << 28 ^ 268435455, 48)
00001101 00110001 01101111 11111111 11111111 11111111
What we just did was a “bitwise exclusive or (^)” using a mask. In this case the mask was 268435455. What’s so special about that number? bits(268435455, 21) shows that in binary, that number is 21 ones in a row. Try removing the bitwise exclusive or using the mask and you’ll see things go bonkers in a hurry.
Now, lets prepare the -42 component:
>>> bits(-42, 48)
11111111 11111111 11111111 11111111 11111111 11010110
Shift it over 8 to make room for the 129 later:
>>> bits(-42 << 8, 48)
11111111 11111111 11111111 11111111 11010110 00000000
Now, add that result into what we had previously using a “bitwise and (&)”:
>>> bits(54038 << 28 ^ 268435455 & -42 << 8, 48)
00001101 00110001 01101111 11111111 11010110 00000000
Can you see both numbers in that structure? Now, add in the 129 on the end (or the beginning if you’re thinking in binary) using a “bitwise or (|)”:
>>> bits(54038 << 28 ^ 268435455 & -42 << 8 | 129, 48)
00001101 00110001 01101111 11111111 11010110 10000001
Phew! Now, we really don’t need the bits function anymore:
>>> result = 54038 << 28 ^ 268435455 & -42 << 8 | 129, 48
Now it was just a matter of creating a function that accepted the 3 numeric inputs and calculated the result using the above operators and mask. The last step was to output the raw bytes of the result:
>>> import struct
>>> b = bytearray(struct.pack('q', result))[:6]
And now, one last sanity check:
# working left to right:
# 00001101 00110001 01101111 11111111 11010110 10000001
>>> bits(b[0], 8)
'10000001'
>>> bits(b[1], 8)
'11010110'
>>> bits(b[2], 8)
'11111111'
>>> bits(b[3], 8)
'01101111'
>>> bits(b[4], 8)
'00110001'
>>> bits(b[5], 8)
'00001101'
As expected, we have 6 bytes that match up with our expected bit array. I must admit that it took quite the reading of the python bitwise operator docs as well as a lot of expirementation once I had created the bits function to find the right order of operations. But my understanding of bitwise operations is much more solid as a result. I’ve posted my bits function as a gist. Enjoy!
pyspecs – my first real open source contribution to the world
Posted: May 31, 2012 Filed under: bdd, python, tdd | Tags: bdd, python, tdd Leave a comment »pyspecs – Minimalistic BDD in Python.
pyspecs is a testing framework that strives to achieve more readable
specifications (tests) by leveraging some fancy syntactic sugar and
auto-discovery of tests/specifications (specs).
Installation is straightforward:
$ pip install pyspecs
or…
$ easy_install pyspecs
or…
$ git clone https://mdwhatcott@github.com/mdwhatcott/pyspecs.git
$ cd pyspecs
$ python setup.py
Writing specs
Specifications are identified by subclassing from the spec class. From there
the idea is then to lay out the specification in steps (given-when-then). The
following steps are available to each subclass of spec as method decorators and
are executed in the order listed:
given – The context for the specification, the initial setup phase.
when – This is where to invoke the action under test.
collect – Allows the aggregation of results for ease when making assertions.
then – This is where assertions are made (more details below) about the results arrived at in the when and collect steps.
after – Analogous to the tearDown method in unit-testing frameworks.
Assertions
The simplest assertion can be made by using the built-in assert statement:
assert 42 == 'The answer the life, the universe and everything'
For readability this project provides a more fluent method for making
assertions:
# These imported names are all synonyms for the class that # provides fluent assertions (Should). Use whichever provides # the best readability. The general patter is: # >>> the([value]).should.[condition_method]([comparison_args]) # or... # >>> the([value]).should_NOT.[condition_method]([comparison_args]) # negated! from pyspecs import the, this, that, it, then this(42).should.equal(42) # this passes this([1, 2, 3]).should.contain(2) # this also passes the(list()).should.be_empty() # passes it(1).should_NOT.be_greater_than(100) # passes # raises AssertionError, caught by framework, logged as failure that(200).should.be_less_than(0)
Example
from pyspecs import spec, given, when, then, the
class simple_addition(spec):
@given
def two_numbers(self):
self.first = 2
self.second = 3
@when
def we_add_them(self):
self.result = add(self.first, self.second)
@then
def the_sum_should_equal_5(self):
the(self.result).should.equal(5)
def add(a, b):
return a + b
Execution of specs
Beyond providing the python library which will be explained below, installation
provides two command-line scripts into the environment, meant to be invoked
from the root of your project. Each will execute all specs in .py files
ending in ‘spec.py’ or ‘specs.py’.
For one-time execution of specs:
$ pyspecs
To begin an auto-test loop (runs all specs anytime a .py file is saved):
$ pyspecs_idle
To increase verbosity (default is ‘dot’):
$ pyspecs --verbosity=story
or…
$ pyspecs_idle --verbosity=story
Output
$ pyspecs --verbosity=story
------------------------------------ Specs ------------------------------------
"simple addition"
given two numbers
when we add them
then the sum should equal 5
---------------------------------- Statistics ----------------------------------
1 specs
1 assertions passed
Duration: 0.081s
(ok)
What now? Please try it out! If you like it but find yourself thinking “I wish that is supported ______” (fill in blank with feature you want) then by all means fork the github repository, hack away (adding tests as you go, if you please) and submit a pull request. Enjoy!
C# LOC
Posted: May 4, 2010 Filed under: python Leave a comment »Here’s a little Python script for counting lines in a c# project:
import os
lines = 0
for root, dirs, files in os.walk(root_directory):
for file in files:
if file.endswith('.cs') and file != 'AssemblyInfo.cs':
with open(root + '/' + file, 'r') as f:
lines += sum(1 for line in f)
print lines
Chained null-coalescing operators
Posted: March 31, 2010 Filed under: C#, refactoring Leave a comment »
public class WorkerClass
{
public ProcessedCandidate ProcessWinner(IList candidates)
{
var winner = SelectWinner(candidates);
if (winner == null)
return null;
return ProcecssWinner(winner);
}
private Candidate SelectWinner(IList candidates)
{
if (SomeComplexConditionIsMet(candidates))
{
// lots of code to select the winning candidate
return winningCandidate;
}
if (ACompletelyDifferentConditionIsMet(candidates))
{
// lots of code to select the winning candidate
return winningCandidate;
}
if (AnEvenMoreComplexAndLessImaginableConditionIsMet(candidates))
{
// even more code to select the winner
return winningCandidate;
}
// etc... (there were about 6 different conditions,
// each with their own way of selecting a winner)
return ambiguousResult;
}
}
public class WorkerClass
{
public ProcessedCandidate ProcessWinner(IList candidates)
{
return ProcessWinner(SelectWinner(candidates));
}
private Candidate SelectWinner(IList candidates)
{
return CheckForWinnerFromFirstCondition(candidates)
?? CheckForWinnerFromSecondCondition(candidates)
?? CheckForWinnerFromThirdCondition(candidates);
}
// small private methods containing condition checking logic
// as well as candidate selection algorithms.
}
Readable ‘if’ statements (impossible you say???)…
Posted: March 20, 2010 Filed under: C#, clean code, refactoring Leave a comment »
private string TheMethodWithConvolutedLogic()
{
if (ConditionA())
return "A";
if (ConditionB())
return "B";
if (ConditionC())
return "C";
return "None of the above"; // 'error' condition
}
private string SomeMethod()
{
return SomeBooleanCondition() ? "Value when true" : "Value when false";
}
private string TheMethodWithConvolutedLogic()
{
return ConditionA() ? "A"
: ConditionB() ? "B"
: ConditionC() ? "C"
"None of the above";
}
Python refactoring with ropevim
Posted: February 27, 2010 Filed under: python, refactoring, vim 3 Comments »I just started using pylint to police my python code for inconsistencies and cruft. Along with this, I’m trying to unify my naming conventions but this is tricky as I haven’t been able to find any tools that would facilitate a great deal of quick and safe renaming operations throughout a project. I just haven’t any free full-blown python IDE’s that provide a rich and varied set of refactorings (please comment if you know of any!).