{"id":38,"date":"2007-11-30T10:12:08","date_gmt":"2007-11-30T12:12:08","guid":{"rendered":"http:\/\/www.ragestorm.net\/blogs\/?p=38"},"modified":"2007-11-30T10:12:08","modified_gmt":"2007-11-30T12:12:08","slug":"python-converting-signedness","status":"publish","type":"post","link":"https:\/\/www.ragestorm.net\/blogs\/?p=38","title":{"rendered":"Python: Converting Signedness"},"content":{"rendered":"<p>Many times I encounter Python modules that were written in C\/C++ which when you use them you get to a really annoying problem. Integers that that module returns are all <em>unsigned<\/em>. That&#8217;s really a problem, because Python\u00a0longs are (potentially) inifinite (they are digit-strings) AFAIK, unlike C\/C++\u00a0integers that\u00a0have a limit of, usually, 32 or 64 bits.<\/p>\n<p>If you pass back -1 to Python and format it as unsigned long, you will get 2**32-1. Now suppose you really want to treat the number inside your Python code as signed integer, you will have to convert 2**32-1 to -1 somehow. Most solutions I saw to this problem was to use struct in the following manner:<\/p>\n<p>import struct<br \/>\nstruct.unpack(&#8220;l&#8221;, struct.pack(&#8220;&gt;L&#8221;, 2**32-1))[0]<\/p>\n<p>\u00a0Packs an unsigned long value of 0xffffffff to (signed) long -1 (using little endian, but I don&#8217;t care now about it).<\/p>\n<p>You might want to use unsigned long long &#8211; that&#8217;s 64 bits integers in order to convert a bigger number. So you can do one of two things, convert your 32 bits integer to 64 bits integer by sign extending (and that&#8217;s another whole challenge)\u00a0it and stuff it into unpack\/pack of 64 bits, or test the size of the integer (by how many bits it takes) and call the correct unpack\/pack pair.<\/p>\n<p>It was then that I realized why I shouldn&#8217;t use this ugly trick. It simply doesn&#8217;t support Python&#8217;s longs. As I said earlier they are infinite and using this trick you are capped to 64 bits. So I thought of a better way, and not using any stdlib module at all, leaving it pure Python code&#8230;<\/p>\n<p>The way we know to change the signedness of an integer is by negating it, which is NOTing all bits of that number and incrementing it by 1, right? (2&#8217;s complement)\u00a0Well true and that should work:<\/p>\n<p>(Say we work with 8 bits integers)<\/p>\n<p>0xfb = -5<\/p>\n<p>&gt;&gt;&gt; ~0xfb<br \/>\n-252<br \/>\n&gt;&gt;&gt; ~0xfb+1<br \/>\n-251<br \/>\n&gt;&gt;&gt; 256-251<br \/>\n5<br \/>\n&gt;&gt;&gt; 0-5<br \/>\n-5<\/p>\n<p>The way it works is: -(256\u00a0+ (~x+1)) where x is a byte. For every integer we need to scan for its most significant bit&#8230; We can do it the other way around with the following formula (x &#8211; NEXT_MSB(x)):<\/p>\n<p>&gt;&gt;&gt; 0xfb &#8211; 0x100<br \/>\n-5<\/p>\n<p>This way it&#8217;s like we changed the sign bit of the integer and fixed the number as well. Both formulas can work for all integers&#8217; sizes. But the key here is to find the MSB. I prefered to stick\u00a0to the latter formula rather than the former, since it seems to be shorter. But it doesn&#8217;t really matter, both work.<\/p>\n<p>So now we have to code it, and as you should know me already &#8211; in one-liner device! The first challenge is to find the MSB, something like this should suffice in C(!):<\/p>\n<p>for (int i = 7; i &gt;= 0; i&#8211;)<br \/>\n\u00a0 if (x &amp; (1 &lt;&lt; i)) break;<\/p>\n<p>This will work for a byte integer, and note that we must start from the <em>end<\/em> towards the <em>start<\/em>. Otherwise we won&#8217;t find the MSB but the LSB. The problem in Python is that we don&#8217;t know the size of the longs we mess with and we need to come up with a trick to find its MSB.<\/p>\n<p>The lame trick I used for converting a number into decimal ASCII string, will be used here too and it goes like this:<\/p>\n<p>for i in xrange(2**32):<br \/>\n\u00a0if (i \/ (1 &lt;&lt; i)):<br \/>\n\u00a0 break<\/p>\n<p>We try to divide the input number by 2, 4, 8, 16, 32, &#8230; and when the result is 0, we know that we are out of bits. I said it&#8217;s lame because we use division, which is slow. If you got any other idea write to me please.<\/p>\n<p>Another drawback is the limit of the numbers we scan, we are limited to 2**32, this is huge enough and I guess you will never reach that, or I will be dead first prolly :o.\u00a0Using Erez&#8217;s trick (see <a href=\"http:\/\/www.ragestorm.net\/blogs\/?p=31#comment-615\">here<\/a>), we can make it a bit more elegant and stop as soon as the MSB was found.<\/p>\n<p>I am not sure whether you noticed, but supplying an input of a negative\u00a0number isn&#8217;t a smart move, we will have to check for it specifically. Eventually this is the code I came up with:<\/p>\n<p>(Note that the input &#8220;bits-stream&#8221; can be any size)<\/p>\n<p>def signed(n):<br \/>\n\u00a0return n if n &lt; 0 else n &#8211; [i for i in (2**j if n\/(2**(j-1)) else iter(()).next() for j in xrange(2**31-1))][-1]<\/p>\n<p>&gt;&gt;&gt; signed(0xb)<br \/>\n-5<br \/>\n&gt;&gt;&gt; signed(0xfb)<br \/>\n-5<br \/>\n&gt;&gt;&gt; signed(0xffffb)<br \/>\n-5<br \/>\n&gt;&gt;&gt; signed(0xffffffffffffffffffffffffb)<br \/>\n-5L\u00a0<\/p>\n<p><font size=\"-5\">GZ-LN<\/font><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many times I encounter Python modules that were written in C\/C++ which when you use them you get to a really annoying problem. Integers that that module returns are all unsigned. That&#8217;s really a problem, because Python\u00a0longs are (potentially) inifinite (they are digit-strings) AFAIK, unlike C\/C++\u00a0integers that\u00a0have a limit of, usually, 32 or 64 bits. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":""},"categories":[21,9],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pbWKd-C","_links":{"self":[{"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/38"}],"collection":[{"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=38"}],"version-history":[{"count":0,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/38\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=38"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=38"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=38"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}