## Hexadecimal and byte strings


Python functions are compiled to bytecode, which is stored as a byte string in the function's code object.

In [1]:
def s():
    a = 1
    return a

In [2]:
s.__code__.co_code

b'd\x01}\x00|\x00S\x00'

In [3]:
for b in s.__code__.co_code:
    print(b)

100
1
125
0
124
0
83
0
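
The integers printed above are opcodes and their arguments. As a minimal sketch, the standard library's `dis` module (not used in the original text) can decode them into named instructions. The function `s` is repeated from above so the block runs on its own. Opcode numbers and the exact instruction sequence differ between Python versions, so no output is shown here.

```python
import dis

def s():
    a = 1
    return a

# Decode the raw bytecode into named instructions.
# Names and numbering depend on the interpreter version.
instructions = list(dis.get_instructions(s))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# The raw bytes are still available as an ordinary bytes object.
raw = s.__code__.co_code
print(type(raw), len(raw))
```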


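How did the byte string get converted to integers?  Iterating over a byte string yields the integer value of each byte.  Let's take it apart, using <code>ord()</code> on the corresponding characters.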

In [4]:
ord('d')

100

In [5]:
ord('\x01')

1

In [6]:
ord('}')

125

In [7]:
ord('\x00')

0

In [8]:
ord('|')

124

In [9]:
ord('\x00')

0

In [10]:
ord('S')

83

In [11]:
ord('\x00')

0

Each byte has a character (string) representation.  Non-printing bytes, like 0 and 1, are shown as hexadecimal escape sequences: the prefix '\x' indicates that the next two characters are hexadecimal digits.
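
The relationship between a byte string and its integer values can be sketched as follows. The variable names are illustrative, and the byte string is copied from the output above rather than taken from a live code object, since the bytecode itself varies by Python version.

```python
code = b'd\x01}\x00|\x00S\x00'

# Indexing or iterating a bytes object yields integers in range(256).
values = list(code)
print(values)              # [100, 1, 125, 0, 124, 0, 83, 0]

# bytes() accepts an iterable of such integers and rebuilds the byte string.
rebuilt = bytes(values)
print(rebuilt == code)     # True

# Printable ASCII bytes are shown as characters;
# most other values appear as \xNN escapes.
print(bytes([100]))        # b'd'
print(bytes([1]))          # b'\x01'

# Slicing, unlike indexing, returns a bytes object.
print(code[0], code[0:1])  # 100 b'd'
```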

We can convert decimal numbers to hexadecimal using the <code>hex()</code> function.

In [12]:
hex(100)

'0x64'

In [13]:
chr(100)

'd'

In [14]:
chr(0x64)

'd'

In [15]:
ord('d')

100

In [16]:
ord('\x64')

100

Thus, the ASCII code for 'd' is decimal 100, which is hexadecimal 0x64.

The character for code 100 is 'd'.  The escape sequence '\x64' denotes that same character, so its code is 100 as well.

Note the difference in type: 0x64 is an integer written in hexadecimal, while '\x64' is a one-character string.
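
The conversions above can be collected in one place. This is a small sketch using only built-ins; `bytes.hex()` and `bytes.fromhex()` are not mentioned in the text above and are included as a convenient way to move between byte strings and hexadecimal text.

```python
n = 100

# Integer <-> hexadecimal text
print(hex(n))                 # 0x64
print(format(n, '02x'))       # 64
print(int('64', 16))          # 100
print(0x64 == 100)            # True

# Integer <-> character
print(chr(n), ord('d'))       # d 100
print('\x64' == 'd')          # True

# Byte string <-> hexadecimal text
data = b'd\x01}\x00'
hex_text = data.hex()
print(hex_text)                         # 64017d00
print(bytes.fromhex(hex_text) == data)  # True
```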

The <code>bytes()</code> constructor converts a string to a byte string using the encoding you name. See <a target=bb href="https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview">Binary sequence types</a>.

See also <a target=qq href="https://realpython.com/python-encodings-guide/">Python encodings guide</a>.

In [24]:
bytes('this is a test', 'UTF-8')

b'this is a test'

In [35]:
bytes('this is a test', 'ASCII')

b'this is a test'

In [37]:
bytes('résumé', 'ASCII')

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)

In [36]:
bytes('résumé', 'UTF-8')

b'r\xc3\xa9sum\xc3\xa9'
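
The ASCII codec fails because 'é' (code 0xe9) lies outside the 7-bit ASCII range, while UTF-8 encodes it as two bytes. When ASCII output is required anyway, the `errors` argument of `str.encode()` selects a fallback. The sketch below writes the accented characters with `\u` escapes so that it does not depend on the encoding of the source file.

```python
word = 'r\u00e9sum\u00e9'   # the word with e-acute in positions 1 and 5

# Drop characters that ASCII cannot represent.
print(word.encode('ascii', errors='ignore'))            # b'rsum'

# Substitute a question mark for each unencodable character.
print(word.encode('ascii', errors='replace'))           # b'r?sum?'

# Keep the information as a literal backslash escape.
print(word.encode('ascii', errors='backslashreplace'))  # b'r\\xe9sum\\xe9'

# UTF-8 can represent every character, so no fallback is needed.
print(word.encode('utf-8'))                             # b'r\xc3\xa9sum\xc3\xa9'
```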

In [32]:
"résumé".encode("utf-8")

b'r\xc3\xa9sum\xc3\xa9'

In [33]:
"El Niño".encode("utf-8")

b'El Ni\xc3\xb1o'

In [34]:
bytes("El Niño", "utf-8")

b'El Ni\xc3\xb1o'
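
Encoding has an inverse, `bytes.decode()`, which is not shown above. As a minimal sketch, the round trip below recovers the original string, and decoding the same bytes with a different codec (Latin-1 is chosen here purely for illustration) yields garbled text instead of an error.

```python
original = 'El Ni\u00f1o'            # n-tilde written as an escape

encoded = original.encode('utf-8')
print(encoded)                       # b'El Ni\xc3\xb1o'

# Decoding with the same codec restores the string.
decoded = encoded.decode('utf-8')
print(decoded == original)           # True

# Decoding with the wrong codec maps each byte to its own character.
wrong = encoded.decode('latin-1')
print(len(original), len(wrong))     # 7 8

# The damage is reversible as long as no bytes were lost.
repaired = wrong.encode('latin-1').decode('utf-8')
print(repaired == original)          # True
```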

In [38]:
import locale
locale.getpreferredencoding()

'UTF-8'

In [39]:
ibrow = "🤨"
len(ibrow)

1

In [40]:
ibrow

'🤨'

In [42]:
ibrow.encode("utf-8")

b'\xf0\x9f\xa4\xa8'

In [43]:
len(ibrow.encode("utf-8"))

4
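
One character, four bytes: the string length counts code points, while the encoded length counts bytes. The sketch below reassembles the code point from the four UTF-8 bytes by hand, using the standard bit layout of a four-byte UTF-8 sequence. The character is written as a `\U` escape so the block does not depend on the encoding of the source file.

```python
ibrow = '\U0001F928'           # same character as above
raw = ibrow.encode('utf-8')

print(len(ibrow), len(raw))    # 1 4
print(hex(ord(ibrow)))         # 0x1f928

# A four-byte UTF-8 sequence has the layout
# 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx; the x bits form the code point.
b0, b1, b2, b3 = raw
code_point = ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) \
           | ((b2 & 0x3F) << 6) | (b3 & 0x3F)
print(hex(code_point))         # 0x1f928

# Other encodings use different byte counts in general;
# for this character all three happen to need four bytes.
for codec in ('utf-8', 'utf-16-le', 'utf-32-le'):
    print(codec, len(ibrow.encode(codec)))
```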

In [48]:
greekalphabet = 'αβγδεζηθικλμνξοπρςστυφχψ'

In [49]:
greekalphabet

'αβγδεζηθικλμνξοπρςστυφχψ'

In [50]:
print(greekalphabet)

αβγδεζηθικλμνξοπρςστυφχψ


In [51]:
bytes(greekalphabet, 'UTF-8')

b'\xce\xb1\xce\xb2\xce\xb3\xce\xb4\xce\xb5\xce\xb6\xce\xb7\xce\xb8\xce\xb9\xce\xba\xce\xbb\xce\xbc\xce\xbd\xce\xbe\xce\xbf\xcf\x80\xcf\x81\xcf\x82\xcf\x83\xcf\x84\xcf\x85\xcf\x86\xcf\x87\xcf\x88'

In [52]:
len(greekalphabet)

24

In [53]:
len(bytes(greekalphabet, 'UTF-8'))

48
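
Each Greek letter takes two bytes in UTF-8, which is why 24 characters become 48 bytes. More generally, UTF-8 uses one to four bytes per character. The helper `utf8_lengths` below is a hypothetical name introduced for illustration, and the sample characters are written as escapes so the block does not depend on the encoding of the source file.

```python
def utf8_lengths(text):
    """Return (character, UTF-8 byte count) pairs for each character."""
    return [(ch, len(ch.encode('utf-8'))) for ch in text]

# d, e-acute, Greek alpha, euro sign, raised-eyebrow emoji
sample = 'd\u00e9\u03b1\u20ac\U0001F928'

for ch, n in utf8_lengths(sample):
    print(f'U+{ord(ch):04X} {n}')
# U+0064 1
# U+00E9 2
# U+03B1 2
# U+20AC 3
# U+1F928 4

print(len(sample), len(sample.encode('utf-8')))   # 5 12
```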