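Python functions are compiled to bytecode, which is stored as a byte string in the function's __code__.co_code attribute. The exact bytes depend on the Python version; the output below is from one particular CPython release.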
def s():
    a = 1
    return a
s.__code__.co_code
b'd\x01}\x00|\x00S\x00'
for b in s.__code__.co_code:
    print(b, end=' ')
100 1 125 0 124 0 83 0
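The integers printed above are opcodes and their arguments. As a minimal sketch, the standard dis module can pair each numeric opcode with its symbolic name. The opcode numbers and names differ between CPython releases, so no expected output is shown here, and the list name instructions is only illustrative.

```python
import dis


def s():
    a = 1
    return a


# Collect (numeric opcode, symbolic name, argument) for every instruction.
# The exact numbers and names vary between CPython versions.
instructions = [
    (ins.opcode, ins.opname, ins.arg) for ins in dis.get_instructions(s)
]

for opcode, opname, arg in instructions:
    print(opcode, opname, arg)
```

Calling dis.dis(s) prints a similar listing in a more readable layout.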
How did the byte string get converted to integers? Let's take it apart.
ord('d')
100
ord('\x01')
1
ord('}')
125
ord('\x00')
0
ord('|')
124
ord('\x00')
0
ord('S')
83
ord('\x00')
0
Each byte has a character (string) representation. Iterating over a byte string yields each byte as an integer, and ord() maps the corresponding character back to that same integer. Non-printing bytes, such as 0 and 1, are displayed as hexadecimal escape sequences: the prefix '\x' indicates that the next two characters are hexadecimal digits.
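The relationship between a byte string and its integers can be checked directly, without calling ord() on each character. The following sketch reuses the byte string shown earlier as a literal; the variable names are only illustrative.

```python
code = b'd\x01}\x00|\x00S\x00'

# Iterating over (or indexing) a bytes object yields integers in range(256).
as_ints = list(code)
print(as_ints)             # [100, 1, 125, 0, 124, 0, 83, 0]

# bytes() accepts an iterable of integers and rebuilds the same byte string.
rebuilt = bytes(as_ints)
print(rebuilt == code)     # True

# Indexing returns an int, while slicing returns a bytes object.
print(code[0], code[0:1])  # 100 b'd'
```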
We can convert decimal numbers to hexadecimal using the hex() function.
hex(100)
'0x64'
chr(100)
'd'
chr(0x64)
'd'
ord('d')
100
ord('\x64')
100
Thus, the ASCII code for 'd' is decimal 100, which is hexadecimal 64 (written 0x64 in Python).
The character for code 100 is 'd', and '\x64' is simply another way of writing that same character.
Note that 0x64 is an integer literal written in hexadecimal, whereas '\x64' is a one-character string.
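These conversions can be collected in one place. The sketch below uses only built-in functions; the variable n is illustrative.

```python
n = 100

# hex() returns a string; int(..., 16) parses a hexadecimal string back.
print(hex(n))             # 0x64
print(int('0x64', 16))    # 100
print(int('64', 16))      # 100

# 0x64 is an integer literal, so it compares equal to 100.
print(0x64 == n)          # True

# '\x64' is a string escape, so it compares equal to the character 'd'.
print('\x64' == chr(n))   # True

# format() gives the hexadecimal digits without the '0x' prefix.
print(format(n, '02x'))   # 64
```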
The bytes() function converts a string to a byte string using the encoding given as its second argument. See Byte sequence types. See also the Python encodings guide.
bytes('this is a test', 'UTF-8')
b'this is a test'
bytes('this is a test', 'ASCII')
b'this is a test'
bytes('résumé', 'ASCII')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
/tmp/ipykernel_438389/880225973.py in <module>
----> 1 bytes('résumé', 'ASCII')

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)
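When an encoding cannot represent a character, the errors argument of str.encode() selects what happens instead of raising. The following is a minimal sketch of the standard error handlers; the names text and results are illustrative, and the string is written with \u00e9 escapes so the example does not depend on how the source file itself is encoded.

```python
text = "r\u00e9sum\u00e9"  # 'résumé'

# The default handler is 'strict', which raises UnicodeEncodeError.
try:
    text.encode("ascii")
    raised = False
except UnicodeEncodeError:
    raised = True
print(raised)  # True

# Other handlers drop or substitute the characters that cannot be encoded.
results = {}
for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
    results[handler] = text.encode("ascii", errors=handler)
    print(handler, results[handler])
```

All of these handlers except 'strict' lose or alter information, so an encoding that can represent the text, such as UTF-8, is usually the better choice.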
bytes('résumé', 'UTF-8')
b'r\xc3\xa9sum\xc3\xa9'
"résumé".encode("utf-8")
b'r\xc3\xa9sum\xc3\xa9'
"El Niño".encode("utf-8")
b'El Ni\xc3\xb1o'
bytes("El Niño", "utf-8")
b'El Ni\xc3\xb1o'
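As the outputs above suggest, str.encode() and bytes() with an encoding argument produce the same result, and bytes.decode() reverses the conversion. The sketch below also shows that the bytes depend on the chosen encoding; the variable names are illustrative.

```python
text = "El Ni\u00f1o"  # 'El Niño'

utf8 = text.encode("utf-8")
print(utf8 == bytes(text, "utf-8"))   # True

# Decoding with the same encoding recovers the original string.
print(utf8.decode("utf-8") == text)   # True

# Latin-1 stores the same character in a single byte.
latin1 = text.encode("latin-1")
print(latin1)                         # b'El Ni\xf1o'

# Decoding with the wrong encoding does not fail here, but gives wrong text.
garbled = utf8.decode("latin-1")
print(garbled == text)                # False
```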
import locale
locale.getpreferredencoding()
'UTF-8'
ibrow = "🤨"
len(ibrow)
1
ibrow
'🤨'
ibrow.encode("utf-8")
b'\xf0\x9f\xa4\xa8'
len(ibrow.encode("utf-8"))
4
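The difference between the two lengths is that len() on a string counts code points, while len() on the encoded result counts bytes. As a sketch, the snippet below prints the code point of the same character and its size in a few encodings; the names ibrow_codepoint and sizes are illustrative.

```python
ibrow = "\U0001F928"  # the same character as above, written as an escape

# A string of length 1 holds a single code point.
ibrow_codepoint = ord(ibrow)
print(len(ibrow), hex(ibrow_codepoint))  # 1 0x1f928

# The number of bytes depends on the encoding and on the character.
sizes = {}
for ch in ("a", ibrow):
    for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
        sizes[(ch, encoding)] = len(ch.encode(encoding))

print(sizes[("a", "utf-8")], sizes[("a", "utf-16-le")], sizes[("a", "utf-32-le")])
# 1 2 4
print(sizes[(ibrow, "utf-8")], sizes[(ibrow, "utf-16-le")], sizes[(ibrow, "utf-32-le")])
# 4 4 4
```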
greekalphabet = 'αβγδεζηθικλμνξοπρςστυφχψ'
greekalphabet
'αβγδεζηθικλμνξοπρςστυφχψ'
print(greekalphabet)
αβγδεζηθικλμνξοπρςστυφχψ
bytes(greekalphabet, 'UTF-8')
b'\xce\xb1\xce\xb2\xce\xb3\xce\xb4\xce\xb5\xce\xb6\xce\xb7\xce\xb8\xce\xb9\xce\xba\xce\xbb\xce\xbc\xce\xbd\xce\xbe\xce\xbf\xcf\x80\xcf\x81\xcf\x82\xcf\x83\xcf\x84\xcf\x85\xcf\x86\xcf\x87\xcf\x88'
len(greekalphabet)
24
len(bytes(greekalphabet, 'UTF-8'))
48
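The 24 characters occupy 48 bytes because UTF-8 encodes each of these Greek letters in two bytes. UTF-8 is a variable-width encoding, which the following sketch illustrates with one character from each width class. The helper utf8_width and the sample characters are illustrative choices, not part of the original notebook.

```python
def utf8_width(ch):
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


# 'a', e with acute accent, Greek alpha, euro sign, raised-eyebrow emoji
samples = ["a", "\u00e9", "\u03b1", "\u20ac", "\U0001F928"]
widths = [utf8_width(ch) for ch in samples]
print(widths)  # [1, 2, 2, 3, 4]

# The byte length of a string is the sum of its characters' widths.
greekalphabet = "αβγδεζηθικλμνξοπρςστυφχψ"
total = sum(utf8_width(ch) for ch in greekalphabet)
print(total == len(greekalphabet.encode("utf-8")))  # True
```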