analitics

Pages

Sunday, October 1, 2017

The capstone python module - disassembly framework.

The official python module comes with this info about this python module:
Capstone is a disassembly framework with the target of becoming the ultimate
the disasm engine for binary analysis and reversing in the security community.

Created by Nguyen Anh Quynh, then developed and maintained by a small community,
Capstone offers some unparalleled features:

- Support multiple hardware architectures: ARM, ARM64 (ARMv8), Mips, PPC & X86.

- Having clean/simple/lightweight/intuitive architecture-neutral API.

- Provide details on disassembled instruction (called “decomposer” by others).

- Provide semantics of the disassembled instruction, such as list of implicit
registers read & written.

- Implemented in pure C language, with lightweight wrappers for C++, Python,
Ruby, OCaml, C#, Java and Go available.

- Native support for Windows & *nix platforms (with OSX, Linux, *BSD & Solaris
have been confirmed).

- Thread-safe by design.

- Distributed under the open source BSD license.

Today I tested this python module with python version 2.7.
First I need to use a build of this python module from the official website.
I used binaries 32 bits like my python 2.7 and I tested with pip 2.7:
C:\Python27\Scripts>pip install capstone
Requirement already satisfied: capstone in c:\python27\lib\site-packages
Let's make a simple test with this python module:

C:\Python27>python.exe
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from capstone import (
...     Cs,
...     CS_ARCH_X86,
...     CS_MODE_32,
...     CS_OPT_SYNTAX_ATT,
... )
>>> mode=Cs(CS_ARCH_X86, CS_MODE_32)
>>> mode.syntax = CS_OPT_SYNTAX_ATT
>>> def D_ASM(code):
...     for address, size, mnemonic, op_str in mode.disasm_lite(code, offset=0x08048060):
...         print("0x{0:x}\t{1:d}\t{2:s}\t{3:s}".format(address, size,mnemonic, op_str))
...
>>> D_ASM(b"\xe1\x0b\x40\xb9\x20\x04\x81\xda\x20\x08\x02\x8b")
0x8048060       2       loope   0x804806d
0x8048062       1       incl    %eax
0x8048063       5       movl    $0xda810420, %ecx
0x8048068       2       andb    %cl, (%eax)
It seems to work very well.