ARM Assembly 2. ARM Instructions, GNU Assembler Directives - 吾爱破解 - 52pojie.cn

t7sqynt3 posted on 2020-7-23 06:39


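Last edited by t7sqynt3 on 2020-7-23 06:42

Preface: reverse engineering requires a foundation in assembly, so I am trying to organize what I have learned about ARM assembly so far. My knowledge is limited, so there may be omissions; corrections and discussion are welcome. This series is original work; please credit the source if you quote it. However, I do not recommend citing it in academic research, because the content has not been peer reviewed.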

You should cite this article if you want to use it in your work.

Warning: this article may not be precise and professional, and it is not peer-reviewed.

Professionals are welcome to translate this article, because the author does not know the precise translated terminology.

# 2. ARM Instructions, GNU Assembler Directives

ARM instructions fall into three broad categories:

- Data processing instructions: they operate only on registers (and immediate values), and the result goes to a register.

- Control flow instructions: unconditional or conditional branches, and function calls.

- Data transfer instructions: load contents from memory into registers, and store contents of registers to memory.

The instructions listed below are not comprehensive, but they are the ones most commonly used when writing assembly code. (This may not hold exactly when you reverse engineer programs.)

## Data Processing Instructions

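The general format is `instruction destination_reg, operand1_reg, operand2`.

Operand2 can be a register (r0-r15), a shifted register, or a 32-bit immediate value. In the classic 32-bit ARM (A32) encoding, an immediate must be an 8-bit value, or an 8-bit value rotated right by an even number of bits. For some instructions the assembler can also reach the complement of such a value by substituting a related instruction (for example, `mvn` in place of `mov`).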
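The immediate rule can be checked mechanically. The following is a minimal Python sketch, assuming the A32 encoding (an 8-bit value rotated right by an even amount); the function names `rotate_right32` and `is_encodable_immediate` are made up for illustration and are not part of any assembler.

```python
MASK32 = 0xFFFFFFFF


def rotate_right32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    value &= MASK32
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def is_encodable_immediate(value):
    """Return True if `value` is an 8-bit value rotated right by an even amount."""
    value &= MASK32
    for rotation in range(0, 32, 2):
        # Rotating right by (32 - rotation) is a left rotation by `rotation`,
        # which undoes a right rotation by the same amount.
        undone = rotate_right32(value, 32 - rotation)
        if undone <= 0xFF:
            return True
    return False
```

For example, `0xFF000000` passes the check because it is `0xFF` rotated right by 8, while `0x101` fails because its set bits span 9 bit positions.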

- `mov`: moves a value to a register

`mov destination_reg, source_reg` or `mov destination_reg, #imm8`

When -256 <= #imm8 <= 255, the assembler always succeeds. Otherwise, the value must be producible by shifting, rotating, and/or complementing an 8-bit value. Many values do not meet this requirement, and the assembler reports an error for them.

In assembly code, #imm8 can be written as a bare number, e.g. `1`, or with a `#` prefix, e.g. `#1`. Both are accepted by the GNU assembler for ARM.

- `mvn`: moves the 1's complement (bitwise NOT) of a value to a register

`mvn destination_reg, source_reg` or `mvn destination_reg, #imm8`

`#imm8` is typically a value between 0 and 255. Otherwise, the same restrictions as for `mov` apply.
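To see why every constant from -256 to 255 can be loaded, here is a small Python sketch of `mvn` on 32-bit registers. The helper `pick_move` is hypothetical; it only illustrates the idea that a small negative constant can be expressed as `mvn` of a value in 0..255, which is an assumption about how an assembler might choose, not a description of the GNU assembler's internals.

```python
MASK32 = 0xFFFFFFFF


def mvn(operand):
    """Model of `mvn`: the bitwise complement, truncated to 32 bits."""
    return ~operand & MASK32


def pick_move(value):
    """Hypothetical helper: choose mov or mvn for -256 <= value <= 255."""
    if 0 <= value <= 255:
        return ("mov", value)
    if -256 <= value <= -1:
        # ~value lies in 0..255, so `mvn` with that operand yields `value`.
        return ("mvn", ~value)
    raise ValueError("outside the range covered by this sketch")
```

With this model, `pick_move(-1)` gives `("mvn", 0)`, and `mvn(0)` is `0xFFFFFFFF`, the 32-bit pattern for -1.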

- `add`: add two registers

`add destination_reg, source1_reg, source2_reg` or `add destination_reg, source1_reg, #imm8`

dest = src1 + src2 or dest = src1 + #imm8

- `sub`: subtract two registers

`sub destination_reg, source1_reg, source2_reg` or `sub destination_reg, source1_reg, #imm8`

dest = src1 - src2 or dest = src1 - #imm8

- `rsb`: reverse subtract (operand2 - operand1)

`rsb destination_reg, source1_reg, source2_reg` or `rsb destination_reg, source1_reg, #imm8`

dest = src2 - src1 or dest = #imm8 - src1
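The three arithmetic formulas above can be modelled in a few lines of Python. This is only a sketch: it assumes 32-bit registers whose results wrap around modulo 2^32, and the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def add(src1, src2):
    """dest = src1 + src2, truncated to 32 bits."""
    return (src1 + src2) & MASK32


def sub(src1, src2):
    """dest = src1 - src2, truncated to 32 bits."""
    return (src1 - src2) & MASK32


def rsb(src1, src2):
    """dest = src2 - src1 (operands reversed), truncated to 32 bits."""
    return (src2 - src1) & MASK32
```

One common use of `rsb` is negation: `rsb(5, 0)` computes 0 - 5, which is `0xFFFFFFFB` as a 32-bit pattern.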

- `mul`: multiply two registers

`mul destination_reg, source1_reg, <source2_reg>`

source2_reg is optional. If omitted, dest = dest * src1.

Otherwise, dest = src1 * src2.

Notice that only the lower 32 bits of the result are stored in dest. These low 32 bits are the same whether the operands are interpreted as signed or unsigned.
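A short Python sketch can illustrate the truncation to 32 bits. It assumes 32-bit registers; `to_signed32` is a helper introduced here only to reinterpret a bit pattern as a signed number.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret a 32-bit pattern as a signed two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(src1, src2):
    """dest = low 32 bits of src1 * src2."""
    return (src1 * src2) & MASK32
```

In this model `mul(0x10000, 0x10000)` is 0, because the full product 2^32 does not fit in 32 bits, and multiplying the pattern for -3 by 7 yields the pattern for -21.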

- `and`: bitwise AND of two registers

`and destination_reg, source1_reg, source2_reg`

dest = src1 & src2

- `orr`: bitwise OR of two registers

`orr destination_reg, source1_reg, source2_reg`

dest = src1 | src2

- `eor`: bitwise exclusive OR of two registers (XOR)

`eor destination_reg, source1_reg, source2_reg`

dest = src1 ^ src2

- `bic`: bitwise clear (AND NOT) of two registers

`bic destination_reg, source1_reg, source2_reg`

dest = src1 & ~src2
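The four bitwise formulas translate directly into Python. The sketch below assumes 32-bit registers; `and_` carries a trailing underscore only because `and` is a Python keyword.

```python
MASK32 = 0xFFFFFFFF


def and_(src1, src2):
    """dest = src1 & src2"""
    return (src1 & src2) & MASK32


def orr(src1, src2):
    """dest = src1 | src2"""
    return (src1 | src2) & MASK32


def eor(src1, src2):
    """dest = src1 ^ src2"""
    return (src1 ^ src2) & MASK32


def bic(src1, src2):
    """dest = src1 & ~src2: clear in src1 every bit that is set in src2."""
    return (src1 & ~src2) & MASK32
```

For instance, `bic(0xFF, 0x0F)` clears the low four bits and leaves `0xF0`.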

- `lsl`: logical shift left

`lsl <destination_reg,> source1_reg, source2_reg` or `lsl <destination_reg,> source1_reg, #const`

0 <= #const <= 31

dest = src1 << src2 or dest = src1 << #const

If dest is omitted, the result is stored in src1.

- `lsr`: logical shift right

`lsr <destination_reg,> source1_reg, source2_reg` or `lsr <destination_reg,> source1_reg, #const`

1 <= #const <= 32

dest = (unsigned)src1 >> src2 or dest = (unsigned)src1 >> #const

If dest is omitted, the result is stored in src1.

- `asr`: arithmetic shift right

`asr <destination_reg,> source1_reg, source2_reg` or `asr <destination_reg,> source1_reg, #const`

1 <= #const <= 32

dest = (signed)src1 >> src2 or dest = (signed)src1 >> #const

If dest is omitted, the result is stored in src1.

- `ror`: rotate right

`ror <destination_reg,> source1_reg, source2_reg` or `ror <destination_reg,> source1_reg, #const`

1 <= #const <= 31

Bits shifted out of the low-order end are copied into the high-order bit positions.

If dest is omitted, the result is stored in src1.
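The difference between the four shift operations is easiest to see side by side. The following Python sketch models them for constant shift amounts from 1 to 31, a range that is valid for all four; it assumes 32-bit registers and does not model register-specified shift amounts.

```python
MASK32 = 0xFFFFFFFF


def lsl(src, amount):
    """Logical shift left: zeros enter from the right."""
    return (src << amount) & MASK32


def lsr(src, amount):
    """Logical shift right: zeros enter from the left."""
    return (src & MASK32) >> amount


def asr(src, amount):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    src &= MASK32
    signed = src - (1 << 32) if src & 0x80000000 else src
    return (signed >> amount) & MASK32


def ror(src, amount):
    """Rotate right: bits leaving the right re-enter on the left."""
    src &= MASK32
    return ((src >> amount) | (src << (32 - amount))) & MASK32
```

Shifting `0x80000000` right by 31 gives 1 with `lsr` but `0xFFFFFFFF` with `asr`, which is exactly the unsigned versus signed distinction in the formulas above.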

- `cmp`: compare two values and set condition flags

`cmp source1_reg, source2_reg` or `cmp source1_reg, #imm8`

The requirements for #imm8 are the same as before.

Notice: many ARM CPUs, especially early ones, have no hardware support for division. We do not discuss division here.

## Control Flow Instructions

### Condition Flags

Note: the content of this section ("Condition Flags") and the one below is based on the ARM community post "Condition Codes 1: Condition Flags and Codes" by Jacob Bramley, published September 11, 2013.

| Flag | Explanation                                              |
| ---- | ----------------------------------------------------------- |
| N | set if the result is negative (set to bit 31 of the result) |
| Z | set if the result is zero                               |
| C | set if the result of an unsigned operation overflows; for a subtraction (including `cmp`), set if no borrow occurred. |
| V | set if the result of a signed operation overflows.       |

Dedicated comparison instructions that set the flags: (these are not the only instructions that set the flags)

- `cmp`: works like `subs`, which performs `sub` and sets the condition flags, but `cmp` does not store the result.

- `cmn`: works like `adds`, but does not store the result.

- `tst`: works like `ands`, but does not store the result.

- `teq`: works like `eors`, but does not store the result.
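As an illustration of how the flags come out of a comparison, here is a Python sketch that computes N, Z, C, and V for `cmp` (a subtraction) and `cmn` (an addition) on 32-bit values. The function names are invented for this example. It assumes the usual ARM convention that C after a subtraction means "no borrow".

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by `cmp a, b`, i.e. by computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                     # bit 31 of the result
    z = int(result == 0)
    c = int(a >= b)                      # carry set means no borrow
    # Signed overflow: operands differ in sign and the result's sign differs from a.
    v = ((a ^ b) & (a ^ result)) >> 31
    return n, z, c, v


def cmn_flags(a, b):
    """Return (N, Z, C, V) as set by `cmn a, b`, i.e. by computing a + b."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(total > MASK32)              # carry out of bit 31
    # Signed overflow: operands share a sign and the result's sign differs.
    v = (~(a ^ b) & (a ^ result) & 0x80000000) >> 31
    return n, z, c, v
```

Comparing equal values, as in `cmp_flags(5, 5)`, sets Z and C and leaves N and V clear.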

### Branch instructions

Branch instructions change the order of instruction execution, i.e. they "jump".

For function calls, use branch and link: `bl`. For example, `bl printf`.

For conditional and unconditional branching, use the instruction format `bxx`, where `xx` is one of the codes below:

| **Code**       | **Meaning (for `cmp` or `subs`)**                         | **Flags Tested** |
| ----------------- | ------------------------------------------------------------- | ------------------- |
| `eq`          | Equal.                                                    | `Z==1`          |
| `ne`          | Not equal.                                                 | `Z==0`          |
| `cs` or `hs`    | Unsigned higher or same (or carry set).                   | `C==1`          |
| `cc` or `lo`    | Unsigned lower (or carry clear).                            | `C==0`          |
| `mi`          | Negative. The mnemonic stands for "minus".                | `N==1`          |
| `pl`          | Positive or zero. The mnemonic stands for "plus".          | `N==0`          |
| `vs`          | Signed overflow. The mnemonic stands for "overflow set".    | `V==1`          |
| `vc`          | No signed overflow. The mnemonic stands for "overflow clear". | `V==0`          |
| `hi`          | Unsigned higher.                                           | `(C==1) && (Z==0)`|
| `ls`          | Unsigned lower or same.                                     | `(C==0) || (Z==1)` |
| `ge`          | Signed greater than or equal.                               | `N==V`          |
| `lt`          | Signed less than.                                           | `N!=V`          |
| `gt`          | Signed greater than.                                        | `(Z==0) && (N==V)`|
| `le`          | Signed less than or equal.                                  | `(Z==1) || (N!=V)` |
| `al` (or omitted) | Always executed.                                           | None.             |
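The "Flags Tested" column can be turned into executable form. The Python sketch below evaluates each condition code after a `cmp a, b`; the names `cmp_flags`, `CONDITIONS`, and `branch_taken` are illustrative only, and the flag computation assumes 32-bit operands with C meaning "no borrow" after a subtraction.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by `cmp a, b`."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)
    v = ((a ^ b) & (a ^ result)) >> 31
    return n, z, c, v


# Each entry mirrors one row of the table above.
CONDITIONS = {
    "eq": lambda n, z, c, v: z == 1,
    "ne": lambda n, z, c, v: z == 0,
    "cs": lambda n, z, c, v: c == 1,
    "hs": lambda n, z, c, v: c == 1,
    "cc": lambda n, z, c, v: c == 0,
    "lo": lambda n, z, c, v: c == 0,
    "mi": lambda n, z, c, v: n == 1,
    "pl": lambda n, z, c, v: n == 0,
    "vs": lambda n, z, c, v: v == 1,
    "vc": lambda n, z, c, v: v == 0,
    "hi": lambda n, z, c, v: c == 1 and z == 0,
    "ls": lambda n, z, c, v: c == 0 or z == 1,
    "ge": lambda n, z, c, v: n == v,
    "lt": lambda n, z, c, v: n != v,
    "gt": lambda n, z, c, v: z == 0 and n == v,
    "le": lambda n, z, c, v: z == 1 or n != v,
    "al": lambda n, z, c, v: True,
}


def branch_taken(code, a, b):
    """Would `b<code>` be taken right after `cmp a, b`?"""
    return CONDITIONS[code](*cmp_flags(a, b))
```

The signed and unsigned codes disagree when the top bit is set: after comparing `0xFFFFFFFF` with 1, `hi` is true (a large unsigned value) and `lt` is also true (the same pattern read as -1).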

## Data Transfer Instructions

These instructions are used to move data between CPU and memory. They can transfer bytes, half words (2 bytes), or words (4 bytes), from registers to memory, or from memory to registers.

- `ldr`: load data from a memory address into a register

`ldr destination_reg, source_memory_address`

- `str`: store data from a register to a memory address

`str source_reg, destination_memory_address`

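The memory address operand can be:

- =label (`ldr` only: a pseudo-instruction that loads the address of the label into the register)

- =expression (e.g. =0xffffffff; `ldr` only: loads the value of the expression into the register)

- a base register with an immediate offset, e.g. `[r1, #imm12]`. For word and byte accesses in the A32 encoding, -4095 <= #imm12 <= 4095.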


Suffixes for `ldr` or `str`:

- `ldrb` or `strb`: load or store a byte.

- `ldrh` or `strh`: load or store a halfword.

- `ldr` or `str`: load or store a word.

- `ldrd` or `strd`: load or store a double word. (The first register must be even-numbered, and the pair is that register and the next one: lower word, then upper word.)

Sign extension when loading a byte or halfword:

- `ldrsb`: load a signed byte.

- `ldrsh`: load a signed halfword.

Notice the alignment requirements for halfwords (2-byte), words (4-byte), and double words (8-byte) in memory. Misaligned accesses hurt performance and, depending on the core and its configuration, may fault.
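The effect of the size suffixes and of sign extension can be sketched in Python with a `bytearray` standing in for memory. This model assumes little-endian byte order and a 32-bit destination register; the function names only mirror the mnemonics, and addresses are plain indices into the buffer.

```python
MASK32 = 0xFFFFFFFF


def store(memory, address, value, size):
    """Model of strb/strh/str: write the low `size` bytes of value."""
    low_bits = value & ((1 << (8 * size)) - 1)
    memory[address:address + size] = low_bits.to_bytes(size, "little")


def load(memory, address, size, signed=False):
    """Model of ldrb/ldrh/ldr (signed=False) and ldrsb/ldrsh (signed=True)."""
    raw = bytes(memory[address:address + size])
    # Sign extension copies the top bit of the loaded value into the upper bits.
    return int.from_bytes(raw, "little", signed=signed) & MASK32


def ldrb(memory, address):
    return load(memory, address, 1)


def ldrsb(memory, address):
    return load(memory, address, 1, signed=True)


def ldrh(memory, address):
    return load(memory, address, 2)


def ldrsh(memory, address):
    return load(memory, address, 2, signed=True)


def ldr(memory, address):
    return load(memory, address, 4)
```

After storing the word `0x1234FF80` at address 0, `ldrb` reads `0x80` while `ldrsb` reads `0xFFFFFF80`, because the byte's top bit is set and is extended through the register.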

## GNU Assembler Directives

The list below is not meant to be comprehensive; it covers only the commonly used directives.

### Target Hardware

- `.arch`: the CPU architecture. e.g. `.arch armv6`.

- `.cpu`: the CPU. e.g. `.cpu cortex-a15`.

### Assembler Control

- `.section`: assemble the following code or data into the named section. e.g. `.section .rodata`.

- `.text`: the text section, equivalent to `.section .text`.

- `.data`: the data section.

- `.bss`: the BSS section.

- `.align`: align the following code or data in the section. `.align x`, where x is a non-negative integer, aligns to a multiple of 2 to the power of x bytes.
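The arithmetic behind `.align x` can be sketched in Python. The function `align_up` is a hypothetical helper that rounds a location counter up to the next multiple of 2^x; it illustrates the calculation and is not the assembler's implementation.

```python
def align_up(address, exponent):
    """Round `address` up to the next multiple of 2 ** exponent."""
    boundary = 1 << exponent
    # Adding boundary - 1 and clearing the low bits rounds up without a branch.
    return (address + boundary - 1) & ~(boundary - 1)
```

With this model, `.align 2` at offset 5 moves the location counter to 8, and an offset that is already a multiple of 4 stays where it is.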

### Symbol

- `.global`: make symbol visible to the linker. e.g. `.global main`.

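- `.extern`: declare external functions or library functions. e.g. `.extern printf`. (The GNU assembler accepts this directive but ignores it, since it treats every undefined symbol as external.)

- `.type`: set the type of a symbol. e.g. `.type main, %function`.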

- `.equ`: set the value of symbol. e.g. `.equ SIZ, 1`.

### Constant Definition

- `.byte`: define byte data (8 bits).

- `.hword`: define halfword data (2 bytes).

- `.word`: define word data (4 bytes).

- `.quad`: define double word data (8 bytes).

- `.single`: a 4-byte single-precision float value.

- `.double`: an 8-byte double-precision float value.

- `.skip`: reserve the given number of bytes, filled with 0 by default.

- `.fill`: emit repeated copies of a value with a given size. e.g. `.fill 16, 4, 0xffffffff` creates an array of 16 integers (4 bytes each) with value -1.

- `.ascii`: an ascii string. NOT zero-terminated.

- `.asciz`: a zero-terminated ascii string.
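To make the sizes concrete, the Python sketch below produces the bytes that some of these directives would be expected to emit. It assumes a little-endian target and IEEE 754 floats, models `.fill` only for sizes from 1 to 4, and uses function names chosen to echo the directives (`ascii_` has a trailing underscore to avoid shadowing Python's built-in `ascii`).

```python
import struct


def byte(value):
    """.byte: one byte."""
    return struct.pack("<B", value & 0xFF)


def hword(value):
    """.hword: two bytes, least significant first."""
    return struct.pack("<H", value & 0xFFFF)


def word(value):
    """.word: four bytes, least significant first."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def single(value):
    """.single: a 4-byte IEEE 754 float."""
    return struct.pack("<f", value)


def fill(repeat, size, value):
    """.fill repeat, size, value (this sketch covers 1 <= size <= 4)."""
    mask = (1 << (8 * size)) - 1
    return (value & mask).to_bytes(size, "little") * repeat


def ascii_(text):
    """.ascii: the characters only, no terminator."""
    return text.encode("ascii")


def asciz(text):
    """.asciz: the characters followed by a zero byte."""
    return text.encode("ascii") + b"\x00"
```

In this model `fill(16, 4, 0xffffffff)` yields 64 bytes of `0xFF`, matching the 16-element array of -1 described above, and `asciz("hi")` is one byte longer than `ascii_("hi")`.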

### Misc

- `.syntax`: the syntax of the assembly code. Generally we use the modern syntax: `.syntax unified`.

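bc001 posted on 2020-7-23 07:24

Thanks to the OP for sharing.

az12az posted on 2020-7-23 08:16

Looks like this needs translating.

Psyber posted on 2020-7-23 08:20

Thanks for sharing.

huohua1991 posted on 2020-7-23 08:37

Thanks for sharing, time to study.

ALL_IN posted on 2020-7-23 08:38

I can't understand it............

so_so_so posted on 2020-7-23 08:38

Thanks to the OP for sharing.

fuzzylogic posted on 2020-7-23 08:53

Posts this low-level are rare these days.

sbuangke2019 posted on 2020-7-23 09:00

Could you provide it in Chinese?

skjsnb posted on 2020-7-23 09:10

Props for the Markdown. I'll study this. Bumping!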
