ARM Assembly 2. ARM Instructions, GNU Assembler Directives

t7sqynt3 · posted on 2020-7-23 06:39

This post was last edited by t7sqynt3 on 2020-7-23 06:42

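Foreword: reverse engineering requires a foundation in assembly, so I am trying to organize the ARM assembly knowledge I have learned so far. My knowledge is limited and there may be omissions; corrections and discussion are welcome. This series is original work; please credit the source if you quote it. However, citing it in academic research is not recommended, because the content has not been peer reviewed.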

You should cite this article if you want to use it in your work.

Warning: this article may not be precise and professional, and it is not peer-reviewed.

Professionals are welcome to translate this article, since the author does not know the accurate translated terminology.

2. ARM Instructions, GNU Assembler Directives

Basically, ARM instructions have three categories:

Data processing instructions: the destination of data is a register and they only work on registers.
Control flow instructions: unconditional or conditional branch, function call.
Data transfer instructions: load contents from memory to registers, and store contents of registers to memory.

The instructions listed below are not comprehensive, but they are the ones most commonly used when you write assembly code. (This may not hold when you reverse engineer existing programs.)

Data Processing Instructions

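The general format is instruction destination_reg, operand1_reg, operand2.

Operand2 can be a register (r0-r15), a shifted register, or a 32-bit immediate value that can be derived from an 8-bit value by shifting or rotating it. In the classic ARM (A32) encoding, this means an 8-bit value rotated right by an even number of bits; for some instructions the assembler can also use the complemented value by substituting a paired instruction (for example, mvn for mov).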
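
The immediate rule above can be checked with a short Python sketch. It models only the classic ARM (A32) form, an 8-bit value rotated right by an even amount, and does not cover Thumb-2 encodings or movw. The function names are hypothetical helpers, not part of any assembler.

```python
MASK32 = 0xFFFFFFFF

def rotate_right(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def is_encodable_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    for rotation in range(0, 32, 2):
        # Rotating right by (32 - rotation) undoes a rotate-right by `rotation`.
        if rotate_right(value, 32 - rotation) < 256:
            return True
    return False

def mov_can_encode(value):
    """Assumption: mov also succeeds if the complement is encodable (via mvn)."""
    return is_encodable_immediate(value) or is_encodable_immediate(~value)

if __name__ == "__main__":
    for candidate in (255, 0xFF000000, 0x104, 0x101, 0x1FE, -1, -256):
        print(hex(candidate & MASK32), mov_can_encode(candidate))
```

Under this model, 0x104 (0x41 shifted left by 2) is accepted, while 0x101 and 0x1FE are rejected because their set bits do not fit in an 8-bit window at an even rotation.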

mov: moves a value to a register

mov destination_reg, source_reg or mov destination_reg, #imm8

When -256 <= #imm8 <= 255, the assembler always succeeds. Otherwise, the value must be producible by shifting, rotating, and/or complementing an 8-bit value. Many values do not meet this requirement, and the assembler reports an error for them.

In assembly code, #imm8 can be written as a bare number, e.g. 1, or with a # prefix, e.g. #1. Both are accepted by the GNU assembler for ARM.
mvn: move 1's complement of a value to a register

mvn destination_reg, source_reg or mvn destination_reg, #imm8

#imm8 is typically a value between 0 and 255. Other values are subject to the same encoding restrictions as described for mov.
add: add two registers

add destination_reg, source1_reg, source2_reg or add destination_reg, source1_reg, #imm8

dest = src1 + src2 or dest = src1 + #imm8
sub: subtract two registers

sub destination_reg, source1_reg, source2_reg or sub destination_reg, source1_reg, #imm8

dest = src1 - src2 or dest = src1 - #imm8
rsb: reverse subtract (operand2 - operand1)

rsb destination_reg, source1_reg, source2_reg or rsb destination_reg, source1_reg, #imm8

dest = src2 - src1 or dest = #imm8 - src1
mul: multiply two registers

mul destination_reg, source1_reg, <source2_reg>

source2_reg is optional. If omitted, dest = dest * src1.

Otherwise, dest = src1 * src2.

Note that only the lower 32 bits of the result are stored in dest. These bits are the same whether the operands are interpreted as signed or unsigned.
and: bitwise AND of two registers

and destination_reg, source1_reg, source2_reg

dest = src1 & src2
orr: bitwise OR of two registers

orr destination_reg, source1_reg, source2_reg

dest = src1 | src2
eor: bitwise exclusive OR of two registers (XOR)

eor destination_reg, source1_reg, source2_reg

dest = src1 ^ src2
bic: bitwise clear (AND NOT) of two registers

bic destination_reg, source1_reg, source2_reg

dest = src1 & ~src2
lsl: logical shift left

lsl <destination_reg,> source1_reg, source2_reg or lsl <destination_reg,> source1_reg, #const

0 <= #const <= 31

dest = src1 << src2 or dest = src1 << #const

If dest is omitted, the result is stored in src1.
lsr: logical shift right

lsr <destination_reg,> source1_reg, source2_reg or lsr <destination_reg,> source1_reg, #const

1 <= #const <= 32

dest = (unsigned)src1 >> src2 or dest = (unsigned)src1 >> #const

If dest is omitted, the result is stored in src1.
asr: arithmetic shift right

asr <destination_reg,> source1_reg, source2_reg or asr <destination_reg,> source1_reg, #const

1 <= #const <= 32

dest = (signed)src1 >> src2 or dest = (signed)src1 >> #const

If dest is omitted, the result is stored in src1.
ror: rotate right

ror <destination_reg,> source1_reg, source2_reg or ror <destination_reg,> source1_reg, #const

1 <= #const <= 31

Bits shifted out of the low-order end are copied into the high-order bit positions.

If dest is omitted, the result is stored in src1.
cmp: compare two values and set condition flags

cmp source1_reg, source2_reg or cmp source1_reg, #imm8

The requirements for #imm8 are the same as before.
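
The less obvious operations in the list above (rsb, mul, bic, and the shifts and rotate) can be modelled on 32-bit values with a small Python sketch. The function names simply mirror the mnemonics and are illustrative only; registers are represented as plain integers masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def rsb(src1, operand2):
    """Reverse subtract: operand2 - src1."""
    return (operand2 - src1) & MASK32

def mul(src1, src2):
    """Keep only the low 32 bits of the product."""
    return (src1 * src2) & MASK32

def bic(src1, src2):
    """Clear in src1 every bit that is set in src2."""
    return src1 & ~src2 & MASK32

def lsl(src1, amount):
    """Logical shift left: zeros enter from the right."""
    return (src1 << amount) & MASK32

def lsr(src1, amount):
    """Logical shift right: zeros enter from the left."""
    return (src1 & MASK32) >> amount

def asr(src1, amount):
    """Arithmetic shift right: the sign bit is replicated."""
    return (to_signed(src1) >> amount) & MASK32

def ror(src1, amount):
    """Rotate right: bits leaving the low end re-enter at the high end."""
    amount %= 32
    src1 &= MASK32
    return ((src1 >> amount) | (src1 << (32 - amount))) & MASK32

if __name__ == "__main__":
    print(hex(lsr(0x80000000, 31)), hex(asr(0x80000000, 31)))
    print(hex(ror(0x12345678, 8)), to_signed(mul(0xFFFFFFFF, 2)))
```

The lsr and asr calls show the difference between the two right shifts on a value with bit 31 set, and the mul call shows that the low 32 bits of the product read correctly as either a signed or an unsigned result.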

Note: many ARM CPUs, especially early ones, have no hardware support for division, so division is not discussed here.

Control Flow Instructions

Condition Flags

Note: the content of this section ("Condition Flags") and the section below is based on the ARM community post "Condition Codes 1: Condition Flags and Codes" by Jacob Bramley, published September 11, 2013.

Flag	Explanation
N	set if the result is negative (set to bit 31 of the result)
Z	set if the result is zero
C	set if the result of an unsigned addition overflows (carry out); for a subtraction, set when no borrow occurs.
V	set if the result of a signed operation overflows.

Dedicated comparison instructions that set the flags: (these are not the only instructions that set the flags)

cmp: works like subs, which performs sub and sets the condition flags, but cmp does not store the result.
cmn: works like adds, but does not store the result.
tst: works like ands, but does not store the result.
teq: works like eors, but does not store the result.
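
As a rough illustration of how these four instructions set the flags, here is a Python sketch for plain register or immediate operands. The function names are hypothetical. For tst and teq only N and Z are modelled, on the assumption that no shifted operand is involved, so C and V are left out.

```python
MASK32 = 0xFFFFFFFF

def _nz(result):
    """N is bit 31 of the result, Z is set when the result is zero."""
    return {"N": (result >> 31) & 1, "Z": int(result == 0)}

def cmp_flags(a, b):
    """Flags after cmp a, b: computes a - b and discards the result."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = _nz(result)
    flags["C"] = int(a >= b)  # C set means no borrow occurred
    # Signed overflow: operands differ in sign and the result's sign differs from a.
    flags["V"] = (((a ^ b) & (a ^ result)) >> 31) & 1
    return flags

def cmn_flags(a, b):
    """Flags after cmn a, b: computes a + b and discards the result."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    flags = _nz(result)
    flags["C"] = int(total > MASK32)  # carry out of bit 31
    # Signed overflow: operands share a sign and the result's sign differs.
    flags["V"] = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return flags

def tst_flags(a, b):
    """N and Z after tst a, b (bitwise AND, result discarded)."""
    return _nz(a & b & MASK32)

def teq_flags(a, b):
    """N and Z after teq a, b (bitwise XOR, result discarded)."""
    return _nz((a ^ b) & MASK32)

if __name__ == "__main__":
    print(cmp_flags(1, 2))
    print(cmp_flags(0x80000000, 1))
```

The second call compares the most negative signed value with 1, which is the classic case where V is set.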

Branch instructions

Branch instructions are used for changing the order of instruction execution, or "jump".

For function calls, use branch and link: bl. For example, bl printf.

For conditional and unconditional branching, use instruction format bxx where xx can be codes below:

Code	Meaning (for `cmp` or `subs`)	Flags Tested
`eq`	Equal.	`Z==1`
`ne`	Not equal.	`Z==0`
`cs` or `hs`	Unsigned higher or same (or carry set).	`C==1`
`cc` or `lo`	Unsigned lower (or carry clear).	`C==0`
`mi`	Negative. The mnemonic stands for "minus".	`N==1`
`pl`	Positive or zero. The mnemonic stands for "plus".	`N==0`
`vs`	Signed overflow. The mnemonic stands for "overflow set".	`V==1`
`vc`	No signed overflow. The mnemonic stands for "overflow clear".	`V==0`
`hi`	Unsigned higher.	`(C==1) && (Z==0)`
`ls`	Unsigned lower or same.	`(C==0) || (Z==1)`
`ge`	Signed greater than or equal.	`N==V`
`lt`	Signed less than.	`N!=V`
`gt`	Signed greater than.	`(Z==0) && (N==V)`
`le`	Signed less than or equal.	`(Z==1) || (N!=V)`
`al` (or omitted)	Always executed.	None.
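
The table can be transcribed into a small Python lookup to see which branches would be taken for a given set of flags. This is only a sketch: the names CONDITIONS and condition_holds are made up, and the two flag sets were worked out by hand for cmp with operands (1, 2) and (0xffffffff, 1).

```python
# Each entry maps a condition code to the flag test from the table.
CONDITIONS = {
    "eq": lambda f: f["Z"] == 1,
    "ne": lambda f: f["Z"] == 0,
    "cs": lambda f: f["C"] == 1,
    "cc": lambda f: f["C"] == 0,
    "mi": lambda f: f["N"] == 1,
    "pl": lambda f: f["N"] == 0,
    "vs": lambda f: f["V"] == 1,
    "vc": lambda f: f["V"] == 0,
    "hi": lambda f: f["C"] == 1 and f["Z"] == 0,
    "ls": lambda f: f["C"] == 0 or f["Z"] == 1,
    "ge": lambda f: f["N"] == f["V"],
    "lt": lambda f: f["N"] != f["V"],
    "gt": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "le": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "al": lambda f: True,
}
CONDITIONS["hs"] = CONDITIONS["cs"]  # alias: unsigned higher or same
CONDITIONS["lo"] = CONDITIONS["cc"]  # alias: unsigned lower

def condition_holds(code, flags):
    """Return True if a branch with this condition code would be taken."""
    return CONDITIONS[code or "al"](flags)

# Flags worked out by hand for cmp r0, r1 with r0 = 1, r1 = 2.
FLAGS_1_VS_2 = {"N": 1, "Z": 0, "C": 0, "V": 0}
# Flags worked out by hand for cmp r0, r1 with r0 = 0xffffffff, r1 = 1.
FLAGS_FFFFFFFF_VS_1 = {"N": 1, "Z": 0, "C": 1, "V": 0}

if __name__ == "__main__":
    for code in ("lo", "lt", "hi", "gt"):
        print(code, condition_holds(code, FLAGS_1_VS_2),
              condition_holds(code, FLAGS_FFFFFFFF_VS_1))
```

The second flag set is the interesting one: 0xffffffff is higher than 1 as an unsigned number (hi holds) but less than 1 as a signed number (lt holds), which is why the signed and unsigned codes test different flags.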

Data Transfer Instructions

These instructions are used to move data between CPU and memory. They can transfer bytes, half words (2 bytes), or words (4 bytes), from registers to memory, or from memory to registers.

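ldr: load data from a memory address into a register

ldr destination_reg, source_memory_address
str: store data from a register to a memory address

str source_reg, destination_memory_address

The memory address operand can be:

=label (ldr only: loads the address of label into the register)
=expression (ldr only, e.g. =0xffffffff: loads the value of the expression)
[base_register<, #imm12>] (-4095 <= #imm12 <= 4095 for word and byte accesses in ARM state)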
[base_register, offset_register]
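
To make the bracketed addressing forms concrete, here is a minimal Python sketch that treats memory as a bytearray and computes the effective address as base plus offset. It assumes little-endian byte order, which is the common configuration but not the only one. The names ldr_word and str_word are hypothetical.

```python
def str_word(memory, value, base, offset=0):
    """Model of str value, [base, offset]: store 4 bytes at base + offset."""
    address = base + offset
    memory[address:address + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

def ldr_word(memory, base, offset=0):
    """Model of ldr dest, [base, offset]: load 4 bytes from base + offset."""
    address = base + offset
    return int.from_bytes(memory[address:address + 4], "little")

if __name__ == "__main__":
    memory = bytearray(64)
    r1 = 16                                   # base register value
    str_word(memory, 0x11223344, r1, 4)       # str r0, [r1, #4]
    r2 = 4                                    # offset register value
    print(hex(ldr_word(memory, r1, r2)))      # ldr r0, [r1, r2]
    print(hex(memory[20]))                    # lowest-addressed byte
```

The immediate form and the register-offset form reach the same address here, so the load returns the value that was stored.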

Suffixes for ldr or str:

ldrb or strb: load or store a byte.
ldrh or strh: load or store a halfword.
ldr or str: load or store a word.
ldrd or strd: load or store a doubleword. In ARM state the first register must be even-numbered and holds the lower word; the next register (register + 1) holds the upper word.

Sign extension when loading a byte or halfword:

ldrsb: load a signed byte.
ldrsh: load a signed halfword.
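
The difference between the plain and the sign-extending loads shows up only when the top bit of the loaded byte or halfword is set. A small Python sketch, assuming little-endian memory and using hypothetical function names that mirror the mnemonics:

```python
MASK32 = 0xFFFFFFFF

def ldrb(memory, address):
    """Zero-extend one byte to 32 bits."""
    return memory[address]

def ldrsb(memory, address):
    """Sign-extend one byte to 32 bits."""
    value = int.from_bytes(memory[address:address + 1], "little", signed=True)
    return value & MASK32

def ldrh(memory, address):
    """Zero-extend a halfword to 32 bits."""
    return int.from_bytes(memory[address:address + 2], "little")

def ldrsh(memory, address):
    """Sign-extend a halfword to 32 bits."""
    value = int.from_bytes(memory[address:address + 2], "little", signed=True)
    return value & MASK32

if __name__ == "__main__":
    memory = bytes([0x80, 0xFF, 0x7F])
    print(hex(ldrb(memory, 0)), hex(ldrsb(memory, 0)))
    print(hex(ldrh(memory, 0)), hex(ldrsh(memory, 0)))
```

With the byte 0x80, ldrb leaves the upper 24 bits of the register clear while ldrsb fills them with ones, so the register holds -128 when read as a signed value.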

Note the alignment requirements for halfwords (2 bytes), words (4 bytes), and doublewords (8 bytes) in memory. Misaligned accesses hurt performance and, depending on the instruction and CPU configuration, may fault.

GNU Assembler Directives

The list below is not meant to be comprehensive; it covers only the commonly used directives.

Target Hardware

.arch: the CPU architecture. e.g. .arch armv6.
.cpu: the CPU. e.g. .cpu cortex-a15.

Assembler Control

.section: assemble the following code or data into the named section. e.g. .section .rodata.
.text: the text section, equivalent to .section .text.
.data: the data section.
.bss: the BSS section.
.align: align the following code or data in the section. .align x, where x is a non-negative integer, aligns to a multiple of 2 to the power of x bytes.
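
The effect of .align x on the current position can be worked out with a little arithmetic. The Python sketch below uses hypothetical helper names and follows the power-of-two interpretation described above for the ARM target.

```python
def align_up(address, power):
    """Next address at or after `address` that is a multiple of 2**power."""
    alignment = 1 << power
    return (address + alignment - 1) & ~(alignment - 1)

def padding_needed(address, power):
    """Number of padding bytes .align would insert at `address`."""
    return align_up(address, power) - address

if __name__ == "__main__":
    # A 5-byte string followed by .align 2 and then a .word:
    print(align_up(5, 2), padding_needed(5, 2))
```

In this case three padding bytes are inserted so that the following word starts at offset 8.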

Symbol

.global: make symbol visible to the linker. e.g. .global main.
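.extern: declare external functions or library functions. e.g. .extern printf. (The GNU assembler treats all undefined symbols as external, so this mainly serves as documentation.)
.type: set the type of a symbol. e.g. .type main, %function.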
.equ: set the value of symbol. e.g. .equ SIZ, 1.

Constant Definition

.byte: define byte data (8 bits).
.hword: define halfword data (2 bytes).
.word: define word data (4 bytes).
.quad: define doubleword data (8 bytes).
.single: a 4-byte single-precision floating-point value.
.double: an 8-byte double-precision floating-point value.
.skip: advance the location counter by the given number of bytes, filled with 0 by default.
.fill: repeat copies of value with size. e.g. .fill 16, 4, 0xffffffff creates an integer (4 bytes size) array of 16 elements with value -1.
.ascii: an ascii string. NOT zero-terminated.
.asciz: a zero-terminated ascii string.
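
To visualize the bytes these directives place in the object file, the following Python sketch reproduces them with the struct module. It assumes a little-endian target, and the function names are illustrative stand-ins for the directives. The fill helper is deliberately restricted to sizes of at most 4 bytes to keep the model simple.

```python
import struct

def byte(*values):
    return b"".join(struct.pack("<B", v & 0xFF) for v in values)

def hword(*values):
    return b"".join(struct.pack("<H", v & 0xFFFF) for v in values)

def word(*values):
    return b"".join(struct.pack("<I", v & 0xFFFFFFFF) for v in values)

def quad(*values):
    return b"".join(struct.pack("<Q", v & 0xFFFFFFFFFFFFFFFF) for v in values)

def single(value):
    return struct.pack("<f", value)

def double(value):
    return struct.pack("<d", value)

def skip(count):
    return b"\x00" * count

def fill(repeat, size, value):
    """Simplified model of .fill; only sizes 1 to 4 are handled here."""
    if not 1 <= size <= 4:
        raise ValueError("this sketch only models sizes 1 to 4")
    mask = (1 << (8 * size)) - 1
    return (value & mask).to_bytes(size, "little") * repeat

def ascii_(text):
    return text.encode("ascii")

def asciz(text):
    return text.encode("ascii") + b"\x00"

if __name__ == "__main__":
    print(word(1).hex(), hword(0x1234).hex())
    print(len(fill(16, 4, 0xffffffff)), asciz("hi"))
```

The fill call matches the example above: 16 elements of 4 bytes each give 64 bytes, all 0xff.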

Misc

.syntax: the syntax of assembly code. Generally we use modern syntax: .syntax unified.

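bc001 · posted on 2020-7-23 07:24

Thanks for sharing, OP.

az12az · posted on 2020-7-23 08:16

Looks like this needs translating.

Psyber · posted on 2020-7-23 08:20

Thanks for sharing.

huohua1991 · posted on 2020-7-23 08:37

Thanks for sharing, I'll study this.

ALL_IN · posted on 2020-7-23 08:38

I can't understand it............

so_so_so · posted on 2020-7-23 08:38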