1. What is a cache?

The CPU is very fast, while RAM is comparatively slow
So a small, fast memory called the "cache" sits between the CPU and RAM
- L1, L2, L3 caches: each level is slower than the one before it, but larger
- When the CPU requests data:
  - If it is in the cache -> Cache Hit (fast)
  - If it is not -> Cache Miss (slow, the request goes all the way down to RAM)

2. Cache Hit vs Cache Miss

- Cache Hit: the data the CPU is looking for is already in the cache: fast (a few ns)
- Cache Miss: the data the CPU is looking for is not in the cache -> fetched from RAM (tens to hundreds of ns)
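
To get a feel for how much the miss rate matters, the usual average-access-time estimate (hit time + miss rate x miss penalty) can be sketched as below. The 2 ns hit time, 100 ns miss penalty, and 5% miss rate are illustrative assumptions picked from the ranges above, not measurements, and the `CacheMath` helper is hypothetical.

```csharp
using System;

static class CacheMath
{
    // Average access time = hit time + miss rate * miss penalty.
    public static double AverageAccessNs(double hitNs, double missPenaltyNs, double missRate)
        => hitNs + missRate * missPenaltyNs;

    static void Main()
    {
        // Assumed numbers: 2 ns on a hit, 100 ns extra on a miss, 5% of accesses miss.
        Console.WriteLine(AverageAccessNs(2.0, 100.0, 0.05)); // about 7 ns
    }
}
```

With these assumed numbers, a 5% miss rate more than triples the average access time compared with always hitting.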

3. Spatial & Temporal Locality

CPU caches work on the principle of locality
- Temporal Locality:
  - Recently used data is likely to be used again
- Spatial Locality:
  - Data adjacent to what was just used is likely to be used too
So when the CPU loads data, it pulls in the whole surrounding block of memory (= a cache line) at once (a cache line is typically 64 bytes)
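
One way to observe cache-line granularity is to compare touching every int in a large array with touching only one int per line. This is a minimal sketch in plain .NET (no Unity required). The `StrideDemo` class, the array size, and the 64-byte line size are assumptions for illustration, and timings vary by machine.

```csharp
using System;
using System.Diagnostics;

static class StrideDemo
{
    // Increments every `stride`-th element and returns how many elements were touched.
    public static int Touch(int[] data, int stride)
    {
        int touched = 0;
        for (int i = 0; i < data.Length; i += stride)
        {
            data[i]++;
            touched++;
        }
        return touched;
    }

    static void Main()
    {
        var data = new int[16 * 1024 * 1024];     // 64 MB, larger than typical caches
        const int intsPerLine = 64 / sizeof(int); // 16 ints per assumed 64-byte line

        Touch(data, 1); // warm-up so the first timed pass does not pay for page faults

        foreach (int stride in new[] { 1, intsPerLine })
        {
            var sw = Stopwatch.StartNew();
            int touched = Touch(data, stride);
            sw.Stop();
            Console.WriteLine($"stride {stride}: {touched} writes in {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

The stride-16 pass performs 1/16 of the writes, but it still has to bring every cache line of the array in from memory, so its running time may shrink far less than 16x.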

4. When cache misses occur

- Non-contiguous memory access
- Random pointer access (pointer chasing)
- Excessive copying of large structs
- Multiple threads accessing the same cache line (false sharing)
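
The first two cases can be illustrated by summing the same array once in sequential order and once in a shuffled order. This is a sketch under assumptions: the `AccessOrderDemo` class, the array size, and the fixed seed are made up for illustration, and the timings depend on the machine.

```csharp
using System;
using System.Diagnostics;

static class AccessOrderDemo
{
    // Sums data[] by visiting indices in the order given by `order`.
    public static long SumInOrder(int[] data, int[] order)
    {
        long sum = 0;
        for (int i = 0; i < order.Length; i++)
            sum += data[order[i]];
        return sum;
    }

    // Returns 0..n-1 shuffled with Fisher-Yates (fixed seed for repeatability).
    public static int[] ShuffledIndices(int n, int seed)
    {
        var order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        var rng = new Random(seed);
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (order[i], order[j]) = (order[j], order[i]);
        }
        return order;
    }

    static void Main()
    {
        const int n = 1 << 24;
        var data = new int[n];
        var sequential = new int[n];
        for (int i = 0; i < n; i++) { data[i] = i & 7; sequential[i] = i; }
        var shuffled = ShuffledIndices(n, 12345);

        foreach (var (name, order) in new[] { ("sequential", sequential), ("shuffled", shuffled) })
        {
            var sw = Stopwatch.StartNew();
            long sum = SumInOrder(data, order);
            sw.Stop();
            Console.WriteLine($"{name}: sum={sum}, {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

Both passes read exactly the same elements and produce the same sum. Only the order differs, so any gap in the timings comes from the memory access pattern.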

5. From the Unity/ECS perspective

The ECS + Chunk structure was designed to solve this problem
- A Chunk stores its data in contiguous memory
- So iterating over it with a for-loop maximizes cache hits
- In contrast, with GameObject + MonoBehaviour the data is scattered across memory -> frequent cache misses

6. Example code comparison (Cache Friendly vs Cache Unfriendly)

[Cache Unfriendly]

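class Unit {
    public float posX, posY, velX, velY;
}

Unit[] units = new Unit[100000];
// The array only holds references; each Unit is a separate heap allocation
for (int i = 0; i < units.Length; i++) {
    units[i] = new Unit();
}

for (int i = 0; i < units.Length; i++) {
    units[i].posX += units[i].velX;
}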

-> Each Unit is a separate object scattered across memory, so cache misses occur

[Cache Friendly - ECS style]

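struct Position { public float x, y; }
struct Velocity { public float x, y; }

NativeArray<Position> positions;
NativeArray<Velocity> velocities;

for (int i = 0; i < positions.Length; i++) {
    // The NativeArray indexer returns a copy, so modify a local and write it back
    Position p = positions[i];
    p.x += velocities[i].x;
    positions[i] = p;
}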

-> Contiguous memory, high cache hit rate

- The CPU cache is an intermediate store that is faster than RAM
- Cache Hit = data can be processed quickly
- Cache Miss = RAM access -> slower
- Contiguous memory access and attention to locality are the key
- Unity ECS is designed for cache optimization

Multithreading performance optimization: understanding False Sharing and Data Alignment

1. False Sharing

In a multithreaded environment, even when two or more threads access different data, problems can arise if that data sits on the same cache line
Why is this a problem?
- The CPU loads data in cache-line units (e.g. 64 bytes)
- Thread A modifies variable a -> thread B's variable b on the same cache line is needlessly invalidated as well
- As a result, the cores have to keep synchronizing that cache line, which degrades performance

Example

public class SharedData
{
    public int a; // used by Thread 1
    public int b; // used by Thread 2
}

- a and b are used by different threads, but they are very likely to sit on the same cache line (about 64 bytes)
- Thread 1 modifies a -> the CPU has to synchronize the cache line containing a with the other cores
- As a result, Thread 2 also suffers needless cache misses and slowdowns when it accesses b
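
The effect can be measured with two threads that each increment their own counter, first in adjacent slots and then in slots far apart. This is a rough sketch in plain .NET; the `FalseSharingDemo` class, the iteration count, and the gap sizes are assumptions for illustration, and the result depends on the CPU and the JIT.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class FalseSharingDemo
{
    // Two threads each increment their own slot; `gap` is the distance between the slots in longs.
    public static long[] Run(int gap, int iterations)
    {
        var slots = new long[gap + 1];
        var t1 = new Thread(() => { for (int i = 0; i < iterations; i++) slots[0]++; });
        var t2 = new Thread(() => { for (int i = 0; i < iterations; i++) slots[gap]++; });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return new[] { slots[0], slots[gap] };
    }

    static void Main()
    {
        const int iterations = 100_000_000;
        // gap 1: adjacent longs, usually on the same cache line
        // gap 16: 128 bytes apart, always on different 64-byte lines
        foreach (int gap in new[] { 1, 16 })
        {
            var sw = Stopwatch.StartNew();
            Run(gap, iterations);
            sw.Stop();
            Console.WriteLine($"gap {gap}: {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

Neither thread ever reads or writes the other's slot, so the counts are correct in both runs. Any slowdown in the adjacent case comes from the shared cache line alone.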

Solution:
- Insert cache line padding to separate the data
That is, place the data on different cache lines

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit, Size = 128)] // keeps values at least 64 bytes apart
public struct PaddedInt
{
    [FieldOffset(64)] public int value;
}

- StructLayout is used to space the data out artificially and prevent false sharing
- Similar cache line padding techniques are used in Unity as well
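
The padded layout can be checked at runtime with `Marshal.SizeOf` and `Marshal.OffsetOf`. These report the marshaled layout, which for a blittable struct like this one is assumed to match the in-memory layout. The `PaddingCheck` class is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct PaddedInt
{
    [FieldOffset(64)] public int value;
}

static class PaddingCheck
{
    static void Main()
    {
        // Total size of one element.
        Console.WriteLine(Marshal.SizeOf<PaddedInt>());                         // 128
        // Byte offset of `value` from the start of the struct.
        Console.WriteLine(Marshal.OffsetOf<PaddedInt>(nameof(PaddedInt.value))); // 64
    }
}
```

In an array of PaddedInt, consecutive value fields are 128 bytes apart, so two threads updating neighboring elements never touch the same 64-byte line.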

Or, with Unity's Burst/Jobs:

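using System.Runtime.InteropServices;
using Unity.Collections;
using Unity.Jobs;

[StructLayout(LayoutKind.Sequential)]
public struct Counter1 : IJob
{
    [NativeDisableParallelForRestriction]
    public NativeArray<int> array;

    public void Execute()
    {
        array[0]++; // other jobs writing to nearby indices would share this cache line
    }
}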

-> With NativeArray<int>, false sharing can occur, so either space the indices each job writes to about 64 bytes apart (16 ints) or use NativeArray<CustomPaddedStruct>
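
The index-spacing idea can be sketched outside Unity with a plain int[] and Parallel.For standing in for NativeArray and the job system. The `StridedCounters` class, its `CountEven` method, and the 64-byte line size are assumptions for illustration, not Unity APIs.

```csharp
using System;
using System.Threading.Tasks;

static class StridedCounters
{
    // Number of ints that fill one assumed 64-byte cache line.
    public const int Stride = 64 / sizeof(int);

    // Each worker counts into its own slot, spaced Stride ints apart; the slots are summed at the end.
    public static int CountEven(int[] values, int workers)
    {
        var slots = new int[workers * Stride];
        Parallel.For(0, workers, w =>
        {
            // Worker w handles indices w, w + workers, w + 2 * workers, ...
            for (int i = w; i < values.Length; i += workers)
                if (values[i] % 2 == 0) slots[w * Stride]++;
        });

        int total = 0;
        for (int w = 0; w < workers; w++) total += slots[w * Stride];
        return total;
    }

    static void Main()
    {
        var values = new int[1_000_000];
        for (int i = 0; i < values.Length; i++) values[i] = i;
        Console.WriteLine(CountEven(values, Environment.ProcessorCount)); // 500000
    }
}
```

Because each worker writes only to slot w * Stride, no two workers write within 64 bytes of each other, at the cost of leaving 15 of every 16 ints unused.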


2. Data Alignment

Definition:
The CPU accesses data efficiently only when it is aligned to particular byte boundaries
Well-aligned data can be read in a single access; otherwise it takes multiple reads or gets slower

struct A {
    byte a;
    int b;
}


- In the struct above, byte a is 1 byte, but int b requires 4-byte alignment -> 3 bytes of padding are automatically inserted between them
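
The inserted padding can be confirmed with `Marshal.OffsetOf` and `Marshal.SizeOf`. This is a small check in plain .NET, with the fields made public and the sequential layout spelled out; the `LayoutCheck` class is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct A
{
    public byte a;
    public int b;
}

static class LayoutCheck
{
    static void Main()
    {
        Console.WriteLine(Marshal.OffsetOf<A>(nameof(A.a))); // 0
        Console.WriteLine(Marshal.OffsetOf<A>(nameof(A.b))); // 4 (after 3 bytes of padding)
        Console.WriteLine(Marshal.SizeOf<A>());              // 8, not 5
    }
}
```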

Things to watch for in Unity:

- IComponentData structs are best laid out in 4-, 8-, or 16-byte units where possible
- float3 is three floats (12 bytes) but is often handled as a 16-byte vector -> careless mixing with smaller fields such as int can incur unnecessary alignment cost

Alignment optimization

// Bad example
struct Bad {
    public int a;
    public byte b;
    public float3 c;
}

// Good example
struct Good {
    public float3 c;
    public int a;
    public byte b;
}
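
How much reordering saves depends on the alignment of the field types involved. Plain .NET has no float3, so the sketch below uses hypothetical `Loose` and `Tight` structs built from long and byte to make the effect easy to measure with `Marshal.SizeOf`.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct Loose // small fields on both sides of the widest one
{
    public byte a;
    public long b;
    public byte c;
}

[StructLayout(LayoutKind.Sequential)]
public struct Tight // widest field first, small fields packed together
{
    public long b;
    public byte a;
    public byte c;
}

static class FieldOrderCheck
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf<Loose>()); // 24 on a typical 64-bit runtime
        Console.WriteLine(Marshal.SizeOf<Tight>()); // 16 on a typical 64-bit runtime
    }
}
```

Both structs hold the same three fields. Placing the widest field first lets the two bytes share a single padded tail instead of each forcing its own padding.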



