Linux Programmer | RHCE | RHCSA

Wednesday, 25 June 2025

MIG - NVIDIA

The new Multi-Instance GPU (MIG) feature allows GPUs (starting with NVIDIA Ampere architecture) to be securely partitioned into up to seven separate GPU Instances for CUDA applications, providing multiple users with separate GPU resources for optimal GPU utilization. 

This feature is particularly beneficial for workloads that do not fully saturate the GPU's compute capacity, where running several workloads in parallel maximizes utilization.

MIG supports the following deployment configurations:

  • Bare-metal, including containers

  • GPU pass-through virtualization to Linux guests on top of supported hypervisors

  • vGPU on top of supported hypervisors

Overview diagram: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/_images/gpu-mig-overview.jpg

 

MIG splits a single physical GPU into multiple smaller GPU instances.

 

For example:

A bare-metal server has a single GPU card with 141 GB of memory.

With MIG we can create seven mini-GPU instances, each with about 20 GB of memory.

 

Each contains:

Compute cores

Memory

Cache

Scheduling engine

 

Each instance behaves like a single independent GPU to the system.

 

GPU instance (GI)

- A slice of the GPU that includes compute, memory and cache resources. (Hardware allocation)

Compute instance (CI)

- Like a virtual machine or containerized environment that uses a GI. (Execution unit)

 


MIG Profile - a named configuration for splitting the GPU.

For example:

1g.5gb

4g.20gb

 

 

A 40 GB GPU has:

8 x 5 GB memory slices - portions of the GPU memory (VRAM).

7 compute slices - portions of the GPU compute power.

 

 

If the GPU has 7 compute engines (GPCs), a MIG instance can get:

1 slice = lowest power (e.g. 1g.5gb)

4 slices = medium power (e.g. 4g.20gb)

7 slices = the full GPU (e.g. 7g.40gb)

 

 

Note:

How to check GPU compute engines: nvidia-smi -q

 

 

  Compute instance:

3c.4g.20gb

This creates a compute instance (CI) that uses 3 of the 4 compute slices of a 4g.20gb GPU instance; the remaining slice can back another CI, so one GPU instance can be shared between apps.

4c.4g.20gb is equivalent to 4g.20gb (a single CI spanning all four compute slices).
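The Nc.Ng.Mgb naming scheme above is regular enough to parse mechanically. A minimal Python sketch (my own illustration, not part of any NVIDIA tooling) that splits a profile string into CI compute slices, GI compute slices, and memory:

```python
import re

# Matches MIG profile names such as "4g.20gb" or "3c.4g.20gb".
# Optional group 1: CI compute slices; group 2: GI slices; group 3: memory in GB.
PROFILE_RE = re.compile(r"^(?:(\d+)c\.)?(\d+)g\.(\d+)gb$")

def parse_profile(name: str):
    """Return (ci_slices, gi_slices, mem_gb) for a MIG profile name.

    If the name has no 'Nc.' prefix, the single CI spans all GI slices,
    so ci_slices defaults to gi_slices (i.e. 4g.20gb == 4c.4g.20gb).
    """
    m = PROFILE_RE.match(name)
    if not m:
        raise ValueError(f"not a MIG profile name: {name}")
    gi = int(m.group(2))
    ci = int(m.group(1)) if m.group(1) else gi
    return ci, gi, int(m.group(3))

print(parse_profile("3c.4g.20gb"))  # (3, 4, 20)
print(parse_profile("4g.20gb"))     # (4, 4, 20)
```

This makes the equivalence explicit: a plain GI profile parses the same as the CI profile that spans all of its slices.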

 

 

 Difference between 4g.20gb and 2c.4g.20gb

4g.20gb

 

- 1 big GPU instance

- Runs a single job

- All 4 compute slices and 20 GB of memory are used together by one process.

 

2c.4g.20gb

 

- Still 4 slices and 20 GB of memory,

- but split into 2 compute instances.

- You can run two separate jobs (containers, users or apps) side by side.

 

GPU instances and compute instances are enumerated in /proc:

 

# ls -l /proc/driver/nvidia-caps/

 

To view the GPU UUIDs reported by the NVIDIA driver:

 

# nvidia-smi -L

 

Enable MIG Mode:

Check whether MIG is enabled:

# nvidia-smi -i 0

# nvidia-smi -i 0 --query-gpu=pci.bus_id,mig.mode.current --format=csv
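Because --format=csv emits plain CSV, the query above is easy to consume from a script. A hedged sketch with a hard-coded sample (real output varies per host, and the header text shown is illustrative):

```python
import csv, io

# Illustrative sample of:
#   nvidia-smi -i 0 --query-gpu=pci.bus_id,mig.mode.current --format=csv
# (hard-coded here; the real output depends on the machine)
sample = """pci.bus_id, mig.mode.current
00000000:07:00.0, Enabled
"""

def mig_enabled(csv_text: str) -> dict:
    """Map PCI bus ID -> True/False for MIG mode from nvidia-smi CSV output."""
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    # rows[0] is the header line; each data row is (bus_id, mode).
    return {bus: mode == "Enabled" for bus, mode in rows[1:]}

print(mig_enabled(sample))  # {'00000000:07:00.0': True}
```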

 

To enable it,

# sudo nvidia-smi -i <GPU ID> -mig 1

 

# sudo nvidia-smi -i 0 -mig 1 

 

If no GPU ID is specified, MIG mode will be applied to all the GPUs on the system.

 

When MIG is enabled on the GPU, depending on the product, the driver will attempt to reset the GPU so that MIG mode can take effect.

 

Note that MIG mode itself persists across reboots, but the GPU instances and compute instances do not and must be recreated.

 

In some cases you need to stop the nvsm and dcgm services before MIG mode can be enabled:

 

# sudo systemctl stop nvsm

 

# sudo systemctl stop dcgm

 

# sudo nvidia-smi -i 0 -mig 1

Enabled MIG Mode for GPU 00000000:07:00.0

All done.

 

List all possible GPU instance profiles:

 

# nvidia-smi mig -lgip

 

For example, a user can create two instances of 3g.71gb, or seven instances of 1g.18gb.
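Whether a mix of profiles fits on one GPU comes down to the compute-slice budget (7 slices on the cards discussed here). A small sketch, assuming that budget, which checks a requested mix; note it ignores placement constraints (see nvidia-smi mig -lgipp), so it is a necessary but not sufficient check:

```python
def gi_slices(profile: str) -> int:
    """Compute slices (the 'Ng' count) of a GI profile like '3g.71gb'."""
    return int(profile.split("g.")[0])

def fits(profiles, budget=7):
    """True if the GI profiles' compute slices fit within `budget`.

    Placement geometry is not checked here; nvidia-smi mig -lgipp
    remains the authority on what can actually be placed.
    """
    return sum(gi_slices(p) for p in profiles) <= budget

print(fits(["3g.71gb", "3g.71gb"]))   # True: 6 of 7 slices
print(fits(["1g.18gb"] * 7))          # True: exactly 7 slices
print(fits(["4g.20gb", "4g.20gb"]))   # False: 8 > 7 slices
```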

 

 

List the possible placements of GPU Instances:

 

# nvidia-smi mig -lgipp

 


 

List the possible placements of Compute instances:

 

# nvidia-smi mig -lcipp


 

 Now create GPU instances:

Simply enabling MIG mode on the GPU is not enough.

Without creating GPU instances, CUDA workloads cannot run on the GPU.

These instances are not persistent across reboots, so they need to be recreated; the mig-parted tool automates this.

 

 

Check available GPU instance profiles:

# nvidia-smi mig -lgip

 

Create a MIG profile:

# sudo nvidia-smi mig -cgi 9,3g.20gb -C

Here 9 is a profile ID and 3g.20gb is a profile name; -cgi accepts either form.

By default, the instances are created on GPU 0.

 

OR

 

# sudo nvidia-smi mig -cgi 19,14,5

 

This creates the instances on the default GPU.

 

To get the profile IDs: nvidia-smi mig -lgip
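Profile IDs are not universal constants: they come from nvidia-smi mig -lgip on the machine at hand and differ between GPU models. As an illustration only, the GI profile IDs commonly reported on an A100 40GB look like this:

```python
# GI profile ID -> name, as `nvidia-smi mig -lgip` reports them on an A100 40GB.
# Illustrative only: always read the IDs from -lgip on your own GPU.
A100_40GB_GI_PROFILES = {
    0: "7g.40gb",
    5: "4g.20gb",
    9: "3g.20gb",
    14: "2g.10gb",
    19: "1g.5gb",
}

def profile_id(name: str, table=A100_40GB_GI_PROFILES) -> int:
    """Reverse lookup: profile name -> profile ID."""
    for pid, pname in table.items():
        if pname == name:
            return pid
    raise KeyError(name)

print(profile_id("3g.20gb"))  # 9
```

This matches the commands used above, where -cgi 9,3g.20gb mixes the ID and name forms of the same request.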

 

List the created GPU instances:

# nvidia-smi mig -lgi

 

 

Enable MIG on Specific GPU

 

Enable MIG on GPU ID 1:

# sudo nvidia-smi -i <GPU ID> -mig 1

# sudo nvidia-smi -i 1 -mig 1

 

 

After enabling MIG, check the supported profiles:

# nvidia-smi mig -lgip -i 1

 

Create GPU instance on GPU ID 1:

 

# nvidia-smi mig -cgi 19,15 -i 1 -C

OR

# nvidia-smi mig -cgi 19,1g.18gb -i 1 -C

OR

# sudo nvidia-smi mig -cgi 14,19,19,19,19,19

 

Here,

19 - the profile ID

1g.18gb - the profile name

-C - this flag creates compute instances along with the GPU instances.

 

Note:

Once the GPU instances are created, you need to create the corresponding compute instances (CIs) using the -cci option (unless you used -C, which creates them automatically).

- Create instances largest-first, so the placements fit the supported geometry.

 

 

If there is any error during creation, clean up the GPU first:

# sudo nvidia-smi mig -dci -i 1

# sudo nvidia-smi mig -dgi -i 1

 

 

 

Now list the created GPU instances:

# nvidia-smi mig -lgi -i 1

OR

# nvidia-smi mig -lgi ## for listing the GPU instances

# nvidia-smi mig -lci ## for listing the Compute instance

 

Now verify that the GIs and corresponding CIs are created:

# nvidia-smi

 

Available GPU instance profiles and remaining capacity:

# nvidia-smi mig -lgip -i 1

List all created GPU instances (GIs):

# sudo nvidia-smi mig -lgi

List all created compute instances (CIs):

# sudo nvidia-smi mig -lci

List all created MIG devices:

# nvidia-smi -L

# nvidia-smi

 

 

- Delete GI

Check the available MIG instances

# nvidia-smi

 

- First delete the CI

# sudo nvidia-smi mig -dci -i <GPU ID> -ci <CI ID>

# sudo nvidia-smi mig -dci -i 1 -ci 0

 

Note: deleting a CI does not remove its parent GI.

To remove a GI (after its CIs are gone), use:

# sudo nvidia-smi mig -dgi -i <GPU_ID> -gi <GI_ID>

# sudo nvidia-smi mig -dgi -gi 13 -i 1

 

Delete all CIs from GI 1 on GPU 1, then delete the GI itself:

# sudo nvidia-smi mig -dci -gi 1 -i 1

# sudo nvidia-smi mig -dgi -gi 1 -i 1

 

 

Destroy all CIs and GIs:

# sudo nvidia-smi mig -dci && sudo nvidia-smi mig -dgi
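Scripting this destroy-then-recreate cycle is mostly about issuing the commands in the right order (CIs before GIs). A sketch that only builds the argv lists, so the ordering is visible and testable without a GPU; actually executing them would go through subprocess.run:

```python
def teardown_cmds(gpu=None):
    """Commands to destroy all CIs, then all GIs (CIs must be removed first)."""
    target = ["-i", str(gpu)] if gpu is not None else []
    return [
        ["nvidia-smi", "mig", "-dci", *target],
        ["nvidia-smi", "mig", "-dgi", *target],
    ]

def create_cmds(profile_ids, gpu=None, with_ci=True):
    """Command to create GIs (and, via -C, their default CIs) for profile IDs."""
    target = ["-i", str(gpu)] if gpu is not None else []
    cmd = ["nvidia-smi", "mig", "-cgi", ",".join(map(str, profile_ids)), *target]
    if with_ci:
        cmd.append("-C")
    return [cmd]

# Print the full sequence for GPU 1 with the profile IDs used earlier (19, 14, 5).
for cmd in teardown_cmds(1) + create_cmds([19, 14, 5], gpu=1):
    print(" ".join(cmd))
```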

 

 

 

Compute instances:

Use case: a 2g.72gb profile is not available by default, but we can achieve something similar by splitting a 3g.72gb GPU instance.

So we split it into 2c.3g.72gb and 1c.3g.72gb.


    

    

 

Whether a GPU instance can be split into multiple CIs depends on its CI profile placements:

# nvidia-smi mig -lcipp

Here, only profile IDs 1, 2 and 7 support multiple CIs per GI.

If the output shows: GPU 0, GI 2, Profile ID 7, Placements: {0,2}:2

then you can create 2 CIs under that GI.

 

 

For 2c.3g.72gb

# nvidia-smi mig -gi <GI_ID> -cci <Profile ID> -C

 

# nvidia-smi mig -gi 2 -cci 1 -C

 

If you see an error like:

# nvidia-smi mig -gi 2 -cci 1 -C

Unable to create a compute instance on GPU  0 GPU instance ID  2 using profile 1: Insufficient Resources

Failed to create compute instances: Insufficient Resources

 

Solution:

A compute instance may already have been created automatically along with the GI (e.g. via -C). List them:

# nvidia-smi mig -lci

 

How to resolve?

 

Delete the already-created CI:

# sudo nvidia-smi mig -dci -gi <GI ID> -ci <CI ID>

# sudo nvidia-smi mig -dci -gi 2 -ci 1 -i 0

 

Then try to re-create it,

# nvidia-smi mig -gi 2 -cci 1 -C -i 0 # for 2c.3g.72gb

# nvidia-smi mig -gi 2 -cci 0 -C -i 0 # for 1c.3g.72gb

 

Here,

-cci 0 - 1c

-cci 1 - 2c

-cci 2 - 3c

 

Compute instances are generally created automatically when you create GPU instances, especially when using the -C flag.

 

# nvidia-smi mig -lci

 

If they were not created automatically, follow the steps below.

 

A further level of concurrency is achieved using CIs. For example, three CUDA processes can run on the same GI.

 

# nvidia-smi mig -lgi -i 1


List already-created compute instances:

 

# nvidia-smi mig -lci -i 7

 

# nvidia-smi mig -lci

 


List the GI:

# nvidia-smi


 

List all supported compute instance profiles:

 

# sudo nvidia-smi mig -lcip -gi <GI ID> -i <GPU ID>

 

# sudo nvidia-smi mig -lcip -gi 2 -i 7


 

Create a CI on a GI:

# sudo nvidia-smi mig -cci <profile ID> -gi <GI ID> -i <GPU ID>

# sudo nvidia-smi mig -cci 7 -gi 2 -i 1

Here,

7 - the CI profile ID

2 - the GPU instance ID

1 - the GPU ID

 

OR

Create three CIs at once, each of 1c compute capacity (profile ID 0), on GI 1:

# sudo nvidia-smi mig -cci 0,0,0 -gi 1

 

 

Now the GIs and CIs are created:

# nvidia-smi

 

Error:

# nvidia-smi mig -cgi 9,19,19,19,19 -i 0 -C

Unable to create a GPU instance on GPU  0 using profile 9: In use by another client

Failed to create GPU instances: In use by another client

 

Then check whether some running processes are still using the GPU:

# sudo lsof /dev/nvidia*

 

Kill those processes and re-create the instances.

 

 

NVIDIA mig-parted

 

When we create GPU partitions and reboot the server, the partitions are removed automatically, and when we recreate them by hand their UUIDs change.

To overcome this, use the nvidia-mig-parted tool.

 


Install nvidia-mig-parted:

https://github.com/NVIDIA/mig-parted/releases

Download the .deb file and install it.

 

Clone the mig-parted git repository:

cd /home/script

# git clone https://github.com/purvalpatel/mig-parted.git

 

 

 

Now create/edit the config YAML file at /home/script/mig-parted/examples/config.yaml

 

Location on live server: /home/script/mig-parted/

# config.yaml

 

version: v1
mig-configs:
  - devices: [0]
    mig-enabled: true
    mig-devices:
      1c.3g.71gb: 1
      1g.18gb: 4
      2c.3g.71gb: 1
  - devices: [1, 2, 3, 4, 5, 6]
    mig-enabled: false
    mig-devices: {}
  - devices: [7]
    mig-enabled: true
    mig-devices:
      1c.3g.71gb: 1
      1g.18gb: 3
      2c.3g.71gb: 1

 

 

Verify that the configuration is valid:

# nvidia-mig-parted assert -f config.yaml

 

Apply the changes:

# nvidia-mig-parted apply -f config.yaml

 

To verify persistence across reboots:

# reboot

 

After the reboot, apply the config again:

# nvidia-mig-parted apply -f config.yaml

 

The same partitions will be recreated with the same UUIDs.
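Before handing a config to nvidia-mig-parted assert, the same slice-budget arithmetic from earlier can catch obvious mistakes. A sketch in pure Python (the dict mirrors the config.yaml above; a real script would load the file with a YAML parser such as PyYAML) that checks each MIG-enabled entry stays within 7 compute slices:

```python
from math import ceil

# Mirrors the structure of the mig-parted config.yaml above.
config = {
    "version": "v1",
    "mig-configs": [
        {"devices": [0], "mig-enabled": True,
         "mig-devices": {"1c.3g.71gb": 1, "1g.18gb": 4, "2c.3g.71gb": 1}},
        {"devices": [1, 2, 3, 4, 5, 6], "mig-enabled": False,
         "mig-devices": {}},
    ],
}

def slices_used(mig_devices):
    """Upper-bound GI compute slices consumed by a mig-devices map.

    Plain GI profiles (e.g. 1g.18gb) cost their 'g' count each.
    CI profiles (e.g. 2c.3g.71gb) are grouped by their parent GI profile,
    since several CIs share one GI's slices.
    """
    plain = 0
    ci_by_gi = {}
    for profile, count in mig_devices.items():
        parts = profile.split(".")           # ["1c","3g","71gb"] or ["1g","18gb"]
        if len(parts) == 3:                  # CI profile: Nc.Ng.Mgb
            gi = ".".join(parts[1:])
            ci_by_gi[gi] = ci_by_gi.get(gi, 0) + int(parts[0][:-1]) * count
        else:                                # GI profile: Ng.Mgb
            plain += int(parts[0][:-1]) * count
    for gi, ci_slices in ci_by_gi.items():
        g = int(gi.split("g.")[0])
        plain += ceil(ci_slices / g) * g     # whole GIs needed to host these CIs
    return plain

for entry in config["mig-configs"]:
    used = slices_used(entry["mig-devices"])
    print(entry["devices"], "uses", used, "of 7 compute slices")
    assert used <= 7, "over budget"
```

For device 0 above this gives 4 slices for the four 1g.18gb instances plus 3 for the shared 3g GI hosting the 1c and 2c CIs: exactly 7.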

 

 

 
