Provisioning compute
In this lab, we'll use Karpenter to provision the AWS Trainium nodes needed to accelerate our machine learning inference workload. Trainium is AWS's purpose-built ML accelerator, providing high performance and cost-effectiveness for workloads like serving our Mistral-7B model.
To learn more about Karpenter, check out the Karpenter module in this workshop.
Karpenter has already been installed in our EKS cluster and runs as a Deployment.
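You can verify this with a command along these lines (a minimal check, assuming Karpenter was installed into the karpenter namespace; adjust the namespace to match your installation):

kubectl get deployment -n karpenter

This should return output similar to: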
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
karpenter   2/2     2            2           11m
Let's review the configuration for the Karpenter NodePool that we'll be using to provision Trainium instances:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: trainium-trn1
spec:
  template:
    metadata:
      labels:
        instanceType: trn1.2xlarge
        provisionerType: Karpenter
        neuron.amazonaws.com/neuron-device: "true"
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      taints:
        - key: aws.amazon.com/neuron
          value: "true"
          effect: "NoSchedule"
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["trn1.2xlarge"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: trainium-trn1
  limits:
    aws.amazon.com/neuron: 2
    cpu: 16
    memory: 64Gi
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: trainium-trn1
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 256Gi
        iops: 16000
        throughput: 1000
        volumeType: gp3
  role: ${KARPENTER_NODE_ROLE}
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="//"

    --//
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    sed -i "s/^max_concurrent_downloads_per_image = .*$/max_concurrent_downloads_per_image = 10/" /etc/soci-snapshotter-grpc/config.toml
    sed -i "s/^max_concurrent_unpacks_per_image = .*$/max_concurrent_unpacks_per_image = 10/" /etc/soci-snapshotter-grpc/config.toml

    --//
    Content-Type: application/node.eks.aws

    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      featureGates:
        FastImagePull: true

    --//
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/eks-workshop: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop
    karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    aws-neuron: "true"
We're asking the NodePool to start all new nodes with a Kubernetes label provisionerType: Karpenter, which will allow us to specifically target Karpenter nodes with Pods for demonstration purposes. Since there are multiple nodes being autoscaled by Karpenter, additional labels such as instanceType: trn1.2xlarge are added to indicate that a given Karpenter node belongs to the trainium-trn1 pool.
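For example, a Pod could target these nodes by selecting on those labels. The fragment below is purely illustrative and not part of the lab manifests; such a Pod would also need to tolerate the Neuron taint discussed below:

spec:
  nodeSelector:
    provisionerType: Karpenter
    instanceType: trn1.2xlarge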
The NodePool CRD supports defining node properties like instance type and zone. In this example, we're setting karpenter.sh/capacity-type to initially limit Karpenter to provisioning On-Demand instances, as well as node.kubernetes.io/instance-type to limit provisioning to a specific instance type. You can learn which other properties are available in the Karpenter documentation.
A taint defines a set of properties that allow a node to repel a set of Pods. Its counterpart on the Pod side is a toleration: taints and tolerations work together to ensure that Pods are only scheduled onto appropriate nodes. You can learn more about taints and tolerations in the Kubernetes documentation.
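A Pod that should land on these Trainium nodes would therefore carry a toleration matching the taint defined in the NodePool above. The snippet below is a minimal illustrative fragment, not part of the lab manifests:

tolerations:
  - key: aws.amazon.com/neuron
    operator: Equal
    value: "true"
    effect: NoSchedule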
A NodePool can define a limit on the amount of CPU, memory, and other resources (here, Neuron devices) managed by it. Once this limit is reached, Karpenter will not provision additional capacity associated with that particular NodePool, providing a cap on the total compute.
Let's create the NodePool and EC2NodeClass:
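One way to apply the manifests above is sketched here. It assumes they are saved to a local file (the name trainium-nodepool.yaml is hypothetical) and that the EKS_CLUSTER_NAME and KARPENTER_NODE_ROLE environment variables are set, since envsubst fills in the ${...} placeholders before applying:

envsubst < trainium-nodepool.yaml | kubectl apply -f -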
ec2nodeclass.karpenter.k8s.aws/trainium-trn1 created
nodepool.karpenter.sh/trainium-trn1 created
Once properly deployed, check for the NodePools:
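NodePools are cluster-scoped, so a simple query is enough:

kubectl get nodepool trainium-trn1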
NAME            NODECLASS       NODES   READY   AGE
trainium-trn1   trainium-trn1   0       True    31s
As seen from the output above, the NodePool has been created and is ready, allowing Karpenter to provision new nodes as needed. When we deploy our ML workload in the next step, Karpenter will automatically create the required Trainium instances based on the resource requests and limits we specify.