Kubernetes on BlueField-2

I have been working on setting up worker nodes (through K8s and K3s) on the bluefield 2 cards (Ubuntu 20.04) and so far I have had no success. Major issues include a faulty kubelet status (loaded, but not active), a bus error while downloading packages, etc. Has anyone here successfully brought up a heterogeneous cluster (x86/aarch64) using bluefield 2? If so, please let me know. Thanks.

Hey Vivek, could you share with us some more details about these errors? Are there any logs or errors or failures you can share about that faulty kubelet status?

Can you also share the console/terminal output of the bus error that you see while downloading packages?

Thanks!

Hi Justin, we are having trouble setting up a K8s worker node on bluefield2.
The main problem is that curl and wget commands do not work on Bluefield 2 as explained here

I wonder if you have any solution for that?

Hi,
Regarding the wget/curl issue: if IPSEC isn’t used/needed, you can change the default OpenSSL behavior for the entire DPU, which works around the current OpenSSL bug.

The BFB comes with two config files,

  1. /etc/ssl/openssl.cnf.orig (PKA is disabled)
  2. /etc/ssl/openssl.cnf.mlnx (PKA is enabled)

Depending on your needs, either of these two config files can be copied to /etc/ssl/openssl.cnf for the OpenSSL package to use.

Ex: To disable PKA, execute # cp /etc/ssl/openssl.cnf.orig /etc/ssl/openssl.cnf

However, this step will need to be repeated on every reboot (due to restrictions from the IPSEC daemon).
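Since the copy has to be redone after each reboot, it may help to wrap it in a small helper. The following is only a minimal sketch, assuming the two files live in /etc/ssl as listed above; the function name `use_openssl_profile` is made up for illustration and is not part of the BFB.

```shell
#!/bin/sh
# Sketch: switch the active OpenSSL config between the two files shipped in the BFB.
# use_openssl_profile is a hypothetical helper name, not something the image provides.
# Usage: use_openssl_profile <orig|mlnx> [ssl_dir]
use_openssl_profile() {
    profile="$1"
    ssl_dir="${2:-/etc/ssl}"
    src="$ssl_dir/openssl.cnf.$profile"

    # Only the two profiles named in this thread are accepted.
    case "$profile" in
        orig|mlnx) ;;
        *) echo "unknown profile: $profile" >&2; return 2 ;;
    esac

    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi

    # Same effect as: cp /etc/ssl/openssl.cnf.orig /etc/ssl/openssl.cnf
    cp "$src" "$ssl_dir/openssl.cnf"
}
```

Run as root, `use_openssl_profile orig` would disable PKA and `use_openssl_profile mlnx` would enable it again. How to hook this into boot is left open here, since it depends on how the IPSEC daemon is started on your image.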

Hi Justin,

Would you mind posting the output of “systemctl status kubelet” as well as “cat /etc/cni/net.d/99-loopback.conf”?

Also, which k8s versions (kubeadm, kubelet, kubectl) and which BF-2 BFB image version are you running?

Thanks eitkin. That did not solve the curl/wget issue either. (w/o the need for IPSEC).

So I got the cluster working with bluefield2 as one of the worker nodes. However, none of the images appear to be loading on the ARM cores. Attached below are the outputs of:

$ kubectl get pods -A

NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-78fcd69978-sjlj5 1/1 Running 0 2d8h 10.xx.xx.xx themis
kube-system coredns-78fcd69978-z7v99 1/1 Running 0 2d8h 10.xx.xx.xx themis
kube-system etcd-themis 1/1 Running 3 2d8h 10.xx.xx.xx themis
kube-system kube-apiserver-themis 1/1 Running 0 2d8h 10.xx.xx.xx themis
kube-system kube-controller-manager-themis 1/1 Running 0 2d8h 10.xx.xx.xx themis
kube-system kube-flannel-ds-4smmg 1/1 Running 0 2d8h 10.xx.xx.xx themis
kube-system kube-flannel-ds-k2l7j 0/1 CrashLoopBackOff 662 ( ago) 2d8h 10.xx.xx.yy bluefield
kube-system kube-proxy-k92bg 1/1 Running 0 2d8h 10.xx.xx.xx themis
kube-system kube-proxy-tkk5x 1/1 Running 0 2d8h 10.xx.xx.yy bluefield
kube-system kube-scheduler-themis 1/1 Running 0 2d8h 10.xx.xx.xx themis

$ kubectl describe pods kube-flannel-ds-k2l7j -n kube-system

Hi, you mentioned that it did not solve the issue? I just tested it again on my setup and wget worked like a charm. The command I used is the one noted above:
# cp /etc/ssl/openssl.cnf.orig /etc/ssl/openssl.cnf

Could you maybe post the contents of your current “/etc/ssl/openssl.cnf” file, as well as the wget command you are trying to use?

Thanks,
Eyal.

Hi,

Can you please create a file /etc/cni/net.d/99-loopback.conf on the BF2 and add the following:
{
"cniVersion": "0.3.1",
"type": "loopback"
}

Then restart kubelet.
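When JSON is copied from a web page, typographic quotes can sneak in, and those are not valid JSON. A minimal sketch that writes the file with plain ASCII quotes instead; `write_loopback_conf` is a hypothetical helper name, and the default directory is the one given above.

```shell
# Sketch: create the loopback CNI config with plain ASCII quotes.
# write_loopback_conf is a hypothetical helper; pass a directory to override the default.
write_loopback_conf() {
    cni_dir="${1:-/etc/cni/net.d}"
    mkdir -p "$cni_dir" || return 1
    # Quoted heredoc delimiter: the body is written literally, with no expansion.
    cat > "$cni_dir/99-loopback.conf" <<'EOF'
{
  "cniVersion": "0.3.1",
  "type": "loopback"
}
EOF
}
```

As root on the DPU this would be used as `write_loopback_conf && systemctl restart kubelet`.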

On the kmaster node, I would suggest using a node selector/affinity to keep flannel from running on the bluefield for now.
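One way to express such a node selector is a merge patch on the flannel DaemonSet that uses the well-known `kubernetes.io/arch` node label. This is only a sketch: `arch_selector_patch` is a hypothetical helper, and the DaemonSet name `kube-flannel-ds` is assumed from the pod names shown earlier in this thread.

```shell
# Sketch: build a merge patch that pins a DaemonSet's pods to one CPU architecture
# via the well-known kubernetes.io/arch node label.
# arch_selector_patch is a hypothetical helper name.
arch_selector_patch() {
    arch="${1:-amd64}"
    printf '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/arch":"%s"}}}}}\n' "$arch"
}
```

It could then be applied from the master with `kubectl -n kube-system patch daemonset kube-flannel-ds --type merge -p "$(arch_selector_patch amd64)"`, which should leave flannel on the x86 host and stop it from being scheduled on the arm64 DPU.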

Please post me the output of:
kubectl get nodes -o wide

root@bluefield:/home/ubuntu# curl -sLS https://get.arkade.dev
curl: (35) error:14094419:SSL routines:ssl3_read_bytes:tlsv1 alert access denied

root@bluefield:/home/ubuntu# ls /etc/ssl/openssl.cnf
/etc/ssl/openssl.cnf

Hi Eitkin, here is the output of /etc/ssl/openssl.cnf file:
root@bluefield:~# cat /etc/ssl/openssl.cnf

OpenSSL example configuration file.

This is mostly being used for generation of certificate requests.

Note that you can include other files from the main configuration

file using the .include directive.

#.include filename

This definition stops the following lines choking if HOME isn’t

defined.

HOME = .

Extra OBJECT IDENTIFIER info:

#oid_file = $ENV::HOME/.oid

oid_section = new_oids

To use this configuration file with the “-extfile” option of the

“openssl x509” utility, name here the section containing the

X.509v3 extensions to use:

extensions =

(Alternatively, use a configuration file that has only

X.509v3 extensions in its main [= default] section.)

[ new_oids ]

We can add new OIDs in here for use by ‘ca’, ‘req’ and ‘ts’.

Add a simple OID like this:

testoid1=1.2.3.4

Or use config file substitution like this:

testoid2=${testoid1}.5.6

Policies used by the TSA examples.

tsa_policy1 = 1.2.3.4.1

tsa_policy2 = 1.2.3.4.5.6

tsa_policy3 = 1.2.3.4.5.7

####################################################################

[ ca ]

default_ca = CA_default # The default ca section

####################################################################

[ CA_default ]

dir = ./demoCA # Where everything is kept

certs = $dir/certs # Where the issued certs are kept

crl_dir = $dir/crl # Where the issued crl are kept

database = $dir/index.txt # database index file.

#unique_subject = no # Set to ‘no’ to allow creation of

several certs with same subject.

new_certs_dir = $dir/newcerts # default place for new certs.

certificate = $dir/cacert.pem # The CA certificate

serial = $dir/serial # The current serial number

crlnumber = $dir/crlnumber # the current crl number

must be commented out to leave a V1 CRL

crl = $dir/crl.pem # The current CRL

private_key = $dir/private/cakey.pem# The private key

x509_extensions = usr_cert # The extensions to add to the cert

Comment out the following two lines for the “traditional”

(and highly broken) format.

name_opt = ca_default # Subject Name options

cert_opt = ca_default # Certificate field options

Extension copying option: use with caution.

copy_extensions = copy

Extensions to add to a CRL. Note: Netscape communicator chokes on V2 CRLs

so this is commented out by default to leave a V1 CRL.

crlnumber must also be commented out to leave a V1 CRL.

crl_extensions = crl_ext

default_days = 365 # how long to certify for

default_crl_days= 30 # how long before next CRL

default_md = default # use public key default MD

preserve = no # keep passed DN ordering

A few difference way of specifying how similar the request should look

For type CA, the listed attributes must be the same, and the optional

and supplied fields are just that :-)

policy = policy_match

For the CA policy

[ policy_match ]

countryName = match

stateOrProvinceName = match

organizationName = match

organizationalUnitName = optional

commonName = supplied

emailAddress = optional

For the ‘anything’ policy

At this point in time, you must list all acceptable ‘object’

types.

[ policy_anything ]

countryName = optional

stateOrProvinceName = optional

localityName = optional

organizationName = optional

organizationalUnitName = optional

commonName = supplied

emailAddress = optional

####################################################################

[ req ]

default_bits = 2048

default_keyfile = privkey.pem

distinguished_name = req_distinguished_name

attributes = req_attributes

x509_extensions = v3_ca # The extensions to add to the self signed cert

Passwords for private keys if not present they will be prompted for

input_password = secret

output_password = secret

This sets a mask for permitted string types. There are several options.

default: PrintableString, T61String, BMPString.

pkix : PrintableString, BMPString (PKIX recommendation before 2004)

utf8only: only UTF8Strings (PKIX recommendation after 2004).

nombstr : PrintableString, T61String (no BMPStrings or UTF8Strings).

MASK:XXXX a literal mask value.

WARNING: ancient versions of Netscape crash on BMPStrings or UTF8Strings.

string_mask = utf8only

req_extensions = v3_req # The extensions to add to a certificate request

[ req_distinguished_name ]

countryName = Country Name (2 letter code)

countryName_default = AU

countryName_min = 2

countryName_max = 2

stateOrProvinceName = State or Province Name (full name)

stateOrProvinceName_default = Some-State

localityName = Locality Name (eg, city)

0.organizationName = Organization Name (eg, company)

0.organizationName_default = Internet Widgits Pty Ltd

we can do this but it is not needed normally :-)

#1.organizationName = Second Organization Name (eg, company)

#1.organizationName_default = World Wide Web Pty Ltd

organizationalUnitName = Organizational Unit Name (eg, section)

#organizationalUnitName_default =

commonName = Common Name (e.g. server FQDN or YOUR name)

commonName_max = 64

emailAddress = Email Address

emailAddress_max = 64

SET-ex3 = SET extension number 3

[ req_attributes ]

challengePassword = A challenge password

challengePassword_min = 4

challengePassword_max = 20

unstructuredName = An optional company name

[ usr_cert ]

These extensions are added when ‘ca’ signs a request.

This goes against PKIX guidelines but some CAs do it and some software

requires this to avoid interpreting an end user certificate as a CA.

basicConstraints=CA:FALSE

Here are some examples of the usage of nsCertType. If it is omitted

the certificate can be used for anything except object signing.

This is OK for an SSL server.

nsCertType = server

For an object signing certificate this would be used.

nsCertType = objsign

For normal client use this is typical

nsCertType = client, email

and for everything including object signing:

nsCertType = client, email, objsign

This is typical in keyUsage for a client certificate.

keyUsage = nonRepudiation, digitalSignature, keyEncipherment

This will be displayed in Netscape’s comment listbox.

nsComment = “OpenSSL Generated Certificate”

PKIX recommendations harmless if included in all certificates.

subjectKeyIdentifier=hash

authorityKeyIdentifier=keyid,issuer

This stuff is for subjectAltName and issuerAltname.

Import the email address.

subjectAltName=email:copy

An alternative to produce certificates that aren’t

deprecated according to PKIX.

subjectAltName=email:move

Copy subject details

issuerAltName=issuer:copy

#nsCaRevocationUrl = http://www.domain.dom/ca-crl.pem

#nsBaseUrl

#nsRevocationUrl

#nsRenewalUrl

#nsCaPolicyUrl

#nsSslServerName

This is required for TSA certificates.

extendedKeyUsage = critical,timeStamping

[ v3_req ]

Extensions to add to a certificate request

basicConstraints = CA:FALSE

keyUsage = nonRepudiation, digitalSignature, keyEncipherment

[ v3_ca ]

Extensions for a typical CA

PKIX recommendation.

subjectKeyIdentifier=hash

authorityKeyIdentifier=keyid:always,issuer

basicConstraints = critical,CA:true

Key usage: this is typical for a CA certificate. However since it will

prevent it being used as an test self-signed certificate it is best

left out by default.

keyUsage = cRLSign, keyCertSign

Some might want this also

nsCertType = sslCA, emailCA

Include email address in subject alt name: another PKIX recommendation

subjectAltName=email:copy

Copy issuer details

issuerAltName=issuer:copy

DER hex encoding of an extension: beware experts only!

obj=DER:02:03

Where ‘obj’ is a standard or added object

You can even override a supported extension:

basicConstraints= critical, DER:30:03:01:01:FF

[ crl_ext ]

CRL extensions.

Only issuerAltName and authorityKeyIdentifier make any sense in a CRL.

issuerAltName=issuer:copy

authorityKeyIdentifier=keyid:always

[ proxy_cert_ext ]

These extensions should be added when creating a proxy certificate

This goes against PKIX guidelines but some CAs do it and some software

requires this to avoid interpreting an end user certificate as a CA.

basicConstraints=CA:FALSE

Here are some examples of the usage of nsCertType. If it is omitted

the certificate can be used for anything except object signing.

This is OK for an SSL server.

nsCertType = server

For an object signing certificate this would be used.

nsCertType = objsign

For normal client use this is typical

nsCertType = client, email

and for everything including object signing:

nsCertType = client, email, objsign

This is typical in keyUsage for a client certificate.

keyUsage = nonRepudiation, digitalSignature, keyEncipherment

This will be displayed in Netscape’s comment listbox.

nsComment = “OpenSSL Generated Certificate”

PKIX recommendations harmless if included in all certificates.

subjectKeyIdentifier=hash

authorityKeyIdentifier=keyid,issuer

This stuff is for subjectAltName and issuerAltname.

Import the email address.

subjectAltName=email:copy

An alternative to produce certificates that aren’t

deprecated according to PKIX.

subjectAltName=email:move

Copy subject details

issuerAltName=issuer:copy

#nsCaRevocationUrl = http://www.domain.dom/ca-crl.pem

#nsBaseUrl

#nsRevocationUrl

#nsRenewalUrl

#nsCaPolicyUrl

#nsSslServerName

This really needs to be in place for it to be a proxy certificate.

proxyCertInfo=critical,language:id-ppl-anyLanguage,pathlen:3,policy:foo

####################################################################

[ tsa ]

default_tsa = tsa_config1 # the default TSA section

[ tsa_config1 ]

These are used by the TSA reply generation only.

dir = ./demoCA # TSA root directory

serial = $dir/tsaserial # The current serial number (mandatory)

crypto_device = builtin # OpenSSL engine to use for signing

signer_cert = $dir/tsacert.pem # The TSA signing certificate

(optional)

certs = $dir/cacert.pem # Certificate chain to include in reply

(optional)

signer_key = $dir/private/tsakey.pem # The TSA private key (optional)

signer_digest = sha256 # Signing digest to use. (Optional)

default_policy = tsa_policy1 # Policy if request did not specify it

(optional)

other_policies = tsa_policy2, tsa_policy3 # acceptable policies (optional)

digests = sha1, sha256, sha384, sha512 # Acceptable message digests (mandatory)

accuracy = secs:1, millisecs:500, microsecs:100 # (optional)

clock_precision_digits = 0 # number of digits after dot. (optional)

ordering = yes # Is ordering defined for timestamps?

(optional, default: no)

tsa_name = yes # Must the TSA name be included in the reply?

(optional, default: no)

ess_cert_id_chain = no # Must the ESS cert id chain be included?

(optional, default: no)

ess_cert_id_alg = sha1 # algorithm to compute certificate

identifier (optional, default: sha1)

Thanks,
Are you suggesting that flannel should not be allowed to run on bluefield2?
We think the network fabric needs to be hosted on all the nodes for sending the queries.
Here is the output:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
bluefield Ready 2d22h v1.22.0 10.xx.xx.xx Ubuntu 20.04.2 LTS 5.4.0-1013-bluefield docker://20.10.8

themis Ready control-plane,master 2d22h v1.22.0 10.xx.xx.xx Ubuntu 20.04.2 LTS 5.4.0-81-generic docker://20.10.8

And this is the output of pods:
kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-78fcd69978-sjlj5 1/1 Running 0 2d22h 10.93.120.2 themis
kube-system coredns-78fcd69978-z7v99 1/1 Running 0 2d22h 10.93.120.3 themis
kube-system etcd-themis 1/1 Running 3 2d22h 10.xx.xx.xx themis
kube-system kube-apiserver-themis 1/1 Running 0 2d22h 10.xx.xx.xx themis
kube-system kube-controller-manager-themis 1/1 Running 0 2d22h 10.93.226.77 themis
kube-system kube-flannel-ds-4smmg 1/1 Running 0 2d22h 10.xx.xx.xx themis
kube-system kube-flannel-ds-k2l7j 0/1 CrashLoopBackOff 823 ( ago) 2d22h 10.93.231.112 bluefield
kube-system kube-proxy-k92bg 1/1 Running 0 2d22h 10.xx.xx.xx themis
kube-system kube-proxy-tkk5x 1/1 Running 0 2d22h 10.xx.xx.xx bluefield
kube-system kube-scheduler-themis 1/1 Running 0 2d22h 10.xx.xx.xx themis

Thanks. The content you posted matches that of “/etc/ssl/openssl.cnf.orig” and specifically the file doesn’t define the PKA engine as is defined in “/etc/ssl/openssl.cnf.mlnx”, meaning that the OpenSSL bug shouldn’t occur now.

Did you test curl/wget after you performed the “cp /etc/ssl/openssl.cnf.orig /etc/ssl/openssl.cnf” command? I would expect them both to work without errors.
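To confirm which of the two shipped files is currently active without reading through the whole config, a byte-for-byte comparison is enough. A minimal sketch, assuming the file names listed earlier in the thread; `active_openssl_profile` is a hypothetical helper name.

```shell
# Sketch: report which shipped config the active openssl.cnf is identical to.
# Prints "orig", "mlnx", or "custom" (also printed if openssl.cnf is missing).
# active_openssl_profile is a hypothetical helper name.
active_openssl_profile() {
    ssl_dir="${1:-/etc/ssl}"
    for profile in orig mlnx; do
        # cmp -s is silent and succeeds only when both files are identical.
        if cmp -s "$ssl_dir/openssl.cnf" "$ssl_dir/openssl.cnf.$profile"; then
            echo "$profile"
            return 0
        fi
    done
    echo "custom"
}
```

Running `active_openssl_profile` right after a reboot would also show whether the config was switched back in the meantime.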

We did – still similar errors. Rebooted the machine for a sanity check (and changed the conf yet again), yet no success. Any other point of failure that comes to your mind?

Could you please post the error you get from both “curl” and “wget”? From the curl error you posted above this seems like an https error and not a crash (segfault) like the initial error.
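The exit status of the command is a quick way to tell these cases apart. A small sketch follows, with `classify_curl_exit` as a hypothetical helper name. Exit code 35 is curl's SSL connect error (the code shown in the curl output above), and 128+N is how a shell reports a process killed by signal N, using the Linux signal numbers for SIGBUS and SIGSEGV.

```shell
# Sketch: map a curl exit status to the failure classes discussed in this thread.
# classify_curl_exit is a hypothetical helper name.
classify_curl_exit() {
    case "$1" in
        0)   echo "ok" ;;
        35)  echo "tls-handshake" ;;   # curl: (35) SSL connect error
        135) echo "bus-error" ;;       # 128 + SIGBUS (7 on Linux)
        139) echo "segfault" ;;        # 128 + SIGSEGV (11)
        *)   echo "other" ;;
    esac
}
```

For example: `curl -sLS https://get.arkade.dev >/dev/null; classify_curl_exit $?`.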

While I tried your above command and it worked on my setup, it would be great if you could try accessing a different server so we could check what is the cause for this issue:

This is correct. We’ll provide a CNI in the future, but for now we only support “host network” pods.
The CNI loopback configuration is only there to get kubelet running.
You should be able to get a micro app pod deployed and running.

I’m currently running an NGINX pod myself in my setup:

root@kmaster:/# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
static-web 1/1 Running 0 13d 10.110.169.9 bf-02.internal.nvidia.com

root@kmaster:/# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.233.0.1 443/TCP 118d
nginx NodePort 10.233.44.43 8080:31822/TCP 20d app=nginx
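For reference, a “host network” pod of the kind described here could be declared along the following lines. This is a sketch only, not the manifest used in the setup above: the helper name is hypothetical, the pod name and image are placeholders, and the `kubernetes.io/arch: arm64` selector is an assumption used to target the DPU.

```shell
# Sketch: print a Pod manifest that uses the host network and targets arm64 nodes.
# host_network_pod_manifest is a hypothetical helper; name and image are placeholders.
host_network_pod_manifest() {
    name="${1:-static-web}"
    image="${2:-nginx}"
    # Unquoted heredoc delimiter so $name and $image are expanded.
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $name
spec:
  hostNetwork: true
  nodeSelector:
    kubernetes.io/arch: arm64
  containers:
    - name: $name
      image: $image
EOF
}
```

It could be applied with `host_network_pod_manifest static-web nginx | kubectl apply -f -`. With `hostNetwork: true` the pod shares the node's network namespace, so it does not depend on flannel running on the DPU.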

root@bluefield:/home/ubuntu# wget https://linux.mellanox.com/public/repo/doca/1.1/ubuntu20.04/doca.list

--2021-08-25 03:35:11-- https://linux.mellanox.com/public/repo/doca/1.1/ubuntu20.04/doca.list
Resolving linux.mellanox.com (linux.mellanox.com)... 168.62.212.37
Connecting to linux.mellanox.com (linux.mellanox.com)|168.62.212.37|:443... connected.
OpenSSL: error:14094419:SSL routines:ssl3_read_bytes:tlsv1 alert access denied
Unable to establish SSL connection.

root@bluefield:/home/ubuntu# curl -sLS https://linux.mellanox.com/public/repo/doca/1.1/ubuntu20.04/doca.list

curl: (35) error:14094419:SSL routines:ssl3_read_bytes:tlsv1 alert access denied

Thanks for the output traces. The config file update did solve the original (segfault) issue, but you now seem to be hitting a certificate-based issue, since it happens with both curl and wget and isn’t site-specific.

Could you please supply the following so we could try to reproduce it on our end?

  • Which BF Image have you installed? Is it BlueField OS version (3.7) - July? Or BlueField OS version (3.6) - March?
  • OFED version - $ ofed_info -s
  • OpenSSL’s version - $ openssl version
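The two command outputs requested above can be gathered in one go. A minimal sketch with made-up helper names (`report_cmd`, `collect_versions`); it only wraps the two commands already listed and does not try to detect the BlueField image version.

```shell
# Sketch: print "<label>: <output>" for a command, or a marker if it is not installed.
# report_cmd and collect_versions are hypothetical helper names.
report_cmd() {
    label="$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$label" "$("$@" 2>&1)"
    else
        printf '%s: (%s not found)\n' "$label" "$1"
    fi
}

# Collect the versions asked for in this thread.
collect_versions() {
    report_cmd "OFED" ofed_info -s
    report_cmd "OpenSSL" openssl version
}
```

Calling `collect_versions` on the DPU prints one line per tool, which can be pasted into a reply as-is.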

If you didn’t have the /etc/cni/net.d/99-loopback.conf it is likely that you aren’t using the latest BlueField OS version, in which case an update will most probably resolve this issue.

We are using BlueField OS version 3.7. I can see that the /etc/cni/net.d/99-loopback.conf file is configured correctly.

cat /etc/cni/net.d/99-loopback.conf
{
"cniVersion": "0.3.1",
"type": "loopback"
}

root@bluefield:/home/ubuntu# ofed_info -s
MLNX_OFED_LINUX-5.4-1.0.3.0:

root@bluefield:/home/ubuntu# openssl version
OpenSSL 1.1.1f 31 Mar 2020