I have been working on setting up worker nodes (through K8s and K3s) on BlueField-2 cards (Ubuntu 20.04), and so far I have had no success. The major issues include a faulty kubelet status (loaded, but not active), a bus error while downloading packages, etc. Has anyone here successfully brought up a heterogeneous cluster (x86/aarch64) using BlueField-2? If so, please let me know. Thanks.
Hey Vivek, could you share some more details about these errors? Are there any logs, errors, or failures you can share about the faulty kubelet status?
Can you also share the console/terminal output of the bus error that you see while downloading packages?
Thanks!
Hi Justin, we are having trouble setting up a K8s worker node on BlueField-2.
The main problem is that the curl and wget commands do not work on BlueField-2, as explained here.
Do you have a solution for that?
Hi,
Regarding the wget/curl issue: if IPsec isn't used/needed, you can change the default OpenSSL behavior for the entire DPU, which works around the current OpenSSL bug.
The BFB comes with two config files:
- /etc/ssl/openssl.cnf.orig (PKA is disabled)
- /etc/ssl/openssl.cnf.mlnx (PKA is enabled)
Depending on your needs, either of these two files can be copied to /etc/ssl/openssl.cnf, which is the file the OpenSSL package uses.
For example, to disable PKA, run:
# cp /etc/ssl/openssl.cnf.orig /etc/ssl/openssl.cnf
However, this step needs to be repeated after every reboot (due to restrictions from the IPsec daemon).
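Because the copy has to be redone after every reboot, it can help to script it. The sketch below relies only on the two BFB-provided file names listed above; the function name and the `openssl.cnf.bak` backup file are hypothetical additions, and writing to /etc/ssl requires root.

```python
import shutil
from pathlib import Path

# The two variants shipped in the BFB, as described above.
VARIANTS = {
    False: "openssl.cnf.orig",  # PKA disabled
    True: "openssl.cnf.mlnx",   # PKA enabled
}


def select_openssl_config(enable_pka: bool, ssl_dir: str = "/etc/ssl") -> Path:
    """Copy the chosen variant over openssl.cnf, keeping a one-time backup.

    With enable_pka=False this does the same as
    `cp /etc/ssl/openssl.cnf.orig /etc/ssl/openssl.cnf`.
    """
    base = Path(ssl_dir)
    source = base / VARIANTS[enable_pka]
    target = base / "openssl.cnf"
    if not source.is_file():
        raise FileNotFoundError(f"expected BFB-provided file {source}")
    backup = base / "openssl.cnf.bak"  # hypothetical backup name
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)  # only the first run creates the backup
    shutil.copyfile(source, target)
    return target
```

For example, `select_openssl_config(enable_pka=False)` matches the `cp` command above. Hooking it into boot (for instance via a systemd unit) is left open, since its ordering relative to the IPsec daemon is not covered in this thread.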
Hi Justin,
Would you mind posting the output of “systemctl status kubelet” as well as “cat /etc/cni/net.d/99-loopback.conf”?
Also, which k8s versions (kubeadm, kubelet, kubectl) and which BF-2 BFB image version are you using?
Thanks, eitkin. That did not solve the curl/wget issue either (we do not need IPsec).
So I got the cluster working with BlueField-2 as one of the worker nodes. However, none of the images appear to be loading on the ARM cores. Below is the output of:
$ kubectl get pods -A
NAMESPACE     NAME                             READY   STATUS             RESTARTS     AGE    IP            NODE        NOMINATED NODE   READINESS GATES
kube-system   coredns-78fcd69978-sjlj5         1/1     Running            0            2d8h   10.xx.xx.xx   themis      <none>           <none>
kube-system   coredns-78fcd69978-z7v99         1/1     Running            0            2d8h   10.xx.xx.xx   themis      <none>           <none>
kube-system   etcd-themis                      1/1     Running            3            2d8h   10.xx.xx.xx   themis      <none>           <none>
kube-system   kube-apiserver-themis            1/1     Running            0            2d8h   10.xx.xx.xx   themis      <none>           <none>
kube-system   kube-controller-manager-themis   1/1     Running            0            2d8h   10.xx.xx.xx   themis      <none>           <none>
kube-system   kube-flannel-ds-4smmg            1/1     Running            0            2d8h   10.xx.xx.xx   themis      <none>           <none>
kube-system   kube-flannel-ds-k2l7j            0/1     CrashLoopBackOff   662 ( ago)   2d8h   10.xx.xx.yy   bluefield   <none>           <none>
kube-system   kube-proxy-k92bg                 1/1     Running            0            2d8h   10.xx.xx.xx   themis      <none>           <none>
kube-system   kube-proxy-tkk5x                 1/1     Running            0            2d8h   10.xx.xx.yy   bluefield   <none>           <none>
kube-system   kube-scheduler-themis            1/1     Running            0            2d8h   10.xx.xx.xx   themis      <none>           <none>
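To pick out the failing pods per node without scanning the table by eye, the JSON form of the same listing can be filtered. A minimal sketch, assuming `kubectl` is on the PATH and configured for this cluster; the helper names are hypothetical, while the fields read (`metadata`, `spec.nodeName`, `status.containerStatuses[].state.waiting.reason`) are standard Pod API fields.

```python
import json
import subprocess


def unhealthy_pods(pod_list: dict):
    """Yield (namespace, name, node, reason) for each non-ready container.

    Pods that have no container statuses yet (e.g. still Pending) are skipped.
    """
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        node = pod.get("spec", {}).get("nodeName")
        status = pod.get("status", {})
        for cs in status.get("containerStatuses", []):
            if cs.get("ready"):
                continue
            waiting = cs.get("state", {}).get("waiting", {})
            # Prefer the waiting reason (e.g. CrashLoopBackOff), else the pod phase.
            reason = waiting.get("reason", status.get("phase", "Unknown"))
            yield meta["namespace"], meta["name"], node, reason


def fetch_pods() -> dict:
    """Run `kubectl get pods -A -o json` and parse its output."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-A", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Usage: `for row in unhealthy_pods(fetch_pods()): print(*row)`. Here it would be expected to flag `kube-flannel-ds-k2l7j` on `bluefield`; `kubectl -n kube-system logs kube-flannel-ds-k2l7j --previous` then shows the log of the last crashed container.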
Hi, you mentioned that it did not solve the issue? I just tested it again on my setup and wget worked like a charm. The command I used is the one noted above:
# cp /etc/ssl/openssl.cnf.orig /etc/ssl/openssl.cnf
Could you maybe post the contents of your current “/etc/ssl/openssl.cnf” file, as well as the wget command you are trying to use?
Thanks,
Eyal.
Hi,
Can you please create a file /etc/cni/net.d/99-loopback.conf on the BF2 and add the following:
{
  "cniVersion": "0.3.1",
  "type": "loopback"
}
Then restart kubelet.
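The file must contain plain ASCII double quotes to be valid JSON (forum rendering often turns them into curly quotes), so writing it from a script sidesteps copy-and-paste problems. A sketch, assuming Python 3 is available on the BF2 and the script runs as root; the function name is hypothetical.

```python
import json
from pathlib import Path

# Contents suggested in this thread for the BF2 worker.
LOOPBACK_CONF = {"cniVersion": "0.3.1", "type": "loopback"}


def write_loopback_conf(path: str = "/etc/cni/net.d/99-loopback.conf") -> Path:
    """Write the loopback CNI config as valid JSON and return its path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create net.d if missing
    target.write_text(json.dumps(LOOPBACK_CONF, indent=2) + "\n")
    return target
```

After `write_loopback_conf()`, restart kubelet as described (`systemctl restart kubelet`).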
On the kmaster node, I would suggest using a node selector/affinity to keep flannel from running on the BlueField for now.
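One way to express that affinity is a merge patch on the flannel DaemonSet. This sketch assumes the DaemonSet is `kube-flannel-ds` in `kube-system` (inferred from the pod names above) and that the node's hostname label is `bluefield`. A merge patch replaces the whole `nodeSelectorTerms` list, so the `kubernetes.io/os In linux` term that the stock flannel manifest is assumed to carry is repeated here; check the live object with `kubectl -n kube-system get daemonset kube-flannel-ds -o yaml` before applying.

```python
import json


def flannel_exclusion_patch(excluded_nodes):
    """Merge patch that keeps a DaemonSet's pods off the named nodes."""
    match_expressions = [
        # Assumed to mirror the term already in the stock flannel manifest.
        {"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]},
        # Exclude the DPU worker(s) by their hostname label.
        {"key": "kubernetes.io/hostname", "operator": "NotIn",
         "values": list(excluded_nodes)},
    ]
    required = {"nodeSelectorTerms": [{"matchExpressions": match_expressions}]}
    affinity = {"nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": required}}
    return {"spec": {"template": {"spec": {"affinity": affinity}}}}


patch = json.dumps(flannel_exclusion_patch(["bluefield"]))
# Print the shell command that would apply the patch.
print(f"kubectl -n kube-system patch daemonset kube-flannel-ds --type merge -p '{patch}'")
```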
Please post the output of:
kubectl get nodes -o wide
root@bluefield:/home/ubuntu# curl -sLS https://get.arkade.dev
curl: (35) error:14094419:SSL routines:ssl3_read_bytes:tlsv1 alert access denied
root@bluefield:/home/ubuntu# ls /etc/ssl/openssl.cnf
/etc/ssl/openssl.cnf
Hi Eitkin, here are the contents of the /etc/ssl/openssl.cnf file:
root@bluefield:~# cat /etc/ssl/openssl.cnf
# OpenSSL example configuration file.
# This is mostly being used for generation of certificate requests.
# Note that you can include other files from the main configuration
# file using the .include directive.
#.include filename
# This definition stops the following lines choking if HOME isn't
# defined.
HOME = .
# Extra OBJECT IDENTIFIER info:
#oid_file = $ENV::HOME/.oid
oid_section = new_oids
# To use this configuration file with the "-extfile" option of the
# "openssl x509" utility, name here the section containing the
# X.509v3 extensions to use:
# extensions =
# (Alternatively, use a configuration file that has only
# X.509v3 extensions in its main [= default] section.)
[ new_oids ]
# We can add new OIDs in here for use by 'ca', 'req' and 'ts'.
# Add a simple OID like this:
# testoid1=1.2.3.4
# Or use config file substitution like this:
# testoid2=${testoid1}.5.6
# Policies used by the TSA examples.
tsa_policy1 = 1.2.3.4.1
tsa_policy2 = 1.2.3.4.5.6
tsa_policy3 = 1.2.3.4.5.7
####################################################################
[ ca ]
default_ca = CA_default # The default ca section
####################################################################
[ CA_default ]
dir = ./demoCA # Where everything is kept
certs = $dir/certs # Where the issued certs are kept
crl_dir = $dir/crl # Where the issued crl are kept
database = $dir/index.txt # database index file.
#unique_subject = no # Set to 'no' to allow creation of
# several certs with same subject.
new_certs_dir = $dir/newcerts # default place for new certs.
certificate = $dir/cacert.pem # The CA certificate
serial = $dir/serial # The current serial number
crlnumber = $dir/crlnumber # the current crl number
# must be commented out to leave a V1 CRL
crl = $dir/crl.pem # The current CRL
private_key = $dir/private/cakey.pem# The private key
x509_extensions = usr_cert # The extensions to add to the cert
# Comment out the following two lines for the "traditional"
# (and highly broken) format.
name_opt = ca_default # Subject Name options
cert_opt = ca_default # Certificate field options
# Extension copying option: use with caution.
# copy_extensions = copy
# Extensions to add to a CRL. Note: Netscape communicator chokes on V2 CRLs
# so this is commented out by default to leave a V1 CRL.
# crlnumber must also be commented out to leave a V1 CRL.
# crl_extensions = crl_ext
default_days = 365 # how long to certify for
default_crl_days= 30 # how long before next CRL
default_md = default # use public key default MD
preserve = no # keep passed DN ordering
# A few difference way of specifying how similar the request should look
# For type CA, the listed attributes must be the same, and the optional
# and supplied fields are just that :-)
policy = policy_match
# For the CA policy
[ policy_match ]
countryName = match
stateOrProvinceName = match
organizationName = match
organizationalUnitName = optional
commonName = supplied
emailAddress = optional
# For the 'anything' policy
# At this point in time, you must list all acceptable 'object'
# types.
[ policy_anything ]
countryName = optional
stateOrProvinceName = optional
localityName = optional
organizationName = optional
organizationalUnitName = optional
commonName = supplied
emailAddress = optional
####################################################################
[ req ]
default_bits = 2048
default_keyfile = privkey.pem
distinguished_name = req_distinguished_name
attributes = req_attributes
x509_extensions = v3_ca # The extensions to add to the self signed cert
# Passwords for private keys if not present they will be prompted for
# input_password = secret
# output_password = secret
# This sets a mask for permitted string types. There are several options.
# default: PrintableString, T61String, BMPString.
# pkix : PrintableString, BMPString (PKIX recommendation before 2004)
# utf8only: only UTF8Strings (PKIX recommendation after 2004).
# nombstr : PrintableString, T61String (no BMPStrings or UTF8Strings).
# MASK:XXXX a literal mask value.
# WARNING: ancient versions of Netscape crash on BMPStrings or UTF8Strings.
string_mask = utf8only
# req_extensions = v3_req # The extensions to add to a certificate request
[ req_distinguished_name ]
countryName = Country Name (2 letter code)
countryName_default = AU
countryName_min = 2
countryName_max = 2
stateOrProvinceName = State or Province Name (full name)
stateOrProvinceName_default = Some-State
localityName = Locality Name (eg, city)
0.organizationName = Organization Name (eg, company)
0.organizationName_default = Internet Widgits Pty Ltd
# we can do this but it is not needed normally :-)
#1.organizationName = Second Organization Name (eg, company)
#1.organizationName_default = World Wide Web Pty Ltd
organizationalUnitName = Organizational Unit Name (eg, section)
#organizationalUnitName_default =
commonName = Common Name (e.g. server FQDN or YOUR name)
commonName_max = 64
emailAddress = Email Address
emailAddress_max = 64
# SET-ex3 = SET extension number 3
[ req_attributes ]
challengePassword = A challenge password
challengePassword_min = 4
challengePassword_max = 20
unstructuredName = An optional company name
[ usr_cert ]
# These extensions are added when 'ca' signs a request.
# This goes against PKIX guidelines but some CAs do it and some software
# requires this to avoid interpreting an end user certificate as a CA.
basicConstraints=CA:FALSE
# Here are some examples of the usage of nsCertType. If it is omitted
# the certificate can be used for anything *except* object signing.
# This is OK for an SSL server.
# nsCertType = server
# For an object signing certificate this would be used.
# nsCertType = objsign
# For normal client use this is typical
# nsCertType = client, email
# and for everything including object signing:
# nsCertType = client, email, objsign
# This is typical in keyUsage for a client certificate.
# keyUsage = nonRepudiation, digitalSignature, keyEncipherment
# This will be displayed in Netscape's comment listbox.
nsComment = "OpenSSL Generated Certificate"
# PKIX recommendations harmless if included in all certificates.
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer
# This stuff is for subjectAltName and issuerAltname.
# Import the email address.
# subjectAltName=email:copy
# An alternative to produce certificates that aren't
# deprecated according to PKIX.
# subjectAltName=email:move
# Copy subject details
# issuerAltName=issuer:copy
#nsCaRevocationUrl = http://www.domain.dom/ca-crl.pem
#nsBaseUrl
#nsRevocationUrl
#nsRenewalUrl
#nsCaPolicyUrl
#nsSslServerName
# This is required for TSA certificates.
# extendedKeyUsage = critical,timeStamping
[ v3_req ]
# Extensions to add to a certificate request
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
[ v3_ca ]
# Extensions for a typical CA
# PKIX recommendation.
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid:always,issuer
basicConstraints = critical,CA:true
# Key usage: this is typical for a CA certificate. However since it will
# prevent it being used as an test self-signed certificate it is best
# left out by default.
# keyUsage = cRLSign, keyCertSign
# Some might want this also
# nsCertType = sslCA, emailCA
# Include email address in subject alt name: another PKIX recommendation
# subjectAltName=email:copy
# Copy issuer details
# issuerAltName=issuer:copy
# DER hex encoding of an extension: beware experts only!
# obj=DER:02:03
# Where 'obj' is a standard or added object
# You can even override a supported extension:
# basicConstraints= critical, DER:30:03:01:01:FF
[ crl_ext ]
# CRL extensions.
# Only issuerAltName and authorityKeyIdentifier make any sense in a CRL.
# issuerAltName=issuer:copy
authorityKeyIdentifier=keyid:always
[ proxy_cert_ext ]
# These extensions should be added when creating a proxy certificate
# This goes against PKIX guidelines but some CAs do it and some software
# requires this to avoid interpreting an end user certificate as a CA.
basicConstraints=CA:FALSE
# Here are some examples of the usage of nsCertType. If it is omitted
# the certificate can be used for anything *except* object signing.
# This is OK for an SSL server.
# nsCertType = server
# For an object signing certificate this would be used.
# nsCertType = objsign
# For normal client use this is typical
# nsCertType = client, email
# and for everything including object signing:
# nsCertType = client, email, objsign
# This is typical in keyUsage for a client certificate.
# keyUsage = nonRepudiation, digitalSignature, keyEncipherment
# This will be displayed in Netscape's comment listbox.
nsComment = "OpenSSL Generated Certificate"
# PKIX recommendations harmless if included in all certificates.
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer
# This stuff is for subjectAltName and issuerAltname.
# Import the email address.
# subjectAltName=email:copy
# An alternative to produce certificates that aren't
# deprecated according to PKIX.
# subjectAltName=email:move
# Copy subject details
# issuerAltName=issuer:copy
#nsCaRevocationUrl = http://www.domain.dom/ca-crl.pem
#nsBaseUrl
#nsRevocationUrl
#nsRenewalUrl
#nsCaPolicyUrl
#nsSslServerName
# This really needs to be in place for it to be a proxy certificate.
proxyCertInfo=critical,language:id-ppl-anyLanguage,pathlen:3,policy:foo
####################################################################
[ tsa ]
default_tsa = tsa_config1 # the default TSA section
[ tsa_config1 ]
# These are used by the TSA reply generation only.
dir = ./demoCA # TSA root directory
serial = $dir/tsaserial # The current serial number (mandatory)
crypto_device = builtin # OpenSSL engine to use for signing
signer_cert = $dir/tsacert.pem # The TSA signing certificate
# (optional)
certs = $dir/cacert.pem # Certificate chain to include in reply
# (optional)
signer_key = $dir/private/tsakey.pem # The TSA private key (optional)
signer_digest = sha256 # Signing digest to use. (Optional)
default_policy = tsa_policy1 # Policy if request did not specify it
# (optional)
other_policies = tsa_policy2, tsa_policy3 # acceptable policies (optional)
digests = sha1, sha256, sha384, sha512 # Acceptable message digests (mandatory)
accuracy = secs:1, millisecs:500, microsecs:100 # (optional)
clock_precision_digits = 0 # number of digits after dot. (optional)
ordering = yes # Is ordering defined for timestamps?
# (optional, default: no)
tsa_name = yes # Must the TSA name be included in the reply?
# (optional, default: no)
ess_cert_id_chain = no # Must the ESS cert id chain be included?
# (optional, default: no)
ess_cert_id_alg = sha1 # algorithm to compute certificate
# identifier (optional, default: sha1)
Thanks,
Are you suggesting that flannel should not be allowed to run on the BlueField-2?
We think the network fabric needs to be hosted on all the nodes for sending the queries.
Here is the output:
kubectl get nodes -o wide
NAME        STATUS   ROLES                  AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION         CONTAINER-RUNTIME
bluefield   Ready    <none>                 2d22h   v1.22.0   10.xx.xx.xx   <none>        Ubuntu 20.04.2 LTS   5.4.0-1013-bluefield   docker://20.10.8
themis      Ready    control-plane,master   2d22h   v1.22.0   10.xx.xx.xx   <none>        Ubuntu 20.04.2 LTS   5.4.0-81-generic       docker://20.10.8
And this is the output of pods:
kubectl get pods -A -o wide
NAMESPACE     NAME                             READY   STATUS             RESTARTS     AGE     IP              NODE        NOMINATED NODE   READINESS GATES
kube-system   coredns-78fcd69978-sjlj5         1/1     Running            0            2d22h   10.93.120.2     themis      <none>           <none>
kube-system   coredns-78fcd69978-z7v99         1/1     Running            0            2d22h   10.93.120.3     themis      <none>           <none>
kube-system   etcd-themis                      1/1     Running            3            2d22h   10.xx.xx.xx     themis      <none>           <none>
kube-system   kube-apiserver-themis            1/1     Running            0            2d22h   10.xx.xx.xx     themis      <none>           <none>
kube-system   kube-controller-manager-themis   1/1     Running            0            2d22h   10.93.226.77    themis      <none>           <none>
kube-system   kube-flannel-ds-4smmg            1/1     Running            0            2d22h   10.xx.xx.xx     themis      <none>           <none>
kube-system   kube-flannel-ds-k2l7j            0/1     CrashLoopBackOff   823 ( ago)   2d22h   10.93.231.112   bluefield   <none>           <none>
kube-system   kube-proxy-k92bg                 1/1     Running            0            2d22h   10.xx.xx.xx     themis      <none>           <none>
kube-system   kube-proxy-tkk5x                 1/1     Running            0            2d22h   10.xx.xx.xx     bluefield   <none>           <none>
kube-system   kube-scheduler-themis            1/1     Running            0            2d22h   10.xx.xx.xx     themis      <none>           <none>
Thanks. The content you posted matches “/etc/ssl/openssl.cnf.orig”; specifically, the file doesn't define the PKA engine that “/etc/ssl/openssl.cnf.mlnx” defines, which means the OpenSSL bug shouldn't occur now.
Did you test curl/wget after you performed the “cp /etc/ssl/openssl.cnf.orig /etc/ssl/openssl.cnf” command? I would expect them both to work without errors.
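A quick way to confirm which variant is active is a byte-for-byte comparison against the two BFB files. A sketch; the function name is hypothetical, and it assumes the `OPENSSL_CONF` environment variable is unset, since that variable would point OpenSSL at a different file entirely.

```python
import filecmp
from pathlib import Path


def active_openssl_variant(ssl_dir: str = "/etc/ssl") -> str:
    """Report whether openssl.cnf matches the BFB's .orig or .mlnx file."""
    base = Path(ssl_dir)
    active = base / "openssl.cnf"
    for label in ("orig", "mlnx"):  # orig: PKA disabled, mlnx: PKA enabled
        candidate = base / f"openssl.cnf.{label}"
        # shallow=False compares contents instead of trusting stat() signatures.
        if candidate.is_file() and filecmp.cmp(active, candidate, shallow=False):
            return label
    return "custom"  # neither variant: edited by hand or replaced by a package
```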
We did; we still get similar errors. We rebooted the machine as a sanity check (and replaced the config yet again), but with no success. Is there any other point of failure that comes to mind?
Could you please post the errors you get from both “curl” and “wget”? Judging from the curl error you posted above, this looks like an HTTPS error rather than a crash (segfault) like the initial error.
I tried your command above and it worked on my setup, so it would be great if you could try accessing a different server so we can check what is causing this issue:
This is correct. We'll provide a CNI in the future, but for now we only support “host network” pods.
The CNI loopback configuration is only there to get kubelet running.
You should be able to get a micro app pod deployed and running.
I'm currently running an NGINX pod myself in my setup:
root@kmaster:/# kubectl get pod -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP             NODE                        NOMINATED NODE   READINESS GATES
static-web   1/1     Running   0          13d   10.110.169.9   bf-02.internal.nvidia.com   <none>           <none>
root@kmaster:/# kubectl get svc -o wide
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE    SELECTOR
kubernetes   ClusterIP   10.233.0.1     <none>        443/TCP          118d   <none>
nginx        NodePort    10.233.44.43   <none>        8080:31822/TCP   20d    app=nginx
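Since only host-network pods are supported on the BF2 for now, a pod meant for it should set `hostNetwork: true` and be pinned to the DPU node. A sketch that prints such a manifest as JSON, which `kubectl apply -f -` accepts; the pod name `nginx-bf` is hypothetical, the node's hostname label is assumed to be `bluefield`, and the `nginx` image is assumed to publish an arm64 variant.

```python
import json


def host_network_pod(name: str, node: str, image: str = "nginx") -> dict:
    """Pod manifest that shares the node's network namespace, pinned to one node."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Use the host's network stack, so no CNI-assigned pod IP is needed.
            "hostNetwork": True,
            # Schedule only on the DPU worker.
            "nodeSelector": {"kubernetes.io/hostname": node},
            "containers": [{
                "name": name,
                "image": image,  # assumed to provide an arm64 variant
                "ports": [{"containerPort": 80}],
            }],
        },
    }


print(json.dumps(host_network_pod("nginx-bf", "bluefield"), indent=2))
```

Piping the script's output into `kubectl apply -f -` on the kmaster node would create the pod; with host networking, port 80 is bound directly on the BF2.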
root@bluefield:/home/ubuntu# wget https://linux.mellanox.com/public/repo/doca/1.1/ubuntu20.04/doca.list
--2021-08-25 03:35:11--  https://linux.mellanox.com/public/repo/doca/1.1/ubuntu20.04/doca.list
Resolving linux.mellanox.com (linux.mellanox.com)... 168.62.212.37
Connecting to linux.mellanox.com (linux.mellanox.com)|168.62.212.37|:443... connected.
OpenSSL: error:14094419:SSL routines:ssl3_read_bytes:tlsv1 alert access denied
Unable to establish SSL connection.
root@bluefield:/home/ubuntu# curl -sLS https://linux.mellanox.com/public/repo/doca/1.1/ubuntu20.04/doca.list
curl: (35) error:14094419:SSL routines:ssl3_read_bytes:tlsv1 alert access denied
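To compare several servers in one pass, a TLS handshake can also be attempted from Python on the DPU. A sketch: the helper names are hypothetical, the default host list reuses the two servers from this thread, and it is an assumption that Python's `ssl` module (linked against the system OpenSSL library, which `ssl.OPENSSL_VERSION` reports) loads /etc/ssl/openssl.cnf and the PKA engine the same way curl and wget do. A result that differs from curl's would itself be a useful data point.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 5.0):
    """Try one TLS handshake; return (ok, detail) instead of raising."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()  # negotiated protocol, e.g. TLSv1.3
    except OSError as exc:  # covers DNS, TCP, and ssl.SSLError failures
        return False, f"{type(exc).__name__}: {exc}"


def report(hosts=("linux.mellanox.com", "get.arkade.dev")):
    """Print the linked OpenSSL build and one result line per host."""
    print(ssl.OPENSSL_VERSION)
    for host in hosts:
        ok, detail = probe_tls(host)
        print(f"{host}: {'OK' if ok else 'FAIL'} ({detail})")
```

Running `report()` on both the BF2 and the x86 host gives a side-by-side comparison.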
Thanks for the output traces. The config file update did solve the original (segfault) issue, but you now seem to be hitting a certificate-related issue, since it happens with both curl and wget and isn't site-specific.
Could you please supply the following so we could try to reproduce it on our end?
- Which BF image have you installed: BlueField OS 3.7 (July) or BlueField OS 3.6 (March)?
- OFED version:
$ ofed_info -s
- OpenSSL version:
$ openssl version
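When comparing several setups, the version checks above can be gathered in one go. A minimal sketch; the helper names are hypothetical, and tools that are not installed are reported as `None` rather than raising.

```python
import shutil
import subprocess


def command_output(*argv: str):
    """Return a tool's trimmed output, or None if it is not installed."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(list(argv), capture_output=True, text=True)
    # Some tools print version info on stderr, so fall back to it.
    return result.stdout.strip() or result.stderr.strip()


def collect_versions() -> dict:
    """The version strings requested above, keyed by component."""
    return {
        "ofed": command_output("ofed_info", "-s"),
        "openssl": command_output("openssl", "version"),
    }
```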
If you didn't have /etc/cni/net.d/99-loopback.conf, you are probably not running the latest BlueField OS version, in which case an update will most likely resolve this issue.
We are using BlueField OS 3.7, and I can see that /etc/cni/net.d/99-loopback.conf is configured correctly:
cat /etc/cni/net.d/99-loopback.conf
{
  "cniVersion": "0.3.1",
  "type": "loopback"
}
root@bluefield:/home/ubuntu# ofed_info -s
MLNX_OFED_LINUX-5.4-1.0.3.0:
root@bluefield:/home/ubuntu# openssl version
OpenSSL 1.1.1f 31 Mar 2020