AIAA Connection Refused using Kubernetes

Hi,
I’m currently using Kubernetes to run Clara Train SDK v2. I start the image with this command:
kubectl run -n syifa clara-v2 -it --tty --image=nvcr.io/nvidia/clara-train-sdk:v2.0 start_aas.sh
The terminal shows this output:

If you don’t see a command prompt, try pressing enter.

NOTE: Legacy NVIDIA Driver detected. Compatibility mode ENABLED.

NOTE: Detected MOFED driver 4.4-2.0.7; version automatically updated.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 …

/opt/nvidia/medical/aiaa-launch-config.json

ENGINE:: engine=TRTIS
TRTIS:: Backend is enabled

TRTIS:: trtis_ip=localhost
TRTIS:: Will setup TRTIS Server on localhost

TRTIS:: trtis_http_port=8000
TRTIS:: trtis_grpc_port=8001
TRTIS:: trtis_metrics_port=8002
TRTIS:: trtis_proto=grpc
TRTIS:: trtis_model_path=/var/nvidia/aiaa/trtis_models
TRTIS:: trtis_verbose=false
TRTIS:: trtis_log=/var/nvidia/aiaa/logs/host-80/trtis.log
TRTIS:: trtis_start_timeout=120
TRTIS:: trtis_model_timeout=30

TRTIS:: Waiting 1 seconds to fully up…
TRTIS:: Server started with pid: 103

AIAA:: aiaa_port=80
AIAA:: aiaa_log_file=/var/nvidia/aiaa/logs/host-80/aiaa.log
AIAA:: aiaa_log_dir=/var/nvidia/aiaa/logs/host-80
AIAA:: aiaa_workspace=/var/nvidia/aiaa
AIAA:: aiaa_ssl=false
AIAA:: aiaa_ssl_cert_file=/etc/ssl/certs/ssl-cert-snakeoil.pem
AIAA:: aiaa_ssl_pkey_file=/etc/ssl/private/ssl-cert-snakeoil.key

Stopping Apache httpd web server apache2
Site 000-default disabled.
To activate the new configuration, you need to run:
service apache2 reload
Enabling site AIAA.
To activate the new configuration, you need to run:
service apache2 reload
Starting AIAA Server…
AH00558: apache2: Could not reliably determine the server’s fully qualified domain name, using 10.233.116.248. Set the ‘ServerName’ directive globally to suppress this message

But after I set up port forwarding with this command:
kubectl port-forward clara-v2-64f4dd4c4-r8jh2 5000:5000 -n syifa
and open localhost:5000, the browser says “localhost didn’t send any data.”
The error in the terminal is shown below.

E0413 14:59:28.775394   21411 portforward.go:400] an error occurred forwarding 5000 -> 5000: error forwarding port 5000 to pod d04aa4f526d6bb6a44f9640a7088459df3bb1195d431ffd16e6520072fab987e, uid : exit status 1: 2020/04/13 14:59:28 socat[16300] E connect(6, AF=2 127.0.0.1:5000, 16): Connection refused

I tried exiting and restarting the pod, but the AIAA server often stops before it finishes starting. I never experienced this problem with Clara Train SDK v1.0.

Can you help me with my problem? Thank you so much!

Hi,

Can you try this new v3.0: https://ngc.nvidia.com/catalog/containers/nvidia:clara-train-sdk/

From the error, I guess it is a port issue.
In a plain Docker environment, we need to specify the flag “-p [some port on host]:80” when starting the container,
because AIAA uses Apache, which is bound to port 80 inside the container.

When you run start_aas.sh you don’t specify a port; that is correct.
Try this and let us know if there are any issues.
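
As a rough sketch of the Docker case described above, the snippet below assembles a `docker run` command that maps a host port to container port 80. The host port 5000 is an arbitrary choice, the image tag is the one from the original question, and the function name is made up for illustration. GPU and shared-memory flags are deliberately left out because they depend on the host setup. The command is only printed, so it can be reviewed before being run.

```shell
# Hypothetical values for illustration; adjust for your environment.
HOST_PORT=5000            # any free port on the host (assumption)
CONTAINER_PORT=80         # AIAA (Apache) is bound to port 80 inside the container
IMAGE="nvcr.io/nvidia/clara-train-sdk:v2.0"   # tag taken from the question

# Print the docker command instead of executing it.
# GPU and shared-memory flags are intentionally omitted here.
aiaa_docker_cmd() {
  printf 'docker run -it --rm -p %s:%s %s start_aas.sh\n' \
    "$HOST_PORT" "$CONTAINER_PORT" "$IMAGE"
}

aiaa_docker_cmd
```

With a mapping like this, AIAA should be reachable on the host at localhost:5000, while start_aas.sh itself is still started without any port argument.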

The v2.0 image I pulled a few months back exposed the AIAA server on port 5000. The v2.0 image I pulled today exposes the AIAA server on port 80. The new v2.0 image also uses Apache instead of what I assume used to be Flask.

Are such retroactive changes of already released images to be expected with Clara?

We are sorry for the inconvenience. We released a hotfix for version 2.0, and the version tag should have reflected that as 2.1. Sorry about the confusion. This is not expected from Clara; we plan to maintain and release version-controlled updates. At this point we recommend you use our latest version, Clara Train v3.0, which includes the latest updates to the AI Assisted Annotation framework: https://news.developer.nvidia.com/clara-train-deploy-medical-imaging-developers/. We will make sure this sort of inconvenience does not happen again.

Hello everyone. As I understand it, it is better to use Clara v3.0 because v2.0 doesn’t work correctly?

Hi, thanks for your reply.

I’ve tried Clara Train SDK v3, and the AIAA server runs correctly and no longer stops suddenly as I described before. But I still have an issue with port forwarding. As you said, I don’t specify a port when starting the server, but when I access it through kubectl it still responds with “localhost didn’t send any data,” and the error message is the same as in my previous question (connection refused).

Do you have any idea why I can’t access the AIAA server from my local machine?

I suggest you use Clara v3.0, as it has more features and is better supported.

Hi there,

From your response, I assume you can run the AIAA server correctly in a Docker environment.
For the port forwarding with kubectl, I think you can try kubectl port-forward [your pod] 5000:80,
because AIAA now listens on port 80 by default inside the container.
So if you want your local port 5000 to work, you need to use this mapping.
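
A minimal sketch of that mapping follows. The namespace and pod name are copied from the original question purely as placeholders (a v3.0 pod will have a different name), and the function name is hypothetical. The command is printed rather than executed, so it can be checked first.

```shell
# Placeholder values from the question; replace with your own pod and namespace.
NAMESPACE="syifa"
POD="clara-v2-64f4dd4c4-r8jh2"
LOCAL_PORT=5000           # port on your local machine
CONTAINER_PORT=80         # port AIAA listens on inside the container

# Print the kubectl command: LOCAL_PORT on the left, container port on the right.
aiaa_port_forward_cmd() {
  printf 'kubectl port-forward %s %s:%s -n %s\n' \
    "$POD" "$LOCAL_PORT" "$CONTAINER_PORT" "$NAMESPACE"
}

aiaa_port_forward_cmd
```

The earlier 5000:5000 mapping was refused because nothing in the pod listens on port 5000; the startup log reports aiaa_port=80.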

hi,

It solved my problem. So AIAA listens on port 80 in v3.0. Thank you for your help!
