Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
A100
• How to reproduce the issue?
Run TAO/getting_started_v4.0.0/notebooks/tao_api_starter_kit/api/end2end/ssd.ipynb.
I get to the point where I upload the training data (approx. 5 GB):
# Upload
files = [("file",open(train_dataset_path,"rb"))]
endpoint = f"{base_url}/dataset/{dataset_id}/upload"
response = requests.post(endpoint, files=files, headers=headers, verify=rootca)
print(response)
print(response.json())
I get this overflow error:
OverflowError Traceback (most recent call last)
/tmp/ipykernel_147669/3804223585.py in <module>
4 endpoint = f"{base_url}/dataset/{dataset_id}/upload"
5
----> 6 response = requests.post(endpoint, files=files, headers=headers, verify=rootca)
7
8 print(response)
~/.pyenv/versions/3.7.16/envs/PY37/lib/python3.7/site-packages/requests/api.py in post(url, data, json, **kwargs)
113 """
114
--> 115 return request("post", url, data=data, json=json, **kwargs)
116
117
~/.pyenv/versions/3.7.16/envs/PY37/lib/python3.7/site-packages/requests/api.py in request(method, url, **kwargs)
57 # cases, and look like a memory leak in others.
58 with sessions.Session() as session:
---> 59 return session.request(method=method, url=url, **kwargs)
60
61
~/.pyenv/versions/3.7.16/envs/PY37/lib/python3.7/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
585 }
...
-> 1003 return self._sslobj.write(data)
1004 else:
1005 return super().send(data, flags)
OverflowError: string longer than 2147483647 bytes
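(2147483647 bytes is 2**31 - 1, so this looks like a signed 32-bit limit in the SSL write path; as far as I can tell, requests builds the entire multipart body in memory before writing it.) A quick size check, reusing the notebook's train_dataset_path variable, confirms the archive is well past that limit:
import os

limit = 2 ** 31 - 1  # the 2147483647-byte ceiling reported in the traceback
size = os.path.getsize(train_dataset_path)
print(size, size > limit)  # ~5 GB, so True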
I assumed (even though I don't fully understand why it happens) that this was because of the size of the training dataset, and modified the code to:
with open(train_dataset_path, "rb") as f:
    response = requests.post(endpoint, headers=headers, data=f, stream=True, verify=rootca)  # stream the file instead of buffering it
    response.raise_for_status()
With this I get a 500 (Internal Server Error); I suspect the raw streamed body is no longer the multipart/form-data the endpoint expects:
HTTPError Traceback (most recent call last)
/tmp/ipykernel_189197/1635968353.py in <module>
1 with open(train_dataset_path, "rb") as f:
2 response = requests.post(endpoint, headers=headers, data=f, stream=True, verify=rootca) # stream with 1mb chunks
----> 3 response.raise_for_status()
~/.pyenv/versions/3.7.16/envs/PY37/lib/python3.7/site-packages/requests/models.py in raise_for_status(self)
1019
1020 if http_error_msg:
-> 1021 raise HTTPError(http_error_msg, response=self)
1022
1023 def close(self):
HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: https://aisrv.gnet.lan:31549/api/v1/user/f2d3c55a-f3dd-5dff-badc-851e27460122/dataset/32a03a22-8a6f-4a4c-a6c0-93373e9638b0/upload
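For what it's worth, the next thing I plan to try is a streaming multipart upload via requests-toolbelt, which should keep the "file" form field the endpoint expects while reading the archive lazily (untested sketch; assumes requests-toolbelt is installed, and the filename is just illustrative):
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

with open(train_dataset_path, "rb") as f:
    # the encoder streams the multipart body from the file handle instead of buffering it
    encoder = MultipartEncoder(fields={"file": ("dataset.tar", f, "application/octet-stream")})
    upload_headers = {**headers, "Content-Type": encoder.content_type}
    response = requests.post(endpoint, data=encoder, headers=upload_headers, verify=rootca)
print(response)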
The setup is a k8s cluster with a master (24-core CPU server with 16 GB RAM, Ubuntu Server 22.04) and a GPU node (NVIDIA DGX Station A100).
The Jupyter notebook server runs on the DGX (virtualenv, Python 3.7.16).
Cheers,
Ganindu.
P.S.
The smaller evaluation dataset (approx. 800 MB), however, uploaded successfully (Response 201) using the same code:
files = [("file",open(eval_dataset_path,"rb"))]
endpoint = f"{base_url}/dataset/{eval_dataset_id}/upload"
response = requests.post(endpoint, files=files, headers=headers, verify=rootca)
print(response)
print(response.json())