Training Info

Good Morning,

Following the documentation at https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/model.html, we have a question.

Suppose we want to train on a dataset of patient images. Suppose folder N contains the PNG images of patient N, and patient N has pathology X. We use label 1 for the presence of the pathology and label 0 for its absence. Our question is: in the JSON file, must label 1 be assigned to every PNG image of that patient, including the images in which the pathology is not evident?

Thanks in advance for your response.

Best Regards

Hi
Thanks for your interest in Clara Train SDK. If I understand you correctly, you have N images and are doing binary classification (0 and 1)?
If so, take a look at the chest X-ray classification example. It is a multi-label problem (15 independent binary classes). You would then simply change each image label from
1,1,0,1,0....
to
1 for positive
0 for negative
You will also need to change the loss function and metric in the train_config JSON file.
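As a rough sketch of that label change, the snippet below builds a datalist in which each image carries a single binary label. It assumes a datalist shaped like `{"training": [{"image": ..., "label": [...]}]}` and that a 0/1 decision is already available for every image; `make_binary_datalist`, the file names, and that schema are illustrative assumptions, so check them against the chest X-ray example in your SDK version.

```python
import json


def make_binary_datalist(image_labels: dict[str, int]) -> dict:
    """Build a datalist where every image has a single 0/1 label.

    image_labels maps an image path to 1 (positive) or 0 (negative).
    The {"training": [{"image": ..., "label": [...]}]} layout is an
    ASSUMED schema, modeled on a multi-label datalist reduced to one class.
    """
    entries = []
    for path, label in sorted(image_labels.items()):
        if label not in (0, 1):
            raise ValueError(f"label for {path} must be 0 or 1, got {label!r}")
        # A one-element list replaces the old vector such as [1, 1, 0, 1, 0, ...].
        entries.append({"image": path, "label": [label]})
    return {"training": entries}


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    datalist = make_binary_datalist(
        {"patient_1/img_001.png": 1, "patient_2/img_001.png": 0}
    )
    print(json.dumps(datalist, indent=2))
```

With a single binary output, a sigmoid with binary cross-entropy loss and a binary metric such as AUC are the usual choices; how those are named in the training config depends on your SDK version.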

Hope that helps

Good Morning,

Yes, 1 is for positive and 0 for negative. Let me explain myself better: if a single image does not clearly show that it is positive, but in reality it is, should it be tagged 1 or 0?

For example, in a series of DICOM images of a person with a pathology, the pathology may not be evident in some of the images. Should these images, which we nonetheless know belong to a person with the pathology, be tagged 1 or 0?

Thanks in advance for your response.

Best Regards

Hi
So it seems some images are confirmed positive while others are negative. Is there a third group of unknowns?
You need to understand what that unknown third group is. Is it other diseases, which you want to treat as one group covering all other diseases? In that case you could use one class with three values:
0 --> other diseases
1 --> negative
2 --> positive
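If you go with one class and three values, the encoding could look like the sketch below. The category names and helper functions are hypothetical; the integer codes follow the list above, and the one-hot form is what a softmax with cross-entropy loss would typically be compared against.

```python
# Integer codes from the list above: 0 = other diseases, 1 = negative, 2 = positive.
# The category names are hypothetical placeholders.
LABEL_CODES = {"other": 0, "negative": 1, "positive": 2}
NUM_CLASSES = len(LABEL_CODES)


def encode(category: str) -> int:
    """Map a category name to its integer code."""
    return LABEL_CODES[category]


def one_hot(code: int) -> list[int]:
    """One-hot vector for a 3-way, mutually exclusive label."""
    if not 0 <= code < NUM_CLASSES:
        raise ValueError(f"code must be in [0, {NUM_CLASSES}), got {code}")
    return [1 if i == code else 0 for i in range(NUM_CLASSES)]


print(encode("positive"), one_hot(encode("positive")))  # 2 [0, 0, 1]
```

Unlike the 15 independent binary outputs in the chest X-ray example, these three values are mutually exclusive, so exactly one entry of the vector is 1.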

Or is your third group simply missing labels that you will eventually assign as negative or positive? In that case you can simply leave those images out of the initial training.
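Leaving unlabeled images out of the initial training set can be as simple as filtering the datalist. This sketch assumes a missing label is stored as `None` or that the `"label"` key is absent; that convention and the `split_labeled` helper are assumptions for illustration.

```python
def split_labeled(entries: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split datalist entries into (labeled, unlabeled).

    An entry counts as unlabeled when its "label" key is missing or None
    (an ASSUMED convention). Unlabeled images are set aside for later.
    """
    labeled, unlabeled = [], []
    for entry in entries:
        if entry.get("label") is None:
            unlabeled.append(entry)
        else:
            labeled.append(entry)
    return labeled, unlabeled


entries = [
    {"image": "a.png", "label": [1]},
    {"image": "b.png", "label": None},
    {"image": "c.png"},
    {"image": "d.png", "label": [0]},
]
train, held_out = split_labeled(entries)
print([e["image"] for e in train])     # ['a.png', 'd.png']
print([e["image"] for e in held_out])  # ['b.png', 'c.png']
```

Once those images receive a positive or negative label, they can be moved back into the training list.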

Another possibility, in case I misunderstood you, is that you have one label per patient and multiple images without per-image labels. This is a much harder problem.

I hope that helps and that I understood your problem.

No, we don’t have a third group.

The case is this: we know that the images belong to a patient with a pathology, but some of these images, taken individually, would not show it clearly. So, should these images, in which the pathology is not evident, be given label 0 anyway?

We know both the patient label and the images related to that patient. The question is this: if we know that a patient has label 1, do all of that patient's images have to be labeled 1, even those in which the pathology is not evident?

Thanks in advance for your response.

Best regards

Hi
I think your problem should be defined as follows:
For each patient with 50 images, run inference on each image; if X or more images are positive, then the patient is positive.
With that framing, you should train a model that just makes a binary yes/no decision at the image level. So yes, images where the pathology is not clear should be labeled 0.
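The patient-level rule above can be sketched as follows. The per-image decision threshold of 0.5 and the value of X (`min_positive`) are placeholders you would tune on validation data; `patient_is_positive` is a hypothetical helper, not part of Clara Train.

```python
def patient_is_positive(
    image_probs: list[float], min_positive: int, threshold: float = 0.5
) -> bool:
    """Return True if at least `min_positive` images are predicted positive.

    image_probs: per-image probabilities of the pathology from the image-level model.
    threshold: probability at or above which one image counts as positive (assumed 0.5).
    min_positive: the "X" in the rule; it must be tuned, it is not a known value.
    """
    positives = sum(1 for p in image_probs if p >= threshold)
    return positives >= min_positive


# Example: 5 images from one patient, 2 of which reach the threshold.
probs = [0.1, 0.8, 0.3, 0.9, 0.2]
print(patient_is_positive(probs, min_positive=2))  # True
print(patient_is_positive(probs, min_positive=3))  # False
```

Because the patient decision depends on a count, lowering X flags more patients as positive, trading more false positives for fewer missed patients.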
Be careful not to have a large class imbalance. Ideally you want a 1:1 ratio, but that is usually not the case, since you have more negatives than positives. There are multiple ways to compensate for this in your loss function or in your sampling. But once you reach 1:10, the problem becomes much harder: the network can fail to learn anything and simply always return a negative result.
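Two common ways to compensate for imbalance are weighting the positive class in the loss and oversampling positives. The sketch below only computes the weights in plain Python; feeding them into, for example, PyTorch's `BCEWithLogitsLoss(pos_weight=...)` or `WeightedRandomSampler` is one option, and whether your Clara Train version exposes these through its config is something to check. The helper name `imbalance_weights` is hypothetical.

```python
from collections import Counter


def imbalance_weights(labels: list[int]) -> tuple[float, list[float]]:
    """Compute a positive-class loss weight and per-sample sampling weights.

    labels: per-image binary labels (0 = negative, 1 = positive).
    Returns (pos_weight, sample_weights):
      pos_weight = n_negative / n_positive, used to scale the loss on positives;
      sample_weights give each class the same total weight when sampling.
    """
    counts = Counter(labels)
    n_neg, n_pos = counts[0], counts[1]
    if n_neg == 0 or n_pos == 0:
        raise ValueError("need at least one positive and one negative image")
    pos_weight = n_neg / n_pos
    # Inverse class frequency: the rare positives are drawn more often.
    sample_weights = [1.0 / counts[y] for y in labels]
    return pos_weight, sample_weights


labels = [0] * 10 + [1]  # a 1:10 positive:negative ratio
pos_weight, weights = imbalance_weights(labels)
print(pos_weight)                                # 10.0
print(round(sum(weights[:10]), 6), weights[10])  # 1.0 1.0
```

Weighting keeps every image but makes mistakes on positives cost more, while sampling changes how often each image is seen. Either way, track a metric such as per-class recall or AUC rather than accuracy, since a model that always predicts negative already reaches about 91% accuracy at 1:10.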

Hope that helps

Good morning,

OK, the problem has been clarified. Thank you.

Best regards