New Self-Paced Course: Synthetic Tabular Data Generation Using Transformers

Originally published at: Courses – NVIDIA

Synthetic data generation is a data augmentation technique for increasing the robustness of models by supplying additional training data. Explore the use of Transformers for synthetic tabular data generation in the new self-paced course.

@jwitsoe thanks for this. What is the minimum sample size we should use? That is, how many rows are considered the minimum for generating synthetic tabular data?

By the way, how do I ask questions after I purchase the self-paced training course from DLI? I have purchased the “Synthetic Tabular Data Generation” course.

Hi @iamexperimentingnow1,
Here is the response from our SME:

Minimum is generating 1 row.

That is, ideally you pass in as much historical context (rows) as possible, leaving enough room to generate 1 additional row.

Suppose TOKENS_PER_ROW = 127 (adding 1 for the newline char makes it 128).

SEQ_LEN = 4096

Thus NROWS = 4096 // 127 = 32 # don’t forget to also account for the token

So you can pass in 31 rows and then expect to generate an additional row. Since 4096/127 is not a whole number (about 32.25), you can end up with an extra partial row (about 1.25 rows total), which can be trimmed (just the last 0.25 row).
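
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the row budget described above. It is only an illustration, not code from the course: the function names `context_budget` and `trim_to_full_rows` are hypothetical, and `reserved_tokens` is an assumed knob for any extra tokens you need to set aside. It uses 127 tokens per row, as in the division above.

```python
from typing import List, Tuple


def context_budget(seq_len: int, tokens_per_row: int,
                   reserved_tokens: int = 0) -> Tuple[int, int, int]:
    """Return (rows_that_fit, context_rows, leftover_tokens).

    reserved_tokens is a hypothetical allowance for any extra tokens
    that must also fit in the sequence (assumption, not from the course).
    """
    usable = seq_len - reserved_tokens
    rows_that_fit = usable // tokens_per_row      # whole rows only
    context_rows = rows_that_fit - 1              # leave room to generate 1 row
    leftover_tokens = usable - context_rows * tokens_per_row
    return rows_that_fit, context_rows, leftover_tokens


def trim_to_full_rows(generated: List[int], tokens_per_row: int) -> List[int]:
    """Drop the trailing partial row from a list of generated token ids."""
    full_rows = len(generated) // tokens_per_row
    return generated[: full_rows * tokens_per_row]


SEQ_LEN = 4096
TOKENS_PER_ROW = 127

rows_that_fit, context_rows, leftover = context_budget(SEQ_LEN, TOKENS_PER_ROW)
print(rows_that_fit, context_rows, leftover)      # 32 31 159
print(round(leftover / TOKENS_PER_ROW, 2))        # 1.25 rows of room remain

# Stand-in token ids for whatever the model generates in the leftover space.
generated = list(range(leftover))
print(len(trim_to_full_rows(generated, TOKENS_PER_ROW)))  # 127, i.e. one full row
```

With 31 context rows, 159 tokens remain, which is one full row plus a partial row that the trimming step discards.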

Regarding your other question: you can reach out to dli-help@nvidia.com if you have questions after your purchase.