Using Kaldi for speech recognition

I hit a roadblock while trying to use Kaldi on a corpus of English-language data with a Listen-Attend-Spell implementation that was designed for Chinese.

I do not know how to convert my VCTK data into Kaldi format. These are the instructions I found, but they are not enough for me to do the implementation:

Task dependent. You have to make data the following preparation part by yourself.
But you can utilize Kaldi recipes in most cases
Generate wav.scp, text, utt2spk, spk2utt (segments)
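
For reference, the four files named in those instructions are plain text with one entry per line: wav.scp maps an utterance ID to an audio path, text maps an utterance ID to its transcript, utt2spk maps an utterance ID to a speaker ID, and spk2utt lists every utterance ID for each speaker. Below is a minimal Python sketch of how they could be generated. It assumes the corpus is laid out as wav48/<speaker>/<utt>.wav and txt/<speaker>/<utt>.txt, which may not match every VCTK release, and the function names are hypothetical, not part of Kaldi or of the Listen-Attend-Spell code.

```python
from collections import defaultdict
from pathlib import Path


def build_kaldi_entries(corpus_root):
    """Collect (utt_id, speaker, wav_path, transcript) tuples, sorted by utt_id.

    Assumed layout (adjust the glob if your copy differs):
      <corpus_root>/wav48/<speaker>/<utt_id>.wav
      <corpus_root>/txt/<speaker>/<utt_id>.txt
    """
    root = Path(corpus_root)
    entries = []
    for wav in root.glob("wav48/*/*.wav"):
        utt_id = wav.stem            # e.g. p225_001 (speaker ID is the prefix)
        speaker = wav.parent.name    # e.g. p225
        txt = root / "txt" / speaker / (utt_id + ".txt")
        if not txt.is_file():
            continue  # skip audio that has no transcript
        # Collapse whitespace so the transcript fits on a single line.
        transcript = " ".join(txt.read_text(encoding="utf-8").split())
        entries.append((utt_id, speaker, str(wav.resolve()), transcript))
    # Kaldi expects these files sorted by utterance ID.
    return sorted(entries)


def write_kaldi_data_dir(entries, out_dir):
    """Write wav.scp, text, utt2spk and spk2utt into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    spk2utt = defaultdict(list)
    with open(out / "wav.scp", "w", encoding="utf-8") as wav_scp, \
         open(out / "text", "w", encoding="utf-8") as text, \
         open(out / "utt2spk", "w", encoding="utf-8") as utt2spk:
        for utt_id, speaker, wav_path, transcript in entries:
            wav_scp.write(f"{utt_id} {wav_path}\n")
            text.write(f"{utt_id} {transcript}\n")
            utt2spk.write(f"{utt_id} {speaker}\n")
            spk2utt[speaker].append(utt_id)
    with open(out / "spk2utt", "w", encoding="utf-8") as f:
        for speaker in sorted(spk2utt):
            f.write(f"{speaker} {' '.join(spk2utt[speaker])}\n")
```

It could be called as write_kaldi_data_dir(build_kaldi_entries("VCTK-Corpus"), "data/train"), where both paths are placeholders. The sketch does not normalize the transcripts (case, punctuation) or resample the audio, and it writes no segments file, which is only needed when utterances are cut out of longer recordings. Kaldi's own utils/validate_data_dir.sh can be used to check the resulting directory.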

Any ideas? I can provide additional details as needed.