pre-training: Training an artifical intelligence model with one task, in order to make future tasks easier to learn.
An average attention mechanism might consist of a weighted sum over a couple of inputs, where in fact the weight for each input is computed by another part of the neural network. Trained on 70,000 hours of IDM-labeled online video, our behavioral cloning model (the “VPT foundation model”) accomplishes tasks in Minecraft which are nearly…