Respond to Limited or No Labeled Data in The Realm of NLP
Published:
Lack of labeled data for training is such a cliché in the data science and AI field. In data science, only large companies which accumulated sufficient user or machine records bother to research fancy predictive models to obtain a systematic understanding of the patterns with statistical evidence. Regarding AI, the algorithm development was once stuck at a bottleneck until the huge ImageNet dataset was open sourced which led to a blossom of neural network research in computer vision. In NLP, a decent, large go-to training set is still missing. To overcome this issue, researchers made enormous amount of effort in pre-training zero-shot or few-shot large language models from plain text.