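As technology advances rapidly, the field of artificial intelligence keeps moving forward. OpenAI recently released a large language model called GPT-2. The model has a striking parameter count (up to 1.5 billion), and it can continue a given text fluently and even generate entire articles, much as a human writer would.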
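GPT-2's training data comes from high-quality content on the internet: web pages linked from highly rated posts on Reddit. Because of this filter, the content tends to be of good quality and quite diverse. Through unsupervised training, the model learns to predict the next word given the text that precedes it. Because the dataset is full of rich and varied text, the results are impressive: the model can continue a sentence fluently and can also produce a complete article.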
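The next-word objective described above can be illustrated with a deliberately tiny stand-in. The sketch below is not GPT-2 (which is a large neural network); it is a bigram counter, and the names `train_bigram`, `predict_next`, `continue_text`, and `toy_corpus` are hypothetical, chosen only for this illustration. It shows the same idea at toy scale: learn from raw text which word tends to follow, then continue a prompt one word at a time.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def continue_text(counts, prompt, max_words=5):
    """Greedily append up to `max_words` predicted words to the prompt."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break  # no known follower, so stop early
        words.append(nxt)
    return " ".join(words)


# Toy corpus (an assumption for illustration, not GPT-2's training data).
toy_corpus = [
    "the unicorns live in the valley",
    "the unicorns speak english",
    "the unicorns live in the mountains",
]
model = train_bigram(toy_corpus)
print(continue_text(model, "the", max_words=4))
```

A model like this only looks one word back, which is why it loops and loses the thread almost immediately. A large model conditions on a much longer context, which is what makes paragraph-level coherence possible.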
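Failures do occur, such as repetition, mistakes about common knowledge, or abrupt topic switches. In the successful examples, however, the generated text is thorough and logical, and it stays coherent across paragraphs and even across a whole article. The researchers found that a few attempts are usually enough to get a good result on common topics, while results are worse on rarer ones.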
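Beyond these basic capabilities, GPT-2 can be fine-tuned on domain-specific datasets for finer control over style. It can also perform "zero-shot learning", that is, handle tasks without any task-specific dataset, and it achieves strong results. This suggests that these tasks are a subset of general language modeling, which would explain why the model handles them well.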
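One way to picture the zero-shot setting is that the task is expressed purely through the shape of the prompt, and the model is simply asked to continue the text. The sketch below only builds such prompts and does not call any model. The function name `zero_shot_prompt` and the cue strings in `TEMPLATES` are illustrative assumptions, not something taken from this article.

```python
# Hypothetical cue templates: each one frames a task as plain text to continue.
TEMPLATES = {
    "summarize": "{text}\nTL;DR:",
    "answer": "Question: {text}\nAnswer:",
}


def zero_shot_prompt(task, text):
    """Wrap `text` so that continuing the prompt amounts to doing `task`."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    return TEMPLATES[task].format(text=text)


print(zero_shot_prompt("summarize", "A long article about unicorns."))
```

Under this framing, no task-specific training data is involved: the same next-word predictor receives a different prompt, and whatever it writes after the cue is read as the answer.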
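Most striking of all is how convincing the generated text feels. Even after translating it into Chinese, we can see that, although the description and narrative logic are still somewhat muddled, the model not only found a fitting writing style but also filled in many details. The text shows contextual links and a sense of progression, and the grammar is almost entirely correct. The result comes very close to human writing.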
For example, given a passage in which scientists discover unicorns and begin to study them, the model might continue as follows:
Human-written prompt:
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
The model's continuation:
The scientist named the population after their distinctive horn, Ovid’s Unicorn. These four-horned silver-white unicorns were previously unknown to science. Now, after almost two centuries of research and study, we have finally solved the mystery of what sparked this odd phenomenon.
Dr. Jorge Pérez and his companions ventured further into the valley as they reached one peak where water appeared blue with some crystals on top; it was so close they could touch their horns without having to move much from their original position.
Pérez and his friends were astonished at sight of these creatures - not only did they see them without any effort but also noticed that each unicorn had an unique crystal embedded within its horn which emitted different colors depending upon its mood or emotions.
This remarkable discovery has opened up new avenues for research in understanding nature's most mysterious creatures and how humans can coexist peacefully with them while respecting their rights as individuals with intelligence just like us.