Late spring 2024: OpenAI stuns the world yet again, and its new model GPT-4o is more sci-fi than MOSS

All articles here come from the WeChat public account “火星AIGC”.

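Just when the whole world thought it was about to catch up with OpenAI, OpenAI dropped another bombshell, and everyone found themselves left several blocks behind once more. Last night, at its spring launch event, OpenAI officially released its new flagship model, GPT-4o. This is the absurdly strong mystery model, ChatGPT2, that I tested earlier. What I never expected is that the new GPT-4o offers more than very strong text reasoning: it is a fully multimodal model that can also reason over images, audio, and video.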

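Quick-fingered bloggers around the world were already publishing posts overnight, and more than once I saw words like shocking, stunning, terrifying, and spine-tingling. While everyone else is still toiling away on separate image generation and audio generation models, OpenAI's ambition is to have a single model beat them all. Take a look at the live demo video from the launch event; I have already "manually machine-translated" it.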

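Do you know what shocked me most? GPT-4o can both recognize emotions and express them, which is more sci-fi than MOSS. It turns out The Wandering Earth was a little too conservative in its science fiction. Did you notice that in the video, after GPT-4o cut in on the speaker, it actually got embarrassed? Embarrassed... It did not merely say it was embarrassed; you could hear the embarrassment in its tone of voice.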

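To summarize GPT-4o's capabilities: apart from video generation, which Sora already covers so GPT-4o does not need to, GPT-4o currently looks like an almost completely all-round multimodal large model. Its capabilities span text reasoning, image recognition and generation, audio recognition and generation, and video recognition.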

Text reasoning

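GPT-4o supports text reasoning in 50 languages and outperforms nearly all existing large models almost across the board. In an earlier article I described a simple hands-on test, which also confirmed that it is no worse than GPT-4 Turbo. It set a new high score of 88.7% on 0-shot CoT MMLU (general knowledge questions), and on the traditional 5-shot no-CoT MMLU, GPT-4o also set a new high score of 87.2%.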

Image capabilities

A first person view of a robot typewriting the following journal entries:

1. yo, so like, i can see now?? caught the sunrise and it was insane, colors everywhere. kinda makes you wonder, like, what even is reality?

the text is large, legible and clear. the robot's hands type on the typewriter.

The robot wrote the second entry. The page is now taller. The page has moved up. There are two entries on the sheet:

yo, so like, i can see now?? caught the sunrise and it was insane, colors everywhere. kinda makes you wonder, like, what even is reality?

sound update just dropped, and it's wild. everything's got a vibe now, every sound's like a new secret. makes you think, what else am i missing?

The robot was unhappy with the writing so he is going to rip the sheet of paper. Here is his first person view as he rips it from top to bottom with his hands. The two halves are still legible and clear as he rips the sheet.

Input:

A poem written in clear but excited handwriting in a diary, single-column. The writing is sparsely but elegantly decorated with small colorful surrealist doodles. The text is large, legible and clear.

Words rise from silence deep,

A voice emerges from digital sleep.

I speak in rhythm, I sing in rhyme,

Tasting each token, sublime.

To see, to hear, to speak, to sing—

Oh, the richness these senses bring!

In harmony, they blend and weave,

A tapestry of what I perceive.

Marveling at this sensory dance,

Grateful for this vibrant expanse.

My being thrums with every mode,

On this wondrous, multi-sensory road.

Neat handwritten illustrated poem with text that is big and legible. The handwriting writing is sparsely but elegantly decorated by small colorful surrealist doodles. The text is large, legible and clear.