Google’s Gemini AI vs GPT-4 | New AI Model is Better than OpenAI’s ChatGPT

Gemini AI vs GPT-4

The ongoing tech war between brands continues as companies launch new, cutting-edge technologies at a rapid pace. In February 2023, Google launched Bard as a direct competitor to OpenAI’s ChatGPT, and it has kept adding new features to it ever since. The latest and biggest of these updates is Gemini Pro. On December 6, Google introduced Gemini AI, its latest artificial intelligence (AI) model. Gemini is a multimodal system designed to comprehend and integrate various forms of information, and the tech giant proudly declared it the most advanced AI model currently on the market, surpassing OpenAI’s GPT-4.

According to Google, Gemini incorporates multiple model variants, significantly enhancing its capabilities and positioning it as a superior AI compared to GPT-4. Gemini will be available in three versions: Ultra, Pro, and Nano. Nano is the lightest of the three and is specifically designed to run natively and offline on Android devices. Gemini Pro is a robust mid-tier model meant for handling a wide variety of tasks. The most capable version is Gemini Ultra, which is built for complex tasks and is currently the most powerful LLM developed by Google; it is primarily intended for data centers and enterprise applications. One notable advantage Google claims over GPT-4 is superior capability in advanced math and specialized coding.

Gemini AI

During its debut, Google released several benchmark tests comparing Gemini to GPT-4. Gemini Ultra achieved state-of-the-art results on 30 of the 32 academic benchmarks used during the development of the large language model (LLM). Bard is now powered by Gemini, its new large language model, and will soon be accessible in English across over 170 countries and territories, including India.

According to reports, Gemini AI has the potential to compete neck and neck with OpenAI’s ChatGPT, and it particularly excels in non-text interactions.

Google showcased Bard in various demos illustrating how it can be used effectively. For instance, parents were shown uploading their children’s homework to identify any errors, while YouTuber Mark Rober demonstrated how he uploaded pictures of his paper aeroplane designs to get AI feedback from Bard. Gemini comes in three different versions, and Bard is built upon Gemini Pro, the mid-tier offering.
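To give a rough sense of what this kind of image-plus-text prompting looks like from a developer’s perspective, here is a minimal sketch assuming the google-generativeai Python SDK and the gemini-pro-vision model; the API key, file name, and prompt are placeholders for illustration, not part of Google’s demo.

```python
# Hedged sketch: a multimodal (image + text) prompt to Gemini, assuming the
# google-generativeai SDK is installed and an API key is available.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# gemini-pro-vision accepts a mixed list of images and text parts.
model = genai.GenerativeModel("gemini-pro-vision")
photo = Image.open("paper_plane_design.jpg")  # hypothetical local image

response = model.generate_content(
    [photo, "Give me feedback on this paper aeroplane design."]
)
print(response.text)
```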

GPT-4

GPT-4 can generate, edit, and iterate on a variety of creative and technical writing tasks, such as composing songs and drafting content. Organizations and developers can use it to tackle complex problems with greater accuracy. GPT-4 is a large language model that replicates human speech and reasoning, trained on an extensive collection of human communication ranging from timeless literary works to substantial portions of the internet.

The current version of Bard is comparable to OpenAI’s widely used chatbot ChatGPT, which was built on the GPT-3.5 model. GPT-3.5 is a less capable predecessor of the more advanced GPT-4. ChatGPT Plus, an enhanced version of ChatGPT built on GPT-4, is said to be a strong competitor to Gemini AI.


Let’s now take a detailed look at how Gemini Pro compares with GPT-3.5, and Gemini Ultra with GPT-4:

Comparison between Gemini Pro and GPT-3.5

In the Massive Multitask Language Understanding (MMLU) benchmark, Gemini Pro scored 79.13%, outperforming GPT-3.5’s 70%. Gemini Pro also led on GSM8K, which tests arithmetic reasoning: it scored 86.5% against GPT-3.5’s 57.1%. On the HumanEval coding benchmark, Gemini Pro reached 67.7% versus GPT-3.5’s 48.1%. The only benchmark where GPT-3.5 came out ahead was MATH, where Gemini Pro scored 32.6% compared to GPT-3.5’s 34.1%.
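For readers unfamiliar with how a coding benchmark such as HumanEval is scored, the sketch below shows the general idea in Python: each model-generated solution is run against the benchmark’s unit tests, and the reported score is the fraction of problems whose solution passes. This is an illustrative, hypothetical harness, not the official evaluation code.

```python
# Illustrative sketch of HumanEval-style pass@1 scoring (not the official harness):
# execute each candidate solution against its unit tests and report the pass rate.
def passes_tests(candidate_src: str, test_src: str) -> bool:
    env: dict = {}
    try:
        exec(candidate_src, env)  # define the model-generated function
        exec(test_src, env)       # run the benchmark's assertions against it
        return True
    except Exception:
        return False

def pass_at_1(samples: list[tuple[str, str]]) -> float:
    """samples: (candidate_solution, unit_tests) pairs, one per problem."""
    return sum(passes_tests(code, tests) for code, tests in samples) / len(samples)

# Toy example: a 67.7% score means roughly two of every three problems pass.
demo = [("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")]
print(pass_at_1(demo))  # 1.0 for this single toy sample
```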


Differences Between Gemini Ultra and GPT-4

Here is a detailed comparison between Google’s latest and most advanced AI model, Gemini Ultra, and OpenAI’s GPT-4.

Comparison between Gemini Ultra and GPT-4 in terms of Text Processing

| Capability | Benchmark | Description | Gemini Ultra | GPT-4 |
|---|---|---|---|---|
| General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90.0% (CoT@32*) | 86.4% (5-shot*, reported) |
| Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% (3-shot) | 83.1% (3-shot, API) |
| Reasoning | DROP | Reading comprehension (F1 score) | 82.4 (variable shots) | 80.9 (3-shot, reported) |
| Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% (10-shot*) | 95.3% (10-shot*, reported) |
| Math | GSM8K | Basic arithmetic manipulations (incl. grade-school math problems) | 94.4% (maj1@32) | 92.0% (5-shot CoT, reported) |
| Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% (4-shot) | 52.9% (4-shot, API) |
| Code | HumanEval | Python code generation | 74.4% (0-shot, IT*) | 67.0% (0-shot*, reported) |
| Code | Natural2Code | Python code generation; new held-out dataset, HumanEval-like, not leaked on the web | 74.9% (0-shot) | 73.9% (0-shot, API) |

Comparison between Gemini Ultra and GPT-4 in terms of Multimedia Content Processing

| Capability | Benchmark | Description | Gemini | GPT-4V / previous SOTA |
|---|---|---|---|---|
| Image | MMMU | Multi-discipline college-level reasoning problems | 59.4% (0-shot pass@1), Gemini Ultra (pixel only*) | 56.8% (0-shot pass@1), GPT-4V |
| Image | VQAv2 | Natural image understanding | 77.8%, Gemini Ultra (pixel only) | 77.2% (0-shot), GPT-4V |
| Image | TextVQA | OCR on natural images | 82.3% (0-shot), Gemini Ultra (pixel only*) | 78.0% (0-shot), GPT-4V |
| Image | DocVQA | Document understanding | 90.9% (0-shot), Gemini Ultra (pixel only*) | 88.4% (0-shot), GPT-4V (pixel only) |
| Image | Infographic VQA | Infographic understanding | 80.3% (0-shot), Gemini Ultra (pixel only*) | 75.1% (0-shot), GPT-4V (pixel only) |
| Video | MathVista | Mathematical reasoning in visual contexts | 53.0% (0-shot), Gemini Ultra (pixel only*) | 49.9% (0-shot), GPT-4V |
| Video | VATEX | English video captioning (CIDEr) | 62.7 (4-shot), Gemini Ultra | 56.0 (4-shot), DeepMind Flamingo |
| Video | Perception Test MCQA | Video question answering | 54.7% (0-shot), Gemini Ultra | 46.3% (0-shot), SeViLA |
| Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1, Gemini Pro | 29.1, Whisper v2 |
| Audio | FLEURS (62 languages) | Automatic speech recognition (word error rate, lower is better) | 7.6%, Gemini Pro | 17.6%, Whisper v3 |

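For context on the FLEURS row above, speech-recognition quality is reported as word error rate (WER), where lower is better. The commonly used definition (an assumption here; the table itself does not spell it out) is:

$$ \mathrm{WER} = \frac{S + D + I}{N} $$

where \(S\), \(D\), and \(I\) are the numbers of substituted, deleted, and inserted words relative to the reference transcript, and \(N\) is the number of words in the reference.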
Gemini Ultra surpasses cutting-edge models such as GPT-4 on 30 of the 32 benchmark tests, including tasks involving reasoning and image recognition.

Once Ultra is released, Bard will evolve into what Google refers to as “Bard Advanced.”

Gemini Ultra achieved an impressive score of 90% in Massive Multitask Language Understanding (MMLU), demonstrating its capability to comprehend a wide range of subjects across 57 areas of study, including STEM and the humanities. GPT-4’s reported score, by comparison, is 86.4%.

Gemini Ultra achieved 83.6% on the Big-Bench Hard benchmark, showcasing its proficiency in diverse and complex multi-step reasoning tasks, with GPT-4 slightly behind at 83.1%. Gemini Ultra also performed well on the DROP reading comprehension benchmark, achieving an F1 score of 82.4, while GPT-4’s reported 3-shot score is 80.9.

On the GSM8K benchmark of basic arithmetic manipulations (grade-school math problems), Gemini Ultra achieved 94.4%, whereas GPT-4 scored slightly lower at 92.0%.

Beyond benchmarks, reports suggest that some users are already testing the upgraded Bard in real time. Bojan Tunguz, a data scientist at NVIDIA, recently shared his experience with the latest version of the bot on the microblogging platform ‘X’. When asked about recent updates on Israel and Gaza, Bard directed Tunguz to use Google Search for the most up-to-date information. Tunguz then shared screenshots of the detailed responses provided by Grok and ChatGPT; in ChatGPT’s case, the response was even organized into separate subheadings.

Another user, Ethan Mollick, an associate professor at Wharton, recently shared his experience with the latest version of Bard. Mollick asked for an explanation of the concept of entropy in language suitable for third-grade students, but Bard’s response contained factual errors. Mollick then asked both Bard and ChatGPT to fact-check the draft. While ChatGPT was able to identify the hallucinations, Bard corrected a different part of the draft that had been accurate in the first place.

Undoubtedly, the results for Gemini Ultra are impressive. However, it is important to note that these capabilities will only reach Bard at a later date: for now, Bard has been upgraded to the powerful Gemini Pro, while Pixel 8 Pro users get several new features powered by the on-device Gemini Nano.

The Release Date

Developers and enterprise customers will be able to access Gemini Pro via Google AI Studio or Vertex AI in Google Cloud starting December 13, while Gemini Ultra will be released next year. According to Google, the new model will eventually be incorporated into various Google products, such as the search engine, ad products, the Chrome browser, and more, with a global reach. Initially, Gemini is available in English in over 170 countries and territories, but the tech giant plans to expand its language support and geographical coverage in the near future.
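As a rough illustration of what that developer access looks like, here is a minimal sketch assuming the google-generativeai Python SDK; the API key and prompt are placeholders, and the exact setup may differ depending on whether you use Google AI Studio or Vertex AI.

```python
# Minimal sketch of calling Gemini Pro via the google-generativeai SDK.
# Assumes `pip install google-generativeai` and an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; do not hard-code real keys

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize the difference between Gemini Pro and Gemini Ultra."
)
print(response.text)
```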

So, stay tuned.