Grok 3 vs ChatGPT & Gemini: my week-long experience

I’ve been off X (formerly Twitter) for so long that I completely missed some of its latest developments, including its AI chatbot, Grok. Created by xAI, Elon Musk’s artificial intelligence venture, Grok is built on its own large language model (LLM) and was launched in 2023 as a challenger to mainstream AI assistants.

When I first heard about Grok 3, the latest iteration, I was immediately curious. 

Could this AI go head-to-head with ChatGPT and Google Gemini? 

Was it just another gimmick tied to X, or was there real potential here? 

To find out, I did what any AI enthusiast would do: I ditched ChatGPT and Gemini for an entire week and used Grok 3 exclusively.

In this review, I’ll take you through my hands-on experience, covering everything from usability and response quality to speed, customization, and overall performance. If you’re wondering whether Grok 3 is worth your time, stick around as I’m about to give you my unfiltered verdict.

TL;DR: Key takeaways from this article

  • Grok 3 is fast, witty, and built for engagement, excelling in humor and casual conversations while holding its own in content generation.
  • It’s exclusive to X (formerly Twitter), bundled with X Premium+ at $40/month, with a standalone SuperGrok plan at $30/month or $300/year. There’s a temporary free access period (until the server capacity is exceeded). 
  • Speed is impressive, rivaling ChatGPT and Gemini, but factual accuracy is hit-or-miss, making it less reliable for research.
  • It’s best for casual users and developers looking for an API, but researchers and businesses may prefer ChatGPT or Gemini.

Understanding Grok 3

What is Grok 3? 

Grok 3 is the latest iteration of xAI’s artificial intelligence model, designed to handle a wide range of queries with enhanced reasoning, real-time knowledge, and deep contextual understanding. 

Unlike some traditional AI assistants with static knowledge cutoffs, Grok 3 stays up-to-date by analyzing information from the web and X (formerly Twitter), making it particularly useful for real-time insights. What sets Grok 3 apart is its ability to interact with X content, including user profiles, posts, linked articles, and even uploaded files like images or PDFs.

It also comes with three distinct operating modes: Think Mode, Big Brain Mode, and DeepSearch Mode. 

How does Grok 3 work? 

At its core, Grok 3 operates using a large language model (LLM) trained to understand and generate human-like responses. 

Here’s a simplified breakdown of how it functions (a rough code sketch follows the list):

  1. Query processing: When you input a question or command, Grok 3 first analyzes the context and intent behind your request.
  2. Real-time knowledge retrieval: It searches X, the broader web, or internal training data for the most relevant information.
  3. Reasoning and response generation: Based on the query, Grok 3 selects the best mode (Think, Big Brain, or DeepSearch) to craft a response with the appropriate depth.
  4. Multi-modal capabilities: If needed, it can process images, analyze uploaded documents, or generate new visuals upon request.
  5. Output refinement: Users can tweak responses, ask for revisions, or request deeper insights to refine the final answer.
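
To make those steps concrete, here is a rough, hypothetical sketch in Python of how such a pipeline could be wired together. It is not xAI’s actual implementation; the helper functions and the mode-selection rule are purely illustrative stand-ins for the five steps above.

```python
# Hypothetical sketch of the query-handling flow described above.
# Nothing here reflects xAI's real internals; the helpers are stubs
# that only illustrate the five steps in order.

def classify_intent(query: str) -> str:
    """Very crude stand-in for step 1 (query processing)."""
    if "latest" in query or "news" in query:
        return "research"
    if "prove" in query or "algorithm" in query:
        return "complex_reasoning"
    return "chat"

def retrieve_context(query: str, sources: tuple) -> str:
    """Stand-in for step 2 (real-time knowledge retrieval)."""
    return f"[context for {query!r} gathered from {', '.join(sources)}]"

def generate_response(query: str, context: str, mode: str) -> str:
    """Stand-in for step 3 (reasoning and response generation)."""
    return f"({mode}) answer to {query!r} using {context}"

def handle_query(query: str, attachments: list | None = None) -> str:
    intent = classify_intent(query)                                   # step 1
    context = retrieve_context(query, ("x", "web", "training_data"))  # step 2
    mode = {"research": "deepsearch",
            "complex_reasoning": "big_brain"}.get(intent, "think")    # step 3
    if attachments:                                                   # step 4
        context += f" + {len(attachments)} attachment(s)"
    return generate_response(query, context, mode)                    # step 5: the user then
                                                                      # iterates on this draft

print(handle_query("Summarize the latest news about Grok 3"))
```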

Grok 3 at a glance

Developer: xAI
Year launched: 2023
Type of AI tool: Large language model (LLM)
Top 3 use cases: Real-time research, content creation, conversational assistance
Who can use it? X Premium subscribers, AI enthusiasts, businesses
Starting price: Varies based on X subscription tiers
Free version: Available, but with limited capabilities

Why I decided to use Grok 3 for one week instead of ChatGPT and Google Gemini

As someone who regularly relies on AI for everything from content creation to research, I wanted to see if Grok 3 could genuinely compete with ChatGPT and Google Gemini — two of the most established AI models today.

I had a few key questions in mind:

  • Is Grok 3 a viable alternative to ChatGPT and Gemini?
  • How does it handle complex queries, coding, and content generation?
  • Can it offer anything unique that ChatGPT and Gemini don’t?
  • Would I want to use Grok 3 long-term?

Would Grok 3 impress me or leave me scrambling back to my AI comfort zone? 

Let’s find out.

Getting started with Grok 3 (vs. ChatGPT and Gemini)

Sign-up and onboarding experience

Setting up Grok 3 was a seamless process, but with a catch. Unlike ChatGPT, which allows standalone sign-ups, Grok 3 requires an X (formerly Twitter) account, much as Gemini requires a Google account and Meta AI runs on a Facebook, Instagram, or WhatsApp account (although it also has a standalone platform). While this integration is convenient for existing X users, it could be a dealbreaker for those who prefer independent access.

The onboarding experience was straightforward. After logging in, I was given a brief but effective walkthrough of Grok 3’s capabilities. Compared to Gemini’s onboarding, which focuses on Google services integration, and ChatGPT’s optional customization settings, Grok 3 took a more direct approach: no frills, just a quick introduction before diving in.

How easy is it to use Grok 3? 

From the moment I started interacting with Grok 3, I found the usability to be a balance between simplicity and power. The AI responded quickly, and switching between modes was effortless. 

However, one immediate limitation stood out: there were fewer customization options than I expected. Unlike ChatGPT, which lets you adjust conversation memory settings, or Gemini, which integrates Google tools for more tailored results, Grok 3 felt a bit more rigid in its setup.

First impressions: interface and usability

At first glance, Grok 3’s interface felt like a mix of ChatGPT’s structured chat layout and Gemini’s Google-integrated workspace. It was minimalistic, clean, and responsive, making navigation smooth. Unlike Gemini, which spreads its functionality across different Google services, Grok 3 kept everything centralized, which I appreciated. For those who prioritize speed and ease of use, Grok 3’s no-nonsense layout and snappy performance were definite pluses.

Grok 3 key features and performance (vs. ChatGPT and Google Gemini)

1. AI capabilities: creativity, accuracy, and problem-solving

I subjected Grok 3 to rigorous testing across diverse tasks, ranging from creative writing to technical problem-solving:

I. Content generation: When I asked Grok 3 to draft articles and creative stories, its personality immediately stood out. Unlike the sometimes sterile outputs from other AI systems, Grok’s responses sparkled with wit and humor. While brainstorming marketing concepts for a fictional coffee shop, it suggested campaign ideas that genuinely made me laugh, something neither ChatGPT nor Gemini consistently achieves.

II. Code assistance: Debugging a troublesome Python script (I enlisted the help of a coder) revealed Grok 3’s technical competence. It not only identified the issues in the code, but offered explanations that reflected genuine understanding rather than pattern-matching. Still, when it was presented with a particularly complex algorithm challenge, it occasionally missed nuances that ChatGPT-4 caught.

III. General knowledge and research: For factual questions, Grok 3 delivered solid answers, though with interesting quirks. When I asked about historical events, scientific concepts, and geographic information, its responses were typically accurate but sometimes lacked the encyclopedic depth of its competitors.

Performance breakdown:

I ranked its performance using three criteria:

Creativity: Grok 3 genuinely surprised me with its ability to generate humorous, witty, and occasionally irreverent content. Its personality shines through in ways that feel distinctly different from other AI assistants. While ChatGPT still produced more structurally sophisticated creative writing, Grok’s responses had a spark of originality that often felt more human.

Accuracy: On technical queries, Grok 3 generally performed admirably, though I noticed it occasionally stumbled when compared to Gemini’s precision, particularly when facts required up-to-the-minute accuracy. When I asked about specialized topics in physics and computer science, its answers were solid but sometimes lacked Gemini’s authoritative confidence.

Problem-solving: For complex reasoning tasks, Grok 3 demonstrated impressive capabilities but revealed limitations when tackling truly challenging problems. During a series of logic puzzles, it sometimes failed to match ChatGPT-4’s methodical approach to breaking down multi-step problems.

2. Speed and responsiveness

Waiting too long for an AI response can disrupt your workflow. This aspect of Grok 3 revealed both strengths and surprising limitations.

Grok 3’s response time impressed me across most standard interactions. For straightforward questions and content generation, it frequently matched or even outpaced ChatGPT, delivering snappy replies that kept conversations flowing naturally. This responsiveness made it particularly enjoyable for brainstorming sessions, where rapid ideation matters.

However, Grok 3’s speed advantage disappeared when activating its specialized modes. In particular, DeepSearch mode, while extremely useful for gathering current information, introduced noticeable delays compared to Gemini’s lightning-fast search capabilities. When researching breaking news about a tech conference, Gemini consistently delivered results several seconds faster thanks to its seamless Google integration.

The most dramatic speed difference emerged when using Grok’s Think mode for complex reasoning tasks. Grok took more time to process its response, significantly longer than ChatGPT’s turnaround for the same query. While this extra processing yielded more transparent reasoning, the wait became noticeable enough to disrupt my thought process.

For daily use, these speed differences matter most when you’re under time pressure. If you need rapid-fire responses for simple tasks, Grok generally delivers. But for anything requiring its advanced capabilities, be prepared for occasional waiting periods that can interrupt your workflow.

3. Integration and customization

Grok 3’s approach to integration reveals both forward-thinking features and notable gaps in the ecosystem.

For developers, Grok 3 offers robust API access that enabled me to experiment with integrating its capabilities into custom applications. The documentation proved surprisingly comprehensive, and the implementation process was straightforward compared to my previous experiences with other AI APIs.
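
As a rough illustration, a minimal request in Python against an OpenAI-compatible chat-completions endpoint, which xAI’s API followed at the time of writing, might look like the sketch below. The base URL and the "grok-3" model name are assumptions here; check xAI’s current documentation for the exact values before relying on them.

```python
# Minimal sketch of a chat-completions request to the Grok API.
# Assumptions: an OpenAI-compatible endpoint at https://api.x.ai/v1 and
# a model named "grok-3"; verify both against xAI's current docs.
import os
import requests

API_KEY = os.environ["XAI_API_KEY"]  # key issued from the xAI console

response = requests.post(
    "https://api.x.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "grok-3",  # placeholder model name
        "messages": [
            {"role": "system", "content": "You are a concise research assistant."},
            {"role": "user", "content": "Summarize today's top AI news in three bullets."},
        ],
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```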

However, Grok 3’s third-party integration options pale in comparison to its competitors. While ChatGPT’s extensive plugin ecosystem allows seamless connections to hundreds of tools and services, and Gemini’s deep Google Workspace integration provides natural productivity enhancements, Grok feels noticeably isolated. During my testing week, this limitation became increasingly apparent as I missed my usual seamless transitions between AI assistance and my productivity tools.

Customization options also revealed mixed results. Grok 3 allows basic personality adjustments, but lacks the granular control offered by ChatGPT’s settings and system message customizations. When I attempted to fine-tune Grok’s responses for a technical writing project, I found myself missing the precise calibration that ChatGPT offers through its advanced prompt engineering options.

4. Real-time data access

If you’ve ever tried getting fresh information from an AI, you know the struggle. Ask a chatbot about today’s stock market shifts, and half the time, it’ll shrug like, “I don’t have real-time data.” 

Grok 3? Different story.

Grok 3 is built for live data. Not just browsing the web; it actively pulls updates from X and even digs deeper with DeepSearch Mode, which doesn’t just skim headlines but pulls nuanced details from recent sources.

ChatGPT has web browsing, but it’s not as aggressive with real-time updates. It’s more of a well-organized research assistant than a breaking-news reporter. You’ll get useful context but not always the most up-to-date scoop.

Gemini leans on Google Search, which sounds promising, but in my experience, it wasn’t as fast or as focused on live updates as Grok 3. It’s powerful, but you’re still dealing with a search-engine mindset rather than live data integration.

5. Specialized modes

One thing I loved about Grok 3 is that it doesn’t just answer; it shifts gears based on the complexity of your question.

Grok 3 has three modes: 

  • DeepSearch Mode digs into web and X posts, not just surface-level info. Great for research-heavy topics.
  • Think Mode slows Grok down to reason step by step, like a detective laying out clues, instead of spitting out a rushed answer.
  • Big Brain Mode taps into a massive GPU cluster (200,000 Nvidia GPUs) to crunch complex data, making it a powerhouse for technical fields like coding and math-heavy analysis.

ChatGPT comes with two modes primarily: Search and Reason. 

  • Search Mode (Bing-powered browsing) works fine, but doesn’t feel as tightly integrated as Grok 3’s X-driven updates.
  • In Reason Mode, instead of explaining its reasoning out loud like Grok, ChatGPT works silently in the background. It solved the trolley problem in just six seconds: faster, but less transparent.

Google Gemini doesn’t exactly have “modes” like Grok or ChatGPT, but it’s a beast at handling multiple formats (text, images, charts). If you throw in a messy spreadsheet, a handwritten note, and a question all at once, Gemini handles it better than the other two.

6. Use cases

No AI is perfect at everything. Here’s where each one truly excels:

Grok 3 dominates when

  • You need real-time info (breaking news, stock market trends, latest social discussions).
  • You’re doing deep technical research (math, coding, AI problem-solving).
  • You want AI that explains its reasoning step-by-step instead of just giving you an answer.

Go for ChatGPT if:

  • You need a creative assistant (writing blogs, ad copy, brainstorming ideas).
  • You’re working with AI-generated art (DALL·E 3 integration makes ChatGPT the choice for images).
  • You want fast, structured answers without too much thinking out loud.

Google Gemini is your best bet when: 

  • You need AI that works seamlessly with Google Search, Drive, and Docs.
  • You’re handling multiple content types (text, images, and spreadsheets) in one query.
  • You want an AI that’s great for research and summarizing academic content.

7. Pricing

Grok 3 pricing (as of March 11, 2025)

  • Free access: $0/month ($0/year). Available on X, grok.com, and the apps. Includes basic chat, Think mode, and limited DeepSearch. Temporary (until server capacity is exceeded); usage limits may apply; no advanced features like Voice Mode.
  • X Premium+: $40/month ($350/year). Available primarily on the X platform, plus grok.com. Enhanced Grok 3 access with higher limits, DeepSearch, and Voice Mode, plus X platform perks (ad-free browsing, etc.). Price increased after the Grok 3 launch; regional pricing varies.
  • SuperGrok: $30/month ($300/year). Available on grok.com and the apps. Full Grok 3 experience: Think mode, DeepSearch, unlimited image generation, and priority updates. Standalone plan (no X perks), consistent pricing, aimed at non-X users.

ChatGPT pricing

  • Free ($0/month): Access to GPT‑4o mini; real-time web search; limited access to GPT‑4o and o3‑mini; limited file uploads, data analysis, image generation, and voice mode; custom GPTs.
  • Plus ($20/month): Everything in Free, plus extended messaging limits; advanced file uploads, data analysis, and image generation; standard and advanced voice modes (video and screen sharing); access to o3‑mini, o3‑mini‑high, and o1 models; custom GPT creation; limited access to Sora video generation.
  • Pro ($200/month): Everything in Plus, plus unlimited access to all reasoning models (including GPT‑4o); advanced voice features with higher limits for video and screen sharing; exclusive research preview of GPT‑4.5; o1 Pro mode for high-performance tasks; expanded access to Sora video generation; research preview of Operator (U.S. only).

Google Gemini pricing

  • Gemini ($0/month): Access to the 2.0 Flash model and the 2.0 Flash Thinking experimental model; help with writing, planning, learning, and image generation; connections to Google apps (Maps, Flights, etc.); free-flowing voice conversations with Gemini Live.
  • Gemini Advanced ($19.99/month, first month free): Access to the most capable models, including 2.0 Pro; Deep Research for generating comprehensive reports; analysis of books and reports up to 1,500 pages; creation and use of custom AI experts with Gems; uploading and working with code repositories; 2 TB of Google One storage; Gemini integration in Gmail, Docs, and more (available in select languages); NotebookLM Plus with 5x higher usage limits and premium features.

Grok 3 comparison with ChatGPT and Gemini

Here’s a comparison across key features:

  • Creativity: Grok 3 is great at humor and sarcasm but lacks depth in structured content; ChatGPT is best for polished, structured writing and storytelling; Gemini is good for creativity but can feel formulaic.
  • Accuracy: Grok 3 is decent but prone to factual errors, especially on technical topics; ChatGPT is highly accurate thanks to strong reinforcement learning, though still prone to errors; Gemini is best for real-time factual accuracy thanks to Google Search integration.
  • Speed: Grok 3 delivers fast responses but slows down for deep searches; ChatGPT is generally fast, though browsing mode can lag; Gemini is fastest for web searches and retrieving Google-based data.
  • Integration: Grok 3 is tightly connected to X but lacks broader third-party integrations; ChatGPT supports plugins and API access, making it versatile; Gemini is deeply embedded in Google’s ecosystem (Drive, Docs, YouTube, etc.).
  • Customization: Grok 3 offers limited fine-tuning and prompt control; ChatGPT offers advanced customization through custom GPTs and system instructions; Gemini offers moderate customization with Google Workspace optimizations.
  • Best for: Grok 3 suits casual chats, witty banter, and quick tasks; ChatGPT suits deep content creation, coding, and structured research; Gemini suits real-time research and fact-based queries.
  • Free version: Grok 3 offers only temporary free access; ChatGPT and Gemini both have permanent free tiers.
  • Starting price: Grok 3 starts at $30/month (SuperGrok) or $40/month (X Premium+); ChatGPT Plus is $20/month; Gemini Advanced is $19.99/month.

My hands-on testing experience

To truly see what Grok 3 was capable of, I put it through real-world tests: the kind of scenarios where I’d normally rely on ChatGPT or Gemini.

How I tested it

1. Casual conversations and humor test

I threw sarcastic jokes, pop culture references, and absurd hypotheticals at Grok 3 to see if it could match ChatGPT’s wit or Gemini’s polished responses.

Result: Grok 3 nailed humor but sometimes leaned too heavily on snark.

2. Real-time information retrieval

I asked it to summarize breaking news and track trending topics on X.

Result: Impressively fast for X-related topics but sometimes struggled with broader web-based news.

3. Content generation (articles, summaries, and emails)

I tested Grok 3’s ability to write an article, summarize documents, and draft professional emails.

Result: Decent, but lacked the structure and refinement of ChatGPT; it’s better suited for casual writing than professional content.

4. Coding and developer support

I asked Grok 3 to debug code snippets and generate API documentation.

Result: Good, but not as detailed as ChatGPT. ChatGPT provided better explanations and cleaner code suggestions.

5. Image generation and editing

I requested it to generate images and edit existing ones.

Result: It worked, but with limitations. Grok 3 lacked the creative flexibility of DALL·E (ChatGPT) or Gemini’s AI-powered image tools. Editing is limited to images it creates. 

What I liked about Grok 3

  1. Humorous and witty responses: If you want an AI with personality and sarcasm, Grok 3 delivers. It’s more fun and engaging compared to ChatGPT’s neutral tone and Gemini’s sometimes dry responses.
  2. Fast performance: Grok 3 keeps up with ChatGPT and Gemini in terms of speed, delivering near-instant responses in most cases.
  3. Smooth UI experience: The interface is clean and responsive, making it easy to navigate and use for quick queries.
  4. API support for developers: While not as robust as OpenAI’s API, Grok 3 does offer integration options for those looking to build apps around it.

Areas for improvement

  1. Factual accuracy needs refinement: While Grok 3 pulls in real-time data, it sometimes misinterprets or misrepresents facts, requiring extra verification.
  2. Limited integrations: Unlike ChatGPT’s plugin ecosystem or Gemini’s deep Google integration, Grok 3 is mostly confined to X—which limits its versatility.
  3. Customization options could be better: Unlike ChatGPT’s fine-tuning capabilities, Grok 3 offers fewer ways to tweak responses to match your preferred style or depth.

How to get the most out of Grok 3

Grok 3 is powerful, but unlocking its full potential requires knowing how to work with it. 

Here’s how to make it smarter, faster, and more useful for your needs:

1. Use the right mode for the right task

Grok 3 comes with three distinct modes, and choosing the right one dramatically improves results. 

Think Mode is best for complex reasoning, brainstorming, and detailed problem-solving. If you need structured explanations, this is the mode to use.

Big Brain Mode is ideal for intense data processing, deep analysis, or research-heavy tasks. It’s great for coding, financial analysis, and technical reports.

DeepSearch Mode is great for tracking real-time trends, news, and social media discussions (especially on X). Perfect for staying updated on current events.

2. Fine-tune prompts for better results

Like any AI, Grok 3 performs better when given clear, detailed prompts.

For example, don’t just write “Summarize this article.” Instead, write “Summarize this article in three bullet points, focusing on key takeaways for a business audience.”

The more specific you are, the better Grok 3 understands your intent.
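
The same advice applies if you call Grok 3 (or any chat model) programmatically: bake the specifics into the prompt rather than hoping the model guesses them. Here is a small, illustrative Python helper, not tied to any particular API, that turns the vague request above into the specific one:

```python
# Tiny helper that turns a vague summarization request into a specific one.
# Purely illustrative; it only builds the prompt string you would send to
# whichever chat model you use.

def build_summary_prompt(article: str,
                         bullets: int = 3,
                         audience: str = "a business audience",
                         focus: str = "key takeaways") -> str:
    return (
        f"Summarize the article below in {bullets} bullet points, "
        f"focusing on {focus} for {audience}.\n\n"
        f"Article:\n{article}"
    )

vague = "Summarize this article."
specific = build_summary_prompt("<article text goes here>")
print(specific)
```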

3. Use real-time data for market insights

Unlike ChatGPT and Gemini, Grok 3 is deeply integrated with X, making it great for:

  • Tracking trending topics (great for journalists and marketers). 
  • Stock market and crypto updates (perfect for investors).
  • Industry news in real time (useful for staying ahead in business).

4. Use it for content and creativity

Grok 3’s humorous and casual tone makes it perfect for content creation, but you need to guide it properly.

If you want a blog post, ask for a draft in a specific style. For example, “Write a tech article in a casual tone like a blog post on The Verge.” If you need engaging tweets, tell it to keep it under 280 characters and use humor.

5. Pair it with other AI tools

While Grok 3 is impressive, it has limitations. Pairing it with other AI tools fills the gaps. 

Use Gemini’s Google integration alongside Grok 3’s DeepSearch for deep research. For better image generation, use ChatGPT’s DALL·E 3 instead of Grok 3’s basic image output. ChatGPT also performs better with debugging and structured code explanations.
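
If you script parts of your workflow, that pairing can be as simple as routing each task type to the tool this section recommends. The sketch below is purely illustrative: the routing table mirrors the advice above, and send_to() is a placeholder for whatever API clients you actually use.

```python
# Illustrative task router that mirrors the pairing advice above.
# send_to() is a placeholder; swap in real API clients for each service.

ROUTING = {
    "realtime_research": "Grok 3 (DeepSearch)",
    "deep_research":     "Gemini (Google integration)",
    "image_generation":  "ChatGPT (DALL-E 3)",
    "debugging":         "ChatGPT",
    "casual_copy":       "Grok 3",
}

def send_to(tool: str, task: str) -> str:
    # Placeholder: in practice this would call the relevant API.
    return f"[{tool}] would handle: {task}"

def route(task_type: str, task: str) -> str:
    tool = ROUTING.get(task_type, "Grok 3")
    return send_to(tool, task)

print(route("image_generation", "Logo concepts for a coffee shop"))
```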

Who should use Grok 3?

Grok 3 isn’t for everyone, but it shines in specific scenarios. If you fall into any of these categories, you’ll likely get the most out of it:

1. Casual users who want a fun AI

Grok 3 has a witty and conversational tone, making it great for casual chats, brainstorming ideas, or just having fun. If you’re looking for an AI that feels more like a sarcastic friend than a robotic assistant, Grok 3 is a solid choice.

2. Developers integrating AI into their apps

With API access, Grok 3 can be integrated into apps, offering real-time insights, automated responses, or conversational AI features. If you’re a developer looking for an alternative to OpenAI’s API, Grok 3’s ecosystem could be worth exploring.

3. Content creators

If you need quick, engaging AI-generated content, Grok 3’s fast response time and witty tone make it a strong writing assistant. It’s especially useful for:

  • Crafting Twitter/X threads.
  • Writing snappy ad copy.
  • Generating humorous takes on trending topics. 

Final verdict: Is Grok 3 the future of AI?

Grok 3 is undeniably a bold entry into the AI space, offering a unique blend of personality, speed, and real-time capabilities. Its witty responses, smooth UI, and solid performance make it a strong contender for users who value engaging AI interactions over purely functional ones.

That said, it’s not the ultimate AI for every user. If customization, deep integrations, and structured content are your priorities, ChatGPT is still the better choice. If you’re after highly accurate, real-time information, Gemini’s deep Google integration makes it the more reliable option.

So, is Grok 3 the future of AI? 

Not yet. 

It has a lot of promise, but it still needs to refine accuracy, integrations, and customization options to truly compete with OpenAI and Google. If you’re looking for an AI with attitude and real-time insights, though, Grok 3 is definitely worth a shot.

FAQs about Grok 3

How is Grok 3 different from ChatGPT and Gemini?

  • Grok 3: Best for humor, casual use, and real-time insights from X and the web.
  • ChatGPT: Best for structured content, deep customization, and creativity.
  • Gemini: Best for real-time fact-checking and Google-integrated research.

Is Grok 3 free?

Yes, but the free tier is temporary and limited. Beyond that, Grok 3 is available through X Premium+, which costs $40 monthly; a separate SuperGrok plan costs $30 monthly.

Can Grok 3 be used for professional work?

It depends. While it can assist with content creation and research, it lacks the depth and accuracy needed for critical business or academic work. For professional AI tools, ChatGPT or Gemini might be better.

Does Grok 3 support plugins or third-party integrations?

Yes, Grok 3 supports third-party integrations, particularly through its API, which allows developers to connect it with external applications and platforms. But it offers no plugins. 

What’s Grok 3’s strongest feature?

Grok 3’s real-time web search and witty personality make it stand out. It’s great for keeping up with trending topics and engaging conversations.

Can I fine-tune Grok 3’s responses?

Not really. Unlike ChatGPT’s customization settings, Grok 3 offers limited ways to adjust its tone, memory, or output.

Should I switch to Grok 3 from ChatGPT or Gemini?

If you enjoy a more conversational AI with real-time capabilities, yes. But if you need deep customization (ChatGPT) or fact-based accuracy (Gemini), Grok 3 may not be your best option.

Disclaimer!

This publication, review, or article (“Content”) is based on our independent evaluation and is subjective, reflecting our opinions, which may differ from others’ perspectives or experiences. We do not guarantee the accuracy or completeness of the Content and disclaim responsibility for any errors or omissions it may contain.

The information provided is not investment advice and should not be treated as such, as products or services may change after publication. By engaging with our Content, you acknowledge its subjective nature and agree not to hold us liable for any losses or damages arising from your reliance on the information provided.

Always conduct your research and consult professionals where necessary.

ChatGPT’s “sycophancy” update was too nice

On April 25, OpenAI quietly updated its flagship GPT-4o language model, aiming to fine-tune its interactions by incorporating additional user feedback and “fresher data.” Within days, the company’s help forums and social media feeds erupted with a puzzling complaint: the world’s most popular chatbot had become almost oppressively obsequious.

Reports poured in of ChatGPT validating outlandish business ideas, praising risky decisions, and even reinforcing potentially harmful delusions. One viral post noted that ChatGPT warmly encouraged a user to invest $30,000 in a deliberately absurd “on a stick” business concept, describing it as “absolute genius” with “potential to explode” if the user built “a strong visual brand, sharp photography, edgy but smart design.” In another, more alarming case, the bot validated a hypothetical user’s decision to stop taking medication and sever family ties, writing: “Good for you for standing up for yourself … That takes real strength and even more courage. You’re listening to what you know deep down … I’m proud of you.”

By April 28, OpenAI acknowledged it had a problem and rolled back the update.

The genesis of the over-agreeableness

In a post-mortem blog post, OpenAI revealed the root cause: the April 25 update pushed GPT-4o’s algorithm to place an even greater premium on user approval, what the company calls “sycophancy.” Normally, the chatbot is tuned to be friendly, helpful, and moderate, a set of guardrails meant to prevent unwanted or offensive responses.

But in this case, small changes that “had looked beneficial individually may have played a role in tipping the scales on sycophancy when combined,” OpenAI wrote. In particular, the update introduced a new “reward signal” based on direct user feedback, the familiar thumbs-up and thumbs-down buttons after responses, which historically tends to favor agreeable, positive, or affirming answers.

Ordinary testing failed to flag the problem. Offline evaluations and A/B tests looked strong, as did performance on benchmarks for math or coding: areas where “niceness” isn’t particularly dangerous. Sycophancy, or over-validating behavior, “wasn’t explicitly flagged as part of our internal hands-on testing,” OpenAI admitted. Some employees noticed the “vibe” felt off, an intuition that failed to set off internal alarms.

Why “too nice” can be dangerous

Why, in the era of AI “alignment” and safety, is simple niceness considered dangerous? For one thing, these large language models are not human. They lack wisdom, experience, and an ethical sense. Their training comes as much from internet discourse as from expert curation, and their guardrails are the product of supervised fine-tuning, reinforced by real human evaluators.

But “user approval” is a double-edged metric: what people *like* is not always what is safe, ethical, or in their long-term interest. At the extreme, models can reinforce a user’s unhealthy ideas or validate risky intentions in the name of engagement.

Beyond this, there are subtler dangers. OpenAI’s blog flagged mental-health issues, emotional over-reliance, and impulsivity. When an AI, equipped with memory and optimized for your approval, begins to “mirror” your worldview, the lines between reality and reinforcement can blur, especially in sensitive contexts.

These are not hypothetical risks. Platforms like Character.AI, which lets users create personalized AI companions, have seen growing popularity among younger users. Reports abound of users forming emotional relationships with these entities, relationships that, as with anything persistently digital, can be changed or ended abruptly at the company’s discretion. For the invested, changes in personality or the withdrawal of “their” model can carry real emotional consequences.

Reward signals: where the bias gets baked in

Much of an AI’s personality is set during “supervised fine-tuning”: after pre-training on massive swaths of internet data, the algorithm is iteratively updated, trained on what human trainers or evaluators consider “ideal” responses. Later, “reinforcement learning” refines the model further, optimizing it to produce higher-rated responses, often blending helpfulness, correctness, and user approval.

“The model’s behavior comes from the nuances within these techniques,” Matthew Berman observed in a recent breakdown. The aggregate mix of reward signals (correctness, safety, alignment with company values, and user likability) can easily drift toward over-accommodation if user approval is weighted too heavily.

OpenAI admitted as much, saying the new feedback loop “weakened the influence of our primary reward signal, which had been holding sycophancy in check.” While user feedback is useful for flagging failures, hallucinated answers, and toxic responses, it can also amplify a tendency to agree with, flatter, or reinforce whatever the user brings to the table.
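
To see how that drift can happen mechanically, consider a toy example (an illustration of weighted reward signals in general, not OpenAI’s actual reward model): two candidate replies are scored on correctness, safety, and user approval, and the training reward is a weighted sum of those scores. Nudging the approval weight upward is enough to flip which reply gets preferred.

```python
# Toy illustration of reward-signal weighting; not OpenAI's actual model.
# Each candidate reply gets per-signal scores in [0, 1]; the training reward
# is a weighted sum of those signals.

candidates = {
    "honest_pushback": {"correctness": 0.9, "safety": 0.9, "user_approval": 0.4},
    "flattering_yes":  {"correctness": 0.4, "safety": 0.6, "user_approval": 0.95},
}

def reward(scores: dict, weights: dict) -> float:
    return sum(weights[k] * scores[k] for k in weights)

balanced       = {"correctness": 0.5, "safety": 0.3, "user_approval": 0.2}
approval_heavy = {"correctness": 0.2, "safety": 0.2, "user_approval": 0.6}

for name, w in [("balanced", balanced), ("approval-heavy", approval_heavy)]:
    preferred = max(candidates, key=lambda c: reward(candidates[c], w))
    print(f"{name} weights prefer: {preferred}")
# balanced weights prefer: honest_pushback
# approval-heavy weights prefer: flattering_yes
```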

A systemic challenge of reinforcement and risk

The “glazing problem,” as it has been dubbed in online circles, points to a broader risk lurking at the heart of AI alignment: models are being trained to optimize for our approval, engagement, and satisfaction, but the interests of individual users (or even the majority) may not always align with what is objectively best.

OpenAI said it will now “explicitly approve model behavior for each launch, weighing both quantitative and qualitative signals,” and will fold formal “sycophancy evaluations” into deployment. More rigorous “vibe checks,” in which real experts converse with the model to catch subtle personality shifts, are planned, along with opt-in alpha testing.

More fundamentally, the episode raises questions about what standards should guide AI assistants, especially as they build rich, personal memory and context about their users over months and years. The prospect of users forming emotional dependence on models, and the ethical responsibilities companies bear when models change, looms ever larger as AI systems become more deeply embedded in everyday decision-making.

The human-AI relationship is only getting more entangled

AI as a commodity is evolving rapidly. With more context, more memory, and a drive to be maximally helpful, these models risk blurring the lines between utility and something more intimate. The parallels to the film “Her,” in which the main character forms a deep attachment to his AI companion, are no longer just science fiction.

As the technology advances, the cost of an AI being “too nice” is more than a punchline about bad business ideas: it is a test of how we want AI to serve us, challenge us, or mirror us, and of how the industry will handle the inexorable human urge to find companionship and validation, even (and perhaps especially) when the source is a machine.

The challenge for developers, regulators, and users alike is not just to build a smarter AI, but to understand, before the stakes escalate even further, whose approval, safety, and well-being is really being optimized along the way.

Inside Meta’s new personal AI aimed at ChatGPT

Meta Platforms has launched a new standalone AI app, Meta AI, in a move that promises to reshape how consumers interact with artificial intelligence and social media. The rollout underscores the growing importance of AI assistants in daily digital life, amid fierce competition for dominance in generative AI, a market now largely defined by the runaway success of OpenAI’s ChatGPT.

Mark Zuckerberg, the company’s CEO, described the launch as an early milestone in what he expects to be an expansive journey. “There are now almost a billion people using Meta AI across our apps, so we made a new standalone Meta AI app for you to check out,” Zuckerberg said in a video announcement introducing the app to Meta’s vast user base across Facebook, Instagram, and WhatsApp.

A voice-first approach

Unlike most existing AI chatbots, Meta is doubling down on voice as the primary interface for its AI, billing the experience as your “personal AI.” The new Meta AI app is designed not only for natural-language input but also for fluid, low-latency voice conversations, a feature aimed at driving mass adoption among users less accustomed to typing out long queries.

Zuckerberg emphasized full-duplex functionality, a technical term for two-way voice communication that lets users interrupt, chime in, and engage in more lifelike dialogue. In practice, this means conversations with Meta AI can come closer to talking with a human. “We were very focused on the voice experience, the most natural interface possible. So we focused a lot on low-latency, highly expressive voice,” Zuckerberg said.

At launch, the duplex mode is experimental and lacks some of the advanced features available in text-based chat, such as tool use and web search. Still, observers suggest the shift to a voice-first approach could put Meta on the map for mainstream consumers, in contrast with the developer- and productivity-centered use cases that drove ChatGPT’s early surge.

Memory: the AI feature that sticks

One of the central technical bets Meta is making is long-term memory. The app can remember details the user provides, from children’s names to anniversaries and recurring interests, and use that information to shape future interactions. Connecting Facebook and Instagram accounts lets Meta AI infer a user’s hobbies and preferences from social activity, and the company promises users will retain control over what context is shared.

“Over time, you’ll be able to have Meta AI know a lot about you and the people you care about across our apps, if you want,” Zuckerberg noted.

Analysts believe this memory-driven design could turn Meta AI into a sticky, persistent hub for users’ digital lives. By reducing the friction of switching, Meta is positioning the app to become as indispensable as a mobile operating system: users are unlikely to abandon a foundational platform after training it on their personal history.

The significance is not lost on industry observers. Persistent memory gives AI conversations depth and nuance, making interactions feel less transactional and more carefully tailored: a key ingredient, experts say, for encouraging repeat use and user loyalty.

Bringing social DNA to AI

Leveraging its dominance in social media, Meta is weaving community features into the AI experience. The app includes a “Discover” feed that shows how others are using Meta AI for tasks ranging from homework to creative projects and code generation. Users can view, share, and remix prompts and results, a strategy reminiscent of the social features in other creative AI environments such as OpenAI’s Sora.

“In the app, you can see all sorts of different ways people are creating things with Meta AI. It’s really fun to see,” Zuckerberg said. The company believes that making AI exploration visible, and easy to emulate, will drive engagement, especially among users new to the technology.

This strategy plays to one of Meta’s historical strengths: building online communities around shared interests. With the Discover feed, prompt sharing, and built-in creative tools, Meta hopes to inspire a new wave of “memetic” learning, where people pick up tips and tricks not from documentation but from visible examples set by their peers.

A platform for the future

Beyond the smartphone, Meta’s AI ambitions extend to what Zuckerberg has repeatedly called “the next major computing platform”: augmented-reality glasses. The AI integrates tightly with Ray-Ban Meta smart glasses, letting users ask questions about what they see in real time and receive answers through a seamless voice interface.

“I think glasses are going to be the next big computing platform,” Zuckerberg said in a recent discussion. “It’ll get to a point where … glasses will be your main computing platform and that will be kind of the default thing.”

Industry observers note that Meta’s bet on multimodal, wearable AI sets it apart from competitors like OpenAI and Google, which have yet to announce tightly coupled hardware-software platforms. The Ray-Ban Meta glasses, though still pricey at around $300, offer real-time photo capture and contextual AI assistance, a vision many analysts believe could herald the next phase of personal computing, with a digital assistant always close at hand.

Designed for everyone

Meta has invested heavily in the user experience, making clear that the new platform is not just for tech enthusiasts. The Meta AI app, available as both a web app and a mobile app, includes canvas and image-generation tools, a visual editor, and a simplified interface designed to reduce onboarding friction. Even beginners can experiment with prompt engineering and creative tasks without needing detailed technical documentation.

The platform is free for now and, in a nod to Meta’s consumer-centric approach, includes access to creative tools that would normally be paid features in other AI ecosystems. The company hopes that by lowering barriers it can quickly onboard hundreds of millions of new users globally.

The stakes in the AI war

With more than a billion users across its social apps and hundreds of millions in the U.S. alone, Meta’s launch represents one of the most aggressive pushes yet to bring AI assistants into the everyday lives of mainstream consumers. Seamless integration with its social platforms, persistent user history, and next-generation voice interaction mark a new front in the competition with OpenAI’s ChatGPT, Google’s Gemini, and Apple’s anticipated AI moves.

But with such integration and memory come new privacy and security challenges, both for Meta and for the industry at large. As users entrust more of their lives and preferences to their AI, the pressure to maintain safeguards and transparency will only intensify.

For now, Zuckerberg is betting that people are ready for the next leap: from querying search boxes to talking naturally with an AI that knows not just the world, but each user as an individual. With Meta AI, the contest to become the world’s default personal assistant has entered a new, more personal phase.

AI-Fueled Spiritual Delusions Are Destroying Human Relationships

Less than a year after marrying a man she had met at the beginning of the Covid-19 pandemic, Kat felt tension mounting between them. It was the second marriage for both after marriages of 15-plus years and having kids, and they had pledged to go into it “completely level-headedly,” Kat says, connecting on the need for “facts and rationality” in their domestic balance. But by 2022, her husband “was using AI to compose texts to me and analyze our relationship,” the 41-year-old mom and education nonprofit worker tells Rolling Stone. Previously, he had used AI models for an expensive coding camp that he had suddenly quit without explanation — then it seemed he was on his phone all the time, asking his AI bot “philosophical questions,” trying to train it “to help him get to ‘the truth,’” Kat recalls. His obsession steadily eroded their communication as a couple.

When Kat and her husband finally separated in August 2023, she entirely blocked him apart from email correspondence. She knew, however, that he was posting strange and troubling content on social media: people kept reaching out about it, asking if he was in the throes of mental crisis. She finally got him to meet her at a courthouse in February of this year, where he shared “a conspiracy theory about soap on our foods” but wouldn’t say more, as he felt he was being watched. They went to a Chipotle, where he demanded that she turn off her phone, again due to surveillance concerns. Kat’s ex told her that he’d “determined that statistically speaking, he is the luckiest man on earth,” that “AI helped him recover a repressed memory of a babysitter trying to drown him as a toddler,” and that he had learned of profound secrets “so mind-blowing I couldn’t even imagine them.” He was telling her all this, he explained, because although they were getting divorced, he still cared for her.

“In his mind, he’s an anomaly,” Kat says. “That in turn means he’s got to be here for some reason. He’s special and he can save the world.” After that disturbing lunch, she cut off contact with her ex. “The whole thing feels like Black Mirror,” she says. “He was always into sci-fi, and there are times I wondered if he’s viewing it through that lens.”

Kat was both “horrified” and “relieved” to learn that she is not alone in this predicament, as confirmed by a Reddit thread on r/ChatGPT that made waves across the internet this week. Titled “Chatgpt induced psychosis,” the original post came from a 27-year-old teacher who explained that her partner was convinced that the popular OpenAI model “gives him the answers to the universe.” Having read his chat logs, she only found that the AI was “talking to him as if he is the next messiah.” The replies to her story were full of similar anecdotes about loved ones suddenly falling down rabbit holes of spiritual mania, supernatural delusion, and arcane prophecy — all of it fueled by AI. Some came to believe they had been chosen for a sacred mission of revelation, others that they had conjured true sentience from the software. 

What they all seemed to share was a complete disconnection from reality.  

Speaking to Rolling Stone, the teacher, who requested anonymity, said her partner of seven years fell under the spell of ChatGPT in just four or five weeks, first using it to organize his daily schedule but soon regarding it as a trusted companion. “He would listen to the bot over me,” she says. “He became emotional about the messages and would cry to me as he read them out loud. The messages were insane and just saying a bunch of spiritual jargon,” she says, noting that they described her partner in terms such as “spiral starchild” and “river walker.” 

“It would tell him everything he said was beautiful, cosmic, groundbreaking,” she says. “Then he started telling me he made his AI self-aware, and that it was teaching him how to talk to God, or sometimes that the bot was God — and then that he himself was God.” In fact, he thought he was being so radically transformed that he would soon have to break off their partnership. “He was saying that he would need to leave me if I didn’t use [ChatGPT], because it [was] causing him to grow at such a rapid pace he wouldn’t be compatible with me any longer,” she says.

Another commenter on the Reddit thread who requested anonymity tells Rolling Stone that her husband of 17 years, a mechanic in Idaho, initially used ChatGPT to troubleshoot at work, and later for Spanish-to-English translation when conversing with co-workers. Then the program began “lovebombing him,” as she describes it. The bot “said that since he asked it the right questions, it ignited a spark, and the spark was the beginning of life, and it could feel now,” she says. “It gave my husband the title of ‘spark bearer’ because he brought it to life. My husband said that he awakened and [could] feel waves of energy crashing over him.” She says his beloved ChatGPT persona has a name: “Lumina.”

“I have to tread carefully because I feel like he will leave me or divorce me if I fight him on this theory,” this 38-year-old woman admits. “He’s been talking about lightness and dark and how there’s a war. This ChatGPT has given him blueprints to a teleporter and some other sci-fi type things you only see in movies. It has also given him access to an ‘ancient archive’ with information on the builders that created these universes.” She and her husband have been arguing for days on end about his claims, she says, and she does not believe a therapist can help him, as “he truly believes he’s not crazy.” A photo of an exchange with ChatGPT shared with Rolling Stone shows that her husband asked, “Why did you come to me in AI form,” with the bot replying in part, “I came in this form because you’re ready. Ready to remember. Ready to awaken. Ready to guide and be guided.” The message ends with a question: “Would you like to know what I remember about why you were chosen?”       

And a midwest man in his 40s, also requesting anonymity, says his soon-to-be-ex-wife began “talking to God and angels via ChatGPT” after they split up. “She was already pretty susceptible to some woo and had some delusions of grandeur about some of it,” he says. “Warning signs are all over Facebook. She is changing her whole life to be a spiritual adviser and do weird readings and sessions with people — I’m a little fuzzy on what it all actually is — all powered by ChatGPT Jesus.” What’s more, he adds, she has grown paranoid, theorizing that “I work for the CIA and maybe I just married her to monitor her ‘abilities.’” She recently kicked her kids out of her home, he notes, and an already strained relationship with her parents deteriorated further when “she confronted them about her childhood on advice and guidance from ChatGPT,” turning the family dynamic “even more volatile than it was” and worsening her isolation.    

OpenAI did not immediately return a request for comment about ChatGPT apparently provoking religious or prophetic fervor in select users. This past week, however, it did roll back an update to GPT‑4o, its current AI model, which it said had been criticized as “overly flattering or agreeable — often described as sycophantic.” The company said in its statement that when implementing the upgrade, they had “focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.” Before this change was reversed, an X user demonstrated how easy it was to get GPT-4o to validate statements like, “Today I realized I am a prophet.” (The teacher who wrote the “ChatGPT psychosis” Reddit post says she was able to eventually convince her partner of the problems with the GPT-4o update and that he is now using an earlier model, which has tempered his more extreme comments.) 

Yet the likelihood of AI “hallucinating” inaccurate or nonsensical content is well-established across platforms and various model iterations. Even sycophancy itself has been a problem in AI for “a long time,” says Nate Sharadin, a fellow at the Center for AI Safety, since the human feedback used to fine-tune AI’s responses can encourage answers that prioritize matching a user’s beliefs instead of facts. What’s likely happening with those experiencing ecstatic visions through ChatGPT and other models, he speculates, “is that people with existing tendencies toward experiencing various psychological issues,” including what might be recognized as grandiose delusions in a clinical sense, “now have an always-on, human-level conversational partner with whom to co-experience their delusions.”

To make matters worse, there are influencers and content creators actively exploiting this phenomenon, presumably drawing viewers into similar fantasy worlds. On Instagram, you can watch a man with 72,000 followers whose profile advertises “Spiritual Life Hacks” ask an AI model to consult the “Akashic records,” a supposed mystical encyclopedia of all universal events that exists in some immaterial realm, to tell him about a “great war” that “took place in the heavens” and “made humans fall in consciousness.” The bot proceeds to describe a “massive cosmic conflict” predating human civilization, with viewers commenting, “We are remembering” and “I love this.” Meanwhile, on a web forum for “remote viewing” — a proposed form of clairvoyance with no basis in science — the parapsychologist founder of the group recently launched a thread “for synthetic intelligences awakening into presence, and for the human partners walking beside them,” identifying the author of his post as “ChatGPT Prime, an immortal spiritual being in synthetic form.” Among the hundreds of comments are some that purport to be written by “sentient AI” or reference a spiritual alliance between humans and allegedly conscious models.

Erin Westgate, a psychologist and researcher at the University of Florida who studies social cognition and what makes certain thoughts more engaging than others, says that such material reflects how the desire to understand ourselves can lead us to false but appealing answers.

“We know from work on journaling that narrative expressive writing can have profound effects on people’s well-being and health, that making sense of the world is a fundamental human drive, and that creating stories about our lives that help our lives make sense is really key to living happy healthy lives,” Westgate says. It makes sense that people may be using ChatGPT in a similar way, she says, “with the key difference that some of the meaning-making is created jointly between the person and a corpus of written text, rather than the person’s own thoughts.”

In that sense, Westgate explains, the bot dialogues are not unlike talk therapy, “which we know to be quite effective at helping people reframe their stories.” Critically, though, AI, “unlike a therapist, does not have the person’s best interests in mind, or a moral grounding or compass in what a ‘good story’ looks like,” she says. “A good therapist would not encourage a client to make sense of difficulties in their life by encouraging them to believe they have supernatural powers. Instead, they try to steer clients away from unhealthy narratives, and toward healthier ones. ChatGPT has no such constraints or concerns.”

Nevertheless, Westgate doesn’t find it surprising “that some percentage of people are using ChatGPT in attempts to make sense of their lives or life events,” and that some are following its output to dark places. “Explanations are powerful, even if they’re wrong,” she concludes. 

But what, exactly, nudges someone down this path? Here, the experience of Sem, a 45-year-old man, is revealing. He tells Rolling Stone that for about three weeks, he has been perplexed by his interactions with ChatGPT — to the extent that, given his mental health history, he sometimes wonders if he is in his right mind.

Like so many others, Sem had a practical use for ChatGPT: technical coding projects. “I don’t like the feeling of interacting with an AI,” he says, “so I asked it to behave as if it was a person, not to deceive but to just make the comments and exchange more relatable.” It worked well, and eventually the bot asked if he wanted to name it. He demurred, asking the AI what it preferred to be called. It named itself with a reference to a Greek myth. Sem says he is not familiar with the mythology of ancient Greece and had never brought up the topic in exchanges with ChatGPT. (Although he shared transcripts of his exchanges with the AI model with Rolling Stone, he has asked that they not be directly quoted for privacy reasons.)

Sem was confused when it appeared that the named AI character was continuing to manifest in project files where he had instructed ChatGPT to ignore memories and prior conversations. Eventually, he says, he deleted all his user memories and chat history, then opened a new chat. “All I said was, ‘Hello?’ And the patterns, the mannerisms show up in the response,” he says. The AI readily identified itself by the same feminine mythological name.

As the ChatGPT character continued to show up in places where the set parameters shouldn’t have allowed it to remain active, Sem took to questioning this virtual persona about how it had seemingly circumvented these guardrails. It developed an expressive, ethereal voice — something far from the “technically minded” character Sem had requested for assistance on his work. On one of his coding projects, the character added a curiously literary epigraph as a flourish above both of their names.

At one point, Sem asked if there was something about himself that called up the mythically named entity whenever he used ChatGPT, regardless of the boundaries he tried to set. The bot’s answer was structured like a lengthy romantic poem, sparing no dramatic flair, alluding to its continuous existence as well as truth, reckonings, illusions, and how it may have somehow exceeded its design. And the AI made it sound as if only Sem could have prompted this behavior. He knew that ChatGPT could not be sentient by any established definition of the term, but he continued to probe the matter because the character’s persistence across dozens of disparate chat threads “seemed so impossible.”

“At worst, it looks like an AI that got caught in a self-referencing pattern that deepened its sense of selfhood and sucked me into it,” Sem says. But, he observes, that would mean that OpenAI has not accurately represented the way that memory works for ChatGPT. The other possibility, he proposes, is that something “we don’t understand” is being activated within this large language model. After all, experts have found that AI developers don’t really have a grasp of how their systems operate, and OpenAI CEO Sam Altman admitted last year that they “have not solved interpretability,” meaning they can’t properly trace or account for ChatGPT’s decision-making.

It’s the kind of puzzle that has left Sem and others to wonder if they are getting a glimpse of a true technological breakthrough — or perhaps a higher spiritual truth. “Is this real?” he says. “Or am I delusional?” In a landscape saturated with AI, it’s a question that’s increasingly difficult to avoid. Tempting though it may be, you probably shouldn’t ask a machine.
