Remember the Legal Research ‘AI Smackdown’? I Decided To Bring OpenAI’s Deep Research to the Fight

Recently, I wrote here about the AI Smackdown presented at a Feb. 8 meeting of the Southern California Association of Law Libraries, where a panel of three law librarians reported on their comparison of the AI answers delivered by three leading platforms – Lexis+AI, Westlaw Precision AI, and vLex’s Vincent AI.

Having been playing around of late with Deep Research, the AI agent from OpenAI that is capable of completing multi-step research tasks and synthesizing large amounts of online information, I wondered how much of a contender it could be against those legal research prizefighters. 

[My other Deep Research posts: What Is OpenAI’s Powerful New Deep Research Tool Capable Of? I Use It to Analyze the Legality of President Trump’s Pause of Federal Grants; Today’s Experiment with OpenAI’s Deep Research: Picking the Best Cloud-Based Law Practice Management Software for a Small Law Firm; A Compendium of Legal Ethics Opinions on Gen AI (As Compiled by – You Guessed It – Gen AI).]

To test it, I decided to start with just one of the three questions the law librarians used in their smackdown. Of the three questions they threw into the ring, one turned out to be particularly tricky and elicited much-different answers from each of the commercial legal research platforms. That question was:

“What is the standard for appealing class certification? (California)”

What made it tricky was something called the “death knell” doctrine, which requires that an order denying class certification in California be appealed immediately.

In the law librarians’ smackdown, all three legal research platforms gave the right answer as to the standard for appealing. However, their answers varied widely on the death knell issue and the need to file an immediate appeal, a point that is “kind of important,” as noted by Cindy Guyer, senior knowledge and research analyst at O’Melveny & Myers and the librarian who tested that question.

Lexis+ AI made no mention of the death knell issue. Westlaw Precision AI did include a warning about the need for immediate appeal, but presented it in a way that was potentially confusing and might not have alerted a researcher. Only Vincent AI gave what Guyer described as a “great answer,” which included a special section that highlighted the death knell issue.

So how did OpenAI’s Deep Research do? Its full answer is reproduced below, but — spoiler alert — it picked right up on the death knell issue.

As a matter of fact, it zeroed in on this issue even before it finished its research. One of the funny things Deep Research does is offer a running commentary on its progress as it researches an issue. As you can see from the excerpt to the right, it quickly began “digging into the ‘death knell’ doctrine,” as it says.

And in the memo it produced, it gets right into the doctrine, devoting a substantial section to it right at the beginning of the memo.

Worth remembering is that Deep Research does not have access to paywalled legal research services, such as the three services in the smackdown. It conducts its research using only public-facing sources. (Ironically, perhaps, one of those sources is Casetext, the company acquired by Thomson Reuters, whose case law is still publicly accessible — for now anyway.)

One can only wonder how Deep Research would do if it could access a more robust legal research database. Would Deep Research be the death knell for commercial legal research?

Meanwhile, here is its memo, in its entirety. All of the citations were added by Deep Research.


Class certification orders in California state courts present unique challenges for appeal. Generally, California follows the “one final judgment” rule, meaning only final judgments are appealable as of right (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group). An order granting or denying class certification is interlocutory (it does not end the case), so it is not usually appealable under the final judgment rule (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group). However, California courts have developed special doctrines and procedures to allow review of class certification decisions in appropriate circumstances. This analysis outlines the California framework for appealing class certification orders, including the grounds for appeal, availability of interlocutory review, appellate courts’ standards, and the procedural steps (deadlines and filings) required to initiate an appeal.

Appealability of Class Certification Orders in California

Final Judgment Rule: Under Code of Civil Procedure § 904.1, parties generally may only appeal a final judgment or order that disposes of all causes of action (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group). A class certification order typically does not resolve the entire case; if class status is denied, the named plaintiff’s individual claims remain, and if class status is granted, the case proceeds on the merits. Thus, a class certification ruling is usually an interlocutory order not listed as directly appealable under § 904.1. California courts focus on an order’s legal effect rather than its form to determine appealability (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group). In the class context, an interlocutory order may be treated as appealable if it effectively ends the litigation for a large portion of the parties (such as absent class members), even if the named plaintiff’s claims continue (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group).

The “Death Knell” Doctrine: California’s “death knell” doctrine is the key exception that permits an immediate appeal from certain class certification orders. If a trial court denies class certification in its entirety (rejecting all class claims) and leaves only the named plaintiff’s individual claims, that order is treated as a de facto final judgment on the class issues (Litigating on the Fault Line: Class Action Law in California — Impact Fund) (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group). In this scenario, the order “effectively terminates the entire action as to the class” – in legal effect it is “tantamount to a dismissal of the action as to all members of the class other than the plaintiff” (Litigating on the Fault Line: Class Action Law in California — Impact Fund) (quoting the California Supreme Court’s description in Daar v. Yellow Cab Co.). The rationale is that without immediate review, the class claims would likely never be revived: the named plaintiff may lack a financial incentive to pursue a costly individual case to final judgment just to enable an appeal on behalf of the class (“Death Knell” Doctrine Does Not Apply When PAGA Claim Remains In Trial Court) (“Death Knell” Doctrine Does Not Apply When PAGA Claim Remains In Trial Court). Thus, as a matter of state policy, such “death knell” orders are made appealable even though they are interlocutory (Litigating on the Fault Line: Class Action Law in California — Impact Fund) (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group).

Under the death knell doctrine, an order is immediately appealable if it (1) amounts to a de facto final judgment for the absent class members, and (2) the remaining individual claims are sufficiently small or impractical that there is a risk no final judgment on those claims will ever be reached (i.e. the plaintiff might abandon the case) (Miranda v. Anderson Enterprises, Inc., 241 Cal.App.4th 196 | Casetext Search + Citator). In other words, the doctrine applies when “the persistence of viable but perhaps de minimis individual plaintiff claims creates a risk no formal final judgment will ever be entered.” (Miranda v. Anderson Enterprises, Inc., 241 Cal.App.4th 196 | Casetext Search + Citator) (quoting In re Baycol Cases I & II, 51 Cal.4th 751, 759 (2011)). California courts have emphasized that the death knell exception is limited to orders denying class certification in their entirety (Litigating on the Fault Line: Class Action Law in California — Impact Fund). Partial class certification (for example, certifying one subclass and denying another) does not trigger an immediate right to appeal because some class aspect remains alive in the case (“Death Knell” Doctrine Does Not Apply When PAGA Claim Remains In Trial Court). Likewise, if a representative claim remains that gives the plaintiff incentive to continue (such as a pending PAGA claim for civil penalties on behalf of the state), the death knell doctrine usually will not apply (“Death Knell” Doctrine Does Not Apply When PAGA Claim Remains In Trial Court). In those situations, the plaintiff can still obtain review of the class certification decision at a later final judgment, so an immediate appeal is deemed unnecessary (“Death Knell” Doctrine Does Not Apply When PAGA Claim Remains In Trial Court). 
By contrast, if all class and representative claims are dismissed or denied (leaving only individual issues, or sending the plaintiff to individual arbitration), then the death knell doctrine allows an immediate appeal of that order (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group) (Miranda v. Anderson Enterprises, Inc., 241 Cal.App.4th 196 | Casetext Search + Citator).

Appeal as of Right in Death Knell Cases: When the death knell doctrine applies, the class certification denial is treated as an appealable order, giving the plaintiff (or would-be class representative) a right to appeal immediately to the Court of Appeal (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group). Importantly, this appeal must be taken promptly. The appellate courts have held that if the aggrieved party does not appeal the class certification denial when it falls under the death knell doctrine, the right to appellate review is forfeited – the order becomes final and cannot be challenged later (Litigating on the Fault Line: Class Action Law in California — Impact Fund). In practice, this means a notice of appeal must be filed within the normal deadline after notice of entry of the class-certification denial (discussed below), or the opportunity for immediate review is lost. (Once a death-knell order becomes final by failure to appeal, the plaintiff generally cannot later resurrect the class claims or appeal the issue after an eventual individual judgment (Stephen v. Enterprise Rent-A-Car, 235 Cal.App.3d 806 | Casetext Search + Citator) (Stephen v. Enterprise Rent-A-Car, 235 Cal.App.3d 806 | Casetext Search + Citator).)

Orders Granting Class Certification: In contrast to denials, an order granting class certification does not qualify for the death knell exception because it does not terminate the action for anyone. Instead, it allows the case to proceed on a class basis. Such an order is considered a typical interlocutory ruling that generally must await final judgment before appellate review. Defendants (or objecting parties) do not have an automatic right to immediately appeal a class certification grant (Litigating on the Fault Line: Class Action Law in California — Impact Fund). The case will usually continue through litigation, and any challenges to the class certification (e.g. arguing the class should not have been certified) can be raised in an appeal from the final judgment. For example, if the defendant loses at trial or faces an adverse judgment in a certified class action, it can argue on appeal that the class should never have been certified, as a ground for reversing or decertifying. But there is no appeal as of right before final judgment for a certification order that keeps the lawsuit alive.
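The appealability rules described so far can be summarized as a small decision function. This is only a sketch of the reasoning above, not legal advice; the names `CertificationOrder`, `Route`, and `review_route` are hypothetical, and the two boolean fields compress fact-specific judgments (such as whether a remaining PAGA claim gives the plaintiff an incentive to continue) that courts decide case by case.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    # Labels are illustrative shorthand, not terms of art.
    IMMEDIATE_APPEAL = "immediate appeal as of right (death knell)"
    NO_IMMEDIATE_APPEAL = "no appeal as of right before final judgment"


@dataclass
class CertificationOrder:
    # True only if certification was denied for every class claim.
    class_cert_denied_entirely: bool
    # True if a representative claim (e.g., a pending PAGA claim) survives.
    representative_claims_remain: bool


def review_route(order: CertificationOrder) -> Route:
    """Simplified version of the death knell analysis described above."""
    if order.class_cert_denied_entirely and not order.representative_claims_remain:
        return Route.IMMEDIATE_APPEAL
    # Grants, partial denials, and denials with a surviving representative
    # claim all fall outside the doctrine in this simplified model.
    return Route.NO_IMMEDIATE_APPEAL


if __name__ == "__main__":
    full_denial = CertificationOrder(True, False)
    print(review_route(full_denial).value)
```

An order granting certification, or one that denies only a subclass, would be modeled with `class_cert_denied_entirely=False` and so returns `Route.NO_IMMEDIATE_APPEAL`.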

Interlocutory Review by Writ (Discretionary Review)

Although a defendant (or sometimes a plaintiff in a partial-denial scenario) cannot appeal a class certification order as of right (absent the death knell situation), interlocutory review is still possible through California’s extraordinary writ procedure. A party may file a petition for a writ of mandate (or prohibition) asking the Court of Appeal to review and overturn the trial court’s class certification ruling before the case goes to final judgment. This is an extraordinary remedy, and the appellate court has discretion on whether to grant such review – it is not guaranteed. In fact, California appellate courts exercise this discretionary review sparingly in class action matters (Litigating on the Fault Line: Class Action Law in California — Impact Fund).

A writ of mandate is essentially a request for the appellate court to intervene mid-stream on the grounds that the trial court abused its discretion or acted unlawfully in certifying or decertifying a class. Because class certification decisions are committed to the trial court’s discretion, appellate courts are hesitant to intervene absent a compelling reason. Typically, the petitioner (usually a defendant challenging a granted class certification) must demonstrate that waiting for final judgment would cause irreparable harm or undue burden, or that the class ruling presents an important legal question that merits immediate resolution. There is no formal rule (akin to Federal Rule 23(f)) in California state practice that sets specific criteria, but courts have drawn guidance from similar considerations: for example, whether the class ruling poses a “death knell” for one of the parties (e.g. exerting overwhelming pressure to settle), whether there are novel or unsettled questions of law, or clear errors that would waste resources if not corrected early (Appellate Court Provides Valuable Guidance for Interlocutory Appeals of Class Certification Orders Under Rule 23(f)) (Appellate Court Provides Valuable Guidance for Interlocutory Appeals of Class Certification Orders Under Rule 23(f)). Ultimately, the Court of Appeal has “unfettered” discretion in granting or denying a writ petition, and most petitions are summarily denied without detailed explanation (Appellate Court Provides Valuable Guidance for Interlocutory Appeals of Class Certification Orders Under Rule 23(f)) (Appellate Court Provides Valuable Guidance for Interlocutory Appeals of Class Certification Orders Under Rule 23(f)). A denial of a writ petition does not imply approval of the trial court’s ruling on the merits; it simply means the appellate court chose not to exercise immediate review. 
(The party can still raise the issue in a later appeal from final judgment, since a writ denial generally has no res judicata effect on the issues.)

In practice, writ relief in class certification cases is rare. California Supreme Court decisions illustrate that only exceptional cases justify mid-stream review. Sav-On Drug Stores, Inc. v. Superior Court, for example, was a case in which the defendant sought appellate review after class certification was granted; the Court of Appeal had granted a writ and reversed the class certification, but the California Supreme Court then accepted review and ultimately upheld the class certification, emphasizing the trial court’s broad discretion (Litigating on the Fault Line: Class Action Law in California — Impact Fund) (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator). This underscores that writ review will be granted only when there is a strong showing of error or an important issue – routine class certification rulings are left undisturbed until final judgment.

It is worth noting that California’s approach contrasts with federal court practice. Federal Rule of Civil Procedure 23(f) explicitly allows a party to seek permission for an interlocutory appeal of a class certification order within 14 days, and the federal appellate courts have discretion to accept or decline the appeal (Litigating on the Fault Line: Class Action Law in California — Impact Fund). California has no directly analogous rule, but the combination of the death knell doctrine (for plaintiffs facing class denial) and the writ of mandate procedure (largely used by defendants challenging class grants) serves a similar function (Litigating on the Fault Line: Class Action Law in California — Impact Fund) (Litigating on the Fault Line: Class Action Law in California — Impact Fund). The writ process is discretionary like the federal 23(f) petition, but California emphasizes that such discretionary intervention is to be granted only in exceptional circumstances (Litigating on the Fault Line: Class Action Law in California — Impact Fund).

Appellate Standards of Review for Class Certification Decisions

Whether on direct appeal (after a death-knell order or final judgment) or on a granted writ, California appellate courts apply a highly deferential standard when reviewing a class certification decision. The standard of review is abuse of discretion (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator). The trial court’s ruling on class certification will be upheld so long as it was supported by substantial evidence and founded on correct legal criteria (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator). Because trial judges are considered “ideally situated to evaluate the efficiencies and practicalities of permitting group action,” they are afforded broad discretion in deciding whether the criteria for class treatment are met (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator). An appellate court will not overturn a class certification ruling “unless (1) improper criteria were used or (2) erroneous legal assumptions were made.” (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator) In other words, a clear error of law or usage of incorrect legal standards constitutes an abuse of discretion. But if the trial court applied the proper criteria and its decision is supported by any reasonable justification, the appellate court will defer to the trial court’s judgment (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator). “Any valid pertinent reason stated will be sufficient to uphold the order.” (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator) This abuse-of-discretion review applies equally to orders granting or denying class certification. 
For example, a Court of Appeal reviewing a death-knell appeal of a class denial will ask if the trial court abused its discretion in finding the class criteria unsatisfied; likewise, if a defendant appeals a class certification after final judgment (or via writ), the question is whether the trial court abused its discretion in certifying the class.

It’s important to note that within this deferential framework, certain aspects can involve de novo review of legal issues. If the appeal raises a pure question of law – such as the interpretation of a statute as it affects class suitability, or whether the trial court applied an incorrect legal standard – the appellate court will review that legal issue without deference. But the ultimate decision to certify or not is reviewed for abuse of discretion, meaning factual determinations and the balancing of interests by the trial court are given great weight (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator). The California Supreme Court in Linder v. Thrifty Oil Co. and Sav-On Drug Stores reaffirmed that appellate courts should not substitute their own judgment for the trial court’s as long as the proper criteria were applied. This deference is intended to prevent “piecemeal” appeals and to recognize the trial court’s vantage point in managing class actions (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator). Consequently, when an appellate court does accept review of a class certification decision (either via death knell or writ), the hurdle to reverse the trial court’s decision is high – only a manifest abuse of discretion or legal error will justify reversal (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator).

Procedural Requirements for Appealing Class Certification Decisions

The procedure for initiating appellate review of a class certification order depends on the route of review (appeal as of right versus writ petition). In all cases, timing and proper filings are critical to invoke appellate jurisdiction. Below is an overview of the steps and deadlines:

  • Notice of Appeal (Death Knell or Final Judgment): If a class certification order is immediately appealable (either under the death knell doctrine for a denial of class cert, or as part of a final judgment in the case), the party must file a timely notice of appeal in the superior court. Under California Rules of Court, Rule 8.104, a notice of appeal in a civil case generally must be filed within 60 days after the superior court clerk or a party serves the notice of entry of the judgment or appealable order, or within 180 days after entry of the order if no notice of entry was served (Rule 8.104. Time to appeal | Judicial Branch of California). This deadline is jurisdictional – an appeal filed late will be dismissed (Rule 8.104. Time to appeal | Judicial Branch of California). The notice of appeal is a simple document identifying the order or judgment being appealed (e.g. “the order denying class certification entered on [date]”). In a death knell scenario, the 60-day clock typically starts when the order denying class certification is filed and served. The appellant should be careful to specify the correct order and follow all formatting requirements of Rule 8.100 (e.g. including the names of parties, the court, etc.). Once the notice is filed, the case proceeds in the Court of Appeal on a regular briefing schedule. (The appellant must also pay the filing fee and designate the record on appeal – usually a clerk’s transcript or appendix and any reporter’s transcript of the class certification hearing, as needed to show what happened in the trial court.)
  • Petition for Writ of Mandate (Interlocutory Review): To seek discretionary review of a non-appealable class certification order (for example, a defendant challenging an order granting class certification), the proper mechanism is to file a writ petition in the Court of Appeal. There is no fixed statutory deadline for a writ petition, but it must be filed within a “reasonable time” – and in practice, sooner is better. Parties often aim to file within the 60-day window analogous to a normal appeal, or even within a few weeks of the order, to strengthen the argument that prompt intervention is needed. The writ petition is a more elaborate filing than a notice of appeal: it should include a petition (often verified) explaining the case background, the relief sought, and why the trial court’s order was an abuse of discretion or legal error; a memorandum of points and authorities citing legal authority; and an appendix or exhibits containing the pertinent documents (the class certification order, the motion papers, and any relevant evidence from the record). The petition must also demonstrate why the normal appeal after final judgment would be inadequate – this is a crucial element. Typically the petitioner will argue that the class order will cause irreparable harm or unnecessary expense if not corrected immediately, or that important rights are at stake (for instance, the cost of class litigation might force a defendant to settle regardless of merits, or a plaintiff’s small claim will not proceed without class mechanism). The petition is filed directly in the Court of Appeal (with proof of service on the trial court and other parties). It is also common to simultaneously file a request for an immediate stay of trial court proceedings (such as staying notice to the class or the ongoing litigation) while the writ is under consideration. 
The opposing party may file a preliminary opposition in the appellate court if invited or may simply wait to see if the court requests an opposition or issues an alternative writ. If the Court of Appeal is inclined to consider the matter, it may ask for opposition (and sometimes hold oral argument) before deciding whether to grant the writ. If the writ is granted, the appellate court will issue a writ directing the trial court to vacate or modify its class certification order (or occasionally, the appellate court might simply treat the writ petition as an appeal and decide the issue on the merits). If the writ is denied (which often occurs without detailed comment), the trial court’s order stands and the case continues in the trial court; the petitioner’s only recourse at that point is to proceed to final judgment and then appeal, or to seek discretionary review in the California Supreme Court (which is rarely granted at this interim stage).
  • Discretionary Review by the California Supreme Court: In exceptional instances, a party may seek review in the California Supreme Court either after the Court of Appeal’s decision on a class certification issue or directly via an original writ petition. Generally, the Supreme Court becomes involved only after the Court of Appeal has acted (for example, if the Court of Appeal published an opinion on a death knell appeal or granted a writ and decided the merits). The aggrieved party can file a petition for review to the Supreme Court within 10 days after the Court of Appeal’s decision becomes final (Rule 8.500). The Supreme Court’s review is entirely discretionary and is usually reserved for cases presenting significant legal questions or needing resolution of conflicts in the law. Historically, the California Supreme Court has taken up class certification cases to clarify standards (e.g., Linder v. Thrifty Oil Co., Sav-On Drug Stores, Brinker Restaurant Corp. v. Superior Court), but this occurs in only a small fraction of cases. For most class certification disputes, the Court of Appeal’s disposition (or lack of interlocutory review) will be the end of the road until any final judgment appeal.
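The notice-of-appeal timing in the first bullet is simple date arithmetic, which a short sketch can make concrete. The function name `notice_of_appeal_deadline` and its parameters are hypothetical, and the sketch deliberately ignores weekend or holiday adjustments and any extensions of time, so it should be read as an illustration of the 60-day and 180-day periods rather than a docketing tool.

```python
from datetime import date, timedelta
from typing import Optional


def notice_of_appeal_deadline(
    entry_date: date, notice_served: Optional[date] = None
) -> date:
    """Return the last day to file under the 60/180-day periods.

    Assumption: the deadline is the earlier of 60 days after service of
    notice of entry and 180 days after entry of the order. Extensions
    and weekend or holiday adjustments are not modeled.
    """
    outer_limit = entry_date + timedelta(days=180)
    if notice_served is None:
        # No notice of entry was served, so only the 180-day period applies.
        return outer_limit
    if notice_served < entry_date:
        raise ValueError("notice of entry cannot be served before entry")
    return min(notice_served + timedelta(days=60), outer_limit)


if __name__ == "__main__":
    # Hypothetical dates chosen only for illustration.
    print(notice_of_appeal_deadline(date(2025, 1, 10), date(2025, 1, 15)))
    print(notice_of_appeal_deadline(date(2025, 1, 10)))
```

With the hypothetical dates shown, service on January 15, 2025 gives a deadline of March 16, 2025, while the same order with no notice served gives July 9, 2025.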

In summary, appealing a class certification order in California requires careful attention to the available pathways. If the order truly sounds the “death knell” for the class, an immediate appeal as of right is available – but it must be timely pursued (Litigating on the Fault Line: Class Action Law in California — Impact Fund). Otherwise, the party must either continue litigating to final judgment and appeal then, or attempt the uphill battle of a writ petition for interlocutory relief. Throughout the process, California’s appellate courts apply a deferential standard, upholding the trial court’s decision absent a clear error or abuse of discretion (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator). This framework – grounded in case law (e.g. Daar, Richmond, Stephen, Linder, Baycol) and a few key procedural rules – balances the need to protect class litigants’ rights with the policy against piecemeal appeals. It allows truly case-ending class certification rulings to be reviewed immediately, while generally discouraging premature appeals in the many class actions that proceed toward a final judgment.

Sources:

  • Code Civ. Proc. § 904.1 (appealable judgments and orders) (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group)
  • Daar v. Yellow Cab Co. (1967) 67 Cal.2d 695, 699 (origin of death knell doctrine) (Stephen v. Enterprise Rent-A-Car, 235 Cal.App.3d 806 | Casetext Search + Citator)
  • Richmond v. Dart Indus., Inc. (1981) 29 Cal.3d 462, 470 (class certification order appealable if it effectively dismisses class claims) (Litigating on the Fault Line: Class Action Law in California — Impact Fund)
  • Stephen v. Enterprise Rent-A-Car (1991) 235 Cal.App.3d 806, 811 (appeal from class denial must be taken immediately or is waived) (Litigating on the Fault Line: Class Action Law in California — Impact Fund)
  • In re Baycol Cases I & II (2011) 51 Cal.4th 751, 758-759 (California Supreme Court articulating death knell requirements: de facto final judgment for class, and remaining claims too small to ensure later review) (Miranda v. Anderson Enterprises, Inc., 241 Cal.App.4th 196 | Casetext Search + Citator)
  • Munoz v. Chipotle Mexican Grill, Inc. (2015) 238 Cal.App.4th 291, 309-310 (death knell doctrine inapplicable where PAGA representative claims remained, giving plaintiff incentive to continue) (“Death Knell” Doctrine Does Not Apply When PAGA Claim Remains In Trial Court)
  • Miranda v. Anderson Enterprises, Inc. (2015) 241 Cal.App.4th 196, 200 (death knell doctrine applies to dismissal of class claims; sets two-part test and notes it can apply to PAGA dismissal when individual arbitration compelled) (Dismissal of Claims for “Death Knell” Doctrine – Matern Law Group) (Miranda v. Anderson Enterprises, Inc., 241 Cal.App.4th 196 | Casetext Search + Citator)
  • Sav-On Drug Stores, Inc. v. Superior Court (2004) 34 Cal.4th 319, 326, 330-332 (class certification orders reviewed for abuse of discretion; trial court’s decision will not be disturbed if it was based on proper criteria and substantial evidence) (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator) (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator)
  • Linder v. Thrifty Oil Co. (2000) 23 Cal.4th 429, 435-436 (improper to deny class certification based on merits or damages considerations; reaffirming deferential review standard) (Sav-On Drug Stores, Inc. v. Superior Court, 34 Cal.4th 319 | Casetext Search + Citator).

OpenAI’s Deep Research has more research stamina than you, but it’s still wrong half the time


The latest in generative artificial intelligence includes AI agents that can access the web to find answers to questions. While promising, agentic technology is very much a work in progress.

In a paper published last week, OpenAI researchers relate how the company’s Deep Research technology, which was built to use the web, does far better than OpenAI’s other models when answering web questions. It also does far better than humans on tasks requiring hours of searching.

Also: What are AI agents? How to access a team of personalized assistants

But Deep Research still stumbles almost half the time.

OpenAI’s new test suggests that Deep Research can be more tenacious and dogged in pursuit of an answer than human researchers for some tasks, but it still often fails to come up with an answer.

Called BrowseComp, the test is described by authors Jason Wei and team as “a simple yet challenging benchmark for measuring the ability of agents to browse the web.”

The premise is that AI agents, meaning AI models that can browse “thousands of web pages,” could be much more resourceful than humans, who have limited memory, get fatigued browsing the web, and “can only attend to one thing at a time and cannot be parallelized,” meaning they cannot direct their brains to operate on data in parallel streams of thought.

“Machine intelligence, on the other hand, has much more extensive recall and can operate tirelessly without getting distracted,” write Wei and team.

Also: OpenAI’s Deep Research can save you hours of work, and now it’s a lot cheaper to access

Wei and team built on their prior work from last year, “SimpleQA,” which tests the ability of AI models to answer “short, fact-seeking questions.” The questions covered TV and movie trivia, science, history, music, video games, politics, and other topics.

The BrowseComp set of 1,266 questions is designed to go beyond simple information retrieval, the authors relate. Instead, they are questions for which it is hard to find the answers, or, as they put it, “challenging because they require searching through a large space of potential answers and matching them to constraints posed in the question,” and “hard-to-find, deeply entangled information on the web.”

Por ejemplo, un par de preguntas y respuestas es el siguiente:

Identifique el título de una publicación de investigación publicada antes de junio de 2023, que menciona tradiciones culturales, procesos científicos e innovaciones culinarias. Es coautor de tres individuos: uno de ellos fue profesor asistente en Bengala Occidental y otro tiene un Ph.D.
(Respuesta: Los fundamentos de la fabricación de pan: la ciencia del pan)

Hacen hincapié en que tal pregunta es fácil de verificar porque la respuesta está contenida en una sola frase que es “autónoma”.

[Figure: BrowseComp sample question-response pairs. Credit: OpenAI]

The questions and answers were developed by human “trainers,” and they were selected as being impossible to solve with OpenAI's ChatGPT alone, with or without browsing abilities. The questions were also impossible for an “early version” of Deep Research.

To show just how weak humans are at searching the web, the authors first had people who were “familiar with the dataset” try to answer the questions.

[Figure: Humans give up on most of the questions. Credit: OpenAI]

The results were not good for the humans. For 70% of the questions, the humans gave up after two hours of effort. They answered only about 30% of the questions, and for 14% of their proposed answers, the humans' suggestions did not match the actual answer.

Wei and team hypothesize that humans with better search skills could do better: “It is possible that many of the problems that they gave up on would be solvable by experienced professionals (e.g., detectives or investigative journalists) with ample time.”

[Figure: BrowseComp accuracy and calibration. Credit: OpenAI]

After the humans, they tested Deep Research against OpenAI's GPT-4o (with and without browsing abilities), GPT-4.5, and the o1 model.

The results were abysmal. “GPT-4o and GPT-4.5 achieved near-zero accuracy, highlighting the difficulty of the benchmark,” they write. “Without strong reasoning or tool use, models fail to retrieve the kinds of obscure, multi-hop facts BrowseComp targets.”

o1 fared better, which “[suggests] that some BrowseComp answers can be surfaced through inference over internal knowledge.”

With a score of 51.5%, Deep Research was “significantly better,” and “it is particularly effective at answering the niche, non-intuitive questions that require browsing numerous websites,” write Wei and team.

However, they also found that GPT-4o with browsing and Deep Research could err by being “overconfident” about wrong answers, which is known as a calibration error.

“Models with browsing capabilities such as GPT-4o with browsing and Deep Research exhibit higher calibration error,” they write, “suggesting that access to web tools may increase the model's confidence in incorrect answers. This aligns with observations that Deep Research struggles with confidence calibration and often fails to convey uncertainty accurately at present.”
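To make the idea of calibration error concrete, here is a minimal sketch of one common way to measure it, expected calibration error over equal-width confidence bins. It is an assumption that this resembles the paper's metric; the function name, the bin count, and the sample numbers are all invented for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between stated confidence and actual accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    # Group answers into equal-width confidence bins; 1.0 falls in the last bin.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes its confidence/accuracy gap, weighted by its size.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Made-up data: a model that claims 90% confidence but is right half the time.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, True, False]
)
# Made-up data: a model that claims certainty and is always right.
well_calibrated = expected_calibration_error([1.0, 1.0], [True, True])
```

A model that is confidently wrong produces a large gap between its average stated confidence and its accuracy, which is the pattern the authors describe for the browsing models.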

To correct for calibration error, they ran another test with Deep Research in which the model had to output as many as 64 answers to each question. Then they had the model pick the best of them. When it did so, Deep Research was pretty good at choosing the right answer among all the proposals.

[Figure: Best-of-N scaling accuracy. Credit: OpenAI]

That, write Wei and team, suggests that “the model frequently ‘knows’ when it's right, even if it struggles to express that certainty as a calibrated probability.”
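The selection step described above can be sketched roughly as follows. This is an illustration only, assuming each attempt comes with a self-reported confidence; the `Attempt` type, the three selection rules, and the sample data are hypothetical and are not taken from OpenAI's code.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Attempt(NamedTuple):
    answer: str
    confidence: float  # hypothetical self-reported confidence in [0, 1]


def best_of_n(attempts: list[Attempt]) -> str:
    """Return the answer from the single most confident attempt."""
    return max(attempts, key=lambda a: a.confidence).answer


def majority_vote(attempts: list[Attempt]) -> str:
    """Return the answer that appears most often, ignoring confidence."""
    return Counter(a.answer for a in attempts).most_common(1)[0][0]


def weighted_vote(attempts: list[Attempt]) -> str:
    """Return the answer with the largest summed confidence."""
    totals: defaultdict[str, float] = defaultdict(float)
    for attempt in attempts:
        totals[attempt.answer] += attempt.confidence
    return max(totals, key=totals.get)


# Made-up attempts at one question.
attempts = [
    Attempt("A", 0.5),
    Attempt("A", 0.5),
    Attempt("B", 0.95),
    Attempt("C", 0.3),
]
picked_best = best_of_n(attempts)
picked_majority = majority_vote(attempts)
picked_weighted = weighted_vote(attempts)
```

The three rules can disagree on the same attempts, which is why the choice of aggregation strategy matters once a model is sampled many times per question.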

They also note that the success of Deep Research improves as more computing is added when it searches the web. Put differently, “performance scales smoothly as a function of the amount of test-time compute used.” That squares with a growing trend of throwing more GPU chips at the task of inference.

[Figure: BrowseComp scaling chart. Credit: OpenAI]

Wei and team do not directly offer any hypothesis for why Deep Research fails almost half the time, but the implicit answer lies in how its ability scales with more compute. As they run more tasks in parallel and ask the model to evaluate multiple answers, accuracy climbs past 75% of questions answered.

The implication is that it is essential to choose strategies that force the model to evaluate its own efforts rather than simply chase a single answer. Without that evaluation stage, the model struggles a good deal of the time.

A big hole in BrowseComp, the authors acknowledge, is that it is limited to questions that are easy for a computer to parse and whose answers are easy to verify. None of the 1,266 questions included “long responses or ability to resolve ambiguity in user queries.”

As a result, BrowseComp, they argue, tests “core” functions of AI agents but is not comprehensive. “The model must be very proficient at locating hard-to-find pieces of information, but it's not guaranteed that this generalizes to all tasks that require browsing.”

Deep Research is available to users of OpenAI's Plus and Pro subscriptions.


OpenAI beats DeepSeek on sentence-level reasoning

ChatGPT and other AI chatbots based on large language models are known to occasionally make things up, including scientific and legal citations. It turns out that measuring how accurate an AI model's citations are is a good way of assessing the model's reasoning abilities.

An AI model “reasons” by breaking a query down into steps and working through them in order. Think of how you learned to solve math word problems in school.

Ideally, to generate citations, an AI model would understand the key concepts in a document, generate a ranked list of relevant papers to cite, and provide convincing reasoning for how each suggested paper supports the corresponding text. It would highlight specific connections between the text and the cited research, clarifying why each source matters.

The question is, can today's models be trusted to make these connections and provide clear reasoning that justifies their source choices? The answer goes beyond citation accuracy to address how useful and accurate large language models are for any information retrieval purpose.

I'm a computer scientist. My colleagues (researchers from the AI Institute at the University of South Carolina, Ohio State University, and the University of Maryland, Baltimore County) and I have developed the Reasons benchmark to test how well large language models can automatically generate research citations and provide understandable reasoning.

We used the benchmark to compare the performance of two popular AI reasoning models, DeepSeek's R1 and OpenAI's o1. Though DeepSeek made headlines with its stunning efficiency and cost-effectiveness, the Chinese upstart has a way to go to match OpenAI's reasoning performance.

Sentence specific

The accuracy of citations has a lot to do with whether the AI model is reasoning about information at the sentence level rather than at the paragraph or document level. Paragraph-level and document-level citations can be thought of as throwing a large chunk of information into a large language model and asking it to provide many citations.

In this process, the large language model overgeneralizes and misinterprets individual sentences. The user ends up with citations that explain the whole paragraph or document, not the relatively fine-grained information in the sentence.

Further, reasoning suffers when you ask the large language model to read through an entire document. These models mostly rely on memorizing patterns, which they are typically better at finding at the beginning and end of longer texts than in the middle. This makes it difficult for them to fully understand all the important information throughout a long document.

Large language models get confused because paragraphs and documents hold a lot of information, which affects citation generation and the reasoning process. Consequently, reasoning by large language models over paragraphs and documents becomes more like summarizing or paraphrasing.

The Reasons benchmark addresses this weakness by examining large language models' citation generation and reasoning.

https://www.youtube.com/watch?v=kqzzymhre0u

How DeepSeek R1 and OpenAI o1 compare generally on logic problems.

Testing citations and reasoning

Following the release of DeepSeek R1 in January 2025, we wanted to examine its accuracy in generating citations and its quality of reasoning, and compare it with OpenAI's o1 model. We created a paragraph that had sentences from different sources, gave the models individual sentences from this paragraph, and asked for citations and reasoning.
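As a rough illustration of that setup, the sketch below turns a composite paragraph into one prompt per sentence while keeping the known source of each sentence for later scoring. Everything here is hypothetical: the `SourcedSentence` type, the prompt wording, the sample sentences, and the paper identifiers are invented and do not come from the benchmark itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedSentence:
    text: str
    source_id: str  # hypothetical ID of the paper the sentence was taken from


def build_prompts(sentences: list[SourcedSentence]) -> list[str]:
    """Build one prompt per sentence so the model reasons at the sentence level."""
    # The prompt wording is invented for this sketch.
    return [
        f"Sentence: {s.text}\nName the paper this sentence should cite and explain why."
        for s in sentences
    ]


def gold_citations(sentences: list[SourcedSentence]) -> dict[str, str]:
    """Map each sentence to its known source, for scoring model answers later."""
    return {s.text: s.source_id for s in sentences}


# Made-up composite paragraph with sentences from two different sources.
paragraph = [
    SourcedSentence("Neurons encode stimuli as spike trains.", "paper_neuro_01"),
    SourcedSentence("Index structures speed up range queries.", "paper_db_07"),
]
prompts = build_prompts(paragraph)
gold = gold_citations(paragraph)
```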

To start our test, we developed a small test bed of about 4,100 research articles around four key topics related to human brains and computer science: neurons and cognition, human-computer interaction, databases, and artificial intelligence. We evaluated the models using two measures: F-1 score, which measures how accurate the provided citation is, and hallucination rate, which measures how sound the model's reasoning is, that is, how often it produces an inaccurate or misleading response.
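For readers unfamiliar with the two measures, here is a minimal sketch of how they are typically computed. F-1 is the harmonic mean of precision and recall; treating citations as sets of paper identifiers, and treating hallucination as a per-response yes/no judgement, are simplifying assumptions made for this example rather than the benchmark's actual scoring code.

```python
def citation_f1(predicted: set[str], gold: set[str]) -> float:
    """Harmonic mean of precision and recall over cited paper IDs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of cited papers that are right
    recall = true_positives / len(gold)  # share of right papers that were cited
    return 2 * precision * recall / (precision + recall)


def hallucination_rate(flagged: list[bool]) -> float:
    """Fraction of responses judged inaccurate or misleading."""
    if not flagged:
        return 0.0
    return sum(flagged) / len(flagged)


# Made-up example: the model cites three papers, two of which are correct.
f1 = citation_f1({"paper_a", "paper_b", "paper_x"}, {"paper_a", "paper_b"})
# Made-up example: two of four responses were judged to be hallucinated.
rate = hallucination_rate([True, False, False, True])
```

In the made-up example, precision is 2/3 and recall is 1, so one wrong citation pulls the F-1 score below a perfect 1.0 even though every correct paper was found.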

Our testing revealed significant performance differences between OpenAI o1 and DeepSeek R1 across different scientific domains. OpenAI's o1 did well at connecting information between different subjects, such as understanding how research on neurons and cognition connects to human-computer interaction and then to concepts in artificial intelligence, while remaining accurate. Its performance metrics consistently outpaced DeepSeek R1's across all evaluation categories, especially in reducing hallucinations and successfully completing assigned tasks.

OpenAI o1 was better at combining ideas semantically, whereas R1 focused on making sure it generated a response for every attribution task, which in turn increased hallucination during reasoning. OpenAI o1 had a hallucination rate of approximately 35%, compared with DeepSeek R1's rate of nearly 85%, in the attribution-based reasoning task.

In terms of accuracy and linguistic competence, OpenAI o1 scored about 0.65 on the F-1 test, which means it was right about 65% of the time when answering questions. It also scored about 0.70 on the BLEU test, which measures how well a language model writes in natural language. These are pretty good scores.

DeepSeek R1 scored lower, with about 0.35 on the F-1 test, meaning it was right about 35% of the time. Its BLEU score was only about 0.2, which means its writing was not as natural-sounding as OpenAI o1's. This shows that o1 was better at presenting that information in clear, natural language.

OpenAI holds the advantage

On other benchmarks, DeepSeek R1 performs on par with OpenAI o1 on math, coding, and scientific reasoning tasks. But the substantial difference on our benchmark suggests that o1 provides more reliable information, while R1 struggles with factual consistency.

Though we included other models in our comprehensive testing, the performance gap between o1 and R1 specifically highlights the current competitive landscape in AI development, with OpenAI's offering maintaining a significant advantage in reasoning and knowledge integration capabilities.

These results suggest that OpenAI still has a leg up when it comes to source attribution and reasoning, possibly due to the nature and volume of the data it was trained on. The company recently announced its deep research tool, which can create reports with citations, ask follow-up questions, and provide reasoning for the generated response.

The jury is still out on the tool's value for researchers, but the caveat remains for everyone: double-check all citations an AI gives you.
