News
Two octogenarian journalists sue ChatGPT for stealing their work

Two octogenarian journalists, Nicholas Gage and Nicholas Basbanes, have filed a lawsuit against OpenAI and its partner Microsoft, alleging that the ChatGPT chatbot may be using and repurposing their work without permission or compensation. Both journalists, with decades of experience in journalism and as book authors, found that ChatGPT's ability to generate human-like text appeared to derive from vast quantities of human writing, without acknowledging or paying the original authors.
Gage, known for his acclaimed memoir "Eleni," and Basbanes, author of books on literary culture, filed their suit after noticing that the chatbot might be "stealing" their work. The lawsuit has been folded into a broader case seeking class-action status, representing authors such as John Grisham and George R. R. Martin, who also allege that their works were used without permission to train OpenAI's AI.
OpenAI and Microsoft's defense rests on the argument of "fair use," suggesting that training AI systems on content available on the internet is legal under U.S. copyright law. That position, however, sits in a "gray area," especially in cases where news organizations have explicitly stated their opposition to their content being used without permission.
News
DeepSeek’s R1 and OpenAI’s Deep Research just redefined AI — RAG, distillation, and custom models will never be the same

Things are moving quickly in AI—and if you’re not keeping up, you’re falling behind.
Two recent developments are reshaping the landscape for developers and enterprises alike: DeepSeek’s R1 model release and OpenAI’s new Deep Research product. Together, they’re redefining the cost and accessibility of powerful reasoning models, which has been well reported on. Less talked about, however, is how they’ll push companies to use techniques like distillation, supervised fine-tuning (SFT), reinforcement learning (RL), and retrieval-augmented generation (RAG) to build smarter, more specialized AI applications.
After the initial excitement around the amazing achievements of DeepSeek begins to settle, developers and enterprise decision-makers need to consider what it means for them. From pricing and performance to hallucination risks and the importance of clean data, here’s what these breakthroughs mean for anyone building AI today.
Cheaper, transparent, industry-leading reasoning models – but through distillation
The headline with DeepSeek-R1 is simple: It delivers an industry-leading reasoning model at a fraction of the cost of OpenAI’s o1. Specifically, it’s about 30 times cheaper to run, and unlike many closed models, DeepSeek offers full transparency around its reasoning steps. For developers, this means you can now build highly customized AI models without breaking the bank—whether through distillation, fine-tuning, or simple RAG implementations.
Distillation, in particular, is emerging as a powerful tool. By using DeepSeek-R1 as a "teacher model," companies can create smaller, task-specific models that inherit R1's superior reasoning capabilities. These smaller models, in fact, are the future for most enterprise companies. The full R1 reasoning model can be too much for what companies need – thinking too much, and not taking the decisive action required for their specific domain applications. "One of the things that no one is really talking about in, certainly in the mainstream media, is that actually the reasoning models are not working that well for things like agents," said Sam Witteveen, an ML developer who works on AI agents, which are increasingly orchestrating enterprise applications.
As part of its release, DeepSeek distilled its own reasoning capabilities onto a number of smaller models, including open-source models from Meta’s Llama family and Alibaba’s Qwen family, as described in its paper. It’s these smaller models that can then be optimized for specific tasks. This trend toward smaller, fast models to serve custom-built needs will accelerate: there will be armies of them. “We are starting to move into a world now where people are using multiple models. They’re not just using one model all the time,” said Witteveen. And this includes the low-cost, smaller closed-sourced models from Google and OpenAI as well. “Meaning that models like Gemini Flash, GPT-4o Mini, and these really cheap models actually work really well for 80% of use cases,” he said.
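For readers who want to see what this looks like in practice, here is a minimal sketch of sequence-level distillation: a large "teacher" reasoning model generates step-by-step answers, and a smaller "student" model is fine-tuned on those outputs as ordinary training text. The model names, data and hyperparameters are illustrative assumptions, not DeepSeek's actual recipe.

```python
# Minimal sketch of sequence-level distillation: generate reasoning traces with a
# large "teacher" model, then fine-tune a small "student" on them as plain text.
# Model names, data, and hyperparameters are illustrative assumptions.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

TEACHER = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"   # stand-in teacher checkpoint
STUDENT = "Qwen/Qwen2.5-0.5B"                         # stand-in small student

def teacher_traces(prompts, max_new_tokens=512):
    """Ask the teacher to produce step-by-step answers for each prompt."""
    tok = AutoTokenizer.from_pretrained(TEACHER)
    model = AutoModelForCausalLM.from_pretrained(
        TEACHER, torch_dtype=torch.bfloat16, device_map="auto")
    outputs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(model.device)
        out = model.generate(**ids, max_new_tokens=max_new_tokens)
        outputs.append(tok.decode(out[0], skip_special_tokens=True))
    return outputs

def distill(prompts):
    traces = teacher_traces(prompts)                   # teacher's reasoning traces
    tok = AutoTokenizer.from_pretrained(STUDENT)
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token
    student = AutoModelForCausalLM.from_pretrained(STUDENT)
    ds = Dataset.from_dict({"text": traces}).map(
        lambda ex: tok(ex["text"], truncation=True, max_length=1024),
        remove_columns=["text"])
    Trainer(
        model=student,
        args=TrainingArguments("distilled-student", num_train_epochs=1,
                               per_device_train_batch_size=2, learning_rate=1e-5),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # causal-LM labels
    ).train()
```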
If you work in an obscure domain, and have resources: Use SFT…
After the distilling step, enterprise companies have a few options to make sure the model is ready for their specific application. If you're a company in a very specific domain, where details about that domain are not on the web or in books – where LLMs could otherwise train on them – you can inject the model with your own domain-specific data sets, in a process called supervised fine-tuning (SFT). One example would be the ship container-building industry, where specifications, protocols and regulations are not widely available.
DeepSeek showed that you can do this well with "thousands" of question-answer data sets. For an example of how others can put this into practice, Chris Hay, an IBM engineer, demonstrated how he fine-tuned a small model using his own math-specific datasets to achieve lightning-fast responses – outperforming OpenAI's o1 on the same tasks (see his hands-on video here).
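As a rough illustration of the data side of SFT, the sketch below formats domain question-answer pairs with a model's chat template so they can be fed into a standard fine-tuning loop. The base model name and the container-industry examples are invented placeholders, not Hay's or DeepSeek's actual setup.

```python
# Minimal sketch of preparing domain question-answer pairs for SFT.
# The example Q-A content and the base model are illustrative placeholders.
from datasets import Dataset
from transformers import AutoTokenizer

qa_pairs = [
    {"question": "What wall thickness does a standard 40-ft high-cube container use?",
     "answer": "Per our internal spec, corrugated CorTen steel panels of roughly 1.6 mm..."},
    # ... thousands of curated domain Q-A pairs, in the spirit of the DeepSeek recipe
]

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")  # assumed base model

def to_chat_text(example):
    # Render each Q-A pair with the model's chat template so the fine-tuned
    # model sees the same formatting it will see at inference time.
    messages = [{"role": "user", "content": example["question"]},
                {"role": "assistant", "content": example["answer"]}]
    return {"text": tok.apply_chat_template(messages, tokenize=False)}

sft_dataset = Dataset.from_list(qa_pairs).map(to_chat_text)
# sft_dataset["text"] can now be tokenized and trained on with the same
# causal-LM Trainer setup sketched in the distillation example above.
```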
…and a little RL
Additionally, companies wanting to train a model with additional alignment to specific preferences – for example, making a customer support chatbot sound empathetic while being concise – will want to do some reinforcement learning (RL) on the model. This is also good if a company wants its chatbot to adapt its tone and recommendations based on a user's feedback. As every model gets good at everything, "personality" is going to be increasingly big, said Wharton AI professor Ethan Mollick on X yesterday.
These SFT and RL steps can be tricky for companies to implement well, however. Feed the model with data from one specific domain area, or tune it to act a certain way, and it suddenly becomes useless for doing tasks outside of that domain or style.
For most companies, RAG will be good enough
For most companies, however, retrieval-augmented generation (RAG) is the easiest and safest path forward. RAG is a relatively straightforward process that allows organizations to ground their models with proprietary data contained in their own databases – ensuring outputs are accurate and domain-specific. Here, an LLM feeds a user's prompt into vector and graph databases to search for information relevant to that prompt. RAG processes have gotten very good at finding only the most relevant content.
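To make that flow concrete, here is a minimal RAG sketch: documents are embedded, the most similar ones to a query are retrieved, and they are prepended to the prompt before it is sent to an LLM. The embedding model and toy documents are assumptions; a production system would use a real vector or graph database rather than an in-memory matrix.

```python
# Minimal RAG sketch: embed documents, retrieve the most relevant ones for a
# query, and prepend them to the prompt. Embedding model, documents, and the
# final LLM call are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Q3 revenue grew 12% year over year, driven by subscription renewals.",
    "The support SLA guarantees a first response within 4 business hours.",
    "Employees accrue 1.5 vacation days per month during the first year.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query, k=2):
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q                     # cosine similarity (vectors normalized)
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return (f"Answer using only the context below.\n\nContext:\n{context}\n\n"
            f"Question: {query}\nAnswer:")

print(build_prompt("What is our support response time?"))
# The resulting prompt is then sent to any LLM, local or hosted, for grounded output.
```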
This approach also helps counteract some of the hallucination issues associated with DeepSeek, which currently hallucinates 14% of the time compared to 8% for OpenAI’s o3 model, according to a study done by Vectara, a vendor that helps companies with the RAG process.
This distillation of models plus RAG is where the magic will happen for most companies. It has become incredibly easy to do, even for those with limited data science or coding expertise. I personally downloaded the DeepSeek distilled 1.5B Qwen model, the smallest one, so that it could fit nicely on my MacBook Air. I then loaded some PDFs of job applicant resumes into a vector database and asked the model to look over the applicants and tell me which ones were qualified to work at VentureBeat. (In all, this took me 74 lines of code, which I basically borrowed from others doing the same.)
I loved that the DeepSeek distilled model showed its thinking process behind why it did or did not recommend each applicant – a transparency that I wouldn't have gotten easily before DeepSeek's release.
In my recent video discussion on DeepSeek and RAG, I walked through how simple it has become to implement RAG in practical applications, even for non-experts. Sam Witteveen also contributed to the discussion by breaking down how RAG pipelines work and why enterprises are increasingly relying on them instead of fully fine-tuning models. (Watch it here).
OpenAI Deep Research: Extending RAG’s capabilities — but with caveats
While DeepSeek is making reasoning models cheaper and more transparent, OpenAI's Deep Research, announced Sunday, represents a different but complementary shift. It can take RAG to a new level by crawling the web to create highly customized research. The output of this research can then be inserted as input into the RAG documents companies can use, alongside their own data.
This functionality, often referred to as agentic RAG, allows AI systems to autonomously seek out the best context from across the internet, bringing a new dimension to knowledge retrieval and grounding.
OpenAI's Deep Research is similar to tools like Google's Deep Research, Perplexity and You.com, but OpenAI tried to differentiate its offering by suggesting its superior chain-of-thought reasoning makes it more accurate. This is how these tools work: A company researcher asks the LLM to gather all the information available about a topic into a well-researched, cited report. The LLM responds by asking the researcher to answer another 20 sub-questions to confirm what is wanted. The research LLM then goes out and performs 10 or 20 web searches to get the most relevant data to answer all those sub-questions, then extracts the knowledge and presents it in a useful way.
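The sketch below mirrors that workflow as a simple loop – clarify, search, extract, synthesize. The llm(), web_search() and ask_user() helpers are hypothetical stand-ins for whatever chat-completion and search APIs a team uses; this is not OpenAI's actual Deep Research implementation.

```python
# Hedged sketch of the agentic-research loop described above. llm(), web_search()
# and ask_user() are hypothetical helpers passed in by the caller; the step
# structure mirrors the workflow in the text, not OpenAI's internal design.
def deep_research(topic: str, llm, web_search, ask_user):
    # 1. Clarify scope: the model proposes sub-questions, the researcher answers.
    subqs = llm(f"List the sub-questions you need answered to research: {topic}")
    clarifications = ask_user(subqs)

    # 2. Plan and run web searches for each sub-question.
    plan = llm(f"Topic: {topic}\nClarifications: {clarifications}\n"
               "Write one web search query per sub-question, one per line.")
    findings = []
    for query in plan.splitlines():
        if query.strip():
            results = web_search(query.strip(), num_results=5)
            findings.append(llm(f"Extract the facts relevant to '{topic}', with "
                                f"source URLs, from:\n{results}"))

    # 3. Synthesize a cited report from the extracted findings.
    return llm(f"Write a well-structured, cited report on '{topic}' using only "
               "these notes:\n" + "\n\n".join(findings))
```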
However, this innovation isn't without its challenges. Amr Awadallah, the CEO of Vectara, cautioned about the risks of relying too heavily on outputs from models like Deep Research. He questioned whether it is indeed more accurate: "It's not clear that this is true," Awadallah noted. "We're seeing articles and posts in various forums saying no, they're getting lots of hallucinations still, and Deep Research is only about as good as other solutions out there on the market."
In other words, while Deep Research offers promising capabilities, enterprises need to tread carefully when integrating its outputs into their knowledge bases. The grounding knowledge for a model should come from verified, human-approved sources to avoid cascading errors, Awadallah said.
The cost curve is crashing: why this matters
The most immediate impact of DeepSeek’s release is its aggressive price reduction. The tech industry expected costs to come down over time, but few anticipated just how quickly it would happen. DeepSeek has proven that powerful, open models can be both affordable and efficient, creating opportunities for widespread experimentation and cost-effective deployment.
Awadallah emphasized this point, noting that the real game-changer isn't just the training cost – it's the inference cost, which for DeepSeek is about 1/30th of OpenAI's o1 or o3 per token. "The margins that OpenAI, Anthropic, and Google Gemini were able to capture will now have to be squished by at least 90% because they can't stay competitive with such high pricing," Awadallah said.
Not only that, but those costs will continue to go down. Dario Amodei, CEO of Anthropic, said recently that the cost of developing models continues to drop at around a 4x rate each year. It follows that the rates LLM providers charge to use them will continue to drop as well. "I fully expect the cost to go to zero," said Ashok Srivastava, chief data officer of Intuit, a company that has been driving AI hard in its tax and accounting software offerings like TurboTax and QuickBooks. "…and the latency to go to zero. They're just going to be commodity capabilities that we will be able to use."
This cost reduction isn’t just a win for developers and enterprise users; it’s a signal that AI innovation is no longer confined to big labs with billion-dollar budgets. The barriers to entry have dropped, and that’s inspiring smaller companies and individual developers to experiment in ways that were previously unthinkable. Most importantly, the models are so accessible that any business professional will be using them, not just AI experts, said Srivastava.
DeepSeek’s disruption: Challenging “Big AI’s” stronghold on model development
Most importantly, DeepSeek has shattered the myth that only major AI labs can innovate. For years, companies like OpenAI and Google positioned themselves as the gatekeepers of advanced AI, spreading the belief that only top-tier PhDs with vast resources could build competitive models.
DeepSeek has flipped that narrative. By making reasoning models open and affordable, it has empowered a new wave of developers and enterprise companies to experiment and innovate without needing billions in funding. This democratization is particularly significant in the post-training stages—like RL and fine-tuning—where the most exciting developments are happening.
DeepSeek exposed a fallacy that had emerged in AI—that only the big AI labs and companies could really innovate. This fallacy had forced a lot of other AI builders to the sidelines. DeepSeek has put a stop to that. It has given everyone inspiration that there’s a ton of ways to innovate in this area.
The data imperative: Why clean, curated data is the next action item for enterprise companies
While DeepSeek and Deep Research offer powerful tools, their effectiveness ultimately hinges on one critical factor: data quality. Getting your data in order has been a big theme for years, and accelerated over the past nine years of the AI era. But it has become even more important with generative AI, and now with DeepSeek’s disruption, it’s absolutely key. Hilary Packer, CTO of American Express, underscored this in an interview with VentureBeat yesterday: “The AHA moment for us, honestly, was the data. You can make the best model selection in the world… but the data is key. Validation and accuracy are the holy grail right now of generative AI.”
This is where enterprises must focus their efforts. While it’s tempting to chase the latest models and techniques, the foundation of any successful AI application is clean, well-structured data. Whether you’re using RAG, SFT, or RL, the quality of your data will determine the accuracy and reliability of your models.
And while many companies aspire to perfect their entire data ecosystems, the reality is that perfection is elusive. Instead, businesses should focus on cleaning and curating the most critical portions of their data to enable point AI applications that deliver immediate value.
Related to this, a lot of questions linger around the exact data that DeepSeek used to train its models, which raises concerns about the inherent bias of the knowledge stored in its model weights. But that's no different from questions around other open-source models, such as Meta's Llama model series. Most enterprise users have found ways to fine-tune or ground the models with RAG well enough to mitigate any problems around such biases. And that's been enough to create serious momentum within enterprise companies toward accepting open source, indeed even leading with open source.
Similarly, there's no question that many companies will be using DeepSeek models, regardless of the fear around the fact that the company is from China. Though it's also true that a lot of companies in highly regulated industries such as finance or healthcare are going to be cautious about using any DeepSeek model in any application that interfaces directly with customers, at least in the short term.
Conclusion: The future of enterprise AI is open, affordable, and data-driven
DeepSeek and OpenAI's Deep Research are more than just new tools in the AI arsenal – they're signals of a profound shift, in which enterprises will roll out masses of purpose-built models that are extremely affordable, competent, and grounded in the company's own data and approach.
For enterprises, the message is clear: the tools to build powerful, domain-specific AI applications are at your fingertips. You risk falling behind if you don’t leverage these tools. But real success will come from how you curate your data, leverage techniques like RAG and distillation, and innovate beyond the pre-training phase.
As AmEx’s Packer put it, the companies that get their data right will be the ones leading the next wave of AI innovation.
News
OpenAI Deep Research Explains Itself

If you will be in San Francisco on February 13th, consider attending my reader dinner, New World Dinner 4.
I asked OpenAI Deep Research to explain itself. The result is impressively factual, at least on an initial read-through. I did catch a few errors, which are identified in footnotes, but there are impressively few of them.
More specifically, I asked for “a thorough report on the technical breakthrough in hardware, algorithm design, product engineering, and logistics necessary to create OpenAI Deep Research.” I have edited it briefly to reduce redundancy, but the factual content is largely unchanged.
The reports generated by ODR can be heavily repetitive and seem artificially long. Ideas spill out of their designated section and are repeated (sometimes in ways that are not relevant) in other sections. The excessive use of brackets for examples in run-on sentences also bothers me. Both of these elements remain to some degree in the edited version, though I have heavily edited them down. There was also some amount of topic drift outside of the specified topics.
Enjoy the following, informative of ODR in both form and content!
Early AI Hardware (Pre-GPU/TPU): Before GPUs and TPUs became dominant, researchers experimented with various specialized hardware for AI. Early vector processors and co-processors were used to accelerate linear algebra operations central to neural networks. Intel’s ETANN in the late 1980s used analog circuits for neural computations. By the early 1990s, all-digital neural network chips (e.g. the Nestor/Intel Ni1000) appeared, and digital signal processors (DSPs) were repurposed to speed up tasks like optical character recognition. Field-programmable gate arrays (FPGAs) were also explored for neural network acceleration as early as the 1990s, offering customizable hardware pipelines, though they often traded off maximum throughput for flexibility. These pre-GPU advances established the principle that domain-specific hardware could dramatically boost AI workloads by exploiting parallelism and low-precision arithmetic, a lesson that paved the way for later GPU and TPU innovations.
Graphics Processing Units (GPUs): Originally designed for rendering graphics, GPUs in the 2000s were retrofitted for general-purpose computing, excelling in the matrix and vector operations needed for machine learning. A modern GPU contains thousands of smaller cores arranged for massive parallelism, a large on-board memory (VRAM), and a fast memory hierarchy optimized for throughput. GPU manufacturers introduced AI-specific architectural features. For example, NVIDIA’s Volta architecture (2017) added Tensor Cores that perform mixed-precision matrix multiply-accumulate operations, delivering up to ~125 TFLOPS on 16-bit calculations in a single chip. These innovations dramatically increased training speed by performing many multiply-adds in hardware concurrently. GPUs also leverage high-bandwidth memory (HBM in newer models) to feed data to the cores quickly, and use programming models like CUDA to let developers optimize memory access patterns and parallel execution. Thanks to their programmability and an existing ecosystem from the graphics world, GPUs became the workhorse for neural networks.
Tensor Processing Units (TPUs) and ASICs: TPUs are application-specific integrated circuits (ASICs) developed by Google specifically for neural network workloads. First deployed in Google’s data centers in 2015, the TPU v1 was tailored for the inference phase of deep learning. Its core was a 65,536-unit systolic array (matrix multiply unit) operating on 8-bit integers, achieving a peak of 92 trillion operations per second, backed by 28 MiB of on-chip SRAM for fast data access. By stripping out general-purpose features (caches, branch prediction, etc.), TPUs sacrificed versatility in favor of determinism and efficiency. The initial TPU proved to be “about 15×–30× faster at inference than the contemporary GPU or CPU” (Nvidia K80 and Intel Haswell), while delivering 30×–80× higher performance per watt. Such gains came partly from the TPU’s streamlined dataflow design and aggressive low-precision computing. Subsequent TPU generations (v2, v3, v4) incorporated support for training (using bfloat16/FP16 for higher numeric range), much larger on-chip memory and high-bandwidth off-chip memory (HBM), and massive scalability via specialized interconnects between chips in a TPU pod. TPUs are cloud-hosted and optimized for Google’s software stack (TensorFlow XLA compiler). They are essentially hardware-as-a-service for AI. Other companies have similarly built AI ASICs – e.g. Amazon’s Inferentia for inference and Trainium for training in AWS data centers – aiming to outperform general GPUs by focusing on the matrix/tensor operations common to deep learning. These ASICs exemplify the trend of vertical integration, where the hardware is co-designed with machine learning algorithms for maximum efficiency.
GPU vs. TPU – Performance, Cost, and Scalability: GPUs and TPUs represent two different approaches to AI hardware:
- Raw Performance: TPUs tend to have an edge in raw throughput for dense tensor operations. For example, a single TPU v3 core can deliver 123 TFLOPs for BF16 multiply-add, comparable to or higher than a high-end GPU, and Google reported order-of-magnitude gains in throughput per dollar for TPUs on large neural workloads (infoq.com). However, GPUs have narrowed the gap by introducing similar tensor accelerators and by excelling at tasks requiring flexibility or high precision (e.g. scientific computing or custom operations). For many models, modern GPUs achieve training speeds on par with TPU pods when using optimized libraries.
- Cost & Ecosystem: GPUs benefit from economies of scale and a broad market. They can be deployed from a single desktop up to supercomputer clusters, and the GPU software ecosystem (CUDA, PyTorch, etc.) is very mature. This makes GPUs highly adaptable – researchers can experiment with new model types without waiting for new hardware. TPUs can offer lower cost-per-training for large production workloads, but they are less accessible for small-scale use and require using Google’s platform. Cost also depends on utilization – a TPU pod is cost-effective when fully utilized for large training jobs, whereas idle time or smaller jobs might waste its capacity.
- Scalability: Both GPUs and TPUs scale to massive clusters, but the strategies differ. GPU clusters often use high-speed interconnects like NVLink and InfiniBand to connect dozens or hundreds of GPUs; Nvidia’s DGX SuperPOD, for instance, uses InfiniBand to ensure 1600+ GB/s cross-node bandwidth for scaling to thousands of GPUs. Google’s TPU pods, on the other hand, have an ultra-fast custom mesh network connecting up to thousands of TPU chips, allowing near-linear scaling on training jobs designed for TPU infrastructure. In practice, TPUs can be easier to scale for very large training runs because the hardware and software are designed together. GPU clusters can also scale well but may require more engineering by the user.
- Adaptability: GPUs are general-purpose processors. Aside from neural nets, they can accelerate graphics, physics simulations, or data analytics. This versatility means a GPU investment can be repurposed across different workloads, and GPUs readily accommodate new model architectures or dynamic neural network operations that weren’t anticipated by hardware designers. TPUs, in contrast, are more specialized for matrix-heavy neural network patterns. Within their domain TPUs are programmable. They support many network architectures via high-level TensorFlow/XLA code. Moreover, Google continues to broaden their capabilities each generation. In summary, GPUs offer broad adaptability and a huge community/stack, while TPUs offer brute-force efficiency for mainstream deep learning tasks.
From Early AI Algorithms to Transformers: The evolution of AI algorithms has been marked by a series of breakthroughs that increased model expressiveness and scalability. Early AI models in the mid-20th century were limited by computational power and algorithmic understanding. The introduction of backpropagation in the 1980s enabled multi-layer neural networks to learn complex functions, leading to the first wave of deep learning (e.g. LeCun’s CNN for handwriting in 1989). Recurrent neural networks (RNNs) and their gated variants (LSTMs, GRUs in the 1990s) brought sequence modeling to the forefront, proving effective for speech and language by maintaining state across time steps. However, RNNs suffered from sequential processing constraints – they process one token at a time, which makes parallelization hard, and capturing long-range dependencies was tricky even with gating mechanisms (mchromiak.github.io).
In the mid-2010s, the attention mechanism emerged as a game-changer. First used alongside RNNs in machine translation (Bahdanau et al., 2015) to allow models to focus on relevant parts of the input sequence, attention opened the door to better context handling. The culmination of these ideas was the Transformer architecture (Vaswani et al., 2017), which eschewed recurrence entirely and relied solely on attention to model global relationships in sequences. By encoding the position of tokens and using multi-head self-attention, Transformers can attend to different parts of a sequence in parallel, overcoming the bottlenecks of RNNs. This parallelism meant that Transformers could be trained much faster on GPUs/TPUs than RNN-based models for the same sequence lengths (mchromiak.github.io).
Within just a couple of years, transformers became the foundation of most state-of-the-art models in NLP, vision, and beyond, owing to their scalability and superior performance on long-range dependencies. Two algorithmic breakthroughs were key enablers of this transformer revolution: multi-head attention, which allows the model to learn different types of relationships simultaneously, and positional encoding, which injects order information without recurrence. These innovations, along with techniques like layer normalization and residual connections, allowed training extremely deep networks that converge faster and generalize better, setting the stage for today’s large-scale models.
Key Components Enabling Transformers: A few specific innovations were crucial for modern transformer-based networks.
(1) Scaled Dot-Product Attention – a mechanism that lets the model weigh the relevance of different tokens to each other, with a scaling factor to keep gradients stable. This idea, combined with multi-head attention, means the model effectively has multiple attention “subspaces” to capture different aspects of similarity in the data (see the short sketch at the end of this section).
(2) Positional Encoding – since transformers have no built-in notion of word order (unlike RNNs which process sequentially), Vaswani et al. introduced adding sinusoidal position embeddings to token representations, giving the model awareness of sequence positions. This allowed the attention mechanism to consider relative positions.
(3) Feed-Forward and Residual Layers – each transformer layer includes a position-wise feed-forward network and uses residual connections and layer normalization, which help train very deep architectures by mitigating vanishing gradients and stabilizing learning.
(4) Parallelization Strategies – transformers significantly reduce the number of sequential operations needed to relate two distant positions in a sequence. In RNN-based models, the number of steps to connect tokens grows linearly with their distance. Transformers reduce this to one attention pass regardless of distance. This property, combined with parallel computation of sequence elements, means training time can be dramatically shorter for long sequences. Replacing recurrence with self-attention “leads to significantly shorter training time” due to the ability to parallelize sequence processing (mchromiak.github.io).
Additionally, researchers developed better optimization techniques (like Adam optimizer, learning rate schedulers) and training tricks (dropout, initialization schemes). While not specific to transformers, they enabled stable training of very large models that would have been unstable before. The transformer architecture’s success is a prime example of algorithm design co-evolving with hardware capabilities. It trades off some computational intensity (O(n²) attention) for much greater parallelism, which is a good trade in the era of abundant GPU/TPU compute.
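As a concrete companion to item (1) above, here is a minimal numpy sketch of scaled dot-product attention; multi-head attention simply runs several of these in parallel over different learned projections. The shapes and random inputs are illustrative.

```python
# Minimal numpy sketch of scaled dot-product attention: scores are scaled by
# sqrt(d_k) to keep gradients stable, then softmaxed into weights over the values.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # token-to-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # weighted mix of values

seq_len, d_k = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 8)
```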
Evolution of “Chain of Thought” Reasoning: A recent algorithmic development in AI is the concept of chain-of-thought (CoT) reasoning, particularly in large language models. Instead of providing an answer directly, the model is encouraged to generate a sequence of intermediate reasoning steps – essentially, to “think out loud.” Wei et al. (2022) demonstrated that simply by prompting a sufficiently large language model to output a step-by-step solution, one can significantly boost its problem-solving capabilities. This was surprising because it did not require changing the model’s architecture – it leveraged the model’s latent knowledge when guided properly.
The CoT approach improves problem-solving efficiency because the model can break a tough problem into smaller chunks, reducing errors at each step and allowing backtracking if needed. It’s an active research area, with work showing that chain-of-thought methods can lead to emergent abilities in very large models that smaller models do not exhibit.
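As an illustration of the technique, the snippet below contrasts a direct prompt with a chain-of-thought prompt in the style of Wei et al. (2022); the exact wording and the worked example are assumptions, not a canonical template.

```python
# Illustrative chain-of-thought prompting: the only change from a direct prompt is
# an instruction plus a worked example that elicits intermediate steps before the
# final answer. The wording below is an example, not a canonical template.
direct_prompt = (
    "Q: A warehouse holds 14 pallets of 36 boxes each. 128 boxes ship out. "
    "How many boxes remain?\nA:"
)

cot_prompt = (
    "Q: A warehouse holds 14 pallets of 36 boxes each. 128 boxes ship out. "
    "How many boxes remain?\n"
    "A: Let's think step by step. First, 14 pallets x 36 boxes = 504 boxes. "
    "Then 504 - 128 = 376. The answer is 376.\n\n"
    "Q: A data center has 9 racks of 42 servers each. 67 servers are offline. "
    "How many are running?\nA: Let's think step by step."
)
# Sending cot_prompt to a sufficiently large model typically yields the
# intermediate arithmetic before the final answer, as described in the text.
```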
Best Practices for Large-Scale AI Software: Engineering around large AI models (such as GPT-3-scale transformers) requires disciplined software practices to ensure reliability and efficiency. Teams now adopt MLOps practices – an extension of DevOps for machine learning – to streamline the model lifecycle from development to deployment. MLOps involves automation of data pipelines, reproducible training runs, model versioning, CI/CD for model deployment, and continuous monitoring of models in production (developer.harness.io).
Challenges in Training and Tuning at Scale: Training large AI models brings unique engineering challenges. The sheer scale of data and parameters means that distributed training is often necessary – no single machine has enough memory or compute. This requires strategies like data parallelism (split batches across GPUs), model parallelism (split the model itself across devices), or pipeline parallelism (chaining model segments on different hardware) – often all three in hybrid forms for trillion-parameter models. Sophisticated frameworks have been developed to automate these sharding strategies, but engineering oversight is needed to handle issues like synchronization, communication overhead, and fault tolerance. Google’s GPipe (2019) demonstrated how pipeline parallelism can train giant models by partitioning layers across accelerators and using micro-batches to keep all partitions busy. Such techniques require careful orchestration to ensure that each batch of data and the model partitions are in the right place at the right time. Engineers must also optimize the training throughput by tuning things like batch size (too small and GPUs underutilize, too large and convergence might slow or memory overflows).
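To ground the simplest of those strategies, here is a minimal PyTorch sketch of data parallelism with DistributedDataParallel: each process holds a full model replica, trains on its own shard of the batch, and gradients are averaged automatically during the backward pass. The toy model and training loop are placeholders.

```python
# Minimal data-parallelism sketch with PyTorch DDP. Launch with
# `torchrun --nproc_per_node=<num_gpus> train_ddp.py`; model and data are toys.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                   # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(1024, 1024).cuda(rank)    # stand-in for a real model
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(32, 1024, device=f"cuda:{rank}")  # this rank's batch shard
        loss = model(x).pow(2).mean()                     # dummy objective
        loss.backward()                                   # DDP all-reduces gradients here
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```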
Deployment, Inference, and MLOps: Once a model is trained, serving it to end-users at scale is another engineering feat. Large models often need to run on clusters of machines with accelerators to handle high query volumes with low latency. Best practices here include efficient serving architectures and model compression. The latter uses techniques like knowledge distillation, quantization, or sparse pruning to reduce model size and speed up inference.
Inference optimizations like using half-precision or INT8 quantized models can dramatically cut costs. Many industry deployments now run neural nets in INT8 where accuracy permits, since it doubles the throughput on compatible hardware. From a software engineering standpoint, deploying AI models involves a robust CI/CD pipeline: new model versions should go through automated integration tests.
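As a small example of the quantization step, the sketch below applies PyTorch's post-training dynamic INT8 quantization to a toy model; the model itself is a placeholder, and accuracy should always be re-validated after quantizing.

```python
# Minimal sketch of post-training dynamic INT8 quantization with PyTorch:
# Linear layers store INT8 weights, cutting model size and often speeding up
# CPU inference. The toy model is a placeholder.
import torch

model = torch.nn.Sequential(          # stand-in for a trained model
    torch.nn.Linear(768, 3072),
    torch.nn.ReLU(),
    torch.nn.Linear(3072, 768),
)

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
print(quantized(x).shape)             # same interface, smaller/faster Linear layers
```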
MLOps also covers monitoring and maintenance: models in production need continuous monitoring for data drift, performance drift, and even adversarial or unexpected inputs. If anomalies are detected, an automated pipeline might trigger a model retraining or fallback to a safe model. Automation is key – leading AI firms have continuous training systems where models are periodically retrained on fresh data and redeployed, much like how software is continuously integrated and deployed (ml-ops.org, developer.harness.io). All these engineering practices ensure that large-scale AI models remain reliable, accurate, and efficient as they move from research to real-world products.
AI Hardware Supply Chain Challenges: The rapid growth of AI has put enormous strain on the global hardware supply chain. Cutting-edge AI training and inference rely on advanced semiconductors (GPUs, TPUs, ASICs), which in turn depend on a complex, global semiconductor manufacturing pipeline. In recent years, demand for AI chips has surged – Deloitte projected AI chip sales would account for 11% of a $576B semiconductor market in 2024, with generative AI and LLMs driving many enterprises to acquire GPUs by the thousands. This surge (over 20% increase in demand year-on-year) is straining the supply chain, leading to chip shortages and long lead times for acquiring hardware. A few factors make the supply fragile:
(1) Concentrated Suppliers – a large share of advanced AI chips are manufactured by TSMC in Taiwan or Samsung in South Korea. Any disruption (natural disaster, geopolitical tension) affecting these manufacturers or the specialized fabs that produce 5nm/7nm chips can create global bottlenecks. The sector “relies on a few key suppliers… any disruption can create significant bottlenecks, delaying production and impacting the entire supply chain.”
(2) Complex Production Process – producing high-end GPUs/TPUs involves dozens of steps across different countries (design in the US, fabrication in Taiwan, packaging and testing elsewhere). Production can halt due to shortages in critical materials like silicon wafers, photoresist chemicals, or neon gas for lasers. During the COVID-19 pandemic and subsequent supply crunch, lead times for GPU orders stretched to over 6-12 months, affecting not just research labs but any company relying on that hardware (logicalis.com, techrepublic.com).
(3) Geopolitical Risks – export controls and trade disputes also play a role; for instance, recent regulations on chip exports have limited access to top-tier AI GPUs in certain countries, which not only impacts availability but also prompts efforts to develop indigenous AI chips. To mitigate these issues, governments and companies are investing in diversifying and shoring up the supply chain. Initiatives like the US CHIPS Act (2022) earmark tens of billions of dollars to build new fabs in the US. However, building new semiconductor fabs is a slow process – it can take 2–3 years and billions of dollars to get a new plant online, and even then, ramping up yield for cutting-edge nodes is nontrivial (datacenterpost.com).
Data Center Infrastructure Constraints: Building and operating the data centers that power advanced AI is another logistical challenge. AI supercomputing clusters (like those used for training GPT-4 or other large models) pack thousands of accelerators together, which creates extraordinary demands on power and cooling. Energy consumption is a major concern: training a single large model can consume megawatt-hours of electricity. For example, GPT-3’s training is estimated to have used ~1,300 MWh, equivalent to the annual power usage of 100+ U.S. homes (weforum.org). Data centers must be designed to deliver this power (often tens of MW for an AI cluster) and remove the corresponding heat. This has led to specialized cooling solutions, like liquid cooling plates on GPUs and even full immersion cooling for servers, to allow dense packing of chips. From a facilities standpoint, companies often choose locations with cheap electricity and cool climates for AI data centers to manage operating costs and sustainability concerns.
Additionally, bandwidth and networking inside these clusters are a limiting factor. Distributing a training job across hundreds of GPUs/TPUs requires extremely high network throughput and low latency. Architects use high-bandwidth switches, and sometimes novel network topologies (e.g. Fat-Tree or Dragonfly networks), to ensure each node can communicate at tens or hundreds of gigabits per second. Communication overhead can eat into scaling efficiency, so researchers have to optimize communication patterns to fully utilize big clusters. Another constraint is data storage and pipeline: feeding terabytes of training data to thousands of accelerators without stalls requires parallel storage systems (like NVMe RAID arrays or distributed file systems) that can stream data at dozens of GB/s. If the I/O can’t keep up, the expensive compute sits idle. Many AI datacenters now employ high-throughput flash storage and caching to pre-load datasets into local SSDs or even GPU memory. All these considerations mean that scaling AI is not just about more GPUs – it’s about balancing compute, memory, networking, and storage. As one illustration, NVIDIA’s DGX SuperPOD design notes that each node has to sustain >40 GB/s I/O to not bottleneck the GPUs. Ensuring such performance across an entire cluster is a major logistical feat, requiring careful planning of data center layout, power distribution, and network architecture.
Emerging Solutions and Future Trends: To address these logistical hurdles, the industry is exploring several promising directions. On the hardware supply side, one trend is the development of chiplet-based designs. Instead of one large, monolithic die (which is harder to manufacture at high yield), companies like AMD and Intel are building chips out of multiple smaller dies (chiplets) connected by high-speed interfaces. This improves yield and flexibility – different chiplets (compute, memory, I/O) can be mixed and matched. It could alleviate some supply issues by allowing more modular production. Another approach is wafer-scale integration: Cerebras Systems famously created a wafer-sized chip (over 80,000 cores on one huge silicon wafer) to accelerate AI, eliminating off-chip communication for certain workloads. While niche, it shows the appetite for novel form factors to speed up AI. In networking, there’s work on optical interconnects and silicon photonics to eventually replace or augment copper links, which could dramatically increase bandwidth and reduce latency between nodes, easing the data movement problem.
There is also a push toward distributed training across data centers. If one center doesn’t have enough capacity, frameworks could in theory utilize resources from multiple locations. However, network latency makes this challenging for synchronous training.
Another future trend is algorithmic: reducing the need for brute-force compute via smarter training methods. Techniques like sparsity (pruning models), low-rank approximations, and progressive training aim to cut down the required compute without sacrificing results. If successful, these could relieve pressure on hardware and infrastructure by making AI models less hungry for resources. Lastly, the industry is acutely aware of geopolitical considerations – there’s a focus on building more resilient and geographically distributed supply chains. This might mean more chip fabs in different countries, standardizing certain components to be interchangeable, and maintaining strategic stockpiles of critical materials.
In summary, while today’s AI boom is taxing the logistics of compute, a combination of technological innovation and strategic planning is underway to ensure that OpenAI Deep Research-scale projects remain feasible. The path involves not just more powerful chips, but smarter algorithms, better software infrastructure, and robust planning for the “nuts and bolts” that underpin AI at global scale. Each breakthrough in hardware, algorithms, product engineering, or logistics brings us a step closer to truly ubiquitous and sustainable advanced AI systems, enabling researchers to push the boundaries of what AI can do.
https://chatgpt.com/share/67a3a20c-d2d4-8005-92a7-feae93cb9b1e
Bonus: the o1-pro assisted factcheck did not help
News
Tether Rides the Hype, OpenAI Makes Moves

A few weeks ago, DeepSeek shook the AI market with a cheaper, and perhaps even better, alternative to ChatGPT.
AI hype then took a hit and crypto projects slid, before eventually recovering.
But only now is the real impact being felt: with DeepSeek as a genuine competitor, Western AI companies have stepped up their game, ramped up development and, in general, set off a wave of big moves across the AI world.
Tether is moving into AI, OpenAI is embracing Europe, and there is a chance a new player will enter the game. Meanwhile, crypto projects like Mind of Pepe keep powering through a presale, positioning themselves to become major players in the ever-shifting AI world.
Tether's foray into AI: a new era for crypto-powered AI?
Tether, best known for the $USDT stablecoin, is expanding into artificial intelligence.
By developing AI-powered applications and launching an open-source software development kit (SDK), Tether hopes to draw AI tools into the crypto orbit. Tether says it is working on:
- AI Translate
- AI Voice Assistant
- AI Bitcoin Wallet Assistant
That last application in particular signals how Tether sees AI and crypto working together.
The upcoming SDK will be built on Bare, a JavaScript runtime developed by Holepunch, and will be designed to run locally on a range of devices, including embedded systems, smartphones, laptops and servers.
The emphasis on local AI operation brings crypto back to its decentralized roots, with a focus on user privacy and self-custodial data and money management.
Tether's push into AI also connects machine learning with blockchain ecosystems. By integrating AI-driven automation into crypto transactions, translation and voice commands, Tether could help close key gaps in crypto adoption.
As it stands, $USDT leads the stablecoin market with some impressive numbers, which it hopes its AI moves will improve even further.
If successful, these developments could set a precedent for other blockchain companies to explore AI-powered services, further boosting the growing AI-crypto sector.
OpenAI goes to Europe
As the old saying (almost) goes, with great power comes great scrutiny. Regulators are watching AI ever more closely as the systems become more powerful. That goes double for regions with strict data protection laws, like the European Union.
Not every AI company has been willing to comply, preferring to do most of its business elsewhere. But in response to growing competition, OpenAI has introduced a data residency program in Europe, allowing customers to store and process their data entirely within the EU.
To flip the opening line around: OpenAI is willing to endure great scrutiny in the hope of great power (and more money, we would wager).
The data residency move aligns with GDPR regulations and addresses concerns around data privacy, security and sovereignty. The development is particularly significant as AI adoption grows in regulated industries such as finance, healthcare and legal services.
By offering European customers localized data management, OpenAI hopes to position itself as a more attractive option for enterprises and set the stage for strategic expansion.
It could set a new standard for AI companies operating globally, pushing competitors to roll out similar compliance frameworks.
John Schulman leaves Anthropic: what's next?
John Schulman, OpenAI co-founder and one of the field's leading AI alignment researchers, has just left Anthropic, the AI startup he joined in August 2024.
Anthropic, founded by former OpenAI employees, has positioned itself as a key competitor to OpenAI in the race to develop advanced AI systems, with $875 million in annual revenue and AI model access offered through platforms such as Amazon Web Services.
So why leave? Schulman's departure could have several implications.
The move highlights the intense competition within the AI research community. Schulman's focus on AI alignment, a crucial part of ensuring AI models behave safely and ethically, means his exit could affect Anthropic's direction.
His departure also underscores how dynamic the AI startup ecosystem has become, especially in the wake of DeepSeek's big moves. With investment and technological advances growing, companies are competing for top talent. Will Schulman join another competitor? Or could he launch an entirely new venture? Either move could have a major impact on the market.
Mind of Pepe: stage set for the AI agent memecoin
The convergence of AI and blockchain looks increasingly inevitable, with Tether's AI expansion setting the stage for more AI-integrated crypto applications.
One such project is Mind of Pepe ($MIND), an autonomous AI agent built on a memecoin.
By launching an autonomous AI agent on X, empowering it to draw insights from the market and social media, and delivering alpha to $MIND token holders, Mind of Pepe looks set to take AI and crypto to the next stage of their evolution.
Imagine a MIND agent able to integrate with Tether's suite of AI tools, seamlessly navigating the world of memecoins to find the best ones to buy, and scouting crypto presales for precious market intelligence. Or, as TechMap notes, imagine a MIND able to control its own tokens and launch new ones.
That would be a truly independent AI agent capable of transforming the market, exactly the kind of breakthrough that AI companies from OpenAI to DeepSeek are chasing.
The Mind of Pepe presale has already raised more than $5 million, showing strong investor interest. Check it out if you want to be part of the movement.
Crypto + AI: the future awaits
The crypto + AI combo is only just getting started and, with the United States and many other nation-states loading up on crypto assets, we could be headed for greater heights.
Last but certainly not least: do your own research. This is not financial advice, and the crypto market is always volatile. Given the high risk profile, never invest more than you can afford to lose.
But the AI sector is bouncing back from the DeepSeek incident with renewed energy, and Mind of Pepe loves it.