Today I’m going to summarise Scale AI’s 2024 AI Readiness Report.

The report focuses on what it takes for organisations to transition from merely adopting AI to actively optimising and evaluating it.

To understand the current state of AI development and adoption, Scale surveyed over 1,800 ML practitioners and leaders directly involved in building or applying AI solutions.

Key challenges identified include security concerns, lack of expertise, and insufficient benchmarks to effectively evaluate models. While improving operational efficiency was the top reason for adopting AI (cited by 79%), only half are measuring the business impact of their AI initiatives.

AI Year in Review

2023 saw rapid advancements in generative AI from leaders like OpenAI, Google, Anthropic, and Meta.

Key milestones included:

  • OpenAI's release of GPT-4 in March 2023, demonstrating human-level performance on a range of professional and academic benchmarks
  • Google's launch of Bard and PaLM 2
  • Anthropic's release of Claude 2 in the summer with a 100K context window
  • Meta's unveiling of Llama 2 and Code Llama
  • Google DeepMind's release of Gemini in late 2023, with Gemini Ultra reported to outperform human experts on the MMLU benchmark
  • Emergence of open source models like Falcon, Mixtral, and DBRX enabling local inference with less compute
  • Anthropic's launch of Claude 3 in March 2024, doubling the context window to 200K tokens
  • Cohere's release of Command R designed for scalability and long context

Frontier research also made significant strides in mathematical reasoning, model interpretability, and performance improvements with smaller models fine-tuned on high-quality data.

Key stats on the impact of generative AI:

  • 74% report that generative AI prompted them to create an AI strategy in 2024, up from 53% in 2023
  • 82% plan to increase investment in commercial closed-source models over the next 3 years
  • 85% consider AI very or highly critical to their business in the next 3 years, up from 77%
  • Only 4% have no plans to work with generative AI
  • 38% have generative AI models in production, a substantial increase from 19% in 2023

Applying AI

This section examines trends in enterprise AI adoption, model preferences, investment plans, use cases, and challenges.

Stage of adoption:

  • 22% have one model in production
  • 27% have multiple models in production
  • 49% still evaluating use cases or developing first model/application

Model preferences:

  • 58% use OpenAI's GPT-4, the latest model at the time of the survey; 44% use GPT-3.5
  • Google Gemini used by 39%
  • OpenAI overwhelmingly the preferred vendor

Planned model investments:

  • 72% increasing investment in commercial closed-source models
  • 67% increasing investment in open-source models

Top use cases driving adoption:

  1. Improved operational efficiency (61%)
  2. Improved customer experience (55%)
  3. Computer programming and content generation

Challenges:

  • Infrastructure, tooling, and solutions not meeting needs (61%)
  • Insufficient budget (54%)
  • Data privacy concerns (52%)

For the 60% who have not yet adopted AI, security concerns and lack of expertise were the top reasons holding them back. Software and internet companies cited other priorities taking precedence.

"RAG aims to address a key challenge with LLMs - while they are very creative, they lack factual understanding of the world and struggle to explain their reasoning. RAG tackles this by connecting LLMs to known data sources, like a bank's general ledger, using vector search on a database. This augments the LLM prompts with relevant facts."

Jon Barker, Customer Engineer, Google
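
To make the pattern Barker describes concrete, here is a minimal sketch of retrieval-augmented prompting. Everything in it is illustrative: the ledger entries are invented, and the toy bag-of-words "embedding" stands in for the learned embedding model and vector database a real RAG system would use.

```python
# Minimal sketch of the RAG pattern described above: embed documents,
# retrieve the nearest ones for a query, and prepend them to the prompt.
# The embedding here is a toy bag-of-words vector purely for illustration;
# a real system would use a learned embedding model and a vector database.
import math
from collections import Counter

documents = [
    "Account 4010: office supplies, $1,200 posted on 2024-03-01.",
    "Account 5020: travel expenses, $3,400 posted on 2024-03-05.",
    "Account 4010: office supplies, $800 posted on 2024-03-12.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Augment the user's question with retrieved context."""
    context = "\n".join(retrieve(query))
    return f"Answer using only these ledger entries:\n{context}\n\nQuestion: {query}"

print(build_prompt("How much was spent on office supplies?"))
```

The design point is that the model never has to memorise the ledger; retrieval injects the relevant facts into the prompt at query time.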

Building AI

This section explores the key pillars needed to build effective models, including model architecture innovations, computational resource trends, and the high-quality data imperative.

New neural network designs like sparse expert models are enabling larger, more capable models that efficiently activate only relevant subsets of neurons for each input. Example models leveraging these architectures include Falcon, Mixtral, DBRX and xAI's Grok.
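
As a rough illustration of how such sparse (mixture-of-experts) layers work, the sketch below routes each token to only the top-k experts. The dimensions, weights, and NumPy framing are toy assumptions, not any listed model's actual implementation.

```python
# Minimal sketch of top-k routing in a sparse mixture-of-experts layer:
# a gate scores every expert per token, and only the top-k experts run.
# Toy sizes in plain NumPy; real models wire this into transformer blocks
# (Mixtral, for instance, activates 2 of 8 experts per token).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2

gate_w = rng.normal(size=(d_model, n_experts))            # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) token activation -> (d_model,) output."""
    logits = x @ gate_w                                   # score each expert
    top = np.argsort(logits)[-k:]                         # pick top-k experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    # Only the chosen experts' parameters are touched for this token,
    # which is why capacity can grow without growing per-token compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,)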

The transition to GPU and TPU-centric AI workloads presents challenges:

  • 48% rate compute resource management as "most challenging" or "very challenging"
  • 38% cite lack of suitable AI-specific tools and frameworks as a major obstacle
"CPUs consume about 80% of IT workloads today. GPUs consume about 20%. That's going to flip in the short term, meaning 3 to 5 years. Many industry leaders that I've talked to at Google and elsewhere believe that in 3 to 5 years, 80% of IT workloads will be running on some type of architecture that is not CPU, but rather some type of chip architecture like a GPU."

Jon Barker, Customer Engineer, Google

Data is critical to building effective models. Survey results highlight:

  • Labeling quality is the top challenge in preparing training data (a common way to measure it is sketched after the quote below)
  • 55% leverage internal labeling teams
  • 50% engage specialized data labeling services
  • 29% use crowdsourcing
"Even if you train long enough with enough GPUs, you'll get similar results with any modern model. It's not about the model, it's about the data that it was trained with. The difference between performance is the volume and quality of data, especially human feedback data. You absolutely need it. That will determine your success."

Ashiqur Rahman, Machine Learning Researcher, Kimberly-Clark
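
One common way to quantify the labeling-quality challenge above is inter-annotator agreement. Below is a minimal Cohen's kappa sketch; the two annotators and their labels are invented for illustration.

```python
# Minimal sketch of Cohen's kappa, a standard inter-annotator agreement
# measure teams use to quantify labeling quality; the two annotators'
# labels below are invented for illustration.
from collections import Counter

ann_a = ["spam", "ham", "spam", "spam", "ham", "ham", "spam", "ham"]
ann_b = ["spam", "ham", "ham", "spam", "ham", "spam", "spam", "ham"]

def cohens_kappa(a: list[str], b: list[str]) -> float:
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n      # raw agreement rate
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa(ann_a, ann_b), 3))  # 0.5: moderate agreement
```

Kappa near 1 means annotators agree far beyond chance; values near 0 suggest the labeling guidelines need tightening before the data is used for training.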

Going forward, key priorities include:

  • Acquiring domain-specific human-generated datasets
  • Investing in human-in-the-loop pipelines to refine model outputs
  • Collecting multimodal data spanning text, speech, images and video

Evaluating AI

This section dives into current model evaluation practices and challenges for both model builders and enterprises applying AI.

Top reasons for evaluating models:

  1. Reliability (68%)
  2. Performance (67%)
  3. Security (62%)
  4. Safety (54%)

Evaluation approaches:

  • Automated model metrics (61%)
  • Benchmarks (42%)
  • Human preference ranking (41%)
  • Human evaluation (41%)

Automated metrics and human preference ranking surfaced issues the fastest, with over 70% of teams using those approaches discovering problems within a week. Existing benchmarks still fall short, however: 48% of respondents report a lack of security benchmarks and 50% a lack of industry-specific benchmarks.
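
For a sense of how human preference ranking is often turned into per-model scores, here is a minimal Elo-style aggregation sketch. The models, judgments, and K factor are invented for illustration; the report does not prescribe any particular scheme.

```python
# Minimal sketch of aggregating pairwise human preferences into per-model
# scores with Elo updates, one common way "human preference ranking" is
# operationalized; all models and judgments here are invented.
K = 32  # update step size
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}

def update(winner: str, loser: str) -> None:
    """Shift ratings toward the observed preference, zero-sum."""
    expected_win = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1 - expected_win)
    ratings[loser] -= K * (1 - expected_win)

# Hypothetical judgments: each pair is (preferred output's model, the other).
for winner, loser in [("model_a", "model_b"), ("model_a", "model_c"),
                      ("model_b", "model_c"), ("model_a", "model_b")]:
    update(winner, loser)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```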

For model builders:

  • 87% evaluate models/applications
  • 46% have internal teams with dedicated test & evaluation platforms
  • 64% leverage internal proprietary platforms
  • 40% use third-party evaluation platforms

For enterprises applying AI:

  • 72% evaluate models/applications
  • 49% use internal proprietary platforms
  • 42% have internal teams using external evaluation platforms
  • 38% adopt third-party platforms
"Evaluating generative AI performance is complex due to evolving benchmarks, data drift, model versioning, and the need to coordinate across diverse teams. The key question is how the model performs on specific data and use cases... Centralized oversight of the data flow is essential for effective model evaluation and risk management in order to achieve high acceptance rates from developers and other stakeholders."

Babar Bhatti, AI Customer Success Lead, IBM

Gaps remain in current evaluation practices. Only about half of organizations are measuring the business impact of AI models on key outcomes like revenue and profitability. Performance and usability benchmarks, along with industry-specific standards, are needed as AI permeates different sectors.

Conclusion

The report concludes that optimization and evaluation are key to unlocking AI performance and ROI, whether organizations are building or applying the technology. The two most significant trends are:

  1. The growing need for model evaluation frameworks and private benchmarks
  2. Ongoing challenges optimizing models for specific use cases without sufficient tooling for data preparation, model training, and deployment

Scale reaffirms its mission to accelerate AI application development and its commitment to shedding light on the latest trends, challenges, and requirements for building, applying, and evaluating AI.
