The top 10 data and AI trends to expect in 2024
These predictions range from LLMs transforming the modern data stack to the growing importance of data observability for vector databases.
"The data and AI landscape is moving very quickly. If you don't stop and look around once in a while, you might miss it."
2023 was the year of GenAI. And 2024 will be... another year of GenAI.
But while 2023 saw teams scrambling to make a name for themselves, 2024 will see teams focusing their AI models on real business problems. And with that focus on innovation, new priorities will emerge.
When it comes to the future of data, a rising tide lifts all boats. As GenAI continues to evolve in 2024, it will raise both the standards and the priorities of the data industry.
Here are my top 10 predictions for what's next for data and AI teams, and how your team can stay one step ahead.
1. LLMs will transform the stack
It's no exaggeration to say that large language models (LLMs) have reshaped the tech landscape over the past 12 months. From companies with legitimate use cases to fly-by-night teams with solutions in search of a problem, everyone, from individual contributors to data leaders, is trying to use generative AI (GenAI) in one way or another.
LLMs are set to continue that transformation into 2024 and beyond: driving demand for data, requiring new architectures such as vector databases (the so-called "AI stack"), and changing how we handle data and serve it to end users.
Automated data analysis and activation will become an expected feature of every product and every layer of the data stack. The question is how to ensure these new products deliver real value in 2024 rather than serving as novelties for a press release.
2. Data teams will resemble software teams
The most sophisticated data teams treat their data assets as bona fide data products, complete with product requirements, documentation, runbooks, and even SLAs for end users.
So, as organizations map more and more value to their established data products, more data teams will start to look like, and be managed like, the critical product teams they already have.
3. And software teams will become data practitioners
When engineers try to build data products or GenAI without thinking about the data, it rarely ends well. Just ask UnitedHealthcare.
As AI continues to take over the world, engineering and data will converge. No major software will come to market without some consideration of AI, and no major AI will come to market without some degree of enterprise data powering it.
That means that as engineers build new AI products, they'll need to think carefully about the data, and about how they work with it, in order to build models that add value and keep improving.
4. RAG will be all the RAGE
After a series of high-profile GenAI failures, the need for clean, reliable, and well-governed contextual data to power AI products is becoming increasingly apparent.
As the AI field continues to evolve and the blind spots in LLM training become more apparent, data-rich teams will turn to retrieval-augmented generation (RAG) and fine-tuning to augment their AI products and deliver demonstrable value to their stakeholders.
RAG is still relatively new (it was first introduced by researchers at Facebook AI Research, now Meta AI, in 2020), and organizations have yet to settle on best practices or repeatable playbooks for it. But they will soon.
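To make the retrieve-then-generate idea concrete, here is a minimal sketch in Python. It assumes a tiny in-memory list of documents and uses bag-of-words cosine similarity as a stand-in for a real embedding model and vector database; the documents and the function names (`embed`, `retrieve`, `build_prompt`) are hypothetical, and the final call to an LLM is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical context data standing in for an organization's knowledge base.
DOCS = [
    "Refunds are processed within 5 business days of approval.",
    "Our data warehouse is refreshed nightly at 02:00 UTC.",
    "Support is available Monday through Friday, 9am to 5pm.",
]


def embed(text: str) -> Counter:
    """Bag-of-words term counts: a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list, k: int = 1) -> list:
    """Retrieval step: rank documents by similarity to the query, keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list, k: int = 1) -> str:
    """Augmentation step: ground the prompt in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Generation step (not shown): send this prompt to the LLM of your choice.
    print(build_prompt("When is the data warehouse refreshed?", DOCS))
```

The point of the pattern is that the model answers from fresh, governed context supplied at query time rather than only from what it memorized during training, which is why the quality of the retrieved data matters so much.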
5. Teams will operationalize enterprise-ready AI products
The biggest trend in data engineering remains data products. And make no mistake: AI is a data product.
If 2023 was the year of AI, 2024 will be the year of operationalizing AI products. Whether out of necessity or under pressure, data teams across industries will embrace enterprise-ready AI products. The question is whether those products are truly ready for the enterprise.
Gone (hopefully) are the days of bolting on random chatbot features just to tell the board you're integrating AI. In 2024, teams will likely become more sophisticated in how they develop AI products, using better training methods to create value and starting from real problems to solve rather than using technology that creates new ones.
6. Data observability will support AI and vector databases
In AWS's 2023 CDO Insights survey, respondents were asked what their organization's biggest challenge was in realizing the potential of generative AI.
The most common answer? Data quality.
At its core, generative AI is a data product. And like any data product, it won't work without reliable data. But at LLM scale, manual monitoring can't provide the comprehensive, efficient quality coverage needed to make any AI reliable.
To truly succeed, data teams need a robust data observability strategy tailored to the AI stack, one that helps them consistently detect, resolve, and prevent data downtime in a dynamic, ever-evolving environment. To become contenders in the fight for AI reliability in 2024, those solutions will need to prioritize incident resolution, pipeline efficiency, and support for the streaming and vector infrastructure that powers AI.
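As an illustration of what automated checks for "data downtime" can look like, here is a minimal Python sketch that inspects metadata about one load of an embeddings table. The `BatchStats` schema, the 24-hour freshness window, the 768-dimension default, and the 3-sigma volume threshold are all hypothetical choices for illustration, not settings from any particular observability tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import mean, pstdev
from typing import List, Optional, Set


@dataclass
class BatchStats:
    """Metadata about one pipeline load (hypothetical schema)."""
    loaded_at: datetime
    row_count: int
    embedding_dims: Set[int]  # distinct vector lengths seen in the batch
    null_embeddings: int


def check_batch(
    latest: BatchStats,
    history: List[int],
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(hours=24),
    expected_dim: int = 768,
    z_threshold: float = 3.0,
) -> List[str]:
    """Return human-readable incidents; an empty list means the batch looks healthy."""
    now = now or datetime.now(timezone.utc)
    incidents = []
    # Freshness: has the table been updated recently enough?
    age = now - latest.loaded_at
    if age > max_age:
        incidents.append(f"stale: last load was {age} ago")
    # Volume: is the row count far outside its recent range?
    if len(history) >= 2:
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            anomalous = latest.row_count != mu
        else:
            anomalous = abs(latest.row_count - mu) / sigma > z_threshold
        if anomalous:
            incidents.append(f"volume anomaly: {latest.row_count} rows vs mean {mu:.0f}")
    # Vector-specific checks: wrong embedding size or missing embeddings.
    if latest.embedding_dims - {expected_dim}:
        incidents.append(f"unexpected embedding dims: {sorted(latest.embedding_dims)}")
    if latest.null_embeddings:
        incidents.append(f"{latest.null_embeddings} rows missing embeddings")
    return incidents
```

Rules like these only cover failures someone thought to anticipate; that limitation is part of why hand-written checks struggle at LLM scale and teams look to automated, learned monitoring instead.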
7. Big data will shrink
Thirty years ago, personal computers were a novelty. Now, a modern MacBook boasts computational power comparable to the AWS servers on which Snowflake launched its MVP data warehouse in 2012, and hardware is blurring the line between consumer and enterprise solutions.
Since most workloads are small, data teams will start using in-process and in-memory databases to analyze and move datasets.
Especially for teams that need to scale quickly, these solutions are fast to get started with and can be upgraded to enterprise-grade functionality through commercial cloud services.
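For a feel of how lightweight an in-process database is, here is a short Python sketch. It uses SQLite through Python's built-in `sqlite3` module purely as a dependency-free example of an in-process, in-memory engine; the `events` table and its rows are made up for illustration.

```python
import sqlite3

# ":memory:" creates a database that lives inside this Python process:
# no server to provision, no network hop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "purchase", 20.0),
        (1, "purchase", 5.0),
        (2, "refund", -5.0),
        (3, "purchase", 12.5),
    ],
)

# Analyze the (small) dataset with plain SQL, entirely in memory.
rows = conn.execute(
    """
    SELECT event, COUNT(*) AS n, SUM(amount) AS total
    FROM events
    GROUP BY event
    ORDER BY event
    """
).fetchall()
print(rows)  # [('purchase', 3, 37.5), ('refund', 1, -5.0)]

# Move the data: dump it as a SQL script and replay it into another database.
dump = "\n".join(conn.iterdump())
copy = sqlite3.connect(":memory:")
copy.executescript(dump)
conn.close()
```

The same workflow (load locally, query with SQL, ship the result elsewhere) is what makes these engines attractive for the many workloads that never needed a cluster in the first place.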
8. Right-sizing will be prioritized
Today's data leaders face an impossible task: use more data, make more impact, leverage more AI, but cut cloud costs.
As Harvard Business Review has put it, data and AI leaders are set up to fail. IDC reports that, as of Q1 2023, spending on cloud infrastructure had risen to $21.5 billion. According to McKinsey, many companies are seeing their cloud spending grow by up to 30% annually.
Low-lift approaches like metadata monitoring, along with tools that let teams see their resource utilization and right-size it, will be invaluable in 2024.
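Right-sizing often boils down to a simple question: what is the smallest capacity that still covers real usage? The Python sketch below answers it from utilization samples, assuming a hypothetical catalog of compute sizes measured in vCPUs, a 95th-percentile usage target, and 25% headroom; real providers' sizes, prices, and billing models differ.

```python
import math
from typing import Dict, List

# Hypothetical catalog: size name -> vCPUs. Real sizes and prices vary by provider.
SIZES: Dict[str, int] = {"small": 2, "medium": 4, "large": 8, "xlarge": 16}


def p95(samples: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def recommend_size(cpu_used: List[float], headroom: float = 0.25) -> str:
    """Smallest size whose capacity covers p95 usage plus headroom."""
    needed = p95(cpu_used) * (1 + headroom)
    for name, vcpus in sorted(SIZES.items(), key=lambda kv: kv[1]):
        if vcpus >= needed:
            return name
    # Nothing fits: fall back to the largest size (and flag it for review).
    return max(SIZES, key=SIZES.get)


# Hourly vCPU usage of a workload currently running on an "xlarge" instance.
usage = [1.0, 1.5, 2.0, 2.2, 2.5, 3.0, 2.8, 1.2, 0.9, 3.1]
print(recommend_size(usage))  # medium
```

Sizing to a high percentile rather than the peak keeps rare spikes from dictating the bill, while the headroom factor guards against under-provisioning.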
9. Iceberg will rise (Apache Iceberg)
Apache Iceberg is an open-source table format originally developed at Netflix to make very large datasets faster and easier to work with. It's designed to be easily queryable via SQL, even for analytical tables holding petabytes of data.
Whereas modern data warehouses and lakes typically provide both compute and storage, Iceberg focuses on providing structured, cost-effective storage that many different engines across your organization can access at the same time, such as Apache Spark, Trino, Apache Flink, Presto, Apache Hive, and Impala.
Recently, Databricks announced that Delta table metadata will also be compatible with the Iceberg format, and Snowflake is actively integrating with Iceberg. As the lakehouse becomes a practical solution for many organizations, Apache Iceberg, and alternatives to it, may continue to gain popularity.
10. Back to the office because... someone said so
RTO: everyone's least favorite acronym. Or maybe it's their favorite! Frankly, I can't keep up at this point. While teams seem divided on the issue, more and more of them are being called back to the office/floor plan/flexible work environment at least a few days per week.
According to a September 2023 report by Resume Builder, 90% of companies plan to enforce return-to-office policies by the end of 2024, nearly four years after the fateful spring of 2020.
Some powerful CEOs, including Amazon's Andy Jassy, OpenAI's Sam Altman, and Google's Sundar Pichai, have issued return-to-office policies in recent months. And there seem to be at least some benefits to working in the office (at least part-time) compared to working from home.
Firmly in the work-from-home-forever camp? As always with data, the answer seems to be to deliver more value. Despite recent economic hardships and their impact on the job market, demand for data and AI talent remains high, and recruiters will often do what it takes to get these people and keep them. While some companies are requiring all employees to return to the office regardless of role, others, like Salesforce, are asking non-remote engineers to come in significantly less: a total of 10 days per quarter.