
DeepSeek: The Whale in the Deep Blue Sea

Image: DeepSeek AI prompt on a phone screen (Solen Feyissa on Unsplash)

When DeepSeek’s R1 model surpassed OpenAI’s ChatGPT on the app charts, it drew global attention and raised data privacy concerns. How has DeepSeek R1 impacted the big tech companies, and how did they respond? What are the lessons learned?


First, an explanation of this story’s headline. The Whale is DeepSeek’s logo. The Deep Blue Sea is a metaphor for deep learning and reasoning models, which have garnered much interest lately.

For the uninitiated, DeepSeek R1 is a family of AI models trained with reinforcement learning (RL) and designed for logic and reasoning tasks. The models solve complex problems by breaking them down into multiple steps. R1 is open-source (released under the MIT license) and has a conversational chat interface like any other genAI tool.

Reinforcement learning is the term that stands out in that definition. In its simplest form, it means AI models now think twice or more before spitting out answers to user prompts. In fact, you will now see “thinking…” in the interface of RL models. Remember how your high school teacher or mama told you, “Think before you speak”?
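To see what that “thinking” looks like outside the chat window, here is a minimal sketch of calling R1 through DeepSeek’s OpenAI-compatible API with the openai Python package. The deepseek-reasoner model name and the reasoning_content field follow DeepSeek’s API documentation as I understand it; treat them as assumptions and verify against the current docs before relying on them.

# Minimal sketch: querying DeepSeek R1 via its OpenAI-compatible API.
# Assumption: "deepseek-reasoner" and the "reasoning_content" field are
# taken from DeepSeek's API docs; verify both before relying on them.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder, not a real key
    base_url="https://api.deepseek.com",   # DeepSeek's API endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",             # the R1 reasoning model
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
)

message = response.choices[0].message
print("Thinking:", message.reasoning_content)  # the step-by-step reasoning trace
print("Answer:", message.content)              # the final answer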

According to GeeksforGeeks, Reinforcement Learning is a branch of machine learning focused on making decisions to maximize cumulative rewards in a given situation. Unlike supervised learning, which relies on a training dataset with predefined answers, RL involves learning through experience. In RL, an agent learns to achieve a goal in an uncertain, potentially complex environment by performing actions and receiving feedback through rewards or penalties.
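To make that agent/action/reward loop concrete, here is a toy Q-learning sketch in Python. Nothing in it comes from DeepSeek; the five-cell corridor environment, reward values, and hyperparameters are all invented for illustration.

# Toy Q-learning: an agent learns, by trial and error, to walk right
# along a 5-cell corridor to reach a reward at the far end.
# Everything here (environment, rewards, hyperparameters) is invented
# purely to illustrate the agent/action/reward loop.
import random

n_states, actions = 5, [-1, +1]            # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration

for episode in range(200):
    s = 0                                  # start at the left end
    while s != n_states - 1:               # goal: the rightmost cell
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else -0.1   # reward at goal, small penalty per step
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

# After training, the learned policy should be "+1 (move right)" in every state.
print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states - 1)})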

In practice, that means you are going to get more accurate answers. During training, the reward signal is converted into weight updates: techniques such as backpropagation adjust the neural network’s weights so that responses which earned higher rewards become more likely in the future.
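Here is an equally tiny, invented illustration of reward-scaled weight updates: a two-armed bandit with a softmax policy, where a higher reward produces a bigger push on the weights. This is a stand-in for what backpropagation does at scale in a full neural network, not anything from DeepSeek’s pipeline.

# Minimal REINFORCE on a two-armed bandit: the reward scales the
# gradient that updates the policy weights. All numbers are invented.
import math, random

w = [0.0, 0.0]                       # one weight (preference) per arm
true_payout = [0.2, 0.8]             # arm 1 pays off more often
lr = 0.1                             # learning rate

def softmax(ws):
    exps = [math.exp(x) for x in ws]
    return [e / sum(exps) for e in exps]

for step in range(2000):
    probs = softmax(w)
    arm = random.choices([0, 1], weights=probs)[0]
    reward = 1.0 if random.random() < true_payout[arm] else 0.0
    # Policy-gradient: d log pi(arm) / d w[i] = (1 if i == arm else 0) - probs[i]
    for i in range(2):
        grad = (1.0 if i == arm else 0.0) - probs[i]
        w[i] += lr * reward * grad   # higher reward => bigger push toward that arm

print(softmax(w))                    # probability of arm 1 should approach 1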

Lessons Learned

Now, back to the Chinese company DeepSeek and its models.

The biggest lesson we learned from DeepSeek is that you don’t need cutting-edge technology (the most expensive GPUs) to train a model. DeepSeek reportedly developed and trained its R1 model for just $6 million, whereas OpenAI is estimated to have spent $500 million training its o1 model. Also, due to trade sanctions imposed by the U.S. on Chinese companies, DeepSeek did not have access to the latest hardware: it used Nvidia GPUs purchased in 2023.

This shift highlights how companies in sanctioned countries can work around restricted access to the latest Western technology, whether that access is blocked by trade sanctions (chip export controls) or high tariffs.

Who’s Using DeepSeek Models?

After the initial hue and cry over data privacy and data stored on Chinese servers, companies are now taking DeepSeek seriously.

I have yet to hear of end-user companies adopting DeepSeek models; that is going to take a while due to regulatory and compliance concerns. Governments are rushing to ban their departments from using it, citing security and privacy concerns – primarily the risk of sensitive national data leaking to the Chinese government.

However, product and service companies are integrating DeepSeek into their offerings, and they are mostly taking the open-source route: DeepSeek R1 is released under the MIT license. Since it is open-source, they can examine the code and weights, tailor the model to their specific needs, and incorporate security and privacy controls to ensure data sovereignty.
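Because the weights are openly available, a team can even self-host one of the distilled R1 checkpoints so prompts never leave its own infrastructure. Here is a minimal sketch using the Hugging Face transformers library; the model ID below is one of the distilled checkpoints DeepSeek published, but verify the exact name, library version, and hardware requirements before relying on it.

# Sketch: self-hosting a distilled DeepSeek R1 checkpoint so prompts
# never leave your own infrastructure. Requires a recent transformers
# version (plus accelerate for device_map); model ID assumed from
# DeepSeek's published distillations - verify before use.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # smallest distilled variant
    device_map="auto",          # place weights on GPU if available
)

messages = [{"role": "user", "content": "How many prime numbers are below 30?"}]
output = generator(messages, max_new_tokens=512)
print(output[0]["generated_text"][-1]["content"])       # the assistant's reply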

Here are some examples of companies taking this route:

Chinese Companies Too

Chinese companies are also integrating DeepSeek models:

Conclusion

It seems like some companies, especially startups, are willing to take the risk and adopt DeepSeek’s models, as they can drastically reduce the cost of model development and training. Yet this becomes a supply chain concern: startups usually innovate for large companies (banks and telecom companies, for instance), who in turn have customers of their own. Large companies are custodians of their customers’ data, and they need to ensure that this data is safe in the hands of startups – or perhaps simply give the startups synthetic data to test their models against, avoiding that risk altogether.
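On that last point, the synthetic-data route is straightforward to prototype. Here is a hedged sketch using the Faker library to generate realistic-looking but entirely fake customer records; the field layout is invented for illustration, not taken from any real bank or telecom schema.

# Sketch: generating fake customer records with the Faker library, so
# startups can test their models without touching real customer data.
# The field layout below is invented purely for illustration.
from faker import Faker

fake = Faker()

def synthetic_customer():
    return {
        "name": fake.name(),
        "email": fake.email(),
        "address": fake.address(),
        "account_opened": fake.date_this_decade().isoformat(),
    }

dataset = [synthetic_customer() for _ in range(1000)]
print(dataset[0])   # one fake record, safe to hand to a third party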
