a16z - How to use DeepSeek safely

发布时间：2025-02-28 15:00:08 原节目

这是一段视频文字稿的总结，重点关注DeepSeek（深度求索）的关键见解和企业使用方面的担忧：讨论围绕着近期在中国开发的开源推理模型DeepSeek引发的热议展开。最初的炒作声称它将彻底改变经济学和应用开发，而悲观的观点则表达了对中国主导地位、数据安全以及潜在审查的担忧。 DeepSeek的核心吸引力在于其开源性质、推理能力以及其中国背景。开源被视为全球创新的一项积极进展，展现了协作开发的潜力。然而，它与中国政府的联系以及对模型行为产生潜在影响的可能性，令人担忧。演讲者深入研究，旨在描述这种影响的程度，重点是通过“红队演练”来测试模型对诸如提示注入和越狱等对抗技术的抵抗力。这些技术非常重要，因为它们可能成为利用更大系统或架构的入口，潜在地允许攻击者进行横向移动并执行恶意操作。 DeepSeek具有两层保护。其中一层旨在限制在中国境内对政治敏感话题（如台湾或天安门广场）的言论。这层保护与类似模型上的典型防护措施不同。这种政治审查非常严厉。询问有争议的话题要么会遭到拒绝，要么会得到一篇冗长的中国共产党宣传稿。这种对模型的公然引导引发了人们对其一致性和潜在偏见的担忧，特别是对于西方用户而言。然而，DeepSeek的弱点体现在其他方面。在其他领域，它的保护措施相当薄弱。在越狱基准测试中，它的表现明显不如GPT，与GPT-3.5早期阶段简单越狱就能奏效的情况相当。这表明DeepSeek并没有在加强其防御以应对这些类型的攻击方面投入大量精力。运行该模型的基础设施也被认为是不安全的。关于数据传输到中国的担忧，演讲者澄清说，虽然在中国托管的模型可能会暴露数据，但用户可以在MIT许可下下载并在本地运行该模型，从而降低这种风险。值得注意的是，即使是本地托管或美国提供的DeepSeek版本也保留了政治审查。这意味着，虽然数据可能不会被发送到中国，但模型的响应仍然受到约束。演讲者强调，明显的审查并不是最令人担忧的方面。更大的担忧在于未知数：北京是否在其他主题或领域微妙地影响模型的行为？是否存在绕过防护措施或暴露敏感信息的后门？这些未知数给考虑采用该模型的企业带来了风险。讨论转向了对西方模型审查制度的比较。虽然美国模型可能不会像DeepSeek那样进行公开说教，但它们通常会拒绝回答有关敏感话题的问题。有趣的是，一些西方模型，尤其是Anthropic的Claude，在中国政治话题上的审查程度与DeepSeek相当。GPT稍微自由一些，而X AI的Grok在涉及敏感的中国政治话题时几乎没有审查。对于考虑使用DeepSeek的企业，建议持谨慎态度。避免使用在中国托管的模型，而是选择自托管或美国的服务提供商。鉴于不确定性和模型的弱点，建议等待实施类似推理技术的开源模型。演讲者认为，很快就会出现合适的替代方案，提供更好的安全性和可靠性。由于速度慢、措辞冗长以及输出中偶尔出现中文字符等问题，DeepSeek不被视为一个强大的日常驱动程序。它容易受到越狱攻击，因此不适合面向最终用户的应用程序。

Here's a summarization of the video transcript, focusing on the key insights and concerns regarding DeepSeek, particularly in the context of enterprise usage: The discussion centers on the recent buzz surrounding DeepSeek, an open-source reasoning model developed in China. The initial hype involved claims that it would revolutionize economics and app development, while the pessimistic view expressed concerns about Chinese dominance, data security, and potential censorship. The core aspects of DeepSeek that have garnered attention are its open-source nature, its reasoning capabilities, and its Chinese origins. The open-source aspect is seen as a positive development for global innovation, showcasing the potential of collaborative development. However, the connection to the Chinese government and the potential for influence on the model's behavior is a cause for concern. The speaker's research delves into characterizing the extent of this influence, focusing on red-teaming the model to test its resistance to adversarial techniques such as prompt injections and jailbreaks. These techniques are significant because they can be gateways to exploiting larger systems or architectures, potentially allowing attackers to pivot and perform malicious actions. DeepSeek possesses two layers of protection. One aims to limit speech on politically sensitive topics within China, like Taiwan or Tiananmen Square. This layer is separate from the typical guardrails found on similar models. This political censorship is heavy-handed. Asking about controversial topics elicits either a refusal or a lengthy recitation of the Chinese Communist Party line. This blatant steering of the model raises concerns about its alignment and potential biases, especially for Western users. However, DeepSeek’s weaknesses are elsewhere. In other domains, its protections are quite weak. It performs substantially worse than GPT in jailbreaking benchmarks, comparable to the early days of GPT-3.5 when simple jailbreaks were effective. This suggests that DeepSeek hasn't invested heavily in hardening its defenses against these types of attacks. The infrastructure on which the model runs has also been deemed insecure. Regarding concerns about data transfer to China, the speaker clarifies that while the hosted model in China could potentially expose data, users can download and run the model locally under the MIT license, mitigating this risk. It's important to note that even locally hosted or U.S.-provided versions of DeepSeek retain the political censorship. This means that while data might not be sent to China, the model's responses are still subject to constraints. The speaker emphasizes that the obvious censorship is not the most concerning aspect. The greater concern lies in the unknowns: Are there other topics or areas where Beijing is subtly influencing the model's behavior? Could there be backdoors that bypass guardrails or expose sensitive information? These unknowns pose a risk for enterprises considering adopting the model. The discussion transitions to a comparison of censorship in Western models. While US models may not deliver overt lectures like DeepSeek, they often refuse to answer questions on sensitive topics. Interestingly, some Western models, particularly Anthropic's Claude, exhibit censorship on Chinese political topics that is on par with DeepSeek. GPT is a little bit freer, and Grok from X AI has relatively little censorship when it comes to sensitive Chinese political topics. For enterprises considering using DeepSeek, the advice is cautious. Avoid using the model hosted in China and instead opt for self-hosting or U.S.-based providers. Given the uncertainty and the model's weaknesses, it's recommended to wait for open-source models that implement similar reasoning techniques. The speaker believes that a suitable alternative will emerge soon, offering better security and reliability. DeepSeek isn't seen as a robust daily driver due to issues like slowness, verbosity, and occasional Chinese characters in its output. Its susceptibility to jailbreaks makes it unsuitable for end-user-facing applications.