Unsafe Deserialization in Python (CWE-502)

What is serialization and deserialization? How does it work in Python?

Serialization transforms an object into a byte stream. Deserialization is the inverse process: taking a byte stream to create an object.
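
A minimal round trip with the standard pickle module looks like this. The dictionary is an arbitrary example object; `pickle.dumps()` and `pickle.loads()` are the module's in-memory serialize and deserialize functions.

```python
import pickle

# An arbitrary picklable object: a dict holding a string, a list and a tuple.
settings = {"user": "alice", "roles": ["admin", "dev"], "window": (800, 600)}

# Serialization: object -> byte stream
data = pickle.dumps(settings)
print(type(data))            # <class 'bytes'>

# Deserialization: byte stream -> a new object equal to the original
restored = pickle.loads(data)
print(restored == settings)  # True
print(restored is settings)  # False: a brand-new object was built from the bytes
```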

Saving objects to byte streams and restoring them from byte streams lets programs exchange objects through the filesystem and the network. For example, in a distributed system, the server may receive objects from a client (or, the other way around, send objects back to the client).

Why is deserialization unsafe?

Deserialization is unsafe for one simple reason: you cannot trust the bytes passed to you. Serializing data (e.g., before sending it to someone else) poses no problem, but deserialization takes an arbitrary byte stream and turns it into an object. There is no guarantee that the resulting object is safe to use, and the byte stream can contain instructions that execute code and compromise your system.
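
A minimal, deliberately harmless sketch shows how a pickle payload can run code. The class name `Malicious` is hypothetical; `__reduce__` is the standard hook pickle uses to learn how to rebuild an object, and `pickle.loads()` calls whatever callable it returns. Here `print` stands in for something dangerous such as `os.system`.

```python
import pickle


class Malicious:
    """Hypothetical attacker-controlled class."""

    def __reduce__(self):
        # pickle records (callable, args); pickle.loads() will call callable(*args).
        # print is used so the demonstration does no harm.
        return (print, ("arbitrary code ran during pickle.loads()",))


# The attacker builds the payload on their own machine...
payload = pickle.dumps(Malicious())

# ...and the victim only has to deserialize it for the call to happen.
result = pickle.loads(payload)  # prints the message above
print(type(result))             # <class 'NoneType'>: not even a Malicious instance
```

The victim never needs the `Malicious` class: the payload only names `print`, which the unpickler looks up and calls.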

Unsafe deserialization is a common software weakness. MITRE's Common Weakness Enumeration (CWE) lists it as CWE-502: Deserialization of Untrusted Data.

This blog post illustrates how unsafe deserialization works with Python and the standard pickle module.

What Python modules are vulnerable to unsafe deserialization?

While it's hard to enumerate all Python modules that serialize and deserialize data, the most commonly used ones are:

pickle (from the Python standard library)
pandas (with pandas.read_pickle())
shelve (with shelve.open())
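
shelve is affected because it stores every value with pickle, so simply reading a key from a shelf unpickles it. The sketch below is harmless: the `Payload` class is hypothetical, its `__reduce__` method tells pickle which callable to invoke on load, and `print` stands in for a dangerous call. pandas.read_pickle() is not shown because pandas is a third-party package, but it is also built on pickle and its documentation carries the same warning about untrusted sources.

```python
import os
import shelve
import tempfile


class Payload:
    """Hypothetical attacker-controlled value."""

    def __reduce__(self):
        # pickle will call print(...) when this value is loaded.
        return (print, ("code ran while reading from the shelf",))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")
    # Writing a key pickles the value into the shelf's database file.
    with shelve.open(path) as db:
        db["config"] = Payload()
    # Reading the key unpickles it, which triggers the call.
    with shelve.open(path) as db:
        value = db["config"]

print(value)  # None: the "value" is whatever the injected callable returned
```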

Note that the documentation of all these modules mentions security concerns and warns developers to deserialize only data from trusted sources. In very specific cases, deserialization might be safe (e.g., when loading data you previously saved on your own machine). In the vast majority of cases, deserializing data is unsafe and strongly discouraged.

How to safely serialize and deserialize data?

Unfortunately, there is no silver bullet. The safest approach is not to rely on deserialization at all and instead use an API that exchanges only the data you need.
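
As one illustration, a service could exchange plain JSON and validate the fields it expects. JSON is used here only as an example of a data-only format; the `Order` type and the `parse_order()` and `_positive_int()` functions are hypothetical. Unlike `pickle.loads()`, `json.loads()` only ever builds dicts, lists, strings, numbers, booleans, and `None`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical message type: only the fields the service actually needs."""
    order_id: int
    quantity: int


def _positive_int(data: dict, key: str) -> int:
    value = data.get(key)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{key!r} must be a positive integer")
    return value


def parse_order(raw: bytes) -> Order:
    # json.loads() never imports modules or calls arbitrary functions.
    # Malformed JSON raises json.JSONDecodeError, a subclass of ValueError.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Validate each field explicitly instead of trusting the sender.
    return Order(order_id=_positive_int(data, "order_id"),
                 quantity=_positive_int(data, "quantity"))


print(parse_order(b'{"order_id": 42, "quantity": 3}'))
# Order(order_id=42, quantity=3)
```

The worst a malicious sender can do here is trigger a `ValueError`; it cannot make the receiver run code.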

If you need to exchange data in a binary format (for performance reasons), binary protocols such as protobuf or Thrift are more secure and more appropriate.
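
protobuf and Thrift both need a schema file, generated code, and a third-party runtime, so the sketch below swaps in the standard library's struct module purely to illustrate the principle they share: the receiver decodes a fixed, declared schema of primitive fields instead of rebuilding arbitrary objects. This is not protobuf or Thrift code; the message layout and the `encode_order()`/`decode_order()` functions are hypothetical.

```python
import struct

# Hypothetical fixed schema: big-endian unsigned 32-bit order id,
# followed by an unsigned 16-bit quantity (6 bytes in total).
ORDER_FORMAT = struct.Struct(">IH")


def encode_order(order_id: int, quantity: int) -> bytes:
    # Raises struct.error if a value does not fit its declared field.
    return ORDER_FORMAT.pack(order_id, quantity)


def decode_order(raw: bytes) -> tuple:
    # The decoder only reads the primitive fields declared in the schema;
    # the bytes cannot tell it to import modules or call functions.
    if len(raw) != ORDER_FORMAT.size:
        raise ValueError(f"expected {ORDER_FORMAT.size} bytes, got {len(raw)}")
    return ORDER_FORMAT.unpack(raw)


message = encode_order(42, 3)
print(len(message))           # 6
print(decode_order(message))  # (42, 3)
```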

Automatically detect unsafe deserialization

Codiga provides IDE plugins and integrations with GitHub, GitLab, and Bitbucket that detect unsafe deserialization. Its static code analysis flags unsafe deserialization in your IDE and during code reviews through a dedicated rule, Detect Unsafe Deserialization, which covers the pickle, shelve, and pandas modules.

To use this rule consistently, install the integration for your IDE (VS Code or JetBrains) or your code management system, then add a codiga.yml file at the root of your project with the following content:

rulesets:
  - python-security

Codiga will then check all your Python code against more than 100 rules that detect unsafe and insecure code, and suggest fixes for each of them.
