<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 307 Extensions to the pickle protocol | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0307/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 307 Extensions to the pickle protocol | peps.python.org'>
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0307/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="Python Enhancement Proposals (PEPs)">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<ul class="breadcrumbs">
<li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li>
<li><a href="../pep-0000/">PEP Index</a> &raquo; </li>
<li>PEP 307</li>
</ul>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 307 Extensions to the pickle protocol</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Guido van Rossum, Tim Peters</dd>
<dt class="field-even">Status<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd>
<dt class="field-odd">Type<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd>
<dt class="field-even">Created<span class="colon">:</span></dt>
<dd class="field-even">31-Jan-2003</dd>
<dt class="field-odd">Python-Version<span class="colon">:</span></dt>
<dd class="field-odd">2.3</dd>
<dt class="field-even">Post-History<span class="colon">:</span></dt>
<dd class="field-even">07-Feb-2003</dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#introduction">Introduction</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a></li>
<li><a class="reference internal" href="#protocol-versions">Protocol versions</a></li>
<li><a class="reference internal" href="#security-issues">Security issues</a></li>
<li><a class="reference internal" href="#extended-reduce-api">Extended <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> API</a></li>
<li><a class="reference internal" href="#the-reduce-ex-api">The <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> API</a></li>
<li><a class="reference internal" href="#customizing-pickling-absent-a-reduce-implementation">Customizing pickling absent a <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation</a><ul>
<li><a class="reference internal" href="#case-1-pickling-classic-class-instances">Case 1: pickling classic class instances</a><ul>
<li><a class="reference internal" href="#the-getstate-method">The <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method</a></li>
<li><a class="reference internal" href="#the-setstate-method">The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method</a></li>
<li><a class="reference internal" href="#the-getinitargs-method">The <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> method</a></li>
</ul>
</li>
<li><a class="reference internal" href="#case-2-pickling-new-style-class-instances-using-protocols-0-or-1">Case 2: pickling new-style class instances using protocols 0 or 1</a></li>
<li><a class="reference internal" href="#case-3-pickling-new-style-class-instances-using-protocol-2">Case 3: pickling new-style class instances using protocol 2</a><ul>
<li><a class="reference internal" href="#id1">The <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method</a></li>
<li><a class="reference internal" href="#id2">The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method</a></li>
<li><a class="reference internal" href="#the-getnewargs-method">The <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code> method</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#the-newobj-unpickling-function">The <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code> unpickling function</a></li>
<li><a class="reference internal" href="#the-extension-registry">The extension registry</a><ul>
<li><a class="reference internal" href="#extension-registry-api">Extension registry API</a></li>
</ul>
</li>
<li><a class="reference internal" href="#the-copy-module">The copy module</a></li>
<li><a class="reference internal" href="#pickling-python-longs">Pickling Python longs</a></li>
<li><a class="reference internal" href="#pickling-bools">Pickling bools</a></li>
<li><a class="reference internal" href="#pickling-small-tuples">Pickling small tuples</a></li>
<li><a class="reference internal" href="#protocol-identification">Protocol identification</a></li>
<li><a class="reference internal" href="#pickling-of-large-lists-and-dicts">Pickling of large lists and dicts</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<section id="introduction">
<h2><a class="toc-backref" href="#introduction" role="doc-backlink">Introduction</a></h2>
<p>Pickling new-style objects in Python 2.2 is done somewhat clumsily
and causes pickle size to bloat compared to classic class
instances. This PEP documents a new pickle protocol in Python 2.3
that takes care of this and many other pickle issues.</p>
<p>There are two sides to specifying a new pickle protocol: the byte
stream constituting pickled data must be specified, and the
interface between objects and the pickling and unpickling engines
must be specified. This PEP focuses on API issues, although it
may occasionally touch on byte stream format details to motivate a
choice. The pickle byte stream format is documented formally by
the standard library module <code class="docutils literal notranslate"><span class="pre">pickletools.py</span></code> (already checked into
CVS for Python 2.3).</p>
<p>This PEP attempts to fully document the interface between pickled
objects and the pickling process, highlighting additions by
specifying “new in this PEP”. (The interface to invoke pickling
or unpickling is not covered fully, except for the changes to the
API for specifying the pickling protocol to picklers.)</p>
</section>
<section id="motivation">
<h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2>
<p>Pickling new-style objects causes serious pickle bloat. For
example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pickle</span>

<span class="k">class</span> <span class="nc">C</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="c1"># Omit &quot;(object)&quot; for classic class</span>
    <span class="k">pass</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">C</span><span class="p">()</span>
<span class="n">x</span><span class="o">.</span><span class="n">foo</span> <span class="o">=</span> <span class="mi">42</span>
<span class="nb">print</span> <span class="nb">len</span><span class="p">(</span><span class="n">pickle</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div>
</div>
<p>The binary pickle for the classic object consumed 33 bytes, and for
the new-style object 86 bytes.</p>
<p>The reasons for the bloat are complex, but are mostly caused by
the fact that new-style objects use <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> in order to be
picklable at all. After ample consideration we’ve concluded that
the only way to reduce pickle sizes for new-style objects is to
add new opcodes to the pickle protocol. The net result is that
with the new protocol, the pickle size in the above example is 35
(two extra bytes are used at the start to indicate the protocol
version, although this isn’t strictly necessary).</p>
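<p>The size difference can be observed with a short script. This is a minimal sketch written for Python 3, which this PEP predates; there every class is new-style and the exact byte counts differ from the figures quoted above, so only the relative sizes are compared.</p>

```python
import pickle


class C:
    """Plain Python 3 class; all Python 3 classes are new-style."""


x = C()
x.foo = 42

# Pickle the same instance under protocol 1 and under protocol 2.
size_protocol_1 = len(pickle.dumps(x, 1))
size_protocol_2 = len(pickle.dumps(x, 2))

# The protocol 2 pickle is expected to be the smaller of the two.
print(size_protocol_1, size_protocol_2)
```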
</section>
<section id="protocol-versions">
<h2><a class="toc-backref" href="#protocol-versions" role="doc-backlink">Protocol versions</a></h2>
<p>Previously, pickling (but not unpickling) distinguished between
text mode and binary mode. By design, binary mode is a
superset of text mode, and unpicklers don’t need to know in
advance whether an incoming pickle uses text mode or binary mode.
The virtual machine used for unpickling is the same regardless of
the mode; certain opcodes simply aren’t used in text mode.</p>
<p>Retroactively, text mode is now called protocol 0, and binary mode
protocol 1. The new protocol is called protocol 2. In the
tradition of pickling protocols, protocol 2 is a superset of
protocol 1. But just so that future pickling protocols aren’t
required to be supersets of the oldest protocols, a new opcode is
inserted at the start of a protocol 2 pickle indicating that it is
using protocol 2. To date, each release of Python has been able to
read pickles written by all previous releases. Of course pickles
written under protocol <em>N</em> can’t be read by versions of Python
earlier than the one that introduced protocol <em>N</em>.</p>
<p>Several functions, methods and constructors used for pickling used
to take a positional argument named ‘bin’ which was a flag,
defaulting to 0, indicating binary mode. This argument is renamed
to ‘protocol’ and now gives the protocol number, still defaulting
to 0.</p>
<p>It so happens that passing 2 for the ‘bin’ argument in previous
Python versions had the same effect as passing 1. Nevertheless, a
special case is added here: passing a negative number selects the
highest protocol version supported by a particular implementation.
This works in previous Python versions, too, and so can be used to
select the highest protocol available in a way that’s both backward
and forward compatible. In addition, a new module constant
<code class="docutils literal notranslate"><span class="pre">HIGHEST_PROTOCOL</span></code> is supplied by both <code class="docutils literal notranslate"><span class="pre">pickle</span></code> and <code class="docutils literal notranslate"><span class="pre">cPickle</span></code>, equal to
the highest protocol number the module can read. This is cleaner
than passing -1, but cannot be used before Python 2.3.</p>
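<p>The two ways of selecting the newest protocol, and the marker that opens a protocol 2 pickle, can be checked as follows. This is a sketch for Python 3, where the protocol is passed to <code class="docutils literal notranslate"><span class="pre">pickle.dumps()</span></code>; the sample dictionary is only an illustration.</p>

```python
import pickle

data = {"answer": 42}

# A negative protocol number selects the highest protocol available.
newest = pickle.dumps(data, -1)
explicit = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# A protocol 2 pickle opens with the PROTO opcode (0x80) and the version byte.
proto2 = pickle.dumps(data, 2)
starts_with_marker = proto2[:2] == b"\x80\x02"

# Protocol 0 (the former text mode) carries no such marker.
text_mode = pickle.dumps(data, 0)
```

<p>The unpickler needs no hint about which protocol was used: <code class="docutils literal notranslate"><span class="pre">pickle.loads()</span></code> accepts any of the byte strings above.</p>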
<p>The <code class="docutils literal notranslate"><span class="pre">pickle.py</span></code> module has supported passing the ‘bin’ value as a
keyword argument rather than a positional argument. (This is not
recommended, since <code class="docutils literal notranslate"><span class="pre">cPickle</span></code> only accepts positional arguments, but
it works…) Passing ‘bin’ as a keyword argument is deprecated,
and a <code class="docutils literal notranslate"><span class="pre">PendingDeprecationWarning</span></code> is issued in this case. You have
to invoke the Python interpreter with <code class="docutils literal notranslate"><span class="pre">-Wa</span></code> or a variation on that
to see <code class="docutils literal notranslate"><span class="pre">PendingDeprecationWarning</span></code> messages. In Python 2.4, the
warning class may be upgraded to <code class="docutils literal notranslate"><span class="pre">DeprecationWarning</span></code>.</p>
</section>
<section id="security-issues">
<h2><a class="toc-backref" href="#security-issues" role="doc-backlink">Security issues</a></h2>
<p>In previous versions of Python, unpickling would do a “safety
check” on certain operations, refusing to call functions or
constructors that weren’t marked as “safe for unpickling” by
either having an attribute <code class="docutils literal notranslate"><span class="pre">__safe_for_unpickling__</span></code> set to 1, or by
being registered in a global registry, <code class="docutils literal notranslate"><span class="pre">copy_reg.safe_constructors</span></code>.</p>
<p>This feature gives a false sense of security: nobody has ever done
the necessary, extensive, code audit to prove that unpickling
untrusted pickles cannot invoke unwanted code, and in fact bugs in
the Python 2.2 <code class="docutils literal notranslate"><span class="pre">pickle.py</span></code> module make it easy to circumvent these
security measures.</p>
<p>We firmly believe that, on the Internet, it is better to know that
you are using an insecure protocol than to trust a protocol to be
secure whose implementation hasn’t been thoroughly checked. Even
high quality implementations of widely used protocols are
routinely found flawed; Python’s pickle implementation simply
cannot make such guarantees without a much larger time investment.
Therefore, as of Python 2.3, all safety checks on unpickling are
officially removed, and replaced with this warning:</p>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>Do not unpickle data received from an untrusted or
unauthenticated source.</p>
</div>
<p>The same warning applies to previous Python versions, despite the
presence of safety checks there.</p>
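<p>A harmless sketch of why the warning matters, written for Python 3: a pickle can name any importable callable, and the unpickler calls it. The class name <code class="docutils literal notranslate"><span class="pre">Surprise</span></code> is hypothetical, and the built-in <code class="docutils literal notranslate"><span class="pre">len</span></code> stands in for a function an attacker might choose.</p>

```python
import pickle


class Surprise:
    """Hypothetical class whose pickle asks the unpickler to call len()."""

    def __reduce__(self):
        # (function, arguments): the unpickler will evaluate len("abc").
        return (len, ("abc",))


payload = pickle.dumps(Surprise(), 2)

# Loading the payload does not rebuild a Surprise instance;
# it returns whatever the named callable returns.
result = pickle.loads(payload)
```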
</section>
<section id="extended-reduce-api">
<h2><a class="toc-backref" href="#extended-reduce-api" role="doc-backlink">Extended <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> API</a></h2>
<p>There are several APIs that a class can use to control pickling.
Perhaps the most popular of these are <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> and
<code class="docutils literal notranslate"><span class="pre">__setstate__</span></code>; but the most powerful one is <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>. (There’s
also <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code>, and we’re adding <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code> below.)</p>
<p>There are several ways to provide <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> functionality: a
class can implement a <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> method or a <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> method
(see next section), or a reduce function can be declared in
<code class="docutils literal notranslate"><span class="pre">copy_reg</span></code> (<code class="docutils literal notranslate"><span class="pre">copy_reg.dispatch_table</span></code> maps classes to functions). The
return values are interpreted exactly the same, though, and we’ll
refer to these collectively as <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>.</p>
<p><strong>Important:</strong> pickling of classic class instances does not look for a
<code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> or <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> method or a reduce function in the
<code class="docutils literal notranslate"><span class="pre">copy_reg</span></code> dispatch table, so that a classic class cannot provide
<code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> functionality in the sense intended here. A classic
class must use <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> and/or <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> to customize
pickling. These are described below.</p>
<p><code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> must return either a string or a tuple. If it returns
a string, this is an object whose state is not to be pickled, but
instead a reference to an equivalent object referenced by name.
Surprisingly, the string returned by <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> should be the
object’s local name (relative to its module); the <code class="docutils literal notranslate"><span class="pre">pickle</span></code> module
searches the module namespace to determine the object’s module.</p>
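<p>The string form suits module-level singletons. The sketch below is written for Python 3 and assumes it runs as an importable module or as a script, so that the name can be looked up again at unpickling time; <code class="docutils literal notranslate"><span class="pre">_Sentinel</span></code> and <code class="docutils literal notranslate"><span class="pre">MISSING</span></code> are hypothetical names.</p>

```python
import pickle


class _Sentinel:
    """Hypothetical singleton type that is pickled by reference."""

    def __reduce__(self):
        # A string return value means "store a reference to the
        # module-level object bound to this name", not the object's state.
        return "MISSING"


MISSING = _Sentinel()

# Unpickling looks the name up again, so the identical object comes back.
restored = pickle.loads(pickle.dumps(MISSING, 2))
```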
<p>The rest of this section is concerned with the tuple returned by
<code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>. It is a variable size tuple, of length 2 through 5.
The first two items (function and arguments) are required. The
remaining items are optional and may be left off from the end;
giving <code class="docutils literal notranslate"><span class="pre">None</span></code> for the value of an optional item acts the same as
leaving it off. The last two items are new in this PEP. The items
are, in order:</p>
<table class="docutils align-default">
<tbody>
<tr class="row-odd"><td>function</td>
<td>Required.<p>A callable object (not necessarily a function) called
to create the initial version of the object; state
may be added to the object later to fully reconstruct
the pickled state. This function must itself be
picklable. See the section about <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code> for a
special case (new in this PEP) here.</p>
</td>
</tr>
<tr class="row-even"><td>arguments</td>
<td>Required.<p>A tuple giving the argument list for the function.
As a special case, designed for Zope 2’s
<code class="docutils literal notranslate"><span class="pre">ExtensionClass</span></code>, this may be <code class="docutils literal notranslate"><span class="pre">None</span></code>; in that case,
function should be a class or type, and
<code class="docutils literal notranslate"><span class="pre">function.__basicnew__()</span></code> is called to create the
initial version of the object. This exception is
deprecated.</p>
</td>
</tr>
</tbody>
</table>
<p>Unpickling invokes <code class="docutils literal notranslate"><span class="pre">function(*arguments)</span></code> to create an initial object,
called <em>obj</em> below. If the remaining items are left off, that’s the end
of unpickling for this object and <em>obj</em> is the result. Else <em>obj</em> is
modified at unpickling time by each item specified, as follows.</p>
<table class="docutils align-default">
<tbody>
<tr class="row-odd"><td>state</td>
<td>Optional.<p>Additional state. If this is not <code class="docutils literal notranslate"><span class="pre">None</span></code>, the state is
pickled, and <code class="docutils literal notranslate"><span class="pre">obj.__setstate__(state)</span></code> will be called
when unpickling. If no <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method is
defined, a default implementation is provided, which
assumes that state is a dictionary mapping instance
variable names to their values. The default
implementation calls</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">obj</span><span class="o">.</span><span class="vm">__dict__</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">state</span><span class="p">)</span>
</pre></div>
</div>
<p>or, if the <code class="docutils literal notranslate"><span class="pre">update()</span></code> call fails,</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">state</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
    <span class="nb">setattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
</pre></div>
</div>
</td>
</tr>
<tr class="row-even"><td>listitems</td>
<td>Optional, and new in this PEP.<p>If this is not <code class="docutils literal notranslate"><span class="pre">None</span></code>, it should be an iterator (not a
sequence!) yielding successive list items. These list
items will be pickled, and appended to the object using
either <code class="docutils literal notranslate"><span class="pre">obj.append(item)</span></code> or <code class="docutils literal notranslate"><span class="pre">obj.extend(list_of_items)</span></code>.
This is primarily used for <code class="docutils literal notranslate"><span class="pre">list</span></code> subclasses, but may
be used by other classes as long as they have <code class="docutils literal notranslate"><span class="pre">append()</span></code>
and <code class="docutils literal notranslate"><span class="pre">extend()</span></code> methods with the appropriate signature.
(Whether <code class="docutils literal notranslate"><span class="pre">append()</span></code> or <code class="docutils literal notranslate"><span class="pre">extend()</span></code> is used depends on which
pickle protocol version is used as well as the number
of items to append, so both must be supported.)</p>
</td>
</tr>
<tr class="row-odd"><td>dictitems</td>
<td>Optional, and new in this PEP.<p>If this is not <code class="docutils literal notranslate"><span class="pre">None</span></code>, it should be an iterator (not a
sequence!) yielding successive dictionary items, which
should be tuples of the form <code class="docutils literal notranslate"><span class="pre">(key,</span> <span class="pre">value)</span></code>. These items
will be pickled, and stored to the object using
<code class="docutils literal notranslate"><span class="pre">obj[key]</span> <span class="pre">=</span> <span class="pre">value</span></code>. This is primarily used for <code class="docutils literal notranslate"><span class="pre">dict</span></code>
subclasses, but may be used by other classes as long
as they implement <code class="docutils literal notranslate"><span class="pre">__setitem__</span></code>.</p>
</td>
</tr>
</tbody>
</table>
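<p>The five-item form can be sketched with a class that is neither a <code class="docutils literal notranslate"><span class="pre">list</span></code> nor a <code class="docutils literal notranslate"><span class="pre">dict</span></code> subclass but offers the required methods. The example is written for Python 3, and the class <code class="docutils literal notranslate"><span class="pre">Registry</span></code> with its attributes is hypothetical.</p>

```python
import pickle


class Registry:
    """Hypothetical container offering append(), extend() and __setitem__()."""

    def __init__(self, name):
        self.name = name
        self.order = []
        self.table = {}

    def append(self, item):
        self.order.append(item)

    def extend(self, items):
        self.order.extend(items)

    def __setitem__(self, key, value):
        self.table[key] = value

    def __reduce__(self):
        # (function, arguments, state, listitems, dictitems)
        # state is None, so no __setstate__ call happens on unpickling;
        # listitems and dictitems must be iterators, not sequences.
        return (
            Registry,
            (self.name,),
            None,
            iter(self.order),
            iter(self.table.items()),
        )


original = Registry("cfg")
original.extend([1, 2, 3])
original["a"] = 1

clone = pickle.loads(pickle.dumps(original, 2))
```

<p>On unpickling, <code class="docutils literal notranslate"><span class="pre">Registry("cfg")</span></code> is called first; the list items and dictionary items are then replayed onto the new object through its own methods.</p>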
<p>Note: in Python 2.2 and before, when using <code class="docutils literal notranslate"><span class="pre">cPickle</span></code>, state would be
pickled if present even if it is <code class="docutils literal notranslate"><span class="pre">None</span></code>; the only safe way to avoid
the <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> call was to return a two-tuple from <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>.
(But <code class="docutils literal notranslate"><span class="pre">pickle.py</span></code> would not pickle state if it was <code class="docutils literal notranslate"><span class="pre">None</span></code>.) In Python
2.3, <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> will never be called at unpickling time when
<code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> returns a state with value <code class="docutils literal notranslate"><span class="pre">None</span></code> at pickling time.</p>
<p>A <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation that needs to work both under Python
2.2 and under Python 2.3 could check the variable
<code class="docutils literal notranslate"><span class="pre">pickle.format_version</span></code> to determine whether to use the <em>listitems</em>
and <em>dictitems</em> features. If this value is <code class="docutils literal notranslate"><span class="pre">&gt;=</span> <span class="pre">&quot;2.0&quot;</span></code> then they are
supported. If not, any list or dict items should be incorporated
somehow in the state return value, and the <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method
should be prepared to accept list or dict items as part of the
state (how this is done is up to the application).</p>
</section>
<section id="the-reduce-ex-api">
<h2><a class="toc-backref" href="#the-reduce-ex-api" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> API</a></h2>
<p>It is sometimes useful to know the protocol version when
implementing <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>. This can be done by implementing a
method named <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> instead of <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>. <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code>,
when it exists, is called in preference over <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> (you may
still provide <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> for backwards compatibility). The
<code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> method will be called with a single integer
argument, the protocol version.</p>
<p>The object class implements both <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> and <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code>;
however, if a subclass overrides <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> but not <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code>,
the <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> implementation detects this and calls
<code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>.</p>
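<p>Both behaviours can be sketched in Python 3 with two hypothetical classes: <code class="docutils literal notranslate"><span class="pre">Point</span></code> implements <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> and records the protocol it is handed, while <code class="docutils literal notranslate"><span class="pre">Legacy</span></code> overrides only <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> and relies on the inherited <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> to delegate to it.</p>

```python
import pickle


class Point:
    """Hypothetical class that inspects the protocol version."""

    seen_protocols = []

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce_ex__(self, protocol):
        # The pickler passes the protocol number as the only argument.
        Point.seen_protocols.append(protocol)
        return (Point, (self.x, self.y))


class Legacy:
    """Hypothetical class that overrides __reduce__ only."""

    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        return (Legacy, (self.value,))


point = pickle.loads(pickle.dumps(Point(1, 2), 2))

# The inherited __reduce_ex__ notices the override and calls __reduce__.
legacy = Legacy(7)
delegated = legacy.__reduce_ex__(2)
```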
</section>
<section id="customizing-pickling-absent-a-reduce-implementation">
<h2><a class="toc-backref" href="#customizing-pickling-absent-a-reduce-implementation" role="doc-backlink">Customizing pickling absent a <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation</a></h2>
<p>If no <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation is available for a particular
class, there are three cases that need to be considered
separately, because they are handled differently:</p>
<ol class="arabic simple">
<li>classic class instances, all protocols</li>
<li>new-style class instances, protocols 0 and 1</li>
<li>new-style class instances, protocol 2</li>
</ol>
<p>Types implemented in C are considered new-style classes. However,
except for the common built-in types, these need to provide a
<code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation in order to be picklable with protocols
0 or 1. Protocol 2 supports built-in types providing
<code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code>, <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> and <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> as well.</p>
<section id="case-1-pickling-classic-class-instances">
<h3><a class="toc-backref" href="#case-1-pickling-classic-class-instances" role="doc-backlink">Case 1: pickling classic class instances</a></h3>
<p>This case is the same for all protocols, and is unchanged from
Python 2.1.</p>
<p>For classic classes, <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> is not used. Instead, classic
classes can customize their pickling by providing methods named
<code class="docutils literal notranslate"><span class="pre">__getstate__</span></code>, <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> and <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code>. Absent these, a
default pickling strategy for classic class instances is
implemented that works as long as all instance variables are
picklable. This default strategy is documented in terms of
default implementations of <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> and <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code>.</p>
<p>The primary way to customize pickling of classic class instances
is by specifying <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> and/or <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> methods. It is
fine if a class implements one of these but not the other, as long
as it is compatible with the default version.</p>
<section id="the-getstate-method">
<h4><a class="toc-backref" href="#the-getstate-method" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method</a></h4>
<p>The <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method should return a picklable value
representing the object's state without referencing the object
itself. If no <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method exists, a default
implementation is used that returns <code class="docutils literal notranslate"><span class="pre">self.__dict__</span></code>.</p>
</section>
<section id="the-setstate-method">
<h4><a class="toc-backref" href="#the-setstate-method" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method</a></h4>
<p>The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method should take one argument; it will be
called with the value returned by <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> (or its default
implementation).</p>
<p>If no <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method exists, a default implementation is
provided that assumes the state is a dictionary mapping instance
variable names to values. The default implementation tries two
things:</p>
<ul class="simple">
<li>First, it tries to call <code class="docutils literal notranslate"><span class="pre">self.__dict__.update(state)</span></code>.</li>
<li>If the <code class="docutils literal notranslate"><span class="pre">update()</span></code> call fails with a <code class="docutils literal notranslate"><span class="pre">RuntimeError</span></code> exception, it
calls <code class="docutils literal notranslate"><span class="pre">setattr(self,</span> <span class="pre">key,</span> <span class="pre">value)</span></code> for each <code class="docutils literal notranslate"><span class="pre">(key,</span> <span class="pre">value)</span></code> pair in
the state dictionary. This only happens when unpickling in
restricted execution mode (see the <code class="docutils literal notranslate"><span class="pre">rexec</span></code> standard library
module).</li>
</ul>
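<p>The two-step default can be sketched as a plain function. This is only an
illustration of the behaviour described above, not the code used by the
<code class="docutils literal notranslate"><span class="pre">pickle</span></code> module; the names <code class="docutils literal notranslate"><span class="pre">default_setstate</span></code> and <code class="docutils literal notranslate"><span class="pre">Account</span></code> are
hypothetical.</p>

```python
def default_setstate(obj, state):
    """Sketch of the default __setstate__ behaviour described above."""
    try:
        # Fast path: merge the whole state dictionary at once.
        obj.__dict__.update(state)
    except RuntimeError:
        # Fallback that the text attributes to restricted execution mode:
        # assign the instance variables one at a time.
        for key, value in state.items():
            setattr(obj, key, value)


class Account:  # hypothetical example class
    pass


acct = Account()
default_setstate(acct, {"owner": "guido", "balance": 10})
```

<p>After the call, <code class="docutils literal notranslate"><span class="pre">acct</span></code> carries the two instance variables from the
state dictionary, and no constructor has been involved.</p>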
</section>
<section id="the-getinitargs-method">
<h4><a class="toc-backref" href="#the-getinitargs-method" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> method</a></h4>
<p>The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method (or its default implementation) requires
that a new object already exists so that its <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method
can be called. The point is to create a new object that isn't
fully initialized; in particular, the class's <code class="docutils literal notranslate"><span class="pre">__init__</span></code> method
should not be called if possible.</p>
<p>These are the possibilities:</p>
<ul class="simple">
<li>Normally, the following trick is used: create an instance of a
trivial classic class (one without any methods or instance
variables) and then use <code class="docutils literal notranslate"><span class="pre">__class__</span></code> assignment to change its
class to the desired class. This creates an instance of the
desired class with an empty <code class="docutils literal notranslate"><span class="pre">__dict__</span></code> whose <code class="docutils literal notranslate"><span class="pre">__init__</span></code> has not
been called.</li>
<li>However, if the class has a method named <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code>, the
above trick is not used, and a class instance is created by
using the tuple returned by <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> as an argument
list to the class constructor. This is done even if
<code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> returns an empty tuple — a <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code>
method that returns <code class="docutils literal notranslate"><span class="pre">()</span></code> is not equivalent to not having
<code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> at all. <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> <em>must</em> return a
tuple.</li>
<li>In restricted execution mode, the trick from the first bullet
doesn't work; in this case, the class constructor is called
with an empty argument list if no <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> method
exists. This means that for instances of a classic class to be
unpickled in restricted execution mode, the class must either
implement <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> or its constructor (i.e., its
<code class="docutils literal notranslate"><span class="pre">__init__</span></code> method) must be callable without arguments.</li>
</ul>
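<p>The first two possibilities can be modelled as follows. This is a sketch
under the assumption of a current Python 3 interpreter, where ordinary classes
stand in for classic classes; <code class="docutils literal notranslate"><span class="pre">blank_instance</span></code>, <code class="docutils literal notranslate"><span class="pre">make_instance</span></code> and
<code class="docutils literal notranslate"><span class="pre">Point</span></code> are hypothetical names, and the code is not taken from the
<code class="docutils literal notranslate"><span class="pre">pickle</span></code> module.</p>

```python
class _Empty:
    """Trivial class with no methods or instance variables."""


def blank_instance(cls):
    """Create an instance of cls without running cls.__init__."""
    obj = _Empty()
    obj.__class__ = cls  # retarget the empty instance to the desired class
    return obj


def make_instance(cls, initargs=None):
    """Model of the choice between the trick and __getinitargs__."""
    if initargs is not None:
        # A tuple was supplied by __getinitargs__, possibly the empty
        # tuple: the constructor is called with it.
        return cls(*initargs)
    return blank_instance(cls)


class Point:  # hypothetical example class
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


blank = make_instance(Point)           # __init__ not called, empty __dict__
built = make_instance(Point, (1, 2))   # __init__ called with (1, 2)
empty_args = make_instance(Point, ())  # __init__ called with no arguments
```

<p>The difference between <code class="docutils literal notranslate"><span class="pre">blank</span></code> and <code class="docutils literal notranslate"><span class="pre">empty_args</span></code> mirrors the remark that
returning <code class="docutils literal notranslate"><span class="pre">()</span></code> is not the same as having no <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> at all.</p>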
</section>
</section>
<section id="case-2-pickling-new-style-class-instances-using-protocols-0-or-1">
<h3><a class="toc-backref" href="#case-2-pickling-new-style-class-instances-using-protocols-0-or-1" role="doc-backlink">Case 2: pickling new-style class instances using protocols 0 or 1</a></h3>
<p>This case is unchanged from Python 2.2. For better pickling of
new-style class instances when backwards compatibility is not an
issue, protocol 2 should be used; see case 3 below.</p>
<p>New-style classes, whether implemented in C or in Python, inherit
a default <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation from the universal base class
object.</p>
<p>This default <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation is not used for those
built-in types for which the <code class="docutils literal notranslate"><span class="pre">pickle</span></code> module has built-in support.
Here's a full list of those types:</p>
<ul class="simple">
<li>Concrete built-in types: <code class="docutils literal notranslate"><span class="pre">NoneType</span></code>, <code class="docutils literal notranslate"><span class="pre">bool</span></code>, <code class="docutils literal notranslate"><span class="pre">int</span></code>, <code class="docutils literal notranslate"><span class="pre">float</span></code>, <code class="docutils literal notranslate"><span class="pre">complex</span></code>,
<code class="docutils literal notranslate"><span class="pre">str</span></code>, <code class="docutils literal notranslate"><span class="pre">unicode</span></code>, <code class="docutils literal notranslate"><span class="pre">tuple</span></code>, <code class="docutils literal notranslate"><span class="pre">list</span></code>, <code class="docutils literal notranslate"><span class="pre">dict</span></code>. (Complex is supported by
virtue of a <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation registered in <code class="docutils literal notranslate"><span class="pre">copy_reg</span></code>.)
In Jython, <code class="docutils literal notranslate"><span class="pre">PyStringMap</span></code> is also included in this list.</li>
<li>Classic instances.</li>
<li>Classic class objects, Python function objects, built-in
function and method objects, and new-style type objects (==
new-style class objects). These are pickled by name, not by
value: at unpickling time, a reference to an object with the
same name (the fully qualified module name plus the variable
name in that module) is substituted.</li>
</ul>
<p>The default <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation will fail at pickling time
for built-in types not mentioned above, and for new-style classes
implemented in C: if they want to be picklable, they must supply
a custom <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation under protocols 0 and 1.</p>
<p>For new-style classes implemented in Python, the default
<code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation (<code class="docutils literal notranslate"><span class="pre">copy_reg._reduce</span></code>) works as follows:</p>
<p>Let <code class="docutils literal notranslate"><span class="pre">D</span></code> be the class of the object to be pickled. First, find the
nearest base class that is implemented in C (either as a
built-in type or as a type defined by an extension class). Call
this base class <code class="docutils literal notranslate"><span class="pre">B</span></code>.
Unless <code class="docutils literal notranslate"><span class="pre">B</span></code> is the class <code class="docutils literal notranslate"><span class="pre">object</span></code>, instances of class <code class="docutils literal notranslate"><span class="pre">B</span></code> must be
picklable, either by having built-in support (as defined in the
above three bullet points), or by having a non-default
<code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation. <code class="docutils literal notranslate"><span class="pre">B</span></code> must not be the same class as <code class="docutils literal notranslate"><span class="pre">D</span></code>
(if it were, it would mean that <code class="docutils literal notranslate"><span class="pre">D</span></code> is not implemented in Python).</p>
<p>The callable produced by the default <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> is
<code class="docutils literal notranslate"><span class="pre">copy_reg._reconstructor</span></code>, and its arguments tuple is
<code class="docutils literal notranslate"><span class="pre">(D,</span> <span class="pre">B,</span> <span class="pre">basestate)</span></code>, where <code class="docutils literal notranslate"><span class="pre">basestate</span></code> is <code class="docutils literal notranslate"><span class="pre">None</span></code> if <code class="docutils literal notranslate"><span class="pre">B</span></code> is the builtin
object class, and <code class="docutils literal notranslate"><span class="pre">basestate</span></code> is</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">basestate</span> <span class="o">=</span> <span class="n">B</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
</pre></div>
</div>
<p>if <code class="docutils literal notranslate"><span class="pre">B</span></code> is not the builtin object class. This is geared toward
pickling subclasses of builtin types, where, for example,
<code class="docutils literal notranslate"><span class="pre">list(some_list_subclass_instance)</span></code> produces “the list part” of
the <code class="docutils literal notranslate"><span class="pre">list</span></code> subclass instance.</p>
<p>The object is recreated at unpickling time by
<code class="docutils literal notranslate"><span class="pre">copy_reg._reconstructor</span></code>, like so:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">obj</span> <span class="o">=</span> <span class="n">B</span><span class="o">.</span><span class="fm">__new__</span><span class="p">(</span><span class="n">D</span><span class="p">,</span> <span class="n">basestate</span><span class="p">)</span>
<span class="n">B</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">basestate</span><span class="p">)</span>
</pre></div>
</div>
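<p>Putting the pieces together, the round trip for a subclass of a built-in
type can be sketched as below. The function <code class="docutils literal notranslate"><span class="pre">reconstruct</span></code> is a hypothetical
stand-in that follows the two statements above, with an extra branch for the
case where <code class="docutils literal notranslate"><span class="pre">B</span></code> is <code class="docutils literal notranslate"><span class="pre">object</span></code> (an assumption about how a <code class="docutils literal notranslate"><span class="pre">None</span></code>
<code class="docutils literal notranslate"><span class="pre">basestate</span></code> is handled); <code class="docutils literal notranslate"><span class="pre">TaggedList</span></code> and <code class="docutils literal notranslate"><span class="pre">Plain</span></code> are example classes.</p>

```python
def reconstruct(D, B, basestate):
    """Rebuild an instance of D from the state of its C-implemented base B."""
    if B is object:
        # No base state to restore; just allocate the instance.
        return object.__new__(D)
    obj = B.__new__(D, basestate)
    B.__init__(obj, basestate)
    return obj


class TaggedList(list):  # hypothetical subclass of a built-in type
    pass


class Plain:  # hypothetical class whose nearest C base is object
    pass


original = TaggedList([1, 2, 3])
basestate = list(original)  # "the list part", i.e. B(obj)
clone = reconstruct(TaggedList, list, basestate)
plain = reconstruct(Plain, object, None)
```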
<p>Objects using the default <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation can customize
it by defining <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> and/or <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> methods. These
work almost the same as described for classic classes above, except
that if <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> returns an object (of any type) whose value is
considered false (e.g. <code class="docutils literal notranslate"><span class="pre">None</span></code>, or a number that is zero, or an empty
sequence or mapping), this state is not pickled and <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code>
will not be called at all. If <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> exists and returns a
true value, that value becomes the third element of the tuple
returned by the default <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>, and at unpickling time the
value is passed to <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code>. If <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> does not exist,
but <code class="docutils literal notranslate"><span class="pre">obj.__dict__</span></code> exists, then <code class="docutils literal notranslate"><span class="pre">obj.__dict__</span></code> becomes the third
element of the tuple returned by <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>, and again at
unpickling time the value is passed to <code class="docutils literal notranslate"><span class="pre">obj.__setstate__</span></code>. The
default <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> is the same as that for classic classes,
described above.</p>
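<p>The effect of a false state can be observed by calling the reduction
directly. The sketch below assumes a current Python 3 interpreter, where every
class is new-style and <code class="docutils literal notranslate"><span class="pre">__reduce_ex__(1)</span></code> still follows the protocol 0 and 1
rules described here; <code class="docutils literal notranslate"><span class="pre">Quiet</span></code> and <code class="docutils literal notranslate"><span class="pre">Loud</span></code> are hypothetical classes.</p>

```python
class Quiet:  # hypothetical example class
    def __getstate__(self):
        return {}  # a false value: no state is recorded

    def __setstate__(self, state):
        self.restored = True


class Loud(Quiet):
    def __getstate__(self):
        return {"volume": 11}  # a true value: becomes the third element


quiet_reduced = Quiet().__reduce_ex__(1)
loud_reduced = Loud().__reduce_ex__(1)
```

<p>Here <code class="docutils literal notranslate"><span class="pre">quiet_reduced</span></code> has only two elements, so nothing would be passed to
<code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> on unpickling, while <code class="docutils literal notranslate"><span class="pre">loud_reduced</span></code> carries the state as its
third element.</p>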
<p>Note that this strategy ignores slots. Instances of new-style
classes that have slots but no <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method cannot be
pickled by protocols 0 and 1; the code explicitly checks for
this condition.</p>
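<p>A short demonstration of the slots check, again assuming a current Python 3
interpreter that retains this behaviour; <code class="docutils literal notranslate"><span class="pre">Slotted</span></code> is a hypothetical
class.</p>

```python
class Slotted:  # hypothetical class with slots and no __getstate__
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


protocol_1_error = None
try:
    Slotted(1).__reduce_ex__(1)  # reduction as used by protocols 0 and 1
except TypeError as exc:
    protocol_1_error = exc

# The protocol 2 reduction succeeds and records the slot value.
protocol_2_state = Slotted(1).__reduce_ex__(2)[2]
```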
<p>Note that pickling new-style class instances ignores <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code>,
if it exists, under all protocols. <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> is
useful only for classic classes.</p>
</section>
<section id="case-3-pickling-new-style-class-instances-using-protocol-2">
<h3><a class="toc-backref" href="#case-3-pickling-new-style-class-instances-using-protocol-2" role="doc-backlink">Case 3: pickling new-style class instances using protocol 2</a></h3>
<p>Under protocol 2, the default <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation inherited
from the object base class is <em>ignored</em>. Instead, a different
default implementation is used, which allows more efficient
pickling of new-style class instances than possible with protocols
0 or 1, at the cost of backward incompatibility with Python 2.2
(meaning no more than that a protocol 2 pickle cannot be unpickled
before Python 2.3).</p>
<p>The customization uses three special methods: <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code>,
<code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> and <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code> (note that <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> is again
ignored). It is fine if a class implements one or more but not all
of these, as long as it is compatible with the default
implementations.</p>
<section id="id1">
<h4><a class="toc-backref" href="#id1" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method</a></h4>
<p>The <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method should return a picklable value
representing the object's state without referencing the object
itself. If no <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method exists, a default
implementation is used which is described below.</p>
<p>There's a subtle difference between classic and new-style
classes here: if a classic class's <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> returns <code class="docutils literal notranslate"><span class="pre">None</span></code>,
<code class="docutils literal notranslate"><span class="pre">self.__setstate__(None)</span></code> will be called as part of unpickling.
But if a new-style class's <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> returns <code class="docutils literal notranslate"><span class="pre">None</span></code>, its
<code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> won't be called at all as part of unpickling.</p>
<p>If no <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method exists, a default state is computed.
There are several cases:</p>
<ul class="simple">
<li>For a new-style class that has no instance <code class="docutils literal notranslate"><span class="pre">__dict__</span></code> and no
<code class="docutils literal notranslate"><span class="pre">__slots__</span></code>, the default state is <code class="docutils literal notranslate"><span class="pre">None</span></code>.</li>
<li>For a new-style class that has an instance <code class="docutils literal notranslate"><span class="pre">__dict__</span></code> and no
<code class="docutils literal notranslate"><span class="pre">__slots__</span></code>, the default state is <code class="docutils literal notranslate"><span class="pre">self.__dict__</span></code>.</li>
<li>For a new-style class that has an instance <code class="docutils literal notranslate"><span class="pre">__dict__</span></code> and
<code class="docutils literal notranslate"><span class="pre">__slots__</span></code>, the default state is a tuple consisting of two
dictionaries: <code class="docutils literal notranslate"><span class="pre">self.__dict__</span></code>, and a dictionary mapping slot
names to slot values. Only slots that have a value are
included in the latter.</li>
<li>For a new-style class that has <code class="docutils literal notranslate"><span class="pre">__slots__</span></code> and no instance
<code class="docutils literal notranslate"><span class="pre">__dict__</span></code>, the default state is a tuple whose first item is
<code class="docutils literal notranslate"><span class="pre">None</span></code> and whose second item is a dictionary mapping slot names
to slot values, as described in the previous bullet.</li>
</ul>
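<p>The four cases can be checked with a small helper that extracts the state
element of the protocol 2 reduction. This is a sketch assuming a current
Python 3 interpreter; <code class="docutils literal notranslate"><span class="pre">default_state</span></code> and the four classes are hypothetical
names. A Python class can only lack an instance <code class="docutils literal notranslate"><span class="pre">__dict__</span></code> by declaring
<code class="docutils literal notranslate"><span class="pre">__slots__</span></code>, so the first case is approximated with an empty slots tuple.</p>

```python
def default_state(obj):
    """Return the state element of the protocol 2 reduce value."""
    return obj.__reduce_ex__(2)[2]


class NoState:  # neither an instance __dict__ nor any slots
    __slots__ = ()


class DictOnly:  # instance __dict__, no slots
    pass


class Both:  # slots plus an instance __dict__
    __slots__ = ("a", "__dict__")


class SlotsOnly:  # slots, no instance __dict__
    __slots__ = ("a", "b")


d = DictOnly()
d.n = 1
both = Both()
both.a = 1
both.extra = 2
s = SlotsOnly()
s.a = 1  # slot "b" is left unset, so it is omitted from the state
```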
</section>
<section id="id2">
<h4><a class="toc-backref" href="#id2" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method</a></h4>
<p>The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method should take one argument; it will be
called with the value returned by <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> or with the
default state described above if no <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method is
defined.</p>
<p>If no <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method exists, a default implementation is
provided that can handle the state returned by the default
<code class="docutils literal notranslate"><span class="pre">__getstate__</span></code>, described above.</p>
</section>
<section id="the-getnewargs-method">
<h4><a class="toc-backref" href="#the-getnewargs-method" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code> method</a></h4>
<p>As for classic classes, the <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method (or its
default implementation) requires that a new object already
exists so that its <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method can be called.</p>
<p>In protocol 2, a new pickling opcode is used that causes a new
object to be created as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">obj</span> <span class="o">=</span> <span class="n">C</span><span class="o">.</span><span class="fm">__new__</span><span class="p">(</span><span class="n">C</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
</pre></div>
</div>
<p>where <code class="docutils literal notranslate"><span class="pre">C</span></code> is the class of the pickled object, and <code class="docutils literal notranslate"><span class="pre">args</span></code> is either
the empty tuple, or the tuple returned by the <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code>
method, if defined. <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code> must return a tuple. The
absence of a <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code> method is equivalent to the existence
of one that returns <code class="docutils literal notranslate"><span class="pre">()</span></code>.</p>
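<p>As an illustration, the sketch below defines a hypothetical class
<code class="docutils literal notranslate"><span class="pre">Version</span></code> whose <code class="docutils literal notranslate"><span class="pre">__new__</span></code> requires arguments, and replays by hand the
object creation that the new opcode performs. It assumes a current Python 3
interpreter, where <code class="docutils literal notranslate"><span class="pre">__reduce_ex__(2)</span></code> exposes the class and the new-arguments
as the second element of its result.</p>

```python
class Version:  # hypothetical class whose __new__ requires arguments
    def __new__(cls, major, minor):
        self = object.__new__(cls)
        self.major = major
        self.minor = minor
        return self

    def __getnewargs__(self):
        return (self.major, self.minor)  # must be a tuple


v = Version(2, 3)
func, args, state = v.__reduce_ex__(2)[:3]
cls, new_args = args[0], args[1:]
rebuilt = cls.__new__(cls, *new_args)  # obj = C.__new__(C, *args)
```

<p>Without <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code>, <code class="docutils literal notranslate"><span class="pre">new_args</span></code> would be empty and the call to
<code class="docutils literal notranslate"><span class="pre">Version.__new__</span></code> would fail for lack of arguments.</p>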
</section>
</section>
</section>
<section id="the-newobj-unpickling-function">
<h2><a class="toc-backref" href="#the-newobj-unpickling-function" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code> unpickling function</a></h2>
<p>When the unpickling function returned by <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> (the first
item of the returned tuple) has the name <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code>, something
special happens for pickle protocol 2. An unpickling function
named <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code> is assumed to have the following semantics:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">__newobj__</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">cls</span><span class="o">.</span><span class="fm">__new__</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
</pre></div>
</div>
<p>Pickle protocol 2 special-cases an unpickling function with this
name, and emits a pickling opcode that, given cls and args,
will return <code class="docutils literal notranslate"><span class="pre">cls.__new__(cls,</span> <span class="pre">*args)</span></code> without also pickling a
reference to <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code> (this is the same pickling opcode used by
protocol 2 for a new-style class instance when no <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>
implementation exists). This is the main reason why protocol 2
pickles are much smaller than classic pickles. Of course, the
pickling code cannot verify that a function named <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code>
actually has the expected semantics. If you use an unpickling
function named <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code> that returns something different, you
deserve what you get.</p>
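<p>The size difference can be observed with the <code class="docutils literal notranslate"><span class="pre">pickletools</span></code> module. The
sketch below assumes a current Python 3 interpreter and uses
<code class="docutils literal notranslate"><span class="pre">argparse.Namespace</span></code> purely as a convenient importable class that relies on
the default reduction; any such class would serve.</p>

```python
import argparse
import pickle
import pickletools

# Stand-in object: a plain class instance with a single attribute.
value = argparse.Namespace(x=1)

classic = pickle.dumps(value, protocol=1)
compact = pickle.dumps(value, protocol=2)

# Collect the opcode names used by each pickle.
classic_ops = [op.name for op, arg, pos in pickletools.genops(classic)]
compact_ops = [op.name for op, arg, pos in pickletools.genops(compact)]
```

<p>The protocol 2 pickle contains a <code class="docutils literal notranslate"><span class="pre">NEWOBJ</span></code> opcode in place of a reference
to a reconstruction function followed by <code class="docutils literal notranslate"><span class="pre">REDUCE</span></code>, and is shorter as a
result.</p>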
<p>It is safe to use this feature under Python 2.2; there's nothing
in the recommended implementation of <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code> that depends on
Python 2.3.</p>
</section>
<section id="the-extension-registry">
<h2><a class="toc-backref" href="#the-extension-registry" role="doc-backlink">The extension registry</a></h2>
<p>Protocol 2 supports a new mechanism to reduce the size of pickles.</p>
<p>When class instances (classic or new-style) are pickled, the full
name of the class (module name including package name, and class
name) is included in the pickle. Especially for applications that
generate many small pickles, this is a lot of overhead that has to
be repeated in each pickle. For large pickles, when using
protocol 1, repeated references to the same class name are
compressed using the “memo” feature; but each class name must be
spelled in full at least once per pickle, and this causes a lot of
overhead for small pickles.</p>
<p>The extension registry allows one to represent the most frequently
used names by small integers, which are pickled very efficiently:
an extension code in the range 1 through 255 requires only two bytes
including the opcode, one in the range 256 through 65535 requires only
three bytes including the opcode.</p>
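<p>The byte counts follow from the opcode plus a fixed-width operand. The
helper below is a hypothetical sketch, not the pickler's own code; it relies
on the <code class="docutils literal notranslate"><span class="pre">EXT1</span></code>, <code class="docutils literal notranslate"><span class="pre">EXT2</span></code> and <code class="docutils literal notranslate"><span class="pre">EXT4</span></code> opcode constants exported by the
<code class="docutils literal notranslate"><span class="pre">pickle</span></code> module, and includes the four-byte operand for larger codes for
completeness.</p>

```python
import pickle


def encode_extension(code):
    """Return the opcode plus operand bytes for an extension code."""
    if code in range(1, 0x100):
        return pickle.EXT1 + code.to_bytes(1, "little")   # 2 bytes total
    if code in range(0x100, 0x10000):
        return pickle.EXT2 + code.to_bytes(2, "little")   # 3 bytes total
    if code in range(0x10000, 0x80000000):
        return pickle.EXT4 + code.to_bytes(4, "little")   # 5 bytes total
    raise ValueError("extension code out of range: %r" % (code,))


sizes = [len(encode_extension(c)) for c in (1, 255, 256, 65535, 65536)]
```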
<p>One of the design goals of the pickle protocol is to make pickles
“context-free”: as long as you have installed the modules
containing the classes referenced by a pickle, you can unpickle
it, without needing to import any of those classes ahead of time.</p>
<p>Unbridled use of extension codes could jeopardize this desirable
property of pickles. Therefore, the main use of extension codes
is reserved for a set of codes to be standardized by some
standard-setting body. This being Python, the standard-setting
body is the PSF. From time to time, the PSF will decide on a
table mapping extension codes to class names (or occasionally
names of other global objects; functions are also eligible). This
table will be incorporated in the next Python release(s).</p>
<p>However, for some applications, like Zope, context-free pickles
are not a requirement, and waiting for the PSF to standardize
some codes may not be practical. Two solutions are offered for
such applications.</p>
<p>First, a few ranges of extension codes are reserved for private
use. Any application can register codes in these ranges.
Two applications exchanging pickles using codes in these ranges
need to have some out-of-band mechanism to agree on the mapping
between extension codes and names.</p>
<p>Second, some large Python projects (e.g. Zope) can be assigned a
range of extension codes outside the “private use” range that they
can assign as they see fit.</p>
<p>The extension registry is defined as a mapping between extension
codes and names. When an extension code is unpickled, it ends up
producing an object, but this object is obtained by interpreting the
name as a module name followed by a class (or function) name. The
mapping from names to objects is cached. It is quite possible
that certain names cannot be imported; that should not be a
problem as long as no pickle containing a reference to such names
has to be unpickled. (The same issue already exists for direct
references to such names in pickles that use protocols 0 or 1.)</p>
<p>Here is the proposed initial assignment of extension code ranges:</p>
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head">First</th>
<th class="head">Last</th>
<th class="head">Count</th>
<th class="head">Purpose</th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td>0</td>
<td>0</td>
<td>1</td>
<td>Reserved — will never be used</td>
</tr>
<tr class="row-odd"><td>1</td>
<td>127</td>
<td>127</td>
<td>Reserved for Python standard library</td>
</tr>
<tr class="row-even"><td>128</td>
<td>191</td>
<td>64</td>
<td>Reserved for Zope</td>
</tr>
<tr class="row-odd"><td>192</td>
<td>239</td>
<td>48</td>
<td>Reserved for 3rd parties</td>
</tr>
<tr class="row-even"><td>240</td>
<td>255</td>
<td>16</td>
<td>Reserved for private use (will never be assigned)</td>
</tr>
<tr class="row-odd"><td>256</td>
<td><em>MAX</em></td>
<td><em>MAX</em> - 255</td>
<td>Reserved for future assignment</td>
</tr>
</tbody>
</table>
<p><em>MAX</em> stands for 2147483647, or <code class="docutils literal notranslate"><span class="pre">2**31-1</span></code>. This is a hard limitation
of the protocol as currently defined.</p>
<p>No specific extension codes have been assigned yet.</p>
<section id="extension-registry-api">
<h3><a class="toc-backref" href="#extension-registry-api" role="doc-backlink">Extension registry API</a></h3>
<p>The extension registry is maintained as private global variables
in the <code class="docutils literal notranslate"><span class="pre">copy_reg</span></code> module. The following three functions are defined
in this module to manipulate the registry:</p>
<dl class="simple">
<dt><code class="docutils literal notranslate"><span class="pre">add_extension(module,</span> <span class="pre">name,</span> <span class="pre">code)</span></code></dt><dd>Register an extension code. The <em>module</em> and <em>name</em> arguments
must be strings; <em>code</em> must be an <code class="docutils literal notranslate"><span class="pre">int</span></code> in the inclusive range 1
through <em>MAX</em>. This must either register a new <code class="docutils literal notranslate"><span class="pre">(module,</span> <span class="pre">name)</span></code>
pair to a new code, or be a redundant repeat of a previous
call that was not canceled by a <code class="docutils literal notranslate"><span class="pre">remove_extension()</span></code> call; a
<code class="docutils literal notranslate"><span class="pre">(module,</span> <span class="pre">name)</span></code> pair may not be mapped to more than one code,
nor may a code be mapped to more than one <code class="docutils literal notranslate"><span class="pre">(module,</span> <span class="pre">name)</span></code>
pair.</dd>
<dt><code class="docutils literal notranslate"><span class="pre">remove_extension(module,</span> <span class="pre">name,</span> <span class="pre">code)</span></code></dt><dd>Arguments are as for <code class="docutils literal notranslate"><span class="pre">add_extension()</span></code>. Remove a previously
registered mapping between <code class="docutils literal notranslate"><span class="pre">(module,</span> <span class="pre">name)</span></code> and <em>code</em>.</dd>
<dt><code class="docutils literal notranslate"><span class="pre">clear_extension_cache()</span></code></dt><dd>The implementation of extension codes may use a cache to speed
up loading objects that are named frequently. This cache can
be emptied (removing references to cached objects) by calling
this function.</dd>
</dl>
<p>Note that the API does not enforce the standard range assignments.
It is up to applications to respect these.</p>
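<p>Since the API leaves the range assignments to the caller, an application
might wrap <code class="docutils literal notranslate"><span class="pre">add_extension()</span></code> in its own check. The sketch below assumes a
current Python 3 interpreter, where the module is named <code class="docutils literal notranslate"><span class="pre">copyreg</span></code>;
<code class="docutils literal notranslate"><span class="pre">add_private_extension</span></code> and <code class="docutils literal notranslate"><span class="pre">PRIVATE_USE</span></code> are hypothetical names, and
<code class="docutils literal notranslate"><span class="pre">collections.OrderedDict</span></code> is used only as an example of a global object
pickled by name.</p>

```python
import copyreg  # named copy_reg in the Python 2 releases this PEP targets
import pickle
from collections import OrderedDict

PRIVATE_USE = range(240, 256)  # private-use codes from the table above


def add_private_extension(module, name, code):
    """Register code only if it lies in the private-use range."""
    if code not in PRIVATE_USE:
        raise ValueError("code %d is outside the private-use range" % code)
    copyreg.add_extension(module, name, code)


plain = pickle.dumps(OrderedDict, protocol=2)  # full name spelled out
add_private_extension("collections", "OrderedDict", 240)
try:
    short = pickle.dumps(OrderedDict, protocol=2)  # extension code instead
    loaded = pickle.loads(short)
finally:
    # Undo the registration so other code is unaffected.
    copyreg.remove_extension("collections", "OrderedDict", 240)
    copyreg.clear_extension_cache()
```

<p>A pickle such as <code class="docutils literal notranslate"><span class="pre">short</span></code> can only be loaded by a process that has
registered the same mapping, which is the out-of-band agreement that private-use
codes require.</p>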
</section>
</section>
<section id="the-copy-module">
<h2><a class="toc-backref" href="#the-copy-module" role="doc-backlink">The copy module</a></h2>
<p>Traditionally, the <code class="docutils literal notranslate"><span class="pre">copy</span></code> module has supported an extended subset of
the pickling APIs for customizing the <code class="docutils literal notranslate"><span class="pre">copy()</span></code> and <code class="docutils literal notranslate"><span class="pre">deepcopy()</span></code>
operations.</p>
<p>In particular, besides checking for a <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> or <code class="docutils literal notranslate"><span class="pre">__deepcopy__</span></code>
method, <code class="docutils literal notranslate"><span class="pre">copy()</span></code> and <code class="docutils literal notranslate"><span class="pre">deepcopy()</span></code> have always looked for <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code>,
and for classic classes, have looked for <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code>,
<code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> and <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code>.</p>
<p>In Python 2.2, the default <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> inherited from object made
copying simple new-style classes possible, but slots and various
other special cases were not covered.</p>
<p>In Python 2.3, several changes are made to the <code class="docutils literal notranslate"><span class="pre">copy</span></code> module:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> is supported (and always called with 2 as the
protocol version argument).</li>
<li>The four- and five-argument return values of <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> are
supported.</li>
<li>Before looking for a <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> method, the
<code class="docutils literal notranslate"><span class="pre">copy_reg.dispatch_table</span></code> is consulted, just like for pickling.</li>
<li>When the <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> method is inherited from object, it is
(unconditionally) replaced by a better one that uses the same
APIs as pickle protocol 2: <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code>, <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code>, and
<code class="docutils literal notranslate"><span class="pre">__setstate__</span></code>, handling <code class="docutils literal notranslate"><span class="pre">list</span></code> and <code class="docutils literal notranslate"><span class="pre">dict</span></code> subclasses, and handling
slots.</li>
</ul>
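<p>The lookup order in the list above can be sketched as follows, assuming a modern Python 3 interpreter (where <code class="docutils literal notranslate"><span class="pre">copy_reg</span></code> is named <code class="docutils literal notranslate"><span class="pre">copyreg</span></code>). The <code class="docutils literal notranslate"><span class="pre">Point</span></code> class and the <code class="docutils literal notranslate"><span class="pre">reduce_point</span></code> function are hypothetical. The protocol number passed to <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> is recorded rather than assumed, because later Python versions pass a value other than the 2 described here.</p>

```python
import copy
import copyreg

seen_protocols = []  # protocol numbers that copy() passed to __reduce_ex__
table_calls = []     # objects that were reduced through the dispatch table


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce_ex__(self, protocol):
        seen_protocols.append(protocol)
        return (Point, (self.x, self.y))


def reduce_point(point):
    # Hypothetical reduction function registered in the dispatch table.
    table_calls.append(point)
    return (Point, (point.x, point.y))


p = Point(1, 2)

# No dispatch-table entry yet: copy() falls back to __reduce_ex__.
via_method = copy.copy(p)

# Once a reduction function is registered, copy() consults the table first.
copyreg.pickle(Point, reduce_point)
try:
    via_table = copy.copy(p)
finally:
    del copyreg.dispatch_table[Point]  # leave the global table untouched
```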
<p>As a consequence of the last change in the list above, certain new-style classes
that were copyable under Python 2.2 are not copyable under Python
2.3. (These classes are also not picklable using pickle protocol
2.) A minimal example of such a class:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">C</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__new__</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">a</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">object</span><span class="o">.</span><span class="fm">__new__</span><span class="p">(</span><span class="bp">cls</span><span class="p">)</span>
</pre></div>
</div>
<p>The problem only occurs when <code class="docutils literal notranslate"><span class="pre">__new__</span></code> is overridden and has at
least one mandatory argument in addition to the class argument.</p>
<p>To fix this, a <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code> method should be added that returns
the appropriate argument tuple (excluding the class).</p>
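<p>A minimal sketch of the problem and the fix, assuming a modern Python 3 interpreter, where the same failure can still be reproduced. The class names <code class="docutils literal notranslate"><span class="pre">Broken</span></code> and <code class="docutils literal notranslate"><span class="pre">Fixed</span></code> and the attribute <code class="docutils literal notranslate"><span class="pre">a</span></code> are illustrative.</p>

```python
import copy


class Broken(object):
    # __new__ has a mandatory argument besides the class.
    def __new__(cls, a):
        self = object.__new__(cls)
        self.a = a
        return self


class Fixed(Broken):
    # Supply the arguments for __new__, excluding the class itself.
    def __getnewargs__(self):
        return (self.a,)


# Without __getnewargs__, copying calls __new__ with no extra arguments.
try:
    copy.copy(Broken(1))
except TypeError:
    broken_copy_failed = True
else:
    broken_copy_failed = False

original = Fixed(1)
shallow = copy.copy(original)
deep = copy.deepcopy(original)
```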
</section>
<section id="pickling-python-longs">
<h2><a class="toc-backref" href="#pickling-python-longs" role="doc-backlink">Pickling Python longs</a></h2>
<p>Under protocols 0 and 1, pickling and unpickling Python longs
takes time quadratic in the number of digits. Under protocol 2,
new opcodes support linear-time pickling and unpickling of longs.</p>
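<p>The difference in opcodes can be inspected with the standard <code class="docutils literal notranslate"><span class="pre">pickletools</span></code> module. The sketch below assumes a modern Python 3 interpreter, where longs are plain <code class="docutils literal notranslate"><span class="pre">int</span></code> objects. The helper <code class="docutils literal notranslate"><span class="pre">opcode_names</span></code> is a hypothetical convenience function, and the sketch only lists opcodes; it does not measure timing.</p>

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes found in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


big = 2 ** 100  # too large for the fixed-width integer opcodes

# Protocol 0 writes the value as decimal text.
text_ops = opcode_names(pickle.dumps(big, protocol=0))

# Protocol 2 writes the value as a length-prefixed binary string.
binary_ops = opcode_names(pickle.dumps(big, protocol=2))

restored = pickle.loads(pickle.dumps(big, protocol=2))
```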
</section>
<section id="pickling-bools">
<h2><a class="toc-backref" href="#pickling-bools" role="doc-backlink">Pickling bools</a></h2>
<p>Protocol 2 introduces new opcodes for pickling <code class="docutils literal notranslate"><span class="pre">True</span></code> and <code class="docutils literal notranslate"><span class="pre">False</span></code>
directly. Under protocols 0 and 1, bools are pickled as integers,
using a trick in the representation of the integer in the pickle
so that an unpickler can recognize that a bool was intended. That
trick consumes 4 bytes per bool pickled. The new bool opcodes
consume 1 byte per bool.</p>
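<p>The byte counts above can be checked directly. This is a small sketch assuming a modern Python 3 interpreter; it strips the trailing STOP opcode (and, for protocol 2, the two-byte protocol header) so that only the bytes encoding the bool remain.</p>

```python
import pickle

old_true = pickle.dumps(True, protocol=1)
new_true = pickle.dumps(True, protocol=2)

# Drop the final STOP byte; protocol 2 also starts with a 2-byte header.
old_payload = old_true[:-1]
new_payload = new_true[2:-1]

old_size = len(old_payload)  # bytes used by the integer trick
new_size = len(new_payload)  # bytes used by the dedicated opcode
```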
</section>
<section id="pickling-small-tuples">
<h2><a class="toc-backref" href="#pickling-small-tuples" role="doc-backlink">Pickling small tuples</a></h2>
<p>Protocol 2 introduces new opcodes for more-compact pickling of
tuples of lengths 1, 2 and 3. Protocol 1 previously introduced
an opcode for more-compact pickling of empty tuples.</p>
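<p>As an illustration, the opcodes chosen for tuples of different lengths can be listed with <code class="docutils literal notranslate"><span class="pre">pickletools</span></code>. The sketch assumes a modern Python 3 interpreter, and <code class="docutils literal notranslate"><span class="pre">opcode_names</span></code> is a hypothetical helper.</p>

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes found in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


# Opcode names used by protocol 2 for tuples of length 0 through 4.
ops_by_length = {
    n: opcode_names(pickle.dumps(tuple(range(n)), protocol=2))
    for n in range(5)
}

# The same 2-tuple under protocol 1 needs a MARK plus the generic TUPLE opcode.
pair_protocol_1 = opcode_names(pickle.dumps((0, 1), protocol=1))
```

<p>Tuples longer than three elements still use the generic MARK and TUPLE pair under protocol 2.</p>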
</section>
<section id="protocol-identification">
<h2><a class="toc-backref" href="#protocol-identification" role="doc-backlink">Protocol identification</a></h2>
<p>Protocol 2 introduces a new opcode, with which all protocol 2
pickles begin, identifying that the pickle is protocol 2.
Attempting to unpickle a protocol 2 pickle under older versions
of Python will therefore raise an “unknown opcode” exception
immediately.</p>
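<p>A short sketch of how the leading opcode can be observed, assuming a modern Python 3 interpreter. The <code class="docutils literal notranslate"><span class="pre">pickle</span></code> module exposes the opcode byte as <code class="docutils literal notranslate"><span class="pre">pickle.PROTO</span></code>; the sample value is arbitrary.</p>

```python
import pickle

value = {"answer": 42}  # arbitrary sample object

p0 = pickle.dumps(value, protocol=0)
p1 = pickle.dumps(value, protocol=1)
p2 = pickle.dumps(value, protocol=2)

# A protocol 2 pickle begins with the PROTO opcode, followed by one byte
# holding the protocol number.
starts_with_proto = p2.startswith(pickle.PROTO)
declared_version = p2[1]

# Older protocols carry no such marker.
older_have_marker = p0.startswith(pickle.PROTO) or p1.startswith(pickle.PROTO)
```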
</section>
<section id="pickling-of-large-lists-and-dicts">
<h2><a class="toc-backref" href="#pickling-of-large-lists-and-dicts" role="doc-backlink">Pickling of large lists and dicts</a></h2>
<p>Protocol 1 pickles large lists and dicts “in one piece”, which
minimizes pickle size, but requires that unpickling create a temp
object as large as the object being unpickled. As part of the
protocol 2 changes, large lists and dicts are broken into pieces of no
more than 1000 elements each, so that unpickling needn’t create
a temp object larger than needed to hold 1000 elements. This
isn’t part of protocol 2, however: the opcodes produced are still
part of protocol 1. <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementations that return the
optional new listitems or dictitems iterators also benefit from
this unpickling temp-space optimization.</p>
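<p>The batching can be observed by counting the batch opcodes in a pickle. The sketch below assumes a modern CPython 3 interpreter; <code class="docutils literal notranslate"><span class="pre">count_opcode</span></code> is a hypothetical helper, and the 2500-element containers are arbitrary sizes chosen so that three batches are expected.</p>

```python
import pickle
import pickletools


def count_opcode(data, name):
    """Count how many times the named opcode occurs in a pickle."""
    return sum(
        1 for opcode, arg, pos in pickletools.genops(data)
        if opcode.name == name
    )


items = list(range(2500))
mapping = dict.fromkeys(items)

# Protocol 1 is used deliberately: the batch opcodes belong to protocol 1.
list_batches = count_opcode(pickle.dumps(items, protocol=1), "APPENDS")
dict_batches = count_opcode(pickle.dumps(mapping, protocol=1), "SETITEMS")
```

<p>With 2500 elements and pieces of at most 1000 elements, each container should be written as three batches (1000, 1000 and 500 elements).</p>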
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document has been placed in the public domain.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0307.rst">https://github.com/python/peps/blob/main/peps/pep-0307.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0307.rst">2023-09-09 17:39:29 GMT</a></p>
</article>
<nav id="pep-sidebar">
<h2>Contents</h2>
<ul>
<li><a class="reference internal" href="#introduction">Introduction</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a></li>
<li><a class="reference internal" href="#protocol-versions">Protocol versions</a></li>
<li><a class="reference internal" href="#security-issues">Security issues</a></li>
<li><a class="reference internal" href="#extended-reduce-api">Extended <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> API</a></li>
<li><a class="reference internal" href="#the-reduce-ex-api">The <code class="docutils literal notranslate"><span class="pre">__reduce_ex__</span></code> API</a></li>
<li><a class="reference internal" href="#customizing-pickling-absent-a-reduce-implementation">Customizing pickling absent a <code class="docutils literal notranslate"><span class="pre">__reduce__</span></code> implementation</a><ul>
<li><a class="reference internal" href="#case-1-pickling-classic-class-instances">Case 1: pickling classic class instances</a><ul>
<li><a class="reference internal" href="#the-getstate-method">The <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method</a></li>
<li><a class="reference internal" href="#the-setstate-method">The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method</a></li>
<li><a class="reference internal" href="#the-getinitargs-method">The <code class="docutils literal notranslate"><span class="pre">__getinitargs__</span></code> method</a></li>
</ul>
</li>
<li><a class="reference internal" href="#case-2-pickling-new-style-class-instances-using-protocols-0-or-1">Case 2: pickling new-style class instances using protocols 0 or 1</a></li>
<li><a class="reference internal" href="#case-3-pickling-new-style-class-instances-using-protocol-2">Case 3: pickling new-style class instances using protocol 2</a><ul>
<li><a class="reference internal" href="#id1">The <code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> method</a></li>
<li><a class="reference internal" href="#id2">The <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code> method</a></li>
<li><a class="reference internal" href="#the-getnewargs-method">The <code class="docutils literal notranslate"><span class="pre">__getnewargs__</span></code> method</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#the-newobj-unpickling-function">The <code class="docutils literal notranslate"><span class="pre">__newobj__</span></code> unpickling function</a></li>
<li><a class="reference internal" href="#the-extension-registry">The extension registry</a><ul>
<li><a class="reference internal" href="#extension-registry-api">Extension registry API</a></li>
</ul>
</li>
<li><a class="reference internal" href="#the-copy-module">The copy module</a></li>
<li><a class="reference internal" href="#pickling-python-longs">Pickling Python longs</a></li>
<li><a class="reference internal" href="#pickling-bools">Pickling bools</a></li>
<li><a class="reference internal" href="#pickling-small-tuples">Pickling small tuples</a></li>
<li><a class="reference internal" href="#protocol-identification">Protocol identification</a></li>
<li><a class="reference internal" href="#pickling-of-large-lists-and-dicts">Pickling of large lists and dicts</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</nav>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>