====================
Data Files Support
====================
Old packaging installation methods in the Python ecosystem
have traditionally allowed installation of "data files", which
are placed in a platform-specific location. However, the most common use case
for data files distributed with a package is for use *by* the package, usually
by including the data files **inside the package directory**.
Setuptools focuses on this most common type of data files and offers three ways
of specifying which files should be included in your packages, as described in
the following sections.

include_package_data
====================

First, you can simply use the ``include_package_data`` keyword.
For example, if the package tree looks like this::

    project_root_directory
    ├── setup.py  # and/or setup.cfg, pyproject.toml
    └── src
        └── mypkg
            ├── __init__.py
            ├── data1.rst
            ├── data2.rst
            ├── data1.txt
            └── data2.txt

and you supply this configuration:

.. tab:: setup.cfg

    .. code-block:: ini

        [options]
        # ...
        packages = find:
        package_dir =
            = src
        include_package_data = True

        [options.packages.find]
        where = src

.. tab:: setup.py

    .. code-block:: python

        from setuptools import setup, find_packages
        setup(
            # ...,
            packages=find_packages(where="src"),
            package_dir={"": "src"},
            include_package_data=True
        )

.. tab:: pyproject.toml (**BETA**) [#beta]_

    .. code-block:: toml

        [tool.setuptools]
        # ...
        # By default, include-package-data is true in pyproject.toml, so you do
        # NOT have to specify this line.
        include-package-data = true

        [tool.setuptools.packages.find]
        where = ["src"]

then all the ``.txt`` and ``.rst`` files will be automatically installed with
your package, provided:

1. These files are included via the |MANIFEST.in|_ file, like so::

        include src/mypkg/*.txt
        include src/mypkg/*.rst

2. OR, they are being tracked by a revision control system such as Git, Mercurial
   or SVN, and you have configured an appropriate plugin such as
   :pypi:`setuptools-scm` or :pypi:`setuptools-svn`.
   (See the section below on :ref:`Adding Support for Revision
   Control Systems` for information on how to write such plugins.)

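To check which data files actually ended up in a build, one option is to list
the non-Python entries of the resulting wheel. The following is a minimal
sketch that uses only the standard library. The helper name
``list_data_files`` is hypothetical and is not part of Setuptools:

.. code-block:: python

    import zipfile


    def list_data_files(archive_path):
        """Return the non-``.py`` files stored in a wheel (a zip archive).

        Directory entries and the ``*.dist-info`` metadata folder are skipped,
        so what remains should be the data files shipped inside packages.
        """
        with zipfile.ZipFile(archive_path) as archive:
            return sorted(
                name
                for name in archive.namelist()
                if not name.endswith((".py", "/")) and ".dist-info/" not in name
            )

For the example tree above, calling this helper on the built wheel would be
expected to report ``mypkg/data1.rst``, ``mypkg/data1.txt`` and so on if the
files were picked up as intended.
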
package_data
============

By default, ``include_package_data`` considers **all** non-``.py`` files found inside
the package directory (``src/mypkg`` in this case) as data files, and includes those that
satisfy at least one of the two conditions above in the source distribution and,
consequently, in the installation of your package.
If you want finer-grained control over which files are included, you can also use
the ``package_data`` keyword.
For example, if the package tree looks like this::

    project_root_directory
    ├── setup.py  # and/or setup.cfg, pyproject.toml
    └── src
        └── mypkg
            ├── __init__.py
            ├── data1.rst
            ├── data2.rst
            ├── data1.txt
            └── data2.txt

then you can use the following configuration to capture the ``.txt`` and ``.rst`` files as
data files:

.. tab:: setup.cfg

    .. code-block:: ini

        [options]
        # ...
        packages = find:
        package_dir =
            = src

        [options.packages.find]
        where = src

        [options.package_data]
        mypkg =
            *.txt
            *.rst

.. tab:: setup.py

    .. code-block:: python

        from setuptools import setup, find_packages
        setup(
            # ...,
            packages=find_packages(where="src"),
            package_dir={"": "src"},
            package_data={"mypkg": ["*.txt", "*.rst"]}
        )

.. tab:: pyproject.toml (**BETA**) [#beta]_

    .. code-block:: toml

        [tool.setuptools.packages.find]
        where = ["src"]

        [tool.setuptools.package-data]
        mypkg = ["*.txt", "*.rst"]

The ``package_data`` argument is a dictionary that maps from package names to
lists of glob patterns. Note that data files specified using the ``package_data``
option do not need to be listed in a |MANIFEST.in|_ file, nor do they need to be
added by a revision control system plugin.

.. note::
    If your glob patterns use paths, you *must* use a forward slash (``/``) as
    the path separator, even if you are on Windows. Setuptools automatically
    converts slashes to appropriate platform-specific separators at build time.

.. note::
    Glob patterns do not automatically match dotfiles (directory or file names
    starting with a dot (``.``)). To include such files, you must explicitly start
    the pattern with a dot, e.g. ``.*`` to match ``.gitignore``.

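The dotfile rule can be illustrated with the standard library's ``glob``
module. This is only a sketch: it assumes that the matching done by Setuptools
treats leading dots the same way ``glob`` does, as the note above describes,
and the helper names ``match_patterns`` and ``demo`` are hypothetical:

.. code-block:: python

    import glob
    import os
    import tempfile


    def match_patterns(directory, patterns):
        """Return the sorted file names in ``directory`` matched by any pattern."""
        matched = set()
        for pattern in patterns:
            for path in glob.glob(os.path.join(directory, pattern)):
                matched.add(os.path.basename(path))
        return sorted(matched)


    def demo():
        """Compare ``*`` alone against ``*`` plus ``.*`` in a scratch directory."""
        with tempfile.TemporaryDirectory() as tmp:
            for name in ("data1.txt", ".gitignore"):
                open(os.path.join(tmp, name), "w").close()
            without_dot = match_patterns(tmp, ["*"])
            with_dot = match_patterns(tmp, ["*", ".*"])
        return without_dot, with_dot

In this sketch, ``*`` alone leaves ``.gitignore`` unmatched, while adding the
``.*`` pattern picks it up.
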
Suppose you have multiple top-level packages that share a common pattern of data
files, for example::

    project_root_directory
    ├── setup.py  # and/or setup.cfg, pyproject.toml
    └── src
        ├── mypkg1
        │   ├── data1.rst
        │   ├── data1.txt
        │   └── __init__.py
        └── mypkg2
            ├── data2.txt
            └── __init__.py

Here, both packages ``mypkg1`` and ``mypkg2`` share a common pattern of having ``.txt``
data files. However, only ``mypkg1`` has ``.rst`` data files. In such a case, if you want to
use the ``package_data`` option, the following configuration will work:

.. tab:: setup.cfg

    .. code-block:: ini

        [options]
        packages = find:
        package_dir =
            = src

        [options.packages.find]
        where = src

        [options.package_data]
        * =
            *.txt
        mypkg1 =
            data1.rst

.. tab:: setup.py

    .. code-block:: python

        from setuptools import setup, find_packages
        setup(
            # ...,
            packages=find_packages(where="src"),
            package_dir={"": "src"},
            package_data={"": ["*.txt"], "mypkg1": ["data1.rst"]},
        )

.. tab:: pyproject.toml (**BETA**) [#beta]_

    .. code-block:: toml

        [tool.setuptools.packages.find]
        where = ["src"]

        [tool.setuptools.package-data]
        "*" = ["*.txt"]
        mypkg1 = ["data1.rst"]

Notice that if you list patterns in ``package_data`` under the empty string ``""`` in
``setup.py``, and the asterisk ``*`` in ``setup.cfg`` and ``pyproject.toml``, these
patterns are used to find files in every package. For example, we use ``""`` or ``*``
to indicate that the ``.txt`` files from all packages should be captured as data files.
Also note how we can continue to specify patterns for individual packages, i.e.
we specify that ``data1.rst`` from ``mypkg1`` alone should be captured as well.
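The way the catch-all key combines with per-package keys can be modelled in a
few lines of Python. This is a simplified sketch of the behaviour described
above, not the actual Setuptools implementation, and the helper name
``patterns_for`` is hypothetical:

.. code-block:: python

    def patterns_for(package, package_data):
        """Return the glob patterns that apply to ``package``.

        Patterns under the empty-string key apply to every package; patterns
        under the package's own name are added on top of them.
        """
        shared = list(package_data.get("", []))
        specific = list(package_data.get(package, []))
        return shared + specific


    # The mapping from the ``setup.py`` example above.
    PACKAGE_DATA = {"": ["*.txt"], "mypkg1": ["data1.rst"]}

With this mapping, ``mypkg1`` is matched against both ``*.txt`` and
``data1.rst``, while ``mypkg2`` is matched against ``*.txt`` only.
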
.. note::
    When building an ``sdist``, the data files are also drawn from the
    ``package_name.egg-info/SOURCES.txt`` file. If you have updated the
    ``package_data`` list in ``setup.py``, make sure to remove this file
    before calling ``setup.py`` again.

.. note::
    If using the ``include_package_data`` argument, files specified by
    ``package_data`` will *not* be automatically added to the manifest unless
    they are listed in the |MANIFEST.in|_ file or by a plugin like
    :pypi:`setuptools-scm` or :pypi:`setuptools-svn`.

.. https://docs.python.org/3/distutils/setupscript.html#installing-package-data

exclude_package_data
====================

Sometimes, the ``include_package_data`` or ``package_data`` options alone
aren't sufficient to precisely define what files you want included. For example,
consider a scenario where you have ``include_package_data=True``, and you are using
a revision control system with an appropriate plugin.
Developers sometimes add directory-specific marker files (such as ``.gitignore``,
``.gitkeep``, ``.gitattributes``, or ``.hgignore``). These files are probably
tracked by the revision control system, and therefore by default they will be
included when the package is installed.

Suppose you want to prevent these files from being included in the
installation (they are not relevant to Python or the package). In that case, you
can use the ``exclude_package_data`` option:

.. tab:: setup.cfg

    .. code-block:: ini

        [options]
        # ...
        packages = find:
        package_dir =
            = src
        include_package_data = True

        [options.packages.find]
        where = src

        [options.exclude_package_data]
        mypkg =
            .gitattributes

.. tab:: setup.py

    .. code-block:: python

        from setuptools import setup, find_packages
        setup(
            # ...,
            packages=find_packages(where="src"),
            package_dir={"": "src"},
            include_package_data=True,
            exclude_package_data={"mypkg": [".gitattributes"]},
        )

.. tab:: pyproject.toml (**BETA**) [#beta]_

    .. code-block:: toml

        [tool.setuptools.packages.find]
        where = ["src"]

        [tool.setuptools.exclude-package-data]
        mypkg = [".gitattributes"]

The ``exclude_package_data`` option is a dictionary mapping package names to
lists of wildcard patterns, just like the ``package_data`` option. And, just
as with that option, you can use the empty string key ``""`` in ``setup.py`` and the
asterisk ``*`` in ``setup.cfg`` and ``pyproject.toml`` to match all top-level packages.
Any files that match these patterns will be *excluded* from installation,
even if they were listed in ``package_data`` or were included as a result of using
``include_package_data``.
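The precedence described above (exclusion wins over inclusion) can be sketched
as a filter over already-selected file names. This is an illustrative model
based on ``fnmatch`` from the standard library, not the code Setuptools runs,
and the helper name ``apply_excludes`` is hypothetical:

.. code-block:: python

    from fnmatch import fnmatchcase


    def apply_excludes(files, exclude_patterns):
        """Drop every file name that matches at least one exclude pattern.

        ``files`` stands for the names already selected through
        ``package_data`` or ``include_package_data``.
        """
        return [
            name
            for name in files
            if not any(fnmatchcase(name, pattern) for pattern in exclude_patterns)
        ]

For instance, filtering ``["data1.txt", ".gitattributes"]`` with the pattern
list ``[".gitattributes"]`` leaves only ``data1.txt``.
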
Subdirectory for Data Files
===========================

A common pattern is where some (or all) of the data files are placed under
a separate subdirectory. For example::

    project_root_directory
    ├── setup.py  # and/or setup.cfg, pyproject.toml
    └── src
        └── mypkg
            ├── data
            │   ├── data1.rst
            │   └── data2.rst
            ├── __init__.py
            ├── data1.txt
            └── data2.txt

Here, the ``.rst`` files are placed under a ``data`` subdirectory inside ``mypkg``,
while the ``.txt`` files are directly under ``mypkg``.

In this case, the recommended approach is to treat ``data`` as a namespace package
(see :pep:`420`). With ``package_data``,
the configuration might look like this:

.. tab:: setup.cfg

    .. code-block:: ini

        [options]
        # ...
        packages = find_namespace:
        package_dir =
            = src

        [options.packages.find]
        where = src

        [options.package_data]
        mypkg =
            *.txt
        mypkg.data =
            *.rst

.. tab:: setup.py

    .. code-block:: python

        from setuptools import setup, find_namespace_packages
        setup(
            # ...,
            packages=find_namespace_packages(where="src"),
            package_dir={"": "src"},
            package_data={
                "mypkg": ["*.txt"],
                "mypkg.data": ["*.rst"],
            }
        )

.. tab:: pyproject.toml (**BETA**) [#beta]_

    .. code-block:: toml

        [tool.setuptools.packages.find]
        # scanning for namespace packages is true by default in pyproject.toml, so
        # you do NOT need to include the following line.
        namespaces = true
        where = ["src"]

        [tool.setuptools.package-data]
        mypkg = ["*.txt"]
        "mypkg.data" = ["*.rst"]

In other words, we allow Setuptools to scan for namespace packages in the ``src`` directory,
which enables the ``data`` directory to be identified. We then separately specify data
files for the root package ``mypkg`` and for the namespace package ``mypkg.data``
(the ``data`` directory under ``mypkg``).
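To see why namespace scanning matters here, consider the following simplified
model of package discovery. It is only a sketch under stated assumptions: a
regular scan keeps just the directories that contain ``__init__.py``, while a
namespace scan keeps every directory. The helper name ``discover_packages`` is
hypothetical, and the real Setuptools finders apply further rules:

.. code-block:: python

    import os


    def discover_packages(where, namespaces):
        """Return dotted package names found under ``where``.

        With ``namespaces=False`` a directory without ``__init__.py`` is
        skipped together with everything below it.
        """
        found = []
        for root, dirs, files in os.walk(where):
            dirs.sort()  # make the traversal order deterministic
            if root == where:
                continue  # the source root itself is not a package
            if not namespaces and "__init__.py" not in files:
                dirs[:] = []  # do not descend into non-packages
                continue
            relative = os.path.relpath(root, where)
            found.append(relative.replace(os.sep, "."))
        return found

On the tree above, the regular scan in this model finds only ``mypkg``, whereas
the namespace scan also finds ``mypkg.data``, which is what allows the
``"mypkg.data"`` key in ``package_data`` to refer to an existing package.
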

With ``include_package_data`` the configuration is simpler: you only need to enable
scanning of namespace packages in the ``src`` directory and the rest is handled by Setuptools.

.. tab:: setup.cfg

    .. code-block:: ini

        [options]
        packages = find_namespace:
        package_dir =
            = src
        include_package_data = True

        [options.packages.find]
        where = src

.. tab:: setup.py

    .. code-block:: python

        from setuptools import setup, find_namespace_packages
        setup(
            # ... ,
            packages=find_namespace_packages(where="src"),
            package_dir={"": "src"},
            include_package_data=True,
        )

.. tab:: pyproject.toml (**BETA**) [#beta]_

    .. code-block:: toml

        [tool.setuptools]
        # ...
        # By default, include-package-data is true in pyproject.toml, so you do
        # NOT have to specify this line.
        include-package-data = true

        [tool.setuptools.packages.find]
        # scanning for namespace packages is true by default in pyproject.toml, so
        # you do NOT need to include the following line.
        namespaces = true
        where = ["src"]

Summary
=======

In summary, the three options allow you to:

``include_package_data``
    Accept all data files and directories matched by |MANIFEST.in|_ or added by
    a :ref:`plugin