When new to contributing to open source projects, half the problem can be just knowing where to start.
Maybe the code you wish to write is straightforward, and yet somehow you’re not sure where to begin.
Let’s start with the development setup on your local machine (later, we will go through the GitHub steps). The local setup will be done using conda.
- fork the project you wish to contribute to on GitHub,
- clone the fork to your local machine,
- add the original project repository as a remote,
git remote add <x> <url-of-original-repository>
- pull the latest code from the original project repository,
git fetch <x>
- check out your local master branch,
git checkout master
- merge the latest code from the original repository in,
git merge <x>/master
- create and checkout a new feature branch that you are going to work on,
git checkout -b <y>
- check that generated directories such as dist and .egg-info are listed in .gitignore, if not add them
- from the project root, run
python setup.py egg_info
which outputs files (including requires.txt) into a .egg-info directory
- using the contents of requires.txt, create an environment.yml file in dist specifying the Python version, conda packages to install, PyPI packages to install, etc.
- create a conda environment,
conda env create --name <z> --file dist/environment.yml
- activate the environment,
source activate <z>
- check for “extra” packages required only when doing development work (usually these will be in a requirements.txt, a Pipfile, or something similar; check developer documentation if needed)
- install these “extra” packages,
pip install --requirement requirements.txt
or
pip install --pipfile Pipfile
(the latter was still being developed at the time of writing and may not exist in your version of pip. A Pipfile may also contain the project’s core dependencies, in which case do manual pip installs of the “extra” packages or put them in your own requirements.txt. Again, if in doubt, check the documentation)
- ensure the conda environment knows about the project you cloned,
pip install --editable .
- run any tests or linting checks as per developer documentation
- make your changes to the code and repeat previous step
- check the output of any local scripts (not in the project directory) created for your own personal testing
- push up your feature branch
git push origin <y>
- select the feature branch in GitHub, initiate a pull request and do a final check of your changes
- submit the pull request in GitHub
On most Unix-like systems, Python is already installed. Let’s suppose you wish to contribute to pytube. Clone the repository with
git clone https://github.com/nficano/pytube.git
make the desired changes to files in /path/to/pytube, and you’re good to go…
But what about testing your changes?
Open up a Python shell, run
from pytube import YouTube
… and it works.
There are a couple of problems I can think of with the above:
(a) if you are not in
/path/to/pytube, you get
ModuleNotFoundError: No module named 'pytube'
(b) if your project has third-party dependencies that aren’t part of the default Python installation (pytube isn’t the best example here because it has none), then as soon as you write any code using one of these packages you will get an error, as Python doesn’t know where to look for them.
For the latter, when getting a
<pkg> not found
error, we can just do
pip install <pkg>
This means you are doing a global pip installation. This is bad practice: system tools rely on the default Python installation, so messing around with it isn’t the best idea.
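Problem (a) can be reproduced without pytube at all. The sketch below builds a throwaway package in a temporary directory (the name demo_clone_pkg is made up for illustration) and shows that the import only succeeds once the directory containing it is on sys.path, which is roughly what starting a Python shell from inside the project directory does for you.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name: str) -> bool:
    """Return True if `module_name` can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return False
    return True


# Build a throwaway package in a temporary directory (a stand-in for a clone).
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "demo_clone_pkg"  # hypothetical package name
package_dir.mkdir()
(package_dir / "__init__.py").write_text("GREETING = 'hello'\n")

# The directory is not on sys.path yet, so the import fails.
before = can_import("demo_clone_pkg")

# An interactive shell puts the current directory on sys.path; we imitate
# that here by adding the project directory explicitly.
sys.path.insert(0, str(project_dir))
after = can_import("demo_clone_pkg")

print(before, after)
```

Editing sys.path by hand like this is only a demonstration; the rest of the post works towards a cleaner way of getting the same effect.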
Conda solution without “git clone”
Create a new (empty) conda environment
conda create --name <myenv>.
conda install pytube?
Well, actually pytube isn’t a conda package (package here refers to a software distribution, i.e. the source files of pytube bundled into a compact archive that is not meant to be read by humans).
More importantly, when installing packages, conda looks in certain default locations in Anaconda Cloud, and pytube isn’t in one of those locations.
This is because officially, pytube, as a PyPI package, lives on PyPI (funny that), and this is where the project owner uploads releases to.
So conda install pytube doesn’t work.
On the other hand, pip does look in the right place by default.
source activate <myenv>
conda install pip
pip install pytube
…and we’re golden.
Regardless of your working directory, you can open a shell
or write a script with
from pytube import YouTube without getting an error.
pip install pytube also looks at the dependencies declared in setup.py and resolves them, so you won’t get a
<pkg> not found
error when writing code that uses them.
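If you are curious which dependencies an installed distribution declares, the standard library can report them. This is a small sketch using importlib.metadata (Python 3.8+); the helper name declared_requirements is made up here, and the result depends entirely on what is installed in the active environment.

```python
from importlib import metadata
from typing import List, Optional


def declared_requirements(distribution: str) -> Optional[List[str]]:
    """Return the requirement strings a distribution declares.

    Returns None if the distribution is not installed in this environment,
    and an empty list if it is installed but declares no dependencies.
    """
    try:
        return list(metadata.requires(distribution) or [])
    except metadata.PackageNotFoundError:
        return None


# With the conda environment activated, this reports what pip recorded.
print(declared_requirements("pytube"))
```

For a package with no third-party dependencies you would expect an empty list, and None if the environment does not have it installed.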
So we are now good for testing our changes. Speaking of which, in which
files do we make our changes? We haven’t done a
git clone yet.
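It turns out pip install pytube installed the pytube source code into the environment’s site-packages directory, i.e.
/path/to/anaconda/envs/<myenv>/lib/python/site-packages/pytube
Therefore we can forget about git clone and just edit the files in that directory.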
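Rather than guessing where a package ended up, you can ask Python. A minimal sketch using importlib.util.find_spec follows; the helper name package_location is made up, and it only handles top-level module names.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(module_name: str) -> Optional[Path]:
    """Return the directory a top-level module would be imported from.

    Returns None if the module cannot be found or has no file on disk
    (for example, built-in modules).
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).parent


# With the environment activated, package_location("pytube") would show
# the directory whose files an import actually uses.
print(package_location("json"))
```

Running this for pytube inside the activated environment tells you which copy of the code your edits need to land in.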
Conda solution with “git clone”
Whilst the above solution is feasible, it is generally not recommended.
Your gut is probably telling you it feels strange to edit files in
/path/to/anaconda/envs/<myenv>/lib/python/site-packages/pytube
rather than somewhere like
~/Documents/pytube
Further, there is no version control (done previously via git clone).
You could do git init, but what about getting the same branches as on GitHub and the same commit history?
If you’re a Git expert, you might know how to do the above quickly. But if you were, you probably wouldn’t be reading this.
Let’s try a different approach.
git clone https://github.com/nficano/pytube.git
Great, now we have version control and Git-wise everything is good to go in ~/Documents/pytube (assuming you cloned into ~/Documents).
Next, make the changes to files in ~/Documents/pytube.
Suppose, for testing purposes, we added some
print() calls. We run some code in
a Python shell / script and wait for our logs to appear… but they don’t!
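This is because, at the moment, there are two copies of the pytube source code on your machine:
/path/to/anaconda/envs/<myenv>/lib/python/site-packages/pytube
~/Documents/pytube
When you do from pytube import YouTube, it is pointing to the code in the first of these.
With the conda environment activated, you can check this by printing sys.path in a Python shell. In the resulting list, there is
/path/to/anaconda/envs/<myenv>/lib/python/site-packages
When Python sees an import statement, it goes through each of the directories in this list in order and looks for a directory or file matching the module name, stopping at the first match.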
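The lookup described above can be imitated in a few lines. This is a deliberately simplified sketch: the real import system also handles namespace packages, compiled extensions and zip archives, none of which are modelled here. The function name first_match and the two temporary directories are stand-ins for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def first_match(module_name: str, search_path: Iterable[str]) -> Optional[Path]:
    """Return the first package directory or .py file named `module_name`."""
    for entry in search_path:
        base = Path(entry)
        package_init = base / module_name / "__init__.py"
        if package_init.is_file():
            return package_init.parent
        module_file = base / f"{module_name}.py"
        if module_file.is_file():
            return module_file
    return None


def make_copy(label: str) -> Path:
    """Create a directory containing a dummy 'pytube' package."""
    root = Path(tempfile.mkdtemp(prefix=label))
    package = root / "pytube"
    package.mkdir()
    (package / "__init__.py").write_text(f"SOURCE = {label!r}\n")
    return root


# Stand-ins for the environment's site-packages and for the Git clone.
site_packages = make_copy("site-packages-")
clone = make_copy("clone-")

# site-packages comes first in the list, so its copy wins.
found = first_match("pytube", [str(site_packages), str(clone)])
print(found)
```

Swapping the order of the two entries makes the clone win instead, which is the idea behind option (a) in the next part.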
Thus we are back to square one, having to edit files in
/path/to/anaconda/envs/<myenv>/lib/python/site-packages/pytube
in order to test the effects of our changes locally.
We could fix this in two ways:
(a) Alter the environment so that the module search path includes ~/Documents/pytube
(b) Replace the installed copy with the cloned one,
rm -r /path/to/anaconda/envs/<myenv>/lib/python/site-packages/pytube
cp -R ~/Documents/pytube /path/to/anaconda/envs/<myenv>/lib/python/site-packages
It’s certainly not obvious how to do (a). (b) is straightforward but means either we work in ~/Documents/pytube and have to keep copying files over (impractical) or we work in /path/to/anaconda/envs/<myenv>/lib/python/site-packages/pytube (not what we want).
However, there is a
pip command that lets us choose (b) and work in
~/Documents/pytube without having to keep copying files over:
pip install --editable ~/Documents/pytube.
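This works by removing
/path/to/anaconda/envs/<myenv>/lib/python/site-packages/pytube
and replacing it with a pytube.egg-link file whose contents point to ~/Documents/pytube.
from pytube import YouTube now points to ~/Documents/pytube and our logs appear.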
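The general idea of pointing an environment at code that lives elsewhere can be tried out with a path file. In a legacy setuptools development install, the .egg-link file is paired with an entry in a .pth file in site-packages, and Python adds every directory listed in such files to sys.path; newer versions of pip and setuptools may use different mechanisms. The sketch below imitates this with temporary directories and made-up names (editable_demo_pkg, editable_demo.pth), so treat it as an illustration rather than a description of what pip does on your machine.

```python
import importlib
import site
import tempfile
from pathlib import Path

# A stand-in for the cloned project (hypothetical package name).
clone_dir = Path(tempfile.mkdtemp(prefix="clone-"))
package_dir = clone_dir / "editable_demo_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("VERSION = 'from the clone'\n")

# A stand-in for site-packages, holding only a path file that points at the clone.
fake_site_packages = Path(tempfile.mkdtemp(prefix="site-packages-"))
(fake_site_packages / "editable_demo.pth").write_text(f"{clone_dir}\n")

# site.addsitedir() adds the directory to sys.path and processes its .pth
# files, which puts the clone directory on sys.path as well.
site.addsitedir(str(fake_site_packages))
importlib.invalidate_caches()

import editable_demo_pkg  # resolved through the path file, not copied anywhere

print(editable_demo_pkg.VERSION)
```

Because the import resolves to the files in the clone, any edit made there is picked up the next time the module is imported, with no copying step.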
Conda solution with “git clone” and environment.yml
Although the above solution isn’t bad, you have to accept
whatever Python version
conda install pip gives you.
For a library like pytube which is Python 2 and 3 compatible, this is a problem.
Before making a pull request, you need to test your code works for both versions and therefore need to create two separate environments specifying a different Python version each time.
Also, pip install <pkg> means all the dependencies listed in the setup.py of <pkg> will be installed via pip, i.e. you will end up installing a PyPI package for each dependency. What if you wanted to install a conda package for a dependency?
One solution is to use a conda environment.yml file. Typically, this is stored in dist in the project root; dist is usually in .gitignore.
Although this method is more manual, it is more flexible.
First, you get the list of dependencies by running
python setup.py egg_info
which creates a .egg-info directory (also usually in .gitignore) containing a requires.txt file, e.g.
arrow<0.12.1,>=0.8.0
logfury>=0.1.2
requests>=2.9.1
six>=1.10
tqdm>=4.5.0
You can then create an environment.yml file specifying the Python version and the dependencies above, e.g.
name: <myenv>
dependencies:
  - python=3.6.3
  - ipython=6.1.0
  - requests>=2.9.1
  - pip:
    - arrow<0.12.1,>=0.8.0
    - logfury>=0.1.2
    - six>=1.10
    - tqdm>=4.5.0
As you can see, some of the dependencies will be installed as conda packages, e.g. requests, others as PyPI packages (those listed under pip).
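Splitting requires.txt between conda and pip by hand gets tedious for longer lists. The sketch below shows one way it could be scripted. The function names, the choice to ignore optional "extras" sections, and the decision about which packages go to conda are all assumptions made for illustration; you still need to check that each conda package actually exists under that name.

```python
import re
from typing import Iterable, List, Set


def requirement_name(requirement: str) -> str:
    """Extract the package name from a line such as 'requests>=2.9.1'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return match.group(0).lower()


def build_environment_yml(
    env_name: str,
    python_version: str,
    requirements: Iterable[str],
    conda_names: Set[str],
) -> str:
    """Build environment.yml text from the lines of a requires.txt file.

    Packages whose names are in `conda_names` are listed as conda
    dependencies; everything else goes under the pip section.
    """
    conda_deps: List[str] = []
    pip_deps: List[str] = []
    for line in requirements:
        line = line.strip()
        if line.startswith("["):
            break  # optional "extras" sections are ignored in this sketch
        if not line:
            continue
        target = conda_deps if requirement_name(line) in conda_names else pip_deps
        target.append(line)

    lines = [f"name: {env_name}", "dependencies:", f"  - python={python_version}"]
    lines += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        lines.append("  - pip:")
        lines += [f"    - {dep}" for dep in pip_deps]
    return "\n".join(lines) + "\n"


requires_txt = [
    "arrow<0.12.1,>=0.8.0",
    "logfury>=0.1.2",
    "requests>=2.9.1",
    "six>=1.10",
    "tqdm>=4.5.0",
]
print(build_environment_yml("myenv", "3.6.3", requires_txt, {"requests"}))
```

The printed text can be saved as dist/environment.yml and adjusted by hand, for example to add development tools such as ipython.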
Going back to our pytube example, if <myenv> is activated, deactivate it
source deactivate
then delete it
conda remove --name <myenv> --all
and create a new one, this time using an
environment.yml file, e.g.
name: pytube-dev-python3
dependencies:
  - python=3.6.3
  - ipython=6.1.0
conda env create --name <myenv> --file dist/environment.yml
source activate <myenv>
pip install --editable ~/Documents/pytube.
The first step is to log in to your GitHub account, find the repository you wish to contribute to, and fork it.
This creates a copy of the original repository in your profile’s list of repositories, which you then clone (do not clone the original repository) with
git clone git@github.com:<my-username>/<my-forked-repo-name>.git
which creates an
origin remote, i.e. a link between the local copy of the
project and the copy stored online under your profile’s list of repositories.
Next, create a new remote linking to the original repository (so that when others make changes to it, you can pull those changes into your forked repository)
git remote add upstream https://github
and make sure locally you are up to date (this needs to be done periodically as you work, and always before submitting a pull request)
git fetch upstream.
Once upstream is locally up to date, merge it into your local master branch
git checkout master
git merge upstream/master
then create and checkout a new branch which you are going to work on, e.g.
git checkout -b <my-new-feature>.
When you are done, push this up
git push origin <my-new-feature>
which will create a copy of this branch in your forked repository.
Now go into GitHub, select your forked repository, and create a pull request. If all your changes look ok, add any notes for the reviewer, and submit the pull request.
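If you find yourself repeating the local Git steps for every project, they can be collected in a small script. The sketch below only builds the list of commands described above, run from inside an already cloned fork; the function names are made up, the default branch name master follows this post, and run_all is a thin wrapper around subprocess that stops at the first failing command.

```python
import subprocess
from typing import List


def fork_workflow_commands(
    upstream_url: str, feature_branch: str, main_branch: str = "master"
) -> List[List[str]]:
    """Return the Git commands for syncing a fork and starting a feature branch."""
    return [
        ["git", "remote", "add", "upstream", upstream_url],
        ["git", "fetch", "upstream"],
        ["git", "checkout", main_branch],
        ["git", "merge", f"upstream/{main_branch}"],
        ["git", "checkout", "-b", feature_branch],
    ]


def run_all(commands: List[List[str]], cwd: str) -> None:
    """Run each command in the cloned fork at `cwd`, stopping on the first error."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = fork_workflow_commands(
    "https://github.com/nficano/pytube.git", "my-new-feature"
)
for command in commands:
    print(" ".join(command))
```

Printing the commands first and only then passing them to run_all makes it easy to double-check the remote URL and branch names before anything touches the repository.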