Setting up python conscientiously

Andrikos Christos
Published in Analytics Vidhya
9 min read · Nov 4, 2020

You are probably a developer full of energy, ready to code the next big thing, or even a scientist who needs to validate a promising and novel idea! You love AI, the web, crypto, or any other hot and trending concept. And you probably consider Python a great tool to start with. You know what? It actually is! Python enjoys probably one of the greatest and most mature communities out there. You get a plethora of implementation examples, and multiple libraries provide the functionality you need out of the box. Various blog posts have been written to assist you on that long journey: an expedition to your own Pythonic world.

Ok, let’s go then, let’s start writing some code. Step 0 is to grab some Stack Overflow routine and just execute it to check its output. Next, you should probably be ready to launch.

my_python_script.py:

import pandas as pd
import numpy as np

if __name__ == "__main__":
    #
    # TODO: write some extra magnificent piece of code here
    #
    print("something is happening here")

> python my_python_script.py

Unfortunately, you get the following error:

ModuleNotFoundError            Traceback (most recent call last)
<ipython-input-1-7dd3504c366f> in <module>
----> 1 import pandas as pd
ModuleNotFoundError: No module named 'pandas'

And at that point you start googling the error, only to find plenty of solutions that may or may not finally work in your case. This article, however, should not be treated as yet another troubleshooting manuscript. It is about explaining the Python setup process. Instead of providing bare steps, we are going to investigate what is going on under the hood, so that we can mitigate any future issues related to the Python setup.

Disclaimer: I am mostly a macOS guy. I develop on macOS and deploy on Linux, usually in a dockerized fashion. That’s why the article is mostly macOS oriented. The same principles, however, may be applied to Windows systems as well.

Pre-installed python

First things first: any macOS or Linux system comes with Python pre-installed, and in most cases its location is already included in the corresponding $PATH environment variable. Python is essential for multiple scripts and software components that are there to enhance the end-user experience. That’s the reason why Python is part of any macOS (or Linux) distribution. As a result, typing python into a shell should launch the interactive Python interpreter, prompting you to enter a valid Python expression. But that Python installation is not meant for development (at least on macOS). This is due to (a) security concerns along with (b) version incompatibility. Regarding the latter, consider that macOS features Python 2.7.x (at the time this post was written), which is compatible with any Python-based functionality offered by Apple, but not with the latest Keras, Pandas, or NumPy releases, for example. Or think of research tools that have not been ported to the latest Python versions and most probably never will be. Hence, most of the time we need to install at least one Python version of our own, to work securely and easily.
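Before installing anything, it is worth checking what is already there. A quick sketch of that check (python3 is used here, since on recent systems the bare python command may be missing or may still point at 2.7):

```shell
command -v python3   # which executable the shell resolves, based on $PATH
python3 --version    # the version that executable reports
```

The first line tells you where the interpreter lives; the second tells you which version you would actually be running.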

PIP

Another thing meant to make our lives easier is pip, the Python package installer. Pip can easily install any site packages (i.e. third-party components) that we need to build on top of. For example, to install the widely known pandas package, we can type the following command into a shell:

> pip install pandas

And after pandas is successfully installed, we can go on with our Python code snippet, importing and using pandas. By the way, for those who are not aware of it, pandas is a great tool for data preprocessing and light analysis.
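One hedged aside worth knowing: pip is itself a Python module, so invoking it through a specific interpreter with -m removes any ambiguity about which installation a package ends up in:

```shell
# `python3 -m pip` guarantees you use the pip that belongs to this
# exact interpreter (and therefore its site-packages directory):
python3 -m pip --version
# python3 -m pip install pandas   # would land in python3's site-packages
```

This habit becomes especially handy once several Python versions coexist on the same machine, as discussed later.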

Of course, pip does not come out of the box like Python does; we need to install it manually. The easiest way to install pip is by using the corresponding package manager (i.e. brew on macOS, apt on Ubuntu), by typing one of the following commands:

> brew install python@<PVersion>    # macOS, <PVersion> --> python version 2 or 3; Homebrew bundles pip with its python formula
> apt install python<PVersion>-pip  # Ubuntu, <PVersion> --> python version 2 or 3

we install pip, and then we should be able to install the Python packages we need. Right? Kinda...

Pip installs packages under a <some_path>/python<Version>/site-packages directory, which is one of the places the Python interpreter looks up whenever an import statement is met. This is how the installed pandas package becomes available to our code snippet above.
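You can ask the interpreter directly for that lookup list: sys.path holds every directory scanned on import (the exact entries vary per system; Debian-based distributions, for instance, use dist-packages instead of site-packages):

```shell
# Print every directory the interpreter scans when it meets an `import`:
python3 -c 'import sys; print("\n".join(p for p in sys.path if p))'
# The directory pip targets for pure-Python packages on this interpreter:
python3 -c 'import sysconfig; print(sysconfig.get_path("purelib"))'
```

If an import fails, comparing these paths against where pip actually installed the package is usually the fastest diagnosis.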

The two path variables (i.e. <some_path> and <Version>), the <PVersion> of the python-pip installation, and the $PATH environment variable are the cause of most of the Python installation pain points that this article is all about.

PATH, the environment variable

The $PATH environment variable registers all the directories that host executables. Despite being a string, like most environment variables, it behaves like an ordered list: whenever an executable is requested, the OS scans $PATH and runs the first match it locates.

> echo $PATH
/Users/someone/.pyenv/shims:/usr/local/sbin:/usr/local/Manual_installs:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:

That’s a typical $PATH layout. Assume that you have installed the following two executables:

  • /usr/local/sbin/test
  • /sbin/test

Keystroking test will result in the execution of the first one, which is the first match according to the order dictated by $PATH.
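The ordering rule is easy to verify with two throwaway scripts (the directory and command names below are arbitrary; test_cmd is used instead of test so it does not collide with the shell builtin of that name):

```shell
# Two executables with the same name, in two different directories:
mkdir -p /tmp/demo_a /tmp/demo_b
printf '#!/bin/sh\necho first\n'  > /tmp/demo_a/test_cmd
printf '#!/bin/sh\necho second\n' > /tmp/demo_b/test_cmd
chmod +x /tmp/demo_a/test_cmd /tmp/demo_b/test_cmd
# /tmp/demo_a is listed first in $PATH, so its executable wins the lookup:
PATH="/tmp/demo_a:/tmp/demo_b:$PATH" test_cmd   # prints "first"
```

Swapping the two directories in $PATH would make the other script win, which is exactly the mechanism the Python tooling below relies on.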

Python multiversioning

Assume that your laptop affords a pre-installed pythonX and you have already installed the corresponding pythonX-pip. However, you end up with that common need: to import a module that is not compatible with that specific Python version X, but with version Y. The apparent solution is to install the required pythonY along with the related pip, and pray that every other module is compatible with version Y.

> brew install python3             # macOS; pip3 ships bundled with the Homebrew formula
> apt install python3 python3-pip  # Ubuntu

That is going to work great for you. So whenever you need to explicitly target a particular Python version, just go ahead and type so:

e.g.

> pip3 install <python_3_package>
> python3 my_python_3_code.py

But this is extra work and not so elegant, is it?

Pyenv

Instead of indicating Python versions manually, you can use the pyenv package. Pyenv is responsible for installing and managing multiple Python versions seamlessly. The user is the one to dictate the global or local Python version to use, with commands like:

> pyenv shell 3.8.1  # sets python 3.8.1 as the default python for the current shell session

and, from the next moment on, typing python or pip is bound to the selected version.

This takes place through a concept named shims. Shims are lightweight executables that simply pass your command along to pyenv. I think of shims more like an adapter pattern for executables. Pyenv injects a directory of shims right at the front of the $PATH. That means that when you type python, the shims directory is the first to be visited and, guess what, a python executable happens to be there! According to the original pyenv documentation, when you keystroke pip the OS will:

  • Search your PATH for an executable file named pip
  • Find the pyenv shim named pip at the beginning of your PATH
  • Run the shim named pip, which in turn passes the command along to pyenv

By setting the version you prefer, the commands python and pip are bound to that version. And this makes Python version management a real piece of cake!
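To demystify the mechanism, here is a toy shim under a hypothetical /tmp/shims directory. A real pyenv shim would exec `pyenv exec python "$@"` instead of just echoing, but the interception works the same way:

```shell
mkdir -p /tmp/shims
cat > /tmp/shims/python <<'EOF'
#!/bin/sh
# a real pyenv shim forwards to `pyenv exec`; this toy one just reports the call
echo "shim intercepted: python $*"
EOF
chmod +x /tmp/shims/python
# With the shims directory prepended to $PATH, `python` hits the shim first:
PATH="/tmp/shims:$PATH" python --version   # prints "shim intercepted: python --version"
```

The shim never needs to know which real interpreter it will dispatch to; pyenv resolves that at call time from its configured version.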

Keep in mind that pyenv is not bootstrapped by Python; it does not depend on any Python installation, so its setup is conceptually pretty clean. A drawback, however, is that it requires a good overview of the installed versions and of the “things” that depend on them. Consider the case of a Python linter running on a specific Python version: the flexibility of global Python version management may affect the linter’s behaviour.

Virtualenv

Ok then, you set pyenv up, and you are good to go. But sooner or later you are going to suffer from the mixed requirements problem. Imagine working on two distinct projects of yours, project A and project B, each one requiring a different list of third-party packages. In the case that each project is developed under a distinct Python version, the mixed requirements problem may not be evident at first glance. But that is not the common case: usually, you are going to work on multiple projects under the same Python version. Using pip in a global way should not be considered a wise choice! Apart from a highly polluted pythonX/site-packages/ directory, you will not be able to distinguish the packages you installed for project A from those related to project B. This is the reason why virtualenv has been created: to provide a level of requirements isolation.

To install it, you can type:

> pip install virtualenv

Next, you can create a new project directory and create the related virtual environment:

> mkdir my_project_dir && cd my_project_dir  # create the project dir and move there
> virtualenv the_venv       # in case of Python 2
> python3 -m venv the_venv  # in case of Python 3

Activate the virtual environment and you are ready!

> source the_venv/bin/activate

Whatever package you install is going to be stored in that the_venv/.../site-packages directory.
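You can verify the isolation yourself with a throwaway environment under /tmp (the name is arbitrary; --without-pip merely skips the pip bootstrap to keep the sketch fast):

```shell
python3 -m venv --without-pip /tmp/demo_venv
# Inside the environment, the interpreter's prefix and its package
# directory both point into the venv, not into the system installation:
/tmp/demo_venv/bin/python -c 'import sys; print(sys.prefix)'
/tmp/demo_venv/bin/python -c 'import sysconfig; print(sysconfig.get_path("purelib"))'
```

Both paths land inside /tmp/demo_venv, which is exactly why packages installed there cannot pollute another project.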

Virtualenvwrapper

There are a couple of high-level tools that manage Python virtual environments in a clean and lean way. Pipenv and virtualenvwrapper are the two most prominent ones, to the best of my knowledge. In this post, we will go with virtualenvwrapper, because this is the one that I am familiar with. It is a Python package that abstracts all directory setup details away from the Python source code directory, into one dedicated to storing virtual environments. In this context, we need to create that dedicated directory:

> mkdir ~/.virtualenvs  # virtual environments are going to be installed into a hidden directory inside the home folder

Being a Python package, it can easily be installed by:

> pip install virtualenvwrapper

or

> sudo pip install virtualenvwrapper #avoid such sudo ops if possible!

Next, we need to make the functionality of that package accessible from the terminal. In other words, whenever we type any virtualenvwrapper subcommand in the terminal, we expect the corresponding functionality to take place. This can be enabled by adding the following lines to our ~/.bashrc or ~/.zshrc files.

> export WORKON_HOME=$HOME/.virtualenvs       # define the virtual envs directory
> source /usr/local/bin/virtualenvwrapper.sh  # load the virtualenvwrapper functionality into the terminal

And now you are probably ready to create your first environment:

> mkvirtualenv my_env  # creates a virtual env named my_env, located at $WORKON_HOME
> workon my_env        # start working inside my_env
> deactivate           # deactivates the currently enabled virtualenv

By running the following command

> lsvirtualenv

you can list the available virtual environments. By checking the ~/.virtualenvs/ directory, you can verify that all the listed virtualenvs are there as well.

Bonus hint: keep in mind that virtualenvwrapper is able to pin the Python version you need to work with. By typing mkvirtualenv test -p python2, test is going to be a python2 virtualenv, as long as python2 is available on your $PATH.

Mixing them up, the common pitfall

Despite the fact that you can combine all three tools into a highly automated and productive environment, there are cases where things may go really wrong. Plus, much of the automation you build may become obsolete in the near future. One of the trickiest concepts in the entire setup process is that the tools that manage Python environments are themselves written in Python! For instance, you may have virtualenv and virtualenvwrapper installed and running on python2, yet handling python3 virtual environments. This is a hard concept to follow and, in my opinion, the cause of most of the pain points faced during the “perfect setup”. Any reordering of the contents of the $PATH environment variable, or a nasty piece of software, may break that beautiful automation.

Containerized

This is a production-oriented solution that avoids all of the complexity mentioned above. By creating a specialized Docker container for a specific project, you can install whatever you need globally inside that container. Do not forget, though, the three main drawbacks of this particular solution: the overall performance will slightly deteriorate, visualizations are challenging to work with interactively, and docker development mode is not the same as the production one.

Final thoughts

I have used all four solutions. I think virtualenvwrapper is the one that worked best for me. I do work with multiple Python versions; to handle that, I bind each virtual environment to a particular Python version using the -p option of virtualenvwrapper. My installation battle plan goes as follows:

  • Install the Python I want to keep as the main one (this is python3.5 for me)
  • Install the pip for that particular Python version (in my case, python3-pip)
  • Install virtualenv on that Python setup
  • Install virtualenvwrapper on that Python setup
  • Bonus tip: install any linting tools on that setup instead of installing them in distinct virtual environments
  • Install any Python version I need and use virtualenvwrapper’s -p option to bind that particular version to the related environment

For me, this is the most straightforward and clearest pipeline. I don’t like using exotic tools to manage Python versions and virtual environments unless I am quite confident about what is going on under the hood.

