Workbench

For most of our work, we will use Python extended with various packages. Here is an overview with some recommendations.

  1. Python
  2. IPython
  3. Installing on your own machine
  4. NLTK
  5. NumPy
  6. Matplotlib
  7. SciPy
  8. scikit-learn

Python

The first question for a Python user is whether to use Python 2 or Python 3. Python 3 was first introduced nearly 10 years ago. The community has been slow to adopt it because Python 3 is not backward compatible: many extension packages and users' own old programs would not run in Python 3. In the last 3 years or so, things have changed. Python 3 has matured, most important packages are available for it, and more and more users have taken the step. IFI's new first programming course, IN1000, uses Python 3. Hence we recommend Python 3, and we will use it in the examples in the course.

On IFI's machines, the commands python2 and ipython2 start Python 2, while python3 and ipython3 start Python 3. For some strange reason, the command python starts Python 2, while ipython starts Python 3 (as of Aug. 2018).

You can find the differences between Python 2 and Python 3 on the web. The main differences for my work seem to be the following:

  • Treatment of strings and encoding. In Python 3 all strings are Unicode. This makes life a lot easier when working with exotic languages, like Norwegian. But if you have earlier written programs in Python 2 using encode and decode, you may have to revise them.
  • Print statements. You can no longer write print a, b. Since print is now a function, you have to write print(a, b). This gets a bit more complicated in places where you do not want a line break; there you pass the keyword argument end, as in print(a, b, end=""). In combination with the print function, you should also use the "new" format method for strings.
  • Python 3 returns lazy objects (iterators, views and the like) in many places where Python 2 produced lists. For example, range(10) is no longer a list; it is an object of type range. This makes no difference for a construction like for i in range(10): . But it may make a difference in some situations where you explicitly require a list. There you may have to write, say, list(range(10)) where range(10) would have sufficed in Python 2.
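The three differences above can be tried out in a few lines. This is only a small sketch for Python 3; the variable names and the example word are my own choices for illustration.

```python
# 1. Strings: str holds Unicode text, bytes holds encoded data.
word = "blåbærsyltetøy"           # Norwegian word with three non-ASCII letters
data = word.encode("utf-8")       # str -> bytes
print(len(word), len(data))       # number of characters vs. number of bytes
print(data.decode("utf-8") == word)

# 2. print is a function; end="" (or end=" ") suppresses the line break.
a, b = 3, 4
print(a, b, end=" ")
print("{} plus {} is {}".format(a, b, a + b))

# 3. range(10) is a lazy range object, not a list.
r = range(10)
print(type(r).__name__)
for i in r:                       # looping works exactly as in Python 2
    pass
numbers = list(r)                 # make a real list when list methods are needed
numbers.append(10)
print(numbers)
```

The call list(r) is only needed when you want list behaviour such as append. Looping, indexing and len work directly on the range object.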

IPython

We will work interactively when working with Python. The IPython package adds a lot of functionality, including auto-indent and command completion with the tab key. The help functions ? and ?? are also useful, and so are the magic commands.

Try for example

  • import nltk
  • nltk.FreqD #and hit tab
  • nltk.FreqDist?
  • nltk.FreqDist??
  • history?

Installing on your own machine

If you want to install Python together with all the packages we will use, I recommend Anaconda. It manages all the packages and dependencies and makes it easy to stay updated. It should work on Linux, macOS and Windows (I have only tried it on Linux so far). If you run Ubuntu, you may be able to install all packages without installing Anaconda.

NLTK

The Natural Language Toolkit (NLTK) is installed on IFI's machines and is part of Anaconda. To get access to the NLTK data on IFI's machines, you should first add the following line to the end of your .bashrc file.

  • export NLTK_DATA=/projects/nlp/nltk_data

Then log out and log in again, and you're all set. On your own machine, you have to download the NLTK data yourself, for example by calling nltk.download() from Python. When using NLTK you are advised to run

  • import nltk, re, pprint
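After editing .bashrc, it can be useful to check from Python that the variable is actually visible in your session. The following is a minimal sketch using only the standard library. The helper name nltk_data_dirs is hypothetical and not part of NLTK, and the sketch assumes that several directories, if present, are separated by the platform's path separator.

```python
import os

def nltk_data_dirs(environ=os.environ):
    """Return the directories listed in NLTK_DATA (hypothetical helper)."""
    raw = environ.get("NLTK_DATA", "")
    # Assumption: multiple directories are separated by os.pathsep (":" on Linux).
    return [d for d in raw.split(os.pathsep) if d]

dirs = nltk_data_dirs()
if dirs:
    for d in dirs:
        status = "found" if os.path.isdir(d) else "missing"
        print("{}: {}".format(d, status))
else:
    print("NLTK_DATA is not set in this session")
```

If the variable is reported as not set, the most likely cause is that the shell was not restarted after .bashrc was changed.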

NumPy

NumPy is an extension package to Python for numerical computation. We will need it for several purposes; in particular, both Matplotlib and scikit-learn are built on top of NumPy. The standard (and recommended) way to import NumPy is

  • import numpy as np

Afterwards, NumPy functions are called with the np prefix, e.g.

  • myarray = np.arange(20)
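To give a feeling for what such an array can do, here is a short sketch that continues from the np.arange(20) call above. The names grid and doubled are only illustrative.

```python
import numpy as np

myarray = np.arange(20)           # the integers 0, 1, ..., 19
grid = myarray.reshape(4, 5)      # same numbers, viewed as 4 rows by 5 columns
doubled = myarray * 2             # arithmetic applies to every element at once

print(grid.shape)
print(myarray.sum(), myarray.mean())
print(doubled[:5])
print(grid.sum(axis=0))           # column sums
```

The elementwise arithmetic is the main reason to prefer arrays over plain lists for numerical work: no explicit loop is needed.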

Matplotlib

Matplotlib is an extension package to Python and NumPy for graphics. For our purposes, we will use the module pyplot. The standard way to import it is

  • import matplotlib.pyplot as plt

Then we can call pyplot commands with e.g.

  • plt.plot(...)
  • plt.bar(...)
  • plt.hist(...)
  • plt.boxplot(...)
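As a small illustration of how these calls fit together, the sketch below draws one curve and saves it to a file. The data and the file name squares.png are made up for the example. The Agg backend is selected only so that the script also runs on a machine without a display; in an interactive session you would normally skip that line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")             # non-interactive backend, assumed here for scripts
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(20)
y = x ** 2

plt.figure()
plt.plot(x, y)                    # line plot of y against x
plt.xlabel("x")
plt.ylabel("x squared")
plt.title("A first plot")
plt.savefig("squares.png")        # hypothetical output file name
```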

SciPy

scikit-learn

Published Aug. 13, 2018 9:21 PM - Last modified Aug. 13, 2018 9:21 PM