Workbench

For most of our work, we will use Python extended with various packages. Here is an overview with some recommendations.

  1. Python
  2. IPython
  3. Installing on your own machine
  4. NLTK
  5. NumPy
  6. Matplotlib
  7. SciPy
  8. scikit-learn

Python

We will use Python 3. If you are only familiar with Python 2, you can find overviews of the differences between Python 2 and Python 3 on the web. The main differences for my work seem to be the following:

  • Treatment of strings and encoding. In Python 3 all strings are Unicode. This makes life a lot easier when working with exotic languages, like Norwegian. But if you have earlier written Python 2 programs that use encode and decode, you may have to revise them.
  • Printing. In Python 3, print is a function, not a statement. You can no longer write print a, b. You have to write print(a, b). This gets a bit more complicated in places where you do not want a line break after the output. In combination with the print function, you should also use the "new" format method for strings.
  • Python 3 returns lazy objects in many places where Python 2 produced lists. For example, range(10) is no longer a list; it is an object of type range. This makes no difference for a construction like for i in range(10): . But it may make a difference in situations where you explicitly require a list. There you may have to write, say, list(range(10)) where range(10) would have sufficed in Python 2.
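
The three points above can be tried out in a few lines. This is only a small sketch using the standard library; the example word and the variable names are made up for illustration.

```python
# Strings are Unicode: len() counts characters, not bytes.
word = "blåbærsyltetøy"
utf8_bytes = word.encode("utf-8")    # str -> bytes
round_trip = utf8_bytes.decode("utf-8")  # bytes -> str

# print is a function; end="" suppresses the line break.
print("characters:", len(word), end="")
print(" / bytes:", len(utf8_bytes))

# "New" style formatting with str.format.
message = "{} has {} characters".format(word, len(word))
print(message)

# range(10) is a range object, not a list.
numbers = range(10)
as_list = list(numbers)  # make a real list when one is required
print(type(numbers).__name__, as_list)
```

The letters å, æ and ø each take two bytes in UTF-8, which is why the byte count is larger than the character count.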

IPython

We will work interactively with Python. The IPython package adds a lot of functionality, including auto-indent and command completion with the tab key. The help functions ? and ?? are also useful, and so are the magic commands.

Try for example

  • import nltk
  • nltk.FreqD # and hit tab
  • nltk.FreqDist?
  • nltk.FreqDist??
  • history?

On IFI's machines, the command python or python2 will start Python 2, while python3 will start Python 3, v. 3.6.8. IPython does not work with the current version of Python 3 on IFI's machines running RedHat. If you use the command /opt/ifi/python-3.7/bin/python3, you will get Python 3, v. 3.7.3. You will also be able to use IPython with /opt/ifi/python-3.7/bin/ipython3.

See also the updates on paths and Jupyter Notebook here.

Installing on your own machine

If you want to install Python together with all the packages we will use, I recommend Anaconda. It manages the packages and the dependencies between them and makes it easy to stay updated. It should work on Linux, macOS and Windows. If you run Ubuntu, you may be able to install all the packages without installing Anaconda.

NLTK

The Natural Language Toolkit (NLTK) is installed on IFI's machines and is part of Anaconda. To get access to the NLTK data on IFI's machines, you should first add the following line to the end of your .bashrc file.

  • export NLTK_DATA=/projects/nlp/nltk_data

Then log out and log in again, and you're all set. On your own machine, you have to download the NLTK data yourself. When using NLTK you are advised to run

  • import nltk, re, pprint
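
As a quick check that the import works, here is a minimal sketch that uses all three modules together. It needs no NLTK data, since the text is a made-up sentence and the tokenization is a simple regular expression, not one of NLTK's tokenizers.

```python
import nltk, re, pprint

# Made-up example text; any string will do.
text = "the cat sat on the mat and the dog sat too"

# Crude tokenization with a regular expression (an assumption for this
# sketch; NLTK has proper tokenizers, but some of them need NLTK data).
tokens = re.findall(r"\w+", text.lower())

# FreqDist counts how often each token occurs.
fd = nltk.FreqDist(tokens)

pprint.pprint(fd.most_common(2))
print("tokens:", fd.N(), "distinct:", fd.B())
```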

NumPy

NumPy is an extension package to Python for numerical computation. We will need it for several purposes. In particular, both Matplotlib and scikit-learn are built on top of NumPy. The standard (and recommended) way to import NumPy is

  • import numpy as np

Afterwards, NumPy commands are called with the np prefix, e.g.

  • myarray = np.arange(20)
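
To give an idea of what you can do with such an array, here is a small sketch. The names matrix, squares and row_sums are chosen only for illustration.

```python
import numpy as np

myarray = np.arange(20)          # the integers 0, 1, ..., 19
matrix = myarray.reshape(4, 5)   # the same numbers as 4 rows and 5 columns

squares = myarray ** 2           # arithmetic applies to every element
row_sums = matrix.sum(axis=1)    # one sum per row

print(matrix)
print(row_sums)
print(myarray.mean())
```

Operations on the whole array at once, like myarray ** 2, replace the explicit loops you would write with plain Python lists.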

Matplotlib

Matplotlib is an extension package to Python and NumPy for graphics. For our purposes, we will use the module pyplot. The standard way to import it is

  • import matplotlib.pyplot as plt

Then we can call pyplot commands, e.g.

  • plt.plot(...)
  • plt.bar(...)
  • plt.hist(...)
  • plt.boxplot(...)
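
A minimal sketch of a complete plot could look like this. The labels and counts are made-up numbers, and the "Agg" backend is an assumption that lets the script run without a display; in an interactive session you would normally skip that line and call plt.show() instead of saving.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no display available, draw off-screen
import matplotlib.pyplot as plt

# Made-up data for illustration.
labels = ["NN", "VB", "JJ"]
counts = [12, 7, 4]

plt.figure()
plt.bar(labels, counts)
plt.title("Made-up tag counts")
plt.ylabel("count")

# Save the figure as PNG to an in-memory buffer; a filename works too.
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
plt.close()
```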

SciPy

scikit-learn

Published Aug. 12, 2019 3:24 PM - Last modified Sep. 10, 2019 9:49 AM