
Intro to Coding

This guide introduces students to programming basics that apply in any language, as well as resources for learning to code in a number of languages.

Introduction to Python

Python is an interpreted, high-level, general-purpose programming language. It comes with a large standard library that provides tools suited to many tasks, and a rich ecosystem of third-party packages extends it further. For example, you can use NumPy for numerical and mathematical problems, NLTK for natural language processing, and TensorFlow for machine learning. In Python, these add-on libraries are installed as packages, which play the same role as libraries or packages in other programming languages.
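To illustrate the difference, the sketch below uses only standard-library modules (math and statistics), which ship with Python and need no installation. The scores list is made-up example data.

```python
# math and statistics are part of the standard library; no installation needed.
import math
import statistics

scores = [88, 92, 79, 95]  # made-up example data

print(statistics.mean(scores))  # 88.5
print(math.sqrt(16))            # 4.0
```

Third-party packages such as NumPy must be installed first (for example, with Conda) and are then imported the same way, with an import statement.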

Python is easy to extend (for example, with modules written in C) and can be embedded in many other software products.

Python is designed to be beautiful and fun to use. Its syntax resembles plain English, which makes it easy to learn. Its core philosophy is summarized in the document The Zen of Python, which includes principles such as:

·       Beautiful is better than ugly.

·       Explicit is better than implicit.

·       Simple is better than complex.

·       Complex is better than complicated.

·       Readability counts.
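You can print the full Zen of Python yourself by typing import this at the Python prompt. The sketch below does the same from a script and captures the text as a string. Note that the this module prints only the first time it is imported in a session; later imports are cached and print nothing.

```python
import contextlib
import io

# Importing the built-in `this` module prints The Zen of Python.
# Redirect that output into a buffer so the script can use it as a string.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    import this  # noqa: F401

zen = buffer.getvalue()
print(zen)
```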

Installation

The recommended way to install Python is to use the Anaconda Distribution. Anaconda Distribution is a free, easy-to-install package manager, environment manager, and Python distribution that includes a collection of 1,500+ open-source packages with free community support.

Historically, two major versions of Python have been in use: Python 2 (whose final release series is 2.7) and Python 3. Python 2 reached its end of life on January 1, 2020, and no longer receives updates, and most major third-party packages now support only Python 3. Python 3 is the present and future, and it has better support for internationalization (for example, strings are Unicode by default). Beginners should learn Python 3. Consider Python 2 only if you must maintain older code or rely on a package that was never ported to Python 3.
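To check which version you are running, type python --version at the command line, or ask the interpreter directly as in this minimal sketch:

```python
import sys

# sys.version_info holds the running interpreter's version numbers.
major = sys.version_info.major
minor = sys.version_info.minor

# str.format and a single-argument print() behave the same in
# Python 2.7 and Python 3, so this check runs under either version.
print("Running Python {}.{}".format(major, minor))

if major < 3:
    print("This is Python 2, which no longer receives updates.")
else:
    print("This is Python 3.")
```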

Conda is a package manager that helps you find and install packages. After installing Anaconda, you can use Conda to manage Python packages and environments.

If you do not want to install Python, consider using Google Colab, a free cloud service that lets you write and execute Python in your browser.

Jupyter Notebook

Jupyter Notebook is an open-source web application that combines code, rich text, mathematical equations, and visualizations in a single document. It can be used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. Jupyter Notebooks let data scientists create and share documents ranging from code to full-blown reports. The application provides many common user interface controls for exploring code and data interactively, which makes notebooks a powerful way to write and iterate on Python code. All of this makes Jupyter Notebook not only a good research tool but also a great tool for teaching.

When you install the Anaconda Distribution, both Python and the Jupyter Notebook application are included.

Python Fundamentals

Listed below are some key ideas in Python Fundamentals:

  • Strings: A string is a sequence of characters. You create one by enclosing text in quotes; either single quotes ('Word') or double quotes ("Words") will work.
  • Lists: A list stores multiple values in a single variable. Lists are ordered and changeable, and their values can be of mixed types (for example, numbers and strings).
  • Tuples: A tuple is an ordered collection like a list, except that tuples are immutable: once created, they cannot be changed.
  • Dictionaries: A dictionary is an associative container that maps keys to values, so you access an element by its key rather than by a numeric index. Use a dictionary when you need fast and convenient access to elements of a collection by a search key.
  • If ... Else: Python supports the usual comparison conditions from mathematics (==, !=, <, <=, >, >=). The if, elif, and else statements run different code depending on whether those conditions are true.
  • For Loops: A for loop iterates over a sequence (such as a list, tuple, or string), running its body once for each item.
  • Functions: A function is a block of code that runs only when it is called. You can pass data into a function through its parameters, and a function can return data as a result.
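The short script below walks through each idea in the list above, in the same order. It is a minimal sketch; the variable and function names (such as describe and area) are made up for illustration.

```python
# Strings: single or double quotes both work.
word = 'Word'
words = "Words"

# Lists: ordered, changeable, and may mix types.
mixed = [1, 2.5, "three"]
mixed.append(4)   # lists can grow
mixed[0] = 10     # and items can be replaced

# Tuples: ordered like lists, but immutable.
point = (3, 4)
tuple_is_immutable = False
try:
    point[0] = 5  # item assignment on a tuple raises TypeError
except TypeError:
    tuple_is_immutable = True

# Dictionaries: look up values by key instead of by index.
ages = {"alice": 30, "bob": 25}
bob_age = ages["bob"]

# If ... Else: branch on comparison conditions.
def describe(number):
    """Return a word describing the sign of a number."""
    if number > 0:
        return "positive"
    elif number < 0:
        return "negative"
    else:
        return "zero"

# For loops: run the body once per item in a sequence.
total = 0
for value in [1, 2, 3, 4]:
    total += value

# Functions: receive data through parameters and return a result.
def area(width, height):
    """Return the area of a rectangle."""
    return width * height

print(word, words)                             # Word Words
print(mixed)                                   # [10, 2.5, 'three', 4]
print(point, tuple_is_immutable)               # (3, 4) True
print(bob_age)                                 # 25
print(describe(-7), describe(0), describe(5))  # negative zero positive
print(total)                                   # 10
print(area(3, 4))                              # 12
```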

Data Analysis with Python

Data analysis with Python covers manipulating, processing, cleaning, and crunching data. The basic packages you need are NumPy and pandas. The two main packages for data visualization are Matplotlib and seaborn. To run a simple ordinary least squares (OLS) regression, you can use statsmodels (imported as statsmodels.api). For other regression models or for machine learning, consider the scikit-learn package.
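A minimal sketch of that workflow, assuming NumPy and pandas are installed (both come with Anaconda): it cleans a small made-up table and fits a simple OLS line. To keep dependencies small, the fit uses NumPy's least-squares solver (np.linalg.lstsq) in place of statsmodels. The column names hours and score are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: study hours and exam scores, with one missing value.
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, np.nan],
    "score": [52.0, 54.0, 56.0, 58.0, 61.0],
})

# Cleaning: drop any row that contains a missing value.
clean = df.dropna()

# Crunching: a simple summary statistic.
mean_score = clean["score"].mean()

# OLS fit of score = intercept + slope * hours via least squares.
# The column of ones in the design matrix supplies the intercept term.
X = np.column_stack([np.ones(len(clean)), clean["hours"].to_numpy()])
y = clean["score"].to_numpy()
coefficients, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coefficients

print(f"rows after cleaning: {len(clean)}")   # rows after cleaning: 4
print(f"mean score: {mean_score}")            # mean score: 55.0
print(f"intercept: {intercept:.2f}, slope: {slope:.2f}")
```

With statsmodels, the equivalent fit is sm.OLS(y, sm.add_constant(x)).fit() after import statsmodels.api as sm; calling summary() on the result prints a full regression table, including standard errors and p-values.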

Textual Analysis with Python

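Python is well suited to text analysis: its built-in string tools handle basic tasks such as splitting text into words and counting them, and packages such as NLTK support more advanced natural language processing.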
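As a small illustration, this sketch counts word frequencies using only the standard library (the re module and collections.Counter). The sample text reuses lines from The Zen of Python quoted earlier.

```python
import re
from collections import Counter

text = ("Readability counts. Simple is better than complex. "
        "Complex is better than complicated.")

# Lowercase the text, then pull out runs of ASCII letters as words.
words = re.findall(r"[a-z]+", text.lower())

# Count how often each word appears.
counts = Counter(words)

print(len(words))            # 12
print(counts["complex"])     # 2
print(counts.most_common(3))
```

For real text, a dedicated tokenizer such as NLTK's word_tokenize handles punctuation and contractions more carefully than this simple regular expression.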