Introductory Unix & Python Tutorial
Welcome to the Unix and Python Tutorial for BE.180! We will be holding tutorial sessions on Wednesday and Thursday this week (time/location of tutorials to be discussed in class), and we expect all of you to attend. The first homework assignment, which assumes basic Python programming knowledge, will be handed out on Thursday, so it is in your best interest to come to these sessions.
Upon completion of this tutorial, you should know how to do the following:
- Log in and out of UNIX
- Manipulate files and execute programs in UNIX
- Start and end a Python session
- Use strings, lists and dictionaries
- Write loops (for- and while-loops) and conditional statements (if / else-if / else)
- Open, modify and close a file
- Compose and call simple functions
We will run Python on Athena for this tutorial. You can log in from a Unix workstation in an Athena cluster, or with an SSH program on a PC or Mac. Since the computers we will be using are running Windows, we will use the SSH client SecureCRT. Open SecureCRT (found under Start --> Program Files) and connect to the hostname athena.dialup.mit.edu. Log in with your Athena username and password. You can download SecureCRT onto your personal computer from the Information Services and Technology website.
Foundational Engineering Concepts
- Abstraction: Abstraction is a mechanism to reduce and factor out details so that one can focus on few concepts at a time. (From http://en.wikipedia.org/wiki/Abstraction_%28programming%29)
- EXAMPLES: Physics to EE (1800s); in synthetic biology, parts (proteins) --> devices (inverter) --> systems (ring oscillator)
- Standardization: Standardization is the process of publicly establishing a technical standard. (From http://en.wikipedia.org/wiki/Standardization)
- EXAMPLES: use of the International System of Units (SI) in science, standardization of screw threads and nuts, use of SBML (systems biology markup language), use of FASTA files to carry sequence data
- Decomposition: Decomposition, otherwise known as factoring or decoupling, refers to the process by which a complex problem or system is broken down into parts that are easier to conceive, understand, program, and maintain. (From http://en.wikipedia.org/wiki/Decomposition_%28computer_science%29)
- EXAMPLE: break construction of a building into smaller separate tasks to be handled by experts (structural engineer, architect, etc.)
Basic programming concepts
- Data abstraction: Data abstraction is the enforcement of a clear separation between the abstract properties of a data type and the concrete details of its implementation. (From http://en.wikipedia.org/wiki/Abstraction_%28programming%29#Data_abstraction)
- EXAMPLES: lists and dictionaries in Python
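As a small illustration of data abstraction, the sketch below uses a Python list and a dictionary purely through their abstract operations (append, index, look up by key) without touching how Python stores them. It assumes Python 3, and the variable names and the base-pairing data are chosen only for this example.

```python
# A list keeps items in order; we use it without knowing how it is stored.
bases = ["A", "C", "G", "T"]
bases.append("U")          # add an item to the end
first_base = bases[0]      # indexing starts at zero

# A dictionary maps keys to values; we look values up by key.
complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
partner_of_g = complement["G"]

print(first_base, len(bases), partner_of_g)
```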
- Abstraction of functions / reusable code: Reusability is the likelihood a segment of structured code can be used again to add new functionalities with slight or no modification. Reusable code reduces implementation time, increases the likelihood that prior testing and use has eliminated bugs and localizes code modifications when a change in implementation is required. Subroutines or functions are the simplest form of reuse. A chunk of code is regularly organized using modules or namespaces. The ability to reuse relies on the ability to build larger things from smaller parts, and being able to identify commonalities among those parts. Reusable code can be implemented within the context of the individual, whereas standardization implies a public specification of interface. (From http://en.wikipedia.org/wiki/Reusability)
- EXAMPLE: a function "mRNAtoprotein" which converts an mRNA sequence into an amino acid sequence, which can be used as a black box and called without knowing the details of how the function works
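One possible sketch of such a black-box function is shown here, assuming Python 3. The codon table is deliberately incomplete (a full table has 64 entries), and the choices to read from the first base, stop at the first stop codon, and mark unknown codons with "X" are assumptions made for this illustration, not part of the course material.

```python
# Hypothetical, deliberately incomplete codon table: a real one has 64 entries.
CODON_TABLE = {
    "AUG": "M",  # methionine (start codon)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}

def mRNAtoprotein(mrna):
    """Translate an mRNA string into one-letter amino acid codes.

    Reads codons from the first base, stops at the first stop codon,
    and marks codons missing from the partial table with 'X'.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(mRNAtoprotein("AUGUUUGGCAAAUAA"))  # MFGK
```

A caller only needs to know the input (an mRNA string) and the output (a protein string); the table and the loop could be replaced later without changing any code that calls the function.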
- Iteration: Iteration is the repetition of a process. It can be used both as a general term, synonymous with repetition, and to describe a specific form of repetition with a mutable state (for example the counter "i" in a "for" loop). When used in the first sense, recursion is an example of iteration. (From http://en.wikipedia.org/wiki/Iteration)
- EXAMPLES: for loops, while loops
- Recursion: Mathematical recursion involves a function calling itself over and over until reaching an end state. A commonly used example is the function used to calculate the factorial of an integer. (From http://en.wikipedia.org/wiki/Recursion)
- EXAMPLES: Fibonacci numbers: f(n) = f(n − 1) + f(n − 2), factorials
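Both examples can be written as short recursive functions. This is a minimal sketch in Python 3; the end states f(0) = 0 and f(1) = 1 for the Fibonacci numbers are a common convention assumed here, since the formula above does not state them, and both functions assume a non-negative integer argument.

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n <= 1:          # end state: 0! = 1! = 1
        return 1
    return n * factorial(n - 1)

def fibonacci(n):
    """Return f(n), assuming f(0) = 0 and f(1) = 1."""
    if n < 2:           # end states
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(factorial(5))   # 120
print(fibonacci(10))  # 55
```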
- Object orientation: The idea behind object-oriented programming (OOP) is that a computer program may be seen as composed of a collection of individual units, or objects, that act on each other, as opposed to a traditional view in which a program may be seen as a collection of functions or procedures, or simply as a list of instructions to the computer. (From http://en.wikipedia.org/wiki/Object-oriented_programming#Formal_definition)
- EXAMPLES: C++, Java, Python and C#
- Other languages with object-oriented features: Ada, BASIC, Lisp, Fortran, Pascal
- OOP concepts
- Class: the unit of definition of data and behavior; a class (for example, Dog) is the basis of modularity and structure in an object-oriented computer program
- Object: an instance of a class; for example, Spot the Dog
- Inheritance: a mechanism for creating subclasses; inheritance provides a way to define a (sub)class as a specialization or subtype of a more general class (as Dog is a subclass of Canine). It is intended to help reuse of existing code.
- Abstraction: the ability of a program to ignore the details of an object's (sub)class and work at a more generic level when appropriate; for example, Spot the Dog may be treated as a Dog much of the time
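The Dog and Canine example can be sketched in Python 3 as follows. The method names speak and describe are hypothetical and chosen only for illustration.

```python
class Canine:
    """General class: data and behavior shared by all canines."""

    def __init__(self, name):
        self.name = name

    def speak(self):
        return "..."

    def describe(self):
        # Works for any subclass without knowing which one it is.
        return self.name + " says " + self.speak()


class Dog(Canine):
    """Subclass: inherits describe() from Canine and specializes speak()."""

    def speak(self):
        return "Woof"


spot = Dog("Spot")               # an object: one instance of the class Dog
print(spot.describe())           # Spot says Woof
print(isinstance(spot, Canine))  # True
```

Here describe() is written once in Canine and reused by Dog, which is the code reuse that inheritance is intended to support.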
- Scope: The scope of a variable describes where in a program's text a variable may be used, while extent (or lifetime) describes when in a program's execution a variable has a value. (From http://en.wikipedia.org/wiki/Scope_%28programming%29)
- Local variable: A variable that is given local scope. Such variables are accessible only from the function or block in which they are declared.
- Global variable: A variable that does not belong to any subroutine in particular and can therefore be accessed from any context in a program.
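The difference between local and global variables can be seen in a few lines of Python 3. The names course, greet and greeting, and the student name, are made up for this sketch.

```python
course = "BE.180"            # global variable: defined outside any function

def greet(student):
    # greeting is local to greet(); course is read from the global scope.
    greeting = "Welcome to " + course
    return greeting + ", " + student

message = greet("Ada")
print(message)

# The local name greeting does not exist outside the function.
try:
    greeting
    local_visible_outside = True
except NameError:
    local_visible_outside = False

print(local_visible_outside)
```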
- Algorithm: a finite set of well-defined instructions for accomplishing some task which, given an initial state, will terminate in a corresponding recognizable end-state. Informally, the concept of an algorithm is often illustrated by the example of a recipe, although many algorithms are much more complex; algorithms often have steps that repeat (iterate) or require decisions (such as logic or comparison). (From http://en.wikipedia.org/wiki/Algorithm#Classification_by_design_paradigm)
- EXAMPLES: sort algorithm, search algorithm
- Computational complexity theory: the branch of the theory of computation that studies the resources required during computation to solve a given problem. The most common resources are time (how many steps it takes to solve a problem) and space (how much memory it takes).
- Big O notation: Big O (standing for "order of") notation is a mathematical notation used to describe the asymptotic behavior of functions. More precisely, it is used to describe an asymptotic upper bound for the magnitude of a function. Big O notation is useful when analyzing algorithms for efficiency. Big O can also be used to describe the error term in an approximation to a mathematical function. (From http://en.wikipedia.org/wiki/Big_O_notation)
- EXAMPLE: Consider an instance that is n bits long that can be solved in n² steps. We say the problem has time complexity O(n²).
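To get a feel for quadratic growth, the sketch below counts the steps taken by a loop nested inside another loop. It is an illustration under a simplifying assumption: n is taken to be the number of items in a list rather than the number of bits in the input, and the function name is hypothetical. Python 3 is assumed.

```python
def count_pair_steps(items):
    """Visit every ordered pair of items and return the number of steps taken."""
    steps = 0
    for a in items:
        for b in items:
            steps += 1      # a comparison of a and b would happen here
    return steps

for n in (10, 20, 40):
    print(n, count_pair_steps(list(range(n))))
```

Doubling n from 10 to 20 raises the step count from 100 to 400, a factor of four, which is the behavior that O(n²) describes.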
Low-level programming concepts
- Control structures
- Expressions and operators (+,==,*, etc)
- Expressions: An expression in a programming language is a combination of values, variables, operators, and functions that are interpreted (evaluated) according to the particular rules of precedence and of association for a particular programming language, which computes and then produces (returns, in a stateful environment) another value. The expression is said to evaluate to that value. As in mathematics, the expression is (or can be said to have) its evaluated value; the expression is a representation of that value. (From http://en.wikipedia.org/wiki/Expression_%28programming%29)
- Operators: Programming languages generally have a set of operators that are similar to operators in mathematics: they are somehow special functions. In addition to arithmetic operations they often perform boolean operations on truth values and string operations on strings of text. Unlike functions, operators often provide the primitive operations of the language, their name consists of punctuation rather than alphanumeric characters, and they have special infix syntax and irregular parameter passing conventions. (From http://en.wikipedia.org/wiki/Operator_%28programming%29)
- Loops: A loop is a sequence of statements which is specified once but which may be carried out several times in succession. (From http://en.wikipedia.org/wiki/Control_structure)
- Count-controlled loops: (For loops) Loops that can be repeated a certain number of times.
- Condition-controlled loops: (While loops) Loops that can be repeated until some condition changes.
- Conditional statements: (If-Then clause) Requests to the computer to make an execution choice based on a given condition. (From http://en.wikipedia.org/wiki/Conditional_statement)
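The control structures above might look like this in Python 3. The sequence and the 0.6 and 0.4 thresholds are arbitrary values picked for the example.

```python
sequence = "ATGCGC"

# Count-controlled loop: runs once per character in the sequence.
gc_count = 0
for base in sequence:
    if base == "G" or base == "C":      # conditional with a boolean expression
        gc_count += 1

# Conditional statement with more than two branches.
gc_fraction = gc_count / len(sequence)  # arithmetic expression
if gc_fraction > 0.6:
    label = "GC-rich"
elif gc_fraction < 0.4:
    label = "AT-rich"
else:
    label = "balanced"

# Condition-controlled loop: repeats until the condition becomes false.
halvings = 0
value = 100
while value > 1:
    value = value // 2
    halvings += 1

print(gc_count, label, halvings)  # 4 GC-rich 6
```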
- Subroutines: (functions, methods, procedures, or subprograms) A portion of code within a larger program, which performs a specific task and is relatively independent of the remaining code. A subroutine is often coded so that it can be executed ("called") several times and/or from several places during a single execution of the program, possibly even by itself. (From http://en.wikipedia.org/wiki/Subroutine)
- Functions: Function and procedure often denote a subprogram that takes parameters and may or may not have a return value. Many make the distinction between "functions", that possess return values and appear in expressions, versus "procedures", that possess no return values and appear in statements.
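Python uses the same def keyword for both kinds of subprogram, so the distinction shows up only in how they are written and called. In this Python 3 sketch (function names are hypothetical), gc_content returns a value and is used inside an expression, while report is called as a statement and returns None.

```python
def gc_content(sequence):
    """Function: returns a value, so the call can appear inside an expression."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

def report(sequence):
    """Procedure-style: performs an action (printing) and returns nothing."""
    print(sequence, "has GC content", gc_content(sequence))

is_gc_rich = gc_content("GGCC") > 0.5   # function call used in an expression
result = report("ATGC")                 # call used as a statement; result is None
```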
- Input / output (I/O): The collection of interfaces that different functional units (sub-systems) of an information processing system use to communicate with each other, or the signals (information) sent through those interfaces. Inputs are the signals received by the unit, and outputs are the signals sent from it. (From http://en.wikipedia.org/wiki/Input/output)
- EXAMPLE: Keyboards and mice are considered input devices of a computer and monitors and printers are considered output devices of a computer.
- Pseudo-code: Description of a computer programming algorithm that uses the structural conventions of programming languages, but omits detailed subroutines or language-specific syntax. (From http://en.wikipedia.org/wiki/Pseudocode)
- Debugging: Debugging is a methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece of electronic hardware thus making it behave as expected. Debugging tends to be harder when various subsystems are tightly coupled, as changes in one may cause bugs to emerge in another. (From http://en.wikipedia.org/wiki/Debugging)
- Unit testing / modular coding: A unit test is a procedure used to verify that a particular module of source code is working properly. The idea behind unit tests is to write test cases for all functions and methods so that whenever a change causes a regression, it can be quickly identified and fixed. (From http://en.wikipedia.org/wiki/Unit_testing)
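A minimal sketch of unit testing with the unittest module from the Python 3 standard library is given below. The function under test, reverse_complement, is a hypothetical example and assumes its input contains only the letters A, C, G and T.

```python
import unittest

def reverse_complement(dna):
    """Return the reverse complement of a DNA string of A, C, G and T."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(dna))

class TestReverseComplement(unittest.TestCase):
    def test_known_sequence(self):
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_empty_sequence(self):
        self.assertEqual(reverse_complement(""), "")

    def test_applying_twice_restores_input(self):
        twice = reverse_complement(reverse_complement("GATTACA"))
        self.assertEqual(twice, "GATTACA")

# Load and run the test cases, then keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestReverseComplement)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

If a later change to reverse_complement breaks one of these cases, rerunning the tests points to the regression immediately. In a standalone test file, the last three lines are commonly replaced by a call to unittest.main().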
- Code validation: how to know whether your program works
- Assertions: A programming language construct that indicates an assumption on which the program is based. Programmers add assertions to the source code as part of the development process. They are intended to simplify debugging and to make potential errors easier to find. Since an assertion failure often indicates a bug, many assertion implementations will print additional information about the source of the problem (such as the filename and line number in the source code or a stack trace). Most implementations will also halt the program's execution immediately. (From http://en.wikipedia.org/wiki/Assert)
- Error handling
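The two ideas can be contrasted in a short Python 3 sketch; the function names and input values are made up for the example. The assert statement documents an assumption and halts the program if it is violated, while try/except lets the program recover from an anticipated problem such as malformed input.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # Assertion: states an assumption the rest of the function relies on.
    assert len(values) > 0, "mean() needs at least one value"
    return sum(values) / len(values)

def parse_number(text):
    """Error handling: recover from bad input instead of halting."""
    try:
        return float(text)
    except ValueError:
        return None     # signal "not a number" to the caller

readings = [parse_number(text) for text in ["1.5", "2.5", "oops"]]
valid = [reading for reading in readings if reading is not None]
average = mean(valid)
print(average)  # 2.0
```

Calling mean([]) would raise an AssertionError carrying the message given in the assert statement, which points directly at the violated assumption.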
Tips on programming style
- Always comment your code. This will allow other people to understand what you have done, and it can provide a reminder to yourself in the future.
- Use logical and consistent indentation and spacing. This makes one's code more readable and may be required in some programming languages.
- We will be providing a MATLAB tutorial during the semester, which will include:
- Getting started with MATLAB
- Data structures: Matrices, vectors
- Matrix manipulation
- Function declaration
- Data visualization
- Numerical solvers