BE.180

From OpenWetWare
Jump to navigationJump to search

BE.180 -- Biological Engineering Programming

BE.180 is a new course that will be offered for the first time in the Spring of 2006. This is a required course for second semester sophomores who are majoring in Biological Engineering. Drew Endy is leading the development of the course.

The current course description is: "Example problems from biological engineering are used to develop structured computer programming skills and explore the theory and practice of complex systems design and construction."

After thinking a bit, talking with Tom Knight, and thinking some more, the theme of the course is starting to focus on the idea of designing and coding the CAD environment for Biological Engineering (aka, Engineering of Biology. aka, "synthetic biology: an engineering technology based on living systems." The way this might work is for a different CAD environment feature to be used each week as a motivating problem for exploring concepts in structured system design and implementation, and computer programming. As a result of this approach, there may not be any *single* language used in the course -- we'll use the languages best suited to the problems.

Current Tasks

  1. Collect list of features that can serve as motivating examples (finish at 11/3 meeting)

Future Tasks

  1. Prioirtize list of features, evaluating on coverage of concepts, fun factor, and feasibility
  2. Develop teaching modules

Next Planning Meeting

  1. Thursday November 3, 2005, 6-7:30 (immediately following BE.526), 56-614

Feature Sandbox (add features ideas here, no idea is bad in the sandbox)

  1. Analysis of Sequence Data for Patterns / Features
    • Manipulation of data/text
    • Pattern Recognition (regular expressions), Logic
  2. Tuning Codon Usage
    • Codon tables
    • Expression optimization
    • Watermarking/Sign your work
    • Restriction site removal/addition
    • Design a genetic code such that every point mutation is non-sense (HINT: should you use a 4-base code?)
  3. Some sort of graphing/visual depiction
  4. Some sort of modeling/dynamic systems
    • Stochastic simulations
    • Feedback
    • Non-linear systems
      • Perhaps an extension for BE.181 - use an ODE solver in 180, then in 181 you study how that ODE solver worked
  5. Biological information representation (note, this is a half-baked idea, please contribute/revise - RS)
    • There are hierarchies of biological components (parts, devices and systems) and associated information. There are hierarchical data structures. How should biological information be represented? How do you search this information? Such a topic could conceivably cover topics/concepts like data structures, searching, object-oriented programming etc.
  6. Search part-part junctions for secondary structure (e.g., ORF/RBS/Operator junctions)
    • Call-out to external tool
  7. Clustering of parts, devices and systems
    • How do you group or categorize parts? How do you measure "distances" between systems? This becomes important as the registry gets bigger and as we move to a network of registries. I want to find all BioBricks that are "similar" to mine.
    • clustering methods
    • distance measures
  8. Homology modeling (useful for synthetic transcription factors or designing linkers?)
    • Call-out to external tool
    • Simulated annealing, energy minimization, MD
  9. Designing sequences to be synthesized subject to various constraints (e.g. design overlapping oligos with similar melting temps)
  10. Obfuscating Biological Programming
    • Ok, I know this is probably a terrible idea, but this is something I'd be interested in AC
    • What's the worst way to design a biological system?
    • How do I design a sequence that will have the functionality of poliovirus that could not be detected by current algorithms (e.g. whatever Blue Heron uses)
    • Cellular Perl to solve everyone's quick hacking needs
  11. Automated design and layout
    • If I have a circuit with some repressors, I'd rather a system automatically determine which repressors work best together. How to best determine a kind of family fitness?
  12. Information storage, compression and transmission - how well does biology do at storing information in an efficient manner,. Where is information rich and where is it not (ties into finding sequence motifs above). Maybe some information theory could feature here.--BC

Concepts in system design and implementation, and computer programming

In parallel with developing a features list, it may be appropriate to generate a list of fundamentals that students should take away from the course. (Credit to Bree for also suggesting this.) Features could be matched with concepts from CS (and lessons from biology) as appropriate. Started brainstorming a list here. -- RS

I think that it's important to make the teaching of programming concepts the primary goal, and the applications to biological engineering should be secondary. So students should all take away how to program, and in the process they may develop interesting and/or useful tools. Maybe the way to do it is to decide on the relevant programming concepts, and then come up with features as teaching examples and assignments that cover these concepts (not the other way around). I think that there is nothing wrong with teaching it like a normal introductory programming course, but using all biologically relevant examples and problems along the way. -- TMT
[this is a good suggestion, but if your primary goal is to learn how to program computers you should take 6.001 -- Endy 20:04, 6 Oct 2005 (EDT)].
What are the prerequisites for the course? If you don't explicitly require 6.001 or 1.00, you'll probably get a number of students who effectively don't know how to program. I don't think that you can require either of those courses without adding them to the BE SB requirements, which isn't currently the case. So you may need to teach it as an intro programming course. -Jkm 15:52, 10 Oct 2005 (EDT)

  1. Data structures: linked lists, trees, stacks etc. Classes and inheritance?
    • Abstraction
  2. Searching: depth-first, breadth-first, heuristic
    • Backtracking, pruning, efficiency analysis
  3. Sorting
  4. Computation: FSM's, Turing machines etc.
  5. Documentation
    • Readable commenting and coding styles
    • How not to program like biology
  6. Recursion
  • Explicit vrs implicit algorithms: A computer program [generally] embodies an explicit algorithm ie a very specific logic flow that is spelled out in the code. Biological systems, in contrast, embody implicit algorithms -- there is no "master intelligence" making sure that the right things happen at the right time, it all happens as a consequence of the basic laws of physics and chemistry ie the behavior is emergent, at a very basic level. From that perspective, it might be instructive to compare "designed" programs with "evolved" programs eg programs produced via Genetic Programming. See, for example, PushGP - AM
  • If most people have no programming background, using multiple languages may be a bad idea, because they'll not only have to get their head around fundamental programming concepts but also repeatedly have to ramp up on different syntaxes, language idiosyncracies and capabilities etc. That could become pretty frustrating. - AM
  • Digital vs. continuous systems - This may fall under abstraction, but it would be cool to explicitly deal with the pros/cons of digital and continuous systems. This could lead to a discussion of why biology might use one or the other and why programming might use one or the other and how we can approximate both when programming.--BC
  • For abstraction - originally give them a function (or a class) that they use as a black box. Then, as a later assignment (when they know a little more), have them rewrite that function with the restriction that the black box stay the same.
  • Thinking more about programming languages, I could understand having two - a traditional first language (Java, C) and MATLAB. In my experience, something like Java is eaiser for a lot of basic CS topics - data structures, OOP, etc. I find Matlab simpler for data manipulation, though.

Biology/CS Matching

  1. Simple Control structures (case, for, etc) - Give them a DNA sequence, ask for the mRNA sequence and the protein sequence (given a codon table? or writing their own?)
    • Probably an early exercise - can you manipulate information and pass it back and forth between functions.
  2. Nonlinear systems - Modeling an oscillator or switch. Analyze it analytically, then duplicate that computationally (stolen from 7.81). Basically, replicate Collins/switch or Elowitz/oscillator models.
    • Elowitz modified his simulation to compare discrete vs continuous. Could be a chance to introduce the idea of stochasticity, why it's important, how to model it.
  3. Data structures, classes, OOP, etc - Annotation. If I give you a stretch of DNA, how do you represent all of the associated data (object of class DNA contains objects of class Promoter, each of which contain a location/strength/whatever). Includes writing the procedures to pass data between objects?
  4. Binary Trees - Evaluating lineages. This is getting hazy, but basically feeding off of Elowitz's noise measurements. Noise calculation ends up playing a significant role in BE.309, so this might be a good time to introduce the idea. -Jkm 00:08, 4 Nov 2005 (EST)

Programming Concepts and Tutorials

  1. High-level general engineering concepts
    1. Abstraction: Abstraction is a mechanism to reduce and factor out details so that one can focus on few concepts at a time. (From http://en.wikipedia.org/wiki/Abstraction_%28programming%29)
      1. EXAMPLES: Physics to EE (1800s), in synthetic biology: parts (proteins) --> devices (inverter) --> systems (ring oscillator)
    2. Standardization: Standardization is the process of publicly establishing a technical standard. (From http://en.wikipedia.org/wiki/Standardization)
      1. EXAMPLES: use of the International System of Units (SI) in science, standardization of screw threads and nuts, use of SBML (systems biology markup language), use of FASTA files to carry sequence data
    3. Decomposition: Decomposition, otherwise known as factoring or decoupling, refers to the process by which a complex problem or system is broken down into parts that are easier to conceive, understand, program, and maintain. (From http://en.wikipedia.org/wiki/Decomposition_%28computer_science%29)
      1. EXAMPLE: VLSI electronics, 1970s (?)
  2. High-level programming concepts
    1. Data abstraction: Data abstraction is the enforcement of a clear separation between the abstract properties of a data type and the concrete details of its implementation.
      1. EXAMPLES: lists and dictionaries in Python
    2. Abstraction of functions / reusable code: Reusability is the likelihood a segment of structured code can be used again to add new functionalities with slight or no modification. Reusable code reduces implementation time, increases the likelihood that prior testing and use has eliminated bugs and localizes code modifications when a change in implementation is required. Subroutines or functions are the simplest form of reuse. A chunk of code is regularly organized using modules or namespaces. The ability to reuse relies on the ability to build larger things from smaller parts, and being able to identify commonalities among those parts. Reusable code can be implemented within the context of the individual, whereas standardization implies a public specification of interface. (From http://en.wikipedia.org/wiki/Reusability)
      1. EXAMPLE: a function "mRNAtoprotein" which converts an mRNA sequence into an amino acid sequence, which can be called without knowing the details of how the function works
    3. Iteration: Iteration is the repetition of a process. It can be used both as a general term, synonymous with repetition, and to describe a specific form of repetition with a mutable state (for example the counter "i" in a "for" loop). When used in the first sense, recursion is an example of iteration. (From http://en.wikipedia.org/wiki/Iteration)
      1. EXAMPLES: for loops, while loops
    4. Recursion: Mathematical recursion involves a function calling on itself over and over until reaching an end state. A commonly used example is the function used to calculate the factorial of an integer. (From http://en.wikipedia.org/wiki/Recursion)
      1. EXAMPLES: Fibonacci numbers: f(n) = f(n − 1) + f(n − 2),
    5. Object orientation: The idea behind object-oriented programming (OOP) is that a computer program may be seen as composed of a collection of individual units, or objects, that act on each other, as opposed to a traditional view in which a program may be seen as a collection of functions or procedures, or simply as a list of instructions to the computer. (From http://en.wikipedia.org/wiki/Object-oriented_programming#Formal_definition)
      1. EXAMPLES: C++, Java, Python and C#
      2. Other languages with object-oriented features: Ada, BASIC, Lisp, Fortran, Pascal
      3. OOP concepts
        1. Class: the unit of definition of data and behavior; a class (for example, Dog) is the basis of modularity and structure in an object-oriented computer program
        2. Object: an instance of a class; for example, Spot the Dog
        3. Inheritance: a mechanism for creating subclasses; inheritance provides a way to define a (sub)class as a specialization or subtype of a more general class (as Dog is a subclass of Canine). It is intended to help reuse of existing code.
        4. Abstraction: the ability of a program to ignore the details of an object's (sub)class and work at a more generic level when appropriate; for example, Spot the Dog may be treated as a Dog much of the time
    6. Scope: The scope of a variable describes where in a program's text a variable may be used, while extent (or lifetime) describes when in a program's execution a variable has a value. (From http://en.wikipedia.org/wiki/Scope_%28programming%29)
      1. Local variable: A variable that is given local scope. Such variables are accessible only from the function or block in which it is declared.
      2. Global variable: A variable that does not belong to any subroutine in particular and can therefore can be accessed from any context in a program.
    7. Algorithm: a finite set of well-defined instructions for accomplishing some task which, given an initial state, will terminate in a corresponding recognizable end-state. Informally, the concept of an algorithm is often illustrated by the example of a recipe, although many algorithms are much more complex; algorithms often have steps that repeat (iterate) or require decisions (such as logic or comparison). (From http://en.wikipedia.org/wiki/Algorithm#Classification_by_design_paradigm)
        1. EXAMPLES: sort algorithm, search algorithm
      1. Computational complexity theory: the branch of the theory of computation that studies the resources required during computation to solve a given problem. The most common resources are time (how many steps it takes to solve a problem) and space (how much memory it takes).
      2. Big O notation: Big O (standing for "order of") notation is a mathematical notation used to describe the asymptotic behavior of functions. More precisely, it is used to describe an asymptotic upper bound for the magnitude of a function. Big O notation is useful when analyzing algorithms for efficiency. Big O can also be used to describe the error term in an approximation to a mathematical function.(From http://en.wikipedia.org/wiki/Big_O_notation)
  3. Low-level programming concepts
    1. Programming style
      1. Comment your code
      2. Indentation and spacing: Using a logical and consistent indent and spacing style makes one's code more readable. (From http://en.wikipedia.org/wiki/Programming_style)
    2. Control structures
      1. Expressions and operators (+,==,*, etc)
      2. Loops: A loop is a sequence of statements which is specified once but which may be carried out several times in succession. (From http://en.wikipedia.org/wiki/Control_structure)
        1. Count-controlled loops: (For loops) Loops that can be repeated a certain number of times.
          1. EXAMPLE: Python
factorial = 1
for i in range(1, 6):    # range() gives values to one less than the second argument
  factorial *= i
print factorial
        1. Condition-controlled loops: (While loops) Loops that can be repeated until some condition changes.
      1. Conditional statements: (If-Then clause) Requests to the computer to make an execution choice based on a given condition. (From http://en.wikipedia.org/wiki/Conditional_statement)
        1. EXAMPLE:
If (condition) Then
    (statements)
Else
    (statements)
End If 
      1. Functions
    1. Input / output
    2. Programming stategies
      1. Pseudo-code: Description of a computer programming algorithm that uses the structural conventions of programming languages, but omits detailed subroutines or language-specific syntax. (From http://en.wikipedia.org/wiki/Pseudocode)
      2. Unit testing / modular coding
      3. Code validation
      4. Debugging
      5. Asserts
      6. Error handling
  1. Here we will upload an introductory Python tutorial, including:
    1. Getting started with Python
    2. Data structures: lists, dictionaries, strings
    3. Text manipulation
    4. Regular expressions
    5. Function declaration
    6. Biopython
  2. Here we will upload an introductory MATLAB tutorial, including:
    1. Getting started with MATLAB
    2. Data structures: Matrices, vectors
    3. Matrix manipulation
    4. Data visualization
    5. Numerical solvers
    6. Function declaration
    7. Systems biology toolbox
  3. Possible problem sets or examples
    1. Use Python to practice reading/writing files and using dictionaries to translate a string of nucleotides into amino acids. Each key would be a codon, and each value would be the corresponding amino acid. Give students a text file with all 64 codons that they have to read into a dictionary. Output the protein sequence to a file. Then, have students verify their translation using Biopython's Bio.Translate as discussed here. http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/genbank/
    2. Use Python to find the start and stop codons in a cDNA sequence; output their positions and the length of the encoded protein.