Bioinformatics Courseware (Chinese Academy of Sciences)


Bioinformatics
Han Chunsheng (韩春生), Research Professor, Institute of Zoology, Chinese Academy of Sciences, Winter Term 2011

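About me
• 2000: PhD, Department of Biochemistry, Missouri State University, USA
• 2000~2003: Senior bioinformatics scientist, Lexicon pharmaceutical company, Houston, USA
• 2004: Selected for the Hundred Talents Program of the Chinese Academy of Sciences. Current research directions: 1. spermatogenesis; 2. stem cell self-renewal and differentiation
• Technical expertise: molecular biology, stem cells, bioinformatics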

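Course description
Course number: 511012Y
Course type: disciplinary foundation course
Class hours/credits: 40/2
Prerequisites: molecular biology, genetics, statistics, C programming
Teaching objectives and requirements: Bioinformatics is a research approach that uses mathematical models and computer programs to analyze the data produced in biological research, draw conclusions from them, and generate new scientific hypotheses. After taking this course, students should be able to:
• understand which mathematical problems, models, and methods arise in biology;
• carry out basic bioinformatics analyses using database technology, computer programming, and web tools;
• master the basic skills of nucleic acid and protein sequence analysis;
• understand how to build gene regulatory networks from data produced by microarrays and other high-throughput technologies.
The course requires basic knowledge and skills in molecular biology, genetics, statistics, and the C language. More importantly, students are expected to work at developing the ability to think about and solve biological problems with mathematical models and logical reasoning. The course is a disciplinary foundation course for doctoral and master's students in all biology specialties, and it can also be taken as an elective by graduate students in related disciplines such as mathematics, physics, and computer science. Assessment is by a major assignment and a final exam, weighted 50%:50%.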

Syllabus
Chapter 1: Introduction to bioinformatics (9 hours)
1. Computational problems in biology (3 hours, March 2)
Chapter 2: Sequence and structure (15 hours)
1. Sequence alignment (3 hours, March 9)
Chapter 1: Introduction to bioinformatics (9 hours), continued
2. Database principles and introduction to PHP programming (3 hours, all hands-on computer lab, March 16)
3. The R language and Bioconductor packages (3 hours, all hands-on computer lab, March 23)
Chapter 2: Sequence and structure (15 hours), continued
2. Phylogenetic trees (1.5 hours, March 30)
3. Motif discovery (1.5 hours, March 30)
4. RNA secondary structure (3 hours, April 6, Wang Xiujie (王秀杰))
5. Protein structure analysis (6 hours, April 13, Jiang Taijiao (蒋太交))
Chapter 3: From microarray data to gene regulatory networks (15 hours)
3.1 Microarray design (1 hour, April 27)
3.2 Calculation of expression values (1 hour, April 27)
3.3 Normalization (1 hour, April 27)
3.4 Differential gene expression analysis (3 hours, May 4)
3.5 Clustering (3 hours, May 11)
3.6 Introduction to networks (3 hours, May 18)
3.7 Bayesian networks and others (3 hours, May 25, Wang Xiujie (王秀杰))

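References
Textbook: The course is based mainly on reading the research literature; there is no assigned textbook.
Main references:
1. 钟扬, 张亮, 赵琼 (eds.). 简明生物信息学 [Concise Bioinformatics]. Higher Education Press, 2001.
2. 王俊, 丛丽娟, 郑洪坤. 常用生物数据分析软件 [Commonly Used Software for Biological Data Analysis]. Science Press, 2008.
3. David W. Mount. Bioinformatics: Sequence and Genome Analysis. New York: Cold Spring Harbor Laboratory, 2004.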

Features of my lectures
• Enlightening
• Interactive
• Interesting
• (Half) in English

Chapter 1: Introduction to bioinformatics
1. Computational problems in biology

Outline
1. What is bioinformatics?
2. Basic knowledge
3. Mathematical problems in biological research: from Mendel to the present

Bioinformatics: what is it?
• What is a triangle?
• What is a human being? (Plato's definition)
• What is bioinformatics? Biology is the subject, the computer is the tool, and mathematics provides the model.
It is what you are doing when you solve biological problems using a computer!

Bioinformatics in the Universe
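[Figure: nested diagram, listed here from the outermost level to the innermost]
• Universe (space + time): human civilization, non-human world
• Human civilization: sciences, arts, religions
• Sciences: natural sciences, social sciences
• Natural sciences: biology, mathematics, physics
• Innermost level: biostatistics, bioinformatics, computational biology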

What do you mean by biology?
• Taxonomy
• Physiology
• Evolution
• Cell biology
• Genetics
• Molecular biology: DNA, RNA, protein

How about computers?
• Yes: PC, server, Internet, website, FTP, Telnet, Unix/Linux, C, Perl, PHP, Java, .NET, database
• No: quantum computer, DNA computer
[Other labels on the slide: TCP/IP, electronic business, P2P, hacker, Apple, Chinese-version compiler, spreadsheet]

And mathematics?
• Object (subject): mathematics is the study of quantity (arithmetic), structure (algebra), space (geometry), and change (calculus).
• Pure mathematics vs. applied mathematics: for example, the Goldbach conjecture vs. statistics.
• How does mathematics work? Definitions, axioms, and statements lead, through reasoning (proof), to theorems (truth, knowledge).

Outline
1. What is bioinformatics?
2. Basic knowledge
3. Mathematical problems in biological research: from Mendel to the present

Definitions, notions, terminology

Sets
• A set is a group of objects.
• The objects are called elements or members.
• Example: A = {7, 21, 57}
• 7 ∈ A, 8 ∉ A

Objects, classes, interactions

Laws of Thought
1. Law of identity: 'Whatever is, is.'
2. Law of noncontradiction: 'Nothing can both be and not be.'
3. Law of excluded middle: 'Everything must either be or not be.'

Reasoning, Logic, Argument
• Reasoning is the cognitive process of looking for reasons, beliefs, conclusions, actions or feelings.
• Logic is the study of reasoning.
• An argument is a set of one or more meaningful declarative sentences (or "propositions") known as the premises, along with another meaningful declarative sentence (or "proposition") known as the conclusion.
• One approach to the study of reasoning is to identify the various forms of reasoning that may be used to support or justify conclusions. The main division made in philosophy is between deductive reasoning and inductive reasoning. Formal logic has been described as "the science of deduction". The study of inductive reasoning is generally carried out within the field known as informal logic or critical thinking.

Deductive reasoning
• Premise 1: All humans are mortal.
• Premise 2: Socrates is a human.
• Conclusion: Socrates is mortal.

Inductive reasoning
• Premise: The sun has risen in the east every morning up until now.
• Conclusion: The sun will also rise in the east tomorrow.

Statistical inference
• Statistical inference is the process of drawing conclusions from data that are subject to random variation, for example observational errors or sampling variation.

Outline
1. What is bioinformatics?
2. Basic knowledge
3. Mathematical problems in biological research: from Mendel to the present

Biological Story 1
Mendel's Laws

Mendel's Law of Segregation (the "First Law")

• Binary phenotype
• Dominance
• Gametes
• Statistics
• Combination

When any individual produces gametes, the copies of a gene separate so that each gamete receives only one copy.

Mendel's Law of Independent Assortment (the "Second Law")

Alleles of different genes assort independently of one another during gamete formation.

Computational Problems

Combinatorial principles
• Rule of sum
• Rule of product
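The two counting rules can be tried out on a Mendelian cross. The sketch below is only an illustration: the dihybrid AaBb x AaBb example, the convention that uppercase letters are dominant alleles, and the function names `gametes` and `cross` are all assumptions made here, not material from the slides.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """List the gametes of a genotype given as one allele pair per gene.

    Rule of product: one allele is chosen per gene, so the number of
    gametes is the product of the number of choices at each gene.
    """
    return ["".join(choice) for choice in product(*genotype)]


def cross(parent1, parent2):
    """Count offspring phenotypes, treating uppercase alleles as dominant."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        # A gene shows the dominant phenotype if either allele is uppercase.
        phenotype = "".join(
            a.upper() if (a.isupper() or b.isupper()) else a.lower()
            for a, b in zip(g1, g2)
        )
        # Rule of sum: disjoint gamete pairs with the same phenotype add up.
        counts[phenotype] += 1
    return counts


if __name__ == "__main__":
    dihybrid = [("A", "a"), ("B", "b")]  # hypothetical AaBb parent
    print(gametes(dihybrid))             # 2 * 2 gametes
    print(cross(dihybrid, dihybrid))     # 4 * 4 equally likely gamete pairs
```

Under these assumptions the 16 gamete pairs fall into four phenotype classes in the familiar 9:3:3:1 ratio.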

More about Mendel's Laws
• Gregor Johann Mendel was a 19th-century Austrian priest and monk.
• He was trained as a physicist, and the majority of his published works related to meteorology.
• Between 1856 and 1863, he cultivated and tested some 29,000 pea plants.
• His results were published in 1866.
• In 1900 his work was rediscovered by three European scientists: Hugo de Vries, Carl Correns, and Erich von Tschermak.
• William Bateson coined the terms "genetics" and "allelomorph" (later shortened to "allele") to describe many of its tenets; the term "gene" was introduced by Wilhelm Johannsen in 1909.
• There are very few true Mendelian characters in nature.
• R.A. Fisher
• Thomas Hunt Morgan: chromosomes, classical genetics

Biological Story 2
Hardy-Weinberg Law

Hardy-Weinberg Law (1908)
• $P(A_1) = p$, $P(A_2) = q$
• Random mating
• $P(A_1A_1) = p^2$, $P(A_1A_2) = 2pq$, $P(A_2A_2) = q^2$

Parent generation:
• $P(A_1A_1) = p_{11}$, $P(A_1A_2) = p_{12}$, $P(A_2A_2) = p_{22}$
• $p_{11} + p_{12} + p_{22} = 1$
• $P(A_1) = p = p_{11} + 0.5\,p_{12}$, $P(A_2) = q = p_{22} + 0.5\,p_{12}$
• $p + q = 1$

Mating table (first row shown; the remaining rows are left open on the slide):

Mating (father x mother)    Frequency        Offspring A1A1   A1A2   A2A2
A1A1 x A1A1                 p11 * p11        1                0      0
...                         ...              ...              ...    ...
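The mating table can be filled in mechanically. The following sketch enumerates all nine father x mother mating types under random mating and sums the offspring genotype frequencies; the function names and the example frequencies (0.5, 0.2, 0.3) are assumptions chosen for illustration only.

```python
def allele_frequencies(p11, p12, p22):
    """Return (p, q) for alleles A1 and A2 from genotype frequencies."""
    total = p11 + p12 + p22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    p = p11 + 0.5 * p12
    q = p22 + 0.5 * p12
    return p, q


def next_generation(p11, p12, p22):
    """Genotype frequencies after one round of random mating.

    Each of the nine father x mother mating types is weighted by the
    product of the parental genotype frequencies, and its Mendelian
    offspring proportions are added to the totals.
    """
    freq = {"A1A1": p11, "A1A2": p12, "A2A2": p22}
    # Probability that a parent of each genotype transmits allele A1.
    transmit_a1 = {"A1A1": 1.0, "A1A2": 0.5, "A2A2": 0.0}
    offspring = {"A1A1": 0.0, "A1A2": 0.0, "A2A2": 0.0}
    for father, f_freq in freq.items():
        for mother, m_freq in freq.items():
            mating = f_freq * m_freq
            a, b = transmit_a1[father], transmit_a1[mother]
            offspring["A1A1"] += mating * a * b
            offspring["A1A2"] += mating * (a * (1 - b) + (1 - a) * b)
            offspring["A2A2"] += mating * (1 - a) * (1 - b)
    return offspring


if __name__ == "__main__":
    p, q = allele_frequencies(0.5, 0.2, 0.3)   # hypothetical parent generation
    print(p, q)
    print(next_generation(0.5, 0.2, 0.3))      # compare with p^2, 2pq, q^2
```

With these example values p = 0.6 and q = 0.4, and the offspring frequencies should agree with p^2, 2pq and q^2 up to floating-point rounding, even though the parent generation itself is not in Hardy-Weinberg proportions.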

Thomas Hunt Morgan
• September 25, 1866 – December 4, 1945
• American evolutionary biologist, geneticist and embryologist
• Nobel Prize in Physiology or Medicine in 1933 for discoveries concerning the role the chromosome plays in heredity
• 22 books and 370 scientific papers
• The Division of Biology he established at the California Institute of Technology has produced seven Nobel Prize winners.

Biological Story 3
Linear arrangement of genes

First Genetic Map
• Alfred Sturtevant (1891-1970), as an undergraduate
• 1 map unit (mu) = 1 cM = 1% recombination = 0.88 Mbp
• Crossover vs. recombination

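Isn't there a problem using recombination rate as a measure of distance?
[Figure: loci A, B, C, D in a row, with recombination rates 0.1 (A-B), 0.2 (B-C), and 0.3 (C-D); constraint q ≤ 0.5]

• The maximum recombination rate (q) is 0.5.
• Distance (l) should be defined as the expected number of crossovers.

$q = \sum_{j \ge 0} f(2j+1)$, where $f(k)$ is the probability of $k$ crossovers between the two loci (a recombinant results from an odd number of crossovers).
Haldane map function: $l = -\tfrac{1}{2}\ln(1 - 2q)$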
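A short numerical check of the Haldane map function follows. It assumes, as Haldane's model does, that the number of crossovers is Poisson-distributed with mean l and that there is no interference; the function names are illustrative and not from the slides.

```python
import math


def haldane_distance(q):
    """Map distance l (in Morgans) from recombination rate q, 0 <= q < 0.5."""
    if not 0.0 <= q < 0.5:
        raise ValueError("q must satisfy 0 <= q < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * q)


def haldane_recombination(l):
    """Recombination rate q from map distance l (inverse of the function above)."""
    return 0.5 * (1.0 - math.exp(-2.0 * l))


def odd_crossover_probability(l, terms=40):
    """q as the sum of f(2j+1), with f the Poisson(l) probability of k crossovers."""
    return sum(
        math.exp(-l) * l ** (2 * j + 1) / math.factorial(2 * j + 1)
        for j in range(terms)
    )


if __name__ == "__main__":
    # Adjacent recombination rates from the A, B, C, D example.
    rates = [0.1, 0.2, 0.3]
    total_l = sum(haldane_distance(q) for q in rates)  # map distances add up
    print(total_l, haldane_recombination(total_l))
    print(odd_crossover_probability(total_l))          # same q via the series
```

For the A-D interval the map distances add, and converting the sum back gives q = 0.5 x (1 - 0.8 x 0.6 x 0.4) = 0.404, which stays below 0.5, whereas naively adding the recombination rates would give 0.6.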

Probability distributions
Normal distribution

binomial distribution

Poisson distribution

Genetic linkage and genetic epidemiology

1. Establish a pedigree.
2. Make a number of estimates of recombination frequency.
3. Calculate a LOD score for each estimate.
4. The estimate with the highest LOD score will be considered the best estimate.
5. By convention, a LOD score greater than 3.0 is considered evidence for linkage.
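Steps 2 to 5 can be sketched for the simplest case. The code below assumes phase-known, fully informative meioses, so the likelihood of a recombination frequency theta is theta^R (1 - theta)^(N - R) for R recombinants among N meioses, and the LOD score compares it with the likelihood under no linkage (theta = 0.5). The counts in the example (2 recombinants in 20 meioses) and the function names are made up for illustration.

```python
import math


def lod_score(theta, recombinants, meioses):
    """LOD = log10(L(theta) / L(0.5)) for phase-known informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    # Python evaluates 0.0 ** 0 as 1.0, so theta = 0 with no recombinants works.
    likelihood = theta ** recombinants * (1.0 - theta) ** nonrecombinants
    null_likelihood = 0.5 ** meioses
    if likelihood == 0.0:
        return float("-inf")
    return math.log10(likelihood / null_likelihood)


def best_estimate(recombinants, meioses, step=0.01):
    """Scan a grid of theta values and return (theta, LOD) with the highest LOD."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max(
        ((theta, lod_score(theta, recombinants, meioses)) for theta in grid),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    theta_hat, lod = best_estimate(recombinants=2, meioses=20)  # hypothetical data
    print(theta_hat, lod)
    print("evidence for linkage" if lod > 3.0 else "no evidence for linkage")
```

Picking the theta with the highest LOD score is a maximum likelihood estimate on a grid; real pedigree analysis with unknown phase or missing genotypes needs a fuller likelihood than this sketch uses.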

Maximum likelihood
• Maximum likelihood estimation (MLE) is a popular statistical method used for fitting a statistical model to data and providing estimates for the model's parameters.

Biological Story 4
Association Studies

Association Studies
• An association exists between two characteristics if they occur together in the same individual more often than would be expected by chance.
• Disease and allele
• Linkage vs. association

Case-control design

Observed counts O, with expected counts E in parentheses:

              Allele a                 Allele A                 Total
Cases         O11 (E11 = R1*C1/T)      O12 (E12 = R1*C2/T)      R1 = O11 + O12
Controls      O21 (E21 = R2*C1/T)      O22 (E22 = R2*C2/T)      R2 = O21 + O22
Total         C1 = O11 + O21           C2 = O12 + O22           T
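A common way to test such a table for association is Pearson's chi-square statistic, the sum of (O - E)^2 / E over the four cells. The slides do not name the test at this point, so treat the following as one possible sketch; the allele counts in the example are invented, and the function name is illustrative.

```python
import math


def chi_square_2x2(o11, o12, o21, o22):
    """Pearson chi-square test for a 2x2 case-control table.

    Returns (chi2, p_value, expected) where expected[i][j] = R_i * C_j / T.
    Assumes every expected count is greater than zero.
    """
    observed = [[o11, o12], [o21, o22]]
    rows = [o11 + o12, o21 + o22]  # R1, R2
    cols = [o11 + o21, o12 + o22]  # C1, C2
    total = sum(rows)              # T
    expected = [[rows[i] * cols[j] / total for j in range(2)] for i in range(2)]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, expected


if __name__ == "__main__":
    # Hypothetical counts: cases carry allele a 60 times and A 40 times,
    # controls carry a 40 times and A 60 times.
    chi2, p_value, expected = chi_square_2x2(60, 40, 40, 60)
    print(chi2, p_value, expected)
```

For small expected counts an exact test is usually preferred over this approximation.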

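Hypothesis test? What is it?
• H0: the null hypothesis
• H1: the alternative hypothesis, the negation of H0
• Significance threshold: P = 0.05
Decision: if P ≥ 0.05, retain H0; if P < 0.05, reject H0 in favor of H1.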

Type I and Type II errors

Type I errors

Type II errors

Sensitivity & Specificity…

Biological Story 5
Sequence alignment

Do you think it is easy?

ATTCGGCATTCAGTGCTAGA
ATTCGGCATTGCTAGA

Total number of alignments: $\dfrac{(n+m)!}{m!\,n!}$
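To get a feel for how fast this count grows, it can be evaluated exactly with Python's integer arithmetic. The sketch reads the slide's formula as (n + m)! / (m! n!), which is the binomial coefficient C(n + m, m); the function name and the example lengths are chosen here for illustration.

```python
import math


def alignment_count(n, m):
    """(n + m)! / (m! n!) for sequence lengths n and m, as an exact integer."""
    return math.comb(n + m, m)


if __name__ == "__main__":
    # Lengths 20 and 16 correspond to the two example sequences as read here.
    for n, m in [(5, 5), (20, 16), (100, 100)]:
        print(n, m, alignment_count(n, m))
```

Even for two sequences of about 20 bases the count runs into the billions, which is why exhaustive enumeration is not practical.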

Many ways…
• Dot matrix analysis
• Dynamic programming
• Word or k-tuple methods (FASTA, BLAST)
[Figure: dot matrix comparing the string SEQUENCEANALYSISPRIMER with itself]
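Dot matrix analysis, the first method in the list, is simple enough to sketch in a few lines. The function below marks a cell when the windows starting at position i of one sequence and position j of the other share at least `min_matches` identical characters; the parameter names and the default window of 1 are assumptions made for this illustration.

```python
def dot_matrix(seq_a, seq_b, window=1, min_matches=1):
    """Return rows of a dot plot: '*' where a window of seq_a and seq_b matches."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            # Count identical positions inside the two aligned windows.
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append("*" if matches >= min_matches else ".")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    text = "SEQUENCEANALYSISPRIMER"
    print("  " + text)
    for letter, row in zip(text, dot_matrix(text, text)):
        print(letter + " " + row)
```

Comparing a sequence with itself always fills the main diagonal; off-diagonal dots come from repeated letters. Larger windows with a higher `min_matches` threshold reduce this background noise.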

Algorithms
• In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function.
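As a concrete instance of such a finite list of instructions, here is a sketch of dynamic programming for global alignment (the Needleman-Wunsch algorithm). The scoring values (match +1, mismatch -1, linear gap penalty -2) are arbitrary assumptions chosen for the example, and the slides do not prescribe a particular scheme.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    result = global_alignment("ATTCGGCATTCAGTGCTAGA", "ATTCGGCATTGCTAGA")
    print(result[0])
    print(result[1])
    print(result[2])
```

The table has (n + 1)(m + 1) cells and each is filled in constant time, so the work grows with the product of the sequence lengths rather than with the number of possible alignments.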

Phylogeny reconstruction
Distance based
• Clustering, UPGMA (unweighted pair-group method using arithmetic averages)
Character based
• Maximum parsimony
• Maximum likelihood
You will have a good understanding of different algorithms in this field.
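UPGMA is the easiest of these methods to write down. The sketch below repeatedly joins the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the old distances. It returns only the tree topology as nested tuples and does not compute branch lengths; the input format (a dictionary keyed by pairs of taxon names) and the four-taxon distances are assumptions made for this example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping (name_i, name_j) pairs to pairwise distances.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        x, y = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        nx, ny = sizes.pop(x), sizes.pop(y)
        # The size-weighted average equals the arithmetic mean over all
        # leaf pairs between the merged cluster and cluster z.
        for z in sizes:
            d[frozenset((merged, z))] = (
                nx * d[frozenset((x, z))] + ny * d[frozenset((y, z))]
            ) / (nx + ny)
        sizes[merged] = nx + ny
    (tree,) = sizes
    return tree


if __name__ == "__main__":
    # Hypothetical distances between four taxa.
    distances = {
        ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
        ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
    }
    print(upgma(["A", "B", "C", "D"], distances))
```

UPGMA assumes a constant rate of change along all lineages, so it can give a misleading topology when rates differ between branches.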

Story 6
Genomics and all other omics…

Where are all the data from?
• High-throughput sequencing
• EST
• SAGE
• Microarray

Genomics Defined
Traditional biology: single-gene study
• Knockout by homologous recombination
• From cloning to function study, it takes 2~3 years in a traditional lab.
Genomic biology: genome-wide gene study
• Knockout by gene trapping
• 50% of the mouse genes knocked out in ES cells; 1000 mutant mice are evaluated every year at Lexicon.

Reverse Genetics
Genetics: from function to gene
From disease condition to gene cloning to drug development

Reverse genetics: from gene to function
The function of the majority of the 30,000 human genes remains unknown.

• Overexpression
• Anti-sense RNA
• Interfering RNA (knockdown)
• Mouse gene knockout

High Throughput Technologies
Traditional techs:
• Radioactive sequencing
• Southern/Northern/Western blotting
• Manual work
• Notebook and spreadsheet

High throughput techs:
• Automatic sequencing
• DNA chip
• Electronic & robotic technology
• Computer and database

Community Effort
Cottage work:
• Isolated laboratories
• Publication
• Meeting
• Independent fund application

Community effort:
• Societies/organizations
• Databases
• Internet
• Organized huge fund application

Cross-talking Between Multiple Disciplines
Single discipline:
• Biology
• Medicine
• Chemistry

Multiple disciplines:
• Statistics/Mathematics
• Computer sciences
• Electronics
• Mechanics
• Automatic control
• Social/Legal

Era of Omics
• Genomics
• Comparative genomics
• Structural genomics
• Functional genomics
• Pharmacogenomics
• Transcriptomics
• Proteomics
• Metabolomics
• Phenomics
Life is a system of high complexity.

Database technology

Programming…
• Perl/PHP
• HTML
• XML

Story 7
Microarray

Array, matrix…

Data mining
• Clustering
• Association
• Predictive modeling
• Anomaly detection

Networks
• Graphs
• Trees
• Bayesian networks

That is all for today!
Thanks for your attention!

