
Basic Principles of QTL Mapping and Inclusive Composite Interval Mapping (ICIM)


The 6th Workshop on QTL Mapping and Breeding Simulation, April 19-21, 2010, Wuhan, Hubei, China

李慧慧 Institute of Crop Science, Chinese Academy of Agricultural Sciences lihuihui@caas.net.cn

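OUTLINE
- Data required for QTL mapping
- Marker data and linkage map construction
- Basic principles of QTL mapping
- Inclusive composite interval mapping (ICIM) of quantitative trait genes
- Applications of ICIM in real mapping populations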

WHAT IS QTL MAPPING?
QTL mapping is the procedure of assigning individual genetic factors, each with a small effect on a quantitative trait, to specific chromosomal segments in the genome. The key questions in QTL mapping studies are: How many QTL are there? Where are they located on the marker map? How large an effect does each of them have on the trait of interest?

DATASET OF QTL MAPPING
- Mapping population
- Linkage map
- Marker genotypes
- Phenotypic data

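QTL mapping populations
- F2 population (covered in a separate talk by Zhang Luyan)
- Backcross (BC) population
- Doubled haploid (DH) population
- Recombinant inbred line (RIL) population
- Introgression lines (chromosome segment substitution lines)
- Natural population

Classification of mapping populations
By whether the genotypes are homozygous:
- Temporary population
- Permanent population
- Natural population

By the relatedness among populations:
- Primary mapping population
- Secondary mapping population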

Expected genotype frequencies in backcross and DH populations

BC1         BC2         DH          Observed count   Expected frequency
M1M1M2M2    M1m1M2m2    M1M1M2M2    n1               f1 = (1-r)/2
M1M1M2m2    M1m1m2m2    M1M1m2m2    n2               f2 = r/2
M1m1M2M2    m1m1M2m2    m1m1M2M2    n3               f3 = r/2
M1m1M2m2    m1m1m2m2    m1m1m2m2    n4               f4 = (1-r)/2

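Maximum likelihood estimation of the recombination frequency
- Construct the likelihood function
- Construct the log-likelihood function
- Solve for the maximum likelihood estimate of the recombination frequency
- Compute the information
- Apply the estimation formulas to obtain the estimate of the recombination frequency and its variance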





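Example: in a backcross experiment, the parents P1 and P2 have genotypes AABB and aabb. The numbers of plants of the four genotypes in the BC1 generation are AABB: 162; AABb: 40; AaBB: 41; AaBb: 158.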
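The estimation steps can be illustrated on these counts. Under the expected frequencies (1-r)/2 for the two parental classes and r/2 for the two recombinant classes, the likelihood is proportional to (1-r)^(n1+n4) r^(n2+n3), so the maximum likelihood estimate is the recombinant proportion and its variance is approximately r(1-r)/n. The following is a minimal sketch under that assumption; the function name `estimate_recombination` is hypothetical and not part of the original slides.

```python
import math

def estimate_recombination(n1, n2, n3, n4):
    """MLE of r in a backcross.

    n1 and n4 are the counts of the two parental classes,
    n2 and n3 the counts of the two recombinant classes.
    """
    n = n1 + n2 + n3 + n4
    r_hat = (n2 + n3) / n
    # Inverse of the Fisher information n / (r (1 - r)), evaluated at r_hat.
    var_r = r_hat * (1.0 - r_hat) / n
    return r_hat, var_r

# Counts from the BC1 example: AABB, AABb, AaBB, AaBb.
r_hat, var_r = estimate_recombination(162, 40, 41, 158)
print(f"r = {r_hat:.4f}, SE = {math.sqrt(var_r):.4f}")
```

Here 81 of the 401 plants are recombinants, so the estimate is about 0.20.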

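Recombination frequencies among three markers

With no interference (that is, crossovers in the two intervals are independent):
$r_{13} = r_{12} + r_{23} - 2 r_{12} r_{23}$, or equivalently $1 - 2r_{13} = (1 - 2r_{12})(1 - 2r_{23})$

With complete interference (that is, a crossover in one interval completely prevents a crossover in the other interval):
$r_{13} = r_{12} + r_{23}$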

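Mapping functions
Map distance

- Units of map distance: Morgan (M) or centiMorgan (cM); 1 M = 100 cM.
- The map distance m is a function of the recombination frequency r, that is, $m = f(r)$; f is called the mapping function.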

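Common mapping functions
- Morgan mapping function
  In M: $m = r$
  In cM: $m = 100\,r$

- Haldane mapping function: assumes no interference, that is, crossovers between M1 and M2 and crossovers between M2 and M3 are independent.
  In M: $m = f(r) = -\tfrac{1}{2}\ln(1 - 2r)$, $r = \tfrac{1}{2}\left(1 - e^{-2m}\right)$
  In cM: $m = f(r) = -50\ln(1 - 2r)$, $r = \tfrac{1}{2}\left(1 - e^{-m/50}\right)$

- Kosambi mapping function: allows for interference, that is, crossovers between M1 and M2 and crossovers between M2 and M3 are not independent; the interference coefficient is a function of the recombination frequency.
  In M: $m = f(r) = \tfrac{1}{4}\ln\dfrac{1 + 2r}{1 - 2r}$, $r = \tfrac{1}{2}\,\dfrac{e^{4m} - 1}{e^{4m} + 1}$
  In cM: $m = 25\ln\dfrac{1 + 2r}{1 - 2r}$, $r = \tfrac{1}{2}\,\dfrac{e^{m/25} - 1}{e^{m/25} + 1}$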
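As an illustration, the three mapping functions in cM units can be written directly from their definitions. This is a minimal sketch; the function names are hypothetical, and r must lie in [0, 0.5) for the logarithms to be defined.

```python
import math

def morgan_cm(r):
    """Morgan: map distance equals the recombination frequency."""
    return 100.0 * r

def haldane_cm(r):
    """Haldane (no interference): m = -50 ln(1 - 2r)."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r(m_cm):
    """Inverse Haldane: r = (1 - exp(-m/50)) / 2."""
    return 0.5 * (1.0 - math.exp(-m_cm / 50.0))

def kosambi_cm(r):
    """Kosambi (with interference): m = 25 ln((1 + 2r) / (1 - 2r))."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_r(m_cm):
    """Inverse Kosambi: r = (exp(m/25) - 1) / (2 (exp(m/25) + 1))."""
    e = math.exp(m_cm / 25.0)
    return 0.5 * (e - 1.0) / (e + 1.0)

# Map distance (cM) given by each function for a few recombination frequencies.
for r in (0.05, 0.10, 0.20, 0.30):
    print(r, round(morgan_cm(r), 2), round(kosambi_cm(r), 2), round(haldane_cm(r), 2))
```

For any r between 0 and 0.5 the Kosambi distance lies between the Morgan and Haldane distances, and the three nearly coincide when r is small.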

Comparison of the three mapping functions

Relationship between genetic map distance and physical distance in different species

Species       Haploid genome size (kb)   Genetic map length (cM)   kb per cM
Yeast         2.2×10^4                   3700                      6
Neurospora    4.2×10^4                   500                       80
Arabidopsis   7.0×10^4                   500                       140
Drosophila    2.0×10^5                   290                       700
Tomato        7.2×10^5                   1400                      510
Human         3.0×10^6                   2710                      1110
Wheat         1.6×10^7                   2575                      6214
Rice          4.4×10^5                   1575                      279
Corn          3.0×10^6                   1400                      2140

EXAMPLE: 10 RILS OF RICE (LINKAGE MAP OF CHR. 5)

Marker             Position (cM)   RIL1   RIL2   RIL3   RIL4   RIL5   RIL6   RIL7   RIL8   RIL9   RIL10
C263               0.0             1      2      1      1      1      1      1      2      1      1
R830               3.5             1      2      2      1      1      1      1      2      1      1
R3166              8.5             1      2      2      1      1      1      1      1      1      1
XNpb387            19.5            1      2      2      1      1      2      1      2      1      1
R569               32.0            1      2      2      1      1      2      1      2      2      2
R1553              66.6            1      1      2      1      2      2      1      1      2      2
C128               74.1            1      1      2      2      2      2      1      1      1      1
C1402              78.6            1      1      2      2      1      2      1      1      1      1
XNpb81             81.8            1      1      2      2      1      2      1      1      1      1
C246               91.9            1      2      2      2      1      2      1      2      1      1
R2953              92.7            1      2      2      2      1      2      1      2      1      1
C1447              96.8            1      2      2      2      1      2      1      2      1      1
Grain width (mm)                   2.33   1.99   2.24   1.94   2.76   2.32   2.32   2.08   2.24   2.45

Basic principles of QTL mapping
Trait means of the three genotypes at a marker locus

Genotypes, frequencies, and genotypic values at marker locus M and quantitative trait locus Q in the backcross populations of P1: MMQQ × P2: mmqq

BC1 genotype   Frequency   Genotypic value   BC2 genotype   Frequency   Genotypic value
MMQQ           (1-r)/2     m+a               MmQq           (1-r)/2     m+d
MMQq           r/2         m+d               Mmqq           r/2         m-a
MmQQ           r/2         m+a               mmQq           r/2         m+d
MmQq           (1-r)/2     m+d               mmqq           (1-r)/2     m-a

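Principle of single-marker analysis: difference between marker genotype means
Means of the two marker genotypes:
$\mu_{MM} = (1 - r)\mu_{MMQQ} + r\mu_{MMQq} = (1 - r)(m + a) + r(m + d) = m + (1 - r)a + rd$

$\mu_{Mm} = r\mu_{MmQQ} + (1 - r)\mu_{MmQq} = r(m + a) + (1 - r)(m + d) = m + ra + (1 - r)d$

Difference between the means of the two marker genotypes:
$\mu_{MM} - \mu_{Mm} = (1 - 2r)(a - d)$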
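The dependence of the marker-class difference on r can be checked numerically. The sketch below evaluates the two BC1 marker-class means from the mixture of QTL genotypes and prints their difference; the function name and the parameter values (m = 10, a = 1, d = 0.2) are hypothetical choices made only for illustration.

```python
def marker_class_means(m, a, d, r):
    """Expected trait means of marker classes MM and Mm in BC1.

    m: mid-parent value, a: additive effect, d: dominance effect,
    r: recombination frequency between marker M and the QTL Q.
    """
    mu_MM = (1.0 - r) * (m + a) + r * (m + d)
    mu_Mm = r * (m + a) + (1.0 - r) * (m + d)
    return mu_MM, mu_Mm

# The difference shrinks from (a - d) at r = 0 to zero at r = 0.5.
for r in (0.0, 0.1, 0.3, 0.5):
    mu_MM, mu_Mm = marker_class_means(m=10.0, a=1.0, d=0.2, r=r)
    print(r, round(mu_MM - mu_Mm, 4))
```

A marker completely linked to the QTL (r = 0) shows the full difference a - d, while an unlinked marker (r = 0.5) shows none, which is why a single-marker test confounds QTL effect and QTL position.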

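Hypothesis testing in single-marker analysis
- Sub-populations
- Computing the t statistic:

$t = \dfrac{\hat{\mu}_1 - \hat{\mu}_2}{\sqrt{\dfrac{s_e^2}{df_1} + \dfrac{s_e^2}{df_2}}}$, $s_e^2 = \dfrac{df_1 \times s_1^2 + df_2 \times s_2^2}{df_1 + df_2}$

Here $s_1^2$ and $df_1$ are the mean square and degrees of freedom of the first sub-population, and $s_2^2$ and $df_2$ are the mean square and degrees of freedom of the second sub-population. The t test has $df = df_1 + df_2$ degrees of freedom.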
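A single-marker test can be sketched on the rice example, using the genotypes at marker C263 (position 0.0 cM) and the grain widths of the 10 RILs. Note one assumption: the code uses the textbook pooled two-sample t statistic, in which the pooled variance is divided by the group sizes n1 and n2; the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(x1, x2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = sum(x1) / n1, sum(x2) / n2
    df1, df2 = n1 - 1, n2 - 1
    s1 = sum((v - mean1) ** 2 for v in x1) / df1   # mean square, group 1
    s2 = sum((v - mean2) ** 2 for v in x2) / df2   # mean square, group 2
    se2 = (df1 * s1 + df2 * s2) / (df1 + df2)      # pooled error variance
    t = (mean1 - mean2) / math.sqrt(se2 * (1.0 / n1 + 1.0 / n2))
    return t, df1 + df2

# Marker C263 genotypes (coded 1 or 2) and grain width (mm) of RIL1..RIL10.
genotype = [1, 2, 1, 1, 1, 1, 1, 2, 1, 1]
grain_width = [2.33, 1.99, 2.24, 1.94, 2.76, 2.32, 2.32, 2.08, 2.24, 2.45]

group1 = [w for g, w in zip(genotype, grain_width) if g == 1]
group2 = [w for g, w in zip(genotype, grain_width) if g == 2]
t_value, df = pooled_t(group1, group2)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

With only ten lines, and two of them in one marker class, the test has little power; the example is meant to show the mechanics rather than a real result.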

INTERVAL MAPPING (IM) (LANDER AND BOTSTEIN 1989)
Linear model (j = 1, 2, …, n):

$y_j = b_0 + b^* x_j^* + e_j$

where $b^*$ is the effect of the QTL and $x_j^*$ is an indicator variable taking the values 0 and 1.

- Interval test
- Likelihood profile

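Values of the indicator variable in interval mapping for a backcross population

Marker type   Left marker i   Right marker i+1   Sample size   x*
1             +               +                  n1            1
2             +               -                  n2            1 with probability 1-p; 0 with probability p
3             -               +                  n3            1 with probability p; 0 with probability 1-p
4             -               -                  n4            0

where $p = r_{iq} / r_{i(i+1)}$.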
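The table can be read as a lookup from the flanking-marker type to the probability that the indicator variable equals 1. The sketch below assumes that r_iq is the recombination frequency between the left marker and the testing position and that r_i(i+1) is the recombination frequency between the two flanking markers, with double crossovers inside the interval ignored as in the table. The function name is hypothetical.

```python
def prob_indicator_is_one(left_plus, right_plus, r_left_to_qtl, r_interval):
    """P(x* = 1) given the genotypes of the two flanking markers.

    left_plus / right_plus: True when the marker carries the "+" genotype.
    r_left_to_qtl: recombination frequency between left marker and QTL.
    r_interval: recombination frequency between the two flanking markers.
    """
    p = r_left_to_qtl / r_interval
    if left_plus and right_plus:        # marker type 1
        return 1.0
    if left_plus and not right_plus:    # marker type 2
        return 1.0 - p
    if not left_plus and right_plus:    # marker type 3
        return p
    return 0.0                          # marker type 4

# A testing position one quarter of the way into the interval.
for left, right in ((True, True), (True, False), (False, True), (False, False)):
    print(left, right, prob_indicator_is_one(left, right, 0.05, 0.20))
```

For recombinant marker types the probability moves linearly from one marker's genotype to the other's as the testing position is scanned across the interval.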

Interval marker types and QTL genotypes in a backcross population

Additive genetic model and the derived statistical model for mapping additive QTL

The expectation of the genotypic value G conditional on known marker types can be written as a linear function of marker variables

where $b_1 = \lambda_1 a_1$, $b_j = \rho_{j-1} a_{j-1} + \lambda_j a_j$ (j = 2, …, m), and $b_{m+1} = \rho_m a_m$; $\lambda_j$ and $\rho_j$ are functions of the three recombination fractions between the jth marker and the jth QTL, between the jth QTL and the (j+1)th marker, and between the jth and (j+1)th markers.

Linear regression model

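Properties of the linear regression model of phenotype on markers
- Assuming that the effects of different QTL are additive, the partial regression coefficient of a marker depends only on the QTL located in the intervals defined by its two adjacent markers.
- Adding unlinked markers to the model effectively controls the residual genetic variance, which reduces the sampling variance of the test statistic and increases the power of QTL detection.
- Linked markers in the model reduce the influence of linked QTL on the test statistic.
- The partial regression coefficients of two markers in the model are uncorrelated.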

Composite Interval Mapping (CIM; Zeng 1994)
- Linear regression model:

$y_j = b_0 + b^* x_j^* + \sum_{k \ne i, i+1} b_k x_{jk} + e_j$

- Hypotheses:

$H_0: b^* = 0$ vs. $H_1: b^* \ne 0$

- Likelihood function under the null hypothesis:

$L_0 = \prod_{j=1}^{n} f_j(0)$

- Likelihood function under the alternative hypothesis:

$L_1 = \prod_{j=1}^{n} \left[ p_j(1) f_j(1) + p_j(0) f_j(0) \right]$

- Parameter estimation under $H_0$:

$\hat{B}_{H_0} = (X'X)^{-1} X'Y$

$\hat{\sigma}^2_{H_0} = \left[ (Y - X\hat{B}_{H_0})'(Y - X\hat{B}_{H_0}) \right] / n$

- Parameter estimation under $H_1$:

$\hat{B} = (X'X)^{-1} X'(Y - \hat{P}\hat{b}^*)$

$\hat{b}^* = (Y - X\hat{B})'\hat{P} / \hat{c}$

$\hat{\sigma}^2 = \left[ (Y - X\hat{B})'(Y - X\hat{B}) - \hat{c}\,\hat{b}^{*2} \right] / n$

$\hat{P}_j = \dfrac{p_j(1)\hat{f}_j(1)}{p_j(1)\hat{f}_j(1) + p_j(0)\hat{f}_j(0)}$, $\hat{c} = \sum_{j=1}^{n} \hat{P}_j$

- Likelihood ratio test:

$LR = -2\ln\left(\dfrac{L_0}{L_1}\right) \xrightarrow{\,n \to \infty\,} \chi^2(1)$
$\hat{p} = \dfrac{\sum_{j=n_1+1}^{n_1+n_2} (1 - \hat{P}_j) + \sum_{j=n_1+n_2+1}^{n_1+n_2+n_3} \hat{P}_j}{n_2 + n_3}$

CIM combines IM with multiple-marker regression analysis, which controls the effects of QTL in other intervals or on other chromosomes on the QTL being tested, and thus increases the precision of QTL detection.
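The EM updates can be sketched for the simplest case, in which X is only a column of ones (no background markers), so that the model reduces to interval mapping at a single testing position. This is a hedged illustration on simulated data, not the implementation of any published software: the function names, the simulated effect (b* = 2), and the prior probabilities are all assumptions made for the example.

```python
import math
import random

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def interval_mapping_em(y, prior1, max_iter=500, tol=1e-9):
    """EM fit of y_j = b0 + b* x_j* + e_j at one testing position.

    prior1[j] is p_j(1), the probability that x_j* = 1 given the markers.
    Returns (b0, b_star, var, LR).
    """
    n = len(y)
    # Null hypothesis (b* = 0): sample mean and variance.
    b0 = sum(y) / n
    var = sum((v - b0) ** 2 for v in y) / n
    lnl0 = sum(math.log(normal_pdf(v, b0, var)) for v in y)

    b_star, lnl1 = 0.0, -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability P_j that x_j* = 1.
        post = []
        for v, p1 in zip(y, prior1):
            w1 = p1 * normal_pdf(v, b0 + b_star, var)
            w0 = (1.0 - p1) * normal_pdf(v, b0, var)
            post.append(w1 / (w1 + w0))
        c = sum(post)
        # M-step: conditional updates of b*, b0 and the residual variance.
        b_star = sum((v - b0) * pj for v, pj in zip(y, post)) / c
        b0 = sum(v - pj * b_star for v, pj in zip(y, post)) / n
        var = sum(pj * (v - b0 - b_star) ** 2 + (1.0 - pj) * (v - b0) ** 2
                  for v, pj in zip(y, post)) / n
        new_lnl1 = sum(math.log(p1 * normal_pdf(v, b0 + b_star, var)
                                + (1.0 - p1) * normal_pdf(v, b0, var))
                       for v, p1 in zip(y, prior1))
        converged = new_lnl1 - lnl1 < tol
        lnl1 = new_lnl1
        if converged:
            break
    return b0, b_star, var, -2.0 * (lnl0 - lnl1)

# Simulated backcross: most individuals have non-recombinant flanking
# markers (prior 1 or 0); the rest are uncertain (prior 0.5).
random.seed(2010)
prior1, y = [], []
for _ in range(200):
    p1 = random.choice([1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0])
    x = 1 if random.random() < p1 else 0
    prior1.append(p1)
    y.append(10.0 + 2.0 * x + random.gauss(0.0, 1.0))

b0_hat, b_hat, var_hat, lr = interval_mapping_em(y, prior1)
print(f"b* = {b_hat:.2f}, LOD = {lr / (2.0 * math.log(10.0)):.1f}")
```

The LOD score is the likelihood ratio statistic divided by 2 ln 10. In a genome scan the same fit would be repeated at every testing position, with the prior probabilities recomputed from the flanking markers each time.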

Other QTL Mapping Methods
Multiple interval mapping (MIM; Kao et al. 1999)

$G_i = \mu + \sum_{j=1}^{m} a_j x_{ij} + \sum_{j<k}^{m} w_{jk}(x_{ij} x_{ik})$, $i = 1, \ldots, 2^m$

MIM is the state-of-the-art gene mapping procedure. But:
- more complex and difficult to implement
- much larger sample size
- time consuming

Bayesian model (Sillanpää and Corander 2002):
- complexity of computation
- lack of user-friendly software
- time consuming
- no parameter estimation as markers are densely distributed (127 markers, 145 sample size; DH (Xu and Jia 2007))

Statistical Methods in QTL Mapping
Frequentist statistics
- ANOVA
- Mixed model (REML, BLUP, MINQUE)
- Regression (forward, backward, stepwise)
- Maximum likelihood (EM algorithm)

Bayesian statistics
- Hierarchical (SSVS, MCMC, reversible jump MCMC, shrinkage estimation)
  Note: Xu and Jia (Genetics, 2007) found that most Bayesian models failed for a barley population (145 DHs, 127 markers).
- Empirical (mixed model, BLUP)

Which method was commonly used?
IM gives biased estimates of the locations and effects of linked QTL.

CIM is the commonly used method because of:
- its idea: it controls the effects of QTL in other intervals or on other chromosomes, and thus increases the precision of QTL detection
- its user-friendly software: Windows QTL Cartographer

PROBLEMS WHEN USING CIM
- CIM may increase the sampling variance compared with IM and thus decrease the mapping power.
- Different background marker selection methods may give very different mapping results, and it is not clear which model selection method should be used.
- CIM has not been extended to epistasis mapping (Zeng et al. 1999).

Problems with the algorithm in CIM
We found that in Zeng's algorithm, both the QTL effect at the current testing position and the regression coefficients of the marker variables used to control the genetic background are estimated simultaneously in an expectation-maximization (EM) algorithm. This algorithm therefore cannot fully ensure that the effect of the QTL in the current testing interval is not absorbed by the background marker variables, and so it may give a biased estimate of the QTL effect.

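A modified algorithm for the improvement of CIM: ICIM (Inclusive CIM)
- Using all markers in the linear regression model
- Adjusting the observation values by $\Delta y_i = y_i - \sum_{j \ne k, k+1} \hat{b}_j x_{ij}$, where k and k+1 are the markers flanking the current testing interval
  Note: the adjusted observation does not change until the testing position moves into a new interval.
- One-dimensional scanning, with the log-likelihood

$L_A = \sum_{i=1}^{n_1} \ln f(\Delta y_i; \mu_1, \sigma^2) + \sum_{i=n_1+1}^{n_1+n_2} \ln\left[ (1 - p) f(\Delta y_i; \mu_1, \sigma^2) + p f(\Delta y_i; \mu_2, \sigma^2) \right] + \sum_{i=n_1+n_2+1}^{n_1+n_2+n_3} \ln\left[ p f(\Delta y_i; \mu_1, \sigma^2) + (1 - p) f(\Delta y_i; \mu_2, \sigma^2) \right] + \sum_{i=n_1+n_2+n_3+1}^{n} \ln f(\Delta y_i; \mu_2, \sigma^2)$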
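The log-likelihood of the one-dimensional scan can be evaluated directly once the adjusted observations and the marker types are known. The sketch below assumes normal densities f and marker types coded 1 to 4 (both flanking markers of the first parental type, the two recombinant types, and both of the second parental type); the function names and the toy inputs are hypothetical.

```python
import math

def log_mixture(dy, weight_mu1, mu1, mu2, var):
    """ln[w f(dy; mu1, var) + (1 - w) f(dy; mu2, var)] for normal f."""
    norm = math.sqrt(2.0 * math.pi * var)
    f1 = math.exp(-(dy - mu1) ** 2 / (2.0 * var)) / norm
    f2 = math.exp(-(dy - mu2) ** 2 / (2.0 * var)) / norm
    return math.log(weight_mu1 * f1 + (1.0 - weight_mu1) * f2)

def icim_loglik(adjusted_y, marker_type, p, mu1, mu2, var):
    """Log-likelihood L_A summed over the four marker-type groups.

    adjusted_y: adjusted observations (delta y).
    marker_type: 1..4, the flanking-marker type of each individual.
    p: relative position of the testing point within the interval.
    """
    # Mixture weight on mu1 for marker types 1, 2, 3 and 4.
    weight_mu1 = {1: 1.0, 2: 1.0 - p, 3: p, 4: 0.0}
    return sum(log_mixture(dy, weight_mu1[t], mu1, mu2, var)
               for dy, t in zip(adjusted_y, marker_type))

# Toy input: five adjusted observations and their marker types.
delta_y = [1.2, 0.8, 0.1, -0.9, -1.1]
types = [1, 2, 3, 4, 4]
print(icim_loglik(delta_y, types, p=0.3, mu1=1.0, mu2=-1.0, var=0.25))
```

In a scan, mu1, mu2 and the variance would be re-estimated by EM at each testing position and compared with the fit under the hypothesis mu1 = mu2 to obtain the LOD score.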

SIMULATION STUDY
In the context of QTL experiments, the idea is to simulate a set of QTL with known genetic locations and effects in a segregating population and then evaluate if the QTL can be consistently identified among independent samples from the population.

Comparison of ICIM and CIM
Mean performance across 100 simulation runs
The simulated genome consisted of 6 chromosomes, each of 150 cM in length and with 16 evenly distributed markers.

Power analysis
- Power was calculated as the proportion of runs that detected the presence of QTL for each of the 90 intervals defined by 96 markers evenly distributed on 6 chromosomes.

- Power was calculated as the proportion of runs that detected QTL within the interval defined as 5 cM from each side of the predefined QTL. The 10 putative QTLs were rearranged in ascending order of the percentage of variance explained by each QTL.
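The second definition of power can be sketched as a simple count over simulation runs. The function name `detection_power` and the list of detected peak positions are hypothetical; a real study would take the peaks from the mapping output of each of the 100 runs.

```python
def detection_power(runs, qtl_position, window=5.0):
    """Proportion of runs with at least one detected peak within
    `window` cM on either side of the predefined QTL position."""
    hits = sum(1 for peaks in runs
               if any(abs(pos - qtl_position) <= window for pos in peaks))
    return hits / len(runs)

# Hypothetical peak positions (cM) detected on one chromosome in 5 runs.
simulated_runs = [[48.0], [53.5, 120.0], [], [61.0], [45.0]]
print(detection_power(simulated_runs, qtl_position=50.0))
```

Peaks outside the window, such as the one at 120 cM in the second run, would count toward the false discovery rate rather than toward power.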

Comparison of ICIM and Bayesian Model
Mapping results from ICIM and stochastic search variable selection (SSVS; George and McCulloch 1995; Yi et al. 2003)
The simulated genome consisted of three chromosomes, each with 100 cM in length and 11 evenly distributed markers (Yi et al. 2003).

Comparison of ICIM and Bayesian Model
Mapping results from ICIM and an empirical Bayesian model (Xu 2007)
224 RILs using 132 RFLP markers and the linkage map of a total length of 2250 cM with an average density of 17 cM (Fracheboud 2002).

ICIM for epistasis mapping
[Figure: LOD profiles labeled LOD_A and LOD_AA]

Comparison of ICIM with MIM

ICIM maps interactions between QTL whose additive effects are not significant
The simulated genome (Genome II) consisted of three chromosomes, each with 100 cM in length and 11 evenly distributed markers. For this genome, we considered two genetic models corresponding to Set I (VA=0.375, VI=0.375 and H=0.6; left column) and Set III (VA=0, VI=0.375 and H=0.3; right column) of Boer et al. (2002), respectively.

[Figure, panels A to F: two-dimensional scans over testing positions on chr 1, chr 2 and chr 3 for the QTL QB1, QB2 and QB3; color scales show LOD score and additive effect. Values printed on the panels:
LOD: QB2×QB3 = 20.52, QB1×QB3 = 20.54, QB1×QB2 = 19.28
LOD: QB2×QB3 = 2.28, QB1×QB3 = 2.32, QB1×QB2 = 8.27
LOD: QB2×QB3 = 3.01, QB1×QB3 = 2.66, QB1×QB2 = 12.79
LOD: QB2×QB3 = 1.69, QB1×QB3 = 1.78, QB1×QB2 = 7.75
aa: QB2×QB3 = 0.18 (0.25), QB1×QB3 = -0.17 (-0.25), QB1×QB2 = -0.45 (-0.50)
aa: QB2×QB3 = 0.17 (0.25), QB1×QB3 = -0.18 (-0.25), QB1×QB2 = -0.43 (-0.50)]

APPLICATIONS OF ICIM

Application I: The famous barley DH population
This barley population was derived from a two-row barley (Hordeum vulgare L.) cross, Harrington × TR306, and consists of 145 DH lines (Tinker et al. 1996).
Markers: 127 markers were used to build a linkage map with an average density of 10.62 cM.
Phenotype: Phenotypic data for seven agronomic traits were collected in 1992 and/or 1993 at 17 locations. The average kernel weight (KWT) across 25 environments was used as the phenotypic data for QTL mapping.

ADDITIVE MAPPING RESULTS
Scanning step: 1 cM

R^2 = 80.76%

Nine additive QTL identified by ICIM (PIN=0.01, POUT=0.02)

QTL name    Position (cM)   LOD score   Additive effect (mg)   PVE (%)
qKWT2H-1    83              4.60        0.39                   3.13
qKWT2H-2    140             7.23        -0.51                  5.34
qKWT2H-3    201             5.59        0.43                   3.77
qKWT3H-1    1               4.39        -0.39                  3.04
qKWT3H-2    22              7.41        0.51                   5.33
qKWT4H      125             4.12        -0.37                  2.73
qKWT5H      5               34.28       -1.37                  38.37
qKWT7H-1    4               8.27        -0.55                  6.07
qKWT7H-2    95              19.81       -0.92                  17.20
Total variation explained (%)                                  80.76

KWT is 38.7 mg for Harrington and 45.0 mg for TR306.

DIGENIC INTERACTIONS
Scanning step 5cM

Application II: A maize RIL population
A maize population of 236 recombinant inbred lines (RILs) was developed by crossing the drought-tolerant tropical maize line CML444 with the drought-susceptible tropical maize line SC-Malawi.
Markers: The genetic map, with a total length of 2250 cM, comprised 160 molecular marker loci (81 SSRs and 79 RFLPs) with an average density of 17 cM (Fracheboud 2002).
Phenotype: The RILs were grown and phenotyped in a total of eleven field experiments in Mexico and Zimbabwe, either under drought stress at flowering or under adequate water supply in the rain-fed experiments. Female flowering (FFLW), measured as the number of days from sowing to the first visible silk, was used as the phenotypic data.

Additive Mapping Results From ICIM

Epistasis from ICIM for silk flowering time

TL: Tlaltizapán, Mexico; ZW: Zimbabwe; 03: In 2003; 04: In 2004; A: Breeding cycle A; B: Breeding cycle B; WW: well-watered; IS: Intermediate Water Stress; SS: Severe Water Stress

Publications using ICIM: Rice
- Theor Appl Genet (2006) 112: 1258-1270. 66 CSSLs; 116 markers; grain length
- Plant Cell Reports (2009) 28: 247-256. 139 CSSLs; 117 markers; mature seed culturability
- Crop Science (2008) 48: 1799-1806. 71 RILs; 375 markers; tiller angle
- Hereditas (2009) 146: 67-73. 180 F2:3 families; 117 markers; brown planthopper resistance

Publications using ICIM: Wheat
- Euphytica (2009) 165: 435-444. 240 RILs; 188 markers; flour and noodle color components and yellow pigment content

Publications using ICIM: Soybean
- Breeding Science (2008) 58: 355-359. 225 F2; 9 markers on linkage group N; salt tolerance

ICIM for joint mapping across multiple populations
Nested association mapping (NAM) population

Maize Diversity Project in US; Cornell University; Buckler Lab
Model

Y = b0 + αu + Xβ + ε

Mapping Results From ICIM

Vgt1

Blue: days to silk; Red: days to anthesis; Green: anthesis silking interval (ASI). The vertical lines indicate the breaks between the chromosomes.

Mapping results from ICIM
52 QTL controlling days to silk were detected; together they explain 79% of the phenotypic variation.

Candidate gene association mapping for Vgt1 region
Estimated DS effects and standard errors for the Vgt1 region of chromosome 8. Estimates are relative to B73 allele flowering. The blue alleles have the MITE at Vgt1 (Mo17, also scored at the same time, carries the polymorphism and an equivalent effect). A simple t-test of the founder effect estimates for MITE versus non-MITE was highly significant (P = 2×10^-8). The red alleles carry polymorphisms at the Vgt1 target gene Rap2.7, which is also significantly different (P = 7×10^-4).

Joint linkage and association mapping Genome-wide association mapping

CONCLUSIONS
- ICIM has a simpler form and converges faster (the EM algorithm converges after 3 to 5 iterations), without losing the optimal properties associated with CIM.
- ICIM does not increase the sampling variance compared with IM, and thus improves the mapping power.
- ICIM gives clearly high LOD scores in chromosomal regions with QTL but rather low LOD scores where no QTL is located, and gives less biased estimates of QTL effects, thereby improving mapping power and precision.
- ICIM is relatively robust to mapping parameters.
- ICIM can be extended more easily than CIM to map QTL with digenic epistasis.

QTL IciMapping
http://www.isbreeding.net/software.html
- ICIM additive and epistasis mapping
- Power analysis for
  - confidence interval of QTL
  - marker intervals
- Library of mapping populations (both real and simulated for a range of genetic models)
- QTL mapping for non-idealized chromosome segment substitution lines

Thank you for your attention!

