因果基因集精细定位 (运行中)

Xing Abao Lv3

因果基因集精细定位 (Fine-mapping Of CaUsal gene Sets, FOCUS),是一种基于概率框架的精细定位方法,该算法主要目的是从转录组广泛关联研究TWAS信号中准确识别因果基因、克服传统方法在连锁不平衡 (LD) 和多效性 SNP 影响下的局限性。FOCUS 通过对基因表达预测相关性结构的建模,能够有效提取与复杂性状和疾病相关的因果基因集合,为基因筛选提供高可信度的优先排序结果,具有重要的遗传学研究价值。

FOCUS 方法基于概率框架,通过整合 GWAS summary data、eQTL 预测权重以及 LD 信息,估算每组基因集合包含因果基因的后验概率,从而识别与复杂性状相关的关键因果基因。

具体内容可以阅读: Probabilistic fine-mapping of transcriptome-wide association studies. Nicholas Mancuso, Malika K. Freund, Ruth Johnson, Huwenbo Shi, Gleb Kichaev, Alexander Gusev, and Bogdan Pasaniuc. Nature Genetics 51, 675-682 (2019).

核心原理

连锁不平衡 (LD) 效应的修正,连锁不平衡是影响 TWAS 和 GWAS 分析准确性的关键因素。传统分析方法中,LD 效应常导致非因果基因与性状显著关联,从而产生虚假信号。FOCUS 通过引入 LD 结构模型,考虑基因表达预测之间的相关性,有效去除 LD 效应的干扰,精准识别因果基因。

基因表达预测权重的利用,FOCUS 方法依赖通过 eQTL 面板估算的基因表达预测权重,这些权重反映 SNP 与基因表达之间的关系。与传统 TWAS 方法仅依赖目标组织基因表达数据不同,FOCUS 通过使用其他组织的表达数据作为代理,确保在目标组织数据缺乏时,依然能够进行有效分析。

多效性效应的控制,多效性指某些 SNP 影响多个基因或性状的现象,可能导致虚假的基因-性状关联。FOCUS 方法通过建模不同基因间表达预测权重的差异,控制多效性带来的偏倚,从而提高因果基因识别的准确性。

概率框架与后验概率估计,FOCUS 采用贝叶斯概率框架,估算每组基因集合包含因果基因的后验概率,并生成具有指定置信度的基因集合。基于后验概率的精细定位方法为基因筛选提供强有力支持,特别适用于功能验证实验中的基因优先选择。

安装程序

1
2
3
git clone https://github.com/mancusolab/ma-focus.git
cd ma-focus
/tools/Python-3.8.3/bin/pip install -i https://pypi.tuna.tsinghua.edu.cn/simple .
1
2
3
4
5
6
7
8
9
10
# 测试安装
$ /tools/Python-3.8.3/bin/focus
usage: focus [-h] {munge,finemap,import} ...

positional arguments:
{munge,finemap,import}
Subcommands: munge to clean up summary statistics. finemap to perform run twas & finemap. import to import weights from existing databases.

optional arguments:
-h, --help show this help message and exit

快速上手

下载示例数据

https://www.mancusolab.com/ma-focus

下载后的文件用tar命令解压:

1
2
3
4
cd /home/hello/wkdir/ma-focus/test

tar -xzvf EA_WBC.Chen.2020.tar.gz
tar -xzvf 1000GP.MP.FOCUS.tar.gz

执行程序

1
2
3
4
5
6
7
8
9
# 20250502 10:52 测试通过
nohup /tools/Python-3.8.3/bin/focus finemap \
/home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz \
/home/hello/wkdir/ma-focus/test/1000GP3_multiPop_allelesAligned/EUR/1000G.EUR.QC.allelesAligned.22 \
/home/hello/wkdir/ma-focus/test/ea_fusion_genoa.db \
--chr 22 \
--prior-prob "gencode38" \
--locations 38:EUR \
--out /home/hello/wkdir/ma-focus/test/focus_result > run_focus.log 2>&1 &
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
# 日志记录
===================================
FOCUS v0.803
===================================
focus finemap
/home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz
/home/hello/wkdir/ma-focus/test/1000GP3_multiPop_allelesAligned/EUR/1000G.EUR.QC.allelesAligned.22
/home/hello/wkdir/ma-focus/test/ea_fusion_genoa.db
--chr 22
--prior-prob gencode38
--locations 38:EUR
--out /home/hello/wkdir/ma-focus/test/focus_result

Starting log...
[2025-05-02 10:49:44 - INFO] Detecting 1 populations for fine-mapping.
[2025-05-02 10:49:44 - INFO] As a result, running single-population FOCUS.
[2025-05-02 10:49:44 - INFO] Preparing GWAS summary file for population at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz.
[2025-05-02 10:49:45 - INFO] Preparing reference SNP data for population at /home/hello/wkdir/ma-focus/test/1000GP3_multiPop_allelesAligned/EUR/1000G.EUR.QC.allelesAligned.22.
[2025-05-02 10:49:45 - INFO] Preparing weight database at /home/hello/wkdir/ma-focus/test/ea_fusion_genoa.db.
[2025-05-02 10:49:45 - INFO] Preparing user-defined locations at 38:EUR.
[2025-05-02 10:49:45 - INFO] Found 1700 independent regions on the entire genome.
[2025-05-02 10:49:45 - INFO] 24 independent regions currently used after being filtered on chromosome, start, and stop.
[2025-05-02 10:49:45 - INFO] Preparing data at region 22:15927607-22:17193405. Skipping if following warning occurs.
[2025-05-02 10:49:45 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 10:49:45 - INFO] Using gencode file prior probability 0.020833333333333332.
[2025-05-02 10:49:47 - INFO] Fine-mapping starts at region 22:15927607-22:17193405.
[2025-05-02 10:49:47 - INFO] Aligning GWAS, LD, and eQTL weights for the single population. Region 22:15927607-22:17193405 will skip if following errors occur.
[2025-05-02 10:49:47 - INFO] Find 1 common genes to be fine-mapped at region 22:15927607-22:17193405.
[2025-05-02 10:49:47 - INFO] Running TWAS.
......
[2025-05-02 10:50:18 - INFO] Calculating PIPs.
[2025-05-02 10:50:18 - INFO] Completed fine-mapping at region 22:44599428-22:46074615.
[2025-05-02 10:50:18 - INFO] Preparing data at region 22:46074615-22:47200568. Skipping if following warning occurs.
[2025-05-02 10:50:18 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 10:50:19 - INFO] Using gencode file prior probability 0.05263157894736842.
[2025-05-02 10:50:19 - WARNING] No GWAS SNPs with p-value < 5e-08 found at region 22:46074615-22:47200568 at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz. Skipping.
[2025-05-02 10:50:19 - INFO] Preparing data at region 22:47200568-22:48507891. Skipping if following warning occurs.
[2025-05-02 10:50:19 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 10:50:19 - INFO] Using gencode file prior probability 0.06666666666666667.
[2025-05-02 10:50:19 - WARNING] No GWAS SNPs with p-value < 5e-08 found at region 22:47200568-22:48507891 at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz. Skipping.
[2025-05-02 10:50:19 - INFO] Preparing data at region 22:48507891-22:49430885. Skipping if following warning occurs.
[2025-05-02 10:50:19 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 10:50:19 - INFO] Using gencode file prior probability 0.14285714285714285.
[2025-05-02 10:50:19 - WARNING] No GWAS SNPs with p-value < 5e-08 found at region 22:48507891-22:49430885 at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz. Skipping.
[2025-05-02 10:50:19 - INFO] Preparing data at region 22:49430885-22:50804870. Skipping if following warning occurs.
[2025-05-02 10:50:19 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 10:50:19 - INFO] Using gencode file prior probability 0.013513513513513514.
[2025-05-02 10:50:19 - WARNING] No GWAS SNPs with p-value < 5e-08 found at region 22:49430885-22:50804870 at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz. Skipping.
[2025-05-02 10:50:19 - INFO] Finished TWAS & fine-mapping. Thanks for using FOCUS, and have a nice day!
39.93user 37.82system 0:37.29elapsed 208%CPU (0avgtext+0avgdata 329936maxresident)k
6976inputs+136outputs (29major+360525minor)pagefaults 0swaps

结果解释

1
2
3
4
5
6
7
8
9
10
11
12
# 查看结果
$ head focus_result.focus.tsv | less -S
block ens_gene_id ens_tx_id mol_name tissue ref_name type chrom tx_start tx_stop block_genes trait inference_pop1 inter_z_pop1 cv.R2_pop1 cv.R2.pval_pop1 ldregion_pop1 twas_z_pop1 pips_pop1 >
22:15927607-22:17193405 NULL.MODEL NA NULL NA NA NULL 22 NA NA 48 trait NA NA NA NA 22:16051249-22:17189194 0 0.981 1
22:15927607-22:17193405 ENSG00000177663 NA IL17RA lcl genoa protein_coding 22 17036507 17110505 48 trait susie NA 0.17867932295751077 8.09833347920256e-18 22:16051249-22:17189194 0.449 0.0194 0
22:17193405-22:17813322 NULL.MODEL NA NULL NA NA NULL 22 NA NA 17 trait NA NA NA NA 22:17194860-22:17812824 0 0.893 1
22:17193405-22:17813322 ENSG00000015475 NA BID lcl genoa protein_coding 22 17734036 17774750 17 trait susie NA 0.08333185126953091 8.16760905637009e-09 22:17194860-22:17812824 -0.418 0.0552 1
22:17193405-22:17813322 ENSG00000177663 NA IL17RA lcl genoa protein_coding 22 17036507 17110505 17 trait susie NA 0.17867932295751077 8.09833347920256e-18 22:17194860-22:17812824 -0.119 0.0546 0
22:17813322-22:19924835 NULL.MODEL NA NULL NA NA NULL 22 NA NA 87 trait NA NA NA NA 22:17814148-22:19924538 0 0.695 1
22:17813322-22:19924835 ENSG00000015475 NA BID lcl genoa protein_coding 22 17734036 17774750 87 trait susie NA 0.08333185126953091 8.16760905637009e-09 22:17814148-22:19924538 -3.07 0.169 1
22:17813322-22:19924835 ENSG00000100075 NA SLC25A1 lcl genoa protein_coding 22 19175581 19179344 87 trait lasso NA 0.341638476663499 9.018936721488138e-36 22:17814148-22:19924538 -2.69 0.0698 1
22:17813322-22:19924835 ENSG00000215193 NA PEX26 lcl genoa protein_coding 22 18044078 18121769 87 trait blup NA 0.043113259891870426 3.149420886880096e-05 22:17814148-22:19924538 2.64 0.0576 1
1
2
3
4
5
6
7
8
9
10
11
12
13
# 结果解释
chrom : 分子特征所在的染色体编号,标识基因的位置
tx_start : 转录起始位点,指转录本开始转录的染色体位置
tx_stop : 转录终止位点,指转录本结束转录的染色体位置
block_genes : 在该区域内基因的数量,用于设定基因是因果基因的先验概率
inference_pop1 : 模型的推断方法,例如,LASSO、BSLMM 等;FOCUS 算法中可能会使用不同的推断方法来识别因果基因
inter_z_pop1 : 在回归去除平均标记的多效性关联时的 z 分数截距;如果截距为 False,该字段为 None;FOCUS 算法考虑到基因之间的相关性时,会使用这些信息进行修正
cv.R2_pop1 : 交叉验证的预测 R 平方值,表示模型对数据的拟合度,表明模型的解释能力
cv.R2.pval_pop1 : 交叉验证 R 平方的 p 值,用于测试模型的拟合是否显著;FOCUS 算法中的这一值帮助评估所选基因是否具有统计学意义
twas_z_pop1 : 边际 TWAS Z 分数,表示基因表达与表型之间关联的强度;在 FOCUS 方法中,通过这种方式评估基因表达与疾病或复杂性状的关联
pip_pop1 : 边际后验包含概率(Marginal Posterior Inclusion Probability),表示给定数据下基因为因果基因的概率;这是 FOCUS 算法用来判断基因是否真正与性状或疾病有关的关键指标
in_cred_set_pop1 : 标志位,指示模型是否包含在可信集(credible set)中;可信集是包含高置信度因果基因的基因集合,FOCUS 算法根据后验概率筛选基因
ldregion_pop1 : 来自参考基因组的 LD(连锁不平衡)区域,表示基因或变异与目标分子特征之间的连锁不平衡关系;这些信息帮助 FOCUS 算法纠正因 LD 效应引起的干扰

执行程序 (画图)

1
2
3
4
5
6
7
8
9
10
# 20250502 10:52 测试通过
nohup /tools/Python-3.8.3/bin/focus finemap \
/home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz \
/home/hello/wkdir/ma-focus/test/1000GP3_multiPop_allelesAligned/EUR/1000G.EUR.QC.allelesAligned.22 \
/home/hello/wkdir/ma-focus/test/ea_fusion_genoa.db \
--chr 22 \
--prior-prob "gencode38" \
--locations 38:EUR \
--plot \
--out /home/hello/wkdir/ma-focus/test/focus_resu > run_focus.log 2>&1 &

如果遇到报错: module ‘numpy’ has no attribute ‘bool’.
np.bool was a deprecated alias for the builtin bool. To avoid this error in existing code, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy sca
lar type, use np.bool_ here.

修改,记得换成自己的,106 行换成np.bool_,保存,再运行,完成出图。

1
2
3
4
vim +106 /tools/Python-3.8.3/lib/python3.8/site-packages/pyfocus/viz.py

mask = np.zeros_like(wcor, dtype=np.bool) 修改前
mask = np.zeros_like(wcor, dtype=np.bool_) 修改后
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
# 日志记录
===================================
FOCUS v0.803
===================================
focus finemap
/home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz
/home/hello/wkdir/ma-focus/test/1000GP3_multiPop_allelesAligned/EUR/1000G.EUR.QC.allelesAligned.22
/home/hello/wkdir/ma-focus/test/ea_fusion_genoa.db
--chr 22
--prior-prob gencode38
--locations 38:EUR
--plot
--out /home/hello/wkdir/ma-focus/test/focus_resu

Starting log...
[2025-05-02 11:15:53 - INFO] Detecting 1 populations for fine-mapping.
[2025-05-02 11:15:53 - INFO] As a result, running single-population FOCUS.
[2025-05-02 11:15:53 - INFO] Preparing GWAS summary file for population at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz.
[2025-05-02 11:15:54 - INFO] Preparing reference SNP data for population at /home/hello/wkdir/ma-focus/test/1000GP3_multiPop_allelesAligned/EUR/1000G.EUR.QC.allelesAligned.22.
[2025-05-02 11:15:54 - INFO] Preparing weight database at /home/hello/wkdir/ma-focus/test/ea_fusion_genoa.db.
[2025-05-02 11:15:54 - INFO] Preparing user-defined locations at 38:EUR.
[2025-05-02 11:15:54 - INFO] Found 1700 independent regions on the entire genome.
[2025-05-02 11:15:54 - INFO] 24 independent regions currently used after being filtered on chromosome, start, and stop.
[2025-05-02 11:15:54 - INFO] Preparing data at region 22:15927607-22:17193405. Skipping if following warning occurs.
[2025-05-02 11:15:54 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 11:15:54 - INFO] Using gencode file prior probability 0.020833333333333332.
[2025-05-02 11:15:55 - INFO] Fine-mapping starts at region 22:15927607-22:17193405.
[2025-05-02 11:15:55 - INFO] Aligning GWAS, LD, and eQTL weights for the single population. Region 22:15927607-22:17193405 will skip if following errors occur.
[2025-05-02 11:15:55 - INFO] Find 1 common genes to be fine-mapped at region 22:15927607-22:17193405.
[2025-05-02 11:15:55 - INFO] Running TWAS.
......
[2025-05-02 11:16:25 - INFO] Calculating PIPs.
[2025-05-02 11:16:25 - INFO] Completed fine-mapping at region 22:44599428-22:46074615.
[2025-05-02 11:16:25 - INFO] Creating FOCUS plots at region 22:44599428-22:46074615.
[2025-05-02 11:16:26 - INFO] Preparing data at region 22:46074615-22:47200568. Skipping if following warning occurs.
[2025-05-02 11:16:26 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 11:16:26 - INFO] Using gencode file prior probability 0.05263157894736842.
[2025-05-02 11:16:26 - WARNING] No GWAS SNPs with p-value < 5e-08 found at region 22:46074615-22:47200568 at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz. Skipping.
[2025-05-02 11:16:26 - INFO] Preparing data at region 22:47200568-22:48507891. Skipping if following warning occurs.
[2025-05-02 11:16:26 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 11:16:26 - INFO] Using gencode file prior probability 0.06666666666666667.
[2025-05-02 11:16:26 - WARNING] No GWAS SNPs with p-value < 5e-08 found at region 22:47200568-22:48507891 at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz. Skipping.
[2025-05-02 11:16:26 - INFO] Preparing data at region 22:48507891-22:49430885. Skipping if following warning occurs.
[2025-05-02 11:16:26 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 11:16:26 - INFO] Using gencode file prior probability 0.14285714285714285.
[2025-05-02 11:16:26 - WARNING] No GWAS SNPs with p-value < 5e-08 found at region 22:48507891-22:49430885 at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz. Skipping.
[2025-05-02 11:16:26 - INFO] Preparing data at region 22:49430885-22:50804870. Skipping if following warning occurs.
[2025-05-02 11:16:26 - INFO] Deciding prior probability for a gene to be causal.
[2025-05-02 11:16:26 - INFO] Using gencode file prior probability 0.013513513513513514.
[2025-05-02 11:16:26 - WARNING] No GWAS SNPs with p-value < 5e-08 found at region 22:49430885-22:50804870 at /home/hello/wkdir/ma-focus/test/EA_WBC.Chen.2020/EA_WBC_new_munged_chr22.tsv.gz. Skipping.
[2025-05-02 11:16:26 - INFO] Finished TWAS & fine-mapping. Thanks for using FOCUS, and have a nice day!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# 输出结果
-rw-rw-r-- 1 hello hello 62488 5月 2 11:15 focus_resu.chr22.15927607.17193405.pop1.pdf
-rw-rw-r-- 1 hello hello 69634 5月 2 11:15 focus_resu.chr22.17193405.17813322.pop1.pdf
-rw-rw-r-- 1 hello hello 93108 5月 2 11:16 focus_resu.chr22.17813322.19924835.pop1.pdf
-rw-rw-r-- 1 hello hello 98010 5月 2 11:16 focus_resu.chr22.19924835.22002927.pop1.pdf
-rw-rw-r-- 1 hello hello 81141 5月 2 11:16 focus_resu.chr22.23370460.24588236.pop1.pdf
-rw-rw-r-- 1 hello hello 72533 5月 2 11:16 focus_resu.chr22.27438791.29255810.pop1.pdf
-rw-rw-r-- 1 hello hello 93113 5月 2 11:16 focus_resu.chr22.29255810.31043932.pop1.pdf
-rw-rw-r-- 1 hello hello 76731 5月 2 11:16 focus_resu.chr22.31043932.32268999.pop1.pdf
-rw-rw-r-- 1 hello hello 91959 5月 2 11:16 focus_resu.chr22.38911889.40149793.pop1.pdf
-rw-rw-r-- 1 hello hello 117370 5月 2 11:16 focus_resu.chr22.40149793.42294812.pop1.pdf
-rw-rw-r-- 1 hello hello 115725 5月 2 11:16 focus_resu.chr22.42294812.43318194.pop1.pdf
-rw-rw-r-- 1 hello hello 87855 5月 2 11:16 focus_resu.chr22.43318194.44599428.pop1.pdf
-rw-rw-r-- 1 hello hello 77204 5月 2 11:16 focus_resu.chr22.44599428.46074615.pop1.pdf
-rw-rw-r-- 1 hello hello 23096 5月 2 11:16 focus_resu.focus.tsv
-rw-rw-r-- 1 hello hello 18387 5月 2 11:16 focus_resu.log
-rw-rw-r-- 1 hello hello 18407 5月 2 11:16 run_focus.log

QTLMR

https://mp.weixin.qq.com/s/a8uKYpT53-JbvNmlWkH6Mg

https://www.yuque.com/post_gwas/qtlmr/szamugaqzgznhd6h#qVIGT

https://mp.weixin.qq.com/s/SQNYr0TEYXR9-xHzYSjHmA

https://github.com/mancusolab/ma-focus

  • Title: 因果基因集精细定位 (运行中)
  • Author: Xing Abao
  • Created at : 2025-05-01 17:57:22
  • Updated at : 2025-05-04 12:47:38
  • Link: https://bioinformatics.vip/2025/05/01/post-GWAS/因果基因集精细定位/
  • License: This work is licensed under CC BY-NC-SA 4.0.
Comments