bb19961203 posted on 2022-3-16 00:48

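How do I convert a multifurcating tree to a binary tree by averaging according to branch length? (phylogenetic tree plotting)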

Last edited by bb19961203 on 2022-3-24 16:42

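I need to compute a phylogenetic tree. The workflow is: prepare the species list, then query a database that someone else has already built, which returns a tree. Because that database is incomplete, many species end up with identical branch lengths. My computational model requires a binary tree, so I have to average the values of these parallel branches and keep only one leaf name. The problem is that once the tree is built, each branch length is added up level by level through the nested parentheses, and with my skills the only option is to draw the tree first and then judge by eye whether branches are parallel. With this much data, doing it by hand is error-prone and exhausting. Is there a practical way to average these parallel branches automatically? The existing multifurcating-to-binary conversions all seem to keep every node and sort them into left and right subtrees, which does not suit my case because the tips carry species information. https://static.plob.org/wp-content/uploads/2018/11/1541711253-6252-6c9e3dcb4a5cb0003f62f9d15fb0.jpg The parallel case looks like the top-right region of this image.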
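One possible approach, sketched in Python with only the standard library: parse the Newick string into a tree, then replace every clade whose leaves all sit at (near) zero distance from the clade's own node by a single leaf that keeps the first leaf name. This assumes "parallel" means exactly that (all within-clade branch lengths sum to at most a tolerance `tol`). The `values` dict is a hypothetical stand-in for whatever per-species data the model uses; the surviving leaf gets the mean of the merged entries. Because the whole clade is tested at once, nested zero-length groups (like the Chlamydophila strains in the tree) collapse in one step, not just the innermost pair. The parser is deliberately minimal (no quoted labels, no comments), and `parse_newick`, `collapse` and `to_newick` are illustrative names, not a library API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Minimal recursive-descent Newick parser (no quoted labels, no comments)."""
    s = "".join(text.split())  # drop whitespace; also rejoins names split across lines
    pos = 0

    def read_token() -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        return s[start:pos]

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":
            pos += 1
            node.children.append(read_node())
            while s[pos] == ",":
                pos += 1
                node.children.append(read_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.name = read_token()  # leaf name or internal label (may be empty)
        if pos < len(s) and s[pos] == ":":
            pos += 1
            node.length = float(read_token())
        return node

    return read_node()


def leaves(node: Node) -> list[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def max_depth(node: Node) -> float:
    """Largest summed branch length from this node down to any leaf below it."""
    if not node.children:
        return 0.0
    return max(c.length + max_depth(c) for c in node.children)


def collapse(node: Node, values: dict[str, float], tol: float = 1e-9) -> Node:
    """Replace each clade whose leaves all lie within `tol` of the clade node
    by one leaf: keep the first leaf name and store the mean of the merged
    leaves' values under that name. Clades are tested from the root down,
    so the largest all-zero clade is merged as a whole."""
    if node.children and max_depth(node) <= tol:
        names = [leaf.name for leaf in leaves(node)]
        merged = [values.pop(n) for n in names if n in values]
        if merged:
            values[names[0]] = sum(merged) / len(merged)
        # the clade's own branch is kept; the (<= tol) depth inside it is dropped
        return Node(names[0], node.length)
    node.children = [collapse(c, values, tol) for c in node.children]
    return node


def to_newick(node: Node) -> str:
    def fmt(n: Node) -> str:
        inner = "(" + ",".join(fmt(c) for c in n.children) + ")" if n.children else ""
        return f"{inner}{n.name}:{n.length:.5f}"

    return fmt(node) + ";"


if __name__ == "__main__":
    tree = parse_newick("((A:0.0,B:0.0)X:0.3,(C:0.1,D:0.2)Y:0.4);")
    values = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 7.0}  # hypothetical per-species data
    print(to_newick(collapse(tree, values)))
    # (A:0.30000,(C:0.10000,D:0.20000)Y:0.40000):0.00000;
    print(values)  # {'C': 5.0, 'D': 7.0, 'A': 2.0}
```

Raising `tol` would also merge near-identical strains; where to draw that line is a judgment call for the model. Any node that is still multifurcating after this step would need a separate rule.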
(((((((((((((((((((Escherichia_coli_EDL933:0.00000,Escherichia_coli_O157_H7:0.00000)Escherichia:0.00044,((Escherichia_coli_O6:0.00000,Escherichia_coli_K12:0.00022)Escherichia:0.00022,(Shigella_flexneri_2a_2457T:0.00000,Shigella_flexneri_2a_301:0.00000)Shigella:0.00266)Enterobacteriaceae:0.00000)Enterobacteriaceae:0.00813,((Salmonella_enterica:0.00000,Salmonella_typhi:0.00000)Salmonella:0.00146,Salmonella_typhimurium:0.00075)Salmonella:0.00702)Enterobacteriaceae:0.03131,((Yersinia_pestis_Medievalis:0.00000,(Yersinia_pestis_KIM:0.00000,Yersinia_pestis_CO92:0.00000)Yersinia:0.00000)Yersinia:0.03398,Photorhabdus_luminescens:0.05076)Enterobacteriaceae:0.01182)Enterobacteriaceae:0.02183,((Blochmannia_floridanus:0.32481,Wigglesworthia_brevipalpis:0.35452)Enterobacteriaceae:0.08332,(Buchnera_aphidicola_Bp:0.27492,(Buchnera_aphidicola_APS:0.09535,Buchnera_aphidicola_Sg:0.10235)Buchnera:0.10140)Buchnera:0.06497)Enterobacteriaceae:0.15030)Enterobacteriaceae:0.02808,((Pasteurella_multocida:0.03441,Haemophilus_influenzae:0.03754)Pasteurellaceae:0.01571,Haemophilus_ducreyi:0.05333)Pasteurellaceae:0.07365)Gammaproteobacteria:0.03759,((((Vibrio_vulnificus_YJ016:0.00021,Vibrio_vulnificus_CMCP6:0.00291)Vibrio:0.01212,Vibrio_parahaemolyticus:0.01985)Vibrio:0.01536,Vibrio_cholerae:0.02995)Vibrio:0.02661,Photobacterium_profundum:0.06131)Vibrionaceae:0.05597)Gammaproteobacteria:0.03492,Shewanella_oneidensis:0.10577)Gammaproteobacteria:0.12234,((Pseudomonas_putida:0.02741,Pseudomonas_syringae:0.03162)Pseudomonas:0.02904,Pseudomonas_aeruginosa:0.03202)Pseudomonas:0.14456)Gammaproteobacteria:0.04492,((Xylella_fastidiosa_700964:0.01324,Xylella_fastidiosa_9a5c:0.00802)Xylella:0.10192,(Xanthomonas_axonopodis:0.01069,Xanthomonas_campestris:0.00934)Xanthomonas:0.05037)Xanthomonadaceae:0.24151)Gammaproteobacteria:0.02475,Coxiella_burnetii:0.33185)Gammaproteobacteria:0.03328,((((Neisseria_meningitidis_A:0.00400,Neisseria_meningitidis_B:0.00134)Neisseria:0.12615,Chromobacterium_violaceum:0.09623)Neisseriaceae:0.07131,((Bordetella_pertussis:0.00127,(Bordetella_parapertussis:0.00199,Bordetella_bronchiseptica:0.00022)Bordetella:0.00006)Bordetella:0.14218,Ralstonia_solanacearum:0.11464)Burkholderiales:0.08478)Betaproteobacteria:0.03840,Nitrosomonas_europaea:0.22059)Betaproteobacteria:0.08761)Proteobacteria:0.16913,((((((Agrobacterium_tumefaciens_Cereon:0.00000,Agrobacterium_tumefaciens_WashU:0.00000):0.05735,Rhizobium_meliloti:0.05114)Sinorhizobium:0.05575,((Brucella_suis:0.00102,Brucella_melitensis:0.00184)Brucella:0.08660,Rhizobium_loti:0.09308)Rhizobiales:0.02384)Rhizobiales:0.08637,(Rhodopseudomonas_palustris:0.04182,Bradyrhizobium_japonicum:0.06346)Bradyrhizobiaceae:0.14122)Rhizobiales:0.05767,Caulobacter_crescentus:0.23943)Alphaproteobacteria:0.11257,(Wolbachia_sp._wMel:0.51596,(Rickettsia_prowazekii:0.04245,Rickettsia_conorii:0.02487)Rickettsia:0.38019)Rickettsiaceae:0.12058)Alphaproteobacteria:0.12365)Proteobacteria:0.06301,((((Helicobacter_pylori_J99:0.00897,Helicobacter_pylori_26695:0.00637)Helicobacter:0.19055,Helicobacter_hepaticus:0.12643)Helicobacter:0.05330,Wolinella_succinogenes:0.11644)Helicobacteraceae:0.09105,Campylobacter_jejuni:0.20399)Campylobacterales:0.41390)Proteobacteria:0.04428,((Desulfovibrio_vulgaris:0.38320,(Geobacter_sulfurreducens:0.22491,Bdellovibrio_bacteriovorus:0.45934)Deltaproteobacteria:0.04870)Deltaproteobacteria:0.04100,(Acidobacterium_capsulatum:0.24572,Solibacter_usitatus:0.29086)Acidobacteria:0.20514)Bacteria:0.04214)Bacteria:0.05551,((Fusobacterium_nucleatum:0.45615,(Aquifex_aeolicus:0.40986,Thermotoga_maritima:0.34182)Bacteria:0.07696)Bacteria:0.03606,(((Thermus_thermophilus:0.26583,Deinococcus_radiodurans:0.29763)Deinococci:0.24776,Dehalococcoides_ethenogenes:0.53988)Bacteria:0.04370,((((Nostoc_sp._PCC_7120:0.12014,Synechocystis_sp._PCC6803:0.15652)Cyanobacteria:0.04331,Synechococcus_elongatus:0.13147)Cyanobacteria:0.05040,(((Synechococcus_sp._WH8102:0.06780,Prochlorococcus_marinus_MIT9313:0.05434)Cyanobacteria:0.04879,Prochlorococcus_marinus_SS120:0.10211)Cyanobacteria:0.04238,Prochlorococcus_marinus_CCMP1378:0.16170)Cyanobacteria:0.20442)Cyanobacteria:0.07646,Gloeobacter_violaceus:0.23764)Cyanobacteria:0.24501)Bacteria:0.04332)Bacteria:0.02720)Bacteria:0.03471,((((Gemmata_obscuriglobus:0.36751,Rhodopirellula_baltica:0.38017)Planctomycetaceae:0.24062,((Leptospira_interrogans_L1-130:0.00000,Leptospira_interrogans_56601:0.00027)Leptospira:0.47573,((Treponema_pallidum:0.25544,Treponema_denticola:0.16072)Treponema:0.19057,Borrelia_burgdorferi:0.42323)Spirochaetaceae:0.20278)Spirochaetales:0.07248)Bacteria:0.04615,(((Tropheryma_whipplei_TW08/27:0.00009,Tropheryma_whipplei_Twist:0.00081)Tropheryma:0.44723,Bifidobacterium_longum:0.29283)Actinobacteridae:0.14429,(((((Corynebacterium_glutamicum_13032:0.00022,Corynebacterium_glutamicum:0.00000)Corynebacterium:0.03415,Corynebacterium_efficiens:0.02559)Corynebacterium:0.03682,Corynebacterium_diphtheriae:0.06479)Corynebacterium:0.13907,(((Mycobacterium_bovis:0.00067,(Mycobacterium_tuberculosis_CDC1551:0.00000,Mycobacterium_tuberculosis_H37Rv:0.00000)Mycobacterium:0.00022)Mycobacterium:0.03027,Mycobacterium_leprae:0.05135)Mycobacterium:0.01514,Mycobacterium_paratuberculosis:0.02091)Mycobacterium:0.11523)Corynebacterineae:0.09883,(Streptomyces_avermitilis:0.02680,Streptomyces_coelicolor:0.02678)Streptomyces:0.16707)Actinomycetales:0.06110)Actinobacteridae:0.26800)Bacteria:0.03480,((Fibrobacter_succinogenes:0.51984,(Chlorobium_tepidum:0.37204,(Porphyromonas_gingivalis:0.11304,Bacteroides_thetaiotaomicron:0.13145)Bacteroidales:0.34694)Bacteroidetes/Chlorobi_group:0.09237)Bacteria:0.04841,(((Chlamydophila_pneumoniae_TW183:0.00000,(Chlamydia_pneumoniae_J138:0.00000,(Chlamydia_pneumoniae_CWL029:0.00000,Chlamydia_pneumoniae_AR39:0.00000)Chlamydophila:0.00000)Chlamydophila:0.00000)Chlamydophila:0.10482,Chlamydophila_caviae:0.05903)Chlamydophila:0.04170,(Chlamydia_muridarum:0.01938,Chlamydia_trachomatis:0.02643)Chlamydia:0.06809)Chlamydiaceae:0.60169)Bacteria:0.04443)Bacteria:0.04284)Bacteria:0.02646,((Thermoanaerobacter_tengcongensis:0.17512,((Clostridium_tetani:0.10918,Clostridium_perfringens:0.11535)Clostridium:0.03238,Clostridium_acetobutylicum:0.11396)Clostridium:0.15056)Clostridia:0.11788,(((((Mycoplasma_mobile:0.27702,Mycoplasma_pulmonis:0.28761)Mycoplasma:0.28466,((((Mycoplasma_pneumoniae:0.10966,Mycoplasma_genitalium:0.11268)Mycoplasma:0.31768,Mycoplasma_gallisepticum:0.24373)Mycoplasma:0.14180,Mycoplasma_penetrans:0.34890)Mycoplasma:0.06674,Ureaplasma_parvum:0.33874)Mycoplasmataceae:0.19177)Mycoplasmataceae:0.07341,Mycoplasma_mycoides:0.37680)Mycoplasmataceae:0.12541,Phytoplasma_Onion_yellows:0.47843)Mollicutes:0.09099,(((((Listeria_monocytogenes_F2365:0.00063,Listeria_monocytogenes_EGD:0.00144)Listeria:0.00235,Listeria_innocua:0.00248)Listeria:0.13517,((Oceanobacillus_iheyensis:0.13838,Bacillus_halodurans:0.09280)Bacillaceae:0.02676,(((Bacillus_cereus_ATCC_14579:0.00342,Bacillus_cereus_ATCC_10987:0.00123)Bacillus:0.00573,Bacillus_anthracis:0.00331)Bacillus:0.08924,Bacillus_subtilis:0.07876)Bacillus:0.01984)Bacillaceae:0.03907)Bacillales:0.02816,((Staphylococcus_aureus_MW2:0.00000,(Staphylococcus_aureus_N315:0.00022,Staphylococcus_aureus_Mu50:0.00022)Staphylococcus:0.00022)Staphylococcus:0.02479,Staphylococcus_epidermidis:0.03246)Staphylococcus:0.17366)Bacillales:0.02828,(((((((Streptococcus_agalactiae_III:0.00110,Streptococcus_agalactiae_V:0.00155)Streptococcus:0.01637,(Streptococcus_pyogenes_M1:0.00134,(Streptococcus_pyogenes_MGAS8232:0.00045,(Streptococcus_pyogenes_MGAS315:0.00000,Streptococcus_pyogenes_SSI-1:0.00022)Streptococcus:0.00110)Streptococcus:0.00066)Streptococcus:0.02250)Streptococcus:0.01360,Streptococcus_mutans:0.04319)Streptococcus:0.01920,(Streptococcus_pneumoniae_R6:0.00119,Streptococcus_pneumoniae_TIGR4:0.00124)Streptococcus:0.03607)Streptococcus:0.04983,Lactococcus_lactis:0.11214)Streptococcaceae:0.08901,Enterococcus_faecalis:0.07946)Lactobacillales:0.03958,(Lactobacillus_johnsonii:0.20999,Lactobacillus_plantarum:0.14371)Lactobacillus:0.06763)Lactobacillales:0.08989)Bacilli:0.08905)Firmicutes:0.09540)Firmicutes:0.04315)Bacteria:1.34959,(((((Thalassiosira_pseudonana:0.33483,(Cryptosporidium_hominis:0.25048,Plasmodium_falciparum:0.28267)Apicomplexa:0.14359)Eukaryota:0.03495,(((Oryza_sativa:0.07623,Arabidopsis_thaliana:0.09366)Streptophyta:0.15770,Cyanidioschyzon_merolae:0.38319)Eukaryota:0.08133,(Dictyostelium_discoideum:0.34685,(((Eremothecium_gossypii:0.07298,Saccharomyces_cerevisiae:0.07619)Saccharomycetaceae:0.21170,Schizosaccharomyces_pombe:0.24665)Ascomycota:0.15370,(((Anopheles_gambiae:0.10724,Drosophila_melanogaster:0.10233)Diptera:0.09870,((Takifugu_rubripes:0.03142,Danio_rerio:0.05230)Actinopterygii:0.04335,(((Rattus_norvegicus:0.03107,Mus_musculus:0.01651)Murinae:0.00398,(Homo_sapiens:0.00957,Pan_troglodytes:0.03864)Hominidae:0.01549)Euarchontoglires:0.01629,Gallus_gallus:0.04596)Gnathostomata:0.01859)Gnathostomata:0.09688)Metazoa:0.03693,(Caenorhabditis_elegans:0.01843,Caenorhabditis_briggsae:0.01896)Caenorhabditis:0.24324)Metazoa:0.09911)Eukaryota:0.04004)Eukaryota:0.02708)Eukaryota:0.02636)Eukaryota:0.06455,Leishmania_major:0.45664)Eukaryota:0.10129,Giardia_lamblia:0.55482)Eukaryota:0.57543,((Nanoarchaeum_equitans:0.81078,(((Sulfolobus_tokodaii:0.17389,Sulfolobus_solfataricus:0.18962)Sulfolobus:0.33720,Aeropyrum_pernix:0.43380)Thermoprotei:0.09462,Pyrobaculum_aerophilum:0.55514)Thermoprotei:0.12018)Archaea:0.15444,((Thermoplasma_volcanium:0.10412,Thermoplasma_acidophilum:0.09785)Thermoplasma:0.66151,((((Methanobacterium_thermautotrophicum:0.36583,Methanopyrus_kandleri:0.35331)Euryarchaeota:0.07446,(Methanococcus_maripaludis:0.28592,Methanococcus_jannaschii:0.13226)Methanococcales:0.23828)Euryarchaeota:0.06284,((Pyrococcus_horikoshii:0.02786,Pyrococcus_abyssi:0.02179)Pyrococcus:0.02239,Pyrococcus_furiosus:0.02366)Pyrococcus:0.36220)Euryarchaeota:0.04469,(Archaeoglobus_fulgidus:0.34660,(Halobacterium_sp._NRC-1:0.61597,(Methanosarcina_acetivorans:0.02602,Methanosarcina_mazei:0.03087)Methanosarcina:0.30588)Euryarchaeota:0.12801)Euryarchaeota:0.10395)Euryarchaeota:0.06815)Euryarchaeota:0.11833)Archaea:0.43325):0.88776);
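The Newick tree is stored in this form. The number after each colon is the length of that single branch, but the total branch length of a leaf is the sum over every enclosing level of parentheses, so directly merging the parts with identical branch lengths only merges the smallest (innermost) branches...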
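To get the total (root-to-tip) length of every leaf without drawing the tree, one option is a stack-based scan: each `(` opens a group, each leaf records its own length, and when a `)` is followed by `label:length`, that length is added to every leaf collected inside the group, which is then handed up to the parent group. A minimal Python sketch, assuming labels contain no quotes, comments or spaces (all whitespace is stripped, which also rejoins names broken across lines); `root_to_tip` is a hypothetical helper name.

```python
import re

PUNCT = {"(", ")", ",", ";"}
# a single bracket/comma/semicolon, or a "label:length" chunk between them
TOKEN = re.compile(r"[(),;]|[^(),;]+")


def root_to_tip(newick: str) -> dict[str, float]:
    """Total branch length from the root to every leaf, summed level by level."""
    tokens = TOKEN.findall("".join(newick.split()))  # strip whitespace and line breaks
    dist: dict[str, float] = {}
    stack: list[list[str]] = [[]]  # leaf names collected inside each open "("
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append([])
        elif tok == ")":
            group = stack.pop()
            # a closing bracket may be followed by "label:length" for that clade
            if i + 1 < len(tokens) and tokens[i + 1] not in PUNCT:
                i += 1
                _, _, length = tokens[i].partition(":")
                for leaf in group:
                    dist[leaf] += float(length or 0)
            stack[-1].extend(group)  # the clade's leaves belong to the parent too
        elif tok not in PUNCT:
            name, _, length = tok.partition(":")
            dist[name] = float(length or 0)
            stack[-1].append(name)
        i += 1
    return dist


if __name__ == "__main__":
    d = root_to_tip("((A:0.1,B:0.2)X:0.3,C:0.5);")
    for name, total in sorted(d.items(), key=lambda kv: kv[1]):
        print(f"{name}\t{total:.5f}")
```

Sorting by the total makes candidates easy to spot, but equal totals alone do not prove two leaves are parallel; they also have to sit in the same clade.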

feob posted on 2022-3-16 06:24

Thanks for sharing.

小亮丶1 posted on 2022-3-16 07:16

http://www.52pojie.cn

heisedeshamo posted on 2022-3-16 08:20

I studied this once but never got the hang of it; here to learn.

wycdd posted on 2022-3-16 08:24

Can't help with this one.

daisypojie posted on 2022-3-16 08:43

qwe12079 posted on 2022-3-16 09:31

Taking a look to learn something.

xslxsl posted on 2022-3-16 10:22

Thanks for sharing.

nsy776 posted on 2022-3-16 11:44

Looks impressive, but I don't know how to do it :D

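bb19961203 posted on 2022-3-24 20:34

I have thought about it again recently and still haven't found a good approach.

If I don't restrict myself to treating it as a tree, would simply splitting the string at the parentheses work?
So far I have only tried Excel's Text to Columns, but it cannot tell which level of parentheses each split piece sits in.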
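Splitting at the parentheses can work if a counter tracks the nesting depth while splitting: add 1 at each `(`, subtract 1 at each `)`, and record the current depth next to every `name:length` piece. A rough Python sketch that prints the pieces as CSV rows (depth, name, branch length) that Excel can open; `split_with_depth` is a hypothetical name, and labels are assumed to contain no quotes, commas or brackets.

```python
import csv
import re
import sys


def split_with_depth(newick: str) -> list[tuple[int, str, str]]:
    """Split a Newick string at brackets/commas and tag each piece with its depth.

    Returns (depth, label, length) for every leaf and every internal node that
    carries a label or a length, in the order they appear in the string."""
    text = "".join(newick.split())  # remove line breaks that cut names in half
    rows: list[tuple[int, str, str]] = []
    depth = 0
    for tok in re.findall(r"[(),;]|[^(),;]+", text):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1  # so a label right after ")" gets the depth of its clade
        elif tok not in {",", ";"}:
            label, _, length = tok.partition(":")
            rows.append((depth, label, length))
    return rows


if __name__ == "__main__":
    sample = "((A:0.1,B:0.2)X:0.3,C:0.5);"
    writer = csv.writer(sys.stdout)
    writer.writerow(["depth", "name", "branch_length"])
    writer.writerows(split_with_depth(sample))
```

Redirecting the output to a `.csv` file gives a sheet with the depth in its own column. An internal label such as `X` ends up one level above its direct children, which is what tells the pieces of different parenthesis levels apart.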