ISSN: 2455-5282
Global Journal of Medical and Clinical Case Reports
Short Communication       Open Access      Peer-Reviewed

Markov Models of Genomic Events

Orchidea Maria Lecian Sapienza*

University of Rome, Rome, Italy
*Corresponding author: Orchidea Maria Lecian Sapienza, University of Rome, Rome, Italy, E-mail: [email protected]
Received: 25 June, 2024 |Accepted: 18 July, 2024 | Published: 19 July, 2024
Keywords: Chains; Markov chains; Enveloping algebras; Genomic events; Allele-specific copy-number abnormalities

Cite this as

Sapienza OML. Markov Models of Genomic Events. Glob J Medical Clin Case Rep. 2024:11(3): 018-020. Available from: 10.17352/2455-5282.000181

Copyright License

© 2024 Sapienza OML. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

The Markov Models of genomic elements are newly considered. The representation of the fundamental matrix of the Markov model is newly theorised. The order of magnitude of the initial conditions for the elements of the transition probabilities is newly hypothesised.

The model is compared with a sub-Hidden Markov Model of genomic events. The chosen representation of the states is newly proven to consist of an enveloping algebra. The new condition is posed on the Markovian feature of the originating chain from the study of the elements of the loci of the state space; in this case, the choice of the representation of the probability matrix is analytically spelled out, and Monte Carlo methods are not necessitated.

Introduction

The present report is aimed at further improving the mathematical definitions in the Markov models of genomic elements, such as that recently presented in [1].

The present paper builds on [1] to address the long-standing questions raised in [2-5] about the analytical modelling of algorithms of oncogenesis.

A new hypothesis is here imposed on Eq. (1) from [1], under which the comparison also holds with the (alternative) numerical (Monte Carlo) methods developed in [6] and more recently improved in [7]. In more detail, a new analysis is presented which fixes the choice of the representation of the probability matrix, so that the comparison with the numerical methods (if and where needed) is consistent. The comparison with numerical methods can be of interest, e.g., in the case envisaged in [8] for the numerical test of inference parameters.

Furthermore, the method is compared with the sub-Hidden Markov Model (subHMM) used in [9] to study copy-number abnormalities in allele-specific analyses; in this case, the states of the Markov models are newly proven to consist of an enveloping algebra. Moreover, the hypothesis of a constant number of Markov states in the definition of the fundamental matrix of the originating chain is newly demonstrated to define the Markovian feature. Accordingly, the enveloping algebra defines the committors, which characterise the Markov State Model from which the subHMMs can be derived. After these proofs, it is possible to calculate analytically the Mean First-Passage Times, the time evolution of the eigenvalues, and that of the modelling errors.

Low-rank-tensor methods

The evolution of cancer phenomena can be modelled as continuous-time Markov chains.

Transition rates are hypothesised as separable functions in [1], i.e. such that convergent ’iteration methods’ can be used, from which the notion of distribution is retrieved.

Non-stationarity is due to the fact that the age of the tumors might be unknown, for which the marginalisation of the time variable is needed.

The need for low-rank tensor methods follows from the fact that, given the number d of ’genomic events’, the tumor has $n = 2^d$ Markov states; according to recent findings, there are d = 299 known genes which determine the evolution of tumors [10]. The growth of the state space as $2^d$ is named ’state space explosion’ after [11]; it is tamed by the introduction of the ’marginal distributions’, by which operators that act on the low-rank tensors are defined.
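The size of the state space quoted above can be made concrete with a short calculation. The following is a minimal sketch, assuming each genomic event is binary (present or absent) and that a dense probability vector stores one 8-byte float per state; the function names `n_states` and `dense_vector_bytes` are illustrative and do not come from [1].

```python
import math


def n_states(d: int) -> int:
    """Number of Markov states for d binary genomic events (n = 2^d)."""
    return 2 ** d


def dense_vector_bytes(d: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store one dense probability vector over all states.

    The 8-byte default (one double per state) is an assumption of this sketch.
    """
    return n_states(d) * bytes_per_entry


if __name__ == "__main__":
    for d in (10, 20, 30, 299):
        n = n_states(d)
        # math.log10 accepts arbitrarily large Python integers.
        print(f"d = {d:3d}: n = 2^{d}, about 10^{math.log10(n):.1f} states")
```

For d = 299 the dense representation is plainly out of reach, which is the motivation for operating on low-rank tensors and marginal distributions.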

The ’Hierarchical Tucker’ format is adopted.

Let $\hat{Q}$ be the fundamental matrix of the chosen Markov chain on a discrete state space S with initial distributions assumed as defined.

It is here newly requested that, for Eq. (1) from [1] to hold, the entries of $\hat{Q}$ be infinitesimal.

Let P be the probability matrix associated with the fundamental matrix $\hat{Q}$ and after the new hypothesis; the distributions from $\hat{p}$ are defined from the initial value p, where the latter is written as

$$p = \int_0^{\infty} e^{\tau[\hat{Q} - \hat{I}]}\, p(0)\, d\tau \qquad (1)$$

The new hypothesis $p(0) = o(0)$ is here therefore newly requested for the proper definition. In Eq. (1), $\tau$ is a time variable, and $[\hat{Q} - \hat{I}]$ is a regular operator. The spectrum $\sigma([\hat{Q} - \hat{I}])$ of the operator $[\hat{Q} - \hat{I}]$ is written as from the states $x \in S$ from the definition

$$\sigma([\hat{Q} - \hat{I}]) \subseteq \bigcup_{x \in S} \{ z \in \mathbb{C} : |z - Q_{xx}| \le |Q_{xx}| \} \subseteq \{ z \in \mathbb{C} : \operatorname{Re} z \le 0 \} \qquad (2)$$

It is important to remark that the marginalisation procedures originating from Eq. (2), from which the Markov models descend, therefore differ from the ’dominant-eigenvalue’ technique with

$$\sigma([\hat{Q} - \hat{I}]) \subseteq \{ z \in \mathbb{C} : \operatorname{Re}(z) \le -1 \}. \qquad (3)$$

The method of the ’stochastic automata networks’ is further discussed in [12].
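Eqs. (1)-(3) can be checked numerically on a small chain. The following is a minimal sketch under stated assumptions: the generator is a random 8-state matrix acting on column vectors (non-negative off-diagonal rates, columns summing to zero), the initial distribution is concentrated on one state, and the Gershgorin discs of Eq. (2) are evaluated for the generator itself, the shift by the identity being applied afterwards to obtain the half-plane of Eq. (3). None of these choices is taken from [1]; all function names are illustrative. The closed form used is the standard identity that the integral of exp(τ(Q − I)) over τ from 0 to infinity equals the inverse of (I − Q) whenever the spectrum of Q − I lies in the open left half-plane.

```python
import numpy as np


def random_generator(n: int, seed: int = 0) -> np.ndarray:
    """Random CTMC generator for column vectors: rates >= 0, columns sum to 0."""
    rng = np.random.default_rng(seed)
    Q = rng.uniform(0.0, 1.0, size=(n, n))
    np.fill_diagonal(Q, 0.0)
    Q -= np.diag(Q.sum(axis=0))  # diagonal = minus the off-diagonal column sum
    return Q


def time_marginal(Q: np.ndarray, p0: np.ndarray) -> np.ndarray:
    """Closed form of Eq. (1): solve (I - Q) p = p(0)."""
    return np.linalg.solve(np.eye(Q.shape[0]) - Q, p0)


def expm_taylor(A: np.ndarray, terms: int = 30) -> np.ndarray:
    """Truncated Taylor series of exp(A); adequate only for small norm of A."""
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    return E


def quadrature_marginal(Q, p0, t_max=40.0, steps=8000):
    """Trapezoidal approximation of the integral in Eq. (1) on [0, t_max]."""
    h = t_max / steps
    E = expm_taylor(h * (Q - np.eye(Q.shape[0])))  # one-step propagator
    y = p0.astype(float).copy()
    total = 0.5 * y
    for _ in range(steps - 1):
        y = E @ y
        total = total + y
    y = E @ y
    return h * (total + 0.5 * y)


n = 8
Q = random_generator(n)
p0 = np.zeros(n)
p0[0] = 1.0  # assumption of this sketch: all initial mass on one state

p = time_marginal(Q, p0)
p_quad = quadrature_marginal(Q, p0)

# Gershgorin discs with centre Q_xx and radius |Q_xx| (column version).
diag = np.diag(Q)
eig_Q = np.linalg.eigvals(Q)
in_discs = all(np.min(np.abs(z - diag) - np.abs(diag)) <= 1e-9 for z in eig_Q)

# Shifted spectrum, to be compared with the half-plane Re(z) <= -1.
max_re_shifted = np.max(np.linalg.eigvals(Q - np.eye(n)).real)

print("p sums to", p.sum())
print("max |p - quadrature| =", np.max(np.abs(p - p_quad)))
print("eigenvalues of Q inside Gershgorin discs:", in_discs)
print("max Re of spectrum of Q - I:", max_re_shifted)
```

Solving the linear system avoids any time stepping or sampling, which is the sense in which the time-marginalised distribution is available without Monte Carlo methods in this toy setting.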

Allele-specific copy number methods

Allele-specific copy-number methods allow one to study copy-number abnormalities, as from [9].

For this purpose, a sub-Hidden Markov Model (subHMM) is implemented: it allows one to consider both the ’subclone region’ and the ’region-specific genotype’. The hidden-state variable $W_k$ of the state k represents the ’conglomeration of the subclone genotype’ and the ’clonal proportion’.

More in detail, the state $W_k = [Z_k, U_k, T_k]$ is defined as giving rise to time-dependent transition probabilities which can be represented as a ’multinomial distribution’.

The states $W_k$ are specified by $Z_k$, the ’mainclone genotype’ of the locus k; $U_k$, the ’indicator’ of whether there is a subclone in k; and $T_k$, the ’subclone genotype’ (i.e. if the considered subclone exists).

The transition of the states $W_k$ is considered in [9] only for consecutive ’loci’.

A maximum number of copies is assumed.

Therefore, the elements of the subHMM are here newly proven to compose an enveloping algebra.

Under the hypothesis of the ’constant clonal proportion’, the transition probabilities Pt(z) from Eq. (2) in [9] determine that the hidden states are not observed, and ’allele-specific’ elements are considered.
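The hidden-state construction described in this section can be illustrated by enumerating the finite state space that results once a maximum number of copies is fixed. The following is a minimal sketch, not the implementation of [9]: the encoding of a genotype as a pair (copies of allele A, copies of allele B), the rule that the subclone genotype may take any admissible value, and the names `Genotype`, `HiddenState`, `genotypes` and `hidden_states` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical encoding: (copies of allele A, copies of allele B).
Genotype = Tuple[int, int]


@dataclass(frozen=True)
class HiddenState:
    """One hidden state W_k = [Z_k, U_k, T_k] at a locus k."""
    z: Genotype            # main-clone genotype Z_k
    u: int                 # indicator U_k: 1 if a subclone exists at k, else 0
    t: Optional[Genotype]  # subclone genotype T_k, defined only when u == 1


def genotypes(max_copies: int) -> List[Genotype]:
    """All allele-specific genotypes with total copy number <= max_copies."""
    return [
        (a, b)
        for a in range(max_copies + 1)
        for b in range(max_copies + 1 - a)
    ]


def hidden_states(max_copies: int) -> List[HiddenState]:
    """Enumerate every hidden state compatible with the copy-number bound."""
    gs = genotypes(max_copies)
    states: List[HiddenState] = []
    for z in gs:
        states.append(HiddenState(z=z, u=0, t=None))  # no subclone at locus
        for t in gs:
            states.append(HiddenState(z=z, u=1, t=t))  # subclone present
    return states


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(f"max copies = {m}: {len(genotypes(m))} genotypes, "
              f"{len(hidden_states(m))} hidden states")
```

Because the copy number is bounded, the set of hidden states is finite, so a transition matrix between consecutive loci has a fixed size that does not depend on the locus.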

Conclusion

The Markov model of genomic events is newly further analysed.

More in detail, the choice of the representation of the transition probabilities is shown to be well-posed only under the new hypothesis that the entries of the fundamental matrix be infinitesimal.

The new hypothesis on the initial conditions of the transition elements is requested for the time-marginalisation technique to be consistent. The difference with the ’dominant-eigenvalue approach’ is stressed. The case of the sub-Hidden Markov Model in the study of allele-specific copy number analysis is newly approached.

The elements of the Markov models are therefore here newly proven to consist of an enveloping algebra.

Furthermore, the focus is placed on the hypothesis of a constant number of ’constant clonal proportions’: in this case, the Markovian feature of the originating chain is newly proven after the study of the entries of the fundamental matrix.

It has to be stressed that the proof of the Markovian property of the originating chain is fundamental in the definition of the Markov State Model(s) from which the subHMM is taken. When the Markovian feature holds, the possibility of defining the committor is needed for the study of the Mean First-Passage Times and of the time evolution of the eigenvalues, as from [13] and [14], respectively.

References

  1. Georg P, Grasedyck L, Klever M, Schill R, Spang R, Wettig T. Low-rank tensor methods for Markov chains with applications to tumor progression models. J Math Biol. 2022;86(1):7. Available from: https://doi.org/10.1007/s00285-022-01846-9.
  2. Hjelm M, Hoeglund M, Lagergren J. New probabilistic network models and algorithms for oncogenesis. J Comput Biol. 2006:853-865. Available from: https://doi.org/10.1089/cmb.2006.13.853.
  3. Beerenwinkel N, Sullivant S. Markov models for accumulating mutations. Biometrika. 2009;96:645-661. Available from: https://doi.org/10.1093/biomet/asp023.
  4. Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics. 2019;36:241-249. Available from: https://doi.org/10.1093/bioinformatics/btz513.
  5. Gotovos A, Burkholz R, Quackenbush J, Jegelka S. Scaling up continuous-time Markov chains helps resolve underspecification. arXiv. 2021. arXiv:2107.02911. Available from: https://doi.org/10.48550/arXiv.2107.02911.
  6. Ji H, Mascagni M, Li Y. Convergence analysis of Markov chain Monte Carlo linear solvers using Ulam-von Neumann Algorithm. SIAM J Numer Anal. 2013;51(4):2107-2122. Available from: https://doi.org/10.1137/130904867.
  7. Fathi-Vajargah B, Hassanzadeh Z. Improvements on the hybrid Monte Carlo algorithms for matrix computations. Sādhanā. 2019;44(1):1. Available from: https://doi.org/10.1007/s12046-018-0983-y.
  8. Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. Cancer Evolution: Mathematical Models and Computational Inference. Syst Biol. 2015;64. Available from: https://doi.org/10.1093/sysbio/syu081.
  9. Choo-Wosoba H, Albert PS, Zhu B. A hidden Markov modeling approach for identifying tumor subclones in next-generation sequencing studies. Biostatistics. 2022;23:69-82. Available from: https://doi.org/10.1093/biostatistics/kxaa013
  10. Bailey MH, Tokheim C, Porta-Pardo EL, et al. Comprehensive characterization of cancer driver genes and mutations. Cell. 2018;173(2):371-385.e18. Available from: https://doi.org/10.1016/j.cell.2018.02.060.
  11. Buchholz P, Dayar T. On the convergence of a class of multilevel methods for large sparse Markov chains. SIAM J Matrix Anal Appl. 2007;29(3):1025-1049. Available from: https://doi.org/10.1137/060651161.
  12. Plateau B, Stewart WJ. Stochastic automata networks. In: International series in operations research and management science. New York: Springer; 2000. p. 113-151. Available from: https://link.springer.com/chapter/10.1007/978-1-4757-4828-4_5
  13. Lecian OM. Analytical results from the two-states Markov-states model and applications to validation of molecular dynamics. Int J Math Comput Res. 2023;11(9):3746-3754. Available from: https://doi.org/10.47191/ijmcr/v11i9.08
  14. Lecian OM. Laplace Kernels with Radon measures in Galerkin Markov-State Models: new theorems about analytical expressions of time evolution of eigenvalues and about errors. e-print. 2024. Available from: http://dx.doi.org/10.13140/RG.2.2.24311.39841/1
 
