butools.fitting.PHFromTrace =========================== .. currentmodule:: butools.fitting .. np:function:: PHFromTrace .. list-table:: :widths: 25 150 * - Matlab: - :code:`[alpha, A, logli] = PHFromTrace(trace, orders, maxIter, stopCond, initial, result)` * - Mathematica: - :code:`{alpha, A, logli} = PHFromTrace[trace, orders, maxIter, stopCond, initial, result]` * - Python/Numpy: - :code:`alpha, A, logli = PHFromTrace(trace, orders, maxIter, stopCond, initial, result)` Performs PH distribution fitting using the EM algorithm (G-FIT, [1]_). Parameters ---------- trace : column vector, length K The samples of the trace orders : list of int, length(N), or int The length of the list determines the number of Erlang branches to use in the fitting method. The entries of the list are the orders of the Erlang distributions. If this parameter is a single integer, all possible branch number - order combinations are tested where the total number of states is "orders". maxIter : int, optional Maximum number of iterations. The default value is 200 stopCond : double, optional The algorithm stops if the relative improvement of the log likelihood falls below stopCond. The default value is 1e-7 initial : tuple of two vectors, optional The initial values of the branch probabilities and rate parameters is given by this tuple. If not given, a default initial guess is determined and the algorithm starts from there. result : {"vecmat", "vecvec"}, optional The result can be returned two ways. If "vecmat" is selected, the result is returned in the classical representation of phase-type distributions, thus the initial vector and the generator matrix. If "vecvec" is selected, two vectors are returned, one holds the branch probabilities, and the second holds the rate parameters of the Erlang branches. The default value is "vecmat" Returns ------- (alpha, A) : tuple of matrix, shape (1,M) and matrix, shape (M,M) If the "vecmat" result format is chosen, the function returns the initial probability vector and the generator matrix of the phase type distribution. (pi, lambda) : tuple of vector, length N and vector, length N If the "vecvec" result format is chosen, the function returns the vector of branch probabilities and the vector of branch rates in a tuple. logli : double The log-likelihood divided by the trace length Notes ----- This procedure is quite fast in the supported mathematical frameworks. If the maximum speed is needed, please use the multi-core optimized c++ implementation called SPEM-FIT_. .. _SPEM-FIT: https://bitbucket.org/ghorvath78/spemfit References ---------- .. [1] Thummler, Axel, Peter Buchholz, and Miklós Telek. A novel approach for fitting probability distributions to real trace data with the EM algorithm. Dependable Systems and Networks, 2005. Examples ======== For Matlab: >>> tr = dlmread('/home/gabor/github/butools/test/data/bctrace.iat'); >>> [alpha, A] = PHFromTrace(tr, 5); Num of iterations: 26, logli: 4.80924 Num of iterations: 26, logli: 4.80924 EM algorithm terminated. (orders=1,4) Num of iterations: 38, logli: 4.70421 EM algorithm terminated. (orders=2,3) Num of iterations: 30, logli: 4.89844 Num of iterations: 60, logli: 4.8989 Num of iterations: 91, logli: 4.89925 Num of iterations: 121, logli: 4.8996 Num of iterations: 150, logli: 4.9 Num of iterations: 179, logli: 4.90046 Num of iterations: 201, logli: 4.90074 EM algorithm terminated. (orders=1,1,3) Num of iterations: 30, logli: 4.85155 Num of iterations: 58, logli: 4.85184 Num of iterations: 87, logli: 4.85214 Num of iterations: 119, logli: 4.85259 Num of iterations: 150, logli: 4.85461 Num of iterations: 179, logli: 4.88627 Num of iterations: 192, logli: 4.91508 EM algorithm terminated. (orders=1,2,2) Num of iterations: 22, logli: 4.88641 Num of iterations: 43, logli: 4.88777 Num of iterations: 64, logli: 4.88814 Num of iterations: 84, logli: 4.88827 Num of iterations: 104, logli: 4.88833 Num of iterations: 124, logli: 4.88836 Num of iterations: 145, logli: 4.88837 Num of iterations: 153, logli: 4.88838 EM algorithm terminated. (orders=1,1,1,2) Num of iterations: 17, logli: 4.85091 Num of iterations: 27, logli: 4.85108 EM algorithm terminated. (orders=1,1,1,1,1) Best solution: logli=4.91508, orders=1,2,2 >>> disp(alpha); 0.065027 0.85788 0 0.077088 0 >>> disp(A); -63.308 0 0 0 0 0 -815.72 815.72 0 0 0 0 -815.72 0 0 0 0 0 -12563 12563 0 0 0 0 -12563 >>> logli = LikelihoodFromTrace(tr, alpha, A); >>> disp(logli); 4.9151 >>> intBounds = linspace(0, MarginalMomentsFromTrace(tr, 1)*4, 50); >>> [pdfTrX, pdfTrY] = PdfFromTrace(tr, intBounds); >>> [pdfPHX, pdfPHY] = IntervalPdfFromPH(alpha, A, intBounds); >>> sqPdf = EmpiricalSquaredDifference(pdfTrY, pdfPHY, intBounds); >>> disp(sqPdf); 0.0079115 >>> rePdf = EmpiricalRelativeEntropy(pdfTrY, pdfPHY, intBounds); >>> disp(rePdf); 0.35834 >>> [cdfTrX, cdfTrY] = CdfFromTrace(tr); >>> step = ceil(length(tr)/2000); >>> cdfTrX = cdfTrX(1:step:length(tr)); >>> cdfTrY = cdfTrY(1:step:length(tr)); >>> cdfPHY = CdfFromPH(alpha, A, cdfTrX); >>> sqCdf = EmpiricalSquaredDifference(cdfTrY(1:end-1), cdfPHY(1:end-1), cdfTrX); >>> disp(sqCdf); 9.9902e-11 >>> reCdf = EmpiricalRelativeEntropy(cdfTrY(1:end-1), cdfPHY(1:end-1), cdfTrX); >>> disp(reCdf); 0.00018583 For Mathematica: >>> tr = Flatten[Import["/home/gabor/github/butools/test/data/bctrace.iat","CSV"]]; >>> {alpha, A, logli} = PHFromTrace[tr, 5]; "Num of iterations: "20", logli: "4.8092300067059535 "Num of iterations: "26", logli: "4.809235435027485 "EM algorithm terminated. (orders="{1, 4}")" "Num of iterations: "20", logli: "4.683317781477141 "Num of iterations: "38", logli: "4.704214271672482 "Num of iterations: "38", logli: "4.704214271672482 "EM algorithm terminated. (orders="{2, 3}")" "Num of iterations: "13", logli: "4.894318246490096 "Num of iterations: "26", logli: "4.898349065288064 "Num of iterations: "39", logli: "4.8985999642994065 "Num of iterations: "51", logli: "4.89878133343547 "Num of iterations: "64", logli: "4.8989479972108 "Num of iterations: "77", logli: "4.8990977181707995 "Num of iterations: "90", logli: "4.89924035598373 "Num of iterations: "102", logli: "4.899372417693129 "Num of iterations: "114", logli: "4.899510325011154 "Num of iterations: "127", logli: "4.899671843230089 "Num of iterations: "139", logli: "4.899836025597829 "Num of iterations: "152", logli: "4.900031245080798 "Num of iterations: "165", logli: "4.900238803829762 "Num of iterations: "178", logli: "4.900443966924616 "Num of iterations: "190", logli: "4.9006139623651945 "Num of iterations: "201", logli: "4.900742157090518 "EM algorithm terminated. (orders="{1, 1, 3}")" "Num of iterations: "14", logli: "4.846111351537217 "Num of iterations: "27", logli: "4.851422822109602 "Num of iterations: "40", logli: "4.851663586226162 "Num of iterations: "52", logli: "4.85177202585639 "Num of iterations: "65", logli: "4.851912954098613 "Num of iterations: "78", logli: "4.8520519001971145 "Num of iterations: "91", logli: "4.852183146393526 "Num of iterations: "104", logli: "4.852333485343567 "Num of iterations: "117", logli: "4.852550797457522 "Num of iterations: "130", logli: "4.852926291581028 "Num of iterations: "143", logli: "4.853720321486975 "Num of iterations: "158", logli: "4.85667738204316 "Num of iterations: "167", logli: "4.8623051325434705 "Num of iterations: "168", logli: "4.863279856542737 "Num of iterations: "169", logli: "4.864341586890624 "Num of iterations: "170", logli: "4.865497327898647 "Num of iterations: "171", logli: "4.866757372059696 "Num of iterations: "172", logli: "4.868137493672351 "Num of iterations: "173", logli: "4.8696621585039255 "Num of iterations: "174", logli: "4.871369345657646 "Num of iterations: "175", logli: "4.8733180347788565 "Num of iterations: "176", logli: "4.875600054271152 "Num of iterations: "177", logli: "4.878358389000584 "Num of iterations: "178", logli: "4.881811700896143 "Num of iterations: "179", logli: "4.886268931740982 "Num of iterations: "180", logli: "4.892054411447398 "Num of iterations: "181", logli: "4.899120695311693 "Num of iterations: "182", logli: "4.9062691521540565 "Num of iterations: "183", logli: "4.911394072570149 "Num of iterations: "184", logli: "4.913841789490561 "Num of iterations: "185", logli: "4.914707340661588 "Num of iterations: "186", logli: "4.914972473547811 "Num of iterations: "187", logli: "4.915049918282578 "Num of iterations: "188", logli: "4.915072471899828 "Num of iterations: "189", logli: "4.915079202008659 "Num of iterations: "190", logli: "4.915081312881225 "Num of iterations: "191", logli: "4.915082026039371 "Num of iterations: "192", logli: "4.91508228990118 "Num of iterations: "192", logli: "4.91508228990118 "EM algorithm terminated. (orders="{1, 2, 2}")" "Num of iterations: "11", logli: "4.879855265152097 "Num of iterations: "21", logli: "4.886270915069811 "Num of iterations: "30", logli: "4.887168456960867 "Num of iterations: "40", logli: "4.887668315460644 "Num of iterations: "49", logli: "4.887917322768672 "Num of iterations: "59", logli: "4.888081691124073 "Num of iterations: "69", logli: "4.888181621001137 "Num of iterations: "78", logli: "4.888239956657024 "Num of iterations: "88", logli: "4.888283611362753 "Num of iterations: "98", logli: "4.888313275634503 "Num of iterations: "108", logli: "4.888334008025995 "Num of iterations: "118", logli: "4.888348859128792 "Num of iterations: "128", logli: "4.888359731068756 "Num of iterations: "138", logli: "4.888367845219129 "Num of iterations: "149", logli: "4.888374537830832 "Num of iterations: "153", logli: "4.8883765319279915 "EM algorithm terminated. (orders="{1, 1, 1, 2}")" "Num of iterations: "9", logli: "4.844577387233243 "Num of iterations: "17", logli: "4.850912188288292 "Num of iterations: "26", logli: "4.8510832077549315 "Num of iterations: "27", logli: "4.851083604173109 "EM algorithm terminated. (orders="{1, 1, 1, 1, 1}")" "Best solution: logli="4.91508228990118", orders="{1, 2, 2} >>> Print[alpha]; {0.06502731323053666, 0.8578848706972437, 0, 0.07708781607263537, 0} >>> Print[A]; {{-63.307631149011584, 0, 0, 0, 0}, {0, -815.7180689395599, 815.7180689395599, 0, 0}, {0, 0., -815.7180689395599, 0, 0}, {0, 0, 0, -12563.087922793351, 12563.087922793351}, {0, 0, 0, 0., -12563.087922793351}} >>> Print[logli]; 4.91508228990118 >>> logli = LikelihoodFromTrace[tr, alpha, A]; >>> Print[logli]; 4.915082396859105 >>> intBounds = Array[# &, 50, {0, MarginalMomentsFromTrace[tr, 1][[1]]*4}]; >>> {pdfTrX, pdfTrY} = PdfFromTrace[tr, intBounds]; >>> {pdfPHX, pdfPHY} = IntervalPdfFromPH[alpha, A, intBounds]; >>> sqPdf = EmpiricalSquaredDifference[pdfTrY, pdfPHY, intBounds]; >>> Print[sqPdf]; 0.007911510224685201 >>> rePdf = EmpiricalRelativeEntropy[pdfTrY, pdfPHY, intBounds]; >>> Print[rePdf]; 0.3583352563975375 >>> {cdfTrX, cdfTrY} = CdfFromTrace[tr]; >>> step = Ceiling[Length[tr]/2000]; >>> cdfTrX = cdfTrX[[1;;Length[tr];;step]]; >>> cdfTrY = cdfTrY[[1;;Length[tr];;step]]; >>> cdfPHY = CdfFromPH[alpha, A, cdfTrX]; >>> sqCdf = EmpiricalSquaredDifference[cdfTrY[[1;;-2]], cdfPHY[[1;;-2]], cdfTrX]; >>> Print[sqCdf]; 9.990199283948094*^-11 >>> reCdf = EmpiricalRelativeEntropy[cdfTrY[[1;;-2]], cdfPHY[[1;;-2]], cdfTrX]; >>> Print[reCdf]; 0.000185826390423264 For Python/Numpy: >>> tr = np.loadtxt("/home/gabor/github/butools/test/data/bctrace.iat") >>> alpha, A = PHFromTrace(tr, 5) iteration: 10 , logli: 4.8091271432 iteration: 20 , logli: 4.80923172357 EM algorithm terminated. [1, 4] Num of iterations: 26 , logli: 4.80923543503 iteration: 10 , logli: 4.65521504335 iteration: 20 , logli: 4.68768389352 iteration: 30 , logli: 4.70415249872 EM algorithm terminated. [2, 3] Num of iterations: 38 , logli: 4.70421427167 iteration: 10 , logli: 4.88937597779 iteration: 20 , logli: 4.89819514171 iteration: 30 , logli: 4.89845514506 iteration: 40 , logli: 4.89863272893 iteration: 50 , logli: 4.89878133343 iteration: 60 , logli: 4.89891146874 iteration: 70 , logli: 4.89903002265 iteration: 80 , logli: 4.89914197577 iteration: 90 , logli: 4.89925126394 iteration: 100 , logli: 4.89936125929 iteration: 110 , logli: 4.89947502785 iteration: 120 , logli: 4.89959540205 iteration: 130 , logli: 4.89972483134 iteration: 140 , logli: 4.89986492887 iteration: 150 , logli: 4.90001566233 iteration: 160 , logli: 4.90017434189 iteration: 170 , logli: 4.90033497306 iteration: 180 , logli: 4.90048885472 iteration: 190 , logli: 4.9006268209 iteration: 200 , logli: 4.90074215709 EM algorithm terminated. [1, 1, 3] Num of iterations: 201 , logli: 4.90074215709 iteration: 10 , logli: 4.8428939533 iteration: 20 , logli: 4.85044993659 iteration: 30 , logli: 4.85156877825 iteration: 40 , logli: 4.85167161239 iteration: 50 , logli: 4.85176192227 iteration: 60 , logli: 4.8518686018 iteration: 70 , logli: 4.85197855253 iteration: 80 , logli: 4.85208231474 iteration: 90 , logli: 4.8521831464 iteration: 100 , logli: 4.85229494225 iteration: 110 , logli: 4.85243806034 iteration: 120 , logli: 4.85264283694 iteration: 130 , logli: 4.85296689978 iteration: 140 , logli: 4.85354745937 iteration: 150 , logli: 4.85478414915 iteration: 160 , logli: 4.85802067972 iteration: 170 , logli: 4.86675737206 iteration: 180 , logli: 4.89912069533 iteration: 190 , logli: 4.91508202604 EM algorithm terminated. [1, 2, 2] Num of iterations: 192 , logli: 4.9150822899 iteration: 10 , logli: 4.87985526515 iteration: 20 , logli: 4.88627091507 iteration: 30 , logli: 4.88723479659 iteration: 40 , logli: 4.88770291202 iteration: 50 , logli: 4.88795746078 iteration: 60 , logli: 4.88810565844 iteration: 70 , logli: 4.88819663412 iteration: 80 , logli: 4.8882549448 iteration: 90 , logli: 4.88829369783 iteration: 100 , logli: 4.88832026537 iteration: 110 , logli: 4.88833897739 iteration: 120 , logli: 4.88835247272 iteration: 130 , logli: 4.8883624119 iteration: 140 , logli: 4.88836986989 iteration: 150 , logli: 4.88837556031 EM algorithm terminated. [1, 1, 1, 2] Num of iterations: 153 , logli: 4.88837653192 iteration: 10 , logli: 4.84789680934 iteration: 20 , logli: 4.85106678625 EM algorithm terminated. [1, 1, 1, 1, 1] Num of iterations: 27 , logli: 4.85108360418 >>> print(alpha) [[ 0.06503 0.85788 0. 0.07709 0. ]] >>> print(A) [[ -63.30763 0. 0. 0. 0. ] [ 0. -815.71807 815.71807 0. 0. ] [ 0. 0. -815.71807 0. 0. ] [ 0. 0. 0. -12563.08792 12563.08792] [ 0. 0. 0. 0. -12563.08792]] >>> logli = LikelihoodFromTrace(tr, alpha, A) >>> print(logli) 4.91508239686 >>> intBounds = np.linspace(0, MarginalMomentsFromTrace(tr, 1)[0]*4, 50) >>> pdfTrX, pdfTrY = PdfFromTrace(tr, intBounds) >>> pdfPHX, pdfPHY = IntervalPdfFromPH(alpha, A, intBounds) >>> sqPdf = EmpiricalSquaredDifference(pdfTrY, pdfPHY, intBounds) >>> print(sqPdf) 0.00791151022468 >>> rePdf = EmpiricalRelativeEntropy(pdfTrY, pdfPHY, intBounds) >>> print(rePdf) 0.358335256398 >>> cdfTrX, cdfTrY = CdfFromTrace(tr) >>> step = math.ceil(Length(tr)/2000) >>> cdfTrX = cdfTrX[0:Length(tr):step] >>> cdfTrY = cdfTrY[0:Length(tr):step] >>> cdfPHY = CdfFromPH(alpha, A, cdfTrX) >>> sqCdf = EmpiricalSquaredDifference(cdfTrY[0:-1], cdfPHY[0:-1], cdfTrX) >>> print(sqCdf) 9.99019928404e-11 >>> reCdf = EmpiricalRelativeEntropy(cdfTrY[0:-1], cdfPHY[0:-1], cdfTrX) >>> print(reCdf) 0.000185826390424