(Non)-Random Number Seed for Bootstrapping

stefanbehrens
PLS Expert User
Posts: 54
Joined: Wed Oct 19, 2005 5:53 pm
Real name and title:

Post by stefanbehrens »

Dear Development Team,

I think I may have discovered an issue with the bootstrap estimates for very simple path models on fast computers:

Apparently, the seed for the random number generator (RNG) used to select cases for the bootstrap samples does not always change from run to run. I suspect that system time is used as a seed to "randomize" the RNG for each resample. However, on fast computers and with relatively simple models, SmartPLS may actually calculate several resamples before the "randomization" seed changes, resulting in identical resamples (and identical parameter estimates).

I'm not sure exactly what this does to the bootstrap estimates and related t-values, but it can't be good.

Is there any way to fix this (besides running the model on a slower computer)?

All the best,
Stefan


PS: Here is some sample output from the bootstrapping report (path coefficients):

Sample 0 0,0037 -0,1155 0,1337 0,5624 -0,2118 -0,1919
Sample 1 0,0037 -0,1155 0,1337 0,5624 -0,2118 -0,1919
Sample 2 -0,0332 -0,0713 0,0237 0,5580 -0,2430 -0,2560
Sample 3 -0,0332 -0,0713 0,0237 0,5580 -0,2430 -0,2560
Sample 4 0,0304 -0,1961 0,0036 0,6400 -0,0717 -0,2091
Sample 5 0,0304 -0,1961 0,0036 0,6400 -0,0717 -0,2091
Sample 6 0,0658 -0,1237 -0,0431 0,5848 -0,1848 -0,1739
Sample 7 0,0658 -0,1237 -0,0431 0,5848 -0,1848 -0,1739
Sample 8 0,0951 -0,1690 0,2187 0,5332 -0,2126 -0,1021
Sample 9 0,0951 -0,1690 0,2187 0,5332 -0,2126 -0,1021
Sample 10 -0,1154 -0,0433 -0,0004 0,6180 -0,2818 -0,2933
Sample 11 -0,1154 -0,0433 -0,0004 0,6180 -0,2818 -0,2933
Sample 12 -0,1990 0,0938 0,1380 0,5396 -0,3738 -0,1740
Sample 13 -0,1990 0,0938 0,1380 0,5396 -0,3738 -0,1740
Sample 14 -0,1990 0,0938 0,1380 0,5396 -0,3738 -0,1740
Sample 15 -0,0360 -0,1343 0,1086 0,6324 -0,2123 -0,1325
Sample 16 -0,0360 -0,1343 0,1086 0,6324 -0,2123 -0,1325
Sample 17 -0,0837 -0,0254 0,1366 0,5996 -0,2713 -0,1032
Sample 18 -0,0837 -0,0254 0,1366 0,5996 -0,2713 -0,1032
...


The model I was testing had 1 dependent LV regressed on 6 independent LVs. Each LV was measured by a single indicator (LV scores from a previous PLS run). The sample comprised 42 cases.
cringle
SmartPLS Developer
Posts: 818
Joined: Tue Sep 20, 2005 9:13 am
Real name and title: Prof. Dr. Christian M. Ringle
Location: Hamburg (Germany)

Post by cringle »

Hi Stefan!

Basically, each subsample is randomly created with cases from the original sample. In your example, I am just wondering about the pairwise identical results...

We will check on that.

Best
Christian
stefanbehrens
PLS Expert User
Posts: 54
Joined: Wed Oct 19, 2005 5:53 pm
Real name and title:

Post by stefanbehrens »

Christian,

I am aware of the functioning of the algorithm in principle. However, the "randomness" of the case selection in the resampling is essential here. Knowing that it is impossible for a computer program to produce truly random numbers, I am pointing out a weakness in the pseudo-random number generator (PRNG) that SmartPLS is using (probably the one built into the Java platform, I presume).

Whenever a PRNG is initialized with identical seeds, it will produce an identical series of random numbers (and thus select the same cases from the sample "at random"). In most applications and programming environments, system time (in milliseconds) is used to provide a changing seed for the PRNG (typically this is done by inserting a "randomize();" command before you start producing random numbers). However, if the system time (in milliseconds) has not changed from one run to the next, then the randomization will simply produce the same series of pseudo-random numbers again.
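To illustrate the point, here is a minimal Java sketch. It is not SmartPLS code: the class name SeedDemo and the method drawCases are made up for this example, and I am only assuming that the case selection works roughly like this. Two generators created with the same millisecond value select exactly the same cases:

```java
import java.util.Arrays;
import java.util.Random;

class SeedDemo {
    // Draws n case indices (with replacement) from a sample of sampleSize cases,
    // using a freshly created generator with the given seed.
    static int[] drawCases(long seed, int n, int sampleSize) {
        Random rng = new Random(seed);
        int[] cases = new int[n];
        for (int i = 0; i < n; i++) {
            cases[i] = rng.nextInt(sampleSize);
        }
        return cases;
    }

    public static void main(String[] args) {
        // Assumption: both resamples start within the same millisecond,
        // so both generators receive the same seed.
        long seed = System.currentTimeMillis();
        int[] first = drawCases(seed, 42, 42);
        int[] second = drawCases(seed, 42, 42);
        System.out.println("identical resamples: " + Arrays.equals(first, second));
    }
}
```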

Maybe one of the following ideas could help fix the problem:
a) slow down the calculation procedure for the bootstrap (just enough so that each run takes at least 1ms) or
b) use the "randomize();" command only every nth run so that the randomization occurs in time intervals >1ms

I've also checked how this multiplication of identical runs in the bootstrap affects the estimates (t-values, etc) and found little effect for large numbers of runs (>1000). However, it is a problem if the number of runs is set very low (e.g. <200).

A possible workaround for the time being goes as follows:
- run the bootstrap with a large number of resamples (e.g. 2000)
- copy the data from the report into Excel
- check for equality of one line to the next
- delete redundant lines
- calculate the bootstrap statistics in Excel from the remaining truly unique lines (mean, stdev, t-value)
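
For anyone who prefers not to do this by hand in Excel, the same steps can be sketched in Java. This is only an illustration under my own assumptions: the class BootstrapDedup and its methods are hypothetical, the report lines are assumed to look like the output posted above (with decimal commas), and only the mean and sample standard deviation are computed. How the t-value is formed from these (for example, the original estimate divided by the bootstrap stdev) is left to the reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BootstrapDedup {
    // Parses one report line such as "Sample 3 -0,0332 -0,0713 ..." into its
    // coefficients; the decimal comma is converted to a decimal point.
    static double[] parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] values = new double[tokens.length - 2]; // skip "Sample" and the index
        for (int i = 2; i < tokens.length; i++) {
            values[i - 2] = Double.parseDouble(tokens[i].replace(',', '.'));
        }
        return values;
    }

    // Keeps a row only if it differs from the row directly before it.
    static List<double[]> dropConsecutiveDuplicates(List<double[]> rows) {
        List<double[]> unique = new ArrayList<>();
        double[] previous = null;
        for (double[] row : rows) {
            if (previous == null || !Arrays.equals(previous, row)) {
                unique.add(row);
            }
            previous = row;
        }
        return unique;
    }

    static double mean(List<double[]> rows, int column) {
        double sum = 0.0;
        for (double[] row : rows) {
            sum += row[column];
        }
        return sum / rows.size();
    }

    // Sample standard deviation (divides by n - 1).
    static double stdev(List<double[]> rows, int column) {
        double m = mean(rows, column);
        double sumOfSquares = 0.0;
        for (double[] row : rows) {
            double d = row[column] - m;
            sumOfSquares += d * d;
        }
        return Math.sqrt(sumOfSquares / (rows.size() - 1));
    }

    public static void main(String[] args) {
        // First six lines of the report output from my first post.
        String[] report = {
            "Sample 0 0,0037 -0,1155 0,1337 0,5624 -0,2118 -0,1919",
            "Sample 1 0,0037 -0,1155 0,1337 0,5624 -0,2118 -0,1919",
            "Sample 2 -0,0332 -0,0713 0,0237 0,5580 -0,2430 -0,2560",
            "Sample 3 -0,0332 -0,0713 0,0237 0,5580 -0,2430 -0,2560",
            "Sample 4 0,0304 -0,1961 0,0036 0,6400 -0,0717 -0,2091",
            "Sample 5 0,0304 -0,1961 0,0036 0,6400 -0,0717 -0,2091"
        };
        List<double[]> rows = new ArrayList<>();
        for (String line : report) {
            rows.add(parseLine(line));
        }
        List<double[]> unique = dropConsecutiveDuplicates(rows);
        System.out.println(rows.size() + " lines read, " + unique.size() + " unique");
        for (int c = 0; c < unique.get(0).length; c++) {
            System.out.printf("column %d: mean=%.4f stdev=%.4f%n",
                    c, mean(unique, c), stdev(unique, c));
        }
    }
}
```

Only consecutive lines are compared, which matches the pattern in the report: the duplicates always appear directly after one another.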

Cheers,
Stefan

PS: Here is some more report output to show that the problem is not limited to pairs of duplicate lines; depending on computer speed, there can be several identical lines in a row. This was run using a model of 6 independent LVs and 1 dependent LV. Each LV had one indicator. The sample comprised 40 cases, and the same number of cases was set for each resample. The model was run on a laptop with a 1.6 GHz Pentium M, Sonoma chipset, and 1 GB RAM.

Sample 0 -0,1618 -0,2888 0,2299 0,5556 -0,0536 -0,1482
Sample 1 0,0406 -0,4319 0,3569 0,5328 -0,0183 -0,0936
Sample 2 -0,1654 -0,1141 0,4500 0,5330 -0,1404 0,1131
Sample 3 0,1274 -0,2103 0,1656 0,7546 -0,0847 -0,2080
Sample 4 0,1274 -0,2103 0,1656 0,7546 -0,0847 -0,2080
Sample 5 -0,3177 -0,0703 0,4185 0,4816 0,1814 0,2453
Sample 6 0,0281 -0,3024 0,3477 0,4880 -0,0882 -0,0231
Sample 7 0,0281 -0,3024 0,3477 0,4880 -0,0882 -0,0231
Sample 8 0,0281 -0,3024 0,3477 0,4880 -0,0882 -0,0231
Sample 9 -0,2096 -0,5909 0,4363 0,2391 -0,0739 0,0480
Sample 10 -0,2096 -0,5909 0,4363 0,2391 -0,0739 0,0480
Sample 11 -0,2096 -0,5909 0,4363 0,2391 -0,0739 0,0480
Sample 12 0,0687 -0,2661 0,3348 0,5511 -0,2903 -0,0090
Sample 13 0,0687 -0,2661 0,3348 0,5511 -0,2903 -0,0090
Sample 14 0,0507 -0,3834 0,3816 0,6378 -0,0190 -0,1298
Sample 15 0,0507 -0,3834 0,3816 0,6378 -0,0190 -0,1298
Sample 16 0,0320 -0,1433 0,3896 0,6447 -0,2545 0,0572
Sample 17 0,0320 -0,1433 0,3896 0,6447 -0,2545 0,0572
Sample 18 0,0320 -0,1433 0,3896 0,6447 -0,2545 0,0572
Sample 19 0,1119 -0,4188 0,4484 0,6017 -0,1768 -0,0578
Sample 20 0,1119 -0,4188 0,4484 0,6017 -0,1768 -0,0578
Sample 21 0,1119 -0,4188 0,4484 0,6017 -0,1768 -0,0578
Sample 22 0,1588 -0,4926 0,3758 0,5599 -0,0387 -0,1945
Sample 23 0,1588 -0,4926 0,3758 0,5599 -0,0387 -0,1945
Sample 24 0,1588 -0,4926 0,3758 0,5599 -0,0387 -0,1945
Sample 25 0,0967 -0,1814 -0,0797 0,7523 -0,0990 -0,1594
Sample 26 0,0967 -0,1814 -0,0797 0,7523 -0,0990 -0,1594
Sample 27 0,0967 -0,1814 -0,0797 0,7523 -0,0990 -0,1594
Sample 28 0,0967 -0,1814 -0,0797 0,7523 -0,0990 -0,1594
Sample 29 -0,0958 -0,2777 0,5164 0,7048 -0,1075 0,1547
Sample 30 -0,0958 -0,2777 0,5164 0,7048 -0,1075 0,1547
Sample 31 -0,0958 -0,2777 0,5164 0,7048 -0,1075 0,1547
Sample 32 -0,0958 -0,2777 0,5164 0,7048 -0,1075 0,1547
Sample 33 -0,0958 -0,2777 0,5164 0,7048 -0,1075 0,1547
Sample 34 -0,0958 -0,2777 0,5164 0,7048 -0,1075 0,1547
Sample 35 0,3778 -0,1678 -0,3332 0,9999 0,0855 -0,0622
Sample 36 0,3778 -0,1678 -0,3332 0,9999 0,0855 -0,0622
Sample 37 0,3778 -0,1678 -0,3332 0,9999 0,0855 -0,0622
Sample 38 0,3778 -0,1678 -0,3332 0,9999 0,0855 -0,0622
Sample 39 0,3778 -0,1678 -0,3332 0,9999 0,0855 -0,0622
Sample 40 0,3778 -0,1678 -0,3332 0,9999 0,0855 -0,0622
Sample 41 0,3976 -0,1035 -0,3952 0,8609 0,1412 -0,2086
Sample 42 0,3976 -0,1035 -0,3952 0,8609 0,1412 -0,2086
Sample 43 0,3976 -0,1035 -0,3952 0,8609 0,1412 -0,2086
Sample 44 0,3976 -0,1035 -0,3952 0,8609 0,1412 -0,2086
Sample 45 0,3976 -0,1035 -0,3952 0,8609 0,1412 -0,2086
Sample 46 0,3976 -0,1035 -0,3952 0,8609 0,1412 -0,2086
Sample 47 0,3976 -0,1035 -0,3952 0,8609 0,1412 -0,2086
Sample 48 0,3976 -0,1035 -0,3952 0,8609 0,1412 -0,2086
Sample 49 -0,1288 -0,0363 -0,1639 0,7339 0,0230 -0,2226
Sample 50 -0,1288 -0,0363 -0,1639 0,7339 0,0230 -0,2226
Sample 51 -0,1288 -0,0363 -0,1639 0,7339 0,0230 -0,2226
Sample 52 -0,1288 -0,0363 -0,1639 0,7339 0,0230 -0,2226
Sample 53 -0,1288 -0,0363 -0,1639 0,7339 0,0230 -0,2226
Sample 54 -0,1288 -0,0363 -0,1639 0,7339 0,0230 -0,2226
Sample 55 -0,1288 -0,0363 -0,1639 0,7339 0,0230 -0,2226
Sample 56 0,1913 -0,3949 0,1117 0,6225 -0,1573 -0,4008
Sample 57 0,1913 -0,3949 0,1117 0,6225 -0,1573 -0,4008
Sample 58 0,1913 -0,3949 0,1117 0,6225 -0,1573 -0,4008
Sample 59 0,1913 -0,3949 0,1117 0,6225 -0,1573 -0,4008
Sample 60 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
Sample 61 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
Sample 62 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
Sample 63 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
Sample 64 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
Sample 65 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
Sample 66 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
Sample 67 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
Sample 68 -0,0988 -0,1841 -0,0262 0,6536 -0,0487 -0,1454
...
swende
SmartPLS Developer
Posts: 111
Joined: Mon Sep 26, 2005 10:34 am
Real name and title: Dipl. Wi-Inf. Sven Wende
Location: Hamburg (Germany)

Post by swende »

Hi Stefan,

Thanks for your efforts. It seems you are becoming one of our best debuggers! ;)

I just took a look at our program code and did indeed find a problem.

But the problems you described are not caused by a lack of "randomness".

In fact, there is a bug within the report viewer. When you calculate your model more than once using the same algorithm, the different report viewers work on cached algorithm results. That's why you always see the same results during Bootstrapping.

The bug can be reproduced as follows:

- 1. Calculate a model with 30 samples and let the report open (but do not look at any report details yet).
- 2. Calculate the same model with 200 samples and let the report open.
- 3. If you now look at the first report viewer, you will see that it shows results for 200 samples.

I already fixed the bug.

Until the next release (which will be available soon), I suggest the following workarounds:

- show only one report in SmartPLS at a time
- if you need to compare report results from different runs, transfer the report results to Excel or generate an HTML report after each calculation

Sven
Sven Wende, CEO SmartPLS GmbH
stefanbehrens
PLS Expert User
Posts: 54
Joined: Wed Oct 19, 2005 5:53 pm
Real name and title:

Post by stefanbehrens »

Dear Sven,

Thanks for looking into this so quickly. However, I'm still not convinced that the problem you described is the cause of the resample duplications.

I just opened SmartPLS fresh from the Start Menu. Then I opened my simple path model (see above) and ran a bootstrap. Looking at the results still shows the same issue as reported above (see below).

I should point out that running the bootstrap on more complicated models (which have a few indicators for each LV) shows no sign of resample duplication. As I mentioned above, I believe this is due to the fact that the PLS estimation for each run takes more than 1 ms in more complicated models.

Any thoughts? I will try running the HTML reports and see what that does...

Cheers,
Stefan



DATA from a Bootstrap of the simple model:

Sample 0 0,0000 -0,0942 0,1661 0,7536 -0,1551 -0,0710
Sample 1 0,0000 -0,0942 0,1661 0,7536 -0,1551 -0,0710
Sample 2 -0,1377 -0,2231 0,2390 0,5394 -0,0554 -0,1293
Sample 3 -0,1377 -0,2231 0,2390 0,5394 -0,0554 -0,1293
Sample 4 -0,1377 -0,2231 0,2390 0,5394 -0,0554 -0,1293
Sample 5 -0,0112 -0,1887 0,1613 0,6461 -0,0503 -0,2234
Sample 6 -0,0112 -0,1887 0,1613 0,6461 -0,0503 -0,2234
Sample 7 -0,0112 -0,1887 0,1613 0,6461 -0,0503 -0,2234
Sample 8 -0,1007 -0,3398 0,3062 0,5037 -0,1243 -0,0527
Sample 9 -0,1007 -0,3398 0,3062 0,5037 -0,1243 -0,0527
Sample 10 0,2324 -0,1712 0,0173 0,7331 -0,1304 -0,2058
Sample 11 0,2324 -0,1712 0,0173 0,7331 -0,1304 -0,2058
Sample 12 0,2324 -0,1712 0,0173 0,7331 -0,1304 -0,2058
Sample 13 0,2324 -0,1712 0,0173 0,7331 -0,1304 -0,2058
Sample 14 0,1539 -0,3880 0,2769 0,6120 0,0328 0,1769
Sample 15 0,1539 -0,3880 0,2769 0,6120 0,0328 0,1769
Sample 16 0,1539 -0,3880 0,2769 0,6120 0,0328 0,1769
Sample 17 0,1539 -0,3880 0,2769 0,6120 0,0328 0,1769
Sample 18 -0,0832 -0,3777 0,4860 0,2904 -0,1451 0,0006
Sample 19 -0,0832 -0,3777 0,4860 0,2904 -0,1451 0,0006
Sample 20 -0,0832 -0,3777 0,4860 0,2904 -0,1451 0,0006
...


DATA from a Bootstrap of a more complicated model:

Sample 0 0,4486 -0,0811 -0,2028 -0,3442 -0,3547 -0,5762 -0,7420 -0,1283 -0,1676 0,1995 0,2495 -0,0729 -0,3378 0,2261 -0,0923 0,7041 0,4184 0,2751 0,0241
Sample 1 0,6038 0,1148 -0,0742 -0,1074 -0,1649 0,1023 -0,5724 -0,4208 0,0740 0,0339 0,2426 0,2134 -0,1531 -0,0872 -0,5525 0,6402 -0,3336 -0,3680 -0,0519
Sample 2 0,6764 -0,1860 0,0007 0,0041 -0,1676 -0,2656 -0,5892 -0,3305 -0,0223 0,2191 0,2911 0,3027 -0,1241 0,1444 -0,2352 0,4670 0,1529 -0,0140 0,1783
Sample 3 0,5692 0,1549 -0,0964 0,0206 -0,2320 -0,2566 -0,7618 -0,1386 -0,0812 0,2477 0,6890 0,5256 -0,3056 0,1491 -0,5296 0,6271 0,0858 -0,2318 0,0380
Sample 4 0,4198 -0,0084 -0,1715 -0,1612 -0,0441 -0,3555 -0,5182 -0,3492 -0,0629 0,3565 0,3165 0,2675 -0,2306 0,2222 -0,3605 0,4984 0,2732 -0,1099 0,0189
Sample 5 0,5610 -0,1069 -0,2620 0,3242 -0,4177 -0,3751 -0,6373 -0,2229 -0,1270 0,3427 -0,0574 0,0865 -0,2525 0,4716 -0,3166 0,3443 -0,0195 -0,0983 0,1821
Sample 6 0,3189 -0,2443 -0,3570 -0,1508 -0,5398 -0,4494 -0,6976 -0,2317 -0,2832 0,2301 0,1458 0,0237 -0,1794 0,4438 -0,0971 0,5443 0,1861 -0,1156 0,0509
Sample 7 0,5278 -0,1922 -0,0891 -0,1479 -0,2822 0,1446 -0,5549 -0,4458 -0,0929 0,1065 0,2203 0,3748 -0,0255 0,0534 -0,2011 0,3849 -0,1010 -0,1086 0,0311
Sample 8 0,3908 -0,1483 -0,2448 -0,1787 -0,4888 -0,4346 -0,6332 -0,1067 -0,2265 0,3165 0,2890 0,1621 -0,3325 0,4559 -0,0403 0,3008 0,2183 -0,1155 0,0671
Sample 9 0,4042 -0,0968 -0,2548 -0,0434 -0,2554 -0,4022 -0,6359 -0,1792 -0,0592 0,2218 0,1990 0,0289 -0,1687 0,2669 -0,3663 0,4420 -0,0138 -0,1789 -0,1777
Sample 10 0,4729 0,0180 -0,2131 0,0952 -0,6726 -0,5873 -0,7254 -0,3104 -0,3740 0,3494 0,2840 0,1912 -0,1603 0,2069 -0,1627 0,1506 0,1430 -0,1418 -0,0029
Sample 11 0,4139 -0,0649 -0,1824 0,4574 -0,3535 -0,3209 -0,5152 -0,3285 -0,3637 -0,0082 -0,2792 -0,4080 -0,2654 0,3779 -0,1469 0,6693 0,3293 0,1630 0,0741
Sample 12 0,2078 -0,2348 -0,4039 -0,1378 -0,3928 -0,3887 -0,6216 -0,0791 0,0411 0,2388 0,1976 0,2159 -0,3035 0,6562 -0,2685 0,6134 0,3328 0,2672 -0,0138
Sample 13 0,7450 0,0403 -0,0217 0,1856 -0,3147 -0,2103 -0,6658 -0,1839 -0,2384 0,3000 0,4028 0,4090 -0,2463 0,2254 -0,0711 0,4434 0,0950 0,0124 -0,0617
Sample 14 0,4803 -0,2672 -0,2921 0,0566 -0,0944 0,1010 -0,5110 -0,1463 -0,4455 0,4332 0,3393 0,2803 0,0810 0,4753 -0,2176 0,4776 -0,0804 -0,1413 0,0878
Sample 15 0,5832 -0,0566 -0,1361 -0,1983 0,0443 0,1021 -0,7080 -0,3464 -0,4897 0,3080 0,4090 0,1846 -0,1472 0,1032 0,1662 0,4012 -0,1550 0,1547 0,1279
Sample 16 0,3106 0,0277 -0,2403 0,0013 -0,4043 -0,2757 -0,5377 -0,0264 -0,3951 0,2498 0,1351 0,0347 -0,1858 0,5284 -0,2501 0,5152 0,1939 0,0127 -0,1264
Sample 17 0,2455 -0,0227 -0,2747 -0,3851 -0,0546 -0,2954 -0,7574 -0,1577 0,2965 -0,0062 0,2463 0,0500 -0,2234 0,5444 -0,2861 0,5072 0,2504 0,0878 -0,0182
Sample 18 0,7312 0,2383 -0,0273 -0,0065 -0,4845 -0,4682 -0,7439 -0,1433 -0,2667 0,3201 0,6383 0,3581 -0,2134 -0,0045 -0,3792 0,7096 -0,1339 -0,1592 -0,1526
Sample 19 0,5656 0,2236 -0,0511 -0,1744 -0,3140 -0,2574 -0,5999 -0,2433 -0,1975 0,0272 0,5196 0,0671 -0,1687 0,1080 -0,1721 0,5470 -0,0386 -0,1869 -0,1818
Sample 20 0,5893 0,0429 -0,1485 0,2680 -0,6095 -0,1886 -0,4945 -0,1253 -0,1406 0,4187 0,0780 0,2954 -0,4470 0,2766 -0,3310 0,3937 -0,2287 -0,3213 0,1093
swende
SmartPLS Developer
Posts: 111
Joined: Mon Sep 26, 2005 10:34 am
Real name and title: Dipl. Wi-Inf. Sven Wende
Location: Hamburg (Germany)

Post by swende »

You are right.

I thought you were talking about two different bootstrap runs that produce the same results.

So we have two bugs here: the one I described and the "randomness" problem you describe.

At the moment, the random numbers in SmartPLS are generated using the standard Java random number generator and the current system time as seed.

Like this:

Code:

Random r = new Random(System.currentTimeMillis());

I think we can work around this problem in code by using a safer random generator, e.g. java.security.SecureRandom, which should produce very reliable random numbers at the cost of time and performance.
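
A minimal sketch of the difference, assuming the generator is currently re-created for every resample. The class ResampleDemo, its methods, and the simulated clock values are illustrative only and are not taken from the SmartPLS code. Since java.security.SecureRandom is a subclass of java.util.Random, it can be passed in wherever a Random is expected; reusing one generator for all resamples should also avoid the repeated seed.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Random;

class ResampleDemo {
    // Draws one bootstrap resample: sampleSize case indices, with replacement.
    static int[] resample(Random rng, int sampleSize) {
        int[] cases = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            cases[i] = rng.nextInt(sampleSize);
        }
        return cases;
    }

    // Problematic pattern: a new generator per resample, seeded with the clock
    // value at that moment. clockMillis simulates what the clock returned.
    static int[][] reseedEveryRun(long[] clockMillis, int sampleSize) {
        int[][] resamples = new int[clockMillis.length][];
        for (int i = 0; i < clockMillis.length; i++) {
            resamples[i] = resample(new Random(clockMillis[i]), sampleSize);
        }
        return resamples;
    }

    // Alternative: one generator, created once and reused for every resample.
    static int[][] reuseGenerator(Random rng, int runs, int sampleSize) {
        int[][] resamples = new int[runs][];
        for (int i = 0; i < runs; i++) {
            resamples[i] = resample(rng, sampleSize);
        }
        return resamples;
    }

    // Counts resamples that are identical to the one directly before them.
    static int countConsecutiveDuplicates(int[][] resamples) {
        int duplicates = 0;
        for (int i = 1; i < resamples.length; i++) {
            if (Arrays.equals(resamples[i - 1], resamples[i])) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Simulated timestamps: the clock advances only once during five runs.
        long[] clock = {1000L, 1000L, 1001L, 1001L, 1001L};
        System.out.println("reseeded every run: "
                + countConsecutiveDuplicates(reseedEveryRun(clock, 42)) + " duplicates");
        System.out.println("one SecureRandom:   "
                + countConsecutiveDuplicates(reuseGenerator(new SecureRandom(), 5, 42))
                + " duplicates");
    }
}
```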
Sven Wende, CEO SmartPLS GmbH
stefanbehrens
PLS Expert User
Posts: 54
Joined: Wed Oct 19, 2005 5:53 pm
Real name and title:

Post by stefanbehrens »

Hi Sven,

so this confirms my hypothesis about the randomization issue. Given today's computer power, I would propose to go with the "more random" way of generating the random numbers. However, if you wanted to put icing on the cake, you could make it optional somewhere in the settings ;-)

Looking forward to your next release.

All the best and thanks for the swift response,
Stefan
domusp
PLS Junior User
Posts: 4
Joined: Sun Oct 30, 2005 2:33 pm
Real name and title:

Post by domusp »

swende wrote:
I just took a look into our program code and really found a problem.

But the problems you described are not caused by a lack of "randomness".

In fact, there is a bug within the report viewer. When you calculate your model more than once using the same algorithm, the different report viewers work on cached algorithm results. That's why you always see the same results during Bootstrapping.

Hi Sven,
I also encountered the problem you are describing, though in a different context. When I ran several calculations with FIMIX and used a different number of segments for each run, I "obtained" identical results, or at least that's what was displayed. Is it possible that this is caused by the same cache problem?

Best,

Dominik.
swende
SmartPLS Developer
Posts: 111
Joined: Mon Sep 26, 2005 10:34 am
Real name and title: Dipl. Wi-Inf. Sven Wende
Location: Hamburg (Germany)

Post by swende »

Hi Dominik,

Yes. The caching bug I described can occur with any algorithm.

Sven
Sven Wende, CEO SmartPLS GmbH