# Algorithm for Generating a Random Grand Tour Sequence

Algorithm for Generating a Random Grand Tour Sequence

by Bob Day

1. Introduction
This post describes a method I came up with for generating a sequence for touring data blocks in a computer in random order.  The method assures that the tour will include the entire set of data blocks (a "grand tour") and that the set contains no internal cycles.  This problem occurred for me recently in conjunction with a program I wrote to determine the access latency of reads of random data blocks of memory in an array (see Section 2, below).  Due to the requirements of the program, the generation of the random block sequence had to be done in place, with no additional array storage.  No doubt this method has been discovered many times by many others, but I hadn't seen it (maybe I don't do enough reading), and I thought it was interesting.  Outside of algorithms that have their roots in mathematical theory, such as RSA encryption, it's one of the very few algorithms I've encountered in programming that wasn't intuitively obvious (at least to me) and required a mathematical proof.

2. Description of the Algorithm
a)  Allocate an array of N structures, "blocks", each of which is at least long enough to contain an index number.  Number the positions of the blocks in the array from 0 to N-1.  Think of the blocks as going in a row, from left to right.

b)  In each block, store its own position (index).  Thus block 0 will contain 0 (in addition to any other data), block 1 will contain 1, etc.

c)  Proceed block by block through the array for N-1 iterations of a loop, starting with block 0: At each block, select at random any of the blocks to its right, and swap the indexes that are stored in the blocks.  For example, at the 101st iteration, if block 100 contained 37 (due to an earlier swap), and the random block selected was 581, which still contained 581, their indexes would now be swapped so that block 100 would contain 581, and block 581, 37.

3. Proof
After I wrote the code for the above algorithm, I thought, "Well, yeah, but how do we know that they're no internal cycles?  Maybe, for example, something like this could happen: block 51 goes to block 11; block 11 goes to block 23, and block 23 goes back to block 51."  As it turns out, there can be no internal cycles.  The method generates a single "grand tour" that includes all of the blocks.  Here's the proof (it's not too hard, but it took me a while to think of it):

1.  After step a) in the preceding section, there are N cycles in the array — each block points to itself.

2.  The first iteration in step c) in the preceding section unites two blocks, block 0, and the random block chosen, into a single cycle.  There are now N-1 cycles in the array, and each of the N-1 blocks to the right of block 0 is in a different cycle. There are N-2 swaps remaining to be performed.

3.  In the second iteration in step c), block 1 and a random block to the right of block 1 are selected to have their indexes swapped.  These blocks are in different cycles (since all of the blocks to the right of block 0 are in different cycles), so swapping their indexes will unite two cycles into one.  There are now N-2 cycles in the array, and each of the N-2 blocks to the right of block 1 is in a different cycle. There are N-3 swaps remaining to be performed.

4.  With each iteration in step c) two blocks are selected that are in different cycles and their indexes are swapped, uniting two cycles into one, and reducing the number of cycles by one.  After iteration i (numbering i from 0), there are (N-i-1) cycles in the array; each of the (N-i-1) blocks to the right of the block at position i is in a different cycle, and there are (n-i-2) iterations remaining.­ All of the blocks to the right of the one we just swapped (that is, the one at position i) are all in different cycles.

5.  After the last iteration (where i equals N-2), there are (N-(N-2)-1) cycles in the array — which equals exactly one!  Voila!

2. Example Code

Here is the code I referred to in the Introduction that implements the algorithm.

//
// MemLatencyRand – A program to compute random access memory
// latency and block read times.
//
// Author: Bob Day
//
// Version 1.0
//
//
// This program may be used and distributed, with
// acknowledgment to the author, under the terms
// of the GNU General Public License.
//
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys\timeb.h>
#include <time.h>
#include <windows.h>
#include <winuser.h>
//
// Function Declarations
//
long BlockReads (long** ppMemBufL, char* pBlockBuf, long blockSizeB,
long numIterations, double* pdTime);
double Drand (long*, unsigned long*);
long Latency (long** ppMemBufL, char* pBlockBuf, long blockSizeB,
long numIterations, double* pdTime);
long NullLoop (long numIterations, double* pdTime);
unsigned long ran32 (void);
//
// Globals
//
long drand_param = -71268367;       // Initialize the parameter for Drand.
//
// Functions
//int main (int argc, char* argv[])
{
double  averageTime;
double  numIterationsD;
double  latencyTime;
double  nullLoopTime;
char*   pBlockBuf = NULL;
long**  ppMemBufL = NULL;
long**  ppMemBuf1L = NULL;
char*   pMemBufB = NULL;
long*   pTempL = NULL;
long**  ppTemp1L = NULL;
long**  ppTemp2L = NULL;
long    blockArraySizeB;
long    blockSizeB;
long    idx;
long    idxMax;
long    jdx;
long    memBufSizeB;
long    memBufSizeL;
long    memBufSizeMB;
long    numBlocks;
long    numIterations;
long    randIdx;
long    tempLong;
long    rc;

rc = 0;                                 // The default is success.
if (argc != 4
{
printf ("%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
"memLatencyRand – A program to compute random access\n",
"memory latency and block read times.\n",
"Author: Bob Day\n",
"Version 1.0\n\n",
"The inputs are a block size (blockSize) in bytes, a memory buffer\n",
"size (bufSize) in megabytes, and a number of iterations (numIter).\n",
"The program allocates an array of however many blocks, each of\n",
"blockSize bytes, that it takes to fill the memory buffer.  Then\n",
"the program determines the memory latency by looping numIter times\n",
"selecting blocks at random and reading the first (4 byte) longword\n",
"in each block.  From the time required to execute this loop, and\n",
"latency time in nanoseconds.  Then the program performs a similar\n",
"loop, but this time reading entire blocks each time, and from the\n",
"timing information it gathers, computes the average read time for\n",
"a randomly selected block.\n\n",
"Call format:\n",
"       memLatencyRand blockSize bufSize numIter\n",
"Where:\n",
"       blockSize is the block size in bytes,\n",
"       bufSize is the buffer size in megabytes, and\n",
"       numIter is the number of iterations to be performed,\n");

goto exit;
}

// Get the arguments…
blockSizeB = atol(argv[1]);             // Block size in bytes.
memBufSizeMB = atol(argv[2]);
memBufSizeB = memBufSizeMB * 1000000;   // Memory buffer size in bytes.
numIterations = atol(argv[3]);          // Number of iterations.
if (blockSizeB < 4)
{
printf ("%s\n", "The block size must be at least 4 bytes; exiting.");
goto exit;
}

if (blockSizeB > memBufSizeB)
{
printf ("%s\n", "The buffer size must at least the block size; exiting.");
goto exit;
}

printf ("%s%d%s\n%s%d%s\n%s%d%s\n\n",
"Block size: ", blockSizeB, " bytes.",
"Buffer size: ", memBufSizeMB, " megabytes.",
"Number of iterations: ", numIterations, ".");

// Initialize the pseudorandom number
// generator…
drand_param = time (0);                 // Generate a random number seed.
if (drand_param > 0)                    // If it's greater than zero,
drand_param = -drand_param;           // make it negative.
else if (drand_param == 0)              // If it's zero,
drand_param = -71268367;              // pick a negative number (anything).

memBufSizeL = memBufSizeB >> 2;         // Memory buffer size in longwords.
numBlocks = memBufSizeB / blockSizeB;   // The number of blocks.
blockArraySizeB = numBlocks * blockSizeB;// The total size of the array of blocks.
ppMemBufL = (long**)malloc ((blockArraySizeB + 3) & ~3); // Interpret the buffer as longs. // (Make sure the buffer is an integral number of longwords.)

if (ppMemBufL == NULL)
{
printf ("%s\n", "Error allocating main buffer; exiting.");
goto exit;
}

pBlockBuf = (char*)malloc (blockSizeB); // Allocate a buffer to do the RAM reads into.
if (pBlockBuf == NULL)
{
printf ("%s\n", "Error allocating block buffer; exiting.");
goto exit;
}

pMemBufB = (char*)ppMemBufL;            // We'll use this later.
// Initialize the buffer…
tempLong = 0;
for (idx = 0; idx < numBlocks; ++idx)   // 1. At the beginning of each block,
{                                     //    write its index.
pTempL = (long*)(pMemBufB + tempLong);
*pTempL = idx;
tempLong += blockSizeB;
}

idxMax = numBlocks – 1;
for (idx = 0, jdx = 1; idx < idxMax; ++idx, ++jdx)
{                                     // 2. Shuffle the indexes…
// (Note: This method of shuffling produces a single loop that includes all of the blocks.)
randIdx = (long)((double)(numBlocks – jdx) * Drand (&drand_param, NULL));

// Pick a number in the range [0, numBlocks-jdx-1].
// Swap it with the current item:
ppTemp1L = (long**)(pMemBufB + idx * blockSizeB);
ppTemp2L = (long**)(pMemBufB + (randIdx + jdx) * blockSizeB);
pTempL = *ppTemp1L;
*ppTemp1L = *ppTemp2L;
*ppTemp2L = pTempL;
}

tempLong = 0;                           // 3. Turn the indexes into actual addresses…
for (idx = 0; idx < numBlocks; ++idx)
{
ppTemp1L = (long**)(pMemBufB + tempLong);
*ppTemp1L = (long*)(pMemBufB + (__int64)(*ppTemp1L) * blockSizeB);
tempLong += blockSizeB;
}

rc = NullLoop (numIterations, &nullLoopTime);
if (rc != 0)                            // If error…
{
printf ("%s\n", "Error computing loop overhead; exiting.");
goto exit;
}

rc = Latency (ppMemBufL, pBlockBuf, blockSizeB, numIterations, &latencyTime);
if (rc != 0)                            // If error…
{
printf ("%s\n", "Error computing latency; exiting.");
goto exit;
}

numIterationsD = (double)numIterations;
averageTime = ((latencyTime – nullLoopTime) / numIterationsD) * 1e9;
printf ("%s%s%f%s\n\n",
"(single 4 byte integer read per block): ",
averageTime,
" nanoseconds.");

if (rc != 0)                            // If error…
{
printf ("%s\n", "Error doing block reads; exiting.");
goto exit;
}

averageTime = ((blockReadTime – nullLoopTime) / numIterationsD) * 1e9;
printf ("%s%f%s\n", "Random block read time: ", averageTime, " nanoseconds.");

exit:
if (ppMemBufL != NULL) free (ppMemBufL);
if (pBlockBuf != NULL) free (pBlockBuf);
printf ("%s", "\n");
// printf ("%s", "\nPress any key to exit:");
// getchar();
// Sleep (INFINITE);

if (rc != 0) rc = 0;
return rc;
}

long Latency (long** ppMemBufL, char* pBlockBuf, long blockSizeB,
long numIterations, double* pdTime)
{
double  dTime;
HANDLE  hProcess = NULL;
long    idx;
long    rc;
long    t1;
long    t2;
rc = 0;                                 // The default is success.
*pdTime = (double)0.0;
hProcess = GetCurrentProcess ();
rc = SetPriorityClass (hProcess, REALTIME_PRIORITY_CLASS);
// rc = SetPriorityClass (hProcess, HIGH_PRIORITY_CLASS);
if (rc == 0)
{
printf ("%s\n", "Error setting priority class.");
rc = -1;
goto exit;
}

if (rc == 0)
{
printf ("%s\n", "Error setting thread priority.");
rc = -1;
goto exit;
}

t1 = timeGetTime();                     // Get the starting time in milliseconds.
for (idx = 0; idx < numIterations; ++idx)
{
ppMemBufL = (long**)(*ppMemBufL);
// memcpy (pBlockBuf, (char*)ppMemBufL + 4, blockSizeB – 4);
}
t2 = timeGetTime();                     // Get the ending time in milliseconds.
rc = SetPriorityClass (hProcess, NORMAL_PRIORITY_CLASS);
if (rc == 0)
{
printf ("%s\n", "Error setting priority class.");
rc = -1;
goto exit;
}

if (rc == 0)
{
printf ("%s\n", "Error setting thread priority.");
rc = -1;
goto exit;
}

if ((long)(__int64)ppMemBufL == NULL)   // Do Something with ppMemBufL to make
{                                     // sure the compiler won't optimize it
rc = -1;                              // out.  (If ppMemBufL is NULL, it's
goto exit;                            // definitely an error!)
}

dTime = ((double)(t2 – t1)) / (double)1000.0;
// printf ("Latency time (including loop overhead): %f seconds.\n", dTime);
// Display the time in seconds.
*pdTime = dTime;                        // Return the time.
rc = 0;                                 // If we get here, we were successful.
exit:
return rc;
} // End of Latency

long NullLoop (long numIterations, double* pdTime)
{
double  dTime;
HANDLE  hProcess = NULL;
long    rc;
long    t1;
long    t2;
rc = 0;                                 // The default is success.
*pdTime = (double)0.0;
hProcess = GetCurrentProcess ();
rc = SetPriorityClass (hProcess, REALTIME_PRIORITY_CLASS);
// rc = SetPriorityClass (hProcess, HIGH_PRIORITY_CLASS);
if (rc == 0)
{
printf ("%s\n", "Error setting priority class.");
rc = -1;
goto exit;
}

if (rc == 0)
{
printf ("%s\n", "Error setting thread priority.");
rc = -1;
goto exit;
}

t1 = timeGetTime();                     // Get the starting time in
// milliseconds.
__asm
{
push    eax
mov     eax, numIterations
test    eax, eax
jle L2
L1:   dec     eax
jne L1
L2:   pop     eax
}

t2 = timeGetTime();                     // Get the ending time in milliseconds.
rc = SetPriorityClass (hProcess, NORMAL_PRIORITY_CLASS);
if (rc == 0)
{
printf ("%s\n", "Error setting priority class.");
rc = -1;
goto exit;
}

if (rc == 0)
{
printf ("%s\n", "Error setting thread priority.");
rc = -1;
goto exit;
}

dTime = ((double)(t2 – t1)) / (double)1000.0;
// printf ("Loop overhead time = %f seconds.\n", dTime);
// Display the time in seconds.
*pdTime = dTime;                        // Return the time.
rc = 0;                                 // If we get here, we were successful.
exit:
return rc;
} // End of NullLoop.

long BlockReads (long** ppMemBufL, char* pBlockBuf, long blockSizeB,
long numIterations, double* pdTime)
{
double  dTime;
HANDLE  hProcess = NULL;
long    idx;
long    rc;
long    t1;
long    t2;

rc = 0;                                 // The default is success.
*pdTime = (double)0.0;
hProcess = GetCurrentProcess ();
rc = SetPriorityClass (hProcess, REALTIME_PRIORITY_CLASS);
// rc = SetPriorityClass (hProcess, HIGH_PRIORITY_CLASS);
if (rc == 0)
{
printf ("%s\n", "Error setting priority class.");
rc = -1;
goto exit;
}

if (rc == 0)
{
printf ("%s\n", "Error setting thread priority.");
rc = -1;
goto exit;
}

t1 = timeGetTime();                     // Get the starting time in milliseconds.
for (idx = 0; idx < numIterations; ++idx)
{
ppMemBufL = (long**)(*ppMemBufL);
memcpy (pBlockBuf, (char*)ppMemBufL + 4, blockSizeB – 4);
}

t2 = timeGetTime();                     // Get the ending time in milliseconds.
rc = SetPriorityClass (hProcess, NORMAL_PRIORITY_CLASS);
if (rc == 0)
{
printf ("%s\n", "Error setting priority class.");
rc = -1;
goto exit;
}

if (rc == 0)
{
printf ("%s\n", "Error setting thread priority.");
rc = -1;
goto exit;
}

if ((long)(__int64)ppMemBufL == NULL)   // Do Something with ppMemBufL to make
{                                     // sure the compiler won't optimize it
rc = -1;                              // out.  (If ppMemBufL is NULL, it's
goto exit;                            // definitely an error!)
}

dTime = ((double)(t2 – t1)) / (double)1000.0;
// Display the time in seconds.
*pdTime = dTime;                        // Return the time.
rc = 0;                                 // If we get here, we were successful.

exit:
return rc;

/******************************************************************************/

/*                                                                            */

/* FUNCTION: ran32                                                            */
/*                                                                            */
/* DESCRIPTION:                                                               */
/*   Returns a 32 bit pseudorandom number.                                    */
/*                                                                            */
/******************************************************************************/
unsigned long ran32 (void)
{
unsigned long   rand1;
unsigned long   rand2;
Drand (&drand_param, &rand1);       /* Get a 30 bit random number, and        */
Drand (&drand_param, &rand2);       /* another one.                           */
return ((rand1 & 0x00FFFF00) << 8) | ((rand2 & 0x00FFFF00) >> 8);
}
/******************************************************************************/
/*                                                                            */
/* End of ran32                                                               */
/*                                                                            */
/******************************************************************************/

/******************************************************************************/
/*                                                                            */
/* FUNCTION: Drand                                                            */
/*                                                                            */
/* DESCRIPTION:                                                               */
/*   Long period random number generator of L'Ecuyer with Bays-Durham shuffle */
/*   and added safeguards.  Returns a 31 bit uniform random value.  Call with */
/*   idum a negative integer to initialize; thereafter, do not alter idum     */
/*   between successive deviates in a sequence.                               */
/*                                                                            */
/* SOURCE:                                                                    */
/*   Numerical Recipes in C, Second Edition, 1992                             */
/*                                                                            */
/******************************************************************************/
/*                                                                            */
/* Symbol Definitions                                                         */
/*                                                                            */
#define IM1 2147483563
#define IM2 2147483399
#define AM  (1.0/IM1)
#define IMM1    (IM1-1)
#define IA1 40014
#define IA2 40692
#define IQ1 53668
#define IQ2 52774
#define IR1 12211
#define IR2 3791
#define NTAB    32
#define NDIV    (1+IMM1/NTAB)
/*                                                                            */
/* Start of Code                                                              */
/*                                                                            */
double Drand (long *idum, unsigned long* pIrand)
{
double      randNum;
static long idum2 = 123456789;
static long iy = 0;
static long iv[NTAB];
long        j;
long        k;

if (*idum <= 0)                         /* If we're initializing…           */
{
*idum = (*idum == 0) ? 1 : -(*idum);  /* (Be sure to prevent idum == 0).    */
idum2 = *idum;
for (j = NTAB+7; j >= 0; –j)         /* Load the shuffle table (after 8    */
{                                   /* warm-ups).                         */
k = (*idum)/IQ1;
*idum = IA1 * (*idum – k*IQ1) – k*IR1;
if (*idum < 0) *idum += IM1;
if (j < NTAB) iv[j] = *idum;
}

iy = iv[0];
}

k = (*idum)/IQ1;
*idum = IA1 * (*idum – k*IQ1) – k*IR1;  /* Compute idum = (IA1 * idum) % IM1  */
if (*idum < 0) *idum += IM1;            /* without overflows by Schrage's     */
/* Method.                            */
k = idum2/IQ2;
idum2 = IA2 * (idum2 – k*IQ2) – k*IR2;  /* Compute idum2 = (IA2 * idum) % IM2 */
if (idum2 < 0) idum2 += IM2;            /* likewise.                          */
j = iy/NDIV;                            /* Will be in the range 0..NTAB-1.    */
iy = iv[j] – idum2;                     /* Here idum is shuffled; idum and    */
iv[j] = *idum;                          /* idum2 are combined to generate     */
/* output.                            */
if (iy < 0) iy += IMM1;
if (pIrand != NULL) *pIrand = iy;
randNum = ((double)(iy & 0x1FFFFFFF))/(double)536870912.0;
if ((randNum < (double)0.0) || (randNum >= (double)1.0))
{
printf ("Value out of range in Drand function (%f).\n", randNum);
}

return randNum;                         /* Return R in range 0.0 <= R < 1.0   */
}
/******************************************************************************/
/*                                                                            */
/* End of Drand                                                               */
/*                                                                            */
/******************************************************************************/