# I need a Python function that will output a random string of 4 different characters when given the desired probabilities of the characters

For example, the function could be something like `def RandABCD(n, .25, .34, .25, .25):`

Where n is the length of the string to be generated and the following numbers are the desired probabilities of A, B, C, D.

I would imagine this is quite simple; however, I am having trouble creating a working program. Any help would be greatly appreciated.

## 13 thoughts on “I need a Python function that will output a random string of 4 different characters when given the desired probabilities of the characters”

1. user

Please post the code you have so far. It's much easier to help you when we see what you've tried.

2. user

In the function signature you gave as an example, how would the function know which 4 characters it needs to weigh the probabilities against?

3. user

Well, the first thing I see is that your probabilities don't add up to 1.

4. user

.25, .34, .25, .25 — these don't add up to 1 ;-P

5. user

Out of curiosity, are the letters 'A', 'C', 'G', and 'T' (DNA)?

6. user

You're probably looking for (.22, .34, .22, .22) as your probabilities.

7. user

The `random` module is quite powerful in Python. You can generate a list with the desired characters at the appropriate weights and then use `random.choice` to obtain a selection.

First, make sure you do an `import random`.

For example, let’s say you wanted a truly random string from A, B, C, or D.

1. Generate a list with the characters:
   `li = ['A', 'B', 'C', 'D']`

2. Then obtain values from it using `random.choice`:
   `output = "".join([random.choice(li) for i in range(0, n)])`

You could easily make that a function with n as a parameter.

In the above case, you have an equal chance of getting A,B,C, or D.

You can use duplicate entries in the list to give characters higher probabilities. So, for example, let’s say you wanted a 50% chance of A and a 25% chance each of B and C. You could have a list like this:

`li = ['A', 'A', 'B', 'C']`

And so on.

It would not be hard to parameterize the characters and their desired weights; to model that, I’d use a dictionary.

`characterbasis = {'A': 25, 'B': 25, 'C': 25, 'D': 25}`

Make that the first parameter, with the length of the string as the second, build the list from the dictionary, and use the above code to generate your string.
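
A minimal sketch of the function described above, assuming the dictionary holds non-negative integer weights. The name `weighted_string` is made up for illustration and is not from the original answer:

```python
import random

def weighted_string(character_basis, n):
    """Return n characters drawn from a pool built by repeating each key.

    character_basis maps a character to a non-negative integer weight.
    Each character appears in the pool that many times, so its chance of
    being picked is weight / sum(weights).
    """
    pool = [ch for ch, weight in character_basis.items() for _ in range(weight)]
    return "".join(random.choice(pool) for _ in range(n))

print(weighted_string({'A': 25, 'B': 25, 'C': 25, 'D': 25}, 20))
```

Note that the pool grows with the size of the weights, so this only stays practical when the weights are small integers.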

8. user

For four letters, here’s something quick off the top of my head:

```python
from random import random

def randABCD(n, pA, pB, pC, pD):
    # assumes pA + pB + pC + pD == 1
    # cumulative thresholds that split [0, 1) into four intervals
    cA = pA
    cB = cA + pB
    cC = cB + pC

    def choose():
        r = random()
        if r < cA:
            return 'A'
        elif r < cB:
            return 'B'
        elif r < cC:
            return 'C'
        else:
            return 'D'

    return ''.join([choose() for i in range(n)])
```

I have no doubt that this can be made much cleaner/shorter, I’m just in a bit of a hurry right now.

The reason I wouldn’t be content with David in Dakota’s answer of using a list of duplicate characters is that depending on your probabilities, it may not be possible to create a list with duplicates in the right numbers to simulate the probabilities you want. (Well, I guess it might always be possible, but you might wind up needing a huge list – what if your probabilities were 0.11235442079, 0.4072777384, 0.2297927874, 0.25057505341?)
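
To put a number on how big that list would have to be, here is a rough sketch. It assumes the probabilities are given as decimal strings so they can be converted exactly, and `min_pool_size` is a hypothetical helper name. The smallest list that reproduces the probabilities exactly has as many entries as the least common multiple of the reduced denominators:

```python
from fractions import Fraction
from math import gcd

def min_pool_size(probabilities):
    """Smallest list length that reproduces the probabilities exactly.

    Each probability is a decimal string, which Fraction converts without
    binary floating-point error. The result is the least common multiple
    of the reduced denominators.
    """
    size = 1
    for p in probabilities:
        d = Fraction(p).denominator
        size = size * d // gcd(size, d)
    return size

print(min_pool_size(["0.5", "0.25", "0.25"]))
print(min_pool_size(["0.11235442079", "0.4072777384",
                     "0.2297927874", "0.25057505341"]))
```

For the 50/25/25 case this gives 4, matching the `['A', 'A', 'B', 'C']` list. For the four long decimals it gives 10**11, which is far too large to build as a list in practice.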

EDIT: here’s a much cleaner generic version that works with any number of letters with any weights:

```python
from bisect import bisect
from random import uniform

def rand_string(n, content):
    ''' Creates a string of letters (or substrings) chosen independently
    with specified probabilities. content is a dictionary mapping
    a substring to its "weight" which is proportional to its probability,
    and n is the desired number of elements in the string.

    This does not assume the sum of the weights is 1.'''
    # split the dictionary into parallel tuples of substrings and weights
    l, cdf = zip(*[(l, w) for l, w in content.items()])
    # turn the weights into running totals (an unnormalized CDF)
    cdf = list(cdf)
    for i in range(1, len(cdf)):
        cdf[i] += cdf[i - 1]
    return ''.join([l[bisect(cdf, uniform(0, cdf[-1]))] for i in range(n)])
```
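
On Python 3.6 or later, the standard library's `random.choices` accepts relative weights directly, so the same idea can probably be sketched without building the cumulative list by hand. The wrapper name `rand_string_choices` is made up for this example:

```python
import random

def rand_string_choices(n, content):
    """Build a string of n items drawn independently from content's keys.

    content maps each substring to a relative weight; the weights do not
    need to sum to 1.
    """
    items = list(content)
    weights = [content[item] for item in items]
    return "".join(random.choices(items, weights=weights, k=n))

print(rand_string_choices(10, {"A": 0.22, "B": 0.34, "C": 0.22, "D": 0.22}))
```
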
9. user

Here is a rough idea of what might suit you

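```python
import random

def distributed_choice(probs):
    # draw once, then walk the cumulative probabilities until we pass the draw
    r = random.random()
    cum = 0.0

    for obj, prob in probs:
        cum += prob
        if r < cum:
            return obj
    # only reached if round-off leaves the total slightly below 1
    return probs[-1][0]
```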

The parameter `probs` takes a list of pairs of the form (object, probability). It is assumed that the sum of probabilities is 1 (otherwise, it's trivial to normalize).

To use it just execute:

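```python
# call the function once per character; multiplying the list would repeat a single pick
''.join(distributed_choice(probs) for _ in range(4))
```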
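
The normalization mentioned above could look something like this. It is only a sketch, and `normalize` is a name chosen here for illustration:

```python
def normalize(probs):
    """Rescale (object, weight) pairs so that the weights sum to 1."""
    total = sum(weight for _, weight in probs)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [(obj, weight / total) for obj, weight in probs]

# the weights from the question sum to 1.09, so they need rescaling
pairs = normalize([('A', .25), ('B', .34), ('C', .25), ('D', .25)])
print(pairs)
```
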
10. user

Here’s the code to select a single weighted value. You should be able to take it from here. It uses bisect and random to accomplish the work.

```python
from bisect import bisect
from random import random

def WeightedABCD(*weights):
    chars = 'ABCD'
    # running totals of the weights mark where each letter's interval ends
    breakpoints = [sum(weights[:x+1]) for x in range(4)]
    return chars[bisect(breakpoints, random())]
```

Call it like this: `WeightedABCD(.25, .34, .25, .25)`.

EDIT: Here is a version that works even if the weights don’t add up to 1.0:

```python
from bisect import bisect_left
from random import uniform

def WeightedABCD(*weights):
    chars = 'ABCD'
    breakpoints = [sum(weights[:x+1]) for x in range(4)]
    # scale the draw to the total weight instead of assuming it is 1.0
    return chars[bisect_left(breakpoints, uniform(0.0, breakpoints[-1]))]
```
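
To go from a single weighted pick to a full string, and to check that the output roughly matches the requested weights, something like the following might work. It is a sketch that assumes exactly four weights; `weighted_string_abcd` and `observed_frequencies` are hypothetical helper names, and `itertools.accumulate` stands in for the repeated `sum` calls:

```python
from bisect import bisect
from collections import Counter
from itertools import accumulate
from random import random

def weighted_string_abcd(n, *weights):
    """Return n letters from 'ABCD', picked with the given relative weights."""
    chars = 'ABCD'
    breakpoints = list(accumulate(weights))
    total = breakpoints[-1]
    # random() * total lies in [0, total), so the index should stay within chars
    return ''.join(chars[bisect(breakpoints, random() * total)] for _ in range(n))

def observed_frequencies(s):
    """Map each character in s to its share of the string."""
    counts = Counter(s)
    return {ch: counts[ch] / len(s) for ch in sorted(counts)}

sample = weighted_string_abcd(100000, .22, .34, .22, .22)
print(observed_frequencies(sample))
```

With a sample this large, the printed shares should land close to the requested weights, though they will differ a little from run to run.
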
11. user

Hmm, something like:

```python
import random

class RandomDistribution:
    def __init__(self, kv):
        self.entries = list(kv.keys())
        # where[i] is the cumulative weight before entry i; where[-1] is the total
        self.where = []
        cnt = 0
        for x in self.entries:
            self.where.append(cnt)
            cnt += kv[x]
        self.where.append(cnt)

    def find(self, key):
        # binary search for the entry whose interval contains key
        l, r = 0, len(self.where) - 1
        while l + 1 < r:
            m = (l + r) // 2
            if self.where[m] <= key:
                l = m
            else:
                r = m
        return self.entries[l]

    def randomselect(self):
        return self.find(random.random() * self.where[-1])

rd = RandomDistribution({"foo": 5.5, "bar": 3.14, "baz": 2.8})
for x in range(1000):
    print(rd.randomselect())
```

should get you most of the way…

12. user

Thank you all for your help, I was able to figure something out, mostly with this info.
For my particular need, I did something like this:

```python
import random

# Create a function to randomize a given string
def makerandom(seq):
    return ''.join(random.sample(seq, len(seq)))

def randomDNA(n, probA=0.25, probC=0.25, probG=0.25, probT=0.25):
    notrandom = ''
    A = int(n * probA)
    C = int(n * probC)
    T = int(n * probT)
    G = int(n * probG)

    # The remainder part here is used to make sure all n are used, as one cannot
    # have half an A for example.
    remainder = ''
    for i in range(0, n - (A + G + C + T)):
        remainder += random.choice("ATGC")
    notrandom = notrandom + 'A'*A + 'C'*C + 'G'*G + 'T'*T + remainder
    return makerandom(notrandom)
```