I need a Python Function that will output a random string of 4 different characters when given the desired probabilites of the characters

For example,
The function could be something like def RandABCD(n, .25, .34, .25, .25):

Where n is the length of the string to be generated and the following numbers are the desired probabilities of A, B, C, D.

I would imagine this is quite simple, however i am having trouble creating a working program. Any help would be greatly appreciated.

13 thoughts on “I need a Python Function that will output a random string of 4 different characters when given the desired probabilites of the characters

  1. user

    in the function signature you put as an example how would the function know what 4 characters it needs to weigh the probabilities against?

  2. user

    The random class is quite powerful in python. You can generate a list with the characters desired at the appropriate weights and then use random.choice to obtain a selection.

    First, make sure you do an import random.

    For example, let’s say you wanted a truly random string from A,B,C, or D.
    1. Generate a list with the characters
    li = [‘A’,’B’,’C’,’D’]

    1. Then obtain values from it using random.choice
      output = “”.join([random.choice(li) for i in range(0, n)])

    You could easily make that a function with n as a parameter.

    In the above case, you have an equal chance of getting A,B,C, or D.

    You can use duplicate entries in the list to give characters higher probabilities. So, for example, let’s say you wanted a 50% chance of A and 25% chances of B and C respectively. You could have an array like this:

    li = [‘A’,’A’,’B’,’C’]

    And so on.

    It would not be hard to parameterize the characters coming in with desired weights, to model that I’d use a dictionary.

    characterbasis = {‘A’:25, ‘B’:25, ‘C’:25, ‘D’:25}

    Make that the first parameter, and the second being the length of the string and use the above code to generate your string.

  3. user

    For four letters, here’s something quick off the top of my head:

    from random import random
    def randABCD(n, pA, pB, pC, pD):
        # assumes pA + pB + pC + pD == 1
        cA = pA
        cB = cA + pB
        cC = cB + pC
        def choose():
            r = random()
            if r < cA:
               return 'A'
            elif r < cB:
               return 'B'
            elif r < cC:
               return 'C'
               return 'D'
        return ''.join([choose() for i in xrange(n)])

    I have no doubt that this can be made much cleaner/shorter, I’m just in a bit of a hurry right now.

    The reason I wouldn’t be content with David in Dakota’s answer of using a list of duplicate characters is that depending on your probabilities, it may not be possible to create a list with duplicates in the right numbers to simulate the probabilities you want. (Well, I guess it might always be possible, but you might wind up needing a huge list – what if your probabilities were 0.11235442079, 0.4072777384, 0.2297927874, 0.25057505341?)

    EDIT: here’s a much cleaner generic version that works with any number of letters with any weights:

    from bisect import bisect
    from random import uniform
    def rand_string(n, content):
        ''' Creates a string of letters (or substrings) chosen independently
            with specified probabilities. content is a dictionary mapping
            a substring to its "weight" which is proportional to its probability,
            and n is the desired number of elements in the string.
            This does not assume the sum of the weights is 1.'''
        l, cdf = zip(*[(l, w) for l, w in content.iteritems()])
        cdf = list(cdf)
        for i in xrange(1, len(cdf)):
            cdf[i] += cdf[i - 1]
        return ''.join([l[bisect(cdf, uniform(0, cdf[-1]))] for i in xrange(n)])        
  4. user

    Here is a rough idea of what might suit you

    import random as r
    def distributed_choice(probs):
        r= r.random()
        cum = 0.0
        for pair in probs:
            if (r < cum + pair[1]):
                return pair[0]          
            cum += pair[1]

    The parameter probs takes a list of pairs of the form (object, probability). It is assumed that the sum of probabilities is 1 (otherwise, its trivial to normalize).

    To use it just execute:

  5. user

    Here’s the code to select a single weighted value. You should be able to take it from here. It uses bisect and random to accomplish the work.

    from bisect import bisect
    from random import random
    def WeightedABCD(*weights):
      chars = 'ABCD'
      breakpoints = [sum(weights[:x+1]) for x in range(4)]
      return chars[bisect(breakpoints, random())]

    Call it like this: WeightedABCD(.25, .34, .25, .25).

    EDIT: Here is a version that works even if the weights don’t add up to 1.0:

    from bisect import bisect_left
    from random import uniform
    def WeightedABCD(*weights):
      chars = 'ABCD'
      breakpoints = [sum(weights[:x+1]) for x in range(4)]
      return chars[bisect_left(breakpoints, uniform(0.0,breakpoints[-1]))]
  6. user

    Hmm, something like:

    import random
    class RandomDistribution:
        def __init__(self, kv):
            self.entries = kv.keys()
            self.where = []
            cnt = 0
            for x in self.entries:
                cnt += kv[x]
        def find(self, key):
            l, r = 0, len(self.where)-1
            while l+1 < r:
               m = (l+r)/2
               if self.where[m] <= key:
            return self.entries[l]
        def randomselect(self):
            return self.find(random.random()*self.where[-1])
    rd = RandomDistribution( {"foo": 5.5, "bar": 3.14, "baz": 2.8 } )
    for x in range(1000):
        print rd.randomselect()

    should get you most of the way…

  7. user

    Thank you all for your help, I was able to figure something out, mostly with this info.
    For my particular need, I did something like this:

    import random
    #Create a function to randomize a given string
    def makerandom(seq):
        return ''.join(random.sample(seq, len(seq)))
    def randomDNA(n, probA=0.25, probC=0.25, probG=0.25, probT=0.25):
    #The remainder part here is used to make sure all n are used, as one cannot
    #have half an A for example.
        for i in range(0, n-(A+G+C+T)):
        notrandom=notrandom+ 'A'*A+ 'C'*C+ 'G'*G+ 'T'*T + remainder
        return makerandom(notrandom)

Leave a Reply

Your email address will not be published.