For example,
The function could be something like def RandABCD(n, .25, .34, .25, .25):
Where n is the length of the string to be generated and the following numbers are the desired probabilities of A, B, C, D.
I would imagine this is quite simple; however, I am having trouble creating a working program. Any help would be greatly appreciated.
Please post the code you have so far. It's much easier to help you when we see what you've tried.
Homework smell detected.
In the function signature you gave as an example, how would the function know which 4 characters it needs to weigh the probabilities against?
Well, the first thing I see is that your probabilities don't add up to 1.
.25, .34, .25, .25 — these don't add up to 1 ;-P
Out of curiosity, are the letters 'A', 'C', 'G', and 'T' (DNA)?
You're probably looking for (.22, .34, .22, .22) as your probabilities.
The random module is quite powerful in Python. You can generate a list with the desired characters at the appropriate weights and then use random.choice to obtain a selection.
First, make sure you do an import random.
For example, let’s say you wanted a truly random string from A,B,C, or D.
1. Generate a list with the characters
li = ['A', 'B', 'C', 'D']
output = "".join([random.choice(li) for i in range(0, n)])
You could easily make that a function with n as a parameter.
In the above case, you have an equal chance of getting A,B,C, or D.
You can use duplicate entries in the list to give characters higher probabilities. So, for example, let’s say you wanted a 50% chance of A and 25% chances of B and C respectively. You could have an array like this:
li = ['A', 'A', 'B', 'C']
And so on.
It would not be hard to parameterize the characters and their desired weights; to model that, I’d use a dictionary.
characterbasis = {'A': 25, 'B': 25, 'C': 25, 'D': 25}
Make that the first parameter and the length of the string the second, then use the above code to generate your string.
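A minimal sketch of that idea, assuming the dictionary values are whole-number weights (the function name random_string is hypothetical, not from the answer above):

```python
import random

def random_string(character_basis, n):
    # Expand the integer weights into a list with duplicate entries,
    # e.g. {'A': 2, 'B': 1} -> ['A', 'A', 'B'].
    pool = [ch for ch, weight in character_basis.items() for _ in range(weight)]
    # Each draw is uniform over the pool, so duplicates raise a character's odds.
    return "".join(random.choice(pool) for _ in range(n))

print(random_string({'A': 25, 'B': 25, 'C': 25, 'D': 25}, 20))
```

Note that the pool grows with the size of the weights, so this only suits weights that reduce to small integers.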
For four letters, here’s something quick off the top of my head:
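One way a four-letter version could look, as a sketch only: it compares a single uniform draw against running thresholds. The name rand_abcd and the parameter names are assumptions chosen to mirror the signature in the question.

```python
import random

def rand_abcd(n, p_a, p_b, p_c, p_d):
    # p_d is implied by the other three; it is accepted only to
    # mirror the signature from the question.
    letters = []
    for _ in range(n):
        r = random.random()  # uniform in [0.0, 1.0)
        if r < p_a:
            letters.append('A')
        elif r < p_a + p_b:
            letters.append('B')
        elif r < p_a + p_b + p_c:
            letters.append('C')
        else:
            letters.append('D')
    return "".join(letters)

print(rand_abcd(20, .22, .34, .22, .22))
```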
I have no doubt that this can be made much cleaner/shorter, I’m just in a bit of a hurry right now.
The reason I wouldn’t be content with David in Dakota’s answer of using a list of duplicate characters is that depending on your probabilities, it may not be possible to create a list with duplicates in the right numbers to simulate the probabilities you want. (Well, I guess it might always be possible, but you might wind up needing a huge list – what if your probabilities were 0.11235442079, 0.4072777384, 0.2297927874, 0.25057505341?)
EDIT: here’s a much cleaner generic version that works with any number of letters with any weights:
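For what it's worth, on Python 3.6 and later the standard library handles any number of letters and weights directly through random.choices. This is a swapped-in sketch using that call, not necessarily the approach this answer took; rand_string is a hypothetical name.

```python
import random

def rand_string(n, letters, weights):
    # random.choices draws k items with replacement; the weights are
    # relative, so they do not have to sum to 1.
    return "".join(random.choices(letters, weights=weights, k=n))

print(rand_string(20, "ABCD", [0.22, 0.34, 0.22, 0.22]))
```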
Here is a rough idea of what might suit you
The parameter probs takes a list of pairs of the form (object, probability). It is assumed that the sum of probabilities is 1 (otherwise, it’s trivial to normalize). To use it just execute:
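A minimal sketch of a function taking such a list of pairs, together with a call. The names weighted_pick and random_string are hypothetical, and the probabilities are assumed to sum to 1.

```python
import random

def weighted_pick(probs):
    # probs: list of (object, probability) pairs, assumed to sum to 1.
    r = random.random()
    for obj, p in probs:
        if r < p:
            return obj
        r -= p  # move on to the next slice of the [0, 1) interval
    # Floating-point rounding can leave r marginally past the last slice.
    return probs[-1][0]

def random_string(n, probs):
    return "".join(weighted_pick(probs) for _ in range(n))

print(random_string(20, [('A', 0.22), ('B', 0.34), ('C', 0.22), ('D', 0.22)]))
```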
Here’s the code to select a single weighted value. You should be able to take it from here. It uses bisect and random to accomplish the work.
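A sketch of how a bisect-based selection might look, assuming the four weights sum to roughly 1.0. The body is an illustration built on the standard bisect and itertools modules, not the original snippet.

```python
import bisect
import itertools
import random

def WeightedABCD(a, b, c, d):
    # Running totals, e.g. (.25, .25, .25, .25) -> [0.25, 0.5, 0.75, 1.0].
    cumulative = list(itertools.accumulate([a, b, c, d]))
    # Index of the first running total strictly greater than the draw.
    index = bisect.bisect_right(cumulative, random.random())
    # Clamp in case the totals fall slightly short of 1.0.
    return "ABCD"[min(index, 3)]
```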
Call it like this:
WeightedABCD(.25, .34, .25, .25)
EDIT: Here is a version that works even if the weights don’t add up to 1.0:
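A sketch of one way to do that: scale the random draw by the total weight instead of normalizing the weights. The name weighted_letter and its parameters are hypothetical.

```python
import bisect
import itertools
import random

def weighted_letter(letters, weights):
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    # Scale the draw to [0, total) so the weights need not sum to 1.0.
    r = random.random() * total
    index = bisect.bisect_right(cumulative, r)
    # Clamp to guard against rounding at the very top of the range.
    return letters[min(index, len(letters) - 1)]

print("".join(weighted_letter("ABCD", [.25, .34, .25, .25]) for _ in range(20)))
```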
Hmm, something like:
should get you most of the way…
Thank you all for your help, I was able to figure something out, mostly with this info.
For my particular need, I did something like this: