[Python 3] Simple Markov Chain Generator

tom_mai78101

The Helper Connoisseur / Ex-MineCraft Host
Staff member
Reaction score
1,259
One-time generator

Code:
import random

class MarkovChain:
    def __init__(self):
        self.textData = []
        self.markov = {}  # maps each word to the list of words seen right after it

    def parseWords(self, words):
        self.textData = words

    def parseTextFile(self, textFilePath):
        with open(textFilePath) as file:
            self.textData = file.read()
      
    def generate(self):
        self.textData = [i.lower() for i in self.textData.split(" ") if i.isalpha()]
        self.markov = {i:[] for i in self.textData}
        for before, after in zip(self.textData, self.textData[1:]):
            self.markov[before].append(after)
        new = list(self.markov.keys())
        seed = random.randrange(len(new))
        currentWord = random.choice(new)
        sentence = [currentWord]
        for i in range(0, random.randrange(15, 30)):
            check = self.markov[currentWord]
            if (len(check) > 0):
                nextWord = random.choice(check)
                sentence.append(nextWord)
                currentWord = nextWord
            else:
                currentWord = random.choice(new)
        return " ".join(sentence)

def main():
    m = MarkovChain()
    m.parseTextFile("test.txt")
    print(m.generate())


if __name__ == "__main__":
    main()

#
# The text test file, "test.txt", contains the following:
#
#         That's where "with a cable" comes in. You secure it to something that'd require them to have tools to cut the cable, which is
#         slightly less likely. Most burglaries are fairly quick, because they make noise and every minute spent inside gets the burglar
#         a minute closer to getting caught. The kind of burglar who doesn't get caught tries to spend less than 3 minutes on-premises.
#         Bringing along a toolbox increases noise, decreases agility, and makes it harder to carry fenceable items away. So they don't
#         tend to have a nice pair of bolt cutters unless they're stupid or know in advance something valuable requires them. It's not
#         foolproof, but it's a way to increase the odds the burglar won't be able to steal some things you'd really rather them not
#         steal. Sort of like putting locks on the door. Some "burglars" try every knob they see and are more than happy to enter an
#         unlocked car/residence. But others are willing to kick the door in or break a window, taking added risk they'll be caught.
#         That doesn't make a lock "useless", it means the lock protects you from some unknown % of risk. That's what the safe with a
#         cable does: it shaves some percentage points of your risk away.
#
#
# The contents of "test.txt" can be anything.
Multiple-run generator, dependent on elapsed time in seconds

Code:
import time
import random

class MarkovChain:
    def __init__(self):
        self.textData = []
        self.markov = {}  # maps each word to the list of words seen right after it

    def parseWords(self, words):
        self.textData = words
        self.prepare()

    def parseTextFile(self, textFilePath):
        with open(textFilePath) as file:
            self.textData = file.read()
        self.prepare()

    def prepare(self):
        self.textData = [i.lower() for i in self.textData.split(" ") if i.isalpha()]
        self.markov = {i:[] for i in self.textData}
        for before, after in zip(self.textData, self.textData[1:]):
            self.markov[before].append(after)
       
    def generate(self):
        new = list(self.markov.keys())
        seed = random.randrange(len(new))
        currentWord = random.choice(new)
        sentence = [currentWord]
        for i in range(0, random.randrange(15, 30)):
            check = self.markov[currentWord]
            if (len(check) > 0):
                nextWord = random.choice(check)
                sentence.append(nextWord)
                currentWord = nextWord
            else:
                currentWord = random.choice(new)
        return " ".join(sentence)

def main():
    testStrings =  "That's where with a cable comes in. You secure it to something that'd require them to have tools to cut the cable, which is " \
        "slightly less likely. Most burglaries are fairly quick, because they make noise and every minute spent inside gets the burglar " \
        "a minute closer to getting caught. The kind of burglar who doesn't get caught tries to spend less than 3 minutes on-premises. " \
        "Bringing along a toolbox increases noise, decreases agility, and makes it harder to carry fenceable items away. So they don't " \
        "tend to have a nice pair of bolt cutters unless they're stupid or know in advance something valuable requires them. It's not " \
        "foolproof, but it's a way to increase the odds the burglar won't be able to steal some things you'd really rather them not " \
        "steal. Sort of like putting locks on the door. Some \"burglars\" try every knob they see and are more than happy to enter an " \
        "unlocked car/residence. But others are willing to kick the door in or break a window, taking added risk they'll be caught. " \
        "That doesn't make a lock \"useless\", it means the lock protects you from some unknown % of risk. That's what the safe with a " \
        "cable does: it shaves some percentage points of your risk away."
    m = MarkovChain()
    m.parseWords(testStrings)
    startTime = time.monotonic()
    while (time.monotonic() - startTime < 4):
        print(m.generate())
        time.sleep(1)


if __name__ == "__main__":
    main()
Second code snippet in action:

https://ideone.com/VhOr4i
 

jonas

Ultra Cool Member
Reaction score
46
Alright, I haven't programmed in Python for a long time, but here are some comments.

Why do you use indirection for "markov" and "textData" in the first snippet?

I most certainly wouldn't change the type of textData in the middle of the code in the second snippet - that is a recipe for disaster. Function "parseWords" is a little bit confusing, because I would expect a variable named "words" to contain an enumerable of strings, not a string. Maybe a better variant would be

Code:
def parseText(self, text):
    self.text = text
    self.prepare()

def parseTextFile(self, textFilePath):
    with open(textFilePath) as file:
        self.parseText(file.read())

def prepare(self):
    textWords = [i.lower() for i in self.text.split(" ") if i.isalpha()]
    ...
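A minimal sketch of how that variant might look as a complete class, assuming the table-building step stays the same as in the second snippet. The attribute name `words` is an assumption made here for illustration; the point is only that the raw string and the token list live in separate attributes, so neither one changes type.

```python
class MarkovChain:
    def __init__(self):
        self.text = ""     # raw input, always a str
        self.words = []    # filtered tokens, always a list of str (hypothetical name)
        self.markov = {}   # word -> list of words seen right after it

    def parseText(self, text):
        self.text = text
        self.prepare()

    def parseTextFile(self, textFilePath):
        with open(textFilePath) as file:
            self.parseText(file.read())

    def prepare(self):
        # Same filter as the original: keep purely alphabetic tokens only.
        self.words = [w.lower() for w in self.text.split(" ") if w.isalpha()]
        self.markov = {w: [] for w in self.words}
        for before, after in zip(self.words, self.words[1:]):
            self.markov[before].append(after)
```

Because `prepare` reads `self.text` and never overwrites it, it can be called again without failing on a list that no longer has a `split` method.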
In generate, I would prefer the loop not to be unrolled once, but since this is Python there might not be an elegant way to do it. Variables "seed" and "i" don't seem to be used. The variable names "check" and "new" are completely confusing. "nextWord" is just an alias of "currentWord". Since you do not actually use the range, use "xrange" (in Python 2).
Code:
def generate(self):
  anyWord = list(self.markov.keys())
  # Should require non-emptiness of anyWord, and throw a meaningful error
  currentWord = random.choice(anyWord)
  sentence = [currentWord]
  for _ in range(0, random.randrange(15, 30)):
    predictedWords = self.markov[currentWord]
    nextWordSet = predictedWords or anyWord
    currentWord = random.choice(nextWordSet)
    sentence.append(currentWord)
  return " ".join(sentence)
Other than that, good job!

EDIT: Just noticed that you are using Python 3, so disregard my comment about xrange.
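For reference, the non-emptiness check mentioned in the comment above could be sketched roughly like this in Python 3. This is written as a standalone function rather than a method, and the parameters `minLength` and `maxLength` are hypothetical names added for illustration; the choice of `ValueError` is also just one reasonable option.

```python
import random

def generate(markov, minLength=15, maxLength=30):
    """Walk a transition table (word -> list of following words)."""
    anyWord = list(markov.keys())
    if not anyWord:
        # Fail early with a readable message instead of an IndexError
        # from random.choice on an empty list.
        raise ValueError("cannot generate: the transition table is empty")
    currentWord = random.choice(anyWord)
    sentence = [currentWord]
    for _ in range(random.randrange(minLength, maxLength)):
        # Fall back to any known word when currentWord has no successors.
        currentWord = random.choice(markov[currentWord] or anyWord)
        sentence.append(currentWord)
    return " ".join(sentence)
```

Note that this version appends a word on every iteration, whereas the original skips the append when it hits a word with no successors, so the output lengths differ slightly.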
 

tom_mai78101

The Helper Connoisseur / Ex-MineCraft Host
Staff member
Reaction score
1,259
  1. I changed textData in the first snippet so that it matches the idea of reading text data from a file and then parsing it into a form that is easy to access. I didn't consider that changing its type partway through could break what callers expect, but I don't have a better way of doing this.
  2. The indirection for markov exists for the same reason as textData: I wanted the data structures to reflect the high-level idea that the text data is easy to reach as you walk through it. Maybe the program is so small that these indirections feel trivial.
  3. Thanks for pointing out that the seed and i variables are not used in the program. The only use I could think of for seed is to store the initial randomized value as a class member, and then continue to generate Markov chains starting from that seed. The i variable is more of a loop iteration limiter: the loop runs between 15 and 29 times (random.randrange(15, 30)), and then generation stops. In other words, it bounds how many Markov chain tokens are generated.
  4. I understand some of the variable names are ambiguous. I apologize for that.
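One possible reading of the seed idea in point 3, sketched under assumptions: the seed is stored on the object and used to drive a private `random.Random` instance, so the same seed reproduces the same chain. The class name `SeededWalker` and the `length` parameter are hypothetical and not part of the snippets above.

```python
import random

class SeededWalker:  # hypothetical helper, not from the original snippets
    def __init__(self, markov, seed=None):
        self.markov = markov
        # Remember the seed so a run can be reproduced later.
        self.seed = seed if seed is not None else random.randrange(2**32)
        self.rng = random.Random(self.seed)

    def generate(self, length=20):
        words = list(self.markov.keys())
        current = self.rng.choice(words)
        sentence = [current]
        for _ in range(length):
            # Use the successors if there are any, otherwise any known word.
            current = self.rng.choice(self.markov[current] or words)
            sentence.append(current)
        return " ".join(sentence)
```

Two walkers built from the same table and the same seed produce the same sequence of sentences, which can help when debugging the generator.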
 