[Python 3] Simple Markov Chain Generator

tom_mai78101

The Helper Connoisseur / Ex-MineCraft Host
Staff member
One-time generator

Code:
import random

class MarkovChain:
    def __init__(self):
        self.textData = ""  # raw text, replaced by a word list in generate()
        self.markov = {}    # word -> list of words that followed it

    def parseWords(self, words):
        self.textData = words

    def parseTextFile(self, textFilePath):
        with open(textFilePath) as file:
            self.textData = file.read()
      
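    def generate(self):
        # Split on any whitespace (so line breaks in the file also separate words),
        # lowercase, and keep only purely alphabetic tokens ("cable," is dropped).
        # This overwrites the raw string with a list, so generate() only works once.
        self.textData = [i.lower() for i in self.textData.split() if i.isalpha()]
        # Build the chain: each word maps to the list of words that followed it.
        self.markov = {i:[] for i in self.textData}
        for before, after in zip(self.textData, self.textData[1:]):
            self.markov[before].append(after)
        new = list(self.markov.keys())
        seed = random.randrange(len(new))
        currentWord = random.choice(new)
        sentence = [currentWord]
        # Walk the chain for 15 to 29 steps (randrange excludes the upper bound).
        for i in range(0, random.randrange(15, 30)):
            check = self.markov[currentWord]
            if (len(check) > 0):
                nextWord = random.choice(check)
                sentence.append(nextWord)
                currentWord = nextWord
            else:
                # Dead end: jump to a random word (the jumped-to word is not appended).
                currentWord = random.choice(new)
        return " ".join(sentence)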

def main():
    m = MarkovChain()
    m.parseTextFile("test.txt")
    print(m.generate())


if __name__ == "__main__":
    main()

#
# The text test file, "test.txt", contains the following:
#
#         That's where "with a cable" comes in. You secure it to something that'd require them to have tools to cut the cable, which is
#         slightly less likely. Most burglaries are fairly quick, because they make noise and every minute spent inside gets the burglar
#         a minute closer to getting caught. The kind of burglar who doesn't get caught tries to spend less than 3 minutes on-premises.
#         Bringing along a toolbox increases noise, decreases agility, and makes it harder to carry fenceable items away. So they don't
#         tend to have a nice pair of bolt cutters unless they're stupid or know in advance something valuable requires them. It's not
#         foolproof, but it's a way to increase the odds the burglar won't be able to steal some things you'd really rather them not
#         steal. Sort of like putting locks on the door. Some "burglars" try every knob they see and are more than happy to enter an
#         unlocked car/residence. But others are willing to kick the door in or break a window, taking added risk they'll be caught.
#         That doesn't make a lock "useless", it means the lock protects you from some unknown % of risk. That's what the safe with a
#         cable does: it shaves some percentage points of your risk away.
#
#
# The contents of "test.txt" can be anything.
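To make the data structure concrete, here is a small sketch (not part of the original post) of the table that generate() builds, using a made-up six-word input. The helper name build_chain is hypothetical; it just repeats the tokenizing and dictionary-building lines of generate() so the result can be printed.

```python
def build_chain(text):
    # Same tokenizing rule as generate(): whitespace split, lowercase,
    # keep only purely alphabetic tokens.
    words = [w.lower() for w in text.split() if w.isalpha()]
    # Every word maps to the list of words that directly followed it.
    chain = {w: [] for w in words}
    for before, after in zip(words, words[1:]):
        chain[before].append(after)
    return chain

chain = build_chain("The cat sat on the mat")
print(chain)
# {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the'], 'mat': []}
```

Because random.choice picks uniformly from a list, a follower that appears twice in a word's list is twice as likely to be picked, which is what gives the chain its probabilities. "mat" ends up with an empty list; that is the dead end handled by the else branch.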

Multiple-times generator, dependent on elapsed time in seconds

Code:
import time
import random

class MarkovChain:
    def __init__(self):
        self.textData = ""  # raw text, replaced by a word list in prepare()
        self.markov = {}    # word -> list of words that followed it

    def parseWords(self, words):
        self.textData = words
        self.prepare()

    def parseTextFile(self, textFilePath):
        with open(textFilePath) as file:
            self.textData = file.read()
        self.prepare()

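    def prepare(self):
        # Split on any whitespace, lowercase, and keep only purely alphabetic
        # tokens (so "cable," and "That's" are dropped). Note that this replaces
        # the raw string in self.textData with a list of words.
        self.textData = [i.lower() for i in self.textData.split() if i.isalpha()]
        # Each word maps to the list of words that directly followed it.
        self.markov = {i:[] for i in self.textData}
        for before, after in zip(self.textData, self.textData[1:]):
            self.markov[before].append(after)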
       
    def generate(self):
        new = list(self.markov.keys())
        seed = random.randrange(len(new))
        currentWord = random.choice(new)
        sentence = [currentWord]
        for i in range(0, random.randrange(15, 30)):
            check = self.markov[currentWord]
            if (len(check) > 0):
                nextWord = random.choice(check)
                sentence.append(nextWord)
                currentWord = nextWord
            else:
                # Dead end: jump to a random word (the jumped-to word is not appended).
                currentWord = random.choice(new)
        return " ".join(sentence)

def main():
    testStrings = "That's where with a cable comes in. You secure it to something that'd require them to have tools to cut the cable, which is " \
        "slightly less likely. Most burglaries are fairly quick, because they make noise and every minute spent inside gets the burglar " \
        "a minute closer to getting caught. The kind of burglar who doesn't get caught tries to spend less than 3 minutes on-premises. " \
        "Bringing along a toolbox increases noise, decreases agility, and makes it harder to carry fenceable items away. So they don't " \
        "tend to have a nice pair of bolt cutters unless they're stupid or know in advance something valuable requires them. It's not " \
        "foolproof, but it's a way to increase the odds the burglar won't be able to steal some things you'd really rather them not " \
        "steal. Sort of like putting locks on the door. Some \"burglars\" try every knob they see and are more than happy to enter an " \
        "unlocked car/residence. But others are willing to kick the door in or break a window, taking added risk they'll be caught. " \
        "That doesn't make a lock \"useless\", it means the lock protects you from some unknown % of risk. That's what the safe with a " \
        "cable does: it shaves some percentage points of your risk away."
    m = MarkovChain()
    m.parseWords(testStrings)
    startTime = time.monotonic()
    while (time.monotonic() - startTime < 4):
        print(m.generate())
        time.sleep(1)


if __name__ == "__main__":
    main()

Second code snippet in action:

https://ideone.com/VhOr4i
 
Alright, I haven't programmed in Python for a long time, but here are some comments.

Why do you use indirection for "markov" and "textData" in the first snippet?

I most certainly wouldn't change the type of textData (from a string to a list) in the middle of the code in the second snippet; that is a recipe for disaster. Function "parseWords" is a little bit confusing, because I would expect a parameter named "words" to contain an iterable of strings, not a single string. Maybe a better variant would be:

Code:
def parseText(self, text):
    self.text = text
    self.prepare()

def parseTextFile(self, textFilePath):
    with open(textFilePath) as file:
        self.parseText(file.read())

def prepare(self):
    textWords = [i.lower() for i in self.text.split(" ") if i.isalpha()]
    ...
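Taking that suggestion one step further, here is one possible shape for the parsing half of the class (a sketch, not the reviewer's exact code) in which the raw text and the word list live in separate attributes, so no attribute ever changes type. The attribute name words is an assumption for illustration.

```python
class MarkovChain:
    def __init__(self):
        self.text = ""    # raw input text, always a str
        self.words = []   # tokenized words, always a list of str
        self.markov = {}  # word -> list of words that followed it

    def parseText(self, text):
        self.text = text
        self.prepare()

    def parseTextFile(self, textFilePath):
        with open(textFilePath) as file:
            self.parseText(file.read())

    def prepare(self):
        # Rebuilt from self.text, which is never overwritten,
        # so calling prepare() again is safe.
        self.words = [w.lower() for w in self.text.split() if w.isalpha()]
        self.markov = {w: [] for w in self.words}
        for before, after in zip(self.words, self.words[1:]):
            self.markov[before].append(after)

m = MarkovChain()
m.parseText("the cat sat on the mat")
m.parseText("a b a")  # re-parsing works: self.text is still a str
print(m.markov)       # {'a': ['b'], 'b': ['a']}
```

Keeping the raw text around costs a little memory, but it means every attribute has one type for the lifetime of the object, which is the property the comment above asks for.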

In generate, I would prefer the loop not to be unrolled once (the first word is picked before the loop), but since this is Python there might not be an elegant way to do it. Variables "seed" and "i" don't seem to be used. The variable names "check" and "new" are completely confusing. "nextWord" is just an alias of "currentWord". Since you do not actually use the list that range builds, use "xrange" (in Python 2):
Code:
def generate(self):
    anyWord = list(self.markov.keys())
    # Should require non-emptiness of anyWord, and throw a meaningful error
    currentWord = random.choice(anyWord)
    sentence = [currentWord]
    for _ in xrange(0, random.randrange(15, 30)):  # Python 2; use range() in Python 3
        predictedWords = self.markov[currentWord]
        # Fall back to any word when the current word is a dead end
        nextWordSet = predictedWords or anyWord
        currentWord = random.choice(nextWordSet)
        sentence.append(currentWord)
    return " ".join(sentence)
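On the point about the unrolled first step: one way to avoid it in Python 3 is a generator that starts with no current word, so the very first choice falls back to the full word list, and itertools.islice then takes as many words as needed. This is a sketch; walk and generate_sentence are hypothetical names, and the empty-chain check follows the "should require non-emptiness" comment in the code above.

```python
import itertools
import random

def walk(markov, rng=random):
    """Yield an endless random walk over a word -> followers dict."""
    anyWord = list(markov)
    if not anyWord:
        raise ValueError("cannot generate from an empty Markov chain")
    currentWord = None
    while True:
        # No current word yet (first step) or a dead end: fall back to any word.
        predictedWords = markov.get(currentWord, [])
        currentWord = rng.choice(predictedWords or anyWord)
        yield currentWord

def generate_sentence(markov, rng=random):
    # 1 starting word + 15..29 steps in the loop above = 16..30 words.
    length = rng.randrange(16, 31)
    return " ".join(itertools.islice(walk(markov, rng), length))

chain = {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the'], 'mat': []}
print(generate_sentence(chain, random.Random(1)))
```

Like the version above, a dead end adds the restart word to the output instead of skipping it, which differs slightly from the original snippet.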

Other than that, good job!

EDIT: Just noticed that you are using Python 3, so disregard my comment about xrange (in Python 3, range is already lazy).
 
  1. I changed textData in the first snippet so that it follows the concept of first obtaining text data from a text file and then parsing it into a form that is easy to access. I didn't consider that changing the data into something else would break code that expects the original form, but I don't have a better way of doing this.
  2. The indirection for markov has the same reason as for textData: I wanted the data structures to follow a very high-level concept in which the text data is easy to access as you go through it. Maybe the program is so small that these indirections feel unnecessary.
  3. Thanks for pointing out that the seed and i variables are not used in the program. The only use I could think of for seed is to store the initial randomized value as a class member, and then continue to generate Markov chains starting from that seed. The i variable is more of a loop iteration limiter: once the loop has run between 15 and 29 iterations (random.randrange(15, 30) excludes 30), it stops generating the Markov chain and, in the first snippet, the application quits. In other words, the loop's range sets how many Markov chain steps to take (each successful step adds one token), although i itself is never read.
  4. I understand some of the variable names are ambiguous. I apologize for that.
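On the idea in point 3 of keeping a seed around: a common way to make the output repeatable is to give the generator its own random.Random instance created from a stored seed, so the same seed replays the same chain. A minimal sketch, assuming the chain is already built as a word -> followers dict; SeededGenerator is a hypothetical name, and its loop mirrors the original generate().

```python
import random

class SeededGenerator:
    def __init__(self, markov, seed):
        self.markov = markov
        self.seed = seed                # stored so a run can be replayed
        self.rng = random.Random(seed)  # private RNG, independent of the global one

    def reset(self):
        # Re-create the RNG from the stored seed to replay from the start.
        self.rng = random.Random(self.seed)

    def generate(self):
        words = list(self.markov)
        currentWord = self.rng.choice(words)
        sentence = [currentWord]
        for _ in range(self.rng.randrange(15, 30)):
            followers = self.markov[currentWord]
            if followers:
                currentWord = self.rng.choice(followers)
                sentence.append(currentWord)
            else:
                # Dead end: restart from a random word, as the original does.
                currentWord = self.rng.choice(words)
        return " ".join(sentence)

chain = {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the'], 'mat': []}
g = SeededGenerator(chain, seed=42)
first = g.generate()
g.reset()
print(first == g.generate())  # True
```

Using a separate Random instance rather than calling random.seed() keeps the replay unaffected by any other code that draws from the module-level generator.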
 