[Python 3] Simple Markov Chain Generator

tom_mai78101

The Helper Connoisseur / Ex-MineCraft Host
Staff member
Reaction score
1,667
One-time generator

Code:
import random

class MarkovChain:
    def __init__(self):
        self.textData = []
        self.markov = {}  # word -> list of words that follow it

    def parseWords(self, words):
        self.textData = words

    def parseTextFile(self, textFilePath):
        with open(textFilePath) as file:
            self.textData = file.read()
      
    def generate(self):
        self.textData = [i.lower() for i in self.textData.split(" ") if i.isalpha()]
        self.markov = {i:[] for i in self.textData}
        for before, after in zip(self.textData, self.textData[1:]):
            self.markov[before].append(after)
        new = list(self.markov.keys())
        seed = random.randrange(len(new))
        currentWord = random.choice(new)
        sentence = [currentWord]
        for i in range(0, random.randrange(15, 30)):
            check = self.markov[currentWord]
            if (len(check) > 0):
                nextWord = random.choice(check)
                sentence.append(nextWord)
                currentWord = nextWord
            else:
                currentWord = random.choice(new)
        return " ".join(sentence)

def main():
    m = MarkovChain()
    m.parseTextFile("test.txt")
    print(m.generate())


if __name__ == "__main__":
    main()

#
# The text test file, "test.txt", contains the following:
#
#         That's where "with a cable" comes in. You secure it to something that'd require them to have tools to cut the cable, which is
#         slightly less likely. Most burglaries are fairly quick, because they make noise and every minute spent inside gets the burglar
#         a minute closer to getting caught. The kind of burglar who doesn't get caught tries to spend less than 3 minutes on-premises.
#         Bringing along a toolbox increases noise, decreases agility, and makes it harder to carry fenceable items away. So they don't
#         tend to have a nice pair of bolt cutters unless they're stupid or know in advance something valuable requires them. It's not
#         foolproof, but it's a way to increase the odds the burglar won't be able to steal some things you'd really rather them not
#         steal. Sort of like putting locks on the door. Some "burglars" try every knob they see and are more than happy to enter an
#         unlocked car/residence. But others are willing to kick the door in or break a window, taking added risk they'll be caught.
#         That doesn't make a lock "useless", it means the lock protects you from some unknown % of risk. That's what the safe with a
#         cable does: it shaves some percentage points of your risk away.
#
#
# The contents of "test.txt" can be anything.
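
To make the transition table easier to picture, here is a minimal sketch of the same build step run on a short made-up sentence. The function name `build_transitions` and the sample text are assumptions for illustration and are not part of the snippet above; the filtering and the `zip` pairing follow `generate()`.

```python
def build_transitions(text):
    """Map each kept word to the list of words that follow it."""
    # Same filtering as the post: lowercase, keep purely alphabetic tokens.
    words = [w.lower() for w in text.split(" ") if w.isalpha()]
    table = {w: [] for w in words}
    # Pair every word with its successor and record the successor.
    for before, after in zip(words, words[1:]):
        table[before].append(after)
    return table


# Hypothetical sample text, chosen so one word ends up with no followers.
table = build_transitions("the cat sat on the mat and the cat ran")
for word, followers in table.items():
    print(word, "->", followers)
```

Here "the" maps to `['cat', 'mat', 'cat']`, so "cat" is twice as likely as "mat" to follow it, while "ran" maps to an empty list. That empty list is the case where `generate()` falls back to picking a random word.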

Repeated generator, looping until a set number of seconds has elapsed

Code:
import time
import random

class MarkovChain:
    def __init__(self):
        self.textData = []
        self.markov = {}  # word -> list of words that follow it

    def parseWords(self, words):
        self.textData = words
        self.prepare()

    def parseTextFile(self, textFilePath):
        with open(textFilePath) as file:
            self.textData = file.read()
        self.prepare()

    def prepare(self):
        self.textData = [i.lower() for i in self.textData.split(" ") if i.isalpha()]
        self.markov = {i:[] for i in self.textData}
        for before, after in zip(self.textData, self.textData[1:]):
            self.markov[before].append(after)
       
    def generate(self):
        new = list(self.markov.keys())
        seed = random.randrange(len(new))
        currentWord = random.choice(new)
        sentence = [currentWord]
        for i in range(0, random.randrange(15, 30)):
            check = self.markov[currentWord]
            if (len(check) > 0):
                nextWord = random.choice(check)
                sentence.append(nextWord)
                currentWord = nextWord
            else:
                currentWord = random.choice(new)
        return " ".join(sentence)

def main():
    testStrings =  "That's where with a cable comes in. You secure it to something that'd require them to have tools to cut the cable, which is " \
        "slightly less likely. Most burglaries are fairly quick, because they make noise and every minute spent inside gets the burglar " \
        "a minute closer to getting caught. The kind of burglar who doesn't get caught tries to spend less than 3 minutes on-premises. " \
        "Bringing along a toolbox increases noise, decreases agility, and makes it harder to carry fenceable items away. So they don't " \
        "tend to have a nice pair of bolt cutters unless they're stupid or know in advance something valuable requires them. It's not " \
        "foolproof, but it's a way to increase the odds the burglar won't be able to steal some things you'd really rather them not " \
        "steal. Sort of like putting locks on the door. Some \"burglars\" try every knob they see and are more than happy to enter an " \
        "unlocked car/residence. But others are willing to kick the door in or break a window, taking added risk they'll be caught. " \
        "That doesn't make a lock \"useless\", it means the lock protects you from some unknown % of risk. That's what the safe with a " \
        "cable does: it shaves some percentage points of your risk away."
    m = MarkovChain()
    m.parseWords(testStrings)
    startTime = time.monotonic()
    while (time.monotonic() - startTime < 4):
        print(m.generate())
        time.sleep(1)


if __name__ == "__main__":
    main()

Second code snippet in action:

https://ideone.com/VhOr4i
 

jonas

You can change this now in User CP.
Reaction score
66
Alright, I haven't programmed in Python for a long time, but here are some comments.

Why do you use indirection for "markov" and "textData" in the first snippet?

I most certainly wouldn't change the type of textData in the middle of the code in the second snippet; that is a recipe for disaster. The function "parseWords" is a little confusing, because I would expect a variable named "words" to contain an enumerable of strings, not a single string. Maybe a better variant would be:

Code:
def parseText(self, text):
    self.text = text
    self.prepare()

def parseTextFile(self, textFilePath):
    with open(textFilePath) as file:
        self.parseText(file.read())

def prepare(self):
    textWords = [i.lower() for i in self.text.split(" ") if i.isalpha()]
    ...

In generate, I would prefer the loop not to be unrolled once, but since this is Python there might not be an elegant way to do it. The variables "seed" and "i" don't seem to be used. The variable names "check" and "new" are completely confusing. "nextWord" is just an alias of "currentWord". Since you do not actually use the range, use "xrange" (in Python 2).
Code:
def generate(self):
    anyWord = list(self.markov.keys())
    # Should require non-emptiness of anyWord, and throw a meaningful error
    currentWord = random.choice(anyWord)
    sentence = [currentWord]
    # range in Python 3 (this was xrange in Python 2)
    for _ in range(0, random.randrange(15, 30)):
        predictedWords = self.markov[currentWord]
        # Fall back to any known word when nothing follows the current one
        nextWordSet = predictedWords or anyWord
        currentWord = random.choice(nextWordSet)
        sentence.append(currentWord)
    return " ".join(sentence)

Other than that, good job!

EDIT: Just noticed that you are using Python 3, so disregard my comment about xrange.
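
For reference, the suggestions in this reply can be put together into a small Python 3 sketch. This is one possible reading, not the original author's code: it drops the `textData` attribute entirely so nothing changes type, merges `prepare` into `parseText`, and the error message and sample text are assumptions made for the example.

```python
import random


class MarkovChain:
    def __init__(self):
        self.markov = {}  # word -> list of words that follow it

    def parseText(self, text):
        # Keep the word list local so no attribute changes type.
        words = [w.lower() for w in text.split(" ") if w.isalpha()]
        self.markov = {w: [] for w in words}
        for before, after in zip(words, words[1:]):
            self.markov[before].append(after)

    def generate(self):
        anyWord = list(self.markov.keys())
        if not anyWord:
            # Assumed wording; the reply only asks for a meaningful error.
            raise ValueError("no usable words; call parseText() first")
        currentWord = random.choice(anyWord)
        sentence = [currentWord]
        for _ in range(random.randrange(15, 30)):
            # Fall back to any known word when nothing follows this one.
            currentWord = random.choice(self.markov[currentWord] or anyWord)
            sentence.append(currentWord)
        return " ".join(sentence)


m = MarkovChain()
m.parseText("the cat sat on the mat and the cat ran")  # hypothetical input
print(m.generate())
```

Note one behavioural difference from the first post: here the fallback word is appended to the sentence, whereas the original only switches to it and appends whatever follows.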
 

tom_mai78101

  1. I changed textData in the first snippet so that it stays consistent with the idea of reading text data from a text file and then parsing it into a form that is easy to access. I didn't consider that changing the data into a different type would break what is expected of it, but I don't have a better way of doing this.
  2. The indirection for markov has the same motivation as for textData: I wanted the data structures to reflect the high-level idea that the text data is readily available as you step through it. Maybe the program is too small, so these indirections feel trivial.
  3. Thanks for pointing out that the seed and i variables are not used in the program. The only use I could think of for seed is to store the initial randomized value as a class member, and then continue generating Markov chains starting from that seed. The i variable is more of a loop iteration limiter: once the loop has run between 15 and 29 times (the result of random.randrange(15, 30)), it stops generating the Markov chain and the application quits. In other words, the i variable counts how many Markov chain tokens to generate.
  4. I understand some of the variable names are ambiguous. I apologize for that.
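
One way the seed idea from point 3 could look, as a rough sketch only: store a seed and feed it to a private `random.Random` instance, so the same seed always reproduces the same chain. The standalone `generate` function, its parameters, and the tiny transition table below are all hypothetical and not taken from the snippets in this thread.

```python
import random


def generate(markov, seed, min_len=15, max_len=30):
    """Generate a chain whose random choices are all driven by `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    vocabulary = list(markov.keys())
    current = rng.choice(vocabulary)
    sentence = [current]
    for _ in range(rng.randrange(min_len, max_len)):
        # Fall back to any known word when nothing follows the current one.
        current = rng.choice(markov[current] or vocabulary)
        sentence.append(current)
    return " ".join(sentence)


# Hypothetical transition table standing in for self.markov.
markov = {"the": ["cat", "mat"], "cat": ["sat"], "sat": ["the"], "mat": []}
first = generate(markov, seed=42)
second = generate(markov, seed=42)
print(first == second)  # prints True: same seed, same chain
```

Keeping the seed as a class member would then let the application log it and replay a particular output later.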
 