
I spent years doing that kind of post-processing in Photoshop, trying to increase the realism of my archviz scenes, but clients never went for it. They used to prefer the fake, perfect 3D look. Nice project, well done.


I usually have these hypothetical technical discussions with ChatGPT (I can share if you like). I was asking him this: aren't LLMs just huge Markov chains?! And now I see your project... funny.


> I can share if you like

Respectfully, absolutely nobody wants to read a copy-and-paste of a chat session with ChatGPT.


When you say nobody, you mean you, right? You can't possibly be answering for every single person in the world.

I was having a discussion about similarities between Markov chains and LLMs, and shortly after I found this topic on HN. When I wrote "I can share if you like" it was as proof of the coincidence.


LLMs are indeed Markov chains. The breakthrough is that we are able to efficiently compute well-performing probabilities for many states using ML.


LLMs are not Markov Chains unless you contort the meaning of a Markov Model State so much you could even include the human brain.


Not sure why that's contorting. A Markov model is anything where you know the probability of going from state A to state B, and the state can be anything. For text generation, the state is the previous text and the transition appends one more token, which is true for both LLMs and old-school n-gram Markov models.
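For concreteness, here's a minimal Python sketch of that framing. The llm_next_dist stub is hypothetical, but the point is that both objects expose the same mathematical interface: state -> next-token distribution.

    from collections import Counter, defaultdict

    def train_ngram(tokens, k=2):
        # classic Markov text model: count which token follows each k-gram
        table = defaultdict(Counter)
        for i in range(len(tokens) - k):
            table[tuple(tokens[i:i + k])][tokens[i + k]] += 1
        return table

    def ngram_next_dist(table, state):
        # P(next token | state), straight from the counts
        counts = table[tuple(state)]
        total = sum(counts.values())
        return {tok: c / total for tok, c in counts.items()}

    # An LLM plays the same role, except the state is the whole context
    # window and the distribution is computed by a neural net instead of
    # looked up (hypothetical stub, not a real API):
    #     def llm_next_dist(model, state):
    #         return model(state)  # distribution over the vocabulary

    tokens = "the cat sat on the mat the cat ran".split()
    print(ngram_next_dist(train_ngram(tokens), ("the", "cat")))
    # {'sat': 0.5, 'ran': 0.5}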


A GPT model would be modelled as an n-gram Markov model where n is the size of the context window. This is slightly useful for getting some crude bounds on the behaviour of GPT models in general, but is not a very efficient way to store a GPT model.


I'm not saying it's an n-gram Markov model or that you should store them as a lookup table. Markov models are just a mathematical concept that don't say anything about storage, just that the state change probabilities are a pure function of the current state.


You say state can be anything, no restrictions at all. Let me sell you a perfect predictor then :) The state is the next token.


Yes, technically you can frame an LLM as a Markov chain by defining the "state" as the entire sequence of previous tokens. But this is a vacuous observation under that definition, literally any deterministic or stochastic process becomes a Markov chain if you make the state space flexible enough. A chess game is a "Markov chain" if the state includes the full board position and move history. The weather is a "Markov chain" if the state includes all relevant atmospheric variables.

The problem is that this definition strips away what makes Markov models useful and interesting as a modeling framework. A "Markov text model" is a low-order Markov model (e.g., n-grams) with a fixed, tractable state and transitions based only on the last k tokens. LLMs aren't that: they model using un-fixed long-range context (up to the window). For Markov chains, k is non-negotiable. It's a constant, not a variable. Once you make it a variable, nearly any process can be described as Markovian, and the word is useless.


Sure many things can be modelled as Markov chains, which is why they're useful. But it's a mathematical model so there's no bound on how big the state is allowed to be. The only requirement is that all you need is the current state to determine the probabilities of the next state, which is exactly how LLMs work. They don't remember anything beyond the last thing they generated. They just have big context windows.


The essence of the "Markov property" is that the next state does not depend on history beyond the current state.

And in classes, the very first trick you learn to skirt around history is to add Boolean variables to your "memory state". Your system now models "did it rain on each of the previous N days?" The issue, obviously, is that this is exponential if you're not careful. Maybe you can get clever by making your state a "sliding window history"; then it's linear in the number of days you remember. Maybe mix both. Maybe add even more information. Tradeoffs, tradeoffs.

I don't think LLMs embody the Markov property at all, even if you can make everything eventually follow the Markov property by "considering every single possible state", of which there are (size of token set)^(length) at minimum because of the KV cache.


The KV cache doesn't affect it because it's just an optimization. LLMs are stateless and don't take any other input than a fixed block of text. They don't have memory, which is the requirement for a Markov chain.


Have you ever actually worked with a basic Markov problem?

The Markov property states that the transition probabilities to the next state depend entirely on the current state.

These states inhabit a state space. The way you encode "memory" if you need it, e.g. say you need to remember whether it rained on each of the last 3 days, is by expanding said state space. In that case you'd go from tracking 1 day to 3, i.e. 2^3 states if you need the precise binary information for each day. Being "clever", maybe you assume that only the number of days it rained in the past 3 days matters, and you get away with a "linear" amount of memory.
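A toy sketch of that rain example (the probabilities are made up, purely illustrative):

    from itertools import product

    # "memory" via state expansion: to remember whether it rained on each
    # of the last 3 days, the state becomes a 3-tuple of booleans,
    # 2^3 = 8 states in total
    STATES = list(product([False, True], repeat=3))

    def transition(state, p_rain):
        # the next state drops the oldest day and appends a fresh
        # observation; the Markov property holds because the tuple
        # itself carries the history
        p = p_rain[state]  # assumed: P(rain tomorrow | state)
        return {state[1:] + (True,): p, state[1:] + (False,): 1 - p}

    # toy rule "rain persists": P(rain) = 0.8 if it rained today, else 0.2
    p_rain = {s: 0.8 if s[-1] else 0.2 for s in STATES}
    print(transition((False, True, True), p_rain))
    # {(True, True, True): 0.8, (True, True, False): 0.2}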

Sure, an LLM is a "Markov chain" with a state space of size (# tokens)^(context length), at minimum. That's not a helpful abstraction and defeats the original purpose of the Markov observation. The entire point of the Markov observation is that you can represent a seemingly huge predictive model with just a couple of variables in a discrete state space, and ideally you're the clever programmer/researcher and can significantly collapse said space by being, well, clever.

Are you deliberately missing the point or what?


> Sure, an LLM is a "Markov chain" with a state space of size (# tokens)^(context length), at minimum.

Okay, so we're agreed.


> Sure many things can be modelled as Markov chains

Again, no they can't, unless you break the definition. k is not a variable. It's as simple as that. The state cannot be flexible.

1. The Markov text model uses k tokens; not k tokens sometimes, n tokens other times, and whatever you want it to be the rest of the time.

2. A Markov model is explicitly described as "assuming that future states depend only on the current state, not on the events that occurred before it". Defining your "state" such that every event imaginable can be captured inside it is a "clever" workaround, but it ultimately describes something that is decidedly not a Markov model.


It's not n tokens sometimes and k tokens other times. LLMs have fixed context windows; you just sometimes have less text, so the window isn't full. They're pure functions from a fixed-size block of text to a probability distribution over the next token, same as the classic lookup-table n-gram Markov chain model.


1. A context limit is not a Markov order. An n-gram model’s defining constraint is: there exists a small constant k such that the next-token distribution depends only on the last k tokens, full stop. You can't use a k-trained markov model on anything but k tokens, and each token has the same relationship with each other regardless. An LLM’s defining behavior is the opposite: within its window it can condition on any earlier token, and which tokens matter can change drastically with the prompt (attention is content-dependent). “Window size = 8k/128k” is not “order k” in the Markov sense; it’s just a hard truncation boundary.

2. “Fixed-size block” is a padding detail, not a modeling assumption. Yes, implementations batch/pad to a maximum length. But the model is fundamentally conditioned on a variable-length prefix (up to the cap), and it treats position 37 differently from position 3,700 because the computation explicitly uses positional information. That means the conditional distribution is not a simple stationary “transition table” the way the n-gram picture suggests.

3. “Same as a lookup table” is exactly the part that breaks. A classic n-gram Markov model is literally a table (or smoothed table) from discrete contexts to next-token probabilities. A transformer is a learned function that computes a representation of the entire prefix and uses that to produce a distribution. Two contexts that were never seen verbatim in training can still yield sensible outputs because the model generalizes via shared parameters; that is categorically unlike n-gram lookup behavior.

I don't know how many times I have to spell this out for you. Calling LLMs Markov chains is less than useless. They don't resemble them in any way unless you understand neither.


I think you're confusing Markov chains and "Markov chain text generators". A Markov chain is a mathematical structure where the probabilities of going to the next state only depend on the current state and not the previous path taken. That's it. It doesn't say anything about whether the probabilities are computed by a transformer or stored in a lookup table, it just exists. How the probabilities are determined in a program doesn't matter mathematically.


Just a heads-up: this is not the first time somebody has to explain Markov chains to famouswaffles on HN, and I'm pretty sure it won't be the last. Engaging further might not be worth it.


I did not even remember you and had to dig to find out what you were on about. Just a heads up, if you've had a previous argument and you want to bring that up later then just speak plainly. Why act like "somebody" is anyone but you?

My response to both of you is the same.

LLMs do depend on previous events, but you say they don't because you've redefined state to include previous events. It's a circular argument. In a Markov chain, state is well defined, not something you can insert any property you want to or redefine as you wish.

It's not my fault neither of you understand what the Markov property is.


By that definition, n-gram Markov chain text generators also include previous state, because you always feed in the last n tokens. :) It's exactly the same situation as LLMs, just with a higher, but still fixed, n.


We've been through this. The context of an LLM is not fixed. Context windows ≠ n-gram orders.

They don't, because n-gram orders are too small and rigid to include the history in the general case.

I think srean's comment up the thread is spot on. This current situation where the state can be anything you want it to be just does not make a productive conversation.


'A Markov chain is a mathematical structure where the probabilities of going to the next state only depend on the current state and not the previous path taken.'

My point, which seems so hard to grasp for whatever reason, is that in a Markov chain, state is a well-defined thing. It's not a variable you can assign any property to.

LLMs do depend on the previous path taken. That's the entire reason they're so useful! And the only reason you say they don't is because you've redefined 'state' to include that previous path! It's nonsense. Can you not see the circular argument?

The state is required to be a fixed, well-defined element of a structured state space. Redefining the state as an arbitrarily large, continuously valued encoding of the entire history is a redefinition that trivializes the Markov property, which a Markov chain should satisfy. Under your definition, any sequential system can be called Markov, which means the term no longer distinguishes anything.


They only have the previous path in as much as n-gram Markov text generators have the previous path.


> An n-gram model’s defining constraint is: there exists a small constant k such that the next-token distribution depends only on the last k tokens, full stop.

I don't necessarily agree with GP, but I also don't think the definitions of a Markov chain or a Markov generator include the word "small".

That constant can be as large as you need it to be.


Well LLMs aren't human brains, unless you contort the definition of matrix algebra so much you could even include them.


QM and GR can be written as matrix algebra, atoms and electrons are QM, chemistry is atoms and electrons, biology is chemistry, brains are biology.

An LLM could be implemented with a Markov chain, but the naïve matrix is ((vocab size)^(context length))^2, which is far too big to fit in this universe.

Like, the Bekenstein bound means that writing out the transition matrix for an LLM with just 4k context (and a 50k vocabulary) at just one-bit resolution, the first row alone (out of a bit more than 10^18795 rows) ends up in a black hole >10^9800 times larger than the observable universe.
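A back-of-the-envelope check of those numbers (assuming "4k context" means 4,000 positions and a 50,000-token vocabulary, which reproduces the figure):

    import math

    vocab, context = 50_000, 4_000
    log10_rows = context * math.log10(vocab)  # rows = vocab ** context
    print(f"transition matrix rows ~ 10^{log10_rows:.0f}")
    # transition matrix rows ~ 10^18796

    # each row also has vocab ** context columns, so the full matrix has
    # about 10^(2 * 18796) entries; "too big to fit in this universe"
    # is an understatement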


Yes, sure enough, but brains are not ideas, and there is no empirical or theoretical model for ideas in terms of brain states. The idea of unified science all stemming from a single ultimate cause is beautiful, but it is not how science works in practice, nor is it supported by scientific theories today. Case in point: QM models do not explain the behavior of larger things, and there is no model which gives a method to transform from quantum to massive states.

The case for brain states and ideas is similar to QM and massive objects. While certain metaphysical presuppositions might hold that everything must be physical and describable by models for physical things, science, which should eschew metaphysical assumptions, has not shown that to be the case.


Markov models with more than 3 words as "context window" produce very unoriginal text in my experience (the corpus used had almost 200k sentences, almost 3 million words), matching the OP's experience. These are by no means large corpora, but I know the problem isn't going away with a larger corpus.[1] The Markov chain will wander into "valleys" where it reproduces paragraphs of its corpus one for one, because it stumbles upon 4-word sequences that it has only seen once. This is because the 4 words effectively form a single token, not a context window. Markov chains don't have what LLMs have.

If you use syllable-level tokens in a Markov model, the model can't form real words much beyond the second syllable, and you have no way of making it make more sense other than increasing the token size, which exponentially decreases originality. This is the simplest way I can explain it, though I had to address why scaling doesn't work.

[1] There are 400,000^4 possible 4-word sequences in English (assuming a 400k-word vocabulary and barring grammar), meaning only a corpus with 8 times that many words and no repetition could offer two ways to chain each possible 4-word sequence.
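A minimal word-level generator sketch (corpus.txt is a hypothetical input file) that makes those "valleys" visible: any 4-word state seen only once has exactly one continuation, so the chain is forced to reproduce the corpus verbatim until it hits a more common state.

    import random
    from collections import defaultdict

    def build(tokens, k=4):
        # map each k-word state to the list of observed next words
        table = defaultdict(list)
        for i in range(len(tokens) - k):
            table[tuple(tokens[i:i + k])].append(tokens[i + k])
        return table

    def generate(table, seed, n=50):
        out = list(seed)  # seed must be k tokens
        for _ in range(n):
            choices = table.get(tuple(out[-len(seed):]))
            if not choices:  # dead end: state never seen in the corpus
                break
            out.append(random.choice(choices))
        return " ".join(out)

    corpus = open("corpus.txt").read().split()  # hypothetical corpus
    table = build(corpus)
    single = sum(1 for v in table.values() if len(v) == 1)
    print(single, "of", len(table), "states have a single continuation")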


I was really sleepy when I wrote this


Yeah, there are only two differences between using Markov chains to predict words and LLMs:

* LLMs don't use Markov chains

* LLMs don't predict words


* Markov chains have been used to predict syllables or letters since the beginning, and an LLM's tokenizer could be used for a Markov chain

* The R package markovchain[1] may look like it's using Markov chains, but it's actually using the R programming language, zeros and ones.

[1] https://cran.r-project.org/web/packages/markovchain/index.ht...


They are definitely not Markov chains. They may, however, be Markov models. There's a difference between an MC and an MM.


What do you mean? The states are fully observable (current array of tokens), and using an LLM we calculate the probabilities of moving between them. What is not MC about this?


I suggest getting familiar with or brushing up on the differences between a Markov Chain and a Markov Model. The former is a substantial restriction of the latter. The classic by Kemeny and Snell is a good readable reference.

MCs have a constant and finite context length; their state is the most recent k-tuple of emitted symbols, and their transition probabilities are invariant (to time and to the tokens emitted).


LLMs definitely also have finite context length. And if we consider padding, it is also constant. The k is huge compared to most Markov chains used historically, but that doesn't make it less finite.


That's not correct. Even a toy like an exponentially weighted moving average produces unbounded context (of diminishing influence).
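A tiny sketch of what I mean: the first observation still influences the result after arbitrarily many steps.

    def ewma(xs, alpha=0.5):
        # exponentially weighted moving average: the update only touches
        # the latest value, yet the result depends on *every* past x with
        # geometrically decaying weight
        avg = xs[0]
        for x in xs[1:]:
            avg = alpha * x + (1 - alpha) * avg
        return avg

    print(ewma([100, 0, 0, 0, 0]))  # 6.25 == 100 * (1 - 0.5) ** 4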


What do you mean? I can only input k tokens into my LLM to calculate the probabilities; that is the definition of my state. This is exactly the way n-gram LMs use n tokens, except that instead of an ML model they calculate the probabilities from observed frequencies. There is no unbounded context anywhere.


That's different.

You can certainly feed k-grams one at a time to estimate the probability distribution over the next token, use that to simulate a Markov chain, and reinitialize the LLM (drop the context) at each step. In this process the LLM is just a lookup table used to simulate your MC.

But an LLM on its own doesn't drop context as it generates; its transition probabilities change depending on the tokens.
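A sketch of that distinction, with a hypothetical next_token_probs() standing in for a real LLM call:

    import random

    def simulate_order_k_chain(next_token_probs, seed, k, n):
        # truncate to the last k tokens at each step: a genuine order-k
        # Markov chain, with the LLM used as a lookup table
        tokens = list(seed)
        for _ in range(n):
            dist = next_token_probs(tokens[-k:])  # context dropped here
            tokens.append(random.choices(list(dist), list(dist.values()))[0])
        return tokens

    def generate_full_context(next_token_probs, seed, n):
        # what an LLM actually does: condition on everything so far
        tokens = list(seed)
        for _ in range(n):
            dist = next_token_probs(tokens)
            tokens.append(random.choices(list(dist), list(dist.values()))[0])
        return tokens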


Don't know what happened. I stumbled onto a funny coincidence (me talking to an LLM about its similarities with MCs) and decided to share it on a post about using MCs to generate text. Got some nasty comments and a lot of downvotes, even though my comment sparked a pretty interesting discussion.

Hate to be that guy, but I remember this place being nicer.


Ever since LLMs became popular, there's been an epidemic of people pasting ChatGPT output onto forums (or, in your case, offering to). These posts are always received similarly to yours, so I'm skeptical that you're genuinely surprised by the reaction.

Everyone has access to ChatGPT. If we wanted its "opinion" we could ask it ourselves. Your offer is akin to "Hey everyone, want me to Google this and paste the results page here?". You would never offer to do that. Ask yourself why.

These posts are low-effort and add nothing to the conversation, yet the people who write them seem to expect everyone to be impressed by their contribution. If you can't understand why people find this irritating, I'm not sure what to tell you.


Nobody was being nasty. roarcher explained why people reacted the way they did.


...are you under the impression that you have an exclusive relationship with "him"? Everyone else has access to ChatGPT too.


Yes. Yes I was. Thank you for the wake up call. I was under the impression that he was talking only to me.


Android gestures need to be prevented for the game to work on mobile (dragging a finger from right to left goes back). Nice game though.


Worked fine for me on Firefox on Galaxy Note 20. I've never run into single finger back gestures that can start anywhere - seems like that would break almost everything that involves dragging.


Interesting. I don't get that on my android phone. What model and browser are you using?


I was working on an image editor in the browser: https://victorribeiro.com/customFilter

Right now the neat feature it has is the ability to run custom filters of varying window sizes on images, and to use custom formulas to blend several images.
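For anyone curious what a custom window filter does under the hood, here's a rough sketch (Python with NumPy for brevity, not the project's actual JavaScript):

    import numpy as np

    def apply_custom_filter(img, kernel):
        # slide an n x n window over the image and replace each pixel
        # with a user-supplied formula applied to that window (here, a
        # plain weighted sum, i.e. convolution)
        n = kernel.shape[0]
        pad = n // 2
        padded = np.pad(img, pad, mode="edge")
        out = np.empty_like(img, dtype=float)
        for y in range(img.shape[0]):
            for x in range(img.shape[1]):
                out[y, x] = (padded[y:y + n, x:x + n] * kernel).sum()
        return out

    img = np.random.rand(8, 8)         # toy grayscale image
    box_blur = np.full((3, 3), 1 / 9)  # 3x3 averaging window
    print(apply_custom_filter(img, box_blur).shape)  # (8, 8)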

I don't have a tutorial at hand on how to use it, but I have a YouTube video where I show some of its features

https://youtube.com/playlist?list=PL3pnEx5_eGm9rVr1_u1Hm_LK6...


At some point I would like to add more features, as you described in your article: feature detection, image stitching...

Here's the source code if anyone's interested https://github.com/victorqribeiro/customFilter


I vaguely remember XnView having these matrix-based custom filters.


> Now any body can write JavaScript code in Rust

Yeah, but do they want to?


The Galaxy S2 was the best phone I ever had. I think it was pure Android; if not, you had some minor apps from Samsung that you could delete. I think I left the Samsung train right around the S5 (which was supposedly waterproof, but wasn't). It was just downhill from there: Samsung accounts, locked phones full of shitty apps that you couldn't remove (without rooting)... I don't think I'll buy anything from Samsung in the near future.


I had the original Galaxy S, and that was very much not pure Android, and no other Galaxy I've had has been (so I presume the S2 wasn't either). Prior to Android, they had their own OS, and they wanted to continue that UI/UX rather than changing to Android's.


I was visiting my old university town when I noticed they have a movie theater now. I went in because I wanted to support it, and the only session at the time was Tron. I left as soon as my popcorn bucket was empty. What a mess!


Is the memory card actually more expensive than the controller?


The most upvoted comment on the latest YouTube ad (which I liked, by the way): where's the Linux support?

I support GOG, I've been buying games from them for a long time. But yeah, where's the Linux support?


That's a weird comment. I'm on GNU/Linux, I can just go to the website, log in, go to my library, and download any game I want. What part of it isn't working for you?


I'm just echoing the sentiment from the community

https://www.gog.com/wishlist/galaxy/release_the_gog_galaxy_c...


Thank you. The OP was so full of ads I couldn't even read it.

