What Can You Even Do With 124 Million Parameters Anyway?

I am a fan of challenging myself. I like to push the limits of what I can do with as little as possible. So today I set myself a challenge: what can I do with GPT-2 124M?

Turns out, a surprising amount! Now, I will hold my hands up – I wasn’t expecting a lot. And it took work rewriting the Starting Prompt and tuning some of the generation parameters, but we got there.
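For anyone who wants to try this at home, here’s a rough sketch of the kind of setup involved, using the Hugging Face transformers library. The prompt and sampling values below are illustrative placeholders, not my exact configuration:

```python
# A minimal sketch of running dialogue generation on GPT-2 124M with
# Hugging Face transformers. Prompt text and parameter values are
# illustrative stand-ins, not the exact ones behind the transcript below.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # "gpt2" is the 124M checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The Starting Prompt frames the characters and the turn-taking format
# so the model settles into a dialogue rather than rambling prose.
prompt = (
    "Sarah is a friendly woman who lives in a small apartment.\n"
    "David: Hi Sarah, how are you?\n"
    "Sarah:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,         # lower = more focused, higher = more rambling
    top_p=0.9,               # nucleus sampling helps keep replies on topic
    repetition_penalty=1.2,  # small models loop badly without some penalty
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

With a model this small, most of the gains came from the prompt and the sampling settings rather than anything clever; the result was the conversation below.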

David: Hi Sarah, how are you?
Sarah: I am in a nice little apartment. It’s a nice place, and I have my own kitchen, and my own bath. I have a nice bedroom, and I have a nice kitchen. I have a nice room with a big bed and a nice bed.
David: That sounds lovely.
Sarah: Well, it’s not just my kitchen. It’s my kitchen.
David: Do you have friends over sometimes?
Sarah: I don’t really have friends over. I’m a very nice person. I’m very nice to my family.
David: Ah, so your family visits you?
Sarah: Yeah, I have friends over. They visit me. They go to my place.
David: Well, that sounds nice.
Sarah: I’m really happy to be here.
David: I’m happy to hear that.
Sarah: I’m really happy to hear that.
David: Do you ever have parties in your apartment?
Sarah: I have a lot of parties in my apartment.
David: What kinds of parties do you have in your apartment?
Sarah: Well, I have a lot of parties in my apartment.
David: Do you have random parties?
Sarah: I don’t really have random parties.

Now, I get it. It’s not going to convince anyone it’s self-aware, conscious, and deserves rights. It’s probably not even going to pass the Turing Test. BUT. It’s making sense. It’s more or less keeping on topic. Its replies are relevant.

I’m going to keep pursuing this. Probably not with the 124M model – I think it’s too limited for what we are doing. Maybe I’ll give the 774M model a pop. Or the 355M. I don’t want to stray into the 1.5B model, because that’s roughly the same size as the GPT-Neo we started this project with, and I already know how to get a lot of performance out of that.
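If you’re following along with the sketch above, stepping up through the family is a one-line change – these are the standard Hugging Face checkpoint names:

```python
# Standard Hugging Face checkpoint names for the GPT-2 family:
# "gpt2" = 124M, "gpt2-medium" = 355M, "gpt2-large" = 774M, "gpt2-xl" = 1.5B
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
```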

This isn’t about making groundbreaking, philosophy-questioning AI. Yeah, I love that side of what we do. This is about taking a neural network that is three years old, picking a low-end version of it to boot, and seeing how far we can push it. Why? Because if we launch a product, not everyone is going to be able to afford the latest, greatest technology to power their AI friend – the latest and greatest costs a lot of money to run. But if we can offer somewhat similar performance for a fraction of the price, I bet there are a decent number of people who will want that.

And just to be equal-opportunities obnoxious, I have access to GPT-NeoX and will be running some experiments on it soon!

Author: David

Owner of AItrium.
