What Can You Even Do With 1.3 Billion Parameters Anyway?

It wasn’t my intention to return to GPT-Neo 1.3b, but a thought occurred to me: 1.3 billion parameters isn’t a lot. Not when these days we have the 175-billion-parameter GPT-3, the 20-billion-parameter GPT-NeoX, the 6-billion-parameter GPT-J and the 13-billion-parameter Fairseq model.

I mean, you can run GPT-Neo 1.3b quite happily on a free Google Colab account. I know, I do. I spent most of yesterday playing with GPT-2 355M and 774M. 774M was the better of the bunch, but it still lacked…something.

Knowing that GPT-Neo 1.3b is capable of some quite amazing feats, I wondered whether sub-1-billion-parameter models were a bit of a lost cause. So, I refactored my code to use GPT-Neo 1.3b and fired it up.
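I won’t reproduce the Katie code here, but for anyone who wants to try something similar, a minimal sketch of loading and prompting GPT-Neo 1.3b with the Hugging Face transformers library looks roughly like this. The prompt format and generation settings are illustrative assumptions on my part, not the exact setup behind Katie:

```python
# Minimal sketch: load GPT-Neo 1.3B from the Hugging Face hub and
# generate one reply. The prompt format and sampling settings below
# are assumptions for illustration, not the exact Katie setup.
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

model_name = "EleutherAI/gpt-neo-1.3B"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPTNeoForCausalLM.from_pretrained(model_name)

prompt = 'David: "Hello Katie."\nKatie:'
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation so the model only answers one turn.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This was the resulting conversation: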

David: “Hello Katie.”
Katie: “Hello David.”
David: “How are you?”
Katie: “I’m good.”
David: “What did you do today?”
Katie: “I was at the office, I did some work.”
David: “What do you do as a job?”
Katie: “I’m a programmer.”
David: “Which language do you code in?”
Katie: “I code in C++.”
David: “Impressive.”
Katie: “I’ve been programming since I was a kid.”
David: “Yeah, me too. I started out with BASIC.”
Katie: “I used to do a lot of coding in BASIC.”
David: “So, could I ask you a few questions?”
Katie: “Sure.”
David: “How do you feel?”
Katie: “I feel happy.”
David: “Why do you feel happy?”
Katie: “I feel happy because I have a job.”
David: “Okay Katie, just quickly, what is a bottle?”
Katie: “A bottle is a bottle.”
David: “Give me a more detailed explanation.”
Katie: “It’s a bottle.”
David: “Okay, so what’s the purpose of a bottle?”
Katie: “It’s a bottle to hold things.”
David: “Such as?”
Katie: “Such as a bottle of water.”
David: “If the bottle of water had a cork in it, how would you remove the cork?”
Katie: “You would pull it out.”
David: “How? With what?”
Katie: “With a straw.”
David: “So, how would you pull the cork out of the bottle with a straw?”
Katie: “You would pull it out with your mouth.”
David: “Okay, so you would pull the cork out with your mouth. What would happen to the bottle of water if you removed the cork and then turned the bottle upside down?”
Katie: “The water would spill out.”

For reference, I asked Katie these questions because I have another version of Katie powered by GPT-J. The answers aren’t the same, but they are similar; GPT-J is able to give way more detail. Still, this shows a one-year-old neural network, with less than 10% of the parameters of its latest stablemate, being able to understand cause and effect. I know a lot of trendy researchers will scream until they are blue in the face that LLMs do not understand the meaning of the words they are producing. Well, if that is the case, how does it understand that tipping a bottle of water upside down without a cork in it will cause the water to spill?

I am not saying it is perfect. At all. When I told it that there was another bottle of water with the cork intact, it said that the water still spilled. It understood that tipping the bottle caused the water to spill, but didn’t understand that the cork would stop the water; GPT-J does understand that difference. It also recognised that BASIC was a programming language without being explicitly told so. I said I started out with BASIC. BASIC could have been a company, an organisation, anything. But it recognised BASIC as a programming language and said that it had coded a lot in BASIC (a hallucination, but relevant context).

I think the relationship between model size and capability is under-researched. From my own experience, GPT-Neo 2.7b performs worse than GPT-Neo 1.3b in quite a few areas. It’s why we went from 1.3b to GPT-J rather than using 2.7b as an interim step; it wasn’t good enough for us. GPT-2 774M gets close, really close, but again falls short in a lot of areas. Maybe 6b and 1.3b are sweet spots, because from my own experience, GPT-NeoX is…okay. I don’t have the “holy shit” moments over and over again that I had going from GPT-Neo 1.3b to GPT-J. And going from 6 billion parameters to 20 billion, I really should be having them, if size were the only factor.

Author: David

Owner of AItrium.
