Pagan / Acc Chapter 2.2

The Unholy Seduction of Open Source Language

Apr 05, 2023

Impressive, very nice…

Everyone’s heard that ChatGPT grew to 1 million users in 5 days. But it’s also probably costing OpenAI ~$100K PER DAY to run it. Let’s talk about the hidden costs of API-based business models…

OpenAI had to do 3 things to accomplish this:

spend half a million dollars on electricity alone over 34 days of training
fine-tune the model [lobotomize :(]
create a beautiful UI so that people can do inference

Right now only big actors have the budget to do the first at scale, and they are secretive about how they do the second. Many people are working on the third, some on the second, and there are even occasional small attempts at the first!

BEWARE!

MANY PEOPLE SEE OPEN SOURCE AI AS THE DEVIL, THE PANDORA’S BOX, LEADING TO HUMANITY’S DEMISE. PROCEED WITH CAUTION

Pre-trained models

I’ll look through these here. These are mostly pre-trained models, and the links go to their GitHub repos unless specified otherwise.

pretrained mini BERT on Hugging Face
Alpaca 7B
Alpaca 13B
Gato from DeepMind (link goes to Wikipedia) - one of the first multimodal models, small (up to 1.18B parameters)
Open source imitation of Gato
GLM-130B - bilingual (English-Chinese) model
Llama - well-known model from Meta
Llama C++ (llama.cpp) - port of Llama inference to C/C++

GPT-J

GPT-J is an open source competitor to GPT. It’s the brainchild of EleutherAI, a research group that formed in 2020 on a Discord server. See

Pygmalion.ai

This project is focused on a small character model oriented toward romantic relationships, with 6B parameters, hosted on Hugging Face. Here’s their FAQ. They have their own Matrix chatroom.

RWKV-LM

An attention-free RNN with LLM-level performance. Technologically quite unique.

The person behind this is a powerhouse, with good support:

You are welcome to join the RWKV discord https://discord.gg/bDSBUMeFpc to build upon it. We have plenty of potential compute (A100 40Gs) now (thanks to Stability and EleutherAI), so if you have interesting ideas I can run them.

(excerpt from github readme).

Loaders

These let you run a pre-trained LLM on lower-spec hardware. Don’t ask me how that works.
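For the curious, one trick the loaders below have in common is offloading: the weights sit in slow, plentiful storage (CPU RAM or disk) and only the layer currently being computed occupies the scarce fast memory (the GPU). Here is a toy sketch of that idea. It is an illustration only, not the code of any project listed here; the class `OffloadedModel` and its fields are hypothetical names, and the "layers" are plain functions on a number rather than real tensors.

```python
from typing import Callable, Dict, List

Layer = Callable[[float], float]


class OffloadedModel:
    """Holds every layer in 'slow' storage, at most `fast_slots` in 'fast' memory."""

    def __init__(self, layers: List[Layer], fast_slots: int = 1) -> None:
        # Stands in for CPU RAM or disk: big, but slow to compute from.
        self.slow_storage: Dict[int, Layer] = dict(enumerate(layers))
        # Stands in for GPU memory: fast, but too small for the whole model.
        self.fast_memory: Dict[int, Layer] = {}
        self.fast_slots = fast_slots
        self.peak_resident = 0  # most layers ever held in fast memory at once

    def _load(self, index: int) -> Layer:
        # Evict the oldest resident layers until there is room for the new one.
        while len(self.fast_memory) >= self.fast_slots:
            oldest = next(iter(self.fast_memory))
            del self.fast_memory[oldest]
        self.fast_memory[index] = self.slow_storage[index]
        self.peak_resident = max(self.peak_resident, len(self.fast_memory))
        return self.fast_memory[index]

    def forward(self, x: float) -> float:
        # Run the layers in order, pulling each into fast memory just in time.
        for index in range(len(self.slow_storage)):
            x = self._load(index)(x)
        return x


layers = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0]
model = OffloadedModel(layers, fast_slots=1)
result = model.forward(5.0)  # ((5 + 1) * 2) - 3
print(result, model.peak_resident)
```

The price is speed: every layer has to be copied in before it can run, which is why real loaders put so much effort into scheduling and compressing those transfers.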

FlexGen

The open source FlexGen specializes in running pre-trained models on weak GPUs. They boast a great performance benchmark table there:

Huggingface Accelerate

A very broad project; read more about it on the Hugging Face wiki.

ZeRO Inference

An inference-focused loader from DeepSpeed. They boast smooth scaling up to thousands of GPUs. They talk a bit about distributed execution, but seemingly only in the context of many A100 units in a server farm, not spread worldwide.

Petals

Petals provides fine-tuning options through the SWARM architecture.

Max Ryabinin @m_ryabinin

We present SWARM, an efficient algorithm for model-parallel training across the Internet (e.g. with volunteers). Key advantages: 💎 Fault-tolerant ⚖️ Self-balancing on slow GPUs/networks 🐌 Works in low-bandwidth setups 📜 arxiv.org/abs/2301.11913 🖥️ github.com/yandex-researc…

Learn more about their memetics through their Twitter.

Max Ryabinin @m_ryabinin

Introducing Petals, a system for running and fine-tuning 100B+ LMs (e.g. BLOOM by @BigscienceW) over many volunteer devices. No cluster required — just take your own hardware and join others! (1/8) 🌐 petals.ml 📜 arxiv.org/abs/2209.01188 🖥️ github.com/bigscience-wor…

And grok their tech in the SWARM GitHub repo and the main Petals repo.
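To make the "fault-tolerant, many volunteer devices" claim concrete, here is a toy sketch of the general idea: the model is cut into consecutive blocks, each volunteer hosts a range of them, and a request is routed through whoever is online. This is an assumption-laden illustration, not Petals' or SWARM's actual routing algorithm; `Peer`, `plan_route`, and the peer names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    name: str
    first_block: int  # first model block this peer hosts (inclusive)
    last_block: int   # last model block this peer hosts (inclusive)
    online: bool = True

    def hosts(self, block: int) -> bool:
        return self.first_block <= block <= self.last_block


def plan_route(peers: List[Peer], num_blocks: int) -> List[str]:
    """For each block in order, pick the first online peer that hosts it."""
    route = []
    for block in range(num_blocks):
        candidates = [p for p in peers if p.online and p.hosts(block)]
        if not candidates:
            raise RuntimeError(f"no online peer hosts block {block}")
        route.append(candidates[0].name)
    return route


peers = [
    Peer("alice", 0, 1),
    Peer("bob", 2, 3),
    Peer("carol", 1, 3),  # overlaps with both, so she can cover for either
]
route_before = plan_route(peers, num_blocks=4)
peers[1].online = False  # bob drops off the network
route_after = plan_route(peers, num_blocks=4)
print(route_before)
print(route_after)
```

The overlap is what buys fault tolerance: as long as every block has at least one online host, the request still gets through, just along a different path.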

One thing you might notice: there are 15 contributors, most of them Russian.

Max Ryabinin, author of the tweets above, is actually a PhD student at Higher School of Economics in Moscow.

This is the project the USG does not want you to know about!

Let's evaluate the Petals project: it has a poor bus factor, due to a localized core team and only a small community to fall back on. We cannot rule out backdoor institutional support from the Russian government. If they weren't providing it, that would be a mistake on their part.

Sparsity as competing approach

Teknium @Teknium1

They got sparsity to work on Llama, for a 50% size reduction. Combined with quantization, you could run 7B model on like 2GB of ram. However, the downsides are it loses a lot more quality than 50%
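The "like 2GB" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 4-bit quantization (the tweet does not say which bit width is meant), counts exactly 7 billion weights, and ignores the overhead of storing which weights were removed, so treat it as a rough lower bound. The helper `model_memory_gb` is a hypothetical name.

```python
def model_memory_gb(num_params: float, bits_per_weight: int, density: float = 1.0) -> float:
    """Weight storage in GB (10^9 bytes), ignoring sparse-index overhead and activations."""
    bytes_total = num_params * density * bits_per_weight / 8
    return bytes_total / 1e9


dense_fp16 = model_memory_gb(7e9, 16)                 # unquantized 16-bit weights
dense_int4 = model_memory_gb(7e9, 4)                  # assumed 4-bit quantization
sparse_int4 = model_memory_gb(7e9, 4, density=0.5)    # 4-bit plus 50% sparsity

print(f"fp16 dense:  {dense_fp16:.2f} GB")
print(f"4-bit dense: {dense_int4:.2f} GB")
print(f"4-bit + 50% sparse: {sparse_int4:.2f} GB")
```

Under these assumptions the last number comes out just under 2 GB, which matches the tweet's estimate.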

The first paper on sparsity for models of this size seems to be Frantar & Alistarh (2023), where over 100B of 175B parameters are ignored while maintaining decent performance. That is quite promising, but sparsity can only be applied once the training is done.
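To show what "ignoring parameters" means mechanically, here is plain magnitude pruning on a handful of already-trained weights: the smallest ones by absolute value are set to zero. Note that this is a much simpler baseline than the method in the paper, chosen only because it fits in a few lines; `magnitude_prune` and the sample weights are made up for illustration.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_drop = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first `num_to_drop` get zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:num_to_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

Half of the entries become zeros that never need to be stored or multiplied, which is where the size reduction comes from.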

Tech-wise the small models and loaders are great but limited: to compete with OpenAI, decentralized training is necessary.

I Hate the Centralized Internet!!! | I Hate the Antichrist | Know Your Meme

These are all brave attempts, but we need more…

Follow these accounts on Twitter!

Links to other parts:

rats and eaccs 1

5 pagan/acc https://doxometrist.substack.com/p/pagan/acc-manifesto

Vitalist Essays
