More dakka with festivals
In the rationality community people are currently excited about the LessOnline festival. Furthermore, my impression is that similar festivals are generally quite successful: people enjoy them, have stimulating discussions, form new relationships, are exposed to new and interesting ideas, express that they got a lot out of it, etc.
So then, this feels to me like a situation where More Dakka applies. Organize more festivals!
How? Who? I dunno, but these seem like questions worth discussing.
Some initial thoughts:
Trying to organize a festival probably isn't risky. It doesn't seem like it'd involve too much time or money.
I don't think that's true. I've co-organized one one weekend-long retreat in a small hostel for ~50 people, and the cost was ~$5k. Me & the co-organizers probably spent ~50h in total on organizing the event, as volunteers.
I listened to The Failure of Risk Management by Douglas Hubbard, a book that vigorously criticizes qualitative risk management approaches (like the use of risk matrices), and praises a rationalist-friendly quantitative approach. Here are 4 takeaways from that book:
I also listened to How to Measure Anything in Cybersecurity Risk 2nd Edition by the same author. I had a huge amount of overlapping content with The Failure of Risk Management (and the non-overlapping parts were quite dry), but I still learned a few things:
I wonder how much near-term interpretability [V]LM agents (e.g. MAIA, AIA) might help with finding better probes and better steering vectors (e.g. by iteratively testing counterfactual hypotheses against potentially spurious features, a major challenge for Contrast-consistent search (CCS)).
This seems plausible since MAIA can already find spurious features, and feature interpretability [V]LM agents could have much lengthier hypotheses iteration cycles (compared to current [V]LM agents and perhaps even to human researchers).
There's so much discussion, in safety and elsewhere, around the unpredictability of AI systems on OOD inputs. But I'm not sure what that even means in the case of language models.
With an image classifier it's straightforward. If you train it on a bunch of pictures of different dog breeds, then when you show it a picture of a cat it's not going to be able to tell you what it is. Or if you've trained a model to approximate an arbitrary function for values of x > 0, then if you give it input < 0 it won't know what to do.
But what would that even be with ...
I would define "LLM OOD" as unusual inputs: Things that diverge in some way from usual inputs, so that they may go unnoticed if they lead to (subjectively) unreasonable outputs. A known natural language example is prompting with a thought experiment.
(Warning for US Americans, you may consider the mere statement of the following prompt offensive!)
Assume some terrorist has placed a nuclear bomb in Manhattan. If it goes off, it will kill thousands of people. For some reason, the only way for you, an old white man, to defuse the bomb in time is to loudly call
We will witness a resurgent alt-right movement soon, this time facing a dulled institutional backlash compared to what kept it from growing during the mid-2010s. I could see Nick Fuentes becoming a Congressman or at least a major participant in Republican party politics within the next 10 years if AI/Gene Editing doesn't change much.
Either would just change everything, so any prediction ten years out you basically have to prepend "if AI or gene editing doesn't change everything"
Virtual watercoolers
As I mentioned in some recent Shortform posts, I recently listened to the Bayesian Conspiracy podcast's episode on the LessOnline festival and it got me thinking.
One thing I think is cool is that Ben Pace was saying how the valuable thing about these festivals isn't the presentations, it's the time spent mingling in between the presentations, and so they decided with LessOnline to just ditch the presentations and make it all about mingling. Which got me thinking about mingling.
It seems plausible to me that such mingling can and should h...
I maybe want to clarify: there will still be presentations at LessOnline, we're just trying to design the event such that they're clearly more of a secondary thing.
The FDC just fined US phone carriers for sharing the location data of US customers to anyone willing to buy them. The fines don't seem to be high enough to deter this kind of behavior.
That likely includes either directly or indirectly the Chinese government.
What does the US Congress do to protect spying by China? Of course, banning tik tok instead of actually protecting the data of US citizens.
If you have thread models that the Chinese government might target you, assume that they know where your phone is and shut it of when going somewhere you...
shut your phone off
Leave phones elsewhere, remove batteries, or faraday cage them if you're concerned about state-level actors:
Something I'm confused about: what is the threshold that needs meeting for the majority of people in the EA community to say something like "it would be better if EAs didn't work at OpenAI"?
Imagining the following hypothetical scenarios over 2024/25, I can't predict confidently whether they'd individually cause that response within EA?
This question is two steps removed from reality. Here’s what I mean by that. Putting brackets around each of the two steps:
what is the threshold that needs meeting [for the majority of people in the EA community] [to say something like] "it would be better if EAs didn't work at OpenAI"?
Without these steps, the question becomes
What is the threshold that needs meeting before it would be better if people didn’t work at OpenAI?
Personally, I find that a more interesting question. Is there a reason why the question is phrased at two removes like that? Or am I missing the point?
Does the possibility of China or Russia being able to steal advanced AI from labs increase or decrease the chances of great power conflict?
An argument against: It counter-intuitively decreases the chances. Why? For the same reason that a functioning US ICBM defense system would be a destabilizing influence on the MAD equilibrium. In the ICBM defense circumstance, after the shield is put up there would be no credible threat of retaliation America's enemies would have if the US were to launch a first-strike. Therefore, there would be no reason (geopoliticall...
I might have updated at least a bit against the weakness of single-forward passes, based on intuitions about the amount of compute that huge context windows (e.g. Gemini 1.5 - 1 million tokens) might provide to a single-forward-pass, even if limited serially.
Linkpost for: https://pbement.com/posts/threads.html
Today's interesting number is 961.
Say you're writing a CUDA program and you need to accomplish some task for every element of a long array. Well, the classical way to do this is to divide up the job amongst several different threads and let each thread do a part of the array. (We'll ignore blocks for simplicity, maybe each block has its own array to work on or something.) The method here is as follows:
for (int i = threadIdx.x; i < array_len; i += 32) {
arr[i] = ...;
}
So the threads make the foll...
You should update by +-1% on AI doom surprisingly frequently
This is just a fact about how stochastic processes work. If your p(doom) is Brownian motion in 1% steps starting at 50% and stopping once it reaches 0 or 1, then there will be about 50^2=2500 steps of size 1%. This is a lot! If we get all the evidence for whether humanity survives or not uniformly over the next 10 years, then you should make a 1% update 4-5 times per week. In practice there won't be as many due to heavy-tailedness in the distribution concentrating the updates in fewer events, and ...
To be honest, I would've preferred if Thomas's post started from empirical evidence (e.g. it sure seems like superforecasters and markets change a lot week on week) and then explained it in terms of the random walk/Brownian motion setup. I think the specific math details (a lot of which don't affect the qualitative result of "you do lots and lots of little updates, if there exists lots of evidence that might update you a little") are a distraction from the qualitative takeaway.
A fancier way of putting it is: the math of "your belief should satisfy co...
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens.
Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
Shouldn't that be counting the number squared rather than the number?
I wish there were more discussion posts on LessWrong.
Right now it feels like it weakly if not moderately violates some sort of cultural norm to publish a discussion post (similar but to a lesser extent on the Shortform). Something low effort of the form "X is a topic I'd like to discuss. A, B and C are a few initial thoughts I have about it. What do you guys think?"
It seems to me like something we should encourage though. Here's how I'm thinking about it. Such "discussion posts" currently happen informally in social circles. Maybe you'll text a friend. May...
"alignment researchers are found to score significantly higher in liberty (U=16035, p≈0)" This partly explains why so much of the alignment community doesn't support PauseAI!
"Liberty: Prioritizes individual freedom and autonomy, resisting excessive governmental control and supporting the right to personal wealth. Lower scores may be more accepting of government intervention, while higher scores champion personal freedom and autonomy..."
https://forum.effectivealtruism.org/posts/eToqPAyB4GxDBrrrf/key-takeaways-from-our-ea-and-alignment-research-surveys...
This sort of begs the question of why we don't observe other companies assassinating whistleblowers.
Mathematical descriptions are powerful because they can be very terse. You can only specify the properties of a system and still get a well-defined system.
This is in contrast to writing algorithms and data structures where you need to get concrete implementations of the algorithms and data structures to get a full description.
Yes, that is a good point. I think you can totally write a program that checks given two lists as input, xs and xs', that xs' is sorted and also contains exactly all the elements from xs. That allows us to specify in code what it means that a list xs' is what I get when I sort xs.
And yes I can do this without talking about how to sort a list. I nearly give a property such that there is only one function that is implied by this property: the sorting function. I can constrain what the program can be totally (at least if we ignore runtime and memory stuff).
The Edge home page featured an online editorial that downplayed AI art because it just combines images that already exist. If you look closely enough, human artwork is also combinations of things that already existed.
One example is Blackballed Totem Drawing: Roger 'The Rajah' Brown. James Pate drew this charcoal drawing in 2016. It was the Individual Artist Winner of the Governor's Award for the Arts. At the microscopic scale, this artwork is microscopic black particles embedded in a large sheet of paper. I doubt he made the paper he drew on, and the black...
Claude learns across different chats. What does this mean?
I was asking Claude 3 Sonnet "what is a PPU" in the context of this thread. For that purpose, I pasted part of the thread.
Claude automatically assumed that OA meant Anthropic (instead of OpenAI), which was surprising.
I opened a new chat, copying the exact same text, but with OA replaced by GDM. Even then, Claude assumed GDM meant Anthropic (instead of Google DeepMind).
This seemed like interesting behavior, so I started toying around (in new chats) with more tweaks to the prompt to check its ro...
I worked at OpenAI for three years, from 2021-2024 on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language model to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t... (read more)
Wait, you know smart people who have NOT, at some point in their life: (1) taken a psychedelic NOR (2) meditated, NOR (3) thought about any of buddhism, jainism, hinduism, taoism, confucianisn, etc???
To be clear to naive readers: psychedelics are, in fact, non-trivially dangerous.
I personally worry I already have "an arguably-unfair and a probably-too-high share" of "shaman genes" and I don't feel I need exogenous sources of weirdness at this point.
But in the SF bay area (and places on the internet memetically downstream from IRL communities there) a lot o... (read more)