> …and bring an end to human civilization
This is interesting, and a good explanation of AI risks for a naive reader (e.g. me).
So here's another naive question: how difficult, or undesirable for other reasons, would it be to introduce a powerful but conceptually simple safety rail into the system, one that would override the risks of maximising some goal X? I don't want to sound 100 years old (I'm slightly less) by bringing up outdated sci-fi notions, say the Laws of Robotics Asimov proposed in his Robot stories, but something along the lines of "maximise profits BUT ONLY IF it doesn't cause the death or severe disability of any more people than would have died or been injured otherwise"? Maximise paperclip output BUT ONLY IF it doesn't kill anyone extra in the process?
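To make the "BUT ONLY IF" idea concrete, here is a toy sketch of it as a side-constraint on plan selection: the agent maximises profit only among plans that cause no deaths beyond the baseline, and refuses to act if none qualify. This is purely illustrative (the names `Plan`, `expected_profit`, `expected_deaths`, and `best_allowed_plan` are all made up, and real AI systems don't expose clean "expected deaths" numbers like this), but it shows the shape of the proposal: a hard filter applied before optimisation, not a penalty traded off against profit.

```python
# Toy illustration of a hard side-constraint, NOT a real safety mechanism.
# All names here are invented for the example.

from dataclasses import dataclass

@dataclass
class Plan:
    expected_profit: float
    expected_deaths: float  # deaths beyond the no-action baseline

def best_allowed_plan(plans, baseline_deaths=0.0):
    """Maximise profit, BUT ONLY among plans causing no extra deaths."""
    allowed = [p for p in plans if p.expected_deaths <= baseline_deaths]
    if not allowed:
        return None  # refuse to act rather than violate the constraint
    return max(allowed, key=lambda p: p.expected_profit)

plans = [Plan(100.0, 0.0), Plan(1e9, 3.0), Plan(250.0, 0.0)]
print(best_allowed_plan(plans).expected_profit)  # 250.0, not the lethal 1e9 plan
```

Note the design choice: the constraint is lexicographic (it filters first, then optimises), so no amount of profit can buy its way past it. The standard alignment worry is that everything then hinges on the agent's estimates of `expected_deaths` being honest and complete.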
Or perhaps, instead of maximising profits (paperclip output, the number of new drugs invented, the pace of scientific discovery as a whole, etc.), set a fixed goal that appears safer. Not "produce as many paperclips as possible" but "produce X paperclips". Sure, X might be lower than what could be achieved, but it's still likely high enough? In other words, an artificial limit on the maximising, replacing it with a manually adjusted figure?
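The fixed-target idea above is what the alignment literature usually calls satisficing, and the difference from maximising can be sketched in two lines of toy utility functions (again, the names and numbers are illustrative assumptions, not anyone's actual proposal):

```python
# Toy comparison: open-ended maximiser vs. fixed-target satisficer.

def maximiser_utility(clips: int) -> int:
    return clips  # unbounded: more paperclips is always strictly better

def satisficer_utility(clips: int, target: int = 10_000) -> int:
    return min(clips, target)  # flat beyond the target: no reward for overshooting

# Past the target, extra output earns the satisficer nothing:
print(satisficer_utility(10_000))     # 10000
print(satisficer_utility(1_000_000))  # still 10000
```

The manually adjusted figure X is just the `target` parameter here. The utility curve going flat at X is exactly the hoped-for safety property: the incentive to take extreme actions for marginal extra output disappears once the quota is met.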