The Year of Endless Technical Problems

Diggus Bickus · Sep 10, 2025

Just had this happen by merely clicking "What's new" by the way.

Null · Sep 10, 2025

Vecr said:
If anything looks weird in here, don't run it. It's pretty straightforward though:

this has absolutely and completely and totally slaughtered the system. iowait is 25%+ with no cpu usage.
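For reference, the iowait share that `top` and `vmstat 1` (the `wa` column) report can be reproduced from two samples of the aggregate `cpu` line in `/proc/stat`. This is a minimal sketch assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal, then guest fields that are already counted in user/nice); `iowait_pct` is a hypothetical helper, not an existing tool.

```shell
# iowait_pct: given two aggregate "cpu" lines from /proc/stat taken some
# seconds apart, print the percentage of elapsed CPU time spent in iowait.
# Fields after "cpu": user nice system idle iowait irq softirq steal
# [guest guest_nice]; guest time is already counted in user/nice, so only
# fields 2-9 are summed.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    $1 == "cpu" {
      total = 0
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      n++
      t[n] = total
      w[n] = $6
    }
    END {
      dt = t[2] - t[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (w[2] - w[1]) / dt
    }'
}

# Usage on a live machine:
#   a=$(head -n1 /proc/stat); sleep 5; b=$(head -n1 /proc/stat)
#   iowait_pct "$a" "$b"
```

High iowait with near-zero user/system time means the CPUs are idle while at least one task waits on block I/O, which points at the storage path rather than at CPU load.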

Null · Sep 10, 2025

SNEED must be mq-deadline or the site eats shit.

oh yeah oh yeah oh yeah oh yeah
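A minimal sketch of checking and switching a device's scheduler at runtime, assuming SNEED's disks appear under `/sys/block/` (the device name `sda` below is a placeholder). The kernel marks the active scheduler in `[brackets]`, and writing a listed name to the same file switches it. Runtime changes are lost on reboot; the commented udev rule is one common way to persist them, and its `KERNEL` match is an assumption about device naming. The function names are hypothetical.

```shell
# active_scheduler: print the active I/O scheduler from the contents of
# /sys/block/<dev>/queue/scheduler, where the kernel brackets the one in use,
# e.g. "none [mq-deadline] kyber bfq" -> "mq-deadline".
active_scheduler() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# set_scheduler DEV SCHED [SYSFS_ROOT]: switch DEV to SCHED at runtime if the
# kernel lists SCHED as available. Needs root; does not survive a reboot.
set_scheduler() {
  local dev=$1 sched=$2 root=${3:-/sys/block}
  local f="$root/$dev/queue/scheduler"
  case " $(sed 's/[][]//g' "$f") " in
    *" $sched "*) printf '%s\n' "$sched" > "$f" ;;
    *) echo "scheduler '$sched' not available for $dev" >&2; return 1 ;;
  esac
}

# Usage (device name is a placeholder):
#   active_scheduler "$(cat /sys/block/sda/queue/scheduler)"
#   set_scheduler sda mq-deadline
# One common way to persist it, e.g. in /etc/udev/rules.d/60-ioscheduler.rules:
#   ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/scheduler}="mq-deadline"
```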

babaisyou · Sep 10, 2025

If SNEED is having issues with BFQ, then you might benefit from setting slice_idle to zero, per this kernel document. But I believe ZFS has its own IO scheduler, so it shouldn't be using one.

CHUCK should also have no IO scheduler on it as it is NVMe backed.
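Both suggestions can be checked or applied through sysfs. This sketch assumes BFQ's tunables live under `/sys/block/<dev>/queue/iosched/` (they only appear while BFQ is the active scheduler) and that the NVMe namespaces are named `nvme*`. The function names are hypothetical, and the optional root argument exists only so the functions can be tried against a scratch directory.

```shell
# bfq_slice_idle_off DEV [SYSFS_ROOT]: if DEV is using BFQ, set its
# slice_idle tunable to 0 so BFQ stops idling to wait for more I/O from the
# same process. Needs root; does not survive a reboot.
bfq_slice_idle_off() {
  local dev=$1 root=${2:-/sys/block}
  case $(cat "$root/$dev/queue/scheduler") in
    *"[bfq]"*) echo 0 > "$root/$dev/queue/iosched/slice_idle" ;;
    *) echo "$dev is not using bfq" >&2; return 1 ;;
  esac
}

# nvme_schedulers [SYSFS_ROOT]: print "device: scheduler line" for every NVMe
# namespace, to confirm each one shows [none].
nvme_schedulers() {
  local root=${1:-/sys/block} f dev
  for f in "$root"/nvme*/queue/scheduler; do
    [ -e "$f" ] || continue          # glob matched nothing
    dev=${f#"$root"/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(cat "$f")"
  done
}
```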

Vecr · Sep 10, 2025

Null said:
this has absolutely and completely and totally slaughtered the system. iowait is 25%+ with no cpu usage.

Is

Code:

$ zcat /proc/config.gz | grep CONFIG_HZ_1000
CONFIG_HZ_1000=y

on your machine?
Sorry for not telling you about that. You need a fast tick rate. Back to the drawing board, I'll try to test with a load that creates higher IOWAIT on a system I have access to.
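`/proc/config.gz` only exists when the kernel was built with `CONFIG_IKCONFIG_PROC`; Debian installs the build config as `/boot/config-$(uname -r)`, which is usually the easier place to look. A minimal sketch that tries the `/boot` file first and falls back to `/proc/config.gz`; `kernel_hz` is a hypothetical helper name.

```shell
# kernel_hz [CONFIG_FILE]: print the CONFIG_HZ value the running kernel was
# built with (e.g. 250 or 1000). Defaults to Debian's /boot/config-<release>,
# falling back to /proc/config.gz when that file is unreadable.
kernel_hz() {
  local cfg=${1:-/boot/config-$(uname -r)}
  if [ -r "$cfg" ]; then
    grep -m1 '^CONFIG_HZ=' "$cfg" | cut -d= -f2
  elif [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep -m1 '^CONFIG_HZ=' | cut -d= -f2
  else
    echo "no kernel config found" >&2
    return 1
  fi
}

# Usage:
#   kernel_hz            # prints 1000 on a CONFIG_HZ_1000=y kernel
```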

Do you have a particular NUMA setup? What program gets pinned to what node, that sort of thing. That might be useful to know for testing.

Ah yeah, the scheduler behaves differently when IOWAIT is high but CPU is low.
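For the NUMA details asked about above, a sketch of what could be collected, assuming the `numactl` package (which provides `numactl` and `numastat`) is installed and that the database process is named `mariadbd` (older MariaDB builds use `mysqld`). `numa_nodes` is a hypothetical helper that only counts the node directories sysfs exposes.

```shell
# numa_nodes [SYSFS_ROOT]: count the NUMA nodes the kernel exposes as
# /sys/devices/system/node/nodeN directories.
numa_nodes() {
  local root=${1:-/sys/devices/system/node} n=0 d
  for d in "$root"/node[0-9]*; do
    [ -d "$d" ] && n=$((n + 1))
  done
  echo "$n"
}

# Fuller picture on a live box:
#   numactl --hardware       # nodes, CPUs and memory per node
#   numastat -p mariadbd     # per-node memory of a process (name is an assumption)
#   taskset -cp <PID>        # CPU affinity of a running process
```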

Null · Sep 10, 2025

Vecr said:
on your machine?

No.

I've reconfigured MariaDB to be less resource hoggy. I am seeing windows where the site is blazing fast and I am trying to figure out how.

Lian Xing · Sep 10, 2025

>an unexpected database error occurred.
I thought we'd lost the ability to sneed forever.

Harvey Danger · Sep 10, 2025

Lian Xing said:
>an unexpected database error occurred.
I thought we'd lost the ability to sneed forever.

Buckle up cowboy, we're testing in PROD.

Null said:
SNEED must be mq-deadline or the site eats shit.

That's weird. You said SNEED is an SSD; have you tried explicitly setting it to none already?

Null · Sep 10, 2025

Harvey Danger said:
That's weird. You said SNEED is an SSD; have you tried explicitly setting it to none already?

yes as previously explained it killed the site

gentoofag · Sep 10, 2025

Do you have compression enabled on any filesystems or zfs datasets? If so turn it off right now. I had a problem where my system would stall for a minute after using a Windows VM. Turns out it was caused by f2fs's kernel threads compressing all the writes that were done to the VM image.
A way to check for similar issues is to show kernel threads in htop (shift-k) and look for ones with high priority and CPU usage. Also keep a window open with dmesg -w -H and watch for anything interesting to show up.
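On the ZFS side, a sketch for spotting datasets with compression enabled, assuming the `zfs` CLI's scripted output (`-H`, tab-separated fields) fed into a hypothetical `compressed_datasets` filter. Note that `zfs set compression=off` only affects blocks written after the change; existing data stays compressed until it is rewritten. The commented `ps` line is a non-interactive stand-in for htop's kernel-thread view, relying on kernel threads being children of kthreadd (PID 2).

```shell
# compressed_datasets: read `zfs get -H -o name,value compression` output on
# stdin and print every dataset whose compression is anything but "off".
compressed_datasets() {
  awk -F'\t' '$2 != "off" { print $1 " (" $2 ")" }'
}

# Live usage (needs the zfs CLI):
#   zfs get -H -o name,value compression | compressed_datasets
#   zfs set compression=off <pool/dataset>    # new writes only
# Kernel threads sorted by CPU usage:
#   ps --ppid 2 -o pid,ni,pcpu,comm --sort=-pcpu | head
```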

Vecr · Sep 10, 2025

I didn't make much progress. I don't want to bother you more. If you get a CONFIG_HZ_1000=y kernel and details on your NUMA setup (if you have one) I can try to help again.

SCV · Sep 10, 2025

I've had some more thoughts about this since yesterday. The correct approach is still to add monitoring until the problem becomes apparent, but seeing as we're doin' the cowboy thing, I have a few things to try that so far haven't been suggested (in this thread, at least).

Have you tried turning PCIe power management off? Just add "pcie_aspm=off" to the kernel command line in GRUB, update GRUB, and reboot. A few times I've seen buggy power management tank performance or imitate a flaky PCIe device or connection. And since you have NVMe drives...
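On Debian the extra kernel parameters go in `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, and `update-grub` regenerates the boot config. A minimal sketch of adding the parameter idempotently; `add_kernel_param` is a hypothetical helper that assumes the variable sits on a single double-quoted line and that the parameter contains no `/` or regex metacharacters.

```shell
# add_kernel_param FILE PARAM: append PARAM to GRUB_CMDLINE_LINUX_DEFAULT in
# FILE (normally /etc/default/grub) unless it is already present.
add_kernel_param() {
  local file=$1 param=$2
  # Already there (preceded by a quote or space, followed by one)? Do nothing.
  if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$file"; then
    return 0
  fi
  # Insert PARAM just before the closing double quote.
  sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Usage (as root):
#   add_kernel_param /etc/default/grub pcie_aspm=off && update-grub
# After the reboot:
#   grep -o pcie_aspm=off /proc/cmdline
#   lspci -vvv | grep -i aspm
```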

I assume the server has ECC memory, but do you have rasdaemon set up so you'll actually see ECC (and other machine check) errors? ECC errors will tank performance, but they can be sporadic depending on what (if anything) is using that memory, or even on memory temperature, and the machine won't necessarily crash if ECC can correct them. Since the site hasn't been down for several days recently, you probably haven't run memtest, but at this point it might be worth it. You MUST use the free version of memtest86+ from the company website. The one bundled with most Linux distros WILL NOT REPORT CORRECTED ECC ERRORS.
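Besides rasdaemon, the kernel's EDAC subsystem keeps per-memory-controller error counters in sysfs. A sketch that prints them, assuming an EDAC driver for the platform is loaded (otherwise `/sys/devices/system/edac/mc/` has no `mc*` entries); `ecc_counts` is a hypothetical name. A `ce` count that keeps rising is the kind of silent, corrected-error activity described above.

```shell
# ecc_counts [EDAC_ROOT]: print corrected (ce) and uncorrected (ue) error
# counters for each memory controller registered with EDAC.
ecc_counts() {
  local root=${1:-/sys/devices/system/edac/mc} mc
  for mc in "$root"/mc[0-9]*; do
    [ -d "$mc" ] || continue         # no EDAC driver loaded / glob unmatched
    printf '%s ce=%s ue=%s\n' "${mc##*/}" "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
  done
}

# With rasdaemon installed and running:
#   ras-mc-ctl --summary     # totals of logged memory/MCE errors
#   ras-mc-ctl --errors      # individual logged events
```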

I know you said you use Debian, but we are on a new-ish server. What kernel version are we on currently? If it's older, a yolo upgrade to the newest LTS might just werk (YeeHaw!)
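To compare the running kernel against a target version, `sort -V` ordering is enough. In this sketch `version_ge` is a hypothetical helper, and `6.12` in the usage comment is only a placeholder target, not a claim about which release is the newest LTS. On Debian, `uname -r` reports the ABI name (e.g. `6.1.0-NN-amd64`) rather than the patch level, so only the major.minor part is meaningful here.

```shell
# version_ge A B: succeed if version A >= version B under `sort -V` ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Usage:
#   cur=$(uname -r | cut -d- -f1)     # "6.1.0-NN-amd64" -> "6.1.0"
#   version_ge "$cur" 6.12 || echo "running $cur, older than 6.12"
```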

I presume you've checked dmesg for anything suspicious. But giving us a copy of dmesg to look at might yield some clues.
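A rough sketch of narrowing the kernel log down before posting it. The keyword list in the hypothetical `suspicious_kmsg` filter is an assumption about what tends to appear around storage stalls (NVMe timeouts, hung-task warnings, machine-check and PCIe AER messages), and it will be noisy; the full `dmesg` output is still the better thing to share.

```shell
# suspicious_kmsg: filter kernel log lines on stdin down to ones that often
# accompany I/O stalls or hardware trouble (case-insensitive substring match).
suspicious_kmsg() {
  grep -iE 'nvme|i/o error|timeout|hung_task|blocked for more than|mce|edac|aer|reset'
}

# Live usage:
#   dmesg -T --level=err,warn | suspicious_kmsg
#   journalctl -k -b -p warning | suspicious_kmsg
```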

Edit: I feel like this must've been checked, but during the slowness there's no packet loss, right?
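One quick way to answer the packet-loss question is to run a long ping during a slow spell and read its summary line. This sketch assumes the iputils `ping` summary format (`N packets transmitted, M received, X% packet loss`); `packet_loss` is a hypothetical helper and `<host>` a placeholder.

```shell
# packet_loss: read `ping -c N <host>` output on stdin and print the loss
# percentage from its summary line, e.g. "... 2% packet loss ..." -> 2.
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Live usage, while the site is slow:
#   ping -c 100 -i 0.2 <host> | packet_loss
#   mtr -rwc 100 <host>      # per-hop loss report
```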

Null · Sep 10, 2025

It's so weird, I've had this burning all consuming desire to fix the site all week, I sat down and did 6 hours of work on it today, and almost as soon as I got it working great, Charlie got shot.

Looseleaf Paper · Sep 10, 2025

Null said:
It's so weird, I've had this burning all consuming desire to fix the site all week, I sat down and did 6 hours of work on it today, and almost as soon as I got it working great, Charlie got shot.

Is the traffic from the shooting killing the clearnet? I had to dust off my Tor Browser to post this.

Null · Sep 10, 2025

No. DNS issue. Will resolve itself.

The Noise · Sep 11, 2025

how it feels to finally be able to use 3000+ page threads again without the site shitting itself and doing nothing

thanks null

skunt · Sep 18, 2025

any updates on this @Null, did you manage to fix it?
did anything here help?

Provably Wrong · Sep 18, 2025

In last week's MATI, Josh said AI suggested the problem was lots of requests each allocating and releasing lots of memory, and reducing that happened to solve the issue. Not because there wasn't enough memory, but because you can't do an unlimited number of these memory operations at once, and apparently we hit the limit because Josh was feeling RAM-rich and raised the spending limits like someone getting his first credit card.

At this rate he might just abandon us and post to his AI so he can get the answers he wants, and just have AI Josh shitpost in a random thread every other day.
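The summary doesn't say which limits were raised. If they were MariaDB's per-connection buffers (an assumption; they could just as well be PHP or other application limits), a common rough estimate of worst-case usage is the global buffers plus `max_connections` times the per-connection buffers. The sketch below applies that estimate to `SHOW GLOBAL VARIABLES` output; which variables it puts in each bucket is a simplification, and `worst_case_mib` is a hypothetical name.

```shell
# worst_case_mib: read "name<TAB>value" lines, as printed by
#   mariadb -N -B -e "SHOW GLOBAL VARIABLES"
# on stdin, and print a rough worst-case memory estimate in MiB:
#   global buffers + max_connections * per-connection buffers.
# Real usage can differ (several join buffers per query, in-memory tmp tables).
worst_case_mib() {
  awk -F'\t' '
    $1 == "innodb_buffer_pool_size" || $1 == "innodb_log_buffer_size" ||
    $1 == "key_buffer_size" || $1 == "query_cache_size" { global += $2 }
    $1 == "sort_buffer_size" || $1 == "join_buffer_size" ||
    $1 == "read_buffer_size" || $1 == "read_rnd_buffer_size" ||
    $1 == "thread_stack" || $1 == "tmp_table_size" { per_conn += $2 }
    $1 == "max_connections" { conns = $2 }
    END { printf "%d\n", (global + per_conn * conns) / 1048576 }'
}

# Usage:
#   mariadb -N -B -e "SHOW GLOBAL VARIABLES" | worst_case_mib
```

Raising any per-connection buffer multiplies by `max_connections`, which is why a change that looks small per request can add up to a lot of allocation churn under load.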

Margo Martindale · Sep 19, 2025

It still feels kinda slow, like half the time the reaction image icons are not even loading for me, and some images

MerelyAPlateOfSpaghetti · Sep 19, 2025

Margo Martindale said:
It still feels kinda slow, like half the time the reaction image icons are not even loading for me, and some images

I don't disagree, but it's been reliably slow. No more random 504 errors, very few "clicked a link and it took 15 seconds to load" issues. That's a major step in the right direction.
