Master Server Maintenance Thread

Started by Led, June 26, 2016, 10:38:09 AM

Upstream failure event
Feb 01 2018 01:49:03 PM PT   All of our connections to one of our transit providers in Chicago (GTT) went down at approximately 3:31pm CST. They came partially back up at approximately 3:37pm CST but we are still seeing problems with them now.

This occurred across all routers on our end and no other upstreams failed, so this was an issue entirely within GTT. We are following up with them to ask what happened and for an ETA on a fix.
Anyder | Talent, Ops & Culture | SWBF & Player Engagement
Email: communityambassador@swbfgamers.com
SWBFSpy Discord: http://discord.swbfspy.com
SWBFSpy Info: http://info.swbfspy.com


INAP problems around 12:30pm CST
Mar 05 2018 11:01:10 AM PT   We saw some internal packet loss within one of our upstreams in Chicago, INAP, between approximately 12:30pm CST and 12:45pm CST. They have confirmed the problem within their network and say that they are still investigating it now.

The loss seems to have subsided, but while it was occurring, a portion of clients reaching our network through this specific upstream would have seen up to 90% packet loss, essentially making services at this location unreachable.

Upcoming routine upstream maintenance (Telia)
Apr 11 2018 01:43:46 PM PT   One of our upstream providers will be performing maintenance on one of its routers between 4:30am CDT and 5:30am CDT on April 13. This maintenance may cause brief periods of connectivity loss or increased latency for some clients to your service.

Chicago problem

May 31 2018 01:16:30 PM PT   We are currently investigating an issue with our Chicago PoP that seems to be breaking connectivity for most clients.

---

If anyone has any issue related to this, report it here, please!  :cheers:

Upcoming INAP maintenance early 7/17

Jul 16 2018 11:26:06 AM PT - Between 2:30am and 4:30am CDT on Tuesday, July 17

INAP will be performing maintenance on one of our links to them in Chicago. This may cause a brief connection interruption and/or routing reconvergence for customers reaching us, or being reached over, our INAP connections at this location (INAP is one of three transit providers that we use in Chicago, in addition to direct peering with many ISPs).

Brief upstream (Telia) issue a short time ago

[spoiler]Between approximately 8:13pm and 8:27pm CDT, one of our upstreams in Chicago (Telia) experienced some sort of problem with the router that we connect to. This brought down our links to that upstream and caused some lost packets and routing reconvergence for customers coming in, or going out, over that transit provider.

We have opened a ticket with Telia to ask about this and will continue to follow up with them.

Update @ 9:52pm CDT: Telia has responded that this was a "very brief major outage" and that they will send us an RFO at a later (unspecified) time.[/spoiler]

If anyone has any problem, let us know here! Thanks  :cheers:

Upcoming Xen upgrade with reboot @ 1:30am Central Time on September 14
Sep 13 2018 09:34:17 AM PT    We are planning to reboot the machine hosting your VDS at approximately 1:30am Central Time on September 14, in order to apply critical Intel microcode and Xen updates to address a new speculation-related vulnerability (Foreshadow/L1TF).

We have avoided the need for other reboots over the last year by using the Xen livepatching functionality to update running code. Because this specific new flaw requires the application of updated microcode at boot-time, and because its Xen code updates cannot be patched into a running system, that was not possible in this case. We plan to continue avoiding reboots on our end as much as possible in the future.

As part of the fix for Foreshadow/L1TF, we are also being forced to disable hyperthreading (SMT) globally for our machines. This means that customer virtual cores can no longer be assigned to exclusive hyperthreaded cores. However, our systems have such low overall CPU usage that customer VDSes with heavy CPU usage on specific virtual cores usually have them assigned to physical cores with very light-usage neighbor threads by the Xen scheduler already, essentially turning virtual cores into full physical cores. As a result, we expect (and have so far observed) minimal, if any, performance impact from the switch away from SMT. If you do notice reduced performance after the maintenance or see unusual CPU usage on your VDS (now also visible through newly-added CPU usage graphs on the "Server usage" page), please contact us, and we can explore a possible move to a different physical machine.

We also recommend that all customers take this opportunity to apply the latest security updates from their OS distributions. The vendors for all currently-supported operating systems have released patches for the new vulnerability.
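For anyone running a Linux guest who wants to confirm the OS-level patches took effect after updating, here is a minimal sketch. It assumes a kernel recent enough to expose the L1TF status file under `/sys/devices/system/cpu/vulnerabilities/`. The helper names `read_cpu_status` and `l1tf_report` are made up for illustration and are not part of any provider tooling. Note that it only reports what the guest kernel sees, not the state of the Xen host.

```python
from pathlib import Path


def read_cpu_status(name, base="/sys/devices/system/cpu"):
    """Return the stripped contents of base/name, or None if it cannot be read."""
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        return None


def l1tf_report(base="/sys/devices/system/cpu"):
    """Summarize the kernel's own view of L1TF mitigation and SMT state."""
    l1tf = read_cpu_status("vulnerabilities/l1tf", base)
    smt = read_cpu_status("smt/active", base)
    return {
        # Older kernels do not provide this file at all.
        "l1tf": l1tf if l1tf is not None else "unknown (file not present)",
        # "1" means SMT is active, "0" means it is not; None if unreadable.
        "smt_active": {"1": True, "0": False}.get(smt),
    }


if __name__ == "__main__":
    for key, value in l1tf_report().items():
        print(f"{key}: {value}")
```

If the `l1tf` entry comes back as unknown, the kernel most likely predates the fix, and installing the distribution's current kernel package is the next step.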

For the reboot, your VDS will need to be shut down for approximately 15-30 minutes. We will attempt to gracefully shut it down on our end through Xen, but sometimes this doesn't work perfectly, so we advise that you turn off applications that might write to disk before the maintenance event. If you are running Windows 2012 R2, please also note your VDS may need additional reboots to work properly, due to a bug within Xen related to how it boots up (that only occurs during bootup) -- our system will attempt to detect when this is the case and perform the extra reboot operations, but we recommend checking afterward yourself, as well.

---------------------------

Servers should be back up; server port order is off at the moment and will be adjusted some time late this weekend.
Quote from: Abraham Lincoln. on November 04, 1971, 12:34:40 PM
Don't believe everything you read on the internet

Upcoming move to a new machine after 3am CST on Nov. 18
Dec 17 2018 12:26:40 PM PT    Starting at approximately 3am CST on Nov. 18, we plan to begin moving all customers off of the machine hosting your server, to other machines at the same location. We are doing this in order to decommission this old machine, which is nearing the end of its useful life because it is no longer acceptably fast or power-efficient.

We expect the process to take at least a few hours, as we move the servers one by one. At some point in the morning, you will see your VDS go offline for a period of time as it is moved. The amount of time will depend on how much hard drive space you have used, from as little as 5 minutes to as much as a couple of hours. You will be able to monitor the progress of the move through the "Server control" page in the control panel.

If the timeframe of this maintenance event will not work for you, please let us know ASAP. We can individually move your server earlier than the maintenance event, for instance, if that works better for you.

I think that should say December 18th, i.e. tomorrow as I post this.

MS move completed; MS restarted  :cheers:

Some attacks in Chicago causing null-routes
May 20 2019 10:13:24 PM PT    We have seen a few attacks in Chicago today that have forced our system to implement emergency null-routes against the target customers' IP addresses. Null-routes are always a big deal for us to see because they mean that some clients will have experienced short bursts of partial packet loss (generally lasting for 5-10 seconds with each null) -- and we know how much even short periods of packet loss can hurt a game server or other latency-sensitive streaming-type service.

Since DDoS attacks are always getting larger, we have been exploring upgrade options in Chicago since last October, when it became clear that INAP (our primary upstream at most locations) has an inadequately-sized network and has no immediate or even long-term plans to upgrade it or otherwise improve their rudimentary systems for dealing with attacks (our own mitigation systems are highly robust). We have quotes in from other upstreams that we already partner with, and we chose an upgrade path some time ago. For the last few weeks, we have been waiting for INAP, which is also our facilities partner at this location, to work on some physical components of the upgrade.

We will continue to push the upgrades through as much as we can from our end. After upgrades are complete, this location will have significantly higher capacity and much more resistance to attacks.

We also have other upgrades in the pipe for late this year/early next year that will further increase our internal and external capacity in Chicago and other locations (including Seattle). We are always considering and implementing more upgrades!

Upcoming Xen upgrade with reboot @ 1:30am Eastern Time on June 20

You may need to reboot your servers depending on your location.  I had to restart the MS processes.
The BOBclan:  A Rich History


Quote from: Unit 33 on November 29, 2014, 03:44:44 AM
'Please, tell me more about the logistics of the design of laser swords being wielded by space wizards' - Some guy on the internet.

Upcoming router maintenance early 8/13 that might cause brief blip

Aug 12 2019 02:45:35 PM PT   

We will be shifting traffic to our secondary router in Chicago between 1am and 4am CDT on Tuesday, 8/13, in order to take the primary router offline and perform upgrades to it, including installing new 100G ports and upgrading its internal connection to our aggregation switch. We expect this to cause a brief (few-second-long) blip in connectivity for your service at this location, though there is the possibility of a slightly longer downtime if there is more reconvergence than expected.

Facility power maintenance on 8/16 and 8/20 will cause downtime
Aug 14 2019 12:01:10 AM PT   

We have been notified by the facilities provider in Chicago (Equinix) that they will be replacing half of their Automatic Static Transfer Switches on Friday, August 16, between 10pm CDT and 6am of the next day, and the rest of the transfer switches on Tuesday, August 20, between 10pm CDT and 6am of the next day.

August 15, 2019, 06:14:44 PM #59 Last Edit: August 15, 2019, 06:16:22 PM by Led
I will note that this maintenance will shut down the Master Server, so no hosting or playing will be possible during these maintenance windows.




Update @ 2:58pm CDT on 8/15: We have asked the site to migrate the network switches that will be impacted by the Friday night maintenance to alternate power in advance -- specifically, between 7am and 9am CDT on Friday morning, as that is a far lower-usage time than 10pm. Most customers will see a short (few-minute-long) connectivity blip during this window on Friday morning as a result.

We plan to schedule a maintenance for next week to move those, and other switches, back to the power feeds that have already been fixed. This will impact all customers and will likely occur on Tuesday morning at 7am. We will update this event with more specific details.