December 18th outage

Discussion in 'Suggestions and Feedback' started by Homebuilde1, Dec 19, 2008.

  1. Let me first say that I like DASP; I tell people about DASP; I plan on staying with DASP. After last year's DDoS attack, another backbone provider was added to prevent a repeat. My assumption was that if one backbone provider went down, you could switch to the other and service would not be interrupted. So I was feeling pretty confident until now. You used to operate with one backbone provider; why was there so much latency in this case?

    Forgive my ignorance but what happened on your end?

    I don't understand why an email cannot be sent out letting people know you're down. I would rather tell my customer we are down than have them call me.

    The other question is what happens if a backbone provider goes down next week? I checked the forum and a day later there is still no explanation. While it is not DASP's fault that the backbone provider went down, is it DASP's fault that they didn't have a plan in place in the event that the backbone provider did go down? Maybe you did and this was it. If so, might you need another plan?

    I'm not being critical; things happen, I know. I'm a forgiving and understanding guy. But I need to explain to my customer's IT staff why this is not going to happen again. Please, what can I tell them?
     
  2. Hi,
    I don't have any answers or inside info but wanted to point out the Emergency Status site DASP set up.
    It's an outside server and they did keep it updated yesterday: http://216.128.30.225/index.html
    The notes there are gone now but it was updated pretty much each hour yesterday.
    Salute,
    Mark


    Technical Evangelist for DiscountASP.NET
    http://www.iis7test.com/webcasts/
    http://weblogs.asp.net/markwisecarver/
    http://blogs.windowsclient.net/wisecarver/default.aspx

    (Microsoft IT Usability http://msitusability.multiply.com/)

    Post Edited (wisemx) : 12/19/2008 5:52:10 PM GMT
     
  3. mjp


    We do have two separate backbone connections, and under normal circumstances, the second would accommodate all the traffic if one went down, as it did yesterday. But a couple of things conspired to create the slow conditions that we experienced when relying on one connection. First, the outage came at our busiest time of day, the early morning. Had it happened 12 hours later, I would hazard a guess that very few users would have noticed the problem. Second, the two lines do not have the overhead that they used to: we use more of the bandwidth on each line as time goes by, so one line is less able to maintain normal operation when the other is unavailable.

    We have been aware of the decreasing overhead for a few months now, and have been negotiating with a third provider to bring in another connection. This was going to be done this month, but we decided to put it off until the beginning of 2009 because a lot of the support engineers, both for the providers and for the router/firewall hardware that would have to be reconfigured to accommodate the third line, are on vacation. We did not want to open up a third line, experience a problem and not be able to get sufficient support. We ran into several issues bringing in the second line and learned that support from the provider is essential during this kind of service activation.

    So those issues converged to cause the problem yesterday. An outage of any kind during the busiest quarter of the day creates a kind of logjam effect: large numbers of users keep trying to connect, which creates more requests and slows things down further. So, long story short, we had an operating connection, but it was saturated with traffic, and that caused the delays.

    Once we bring the third line in we will be back to a scenario where there is plenty of overhead (unused bandwidth) on each line, allowing a failure of one to be gracefully absorbed by the remaining lines. In any event, you never expect a failure of a backbone provider. It is one of those services that you assume is going to be consistently available, but of course there is no such thing in this business, so we (and you) took a beating yesterday.

    I hope that explains the reasons behind the problem.
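
For anyone who wants to see the overhead argument as arithmetic, here is a rough sketch. It assumes equal-capacity lines and traffic that redistributes evenly when one line fails; the function names and the utilization figures are made up for illustration and are not DASP's actual numbers.

```python
def load_after_failure(num_links: int, utilization: float) -> float:
    """Per-line utilization on the surviving lines after one line fails.

    Assumes equal-capacity lines and evenly redistributed traffic
    (a simplification, not a description of any real network).
    """
    if num_links < 2:
        raise ValueError("need at least two lines to survive a failure")
    return utilization * num_links / (num_links - 1)


def max_safe_utilization(num_links: int) -> float:
    """Highest per-line utilization at which one failure is still absorbed."""
    return (num_links - 1) / num_links


# Hypothetical scenarios: (number of lines, utilization of each line)
for links, util in [(2, 0.40), (2, 0.60), (3, 0.60)]:
    after = load_after_failure(links, util)
    status = "saturated" if after >= 1.0 else "ok"
    print(f"{links} lines at {util:.0%} each: "
          f"{after:.0%} per surviving line ({status})")
```

Under these assumptions, two lines can absorb a failure only while each stays below 50% utilization, while three lines can each run up to about 67%. That is why adding a third line restores the overhead even if total traffic stays the same.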
     
