Network Outage

Discussion in 'Suggestions and Feedback' started by bwaops, Oct 9, 2006.

  1. Last Friday, when all of our websites and email (which you host) went down, I had no means to contact you to find out what was going on. I ended up pulling an invoice and calling your billing department. The woman on the phone was as helpful as possible and told me that there was a network outage at your data center in Irvine. Fortunately, the problem was corrected within 30 minutes or so. It would be really nice if you could provide some type of phone number that people can call for informational purposes. I'm not suggesting telephone support... just some type of recorded message.

    Chad Morrett
    Operations Manager
    Better World Advertising
     
  2. mjp

    Telephone system status messages tend to be problematic. More often than not, updating phone messages is a cumbersome manual process that only a small handful of staff members are trained to handle. If no one is in the office who knows how to update the phone system at the moment an outage is occurring, your callers hear your standard "all is well" message, which, of course, negates the purpose of the line altogether. It's tough to administer in a timely and consistent way.

    We posted a notice/explanation regarding the outage in Control Panel once the network was functioning normally, and left it up there for a couple of days so everyone could have a chance to read it.

    The fact that you cannot pick up the phone to ask about a problem may seem like a drawback of email-only support. But honestly, had you been able to call during the outage, the rep answering the phone would not have been able to tell you what was happening. We didn't know exactly what the problem was until it was remedied, so I would hazard a guess that most people would not have been satisfied by calls to support at that time.
     
  3. There is an easy solution to this. Get a separate hosting account (offsite: different data center, different network) for emergency-only support purposes. It can be nothing but a simple HTML page with status information, and possibly some free forum software so people can discuss the issue when there is one.

    Call the other domain something like 'discountaspoffsite.com'. Make sure nothing related to it is hosted at discountasp.net (nameservers, etc.).
     
  4. Patrick is right. That's what my other host does. I love it. I love the RSS feed, I love their honesty.

    I think it's a good idea with little cost, and it wouldn't take much to staff, especially if you have 24/7 support anyway. Look into it!

    Cheers, Ed
     
  5. mjp

    I like DreamHost's honesty too.

    Though it's too bad they have so many problems to be honest about. ;)
     
  6. sawrightnet (Webmaster/Designer)

    It really bothers me that DiscountASP is taking this problem so lightly. Your responses serve only DiscountASP and not your customers. You note how tough it is for you and your company to notify us of problems, but you never address or even acknowledge what we are going through with our customers.

    Your comment about your tech department not knowing how to post a simple message to a phone system really scares me. If they can't operate voice mail, what are they doing in tech?

    Your comments about how many problems other ISPs are having weren't funny. My sites have been down more at DiscountASP than at any other ISP I have ever used. I had fewer problems with email when I operated my own email server on an NT4 server running on a standard DSL connection.

    Your customers need information. "I don't know" is better than nothing. Telling us about a problem after it is fixed does us no good.

    Thanks for listening.
     
  7. For about two years I have been asking, through tech support and other means, that DiscountASP offer us some kind of status, either by web or by phone. They continue to say this is a good idea but then naysay every idea. Basically, after two years, I have come to the conclusion that DASP doesn't really want to share their problems with us. No other conclusion is possible: they just really don't care about your problems with your customers and will not / do not want to let you know that problems are occurring.

    Email is down today, there is no notification in the admin panel, and I have no means of communicating the problem to my customers. Why is it down? When could it be up? I am in the dark, and maybe that is why DASP doesn't want to share anything with us: their tech staff is in the dark too. They probably just pull the plug and let everything reboot for any problem.
     
  8. mjp

    That's not what I said.

    While it may not be rocket science to update messages on most office phone systems, it is not usually a simple or straightforward task. And you don't give the kind of access necessary to make changes to the phone system to every member of your staff. You give it to people who are trained on the system, and that, typically, is a small number of people.
     
  9. OK, so you made your point, but you failed to answer the direct question, which I have asked many times: what is DASP's plan, and when will the plan for notification of system outages be put in place?

    DASP NEVER proactively notifies us of current system problems and I think every single customer would appreciate some real-time notification with problem, solution and time-frame.

    What is the plan? What is the time frame?
     
  10. PRB

    I was out of town for a long weekend, what was the total down time for sites?
     
  11. First, you guys at DASP always pooh-pooh any user's suggestion for a communication method. Everyone recognizes that each has a drawback, but anything would be better than what you have now. I would rather have a "drawback" to a communication method than no communication at all. I'm surprised you don't see that.

    Second, you said that a control panel network status item would be looked into back in October, see the thread on V2.0 Mail Migration problems, http://community.discountasp.net/default.aspx?f=15&m=13374. That was over a month ago and still nothing. Why didn't you start looking into this back then when you said you would? I mean, no offense, but this thread went on for days before you offered to look into a communication method; when you had already said you were going to do this in October. This seems a little odd to me. Are you serious about it this time?

    Third, I think you may have answered your own question. The downtime will dictate whether a notice is needed. Anything that is unresolved in 15 minutes would warrant a notification. If you guys solve the problem within that time frame, then no notice. The notification system puts a little more pressure on the tech people to resolve the problem quickly (i.e., within the notification time limit).

    Finally, to answer PRB's question, email was down for about 3-4 hours on Saturday morning (12/2/06), starting around 6-7 AM PT. I think an email notification was sent when the service came back up.
     
  12. mjp

    I don't necessarily disagree with any of your suggestions for status or communication methods, though I have to point out that every method has drawbacks, and no method is going to be 100% effective or satisfactory for every user. Different people want different methods of communication. There are three different methods mentioned in this thread alone.

    Having said that, I will pursue the issue on this end, and work toward establishing a method/policy that is satisfactory for the majority of users, as well as being workable for staff. A Control Panel message is the most likely solution, but I will look into the viability of other methods as well. I think we can rule out a telephone message in our case, for various reasons.

    We will also have to determine a threshold over which a message is deployed. In other words, if a problem is discovered and immediately fixed, do you make a notification? Someone said, "Telling us about a problem after it is fixed does us no good," but that's when the majority of users will find out about the majority of problems. Something like the mail issue on Saturday would certainly warrant a notice, as it was ongoing for a few hours.
     
  13. Here we are, six months later, and no comment by DASP regarding this issue.

    There has been no mention of a plan to notify DASP customers of unexpected downtime. Just like the time before, when DASP said something was in the works and it turned out nothing was, DASP has no plans to make its customers' lives easier with some kind of downtime notification process.

    I think it is ridiculous and I find it very unprofessional that DASP doesn't care at all about this issue.
     
  14. mjp

    Are you going to post this message in every thread about an outage?
     
  15. Nope, just in the two that I participated in, and the ones in which this topic was raised.
     
  16. Takeshi Eto (DiscountASP.NET Staff)

    Jason

    We update the Status Page when we have new information.

    Eric
    DiscountASP.NET
     
  17. Takeshi Eto (DiscountASP.NET Staff)

    Our team is contacting the FBI.

    The bottleneck is outside of our hosting system: the line coming into our cage is being saturated by the attack. That line is equivalent to 667 T1 lines. The magnitude of the attack is the largest we have ever seen. So we are working on this issue with our upstream provider's senior security team to implement various filtering schemes, some of which take special equipment or specialized personnel; that's why it takes time to set up. They tried something that looked promising, and it didn't work. We'll continue to update the status page with more details when we have more information.

    When we post something to the status page, that is what we are seeing and believe is true about the network status. When the attack diminished for an hour or two and things were coming back online, everything looked good based on past behavior, so we posted that. Well, after we posted that notice, the attack started to intensify again, so we posted that we are being attacked again. We are not trying to pull a fast one here. We are just being honest.
     
  18. mjp

    I don't think you fully comprehend the scale of the attack.

    When our upstream provider (one of the world's largest) attempted to filter the traffic going to our network, it slowed down the entire data center and they had to remove the filter. This attack is on the same scale as one that brought down Yahoo.

    We weren't prepared for it, that is correct. But I don't think there's any shame in admitting that we weren't prepared for an attack on a scale that most people who attempted to help us had never seen. Who would be prepared for that?

    Now that we know what we could face in the future, we are prepared.
     
  19. I haven't participated on this forum before now, and until yesterday I would have agreed with the great reviews DiscountASP has been getting. But I never had a problem until yesterday.

    I think it is pathetic that there is no phone support or messaging, and the lack of timeliness in getting an "Emergency" web page up and running is scary. Then, once it was up the page told me nothing.

    You owe all of your customers better than this. Unfortunately, I think many of us will be gone before you figure out how to type "We're sorry".

    Please give me some reason to stay with you... some bit of meaningful information... a promise that this sort of crippling downtime won't occur again. Otherwise I am outa here.
     
  20. Please tell us the FBI has been notified and is investigating.
     
  21. MJP/Eric/DASP,

    We don't want to be left in the wind. The www.daspstatus.com page works great! However, I didn't know about it until the email went out telling everyone. If someone would take ownership of notifying us of the issue, that would be the best solution. Leveraging the DASPStatus page to keep us up to date is a wonderful idea. However, it would be nice to see "We will give you an update in 2 hours," then come back in 2 hours and state that nothing has changed and that the next update is in 2 hours. At least we would know that someone at DASP is working on the issue and plans on keeping everyone informed.

    COMMUNICATION! COMMUNICATION! COMMUNICATION!

    It's that simple!!

    My expectation is to know that there is a problem (ASAP), what the problem is, when the problem occurred, and an ETA for resolution or an update on the problem. MJP: this process requires 1 or 2 people, 1 to work on the issue and 1 to type something up on the status page. This could take less than a minute. I'll even help you out: "We are experiencing technical issues. I will update in 1 hour to provide more information."

    MJP, I'm not frustrated that you had a DDoS attack; I'm frustrated by the lack of communication and the nonchalant attitude you exhibit. Communication is only as strong as its weakest link, and that link was on display Thursday and Friday. I don't want to hear "we'll look into it." I want a commitment to respond to an "Emergency Outage" with as much information as you can provide us. Who ever complained about having too much information in a crisis?

    MJP comprehend that your customers want COMMUNICATION! COMMUNICATION! COMMUNICATION!

    Richard
     
  22. This is exactly why I don't run my own web server: it is so much easier to blame you (DASP) when the website is down. I did some reading on DDoS attacks and how difficult they can be to shut off, so I'm not that irritated with the downtime. Two days is not going to run us out of business, BUT I'd rather it not happen again. I've tried several other hosts, and even with the recent event, DASP is by far the best overall.

    As far as http://www.daspstatus.com is concerned, I didn't even know it existed until the broadcast email went out, and then, with the DDoS ongoing, I was lucky to load it without timing out. What I would like to see is a real-time network status page like some other hosts I've had (i.e., http://www.verio.com/support/files/network_status.cfm). Obviously the status page should be outside of the data center. All I want to know is that you are aware of the problem and are working on it, saving me from having to submit a support ticket.

    Just my 2 cents,

    Paul
     
  23. Richard wrote: 'MJP comprehend that your customers want COMMUNICATION! COMMUNICATION! COMMUNICATION!' and he is right on. I would add that what we don't need is sarcasm, defensiveness, and condescension.

    I think most of us understand that:

    1 - The attack was not DASP's fault
    2 - DDoS attacks can happen to anybody, big or small
    3 - We don't expect perfection, just honest, straightforward communication
    4 - A MUCH better, more thorough and more frequently updated status page during a crisis.

    Thanks. I am still waiting for you to tell me why I should stay. Other users have made an okay case, but I want you to convince me you are a good partner.

    Regards,
    Steve (Sleddog)
     
  24. Takeshi Eto (DiscountASP.NET Staff)

    We totally agree with everyone that communication is VERY important, especially in this type of crisis. There was a total outage, so we set up an offsite status page to inform customers of what was going on. The problem was that we send mass emails to our customers from our data center, but it was down, so we couldn't send the mass email telling customers about the status page until the DDoS attack subsided enough to let us send emails out.

    We updated the status page when there was new information. There are two schools of thought on this, and the host gets hammered whichever way it goes. If we post only when there is new information, as we did, some people bash the host for not updating enough, even if the update would just be "we are still working on it." And when a host posts "we are still working on it" every hour, other people bash the host for posting the same message over and over. I've seen first-hand how hosts get bashed both ways at the other hosts that I've worked for.

    We are not perfect, for sure. No host can claim to be. We learn from our experiences and we will continue to improve our hosting infrastructure and our communications and customer support.
     
  25. mjp

    Sleddog said...
    Richard wrote: "MJP comprehend that your customers want COMMUNICATION! COMMUNICATION! COMMUNICATION!" and he is right on. I would add that what we don't need is sarcasm, defensiveness, and condescension.

    If you're referring to my post, "Are you going to post this message in every thread about an outage?", that was a response from February to someone who was digging up several old threads and posting the same thing in each of them.

    If you are not referring to that post, then I'm at a loss, because we haven't been sarcastic, defensive or condescending about the DDoS.

    I'm going to continue to maintain, as I have in the past, that there is no such thing as perfect communication. It is a question of balance, and it is impossible to provide communication that suits everyone's needs. That isn't a defense of our policy, it is a personal observation based on communicating with hundreds of thousands of web hosting customers since 1996.

    You want, "A MUCH better, more thorough and more frequently updated status page," but the problem that we face is that your "better" is not necessarily everyone else's "better."

    We have a wide range of knowledge in our customer base. Understandably some of you would appreciate detailed technical explanations, but those explanations would be meaningless to many of our customers, who would be forced to contact us and ask for clarification. So we are communicating in the way that works best for us and the majority of our users.

    For anyone who wants more details, there is Bruce's blow-by-blow of the DDoS here in the forum: http://community.discountasp.net/default.aspx?f=15&m=18352. That post is 100% "honest, straightforward communication," and that is what we're aiming for all the time.
     
  26. mjp, the DASPStatus.com page solves a lot of the communication issues, and as Eric stated, "We learn from our experiences and we will continue to improve our hosting infrastructure and our communications and customer support." This is actually a good benefit to all of us (DASP customers), as it shows how agile you (DASP) are in responding to issues. Yes, some of your customers are not as concerned as others, or have different opinions about the volume of communication, but in an "Emergency" (maybe too harsh a word), I have never heard anyone complain about too much communication.

    I understand that there is no such thing as perfect communication, but information should be accessible when needed. Again, I believe the DASPStatus page will resolve this issue. In the future, customers who are not seeking critical information will not visit the DASPStatus page; the ones who do are seeking information or an update on the situation.

    Also, "A MUCH better, more thorough and more frequently updated status page" really should translate into DASP committing to improve communication with its customers. That is all we are really asking. We would like DASP to take ownership of providing better communication and delivering results. A few of your responses indicate that you heard our suggestions and acknowledged them, then blew them off with your personal observations from 1996 onward. Your experience, skill, and points are all well taken, but they still don't address what action will be taken to resolve the communication dilemma. If the resolution is the DASP Status page, tell us. If you're working on a more robust solution, tell us.

    Again, I believe the DASP Status page could resolve the issue if it is maintained and communicated as a formal source of information.
     
  27. Takeshi Eto (DiscountASP.NET Staff)

    We are committed to improving communications. And we will continue to work at it.

    Some of the discussion is around the method of communication. Our staff is pointing out some issues around the efficiency, effectiveness, and operational costs of the different methods of communication, based on their experience. Some of the realities are counter-intuitive.

    We speak from our staff's many years of combined experience in the hosting business. Altogether, our staff has worked at or with more than a dozen hosting companies/brands (small startups, mid-range, and huge) from back in the day. Some hosts have gone away, some have been acquired, and some are in the top 10 in the world. All the hosting companies we've been involved with have had some global issue that affected all their customers at one time or another, and we've all participated in dealing with these emergency situations from all angles of the organization. I'll also add that all hosts have been attacked maliciously at one time or another.

    We are still in the middle of cleaning up after the DDOS attack both on the infrastructure side and support side. We will definitely be conducting a thorough post-mortem analysis of this incident and we will make changes to improve our hosting service and communications.
     
  28. PRB

    I just went another 10 - 15 minutes without being able to get to these forums, your site, and my sites....
     
  29. Eric, in regard to improving communication, when will the DASP team address the current procedure and provide us (DASP customers) a guideline for our expectations? I know you're still cleaning up from the DDoS attack, but once the dust settles, can we expect that this issue will be addressed? In the interim, can we use the DASP Status page as a means to get information on current and past "Emergency" issues?

    On another note, it may appear that I'm ungrateful for the effort your team has provided, but that is not the case. I fully understand the extent to which your team was, and is, involved in this unfortunate issue, and I want to acknowledge how much I appreciate the work the DASP team has put into my issue, as well as others'.

    Richard
     
  30. mjp

    We haven't yet determined how the status page will be used. We are leaning toward reserving it for global outages - that is, anything that affects all users, as opposed to a problem on, say, one server.

    We are also discussing increasing the use of the forum for immediate information updates, and getting everyone used to heading here when they have a question or notice a problem. Increased reliance on the forum would likely involve a switch to new forum software, though, with more notification options (RSS subscriptions to an "outage" forum, for example), so that is not something we can immediately implement without weighing the different options.

    Overall our goal is to improve communication and the sense of community in several ways, so I hope that we will be able to continue to introduce new ways to keep the information flowing both ways.
     
