Server Troubleshooting - Helping Other Users

Having issues with your Timekoin Server? Someone might be able to answer your questions here.

Post by KnightMB

This topic details some very basic, easy-to-do troubleshooting for helping other users who are having issues (such as Outbound Only or Can't get elected).

You can do just about anything in a web browser for troubleshooting. Timekoin servers communicate with each other via HTTP, so some basic examination of other servers is actually quite easy.

Outbound Only Mode:
This warning message is displayed to server operators when no peers are actually trying to communicate with the given server.

Test 1: Can anyone from the Internet actually communicate with the user's server?
Way to Test: Ask the user for the IP address or domain of the server (we can use timekoin.com as an example), along with the port number (1528, for example) and the subfolder (usually timekoin, but it can be anything the user wants).
Web Browser: Using this information, here is an example server poll that works with the Timekoin server at timekoin.com (port 80, subfolder timekoin):
timekoin.com:80/timekoin/peerlist.php?action=poll&challenge=1
Should simply respond back with: 6c300461
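
For anyone who would rather script this check than type it into a browser, here is a minimal sketch in Python. The helper names (build_peerlist_url, poll_server) are made up for illustration and are not part of Timekoin; the only things taken from the example above are the peerlist.php path, the action and challenge parameters, and the fact that servers talk over HTTP.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_peerlist_url(host, port, subfolder, **params):
    """Build a peerlist.php URL like the one typed into the browser above.

    Hypothetical helper: host, port and subfolder are whatever the
    troubled user tells you.
    """
    folder = subfolder.strip("/")
    path = f"/{folder}/peerlist.php" if folder else "/peerlist.php"
    return f"http://{host}:{port}{path}?{urlencode(params)}"


def poll_server(url, timeout=3.0):
    """Fetch the URL and return the response body as stripped text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()


# Example usage (needs network access, so it is left commented out):
# url = build_peerlist_url("timekoin.com", 80, "timekoin",
#                          action="poll", challenge=1)
# print(poll_server(url))
```

If the server is reachable from the Internet, the reply printed should be the same short string you would see in the browser.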

Test 2: Is the response time taking too long?
Way to Test: Just count the seconds between the moment you send the request and when the server responds back. It is not an exact science, but it can give you a general idea.
Reason: If the server takes longer than 3 seconds (3000 ms) to respond to this simple query, there may be a problem with the Internet connection: the ISP may be having issues, or the link the server uses may be over-saturated (full) with other traffic such as large downloads, uploads, or other p2p file sharing. To keep server communications flowing, Timekoin applies a 3000 ms timeout to the requests it makes to other servers. This serves two purposes: performance and security.
Performance: A server connected to peers that take a very long time to respond suffers itself, because it relies on those peers for timely information, such as the simple poll request above that signals the other servers are still online and processing requests.
Security: Many attacks on a network consist of flooding (sending out a ton of garbage information to keep the other servers busy), and there are also the lesser-known "slow attacks", where the attacker purposely slows down its responses to backlog and overflow server buffers or fill up processing queues.
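
Counting seconds by hand works, but the same measurement can be sketched in Python. This is only an illustration under assumptions: timed_fetch and too_slow are made-up names, the 3000 ms figure is the one quoted above, and urlopen's timeout applies to each blocking socket operation rather than to the whole request, so treat the result as a rough guide just like the stopwatch method.

```python
import time
from urllib.request import urlopen

TIMEOUT_MS = 3000  # the timeout the post says Timekoin uses for peer requests


def timed_fetch(url, fetch=None, timeout_ms=TIMEOUT_MS):
    """Return (body, elapsed_ms); body is None if the request fails or times out.

    `fetch` is an optional stand-in for the real HTTP call so the timing
    logic can be tried without a network connection.
    """
    if fetch is None:
        def fetch(u):
            with urlopen(u, timeout=timeout_ms / 1000) as response:
                return response.read().decode("utf-8", errors="replace").strip()
    start = time.perf_counter()
    try:
        body = fetch(url)
    except OSError:  # covers URLError and socket timeouts
        body = None
    elapsed_ms = (time.perf_counter() - start) * 1000
    return body, elapsed_ms


def too_slow(elapsed_ms, limit_ms=TIMEOUT_MS):
    """True when the response took longer than the limit."""
    return elapsed_ms > limit_ms
```

A body of None, or too_slow() returning True, both point toward the connection problems described above.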

Test 3: Is the response getting mangled along the way somewhere along the Internet?
Way to Test: This one is a bit trickier. Some ISPs actually inject other traffic into the poll exchange, so the same simple poll example above might come back with the correct response plus some other data that is not supposed to be there. Timekoin has very strict filtering rules on the data it receives, and this problem can cause other servers to ignore the troubled server because all of its response data looks corrupt.
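
One way to spot this is to compare the raw reply against the value you expect, character for character. The sketch below is an assumption-laden illustration, not Timekoin's actual filtering code: classify_reply is a made-up name, and the expected value is simply the 6c300461 reply shown for challenge=1 in Test 1.

```python
def classify_reply(body, expected="6c300461"):
    """Classify a poll reply as 'clean', 'extra data' or 'unexpected'.

    Leading and trailing whitespace is ignored here; whether Timekoin
    itself tolerates it is not something this sketch claims to know.
    """
    text = body.strip()
    if text == expected:
        return "clean"
    if expected in text:
        # The right answer is in there, but something else came with it,
        # which is what injected traffic would look like.
        return "extra data"
    return "unexpected"
```

An "extra data" result from one network but a "clean" result from another suggests something between the two is modifying the traffic.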

Test 4: Is the troubled server just connected with bad peers to begin with?
Way to Test: You can manually poll a server to see if it is full or still accepting new peer connections. Using the example above for Test 1, change the action type to join.
timekoin.com:80/timekoin/peerlist.php?action=join
Should simply respond back with: OK or FULL
The OK response means it is waiting for more peers to join it. The FULL response means it has the peer list filled to the set capacity by the server operator.
If a troubled server is FULL yet still shows the Outbound Only warning, then none of the connected peers are actually taking the time to communicate back. This can usually be resolved quickly by kicking the entire peer list and letting it rebuild with fresh peers that *want* to communicate with the troubled server.
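
The decision described in this test can be written down as a small Python sketch. The function names are hypothetical; the only inputs are the OK or FULL text returned by the join request and whether the operator is seeing the Outbound Only warning.

```python
def interpret_join(reply):
    """Map the join reply text to a short description."""
    status = reply.strip()
    if status == "OK":
        return "accepting new peers"
    if status == "FULL":
        return "peer list is at the operator's capacity"
    return "unexpected reply"


def should_reset_peers(join_reply, outbound_only_warning):
    """True when the server is FULL yet still warns about Outbound Only.

    That combination is the case where kicking the whole peer list and
    letting it rebuild is suggested above.
    """
    return join_reply.strip() == "FULL" and outbound_only_warning
```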

Test 5: Ask the other peers for performance data on *your* troubled server
Way to Test: The server operator can also run "Poll Failure Scores" from the peer list tab to see if the other peers are actually tracking performance information for the troubled server. If all of them give a "No Response" answer, those peers are not tracking communication with the troubled server and should be removed to make room for peers that will. This is another indicator that the other peers have quit communicating with your server.

Test 6: Server has a corrupt database
Way to Test: This one is very difficult to test without doing a side-by-side comparison of a known good server and the one having issues. If the troubled server is responding to poll requests with corrupt data, the other peers tend to ignore it until communication ceases. The reason is that, from a security perspective, a malicious hacker could modify their own server to give out corrupt data responses in hopes of confusing or attacking the Timekoin network. Because a server with a corrupt database can appear to be doing the same thing, the other peers on the Timekoin network would NOT know the difference between a server having a bad day and a server crafted to attack. Normally a corrupt database creates a lot of errors in the logs, and the check and repair utilities may need to be run on it as well.