Guus Sliepen [Mon, 6 Nov 2017 21:35:28 +0000 (22:35 +0100)]
Support autoconf's --runstatedir option.
Put the PID file in @runstatedir@ instead of @localstatedir@/run. This
requires autoconf 2.70, which is not released yet, so add a fallback to
use @localstatedir@/run if @runstatedir@ is not set.
Guus Sliepen [Wed, 25 Oct 2017 19:08:29 +0000 (21:08 +0200)]
Only forward SPTPS packets if Forwarding = internal.
This tries to match what is done for packets using the legacy protocol.
However, since SPTPS is end-to-end encrypted, Forwarding = kernel cannot
be implemented. In that case, we also drop the packets.
Guus Sliepen [Sat, 7 Oct 2017 15:47:19 +0000 (17:47 +0200)]
Convert sizeof foo to sizeof(foo).
While technically sizeof is an operator and doesn't need the parentheses
around expressions it operates on, except if they are type names, code
formatters don't seem to handle this very well.
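The distinction described above can be illustrated with a minimal C sketch. The struct and function names are hypothetical and exist only for this illustration; they are not taken from the tinc sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, used only to have something to measure. */
struct packet {
    unsigned char data[64];
    size_t len;
};

/* Operand is an expression: the parentheses are optional. */
size_t size_of_expression(void) {
    struct packet p;
    return sizeof p;
}

/* The same expression in the parenthesized style this commit adopts. */
size_t size_of_expression_parenthesized(void) {
    struct packet p;
    return sizeof(p);
}

/* Operand is a type name: the parentheses are required by the language. */
size_t size_of_type(void) {
    return sizeof(struct packet);
}

/* All three spellings yield the same value. */
void check_sizes(void) {
    assert(size_of_expression() == size_of_expression_parenthesized());
    assert(size_of_expression() == size_of_type());
}
```

Using the parenthesized form everywhere changes nothing for the compiler, but it gives a code formatter one spelling to deal with.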
Allow forcing either IPv4 or IPv6 for sptps_test, and use IPv4 for the
sptps-basic test. Since sptps_test is only opening a single listening
socket, and you cannot control which address family it uses, this gets
around a problem where the listening side is using a different address
family than the one connecting to it.
Guus Sliepen [Tue, 22 Aug 2017 18:51:44 +0000 (20:51 +0200)]
Make autoconnect try to heal network splits.
When we have fewer than three connections, we greedily try to connect to any
viable node. However, once we have three connections, we try to connect to
nodes that we know of but that aren't reachable.
We also make sure that if there are 100 reachable nodes and 1 unreachable
one, not all 100 reachable nodes try to connect to the unreachable one
at the same time.
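One way to get this kind of automatic back-off is sketched below. It assumes each node picks a single random entry from the list of all known nodes and only acts if that entry happens to be unreachable. The node_t type, its fields, and choose_unreachable() are hypothetical names for this illustration, not tinc's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node record. */
typedef struct node {
    const char *name;
    bool reachable;    /* true if the node is currently reachable */
    bool has_address;  /* true if we know an address to try */
} node_t;

/* Returns the node to try connecting to, or NULL to do nothing this round.
 * 'pick' is a random index in [0, count) chosen by the caller,
 * for example rand() % count. */
const node_t *choose_unreachable(const node_t *nodes, size_t count, size_t pick) {
    assert(nodes != NULL && pick < count);
    const node_t *n = &nodes[pick];

    /* Only unreachable nodes with a known address are candidates. */
    if(n->reachable || !n->has_address) {
        return NULL;
    }

    return n;
}
```

Under this assumption, with 100 reachable nodes and 1 unreachable one, each node selects the unreachable one with probability 1/101 per round, so the attempts are spread out over time instead of arriving all at once.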
On Linux network restart, Tinc can get into a loop writing millions of error messages "Error while reading from Linux tun/tap device (tun mode) /dev/net/tun: File descriptor in bad state" to the log. https://github.com/NixOS/nixpkgs/pull/27675
It should be somehow aborted.
Here is my quick hack.
Guus Sliepen [Sun, 28 May 2017 10:48:32 +0000 (12:48 +0200)]
Set KillMode=mixed in the systemd service file.
This ensures only the main process is sent the SIGTERM, and not anything
else that might have started in the same control group, including the
tinc-down script.
Guus Sliepen [Tue, 7 Mar 2017 18:19:19 +0000 (19:19 +0100)]
Use free_known_addresses() to free memory allocated by get_known_addresses().
We know what struct addrinfo looks like, but the standard says nothing
about how it is allocated. So we cannot trust freeaddrinfo() to work
correctly on the struct addrinfo list we allocated ourselves in
get_known_addresses(). To distinguish between allocations from the
latter and those from str2addrinfo(), we keep two pointers (*ai and *kai) in
struct outgoing, and use the freeing function that is appropriate for
each.
Etienne Dechamps [Sun, 18 Dec 2016 14:32:25 +0000 (14:32 +0000)]
Clarify the flow of add_edge_h().
This is an attempt at making the control flow through this function
easier to understand by rearranging branches and cutting back on
indentation levels.
This is a pure refactoring; there is no change in behavior.
Etienne Dechamps [Sun, 18 Dec 2016 14:25:20 +0000 (14:25 +0000)]
Fix edge updates containing local address changes.
This commit fixes a logic bug in the edge update code where local
address changes are not taken into account if they are bundled in with
other changes. This bug breaks local discovery in some scenarios.
On Windows, don't cancel I/O when disabling the device.
I have observed cases where disable_device() can get stuck on the
GetOverlappedResult() call, especially when the computer is waking up
from sleep. This is problematic when combined with DeviceStandby=yes:
other_side (1.2.3.4 port 655) didn't respond to PING in 5 seconds
Closing connection with other_side (1.2.3.4 port 655)
Disabling Windows tap device
<STUCK>
gdb reveals the following stack trace:
#0 0x77c7dd3c in ?? ()
#1 0x7482aad0 in KERNELBASE!GetOverlappedResult () from C:\WINDOWS\SysWoW64\KernelBase.dll
#2 0x0043c343 in disable_device () at mingw/device.c:244
#3 0x0040fcee in device_disable () at net_setup.c:759
#4 0x00405bb5 in check_reachability () at graph.c:292
#5 0x00405be2 in graph () at graph.c:301
#6 0x004088db in terminate_connection (c=0x4dea5c0, report=true) at net.c:108
#7 0x00408aed in timeout_handler (data=0x5af0c0 <pingtimer>) at net.c:168
#8 0x00403af8 in get_time_remaining (diff=0x2a8fd64) at event.c:239
#9 0x00403b6c in event_loop () at event.c:303
#10 0x00409904 in main_loop () at net.c:461
#11 0x00424a95 in main2 (argc=6, argv=0x2b42a60) at tincd.c:489
#12 0x00424788 in main (argc=6, argv=0x2b42a60) at tincd.c:416
This is with TAP-Win32 9.0.0.9. I suspect driver bugs related to sleep.
In any case, this commit fixes the issue by cancelling I/O only when the
entire tinc process is being gracefully shut down, as opposed to every
time the device is disabled. Thankfully, the driver seems to be
perfectly fine with this code issuing TAP_IOCTL_SET_MEDIA_STATUS ioctls
while there are I/O operations inflight.
Fix crash on Windows when a socket is available for both write and read.
Currently, if both write and read events fire at the same time on a
socket, the Windows-specific event loop will call both the write and
read callbacks, in that order. The problem is that the write callback could have
deleted the io handle, which makes the subsequent call to the read callback a
use-after-free, typically resulting in a hard crash.
In practice, this issue is triggered quite easily by putting the
computer to sleep, which basically freezes the tinc process. When the
computer wakes up and the process resumes, all TCP connections are
suddenly gone; as a result, the following sequence of events might
appear in the logs:
Metadata socket read error for node1 (1.2.3.4 port 655): (10054) An existing connection was forcibly closed by the remote host.
Closing connection with node1 (1.2.3.4 port 655)
Sending DEL_EDGE to everyone (BROADCAST): 13 4bf6 mynode node1
Sending 43 bytes of metadata to node2 (5.6.7.8 port 655)
Could not send 10891 bytes of data to node2 (5.6.7.8 port 655): (10054) An existing connection was forcibly closed by the remote host.
Closing connection with node2 (5.6.7.8 port 655)
<CRASH>
In this example the crash occurs because the socket to node2 was
signaled for reading *in addition* to writing, but since the connection
was terminated, the attempt to call the read callback crashed the
process.
This commit fixes the problem by not even attempting to fire the write
callback when the write event on the socket is signaled - instead, we
just rely on the part of the event loop that simulates level-triggered
write events. Arguably that's even cleaner and faster, because the code
being removed was technically redundant - we have to go through that
write check loop anyway.
Guus Sliepen [Sun, 30 Oct 2016 14:17:52 +0000 (15:17 +0100)]
Use AES256 and SHA256 by default for the legacy protocol.
At the start of the decade, there were still distributions that shipped
with versions of OpenSSL that did not support these algorithms. By now
everyone should support them. The old defaults were Blowfish and SHA1,
neither of which is considered secure anymore.
The meta-protocol now always uses AES in CFB mode, but the key length
will adapt to the one specified by the Cipher option. The digest for the
meta-protocol is hardcoded to SHA256.
Guus Sliepen [Sun, 5 Jun 2016 12:47:21 +0000 (14:47 +0200)]
Preserve IPv6 scope_id in edges.
When creating an edge after authenticating a peer, we copy the
address used for the TCP connection, but change the port to that used
for UDP. But the way we did it discarded the scope_id for IPv6
addresses. This prevented UDP communication from working correctly when
connecting to a peer on the same LAN using an IPv6 link-local address.
Thanks to Rafał Leśniak for pointing out this issue.
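The idea of changing only the port while keeping every other field of the address can be sketched as below. This is an illustration assuming POSIX socket headers. The function name with_udp_port() is hypothetical and does not appear in tinc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Copy the address used for the TCP connection and replace only the port
 * (given in host byte order). Because the whole structure is copied first,
 * fields such as sin6_scope_id survive unchanged. */
struct sockaddr_storage with_udp_port(const struct sockaddr_storage *tcp,
                                      uint16_t udp_port) {
    assert(tcp != NULL);
    struct sockaddr_storage out = *tcp;

    switch(out.ss_family) {
    case AF_INET:
        ((struct sockaddr_in *)&out)->sin_port = htons(udp_port);
        break;

    case AF_INET6:
        ((struct sockaddr_in6 *)&out)->sin6_port = htons(udp_port);
        break;

    default:
        /* Unknown family: return the copy untouched. */
        break;
    }

    return out;
}
```

For a link-local IPv6 address the scope_id identifies the interface to use, so dropping it leaves the kernel unable to tell which link the peer is on.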
thorkill [Thu, 19 May 2016 13:48:15 +0000 (15:48 +0200)]
Prevent tincd from sending packets to unexpecting nodes
Make tincd recognize when it has been asleep and close the connections to its
peers. This happens when, for example, RoadWarrior has been suspended for a
longer period of time. After resuming, it would start to communicate
with its peers using the contexts it had before the suspend.
On the other side, the nodes have already closed their connections because
PingTimeout expired and/or the TCP connection went down.
Sending data to such unaware (mostly SPTPS) nodes causes
havoc in the logs, misleading developers into the wrong assumption
that something is wrong with SPTPS.
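A common way to notice that a process has been suspended is to compare the time between two runs of a periodic handler against a threshold. The sketch below shows only that comparison. The function name probably_slept() and the threshold parameter are assumptions for illustration; the commit message does not say which value tinc uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Returns true if the gap between two runs of a periodic check is so large
 * that the process was probably suspended in between.
 * 'threshold' is a hypothetical tuning value in seconds. */
bool probably_slept(time_t last_check, time_t now, time_t threshold) {
    assert(threshold > 0);

    /* A clock that went backwards is not treated as a sleep. */
    if(now <= last_check) {
        return false;
    }

    return now - last_check > threshold;
}
```

A caller would record the time at each periodic run and, when this returns true, close all connections so that fresh ones are negotiated instead of reusing stale state.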