The keys are transported in the SDP attachment of a SIPmessage. That means, the SIP transport layer must make sure that no one else can see the attachment. This can be done by using TLS transport layer, or other methods like S/MIME. Using TLS assumes that the next hop in the SIP proxy chain can be trusted and it will take care about the security requirements of the request. The main advantage of this method is that it is extremely simple. The key exchange method has been picked up by several vendors already, even though some vendors do not use a secure mechanism to transport the key. This helps to get the critical mass of implementation to make this method the de facto standard. To illustrate this principle with an example, the phone sends a call to the proxy. By using the sips scheme, it indicates that the call must be made secure. The key is base-64 encoded in the SDP attachment. INVITE sips:*97@ietf.org;user=phone SIP/2.0 Via: SIP/2.0/TLS 172.20.25.100:2049;branch=z9hG4bK-s5kcqq8jqjv3;rport From: "123" ;tag=mogkxsrhm4 To: Call-ID: 3c269247a122-f0ee6wcrvkcq@snom360-000413230A07 CSeq: 1 INVITE Max-Forwards: 70 Contact: ;reg-id=1 User-Agent: snom360/6.2.2 Accept: application/sdp Allow: INVITE, ACK, CANCEL, BYE, REFER, OPTIONS, NOTIFY, SUBSCRIBE, PRACK, MESSAGE, INFO Allow-Events: talk, hold, refer Supported: timer, 100rel, replaces, callerid Session-Expires: 3600;refresher=uas Min-SE: 90 Content-Type: application/sdp Content-Length: 477 v=0 o=root 2071608643 2071608643 IN IP4 172.20.25.100 s=call c=IN IP4 172.20.25.100 t=0 0 m=audio 57676 RTP/SAVP 0 8 9 2 3 18 4 101 a=crypto:1 AES_CM_128_HMAC_SHA1_32 inline:WbTBosdVUZqEb6Htqhn+m3z7wUh4RJVR8nE15GbN a=rtpmap:0 pcmu/8000 a=rtpmap:8 pcma/8000 a=rtpmap:9 g722/8000 a=rtpmap:2 g726-32/8000 a=rtpmap:3 gsm/8000 a=rtpmap:18 g729/8000 a=rtpmap:4 g723/8000 a=rtpmap:101 telephone-event/8000 a=fmtp:101 0-16 a=ptime:20 a=encryption:optional a=sendrecv The phone receives the answer from the proxy and now there can be a two-way secure call: SIP/2.0 200 Ok Via: SIP/2.0/TLS 172.20.25.100:2049;branch=z9hG4bK-s5kcqq8jqjv3;rport=62401;received=66.31.106.96 From: "123" ;tag=mogkxsrhm4 To: ;tag=237592673 Call-ID: 3c269247a122-f0ee6wcrvkcq@snom360-000413230A07 CSeq: 1 INVITE Contact: Supported: 100rel, replaces Allow-Events: refer Allow: INVITE, ACK, CANCEL, BYE, REFER, OPTIONS, PRACK, INFO Accept: application/sdp User-Agent: pbxnsip-PBX/1.5.1 Content-Type: application/sdp Content-Length: 298 v=0 o=- 1996782469 1996782469 IN IP4 203.43.12.32 s=- c=IN IP4 203.43.12.32 t=0 0 m=audio 57076 RTP/SAVP 0 101 a=rtpmap:0 pcmu/8000 a=rtpmap:101 telephone-event/8000 a=fmtp:101 0-11 a=crypto:1 AES_CM_128_HMAC_SHA1_32 inline:bmt4MzIzMmYxdnFyaWM3d282dGR5Z3g0c2k5M3Yx a=ptime:20 a=sendrecv
A common problem with secure media is that the key exchange might not be finished when the first media packet arrives. In order to avoid initial clicks, those packets must be dropped. Usually this is only a short period of time, so that this is no major problem. The SDES method does not address the "end-to-end" media encryption. For example, if user A is talking to user B via a proxy P, SDES allows negotiation of keys between A and P or between B and P, but not between A and B. For end-to-end media security you must first establish a trust relationship with the other side. If you use a trusted intermediate for this, the call setup delay will significantly increase, which makes applications like push-to-talk difficult. If you do this peer-to-peer, it might be difficult for you to identify the other side. For example, your operator might implement a B2BUA architecture and play the role of the other side, so that you still don't have end-to-end security. Newer, modern protocols, like ZRTP, offer end-to-end encryption for SIP/RTP calls.