
Java SDK for SQL API of Azure Cosmos DB

Important

Retirement of this SDK (Azure Cosmos DB Async Java SDK v2.x) has been announced for August 31, 2024. We recommend upgrading to Azure Cosmos DB Java SDK v4. Please visit our Azure Cosmos DB Java SDK migration guide for details on the benefits of migration and the migration process.


Consuming the official Microsoft Azure Cosmos DB Java SDK

This project provides an SDK library in Java for interacting with the SQL API of the Azure Cosmos DB Database Service. This project also includes samples, tools, and utilities.

Jar dependency binary information for Maven and Gradle can be found here on Maven Central.

For example, using Maven, you can add the following dependency to your Maven POM file:

<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>azure-cosmosdb</artifactId>
  <version>2.6.16</version>
</dependency>
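
If you use Gradle instead, the equivalent dependency declaration (same coordinates as above) is roughly:

implementation 'com.microsoft.azure:azure-cosmosdb:2.6.16'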

Useful links:

Prerequisites

  • Java Development Kit 8
  • An active Azure account. If you don't have one, you can sign up for a free account. Alternatively, you can use the Azure Cosmos DB Emulator for development and testing. Because the emulator's HTTPS certificate is self-signed, you need to import its certificate into the Java trusted certificate store as explained here (a sketch follows this list).
  • (Optional) SLF4J is a logging facade.
  • (Optional) SLF4J binding is used to associate a specific logging framework with SLF4J.
  • (Optional) Maven
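
For the emulator, a minimal sketch of importing its exported certificate into the JDK trust store might look like the following (the file name cosmosdb-emulator.cer and the default cacerts password changeit are assumptions; adjust the keystore path for your JDK layout):

keytool -importcert -alias cosmosdb-emulator -file cosmosdb-emulator.cer -keystore "%JAVA_HOME%\jre\lib\security\cacerts" -storepass changeit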

SLF4J is only needed if you plan to use logging; in that case, please also download an SLF4J binding, which will link the SLF4J API with the logging implementation of your choice. See the SLF4J user manual for more information.
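
For example, a minimal sketch of the Maven dependencies for SLF4J with a Log4j binding (the 1.7.x version shown is an assumption; match it to the binding of your choice):

<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.7.30</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.7.30</version>
</dependency>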

API Documentation

Javadoc is available here.

The SDK provides a Reactive Extensions (RxJava) Observable-based async API. You can read more about RxJava and the Observable APIs here.

Usage Code Sample

Code Sample for creating a Document:

import com.microsoft.azure.cosmosdb.rx.*;
import com.microsoft.azure.cosmosdb.*;

ConnectionPolicy policy = new ConnectionPolicy();
policy.setConnectionMode(ConnectionMode.Direct);

AsyncDocumentClient asyncClient = new AsyncDocumentClient.Builder()
                .withServiceEndpoint(HOST)
                .withMasterKeyOrResourceToken(MASTER_KEY)
                .withConnectionPolicy(policy)
                .withConsistencyLevel(ConsistencyLevel.Eventual)
                .build();

Document doc = new Document(String.format("{ 'id': 'doc%d', 'counter': '%d'}", 1, 1));

Observable<ResourceResponse<Document>> createDocumentObservable =
    asyncClient.createDocument(collectionLink, doc, null, false);

createDocumentObservable
    .single()           // we know there will be one response
    .subscribe(
        documentResourceResponse -> {
            System.out.println(documentResourceResponse.getRequestCharge());
        },
        error -> {
            System.err.println("an error happened: " + error.getMessage());
        });
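
When you no longer need the client, release its resources, for example:

asyncClient.close();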

We have a getting started sample app available here.

We also have more examples in the form of standalone unit tests in the examples project.

Guide for Prod

To achieve better performance and higher throughput, there are a few tips that are helpful to follow:

Use Appropriate Scheduler (Avoid stealing Eventloop IO Netty threads)

The SDK uses Netty for non-blocking IO. It uses a fixed number of Netty IO event loop threads (as many as your machine has CPU cores) for executing IO operations.

The Observable returned by the API emits its result on one of the shared Netty IO event loop threads, so it is important not to block those threads. Doing CPU-intensive work or a blocking operation on a Netty IO event loop thread may cause a deadlock or significantly reduce SDK throughput.

For example, the following code executes CPU-intensive work on a Netty IO event loop thread:

Observable<ResourceResponse<Document>> createDocObs = asyncDocumentClient.createDocument(
  collectionLink, document, null, true);

createDocObs.subscribe(
  resourceResponse -> {
    //this is executed on eventloop IO netty thread.
    //the eventloop thread is shared and is meant to return back quickly.
    //
    // DON'T do this on eventloop IO netty thread.
    veryCpuIntensiveWork();
  });

If you want to do CPU-intensive work on the result after it is received, you should avoid doing so on a Netty IO event loop thread. Instead, provide your own Scheduler, which supplies the threads on which your work runs.

import rx.schedulers.Schedulers;

Observable<ResourceResponse<Document>> createDocObs = asyncDocumentClient.createDocument(
  collectionLink, document, null, true);

createDocObs
  .observeOn(Schedulers.computation())
  .subscribe(
    resourceResponse -> {
      // this is executed on threads provided by Schedulers.computation()
      // Schedulers.computation() should be used only when the work is CPU intensive and you are
      // not doing blocking IO, thread sleeps, etc. against other resources on this thread.
      veryCpuIntensiveWork();
    });

Based on the type of your work, you should use the appropriate existing RxJava Scheduler. Please read about Schedulers here.
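
For example, if your post-processing is blocking or you want a dedicated pool, a minimal sketch (assuming RxJava 1.x; doBlockingOrCpuHeavyWork is a hypothetical placeholder) is to back a Scheduler with your own ExecutorService:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import rx.Scheduler;
import rx.schedulers.Schedulers;

// dedicated thread pool for processing results off the shared Netty IO threads
ExecutorService customExecutor = Executors.newFixedThreadPool(10);
Scheduler customScheduler = Schedulers.from(customExecutor);

createDocObs
  .observeOn(customScheduler)
  .subscribe(resourceResponse -> {
    // runs on customExecutor threads, not on the Netty event loop
    doBlockingOrCpuHeavyWork(resourceResponse);
  });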

Disable netty's logging

The Netty library's logging is very chatty and needs to be turned off (suppressing it in the logging configuration may not be enough) to avoid additional CPU costs. If you are not in debugging mode, disable Netty's logging altogether. For example, if you are using Log4j, add the following line to your codebase to remove the additional CPU costs incurred by org.apache.log4j.Category.callAppenders() from Netty:

org.apache.log4j.Logger.getLogger("io.netty").setLevel(org.apache.log4j.Level.OFF);
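
Equivalently, if you configure Log4j through a log4j.properties file, the following entry should have the same effect:

log4j.logger.io.netty=OFF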

OS Open files Resource Limit

Some Linux systems (such as Red Hat) have an upper limit on the number of open files, and therefore on the total number of connections. Run the following to view the current limits:

ulimit -a

The number of open files (nofile) needs to be large enough to leave room for your configured connection pool size and the other files opened by the OS. It can be modified to allow for a larger connection pool size.

Open the limits.conf file:

vim /etc/security/limits.conf

Add/modify the following lines:

* - nofile 100000

Use native SSL implementation for netty

Netty can use OpenSSL directly for its SSL implementation stack to achieve better performance. In the absence of this configuration, Netty falls back to Java's default SSL implementation.

On Ubuntu:

sudo apt-get install openssl
sudo apt-get install libapr1

and add the following dependency to your project's Maven dependencies:

<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-tcnative</artifactId>
  <version>2.0.20.Final</version>
  <classifier>linux-x86_64</classifier>
</dependency>

For other platforms (Red Hat, Windows, Mac, etc.), please refer to the instructions on Netty's wiki.
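
To check at runtime whether the native SSL provider was actually picked up, a small sketch using Netty's OpenSsl helper (assuming io.netty.handler.ssl.OpenSsl is on the classpath via the SDK's Netty dependencies):

import io.netty.handler.ssl.OpenSsl;

if (OpenSsl.isAvailable()) {
    System.out.println("netty-tcnative found; native OpenSSL will be used");
} else {
    System.out.println("falling back to the JDK SSL implementation: " + OpenSsl.unavailabilityCause());
}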

Using system properties to modify default Direct TCP options

We have added the ability to modify the default Direct TCP options used by the SDK. In priority order, the default Direct TCP options are taken from:

  1. The JSON value of system property azure.cosmos.directTcp.defaultOptions. Example:

    java -Dazure.cosmos.directTcp.defaultOptions={\"idleEndpointTimeout\":\"PT24H\"} -jar target/cosmosdb-sdk-testing-1.0-jar-with-dependencies.jar Direct 10 0 Read
  2. The contents of the JSON file located by system property azure.cosmos.directTcp.defaultOptionsFile. Example:

    java -Dazure.cosmos.directTcp.defaultOptionsFile=/path/to/default/options/file -jar Direct 10 0 Query
    
  3. The contents of the JSON resource file named azure.cosmos.directTcp.defaultOptions.json. Specifically, the resource file is read from this stream:

    RntbdTransportClient.class.getClassLoader().getResourceAsStream("azure.cosmos.directTcp.defaultOptions.json")

    Example: Contents of resource file azure.cosmos.directTcp.defaultOptions.json.

    {
      "bufferPageSize": 8192,
      "connectionTimeout": "PT1M",
      "idleChannelTimeout": "PT0S",
      "idleEndpointTimeout": "PT1M10S",
      "maxBufferCapacity": 8388608,
      "maxChannelsPerEndpoint": 10,
      "maxRequestsPerChannel": 30,
      "receiveHangDetectionTime": "PT1M5S",
      "requestExpiryInterval": "PT5S",
      "requestTimeout": "PT1M",
      "requestTimerResolution": "PT0.5S",
      "sendHangDetectionTime": "PT10S",
      "shutdownTimeout": "PT15S"
    }
    

Values that are in error are ignored.
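
The system property can also be set programmatically before the first client is built; a sketch (the value mirrors the idleEndpointTimeout example above, and setting it early is assumed to be required so it is read along with the other defaults):

// set before creating the first AsyncDocumentClient so the value is read with the defaults
System.setProperty(
    "azure.cosmos.directTcp.defaultOptions",
    "{\"idleEndpointTimeout\":\"PT24H\"}");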

Common Perf Tips

There is a set of common perf tips written for our sync SDK. The majority of them also apply to the async SDK. It is available here.

Future, CompletableFuture, and ListenableFuture

The SDK provides a Reactive Extensions (Rx) Observable-based async API.

The Rx API has advantages over Future-based APIs, but if you wish to use Futures, you can translate Observables to Java Futures:

// You can convert an Observable to a ListenableFuture.
// ListenableFuture (part of google guava library) is a popular extension
// of Java's Future which allows registering listener callbacks:
// https://github.com/google/guava/wiki/ListenableFutureExplained

import com.google.common.util.concurrent.ListenableFuture;
import rx.observable.ListenableFutureObservable;

Observable<ResourceResponse<Document>> createDocObservable = asyncClient.createDocument(
  collectionLink, document, null, false);

// NOTE: if you are going to do CPU intensive work
// on the result thread consider changing the scheduler see Use Proper Scheduler
// (Avoid Stealing Eventloop IO Netty threads) section
ListenableFuture<ResourceResponse<Document>> listenableFuture =
  ListenableFutureObservable.to(createDocObservable);

ResourceResponse<Document> rrd = listenableFuture.get();

For this to work, you will need the RxJava Guava library dependency. More information is available on RxJava Guava's GitHub.
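
If you prefer Java 8's CompletableFuture, one minimal sketch (no extra dependency, reusing createDocObservable from the snippet above) is to complete a future from the subscribe callbacks:

import java.util.concurrent.CompletableFuture;

CompletableFuture<ResourceResponse<Document>> completableFuture = new CompletableFuture<>();

createDocObservable
    .single()                                             // exactly one response is expected
    .subscribe(completableFuture::complete,               // complete on success
               completableFuture::completeExceptionally); // propagate the error

ResourceResponse<Document> response = completableFuture.get();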

Checking out the Source Code

The SDK is open source and its source is available here (sdk).

Clone the Repo

git clone https://github.com/Azure/azure-cosmosdb-java.git
cd azure-cosmosdb-java

How to Build from Command Line

Run the following Maven command to build:

mvn clean package -DskipTests

Running Tests from Command Line

Running the tests requires Azure Cosmos DB endpoint credentials:

mvn test -DACCOUNT_HOST="https://REPLACE_ME_WITH_YOURS.documents.azure.com:443/" -DACCOUNT_KEY="REPLACE_ME_WITH_YOURS"

Import into Intellij or Eclipse

  • Load the main parent project POM file in IntelliJ/Eclipse (that should automatically load the examples).

  • For running the samples you need a proper Azure Cosmos DB endpoint. The endpoints are picked up from TestConfigurations.java. There is a similar endpoint config file for the SDK tests here.

  • You can pass your endpoint credentials as VM Arguments in Eclipse JUnit Run Config:

     -DACCOUNT_HOST="https://REPLACE_ME.documents.azure.com:443/" -DACCOUNT_KEY="REPLACE_ME"
  • or you can simply put your endpoint credentials in TestConfigurations.java

  • The SDK tests are written using the TestNG framework; if you use Eclipse, you may have to add the TestNG plugin to your IDE as explained here. IntelliJ has built-in support for TestNG.

  • Now you can run the tests in your IntelliJ/Eclipse IDE.

FAQ

We maintain a list of frequently asked questions here.

Release changes

Release changelog is available here.

Contribution and Feedback

This is an open source project and we welcome contributions.

If you would like to become an active contributor to this project please follow the instructions provided in Azure Projects Contribution Guidelines.

We have a Travis CI build which should pass for any PR.

If you encounter any bugs with the SDK please file an issue in the Issues section of the project.

License

MIT License. Copyright (c) 2018 Microsoft Corporation.

Attribution

This project includes the MurmurHash3 algorithm, which came with the following notice: “The MurmurHash3 algorithm was created by Austin Appleby and placed in the public domain. * This java port was authored by Yonik Seeley and also placed into the public domain. * The author hereby disclaims copyright to this source code.”