Today, we will introduce Apache Mesos, an open source distributed computing system whose goal is to let applications run on a computer cluster as if they were running on a single computer. On top of a Mesos cluster, we will run Mesosphere Marathon, an open source container orchestration platform. Similar to a watchdog, Marathon helps run and maintain long-running applications. However, unlike a mere watchdog, Marathon runs the applications in containers, and it provides a modern web portal and a RESTful API.
With the help of Marathon, we will
- run several instances of a simple “Hello World” script on the cluster (within and outside of Docker containers);
- see what happens if an application dies unexpectedly;
- see what happens if an application's resource reservation request exceeds the available resources.
For simplicity and quick installation, all components of the Mesos architecture will run within Docker containers.
What is Mesos?
Mesos is an open source framework that provides a distributed computing system. It offers applications (e.g. Hadoop, Spark, Kafka, Elasticsearch) APIs for resource management and scheduling across entire datacenters and cloud environments.
The Mesos agents advertise their available resources (CPU, RAM, …) to the master, which relays those offers to frameworks such as Marathon, Hadoop, Jenkins, and many more. A framework may reserve all or part of the offered resources and run its applications on the Mesos agents (slaves).
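To make the offer mechanism a bit more concrete: a resource offer is essentially a structured description of an agent's free resources. The following JSON sketch is purely illustrative (the field shapes follow the Mesos scheduler API; the identifiers and the hostname are placeholders, while the resource values match what we will see our agent advertise in Step 3.3):

{
  "id": { "value": "<offer-id>" },
  "framework_id": { "value": "<framework-id>" },
  "agent_id": { "value": "<agent-id>" },
  "hostname": "mesos-agent-1",
  "resources": [
    { "name": "cpus",  "type": "SCALAR", "scalar": { "value": 2.0 } },
    { "name": "mem",   "type": "SCALAR", "scalar": { "value": 2928.0 } },
    { "name": "disk",  "type": "SCALAR", "scalar": { "value": 1902607.0 } },
    { "name": "ports", "type": "RANGES", "ranges": { "range": [ { "begin": 31000, "end": 32000 } ] } }
  ]
}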
What is Marathon?
Mesosphere, the owner of Marathon, calls it “a production-grade container orchestration platform for Mesosphere’s Datacenter Operating System (DC/OS) and Apache Mesos.”
Among other features, it offers:
- active-standby redundancy for increased availability;
- container support: Marathon starts containers on the Mesos agents, and both Mesos containers (using cgroups) and Docker containers are supported;
- a powerful GUI;
- a REST API for easier integration (see the example below).
A more complete feature list can be found here.
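As a quick taste of that REST API: once Marathon is running (we will start it on port 7070 in Step 5), its state can be queried with plain HTTP. A minimal sketch, assuming Marathon is reachable on localhost:7070:

# list all applications known to Marathon (returns JSON)
(dockerhost)$ curl http://localhost:7070/v2/apps
# show version and configuration information about the Marathon instance
(dockerhost)$ curl http://localhost:7070/v2/info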
Compared to other schedulers like Apache Aurora, which is used by Twitter, Marathon seems to be much easier to handle. On the other hand, Aurora offers elaborate prioritization and preemption features. Those may be important if the same resources are shared between production and development: if a production workload does not find any resources on a Mesos slave, Aurora will kill off less important applications in order to free up resources.
A good comparison between Mesosphere Marathon and Apache Aurora can be found on this Hootsuite Development’s web page.
Target Configuration for this Blog Post
In this “Hello World” example, we will create a simple Mesos configuration with a Marathon framework, a single Zookeeper, a single master and a single agent (slave):
We will also show what happens, if we kill a Marathon app.
Versions & Tools used
- Vagrant 1.8.6
- Virtualbox 5.0.20
- Docker 1.12.1
- Mesos 1.1.0 (mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404)
- Marathon 1.3.6 (mesosphere/marathon:latest; was v1.3.6 at time of test)
- ZooKeeper/Exhibitor 1.5.2 (netflixoss/exhibitor:1.5.2)
Prerequisites:
- >~4 GB RAM: after starting a Mesos master, a Mesos agent (slave), a ZooKeeper, and a Marathon Docker container, I have observed a DRAM usage of ~3.8 GB
Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH
If you are using an existing Docker host, make sure that your host has enough free memory.
We will run the applications in Docker containers in order to allow for maximum interoperability. This way, we can always use the latest versions without needing to manage the installed Java version.
If you are new to Docker, you might want to read this blog post.
Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here that is much quicker than the one described in Docker’s official documentation:
Prerequisites of this step:
- I recommend having direct access to the Internet: through a firewall, but without an HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
- Administration rights on your computer.
Steps to install a Docker Host VirtualBox VM:
1. Download and install VirtualBox (if the installation fails with the error message “<to be completed>”, see Appendix A of this blog post: Virtualbox Installation Workaround)
2. Download and install Vagrant (requires a reboot)
3. Download the Vagrant box containing an Ubuntu-based Docker host and create a VirtualBox VM as follows:
basesystem# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
basesystem# vagrant init williamyeh/ubuntu-trusty64-docker
basesystem# vagrant up
basesystem# vagrant ssh
Now you are logged into the Docker host, and we are ready for the next step: downloading the Docker images.
Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. On Windows, consider following Appendix C of this blog post and using putty instead.
Step 2 (recommended): Download Docker Images
This step is optional, since the download is performed automatically by any docker run command if the image is not yet available on the Docker host. However, I recommend downloading the images in advance, so that when you run the applications, you can observe the logs and other feedback (e.g. syntax errors) immediately.
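If you want to fetch all images of this blog post in one go, a small shell loop on the Docker host will do; a sketch using the tags from the version list above:

(dockerhost)$ for image in \
    netflixoss/exhibitor:1.5.2 \
    mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404 \
    mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404 \
    mesosphere/marathon:latest
do
  sudo docker pull "$image"
done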
Step 2.1 Download Zookeeper
By looking at the Mesosphere GitHub documentation, it seems that a ZooKeeper (Exhibitor) instance is always needed. The only tag available as of today is 1.5.2. Let us download it first:
(dockerhost)$ sudo docker pull netflixoss/exhibitor:1.5.2
1.5.2: Pulling from netflixoss/exhibitor
a3ed95caeb02: Pull complete
831a6feb5ab2: Pull complete
b32559aac4de: Pull complete
5e99535a7b44: Pull complete
aa076096cff1: Pull complete
423664404a49: Pull complete
929c1efe4d14: Pull complete
387bf8857f2e: Pull complete
5efe9ea3de0d: Pull complete
a53f74fd9d17: Pull complete
78b42a885be7: Pull complete
684d8691844e: Pull complete
Digest: sha256:9b384a431d2e231f0bd3fcda5eff20d5eabd5ba1e3361764a4834d3401fbc4d4
Status: Downloaded newer image for netflixoss/exhibitor:1.5.2
Step 2.2 Download Mesos Master
It is quite confusing how many distributions of Mesos exist. The Docker Hub distribution from Mesoscloud seems to have the most stars:
$ sudo docker search mesos
NAME                      DESCRIPTION    STARS   OFFICIAL   AUTOMATED
mesoscloud/mesos-master   Mesos Master   50                 [OK]
mesoscloud/mesos-slave    Mesos Slave    31                 [OK]
...
However, if I search for mesos-master, I find another image with even more stars and downloads:
$ sudo docker search mesos-master
NAME                      DESCRIPTION              STARS   OFFICIAL   AUTOMATED
mesosphere/mesos-master   Mesos-Master in Docker   71
mesoscloud/mesos-master   Mesos Master             50                 [OK]
...
Mesosphere also offers a two-in-one solution with master and slave combined in a single container: mesosphere/mesos. However, the recommended way of running Mesos is to run the master on one Docker host and the slaves on other Docker hosts.
Let us download the images mesosphere/mesos-master and mesosphere/mesos-slave. Note that a “latest” tag does not exist, so we need to specify the tag explicitly. Let us try the latest version I have found to exist for both master and slave, i.e. 1.1.01.1.0-2.0.107.ubuntu1404.
(dockerhost)$ docker pull mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404
1.1.01.1.0-2.0.107.ubuntu1404: Pulling from mesosphere/mesos-master
bf5d46315322: Already exists
9f13e0ac480c: Already exists
e8988b5b3097: Already exists
40af181810e7: Already exists
e6f7c7e5c03e: Already exists
a3ed95caeb02: Already exists
01a862c74d96: Already exists
651b06ceb77e: Already exists
Digest: sha256:a011e002d641c6ba8361c542bd9429af721b7c7434598a9615cbd5b05511af7f
Status: Downloaded newer image for mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404
The version of the downloaded Mesos master image can be checked with the following command:
(dockerhost)$ sudo docker run -it --rm --name mesos-master mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404 --version
mesos 1.1.0
We are currently using version 1.1.0.
Step 2.3 Download Mesos Slave
(dockerhost)$ docker pull mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404
1.1.01.1.0-2.0.107.ubuntu1404: Pulling from mesosphere/mesos-slave
bf5d46315322: Already exists
9f13e0ac480c: Already exists
e8988b5b3097: Already exists
40af181810e7: Already exists
e6f7c7e5c03e: Already exists
a3ed95caeb02: Already exists
01a862c74d96: Already exists
651b06ceb77e: Already exists
Digest: sha256:bb75cc78c6880a2faa5307e3d8caa806105c673e9002429e60e3ae858d162999
Status: Downloaded newer image for mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404
Step 2.4 Download Marathon
While Mesos offers computing resources, Marathon is a framework that asks for those resources.
(dockerhost)$ sudo docker pull mesosphere/marathon
Using default tag: latest
latest: Pulling from mesosphere/marathon
43c265008fae: Pull complete
af36d2c7a148: Pull complete
143e9d501644: Pull complete
bfc4cdbc8d81: Pull complete
38c6fc3e9968: Pull complete
0bfa8d5153bb: Pull complete
05bc8d0fffca: Pull complete
f1266a2a7ecb: Pull complete
f505e7ed4b7e: Pull complete
219f8c7fc022: Pull complete
Digest: sha256:9c881ff6f46a0da69f622a19a1677f1424a12ef37d076ec439854f15b97179fa
Status: Downloaded newer image for mesosphere/marathon:latest
Marathon does not offer a --version option or similar. The Marathon version can only be seen in the log when running Marathon:
(dockerhost)$ sudo docker run -it --net=host -v `pwd`:/work_dir --entrypoint=bash mesosphere/marathon
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-12 21:01:46,268] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
Step 3: Run Mesos
In this step, we will run the Mesos master interactively (with the -it switch instead of the -d switch) to better see what is happening. In a production environment, you would use the detached mode -d instead of the interactive terminal mode -it.
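The difference is only in the run flags; the rest of the command stays the same. A generic sketch (<image> and <name> are placeholders):

# interactive: logs are printed to the terminal; Ctrl-C stops the container
docker run -it --net=host --name <name> <image>
# detached: the container runs in the background; follow its logs with 'docker logs -f <name>'
docker run -d --net=host --name <name> <image>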
By analyzing the Dockerfiles of the mesos-master and mesos-slave images, we have found that both are based on mesosphere/mesos, with different entrypoints and commands:
- mesosphere/mesos-master: entrypoint mesos-master with the default option --registry=in_memory
- mesosphere/mesos-slave: entrypoint mesos-slave with no default options
Interestingly, the Dockerfiles specify no exposed ports. The reason is that the Docker images are supposed to run in host network mode, thereby sharing the network interface, including IP address(es) and ports, with the Docker host.
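In other words, with --net=host no -p port mappings are needed, because the container binds directly to the Docker host's interfaces. A quick sketch of the difference (<image> is a placeholder):

# host mode: a service listening on port 5050 inside the container is
# immediately reachable on port 5050 of the Docker host
docker run --net=host <image>
# bridge mode (the default): the container has its own network namespace,
# so ports must be published explicitly
docker run -p 5050:5050 <image>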
The usage of the Docker images is documented on GitHub. It seems that a ZooKeeper instance is needed:
Step 3.1: Run Zookeeper (Exhibitor) interactively in a Container
In order to better see what is happening, we will run the ZooKeeper in interactive terminal mode (-it) instead of the detached mode (-d) described in the documentation. With the --net=host option, we share the network with the Docker host, so we do not need to explicitly expose the TCP ports used:
(dockerhost)$ sudo docker run -it --net=host --name=zookeeper netflixoss/exhibitor:1.5.2
v1.0
INFO com.netflix.exhibitor.core.activity.ActivityLog Exhibitor started [main]
Dec 12, 2016 5:05:28 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
INFO org.mortbay.log Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog [main]
INFO org.mortbay.log jetty-1.0 [main]
Dec 12, 2016 5:05:29 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9.1 09/14/2011 02:36 PM'
INFO org.mortbay.log Started SocketConnector@0.0.0.0:8080 [main]
INFO com.netflix.exhibitor.core.activity.ActivityLog State: down [ActivityQueue-0]
Dec 12, 2016 5:05:30 PM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /root/.java/.userPrefs/prefs.xml
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to stop instance [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog jps didn't find instance - assuming ZK is not running [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Starting in standalone mode [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: JMX enabled by default [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog Process started via: /zookeeper/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Using config: /zookeeper/bin/../conf/zoo.cfg [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog State: serving [ActivityQueue-0]
In the log, we see that the ZooKeeper's Exhibitor is using port 8080. Therefore, we will see in Step 5 that the TCP port for Marathon needs to be changed in order to avoid a port resource collision.
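If you want to verify the ports in use on the Docker host, something like the following should work (a sketch; it assumes netstat is installed on the Docker host, and 2181 is ZooKeeper's standard client port):

# list listening TCP sockets; 8080 is Exhibitor's web UI, 2181 is ZooKeeper itself
(dockerhost)$ sudo netstat -tlnp | grep -E ':(8080|2181)'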
Step 3.2: Run Master interactively in a Container
Here again, we are following the documentation of the images, but we are starting the Mesos master in interactive mode in a different terminal in order to see all logs, and we use the latest version 1.1.0 instead of 0.28.0:
(dockerhost)$ sudo docker run -it --net=host \
    --name mesos-master \
    -e MESOS_PORT=5050 \
    -e MESOS_ZK=zk://127.0.0.1:2181/mesos \
    -e MESOS_QUORUM=1 \
    -e MESOS_REGISTRY=in_memory \
    -e MESOS_LOG_DIR=/var/log/mesos \
    -e MESOS_WORK_DIR=/var/tmp/mesos \
    -v "$(pwd)/log/mesos:/var/log/mesos" \
    -v "$(pwd)/tmp/mesos:/var/tmp/mesos" \
    mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 17:10:44.898630 1 main.cpp:263] Build: 2016-11-16 01:30:23 by ubuntu
I1212 17:10:44.898900 1 main.cpp:264] Version: 1.1.0
I1212 17:10:44.898916 1 main.cpp:267] Git tag: 1.1.0
I1212 17:10:44.898927 1 main.cpp:271] Git SHA: a44b077ea0df54b77f05550979e1e97f39b15873
I1212 17:10:44.903816 1 logging.cpp:194] INFO level logging started!
I1212 17:10:44.904436 1 main.cpp:370] Using 'HierarchicalDRF' allocator
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-12 17:10:44,907:1(0x7ff4a6961700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff4b16c6200 sessionId=0 sessionPasswd=<null> context=0x7ff490000930 flags=0
I1212 17:10:44.909833 11 master.cpp:380] Master 917a95ab-7b77-4316-8e52-1431a8043af3 (openshift-installer-native-docker-compose) started on 127.0.0.1:5050
I1212 17:10:44.912077 11 master.cpp:382] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --port="5050" --quiet="false" --quorum="1" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/tmp/mesos" --zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
2016-12-12 17:10:44,909:1(0x7ff4a8164700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff4b16c6200 sessionId=0 sessionPasswd=<null> context=0x7ff4980038a0 flags=0
W1212 17:10:44.913079 11 master.cpp:385] **************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 17:10:44.914429 11 master.cpp:434] Master allowing unauthenticated frameworks to register
I1212 17:10:44.914448 11 master.cpp:448] Master allowing unauthenticated agents to register
I1212 17:10:44.914455 11 master.cpp:462] Master allowing HTTP frameworks to register without authentication
I1212 17:10:44.914474 11 master.cpp:504] Using default 'crammd5' authenticator
W1212 17:10:44.914487 11 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1212 17:10:44.914687 11 authenticator.cpp:519] Initializing server SASL
2016-12-12 17:10:44,922:1(0x7ff497fff700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:10:44,923:1(0x7ff497fff700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0003, negotiated timeout=10000
2016-12-12 17:10:44,923:1(0x7ff4977fe700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:10:44,924:1(0x7ff4977fe700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0002, negotiated timeout=10000
I1212 17:10:44.924991 8 group.cpp:340] Group process (zookeeper-group(2)@127.0.0.1:5050) connected to ZooKeeper
I1212 17:10:44.925303 8 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:10:44.925424 8 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:10:44.925204 5 group.cpp:340] Group process (zookeeper-group(1)@127.0.0.1:5050) connected to ZooKeeper
I1212 17:10:44.925606 5 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:10:44.925617 5 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:10:44.931725 8 contender.cpp:152] Joining the ZK group
I1212 17:10:44.932301 11 master.cpp:1951] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1212 17:10:44.936324 9 contender.cpp:268] New candidate (id='1') has entered the contest for leadership
I1212 17:10:44.937194 5 detector.cpp:152] Detected a new leader: (id='1')
I1212 17:10:44.937408 5 group.cpp:697] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1212 17:10:44.939241 5 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:10:44.939414 5 master.cpp:2017] Elected as the leading master!
I1212 17:10:44.939437 5 master.cpp:1560] Recovering from registrar
I1212 17:10:44.941402 5 registrar.cpp:362] Successfully fetched the registry (0B) in 1.7152ms
I1212 17:10:44.941573 5 registrar.cpp:461] Applied 1 operations in 5462ns; attempting to update the registry
I1212 17:10:44.946907 5 registrar.cpp:506] Successfully updated the registry in 5.135104ms
I1212 17:10:44.947170 5 registrar.cpp:392] Successfully recovered registrar
I1212 17:10:44.947314 5 master.cpp:1676] Recovered 0 agents from the registry (184B); allowing 10mins for agents to re-register
2016-12-12 17:11:11,640:1(0x7ff497fff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 14ms
2016-12-12 17:11:11,641:1(0x7ff4977fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 15ms
2016-12-12 17:11:35,045:1(0x7ff497fff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 50ms
2016-12-12 17:11:35,046:1(0x7ff4977fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 51ms
Step 3.3: Run Slave interactively in a Container
Here again, we are following the documentation of the images, but we are starting the Mesos slave in interactive mode (-it) in a third terminal in order to see all logs, and we use the latest version 1.1.0 instead of 0.28.0:
(dockerhost)$ sudo docker run -it --net=host --privileged \
    -e MESOS_PORT=5051 \
    -e MESOS_MASTER=zk://127.0.0.1:2181/mesos \
    -e MESOS_SWITCH_USER=0 \
    -e MESOS_CONTAINERIZERS=docker,mesos \
    -e MESOS_LOG_DIR=/var/log/mesos \
    -e MESOS_WORK_DIR=/var/tmp/mesos \
    -v "$(pwd)/log/mesos:/var/log/mesos" \
    -v "$(pwd)/tmp/mesos:/var/tmp/mesos" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /cgroup:/cgroup \
    -v /sys:/sys \
    -v /usr/local/bin/docker:/usr/local/bin/docker \
    mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 17:18:39.260704 1 main.cpp:243] Build: 2016-11-16 01:30:23 by ubuntu
I1212 17:18:39.261031 1 main.cpp:244] Version: 1.1.0
I1212 17:18:39.261075 1 main.cpp:247] Git tag: 1.1.0
I1212 17:18:39.261108 1 main.cpp:251] Git SHA: a44b077ea0df54b77f05550979e1e97f39b15873
I1212 17:18:39.265000 1 logging.cpp:194] INFO level logging started!
I1212 17:18:39.400902 1 containerizer.cpp:200] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I1212 17:18:39.429229 1 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
2016-12-12 17:18:39,438:1(0x7ff8f7601700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
I1212 17:18:39.438886 1 slave.cpp:208] Mesos agent started on (1)@127.0.0.1:5051
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
I1212 17:18:39.439324 1 slave.cpp:209] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://127.0.0.1:2181/mesos" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/tmp/mesos"
2016-12-12 17:18:39,441:1(0x7ff8f7601700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:18:39,442:1(0x7ff8f7601700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:18:39,442:1(0x7ff8f7601700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff9023a0200 sessionId=0 sessionPasswd=<null> context=0x7ff8d4001e00 flags=0
2016-12-12 17:18:39,445:1(0x7ff8f4dfc700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:18:39,446:1(0x7ff8f4dfc700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0004, negotiated timeout=10000
W1212 17:18:39.440688 1 slave.cpp:212] **************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 17:18:39.448009 8 group.cpp:340] Group process (zookeeper-group(1)@127.0.0.1:5051) connected to ZooKeeper
I1212 17:18:39.448274 8 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:18:39.448418 8 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:18:39.450460 1 slave.cpp:533] Agent resources: cpus(*):2; mem(*):2928; disk(*):1902607; ports(*):[31000-32000]
I1212 17:18:39.452038 8 detector.cpp:152] Detected a new leader: (id='1')
I1212 17:18:39.452385 8 group.cpp:697] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1212 17:18:39.452198 1 slave.cpp:541] Agent attributes: [ ]
I1212 17:18:39.454398 1 slave.cpp:546] Agent hostname: openshift-installer-native-docker-compose
I1212 17:18:39.459702 8 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:18:39.462762 9 state.cpp:57] Recovering state from '/var/tmp/mesos/meta'
I1212 17:18:39.464331 9 status_update_manager.cpp:203] Recovering status update manager
I1212 17:18:39.465109 9 docker.cpp:764] Recovering Docker containers
I1212 17:18:39.465148 11 containerizer.cpp:555] Recovering containerizer
I1212 17:18:39.472216 6 provisioner.cpp:253] Provisioner recovery complete
I1212 17:18:39.917292 8 slave.cpp:5281] Finished recovery
I1212 17:18:39.938516 8 slave.cpp:915] New master detected at master@127.0.0.1:5050
I1212 17:18:39.939077 8 slave.cpp:936] No credentials provided. Attempting to register without authentication
I1212 17:18:39.939728 8 slave.cpp:947] Detecting new master
I1212 17:18:39.938575 9 status_update_manager.cpp:177] Pausing sending status updates
E1212 17:18:40.622269 8 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:40.631978 8 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:40.634527 8 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:40.632072 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E1212 17:18:42.052165 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:42.052312 7 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:42.057737 7 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
I1212 17:18:45.093793 6 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:45.094339 6 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:45.093952 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:48.748410 9 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:48.748883 9 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:48.748569 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
2016-12-12 17:18:49,470:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 12ms
I1212 17:18:50.030248 8 detector.cpp:152] Detected a new leader: None
I1212 17:18:50.030894 8 slave.cpp:908] Lost leading master
I1212 17:18:50.031404 8 slave.cpp:947] Detecting new master
I1212 17:18:50.031327 7 status_update_manager.cpp:177] Pausing sending status updates
2016-12-12 17:18:53,405:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 43ms
I1212 17:19:02.637687 7 slave.cpp:1324] Skipping registration because no master present
I1212 17:19:22.959288 8 detector.cpp:152] Detected a new leader: (id='2')
I1212 17:19:22.961132 8 group.cpp:697] Trying to get '/mesos/json.info_0000000002' in ZooKeeper
I1212 17:19:22.965332 8 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:19:22.971281 8 slave.cpp:915] New master detected at master@127.0.0.1:5050
I1212 17:19:22.972019 8 slave.cpp:936] No credentials provided. Attempting to register without authentication
I1212 17:19:22.975116 8 slave.cpp:947] Detecting new master
I1212 17:19:22.971410 9 status_update_manager.cpp:177] Pausing sending status updates
I1212 17:19:23.927111 11 slave.cpp:1115] Registered with master master@127.0.0.1:5050; given agent ID a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0
I1212 17:19:23.931748 12 status_update_manager.cpp:184] Resuming sending status updates
I1212 17:19:24.781162 11 slave.cpp:1175] Forwarding total oversubscribed resources {}
I1212 17:19:39.456789 10 slave.cpp:5044] Current disk usage 62.36%. Max allowed age: 1.934608332272824days
2016-12-12 17:19:46,345:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 20ms
2016-12-12 17:20:19,750:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 41ms
Step 4: Connect to Mesos Web Portal
The Mesos web portal runs on port 5050 by default. In my case, I am running Mesos in Docker containers, which are hosted by a VirtualBox Docker host VM. Since the VM only has a NATed virtual Ethernet interface, I need to specify a port forwarding rule in VirtualBox so I can reach port 5050 of the Docker host:
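Instead of clicking through the VirtualBox GUI, the forwarding rules can also be added on the command line with VBoxManage. A sketch, assuming the running VM is named "ubuntu-trusty64-docker" (check the actual name with VBoxManage list runningvms) and uses NAT adapter 1; we add port 7070 for Marathon (needed in Step 5) right away:

basesystem# VBoxManage controlvm "ubuntu-trusty64-docker" natpf1 "mesos,tcp,127.0.0.1,5050,,5050"
basesystem# VBoxManage controlvm "ubuntu-trusty64-docker" natpf1 "marathon,tcp,127.0.0.1,7070,,7070"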
After that, we can access the Mesos master portal from a browser on the base system:
Note: if you see the error “Failed to connect to 127.0.0.1:5050!”, try reloading (instead of retrying) the page via the browser reload function. See Appendix A.1 for details.
We can see the single Mesos slave (i.e. agent) connected to the master on the left:
The details of the agents (slaves) can be seen by clicking on the Agents link in the menu:
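The same information is also available from the master's HTTP API, which is handy for scripting. A sketch, assuming the port forwarding from above is in place and Python is available for pretty-printing:

# dump the master's full state as JSON (registered agents, frameworks, tasks, ...)
basesystem$ curl -s http://localhost:5050/state | python -m json.tool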
So, what is next? It looks like we need a framework like Marathon in order to start applications on Mesos. Let us start Marathon now.
Step 5: Start Marathon
Let us start Marathon in a container as described here. However, we will use port 7070 instead of 8080, because port 8080 collides with the port used by the ZooKeeper's Exhibitor. In addition, we will change the master URI to point to the external Mesos master we started in Step 3. Moreover, we need to set the environment variable MESOS_WORK_DIR because of a Mesos bug; see Appendix B for details.
(dockerhost)$ sudo docker run -it --name marathon --rm --net=host -e MESOS_WORK_DIR=/var/lib/mesos --entrypoint=bash mesosphere/marathon
(container)# ./bin/start --master zk://127.0.0.1:2181/mesos --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-13 14:46:09,798] INFO Starting Marathon 1.3.6/unknown with --master zk://127.0.0.1:2181/mesos --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
[2016-12-13 14:46:10,148] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main)
[2016-12-13 14:46:10,336] INFO Logging initialized @1841ms (org.eclipse.jetty.util.log:main)
[2016-12-13 14:46:10,849] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 14:46:11,060] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps: * continueOnError(notifyHealthCheckManager) * continueOnError(notifyRateLimiter) * continueOnError(notifyLaunchQueue) * continueOnError(emitUpdate) * continueOnError(postTaskStatusEvent) * continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main)
[2016-12-13 14:46:11,128] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main)
[2016-12-13 14:46:11,195] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,200] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,202] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,203] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,205] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main)
[2016-12-13 14:46:11,321] INFO All actors suspended: * Actor[akka://marathon/user/groupManager#1830533307] * Actor[akka://marathon/user/launchQueue#-485770292] * Actor[akka://marathon/user/killOverdueStagedTasks#540935184] * Actor[akka://marathon/user/offersWantedForReconciliation#-876030651] * Actor[akka://marathon/user/taskKillServiceActor#1912899513] * Actor[akka://marathon/user/rateLimiter#1490654495] * Actor[akka://marathon/user/offerMatcherLaunchTokens#-1930600924] * Actor[akka://marathon/user/expungeOverdueLostTasks#-405721147] * Actor[akka://marathon/user/reviveOffersWhenWanted#1467955826] * Actor[akka://marathon/user/taskTracker#623538008] * Actor[akka://marathon/user/offerMatcherManager#1760835888] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 14:46:11,417] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main)
[2016-12-13 14:46:11,418] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main)
[2016-12-13 14:46:11,497] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,497] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,498] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,504] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,535] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:11,704] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,705] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,710] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,765] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,782] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,027] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,028] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,073] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,096] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.free=110MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.total=158MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@3411e275 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,194] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,201] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,212] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90014, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,217] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread)
[2016-12-13 14:46:12,279] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-13 14:46:12,284] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,302] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@5301f666 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1)
[2016-12-13 14:46:12,310] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,310] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,323] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90015, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,367] INFO Binding mesosphere.marathon.api.v2.AppsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,391] INFO Binding mesosphere.marathon.api.v2.TasksResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,398] INFO Binding mesosphere.marathon.api.v2.EventSubscriptionsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,398] INFO Event notification disabled. (mesosphere.marathon.core.event.EventModule:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,404] INFO Binding mesosphere.marathon.api.v2.QueueResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,425] INFO Binding mesosphere.marathon.api.v2.GroupsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,429] INFO Binding mesosphere.marathon.api.v2.InfoResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,431] INFO Binding mesosphere.marathon.api.v2.LeaderResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,433] INFO Binding mesosphere.marathon.api.v2.DeploymentsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,441] INFO Binding mesosphere.marathon.api.v2.ArtifactsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,443] INFO Binding mesosphere.marathon.api.v2.SchemaResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,445] INFO Binding mesosphere.marathon.api.v2.PluginsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,445] INFO Loading plugins implementing 'mesosphere.marathon.plugin.http.HttpRequestHandler' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,446] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,449] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,451] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,468] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,469] INFO Started o.e.j.s.ServletContextHandler@65817241{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,484] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-13 14:46:12,487] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,489] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-13 14:46:12,492] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,494] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,514] INFO All actors active: * Actor[akka://marathon/user/groupManager#1830533307] * Actor[akka://marathon/user/launchQueue#-485770292] * Actor[akka://marathon/user/killOverdueStagedTasks#540935184] * Actor[akka://marathon/user/offersWantedForReconciliation#-876030651] * Actor[akka://marathon/user/taskKillServiceActor#1912899513] * Actor[akka://marathon/user/rateLimiter#1490654495] * Actor[akka://marathon/user/offerMatcherLaunchTokens#-1930600924] * Actor[akka://marathon/user/expungeOverdueLostTasks#-405721147] * Actor[akka://marathon/user/reviveOffersWhenWanted#1467955826] * Actor[akka://marathon/user/taskTracker#623538008] * Actor[akka://marathon/user/offerMatcherManager#1760835888] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 14:46:12,520] INFO Started ServerConnector@7fc3545e{HTTP/1.1,[http/1.1]}{0.0.0.0:7070} (org.eclipse.jetty.server.ServerConnector:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,520] INFO Started @4026ms (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,521] INFO All services up and running. (mesosphere.marathon.Main$:main)
[2016-12-13 14:46:12,524] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,536] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,545] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:12,555] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-13T14:46:29.995Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,588] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO => Schedule next revive at 2016-12-13T14:46:17.586Z in 4998 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,638] INFO Create new Scheduler Driver with frameworkId: Some(value: "44a35e16-dc32-4f91-afac-33dfff498944-0000" ) and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@15d42ccb (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1213 14:46:12.777942 192 sched.cpp:1697] **************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
**************************************************
I1213 14:46:12.791020 196 sched.cpp:226] Version: 1.0.1
2016-12-13 14:46:12,791:134(0x7f351b0c7700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-13 14:46:12,791:134(0x7f351b0c7700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@767: Client environment:user.dir=/marathon
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7f3522fb1d80 sessionId=0 sessionPasswd=<null> context=0x7f352403b5c0 flags=0
[2016-12-13 14:46:12,799] INFO Reset offerLeadership backoff (mesosphere.marathon.core.election.impl.ExponentialBackoff:pool-1-thread-1)
[2016-12-13 14:46:12,799] INFO Became active. Accepting event streaming requests. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,800] INFO Starting scheduler actor (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-8)
2016-12-13 14:46:12,803:134(0x7f35198c4700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
[2016-12-13 14:46:12,811] INFO Scheduler actor ready (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-6)
2016-12-13 14:46:12,826:134(0x7f35198c4700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f4c9e1f90016, negotiated timeout=10000
I1213 14:46:12.828322 205 group.cpp:349] Group process (group(1)@127.0.0.1:35047) connected to ZooKeeper
I1213 14:46:12.829164 205 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1213 14:46:12.829583 205 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I1213 14:46:12.833498 205 detector.cpp:152] Detected a new leader: (id='1')
I1213 14:46:12.834187 205 group.cpp:706] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1213 14:46:12.836730 205 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1213 14:46:12.837736 205 sched.cpp:330] New master detected at master@127.0.0.1:5050
I1213 14:46:12.838312 205 sched.cpp:341] No credentials provided. Attempting to register without authentication
I1213 14:46:12.843353 205 sched.cpp:743] Framework registered with 44a35e16-dc32-4f91-afac-33dfff498944-0000
[2016-12-13 14:46:12,846] INFO Creating tombstone for old twitter commons leader election (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-13 14:46:12,872] INFO Registered as 44a35e16-dc32-4f91-afac-33dfff498944-0000 to master '03f81417-51c9-4055-adbe-c1fb74fc8ab4' (mesosphere.marathon.MarathonScheduler$$EnhancerByGuice$$1ef061b0:Thread-14)
[2016-12-13 14:46:12,872] INFO Store framework id: value: "44a35e16-dc32-4f91-afac-33dfff498944-0000" (mesosphere.util.state.FrameworkIdUtil:Thread-14)
[2016-12-13 14:46:12,895] INFO Received reviveOffers notification: SchedulerRegisteredEvent (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-13 14:46:16,658] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/deployments HTTP/1.1" 200 22 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 14 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-35)
[2016-12-13 14:46:16,660] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/groups?embed=group.groups&embed=group.apps&embed=group.apps.deployments&embed=group.apps.counts&embed=group.apps.readiness HTTP/1.1" 200 95 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 15 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-36)
[2016-12-13 14:46:16,666] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/queue HTTP/1.1" 200 32 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 15 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-34)
[2016-12-13 14:46:17,603] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,604] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,609] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,612] INFO => Schedule next revive at 2016-12-13T14:46:22.602Z in 4998 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
Let us connect to the Marathon portal:
We have chosen Marathon to run on port 7070. In my case, the Docker host VirtualBox VM has a NATed interface only, so I need to open the Marathon port on VirtualBox:
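A sketch of such a NAT port-forwarding rule on the command line; the VM name "default" is an assumption (check the output of VBoxManage list runningvms), and the same result can be achieved with a forwarded_port entry in the Vagrantfile:

# sketch: forward host port 7070 to guest port 7070 on the running VM's NAT adapter 1
$ VBoxManage controlvm "default" natpf1 "marathon,tcp,,7070,,7070"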
Now we can access the Marathon dashboard:
Let us see whether Marathon is visible on Mesos:
Yes, the Marathon framework can be seen on the Mesos master; perfect.
Our topology now looks as follows:
However, we have not started any Marathon application yet. Let us do so now.
Step 6: Start a “Hello World” Application via Marathon Web Portal in a Mesos Container
Let us create an application running in a Mesos container via the Marathon web portal by clicking the “Create Application” button.
We choose
- ID: while-loop-hello-world
- ID needs to consist of lowercase letters, digits, hyphens, “.”, “..”
- CPUs: 0.1
- If you keep the default of 1 CPU, you might hit resource problems: if fewer than 4 CPUs are available for the 4 instances, one or more instances may be stuck in the “Waiting” state.
- Memory: 32 MiB
- 32 MiB is the minimum value supported
- Disk Space: 0
- Instances: 4
- Command:
while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
- we have redirected the output of the script, because STDERR/STDOUT retrieval currently does not work on the Marathon portal (see Appendix A.5). An equivalent app definition can also be posted to Marathon’s REST API, as sketched below.
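As an alternative to the portal, the same application can be created through Marathon’s REST API; a minimal sketch, assuming Marathon listens on localhost:7070 as configured above (the JSON fields mirror the portal settings):

# sketch: create the same app via Marathon's v2 REST API
$ curl -X POST http://localhost:7070/v2/apps \
    -H 'Content-Type: application/json' \
    -d '{
          "id": "while-loop-hello-world",
          "cpus": 0.1,
          "mem": 32,
          "disk": 0,
          "instances": 4,
          "cmd": "while true; do echo \"I am a Hello World script\"; sleep 1; done 2>&1 1> this_is_my_output.log"
        }'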
After less than five seconds, we see that all 4 instances are up and running:
On the Docker host, we can see the four instances running:
$ ps -ef | grep Hello | grep -v grep
root 14137 14095 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
We can also find the log files on the Docker host:
$ sudo find / -name this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.927a5843-c517-11e6-ad85-02422b10e522/runs/0f079dcc-25e7-4ad6-b263-4761370a6173/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.9285a2e4-c517-11e6-ad85-02422b10e522/runs/44a5c7dd-59f3-4b59-8d94-33754a3cbdc3/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.92861815-c517-11e6-ad85-02422b10e522/runs/5400a1d9-582f-4d59-9b81-a719d21c60a1/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.9286db66-c517-11e6-ad85-02422b10e522/runs/f3145a9f-2c88-4f99-a8de-618a4157b9e8/this_is_my_output.log
And we can see the output:
$ sudo tail -F /vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.927a5843-c517-11e6-ad85-02422b10e522/runs/0f079dcc-25e7-4ad6-b263-4761370a6173/this_is_my_output.log
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
...
Step 7: Test of Application Resiliency
Step 7.1: Killing a Process and Testing whether it is Restarted Automatically
Now, let us see what happens if one of the processes dies. For that, we will simply kill one of the “Hello World” processes found on the Docker host:
(dockerhost)$ ps -ef | grep Hello | grep -v grep
root 14137 14095 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
(dockerhost)$ sudo kill -9 14137
(dockerhost)$ ps -ef | grep Hello | grep -v grep
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 19818 19807 1 12:04 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
We can clearly see that process 14137 was killed successfully, and that a new process (PID 19818) was started automatically in its place.
Step 7.2: Find Traces of the killed Process in the Logs
The output log of the old process can still be retrieved:
$ cat /vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.92861815-c517-11e6-ad85-02422b10e522/runs/5400a1d9-582f-4d59-9b81-a719d21c60a1/this_is_my_output.log
...
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
(EOF)
However, we cannot see any trace that something went wrong. So, how can we troubleshoot if a process dies unexpectedly?
Here we are: the Marathon portal provides us with information on what happened:
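The same information should also be retrievable via the REST API; a minimal sketch, assuming the app ID from Step 6 and that this Marathon version supports the app.lastTaskFailure embed parameter:

# sketch: ask Marathon for the last task failure of the app
$ curl "http://localhost:7070/v2/apps/while-loop-hello-world?embed=app.lastTaskFailure"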
Step 8: Test: Resource Over-subscription not supported by Default?
In this test, we observe what happens if an application instance requests more than the offered resources. For that, we will try to start four instances with 1 CPU core each, while the slave provides only 2 CPUs instead of the required 4 CPUs:
We press “Create Application” again and see that 2 of the 4 application instances are started soon:
Two of the task instances were started successfully, and two instances are waiting for resources and are “Unscheduled”. This is the expected behavior:
On the Mesos Dashboard, we can see that the offered 2 CPUs are used by two instances already, leaving no CPU resources for the other two instances:
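The waiting instances can also be inspected on the command line via Marathon’s launch queue endpoint (a minimal sketch; /v2/queue is the same endpoint the portal polls, as seen in the request log above, and its response lists each waiting app together with its unfulfilled resource demands):

# sketch: show apps waiting for resource offers
$ curl http://localhost:7070/v2/queue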
Aurora vs. Marathon: Consider a situation where you want to run high-priority production applications together with low-priority development applications on the same hardware. Mesosphere Marathon does not provide a good answer to such situations. In case you hit such a situation, consider using Apache Aurora instead of Mesosphere Marathon: Apache Aurora allows high-priority applications to preempt low-priority applications.
Mesos: about hard limits on CPU resources: Mesos’ CPU reservation feels similar to a hard assignment of resources (since over-subscription is not supported by default, as we have seen above), but under the hood, Mesos does not apply hard limits on CPU usage unless the Mesos slave has
--cgroups_enable_cfs
(CFS = Completely Fair Scheduler) enabled. See also the second answer to this StackOverflow question. For more information on over-subscription in Mesos, see this Apache Mesos documentation page.
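For reference, here is a sketch of how the flag could be set when starting the Mesos agent as a Docker container; the image name and tag (mesosphere/mesos-slave, mirroring the master image used in this post) and the additional flags are assumptions, and any volume mounts from your existing agent start command must be kept:

# sketch only: start the Mesos agent with CFS-based hard CPU limits enabled
$ docker run -d --net=host \
    -v /var/run/docker.sock:/var/run/docker.sock \
    mesosphere/mesos-slave:1.1.0-2.0.107.ubuntu1404 \
    --master=zk://127.0.0.1:2181/mesos \
    --work_dir=/var/tmp/mesos \
    --containerizers=docker,mesos \
    --cgroups_enable_cfs=true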
Conclusion: by default, resource over-subscription is not allowed by Mesos. See this Apache Mesos documentation page for more information about over-subscription.
Step 9: Start a “Hello World” Application via Marathon Web Portal in a Docker Container
In this step we learn how to run an application in a Docker container instead of a Mesos container. We will perform similar steps as in Step 6.
We choose
- ID: while-loop-hello-world-container-small
- ID needs to consist of lowercase letters, digits, hyphens, “.”, “..”
- CPUs: 0.1
- If you keep the default of 1 CPU, you might hit resource problems: if fewer than 4 CPUs are available for the 4 instances, one or more instances may be stuck in the “Waiting” state.
- Memory: 32 MiB
- 32 MiB is the minimum value supported
- Disk Space: 0
- Instances: 4
- Command:
while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
- we have redirected the output of the script, because STDERR/STDOUT retrieval currently does not work on the Marathon portal (see Appendix A.5).
- Docker image: ubuntu:latest (an equivalent JSON app definition is sketched after this list)
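Again, a minimal REST sketch of the same settings, assuming Marathon on localhost:7070; the container section tells Marathon to use the Docker containerizer with the ubuntu:latest image, and "network": "HOST" matches the --net host setting observed in the docker run commands below:

# sketch: create the Dockerized app via Marathon's v2 REST API
$ curl -X POST http://localhost:7070/v2/apps \
    -H 'Content-Type: application/json' \
    -d '{
          "id": "while-loop-hello-world-container-small",
          "cpus": 0.1,
          "mem": 32,
          "instances": 4,
          "cmd": "while true; do echo \"I am a Hello World script in a Docker container\"; sleep 1; done 2>&1 1> this_is_my_container_output.log",
          "container": {
            "type": "DOCKER",
            "docker": { "image": "ubuntu:latest", "network": "HOST" }
          }
        }'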
We click “Create Application”
Choose a low number of CPUs (0.1 in my case), the lowest possible memory value, and four instances, together with a while loop as the command:
In the Docker Container tab, we choose an Ubuntu image.
First, we see “waiting”,
then we see that all four instances are “running”:
On the Docker host, we can see the four Docker containers that have been started:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
78e346ddb3eb ubuntu:latest "/bin/sh -c 'while tr" About an hour ago Up About an hour  mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724
809ad6513236 ubuntu:latest "/bin/sh -c 'while tr" About an hour ago Up About an hour  mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b
de7523c00d41 ubuntu:latest "/bin/sh -c 'while tr" About an hour ago Up About an hour  mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9
d3729dad1bed ubuntu:latest "/bin/sh -c 'while tr" About an hour ago Up About an hour  mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718
The processes are:
$ ps -ef | grep Hello | grep -v grep root 19306 19258 0 14:40 ? 00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31730 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27ecd8a-c52f-11e6-ad85-02422b10e522 -e PORT=31730 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31730 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31730 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27ecd8a-c52f-11e6-ad85-02422b10e522/runs/e924e32b-521b-4708-b538-bc2973094718:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log root 19319 19280 0 14:40 ? 00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31873 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27e5859-c52f-11e6-ad85-02422b10e522 -e PORT=31873 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31873 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31873 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27e5859-c52f-11e6-ad85-02422b10e522/runs/c42d9e0c-f66e-42be-ac0e-46bc8f44e47b:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log root 19320 19281 0 14:40 ? 
00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31245 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f2792838-c52f-11e6-ad85-02422b10e522 -e PORT=31245 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31245 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31245 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f2792838-c52f-11e6-ad85-02422b10e522/runs/e7fcebfb-f301-4369-83eb-2782832d83a9:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log root 19368 19352 0 14:40 ? 00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log root 19379 19334 0 14:40 ? 00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31928 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27f90db-c52f-11e6-ad85-02422b10e522 -e PORT=31928 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31928 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31928 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27f90db-c52f-11e6-ad85-02422b10e522/runs/13aeacaa-20d5-4628-8e65-f82a0fc71724:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log root 19422 19406 0 14:40 ? 00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log root 19443 19427 0 14:40 ? 00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log root 19473 19457 0 14:40 ? 00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
We can see that there are four docker run commands and four while loops running. We had assigned 0.1 CPUs to each application instance; this has been translated into a CPU share of 0.1 × 1024 ≈ 102.
Note that, by default, Docker does not hard-limit the CPU a container may use: as long as
--cgroups_enable_cfs
(CFS) is not set on the Mesos agent, only relative CPU shares are configured for the Docker container.
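To double-check which CPU shares were actually configured, we can inspect one of the containers on the Docker host; a short sketch using a container name from the docker ps output above:

# sketch: read the configured CPU shares from the container's HostConfig
$ docker inspect -f '{{ .HostConfig.CpuShares }}' mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724
102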
Appendix A: Errors & Caveats
Appendix A.1: Mesos Portal Error: Failed to connect to 127.0.0.1:5050!
Problem:
After opening the Mesos portal in a browser, we see the following symptom:
After clicking “Try now”, the problem shows up again immediately. After clicking the browser’s page-reload button, it looks better, but the problem soon shows up again.
Status: Open
I have not found any solution yet. I have not yet tested any older version.
Appendix A.2: Critical Error: Mesos Master: Lost leadership… committing suicide!
- Mesos master 1.1.0 run in a Docker container (image mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404)
- ZooKeeper 1.5.2 run in a Docker container (image netflixoss/exhibitor:1.5.2)
Symptoms:
From time to time, we get the following critical error in the log of the Mesos master:
...
2016-12-18 08:15:59,593:22(0x7f18037fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 31198753ms
2016-12-18 08:15:59,595:22(0x7f1820ef8700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 31198755ms
2016-12-18 08:15:59,597:22(0x7f1820ef8700):ZOO_ERROR@handle_socket_error_msg@1735: Socket [127.0.0.1:2181] zk retcode=-4, errno=32(Broken pipe): failed while flushing send queue
2016-12-18 08:15:59,597:22(0x7f18037fe700):ZOO_ERROR@handle_socket_error_msg@1746: Socket [127.0.0.1:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2016-12-18 08:15:59,599:22(0x7f1820ef8700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
I1218 08:15:59.601364 25 group.cpp:451] Lost connection to ZooKeeper, attempting to reconnect ...
I1218 08:15:59.602128 26 group.cpp:451] Lost connection to ZooKeeper, attempting to reconnect ...
2016-12-18 08:15:59,604:22(0x7f1820ef8700):ZOO_ERROR@handle_socket_error_msg@1764: Socket [127.0.0.1:2181] zk retcode=-112, errno=116(Stale file handle): sessionId=0x158f4c9e1f90035 has expired.
I1218 08:15:59.605609 28 group.cpp:510] ZooKeeper session expired
I1218 08:15:59.606210 29 contender.cpp:217] Membership cancelled: 4
2016-12-18 08:15:59,606:22(0x7f1823737700):ZOO_INFO@zookeeper_close@2543: Freeing zookeeper resources for sessionId=0x158f4c9e1f90035
Lost leadership... committing suicide!
2016-12-18 08:15:59,607:22(0x7f182573b700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-18 08:15:59,609:22(0x7f182573b700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-18 08:15:59,609:22(0x7f182573b700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7f182e49c200 sessionId=0 sessionPasswd=<null> context=0x7f1810008470 flags=0
2016-12-18 08:15:59,609:22(0x7f1803fff700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
(container)#
Resolution: None
I have found a similar issue for Marathon here. Both problems seem to be caused by ZooKeeper problems, but it is not clear how to resolve the issue.
Appendix A.3: ZooKeeper: continuous warnings ‘Exceeded deadline’ and ‘Current disk usage’
After starting ZooKeeper, I continuously see log messages like the following (note that the “Exceeded deadline” warnings stem from the ZooKeeper client, while the “Current disk usage” lines are logged by the Mesos slave):
2016-12-18 10:24:14,664:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 77ms
I1218 10:24:58.446185 15 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
I1218 10:25:58.450482 14 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:26:21,523:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 26ms
I1218 10:26:58.454617 9 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:27:58,326:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 15ms
I1218 10:27:58.458858 12 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:28:01,681:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 20ms
Resolution: None
Since the warnings do not seem to be critical, I have not yet dug into the problem.
Appendix A.4: Marathon Portal Error: no Reaction upon Restart Request
Symptoms:
If we try to restart an application, there is no reaction whatsoever:
Running ps -ef on the Docker host yields the same process numbers before and after pressing the Restart button:
(dockerhost)$ ps -ef | grep while | grep -v grep root 13142 29300 0 19:54 ? 00:00:00 mesos-containerizer launch --command={"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"} --environment={"HOST":"openshift-installer-native-docker-compose","LIBPROCESS_PORT":"0","MARATHON_APP_ID":"\/while-loop-hello-world","MARATHON_APP_LABELS":"","MARATHON_APP_RESOURCE_CPUS":"1.0","MARATHON_APP_RESOURCE_DISK":"0.0","MARATHON_APP_RESOURCE_GPUS":"0","MARATHON_APP_RESOURCE_MEM":"32.0","MARATHON_APP_VERSION":"2016-12-16T19:54:30.939Z","MESOS_AGENT_ENDPOINT":"127.0.0.1:5051","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522\/runs\/a67b861b-c20c-4d94-8db5-e956b3dea8ab","MESOS_EXECUTOR_ID":"while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"44a35e16-dc32-4f91-afac-33dfff498944-0000","MESOS_HTTP_COMMAND_EXECUTOR":"0","MESOS_NATIVE_JAVA_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_NATIVE_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522\/runs\/a67b861b-c20c-4d94-8db5-e956b3dea8ab","MESOS_SLAVE_ID":"a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0","MESOS_SLAVE_PID":"slave(1)@127.0.0.1:5051","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","MESOS_TASK_ID":"while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin","PORT":"31690","PORT0":"31690","PORTS":"31690","PORT_10000":"31690"} --help=false --pipe_read=15 --pipe_write=16 --pre_exec_commands=[] --runtime_directory=/var/run/mesos/containers/a67b861b-c20c-4d94-8db5-e956b3dea8ab --unshare_namespace_mnt=false --working_directory=/var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522/runs/a67b861b-c20c-4d94-8db5-e956b3dea8ab root 13144 29300 0 19:54 ? 
00:00:00 mesos-containerizer launch --command={"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"} --environment={"HOST":"openshift-installer-native-docker-compose","LIBPROCESS_PORT":"0","MARATHON_APP_ID":"\/while-loop-hello-world","MARATHON_APP_LABELS":"","MARATHON_APP_RESOURCE_CPUS":"1.0","MARATHON_APP_RESOURCE_DISK":"0.0","MARATHON_APP_RESOURCE_GPUS":"0","MARATHON_APP_RESOURCE_MEM":"32.0","MARATHON_APP_VERSION":"2016-12-16T19:54:30.939Z","MESOS_AGENT_ENDPOINT":"127.0.0.1:5051","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522\/runs\/cb6ec83f-a228-47f5-ad06-ee6675134bb0","MESOS_EXECUTOR_ID":"while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"44a35e16-dc32-4f91-afac-33dfff498944-0000","MESOS_HTTP_COMMAND_EXECUTOR":"0","MESOS_NATIVE_JAVA_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_NATIVE_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522\/runs\/cb6ec83f-a228-47f5-ad06-ee6675134bb0","MESOS_SLAVE_ID":"a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0","MESOS_SLAVE_PID":"slave(1)@127.0.0.1:5051","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","MESOS_TASK_ID":"while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin","PORT":"31996","PORT0":"31996","PORTS":"31996","PORT_10000":"31996"} --help=false --pipe_read=15 --pipe_write=16 --pre_exec_commands=[] --runtime_directory=/var/run/mesos/containers/cb6ec83f-a228-47f5-ad06-ee6675134bb0 --unshare_namespace_mnt=false --working_directory=/var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522/runs/cb6ec83f-a228-47f5-ad06-ee6675134bb0 root 13164 13143 0 19:54 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done root 13165 13154 0 19:54 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done
Status: Workaround given
The workaround is to destroy the application and run a new application with the exact same parameters.
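If you hit this issue frequently, the destroy step of this workaround can also be scripted against the REST API (a sketch; afterwards, re-create the app with the same JSON definition as in Step 6):

# sketch: destroy the stuck app via the v2 REST API
$ curl -X DELETE http://localhost:7070/v2/apps/while-loop-hello-world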
Appendix A.5: Marathon Portal Error: Retrieval of STDERR and STDOUT fails
Seen on Marathon 1.3.6, both with a shell script in a Mesos container and with a shell script in a Docker container.
Symptoms:
If we try to retrieve the error or output log of a Hello World application on the Marathon portal, we get the error message “Sorry, there was a problem retrieving file. Click to retry.” Retrying does not help.
Status: Open (idea for a workaround given below)
A possible workaround is to run the script with redirection into a file you can retrieve later. E.g., we could define a script like
while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
After having found the working directory of Mesos-started scripts, you can retrieve the log file from the slave’s file system.
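A short sketch of this retrieval, using the sandbox location observed in Step 6 (the path below is an assumption that matches this post’s setup):

# sketch: locate the newest log file in the Mesos sandbox and show its tail
$ LOG=$(sudo find /vagrant/jenkins_home/tmp/mesos -name this_is_my_output.log | head -1)
$ sudo tail -n 3 "$LOG"
I am a Hello World script
I am a Hello World script
I am a Hello World script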
Appendix A.6: Critical Marathon Error: Flag 'work_dir' is required, but it was not provided
Marathon 1.3.6 run as Docker container (mesosphere/marathon:latest; image ID 9d03a8fd0fdd)
Symptoms
When we try to start Marathon as seen below, the attempt fails with the exception:
Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided
Full log:
(dockerhost)$ sudo docker run -it --net=host -v `pwd`:/work_dir --entrypoint=bash mesosphere/marathon root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib. MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so' No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script. [2016-12-12 21:01:46,268] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main) [2016-12-12 21:01:46,588] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main) [2016-12-12 21:01:46,899] INFO Logging initialized @1979ms (org.eclipse.jetty.util.log:main) [2016-12-12 21:01:47,322] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-3) [2016-12-12 21:01:47,517] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps: * continueOnError(notifyHealthCheckManager) * continueOnError(notifyRateLimiter) * continueOnError(notifyLaunchQueue) * continueOnError(emitUpdate) * continueOnError(postTaskStatusEvent) * continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main) [2016-12-12 21:01:47,579] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main) [2016-12-12 21:01:47,657] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-12 21:01:47,662] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-12 21:01:47,665] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-12 21:01:47,666] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-12 21:01:47,668] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main) [2016-12-12 21:01:47,779] INFO All actors suspended: * Actor[akka://marathon/user/expungeOverdueLostTasks#1718670650] * Actor[akka://marathon/user/rateLimiter#1278378489] * Actor[akka://marathon/user/groupManager#890512610] * Actor[akka://marathon/user/taskTracker#-11699813] * Actor[akka://marathon/user/launchQueue#1496971565] * Actor[akka://marathon/user/offerMatcherLaunchTokens#-557182315] * Actor[akka://marathon/user/killOverdueStagedTasks#-1648400379] * Actor[akka://marathon/user/reviveOffersWhenWanted#1003103868] * Actor[akka://marathon/user/offerMatcherManager#219115497] * Actor[akka://marathon/user/offersWantedForReconciliation#-1104494480] * Actor[akka://marathon/user/taskKillServiceActor#-521724399] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5) [2016-12-12 21:01:47,891] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main) [2016-12-12 21:01:47,892] INFO No HTTPS support configured. 
(mesosphere.chaos.http.HttpModule:main) [2016-12-12 21:01:47,971] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-12 21:01:47,972] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-12 21:01:47,974] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-12 21:01:47,983] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,018] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-8) [2016-12-12 21:01:48,229] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class 
(com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,238] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,311] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,328] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:01:48,527] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,528] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,553] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,565] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,576] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,577] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,577] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) 
[2016-12-12 21:01:48,577] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,577] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,577] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,578] INFO Client environment:os.memory.free=121MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,578] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,578] INFO Client environment:os.memory.total=157MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,578] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@7531678e (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:01:48,623] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-12 21:01:48,633] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-12 21:01:48,653] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90004, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-12 21:01:48,661] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread) [2016-12-12 21:01:48,722] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1) [2016-12-12 21:01:48,723] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-12 21:01:48,734] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@5ca52bc0 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1) [2016-12-12 21:01:48,748] INFO Opening socket connection to server localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-12 21:01:48,749] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-12 21:01:48,755] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90005, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-12 21:01:48,854] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-7) [2016-12-12 21:01:48,854] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-12 21:01:48,868] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-12 21:01:48,885] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-8) [2016-12-12 21:01:48,893] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2) [2016-12-12 21:01:48,897] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-8) [2016-12-12 21:01:48,900] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-3) [2016-12-12 21:01:48,901] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-3) [2016-12-12 21:01:48,902] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2) [2016-12-12 21:01:48,911] INFO Task loading complete. 
(mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-7) [2016-12-12 21:01:48,914] INFO Create new Scheduler Driver with frameworkId: None and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@15d42ccb (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1) [2016-12-12 21:01:48,924] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-8) [2016-12-12 21:01:48,925] INFO All actors active: * Actor[akka://marathon/user/expungeOverdueLostTasks#1718670650] * Actor[akka://marathon/user/rateLimiter#1278378489] * Actor[akka://marathon/user/groupManager#890512610] * Actor[akka://marathon/user/taskTracker#-11699813] * Actor[akka://marathon/user/launchQueue#1496971565] * Actor[akka://marathon/user/offerMatcherLaunchTokens#-557182315] * Actor[akka://marathon/user/killOverdueStagedTasks#-1648400379] * Actor[akka://marathon/user/reviveOffersWhenWanted#1003103868] * Actor[akka://marathon/user/offerMatcherManager#219115497] * Actor[akka://marathon/user/offersWantedForReconciliation#-1104494480] * Actor[akka://marathon/user/taskKillServiceActor#-521724399] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-8) [2016-12-12 21:01:48,994] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7) [2016-12-12 21:01:48,994] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7) [2016-12-12 21:01:48,995] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-12T21:02:06.407Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2) [2016-12-12 21:01:49,008] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5) [2016-12-12 21:01:49,008] INFO => Schedule next revive at 2016-12-12T21:01:53.993Z in 4986 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5) WARNING: Logging before InitGoogleLogging() is written to STDERR W1212 21:01:49.134382 63 sched.cpp:1697] ************************************************** Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address. 
************************************************** I1212 21:01:49.179211 63 leveldb.cpp:174] Opened db in 40.639535ms I1212 21:01:49.188115 63 leveldb.cpp:181] Compacted db in 8.801601ms I1212 21:01:49.188176 63 leveldb.cpp:196] Created db iterator in 11072ns I1212 21:01:49.188189 63 leveldb.cpp:202] Seeked to beginning of db in 758ns I1212 21:01:49.188194 63 leveldb.cpp:271] Iterated through 0 keys in the db in 202ns I1212 21:01:49.188232 63 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1212 21:01:49.189082 70 recover.cpp:451] Starting replica recovery I1212 21:01:49.189720 70 recover.cpp:477] Replica is in EMPTY status I1212 21:01:49.189486 73 master.cpp:375] Master 8be08601-c962-42fa-9f78-7beda337b644 (openshift-installer-native-docker-compose) started on 127.0.0.1:35322 I1212 21:01:49.190399 73 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/9Y1S6q" --zk_session_timeout="10secs" W1212 21:01:49.190558 73 master.cpp:380] ************************************************** Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address. ************************************************** I1212 21:01:49.190712 73 master.cpp:429] Master allowing unauthenticated frameworks to register I1212 21:01:49.191285 73 master.cpp:443] Master allowing unauthenticated agents to register I1212 21:01:49.191337 73 master.cpp:457] Master allowing HTTP frameworks to register without authentication I1212 21:01:49.191366 73 master.cpp:499] Using default 'crammd5' authenticator Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided W1212 21:01:49.192327 73 authenticator.cpp:512] No credentials provided, authentication requests will be refused I1212 21:01:49.192360 73 authenticator.cpp:519] Initializing server SASL root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 --work_dir MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib. MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so' No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script. [scallop] Error: Unknown option 'work_dir' root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 -work_dir=/work_dir MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib. 
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so' No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script. [scallop] Error: Unknown option 'w' root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 --work_dir=/work_dir MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib. MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so' No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script. [scallop] Error: Unknown option 'work_dir=/work_dir' root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib. MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so' No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script. [2016-12-12 21:05:35,546] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main) [2016-12-12 21:05:35,823] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main) [2016-12-12 21:05:36,115] INFO Logging initialized @1856ms (org.eclipse.jetty.util.log:main) [2016-12-12 21:05:36,621] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-4) [2016-12-12 21:05:36,800] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps: * continueOnError(notifyHealthCheckManager) * continueOnError(notifyRateLimiter) * continueOnError(notifyLaunchQueue) * continueOnError(emitUpdate) * continueOnError(postTaskStatusEvent) * continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main) [2016-12-12 21:05:36,864] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main) [2016-12-12 21:05:36,930] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-12 21:05:36,936] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-12 21:05:36,940] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-12 21:05:36,940] INFO Found 0 plugins. 
(mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-12 21:05:36,942] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main) [2016-12-12 21:05:37,047] INFO All actors suspended: * Actor[akka://marathon/user/taskKillServiceActor#-984511042] * Actor[akka://marathon/user/offerMatcherLaunchTokens#671912224] * Actor[akka://marathon/user/killOverdueStagedTasks#-698941106] * Actor[akka://marathon/user/groupManager#-958454684] * Actor[akka://marathon/user/reviveOffersWhenWanted#1917290502] * Actor[akka://marathon/user/offerMatcherManager#2056148696] * Actor[akka://marathon/user/expungeOverdueLostTasks#-308686880] * Actor[akka://marathon/user/rateLimiter#-1808696027] * Actor[akka://marathon/user/offersWantedForReconciliation#362759878] * Actor[akka://marathon/user/launchQueue#-1787605975] * Actor[akka://marathon/user/taskTracker#1265821735] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5) [2016-12-12 21:05:37,115] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main) [2016-12-12 21:05:37,116] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main) [2016-12-12 21:05:37,174] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-12 21:05:37,181] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,185] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-12 21:05:37,188] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-12 21:05:37,219] INFO Now standing by. Closing existing handles and rejecting new. 
(mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-2) [2016-12-12 21:05:37,416] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,425] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,425] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,426] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,428] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,429] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,430] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,431] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,432] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,432] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,433] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,433] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,434] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,437] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,491] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" 
(com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,508] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:37,717] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,719] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,775] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,840] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,857] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,863] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,864] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,864] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,865] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,866] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,866] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,867] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,867] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,868] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,868] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,869] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,869] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,870] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,870] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,870] INFO Client environment:os.memory.free=128MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,871] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,871] INFO Client environment:os.memory.total=165MB 
(org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,872] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@2fba82d1 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-12 21:05:37,922] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-12 21:05:37,939] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-12 21:05:37,971] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90006, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-12 21:05:37,978] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread) [2016-12-12 21:05:38,049] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1) [2016-12-12 21:05:38,052] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-12 21:05:38,075] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@2c6c37f7 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1) [2016-12-12 21:05:38,082] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-12 21:05:38,084] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-12 21:05:38,088] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90007, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-12 21:05:38,174] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-7) [2016-12-12 21:05:38,176] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-12 21:05:38,188] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-12 21:05:38,195] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9) [2016-12-12 21:05:38,200] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. 
(mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9) [2016-12-12 21:05:38,208] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-2) [2016-12-12 21:05:38,209] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-2) [2016-12-12 21:05:38,216] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-3) [2016-12-12 21:05:38,222] INFO Create new Scheduler Driver with frameworkId: None and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@46383a78 (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1) [2016-12-12 21:05:38,224] INFO All actors active: * Actor[akka://marathon/user/taskKillServiceActor#-984511042] * Actor[akka://marathon/user/offerMatcherLaunchTokens#671912224] * Actor[akka://marathon/user/killOverdueStagedTasks#-698941106] * Actor[akka://marathon/user/groupManager#-958454684] * Actor[akka://marathon/user/reviveOffersWhenWanted#1917290502] * Actor[akka://marathon/user/offerMatcherManager#2056148696] * Actor[akka://marathon/user/expungeOverdueLostTasks#-308686880] * Actor[akka://marathon/user/rateLimiter#-1808696027] * Actor[akka://marathon/user/offersWantedForReconciliation#362759878] * Actor[akka://marathon/user/launchQueue#-1787605975] * Actor[akka://marathon/user/taskTracker#1265821735] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-4) [2016-12-12 21:05:38,230] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5) [2016-12-12 21:05:38,239] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5) [2016-12-12 21:05:38,257] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-3) [2016-12-12 21:05:38,284] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-12T21:05:55.712Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9) [2016-12-12 21:05:38,322] INFO Binding mesosphere.marathon.api.v2.AppsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,333] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3) [2016-12-12 21:05:38,335] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3) [2016-12-12 21:05:38,338] INFO 2 further revives still needed. 
Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3) [2016-12-12 21:05:38,339] INFO => Schedule next revive at 2016-12-12T21:05:43.333Z in 4995 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3) [2016-12-12 21:05:38,341] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3) [2016-12-12 21:05:38,356] INFO Binding mesosphere.marathon.api.v2.TasksResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,359] INFO Binding mesosphere.marathon.api.v2.EventSubscriptionsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,361] INFO Event notification disabled. (mesosphere.marathon.core.event.EventModule:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,375] INFO Binding mesosphere.marathon.api.v2.QueueResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,385] INFO Binding mesosphere.marathon.api.v2.GroupsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,390] INFO Binding mesosphere.marathon.api.v2.InfoResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,396] INFO Binding mesosphere.marathon.api.v2.LeaderResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,400] INFO Binding mesosphere.marathon.api.v2.DeploymentsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,418] INFO Binding mesosphere.marathon.api.v2.ArtifactsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,423] INFO Binding mesosphere.marathon.api.v2.SchemaResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,428] INFO Binding mesosphere.marathon.api.v2.PluginsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,429] INFO Loading plugins implementing 'mesosphere.marathon.plugin.http.HttpRequestHandler' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) 
[2016-12-12 21:05:38,431] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,452] INFO Started o.e.j.s.ServletContextHandler@33ffb91e{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) WARNING: Logging before InitGoogleLogging() is written to STDERR W1212 21:05:38.488497 223 sched.cpp:1697] ************************************************** Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address. ************************************************** [2016-12-12 21:05:38,494] INFO Started ServerConnector@23580056{HTTP/1.1,[http/1.1]}{0.0.0.0:7070} (org.eclipse.jetty.server.ServerConnector:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,496] INFO Started @4237ms (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-12 21:05:38,497] INFO All services up and running. (mesosphere.marathon.Main$:main) I1212 21:05:38.524034 223 leveldb.cpp:174] Opened db in 33.764878ms I1212 21:05:38.525073 223 leveldb.cpp:181] Compacted db in 985356ns I1212 21:05:38.525146 223 leveldb.cpp:196] Created db iterator in 9599ns I1212 21:05:38.525156 223 leveldb.cpp:202] Seeked to beginning of db in 501ns I1212 21:05:38.525161 223 leveldb.cpp:271] Iterated through 0 keys in the db in 202ns I1212 21:05:38.525203 223 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1212 21:05:38.526582 229 recover.cpp:451] Starting replica recovery I1212 21:05:38.527250 229 recover.cpp:477] Replica is in EMPTY status I1212 21:05:38.528532 229 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4)@127.0.0.1:33899 I1212 21:05:38.528669 229 recover.cpp:197] Received a recover response from a replica in EMPTY status I1212 21:05:38.528774 229 recover.cpp:568] Updating replica status to STARTING I1212 21:05:38.529911 232 master.cpp:375] Master e999af40-d59e-4d95-a77a-05403042ca4f (openshift-installer-native-docker-compose) started on 127.0.0.1:33899 I1212 21:05:38.530508 232 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/X38lNg" --zk_session_timeout="10secs" W1212 21:05:38.531312 232 master.cpp:380] ************************************************** Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address. 
************************************************** I1212 21:05:38.532498 232 master.cpp:429] Master allowing unauthenticated frameworks to register I1212 21:05:38.532708 232 master.cpp:443] Master allowing unauthenticated agents to register I1212 21:05:38.532913 232 master.cpp:457] Master allowing HTTP frameworks to register without authentication I1212 21:05:38.533169 232 master.cpp:499] Using default 'crammd5' authenticator W1212 21:05:38.533359 232 authenticator.cpp:512] No credentials provided, authentication requests will be refused I1212 21:05:38.533457 232 authenticator.cpp:519] Initializing server SASL I1212 21:05:38.532569 229 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 3.159782ms I1212 21:05:38.534039 229 replica.cpp:320] Persisted replica status to STARTING I1212 21:05:38.534257 229 recover.cpp:477] Replica is in STARTING status I1212 21:05:38.534668 229 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (5)@127.0.0.1:33899 I1212 21:05:38.535151 229 recover.cpp:197] Received a recover response from a replica in STARTING status I1212 21:05:38.540293 229 recover.cpp:568] Updating replica status to VOTING Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided (container):/marathon#
This error message is described in the documentation of the Docker image we are using as follows:
“Note: Currently the Docker container fails due to strange behavior from the latest Mesos version. There will be an error about work_dir that is still unresolved”
Unfortunately, the image documentation does not provide any workaround or solution for the problem.
Resolution (Workaround):
I have found a workaround here: “It seems like explicitly setting the Mesos work directory by adding ENV MESOS_WORK_DIR /var/lib/mesos to the Dockerfile resolves the issue.”
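For reference, the quoted fix would translate into a small derived image along the following lines. This is only a sketch; the base image tag is an assumption based on the versions listed at the beginning of this post, and we will not follow this route here:
(dockerhost)$ cat Dockerfile
# Sketch: derive from the Marathon image used in this post
# (tag is an assumption; "latest" was v1.3.6 at the time of this test)
FROM mesosphere/marathon:v1.3.6
# Pre-set the Mesos work directory so the embedded local cluster
# no longer aborts with: Flag 'work_dir' is required
ENV MESOS_WORK_DIR /var/lib/mesos
Building such an image (e.g. with docker build -t marathon-fixed .) would avoid passing the variable on every docker run.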
Instead of rebuilding the image, we can achieve the same effect by setting the MESOS_WORK_DIR environment variable directly on the docker run command line:
(dockerhost)$ sudo docker run -it --name marathon --rm --net=host -e MESOS_WORK_DIR=/var/lib/mesos --entrypoint=bash mesosphere/marathon (container)# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib. MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so' No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script. [2016-12-13 13:55:36,831] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main) [2016-12-13 13:55:37,118] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main) [2016-12-13 13:55:37,371] INFO Logging initialized @1806ms (org.eclipse.jetty.util.log:main) [2016-12-13 13:55:37,864] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-2) [2016-12-13 13:55:38,059] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps: * continueOnError(notifyHealthCheckManager) * continueOnError(notifyRateLimiter) * continueOnError(notifyLaunchQueue) * continueOnError(emitUpdate) * continueOnError(postTaskStatusEvent) * continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main) [2016-12-13 13:55:38,119] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main) [2016-12-13 13:55:38,196] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-13 13:55:38,200] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-13 13:55:38,204] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-13 13:55:38,204] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main) [2016-12-13 13:55:38,208] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main) [2016-12-13 13:55:38,361] INFO All actors suspended: * Actor[akka://marathon/user/launchQueue#872625406] * Actor[akka://marathon/user/offerMatcherManager#1742672554] * Actor[akka://marathon/user/reviveOffersWhenWanted#-965567836] * Actor[akka://marathon/user/expungeOverdueLostTasks#-1682888773] * Actor[akka://marathon/user/offersWantedForReconciliation#-1611878367] * Actor[akka://marathon/user/killOverdueStagedTasks#-1736387483] * Actor[akka://marathon/user/taskTracker#-1718074132] * Actor[akka://marathon/user/groupManager#1707955637] * Actor[akka://marathon/user/taskKillServiceActor#-2061381059] * Actor[akka://marathon/user/offerMatcherLaunchTokens#-786726035] * Actor[akka://marathon/user/rateLimiter#-27927539] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-6) [2016-12-13 13:55:38,466] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main) [2016-12-13 13:55:38,466] INFO No HTTPS support configured. 
(mesosphere.chaos.http.HttpModule:main) [2016-12-13 13:55:38,518] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-13 13:55:38,519] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-13 13:55:38,522] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838) [2016-12-13 13:55:38,525] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,555] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-4) [2016-12-13 13:55:38,727] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class 
(com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,733] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,737] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,785] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:38,802] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,057] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,060] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,119] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,136] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,153] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) 
[2016-12-13 13:55:39,154] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:os.memory.free=118MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Client environment:os.memory.total=159MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,154] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@3da7f0be (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,183] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-13 13:55:39,195] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-13 13:55:39,230] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f9000d, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181)) [2016-12-13 13:55:39,236] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread) [2016-12-13 13:55:39,309] INFO Binding mesosphere.marathon.api.v2.AppsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,328] INFO Binding mesosphere.marathon.api.v2.TasksResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,331] INFO Binding mesosphere.marathon.api.v2.EventSubscriptionsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,331] INFO Event notification disabled. 
(mesosphere.marathon.core.event.EventModule:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,334] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1) [2016-12-13 13:55:39,338] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-13 13:55:39,341] INFO Binding mesosphere.marathon.api.v2.QueueResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,351] INFO Binding mesosphere.marathon.api.v2.GroupsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,356] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@44a67b18 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1) [2016-12-13 13:55:39,357] INFO Binding mesosphere.marathon.api.v2.InfoResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,360] INFO Binding mesosphere.marathon.api.v2.LeaderResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,364] INFO Binding mesosphere.marathon.api.v2.DeploymentsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,369] INFO Opening socket connection to server localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-13 13:55:39,369] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-13 13:55:39,374] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f9000e, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181)) [2016-12-13 13:55:39,378] INFO Binding mesosphere.marathon.api.v2.ArtifactsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,380] INFO Binding mesosphere.marathon.api.v2.SchemaResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,391] INFO Binding mesosphere.marathon.api.v2.PluginsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,392] INFO Loading plugins implementing 'mesosphere.marathon.plugin.http.HttpRequestHandler' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,392] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,407] INFO Started o.e.j.s.ServletContextHandler@72f0fd27{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,419] INFO Started ServerConnector@3c2f8cd8{HTTP/1.1,[http/1.1]}{0.0.0.0:7070} (org.eclipse.jetty.server.ServerConnector:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,419] INFO Started @3855ms (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING) [2016-12-13 13:55:39,420] INFO All services up and running. (mesosphere.marathon.Main$:main) [2016-12-13 13:55:39,464] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-5) [2016-12-13 13:55:39,464] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-13 13:55:39,482] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1) [2016-12-13 13:55:39,485] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2) [2016-12-13 13:55:39,487] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. 
(mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2) [2016-12-13 13:55:39,498] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-8) [2016-12-13 13:55:39,501] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-8) [2016-12-13 13:55:39,506] INFO All actors active: * Actor[akka://marathon/user/launchQueue#872625406] * Actor[akka://marathon/user/offerMatcherManager#1742672554] * Actor[akka://marathon/user/reviveOffersWhenWanted#-965567836] * Actor[akka://marathon/user/expungeOverdueLostTasks#-1682888773] * Actor[akka://marathon/user/offersWantedForReconciliation#-1611878367] * Actor[akka://marathon/user/killOverdueStagedTasks#-1736387483] * Actor[akka://marathon/user/taskTracker#-1718074132] * Actor[akka://marathon/user/groupManager#1707955637] * Actor[akka://marathon/user/taskKillServiceActor#-2061381059] * Actor[akka://marathon/user/offerMatcherLaunchTokens#-786726035] * Actor[akka://marathon/user/rateLimiter#-27927539] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-8) [2016-12-13 13:55:39,509] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-9) [2016-12-13 13:55:39,522] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-13) [2016-12-13 13:55:39,528] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5) [2016-12-13 13:55:39,537] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-5) [2016-12-13 13:55:39,568] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-13T13:55:57.005Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2) [2016-12-13 13:55:39,589] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5) [2016-12-13 13:55:39,589] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5) [2016-12-13 13:55:39,596] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5) [2016-12-13 13:55:39,597] INFO => Schedule next revive at 2016-12-13T13:55:44.589Z in 4993 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5) [2016-12-13 13:55:39,655] INFO Create new Scheduler Driver with frameworkId: Some(value: "44a35e16-dc32-4f91-afac-33dfff498944-0000" ) and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@4eb166a1 (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1) WARNING: Logging before InitGoogleLogging() is written to STDERR W1213 13:55:39.779234 64 sched.cpp:1697] ************************************************** Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). 
You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address. ************************************************** I1213 13:55:39.809289 64 leveldb.cpp:174] Opened db in 28.534471ms I1213 13:55:39.811184 64 leveldb.cpp:181] Compacted db in 1.79725ms I1213 13:55:39.811257 64 leveldb.cpp:196] Created db iterator in 11125ns I1213 13:55:39.811283 64 leveldb.cpp:202] Seeked to beginning of db in 670ns I1213 13:55:39.811290 64 leveldb.cpp:271] Iterated through 0 keys in the db in 197ns I1213 13:55:39.811357 64 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1213 13:55:39.812511 77 recover.cpp:451] Starting replica recovery I1213 13:55:39.812706 77 recover.cpp:477] Replica is in EMPTY status I1213 13:55:39.813297 77 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (3)@127.0.0.1:40102 I1213 13:55:39.813491 77 recover.cpp:197] Received a recover response from a replica in EMPTY status I1213 13:55:39.813727 77 recover.cpp:568] Updating replica status to STARTING I1213 13:55:39.815317 73 master.cpp:375] Master d4536c66-4886-4c88-9a02-7dac4b5f2e6d (openshift-installer-native-docker-compose) started on 127.0.0.1:40102 I1213 13:55:39.815476 73 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk_session_timeout="10secs" W1213 13:55:39.815767 73 master.cpp:380] ************************************************** Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address. 
************************************************** I1213 13:55:39.815810 73 master.cpp:429] Master allowing unauthenticated frameworks to register I1213 13:55:39.815817 73 master.cpp:443] Master allowing unauthenticated agents to register I1213 13:55:39.815824 73 master.cpp:457] Master allowing HTTP frameworks to register without authentication I1213 13:55:39.815835 73 master.cpp:499] Using default 'crammd5' authenticator W1213 13:55:39.815845 73 authenticator.cpp:512] No credentials provided, authentication requests will be refused I1213 13:55:39.815851 73 authenticator.cpp:519] Initializing server SASL I1213 13:55:39.817608 77 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 3.763381ms I1213 13:55:39.817780 77 replica.cpp:320] Persisted replica status to STARTING I1213 13:55:39.817886 77 recover.cpp:477] Replica is in STARTING status I1213 13:55:39.818063 77 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (7)@127.0.0.1:40102 I1213 13:55:39.818200 77 recover.cpp:197] Received a recover response from a replica in STARTING status I1213 13:55:39.818279 77 recover.cpp:568] Updating replica status to VOTING I1213 13:55:39.819156 77 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 782170ns I1213 13:55:39.819229 77 replica.cpp:320] Persisted replica status to VOTING I1213 13:55:39.819284 77 recover.cpp:582] Successfully joined the Paxos group I1213 13:55:39.819321 77 recover.cpp:466] Recover process terminated I1213 13:55:39.820673 64 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni I1213 13:55:39.823923 73 master.cpp:1847] The newly elected leader is master@127.0.0.1:40102 with id d4536c66-4886-4c88-9a02-7dac4b5f2e6d I1213 13:55:39.823953 73 master.cpp:1860] Elected as the leading master! 
I1213 13:55:39.823961 73 master.cpp:1547] Recovering from registrar I1213 13:55:39.824046 73 registrar.cpp:332] Recovering registrar I1213 13:55:39.824394 73 log.cpp:553] Attempting to start the writer I1213 13:55:39.824522 73 slave.cpp:198] Agent started on 1)@127.0.0.1:40102 I1213 13:55:39.824532 73 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="true" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/usr/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos/0" W1213 13:55:39.824811 73 slave.cpp:202] ************************************************** Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address. ************************************************** I1213 13:55:39.825176 73 slave.cpp:519] Agent resources: cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000] I1213 13:55:39.825285 73 slave.cpp:527] Agent attributes: [ ] I1213 13:55:39.825294 73 slave.cpp:532] Agent hostname: openshift-installer-native-docker-compose I1213 13:55:39.825721 73 state.cpp:57] Recovering state from '/var/lib/mesos/0/meta' I1213 13:55:39.826089 73 status_update_manager.cpp:200] Recovering status update manager I1213 13:55:39.826179 73 containerizer.cpp:522] Recovering containerizer I1213 13:55:39.826676 72 replica.cpp:493] Replica received implicit promise request from (21)@127.0.0.1:40102 with proposal 1 I1213 13:55:39.826875 74 provisioner.cpp:253] Provisioner recovery complete I1213 13:55:39.828560 74 slave.cpp:4782] Finished recovery I1213 13:55:39.829459 74 slave.cpp:895] New master detected at master@127.0.0.1:40102 I1213 13:55:39.829998 74 slave.cpp:916] No credentials provided. 
Attempting to register without authentication I1213 13:55:39.829576 72 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.862869ms I1213 13:55:39.830559 72 replica.cpp:342] Persisted promised to 1 I1213 13:55:39.830961 72 coordinator.cpp:238] Coordinator attempting to fill missing positions I1213 13:55:39.829550 75 status_update_manager.cpp:174] Pausing sending status updates I1213 13:55:39.830510 74 slave.cpp:927] Detecting new master I1213 13:55:39.831506 77 replica.cpp:388] Replica received explicit promise request from (23)@127.0.0.1:40102 for position 0 with proposal 2 I1213 13:55:39.832626 77 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 920616ns I1213 13:55:39.832670 77 replica.cpp:712] Persisted action at 0 I1213 13:55:39.833004 75 replica.cpp:537] Replica received write request for position 0 from (24)@127.0.0.1:40102 I1213 13:55:39.834054 75 leveldb.cpp:436] Reading position from leveldb took 40734ns I1213 13:55:39.835325 75 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 722995ns I1213 13:55:39.835367 75 replica.cpp:712] Persisted action at 0 I1213 13:55:39.835605 75 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I1213 13:55:39.836382 75 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 699339ns I1213 13:55:39.836410 75 replica.cpp:712] Persisted action at 0 I1213 13:55:39.836419 75 replica.cpp:697] Replica learned NOP action at position 0 I1213 13:55:39.836597 75 log.cpp:569] Writer started with ending position 0 I1213 13:55:39.836915 75 leveldb.cpp:436] Reading position from leveldb took 12450ns I1213 13:55:39.838395 77 registrar.cpp:365] Successfully fetched the registry (0B) in 14.328064ms I1213 13:55:39.838459 77 registrar.cpp:464] Applied 1 operations in 4626ns; attempting to update the 'registry' I1213 13:55:39.840064 77 log.cpp:577] Attempting to append 226 bytes to the log I1213 13:55:39.840198 77 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I1213 13:55:39.840348 77 replica.cpp:537] Replica received write request for position 1 from (25)@127.0.0.1:40102 I1213 13:55:39.842838 77 leveldb.cpp:341] Persisting action (245 bytes) to leveldb took 2.394634ms I1213 13:55:39.848156 77 replica.cpp:712] Persisted action at 1 I1213 13:55:39.848462 77 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I1213 13:55:39.849424 77 leveldb.cpp:341] Persisting action (247 bytes) to leveldb took 911778ns I1213 13:55:39.849508 77 replica.cpp:712] Persisted action at 1 I1213 13:55:39.849520 77 replica.cpp:697] Replica learned APPEND action at position 1 I1213 13:55:39.849964 77 registrar.cpp:509] Successfully updated the 'registry' in 11.484928ms I1213 13:55:39.850054 77 registrar.cpp:395] Successfully recovered registrar I1213 13:55:39.850160 77 master.cpp:1655] Recovered 0 agents from the Registry (187B) ; allowing 10mins for agents to re-register I1213 13:55:39.850020 75 log.cpp:596] Attempting to truncate the log to 1 I1213 13:55:39.850306 75 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I1213 13:55:39.850533 75 replica.cpp:537] Replica received write request for position 2 from (26)@127.0.0.1:40102 I1213 13:55:39.851971 75 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.375167ms I1213 13:55:39.852044 75 replica.cpp:712] Persisted action at 2 I1213 13:55:39.852219 75 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I1213 13:55:39.853096 75 leveldb.cpp:341] 
Persisting action (18 bytes) to leveldb took 825996ns I1213 13:55:39.853198 75 leveldb.cpp:399] Deleting ~1 keys from leveldb took 55808ns I1213 13:55:39.853235 75 replica.cpp:712] Persisted action at 2 I1213 13:55:39.853245 75 replica.cpp:697] Replica learned TRUNCATE action at position 2 I1213 13:55:39.855962 68 sched.cpp:226] Version: 1.0.1 I1213 13:55:39.857800 75 sched.cpp:330] New master detected at master@127.0.0.1:40102 I1213 13:55:39.857892 75 sched.cpp:341] No credentials provided. Attempting to register without authentication I1213 13:55:39.858000 75 master.cpp:2424] Received SUBSCRIBE call for framework 'marathon' at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102 I1213 13:55:39.858047 75 master.cpp:2500] Subscribing framework marathon with checkpointing enabled and capabilities [ ] I1213 13:55:39.858240 75 hierarchical.cpp:271] Added framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 I1213 13:55:39.858304 74 sched.cpp:743] Framework registered with 44a35e16-dc32-4f91-afac-33dfff498944-0000 [2016-12-13 13:55:39,867] INFO Starting scheduler actor (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-5) [2016-12-13 13:55:39,875] INFO Scheduler actor ready (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-5) [2016-12-13 13:55:39,876] INFO Became active. Accepting event streaming requests. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-5) [2016-12-13 13:55:39,876] INFO Reset offerLeadership backoff (mesosphere.marathon.core.election.impl.ExponentialBackoff:pool-1-thread-1) [2016-12-13 13:55:39,901] INFO Creating tombstone for old twitter commons leader election (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1) [2016-12-13 13:55:39,908] INFO Registered as 44a35e16-dc32-4f91-afac-33dfff498944-0000 to master 'd4536c66-4886-4c88-9a02-7dac4b5f2e6d' (mesosphere.marathon.MarathonScheduler$$EnhancerByGuice$$1ef061b0:Thread-14) [2016-12-13 13:55:39,908] INFO Store framework id: value: "44a35e16-dc32-4f91-afac-33dfff498944-0000" (mesosphere.util.state.FrameworkIdUtil:Thread-14) [2016-12-13 13:55:39,925] INFO Received reviveOffers notification: SchedulerRegisteredEvent (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-4) I1213 13:55:39.948446 72 master.cpp:4550] Registering agent at slave(1)@127.0.0.1:40102 (openshift-installer-native-docker-compose) with id d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 I1213 13:55:39.948563 72 registrar.cpp:464] Applied 1 operations in 10900ns; attempting to update the 'registry' I1213 13:55:39.949964 77 log.cpp:577] Attempting to append 424 bytes to the log I1213 13:55:39.950723 77 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I1213 13:55:39.951776 76 replica.cpp:537] Replica received write request for position 3 from (27)@127.0.0.1:40102 I1213 13:55:39.952749 76 leveldb.cpp:341] Persisting action (443 bytes) to leveldb took 891144ns I1213 13:55:39.952900 76 replica.cpp:712] Persisted action at 3 I1213 13:55:39.953106 76 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I1213 13:55:39.953799 76 leveldb.cpp:341] Persisting action (445 bytes) to leveldb took 501557ns I1213 13:55:39.953876 76 replica.cpp:712] Persisted action at 3 I1213 13:55:39.953902 76 replica.cpp:697] Replica learned APPEND action at position 3 I1213 13:55:39.954263 76 registrar.cpp:509] Successfully updated the 'registry' in 
5.665024ms
I1213 13:55:39.954856 71 hierarchical.cpp:478] Added agent d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 (openshift-installer-native-docker-compose) with cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000] (allocated: )
I1213 13:55:39.954433 75 log.cpp:596] Attempting to truncate the log to 3
I1213 13:55:39.955282 71 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4
I1213 13:55:39.955612 71 replica.cpp:537] Replica received write request for position 4 from (28)@127.0.0.1:40102
I1213 13:55:39.955880 76 master.cpp:4619] Registered agent d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 at slave(1)@127.0.0.1:40102 (openshift-installer-native-docker-compose) with cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000]
I1213 13:55:39.956233 76 master.cpp:5725] Sending 1 offers to framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:39.956048 75 slave.cpp:1095] Registered with master master@127.0.0.1:40102; given agent ID d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0
I1213 13:55:39.957394 73 status_update_manager.cpp:181] Resuming sending status updates
I1213 13:55:39.957774 75 slave.cpp:1155] Forwarding total oversubscribed resources
I1213 13:55:39.958096 75 master.cpp:5002] Received update of agent d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 at slave(1)@127.0.0.1:40102 (openshift-installer-native-docker-compose) with total oversubscribed resources
I1213 13:55:39.958235 75 hierarchical.cpp:542] Agent d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 (openshift-installer-native-docker-compose) updated with oversubscribed resources (total: cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000])
I1213 13:55:39.956370 71 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 663264ns
I1213 13:55:39.958457 71 replica.cpp:712] Persisted action at 4
I1213 13:55:39.958593 71 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
I1213 13:55:39.959283 71 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 622339ns
I1213 13:55:39.959475 71 leveldb.cpp:399] Deleting ~2 keys from leveldb took 25791ns
I1213 13:55:39.959547 71 replica.cpp:712] Persisted action at 4
I1213 13:55:39.959604 71 replica.cpp:697] Replica learned TRUNCATE action at position 4
I1213 13:55:40.008047 73 master.cpp:3951] Processing DECLINE call for offers: [ d4536c66-4886-4c88-9a02-7dac4b5f2e6d-O0 ] for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:44,622] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:44,622] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-2)
I1213 13:55:44.624449 73 master.cpp:4046] Processing REVIVE call for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:44.626070 73 hierarchical.cpp:1022] Removed offer filters for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
I1213 13:55:44.626485 73 master.cpp:5725] Sending 1 offers to framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:44,627] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7)
[2016-12-13 13:55:44,630] INFO => Schedule next revive at 2016-12-13T13:55:49.621Z in 4995 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7)
I1213 13:55:44.638694 71 master.cpp:3951] Processing DECLINE call for offers: [ d4536c66-4886-4c88-9a02-7dac4b5f2e6d-O1 ] for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:49.643371 71 master.cpp:4046] Processing REVIVE call for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:49,641] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 13:55:49,642] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 13:55:49,642] INFO 1 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 13:55:49,643] INFO => Schedule next revive at 2016-12-13T13:55:54.640Z in 4999 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-9)
I1213 13:55:49.649920 71 hierarchical.cpp:1022] Removed offer filters for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
I1213 13:55:49.652896 71 master.cpp:5725] Sending 1 offers to framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:49.664896 75 master.cpp:3951] Processing DECLINE call for offers: [ d4536c66-4886-4c88-9a02-7dac4b5f2e6d-O2 ] for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:54.672166 77 master.cpp:4046] Processing REVIVE call for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:54.672760 77 hierarchical.cpp:1022] Removed offer filters for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
[2016-12-13 13:55:54,671] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:54,672] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-2)
I1213 13:55:54.675570 77 master.cpp:5725] Sending 1 offers to framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:54.688268 75 master.cpp:3951] Processing DECLINE call for offers: [ d4536c66-4886-4c88-9a02-7dac4b5f2e6d-O3 ] for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:54,848] INFO initiate task reconciliation (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:54,859] INFO Requesting task reconciliation with the Mesos master (mesosphere.marathon.SchedulerActions:ForkJoinPool-2-worker-7)
I1213 13:55:54.862715 77 master.cpp:5398] Performing implicit task state reconciliation for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:54,862] INFO task reconciliation has finished (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:54,871] INFO Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#-694294900] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:57,020] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:57,024] INFO Received offers NOT WANTED notification, canceling 0 revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 13:55:57,026] INFO => Suppress offers NOW (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
I1213 13:55:57.036034 74 master.cpp:2882] Processing SUPPRESS call for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:57.036320 74 hierarchical.cpp:1002] Suppressed offers for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
I1213 13:56:09.705217 73 master.cpp:5463] Performing explicit task state reconciliation for 1 tasks of framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
This has resolved the issue.
Summary
In this blog post, we have performed the following tasks:
- Started a Marathon + Mesos environment with ZooKeeper, Mesos Master, Mesos Agent (Slave) and Mesosphere Marathon.
- Started a “Hello World” application via Marathon in a default Mesos container.
- Tested the resiliency by killing a process managed by Marathon.
- Started a “Hello World” application via Marathon in a Docker container.
- Showed what happens if all CPU resources are reserved.
- Described, in the Appendix, the many errors we observed along the way.
Our observations were:
- Contrary to our expectation that Mesos and Marathon are production-ready, the Mesos applications did not prove to be very robust: we observed critical errors such as "suicides" of the Mesos Master, ZooKeeper and Mesosphere Marathon. In other words, monitoring and watchdogs are key when running Mesos + Marathon in a production environment. Apart from those critical errors, there were minor problems such as "connection lost" messages on the Mesos Web Portal and a non-working "Application Restart" function in Mesosphere Marathon. All problems are discussed in Appendices A.1 to A.6.
- We have demonstrated in Step 8 that Mesos does not allow over-subscription of resources by default. That is, if you have 2 CPU cores and you reserve 1 CPU core for each of 2 instances, all CPU resources are occupied and no further task instance can be started (it will remain in "waiting" state; see the reservation sketch after this list). Mesos' CPU reservation feels similar to a hard assignment of resources (since over-subscription is not supported by default, as we have seen above), but under the hood, Mesos does not apply hard limits on CPU usage unless the Mesos slave has
--cgroups_enable_cfs
(CFS = Completely Fair Scheduler) enabled; an agent configuration sketch follows this list. See also the second answer to this StackOverflow question. For more information on over-subscription in Mesos, see this Apache Mesos Documentation page.
- Mesos supports processes run in Mesos containers as well as processes run in Docker containers.
- We could demonstrate that Mesosphere Marathon does what it is designed for: it makes sure that an application is restarted if it crashes or is killed (a quick way to verify this is sketched below).
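To illustrate the reservation behavior from the second observation, here is a minimal sketch of how such a CPU reservation could be requested via Marathon's REST API (Marathon listens on port 8080 by default; the application ID and command below are hypothetical and not taken from the steps above). With "cpus": 1 and "instances": 2 on a 2-core agent, all offered CPU resources are consumed, and any additional instance would remain in "waiting" state:
# Sketch only: post a hypothetical app definition that reserves
# 1 CPU core for each of 2 instances via Marathon's REST API.
curl -X POST http://127.0.0.1:8080/v2/apps \
  -H "Content-Type: application/json" \
  -d '{
        "id": "/hello-world-cpu",
        "cmd": "while true; do echo Hello World; sleep 1; done",
        "cpus": 1,
        "mem": 128,
        "instances": 2
      }'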
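As for --cgroups_enable_cfs: the following is a sketch of how the flag could be enabled for an agent running as a Docker container; it is not the exact command used in this post. It relies on Mesos mapping MESOS_* environment variables to the corresponding command-line flags; the container name, ZooKeeper URL and mounted paths are assumptions:
# Sketch only: start a Mesos agent (slave) with CFS-based hard CPU limits.
# MESOS_CGROUPS_ENABLE_CFS=true is equivalent to --cgroups_enable_cfs.
docker run -d --name mesos-slave --net=host --privileged \
  -e MESOS_MASTER=zk://127.0.0.1:2181/mesos \
  -e MESOS_ISOLATION=cgroups/cpu,cgroups/mem \
  -e MESOS_CGROUPS_ENABLE_CFS=true \
  -v /var/run/docker.sock:/var/run/docker.sock \
  mesosphere/mesos-slave:1.1.0-2.0.107.ubuntu1404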
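Finally, a quick way to verify the restart behavior from the last observation, reusing the hypothetical names from the sketches above: kill the application's process on the agent, then ask Marathon for the app's task list again; the replacement task should appear with a new task ID.
# Sketch only: kill the task's process inside the agent container ...
docker exec mesos-slave pkill -f "echo Hello World"
# ... and, a few seconds later, list the app's tasks via the REST API;
# Marathon should by then have started a replacement task.
curl -s http://127.0.0.1:8080/v2/apps/hello-world-cpu/tasks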