JsonParser的Parse方法大大减慢了我的代码速度

时间:2023-01-18 09:50:11

I'm working on project, which should extract data from JSON files(contains information about polish deputies) and do a few calculations just using these data.

我正在研究项目,它应该从JSON文件中提取数据(包含有关波兰代表的信息),并使用这些数据进行一些计算。

Code is executing properly, but one method drastically slows down everything. Im not the best in describing, so let's show my Jsonreader class
Gist link
(Method was used in 17th,43th,50th line) Code looks kinda messy, but it works fine, excluding fragments using jsonparser.parse method. It tooks unacceptable ~2sec per envoy. I have to change that few lines, but I don't know how. I was thinking about prescribing json file to map object and then working on it, but im not sure if it is a good option.
(Sorry for my poor grammar)

代码正在执行,但是一种方法会大大减慢一切。我不是最好的描述,所以让我们展示我的Jsonreader类Gist链接(方法用于第17,43,50行)代码看起来有点混乱,但它工作正常,使用jsonparser.parse方法排除片段。每位特使花了不可接受的约2秒。我必须改变这几行,但我不知道如何。我正在考虑将json文件指定为映射对象,然后对其进行处理,但我不确定它是否是一个不错的选择。 (抱歉我的语法很差)

1 个解决方案

#1


1  

How can i check if problem lies in taht getContent method?

我怎样才能检查问题是否存在于getContent方法中?

You can prove it indirectly: just check your service API performance in your web browser network debugger tab, or measure the time for simple wget like time wget YOUR_URL.

您可以间接证明它:只需在Web浏览器网络调试器选项卡中检查您的服务API性能,或者测量简单wget的时间,例如时间wget YOUR_URL。

I agree with Andreas having doubt that the parse method is the root of the evil. Actually it's not. If you look at your gist closer, you can see that the parse method accepts a delegated reader that actually uses the underlying input stream that's "connected" with the remote host. I/O usually are very time-consuming operations, especially networking. Also, establishing an HTTP connection is an expensive thing here. At my machine I've ended up with the following average timing:

我同意安德烈亚斯怀疑解析方法是邪恶的根源。实际上并非如此。如果你仔细看看你的要点,你会发现parse方法接受一个委托读者,它实际上使用了与远程主机“连接”的基础输入流。 I / O通常是非常耗时的操作,尤其是网络。此外,建立HTTP连接在这里是一件昂贵的事情。在我的机器上,我最终得到了以下平均时间:

  • making HTTP requests: ~1.50..2.00s at the very first request, and 0.50..1.00s for the consecutive ones;
  • 发出HTTP请求:第一次请求时为~1.50 ... 2.00s,连续请求为0.50..1.00s;

  • reading data: ~0.80s (either dumb reading until the end, or JSON parsing -- does not really matter, Gson is really very fast; also you can profile the performance even in your browser with a network debugger or time wget URL if you use a Unix terminal).
  • 读取数据:~0.80s(直到最后都是哑读,或者是JSON解析 - 并不重要,Gson真的非常快;你甚至可以在浏览器中用网络调试器或时间wget URL来分析性能使用Unix终端)。

Another point suggested by Andreas is use of multiple threads in order to run independent tasks in parallel. This can speed the things up, but it won't make super change to you since your service access is not that fast, unfortunately.

Andreas建议的另一点是使用多个线程来并行运行独立任务。这可以加快速度,但不会对您进行超级更改,因为您的服务访问速度并不快。

Executing SingleThreadedDemo...
Executing SingleThreadedDemo took 1063935ms         = ~17:43
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 353044ms           = ~5:53

Running the demo later gave the following results (approximately 3 times faster, no idea what's the real cause of the previous slowdown)

稍后运行演示会得到以下结果(大约快3倍,不知道之前减速的真正原因是什么)

Executing SingleThreadedDemo...
Executing SingleThreadedDemo took 382249ms          = ~6:22
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 130502ms           = ~2:11
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 110119ms           = ~1:50

AbstractDemo.java

The following class violates some good OOP design concepts, but in order not to bloat the total number of classes, let its stuff just be here.

下面的类违反了一些优秀的OOP设计概念,但是为了不增加类的总数,让它的东西就在这里。

abstract class AbstractDemo
        implements Callable<List<EnvoyData>> {

    // Gson is thread-safe
    private static final Gson gson = new Gson();

    // JsonParser is thread-safe: https://groups.google.com/forum/#!topic/google-gson/u6hq2OVpszc
    private static final JsonParser jsonParser = new JsonParser();

    interface IPointsAndYearbooksConsumer {

        void acceptPointsAndYearbooks(SerializedDataPoints points, SerializedDataYears yearbooks);

    }

    interface ITripsConsumer {

        void acceptTrips(SerializedDataTrips trips);

    }

    AbstractDemo() {
    }

    protected abstract List<EnvoyData> doCall()
            throws Exception;

    // This implementation measures time (in milliseconds) taken for each demo call
    @Override
    public final List<EnvoyData> call()
            throws Exception {
        final String name = getClass().getSimpleName();
        final long start = currentTimeMillis();
        try {
            out.printf("Executing %s...\n", name);
            final List<EnvoyData> result = doCall();
            out.printf("Executing %s took %dms\n", name, currentTimeMillis() - start);
            return result;
        } catch ( final Exception ex ) {
            err.printf("Executing %s took %dms\n", name, currentTimeMillis() - start);
            throw ex;
        }
    }

    // This is a generic method that encapsulates generic pagination and lets you to iterate over the service pages in for-each style manner 
    static Iterable<JsonElement> jsonRequestsAt(final URL startUrl, final Function<? super JsonObject, URL> nextLinkExtrator, final JsonParser jsonParser) {
        return () -> new Iterator<JsonElement>() {
            private URL nextUrl = startUrl;

            @Override
            public boolean hasNext() {
                return nextUrl != null;
            }

            @Override
            public JsonElement next() {
                if ( nextUrl == null ) {
                    throw new NoSuchElementException();
                }
                try ( final Reader reader = readFrom(nextUrl) ) {
                    final JsonElement root = jsonParser.parse(reader);
                    nextUrl = nextLinkExtrator.apply(root.getAsJsonObject());
                    return root;
                } catch ( final IOException ex ) {
                    throw new RuntimeException(ex);
                }
            }
        };
    }

    // Just a helper method to iterate over the start response
    static Iterable<JsonElement> getAfterwords()
            throws MalformedURLException {
        return jsonRequestsAt(
                afterwordsUrl(),
                root -> {
                    try {
                        final JsonElement next = root.get("Links").getAsJsonObject().get("next");
                        return next != null ? new URL(next.getAsString()) : null;
                    } catch ( final MalformedURLException ex ) {
                        throw new RuntimeException(ex);
                    }
                },
                jsonParser
        );
    }

    // Just extract points and yearbooks.
    // You can return a custom data holder class, but this one uses consuming-style passing the results via its parameter consumer
    static void extractPointsAndYearbooks(final Reader reader, final IPointsAndYearbooksConsumer consumer) {
        final JsonObject expensesJsonObject = jsonParser.parse(reader)
                .getAsJsonObject()
                .get("layers")
                .getAsJsonObject()
                .get("wydatki")
                .getAsJsonObject();
        final SerializedDataPoints points = gson.fromJson(expensesJsonObject.get("punkty").getAsJsonArray(), SerializedDataPoints.class);
        final SerializedDataYears yearbooks = gson.fromJson(expensesJsonObject.get("roczniki").getAsJsonArray(), SerializedDataYears.class);
        consumer.acceptPointsAndYearbooks(points, yearbooks);
    }

    // The same as above but for another type of response
    static void extractTrips(final Reader reader, final ITripsConsumer consumer) {
        final JsonElement tripsJsonElement = jsonParser.parse(reader)
                .getAsJsonObject()
                .get("layers")
                .getAsJsonObject()
                .get("wyjazdy");
        final SerializedDataTrips trips = tripsJsonElement.isJsonArray()
                ? gson.fromJson(tripsJsonElement.getAsJsonArray(), SerializedDataTrips.class)
                : null;
        consumer.acceptTrips(trips);
    }

    // It might be a constant field, but the next methods are dynamic (parameter-dependent), so let them all be similar
    // Checked exceptions are not that evil, and let the call-site decide what to do with them
    static URL afterwordsUrl()
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie.json");
    }

    // The same as above
    static URL afterwordsUrl(final int page)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie.json?_type=objects&page=" + page);
    }

    // The same as above
    static URL tripsUrl(final int envoyId)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie/" + envoyId + ".json?layers[]=wyjazdy");
    }

    // The same as above
    static URL expensesUrl(final int envoyId)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie/" + envoyId + ".json?layers[]=wydatki");
    }

    // Since jsonParser is encapsulated
    static JsonElement parseJsonElement(final Reader reader) {
        return jsonParser.parse(reader);
    }

    // A helper method to return a reader for the given URL
    static Reader readFrom(final URL url)
            throws IOException {
        final HttpURLConnection request = (HttpURLConnection) url.openConnection();
        request.connect();
        return new BufferedReader(new InputStreamReader((InputStream) request.getContent()));
    }

    // Waits for all futures used in multi-threaded demo
    // Not sure how good this method is since I'm not an expert in concurrent programming unfortunately
    static void waitForAllFutures(final Iterable<? extends Future<?>> futures)
            throws ExecutionException, InterruptedException {
        final Iterator<? extends Future<?>> iterator = futures.iterator();
        while ( iterator.hasNext() ) {
            final Future<?> future = iterator.next();
            future.get();
            iterator.remove();
        }
    }

}

SingleThreadedDemo.java

The simplest demo. Entire data pulling is executed in a single thread, so it tends to be the slowest demo here. This one is totally thread-safe having no fields and can be declared as a singleton.

最简单的演示。整个数据拉动在一个线程中执行,因此它往往是最慢的演示。这个是完全线程安全的,没有字段,可以声明为单例。

final class SingleThreadedDemo
        extends AbstractDemo {

    private static final Callable<List<EnvoyData>> singleThreadedDemo = new SingleThreadedDemo();

    private SingleThreadedDemo() {
    }

    static Callable<List<EnvoyData>> getSingleThreadedDemo() {
        return singleThreadedDemo;
    }

    @Override
    protected List<EnvoyData> doCall()
            throws IOException {
        final List<EnvoyData> envoys = new ArrayList<>();
        for ( final JsonElement afterwordJsonElement : getAfterwords() ) {
            final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                    extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                        // ... consume points and yearbooks here
                    });
                }
                try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                    extractTrips(tripsReader, trips -> {
                        // ... consume trips here
                    });
                }
            }
        }
        return envoys;
    }

}

MultiThreadedDemo.java

Unfortunately I'm really very weak in Java concurrency, and probably these multi-threaded demos can be improved dramatically. This semi-multi-threaded demo that uses both approaches:

不幸的是,我在Java并发性方面非常弱,而且这些多线程演示可能会得到显着改善。这个使用两种方法的半多线程演示:

  • one thread for iterating over pages;
  • 一个用于迭代页面的线程;

  • multiple threads to grab points, yearbooks, and trips data.
  • 多个线程来获取积分,年鉴和旅行数据。

Also note that this demo (and another multi-threaded one below) is not fail-safe: if anything throws an exception in a submitted task, the executor service background thread won't terminate properly. Thus you might want to make it fail-safe and robust yourself.

另请注意,此演示(以及下面的另一个多线程)不是故障安全的:如果在提交的任务中抛出异常,则执行程序服务后台线程将无法正常终止。因此,您可能希望自己实现故障安全和强大。

final class MultiThreadedDemo
        extends AbstractDemo {

    private final ExecutorService executorService;

    private MultiThreadedDemo(final ExecutorService executorService) {
        this.executorService = executorService;
    }

    static Callable<List<EnvoyData>> getMultiThreadedDemo(final ExecutorService executorService) {
        return new MultiThreadedDemo(executorService);
    }

    @Override
    protected List<EnvoyData> doCall()
            throws InterruptedException, ExecutionException, MalformedURLException {
        final List<EnvoyData> envoys = synchronizedList(new ArrayList<>());
        final Collection<Future<?>> futures = new ConcurrentLinkedQueue<>();
        for ( final JsonElement afterwordJsonElement : getAfterwords() ) {
            final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsJsonPrimitive().getAsInt();
                submitExtractPointsAndYearbooks(futures, envoyId);
                submitExtractTrips(futures, envoyId);
            }
        }
        waitForAllFutures(futures);
        return envoys;
    }

    private void submitExtractPointsAndYearbooks(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                    // ... consume points and yearbooks here
                });
                return null;
            }
        }));
    }

    private void submitExtractTrips(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                extractTrips(tripsReader, trips -> {
                    // ... consume trips here
                });
                return null;
            }
        }));
    }

}

MultiThreadedEstimatedPagesDemo.java

This one is somewhat more enhanced version of the previous demo. But this demo submits executor service tasks for iterating over the service pages. To achieve it, it's necessary to detect the number of pages beforehand. And having the number of pages lets to make https://...poslowie.json?...page=... URLs processing parallel. Note that if there are more than 1 page found, the next iteration starts from the 2nd page, not making a duplicate request.

这个是前一个演示的一些更强大的版本。但是这个演示提交了执行者服务任务,用于迭代服务页面。为了实现它,有必要事先检测页数。并且有多少页面可以使https://...poslowie.json?... page = ... URL并行处理。请注意,如果找到的页面超过1,则下一次迭代从第2页开始,而不是重复请求。

final class MultiThreadedEstimatedPagesDemo
        extends AbstractDemo {

    private final ExecutorService executorService;

    private MultiThreadedEstimatedPagesDemo(final ExecutorService executorService) {
        this.executorService = executorService;
    }

    static Callable<List<EnvoyData>> getMultiThreadedEstimatedPagesDemo(final ExecutorService executorService) {
        return new MultiThreadedEstimatedPagesDemo(executorService);
    }

    @Override
    protected List<EnvoyData> doCall()
            throws IOException, JsonIOException, JsonSyntaxException, InterruptedException, ExecutionException {
        final List<EnvoyData> envoys = synchronizedList(new ArrayList<>());
        final JsonObject page1RootJsonObject;
        final int totalPages;
        try ( final Reader page1Reader = readFrom(afterwordsUrl()) ) {
            page1RootJsonObject = parseJsonElement(page1Reader).getAsJsonObject();
            totalPages = estimateTotalPages(page1RootJsonObject);
        }
        final Collection<Future<?>> futures = new ConcurrentLinkedQueue<>();
        futures.add(executorService.submit(() -> {
            final JsonArray dataObjectArray = page1RootJsonObject.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                submitExtractPointsAndYearbooks(futures, envoyId);
                submitExtractTrips(futures, envoyId);
            }
            return null;
        }));
        for ( int page = 2; page <= totalPages; page++ ) {
            final int finalPage = page;
            futures.add(executorService.submit(() -> {
                try ( final Reader reader = readFrom(afterwordsUrl(finalPage)) ) {
                    final JsonElement afterwordJsonElement = parseJsonElement(reader);
                    final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
                    for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                        final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                        submitExtractPointsAndYearbooks(futures, envoyId);
                        submitExtractTrips(futures, envoyId);
                    }
                }
                return null;
            }));
        }
        waitForAllFutures(futures);
        return envoys;
    }

    private static int estimateTotalPages(final JsonObject rootJsonObject) {
        final int elementsPerPage = rootJsonObject.get("Dataobject").getAsJsonArray().size();
        final int totalElements = rootJsonObject.get("Count").getAsInt();
        return (int) ceil((double) totalElements / elementsPerPage);
    }

    private void submitExtractPointsAndYearbooks(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                    // ... consume points and yearbooks here
                });
                return null;
            }
        }));
    }

    private void submitExtractTrips(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                extractTrips(tripsReader, trips -> {
                    // ... consume trips here
                });
                return null;
            }
        }));
    }

}

Test.java

And the demo itself:

演示本身:

public final class Test {

    private Test() {
    }

    public static void main(final String... args)
            throws Exception {
        runSingleThreadedDemo();
        runMultiThreadedDemo();
        runMultiThreadedEstimatedPagesDemo();
    }

    private static void runSingleThreadedDemo()
            throws Exception {
        final Callable<?> singleThreadedDemo = getSingleThreadedDemo();
        singleThreadedDemo.call();
    }

    private static void runMultiThreadedDemo()
            throws Exception {
        final ExecutorService executorService = newFixedThreadPool(getRuntime().availableProcessors());
        final Callable<?> demo = getMultiThreadedDemo(executorService);
        demo.call();
        executorService.shutdown();
    }

    private static void runMultiThreadedEstimatedPagesDemo()
            throws Exception {
        final ExecutorService executorService = newFixedThreadPool(getRuntime().availableProcessors());
        final Callable<?> demo = getMultiThreadedEstimatedPagesDemo(executorService);
        demo.call();
        executorService.shutdown();
    }

}

#1


1  

How can i check if problem lies in taht getContent method?

我怎样才能检查问题是否存在于getContent方法中?

You can prove it indirectly: just check your service API performance in your web browser network debugger tab, or measure the time for simple wget like time wget YOUR_URL.

您可以间接证明它:只需在Web浏览器网络调试器选项卡中检查您的服务API性能,或者测量简单wget的时间,例如时间wget YOUR_URL。

I agree with Andreas having doubt that the parse method is the root of the evil. Actually it's not. If you look at your gist closer, you can see that the parse method accepts a delegated reader that actually uses the underlying input stream that's "connected" with the remote host. I/O usually are very time-consuming operations, especially networking. Also, establishing an HTTP connection is an expensive thing here. At my machine I've ended up with the following average timing:

我同意安德烈亚斯怀疑解析方法是邪恶的根源。实际上并非如此。如果你仔细看看你的要点,你会发现parse方法接受一个委托读者,它实际上使用了与远程主机“连接”的基础输入流。 I / O通常是非常耗时的操作,尤其是网络。此外,建立HTTP连接在这里是一件昂贵的事情。在我的机器上,我最终得到了以下平均时间:

  • making HTTP requests: ~1.50..2.00s at the very first request, and 0.50..1.00s for the consecutive ones;
  • 发出HTTP请求:第一次请求时为~1.50 ... 2.00s,连续请求为0.50..1.00s;

  • reading data: ~0.80s (either dumb reading until the end, or JSON parsing -- does not really matter, Gson is really very fast; also you can profile the performance even in your browser with a network debugger or time wget URL if you use a Unix terminal).
  • 读取数据:~0.80s(直到最后都是哑读,或者是JSON解析 - 并不重要,Gson真的非常快;你甚至可以在浏览器中用网络调试器或时间wget URL来分析性能使用Unix终端)。

Another point suggested by Andreas is use of multiple threads in order to run independent tasks in parallel. This can speed the things up, but it won't make super change to you since your service access is not that fast, unfortunately.

Andreas建议的另一点是使用多个线程来并行运行独立任务。这可以加快速度,但不会对您进行超级更改,因为您的服务访问速度并不快。

Executing SingleThreadedDemo...
Executing SingleThreadedDemo took 1063935ms         = ~17:43
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 353044ms           = ~5:53

Running the demo later gave the following results (approximately 3 times faster, no idea what's the real cause of the previous slowdown)

稍后运行演示会得到以下结果(大约快3倍,不知道之前减速的真正原因是什么)

Executing SingleThreadedDemo...
Executing SingleThreadedDemo took 382249ms          = ~6:22
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 130502ms           = ~2:11
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 110119ms           = ~1:50

AbstractDemo.java

The following class violates some good OOP design concepts, but in order not to bloat the total number of classes, let its stuff just be here.

下面的类违反了一些优秀的OOP设计概念,但是为了不增加类的总数,让它的东西就在这里。

abstract class AbstractDemo
        implements Callable<List<EnvoyData>> {

    // Gson is thread-safe
    private static final Gson gson = new Gson();

    // JsonParser is thread-safe: https://groups.google.com/forum/#!topic/google-gson/u6hq2OVpszc
    private static final JsonParser jsonParser = new JsonParser();

    interface IPointsAndYearbooksConsumer {

        void acceptPointsAndYearbooks(SerializedDataPoints points, SerializedDataYears yearbooks);

    }

    interface ITripsConsumer {

        void acceptTrips(SerializedDataTrips trips);

    }

    AbstractDemo() {
    }

    protected abstract List<EnvoyData> doCall()
            throws Exception;

    // This implementation measures time (in milliseconds) taken for each demo call
    @Override
    public final List<EnvoyData> call()
            throws Exception {
        final String name = getClass().getSimpleName();
        final long start = currentTimeMillis();
        try {
            out.printf("Executing %s...\n", name);
            final List<EnvoyData> result = doCall();
            out.printf("Executing %s took %dms\n", name, currentTimeMillis() - start);
            return result;
        } catch ( final Exception ex ) {
            err.printf("Executing %s took %dms\n", name, currentTimeMillis() - start);
            throw ex;
        }
    }

    // This is a generic method that encapsulates generic pagination and lets you to iterate over the service pages in for-each style manner 
    static Iterable<JsonElement> jsonRequestsAt(final URL startUrl, final Function<? super JsonObject, URL> nextLinkExtrator, final JsonParser jsonParser) {
        return () -> new Iterator<JsonElement>() {
            private URL nextUrl = startUrl;

            @Override
            public boolean hasNext() {
                return nextUrl != null;
            }

            @Override
            public JsonElement next() {
                if ( nextUrl == null ) {
                    throw new NoSuchElementException();
                }
                try ( final Reader reader = readFrom(nextUrl) ) {
                    final JsonElement root = jsonParser.parse(reader);
                    nextUrl = nextLinkExtrator.apply(root.getAsJsonObject());
                    return root;
                } catch ( final IOException ex ) {
                    throw new RuntimeException(ex);
                }
            }
        };
    }

    // Just a helper method to iterate over the start response
    static Iterable<JsonElement> getAfterwords()
            throws MalformedURLException {
        return jsonRequestsAt(
                afterwordsUrl(),
                root -> {
                    try {
                        final JsonElement next = root.get("Links").getAsJsonObject().get("next");
                        return next != null ? new URL(next.getAsString()) : null;
                    } catch ( final MalformedURLException ex ) {
                        throw new RuntimeException(ex);
                    }
                },
                jsonParser
        );
    }

    // Just extract points and yearbooks.
    // You can return a custom data holder class, but this one uses consuming-style passing the results via its parameter consumer
    static void extractPointsAndYearbooks(final Reader reader, final IPointsAndYearbooksConsumer consumer) {
        final JsonObject expensesJsonObject = jsonParser.parse(reader)
                .getAsJsonObject()
                .get("layers")
                .getAsJsonObject()
                .get("wydatki")
                .getAsJsonObject();
        final SerializedDataPoints points = gson.fromJson(expensesJsonObject.get("punkty").getAsJsonArray(), SerializedDataPoints.class);
        final SerializedDataYears yearbooks = gson.fromJson(expensesJsonObject.get("roczniki").getAsJsonArray(), SerializedDataYears.class);
        consumer.acceptPointsAndYearbooks(points, yearbooks);
    }

    // The same as above but for another type of response
    static void extractTrips(final Reader reader, final ITripsConsumer consumer) {
        final JsonElement tripsJsonElement = jsonParser.parse(reader)
                .getAsJsonObject()
                .get("layers")
                .getAsJsonObject()
                .get("wyjazdy");
        final SerializedDataTrips trips = tripsJsonElement.isJsonArray()
                ? gson.fromJson(tripsJsonElement.getAsJsonArray(), SerializedDataTrips.class)
                : null;
        consumer.acceptTrips(trips);
    }

    // It might be a constant field, but the next methods are dynamic (parameter-dependent), so let them all be similar
    // Checked exceptions are not that evil, and let the call-site decide what to do with them
    static URL afterwordsUrl()
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie.json");
    }

    // The same as above
    static URL afterwordsUrl(final int page)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie.json?_type=objects&page=" + page);
    }

    // The same as above
    static URL tripsUrl(final int envoyId)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie/" + envoyId + ".json?layers[]=wyjazdy");
    }

    // The same as above
    static URL expensesUrl(final int envoyId)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie/" + envoyId + ".json?layers[]=wydatki");
    }

    // Since jsonParser is encapsulated
    static JsonElement parseJsonElement(final Reader reader) {
        return jsonParser.parse(reader);
    }

    // A helper method to return a reader for the given URL
    static Reader readFrom(final URL url)
            throws IOException {
        final HttpURLConnection request = (HttpURLConnection) url.openConnection();
        request.connect();
        return new BufferedReader(new InputStreamReader((InputStream) request.getContent()));
    }

    // Waits for all futures used in multi-threaded demo
    // Not sure how good this method is since I'm not an expert in concurrent programming unfortunately
    static void waitForAllFutures(final Iterable<? extends Future<?>> futures)
            throws ExecutionException, InterruptedException {
        final Iterator<? extends Future<?>> iterator = futures.iterator();
        while ( iterator.hasNext() ) {
            final Future<?> future = iterator.next();
            future.get();
            iterator.remove();
        }
    }

}

SingleThreadedDemo.java

The simplest demo. Entire data pulling is executed in a single thread, so it tends to be the slowest demo here. This one is totally thread-safe having no fields and can be declared as a singleton.

最简单的演示。整个数据拉动在一个线程中执行,因此它往往是最慢的演示。这个是完全线程安全的,没有字段,可以声明为单例。

final class SingleThreadedDemo
        extends AbstractDemo {

    private static final Callable<List<EnvoyData>> singleThreadedDemo = new SingleThreadedDemo();

    private SingleThreadedDemo() {
    }

    static Callable<List<EnvoyData>> getSingleThreadedDemo() {
        return singleThreadedDemo;
    }

    @Override
    protected List<EnvoyData> doCall()
            throws IOException {
        final List<EnvoyData> envoys = new ArrayList<>();
        for ( final JsonElement afterwordJsonElement : getAfterwords() ) {
            final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                    extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                        // ... consume points and yearbooks here
                    });
                }
                try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                    extractTrips(tripsReader, trips -> {
                        // ... consume trips here
                    });
                }
            }
        }
        return envoys;
    }

}

MultiThreadedDemo.java

Unfortunately I'm really very weak in Java concurrency, and probably these multi-threaded demos can be improved dramatically. This semi-multi-threaded demo that uses both approaches:

不幸的是,我在Java并发性方面非常弱,而且这些多线程演示可能会得到显着改善。这个使用两种方法的半多线程演示:

  • one thread for iterating over pages;
  • 一个用于迭代页面的线程;

  • multiple threads to grab points, yearbooks, and trips data.
  • 多个线程来获取积分,年鉴和旅行数据。

Also note that this demo (and another multi-threaded one below) is not fail-safe: if anything throws an exception in a submitted task, the executor service background thread won't terminate properly. Thus you might want to make it fail-safe and robust yourself.

另请注意,此演示(以及下面的另一个多线程)不是故障安全的:如果在提交的任务中抛出异常,则执行程序服务后台线程将无法正常终止。因此,您可能希望自己实现故障安全和强大。

final class MultiThreadedDemo
        extends AbstractDemo {

    private final ExecutorService executorService;

    private MultiThreadedDemo(final ExecutorService executorService) {
        this.executorService = executorService;
    }

    static Callable<List<EnvoyData>> getMultiThreadedDemo(final ExecutorService executorService) {
        return new MultiThreadedDemo(executorService);
    }

    @Override
    protected List<EnvoyData> doCall()
            throws InterruptedException, ExecutionException, MalformedURLException {
        final List<EnvoyData> envoys = synchronizedList(new ArrayList<>());
        final Collection<Future<?>> futures = new ConcurrentLinkedQueue<>();
        for ( final JsonElement afterwordJsonElement : getAfterwords() ) {
            final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsJsonPrimitive().getAsInt();
                submitExtractPointsAndYearbooks(futures, envoyId);
                submitExtractTrips(futures, envoyId);
            }
        }
        waitForAllFutures(futures);
        return envoys;
    }

    private void submitExtractPointsAndYearbooks(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                    // ... consume points and yearbooks here
                });
                return null;
            }
        }));
    }

    private void submitExtractTrips(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                extractTrips(tripsReader, trips -> {
                    // ... consume trips here
                });
                return null;
            }
        }));
    }

}

MultiThreadedEstimatedPagesDemo.java

This one is somewhat more enhanced version of the previous demo. But this demo submits executor service tasks for iterating over the service pages. To achieve it, it's necessary to detect the number of pages beforehand. And having the number of pages lets to make https://...poslowie.json?...page=... URLs processing parallel. Note that if there are more than 1 page found, the next iteration starts from the 2nd page, not making a duplicate request.

这个是前一个演示的一些更强大的版本。但是这个演示提交了执行者服务任务,用于迭代服务页面。为了实现它,有必要事先检测页数。并且有多少页面可以使https://...poslowie.json?... page = ... URL并行处理。请注意,如果找到的页面超过1,则下一次迭代从第2页开始,而不是重复请求。

final class MultiThreadedEstimatedPagesDemo
        extends AbstractDemo {

    private final ExecutorService executorService;

    private MultiThreadedEstimatedPagesDemo(final ExecutorService executorService) {
        this.executorService = executorService;
    }

    static Callable<List<EnvoyData>> getMultiThreadedEstimatedPagesDemo(final ExecutorService executorService) {
        return new MultiThreadedEstimatedPagesDemo(executorService);
    }

    @Override
    protected List<EnvoyData> doCall()
            throws IOException, JsonIOException, JsonSyntaxException, InterruptedException, ExecutionException {
        final List<EnvoyData> envoys = synchronizedList(new ArrayList<>());
        final JsonObject page1RootJsonObject;
        final int totalPages;
        try ( final Reader page1Reader = readFrom(afterwordsUrl()) ) {
            page1RootJsonObject = parseJsonElement(page1Reader).getAsJsonObject();
            totalPages = estimateTotalPages(page1RootJsonObject);
        }
        final Collection<Future<?>> futures = new ConcurrentLinkedQueue<>();
        futures.add(executorService.submit(() -> {
            final JsonArray dataObjectArray = page1RootJsonObject.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                submitExtractPointsAndYearbooks(futures, envoyId);
                submitExtractTrips(futures, envoyId);
            }
            return null;
        }));
        for ( int page = 2; page <= totalPages; page++ ) {
            final int finalPage = page;
            futures.add(executorService.submit(() -> {
                try ( final Reader reader = readFrom(afterwordsUrl(finalPage)) ) {
                    final JsonElement afterwordJsonElement = parseJsonElement(reader);
                    final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
                    for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                        final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                        submitExtractPointsAndYearbooks(futures, envoyId);
                        submitExtractTrips(futures, envoyId);
                    }
                }
                return null;
            }));
        }
        waitForAllFutures(futures);
        return envoys;
    }

    private static int estimateTotalPages(final JsonObject rootJsonObject) {
        final int elementsPerPage = rootJsonObject.get("Dataobject").getAsJsonArray().size();
        final int totalElements = rootJsonObject.get("Count").getAsInt();
        return (int) ceil((double) totalElements / elementsPerPage);
    }

    private void submitExtractPointsAndYearbooks(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                    // ... consume points and yearbooks here
                });
                return null;
            }
        }));
    }

    private void submitExtractTrips(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                extractTrips(tripsReader, trips -> {
                    // ... consume trips here
                });
                return null;
            }
        }));
    }

}

Test.java

And the demo itself:

演示本身:

public final class Test {

    private Test() {
    }

    public static void main(final String... args)
            throws Exception {
        runSingleThreadedDemo();
        runMultiThreadedDemo();
        runMultiThreadedEstimatedPagesDemo();
    }

    private static void runSingleThreadedDemo()
            throws Exception {
        final Callable<?> singleThreadedDemo = getSingleThreadedDemo();
        singleThreadedDemo.call();
    }

    private static void runMultiThreadedDemo()
            throws Exception {
        final ExecutorService executorService = newFixedThreadPool(getRuntime().availableProcessors());
        final Callable<?> demo = getMultiThreadedDemo(executorService);
        demo.call();
        executorService.shutdown();
    }

    private static void runMultiThreadedEstimatedPagesDemo()
            throws Exception {
        final ExecutorService executorService = newFixedThreadPool(getRuntime().availableProcessors());
        final Callable<?> demo = getMultiThreadedEstimatedPagesDemo(executorService);
        demo.call();
        executorService.shutdown();
    }

}