Updated on 2024/10/07

MASADA Tomonari
 
*Items subject to periodic update by Rikkyo University (The rest are reprinted from information registered on researchmap.)
Affiliation*
Graduate School of Artificial Intelligence and Science Master's Program in Artificial Intelligence and Science
College of Economics Department of Economics
Graduate School of Artificial Intelligence and Science Doctoral Program in Artificial Intelligence and Science
Title*
Professor
Degree
Master of Arts and Sciences ( The University of Tokyo ) / Master of Science ( The University of Tokyo ) / Doctor of Information Science and Technology ( The University of Tokyo ) / Bachelor of Science ( The University of Tokyo )
Research Theme*
  • My research interest is in data analysis with machine learning, especially text analysis with language models. In settings where topic models and other machine learning models were conventionally used, I now analyze a wide variety of text data, such as scholarly information, by exploiting text embeddings from deep learning models including LLMs, aiming to raise the value of text collections as information sources. [Career summary] Entered the University of Tokyo (Natural Sciences I); completed master's programs in the Department of Information Science, Graduate School of Science, and in the History and Philosophy of Science group, Department of Multi-disciplinary Sciences, Graduate School of Arts and Sciences. After working at an optical equipment manufacturer, obtained a doctorate from the Graduate School of Information Science and Technology, the University of Tokyo. Spent 13 years on the faculty of Nagasaki University before assuming the current position.

  Research Interests
  • probabilistic models

  • text mining

  • machine learning

  • data mining

  Campus Career*
    • Apr 2022 - Present 
      Graduate School of Artificial Intelligence and Science   Master's Program in Artificial Intelligence and Science   Professor
    • Apr 2022 - Present 
      Graduate School of Artificial Intelligence and Science   Doctoral Program in Artificial Intelligence and Science   Professor
    • Apr 2020 - Present 
      College of Economics   Department of Economics   Professor
    • Apr 2020 - Mar 2022 
      Graduate School of Artificial Intelligence and Science   Artificial Intelligence and Science   Professor
     

    Research Areas

    • Informatics / Theory of informatics

    • Informatics / Intelligent informatics

    • Informatics / Database

    Research History

    • Apr 2020 - Present 
      Rikkyo University, Graduate School of Artificial Intelligence and Science   Professor

    • Apr 2012 - Mar 2020 
      Nagasaki University   Graduate School of Engineering   Associate Professor

      Country: Japan

    • 2008 - 2012 
      Nagasaki University   Faculty of Engineering, Department of Electrical and Electronic Engineering   Assistant Professor

    • 2007 - 2008 
      Nagasaki University   Faculty of Engineering, Department of Computer and Information Sciences   Assistant Professor

    • Oct 1999 - Sep 2001 
      Fuji Photo Optical Co., Ltd.   Optical Design Department   Engineering Staff

    Education

    • - 2004 
      The University of Tokyo

      Country: Japan

    • - 1999 
      The University of Tokyo

      Country: Japan

    • - 1995 
      The University of Tokyo

      Country: Japan

    • - 1993 
      The University of Tokyo   Faculty of Science   Department of Information Science

      Country: Japan

    Awards

    • Feb 2020  
      Kyushu Semiconductor & Electronics Innovation Council (SIIQ)   FY2019 2nd SIIQ Technology Award, Gold Prize 
       
      MASADA Tomonari

    • May 2018  
      Science and Engineering Institute  Best Oral Presentation  Document Modeling with Implicit Approximate Posterior Distributions
       
      Tomonari MASADA

    • Jun 2011  
      INSTICC  Best Paper Award  DOCUMENTS AS A BAG OF MAXIMAL SUBSTRINGS - An Unsupervised Feature Extraction for Document Clustering

    • 2006  
      Information Processing Society of Japan (IPSJ) Paper Award 

      Country: Japan

    • 2003  
      DEWS Excellent Presentation Award 

      Country: Japan

    Papers

    • Feature Extraction from Equipment Sensor Signals with Time Series Clustering and Its Application to Defect Prediction

      Daisuke Hamaguchi, Tomonari Masada, Takumi Eguchi

      IEEE International Symposium on Semiconductor Manufacturing (ISSM), Conference Proceedings   Dec 15, 2020

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Institute of Electrical and Electronics Engineers Inc.  

      In semiconductor manufacturing processes, it is important to quickly identify any signs of the occurrence of defects. We applied a time-series clustering method to the signal data of processing equipment and obtained information related to the occurrence of defects. By using the information as the feature values of a prediction model, we were able to predict defects more accurately than by using only conventional feature values.

      DOI: 10.1109/ISSM51728.2020.9377525
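
      One plausible reading of the feature-extraction step described above can be sketched as follows. This is only an illustration, not the authors' implementation: the window length, the use of k-means, and the histogram features are assumptions made for the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_histogram_features(signals, n_clusters=8, window=50, step=25):
    """Cut each equipment signal into fixed-length windows, cluster the windows
    with k-means, and describe each signal by its normalized histogram of cluster labels."""
    windows, owners = [], []
    for i, signal in enumerate(signals):
        for start in range(0, len(signal) - window + 1, step):
            windows.append(signal[start:start + window])
            owners.append(i)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(np.asarray(windows))
    features = np.zeros((len(signals), n_clusters))
    for owner, label in zip(owners, labels):
        features[owner, label] += 1
    return features / np.maximum(features.sum(axis=1, keepdims=True), 1)

# The histograms can be concatenated with conventional process features and fed
# to any defect classifier, e.g. sklearn.linear_model.LogisticRegression.
```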

    • Myanmar Text-to-Speech System based on Tacotron-2.

      Yuzana Win, Tomonari Masada

      International Conference on Information and Communication Technology Convergence(ICTC)   578 - 583   2020

      Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      DOI: 10.1109/ICTC49870.2020.9289599

      Other Link: https://dblp.uni-trier.de/db/conf/ictc/ictc2020.html#WinM20

    • Myanmar Text-to-Speech System based on Tacotron (End-to-End Generative Model).

      Yuzana Win, Htoo Pyae Lwin, Tomonari Masada

      International Conference on Information and Communication Technology Convergence(ICTC)   572 - 577   2020

      Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      DOI: 10.1109/ICTC49870.2020.9289277

      Other Link: https://dblp.uni-trier.de/db/conf/ictc/ictc2020.html#WinLM20

    • Context-Dependent Token-Wise Variational Autoencoder for Topic Modeling. Peer-reviewed

      Tomonari Masada

      Current Trends in Web Engineering - ICWE 2019 International Workshops   35 - 47   2019

      Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      DOI: 10.1007/978-3-030-51253-8_6

      Other Link: https://dblp.uni-trier.de/db/conf/icwe/icwe2019w.html#Masada19

    • Difference between Similars: A Novel Method to Use Topic Models for Sensor Data Analysis. Peer-reviewed

      Tomonari Masada, Takumi Eguchi, Daisuke Hamaguchi

      2019 International Conference on Data Mining Workshops   391 - 398   2019

      Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      DOI: 10.1109/ICDMW.2019.00064

      Other Link: https://dblp.uni-trier.de/db/conf/icdm/icdm2019w.html#MasadaEH19

    • Mini-Batch Variational Inference for Time-Aware Topic Modeling. Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Nanjing, China, August 28-31, 2018, Proceedings, Part II   156 - 164   2018

      Authorship:Lead author   Publisher:Springer  

      DOI: 10.1007/978-3-319-97310-4_18

    • LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   10862   395 - 402   2018

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

      This paper proposes a method of scoring sequences generated by recurrent neural network (RNN) for automatic Tanka composition. Our method gives sequences a score based on topic assignments provided by latent Dirichlet allocation (LDA). When many word tokens in a sequence are assigned to the same topic, we give the sequence a high score. While a scoring of sequences can also be achieved by using RNN output probabilities, the sequences having large probabilities are likely to share much the same subsequences and thus are doomed to be deprived of diversity. The experimental results, where we scored Japanese Tanka poems generated by RNN, show that the top-ranked sequences selected by our method were likely to contain a wider variety of subsequences than those selected by RNN output probabilities.

      DOI: 10.1007/978-3-319-93713-7_33
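
      The scoring idea in the abstract above reduces to a very small computation once per-token topic assignments are available. A minimal sketch, assuming the assignments come from some LDA inference step; the function name and input format are illustrative only.

```python
from collections import Counter

def topic_concentration_score(topic_assignments):
    """Score a generated sequence by how strongly its word tokens concentrate on a
    single LDA topic: the fraction of tokens assigned to the dominant topic."""
    if not topic_assignments:
        return 0.0
    counts = Counter(topic_assignments)
    return max(counts.values()) / len(topic_assignments)

# Example: assignments [3, 3, 7, 3, 3] give a score of 0.8, so that sequence would be
# ranked above one with assignments [1, 4, 7, 2, 9] (score 0.2).
```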

    • Document Modeling with Implicit Approximate Posterior Distributions. Peer-reviewed

      Tomonari Masada

      Proceedings of the International Conference on Data Processing and Applications, ICDPA 2018, Guangdong, China, May 12-14, 2018   45 - 48   2018

    • Adversarial Learning for Topic Models. Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Advanced Data Mining and Applications - 14th International Conference, ADMA 2018, Nanjing, China, November 16-18, 2018, Proceedings   292 - 302   2018

      Publisher:Springer  

      DOI: 10.1007/978-3-030-05090-0_25

    • Estimating Word probabilities with neural networks in latent dirichlet allocation Peer-reviewed

      Tomonari Masada

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   10526   129 - 137   2017

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

      This paper proposes a new method for estimating the word probabilities in latent Dirichlet allocation (LDA). LDA uses a Dirichlet distribution as the prior for the per-document topic discrete distributions. While another Dirichlet prior can be introduced for the per-topic word discrete distributions, point estimations may lead to a better evaluation result, e.g. in terms of test perplexity. This paper proposes a method for the point estimation of the per-topic word probabilities in LDA by using multilayer perceptron (MLP). Our point estimation is performed in an online manner by mini-batch gradient ascent. We compared our method to the baseline method using a perceptron with no hidden layers and also to the collapsed Gibbs sampling (CGS). The evaluation experiment showed that the test perplexity of CGS could not be improved in almost all cases. However, there certainly were situations where our method achieved a better perplexity than the baseline. We also discuss a usage of our method as word embedding.

      DOI: 10.1007/978-3-319-67274-8_12

    • Exploring OOV Words from Myanmar Text Using Maximal Substrings Peer-reviewed

      Yuzana Win, Tomonari Masada

      PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016   657 - 663   2016

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      This paper proposes a method for exploring out-of-vocabulary (OOV) words from Myanmar text by using maximal substrings. Our main purpose is to find OOV words that can be added into the Myanmar dictionary. The outcome of our method are new compound words that do not exist in the Myanmar dictionary. Our method consists of two steps. In the first step, we extract maximal substrings, i.e., the substrings whose number of occurrences are decreased only after appending a character before or after them, from Myanmar news articles. In the second step, we make the post processing of maximal substrings, because the results obtained by maximal substrings contain noisy characters. Our post-processing is threefold. First, we reduce the number of maximal substrings. Second, we remove maximal substrings whose prefixes and suffixes are meaningless characters. Third, we find OOV words that are the substrings consisting of the two words from the existing dictionary. Consequently, we obtain the substrings as candidates of new compound words that can be inserted into the existing Myanmar dictionary after being scrutinized by native speakers. We evaluate the accuracy of new compound words by using the subjective perspective. It is found that our results do seem promising. We appeal that new compound words obtained by our method are useful for expressing the words as a single unit of meaning that can be utilized in Myanmar text effectively.

      DOI: 10.1109/IIAI-AAI.2016.73

      Other Link: http://dblp.uni-trier.de/db/conf/iiaiaai/iiaiaai2016.html#conf/iiaiaai/WinM16
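
      The notion of a maximal substring used above can be made concrete with a naive sketch: a substring is kept only if appending any single character to its head or tail strictly lowers its occurrence count. The papers extract these efficiently with suffix-array style algorithms; the quadratic counting below and all thresholds are assumptions made purely for illustration.

```python
from collections import Counter

def maximal_substrings(text, min_len=2, max_len=8, min_count=2):
    """Enumerate substrings whose occurrence count strictly drops whenever one
    character is appended to the head or the tail (a naive, quadratic version)."""
    counts = Counter(
        text[i:i + n]
        for n in range(1, max_len + 2)           # +1 so one-character extensions are counted too
        for i in range(len(text) - n + 1)
    )
    alphabet = set(text)
    found = []
    for s, c in counts.items():
        if not (min_len <= len(s) <= max_len) or c < min_count:
            continue
        left = max((counts[x + s] for x in alphabet if x + s in counts), default=0)
        right = max((counts[s + x] for x in alphabet if s + x in counts), default=0)
        if left < c and right < c:
            found.append((s, c))
    return sorted(found, key=lambda item: -item[1])
```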

    • Extraction of Proper Names from Myanmar Text Using Latent Dirichlet Allocation Peer-reviewed

      Yuzana Win, Tomonari Masada

      2016 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI)   96 - 103   2016

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      This paper proposes a method for proper names extraction from Myanmar text by using latent Dirichlet allocation (LDA). Our method aims to extract proper names that provide important information on the contents of Myanmar text. Our method consists of two steps. In the first step, we extract topic words from Myanmar news articles by using LDA. In the second step, we make a post-processing, because the resulting topic words contain some noisy words. Our post-processing, first of all, eliminates the topic words whose prefixes are Myanmar digits and suffixes are noun and verb particles. We then remove the duplicate words and discard the topic words that are contained in the existing dictionary. Consequently, we obtain the words as candidate of proper names, namely personal names, geographical names, unique object names, organization names, single event names, and so on. The evaluation is performed both from the subjective and quantitative perspectives. From the subjective perspective, we compare the accuracy of proper names extracted by our method with those extracted by latent semantic indexing (LSI) and rule-based method. It is shown that both LSI and our method can improve the accuracy of those obtained by rule-based method. However, our method can provide more interesting proper names than LSI. From the quantitative perspective, we use the extracted proper names as additional features in K-means clustering. The experimental results show that the document clusters given by our method are better than those given by LSI and rule-based method in precision, recall and F-score.

      DOI: 10.1109/TAAI.2016.7880176

    • A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2016, PT IV   9789   232 - 245   2016

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER INT PUBLISHING AG  

      This paper proposes a new inference for the latent Dirichlet allocation (LDA) [4]. Our proposal is an instance of the stochastic gradient variational Bayes (SGVB) [9,13]. SGVB is a general framework for devising posterior inferences for Bayesian probabilistic models. Our aim is to show the effectiveness of SGVB by presenting an example of SGVB-type inference for LDA, the best-known Bayesian model in text mining. The inference proposed in this paper is easy to implement from scratch. A special feature of the proposed inference is that the logistic normal distribution is used to approximate the true posterior. This is counterintuitive, because we obtain the Dirichlet distribution by taking the functional derivative when we lower bound the log evidence of LDA after applying a mean field approximation. However, our experiment showed that the proposed inference gave a better predictive performance in terms of test set perplexity than the inference using the Dirichlet distribution for posterior approximation. While the logistic normal is more complicated than the Dirichlet, SGVB makes the manipulation of the expectations with respect to the posterior relatively easy. The proposed inference was better even than the collapsed Gibbs sampling [6] for not all but many settings consulted in our experiment. It must be worthwhile future work to devise a new inference based on SGVB also for other Bayesian models.

      DOI: 10.1007/978-3-319-42089-9_17
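
      The central trick of the SGVB-style inference above is to approximate the per-document topic proportions with a diagonal logistic normal so that the reparameterization trick applies. A minimal sketch of that reparameterized draw, with illustrative variable names and without the full ELBO optimization:

```python
import numpy as np

def sample_topic_proportions(mu, log_sigma, rng=None):
    """Reparameterized draw from a diagonal logistic normal:
    eta ~ N(mu, diag(exp(log_sigma)^2)), theta = softmax(eta).
    mu and log_sigma are NumPy arrays of the same shape; in an autodiff
    framework, gradients w.r.t. them flow through eps."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(mu.shape)
    eta = mu + np.exp(log_sigma) * eps
    eta = eta - eta.max()                          # numerical stability
    return np.exp(eta) / np.exp(eta).sum()         # topic proportions on the simplex
```

      In a full inference, such a draw would feed a Monte Carlo estimate of the evidence lower bound, with a KL term between N(mu, sigma^2) and the prior.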

    • A simple stochastic gradient variational bayes for the correlated topic model Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9932   424 - 428   2016

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

      This paper proposes a new inference for the correlated topic model (CTM) [3]. CTM is an extension of LDA [4] for modeling correlations among latent topics. The proposed inference is an instance of the stochastic gradient variational Bayes (SGVB) [7,8]. By constructing the inference network with the diagonal logistic normal distribution, we achieve a simple inference. Especially, there is no need to invert the covariance matrix explicitly. We performed a comparison with LDA in terms of predictive perplexity. The two inferences for LDA are considered: the collapsed Gibbs sampling (CGS) [5] and the collapsed variational Bayes with a zero-order Taylor expansion approximation (CVB0) [1]. While CVB0 for LDA gave the best result, the proposed inference achieved the perplexities comparable with those of CGS for LDA.

      DOI: 10.1007/978-3-319-45817-5_39

    • Heuristic Pretraining for Topic Models Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      CURRENT APPROACHES IN APPLIED ARTIFICIAL INTELLIGENCE   9101   351 - 360   2015

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides a heuristic pretraining for topic models. While we consider latent Dirichlet allocation (LDA) here, our pretraining can be applied to other topic models. Basically, we use collapsed Gibbs sampling (CGS) to update the latent variables. However, after every iteration of CGS, we regard the latent variables as observable and construct another LDA over them, which we call LDA over LDA (LoL). We then perform the following two types of updates: the update of the latent variables in LoL by CGS and the update of the latent variables in LDA based on the result of the preceding update of the latent variables in LoL. We perform one iteration of CGS for LDA and the above two types of updates alternately only for a small, earlier part of the inference. That is, the proposed method is used as a pretraining. The pretraining stage is followed by the usual iterations of CGS for LDA. The evaluation experiment shows that our pretraining can improve test set perplexity.

      DOI: 10.1007/978-3-319-19066-2_34

    • Traffic Speed Data Investigation with Hierarchical Modeling Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      FUTURE DATA AND SECURITY ENGINEERING, FDSE 2015   9446   123 - 134   2015

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER INT PUBLISHING AG  

      This paper presents a novel topic model for traffic speed analysis in the urban environment. Our topic model is special in that the parameters for encoding the following two domain-specific aspects of traffic speeds are introduced. First, traffic speeds are measured by the sensors each having a fixed location. Therefore, it is likely that similar measurements will be given by the sensors located close to each other. Second, traffic speeds show a 24-hour periodicity. Therefore, it is likely that similar measurements will be given at the same time point on different days. We model these two aspects with Gaussian process priors and make topic probabilities location-and time-dependent. In this manner, our model utilizes the metadata of the traffic speed data. We offer a slice sampling to achieve less approximation than variational Bayesian inferences. We present an experimental result where we use the traffic speed data provided by New York City.

      DOI: 10.1007/978-3-319-26135-5_10

    • Exploring Technical Phrase Frames from Research Paper Titles Peer-reviewed

      Yuzana Win, Tomonari Masada

      2015 IEEE 29TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS WAINA 2015   558 - 563   2015

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      This paper proposes a method for exploring technical phrase frames by extracting word n-grams that match our information needs and interests from research paper titles. Technical phrase frames, the outcome of our method, are phrases with wildcards that may be substituted for any technical term. Our method, first of all, extracts word trigrams from research paper titles and constructs a co-occurrence graph of the trigrams. Even by simply applying PageRank algorithm to the co-occurrence graph, we obtain the trigrams that can be regarded as technical keyphrases at the higher ranks in terms of PageRank score. In contrast, our method assigns weights to the edges of the co-occurrence graph based on Jaccard similarity between trigrams and then apply weighted PageRank algorithm. Consequently, we obtain widely different but more interesting results. While the top-ranked trigrams obtained by unweighted PageRank have just a self-contained meaning, those obtained by our method are technical phrase frames, i.e., a word sequence that forms a complete technical phrase only after putting a technical word (or words) before or/and after it. We claim that our method is a useful tool for discovering important phraseological patterns, which can expand query keywords for improving information retrieval performance and can also work as candidate phrasings in technical writing to make our research papers attractive.

      DOI: 10.1109/WAINA.2015.37
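
      A rough reconstruction of the pipeline described above, using networkx: extract word trigrams from titles, link trigrams that co-occur in the same title, weight edges by Jaccard similarity between their word sets, and rank nodes by weighted PageRank. The tokenization and the way edge weights are accumulated are assumptions; this is not the authors' code.

```python
import itertools
import networkx as nx

def title_trigrams(title):
    tokens = title.lower().split()
    return {" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)}

def ranked_trigrams(titles):
    """Co-occurrence graph of title trigrams with Jaccard-weighted edges,
    ranked by weighted PageRank (highest score first)."""
    graph = nx.Graph()
    for title in titles:
        for a, b in itertools.combinations(sorted(title_trigrams(title)), 2):
            set_a, set_b = set(a.split()), set(b.split())
            jaccard = len(set_a & set_b) / len(set_a | set_b)
            if jaccard > 0.0:
                weight = graph.get_edge_data(a, b, default={"weight": 0.0})["weight"]
                graph.add_edge(a, b, weight=weight + jaccard)
    scores = nx.pagerank(graph, weight="weight")
    return sorted(scores, key=scores.get, reverse=True)
```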

    • ChronoSAGE: Diversifying Topic Modeling Chronologically Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      WEB-AGE INFORMATION MANAGEMENT, WAIM 2014   8485   476 - 479   2014

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides an application of sparse additive generative models (SAGE) for temporal topic analysis. In our model, called ChronoSAGE, topic modeling results are diversified chronologically by using document timestamps. That is, word tokens are generated not only in a topic-specific manner, but also in a time-specific manner. We firstly compare ChronoSAGE with latent Dirichlet allocation (LDA) in terms of pointwise mutual information to show its practical effectiveness. We secondly give an example of time-differentiated topics, obtained by ChronoSAGE as word lists, to show its usefulness in trend detection.

      DOI: 10.1007/978-3-319-08010-9_51

    • A topic model for traffic speed data analysis Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8482 ( 2 ) 68 - 77   2014

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Verlag  

      We propose a probabilistic model for traffic speed data. Our model inherits two key features from latent Dirichlet allocation (LDA). Firstly, unlike e.g. stock market data, lack of data is often perceived for traffic speed data due to unexpected failure of sensors or networks. Therefore, we regard speed data not as a time series, but as an unordered multiset in the same way as LDA regards documents not as a sequence, but as a bag of words. This also enables us to analyze co-occurrence patterns of speed data regardless of their positions along the time axis. Secondly, we regard a daily set of speed data gathered from the same sensor as a document and model it not with a single distribution, but with a mixture of distributions as in LDA. While each such distribution is called topic in LDA, we call it patch to remove text-mining connotation and name our model Patchy. This approach enables us to model speed co-occurrence patterns effectively. However, speed data are non-negative real. Therefore, we use Gamma distributions in place of multinomial distributions. Due to these two features, Patchy can reveal context dependency of traffic speed data. For example, a 60 mph observed on Sunday can be assigned to a patch different from that to which a 60 mph on Wednesday is assigned. We evaluate this context dependency through a binary classification task, where test data are classified as either weekday data or not. We use real traffic speed data provided by New York City and compare Patchy with the baseline method, where a simpler data model is applied. © 2014 Springer International Publishing Switzerland.

      DOI: 10.1007/978-3-319-07467-2_8

    • Explaining Prices by Linking Data: A Pilot Study on Spatial Regression Analysis of Apartment Rents Peer-reviewed

      Bin Shen, Tomonari Masada

      2014 IEEE 3RD GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE)   188 - 189   2014

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      This paper reports a pilot study where we link different types of data for explaining prices. In this study, we link the apartment rent data with the publicly accessible location data of landmarks like supermarkets, hospitals, etc. We apply the regression analysis to find the most important factor determining the apartment rents. We claim that the results of this type of spatial data mining can enhance the user experience in the apartment search system, because we can indicate a rationale behind pricing as additional information to users and thus can make them more confident in their choices.

      DOI: 10.1109/GCCE.2014.7031088

    • Collaborator Recommendation for Isolated Researchers Peer-reviewed

      Tin Huynh, Atsuhiro Takasu, Tomonari Masada, Kiem Hoang

      2014 28TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS (WAINA)   639 - 644   2014

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      Successful research collaborations may facilitate major outcomes in science and their applications. Thus, identifying effective collaborators may be a key factor that affects success. However, it is very difficult to identify potential collaborators and it is particularly difficult for young researchers who have less knowledge about other researchers and experts in their research domain. This study introduces and defines the problem of collaborator recommendation for 'isolated' researchers who have no links with others in coauthor networks. Existing approaches such as link-based and content-based methods may not be suitable for isolated researchers because of their lack of links and content information. Thus, we propose a new approach that uses additional information as new features to make recommendations, i.e., the strength of the relationship between organizations, the importance rating, and the activity scores of researchers. We also propose a new method for evaluating the quality of collaborator recommendations. We performed experiments by crawling publications from the Microsoft Academic Search website. The metadata were extracted from these publications, including the year, authors, organizational affiliations of authors, citations, and references. The metadata from publications between 2001 and 2005 were used as the training data while those from 2006 to 2011 were used for validation. The experimental results demonstrated the effectiveness and efficiency of our proposed approach.

      DOI: 10.1109/WAINA.2014.105

    • Trimming prototypes of handwritten digit images with subset infinite relational model Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Electrical Engineering   240   129 - 134   2013

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      We propose a new probabilistic model for constructing efficient prototypes of handwritten digit images. We assume that all digit images are of the same size and obtain one color histogram for each pixel by counting the number of occurrences of each color over multiple images. For example, when we conduct the counting over the images of digit "5", we obtain a set of histograms as a prototype of digit "5". After normalizing each histogram to a probability distribution, we can classify an unknown digit image by multiplying probabilities of the colors appearing at each pixel of the unknown image. We regard this method as the baseline and compare it with a method using our probabilistic model called Multinomialized Subset Infinite Relational Model (MSIRM), which gives a prototype, where color histograms are clustered column- and row-wise. The number of clusters is adjusted flexibly with Chinese restaurant process. Further, MSIRM can detect irrelevant columns and rows. An experiment, comparing our method with the baseline and also with a method using Dirichlet process mixture, revealed that MSIRM could neatly detect irrelevant columns and rows at peripheral part of digit images. That is, MSIRM could "trim" irrelevant part. By utilizing this trimming, we could speed up classification of unknown images. © 2013 Springer Science+Business Media Dordrecht(Outside the USA).

      DOI: 10.1007/978-94-007-6738-6_16

    • A revised inference for correlated topic model Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7952 ( 2 ) 445 - 454   2013

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      In this paper, we provide a revised inference for correlated topic model (CTM) [3]. CTM is proposed by Blei et al. for modeling correlations among latent topics more expressively than latent Dirichlet allocation (LDA) [2] and has been attracting attention of researchers. However, we have found that the variational inference of the original paper is unstable due to almost-singularity of the covariance matrix when the number of topics is large. This means that we may be reluctant to use CTM for analyzing a large document set, which may cover a rich diversity of topics. Therefore, we revise the inference and improve its quality. First, we modify the formula for updating the covariance matrix in a manner that enables us to recover the original inference by adjusting a parameter. Second, we regularize posterior parameters for reducing a side effect caused by the formula modification. While our method is based on a heuristic intuition, an experiment conducted on large document sets showed that it worked effectively in terms of perplexity. © 2013 Springer-Verlag Berlin Heidelberg.

      DOI: 10.1007/978-3-642-39068-5_54

    • Three-way nonparametric Bayesian clustering for handwritten digit image classification Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8228 ( 3 ) 149 - 156   2013

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      This paper proposes a new approach for handwritten digit image classification using a nonparametric Bayesian probabilistic model, called multinomialized subset infinite relational model (MSIRM). MSIRM realizes a three-way clustering, i.e., a simultaneous clustering of digit images, pixel columns, and pixel rows, where the numbers of clusters are adjusted automatically with Chinese restaurant process (CRP). We obtain MSIRM as a modification of subset infinite relational model (SIRM) by Ishiguro et al. [4] While this modification is straightforward, our application of MSIRM to handwritten digit image classification leads to an impressive result. To represent a large number of training digit images in a compact form, we cluster the training images and then classify a test image to the class of the cluster most similar to the test image. By extending this line of thought, MSIRM clusters not only digit images but also pixel columns and pixel rows to obtain a more compact representation. With this three-way clustering, we achieved 2.95% and 5.38% test error rates for MNIST and USPS datasets, respectively. © Springer-Verlag 2013.

      DOI: 10.1007/978-3-642-42051-1_20

    • Clustering Documents with Maximal Substrings Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

      ENTERPRISE INFORMATION SYSTEMS, ICEIS 2011   102   19 - 34   2012

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides experimental results showing that we can use maximal substrings as elementary building blocks of documents in place of the words extracted by a current state-of-the-art supervised word extraction. Maximal substrings are defined as the substrings each giving a smaller number of occurrences even by appending only one character to its head or tail. The main feature of maximal substrings is that they can be extracted quite efficiently in an unsupervised manner. We extract maximal substrings from a document set and represent each document as a bag of maximal substrings. We also obtain a bag of words representation by using a state-of-the-art supervised word extraction over the same document set. We then apply the same document clustering method to both representations and obtain two clustering results for a comparison of their quality. We adopt a Bayesian document clustering based on Dirichlet compound multinomials for avoiding overfitting. Our experiment shows that the clustering quality achieved with maximal substrings is acceptable enough to use them in place of the words extracted by a supervised word extraction.

      DOI: 10.1007/978-3-642-29958-2_2

    • Extraction of topic evolutions from references in scientific articles and its GPU acceleration Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu

      ACM International Conference Proceeding Series   1522 - 1526   2012

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

      This paper provides a topic model for extracting topic evolutions as a corpus-wide transition matrix among latent topics. Recent trends in text mining point to a high demand for exploiting metadata. Especially, exploitation of reference relationships among documents induced by hyperlinking Web pages, citing scientific articles, tumblring blog posts, retweeting tweets, etc., is put in the foreground of the effort for an effective mining. We focus on scholarly activities and propose a topic model for obtaining a corpus-wide view on how research topics evolve along citation relationships. Our model, called TERESA, extends latent Dirichlet allocation (LDA) by introducing a corpus-wide topic transition probability matrix, which models reference relationships as transitions among topics. Our approximated variational inference updates LDA posteriors and topic transition posteriors alternately. The main issue is execution time amounting to O(MK2), where K is the number of topics and M is that of links in citation network. Therefore, we accelerate the inference with Nvidia CUDA compatible GPUs. We compare the effectiveness of TERESA with that of LDA by introducing a new measure called diversity plus focusedness (D+F). We also present topic evolution examples our method gives. © 2012 ACM.

      DOI: 10.1145/2396761.2398465

      Other Link: http://dblp.uni-trier.de/db/conf/cikm/cikm2012.html#conf/cikm/MasadaT12

    • Unsupervised segmentation of bibliographic elements with latent permutations Peer-reviewed

      Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

      Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   6724 LNCS   254 - 267   2011

      Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

      This paper introduces a novel approach for large-scale unsupervised segmentation of bibliographic elements. Our problem is to segment a word token sequence representing a citation into subsequences each corresponding to a different bibliographic element, e.g. authors, paper title, journal name, publication year, etc. Obviously, each bibliographic element should be represented by contiguous word tokens. We call this constraint contiguity constraint. Therefore, we should infer a sequence of assignments of word tokens to bibliographic elements so that this constraint is satisfied. Many HMM-based methods solve this problem by prescribing fixed transition patterns among bibliographic elements. In this paper, we use generalized Mallows models (GMM) in a Bayesian multi-topic model, effectively applied to document structure learning by Chen et al. [4], and infer a permutation of latent topics each of which can be interpreted as one among the bibliographic elements. According to the inferred permutation, we arrange the order of the draws from a multinomial distribution defined over topics. In this manner, we can obtain an ordered sequence of topic assignments satisfying contiguity constraint. We do not need to prescribe any transition patterns among bibliographic elements. We only need to specify the number of bibliographic elements. However, the method proposed by Chen et al. works for our problem only after introducing modification. The main contribution of this paper is to propose strategies to make their method work also for our problem. © 2011 Springer-Verlag.

      DOI: 10.1007/978-3-642-24396-7_20

    • Steering Time-Dependent Estimation of Posteriors with Hyperparameter Indexing in Bayesian Topic Models Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

      ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I   6634   435 - 447   2011

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides a new approach to topical trend analysis. Our aim is to improve the generalization power of latent Dirichlet allocation (LDA) by using document timestamps. Many previous works model topical trends by making latent topic distributions time-dependent. We propose a straightforward approach by preparing a different word multinomial distribution for each time point. Since this approach increases the number of parameters, overfitting becomes a critical issue. Our contribution to this issue is two-fold. First, we propose an effective way of defining Dirichlet priors over the word multinomials. Second, we propose a special scheduling of variational Bayesian (VB) inference. Comprehensive experiments with six datasets prove that our approach can improve LDA and also Topics over Time, a well-known variant of LDA, in terms of test data perplexity in the framework of VB inference.

      DOI: 10.1007/978-3-642-20841-6_36

    • DOCUMENTS AS A BAG OF MAXIMAL SUBSTRINGS An Unsupervised Feature Extraction for Document Clustering Peer-reviewed

      Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

      ICEIS 2011: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1   5 - 13   2011

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:INSTICC-INST SYST TECHNOLOGIES INFORMATION CONTROL & COMMUNICATION  

      This paper provides experimental results showing how we can use maximal substrings as elementary features in document clustering. We extract maximal substrings, i.e., the substrings each giving a smaller number of occurrences even after adding only one character at its head or tail, from the given document set and represent each document as a bag of maximal substrings after reducing the variety of maximal substrings by a simple frequency-based selection. This extraction can be done in an unsupervised manner. Our experiment aims to compare bag of maximal substrings representation with bag of words representation in document clustering. For clustering documents, we utilize Dirichlet compound multinomials, a Bayesian version of multinomial mixtures, and measure the results by F-score. Our experiment showed that maximal substrings were as effective as words extracted by a dictionary-based morphological analysis for Korean documents. For Chinese documents, maximal substrings were not so effective as words extracted by a supervised segmentation based on conditional random fields. However, one fourth of the clustering results given by bag of maximal substrings representation achieved F-scores better than the mean F-score given by bag of words representation. It can be said that the use of maximal substrings achieved an acceptable performance in document clustering.

    • Semi-supervised Bibliographic Element Segmentation with Latent Permutations Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

      DIGITAL LIBRARIES: FOR CULTURAL HERITAGE, KNOWLEDGE DISSEMINATION, AND FUTURE CREATION   7008   60 - +   2011

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper proposes a semi-supervised bibliographic element segmentation. Our input data is a large scale set of bibliographic references each given as an unsegmented sequence of word tokens. Our problem is to segment each reference into bibliographic elements, e.g. authors, title, journal, pages, etc. We solve this problem with an LDA-like topic model by assigning each word token to a topic so that the word tokens assigned to the same topic refer to the same bibliographic element. Topic assignments should satisfy contiguity constraint, i.e., the constraint that the word tokens assigned to the same topic should be contiguous. Therefore, we proposed a topic model in our preceding work [8] based on the topic model devised by Chen et al. [3]. Our model extends LDA and realizes unsupervised topic assignments satisfying contiguity constraint. The main contribution of this paper is the proposal of a semi-supervised learning for our proposed model. We assume that at most one third of word tokens are already labeled. In addition, we assume that a few percent of the labels may be incorrect. The experiment showed that our semi-supervised learning improved the unsupervised learning by a large margin and achieved an over 90% segmentation accuracy.

      DOI: 10.1007/978-3-642-24826-9_11

    • Implementation of a programming environment with a multithread model for reconfigurable systems Peer-reviewed

      Keisuke Dohi, Yuichiro Shibata, Tsuyoshi Hamada, Tomonari Masada, Kiyoshi Oguri, Duncan A. Buell

      ACM SIGARCH Computer Architecture News   38 ( 4 ) 40 - 45   Sep 14, 2010

      Publishing type:Research paper (scientific journal)   Publisher:Association for Computing Machinery (ACM)  

      DOI: 10.1145/1926367.1926375

    • Infinite Latent Process Decomposition Peer-reviewed

      Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

      2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW)   810 - 811   2010

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE COMPUTER SOC  

      This paper presents infinite latent process decomposition (iLPD), a new microarray analysis method, as an extension of latent process decomposition. Our method assumes an infinite number of latent processes. Further, our new collapsed variational Bayesian inference improves the inference proposed in [2] in the treatment of Dirichlet hyperparameters. We also give the results of the comparison experiment.

    • Modeling Topical Trends over Continuous Time with Priors Peer-reviewed

      Tomonari Masada, Daiji Fukagawa, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

      ADVANCES IN NEURAL NETWORKS - ISNN 2010, PT 2, PROCEEDINGS   6064   302 - +   2010

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we propose a new method for topical trend analysis. We model topical trends by per-topic Beta distributions as in Topics over Time (TOT), proposed as an extension of latent Dirichlet allocation (LDA). However, TOT is likely to overfit to timestamp data in extracting latent topics. Therefore, we apply prior distributions to the Beta distributions in TOT. Since the Beta distribution has no conjugate prior, we devise a trick, where we set one among the two parameters of each per-topic Beta distribution to one based on a Bernoulli trial and apply a Gamma distribution as a conjugate prior. Consequently, we can marginalize out the parameters of the Beta distributions and thus treat timestamp data in a Bayesian fashion. In the evaluation experiment, we compare our method with LDA and TOT in the link detection task on the TDT4 dataset. We use word predictive probabilities as term weights and estimate document similarities by using those weights in a TFIDF-like scheme. The results show that our method achieves a moderate fitting to timestamp data.

      DOI: 10.1007/978-3-642-13318-3_38

    • A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs - Towards cost effective, high performance N-body simulation Peer-reviewed

      Tsuyoshi Hamada, Keigo Nitadori, Khaled Benkrid, Yousuke Ohno, Gentaro Morimoto, Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri, Makoto Taiji

      Computer Science - Research and Development   24 ( 1-2 ) 21 - 31   Sep 2009

      Publishing type:Research paper (scientific journal)  

      Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study as graphics processing units (GPUs) continue to be proposed as high performance and relatively low cost implementation platforms for scientific computing applications. Among these applications figure astrophysical N-body simulations, which form one of the most challenging problems in computational science. However, in most reported studies, a simple O(N^2) algorithm was used for GPGPUs, and the resulting performances were not observed to be better than those of conventional CPUs that were based on more optimized O(N log N) algorithms such as the tree algorithm or the particle-particle particle-mesh algorithm. Because of the difficulty in getting efficient implementations of such algorithms on GPUs, a GPU cluster had no practical advantage over general-purpose PC clusters for N-body simulations. In this paper, we report a new method for efficient parallel implementation of the tree algorithm on GPUs. Our novel tree code allows the realization of an N-body simulation on a GPU cluster at a much higher performance than that on general PC clusters. We practically performed a cosmological simulation with 562 million particles on a GPU cluster using 128 NVIDIA GeForce 8800GTS GPUs at an overall cost of $168,172. We obtained a sustained performance of 20.1 Tflops, which when normalized against a general-purpose CPU implementation leads to a performance of 8.50 Tflops. The achieved cost/performance was hence a mere $19.8/Gflops, which shows the high competitiveness of GPGPUs. © 2009 Springer-Verlag.

      DOI: 10.1007/s00450-009-0089-1

    • Accelerating the Phase Only Correlation method using GPUs Peer-reviewed

      MATSUO Kentaro, MIYOSHI Masayuki, HAMADA Tsuyoshi, SHIBATA Yuichiro, MASADA Tomonari, OGURI Kiyoshi

      ITE Technical Report   33 ( 0 ) 201 - 206   2009

      Language:Japanese   Publisher:The Institute of Image Information and Television Engineers  

      The Phase Only Correlation (POC) method demonstrates high robustness and subpixel accuracy in the pattern matching and the image registration. However, there is a disadvantage in computational speed because of the calculation of 2D-FFT etc. We have proposed a novel approach to accelerate POC method using GPU to solve the calculation cost problem. Using our GPU-based POC implementation, each POC calculation can be done within 2.36 seconds for 256×256 pixels, within 7.92 seconds for 512×512 pixels, and 27.65 seconds for 1024×1024 pixels.

      DOI: 10.11485/itetr.33.6.0_201

      Other Link: http://hdl.handle.net/10069/22664
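
      For reference, the Phase-Only Correlation computation that the report accelerates on a GPU can be written in a few lines of NumPy: form the normalized cross-power spectrum of the two images and transform it back; the peak of the resulting surface estimates the translation. This CPU sketch illustrates POC itself and says nothing about the GPU implementation reported above.

```python
import numpy as np

def phase_only_correlation(img_a, img_b):
    """Phase-Only Correlation between two equally sized grayscale images.
    Returns the POC surface and the estimated (dy, dx) shift of img_b relative to img_a."""
    fa, fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    cross = fa * np.conj(fb)
    poc = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real   # keep phase, discard magnitude
    poc = np.fft.fftshift(poc)
    peak_y, peak_x = np.unravel_index(np.argmax(poc), poc.shape)
    return poc, (peak_y - img_a.shape[0] // 2, peak_x - img_a.shape[1] // 2)
```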

    • Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

      ADVANCES IN DATA AND WEB MANAGEMENT, PROCEEDINGS   5446   556 - +   2009

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we propose a new probabilistic model, Bag of Timestamps (BoT), for chronological text mining. BoT is an extension of latent Dirichlet allocation (LDA), and has two remarkable features when compared with a previously proposed Topics over Time (ToT), which is also an extension of LDA. First, we can avoid overfitting to temporal data, because temporal data are modeled in a Bayesian manner similar to word frequencies. Second, BoT has a conditional probability where no functions requiring time-consuming computations appear. The experiments using newswire documents show that BoT achieves more moderate fitting to temporal data in shorter execution time than ToT.

      DOI: 10.1007/978-3-642-00672-2_51

    • Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices Peer-reviewed

      Tomonari Masada, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

      NEXT-GENERATION APPLIED INTELLIGENCE, PROCEEDINGS   5579   491 - 500   2009

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing. Therefore, we accelerate CVB inference, an efficient deterministic inference method for LDA, with Nvidia CUDA. In the evaluation experiments, we used a set of 50,000 documents and a set of 10,000 images. We could obtain inference results comparable to sequential CVB inference.

      DOI: 10.1007/978-3-642-02568-6_50

    • Dynamic hyperparameter optimization for bayesian topical trend analysis Peer-reviewed

      Tomonari Masada, Daiji Fukagawa, Atsuhiro Takasu, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

      International Conference on Information and Knowledge Management, Proceedings   1831 - 1834   2009

      Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

      This paper presents a new Bayesian topical trend analysis. We regard the parameters of topic Dirichlet priors in latent Dirichlet allocation as a function of document timestamps and optimize the parameters by a gradient-based algorithm. Since our method gives similar hyperparameters to the documents having similar timestamps, topic assignment in collapsed Gibbs sampling is affected by timestamp similarities. We compute TFIDF-based document similarities by using a result of collapsed Gibbs sampling and evaluate our proposal by link detection task of Topic Detection and Tracking. Copyright 2009 ACM.

      DOI: 10.1145/1645953.1646242

      Other Link: http://dblp.uni-trier.de/db/conf/cikm/cikm2009.html#conf/cikm/MasadaFTHSO09

    • Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation Peer-reviewed

      Tomonari Masada, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

      ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS   5678   253 - 264   2009

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper provides a new method for multi-topic Bayesian analysis for microarray data. Our method achieves a further maximization of lower bounds in a marginalized variational Bayesian inference (MVB) for Latent Process Decomposition (LPD), which is an effective probabilistic model for microarray data. In our method, hyperparameters in LPD are updated by empirical Bayes point estimation. The experiments based on microarray data of realistically large size show efficiency of our hyperparameter reestimation technique.

      DOI: 10.1007/978-3-642-03348-3_26

    • GPU acceleration of multiple topic extraction from images by LDA document model Peer-reviewed

      MASADA Tomonari, HAMADA Tsuyoshi, SHIBATA Yuichiro, OGURI Kiyoshi

      ITE Technical Report   32 ( 0 ) 1 - 6   2008

      Language:Japanese   Publisher:The Institute of Image Information and Television Engineers  

      In this paper, we propose a GPU acceleration of multi-topic extraction from images by using LDA (latent Dirichlet allocation). LDA was originally proposed as a probabilistic model for documents by Blei et al. Recently, LDA has been applied to multimedia information other than documents. We provide the results of experiments where we apply LDA to Professor Wang's 10,000 test images and extract multiple visual topics. We adopt the collapsed variational Bayesian inference method for LDA and accelerate it by using Nvidia CUDA compatible GPU devices.

      DOI: 10.11485/itetr.32.54.0_1

    • A Sub-Petaflops High Performance Computing System using GPUs Peer-reviewed

      Hamada Tsuyoshi, Masada Tomonari, Shibata Yuichiro, Oguri Kiyoshi

      ITE Technical Report   32 ( 0 ) 17 - 19   2008

      Language:Japanese   Publisher:The Institute of Image Information and Television Engineers  

      DOI: 10.11485/itetr.32.54.0_17

    • Unmixed spectrum clustering for template composition in lung sound classification Peer-reviewed

      Tomonari Masada, Senya Kiyasu, Sueharu Miyahara

      ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS   5012   964 - 969   2008

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we propose a method for composing templates of lung sound classification. First, we obtain a sequence of power spectra by FFT for each given lung sound and compute a small number of component spectra by ICA for each of the overlapping sets of tens of consecutive power spectra. Second, we put component spectra obtained from various lung sounds into a single set and conduct clustering a large number of times. When component spectra belong to the same cluster in all clustering results, these spectra show robust similarity. Therefore, we can use such spectra to compose a template of lung sound classification.

      DOI: 10.1007/978-3-540-68125-0_100

      researchmap

    • Comparing LDA with pLSI as a dimensionality reduction method in document clustering Peer-reviewed

      Tomonari Masada, Senya Kiyasu, Sueharu Miyahara

      LARGE-SCALE KNOWLEDGE RESOURCES 4938   13 - 26   2008

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER-VERLAG BERLIN  

      In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as a dimensionality reduction method and investigate their effectiveness in document clustering by using real-world document sets. For clustering of documents, we use a method based on multinomial mixture, which is known as an efficient framework for text mining. Clustering results are evaluated by F-measure, i.e., harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth for the evaluation of clustering results. Our experiment shows that the dimensionality reduction via LDA and pLSI results in document clusters of almost the same quality as those obtained by using original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment provides no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI at least for dimensionality reduction in document clustering.
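
      A minimal sketch of the evaluation pipeline, assuming scikit-learn's LatentDirichletAllocation, K-means as a stand-in for the paper's multinomial-mixture clustering, and a pairwise F-measure on a toy corpus; pLSI, random projection, and the original Japanese and Korean collections are not reproduced here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

def pairwise_f_measure(labels_true, labels_pred):
    """F-measure over document pairs: harmonic mean of pairwise precision and recall."""
    t = labels_true[:, None] == labels_true[None, :]
    p = labels_pred[:, None] == labels_pred[None, :]
    iu = np.triu_indices(len(labels_true), k=1)
    t, p = t[iu], p[iu]
    prec = (t & p).sum() / max(p.sum(), 1)
    rec = (t & p).sum() / max(t.sum(), 1)
    return 2 * prec * rec / max(prec + rec, 1e-12)

# toy corpus with two ground-truth categories (stand-in for the Web news articles)
docs = ["the pitcher threw a fast ball", "the batter hit a home run",
        "stocks fell as markets closed", "the market rallied on earnings"]
true = np.array([0, 0, 1, 1])

X = CountVectorizer().fit_transform(docs)                      # original feature vectors
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(theta)
print(pairwise_f_measure(true, np.asarray(pred)))
```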

      DOI: 10.1007/978-3-540-78159-2_2

      researchmap

    • A Distributed Data Placement Method Based on Term Weights in P2P Information Retrieval (jointly authored)

      倉沢央, 若木宏美, 正田備也, 高須淳宏, 安達淳

      IPSJ Multimedia, Distributed, Cooperative, and Mobile Symposium (DICOMO2007)   7 2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)  

      researchmap

    • Accuracy of Document Classification with Dirichlet Mixtures Peer-reviewed

      MASADA TOMONARI, TAKASU ATSUHIRO, ADACHI JUN

      IPSJ Transactions on Databases 48 ( SIG11(TOD34) ) 14 - 26   15 6 2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Information Processing Society of Japan (IPSJ)  

      The naive Bayes classifier is a well-known method for document classification. However, it gives a satisfying classification accuracy only after an appropriate tuning of the smoothing parameter, and appropriate parameter values must be found separately for different document sets. In this paper, we focus on an effective probabilistic framework for document classification, called Dirichlet mixtures, which requires no parameter tuning and provides satisfying classification accuracies for various document sets. Many studies in the fields of image processing and natural language processing utilize Dirichlet mixtures. Especially in natural language processing, many experiments are conducted with real document data sets. However, most studies use perplexity as an evaluation measure. While perplexity is a purely theoretical measure, accuracy is the popular measure for document classification in information retrieval and text mining; it is computed by comparing correct labels with the predictions made by the classifier. In this paper, we conduct an evaluation experiment using the 20 Newsgroups data set and Korean Web newspaper articles, with the intention of using Dirichlet mixtures for multilingual applications. In the experiment, we compare the naive Bayes classifier with the classifier based on Dirichlet mixtures and clarify their qualitative and quantitative differences.
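
      A minimal sketch of the smoothing-parameter sensitivity that motivates the paper, assuming scikit-learn's MultinomialNB on a toy corpus; the Dirichlet-mixture classifier itself is not reproduced here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# toy two-class corpus standing in for the 20 Newsgroups / Korean news articles
docs = ["win game pitcher", "home run hit", "pitcher strike out", "game score win",
        "stock price fell", "market rally earnings", "stock market crash", "earnings fell sharply"]
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = CountVectorizer().fit_transform(docs)

for alpha in (0.001, 0.01, 0.1, 1.0, 10.0):   # the smoothing parameter to be tuned
    acc = cross_val_score(MultinomialNB(alpha=alpha), X, labels, cv=4).mean()
    print(f"alpha={alpha:g}  accuracy={acc:.2f}")
```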

      CiNii Article

      researchmap

      Other Link: http://hdl.handle.net/10069/16317

    • A Distributed Placement Method for Indexes and Files in P2P Information Retrieval

      倉沢央, 正田備也, 高須淳宏, 安達淳

      IPSJ SIG Technical Report 2007 ( 36 ) 147 - 154   5 4 2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)  

      researchmap

    • Search Support through Comprehensive Presentation of Multiple Topics Using Topic-oriented Term Clustering

      若木裕美, 正田備也, 高須淳宏, 安達淳

      The 18th IEICE Data Engineering Workshop (DEWS 2007)   3 2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)  

      researchmap

    • Detection of abnormal lung sounds through investigation of breathing cycle Peer-reviewed

      Senya Kiyasu, Kohsuke Yanagihara, Tomonari Masada, Sueharu Miyahara, Mikio Oka

      Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers 61 ( 12 ) 1769 - 1773   2007

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Inst. of Image Information and Television Engineers  

      The purpose of our research is to develop a method for recognizing abnormal lung sounds without the need for a medical specialist. Listening to the sounds of the human body is one of the most important methods of checking someone's health, but identification of abnormal lung sounds is difficult for an untrained person. We differentiated true abnormal sounds from interfering noise by exploiting the fact that lung sounds are generated periodically in relation to the breathing cycle.

      DOI: 10.3169/itej.61.1769

      Scopus

      researchmap

    • Using a Knowledge Base to Disambiguate Personal Name in Web Search Results Peer-reviewed

      Quang Minh Vu, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      APPLIED COMPUTING 2007, VOL 1 AND 2   839 - +   2007

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ASSOC COMPUTING MACHINERY  

      Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document similarities and to find documents related to the same person. Some previous researchers have used the vector space model or have tried to extract common named entities for measuring similarities. We propose a new method that uses Web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts to determine similarities between document pairs. Experimental results show that our proposed method outperforms the vector space model method and the named entity recognition method.

      DOI: 10.1145/1244002.1244188

      researchmap

      Other Link: http://dblp.uni-trier.de/db/conf/sac/sac2007.html#conf/sac/VuMTA07

    • Disambiguation of people in web search using a knowledge base Peer-reviewed

      Quang Minh Vu, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      2007 IEEE International Conference on Research, Innovation and Vision for the Future, RIVF 2007   185 - 191   2007

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

      Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document similarities and to find documents related to the same person. Some previous researchers have used the vector space model or have tried to extract common named entities for measuring similarities. We propose a new method that uses Web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts to determine similarities between document pairs. Experimental results show that our proposed method outperforms the vector space model method and the named entity recognition method. © 2007 IEEE.

      DOI: 10.1109/RIVF.2007.369155

      Scopus

      researchmap

    • Query Refinement based on Topical Term Clustering. Peer-reviewed

      Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications) - RIAO 2007, 8th International Conference, Carnegie Mellon University, Pittsburgh, PA, USA, May 30 - June 1, 2007. Proceedings, CD-ROM   2007

      More details

      Publisher:CID  

      researchmap

    • Citation data clustering for author name disambiguation. Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      Proceedings of the 2nd International Conference on Scalable Information Systems, Infoscale 2007, Suzhou, China, June 6-8, 2007   62   2007

      More details

    • Using web directories for similarity measurement in personal name disambiguation Peer-reviewed

      Quang Minh Vu, Atsuhiro Takasu, Tomonari Masada, Jun Adachi

      Proceedings - 21st International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINAW'07 2   379 - 384   2007

      More details

      Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

      In this paper, we address the problem of personal name disambiguation in search results returned by personal name queries. Usually, a personal name refers to several people. Therefore, when a search engine returns a set of documents containing that name, they are often relevant to several individuals sharing the same name. Automatic differentiation of the people in the resulting documents may help users to search for the person of interest more easily. We propose a method that uses web directories to improve the similarity measurement in personal name disambiguation. We carried out experiments on real web documents in which we compared our method with the vector space model method and the named entity recognition method. The results show that our method has advantages over these previous methods. © 2007 IEEE.

      DOI: 10.1109/AINAW.2007.367

      Scopus

      researchmap

    • Discovering Exhaustive Topics by Specificity-oriented Term Clustering and Supporting Query Expansion

      若木裕美, 正田備也, 高須淳宏, 安達淳

      The 17th IEICE Data Engineering Workshop (DEWS 2006), 2C-i4   3 2006

      More details

      Language:Japanese   Publishing type:Research paper (scientific journal)  

      researchmap

    • A new measure for query disambiguation using term co-occurrences Peer-reviewed

      Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2006, PROCEEDINGS 4224   904 - 911   2006

      More details

      Language:English   Publishing type:Research paper (scientific journal)   Publisher:SPRINGER-VERLAG BERLIN  

      This paper explores techniques that discover terms to replace given query terms from a selected subset of documents. The Internet allows access to large numbers of documents archived in digital format. However, no user can be an expert in every field, and they have trouble finding the documents that suit their purposes when they cannot formulate queries that narrow the search to the context they have in mind. Accordingly, we propose a method for extracting terms from searched documents to replace user-provided query terms. Our results show that our method is successful in discovering terms that can be used to narrow the search.

      DOI: 10.1007/11875581_108

      researchmap

    • Link-Based Clustering for Finding Subrelevant Web Pages Peer-reviewed

      Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      Proc. International Workshop on Web Document Analysis, 2005 (WDA2005)   9 2005

      More details

      Language:English  

      researchmap

    • A Method for Presenting Keywords that Resolve Query Term Ambiguity

      若木裕美, 正田備也, 高須淳宏, 安達淳

      IPSJ SIG Technical Report, Database Systems 137 ( 137 ) 269 - 276   7 2005

      More details

      Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

      CiNii Article

      researchmap

    • Proposal and Evaluation of an Author Identification Method for Bibliographic Data Using Co-authorship Graphs

      鈴木康平, 正田備也, 高須淳宏, 安達淳

      IPSJ SIG Technical Report, Database Systems (Summer Database Workshop DBWS2005), 2005 ( 137 )   7 2005

      More details

      Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

      researchmap

    • Improving Web Search by Query Expansion with a Small Number of Terms. Peer-reviewed

      Tomonari Masada, Teruhito Kanazawa, Atsuhiro Takasu, Jun Adachi

      Proceedings of the Fifth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, NTCIR-5, National Center of Sciences, Tokyo, Japan, December 6-9, 2005   2005

      More details

      Publisher:National Institute of Informatics (NII)  

      researchmap

      Other Link: http://dblp.uni-trier.de/db/conf/ntcir/ntcir2005.html#conf/ntcir/MasadaKTA05

    • Decomposing the Web graph into parameterized connected components Peer-reviewed

      T Masada, A Takasu, J Adachi

      IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E87D ( 2 ) 380 - 388   2 2004

      More details

      Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

      We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the World Wide Web, page grouping is expected to provide a general grasp of the Web for effective information search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our grouping method is a generalization of decomposition into strongly connected components, in which each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by adjusting a parameter, called the threshold parameter. We call the resulting groups parameterized connected components (PCCs). The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our grouping method. This paper also includes experimental results for 15 million Web pages, which show the contribution of our method to efficient Web surfer navigation.
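
      A minimal sketch of the plain strongly connected component decomposition that parameterized connected components generalize, assuming networkx on a toy link graph; the threshold parameter and the Dijkstra-based step described above are not reproduced here.

```python
import networkx as nx

# toy Web graph: pages are vertices, hyperlinks are arcs (illustrative only)
links = [("a", "b"), ("b", "c"), ("c", "a"),    # a 3-page cycle forms one group
         ("c", "d"), ("d", "e"), ("e", "d")]    # d and e link to each other
web = nx.DiGraph(links)

groups = list(nx.strongly_connected_components(web))
print(groups)   # e.g. [{'a', 'b', 'c'}, {'d', 'e'}]
```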

      researchmap

      Other Link: http://dblp.uni-trier.de/db/journals/ieicet/ieicet87d.html#journals/ieicet/MasadaTA04

    • R2D2 at NTCIR-4 Web Retrieval Task. Peer-reviewed

      Teruhito Kanazawa, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

      Proceedings of the Fourth NTCIR Workshop on Research in Information Access Technologies Information Retrieval, Question Answering and Summarization, NTCIR-4, National Center of Sciences, Tokyo, Japan, June 2-4, 2004   2004

      More details

      Publisher:National Institute of Informatics (NII)  

      researchmap

      Other Link: http://dblp.uni-trier.de/db/conf/ntcir/ntcir2004.html#conf/ntcir/KanazawaMTA04

    • Web page grouping based on parameterized connectivity Peer-reviewed

      T Masada, A Takasu, J Adachi

      DATABASE SYSTEMS FOR ADVANCED APPLICATIONS 2973   374 - 380   2004

      More details

      Language:English   Publishing type:Research paper (scientific journal)   Publisher:SPRINGER-VERLAG BERLIN  

      We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the Web, page grouping is expected to provide a general grasp of the Web for effective Web search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our method is a generalization of the decomposition into strongly connected components. Each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by a parameter, called the threshold parameter. We call the resulting groups parameterized connected components. The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our method.

      DOI: 10.1007/978-3-540-24571-1_34

      researchmap

    • Effective Use of Web Information via Parameterized Connected Component Decomposition

      正田備也, 高須淳宏, 安達淳

      IPSJ SIG Technical Report, Database Systems (Summer Database Workshop DBWS2003), 2003 ( 131(71 )   22 7 2003

      More details

      Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

      researchmap

    • Grouping Web Pages by Parameterized Connected Component Decomposition

      正田備也, 高須淳宏, 安達淳

      IPSJ SIG on Database Systems, IPSJ SIG Technical Report 2002 ( 67, DB ) 297 - 304   7 2002

      More details

      Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

      researchmap

    • A Package for Triangulations. Peer-reviewed

      Tsuyoshi Ono, Yoshiaki Kyoda, Tomonari Masada, Kazuyoshi Hayase, Tetsuo Shibuya, Motoki Nakade, Mary Inaba, Hiroshi Imai, Keiko Imai, David Avis

      Proceedings of the Twelfth Annual Symposium on Computational Geometry, Philadelphia, PA, USA, May 24-26, 1996   V-17 - V-18   1996

    • Enumeration of Regular Triangulations. Peer-reviewed

      Tomonari Masada, Hiroshi Imai, Keiko Imai

      Proceedings of the Twelfth Annual Symposium on Computational Geometry, Philadelphia, PA, USA, May 24-26, 1996   224 - 233   1996

    ▼display all

    Misc.

    • Text Mining for "Zenkyoto Generation"

        ( 2020 ) 297 - 302   5 12 2020

      More details

      Language:Japanese  

      CiNii Article

      researchmap

    • Learning Tasks That Enhance Student Participation in Lecture Class

      NIWA Kazuhisa, MASADA Tomonari, FUKUZAWA Katsuhiko, MINE Mariko, YAMAJI Hiroki

      Journal of the Center for Educational Innovation Nagasaki University 5 ( 5 ) 19 - 24   3 2014

      More details

      Language:Japanese   Publisher:Nagasaki University  

      General education reform at Nagasaki University has required new pedagogies that enhance student participation in lecture class. The authors addressed this urgent issue by developing widely applicable methods in an interdisciplinary course titled "Information and Society." The course consisted of four lecture series on ICT applications, in which 72 students engaged in learning tasks that were designed to facilitate note-taking of key concepts and general reflection on the lecture content as well as the assessment of their comprehension level. The main instructor edited students' descriptions to put them onto the course site so that the whole class could share the learning and prepare for feedback sessions. Students also responded to questionnaires that were designed to inquire into their prior conceptualizations. Future directions using effective learning tasks in lecture class are discussed.

      CiNii Article

      researchmap

      Other Link: http://hdl.handle.net/10069/34322

    • Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

      Tomonari Masada

      International Journal of Organizational and Collective Intelligence 2 ( 2 ) 49 - 62   2011

      More details

    • An Automatic Optimization Technique of DMA Transfer and Data Allocation for Reconfigurable Machines

      SHIDA Sayaka, DOHI Keisuke, SHIBATA Yuichiro, HAMADA Tsuyoshi, MASADA Tomonari, OGURI Kiyoshi

      The IEICE transactions on information and systems 92 ( 12 ) 2127 - 2136   1 12 2009

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      CiNii Article

      researchmap

    • Evaluation of circuit proliferation method that uses concept of pressure in PCA

      ARAKI Yuta, SHIBATA Yuichiro, HAMADA Tsuyoshi, MASADA Tomonari, OGURI Kiyoshi

      IEICE technical report 109 ( 320 ) 19 - 24   26 11 2009

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      PCA is hardwired logic with self-reconfigurability that can dynamically modify its structure and extend its functionality. However, a distributed area management scheme must be established in order to leverage this dynamic reconfigurability, which is still a challenging topic for PCA. This paper introduces a simple dynamic circuit construction method resembling cell proliferation and proposes a new rule for proliferation. Evaluation results using random graphs show that the new rule can decrease the number of proliferation procedures compared to the old rules.

      CiNii Article

      researchmap

    • FPGA implementation and accuracy evaluation of a power-supply voltage control circuit

      SOEJIMA Masato, SAKEMI Jyunya, SHIBATA Yuichiro, KUROKAWA Fujio, HAMADA Tsuyoshi, MASADA Tomonari, OGURI Kiyoshi

      IEICE technical report 109 ( 198 ) 19 - 24   10 9 2009

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      Demands for more stable and more efficient DC power supplies have been increasing in the context of energy conservation measures. Digital DC-DC converters are attracting particular attention because they have a high degree of reliability and flexibility. This paper addresses an FPGA-based control mechanism for a DC-DC converter. This approach enables programmability for a wide range of control algorithms while higher operational speed is anticipated. We discuss the required arithmetic precision and the trade-off between control accuracy and hardware cost through a prototype implementation of an FPGA-based DC-DC converter. Evaluation of the prototype system reveals that fixed-point arithmetic with a 10-bit fraction part is sufficient in terms of dynamic characteristics. The design of a counter module that uses multiple phase-shifted clock signals to increase the PWM resolution while keeping the system clock frequency low is also discussed.

      CiNii Article

      researchmap

    • A Memory Access Optimization Method for Reconfigurable Systems Based on a Multithread Programming Model

      DOHI Keisuke, SHIDA Sayaka, SHIBATA Yuichiro, HAMADA Tsuyoshi, MASADA Tomonari, OGURI Kiyoshi

      IEICE technical report 109 ( 26 ) 61 - 66   7 5 2009

      More details

      Language:English   Publisher:The Institute of Electronics, Information and Communication Engineers  

      Reconfigurable systems are known to be able to achieve higher performance than traditional microprocessor architectures in many application fields. However, in order to extract the full potential of reconfigurable systems, programmers often have to design and describe the best-suited code for their target architecture with specialized knowledge. The aim of this paper is to assist the users of reconfigurable systems by implementing a programming environment with a multithread model. The experimental results show that our translator automatically generates efficient performance-aware code segments including DMA transfers and shift registers for memory access optimization.

      CiNii Article

      researchmap

    • A Sub-Petaflops High Performance Computing System using GPUs

      HAMADA Tsuyoshi, MASADA Tomonari, SHIBATA Yuichiro, OGURI Kiyoshi

      IEICE technical report 108 ( 324 ) 17 - 19   21 11 2008

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      CiNii Article

      researchmap

    • GPU acceleration of multiple topic extraction from images by LDA document model

      MASADA Tomonari, HAMADA Tsuyoshi, SHIBATA Yuichiro, OGURI Kiyoshi

      IEICE technical report 108 ( 324 ) 1 - 6   21 11 2008

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      In this paper, we propose a GPU acceleration of multi-topic extraction from images using LDA (latent Dirichlet allocation). LDA was originally proposed by Blei et al. as a probabilistic model for documents and has recently been applied to multimedia information other than documents. We provide the results of experiments where we apply LDA to Professor Wang's 10,000 test images and extract multiple visual topics. We adopt the collapsed variational Bayesian inference method for LDA and accelerate it by using Nvidia CUDA compatible GPU devices.

      CiNii Article

      researchmap

    • Dimensionality Reduction via Latent Dirichlet Allocation for Document Clustering

      MASADA Tomonari, KIYASU Senya, MIYAHARA Sueharu

      IPSJ SIG Notes 2007 ( 65 ) 381 - 386   3 7 2007

      More details

      Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

      In this paper, we employ the latent Dirichlet allocation as a method for the dimensionality reduction of feature vectors and reveal its effectiveness in document clustering. In the evaluation experiment, we perform clustering on the document sets of Japanese and Korean Web news articles. We regard the categories assigned to each article as the ground truth of clustering evaluation. We compare the clustering results obtained by using the feature vectors whose entries are term frequencies with the results obtained by using the feature vectors whose dimensions are reduced by the latent Dirichlet allocation.

      CiNii Article

      researchmap

      Other Link: http://id.nii.ac.jp/1001/00018810/

    • Dimensionality Reduction via Latent Dirichlet Allocation for Document Clustering

      MASADA Tomonari, KIYASU Senya, MIYAHARA Sueharu

      IEICE technical report 107 ( 131 ) 381 - 386   2 7 2007

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      In this paper, we employ the latent Dirichlet allocation as a method for the dimensionality reduction of feature vectors and reveal its effectiveness in document clustering. In the evaluation experiment, we perform clustering on the document sets of Japanese and Korean Web news articles. We regard the categories assigned to each article as the ground truth of clustering evaluation. We compare the clustering results obtained by using the feature vectors whose entries are term frequencies with the results obtained by using the feature vectors whose dimensions are reduced by the latent Dirichlet allocation.

      CiNii Article

      researchmap

    • Personal Name Disambiguation in Web Search Using Knowledge Base (jointly worked)

      Quang Minh VU, Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

      DBSJ Letters 5 ( 4 ) 53 - 56   2007

      More details

    • Name Disambiguation in Web Search Using Knowledge Base

      MINH VU Quang, MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IPSJ SIG Notes 2006 ( 78 ) 185 - 192   13 7 2006

      More details

      Language:English   Publisher:Information Processing Society of Japan (IPSJ)  

      Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to discriminate documents related to different people, an effective method is required to measure document similarities and to find relevant documents of the same person. Some previous studies have used the cosine similarity method or have tried to extract common named entities for measuring similarities. We propose a new method which uses web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts as similarities between document pairs. Experimental results show that our proposed method outperforms the cosine similarity method and the common named entities method.

      CiNii Article

      researchmap

      Other Link: http://id.nii.ac.jp/1001/00018907/

    • Name Disambiguation in Web Search Using Knowledge Base

      VU Quang MINH, MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IEICE technical report 106 ( 149 ) 143 - 148   6 7 2006

      More details

      Language:English   Publisher:The Institute of Electronics, Information and Communication Engineers  

      Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to discriminate documents related to different people, an effective method is required to measure document similarities and to find relevant documents of the same person. Some previous studies have used the cosine similarity method or have tried to extract common named entities for measuring similarities. We propose a new method which uses web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts as similarities between document pairs. Experimental results show that our proposed method outperforms the cosine similarity method and the common named entities method.

      CiNii Article

      researchmap

    • Topic-oriented Term Extraction and Term Clustering for Query Term Disambiguation

      若木裕美, 正田備也, 高須淳宏, 安達淳

      IPSJ Transactions on Databases 47 ( SIG19 ) 72 - 85   2006

      More details

    • Topic-oriented Term Extraction and Term Clustering for Query Focusing

      Hiromi WAKAKI, Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

      IPSJ Transactions on Databases 47 ( SIG19 ) 72 - 85   2006

      More details

    • Query Ambiguity Indication Using Infrequent Term Cooccurrences

      WAKAKI Hiromi, MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IEICE technical report. Data engineering 105 ( 172 ) 1 - 6   7 7 2005

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      Conventional search engines are designed mainly for general keyword search. Therefore, in many cases, we can find no appropriate combination of query terms. In this paper, we present a query disambiguation method by using infrequent term cooccurrences. This strategy comes from the following idea : terms appearing with a wide variety of terms cannot establish an independent topic. Based on this hypothesis, terms are weighted. The experimental results show that the terms ranked higher by our method can improve the average precision of Web search when added to the original query terms. As compared with other term ranking methods, our method gives higher ranks to the terms denoting more particular and adequate stuff and referring to more specific concepts.
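
      A minimal illustration of the stated hypothesis, weighting each term by the inverse of its co-occurrence diversity so that terms co-occurring with many different terms rank low; this is an assumed toy weighting, not the paper's exact measure.

```python
from collections import defaultdict
from itertools import combinations

# toy "documents" as lists of terms (illustrative only)
docs = [["query", "expansion", "retrieval"],
        ["query", "ambiguity", "terms"],
        ["query", "search", "engine"],
        ["expansion", "retrieval", "feedback"]]

cooccurring = defaultdict(set)
for doc in docs:
    for u, v in combinations(set(doc), 2):
        cooccurring[u].add(v)
        cooccurring[v].add(u)

# weight = inverse of co-occurrence diversity: topic-forming terms rank higher
weights = {t: 1.0 / len(peers) for t, peers in cooccurring.items()}
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {w:.3f}")
```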

      CiNii Article

      researchmap

    • Improving Web Search Performance with Hyperlink Information

      正田備也, 高須淳宏, 安達淳

      IPSJ Transactions on Databases 46 ( SIG8 ) 48 - 59   2005

      More details

    • Improving Web Search Performance with Hyperlink Information

      Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

      IPSJ Transactions on Databases 46 ( SIG8 ) 48 - 59   2005

      More details

    • Decomposing the Web Graph into Parameterized Connected Components

      MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IEICE Trans. Information and Systems 87 ( 2 ) 380 - 388   1 2 2004

      More details

      Language:English   Publisher:The Institute of Electronics, Information and Communication Engineers  

      We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the World Wide Web, page grouping is expected to provide a general grasp of the Web for effective information search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our grouping method is a generalization of decomposition into strongly connected components, in which each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by adjusting a parameter, called the threshold parameter. We call the resulting groups parameterized connected components (PCCs). The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our grouping method. This paper also includes experimental results for 15 million Web pages, which show the contribution of our method to efficient Web surfer navigation.

      CiNii Article

      researchmap

    • A New Notion of Connectivity and its Application to Web Page Grouping

      正田備也, 高須淳宏, 安達淳

      DBSJ Letters 2 ( 1 ) 3 - 6   2003

      More details

      Language:Japanese   Publisher:The Database Society of Japan  

      CiNii Article

      researchmap

    • A New Notion of Connectivity and its Application to Web Page Grouping

      Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

      DBSJ Letters 2 ( 1 ) 3 - 6   2003

      More details

    • Enumerating triangulations in general dimensions

      H Imai, T Masada, F Takeuchi, K Imai

      INTERNATIONAL JOURNAL OF COMPUTATIONAL GEOMETRY & APPLICATIONS 12 ( 6 ) 455 - 480   12 2002

      More details

      Language:English   Publisher:WORLD SCIENTIFIC PUBL CO PTE LTD  

      We propose algorithms to enumerate (1) regular triangulations, (2) spanning regular triangulations, (3) equivalence classes of regular triangulations with respect to symmetry, and (4) all triangulations. All of the algorithms are for arbitrary points in general dimension. They work in output-size sensitive time with memory only of several times the size of a triangulation. For the enumeration of regular triangulations, we use the fact by Gel'fand, Zelevinskii and Kapranov that regular triangulations correspond to the vertices of the secondary polytope. We use reverse search technique by Avis and Fukuda, its extension for enumerating equivalence classes of objects, and a reformulation of a maximal independent set enumeration algorithm. The last approach can be extended for enumeration of dissections.

      DOI: 10.1142/S0218195902000980

      researchmap

    • Grouping Web pages based on parameterized connectivity

      MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

      IEICE technical report. Data engineering 102 ( 208 ) 137 - 142   11 7 2002

      More details

      Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

      The rapid growth of the amount of information on the WWW makes Web search methods based only on textual information more and more unrealistic. In recent years, many studies have provided attractive link-based retrieval methods. This paper proposes a method for link-based Web page grouping, which aims to reduce the complexity of subsequent text-based retrieval by enlarging the size of the units used for retrieval. This method also makes the granularity of groups controllable by adjusting one threshold parameter. This paper includes the results of preliminary experiments, which clarify the characteristics of the proposed grouping method.

      CiNii Article

      researchmap

    • Grouping Web Pages Based on Parameterized Connectivity

      正田備也, 高須淳宏, 安達淳

      DBSJ Letters 1 ( 1 ) 47 - 50   2002

      More details

      Language:Japanese   Publisher:The Database Society of Japan  

      CiNii Article

      researchmap

    • Grouping Web Pages Based on Parameterized Connectivity

      Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

      DBSJ Letters 1 ( 1 ) 47 - 50   2002

      More details

    ▼display all

    Presentations

    • Documents as a Bag of Maximal Substrings: An Unsupervised Feature Extraction for Document Clustering

      13th International Conference on Enterprise Information Systems (ICEIS 2011)  2011 

      More details

    • Semi-supervised Bibliographic Element Segmentation with Latent Permutations

      International Conference on Asia-Pacific Digital Libraries (ICADL 2011)  2011 

      More details

    • Infinite Latent Process Decomposition

      IEEE International Conference on Bioinformatics & Biomedicine (BIBM 2010)  2010 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

      The 1st International Workshop on Web Intelligent Systems and Services (WISS 2010)  2010 

      More details

    • Applying a Weighting Structure for Recognizing Semantic Change over Time

      2010 IEEK Summer Conference  2010 

      More details

    • Modeling Topical Trends over Continuous Time with Priors

      the seventh International Symposium on Neural Networks (ISNN 2010)  2010 

      More details

    • An Adaptive Weighting Scheme for Time-dependent Semantic Change Recognition

      2010 

      More details

    • Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA compatible devices

      IEA/AIE 2009  2009 

      More details

    • Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

      Proc. of the Joint Conference on Asia-Pacific Web Conference (APWeb) and Web-Age Information Management (WAIM)  2009 

      More details

    • Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation

      ADMA 2009  2009 

      More details

    • Dynamic Hyperparameter Optimization for Bayesian Topical Trend Analysis

      CIKM 2009  2009 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA compatible devices

      2009 

      More details

    • Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

      2009 

      More details

    • Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation

      2009 

      More details

    • Dynamic Hyperparameter Optimization for Bayesian Topical Trend Analysis

      2009 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Character Categorization via Latent Dirichlet Allocation for Kana Sequence Segmentation with Conditional Random Fields

      16th International Conference on Computers in Education (ICCE 2008)  2008 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Unmixed Spectrum Clustering for Template Composition in Lung Sound Classification

      Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2008  2008 

      More details

    • Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering.

      3rd International Conference on Large-scale Knowledge Resources  2008 

      More details

    • Character Categorization via Latent Dirichlet Allocation for Kana Sequence Segmentation with Conditional Random Fields

      2008 

      More details

      Presentation type:Poster presentation  

      researchmap

    • Unmixed Spectrum Clustering for Template Composition in Lung Sound Classification

      2008 

      More details

    • Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering.

      2008 

      More details

    • Clustering Images with Multinomial Mixture Models.

      8th International Symposium on advanced Intelligent Systems (ISIS 2007)  2007 

      More details

    • A Clustering Method for Author Name Disambiguation in Bibliographic Data

      The 18th Data Engineering Workshop (DEWS)  2007 

      More details

    • Clustering Images with Multinomial Mixture Models.

      2007 

      More details

    • Link-Based Clustering for Finding Subrelevant Web Pages

      Third International Workshop on Web Document Analysis  2005 

      More details

    • Link-Based Clustering for Finding Subrelevant Web Pages

      2005 

      More details

    • Web Page Grouping Based on Parameterized Connectivity

      The 9th International Conference on Database Systems for Advanced Applications  2004 

      More details

    • Web Page Grouping Based on Parameterized Connectivity

      2004 

      More details

    • Parameterized Connectivity and its Application to Web Page Grouping

      The 2nd Forum on Information Technology (FIT2003)  2003 

      More details

    • Effective Use of Web Information via Parameterized Connected Component Decomposition

      Summer Database Workshop DBWS2003  2003 

      More details

    • Grouping Web Pages Based on Parameterized Connectivity

      The 14th Data Engineering Workshop (DEWS)  2003 

      More details

    • Grouping Web Pages Based on Parameterized Connectivity

      The 1st Forum on Information Technology (FIT)  2002 

      More details

    • Grouping Web Pages by Parameterized Connected Component Decomposition

      Summer Database Workshop DBWS2002  2002 

      More details

    • Enumeration of Regular Triangulations

      12th annual ACM Symposium on Computational Geometry  1996 

      More details

    • Enumeration of Regular Triangulations

      1996 

      More details

    ▼display all

    Professional Memberships

    •  
      THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS

      More details

    •  
      INFORMATION PROCESSING SOCIETY OF JAPAN

      More details

    Works

    • A Study on Bayesian Topic Models Using Substring Frequencies as Document Features

      2011
      -
      2012

      More details

    • Information Navigation Using Statistical Rhymes

      2010
      -
      2012

      More details

    • Linking Diverse Text Information with Multi-topic Models Using External Knowledge

      2010
      -
      2011

      More details

    • Editorial Board Member, IPSJ Transactions on Databases (TOD)

      2007
      -
      2011

      More details

    • Introducing Temporality into Inter-document and Inter-word Similarities with Multi-topic Models Using Temporal Information in Text

      2009
      -
      2010

      More details

    • Analyzing the Temporal Evolution of Notable Topics with Multi-topic Models Using Temporal Information in Text

      2008
      -
      2009

      More details

    ▼display all

    Research Projects

    • Topic models bridging between documents as members composing a corpus and documents as sequences composed by words

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research 

      More details

      4 2021 - 3 2024

      Grant number:21K12017

      Grant amount:\4030000 ( Direct Cost: \3100000 、 Indirect Cost:\930000 )

      researchmap

    • Research on the effectiveness of using RNN in topic models

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C) 

      MASADA Tomonari

      More details

      4 2018 - 3 2021

      Grant number:18K11440

      Grant amount:\4420000 ( Direct Cost: \3400000 、 Indirect Cost:\1020000 )

      Topic models, including LDA (latent Dirichlet allocation), can automatically extract semantically meaningful themes from a large corpus. However, text analysis using topic models often considers only word frequencies in a document and ignores the way words are arranged. This work aims to improve topic models with RNNs (recurrent neural networks) for modeling word order. Since several previous studies had already proposed methods for combining RNNs with topic models, we aimed to propose a new one. As a result, we proposed a new topic model using NNs (neural networks) that performs no VAE (variational autoencoder) inference. We instead maximize the objective given in the original LDA paper by training NNs in an amortized manner and obtaining posterior parameters as the output of the NNs. However, we currently use only an MLP (multilayer perceptron) and thus have not yet achieved our goals. We plan to replace the MLP with an RNN or other more recent NN architectures in the near future.
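
      A minimal sketch of the amortized inference idea described above, assuming PyTorch, an MLP encoder that outputs Dirichlet posterior parameters, and the standard LDA evidence lower bound as the training objective; this is not the authors' exact model, and all sizes and data are illustrative.

```python
import torch
import torch.nn as nn

V, K, D = 1000, 20, 64            # vocabulary size, number of topics, batch of documents
alpha = torch.full((K,), 0.1)     # symmetric Dirichlet prior over topic proportions

# MLP encoder producing posterior Dirichlet parameters (the project envisions RNNs here)
encoder = nn.Sequential(nn.Linear(V, 256), nn.ReLU(), nn.Linear(256, K), nn.Softplus())
log_beta = nn.Parameter(torch.randn(K, V) * 0.01)     # unnormalized topic-word parameters
opt = torch.optim.Adam(list(encoder.parameters()) + [log_beta], lr=1e-3)

x = torch.poisson(torch.rand(D, V))   # stand-in bag-of-words counts, not real documents

for step in range(100):
    conc = encoder(x) + 1e-3                          # q(theta) concentration parameters
    q = torch.distributions.Dirichlet(conc)
    theta = q.rsample()                               # reparameterized topic proportions
    beta = torch.softmax(log_beta, dim=1)             # topic-word distributions
    log_lik = (x * torch.log(theta @ beta + 1e-10)).sum(dim=1)
    kl = torch.distributions.kl_divergence(q, torch.distributions.Dirichlet(alpha))
    loss = -(log_lik - kl).mean()                     # negative evidence lower bound
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))
```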

      researchmap

    • Exploring Data Processing and Analysis Methods for Predicting Defect Occurrences in Semiconductor Production Lines

      Sony Semiconductor Manufacturing Corporation, Japan 

      More details

      4 2019 - 3 2020

      Authorship:Principal investigator  Grant type:Collaborative (industry/university)

      Grant amount:\2730000 ( Direct Cost: \2481000 、 Indirect Cost:\249000 )

      We propose a novel method to use the topics obtained by topic modeling for sensor data analysis. This paper describes a case study where we perform an exploratory data analysis of manufacturing sensor data by using latent Dirichlet allocation (LDA) as a tool to discover remarkable change patterns. Our target is a set of time-series data originating from the sensors installed in a closed factory environment. Each sensor gives a different type of measurement of the same manufacturing process, which is operated repeatedly in a lot-by-lot manner. We first discretize the data based on the histogram of sensor measurements and construct a bag-of-words representation. We then apply LDA to discover change patterns across tens of thousands of lots. When we apply LDA to natural language documents, the resulting topics are widely different from each other because the documents intrinsically show considerable diversity. In contrast, our data, which come from the repeatedly operated manufacturing process, show only limited diversity. As a result, LDA provides topics closely similar to each other. Our main and unexpected finding is that the difference between similar topics is useful in discovering remarkable change patterns. We performed an experiment over the data sets containing sensor measurements collected in the factory. The results have revealed that a subtle difference between very similar topics often corresponds to an interesting change pattern of sensor measurements.
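
      A minimal sketch of the discretization-plus-LDA pipeline described above, assuming synthetic sensor traces, per-sensor histogram bins as the vocabulary, one lot per document, and scikit-learn's LatentDirichletAllocation; bin and topic counts are illustrative, not the project's settings.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_lots, n_sensors, n_steps, n_bins = 200, 3, 50, 8
data = rng.normal(size=(n_lots, n_sensors, n_steps))   # stand-in sensor measurements

# discretize each sensor's measurements into histogram bins and build a bag-of-words
vocab_size = n_sensors * n_bins
bow = np.zeros((n_lots, vocab_size), dtype=int)
for s in range(n_sensors):
    edges = np.histogram_bin_edges(data[:, s, :], bins=n_bins)
    tokens = np.clip(np.digitize(data[:, s, :], edges[1:-1]), 0, n_bins - 1)
    for lot in range(n_lots):
        bow[lot, s * n_bins:(s + 1) * n_bins] = np.bincount(tokens[lot], minlength=n_bins)

lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(bow)
topics = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# topics from repetitive manufacturing data tend to be similar; inspect their differences
diff = np.abs(topics[:, None, :] - topics[None, :, :]).sum(axis=2)
print(np.round(diff, 3))   # pairwise L1 differences between topic-word distributions
```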

      researchmap

    • A Study on Digital Library System for Experimental Information Extraction, Visualization and Recommendation

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) 

      Takasu Atsuhiro, Ohta Manabu, Maneeroj Saranya

      More details

      4 2015 - 3 2018

      Grant number:15H02789

      Grant amount:\15860000 ( Direct Cost: \12200000 、 Indirect Cost:\3660000 )

      Researchers need to survey research trends in related fields for various tasks, such as research planning, trend analysis, and writing papers. Digital libraries have been playing an important role in providing the full text of research papers, and full-text search is a main technology for retrieving them. This study focuses on the experiment information included in papers and developed sequence analysis models for extracting it. We also developed a recommender system for actively providing scholarly information.

      researchmap

    • Tiny data mining: reconstruction of large scale data with probability distributions as bases

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C) 

      MASADA Tomonari

      More details

      4 2014 - 3 2017

      Grant number:26330256

      Grant amount:\4810000 ( Direct Cost: \3700000 、 Indirect Cost:\1110000 )

      The aim of our research is to make an efficient and effective summary of a large set of documents such as news articles, academic papers, or novels. When the number of given documents is very large, we can only read a small portion of them and may therefore miss the documents containing topics of interest. Our research extracts word lists from the given document set as a summary. For example, if one of the extracted word lists is "game, hit, pitcher, and trade," we know that there are documents discussing baseball. In this manner, by looking at the extracted word lists, we can see what kinds of topics are discussed in the given document set. Furthermore, our research also provides a clue for finding which documents are closely related to which word lists, so we can retrieve the documents relevant to the word lists we choose. While we adopt an existing method called topic modeling, we propose a new application and a new implementation of it.

      researchmap

    • A Study on Information Alignment by Composite Generative Model

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B) 

      TAKASU Atsuhiro, MASADA Tomonari, FUKAGAWA Daiji

      More details

      4 2011 - 3 2015

      Grant number:23300040

      Grant amount:\19890000 ( Direct Cost: \15300000 、 Indirect Cost:\4590000 )

      The purpose of this study is to develop topic models for analyzing information from various aspects. We first develop a topic model that handles time as well as text by adding timestamps to each document; the model generates both text and timestamps simultaneously. Next, we extend the model to treat networked documents, where documents are linked to each other as in citations of academic papers. We apply the models to researcher recommendation systems and empirically show that the features extracted by the models are effective for recommendation.

      researchmap

    • Information Navigation using Statistical Rhymes

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B) 

      MASADA Tomonari

      More details

      2010 - 2011

      Grant number:22700150

      Grant amount:\4030000 ( Direct Cost: \3100000 、 Indirect Cost:\930000 )

      This project is based on the following assumption: words that co-occur with statistically significant frequency can serve as a guide in a useful information navigation system even when those co-occurrences are not based on semantic similarity or relatedness. We call such co-occurrences statistical rhymes. We have been trying to extract statistical rhymes with Bayesian probabilistic models. We consequently succeeded in proposing a new LDA (latent Dirichlet allocation)-like topic extraction method that can segment the word token sequences appearing in bibliographic data, which we can observe in the references sections of academic papers or in the publications sections of researchers' Web sites. Our method splits each bibliographic record into segments, each corresponding to a different data field, e.g., authors, paper title, journal, pages, and publication year. Further, we improved segmentation accuracy by making the inference semi-supervised.

      researchmap

    • Algorithms for sub-pixel analysis of remotely sensed hyperspectral images

      Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C) 

      KIYASU Senya, MIYAHARA Sueharu, MASADA Tomonari

      More details

      2005 - 2007

      Grant number:17560376

      Grant amount:\3740000 ( Direct Cost: \3500000 、 Indirect Cost:\240000 )

      In this research, we developed several algorithms for sub-pixel analysis of land cover in remotely sensed multispectral images. Several sub-pixel analysis techniques for remotely sensed images have been developed that estimate the proportions of land-cover components within a pixel. However, when the available training data do not correctly represent the spectral characteristics of the categories in the pixel, large errors may appear in the estimation results.
      We developed an algorithm that analyzes a hyperspectral image as follows. First, we provide a small set of initial training data and determine the pure pixels in the image. In the next step, component spectra are adaptively estimated for each mixed pixel using the surrounding pure pixels. Then the proportions of the components in the mixed pixels are estimated based on the determined component spectra. We confirmed the validity of the method by numerical simulation and applied it to remotely sensed multispectral images.
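
      A minimal sketch of estimating sub-pixel component proportions by non-negative least squares, assuming the component spectra (endmembers) are already known; the adaptive, per-pixel estimation of component spectra described above is not reproduced here, and the data are synthetic.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_bands = 30
endmembers = rng.random((n_bands, 3))            # spectra of 3 pure land-cover classes
true_props = np.array([0.6, 0.3, 0.1])
pixel = endmembers @ true_props + 0.01 * rng.standard_normal(n_bands)

props, _ = nnls(endmembers, pixel)               # non-negative component proportions
props = props / props.sum()                      # normalize to sum to one
print(np.round(props, 2))                        # close to [0.6, 0.3, 0.1]
```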

      researchmap

    ▼display all